
Prediction of Properties of Organic Compounds –

Empirical Methods and

Management of Property Data

Submitted to the Faculties of Natural Sciences

of the Friedrich-Alexander-Universität Erlangen-Nürnberg

in fulfilment of the requirements for

the doctoral degree

by

Thomas Kleinöder

from Marburg/Lahn

Approved as a dissertation by the Faculties of Natural Sciences of the Universität Erlangen-Nürnberg

Date of the oral examination: 12.12.2005

Chair of the doctoral committee: Prof. Dr. D.-P. Häder

First reviewer: Prof. Dr. J. Gasteiger

Second reviewer: Prof. Dr. T. Clark

I thank my doctoral supervisor,

Prof. Dr. Johann Gasteiger,

for his manifold support and valuable suggestions, without which this work would not have been possible.

My special thanks also go to:

Dr. Lothar Terfloth for the many discussions on all areas of the topics covered in this work and for his constant willingness to help,

Dr. Achim Herwig, Jörg Marusczyk and the other developers of MOSES for the good cooperation and the sometimes controversial but always productive discussions,

the administrators of the various operating system platforms and of the network for providing and maintaining a stable working environment,

Dr. Nico van Eikema Hommes and Prof. Dr. Tim Clark for the helpful discussions on questions concerning the calculation of quantum chemical atomic charges,

Christoph Schlenker for the helpful discussions and suggestions on all areas of software development,

Angela Döbler, Ulrike Scholz and Karin Holzke for their support with all administrative tasks,

all colleagues of the research group not mentioned by name for the friendly welcome, many fruitful discussions and the positive and productive working atmosphere.

I further thank the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung) for financial support within the research projects "Suche und Optimierung von Leitstrukturen (SOL)" and "Bioinformatics for the Functional Analysis of Mammalian Genomes (BFAM)".

In particular I thank my wife, Diana Kleinöder, who always gave me support and understanding while I was preparing this work.

For Diana, Karla and my parents

Contents

1 Introduction

2 Fundamentals and Methods
  2.1 Empirical Approaches to the Prediction of Properties
    2.1.1 Linear Free–Energy Relationships (LFER)
    2.1.2 Quantitative Structure–Property Relationships (QSPR)
  2.2 Structure Descriptors
  2.3 Multivariate Data Analysis
    2.3.1 Multivariate Data
    2.3.2 Feature Selection and Transformation
  2.4 Learning Methods
    2.4.1 Unsupervised Learning
    2.4.2 Supervised Learning
  2.5 Model Building
  2.6 Parametrization
  2.7 Charge Calculation by Partial Equalization of Electronegativity
    2.7.1 The Concept of Electronegativity and its Equalization
    2.7.2 Partial Equalization of Orbital Electronegativity (PEOE)
    2.7.3 Partial Equalization of π-Electronegativity (PEPE)
  2.8 Software Development
    2.8.1 Programming Paradigms
    2.8.2 Design Techniques
    2.8.3 Techniques for Implementation and Maintenance

3 MOSES
  3.1 Introduction
  3.2 From Script-based to Integrated Workflows
  3.3 Overcoming Limitations of Existing In-house Libraries
  3.4 Representation of Chemical Structures
    3.4.1 Connection Tables
    3.4.2 RAMSES – A Structure Representation Based on σ/π Separation
    3.4.3 Extension to Hypervalent Atom Types
    3.4.4 Perception of Aromaticity based on RAMSES
  3.5 Architecture and Implementation
  3.6 Management of Properties
    3.6.1 Analysis
    3.6.2 Design and Implementation
  3.7 Current Status of the Calculator Sublibrary
  3.8 Applications
    3.8.1 MOSES::WORKFLOWMANAGER
    3.8.2 Polarizabilities Through a Zero Order Additivity Scheme

4 Quantification of Atomic Partial Charges
  4.1 Introduction
  4.2 Development of a Combined PEOE/HMO Calculation
    4.2.1 Substitution of PEPE by a Modified Hückel MO Treatment
      4.2.1.1 Hückel MO Theory
      4.2.1.2 Modified HMO: Accounting for the Inductive Effect
      4.2.1.3 Hyperconjugation
    4.2.2 Datasets
    4.2.3 The Problem of Parametrization
    4.2.4 Observable Properties
    4.2.5 Charges from Quantum Mechanical Calculations
      4.2.5.1 Methods
      4.2.5.2 Requirements for a Reference Charge Calculation Scheme
    4.2.6 Analysis and Comparison of QM Charges
      4.2.6.1 Rates of Failures for QM Charge Calculation Methods
      4.2.6.2 Direct Comparison of Charges from QM Calculations
      4.2.6.3 Dipole Moments from QM Charges
      4.2.6.4 Distribution of Charges for Various Atom Types
      4.2.6.5 Relationship to Electronegativity
      4.2.6.6 Shortcomings of the Merz-Kollman Scheme
      4.2.6.7 Conclusions of the Analysis
    4.2.7 Summary
  4.3 Workflow and Procedure for the Parametrization
  4.4 Calibration with Molecular Dipole Moments
    4.4.1 PEOE
    4.4.2 Modified HMO
  4.5 Calibration with DFT/NPA Charges
    4.5.1 General Considerations
    4.5.2 Modified PEOE
      4.5.2.1 Revised Electronegativity Parameters
      4.5.2.2 Negative Hyperconjugation
    4.5.3 Modified HMO
    4.5.4 Results of the Parametrization
  4.6 Evaluation
    4.6.1 C-1s ESCA Shifts
    4.6.2 π-Charges
  4.7 Summary

5 Summary

6 Zusammenfassung (summary in German)

A Supplementary Material
  A.1 To Section 4

B Datasets
  B.1 To Section 3
  B.2 To Section 4

Bibliography

Table of Symbols and Abbreviations

Abbreviations and Acronyms

QSPR              Quantitative Structure–Property Relationships
QSAR              Quantitative Structure–Activity Relationships
LFER              Linear Free–Energy Relationships
PEOE              Partial Equalization of Orbital Electronegativity
MPEOE             Modified Partial Equalization of Orbital Electronegativities
PEPE              Partial Equalization of Pi-Electronegativities
HMO               Hückel Molecular Orbital (theory)
MHMO              Modified Hückel Molecular Orbital (method)
PEOE/MHMO(µ)      combined PEOE/MHMO procedure fitted to molecular dipole moments
MPEOE/MHMO(NPA)   combined MPEOE/MHMO procedure fitted to DFT/NPA charges
DFT               Density Functional Theory
NPA               Natural Population Analysis
MPA               Mulliken Population Analysis
AIM               Atoms In Molecules (method)
MK-ESP            Merz-Kollman-Electrostatic-Potential (derived charges)
RICOS             Representation of Inorganic, Coordinative, and Organic Structures
MOSES             Molecular Structure Encoding System
RAMSES            Representation Architecture for Molecular Structures by Electron Systems
CORINA            Coordinates
CACTVS            Chemical Algorithms Construction, Threading, and Verification System
UML               Unified Modeling Language
OOP               Object Oriented Programming
SP                Structured Programming
GP                Generic Programming
SA                Simulated Annealing
GA                Genetic Algorithm

Physicochemical Quantities and Units

property                          symbol   unit

mean molecular polarizabilities   ᾱ        [Å³]
atomic partial charges            q        [e]
electronegativities               χ        [eV]
dipole moments                    µ        [debyes] ([D])
ESCA shifts                       –        [eV]

Chapter 1

Introduction

Organic chemistry deals with understanding the mechanisms of organic reactions and with planning and performing the synthesis of organic compounds. The focus is on the chemical structure of a compound: its construction from simple structural building blocks by a synthesis strategy, or its modification through the application of given reaction types. Closely related to the chemical structure of a compound are its properties. As George S. HAMMOND stated, "the most fundamental and lasting objective of synthesis is not production of new compounds, but production of properties" (Norris Award Lecture, 1968). Thus, even though we deal with the chemical structure as a necessary abstract representation, the fundamental physicochemical properties of a compound are of central concern and determine its behavior in chemical, biochemical or environmental processes.

Having a compound or a set of compounds at hand, the property of interest can be measured experimentally if an appropriate analytical method is available. However, experimental measurements are often tedious and time-consuming, and may fail. With some luck, an experimental value might be found in one of the available databases (e.g., [1, 2]). Yet, one is likely to fail in finding the requested property, considering the number of known compounds and the number of entries in a typical database. For instance, the octanol-water partition coefficient is the property with the most entries (13250) in the PHYSPROP database [2]. Compared with the roughly 26 million compounds currently registered in the CAS database [3], it becomes obvious that the chance of finding an experimental value of a property for a given compound is quite low. The demand for computational methods for the prediction of properties of compounds is therefore evident.


The basic approach to the problem of predicting properties can be written in a very simple form, which states that a molecular property P can be expressed as a function of the molecular structure C:

$$P = f(C) \tag{1.1}$$

The function f(C) may have a very simple form, as is the case for the calculation of the molecular mass from the relative atomic masses. In most cases, however, f(C) will be more complicated, for instance, when the structure is described by quantum mechanical means and the property of interest is derived from the wavefunction; for example, the dipole moment may be obtained by applying the dipole moment operator.

If we can describe the system of interest from first principles based on the theory of quantum mechanics, no other information is required, and if the level of theory is sufficient, we can calculate the property of interest ab initio with good accuracy [4]. This approach to predicting properties is called deductive learning. It is limited only by the ability to handle the underlying mathematics in order to solve the Schrödinger equation and to find a way to derive the property of interest from the wavefunction.

Even though the computational power of current hardware has increased dramatically over the last decades, ab initio calculations at a high level of accuracy are still restricted to medium-sized molecules and small datasets. The prediction methods developed in the present work should be applicable to very large datasets of up to several million compounds. This is a typical size of datasets encountered in the field of chemoinformatics. Chemoinformatics has been established as a new scientific discipline in recent years and applies techniques from computer science and information technology to chemical data, which can easily become very large in number [5, 6]. The only way to predict properties on such a large scale is to apply prediction models that have been derived by observation of small sets of compounds with known properties, either determined experimentally or calculated by theoretical methods with an accuracy that is comparable to experiment.
This approach is known as inductive learning and defines what is meant by empirical approaches to the prediction of properties in the present work.

Inductive learning is the traditional way chemists abstract from experimental observations to find general rules to predict and understand systems of interest. One of the best-known examples is the Hammett equation for the rate and equilibrium constants of reactions of meta- and para-substituted benzoic acid derivatives [7]. This pioneering work laid the cornerstone of the widely used field of linear free–energy relationships (LFER), which in turn provide the basis for the classical QSAR (quantitative structure–activity relationships) approach. As the LFER method can be applied only to series of congeneric compounds, it is desirable to find more general relationships between structure and property. The methods of generating structural representations, building relationships between these representations and the properties of interest, and validating the models are subsumed under the term QSPR (quantitative structure–property relationships).

A successful application of the QSPR approach depends heavily on the structure representation applied. The charge distribution of a compound is known to determine its properties [8]. Therefore, methods for the quantification of the charge distribution are one central concern of this work. Based on a separation of σ- and π-electron systems, methods for the calculation of atomic partial charges were developed in our group several years ago [9, 10]. One aim of the present work was the reevaluation of these calculation schemes with respect to charge-dependent molecular properties and partial charges obtained from quantum mechanical calculations. A special concern was the treatment of hypervalent species such as sulfonic acids, which were not included in the former calculation schemes. The developed charge calculation scheme, its parametrization and its evaluation are presented in Section 4.

A second central concern of this work is the management of property data. A property, regardless of whether it was measured or calculated, has additional data attached to it that are often ignored. For instance, an experimental property should always be given with exact information about the conditions of its determination, such as temperature or pressure. For a calculated property, details about the calculation method, the version of its implementation, or other factors influencing the calculation algorithm have to be considered.

The design and implementation of a sophisticated management of properties and calculation modules was therefore another aim of this work. This has been realized with a new software library that was developed in our group as a joint project by T. KLEINÖDER, A. HERWIG, and J. MARUSCZYK, with the support of a number of other people. The core of this library is a new structure representation based on a separation of σ- and π-electrons and the representation of π-electrons by delocalized multi-center π-electron systems [11]. This structure representation is well suited to the requirements of the presented charge calculation scheme. Therefore, prior to the section about the quantification of atomic partial charges, this software library is presented in Section 3, with special emphasis placed on the property management.

Chapter 2

Fundamentals and Methods

2.1 Empirical Approaches to the Prediction of Properties

The basic concepts for the prediction of properties by empirical methods have been outlined in the Introduction. In the present section, a more detailed description will be given for the principles underlying the approaches of linear free–energy relationships (LFER) and quantitative structure–property relationships (QSPR).

2.1.1 Linear Free–Energy Relationships (LFER)

The general LFER approach is based on the assumption that the influence of a structural feature on the free energy change of a chemical process is constant within a congeneric series of compounds [12, 13]. A property Φ (or its logarithm) that depends linearly on a free energy change can then be calculated from the property of the basic element of this series, the so-called parent element, and the constant φ_X for the structural feature X:

$$
\begin{aligned}
\Delta G &= -2.3RT \log \Phi && (2.1)\\
\Delta\Delta G &= \Delta G_{R-X} - \Delta G_{R-H} \\
&= -2.3RT \log \Phi_{R-X} + 2.3RT \log \Phi_{R-H} && (2.2)\\
\log \Phi_{R-X} - \log \Phi_{R-H} &= \frac{1}{-2.3RT}\,\Delta\Delta G = k\,\Delta\Delta G = \varphi_X && (2.3)\\
\log \Phi_{R-X} &= \log \Phi_{R-H} + \varphi_X && (2.4)
\end{aligned}
$$

Here R–H denotes the parent element, R–X the compound carrying the structural feature X, and k = −1/(2.3RT).

The first such relationship was established in 1937 by HAMMETT [7]. He derived his well-known equation from the observation that the electronic influence of a substituent on the acidity constant can be described by a substituent constant, σ_x, characteristic of a substituent, X, and a proportionality constant, ρ, typical of the reaction (Eq. 2.5):

$$\log K_x = \log K_0 + \sigma_x \rho \tag{2.5}$$

with K_x and K_0 the equilibrium constants of the reaction of the substituted and unsubstituted compound, respectively. HAMMETT and others further showed that the substituent constants can be used for a variety of equilibrium constants and also for rate constants. However, the electronic substituent constant, σ, could not be applied to all three positions of the benzene ring in general; distinct scales had to be set up for the meta- and para-positions. For the ortho-position, Eq. 2.5 could not be applied at all. The ortho-position is in the direct environment of the reaction center, and here field and steric effects as well as the polar nature of a substituent play an important role.

In order to quantify effects in aliphatic systems, TAFT proposed another substituent constant, σ* [14]. The Hammett equation also broke down for derivatives with electron-withdrawing groups in the para-position. To handle these types of compounds a σ⁻-scale was introduced [15], and for electron-releasing substituents an analogous σ⁺-scale [16]. These few examples show the basic problem of linear free–energy relationships: they are restricted to distinct systems and often do not capture all the fundamental effects needed for a general treatment.

Nevertheless, the LFER approach has been used not only for the prediction of reactivity but also for physicochemical properties. One of the most prominent examples is the prediction of the n-octanol/water partition coefficient, log P. Again, provided that a congeneric series of compounds is considered, the log P value of a compound, R–X, can be calculated from the basic increment of the parent compound, R–H, and the hydrophobicity constant π for the substituent X [17]. Here, additivity of the π constants can be observed. For instance, the log P of chlorotoluene can be calculated as

$$\log P_{\mathrm{chlorotoluene}} = \log P_{\mathrm{benzene}} + \pi_{\mathrm{CH_3}} + \pi_{\mathrm{Cl}} \tag{2.6}$$

which gives 3.40, in sufficient agreement with the observed value of 3.33. This additivity approach has led to more general fragmentation methods that break down the lipophilicity of a compound into contributions of small molecular fragments or even into atomic contributions [18]. Here is the link between the LFER approach and the more general method of additivity schemes that will be used for the prediction of mean molecular polarizabilities (Section 3.8.2).

Closely related to the LFER method is the traditional approach of quantitative structure–activity relationships (QSAR) that was introduced by HANSCH and co-workers in order to express relationships between structure and biological activity. The most important processes in drug action, distribution and receptor binding, are both driven by the change in ∆G. The first QSAR equation related the log P of a compound to its biological activity,

$$\log(1/C) = a(\log P)^2 + b(\log P) + c \tag{2.7}$$

where C is the molar concentration that produces a certain effect and a, b and c are regression coefficients [19]. The so-called Hansch equation also takes electronic and steric effects into account,

$$\log(1/C) = a(\log P)^2 + b(\log P) + cE_s + d\sigma + e \tag{2.8}$$

with E_s, Taft's steric constant, and σ, the Hammett constant [20]. Over the years many molecular and fragmental descriptors have been used for QSAR studies [21]. All studies of the Hansch-analysis type have in common that they relate biological activity to certain physicochemical effects by multiple linear regression analysis. Furthermore, an equation of this type is again applicable only to a certain system and is not of general use. This lack of generality and transferability has led to the development of a more general methodology that is subsumed under the term QSPR.
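The two substituent-constant relationships of this section (Eq. 2.5 and Eq. 2.6) can be illustrated with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, and the numerical constants (σ values, π values, log P of benzene, and the pKa of 4.20 for benzoic acid) are commonly tabulated literature values that are not taken from this text and should be checked against a primary compilation before any real use.

```python
# Illustrative constants; commonly tabulated literature values, NOT from this text.
HAMMETT_SIGMA = {"p-NO2": 0.78, "m-NO2": 0.71}   # Hammett substituent constants
HANSCH_PI = {"CH3": 0.56, "Cl": 0.71}            # hydrophobicity constants
LOGP_BENZENE = 2.13                              # log P of the parent compound


def hammett_log_k(log_k0: float, sigma: float, rho: float = 1.0) -> float:
    """Eq. 2.5: log K_x = log K_0 + sigma_x * rho."""
    return log_k0 + sigma * rho


def additive_log_p(log_p_parent: float, substituents: list[str]) -> float:
    """Eq. 2.6 in general form: parent log P plus the sum of pi constants."""
    return log_p_parent + sum(HANSCH_PI[s] for s in substituents)


if __name__ == "__main__":
    # Chlorotoluene from benzene, as in Eq. 2.6.
    print(f"log P(chlorotoluene) = {additive_log_p(LOGP_BENZENE, ['CH3', 'Cl']):.2f}")  # 3.40

    # Acidity of p-nitrobenzoic acid; rho = 1 is assumed for the reference
    # reaction (ionization of benzoic acids in water), log K_0 = -pKa = -4.20.
    log_k = hammett_log_k(-4.20, HAMMETT_SIGMA["p-NO2"])
    print(f"pKa(p-nitrobenzoic acid) = {-log_k:.2f}")  # 3.42
```

Both functions are linear in the substituent constants, which is exactly why they can only be expected to hold within the congeneric series for which the constants were derived.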

2.1.2 Quantitative Structure–Property Relationships (QSPR)

Referring back to Eq. 1.1 (p. 2), the properties of a compound can be derived from its struc- ture. If no direct route can be found to express this functionality, which is most often the case, we have to find an indirect way of deriving a structure–property relationship. The general practice in QSPR is illustrated in Fig. 2.1.

[Figure 2.1: diagram with the labels "structure", "property", "structure representation", "structural descriptors" and "model building"]

Fig. 2.1 The QSPR paradigm: a property that cannot be derived directly from the structure can be calculated by building a model based on a specific structure representation.

In a more detailed outline, a QSPR study consists of the following steps [22, 23]

1. compiling a training and test dataset from compounds with known properties

2. compiling an initial set of molecular descriptors

3. calculating the descriptor set for the entire dataset

4. performing a statistical analysis of the descriptor set in order to find statistically significant descriptors

5. building a mathematical relationship between the descriptor set and the target property

6. testing the predictive ability with the test set

The prerequisite for generating a prediction model is a set of molecules for which the property of interest is known. The size of such datasets usually ranges from hundreds to several thousands of molecules. When compiling the training dataset it is important to cover a broad space of chemical functionality. Because the QSPR approach follows the route of inductive learning, predictions with a QSPR model can only be made for compounds with structural features the model has "seen" in the training process. Most statistical model building tools give reliable predictions only for interpolation; therefore, the diversity of the training dataset defines the chemical space a prediction model can be applied to. In general, extrapolation with a prediction model should be used with care, and a tool applied in a study should issue a warning when it has to extrapolate.
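The workflow listed above, together with the requirement that a tool should warn when it extrapolates, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in this work: the dataset is synthetic, the model is a plain multiple linear regression solved via the normal equations, step 4 (statistical descriptor selection) is omitted, and "extrapolation" is approximated by a descriptor value lying outside the range seen in training. All names (`LinearQSPR`, `make_synthetic_dataset`, ...) are hypothetical.

```python
import random
import warnings


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class LinearQSPR:
    """Multiple linear regression that remembers its training descriptor ranges."""

    def fit(self, X, y):
        rows = [list(x) + [1.0] for x in X]  # append intercept column
        k = len(rows[0])
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
        aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
        self.coef = solve_linear_system(ata, aty)  # normal equations
        self.lo = [min(col) for col in zip(*X)]
        self.hi = [max(col) for col in zip(*X)]
        return self

    def predict(self, x):
        if any(v < lo or v > hi for v, lo, hi in zip(x, self.lo, self.hi)):
            warnings.warn("descriptor outside training range: extrapolating",
                          stacklevel=2)
        return sum(c * v for c, v in zip(self.coef, list(x) + [1.0]))


def rmse(model, X, y):
    """Root mean square error of the model on a dataset (step 6)."""
    return (sum((model.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)) ** 0.5


def make_synthetic_dataset(n=40, seed=1):
    """Synthetic stand-in for steps 1 to 3 (compounds, descriptors, property)."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1.5 * a - 2.0 * b + 0.5 * c + 0.3 for a, b, c in X]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_dataset()
    X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]
    model = LinearQSPR().fit(X_train, y_train)      # step 5
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")             # keep the test report quiet
        print("test RMSE:", rmse(model, X_test, y_test))
    print("extrapolated:", model.predict([2.0, 0.5, 0.5]))  # emits a UserWarning
```

A per-descriptor range check is the crudest possible definition of the chemical space covered by the training set; it is used here only because it makes the interpolation/extrapolation distinction explicit in a few lines.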


At the beginning of a QSPR study an initial guess has to be made about the molecular descriptors to be used; this set will be refined and optimized throughout the study. Basically, two approaches are used for defining the initial descriptor set: (i) the descriptors should contain as much information as possible that might have an influence on the process studied, or (ii) the initial guess should be based on a working hypothesis about the system of interest and kept as focused as possible. The latter approach tries to avoid chance correlation but might neglect important features if the working hypothesis turns out to be wrong, while with the former approach it is more demanding to identify the relevant features of a descriptor.

2.2 Structure Descriptors

In the majority of cases, chemical structures are stored electronically as connection tables (see Section 3.4.1). The task of building a mathematical model between structure and property requires the transformation of the information stored in a connection table into a numerical representation of a compound. This representation must encode the structural features that are related to the property of interest and, on the other hand, should not contain irrelevant information.

Descriptors are most often real numbers, but they can also be boolean values or integers; in any case they must have a data type that is convertible to real. While most descriptors are scalar values, others are vectors and a few have a matrix layout. In general, all descriptors can be represented as vectors: scalars as vectors of size one and matrices as vectors of size n × m. This is important because a descriptor set used in a study is normally a combination of several descriptors that are concatenated into one single vector.

The connection table contains information on the connectivity of the atoms, bond orders and the number of free electrons, which gives only a rough picture of a structure. A more detailed view requires a model of the arrangement of the atoms in space, the 3D coordinates. Based on a 3D molecular model, a further refinement is obtained by dealing with the molecular surface that determines the interaction with other molecules and systems. The most sophisticated representation of a molecule, which implicitly contains connectivity, 3D coordinates and surface, is the electronic wavefunction.

Structure descriptors have been developed from all levels of the aforementioned structure representations. Tab. 2.1 gives an overview of the most important descriptors. Descriptors derived from the wavefunction are also included here even though they cannot be calculated by empirical methods and, strictly speaking, fall outside the context of this chapter. Nevertheless, quantum mechanics is used here not in the classical way, i.e., by applying an operator in order to calculate a property from the wavefunction. Instead, molecular descriptors are derived from the wavefunction as a kind of structure representation, and these are used in QSPR like any other kind of descriptor [24]. An extensive review of the available descriptors was given by TODESCHINI [25].

Table 2.1 Common structural descriptors used in QSPR, derived from various levels of structure representation. (Abbr.: AC = autocorrelation, RDF = radial distribution function, CICC/CDCC = conformation-independent/-dependent chirality code, (C)PSA = (charged) polar surface area, SASA = solvent accessible surface area, µ = dipole moment, α = polarizability, HOMO/LUMO = highest occupied molecular orbital/lowest unoccupied molecular orbital)

structural information    scalar descriptors                            vector descriptors

connection table          mol. weight, no. of atom types,               structural keys, fingerprints,
                          no. of funct. groups, topological indices     top. AC
3D structure              principal moments of inertia                  3D AC, RDF, CICC/CDCC
molecular surface         mol. volume, PSA, CPSA, SASA                  AC of surface properties
molecular wavefunction    µ, α, HOMO/LUMO, ∆Hf                          $\vec{\mu}$, $\vec{\alpha}$, quadrupole moment tensor
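As noted earlier in this section, every descriptor can be treated as a vector (a scalar as a vector of size one, an n × m matrix as a vector of size n·m), so that a descriptor set becomes one concatenated vector. A minimal sketch of this convention is given below. The function names, the descriptor names and all numerical values are hypothetical and serve only as an illustration; row-major flattening of matrices is an assumption, since the text does not prescribe an element order.

```python
def flatten_descriptor(value) -> list[float]:
    """Return any descriptor (scalar, vector or matrix) as a flat list of floats.

    Booleans and integers are converted to real numbers; nested sequences
    (matrices) are flattened row by row.
    """
    if isinstance(value, (bool, int, float)):
        return [float(value)]
    flat = []
    for item in value:
        flat.extend(flatten_descriptor(item))
    return flat


def build_feature_vector(descriptors: dict) -> list[float]:
    """Concatenate all descriptors, in insertion order, into one vector."""
    vector = []
    for name in descriptors:
        vector.extend(flatten_descriptor(descriptors[name]))
    return vector


if __name__ == "__main__":
    # Hypothetical descriptor set for one compound.
    descriptors = {
        "mol_weight": 46.07,             # scalar, real
        "has_ring": False,               # scalar, boolean
        "n_oxygen": 1,                   # scalar, integer
        "top_ac": [6.0, 5.0, 2.0],       # vector
        "toy_matrix": [[1, 2], [3, 4]],  # 2 x 2 matrix -> vector of size 4
    }
    print(build_feature_vector(descriptors))
    # [46.07, 0.0, 1.0, 6.0, 5.0, 2.0, 1.0, 2.0, 3.0, 4.0]
```

Note that the resulting vector no longer records which element came from which descriptor; keeping that information alongside the numbers is a separate bookkeeping task.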

In our group a variety of structure descriptors have been developed. They are organized in a hierarchical manner based on either the topology, the three-dimensional structure or the molecular surface [26]. These descriptors have been applied successfully for the development of QSPR models and are therefore briefly discussed. The basic functional form of the calculated descriptors is given by

$$A(d) = \sum_{i=1}^{n} \sum_{j=i}^{n} p_i \cdot p_j \cdot \delta(d_{ij}, d) \tag{2.9}$$

A(d), the autocorrelation function, was introduced by MOREAU and BROTO [27]. The summation runs over all atoms in a molecule and accumulates the product of a given property, p, of atoms i and j for a distinct distance, d. The distance can be defined as the shortest topological path, the 3D distance, or the distance between two points on the molecular surface, according to the level of structural information applied. For topological autocorrelation, δ takes a value of 1 if the shortest path between atoms i and j is equal to d. For three-dimensional and spatial autocorrelation, distance intervals, d, have to be defined for which δ takes a value of 1 if d_ij falls into the given interval. This is done by defining the minimal and maximal distance to be sampled and dividing this range into equidistant intervals. In all cases, δ takes a value of 0 if the distance condition is not fulfilled.

Fig. 2.2 gives an overview of the generation of structural descriptors on the different levels of sophistication. A set of programs is applied; these are given in gray capital letters.

Fig. 2.2 Scheme of molecular descriptors based on either the topology, the three-dimensional structure or the molecular surface. With increasing degree of sophistication the computational costs rise in parallel.
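For the topological level, Eq. 2.9 can be written down directly. The following sketch is not the implementation used in the programs named in this section; it assumes that the molecule is given as a plain adjacency list and that the atomic property is already available as a list of numbers. Shortest topological paths are obtained by breadth-first search, and the inner sum runs over j ≥ i as in Eq. 2.9, so that A(0) collects the squared atomic properties.

```python
from collections import deque


def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) via breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def autocorrelation(adjacency, prop, max_distance):
    """Topological autocorrelation vector [A(0), A(1), ..., A(max_distance)]."""
    dist = topological_distances(adjacency)
    n = len(prop)
    result = [0.0] * (max_distance + 1)
    for i in range(n):
        for j in range(i, n):  # j >= i, as in Eq. 2.9
            d = dist[i][j]
            # delta(d_ij, d) = 1 only for the bin that equals the path length
            if d is not None and d <= max_distance:
                result[d] += prop[i] * prop[j]
    return result


if __name__ == "__main__":
    # Hypothetical three-atom chain 0-1-2 with arbitrary property values.
    chain = [[1], [0, 2], [1]]
    p = [1.0, 2.0, 3.0]
    print(autocorrelation(chain, p, max_distance=2))  # [14.0, 8.0, 3.0]
```

For 3D or surface autocorrelation the same double loop applies; only the test `d == path length` would be replaced by a test for membership of d_ij in one of the equidistant distance intervals.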

The properties to be autocorrelated are of central interest. The program PETRA [28] is a collection of calculation methods for the prediction of fundamental physicochemical properties, such as the charge distribution, polarizabilities or resonance effects [29]. Most important is the charge distribution. A thorough study of calculation methods for the quantification of atomic partial charges will therefore be one of the central concerns of the present work, and the development of a modified and newly parametrized calculation scheme based on the methods that were available in PETRA will be presented (see Section 4).

For the conversion of the constitution to a 3D structure [30] the program CORINA [31] is available. The step to the generation of molecular surfaces is achieved by the program SURFACE [32]. The functionality of generating autocorrelation vectors from the surface is also incorporated in SURFACE. For the levels of topology and 3D structure, autocorrelation vectors are generated by the program AUTOCORR [33].

Other, similar descriptors that are not mentioned here are described in [26]. Rather, the example shown in Fig. 2.2 should draw attention to a technical problem that is the second main issue of the present work. As has been seen, a variety of programs is required to generate a set of descriptors that are quite similar in nature. All descriptors can incorporate different properties and must be distinguished. The resulting descriptor is normally a vector of real numbers and does not carry any information about the way it was calculated. The information on the details of the calculation workflow, the so-called metadata attached to the descriptors, is therefore of special importance. Within the scope of the present work, a programming library was developed with the focus on solving the problems caused by such workflows and their related information (see Section 3). It should assist in developing new molecular descriptors and in the task of property prediction as described in the previous sections.

2.3 Multivariate Data Analysis

Within the QSPR approach, compounds are represented by numerical structure descriptors, and a dataset of compounds thus becomes a multivariate data matrix. This matrix has to be analyzed in order to find relationships either among the compounds in the dataset (classification) or to the property of interest (QSPR).

2.3.1 Multivariate Data

It is hard to find cases where systems or processes can be described by only one or a few variables. In real-life applications, a vector of descriptor variables is required in order to encode the relevant information related to the problem at hand. This data vector is called a feature vector and is often of high dimensionality, from a few to several hundred elements or features. In the context of neural networks, feature vectors are often referred to as patterns. From a statistical point of view a feature vector defines the independent variables, x. Objects that are representative of the system or process of interest are encoded by such feature vectors. For a dataset of objects, a matrix, X, of size n × m is obtained, with n objects and m features. To each object, data might be attached that define one or more dependent variables. These data are also called target or response data and might be a membership of a certain class, a measured or calculated property, or a set of values such as a spectrum. For a dataset of n objects this results in a vector, y, of size n, or a matrix, Y, of size n × p, with p the number of response data (Fig. 2.3).

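$$
\underbrace{\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1m}\\
x_{21} & x_{22} & \cdots & x_{2m}\\
\vdots & \vdots & \ddots & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{pmatrix}}_{\text{feature matrix } (X)}
\qquad
\underbrace{\begin{pmatrix}
y_{1}\\ y_{2}\\ \vdots\\ y_{n}
\end{pmatrix}}_{\text{property vector } (y)}
\quad\text{or}\quad
\underbrace{\begin{pmatrix}
y_{11} & y_{12} & \cdots & y_{1p}\\
y_{21} & y_{22} & \cdots & y_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
y_{n1} & y_{n2} & \cdots & y_{np}
\end{pmatrix}}_{\text{property matrix } (Y)}
$$

(Each row corresponds to one object.)

Fig. 2.3 A dataset of n objects is described by a data matrix consisting of n feature vectors of size m. Attached to each object might be one or more properties.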

In the context of property prediction, a feature vector most often consists of a set of structure descriptors that represent the chemical structure (see Section 2.2), and the response data is the property to be predicted. Throughout the present work, feature vector, structure descriptor, and object pattern are used synonymously, and the response data is referred to as target property.

2.3.2 Feature Selection and Transformation

In the perfect case, each feature of an object pattern encodes an important aspect of the system to be investigated and is not related to any other feature. This is hardly ever the case, and in real applications one has to deal with redundancy and noise in the feature space. Redundancy means that two or more features are related to the same information, and noise means that a feature either contains no relevant information at all or encodes significant information that is unrelated to the problem at hand. Therefore, after having set up the


initial vector of descriptor variables and prior to the model building step, this vector has to be analyzed and several features may have to be removed or transformed. If two features are highly correlated, one of them has to be removed in order to reduce redundancy. The decision which of the two should be removed can be based on statistical measures. Either the variance is taken as the criterion, assuming that a higher variance indicates a higher information content, or the intercorrelation with the remaining features serves as the criterion: the feature having the higher correlation with the remaining features is considered to be less important and is removed [34].

Instead of removing complete column vectors, methods can be employed for transforming the set of m features into a reduced set of m′ < m variables that are less intercorrelated and have an enriched information density. These new variables are called latent variables. The most frequently used methods for extracting latent variables are principal component analysis (PCA) and partial least squares projection (PLS, also called projection to latent structures) [35].
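The variance criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of [34]: the function names, the correlation threshold of 0.9, and the example data are assumptions, and the features are assumed to be on comparable scales so that their variances can be compared.

```python
from statistics import mean, pvariance


def pearson(u, v):
    """Pearson correlation coefficient of two equally long value lists."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den if den else 0.0


def drop_correlated(columns, threshold=0.9):
    """Remove one feature of every highly correlated pair.

    columns maps a feature name to its column of values. Of two features
    with |r| > threshold (hypothetical cut-off), the one with the lower
    variance is dropped. Returns the names of the features that are kept.
    """
    kept = list(columns)
    changed = True
    while changed:
        changed = False
        for i, a in enumerate(kept):
            for b in kept[i + 1:]:
                if abs(pearson(columns[a], columns[b])) > threshold:
                    loser = a if pvariance(columns[a]) < pvariance(columns[b]) else b
                    kept.remove(loser)
                    changed = True
                    break
            if changed:
                break
    return kept


# Hypothetical descriptor columns for five compounds.
features = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],  # roughly 2 * x1, i.e. redundant
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(features)
```

In this example x1 and x2 are almost perfectly correlated, so the one with the lower variance (x1) is removed, while x3 is retained.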

2.4 Learning Methods

Learning methods are applied to find and establish relationships and orders that are implicitly contained in the data at hand. There are basically two approaches, which differ in the way the response data are handled. Learning from the feature matrix only, without taking the response data into account, is called unsupervised learning. This is useful for finding relationships among the objects of a dataset and is closely related to the definition of similarity of objects in the feature space. Typical methods used here are Kohonen self-organizing maps and hierarchical clustering. Supervised learning methods, on the other hand, aim to establish a relationship between the feature matrix and the response data, i.e., the learning process is guided by the ability of a model to reproduce the response data correctly. The error between the actual and the predicted y-values influences the learning process. Typical methods used here are multiple linear regression analysis and feed-forward neural networks.


2.4.1 Unsupervised Learning

Cluster Analysis Hierarchical cluster analysis is an iterative procedure in which, in each step, the two most similar objects, i.e., the two objects having the smallest Euclidean distance, are merged into a new cluster. The new cluster is represented by its centroid, which takes part in the subsequent steps like an object: in each following step, the two most similar items (objects or cluster centroids) are again merged to build a new cluster. This is continued until all objects are merged into one single cluster. The resulting hierarchically organized clusters are visualized by dendrograms that allow one to recognize the inherent organization of the data.
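The merging procedure can be sketched as follows. This is a deliberately simple (and slow) illustration in which clusters are compared by the Euclidean distance of their centroids; the function names and the four example objects are assumptions.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equally long coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def agglomerate(objects):
    """Merge the two closest clusters until a single cluster remains.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram displays.
    """
    clusters = [[i] for i in range(len(objects))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([objects[i] for i in clusters[a]])
                cb = centroid([objects[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history


# Four hypothetical objects in a two-dimensional feature space.
objects = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = agglomerate(objects)
```

For these objects, the two close pairs are merged first, and the two resulting clusters are joined in the last step at a much larger centroid distance.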

Kohonen Neural Networks Kohonen neural networks [36, 37] are used to project objects represented by high-dimensional feature vectors into a plane. This plane is called a Kohonen map, and because of the algorithm's unsupervised nature, Kohonen neural networks are also known as self-organizing maps (SOM). Kohonen networks are arranged as a two-dimensional grid of s × t neurons, where each neuron consists of a vector of weights of size m, matching the dimension of the feature vectors. These weights are adapted to the feature vectors throughout an iterative training process in order to memorize the input objects.
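The text above does not specify how the weights are adapted. The following sketch uses one common variant as an assumption: the neuron closest to the presented object (the winner) and its grid neighbors are moved towards the object, with a Gaussian neighborhood function and a learning rate and radius that shrink linearly over the epochs. All names and parameter values are hypothetical.

```python
import math
import random


def train_som(data, rows, cols, epochs=50, eta0=0.5, seed=0):
    """Train a rows x cols Kohonen map on feature vectors scaled to [0, 1]."""
    rng = random.Random(seed)
    m = len(data[0])
    # One weight vector of size m per neuron, addressed by its grid position.
    weights = {(r, c): [rng.random() for _ in range(m)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        eta = eta0 * frac                    # shrinking learning rate
        radius = max(radius0 * frac, 0.5)    # shrinking neighborhood
        for x in data:
            winner = min(weights, key=lambda pos: math.dist(weights[pos], x))
            for pos, w in weights.items():
                grid_d = math.dist(pos, winner)
                if grid_d <= radius:
                    h = math.exp(-grid_d ** 2 / (2 * radius ** 2))
                    for k in range(m):
                        w[k] += eta * h * (x[k] - w[k])
    return weights


def winning_neuron(weights, x):
    """Grid position of the neuron whose weights are closest to x."""
    return min(weights, key=lambda pos: math.dist(weights[pos], x))


# Four hypothetical objects described by two features each.
data = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.85), (0.8, 0.9)]
som = train_som(data, rows=3, cols=3)
positions = [winning_neuron(som, x) for x in data]
```

After training, `positions` gives the map coordinates onto which each object is projected; similar objects are expected to end up in the same or in neighboring neurons.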

2.4.2 Supervised Learning

Supervised learning methods are applied to establish mathematical models between the feature matrix and the response data. The aim is to be able to reproduce an object's target property from its corresponding feature vector with a minimum error. Minimizing the prediction error is therefore the training criterion for supervised learning algorithms.

Multiple Linear Regression Analysis The most important method that can be applied to linear problems is multiple linear regression analysis (MLRA). This method has traditionally been used for deriving regression equations in classical LFER and QSAR (see Section 2.1.1) and is applied for the generation of additivity models (see Section 3.8.2). The basic principle of MLRA is to express the target property, y, as a linear combination of the single features

$$ y^{*} = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_m x_m \tag{2.10} $$


The linear coefficients, c, have to be determined such that the sum of the squared errors between the predicted and the actual values of y, $\sum_{i=1}^{n} (y_i^{*} - y_i)^2$, is minimized. In theory, the coefficients are easily obtained by matrix algebra by applying Eq. 2.11

$$ c_{MLRA} = (X^{T} X)^{-1} X^{T} y \tag{2.11} $$

In real applications, however, the feature matrix often contains highly correlated columns. Such matrices are said to be singular: the product $X^{T}X$ cannot be inverted reliably, so that Eq. 2.11 cannot be applied. Here, this is not a problem of experimental design as described in Section 2.3.2 but a numerical one. To overcome this problem, the feature matrix can be transformed into latent variables using PCA or PLS (see Section 2.3.2). Singularity cannot occur then because latent variables are orthogonal by definition. Using MLRA in combination with PCA and PLS projection is referred to as principal component regression (PCR) and PLS regression, respectively. Alternatively, the problem of singularity can be tackled by splitting the training dataset in order to avoid highly correlated columns and by conducting the model building stepwise, deriving the coefficients for the current dataset and using them as constants in the next step.
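As an illustration of Eq. 2.11, the following sketch determines the coefficients by solving the normal equations (X^T X) c = X^T y with Gauss-Jordan elimination, which is equivalent to applying the inverse explicitly. It is a minimal example with hypothetical helper names and synthetic data; a column of ones is prepended to X so that the intercept c0 of Eq. 2.10 is obtained as well. The solver raises an error when X^T X is singular, which is the situation described above for linearly dependent columns.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a c = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_mlra(features, y):
    """Return [c0, c1, ..., cm] minimizing the sum of squared errors."""
    X = [[1.0] + list(row) for row in features]  # leading 1 yields c0
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xv * yv for xv, yv in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Synthetic data generated from y = 1 + 2*x1 - 0.5*x2 (hypothetical model).
features = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
y = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in features]
coeffs = fit_mlra(features, y)
```

Because the synthetic target is an exact linear combination of the two features, the fit recovers the generating coefficients up to rounding errors.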

Back-propagation Neural Networks Feed-forward neural networks with back-propagation of errors, usually referred to as back-propagation neural networks (BPG-NN) for short, are the method of choice for building models when a non-linear relationship is assumed [37, 38]. In analogy to biological neural networks, the central unit in feed-forward networks is a neuron that receives signals from other neurons, transforms the input signals into an output signal, and passes this on to other neurons. The neurons are organized in layers, and each layer's neurons are linked to the neurons of the next layer. Typically, a feed-forward network consists of three layers: the input layer, one hidden layer, and an output layer. Figure 2.4 schematically illustrates such a network. The sizes of the input and the output layer are determined by the size of the object pattern (i.e., the feature vector) and the response data, respectively. The number of neurons in the hidden layer can be varied and has a strong impact on the performance of a network. The number of neurons in each layer and the way they are connected define the architecture of a network. It is written as X-Y-Z (e.g., 5-3-1 for a network consisting of five input neurons, three hidden neurons, and one output neuron). Neurons act differently depending on the layer they belong to. Input neurons just pass the value of the corresponding element of the object pattern unchanged to the


[Figure 2.4: schematic of a feed-forward network. Upper part (architecture): object pattern (x1, x2, ..., xn), input layer (neurons i), hidden layer (neurons j), output layer (neuron k), target property (y). Lower part (training): 1. signal transduction with the weights Wij(t) and Wjk(t), resulting in the error; 2. weight correction by ΔWij and ΔWjk, giving Wij(t+1) and Wjk(t+1).]

Fig. 2.4 Architecture (upper part) of a feed-forward neural network and training by back-propagation of errors (lower part).

neurons of the hidden layer. Hidden and output neurons collect the incoming signals and generate a new output as described in the following paragraph.

A connection from one neuron to another has a certain strength that is expressed by a numerical weight. Two layers with n neurons in layer i and m neurons in layer j that are all connected to each other thus result in an n × m matrix of weights, $W_{ij}$. The net signal that is received by a neuron j is the sum of the weighted output signals coming from the neurons i ($i_1, i_2, \dots, i_n$) of the previous layer

$$ Net_j = \sum_{i=1}^{n} out_i \, w_{ij} + w_{bias,j} \tag{2.12} $$

The additional term $w_{bias}$ is the weight for the connection to the so-called bias neuron (depicted as black squares in Fig. 2.4). A bias neuron does not receive any input signals and always transmits a signal of 1. This bias is added to introduce an offset to the internal coordinate system of the model space.

The ability of neural networks to handle non-linear problems is introduced by transforming the net input signal of a neuron into its output signal by applying a sigmoidal transfer function

$$ out_j = \frac{1}{1 + e^{-\alpha Net_j}} \tag{2.13} $$


with α, a scaling parameter for adapting the response range to the incoming net signal.

A training cycle of a feed-forward neural network consists of two steps (see lower part of Fig. 2.4). First, one object pattern is presented to the network, i.e., the feature vector's elements are fed into the corresponding input neurons. The signal is propagated layer by layer and finally results in the output signal of the output neuron. This output value represents the predicted target property and results from the weights of all connections in the network. In order to minimize the prediction error, the optimal set of weights has to be found. Therefore, the weights are corrected in the second step. The most widely used algorithm for correcting the weights is the back-propagation of errors. It starts with the error of the output layer, i.e., the prediction error. This error is used to calculate the correction of the weights from the hidden to the output layer. Because the error for the hidden layer cannot be determined directly, the correction of the output layer is used to calculate the corrections for the previous layer. Thus, the error is propagated backwards layer by layer.

The generalized form of correcting a weight, wij, for a connection between neuron j of the current layer and neuron i of the previous layer is given by

$$ \Delta w_{ij}(t) = \eta \, \delta_j \, out_i + \mu \, \Delta w_{ij}(t-1) \tag{2.14} $$

The first term on the right-hand side of Eq. 2.14 gives the actual correction. $\delta_j$ is a function of the correction factors obtained for the following layer. The factor η is called the learning rate constant and is used to damp the correction in order to prevent premature convergence to a non-optimal solution. The factor µ of the second term is called the momentum constant, and $\Delta w_{ij}(t-1)$ is the weight correction of the previous training cycle. This momentum term ensures that the tendency of the weight correction is retained in the next training cycle(s), which prevents oscillation and helps to get past situations in which the corrections become very small. Details about the equations used and the derivation of the back-propagation algorithm are given in [37].

One training epoch is completed after each object of the dataset has been presented to the network once. After each training epoch, the progress of the learning process is evaluated by calculating the root mean-square error (RMSE) for the entire dataset. The decision when to stop the training process has an important impact on the predictive power of a neural network. The training and the quality of the resulting network depend on the architecture of the network, i.e., the number of weights, and on the values chosen for the parameters α, η, and µ. The ratio between the number of objects in the training dataset and the overall number of adjustable parameters, denoted by ρ, should be kept higher than 2.0, i.e., the number of training data should be at least twice the number of adjustable parameters [39].
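Equations 2.12 to 2.14 can be put together in a small sketch of a network with one hidden layer and one output neuron. The correction factors δ are not given explicitly in the text; the sketch assumes the usual delta rule for a squared-error criterion and a sigmoidal transfer function (see [37] for the derivation). Class name, parameter values, and the synthetic data are hypothetical.

```python
import math
import random


class Network:
    """Feed-forward network (n_in - n_hid - 1) trained by back-propagation."""

    def __init__(self, n_in, n_hid, alpha=1.0, eta=0.5, mu=0.5, seed=1):
        rng = random.Random(seed)
        self.alpha, self.eta, self.mu = alpha, eta, mu
        # The last row / element holds the weights of the bias neuron.
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)]
                     for _ in range(n_in + 1)]
        self.w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
        self.dw_ih = [[0.0] * n_hid for _ in range(n_in + 1)]
        self.dw_ho = [0.0] * (n_hid + 1)

    def _sigmoid(self, net):
        return 1.0 / (1.0 + math.exp(-self.alpha * net))        # Eq. 2.13

    def forward(self, x):
        inp = list(x) + [1.0]                    # bias neuron transmits 1
        n_hid = len(self.w_ho) - 1
        hid = [self._sigmoid(sum(inp[i] * self.w_ih[i][j]       # Eq. 2.12
                                 for i in range(len(inp))))
               for j in range(n_hid)] + [1.0]
        out = self._sigmoid(sum(h * w for h, w in zip(hid, self.w_ho)))
        return inp, hid, out

    def train_pattern(self, x, y):
        inp, hid, out = self.forward(x)
        # Assumed delta rule: error times derivative of the sigmoid.
        delta_out = (y - out) * self.alpha * out * (1.0 - out)
        delta_hid = [self.alpha * hid[j] * (1.0 - hid[j]) * delta_out * self.w_ho[j]
                     for j in range(len(hid) - 1)]
        for j in range(len(hid)):                # hidden -> output, Eq. 2.14
            self.dw_ho[j] = self.eta * delta_out * hid[j] + self.mu * self.dw_ho[j]
            self.w_ho[j] += self.dw_ho[j]
        for i in range(len(inp)):                # input -> hidden, Eq. 2.14
            for j in range(len(delta_hid)):
                self.dw_ih[i][j] = (self.eta * delta_hid[j] * inp[i]
                                    + self.mu * self.dw_ih[i][j])
                self.w_ih[i][j] += self.dw_ih[i][j]

    def rmse(self, data):
        return math.sqrt(sum((y - self.forward(x)[2]) ** 2 for x, y in data)
                         / len(data))


# Synthetic training data: one feature, target y = 0.2 + 0.6 * x.
data = [((k / 20,), 0.2 + 0.6 * k / 20) for k in range(21)]
net = Network(n_in=1, n_hid=2)
n_weights = sum(len(row) for row in net.w_ih) + len(net.w_ho)
rho = len(data) / n_weights                      # objects per adjustable weight
rmse_before = net.rmse(data)
for epoch in range(500):                         # one epoch = all objects once
    for x, y in data:
        net.train_pattern(x, y)
rmse_after = net.rmse(data)
```

The 1-2-1 network has seven adjustable weights (including the bias weights), so the 21 training objects give ρ = 3.0. Comparing `rmse_before` and `rmse_after` shows the progress of the learning process over the epochs.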

2.5 Model Building

Model building deals with the development of mathematical models that relate the optimized set of descriptors to the target property using statistical or neural network methods.

The issue of dividing a dataset into training and test data is crucial to the reliability and robustness of a prediction model. Both portions should represent the chemical space of the entire dataset. If one of them is biased, either the model will not generalize all structural features or the validation will probably not find deficiencies in the predictive power of the model for some structural features. Instead of splitting datasets by hand, algorithms are used that are based on random selection or on the distribution in Kohonen maps (e.g., [40]), or special algorithms that have been developed for this task are applied (e.g., [41]). It is common practice to assign about two thirds of the compounds to the training dataset and to reserve one third for testing.

If the learning method applies an iterative algorithm, the training dataset is split further in order to obtain a so-called validation dataset. After each iteration step, the validation dataset is used to judge the predictive power of the current model and to decide when to stop the learning process. Because the validation data are presented to the learning method in the training phase, the validation dataset belongs to the training data even though it is not used directly for the training, i.e., the parameters of a learning algorithm are not directly adapted to fit the validation data.

In order to test the predictive power of a model, cross-validation techniques are applied. In k-fold cross-validation the training set is split into k subsets. Then k − 1 subsets are used as a training set and one subset as test set. This procedure is repeated k times, so that each subset serves as the test set exactly once. As a prediction is then available for each compound, cross-validated errors of the predictions can be calculated. Values of k usually range from 5 to 10. If k equals N, the number of cases in the training set, the procedure is called leave-one-out cross-validation.
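The k-fold procedure can be sketched as follows. The function names and the `fit`/`predict` callables are hypothetical placeholders for an arbitrary learning method; as a trivial stand-in, the example uses a "model" that always predicts the mean of its training targets.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(x, y, k, fit, predict):
    """Return the cross-validated predictions and their RMSE."""
    predictions = [None] * len(x)
    for train, test in k_fold_indices(len(x), k):
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = predict(model, x[i])
    rmse = (sum((p - t) ** 2 for p, t in zip(predictions, y)) / len(y)) ** 0.5
    return predictions, rmse


def fit_mean(xs, ys):
    """Stand-in learning method: remember the mean training target."""
    return sum(ys) / len(ys)


def predict_mean(model, xi):
    return model


x = [[float(i)] for i in range(6)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# k equal to the number of cases gives leave-one-out cross-validation.
loo_predictions, loo_rmse = cross_validate(x, y, k=len(x), fit=fit_mean,
                                           predict=predict_mean)
```

Each prediction is made by a model that has never seen the corresponding object, which is what distinguishes the cross-validated error from the error of fit.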

Linear Regression The quality of a model is usually quantified by statistical measures. For each experimental value in the dataset a predicted value can be obtained with the model, resulting in two data vectors for which linear regression can be applied. Linear regression


measures the correlation of two data vectors, x and y. The x-variable usually refers to the experimental value and the y-variable to the predicted value. A model can be visualized by scatter plots showing the predicted values versus the experimental values. By minimizing the sum of the squared errors between x- and y-values (method of least squares), a linear regression equation can be obtained

$$ y = b \cdot x + a \tag{2.15} $$

The slope, b, and the y-axis intercept, a, are important indicators for the quality of a model and should not deviate significantly from b = 1 and a = 0.

From the vectors x and y two important statistical measures can be calculated, the correlation coefficient r (Eq. 2.16) and the standard deviation of errors σ (Eq. 2.17).^a

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left( \sum_{i=1}^{n} (x_i - \bar{x})^2 \right) \left( \sum_{i=1}^{n} (y_i - \bar{y})^2 \right)}} \tag{2.16} $$

$$ \sigma = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (y_i - x_i)^2} \tag{2.17} $$

For the correlation coefficient, a value reaching r = 1.0 indicates a perfect correlation. In most studies, the square of the correlation coefficient, r², is given, which emphasizes the quality of good models that approach an optimal correlation.

In the present work, derived models are visualized by scatter plots containing the statistical measures for the corresponding correlation and the correlation equation inside the plot. Furthermore, the number of data points, n, shown in the plot is given. The correlation equation is visualized by a solid line and, for comparison with a potential perfect correlation, y = x is given as a dashed line. For illustration, Fig. 2.5 gives an example scatter plot.

^a σ is also termed standard deviation of residuals or, briefly, standard deviation.

[Figure 2.5: scatter plot of the predicted property (y-axis, 0.4 to 1.4) versus the experimental property (x-axis, 0.4 to 1.4); annotation inside the plot: y = 0.96 * x + 0.043, n = 6, r² = 0.985, σ = 0.051]

Fig. 2.5 Scatter plot for visualizing a linear regression between a predicted (y-axis) and an experimental property (x-axis). The solid line corresponds to the regression line. The dashed line indicates a perfect correlation, y = x. The statistical measures given in the plot are explained in text.
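The statistical measures shown in such a plot follow directly from Eqs. 2.15 to 2.17. The sketch below is a minimal illustration with a hypothetical function name and made-up data; note that σ is computed as written in Eq. 2.17, i.e., from the deviations between predicted and experimental values, not from the deviations from the regression line.

```python
import math


def regression_statistics(x, y):
    """Slope b, intercept a, r, r^2 and sigma for experimental x, predicted y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                                   # least-squares slope
    a = my - b * mx                                 # y-axis intercept, Eq. 2.15
    r = sxy / math.sqrt(sxx * syy)                  # Eq. 2.16
    sigma = math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))  # Eq. 2.17
    return {"b": b, "a": a, "r": r, "r2": r * r, "sigma": sigma, "n": n}


# Hypothetical experimental (x) and predicted (y) property values.
experimental = [0.50, 0.70, 0.90, 1.10, 1.30]
predicted = [0.52, 0.69, 0.93, 1.08, 1.31]
stats = regression_statistics(experimental, predicted)
```

For a good model, `stats["b"]` is close to 1, `stats["a"]` close to 0, and `stats["r2"]` close to 1, as discussed above.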

2.6 Parametrization

Often, methods for the calculation of properties require a set of adjustable parameters that were derived from an underlying theory or from given assumptions. It is expected that a set of parameters can be found that gives the best agreement between the predicted and the observed property.

The task of finding the optimal parameter set for a calculation scheme thus amounts to locating the global minimum in the multidimensional space spanned by the parameters with respect to a distinct quality measure. Within the application of QSPR, this quality measure is basically given by the criterion of an optimal fit of the calculated property to the target property. If the dependence of the property of interest on the parameters can be expressed in a well-defined analytical form, the minimum can be found by means of the first and second partial derivatives of the analytical expression with respect to the parameters. If such an expression is not available, stochastic algorithms have to be applied that optimize the parameter set with respect to the given quality measure. In terms of stochastic optimization, the quality measure is often referred to as the cost, fitness, or energy function.

The most important algorithms for finding the global optimum are genetic algorithms


(GA) or, more generally, evolutionary algorithms (EA) [42] and simulated annealing (SA). The former two strategies are not discussed here. Rather, the principles of simulated annealing are introduced, as this technique played an important role in the parametrization of the charge calculation methods described in Section 4.

Simulated Annealing This method is based on the annealing process in the physics of solids. Annealing denotes a physical process in which a solid is first heated to a high temperature and then slowly cooled down to a low temperature. The high annealing temperature provides the particles of the solid with a high mobility. Consequently, the particles of the solid can arrange themselves such that the system has minimal binding energy. The process of cooling down simulates the process of following the path to the energy minimum.

The first to transfer these principles to the problem of optimization were KIRKPATRICK et al. in 1983 [43]. The system is described by a set of parameters defining its state

$$ s = [p_1 \; p_2 \; \dots \; p_n] \tag{2.18} $$

Each state has a given energy value; the energy is a function of the state parameters. Following the analogy with the physical process of annealing, the system follows the path of decreasing energy, i.e., at lower temperatures states with lower energies are preferred. According to the Boltzmann distribution, the probability, P, of finding the system in a state with an energy, E, is given by

$$ P\{E\} = \frac{1}{Z(T)} \cdot e^{-E / k_b T} \tag{2.19} $$

with Z(T), a temperature-dependent normalization factor, $k_b$, the Boltzmann constant, and T, the current temperature.

In order to simulate the annealing process in a computer algorithm, the following procedure is applied: (i) the initial state is set randomly and the temperature is set to a high value, (ii) the initial energy is calculated, (iii) a new state is generated by modifying the state variables by randomly generated increments, (iv) the energy of the new state is calculated, (v) according to a given acceptance function, the new state is accepted or the previous state is kept, (vi) the temperature is decreased according to a given cooling scheme.

Of crucial importance is step (v). The system is transferred from state i to a new state j


at a given temperature, T, only if the acceptance function, $A_{ij}(T)$, gives a larger probability than a randomly generated one

$$ A_{ij}(T) = \begin{cases} 1 & \text{if } \Delta E_{ij} \le 0 \\ e^{-\Delta E_{ij} / k_b T} & \text{if } \Delta E_{ij} > 0 \end{cases} \tag{2.20} $$

As can be seen from Eq. 2.20, a new state is always accepted when its energy is not higher than that of the previous state. If states with a higher energy were always discarded, the algorithm would only find local minima reachable from the initial state. To overcome this problem, a state having a higher energy may be accepted as well if its acceptance probability is higher than a randomly generated one. This probability is high if the energy jump is small and the temperature is high. Therefore, state changes to higher energy are more likely at the beginning of the procedure and occur only very rarely when the temperature approaches zero. This probabilistic approach gives the simulated annealing algorithm the ability to find the global minimum.
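Steps (i) to (vi) and the acceptance function of Eq. 2.20 can be sketched as follows. This is a generic illustration and not the implementation used for the parametrization in this work: the geometric cooling scheme, the step size, the parameter bounds, and the test energy function are assumptions, and the Boltzmann constant is absorbed into the temperature.

```python
import math
import random


def accept(delta_e, temperature, rng):
    """Acceptance test of Eq. 2.20 (k_b absorbed into the temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def simulated_annealing(energy, bounds, t_start=10.0, t_end=1e-3, cooling=0.95,
                        steps_per_t=50, step=0.5, seed=0):
    """Return (best_state, best_energy, start_energy)."""
    rng = random.Random(seed)
    state = [rng.uniform(lo, hi) for lo, hi in bounds]      # (i) random start
    e = e_start = energy(state)                             # (ii)
    best, best_e = list(state), e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = [p + rng.uniform(-step, step) for p in state]   # (iii)
            e_new = energy(candidate)                       # (iv)
            if accept(e_new - e, t, rng):                   # (v)
                state, e = candidate, e_new
                if e < best_e:
                    best, best_e = list(state), e
        t *= cooling                                        # (vi) geometric cooling
    return best, best_e, e_start


def double_well(s):
    """Hypothetical energy function with one local and one global minimum."""
    p = s[0]
    return (p * p - 1.0) ** 2 + 0.3 * p


best_state, best_energy, start_energy = simulated_annealing(double_well, [(-3.0, 3.0)])
```

The best state encountered is stored separately, so an accepted uphill move can never worsen the returned result. Whether the global minimum is actually reached depends on the cooling scheme and the number of steps.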

2.7 Charge Calculation by Partial Equalization of Electronegativity

The quantification of the charge distribution in a molecule is of crucial importance for the prediction of many properties. One of the central aims of the present work was the development and parametrization of a new charge calculation method. This method is based on two different procedures for the quantification of the σ- and π-charge distribution developed in our group several years ago [9, 10]. Both procedures were derived from the concept of electronegativity and its equalization on bond formation.

The notion of electronegativity is directly linked to the concept of ionic bond character of covalent bonds and thus of partially charged atoms. The more electronegative atom of a bond will attract more electron density of the bonding electron pair and will therefore bear a negative partial charge. This basic concept was first introduced and quantified by PAULING in 1932 [44]. Electronegativity as an atomic property was later extended to states, i.e., to the electronegativity of atomic orbitals. This work, along with the principle of electronegativity equalization that was introduced by SANDERSON [45], provided the basis for the partial equalization of orbital electronegativity (PEOE) algorithm for the quantification


of partial charges in σ-systems [9]. The concept of partial electronegativity equalization was subsequently extended to π-systems (partial equalization of π-electronegativity, PEPE) [10]. The following sections briefly review the basic concepts and the derived methods.

2.7.1 The Concept of Electronegativity and its Equalization

The concept of electronegativity was introduced by PAULING [44] as the measure of "the power of an atom in a molecule to attract electrons to itself". PAULING derived this basic concept from the fact that the energy, D, of a heteronuclear bond, A – B, is generally higher than the average of the bond energies of the homonuclear bonds, A – A and B – B

$$ D(AB) = \frac{1}{2} \left[ D(AA) + D(BB) \right] + \Delta_{AB} \tag{2.21} $$

The square root of the stabilization term, $\Delta_{AB}$, was found to be proportional to atomic constants that were defined by PAULING as atomic electronegativities, χ

$$ \sqrt{\Delta_{AB}} \propto \chi_A - \chi_B \tag{2.22} $$

Eq. 2.22 was used to derive the well-known Pauling scale of electronegativity that is still in use. As Eq. 2.22 does not result in absolute values, PAULING arbitrarily set the electronegativity of a hydrogen atom to a value of 2.2.

PAULING's electronegativity is defined as an atomic property, disregarding the molecular environment of the atom. This is in contrast to the fact that atoms of the same element differ in their power of attracting electrons depending on their hybridization state. This can be seen when comparing the dipole moments of chloroethane and chloroethyne, 2.05 and 0.44 debye, respectively. The dipole moment is strongly related to the charge distribution, even though it gives only a rough picture of it (see Section 4.2.4). Nevertheless, the large difference between the two chlorohydrocarbons can only be explained by assuming that the C(sp) atom in chloroethyne resists releasing charge density to the more electronegative atom more strongly than the C(sp3) atom of chloroethane. This can easily be understood since the sp hybrid orbital has a larger contribution from the energetically more stable 2s orbital. Hence, electronegativity must depend on the valence state.

The first to consider this fact was MULLIKEN [46]. In order to put the concept of electronegativity on a more theoretical basis, he derived his definition of electronegativity from the consideration of how to quantify the energy that is required in going from a covalent


bond, A – B, to the ionic states, A⁻ – B⁺ and A⁺ – B⁻. This amount of energy is proportional to the energy that is required to remove one electron from atom A and to add it to atom B and vice versa. Hence, MULLIKEN defined the electronegativity of an orbital ν of an atom A as the mean of the corresponding ionization potential and electron affinity,

$$ \chi_{A\nu} = \frac{1}{2} \left( I_{A\nu} + E_{A\nu} \right) \tag{2.23} $$

HINZE, WHITEHEAD and JAFFÉ took the idea of orbital electronegativities one step further. In a series of papers [47–50], they published orbital electronegativities for a large number of atoms in various valence states on the basis of MULLIKEN's definition, Eq. 2.23. The required ionization potentials and electron affinities of the atomic orbitals were derived from the ground state properties, $I_g$ and $E_g$, that can be obtained from spectral data, and from the promotion energies, P, that are required to go from the ground state to the appropriate valence state for the neutral atom and the positive and negative ion, respectively

$$ \begin{aligned} I_\nu &= I_g + P^{+} - P^{0} \\ E_\nu &= E_g + P^{0} - P^{-} \end{aligned} \tag{2.24} $$

The promotion energies were not directly accessible. They had to be calculated from linear combinations of spectroscopic state energies. The computational procedure used for such calculations had to apply various approximations [47]. This procedure may be seen as the main source of potential inaccuracies in the obtained electronegativity values. HINZE et al. could also show that the electronegativity of an orbital is a continuous function of its hybridization. They found a linear dependence on the amount of s-character in the hybrid orbitals.

The valence state of an atom is not the only factor influencing electronegativity. An empty orbital will have a higher attractive power for electrons than a doubly occupied orbital. Therefore, electronegativity also depends on the occupation of an orbital. This consideration was first expressed in terms of a dependence between the energy of an atom and its occupation number, n. ICZKOWSKI and MARGRAVE [51] showed that this dependence can be described by a polynomial of degree four

$$ E(n) = a \cdot n + b \cdot n^2 + c \cdot n^3 + d \cdot n^4 \tag{2.25} $$

This equation was shown to give a good approximation to the energy of an atom in various states of ionization. In the same paper [51] and in a paper published one year later by HINZE


et al. [48], a new definition of electronegativity was derived. First, the assumption was made that an orbital can be occupied by non-integral numbers of electrons in the range between 0 and 2, i.e., the energy of an orbital is a continuous function of its occupation. Then, the negative first derivative of the energy, $E_\nu$, of an orbital, ν, with respect to its occupation, $n_\nu$, was defined as the orbital electronegativity, $\chi_\nu$,

$$ \chi_\nu = -\partial E_\nu / \partial n_\nu \tag{2.26} $$

This definition should reflect the power of an orbital to attract electrons: if the slope at a certain point of the energy function is negative for a given orbital, the system can lower its energy by shifting a distinct amount of charge to that orbital. The more negative the slope is, the more the system should tend to change its state in that direction. On the other hand, the corresponding orbital that in turn releases electron density needs to have a less negative slope for the negative value of the displaced charge; otherwise, the sum of the energy changes of the entire process will not stabilize the system.
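As a plausibility check, the differential definition of Eq. 2.26 can be related to MULLIKEN's definition, Eq. 2.23. The following short derivation is a sketch under two assumptions that are not stated in the text above: the energy function of Eq. 2.25 is truncated after the quadratic term, and n counts the electrons relative to the neutral species (n = 0 neutral, n = −1 cation, n = +1 anion), with the energy of the neutral species taken as zero. Here $I_\nu$ and $E_\nu$ denote the ionization potential and the electron affinity as in Eq. 2.24.

$$
\begin{aligned}
E(n) &= a \cdot n + b \cdot n^2, \qquad E(-1) = -a + b = I_\nu, \qquad E(+1) = a + b = -E_\nu \\
a &= -\tfrac{1}{2} \left( I_\nu + E_\nu \right), \qquad b = \tfrac{1}{2} \left( I_\nu - E_\nu \right) \\
\chi(n) &= -\frac{\partial E}{\partial n} = -a - 2 b \, n \quad \Longrightarrow \quad \chi(0) = \tfrac{1}{2} \left( I_\nu + E_\nu \right)
\end{aligned}
$$

Under these assumptions, the electronegativity of the neutral species obtained from Eq. 2.26 coincides with Eq. 2.23, and χ decreases linearly as the occupation increases, since b is positive whenever the ionization potential exceeds the electron affinity.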

Fig. 2.6 Schematic representation of the change in energy of two atoms A (solid line) and B (dashed line) with variation of the electron occupation. The electronegativity is defined as the slope of the curve, therefore the displacement of a fraction of charge, dn, from atom B to A results in a gain of energy.

Fig. 2.6 qualitatively shows the behavior of two orbitals, A and B, with A having a higher electronegativity than B. With a small displacement of electronic charge, dn, from B to A,


the decrease of energy for orbital A is larger than the increase of energy for orbital B. This charge flow should come to an end when the electronegativities have reached the same value.

This principle of total electronegativity equalization was first introduced by SANDERSON [45]. He proposed that on bond formation the electron distribution around the constituent atoms rearranges such that the electronegativities of all atoms in a molecule equalize to an average value, $\chi_m$. This mean molecular electronegativity was defined as the geometric mean of the individual atomic electronegativities. The resulting partial charge, q, of an atom i was defined as

$$ q_i = \frac{\chi_m - \chi_i}{k \sqrt{\chi_i}} \tag{2.27} $$

As can be seen from Eq. 2.27, with total equalization of electronegativity all atoms having initially the same electronegativity will be assigned the same charge. This would mean that in a molecule such as lactic acid all four chemically different types of hydrogen atoms would bear the same atomic charge. This fundamentally contradicts chemical intuition. Neither the fact that the carboxylic hydrogen atom is much more acidic than the methyl hydrogen atoms nor the trend in acidity, COOH > COH > CH3, can be reproduced. Thus, even though the Sanderson charges are well-defined and internally consistent, they are not able to reflect fundamental trends in chemical behavior and are therefore of little practical use.
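The shortcoming described above can be made explicit with a few lines of code implementing Eq. 2.27. The electronegativity values used below are Pauling-scale numbers that merely serve as stand-ins for SANDERSON's own scale, and the constant k is set to an arbitrary value of 1.0; both are assumptions for illustration, so the absolute charges are not meaningful.

```python
import math


def sanderson_charges(chi, k=1.0):
    """Charges from total electronegativity equalization, Eq. 2.27.

    chi: atomic electronegativities of all atoms of the molecule.
    k:   scale constant (placeholder value, not SANDERSON's constant).
    """
    chi_m = math.exp(sum(math.log(c) for c in chi) / len(chi))  # geometric mean
    return chi_m, [(chi_m - c) / (k * math.sqrt(c)) for c in chi]


# Lactic acid, C3H6O3; stand-in electronegativities (Pauling scale).
CHI = {"C": 2.55, "H": 2.20, "O": 3.44}
atoms = ["C"] * 3 + ["H"] * 6 + ["O"] * 3
chi_m, charges = sanderson_charges([CHI[a] for a in atoms])

hydrogen_charges = [q for a, q in zip(atoms, charges) if a == "H"]
oxygen_charges = [q for a, q in zip(atoms, charges) if a == "O"]
```

All six hydrogen atoms receive exactly the same charge, regardless of whether they belong to the CH3, CH, OH, or COOH group, because the connectivity of the molecule does not enter Eq. 2.27 at all.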

2.7.2 Partial Equalization of Orbital Electronegativity (PEOE)

To overcome the problems within the procedure of total equalization of electronegativity, GASTEIGER and MARSILI [9, 52, 53] developed a calculation scheme that was intended to take advantage of the progress in electronegativity theory at that time but should result in charges that are chemically reasonable and could be used in chemical applications such as reactivity prediction.

The basic problem in the total equalization of electronegativity was identified as the complete neglect of electrostatic interactions. In order to take these effects into account, the principle of partial equalization of orbital electronegativity was introduced based on the following considerations. Given the formation of a bond between two atoms, A and B, having different electronegativities, the more electronegative atom will attract electron density from the other atom in an initial stage. This electron transfer will create a charge separation that induces an electrostatic field directed exactly opposite to the direction of the electron flow.

Therefore, the electron flow will occur only as long as the amount of energy, ∆Eχ, gained by the transfer of an amount of charge, δq, due to the electronegativity equalization (see Fig. 2.6) is large enough. This energy has to outweigh the work that is required to move that amount of charge against the electrostatic field resulting from the charge, Q, separated so far

$$ \Delta E_\chi(\delta q) > \frac{Q}{4\pi\varepsilon_0 r_{AB}}\,\delta q \qquad (2.28) $$

As a consequence, electronegativities do not equalize totally but only partially.

GASTEIGER and MARSILI derived an algorithm to quantify the considerations of the previous paragraph. The development of the basic steps of that method is retraced in the following paragraphs. First of all, Eq. 2.28 suggests an iterative procedure that simulates the continuous electron flow by distinct steps. The amount of transferred charge should decrease step by step due to the developing electrostatic field until the charge flow stops. Furthermore, an atom in a molecule is bound to one or more neighbors with different electronegativities. The process of electronegativity equalization of an atom should therefore be a combination of the single equalization processes for each bonded neighbor. This accounts for the effect of different chemical environments and leads to chemically more reasonable charge values. In setting up the iterative calculation scheme the following three key features had to be defined:

1. the amount of charge that is transferred in each iteration step,

2. an analytical expression of the dependence of the electronegativity with respect to charge,

3. a way to decide when the charge transfer terminates due to Eq. 2.28.

The amount of charge that is transferred through a bond, A – B, at a certain step, α, was set proportional to the difference in electronegativity of its constituting atoms

$$ \Delta q_{A,\alpha} = -\Delta q_{B,\alpha} = \frac{\chi_B(q_{B,\alpha-1}) - \chi_A(q_{A,\alpha-1})}{\chi_A^+} \qquad (2.29) $$

with B being more electronegative than A. The proportionality factor was defined to be the reciprocal of the electronegativity of the less electronegative atom in the cationic state, $1/\chi_A^+$. This quantity gives a measure of how strongly the atom that is releasing electron density will resist a further release as it becomes more positive. The higher this value, the less charge will be transferred throughout the iterative procedure. From a practical point of view this factor transforms the difference in electronegativity into a fraction of electron units. As can be seen from Eq. 2.29, the electronegativity values to be used in step α are a function of the charges of the previous step, α − 1. This leads to the second point, how to express the dependence of electronegativity on charge.

As has been discussed in the previous section, ICZKOWSKI and MARGRAVE found that the energy of an atom with respect to its charge can be expressed as a power series, of which they considered the first four terms (Eq. 2.25). As the electronegativity has been identified as the first derivative of the energy with respect to charge, Eq. 2.26, the electronegativity can also be characterized by a power series, which according to Eq. 2.25 is then a polynomial of degree three. HINZE et al. showed that the terms up to the quadratic one in Eq. 2.25 are most significant and thus obtained a linear dependence of the electronegativity on charge. For the PEOE procedure it had to be clarified how many terms of the power series have to be considered in order to obtain a quantitative description of sufficient accuracy. Similar to the approach of ICZKOWSKI and MARGRAVE in fitting the energy, the orbital electronegativities in the anionic, the neutral, and the cationic state were used for fitting the electronegativity with respect to charge. These values were accessible from the orbital ionization potentials and electron affinities as published by HINZE et al. [47,48,50] by applying MULLIKEN's definition of electronegativity, Eq. 2.23. The electronegativity of the anionic state can be derived directly from the electron affinity of the neutral state by assuming that this is equal to the ionization potential of the anionic state and by setting the electron affinity of a doubly occupied orbital to zero.

$$ \begin{aligned}
(\chi_{i\nu})_{q=-1} &= \tfrac{1}{2}\,(I_{i\nu}^- + E_{i\nu}^-) = \tfrac{1}{2}\,E_{i\nu}^0, \quad \text{with } I_{i\nu}^- = E_{i\nu}^0 \text{ and } E_{i\nu}^- = 0 \\
(\chi_{i\nu})_{q=0} &= \tfrac{1}{2}\,(I_{i\nu}^0 + E_{i\nu}^0) \\
(\chi_{i\nu})_{q=+1} &= \tfrac{1}{2}\,(I_{i\nu}^+ + E_{i\nu}^+)
\end{aligned} \qquad (2.30) $$

A polynomial of degree two was found to be the best choice as it is more flexible than a linear dependence but ignores the mostly vanishing higher order terms

$$ \chi_{i\nu} = a_{i\nu} + b_{i\nu}\,q_i + c_{i\nu}\,q_i^2 \qquad (2.31) $$

With Eq. 2.30 and 2.31 the orbital coefficients aiν, biν, and ciν can be determined as follows

$$ \begin{aligned}
a_{i\nu} &= \tfrac{1}{2}\,(I_{i\nu}^0 + E_{i\nu}^0) \\
b_{i\nu} &= \tfrac{1}{4}\,(I_{i\nu}^+ + E_{i\nu}^+ - E_{i\nu}^0) \\
c_{i\nu} &= \tfrac{1}{4}\,(I_{i\nu}^+ - 2 I_{i\nu}^0 + E_{i\nu}^+ - E_{i\nu}^0)
\end{aligned} \qquad (2.32) $$

Fig. 2.7 shows as examples the fitted parabolas for the sp3-orbitals of carbon, nitrogen, and oxygen. The influence of the quadratic term is quite small, especially if the considered range of charge variation is also small, which is typically the case within the PEOE model. Therefore, other research groups have also used a linear dependence for modifications of the PEOE algorithm [54, 55].

Fig. 2.7 Dependence of orbital electronegativity, χ, on charge, q, as a polynomial of degree two, Eq. 2.31. For illustration the three functions for the sp3-orbitals of carbon, nitrogen and oxygen are shown. The orbital coefficients are those published in [9].
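The relation between Eq. 2.30, 2.31, and 2.32 can be checked numerically. The following sketch assumes nothing beyond these equations; the ionization potentials and electron affinities passed in are arbitrary placeholder numbers, not data from [47,48,50], and the function names are hypothetical.

```python
def orbital_coefficients(i0, e0, i_plus, e_plus):
    """Coefficients a, b, c of Eq. 2.31, computed according to Eq. 2.32."""
    a = 0.5 * (i0 + e0)
    b = 0.25 * (i_plus + e_plus - e0)
    c = 0.25 * (i_plus - 2.0 * i0 + e_plus - e0)
    return a, b, c

def chi(q, a, b, c):
    """Orbital electronegativity as a polynomial of degree two (Eq. 2.31)."""
    return a + b * q + c * q * q

# Placeholder values (arbitrary units), chosen only to exercise the formulas.
i0, e0, i_plus, e_plus = 14.0, 2.0, 28.0, 12.0
a, b, c = orbital_coefficients(i0, e0, i_plus, e_plus)

# The parabola passes through the three supporting points of Eq. 2.30.
chi_anion = chi(-1.0, a, b, c)    # expected: 0.5 * e0
chi_neutral = chi(0.0, a, b, c)   # expected: 0.5 * (i0 + e0)
chi_cation = chi(+1.0, a, b, c)   # expected: 0.5 * (i_plus + e_plus)
```

The three values reproduce the anionic, neutral, and cationic electronegativities of Eq. 2.30, which confirms that Eq. 2.32 is the interpolating parabola through these points.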

Finally, after each iteration step it has to be evaluated whether the charge transfer still results in a gain of energy or whether the induced electrostatic field prevents further charge separation. Evaluating Eq. 2.28 directly would require the calculation of the energies before and after the charge transfer and knowledge of the bond length between the considered atoms, A and B, which itself depends on the atomic charges. Instead, the PEOE algorithm applies a simple damping approach in order to achieve convergence of the charge transfer. In each iteration step only a fraction of the charge separation calculated with Eq. 2.29 is transferred, decreasing exponentially by a factor of $f_{damp}^{\alpha}$, where α is the iteration number. Thus, Eq. 2.29 becomes

$$ \Delta q_{A,\alpha} = -\Delta q_{B,\alpha} = \frac{\chi_B(q_{B,\alpha-1}) - \chi_A(q_{A,\alpha-1})}{\chi_A^+} \cdot f_{damp}^{\alpha} \qquad (2.33) $$

Fig. 2.8 summarizes the entire PEOE procedure as a flow chart. Initially, the electronegativities are calculated for the uncharged atoms. The actual charge calculation consists of three nested loops. The outer loop is performed for αmax iteration steps; the procedure was found to have generally converged after six iterations. The middle loop iterates over each atom in the molecule. For each neighbor of an atom the charge transfer due to the electronegativity difference is calculated in the inner loop. The damping factor, fdamp, was arbitrarily set to 0.5. After the procedure has stopped, the total charge of an atom is obtained as the sum of all single charge shifts of the inner loop. Thus, the resulting PEOE charge, Q, of an atom i can briefly be expressed by Eq. 2.34

$$ Q_i = \sum_{\alpha=1}^{6} \sum_{i}^{M} \sum_{j}^{N_i} \frac{\chi_{j\mu}^{\langle\alpha-1\rangle} - \chi_{i\nu}^{\langle\alpha-1\rangle}}{\chi_k^+} \cdot 0.5^{\alpha}, \qquad k = \begin{cases} i\nu & : \chi_{i\nu} < \chi_{j\mu} \\ j\mu & : \chi_{i\nu} > \chi_{j\mu} \end{cases} \qquad (2.34) $$

with α, the iteration number, M, the number of atoms in a molecule, j, the first-sphere neighbors of atom i, Ni, the number of neighbors of atom i, and the indices, iν and jµ, referring to the bonding orbitals of atom i and j, respectively.

Besides ensuring implicit convergence of the procedure, this approach also accounts for the chemical environment of an atom. In the first iteration step only the first sphere of neighbors affects the considered atom, in the second step the second sphere is also considered but in an attenuated manner, and so forth.

The PEOE procedure generates charges that reflect the electronegativity of an atom and of its nearer environment, whereby the influence of each sphere of neighbors decreases exponentially due to the damping approach. This is appropriate for describing inductive effects in the σ-skeleton, as here short-range effects are dominant that decrease rapidly with distance [56]. Resonance effects in π-electron systems can affect atoms that are separated by larger distances, especially in aromatic systems. Thus, PEOE cannot be applied here.


Fig. 2.8 Workflow applied in the PEOE scheme.
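The iteration scheme of Eq. 2.33 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the coefficients are quoted from the published parameter table as recalled here and should be verified against [9]; χ+ is taken from the polynomial at q = +1 for every element, which is a simplification; and the loop runs over bonds rather than over atoms and their neighbors, which yields the same charge shifts. All function and variable names are hypothetical.

```python
# (a, b, c) of Eq. 2.31; assumed values, to be verified against [9].
COEFF = {
    "H": (7.17, 6.24, -0.56),
    "C": (7.98, 9.18, 1.88),  # sp3 carbon
}

def chi(element, q):
    """Orbital electronegativity as a function of charge (Eq. 2.31)."""
    a, b, c = COEFF[element]
    return a + b * q + c * q * q

def chi_cation(element):
    """Electronegativity of the cationic state, here simply chi(q = +1)."""
    return chi(element, 1.0)

def peoe_charges(elements, bonds, n_iter=6, f_damp=0.5):
    """Partial charges by damped, iterative charge transfer (Eq. 2.33)."""
    q = [0.0] * len(elements)
    for alpha in range(1, n_iter + 1):
        # Electronegativities are evaluated with the charges of step alpha - 1.
        x = [chi(e, qi) for e, qi in zip(elements, q)]
        dq = [0.0] * len(elements)
        for i, j in bonds:
            if x[i] == x[j]:
                continue  # no driving force for charge transfer
            # The less electronegative atom donates electron density.
            donor, acceptor = (i, j) if x[i] < x[j] else (j, i)
            shift = (x[acceptor] - x[donor]) / chi_cation(elements[donor])
            shift *= f_damp ** alpha
            dq[donor] += shift      # donor becomes more positive
            dq[acceptor] -= shift   # acceptor becomes more negative
        q = [qi + d for qi, d in zip(q, dq)]
    return q

# Methane: atom 0 is carbon, atoms 1-4 are hydrogen.
methane = ["C", "H", "H", "H", "H"]
methane_bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
charges = peoe_charges(methane, methane_bonds)
```

With these assumed parameters the carbon atom ends up slightly negative and the four hydrogen atoms carry identical small positive charges; the total charge of the molecule is conserved because every shift is added to one atom and subtracted from its bonding partner.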

2.7.3 Partial Equalization of π-Electronegativity (PEPE)

The charge distribution of π-systems and related electronic effects such as stabilization of charge centers are traditionally estimated by the concept of resonance. Here, a compound is treated as an ensemble of different valence bond resonance structures having a different distribution of localized π-bonds and formal charges (see Section 3.4.1). Each resonance structure is assigned a certain weight and the "real" structure is a hybrid of all contributing resonance forms. This concept is used in a qualitative manner in order to estimate centers of charge excess or deficiency. For illustration, Fig. 2.9 shows the resonance forms of phenol: one lone pair of the oxygen atom is delocalized into the benzene ring, resulting in negative partial charges at the ortho- and para-positions and a positive partial charge at the oxygen atom.


Fig. 2.9 Resonance structures of phenol and resulting partial π-charges.

GASTEIGER and SALLER [10, 57] developed a calculation scheme for the quantification of partial π-charges based on the qualitative concept of resonance. In this method the weight of each resonance form is calculated and thus the contribution of a resonance structure to the π-charge distribution. As the method is iterative in nature and shifts charges in order to achieve an electronegativity equalization, it has been called, in analogy to the PEOE procedure, partial equalization of π-electronegativity (PEPE). As the starting point, all resonance structures are generated with a maximum charge separation of one unit charge. In the next step each resonance structure is assigned a weight factor consisting of a topological and an electronic contribution

$$ w = w_t \cdot w_e \qquad (2.35) $$

The topological weight, wt, is the product of three factors, faro, for judging the case of destroying an aromatic system, fQ, for the introduction of a charge separation, and fcov, for the reduction of the number of covalent bonds

$$ w_t = f_{aro} \cdot f_Q \cdot f_{cov} \qquad (2.36) $$


The electronic weight judges the stability of the centers of charge introduced by separating charge between two atoms, i and j, as shown in Fig. 2.10.


Fig. 2.10 Calculation of the electronic weight (Eq. 2.37) in the PEPE scheme. The stability of the resonance structure on the right-hand side is calculated by the difference in π-electronegativity of the charge centers, i and j, and a term representing the electrostatic influence of the first sphere of neighbors.

The example of Fig. 2.10 illustrates the effects playing a role in the stabilization of a charged resonance structure: (i) The donating orbital (the nitrogen lone pair in the example) has to have a low π-electronegativity in order to release charge, and vice versa for the accepting orbital. This is expressed as the difference of π-electronegativity, ∆χπ (first term in Eq. 2.37). (ii) The environment of the charge centers may have a stabilizing or destabilizing influence due to electrostatic effects. This electrostatic effect is expressed as the sum of effective charges of the first sphere of neighbors (Eq. 2.38). The effective charge consists of the π-charge and a fraction of the σ-charge (Eq. 2.39).

$$ w_e = (\chi_{\pi,i} - \chi_{\pi,j}) + f_e \cdot (Q_i - Q_j) \qquad (2.37) $$

with

$$ Q_{k=i,j} = \sum_{l}^{N_k} q_{\mathrm{eff},l} \qquad (2.38) $$

and

$$ q_{\mathrm{eff}} = q_\pi + f_{sp} \cdot q_\sigma \qquad (2.39) $$

The π-electronegativities required in Eq. 2.37 are calculated as a function of charge by polynomials of degree two as introduced in the PEOE scheme (see Eq. 2.31). The effective charges calculated by Eq. 2.39 are applied here in order to account for the influence of the σ-charges on the π-orbitals.

As both terms in Eq. 2.37 depend on the π-charge itself, the charge transfer is distributed over N iteration steps and only a fraction 1/N is shifted in each step. After each step, the effective charge is updated and applied to calculate the quantities in Eq. 2.37.

The factors, f, occurring in Eq. 2.36-2.39 were determined by correlation with experimental data: 13C-NMR chemical shifts of the para-carbon atom of twelve mono-substituted benzene derivatives, 13C-NMR chemical shifts of nine substituted pyridine derivatives, and ESCA C-1s chemical shifts at twelve different positions of seven fluoroethylenes.
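How the weight of a single resonance structure is assembled from Eq. 2.35-2.39 can be written down compactly. The sketch below only transcribes these equations; every numerical value, including the factors f, is a hypothetical placeholder and not one of the fitted parameters of the PEPE method, and the function names are invented for this illustration.

```python
def effective_charge(q_pi, q_sigma, f_sp):
    """Effective charge of one neighbor atom (Eq. 2.39)."""
    return q_pi + f_sp * q_sigma

def neighbor_charge(neighbors, f_sp):
    """Sum of effective charges over the first sphere of neighbors (Eq. 2.38).

    neighbors: list of (q_pi, q_sigma) pairs, one per neighbor atom
    """
    return sum(effective_charge(q_pi, q_sigma, f_sp)
               for q_pi, q_sigma in neighbors)

def electronic_weight(chi_pi_i, chi_pi_j, big_q_i, big_q_j, f_e):
    """Electronic weight of a resonance structure (Eq. 2.37)."""
    return (chi_pi_i - chi_pi_j) + f_e * (big_q_i - big_q_j)

def resonance_weight(f_aro, f_q, f_cov, w_e):
    """Total weight as product of topological and electronic part (Eq. 2.35, 2.36)."""
    w_t = f_aro * f_q * f_cov
    return w_t * w_e

# Hypothetical input for the two charge centers i and j.
F_SP, F_E = 0.5, 1.0                      # placeholder factors
big_q_i = neighbor_charge([(0.00, 0.10), (0.05, -0.20)], F_SP)
big_q_j = neighbor_charge([(0.10, 0.20)], F_SP)
w_e = electronic_weight(6.0, 5.0, big_q_i, big_q_j, F_E)
w = resonance_weight(f_aro=1.0, f_q=0.5, f_cov=1.0, w_e=w_e)
```

In the actual procedure these quantities are re-evaluated after each of the N iteration steps, since both the π-electronegativities and the effective charges change with the π-charges.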

2.8 Software Development

Handling and modeling chemistry with the computer requires knowledge of both chemistry and computer science. Computer science provides the technology for expressing chemical problems in a way that they can be solved computationally. Stepwise procedures for solving a given problem are called algorithms. Once an algorithm has been developed, it has to be implemented in a form that is executable by a computer. Several programming languages are available for the implementation. Common languages basically support two different ways of organizing a program: structured and object-oriented programming. The implementation of an algorithm is normally embedded in code that handles the cooperation of different program fragments and the management of required data as well as its input from or output to persistent memory. For the design of algorithms and the global structure of a program several techniques and tools have been developed. This section gives a short description of the software development technologies used in the present work. A comprehensive overview of modern software technology was given by BALZERT [58, 59].

2.8.1 Programming Paradigms

Structured Programming (SP) Structured programming [60] is characterized by global data structures that are visible from anywhere inside a program. Functions (in some programming languages also known as subroutines) work with these global data and can access and modify them if required. Fig. 2.11 shows the layout of programs based on structured programming.

Fig. 2.11 Schematic layout of a program written in a structured programming language. Data are globally visible to all functions (based on a figure published in [61]).

With increasing size and complexity of projects it became evident that the separation of data and functions and the global accessibility of data lead to problems for the maintenance and extension of software. If data can be modified from anywhere in a program, errors are hard to find since the entire code has to be searched for an erroneous modification of a global data field. Furthermore, a proper organization of data is increasingly hard to achieve with increasing size of a project. New data fields required for extensions might be hard to integrate into existing data structures.

Therefore, with the increasing size of software projects due to the rapid evolution of computer hardware in the late eighties and the early nineties, software development experienced a fundamental change from structured towards object-oriented programming [61].

Object-Oriented Programming (OOP) Object-oriented (OO) programming [62] integrates data and functions into objects. An object consists of the data required for the implementation of a distinct functionality and a set of functions that define the way the object can be used. The most important characteristic of object-orientation is that an object's data, also termed member variables, are inaccessible and even invisible to the rest of a program. This principle is called data encapsulation and can considerably help to increase the integrity of the internal state of a program. Fig. 2.12 outlines a program based on object-oriented programming.

Fig. 2.12 Schematic layout of a program written in an object-oriented programming language. Objects are communicating via their interfaces. Data are local to each object and are visible only inside of an object's scope (based on a figure published in [61]).

The interface, i.e., the set of functions an object provides (also known as member functions), defines how to communicate with it. If an object is requested to perform a certain service offered by its interface, the implementation is commonly achieved in cooperation with other objects. In short, object-oriented software is implemented by objects that communicate through their interfaces.

Objects of the same type can be abstracted into classes. Classes act as templates for a certain type of objects. It is said that an object is an instance of a given class. Classes constitute the basic elements of abstraction in the OO-paradigm.

A further central concept in OOP is inheritance. From a certain class, specializations can be derived by subclassing. A subclass inherits the same interface as the parent class, but often these functions are redefined and have a specialized behavior. This is called polymorphism: an object of a subclass can take the place of a base class's object, having the same interface but a different behavior. Code relying on the interface cannot distinguish between the two types. Additionally, a subclass can also define new functions. Polymorphism is a further basic concept in OOP, introduced with the intention to ease the extensibility of existing software. All existing code that relies on the interface of a certain class accepts any of its subclasses. Therefore, no adaptations are required in existing code when new classes are introduced by subclassing.


In order to demonstrate the concepts of inheritance and polymorphism, one may consider an application that should read different chemical data formats. A reasonable base class for the general task of reading would be a class ChemReader. For simplicity, this class provides only one function, getNextRecord(), that reads data from a file and returns the next record in the internal data structure. The application needs to know only this base class and calls getNextRecord() when needed. The actual implementations are provided in subclasses of ChemReader, depending on the format to be read. The subclasses are responsible for reading and interpreting the actual data from the file, i.e., they redefine the abstract behavior of getNextRecord() in a concrete way. The application is parametrized with the appropriate reader class. If a new data format should be supported, only a new subclass of ChemReader has to be implemented and handed over to the application, which requires no further modification because it relies only on the abstract interface of ChemReader.

It is important to note that object-oriented and structured programming do not contradict each other. Actually, basic algorithms are still written according to structured principles but are implemented inside a given class. It can probably be said that the object-oriented paradigm is used for the large-scale design of software projects while structured programming is still applied when it comes to designing and implementing basic functionality such as algorithms for solving chemical or mathematical problems.

Finally, it has to be stated that object-oriented programming is often considered to complicate software development more than it helps. Especially in scientific computing the need for object-orientation often seems not to be evident and the advantage over the simplicity of structured design is not seen. This might be due to the fact that modern computer science often seems fond of introducing terms that give even simple concepts the semblance of high sophistication. It is important to realize that the basic concepts of object-oriented programming are essentially not more complicated than the structured paradigm and that OOP constitutes a major progress in software development.

For designing software, concentration on the key abstractions allows one to express problems in a natural way. Key abstractions in chemistry are atoms, bonds, and molecules. Therefore, object-oriented systems for handling chemistry will deal with classes likely to be called Atom, Bond, and Molecule. Molecules consist of atoms and bonds. Exactly this relationship can naturally be expressed by OOP: objects of a class Molecule contain objects of class Atom and Bond in order to represent the chemical compound under consideration.


In a structured design, information on chemical objects is stored as lists of indices, i.e., of numbers representing the position of a data field related to a given chemical entity. Comparing both approaches, dealing with entities directly related to the key concepts of the problem at hand, as is possible within OOP, seems much more appealing. Even this small example should make the advantage of OOP over SP evident.
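The ChemReader example used above to explain inheritance and polymorphism can be sketched in a few lines. The code is an illustration only and not the class design of this work: the record layout (a plain dictionary), the toy parsing rules, and the helper count_records() are assumptions, the readers operate on lists of lines instead of files, and getNextRecord() is spelled get_next_record() to follow Python naming conventions.

```python
from abc import ABC, abstractmethod

class ChemReader(ABC):
    """Abstract base class; the application relies on this interface only."""

    def __init__(self, lines):
        self._lines = iter(lines)  # stand-in for an open file

    @abstractmethod
    def get_next_record(self):
        """Return the next record, or None if the input is exhausted."""

class SMILESReader(ChemReader):
    """Toy reader: one record per non-empty line, structure in first column."""

    def get_next_record(self):
        for line in self._lines:
            line = line.strip()
            if line:
                return {"format": "SMILES", "data": line.split()[0]}
        return None

class SDFReader(ChemReader):
    """Toy reader: records are blocks of lines terminated by '$$$$'."""

    def get_next_record(self):
        block = []
        for line in self._lines:
            if line.strip() == "$$$$":
                return {"format": "SDF", "data": "\n".join(block)}
            block.append(line.rstrip("\n"))
        return None  # incomplete trailing block is ignored in this sketch

def count_records(reader):
    """Application code: knows nothing about the concrete file format."""
    n = 0
    while reader.get_next_record() is not None:
        n += 1
    return n

n_smiles = count_records(SMILESReader(["CCO ethanol", "", "c1ccccc1 benzene"]))
n_sdf = count_records(SDFReader(["mol 1", "$$$$", "mol 2", "$$$$"]))
```

The function count_records() is never modified when a further format is added; a new subclass of ChemReader is sufficient.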

Generic Programming (GP) A third programming paradigm, generic programming, which can equally be applied in OOP and SP, is briefly introduced here. Generic programming [63, 64] can only be understood if the concept of data types is introduced. A type basically defines the memory layout of a data field. The type system of a programming language ensures that a variable of a given type is only used in a way that the memory is not corrupted. For instance, a compiler would not allow an assignment of a real value to an integer variable because the memory of an integer variable would not be large enough to hold a real value. An assignment would overwrite the following memory and cause a serious memory corruption. Thus, integer and real are basic data types. Furthermore, each class constitutes a new data type. The relationship of a class hierarchy is mapped to the type system. This allows a compiler to check whether an object is compatible with another one.

In GP, software components are defined that perform operations reasonably applicable to different data types. As an example, one may consider a generic function for sorting a given set of variables. The algorithm will be the same as long as operations are available for defining equality and a less-than relation. The generic sort function could be parametrized with basic data types such as integer or real but also with objects of a potential Atom class that might be sorted on the basis of atomic mass. Other examples are container classes such as vectors or lists that can potentially be defined for any kind of data type. A prominent example of the extensive use of GP in defining generic containers is the C++ standard template library (STL) [65].

Generic software elements consist of templates giving the instruction how to generate a distinct function or class. For OOP a generic class unifies a set of classes into one software component. New classes are generated from the template class as required. Because a template class is actually an abstract description of an entire set of real classes it is also termed a metaclass, and GP is also known as metaprogramming.
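The generic sort function mentioned above can be sketched as follows. Note the assumption behind this illustration: Python is used here for brevity, and its type parameters are annotations evaluated by external type checkers, whereas a C++ template instructs the compiler to generate a distinct function per data type. The names insertion_sort, SupportsLessThan, and the minimal Atom class are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Protocol, TypeVar

class SupportsLessThan(Protocol):
    """Requirement on the element type: a less-than relation must exist."""
    def __lt__(self, other) -> bool: ...

T = TypeVar("T", bound=SupportsLessThan)

def insertion_sort(items: List[T]) -> List[T]:
    """Return a sorted copy; only the '<' operation of T is used."""
    result: List[T] = []
    for item in items:
        pos = len(result)
        # Walk left past all elements that are greater than the new item.
        while pos > 0 and item < result[pos - 1]:
            pos -= 1
        result.insert(pos, item)
    return result

@dataclass
class Atom:
    symbol: str
    mass: float

    def __lt__(self, other: "Atom") -> bool:
        # Atoms are ordered by atomic mass.
        return self.mass < other.mass

sorted_numbers = insertion_sort([3, 1, 2])
sorted_atoms = insertion_sort(
    [Atom("O", 15.999), Atom("H", 1.008), Atom("C", 12.011)]
)
sorted_symbols = [atom.symbol for atom in sorted_atoms]
```

The same algorithm serves integers and Atom objects alike, because it depends only on the less-than relation of the element type.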


2.8.2 Design Techniques

Every large project requires a basic plan, a roadmap, for its implementation. Constructing any kind of building requires a construction plan worked out by an architect. Writing a scientific publication requires an outline of the intended structure. This section introduces some basic techniques and tools for designing software and documenting the design that have been used in this work.

Flow Charts Flow charts [66] are often used to visualize the operational sequence of a program, an algorithm, or a process in general. This technique is closely related to structured programming. As flow charts are used quite frequently in this work, the basic graphical elements are given in Fig. 2.13.

[Figure: flow chart symbols for start/end point of an algorithm; process to be carried out (instruction); decision to be made (if/then/else); input/output to be read/written; persistent memory (file); control flow; comment.]

Fig. 2.13 Symbols most frequently used within flow charts.

Unified Modeling Language (UML) With the broad acceptance and application of object-oriented programming the need for graphical representation of object-oriented software became evident. Several graphical notation schemes were developed in the first half of the nineties. Among those, especially BOOCH's cloud notation [62] found widespread use. For instance, this method was used for representing the design of the RICOS system, a recent class library developed in our group [11] (see Section 3.3). In order to bring the different methods together and to define an industry standard for modeling object-oriented systems, the Object Management Group (OMG) [67], an open industry consortium comprising the most important software companies, adopted a draft for the Unified Modeling Language (UML) [68, 69].


UML provides several diagram types for representing static and dynamic aspects of object-oriented systems. The most important type is the class diagram, in which symbolic representations of classes and the relationships among them allow one to capture basic design decisions. In the present work, all graphical representations of the class design in MOSES are based on this diagram type. The main symbols for representing classes are given in Fig. 2.14: (a) shows a normal class and (b) a generic template class.


Fig. 2.14 Symbols for representing classes in UML. (a) shows the diagram type for a standard class, and (b) gives the diagram type for representing template classes.

Classes are represented by rectangles divided into three sections. The head section gives the class name. The middle section contains all member variables and the last section contains all member functions. Each entry is prefixed by a symbol indicating the visibility. A minus (-) means totally hidden from the outside; such members are called private. A plus (+) means full accessibility, synonymous with public, and a hash mark (#) indicates an intermediate visibility called protected that allows access only for derived classes. The data type of a member variable or the return type of a function is given after a colon. For the representation of generic classes the class symbol is extended by a smaller dashed rectangle containing the name of the type argument to be used for parametrizing the template class (Fig. 2.14 (b)). Comments, which can be associated with all kinds of symbols in UML, are represented by a rectangle as shown in Fig. 2.14 (a).

The dependencies between classes can be expressed by several line symbols as shown in Fig. 2.15. Fig. 2.15 (a) gives an example of inheritance as represented in UML. The class Derived is a subclass of Base. An association expresses a loose relationship between two classes. This can be visualized by an arrow as in Fig. 2.15 (b). The lifetimes of objects of the classes Associated and Class2 are independent of each other. How many instances of Associated might be associated with Class2 is specified by a string of the kind m..n termed multiplicity. If a star (*) is given for n, no upper limit is imposed on the number of instances. A closer relationship of two classes is composition, which is represented by an arrow having a filled diamond at the end pointing to the owning class (Fig. 2.15 (c)). The lifetime of the owned objects is bound to the lifetime of the owner. A multiplicity could also be given here.

Fig. 2.15 Example of using UML for modeling dependencies among classes. The arrow symbols are given for expressing inheritance (a), association (b), and composition (c).

In order to illustrate the main symbols briefly, Fig. 2.16 gives a graphical representation of the example of ChemReader used in Section 2.8.1. Some further refinements of UML are used here.

Fig. 2.16 Example of using UML for modeling classes.

If a class is abstract, i.e., no object can be created of that class, its name is given in italics, as are member functions that have no implementation and are intended to be overridden. This is shown for getNextRecord() of class ChemReader. This function can reasonably be implemented only if the file format is known. Therefore, the corresponding method is written in normal font in the subclasses. It is likely that the ChemReader base class is responsible for storing the Compound objects that were generated on reading. The actual creation of Compounds is in turn the responsibility of the derived classes. While the former relationship is expressed as an association with multiplicity 0..*, the latter is given by a dashed line that has not been introduced yet. This kind of relationship indicates a dependency. The type of the dependency can be given as a so-called stereotype surrounded by «...». In our example it is shown that Compounds are created by SDFReader or SMILESReader.

Furthermore, interactions between objects can be modeled using sequence diagrams. This type of diagram is related to the dynamic aspects of a system, i.e., how objects cooperate in order to achieve a certain functionality. An illustration is given in Fig. 2.17 for the example of reading an SDFile in an application.

Fig. 2.17 Example of a sequence diagram in UML.

An object is represented by a rectangle with a vertical dashed line indicating the lifetime of the object. The name of the object and the related class is shown in that rectangle. An operation performed by an object is symbolized by a thin rectangle overlaying the lifetime line. Requests to other objects are represented by an arrow ending at the lifetime line of that object and triggering its activation. A return to the calling operation is shown by a dashed back arrow. Further diagram types, e.g., use-case diagrams or object diagrams, are not described here as they have not been used in this work.

Design Patterns Most implementation techniques aim at creating software components for solving problems in a general way in order to be reusable in different applications. On the level of software design, similar problems are often encountered in different contexts. Design patterns are reusable approaches for the solution of such general problems. GAMMA et al. were the first to collect and systematize a broad range of design patterns [70]. This book was of such groundbreaking influence that the patterns published there are called gang of four (GOF) patterns (after the four authors of [70]).

2.8.3 Techniques for Implementation and Maintenance

Assertive Programming Functions, both global functions and member functions, rely on certain expectations about the incoming arguments, the state of used objects, or the state of the owned member variables. Furthermore, at some point in the code certain expectations might be crucial for accomplishing a requested functionality. In order to ensure correct program execution a developer should test whether those expectations are met. If one of these expectations is violated, the program is immediately terminated with an appropriate error message. The extensive use of assertions is termed assertive programming [71]. It is important to note that assertions must not be used for treating program exceptions, such as inappropriate user inputs. Here, the technique of throwing exceptions is the method of choice, which will not be discussed here in detail.
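The distinction between assertions and exceptions can be illustrated by a short sketch. The two functions are hypothetical examples and not part of the software described in this work. One language-specific caveat applies: in Python a failed assertion raises an AssertionError instead of terminating the program at once, and assertions are skipped when the interpreter runs with the -O option.

```python
def normalize_charges(charges, total_charge=0.0):
    """Shift all charges equally so that they sum up to total_charge."""
    # Precondition: an empty list indicates a bug in the calling code.
    assert len(charges) > 0, "normalize_charges() requires at least one atom"
    offset = (sum(charges) - total_charge) / len(charges)
    result = [q - offset for q in charges]
    # Postcondition: the function guarantees the requested total charge.
    assert abs(sum(result) - total_charge) < 1e-9, "total charge not conserved"
    return result

def parse_charge(text):
    """Convert user input into a charge value.

    Invalid user input is an expected situation at run time and is
    therefore reported by an exception, not by an assertion.
    """
    try:
        return float(text)
    except ValueError:
        raise ValueError(f"not a valid charge: {text!r}") from None

normalized = normalize_charges([0.2, 0.1, 0.0])
```

A caller can handle the ValueError of parse_charge() and ask the user for new input, whereas a failing assertion in normalize_charges() points to an error that has to be fixed in the source code.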

Testing The importance of testing software has been stressed for as long as software development has existed [72]. Nevertheless, OOP has brought about some new techniques. In OOP, classes constitute the smallest building blocks. Therefore, it is natural to test each class directly. This is called unit testing. Ideally, each class should have an associated unit test class. The importance of unit tests has been emphasized in particular by BECK, who outlined a development process based on small iterative development cycles (extreme programming (XP)) [73]. The design of a system is refined in each iteration. This continuous refinement is called refactoring [74]. The code changes required in one iteration must nevertheless not break functionality that existed in previous iterations. Therefore, existing code that might be adapted must be guarded as completely as possible by unit tests. Furthermore, all unit tests have to be run automatically and are thus included in the build process. Even though XP has more recently received some criticism (see, e.g., [75]), automated unit testing has found widespread acceptance, and programming libraries that help with setting up unit tests are available for all popular programming languages [76].
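
As an illustration of a unit test that can be run automatically, the following sketch uses the unittest module of the Python standard library. The tested function heavy_atom_count and the helper run_tests are hypothetical and serve only as an example; they are not part of any library described in this work.

```python
import unittest


def heavy_atom_count(symbols):
    """Count all atoms that are not hydrogen (hypothetical example unit)."""
    return sum(1 for symbol in symbols if symbol != "H")


class HeavyAtomCountTest(unittest.TestCase):
    """Unit test class associated with heavy_atom_count()."""

    def test_ethanol(self):
        ethanol = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
        self.assertEqual(heavy_atom_count(ethanol), 3)

    def test_hydrogen_only(self):
        self.assertEqual(heavy_atom_count(["H", "H"]), 0)

    def test_empty_structure(self):
        self.assertEqual(heavy_atom_count([]), 0)


def run_tests():
    """Run the test class and report success, as a build step would."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HeavyAtomCountTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A build script could call run_tests() and abort the build if it returns False, so that a refactoring step that breaks existing behavior is noticed immediately.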

Chapter 3

MOSES

3.1 Introduction

The aim of the present work is the development and application of computational methods for the estimation of properties of compounds. Any kind of computational procedure or algorithm requires its implementation in a computer program. Most often, the actual implementation of a calculation procedure can be achieved by writing a relatively small amount of source code in the programming language of choice. Especially when the programming is related to chemical problems – as in the current context – the larger part of such a computer program deals with general tasks such as structure input/output, representation and handling of chemical structures, or perception of structural features such as rings, conjugation, or aromaticity.

Here, we face a general problem in software development. Most software requires building blocks of code that solve problems common to different application domains and that are reusable in several contexts. For instance, in the C programming language only very basic constructs are available for representing data, though every non-trivial project requires more complex data structures such as lists, vectors, or maps. It was a frequently observed phenomenon that those data structures were implemented multiple times and existed in parallel in pure C projects [77]. In order to overcome this problem, modern programming languages like C++ provide programming libraries with general code building blocks. For the given example of data structures, C++ provides the standard template library (STL) [65]. Besides relieving developers of the burden of writing large amounts of code not directly related to the problem at hand, such programming libraries have several other advantages: all applications relying on the same library behave identically with regard to the common functionality. Fixes of programming errors (often referred to as "software bugs") apply to all applications. Furthermore, all improvements, optimizations, and new features in the library are immediately available to all related applications.

The nature of a programming library is that it cannot be used directly by the end-user but provides functionality that is reusable in specific applications. This functionality is implemented as functions or classes, each related to a given problem and accessed via a well-defined interface. An interface separates the way a function or class is used from its implementation. A user of a programming library should not be confronted with implementation details but only with details related to the usage of a functionality. The quality of a programming library strongly depends on the way the interfaces of its functions and classes are defined, i.e., how they can be used and how their combination allows one to achieve the desired task. Depending on the size and scope of a library or a set of libraries, computer scientists often use the terms programming framework or programming platform synonymously.

As with the general task of developing computer software, the implementation of calculation methods in the context of QSPR can profit substantially from a specialized programming library. Therefore, it was one of the aims of the present work to design and realize the basic functionality that assists the scientist in developing and implementing methods for the empirical prediction of properties based on the principles outlined in Section 2. The result is the MOSES programming library. MOSES is an acronym for Molecular Structure Encoding System. This name expresses the claim to provide a comprehensive programming platform for handling chemical structures, calculating structural descriptors, applying statistical analysis, deriving empirical models using mathematical and pattern recognition methods, and developing end-user applications with well-designed graphical and non-graphical user interfaces.

Moreover, MOSES was intended and planned to provide a platform for the implementation of all scientific software in our group. One central research goal has long been the representation and prediction of chemical reactivity (see, e.g., [78]). Therefore, this issue is another main focus of MOSES, though it will not be discussed further here as this part is not included in the present work. Nevertheless, with the intention of handling reactivity adequately, a new representation of chemical structures was developed some years ago [11] that was designed to overcome several shortcomings of the traditional way of representing chemical structures by connection tables. This structure representation was slightly redesigned, incorporated into MOSES, and termed RAMSES (Representation Architecture for Molecular Structures by Electron Systems) [79]. It constitutes one of the key features of MOSES. In particular, the handling of conjugated π-electron systems is much improved in RAMSES compared to systems based on connection tables. This fits ideally with the requirements of the charge calculation methods that are a central concern of the present work (see Section 4). Therefore, RAMSES will be described in Section 3.4.2.

It is obvious that a project as comprehensive as MOSES cannot be realized by a single person. Indeed, the planning, design, and realization of MOSES was a joint effort of several researchers of our group. To mention only the core development team, A. HERWIG was responsible for the part dealing with all aspects related to reactivity [79] and for the implementation of the RAMSES data structure. J. MARUSCZYK was in charge of the development of input and output of chemical data and of the sublibrary concerning mathematics such as matrix operations [80]. The author of the present work developed the parts related to the prediction of properties and the management of meta-data attached to all kinds of property data, as will be described in the present chapter.

3.2 From Script-based to Integrated Workflows

The calculation of a property normally requires several steps. For instance, the main steps in the prediction of a property based on a typical QSPR model include the calculation of molecular descriptors such as autocorrelation vectors, which may rely on other atom-based properties (see Section 2.2). If, for example, a counter-propagation neural network is used for the derivation of a model, these molecular descriptors are fed into the trained network, which gives the property value as a response. The way these steps are successively processed is called a workflow or a process chain.

Fig. 3.1 shows a typical workflow as it had to be performed prior to the development of MOSES. Each calculation module was a single stand-alone application that had to be started separately with the output of the previous application as input. Additionally, each program expected a set of command-line arguments in order to control the details of the calculation algorithm. For instance, for the calculation of a topological autocorrelation vector, AUTOCORR requires at least the maximal topological distance and the property to be autocorrelated. A certain degree of automation was achieved by applying the scripting facilities of the UNIX command line. With UNIX scripts all steps of a given workflow can be specified and performed as one command. This approach – and the implementation of single workflow steps as stand-alone applications – has several severe shortcomings that will be discussed in the following.

Fig. 3.1 Workflow for analyzing a dataset stored in a structure file with a counter-propagation neural network by three types of autocorrelation coefficients. The stand-alone applications used are given as comments at the right-hand side. Required input/output activities from and to persistent storage are indicated by I/O comments.


Meta-Data The results of a calculation module depend on the control parameters given as command-line parameters. As shown in Fig. 3.1, both topological and 3D autocorrelation vectors are calculated by the same program. Which kind of descriptor is calculated in the actual workflow depends on a command-line parameter given to AUTOCORR. In addition, a calculation algorithm can be controlled by further parameters. All this information can normally not be reconstructed from the result files, and with this kind of script-based workflow it is hard to achieve an appropriate management of these data. Secondly, a workflow step may rely on the results of another program that has not been run before. If this is the case, the entire workflow has to be aborted. If, for example, CORINA was not executed in Fig. 3.1, the calculation of 3D autocorrelation vectors would fail. With such a set-up there is no way of automatically calculating 3D coordinates when they are required. Furthermore, a workflow step may fail for several structures of the dataset to be processed. It can become difficult to track all failures over the entire workflow. Data that are related to a given procedure but are neither required as input nor produced as output are called meta-data. It is a challenge for every piece of software implementing workflows as described here to provide the facilities to manage such meta-data.

Data Flow Each stand-alone program in Fig. 3.1 reads data from a file, constructs its internal data representation in memory, and writes the data read along with the results back into a file. Each input/output (I/O) operation is marked by a comment in Fig. 3.1. For instance, the workflows with 3D autocorrelation or autocorrelation of surface properties require six single I/O operations. The main disadvantage of frequent I/O operations is that access to hard disk storage is several orders of magnitude slower than access to the internal memory (RAM). For larger workflows applied to datasets with several thousand molecules, the I/O operations can increase the overall runtime of a workflow drastically. A practical problem can arise from the differing quality of the reading and writing functions or classes of the individual programs. The processing rate of the entire workflow can only be as good as that of the worst single program with respect to the capability of reading and writing chemical data. These shortcomings can be overcome if the initial data file is read in the first step by a high-quality implementation of the required reading functionality and the internal representations of chemical data are kept in memory over the entire execution of the workflow.


Therefore, in order to provide a programming library that helps in developing software that takes into account the issues described in the previous paragraphs, MOSES was designed to be capable

• of managing meta-data
• of running calculations on demand
• of processing entire workflows in-memory
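
The three requirements can be illustrated by a minimal sketch. It is written in Python for brevity and does not reproduce the MOSES interfaces; the classes Molecule and Workflow and the two toy calculators are hypothetical stand-ins, the latter replacing a 3D structure generator and a descriptor program. A property is calculated only when it is requested, missing prerequisites are calculated automatically, the control parameters are stored as meta-data next to the value, and failures are recorded per structure while all data stay in memory.

```python
class Molecule:
    """Minimal stand-in for a structure object kept in memory."""

    def __init__(self, name, atoms):
        self.name = name
        self.atoms = list(atoms)
        self.properties = {}  # property name -> calculated value
        self.meta = {}        # property name -> how the value was obtained


class Workflow:
    """Calculates properties on demand and records meta-data."""

    def __init__(self):
        self._calculators = {}
        self.failures = []  # (molecule name, property name, error message)

    def register(self, name, function, requires=(), **parameters):
        self._calculators[name] = (function, tuple(requires), parameters)

    def get(self, molecule, name):
        """Return a property, calculating it and its prerequisites if missing."""
        if name not in molecule.properties:
            function, requires, parameters = self._calculators[name]
            inputs = [self.get(molecule, required) for required in requires]
            molecule.properties[name] = function(molecule, *inputs, **parameters)
            molecule.meta[name] = {
                "calculator": function.__name__,
                "parameters": dict(parameters),
                "requires": requires,
            }
        return molecule.properties[name]

    def run(self, molecules, name):
        """Process a dataset in memory; failures are tracked per structure."""
        results = {}
        for molecule in molecules:
            try:
                results[molecule.name] = self.get(molecule, name)
            except ValueError as error:
                self.failures.append((molecule.name, name, str(error)))
        return results


def toy_coordinates(molecule):
    """Toy replacement for a 3D structure generator: atoms on a line."""
    if not molecule.atoms:
        raise ValueError("structure has no atoms")
    return [(float(index), 0.0, 0.0) for index in range(len(molecule.atoms))]


def toy_pair_count(molecule, coordinates, max_distance):
    """Toy descriptor: number of atom pairs within max_distance."""
    xs = [x for x, _, _ in coordinates]
    return sum(
        1
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if abs(xs[i] - xs[j]) <= max_distance
    )


workflow = Workflow()
workflow.register("coordinates", toy_coordinates)
workflow.register("pair_count", toy_pair_count,
                  requires=["coordinates"], max_distance=1.0)

dataset = [Molecule("ethanol", ["C", "C", "O"]), Molecule("empty", [])]
results = workflow.run(dataset, "pair_count")
```

In this sketch the coordinates are never requested explicitly; they are generated because the descriptor declares them as a prerequisite, and the value of max_distance used for the descriptor remains retrievable from the meta-data of each molecule.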

3.3 Overcoming Limitations of Existing In-house Libraries

In our group, there is quite a long tradition of developing software in chemistry. The first version of the PEOE algorithm (see Section 2.7.2) was implemented in the PL/1 programming language more than twenty years ago [53]. The PEOE algorithm was one of the developments aimed at predicting reactivity based on physicochemical effects [29]. These efforts resulted in an expert system for reaction prediction and synthesis design called EROS [81]. Later, EROS, in version 6.0, was based on a programming library written in the FORTRAN77 programming language: liberos95. This library was designed in such a way that code for general tasks such as I/O, ring perception, etc. was separated into single functions, and it formed the basis for other stand-alone applications, e.g., PETRA 2.6 for the prediction of physicochemical properties [82] or WODCA 3.0 for synthesis planning [83].

liberos95 was designed and implemented based on the structured programming paradigm (see Section 2.8.1). As liberos95 grew steadily, it was realized that extending this library became increasingly difficult due to the shortcomings of the structured programming paradigm. Furthermore, a new representation of chemical structures [11] derived from principles of a molecular orbital treatment (see Section 3.4.2) was intended to form the basis of a redevelopment of the EROS system [84]. The required modification of the basic data structures in liberos95 was considered hardly achievable, and consequently a new programming library was developed on the basis of the OOP paradigm [85]. This library – Representation of Inorganic, Coordinative, and Organic Structures (RICOS) – formed the basis of the stand-alone programs PETRA 3 and EROS 7.0.

On the other hand, other projects were undertaken in parallel based on a different programming environment that evolved from the programming of the 3D structure generator CORINA [30]: libcorina. For instance, the stand-alone programs AUTOCORR 2.4 for calculating autocorrelation vectors and SURFACE 1.1 for the calculation of surface properties were based on libcorina. This library was again written with structured programming and therefore suffered from the known shortcomings; its structure representation was basically identical to that of liberos95.

A further library – CACTVS (Chemical Algorithms Construction, Threading, and Verification System) – was developed in our group by IHLENFELDT [86]. Some stand-alone programs were based on this library, e.g., SONNIA 4 [34] for information analysis by self-organizing neural networks or parts of WODCA 5 [87]. This library had a similar intention as MOSES. Nevertheless, the code was based on an extension of the C language in order to apply object-oriented techniques in the implementation. Furthermore, the code was hard to understand and hard to maintain, with little documentation. Thus, this library was not an alternative to the development of MOSES, even though some applications had been realized in our group based on CACTVS.

Hence, in order to end the divergence of programming environments in our group, it was necessary to create a new basis for further developments, and we decided to develop a new programming library: MOSES. In summary, the objectives of this new library were

• general applicability in order to provide a unified programming environment for all research activities, covering both modeling of structural properties and modeling of reactivity

• support of both the classical structure representation based on connection tables (valence bond (VB) representation) and a representation based on molecular orbital theory (MO representation)

• application of modern software development techniques in order to ensure simplicity, generality, and high maintainability of the code

3.4 Representation of Chemical Structures

The representation of chemical structures is one of the most basic concerns in chemoinformatics and computational chemistry. An appropriate handling of chemical structures is the prerequisite for explaining chemical phenomena and for obtaining a good prediction model.


The essential information about chemical structures consists of the type of atoms in a compound, i.e., an atom’s position in the periodic table defining its core charge and its electron shellᵃ, and the way these atoms are interconnected. For specifying different configurations or conformers, atomic coordinates in space also have to be considered.

On the other hand, chemists are used to interpreting a high degree of implicit information contained in the way structures are represented. The trivial name of a compound, for instance, can normally be interpreted correctly and resolved to all atoms and electrons involved in a structure even though no explicit information about atoms and electrons is given. At the other end of the range of sophistication, in quantum chemical applications, compounds – or more precisely, molecular electronic structures – are encoded in a highly accurate way. Atomic orbitals are expressed numerically as linear combinations of primitive functions such as Gaussian functions, molecular orbitals are treated as linear combinations of atomic orbitals, and different configurations are linear combinations of single wavefunctions. Thus, along with the atomic coordinates, the output files of quantum chemical programs often consist of long columns of numbers resulting from finding the best sets of linear coefficients.

Chemoinformatics requires the handling of structural information on an intermediate level of sophistication. A structure representation must encode basic structural features – atoms and bond types, free electrons, and the connectivity of atoms – but needs to be compact and simple enough to allow the handling of large collections of compounds of up to several million structures. The traditional way of encoding compounds in chemoinformatics is the use of connection tables, which, however, have some crucial shortcomings. MOSES employs an alternative structure representation that was developed in our group in order to overcome these shortcomings [11]. Since the way molecules and especially conjugated π-electrons are treated is of central importance to the calculation of physicochemical properties, the classical connection table, its shortcomings, and the data structure employed in MOSES will be discussed briefly here. A comprehensive survey of representations of chemical structures is given in [5].

ᵃ More precisely, along with the core charge, which is determined by the number of protons in the nucleus, the number of neutrons has to be known, giving the isotope of an element. This is ignored here as isotopes are not considered in the current context.


3.4.1 Connection Tables

A connection table basically specifies the topology, i.e., the constitution, of a compound by a list of atoms, giving for each atom its element and its number of free electrons, and a list holding all bonds and bond orders. This information can be enhanced by further details such as the three-dimensional coordinates of the atoms. The topology of a compound is defined by its σ-bond framework. Problems arise with the encoding of π-electrons when these are conjugated and delocalized over more than two atomic centers. In order to store such delocalized bonds in a list holding only two-center bonds, the conjugated π-electrons have to be localized into two-center two-electron π-bonds (π-2c2e-bonds). This cannot be achieved in an unambiguous manner. Therefore, compounds with conjugated π-systems can have more than one valid connection table representation. This is illustrated in Fig. 3.2 with the example of benzene.

Fig. 3.2 Benzene having six delocalized π-electrons. The representation with three π-2c2e-bonds can only be achieved by two equivalent connection tables.
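
The ambiguity can be made concrete with a small sketch in Python, used here only for illustration. The data layout (an atom list plus a mapping from atom pairs to bond orders) is an assumption chosen for brevity and does not correspond to a particular file format; hydrogen atoms are omitted. Both tables describe the same σ-framework and give every carbon atom the same sum of bond orders, yet they are not identical.

```python
# Benzene: six carbon atoms (hydrogen atoms omitted), ring bonds i-(i+1).
atoms = ["C"] * 6
ring_bonds = [frozenset((i, (i + 1) % 6)) for i in range(6)]

# Two connection tables that differ only in where the double bonds sit.
kekule_a = {bond: (2 if i % 2 == 0 else 1) for i, bond in enumerate(ring_bonds)}
kekule_b = {bond: (1 if i % 2 == 0 else 2) for i, bond in enumerate(ring_bonds)}


def bond_order_sums(n_atoms, bond_table):
    """Sum of the bond orders at each atom of a connection table."""
    sums = [0] * n_atoms
    for bond, order in bond_table.items():
        for atom in bond:
            sums[atom] += order
    return sums


same_topology = set(kekule_a) == set(kekule_b)  # identical sigma-framework
same_table = kekule_a == kekule_b               # but different bond orders
```

A program comparing two structures on the basis of such tables therefore has to recognize both variants as the same compound, which is the practical cost of localizing the π-electrons.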

As can be seen from Fig. 3.2, the concept of a connection table is equivalent to the structure representation obtained by valence bond (VB) theory. In terms of VB theory, each bond is localized between two atoms and has a certain bond order. Representing structures with delocalized π-electrons by several connection tables corresponds to the concept of resonance structures in VB theory.

For the prediction of properties of compounds, the handling of delocalized π-systems by a set of resonance structures is often inappropriate. Especially when heteroatoms are involved in π-systems and electronic properties such as M-effects are to be modeled, connection tables reach their limits. For instance, the calculation of π-charges by the PEPE algorithm as described in Section 2.7.3 requires the generation of all possible resonance structures.

When looking for a better structure representation for the modeling of electronic properties, the Hückel Molecular Orbital (HMO) theory suggests itself. The HMO theory provides an appropriate treatment of delocalized electron systems but is simple enough to be applied to large compound collections. The structure representation that was derived from these considerations is described briefly in the next section.

3.4.2 RAMSES – A Structure Representation Based on σ/π Separation

The basic assumption in the HMO theory is the orthogonality of the σ- and π-electron frameworks, i.e., the two types of electron systems do not interact in a zero-order approximation [88] (see Section 4.2.1.1).

A structure representation based on a separate treatment of σ- and π-electrons was suggested by BAUERSCHMIDT and GASTEIGER in order to overcome the limitations of traditional connection tables [11]. This approach has been reworked and incorporated into MOSES by A. HERWIG and termed RAMSES (Representation Architecture for Molecular Structures by Electron Systems) [79]. Since RAMSES neither deals with bonds in terms of VB theory nor gives a real representation of molecular orbitals having different energy levels and nodal properties, the notion of electron systems was introduced in [11].

σ-Electron Systems Single bonds and the σ-part of unsaturated bonds are represented by two-center electron systems that normally hold two σ-electronsᵇ. This is shown in Fig. 3.3 for the central bond in ethane.

Fig. 3.3 Example of the representation of single bonds and the σ-part of unsaturated bonds by σ-electron systems in RAMSES.

σ-Electron systems will be depicted in the remainder of this section as solid lines as in normal Lewis formulas.

π-Electron Systems π-Electron systems can span more than two atom centers in order to represent delocalized π-systems, but they are defined in a broader sense than normally conceived by organic chemists. Along with the π-part of classical unsaturated bonds (see also the next paragraph), vacant orbitals, radicals, and free electron pairs are also represented by π-electron systems. Fig. 3.4 gives examples of these cases.

ᵇ Ionization of a single bond can be represented by adding or removing one σ-electron.


Fig. 3.4 π-Electron systems in RAMSES: along with the π-part of unsaturated bonds (a) vacant orbitals, (b) singly occupied orbitals, i.e., radical centers, and (c) free electron pairs are treated as π-electron systems.

Moreover, for the sake of a general treatment, d-orbitals are also represented as π-electron systems. As d-orbitals are generally not involved in the bonding of main group elements (see Section 3.4.3), this is only of minor relevance. For the representation of bonding in transition metals, a different type of electron system was defined in [11], but it will not be discussed further here. The advantage of such a general treatment of π-electrons is that delocalized π-systems can be perceived from valence-bond-based structure representations.

Conjugation Generally, chemical structures are stored in computer systems in file formats based on the concept of connection tables (see above). One of those, the MDL SDFile format [89], is ubiquitous. Normally, information about the conjugation of electron systems is not provided by these file formats (even though SDFiles know an aromatic bond type). Therefore, after a structure has been read from a file, all bonds are stored internally as localized two-center bonds. In order to arrive at delocalized multi-center π-electron systems, all localized π-electron systems (in the definition given above) that are adjacent to each other are combined into one electron system. Often a situation is met where more than one π-system can potentially be conjugated, as found, e.g., with triple bonds, sp carbon centers as in allenes, or multiple free electron pairs. Here, the largest possible π-system always has priority over other combinations.

The algorithm that is implemented in MOSES as explained in [79] works iteratively and can be summarized briefly as follows: (i) the largest π-system is searched; (ii) the second largest π-system is searched that is connected to the system found in the first step through a σ-bond but has no atom in common with it; (iii) the electron system found in the second step is merged with the one found in (i) and erased. Steps (i) to (iii) are repeated until all π-systems in a molecule have been tested for conjugation. By giving the largest π-system priority in each step of this procedure, and through the condition that a π-system cannot be conjugated if it already has an atom in common with the system it would be merged with, it is ensured that orthogonal electron systems are represented correctly. For instance, the free electron pair of the nitrogen atom in pyridine is correctly considered to be independent of the delocalized ring π-system. This is shown in Fig. 3.5 (a). Furthermore, Fig. 3.5 gives examples of the conjugation of free electrons of the oxygen atom in phenol with the ring π-system (b) and of the RAMSES representation of acrylonitrile (c). The latter example illustrates the conjugation of a triple bond that has one π-system orthogonal to the one that is conjugated and – here in the case of a triple-bonded nitrogen – one additional π-system orthogonal to the former ones representing the free electron pair.
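
Steps (i) to (iii) can be sketched as follows. This is a simplified illustration in Python under stated assumptions, not the implementation in MOSES: π-systems are reduced to sets of atom labels, electron counts are not tracked, ties between equally large systems are broken by input order, and the special handling of d-orbitals is left out. The function name conjugate and the atom labels are hypothetical.

```python
def conjugate(pi_systems, sigma_bonds):
    """Merge localized pi-systems into delocalized ones.

    pi_systems:  iterable of atom collections, one per localized pi-system
    sigma_bonds: iterable of atom pairs forming the sigma-framework
    """
    bonds = {frozenset(bond) for bond in sigma_bonds}
    pending = [frozenset(system) for system in pi_systems]
    finished = []

    def linked(first, second):
        # True if a sigma-bond joins an atom of one system to an atom of the other.
        return any(frozenset((a, b)) in bonds for a in first for b in second)

    while pending:
        # (i) take the largest pi-system not yet processed
        current = max(pending, key=len)
        pending.remove(current)
        while True:
            # (ii) candidates are sigma-linked to it but share no atom with it
            candidates = [s for s in pending
                          if current.isdisjoint(s) and linked(current, s)]
            if not candidates:
                break
            partner = max(candidates, key=len)
            # (iii) merge the partner into the current system and erase it
            pending.remove(partner)
            current = current | partner
        finished.append(current)
    return finished


# Pyridine: three localized pi-bonds plus the lone pair on nitrogen.
pyridine_sigma = [("N1", "C2"), ("C2", "C3"), ("C3", "C4"),
                  ("C4", "C5"), ("C5", "C6"), ("C6", "N1")]
pyridine = conjugate(
    [("N1", "C2"), ("C3", "C4"), ("C5", "C6"), ("N1",)],
    pyridine_sigma,
)

# Phenol: three localized pi-bonds plus two lone pairs on oxygen.
phenol_sigma = [(f"C{i}", f"C{i % 6 + 1}") for i in range(1, 7)] + [("C1", "O7")]
phenol = conjugate(
    [("C1", "C2"), ("C3", "C4"), ("C5", "C6"), ("O7",), ("O7",)],
    phenol_sigma,
)
```

In this sketch the nitrogen lone pair of pyridine stays separate because it shares the atom N1 with the ring system, whereas one of the two oxygen lone pairs of phenol is merged into the ring system and the other one remains orthogonal to it.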


Fig. 3.5 Representation of delocalized π-electron systems in RAMSES of (a) pyridine, (b) phenol, and (c) acrylonitrile.

The basic algorithm for conjugation published in [79] had to be refined to work correctly with atoms having d-orbitals. The most important modification concerns the correct handling of hypervalent atom types. This will be described later (Section 3.4.3). As said above, d-orbitals are handled as π-electron systems. In the initial approach they were incorrectly conjugated with free electron pairs of adjacent atoms; see, for example, methanesulfenic acid (Fig. 3.6). It was found that both free electron pairs of the oxygen atom were conjugated with empty d-orbitals of the central atom. As will be discussed in detail below, such an orbital interaction can be neglected and must therefore be suppressed in the RAMSES representation.

Fig. 3.6 Delocalization of free electrons into vacant d-orbitals has to be suppressed. This is shown for the example of methanesulfenic acid (lone pairs on sulfur omitted for simplicity).

Closing this section, it has to be pointed out again that the notion of an electron system in RAMSES does not include any kind of information about symmetry, energy, or nodal properties. It is therefore somewhat misleading to speak of an MO-based structure representation. RAMSES electron systems do not represent molecular orbitals by any means. Rather, an electron system constitutes one potential combination of atom-based orbitals with an assumed correct symmetry – this combination is readily found for σ-systems but can often be derived for conjugated π-electron systems only by using heuristics. In this sense the RAMSES structure representation is equivalent to the first two steps in a Hückel MO treatment: (i) σ/π-separation, (ii) linear combination of atomic orbitals with π-symmetry.

Nevertheless, the RAMSES representation of electron systems is ideally suited for an HMO treatment in order to gain information about the properties of the (π-) molecular orbitals, which are then attached to the corresponding π-electron systems.

3.4.3 Extension to Hypervalent Atom Types

Treating only atoms of the first row allows a strict application of the classical octet rule formulated by LEWIS nearly one hundred years ago [90]. This is due to the fact that the first row lacks valence d-orbitals that may contribute to bonding. Elements of the second rowᶜ do have valence d-orbitals, tend to expand their coordination sphere, and thus formally seem to violate the octet rule. This is illustrated in Fig. 3.7 (a). The sulfur atom in methanesulfonic acid formally has two single and two double bonds. Such a valency can only be achieved when d-orbitals are utilized. Traditionally, textbooks give a d²sp³ hybridization for the sulfur atom [91]. On the other hand, the sulfur atom could also obey the octet rule when dipolar bonds to the two formally double-bonded oxygen atoms are assumed, resulting in a twofold positive charge on the sulfur atom and a negative charge on each of the two oxygen atoms (Fig. 3.7 (b)). In terms of resonance theory such a formal charge separation is considered to be unfavorable, and the uncharged resonance structure (a) would intuitively be assigned a higher weight. Thus, most chemists would prefer the uncharged form (a) when such a group has to be written.

ᶜ The discussion here will concentrate only on the second row as phosphorus and sulfur are of special interest. The situation in heavier elements may be different due to higher effective core charges and shielding of inner electrons.


Fig. 3.7 Resonance structures of methanesulfonic acid. (a) Sulfur atom with a valency of six with participation of 3d-orbitals (d²sp³ hybridization), (b) the sulfur atom obeys the octet rule by forming dipolar S–O bonds.

The question is therefore if d-orbitals do play a role in the bonding of elements of the second-row and to which extent if so. And practically, how this can be represented within the RAMSES structure representation scheme. An extensive discussion about this issue and experimental and theoretical evidence pro and contra the participation of d-orbitals in the bonding of is given by HUHEEY [92]. Yet, a final statement is avoided by this author. SCHLEYER and REED reviewed the theoretical work done [93] in that field and came to the clear conclusion that d-orbitals do not play a significant role as valence orbitals. They argue that total d-orbital occupancies obtained from various population analysis methods such as natural population analysis (see Section 4.2.5.1) reach at most 0.3 e and that d-orbitals act rather as polarization functions required as acceptor orbitals for back-donation from ligand atoms. As main source for de- localization negative hyperconjugation from free electron pairs on the oxygen atoms into geminal σS∗X orbitals was identified. WEINHOLD and co-workers have developed a framework for analyzing molecular wave- functions in the view of localized Lewis-type structures [94,95]. The most prominent method of this tool-set is the natural population analysis that will extensively be used in Chapter 4. In a second step of this procedure, natural bond orbitals (NBOs) can be derived expressing a structure in terms of localized two-center bonds and free electron pairs called Lewis-type orbitals and two-center antibonds and vacant orbitals called non-Lewis-type orbitals. Delo- calization is expressed by donation from Lewis-type into non-Lewis-type orbitals. In order to gain a more detailed insight into the bonding situation in sulfonic acid groups a natural population and a natural bond order analysis was performed on methanesulfonic


acid. The calculations were done at the B3LYP/6-311+G** level of theory, using the structure fully optimized at the same level, with the NBO v3.1 program incorporated in the GAUSSIAN 98 package [96]. In accordance with the findings reported in [93], the overall occupation of the d-orbitals on the sulfur atom was 0.16 e, ranging from 0.019 e (dxy) to 0.045 e (dz²) for the individual orbitals. The Lewis structure found by the NBO analysis shows two oxygen atoms with three free electron pairs each, with corresponding NPA charges of 2.23 on the sulfur atom and -0.88/-0.90 on the oxygen atoms, in agreement with Fig. 3.7 (b).

Delocalization can also be reproduced with the NBO analysis. The non-Lewis-type orbitals that have a significant occupation are the σ* orbitals of the S – CH3, the S – OH, and the two S – O-bonds with 0.17 e, 0.32 e and 0.14 e, respectively. This is complemented by two significantly depleted lone pairs on each of the singly bonded oxygen atoms with occupations of about 1.78 e.

Of further importance with regard to the representation in RAMSES is the extent of conjugation of a sulfonic acid group with a π-electron system. Benzenesulfonic acid is used as a model compound for the discussion of this question. The bonding pattern in the sulfonic acid group is found to be very similar to that of the isolated sulfonic acid group in methanesulfonic acid. In resonance theory, the sulfonic acid group is classified as a −M-substituent and should decrease the electron density in the phenyl ring. Therefore, the occupations of all ring

π-type orbitals (π- and π*-bonds of the ring) were summed up, resulting in a total occupancy of the ring π-system of 5.97 e. In order to relate this value to other benzene derivatives, the same procedure was carried out for benzoic acid and aniline, giving ring π-occupations of 5.92 e and 6.13 e, respectively. This also fits well into the picture of resonance theory, as a carboxy group is also a −M-substituent and an amino group is a strong +M-substituent. Thus, even though a delocalization of 0.03 e out of the ring seems small in absolute terms, it is of the same order of magnitude as in other benzene derivatives. Nevertheless, due to the small absolute value it is difficult to track down the interaction of the benzene ring with the sulfonic acid group.

We will thus base our representation of the sulfonic acid group on the following assumptions: (i) the two S – O-bonds are dipolar with the sulfur atom bearing a twofold positive formal charge; (ii) one conjugated π-electron system spans all heavy atoms in the sulfonic acid group; the type of the contributing orbitals is not explicitly specified, and they are assumed to have the correct symmetry; (iii) the ring π-system is conjugated with the sulfonic acid group; the interacting orbitals are not explicitly specified. Therefore, RAMSES will represent benzenesulfonic acid by one π-electron system spanning all heavy atoms as shown in Fig. 3.8.

Fig. 3.8 RAMSES representation of benzenesulfonic acid. One π-electron system with 12 π-electrons spans all heavy atoms.

It has been argued that 3d-orbitals may participate in the valence bonding of second-row elements if an atom is bound to strongly electronegative atoms. Due to an increasing effective core charge, the orbitals shrink, become more compact and can overlap more efficiently with s/p-orbitals [92]. Since this assumption cannot be supported theoretically even for a sulfonic acid group, where a sulfur atom is bound to three oxygen atoms, we generalize these findings to the rule that d-orbitals are not considered for the bonding of main group elements in RAMSES. Therefore, as long as only main group elements are considered, the RAMSES representation of electron systems can stick to the simple s/p-orbital picture. Consequently, all hypervalent species such as sulfoxides, phosphane oxides, or phosphonic acids will be represented with dipolar bonds and formally positively charged central atoms, and with π-electron systems reflecting conjugation in such systems as shown for benzenesulfonic acid.

3.4.4 Perception of Aromaticity based on RAMSES

Closely related to the delocalization of π-electrons is the concept of aromaticity. Similar to electronegativity (see Section 2.7.1), aromaticity is not strictly defined but is one of the basic concepts in organic chemistry, aiding in the assessment of properties and reactivity. Because a clear definition is missing, the perception of aromaticity with general applicability


is a difficult task. Quantitative measures based on ab initio calculations (e.g., [97] or [98]) are available but cannot be applied routinely in the present context. The simplest approach to the perception of aromaticity is based on the Hückel rule. It states that a planar, cyclic conjugated π-system is aromatic when it is occupied by (4n + 2) π-electrons, where n may be any non-negative integer (n = 0, 1, 2, ...). Since the RAMSES data structure represents delocalized π-electron systems in a natural way, only information about the ring systems in a molecule is additionally required for the perception of aromaticity in MOSES based on the Hückel rule. MOSES provides a calculation module for the derivation of the smallest set of smallest rings (SSSR) that was incorporated from the CORINA library [99].

With this information available, an algorithm has been developed for the perception of aromaticity as outlined in Fig. 3.9. In the first steps, after reading structures from connection-table-based files, the RAMSES data structure is generated and the SSSR is detected. All ring fragments are stored in a list and searched for larger condensed ring substructures in order to recognize structures such as naphthalene.

In the main step, the π-electrons belonging to a given ring that are in cyclic conjugation are detected. This is not a trivial task. It is not sufficient to simply count all π-electrons that a ring atom contributes to the π-system. If this were done, structures like maleic anhydride would be found to be aromatic. In order to prevent such misclassifications, exocyclic double bonds have to be taken into account. Thus, ring systems with a formal π-sextet such as maleic anhydride or radialene can be handled correctly (Fig. 3.10).

The advantage of the RAMSES data structure over connection tables becomes especially evident when handling charged compounds with aromatic ring systems. It is not required to generate any resonance structure for the perception of aromaticity. The same algorithm as before can be applied. A π-center bearing a formal positive charge does not contribute a π-electron to the π-electron system, whereas a negatively charged π-center contributes two π-electrons. Thus, aromatic ring systems in charged molecules can easily be detected (Fig. 3.11).

Finally, due to the search for condensed ring systems prior to single rings, structures having a π-system that spans several condensed rings and fulfills the Hückel rule are correctly classified as aromatic (Fig. 3.12).
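The counting rules described above can be illustrated by a minimal C++ sketch. It is not the MOSES implementation: the struct RingPiCenter, its fields and both functions are hypothetical names introduced here for illustration. Treating a ring atom with an exocyclic double bond as contributing no electron to the cyclic conjugation is a simplifying assumption; the text only states that such bonds have to be taken into account.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified description of one ring atom (not the MOSES API).
struct RingPiCenter {
    int neutralContribution;   // pi-electrons of the uncharged center: 1 for an sp2 carbon, 2 for a lone pair
    int formalCharge;          // formal charge on the pi-center
    bool exocyclicDoubleBond;  // true if the center carries a double bond pointing out of the ring
};

// Counts the pi-electrons that take part in the cyclic conjugation of one ring.
int cyclicPiElectrons(const std::vector<RingPiCenter>& ring) {
    int count = 0;
    for (const RingPiCenter& center : ring) {
        if (center.exocyclicDoubleBond) continue;               // assumption: electron is tied up exocyclically
        if (center.formalCharge > 0) continue;                  // positively charged center: no pi-electron
        if (center.formalCharge < 0) { count += 2; continue; }  // negatively charged center: two pi-electrons
        count += center.neutralContribution;
    }
    return count;
}

// Hueckel rule: 4n + 2 pi-electrons with n = 0, 1, 2, ...
bool fulfillsHueckelRule(int piElectrons) {
    return piElectrons >= 2 && (piElectrons - 2) % 4 == 0;
}

// Usage example: benzene versus maleic anhydride.
void hueckelExample() {
    const std::vector<RingPiCenter> benzene(6, RingPiCenter{1, 0, false});
    assert(fulfillsHueckelRule(cyclicPiElectrons(benzene)));

    // Ring atoms of maleic anhydride: C(=O), CH, CH, C(=O), O
    const std::vector<RingPiCenter> maleicAnhydride = {
        {1, 0, true}, {1, 0, false}, {1, 0, false}, {1, 0, true}, {2, 0, false}};
    assert(!fulfillsHueckelRule(cyclicPiElectrons(maleicAnhydride)));
}
```

Under these assumptions the formal π-sextet of maleic anhydride is reduced to four electrons in cyclic conjugation, so the ring is not classified as aromatic, while charged rings need no special treatment beyond the two charge rules.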


[Flow chart]
1. start
2. input of connection table
3. generation of RAMSES
4. ring perception (SSSR)
5. store SSSR in sssrlist
6. search sssrlist for condensed rings
7. store condensed rings and remaining single rings in ringlist
8. ring := ringlist.begin()
9. nr of π-electrons in ring = 4n+2?
   YES: store ring in aromrings
   NO: ring is a condensed system?
       YES: remove ring from ringlist, store ring fragments instead
       NO: continue
10. ring := ringlist.next()
11. ring = ringlist.end()?
    NO: back to step 9
    YES: stop

Fig. 3.9 Flow chart of the algorithm for the perception of aromaticity within MOSES based on the RAMSES data structure.
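The main loop of Fig. 3.9 can be sketched in C++ as follows. This is an illustration only, not the MOSES code: the types Ring and RingEntry and the function names are hypothetical, the π-electron count of each ring is assumed to be known already, and "store ring fragments instead" is interpreted here as appending the fragments of a non-aromatic condensed system to the end of the work list.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical ring record: a label and the pi-electrons in cyclic conjugation.
struct Ring {
    std::string name;
    int piElectrons;
};

// Entry of ringlist; fragments is non-empty only for condensed ring systems.
struct RingEntry {
    Ring ring;
    std::vector<Ring> fragments;
};

bool isHueckelCount(int piElectrons) {
    return piElectrons >= 2 && (piElectrons - 2) % 4 == 0;
}

// Returns the names of all rings classified as aromatic (the aromrings list).
std::vector<std::string> perceiveAromaticRings(std::deque<RingEntry> ringlist) {
    std::vector<std::string> aromrings;
    while (!ringlist.empty()) {
        const RingEntry entry = ringlist.front();
        ringlist.pop_front();
        if (isHueckelCount(entry.ring.piElectrons)) {
            aromrings.push_back(entry.ring.name);        // 4n+2 electrons: aromatic
        } else {
            // Condensed system that fails the test: examine its single rings instead.
            for (const Ring& fragment : entry.fragments) {
                ringlist.push_back(RingEntry{fragment, {}});
            }
        }
    }
    return aromrings;
}

// Usage example: naphthalene is accepted as a whole (10 pi-electrons).
void ringListExample() {
    std::deque<RingEntry> ringlist;
    ringlist.push_back(RingEntry{Ring{"naphthalene", 10}, {Ring{"ring A", 6}, Ring{"ring B", 6}}});
    const std::vector<std::string> aromrings = perceiveAromaticRings(ringlist);
    assert(aromrings.size() == 1 && aromrings[0] == "naphthalene");
}
```

Because condensed systems are tested before their single rings, a system such as naphthalene is stored once as a whole, and the fragments are only inspected when the condensed system itself does not fulfill the Hückel rule.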


Fig. 3.10 Non-aromatic structures with rings having a formal π-electron-sextet.

Fig. 3.11 With the RAMSES representation of π-electron systems, charged aromatic compounds are easily classified correctly.

Fig. 3.12 Azulene and derivatives are classified correctly to be aromatic.


3.5 Architecture and Implementation

A well-designed software system is generally characterized by a modular layout. A further principle is to organize modules in hierarchically composed layers. The design of the MOSES system follows these principles. Fig. 3.13 shows the basic architecture of MOSES.

Fig. 3.13 Basic architecture of the MOSES library.

The core of the system is constituted by RAMSES, the representation of molecular structures. Here, all types of chemical entities have their class counterparts, such as representations for compounds, atoms and electron systems, together with a class for the generation and modification of such chemical entities. The ring embracing the core contains sublibraries designated for distinct tasks. All classes of this second ring have full access to the core and to the other members of that ring (some sublibraries that are of no interest in the current context, such as the parts dealing with reactivity, have been omitted here). The outer compartment is the application level. Applications have full access to the functionality of the inner parts of the MOSES library.

The sublibraries concerning the management of properties and calculators are of special interest here because they have been developed within the scope of the present work. These parts are discussed in more detail in Section 3.6. In the present section, the remaining parts of the MOSES library are briefly described, concentrating on the functionality related to the problem of property prediction. A comprehensive documentation of each class of the MOSES library can be found in the MOSES Programmer’s Guide [100] and the API (application programming interface) documentation [101].

Before going into detail on certain sublibraries of MOSES, the choice of the programming language used for the implementation has to be discussed. As stated before, the object-oriented (OO) programming paradigm was to be applied for design and implementation. A broad variety of programming languages support the OO paradigm, such as EIFFEL, SMALLTALK, JAVA, or C++ [102]. However, most of them did not find widespread use. Therefore, only C++ and JAVA were actually at our disposal. With respect to clarity and simplicity of the language’s design, JAVA would have been the language of choice [103]. Unfortunately, JAVA is known to have deficiencies with respect to performance, especially in the execution of complex calculation algorithms [72]. In the field of chemoinformatics, applications often have to process large amounts of data, and good performance is crucial. Therefore, we decided to implement MOSES in C++, mainly for the advantage in execution speed.

RAMSES – Chemical Objects The RAMSES data structure is implemented through a set of classes representing chemical entities. The common characteristic of a chemical entity is the existence of attached properties. Therefore, all chemical entities have the same base class: ChemObj. This class is a main abstraction with respect to property management and will be discussed below in detail. The other classes are derived naturally from their model counterparts: Compound, Atom, and ElecSys. Class ElecSys is the abstraction for electron systems in general and is specialized into SigmaSys and PiSys for representing the two different types of electron systems.
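The inheritance relationships named in this paragraph can be sketched in a few lines of C++. Only the class names are taken from the text; all members (symbol, centerCount, electrons, isPi) and the representation of an electron system as a list of atom indices are assumptions made for this illustration and do not reflect the actual MOSES interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Common base class of all chemical entities; property management would live here.
class ChemObj {
public:
    virtual ~ChemObj() = default;
};

class Atom : public ChemObj {
public:
    explicit Atom(std::string symbol) : symbol_(std::move(symbol)) {}
    const std::string& symbol() const { return symbol_; }
private:
    std::string symbol_;
};

// General electron system: a number of electrons distributed over atom centers.
class ElecSys : public ChemObj {
public:
    ElecSys(std::vector<std::size_t> centers, int electrons)
        : centers_(std::move(centers)), electrons_(electrons) {}
    virtual bool isPi() const = 0;
    std::size_t centerCount() const { return centers_.size(); }
    int electrons() const { return electrons_; }
private:
    std::vector<std::size_t> centers_;  // indices into the atom list of a compound (assumption)
    int electrons_;
};

class SigmaSys : public ElecSys {
public:
    using ElecSys::ElecSys;
    bool isPi() const override { return false; }
};

class PiSys : public ElecSys {
public:
    using ElecSys::ElecSys;
    bool isPi() const override { return true; }
};

class Compound : public ChemObj {
public:
    std::vector<Atom> atoms;
};

// Usage example: the pi-system of benzene, six electrons over six centers.
void ramsesExample() {
    PiSys benzenePi({0, 1, 2, 3, 4, 5}, 6);
    const ChemObj& asChemObj = benzenePi;  // every chemical entity is a ChemObj
    assert(dynamic_cast<const ElecSys*>(&asChemObj) != nullptr);
    assert(benzenePi.isPi() && benzenePi.electrons() == 6);
}
```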

Stores – Structure Input/Output For the input and output of chemical data a series of Stores are available. A store is the abstraction for the reading/writing functionality in MOSES. Stores should provide a unified mechanism for accessing chemical data from a persistent memory source (non-volatile memory that stays intact after shut-down), regardless of the specific type of memory, e.g., a flat file stored on a hard disk or a database connection over a network. Subclasses are available for reading (InputStore) and writing (OutputStore) of both structural and property data.

Statistics and Neural Networks A tool-set comprising the most important statistical methods and neural networks is crucial for model building and evaluation as described in Section 2. Tab. 3.1 lists the functionality provided by the corresponding MOSES sublibraries.

Tab. 3.1 Functionality available within the statistics and neural network sublibraries in MOSES.

Class (cl) or function (f)   Description
LinearRegression (cl)        linear regression between an x and y vector of variables
MLRA (cl)                    multiple linear regression analysis
pca (f)                      principal component analysis
pls (f)                      partial least squares analysis
KohonenNet (cl)              self-organizing neural network
BPGNetwork (cl)              feed-forward neural network with back-propagation of errors

Feed-forward neural networks are of special interest for the modeling of non-linear relationships (see Section 2.4.2). A design for this network type was developed within the scope of the present work. Parts of its implementation were done by J. ZHANG. It is discussed in more detail here as it gives a good example of the principles of object-oriented design. The main classes and their relationships are depicted in Fig. 3.14. The central class is FFNetwork. It manages the network architecture as well as the training and validation datasets, and has functions for training and prediction. Feed-forward networks are structured in different layers. This is abstracted by the class FFLayer that consists of a set of FFNodes. The behavior of a certain layer is determined by the type of its nodes. For each kind of neuron one subclass of FFNode is available. The concrete neuron types implement their expected behavior in the virtual functions stimulate and adaptWeights. Each node may be connected to several other nodes. This is expressed through the class Connection that holds references to the source and sink neuron.

Each type of neuron generates an output signal from the incoming signals. This transformation can be achieved by various transfer functions. Internally, an abstract base class is applied. Several subclasses implement the most frequently used transfer functions: sigmoidal, hard-limiter and threshold logic (see also [37]). With such a design the system can readily be extended by new types of transfer functions.

A similar strategy was chosen for the learning algorithm. Internally, the network uses only an abstract base class LearningAlgorithm. The actual learning is achieved by concrete subclasses. The default is the back-propagation of errors (class BPGOfErrors), but any other type of learning can easily be added.

Fig. 3.14 UML diagram showing the main classes required for the implementation of a feed-forward neural network. Different learning algorithms and transfer functions can easily be implemented by the use of abstract function classes.
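The idea of hiding transfer functions behind an abstract function class can be sketched as follows. The class names TransferFunction, Sigmoidal, HardLimiter, ThresholdLogic and SketchNode are hypothetical (the text does not name the abstract base class), and the formulas are the common textbook forms of these functions, assumed here rather than taken from MOSES.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <utility>

// Abstract function class: maps the net input of a neuron to its output signal.
class TransferFunction {
public:
    virtual ~TransferFunction() = default;
    virtual double operator()(double net) const = 0;
};

class Sigmoidal : public TransferFunction {
public:
    double operator()(double net) const override { return 1.0 / (1.0 + std::exp(-net)); }
};

class HardLimiter : public TransferFunction {
public:
    double operator()(double net) const override { return net >= 0.0 ? 1.0 : 0.0; }
};

// Linear between 0 and 1, clipped outside that range.
class ThresholdLogic : public TransferFunction {
public:
    double operator()(double net) const override {
        return net < 0.0 ? 0.0 : (net > 1.0 ? 1.0 : net);
    }
};

// Simplified node: it only knows the abstract interface, never a concrete function type.
class SketchNode {
public:
    explicit SketchNode(std::shared_ptr<const TransferFunction> transfer)
        : transfer_(std::move(transfer)) {}
    double stimulate(double net) const { return (*transfer_)(net); }
private:
    std::shared_ptr<const TransferFunction> transfer_;
};

// Usage example: exchanging the transfer function does not touch the node class.
void transferFunctionExample() {
    const SketchNode sigmoidNode(std::make_shared<Sigmoidal>());
    const SketchNode limiterNode(std::make_shared<HardLimiter>());
    assert(sigmoidNode.stimulate(0.0) == 0.5);
    assert(limiterNode.stimulate(-1.0) == 0.0);
}
```

A new transfer function is added by deriving one further subclass; neither the node nor the network code has to change. The same arrangement applies to LearningAlgorithm and its subclass BPGOfErrors.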

Optimizers Optimizers are required for the task of parametrization. The optimal set of parameters employed in a calculation method can be searched for by stochastic optimization methods if no analytical relationship between the parameters and the property of interest is available (see also Section 2.6 and Section 4.3). Fig. 3.15 shows the main classes of the corresponding MOSES sublibrary.

Fig. 3.15 Main classes of the optimizer sublibrary.

The abstract interface is declared by class Optimizer. For a given set of parameters the response value has to be calculated. This is dependent on the problem at hand. As before, an abstract function class (CostFunctor) is defined that is used by Optimizer. It has to be subclassed by a given application and passed to the Optimizer. In the run method of an optimizer a new set of parameters is generated and supplied to the evaluate method of the CostFunctor, which calculates the response and returns it to the Optimizer. Depending on this response value an updated set of parameters is generated. This procedure is run iteratively until convergence of the response is achieved. Two specific optimizers are currently available: a brute force approach performing a global search for a set of parameters in a given range of possible values and with a given step


size; a more advanced optimizer is provided by class SimulatedAnnealing. The implementation of simulated annealing was taken from Taygeta Scientific Inc. [104]; it is available free of charge and was interfaced to the MOSES class.
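The interplay of Optimizer and CostFunctor can be illustrated by a small C++ sketch of the brute force variant. The names Optimizer, CostFunctor, run and evaluate follow the text; the class names BruteForceOptimizer and Paraboloid, all signatures, and the use of one common range and step size for every parameter are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Abstract function class: the application supplies the response for a parameter set.
class CostFunctor {
public:
    virtual ~CostFunctor() = default;
    virtual double evaluate(const std::vector<double>& parameters) const = 0;
};

class Optimizer {
public:
    explicit Optimizer(const CostFunctor& cost) : cost_(cost) {}
    virtual ~Optimizer() = default;
    virtual std::vector<double> run() = 0;  // returns the best parameter set found
protected:
    const CostFunctor& cost_;
};

// Exhaustive search on a regular grid (hypothetical class name).
class BruteForceOptimizer : public Optimizer {
public:
    BruteForceOptimizer(const CostFunctor& cost, std::size_t dimensions,
                        double lower, double upper, double step)
        : Optimizer(cost), dim_(dimensions), lower_(lower), upper_(upper), step_(step) {}

    std::vector<double> run() override {
        const std::size_t points = static_cast<std::size_t>((upper_ - lower_) / step_ + 0.5) + 1;
        std::vector<std::size_t> index(dim_, 0);
        std::vector<double> params(dim_, lower_), best(dim_, lower_);
        double bestCost = std::numeric_limits<double>::infinity();
        while (true) {
            for (std::size_t d = 0; d < dim_; ++d) params[d] = lower_ + index[d] * step_;
            const double cost = cost_.evaluate(params);
            if (cost < bestCost) { bestCost = cost; best = params; }
            // Advance the grid index like an odometer; stop after the last grid point.
            std::size_t d = 0;
            while (d < dim_ && ++index[d] == points) { index[d] = 0; ++d; }
            if (d == dim_) break;
        }
        return best;
    }
private:
    std::size_t dim_;
    double lower_, upper_, step_;
};

// Example cost function with its minimum at (1, -2).
class Paraboloid : public CostFunctor {
public:
    double evaluate(const std::vector<double>& p) const override {
        return (p[0] - 1.0) * (p[0] - 1.0) + (p[1] + 2.0) * (p[1] + 2.0);
    }
};
```

The optimizer never sees the concrete problem; it only calls evaluate. A stochastic method such as simulated annealing would replace the grid enumeration in run while keeping the same interface.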

Qt Graphics Applications provide the functionality of a library to the user. The user interface (UI) can be realized by a set of command-line options allowing one to control the program flow. A more convenient way is the development of graphical user interfaces (GUI) that provide the functionality through dialog windows and allow an interactive communication with the user. The development of GUI applications requires a library providing basic graphical building blocks, called widgets, such as buttons, input fields or dialog windows. The implementation of such widget libraries is strongly dependent on the operating system (OS) to be used, and the choice of the wrong widget library can restrict the developed GUI application to a certain operating system. As MOSES was intended to be as OS-independent as possible, we decided to base all developments concerning graphical interfaces on the Qt library [105]. Qt is available for all common operating systems: Microsoft Windows, nearly all UNIX derivatives including Linux, and even Apple’s MacOS. This library has been published under the GNU General Public License (GPL) [106] and is thus free of charge for academic use.

A MOSES sublibrary is designated to provide Qt widgets performing tasks that are generally required when developing applications in the domain of chemoinformatics. MOSES applications such as MOSES::WORKFLOWMANAGER (see Section 3.8.1) or the MOSES structure browser MIA [107] are based on that library.

3.6 Management of Properties

The MOSES project was started with the intention to provide a sophisticated management of both property data and associated meta-data. It should be easy to use, yet general enough to handle any kind of data and as extensible as possible. The project was started from scratch, giving the opportunity to perform a thorough analysis of the requirements and a clean implementation without any restrictions imposed by existing code. Therefore, we could stick to the classical way of designing object-oriented software: analysis (Section 3.6.1) followed by design and implementation (Section 3.6.2).


3.6.1 Analysis

One Property – Different Sources What do we refer to if we talk about a property? In a restricted sense, the "real" value of a physicochemical property is meant. Say, the n-octanol/water partition coefficient is under consideration. We then talk about a compound’s log P value. That this value is measured in an n-octanol/water system is most often implied even if it is not stated explicitly. What, then, is the "real" value? The partition equilibrium depends on several factors such as temperature or pressure. Normally, standard conditions are implied, i.e., 298.15 K and 1013 hPa. But even this idealized "real" value depends on the accuracy of the experimental setup. Thus, it is actually just an approximation with a given error range.

Furthermore, a variety of calculation methods is available for the estimation of log P [108]. In a discussion, one may refer to the experimental and the calculated log P value. Again, a series of questions is associated with a calculated value. Which program has been used for the calculation, and which version of the program? If the calculation can be influenced by certain parameters, which ones have been chosen? If quantum mechanical calculations have been performed, which level of theory has been applied?

Within a comprehensive framework for the management of chemical properties all the meta information about a property value has to be stored. A certain property has to have a unique identity in the system with respect to the available meta-data.

Properties as Common Characteristics of Chemical Entities What are the common characteristics of chemical entities such as reactions, compounds, atoms or bonds? All chemical entities can have various kinds of properties, for instance, atoms have a , bonds have a dissociation energy, compounds have a dipole moment and reactions have a rate constant. As a consequence, the characteristic "may have properties" is represented by a basic key abstraction ChemObj, and all chemical entities have at least this abstraction in common.

Scope and Definition of the Term Property in MOSES Up to this point the term property has been used in quite a restricted manner related to single values of physicochemical properties. However, from the perspective of a software library the concept of a property can be much more abstract. In a more chemical context, an infrared spectrum may be attached


to a compound as a property; in the context of a graphical software application, the color in which an atom is to be depicted may be attached to an atom. In an even more extended sense, an alternative structural representation may be stored as a property of a compound. Therefore, the definition of the term property in MOSES is quite abstract: a property may be any kind of data that can be attached to any kind of chemical entity. This definition gives MOSES a high flexibility and extensibility.

Data Types of Properties Usually, applications and software systems expect properties to be of a basic data type, such as a real or an integer number, a boolean or a character string. The abstract definition of properties given in the previous paragraph demands a high flexibility of the data types that MOSES must be able to handle. What does the real-number value of the molecular dipole moment have in common with a binary data block representing a molecule as a GIF depiction? What if some day a completely new data type should be integrated into the system? Furthermore, properties are often vectors of data; for instance, the Cartesian 3D coordinates of atoms are three-dimensional vectors of real numbers.

Validity of Properties The MOSES system is intended to handle both structures and reactions. For reactions, the validity of properties that are associated with a chemical structure has to be discussed. Such properties must not be used anymore after a compound has been chemically modified. Nevertheless, some properties have a special meaning and should be kept even after modification. In short, the user of the system should be able to define under which conditions a property has to be removed.

3.6.2 Design and Implementation

Three key abstractions are derived from the analysis: ChemObj manages the retrieval, storage and calculation on demand of properties, PropertyKey defines the identity of a property, and Any is an abstract way of storing arbitrary types of data. These three concepts are introduced in the following.

ChemObj Fig. 3.16 shows the three classes and their relationship. ChemObj provides functions for retrieving and setting properties. Properties are stored in a std::map that


provides efficient lookup of a given data element by guaranteeing O(log N) complexity, with N being the number of elements in the map [109]. PropertyKeys, which serve as the keys of this map, are required to be unique. Class PropertyKey and the generation of PropertyKeys will be described below. The interface of ChemObj is the only way to retrieve properties. It is not possible to calculate a property directly by a calculation function. This restriction guarantees that a property is calculated on demand and only once. The details of the calculation mechanisms will also be introduced later.

Fig. 3.16 Properties are available only through the interface of class ChemObj.
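The behaviour described for ChemObj, lookup in a std::map and calculation on demand exactly once, can be sketched as follows. This is a strongly simplified illustration: property values are plain doubles instead of Anys, the key is a string instead of a PropertyKey, and calculators are registered directly at the object, whereas in MOSES they are managed centrally. The member names setProperty, getProperty and registerCalculator are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

class ChemObj {
public:
    typedef std::string PropertyKey;                           // stand-in for the real key class
    typedef std::function<double(const ChemObj&)> Calculator;  // stand-in for a calculation module

    void setProperty(const PropertyKey& key, double value) { properties_[key] = value; }

    void registerCalculator(const PropertyKey& key, Calculator calculator) {
        calculators_[key] = std::move(calculator);
    }

    // Returns the stored value; calculates and caches it on first request.
    double getProperty(const PropertyKey& key) {
        std::map<PropertyKey, double>::const_iterator hit = properties_.find(key);  // O(log N)
        if (hit != properties_.end()) return hit->second;
        std::map<PropertyKey, Calculator>::const_iterator calc = calculators_.find(key);
        if (calc == calculators_.end()) throw std::out_of_range("unknown property: " + key);
        const double value = calc->second(*this);
        properties_[key] = value;  // cached, so the calculator runs only once
        return value;
    }
private:
    std::map<PropertyKey, double> properties_;
    std::map<PropertyKey, Calculator> calculators_;
};

// Usage example: the second request is served from the map.
void onDemandExample() {
    int calls = 0;
    ChemObj compound;
    compound.registerCalculator("dipole moment", [&calls](const ChemObj&) { ++calls; return 1.85; });
    assert(compound.getProperty("dipole moment") == 1.85);
    assert(compound.getProperty("dipole moment") == 1.85);
    assert(calls == 1);
}
```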

Any The handling of arbitrary data types is not strictly related to the problem at hand but is of general interest for similar projects written in C++. C++ is a so-called strongly typed language, meaning that a variable of a given type, say an object representing a GIF picture, cannot be assigned to a variable declared as an integer. But exactly this is required for properties as defined above. In C the only way to solve this problem was to use raw memory addresses, so-called void* pointers, and to convert such pointers to the correct type when required. This is called a type cast. Such an approach requires the introduction of a user-defined system of type codes and carries the considerable danger of corrupting memory and the risk of unexpected program aborts when such type casts are erroneous. C++ introduced mechanisms for type-safe casting [110], i.e., the run-time system of C++ allows the detection of an object's type and its conversion from one type to another if the types are compatible. Otherwise an error code is given or an exception is thrown. On the other hand, if the concrete type of the data to


be handled is known at compile-time the generic programming facilities of C++ can be used (see Section 2.8.1).

Fig. 3.17 Class hierarchy for the representation of arbitrary data types in MOSES.


Using these facilities provided by the language, a class hierarchy inspired by the any concept used in CORBA [111] was designed that meets our requirements:

1. storage of arbitrary data

2. storage of both, scalar and vectorial data

3. type-safe access

4. easy extensibility

Fig. 3.17 shows the class hierarchy of Any types as a UML diagram. The base class Any declares an abstract interface for the handling of data. The actual data are stored in the class AnyDataHolder that can be parametrized with any kind of data type. These data can be accessed through the base class by a set of functions that are parametrized with the correct data type. AnyDataHolder is specialized into ScalarAny and VectorAny, implementing the interface of Any for scalar and vectorial data types, respectively. Finally, in order to end up with concrete Any types, ScalarAny and VectorAny have to be bound to the data types of interest. This is achieved by the typedef construct. Examples are given in the lower part of Fig. 3.17.

An important characteristic of Anys is that they know how to reproduce themselves, i.e., an object can be cloned, and a new Any with the same dynamic type can be created from it, without knowing its concrete type. The create() method follows the factory method pattern. A further way of treating Anys abstractly is the possibility to convert them into, and create them from, string representations. The factory method along with the latter feature has been designed especially to simplify I/O tasks.
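A compact C++ sketch of such a hierarchy is given below. The class names Any, AnyDataHolder, ScalarAny and VectorAny and the method create() are taken from the text; everything else (the accessor get, the methods clone, toString and fromString, the typedef names, and returning a null pointer on a type mismatch) is an assumption chosen for illustration and may differ from the MOSES interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

class Any {
public:
    virtual ~Any() = default;
    virtual std::unique_ptr<Any> clone() const = 0;   // copy with the same dynamic type and value
    virtual std::unique_ptr<Any> create() const = 0;  // factory method: empty Any of the same dynamic type
    virtual std::string toString() const = 0;
    virtual void fromString(const std::string& text) = 0;
    template <typename T> const T* get() const;       // type-safe access, nullptr on type mismatch
};

template <typename T>
class AnyDataHolder : public Any {
public:
    AnyDataHolder() : data_() {}
    explicit AnyDataHolder(const T& value) : data_(value) {}
    const T& data() const { return data_; }
protected:
    T data_;
};

template <typename T>
const T* Any::get() const {
    const AnyDataHolder<T>* holder = dynamic_cast<const AnyDataHolder<T>*>(this);
    return holder ? &holder->data() : nullptr;
}

template <typename T>
class ScalarAny : public AnyDataHolder<T> {
public:
    ScalarAny() {}
    explicit ScalarAny(const T& value) : AnyDataHolder<T>(value) {}
    std::unique_ptr<Any> clone() const override { return std::unique_ptr<Any>(new ScalarAny<T>(*this)); }
    std::unique_ptr<Any> create() const override { return std::unique_ptr<Any>(new ScalarAny<T>()); }
    std::string toString() const override { std::ostringstream out; out << this->data_; return out.str(); }
    void fromString(const std::string& text) override { std::istringstream in(text); in >> this->data_; }
};

template <typename T>
class VectorAny : public AnyDataHolder<std::vector<T> > {
public:
    VectorAny() {}
    explicit VectorAny(const std::vector<T>& values) : AnyDataHolder<std::vector<T> >(values) {}
    std::unique_ptr<Any> clone() const override { return std::unique_ptr<Any>(new VectorAny<T>(*this)); }
    std::unique_ptr<Any> create() const override { return std::unique_ptr<Any>(new VectorAny<T>()); }
    std::string toString() const override {
        std::ostringstream out;
        for (std::size_t i = 0; i < this->data_.size(); ++i) {
            if (i != 0) out << ' ';
            out << this->data_[i];
        }
        return out.str();
    }
    void fromString(const std::string& text) override {
        std::istringstream in(text);
        this->data_.clear();
        T value;
        while (in >> value) this->data_.push_back(value);
    }
};

// Concrete Any types are obtained by binding the templates with typedef.
typedef ScalarAny<double> DoubleAny;
typedef VectorAny<double> DoubleVectorAny;
```

With these definitions a DoubleAny held through a pointer to Any can be cloned, asked for its value with get<double>(), and re-created from a string without the calling code ever naming its concrete type; get<int>() on the same object yields a null pointer instead of a misinterpreted value.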

PropertyManager It has been said that PropertyKeys have to be unique with respect to the property source. Property sources may be calculation modules or other external sources that are not further specified. The generation of a PropertyKey and its mapping to its source must be handled by a central abstraction. This is provided by the class PropertyManager. This class is of central importance for the property management. It provides an interface to the entire package and therefore represents the facade pattern. Furthermore, only


one instance can be active in the system at a time. In order to ensure this, PropertyManager implements the singleton pattern. The main classes that cooperate with PropertyManager in the property management are shown in Fig. 3.18.

Fig. 3.18 Main classes required to implement the property management in MOSES.
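The singleton aspect and the central generation of unique keys can be sketched as below. Only the class names PropertyManager and PropertyKey come from the text. The methods instance() and key(), the integer identifier, and the identification of a source by a plain string are hypothetical simplifications; the real class additionally manages descriptions, calculators and control parameters.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

class PropertyManager;

// Keys can only be produced by the manager, which keeps them unique.
class PropertyKey {
public:
    int id() const { return id_; }
    bool operator<(const PropertyKey& other) const { return id_ < other.id_; }
    bool operator==(const PropertyKey& other) const { return id_ == other.id_; }
private:
    friend class PropertyManager;
    explicit PropertyKey(int id) : id_(id) {}
    int id_;
};

class PropertyManager {
public:
    // Singleton access: the one instance is created on first use.
    static PropertyManager& instance() {
        static PropertyManager manager;
        return manager;
    }
    PropertyManager(const PropertyManager&) = delete;
    PropertyManager& operator=(const PropertyManager&) = delete;

    // Same (name, source) pair always yields the same key; a new pair yields a new key.
    PropertyKey key(const std::string& name, const std::string& source) {
        const std::pair<std::string, std::string> identity = std::make_pair(name, source);
        std::map<std::pair<std::string, std::string>, int>::iterator it = ids_.find(identity);
        if (it == ids_.end()) {
            it = ids_.insert(std::make_pair(identity, static_cast<int>(ids_.size()))).first;
        }
        return PropertyKey(it->second);
    }
private:
    PropertyManager() {}
    std::map<std::pair<std::string, std::string>, int> ids_;
};

// Usage example: experimental and calculated log P are distinct properties.
void propertyManagerExample() {
    PropertyManager& manager = PropertyManager::instance();
    const PropertyKey measured = manager.key("logP", "experiment");
    const PropertyKey calculated = manager.key("logP", "calculator A");
    assert(!(measured == calculated));
    assert(measured == manager.key("logP", "experiment"));
}
```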

A variety of meta-data can be attached to a property. All these data are stored in objects of the class PropertyDescription. These objects are composed of PropertyInfos holding the basic information about a property such as name, physical unit, a description and the type of chemical entities the property is valid for.

PropertyDescription is specialized into two subclasses: CalculablePropertyDescription for representing properties having a calculation method known to the system and ReadPropertyDescription for properties having an external source. Both types of subclasses may have additional information stored about the way a property was generated, either details read from a file or actual control parameters for a calculation module. The corresponding class is ControlParameterList. External data sources do not have a common class and are therefore represented by a placeholder PropertySource in Fig. 3.18. Such sources are represented by strings, e.g., the name of the file from which the property was read or the name of the laboratory in which the property was experimentally determined.


PropertyKey and PropertySearchKey Details of PropertyKey and its generation via PropertyManager are given in Fig. 3.19. The PropertyManager provides a set of methods for the generation of PropertyKeys on several levels of sophistication. Reasonable default settings are used for any information that is not given explicitly. According to the two basic types of properties, those that have an associated calculator and those from external data sources, two basic types of methods are declared, with names starting with calculable or read. All methods expect the name of the property and, optionally, additional information about the calculator or source. If the calculator is to be described in full detail, a helper class CalculatorDescription has to be used (not shown explicitly in Fig. 3.19). Objects of that class provide all control parameters available for a Calculator and are also generated by PropertyManager.

Fig. 3.19 Two types of property keys can be generated by PropertyManager. In order to make them usable in a polymorphic manner class PropertyGetter is available.

As has been pointed out before, PropertyKeys are unique and are comparable in order


to use them as key types for storing properties in a std::map. This is required for setting a property. Retrieval of a property should often be more tolerant. A user might request a certain property without being interested in any details of its generation. Therefore, class PropertySearchKey has been introduced.

PropertySearchKeys provide the opportunity to apply search strategies for finding properties. The default SearchStrategy simply searches for the first property that matches a given name. A PropertySearchKey generated by the PropertyManager will use this strategy. Nevertheless, other, more sophisticated strategies can be implemented by deriving subclasses from SearchStrategy. Please note that PropertyKey and PropertySearchKey are not compatible. For using them in a polymorphic manner, class PropertyGetter is available (see below).

Applying a PropertySearchKey generated by PropertyManager to retrieve a property from a ChemObj will return the first property with the given name. If no such property can be found, the system will try to calculate it if an appropriate Calculator is available.
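The difference between an exact PropertyKey and a tolerant PropertySearchKey can be sketched in C++ as follows. The class names PropertyKey, PropertySearchKey and SearchStrategy are taken from the text. The key layout (name plus source string), the class FirstByName, the method names and the double-valued property map are assumptions; in particular, "first" here means first in the sort order of the map, which need not be the order used by MOSES.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

// Exact identity of a property; comparable so it can serve as a std::map key.
struct PropertyKey {
    std::string name;
    std::string source;
    bool operator<(const PropertyKey& other) const {
        return std::tie(name, source) < std::tie(other.name, other.source);
    }
};

typedef std::map<PropertyKey, double> PropertyMap;

class SearchStrategy {
public:
    virtual ~SearchStrategy() = default;
    virtual PropertyMap::const_iterator find(const PropertyMap& properties,
                                             const std::string& name) const = 0;
};

// Default strategy of this sketch: first stored property with a matching name.
class FirstByName : public SearchStrategy {
public:
    PropertyMap::const_iterator find(const PropertyMap& properties,
                                     const std::string& name) const override {
        // Keys are ordered by name first, so lower_bound reaches the first entry of that name.
        PropertyMap::const_iterator it = properties.lower_bound(PropertyKey{name, ""});
        return (it != properties.end() && it->first.name == name) ? it : properties.end();
    }
};

class PropertySearchKey {
public:
    explicit PropertySearchKey(std::string name,
                               std::shared_ptr<const SearchStrategy> strategy =
                                   std::make_shared<FirstByName>())
        : name_(std::move(name)), strategy_(std::move(strategy)) {}
    PropertyMap::const_iterator findIn(const PropertyMap& properties) const {
        return strategy_->find(properties, name_);
    }
private:
    std::string name_;
    std::shared_ptr<const SearchStrategy> strategy_;
};

// Usage example: a request by name only returns one of the stored log P values.
void searchKeyExample() {
    PropertyMap properties;
    properties[PropertyKey{"logP", "experiment"}] = 1.5;
    properties[PropertyKey{"logP", "calculator A"}] = 1.2;
    const PropertyMap::const_iterator hit = PropertySearchKey("logP").findIn(properties);
    assert(hit != properties.end() && hit->first.name == "logP");
}
```

A more selective strategy, for example one preferring experimental sources, would be a further subclass of SearchStrategy handed to the PropertySearchKey constructor.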

PropertyGetter PropertyKeys are used by value and not by reference. As C++ supports polymorphism only through references and pointers, the behaviour of PropertyKey cannot be refined by subclassing. Therefore, a PropertyKey cannot be substituted by a PropertySearchKey; they are not directly related through inheritance.

Nevertheless, there are situations in which a general request for a property should be refined to a more specific one. For instance, an application sets an initial default for a property retrieval by name. Later, the user should be enabled to refine the conditions of the request for a property: it should then be calculated by a distinct calculator using a specific set of control parameters. The latter case is only possible by using a PropertyKey. Here, class PropertyGetter has to be used, which provides polymorphic usage of PropertyKeys and PropertySearchKeys. This is shown in the lower part of Fig. 3.19.

Calculators. The UML diagram shown in Fig. 3.20 gives a more detailed view of the concept of calculators and their management. Objects of class CalculatorInfo hold all information about a calculation module, such as its name, version, and a description. Furthermore, the list of properties provided by the calculation module is defined by a set of PropertyInfos. References to scientific publications about the calculation details are represented by class Literature. Besides the name and version, the set of available control parameters held by class ControlParameterList is crucial for defining the identity of the corresponding properties. CalculatorInfos have such a list with reasonable default settings for the ControlParameters. As control parameters can contain any kind of data type, the Any concept is again used for storing those data.

Fig. 3.20 Detail view including parts of the interfaces for the implementation of the calculator management.

A calculation module is introduced to the system by registering a CalculatorInfo at the PropertyManager. If a specific calculation module is needed, the PropertyManager requests the generation of an instance of class Calculator from the corresponding CalculatorInfo. The ControlParameterList is also provided by CalculatorInfo and can be modified to the user's needs. If a Calculator with the requested settings stored in a ControlParameterList already exists, the existing instance is used. As shown in Fig. 3.18, the mapping between a calculator with a certain parametrization and the corresponding PropertyKey is achieved through objects of class CalculablePropertyDescription. Calculators can only be created by the PropertyManager. This is the only class that is allowed to create the CalculatorCreationPolicy required for the instantiation of Calculators. The policy pattern is a smart way of restricting the access to parts of a class's interface without exposing the entire internals by granting a friendship privilege.

Class Calculator is an interface class providing the calculate() method for triggering the execution of the actual calculation. This method is again protected by a policy class, CalculationPolicy. The only type that has access to that policy is class ChemObj. This design ensures that properties can only be accessed via ChemObj. In addition, this class is exclusively responsible for the direct management of properties. Furthermore, Calculator provides functionality assisting with the implementation of subclasses.

Two intermediate classes that help with concrete implementations are derived from Calculator: ChemObjCalculator and CompoundCalculator. Concrete Calculators have to be derived from one of those. ChemObjCalculator is intended to be used for calculations related to a single ChemObj. This class provides helper functionality for converting the abstract ChemObj object into a concrete type. The second way of implementing a calculation is defined by CompoundCalculator, which expects subclasses to calculate properties for all ChemObj of a given type for the entire compound they are associated with. For instance, the PEOE algorithm (see Section 2.7.2) calculates partial atomic charges. This calculation cannot be performed for a single atom. Thus, the calculator CalcPEOE is derived from CompoundCalculator and calculates atomic charges in a batch for all atoms in a compound.
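The policy mechanism described above can be sketched in a few lines of C++. The sketch only reuses the class names Calculator, CalculationPolicy, and ChemObj from the text; the signature of calculate(), the getProperty() method, the heavyNeighbors() accessor, and the concrete class CalcNeighborCount are hypothetical and serve purely as an illustration. The essential point is that the policy class has a private constructor and names ChemObj as its only friend, so calculate() is publicly declared but can only be invoked from inside ChemObj.

```cpp
class ChemObj;  // forward declaration

// Policy class: its constructor is private and ChemObj is its only friend.
// The constructor is user-provided (not defaulted) so that the class cannot
// be created from outside via aggregate initialization.
class CalculationPolicy {
    friend class ChemObj;
    CalculationPolicy() {}
};

// Interface class. calculate() is public, but a caller needs a
// CalculationPolicy instance, which only ChemObj is able to create.
class Calculator {
public:
    virtual ~Calculator() {}
    virtual double calculate(const ChemObj& target, const CalculationPolicy&) const = 0;
};

// Hypothetical, strongly reduced ChemObj.
class ChemObj {
public:
    explicit ChemObj(int heavyNeighbors) : heavyNeighbors_(heavyNeighbors) {}
    int heavyNeighbors() const { return heavyNeighbors_; }

    // Single entry point for obtaining a calculated property (assumed name).
    double getProperty(const Calculator& calculator) const {
        return calculator.calculate(*this, CalculationPolicy());
    }

private:
    int heavyNeighbors_;
};

// Hypothetical concrete calculator, used only for illustration.
class CalcNeighborCount : public Calculator {
public:
    double calculate(const ChemObj& target, const CalculationPolicy&) const override {
        return static_cast<double>(target.heavyNeighbors());
    }
};
```

Application code that tries to call calculate() directly fails to compile because it cannot construct the policy argument. Compared with declaring ChemObj a friend of Calculator itself, the friendship is confined to the empty policy class, so no other internals of Calculator are exposed.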

Collaboration Between the Main Classes. Up to this point, the static view of the property management has been discussed. In this paragraph, the dynamic behavior is briefly demonstrated by the example of determining whether an atom is aromatic or not. A UML sequence diagram is given in Fig. 3.21.

Steps 2 and 3 in Fig. 3.21, starting from the application lifeline, give the general scheme for the retrieval of a property: a request for an appropriate PropertyKey from the PropertyManager, followed by a request for the property from the ChemObj using the PropertyKey. PropertyManager provides additional methods for specifying the calculator to be used and its control parameters. These details are not shown in Fig. 3.21.

Fig. 3.21 UML sequence diagram of the mechanism of property retrieval and required actions within the MOSES property management. The procedure is shown in a simplified way. Details in the actual implementation may deviate slightly.

The procedure shown assumes that the property has not been calculated before. Therefore, after the atom has found no appropriate entry for the requested property in its internal property map, the calculation with the appropriate calculator is triggered. Again, class PropertyManager helps with the retrieval of the Calculator attached to the PropertyKey at hand (this step is not shown in Fig. 3.21). As has been described in Section 3.4.4, the smallest set of smallest rings (SSSR) is required for the perception of aromaticity. Here, the principle of calculation on demand is illustrated. At the point when the calcarom object needs the SSSR, its derivation is triggered by the same mechanisms as described before. The initial application object need not know about the details of the perception of aromaticity. In an older system such as liberos95, the aromaticity perception would have failed if the perception of the SSSR had not been performed before.
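The principle of calculation on demand can be illustrated by the following C++ sketch. It is a toy model under stated assumptions, not the MOSES mechanism: the class Molecule, the use of std::function as calculator type, the property names, and the double-valued results are all hypothetical. A requested property is looked up in the internal map first; only if it is missing is the registered calculator triggered, and that calculator may in turn request its own prerequisites through the same interface.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

class Molecule;
// Assumption: a calculator is any callable that derives a value from a Molecule.
using CalculatorFn = std::function<double(Molecule&)>;

class Molecule {
public:
    void registerCalculator(const std::string& property, CalculatorFn fn) {
        calculators_[property] = fn;
    }

    // Returns the stored property or calculates it on demand.
    double get(const std::string& property) {
        auto cached = properties_.find(property);
        if (cached != properties_.end()) return cached->second;

        auto calc = calculators_.find(property);
        if (calc == calculators_.end()) {
            throw std::runtime_error("no calculator for " + property);
        }
        // The calculator may call get() recursively for its prerequisites.
        const double value = calc->second(*this);
        log_.push_back(property);
        properties_[property] = value;
        return value;
    }

    // Order in which calculations were completed.
    const std::vector<std::string>& calculationLog() const { return log_; }

private:
    std::map<std::string, double> properties_;
    std::map<std::string, CalculatorFn> calculators_;
    std::vector<std::string> log_;
};

// Hypothetical setup: the aromaticity-like property depends on a ring property.
Molecule makeExample() {
    Molecule m;
    m.registerCalculator("ringCount", [](Molecule&) { return 1.0; });
    m.registerCalculator("aromaticRingCount", [](Molecule& mol) {
        return mol.get("ringCount") >= 1.0 ? 1.0 : 0.0;
    });
    return m;
}
```

Requesting "aromaticRingCount" on a fresh object first completes "ringCount" and then the requested property; a second request is answered from the property map without any new calculation. The caller never has to know that a ring perception is needed.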


3.7 Current Status of the Calculator Sublibrary

A main intention of the MOSES development was to create an integration platform for all calculation methods in our group. In particular, the concept of Calculators and their abstract design provides the means of incorporating existing calculation modules that embody years of development and testing and cannot easily be rewritten. Consequently, the original code bases of CORINA and SURFACE have been interfaced to MOSES, i.e., separate Calculators have been written that forward requests for the corresponding properties to the original functions. This requires the conversion of the RAMSES data structure into the data structures used by the old programs and writing the results back as MOSES properties.

Other calculation methods could easily be reimplemented based on the RAMSES data structure. A set of calculation methods was the subject of the present work and has been completely reworked. Tab. 3.2 gives a summary of the main Calculators currently available, referring to the previous stand-alone programs as described in Section 2.2.

Tab. 3.2 Former stand-alone programs that are available inside of MOSES as Calculators. CORINA and SURFACE have not been rewritten. The related MOSES Calculators are just interface classes to the existing code.

| program | MOSES Calculator | rewritten | comment |
|---|---|---|---|
| CORINA 3.2 | CalcCorina3d | no | 3D structure |
| | CalcSSSR | no | ring perception |
| PETRA 3.2 | CalcTotalCharges | yes | total partial charges, residual electronegativities |
| | CalcPEOE | yes | PEOE and modified PEOE (see Section 4) |
| | CalcMHMO | yes | modified HMO (see Section 4) |
| | CalcMeanPolariz | yes | mean molecular polarizabilities (see Section 3.8.2) |
| | CalcEffectPolariz | yes | effective atomic polarizabilities |
| | CalcAromaticity | yes | perception of aromaticity (see Section 3.4.4) |
| AUTOCORR 2.4 | Calc2DAutocorr | yes | top. autocorrelation |
| | Calc3DAutocorr | yes | 3D autocorrelation |
| SURFACE 1.1 | CalcSurface | no | generation of molecular surface and calculation of spatial autocorrelation |


3.8 Applications

3.8.1 MOSES::WORKFLOWMANAGER

As has been discussed previously, the calculation of properties and descriptors is often organized by workflows giving the sequence of calculation steps (see Section 2.2 and Section 3.2).

The MOSES library has been designed to assist with the generation and management of such workflows and the attached meta-data. A software library is quite an abstract construct and cannot be used directly. Therefore, in order to prove the usability of the implemented functionality, in particular with respect to property management and calculators, a prototype GUI application has been developed for the generation and management of workflows in the field of QSPR: MOSES::WORKFLOWMANAGER.

As an example, a simple workflow that is shown schematically in Fig. 3.22 will be developed with the MOSES::WORKFLOWMANAGER. The workflow starts with a dataset containing a certain class of compounds. A frequently used set of prefiltering rules is applied in order to reduce the dataset to drug-like compounds [112]. In the subsequent step, topological autocorrelation vectors are calculated for all compounds remaining after the prefiltering.

Fig. 3.22 Simple workflow for demonstrating the prototypic MOSES::WORKFLOWMANAGER application.

The basic GUI design of the MOSES::WORKFLOWMANAGER is shown in Fig. 3.23. In the background, two main parts can be seen: the current workflows in a tree view on the left-hand side of the window (Fig. 3.23 a) and a workspace view occupying the right-hand side (Fig. 3.23 b). In the current implementation, workflows have two root items, (i) model building and (ii) prediction. A workflow can be developed step by step by adding new workflow items to existing ones. A workflow step can be activated by a mouse click, which triggers the generation of a corresponding widget in the workspace. This widget provides options for configuring the workflow step or for retrieving further information.

Fig. 3.23 Basic GUI design of the prototypic MOSES::WORKFLOWMANAGER application. Shown are the main widgets: (a) representation of workflows, (b) workspace window, (c) dialog for the configuration of calculation modules, and (d) dialog showing information on specific calculation modules.

In the descriptor widget displayed in the workspace in Fig. 3.23 (b), a selection box is provided listing all molecular descriptors available in MOSES. This list is generated from the information given in the corresponding CalculatorInfos with the help of the PropertyManager. Generating such dialogs dynamically has the advantage that newly available functionality is included automatically, without the need to update any dialog manually.

For each descriptor, a dialog window is provided allowing the adaptation of the control parameters available for the corresponding calculation module. Again, this dialog is generated from the information provided by the property management and is thus always up to date. The descriptor configuration dialog is shown in Fig. 3.23 on the left-hand side (c). Along with the set of available control parameters, citations of publications related to the calculation modules are stored. These can also be displayed in a separate dialog window, as shown in Fig. 3.23 on the right-hand side (d).

The implementation of the workflow steps is achieved by subclassing an abstract class, WorkflowStep. For each workflow step, all steps that can potentially be attached to it have to be specified. Class WorkflowStep provides an abstract method process() that is called when a workflow is executed. This method has to be overridden by derived steps in order to implement the actual functionality. Thus, new workflow steps can easily be added to the system.

The basic design of the MOSES::WORKFLOWMANAGER as described here has been extended by J. ZHANG and was successfully applied to the prediction of ionization constants, pKa [113].
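The subclassing scheme for workflow steps can be sketched as follows. Only the names WorkflowStep and process() are taken from the text; the signature of process(), the Compound record, the two concrete steps, and the linear runWorkflow() helper are assumptions for illustration. In particular, the actual application organizes steps in a tree and restricts which steps may be attached to which, and this is not modeled here.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, minimal data record passed between workflow steps.
struct Compound {
    std::string name;
    double molecularWeight;
    std::vector<double> descriptors;
};
using Dataset = std::vector<Compound>;

// Abstract base class: every workflow step implements process().
class WorkflowStep {
public:
    virtual ~WorkflowStep() {}
    virtual Dataset process(const Dataset& input) const = 0;
};

// Stand-in for a prefiltering step (the rules of [112] are not reproduced).
class MaxWeightFilter : public WorkflowStep {
public:
    explicit MaxWeightFilter(double limit) : limit_(limit) {}
    Dataset process(const Dataset& input) const override {
        Dataset kept;
        for (const Compound& c : input) {
            if (c.molecularWeight <= limit_) kept.push_back(c);
        }
        return kept;
    }

private:
    double limit_;
};

// Stand-in for a descriptor step: appends one placeholder value per compound.
class PlaceholderDescriptorStep : public WorkflowStep {
public:
    Dataset process(const Dataset& input) const override {
        Dataset result = input;
        for (Compound& c : result) c.descriptors.push_back(c.molecularWeight);
        return result;
    }
};

// Executes the steps in sequence, feeding each result into the next step.
Dataset runWorkflow(const std::vector<std::unique_ptr<WorkflowStep>>& steps, Dataset data) {
    for (const auto& step : steps) data = step->process(data);
    return data;
}
```

Adding a new kind of step only requires another subclass that overrides process(); the code that executes the workflow remains unchanged.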

3.8.2 Polarizabilities Through a Zero Order Additivity Scheme

As a second application of MOSES, the calculation of mean molecular polarizabilities will be described in this section.

Electrons in a molecule react to the perturbation by an external electric field with a distinct displacement. This results in an induced molecular dipole moment, $\vec{\mu}_{\mathrm{ind.}}$, proportional to the applied electric field strength, $\vec{E}$,

$$\vec{\mu}_{\mathrm{ind.}} = \alpha \cdot \vec{E} \tag{3.1}$$

The proportionality constant, α, is called the molecular polarizability. Polarizability is actually a tensor property but can be measured only as a mean value. Therefore, the experimental polarizability is a scalar value and is termed the mean molecular polarizability.

It has long been known that the molecular polarizability can be calculated from contributions of the constituent atoms [114, 115]. However, additivity of atomic increments is only valid if the environment of an atom is taken into account, i.e., for each hybridization state one atom type has to be defined. MILLER and SAVCHIK proposed a general procedure for calculating the mean molecular polarizability from atomic hybrid components (ahc) [116], based on a theoretical interpretation of variational perturbation calculations, according to Eq. 3.2.

$$\alpha(\mathrm{ahc}) = \frac{4}{N} \left[ \sum_A \tau_A(\mathrm{ahc}) \right]^2 \tag{3.2}$$

with N, the number of electrons in the molecule, and $\tau_A(\mathrm{ahc})$, the atomic hybrid component of atom A. Later, KANG and JHON showed that molecular polarizabilities can also be obtained by simple additivity of atomic hybrid polarizabilities (ahp) [117]

$$\alpha(\mathrm{ahp}) = \sum_A \alpha_A(\mathrm{ahp}) \tag{3.3}$$

More recently, MILLER reviewed additivity methods for the calculation of molecular polarizability and derived a new set of parameters for both schemes (Eq. 3.2 and Eq. 3.3) using a dataset of about 386 compounds [118].

The concept of additivity of atomic contributions can be applied to the prediction of several other molecular properties, such as log P o/w (e.g., [119]), molar refraction (e.g., [120]), or even average drug-receptor binding energies [121]. On the other hand, for other properties atomic additivity breaks down and increments representing larger structural features are required. The most prominent example here is the calculation of heats of formation [122]. Increment types can be defined on the level of bond or group contributions. Putting this in a more general context, additivity schemes can be based on atomic, bond, or group contributions, referred to as 0th-, 1st-, and 2nd-order approximations to additivity rules [123].

In an ongoing project aiming at the prediction of heats of formation by additivity of group contributions [124], we developed a generalized framework for the fragmentation of compounds according to one of the given levels of approximation and for the derivation of fragment contributions to a given property [125]. This application, the ADDITIVITYMODELBUILDER, is based on the MOSES library and was one of the first programs making extensive use of the functionality described in the previous sections. The fragmentations are performed in separate Calculators, and the ADDITIVITYMODELBUILDER uses the property management for retrieving the resulting frequency maps of fragments. Thus, the two different tasks, fragmentation and model building, are completely decoupled and can be reused in different contexts. For instance, the same fragmentation scheme applied for the model building can then be used in a new Calculator for the target property calculated by the model obtained. The derivation of a quantitative model is achieved by applying the multiple linear regression analysis module from the mathematics sublibrary.

The usefulness of the MOSES library is demonstrated by using this application to derive a model for predicting molecular polarizabilities on the 0th-order approximation of additivity.
The dataset published by MILLER [118] with 386 structures and experimental molecular polarizabilities was revised. For several compounds, more than one experimental value was given. For those compounds, the average value was taken. Furthermore, some structures could not be deduced from the name given in [118]. After the cleanup, a dataset of 289 compounds was obtained (all structures are given in the Appendix, Tab. B.1, p. 195).

The fragmentation scheme applied was basically the same as in MILLER's work with some minor modifications. The atom types for nitrogen, oxygen, and sulfur atoms with two pure π-electrons (PI2) were discarded because these atom types are underrepresented in the dataset. For the hypervalent species, new atom types were introduced. As has been shown in Section 3.4.3, hypervalent sulfur and phosphorus are positively charged in order to obey the octet rule. Therefore, it seemed appropriate to represent these atom types by distinct fragments.

The model building was performed with full leave-one-out cross-validation, i.e., each compound in turn was excluded from the training dataset and the model was built for the remaining compounds. With this model, the value for the excluded compound was predicted. The cross-validated model for the entire dataset is graphically presented in Fig. 3.24. The quality of the model is in good agreement with the results of the original paper (r² = 0.998, σ = 0.49 Å³).

The increment values obtained in the present work are listed in Tab. 3.3 and are compared with the values from [117] and [118]. The three series agree well with some exceptions. The values for nitrogen show the largest deviations and are considerably smaller in the present study. A reasonable explanation for that could not be found since the resulting models are of similar quality. It might be due to the slightly larger value found for hydrogen that compensates the smaller values for nitrogen. In general, the values for the first-row elements are slightly smaller than found in the older studies.
The values for the second-row elements obtained in the present study seem to be more reasonable than the values from MILLER. The increment values for elements from the same row of the periodic table should decrease with increasing core charge. However, in the MILLER scheme, the value for phosphorus (PTE) is considerably smaller than the values for sulfur, contradicting this trend. In the present study, the value for PTE is much larger than in the MILLER scheme and slightly larger than the values for sulfur, except for STR4. The increments for the hypervalent atom types are smaller than those for the standard types. This can be expected because the hypervalent atom types are electron-deficient. However, from that point of view one might have expected smaller values for SIV and SVI.

[Figure: scatter plot of calculated versus experimental molecular polarizabilities (Å³), both axes ranging from 0 to 40. Regression line: y = 0.992 · x + 0.12, n = 289, r² = 0.994, σ = 0.576.]

Fig. 3.24 Calculated versus experimental mean molecular polarizabilities. All data points are obtained by leave-one-out cross-validation.
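The leave-one-out procedure behind Fig. 3.24 can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the multiple linear regression module of the MOSES mathematics sublibrary: the model is fitted without intercept by solving the normal equations with Gauss-Jordan elimination, and the function names are hypothetical. Each row of x holds the atom type frequencies of one compound, and y holds the corresponding property values.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Row = std::vector<double>;

// Least-squares fit y ~ X b (no intercept) via the normal equations
// (X^T X) b = X^T y, solved by Gauss-Jordan elimination with partial pivoting.
std::vector<double> fitLeastSquares(const std::vector<Row>& x, const std::vector<double>& y) {
    const std::size_t p = x.front().size();
    std::vector<Row> a(p, Row(p + 1, 0.0));  // augmented matrix [X^T X | X^T y]
    for (std::size_t k = 0; k < x.size(); ++k) {
        for (std::size_t i = 0; i < p; ++i) {
            for (std::size_t j = 0; j < p; ++j) a[i][j] += x[k][i] * x[k][j];
            a[i][p] += x[k][i] * y[k];
        }
    }
    for (std::size_t col = 0; col < p; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < p; ++r) {
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        }
        if (std::fabs(a[pivot][col]) < 1e-12) throw std::runtime_error("singular normal equations");
        std::swap(a[col], a[pivot]);
        for (std::size_t r = 0; r < p; ++r) {
            if (r == col) continue;
            const double f = a[r][col] / a[col][col];
            for (std::size_t j = col; j <= p; ++j) a[r][j] -= f * a[col][j];
        }
    }
    std::vector<double> b(p);
    for (std::size_t i = 0; i < p; ++i) b[i] = a[i][p] / a[i][i];
    return b;
}

// For each compound: fit the model on all other compounds, then predict it.
std::vector<double> leaveOneOutPredictions(const std::vector<Row>& x, const std::vector<double>& y) {
    std::vector<double> predicted(y.size());
    for (std::size_t left = 0; left < y.size(); ++left) {
        std::vector<Row> xTrain;
        std::vector<double> yTrain;
        for (std::size_t k = 0; k < y.size(); ++k) {
            if (k == left) continue;
            xTrain.push_back(x[k]);
            yTrain.push_back(y[k]);
        }
        const std::vector<double> b = fitLeastSquares(xTrain, yTrain);
        double value = 0.0;
        for (std::size_t i = 0; i < b.size(); ++i) value += b[i] * x[left][i];
        predicted[left] = value;
    }
    return predicted;
}
```

The predicted values returned by leaveOneOutPredictions() correspond to the kind of data plotted against the experimental values in a cross-validation plot. For a synthetic dataset that is exactly additive, every left-out value is reproduced up to rounding errors; for real data, the scatter of the predictions measures the predictive quality of the model.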

In summary, the MOSES library has been successfully applied to the problem of predicting molecular polarizabilities. Functionalities of different sublibraries can readily be combined in order to achieve a desired task. The ADDITIVITYMODELBUILDER developed here is implemented in a general way and can be applied to virtually any molecular property that is to be modeled by additivity schemes. The generation of models requires only modest computational resources. For instance, the given example of 289 leave-one-out cross-validations was completed within about twelve seconds of CPU time on a standard Linux workstation.

Tab. 3.3 Atomic hybridization state increments, αA(ahp) in Å³, for the calculation of mean molecular polarizabilities. Values obtained by MLRA in this work using a dataset extracted from [118] are compared to values from literature.

| symbolᵃ | hybrid | this work | MILLER [118] | KANG/JHON [117] |
|---|---|---|---|---|
| H | σ | 0.412 | 0.387 | 0.386 |
| CTE | tetetete | 1.002 | 1.061 | 1.064 |
| CTR | trtrtrπ | 1.364 | 1.352 | 1.382 |
| CBR | trtrtrπ | 1.700 | 1.896 | 1.529 |
| CAR | trtrtrπ | – | – | 1.230 |
| CDI | didiππ | 1.380 | 1.283 | 1.279 |
| NTE | te²tetete | 0.669 | 0.964 | 1.094 |
| NPI2 | trtrtrπ² | – | 1.090 | 1.090 |
| NTR2 | tr²trtrπ | 0.573 | 1.030 | 1.030 |
| NDI | di²diππ | 0.942 | 0.956 | 0.852 |
| OTE | te²te²tete | 0.579 | 0.637 | 0.664 |
| OTR4 | tr²tr²trπ | 0.530 | 0.569 | 0.460 |
| OPI2 | tr²trtrπ² | – | 0.274 | 0.422 |
| F | σ | 0.186 | 0.296 | – |
| PTE | te²tetete | 2.864 | 1.538 | 1.743 |
| PV | tetetete | 1.791 | – | – |
| STE | te²te²tete | 2.741 | 3.000 | – |
| STR4 | tr²tr²trπ | 3.643 | 3.729 | – |
| SPI2 | tr²trtrπ² | – | 2.700 | – |
| SIV | te²tetete | 2.803 | – | – |
| SVI | tetetete | 2.648 | – | – |
| Cl | σ | 2.215 | 2.315 | – |
| Br | σ | 3.206 | 3.013 | – |
| I | σ | 5.411 | 5.415 | – |

ᵃ TE = tetrahedral; TR = trigonal; DI = diagonal; PI = pure π-orbital; BR = branched; AR = aromatic; IV, V, VI =

Chapter 4

Quantification of Atomic Partial Charges

Erich with his Ψ can do calculations quite a few. But one thing has not been seen just what does Ψ really mean. — Walter Hückel, translated by Felix Bloch

4.1 Introduction

The electron distribution of a compound determines its observed properties and its behavior in chemical reactions and biological actions. The laws of quantum mechanics provide the means of obtaining the electron distribution in the form of electronic wavefunctions. More exactly, the square of the wavefunction at a given point, r, gives the probability of locating an electron at that point

$$\rho(r) = |\Psi(r)|^2 \tag{4.1}$$

The electronic wavefunction, Ψ(r), is obtained by finding solutions to the Schrödinger equation. As the Schrödinger equation cannot be solved exactly for many-electron systems, the wavefunction has to be approximated. Modern quantum chemistry provides various methods for finding such approximate solutions at various levels of accuracy [126].

Nevertheless, chemists have always had problems in thinking of electrons in the form of wavefunctions and probabilities of finding an electron at a position in spaceᵃ. Chemists are used to implicitly assigning the electron density around an atom to it and dealing with the electron excess or deficiency compared to the isolated atom. This qualitative assignment of electron density to an atom is based on its electronegativity. An electron excess is denoted as a negative, a deficiency as a positive partial charge. Partial charges of this kind are treated as if they were located at the atomic nucleus.

ᵃ See the quotation of Walter HÜCKEL, the brother of Erich HÜCKEL, the founder of the HMO theory. Even though W. HÜCKEL was a professor of chemistry, he obviously did not get used to the representation of the electron density by Eq. 4.1.

Atomic partial charges are used by chemists to qualitatively assess basic properties, such as solubility (via the polarity of a compound), intermolecular interactions (by identifying sites of Coulomb interactions or hydrogen bond acceptors and donors), or sites of reactivity, such as those of nucleophilic or electrophilic attack. However, partial charges lack a rigorous theoretical definition, and they are not directly accessible by experiment. In other words, a quantum mechanical operator with the atomic charge as expectation value does not exist, and there is no experimental means to measure an atomic charge directly. The question is therefore how to reduce the electron density distribution, which is a four-dimensional function (the probability of finding an electron at a given point in space), to a meaningful set of distinct atomic point charges at the locations of the atomic nuclei.

First of all, it is obvious from the missing physical definition that there is no universal way of determining partial charges. Charges obtained by a distinct calculation method must be seen as a tool for solving the problem at hand. Whereas modeling of receptor-ligand interactions may take advantage of charges obtained by fitting the electrostatic potential, those charges may fail in modeling reactivity.

Basically, two major approaches can be applied in generating atomic charges. If the molecular wavefunction is available from quantum mechanical calculations, a variety of methods can be used for analyzing the wavefunction in order to obtain charges [127]. As quantum mechanical calculations are often too time-consuming to be applied, several empirical methods have been developed to derive charges from the molecular structure. Most of the empirical methods are based on the concept of electronegativity and its equalization (see Section 2.7.1 and [128]).

In the present chapter, the development and parametrization of a new empirical charge calculation scheme will be presented. This work is based on the well-established method of partial equalization of orbital electronegativities (PEOE) that was reviewed in Section 2.7.2.


4.2 Development of a Combined PEOE/HMO Calculation

Both σ- and π-charges calculated with PEOE/PEPE have been shown to give a realistic representation of the charge distribution in organic molecules [53, 57]. Nonetheless, only the PEOE procedure has a well-defined theoretical foundation. The PEPE scheme lacks this basis and has not found broad application. Therefore, a new calculation scheme has been developed and is presented in Section 4.2.1. It is based on the Hückel Molecular Orbital (HMO) theory and has been modified to include inductive effects that are normally not accounted for.

The HMO theory was initially developed for unsaturated hydrocarbons. In order to be able to treat heteroatoms as well, parameters for those atom types have to be found. Here, the question arises on which target property such a parametrization should be based. As said before, charges are neither clearly defined nor directly accessible via experiment. Therefore, the parametrization can either be based on an observable property that is tightly related to a molecule's charge distribution or on charge values that have a sound theoretical foundation and have been shown to have practical relevance. After a discussion of this problem in Section 4.2.3, two models are presented in this chapter: one based on molecular dipole moments and one based on charges obtained with natural population analysis (NPA) from wavefunctions calculated with density functional theory (DFT).

4.2.1 Substitution of PEPE by a Modified Hückel MO Treatment

The PEPE calculation scheme should be substituted by a calculation method with a more profound basis. As the RAMSES data structure provides the information about conjugated π-electron systems, the application of the Hückel Molecular Orbital (HMO) theory [129,130] was the obvious choice. Actually, the separation of σ- and π-electrons embodied in the RAMSES molecular representation (see Section 3.4.2) is based on the approximation formulated in the Hückel theory.

The advantage of the HMO theory is that it has the sound theoretical foundation of quantum mechanics but is still simple enough to be handled without demanding a high computational effort. It is thus applicable to high-throughput calculations as required in the current context. Nevertheless, the Hückel method was developed for unsaturated hydrocarbons and must therefore be modified to be applicable to heteroatoms as well. In the present work, the extension of the HMO method to heteroatoms is applied as outlined by STREITWIESER [129].

Furthermore, the complete neglect of σ-π interactions gives unrealistic results for compounds having strongly polar groups attached to a π-system. Inductive effects are accounted for by a modification of the classical HMO method that includes the σ-charge distribution obtained by the PEOE calculation method in the Hückel calculation scheme (Section 4.2.1.2).

4.2.1.1 Hückel MO Theory

In the HMO theory, as with all kinds of quantum mechanical calculations, approximate solutions to the Schrödinger equation have to be found

$$H\Psi = E\Psi \tag{4.2}$$

Instead of treating all electrons of a given molecule, only π-electrons are considered, and the wavefunction representing a solution to Eq. 4.2 is taken as a linear combination of the constituting carbon $2p_z$ orbitals (linear combination of atomic orbitals, LCAO)

$$\Psi = \sum_r^n c_r \phi_r \tag{4.3}$$

The coefficients, $c_r$, can be determined by applying the variational principle, which states that any wavefunction, Ψ, gives an energy of the system, ε, that can never be below the true energy, $E_0$,

$$\varepsilon = \frac{\int \Psi H \Psi \, d\tau}{\int \Psi^2 \, d\tau} \geq E_0 \tag{4.4}$$

Finding the lowest energy is thus a problem of minimizing ε with respect to the coefficients, $c_r$,

$$\frac{\partial \varepsilon}{\partial c_r} = 0 \tag{4.5}$$

Substituting Eq. 4.3 into Eq. 4.4 results in

$$\varepsilon = \frac{\int \left( \sum_r c_r \phi_r \right) H \left( \sum_r c_r \phi_r \right) d\tau}{\int \left( \sum_r c_r \phi_r \right)^2 d\tau} \tag{4.6}$$

$$\phantom{\varepsilon} = \frac{\sum_r \sum_s c_r c_s \int \phi_r H \phi_s \, d\tau}{\sum_r \sum_s c_r c_s \int \phi_r \phi_s \, d\tau} \tag{4.7}$$

The HMO theory now introduces some formalisms and approximations simplifying the treatment of Eq. 4.7. For the integrals in Eq. 4.7, symbolic expressions are introduced

$$H_{rs} \equiv \int \phi_r H \phi_s \, d\tau \tag{4.8}$$

$$S_{rs} \equiv \int \phi_r \phi_s \, d\tau \tag{4.9}$$

$S_{rs}$ is called the overlap integral. From its definition, $S_{rs}$ is equal to the normalization condition if r = s, and thus $S_{rr} = 1$. For atomic orbitals centered on different atoms, orthogonality is assumed, resulting in $S_{rs} = 0$.

$H_{rr}$ is called the Coulomb integral as it represents the energy of an electron in the field of the atom at which its basis orbital is located. $H_{rs}$ represents the energy of the interaction between two orbitals and is referred to as the resonance integral. For both types of integrals, new symbols are introduced that play an important role in the HMO treatment

$$H_{rr} \equiv \alpha \quad \text{(Coulomb integral)} \tag{4.10}$$

$$H_{rs} \equiv \beta \quad \text{(Resonance integral)} \tag{4.11}$$

Dealing with the symbolic terms and approximations introduced above has the advantage that neither the exact form of the Hamiltonian operator, H, nor the form of the atomic orbitals, ϕ, has to be known. On the other hand, results are obtained only as relative quantities, which are nevertheless internally consistent.

Substituting the symbolic terms and approximations into Eq. 4.7 and differentiating with respect to each coefficient, $c_r$, yields a system of n linear equations

$$\begin{aligned}
c_1(\alpha - \varepsilon) + c_2 \beta_{12} + \dots + c_n \beta_{1n} &= 0 \\
c_1 \beta_{21} + c_2(\alpha - \varepsilon) + \dots + c_n \beta_{2n} &= 0 \\
&\;\;\vdots \\
c_1 \beta_{n1} + c_2 \beta_{n2} + \dots + c_n(\alpha - \varepsilon) &= 0
\end{aligned} \tag{4.12}$$

Eq. 4.12 leads to the Hückel matrix for the considered π-electron system

$$\begin{pmatrix}
\alpha - \varepsilon & \beta_{12} & \dots & \beta_{1n} \\
\beta_{21} & \alpha - \varepsilon & \dots & \beta_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\beta_{n1} & \beta_{n2} & \dots & \alpha - \varepsilon
\end{pmatrix} \tag{4.13}$$

The Hückel matrix has n eigenvalues, which are the energy values of the resulting molecular orbitals (MOs), and n corresponding eigenvectors, which are the orbital coefficients of the basis orbitals contributing to the MOs. Computationally, the eigenvalues and eigenvectors of the Hückel matrix are determined by performing a matrix diagonalization resulting in the vector of energy eigenvalues and the coefficient matrix.

In practice, the Hückel matrix is further simplified by setting to zero all resonance integrals between two atoms that do not share a σ-bond. Furthermore, as the real values of α and β are not known and do not necessarily have to be known, the value of α, the energy of an electron in a C-2pz orbital, is taken as the origin of the internal energy scale. The value of the resonance integral between two C-2pz orbitals is defined to be one.

Finally, having found the coefficients for the molecular orbitals, the charge distribution in a given π-system can be determined by performing a simple population analysis, i.e., summing up the fractions of electrons an atomic orbital contributes to all molecular orbitals. As the probability of finding an electron in space is defined as the square of the wavefunction and the LCAO approach has been applied, the electron density for the molecular orbital, j, is obtained by

$$\int \Psi_j^2 \, d\tau = \int \sum_r c_{jr}^2 \phi_r^2 \, d\tau = \sum_r c_{jr}^2 \int \phi_r^2 \, d\tau \tag{4.14}$$

The integral, $\int \phi_r^2 \, d\tau$, over all space is equivalent to the normalization condition for the atomic orbital, $\phi_r$, and is therefore equal to one.
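As a minimal worked example, which is not part of the original derivation, consider the π-system of ethene with two carbon $2p_z$ basis orbitals (n = 2) and a single resonance integral β. The system of linear equations of Eq. 4.12 then reduces to

$$\begin{aligned}
c_1(\alpha - \varepsilon) + c_2 \beta &= 0 \\
c_1 \beta + c_2(\alpha - \varepsilon) &= 0
\end{aligned}$$

A nontrivial solution exists only if the determinant of the corresponding Hückel matrix vanishes,

$$(\alpha - \varepsilon)^2 - \beta^2 = 0 \quad \Rightarrow \quad \varepsilon_{\pm} = \alpha \pm \beta$$

Inserting the two energies and normalizing gives the coefficients

$$\varepsilon_{+} = \alpha + \beta: \; c_1 = c_2 = \tfrac{1}{\sqrt{2}}, \qquad \varepsilon_{-} = \alpha - \beta: \; c_1 = -c_2 = \tfrac{1}{\sqrt{2}}$$

With the conventions α = 0 and β = 1, the two eigenvalues are +1 and −1 in units of β. As β is a negative quantity, $\varepsilon_{+}$ belongs to the bonding orbital, which holds both π-electrons. Each carbon atom contributes $c_r^2 = 1/2$ to this orbital, in agreement with the normalization expressed by Eq. 4.14.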

Thus, the π-charge, $q_\pi$, of atom r is obtained by

$$q_{\pi,r} = \nu_r - \sum_{j=1}^{N_{occ}} \nu_j c_{jr}^2 \tag{4.15}$$

with j running over all occupied molecular orbitals, $\nu_r$, the occupation of the contributing atomic orbital, and $\nu_j$, the occupation of molecular orbital j.

Heteroatoms So far, the HMO method as outlined above is restricted to unsaturated hydrocarbons. It can, however, be extended to heteroatoms as follows. The LCAO approach applies equally if a corresponding orbital of a heteroatom is used instead of a C-2pz orbital. Since the exact form of a C-2pz orbital has been ignored and the orbital is represented only by its Coulomb integral, $\alpha_{C2p_z}$, an orbital contributed by a heteroatom X can likewise be represented by a value for the corresponding Coulomb integral, $\alpha_X$. An analogous consideration applies to the resonance integrals between carbon and a heteroatom, $\beta_{CX}$, or between two heteroatoms, $\beta_{XY}$.

As energies are taken as relative quantities expressed in units of $\beta_{CC}$, the following definitions are applied [129]

$$
\alpha_X = \alpha_{C2p_z} + h_X \, \beta_{CC}
\tag{4.16}
$$

$$
\beta_{XY} = k_{XY} \, \beta_{CC}
\tag{4.17}
$$

With the definitions $\alpha_{C2p_z} = 0$ and $\beta_{CC} = 1$, the terms $\alpha_X$ and $h_X$ can be used synonymously; the same applies to $\beta_{XY}$ and $k_{XY}$. While some authors use the $h_X$ and $k_{XY}$ terms [131, 132], the present work uses the α/β terms exclusively.

The Coulomb and resonance integrals for heteroatoms cannot be derived on a theoretical basis. They have to be determined such that they give the best fit to external properties (this point will be discussed in greater detail in Section 4.2.3). Several authors have published Hückel parameters, which are given in Tab. 4.1 (p. 97). STREITWIESER [129] summarized the early efforts in the application of the HMO theory and gave a set of "recommended" parameter values. He stated that these values should be applied in semi-quantitative work only and that a refinement is required for quantitative applications. A more comprehensive set of parameters was derived by VAN-CATLEDGE [131] by extracting the parameters from Pariser-Parr-Pople calculations. This set can be considered the most comprehensive one published, as it includes all combinations of heteroatoms important in organic chemistry.


More recently, ABRAHAM and SMITH [132] published two sets of Hückel parameters, one optimized with experimental dipole moments and one with π-charge densities obtained from CNDO calculations. Both parameter sets are given in Tab. 4.1.

Concerning the sign of the given values (and of values for Hückel parameters in general), it should be kept in mind that both Coulomb and resonance integrals are actually negative quantities. They represent the energy released when an electron is brought into a given orbital and the interaction energy of two orbitals, respectively. Even though the energy scale is not taken in absolute terms, it is clear from Eq. 4.16 that positive values for α represent more negative values for a Coulomb integral, i.e., a tighter binding of an electron in that orbital compared to a C-2pz orbital. For β, a value of less than one indicates a less effective interaction between two orbitals compared to a CC double bond.
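As an illustration of Eqs. 4.16 and 4.17, the following sketch sets up the Hückel matrix of an isolated C=O π-bond and derives the π-charges. It assumes the values α = 1.00 and β = 1.00 listed for STREITWIESER [129] for an oxygen atom contributing one electron; the helper name `huckel_matrix` and the two-center model system are illustrative assumptions.

```python
import numpy as np

def huckel_matrix(alpha, bonds):
    """Build a Hückel matrix from Coulomb integrals and resonance integrals.

    alpha: sequence of Coulomb integrals (alpha_X = h_X with alpha_C = 0)
    bonds: dict mapping atom index pairs (i, j) to beta_XY (= k_XY)
    """
    m = np.diag(np.asarray(alpha, dtype=float))
    for (i, j), beta in bonds.items():
        m[i, j] = m[j, i] = beta
    return m

# C=O pi-bond: atom 0 = carbon, atom 1 = oxygen, one pi-electron each.
matrix = huckel_matrix([0.0, 1.00], {(0, 1): 1.00})
x, c = np.linalg.eigh(matrix)
bonding = c[:, np.argmax(x)]          # largest x corresponds to lowest energy
q_pi = np.array([1.0, 1.0]) - 2.0 * bonding ** 2
print(np.round(q_pi, 4))              # pi-charges on C and O
```

The positive Coulomb integral of oxygen binds the electron pair more tightly, so the doubly occupied orbital is polarized towards oxygen and carbon becomes π-electron deficient.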


Table 4.1 Parameters published by other authors for treating heteroatoms within the HMO method: Coulomb integrals, α, and resonance integrals, β, for a series of CX-π-bonds. The number of electrons an atom is contributing to the π-system is indicated by a single (·) or a double (··) dot following the element symbol.

                                                          ABRAHAM and SMITH [132]
                STREITWIESER [129]   VAN-CATLEDGE [131]   µ (a)      πCNDO (b)

atom type       Coulomb integrals (αi,0)
C·              0.00                 0.00                 0.00       0.00
N·              0.50                 0.51                 0.33       0.27
N··             1.50                 1.37                 1.48       1.97
O·              1.00                 0.97                 0.31       0.59
O··             2.00                 2.09                 1.69       1.45
P·              –                    0.19                 –          –
P··             –                    0.75                 –          –
S·              –                    0.46                 –          –
S··             –                    1.11                 –          –
F··             3.00                 2.71                 1.67       1.39
Cl··            2.00                 1.48                 1.03       –
Br··            1.50                 –                    1.03       –

bond type       resonance integrals (βij)
C· – C·         1.00                 1.00                 1.00       1.00
C· – N·         1.00                 1.02                 0.72       1.17
C· – N··        0.80                 0.89                 0.71       1.28
C· – O·         1.00                 1.06                 1.05       1.60
C· – O··        0.80                 0.66                 0.59       0.79
C· – P·         –                    0.77                 –          –
C· – P··        –                    0.76                 –          –
C· – S·         –                    0.81                 –          –
C· – S··        –                    0.69                 –          –
C· – F··        0.70                 0.52                 0.54       0.56
C· – Cl··       0.40                 0.62                 0.29       –
C· – Br··       0.30                 –                    0.29       –

(a) parameters derived by fitting to experimental dipole moments in [132]
(b) parameters derived by fitting to π-charges from CNDO calculations in [132]


4.2.1.2 Modified HMO: Accounting for the Inductive Effect

Of all approximations applied in the Hückel theory, the complete neglect of σ-π interactions is certainly the most severe oversimplification (see the discussion below). Electron-withdrawing neighbors attached to a π-center may induce a significant positive partial charge. This positive charge should increase the Coulomb attraction of a π-electron by the nucleus and hence requires an adaptation of the corresponding Coulomb integral.

In order to analyze this effect, functional groups have to be attached to a π-system that have no direct π-interaction with it but exert a significant inductive effect. Thus, electron-withdrawing groups such as nitro or nitrile groups are not suitable for an analysis of the inductive effect on π-systems. Nor can atoms be used that have a high electronegativity but carry free electrons that may interact with the π-system. The group of choice was the trifluoromethyl group, as it has a high group electronegativity but is still small enough to avoid steric interactions. Ethene was used as the model system, as it has the most basic π-system. The influence of one or two trifluoromethyl groups and the resulting polarization of the π-bond was studied.

As a strict separation of σ- and π-systems, as used within the RAMSES structure representation, is quite artificial and not applied in quantum mechanical calculations, π-charge densities cannot be obtained directly from quantum chemical programs. Thus, conclusions have to be drawn from relative changes compared to analogous saturated compounds. Hence, for analyzing the σ-π interactions, the group charges of compounds 1-4 have been calculated by natural population analysis (see Section 4.2.5). The structures and charge values are given in Fig. 4.1. The group charges of the trifluoromethyl groups are basically the same for the saturated and unsaturated compounds, 1/3 with one and 2/4 with two substitutions. The trifluoromethyl groups polarize the attached alkyl moiety.
For the saturated compounds 1 and 2, a positive charge is found on the terminal methyl groups. Although bound to the electronegative groups, the α-carbon groups attain a negative charge. This is certainly due to maximizing electrostatic interactions, as the carbon skeleton then has a charge alternation, δ+ – δ− – δ+. For the unsaturated counterparts 3 and 4, this polarization effect is more pronounced. Due to the inductive effect exerted by an adjacent trifluoromethyl group, the α-carbon atom becomes electron deficient and attracts π-charge from the β-carbon atom. The amount of shifted charge is larger in the unsaturated system as π-electrons can be polarized more easily. Taking the σ-charges as constant in the saturated and unsaturated systems, the amount of π-charge shifted from the β- to the α-carbon atoms is approximately 0.05 due to the influence of one trifluoromethyl group.

[Structures of the model compounds 1-4]

                compound
group           1          2          3          4
CF3            −0.0280     0.0192    −0.0306     0.0187
C¹Hn           −0.0192    −0.1247    −0.0699    −0.2247
C²Hm            0.0472     0.0862     0.1005     0.1872

Fig. 4.1 Model compounds for studying the influence of electron-withdrawing groups on the π-charge distribution. The table gives the group charges of the corresponding groups in the model compounds 1-4. All charges were calculated by natural population analysis from B3LYP/6-311+G(d,p) wavefunctions of the fully optimized structures.

The problem posed by the dependence of the Coulomb integral on the partial charge was addressed as early as five decades ago by WHELAND and MANN [133]. That paper proposed performing several HMO treatments iteratively and adapting the Coulomb integrals after each iteration step in order to achieve a self-consistent procedure. It was proposed to modify α0, the Coulomb integral for an uncharged atom, after each iteration step by an increment that depends linearly on the current atomic partial charge

$$
\alpha_i = \alpha_0 + \Delta\alpha_i, \qquad \Delta\alpha_i = \omega \cdot q_i
\tag{4.18}
$$

STREITWIESER called this approach the ω-technique [134] after the linear coefficient in Eq. 4.18, and later literature refers to it by that term. Subsequently, the ω-technique was applied to account for the inductive effect by including the σ-charge in the second term of Eq. 4.18 [135–137]. In the present work, a deficiency of that approach was identified for alkyl-substituted π-systems and was found to be especially serious for the calibration with DFT/NPA charges (see Section 4.5).

The problem will be demonstrated with the example of toluene. Compared to benzene, this compound is known to be slightly activated towards electrophilic substitution, with a higher reactivity in the ortho- and para-positions. Each C–H carbon atom in the ring attracts some charge from the attached hydrogen atom due to the higher electronegativity of the carbon atom. This negative charge is less pronounced in the ipso-position because the methyl carbon atom is still less electronegative than the sp2-ring atom but releases less charge than a hydrogen atom. What does this mean for the Hückel calculation? In terms of the unmodified HMO method, the Hückel matrix is identical to that of benzene. Applying the ω-technique and adding a fraction of the σ-charges to the Coulomb integrals has the effect that all unsubstituted ring atoms obtain a more negative value than the ipso-atom. The result is an accumulation of π-charge on the ipso-atom. In turn, the unsubstituted ring atoms are effectively deactivated, and a charge pattern is obtained in which the ortho- and para-positions show decreased charge values compared to benzene instead of an increase in π-charge density.

In order to overcome this problem, a new approach has been chosen that is based on electronegativity instead of charge. This was motivated by a previous publication showing that residual electronegativities give a quantitative measure for the inductive effect [138].
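For orientation, the classical ω-technique of Eq. 4.18 can be sketched as a self-consistent loop around the HMO diagonalization. The sketch is not the implementation used in this work: the value ω = 1.4, the damping factor `mix` (added here because the undamped iteration oscillates for this example), the function name `omega_technique`, and the C=O model system are all assumptions made for illustration.

```python
import numpy as np

def omega_technique(alpha0, bonds, pi_electrons, n_occ,
                    omega=1.4, mix=0.5, tol=1e-10, max_iter=500):
    """Iterate alpha_i = alpha_0 + omega * q_i (Eq. 4.18) to self-consistency."""
    alpha0 = np.asarray(alpha0, dtype=float)
    nu_r = np.asarray(pi_electrons, dtype=float)
    alpha = alpha0.copy()
    for _ in range(max_iter):
        m = np.diag(alpha)
        for (i, j), beta in bonds.items():
            m[i, j] = m[j, i] = beta
        x, c = np.linalg.eigh(m)
        occupied = c[:, np.argsort(x)[::-1][:n_occ]]   # doubly occupied MOs
        q = nu_r - 2.0 * (occupied ** 2).sum(axis=1)   # pi-charges (Eq. 4.15)
        target = alpha0 + omega * q                    # Eq. 4.18
        if np.max(np.abs(target - alpha)) < tol:
            return alpha, q
        alpha = (1.0 - mix) * alpha + mix * target     # damped update
    raise RuntimeError("omega-technique did not converge")

# C=O pi-bond with alpha_O = 1.00 and beta_CO = 1.00, two pi-electrons.
alpha_scf, q_scf = omega_technique([0.0, 1.0], {(0, 1): 1.0}, [1, 1], n_occ=1)
print(np.round(alpha_scf, 4), np.round(q_scf, 4))
```

The self-consistent charges are smaller in magnitude than those of a single HMO step, since the positive charge built up on carbon raises its Coulomb integral and counteracts further polarization.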
Residual electronegativities are obtained after completion of the PEOE calculation and have been related to the power of an atom in a molecule to further attract electron density.

In setting up a new equation for ∆α, the absolute values of the residual electronegativities had to be taken into account. Whereas in the previous approach a charge value of zero indicated that an atom experiences no polar influences, electronegativity values are almost always different from zero. Hence, the zero point for the electronegativity as a measure for the inductive effect had to be defined. From a survey of the literature, this zero point was defined to be the residual electronegativity of a methyl carbon atom in ethane. Even though textbooks assume a slight +I effect for the methyl group [56], more recent publications give polar substituent constants of zero for a methyl group, based on experimental data [139, 140] or on theoretical estimations [141, 142].

Consequently, ∆α for a given π-center, i, is calculated as the averaged difference of the residual electronegativities, $\chi_{\sigma,j}$, and the zero value, $\chi_{\sigma,C_{methyl}}$, for all atoms, j, of the first topological sphere, $N_i$, of atom i,

$$
\Delta\alpha_{\sigma-\pi,i} = f_{\sigma-\pi} \cdot \frac{1}{N_i} \sum_{j}^{N_i} \left( \chi_{\sigma,j} - \chi_{\sigma,C_{methyl}} \right)
\tag{4.19}
$$

4.2.1.3 Hyperconjugation

As will be shown in Section 4.5.3, σ-π interactions can be modeled well by Eq. 4.19 within the HMO approach. Nevertheless, the influence of an alkyl group on a π-system as observed in toluene was not well covered. As mentioned above, the influence of an alkyl group should not be attributed to an inductive effect. Actually, its electron-donating capabilities are hyperconjugative in nature. Therefore, another minor modification was introduced in the HMO procedure.

As STREITWIESER described in [129], approaches have been developed to model hyperconjugation by treating appropriate groups as pseudo-π-centers and finding Hückel parameters as for other heteroatoms (see above). Such an approach was found to be hard to apply here because of the RAMSES data structure. The special feature of RAMSES is the separate treatment of σ- and π-electron systems, and the approach of using pseudo-π-centers would require a combination of both. Therefore, a further correction term was introduced for π-centers attached to a hyperconjugative alkyl group. This term was set to be proportional to the σ-charge, $q_{\sigma,j}$, on the adjacent alkyl carbon atoms, j,

$$
\Delta\alpha_{hyp,i} = f_{hyp} \cdot \sum_{j}^{n_{alkyl}} q_{\sigma,j}
\tag{4.20}
$$

One may argue that this correction will not give an increased electron density in the π-system, as no charge is actually shifted from the alkyl group. Nevertheless, the expected charge distribution is obtained for the following reason: the applied modification results in a redistribution of charge from the ipso-carbon atom (in the case of toluene) to the remaining π-centers.


As will be shown, the obtained charges give the established charge pattern. Therefore, this modification might be seen as a pseudo-model of the hyperconjugative effect.

In summary, the total correction for the Coulomb integral, $\Delta\alpha_i$, for a given π-center is calculated as

$$
\Delta\alpha_i = \Delta\alpha_{\sigma-\pi,i} + \Delta\alpha_{hyp,i}
\tag{4.21}
$$
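The three equations 4.19 to 4.21 can be combined into one small function, sketched below. All numerical values in the example (residual electronegativities, σ-charge, and the factors `f_sigma_pi` and `f_hyp`) are hypothetical placeholders chosen only to show the arithmetic; they are not parameters from this work, and the function name `delta_alpha` is likewise an assumption.

```python
def delta_alpha(chi_neighbors, chi_methyl, q_sigma_alkyl, f_sigma_pi, f_hyp):
    """Total Coulomb integral correction for one pi-center (Eqs. 4.19-4.21).

    chi_neighbors: residual electronegativities of all directly bonded atoms
    chi_methyl:    zero point, residual electronegativity of C in ethane
    q_sigma_alkyl: sigma-charges of the adjacent alkyl carbon atoms
    """
    if not chi_neighbors:
        raise ValueError("a pi-center needs at least one neighbor")
    # Eq. 4.19: averaged electronegativity difference of the first sphere
    d_sigma_pi = f_sigma_pi * sum(chi - chi_methyl
                                  for chi in chi_neighbors) / len(chi_neighbors)
    # Eq. 4.20: pseudo-model of hyperconjugation
    d_hyp = f_hyp * sum(q_sigma_alkyl)
    # Eq. 4.21: total correction
    return d_sigma_pi + d_hyp

# Hypothetical pi-center with three neighbors, one of them an alkyl carbon.
correction = delta_alpha(chi_neighbors=[8.5, 7.9, 7.9], chi_methyl=7.9,
                         q_sigma_alkyl=[-0.05], f_sigma_pi=0.5, f_hyp=1.0)
print(round(correction, 4))
```

Averaging over the first topological sphere keeps the inductive term independent of the number of neighbors, whereas the hyperconjugation term is a plain sum over the adjacent alkyl carbon atoms.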

4.2.2 Datasets

For the parametrization of the developed charge calculation procedure and for the study and comparison of other charge calculation methods, a dataset of 457 compounds was compiled. It was decided to use the molecular dipole moment as the experimental fitting property (see Section 4.2.3). Consequently, compounds were chosen in preference if an experimental gas-phase dipole moment was available. An initial set of molecules was taken from [143]. This set was extended with compounds from [144] and [145]. In addition, the dataset was augmented with structures for which no experimental dipole moment was available but which were needed to represent required structural features, such as the S–O σ-bond in sulfenic acid.

The dataset was divided into one calibration and two test datasets. The calibration was performed stepwise, considering the smallest possible number of atom types in each step. Consequently, the calibration dataset was populated with small mono-functional organic compounds covering the most common functional groups containing the elements nitrogen, oxygen, and fluorine of the first row, silicon to chlorine of the second row, and bromine. A detailed listing of all elements and their functional groups contained in the calibration dataset can be found in Tab. 4.2. All compounds are given in the Appendix (Section B.2, p. 202) along with the experimental and calculated dipole moments. One may note that some compounds are chemically unstable, such as vinyl alcohol, which is much less stable than the tautomeric acetaldehyde. Nevertheless, these compounds have been included as they were required for representing an interaction of interest, in the case of vinyl alcohol the interaction of an oxygen lone pair with the π-system of a vinyl group.
The test datasets consist of multi-functional organic compounds in order to demonstrate that the parametrizations obtained for compounds with one isolated functional group each can successfully be applied to multi-functional compounds. The first test dataset contains all twenty amino acids and three compounds of some medicinal relevance. The second test dataset contains 48 small multi-functional organic compounds. A summary of the test datasets is given in Tab. 4.3, and all compounds are given in the Appendix (Section B.2, p. 212) along with the experimental and calculated dipole moments.

Table 4.2 Calibration dataset. The entire dataset consists of 386 mono-functional small organic compounds. Given are the main compound classes and their subclasses (indented).

compound class        no. of occurrence
hydrocarbons           46
nitrogen               77
    amines             15
    imines              4
    amides              9
    nitriles           18
    heterocycles       11
    N-O-linkage        14
    N-N-linkage         6
oxygen                 63
    alcohols           15
    aldehydes           6
    ketones            13
    acids               6
    ethers             12
    […]                10
    heterocycles        1
fluorine               31
silicon                17
phosphorus             49
    P(III)             33
    P(V)               16
sulfur                 50
    S(II)              37
    S(IV),S(VI)        13
chlorine               32
bromine                21
Σ                     386

Table 4.3 Test dataset. The entire dataset of size 71 consists of two subdatasets representing bio-medicinal and multi-functional organic compounds, respectively.

compound class                 no. of occurrence
test dataset I                  23
    amino acids                 20
    medicinal                    3
test dataset II                 48
    multi-functional cmpds.     48
Σ                               71

Quantum Mechanical Calculations For all structures of the entire dataset, 3D coordinates, calculated dipole moments, and atomic charges from various charge calculation schemes were required (see below). In order to achieve good accuracy with a modest computational effort, calculations were performed with the DFT (density functional theory) method of Becke [146] with the Lee-Yang-Parr correlation functional [147] (B3LYP). The B3LYP method has been shown to give dipole moments and partial charges in good agreement with those obtained by high-level ab initio calculations [148]. The medium-sized 6-311+G** basis set was used for all calculations. This basis set is considered to be the smallest one that can reasonably be applied with a method that takes electron correlation into account, as the B3LYP method does [149], and allows the treatment of elements up to [150]. All calculations were performed with the GAUSSIAN 98 program package [96].

Unfortunately, quantum mechanical calculations are still quite tedious and require many single steps. Because quite a large dataset was compiled in the present work and had to be processed by GAUSSIAN 98, an automated workflow was established based on a set of scripts written in the Perl programming language [151]. This workflow is shown as a flow chart in Fig. 4.2. Instead of manually setting up GAUSSIAN 98 input files structure by structure, a file containing the entire dataset can be input by a molecular editor and given in CTX format. Thus, the Z-matrices do not have to be constructed by hand. The start geometries for the structure minimization were generated with the CORINA 3D structure generator [30, 31]. The disadvantage of that approach is that the structure cannot be optimized in the highest possible symmetry. This should not pose any problem here, as the results are not supposed to give highly accurate structures and energies but are to be used for the parametrization of an empirical charge calculation method. For each structure, a frequency calculation was performed subsequent to the optimization in order to identify geometries that were not local energy minima, as indicated by imaginary frequencies, which are reported as negative values (NImag) [150].
Such structures had to be processed further by hand until a minimum was reached. Furthermore, even if a minimum is located, the structure may be far from the global optimum, and the dipole moment may then differ significantly from the observed one. Such cases also had to be treated manually.

Following the optimization step, the tools for calculating the atomic charges were applied. Finally, all data required were extracted from the GAUSSIAN 98 output files and converted back into CTX connection tables and property entries. The connection tables were generated from the 3D coordinates of the GAUSSIAN 98 output files. This was found to be necessary because GAUSSIAN 98 or other tools applied may reshuffle the order of the atoms. As the task of deriving the connection table from the 3D coordinates alone is somewhat error-prone, hash codes [152] were generated in a last step from the initial structures and from the structures obtained from the conversion. If the hash codes differed, the workflow had failed and the results could not be used further.

Fig. 4.2 Automated workflow for the quantum mechanical calculations. On the right-hand side, the programs used are given as comments.
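The final consistency check of the workflow, comparing hash codes of the initial and the regenerated connection table, can be illustrated with a small sketch. It does not reproduce the hash code algorithm of [152] or the Perl scripts of this work; instead it uses a simple iterative neighborhood refinement as a stand-in, which is independent of the atom order but not guaranteed to distinguish all non-identical structures. The function name `graph_hash` and the input format are assumptions.

```python
import hashlib

def graph_hash(elements, bonds, rounds=3):
    """Atom-order-independent hash of a connection table (illustrative only).

    elements: list of element symbols, one per atom
    bonds:    list of (atom_index, atom_index, bond_order) tuples
    """
    neighbors = {i: [] for i in range(len(elements))}
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))
    labels = list(elements)
    for _ in range(rounds):
        refined = []
        for i, label in enumerate(labels):
            # Describe each atom by its own label and its sorted environment.
            env = sorted(f"{order}:{labels[j]}" for j, order in neighbors[i])
            text = label + "|" + ",".join(env)
            refined.append(hashlib.sha256(text.encode()).hexdigest())
        labels = refined
    return hashlib.sha256(",".join(sorted(labels)).encode()).hexdigest()

# Heavy-atom skeleton of ethanol in two different atom orders.
initial = graph_hash(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
reshuffled = graph_hash(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
# Dimethyl ether has the same composition but a different connectivity.
isomer = graph_hash(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
print(initial == reshuffled, initial == isomer)
```

A mismatch between the two hash values signals that the connection table derived from the 3D coordinates no longer describes the input structure, in which case the results would be discarded.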

4.2.3 The Problem of Parametrization

The combined PEOE/MHMO procedure proposed for calculating partial charges depends on several parameters. All adjustable parameters are summarized in Tab. 4.4. As can be seen, the PEOE procedure has only one adjustable parameter in its original outline. Later, it was found that new parameters had to be introduced. This will be discussed in Section 4.5.

Table 4.4 Summary of adjustable parameters in the proposed PEOE/MHMO scheme.

parameter    method    description
fdamp        PEOE      damping factor
fσ−π         MHMO      factor for weighting the inductive effect
fhyp         MHMO      factor for the pseudo-model of hyperconjugation
αi,0         MHMO      initial values for atomic Coulomb integrals
βij          MHMO      bond resonance integrals

The concept of expressing the electron density as atomic point charges lacks any theoretical foundation, and any kind of charge calculation method assigns charges more or less arbitrarily (see Section 4.1). This means that there is no unambiguous reference system that can serve as a target for fitting an empirical charge calculation scheme.

Basically, two different routes can be followed when parameterizing a charge calculation scheme: (i) finding charges that reproduce an observable property as well as possible, or (ii) fitting the charge values to charges obtained from other charge calculation schemes.

The former route requires a property that can be measured for all structures under consideration. Furthermore, this property should depend only on the charge distribution. If other factors play a role, it may be hard to separate the influence of the different factors. For (ii), a variety of methods is available based on the analysis of the wavefunction obtained by quantum mechanical calculations. A method has to be found that provides meaningful and well-defined charges of practical relevance; otherwise, a fitting would be of minor value.

In Section 4.2.4, experimental properties are discussed that might serve as a fitting property, and Section 4.2.5 gives a thorough evaluation of quantum mechanical charge calculation schemes. In Section 4.2.7, conclusions are drawn from these analyses, and the parametrizations obtained are presented in the remaining sections.

4.2.4 Observable Properties

ESCA Chemical Shifts In photo-electron spectroscopy, compounds are exposed to X-rays. The energy of the emitted electrons is recorded, resulting in spectra that reflect the binding energy, $E_b$, of the electrons as the difference between the energy of the X-ray photons, hν, and the kinetic energy of the emitted electrons

$$
E_b = h\nu - E_{kin}
\tag{4.22}
$$

Inner shell electrons are not involved in bonding and remain basically characteristic of the atom. Nevertheless, the observed energy of an inner electron liberated by X-rays reflects the atom's environment in a compound, as the electron has to pass through the valence electron shell. If an atom is bound to electronegative atoms, it will be electron deficient. An inner electron will pass more easily through the electron shell and will have a higher kinetic energy in the spectrum compared to an electron located at an atom bearing a negative partial charge. The photo-electron spectroscopy of inner shell electrons therefore has some relevance in the analysis of chemical binding and was termed ESCA (Electron Spectroscopy for Chemical Analysis) by SIEGBAHN et al. [153, 154].

ESCA shifts have been used for both calibration [10, 155] and evaluation [9, 156–158] of empirical charge calculation schemes. However, it has to be emphasized that using ESCA shifts for the correlation with atomic charges is not without problems. When an electron is ejected from a compound, the electron shell of the compound reorganizes, which influences the kinetic energy of the emitted electrons. It is assumed that this energy contribution is constant if the electrons are localized, i.e., in saturated systems. For conjugated electron systems this is not the case, and the relaxation is no longer a constant quantity. Therefore, the correlation of ESCA shifts with charges in unsaturated compounds has to be treated with care.

For a broadly applicable charge calculation method, the calibration has to be based on medium-sized datasets of several hundred molecules with a sufficient diversity in atom types and structural features. Thus, using an atomic property for the calibration requires several thousand experimental data to be found (assuming that small molecules with about ten atoms per compound are used). This is practically infeasible. Both the uncertainty with respect to unsaturated compounds and the problem of finding a large number of experimental data made calibrating our charge calculation scheme with ESCA shifts impracticable.

NMR Shifts Another atomic property that is associated with the partial charge is the nuclear magnetic resonance (NMR) shift. Atomic nuclei with a non-zero spin interact with electromagnetic radiation in an applied magnetic field. The resonance frequency depends on the electron shell, which shields the nucleus from the external magnetic field. Analogous to the core binding energies discussed in the previous paragraph, NMR shifts reflect the chemical environment of an atom. Electronegative neighbor atoms cause a downfield shift by reducing the electron density, which results in a corresponding deshielding effect.

Unfortunately, the relationship between NMR shifts and partial charges is much less clear than with ESCA shifts. Even in saturated systems, chemical shifts are also influenced by anisotropic magnetic fields from atoms that can be several bonds remote from the atom under consideration. Furthermore, π-electron systems can induce magnetic fields that may be directed against or along the external field experienced by the atom under consideration. As HUHEEY stated, NMR shifts initially promised to give a good measure for partial charges, but the expectations were not met [159]. Nevertheless, it was found that for certain series of closely related compounds it may be justified to expect correlations between NMR shifts and atomic charges [160]. In such a manner, charges from the PEOE procedure were correlated with 1H-NMR shifts with some success [53]. As a target property for calibration, however, NMR shifts were considered to be inappropriate.

Molecular Dipole Moments In principle, from their definition, dipole moments should be the best measure for reflecting point charges. For two equal point charges, q, with opposite sign separated by a distance, $\vec{r}$, the dipole moment is defined as

$$
\vec{\mu} = q \cdot \vec{r}
\tag{4.23}
$$

Thus, there is a direct unambiguous relationship between dipole moment and charge.


The dipole moment of a compound is the resultant of all bond dipoles. A bond dipole results from an unsymmetrical charge distribution due to differences in electronegativity and is thus related to the partial atomic charges. It is clear that the molecular dipole moment is also strongly dependent on the conformation. For uncharged molecules, it can be calculated from the partial charges of all N atoms and the three-dimensional atomic coordinates

$$
\begin{aligned}
\mu_x &= \sum_i^N q_i x_i \\
\mu_y &= \sum_i^N q_i y_i \\
\mu_z &= \sum_i^N q_i z_i \\
\mu &= \sqrt{\mu_x^2 + \mu_y^2 + \mu_z^2}
\end{aligned}
\tag{4.24}
$$

The dipole moment is a vector property, but in most cases only the absolute value is considered. This value has been measured for a broad range of structures [161] with an experimental error within a range of ±0.05 debye [162]. In order to avoid solvent effects, which are contained in the experimental value but are not accounted for in the present charge calculation scheme, only gas-phase dipole moments should be considered.

On the other hand, dipole moments can be obtained directly from quantum mechanical calculations as expectation values of the dipole moment operator. This provides the opportunity to enhance datasets with calculated dipole moments for compounds for which no experimental values are available. In order to prove that such calculated dipole moments agree with experimental values at the applied level of theory (see Section 4.2.2) and can be used as fitting property, the calculated values were correlated with experimental values. An observed dipole moment could be found for 310 molecules in the dataset of 457 structures presented in Section 4.2.2. The correlation found (Fig. 4.3) is quite good, and it seemed justified to use calculated dipole moments if an experimental value is missing.

The slope of 1.072 shows a systematic deviation, which should be considered when mixing experimental and calculated dipole moments. The strongest deviation is found for disiloxane, which has an experimental dipole moment of 0.78 debye whereas it was predicted to have no dipole moment. This deviation is due to a differing conformation. Experimentally, the Si–O–Si angle was reported to be 144.1° [163]. In contrast, the energy minimum found by GAUSSIAN 98 has a linear conformation, which explains the vanishing dipole moment. Some other points in Fig. 4.3 show significant deviations. Therefore, the mixing of experimental and calculated SCF dipole moments should be considered with care.

[Scatter plot: SCF dipole moment (debyes) versus expt. dipole moment (debyes), both axes from 0 to 6; regression line y = 1.072 * x + 0.036, n = 310, r2 = 0.973, σ = 0.192]

Fig. 4.3 Correlation of calculated dipole moments obtained directly from the wavefunctions applying the dipole moment operator with observed values.

Finally, it has to be stated that the fitting to dipole moments is not without problems either. As the partial charge does not reflect the anisotropy of the charge distribution, the center of charge may not coincide with the nucleus. This is always the case for atom types with free electron pairs. When the dipole moment is calculated based on the assumption of partial charges located at the nuclei, a certain error is therefore introduced in such cases. In order to overcome this problem, KOLLMAN considered lone pair moments explicitly, which is not trivial as the center of charge of the lone pairs is hard to determine [164]. A different approach was suggested by RAUHUT and CLARK using a multi-center point charge approach [165]. Instead of locating the partial charge at the nucleus, in that model nine point charges are located around an atom at the charge centroids of the lobes of the corresponding hybrid orbitals.

For the sake of simplicity of our charge calculation scheme, neither of these methods is attractive. Both approaches require additional three-dimensional information, but neither the PEOE nor the HMO method takes the three-dimensional structure into account. Therefore, we assume that the error introduced by electronic anisotropy is small with respect to the approximate nature of our charge calculation scheme. Furthermore, the molecular dipole moment as fitting property for charge calculation schemes has found broad acceptance in the literature for empirical methods [55, 132, 136, 144, 166] and even for the refinement of partial charges obtained from quantum chemical calculations [167–169].
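The calculation of the molecular dipole moment from atomic point charges according to Eq. 4.24 can be sketched in a few lines. The sketch assumes charges in units of the elementary charge and coordinates in Ångström, which are converted to debye with the factor 1 e·Å = 4.80320 D; the function name `dipole_moment` and the two-charge test system are illustrative assumptions.

```python
import numpy as np

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 elementary charge * 1 Angstrom in debye

def dipole_moment(charges, coords):
    """Return the dipole vector and its magnitude in debye (Eq. 4.24).

    charges: partial charges in units of e (should sum to zero)
    coords:  Cartesian coordinates in Angstrom, shape (N, 3)
    """
    q = np.asarray(charges, dtype=float)
    xyz = np.asarray(coords, dtype=float)
    mu_vec = (q @ xyz) * E_ANGSTROM_TO_DEBYE   # (mu_x, mu_y, mu_z)
    return mu_vec, float(np.linalg.norm(mu_vec))

# Two opposite point charges of 0.2 e separated by 1 Angstrom (cf. Eq. 4.23).
mu_vec, mu = dipole_moment([0.2, -0.2], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(np.round(mu_vec, 4), round(mu, 4))
```

Since the charges of an uncharged molecule sum to zero, the result does not depend on the choice of the coordinate origin, which is why Eq. 4.24 is restricted to uncharged molecules.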

4.2.5 Charges from Quantum Mechanical Calculations

The result of a quantum mechanical calculation for a chemical structure is a molecular wavefunction represented as a Slater determinant of molecular orbitals (MOs). Molecular orbitals are constructed as linear combinations of atomic orbitals (LCAO), and the atomic orbitals in turn are usually constructed as linear combinations of primitive Gaussian-type functions [4]. Consequently, the chemical information is contained in a large set of linear coefficients representing the best solution of the trial wavefunction. Unfortunately, a wavefunction as such is of little practical value because the list of linear coefficients is hard or even impossible to interpret directly. Therefore, the information contained in the wavefunction has to be translated into known chemical concepts. Besides global properties, such as the molecular dipole moment or the energies of the frontier orbitals, the derivation of atomic partial charges from the wavefunction has a long tradition because charges are so firmly established in the thinking of organic chemists.

The task of calculating atomic charges from quantum chemical calculations is easily defined. The electron density is given by the square of the wavefunction (see Eq. 4.1, p. 89). Any calculation algorithm has to find a way of formally partitioning the molecular electron density into atomic contributions and of deriving the electronic population of an atom from the atomic fragments. A variety of methods has been developed following one of three major routes: (i) analyzing the occupancy of the atomic orbitals that constitute the basis set of the LCAO-MOs (referred to as orbital-based methods), (ii) defining the spatial occupancy of an atom in a molecule and integrating the electron density over the space thus defined (topological methods), or (iii) determining the atomic point charges such that they give the best fit to the electrostatic potential (ESP) obtained directly from the wavefunction (ESP-fitted charges).

It is not the aim of this section to give a comprehensive introduction to the methods mentioned above. Here, the basic concepts are introduced and the methods are analyzed in the light of the requirements imposed by fitting our charge calculation method to quantum chemical charges. A comprehensive review of these methods was given by BACHRACH [127]. Furthermore, WIBERG and RABLEN gave a general comparison of atomic charges obtained by different procedures from all three major approaches [170].
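The partitioning task for routes (i) and (ii) can be written compactly. The following is a generic formulation and is not quoted from this work; the symbols N_A (electron population assigned to atom A), Z_A (nuclear charge of A), q_A (partial charge of A) and N (total number of electrons) are notation introduced here for illustration only.

```latex
\rho(\mathbf{r}) = \sum_A \rho_A(\mathbf{r}), \qquad
N_A = \int \rho_A(\mathbf{r})\,\mathrm{d}\mathbf{r}, \qquad
q_A = Z_A - N_A, \qquad
\sum_A N_A = N
```

The individual schemes differ in how the atomic contribution \rho_A is defined. ESP-fitted charges (route iii) do not define N_A at all but determine q_A directly by a fit.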

4.2.5.1 Methods

Orbital-based methods If all atomic basis orbitals in a molecule are orthogonal, the atomic electron population is easily obtained by summing up the occupations of the atomic orbitals contributing to all molecular orbitals. This most basic way of defining atomic charges is called Coulson population analysis [171] and is applied to calculating π-charges from the HMO calculation as described in Section 4.2.1.1. Orthogonal atomic orbitals are, of course, hardly ever found when standard basis sets are applied to non-trivial molecules. Consequently, the overlap population between different atomic centers has to be apportioned to the constituting atoms. Within the Mulliken population analysis (MPA) [172] the overlap population between two atoms is simply divided evenly and one half is assigned to each atom, ignoring the nature of the atom types, i.e., differences in electronegativity or in the type of atomic orbitals. Because of their simplicity and ease of implementation, Mulliken charges are contained by default in the output of most QM program packages and have therefore found widespread use. Nevertheless, several shortcomings have been reported in the literature. The most severe one is that Mulliken charges depend strongly on the basis set used [4]. Only minimal basis sets give reasonable charge values, whereas the inclusion of diffuse functions often results in counterintuitive charge values. Diffuse basis functions have a large spatial extent, and the centroids of their charge density may be nearer to other atomic centers, yet their population is still assigned to the atom on which the basis function is centered. In spite of that, Mulliken charges have been used by other authors to calibrate empirical charge calculation methods (e.g., see [173]).
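The even division of the overlap population can be illustrated with a minimal sketch. It assumes a closed-shell density matrix P and an overlap matrix S over real atomic basis functions, so that the gross population of basis function μ is the diagonal element (PS)_μμ. The function name `mulliken_populations`, the two-function toy system and all numerical values are hypothetical and serve only as an illustration; they are not data from this work.

```python
import math

def mulliken_populations(P, S, basis_to_atom, n_atoms):
    """Gross Mulliken populations: N_A = sum over mu on A of (P S)_mu,mu."""
    n = len(P)
    populations = [0.0] * n_atoms
    for mu in range(n):
        # Diagonal element of P*S: the net population P[mu][mu]*S[mu][mu] plus
        # half of every overlap population 2*P[mu][nu]*S[mu][nu] involving mu.
        gross = sum(P[mu][nu] * S[nu][mu] for nu in range(n))
        populations[basis_to_atom[mu]] += gross
    return populations

# Toy two-center system with one basis function per atom (assumed values).
s = 0.5    # overlap integral between the two basis functions
c1 = 0.8   # MO coefficient on center 1
# Choose c2 so that the doubly occupied MO is normalized:
# c1^2 + c2^2 + 2*c1*c2*s = 1 (positive root of the quadratic in c2).
c2 = -c1 * s + math.sqrt((c1 * s) ** 2 - (c1 ** 2 - 1.0))

P = [[2 * c1 * c1, 2 * c1 * c2],
     [2 * c2 * c1, 2 * c2 * c2]]   # closed-shell density matrix, 2 electrons
S = [[1.0, s],
     [s, 1.0]]

pops = mulliken_populations(P, S, basis_to_atom=[0, 1], n_atoms=2)
charges = [1.0 - n for n in pops]  # nuclear charge of 1 on each center (assumed)
print(pops, charges)
```

In this toy case each center receives exactly one half of the overlap population 4·c1·c2·s, regardless of how unequal the two coefficients are, which is the behavior criticized above.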


In order to overcome the problems associated with the MPA, WEINHOLD and co-workers developed the natural population analysis (NPA) [174]. The central step in this method consists of a transformation of the electron density matrix into sub-blocks, each one associated with an atomic center. These localized atomic basis sets are referred to as natural atomic orbitals (NAOs) and are orthonormal to each other, thus solving the dilemma of dividing the overlap population. As REED et al. stated, "the NAOs are intrinsic to the wavefunction, rather than to a particular choice of basis orbitals, and are found to converge smoothly towards well-defined limits as the wavefunction is improved" (quoted from [174], p. 736). In other words, the NAOs represent an atom in its chemical environment in the molecule instead of the ’generic’ representation given by the input basis set. Furthermore, the quality of the NAOs (and of the natural charges) should improve as better basis sets lead to better wavefunctions. This is a big advantage over the MPA, which requires small basis sets and consequently leads to poorer representations of the electron density. The transformation to NAOs is only the first step in a series of transformations leading to natural hybrid orbitals (NHOs) and natural bond orbitals (NBOs). These methods are subsumed under the term natural bond orbital methods; they were developed in order to translate the information contained in the calculated wavefunctions into classical concepts more familiar to organic chemists, such as hybridization or the Lewis representation of molecular structures. The NBO methods have been reviewed in [94] and [95].

Topological methods This group of methods does not rely on the occupation of atomic orbitals as a means of defining an atom in a molecule but tries to define the spatial occupation of an atom in the molecule’s electron distribution. Such an approach requires algorithms for locating the atomic boundaries inside a molecule. After these boundaries have been located, atomic populations are obtained by integrating the electron density over the entire space belonging to the atomic fragments.

The best-known algorithm of this group is the atoms in molecules (AIM) methodology developed by BADER [175]. As the first step, so-called bond critical points are located, which are the points of minimal electron density along the bond paths. As a bond between two atoms is the path with the maximum electron density between them, bond critical points thus have a local minimum in one direction and local maxima in the other two directions. Following the steepest descent of electron density in the directions of the maxima leads to so-called zero-flux surfaces that represent the atomic boundaries. After all zero-flux surfaces are known, each atom is defined by a unique atomic basin. All quantities required for defining the atomic basins are quantum mechanically well-defined, and no arbitrary assumptions have to be made.

A different scheme for defining the atomic basins was proposed by HIRSHFELD [176]. This method relies on the definition of so-called proatoms, which represent spherically symmetrical neutral atoms in their isolated standard state. These proatoms are superimposed onto the locations of the corresponding nuclei found in the molecule of interest. The partial charge results from the difference between the electron density of the proatom and the molecular electron distribution found in the volume element occupied by the proatom. Thus, Hirshfeld charges account for changes in the atomic electron distribution due to the chemical environment an atom experiences in a molecule. Hirshfeld charges are also referred to as Stockholder charges [177] and have received increasing attention and application in recent years (e.g., [178–180]).
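In compact form, the two partitioning schemes described above are commonly written as follows. These are the standard textbook expressions rather than formulas quoted from this work; the symbols Ω_A (atomic basin), n (normal vector of the bounding surface), ρ_A^pro (proatom density) and w_A (stockholder weight) are notation introduced here.

```latex
% AIM: zero-flux condition at every point r_s of the surface bounding the basin \Omega_A
\nabla\rho(\mathbf{r}_s)\cdot\mathbf{n}(\mathbf{r}_s) = 0, \qquad
q_A^{\mathrm{AIM}} = Z_A - \int_{\Omega_A}\rho(\mathbf{r})\,\mathrm{d}\mathbf{r}

% Hirshfeld: each proatom receives its share of the molecular density
w_A(\mathbf{r}) = \frac{\rho_A^{\mathrm{pro}}(\mathbf{r})}{\sum_B \rho_B^{\mathrm{pro}}(\mathbf{r})}, \qquad
q_A^{\mathrm{H}} = Z_A - \int w_A(\mathbf{r})\,\rho(\mathbf{r})\,\mathrm{d}\mathbf{r}
```

Because the proatoms are neutral, the Hirshfeld charge equals the integral of w_A times the difference between the superimposed proatom densities and the molecular density, which corresponds to the difference referred to in the text.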

ESP-fitted Charges Finally, a group of methods has found broad application that provides charges determined by reproducing an observable property that can readily be calculated from the wavefunction: the electrostatic potential (ESP). The procedure is briefly summarized as follows: the molecule of interest is placed inside a point grid, and the ESP at each point is calculated from the electron density. For a given set of atomic charges, the ESP at a point can also be calculated as the potential experienced by a unit positive test charge placed at that point. The values of the atomic charges are then optimized to obtain a minimum root mean square error between the two ESP values over all points in the grid. The basic differences between the proposed methods of this group are the topologies of the point grid and the way the minimization problem is solved. In general, the sample points are chosen to lie at a distance of at least one van der Waals radius from the atoms and not too far from the molecule.

The most important methods are the CHELP, the CHELPG and the Merz-Kollman (MK-ESP) schemes. The CHELP (charges from electrostatic potential) scheme [181] places points symmetrically on spherical shells around each atom. BRENEMAN and WIBERG found that CHELP charges are sensitive to a molecule’s orientation in the grid and published a modified version (CHELPG) with an improved point sampling scheme in order to achieve rotational invariance [182]. The method of BESLER, MERZ and KOLLMAN [183] samples points on a series of nested Connolly surfaces. A further calculation scheme has been developed in the group of T. CLARK for the calculation of ESP-fitted charges with the semi-empirical VAMP program; these charges are referred to as VESPA (Vamp electrostatic potential-derived atomic) charges [184]. A review focusing on this group of charge calculation schemes, together with a critical evaluation, was given by SIGFRIDSSON and RYDE [185].
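The fitting procedure summarized above can be sketched as a constrained linear least-squares problem. The sketch below is not the CHELP, CHELPG or MK-ESP algorithm: the grid points are arbitrary, the reference ESP is generated from known point charges instead of a wavefunction, and the constraint on the total charge (enforced with a Lagrange multiplier) is an assumption of this example. All function names and numerical values are hypothetical; the potential is in arbitrary units of charge per distance.

```python
import math

def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def esp_from_charges(charges, atoms, point):
    """Potential of a set of point charges at one grid point."""
    return sum(q / math.dist(point, r) for q, r in zip(charges, atoms))

def fit_esp_charges(atoms, points, esp, total_charge=0.0):
    """Least-squares charges reproducing esp, with sum(q) fixed to total_charge."""
    n = len(atoms)
    # Design matrix: A[i][j] = 1 / distance(grid point i, atom j)
    A = [[1.0 / math.dist(p, a) for a in atoms] for p in points]
    # Normal equations A^T A q + mu*1 = A^T V, bordered by the charge constraint
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] * (n + 1)
    for j in range(n):
        for k in range(n):
            M[j][k] = sum(row[j] * row[k] for row in A)
        rhs[j] = sum(row[j] * v for row, v in zip(A, esp))
        M[j][n] = 1.0
        M[n][j] = 1.0
    rhs[n] = total_charge
    return solve_linear(M, rhs)[:n]

# Hypothetical diatomic: reference ESP generated from known charges.
atoms = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
reference = [0.4, -0.4]
points = [(x, y, z) for x in (-3.0, 0.6, 4.0)
          for y in (-3.0, 3.0) for z in (-3.0, 3.0)]
esp = [esp_from_charges(reference, atoms, p) for p in points]
fitted = fit_esp_charges(atoms, points, esp)
print(fitted)
```

Because the reference potential here stems from point charges, the fit recovers them up to rounding; with an ESP from a real wavefunction a residual error remains, and its size depends on the choice of grid points, which is where the published schemes differ.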

4.2.5.2 Requirements for a Reference Charge Calculation Scheme

The method for calculating the reference charge values used for calibration has to meet the following requirements: the charge values should

1. represent trends in electronegativities

2. be independent of conformation for topologically similar atoms

3. have reasonable absolute charge values

4. have a computationally robust implementation of the calculation algorithm

The first requirement is of major importance because the PEOE calculation scheme is directly based on electronegativity. If the charge values for an atom type do not relate to its electronegativity, a fit is not possible. The background of the second point is that topologically similar atoms should have similar charges. As both the PEOE and the HMO method consider only the topological information of a molecule, these methods are not able to reproduce charge variations due to conformational differences. The third requirement applies in general to any charge calculation scheme. Even though the absolute values of different calculation methods may vary considerably (see below), the charges should give a realistic picture of the electron distribution. Notably, dipole moments calculated with atomic charges of a given type should be of a similar order of magnitude as experimental dipole moments. Furthermore, bond polarities are expected to agree with chemical intuition and experience. The fourth requirement is of purely practical relevance. If the implementation of a method is not robust and the number of calculation failures is too large, the intended chemical space may be underrepresented. The following sections analyze QM charge calculation methods against the background of the requirements given above.


4.2.6 Analysis and Comparison of QM Charges

It is obvious that it is practically impossible to perform an exhaustive comparison of all charge calculation methods mentioned above. Therefore, only a few methods were selected as representatives of the three groups of methods. From the orbital-based methods, the Mulliken and the natural population analysis were chosen. The Mulliken population analysis has found widespread use, and the natural population analysis can be considered the most advanced orbital-based method. For the topological methods the situation is more difficult. The AIM method is the only one that has found broad acceptance and that is implemented in the standard GAUSSIAN 98 program package [96]. Stockholder charges are less readily accessible as there is no standard implementation of the calculation procedure. WIBERG and RABLEN have given a detailed route for the calculation of Stockholder charges [170], but the implementation of the method was outside the scope of this thesis. Therefore, the AIM charges are used here even though the published data on the Stockholder scheme (e.g., [177], [179]) give the impression that those charges would meet the requirements formulated above. For the ESP-derived charges the three most popular methods (CHELP, CHELPG, and Merz-Kollman (MK-ESP)) are available within the GAUSSIAN 98 program package. Because of the known shortcomings of the CHELP scheme, a decision had to be made between CHELPG and MK-ESP. The CHELPG implementation has been found to be only applicable for elements up to and thus the MK-ESP scheme remains the method of choice here since the scope of our calculation scheme should include also bromine.

In summary, the present investigation focuses on charges obtained by the MPA, NPA, AIM, and MK-ESP methods. Furthermore, the original PEOE/PEPE charges as implemented in the PETRA 2.6 program package [82] are included here in order to relate them to the QM charges.

4.2.6.1 Rates of Failures for QM Charge Calculation Methods

Tab. 4.5 gives the percentage of failures for all compounds of the dataset described in Section 4.2.2. Only the implementation of the AIM method shows poor stability: AIM charges could not be calculated for nearly half of the entire dataset of 457 compounds (a detailed summary of the percentage of failures with respect to the various groups of compounds can be found in the Appendix, Tab. A.1, p. 189). The other methods ran successfully for all compounds. The AIM method did not fail equally often for all groups of compounds of the dataset. In general, small molecules with first-row atoms failed less often, whereas molecules containing hypervalent phosphorus or sulfur failed in about 70% of all cases. Furthermore, the calculation failed for more than 66% of the molecules containing amide groups or heterocycles. Finally, AIM charges could not be calculated successfully for any of the compounds of test dataset I, which contains the amino acids and nucleotide bases. The analysis below should therefore be considered with care as far as AIM charges are concerned; nevertheless, the general conclusions should be valid.

Table 4.5 Rates of calculation failures for the entire dataset consisting of 457 organic structures.

                        MPA    NPA    AIM      MK-ESP   PEOE/PEPE
no. of failures         0      0      190      0        0
percentage of failure   0%     0%     42.1%    0%       0%

4.2.6.2 Direct Comparison of Charges from QM Calculations

A systematic statistical analysis of the intercorrelation of all methods has been carried out by performing a pairwise linear regression for each pair of methods. As a consequence of the high failure rate of the AIM implementation, the evaluation of those charges can be based only on a reduced number of 2446 atoms compared to the total number of 5088 atoms contained in the entire dataset. Tab. 4.6 shows the statistical parameters obtained from the linear regressions, y = b · x + a; n is the number of data points and r² the square of the correlation coefficient. The intercepts, a, were zero for all correlations because only neutral compounds were considered and are thus not shown in Tab. 4.6. In order to additionally gain a visual impression of the quality of the correlations, Fig. 4.4 shows plots of the NPA charges versus the other calculation schemes.

None of the methods shows a high correlation with any of the others. This is quite surprising as all methods actually aim to quantify the same property. The MPA charges correlate most poorly with all other methods. As can be seen from Fig. 4.4 (a), the Mulliken charges have their most extreme values for atoms for which the natural charges have values near zero. All those charges belong to carbon atoms that are part of a π-system or adjacent to a heterocyclic π-system. Such unrealistic charge values are certainly due to the known sensitivity of the MPA to diffuse functions in the applied basis set (6-311+G**).

Table 4.6 Statistical parameters for the direct comparison of the QM and PEOE/PEPE charges.

x         y            n      r²      slope (b)
NPA       MPA          5088   0.442   0.477
NPA       AIM          2446   0.535   0.979
NPA       MK-ESP       5088   0.679   0.651
NPA       PEOE/PEPE    5088   0.563   0.238
MPA       AIM          2446   0.139   0.710
MPA       MK-ESP       5088   0.297   0.600
MPA       PEOE/PEPE    5088   0.186   0.191
AIM       MK-ESP       2446   0.488   0.401
AIM       PEOE/PEPE    2446   0.585   0.182
MK-ESP    PEOE/PEPE    5088   0.576   0.304

Fig. 4.4 (b) gives the impression that AIM and NPA charges are actually more highly correlated than the statistical parameters suggest. The slope is also somewhat misleading, as it would be estimated to be about 1.5 from visual impression. The reason can be found in the distribution of the carbon charges. Whereas the charges for most atom types are indeed higher by a factor of about 1.5, the carbon charges from AIM are very small or even slightly positive for atoms for which the NPA scheme gives negative values down to -1.0 (this point will be evaluated and discussed in more detail later).
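The pairwise regression described above can be sketched in a few lines. This is a minimal ordinary least-squares implementation under the assumption that slope, intercept and r² were obtained in the usual way; the function name `linear_regression` and the example charge values are hypothetical and are not data from this work.

```python
def linear_regression(x, y):
    """Ordinary least-squares fit y = b*x + a; returns (b, a, r2, n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                   # slope
    a = mean_y - b * mean_x         # intercept
    r2 = sxy * sxy / (sxx * syy)    # squared correlation coefficient
    return b, a, r2, n

# Hypothetical charges of the same five atoms from two schemes;
# both sets sum to zero, as they would for neutral molecules.
scheme_x = [-0.6, -0.2, 0.0, 0.2, 0.6]
scheme_y = [-0.3, -0.1, 0.0, 0.1, 0.3]
b, a, r2, n = linear_regression(scheme_x, scheme_y)
print(b, a, r2, n)
```

One plausible reading of the zero intercepts reported above: if the charges of every molecule sum to zero in both schemes, the means of x and y vanish over the dataset, so that a = mean(y) − b · mean(x) is zero as well.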

[Figure 4.4: four scatter plots of (a) Mulliken, (b) AIM, (c) MK-ESP, and (d) PEOE/PEPE charges (y axis) versus natural charges (x axis). Regression lines given in the panels: (a) y = 0.477 · x + 0, n = 5088, r² = 0.442, σ = 0.218; (b) y = 0.979 · x - 0.001, n = 2446, r² = 0.535, σ = 0.365; (c) y = 0.651 · x + 0, n = 5088, r² = 0.679, σ = 0.182; (d) y = 0.238 · x + 0, n = 5088, r² = 0.563, σ = 0.085.]

Fig. 4.4 Comparison of natural charge values with charges obtained by (a) MPA, (b) AIM, (c) MK-ESP, and (d) PEOE/PEPE.

4.2.6.3 Dipole Moments from QM Charges

As has been stated before, different charge values for the same atom type do not necessarily mean that one value is better than the other. In order to gain an estimate of the quality of the different calculation schemes, molecular dipole moments have been calculated with the charge values obtained by the various methods and compared with experimental dipole moments. The resulting plots are shown in Fig. 4.5.

[Figure 4.5: four scatter plots of dipole moments calculated from atomic charges (y axis, debyes) versus experimental dipole moments (x axis, debyes). Regression lines given in the panels: (a) Mulliken charges: y = 1.136 · x + 0.454, n = 310, r² = 0.229, σ = 2.254; (b) natural charges: y = 1.112 · x + 0.439, n = 310, r² = 0.607, σ = 0.966; (c) AIM charges: y = 2.033 · x + 0.155, n = 210, r² = 0.662, σ = 1.511; (d) MK-ESP charges: y = 1.065 · x + 0.067, n = 310, r² = 0.966, σ = 0.215.]

Fig. 4.5 Correlations of dipole moments based on charges obtained from various charge calculation methods with experimental dipole moments. Charge values were obtained by (a) MPA, (b) NPA, (c) AIM, and (d) MK-ESP. For (c) only 210 data points are shown due to missing AIM charges for a fraction of compounds.

The calculations have been performed with the coordinates of the fully optimized structures (see Section 4.2.2) according to Eq. 4.24 (p. 109). Since the calculation of AIM charges failed quite frequently, both values, the experimental dipole moment and the dipole moment calculated with AIM charges, were available for only 210 compounds.

As might be expected, the unrealistic charge values obtained by the MPA did not give good dipole moments. For instance, the largest value in Fig. 4.5 (a) belongs to 3,3,3-trifluoropropyne and corresponds to the most positive Mulliken charge value in Fig. 4.4 (a). Because the MPA does not seem to be reasonably applicable with the basis set used, Mulliken charges are not considered further in the present analysis.

Considering the calculation algorithm, it is not surprising that MK-ESP charges give the best correlation (Fig. 4.5 (d)). Actually, the correlation is nearly as good as that obtained with dipole moments calculated directly from the wavefunction (Fig. 4.3, p. 110). This high agreement between dipole moments calculated from ESP-derived charges and those from the wavefunction has been reported previously by other authors, but for much smaller datasets [181, 186]. This finding is supported by the high correlation coefficient, r² = 0.998, and a standard deviation of σ = 0.059 debyes for the much larger and more diverse dataset used in the present work (the corresponding plot is given in the Appendix, Fig. A.1, p. 190).

Charges from the other two calculation methods give dipole moments that reproduce the experimental values quite poorly. Whereas the AIM charges give dipole moments that are twice as large as the observed ones, the slope of the NPA-based correlation is near unity. From this point of view, natural charges seem to give a more realistic representation of the charge distribution than the AIM charges, despite the often praised clear definition of the latter.
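The dipole moments compared in this section follow from the atomic charges and the atomic coordinates. The sketch below assumes the usual point-charge expression μ = Σ q_i r_i, which Eq. 4.24 presumably corresponds to, with charges in units of the elementary charge and coordinates in ångström; the function name, the conversion constant name and the example geometry are hypothetical.

```python
import math

# 1 e*Angstrom expressed in debyes (rounded conversion factor)
E_ANGSTROM_TO_DEBYE = 4.80320

def dipole_moment(charges, coords):
    """Magnitude of mu = sum_i q_i * r_i in debyes.

    charges: atomic partial charges in units of e
    coords:  (x, y, z) tuples in Angstrom
    """
    mu = [sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3)]
    return E_ANGSTROM_TO_DEBYE * math.sqrt(sum(c * c for c in mu))

# Hypothetical diatomic with charges of +0.2 e and -0.2 e separated by 1.3 Angstrom
charges = [0.2, -0.2]
coords = [(0.0, 0.0, 0.0), (1.3, 0.0, 0.0)]
print(dipole_moment(charges, coords))

# For a neutral molecule the result does not depend on the choice of origin.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in coords]
print(dipole_moment(charges, shifted))
```

Since the dipole moment scales linearly with the charges, a scheme whose charges are uniformly too large by some factor overestimates all dipole moments by the same factor; this is the kind of behavior that shows up as a regression slope well above unity.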

4.2.6.4 Distribution of Charges for Various Atom Types

As a next step, a further analysis of the distribution of the charge values was done. Tab. 4.7 gives a summary of the minima, maxima and the absolute ranges of the charge values obtained by the considered calculation methods.

Table 4.7 Distribution of the charge values obtained by different calculation schemes. Given are the minimum, maximum, and the absolute range.

              Natural charges   AIM charges   MK-ESP charges   PEOE/PEPE charges
min           -1.270            -1.444        -1.108           -0.529
max            2.517             3.544         1.474            0.644
abs. range     3.787             4.988         2.582            1.173


AIM charges occupy the largest range. It is surprising to find such a large range for a dataset of organic compounds. The NPA charges also show a broad variation. The main difference is found in the maxima, while the minima of the QM charges are quite close. A maximum of 3.544 for AIM and 2.517 for NPA can only result from strongly heteropolar bonds. Compared to the QM charges, the PEOE/PEPE scheme produces much smaller absolute values.

In order to gain a more detailed insight into the distribution of the charges for all elements in the dataset and to identify the groups causing the extrema found in Tab. 4.7, a series of histograms has been generated. Each of Figs. 4.6–4.10 contains the histograms for two elements and shows the distribution of the charge values obtained by the methods considered. NPA charges are at the top and PEOE/PEPE charges at the bottom. The interval size for all histograms was set to 0.01 electron units. Each histogram gives the minimum value at the left-hand side, the median in the middle and the maximum value at the right-hand side of the plot (the median has been chosen here because it is a more robust measure than the mean value for non-uniform distributions).

Common to all elements is that the MK-ESP charges are most broadly distributed, while the other schemes mostly have one or only a few main peaks. This can be seen especially for carbon. The reason is presumably the neglect of any topological constraints, which results in slightly different charges for atoms for which the other schemes produce very similar charge values.
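The construction of such a histogram and of the three annotated values can be sketched as follows. The bin width of 0.01 electron units is taken from the text; the function names and the example charge values are hypothetical, and the binning convention (half-open intervals starting at multiples of the bin width) is an assumption of this sketch.

```python
import math
import statistics
from collections import Counter

def bin_counts(charges, bin_width=0.01):
    """Count charges per bin; bin k covers [k*bin_width, (k+1)*bin_width).

    Values lying exactly on a bin edge may fall into the neighboring bin
    because of floating-point rounding.
    """
    return Counter(math.floor(q / bin_width) for q in charges)

def summary(charges):
    """Minimum, median and maximum, as annotated in each histogram."""
    return min(charges), statistics.median(charges), max(charges)

# Hypothetical charge values for one element and one calculation scheme
charges = [-0.264, 0.052, 0.055, 0.058, 0.117]
counts = bin_counts(charges)
for k in sorted(counts):
    print(f"{k * 0.01:+.2f} .. {(k + 1) * 0.01:+.2f}: {counts[k]}")
print(summary(charges))
```

In this example the single strongly negative value shifts the mean of the five charges to about 0.004, whereas the median stays at 0.055 within the main peak, which illustrates why the median was preferred for annotating the plots.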

Hydrogen Atoms The large peak found in all plots for hydrogen atoms corresponds to the mean charge of hydrogen atoms in alkyl groups. Special notice should be taken here of the median of the AIM charges, which indicates a low bond polarity for C – H bonds, while the NPA and MK-ESP schemes assign significant positive charges to alkyl hydrogen atoms. This point will be discussed in more detail below. It is further striking that for AIM quite negative hydrogen charges are found that belong to silicon-bonded hydrogen atoms (-0.70 ∼ -0.65). This group is also found to have the most negative charges in the NPA and MK-ESP schemes, but the values are much less negative (-0.26 ∼ -0.15 and -0.26 ∼ -0.16, respectively). A second group of negatively charged hydrogen atoms is due to P – H bonds, again with AIM assigning more negative charges (-0.56 ∼ -0.50) than NPA (-0.14 ∼ -0.01) or MK-ESP, where only a few values are negative and most other phosphorus-bound hydrogen atoms receive a positive charge. On the other hand, the MK-ESP scheme assigns significant negative charges to hydrogen atoms bonded to carbon atoms that do not follow any rational pattern (e.g., one of the side chain hydrogen atoms in leucine has a charge of -0.12). For the PEOE/PEPE scheme none of the hydrogen atoms has a negative charge and the lowest values are found for silane (0.00). The maxima of the hydrogen charges follow the trend of the electronegativity of the bonding partners quite well.

[Figure 4.6: histograms of the charge values of hydrogen atoms (left column) and carbon atoms (right column) for, from top to bottom, natural, AIM, MK-ESP, and PEOE/PEPE charges; charge axis from -2 to 3. Annotated values (minimum / median / maximum): hydrogen: NPA -0.259 / 0.206 / 0.55, AIM -0.699 / 0.014 / 0.697, MK-ESP -0.264 / 0.117 / 0.496, PEOE/PEPE 0.004 / 0.052 / 0.267; carbon: NPA -1.168 / -0.217 / 1.156, AIM -0.777 / 0.036 / 2.248, MK-ESP -1.011 / -0.115 / 1.195, PEOE/PEPE -0.154 / -0.029 / 0.508.]

Fig. 4.6 Distribution of charge values for the elements in the investigated dataset. The interval size is 0.01 electron units for all histograms. The three values in each plot give the minimum (left-hand side), the median (middle), and the maximum (right-hand side).

[Figure 4.7: histograms of the charge values of nitrogen atoms (left column) and oxygen atoms (right column) for, from top to bottom, natural, AIM, MK-ESP, and PEOE/PEPE charges; charge axis from -2 to 3. Annotated values (minimum / median / maximum): nitrogen: NPA -1.184 / -0.554 / 0.495, AIM -1.437 / -1.086 / 0.438, MK-ESP -1.108 / -0.666 / 0.83, PEOE/PEPE -0.525 / -0.289 / 0.071; oxygen: NPA -1.27 / -0.624 / -0.318, AIM -1.444 / -1.089 / -0.454, MK-ESP -0.812 / -0.543 / -0.086, PEOE/PEPE -0.529 / -0.331 / -0.119.]

Fig. 4.7 Distribution of charge values for the elements in the investigated dataset (see caption of Fig. 4.6 for details).

Carbon Atoms The charges on carbon atoms occupy quite a large range for all calculation schemes. They are comparable for NPA and MK-ESP but have surprisingly large positive values for AIM. The most positive carbon atoms, with AIM charges of more than +2, stem from carbonyl fluoride and the central atom of diethyl carbonate. In general, it is found that the carboxy group shows a very strong charge separation within the AIM scheme. All carbonyl carbon atoms have a charge of about +1.5 and the attached oxygen atoms more than a full negative charge. The NPA and MK-ESP methods also give a strong polarity for a carbonyl group with about +0.8/-0.6 (C/O). Not surprisingly, for all schemes including PEOE/PEPE the trifluoromethyl group generates the most positive carbon atoms (1.0/NPA, 1.7/AIM, 0.8/MK-ESP, 0.4/PEOE/PEPE). As observed for the hydrogen atoms, the most negatively charged carbon atoms in the NPA and AIM schemes are those attached to silicon, followed by phosphorus. The MK-ESP charges do not follow this trend. Here the values look quite irregularly assigned, with a minimum value for the carbon atom in tribromomethane (-1.01) and a rather low value for the carbon atom in dibromomethane (-0.88). As bromine is commonly considered to be more electronegative than carbon and hydrogen, these charges can only be interpreted as a shortcoming of the MK-ESP scheme. Most values of the PEOE/PEPE charges on carbon atoms are close to zero and are rarely below -0.1 or above 0.3. The few cases where half a positive charge is found on carbon atoms are caused by polyfluorinated compounds.

Considering the distribution of charges on carbon atoms in general, one dominating peak is found for all schemes. This can be attributed to carbon atoms of C – H bonds. For AIM this peak is slightly positive but rather close to zero, indicating a reversed C – H polarity compared with the other methods. This point will be discussed later in more detail. One might also notice three peaks in the plot for the NPA. Each peak from left to right belongs to carbon atoms with one, two, or three hydrogen neighbors: it is generally observed in NPA that a hydrogen atom releases a relatively constant amount of charge to the attached carbon atom irrespective of the carbon’s hybridization state. This is counterintuitive in view of the concept that electronegativity varies with the hybridization state.

[Figure 4.8: histograms of the charge values of fluorine atoms (left column) and silicon atoms (right column) for, from top to bottom, natural, AIM, MK-ESP, and PEOE/PEPE charges; charge axis from -2 to 3. Annotated values (minimum / median / maximum): fluorine: NPA -0.641 / -0.35 / -0.247, AIM -0.839 / -0.607 / -0.581, MK-ESP -0.452 / -0.215 / 0.097, PEOE/PEPE -0.325 / -0.17 / -0.05; silicon: NPA 0.588 / 1.299 / 2.327, AIM 2.68 / 2.835 / 3.093, MK-ESP 0.374 / 0.979 / 1.474, PEOE/PEPE -0.015 / 0.163 / 0.644.]

Fig. 4.8 Distribution of charge values for the elements in the investigated dataset (see caption of Fig. 4.6 for details).

[Figure 4.9: histograms of the charge values of phosphorus atoms (left column) and sulfur atoms (right column) for, from top to bottom, natural, AIM, MK-ESP, and PEOE/PEPE charges; charge axis from -2 to 3. Annotated values (minimum / median / maximum): phosphorus: NPA 0.021 / 0.809 / 2.517, AIM 1.166 / 1.598 / 3.544, MK-ESP -0.344 / 0.005 / 1.314, PEOE/PEPE -0.155 / 0.031 / 0.472; sulfur: NPA -0.275 / 0.173 / 2.293, AIM -0.048 / 0.206 / 3.231, MK-ESP -0.39 / -0.161 / 1.167, PEOE/PEPE -0.232 / -0.101 / 0.302.]

Fig. 4.9 Distribution of charge values for the elements in the investigated dataset (see caption of Fig. 4.6 for details).

[Figure 4.10: histograms of the charge values of chlorine atoms (left column) and bromine atoms (right column) for, from top to bottom, natural, AIM, MK-ESP, and PEOE/PEPE charges; charge axis from -2 to 3. Annotated values (minimum / median / maximum): chlorine: NPA -0.375 / -0.003 / 0.156, AIM -0.693 / -0.204 / 0.037, MK-ESP -0.31 / -0.084 / 0.206, PEOE/PEPE -0.236 / -0.118 / 0.017; bromine: NPA -0.194 / 0.078 / 0.213, AIM -0.203 / -0.049 / 0.172, MK-ESP -0.226 / -0.029 / 0.172, PEOE/PEPE -0.161 / -0.071 / -0.015.]

Fig. 4.10 Distribution of charge values for the elements in the investigated dataset (see caption of Fig. 4.6 for details).

Nitrogen Atoms In the plots for nitrogen atoms (left-hand side of Fig. 4.7, p. 124), the peak for nitrogen atoms of nitro groups can be found for each method: for NPA and AIM such nitrogen atoms all have very similar values (∼0.48 and ∼0.44), while the MK-ESP charges show a broader variation (0.61 ∼ 0.83). PEOE/PEPE assigns only a slightly positive charge (∼0.05). All other nitrogen charges are negative in all schemes. For the MK-ESP charges a broad variation is found again, whereas NPA and PEOE/PEPE have a main peak belonging to unconjugated amino nitrogen atoms (-0.85/NPA, -0.32/PEOE/PEPE). If an amino group is conjugated, the charge on the nitrogen atom is generally less negative than in a non-conjugated amino group because of delocalization of the lone pair. Nitrogen atoms in cyano groups generally show a broader variance in all schemes due to delocalization effects.

Oxygen, Fluorine, Phosphorus and Sulfur Atoms Because of their high electronegativity, oxygen and fluorine atoms have only negative charges (except for a few fluorine atoms in polyhalogenated hydrocarbons that have a slightly positive MK-ESP charge). Again, AIM provides the most extreme values, about -1.4 for oxygen and -0.6 for fluorine bound to silicon. Another group of oxygen atoms having about one full negative charge in both the NPA and the AIM method is found for oxygen atoms bound to hypervalent phosphorus or sulfur atoms. This is also reflected in the highly positive charges of the corresponding phosphorus and sulfur atoms (Fig. 4.9, p. 126). Actually, the formal P = O and S = O bonds in such hypervalent groups have to be considered as strongly ionic, as found with NPA and AIM, and are more accurately written as P⁺ – O⁻ or S⁺ – O⁻ (a more detailed discussion of hypervalent species is given in Section 3.4.3). Nevertheless, MK-ESP charges are much smaller for such groups, only slightly exceeding one positive charge on the central atom. The main peaks in the plots for NPA and AIM represent non-conjugated hydroxy oxygen atoms at about -0.75/NPA and -1.05/AIM. In the case of conjugation (as in phenol or acidic groups) NPA gives slightly more positive values (-0.70) while AIM gives more negative charges. This is puzzling as a certain π-charge donation should be expected here, which is predicted only by NPA. Oxygen atoms found to have less negative charges can be attributed to oxygen atoms in nitro groups.


Silicon Atoms. With silicon atoms (right-hand side of Fig. 4.8, p. 125) quite extreme charge values are again found for AIM. Even in silane or methylsilane the silicon atoms are assigned a positive charge of about +2.7. With fluorine substituents, AIM charges rise to more than three full positive charges. Within the NPA scheme silicon atoms are also positive but much less extreme. In silane, the central atom has a charge value of 0.59. With highly electronegative neighbors, as in trifluoromethylsilane, a charge value of 2.3 is reached. For PEOE/PEPE silicon is nearly uncharged if bound to hydrogen and carbon atoms and attains a significant positive charge only with more electronegative atoms. Obviously, the QM methods find a much smaller electronegativity for silicon than is assumed for PEOE/PEPE.

Chlorine and Bromine Atoms. A surprising situation is encountered for chlorine and bromine atoms (Fig. 4.10, p. 125). Although commonly considered to be quite electronegative, chlorine is found to have NPA charges that are only slightly negative. In the case of polyhalogenated carbon atoms chlorine quite often has a small positive charge. The same holds for bromine, with a slight shift to more positive values. In the AIM scheme, chlorine and bromine can be considered to be quite non-polar. However, chlorine atoms attain a negative charge of -0.7 if bound to silicon. The MK-ESP charges support this observation: charges on chlorine and bromine atoms are close to zero in all QM charge calculation schemes.

4.2.6.5 Relationship to Electronegativity

With the intention of analyzing the relationship of the charge values with electronegativity, charges for the binary hydrogen compounds of the elements under consideration have been calculated. The charges on hydrogen were plotted against electronegativity in Fig. 4.11. Here, the hydrogen atoms can be considered as probe atoms that should reflect the effective electron-withdrawing power of the atom bound to them. The general trend is reproduced by all three calculation methods, and NPA gives a good linear relationship with a correlation coefficient of r² = 0.958 (the correlation line is given as a dashed line in Fig. 4.11). The correlations for AIM and MK-ESP are distinctly poorer with r² = 0.892 and r² = 0.889, respectively. The silicon atom is found to be the most electropositive element, as would be expected. While this is pronounced very strongly in the AIM scheme, NPA and MK-ESP give a much smaller Si–H bond polarity. A similar observation is made for phosphorus. However, whereas AIM considers phosphorus significantly less electronegative than hydrogen, the relationship is reversed in the MK-ESP scheme. The natural charges for phosphane indicate that NPA assumes an equal electronegativity for phosphorus and hydrogen atoms.

[Fig. 4.11: plot of the charge on hydrogen (y-axis) against electronegativity (x-axis, 1.5 to 4.5) for NPA, AIM and MK-ESP; data points for SiH4, PH3, H2, H2S, CH4, HBr, HCl, NH3, H2O and HF.]

Fig. 4.11 Relationship between electronegativity of various elements and hydrogen charges for the corresponding hydrides. Charge values were calculated with three different calculation methods: NPA, AIM and MK-ESP. NPA charges show the best correlation with r² = 0.958; the regression line is given as a dashed line. Electronegativity values were taken from the ALLRED and ROCHOW scale of electronegativity [187].

Before discussing the central part of Fig. 4.11, the elements nitrogen, oxygen, and fluorine can be treated briefly: all three methods give the trend of increasing electronegativity. MK-ESP shows a smooth rise in polarity from nitrogen to fluorine, while AIM gives a stronger charge separation along the same series.

The elements with intermediate electronegativity, together with chlorine and bromine, reveal more deviations from expectation. The MK-ESP charges for hydrogen sulfide and hydrogen bromide indicate a similar electronegativity of sulfur and bromine. For NPA, it is found that bromine is even less electronegative than carbon. AIM charges reproduce the trend correctly for those atoms but, firstly, there is quite a large difference between bromine and chlorine compared to their similar electronegativities, and, secondly, the charges for hydrogen sulfide and methane are all close to zero.

Discussion of the C–H bond polarity. The charges of methane lead to the question of the polarity of the C–H bond. This has been discussed controversially in the literature [127,170,188–193]. Considering electronegativity, all empirical scales agree that the carbon atom is more electronegative than the hydrogen atom [H = 2.2/C = 2.55 (Pauling), 2.59/2.75 (Sanderson), 2.2/2.5 (Allred-Rochow), 2.25/2.48 (Mulliken-Jaffé) [187]]. This should result in a slightly polar bond with a negative charge on the carbon atom.

Nevertheless, it has been argued that the reverse polarity, C⁺–H⁻, should be assumed in order to be in agreement with bond dipole moments derived from experimental infrared intensities [190]. This has been interpreted as support for the AIM method, which predicts the reversed C–H bond polarity for higher alkanes (see Tab. 4.8, p. 133) [170]. In contradiction, a more recent study found that the infrared intensities might have been misinterpreted, and a more detailed analysis including overtone intensities led to the classical C⁻–H⁺ polarity [192]. Furthermore, as can be seen from the MK-ESP charges for methane (Tab. 4.8, p. 133), the electrostatic potential can be reproduced correctly with the classical bond polarity. From a practical point of view, a C⁺–H⁻ bond polarity cannot be obtained with the PEOE procedure without changing the electronegativity values of either the carbon or the hydrogen atom. Therefore, AIM charges are inappropriate as reference charges for the fitting.

Finally, the absolute charge values found for C–H bonds have to be discussed. For methane a strongly negative natural carbon charge (-0.813) is obtained. This is due to the fact that, within the NPA scheme, each hydrogen atom releases a relatively constant amount of charge irrespective of the charge that accumulates on the carbon atom. A hydrogen atom has quite a constant positive charge between 0.18 and 0.20 if it is not in the direct neighborhood of polarizing substituents. This is surprising because one might expect a kind of charge saturation of the central carbon atom, which is nevertheless not observed. Therefore, even though an isolated C–H bond is only slightly polar in accordance with electronegativity, quite large negative NPA charges are found for the carbon atoms in hydrocarbons.
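The methane value can be related to the hydrogen charge by a one-line neutrality argument. The following worked step is an illustration added here; it assumes only that the four hydrogen atoms of methane are equivalent and that the molecule is neutral:

```latex
\[
q_{\mathrm{C}} + 4\,q_{\mathrm{H}} = 0
\quad\Longrightarrow\quad
q_{\mathrm{H}} = -\frac{q_{\mathrm{C}}}{4} = \frac{0.813}{4} \approx 0.203
\]
```

The resulting hydrogen charge lies close to the upper end of the range of 0.18 to 0.20 quoted above, so the large negative carbon charge is simply four times a moderate C–H polarization.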


4.2.6.6 Shortcomings of the Merz-Kollman Scheme

Summarizing the results of the analysis so far, NPA and MK-ESP charges seem to fit the requirements imposed above better. The AIM method fails for a series of compounds, often generates quite extreme charge separations, and correlates most poorly with electronegativity. Thus, as a final point of the present analysis, the known shortcomings of the MK-ESP method should be studied and the extent of the unfavorable behavior should be estimated. First, the dependence of MK-ESP charges on the conformation is evaluated. One extreme example was found for cyclopropanol. The optimized structure with the MK-ESP charges for the groups is shown in Fig. 4.12. The hydroxy group is directed towards one of the methylene groups. As a result, the MK-ESP charge value for the carbon atom of that group is significantly more negative than the MK-ESP charge on the carbon atom of the other methylene group (∆q = 0.236). The natural charges, given in parentheses in Fig. 4.12, vary much less (∆q = 0.019).

Fig. 4.12 MK-ESP charges of cyclopropanol. The direction of the OH bond has a strong influence on the charges of the topologically equivalent CH2 group atoms. Natural charges (given in parentheses) are much less sensitive.

Examples with similarly strong variations have been found for triethylamine and N,N-dimethylformamide. In the former compound even the conformationally equivalent carbon atoms show a variation of -0.392/-0.301/-0.270 (∆maxq = 0.129). The latter compound has charges on the methyl carbon atoms of -0.426 and -0.256 (∆q = 0.170). Furthermore, it was reported that atoms buried inside a molecule may have unrealistic charge values [185]. This effect has therefore been studied for the series of compounds from methane to neopentane, obtained by successively substituting a hydrogen atom by a methyl group. The charges of the central carbon atoms are reported in Tab. 4.8.

Table 4.8 Charges on the central carbon atom of the series of compounds obtained from methane by successive substitution of one hydrogen atom by a methyl group.

                charge on central carbon
compound        NPA       AIM       MK-ESP
CH4             -0.813    -0.007    -0.507
CH3CH3          -0.578     0.037    -0.019/-0.034 (C1/C2)
CH2(CH3)2       -0.389     0.070     0.306
CH(CH3)3        -0.237     0.090     0.559
C(CH3)4         -0.103     0.095     0.808

Even though the natural charges change notably, they are all found to be negative, while the MK-ESP charges not only reverse the sign but also change from a large negative value to large positive values. This behavior cannot be reconciled with electronegativity. The central carbon atoms should become less negative going to neopentane but should still have a charge less than zero. With respect to the PEOE scheme, natural charges are most reasonable: the more hydrogen atoms are attached to a carbon atom, the more negative it becomes; in neopentane the charge on the central carbon atom is zero in the first iteration step of a PEOE run. In the next steps the methyl groups accumulate negative charge and thus become more electropositive than the central atom, which will subsequently receive small amounts of charge from the methyl groups.
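The statement that each hydrogen atom contributes a roughly constant amount of charge can be checked against the NPA column of Tab. 4.8. The short sketch below is an illustration added here, not part of the original analysis; it only tabulates how much less negative the central carbon becomes for each hydrogen atom replaced by a methyl group. The function and variable names are hypothetical.

```python
# NPA charges on the central carbon atom, taken from Tab. 4.8
# (all NPA values are negative, as stated in the text).
# Values: (number of H atoms on the central carbon, NPA charge).
NPA_CENTRAL_CARBON = {
    "CH4": (4, -0.813),
    "CH3CH3": (3, -0.578),
    "CH2(CH3)2": (2, -0.389),
    "CH(CH3)3": (1, -0.237),
    "C(CH3)4": (0, -0.103),
}


def charge_steps(table):
    """Change in carbon charge for each hydrogen replaced by a methyl group."""
    rows = sorted(table.values(), reverse=True)  # from 4 H down to 0 H
    return [round(nxt[1] - cur[1], 3) for cur, nxt in zip(rows, rows[1:])]


steps = charge_steps(NPA_CENTRAL_CARBON)
mean_step = sum(steps) / len(steps)

if __name__ == "__main__":
    print("change per replaced hydrogen:", steps)
    print("mean change:", round(mean_step, 4))
```

The individual steps shrink along the series but stay of the same order as the hydrogen charge of 0.18 to 0.20 quoted earlier, which is what a nearly additive hydrogen contribution would produce.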

4.2.6.7 Conclusions of the Analysis

In order to identify the QM charge calculation method that is most appropriate for the calibration of the combined PEOE/MHMO calculation scheme introduced in the present work, a broad variety of methods have been surveyed. Of all the schemes, only Mulliken population analysis, natural population analysis, charges from the Atoms-in-Molecules theory, and Merz-Kollman ESP-fitted charges have been identified as candidates, for reasons of availability or of scope and limitations. The Mulliken charges were readily found to be unusable with the basis set used for the QM calculations and have not been evaluated further in this study.

The AIM method, which is considered to have the soundest theoretical foundation, is obviously also the most demanding one in terms of implementation of the algorithm. The standard implementation incorporated in the GAUSSIAN 98 package failed for nearly half of the molecules. Furthermore, the AIM charges adopt the most extreme values, occupying a range of five unit charges. Finally, the AIM method gives C–H bond polarities that contradict the empirical electronegativities.

Out of the remaining two methods, MK-ESP charges give the most moderate absolute charge values, reproduce the dipole moment very well, are sufficiently related to electronegativity, and have a fast and robust standard implementation. Unfortunately, this method has some severe shortcomings that make it inappropriate for the purpose of calibrating our calculation scheme. First, only atoms near the molecular surface have reasonable charge values. The more an atom is buried inside a molecule, the less it is "seen" by the ESP, and therefore it may adopt unrealistic values. Topologically equivalent atoms may arrive at strongly different charges because of a shielding effect of an unsymmetrical conformation. Furthermore, the charges for bromine containing compounds were revealed to have unintuitive values with strongly negative carbon atoms.

Finally, natural population analysis seems to be the best compromise, having the most benefits and the fewest shortcomings. An organic chemist may feel uncomfortable about the magnitude of the absolute values of the natural charges. The highly negative carbon atoms in hydrocarbons may also be viewed with doubt. Nevertheless, the order of magnitude found for the charge-based dipole moments indicates that these values give a realistic representation of the charge distribution. On the other hand, the calculation algorithm is properly defined. The problem of dividing the overlap population is solved without any arbitrariness. The implementation is robust and no calculation failure was experienced. The resulting charge values are stable with respect to topological equivalence and conformational issues and are well-defined irrespective of an atom's location inside the molecule. Furthermore, dipole moments calculated with the NPA charges are in the proper order of magnitude, even though they give only moderate agreement with experimental dipole moments. Importantly for the present work, natural charges show the best correlation with electronegativity. As the final conclusion, it has been decided to calibrate the PEOE/MHMO scheme with NPA charges.


4.2.7 Summary

Methods for the calculation of partial atomic charges were developed in our group several years ago, treating σ- and π-electron systems separately. The calculation scheme for the determination of σ-charges (PEOE) rests on a sound theoretical basis and has only one empirical parameter (the damping factor). The charge values have shown broad applicability and relevance. Hence, in the course of the revision of our charge calculation scheme, the PEOE method should basically remain unchanged, apart from a reexamination of the value of the damping factor.

The π-charge distribution is quantified by a calculation scheme that was based on the empirical weighting of resonance structures (PEPE) with a set of empirical factors that were determined by correlation with small datasets of experimental data. This calculation scheme lacks a clear theoretical foundation and the parametrization is quite restricted. Instead of reworking and reparameterizing the PEPE scheme, it has been decided to substitute it with a Hückel MO calculation. The Hückel theory is well established and derived from quantum mechanical principles. Even though a variety of approximations is applied, an HMO treatment should give a better description of π-charge densities. The most severe approximation, the complete neglect of σ-π interactions, should be overcome by a modification of the Coulomb integrals with correction terms accounting for inductive effects and hyperconjugation. A general application of the HMO theory to organic structures requires the parametrization of the Coulomb and resonance integrals for heteroatoms.

A parametrization of a charge calculation scheme can be based on experimental properties depending directly on the charge distribution or on charges from quantum mechanical calculations. After a study and discussion of the opportunities available it was decided to develop two parametrizations: one based on molecular dipole moments and one calibrated with DFT/NPA charges. Even though dipole moments give only a crude picture of the charge distribution, using dipole moments has been identified as the most appropriate and practically most feasible way of fitting charges to an experimental property. The parametrization based on DFT/NPA charges, on the other hand, will not give a charge distribution that represents the electrostatic properties very well (as NPA charges correlated quite poorly with dipole moments) but can be expected to be more useful in answering chemical questions such as those of chemical reactivity.

For both parametrizations, a calibration dataset of 386 small organic molecules was set up. The structures therein are strictly mono-functional, with the purpose of studying only the effects of one functional group at a time. In order to test whether this separate treatment can be justified and gives reasonable results when groups are combined, a test dataset of 71 compounds was compiled, consisting of 48 multi-functional organic and 23 bio-medicinal compounds. All compounds were fully geometry optimized at the B3LYP/6-311+G(d,p) level of theory, and the dipole moments and NPA charges were calculated with the same model chemistry.

In Sections 4.3 to 4.6, the results for both parametrizations will be presented and their relevance will be evaluated. Modifications of the PEOE procedure had to be introduced in the course of the parametrization with NPA charges. These modifications will be presented and discussed.

4.3 Workflow and Procedure for the Parametrization

The general task of parametrization has been described in Section 2.6. It has been said that a set of parameters can be determined deterministically if the parameters can be related to the property of interest in an analytical expression.

In our case, finding the optimal parameter set for the PEOE/MHMO procedure, an analytical expression for the dependence of the charges on the parameters cannot be given. The PEOE scheme is an iterative method and the HMO treatment depends indirectly on the results of the PEOE calculation. For the calculation of dipole moments a further calculation step is required. Here, a probabilistic search algorithm has to be applied in order to locate the optimal parameter set among the huge number of possible combinations that may result from a large number of parameters. If only a few parameters are to be optimized and the number of combinations does not exceed several thousand, a brute force approach that tests all possible combinations is also practical.

The workflow that has been applied in the present work is outlined in Fig. 4.13. This workflow has been implemented in a computer program based on the MOSES programming library. Initially, the program reads the structural data along with the target properties from a structure file. Additionally, configuration data and the initial guess for the parameter set are read from a configuration file. The application configures and triggers the property calculation modules. Based on the results and the target properties read, the quality measure is derived (see below). The quality measure is passed to the optimizer, which provides an updated set of parameters that reenters the loop. As optimizer, either simulated annealing (Section 2.6) or a brute force optimizer can be chosen. The process is terminated after convergence of the quality measure has been reached or after a given number of iteration steps has been performed. The entire program run is monitored and all temporary results are logged into a file. By default the best 20 parameter sets are stored internally and the resulting models are written as data files before program termination.

Fig. 4.13 Workflow applied for the parametrization of the combined PEOE/MHMO procedure.

As a quality measure, two different types can be used. On the one hand, a least-squares sum can be calculated from the individual deviations of all objects in the dataset (Eq. 4.25). Objects are either all atoms in the dataset, for the direct fitting of the charge values, or all molecules, in the case of fitting indirectly via molecular dipole moments. The best fit is reached when fopt,least-squares approaches zero.

\[
f_{\mathrm{opt,least\text{-}squares}} = \sqrt{\sum^{N} \left(p_{\mathrm{target}} - p_{\mathrm{calculated}}\right)^{2}} \tag{4.25}
\]

On the other hand, the quality can also be judged from the statistical parameters of a linear regression between the target and the fitting property. Here, the correlation coefficient and the standard deviation of errors could be used directly. In practice, it has been shown that these values cannot be used directly because the standard deviation decreases if the absolute values of the calculated properties become smaller. What is actually desired is a regression with a slope near unity and a y-axis intercept near zero. Therefore, a more robust measure is obtained when using a combination of the statistical parameters (Eq. 4.26). These have to be slightly modified so that, in accordance with Eq. 4.25, an optimal fit is expressed by a value approaching zero and negative values are avoided.

\[
f_{\mathrm{opt,stat.}} = (1 - r^{2}) + |1 - b| + \sigma + |a| \tag{4.26}
\]

with r, the correlation coefficient, b, the slope of the regression line, σ, the standard deviation of errors, and a, the y-axis intercept.
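The two quality measures can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MOSES-based program used in this work: Eq. 4.25 is taken as the root of the summed squared deviations, Eq. 4.26 is taken with absolute values on the slope and intercept terms, the target property is placed on the x-axis of the regression, and the standard deviation of errors uses n - 2 degrees of freedom. All function names are hypothetical.

```python
import math


def least_squares_measure(target, calculated):
    """Eq. 4.25 as read here: root of the summed squared deviations."""
    return math.sqrt(sum((t - c) ** 2 for t, c in zip(target, calculated)))


def linear_regression(x, y):
    """Ordinary least-squares fit y = b*x + a; returns (r2, b, a, sigma)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # r2 is undefined for constant y; treat that case as "no correlation".
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    residual_ss = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    # Assumption: standard deviation of errors with n - 2 degrees of freedom.
    sigma = math.sqrt(residual_ss / (n - 2)) if n > 2 else 0.0
    return r2, b, a, sigma


def statistical_measure(target, calculated):
    """Eq. 4.26 as read here: (1 - r2) + |1 - b| + sigma + |a|."""
    r2, b, a, sigma = linear_regression(target, calculated)
    return (1.0 - r2) + abs(1.0 - b) + sigma + abs(a)
```

A calculated property that is perfectly correlated with the target but uniformly too small has r² = 1 and σ = 0, so only the slope term reports the problem. This is the situation that motivates the combined measure.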

4.4 Calibration with Molecular Dipole Moments

For the calibration, a rather large dataset consisting of 457 molecules had been compiled (see Section 4.2.2). An experimental dipole moment was available for 310 compounds. It had been shown that the dipole moment can be calculated by QM calculations with high accuracy (see Section 4.2.4).

The question had to be clarified which values should be used: only experimental values, the experimental value if available and otherwise the calculated one, or only calculated values. At first glance, it seems most reasonable to use only experimental data in order to fit against a real observable. On the other hand, the dipole moment depends not only on the charge distribution but also on the 3D structure. Therefore, even if an experimental dipole moment is known, observed structural data are required, which are usually difficult to find. Alternatively, fitting to the experimental dipole moment based on the calculated 3D structure might introduce inconsistencies if the calculated structure deviates from the observed one. Such deviations should be small if the calculated dipole moments are in good agreement with the observed ones. Nevertheless, it was finally decided to fit against the calculated dipole moments using the optimized 3D coordinates for the sake of (i) having an internally consistent model and (ii) having full flexibility in extending the dataset with new compounds of interest without necessarily requiring an experimental dipole moment.

As has been shown before, the calculated dipole moments show a systematic deviation from the experimental values by a factor of 1.072 (see slope of the linear regression in Fig. 4.3, p. 110). Therefore, all QM values had initially been scaled by a factor of 0.933 in order to obtain the correct magnitude with respect to the experimental values. The parametrization has been performed in two steps as outlined in Fig. 4.14.
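How a charge-based dipole moment depends on both the charges and the 3D coordinates can be made concrete with a small sketch. It is an illustration under stated assumptions, not the code used in this work: atoms are treated as point charges (atomic dipole contributions are ignored), charges are given in units of e, coordinates in Å, and the molecule is neutral so that the result does not depend on the origin. The function names are hypothetical; the scaling constant is the reciprocal of the regression slope of 1.072 quoted above.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom expressed in debye
QM_SCALING = 1.0 / 1.072       # about 0.933, from the regression slope quoted above


def point_charge_dipole(charges, coords):
    """Dipole magnitude in debye for point charges (e) at positions (Angstrom)."""
    components = [
        sum(q * xyz[k] for q, xyz in zip(charges, coords)) for k in range(3)
    ]
    return E_ANGSTROM_TO_DEBYE * math.sqrt(sum(c * c for c in components))


def scaled_qm_dipole(mu_qm):
    """Scale a calculated QM dipole moment to the experimental magnitude."""
    return mu_qm * QM_SCALING


if __name__ == "__main__":
    # Two opposite charges of 0.5 e separated by 1 Angstrom.
    mu = point_charge_dipole([0.5, -0.5], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
    print(round(mu, 4), "D")
```

Because the coordinates enter linearly, any deviation between the calculated and the observed geometry carries over directly into the charge-based dipole moment, which is the consistency argument made above for fitting against calculated values.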

Fig. 4.14 Workflow applied for the parametrization of the combined PEOE/MHMO procedure by fitting to molecular dipole moments.

All adjustable parameters are summarized in Tab. 4.9. In order to find the optimal value for the PEOE damping factor, fdamp, only compounds without any unsaturated bond were considered. Here, it was assumed that the damping factor is the same when applied to pure σ-bonds or to σ-bonds that are part of an unsaturated system. In the second step, the parameters for the modified HMO calculation were determined.
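The role of the damping factor can be illustrated with a deliberately simplified toy model. The sketch below is not the PEOE implementation: it treats a single bond A–B, lets the electronegativity of each atom respond linearly to its charge, and transfers charge in six steps, each scaled by f_damp raised to the step number. The linear electronegativity function, the normalization constant, the step count and all numerical values are assumptions chosen for illustration only.

```python
def damped_charge_transfer(chi_a, chi_b, hardness, chi_norm, f_damp, n_steps=6):
    """Toy damped electronegativity equalization for a single bond A-B.

    chi_a, chi_b: electronegativities of the neutral atoms (hypothetical units)
    hardness:     linear response of electronegativity to charge (toy model)
    chi_norm:     normalization constant for the charge transfer (assumption)
    f_damp:       damping factor; step number alpha is scaled by f_damp**alpha
    Returns the final charges (q_a, q_b).
    """
    q_a = q_b = 0.0
    for step in range(1, n_steps + 1):
        # Electronegativity rises as an atom loses electrons and vice versa.
        x_a = chi_a + hardness * q_a
        x_b = chi_b + hardness * q_b
        dq = (x_b - x_a) / chi_norm * f_damp ** step
        q_a += dq  # the less electronegative atom becomes positive
        q_b -= dq
    return q_a, q_b


if __name__ == "__main__":
    for f in (0.40, 0.50, 0.60):
        print(f, damped_charge_transfer(7.0, 9.0, 8.0, 20.0, f))
```

In this toy model a larger damping factor gives a larger charge separation while the direction of the transfer is unchanged, which is why varying the damping factor alone can be expected to affect mainly the slope of a dipole moment correlation.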


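Table 4.9 Summary of adjustable parameters in the proposed PEOE/MHMO scheme.

parameter   method   description                                                                                                        reference
fdamp       PEOE     damping factor: $Q_i$ is accumulated over the iteration steps $\alpha = 1 \ldots 6$ from transfer terms $\frac{\chi_{i\nu}^{\langle\alpha-1\rangle} - \chi_{j\mu}^{\langle\alpha-1\rangle}}{\chi_k^{+}} \cdot f_{\mathrm{damp}}^{\alpha}$   Eq. 2.34 (p. 31)
fσ−π        MHMO     $\Delta\alpha_{\sigma-\pi,i} = f_{\sigma-\pi} \cdot \frac{1}{N_i} \sum_{j}^{N_i} (\chi_{\sigma,j} - \chi_{\sigma,C})$   Eq. 4.19 (p. 101)
fhyp        MHMO     $\Delta\alpha_{\mathrm{hyp},i} = \sum_{j}^{n_{\mathrm{alkyl}}} f_{\mathrm{hyp}} \cdot q_{\sigma,j}^{\mathrm{methyl}}$   Eq. 4.20 (p. 101)
αi,0        MHMO     initial values for atomic Coulomb integrals                                                                        (p. 92)
βij         MHMO     bond resonance integrals                                                                                           (p. 92)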
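How the Coulomb and resonance integrals listed above determine π-charges can be illustrated with the smallest possible case, a two-center π-system with two electrons. The sketch below is a textbook-style illustration added here, not the MHMO implementation of this work; energies are given relative to an arbitrary zero in units of |β|, each center is assumed to contribute one π-electron, and the function name is hypothetical.

```python
import math


def two_center_pi_charges(alpha_1, alpha_2, beta):
    """π-charges of a two-center, two-electron Hückel system.

    alpha_1, alpha_2: Coulomb integrals of the two centers
    beta:             resonance integral of the bond (non-zero, usually negative)
    Returns (q_1, q_2), assuming each center contributes one π-electron.
    """
    mean = 0.5 * (alpha_1 + alpha_2)
    half_gap = 0.5 * (alpha_1 - alpha_2)
    # Lower eigenvalue of the 2x2 Hückel matrix: the doubly occupied bonding MO.
    e_bonding = mean - math.sqrt(half_gap ** 2 + beta ** 2)
    # Unnormalized coefficients from (alpha_1 - E) * c1 + beta * c2 = 0.
    c1 = 1.0
    c2 = (e_bonding - alpha_1) / beta
    norm = c1 * c1 + c2 * c2
    population_1 = 2.0 * c1 * c1 / norm
    population_2 = 2.0 * c2 * c2 / norm
    return 1.0 - population_1, 1.0 - population_2


if __name__ == "__main__":
    print(two_center_pi_charges(0.0, 0.0, -1.0))   # identical centers
    print(two_center_pi_charges(0.0, -1.0, -1.0))  # second center more electronegative
```

Lowering the Coulomb integral of one center draws π-density towards it. A correction term Δα added to a Coulomb integral, such as the σ-π and hyperconjugation terms in the table, therefore shifts the π-charges in the same way.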

4.4.1 PEOE

As might be expected, the optimization of only one parameter does not significantly improve the agreement between the PEOE charge based and the SCF dipole moments. The variation of the damping factor controls the amount of charge separation. Therefore, only the slope of the correlation can be affected.

Initially, the quality of the unmodified PEOE charges with respect to their ability to reproduce dipole moments has been evaluated. Unfortunately, the agreement of the PEOE based dipole moments with the QM values is generally quite poor. For all 155 saturated compounds in the calibration set a correlation coefficient of r² = 0.70 was obtained with a standard deviation of σ = 0.43 D. A further analysis showed that the correlation for compounds containing only first-row heteroatoms (Fig. 4.15) is notably better than for compounds with heavier atoms (Fig. 4.16).

A closer look at Fig. 4.15 shows that the strongest outliers are the binary hydrogen compounds (ammonia, hydrazine, water, and hydrogen fluoride). They are consistently underestimated. This was already found in a previous study [194] and had been attributed to the anisotropy of the free electron pairs. This explanation is reasonable, and it had further been stated that this general trend towards smaller dipole moments for compounds with free electron pairs can be reproduced by PEOE charges. This finding cannot be supported in the present study. In Fig. 4.15, amines, alcohols, and ethers are emphasized by gray boxes. For the group of amines, which all have a pyramidal conformation, the two left-most points belong to trimethyl- and triethylamine. Even though these compounds are quite small and the contribution of the lone pair should have a strong influence, they are overestimated. For the oxygen compounds, the alcohols fall exactly on the correlation line whereas the ethers are again significantly overestimated, while an underestimation would be expected. Due to the double contribution of two lone pairs this underestimation should be even more pronounced than for the amines. Fluorine containing compounds are also in the proper order of magnitude or predicted to be more polar, in contrast to what three lone pair contributions would suggest. The trends within a given group are not well reproduced, nor can a systematic error due to free electron pairs be found. However, considering the global model, it can be said that the dipole moments agree quite well with SCF values for the first row. Ignoring the values for the binary hydrogen compounds (which is justified because of the strong relative contribution of the lone pairs), the correlation improves to r² = 0.91 and σ = 0.26 D and the slope rises to b = 1.05.

Fig. 4.15 Correlation of dipole moments calculated based on PEOE charges with values obtained from QM calculations. The symbols refer to the compound main classes as described above. The PEOE charges were calculated with the classical damping factor, fdamp = 0.50. The dataset used was a subset of the calibration dataset containing only first-row elements.

The picture for the heavier elements is much worse (Fig. 4.16). The dipole moments of silicon containing compounds are generally predicted to be too high. Phosphorus and sulfur containing compounds occupy the entire range of the plot, and their dipole moments are scattered quite strongly. Chlorine and bromine containing compounds are quite consistently estimated to have lower dipole moments compared to the experimental values, in accordance with the anisotropy of the free electron pairs. In this plot the binary hydrogen compounds cannot be identified as outliers.

[Fig. 4.16: scatter plot of the PEOE based dipole moment (debyes, y-axis) against the SCF dipole moment (debyes, x-axis), both from 0.0 to 3.0; symbols for Si, P, S, Cl and Br; regression line y = 0.745 * x + 0.289, n = 84, r² = 0.506, σ = 0.488.]

Fig. 4.16 Correlation of dipole moments calculated based on PEOE charges with values obtained from QM calculations. The symbols refer to the compound main classes as described above. The PEOE charges were calculated with the classical damping factor, fdamp = 0.50. The dataset used was a subset of the calibration dataset containing only heteroatoms of the second row and bromine.

In general, it is surprising that the order of magnitude of the dipole moments resulting from the PEOE charges is in the range of the observed ones. Recalling the distribution of the absolute charge values (see Section 4.2.6), one might expect to find smaller values. The molecular dipole moment is the resultant of all bond dipoles (and atomic dipoles in the case of non-spherical electron distributions) and obviously depends not on the absolute charge values but only on the relative bond polarities. This amount of charge separation is controlled by the damping factor that should be optimized here. Unfortunately, the overall correlation for the entire dataset does not seem sufficiently good to allow a determination of the damping factor. Because first-row elements play a more important role in standard applications, it has been decided to base the calibration of PEOE only on compounds containing solely atoms from the first row. Fig. 4.17 shows the variation of the damping factor and its impact on the quality of the correlation shown in Fig. 4.15 (without the binary hydrogen compounds). It is found that the quality measure is dominated by the second term in Eq. 4.26, which is based on the slope. The optimum value is therefore found for a slope closest to unity and is fdamp = 0.48.

[Fig. 4.17: plot of the quality measure (y-axis, 0.2 to 0.5) against the PEOE damping factor (x-axis, 0.40 to 0.60).]

Fig. 4.17 Dependence of the quality measure obtained from statistical parameters of a linear regression between dipole moments from PEOE charges and QM calculations with variation of the PEOE damping factor.
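A one-parameter scan such as the one in Fig. 4.17 is the simplest case of the brute force optimizer mentioned in Section 4.3. The sketch below is a generic illustration, not the MOSES-based implementation: it evaluates an objective on every grid point and keeps the best parameter sets (20 by default, as in the workflow described earlier). The objective used in the example is a purely hypothetical stand-in whose minimum was placed at 0.48 only to mirror the reported optimum; it is not the real quality measure.

```python
import heapq


def brute_force_scan(objective, grid, keep=20):
    """Evaluate the objective on every grid point.

    Returns up to `keep` (value, parameter) pairs, best (smallest value) first.
    """
    scored = [(objective(p), p) for p in grid]
    return heapq.nsmallest(keep, scored)


def toy_quality(f_damp, f_best=0.48):
    """Hypothetical stand-in for the quality measure (not real data)."""
    return 0.2 + 5.0 * abs(f_damp - f_best)


# Grid over the damping factor range shown in Fig. 4.17, in steps of 0.01.
grid = [round(0.40 + 0.01 * i, 2) for i in range(21)]
best = brute_force_scan(toy_quality, grid, keep=5)

if __name__ == "__main__":
    for value, f_damp in best:
        print(f_damp, round(value, 3))
```

For a single parameter the exhaustive scan costs only as many model evaluations as there are grid points. The cost grows multiplicatively with each additional parameter, which is why a stochastic optimizer is used for the larger parameter sets.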

143 4.4 Calibration with Molecular Dipole Moments Chapter 4

4.4.2 Modified HMO

As has been described in Section 4.2.1, the HMO procedure takes into account the σ-charge distribution that is determined in a preliminary PEOE run. The factors fσ−π and fhyp, which weight the modifications for σ-π interactions and pseudo-hyperconjugation, respectively, have to be found simultaneously with the Hückel parameters. When performing the parametrization with the simulated annealing (SA) optimizer, it turned out to be hard to locate the global optimum with the entirety of parameters to be determined simultaneously. Therefore, the calibration process was split into several steps, one per compound main class (see Tab. 4.2, p. 103). As fσ−π and fhyp act as global parameters, they were required before the single optimization runs and had to be determined beforehand.

For the calibration with DFT/NPA charges (Section 4.5.3), fσ−π was optimized for a small set of trifluoromethylalkenes. This did not succeed in the present context because the general quality of the dipole moments calculated with PEOE charges is not sufficient for the quantification of such a subtle effect. Because the descriptor for representing σ-π interactions is based on the residual electronegativities of the first sphere and the electronegativity values in both calibration schemes are quite similar, the value found for the DFT/NPA scheme has been adopted here (fσ−π = 0.029).

fhyp could be found with an iterative optimization using the nitrogen, oxygen and fluorine subdatasets. First, the Hückel parameters for these subsets were optimized. With the parameters found, the optimum for fhyp was determined. With this value the Hückel parameters were reoptimized, and so forth. This procedure resulted in a value of fhyp = 0.58.
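The alternating procedure described for fhyp can be sketched as a block-wise optimization. The code below is an illustration under strong simplifications, not the procedure as implemented: the whole set of Hückel parameters is collapsed into a single number h, both blocks are optimized on fixed grids, and the objective is an arbitrary coupled quadratic with no relation to real data. All names and numerical values are hypothetical.

```python
def argmin_on_grid(func, grid):
    """Return the grid point with the smallest function value."""
    return min(grid, key=func)


def alternating_optimization(objective, f_grid, h_grid, f_start, max_cycles=50):
    """Alternate between optimizing h at fixed f and f at fixed h.

    Stops when a full cycle leaves both values unchanged.
    """
    f_cur = f_start
    h_cur = argmin_on_grid(lambda h: objective(f_cur, h), h_grid)
    for _ in range(max_cycles):
        f_new = argmin_on_grid(lambda f: objective(f, h_cur), f_grid)
        h_new = argmin_on_grid(lambda h: objective(f_new, h), h_grid)
        if f_new == f_cur and h_new == h_cur:
            break
        f_cur, h_cur = f_new, h_new
    return f_cur, h_cur


def toy_objective(f, h):
    """Arbitrary coupled quadratic with its minimum at f = 0.3, h = 1.2."""
    df, dh = f - 0.3, h - 1.2
    return df * df + dh * dh + 0.5 * df * dh


f_grid = [i / 100 for i in range(0, 101)]
h_grid = [i / 100 for i in range(0, 201)]
f_opt, h_opt = alternating_optimization(toy_objective, f_grid, h_grid, f_start=0.9)

if __name__ == "__main__":
    print(f_opt, h_opt)
```

Because the two blocks are coupled, a single pass is not enough; the loop repeats until neither value changes. For a smooth objective with a single minimum this converges to that minimum, whereas for objectives with several minima the result can depend on the starting value.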

With the fixed values for fσ−π and fhyp, the stepwise optimization of the Hückel parameters, αi,0 and βij, was performed. The results, represented as statistics for the optimal fits, are summarized in Tab. 4.10. Plots for the entire calibration and test dataset are given in Fig. 4.18 and Fig. 4.19, respectively. As silicon atoms normally do not form stable π-bonds, the calibration dataset did not contain compounds with silicon atoms being part of a π-system. Therefore, only two compounds containing silyl groups attached to π-systems are found in Tab. 4.10; they are actually not part of the calibration.

Table 4.10 Results for the parametrization of the modified HMO calculation with molecular dipole moments. Given are the statistical measures obtained from the linear correlation of the PEOE/MHMO based dipole moments with the QM values.

compound class            no. of cmpds   r²     σ (D)   slope   y-axis intercept (D)
calibration data
hydrocarbons              28             0.90   0.03    0.42     0.01
nitrogen (a)              58             0.92   0.32    1.04    -0.06
oxygen (a)                43             0.95   0.25    0.94     0.18
fluorine (a)              17             0.75   0.24    0.78     0.33
silicon (a)               (2)            –      –       –        –
phosphorus (III,IV) (a)   27             0.95   0.28    0.99    -0.04
sulfur (II,IV,VI) (a)     32             0.95   0.30    0.96     0.09
chlorine (a)              14             0.63   0.24    0.64     0.39
bromine (a)               10             0.29   0.24    0.26     0.81
all first-row             146            0.97   0.28    1.03    -0.07
all post first-row        85             0.93   0.32    0.99    -0.06
all calibration data      231            0.96   0.30    1.02    -0.07
test data
test dataset I            23             0.87   0.40    0.94     0.26
test dataset II           37             0.91   0.34    0.91     0.07
all test data             60             0.89   0.39    0.94     0.10

(a) subset with compounds containing the given element as heteroatom

The agreement of the calculated dipole moments with the QM values is generally good. In fact, the quality of the fit is comparable to the results of other charge calculation schemes published recently. BAGOSSI et al. obtained standard deviations in the range of 0.19 to 0.59 D with an empirical method for a dataset of 309 compounds [144]. With a smaller set (119 compounds), GERBER achieved an accuracy of 0.36 D, also with an empirical method [136]. The group of TRUHLAR and CRAMER developed a procedure for fitting semi-empirical and ab initio charges to dipole moments and obtained fits in the range of 0.25–0.39 D for a dataset of 382 compounds [195]. As can be seen from Tab. 4.10, the model presented here can compete with the published ones.

Examining the various groups in more detail, one may notice that the group of hydrocarbons gives a very small slope; nonetheless, the trend is well reproduced. The small slope in that group may hint that the charge separation in C–H bonds should indeed be stronger than predicted with PEOE. In fact, the dipole moment here is dominated by the σ-charge, as the HMO method gives only small π-charges for hydrocarbons.

Furthermore, it is striking that the halogen containing compounds give the worst results

with respect to the correlation coefficient, slope, and intercept (though the standard deviations are comparable to the remaining groups). The reason is that the diversity of the subdatasets used is quite restricted. This is especially the case for the chlorine and bromine datasets. In fact, the fit for these datasets was not sufficient to optimize both Coulomb and resonance integrals, and therefore parameters for the resonance integrals had to be taken from the literature [132].

Considering the phosphorus and sulfur containing subsets, a very good quality of the calculated dipole moments is found. The standard deviations are not significantly worse compared to the other compound classes. This has to be emphasized as the hypervalent atom types that were of special interest in the present work are contained in these datasets. The structure representation for hypervalent atom types (Section 3.4.3) gives a realistic picture of the charge distribution in hypervalent compounds with respect to the resulting dipole moments.

The Hückel parameters obtained will not be presented here but will be given and discussed in comparison with the values obtained for the NPA-fitting in Section 4.5.4.

Fig. 4.18 Comparison of dipole moments based on the fitted PEOE/MHMO charges and QM dipole moments for the calibration dataset. [Scatter plot; x-axis: SCF dipole moment (debyes); y-axis: PEOE/MHMO based dipole moment (debyes); regression line y = 1.019 · x − 0.073, n = 231, r2 = 0.955, σ = 0.299]

Fig. 4.19 Comparison of dipole moments based on the fitted PEOE/MHMO charges and QM dipole moments for the test dataset. [Scatter plot; x-axis: SCF dipole moment (debyes); y-axis: PEOE/MHMO based dipole moment (debyes); regression line y = 0.939 · x + 0.104, n = 60, r2 = 0.89, σ = 0.385]
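The statistical measures quoted in Tab. 4.10 and in Figs. 4.18 and 4.19 (slope, intercept, r2, and σ) all follow from an ordinary least-squares line through the pairs of reference and calculated values. The sketch below shows one way to obtain them. It assumes that σ is the residual standard deviation with n − 2 degrees of freedom; the text does not state which definition was used, and the data points are invented for illustration.

```python
import math


def linear_fit_stats(x, y):
    """Least-squares line y = intercept + slope * x with r^2 and the
    residual standard deviation (n - 2 degrees of freedom, an assumption)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_res / syy,
        "sigma": math.sqrt(ss_res / (n - 2)),
    }


# Invented example values (reference vs. calculated dipole moments, debyes)
reference = [0.0, 1.0, 2.0, 3.0]
calculated = [0.1, 0.9, 2.1, 2.9]
stats = linear_fit_stats(reference, calculated)
print({key: round(value, 3) for key, value in stats.items()})
```

A slope well below 1 combined with a high r2, as reported for the hydrocarbons, then indicates a systematically underestimated but correctly ordered set of dipole moments.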


4.5 Calibration with DFT/NPA Charges

4.5.1 General Considerations

Calibration against reference charge values has the advantage of a direct and unambiguous relationship between the fitted and the target values. Nevertheless, the charges obtained by the presented PEOE/MHMO scheme are composite values consisting of a σ-charge and a π-charge fraction. Even though this separation is well established and justified as discussed above, it is somewhat artificial in nature, and quantum mechanical calculations do not provide separate values but only the total partial charges. In order to overcome this problem, the same approach as outlined in Fig. 4.14 (p. 139) was applied: the parameters for the PEOE procedure – which had to be modified for the fitting to NPA charges as explained below – are determined with purely saturated compounds only. The total charge is equal to the σ-charge for those compounds. The Hückel parameters are optimized in a subsequent step for a subset of unsaturated compounds by keeping the PEOE parameters constant and obtaining the π-charge to regress against as the difference of the total natural charge and the PEOE σ-charge. The procedure of using saturated compounds for studying σ-charge effects and quantifying π-charge effects by subtracting the σ-charge effect from the global effect is traditionally applied for the quantification and separation of substituent effects (e.g., field/inductive and resonance effects) [140].

A second, more serious problem had to be faced in reaching the absolute magnitude of the natural charges. As has been shown in the previous analysis (Section 4.2.6), PEOE charges are in general much smaller than natural charges. In preliminary tests, it was found that it is impossible to obtain such large charge values by PEOE because the difference in electronegativities is generally not sufficient.
On the other hand, it has also been shown that absolute charge values vary quite strongly between different calculation schemes, and WIBERG has stated that the absolute values may differ but that the relative changes between the different atom types are important [170]. The solution to this problem was to aim at reproducing the trend of the natural charges rather than the absolute values by scaling the PEOE/MHMO charges such that they reach a range comparable to the NPA charges. In practice, each charge value obtained by the PEOE/MHMO procedure was linearly scaled by a constant factor prior to the comparison with the corresponding natural charge value. This scaling is artificial in nature and the scaling factor could not be derived on a purely rational basis. Since with the PEOE method each charge value corresponds to an electronegativity value (see Eq. 2.31, p. 29), the scaling factor was defined such that atoms accumulating high negative charges still retain a higher residual electronegativity than the attached atoms. This is especially relevant for methane and ammonia, where the central atom obtains natural charge values of −0.81 and −1.05, respectively. With a scaling factor of 4.0 the residual electronegativities in both molecules are still slightly higher for the central atom than for the hydrogen atoms (7.56/7.49 eV for C/H in methane for a charge of −0.81/4 = −0.20 on the carbon atom; and 8.75/7.71 eV for N/H in ammonia for a charge of −1.05/4 = −0.26 on the nitrogen atom). Eq. 4.27 gives the equation applied for the fitting of the total charges obtained by the combined MPEOE/MHMO procedure to NPA charges employing a scaling factor of 4.0. As the total charge is the sum of σ- and π-charge, the scaling applies equally to both portions.

$$ q^{\mathrm{NPA}} \Leftarrow 4 \cdot q_{\mathrm{total}}^{\mathrm{MPEOE/MHMO}} = 4 \cdot q_{\sigma}^{\mathrm{MPEOE}} + 4 \cdot q_{\pi}^{\mathrm{MHMO}} \qquad (4.27) $$

The scaling is employed after all calculations have been performed. Therefore, quantities related to the charges (e.g., residual electronegativities, χσ) refer to the unscaled charge values.
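The scaling of Eq. 4.27 and the residual electronegativities quoted for the hydrogen atoms can be retraced with a few lines of code. The following is a sketch under stated assumptions: the hydrogen function χ = a + b·q + c·q² is evaluated with a = 7.17 eV (Tab. 4.11) and with b = 6.24 and c = −0.56, which are assumed here from the original PEOE parametrization and are consistent with the value of 12.85 eV obtained for q = +1 (Section 4.5.2.1). The hydrogen charges are derived from the charge of the central atom by assuming a neutral, symmetric molecule.

```python
SCALE = 4.0  # scaling factor of Eq. 4.27


def scaled_total_charge(q_sigma, q_pi, scale=SCALE):
    """Total charge compared with the NPA value (Eq. 4.27)."""
    return scale * (q_sigma + q_pi)


def chi_hydrogen(q, a=7.17, b=6.24, c=-0.56):
    """Residual electronegativity of hydrogen in eV; b and c are assumed
    values, see the lead-in."""
    return a + b * q + c * q * q


# Unscaled charges implied by the natural charges of the central atoms
q_c_methane = -0.81 / SCALE
q_h_methane = -q_c_methane / 4   # four equivalent hydrogen atoms
q_n_ammonia = -1.05 / SCALE
q_h_ammonia = -q_n_ammonia / 3   # three equivalent hydrogen atoms

chi_h_methane = chi_hydrogen(q_h_methane)
chi_h_ammonia = chi_hydrogen(q_h_ammonia)
print(round(chi_h_methane, 2), round(chi_h_ammonia, 2))
```

Under these assumptions the hydrogen values come out close to the 7.49 eV and 7.71 eV quoted above. The values for carbon and nitrogen would require the revised parameters of Tab. A.3 and are therefore not reproduced here.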

4.5.2 Modified PEOE

As before, a significant improvement of the correlation of PEOE with natural charges could not be achieved by variation of the single damping factor. Which other parameters in the PEOE scheme might thus be available for an optimization? The coefficients for the electronegativity functions were derived from spectroscopic data and form the foundation of the PEOE method. They should basically not be modified. Only the parameters for calculating the electronegativity of hydrogen atoms were subject to some discussion. The linear and the quadratic term in the equation for calculating electronegativities (Eq. 2.31, p. 29) and the electronegativity of the proton, χH+, cannot be obtained experimentally and had to be estimated [54,55,196]. NO et al. developed a modified PEOE procedure [55] and tried to adjust the electronegativity parameters to reproduce molecular dipole moments. As mentioned, these parameters should be kept fixed, but a second modification was introduced in [55]: instead of using one global damping factor for all bonds, one distinct factor per bond type was defined. This idea was taken up in the present work in order to introduce more adjustable parameters. With this multiple damping factor approach the basic equation in PEOE (see Eq. 2.34, p. 31) for the calculation of the atomic charges changes to

$$ Q_i = \sum_{\alpha=1}^{L} \sum_{i}^{M} \sum_{j}^{N_i} \frac{\chi_{i\nu}^{\langle\alpha-1\rangle} - \chi_{j\mu}^{\langle\alpha-1\rangle}}{\chi_k^{+}} \cdot f_{ij}^{\alpha}, \qquad k = \begin{cases} i\nu : \chi_{i\nu} < \chi_{j\mu} \\ j\mu : \chi_{i\nu} > \chi_{j\mu} \end{cases} \qquad (4.28) $$

with α, the iteration number; L, the number of iteration steps; M, the number of atoms in a molecule; j, the first-sphere neighbors of atom i; Ni, the number of neighbors of atom i; the indices iν and jµ, referring to the bonding orbitals of atoms i and j; and fij, the bond type dependent damping factors.
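To make the role of the bond type dependent damping factors concrete, the following sketch implements a bare-bones PEOE iteration in which the damping factor is looked up per bond type instead of being a single global constant. It is an illustration only and not the implementation used in this work. The electronegativity coefficients for sp3 carbon and hydrogen, the cation electronegativities, and the damping factor of 0.5 are assumed from the original PEOE parametrization; the calibrated bond-specific factors are not given in this section. The code uses the convention that electrons flow from the less to the more electronegative atom, so the donor becomes more positive.

```python
def chi(params, q):
    """Orbital electronegativity chi = a + b*q + c*q^2 (eV)."""
    a, b, c = params
    return a + b * q + c * q * q


def peoe_charges(elements, bonds, chi_params, chi_cation, damping, n_steps=6):
    """Iterative partial equalization of orbital electronegativity with one
    damping factor per bond type (keyed by the unordered element pair)."""
    q = [0.0] * len(elements)
    for alpha in range(1, n_steps + 1):
        x = [chi(chi_params[e], qi) for e, qi in zip(elements, q)]
        dq = [0.0] * len(elements)
        for i, j in bonds:
            if x[i] == x[j]:
                continue
            # the less electronegative atom donates electrons
            donor, acceptor = (i, j) if x[i] < x[j] else (j, i)
            f_ij = damping[frozenset((elements[i], elements[j]))]
            transfer = ((x[acceptor] - x[donor])
                        / chi_cation[elements[donor]] * f_ij ** alpha)
            dq[donor] += transfer
            dq[acceptor] -= transfer
        q = [qi + dqi for qi, dqi in zip(q, dq)]
    return q


# Assumed parameters (original PEOE values for sp3 carbon and hydrogen)
CHI_PARAMS = {"C": (7.98, 9.18, 1.88), "H": (7.17, 6.24, -0.56)}
CHI_CATION = {"C": 7.98 + 9.18 + 1.88, "H": 20.02}
DAMPING = {frozenset(("C", "H")): 0.5}  # hypothetical single entry

elements = ["C", "H", "H", "H", "H"]          # methane
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
charges = peoe_charges(elements, bonds, CHI_PARAMS, CHI_CATION, DAMPING)
print([round(value, 3) for value in charges])
```

With these assumed parameters the carbon atom of methane ends up slightly negative and the four hydrogen atoms carry equal positive charges. Adding further entries to `DAMPING` is all that is needed to give each bond type its own factor.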

4.5.2.1 Revised Electronegativity Parameters

In the general PEOE/MHMO scheme based on a fitting to dipole moments, only the MHMO part for the calculation of π-charges was calibrated. The PEOE part remained basically unchanged. Therefore, σ-charges obtained from the PEOE/MHMO method from the calibration to dipole moments can be seen as backward compatible with previous versions. In the current context, the PEOE scheme had to be modified and the charges will be quite different from previous implementations. Thus, the opportunity was taken to revise the electronegativity parameters.

As has been described in Section 2.7.2, the orbital electronegativities are dependent on the atomic charge and the hybridization state. In the original paper [9], the polynomial coefficients are given only for carbon, nitrogen, and oxygen atoms for different hybridization states. This has been extended in the implementation of the PEOE method in PETRA 2.6 to fluorine, sulfur, chlorine, bromine, and [82]. Furthermore, this implementation applied a generalized approach for calculating the orbital electronegativities as a continuous function of charge and hybridization state. This was achieved by expressing the polynomial coefficients as functions of the hybridization. Each coefficient can be obtained by Eq. 2.32 (p. 30) for several hybridization states. These points were then fitted by polynomials of degree three

$$ \mathit{coef} = \alpha + \beta \cdot \mathit{hyb} + \gamma \cdot \mathit{hyb}^2 + \delta \cdot \mathit{hyb}^3 \qquad (4.29) $$

with coef being one of a, b, or c in the quadratic equation, χ = a + b·qi + c·qi², and hyb, the hybridization as characterized by the amount of p-orbital contribution, thus 0 ≤ hyb ≤ 100. The resulting twelve parameters allow the calculation of orbital electronegativities for each combination of charge and hybridization.

As this set of parameters was not complete with regard to the data available, and especially phosphorus and silicon were represented only by one set of default parameters, i.e., for a standard hybridization state, the derivation of coefficients (Eq. 4.29) was redone and extended to all elements given in [47, 48, 50] (all parameters obtained can be found as supplementary material in Tab. A.3, p. 191). The shape of the electronegativity function is generally quite similar for all elements and for illustration it is shown for carbon in Fig. 4.20. As expected, the electronegativity decreases continuously with increasing p-orbital contribution and negative charge.
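The twelve-parameter description can be written down compactly: each of the coefficients a, b, and c is a cubic polynomial in the hybridization (Eq. 4.29), and the electronegativity is the quadratic function of the charge built from them. The sketch below shows this structure only. The numbers in `TOY_PARAMS` are hypothetical placeholders chosen for readability; the fitted values are those of Tab. A.3 and are not reproduced here.

```python
def poly3(coeffs, hyb):
    """Cubic polynomial alpha + beta*hyb + gamma*hyb^2 + delta*hyb^3 (Eq. 4.29)."""
    alpha, beta, gamma, delta = coeffs
    return alpha + hyb * (beta + hyb * (gamma + hyb * delta))


def orbital_electronegativity(params, q, hyb):
    """chi(q, hyb) = a(hyb) + b(hyb)*q + c(hyb)*q^2 with hyb in % p-contribution."""
    a = poly3(params["a"], hyb)
    b = poly3(params["b"], hyb)
    c = poly3(params["c"], hyb)
    return a + b * q + c * q * q


# Hypothetical placeholder parameters (twelve numbers per element)
TOY_PARAMS = {
    "a": (14.0, -0.08, 0.0, 0.0),
    "b": (9.0, 0.0, 0.0, 0.0),
    "c": (1.5, 0.0, 0.0, 0.0),
}

chi_neutral_75 = orbital_electronegativity(TOY_PARAMS, 0.0, 75.0)
chi_neutral_50 = orbital_electronegativity(TOY_PARAMS, 0.0, 50.0)
chi_anionic_75 = orbital_electronegativity(TOY_PARAMS, -0.5, 75.0)
print(chi_neutral_75, chi_neutral_50, chi_anionic_75)
```

Even with placeholder numbers the qualitative behavior described in the text is visible: the value drops with increasing p-contribution and with increasing negative charge.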

Fig. 4.20 Representation of electronegativity as function of charge and orbital hybridization state (given as percentage of p-orbital contribution). The parameters in this example are those for carbon. [Surface plot; axes: % p-contribution (0 to 100), charge (−1.0 to 1.0), electronegativity (0 to 25)]

Next, the electronegativity of the proton was evaluated. In the original work [9], this value was taken from [50] to be 20.02 eV. However, this value was only an estimate because a proton has no ionization potential. The electronegativity function applied for hydrogen would, in contrast, lead to a value of 12.85 eV, because the authors argued that the slope around the uncharged state should be less steep than would be required for reaching a value of 20.02 eV for the cation. Ultimately, this value cannot be determined on a purely rational basis and must be seen as a scaling factor for the charge transfer involving hydrogen atoms. NO et al. derived in their modification of the PEOE method a value for the proton electronegativity of 9.32 eV [55]. For the calibration to NPA charges the value of 12.85 eV was applied for the following reasons. As has been shown above, the C – H bond polarity is much stronger in NPA compared to PEOE. Therefore, a smaller value for χH+ allows a larger charge shift. Secondly, this value is then consistent with the electronegativity function for hydrogen.

Finally, in preliminary studies it was found that several bond polarities obtained by NPA could not be reproduced by PEOE with the Hinze/Jaffé parameters for electronegativity. The natural charges for compounds containing silicon, phosphorus, and sulfur indicate that the electronegativities of these elements should be lower than those applied in the original PEOE scheme. For instance, while the hydrogen atoms in silane are negatively charged in NPA, the PEOE charge is close to zero (see also Section 4.2.5). NPA gives negative carbon charges in C – S bonds while the PEOE charges give the reverse polarity. Even though one may argue for one or the other polarity, it is clear that for the task of a calibration all bond polarities must necessarily be in accordance with the electronegativity relationships. By construction of the algorithm, charge cannot be transferred against the electronegativity difference. Therefore, the electronegativities for silicon, phosphorus, and sulfur had to be adapted (in contrast to the initial intention to keep the electronegativity parameters as derived from the literature). The question was how to achieve this adaptation.
A modification of all parameters obtained by the procedure described above would not be feasible as there are too many (i.e., twelve per element). NO et al. also faced the problem of a large number of adjustable parameters in their optimization. In their modification only a set of electronegativity functions for a fixed number of hybridization states was applied, analogous to the original PEOE approach, and they decided to drop the quadratic term as this term is of minor importance. In the present work a different approach was found. The electronegativity can also be modified by varying the hybridization state. A higher p-orbital contribution results in a decreased electronegativity. Even though such a modification is questionable from a physical point of view, a – somewhat artificial – modification of the hybridization is practically more attractive because then only one parameter has to be determined per element: a correction increment to be added to the actual hybridization found. In the course of the calibration process, a correction of +30% p-contribution seemed to be the best choice for the elements silicon, phosphorus, and sulfur. A further correction had to be applied to carbon. The natural charges on carbon in hydrocarbons required an enhanced electronegativity. This was achieved by decreasing the p-contribution by a correction increment of −15%.

For a comparison of the effective electronegativities obtained by the modifications, electronegativity values for the elements under consideration were calculated for the uncharged states and are given in Tab. 4.11. As can be seen from these values, the order required by the natural charges is obtained.

Table 4.11 Electronegativity values for the elements studied after applying adjustments for carbon, silicon, phosphorus, and sulfur. The hybridization states (% p-contribution) are given in parentheses and were taken from the natural bond orbital analysis for the binary hydrogen compounds of the elements. The correction increments given in the text had been applied before calculating the electronegativity values.

               electronegativity (eV)
element        adapted    orig.
hydrogen        7.17       7.17
carbon          9.32       7.98   (75%)
nitrogen       11.68      11.71   (74%)
oxygen         14.99      13.88   (76%)
fluorine       16.12      16.19   (79%)
silicon         6.11       7.30   (67%)
phosphorus      6.97       8.91   (83%)
sulfur          7.92       9.18   (85%)
chlorine       10.81      10.71   (86%)
bromine         9.60       9.59   (88%)
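The effect of a hybridization correction increment can be illustrated for carbon. The sketch below is a rough plausibility check and not the procedure used in this work: it interpolates the neutral-state electronegativity of carbon linearly between three hybridization states, whose values (sp: 10.39 eV, sp2: 8.79 eV, sp3: 7.98 eV) are assumed from the original PEOE parametrization, whereas the thesis uses cubic fits over more data points (Eq. 4.29). The correction increments are those given in the text.

```python
# Correction increments in % p-contribution, as given in the text
HYB_CORRECTION = {"C": -15.0, "Si": 30.0, "P": 30.0, "S": 30.0}

# Assumed neutral-state electronegativities of carbon (eV) for sp, sp2, sp3,
# listed as (% p-contribution, value)
CARBON_NEUTRAL = [(50.0, 10.39), (200.0 / 3.0, 8.79), (75.0, 7.98)]


def effective_hybridization(element, hyb):
    """Hybridization after adding the element-specific correction increment."""
    return hyb + HYB_CORRECTION.get(element, 0.0)


def interpolate(points, x):
    """Piecewise-linear interpolation; the end segments are used to extrapolate."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            break
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


chi_c_orig = interpolate(CARBON_NEUTRAL, 75.0)
chi_c_adapted = interpolate(CARBON_NEUTRAL, effective_hybridization("C", 75.0))
print(round(chi_c_orig, 2), round(chi_c_adapted, 2))
```

The shift from 75% to an effective 60% p-contribution raises the interpolated value from 7.98 eV to roughly 9.4 eV, which is in the neighborhood of the adapted value of 9.32 eV listed in Tab. 4.11. Exact agreement is not expected from a linear interpolation.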

4.5.2.2 Negative Hyperconjugation

The concept of hyperconjugation describes the effect of orbital interactions between a C – H σ-bond and a π-electron system or a vacant orbital. Since its first formulation [197] it has become a well-established concept. The interaction of a free electron pair with a C – X σ∗-antibonding orbital is termed negative hyperconjugation, but this concept has been discussed more controversially (see, e.g., [198–200]; experimental evidence for that effect is put forward in [201]).

It was not the aim of the present work to study this effect, but it was found that natural population analysis quantifies the phenomenon of n–σ∗ interactions. This posed a problem because the PEOE algorithm is not able to reproduce such interactions. In order to illustrate the situation, Fig. 4.21 shows mono-, di-, and trifluoromethane and the corresponding natural charges. The hydrogen charges in methane are 0.1927 in comparison.

Fig. 4.21 Effect of negative hyperconjugation shown for successive fluorination of methane resulting in decreasing charge on hydrogen.

At first glance, it is counter-intuitive that the charges on hydrogen decrease with increasing substitution by fluorine. PEOE expresses what would be expected from chemical intuition: each fluorine atom withdraws electron density from the central carbon atom. With an increasing positive charge, the carbon atom becomes more electronegative, resulting in an increasing positive charge on the adjacent hydrogen atoms (0.058, 0.098, and 0.141 for the considered series of fluoromethanes). NPA gives the reverse order. The reason for that might be seen in electrostatic interactions. The central carbon atom becomes increasingly electron deficient with each fluorine atom. An increasing positive charge on the adjacent hydrogen atoms would be unfavorable, and thus the charge on hydrogen is decreased by donation of a fluorine lone pair.

This effect of back-donation has been found in a general manner for the electronegative elements of the first row having non-conjugated free electrons. For illustration, Tab. 4.12 gives the natural charges on hydrogen atoms of several methane derivatives with such substituents. However one may consider this effect, in the current context one has to deal with it as the natural charges are used as target property for the calibration.

Table 4.12 Natural charges on the methyl hydrogen atoms of various methane derivatives, CH3X. The effect of negative hyperconjugation from lone pairs of the electronegative substituents to the geminal hydrogen atoms is demonstrated by a decreasing charge.

X =            H1        H2        H3
– CH3          0.1927    0.1927    0.1927
– NH2          0.1836    0.1544    0.1836
– NHCH3        0.1834    0.1537    0.1921
– N(CH3)2      0.1919    0.1574    0.1919
– OH           0.1528    0.1528    0.1773
– OCH3         0.1548    0.1548    0.1851
– F            0.1579    0.1579    0.1579

Again, we face a limitation of the PEOE algorithm, as here no orbitals are considered but only topological information. Thus, a further modification had to be introduced. As PEOE is an iterative procedure, the effect can readily be modeled by back-donation of charge from a heteroatom after each cycle of electronegativity equalization. This back-donation is applied if a nitrogen, oxygen, or fluorine atom with a non-conjugated lone pair is geminal to one or more C – H bonds. As can be seen from Tab. 4.12, the charge donation is dependent on geometrical constraints and not equal for all hydrogen atoms. Such differences cannot be modeled with a topological method, and therefore a distinct amount of charge, qb.d. (b.d. = back-donation), is distributed equally to all potential acceptor atoms (Eq. 4.30).

$$ q_{\mathrm{b.d.}} = q_X \cdot f_{LP} \cdot f_X \cdot f_{\mathrm{b.d.},X}^{\alpha} \qquad (4.30) $$

The amount of back-donation is set to be proportional to the actual partial charge of the donating atom. The more negative an atom is, the higher is its tendency to release charge.

Secondly, qb.d. was found to be dependent on the number of lone pairs that can take effect. From fluoroethane (not shown) it is seen that the hydrogen atoms geminal to the fluorine atom achieve basically the same charge value as in fluoromethane. In the first case two lone pairs, and in the latter case all three lone pairs can donate, as two C – H antibonding orbitals are available in fluoroethane and three in fluoromethane. Therefore, the second factor, fLP, in Eq. 4.30 is either the number of lone pairs, if at least the same number of acceptor groups is available, or it is the number of acceptor groups. Finally, the third factor, fX, in Eq. 4.30 gives a weight that indicates how effective a back-donation might be. These factors have been derived manually in order to achieve an optimal fit between PEOE and NPA charges for the considered compounds. The default value is 1.0. Factor fX is set to 2.0 for planar oxygen (e.g., for carbonyl oxygen), which donates more efficiently, and to 0.3 for divalent oxygen with one lone pair that is conjugated. If a fluorine atom is conjugated the effect is also found to be weaker (fX = 0.3). Finally, for nitrogen atoms adjacent to one acceptor hydrogen atom the donation is also less efficient and fX is set to 0.5 here.
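The back-donation term of Eq. 4.30 and the rule for fLP can be sketched as a small function. This is a minimal illustration under stated assumptions: the function and argument names are hypothetical, the donor charge and the damping factor `f_bd` in the example are invented numbers (the calibrated damping factors are not given in this section), and the equal split over the acceptor atoms follows the description above.

```python
def back_donation(q_donor, n_lone_pairs, n_acceptors, f_x, f_bd, alpha):
    """Return (total back-donated charge, share per acceptor atom) following
    q_bd = q_X * f_LP * f_X * f_bd**alpha (Eq. 4.30)."""
    if n_acceptors == 0:
        return 0.0, 0.0
    # f_LP: number of lone pairs, limited by the number of acceptor groups
    f_lp = min(n_lone_pairs, n_acceptors)
    q_bd = q_donor * f_lp * f_x * f_bd ** alpha
    return q_bd, q_bd / n_acceptors


# Invented example numbers: donor charge and damping factor are placeholders
Q_FLUORINE = -0.10
F_BD = 0.5

# Fluoromethane: three lone pairs, three geminal C-H acceptors
total_fm, share_fm = back_donation(Q_FLUORINE, 3, 3, f_x=1.0, f_bd=F_BD, alpha=1)
# Fluoroethane: three lone pairs, but only two geminal C-H acceptors
total_fe, share_fe = back_donation(Q_FLUORINE, 3, 2, f_x=1.0, f_bd=F_BD, alpha=1)
print(share_fm, share_fe)
```

For equal donor charges the share per hydrogen atom is the same in both cases, which mirrors the observation that the geminal hydrogen atoms of fluoroethane and fluoromethane carry basically the same charge.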

As with the charge flow due to electronegativity equalization, qb.d. is damped with the number of the current iteration step, α, in order to achieve convergence. Here, one damping factor per donating atom type is introduced, i.e., three additional parameters had to be included in the calibration.

Fig. 4.22 (p. 157) shows plots of the resulting fits obtained by (a) ignoring or (b) including the effect of negative hyperconjugation as described above. A subset of the calibration dataset (see Section 4.2.2) was used here, consisting of compounds having a nitrogen, oxygen, or fluorine atom attached to at least one Csp3 – H bond. Including this effect improves the correlation significantly. In particular, the charges for hydrogen atoms that show the reverse trend between MPEOE and NPA charges in Fig. 4.22 (a) are well predicted in Fig. 4.22 (b). Furthermore, natural charges for carbon atoms are better reproduced.

Fig. 4.23 (p. 158) gives an overview of the fits obtained for all saturated compounds in the calibration dataset, shown for first-row elements (a) and second-row elements and bromine (b). The agreement between the NPA charges and the fitted MPEOE charges is very good for both subdatasets. Only a few carbon atoms in the subset of second-row elements and bromine deviate more strongly. These carbon atoms are bound to chlorine or bromine atoms. This will be discussed in more detail below.

All parameters for the modified PEOE procedure obtained by the calibration and a more detailed presentation of the results will be given in context with the overall model in Section 4.5.4.

Fig. 4.22 Correlation of MPEOE charges with NPA charges, (a) neglecting the effect of negative hyperconjugation, (b) with back-donation. The dataset used was a subset of compounds with nitrogen, oxygen, or fluorine atoms exerting an n–σ∗ interaction. [Scatter plots; x-axis: DFT natural charges; y-axis: MPEOE charges; atom types H, C, N, O, F; regression lines: (a) y = 1.002 · x + 0, n = 321, r2 = 0.98, σ = 0.05; (b) y = 0.998 · x + 0, n = 321, r2 = 0.995, σ = 0.024]

Fig. 4.23 Correlation of MPEOE charges obtained by calibration to DFT/NPA charges with the target values. (a) comprises only a subset of saturated compounds with elements of the first row. (b) refers to a subset representing saturated compounds with elements of the second row and bromine. [Scatter plots; x-axis: DFT natural charges; y-axis: MPEOE charges; atom types (a) H, C, N, O, F and (b) H, C, N, O, F, Si, P, S, Cl, Br; regression lines: (a) y = 1.011 · x + 0, n = 828, r2 = 0.996, σ = 0.022; (b) y = 0.939 · x + 0, n = 805, r2 = 0.985, σ = 0.052]


4.5.3 Modified HMO

In order to estimate the inductive effect exerted by polar groups on the π-charge distribution, some trifluoromethylalkenes were used as model compounds in Section 4.2.1.2. For quantifying this effect a descriptor was suggested based on the residual σ-electronegativities of the first sphere of neighbors (Eq. 4.19, p. 101).

For determining the weighting factor, fσ−π, the focus was again placed on the trifluoromethyl group attached to simple π-systems (see Section 4.2.1.2). Three compounds, 3,3,3-trifluoropropene, 3,3,3-trifluoro-2-(trifluoromethyl)propene, and (trifluoromethyl)benzene, were chosen as model compounds. From the previous calibration of the MPEOE procedure, σ-charges in very good agreement with NPA charges could be obtained. It was assumed that with the same model the σ-charge distribution in unsaturated compounds can be obtained with the same quality. If the inductive effect was not significant, the π-charges should be zero for the symmetrical π-electron systems of the model compounds, and MPEOE σ-charges and NPA charges should coincide. Otherwise, differences between NPA charges and MPEOE σ-charges should reflect an unsymmetrical π-charge distribution resulting from the inductive effect on the π-systems.

Since the three model compounds have π-systems including only C-sp2 centers, the fσ−π parameter could be determined without taking any other parameters into account simultaneously. Fig. 4.24 shows the results both for fσ−π = 0, indicating a complete neglect of σ-π interactions, and for the optimal value found, fσ−π = 0.029.

Fig. 4.24 Influence of the inductive effect on the π-charge distribution of 3,3,3-trifluoropropene, 3,3,3-trifluoro-2-(trifluoromethyl)propene, and (trifluoromethyl)benzene. The plot in the background (a) gives the correlation for all atoms, the cutout in the foreground (b) shows carbon π-centers only. Points represented by squares refer to π-charges obtained by unmodified HMO. The starred points give the modified HMO charges obtained with an optimized value of fσ−π = 0.029 (see Eq. 4.19, p. 101).

As can be seen in Fig. 4.24 (a), the overall correlation is quite good. Nevertheless, the charges of the π-centers deviate significantly when σ-π interactions are neglected (square symbols). These deviations vanish nearly completely when the suggested modification is applied (Fig. 4.24 (b)). Tab. 4.13 gives the individual charge values for the π-centers of the compounds under consideration.

Table 4.13 Natural and MPEOE/MHMO(NPA) charges for the compounds used for quantification of the inductive effect. The π-charges were obtained with fσ−π = 0.029 (see Eq. 4.19, p. 101). The atomic labels for the ethene derivatives refer to the skeleton X – C1 = C2.

                                 MPEOE/MHMO(NPA) charges
atom             qNPA        qtot       qσ         qπ
3,3,3-trifluoropropene
  C1            −0.290      −0.290     −0.210     −0.080
  C2            −0.308      −0.296     −0.376      0.080
3,3,3-trifluoro-2-(trifluoromethyl)propene
  C1            −0.225      −0.200     −0.039     −0.160
  C2            −0.253      −0.213     −0.373      0.160
(trifluoromethyl)benzene
  ipso          −0.150      −0.088     −0.023     −0.065
  ortho         −0.175      −0.170     −0.191      0.021
  meta          −0.197      −0.192     −0.194      0.002
  para          −0.184      −0.176     −0.194      0.018
C (ethene)      −0.369      −0.379     −0.379      0.000
C (benzene)     −0.205      −0.194     −0.194      0.000

For 3,3,3-trifluoropropene the agreement is nearly perfect. The trifluoromethyl group polarizes the π-system in such a way that the electron deficiency in the σ-charge of C1 (compared to unsubstituted ethene) is compensated by a π-charge shift from C2. For 3,3,3-trifluoro-2-(trifluoromethyl)propene this effect is less well captured. While C1 would require a stronger charge shift, C2 releases slightly too much charge.

A very encouraging situation is found for (trifluoromethyl)benzene. As can be seen from the natural charges, all positions in the benzene ring are deactivated and the charge pattern typically found for −M-substituents is observed. With the proposed modification of the HMO both effects are well predicted.
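The decomposition of the total charge into its σ- and π-parts can be checked directly on the values of Tab. 4.13. The following sketch, with values transcribed from the table and labels chosen for the example, verifies that qtot equals qσ + qπ within the rounding of the tabulated figures and lists the deviation of qtot from the natural charge for each π-center.

```python
# (label, q_NPA, q_tot, q_sigma, q_pi), transcribed from Tab. 4.13
ROWS = [
    ("CF3-propene C1", -0.290, -0.290, -0.210, -0.080),
    ("CF3-propene C2", -0.308, -0.296, -0.376, 0.080),
    ("(CF3)2-propene C1", -0.225, -0.200, -0.039, -0.160),
    ("(CF3)2-propene C2", -0.253, -0.213, -0.373, 0.160),
    ("PhCF3 ipso", -0.150, -0.088, -0.023, -0.065),
    ("PhCF3 ortho", -0.175, -0.170, -0.191, 0.021),
    ("PhCF3 meta", -0.197, -0.192, -0.194, 0.002),
    ("PhCF3 para", -0.184, -0.176, -0.194, 0.018),
]

# q_tot should equal q_sigma + q_pi up to rounding of the last digit
additive = all(abs(q_sigma + q_pi - q_tot) < 0.0015
               for _, _, q_tot, q_sigma, q_pi in ROWS)

# absolute deviation of the fitted total charge from the natural charge
deviations = [(label, abs(q_tot - q_npa)) for label, q_npa, q_tot, _, _ in ROWS]
worst = max(deviations, key=lambda item: item[1])
mean_abs_dev = sum(dev for _, dev in deviations) / len(deviations)

print(additive, worst[0], round(worst[1], 3), round(mean_abs_dev, 3))
```

The largest deviation is found for the ipso position of (trifluoromethyl)benzene, while the remaining ring positions and the propene carbons agree within a few hundredths of a charge unit.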

fσ−π has been derived for a very small dataset of compounds where the polar substituent has no direct interaction with the π-system. In order to demonstrate the general validity of our approach, Tab. 4.14 gives the charges for several compounds with substituents being part of the π-system for both modified (fσ−π = 0.029) and unmodified HMO (fσ−π = 0). For all compounds shown it is found that the proposed modification significantly improves the quality of the total charges obtained by MPEOE/MHMO(NPA).

Table 4.14 Charges for a set of unsaturated compounds obtained by NPA and MPEOE/MHMO(NPA). π-charges are calculated with or without accounting for the inductive effect on the π-charge distribution. A factor of fσ−π = 0.029 (see Eq. 4.19, p. 101) as obtained for the trifluoromethylalkenes is applied for weighting the inductive effect (see text for details). The atomic labels for the ethene derivatives refer to the skeleton X – C1 = C2.

                           MPEOE/MHMO(NPA) charges
                        fσ−π = 0.029         fσ−π = 0
atom         qNPA       qtot      qπ         qtot      qπ         qσ
hydroxyethene
  C1          0.173      0.180     0.084      0.270     0.174      0.097
  C2         −0.529     −0.527    −0.156     −0.618    −0.248     −0.370
  O          −0.684     −0.683     0.073     −0.681     0.074     −0.756
fluoroethene
  C1          0.257      0.250     0.028      0.356     0.133      0.222
  C2         −0.478     −0.450    −0.082     −0.558    −0.189     −0.369
  F          −0.363     −0.363     0.054     −0.361     0.056     −0.417
nitroethene
  C1         −0.082     −0.075    −0.092      0.016     0.000      0.016
  C2         −0.298     −0.281     0.092     −0.372     0.000     −0.372
  N           0.465      0.571     0.471      0.578     0.478      0.100
  O          −0.374     −0.396    −0.235     −0.400    −0.239     −0.161
phenol
  ipso        0.316      0.306     0.052      0.383     0.129      0.254
  ortho      −0.254     −0.243    −0.057     −0.262    −0.077     −0.186
  meta       −0.183     −0.182     0.013     −0.190     0.004     −0.194
  para       −0.237     −0.223    −0.029     −0.245    −0.052     −0.194
  O          −0.681     −0.686     0.066     −0.685     0.067     −0.752

A second modification has been proposed in Section 4.2.1.3 for taking hyperconjugation into account. In order to determine the optimal value for the corresponding weighting factor, fhyp, all unsaturated hydrocarbons were extracted from the calibration dataset (Section 4.2.2). For unsaturated hydrocarbons only C-sp2 and C-sp centers, which have defined values for αi,0 and βij, are to be considered in the MHMO treatment. Therefore, the optimal value for fhyp could be determined by varying only this parameter. The value for fσ−π was not varied and was taken as found before. The optimal value for fhyp was found to be 0.178.

Tab. 4.15 gives the charges obtained for several unsaturated compounds having a potential hyperconjugative group attached to the π-system (the modification for the inductive effect has only a minor impact on the π-charge values obtained).

Table. 4.15 Natural and MPEOE/MHMO(NPA) charges for some unsaturated compounds with alkyl groups

attached to a π-system. The π-charges were obtained with fhyp = 0.178 (see Eq. 4.20, p. 101). The atomic labels for the ethene derivatives refer to the skeleton X – C1 = C2.

MPEOE/MHMO(NPA) charges

atom qNP A qtot qσ qπ propene C1 0.172 0.199 0.241 0.042 − − − C2 0.393 0.418 0.376 0.042 − − − − C (methyl group) 0.613 0.547 0.547 – − − − 1-butene C1 0.168 0.217 0.237 0.020 − − − C2 0.388 0.396 0.376 0.020 − − − − C (methylen group) 0.422 0.379 0.379 – − − − 2-methylpropene C1 0.006 0.017 0.100 0.083 − − − C2 0.404 0.457 0.374 0.083 − − − − C (methyl groups) 0.602 0.544 0.544 – − − − toluene ipso 0.032 0.022 0.054 0.032 − − − ortho 0.207 0.208 0.191 0.016 − − − − meta 0.196 0.190 0.194 0.004 − − − para 0.214 0.201 0.194 0.007 − − − − C (methyl group) 0.590 0.543 0.543 – − − − C (methyl group in propane) 0.575 0.582 0.582 – − − − C (methylene group in butane) 0.385 0.415 0.415 – − − −
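The improvement reported for Tab. 4.14 can be checked directly from the total charges listed there. The following minimal sketch compares both HMO variants with the natural charges. The function name and the choice of the mean absolute deviation as the measure are assumptions made for this illustration and are not part of the parametrization itself.

```python
# Total charges transcribed from Tab. 4.14 as
# (q_NPA, q_tot with f_sigma-pi = 0.029, q_tot with f_sigma-pi = 0).
TAB_4_14 = {
    "hydroxyethene": [(0.173, 0.180, 0.270), (-0.529, -0.527, -0.618),
                      (-0.684, -0.683, -0.681)],
    "fluoroethene": [(0.257, 0.250, 0.356), (-0.478, -0.450, -0.558),
                     (-0.363, -0.363, -0.361)],
    "nitroethene": [(-0.082, -0.075, 0.016), (-0.298, -0.281, -0.372),
                    (0.465, 0.571, 0.578), (-0.374, -0.396, -0.400)],
    "phenol": [(0.316, 0.306, 0.383), (-0.254, -0.243, -0.262),
               (-0.183, -0.182, -0.190), (-0.237, -0.223, -0.245),
               (-0.681, -0.686, -0.685)],
}


def mean_abs_dev(rows, column):
    """Mean absolute deviation of one q_tot column from the NPA reference (column 0)."""
    return sum(abs(row[column] - row[0]) for row in rows) / len(rows)


for name, rows in TAB_4_14.items():
    modified = mean_abs_dev(rows, 1)
    unmodified = mean_abs_dev(rows, 2)
    print(f"{name:14s} modified: {modified:.3f}  unmodified: {unmodified:.3f}")
```

For each of the four compounds the deviation of the modified variant is smaller than that of the unmodified one, in line with the statement made for Tab. 4.14.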

First of all, it is observed that the natural charges for carbon atoms in alkyl groups attached to unsaturated systems are more negative than those of their saturated counterparts. This is astonishing because the alkyl carbon atom is bound to a C-sp2 center in the first case and to an sp3 carbon atom in the latter case. Consequently, the MPEOE charges give the reverse trend, i.e., a less negative alkyl carbon charge in the unsaturated compound. The crucial point is that the MHMO π-charges level the MPEOE σ-charges in a basically correct manner, although the agreement of the total charges with the natural charges is not as good as before. Nevertheless, the π-charges for the substituted ethene derivatives show the expected effect of an increased π-charge density on the β-carbon atoms. The charges for toluene reflect the known reactivity of the carbon centers: the ortho- and para-positions are slightly activated while the meta-position is slightly deactivated. Only the higher charge density of the para-position compared to the ortho-position, as found for the natural charges, is not reflected by the MPEOE/MHMO charges.

Fig. 4.25 gives the best fits of MHMO charges to NPA charges, obtained by finding the optimal set of Hückel parameters while keeping the MPEOE parameters and the parameters fσ−π and fhyp as found in the previous steps. The subset containing only first-row elements (a) and the subset containing second-row elements and bromine (b) give correlations of similar quality. The fit is not as good as that obtained for the MPEOE charges. This is not surprising because in conjugated π-electron systems electron delocalization occurs over larger distances and the charge distribution in π-systems is generally more difficult to capture. This will be discussed in Section 4.5.4 along with a detailed presentation of the parameters obtained.

Fig. 4.25 Correlation of MHMO/MPEOE charges obtained by calibration to NPA charges with the fitting values (x-axis: DFT natural charges; y-axis: MPEOE/MHMO charges). (a) comprises only a subset of unsaturated compounds with first-row elements (H, C, N, O, F): y = 0.95 * x + 0, n = 1681, r2 = 0.981, σ = 0.043. (b) refers to a subset representing unsaturated compounds with second-row elements and bromine (H, C, N, O, Si, P, S, Cl, Br): y = 0.998 * x + 0, n = 949, r2 = 0.989, σ = 0.058.

4.5.4 Results of the Parametrization

In this section, the entire process of calibrating the combined MPEOE/MHMO procedure to NPA charges is summarized, and the results and the parameters obtained are presented and discussed in more detail.

As with the calibration to molecular dipole moments, the parametrization was conducted in two main steps. The MPEOE parameters were determined with a subset consisting of all saturated compounds. Based on these values, the parameters for the modified Hückel calculation were derived in the second step. This approach could only succeed because the damping factors for the PEOE scheme were considered to be independent of the hybridization state of the bound atoms. For instance, the damping factor for a Csp3 – Osp3, Csp2 – Osp2, or Csp2 – Osp3 bond is taken to be the same value (fC–O) and was derived for the saturated systems only. Changes in the σ-charge distribution in unsaturated systems are expected to result only from the different electronegativities caused by the change in hybridization.

Tab. 4.16 lists all parameters obtained for the calibration of the modified PEOE with natural charges. As can be seen, several bond types nevertheless take the hybridization into account. These are solely bonds involving hydrogen atoms. The reason is that the hydrogen natural charges do not change as expected with the hybridization of the atom bound. For instance, the hydrogen natural charges for the series ethane, ethene, and ethyne change from 0.193 to 0.185 to 0.225, respectively. While ethane and ethyne show the expected trend, ethene does not. A similar situation was found for hydrogen atoms bound to nitrogen atoms. The natural charges drop slightly for acetaldimine (qH = 0.328) compared to ethylamine (qH = 0.352). A rationale for this behavior could be found neither in the literature nor by our own investigations (e.g., an n-σ∗ interaction as found for the electronegative elements of the first row is not operative here). Therefore, parameters that take the hybridization of the atom bound into account were required for the bond types involving hydrogen atoms. The bond types involving unsaturated atoms had to be optimized simultaneously with the Hückel parameters.

Table. 4.16 Parameters obtained by calibration of the modified PEOE scheme with DFT/NPA charges. The subscript b.d. refers to the back-donation modification as described in Section 4.5.2.2. Parameters that refer to only one compound each, namely HF, HCl, and HBr, are given in parentheses.

bond type    fij       bond type    fij        atom type   fb.d.
dflt.        0.190     C – S        0.307      Nb.d.       0.151
Csp3 – H     0.255     C – Cl       0.197      Ob.d.       0.138
Nsp3 – H     0.238     C – Br       0.189      Fb.d.       0.168
Osp3 – H     0.158     N – O        0.203
Sisp3 – H    0.343     N – P        0.363
Psp3 – H     0.117     N – S        0.340
Ssp3 – H     0.394     O – Si       0.187
Csp2 – H     0.176     O – P        0.239
Nsp2 – H     0.160     O – S        0.209
Psp2 – H     0.051     Si – F       0.205
Csp – H      0.139     Si – Cl      0.172
PV – H       0.102     P – F        0.249
C – N        0.294     P – Cl       0.172
C – O        0.216     (H – F       0.206)
C – F        0.294     (H – Cl      0.206)
C – Si       0.433     (H – Br      0.194)
C – P        0.396

The question might be raised whether the parameters obtained can be interpreted. Do they have any physical relevance? From the initial outline of the PEOE algorithm it is clear that the damping approach simulates the development of an electrostatic field during the progressive charge transfer and the resulting resistance against further charge shift. This basic meaning of the damping approach is not affected by introducing multiple damping factors. Following this physical picture, a larger damping factor for a given bond would mean that the electrostatic field develops more slowly when charge is shifted between the two atoms. Or, the other way round, more charge is shifted in higher iteration steps when the damping factor is large, indicating that the electrostatic field is weak. However, no clear trends or patterns can be found in Tab. 4.16. Only a tendency towards larger values for bond types involving second-row elements can be observed. This could be interpreted such that a positive partial charge on a relatively large and diffuse atom is less effective in generating an electrostatic field and therefore a charge transfer is more easily achieved. On the other hand, the varying damping factors are more likely due to the adjustment of the electronegativity functions of silicon, phosphorus, and sulfur (see Section 4.5.2.1). The damping factors rather seem to play the role of a fine adjustment for the fitting, as the corrections for the electronegativities were only estimates. If these corrections had arbitrarily been set to different values, the corresponding damping factors would have changed as well. Under these premises, a physical interpretation of the damping factors is questionable. The most reasonable interpretation of the multiple damping factors is probably that they serve as corrections for the electronegativity functions in order to improve the fit of the resulting charges to the reference values. In this sense, the multiple damping factors have no other meaning than introducing more adjustable parameters into the system. From this point of view, the only conclusion that can be drawn from Tab. 4.16 is that the electronegativity of oxygen seems to be too high compared to nitrogen and fluorine: all parameters involving oxygen are consistently smaller than the corresponding nitrogen- or fluorine-containing parameters.
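The role of the damping factor in the iterative charge transfer can be illustrated with a minimal PEOE sketch. It is not the parametrization of this work: it assumes the original Gasteiger-Marsili electronegativity coefficients for H and C(sp3), a damping of f to the power k in iteration k (0.5 by default), and a hypothetical `damping` dictionary through which bond-dependent factors fij could be supplied. How exactly fij enters the modified scheme is an assumption made here for illustration.

```python
# Original Gasteiger-Marsili coefficients chi(q) = a + b*q + c*q**2 (in eV).
# The revised NPA-fitted electronegativity functions are NOT reproduced here.
COEFF = {"H": (7.17, 6.24, -0.56), "C_sp3": (7.98, 9.18, 1.88)}
CHI_H_PLUS = 20.02  # electronegativity assigned to H+ in the original scheme


def chi(atom_type, q):
    """Charge-dependent orbital electronegativity of an atom type."""
    a, b, c = COEFF[atom_type]
    return a + b * q + c * q * q


def chi_cation(atom_type):
    """Electronegativity of the cation, used to scale the charge shift."""
    return CHI_H_PLUS if atom_type == "H" else chi(atom_type, 1.0)


def peoe(atom_types, bonds, damping, default_f=0.5, n_iter=6):
    """Iterative partial equalization; damping maps a bond (i, j) to its factor."""
    q = [0.0] * len(atom_types)
    for k in range(1, n_iter + 1):
        electroneg = [chi(t, qi) for t, qi in zip(atom_types, q)]
        dq = [0.0] * len(q)
        for i, j in bonds:
            f = damping.get((i, j), default_f)
            # Electrons flow from the less to the more electronegative atom.
            donor, acceptor = (i, j) if electroneg[j] > electroneg[i] else (j, i)
            shift = (electroneg[acceptor] - electroneg[donor]) \
                / chi_cation(atom_types[donor]) * f ** k
            dq[donor] += shift
            dq[acceptor] -= shift
        q = [qi + d for qi, d in zip(q, dq)]
    return q


# Methane: atom 0 is carbon, atoms 1-4 are hydrogen.
methane_types = ["C_sp3", "H", "H", "H", "H"]
methane_bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
charges = peoe(methane_types, methane_bonds, damping={})
print([round(x, 4) for x in charges])
```

In this sketch a smaller factor for a bond reduces the charge shifted across it in every iteration, so supplying individual factors per bond type acts as an additional adjustable parameter, as discussed above. The values of Tab. 4.16 belong to the revised electronegativity functions and would not be meaningful in combination with the original coefficients used here.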

In an intermediate step, the two parameters fσ−π and fhyp for the modifications that account for inductive and hyperconjugative effects were determined. This could be done in a separate step prior to the actual parametrization of the Hückel calculation because only the parameters for carbon, αC and βCC, were required. These are defined to have values of 0.0 and 1.0, respectively; therefore, no other parameters had to be known. Details about these modifications have been given above.

In the second main step of the calibration, the parameters for the Hückel calculation were determined. All parameters found in the previous steps were kept constant here. Tab. 4.17 gives the Hückel parameters obtained. Also included in Tab. 4.17 are the results from the fitting to molecular dipole moments. As mentioned in Section 4.4.2, these parameters are listed here in order to allow a comparison of both schemes. For a comparison of the Hückel parameters one has to keep in mind that the π-charges for the fitting to NPA charges were scaled by a factor of 4.0 (see Section 4.5.1). Since such a scaling is not employed for the fitting to dipole moments, a direct comparison has to be taken with some reservation. Nevertheless, some general trends can be observed and some general remarks can be made.

First of all, it is striking that fewer parameters were required for the NPA scheme. For the dipole-fitted parametrization, different parameters were required for several atom and bond types that are formally equal but in different valence states. The NPA-fitting went without such distinctions. One may consider for instance the αi,0 value for sulfur atoms contributing


Table. 4.17 Hückel parameters obtained from the calibration to dipole moments and DFT/NPA charges. Coulomb integrals, α0, for the atom types considered are shown in the upper part. Resonance integrals, βij, for the bond types considered are to be found in the lower part.

αi,0 α0 atom type µ NPA atom type µ NPA . .. C. 0.00 0.00 P. 2.55 1.55 N 0.34 0.08 P(V ) 0.03 – .. . − N 1.15 3.87 S 0.13 0.02 .. .. − N. nitro 2.08 – S. 2.87 2.50 ..O 0.14 0.14 ..S(IV ) 0.03 – ..O 2.10 4.95 ..S(V I) 1.18 – F. 3.03 4.99 ..Cl 1.12 2.73 P 0.10 0.13 Br 1.10 0.00 − βij βij bond type µ NPA bond type µ NPA

. . . .. a C. – C. 1.00 1.00 C. – Br. 0.29 0.10 C. – ..N 1.19 1.59 N. – ..N 0.68 3.90 C. – ..N 0.68 0.76 ..N – ..N 0.67 0.92 C. – N. nitro 0.85 0.01 N. – ..N 1.20 0.01 C. – ..O 1.08 1.89 ..N – ..O 0.84 0.94 C. – ..O 0.65 0.80 ..N – O. 1.19 0.01 C. – F. 0.63 0.70 N. – O. nitro 1.19 0.71 C. – ..P 0.66 1.33 ..N – P. 0.63 2.98 C. – P. 0.34 0.10 ..N – ..P(V ) 0.29 0.01 C. – P.(V ) 0.09 – N. – S.(V I) 0.12 0.41 C. – ..S 0.72 0.93 ..O – P.(V ) 0.77 1.90 C. – S. 0.48 0.40 O. – P.(V ) 0.39 0.45 C. – ..S(IV ) 0.10 – O. – ..S(IV ) 0.30 0.72 C. – ..S(V I) 0.48 – ..O – ..S(V I) 0.52 0.66 C – Cl 0.29a 0.31 O – S(V I) 0.10 0.10

a taken from ABRAHAM and SMITH [132]

two π-electrons. When fitting to NPA charges, the same value obtained for mercapto groups can be used for sulfur atoms in a hypervalent state, as in sulfone groups. In the dipole-fitted scheme this value differs significantly (2.87 for S̈ versus 1.18 for S̈(VI)). This may be interpreted such that the σ-charge distribution for hypervalent atom types is not described very well in the dipole-fitted version. The π-charges may include a contribution that levels the total charge in order to achieve the correct dipole moment. The only such parameter that differs in the NPA-fitting is the βij value for nitrogen atoms in nitro groups attached to a carbon π-center. This value differs from the one found for nitrogen atoms donating a free electron pair, such as in amino groups (0.01 for the nitro group versus 0.76 for the amino group).

The βij-parameters for the interaction of two lone pairs differ significantly between the two schemes (e.g., 1.20 against 0.01 for N̈ – N̈ in the dipole-fitted and the NPA-fitted parametrization, respectively). While the dipole-fitting gives a strong interaction, the NPA-fitting indicates a negligible interaction. In contrast, for double bonds with both atoms contributing one electron, the NPA-fitting gives higher values for βij than the dipole-fitting. Furthermore, the dipole-fitting gives a higher βij value for the interaction of the two lone pairs in N̈ – N̈ than for the interaction of two single electrons in Ṅ – Ṅ (1.20 against 0.68, respectively). In these cases, the NPA-fitting seems to give the more reasonable parameters because the interaction of two single electrons should be stronger than that of two lone pairs.

What is more striking in the dipole-fitted parametrization is the relative order of the αi,0-parameters. The values for the first-row elements are reasonable (1.15, 2.10, and 3.03 for N̈, Ö, and F̈, respectively) as they correspond to the trend in electronegativity. For the heavier elements, however, the value for S̈ is slightly higher than the value for P̈, but it drops for C̈l and B̈r (2.55, 2.87, 1.12, and 1.10 for P̈, S̈, C̈l, and B̈r, respectively). Furthermore, the values for P̈ and S̈ are larger than those for N̈ and Ö. This may be due to the problems of the PEOE charges in reproducing dipole moments accurately (see Section 4.4.1). In general, the Hückel parameters obtained by fitting to NPA charges give a more reasonable impression than the parameters from the dipole-fitting.
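How the Coulomb and resonance integrals translate into π-charges can be sketched with a plain Hückel calculation. The example below is a simplified illustration under several assumptions: it uses the standard HMO convention (αi = αC + hi·β, βij = kij·β, energies in units of |β|), it omits the inductive and hyperconjugative modifications (fσ−π, fhyp) as well as any scaling of the π-charges, and it takes the dipole-fitted values 2.10 for the lone-pair oxygen and 0.65 for its bond to carbon from Tab. 4.17, where the assignment of 0.65 to this bond type is itself an assumption. The small Jacobi diagonalization is only an implementation detail to keep the sketch free of external libraries.

```python
import math


def jacobi_eigh(matrix, sweeps=50):
    """Eigenvalues and eigenvectors (columns) of a small symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-14:
                    continue
                # Rotation angle that zeroes a[p][q], reduced to |theta| <= pi/4.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of A and V
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def hmo_pi_charges(h, k, electrons):
    """pi-charges from a Hueckel matrix in units of |beta| (alpha_C = 0, beta_CC = -1)."""
    n = len(h)
    ham = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ham[i][i] = -h[i]             # alpha_i = alpha_C + h_i * beta with beta < 0
    for (i, j), kij in k.items():
        ham[i][j] = ham[j][i] = -kij  # beta_ij = k_ij * beta
    energies, vecs = jacobi_eigh(ham)
    order = sorted(range(n), key=lambda m: energies[m])
    remaining = sum(electrons)
    population = [0.0] * n
    for m in order:                   # fill orbitals from the bottom, two electrons each
        occ = min(2, remaining)
        remaining -= occ
        for i in range(n):
            population[i] += occ * vecs[i][m] ** 2
    return [electrons[i] - population[i] for i in range(n)]


# Hydroxyethene, atom order C1, C2, O (skeleton O - C1 = C2); O donates two electrons.
q_pi = hmo_pi_charges(h=[0.0, 0.0, 2.10], k={(0, 1): 1.00, (0, 2): 0.65},
                      electrons=[1, 1, 2])
print([round(x, 3) for x in q_pi])
```

In this unmodified treatment the oxygen atom carries a positive π-charge and the β-carbon atom C2 a negative one, i.e., the same qualitative pattern as the π-charges listed for hydroxyethene in Tab. 4.14.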

A summary of the statistical parameters obtained for the correlations between the fitted MPEOE/MHMO(NPA) charges and the target natural charges is given in Tab. 4.18 for both types of subdatasets, containing only saturated and only unsaturated compounds, respectively. The corresponding columns are labeled σ and π for brevity. Plots of the correlations for the entire calibration and test datasets are given in Fig. 4.26 (additionally, separate plots for the two test datasets can be found in the Appendix, Fig. A.2, p. 192).

Table. 4.18 Results for the parametrization of the MPEOE/MHMO procedure with DFT/NPA charges. Given are the statistical measures obtained from the linear correlation of MPEOE/MHMO with natural charges. For each compound class, each measure is given for both the subset of saturated compounds (denoted by σ) and the subset of unsaturated compounds (denoted by π).

                        no.            r2               σ (std. dev.)    b (slope)
compound class          σa     πb      σa      πb       σa      πb       σa      πb
calibration data
hydrocarbons            253    361     0.997   0.988    0.016   0.026    1.029   0.946
nitrogen                158    498     0.997   0.971    0.018   0.052    1.000   0.947
oxygen                  236    488     0.994   0.983    0.025   0.046    1.001   0.932
fluorine                122    175     0.995   0.992    0.027   0.028    1.000   0.978
silicon                 127    24      0.989   0.972    0.070   0.075    0.931   1.269
phosphorus              188    353     0.988   0.991    0.051   0.065    0.920   1.003
sulfur                  193    361     0.995   0.991    0.025   0.052    0.981   0.996
chlorine                182    131     0.956   0.984    0.057   0.027    0.962   0.911
bromine                 115    80      0.968   0.971    0.051   0.036    0.959   0.857
all 1st row             828    1681    0.996   0.981    0.022   0.043    1.011   0.950
all post 1st row        805    949     0.985   0.989    0.052   0.058    0.939   0.998
all calibration data    1633   2630    0.988   0.985    0.042   0.050    0.965   0.980
test data
test dataset I          –      425     –       0.989    –       0.047    –       0.957
test dataset II         74     326     0.970   0.966    0.070   0.058    1.076   0.983
all test data           74     751     0.970   0.983    0.070   0.052    1.076   0.964

a refers to saturated compounds in the corresponding dataset
b refers to unsaturated compounds in the corresponding dataset
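The statistical measures of Tab. 4.18 (slope b, r2, and standard deviation σ) can be obtained from a linear correlation of fitted against natural charges. The following minimal sketch shows one way to compute them. It assumes an ordinary least-squares line with a fitted intercept and takes σ as the standard deviation of the residuals with n − 2 degrees of freedom; whether exactly these definitions were used for Tab. 4.18 is not stated in the text. The sample data are a few carbon charges from Tab. 4.15.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = b*x + a with r^2 and residual standard deviation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy * sxy / (sxx * syy)
    residuals = [yi - (b * xi + a) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return b, a, r2, sigma


# Carbon charges of propene and 1-butene from Tab. 4.15:
# natural charges (x) and MPEOE/MHMO(NPA) total charges (y).
q_npa = [-0.172, -0.393, -0.613, -0.168, -0.388, -0.422]
q_fit = [-0.199, -0.418, -0.547, -0.217, -0.396, -0.379]
slope, intercept, r2, sigma = linear_fit(q_npa, q_fit)
print(f"b = {slope:.3f}, a = {intercept:.3f}, r2 = {r2:.3f}, sigma = {sigma:.3f}")
```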

Fig. 4.26 Correlation of MPEOE/MHMO(NPA) charges obtained by calibration to DFT/NPA charges with the target values (x-axis: DFT natural charges; y-axis: MPEOE/MHMO charges). (a) calibration dataset (H, C, N, O, F, Si, P, S, Cl, Br): y = 0.974 * x + 0, n = 4263, r2 = 0.986, σ = 0.047. (b) test dataset (H, C, N, O, F, S, Cl, Br): y = 0.972 * x + 0, n = 825, r2 = 0.981, σ = 0.055.

The quality of the fit is quite good for all compound classes considered, for both MPEOE and MPEOE/MHMO charges. In particular, the agreement of the σ-charges for compounds containing only first-row elements is very high. As described above, this accuracy is mainly achieved by accounting for n-σ∗ interactions (Section 4.5.2.2). Nevertheless, for a few compound classes the fit did not succeed as well. As can be seen from the correlation coefficients in Tab. 4.18, this is the case for unsaturated nitrogen- and silicon-containing compounds, and for both saturated and unsaturated compounds containing chlorine and bromine. The poorer correlation for test dataset II is due to the fact that this dataset is focused on polyhalogenated compounds with a high proportion of chlorine and bromine atoms. Therefore, the results are comparable with those obtained for the chlorine and bromine subdatasets of the calibration data. To give an insight into the scope and limitations of the presented charge calculation scheme, the results for the problematic compound classes mentioned above are discussed in more detail in the following paragraphs.

Nitriles. Nitriles were identified as the cause of the problems in calibrating the nitrogen-containing compounds. Tab. 4.19 gives the NPA and the fitted MPEOE/MHMO(NPA) charges for some simple compounds containing the cyano group, demonstrating the problem.

Table. 4.19 NPA and MPEOE/MHMO(NPA) charges for some simple nitriles. Propane is included for comparison.

compound          atom          NPA        MPEOE/MHMO(NPA)
HCN               H             0.2201     0.2318
                  C             0.0902     0.1120
                  N            −0.3104    −0.3438
NC – CH2 – CH3    C (CN)        0.2953     0.2185
                  N (CN)       −0.3409    −0.3407
                  C (CH2)      −0.4923    −0.3070
                  H (CH2)       0.2371     0.2118
                  C (CH3)      −0.5611    −0.5801
                  H (CH3) a     0.2082     0.1952
NC – CH2 – CN     C (CN)        0.2890     0.2148
                  N (CN)       −0.2810    −0.3350
                  C (CH2)      −0.5781    −0.1954
                  H (CH2)       0.2811     0.2179
H3C – CH2 – CH3   C (CH3)      −0.5753    −0.5820
                  H (CH3) a     0.1945     0.1952
                  C (CH2)      −0.3885    −0.4187
                  H (CH2)       0.1861     0.2058

a average over the hydrogen atoms in CH3

The agreement for hydrogen cyanide is good. The hydrogen charge reflects well the inductive effect of the cyano group, taking the hydrogen natural charge in ethane as reference (0.1927). In cyanoethane, the central methylene carbon atom bears a negative natural charge of −0.4923. This is quite surprising, as this value is significantly more negative than the methylene carbon charge in propane (−0.3885). The effect is even more pronounced in dicyanomethane with a natural charge of −0.5781. With the fitted charges, the trend is the opposite. In the same series of compounds, the MPEOE/MHMO(NPA) charges rise from −0.4187 to −0.3070 and −0.1954 for propane, cyanoethane, and dicyanomethane, respectively. This reflects the trend expected for successively substituting one hydrogen atom with an electron-withdrawing group.

Besides the practical implication of failing to reproduce the natural charges in the parametrization, the question is what the reason for the counter-intuitive behavior of the NPA might be and what relevance the obtained charges have. The inductive effect of the cyano group is not reflected at adjacent carbon atoms. However, the hydrogen atoms attached to the adjacent carbon atoms become more positive with each cyano group added. This reflects the increasing α-CH acidity quite well. From that point of view, the natural charges seem reasonable. The puzzling behavior of the carbon charges might be due to electrostatic effects, as has been assumed for the fluoroalkanes discussed in Section 4.5.2.2. Coulomb attraction can be maximized when the central carbon atom becomes more negative in the presence of positive substituents such as the carbon atom of the cyano group. PEOE cannot model such effects. Nevertheless, the increasing α-CH acidity is also reflected by the MPEOE/MHMO(NPA) charges, not by an increasing hydrogen charge but through a less negative carbon atom that can better bear the negative charge resulting from breaking the CH bond, compared for instance to the more negative central carbon atom in propane. Furthermore, such a negative charge would be stabilized by resonance with a cyano group. This might implicitly be reflected in the NPA charges but would need to be modeled explicitly within the MPEOE/MHMO(NPA) scheme.
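The opposite trends discussed for the nitriles can be read off Tab. 4.19 with a few lines of code. The sketch below only re-uses the tabulated charges of the central CH2 group; the helper name `trend` and the three-way classification are choices made for this illustration.

```python
# Charges of the central CH2 group from Tab. 4.19 as (NPA, MPEOE/MHMO(NPA)).
SERIES = ("propane", "cyanoethane", "dicyanomethane")
CARBON = {
    "propane": (-0.3885, -0.4187),
    "cyanoethane": (-0.4923, -0.3070),
    "dicyanomethane": (-0.5781, -0.1954),
}
HYDROGEN = {
    "propane": (0.1861, 0.2058),
    "cyanoethane": (0.2371, 0.2118),
    "dicyanomethane": (0.2811, 0.2179),
}


def trend(table, column):
    """Return 'rising', 'falling' or 'mixed' for one charge column along SERIES."""
    values = [table[name][column] for name in SERIES]
    steps = [b - a for a, b in zip(values, values[1:])]
    if all(s > 0 for s in steps):
        return "rising"
    if all(s < 0 for s in steps):
        return "falling"
    return "mixed"


for label, table in (("C", CARBON), ("H", HYDROGEN)):
    print(label, "NPA:", trend(table, 0), "| fitted:", trend(table, 1))
```

The natural carbon charges fall along the series while the fitted carbon charges rise; the hydrogen charges rise in both schemes, although only weakly for the fitted charges.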

Silicon. The poorer results for unsaturated silicon-containing compounds are due to the fact that silicon atoms are known to avoid π-bonding. Consequently, the compounds included in the subdataset did not contain any π-bond involving silicon directly but only silyl groups attached to π-systems. A fitting was therefore not possible, and the results in Tab. 4.18 actually show how well the inductive effect of a silyl group on a π-system is captured by the model. The poorer correlation compared to the other subsets indicates that this effect is not described as well as for the first-row elements.

Chlorine and Bromine. Chlorine and bromine posed some serious problems. The discussion here concentrates on chlorine, using some simple chloroalkanes given in Tab. 4.20; the situation for bromine is analogous. First of all, it was found with the QM charge calculation schemes that chlorine atoms in chloroalkanes carry only a slightly negative charge or are often even slightly positive (see also Section 4.2.6.4). Even though one would intuitively expect more negative chlorine atoms, all quantum chemical charge calculation schemes considered in Section 4.2.6 agreed in the trend to assign only moderately negative charges and also positive values to chlorine atoms.

Table. 4.20 NPA and MPEOE/MHMO(NPA) charges for some simple chloroalkanes.

compound    atom         NPA        MPEOE/MHMO(NPA)
CH3Cl       C           −0.5351    −0.5171
            H            0.2049     0.1987
            Cl          −0.0797    −0.0790
CH2Cl2      C           −0.3868    −0.2785
            H            0.2121     0.2131
            Cl          −0.0187    −0.0739
CHCl3       C           −0.3159    −0.0222
            H            0.2169     0.2280
            Cl           0.0333    −0.0686
CH3CCl3     C (CCl3)    −0.1543     0.1789
            Cl (CCl3)    0.0609    −0.0645
            C (CH3)     −0.6178    −0.5714
            H (CH3)      0.2306     0.1953
CH3CH3      C           −0.5780    −0.5854
            H            0.1927     0.1951

When considering the compounds in Tab. 4.20, it is found that NPA and MPEOE/MHMO(NPA) charges agree well for chloromethane. When going from chloro- to trichloromethane, this agreement becomes worse. While the MPEOE/MHMO(NPA) charges remain nearly constant for the chlorine atoms, the natural charges rise and even change sign from dichloro- to trichloromethane. In parallel, the carbon atom becomes less negatively charged in the MPEOE/MHMO(NPA) scheme while remaining strongly negative in NPA.

A further striking observation is demonstrated for 1,1,1-trichloroethane. Even though the chlorine atoms in the trichloromethyl group are slightly positive, the hydrogen atoms in the adjacent methyl group reflect well the inductive effect expected for a trichloromethyl group. The effect of a decreased charge on a carbon atom attached to a polar group, as found before, is observed here as well. With the MPEOE/MHMO(NPA) scheme, the inductive effect in 1,1,1-trichloroethane is hardly seen at all compared to the charges in ethane. This is due to the adaptation of the electronegativities (see Section 4.5.2.1) required to reproduce the small charge separation in C – Cl bonds. Again, we meet a situation where the NPA seems to give counter-intuitive charge values but reflects chemical effects correctly. The MPEOE/MHMO(NPA) scheme can only reproduce the bare charge values and must fail when effects do not rely directly on charge and electronegativity. From this point of view, the MPEOE/MHMO(NPA) charges should be used with care for chlorine- and bromine-containing compounds.


4.6 Evaluation

Closing this study, the charges obtained by the proposed calculation scheme are evaluated for both parametrizations. In order to show that both calculation schemes give realistic charge values, correlations with experimental ESCA shifts are presented in the next section. As such correlations are only reliable for saturated compounds, an evaluation of π-charge effects is given in Section 4.6.2.

4.6.1 C-1s ESCA Shifts

The significance of ESCA shifts as an experimental property for the evaluation of charge calculation schemes has been discussed in Section 4.2.4. ESCA shifts might be seen as the best measure for testing partial charges. Therefore, a dataset with 89 C-1s ESCA shifts has been compiled from the literature [157, 158] (the dataset with all experimental values along with the calculated charges is given in the Appendix, Section B.2). Only sp3 carbon centers have been considered due to the relaxation effects associated with π-electron systems, as discussed in Section 4.2.4.

Fig. 4.27 and 4.28 give the correlations obtained with charges from the fitting to dipole moments and with charges from the fitting to NPA charges, respectively. The variation of the ESCA shifts is well reproduced by both schemes. With the standard PEOE method (Fig. 4.27) the correlation is better than with MPEOE(NPA). Nevertheless, the agreement is not as good as has been reported for PEOE in the original paper [9]. There, a smaller dataset with only 22 data points was used and the correlation coefficient was r2 = 0.986. In Fig. 4.28 it can be seen that carbon atoms attached to silicon-containing substituents constitute a group of points having consistently too negative charge values with respect to the ESCA shifts. Here, the NPA seems to consider silicon atoms to be too electropositive.

Fig. 4.27 Correlation of charges with C-1s ESCA chemical shift. Charges were calculated with the standard PEOE scheme. Only sp3 carbon atoms were considered.

Fig. 4.28 Correlation of charges with C-1s ESCA chemical shift. Charges were calculated with modified PEOE calibrated with DFT/NPA charges. Only sp3 carbon atoms were considered.

4.6.2 π-Charges

As has been stated before, the separation of σ- and π-electron systems is artificial. Therefore, it is difficult to assess the quality of the obtained π-charges. Nevertheless, chemists are familiar with the effects of substituents on the π-charge distribution in benzene derivatives. Therefore, two small datasets were constructed containing benzene derivatives with common substituents.

Tab. 4.21 gives the π-charges of the substituents considered. The strongest charge donation into the ring is expected for the amino group. Due to the rising electronegativity, a hydroxy group should donate less and a fluorine substituent should donate least. This order is well reproduced by both parametrizations, with the PEOE/MHMO(µ) charges having smaller absolute values than the MPEOE/MHMO(NPA) charges. For the substituents of the second row and bromine, a decreased π-donation is found for both schemes. This is reasonable because the overlap of the more diffuse free electron pairs of the heavier elements with the π-system should be weaker. This was also reflected in the values of the Hückel resonance integrals for the respective bond types (see Tab. 4.17, p. 169). It is striking that the π-charge value for the phosphine group is considerably smaller than for all other substituents. This is found for both parametrizations independently. Furthermore, it has to be recalled that the π-charges for chlorine and bromine obtained with the PEOE/MHMO(µ) scheme have to be taken with care. The parameters for chlorine and bromine could not be determined by the fitting to dipole moments and had to be taken from the literature. This might be the reason that the trend for the phosphine, mercapto, chlorine, and bromine substituents is reversed in the two schemes.

Table. 4.21 π-charges of various groups attached to a benzene ring. The charges reflect the expected order of π-charge donation by conjugation of a free electron pair.

         π-donation
X        µ-fitted    DFT/NPA-fitted
NH2      0.0786      0.0886
OH       0.0369      0.0661
F        0.0211      0.0489
PH2      0.0080      0.0053
SH       0.0135      0.0456
Cl       0.0148      0.0237
Br       0.0151      0.0283

It is well known that substituents attached to a benzene ring influence the reactivity of the positions in the ring. The regioselectivity of the electrophilic aromatic substitution is explained by these charge patterns. Tab. 4.22 gives the charge distribution for the ortho-, meta-, and para-positions in benzene derivatives with common substituents.

The well-known charge patterns in the ring caused by the substituents are obtained with both parametrizations. For +M-substituents, a negative π-charge is found on the ortho- and para-positions. The meta-position is electron-deficient for all substituents. For −M-substituents, the ortho- and para-positions are consistently more electron-deficient than the meta-carbon atoms. Only the effect of a nitro group in the PEOE/MHMO(µ) scheme seems questionable. The charge on the ortho-position is much less positive than for substituents exerting a similarly strong −M effect, such as the cyano group. This deficiency is not observed with MPEOE/MHMO(NPA).

With the methyl, trifluoromethyl, and trichloromethyl groups, substituents are included in Tab. 4.22 that are not part of the conjugated π-system. These groups demonstrate the success achieved with the modification of the HMO method for including inductive and hyperconjugative effects. The methyl group induces negative π-charges in the ortho- and para-positions, as expected. The trifluoromethyl and trichloromethyl groups give a charge pattern typical of electron-withdrawing substituents. Such effects could not have been modeled with the PEOE/PEPE charge calculation scheme.

Table. 4.22 π-charge distribution in substituted benzene derivatives. Shown are common ±M substituents. Several groups that are classically considered to have no M-effect are also included in order to show the capability of the modified HMO method to account for the inductive effect of such substituents on π-systems.

           π-charges
           ortho                meta                 para
subst.     µ         NPA        µ         NPA        µ         NPA
NH2       −0.0427   −0.0834     0.0031    0.0104    −0.0311   −0.0508
OH        −0.0263   −0.0574     0.0036    0.0126    −0.0152   −0.0288
F         −0.0167   −0.0372     0.0039    0.0143    −0.0072   −0.0122
CH3       −0.0037   −0.0162     0.0004    0.0039    −0.0021   −0.0073
CF3        0.0035    0.0214     0.0018    0.0023     0.0048    0.0177
CCl3       0.0026    0.0093     0.0012    0.0027     0.0034    0.0095
CCH        0.0001    0.0073     0.0014    0.0049     0.0022    0.0105
CN         0.0229    0.0236     0.0011    0.0041     0.0022    0.0022
NO2        0.0029    0.0202     0.0031    0.0070     0.0089    0.0219
CHO        0.0140    0.0331     0.0008    0.0025     0.0138    0.0294
SO2NH2     0.0030    0.0053     0.0023    0.0015     0.0063    0.0078

4.7 Summary

A charge calculation scheme is presented that is based on the partial equalization of orbital electronegativity (PEOE) for the treatment of the σ-charge distribution. The π-charge distribution is quantified by a modified Hückel MO (MHMO) treatment. This MHMO scheme was developed to replace the former PEPE treatment. The proposed modifications of the HMO calculation were introduced to account for the inductive and hyperconjugative effects exerted by the σ-skeleton on the π-electron system. The combined PEOE/MHMO procedure had to be parameterized using an appropriate target property. The molecular dipole moment and DFT/NPA charges were identified as the best choices for that task. For the fitting to DFT/NPA charges, additional modifications had to be introduced. All applied modifications are summarized in Tab. 4.23. For some atom types, the fitting posed problems. Nitriles and chlorine- or bromine-containing compounds gave poorer agreement than the other compound classes


Table. 4.23 Summary of the parameters and applied modifications for the modified PEOE/HMO procedure for both calibration schemes, fitted with dipole moments and DFT/NPA charges, respectively.

parameters               µ-fitted          NPA-fitted

PEOE
χiν = f(qi, hybi)        unmodified        revised + corrections of hybridization for C, Si, P, S
χH+                      20.02 eV          12.85 eV
damping factor           fdamp = 0.48      fij, bond dependent
neg. hyperconjugation    –                 back-donation after each iteration for N, O, F

HMO
∆α                       fσ−π = 0.029      fσ−π = 0.029
                         fhyp = 0.580      fhyp = 0.178
α0,i                     optimized         optimized
βij                      optimized         optimized

within the MPEOE/MHMO(NPA) scheme. For the fitting to dipole moments, it was generally found that PEOE charges for second-row elements and bromine do not give good dipole moments. The fitted π-charges within this scheme therefore have to be treated with care. Nevertheless, good agreement with the target properties was achieved for both parametrizations. The relevance of the obtained charges was shown by correlation with ESCA C-1s chemical shifts and by a qualitative evaluation of π-charge effects in substituted benzene derivatives.
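The σ-part of the scheme summarized above can be illustrated with a minimal sketch of the unmodified PEOE iteration for methane. This is an illustration under stated assumptions, not the parametrization of this work: the coefficients a, b, c for H and sp3 carbon are recalled from the original PEOE parameter set, the damping factor 0.5 is the original default (Tab. 4.23 lists 0.48 for the µ-fitted scheme), and all function names are hypothetical. Only the value of 20.02 eV for the hydrogen cation is taken from Tab. 4.23.

```python
# Electronegativity coefficients (a, b, c) in eV for chi(q) = a + b*q + c*q**2.
# ASSUMPTION: values recalled from the original PEOE parameter set; they are
# not the refitted parameters of this work.
PEOE_PARAMS = {
    "H": (7.17, 6.24, -0.56),
    "C.sp3": (7.98, 9.18, 1.88),
}
CHI_PLUS_H = 20.02  # eV, special cation electronegativity of hydrogen (Tab. 4.23)


def orbital_en(atom_type, q):
    """Orbital electronegativity of an atom carrying the partial charge q."""
    a, b, c = PEOE_PARAMS[atom_type]
    return a + b * q + c * q * q


def cation_en(atom_type):
    """Electronegativity of the cation (q = +1), used to scale the charge shift."""
    if atom_type == "H":
        return CHI_PLUS_H
    a, b, c = PEOE_PARAMS[atom_type]
    return a + b + c


def peoe_sigma_charges(atom_types, bonds, damping=0.5, n_iter=6):
    """Iterative partial equalization of orbital electronegativity."""
    charges = [0.0] * len(atom_types)
    for k in range(1, n_iter + 1):
        en = [orbital_en(t, q) for t, q in zip(atom_types, charges)]
        delta = [0.0] * len(charges)
        for i, j in bonds:
            if en[i] == en[j]:
                continue
            # Electrons flow from the less to the more electronegative atom.
            donor, acceptor = (i, j) if en[i] < en[j] else (j, i)
            shift = (en[acceptor] - en[donor]) / cation_en(atom_types[donor])
            shift *= damping ** k  # damping grows stronger with each iteration
            delta[donor] += shift
            delta[acceptor] -= shift
        charges = [q + d for q, d in zip(charges, delta)]
    return charges


# Methane: atom 0 is carbon, atoms 1 to 4 are hydrogens.
methane_types = ["C.sp3", "H", "H", "H", "H"]
methane_bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
methane_q = peoe_sigma_charges(methane_types, methane_bonds)
```

With these assumed parameters the carbon atom ends up slightly negative and the four hydrogens share the compensating positive charge equally. The modifications listed in Tab. 4.23 for the NPA-fitted scheme (bond-dependent damping, back-donation) are not part of this sketch.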

Chapter 5

Summary

The aim of the present work was the development of empirical methods for the prediction of properties of organic compounds. Success in finding relationships between structures and properties depends on an appropriate structure representation. If a structure is described properly with respect to the property of interest, statistical and pattern recognition methods can be applied to find quantitative relationships between structure and properties.

In this work, special emphasis was placed on two aspects of the general process of predicting properties by empirical methods. The first aspect was to find an appropriate representation of the molecular charge distribution by the concept of atomic partial charges. Secondly, from a software development point of view, the derivation of calculation algorithms and the management of properties and associated data were a central concern.

An integrated programming library, MOSES, was designed and implemented by means of modern software development technologies. An advanced management of properties, calculation modules, and associated meta-data was developed. Furthermore, MOSES assists the development of models for the prediction of properties with a comprehensive set of the required functionality, such as methods for statistical data analysis, pattern recognition methods, input and output of chemical data, and parameter optimization.

The usability of MOSES was demonstrated with two applications. A model for the prediction of mean molecular polarizability was developed, based on the additivity of atomic hybrid contributions, by means of multiple linear regression analysis. Secondly, an application was presented for the generation and management of workflows typically encountered in the domain of property prediction.


The core of the MOSES library, RAMSES, a structure representation of electron systems based on a separation of σ- and π-electrons, was extended to handle hypervalent compounds, such as sulfonic acids, correctly. This extension was based on an analysis of hypervalent atom types by means of natural bond orbital analysis. The RAMSES data structure rests on the basic philosophy of Hückel MO theory. This data structure was therefore well suited for the development of a new calculation scheme for the quantification of atomic partial charges.

A scheme for the calculation of partial charges is presented that is based on the well-established PEOE algorithm. For the quantification of π-charges, a modified Hückel MO treatment was derived that takes into account the inductive effect of the σ-skeleton on the π-charge distribution. It was shown that this modification improves the quality of the calculated π-charges considerably. A second modification was introduced that redistributes π-charge so as to model the hyperconjugative effect.

For the charge calculation schemes, a series of parameters had to be optimized. A thorough analysis was performed to identify the best way of achieving the parametrization. From a variety of experimental properties related to the charge distribution and of quantum mechanical charge calculation schemes, two parametrizations were obtained. One is based on molecular dipole moments and the other on fitting to DFT/NPA charges. For the fitting to DFT/NPA charges, the PEOE algorithm had to be modified. For both parametrizations, good agreement with the target values was obtained. Charges from both parametrizations give a realistic picture of the molecular charge distribution. This was shown by correlations with ESCA chemical shifts and by the evaluation of π-charge effects in substituted benzene derivatives.

Finally, Tab. 5.1 summarizes the main models for predicting properties obtained in this work by their statistical parameters.

Table. 5.1 Summary of the models for predicting properties obtained in this work. For the cases when more than one model was obtained only the best one is shown.

statistics
property                           no.    r2      σ (std. dev.)   b (slope)   a (intercept)
mean molecular polarizabilities    289    0.994   0.580           0.992       0.120
molecular dipole moments           291    0.944   0.320           1.006       0.046
NPA charges                        5088   0.985   0.049           0.974       −0.000
ESCA C-1s shifts                   89     0.973   0.559           1.000       0.005
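The statistical parameters reported in Tab. 5.1 can be reproduced for any pair of reference and calculated value lists with a few lines of code. The following sketch is only an illustration: the function name is hypothetical, and it assumes that σ denotes the standard deviation of the residuals with n − 2 degrees of freedom, which may differ from the definition used for the table.

```python
import math


def regression_statistics(x, y):
    """Least-squares line y = a + b*x with r2 and residual standard deviation.

    ASSUMPTION: sigma is computed with n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return {
        "n": n,
        "r2": sxy * sxy / (sxx * syy),
        "sigma": sigma,
        "slope": slope,
        "intercept": intercept,
    }


# Made-up example data lying exactly on the line y = 2*x + 1.
stats = regression_statistics([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

A slope close to one and an intercept close to zero, as in all four rows of Tab. 5.1, indicate that the calculated values carry no systematic scaling or offset relative to the reference values.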

Chapter 6

Zusammenfassung (Summary)

The aim of the present work was the development of empirical methods for predicting properties of organic compounds. Successfully finding relationships between structure and property depends critically on an adequate structure representation. Once such a representation has been found, a multitude of statistical and pattern recognition methods can be applied to derive quantitative relationships between structure and the property under consideration.

In the present work, particular attention was paid to two aspects of the empirical prediction of properties. One focus was the development of a method for calculating atomic partial charges, with the goal of a representation of the molecular charge distribution that is as realistic as possible. The other was the design of a generally applicable software system that supports the process of property prediction and makes it possible to manage the data arising in this context efficiently.

With the integrated development platform MOSES, a software system was developed that offers sophisticated management of properties, calculation modules, and the associated metadata. Current techniques of computer science were applied in implementing MOSES; they are the basis for making the system easy to extend and maintain. MOSES supports the development of new empirical calculation methods by providing a wide range of methods, from the input and output of chemical data to the optimization of specific calculation parameters.

Two concrete applications were presented to demonstrate the practical applicability of MOSES. First, a model for calculating mean molecular polarizabilities was developed that is based on the additivity of atomic contributions to the molecular polarizability. Furthermore, an application was presented that allows the creation and management of workflows that typically arise in the field of property prediction.

The central component of the MOSES class library, RAMSES, a structure representation of electron systems based on a separation of σ- and π-electrons, was extended to the treatment of hypervalent compounds such as sulfonic acids. The RAMSES data structure rests on the same assumptions as those formulated in Hückel MO theory. This data structure was therefore particularly suitable for developing a new method for quantifying the charge distribution.

The presented method for calculating atomic partial charges draws on the well-known PEOE algorithm. For calculating the π-charge contribution, a modification of the established Hückel MO method was proposed that allows the inductive influences of the σ-skeleton to be taken into account. A further modification made it possible to estimate the influence of hyperconjugative groups. It was shown that the proposed modifications lead to a considerably improved description of the π-charge distribution.

The charge calculation required a number of parameters whose optimal values had to be determined. The property on which this optimization should be based was established in an extensive analysis. Two options proved viable. Among a number of experimental properties that depend directly on the charge distribution, the molecular dipole moment appeared most suitable as the target quantity for a parametrization. Among the multitude of quantum chemical methods for calculating atomic charges, DFT/NPA charges proved best suited. For the parametrization with DFT/NPA charges, however, the PEOE algorithm had to be modified. Both parametrizations gave good agreement with the target quantities, and it was shown that the calculated charges give a realistic picture of the molecular charge distribution. For this purpose, ESCA shifts were used, which correlate well with the σ-charges from both parametrizations. For an assessment of the π-charges, it was shown that π-charge effects in substituted benzene derivatives are reproduced correctly.


Finally, the most important models for calculating properties obtained in the present work are summarized in Table 6.1.

Table. 6.1 Summary of the most important models for predicting properties obtained in the present work. Where several models were available for a property (e.g., for the ESCA shifts), only the best model is given.

statistical parameters
property                           no.    r2      σ (std. dev.)   b (slope)   a (y-intercept)
mean molecular polarizabilities    289    0.994   0.580           0.992       0.120
molecular dipole moments           291    0.944   0.320           1.006       0.046
NPA charges                        5088   0.985   0.049           0.974       −0.000
ESCA C-1s shifts                   89     0.973   0.559           1.000       0.005


Appendix A

Supplementary Material

A.1 To Section 4

Table. A.1 Percentage of failures (in %) for the calculation of atoms-in-molecules (AIM) charges for the various groups of compounds in the calibration dataset. Calculations were performed on the fully optimized structures at the B3LYP/6-311+G** level of theory with GAUSSIAN 98. See Section 4.2.6.

compound class      rate of failure
hydrocarbons        46 %
nitrogen            56 %
  amines            20 %
  imines            0 %
  amides            67 %
  nitriles          17 %
  heterocycles      82 %
  N-O-linkage       50 %
  N-N-linkage       67 %
oxygen              29 %
  alcohols          40 %
  aldehydes         17 %
  ketones           46 %
  acids             33 %
  ethers            25 %
  esters            20 %
  heterocycles      100 %
fluorine            32 %
silicon             18 %
phosphorus          47 %
  P(III)            33 %
  P(V)              75 %
sulfur              56 %
  S(II)             51 %
  S(IV),S(VI)       69 %
chlorine            34 %
bromine             19 %
avg.                42.1 %


Table. A.2 Percentage of failures (in %) for the calculation of atoms-in-molecules (AIM) charges for the various groups of compounds in the test datasets. Calculations were performed on the fully optimized structures at the B3LYP/6-311+G** level of theory with GAUSSIAN 98. See Section 4.2.6.

compound class              rate of failure
test dataset I
  amino acids               100 %
  medicinal                 100 %
test dataset II
  multi-functional cmpds.   33 %
avg.                        55 %

[Figure A.1: scatter plot; x-axis: SCF dipole moment (debyes), y-axis: MK-ESP charge based dipole moment (debyes); regression line y = 0.998 * x + 0.016, n = 457, r2 = 0.998, σ = 0.059]

Fig. A.1 Correlation of dipole moments obtained from MK-ESP charges with dipole moments calculated directly from the wavefunctions. See Section 4.2.6.3.
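A charge-based dipole moment of the kind plotted in Fig. A.1 follows from the point-charge expression µ = |Σ q_i r_i|. The sketch below shows this calculation under simple assumptions: charges are given in units of the elementary charge, coordinates in Å, and the result is converted to debyes with 1 e·Å = 4.80320 D. The function name and the example geometries are hypothetical and not taken from the datasets of this work.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom expressed in debyes


def dipole_from_charges(charges, coords):
    """Magnitude of the point-charge dipole moment in debyes.

    charges: partial charges in units of e
    coords:  (x, y, z) tuples in Angstrom
    For a neutral molecule the result does not depend on the origin.
    """
    mx = sum(q * x for q, (x, _, _) in zip(charges, coords))
    my = sum(q * y for q, (_, y, _) in zip(charges, coords))
    mz = sum(q * z for q, (_, _, z) in zip(charges, coords))
    return E_ANGSTROM_TO_DEBYE * math.sqrt(mx * mx + my * my + mz * mz)


# Made-up diatomic: charges of +0.2 e and -0.2 e separated by 1 Angstrom.
mu_diatomic = dipole_from_charges([0.2, -0.2], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])

# Made-up symmetric linear triatomic: the bond dipoles cancel.
mu_linear = dipole_from_charges(
    [-0.3, 0.6, -0.3], [(-1.16, 0.0, 0.0), (0.0, 0.0, 0.0), (1.16, 0.0, 0.0)]
)
```

The symmetric example illustrates why many entries in the dipole moment tables are exactly zero regardless of the charge scheme: for such molecules the dipole moment vanishes by symmetry and carries no information about the quality of the charges.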


Table. A.3 Parameters for the calculation of the electronegativity coefficients a, b, and c as a function of the hybridization state. Each coefficient is calculated by a polynomial of degree three: $\text{coefficient} = \alpha + \beta \cdot \mathrm{hyb} + \gamma \cdot \mathrm{hyb}^2 + \delta \cdot \mathrm{hyb}^3$. The hybridization is characterized by the amount of p-orbital contribution, thus $0 \le \mathrm{hyb} \le 100$. See Section 4.5.2.1.

element      coefficient   α   β [10^-2]   γ [10^-4]   δ [10^-6]
boron        a   10.31   6.92   2.69   1.70
             b   8.44   −3.40   1.38   −8.56
             c   0.98   3.51   −7.98   5.90   −
carbon       a   14.96   7.81   4.12   2.78
             b   10.88   −6.73   −11.23   7.00
             c   0.38   −4.81   14.99   −7.83   − −
nitrogen     a   20.49   2.06   32.83   1.88
             b   13.35   3.86   − 2.05   1.88
             c   0.12   −9.89   26.09   −13.64   − −
oxygen       a   27.25   5.61   23.23   12.55
             b   16.23   −3.54   −16.61   8.43
             c   1.72   3.09   −50.36   1.54   − −
fluorine     a   31.26   19.08   –   –
             b   15.72   − 2.15   –   –
             c   3.24   −6.38   –   –   −
aluminium    a   8.60   7.02   6.62   4.27
             b   6.64   −0.34   6.15   −3.69
             c   0.50   1.36   −2.70   2.12   −
silicon      a   12.13   3.61   7.53   5.02
             b   8.17   −4.30   −6.31   4.59
             c   0.49   −1.98   8.34   −4.89   − − −
phosphorus   a   14.34   0.41   16.55   9.86
             b   10.34   1.67   − 2.71   1.61
             c   0.24   −4.14   −11.55   6.32   − −
sulfur       a   15.81   2.86   11.61   7.06
             b   11.19   −1.00   − 8.23   4.28
             c   1.15   0.50   −0.89   0.82   −
chlorine     a   18.91   9.53   –   –
             b   12.77   −3.71   –   –
             c   0.42   −1.12   –   –
bromine      a   18.28   9.88   –   –
             b   11.99   −4.24   –   –
             c   0.96   −0.23   –   –
iodine       a   15.94   7.85   –   –
             b   10.94   −3.87   –   –
             c   1.71   −0.98   –   –   −
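The way the entries of Tab. A.3 enter a calculation can be sketched as follows. The polynomial and the scaling of β, γ, and δ by 10^-2, 10^-4, and 10^-6 follow the table caption and header. The quadratic charge dependence χ(q) = a + b·q + c·q² is the usual PEOE form and is assumed here. The function names are hypothetical, and the numerical coefficients in the example are made-up placeholders, not entries of Tab. A.3.

```python
def hybridization_coefficient(hyb, alpha, beta, gamma, delta):
    """Evaluate coefficient = alpha + beta*hyb + gamma*hyb**2 + delta*hyb**3.

    hyb is the p-orbital contribution in percent (0 <= hyb <= 100).
    beta, gamma, delta are passed in the tabulated units of
    1e-2, 1e-4 and 1e-6, respectively.
    """
    if not 0 <= hyb <= 100:
        raise ValueError("hyb must lie between 0 and 100")
    return (alpha
            + beta * 1e-2 * hyb
            + gamma * 1e-4 * hyb ** 2
            + delta * 1e-6 * hyb ** 3)


def orbital_electronegativity(q, a, b, c):
    """ASSUMED PEOE form: chi(q) = a + b*q + c*q**2."""
    return a + b * q + c * q * q


# Placeholder polynomial parameters (NOT taken from Tab. A.3).
example = (10.0, -5.0, 2.0, -1.0)
coeff_s = hybridization_coefficient(0, *example)      # pure s orbital
coeff_sp3 = hybridization_coefficient(75, *example)   # sp3: 75 % p character
coeff_p = hybridization_coefficient(100, *example)    # pure p orbital
```

Treating hybridization as a continuous variable in this way allows atoms with intermediate hybridization states to be handled with the same three polynomials per element instead of one discrete parameter set per atom type.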


[Figure A.2: two scatter plots; x-axis: DFT natural charges, y-axis: MPEOE/MHMO charges.
(a) Atom types H, C, N, O, S; regression line y = 0.957 * x + 0, n = 425, r2 = 0.989, σ = 0.047.
(b) Atom types H, C, N, O, F, S, Cl, Br; regression line y = 1.005 * x + 0, n = 400, r2 = 0.966, σ = 0.061.]

Fig. A.2 Correlation of MHMO/MPEOE charges obtained by calibration to DFT/NPA charges with DFT/NPA charges obtained by QM calculations. (a) all compounds of test dataset I. (b) all compounds of test dataset II. See Section 4.5.4.


Table. A.4 Charges on the central carbon atom in the series of compounds obtained from methane by successive substitution of one hydrogen atom by a methyl group. See Tab. 4.8 in Section 4.2.6.6. In addition to the QM charge schemes discussed there, the results of fitting PEOE/HMO to dipole moments and to DFT/NPA charges are shown.

charge on central carbon
compound     NPA      AIM      MK-ESP     MPEOE/MHMO(NPA)   PEOE/MHMO(µ)
CH4          −0.813   −0.007   −0.507     −0.739            −0.076
CH3CH3       −0.578   0.037    −0.027a    −0.585            −0.066
CH2(CH3)2    −0.389   0.070    0.306      −0.419            −0.056
CH(CH3)3     −0.237   0.090    0.559      −0.238            −0.047
C(CH3)4      −0.103   0.095    0.808      −0.044            −0.037
a average between the two carbon atoms


Appendix B

Datasets

B.1 To Section 3

Table. B.1 Dataset consisting of 289 compounds used in Section 3.8.2 for deriving an additivity model for the prediction of mean molecular polarizabilities. Given are the experimental values taken from [118] and the calculated values from the leave-one-out cross validated models, i.e., each calculated value given in the table was obtained by generating a model with all other compounds and using the obtained model for predicting the given compound.

mean molecular polarizability (Å3)
no.  compound  calc.  expt.
1  H2  0.79  0.83
2  methane  2.60  2.65
3  ethane  4.47  4.48
4  n-propane  6.29  6.30
5  n-butane  8.12  8.13
6  n-pentane  9.99  9.96
7  n-hexane  11.81  11.78
8  n-heptane  13.65  13.61
9  n-octane  15.48  15.43
10  n-nonane  17.36  17.26
11  n-decane  19.15  19.09
12  n-undecane  21.03  20.91
13  n-dodecane  22.81  22.74
14  iso-butane  8.14  8.13
15  neo-pentane  10.20  9.95
16  cyclopentane  9.12  9.13
17  cyclohexane  10.93  10.96
18  3-methylheptane  15.44  15.43
19  2,2,4-trimethylpentane  15.44  15.43
20  fluoromethane  2.62  2.42
21  1-fluoropentane  9.95  9.73
22  1-fluorohexane  11.80  11.55
23  1-fluoroheptane  13.66  13.38


24  1-fluorooctane  15.46  15.20
25  1-fluorononane  17.34  17.03
26  1-fluorodecane  19.18  18.85
27  1-fluoroundecane  21.00  20.68
28  1-fluorododecane  22.83  22.50
29  1-fluorotetradecane  26.57  26.14
30  bromomethane  5.53  5.44
31  bromoethane  7.28  7.27
32  1-bromopropane  9.07  9.10
33  1-bromobutane  10.86  10.93
34  1-bromopentane  12.65  12.75
35  1-bromohexane  14.44  14.58
36  1-bromoheptane  16.23  16.41
37  1-bromooctane  18.02  18.24
38  1-bromononane  19.81  20.07
39  1-bromodecane  21.60  21.89
40  1-bromododecane  25.18  25.55
41  1-bromohexadecane  32.34  32.89
42  1-bromooctadecane  35.92  36.56
43  trifluoromethane  2.81  1.97
44  tetrafluoromethane  2.92  1.72
45  bromomethane  5.61  5.44
46  dibromomethane  8.68  8.18
47  tribromomethane  11.84  10.71
48  iodomethane  7.59  7.65
49  diiodomethane  12.90  12.58
50  triiodomethane  18.04  17.24
51  chloromethane  4.56  4.45
52  dichloromethane  6.62  6.24
53  trichloromethane  8.38  8.03
54  tetrachloromethane  10.49  9.75
55  chloroethane  6.40  6.28
56  Cl2  4.61  4.42
57  hydrogen fluoride  0.80  0.60
58  hydrogen chloride  2.63  2.63
59  hydrogen bromide  3.61  3.62
60  hydrogen iodide  5.45  5.85
61  ethene  4.26  4.38
62  1-pentene  9.65  9.86
63  2-pentene  9.84  9.86
64  1-hexene  11.65  11.68
65  1-heptene  13.51  13.51
66  1,1-dichloroethene  7.83  8.34
67  trans-dichloroethene  8.15  7.98
68  cis-dichloroethene  7.91  7.99
69  trichloroethene  10.03  10.13
70  trans-bromochloroethene  9.28  8.96
71  cis-bromochloroethene  9.19  8.97


72  ethyne  3.33  3.63
73  1-heptyne  12.87  12.69
74  1,5-hexadiyne  10.21  9.75
75  benzene  10.37  10.67
76  toluene  12.10  12.49
77  1,3,5-trimethylbenzene  15.76  16.15
78  1,2,4,5-tetramethylbenzene  17.40  17.98
79  hexamethylbenzene  20.81  21.64
80  fluorobenzene  9.86  10.78
81  chlorobenzene  12.25  12.81
82  bromobenzene  13.62  13.80
83  1,2-difluorobenzene  9.80  10.88
84  1,2-dichlorobenzene  14.22  14.97
85  1,3-dichlorobenzene  14.28  14.96
86  1,4-dichlorobenzene  14.20  14.97
87  1,4-difluorobenzene  9.80  10.88
88  1,3,5-trifluorobenzene  9.74  11.00
89  1,2,3,4-tetrafluorobenzene  9.69  11.13
90  1,2,4,5-tetrafluorobenzene  9.69  11.13
91  pentafluorobenzene  9.63  11.29
92  hexafluorobenzene  9.58  11.47
93  p-fluorotoluene  11.70  12.60
94  p-chlorotoluene  13.70  14.64
95  p-bromotoluene  14.80  15.65
96  p-iodotoluene  17.10  17.87
97  o-xylene  14.15  14.32
98  m-xylene  14.21  14.31
99  p-xylene  14.11  14.32
100  naphthalene  17.03  17.62
101  anthracene  25.55  24.54
102  phenanthrene  24.70  24.56
103  naphthacene  32.27  31.49
104  1,2-benzanthracene  32.86  31.47
105  chrysene  32.60  31.48
106  1,2:5,6-dibenzanthracene  41.31  38.29
107  acenaphthene  20.61  20.44
108  fluoranthene  28.34  27.95
109  pyrene  28.78  27.94
110  dodecahydrotriphenylene  29.89  30.17
111  fluorene  21.68  22.19
112  2,3-benzfluorene  30.21  29.07
113  difluorenyl  42.81  43.69
114  triphenylene  31.07  31.54
115  coronene  43.64  41.35
116  α-methylnaphthalene  19.35  19.44
117  β-methylnaphthalene  19.52  19.44
118  α-ethylnaphthalene  21.19  21.27
119  β-ethylnaphthalene  21.36  21.26


120  α-chloronaphthalene  19.30  19.76
121  β-chloronaphthalene  19.58  19.75
122  α-bromonaphthalene  20.34  20.76
123  α-iodonaphthalene  22.41  22.98
124  β-iodonaphthalene  22.95  22.95
125  octafluoronaphthalene  17.64  18.74
126  (1-naphthyl)methanal  19.75  19.85
127  (2-naphthyl)methanal  20.06  19.84
128  α-naphthylamine  19.50  19.02
129  β-naphthylamine  19.73  19.01
130  styrene  14.41  14.55
131  α-methylstyrene  16.05  16.38
132  α,β,β-trimethylstyrene  19.64  20.04
133  9-chloroanthracene  27.35  26.68
134  9-bromoanthracene  28.32  27.66
135  ammonia  2.26  1.90
136  n-propylamine  7.70  7.38
137  isopropylamine  7.77  7.38
138  diethylamine  9.61  9.20
139  di-n-propylamine  13.29  12.85
140  triethylamine  13.38  12.85
141  tri-n-propylamine  18.87  18.32
142  aniline  11.74  12.08
143  N-methylaniline  13.82  13.90
144  N-ethylaniline  15.32  15.74
145  N,N-dimethylaniline  15.32  15.74
146  N,N-diethylaniline  19.01  19.39
147  p-fluoroaniline  11.51  12.20
148  p-chloroaniline  13.50  14.23
149  p-bromoaniline  14.55  15.24
150  2,6-dimethylaniline  16.11  15.72
151  3,5-dimethylaniline  16.31  15.72
152  2,6-dichloroaniline  16.20  16.36
153  3,5-dichloroaniline  16.57  16.34
154  pyrrole  7.94  8.19
155  p-toluidine  13.47  13.91
156  pyridine  9.25  9.46
157  quinoline  16.29  16.41
158  isoquinoline  16.18  16.42
159  2-methylquinoline  18.65  18.22
160  1-methylisoquinoline  18.28  18.23
161  quinoxaline  15.13  15.22
162  2,3-dimethylquinoxaline  18.70  18.88
163  phenazine  23.43  22.01
164  2,3:4,5-dibenzophenazine  33.42  36.48
165  nitroethane  7.00  5.72
166  hydrogen cyanide  2.59  2.74
167  cyanogen  5.01  4.57


168  acetonitrile  4.48  4.56
169  propanenitrile  6.24  6.39
170  2-methylpropanenitrile  8.05  8.22
171  2,2-dimethylpropanenitrile  9.59  10.06
172  dicyanomethane  5.79  6.61
173  2-chloroacetonitrile  6.10  6.38
174  trichloroacetonitrile  10.42  9.90
175  9-cyanoanthracene  28.32  26.70
176  N2  1.76  1.92
177  hydrazine  3.46  2.96
178  phenylhydrazine  12.91  13.18
179  N,N-methylphenylhydrazine  14.81  15.00
180  N,N-ethylphenylhydrazine  16.62  16.82
181  3-aminobutyronitrile  9.17  9.30
182  p-cyanotoluene  13.90  14.77
183  3-(dimethylamino)butyronitrile  12.87  12.95
184  pyrazole  7.23  6.98
185  N-methylpyrazole  8.99  8.80
186  1,5-dimethylpyrazole  10.72  10.63
187  1-ethyl-5-methylpyrazole  12.50  12.46
188  imidazole  7.19  6.98
189  N-methylimidazole  8.86  8.81
190  N-(n-propyl)imidazole  12.40  12.47
191  water  1.45  1.40
192  methanol  3.23  3.23
193  ethanol  5.01  5.06
194  1-propanol  6.77  6.89
195  2-propanol  6.97  6.88
196  cyclohexanol  11.56  11.54
197  1,2-ethanediol  5.71  5.63
198  dimethyl ether  5.16  5.05
199  diethyl ether  8.73  8.71
200  methyl propyl ether  8.75  8.71
201  ethyl propyl ether  10.59  10.53
202  di-(n-propyl)ether  12.53  12.36
203  oxirane  4.43  4.23
204  dioxane  9.02  8.41
205  O2  1.60  1.02
206  CO2  2.65  2.42
207  acetone  6.37  6.37
208  methyl ethyl ketone  8.16  8.20
209  diethyl ketone  9.93  10.03
210  n-propyl methyl ketone  9.93  10.03
211  diisopropyl ketone  13.53  13.68
212  formaldehyde  2.45  2.73
213  propionaldehyde  6.35  6.37
214  n-butyraldehyde  8.18  8.20
215  anthraquinone  24.46  25.57


216  furan  7.23  7.70
217  formic acid  3.32  3.30
218      5.10  5.12
219  propionic acid  6.93  6.95
220  butanoic acid  8.58  8.78
221  methyl formate  5.05  5.13
222  ethyl formate  6.88  6.95
223  methyl acetate  6.81  6.95
224  ethyl acetate  8.62  8.78
225  methyl propionate  8.65  8.78
226  ethyl propionate  10.41  10.61
227  methyl butanoate  10.41  10.61
228  ethyl butanoate  12.23  12.43
229  CH2OHCH2OCH3  7.44  7.46
230  CH2OHCH2OC2H5  9.28  9.29
231  CH2ClCH2OH  6.88  6.86
232  CH2ClCH2OCH3  8.71  8.68
233  CH2ClCH2OC2H5  10.56  10.51
234  CH2ClCH2CH2COOH  10.45  10.58
235  CH2ClCH2CH2COOCH3  12.27  12.41
236  CH2ClCH2CH2COOC2H5  14.11  14.24
237  CH3CHClCH2COOH  10.54  10.58
238  CH3CHClCH2COOCH3  12.31  12.41
239  CH3CHClCH2COOC2H5  14.13  14.24
240  CH3CH2CHClCOOH  10.61  10.58
241  CH3CH2CHClCOOCH3  12.33  12.41
242  CH3CH2CHClCOOC2H5  14.16  14.23
243  C2H5CHClCH2OH  10.70  10.51
244  CH3CHClCH2CH2OH  10.38  10.51
245  CH3CHClCH2OH  8.89  8.68
246  CH2ClCH2CH2OH  8.84  8.68
247  CH3CH2CHClCOOH  10.87  10.57
248  CH3CHClCH2COOH  10.80  10.57
249  CH2ClCH2CH2COOH  10.69  10.58
250  CH3CH2CH2CHClCOOH  12.69  12.40
251  CH3CH2CHClCH2COOH  12.57  12.40
252  CH3CHClCH2CH2COOH  12.53  12.40
253  ethyl α-naphthoate  23.97  24.43
254  ethyl β-naphthoate  24.19  24.42
255  ethyl α-naphthoate  22.03  22.25
256  ethyl β-naphthoate  22.21  22.25
257  formamide  4.01  3.80
258  acetamide  5.53  5.63
259  N-methylformamide  5.90  5.62
260  N,N-dimethylformamide  7.75  7.45
261  N-ethylacetamide  9.45  9.28
262  N-methylacetamide  7.82  7.45
263  N,N-diethylacetamide  12.96  12.93


264  benzamide  12.75  14.35
265  p-nitroaniline  13.90  13.71
266  NO  1.70  1.45
267  N2O  3.00  1.93
268  p-nitrotoluene  14.10  14.14
269  nitrobenzene  12.92  12.27
270  H2S  3.83  3.52
271  mercaptoethane  7.38  7.19
272  diethyl thioether  11.00  10.85
273  dimethyl thioether  7.53  7.16
274  dimethyl sulfoxide  7.97  7.65
275  dimethyl sulfone  8.40  7.96
276  diphenyl thioether  23.79  23.93
277  diphenyl sulfoxide  24.34  24.66
278  diphenyl sulfone  24.66  25.10
279  thiophene  9.00  10.00
280  SO2  3.90  3.78
281  CS2  8.74  8.20
282  guanine  13.60  13.98
283  adenine  13.10  12.90
284  cytosine  10.30  10.65
285  thymine  11.23  12.08
286  acridine  25.49  23.28
287  trimethyl phosphate  10.86  10.72
288  (CH3)C(CH2O)3PO  12.84  12.74
289  (CH3)C(CH2O)3PS  15.74  16.01
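The leave-one-out procedure described in the caption of Tab. B.1 can be sketched with a deliberately simplified model. The model of this work is a multiple linear regression over atomic hybrid contributions; the example below instead uses a single descriptor, the number of carbon atoms, for the n-alkanes methane to n-hexane, with target values taken from the first value column of entries 2 to 7 of Tab. B.1. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out(xs, ys):
    """Predict each value from a model fitted to all other data points."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * xs[i])
    return predictions


# n-alkanes methane to n-hexane: number of carbon atoms as the only descriptor
# (a simplification of the atomic hybrid contributions used in this work).
n_carbon = [1, 2, 3, 4, 5, 6]
polarizability = [2.60, 4.47, 6.29, 8.12, 9.99, 11.81]  # Å3, entries 2-7 of Tab. B.1
loo_predictions = leave_one_out(n_carbon, polarizability)
```

Because each prediction comes from a model that has never seen the compound in question, the resulting errors give a more honest estimate of predictive quality than the residuals of a single fit to all compounds.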


B.2 To Section 4

Table. B.2 Dataset consisting of 386 compounds used for the calibration of the modified PEOE/HMO procedure. Given are the dipole moments based on charges obtained by the indicated methods, the dipole moments obtained directly from the wavefunction, and the experimental values if available (taken from [143–145]). All values are given in debyes. See Section 4.2.2.

method
no.  compound  expt.  SCF  AIM  MK-ESP  NPA  MPEOE/MHMO(NPA)  PEOE/MHMO(µ)
1  methane  0.00  0.00  0.00  0.00  0.00  0.00  0.00
2  ethane  –  0.00  0.02  0.00  0.00  0.00  0.00
3  propane  –  0.09  0.08  0.04  0.08  0.03  0.01
4  butane  –  0.00  0.00  0.00  0.00  0.00  0.00
5  isobutane  0.13  0.13  0.13  0.10  0.12  0.05  0.01
6  neopentane  –  0.00  0.00  0.01  0.00  0.00  0.00
7  ethene  –  0.00  0.00  0.00  0.00  0.00  0.00
8  propene  0.36  0.43  0.37  0.42  0.36  0.62  0.16
9  1-butene  –  0.42  0.42  0.37  0.35  0.51  0.15
10  trans-2-butene  –  0.00  0.00  0.01  0.00  0.00  0.00
11  cis-2-butene  0.26  0.26  –  0.32  0.06  0.44  0.10
12  2-methylpropene  –  0.59  0.45  0.59  0.61  1.09  0.28
13  butadiene  –  0.00  0.00  0.01  0.00  0.00  0.00
14  2-methylbutadiene  –  0.29  –  0.31  0.18  0.55  0.13
15  2,3-dimethylbutadiene  –  0.00  –  0.02  0.00  0.00  0.00
16  3-methylenepenta-1,4-diene  –  0.02  –  0.07  0.08  0.12  0.02
17  ethyne  –  0.00  0.00  0.00  0.00  0.00  0.00
18  propyne  0.75  0.85  0.75  0.83  0.52  0.96  0.28
19  2-butyne  –  0.00  –  0.00  0.00  0.00  0.00
20  1-buten-3-yne  0.40  0.43  0.65  0.42  0.27  0.50  0.21
21  cyclopropane  –  0.00  –  0.00  0.00  0.00  0.00
22  cyclobutane  –  0.00  0.00  0.00  0.00  0.00  0.00
23  cyclopentane  –  0.02  –  0.01  0.06  0.08  0.01
24  cyclohexane  –  0.00  0.00  0.00  0.00  0.00  0.00
25  cyclopropene  0.45  0.46  0.15  0.46  0.62  0.73  0.11
26  cyclobutene  0.13  0.20  0.24  0.24  0.15  0.90  0.11
27  cyclopentene  –  0.25  –  0.28  0.10  0.83  0.22
28  cyclopentadiene  –  0.49  –  0.50  0.71  1.09  0.21
29  1,3-cyclohexadiene  –  0.50  –  0.48  0.42  1.17  0.25
30  1,4-cyclohexadiene  –  0.00  –  0.01  0.00  0.00  0.00
31  benzene  –  0.00  –  0.02  0.00  0.00  0.00
32  toluene  –  0.40  –  0.38  0.42  0.58  0.17
33  o-xylene  –  0.68  –  0.70  0.72  1.00  0.29
34  m-xylene  –  0.39  0.39  0.38  0.42  0.58  0.17
35  p-xylene  –  0.00  –  0.01  0.00  0.00  0.00
36  1-methyl-[1,1,0]-bicyclobutane  –  0.83  0.51  0.82  0.92  0.56  0.10
37  1-methyl-[1,1,1]-bicyclopentane  –  0.23  –  0.23  0.35  0.00  0.15


38  [2,1,0]-bicyclopentane  –  0.28  0.19  0.22  0.28  0.20  0.06
39  spirobipropane  –  0.00  –  0.01  0.00  0.00  0.00
40  [4,2,0]-bicyclooctane  –  0.17  –  0.14  0.07  0.16  0.01
41  1-methyl-[2,1,1,0(5,6)]-tricyclohexane  –  0.55  –  0.50  0.62  0.42  0.37
42  [3,1,1,0(6,7)]-tricycloheptane  –  0.47  0.12  0.44  0.54  0.65  0.26
43  methylcubane  –  0.15  0.20  0.18  0.27  0.02  0.16
44  naphthalene  –  0.00  0.00  0.02  0.00  0.00  0.00
45  anthracene  –  0.00  –  0.03  0.00  0.00  0.00
46  phenanthrene  –  0.02  –  0.02  0.05  0.09  0.00
47  ammonia  1.47  1.70  1.74  1.78  1.84  1.82  0.59
48      1.31  1.41  1.74  1.34  1.61  1.71  0.70
49  dimethylamine  1.01  1.05  1.61  1.03  1.47  1.74  0.76
50  ethylamine  1.30  1.35  1.69  1.33  1.57  1.62  0.70
51  aziridine  1.89  1.88  3.09  1.85  2.52  2.50  1.29
52  trimethylamine  0.61  0.58  1.47  0.65  1.47  1.88  0.80
53  2-aminopropane  1.19  1.30  1.58  1.27  1.76  1.91  0.75
54  cyclopropylamine  1.19  1.25  1.63  1.20  1.66  1.78  0.71
55  propylamine  1.17  1.29  1.62  1.28  1.61  1.59  0.71
56  piperidine  0.82  0.88  1.52  0.91  1.43  1.61  0.76
57  1,2,5,6-tetrahydropyrimidine  0.99  1.04  –  1.07  1.51  1.61  0.73
58  aniline  1.13  1.59  –  1.63  1.43  2.20  0.95
59  diethylamine  0.92  0.95  1.48  0.96  1.34  1.60  0.72
60  triethylamine  0.66  0.56  –  0.57  1.38  1.69  0.73
61  pyrrolidine  1.58  1.02  1.83  1.03  1.64  1.77  0.85
62  formamide  3.96  4.05  6.45  4.01  4.85  5.22  3.75
63  acetamide  3.91  3.98  6.71  3.97  4.89  5.53  3.82
64  N-methylformamide  4.29  4.09  –  3.99  4.78  5.21  3.71
65  N,N-dimethylformamide  4.18  4.24  –  4.22  5.40  5.63  4.04
66  N-methylacetamide  4.15  3.91  –  3.87  4.68  5.46  3.75
67  N,N-dimethylacetamide  3.95  4.04  –  4.04  5.37  5.88  4.06
68  benzamide  3.39  3.74  –  3.76  4.77  5.65  4.04
69  γ-butyrolactam  3.55  4.36  –  4.37  5.50  5.95  4.28
70  urea  4.56  4.44  6.32  4.46  5.18  6.43  4.53
71  propyleneimine  1.77  2.14  4.49  2.09  2.54  2.66  1.97
72  acetaldimine  2.06  2.18  4.46  2.14  2.53  2.77  1.97
73  N-methylacetaldimine  1.50  1.57  3.78  1.55  2.03  2.80  1.86
74  N-methylformaldimine  1.53  1.63  3.71  1.63  1.96  2.84  1.82
75  hydrogen cyanide  2.98  3.06  7.00  3.01  2.84  3.09  3.36
76  acetonitrile  3.92  4.05  8.48  4.05  3.54  3.90  3.71
77  dicyanomethane  3.73  3.88  8.43  3.89  3.34  4.25  3.93
78  propanenitrile  4.05  4.16  8.67  4.13  3.61  3.88  3.72
79  butanonitrile  3.91  4.30  8.89  4.29  3.77  3.86  3.73
80  cyclopropanecarbonitrile  4.13  4.43  8.96  4.40  3.86  3.77  3.69


81  cyclobutanecarbonitrile  4.04  4.35  8.96  4.31  3.78  3.84  3.74
82  tert-butyl cyanide  3.95  4.19  8.83  4.20  3.78  3.79  3.75
83  pentanonitrile  4.12  4.41  9.02  4.40  3.80  3.88  3.72
84  benzonitrile  4.18  4.74  –  4.72  4.17  3.96  4.57
85  propenenitrile  3.92  4.05  8.59  4.05  3.63  3.91  4.59
86  methacrylonitrile  3.69  4.09  8.76  4.11  3.56  3.79  4.56
87  2-butenonitrile  4.75  4.87  9.50  4.86  4.34  4.53  4.80
88  cyanoallene  4.28  4.51  9.12  4.53  4.13  3.93  4.58
89  1-cyanocyclopentadiene  4.25  4.72  9.36  4.70  4.22  4.12  5.33
90  2-cyanopyridine  5.78  5.95  –  5.90  5.66  5.90  5.53
91  3-cyanopyridine  3.66  4.03  8.08  4.00  3.60  3.64  3.98
92  4-cyanopyridine  1.96  2.01  –  1.99  1.24  0.89  2.29
93  pyridine  2.22  2.37  –  2.35  2.55  2.91  1.96
94  2-methylpyridine  1.85  1.96  3.53  1.94  2.18  2.55  1.84
95  4-methylpyridine  2.70  2.89  4.61  2.85  3.08  3.46  2.13
96  pyrrole  1.77  1.88  –  1.85  1.52  3.90  1.92
97  indole  2.10  2.13  –  2.14  1.66  4.08  2.00
98  quinoline  2.29  2.17  –  2.17  2.42  3.03  2.17
99  isoquinoline  2.73  2.72  –  2.69  2.89  3.14  2.11
100  pyrimidine  2.33  2.47  –  2.46  2.73  3.19  2.07
101  2-methylpyrimidine  1.68  1.75  –  1.77  1.87  2.69  1.82
102  imidazole  3.80  3.83  –  3.76  3.78  5.93  3.60
103  3-methylpyridine  2.40  2.64  –  2.61  2.83  3.24  2.04
104  hydroxylamine  0.59  0.68  1.70  0.71  1.34  1.55  0.70
105  N,N-dimethylhydroxylamine  –  0.96  2.35  0.96  1.96  1.97  0.91
106  isoxazolidine  –  2.05  –  2.02  3.47  3.51  2.06
107  isoxazole  2.90  3.19  8.11  3.08  3.88  3.76  3.09
108  3-amino-2,3-dihydroisoxazole  1.77  1.24  –  1.23  2.46  2.14  1.31
109  nitromethane  3.46  3.81  6.27  3.81  4.88  5.35  3.83
110  nitroethane  3.23  3.98  6.42  3.99  5.15  5.38  3.87
111  1-nitropropane  3.66  4.13  6.66  4.15  5.30  5.37  3.89
112  2-nitropropane  3.73  4.10  6.58  4.12  5.42  5.35  3.90
113  nitrobenzene  4.22  4.92  –  4.92  6.24  5.24  4.25
114  nitroethene  –  4.25  –  4.25  5.50  4.96  4.17
115  2-nitrobuta-1,3-diene  –  3.94  –  3.93  5.30  5.04  4.18
116  3-nitropropene  –  3.92  –  3.96  5.13  5.26  3.83
117  1,1-di(nitromethyl)ethene  –  3.45  –  3.46  4.48  4.18  3.26
118  hydrazine  1.75  2.04  1.91  2.00  1.99  2.28  0.83
119  N,N-dimethylhydrazine  –  1.63  2.17  1.65  2.24  2.79  1.19
120  pyrazolidine  –  2.08  –  1.98  2.94  3.03  1.62
121  phenylhydrazine  –  2.66  –  2.66  2.25  3.53  2.54
122  pyridazine  4.22  4.42  –  4.22  4.48  4.61  4.20
123  pyrazole  1.60  2.39  –  2.35  2.52  3.69  2.30
124  water  1.85  2.16  3.08  2.21  2.58  2.49  1.12

204 Chapter B B.2 To Section 4

Table B.2 (continued)

| no. | compound | expt. | SCF | AIM | MK-ESP | NPA | MPEOE/MHMO(NPA) | PEOE/MHMO(µ) |
|---|---|---|---|---|---|---|---|---|
| 125 | methanol | 1.70 | 1.89 | 3.51 | 1.86 | 2.78 | 2.94 | 1.50 |
| 126 | ethanol | 1.44 | 1.76 | 3.34 | 1.72 | 2.72 | 2.67 | 1.48 |
| 127 | propanol | 1.55 | 1.66 | 3.25 | 1.65 | 2.79 | 2.63 | 1.49 |
| 128 | 2-propanol | 1.58 | 1.77 | 3.37 | 1.75 | 2.89 | 2.96 | 1.52 |
| 129 | 1,2-propanediol | 2.57 | 2.59 | – | 2.56 | 4.14 | 4.15 | 2.21 |
| 130 | 1-butanol | 1.66 | 1.71 | 3.36 | 1.69 | 2.70 | 2.66 | 1.49 |
| 131 | cyclopropanol | 1.46 | 1.67 | 3.52 | 1.63 | 2.71 | 2.89 | 1.49 |
| 132 | cyclobutanol | 1.62 | 1.66 | 3.32 | 1.62 | 2.75 | 2.92 | 1.51 |
| 133 | hydroxyethene | 1.02 | 1.09 | 3.56 | 1.08 | 1.77 | 1.62 | 0.86 |
| 134 | phenol | 1.22 | 1.38 | – | 1.37 | 2.10 | 2.36 | 1.22 |
| 135 | p-cresol | 1.50 | 1.42 | – | 1.39 | 2.31 | 2.66 | 1.32 |
| 136 | 2-methoxyethanol | 2.36 | 2.56 | – | 2.57 | 4.50 | 5.02 | 2.52 |
| 137 | benzylalcohol | 1.71 | 1.59 | – | 1.64 | 2.76 | 2.75 | 1.51 |
| 138 | cyclohexanol | 1.80 | 1.79 | – | 1.78 | 2.88 | 2.94 | 1.53 |
| 139 | formaldehyde | 2.33 | 2.48 | 6.22 | 2.46 | 3.43 | 3.54 | 2.60 |
| 140 | acetaldehyde | 2.75 | 2.95 | 6.74 | 2.94 | 4.05 | 4.26 | 2.75 |
| 141 | propionaldehyde | 2.52 | 3.07 | 6.80 | 3.05 | 4.11 | 4.18 | 2.75 |
| 142 | butanal | 2.72 | 3.15 | 6.97 | 3.14 | 4.17 | 4.18 | 2.75 |
| 143 | e-2-butenal | 3.67 | 4.32 | 7.82 | 4.30 | 5.38 | 5.88 | 3.62 |
| 144 | benzaldehyde | 3.21 | 3.57 | – | 3.56 | 4.54 | 4.63 | 3.14 |
| 145 | acetone | 2.88 | 3.15 | 6.89 | 3.15 | 4.44 | 4.76 | 2.87 |
| 146 | ethylmethylketone | 2.78 | 2.99 | 6.67 | 2.96 | 4.34 | 4.78 | 2.85 |
| 147 | cyclopentanone | 3.30 | 3.29 | 7.17 | 3.31 | 4.10 | 4.46 | 2.90 |
| 148 | phenylmethylketone | 3.02 | 3.25 | – | 3.24 | 4.46 | 4.91 | 3.17 |
| 149 | cyclobuta-1,2-dione | 3.83 | 4.08 | – | 4.12 | 5.23 | 6.22 | 3.92 |
| 150 | cyclobutanone | 2.89 | 3.11 | – | 3.14 | 3.86 | 4.61 | 2.88 |
| 151 | cyclopropanone | 2.67 | 3.06 | 6.85 | 3.05 | 3.68 | 4.79 | 3.02 |
| 152 | 4-cyclopentene-1,3-dione | 1.68 | 1.78 | 3.50 | 1.77 | 2.29 | 2.06 | 1.58 |
| 153 | 3-cyclopentenone | 2.79 | 2.90 | – | 2.93 | 3.89 | 3.63 | 2.67 |
| 154 | cyclohexanone | 2.87 | 3.47 | – | 3.46 | 4.50 | 4.59 | 2.86 |
| 155 | 2-cyclopentenone | 3.64 | 4.06 | – | 4.06 | 5.19 | 5.90 | 3.65 |
| 156 | diketene | 3.53 | 3.78 | 10.00 | 3.79 | 5.57 | 5.44 | 3.52 |
| 157 | 3-oxetanone | 0.89 | 0.86 | 2.09 | 0.90 | 0.01 | 0.45 | 0.88 |
| 158 | formic acid | 1.42 | 1.50 | 3.35 | 1.50 | 1.60 | 1.59 | 1.79 |
| 159 | acetic acid | 1.70 | 1.74 | 4.23 | 1.75 | 2.10 | 2.32 | 1.99 |
| 160 | formylformic acid | 1.86 | 2.00 | – | 2.00 | 2.56 | 2.86 | 1.82 |
| 161 | propionic acid | 1.46 | 1.61 | 4.28 | 1.59 | 2.06 | 2.34 | 1.99 |
| 162 | acrylic acid | 1.46 | 1.58 | 4.21 | 1.57 | 2.11 | 2.55 | 2.27 |
| 163 | benzoic acid | 1.76 | 2.13 | – | 2.13 | 2.63 | 2.84 | 2.48 |
| 164 | vinylmethyl ether | 0.96 | 1.07 | 3.60 | 1.11 | 1.98 | 2.12 | 1.14 |
| 165 | methylpropyl ether | 1.11 | 1.24 | 3.36 | 1.28 | 2.86 | 3.31 | 1.65 |
| 166 | dimethyl ether | 1.30 | 1.45 | 3.64 | 1.52 | 2.91 | 3.41 | 1.69 |
| 167 | tetrahydrofuran | 1.75 | 2.01 | 4.26 | 1.99 | 3.34 | 3.53 | 1.85 |
| 168 | diethyl ether | 1.15 | 1.23 | 3.17 | 1.24 | 2.66 | 3.21 | 1.60 |
| 169 | phenylmethyl ether | 1.38 | 1.40 | – | 1.42 | 2.31 | 2.89 | 1.54 |
| 170 | tetrahydropyran | 1.58 | 1.64 | – | 1.62 | 3.07 | 3.50 | 1.78 |
| 171 | 1,3-dioxane | 2.06 | 2.29 | – | 2.26 | 4.48 | 4.96 | 2.53 |


Table B.2 (continued)

| no. | compound | expt. | SCF | AIM | MK-ESP | NPA | MPEOE/MHMO(NPA) | PEOE/MHMO(µ) |
|---|---|---|---|---|---|---|---|---|
| 172 | oxetane | 1.94 | 2.17 | 4.80 | 2.18 | 3.75 | 4.02 | 2.03 |
| 173 | 3-methyleneoxetane | 1.63 | 1.80 | 4.69 | 1.80 | 3.57 | 3.73 | 2.00 |
| 174 | ethylmethyl ether | 1.17 | 1.34 | 3.40 | 1.37 | 2.80 | 3.36 | 1.65 |
| 175 | oxirane | 1.90 | 2.12 | 5.41 | 2.14 | 3.97 | 4.38 | 2.18 |
| 176 | methyl acetate | 1.72 | 1.89 | 3.39 | 1.88 | 1.94 | 1.98 | 1.75 |
| 177 | methyl formate | 1.77 | 1.96 | 3.01 | 1.96 | 2.00 | 1.91 | 1.69 |
| 178 | ethyl formate | 1.98 | 2.25 | 3.40 | 2.26 | 2.38 | 1.90 | 1.77 |
| 179 | ethyl acetate | 1.78 | 2.09 | 3.73 | 2.12 | 2.23 | 1.96 | 1.82 |
| 180 | pentyl formate | 1.90 | 2.47 | 3.69 | 2.45 | 2.48 | 1.94 | 1.77 |
| 181 | diethylcarbonate | 1.10 | 0.57 | 0.18 | 0.63 | 0.53 | 0.99 | 0.55 |
| 182 | maleicanhydride | 3.94 | 4.40 | – | 4.39 | 6.17 | 6.34 | 4.17 |
| 183 | beta-propiolactone | 4.18 | 4.47 | 9.46 | 4.48 | 6.03 | 6.96 | 4.20 |
| 184 | gamma-butyrolactone | 4.27 | 4.91 | 10.12 | 4.93 | 6.56 | 7.16 | 4.68 |
| 185 | 2(5H)-furanone | 4.91 | 5.27 | – | 5.27 | 7.12 | 7.85 | 5.21 |
| 186 | furan | 0.72 | 0.75 | – | 0.73 | 1.62 | 0.87 | 1.02 |
| 187 | hydrogen fluoride | 1.82 | 1.98 | 3.09 | 2.00 | 2.44 | 2.50 | 1.13 |
| 188 | fluorocyclohexane | 2.11 | 2.43 | – | 2.44 | 3.95 | 3.74 | 1.96 |
| 189 | 1,2-difluoroethane | 2.67 | 3.03 | 6.45 | 3.02 | 5.31 | 5.48 | 2.87 |
| 190 | fluoropropane | 2.05 | 2.23 | 4.43 | 2.24 | 3.76 | 3.67 | 1.93 |
| 191 | fluoromethane | 1.86 | 2.09 | 4.28 | 2.10 | 3.44 | 3.60 | 1.89 |
| 192 | 1,1-difluoroethane | 2.27 | 2.52 | 5.64 | 2.52 | 4.16 | 4.07 | 2.24 |
| 193 | fluoroethane | 1.94 | 2.18 | 4.37 | 2.16 | 3.66 | 3.69 | 1.92 |
| 194 | 2-fluoropropane | 1.96 | 2.19 | 4.42 | 2.20 | 3.84 | 3.73 | 1.95 |
| 195 | difluoromethane | 1.98 | 2.17 | 5.08 | 2.20 | 3.62 | 3.72 | 2.09 |
| 196 | trifluoromethane | 1.65 | 1.78 | 4.72 | 1.83 | 3.04 | 2.97 | 1.82 |
| 197 | 1,1,1-trifluoroethane | 2.35 | 2.57 | 6.16 | 2.59 | 4.06 | 3.84 | 2.19 |
| 198 | pentafluoroethane | 1.54 | 1.68 | 4.62 | 1.68 | 3.10 | 3.18 | 1.84 |
| 199 | 1,1,1,2,2,3,3-heptafluoropropane | 1.62 | 1.58 | – | 1.58 | 3.16 | 3.23 | 1.80 |
| 200 | 2-fluoro-2-methylpropane | 1.96 | 2.19 | 4.41 | 2.21 | 4.00 | 3.72 | 1.97 |
| 201 | cis-1,2-difluoroethene | 2.42 | 2.60 | 7.34 | 2.53 | 5.27 | 5.56 | 2.74 |
| 202 | 3-fluoropropene | 1.94 | 2.15 | 4.45 | 2.21 | 3.71 | 3.69 | 1.91 |
| 203 | fluoroethene | 1.47 | 1.60 | 4.68 | 1.56 | 3.08 | 3.05 | 1.49 |
| 204 | 2-fluoro-1-propene | 1.61 | 1.81 | 4.77 | 1.80 | 3.30 | 3.22 | 1.58 |
| 205 | cis-1-fluoro-1-propene | 1.46 | 1.66 | – | 1.59 | 3.18 | 3.12 | 1.51 |
| 206 | 1,1-difluoroethene | 1.39 | 1.50 | 6.43 | 1.51 | 2.86 | 2.61 | 1.29 |
| 207 | trifluoroethene | 1.32 | 1.42 | 5.00 | 1.37 | 2.90 | 3.07 | 1.53 |
| 208 | 3,3,3-trifluoro-1-propene | 2.45 | 2.72 | – | 2.74 | 4.26 | 3.78 | 2.14 |
| 209 | fluorobenzene | 1.60 | 1.75 | – | 1.80 | 3.30 | 3.25 | 1.57 |
| 210 | (trifluoromethyl)benzene | 2.86 | 3.16 | – | 3.21 | 4.68 | 3.93 | 2.20 |
| 211 | o-fluorotoluene | 1.37 | 1.44 | – | 1.46 | 3.02 | 3.00 | 1.49 |
| 212 | m-fluorotoluene | 1.86 | 1.99 | – | 2.00 | 3.52 | 3.56 | 1.65 |
| 213 | p-fluorotoluene | 2.00 | 2.19 | – | 2.21 | 3.73 | 3.83 | 1.74 |
| 214 | m-difluorobenzene | 1.51 | 1.72 | – | 1.72 | 3.29 | 3.27 | 1.58 |
| 215 | 1,2,3,4-tetrafluorobenzene | 2.42 | 2.75 | 8.82 | 2.77 | 5.49 | 5.71 | 2.76 |


Table B.2 (continued)

| no. | compound | expt. | SCF | AIM | MK-ESP | NPA | MPEOE/MHMO(NPA) | PEOE/MHMO(µ) |
|---|---|---|---|---|---|---|---|---|
| 216 | o-difluorobenzene | 2.46 | 2.89 | 8.79 | 2.89 | 5.60 | 5.65 | 2.73 |
| 217 | 3,3,3-trifluoropropyne | 2.32 | 2.65 | 5.93 | 2.69 | 4.30 | 3.76 | 1.99 |
| 218 | silane | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 |
| 219 | methylsilane | 0.74 | 0.70 | 0.80 | 0.66 | 0.76 | 2.54 | 0.13 |
| 220 | trimethylsilane | 0.53 | 0.52 | 0.98 | 0.56 | 0.72 | 2.28 | 0.11 |
| 221 | vinylsilane | 0.66 | 0.63 | 1.08 | 0.60 | 0.89 | 4.53 | 0.35 |
| 222 | phenylsilane | 0.85 | 0.84 | – | 0.82 | 1.00 | 5.12 | 0.43 |
| 223 | silanol | 1.35 | 1.49 | 2.38 | 1.44 | 2.74 | 3.71 | 1.91 |
| 224 | methylsilanol | 1.68 | 0.85 | 3.16 | 0.81 | 3.06 | 4.92 | 1.91 |
| 225 | methylsilyl ether | 1.15 | 1.24 | 3.09 | 1.27 | 2.66 | 3.54 | 1.82 |
| 226 | di(trimethylsilyl) ether | 0.78 | 0.00 | – | 0.01 | 0.00 | 0.00 | 0.00 |
| 227 | fluorosilane | 1.27 | 1.80 | 2.13 | 1.81 | 3.65 | 4.55 | 2.70 |
| 228 | trifluorosilane | 1.26 | 1.76 | 2.17 | 1.80 | 3.41 | 5.09 | 2.67 |
| 229 | fluoromethylsilane | 1.71 | 2.13 | 1.99 | 2.10 | 3.48 | 4.52 | 2.69 |
| 230 | difluoromethylsilane | 2.11 | 2.60 | 2.22 | 2.61 | 3.81 | 4.84 | 3.07 |
| 231 | trifluoromethylsilane | 2.34 | 2.87 | 1.81 | 2.89 | 3.29 | 3.40 | 2.92 |
| 232 | chlorosilane | 1.31 | 1.86 | – | 2.00 | 2.73 | 2.18 | 2.45 |
| 233 | trichlorosilane | 0.86 | 1.17 | 2.21 | 1.30 | 2.10 | 2.16 | 2.15 |
| 234 | trichlorofluorosilane | 0.49 | 0.73 | 0.18 | 0.67 | 1.96 | 2.43 | 0.57 |
| 235 | phosphane | 0.58 | 0.82 | 5.57 | 0.83 | 0.08 | 0.07 | 0.30 |
| 236 | methylphosphane | 1.10 | 1.21 | 5.48 | 1.16 | 0.90 | 1.33 | 0.30 |
| 237 | dimethylphosphane | 1.23 | 1.29 | 5.70 | 1.25 | 1.44 | 1.48 | 0.31 |
| 238 | trimethylphosphane | 1.19 | 1.24 | 6.05 | 1.20 | 1.82 | 1.42 | 0.34 |
| 239 | phospholane | – | 1.55 | 5.30 | 1.42 | 1.56 | 1.84 | 0.32 |
| 240 | vinylphosphane | – | 0.89 | 5.80 | 0.90 | 1.25 | 2.11 | 0.67 |
| 241 | divinylphosphane | – | 1.00 | 5.95 | 0.96 | 1.64 | 2.39 | 0.55 |
| 242 | trivinylphosphane | – | 1.03 | – | 0.96 | 2.03 | 2.36 | 0.77 |
| 243 | phenylphosphane | – | 1.26 | 5.62 | 1.27 | 0.97 | 2.43 | 0.72 |
| 244 | dimethylphenylphosphane | – | 1.27 | – | 1.25 | 1.96 | 1.99 | 0.87 |
| 245 | phosphole | – | 0.76 | – | 0.74 | 2.38 | 2.99 | 0.53 |
| 246 | phosphabenzene | – | 1.68 | – | 1.62 | 1.91 | 3.23 | 1.38 |
| 247 | 1,3-diphosphabenzene | – | 1.45 | – | 1.41 | 1.54 | 3.31 | 1.24 |
| 248 | phosphinousamide | – | 1.52 | 5.46 | 1.45 | 1.72 | 2.53 | 0.65 |
| 249 | N,N-dimethylphosphinousamide | – | 1.32 | 5.12 | 1.32 | 1.37 | 2.71 | 0.75 |
| 250 | (N,N-dimethyl)dimethylphosphinousamide | – | 0.70 | – | 0.60 | 3.43 | 3.49 | 0.35 |
| 251 | 1,2-azaphospholidine | – | 1.78 | – | 1.72 | 2.20 | 2.88 | 0.86 |
| 252 | phosphinimine | – | 2.23 | 6.31 | 2.09 | 3.83 | 4.95 | 2.43 |
| 253 | (vinylimino)phosphane | – | 0.81 | 7.72 | 0.90 | 3.64 | 5.07 | 0.76 |
| 254 | 1,2-azaphosphabenzene | – | 3.13 | 3.55 | 2.98 | 2.44 | 3.76 | 3.08 |
| 255 | phosphinous acid | – | 2.31 | 3.95 | 2.23 | 2.96 | 4.31 | 1.86 |
| 256 | methylphosphinous acid | – | 0.71 | 8.16 | 0.72 | 4.29 | 4.75 | 1.36 |
| 257 | dimethylphosphinous acid | – | 0.55 | 8.36 | 0.58 | 4.55 | 4.76 | 1.34 |


Table B.2 (continued)

| no. | compound | expt. | SCF | AIM | MK-ESP | NPA | MPEOE/MHMO(NPA) | PEOE/MHMO(µ) |
|---|---|---|---|---|---|---|---|---|
| 258 | methylphosphinous acid methyl | – | 0.19 | 8.30 | 0.33 | 4.34 | 4.74 | 1.27 |
| 259 | 1,2-oxaphospholane | – | 2.75 | – | 2.66 | 2.93 | 4.29 | 2.19 |
| 260 | fluorophosphane | – | 1.96 | – | 1.94 | 4.42 | 4.78 | 2.19 |
| 261 | difluorophosphane | – | 1.98 | 7.80 | 1.98 | 5.63 | 5.89 | 2.34 |
| 262 | trifluorophosphane | – | 1.68 | 8.46 | 1.69 | 6.44 | 6.45 | 2.02 |
| 263 | dimethylfluorophosphane | – | 2.49 | 6.95 | 2.50 | 4.68 | 4.65 | 2.24 |
| 264 | chlorophosphane | – | 2.07 | – | 2.10 | 3.06 | 2.96 | 1.90 |
| 265 | dichlorophosphane | – | 1.79 | – | 1.87 | 3.44 | 3.55 | 1.95 |
| 266 | trichlorophosphane | – | 1.08 | 5.32 | 1.16 | 3.48 | 3.76 | 1.60 |
| 267 | dimethylchlorophosphane | – | 2.83 | 5.95 | 2.88 | 3.50 | 2.90 | 1.94 |
| 268 | dimethylphosphineoxide | – | 4.57 | 4.85 | 4.57 | 6.34 | 7.17 | 4.14 |
| 269 | divinylphosphineoxide | – | 4.33 | – | 4.34 | 6.11 | 6.96 | 3.89 |
| 270 | phenylvinylphosphineoxide | – | 4.25 | – | 4.22 | 5.81 | 6.28 | 3.85 |
| 271 | methylphosphinic acid | – | 2.90 | 3.07 | 2.92 | 4.10 | 5.03 | 3.18 |
| 272 | vinylphosphinic acid | – | 3.38 | – | 3.37 | 4.22 | 4.88 | 3.21 |
| 273 | phenylphosphinic acid | – | 3.97 | – | 3.99 | 4.65 | 5.10 | 3.29 |
| 274 | methylphosphonic acid | – | 1.47 | 1.68 | 1.53 | 1.22 | 2.00 | 2.00 |
| 275 | vinylphosphonic acid | – | 1.74 | – | 1.77 | 1.55 | 3.18 | 1.98 |
| 276 | phenylphosphonic acid | – | 2.54 | – | 2.58 | 2.64 | 3.41 | 2.45 |
| 277 | methylphosphonic acid dimethyl ester | – | 2.04 | – | 2.03 | 2.10 | 1.08 | 1.68 |
| 278 | phosphoric acid methyl ester | – | 3.77 | 5.13 | 3.74 | 4.31 | 3.75 | 3.06 |
| 279 | phosphoric acid trimethyl triester | – | 3.70 | – | 3.67 | 4.89 | 4.74 | 3.47 |
| 280 | phosphoric acid vinylphenyl diester | – | 3.24 | – | 3.25 | 3.69 | 2.74 | 2.76 |
| 281 | phosphoric acid dimethylphenyl triester | – | 1.09 | – | 1.06 | 0.51 | 1.37 | 1.13 |
| 282 | phosphoricamide methyl ester | – | 2.50 | – | 2.50 | 2.20 | 1.33 | 2.34 |
| 283 | N-methylphosphoramidic acid | – | 3.78 | – | 3.73 | 4.14 | 3.76 | 3.52 |
| 284 | methanethiol | 1.52 | 1.70 | – | 1.73 | 0.71 | 0.80 | 0.99 |
| 285 | ethanethiol | 1.58 | 1.78 | 0.92 | 1.78 | 0.70 | 0.87 | 0.98 |
| 286 | propanethiol | 1.60 | 1.76 | 0.99 | 1.82 | 0.80 | 0.83 | 0.99 |
| 287 | 2-methyl-2-propanethiol | 1.66 | 1.82 | 1.00 | 1.89 | 0.81 | 0.86 | 1.01 |
| 288 | ethylmethyl sulfide | 1.56 | 1.70 | 0.28 | 1.68 | 0.21 | 0.20 | 1.10 |
| 289 | dimethyl sulfide | 1.55 | 1.69 | 0.01 | 1.72 | 0.20 | 0.17 | 1.11 |
| 290 | diethyl sulfide | 1.54 | 1.70 | 0.07 | 1.63 | 0.14 | 0.22 | 1.09 |
| 291 | hydrogen sulfide | 0.97 | 1.35 | 0.31 | 1.50 | 1.23 | 0.97 | 0.85 |
| 292 | thietane | 1.85 | 2.05 | – | 2.07 | 0.34 | 0.13 | 1.50 |
| 293 | thiacyclohexane | 1.78 | 1.94 | – | 1.93 | 0.31 | 0.17 | 1.16 |
| 294 | mercaptoethene | – | 1.25 | 0.44 | 1.26 | 1.03 | 2.30 | 1.13 |
| 295 | 2-mercaptobutadiene | – | 1.00 | – | 1.01 | 0.44 | 1.44 | 0.87 |
| 296 | mercaptobenzene | – | 1.18 | – | 1.25 | 0.74 | 2.00 | 0.99 |
| 297 | thiophene | 0.55 | 0.52 | – | 0.52 | 1.64 | 2.54 | 0.67 |


Table B.2 (continued)

| no. | compound | expt. | SCF | AIM | MK-ESP | NPA | MPEOE/MHMO(NPA) | PEOE/MHMO(µ) |
|---|---|---|---|---|---|---|---|---|
| 298 | 2-methylthiophene | 0.64 | 0.73 | – | 0.73 | 1.73 | 2.89 | 0.64 |
| 299 | 2,5-dihydrothiophene | 1.75 | 1.89 | – | 1.89 | 0.50 | 0.65 | 1.02 |
| 300 | 3-methylthiophene | 0.91 | 0.94 | – | 0.93 | 1.27 | 2.02 | 0.82 |
| 301 | thioformaldehyde | 1.65 | 1.79 | 2.55 | 1.82 | 0.40 | 0.34 | 1.76 |
| 302 | thioacetaldehyde | 2.33 | 2.59 | – | 2.63 | 1.16 | 1.06 | 1.97 |
| 303 | methanesulfenic acid | – | 2.21 | 3.80 | 2.22 | 3.15 | 3.37 | 1.60 |
| 304 | methanesulfenic acid methyl ester | – | 2.10 | – | 2.14 | 3.18 | 3.37 | 1.90 |
| 305 | ethanesulfenic acid methyl ester | – | 2.20 | – | 2.21 | 3.24 | 3.33 | 1.89 |
| 306 | 1,2-oxathiolane | – | 3.12 | – | 3.12 | 3.61 | 3.97 | 2.68 |
| 307 | methanesulfenamide | – | 0.70 | 1.96 | 0.75 | 1.51 | 1.79 | 0.47 |
| 308 | N-methylmethanesulfenamide | – | 0.90 | 1.80 | 0.87 | 1.39 | 1.70 | 0.56 |
| 309 | N,N-dimethylmethanesulfenamide | – | 2.08 | – | 2.14 | 1.21 | 1.82 | 1.68 |
| 310 | isothiazolidine | – | 2.91 | 2.52 | 2.90 | 2.22 | 2.66 | 1.89 |
| 311 | propenethial | – | 3.08 | 1.27 | 3.14 | 1.77 | 0.56 | 2.78 |
| 312 | penta-1,4-diene-3-thione | – | 2.67 | – | 2.75 | 1.17 | 0.44 | 2.60 |
| 313 | thioacetic acid | – | 1.86 | 4.32 | 1.95 | 0.50 | 0.48 | 1.66 |
| 314 | thioacetamide | – | 4.61 | 0.99 | 4.68 | 3.01 | 3.74 | 4.06 |
| 315 | dithioacetic acid | – | 1.89 | – | 1.95 | 1.63 | 2.01 | 1.37 |
| 316 | thioacetic-s-acid | – | 1.80 | 7.16 | 1.79 | 3.60 | 4.41 | 1.97 |
| 317 | thioacrylic acid | – | 1.79 | – | 1.88 | 1.34 | 1.46 | 2.20 |
| 318 | thioacrylamide | – | 4.30 | – | 4.39 | 2.76 | 2.66 | 4.00 |
| 319 | dithioacrylic acid | – | 1.89 | – | 1.93 | 1.22 | 1.34 | 1.88 |
| 320 | thioacrylic-s-acid | – | 1.86 | 6.87 | 1.84 | 3.44 | 4.00 | 2.22 |
| 321 | dimethylsulfoxide | 3.96 | 4.44 | 8.70 | 4.42 | 6.91 | 7.51 | 4.13 |
| 322 | methylphenylsulfoxide | – | 4.31 | – | 4.30 | 6.57 | 8.06 | 4.08 |
| 323 | dimethylsulfone | 4.49 | 4.96 | 9.74 | 4.97 | 6.73 | 6.97 | 4.76 |
| 324 | divinylsulfone | – | 4.83 | – | 4.86 | 6.71 | 6.59 | 4.85 |
| 325 | methanesulfonic acid | – | 3.13 | 6.46 | 3.14 | 3.84 | 3.91 | 3.19 |
| 326 | benzenesulfonic acid | – | 5.05 | – | 5.07 | 6.26 | 5.37 | 4.09 |
| 327 | vinylsulfonic acid | – | 3.47 | – | 3.46 | 4.42 | 4.25 | 3.50 |
| 328 | 1-butadienylsulfonic acid | – | 4.10 | – | 4.14 | 4.96 | 4.91 | 3.86 |
| 329 | sulfamide | – | 3.11 | 4.95 | 3.10 | 3.68 | 3.66 | 3.26 |
| 330 | benzenesulfonamide | – | 3.99 | – | 4.00 | 5.12 | 4.54 | 3.89 |
| 331 | N-methylbenzenesulfonamide | – | 4.66 | – | 4.69 | 6.08 | 5.35 | 4.28 |
| 332 | p-aminobenzenesulfonamide | – | 5.20 | – | 5.18 | 5.74 | 5.28 | 4.50 |
| 333 | N-phenylmethanesulfonamide | – | 5.07 | – | 5.07 | 7.06 | 6.37 | 4.22 |
| 334 | hydrogen chloride | 1.08 | 1.41 | 1.58 | 1.55 | 1.60 | 1.72 | 1.13 |
| 335 | chloromethane | 1.89 | 2.11 | 2.43 | 2.17 | 1.70 | 1.66 | 1.66 |
| 336 | chloroethane | 2.05 | 2.30 | 2.70 | 2.37 | 1.94 | 1.69 | 1.68 |
| 337 | cyclopropyl chloride | 1.78 | 2.08 | 2.64 | 2.16 | 1.64 | 1.66 | 1.60 |
| 338 | chloropropane | 1.95 | 2.38 | 2.77 | 2.45 | 2.05 | 1.67 | 1.69 |


Table B.2 (continued)

| no. | compound | expt. | SCF | AIM | MK-ESP | NPA | MPEOE/MHMO(NPA) | PEOE/MHMO(µ) |
|---|---|---|---|---|---|---|---|---|
| 339 | 2-chloropropane | 2.17 | 2.42 | 2.86 | 2.51 | 2.12 | 1.71 | 1.71 |
| 340 | 1-chloro-2-methylpropane | 2.00 | 2.31 | – | 2.36 | 2.05 | 1.72 | 1.70 |
| 341 | 2-chlorobutane | 2.04 | 2.38 | – | 2.45 | 2.08 | 1.78 | 1.72 |
| 342 | 1-chlorobutane | 2.05 | 2.48 | 2.95 | 2.53 | 2.06 | 1.68 | 1.69 |
| 343 | 2-chloro-2-methylpropane | 2.13 | 2.50 | 2.93 | 2.60 | 2.22 | 1.71 | 1.74 |
| 344 | 1-chloropentane | 2.16 | 2.50 | 2.98 | 2.56 | 2.13 | 1.67 | 1.69 |
| 345 | chlorocyclohexane | 2.44 | 2.71 | 3.22 | 2.77 | 2.21 | 1.68 | 1.71 |
| 346 | chloroethene | 1.45 | 1.56 | 2.44 | 1.63 | 1.23 | 0.99 | 1.29 |
| 347 | cis-1-chloropropene | 1.67 | 1.82 | – | 1.81 | 1.50 | 1.23 | 1.34 |
| 348 | trans-1-chloropropene | 1.97 | 2.11 | 3.03 | 2.18 | 1.66 | 1.52 | 1.43 |
| 349 | 2-chloropropene | 1.65 | 1.82 | 2.69 | 1.92 | 1.42 | 1.19 | 1.35 |
| 350 | 3-chloropropene | 1.94 | 2.21 | 2.81 | 2.33 | 1.96 | 1.55 | 1.65 |
| 351 | chlorobenzene | 1.69 | 1.86 | – | 1.99 | 1.44 | 1.08 | 1.36 |
| 352 | o-chlorotoluene | 1.56 | 1.61 | – | 1.70 | 1.23 | 0.96 | 1.30 |
| 353 | p-chlorotoluene | 2.21 | 2.36 | – | 2.46 | 1.94 | 1.66 | 1.53 |
| 354 | chloroethyne | 0.44 | 0.37 | 1.99 | 0.44 | 0.06 | 0.11 | 0.74 |
| 355 | 3-chloropropyne | 1.68 | 1.92 | 2.42 | 2.03 | 1.80 | 1.63 | 1.57 |
| 356 | dichloromethane | 1.60 | 1.83 | 2.36 | 1.99 | 1.41 | 1.94 | 1.74 |
| 357 | trichloromethane | 1.04 | 1.20 | 1.76 | 1.39 | 0.86 | 1.72 | 1.38 |
| 358 | 1,1-dichloroethane | 2.06 | 2.27 | – | 2.38 | 1.84 | 1.97 | 1.87 |
| 359 | 1,1,1-trichloroethane | 1.76 | 2.01 | 3.01 | 2.06 | 1.51 | 1.71 | 1.71 |
| 360 | pentachloroethane | 0.92 | 0.95 | – | 1.08 | 0.88 | 1.67 | 1.35 |
| 361 | 1,3-dichloropropane | 2.08 | 2.03 | 2.28 | 2.16 | 1.77 | 1.97 | 1.81 |
| 362 | 1,1-dichloroethene | 1.34 | 1.46 | 2.89 | 1.53 | 0.90 | 0.93 | 1.21 |
| 363 | cis-1,2-dichloroethene | 1.90 | 1.95 | – | 2.08 | 1.66 | 1.80 | 2.21 |
| 364 | o-dichlorobenzene | 2.50 | 2.65 | – | 2.81 | 2.08 | 1.88 | 2.33 |
| 365 | m-dichlorobenzene | 1.72 | 1.74 | – | 1.81 | 1.34 | 1.09 | 1.37 |
| 366 | hydrogen bromide | 0.82 | 1.09 | 0.52 | 1.26 | 1.33 | 1.33 | 1.05 |
| 367 | bromomethane | 1.82 | 2.04 | 1.57 | 2.14 | 1.17 | 1.25 | 1.48 |
| 368 | bromoethane | 2.04 | 2.31 | 1.98 | 2.42 | 1.49 | 1.29 | 1.50 |
| 369 | 1-bromopropane | 2.18 | 2.40 | 2.05 | 2.52 | 1.59 | 1.26 | 1.51 |
| 370 | 2-bromopropane | 2.21 | 2.47 | 2.21 | 2.62 | 1.72 | 1.32 | 1.53 |
| 371 | 1-bromobutane | 2.08 | 2.50 | 2.20 | 2.59 | 1.62 | 1.28 | 1.50 |
| 372 | 2-bromobutane | 2.23 | 2.46 | – | 2.58 | 1.72 | 1.40 | 1.53 |
| 373 | 1-bromopentane | 2.20 | 2.53 | 2.23 | 2.65 | 1.69 | 1.26 | 1.51 |
| 374 | 1-bromoheptane | 2.16 | 2.59 | 2.23 | 2.70 | 1.73 | 1.26 | 1.51 |
| 375 | bromoethene | 1.42 | 1.54 | 1.41 | 1.66 | 0.71 | 0.55 | 1.08 |
| 376 | bromobenzene | 1.70 | 1.87 | – | 2.06 | 0.91 | 0.67 | 1.13 |
| 377 | bromoethyne | 0.23 | 0.18 | 0.25 | 0.27 | 0.64 | 0.28 | 0.80 |
| 378 | 3-bromopropyne | 1.54 | 1.83 | 1.61 | 2.00 | 1.26 | 1.28 | 1.38 |
| 379 | cis-1-bromopropene | 1.57 | 1.85 | – | 1.89 | 1.07 | 0.89 | 1.13 |
| 380 | trans-1-bromoethene | 1.69 | 2.13 | 2.01 | 2.23 | 1.16 | 1.10 | 1.21 |
| 381 | 2-bromopropene | 1.51 | 1.82 | 1.75 | 1.96 | 0.96 | 0.82 | 1.13 |
| 382 | dibromomethane | 1.43 | 1.60 | 1.25 | 1.78 | 0.71 | 1.50 | 1.53 |
| 383 | tribromomethane | 0.99 | 0.97 | 0.78 | 1.19 | 0.22 | 1.35 | 1.19 |
| 384 | cis-1,2-dibromoethene | 1.35 | 1.71 | 1.11 | 1.93 | 0.68 | 1.02 | 1.86 |


Table B.2 (continued)

| no. | compound | expt. | SCF | AIM | MK-ESP | NPA | MPEOE/MHMO(NPA) | PEOE/MHMO(µ) |
|---|---|---|---|---|---|---|---|---|
| 385 | tribromoethene | 0.82 | 0.67 | 0.55 | 0.80 | 0.20 | 0.61 | 0.98 |
| 386 | m-dibromobenzene | 1.24 | 1.70 | – | 1.80 | 0.77 | 0.68 | 1.13 |
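One simple way to summarize how closely each method in Table B.2 reproduces the experimental dipole moments is a mean absolute error over the rows where both values exist. The following is a minimal sketch under stated assumptions: `mean_abs_error` is a hypothetical helper, the mean absolute error is an assumed choice of metric (not necessarily the statistic used in this work), and `None` stands for a "–" entry. Only rows 385 and 386 of the table are used as input.

```python
def mean_abs_error(predicted, experimental):
    """Mean absolute deviation over rows where both values are present.

    Hypothetical helper; None marks a missing ("–") table entry.
    """
    pairs = [(p, e) for p, e in zip(predicted, experimental)
             if p is not None and e is not None]
    if not pairs:
        raise ValueError("no rows with both values available")
    return sum(abs(p - e) for p, e in pairs) / len(pairs)


# Rows 385 (tribromoethene) and 386 (m-dibromobenzene) of Table B.2, in debye.
expt = [0.82, 1.24]
scf = [0.67, 1.70]
aim = [0.55, None]        # AIM value missing for row 386
mu_fitted = [0.98, 1.13]  # last column of the table

for name, values in [("SCF", scf), ("AIM", aim), ("mu-fitted", mu_fitted)]:
    print(f"{name}: {mean_abs_error(values, expt):.3f} D")
```

Skipping incomplete rows keeps the methods comparable only if the same rows are missing for each; for a fair ranking one would restrict all methods to the common subset of compounds.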


Table B.3 Dataset of 71 compounds used as an independent test set for the modified PEOE/HMO procedure. Listed are the dipole moments calculated from the atomic charges obtained by the indicated methods, the dipole moments obtained directly from the wavefunction, and the experimental values where available (taken from [143–145]). All values are given in debye. See Section 4.2.2.

| no. | compound | expt. | SCF | AIM | MK-ESP | NPA | MPEOE/MHMO(NPA) | PEOE/MHMO(µ) |
|---|---|---|---|---|---|---|---|---|
| 1 | alanine | 2.14 | 2.14 | – | 2.16 | 2.79 | 2.68 | 2.23 |
| 2 | arginine | – | 5.11 | – | 5.08 | 5.44 | 6.88 | 5.34 |
| 3 | asparagine | – | 4.98 | – | 4.96 | 6.31 | 6.72 | 4.12 |
| 4 | aspartic acid | – | 2.87 | – | 2.85 | 3.40 | 3.35 | 2.40 |
| 5 | cysteine | – | 1.11 | – | 1.16 | 2.40 | 2.69 | 1.27 |
| 6 | glutamic acid | – | 2.30 | – | 2.28 | 2.60 | 2.77 | 2.84 |
| 7 | glutamine | – | 4.79 | – | 4.79 | 5.86 | 6.42 | 5.24 |
| 8 | glycine | – | 2.12 | – | 2.10 | 2.71 | 2.63 | 2.18 |
| 9 | histidine | – | 4.02 | – | 3.99 | 4.01 | 5.94 | 4.24 |
| 10 | isoleucine | – | 2.45 | – | 2.42 | 3.12 | 2.85 | 2.24 |
| 11 | leucine | – | 2.44 | – | 2.42 | 3.21 | 2.85 | 2.22 |
| 12 | lysine | – | 2.36 | – | 2.41 | 2.54 | 2.66 | 1.94 |
| 13 | methionine | – | 2.34 | – | 2.41 | 2.94 | 2.84 | 2.54 |
| 14 | phenylalanine | – | 2.29 | – | 2.31 | 2.77 | 2.54 | 2.14 |
| 15 | proline | – | 1.80 | – | 1.75 | 2.60 | 2.32 | 2.08 |
| 16 | serine | – | 3.22 | – | 3.19 | 4.10 | 3.77 | 2.25 |
| 17 | threonine | – | 3.03 | – | 3.02 | 4.67 | 4.89 | 2.99 |
| 18 | tryptophan | – | 3.28 | – | 3.34 | 3.42 | 5.00 | 2.69 |
| 19 | tyrosine | – | 3.05 | – | 3.06 | 3.37 | 3.38 | 2.59 |
| 20 | valine | – | 2.39 | – | 2.37 | 3.07 | 2.91 | 2.25 |
| 21 | histamine | – | 3.52 | – | 3.46 | 3.06 | 5.24 | 3.17 |
| 22 | barbituric acid | – | 0.10 | – | 0.11 | 0.51 | 0.75 | 0.77 |
| 23 | hydantoin | – | 2.74 | – | 2.74 | 3.49 | 3.98 | 2.94 |
| 24 | oxazole | 1.50 | 1.57 | – | 1.59 | 1.50 | 2.13 | 1.11 |
| 25 | pyridine-2-aldehyde | 5.30 | 5.40 | – | 5.38 | 6.68 | 7.31 | 4.86 |
| 26 | 4-methoxypyridine | 3.05 | 3.41 | – | 3.41 | 3.54 | 3.91 | 2.41 |
| 27 | morpholine | 1.55 | 1.65 | – | 1.66 | 2.19 | 2.57 | 1.23 |
| 28 | methylthiocyanate | 3.34 | 4.33 | – | 4.34 | 4.50 | 5.05 | 3.51 |
| 29 | thiazole | 1.65 | 1.66 | – | 1.64 | 3.78 | 5.02 | 1.21 |
| 30 | dicyanogen sulfide | 3.02 | 3.07 | – | 2.94 | 3.78 | 5.85 | 3.31 |
| 31 | acetyl chloride | 2.72 | 3.02 | – | 3.03 | 3.87 | 4.05 | 2.60 |
| 32 | carbonyl chloride | 1.17 | 1.22 | – | 1.14 | 2.89 | 2.86 | 1.59 |
| 33 | carbonyl fluoride | 0.95 | 0.94 | – | 0.93 | 0.48 | 0.89 | 1.42 |
| 34 | p-chlorophenol | 2.11 | 2.29 | – | 2.36 | 1.74 | 2.13 | 1.26 |
| 35 | cyanogen fluoride | 2.12 | 2.18 | – | 2.17 | 0.75 | 0.38 | 2.13 |
| 36 | trifluoroacetonitrile | 1.26 | 1.13 | – | 1.08 | 0.66 | 0.03 | 1.55 |
| 37 | cyanogen chloride | 2.83 | 2.95 | – | 2.83 | 2.98 | 2.31 | 2.75 |
| 38 | 2-chloronitrobenzene | 4.64 | 5.16 | – | 5.22 | 6.35 | 5.80 | 4.95 |
| 39 | 3-chloronitrobenzene | 3.73 | 4.20 | – | 4.18 | 5.59 | 4.78 | 3.75 |
| 40 | 4-chloronitrobenzene | 2.83 | 3.37 | – | 3.27 | 5.10 | 4.15 | 2.91 |
| 41 | 4-fluoronitrobenzene | 2.87 | 3.28 | – | 3.27 | 3.07 | 2.00 | 2.72 |
| 42 | formyl fluoride | 2.02 | 2.19 | – | 2.20 | 3.14 | 3.24 | 2.45 |


Table B.3 (continued)

| no. | compound | expt. | SCF | AIM | MK-ESP | NPA | MPEOE/MHMO(NPA) | PEOE/MHMO(µ) |
|---|---|---|---|---|---|---|---|---|
| 43 | acetyl fluoride | 2.96 | 3.14 | – | 3.13 | 4.10 | 4.38 | 2.78 |
| 44 | trifluoroacetic acid | 2.28 | 2.36 | – | 2.38 | 3.03 | 2.59 | 1.55 |
| 45 | 2-fluoropyridine | 3.36 | 3.61 | – | 3.63 | 5.14 | 5.40 | 3.10 |
| 46 | 2-chloropyridine | 3.22 | 3.57 | – | 3.67 | 3.56 | 3.60 | 2.88 |
| 47 | 2-bromopyridine | 3.11 | 3.56 | – | 3.75 | 3.22 | 3.32 | 2.67 |
| 48 | 3-chloropyridine | 2.02 | 2.17 | – | 2.18 | 2.26 | 2.58 | 1.77 |
| 49 | 3-bromopyridine | 2.00 | 2.14 | – | 2.16 | 2.30 | 2.66 | 1.72 |
| 50 | 4-chloropyridine | 0.84 | 0.72 | – | 0.60 | 1.33 | 1.82 | 0.60 |
| 51 | trichloroacetic acid | – | 1.81 | – | 1.80 | 1.25 | 1.33 | 1.34 |
| 52 | trifluorobromomethane | 0.65 | 0.67 | – | 0.51 | 2.62 | 2.35 | 0.76 |
| 53 | chlorofluoromethane | 1.82 | 2.02 | – | 2.11 | 3.11 | 3.34 | 1.93 |
| 54 | chlorodifluoromethane | 1.42 | 1.59 | – | 1.68 | 3.08 | 3.18 | 1.71 |
| 55 | fluorotrichloromethane | 0.46 | 0.47 | – | 0.42 | 2.66 | 2.01 | 0.42 |
| 56 | chloropentafluoroethane | 0.52 | 0.64 | – | 0.53 | 2.50 | 2.11 | 0.46 |
| 57 | 1-chloro-1-fluoroethane | 2.07 | 2.44 | – | 2.51 | 3.54 | 3.47 | 2.06 |
| 58 | 1,1-dichloro-2-fluoropropene | 2.43 | 2.16 | – | 2.19 | 3.14 | 3.20 | 1.76 |
| 59 | trifluorochloromethane | 0.50 | 0.52 | – | 0.44 | 2.20 | 1.96 | 0.48 |
| 60 | difluorodibromomethane | 0.66 | 0.70 | – | 0.67 | 3.44 | 2.73 | 0.78 |
| 61 | difluorodichloromethane | 0.51 | 0.55 | – | 0.45 | 2.82 | 2.29 | 0.51 |
| 62 | 1-chloro-1,1-difluoroethane | 2.14 | 2.41 | – | 2.44 | 3.89 | 3.63 | 2.04 |
| 63 | cis-1-bromo-2-fluoroethene | 1.94 | 2.18 | – | 2.25 | 3.24 | 3.41 | 2.30 |
| 64 | cis-1-bromo-2-chloroethene | 1.56 | 1.83 | – | 2.01 | 1.22 | 1.42 | 2.04 |
| 65 | chlorotrifluoroethene | 0.58 | 0.60 | – | 0.48 | 2.42 | 2.20 | 0.37 |
| 66 | bromotrifluoroethene | 0.76 | 0.80 | – | 0.66 | 3.07 | 2.64 | 0.60 |
| 67 | trans-1-bromo-2-fluoroethene | 0.39 | 0.33 | – | 0.21 | 2.60 | 2.50 | 0.43 |
| 68 | trans-1-bromo-2-chloroethene | 0.00 | 0.11 | – | 0.07 | 0.61 | 0.45 | 0.22 |
| 69 | m-chlorofluorobenzene | 1.50 | 1.73 | – | 1.81 | 2.86 | 2.87 | 1.48 |
| 70 | m-bromofluorobenzene | 1.42 | 1.72 | – | 1.80 | 2.97 | 2.97 | 1.41 |
| 71 | m-bromochlorobenzene | 1.52 | 1.72 | – | 1.81 | 1.17 | 0.95 | 1.27 |
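The charge-based dipole moments in Table B.3 rest on the point-charge expression µ = |Σ qᵢ rᵢ|, with atomic charges qᵢ placed at the atomic positions rᵢ. The sketch below illustrates that formula only. The function name `dipole_moment`, the two-atom geometry, and the charges of ±0.2 e are illustrative assumptions and are not data from this work; the conversion factor 1 e·Å ≈ 4.8032 D is the standard one.

```python
import math

# Standard conversion: a dipole of 1 elementary charge x 1 Angstrom, in debye.
E_ANGSTROM_TO_DEBYE = 4.80320


def dipole_moment(charges, coords):
    """Magnitude of the point-charge dipole moment in debye.

    charges: partial charges in units of e
    coords:  matching (x, y, z) positions in Angstrom
    For a neutral molecule the result does not depend on the origin.
    """
    mx = sum(q * x for q, (x, _, _) in zip(charges, coords))
    my = sum(q * y for q, (_, y, _) in zip(charges, coords))
    mz = sum(q * z for q, (_, _, z) in zip(charges, coords))
    return math.sqrt(mx * mx + my * my + mz * mz) * E_ANGSTROM_TO_DEBYE


# Hypothetical diatomic: charges of +0.2 e and -0.2 e, 1.0 Angstrom apart.
mu = dipole_moment([0.2, -0.2], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(f"{mu:.3f} D")
```

A symmetric arrangement of charges gives a vanishing dipole moment with this formula, which matches the 0.00 D entries for silane in Table B.2.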


Table B.4 Dataset of 89 ESCA C-1s chemical shifts used to evaluate the quality of the calculated σ-charges. Listed are the charges of the atoms under consideration, calculated by the modified PEOE/HMO method with the two parametrizations obtained in this work, and the experimental ESCA shifts (taken from [157, 158]). All ESCA shifts are given in eV. See Section 4.6.1.

| no. | compound | atom | ESCA C-1s shift (eV) | MPEOE/MHMO charge, µ-fitted | MPEOE/MHMO charge, NPA-fitted |
|---|---|---|---|---|---|
| 1 | trifluoroacetic acid | C1 | 299.28 | 0.478 | 1.090 |
| 2 | propyne | C1 | 291.77 | -0.002 | -0.481 |
| 3 | 1,1,1-trifluoroethane | C1 | 292.07 | 0.025 | -0.555 |
| 4 | 1,1,1-trifluoroethane | C2 | 298.64 | 0.390 | 1.037 |
| 5 | chloroethane | C1 | 292.10 | 0.048 | -0.348 |
| 6 | chloroethane | C2 | 291.10 | -0.046 | -0.581 |
| 7 | fluoroethane | C1 | 293.39 | 0.090 | 0.057 |
| 8 | fluoroethane | C2 | 291.19 | -0.037 | -0.574 |
| 9 | ethanol | C1 | 292.50 | 0.047 | 0.022 |
| 10 | ethanol | C2 | 291.10 | -0.042 | -0.573 |
| 11 | 1,1-difluoroethane | C1 | 296.05 | 0.240 | 0.596 |
| 12 | 1,1-difluoroethane | C2 | 291.62 | -0.007 | -0.563 |
| 13 | acetaldehyde | C2 | 291.35 | -0.010 | -0.539 |
| 14 | acetone | C2 | 291.15 | -0.008 | -0.538 |
| 15 | acetic acid | C4 | 291.55 | 0.021 | -0.533 |
| 16 | hexafluoroethane | C1 | 299.72 | 0.481 | 1.067 |
| 17 | tetrabromomethane | C1 | 294.81 | 0.299 | 0.084 |
| 18 | tetrachloromethane | C1 | 296.39 | 0.370 | 0.253 |
| 19 | dichlorodifluoromethane | C1 | 298.93 | 0.458 | 0.839 |
| 20 | bromotrifluoromethane | C1 | 299.33 | 0.485 | 1.066 |
| 21 | chlorotrifluoromethane | C1 | 300.31 | 0.507 | 1.103 |
| 22 | di(trifluoromethyl) ether | C1 | 301.09 | 0.531 | 1.292 |
| 23 | tetrafluoromethane | C1 | 301.85 | 0.560 | 1.349 |
| 24 | trichlorofluoromethane | C1 | 297.54 | 0.413 | 0.555 |
| 25 | 1,1-dichloromethane | C1 | 293.90 | 0.150 | -0.278 |
| 26 | 1,1-difluoromethane | C1 | 296.44 | 0.233 | 0.498 |
| 27 | bromomethane | C1 | 292.12 | 0.021 | -0.556 |
| 28 | ethane | C1 | 290.71 | -0.066 | -0.585 |
| 29 | chloromethane | C1 | 292.40 | 0.039 | -0.517 |
| 30 | fluoromethane | C1 | 293.70 | 0.081 | -0.070 |
| 31 | methylamine | C1 | 291.60 | -0.014 | -0.355 |
| 32 | nitromethane | C1 | 293.04 | 0.073 | -0.316 |
| 33 | methanol | C1 | 292.30 | 0.038 | -0.159 |
| 34 | methanethiol | C1 | 291.41 | -0.019 | -0.670 |
| 35 | methane | C1 | 290.90 | -0.076 | -0.739 |
| 36 | trichloromethane | C1 | 295.10 | 0.257 | -0.022 |
| 37 | trifluoromethane | C1 | 299.10 | 0.382 | 0.970 |
| 38 | cyclopropane | C1 | 290.68 | -0.103 | -0.412 |
| 39 | dimethyl ether | C2 | 293.30 | 0.041 | -0.255 |
| 40 | dimethyl thioether | C2 | 291.17 | -0.016 | -0.660 |
| 41 | trimethylamine | C2 | 291.26 | -0.009 | -0.385 |
| 42 | oxirane | C1 | 292.91 | 0.008 | -0.067 |


Table B.4 (continued)

| no. | compound | atom | ESCA C-1s shift (eV) | MPEOE/MHMO charge, µ-fitted | MPEOE/MHMO charge, NPA-fitted |
|---|---|---|---|---|---|
| 43 | trimethylphosphine | C2 | 290.30 | -0.056 | -0.756 |
| 44 | ethylbromo acetate | C5 | 290.90 | -0.038 | -0.576 |
| 45 | ethylbromo acetate | C6 | 292.38 | 0.116 | -0.334 |
| 46 | methylchlorosilane | C1 | 290.45 | -0.057 | -0.998 |
| 47 | methyldichlorosilane | C1 | 290.68 | -0.038 | -0.985 |
| 48 | methyltrichlorosilane | C1 | 290.79 | -0.018 | -0.971 |
| 49 | methylfluorosilane | C1 | 290.53 | -0.048 | -0.969 |
| 50 | methyldifluorosilane | C1 | 290.80 | -0.020 | -0.926 |
| 51 | methyldiiodosilane | C1 | 290.37 | -0.055 | -0.990 |
| 52 | methyliodosilane | C1 | 290.42 | -0.066 | -1.000 |
| 53 | methylsilane | C1 | 290.31 | -0.077 | -1.010 |
| 54 | chlorodimethylsilane | C1 | 290.24 | -0.055 | -0.968 |
| 55 | dichlorodimethylsilane | C1 | 290.40 | -0.035 | -0.954 |
| 56 | fluorodimethylsilane | C1 | 290.33 | -0.046 | -0.939 |
| 57 | difluorodimethylsilane | C1 | 291.62 | -0.018 | -0.894 |
| 58 | 2,2,2-trifluoroethanol | C1 | 292.77 | 0.137 | 0.053 |
| 59 | 2,2,2-trifluoroethanol | C2 | 298.71 | 0.413 | 1.049 |
| 60 | 2,2,2-trifluoroethylamine | C1 | 292.77 | 0.086 | -0.156 |
| 61 | 2,2,2-trifluoroethylamine | C2 | 298.42 | 0.404 | 1.045 |
| 62 | di(trifluoromethyl) thioether | C1 | 299.41 | 0.451 | 0.893 |
| 63 | iododimethylsilane | C1 | 290.28 | -0.063 | -0.971 |
| 64 | dimethylsilane | C1 | 290.14 | -0.074 | -0.981 |
| 65 | di(methylsilyl) ether | C1 | 290.20 | -0.053 | -0.972 |
| 66 | di(methylsilyl) thioether | C1 | 290.40 | -0.066 | -1.008 |
| 67 | 3,3,3-trifluoropropyne | C3 | 299.70 | 0.459 | 1.142 |
| 68 | 3,3,3-trifluoropropene | C3 | 298.72 | 0.413 | 1.076 |
| 69 | methyltrifluoro acetate | C3 | 293.34 | 0.061 | -0.285 |
| 70 | methyltrifluoro acetate | C5 | 299.03 | 0.478 | 1.090 |
| 71 | perfluoropropene | C1 | 299.60 | 0.447 | 1.086 |
| 72 | propene | C1 | 290.68 | -0.045 | -0.547 |
| 73 | methyl acetate | C1 | 292.55 | 0.021 | -0.532 |
| 74 | methyl acetate | C4 | 291.30 | 0.061 | -0.285 |
| 75 | ethyl formate | C3 | 292.45 | 0.070 | -0.108 |
| 76 | ethyl formate | C5 | 291.04 | -0.038 | -0.576 |
| 77 | N,N-dimethylformamide | C2 | 292.03 | 0.002 | -0.402 |
| 78 | iodotrimethylsilane | C3 | 289.95 | -0.061 | -0.941 |
| 79 | ethylchloro acetate | C5 | 292.64 | 0.070 | -0.109 |
| 80 | ethylchloro acetate | C6 | 291.07 | -0.038 | -0.576 |
| 81 | 2-chloro-2-methylpropane | C1 | 292.13 | 0.066 | 0.031 |
| 82 | 2-chloro-2-methylpropane | C2 | 290.80 | -0.041 | -0.574 |
| 83 | ethylfluoro acetate | C1 | 293.70 | 0.175 | 0.109 |
| 84 | ethylfluoro acetate | C5 | 292.58 | 0.070 | -0.109 |
| 85 | ethylfluoro acetate | C6 | 291.06 | -0.038 | -0.576 |
| 86 | methyl acrylate | C1 | 292.32 | 0.061 | -0.285 |
| 87 | tetramethylsilane | C2 | 289.78 | -0.069 | -0.921 |
| 88 | trifluoromethylbenzene | C7 | 298.24 | 0.419 | 1.079 |
| 89 | toluene | C7 | 290.10 | -0.038 | -0.543 |
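Table B.4 is meant for judging how well the calculated charges track the ESCA C-1s shifts. One straightforward check is a least-squares line of shift against charge, together with the correlation coefficient. The sketch below is an illustration under stated assumptions: `linear_fit` is a hypothetical helper, a simple one-variable linear model is assumed (it is not claimed to be the evaluation used in Section 4.6.1), and only five rows of the table (the µ-fitted charges of the fluoromethane series) are used as input.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept.

    Returns (slope, intercept, r), with r the Pearson correlation coefficient.
    Hypothetical helper for illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Rows 35, 30, 26, 37 and 23 of Table B.4: methane through tetrafluoromethane.
charges = [-0.076, 0.081, 0.233, 0.382, 0.560]      # µ-fitted charge on C1
shifts = [290.90, 293.70, 296.44, 299.10, 301.85]   # ESCA C-1s shift in eV

slope, intercept, r = linear_fit(charges, shifts)
print(f"slope = {slope:.2f} eV per unit charge, "
      f"intercept = {intercept:.2f} eV, r = {r:.4f}")
```

Running the same fit over all 89 rows, once with the µ-fitted and once with the NPA-fitted charges, would give a direct comparison of the two parametrizations.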

Bibliography

[1] “CrossFire Beilstein”, Webpage, 2005. http://www.mdl.com/products/knowledge/crossfire_beilstein/

[2] “The Physical Properties Database (PHYSPROP)”, Webpage, 2004. http://www.syrres.com/esc/physprop.htm

[3] “About CAS”, Webpage, 2005. http://www.cas.org/about.html

[4] W. J. Hehre, L. Radom, P. v. R. Schleyer, J. A. Pople, Ab Initio Orbital Theory, John Wiley & Sons, New York, 1986.

[5] J. Gasteiger, T. Engel (Editors), Chemoinformatics – A Textbook, Wiley-VCH, Weinheim, 2003.

[6] J. Gasteiger (Editor), Handbook of Chemoinformatics, Wiley-VCH, Weinheim, 2003.

[7] L. P. Hammett, “Effect of Structure upon the Reactions of Organic Compounds. Benzene Derivatives”, J. Am. Chem. Soc. 1937, 59, 96–103.

[8] P. Hohenberg, W. Kohn, “Inhomogeneous Electron Gas”, Phys. Rev. 1964, 136, B864–B871.

[9] J. Gasteiger, M. Marsili, “Iterative Partial Equalization of Orbital Electronegativity – A Rapid Access to Atomic Charges”, Tetrahedron 1980, 36, 3219–3228.

[10] J. Gasteiger, H. Saller, “Calculation of Charge Distribution in Conjugated Systems by a Quantification of the Resonance Concept”, Angew. Chem. Int. Ed. Engl. 1985, 24, 687–689, Angew. Chem. 1985, 97, 699–701.

[11] S. Bauerschmidt, J. Gasteiger, “Overcoming the Limitations of a Connection Table Description: A Universal Representation of Chemical Species”, J. Chem. Inf. Comput. Sci. 1997, 37, 705–714.

[12] N. B. Chapman, J. Shorter, Advances in Linear Free Energy Relationships, Plenum Press, London, 1972.

[13] N. B. Chapman, J. Shorter, Correlation Analysis in Chemistry – Recent Advances, Plenum Press, New York, 1978.

[14] R. W. Taft, in Steric Effects in Organic Chemistry, M. S. Newman (Editor), Wiley, New York, 1956, 556–675.

[15] S. Ehrenson, R. T. C. Brownlee, R. W. Taft, “Generalized Treatment of Substituent Effects in the Benzene Series. Statistical Analysis by the Dual Substituent Parameter Equation. I.”, Prog. Phys. Org. Chem. 1973, 10, 1–80.

[16] H. C. Brown, Y. Okamoto, “Directive Effects in Aromatic Substitution. XXX. Electrophilic Substituent Constants”, J. Am. Chem. Soc. 1958, 80, 4979–4987.

[17] T. Fujita, J. Iwasa, C. Hansch, “A New Substituent Constant, π, Derived from Partition Coefficients”, J. Am. Chem. Soc. 1964, 86, 5175–5180.

[18] R. Mannhold, R. F. Rekker, “The Hydrophobic Fragmental Constant Approach for Calculating log P in Octanol/Water and Aliphatic Hydrocarbon/Water Systems”, Perspect. Drug Discovery Des. 2000, 18, 1–18.

[19] C. Hansch, P. P. Maloney, T. Fujita, R. M. Muir, “Correlation of Biological Activity of Phenoxyacetic Acids with Hammett Substituent Constants and Partition Coefficients”, Nature 1962, 194, 178–180.

[20] C. Hansch, T. Fujita, “ρ-, σ-, π-Analysis: A Method for the Correlation of Biological Activity and Chemical Structure”, J. Am. Chem. Soc. 1964, 86, 1616–1626.

[21] H. Kubinyi, QSAR: Hansch Analysis and Related Approaches, VCH, Weinheim, 1993.

[22] T. Kleinöder, S. Spycher, A. Yan, “Prediction of Properties of Compounds”, in Chemoinformatics – A Textbook, J. Gasteiger, T. Engel (Editors), Wiley-VCH, Weinheim, 2003, 489–514.

[23] P. C. Jurs, “Quantitative Structure–Property Relationships”, in Handbook of Chemoinformatics, J. Gasteiger (Editor), Wiley-VCH, Weinheim, 2003, Volume 3, 1314–1335.

[24] M. Karelson, V. S. Lobanov, A. R. Katritzky, “Quantum-Chemical Descriptors in QSAR/QSPR Studies”, Chem. Rev. 1996, 96, 1027–1043.

[25] R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, Weinheim, 2000.

[26] J. Gasteiger, “A Hierarchy of Structure Representation”, in Handbook of Chemoinformatics, J. Gasteiger (Editor), Wiley-VCH, Weinheim, 2003, 1034–1061.

[27] G. Moreau, P. Broto, “Autocorrelation of Molecular Structures”, Nouv. J. Chim. 1980, 4, 757–764.

[28] PETRA, Version 3.2, Molecular Networks GmbH, Erlangen, 2003.

[29] J. Gasteiger, “Empirical Methods for the Calculation of Physicochemical Data of Organic Compounds”, in Physical Property Prediction in Organic Chemistry, C. Jochum, M. G. Hicks, J. Sunkel (Editors), Springer Verlag, Heidelberg, 1988, 119–138.

[30] J. Sadowski, J. Gasteiger, G. Klebe, “Comparison of Automatic Three-Dimensional Model-Builders Using 639 X-Ray Structures”, J. Chem. Inf. Comput. Sci. 1994, 34, 1000–1008.

[31] CORINA, Version 3.0, Molecular Networks GmbH, Erlangen, 2004.

[32] SURFACE, Version 1.0, Molecular Networks GmbH, Erlangen, 1994.

[33] AUTOCORR, Version 2.4, Molecular Networks GmbH, Erlangen, 2003.

[34] A. Teckentrup, “Einsatzmöglichkeiten selbstorganisierender neuronaler Netze in der Wirkstofforschung”, Dissertation, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2000.

[35] L. Eriksson, H. Antti, E. Holmes, E. Johansson, T. Lundstedt, S. Wold, “Partial Least Squares (PLS) in Cheminformatics”, in Handbook of Chemoinformatics, J. Gasteiger (Editor), Wiley-VCH, Weinheim, 2003, 1134–1166.

[36] T. Kohonen, Self–Organizing Maps, 3rd Edition, Springer-Verlag, Berlin, Heidelberg, New York, 2001.

[37] J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design, 2nd Edition, Wiley-VCH, Weinheim, 1999.

[38] A. Zell, Simulation Neuronaler Netze, Addison-Wesley, Bonn, 1994.

[39] P. C. Jurs, “Quantitative Structure–Property Relationships (QSPR)”, in Encyclopedia of Computational Chemistry, P. von Ragué Schleyer, N. L. Allinger, T. Clark, J. Gasteiger, P. A. Kollman, H. F. Schaefer III, P. R. Schreiner (Editors), John Wiley & Sons, Chichester, UK, 1998, 2320–2330.

[40] A. Yan, J. Gasteiger, M. Krug, S. Anzali, “Linear and Nonlinear Functions on Modeling of Aqueous Solubility of Organic Compounds by Two Structure Representation Methods”, J. Comput. Aided Mol. Des. 2004, 18, 75–87.

[41] S. Sixt, “Methoden zur Abschätzung umweltrelevanter physikochemischer und ökologischer Eigenschaften organischer Substanzen aus der Molekülstruktur”, Dissertation, Friedrich-Alexander-Universität Erlangen-Nürnberg, 1998.

[42] A. von Homeyer, “Evolutionary Algorithms and Applications in Chemistry”, in Handbook of Chemoinformatics, J. Gasteiger (Editor), Wiley-VCH, Weinheim, 2003, Volume 3, 1239–1280.

[43] S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, “Optimization by Simulated Annealing”, Science 1983, 220, 671–680.

[44] L. Pauling, “The Nature of the Chemical Bond. IV. The Energy of Single Bonds and the Relative Electronegativity of Atoms”, J. Am. Chem. Soc. 1932, 54, 3570–3582.

[45] R. T. Sanderson, “An Interpretation of Bond Lengths and a Classification of Bonds”, Science 1951, 114, 670–672.

[46] R. S. Mulliken, “New Electroaffinity Scale; Together with Data on Valence States and on Valence Ionization Potentials and Electron Affinities”, J. Chem. Phys. 1934, 2, 782–793.

[47] J. Hinze, H. H. Jaffé, “Electronegativity. I. Orbital Electronegativity of Neutral Atoms”, J. Am. Chem. Soc. 1962, 84, 540–546.

[48] J. Hinze, M. A. Whitehead, H. H. Jaffé, “Electronegativity. II. Bond and Orbital Electronegativities”, J. Am. Chem. Soc. 1963, 85, 148–154.

[49] J. Hinze, H. H. Jaffé, “Electronegativity. III. Orbital Electronegativities and Electron Affinities of Transition Metals”, Can. J. Chem. 1963, 41, 1315–1328.

[50] J. Hinze, H. H. Jaffé, “Electronegativity. IV. Orbital Electronegativities of the Neutral Atoms of the Periods Three A and Four A and of Positive Ions of Periods One and Two”, J. Phys. Chem. 1963, 67, 1501–1506.

[51] R. P. Iczkowski, J. L. Margrave, “Electronegativity”, J. Am. Chem. Soc. 1961, 83, 3547–3551.

[52] J. Gasteiger, M. Marsili, “A New Model for Calculating Atomic Charges in Molecules”, Tetrahedron Lett. 1978, 34, 3181–3184.

[53] M. Marsili, “Ladungsverteilung und chemische Reaktivität in der computergesteuerten Reaktionssimulation”, Dissertation, Universität München, 1980.

[54] L. G. Hammarström, T. Liljefors, J. Gasteiger, “Electrostatic Interactions in Molecular Mechanics (MM2) Calculations via PEOE Partial Charges I. Haloalkanes”, J. Comput. Chem. 1988, 9, 424–440.

[55] K. T. No, J. A. Grant, H. A. Scheraga, “Determination of Net Atomic Charges Using a Modified Partial Equalization of Orbital Electronegativities. 1. Application to Neutral Molecules for Polypeptides”, J. Phys. Chem. 1990, 94, 4732–4739.

[56] P. Sykes, Reaktionsmechanismen der Organischen Chemie, 9th Edition, VCH, Weinheim, 1988.

[57] H. Saller, “Quantitative empirische Modelle für elektronische Effekte in π-Systemen und für die chemische Reaktivität”, Dissertation, Universität München, 1985.

[58] H. Balzert, Lehrbuch der Software-Technik: Software Entwicklung, Spektrum Akademischer Verlag, Heidelberg, 1996.

[59] H. Balzert, Lehrbuch der Software-Technik: Software-Management, Software-Qualitätssicherung, Unternehmensmodellierung, Spektrum Akademischer Verlag, Heidelberg, 1998.

[60] O. Dahl, E. Dijkstra, C. Hoare, Structured Programming, Academic Press, London, 1972.

[61] G. Booch, Objektorientierte Analyse und Design, 2nd Edition, Addison-Wesley, Bonn, 1994, chapter “Konzepte”, 15–214.

[62] G. Booch, Objektorientierte Analyse und Design, 2nd Edition, Addison-Wesley, Bonn, 1994.

[63] K. Czarnecki, U. Eisenecker, Generative Programming, Addison-Wesley, Boston, MA, 2000, chapter “6. Generic Programming”, 165–210.

[64] A. Alexandrescu, Modern C++ Design, Addison-Wesley, Reading, MA, 2001.

[65] M. H. Austern, Generic Programming and the STL, Addison-Wesley, Boston, 1998.

[66] “IBM Dataprocessing Techniques: Flowcharting Techniques”, Technical Report, International Business Machines Corporation, 1969.

[67] “Object Management Group”, Webpage, 2005. www.omg.org

[68] J. Rumbaugh, I. Jacobson, G. Booch, The Unified Modeling Language Reference Manual, Addison-Wesley, Boston, 1998.

[69] “Unified Modeling Language”, Webpage, 2005. www.uml.org

[70] E. Gamma, R. Helm, R. Johnson, J. Vlissides, Entwurfsmuster, Addison-Wesley, Bonn, 1995.

[71] A. Hunt, D. Thomas, The Pragmatic Programmer, Addison-Wesley, Boston, MA, 2000.

[72] B. W. Kernighan, R. Pike, The Practice of Programming, Addison-Wesley, Reading, MA, 1999.

[73] K. Beck, Extreme Programming Explained, Addison-Wesley, Reading, MA, 1999.

[74] M. Fowler, Refactoring, Addison-Wesley, Reading, MA, 1999.

[75] M. Stephens, “The Case Against Extreme Programming”, Webpage, 2005. http://www.softwarereality.com/lifecycle/xp/case_against_xp.jsp

[76] R. E. Jeffries, “Xtreme Programming : Software”, Webpage, 2005. http://www.xprogramming.com/software.htm

[77] U. Müller, C++-Implementierungstechniken, Internat. Thomson Publ., Bonn, 1997.

[78] J. Gasteiger, M. Marsili, G. Hutchings, H. Saller, P. Löw, P. Röse, K. Rafeiner, “Methods for the Representation of Knowledge about Chemical Reactions”, J. Chem. Inf. Comput. Sci. 1990, 30, 467–476.

[79] A. Herwig, “Development of an Integrated Framework for Chemoinformatics Applications”, Dissertation, Universität Erlangen-Nürnberg, 2004.

[80] J. Marusczyk, Dissertation, Universität Erlangen-Nürnberg, in preparation.

[81] J. Gasteiger, M. G. Hutchings, B. Christoph, L. Gann, C. Hiller, P. Löw, M. Marsili, H. Saller, K. Yuki, “A New Treatment of Chemical Reactivity: Development of EROS, an Expert System for Reaction Prediction and Synthesis Design”, Top. Curr. Chem. 1987, 137, 19–73.

[82] PETRA, Version 2.6, Univ. Erlangen-Nürnberg, 1995. www2.chemie.uni-erlangen.de/software/petra

[83] R. Fick, W.-D. Ihlenfeldt, J. Gasteiger, “Computer-Assisted Design of Syntheses for Heterocyclic Compounds”, Heterocycles 1995, 40, 993–1007.

[84] R. Höllering, “Simulation von Massenspektren und Entwicklung eines Systems zur Reaktionsvorhersage”, Dissertation, Friedrich-Alexander-Universität Erlangen-Nürnberg, 1998.

[85] S. Bauerschmidt, “Repräsentation von Molekülstrukturen zur computergestützten Behandlung chemischer Reaktionen”, Dissertation, Universität Erlangen-Nürnberg, 1997.

[86] W.-D. Ihlenfeldt, Y. Takahashi, H. Abe, S. Sasaki, “Computation and Management of Chemical Properties in CACTVS: An Extensible Networked Approach toward Modularity and Compatibility”, J. Chem. Inf. Comput. Sci. 1994, 34, 109–116.

[87] M. Sitzmann, “Konzepte zur Syntheseplanung: Entwicklung von Methoden zur Suche nach leistungsfähigen Synthesereaktionen”, Dissertation, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2004.

[88] E. Hückel, “Quantentheoretische Beiträge zum Benzolproblem. I. Die Elektronenkonfiguration des Benzols und verwandter Verbindungen”, Z. Phys. 1931, 70, 204–286.

[89] MDL Information Systems, Inc., “CTfile Formats”, Webpage, 2002. www.mdl.com/solutions/white_papers/ctfile_formats.jsp

[90] G. N. Lewis, “The Atom and the Molecule”, J. Am. Chem. Soc. 1916, 38, 762–785.

[91] E. Riedel, Anorganische Chemie, 2nd Edition, de Gruyter, Berlin and New York, 1990.

[92] J. E. Huheey, E. A. Keiter, R. L. Keiter, Inorganic Chemistry, 4th Edition, Harper Collins, New York, 1993, chapter “Periodicity”, 857–888.

[93] A. E. Reed, P. v. R. Schleyer, “Chemical Bonding in Hypervalent Molecules. The Dominance of Ionic Bonding and Negative Hyperconjugation over d-Orbital Participation”, J. Am. Chem. Soc. 1990, 112, 1434–1445.

[94] A. E. Reed, L. A. Curtiss, F. Weinhold, “Intermolecular Interactions from a Natural Bond Orbital, Donor–Acceptor Viewpoint”, Chem. Rev. 1988, 88, 899–926.

[95] F. Weinhold, “Natural Bond Orbital Methods”, in Encyclopedia of Computational Chemistry, P. von Ragué Schleyer, N. L. Allinger, T. Clark, J. Gasteiger, P. A. Kollman, H. F. Schaefer III, P. R. Schreiner (Editors), John Wiley & Sons, Chichester, UK, 1998, 1792–1813.

[96] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, V. G. Zakrzewski, J. A. Montgomery Jr., R. E. Stratmann, J. C. Burant, S. Dapprich, J. M. Millam, A. D. Daniels, K. N. Kudin, M. C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G. A. Petersson, P. Y. Ayala, Q. Cui, K. Morokuma, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. Cioslowski, J. V. Ortiz, A. G. Baboul, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, C. Gonzalez, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, J. L. Andres, C. Gonzalez, M. Head-Gordon, E. S. Replogle, J. A. Pople, “Gaussian 98, Revision A.7”, Gaussian, Inc., Pittsburgh, PA, 1998.

[97] P. v. R. Schleyer, C. Maerker, A. Dransfeld, H. Jiao, N. v. E. Hommes, “Nucleus-Independent Chemical Shifts (NICS): A Simple and Efficient Aromaticity Probe”, J. Am. Chem. Soc. 1996, 118, 6317–6318.

[98] F. De Proft, P. Geerlings, “Relative Hardness as a Measure of Aromaticity”, Phys. Chem. Chem. Phys. 2004, 6, 242–248.

[99] E. Sorkau, “Ringerkennung in chemischen Strukturen mit dem Computer”, Wiss. Z. Tech. Hochsch. Leuna-Merseburg 1985, 27, 765–770.

[100] A. Herwig, T. Kleinöder, J. Marusczyk, L. Terfloth, J. Gasteiger, MOSES Programmer’s Guide, TORVS Research Team – Friedrich-Alexander-Universität Erlangen-Nürnberg, 2003.

[101] A. Herwig, T. Kleinöder, J. Marusczyk, L. Terfloth, “Moses API Reference”, Webpage, 2005. http://www2.chemie.uni-erlangen.de/software/moses/API

[102] R. W. Sebesta, Concepts of Programming Languages, 4th Edition, Addison-Wesley, Reading, MA, 1999, chapter “Support for Object-Oriented Programming”, 435–488.

[103] B. Eckel, Thinking in JAVA, 2nd Edition, Prentice-Hall, Upper Saddle River, NJ, 2000.

[104] “Simulated Annealing Information”, Webpage, 2005. http://www.taygeta.com/annealing/simanneal.html

[105] Trolltech, “QT Overview”, Webpage, 2005. http://www.trolltech.com/products/qt/

[106] Free Software Foundation, Inc., “GNU General Public License”, Webpage, 2005. http://www.fsf.org/licensing/licenses/gpl.html

[107] J. Marusczyk, “MIA – MOSES Structure Browser”, Univ. Erlangen-Nürnberg, 2005.

[108] R. Mannhold, H. van de Waterbeemd, “Substructure and Whole Molecule Approaches for Calculating log P”, J. Comput. Aided Mol. Des. 2001, 15, 337–354.

[109] U. Breymann, Komponenten entwerfen mit der C++ STL, 2nd Edition, Addison-Wesley, Bonn, 1999.

[110] B. Stroustrup, The C++ Programming Language, 4th Edition, Addison-Wesley, Boston, 2000.

[111] M. Henning, S. Vinoski, Advanced CORBA Programming with C++, Addison-Wesley, Reading, MA, 1999, chapter “C++ Mapping for Type any”, 663–690.

[112] C. A. Lipinski, F. Lombardo, B. W. Dominy, P. J. Feeney, “Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings”, Advanced Drug Delivery Reviews 1997, 23, 3–25.

[113] J. Zhang, J. Gasteiger, “Prediction of Ionization Constants for Aliphatic Alcohols and Acids”, in preparation.

[114] F. Eisenlohr, “A New Calculation of Atomic Refractions. I.”, Z. physik. Chem. 1911, 75, 585–607.

[115] A. I. Vogel, “Physical Properties and Chemical Constitution. XXIII. Miscellaneous Compounds. Investigation of the so-called Coordinate or Dative Link in Esters of Oxy Acids and in Nitro Paraffins by Molecular Refractivity Determinations. Atomic, Structural, and Group Parachors and Refractivities.”, J. Chem. Soc. 1948, 1833–1855.

[116] K. J. Miller, J. A. Savchik, “A New Empirical Method to Calculate Average Molecular Polarizabilities”, J. Am. Chem. Soc. 1979, 101, 7206–7213.

[117] Y. K. Kang, M. S. Jhon, “Additivity of Atomic Static Polarizabilities and Dispersion Coefficients”, Theor. Chim. Acta 1982, 61, 41–48.

[118] K. J. Miller, “Additivity Methods in Molecular Polarizability”, J. Am. Chem. Soc. 1990, 112, 8533–8542.

[119] R. Wang, Y. Gao, L. Lai, “Calculating Partition Coefficient by Atom-Additivity Method”, Perspect. Drug Discovery Des. 2000, 19, 47–66.

[120] K. G. Denbigh, “Polarizabilities of bonds. I.”, Trans. Faraday Soc. 1940, 36, 936–948.

[121] P. R. Andrews, D. J. Craik, J. L. Martin, “Functional Group Contributions to Drug-Receptor Interactions”, J. Med. Chem. 1984, 27, 1648–1657.

[122] N. Cohen, S. W. Benson, “Estimation of Heats of Formation of Organic Compounds by Additivity Methods”, Chem. Rev. 1993, 93, 2419–2438.

[123] J. Gasteiger, “Empirical Approaches to the Calculation of Properties”, in Chemoinformatics – A Textbook, J. Gasteiger, T. Engel (Editors), Wiley-VCH, Weinheim, 2003, 320–337.

[124] N. Kotchev, I. Bangov, T. Kleinöder, J. Gasteiger, “Prediction of Heats of Formation by a Group Additivity Scheme”, unpublished results, 2005.

[125] N. Kotchev, T. Kleinöder, “AdditivityModelBuilder”, Univ. Erlangen-Nürnberg, 2005.

[126] A. Szabo, N. S. Ostlund, Modern Quantum Chemistry, McGraw-Hill, Inc., New York, 1989.

[127] S. M. Bachrach, Reviews in Computational Chemistry, VCH Publishers, Inc., New York, 1994, Volume 5, chapter “Population Analysis and Electron Densities from Quantum Mechanics”, 171–227.

[128] W. J. Mortier, K. Van Genechten, J. Gasteiger, “Electronegativity Equalization: Application and Parametrization”, J. Am. Chem. Soc. 1985, 107, 829–835.

[129] A. Streitwieser Jr., Molecular Orbital Theory for Organic Chemists, John Wiley & Sons, Inc., New York and London, 1961.

[130] E. Heilbronner, H. Bock, Das HMO–Modell und seine Anwendung, 2nd Edition, Verlag Chemie, Weinheim, New York, 1978.

[131] F. A. Van-Catledge, “A Pariser–Parr–Pople–Based Set of Hückel Molecular Orbital Parameters”, J. Org. Chem. 1980, 45, 4801–4802.

[132] R. J. Abraham, P. E. Smith, “Charge Calculation in Molecular Mechanics IV: A General Method for Conjugated Systems”, J. Comput. Chem. 1988, 9, 288–297.

[133] G. W. Wheland, D. E. Mann, “The Dipole Moments of Fulvene and Azulene”, J. Chem. Phys. 1949, 17, 264–268.

[134] A. Streitwieser Jr., “A Molecular Orbital Study of Ionization Potentials of Organic Compounds Utilizing the ω-technique”, J. Am. Chem. Soc. 1960, 82, 4123–4135.

[135] W. Gründler, “Omega HMO Treatment of σ π Electron Systems”, Monatsh. Chem. 1970, 101, 1362–1372.

[136] P. R. Gerber, “Charge Distribution from a Simple Molecular Orbital Type Calculation and Non-bonding Interaction in the Force Field MAB”, J. Comput. Aided Mol. Des. 1998, 12, 37–51.

[137] M. Fato, J. Gasteiger, “Ein Quantitatives Modell zur Beschreibung der Ladungsverteilung in Molekülen”, unpublished results, 1997.

[138] M. G. Hutchings, J. Gasteiger, “Residual Electronegativity – An Empirical Quantification of Polar Influences and its Application to the Proton Affinity of Amines”, Tetrahedron Lett. 1983, 24, 2541–2544.

[139] R. W. Taft, R. D. Topsom, “The Nature and Analysis of Substituent Electronic Effects”, Prog. Phys. Org. Chem. 1987, 16, 1–83.

[140] C. Hansch, A. Leo, R. W. Taft, “A Survey of Hammett Substituent Constants and Resonance and Field Parameters”, Chem. Rev. 1991, 91, 165–195.

[141] R. D. Topsom, “Some Theoretical Studies of Electronic Substituent Effects in Organic Chemistry”, Prog. Phys. Org. Chem. 1987, 16, 125–191.

[142] A. R. Cherkasov, V. I. Galkin, R. A. Cherkasov, “A New Approach to the Theoretical Estimation of Inductive Constants”, J. Phys. Org. Chem. 1998, 11, 437–447.

[143] J. D. Thompson, J. D. Xidos, T. M. Sonbuchner, C. J. Cramer, D. G. Truhlar, “More Reliable Partial Atomic Charges when Using Diffuse Basis Sets”, PhysChemComm 2002, 5, 117–134.

[144] P. Bagossi, G. Zahuczky, J. Tözsér, I. T. Weber, R. W. Harrison, “Improved Parameters for Generating Partial Charges: Correlation with Observed Dipole Moments”, J. Mol. Model. 1999, 5, 143–152, suppl. Mat.

[145] R. C. Weast, M. J. Astle, W. H. Beyer (Editors), CRC Handbook of Chemistry and Physics, 65th Edition, CRC Press, Inc., Boca Raton, Florida, 1985.

[146] A. D. Becke, “Density-Functional Thermochemistry. III. The Role of Exact Exchange”, J. Chem. Phys. 1993, 98, 5648–5652.

[147] C. Lee, W. Yang, R. G. Parr, “Development of the Colle-Salvetti Correlation-Energy Formula into a Functional of the Electron Density”, Phys. Rev. B 1988, 37, 785–789.

[148] F. De Proft, J. M. L. Martin, P. Geerlings, “On the Performance of Density Functional Methods for Describing Atomic Populations, Dipole Moments and Infrared Intensities”, Chem. Phys. Lett. 1996, 250, 393–401.

[149] N. v. E. Hommes, personal communication, 2004.

[150] J. B. Foresman, Æ. Frisch, Exploring Chemistry with Electronic Structure Methods, 2nd Edition, Gaussian, Inc., Pittsburgh, USA, 1996.

[151] L. Wall, R. L. Schwartz, Programming Perl, O’Reilly & Associates, Inc., Sebastopol, CA, 1990.

[152] W.-D. Ihlenfeldt, J. Gasteiger, “Hash Codes for the Identification and Classification of Molecular Structure Elements”, J. Comput. Chem. 1994, 15, 793–813.

[153] K. Siegbahn, C. Nordling, A. Fahlman, R. Nordberg, K. Hamrin, J. Hedman, G. Johansson, T. Bergmark, S.-E. Karlsson, I. Lindgren, B. Lindberg, ESCA: Atomic, Molecular and Solid State Structure Studied by Means of Electron Spectroscopy, Almqvist and Wiksells, Uppsala, 1967.

[154] C. Nordling, “ESCA: Electron Spectroscopy for Chemical Analysis”, Angew. Chem. Int. Ed. 1972, 11, 83–92.

[155] W. L. Jolly, W. B. Perry, “Estimation of Atomic Charges by an Electronegativity Equalization Procedure Calibrated with Core Binding Energies”, J. Am. Chem. Soc. 1973, 95, 5442–5450.

[156] J. Mullay, “A Simple Method for Calculating Atomic Charges in Molecules”, J. Am. Chem. Soc. 1986, 108, 1770–1775.

[157] A. A. Oliferenko, V. A. Palyulin, S. A. Pisarev, A. V. Neiman, N. S. Zefirov, “Novel Point Charge Models: Reliable Instruments for Molecular Electrostatics”, J. Phys. Org. Chem. 2001, 14, 355–369.

[158] A. Cherkasov, “Inductive Electronegativity Scale. Iterative Calculation of Inductive Partial Charges”, J. Chem. Inf. Comput. Sci. 2003, 43, 2039–2047.

[159] J. E. Huheey, Anorganische Chemie, de Gruyter, Berlin and New York, 1988, chapter “Experimentelle Ermittlung der Ladungsverteilung in Molekülen”, 172–196.

[160] S. Fliszár, G. Cardinal, M.-T. Béraldin, “Charge Distribution and Chemical Effects. 30. Relationships between Nuclear Magnetic Resonance Shifts and Atomic Charges”, J. Am. Chem. Soc. 1982, 104, 5287–5292.

[161] A. L. McClellan, Tables of Experimental Dipole Moments, Freeman, San Francisco, 1963.

[162] O. Exner, Dipole Moments in Organic Molecules, Thieme Verlag, Stuttgart, 1975.

[163] G. I. Csonka, M. Erdösy, J. Réffy, “Structure of Disiloxane: A Semiempirical and Post-Hartree-Fock Study”, J. Comput. Chem. 1994, 15, 925–936.

[164] P. Kollman, “A Method of Describing the Charge Distribution in Simple Molecules”, J. Am. Chem. Soc. 1978, 100, 2974–2984.

[165] G. Rauhut, T. Clark, “Multicenter Point Charge Model for High-Quality Molecular Electrostatic Potentials from AM1 Calculations”, J. Comput. Chem. 1993, 14, 503–509.

[166] K.-H. Cho, Y. K. Kang, K. T. No, H. A. Scheraga, “A Fast Method for Calculating Geometry-Dependent Net Atomic Charges for Polypeptides”, J. Phys. Chem. B 2001, 105, 3624–3634.

[167] J. W. Storer, D. J. Giesen, C. J. Cramer, D. G. Truhlar, “Class IV charge models: A new Semiempirical Approach in Quantum Chemistry”, J. Comput. Aided Mol. Des. 1995, 9, 87–110.

[168] J. Li, J. Xing, C. J. Cramer, D. G. Truhlar, “Accurate Dipole Moments From Hartree-Fock Calculations by Means of Class IV Charges”, J. Chem. Phys. 1999, 111, 885–892.

[169] P. Winget, J. D. Thompson, J. D. Xidos, C. J. Cramer, D. G. Truhlar, “Charge Model 3: A Class IV Charge Model Based on Hybrid Density Functional Theory with Variable Exchange”, J. Phys. Chem. A 2002, 106, 10707–10717.

[170] K. B. Wiberg, P. R. Rablen, “Comparison of Atomic Charges Derived via Different Procedures”, J. Comput. Chem. 1993, 14, 1504–1518.

[171] C. A. Coulson, H. C. Longuet-Higgins, “The Electronic Structure of Conjugated Systems. I. General Theory”, Proc. Roy. Soc. 1947, A191, 39–60.

[172] R. S. Mulliken, “Electronic Population Analysis on LCAO-MO Molecular Wavefunctions. I.”, J. Chem. Phys. 1955, 23, 1833–1840.

[173] P. Bultinck, W. Langenaeker, P. Lahorte, F. De Proft, P. Geerlings, M. Waroquier, J. P. Tollenaere, “The Electronegativity Equalization Method I: Parametrization and Validation for Atomic Charge Calculations”, J. Phys. Chem. A 2002, 106, 7887–7894.

[174] A. E. Reed, R. B. Weinstock, F. Weinhold, “Natural Population Analysis”, J. Chem. Phys. 1985, 83, 735–746.

[175] R. F. W. Bader, “A Quantum Theory of Molecular Structure and its Applications”, Chem. Rev. 1991, 91, 893–928.

[176] F. L. Hirshfeld, “Bonded-atom Fragments for Describing Molecular Charge Densities”, Theor. Chim. Acta 1977, 44, 129–138.

[177] F. De Proft, C. Van Alsenoy, A. Peeters, W. Langenaeker, P. Geerlings, “Atomic Charges, Dipole Moments, and Fukui Functions Using the Hirshfeld Partitioning of the Electron Density”, J. Comput. Chem. 2002, 23, 1198–1209.

[178] K. B. Wiberg, “Substituent Effects on the Acidity of Weak Acids. 1. Bicyclo[2.2.2]octane-1-carboxylic Acids and Bicyclo[1.1.1]pentane-1-carboxylic Acids”, J. Org. Chem. 2002, 67, 1613–1617.

[179] R. K. Roy, “Stockholders Charge Partitioning Technique. A Reliable Electron Population Analysis Scheme to Predict Intramolecular Reactivity Sequence”, J. Phys. Chem. A 2003, 107, 10428–10434.

[180] M. Mandado, C. Van Alsenoy, R. A. Mosquera, “Comparison of the AIM and Hirshfeld Totals, π, and σ Charge Distributions: A Study of Protonation and Hydride Addition Processes”, J. Phys. Chem. A 2004, 108, 7050–7055.

[181] L. E. Chirlian, M. M. Francl, “Atomic Charges Derived from Electrostatic Potentials: A Detailed Study”, J. Comput. Chem. 1987, 8, 894–905.

[182] C. M. Breneman, K. B. Wiberg, “Determining Atom-Centered Monopoles from Molecular Electrostatic Potentials. The Need for High Sampling Density in Formamide Conformational Analysis”, J. Comput. Chem. 1990, 11, 361–373.

[183] B. H. Besler, K. M. Merz Jr., P. A. Kollman, “Atomic Charges Derived from Semiempirical Methods”, J. Comput. Chem. 1990, 11, 431–439.

[184] B. Beck, T. Clark, R. C. Glen, “VESPA: A New, Fast Approach to Electrostatic Potential-Derived Atomic Charges from Semiempirical Methods”, J. Comput. Chem. 1997, 18, 744–756.

[185] E. Sigfridsson, U. Ryde, “Comparison of Methods for Deriving Atomic Charges from the Electrostatic Potential and Moments”, J. Comput. Chem. 1998, 19, 377–395.

[186] R. Soliva, M. Orozco, F. J. Luque, “Suitability of Density Functional Methods for Calculation of Electrostatic Properties”, J. Comput. Chem. 1997, 18, 980–991.

[187] M. Winter, “Chemistry: WebElements Periodic Table”, University of Sheffield, 1993–2005. www.webelements.com/

[188] K. B. Wiberg, J. J. Wendoloski, “The Electrical Nature of C–H Bonds and its Relationship to Infrared Intensities”, J. Comput. Chem. 1981, 2, 53–57.

[189] S. Fliszár, Charge Distributions and Chemical Effects, Springer-Verlag, New York, 1983.

[190] K. B. Wiberg, J. J. Wendoloski, “Charge Redistribution in the Molecular Vibrations of Acetylene, Ethylene, Ethane, Methane, Silane, and the Ion. Signs of the M-H Bond Moments”, J. Phys. Chem. 1984, 88, 586–593.

[191] A. E. Reed, F. Weinhold, “Some Remarks on the C–H Bond Dipole Moment”, J. Chem. Phys. 1986, 84, 2428–2430.

[192] H. Hollenstein, R. R. Marquardt, M. Quack, M. A. Suhm, “Dipole Moment Function and Equilibrium Structure of Methane in an Analytical, Anharmonic Nine-Dimensional Potential Surface Related to Experimental Rotational Constants and Transition Moments by Quantum Monte Carlo Calculations”, J. Chem. Phys. 1994, 101, 3588–3602.

[193] G. Monaco, “On the Definition of the Atomic Charge. Relationships Between ¹³C NMR Chemical Shifts, Dipole Moments, and Charges in Saturated Hydrocarbons”, Int. J. Quantum Chem. 1998, 68, 201–210.

[194] J. Gasteiger, M. D. Guillen, “Dipole Moments Obtained by Iterative Partial Equalisation of Orbital Electronegativity”, J. Chem. Research (M) 1983, 2611–2624.

[195] J. D. Thompson, C. J. Cramer, D. G. Truhlar, “Parametrization of Charge Model 3 for AM1, PM3, BLYP, and B3LYP”, J. Comput. Chem. 2003, 24, 1291–1304.

[196] M. Marsili, J. Gasteiger, “π-Charge Distributions from Molecular Topology and π-Orbital Electronegativity”, Croat. Chem. Acta 1980, 53, 601–614.

[197] R. S. Mulliken, “Electronic Structures of Polyatomic Molecules and Valence. V. Molecules RXn”, J. Chem. Phys. 1933, 1, 492–503.

[198] P. v. R. Schleyer, A. J. Kos, “The Importance of Negative (Anionic) Hyperconjugation”, Tetrahedron 1983, 39, 1141–1150.

[199] K. B. Wiberg, P. R. Rablen, “Origin of the Stability of Carbon Tetrafluoride: Negative Hyperconjugation Reexamined”, J. Am. Chem. Soc. 1993, 115, 614–625.

[200] I. A. Koppel, V. Pihl, J. Koppel, F. Anvia, R. W. Taft, “Thermodynamic Acidity of (CF3)3CH and 1H-Undecafluorobicyclo[2.2.1]heptane: The Concept of Anionic (Fluorine) Hyperconjugation”, J. Am. Chem. Soc. 1994, 116, 8654–8657.

[201] M. Gussioni, C. Castiglioni, G. Zerbi, “Molecular Point Charges as Derived From Infrared Intensities and from ab initio Calculations”, THEOCHEM 1986, 138, 203–212.

Curriculum Vitae

Date of birth: 02.09.1971

Place of birth: Marburg/Lahn

Nationality: German

Marital status: married, one child

Schooling

09/78 – 07/82 Primary school (Grundschule), Marburg/Wehrda

09/82 – 04/91 Martin-Luther-Gymnasium in Marburg

University Education

Winter semester 1991 – summer semester 1997 Studies in chemistry at the Philipps-Universität Marburg

07/96 – 04/97 Diploma thesis with Prof. Dr. B. Kadenbach: “Influence of ATP and ADP on the Respiratory Activity and the Membrane Potential of Reconstituted Cytochrome c Oxidase”

since 10/97 Doctoral studies with Prof. Dr. J. Gasteiger at the Computer-Chemie-Centrum, Institute of Organic Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg