Predikce Hodnot Pka Na Základě EEM Atomových Nábojů
Total Page:16
File Type:pdf, Size:1020Kb
MASARYKOVA UNIVERZITA PRˇ I´RODOVEˇ DECKA´ FAKULTA NA´ RODNI´ CENTRUM PRO VY´ZKUM BIOMOLEKUL Diplomova´pra´ce BRNO 2013 STANISLAV GEIDL MASARYKOVA UNIVERZITA PRˇ I´RODOVEˇ DECKA´ FAKULTA NA´ RODNI´ CENTRUM PRO VY´ZKUM BIOMOLEKUL Predikce hodnot pKa na za´kladeˇ EEM atomovy´ch na´boju˚ Diplomova´pra´ce Stanislav Geidl Vedoucı´pra´ce: prof. RNDr. Jaroslav Kocˇa, DrSc. Konzultant: RNDr. Radka Svobodova´Varˇekova´, Ph.D. Brno 2013 Bibliograficky´za´znam Autor: Bc. Stanislav Geidl Prˇı´rodoveˇdecka´fakulta, Masarykova univerzita Na´rodnı´centrum pro vy´zkum biomolekul Na´zev pra´ce: Predikce hodnot pKa na za´kladeˇEEM atomovy´ch na´boju˚ Studijnı´program: Biochemie Studijnı´obor: Chemoinformatika a bioinformatika Vedoucı´pra´ce: prof. RNDr. Jaroslav Kocˇa, DrSc. Akademicky´rok: 2012/2013 Pocˇet stran: xi + 78 Klı´cˇova´slova: disociacˇnı´konstanta, pKa, QSPR, vı´cerozmeˇrna´linea´rnı´ regrese, atomove´ na´boje, kvantova´ mechanika, EEM, fenoly, karboxylove´kyseliny Bibliographic Entry Author: Bc. Stanislav Geidl Faculty of Science, Masaryk University National Centre for Biomolecular Research Title of Thesis: Predicting pKa values from EEM atomic charges Degree Programme: Biochemistry Field of Study: Chemoinformatics and Bioinformatics Supervisor: prof. RNDr. Jaroslav Kocˇa, DrSc. Academic Year: 2012/2013 Number of Pages: xi + 78 Keywords: dissociation constant, pKa, QSPR, multilinear regres- sion, partial atomic charge, quantum mechanics, EEM, phenols, carboxilic acids Abstrakt Disociacˇnı´ konstanta pKa je velmi du˚lezˇitou vlastnostı´ molekuly, a proto je vy´voj spolehlivy´ch a rychly´ch metod pro predikci pKa jednou z klı´cˇovy´ch oblastı´ vy´zkumu. Tato pra´ce zjisˇt’uje, jestli lze predikovat pKa pomocı´QSPR modelu˚ vyuzˇı´vajı´cı´ch empiricke´atomove´na´boje, konkre´tneˇna´boje vypocˇı´tane´metodou EEM (Electronegativity Equalization Method). V ra´mci pra´ce jsme nejdrˇı´ve shro- ma´zˇdili 18 sad EEM parametru˚vytvorˇeny´ch pro 8 ru˚zny´ch kvantoveˇmechan- icky´ch (QM) na´bojovy´ch sche´mat. Pote´jsme si prˇipravili tre´ninkovou sadu 74 molekul substituovany´ch fenolu˚. Da´le jsme pro kazˇdou molekulu vytvorˇili jejı´ disociovanou formu tak, zˇe jsme odstranili fenolovy´vodı´k. Pro vsˇechny molekuly v tre´ninkove´sadeˇjsme pak vypocˇı´tali EEM na´boje pomocı´18 sad EEM parametru˚ a QM na´boje pomocı´8 zmı´neˇny´ch na´bojovy´ch sche´mat. Pro kazˇdy´typ QM a EEM na´boju˚jsme vytvorˇili jeden QSPR model, vyuzˇı´vajı´cı´na´boje z nedisociovane´ molekuly (QSPR model se trˇemi deskriptory), a jeden QSPR model, vyuzˇı´vajı´cı´ na´boje z disociovane´i nedisociovane´molekuly (QSPR model s peˇti deskriptory). Pote´jsme vypocˇı´tali krite´ria kvality modelu˚a zhodnotili vsˇechny vytvorˇene´QSPR modely. QSPR modely vyuzˇı´vajı´cı´EEM na´boje se uka´zaly jako vhodna´metodika 2 2 pro predikci pKa (63% teˇchto modelu˚ma´ R > 0.9 a nejlepsˇı´model ma´ R = 0.924). Jak bylo ocˇekva´no, QM QSPR modely poskytovaly prˇesneˇjsˇı´predikci pKa nezˇEEM QSPR modely, rozdı´l v prˇesnosti modelu˚vsˇak nebyl prˇı´lisˇvy´razny´. Navı´c velkou vy´hodou EEM QSPR modelu˚je, zˇe jejich deskriptory (tj., EEM na´boje) mohou by´t vypocˇı´ta´ny vy´razneˇrychleji nezˇQM na´boje. Kromeˇtoho jsme zjistili, zˇe EEM QSPR modely nejsou tak silneˇovlivneˇny vy´beˇrem metodiky vy´pocˇtu na´boju˚jako QM QSPR modely. Robustnost EEM QSPR modelu˚byla na´sledneˇpotvrzena cross- validacı´. Aplikovatelnost EEM QSPR modelu˚na dalsˇı´trˇı´dy chemicky´ch la´tek jsme uka´zali pomocı´prˇı´padove´studie, zameˇrˇene´na karboxylove´kyseliny. Souhrnneˇ mu˚zˇeme rˇı´ci, zˇe EEM QSPR modely poskytujı´rychlou a prˇesnou metodiku pro predikci pKa, a mohou by´t tedy pouzˇity naprˇ. v ra´mci virtua´lnı´ho screeningu. Abstract The acid dissociation constant pKa is a very important molecular property, and there is a strong interest in the development of reliable and fast methods for pKa prediction. We have evaluated the pKa prediction capabilities of QSPR models based on empirical atomic charges calculated by the Electronegativity Equaliza- tion Method (EEM). Specifically, we collected 18 EEM parameter sets created for 8 different quantum mechanical (QM) charge calculation schemes. Afterwards, we prepared a training set of 74 substituted phenols. Additionally, for each molecule we generated its dissociated form by removing the phenolic hydrogen. For all the molecules in the training set, we then calculated EEM charges using the 18 parameter sets, and the QM charges using the 8 above mentioned charge calcu- lation schemes. For each type of QM and EEM charges, we created one QSPR model employing charges from the non-dissociated molecules (three descriptor QSPR models), and one QSPR model based on charges from both dissociated and non-dissociated molecules (QSPR models with five descriptors). Afterwards, we calculated the quality criteria and evaluated all the QSPR models obtained. We found that QSPR models employing the EEM charges proved as a good approach 2 for the prediction of pKa (63% of these models had R > 0.9, while the best 2 had R = 0.924). As expected, QM QSPR models provided more accurate pKa predictions than the EEM QSPR models but the differences were not significant. Furthermore, a big advantage of the EEM QSPR models is that their descriptors (i.e., EEM atomic charges) can be calculated markedly faster than the QM charge descriptors. Moreover, we found that the EEM QSPR models are not so strongly influenced by the selection of the charge calculation approach as the QM QSPR models. The robustness of the EEM QSPR models was subsequently confirmed by cross-validation. The applicability of EEM QSPR models for other chemical classes was illustrated by a case study focused on carboxylic acids. In summary, EEM QSPR models constitute a fast and accurate pKa prediction approach that can be used in assigning charges to compounds in virtual screening. Podeˇkova´nı´ Ra´d bych na tomto mı´steˇ podeˇkoval sve´mu sˇkoliteli prof. Kocˇovi za jeho podporu a pomoc, bez ktere´by tato pra´ce nemohla vzniknout. Da´le bych ra´d podeˇkoval sve´odborne´konzultance Radce Svobobove´za vesˇkerou jejı´pomoc. V neposlednı´ rˇadeˇ bych chteˇl take´ podeˇkovat babicˇce a otci za jejich podporu prˇedevsˇı´m prˇi psanı´pra´ce. Prohla´sˇenı´ Prohlasˇuji, zˇe jsem svoji diplomovou pra´ci vypracoval samostatneˇs vyuzˇitı´m informacˇnı´ch zdroju˚, ktere´jsou v pra´ci citova´ny. Brno 14. kveˇtna 2013 . Stanislav Geidl Contents I Introduction1 II Theory4 1. Acid dissociation constant..............................................5 1.1 Definition............................................5 1.2 Methods for pKa prediction...............................6 2. Atomic charges.........................................................8 2.1 QM approaches for charge calculation.......................8 2.1.1 Quantum mechanics................................9 2.1.2 Charge distribution schemes.......................... 11 2.2 Empirical approaches for charge calculation................... 12 2.2.1 Electronegativity equalization method................... 13 3. Quantitative Structure-Property Relationship (QSPR).................. 15 3.1 Descriptors........................................... 15 3.2 Parametrization of models................................ 16 3.3 Validation of models.................................... 18 3.3.1 Cross-validation................................... 19 III Methods 20 4. Tools................................................................... 21 4.1 NCI database......................................... 21 4.2 Physprop............................................ 21 4.3 Gaussian............................................. 21 4.4 AIMAll.............................................. 22 4.5 EEM solver........................................... 22 4.6 QSPR designer........................................ 22 4.7 R................................................... 22 – ix – IV Results and discussion 23 5. Preparation of input data............................................... 24 5.1 Data sets............................................. 24 5.1.1 Data set for phenols................................. 24 5.1.2 Data set for carboxylic acids.......................... 25 5.2 pKa values............................................ 25 5.3 Charge calculation...................................... 25 5.3.1 EEM parameter sets................................. 25 5.3.2 QSPR model parameterization......................... 26 6. Building QSPR models for phenols.................................... 27 6.1 Selection of descriptors.................................. 27 6.2 Models.............................................. 27 6.2.1 Prediction of pKa using EEM charges.................... 28 6.2.2 Improvement of EEM QSPR models by removing outliers.... 33 6.2.3 Improvement of EEM QSPR models by adding descriptors... 33 6.2.4 Comparison of EEM and QM QSPR models............... 34 6.2.5 Influence of theory level and basis set................... 34 6.2.6 Influence of population analysis....................... 34 6.2.7 Influence of the EEM parameter set..................... 35 7. Validation QSPR models............................................... 36 7.1 Cross-validation....................................... 36 7.2 Comparison with previous work........................... 36 8. Case study for carboxylic acids......................................... 40 8.1 Selection of descriptors.................................. 40 8.2 Models.............................................. 40 9. Publication outputs...................................................