Study of the Similarity Between Molecules Through the Comparison of Thermo-Chemical Properties Tetsafe De Angeli

Abstract

Chemical similarity between molecules is a key concept in drug design and drug discovery. Advances in computational science have opened many new possibilities for understanding the differences and similarities between molecules. The QSAR (quantitative structure–activity relationship) paradigm is particularly important (1). Typical approaches to calculating chemical similarity use chemical fingerprints or QSAR, but these do not consider the thermochemical properties of the molecules. In other words, chemical similarity is described as the inverse of a measure of distance in descriptor space. This work presents a new method for calculating chemical similarity. Consider De Angeli's formula:

$$D = m\, e^{\frac{1}{3}\left(\ln C_p + \ln S^\circ + \ln \Delta H^\circ\right)}$$

In this equation, $m$ is the relative molecular mass of the molecule and, for the same molecule, $C_p$ is the specific heat capacity, $S^\circ$ is the entropy, and $\Delta H^\circ$ is the standard enthalpy of formation. Every molecule therefore has a D value given by this formula, and the D values can be compared to evaluate the similarity between two or more molecules (2).

1. Introduction

Many software packages are used to calculate chemical similarity in large databases of molecules. The different algorithms can predict the chemical similarity between two or more ligands (3). In the complex process of drug discovery, chemical similarity algorithms are very helpful and can improve the quality of the entire research effort (4). Most chemical similarity algorithms use topological 2D or 3D features (5-6). The basic idea behind all these programs and algorithms is that if two ligands share a similar structure, they will have similar bioactivity.
In other words, the interaction with the receptors (the biological activity of the ligands) is related to the structure of the ligands, and with information about the atomic and orbital structure of a single ligand it is possible to generate or screen a group of molecules related to that ligand. There are many functions (or metrics) that allow similarity to be calculated; their efficiency depends on the case under study (7). Typical approaches to calculating chemical similarity use chemical fingerprints or QSAR, but these do not consider the thermochemical properties of the molecules. In other words, chemical similarity is described as the inverse of a measure of distance in descriptor space. This work presents a new method for calculating chemical similarity: a value is assigned to each molecule on the basis of its thermochemical behaviour, with all molecules considered in the same environment (temperature, pressure).

2. Theory

The basic principle used in this method relates to the thermochemical properties of a molecule: if a molecule has certain atoms and specific bonds, then it will also show a specific behaviour in its thermochemical properties. This information can be used to build a function capable of calculating chemical similarity. If two molecules have similar thermochemical properties, the natural inference is that they have similar bonds, similar kinds of atoms, and a similar relative molar mass. By calculating the D value of De Angeli's formula, it is then possible to assign every molecule a specific number based on its thermochemical properties and relative molar mass. Comparing two molecules by their D values amounts to comparing them with respect to their thermochemical properties. To understand De Angeli's formula, it is necessary to consider the geometric mean.

Figure 1. The geometric mean of n values.
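As a worked step, the geometric mean can be written in logarithmic form, which is the form that appears in the exponent of De Angeli's formula. The symbols $G$ and $x_i$ are notation introduced here for illustration only; they are not taken from the original figure.

```latex
G = \left( \prod_{i=1}^{n} x_i \right)^{1/n}
  = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln x_i \right),
  \qquad x_i > 0
```

The identity holds only for positive values, since the logarithm of each $x_i$ must exist.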
In the case of De Angeli's formula, the n values are the three thermochemical data of a molecule: specific heat capacity, entropy, and standard enthalpy of formation. De Angeli's formula is therefore the geometric mean of the thermochemical properties of a molecule, multiplied by the relative molecular mass of the molecule.

$$D = m\, e^{\frac{1}{3}\left(\ln C_p + \ln S^\circ + \ln \Delta H^\circ\right)}$$

Figure 2. De Angeli's formula.

The relative molecular mass coefficient conveys the size of the molecule. The specific heat capacity indirectly reflects the heat properties of the atoms of the molecule. The standard enthalpy of formation gives indirect information about the strength of the various bonds in the molecule, and the entropy gives some information about the geometric distribution of the atoms of the molecule.

3. Test case and Calculation setup

The D value was calculated for a set of biological and chemical compounds. After a specific D value had been assigned to every molecule, all the D values were plotted. If the D value of a molecule A is close to the D value of another molecule B, the two molecules are taken to be very similar. To calculate D, thermochemical data from different sources on the NIST web site were used (8). For every molecule we used thermochemical information for the condensed phase of the compound (9-14).
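The calculation and the comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names `d_value` and `closest_by_d` are hypothetical, and ranking by the absolute difference of D values is one assumed reading of "similar D". The text does not state the units of the inputs or how negative enthalpies of formation are handled, so the sketch simply rejects non-positive inputs. The D values in `d_table` are copied from the tables in this paper.

```python
import math


def d_value(molar_mass, cp, entropy, enthalpy):
    """D = m * exp((ln Cp + ln S + ln dH) / 3): m times the geometric mean."""
    values = (cp, entropy, enthalpy)
    # Assumption: inputs must be positive so that every logarithm exists.
    if molar_mass <= 0 or any(v <= 0 for v in values):
        raise ValueError("all inputs must be positive")
    mean_log = sum(math.log(v) for v in values) / 3.0
    return molar_mass * math.exp(mean_log)


def closest_by_d(query, d_table):
    """Return (name, |D - D_query|) for every other compound, nearest first."""
    d_query = d_table[query]
    others = [
        (name, abs(d - d_query))
        for name, d in d_table.items()
        if name != query
    ]
    return sorted(others, key=lambda pair: pair[1])


# D values copied from the tables in this paper.
d_table = {
    "Pentane": 1425.044482,
    "2-Methylbutane": 1420.603451,
    "Neopentane": 1333.677279,
    "Cyclohexane": 1440.634748,
    "Benzene": 818.8368708,
}

# Illustrative inputs only (not real thermochemical data):
# the geometric mean of 1, 8 and 27 is 6, so D is 2 * 6.
print(d_value(2.0, 1.0, 8.0, 27.0))

# Rank the other compounds by how close their D value is to pentane's.
for name, gap in closest_by_d("Pentane", d_table):
    print(f"{name}: |dD| = {gap:.3f}")
```

With these five entries, 2-methylbutane has the D value nearest to pentane, followed by cyclohexane, neopentane, and benzene.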
Alkanes

| No. | D | Compound |
|---|---|---|
| 1 | 594.45 | Propane |
| 2 | 270.83 | Ethane |
| 3 | 906 | Butane |
| 4 | 1425.044482 | Pentane |
| 5 | 2153.279216 | n-Hexane |
| 6 | 2554.158409 | Heptane |
| 7 | 3249.777688 | Octane |
| 8 | 4054.666433 | Nonane |
| 9 | 4878.514027 | Decane |
| 10 | 5801.62439 | Undecane |
| 11 | 7168.818866 | Dodecane |

Isomers

| No. | D | Compound |
|---|---|---|
| 1 | 594.45 | Propane |
| 2 | 906.1011931 | Butane |
| 3 | 1425.044482 | Pentane |
| 4 | 2153.279216 | n-Hexane |
| 5 | 2554.158409 | Heptane |
| 6 | 1420.603451 | 2-Methylbutane |
| 7 | 1333.677279 | Neopentane |
| 8 | 883.3497569 | Isobutane |
| 9 | 1949.995758 | 2-Methylpentane |
| 10 | 1946.270473 | 3-Methylpentane |
| 11 | 1914.4928 | 2,3-Dimethylbutane |
| 12 | 1920.159327 | 2,2-Dimethylbutane |
| 13 | 2559.256805 | 2-Methylhexane |
| 14 | 2490.116913 | 3-Methylhexane |
| 15 | 2517.202573 | 2,2-Dimethylpentane |
| 16 | 2479.279482 | 2,3-Dimethylpentane |
| 17 | 2522.361073 | 2,4-Dimethylpentane |
| 18 | 2493.15688 | 3,3-Dimethylpentane |
| 19 | 2505.163559 | 3-Ethylpentane |
| 20 | 2463.169479 | 2,2,3-Trimethylbutane |
| 21 | 221.6174038 | Ethene |
| 22 | 311.9495632 | Propene |
| 23 | 197.13555 | 1-Propene homopolymer, isotactic |
| 24 | 777.2496938 | 1-Butene |
| 25 | 330.8662354 | cis-2-Butene |
| 26 | 343.8796565 | trans-2-Butene |
| 27 | 414.3815335 | 2-Methylpropene |
| 28 | 882.6430955 | 1-Pentene |
| 29 | 897.6132415 | cis-2-Pentene |
| 30 | 932.0194705 | trans-2-Pentene |
| 31 | 968.5487015 | 2-Methyl-2-butene |
| 32 | 943.1017209 | 2-Methyl-1-butene |
| 33 | 889.8345911 | 3-Methyl-1-butene |
| 34 | 805.0145602 | 1,3-Pentadiene |
| 35 | 936.8736387 | Isoprene |
| 36 | 1041.584612 | 3-Methyl-1,2-butadiene |

Cyclic compounds

| No. | D | Compound |
|---|---|---|
| 1 | 356.9898386 | Cyclopropane |
| 2 | 2083 | Cyclobutane |
| 3 | 430.7468306 | Methylcyclopropane |
| 4 | 980.5453202 | Cyclopentane |
| 5 | 943.8139642 | Methylenecyclobutane |
| 6 | 931.2711481 | 1,3-Cyclopentadiene |
| 7 | 1007.96185 | 1,3-Cyclohexadiene |
| 8 | 973.3474506 | 1,4-Cyclohexadiene |
| 9 | 818.8368708 | Benzene |
| 10 | 1440.634748 | Cyclohexane |
| 11 | 1864.861098 | Cycloheptane |
| 12 | 2382.960907 | Cyclooctane |

Biological compounds

| No. | D | Compound |
|---|---|---|
| 1 | 1805.393561 | Alanine |
| 2 | 1850.661086 | D-Alanine |
| 3 | 1833.15536 | DL-Alanine |
| 4 | 4721.55296 | Glutamic acid |
| 5 | 6605.625945 | Glutamine |
| 6 | 4549.010374 | N-DL-Alanylglycine |
| 7 | 3898.369352 | Leucine |
| 8 | 3768.022067 | Aminocaproic acid |
| 9 | 9685.869641 | Phenylalanine |
| 10 | 14204.74455 | L-Tryptophan |
| 11 | 1629.601424 | Glycerin |

Figure 3.
Various organic compounds, with their specific D values.

Figure 4. Graphical representation of the D values for the alkanes.

Figure 5. Graphical representation of the D values for various isomers.

Figure 6. Graphical representation of the D values for the cyclic compounds.

Figure 7. Graphical representation of the D values for various biological compounds.

Conclusion

As the plots of the D values show, the closer the D values of different molecules are, the more similar these molecules are in geometry and composition. With a large database, this method will make it possible to assess molecular similarity. The aim is to find a valid method for chemical similarity comparison, in order to provide efficient tools for drug design and drug discovery.

REFERENCES

1. Katritzky, A. R.; Fara, D. C.; Petrukhin, R. O.; Tatham, D. B.; Maran, U.; Lomaka, A.; Karelson, M. The present utility and future potential for medicinal chemistry of QSAR/QSPR with whole molecule descriptors. Curr. Top. Med. Chem. 2002, 2, 1333-1356.
2. Nilakantan, R.; Bauman, N.; Venkataraghavan, R. New method for rapid characterization of molecular shapes: applications in drug design. J. Chem. Inf. Comput. Sci. 1993, 33, 79-85.
3. Willett, P. Similarity and Clustering in Chemical Information Systems. Research Studies Press, Ltd., John Wiley & Sons: New York, 1987.
4. Sliwoski, G.; Kothiwale, S.; Meiler, J.; Lowe, E. W. Computational methods in drug discovery. Pharmacol. Rev. 2014, 66, 334-395. doi:10.1124/pr.112.007336
5. Perry, N. C.; van Geerestein, V. J.
Database searching on the basis of three-dimensional molecular similarity using the SPERM program. J. Chem. Inf. Comput. Sci. 1992, 32, 607-616.
6. Good, A. C.; Hodgkin, E. E.; Richards, W. G. Similarity screening of molecular data sets. J. Comput.-Aided Molec. Design 1992, 6, 513-520.
7. Arif, S. M.; Holliday, J. D.; Willett, P. Comparison of chemical similarity measures using different numbers of query structures. J. Inf. Sci. 2013, 39, 7-14.
8. P.J. Linstrom and W.G.
