Search for Useful Graph Theoretical Invariants of Molecular Structure

Search for Useful Graph Theoretical Invariants of Molecular Structure

60 J. Chem. In$ Comput. Sci. 1988, 28, 60-68 Search for Useful Graph Theoretical Invariants of Molecular Structure MILAN RANDIe,+ PETER J. HANSEN,* and PETER C. JURS* Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802 Received July 10, 1987 We have reexamined several graph theoretical invariants that have been used in the past in discussions of structure-property and structure-activity correlations in the search for optimal single-variabledescriptors for molecular structures. We found that judicious selection of the functional form for a correlation can lead to significant improvements in the correlations. The approach is illustrated with the connectivity index, the molecular ID number, the Wiener number, and the Hosoya topological index. We review the most suitable forms for the considered indices for correlations with boiling point. These findings have been further supported by considering alternative (polynomial expansion) models. The best results are about 2-3 times better (when measured by the magnitude of the standard deviation) than the best considered previously. INTRODUCTION Table I. Selection of Graph Theoretical Indices, Their Origin, and Their Structural Foundation The starting point in many theoretical considerations con- graph theoretical cerns the choice of the basis for the description of the system invariant name/symbol comments and ref under examination. In quantum chemical computations the basis means the selection of the orbitals (atomic) from which path numbers Pi ref 25 nonadjacent bonds P(W ref 3 and 7 one builds molecular wave functions, and the orbitals con- (matching) sidered in the past included Slater-type atomic orbitals, dou- Z = XdG,k) ble-zeta STO, Hermite-type extensions of STO, Gaussian sum of path W ref 2 atomic orbitals and contracted Gaussian sets, Hylaraas or- lengths bitals, geminals, etc. In various problems different bases offer sum over weighted connectivity index x ref 1; based on different advantages, and it is up to the practitioner to select bonds discrimination of the best basis for the problem considered. In mathematical bond types (m,n) and physical computations (including among others the sum over atom higher connectivity ref 26 evaluation of molecular integrals) one similarly considers weighted paths indices,m alternative coordinate systems, and in addition to the more weighed average J ref 12; defined similarly common Cartesian and polar coordinates, one may in different distances to x but distance applications choose between confocal ellipsoidal, confocal between atoms matrix is used sum over weighted identification number ref 18; becomes same as paraboloidal, conical cylindrical, ellipsoidal, elliptic cylindrical, path lengths ID W if bond weighting oblate spheroidal, parabolic, parabolic cylindrical, prolate is eliminated spheroidal, or toroidal and of course the even more general count of bonds shape attribute K ref 21 curvilinear coordinates if the problem warrants. It is generally relative to considered advantageous to have such a variety, and one does extremes not expect objections if for a selected problem one considers generalized ref 21 yet another coordinate system as more suitable. In contrast, weighted bonds when it comes to the problem of describing molecular struc- sum tures and novel molecular descriptors are introduced, one is index (largest) ref 14 defensive and under pressure not to proliferate yet another eigenvalue pruning sequence centric index ref 13 index or parameter. When a new coordinate system is pro- of terminal posed, it is done so in connection with solving a particular vertices problem, and one expects that the new approach is either count of paths of higher shape attributes ref 28 simpler, adds to the problem some insights, or solves a problem length 2 relative mX that was not solvable with alternative schemes. One does not to extreme invent a coordinate system for coordinate system’s sake, isomers modified X’ this work; exponent in however, and the same prudence must accompany the intro- connectivity (mn)kvaried duction of novel structural (and graph theoretical) invariants. In Table I we list several graph theoretical invariants, indi- not been used is hardly warranted. In this paper we critically cating their origin and structural foundation. Not all the examine a selection of graph theoretical indices to see if they invariants, graph theoretical indices, and more general offer optimal molecular descriptions, and if not, in which way mathematical objects (polynomials, sequences) have been used they ought to be modified. The purpose of this paper is not as extensively as others. Some, like the connectivity indices, to develop a recommendation for any specific index but to have been discussed and applied in hundreds of papers, while outline a way of testing indices. By use of the language of others have been used very little. The number of uses of a coordinate systems, this amounts to the outline of an approach single index is not, of course, the best criterion for its evalu- that can guide one in selecting one or a very few suitable ation, but on the other hand the evaluation of indices that have coordinate systems for a problem of interest. In analytical calculus and its applications, the problem of selection of the On sabbatical leave from the Department of Mathematics and Computer coordinate system usually does not arise, because all the co- Science, Drake University, Des Moines, IA 503 11, and Ames Laboratory-DOE, ordinate systems (listed above) have simple symmetry prop- Iowa State University, Ames, IA 50011. *Onsabbatical leave from the Department of Chemistry, Northwestern College, erties and this usually suffices to make the choice obvious. But Orange City, IA 51041. in structure-property or structure-activity studies it is by no 0095-2338/88/1628-0060%01.50/0 0 1988 American Chemical Society GRAPHTHEORETICAL INVARIANTS OF MOLECULARSTRUCTURE J. Chem. Znf. Comput. Sci., Vol. 28, No. 2, 1988 61 Table 11. Experimental Boiling Points for Lower Alkanes and Predicted Values Based on Simple Linear Regression for the Connectivity Index X, Wiener Number W,and Hosoya Topological Index Z (Revised from Ref 4). bP RandiE bP Wiener bP Hosoya bP compound (obsd), OC index (x) (calcd), OC no. (W) (calcd) index (Z) (calcd), "C ethane -88.63 1 .ooo -70.5 1 1 -36.27 2 -26.16 propane -42.07 1.414 -40.24 4 -27.97 3 -18.31 2-methylpropane -1 1.73 1.732 -16.99 9 -14.13 4 -10.46 n-butane -0.50 1.914 -3.68 10 -1 1.36 5 -2.61 2,2-dimethylpropane 9.50 2.000 2.61 16 5.24 5 -2.61 2-methylbutane 27.85 2.270 22.35 18 10.78 7 13.09 n-pentane 36.07 2.414 32.88 20 16.37 8 20.94 2,2-dimethylbutane 49.74 2.561 43.63 28 38.46 9 28.79 2,3-dimethylbutane 57.99 2.643 49.62 29 41.22 10 36.64 2-methylpentane 60.27 2.770 58.91 32 49.53 11 44.49 3-methylpentane 63.28 2.808 61.69 31 46.76 12 52.34 n-hexane 68.74 2.914 69.44 35 57.83 13 60.19 2,2-dimethylpentane 79.20 3.061 80.19 46 86.57 14 68.04 2,4-dimethylpentane 80.50 3.126 84.94 48 93.81 15 75.89 2,2,3-trimethylbutane 80.88 2.943 71.56 42 77.20 13 60.19 3,3-dimethylpentane 86.03 3.121 84.58 44 82.74 16 83.74 2,3-dimethylpentane 89.78 3.181 88.96 46 88.28 17 91.59 2-methylhexane 90.05 3.270 95.47 52 104.88 18 99.44 3-methylhexane 91.85 3.308 98.25 50 99.35 19 107.30 3-ethylpentane 93.48 3.346 101.03 48 93.81 20 115.15 n- heptane 98.42 3.414 106.00 56 115.92 21 123.00 correlation coefficient 0.9914 0.9432 0.9128 estimated standard deviation 6.746 17.09 21.01 means obvious which index or indices may be best for a ogous compounds (e.g., fatty acids). Hence, its use here is particular study. The question then is, how should one choose somewhat improper since it was not meant to be used alone. among graph descriptors? Nevertheless, it is a well-defined parameter, representing the We will consider the boiling points of alkanes, specifically sum of the lengths of all paths in a molecule, where paths are all alkanes from C2 to C,, a total of 21 structures. The counted in their graph theoretical sensee5 Platt6 tried to in- particular property of boiling point is used for illustrative terpret Wand suggested that W is a measure of molecular purposes only. Our aim is not to suggest a better correlation volume. The number can be derived from the distance matrixS for the boiling points of alkanes (for this it would be advisable simply as the sum of all distances in a structure, that is, the to enlarge the sample by including octanes, possibly nonanes, sum of the distances between all pairs of carbon atoms (for decanes, etc.). Our aim is to outline the strategy for selecting acyclic structures, such as the alkanes considered here). graph theoretical indices (invariants), to examine more closely Finally, Hosoya's 2 number is based on a count of nonad- why some give better answers than others, and to seek con- jacent bonds. If p(G,k) represents the number of different ditions that may improve the correlations. ways of selecting k nonadjacent bonds in graph G (e.g., a We start with the connectivity index, x,' the Wiener num- structure), then 2 is the sum of p(G,k) for k equal to zero, ber, W,2 and the Hosoya topological index,3 Z, and show in up to half the number of atoms [by definition, p(G,O) = 1 and Table I1 a comparison of the direct use of these three indices p(G,l) = number of bonds].

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    9 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us