J. Cent. South Univ. (2015) 22: 30−36 DOI: 10.1007/s11771-015-2491-0

A new group contribution-based method for estimation of flash point temperature of alkanes

DAI Yi-min(戴益民)1, 2, LIU Hui(刘辉)2, CHEN Xiao-qing(陈晓青 )1, LIU You-nian(刘又年)1, LI Xun(李浔)2, ZHU Zhi-ping(朱志平)2, ZHANG Yue-fei(张跃飞)2, CAO Zhong(曹忠)2

1. School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China; 2. School of Chemistry and Biological Engineering, Hunan Provincial Key Laboratory of Materials Protection for Electric Power and Transportation, Changsha University of Science and Technology, Changsha 410004, China

© Central South University Press and Springer-Verlag Berlin Heidelberg 2015

Abstract: Flash point is a primary property used to determine the fire and explosion hazards of a liquid. New group contribution-based models are presented for estimating the flash point of alkanes by multiple linear regression (MLR) and artificial neural network (ANN). The simple linear model shows a low average absolute relative deviation (AARD) of 2.8% for a data set of 50 flash points (40 in the training set and 10 in the validation set). Furthermore, the predictive ability of the model was evaluated using leave-one-out (LOO) cross validation. The results demonstrate that the ANN model is clearly superior in both fitness and prediction performance. The ANN model has an average absolute deviation of only 2.9 K and an average relative deviation of only 0.72%.

Key words: flash point; alkane; group contribution; artificial neural network (ANN); quantitative structure−property relationship (QSPR)

1 Introduction

As an important chemical engineering property, flash point (FP) plays a significant role in industrial processes when evaluating the process safety of flammable liquids. The flash point is defined as the lowest temperature at which a liquid produces flammable vapor near its surface that ignites spontaneously when mixed with air and brought into contact with a spark or a flame under standard atmospheric pressure [1−2].

The flash point temperature is frequently used to assess the fire hazard associated with the design of process, transportation and storage systems for flammable substances. Although measured flash point values obtained by the open-cup or closed-cup methods are available for many organic compounds [3], there are many important chemicals, such as toxic, volatile, explosive and radioactive compounds, for which no flash point data are given. Hence, convenient and reliable theoretical methods for predicting flash point are required.

Various methods for predicting the flash points of different classes of organic compounds are described in Refs. [4−9]. The first method for predicting the flash points of organic compounds from their molecular structures was developed by SUZUKI et al [4]. KESHAVARZ and GHANBARZADEH [5] developed a simple method for predicting the flash points of unsaturated hydrocarbons including alkenes, alkynes and aromatics. The numbers of carbon and hydrogen atoms are used as a core function that can be revised for some compounds by a correcting function. The predicted flash points of 173 unsaturated hydrocarbons are in good agreement with the measured values. KATRITZKY et al [2] developed a QSPR model for predicting the flash points of 758 organic compounds using geometrical, topological, quantum mechanical and electronic descriptors with linear and nonlinear methods. PAN et al [6] developed a back-propagation neural network based QSPR model for predicting the flash points of 92 alkanes using the group bond contribution method. In our previous work [7], we developed a prediction model for the flash point of hydrocarbons based on the novel topological electro-negativity indices YC and WC and the path number parameter P3. There are various reviews and monographs

Foundation item: Projects(21376031, 21075011) supported by the National Natural Science Foundation of China; Project(2012GK3058) supported by the Foundation of Hunan Provincial Science and Technology Department, China; Project supported by the Postdoctoral Science Foundation of Central South University, China; Project(2014CL01) supported by the Foundation of Hunan Provincial Key Laboratory of Materials Protection for Electric Power and Transportation, China; Project supported by the Innovation Experiment Program for University Students of Changsha University of Science and Technology, China
Received date: 2013−09−22; Accepted date: 2014−01−17
Corresponding author: DAI Yi-min, Associate Professor, PhD; Tel: +86−731−85258733; E-mail: [email protected]; CHEN Xiao-qing, Professor, PhD; Tel: +86−731−88830833; E-mail: [email protected]

for predicting flash points of different classes of organic compounds [8−9].

In this work, the main goal is to build a new, reliable group contribution-based model for estimating the flash point temperatures of alkanes.

2 Principles and methodologies

2.1 Data preparation

All the flash points of the 50 alkanes used in this work were taken from the chemical database of the Department of Chemistry at the University of Akron (USA) [10]. The flash point values of the alkanes range from 169 K to 304 K.

2.2 Molecular descriptors

Quantitative structure−property relationship (QSPR) is a method which relates physical properties to chemical structure-based parameters called "molecular descriptors" [11]. In QSPR, the first and most crucial step is to extract sufficient molecular structure information in numerical form from the molecular graph.

For an alkane molecule with n carbon atoms, each pair of adjacent vertices is linked by a single bond. Based on the number of C−C bonds connecting a carbon atom to the other carbon atoms in the molecule, the carbon atoms in all alkanes except methane can be classified into four types: primary (−CH3), secondary (>CH2), tertiary (−CH<) and quaternary (>C<) carbons, denoted C1, C2, C3 and C4, respectively. If an atom is linked to k (k=1, 2, 3, 4) non-hydrogen atoms through chemical bonds, it belongs to the k-th atom type. Thus, we suggest that the factors contributing to the properties of the alkane series comprise four parts: the contributions of primary (−CH3), secondary (>CH2), tertiary (−CH<) and quaternary (>C<) carbons. Therefore, if the specific chemical environment does not influence the carbon atoms, the property of alkanes P(alkane) can be expressed as

$$P_{(\mathrm{alkane})} = f_{(\mathrm{CH_3})} + f_{(\mathrm{CH_2})} + f_{(\mathrm{CH})} + f_{(\mathrm{C})}$$

In order to characterize the size of the molecule and the connectivity of each atom, we adopt a vertex matrix V and a distance matrix D to describe the molecular structure. The vertex matrix V, which distinguishes the level of branching of the molecule by atom type, is defined as follows:

$$V = \{V_i\} = [V_1 \; V_2 \; \cdots \; V_i \; \cdots \; V_n]$$

where $V_i$ is the atom type k of the ith atom.

The distance matrix D of n atoms in a molecule, a square symmetric matrix, can be expressed as $D=[d_{ij}]_{n\times n}$, where $d_{ij}$ is the length of the shortest path between the vertices i and j in the molecular skeleton graph. Because the interactions between atoms decrease as the distance between them increases, a new derivative matrix S is defined. The elements of the S matrix are the squares of the reciprocal distances, $1/d_{ij}^2$; when i=j, the element is set to 0. Matrices D and S are expressed as

$$D = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1i} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2i} & \cdots & d_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ d_{i1} & d_{i2} & \cdots & d_{ii} & \cdots & d_{in} \\ \vdots & \vdots & & \vdots & & \vdots \\ d_{n1} & d_{n2} & \cdots & d_{ni} & \cdots & d_{nn} \end{bmatrix}$$

$$S = \left[\frac{1}{d_{ij}^2}\right] = \begin{bmatrix} 1/d_{11}^2 & 1/d_{12}^2 & \cdots & 1/d_{1i}^2 & \cdots & 1/d_{1n}^2 \\ 1/d_{21}^2 & 1/d_{22}^2 & \cdots & 1/d_{2i}^2 & \cdots & 1/d_{2n}^2 \\ \vdots & \vdots & & \vdots & & \vdots \\ 1/d_{i1}^2 & 1/d_{i2}^2 & \cdots & 1/d_{ii}^2 & \cdots & 1/d_{in}^2 \\ \vdots & \vdots & & \vdots & & \vdots \\ 1/d_{n1}^2 & 1/d_{n2}^2 & \cdots & 1/d_{ni}^2 & \cdots & 1/d_{nn}^2 \end{bmatrix}$$

Consider a hydrogen-suppressed graph G={V, S} with n vertices (atoms). Of the two sets V and S, the former symbolizes the atoms in a molecule and the latter represents the covalent bonds between pairs of atoms. Theoretically, such a graph can effectively depict the structural information of chemical species [12]. Invariants derived from the graph can be used to characterize chemical structure. The revised matrix VS is then obtained by multiplying matrix V by matrix S:

$$VS = V \times S = [V_1 \; V_2 \; \cdots \; V_i \; \cdots \; V_n] \times \begin{bmatrix} 1/d_{11}^2 & 1/d_{12}^2 & \cdots & 1/d_{1i}^2 & \cdots & 1/d_{1n}^2 \\ 1/d_{21}^2 & 1/d_{22}^2 & \cdots & 1/d_{2i}^2 & \cdots & 1/d_{2n}^2 \\ \vdots & \vdots & & \vdots & & \vdots \\ 1/d_{i1}^2 & 1/d_{i2}^2 & \cdots & 1/d_{ii}^2 & \cdots & 1/d_{in}^2 \\ \vdots & \vdots & & \vdots & & \vdots \\ 1/d_{n1}^2 & 1/d_{n2}^2 & \cdots & 1/d_{ni}^2 & \cdots & 1/d_{nn}^2 \end{bmatrix} = [f_1 \; f_2 \; \cdots \; f_i \; \cdots \; f_n]$$

where $f_i$ is an element of the vector VS obtained by multiplying V by S; each $f_i$ encodes the structural information of the corresponding vertex in the molecule.

According to this definition, for the kth atom type in a molecular graph, the corresponding group contribution-based index, $\upsilon_k$, is the sum of the f values of all atoms of that type in the molecular graph:

$$\upsilon_1 = \sum f_{V=1}, \quad \upsilon_2 = \sum f_{V=2}, \quad \upsilon_3 = \sum f_{V=3}, \quad \upsilon_4 = \sum f_{V=4}$$

As an illustration, Fig. 1 depicts the hydrogen-depleted molecular graph of 2,2,4-trimethylpentane. We take 2,2,4-trimethylpentane as an example to compute the group contribution-based indices $\upsilon_1$, $\upsilon_2$, $\upsilon_3$ and $\upsilon_4$.
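The definitions of V, D, S and the indices can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `group_indices`, the bond-list input format and the use of breadth-first search for the shortest paths are all assumptions made here for the example.

```python
from collections import deque

def group_indices(bonds, n):
    """Return (v1, v2, v3, v4) for a hydrogen-suppressed alkane graph.

    bonds: list of (i, j) pairs of 0-based carbon indices (hypothetical input format)
    n: number of carbon atoms
    """
    adj = [[] for _ in range(n)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)

    # Vertex vector V: V_i is the number of carbon neighbours (atom type 1..4).
    V = [len(nb) for nb in adj]

    # Distance matrix D: shortest path lengths by breadth-first search.
    D = []
    for src in range(n):
        dist = [None] * n
        dist[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[w] is None:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        D.append(dist)

    # f_j = sum_i V_i / d_ij^2; the diagonal of S is zero, so i == j is skipped.
    f = [sum(V[i] / D[i][j] ** 2 for i in range(n) if i != j) for j in range(n)]

    # Index k is the sum of f over all atoms of type k.
    return tuple(sum(fj for fj, vj in zip(f, V) if vj == k) for k in (1, 2, 3, 4))

# 2,2,4-trimethylpentane: main chain atoms 0-4, methyls 5 and 6 on atom 1, methyl 7 on atom 3
tmp_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (1, 6), (3, 7)]
print([round(x, 4) for x in group_indices(tmp_bonds, 8)])
```

For 2,2,4-trimethylpentane the sketch agrees with the indices quoted in the paper to about three decimal places; the small difference in the first index comes from summing rounded versus unrounded f values.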

Fig. 1 Molecular skeleton graph of 2,2,4-trimethylpentane

The calculation includes

$$V = [1 \; 4 \; 2 \; 3 \; 1 \; 1 \; 1 \; 1]$$

$$D = \begin{bmatrix} 0 & 1 & 2 & 3 & 4 & 2 & 2 & 4 \\ 1 & 0 & 1 & 2 & 3 & 1 & 1 & 3 \\ 2 & 1 & 0 & 1 & 2 & 2 & 2 & 2 \\ 3 & 2 & 1 & 0 & 1 & 3 & 3 & 1 \\ 4 & 3 & 2 & 1 & 0 & 4 & 4 & 2 \\ 2 & 1 & 2 & 3 & 4 & 0 & 2 & 4 \\ 2 & 1 & 2 & 3 & 4 & 2 & 0 & 4 \\ 4 & 3 & 2 & 1 & 2 & 4 & 4 & 0 \end{bmatrix}$$

$$S = \begin{bmatrix} 0 & 1 & 1/4 & 1/9 & 1/16 & 1/4 & 1/4 & 1/16 \\ 1 & 0 & 1 & 1/4 & 1/9 & 1 & 1 & 1/9 \\ 1/4 & 1 & 0 & 1 & 1/4 & 1/4 & 1/4 & 1/4 \\ 1/9 & 1/4 & 1 & 0 & 1 & 1/9 & 1/9 & 1 \\ 1/16 & 1/9 & 1/4 & 1 & 0 & 1/16 & 1/16 & 1/4 \\ 1/4 & 1 & 1/4 & 1/9 & 1/16 & 0 & 1/4 & 1/16 \\ 1/4 & 1 & 1/4 & 1/9 & 1/16 & 1/4 & 0 & 1/16 \\ 1/16 & 1/9 & 1/4 & 1 & 1/4 & 1/16 & 1/16 & 0 \end{bmatrix}$$

$$VS = [1 \; 4 \; 2 \; 3 \; 1 \; 1 \; 1 \; 1] \times S = [5.4583 \; 5.9722 \; 8.2500 \; 5.3333 \; 4.3819 \; 5.4583 \; 5.4583 \; 4.3819]$$

$$\upsilon_1 = 25.1387, \quad \upsilon_2 = 8.2500, \quad \upsilon_3 = 5.3333, \quad \upsilon_4 = 5.9722$$

By the same method, $\upsilon_1$, $\upsilon_2$, $\upsilon_3$ and $\upsilon_4$ were calculated for all 50 alkanes, and the calculated values are listed in Table 1.

2.3 Model performance

Model performance can be measured by different metrics. $R^2$, which gives the fraction of explained variance for the training set, is used to measure the model's fit performance. The goodness of the model performance can be evaluated by the average absolute relative deviation (AARD) and the root mean square error (RMSE) [13]. The two terms can be calculated as

$$D_{\mathrm{AARD}} = \frac{100}{n} \sum_{i=1}^{n} \frac{|y_i - y_0|}{y_i} \tag{1}$$

$$E_{\mathrm{RMSE}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - y_0)^2}{n}} \tag{2}$$

where $y_i$ is the observed value, $y_0$ is the predicted value, and n is the number of compounds in the data set.

The internal predictive capability of the model is evaluated by leave-one-out cross-validation on the training set, which is calculated with the following equation [14]:

$$R^2_{\mathrm{LOO}} = 1 - \frac{\sum_{i=1}^{\mathrm{training}} (y_i - y_0)^2}{\sum_{i=1}^{\mathrm{training}} (y_i - \bar{y}_{\mathrm{tr}})^2} \tag{3}$$

where $y_i$ and $y_0$ are respectively the observed and predicted flash point values of the ith compound, and $\bar{y}_{\mathrm{tr}}$ is the mean observed flash point value of the compounds in the training set.

The external predictive capability of the model on the external validation set is evaluated by $Q^2_{\mathrm{ext}}$, which is calculated with the following equation [14]:

$$Q^2_{\mathrm{ext}} = 1 - \frac{\sum_{i=1}^{\mathrm{test}} (y_i - y_0)^2}{\sum_{i=1}^{\mathrm{test}} (y_i - \bar{y}_{\mathrm{tr}})^2} \tag{4}$$

where $y_i$ and $y_0$ are respectively the observed and predicted flash point values of the ith compound in the validation set, and $\bar{y}_{\mathrm{tr}}$ is the mean observed flash point value of the compounds in the training set.

3 Results and discussion

3.1 MLR calculation

The simplest expression of the fundamental principle of QSPR theory is a linear relationship P=a+bX between a property P and the chosen molecular descriptor X, where a and b are real numbers determined by a standard least-squares procedure [15]. Accordingly, multiple linear regression (MLR) is used to build the QSPR model, the intercepts and coefficients of which are reported with their 95% confidence interval. To test the stability of the model, leave-one-out (LOO) cross validation is carried out. The relationship between the molecular descriptors and the flash point of alkanes is obtained as follows:

$$T_{\mathrm{FP}} = 2.0840\upsilon_1 + 3.3168\upsilon_2 + 2.4973\upsilon_3 + 0.4694\upsilon_4 + 171.2100 \tag{5}$$

where $T_{\mathrm{FP}}$ is the flash point temperature.


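Table 1 Values of molecular descriptors and flash point temperature for 50 alkanes

| Set | No. | Compound | υ1 | υ2 | υ3 | υ4 | TFP/K (Exp.) | TFP/K (MLR) | TFP/K (ANN) |
|---|---|---|---|---|---|---|---|---|---|
| Training | 1 | Propane | 4.5000 | 2 | 0 | 0 | 169 | 187.2 | 169.0 |
| Training | 2 | Butane | 5.2222 | 6.5000 | 0 | 0 | 213 | 203.7 | 212.1 |
| Training | 3 | Pentane | 5.5694 | 11.7222 | 0 | 0 | 224 | 221.7 | 226.5 |
| Training | 4 | 2-methylbutane | 10.6944 | 4.5000 | 4.2500 | 0 | 216 | 219.0 | 216.0 |
| Training | 5 | 2,2-dimethylpropane | 19 | 0 | 0 | 4 | 208 | 212.7 | 208.0 |
| Training | 6 | 2-methylpentane | 11.0277 | 9.7222 | 4.6111 | 0 | 250 | 238.0 | 248.3 |
| Training | 7 | 3-methylpentane | 10.5138 | 9.7222 | 5.5000 | 0 | 241 | 239.1 | 241.0 |
| Training | 8 | 2,2-dimethylbutane | 19.3055 | 5.1111 | 0 | 5.2500 | 225 | 230.9 | 225.0 |
| Training | 9 | 2,3-dimethylbutane | 16.8888 | 0 | 11 | 0 | 244 | 233.9 | 241.5 |
| Training | 10 | 3-methylhexane | 10.7049 | 15.2916 | 5.8611 | 0 | 258 | 258.9 | 259.8 |
| Training | 11 | 3-ethylpentane | 9.9582 | 15.6666 | 6.7500 | 0 | 255 | 260.8 | 257.3 |
| Training | 12 | 2,2,3-trimethylbutane | 25.5832 | 0 | 6.7500 | 6.5000 | 247 | 244.4 | 247.0 |
| Training | 13 | 2,2-dimethylpentane | 18.9860 | 11.3333 | 0 | 5.6111 | 250 | 251.0 | 252.9 |
| Training | 14 | 2,3-dimethylpentane | 16.9166 | 5.2222 | 12.6191 | 0 | 258 | 255.3 | 258.0 |
| Training | 15 | 3,3-dimethylpentane | 17.9582 | 12.2222 | 0 | 6.5000 | 254 | 252.2 | 253.0 |
| Training | 16 | Octane | 6.0064 | 28.9760 | 0 | 0 | 286 | 279.8 | 286.0 |
| Training | 17 | 2,2-dimethylhexane | 19.2538 | 17.0763 | 0 | 5.7847 | 269 | 270.7 | 270.8 |
| Training | 18 | 2,3-dimethylhexane | 16.9168 | 10.7916 | 13.1458 | 0 | 283 | 275.1 | 276.8 |
| Training | 19 | 2,5-dimethylhexane | 16.9588 | 12.9444 | 9.9166 | 0 | 271 | 274.3 | 273.3 |
| Training | 20 | 3,3-dimethylhexane | 18.1354 | 18.1527 | 0 | 6.8611 | 272 | 272.4 | 271.2 |
| Training | 21 | 3,4-dimethylhexane | 16.3576 | 10.7916 | 14.2222 | 0 | 277 | 276.6 | 278.9 |
| Training | 22 | 3-ethyl-3-methylpentane | 16.8751 | 19.4166 | 0 | 7.7500 | 276 | 274.4 | 272.2 |
| Training | 23 | 2-methylheptane | 11.4328 | 21.0660 | 4.8872 | 0 | 277 | 277.1 | 277.7 |
| Training | 24 | 3-methylheptane | 10.8719 | 21.0660 | 6.0347 | 0 | 279 | 278.8 | 280.3 |
| Training | 25 | 2,2,3-trimethylpentane | 25.3749 | 6.5833 | 8 | 6.8611 | 270 | 269.1 | 271.9 |
| Training | 26 | 2,3,3-trimethylpentane | 24.8610 | 6.4722 | 7.1111 | 7.7500 | 273 | 265.9 | 271.1 |
| Training | 27 | 2,3,4-trimethylpentane | 23.2220 | 0 | 20.4444 | 0 | 273 | 270.7 | 273.0 |
| Training | 28 | Nonane | 6.0784 | 34.9824 | 0 | 0 | 304 | 299.9 | 304.0 |
| Training | 29 | 2,2-dimethylheptane | 19.4556 | 22.9532 | 0 | 5.8872 | 297 | 290.6 | 292.0 |
| Training | 30 | 2,3-dimethylheptane | 17.1169 | 16.5660 | 13.4219 | 0 | 288 | 295.3 | 290.4 |
| Training | 31 | 2,4-dimethylheptane | 16.6967 | 18.3438 | 11.8316 | 0 | 288 | 296.4 | 290.2 |
| Training | 32 | 2,6-dimethylheptane | 17.0912 | 18.8610 | 9.9794 | 0 | 299 | 294.3 | 288.0 |
| Training | 33 | 3,4-dimethylheptane | 16.5108 | 16.5660 | 14.7569 | 0 | 288 | 297.4 | 292.1 |
| Training | 34 | 4,4-dimethylheptane | 18.2432 | 24.2882 | 0 | 7.2222 | 288 | 293.2 | 292.7 |
| Training | 35 | 3-methyloctane | 11.0014 | 26.9760 | 6.1372 | 0 | 297 | 298.9 | 294.7 |
| Training | 36 | 2,2,4-trimethylhexane | 24.8788 | 14.9930 | 6.5833 | 6.1458 | 288 | 292.1 | 292.4 |
| Training | 37 | 2,3,3-trimethylhexane | 25.0695 | 12.4028 | 7.2847 | 8.1111 | 288 | 286.6 | 285.5 |
| Training | 38 | 2,3,5-trimethylhexane | 22.9588 | 7.7222 | 18.9999 | 0 | 288 | 292.1 | 288.4 |
| Training | 39 | 2,2,3,4-tetramethylpentane | 32.2500 | 0 | 15.8333 | 7.2222 | 284 | 281.3 | 286.1 |
| Training | 40 | 2,2,4,4-tetramethylpentane | 33.7914 | 9.5000 | 0 | 12.6670 | 276 | 279.1 | 276.0 |
| Validation | 1 | 2-methylpropane | 10.5000 | 0 | 3 | 0 | 186 | 200.6 | 207.9 |
| Validation | 2 | Hexane | 5.7744 | 17.2916 | 0 | 0 | 250 | 240.6 | 246.7 |
| Validation | 3 | Heptane | 5.9100 | 23.0660 | 0 | 0 | 269 | 260.0 | 255.0 |
| Validation | 4 | 2,4-dimethylpentane | 16.8332 | 7 | 9.9444 | 0 | 261 | 254.3 | 253.3 |
| Validation | 5 | 2,4-dimethylhexane | 16.5418 | 12.5694 | 11.3680 | 0 | 283 | 275.8 | 275.9 |
| Validation | 6 | 2,2,4-trimethylpentane | 25.1387 | 8.2500 | 5.3333 | 5.9722 | 261 | 267.1 | 264.4 |
| Validation | 7 | 2,2,3,3-tetramethylbutane | 34.9998 | 0 | 0 | 15.5000 | 273 | 251.4 | 263.0 |
| Validation | 8 | 3,5-dimethylheptane | 16.1810 | 18.3438 | 12.7916 | 0 | 288 | 297.7 | 291.4 |
| Validation | 9 | 3,3-diethylpentane | 15.4168 | 27.3332 | 0 | 9 | 294 | 298.2 | 295.2 |
| Validation | 10 | 2,2,3,3-tetramethylpentane | 34.2637 | 6.8333 | 0 | 17.1110 | 273 | 273.3 | 259.4 |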
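The metrics of Eqs. (1) and (2) can be checked directly against the validation rows of Table 1. The following is a minimal sketch: the helper names `aard` and `rmse` and the list-of-pairs layout are assumptions made for illustration, while the numbers are the experimental and MLR-predicted flash points from the table.

```python
import math

# (experimental, MLR-predicted) flash points in K for the validation set of Table 1
validation = [
    (186, 200.6), (250, 240.6), (269, 260.0), (261, 254.3), (283, 275.8),
    (261, 267.1), (273, 251.4), (288, 297.7), (294, 298.2), (273, 273.3),
]

def aard(pairs):
    """Average absolute relative deviation in percent, following Eq. (1)."""
    return 100.0 / len(pairs) * sum(abs(y - y0) / y for y, y0 in pairs)

def rmse(pairs):
    """Root mean square error, following Eq. (2)."""
    return math.sqrt(sum((y - y0) ** 2 for y, y0 in pairs) / len(pairs))

print(f"AARD = {aard(validation):.2f} %, RMSE = {rmse(validation):.1f} K")
```

Because the tabulated predictions are rounded to 0.1 K, the results should be read as approximate; they come out at about 3.52% and 10.5 K for the MLR model on the validation set.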


For our general four-descriptor model, the cross-validation correlation coefficient $R^2_{\mathrm{CV}} = 0.9537$, compared with the correlation coefficient $R^2 = 0.9599$, indicates the high stability of the regression model. The predicted flash point temperatures are presented in Table 1. For the 40 alkanes in the training set, the model shows an average absolute relative deviation (AARD) of 1.78%, a standard deviation of 6.2 K and a root mean square error (RMSE) of 5.8 K. For the 10 alkanes in the validation set, the model has an AARD of 3.52% and an RMSE of 10.5 K, and the correlation coefficient $Q^2$ is 0.8730. Thus, the observed flash point temperatures can be successfully described by the new group contribution-based method.

The analysis of plots has been shown to be very useful for confirming the quality of a model or detecting anomalies [16]. The 50 data points of the data set are plotted in Fig. 2, which shows that the observed flash points versus those predicted with Eq. (5) follow a straight line. Figure 2 and the correlation coefficient $R^2$ of Eq. (5) indicate that there is a good match between the observed and predicted values of flash point.

Fig. 2 Observed and predicted flash point temperature TFP using MLR for 50 alkanes

In order to investigate the error distribution, the plot of residuals versus observed flash point values is presented in Fig. 3. As can be seen from Fig. 3, the residuals are randomly distributed with respect to the predicted and observed flash point temperatures and follow no pattern. The residuals seldom exceed ±2S, where S is the standard deviation, and there are no systematic errors, which is in agreement with general multiple linear regression theory. The results indicate that only propane, 2,2,3,3-tetramethylbutane and 2-methylpropane reach the standard deviation limits of ±3S. Therefore, the absence of data clustering suggests that the four-variable model is satisfactory.

Fig. 3 Plot of residuals versus observed flash point values for model

3.2 ANN calculation

In many cases, the best multivariate linear model obtained is not accurate enough. Therefore, the nonlinear behavior of this model must be examined. One of the best methods for considering nonlinear behavior is the use of neural networks [17]. An optimized three-layer back-propagation artificial neural network is applied, with the transfer functions logsig(x) and purelin(x) for the hidden layer and the output layer, respectively [18]. With the ANN method, very satisfactory results are obtained for both the training set and the validation set. With this model, $R^2$ for the training set increases to 0.9898, and the root mean square error and average relative deviation are reduced to 2.9 K and 0.72%, respectively. For the validation set, RMSE and AARD are reduced to 9.4 K and 3.37%, respectively, showing the good generalization ability of the ANN model. Compared with the MLR model, the ANN model is clearly superior in both fitness and prediction performance. The smaller scatter of data points in Fig. 4 also demonstrates this.

Fig. 4 Observed and predicted flash point temperatures using ANN for 50 alkanes
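The structure of a three-layer network with a logsig hidden layer and a purelin output layer can be sketched as a single forward pass. This is an illustration only: the hidden-layer size, the random weights and the function name `forward` are hypothetical, since the text does not report the trained architecture or parameters, and no back-propagation training is shown.

```python
import math
import random

def logsig(x):
    """Logistic sigmoid, used here as the hidden-layer transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def purelin(x):
    """Linear (identity) transfer function for the output layer."""
    return x

def forward(x, W1, b1, W2, b2):
    """Forward pass: 4 descriptors -> hidden layer (logsig) -> 1 output (purelin)."""
    hidden = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return purelin(sum(w * h for w, h in zip(W2, hidden)) + b2)

# Hypothetical, untrained parameters; the paper does not give the hidden-layer size.
rng = random.Random(0)
n_hidden = 3
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
b2 = 0.0

# Descriptor vector of 2,2,4-trimethylpentane from Table 1
x = [25.1387, 8.2500, 5.3333, 5.9722]
print(forward(x, W1, b1, W2, b2))
```

With untrained weights the output has no physical meaning; in practice the weights would be fitted to the training set by back-propagation, and the inputs would typically be scaled first.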

3.3 Applicability domain (AD)

QSPR models are developed on a defined domain of compounds based on the properties and structures of the training set compounds. The applicability domain (AD) of a model is another validation metric for checking QSPR models [19]. Predictions of a QSPR model are more reliable if the compounds predicted are within the applicability domain of the model. When a compound is very dissimilar to all compounds of the modeling set, reliable prediction of its property is uncertain [20]. To visualize the AD of a QSPR model, the Williams plot, a plot of standardized residuals (RES) versus leverage (hat diagonal) values (h), was used for immediate and simple graphical detection of outliers; its horizontal and vertical straight lines indicate the limits of normal values.

First, X outliers describe the leverage of the compounds on the model. The leverage value indicates how far a compound's structure lies from the centroid of X. The leverage of a compound in the original variable space is defined as [21]

$$h_i = x_i^{\mathrm{T}} (X^{\mathrm{T}} X)^{-1} x_i \tag{6}$$

where $x_i$ is the descriptor vector of the considered compound and X is the descriptor matrix derived from the training set descriptor values. The warning leverage ($h^*$) is defined as [22]

$$h^* = \frac{3p}{n} \tag{7}$$

where p is the number of model variables plus one and n is the number of objects used to calculate the model.

Second, Y outliers describe the impacts of the Euclidean distances of the objects, measured by the cross-validated standardized residuals. A cross-validated standardized residual greater than three standard deviation units, ±3σ, classifies the compound as a response outlier [17]. Accordingly, chemicals with a leverage value (h) greater than the warning leverage ($h^*$) and with a standardized residual beyond ±3σ are considered outside the structural chemical domain.

Based on the Williams plot, the applicability domain of the molecules was assessed, and the leverage values of all the molecules are plotted in Fig. 5. From this plot, the applicability domain is established inside a square area bounded by ±3 standard deviations and a leverage threshold $h^*$ of 0.375 (p = 5 and n = 40 in Eq. (7)). As can be seen from Fig. 5, none of the chemicals is identified as an obvious outlier when the limit of normal values for the Y outliers is set to three standard deviation units. Propane is identified only as a critical value in the training set. For the validation set, 2,2,3,3-tetramethylbutane is predicted by the model, but because of its high leverage value, beyond the vertical hat line, it lies outside the applicability domain. Hence, it is a compound of high structural influence in the model. This was already found from Fig. 3.

Fig. 5 Williams plot of FP model (Eq. (5)) for training and validation sets (h*=0.375)

3.4 Comparison with previous models

VAZHEV et al [23] applied transformed infrared spectra as descriptors of molecular structure to predict the flash points of 85 alkanes. ALBAHRI [24] used the structural group contribution method for the prediction of flash points of hydrocarbons including alkanes. PAN et al [6] constructed models of the relationships between the structures and flash points of 92 alkanes by means of ANNs using the group bond contribution method. Simulated with the final optimum BP ANN, the results show that the predicted flash points were in good agreement with the experimental data, with an average absolute deviation of 4.8 K and a root mean square error of 6.86 K. A simple comparison between these models and the one presented in this work shows that our four-descriptor ANN model has an average absolute deviation of only 2.9 K and an average relative deviation of only 0.72%. In addition, the number of parameters used in this study is lower than that in other models.

4 Conclusions

The flash points of compounds are important in terms of both practical use and safety. In this work, by the use of multiple linear regression (MLR) and artificial neural network (ANN), new group contribution-based QSPR models are developed for the flash points of alkanes. The leverage approach is used to define the applicability domain of the model. The results demonstrate that the calculated flash point values are in good agreement with the experimental ones, and that the performance of the ANN model is superior to that of the multiple linear regression model. The high $R^2$ and the low average absolute relative deviation (AARD) and root mean square error (RMSE) obtained from the models suggest that the model has good predictability.

References

[1] National Fire Protection Agency. Fire protection guide on hazardous materials [M]. 10th ed. Quincy: NFPA, 1991.
[2] KATRITZKY A R, STOYANOVA-SLAVOVA I B, DOBCHEV D A, KARELSON M. QSPR modeling of flash points: An update [J]. J Mol Graph Model, 2007, 26: 529−536.
[3] JONES J C, GODEFROY J. A reappraisal of the flash point of formic acid [J]. J Loss Prev Process Ind, 2002, 15: 245−247.
[4] SUZUKI T, OHTAGUCHI K, KOIDE K. A method for estimating flash points of organic compounds from molecular structures [J]. J Chem Eng Jpn, 1991, 24: 258−261.
[5] KESHAVARZ M H, GHANBARZADEH M. Simple method for reliable predicting flash points of unsaturated hydrocarbons [J]. J Hazard Mater, 2011, 193: 335−341.
[6] PAN Y, JIANG J, WANG Z. Quantitative structure-property relationship studies for predicting flash points of alkanes using group bond contribution method with back-propagation neural network [J]. J Hazard Mater, 2007, 147: 424−430.
[7] DAI Yi-min, LI Xun, CAO Zhong, YANG Dao-wu, HUANG Ke-long. Modeling flash point scale of hydrocarbon by novel topological electro-negativity indices [J]. The Chemical Industry and Engineering Society of China Journal, 2009, 60(10): 2420−2425. (in Chinese)
[8] KATRITZKY A R, KUANAR M, SLAVOV S, HALL C D, KARELSON M, KAHN I, DOBCHEV D A. Quantitative correlation of physical and chemical properties with chemical structure: Utility for prediction [J]. Chem Rev, 2010, 110: 5714−5789.
[9] LIU X, LIU Z. Research progress on flash point prediction [J]. J Chem Eng Data, 2010, 55: 2943−2950.
[10] Hazardous Chemicals data. NFPA49. PC-49−94 [OB/OL]. [1994] http://ull.chemistry.uakron.edu/erd/index.html.
[11] GHARAGHEIZI F, ESLAMIMANESH A, ILANI-KASHKOULI P, RICHON D, MOHAMMADI A H. QSPR molecular approach for representation/prediction of very large vapor pressure dataset [J]. Chem Eng Sci, 2012, 76: 99−107.
[12] ZHOU C Y, NIE C M, LI S, LI Z H. A novel semi-empirical topological descriptor Nt and the application to study on QSPR/QSAR [J]. J Comput Chem, 2007, 28: 2413−2423.
[13] BAGHERI M, GANDOMI A H, GOLBRAIKH A. Simple yet accurate prediction method for sublimation enthalpies of organic contaminants using their molecular structure [J]. Thermochim Acta, 2012, 543: 96−106.
[14] GRAMATICA P. Principles of QSAR models validation: Internal and external [J]. QSAR & Combinatorial Science, 2007, 26: 694−701.
[15] DUCHOWICZ P R, CASTRO E A, FERNANDEZ F M, GONZALEZ M P. A new search algorithm for QSPR/QSAR theories: normal boiling points of some organic molecules [J]. Chem Phys Lett, 2005, 412: 376−380.
[16] DAI Yi-min, LIU You-nian, LI Xun, CAO Zhong, ZHU Zhi-ping, YANG Dao-wu. Estimation of surface tension of organic compounds using QSPR [J]. Journal of Central South University, 2012, 19(1): 93−100.
[17] BAGHERI M, BORHANI T N G, ZAHEDI G. Estimation of flash point and autoignition temperature of organic sulfur chemicals [J]. Energy Convers Manage, 2012, 58: 185−196.
[18] DAI Yi-min, HUANG Ke-long, LI Xun, CAO Zhong, ZHU Zhi-ping, YANG Dao-wu. Simulation of 13C NMR chemical shifts of carbinol carbon atoms by using quantitative structure-spectrum relationships [J]. Journal of Central South University of Technology, 2011, 18(2): 323−340.
[19] GHARAGHEIZI F. Prediction of upper flammability limit percent of pure compound from their molecular structures [J]. J Hazard Mater, 2009, 167: 507−510.
[20] ROY K, KABIR H. QSPR with extended topochemical atom (ETA) indices: Exploring effects of hydrophobicity, branching and electronic parameters on logCMC values of anionic surfactants [J]. Chem Eng Sci, 2013, 87: 141−151.
[21] ERIKSSON L, JAWORSKA J, WORTH A P, CRONIN M T D, MCDOWELL R M, GRAMATICA P. Methods for reliability and uncertainty assessment and for applicability evaluations of classification and regression-based QSARs [J]. Environ Health Perspect, 2003, 111: 1361−1375.
[22] PAN Y, JIANG J, WANG R, CAO H Y, CUI Y. A novel QSPR model for prediction of lower flammability limits of organic compounds based on support vector machine [J]. J Hazard Mater, 2009, 168: 962−969.
[23] VAZHEV V V, ALDABERGENOV M K, VAZHEVA N V. Estimation of flash points and molecular masses of alkanes from their IR spectra [J]. Petrol Chem, 2006, 46: 136−139.
[24] ALBAHRI T A. Flammability characteristics of pure hydrocarbons [J]. Chem Eng Sci, 2003, 58: 3629−3641.

(Edited by YANG Hua)