Available online at www.sciencedirect.com
Progress in Natural Science 18 (2008) 867–872 www.elsevier.com/locate/pnsc
Prediction of soot–water partition coefficients for selected persistent organic pollutants from theoretical molecular descriptors
Qing Zhang, Jun Huang, Gang Yu *
POPs Research Centre, Department of Environmental Science and Engineering, Tsinghua University, Beijing 100084, China
Received 11 December 2007; received in revised form 2 February 2008; accepted 2 February 2008
Abstract
Quantitative structure–property relationship (QSPR) models were developed for soot–water partition coefficient ($K_{SC}$) values of selected persistent organic pollutants (POPs), i.e. 10 polychlorinated dibenzo-p-dioxins and dibenzofurans, nine polychlorinated biphenyls, four polycyclic aromatic hydrocarbons and two polybrominated diphenyl ethers, using partial least squares (PLS) regression. Quantum chemical descriptors computed by the parameterized model revision 3 (PM3) Hamiltonian method were used as predictor variables. The cross-validated $Q^2_{cum}$ value for the optimal QSPR model is 0.844, indicating a good predictive capability for the $\log K_{SC}$ values of these chemicals. The QSPR results showed that average molecular polarizability ($\alpha$), standard heat of formation ($\Delta H_f$) and energy of the lowest unoccupied molecular orbital ($E_{LUMO}$) have dominant effects on $K_{SC}$ of POPs. The results suggested that $\log K_{SC}$ values of POPs increase with the increase of $\alpha$. Conversely, $\log K_{SC}$ values decrease with the increase of $E_{LUMO}$ and $\Delta H_f$ of POPs.
© 2008 National Natural Science Foundation of China and Chinese Academy of Sciences. Published by Elsevier Limited and Science in China Press. All rights reserved.
Keywords: Quantitative structure–property relationships; Black carbon; Persistent organic pollutants; Quantum chemical descriptors; Partial least squares
* Corresponding author. Tel.: +86 10 6278 7137; fax: +86 10 6279 4006. E-mail address: [email protected] (G. Yu).
1002-0071/$ - see front matter. doi:10.1016/j.pnsc.2008.02.006

1. Introduction

Persistent organic pollutants (POPs) are ubiquitous hydrophobic organic compounds (HOCs), and some congeners (such as 2,3,7,8-polychlorinated dibenzo-p-dioxins and dibenzofurans) are strongly toxic. In recent decades, POPs have been regarded as typical endocrine disrupting chemicals with great potential risk to human health [1,2]. Human exposure to POPs has drawn great attention because of their wide occurrence and their adverse impacts on ecosystems and human health [3,4]. Recently, extensive research has been dedicated to evaluating the environmental behaviors of POPs for the purpose of their environmental risk assessment [5–7].

Sorption of POPs to soils and sediments is a key process controlling the environmental fate of such compounds [7,8]. For instance, sorption can limit both biological uptake and microbial degradation of POPs. Hence, a quantitative understanding of POP sorption is a necessary prerequisite for estimating the various hazards that organic pollutants may pose in the environment. Generally, total particulate organic matter has been used as the experimental sorbent to investigate the sorption behaviors of HOCs in the environment [9]. However, recent reports indicated that soot carbon (or black carbon), produced mainly from biomass burning and fossil fuel combustion, is a significantly better sorbent than total particulate organic matter for HOCs [5,10–13]. The enhanced sorption capacity may be attributed to the differences in physicochemical properties between soot and particulate organic matter in terms of surface area, elemental composition, functional groups, and so on. Furthermore, experimental results in the field showed that sorption behaviors of POPs cannot be explained by organic matter absorption alone. Recent soot sorption experiments revealed that the environmental fate of POPs may also be affected by interaction with soot carbon [14]. Investigations on the interaction between soot carbon and HOCs are of great importance to fully understand the partition and exposure of POP molecules in environmental systems [9,15–18].

The soot–water partition coefficient ($K_{SC}$) is one of the most important parameters characterizing sorption behaviors of HOCs, and this parameter is indispensable for environmental risk assessment of these chemicals. So far, few studies have been performed on soot–water partition of POPs. For example, some researchers [15,18,19] determined $K_{SC}$ values of a series of POPs using the soot cosolvency-column method, specially designed to study interaction of highly hydrophobic compounds with strongly sorbing matrices such as soot. They found that the affinity of POPs to soot was considerably stronger than that predicted by bulk organic matter partitioning models. Unfortunately, experimentally determined $K_{SC}$ values are available for only a limited number of POPs because of large expenditures of money and time. Furthermore, because of the limited number of standard POPs, it seems impossible to measure $K_{SC}$ values for all the other POPs. Thus, in order to quickly estimate the environmental behavior of other known POPs, quantitative structure–property relationship (QSPR) models relating soot–water partition coefficients to theoretical molecular descriptors of POPs may be applied to evaluate and predict $\log K_{SC}$ values. Reliable and stable QSPR models can predict $\log K_{SC}$ values efficiently and give some insight into the interactional mechanisms between POPs and soot particles. The purpose of this study is to develop reliable QSPR models to estimate $K_{SC}$ values and deduce the probable interactional mechanisms for soot–water partition of POPs.

Quantum chemical descriptors, such as standard heat of formation ($\Delta H_f$), electronic energy (EE), total energy (TE) and dipole moment ($\mu$), can clearly describe some defined molecular properties. Since they can be easily obtained by computation and are not restricted to closely related compounds, the development of QSPR models in which quantum chemical descriptors are used is of great importance. The partial least squares (PLS) algorithm can analyze data with strongly collinear, noisy and numerous variables [20]. It can not only search the relationship between dependent variables and predictor variables, but also reduce the dimension of the matrices while concurrently maximizing the relationship between the descriptors. In the present study, the PLS algorithm was applied to analyze the interactions between POPs and black carbon affected by quantum chemical descriptors.

2. Materials and methods

2.1. Data set

The $\log K_{SC}$ data determined with the soot cosolvency-column method for ten polychlorinated dibenzo-p-dioxins and dibenzofurans (PCDD/Fs), nine polychlorinated biphenyls (PCBs), four polycyclic aromatic hydrocarbons (PAHs) and two polybrominated diphenyl ethers (PBDEs) [15,18,19] were used to develop QSPR models. In the soot–water partition experiments, diesel particulate soot matter (standard reference material SRM-1650) from the US National Institute of Standards and Technology (NIST, Gaithersburg, MD, USA) was selected as it represents one of the environmentally most relevant anthropogenic types of soot [21,22]. Moreover, 48% of its solid phase consists of soot carbon, and its surface properties are well characterized [21]. The $\log K_{SC}$ data are listed in Table 1. The results demonstrated that a nearly twofold difference exists between the known $\log K_{SC}$ value for naphthalene ($\log K_{SC}$ = 4.55–4.72) and that for 2,4,5,2′,4′-pentabromodiphenylether ($\log K_{SC}$ = 8.11 ± 0.06). In the present study, these reported soot–water partition coefficients were employed as dependent variables (see Table 1).

2.2. Theoretical molecular structural descriptors

Quantum chemical descriptors of POPs were used to develop QSPR models in the present study. These descriptors were obtained from MOPAC 2000 in CS Chem3D Ultra (Ver. 6.0), and the quantum parameters were computed using the semi-empirical parameterized model revision 3 (PM3) Hamiltonian method. The molecular structures were optimized within MOPAC 2000 using eigenvector following [23], a geometry optimization procedure, with the Gradient Norm optimization criterion set at 0.1. A total of 16 derived descriptors reflecting the overall characters of the POP molecules were used in this study. A full list is given in Table 2. The values of the selected molecular descriptors are summarized in Table 3, and the others for the studied compounds are available on request. The compound numbers in Table 3 correspond to those in Table 1.

The molecular orbital energies of a given molecule are related to chemical reactivity. Inductive effects and resonance effects exerted by the presence of different substituents and substructural groups within the molecule affect the electron partition and stability of the molecular orbitals. The combinations of frontier molecular orbital energies, $E_{LUMO} - E_{HOMO}$, $(E_{LUMO} - E_{HOMO})^2$ and $E_{LUMO} + E_{HOMO}$, which were proven to be significant in previous QSPR studies of POPs [24,25], were also selected as predictor variables. $E_{LUMO} - E_{HOMO}$ and $E_{LUMO} + E_{HOMO}$ can be related to absolute hardness and electronegativity, respectively [26,27].

2.3. Statistical analysis

QSPR models were developed using PLS regression, as implemented in the Simca (Simca-S Version 6.0, Umetri AB and Erisoft AB) software. The conditions for the computation were based on the default values of the software. The criterion used to determine the model dimensionality, i.e. the number of significant PLS components, is cross validation (CV). With CV, when the fraction of the total variation of the dependent variables that can be predicted by a component, $Q^2$, for the whole data set is larger than
Table 1
The soot–water partition coefficients ($\log K_{SC}$, in $L_w/kg_{SC}$) of selected POPs

| No. | Compound | Abbreviation | Data from Ref. | $\log K_{SC}$ (Obs.) | $\log K_{SC}$ (Pred.) | SE-Pred. | Residual |
|---|---|---|---|---|---|---|---|
| 1 | Naphthalene | NAP | [19] | 4.63 ± 0.08 | 4.93 | ±0.13 | -0.30 |
| 2 | Fluorene | FLU | [19] | 6.03 ± 0.17 | 5.39 | ±0.10 | 0.64 |
| 3 | Phenanthrene | PHE | [19] | 6.62 ± 0.14 | 5.93 | ±0.08 | 0.69 |
| 4 | Pyrene | PYR | [19] | 7.03 ± 0.16 | 7.21 | ±0.10 | -0.18 |
| 5 | Biphenyl | Biphenyl | [18] | 5.09 ± 0.05 | 5.26 | ±0.11 | -0.17 |
| 6 | 4-Chlorobiphenyl | 4-PCB | [18] | 6.07 ± 0.05 | 5.88 | ±0.08 | 0.19 |
| 7 | 3,5-Dichlorobiphenyl | 3,5-PCB | [18] | 6.19 ± 0.04 | 6.32 | ±0.07 | -0.13 |
| 8 | 4,4′-Dichlorobiphenyl | 4,4′-PCB | [18] | 6.35 ± 0.04 | 6.52 | ±0.07 | -0.17 |
| 9 | 2-Chlorobiphenyl | 2-PCB | [18] | 5.25 ± 0.04 | 5.77 | ±0.08 | -0.52 |
| 10 | 2,2′-Dichlorobiphenyl | 2,2′-PCB | [18] | 5.26 ± 0.02 | 5.35 | ±0.11 | -0.09 |
| 11 | 2,6-Dichlorobiphenyl | 2,6-PCB | [18] | 5.51 ± 0.02 | 5.58 | ±0.09 | -0.07 |
| 12 | 2,2′,6-Trichlorobiphenyl | 2,2′,6-PCB | [18] | 5.58 ± 0.05 | 5.77 | ±0.08 | -0.19 |
| 13 | 2,2′,6,6′-Tetrachlorobiphenyl | 2,2′,6,6′-PCB | [18] | 5.51 ± 0.09 | 6.12 | ±0.07 | -0.61 |
| 14 | 2,4,2′,4′-Tetrabromodiphenylether | 2,4,2′,4′-TBDE | [15] | 7.43 ± 0.08 | 7.18 | ±0.09 | 0.25 |
| 15 | 2,4,5,2′,4′-Pentabromodiphenylether | 2,4,5,2′,4′-PBDE | [15] | 8.11 ± 0.06 | 7.61 | ±0.12 | 0.50 |
| 16 | Dibenzo-p-dioxin | DD | [15] | 6.13 ± 0.05 | 5.86 | ±0.08 | 0.27 |
| 17 | 1-Chlorodibenzo-p-dioxin | 1-MCDD | [15] | 6.42 ± 0.05 | 6.30 | ±0.07 | 0.12 |
| 18 | 1,6-Dichlorodibenzo-p-dioxin | 1,6-DCDD | [15] | 6.73 ± 0.06 | 6.74 | ±0.07 | -0.01 |
| 19 | 1,2,4-Trichlorodibenzo-p-dioxin | 1,2,4-TrCDD | [15] | 6.99 ± 0.08 | 7.36 | ±0.10 | -0.37 |
| 20 | 1,3,6,8-Tetrachlorodibenzo-p-dioxin | 1,3,6,8-TCDD | [15] | 7.51 ± 0.11 | 7.90 | ±0.14 | -0.39 |
| 21 | Dibenzofuran | DF | [15] | 5.87 ± 0.04 | 5.76 | ±0.08 | 0.11 |
| 22 | 2-Chlorodibenzofuran | 2-MCDF | [15] | 6.37 ± 0.05 | 6.31 | ±0.07 | 0.06 |
| 23 | 2,8-Dichlorodibenzofuran | 2,8-DCDF | [15] | 7.10 ± 0.04 | 6.85 | ±0.08 | 0.25 |
| 24 | 2,4,8-Trichlorodibenzofuran | 2,4,8-TrCDF | [15] | 7.50 ± 0.02 | 7.34 | ±0.10 | 0.16 |
| 25 | 1,3,6,8-Tetrachlorodibenzofuran | 1,3,6,8-TCDF | [15] | 7.81 ± 0.16 | 7.86 | ±0.14 | -0.05 |

Pred.: predicted $\log K_{SC}$ values. SE-Pred.: standard errors for the predicted $\log K_{SC}$ values. Residual = $\log K_{SC}$(Obs.) − $\log K_{SC}$(Pred.).
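The observed and predicted columns of Table 1 are enough to cross-check the fit statistics quoted for the model. The sketch below is illustrative only. It assumes that R is the square root of 1 − SSE/SST and that SE uses n − A − 1 degrees of freedom with A = 1 PLS component; neither definition is spelled out in the text, and the function name `fit_statistics` is hypothetical.

```python
import math

# Observed and predicted log K_SC values for compounds 1-25, copied from Table 1.
OBSERVED = [4.63, 6.03, 6.62, 7.03, 5.09, 6.07, 6.19, 6.35, 5.25, 5.26,
            5.51, 5.58, 5.51, 7.43, 8.11, 6.13, 6.42, 6.73, 6.99, 7.51,
            5.87, 6.37, 7.10, 7.50, 7.81]
PREDICTED = [4.93, 5.39, 5.93, 7.21, 5.26, 5.88, 6.32, 6.52, 5.77, 5.35,
             5.58, 5.77, 6.12, 7.18, 7.61, 5.86, 6.30, 6.74, 7.36, 7.90,
             5.76, 6.31, 6.85, 7.34, 7.86]


def fit_statistics(observed, predicted, n_components=1):
    """Return (residuals, R, SE) under the assumed definitions.

    Assumption: R = sqrt(1 - SSE/SST) and SE = sqrt(SSE / (n - A - 1)).
    """
    n = len(observed)
    # Residual = observed minus predicted, as defined under Table 1.
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r = math.sqrt(1.0 - sse / sst)
    se = math.sqrt(sse / (n - n_components - 1))
    return residuals, r, se


residuals, r, se = fit_statistics(OBSERVED, PREDICTED)
print(f"R = {r:.3f}, SE = {se:.3f}")
```

Under these assumptions the values come out close to the reported R = 0.933 and SE = 0.336; small differences are expected because Table 1 is rounded to two decimals.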
a significance limit (0.097), the tested PLS component is considered significant. Model adequacy was mainly characterized by the number of observations used for model building in the training set, the number of PLS principal components (A), $Q^2_{cum}$, the correlation coefficient between observed and fitted values (R), the general standard error (SE) [28–31] and the significance level (p).

Table 2
List of molecular structural descriptors of POPs

| Symbol | Description |
|---|---|
| Mw | Molecular weight |
| $\Delta H_f$ (kcal) | Standard heat of formation |
| TE (eV) | Total energy |
| EE (eV) | Electronic energy |
| CCR (eV) | Core-core repulsion energy |
| $E_{HOMO}$ (eV) | The energy of the highest occupied molecular orbital |
| $E_{HOMO-1}$ (eV) | The energy of the second highest occupied molecular orbital |
| $E_{LUMO}$ (eV) | The energy of the lowest unoccupied molecular orbital |
| $E_{LUMO+1}$ (eV) | The energy of the second lowest unoccupied molecular orbital |
| $Q_C^-$ (a.c.u) | The largest negative atomic charges on carbon atoms |
| $Q_H^+$ (a.c.u) | The most positive net atomic charges on hydrogen atoms |
| $\mu$ (Debye) | Dipole moment |
| $\alpha$ (a.u) | Average molecular polarizability |

3. Results

Variable importance in the projection (VIP) is a parameter of PLS analysis that shows the importance of a variable in a PLS model. Terms with a large VIP value are the most relevant for explaining the dependent variable. To obtain an optimal model, the PLS analysis procedure is as follows: first, a PLS model with all the predictor variables is calculated; second, the variable with the lowest VIP value is eliminated and a new PLS regression is performed, resulting in a new PLS model. This procedure is repeated until only the major predictor variables remain. The optimal PLS model is then selected based on the statistical values of $Q^2_{cum}$, R, and SE.

Following the procedure described above, Model (1) was obtained for $\log K_{SC}$ of selected POPs. Based on the unscaled pseudo-regression coefficients of the independent variables and constants transformed from the PLS results, the analytical QSPR equation was obtained and is shown in Eq. (1):

$$\log K_{SC} = 2.768 - 1.072 \times 10^{-2}\,\Delta H_f - 1.175\,E_{LUMO} + 2.527 \times 10^{-2}\,\alpha \qquad (1)$$

$n = 25$, $A = 1$, $R^2_{X(adj.)(cum)} = 0.324$, $R^2_{Y(adj.)(cum)} = 0.865$, Eig = 1.705, $Q^2_{cum} = 0.844$, $R = 0.933$, SE = 0.336, where $R^2_{X(adj.)(cum)}$ and $R^2_{Y(adj.)(cum)}$ stand for cumulative variance
Table 3
Selected quantum chemical descriptors for POPs used in this study

| No. | $\alpha$ (a.u) | $E_{LUMO}$ (eV) | $\Delta H_f$ (kcal) |
|---|---|---|---|
| 1 | 83.815 | -0.408 | 40.674 |
| 2 | 108.829 | -0.335 | 48.868 |
| 3 | 123.590 | -0.535 | 55.026 |
| 4 | 147.556 | -1.292 | 75.103 |
| 5 | 102.002 | -0.361 | 47.579 |
| 6 | 114.668 | -0.556 | 40.696 |
| 7 | 123.529 | -0.677 | 34.321 |
| 8 | 128.414 | -0.741 | 33.958 |
| 9 | 113.246 | -0.575 | 49.607 |
| 10 | 115.427 | -0.062 | 37.929 |
| 11 | 120.456 | -0.146 | 38.153 |
| 12 | 125.040 | -0.167 | 33.030 |
| 13 | 134.706 | -0.209 | 28.178 |
| 14 | 149.012 | -0.524 | -2.691 |
| 15 | 159.396 | -0.622 | -7.981 |
| 16 | 110.126 | -0.179 | -9.497 |
| 17 | 119.883 | -0.297 | -14.582 |
| 18 | 129.800 | -0.408 | -19.520 |
| 19 | 143.179 | -0.604 | -24.539 |
| 20 | 155.955 | -0.722 | -31.554 |
| 21 | 106.908 | -0.476 | 25.337 |
| 22 | 118.424 | -0.636 | 18.748 |
| 23 | 130.141 | -0.785 | 12.261 |
| 24 | 141.438 | -0.913 | 7.510 |
| 25 | 152.667 | -1.072 | 1.998 |

The compound numbers correspond to those in Table 1.
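To illustrate how the fitted model is applied, the sketch below evaluates Eq. (1) for two compounds using their Table 3 descriptors. It is a hedged example rather than the authors' code: the function name `predict_log_ksc` is hypothetical, and the negative signs on $E_{LUMO}$ (and on $\Delta H_f$ for dibenzo-p-dioxin) are assumed, chosen because they reproduce the predicted values listed in Table 1.

```python
def predict_log_ksc(alpha, e_lumo, delta_hf):
    """Evaluate Eq. (1): log K_SC from three PM3 descriptors.

    alpha    -- average molecular polarizability (a.u.)
    e_lumo   -- energy of the lowest unoccupied molecular orbital (eV)
    delta_hf -- standard heat of formation (kcal)
    """
    return 2.768 - 1.072e-2 * delta_hf - 1.175 * e_lumo + 2.527e-2 * alpha


# Descriptor values from Table 3; the signs are an assumption (see lead-in).
DESCRIPTORS = {
    "naphthalene (No. 1)": (83.815, -0.408, 40.674),
    "dibenzo-p-dioxin (No. 16)": (110.126, -0.179, -9.497),
}

predictions = {
    name: predict_log_ksc(alpha, e_lumo, delta_hf)
    for name, (alpha, e_lumo, delta_hf) in DESCRIPTORS.items()
}

for name, value in predictions.items():
    print(f"{name}: predicted log K_SC = {value:.2f}")
```

For these two compounds the results round to the predicted values given in Table 1 (4.93 and 5.86), which gives some confidence that the coefficients and signs have been read correctly.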
of all the X's and Y's, respectively, explained by all extracted components, and Eig stands for the eigenvalue, which denotes the importance of the PLS principal components. As seen from Eq. (1), one PLS principal component was selected in Model (1), which explained 32.4% of the variance of the predictor variables and 86.5% of the variance of the dependent variable. Model (1) includes three predictor variables (i.e. $\alpha$, $E_{LUMO}$ and $\Delta H_f$); the VIP values for the three predictor variables are 1.233, 0.918 and 0.798, respectively. A plot of observed and predicted $\log K_{SC}$ values from Model (1) is shown in Fig. 1. The compound numbers in Fig. 1 correspond to those in Table 1.

Fig. 1. The correlated plots of observed and predicted $\log K_{SC}$ values of selected POPs. (The numbers correspond to those in Table 1.)

4. Discussion

capability. The model may be used to make predictions for other POPs, and the predictions may give an initial estimation of soot–water partition of these HOCs. The results obtained from Model (1) show that the main factors affecting $\log K_{SC}$ values of POPs are $\alpha$, $E_{LUMO}$ and $\Delta H_f$.

4.1. External validation

To verify the real predictive capability of Model (1), external validation was adopted in this study. Leave-n-out cross validation, which is similar to leave-one-out cross validation, was carried out for this model. Every compound was placed in a validation set at least once. R, SE, and p values were adopted to assess the modeling performance of the models. For Model (1), leave-five (20%)-out cross validation was carried out five times, with five different left-out compounds for each model. All the regression statistical values of the five external validation models were >0.88 for R, <0.36 for SE, and <0.0001 for p, respectively. Analytical QSPR equations were obtained. The results are shown in Eqs. (2)–(6) (the numbers correspond to those in Table 1):

Leave-five (1, 6, 11, 16, 21)-out:

$$\log K_{SC} = 2.586 - 9.817 \times 10^{-3}\,\Delta H_f - 1.097\,E_{LUMO} + 2.680 \times 10^{-2}\,\alpha \qquad (2)$$

$n = 20$, $A = 1$, $Q^2_{cum} = 0.808$, $R = 0.919$, SE = 0.365

Leave-five (2, 7, 12, 17, 22)-out:

$$\log K_{SC} = 2.649 - 1.127 \times 10^{-2}\,\Delta H_f - 1.226\,E_{LUMO} + 2.587 \times 10^{-2}\,\alpha \qquad (3)$$

$n = 20$, $A = 1$, $Q^2_{cum} = 0.879$, $R = 0.944$, SE = 0.340

Leave-five (3, 8, 13, 18, 23)-out: