This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright Author's personal copy

Chemosphere 86 (2012) 634–640

Contents lists available at SciVerse ScienceDirect

Chemosphere

journal homepage: www.elsevier.com/locate/chemosphere

Linear and non-linear relationships between soil sorption and hydrophobicity: Model, validation and influencing factors

Yang Wen, Li M. Su, Wei C. Qin, Ling Fu, Jia He, Yuan H. Zhao

Key Laboratory for Wetland Ecology and Vegetation Restoration of National Environmental Protection, Department of Environmental Sciences, Northeast Normal University, Changchun, Jilin 130024, PR China

Article history: Received 5 July 2011; Received in revised form 1 November 2011; Accepted 1 November 2011; Available online 9 December 2011

Keywords: Sorption; Partition coefficient; Validation; Soil/sediment; Hydrophobicity

Abstract

The hydrophobic parameter represented by the octanol/water partition coefficient (logP) is commonly used to predict the soil sorption coefficient (Koc). However, a simple non-linear relationship between logKoc and logP has not been reported in the literature. In the present paper, soil sorption data for 701 compounds were investigated. The results show two patterns:

- logKoc is linearly related to logP for compounds with logP in the range 0.5–7.5.
- logKoc is non-linearly related to logP for compounds over a wide range of logP.

A non-linear model between logKoc and logP has been developed for a wide range of compounds in the training set. This model was validated with an external test set of 107 compounds, using average error (AE), average absolute error (AAE) and root-mean-squared error (RMSE). Its predictive capacity was nearly the same as that of existing models. However, this non-linear model is simple and uses only one parameter.

The best model developed in this paper is a non-linear model with six correction factors for six specific classes of compounds. This model predicts logKoc well for 701 diverse compounds, with AAE = 0.37. The systematic deviations in these groups may be attributed to:

- differences in sorption mechanism for hydrophilic/polar compounds;
- low solubility of highly hydrophobic compounds;
- hydrolysis of esters in solution;
- volatilization of volatile compounds;
- large experimental errors for compounds with extremely high or low sorption coefficients.

© 2011 Elsevier Ltd. All rights reserved.

Corresponding author (Y.H. Zhao). Tel.: +86 431 85099550; fax: +86 431 85955338.

0045-6535/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.chemosphere.2011.11.001

1. Introduction

Sorption processes play a major role in determining the environmental transport and fate of organic chemicals. Transport and fate depend on how a chemical distributes between the solid and aqueous phases. The parameter most frequently used to indicate the soil mobility of a chemical is the soil sorption distribution coefficient, Kd. Kd is defined as the ratio of the solute concentration in the solid phase to that in the aqueous phase at equilibrium. Kd is usually normalized to the organic carbon content foc, yielding Koc = Kd/foc. This normalization reflects the fact that soil organic carbon is the major sorption domain for neutral organic compounds (Schüürmann et al., 2006).

The sorption of an organic chemical on a natural solid is a very complicated process that involves both solute and sorbent properties. Organic carbon content foc is an important factor affecting the sorption process. Low organic carbon content in the sorbent can result in hydrophilic rather than hydrophobic sorption for hydrophobic compounds, especially for highly polar compounds (Karickhoff, 1984). The soil/water sorption coefficient (Kd) can be normalized to give an organic-carbon partition coefficient (Koc), which is constant for each chemical when foc > 0.001. The Koc approach may not be suitable in some cases (Bintein and Devillers, 1994; Doucette, 2003):

- sorbents with low organic carbon or high clay contents;
- chemicals with highly polar or ionisable functional groups that may interact significantly with polar or charged sites on sorbent surfaces.

Selection of appropriate soil to solution ratios for sorption studies depends on the distribution coefficient Kd and the relative degree of sorption desired. For maximum precision in adsorption experiments, the sorbent concentration should be adjusted so that the percent removed at equilibrium is in the 20–80% range, preferably around 50% (Delle Site, 2001). Outside this range, relative measurement errors can become a dominant factor. To obtain maximum precision, soil/solution ratios from 1:1 to 1:100 were recommended for compounds with low to high sorption coefficients (OECD Guideline, 2001).

However, these ranges of sorbent concentration can give rise to a solids effect. A variety of studies have suggested that solid concentration (or soil/solution ratio) can significantly affect compound sorption coefficients (Zhao and Lang, 1996). This effect appears to result from suspended particles and dissolved organic matter contributed by the solids, which were not removed from the aqueous phase in the separation procedure. They increase the apparent amount of solute in the aqueous phase and so lower the observed sorption coefficients, with the effect being greater for more


hydrophobic compounds.

In general, soil pH changes have minor effects on the sorption of non-ionic molecules. For ionic compounds, on the other hand, the sorption coefficient is quite sensitive to the pH of the sorbing soil, because ionic and non-ionic species make different sorption contributions (Franco et al., 2009). The colloidal surfaces of most natural soils are negatively charged. They therefore have an affinity for positively charged molecules but little affinity for negatively charged molecules (Bintein and Devillers, 1994). Anionic species (from acids) have quite low sorption coefficients because they are repelled by the negative net charge of the soil surface. Cationic species (from bases), by contrast, are quite strongly sorbed (Wauchope et al., 2002).

Besides the organic carbon content, soil to solution ratio, sorbent concentration and pH described above, other factors can also have a significant impact on the sorption coefficient. These include temperature, ionic strength, degradation, and the presence of surfactants, co-solutes and co-solvents. More detailed reviews of factors influencing sorption have been presented by many investigators (Pignatello and Xing, 1996; Cousins et al., 1999; Seth et al., 1999; Delle Site, 2001; Wauchope et al., 2002; Doucette, 2003).

Many of the reported quantitative structure–activity relationships (QSARs) for predicting Koc are based on the relationship between Koc and determined or calculated molecular parameters. The hydrophobic parameters, octanol/water partition coefficient (logP) or solubility (logS), are commonly used to predict the soil sorption coefficient. These two physico-chemical properties are highly inter-correlated, so the two types of QSAR model can be considered identical or parallel (Sabljić et al., 1995). A review of the existing literature revealed numerous logKoc–logP (or logS) correlations for most classes of non-ionic organic chemicals (Gawlik et al., 1997). Correlations developed for specific classes of compounds were less generally applicable than those developed with a wide variety of chemical types. However, an appropriate class-specific relationship should provide a better estimate than the general equation representing all data (Gerstl, 1990). Baker et al. (1997) demonstrated that the slope of a seemingly linear relationship between logKoc and logP appeared to flatten out as a chemical's logP reached the range 6–7. This indicates that logP is not a strong predictor for very hydrophobic organic chemicals with logP > 5 (Baker et al., 2000).

Another widely used approach for predicting Koc is to employ QSARs developed with molecular connectivity indices (MCIs) (Meylan et al., 1992; Baker et al., 2001; Tao et al., 2001). Methods based on topological indices are attractive because they are simple and can easily be derived from molecular structures for diverse organic compounds. The simple first-order MCI was found to correlate extremely well with the soil sorption coefficients for several classes of non-polar organic compounds (Sabljić, 2001). It was also observed that polar organic compounds did not fit the Koc–MCI regression line developed for non-polar hydrophobic compounds, but instead deviated systematically below it. To solve this problem, a series of polarity correction factors was introduced into the MCI model. This produced a single correlation model for both non-polar and polar compounds (Meylan et al., 1992; Tao et al., 2001). Besides models based on logP, logS and MCI parameters, a number of other models have been reported. These include fragment constants (Tao et al., 2001), linear solvation energy relationships (LSER) (Poole and Poole, 1999), artificial neural networks (Goudarzi et al., 2009), molecular formula (Delgado et al., 2003) and other multiple descriptors (Andersson et al., 2002; Kahn et al., 2005; Liu and Yu, 2005; Lu et al., 2006; Schüürmann et al., 2006; Gramatica et al., 2007; Wang et al., 2009).

Currently available logKoc models are useful for screening large numbers of chemicals, but they have some limitations. In particular, the physical or chemical interpretation of their theoretical descriptors is usually not straightforward. A non-linear relationship between logKoc and logP has not been reported in the literature.

In the present paper, a soil sorption database of 701 compounds, previously reported by Schüürmann et al. (2006) and Gramatica et al. (2007), was investigated. The aim of the work is to develop a general global model and class-specific models between logKoc and logP for a wide range of compounds. The robustness of the developed models will be tested by external validation. Their quality of fit will be compared with the MCI- and logP-based models with correction factors from the KOCWIN program (EPI Suite package, version 4.0) (http://www.epa.gov/oppt/exposure/pubs/episuitedl.htm). The factors influencing Koc will also be discussed, on the basis of the residuals (observed − predicted) for the well classified compounds. These factors are sorption mechanism, solubility, equilibrium time, ionization, transformation and soil/solution ratio.

2. Materials and methods

2.1. Measured Koc data

The experimental soil sorption coefficients (Koc), normalized to organic carbon content, of 701 organic compounds were taken from the literature (Schüürmann et al., 2006; Gramatica et al., 2007) and compiled into a single database. Schüürmann et al. (2006) presented 571 organic chemicals and Gramatica et al. (2007) presented 643. The two data sets contain 701 compounds in total, of which 513 overlap. Some of the overlapping compounds were taken from the same source (Sabljić et al., 1995; Huuskonen, 2003) and have the same values. Others have different values and were taken from different sources (Tao et al., 1999; Nguyen et al., 2005). The total data set has diverse molecular structures and covers wide ranges of:

- logKoc, from 0.3 to 6.5;
- hydrophobicity, with logP from −3.54 to 8.65.

The compounds were classified into 25 groups based on structures and substituted functional groups. The names and number of compounds for each group are listed in Table 1. Details of the classification, together with the CAS number and descriptors calculated for each compound, can be found in the Supplementary material.

2.2. Molecular descriptors

The molecular descriptors for the given compounds were mainly calculated using the ACD/Labs software suite (Advanced Chemistry Development, Inc., http://www.acdlabs.com). A total of 18 descriptors of different types were calculated to describe compound structural diversity. These descriptors represent molecular size, hydrophobicity, solubility, polarity, degree of ionization, flexibility, and hydrogen bonding acidity and basicity. They include:

- molecular weight (MW);
- octanol/water partition coefficient (P);
- distribution coefficient (D);
- solubility;
- acid and base pKa values;
- the fractions of the unionized (F0), positive (F+), negative (F−) and zwitterionic (F±) forms at a given pH.

SMILES strings and the given pH values were imported for each compound, and the descriptors shown in the Supplementary material were then calculated.

2.3. Statistical analysis and validation of QSAR model

Multiple linear regression analysis was performed using Minitab software (version 14), which was also used for stepwise regression to select variables. The following statistical parameters were used to test the quality of the generated regression equations: number of observations used in the analysis (n),


Table 1 Classification of compounds based on structures and functional groups.

Group  Compounds                          n
1      Alcohols                           12
2      Halogenated alkanes and alkenes    25
3      Esters                             8
4      Halogenated cyclic compounds       15
5      Alkyl benzenes                     17
6      Halogenated benzenes               16
7      Nitrobenzenes                      12
8      Nitro N-alkyl anilines             13
9      Phenols                            36
10     Anilines                           25
11     Organic acids                      33
12     Benzoates                          18
13     Biphenyls                          22
14     PAHs                               30
15     Acetanilides                       25
16     Phenylureas and methylureas        46
17     Phenylcarbamates                   29
18     Benzamides                         18
19     Aromatic heterocycles              53
20     Poly aromatic heterocycles         13
21     Symtriazines                       18
22     Organic phosphides                 64
23     Organic sulfides                   31
24     Sulfone derivatives                15
25     Other compounds                    107

square of the correlation coefficient (R²), standard error of the estimate (S) and Fisher's criterion (F).

The 701 compounds, classified into 25 groups, are shown in Table 1. Groups 1–24 (594 compounds) are relatively well classified, with representative functional groups or structures. They were used as the training set (85%) to develop the QSAR models. The remaining 107 unclassified compounds (Group 25) were used as an external validation set (15%). The predictive capability of the final models was evaluated using three statistics:

- the average error, $\mathrm{AE} = \sum(\mathrm{Obs} - \mathrm{Pred})/n$;
- the average absolute error, $\mathrm{AAE} = \sum|\mathrm{Obs} - \mathrm{Pred}|/n$;
- the root-mean-squared error, $\mathrm{RMSE} = \left(\sum(\mathrm{Obs} - \mathrm{Pred})^2/n\right)^{1/2}$.

In addition, the models were compared with Koc values calculated from existing models. These are logP and MCI regression equations with a series of class-specific correction factors, encoded in the EPI Suite program (http://www.epa.gov/oppt/exposure/pubs/episuitedl.htm).

3. Results and discussion

3.1. Development of linear and non-linear logKoc models

The relationship between logKoc and the descriptors was examined for the compounds in the training set. The significant descriptor selected from those calculated in this paper is the hydrophobic parameter, the octanol/water partition coefficient (logP). We found that logKoc is linearly related to logP for the majority of compounds with logP ranging from 0.5 to 7.5. The logKoc/logP relationship breaks down for highly hydrophobic chemicals with logP > 7.5. A non-linear relationship was observed between logKoc and logP for the compounds with logP in the range −3.54 to 8.65 (Fig. 1). A typical example of the non-linear relationship, for the well classified Group 1, is also shown in Fig. 1 (open circles). Including other descriptors in the models did not significantly improve the relationships. More significant linear regression equations can be obtained with 14 descriptors selected by stepwise analysis (R² = 0.85). However, they are too complex for studying the influencing factors from the equation coefficients. Eqs. (1) and (2) below are the linear and non-linear equations between logKoc and logP for the compounds in the training set. A plot of logKoc against logP is shown in Fig. 1.

$$\log K_{oc} = 0.661 + 0.672\,\log P \qquad (1)$$

n = 558, R² = 0.76, S = 0.55, F = 1793

$$\log K_{oc} = 1.15 + 0.334\,\log P + 0.0396\,(\log P)^{2} + 0.00290\,(\log P)^{3} - 0.000868\,e^{\log P} \qquad (2)$$

n = 594, R² = 0.79, S = 0.55, F = 538

Fig. 1. Plot of measured logKoc against logP. Straight line: linear Eq. (1) for the compounds in Region 2. Solid curve: non-linear Eq. (2) for the training set. Circles: training set (open circles are compounds in Group 1). Open triangles: validation set.

The linear model, Eq. (1), can be used to predict logKoc for compounds with logP from 0.5 to 7.5. The non-linear model, Eq. (2), can be used for compounds over a wide range of logP. Eq. (2) accounts for 79% of the variation in the sorption data for a diverse set of organic chemicals in the training set. Despite these relatively good results, there is significant scatter around the logKoc–logP curve, so the prediction error is quite high for some compounds.

The residuals between observed logKoc and Eq. (2) predictions for the classified compounds reveal systematic biases for some groups:

- All observed Koc values are lower than predicted for alkyl alcohols (Group 1), halogenated alkanes and alkenes (Group 2) and esters (Group 3).
- Poly aromatic compounds have higher observed Koc than predicted, especially the poly aromatic heterocycles (see Groups 14 and 20 in the Supplementary material).
- Most of the ionisable organic acids (Group 11) have lower observed Koc than predicted from logP.

Using the distribution coefficient logD instead of logP in Eq. (2) significantly under-estimates logKoc for the organic acids in Group 11. An example of a bias (Group 1) can be seen in Fig. 1. The average residuals/errors (AE) are not zero for the remaining groups, but these groups show no systematic bias between observed and predicted logKoc. Table 2 lists the AE-based correction factors for Groups 1, 2, 3, 11, 14 and 20. Adding these correction factors to Eq. (2) gives a new model (Eq. (3)), which is expected to accurately predict the soil


Table 2 Average errors/correction factors for some class-specific groups.

Group  Compounds                          Correction factor
1      Alcohols                           −0.78
2      Halogenated alkanes and alkenes    −0.31
3      Esters                             −0.37
11     Organic mono-acids                 −0.40
14     PAHs                               0.44
20     Poly aromatic heterocycles         0.97

sorption coefficients for a wide range of compounds. Re-analysing the training-set compounds with these correction factors gives a similar non-linear model, but with a much lower standard error (S = 0.49).

$$\log K_{oc} = \log K_{oc}\big(\text{Eq. (2)}\big) + \text{correction factor} \qquad (3)$$
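The Python below is a minimal sketch, not the authors' code, of the one-parameter models and the error statistics of Section 2.3. The function names are hypothetical. The coefficients are those of Eqs. (1) and (2), and e^{logP} is read as the natural exponential of logP. The signs of the Table 2 correction factors are an assumption inferred from the text: negative for groups whose observed Koc falls below the Eq. (2) curve, positive for Groups 14 and 20.

```python
import math
from typing import Optional, Sequence, Tuple

# Correction factors of Table 2 (log units), keyed by the group numbers of
# Table 1. ASSUMPTION: signs inferred from the text (negative where observed
# Koc lies below the Eq. (2) curve, positive for Groups 14 and 20).
CORRECTION_FACTORS = {
    1: -0.78,   # alcohols
    2: -0.31,   # halogenated alkanes and alkenes
    3: -0.37,   # esters
    11: -0.40,  # organic mono-acids
    14: 0.44,   # PAHs
    20: 0.97,   # poly aromatic heterocycles
}


def log_koc_linear(log_p: float) -> float:
    """Eq. (1): linear model, fitted to compounds with 0.5 < logP < 7.5."""
    return 0.661 + 0.672 * log_p


def log_koc_nonlinear(log_p: float) -> float:
    """Eq. (2): one-parameter non-linear model for a wide range of logP."""
    return (1.15
            + 0.334 * log_p
            + 0.0396 * log_p ** 2
            + 0.00290 * log_p ** 3
            - 0.000868 * math.exp(log_p))  # e raised to the power logP


def log_koc_corrected(log_p: float, group: Optional[int] = None) -> float:
    """Eq. (3): Eq. (2) plus the class correction factor (0 for other groups)."""
    return log_koc_nonlinear(log_p) + CORRECTION_FACTORS.get(group, 0.0)


def error_stats(observed: Sequence[float],
                predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (AE, AAE, RMSE), with residuals defined as observed - predicted."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    ae = sum(residuals) / n
    aae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return ae, aae, rmse
```

At logP = 8.65, the top of the data range, log_koc_nonlinear gives roughly 3.9 while log_koc_linear gives roughly 6.5. This gap shows numerically why the linear Eq. (1) overestimates logKoc for the most hydrophobic compounds.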

3.2. Model validation

The predictive capability of the models was assessed with an external validation/test set, the 107 compounds in Group 25. The models were also compared with the KOCWIN program (MCI- and logP-based models with a series of correction factors) in terms of AE, AAE and RMSE. The results for the training, validation and total sets are given in Table 3. To compare linear model 1 with non-linear model 2, Table 3 also gives AE, AAE and RMSE for three logP ranges: logP < 0.5, 0.5 < logP < 7.5 and logP > 7.5 (see the note to Table 3).

Inspection of Table 3 shows that the general non-linear model (Eq. (2)) is better than the linear model (Eq. (1)) over a wide range of compounds. However, as Fig. 1 and the statistics in the note to Table 3 show, the non-linear model fits no better than the linear model for the large cluster of compounds with logP 0.5–7.5. In this range, both models have nearly the same predictive capability. Outside it, linear model 1 performs worse:

- It overestimates logKoc for compounds with logP > 7.5 (AE = −1.58, AAE = 1.58; Region 3 in Fig. 1).
- It underestimates logKoc for compounds with logP < 0.5 (AE = 0.57, AAE = 0.74; Region 1 in Fig. 1).

As a result, the prediction error of linear model 1 (RMSE = 0.61) is higher than that of non-linear model 2 (RMSE = 0.55). The best model developed in this paper is model 3, with RMSE = 0.50 for the total set. Observed against predicted logKoc for the training and validation sets is plotted in Fig. 2. No bias (AE = 0.01) was found for the non-linear models.

Table 3 also summarizes, for the training, test and total sets, the prediction statistics of the literature methods (MCI- and logP-based models with a series of correction factors). Non-linear model 2 is slightly less accurate than these literature models, but it uses only one parameter. Model 2 can be improved by introducing the correction factors for the functional groups in Table 2 (i.e. model 3). Model 3 predicts logKoc with an average absolute error of less than 0.4 log units.

Fig. 2. Plot of observed logKoc against predicted logKoc for training and test sets. The dotted lines indicate the 3σ interval.

3.3. The factors influencing logKoc

Organic carbon content is an important factor affecting the sorption process. The sorption coefficient normalized to organic carbon content (Koc) reflects the fact that soil organic carbon is the major sorption domain for neutral organic compounds. Hydrophobicity is the principal driving force of sorption for organic chemicals. Accordingly, a linear relationship has been observed between the sorption coefficient logKoc and the hydrophobic parameter logP (see Eq. (1) and Region 2 in Fig. 1). However, sorption of an organic chemical on a natural solid is a very complex process involving both solute and sorbent properties. Many factors affect sorption coefficients by acting on the chemical in the aqueous phase, the solid phase or both. These include pH, solubility, sorption mechanisms, organic carbon content, soil to solution ratio, ionic strength, degradation, and the presence of co-solutes and co-solvents.

3.3.1. Sorption mechanism

Hydrophobic bonding is the dominant sorption mechanism for most hydrophobic compounds. For hydrophilic or polar compounds (i.e. logP < 0.5), however, hydrophobic sorption by the soil organic matter becomes less significant and hydrophilic sorption becomes comparatively important. The use of Koc may therefore not be valid for these compounds (Baker et al., 1997; Franco et al., 2009). The sorption of hydrophilic or polar compounds may not

Table 3 Koc values predicted from Eqs. (1)–(3) and KOCWIN models.

              Training set (n = 594)   Test set (n = 107)   Total (n = 701)
Model         AE    AAE   RMSE         AE    AAE   RMSE     AE    AAE   RMSE
Eq. (1)       0.00  0.45  0.60         0.16  0.52  0.65     0.03  0.46  0.61
Eq. (2)       0.00  0.41  0.55         0.08  0.49  0.63     0.01  0.42  0.55
Eq. (3)       0.01  0.36  0.49         0.01  0.44  0.57     0.01  0.37  0.50
logP-based    0.01  0.38  0.53         0.14  0.52  0.67     0.01  0.40  0.56
MCI-based     0.05  0.39  0.54         0.12  0.54  0.73     0.06  0.41  0.57

Note: Statistics by logP range.

- logP < 0.5: Eq. (1): AE = 0.57, AAE = 0.74, RMSE = 0.90; Eq. (2): AE = 0.09, AAE = 0.52, RMSE = 0.62.
- 0.5 < logP < 7.5: Eq. (1): AE = 0.02, AAE = 0.42, RMSE = 0.56; Eq. (2): AE = 0.01, AAE = 0.41, RMSE = 0.53.
- logP > 7.5: Eq. (1): AE = −1.58, AAE = 1.58, RMSE = 1.72; Eq. (2): AE = 0.41, AAE = 0.86, RMSE = 1.01.

follow the hydrophobic model even in soils with sufficient organic carbon content (Doucette, 2003). The hydrophilic contribution to sorption can equal or even exceed the hydrophobic contribution. Measured Koc is then independent of hydrophobicity, producing the nearly horizontal scatter in the logKoc–logP plot (see Region 1 in Fig. 1).

3.3.2. Solubility

The sorption coefficient (Kd) is defined as the ratio of the concentration of a chemical sorbed to soil/sediment to the concentration of freely dissolved chemical in water. Not all of the compound present in water is available for sorption: it is believed that only the freely dissolved chemical is available for sorption by soils or sediments (Mackay and Fraser, 2000). Availability can be reduced by low solubility or by binding of the chemical to dissolved organic matter in water. Reduced availability lowers the sorption potential of highly hydrophobic compounds, which explains the decline of logKoc with increasing logP (see Region 3 in Fig. 1).

3.3.3. Ionization

Ionisable compounds, such as organic acids and bases, can have more than two forms in solution. It is generally accepted that the anionic, cationic and non-ionic forms contribute differently to soil sorption coefficients (Bintein and Devillers, 1994; Franco and Trapp, 2008). The sorption of acidic compounds increases with decreasing pH, because the free acid is more strongly sorbed than the anionic form. For weak bases, the cationic form dominates at low pH and is more strongly sorbed than the free base (Doucette, 2000).

To study the effect of ionization on the soil sorption coefficient, we need to consider the definitions of the octanol/water partition coefficient and the distribution coefficient. The partition coefficient (P) is the ratio of the concentrations of the unionized form in octanol and in water. The distribution coefficient (D) is the ratio of the concentration of unionized species in octanol to the total concentration (unionized and ionized forms) in water. D can be calculated from P and pKa at a given pH with the following formulas:

$$D(\text{mono-acid}) = \frac{P}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}} = P F_0 \qquad (4)$$

$$D(\text{mono-base}) = \frac{P}{1 + 10^{\,\mathrm{p}K_a - \mathrm{pH}}} = P F_0 \qquad (5)$$

Here F0 is the fraction of the non-ionic form in water and can be treated as a correction to P for the degree of ionization. D can therefore be regarded as the contribution of the non-ionic form to the partition coefficient.

Two limiting cases can be distinguished:

- If ionic and non-ionic forms make the same or similar sorption contributions, logKoc predicted from logP with Eq. (1) or (2) should be close to the observed logKoc for ionisable compounds.
- If only the non-ionic form contributes and the ionic form is ignored, the observed logKoc should be close to the value predicted from logD with Eq. (1) or (2).

For the ionisable organic acids (Group 11), Eq. (2) over-predicts logKoc when logP is used (AE = −0.40, see Table 2) and under-predicts it when logD is used (AE = 0.86). The non-ionic form therefore plays a dominant role in soil sorption and contributes more than the anionic form, but the anionic form also contributes to sorption. This may be due to interaction between the anionic carboxyl group and surficial aluminum or iron ions, although this interaction is weaker overall than the hydrophobic bonding of the neutral form.

It should be noted that logD and the fractions of ionic and non-ionic forms were calculated at pH = 7 in this paper. In reality, pH values in soil sorption tests range from 3 to 8 (Bintein and Devillers, 1994; Vasudevan et al., 2009). The calculated logD descriptor will therefore be in error wherever the actual pH was other than 7. Most weakly acidic chemicals are generally assumed to be predominantly in the anionic (negatively charged) form at the pH of most natural soils. Most weakly basic chemicals are assumed to be in the non-ionic molecular form (see ionic fractions for acids in Group 11 and bases in Groups 19, 20 and 21, respectively). However, this is not true for some bases, which can be positively charged in the low-pH solutions used in sorption tests.

Furthermore, sorption of basic compounds in certain soils (e.g. montmorillonite clay systems) depends principally on the surface acidity rather than the solution pH. Surface acidity may be two pH units lower than the solution pH (Bintein and Devillers, 1994). This results in a strong interaction between positively charged basic compounds and negatively charged groups on the soil surface, such as humic acid (Stipičević et al., 2009). This explains why Eq. (2) significantly under-estimates logKoc for some bases in Groups 19–21.

3.3.4. Transformation

Transformation (chemical, biochemical and photochemical) should be taken into account in soil sorption. Experimental Koc data are based on the concentration of the parent chemical only, not the total amount of parent and metabolites. Transformation can reduce the concentration of the parent chemical in the aqueous phase, leading to lower than expected soil sorption. A number of substances have been shown to be rapidly transformed in solution.

Esters can undergo hydrolysis in solution to yield carboxylic acids and alcohols, especially under basic conditions. The pH values should be determined during the sorption test for these compounds, but unfortunately they were not available in the literature. The hydrolysis rates calculated with the HYDROWIN program (EPI Suite) show that some esters hydrolyse quite fast at pH = 8. The observed logKoc values are systematically lower than predicted for the esters in Group 3, with average residual = −0.37 (see Table 2). Not all observed logKoc values are lower than predicted for the benzoates in Group 12, but most are, with average residual = −0.27 (see Supplementary material). Hydrolysis is a possible reason for the systematic deviation for these esters.

Hydroxyphenols are another type of compound that can be transformed in solution: they are easily oxidized to benzoquinones in water. Lower than expected Koc values have been observed for some hydroxyphenols (e.g. compound 153 in the Supplementary material). Transformation can decrease the concentration of the parent compound, leading to a bias in sorption experiments.

3.3.5. Volatilization

Volatilization can also affect the experimental determination of sorption coefficients. Compounds with Henry's Law constants greater than 10⁻⁴ should be readily volatilized from soils (Cousins et al., 1999). When testing volatile substances, care should be taken to avoid losses during the study (OECD Guideline, 2000). Unless both the solution and solid phases are analyzed, volatilization losses can lead to overestimated sorption coefficients when no control sample is used.

To avoid overestimation, a control sample containing only the test substance in solution (no soil) is taken through precisely the same steps as the test systems, as a safeguard against unexpected volatilization. However, this control method may underestimate sorption coefficients, because volatilization loss in the control sample (without soil) is expected to exceed that in the test sample with soil. This approach does not account for sorption-retarded volatilization and probably overestimates the likelihood of losses by volatilization

Y. Wen et al. / Chemosphere 86 (2012) 634–640 639

(Cousins et al., 1999). To investigate the effect of volatilization, Henry's law constants were calculated with the HENRYWIN program (EPI Suite 4.0) for the 701 compounds studied. For most of the highly volatile compounds, the observed logKoc values are lower than those predicted by Eq. (2). The halogenated alkanes and alkenes in Group 2 are the most volatile compounds, with very high Henry's law constants (see Supplementary material). Their observed logKoc values are systematically lower than those predicted by Eq. (2), with AE = 0.31 (see Table 2). This difference may be due to volatilization during the sorption measurements.

3.3.6. Soil/solution ratio
One of the key parameters that can influence the accuracy of sorption measurements is the soil/solution ratio during equilibration. The appropriate soil/solution ratio for a sorption study depends on the distribution coefficient Kd and on the relative degree of sorption desired. For maximum precision in adsorption experiments, the sorbent concentration should be adjusted so that 20–80% of the chemical, ideally around 50%, is removed from solution at equilibrium (Delle Site, 2001). At very low sorption coefficients a 1:1 soil/solution ratio is recommended. At very high distribution coefficients the ratio can be extended to 1:100 so that a significant amount of chemical remains in solution (OECD Guideline, 2000). To study the impact of the soil/solution ratio, the percentage sorbed was calculated from Eq. (6), which is derived from Kd, for compounds with different sorption coefficients. The results are listed in Table 4.

$$\%\,\text{Sorption} = \frac{m_S}{m_S + m_W} = \frac{K_{oc} f_{oc} m_W W_S / V_W}{K_{oc} f_{oc} m_W W_S / V_W + m_W} = \frac{K_{oc} f_{oc} W_S / V_W}{K_{oc} f_{oc} W_S / V_W + 1} \tag{6}$$

Here, mS is the mass sorbed to soil (mg), WS is the soil weight (kg), mW is the freely dissolved chemical mass in water (mg) and VW is the solution volume (L). The dissolved concentration is CW = mW/VW, and Kd = Koc foc, where foc is the organic carbon fraction of the soil. The sorbed mass is therefore mS = Kd CW WS = Koc foc mW WS/VW.

Table 4 shows that for compounds with logKoc < 1 or logKoc > 5 it is difficult to bring the percentage sorbed into the 20–80% range: none of the tabulated soil/solution ratios from 1/1 to 1/100 achieves it. Experimental accuracy is therefore expected to be poor for compounds with low or high sorption coefficients (logKoc < 1 or logKoc > 5). This explains why significant residuals are observed for some compounds, especially highly hydrophobic and highly hydrophilic ones.

3.3.7. Equilibrium time
Sorption is often treated as instantaneous for modeling purposes, but it may in fact require weeks to many months to reach equilibrium (Pignatello and Xing, 1996). Solid-phase to solution-phase distribution coefficients (Kd) are routinely measured before true equilibrium is reached. During uptake, the apparent sorption distribution coefficient (Kd^app) can increase by anywhere from 30% to 10-fold between short (1–3 d) and long contact times. Many reported soil Koc values mainly represent the fast sorption component rather than overall sorption. This lowers the reported values, especially for hydrophobic chemicals (e.g. those with logP > 5.5), which require longer equilibration times (Cousins et al., 1999; Seth et al., 1999). It may be another major cause of the discrepancies between the observed logKoc values and those predicted from the linear equation.

3.3.8. Experimental error
The two data sets we studied share 513 compounds. For 384 of these, the two sets give exactly or nearly the same logKoc values (AE < 0.05), which may therefore come from the same source. For the remaining 129 chemicals, the average error (AE) and average absolute error (AAE) of logKoc between the two sets are 0.01 and 0.29, respectively. The maximum difference is 1.25 log units (pentachlorophenol). Measured Koc can differ because of soil type, exposure concentration, test conditions, experiment duration, and how the concentrations in water and soil are determined. The general model 2 systematically underestimates the logKoc values of the PAHs in Group 14 and the polyaromatic heterocycles in Group 20. These chemical groups may bind specifically to soil constituents (Sabljić et al., 1995). It is not clear why the general model 2 systematically overestimates the logKoc values of alcohols, and more work is needed on the sorption mechanism of these compounds.

In conclusion, logKoc is linearly related to logP for compounds with logP in the range 0.5–7.5, but not over a wide range of logP values. A non-linear model between logKoc and logP was developed for the 594 compounds in the training set; it accounts for 79% of the variation in the sorption data. The models were validated with a test set of 107 compounds. Examination of the residuals between observed logKoc and the values predicted by the non-linear model shows systematic biases for some class-specific groups. These deviations can be attributed to:
- differences in sorption mechanism for hydrophilic/polar compounds;
- low solubility of highly hydrophobic compounds;
- hydrolysis of esters in solution;
- volatilization of volatile compounds;
- large experimental errors for compounds with extremely high or low sorption coefficients.

Including correction factors for these compounds in the non-linear model significantly improves its predictive capacity, giving AAE = 0.37 for the total set.

Acknowledgements
This work was supported by the National Natural Science Foundation of China (20977015) and the NENU Youth Foundation (10QNJJ016). We thank PharmaAlgorithms, Inc. for kindly supplying the Algorithm Builder program (merged with ACD/Labs, Advanced Chemistry Development, Inc., http://www.acdlabs.com). We especially thank Prof. M.H. Abraham for his valuable comments on our work.

Appendix A. Supplementary material
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.chemosphere.2011.11.001.
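The ionization correction of Eqs. (4) and (5) is easy to reproduce numerically. The sketch below is a minimal Python illustration, not the authors' software. It assumes a monoprotic acid or base, and the function names and example values (logP = 2.0, pKa = 4.0) are hypothetical. It returns logD = logP + log10 F0, the quantity that replaces logP in Eq. (1) or (2) when only the non-ionic form is assumed to sorb.

```python
import math


def neutral_fraction(pka: float, ph: float, kind: str) -> float:
    """Fraction F0 of a monoprotic compound present in the non-ionic form.

    kind="acid" follows Eq. (4): F0 = 1 / (1 + 10**(pH - pKa));
    kind="base" follows Eq. (5): F0 = 1 / (1 + 10**(pKa - pH)).
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


def log_d(log_p: float, pka: float, ph: float = 7.0, kind: str = "acid") -> float:
    """logD = logP + log10(F0), i.e. D = P * F0 on a log scale."""
    return log_p + math.log10(neutral_fraction(pka, ph, kind))


# Hypothetical acid (illustrative values only), evaluated at the paper's
# reference pH of 7 and across the pH 3-8 range of soil sorption tests.
for ph in (3.0, 5.0, 7.0, 8.0):
    print(f"pH {ph}: logD = {log_d(2.0, 4.0, ph, 'acid'):.2f}")
```

At pH 7 this hypothetical acid has logD ≈ −1.0, about three log units below its logP. This shows why substituting logD for logP shifts the predicted logKoc of ionisable compounds so strongly. Scanning several pH values also shows how sensitive logD is to the pH = 7 assumption when the test pH varies between 3 and 8.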

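Eq. (6) is a closed form, so it can be checked against Table 4 numerically. The text does not state the organic carbon fraction used for Table 4. The sketch below assumes foc = 0.01 (1% organic carbon) because that value reproduces the tabulated percentages after rounding. The names percent_sorbed and ratio_for_target are hypothetical. The second function simply inverts Eq. (6) to give the soil/solution ratio needed for a target percentage, and both express the result of Eq. (6) as a percentage (multiplied by 100).

```python
# Minimal sketch of Eq. (6). Assumption (not stated in the text): the soil
# organic carbon fraction f_oc = 0.01, which reproduces Table 4 after rounding.
F_OC_ASSUMED = 0.01


def percent_sorbed(log_koc: float, soil_to_solution: float,
                   f_oc: float = F_OC_ASSUMED) -> float:
    """Percentage of the chemical sorbed at equilibrium (Eq. (6) times 100).

    soil_to_solution is W_S / V_W in kg/L (e.g. 1/50 for a 1:50 ratio).
    """
    x = 10.0 ** log_koc * f_oc * soil_to_solution  # Koc * foc * WS / VW
    return 100.0 * x / (x + 1.0)


def ratio_for_target(log_koc: float, target_percent: float = 50.0,
                     f_oc: float = F_OC_ASSUMED) -> float:
    """Soil/solution ratio (kg/L) that gives target_percent sorbed (Eq. (6) inverted)."""
    x = target_percent / (100.0 - target_percent)
    return x / (10.0 ** log_koc * f_oc)


# Rebuild the three rows of Table 4 for logKoc = 0, 1, ..., 6.
for label, ratio in (("1/1", 1.0), ("1/50", 1.0 / 50), ("1/100", 1.0 / 100)):
    print(label, [round(percent_sorbed(k, ratio), 2) for k in range(7)])
```

Under this assumption, ratio_for_target(2.0) returns 1.0 kg/L and ratio_for_target(4.0) returns 0.01 kg/L. So within the 1:1 to 1:100 range, only compounds with logKoc between about 2 and 4 can be brought to 50% sorption, which is consistent with the discussion of Table 4.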

Table 4
Percentage of sorption at different soil/solution ratios (kg L⁻¹).

Ratio    logKoc = 0    1      2     3     4     5     6
1/1      1             9      50    91    99    100   100
1/50     0.02          0.2    2     17    67    95    100
1/100    0.01          0.1    1     9     50    91    99

References
Andersson, P.L., Maran, U., Fara, D., Karelson, M., Hermens, J.L.M., 2002. General and class specific models for prediction of soil sorption using various physicochemical descriptors. J. Chem. Inf. Comput. Sci. 42, 1450–1459.
Baker, J.R., Mihelcic, J.R., Luehrs, D.C., Hickey, J.P., 1997. Evaluation of estimation methods for organic carbon normalized sorption coefficients. Water Environ. Res. 69, 136–144.
Baker, J.R., Mihelcic, J.R., Shea, E., 2000. Estimating Koc for persistent organic pollutants: limitations of correlations with Kow. Chemosphere 41, 813–817.
Baker, J.R., Mihelcic, J.R., Sabljic, A., 2001. Reliable QSAR for estimating Koc for persistent organic pollutants: correlation with molecular connectivity indices. Chemosphere 45, 213–221.
Bintein, S., Devillers, J., 1994. QSAR for organic chemical sorption in soils and sediments. Chemosphere 28, 1171–1188.
Cousins, I.T., Beck, A.J., Jones, K.C., 1999. A review of the processes involved in the exchange of semi-volatile organic compounds (SVOC) across the air–soil interface. Sci. Total Environ. 228, 5–24.
Delgado, E.J., Alderete, J.B., Jaña, G.A., 2003. A simple QSPR model for predicting soil sorption coefficients of polar and nonpolar organic compounds from molecular formula. J. Chem. Inf. Comput. Sci. 43, 1928–1932.
Delle Site, A., 2001. Factors affecting sorption of organic compounds in natural sorbent/water systems and sorption coefficients for selected pollutants. A review. J. Phys. Chem. Ref. Data 30, 187–439.
Doucette, W.J., 2003. Quantitative structure–activity relationships for predicting soil–sediment sorption coefficients for organic chemicals. Environ. Toxicol. Chem. 22, 1771–1788.
Franco, A., Trapp, S., 2008. Estimation of the soil–water partition coefficient normalized to organic carbon for ionizable organic chemicals. Environ. Toxicol. Chem. 27, 1995–2004.
Franco, A., Fu, W.J., Trapp, S., 2009. Influence of soil pH on the sorption of ionizable chemicals: modeling advances. Environ. Toxicol. Chem. 28, 458–464.
Gramatica, P., Giani, E., Papa, E., 2007. Statistical external validation and consensus modeling: a QSPR case study for Koc prediction. J. Mol. Graphics Modell. 25, 755–766.
Gawlik, B.M., Sotirious, N., Feicht, N., Schulte-Hostede, S., Kettrup, A., 1997. Alternative for the determination of soil adsorption coefficient, Koc, of non-ionic organic compounds – A review. Chemosphere 34, 2525–2551.
Gerstl, Z., 1990. Estimation of organic chemical sorption by soils. J. Contam. Hydrol. 6, 357–375.
Goudarzi, N., Goodarzi, M., Araujo, M.C.U., Galvão, R.K.H., 2009. QSPR modeling of soil sorption coefficients (Koc) of pesticides using SPA-ANN and SPA-MLR. J. Agric. Food Chem. 57, 7153–7158.
Huuskonen, J., 2003. Prediction of soil sorption coefficient of a diverse set of organic chemicals from molecular structure. J. Chem. Inf. Comput. Sci. 43, 1457–1462.
Kahn, I., Fara, D., Karelson, M., Maran, U., 2005. QSPR treatment of the soil sorption coefficients of organic pollutants. J. Chem. Inf. Model. 45, 94–105.
Karickhoff, S.W., 1984. Organic pollutant sorption in aquatic systems. J. Hydraul. Eng. Am. Soc. Civil Eng. 110, 707–735.
Liu, G., Yu, J., 2005. QSAR analysis of soil sorption coefficients for polar organic chemicals: substituted anilines and phenols. Water Res. 39, 2048–2055.
Lu, C., Wang, Y., Yin, C., Guo, W., Hu, X., 2006. QSPR study on soil sorption coefficient for persistent organic pollutants. Chemosphere 63, 1384–1391.
Mackay, D., Fraser, A., 2000. Bioaccumulation of persistent organic chemicals: mechanisms and models. Environ. Pollut. 110, 375–391.
Meylan, W., Howard, P.H., Boethling, R.S., 1992. Molecular topology/fragment contribution method for predicting soil sorption coefficients. Environ. Sci. Technol. 26, 1560–1567.
Nguyen, T.H., Goss, K.U., Ball, P.W., 2005. Polyparameter linear free energy relationships for estimating the equilibrium partition of organic compounds between water and the natural organic matter in soils and sediments. Environ. Sci. Technol. 39, 913–924.
OECD (Organization for Economic Cooperation and Development), 2000. Guideline for the Testing of Chemicals: Adsorption–Desorption Using a Batch Equilibrium Method.
Pignatello, J.J., Xing, B., 1996. Mechanisms of slow sorption of organic chemicals to natural particles. Environ. Sci. Technol. 30, 1–11.
Poole, S.K., Poole, C.F., 1999. Chromatographic models for the sorption of neutral organic compounds by soil from water and air. J. Chromatogr. 845, 381–400.
Sabljić, A., Güsten, H., Verhaar, H., Hermens, J., 1995. QSAR modelling of soil sorption. Improvements and systematics of logKoc vs. logP correlations. Chemosphere 31, 4489–4514.
Sabljic, A., 2001. QSAR models for estimating properties of persistent organic pollutants required in evaluation of their environmental fate and risk. Chemosphere 43, 363–375.
Schüürmann, G., Ebert, R.-U., Kühne, R., 2006. Prediction of the sorption of organic compounds into soil organic matter from molecular structure. Environ. Sci. Technol. 40, 7005–7011.
Seth, R., Mackay, D., Muncke, J., 1999. Estimating the organic carbon partition coefficient and its variability for hydrophobic chemicals. Environ. Sci. Technol. 33, 2390–2394.
Stipičević, S., Fingler, S., Drevenkar, V., 2009. Effect of organic and mineral soil fractions on sorption behaviour of chlorophenol and triazine micropollutants. Arh. Hig. Rada Toksikol. 60, 43–52.
Tao, S., Piao, H., Dawson, R., Lu, X., Hu, H., 1999. Estimation of organic carbon normalized sorption coefficient (Koc) for soils using the fragment constant method. Environ. Sci. Technol. 33, 2719–2725.
Tao, S., Lu, X., Cao, J., Dawson, R., 2001. Comparison of the fragment constant and molecular connectivity indices models for normalized sorption coefficient estimation. Water Environ. Res. 73, 307–313.
Vasudevan, D., Bruland, G.L., Torrance, B.S., Upchurch, V.G., MacKay, A.A., 2009. pH-dependent ciprofloxacin sorption to soils: interaction mechanisms and soil factors influencing sorption. Geoderma 151, 68–76.
Wauchope, R.D., Yeh, S., Linders, J.B.H.J., Kloskowski, R., Tanaka, K., Rubin, B., Katayama, A., Kordel, W., Gerstl, Z., Lane, M., Unsworth, J.B., 2002. Pesticide soil sorption parameters: theory, measurement, uses, limitations and reliability. Pest Manage. Sci. 58, 419–445.
Wang, B., Chen, J., Li, X., Wang, Y., Chen, L., Zhu, M., Yu, H., Kühne, R., Schüürmann, G., 2009. Estimation of soil organic carbon normalized sorption coefficient (Koc) using least squares-support vector machine. QSAR Comb. Sci. 28, 561–567.
Zhao, Y.H., Lang, P.Z., 1996. Evaluation of the partitioning of hydrophobic pollutants between aquatic and solid phases in natural systems. Sci. Total Environ. 177, 1–7.