Ordinary multigaussian kriging for mapping conditional probabilities of soil properties

Xavier Emery*

Department of , University of Chile, Avenida Tupper 2069, Santiago, Chile

Abstract

This paper addresses the problem of assessing the risk of deficiency or excess of a soil property at unsampled locations, and more generally of estimating a of such a property given the information monitored at sampled sites. It focuses on a particular model that has been widely used in geostatistical applications: the multigaussian model, for which the available can be transformed into a set of Gaussian values compatible with a multivariate Gaussian distribution. First, the conditional expectation estimator is reviewed and its main properties and limitations are pointed out; in particular, it relies on the value of the normal scores data since it uses a simple kriging of these data. Then we propose a generalization of this estimator, called bordinary multigaussian krigingQ and based on ordinary kriging instead of simple kriging. Such estimator is unbiased and robust to local variations of the mean value of the Gaussian field over the domain of interest. Unlike indicator and disjunctive kriging, it does not suffer from order-relation deviations and provides consistent estimations. An application to soil data is presented, which consists of pH measurements on a set of 165 soil samples. First, a test is proposed to check the suitability of the multigaussian distribution to the available data, accounting for the fact that the mean value is considered unknown. Then four geostatistical methods (conditional expectation, ordinary multigaussian, disjunctive and indicator kriging) are used to estimate the risk that the pH at unsampled locations is less than a critical threshold and to delineate areas where liming is needed. The case study shows that ordinary multigaussian kriging is close to the ideal conditional expectation estimator when the neighboring information is abundant, and departs from it only in under-sampled areas. In contrast, even within the sampled area, disjunctive and indicator kriging substantially differ from the conditional expectation; the discrepancy is greater in the case of indicator kriging and can be explained by the loss of information due to the binary coding of the pH data.

Keywords: ; Conditional expectation; Gaussian random fields; Posterior distributions; Indicator kriging; Disjunctive kriging

1. Introduction

* Tel.: +56 2 6784498; fax: +56 2 6723504. Kriging techniques are currently used to estimate E-mail address: [email protected]. soil properties, such as electrical conductivity, pH, X. Emery nutrient or contaminant concentrations, on the basis of tion of stationarity. In general, the available data do a limited set of samples for which these properties not have a Gaussian , so that a transforma- have been monitored. However, because of its tion is required to turn them into Gaussian values smoothing property, linear kriging is not suitable to (normal score transform). The mean of the trans- applications that involve the conditional distributions formed data is usually set to zero and their of the unsampled values, for instance assessing the to one, i.e. one works with standard Gaussian dis- risk of exceeding a given threshold. This problem is tributions (Rivoirard, 1994, p. 46; Goovaerts, 1997, critical for delineating contaminated areas where a p. 273). remedial treatment is needed, for determining land A goal of this paper is to find an estimator that is suitability for a specific crop, or for planning an robust to departures of the data from the ideal multi- application of fertilizer. gaussian model, in particular concerning local varia- Determining the risk of exceeding a threshold, or tions of the mean value over the domain of interest. more generally estimating a function of a soil prop- Therefore, the assumption of zero mean is omitted and erty, can be dealt with either stochastic simulations or the following hypotheses are made: nonlinear geostatistical methods like indicator kriging or disjunctive kriging, which have found wide accep- 1) the transformed data have a multivariate Gaussian tance in soil science (Webster and Oliver, 1989, distribution; 2001; Wood et al., 1990; Oliver et al., 1996; Van 2) their mean is constant (at least at the scale of the Meirvenne and Goovaerts, 2001; Lark and Ferguson, neighborhood used for local estimations) and equal 2004). An alternative to indicator and disjunctive to m; kriging is the conditional expectation estimator. How- 3) their variance is equal to one; ever, in practice this estimator is hardly used, except 4) their is known; henceforth, the cor- in the scope of the multigaussian model (Goovaerts, relation between the values at locations x and xV 1997, p. 271; Chile`s and Delfiner, 1999, p. 381). This is denoted by q(x,xV). In the stationary case, this paper focuses on this particular model and is orga- is a function of the separation vector h=xxV nized as follows. In the next section, the conditional only. expectation estimator is reviewed and its main prop- erties and practical limitations are highlighted, in 2.2. Local estimation with multigaussian kriging particular concerning the use of a simple kriging of the Gaussian data. Then, ordinary kriging is substi- Let { Y(x), xaD} be a Gaussian tuted for simple kriging in order to obtain an unbi- defined on a bounded domain D and known at a set ased estimator that is robust to local variations of the of locations {xa, a =1... n}. It can be Gaussian data mean. The proposed approach is final- shown (Goovaerts, 1997, p. 272) that the conditional ly illustrated and discussed through a case study in distribution of Y(x) is Gaussian-shaped, with mean soil science. equal to its simple kriging Y(x)SK from the available data and variance equal to the simple kriging variance 2 rSK(x). Therefore, the posterior or conditional cumu- 2. On the conditional expectation estimator lative distribution function (in short, ccdf) at location x is 2.1. The multigaussian model ! y YðÞx SK A Gaussian random field, or multigaussian ran- 8yaR; FðÞ¼x; yjdata G ð1Þ r ðÞx dom function, is characterized by the fact that any SK weighted average of its variables follows a Gaussian distribution. Its spatial distribution is entirely deter- where G(.) is the standard Gaussian cdf. This ccdf can mined by its first- and second-order moments (mean be used to estimate a function of the Gaussian field, and function or ), which makes say u[ Y(x)] (this is often referred to as a btransfer the very simple under an assump- functionQ). For instance, one can calculate the X. Emery of the conditional distribution of where g(.) stands for the standard Gaussian probabil- u[ Y(x)], which defines the bconditional expectationQ ity distribution function. The estimator in Eq. (2) is estimator (Rivoirard, 1994, p. 61): also called bmultigaussian krigingQ in the geostatisti- Z cal literature (Verly, 1983; David, 1988, p. 150). Its fgu½YðÞx CE ¼ uðÞy dFðÞx; yjdata implementation relies on an assumption of strict sta- tionarity and knowledge of the prior mean m, in order SK Z hi to express Y(x) . SK In practice, Eq. (2) can be calculated by numerical ¼ u YðÞx þ rSKðÞx t gtðÞdt ð2Þ integration (Fig. 1), by drawing a large set of values

Fig. 1. Numerical integration for calculating the conditional expectation of a transfer function. X. Emery

{u1,... uM} that sample the [0,1] interval either uni- field (Rivoirard, 1994, p. 71; Chile`s and Delfiner, formly or regularly, and putting 1999, p. 382). 1 XM For this reason, although less precise, alternative fgu½YðÞx CEc uðÞy ð3Þ M i methods are sometimes used instead of the conditional i¼1 expectation, e.g. indicator or disjunctive kriging with with 8i a{1,...M},F(x;yi |data)=ui. An alternative unbiasedness constraints (Rivoirard, 1994, p. 69; approach is to replace the integral in Eq. (2) by a Goovaerts, 1997, p. 301; Chile`s and Delfiner, 1999, expansion, which also helps to express the p. 417). The following section aims at generalizing the estimation variance (Emery, 2005b). conditional expectation estimator, by trading simple Note: throughout the paper, the kriging estima- kriging for ordinary kriging in Eq. (2) and correcting tors are regarded as random variables. This allows the expression of the estimator to avoid bias. one to express their expected value and establish their unbiasedness. 3. Ordinary multigaussian kriging 2.3. Properties of the estimator 3.1. Construction of an unbiased estimator Among all the measurable functions of the data, the conditional expectation minimizes the estimation In Eq. (2), the simple kriging estimator Y(x)SK is a variance (variance of the error). It is unbiased and, Gaussian (weighted average of mul- even more, conditionally unbiased. This property is of tivariate Gaussian data) with mean m and variance great importance in resource assessment problems hi (Journel and Huijbregts, 1978, p. 458). Furthermore, var YðÞx SK ¼ 1 r2 ðÞx : ð4Þ since the ccdf [Eq. (1)] is an increasing function, the SK conditional expectation honors order relations and provides mathematically consistent results: for in- Therefore, the conditional expectation of u[ Y(x)] stance, the estimate of a positive function is always belongs to the class of estimators that can be written in positive (Rivoirard, 1994, p. 62). the following form: Despite these nice properties, the estimator is not Z hipffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi often used in practice. One of the reasons is the attrac- u½YðÞx T ¼ u YT þ 1 varðÞYT t gtðÞdt tion to the mean that produces the use of a simple kriging of the Gaussian data. Indeed, at locations distant from the data relatively to the of the nohipffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi correlogram model, the simple kriging of Y(x) tends ¼ E u YT þ 1 varðÞYT T jYT ð5Þ to the prior mean m and the kriging variance to the prior unit variance, hence the conditional expectation in Eq. (2) tends to the prior expectation of u[ Y(x)]. where Y* is a Gaussian random variable with mean m To avoid this effect when estimating a spatial and T is a standard Gaussian variable independent of Y*. attribute, practitioners often prefer ordinary kriging The value of T does not need to be known, since to simple kriging. This way, the mean value is the estimator u[ Y(x)]* is defined by an expected considered unknown and is estimated from the data value with respect to T. In the case of the conditional located in the kriging neighborhood. Several authors expectation, Y* is the simple kriging of Y(x) and T can suggested a generalization of the conditional expec- be interpreted as the standardized simple kriging error. tation estimator, by substituting in Eq. (2) an ordi- In the multigaussian model, this error is independent nary kriging for the simple kriging and leaving of the kriging estimator (Chile`s and Delfiner, 1999,p. unchanged the simple kriging variance (Journel, 164 and 381). 1980; Goovaerts, 1997, p. 282). However, this ap- A key result is that any estimator given by Eq. (5) proach is generally ill advised, since it produces a bias constitutes an unbiased estimator of u[ Y(x)], provid- when estimating a nonlinear function of the Gaussian ed only that Y* and T are independent normal vari- X. Emery ables, the first one with mean m, the second one with This expression does not depend on the value of mean zero and unitpffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi variance. Indeed, under these the Gaussian data mean, so the latter can be con- conditions, YT þ 1 varðÞYT T and Y(x) have the sidered unknown. With respect to the classical con- same univariate distribution (a ditional expectation, the estimation of a function of with mean m and unit variance), hence: Y(x) is obtained by replacing the simple kriging by n nohipffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi an ordinary kriging and the simple kriging variance E u½YðÞx *g ¼ E u Y* þ 1 varðÞY* T by the ordinary kriging variance plus twice the Lagrange multiplier introduced in the kriging sys- tem. A particular case of Eq. (10) is the so-called ¼ Efu½YðxÞ g: ð6Þ bordinary lognormal krigingQ (Journel, 1980, p. 295; Rivoirard, 1990, p. 217), for which u is an exponen- In particular, Y* can be the ordinary kriging of tial function. Y(x), defined by the following weighting of the Gaussian data { Y(x ), a =1... n}: a 3.2. Pseudo-conditional distribution for assessing the Xn risk of deficiency or excess of a soil property OK OK YðÞx ¼ ka YðÞxa : ð7Þ a¼1 Let us introduce the associated The ordinary kriging weights and error variance are with threshold y: determined by the following system, in which l(x)is 1ifYðÞx V y 8yaR; I ðÞ¼x; y ð11Þ a Lagrange multiplier: Y j 0 otherwise: 8 Xn ÀÁ > OK The ordinary multigaussian kriging of such indica- > k q xa; xb þ lðÞ¼x qðÞ8xa; x a ¼ 1:::n > b tor provides a pseudo ccdf (bpseudoQ because it differs > b 1 > ¼ < Xn from the true ccdf given in Eq. (1)), which estimates OK the risk that the value at location x is less than or equal > ka ¼ 1 ð8Þ > to y conditionally to the available data at locations > a¼1 > n > X {xa, a =1... n}: > 2 OK :rOKðÞ¼x 1 ka qðÞxa; x lðÞx : oMK a¼1 8yaR; FFˆ ðÞ¼x; yjdata ½IY ðÞx; y

The variance of the estimator is related to the error ! variance and Lagrange multiplier introduced in Eq. y YðÞx OK ¼ G pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : ð12Þ (8): r2 x 2l x hiXn Xn ÀÁ OKðÞþ ðÞ OK OK OK var YðÞx ¼ ka kb q xa; xb a¼1 b¼1 Xn This corresponds to a Gaussian distribution with OK ¼ ka qðÞxa; x lðÞx mean the ordinary kriging of Y(x) and variance the a¼1 ordinary kriging variance plus twice the Lagrange 2 ¼ 1 rOKðÞx 2lðÞx : ð9Þ multiplier. As in Eq. (2), the estimator of a transfer function [Eq. (10)] can be expressed in terms of this The use of an ordinary kriging in Eq. (5) defines the pseudo ccdf: following estimator, named bordinary multigaussian Z krigingQ and associated with the superscript oMK: fgu½YðÞx oMK ¼ uðÞy dFFˆ ðÞx; yjdata ð13Þ fgu½YðÞx oMK Eq. (13) allows one to calculate the ordinary multi- Z qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi gaussian kriging estimator by numerical integration OK 2 [Eq. (3)]. The main properties of this estimator are ¼ u YðÞx þ rOKðÞþx 2lðÞx t gtðÞdt: ð10Þ discussed in the following subsections. X. Emery

3.3. Robustness to local variations of the mean value [Eq. (1)], in particular both distributions do not have the same variance. Contrary to the conditional expectation, ordinary In general, the pseudo ccdf understates the disper- multigaussian kriging [Eq. (10)] does not rely on the sion of the true ccdf. For instance, assume that only mean value of the Gaussian data. This mean can even one datum is available in the kriging neighborhood vary locally in space, provided that it remains constant (under-sampled area): the pseudo ccdf [Eq. (12)] is a at the scale of the kriging neighborhood. step function (zero variance) and obviously under- Ordinary multigaussian kriging is therefore suit- estimates the uncertainty prevailing at the unsampled able to a locally stationary framework and provides location. This statement is true with other methods estimates that are robust to variations of the mean that assess the posterior distributions using ordinary value in space. An alternative would be to use a kriging, such as indicator and disjunctive kriging with simple kriging with a locally varying mean (in prac- unbiasedness constraints: one obtains a step function tice, estimated from the data located in the kriging if a single datum is found in the kriging neighbor- neighborhood) in the conditional expectation estima- hood. It proves that the posterior distributions esti- tor [Eq. (2)]. However, such approach does not ac- mated with ordinary kriging are suitable for estimation count for any uncertainty on the mean value, which is purposes only, not for assessing local uncertainty or considered known although it may be poorly estimat- simulating the unknown values, e.g. via a sequential ed, and provides biased results as soon as the estimat- algorithm (Emery, 2004, p. 410). ed mean departs from the true mean. Instead, ordinary multigaussian kriging appears to be simpler (there is 3.5. Further properties no need to estimate the local mean) and is unbiased even if the true mean value differs from the global or Ordinary multigaussian kriging [Eq. (10)] honors local mean estimated from the data. the value of u[ Y(x)] at each data location. This In practice, the empirical data never conform to the exactitude property holds as soon as one uses in Eq. ideal model (stationary random field with known (5) an estimator Y* that honors the Gaussian data, not mean), so that a less efficient but more robust estima- only an ordinary kriging. 2 tor is often preferred to the boptimalQ estimator (Math- Eq. (10) does not make sense if rOK(x)+2l(x)b0, eron, 1989b, p. 94; Rivoirard, 1994, p. 71; Chile`s and or equivalently var[ Y(x)OK]N1, because of the pres- Delfiner, 1999, p. 39). This concern for robustness is ence of a negative term under the square root. This the reason why ordinary kriging is generally used situation is unlikely to occur, since the variance of the instead of simple kriging when evaluating a regiona- ordinary kriging estimator (weighted average of the lized variable in linear geostatistics. With the multi- data) is usually less than the prior unit variance. gaussian approach, a nonlinear function of this Otherwise, ordinary kriging should be traded for an- variable can be estimated, which is a substantial im- other weighted average of the Gaussian data with a provement with respect to linear kriging. variance less than or equal to one. A sufficient con- dition is that the weights are nonnegative and add to 3.4. Measures of local uncertainty one to ensure unbiasedness; a solution to this problem has been proposed by Barnes and Johnson (1984). A limitation of ordinary multigaussian kriging con- Concerning the precision of ordinary multigaussian cerns the calculation of local uncertainty measures, kriging, the estimation variance of u[ Y(x)] is greater such as confidence intervals, local variance or inter- than the one of the conditional expectation, which is quartile range. Indeed, formulae (12) and (13) only minimal among all the possible estimators of u[ Y(x)]. give an unbiased estimator of the posterior distribu- However, if the data are abundant, ordinary kriging is tion at location x and of any quantity that can be close to simple kriging (Goovaerts, 1997, p. 137), expressed as a linear function of the ccdf (e.g. hence ordinary multigaussian kriging is close to the expected value of Y(x) or the probability that it optimal conditional expectation. It therefore constitu- exceeds a given threshold). But the estimated distri- tes a worthy alternative to other nonlinear estimation bution [Eq. (12)] does not match the true one methods, such as indicator or disjunctive kriging. X. Emery

Table 1 between 0 and 1 m in depth: available water capacity, Advantages and drawbacks of ordinary multigaussian kriging pH, exchangeable aluminum, total acidity and cation Pros Cons exchange capacity. In the following, we are interested Suitable for assessing Not suitable for assessing in the pH of the topsoil (0–200 mm), which has been conditional probabilities local uncertainty measured in a solution of potassium chloride with a and transfer functions (e.g., via confidence ratio soil/solution equal to 1/2.5. The pH is a critical Unbiased estimator intervals or local variance) variable for determining soil fertility, hence for pre- Robust to local variations Undefined if the variance dicting yield variability within the field and defining of the mean value of the ordinary kriging the amount of fertilizer that should be applied. (local stationarity) estimator is greater than Fig. 2b and c display the histogram of the pH data Accounts for uncertainty the prior unit variance and the normal scores variogram. The latter is not on the mean value When the data are scarce, Theoretically less precise isotropic and has a greater range along the direction does not yield the than the conditional N108 W. The directional have been com- prior expectation expectation puted for distances multiple of 25 m, with a tolerance of 208 on the azimuth and 12.5 m on the lag distance, Close to the conditional Restricted to the so that each point is calculated after at least one expectation when multigaussian model the data are abundant hundred data pairs. The model is the sum of a nugget Honors the data effect with sill 0.5 and a spherical structure with sill 0.5 and ranges 155 (N108 W) and 55 m (N808 E). The estimates do not need Concerning the histogram, it has been declustered order-relation corrections with the cell method (Goovaerts, 1997, p. 83), using a cell size of 2525 m that roughly corresponds to Another advantage of ordinary multigaussian kriging the sampling mesh. It shows a bimodality, which may over these two techniques is that there is no need for be symptomatic of a mixture of two populations. order-relation corrections, since the estimated ccdf Actually, a look at the location map (Fig. 2a) proves [Eq. (12)] is an increasing function of the threshold. that the high pH values are disseminated over the The advantages and drawbacks of the ordinary entire field of interest, hence one cannot define sub- multigaussian kriging approach are summarized in domains to study the two populations separately; this Table 1. The main restriction of this approach is observation also explains the high relative nugget certainly the multigaussian assumption, which must effect in the normal scores variogram (50% of the be suited to the available data to ensure the quality of total sill). the estimator. To validate this assumption, a test is Before performing local estimations with multi- proposed in the next section together with a case gaussian kriging, it is advisable to check the compat- study. The other limitations indicated in Table 1 are ibility of the multigaussian model with the quite mild and should not cause difficulty in practical transformed data. applications, as it will be confirmed in the case study. 4.2. Checking the multigaussian assumption

4. Application to soil data In practice, the multigaussian hypothesis cannot be fully validated because, in general, the inference of 4.1. Presentation of the dataset multiple-point statistics is beyond reach. Usually, only the univariate and bivariate distributions are examined In the following, the previous concepts and meth- (Goovaerts, 1997, p. 280). By construction of the ods are applied to a dataset which consists of 165 normal scores transform, the former is Gaussian samples quasi-regularly spaced over a 335335 m shaped and therefore consistent with the multigaus- field (Fig. 2a) located in Reunion Island (Indian sian model. Regarding the latter, there exist several Ocean) and planted to sugar cane. For each sample, ways of checking the two-point normality. One of several variables have been determined on three levels them is the structural analysis of the indicator function X. Emery

Fig. 2. Exploratory and variogram analyses of the pH data: A) location map of the samples; B) declustered pH histogram; C) normal scores variogram; D) test of the bivariate Gaussian assumption.

IY(x,y)[Eq.(11)]: for several thresholds { y1,... yk}, However, one difficulty of the model is that the the empirical indicator variogram is compared to the mean value m is considered unknown and may vary theoretical variogram, which can be expressed as a locally, so it is preferable to find a test that does not function of the correlogram q(h) of the Gaussian field depend on this mean value. In this respect, a conve- (Chile`s and Delfiner, 1999, p. 101) (here, local statio- nient way to check the bigaussian assumption is to narity is assumed, so that the correlogram only study the variograms of order less than two, since depends on the separation vector between samples): these tools are defined after the increments of the random field. For x a]0,2], the variogram of order cI;yðÞ¼h GyðÞ m ½1 GyðÞ m x is defined as (Matheron, 1989a, p. 30): Z " # 1 qðÞh 2 x 1 ðÞy m du cxðÞ¼h EfgjYðÞx þ h YðÞjx ð15Þ exp pffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2p 0 1 þ u 1 u2 For x =2, one finds the classical variogram ð14Þ c(h)=1q(h). Two other cases are of interest: X. Emery x =1.0 and x =0.5, which correspond to the mado- the classical variogram should be aligned with slope gram and rodogram respectively. Under the assump- x/2. Once applied to the transformed data, this test is tion that the distribution of { Y(x+h), Y(x)} is quite satisfactory (Fig. 2d), hence the multigaussian bigaussian, one has (Emery, 2005a, p. 168): assumption is deemed acceptable for the data under   study. 2x1 x þ 1 8xa0; 2; c ðÞ¼h pffiffiffi C ½cðÞh x=2 ð16Þ x p 2 4.3. Mapping the risk that the pH is less than a given threshold where C(.) is the gamma function. Consequently, in log–log coordinates, the points The pH of the soil has an effect on its fertility. that plot the variogram of order x as a function of When it decreases, H+ ions become sufficiently

Fig. 3. Probability maps that the pH is less than 5.1. The sample locations are superimposed. X. Emery concentrated to attack the clay crystals, releasing ! ordinary indicator kriging (Journel, 1983; Goo- Al3+ ions. At pH equal to approximately 5.1, the vaerts, 1997, p. 301). This method relies on a release becomes pronounced and is detrimental to coding of the pH data into indicator values associ- the crops. Hence it is important to correctly assess ated with the target threshold (5.1) [Eq. (11)], the probability that the pH is less than a threshold of followed by an ordinary kriging of the 0–1 values. 5.1. In the following, four local estimation methods The result is an unbiased estimation of the proba- are compared: bility to be below the threshold; ! bigaussian disjunctive kriging with an unbiased- ! conditional expectation [Eq. (1)] ness constraint (Rivoirard, 1994, p. 70; Chile`s ! ordinary multigaussian kriging [Eq. (12)] and Delfiner, 1999, p. 417). For want of a better

Fig. 4. Error maps plotting the absolute differences between each estimator and the conditional expectation. X. Emery

name, this method will be called ordinary disjunc- Table 2 tive kriging hereafter. It is based on an expansion Coordinates and pH values on the first row of samples of the indicator function [Eq. (11)] into Hermite Sample number Easting [m] Northing [m] pH value and an ordinary kriging of each poly- 1 100.0 325.0 5.54 nomial from its values at the data locations. The 2 125.0 325.0 5.40 spatial covariance of the polynomial of degree p is 3 150.0 325.0 6.11 4 177.0 323.0 4.98 equal to the one of the Gaussian data raised to 5 202.0 323.0 5.25 power p. Unlike indicator kriging, ordinary disjunc- 6 227.0 322.0 5.22 tive kriging avoids the loss of information due to 7 251.0 322.0 4.71 the indicator coding of the original data (Lark and 8 276.0 324.0 4.09 Ferguson, 2004, p. 42). It can be shown that it 9 301.0 324.0 5.36 10 326.0 325.0 5.07 amounts to a full ordinary indicator cokriging, i.e. an ordinary cokriging of the indicators associated with all the possible thresholds (Liao, 1991). west part of the field have a higher pH than the global mean (around 5.5, whereas the average of the entire In each case, kriging is performed in a moving field is 5.2). In contrast, in the sampled area, little neighborhood with a radius of 250 m along the direc- difference is observed between conditional expectation tion of greater continuity (N108 W) and 90 m along the and ordinary multigaussian kriging. These two meth- orthogonal direction (N808 E), looking for the 48 ods account for the multivariate Gaussian distribution nearest samples. This number is deemed sufficient of the data and provide consistent conditional proba- for estimating the local when resorting to ordi- bilities (between 0 and 1), whereas disjunctive and nary kriging. The results are displayed in Fig. 3. One indicator kriging require correcting the order-relation notices that, in the northwest corner, the conditional violations and significantly depart from conditional expectation converges to the that the expectation, even inside the sampled area (Fig. 4). pH is less than 5.1 (namely, 41%), while the other three The discrepancy between indicator and multigaus- methods provide lower estimates (in general, below sian kriging can be better understood on a simple 30%). The reason is that the data located in the north- example. Fig. 5 plots the estimated probabilities

Fig. 5. Estimated probabilities along the east–west transect with north coordinate 322.5 m. X. Emery along a transect that approximately corresponds to the datum of the transect (pH equal to 4.09) and the last first row of data (north coordinate equal to 322.5 m, datum (pH equal to 5.07) play the same role in the see Fig. 2a). The pH values of these data are presented indicator kriging formalism, as both of them are coded in Table 2. As previously mentioned, in the western as b1Q, which is misleading. On the contrary, the other part of the transect, no data are available and the three methods yield a higher risk that the pH is less conditional expectation yields a higher probability than 5.1 in the vicinity of the eighth datum, and a (0.41) than the three other methods. Now, the discrep- lower risk in the vicinity of the last datum, which is ancy between conditional expectation and indicator very close to the target threshold. kriging can be explained by the loss of information produced by the data coding into a binary variable: a 4.4. Use of the probability maps for soil management datum is coded as 1 if its pH is less than 5.1, inde- pendently of whether the value is close to the thresh- The previous methods quantify the risk that soil old or not. In the example under study, the eighth acidity is detrimental for crops and are useful for soil

Fig. 6. Delineation of the areas where liming is needed. X. Emery management and decision-making, such as delineat- 6) estimate the conditional distribution of the normal ing the areas where a treatment is needed. variable at any unsampled location [Eq. (12)], and In this respect, one can calculate the proportion back-transform it to the original unit. of the entire field whose pH is less than 5.1, either from the data histogram (Fig. 2b) or by averaging In addition to its simplicity and straightforward- the local estimations obtained via multigaussian, ness, ordinary multigaussian kriging provides consis- disjunctive or indicator kriging. In every case, one tent results and no order-relation correction is needed. finds a global proportion equal to 41% approximate- If the neighboring information is abundant, it is prac- ly. From this, the probability maps (Fig. 3) can be tically identical to the conditional expectation, which used to delineate the areas for which it is necessary constitutes the bidealQ estimator of the conditional to make conditions favorable for crops to tolerate distributions. Furthermore, it is robust to local varia- soil acidity or to reduce this acidity (e.g. by liming) tions of the mean value over the field of interest, so as to neutralize the toxic concentrations of H+ which is advantageous in under-sampled areas since and Al3+. A simple way is to impose that the it avoids the attraction to the prior probabilities ob- delineated areas cover 41% of the entire field, so served when using simple kriging. For these reasons, as to be consistent with the global estimation. The ordinary multigaussian kriging is expected to always delineations are shown in Fig. 6, in which one outperform bigaussian disjunctive kriging, as in prac- observes that indicator kriging yields results that tice the conditions to apply both methods are the strongly differ from the three other (Gaussian- same. However, the user should beware that the multi- based) methods. gaussian assumption, or at least the bigaussian as- Of course, other approaches are possible for de- sumption, is suitable to the available data. lineation. In particular, instead of using the probabil- Otherwise, one should resort to indicator kriging or ity maps, one may define a (Goovaerts, to disjunctive kriging under an appropriate non- 1997, p. 350) to determine whether liming is eco- Gaussian model (Hu, 1989; Liao, 1991; Chile`s and nomically profitable or not and optimize the Liao, 1993; Chile`s and Delfiner, 1999, p. 398–419). expected farm income by avoiding excessive liming and fertilization. 5. Conclusions 4.5. Discussion on the local estimation methods The ordinary multigaussian kriging approach pro- Indicator kriging may be tedious if many thresh- vides an unbiased estimation of posterior distributions olds are considered, while disjunctive kriging is quite and transfer functions of a spatial attribute, e.g. a soil convoluted for most practitioners. In contrast, ordi- property that can be modeled by a Gaussian random nary multigaussian kriging is straightforward and field. Although this estimator is theoretically less appears to be a helpful tool for soil scientists who precise than the conditional expectation and does wish to map the conditional probabilities of a soil not allow one to measure the uncertainty prevailing property and incorporate these probabilities in deci- at unsampled locations, it is robust to local variations sion-making processes. The steps required for this of the mean value and is therefore suitable when the approach are the following: data do not conform to the ideal stationary model, especially in under-sampled areas. The method is 1) calculated declustering weights to obtain a repre- simple and straightforward to apply, since a single sentative histogram of the original data; variogram analysis and kriging are required. Further- 2) transform these data to normal scores, using the more, it does not suffer from order-relation problems declustered histogram; and provides conditional probabilities that always lie 3) check the suitability of the two-point normality in [0,1]. hypothesis; Like indicator and disjunctive kriging, multigaus- 4) model the variogram of the normal scores data; sian kriging is a helpful method in soil science appli- 5) perform an ordinary kriging of these data; cations concerned with spatial predictions. In X. Emery particular, mapping the conditional probabilities of a Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. soil property is of importance for management deci- Oxford University Press, New York. Hu, L.Y., 1989. Comparing gamma isofactorial disjunctive kriging sions, which are based on threshold values of this and indicator kriging for estimating local spatial distributions. property, such as delineating safe and hazardous In: Armstrong, M. (Ed.), Geostatistics. Kluwer Academic, Dor- areas or identifying zones that are suitable for crop drecht, pp. 154–165. growth and those that must be treated. Journel, A.G., 1980. The lognormal approach to predicting local distributions. Mathematical Geology 12 (4), 285–303. Journel, A.G., 1983. Non parametric estimation of spatial distribu- tions. Mathematical Geology 15 (3), 445–468. Acknowledgements Journel, A.G., Huijbregts, C.J., 1978. Mining Geostatistics. Aca- demic Press, London. The author would like to acknowledge the spon- Lark, R.M., Ferguson, R.B., 2004. Mapping risk of soil nutrient soring by Codelco-Chile for supporting this research, deficiency or excess by disjunctive and indicator kriging. Geo- derma 118, 39–53. and the French Agricultural Research Centre for In- Liao, H.T., 1991. Estimation des re´serves re´cuperables des gise- ternational Development (CIRAD) for providing the ments dTor – comparaison entre krigeage disjonctif et krigeage dataset used in this work. Thanks are due to Dr. Julia´n des indicatrices. Doctoral Thesis, Universite´dTOrle´ans. Report Ortiz (University of Chile) and to the anonymous no 202, Bureau de Recherches Ge´ologiques et Minie`res, reviewers for their comments on an earlier version Orle´ans, France. Matheron, G., 1989a. The internal consistency of models in geos- of this paper. tatistics. In: Armstrong, M. (Ed.), Geostatistics. Kluwer Aca- demic, Dordrecht, pp. 21–38. Matheron, G., 1989b. Estimating and Choosing: An Essay on References Probability in Practice. Springer-Verlag, Berlin. Oliver, M.A., Webster, R., McGrath, S.P., 1996. Disjunctive kriging for environmental management. Environmetrics 7 (3), 333–357. Barnes, R.J., Johnson, T.B., 1984. Positive kriging. In: Verly, G., David, M., Journel, A.G., Mare´chal, A. (Eds.), Geostatistics Rivoirard, J., 1990. A review of lognormal estimators for in situ for Natural Resources Characterization. Reidel, Dordrecht, reserves. Mathematical Geology 22 (2), 213–221. pp. 231–244. Rivoirard, J., 1994. Introduction to Disjunctive Kriging and Non- linear Geostatistics. Clarendon Press, Oxford. Chile`s, J.P., Delfiner, P., 1999. Geostatistics: Modeling Spatial Uncertainty. Wiley, New York. Van Meirvenne, M., Goovaerts, P., 2001. Evaluating the probability Chile`s, J.P., Liao, H.T., 1993. Estimating the recoverable reserves of of exceeding a site-specific soil cadmium contamination thresh- old. Geoderma 102, 75–100. gold deposits: comparison between disjunctive kriging and in- dicator kriging. In: Soares, A. (Ed.), Geostatistics Tro´ia’92. Verly, G., 1983. The multigaussian approach and its applications to Kluwer Academic, Dordrecht, pp. 1053–1064. the estimation of local reserves. Mathematical Geology 15 (2), David, M., 1988. Handbook of Applied Advanced Geostatistical 259–286. Webster, R., Oliver, M.A., 1989. Optimal and isarith- Ore Reserve Estimation. Elsevier, Amsterdam. Emery, X., 2004. Testing the correctness of the sequential algorithm mic mapping of soil properties: VI. Disjunctive kriging and for simulating Gaussian random fields. Stochastic Environmen- mapping the conditional probability. Journal of Soil Sciences tal Research and Risk Assessment 18 (6), 401–413. 40, 497–512. Webster, R., Oliver, M.A., 2001. Geostatistics for Environmental Emery, X., 2005a. Variograms of order x: a tool to validate a bivariate distribution model. Mathematical Geology 37 (2), Scientists. Wiley, New York. 163–181. Wood, G., Oliver, M.A., Webster, R., 1990. Estimating soil salinity by disjunctive kriging. Soil Use and Management 6 Emery, X., 2005b. Simple and ordinary multigaussian kriging for estimating recoverable reserves. Mathematical Geology 37 (3), (3), 97–104. 295–319.