On some variogram habits inappropriate for covariograms

Nicolas Bez June 2004

In geostatistical fora and applications, the intrinsic approach had fully taken the place of the transitive approach, until the estimation of ecological phenomena made the latter approach attractive again.

However, a covariogram is not identical to a variogram, and some of the geostatistical expertise and practice developed for variograms cannot be transposed as such to covariograms.

In the present note, three points are considered in turn:
1- White noise
2- Nested structures
3- Hole effects

What is the covariogram of a white noise?

Introduction

In intrinsic geostatistics, a white noise corresponds to a stationary random function without spatial correlation. Its spatial covariance is a pure nugget effect. In this definition, a white noise is thus a particular random function, denoted hereafter $W(x)$. However, the concept of absence of correlation cannot be directly transposed to the transitive approach: first, because this approach considers deterministic regionalized variables (the randomness being attributed in some way to the sample locations), and second, because the statistical properties of a realization of a white noise depend on the particular field on which it is studied.

Definition

Let us consider the indicator $k(x)$ associated with the field $V$:

$$k(x) = \begin{cases} 1 & \text{if } x \in V \\ 0 & \text{if } x \notin V \end{cases}$$

The (geometrical) covariogram of $k(x)$ is

$$K(h) = \int k(x)\, k(x+h)\, dx$$

It is linear at the origin (Matheron, 1965) and has a range equal to the diameter of the field in each direction. We consider a white noise $W(x)$ with the following characteristics:

$$E\{W(x)\} = m, \qquad \operatorname{cov}\{W(x), W(x+h)\} = \sigma^2\, \mathrm{nugget}(h)$$

where $\mathrm{nugget}(h)$ represents a nugget effect function ($\mathrm{nugget}(0) = 1$ and $\mathrm{nugget}(h) = 0$ for $h \neq 0$). When $W(x)$ represents an unsystematic measurement error, this usually amounts to setting $m$ to 0. Let us now consider $w(x)$, a realization of $W(x)$. In a transitive approach, we are only concerned with the restriction of $w(x)$ to the field $V$:

$$w_V(x) = k(x)\, w(x)$$
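A minimal 1D sketch of these definitions, assuming a discrete field of `n` unit cells starting at cell `start`, so that integrals become sums over cells. The function names (`indicator`, `geometric_covariogram`, `restrict`) are hypothetical illustrations, not part of the original note.

```python
def indicator(x, start, n):
    """k(x): 1 if cell x lies in the field V = [start, start + n), else 0."""
    return 1 if start <= x < start + n else 0


def geometric_covariogram(h, start, n):
    """K(h) = sum over x of k(x) k(x + h), the discrete analogue of the integral."""
    lo, hi = start - abs(h) - 1, start + n + abs(h) + 1  # window covering V and its shift
    return sum(indicator(x, start, n) * indicator(x + h, start, n) for x in range(lo, hi))


def restrict(w, x, start, n):
    """w_V(x) = k(x) w(x): the realization w seen only inside the field V."""
    return indicator(x, start, n) * w(x)


# For a segment of n cells, K(h) = n - |h| up to |h| = n: a triangle with K(0) = n
# and a range equal to the diameter of the field.
print([geometric_covariogram(h, 0, 5) for h in range(-6, 7)])
# [0, 0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 0]
```

In 1D the field is a segment, so the geometrical covariogram is the triangle function used in the example further below.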

We know that

$$\operatorname{var}\{W(V)\} = \frac{\sigma^2}{V} \;\longrightarrow\; 0 \quad \text{as } V \to \infty$$

which ensures ergodicity for the mean (Chilès and Delfiner, 1999, p. 21). So we know that, for large enough fields, the empirical mean of $w_V(x)$ tends to the true mean:

$$\lim_{V\to\infty} \frac{\int w_V(x)\, dx}{K(0)} = \lim_{V\to\infty} w_V(V) = m$$
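The ergodicity argument can be checked numerically with a hedged sketch. It assumes $W$ is simulated as i.i.d. $U[0,1]$ values on unit cells (so $m = 1/2$, $\sigma^2 = 1/12$): the spatial mean over a large field should approach $m$, and the variance of the field mean across realizations should be close to $\sigma^2 / V$. The seed, the field sizes and the helper name `field_mean` are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, for reproducibility only

M, S2 = 0.5, 1.0 / 12.0  # mean and variance of U[0, 1]


def field_mean(n_cells):
    """Spatial mean w_V(V) of one realization of the white noise over V = n_cells cells."""
    return sum(random.random() for _ in range(n_cells)) / n_cells


# The mean over a large field is close to the true mean m.
big_field_mean = field_mean(100_000)

# Across many realizations on a small field, var{W(V)} should be close to sigma^2 / V.
n_small, n_reps = 100, 500
means = [field_mean(n_small) for _ in range(n_reps)]
empirical_var = statistics.variance(means)
theoretical_var = S2 / n_small

print(round(big_field_mean, 4), round(empirical_var, 6), round(theoretical_var, 6))
```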

By extension, we also get the following convergences:

$$\lim_{V\to\infty} \frac{\int w_V^2(x)\, dx}{K(0)} = \sigma^2 + m^2, \qquad \lim_{V\to\infty} \frac{\int w_V(x)\, w_V(x+h)\, dx}{K(h)} = m^2 \quad (h \neq 0)$$

so that, for a large enough field, the covariogram of $w_V(x)$,

$$g_{w_V}(h) = \int w_V(x)\, w_V(x+h)\, dx,$$

is

$$g_{w_V}(h) = \begin{cases} \int w_V^2(x)\, dx \approx K(0)\,(\sigma^2 + m^2) & \text{if } h = 0 \\ \int w_V(x)\, w_V(x+h)\, dx \approx K(h)\, m^2 & \text{otherwise} \end{cases}$$

After some algebra, we find that, when the field is large enough,

$$g_{w_V}(h) \approx \underbrace{K(0)\,\sigma^2\, \mathrm{nugget}(h)}_{\text{usual contribution for variograms}} + m^2 K(h), \qquad g_{w_V}(0) \approx K(0)\,(\sigma^2 + m^2) \tag{1}$$

This indicates that the contribution of a white noise is twofold:
• it provides a nugget effect equal to the variance of the white noise times the surface of the field, $\sigma^2 K(0)$;
• but it also modifies the continuous part of the structure by a quantity equal to the geometrical covariogram times the squared mean of the white noise, $m^2 K(h)$.
This is fundamentally different from the impact of a white noise on a variogram, which is insensitive to the mean and only receives a nugget effect $\sigma^2$.

Equation 1 corresponds to the covariogram obtained for a very large field, or to the average covariogram of several realizations of $W(x)$. As often in the transitive approach (see Matheron on the meaning of the fitting procedure), it corresponds to an implicit stochastic approach while working in a deterministic framework. Indeed, any particular covariogram of a white noise presents fluctuations. These fluctuations become smaller as the field gets larger (large compared to the variance of the signal).

As an example, we consider the following 1D white noise:

$$V = K(0) = 1200$$

$$w_V(x_i) = w_i \sim U_{\text{iid}}[0,1], \qquad m = \frac{1}{2}, \quad \sigma^2 = \frac{1}{12}$$

The geometrical covariogram is a triangle function (Figure 1). The range and the value at the origin are consistent with the characteristics of the field $V$. The covariogram of $w_V(x)$ is not a pure nugget effect: it is made of a nugget effect and of a linear structure with a range equal to the diameter of the field. However, small fluctuations exist around the linear tendency. As expected, the nugget effect equals 100, that is, the variance times the surface of the field ($\frac{1}{12} \times 1200$).

[Figure 1 panels (columns: regionalized variable, covariogram): $k(x)$; $w_V(x)$]

Figure 1 Covariograms of the indicator of the field $k(x)$ and of a white noise $w(x)$
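The example can be reproduced approximately with the following sketch, assuming unit cells so that $K(0) = 1200$ and $K(h) = 1200 - |h|$. The seed is arbitrary and the exact values vary from one realization to the next (precisely the fluctuations discussed above); only their order of magnitude should match Equation 1. The helper name `covariogram` is hypothetical.

```python
import random


def covariogram(values, h):
    """Discrete transitive covariogram g(h) = sum_i w_V(x_i) w_V(x_i + h), unit cells."""
    h = abs(h)
    return sum(values[i] * values[i + h] for i in range(len(values) - h))


random.seed(2004)                          # arbitrary seed, for reproducibility only
n = 1200                                   # K(0): number of unit cells in the field V
w = [random.random() for _ in range(n)]    # w_i ~ U[0, 1], i.i.d.
m, s2 = 0.5, 1.0 / 12.0                    # mean and variance of U[0, 1]

g0 = covariogram(w, 0)                     # Equation 1: about K(0)(s2 + m^2) = 400
g1 = covariogram(w, 1)                     # Equation 1: about m^2 K(1) = 0.25 * 1199
# Apparent nugget: jump between g(0) and the linear part m^2 K(h) extrapolated to h = 0.
nugget = g0 - g1 * n / (n - 1)             # theory: K(0) * s2 = 100
print(round(g0, 1), round(g1, 1), round(nugget, 1))
```

The extrapolation rescales $g(1)$ by $K(0)/K(1)$, which is the linear part $m^2 K(h)$ evaluated at the origin.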

Nested structures

When interpreting nested variograms, one often suggests decomposing the target random function $Z(x)$ into pairwise orthogonal random functions with different ranges

$$Z(x) = Z_1(x) + Z_2(x) + \dots + Z_N(x)$$

so that the sum of the individual variograms equals the overall variogram:

$$\gamma(h) = \gamma_1(h) + \gamma_2(h) + \dots + \gamma_N(h)$$

This kind of interpretation is no longer possible for nested covariograms. First, let us recall that the range of a covariogram is a geometrical quantity associated with the size of the field in the direction of computation. If we decompose a regionalized variable into elementary variables with different ranges

$$z(x) = z_1(x) + z_2(x) + \dots + z_N(x)$$

the overall covariogram has a range equal to the largest range but does not reduce to the sum of the individual covariograms (additional cross terms have to be considered). For instance, for two variables we get

$$g(h) = g_1(h) + g_2(h) + \int z_1(x)\, z_2(x+h)\, dx + \int z_1(x+h)\, z_2(x)\, dx$$

$$g(0) = g_1(0) + g_2(0) + 2 \int z_1(x)\, z_2(x)\, dx$$

It is not possible to refer to some internal independence (Matheron, 1965, p. 96), as it requires all elementary variables $z_i(x)$ to have the same field and thus the same range, which precludes observing nested structures with various specific scales.
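The cross terms can be verified on a toy discrete example. The two elementary variables below are arbitrary integer values chosen for illustration (hypothetical data), `z1` having a shorter field than `z2`; with unit cells, integrals become sums over cells, and the helper names are hypothetical.

```python
def cross_covariogram(za, zb, h):
    """sum_i za[i] zb[i + h], with both variables equal to 0 outside the list."""
    n = len(za)
    return sum(za[i] * zb[i + h] for i in range(n) if 0 <= i + h < n)


def covariogram(z, h):
    """g(h) = sum_i z[i] z[i + h]."""
    return cross_covariogram(z, z, h)


# Hypothetical elementary variables on the same grid: z1 on 3 cells, z2 on 8 cells.
z1 = [4, 6, 3, 0, 0, 0, 0, 0]
z2 = [1, 2, 2, 3, 1, 2, 1, 1]
z = [a + b for a, b in zip(z1, z2)]

for h in range(-len(z) + 1, len(z)):
    expected = (covariogram(z1, h) + covariogram(z2, h)
                + cross_covariogram(z1, z2, h)    # integral of z1(x) z2(x + h)
                + cross_covariogram(z2, z1, h))   # integral of z1(x + h) z2(x)
    assert covariogram(z, h) == expected

# The sum of the individual covariograms alone misses the cross terms at h = 0.
print(covariogram(z, 0), covariogram(z1, 0) + covariogram(z2, 0))  # 130 86
```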

The particular case of a white noise $w_V(x)$ superimposed on a regionalized variable $y(x)$ is interesting and workable:

$$z(x) = y(x) + w_V(x)$$

Typically, this model is used when a target variable is observed with some measurement errors (either systematic or unsystematic). We consider that the two components are defined on the same field V and are independent in the transitive sense, i.e. that

$$g_{w,y}(h) = \int w_V(x)\, y(x+h)\, dx = m_w\, m_y\, K(h)$$

When the field is large enough, the covariogram of z(x) is

$$g_z(h) = g_y(h) + \left(2\, m_w m_y + m_w^2\right) K(h) + \sigma_w^2\, K(0)\, \mathrm{nugget}(h)$$

In absolute value, the apparent nugget effect is that of $y(x)$, if it exists (which is the case in most practical situations), plus $\sigma_w^2 K(0)$, a perturbation proportional to the field size.
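A hedged numerical check of this expression, assuming unit cells, a deterministic $y(x)$ that is constant ($y = 2$) on a field of 2000 cells, and a white noise simulated as i.i.d. $U[0,1]$ values ($m_w = 1/2$, $\sigma_w^2 = 1/12$). The function `gz_theory` is a hypothetical helper transcribing the large-field formula, including the $m_w^2 K(h)$ contribution of the white noise's own covariogram given by Equation 1; the seed is arbitrary.

```python
import random

random.seed(7)                   # arbitrary seed, for reproducibility only
n = 2000                         # K(0): number of unit cells in the field V
c = 2.0                          # hypothetical constant value of y(x) on V, so m_y = c
m_w, s2_w = 0.5, 1.0 / 12.0      # mean and variance of the U[0, 1] white noise

w = [random.random() for _ in range(n)]
z = [c + wi for wi in w]         # z(x) = y(x) + w_V(x)


def covariogram(v, h):
    """Discrete covariogram sum_i v[i] v[i + h] on unit cells."""
    h = abs(h)
    return sum(v[i] * v[i + h] for i in range(len(v) - h))


def g_y(h):
    """Covariogram of the constant y(x): c^2 K(h)."""
    return c * c * max(n - abs(h), 0)


def gz_theory(h, g_y, m_w, m_y, s2_w, n):
    """g_y(h) + (2 m_w m_y + m_w^2) K(h) + s2_w K(0) nugget(h), with K(h) = n - |h|."""
    K = max(n - abs(h), 0)
    nugget = 1.0 if h == 0 else 0.0
    return g_y(h) + (2 * m_w * m_y + m_w ** 2) * K + s2_w * n * nugget


for h in (0, 1, 5, 500):
    print(h, round(covariogram(z, h), 1), round(gz_theory(h, g_y, m_w, c, s2_w, n), 1))
```

The simulated and theoretical values should agree up to the realization fluctuations; the jump between $h = 0$ and $h = 1$ isolates the perturbation $\sigma_w^2 K(0)$.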

For unsystematic measurement errors, $m_w \approx 0$. The main impact of the white noise then concerns the nugget effect (Figure 2, middle panels). If the mean of the white noise over the field is not strictly null, small contributions are also expected at all distances. For global estimation purposes based on a regular sampling grid, this means that the global estimate is unchanged but that the estimation variance is larger by $\sigma_w^2 K(0)\, a$ when the white noise is taken into account ($a$ being the surface of a grid cell, expressed in units compatible with $K(0)$).
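The increase of the estimation variance can be illustrated with the usual transitive formula for a 1D regular grid of mesh $a$ with a random origin, $\sigma_E^2(a) = a \sum_k g(ka) - \int g(h)\, dh$. The sketch below assumes a hypothetical triangular covariogram for $y(x)$ and adds the nugget $C_0 = \sigma_w^2 K(0)$ of the white-noise example at the origin; since a nugget contributes to the grid sum but not to the integral, the variance should grow by $a\, C_0$. The function and parameter names are illustrative.

```python
def estimation_variance(g, a, h_max, n_fine=20_000):
    """1D transitive estimation variance a * sum_k g(k a) - integral of g.

    g is assumed even and zero beyond h_max. The integral uses the midpoint rule on
    (0, h_max), doubled by symmetry; midpoints never hit h = 0, so a nugget effect
    at the origin contributes to the grid sum but not to the integral.
    """
    k_max = int(h_max // a) + 1
    grid_sum = a * (g(0.0) + 2.0 * sum(g(k * a) for k in range(1, k_max + 1)))
    dh = h_max / n_fine
    integral = 2.0 * dh * sum(g((i + 0.5) * dh) for i in range(n_fine))
    return grid_sum - integral


K0, s2_w = 1200.0, 1.0 / 12.0   # values of the white-noise example
C0 = s2_w * K0                  # nugget contributed by the white noise (here 100)
L = 51.0                        # hypothetical range of the covariogram of y(x)


def g_y(h):
    """Hypothetical triangular covariogram of y(x)."""
    return max(L - abs(h), 0.0)


def g_z(h):
    """Same structure plus the white-noise nugget at h = 0."""
    return g_y(h) + (C0 if h == 0 else 0.0)


a = 2.0  # grid mesh
print(estimation_variance(g_y, a, L), estimation_variance(g_z, a, L), a * C0)
```

For this triangular covariogram (a segment of length 51 sampled every 2 units) the variance without noise is 1, and the white noise adds $a\, C_0 = 200$.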

For systematic measurement errors, $m_w \neq 0$ and the white noise impacts the whole covariogram structure (Figure 2, bottom panels).

[Figure 2 panels (columns: regionalized variable, covariogram): $y(x)$; $z(x) = y(x) + w_V(x)$ with $w_V(V) \approx 0$; $z(x) = y(x) + w_V(x)$ with $w_V(V) \neq 0$]

Figure 2 Nested covariograms for $y(x)$ and $z(x) = y(x) + w_V(x)$, depending on whether the average value of the white noise $w_V(x)$ over the field $V$ is approximately null or not.

A hole effect covariogram model

Introduction

When dealing with a positive regionalized variable $z(x)$ (e.g. fish concentration), the transitive covariogram $g(h) = \int z(x)\, z(x+h)\, dx$ cannot be negative. Although this is not theoretically compulsory, only covariograms with a finite range are considered in this short communication. As a matter of fact, infinite ranges would correspond to regionalized variables with an infinite field, which is practically impossible and thus not worth considering. Amongst the usual variogram models, some are no longer admissible for covariograms. In particular, those used to model hole effects, like the cardinal sine function taken as an example hereafter, cannot be used in a transitive approach. The dilemma comes from the fact that if we want a cardinal sine to be non-negative, it will not converge to 0 at large distances (Figure 3a). Conversely, if we force the model to converge to 0, it will be negative over some distance intervals, which is not allowed (Figure 3b). In addition, the multiple waves present in the model are not consistent with a regionalized variable with only one “hole” (isotropy and repetitive waves).

Figure 3 Cardinal sine model with either positive values (a) or going to 0 at large distances (b).

The question is then to define an admissible covariogram model for hole effects, that is, a model which is:
• positive definite (to serve for variance computations);
• positive (to honor the positivity of the regionalized variable).
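Both conditions can be probed numerically, with the caveat that a finite check is only indicative: positivity is tested on a grid of distances, and positive definiteness by attempting a Cholesky factorization of the matrix $[f(x_i - x_j)]$ on a finite set of points (success on one set is necessary, not sufficient). The helper names and the point set are illustrative assumptions. The cardinal sine passes the second check but fails the first, which is the dilemma described above.

```python
import math


def sinc(h):
    """Cardinal sine sin(h)/h, a classical hole-effect model for variograms."""
    return 1.0 if h == 0 else math.sin(h) / h


def gaussian(h):
    """A Gaussian-shaped model, positive everywhere (for comparison)."""
    return math.exp(-h * h)


def is_nonnegative(f, h_max=50.0, n=5000):
    """Check f(h) >= 0 on a regular grid of distances in [0, h_max]."""
    return all(f(i * h_max / n) >= 0.0 for i in range(n + 1))


def cholesky_succeeds(f, points):
    """Attempt a Cholesky factorization of the matrix [f(x_i - x_j)].

    Success means this particular matrix is positive definite; it does not prove
    that f is positive definite for every set of points.
    """
    n = len(points)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = f(points[i] - points[j]) - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


points = [4.0 * i for i in range(8)]
for name, f in (("sinc", sinc), ("gaussian", gaussian)):
    print(name, is_nonnegative(f), cholesky_succeeds(f, points))
```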

Definition

By construction, a covariogram is the convolution product of the regionalized variable with its transpose, and is therefore positive definite. The idea is thus to make explicit the covariogram obtained for a regionalized variable of the following form (Figure 4):

$$z(x,y) = g_1(x,y) + g_2(t_x - x,\, t_y - y) = C_1\, g(x, y, a_1) + C_2\, g(t_x - x,\, t_y - y,\, a_2)$$

where $\Delta = (t_x, t_y)$ is the vector joining the two maxima and $g(x, y, a_1)$ is the probability density function (pdf) of a bigaussian vector of random variables without correlation ($\rho = 0$) and with the same standard deviation ($a_1$):

$$g(x, y, a_1) = \frac{1}{2\pi a_1^2}\, e^{-\frac{x^2 + y^2}{2 a_1^2}}$$

Figure 4 Plan view of a regionalized variable made of two bigaussian distributions. The circles represent isoprobability lines.

Parameters a1 and a2 are expressed in distance units. In the bigaussian pdf they correspond to the standard deviation of each variable. In the present context they correspond to the spatial extension of each dome of the regionalization. Parameters C1 and C2 are expressed in the variable units. They quantify the level of each dome.
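A direct transcription of this two-dome regionalized variable, as a sketch: the function names are hypothetical and the second dome is centred at $(t_x, t_y)$. Since each bigaussian pdf integrates to 1, the total quantity $\iint z$ should equal $C_1 + C_2$, which the usage example checks with a simple Riemann sum (the parameter values are arbitrary).

```python
import math


def bigauss(x, y, a):
    """pdf of an uncorrelated bigaussian vector with common standard deviation a."""
    return math.exp(-(x * x + y * y) / (2.0 * a * a)) / (2.0 * math.pi * a * a)


def z(x, y, C1, a1, C2, a2, tx, ty):
    """Two-dome regionalized variable C1 g(x, y, a1) + C2 g(tx - x, ty - y, a2)."""
    return C1 * bigauss(x, y, a1) + C2 * bigauss(tx - x, ty - y, a2)


# Riemann sum of z on a 0.1 grid over [-5, 8]^2 (domes at (0, 0) and (3, 0)).
step = 0.1
total = sum(z(-5 + step * i, -5 + step * j, 1.0, 0.75, 0.5, 0.75, 3.0, 0.0)
            for i in range(131) for j in range(131)) * step * step
print(round(total, 4))  # close to C1 + C2 = 1.5
```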

The computation of the covariogram amounts to the following:

$$\begin{aligned} g(h_x, h_y) &= \iint z(x,y)\, z(x+h_x, y+h_y)\, dx\, dy \\ &= \iint \left[g_1(x,y) + g_2(t_x - x, t_y - y)\right] \left[g_1(x+h_x, y+h_y) + g_2(t_x - x - h_x, t_y - y - h_y)\right] dx\, dy \\ &= \iint g_1(x,y)\, g_1(x+h_x, y+h_y)\, dx\, dy \\ &\quad + \iint g_2(t_x - x, t_y - y)\, g_2(t_x - x - h_x, t_y - y - h_y)\, dx\, dy \\ &\quad + \iint g_1(x,y)\, g_2(t_x - x - h_x, t_y - y - h_y)\, dx\, dy \\ &\quad + \iint g_2(t_x - x, t_y - y)\, g_1(x+h_x, y+h_y)\, dx\, dy \\ &= g_{11} + g_{22} + g_{12} + g_{21} \end{aligned}$$

After some algebra, we get:

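$$g_{11} = \frac{C_1^2}{4\pi a_1^2}\, e^{-\frac{|h|^2}{4a_1^2}}, \qquad g_{22} = \frac{C_2^2}{4\pi a_2^2}\, e^{-\frac{|h|^2}{4a_2^2}}$$

$$g_{12} = \frac{C_1 C_2}{2\pi (a_1^2 + a_2^2)}\, e^{-\frac{|\Delta|^2 + |h|^2 - 2\langle \Delta, h\rangle}{2(a_1^2 + a_2^2)}}, \qquad g_{21} = \frac{C_1 C_2}{2\pi (a_1^2 + a_2^2)}\, e^{-\frac{|\Delta|^2 + |h|^2 + 2\langle \Delta, h\rangle}{2(a_1^2 + a_2^2)}}$$

where $\langle \Delta, h \rangle$ represents the scalar product of $\Delta$ and $h$. Finally, we obtain:

$$g(h) = \frac{1}{4\pi}\left[\frac{C_1^2}{a_1^2}\, e^{-\frac{|h|^2}{4a_1^2}} + \frac{C_2^2}{a_2^2}\, e^{-\frac{|h|^2}{4a_2^2}}\right] + \frac{C_1 C_2\, e^{-\frac{|\Delta|^2 + |h|^2}{2(a_1^2 + a_2^2)}}}{2\pi(a_1^2 + a_2^2)} \left[e^{\frac{\langle \Delta, h\rangle}{a_1^2 + a_2^2}} + e^{-\frac{\langle \Delta, h\rangle}{a_1^2 + a_2^2}}\right]$$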

In particular:

$$g(0) = \frac{1}{4\pi}\left[\frac{C_1^2}{a_1^2} + \frac{C_2^2}{a_2^2}\right] + \frac{C_1 C_2}{\pi (a_1^2 + a_2^2)}\, e^{-\frac{|\Delta|^2}{2(a_1^2 + a_2^2)}}$$
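The closed form can be cross-checked against a brute-force evaluation of $\iint z(x,y)\, z(x+h_x, y+h_y)\, dx\, dy$. This sketch rests on stated assumptions: a Riemann sum with step 0.1 over a square large enough to contain both domes, and arbitrary parameter values $a_1 = a_2 = 0.75$, $C_1 = 1$, $C_2 = 0.5$, $\Delta = (3, 0)$. All function names are hypothetical.

```python
import math


def bigauss(x, y, a):
    """pdf of an uncorrelated bigaussian vector with common standard deviation a."""
    return math.exp(-(x * x + y * y) / (2.0 * a * a)) / (2.0 * math.pi * a * a)


def z(x, y, C1, a1, C2, a2, dx, dy):
    """Two domes: C1 g(x, y, a1) + C2 g(dx - x, dy - y, a2), with Delta = (dx, dy)."""
    return C1 * bigauss(x, y, a1) + C2 * bigauss(dx - x, dy - y, a2)


def covariogram_closed(hx, hy, C1, a1, C2, a2, dx, dy):
    """Closed-form covariogram g11 + g22 + g12 + g21 of the two-dome variable."""
    h2, d2 = hx * hx + hy * hy, dx * dx + dy * dy
    dot = dx * hx + dy * hy               # scalar product <Delta, h>
    s = a1 * a1 + a2 * a2
    auto = (C1 ** 2 / a1 ** 2 * math.exp(-h2 / (4 * a1 ** 2))
            + C2 ** 2 / a2 ** 2 * math.exp(-h2 / (4 * a2 ** 2))) / (4 * math.pi)
    cross = (C1 * C2 / (2 * math.pi * s) * math.exp(-(d2 + h2) / (2 * s))
             * (math.exp(dot / s) + math.exp(-dot / s)))
    return auto + cross


def covariogram_numeric(hx, hy, params, lo=-6.0, hi=9.0, step=0.1):
    """Riemann-sum approximation of the integral of z(x, y) z(x + hx, y + hy)."""
    n = int(round((hi - lo) / step)) + 1
    total = 0.0
    for i in range(n):
        x = lo + i * step
        for j in range(n):
            y = lo + j * step
            total += z(x, y, *params) * z(x + hx, y + hy, *params)
    return total * step * step


params = (1.0, 0.75, 0.5, 0.75, 3.0, 0.0)   # C1, a1, C2, a2, dx, dy
for h in ((0.0, 0.0), (1.5, 0.5), (3.0, 0.0)):
    print(h, covariogram_closed(*h, *params), covariogram_numeric(*h, params))
```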

When Δ is large enough for the two domes not to overlap (or nearly so):

$$g(0) \approx \frac{1}{4\pi}\left[\frac{C_1^2}{a_1^2} + \frac{C_2^2}{a_2^2}\right] \quad \text{and} \quad g(\Delta) \approx \frac{C_1 C_2}{2\pi(a_1^2 + a_2^2)}$$
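How large $\Delta$ must be for these approximations to hold can be explored with a short sketch. It re-declares the closed-form covariogram (under the hypothetical name `covariogram_closed`, so that the block is self-contained) and reports the relative errors of the two approximations for $a_1 = a_2 = 0.75$, $C_1 = 1$, $C_2 = 0.5$ and $\Delta$ along the $x$ axis; these values are illustrative assumptions.

```python
import math


def covariogram_closed(hx, hy, C1, a1, C2, a2, dx, dy):
    """Closed-form covariogram of the two-dome variable (Delta = (dx, dy))."""
    h2, d2 = hx * hx + hy * hy, dx * dx + dy * dy
    dot = dx * hx + dy * hy
    s = a1 * a1 + a2 * a2
    auto = (C1 ** 2 / a1 ** 2 * math.exp(-h2 / (4 * a1 ** 2))
            + C2 ** 2 / a2 ** 2 * math.exp(-h2 / (4 * a2 ** 2))) / (4 * math.pi)
    cross = (C1 * C2 / (2 * math.pi * s) * math.exp(-(d2 + h2) / (2 * s))
             * (math.exp(dot / s) + math.exp(-dot / s)))
    return auto + cross


def relative_errors(d, C1=1.0, a1=0.75, C2=0.5, a2=0.75):
    """Relative errors of the 'non-overlapping domes' approximations of g(0) and g(Delta)."""
    s = a1 * a1 + a2 * a2
    approx_0 = (C1 ** 2 / a1 ** 2 + C2 ** 2 / a2 ** 2) / (4 * math.pi)
    approx_d = C1 * C2 / (2 * math.pi * s)
    exact_0 = covariogram_closed(0.0, 0.0, C1, a1, C2, a2, d, 0.0)
    exact_d = covariogram_closed(d, 0.0, C1, a1, C2, a2, d, 0.0)
    return abs(exact_0 - approx_0) / exact_0, abs(exact_d - approx_d) / exact_d


for d in (2.0, 3.0, 6.0):
    print(d, relative_errors(d))
```

With these values the approximations are poor for $\Delta = 2$ (errors of order 10 % or more) and essentially exact for $\Delta = 6$.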

Interestingly, given that for Gaussian distributions about 95% of the values lie within two standard deviations of the mean, we can define a practical range equal to:

$$\text{practical range} = |\Delta| + 2\,(a_1 + a_2)$$

in the direction defined by the maxima of the two bumps. The following figure represents a few situations where $a_1 = a_2 = 0.75$ and $C_1 = 1$, with two different distances between domes ($\Delta$) and two levels for the second dome ($C_2$):

[Figure: regionalized variable and covariogram in three directions ($x$, $y$, $x = y$) for each configuration]

Δ    C2
2    1
3    1
2    0.5
3    0.5

General remarks

So defined, the “double-gaussian” hole effect function has a parabolic behavior at the origin. Hence it should be reserved for regular regionalized variables.

As it is built from the convolution of a regionalized variable made of two bumps, it also integrates the anisotropy specific to that kind of spatial distribution: a Gaussian-like behavior in the direction orthogonal to the line joining the two maxima, and a full hole effect along that line.

This “double-gaussian” hole effect function is relevant for 2D regionalized variables. It is defined by 6 parameters:

• 2 parameters defining the spatial extension of each dome ($a_1$, $a_2$);
• 2 parameters defining the level of each dome ($C_1$, $C_2$);
• 2 parameters defining the distance and orientation between the two maxima ($\Delta$).
In this regard, fitting it requires more data than classical models based on only 2 parameters (spherical, exponential, etc.). An even more parameterized model can be considered when each bigaussian distribution corresponds to a vector of random variables with correlation ($\rho \neq 0$) and with different standard deviations ($a_{1,1}, a_{1,2}, a_{2,1}, a_{2,2}$). But in this case, the model has 9 parameters.

Such a model can also be used for variogram fitting.

It also appears that even when the domes are explicit in the regionalized variable but not far enough apart from each other, they might not be visible in the corresponding covariogram. The covariograms obtained for $\Delta = 2$ do not show a clear hole effect despite the two-dome distributions.
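This last remark can be explored numerically along the direction of the two maxima, where $\langle \Delta, h\rangle = |\Delta|\, r$ and the cross terms reduce to $e^{-(r-|\Delta|)^2/(2(a_1^2+a_2^2))} + e^{-(r+|\Delta|)^2/(2(a_1^2+a_2^2))}$. The sketch below is illustrative only: it scans the closed-form covariogram up to the practical range, read here as $|\Delta| + 2(a_1 + a_2)$, and reports whether a strict local minimum (a hole) appears, for the four configurations of the figure ($a_1 = a_2 = 0.75$, $C_1 = 1$). Function names are hypothetical.

```python
import math


def g_along(r, C1, a1, C2, a2, d):
    """Closed-form covariogram at distance r in the direction of Delta (|Delta| = d)."""
    s = a1 * a1 + a2 * a2
    auto = (C1 ** 2 / a1 ** 2 * math.exp(-r * r / (4 * a1 ** 2))
            + C2 ** 2 / a2 ** 2 * math.exp(-r * r / (4 * a2 ** 2))) / (4 * math.pi)
    cross = C1 * C2 / (2 * math.pi * s) * (math.exp(-(r - d) ** 2 / (2 * s))
                                           + math.exp(-(r + d) ** 2 / (2 * s)))
    return auto + cross


def practical_range(a1, a2, d):
    """|Delta| + 2 (a1 + a2): each dome extends about two standard deviations."""
    return d + 2 * (a1 + a2)


def has_hole_effect(C1, a1, C2, a2, d, step=0.01):
    """True if g, sampled up to the practical range, has a strict interior local minimum."""
    n = int(practical_range(a1, a2, d) / step) + 1
    v = [g_along(i * step, C1, a1, C2, a2, d) for i in range(n)]
    return any(v[i] < v[i - 1] and v[i] < v[i + 1] for i in range(1, n - 1))


for d, C2 in ((2.0, 1.0), (3.0, 1.0), (2.0, 0.5), (3.0, 0.5)):
    print(d, C2, practical_range(0.75, 0.75, d), has_hole_effect(1.0, 0.75, C2, 0.75, d))
```

With $C_2 = 1$, the covariogram is monotonically decreasing for $\Delta = 2$, whereas a (shallow) hole appears for $\Delta = 3$, in line with the remark above.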