SIMULATIONS USING CONDITIONAL DISTRIBUTIONS DERIVED FROM DIRECT KRIGING

A THESIS SUBMITTED TO THE DEPARTMENT OF GEOLOGICAL AND ENVIRONMENTAL SCIENCES AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

By Srinivas E. Rao
November 1996

I certify that I have read this report and that in my opinion it is fully adequate, in scope and in quality, as partial fulfillment of the degree of Master of Science in Geological and Environmental Sciences.

André G. Journel (Principal advisor)

Abstract

Building on an interpretation of ordinary kriging as an E-type (conditional expectation) estimate, local conditional distributions for the unknown values are built from the kriging weights, corrected to be all non-negative. These conditional distributions (ccdf's) compare favourably, in terms of accurate probability intervals, with distribution models built from the much more demanding multiple indicator kriging. The variances of these OK-derived ccdf's depend on the data values (heteroscedastic conditional variances), as opposed to the traditional ordinary kriging (OK) variance, and as such are better measures of local estimation accuracy. Sequential drawing from these ccdf's, with each simulated value being considered subsequently as a datum, provides direct conditional simulations of the variable, shortcutting the normal score transform and back transform steps of sequential Gaussian simulation. A p-field simulation drawing from these distributions would be a very fast algorithm allowing quick simulations in day-to-day analyses.

Acknowledgements

The reason for my application to Stanford's program was to understand the theory and applications in breadth and depth. Now, at the time of graduation, I am content that these objectives have been fulfilled. I take this opportunity to thank Dr. Hari Shankar Pandalai, who introduced me to the field of geostatistics. I am greatly indebted to my advisor, André G. Journel, who guided me along the road at Stanford. Discussions with him opened my eyes to new vistas and motivated me to learn evermore. To quote my colleague Ian Glacken, this thesis is a testament to his tireless editing, constant questioning, precision of thought, and peerless scientific standards. I gratefully acknowledge the help he rendered, in academics and otherwise. I express my deep sense of gratitude to Clayton Deutsch for providing his able guidance whenever required. I am thankful for his invaluable help and insights in programming and problem solving throughout my stay. I would like to thank my associates at Stanford, Emmanuel Gringarten, Ian Glacken, Phaedon Kyriakidis, Ting Ting Yao and Tom Tran, for their companionship, stimulating discussions, assistance and helpful comments.

Contents

List of Figures
1 Introduction
1.1 Motivation
2 OK and SK derived ccdf's
2.1 The E-type parallel
2.2 Sufficient conditions to get licit ccdf's and class means
2.3 The translation alternative and its interpretation
2.3.1 Interpretation
2.4 Connections with IK and RK
2.4.1 Non-convexity
2.4.2 Median IK
2.4.3 Restricted Kriging
2.5 Problem of resolution: the SK solution
2.5.1 Interpretation
2.5.2 E-type expression
2.5.3 Improving the marginal statistics
2.6 Simulation from the ccdf's
2.6.1 Extending the discrete cdf to continuous cdf
2.6.2 The sequential simulation approach - OSSSIM
2.6.3 The p-field simulation approach - OSPFSIM
3 The Walker Lake validation study
3.1 The reference data set
3.2 Comparing $z^*_{OK}$ to $z^{**}_{OK}$ and $z^*_{SK}$ to $z^{**}_{SK}$
3.2.1 Ordinary kriging
3.2.2 Simple kriging
3.3 Traditional estimates
3.4 Checking the local class probabilities
3.5 Conditional simulations
3.5.1 Sequential simulations
3.5.2 P-field simulations
4 Conclusions
Bibliography

List of Figures

3.1 Exhaustive reference data set and statistics.
3.2 Sample data locations and statistics.
3.3 Variograms as calculated from the exhaustive reference data set. Model fit is the continuous line.
3.4 Ordinary kriging ($z^*_{OK}$) estimates and statistics.
3.5 Corrected ordinary kriging ($z^{**}_{OK}$) estimates and statistics.
3.6 a) Histogram of the most negative weight in each OK; b) Histogram of correction: $z^{**}_{OK} - z^*_{OK}$.
3.7 Scattergram of OK estimate vs. true value: a) $z^*_{OK}$ vs. true z; b) $z^{**}_{OK}$ vs. true z.
3.8 Data configuration and kriging results at node (x = 64, y = 163).
3.9 Simple kriging (SK) estimates and statistics.
3.10 Corrected simple kriging estimates and statistics.
3.11 a) Histogram of the most negative weight in each SK; b) Histogram of correction: $z^{**}_{SK} - z^*_{SK}$.
3.12 Scattergram of SK estimate vs. true value: a) $z^*_{SK}$ vs. true z; b) $z^{**}_{SK}$ vs. true z.
3.13 Histogram of corrected SK weight attributed to the global mean.
3.14 Scattergrams of traditional estimates vs. true values.
3.15 Accuracy scores of probability intervals as derived from: a) ordinary kriging, b) simple kriging, c) inverse distance weighting, d) inverse squared Euclidean distance weighting, e) multiple indicator kriging, f) median indicator kriging.
3.16 Scattergram of median indicator kriging E-type estimate vs. a) true values, b) corrected ordinary kriging estimate.
3.17 Scattergram of probabilities of classes 1-5: median IK vs. modified OK.
3.18 Scattergram of probabilities of classes 6-10: median IK vs. modified OK.
3.19 Sequential simulations with ordinary kriging.
3.20 Sequential simulations with simple kriging.
3.21 Variogram reproduction in sequential simulations using OK.
3.22 Variogram reproduction in sequential simulations using SK.
3.23 P-field simulations with ordinary kriging.
3.24 P-field simulations with simple kriging.
3.25 Variogram reproduction in p-field simulations using OK.
3.26 Variogram reproduction in p-field simulations using SK.

Chapter 1

Introduction

Simple kriging and ordinary kriging algorithms, since their inception by Matheron (1971), have formed the basis of geostatistics on which other kriging and kriging-related simulation techniques are built. These methods provide an unbiased estimate of the attribute under consideration along with the variance of the estimation error. These techniques are simple and have gained popularity among many earth scientists. But kriging has its pitfalls. In a probabilistic framework, kriging is used to model the conditional expectation of a random variable (the unknown attribute) given a specific realization of the random variables in the neighborhood (the data). The estimates, which are linear combinations of the data in the neighborhood, are smoothed; hence these techniques are unsuitable for estimating proportions of extreme values. Also, the kriging variance is not conditioned on the data values; it depends only on the data configuration: it is homoscedastic. Thus the kriging variance does not provide a full measure of estimation uncertainty. Kriging produces negative estimates in certain circumstances, even when the underlying variable is non-negative. This is a consequence of negative kriging weights, which give kriging the status of a non-convex estimator. Many studies have shown that negative kriging weights occur at the periphery of the kriging neighborhood. Deutsch (1996a) suggests a correction for such negative weights so that the resulting estimate stays within the range of the data used. Last, kriging estimators do not reproduce the spatial correlation observed in the data. Conditional simulations produce alternative realizations with the observed spatial correlation, including the spatial variance; thus the simulated values are not smoothed. Simulation algorithms have become commonplace in geostatistical applications, and different models are available for different scenarios.
The usual prerequisite for simulation is a local conditional cumulative distribution function (ccdf) from which a simulated value is drawn. Under a multiGaussian hypothesis, the OK- or SK-derived kriging estimate and kriging variance are sufficient to characterize the local Gaussian ccdf. However, the multiGaussian assumption rarely holds true, even when the data set is univariate Gaussian. An alternative is to use non-parametric indicator kriging (IK) or indicator co-kriging. Because of the modeling and computational burden involved in indicator co-kriging, IK has become the tool of choice for building local ccdf's. Indicator kriging requires performing a kriging at each of a chosen number of thresholds, which calls for modeling an indicator variogram and solving a linear system of equations at each threshold. If the indicator variograms are all proportional to each other and to the variogram of the attribute, the kriging weights obtained at all these thresholds are the same. In such a case, corresponding to a mosaic model, a single kriging system performed at the median threshold suffices: this is median indicator kriging. Multiple indicator kriging, as opposed to median IK, exploits the differences between the indicator variograms at different thresholds. But when the indicator variograms differ markedly, order relation problems occur in the resulting ccdf. Modeling all indicator variograms with the same structures and smoothly varying parameters, thus bringing them closer to the mosaic model, would alleviate these order relation problems.

1.1 Motivation

The present work is motivated by the above discussion. The objective is to derive non-parametric IK-type ccdf's through a single OK or SK system for use in simulations. This requires interpreting the OK or SK estimate as a conditional expectation (E-type estimate). To allow such an interpretation, negative weights, if any, must be corrected. A method for correcting the negative weights is proposed. Subsequently, the use of the resulting local ccdf's in conditional simulations is demonstrated.

Chapter 2

OK and SK derived ccdf's

Because the kriging algorithm is independent of the data values, it does not account for the physical range of variability of the estimates. Kriging weights can be negative, leading to negative estimates. If the weights were modified to be all non-negative, they would ensure non-negative estimates of non-negative variables and allow the computation of local probability distributions.

2.1 The E-type parallel

Consider the ordinary kriging (OK) estimate of value z(u) at location u from n(u) neighboring data values:

$$z^*_{OK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{OK}_\alpha(u)\, z(u_\alpha) \qquad (2.1)$$

with: $\sum_{\alpha=1}^{n(u)} \lambda^{OK}_\alpha(u) = 1$

Split the physical interval of Z-variability into K classes $[z_{min}, z_1], (z_1, z_2], \dots, (z_{k-1}, z_k], \dots, (z_{K-1}, z_{max}]$. Regrouping the n(u) data values into their respective classes, the OK estimate (2.1) is rewritten as an E-type estimate:

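$$z^*_{OK}(u) = \sum_{k=1}^{K} \sum_{\alpha_k=1}^{n_k(u)} \lambda^{OK}_{\alpha_k}(u)\, z(u_{\alpha_k}) = \sum_{k=1}^{K} c_k(u) \left[\sum_{\alpha_k=1}^{n_k(u)} \frac{\lambda^{OK}_{\alpha_k}(u)}{c_k(u)}\, z(u_{\alpha_k})\right] = \sum_{k=1}^{K} c_k(u)\, m^*_{OK}(u;k) \qquad (2.2)$$

where:

$n(u) = \sum_{k=1}^{K} n_k(u)$, with $n_k(u)$ being the number of data $z(u_\alpha)$ falling in class k;

$c_k(u) = \sum_{\alpha_k=1}^{n_k(u)} \lambda^{OK}_{\alpha_k}(u)$ is the sum of the OK weights of class-k data, with $\sum_{k=1}^{K} c_k(u) = \sum_{\alpha=1}^{n(u)} \lambda^{OK}_\alpha(u) = 1$;

$m^*_{OK}(u;k) = \sum_{\alpha_k=1}^{n_k(u)} \frac{\lambda^{OK}_{\alpha_k}(u)}{c_k(u)}\, z(u_{\alpha_k})$ is the k-th class mean estimate, with $\sum_{\alpha_k=1}^{n_k(u)} \frac{\lambda^{OK}_{\alpha_k}(u)}{c_k(u)} = 1, \ \forall k = 1, \dots, K$.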

Recall the expression of the conditional expectation of the random variable Z(u) at location u:

$$E\{Z(u) \mid (n(u))\} = \sum_{k=1}^{K} p(u;k)\, m(u;k) \qquad (2.3)$$

with: $p(u;k) = \text{Prob}\{Z(u) \in \text{class } k \mid (n(u))\}$ and $m(u;k) = E\{Z(u) \mid Z(u) \in \text{class } k, (n(u))\}$

The OK estimate (2.2) thus appears as an estimate of that conditional expectation, i.e., it is an E-type estimate:

$$z^*_{OK}(u) = \sum_{k=1}^{K} p^*_{OK}(u;k)\, m^*_{OK}(u;k) \qquad (2.4)$$

with: $p^*_{OK}(u;k) = c_k(u) = \sum_{\alpha_k=1}^{n_k(u)} \lambda^{OK}_{\alpha_k}(u)$ and $m^*_{OK}(u;k) = \sum_{\alpha_k=1}^{n_k(u)} \frac{\lambda^{OK}_{\alpha_k}(u)}{c_k(u)}\, z(u_{\alpha_k})$

2.2 Sufficient conditions to get licit ccdf's and class means

The E-type interpretation (2.4) can be extended to any estimate which is a weighted linear combination of the data, including such traditional estimates as inverse (Euclidean or variogram) distance weighted estimates. The probability for Z(u) to be in class k is estimated by the sum of the OK weights of the $n_k(u)$ data falling in class k: the more such data and the greater their OK weights, the larger the estimated class probability $p^*_{OK}(u;k)$. The conditional k-th class mean is estimated by a weighted linear combination of those $n_k(u)$ data, with the weights being restandardized to sum to 1:

$$m^*_{OK}(u;k) = \sum_{\alpha_k=1}^{n_k(u)} \frac{\lambda^{OK}_{\alpha_k}(u)}{c_k(u)}\, z(u_{\alpha_k}), \quad k = 1, \dots, K$$

These implicit OK estimates of class probabilities and class means 'make sense' except for negative OK weights $\lambda^{OK}_\alpha(u) < 0$. Such negative weights might cause the implicit OK class mean $m^*_{OK}(u;k)$ to be valued outside its class $(z_{k-1}, z_k]$ and the implicit OK class probability $c_k(u)$ to be valued outside [0,1]. A sufficient (but not necessary) condition for both $c_k(u)$ and $m^*_{OK}(u;k)$ to be licit estimates is that all OK weights $\lambda^{OK}_\alpha(u)$ be non-negative:

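$$\lambda^{OK}_\alpha(u) \ge 0, \quad \forall \alpha = 1, \dots, n(u) \qquad (2.5)$$

which ensures:

$p^*_{OK}(u;k) = c_k(u) \in [0,1]$ and $\sum_{k=1}^{K} c_k(u) = 1$

$m^*_{OK}(u;k) = \sum_{\alpha_k=1}^{n_k(u)} \frac{\lambda^{OK}_{\alpha_k}(u)}{c_k(u)}\, z(u_{\alpha_k}) \in (z_{k-1}, z_k], \quad \forall k = 1, \dots, K$

hence: $z^*_{OK}(u) \in \big[\min(z(u_\alpha), \alpha = 1, \dots, n(u)),\ \max(z(u_\alpha), \alpha = 1, \dots, n(u))\big] \subseteq [z_{min}, z_{max}]$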

The constraint (2.5), while ensuring that $z^*_{OK}(u)$ is within the physical interval $[z_{min}, z_{max}]$, also forces the kriging estimate to be convex, that is, to reside in the data interval $[\min(z(u_\alpha)), \max(z(u_\alpha))]$. We would wish to relax this latter limitation; see later Section 3.5.
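As a numerical illustration of the E-type reading (2.4) and of why condition (2.5) matters, the Python sketch below (assuming NumPy; the function `etype_decomposition` and the four-datum neighbourhood are hypothetical) groups the weights by class, returns the implicit class probabilities $p^*_{OK}(u;k)$ and class means $m^*_{OK}(u;k)$, and checks that their E-type combination reproduces the weighted estimate.

```python
import numpy as np

def etype_decomposition(weights, data, thresholds):
    """Split a weighted linear estimate into class probabilities and class means.

    weights    : estimation weights lambda_alpha(u), summing to 1
    data       : data values z(u_alpha)
    thresholds : class bounds z_0 = z_min < z_1 < ... < z_K = z_max
    Classes are (z_{k-1}, z_k]; a datum equal to z_min goes to the first class.
    Returns p (sum of class-k weights) and m (restandardized class-k weighted
    mean, NaN when the class carries zero total weight).
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(data, dtype=float)
    bounds = np.asarray(thresholds, dtype=float)
    n_classes = len(bounds) - 1
    # 0-based class index k such that bounds[k] < z <= bounds[k + 1]
    cls = np.clip(np.searchsorted(bounds, z, side="left") - 1, 0, n_classes - 1)
    p = np.bincount(cls, weights=w, minlength=n_classes)
    wz = np.bincount(cls, weights=w * z, minlength=n_classes)
    m = np.full(n_classes, np.nan)
    nonzero = p != 0
    m[nonzero] = wz[nonzero] / p[nonzero]
    return p, m

# Hypothetical neighbourhood: the last weight is negative.
lam = np.array([0.5, 0.3, 0.35, -0.15])
z_data = np.array([1.0, 4.0, 6.0, 9.5])
thresholds = [0.0, 2.0, 5.0, 10.0]

p, m = etype_decomposition(lam, z_data, thresholds)
print(p, m)                 # class probabilities and class means
print(p @ m, lam @ z_data)  # E-type combination vs. direct weighted estimate
```

With the single negative weight, the third class probability stays in [0,1] (it equals 0.2), but the implied class mean is 3.375, below that class's lower bound of 5: exactly the kind of illicit class mean that condition (2.5) rules out.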

2.3 The translation alternative and its interpretation

An immediate solution to enforce the non-negativity condition (2.5) is to add a constant positive value to all n(u) OK weights if and only if there is one (or more) negative weight(s). More precisely, consider the translation-type correction followed by a restandardization to sum to 1:

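$$\lambda^{\prime OK}_\alpha(u) = \lambda^{OK}_\alpha(u) + c_{OK}(u) \ge 0, \quad \alpha = 1, \dots, n(u) \qquad (2.6)$$

with: $c_{OK}(u) = -\min\big(\lambda^{OK}_\alpha(u), \alpha = 1, \dots, n(u);\ 0\big) \ge 0$, the modulus of the largest negative weight;

$s_{OK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{\prime OK}_\alpha(u) = 1 + n(u)\, c_{OK}(u) \ge 1$;

$\lambda^{**OK}_\alpha(u) = \frac{\lambda^{\prime OK}_\alpha(u)}{s_{OK}(u)} \ge 0$, and $\sum_{\alpha=1}^{n(u)} \lambda^{**OK}_\alpha(u) = 1$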

The correction (2.6) is implemented if and only if the constraint (2.5) is active, in which case $c_{OK}(u)$ is different from zero: it amounts to adding the modulus of the largest negative weight to all original OK weights, then restandardizing their sum to 1. As a consequence of correction (2.6), the resulting implicit class probability and class mean estimates are all licit and the corrected OK estimate $z^{**}_{OK}(u)$ is within physical bounds:

$$z^{**}_{OK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{**OK}_\alpha(u)\, z(u_\alpha) = \sum_{k=1}^{K} p^{**}_{OK}(u;k)\, m^{**}_{OK}(u;k) \in [z_{min}, z_{max}] \qquad (2.7)$$

with: $p^{**}_{OK}(u;k) = b_k(u) = \sum_{\alpha_k=1}^{n_k(u)} \lambda^{**OK}_{\alpha_k}(u) \in [0,1]$, $\sum_{k=1}^{K} b_k(u) = \sum_{\alpha=1}^{n(u)} \lambda^{**OK}_\alpha(u) = 1$, and $m^{**}_{OK}(u;k) = \sum_{\alpha_k=1}^{n_k(u)} \frac{\lambda^{**OK}_{\alpha_k}(u)}{b_k(u)}\, z(u_{\alpha_k}) \in (z_{k-1}, z_k]$

Remark: The estimator (2.7) preserves both the unbiasedness and exactitude properties of kriging. Indeed the corrected weights $\lambda^{**OK}_\alpha(u)$ sum to 1, and at a datum location $u = u_{\alpha_0}$, $\lambda^{OK}_{\alpha_0}(u) = 1$ and $\lambda^{OK}_\alpha(u) = 0, \forall \alpha \ne \alpha_0$: no weight is negative, hence the constraint (2.5) is inactive and the correction (2.6) is not implemented.
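The translation-type correction (2.6)-(2.7) can be sketched as follows, assuming NumPy; `translate_weights` is a hypothetical helper name and the weights are illustrative values, not results from the Walker Lake study. The last lines check the exactitude remark: weights that are already non-negative pass through unchanged.

```python
import numpy as np

def translate_weights(weights):
    """Translation-type correction (2.6) of kriging weights that sum to 1.

    Adds c = modulus of the most negative weight (0 if none) to every weight,
    then divides by the new sum 1 + n*c so that the weights again sum to 1.
    Returns the corrected weights and c.
    """
    w = np.asarray(weights, dtype=float)
    c = float(max(0.0, -w.min()))     # c_OK(u) >= 0
    s = w.sum() + w.size * c          # s_OK(u) = 1 + n(u) c_OK(u)
    return (w + c) / s, c

lam = np.array([0.5, 0.3, 0.35, -0.15])   # hypothetical OK weights
z_data = np.array([1.0, 4.0, 6.0, 9.5])

lam_corr, c_ok = translate_weights(lam)
print(c_ok, lam_corr, lam_corr.sum())
print("corrected estimate:", lam_corr @ z_data)

# Exactitude: at a datum location the OK weights are (1, 0, ..., 0),
# no weight is negative and the correction leaves them unchanged.
exact, c_exact = translate_weights([1.0, 0.0, 0.0])
print(exact, c_exact)
```

The datum carrying the most negative weight ends up with a weight of exactly zero after the correction, while all other weights are shifted up and then scaled by $1/s_{OK}(u)$.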

2.3.1 Interpretation

Recall the fundamental relation between ordinary and simple kriging (SK), whereby the OK estimate is shown to be an SK estimate with the constant stationary mean m replaced by a location-dependent OK estimate $m^*_{OK}(u)$, Matheron (1971), Deutsch & Journel (1992, p63):

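SK:

$$z^*_{SK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{SK}_\alpha(u)\, z(u_\alpha) + \lambda^{SK}_0(u)\, m \qquad (2.8)$$

with: $\lambda^{SK}_0(u) = 1 - \sum_{\alpha=1}^{n(u)} \lambda^{SK}_\alpha(u)$

OK:

$$z^*_{OK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{OK}_\alpha(u)\, z(u_\alpha) = \sum_{\alpha=1}^{n(u)} \lambda^{SK}_\alpha(u)\, z(u_\alpha) + \lambda^{SK}_0(u)\, m^*_{OK}(u)$$

with: $m^*_{OK}(u)$ being the OK estimate of the local mean.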

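Hence:

$$z^*_{OK}(u) = z^*_{SK}(u) + \lambda^{SK}_0(u)\,[m^*_{OK}(u) - m] \qquad (2.9)$$

In terms of residual kriging this last relation is written:

$$z^*_{OK}(u) - m = [z^*_{SK}(u) - m] + \lambda^{SK}_0(u)\,[m^*_{OK}(u) - m] \qquad (2.10)$$

OK amounts to a translation of the SK estimate to account for the difference between the OK-derived local mean $m^*_{OK}(u)$ and the stationary mean m used by SK. Now return to the corrected OK estimate $z^{**}_{OK}(u)$ given by relations (2.7):

$$z^{**}_{OK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{**OK}_\alpha(u)\, z(u_\alpha) = \frac{1}{1 + n(u)\, c_{OK}(u)} \left[\sum_{\alpha=1}^{n(u)} \lambda^{OK}_\alpha(u)\, z(u_\alpha) + c_{OK}(u) \sum_{\alpha=1}^{n(u)} z(u_\alpha)\right] = \frac{z^*_{OK}(u) + n(u)\, c_{OK}(u)\, \bar m(u)}{1 + n(u)\, c_{OK}(u)} \qquad (2.11)$$

with: $\bar m(u) = \frac{1}{n(u)} \sum_{\alpha=1}^{n(u)} z(u_\alpha)$, the arithmetic average of the n(u) data.

Or, in terms of residual kriging:

$$z^{**}_{OK}(u) - m^*_{OK}(u) = \frac{1}{1 + n(u)\, c_{OK}(u)} \Big[z^*_{OK}(u) - m^*_{OK}(u) + n(u)\, c_{OK}(u)\, [\bar m(u) - m^*_{OK}(u)]\Big] \qquad (2.12)$$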

Similar to the SK-OK correction expressed in relations (2.9) and (2.10), the proposed correction (2.7) amounts to a translation of the original OK estimate to account for the difference between the data arithmetic average $\bar m(u)$ and the estimated mean $m^*_{OK}(u)$ used by OK.
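Relation (2.11) can be checked numerically. The short sketch below (NumPy assumed, illustrative numbers only) computes the corrected OK estimate directly from the translated weights and again as the weighted average of $z^*_{OK}(u)$ and the data arithmetic average $\bar m(u)$; the two agree, and the corrected value lies between the original estimate and $\bar m(u)$.

```python
import numpy as np

# Hypothetical OK weights (one negative) and data values at one location u.
lam = np.array([0.5, 0.3, 0.35, -0.15])
z_data = np.array([1.0, 4.0, 6.0, 9.5])
n = lam.size

z_ok = lam @ z_data                           # original OK estimate
c_ok = max(0.0, -lam.min())                   # translation constant, eq. (2.6)
lam_corr = (lam + c_ok) / (1.0 + n * c_ok)    # corrected weights
z_corr = lam_corr @ z_data                    # corrected estimate, eq. (2.7)

m_bar = z_data.mean()                         # arithmetic average of the data
z_shift = (z_ok + n * c_ok * m_bar) / (1.0 + n * c_ok)   # right side of (2.11)

print(z_ok, z_corr, z_shift, m_bar)
```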

2.4 Connections with median IK and RK

Recall the E-type expression (2.7) of the corrected OK estimate, then compare it to the indicator kriging (IK)-derived E-type estimate, Deutsch & Journel (1992, p66):

$$z^{**}_{OK}(u) = \sum_{k=1}^{K} p^{**}_{OK}(u;k)\, m^{**}_{OK}(u;k), \qquad z^*_{IK}(u) = \sum_{k=1}^{K} p^*_{IK}(u;k)\, m_k \qquad (2.13)$$

This comparison suggests replacing, in the corrected OK expression, the local class probabilities $p^{**}_{OK}(u;k)$ by their IK-derived counterparts $p^*_{IK}(u;k)$, which are derived using class-specific indicator variograms. One would keep, however, the OK-derived class means $m^{**}_{OK}(u;k)$, which use the specific $n_k(u)$ local data, as opposed to the location-independent marginal class mean values $m_k$. The new improved estimate would be written:

$$z^{***}_{OK}(u) = \sum_{k=1}^{K} p^*_{IK}(u;k)\, m^{***}_{OK}(u;k) \qquad (2.14)$$

where:

$$m^{***}_{OK}(u;k) = \begin{cases} m^{**}_{OK}(u;k), & \text{if } n_k(u) \ge n_0 \\ m_k, & \text{otherwise} \end{cases} \qquad (2.15)$$

$n_0$ is the minimum sample size for the local mean estimate $m^{**}_{OK}(u;k)$ to be retained with confidence.

2.4.1 Non-convexity

Because of the order relation correction, which involves some smoothing of the original IK class probabilities $p^*_{IK}(u;k), k = 1, \dots, K$, some of these probabilities may be non-zero although the corresponding OK-derived probabilities $p^{**}_{OK}(u;k)$ are zero. This is particularly true for the last class K. Consequently, the improved OK estimate (2.14) can be valued outside the range of the n(u) data values retained, although it is constrained to be in the physical interval $[z_{min}, z_{max}]$.

2.4.2 Median IK

A further approximation would amount to using the same indicator variogram for all K indicator krigings and solving only one system (instead of K), provided no interval-type data are considered, Deutsch & Journel (1992, p75). If that single indicator variogram is scaled from the Z-variogram used for OK, then the approximation is that of median indicator kriging. As long as the original OK system does not yield any negative weight, i.e., if no correction of type (2.6) is needed, median IK performed on the same n(u) data $z(u_\alpha)$ as used by OK would result in a ccdf identical to that derived from the class probabilities $p^*_{OK}(u;k)$ defined in (2.4). Indeed, median IK with the single indicator variogram scaled to the Z-variogram yields IK weights identical to the OK weights $\lambda^{OK}_\alpha(u)$, since the data configuration n(u) is the same. The resulting ccdf value for threshold $z_k$ is then:

$$\text{Prob}^*\{Z(u) \le z_k \mid (n(u))\} = \sum_{\alpha=1}^{n(u)} \lambda^{OK}_\alpha(u)\, i(u_\alpha; z_k)$$

with $i(u_\alpha; z_k) = 1$ if datum $z(u_\alpha) \le z_k$, 0 if not. Thus:

$$\text{Prob}^*\{Z(u) \in \text{class } k \mid (n(u))\} = \sum_{\alpha=1}^{n(u)} \lambda^{OK}_\alpha(u)\, [i(u_\alpha; z_k) - i(u_\alpha; z_{k-1})] = \sum_{\alpha_k=1}^{n_k(u)} \lambda^{OK}_{\alpha_k}(u) = p^*_{OK}(u;k)$$

as in relation (2.4). Indeed, the difference $[i(u_\alpha; z_k) - i(u_\alpha; z_{k-1})]$ is different from zero, and equal to 1, if and only if the datum $z(u_\alpha)$ falls in the k-th class $(z_{k-1}, z_k]$. It thus appears that the OK derivation of the conditional distribution $\{p^{**}_{OK}(u;k), k = 1, \dots, K\}$ is but a median IK procedure, with the single but critical difference that order relations are ensured by a prior translation of the OK weights instead of being established by an a posteriori correction of the conditional probabilities; recall Deutsch & Journel (1992, p78).
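The median IK reading of (2.4) and the 'critical difference' in the handling of order relations can be illustrated with a small sketch (NumPy assumed; `ccdf_from_weights` and the numbers are hypothetical). The ccdf built from raw OK weights that include one negative weight violates order relations, while the ccdf built from the translated weights is non-decreasing and within [0,1] by construction.

```python
import numpy as np

def ccdf_from_weights(weights, data, thresholds):
    """ccdf values F(u; z_k) = sum_alpha lambda_alpha * i(u_alpha; z_k),
    with i(u_alpha; z_k) = 1 if z(u_alpha) <= z_k, 0 otherwise."""
    z = np.asarray(data, dtype=float)
    zk = np.asarray(thresholds, dtype=float)
    indicators = (z[None, :] <= zk[:, None]).astype(float)   # shape (K, n)
    return indicators @ np.asarray(weights, dtype=float)

lam = np.array([0.5, 0.3, 0.35, -0.15])   # hypothetical OK weights
z_data = np.array([1.0, 4.0, 6.0, 9.5])
z_k = np.array([2.0, 5.0, 8.0, 10.0])

F_raw = ccdf_from_weights(lam, z_data, z_k)
c_ok = max(0.0, -lam.min())
F_corr = ccdf_from_weights((lam + c_ok) / (1.0 + lam.size * c_ok), z_data, z_k)

print("raw weights:       ", F_raw)    # exceeds 1, then decreases
print("translated weights:", F_corr)   # non-decreasing, within [0, 1]
```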

2.4.3 Restricted Kriging

Rather than replacing in expression (2.13) the implicit OK-derived class probabilities $p^{**}_{OK}(u;k)$ by their IK-derived counterparts, one might consider imposing such identification as constraints in the OK system.

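Consider, for example, the original (uncorrected) OK estimator and system, C(h) being the covariance model:

$$z^*_{OK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{OK}_\alpha(u)\, z(u_\alpha)$$

with:

$$\begin{cases} \sum_{\beta=1}^{n(u)} \lambda^{OK}_\beta(u)\, C(u_\beta - u_\alpha) + \mu(u) = C(u - u_\alpha), & \alpha = 1, \dots, n(u) \\ \sum_{\beta=1}^{n(u)} \lambda^{OK}_\beta(u) = 1 \end{cases}$$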

From the E-type interpretation (2.4), the sum of the $n_k(u)$ weights $\lambda^{OK}_{\alpha_k}(u)$ associated with class-k data is the implicit OK estimate of the class probability:

$$p^*_{OK}(u;k) = \sum_{\alpha_k=1}^{n_k(u)} \lambda^{OK}_{\alpha_k}(u)$$

This sum is constrained to identify the IK-derived class probability:

$$\sum_{\alpha_k=1}^{n_k(u)} \lambda_{\alpha_k}(u) = p^*_{IK}(u;k)$$

or, equivalently:

$$\sum_{\alpha=1}^{n(u)} \lambda_\alpha(u)\, i(u_\alpha; k) = p^*_{IK}(u;k), \quad k = 1, \dots, K$$

where $i(u_\alpha; k) = 1$ if $z(u_\alpha) \in (z_{k-1}, z_k]$, $i(u_\alpha; k) = 0$ if not, is the indicator of class k. Consideration of these K linear constraints on the OK weights results in the 'restricted kriging' (RK) estimate and system proposed by Arik (1992):

$$z^*_{RK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{RK}_\alpha(u)\, z(u_\alpha) \qquad (2.16)$$

with:

$$\begin{cases} \sum_{\beta=1}^{n(u)} \lambda^{RK}_\beta(u)\, C(u_\beta - u_\alpha) + \sum_{k=1}^{K} \mu_k(u)\, i(u_\alpha; k) = C(u - u_\alpha), & \alpha = 1, \dots, n(u) \\ \sum_{\beta=1}^{n(u)} \lambda^{RK}_\beta(u)\, i(u_\beta; k) = p^*_{IK}(u;k), & k = 1, \dots, K \end{cases}$$

RK does not ensure that the resulting estimate is valued in the physical interval $[z_{min}, z_{max}]$, since the weights $\lambda^{RK}_\alpha(u)$ are not constrained to be non-negative. One might consider applying to RK, at all locations, a translation correction similar to the correction (2.6) applied to OK.
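The RK system (2.16) is a saddle-point linear system with one Lagrange parameter per class. The sketch below, assuming NumPy, assembles and solves it for a hypothetical 1-D configuration with an exponential covariance; `restricted_kriging` is an illustrative name, not Arik's (1992) code. Because each datum belongs to exactly one class, summing the K class constraints gives $\sum_\alpha \lambda^{RK}_\alpha(u) = \sum_k p^*_{IK}(u;k) = 1$, so the unbiasedness condition need not be written separately.

```python
import numpy as np

def restricted_kriging(C, c0, ind, p_ik):
    """Solve the RK system (2.16) for the weights and Lagrange parameters.

    C    : (n, n) covariances C(u_beta - u_alpha) between data
    c0   : (n,)   covariances C(u - u_alpha) between data and location u
    ind  : (n, K) class indicators i(u_alpha; k), exactly one 1 per row
    p_ik : (K,)   target class probabilities p*_IK(u; k), summing to 1
    Assumes every class holds at least one datum; otherwise a constraint row
    is all zeros and the matrix is singular.
    """
    n, K = ind.shape
    A = np.zeros((n + K, n + K))
    A[:n, :n] = C
    A[:n, n:] = ind           # sum_k mu_k(u) i(u_alpha; k)
    A[n:, :n] = ind.T         # sum_beta lambda_beta i(u_beta; k) = p*_IK(u; k)
    b = np.concatenate([c0, p_ik])
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]

# Hypothetical 1-D example: exponential covariance C(h) = exp(-|h| / 3).
x = np.array([1.0, 2.0, 4.0, 7.0])          # data locations
x0 = 3.0                                     # estimation location u
z_data = np.array([1.0, 4.0, 6.0, 9.5])
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 3.0)
c0 = np.exp(-np.abs(x - x0) / 3.0)
ind = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # classes (0,5], (5,10]
p_ik = np.array([0.6, 0.4])

lam_rk, mu = restricted_kriging(C, c0, ind, p_ik)
print(lam_rk, lam_rk.sum(), ind.T @ lam_rk)
print("RK estimate:", lam_rk @ z_data)
```

Nothing in this system keeps the weights non-negative, which is the point made above; a translation of type (2.6) could be applied to `lam_rk` afterwards.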

2.5 Problem of resolution: the SK solution

The OK estimate (2.7) considers only the n(u) data for the estimation of the K class probabilities and means. In cases where local data are sparse, leading to unreliable estimates of class means and probabilities, one might use global information in the form of marginal class probabilities and means through SK. Simple kriging has one advantage over ordinary kriging in that it considers the global mean m in addition to the local neighborhood data; hence, implicitly, SK retains data values which are not included in the sample of size n(u) retained by OK at u. Consider then applying the translation-type correction (2.6) to the n(u) + 1 SK weights as defined in the expression (2.8) of the SK estimate:

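$$\lambda^{\prime SK}_\alpha(u) = \lambda^{SK}_\alpha(u) + c_{SK}(u) \ge 0, \quad \alpha = 0, 1, \dots, n(u) \qquad (2.17)$$

with: $\lambda^{SK}_0(u) = 1 - \sum_{\alpha=1}^{n(u)} \lambda^{SK}_\alpha(u)$;

$c_{SK}(u) = -\min\big(\lambda^{SK}_\alpha(u), \alpha = 0, \dots, n(u);\ 0\big) \ge 0$;

$s_{SK}(u) = \sum_{\alpha=0}^{n(u)} \lambda^{\prime SK}_\alpha(u) = 1 + [n(u)+1]\, c_{SK}(u) \ge 1$;

$\lambda^{**SK}_\alpha(u) = \frac{\lambda^{\prime SK}_\alpha(u)}{s_{SK}(u)} \ge 0, \ \alpha = 0, \dots, n(u)$, and $\sum_{\alpha=0}^{n(u)} \lambda^{**SK}_\alpha(u) = 1$.

The corrected SK estimate is then:

$$z^{**}_{SK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{**SK}_\alpha(u)\, z(u_\alpha) + \lambda^{**SK}_0(u)\, m \qquad (2.18)$$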

Again, this estimator preserves the unbiasedness and exactitude properties of the original SK estimator.
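The SK version of the correction differs from (2.6) only in that the weight $\lambda^{SK}_0(u)$ of the stationary mean takes part in the translation. A minimal sketch, assuming NumPy, with hypothetical SK weights and a hypothetical stationary mean m = 3; `corrected_sk` is an illustrative name.

```python
import numpy as np

def corrected_sk(lam_sk, z_data, m):
    """Translation correction (2.17) of the n(u) + 1 SK weights, the weight
    lambda_0 = 1 - sum(lambda_alpha) of the mean m included; returns the
    corrected estimate (2.18) and the corrected weights [lambda_0, ..., lambda_n]."""
    lam_sk = np.asarray(lam_sk, dtype=float)
    w = np.concatenate([[1.0 - lam_sk.sum()], lam_sk])   # all n(u)+1 weights, sum 1
    c_sk = max(0.0, -w.min())                            # c_SK(u) >= 0
    w_corr = (w + c_sk) / (1.0 + w.size * c_sk)          # divide by 1 + [n(u)+1] c_SK(u)
    z_corr = w_corr[1:] @ np.asarray(z_data, dtype=float) + w_corr[0] * m
    return z_corr, w_corr

lam_sk = np.array([0.45, 0.25, 0.30, -0.10])   # hypothetical SK weights
z_data = np.array([1.0, 4.0, 6.0, 9.5])
m = 3.0                                         # hypothetical stationary mean

z_sk = lam_sk @ z_data + (1.0 - lam_sk.sum()) * m   # original SK estimate (2.8)
z_corr, w_corr = corrected_sk(lam_sk, z_data, m)
print(z_sk, z_corr, w_corr, w_corr.sum())
```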

2.5.1 Interpretation

Replace in expression (2.18) the weights $\lambda^{**SK}_\alpha(u)$ by their expression given in (2.17):

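$$z^{**}_{SK}(u) = \frac{1}{1 + c_{SK}(u)\,[n(u)+1]} \Big[z^*_{SK}(u) + c_{SK}(u)\, [n(u)\, \bar m(u) + m]\Big]$$

Or, written in terms of residual estimation:

$$z^{**}_{SK}(u) - m = \frac{1}{1 + c_{SK}(u)\,[n(u)+1]} \Big[z^*_{SK}(u) - m + c_{SK}(u)\, n(u)\, [\bar m(u) - m]\Big] \qquad (2.19)$$

with $\bar m(u)$ the arithmetic average of the n(u) data.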

Similar to expressions (2.10) and (2.12), expression (2.19) shows that the proposed correction amounts to shifting (translating) the original SK estimate to account for the difference between the data arithmetic average $\bar m(u)$ and the stationary mean m used in SK. The weight given to the difference $[\bar m(u) - m]$ is proportional to n(u), i.e., to a measure of the reliability of $\bar m(u)$. This interpretation suggests that the order relation problems of kriging (both SK and OK) originate from the 'inconsistency' between the arithmetic average $\bar m(u)$ of the local data retained and the actual mean used: m for SK and $m^*_{OK}(u)$ for OK.
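Relation (2.19) can be verified in the same way as (2.11). The sketch below (NumPy assumed, illustrative values only) computes the corrected SK residual directly from the translated weights and from the right-hand side of (2.19).

```python
import numpy as np

# Hypothetical SK weights, data and stationary mean at one location u.
lam_sk = np.array([0.45, 0.25, 0.30, -0.10])
z_data = np.array([1.0, 4.0, 6.0, 9.5])
m = 3.0
n = lam_sk.size

lam0 = 1.0 - lam_sk.sum()
z_sk = lam_sk @ z_data + lam0 * m                           # SK estimate (2.8)
c_sk = max(0.0, -min(lam_sk.min(), lam0))                   # c_SK(u) over all n(u)+1 weights
den = 1.0 + (n + 1) * c_sk
z_corr = ((lam_sk + c_sk) @ z_data + (lam0 + c_sk) * m) / den   # corrected SK (2.18)

m_bar = z_data.mean()
residual = (z_sk - m + c_sk * n * (m_bar - m)) / den        # right side of (2.19)
print(z_corr - m, residual)
```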

2.5.2 E-type expression

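The corrected SK estimate (2.18) is rewritten in E-type format:

$$\begin{aligned} z^{**}_{SK}(u) &= \sum_{\alpha=1}^{n(u)} \lambda^{**SK}_\alpha(u)\, z(u_\alpha) + \lambda^{**SK}_0(u)\, m \\ &= [1 - \lambda^{**SK}_0(u)] \sum_{k=1}^{K} \sum_{\alpha_k=1}^{n_k(u)} \frac{\lambda^{**SK}_{\alpha_k}(u)}{1 - \lambda^{**SK}_0(u)}\, z(u_{\alpha_k}) + \lambda^{**SK}_0(u) \sum_{k=1}^{K} p_k\, m_k \\ &= [1 - \lambda^{**SK}_0(u)] \sum_{k=1}^{K} p^{**}_{SK}(u;k)\, m^{**}_{SK}(u;k) + \lambda^{**SK}_0(u) \sum_{k=1}^{K} p_k\, m_k \end{aligned} \qquad (2.20)$$

with:

$p^{**}_{SK}(u;k) = \frac{\sum_{\alpha_k=1}^{n_k(u)} \lambda^{**SK}_{\alpha_k}(u)}{1 - \lambda^{**SK}_0(u)} \in [0,1]$, and $\sum_{k=1}^{K} p^{**}_{SK}(u;k) = \frac{1}{1 - \lambda^{**SK}_0(u)} \sum_{\alpha=1}^{n(u)} \lambda^{**SK}_\alpha(u) = 1$

$m^{**}_{SK}(u;k) = \sum_{\alpha_k=1}^{n_k(u)} \frac{\lambda^{**SK}_{\alpha_k}(u)}{\sum_{\beta_k=1}^{n_k(u)} \lambda^{**SK}_{\beta_k}(u)}\, z(u_{\alpha_k}) \in (z_{k-1}, z_k]$

$p_k \in [0,1]$: stationary marginal probability of class k

$m_k \in (z_{k-1}, z_k]$: stationary class-k mean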

As opposed to the E-type expression (2.7) of the corrected OK estimate, the corrected SK estimate shows an additional term $\sum_{k=1}^{K} p_k m_k$ weighted by $\lambda^{**SK}_0(u)$. This additional term carries the additional information (m) used by SK. Were a class, say $k_0$, not represented by any of the n(u) neighbourhood data retained:

$$p^{**}_{OK}(u;k_0) = p^{**}_{SK}(u;k_0) = 0$$

this class would not contribute to the corrected OK estimate, but it would still contribute, with probability $\lambda^{**SK}_0(u)\, p_{k_0}$, to the corrected SK estimate. This advantage of SK could be critical in cases of sparse data: rarely are all K classes informed by the n(u) data retained, particularly the extreme classes.
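The practical effect of the extra term in (2.20) can be seen on the class probabilities themselves. The sketch below (NumPy assumed; `sk_class_probabilities` and the numbers are hypothetical) sums the corrected data weights per class, which gives $[1 - \lambda^{**SK}_0(u)]\, p^{**}_{SK}(u;k)$, and adds $\lambda^{**SK}_0(u)\, p_k$: the last class, which holds no datum, still receives probability $\lambda^{**SK}_0(u)\, p_{k_0} = 0.2 \times 0.2$.

```python
import numpy as np

def sk_class_probabilities(w_corr, data, thresholds, p_marg):
    """Class probabilities implied by (2.20): corrected data weights summed per
    class, i.e. [1 - lambda_0] p**_SK(u; k), plus lambda_0 times the marginal p_k.

    w_corr : corrected SK weights [lambda_0, lambda_1, ..., lambda_n], all >= 0
    p_marg : stationary marginal class probabilities p_k (summing to 1)
    """
    w_corr = np.asarray(w_corr, dtype=float)
    bounds = np.asarray(thresholds, dtype=float)
    n_classes = len(bounds) - 1
    cls = np.clip(np.searchsorted(bounds, np.asarray(data, dtype=float), side="left") - 1,
                  0, n_classes - 1)
    local = np.bincount(cls, weights=w_corr[1:], minlength=n_classes)
    return local + w_corr[0] * np.asarray(p_marg, dtype=float)

# Hypothetical case: no datum falls in the last class (5, 10].
w_corr = np.array([0.2, 0.5, 0.2, 0.1])     # [lambda_0, lambda_1, lambda_2, lambda_3]
data = np.array([1.0, 1.5, 4.0])
p = sk_class_probabilities(w_corr, data, [0.0, 2.0, 5.0, 10.0], [0.5, 0.3, 0.2])
print(p, p.sum())
```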

2.5.3 Improving the marginal statistics

The problem with using the marginal statistics $p_k$ and $m_k$ is that they are stationary and hence do not account for local trends within the study area. Ordinary kriging accounts for such trends by re-estimating these statistics from the local n(u) data. A compromise alternative would consist of replacing, in expression (2.20), the stationary statistics $p_k, m_k$ by prior regional estimates $p_k(u), m_k(u)$, 'regional' in the sense that they are location-specific, as are $p^{**}_{SK}(u;k)$ and $m^{**}_{SK}(u;k)$. Such a compromise could be achieved by running, prior to the SKs, some filter to provide the smoothly varying prior values $p_k(u)$ and $m_k(u)$, David, Marcotte & Soulié (1984). That filter could be as simple as a moving average with a window size much larger than that used to retain the n(u) data for each local SK. Or, in the presence of local data clusters, that filter could be a kriging of the trend, Deutsch & Journel (1992, p66), again retaining more data than n(u). Or, the stationary values $p_k$ may be kept while the $m_k$'s are replaced by some secondary-type information. Or, conversely, the stationary values $m_k$ are kept while the $p_k$'s are replaced by input 'proportion curves' $p_k(u)$, e.g., regional proportions of K facies types as derived from seismic data and sequence stratigraphy interpretation.
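A minimal sketch of the simplest filter mentioned above, a moving average over a window much larger than the kriging search neighbourhood, assuming NumPy and a square window; `moving_window_mean`, the window half-width and the data are hypothetical. Averaging the data values gives a prior local mean; averaging class indicators gives prior class proportions $p_k(u)$, and averaging only the class-k data would give $m_k(u)$.

```python
import numpy as np

def moving_window_mean(xy_data, values, xy_target, half_width):
    """Average of the values whose locations fall in a square window of the given
    half-width centred on each target; NaN where the window holds no datum."""
    xy_data = np.asarray(xy_data, dtype=float)
    values = np.asarray(values, dtype=float)
    out = np.full(len(xy_target), np.nan)
    for i, xy in enumerate(np.asarray(xy_target, dtype=float)):
        inside = np.all(np.abs(xy_data - xy) <= half_width, axis=1)
        if inside.any():
            out[i] = values[inside].mean()
    return out

xy = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 1.0], [10.0, 10.0]])   # hypothetical data
z = np.array([1.0, 3.0, 8.0, 100.0])
targets = np.array([[0.5, 0.0], [50.0, 50.0]])

# Prior local mean: average of all data in the window.
prior_mean = moving_window_mean(xy, z, targets, half_width=4.0)
# Prior proportion p_k(u) of the class (5, 10]: average of its class indicators.
in_class = ((z > 5.0) & (z <= 10.0)).astype(float)
prior_prop = moving_window_mean(xy, in_class, targets, half_width=4.0)
print(prior_mean, prior_prop)
```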

2.6 Simulation from the ccdf's

The OK- and SK-derived local ccdf's provide the class probabilities $p^{**}(u;k)$, k = 1, ..., K. These probabilities can be extrapolated beyond the last classes and interpolated within each class to obtain a smooth and continuous ccdf. The continuous local ccdf's can then be used in the simulation of the attribute. Both the sequential simulation and p-field simulation approaches can be considered.

2.6.1 Extending the discrete cdf to a continuous cdf:

Because of data sparsity (n(u) is small), the local ccdf's $F^{**}(u;z)$ are typically discontinuous, with many classes of zero probability. If one wishes to correct for such discontinuities, the ccdf values should be interpolated (smoothed) between thresholds and extrapolated to $z_{min}$ and $z_{max}$. The corrected OK and SK techniques provide local class probabilities $p^{**}(u;k)$ and local class means $m^{**}(u;k)$. The corresponding local cumulative probabilities are:

$$F^{**}(u; z_{k'}) = \text{Prob}^*\{Z(u) \le z_{k'}\} = \sum_{k=1}^{k'} p^{**}(u;k), \quad k' = 1, \dots, K$$

If no information exists about the shape of the cdf between estimated thresholds, a uniform distribution is used within each class. Typically the original probabilities $p^{**}(u;k)$ are turned into a continuous cdf $F^{**}(u;z)$ through interpolation within all classes and extrapolation for the last class; see hereafter. The corrected OK and SK approaches provide the class means $m^{**}(u;k), k = 1, \dots, K+1$, in addition to the class probabilities. The previous interpolations and extrapolations must be consistent with these class means in that:

$$\int z\, f^{**}_k(z)\, dz = m^{**}(u;k)$$

with $f^{**}_k(z)$ the pdf within class k.

Within class interpolation:

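Consider the ccdf $F^{**}(u;z)$ derived through OK or SK. In fact $F^{**}(u;z)$ is a combination of K+1 functions $F^{**}_k(z), k = 1, \dots, K+1$, defined in each of the K+1 classes $(z_{k-1}, z_k]$, with $z_0 = z_{min}$ and $z_{K+1} = z_{max}$. The functions $F^{**}_k(z)$ are, for example, power models, see Deutsch & Journel (1992, p133):

$$F^{**}_k(z) = \begin{cases} 0, & \forall z \le z_{k-1} \\ \left[\dfrac{z - z_{k-1}}{z_k - z_{k-1}}\right]^{\omega}, & \forall z \in (z_{k-1}, z_k] \\ 1, & \forall z > z_k \end{cases}$$

Let $f^{**}_k(z)$ represent the piecewise pdf in $(z_{k-1}, z_k]$:

$$f^{**}_k(z) = \frac{dF^{**}_k(z)}{dz} = \frac{\omega\,(z - z_{k-1})^{\omega-1}}{(z_k - z_{k-1})^{\omega}}, \quad z \in (z_{k-1}, z_k]$$

Identification of the local class mean calls for:

$$\int_{z_{k-1}}^{z_k} z\, f^{**}_k(z)\, dz = m^{**}(u;k)$$

$$\int_{z_{k-1}}^{z_k} z\, \frac{\omega\,(z - z_{k-1})^{\omega-1}}{(z_k - z_{k-1})^{\omega}}\, dz = z_{k-1} + \frac{\omega}{\omega+1}\,(z_k - z_{k-1}) = m^{**}(u;k) \;\Longrightarrow\; \omega = \frac{m^{**}(u;k) - z_{k-1}}{z_k - m^{**}(u;k)} \qquad (2.21)$$

The parameter ω of the power model in each class $(z_{k-1}, z_k]$ is used to identify the local class mean.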
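Relation (2.21) and the inverse of the power model are all that is needed to draw within a class. A sketch assuming NumPy, with a hypothetical class (2, 5] and class mean 4 (the function names are illustrative): the midpoint-rule average of the quantile function recovers the class mean, and a class mean at the class midpoint gives ω = 1, i.e., the uniform within-class model.

```python
import numpy as np

def power_omega(z_lo, z_hi, class_mean):
    """Power-model exponent matching the class mean, eq. (2.21)."""
    if not z_lo < class_mean < z_hi:
        raise ValueError("the class mean must lie strictly inside (z_lo, z_hi)")
    return (class_mean - z_lo) / (z_hi - class_mean)

def power_quantile(z_lo, z_hi, omega, q):
    """Inverse of F_k(z) = ((z - z_lo) / (z_hi - z_lo)) ** omega, for q in [0, 1]."""
    return z_lo + (z_hi - z_lo) * np.asarray(q, dtype=float) ** (1.0 / omega)

omega = power_omega(2.0, 5.0, 4.0)          # hypothetical class (2, 5] with mean 4
q = (np.arange(200_000) + 0.5) / 200_000    # midpoint grid on [0, 1]
fitted_mean = power_quantile(2.0, 5.0, omega, q).mean()
print(omega, fitted_mean)                   # mean of the fitted within-class model
print(power_omega(2.0, 5.0, 3.5))           # mean at the midpoint: uniform model
```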

Tail extrapolation:

Upper tail extrapolation is typically the most consequential, since minor changes in the tail of a ccdf lead to significant fluctuations in the global statistics of the distribution. Hyperbolic extrapolation, Deutsch & Journel (1992, p134), is most commonly used for upper tails. This function is defined as:

$$F^{**}_{K+1}(z) = 1 - \frac{\lambda}{z^{\omega}}, \quad z > 0, \ \omega \ge 1$$

Let $f^{**}_{K+1}(z)$ be the corresponding piecewise pdf for the last class:

$$f^{**}_{K+1}(z) = \frac{\lambda\,\omega}{z^{\omega+1}}$$

Identifying the last class probability leads to:

$$\int_{z_K}^{\infty} f^{**}_{K+1}(z)\, dz = \frac{\lambda}{z_K^{\omega}} = p^{**}(u;K+1) \;\Longrightarrow\; \lambda = z_K^{\omega}\, p^{**}(u;K+1)$$

Identification of the last class mean calls for:

$$\int_{z_K}^{\infty} z\, f^{**}_{K+1}(z)\, dz = \frac{\lambda\,\omega}{\omega - 1}\, z_K^{1-\omega} = p^{**}(u;K+1)\, m^{**}(u;K+1)$$

Substituting the value of λ gives the value of ω:

$$\omega = \frac{m^{**}(u;K+1)}{m^{**}(u;K+1) - z_K} \qquad (2.22)$$

The relation (2.22) shows that, as for the power model, there is no degree of freedom left.
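The hyperbolic tail can be sketched the same way, assuming NumPy. This sketch assumes that $m^{**}(u;K+1)$ is the conditional mean of Z above $z_K$, as for the power model, and that the tail is not truncated at $z_{max}$; the function names and numbers are hypothetical. The quantile at probability $1 - p^{**}(u;K+1)$ falls back on $z_K$, and averaging the quantile over the tail probabilities recovers the class mean.

```python
import numpy as np

def hyperbolic_tail(z_K, p_last, m_last):
    """Parameters of the upper tail F(z) = 1 - lam / z**omega beyond z_K.

    lam is set by the last-class probability (F(z_K) = 1 - p_last) and omega by
    the last-class mean, read here as the conditional mean of Z above z_K:
    omega = m_last / (m_last - z_K).
    """
    if not (m_last > z_K > 0 and 0 < p_last <= 1):
        raise ValueError("need m_last > z_K > 0 and 0 < p_last <= 1")
    omega = m_last / (m_last - z_K)
    lam = p_last * z_K ** omega
    return lam, omega

def hyperbolic_quantile(lam, omega, q):
    """Inverse of F(z) = 1 - lam / z**omega, for q in [1 - lam / z_K**omega, 1)."""
    return (lam / (1.0 - np.asarray(q, dtype=float))) ** (1.0 / omega)

lam, omega = hyperbolic_tail(z_K=5.0, p_last=0.1, m_last=10.0)   # hypothetical values
print(lam, omega, hyperbolic_quantile(lam, omega, 0.9))          # tail starts at z_K

# Check the class mean: average the quantile over q in (0.9, 1).
q = 0.9 + 0.1 * (np.arange(1_000_000) + 0.5) / 1_000_000
tail_mean = hyperbolic_quantile(lam, omega, q).mean()
print(tail_mean)
```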

2.6.2 The sequential simulation approach - OSSSIM:

In this approach, each grid node is visited sequentially along a random path. The local ccdf $F^{**}(u;z)$ is estimated for the K thresholds through OK or SK. A simulated value $z^{(l)}(u)$ is drawn such that:

$$z^{(l)}(u) = F^{**-1}(u; u_l)$$

where: $z^{(l)}(u)$ is the l-th simulated value of the attribute z(u), using the ccdf's given by OK/SK; $F^{**-1}(u;\cdot)$ is the quantile function corresponding to the local ccdf $F^{**}(u;z)$; $u_l$ is the random number used in the l-th simulation, uniformly distributed in [0,1].

The simulated value $z^{(l)}(u)$ is considered as a datum for simulation at subsequent nodes. If the neighborhood data change from one location u to another location u', the local ccdf's might be quite different, particularly if n(u) is small. This fact, combined with the random drawing of the probability values, may yield widely differing simulated values at neighboring locations, causing an artefact high nugget effect. This problem can be overcome by using a p-field approach, where the probability values are correlated, thus ensuring the drawing of more similar values at neighboring locations.
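The drawing step $z^{(l)}(u) = F^{**-1}(u; u_l)$ can be sketched for a ccdf that is uniform within each class, the default when no shape information is available. This assumes NumPy; `draw_from_class_ccdf` is hypothetical, and the kriging of the class probabilities, the random path and the addition of simulated values to the conditioning data are not shown. Classes of zero probability are skipped automatically.

```python
import numpy as np

def draw_from_class_ccdf(thresholds, probs, q):
    """Quantile of a ccdf that is uniform within each class (z_{k-1}, z_k].

    thresholds : z_0 < z_1 < ... < z_K
    probs      : K class probabilities (normalized here to sum to 1)
    q          : probability value in [0, 1)
    """
    z = np.asarray(thresholds, dtype=float)
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()
    cdf = np.cumsum(p)
    k = int(np.searchsorted(cdf, q, side="right"))   # first class with F(z_k) > q
    k = min(k, int(np.flatnonzero(p > 0)[-1]))       # guard against round-off near q = 1
    frac = (q - (cdf[k] - p[k])) / p[k]              # position inside class k
    return z[k] + frac * (z[k + 1] - z[k])

# One drawing step at a node u, with a hypothetical ccdf having an empty middle class.
rng = np.random.default_rng(0)
z_sim = draw_from_class_ccdf([0.0, 2.0, 5.0, 10.0], [0.5, 0.0, 0.5], rng.random())
print(z_sim)
```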

2.6.3 The p-Field simulation approach - OSPFSIM:

A non-conditional simulation of the uniform transform of the attribute (p-field) is carried out as a preliminary step. Next, at each location u, the local ccdf $F^{**}(u;z)$ is obtained by either OK or SK. The simulated attributes are:

$$z^{(l)}(u) = F^{**-1}(u; p^{(l)}(u))$$

where: $z^{(l)}(u)$ is the l-th simulated value of the attribute z(u); $F^{**-1}(u;\cdot)$ is the quantile function of the local ccdf $F^{**}(u;z)$; $p^{(l)}(u)$ is the l-th simulated probability value at location u. The $z^{(l)}$ simulations have the spatial correlation induced by the p-field.
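The p-field approach replaces the independent $u_l$ by spatially correlated probability values. The sketch below generates a 1-D p-field as a moving average of Gaussian white noise passed through the standard normal cdf (an assumed generator, chosen for brevity; any non-conditional simulation of a uniform field would serve) and applies a hypothetical local ccdf, identical at every node, through linear interpolation. NumPy is assumed; `uniform_pfield` is an illustrative name.

```python
import numpy as np
from math import erf, sqrt

def uniform_pfield(n, window, rng):
    """1-D non-conditional p-field: a moving average of Gaussian white noise,
    scaled to unit variance, mapped through the standard normal cdf so that
    each value is uniform on [0, 1] while neighbours stay correlated."""
    noise = rng.standard_normal(n + window - 1)
    y = np.convolve(noise, np.ones(window) / sqrt(window), mode="valid")   # length n
    return np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in y])

rng = np.random.default_rng(42)
p_field = uniform_pfield(2000, window=10, rng=rng)

# Hypothetical local ccdf, identical at every node for brevity: class
# probabilities 0.3, 0.4, 0.3 on (0, 2], (2, 5], (5, 10], uniform within classes.
cdf_at_thresholds = [0.0, 0.3, 0.7, 1.0]
thresholds = [0.0, 2.0, 5.0, 10.0]
z_sim = np.interp(p_field, cdf_at_thresholds, thresholds)   # z = F^{-1}(u; p(u))

lag1 = np.corrcoef(z_sim[:-1], z_sim[1:])[0, 1]
print(z_sim.min(), z_sim.max(), lag1)    # neighbouring simulated values are correlated
```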

OK-derived ccdf's in the absence of data:

Unlike the SK estimate, the OK estimate is undefined if n(u) = 0, or unreliable if n(u) is deemed too small. At such locations one can draw from the global marginal distribution or, better, implement SK. Such SK would use a local moving average defined over an extended neighborhood larger than the OK search neighborhood.

Chapter 3

The Walker Lake validation study

To enable evaluation of the performance of the various algorithms proposed in this thesis, an exhaustively sampled reference data set was considered. The public-domain and widely published Walker Lake data set was selected (Isaaks & Srivastava 1989). More precisely, the western half of that reference data set was retained, comprising 39,000 regularly gridded values.

3.1 The reference data set

The exhaustive data set is shown in Figure 3.1 with its statistics. A sample of 195 data values was taken at stratified random locations; see Figure 3.2 for the sample locations and statistics. The variogram map built on the exhaustive data set shows a clear N9°W direction of greater continuity; the corresponding variograms along the two principal directions, N9°W and N81°E, with the model fit are shown in Figure 3.3. The variogram model adopted includes a zonal anisotropy component along direction N9°W:

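$$\gamma(h_{x'}, h_{y'}) = C\,\big[\,\epsilon + \ldots\,\mathrm{Exp}(\ldots) + \ldots\,\mathrm{Sph}(\ldots) + \ldots\,\mathrm{Sph}(\ldots)\,\big] \qquad (3.1)$$

where: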

CHAPTER 3. THE WALKER LAKE VALIDATION STUDY

[Figure 3.1 panels: grey-scale map of the exhaustive data and histogram. Statistics: number of data 39000; mean 307.93; std. dev. 575.00; coef. of var 1.87; maximum 9499.51; upper quartile 356.30; median 54.01; lower quartile 5.16; minimum 0.00.]

Figure 3.1: Exhaustive reference data set and statistics.

[Figure 3.2 panels: location map of the samples; statistics of the sample data: number of data 195; mean 294.55; std. dev. 569.81; coef. of var 1.93; maximum 4434.50; upper quartile 352.70; median 60.58; lower quartile 5.52; minimum 0.00.]

Figure 3.2: Sample data locations and statistics.

[Figure 3.3 panels: a) variogram map of the exhaustive data; b) variograms in the two principal directions, along N9°W and along N81°E.]

Figure 3.3: Variograms as calculated from the exhaustive reference data set. Model fit is the continuous line.

$\mathbf{h} = (h_{x'}, h_{y'})$ is the separation vector, with component $h_{x'}$ in direction N81°E and $h_{y'}$ in the perpendicular direction;

$\mathrm{Exp}(h/a)$ is an exponential structure of practical range $3a$, that is 7.95 grid units for $a = 2.65$;

$\mathrm{Sph}(h/a)$ is a spherical structure with range $a$;

$\epsilon$ is a small nugget variance added for matrix stability;

and $C = 1000$ is a multiplicative factor.
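The two basic structures of model (3.1), and the split of a separation vector into its N81°E and N9°W components, can be sketched in Python. This illustrates the definitions above only: the sill coefficients and range arguments of (3.1) are not reproduced, and the coordinate convention (x = East, y = North, azimuth measured clockwise from North) is an assumption.

```python
import numpy as np

def exp_structure(h, a):
    """Exponential structure Exp(h/a) = 1 - exp(-h/a); reaches 95% of its sill at 3a."""
    return 1.0 - np.exp(-np.asarray(h, dtype=float) / a)

def sph_structure(h, a):
    """Spherical structure Sph(h/a); reaches its sill exactly at the range a."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.5 * r - 0.5 * r ** 3

def principal_components(dx_east, dy_north, azimuth_deg=81.0):
    """Split h into h_x' (along the azimuth, N81E) and h_y' (perpendicular, N9W)."""
    t = np.radians(azimuth_deg)
    hx = dx_east * np.sin(t) + dy_north * np.cos(t)
    hy = -dx_east * np.cos(t) + dy_north * np.sin(t)
    return hx, hy
```

The exponential structure never reaches its sill, hence the notion of practical range: $1 - e^{-3} \approx 0.95$ at $h = 3a = 7.95$ grid units.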

Rather than a model fit from the sample variogram, model (3.1) will be used in all subsequent krigings to avoid clouding the results with unrelated (although important) problems of variogram model accuracy.

3.2 Comparing $Z^*_{OK}$ to $Z^{**}_{OK}$ and $Z^*_{SK}$ to $Z^{**}_{SK}$

Estimation at all 39,000 locations is carried out using the 195 sample data. The OK and SK estimates are compared to their modified equivalents to evaluate the performance of the latter.

3.2.1 Ordinary kriging

Ordinary kriging was performed using the GSLIB program kb2d. A large search radius of 100 units was used, so that the nearest 16 data locations could be retained at all 39,000 estimation nodes. The resulting grey-scale map of the OK estimates $z^*_{OK}(\mathbf{u})$ and the corresponding statistics are shown in Figure 3.4. Almost all (exactly 37,719) OK systems yielded at least one negative weight, causing 939 of the original OK estimates to be negative; the most negative value is $z^*_{OK}(\mathbf{u}) = -33.97$. The translation correction was applied, resulting in the statistics of Figure 3.5: all corrected estimates $z^{**}_{OK}(\mathbf{u})$ are now positive. The spatial variance of the $z^{**}_{OK}(\mathbf{u})$ is slightly smaller than that of the original OK estimates, which indicates a greater smoothing effect (as expected, but insignificant here).
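For reference, the ordinary kriging system solved at each node can be sketched in Python. This is a minimal illustration, not kb2d: it assumes an isotropic covariance function supplied by the caller (the hypothetical `exp_cov` below), whereas the thesis uses the anisotropic model (3.1). The weights are constrained to sum to one, but nothing forces them to be non-negative, which is the origin of the negative weights counted above.

```python
import numpy as np

def ordinary_kriging_weights(data_xy, target_xy, cov):
    """Solve [C 1; 1' 0] [lambda; mu] = [c0; 1] for the OK weights and Lagrange parameter."""
    data_xy = np.asarray(data_xy, dtype=float)
    n = len(data_xy)
    d = np.linalg.norm(data_xy[:, None, :] - data_xy[None, :, :], axis=-1)
    d0 = np.linalg.norm(data_xy - np.asarray(target_xy, dtype=float), axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)      # data-to-data covariances
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = cov(d0)         # data-to-target covariances
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n]

def exp_cov(h):
    """Hypothetical isotropic exponential covariance with a = 2.65 (unit sill)."""
    return np.exp(-np.asarray(h) / 2.65)

lam, mu = ordinary_kriging_weights([(-1.0, 0.0), (1.0, 0.0), (0.0, 6.0)], (0.0, 0.0), exp_cov)
```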

[Figure 3.4 panels: grey-scale map and histogram of the OK estimate $z^*_{OK}$. Statistics: number of data 39000; mean 275.99; std. dev. 380.26; coef. of var undefined; maximum 3952.28; upper quartile 320.77; median 139.20; minimum -33.97.]

Figure 3.4: Ordinary kriging $z^*_{OK}$ estimates and statistics.

[Figure 3.5 panels: grey-scale map and histogram of the corrected OK estimate $z^{**}_{OK}$. Statistics: number of data 39000; mean 279.16; std. dev. 355.25; coef. of var 1.27; upper quartile 318.85; median 155.84; lower quartile 75.20; minimum 1.00.]

Figure 3.5: Corrected ordinary kriging $z^{**}_{OK}$ estimates and statistics.

[Figure 3.6 panels: a) histogram of the most negative OK weight: number of data 37719 (1281 trimmed); mean -0.02; std. dev. 0.01; maximum 0.00; upper quartile -0.01; median -0.02; lower quartile -0.02; minimum -0.07. b) histogram of the difference $z^*_{OK} - z^{**}_{OK}$: number of data 37719 (1281 trimmed); mean -3.27; std. dev. 39.75; maximum 333.59; upper quartile 9.05; median -5.05; lower quartile -21.42; minimum -166.70.]

Figure 3.6: a) Histogram of the most negative weight in each OK; b) histogram of the correction $z^*_{OK} - z^{**}_{OK}$.

The histogram of the 37,719 most negative OK weights is given in Figure 3.6a. Both the largest negative weight and the mean are small in magnitude, respectively -0.067 and -0.016; but again there are many such negative weights. As a consequence, the correction $z^*_{OK} - z^{**}_{OK}$ is generally small; see the histogram of Figure 3.6b. The correlation between the true $z$ and the 37,719 corrected OK estimates $z^{**}_{OK}$ is 0.541, almost unchanged from 0.546, the correlation between $z$ and the corresponding $z^*_{OK}$; see Figure 3.7. It appears that the proposed correction does its job without much loss in accuracy. One location (East x = 64, North y = 163), at which there was a large correction, was selected for detailed investigation. Figure 3.8a gives the corresponding grey-scale data location map; only the 16 data locations retained are marked, with both the original OK weight $\lambda^{OK}_\alpha$ in parentheses and the corrected weight $\nu^{OK}_\alpha$ above. Figure 3.8b gives the table of the 16 coordinates and data values, $\mathbf{u}_\alpha = (x_\alpha, y_\alpha)$, $z_\alpha$, and the corresponding weights $\lambda^{OK}_\alpha$ and $\nu^{OK}_\alpha$, $\alpha = 1,\ldots,16$. The original OK weight of the closest datum, $\max_\alpha(\lambda^{OK}_\alpha) = 0.403$, was decreased to 0.332; conversely, the most negative original weight, $\min_\alpha(\lambda^{OK}_\alpha) = -0.016$, was increased to zero. Note on Figure 3.8a that all negative weights are given to outer data (screen effect of closer data).
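The translation correction can be checked against the weights listed in Figure 3.8b. The Python sketch below assumes that the correction adds the magnitude of the most negative weight to every weight and then rescales the weights to sum to one; this assumed form agrees with the "shift of all OK weights" mentioned in Section 3.4, and it reproduces the corrected weights of Figure 3.8b to within about 0.002, the residual being consistent with the three-decimal rounding of the listed values.

```python
import numpy as np

def translation_correction(weights):
    """Shift all weights by |most negative weight|, then rescale them to sum to one."""
    w = np.asarray(weights, dtype=float)
    shifted = w + max(0.0, -w.min())
    return shifted / shifted.sum()

# Original and corrected OK weights at node (64, 163), in the row order of Figure 3.8b
ok_weights = [0.403, 0.141, 0.134, 0.123, 0.033, 0.067, 0.020, 0.031,
              0.034, -0.001, 0.047, -0.002, -0.001, -0.016, -0.004, -0.009]
reported = [0.332, 0.124, 0.119, 0.111, 0.039, 0.066, 0.029, 0.037,
            0.040, 0.013, 0.050, 0.011, 0.012, 0.000, 0.010, 0.006]

corrected = translation_correction(ok_weights)
max_abs_diff = float(np.max(np.abs(corrected - np.array(reported))))
print(f"largest discrepancy: {max_abs_diff:.4f}")
```

The most negative weight (datum at x = 86, y = 147) becomes exactly zero, and the closest datum loses the most weight in absolute terms, as described above.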

[Figure 3.7 panels: scattergrams with the true value on the X axis. a) $z^*_{OK}$: number of data 37719 (1281 trimmed); X mean 310.71, std. dev. 580.78; Y mean 278.86, std. dev. 386.08; correlation 0.55; rank correlation 0.63. b) $z^{**}_{OK}$: number of data 37719 (1281 trimmed); X mean 310.71, std. dev. 580.78; Y mean 282.13, std. dev. 360.58; correlation 0.54; rank correlation 0.61.]

Figure 3.7: Scattergram of OK estimate vs. true value: a) $z^*_{OK}$ vs. true $z$; b) $z^{**}_{OK}$ vs. true $z$.

[Figure 3.8 panels: a) OK weights at location (64,163); b) data used and the weights in OK; c) SK weights at location (64,163); d) data used and the weights in SK. Data and weights listed in panels b) and d):

X    Y    z          OK weight   corrected OK   SK weight   corrected SK
64   159  4434.499    0.403      0.332           0.402      0.314
73   166   620.704    0.141      0.124           0.140      0.119
60   176  2251.852    0.134      0.119           0.133      0.114
68   150   384.422    0.123      0.111           0.122      0.106
79   165  2685.309    0.033      0.039           0.032      0.039
49   166   683.211    0.067      0.066           0.064      0.063
76   178  1211.651    0.020      0.029           0.014      0.028
63   144   110.154    0.031      0.037           0.029      0.036
67   185  2592.401    0.034      0.040           0.032      0.039
46   180   426.687   -0.001      0.013          -0.004      0.012
59   189  1661.965    0.047      0.050           0.045      0.049
90   171   181.784   -0.002      0.011          -0.006      0.011
47   143   289.356   -0.001      0.012          -0.005      0.012
86   147  1183.198   -0.016      0.000          -0.020      0.000
92   164    42.393   -0.004      0.010          -0.008      0.009
49   140    69.314   -0.009      0.006          -0.012      0.006

Panel b): OK estimate 2530.661; corrected OK estimate 2241.909; true value 1128.505. Panel d): weight given to the mean (294.550) is 0.044; corrected weight given to the mean is 0.039; SK estimate 2508.161; corrected SK estimate 2152.998; true value 1428.505.]

Figure 3.8: Data configuration and kriging results at node (x = 64, y = 163): a-b) ordinary kriging; c-d) simple kriging.

3.2.2 Simple kriging

Simple kriging (SK), using as stationary mean the mean of the 195 samples, $m = 294.55$, is now implemented under the same data search conditions as the previous ordinary kriging exercise. Figure 3.9 gives the original SK estimates and statistics, and Figure 3.10 the corrected SK estimates and statistics. This time, all 39,000 SK systems yield at least one negative weight $\lambda^{SK}_\alpha$; thus all 39,000 SK estimates were corrected. The histograms of the most negative SK weight and of the resulting correction $z^*_{SK} - z^{**}_{SK}$ are given in Figure 3.11. Again, the average correction is small, $\overline{z^*_{SK} - z^{**}_{SK}} = -2.71$, but some corrections can be quite large: the maximum is 420.64. Figure 3.12 compares the scattergrams of true values vs. SK and true values vs. corrected SK estimates. The correction decreases the correlation from 0.547 to 0.542. The same location (x = 64, y = 163) considered before was singled out for detailed investigation; see Figure 3.8c. The original SK weight given to the closest datum, $\max_\alpha(\lambda^{SK}_\alpha) = 0.402$, was decreased to 0.314. The most negative OK and SK weights were both given to the datum at location (x = 86, y = 147), and both were increased to zero. The observed extreme similarity between the SK and OK results is due to the very small relative weight $\lambda^{SK}_0(\mathbf{u})$ given to the additional information $m$ (see expression (2.8)): averaged over the 39,000 nodes, $\bar{\lambda}_0 = 0.09$, only 9% of the total weight given to the $n(\mathbf{u})+1$ values; see the histogram of Figure 3.13. This is due to the large amount of data retained, 16, mostly within correlation range of the node being estimated, and to the quasi-zero nugget effect of the variogram model (3.1).
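The role of the weight given to the mean can be illustrated with a small simple kriging sketch in Python. It assumes an isotropic covariance supplied by the caller (the hypothetical `exp_cov`), takes the mean weight as $\lambda_0 = 1 - \sum_\alpha \lambda_\alpha$, and applies the same assumed shift-and-rescale correction jointly to all $n(\mathbf{u})+1$ weights, since Figure 3.8d lists a corrected weight for the mean. When all data lie far beyond the correlation range, $\lambda_0$ tends to one and SK returns the mean.

```python
import numpy as np

def simple_kriging(data_xy, data_z, target_xy, cov, mean):
    """SK weights, weight lambda_0 given to the stationary mean, and SK estimate."""
    data_xy = np.asarray(data_xy, dtype=float)
    d = np.linalg.norm(data_xy[:, None, :] - data_xy[None, :, :], axis=-1)
    d0 = np.linalg.norm(data_xy - np.asarray(target_xy, dtype=float), axis=-1)
    lam = np.linalg.solve(cov(d), cov(d0))
    lam0 = 1.0 - lam.sum()
    return lam, lam0, lam @ np.asarray(data_z, dtype=float) + lam0 * mean

def corrected_simple_kriging(lam, lam0, data_z, mean):
    """Assumed translation correction applied jointly to the n data weights and lambda_0."""
    w = np.append(lam, lam0)
    w = w + max(0.0, -w.min())
    w = w / w.sum()
    return w[:-1], w[-1], w[:-1] @ np.asarray(data_z, dtype=float) + w[-1] * mean

def exp_cov(h):
    """Hypothetical isotropic exponential covariance (a = 2.65, unit sill)."""
    return np.exp(-np.asarray(h) / 2.65)

m = 294.55  # sample mean used as stationary mean in the thesis
lam_far, lam0_far, est_far = simple_kriging([(100.0, 0.0), (0.0, 100.0)], [10.0, 20.0],
                                            (0.0, 0.0), exp_cov, m)
lam, lam0, est = simple_kriging([(1.0, 0.0), (2.0, 0.0), (0.0, 1.5)], [100.0, 300.0, 50.0],
                                (0.0, 0.0), exp_cov, m)
nu, nu0, est_corr = corrected_simple_kriging(lam, lam0, [100.0, 300.0, 50.0], m)
```

With 16 nearby data and a quasi-zero nugget, as in the Walker Lake runs, $\lambda_0$ stays small, which is why SK and OK behave so similarly here.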

3.3 Traditional estimates

Both SK and OK require solving one kriging system per node; traditional inverse-distance-weighted estimates do not. Two such estimates were produced at each of the 39,000 nodes for performance comparison; exactly the same 16 data retained for SK and OK were used at each node. These two estimates, $z^*_{ISED}(\mathbf{u})$ and $z^*_{IVD}(\mathbf{u})$, differ by the distance used: the squared Euclidean distance $|\mathbf{h}|^2$ for the first, and the variogram distance $\gamma(\mathbf{h})$, using the anisotropic model (3.1), for the second.

[Figure 3.9 panels: map and histogram of the SK estimate $z^*_{SK}$. Statistics: number of data 39000; mean 282.06; std. dev. 370.68; coef. of var undefined; maximum 3843.76; upper quartile 329.61; median 150.21; lower quartile 69.94; minimum -36.88.]

Figure 3.9: Simple kriging (SK) estimates and statistics.

[Figure 3.10 panels: map and histogram of the corrected SK estimate $z^{**}_{SK}$. Statistics: number of data 39000; mean 284.77; std. dev. 336.87; coef. of var 1.18; maximum 3708.88; upper quartile 320.92; median 171.23; lower quartile 91.68; minimum 4.63.]

Figure 3.10: Corrected simple kriging estimates and statistics.

[Figure 3.11 panels: a) histogram of the most negative SK weight (number of data 39000; mean -0.02; std. dev. 0.01; median -0.02); b) histogram of $z^*_{SK} - z^{**}_{SK}$ (number of data 39000; mean -2.71; std. dev. 51.40; maximum 420.64; upper quartile 16.23; median -5.97; lower quartile -28.29; minimum -201.99).]

Figure 3.11: a) Histogram of the most negative weight in each SK; b) histogram of the correction $z^*_{SK} - z^{**}_{SK}$.

[Figure 3.12 panels: scattergrams with the true value on the X axis (number of data 39000; X mean 307.93, std. dev. 574.99). a) $z^*_{SK}$: Y mean 282.06, std. dev. 370.68; correlation 0.55; rank correlation 0.63. b) $z^{**}_{SK}$: Y mean 284.77, std. dev. 336.87; correlation 0.54; rank correlation 0.60.]

Figure 3.12: Scattergram of SK estimate vs. true value: a) $z^*_{SK}$ vs. true $z$; b) $z^{**}_{SK}$ vs. true $z$.

The expression for the second one, as an example, is written:

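$$z^*_{IVD}(\mathbf{u}) = \sum_{\alpha=1}^{n(\mathbf{u})} \lambda^{IVD}_\alpha(\mathbf{u})\, z(\mathbf{u}_\alpha) = \sum_{k=1}^{K} p^*_{IVD}(\mathbf{u};k)\, m^*_{IVD}(\mathbf{u};k) \qquad (3.2)$$

with:

$$\lambda^{IVD}_\alpha(\mathbf{u}) = \frac{1/\gamma(\mathbf{u}-\mathbf{u}_\alpha)}{\sum_{\beta=1}^{n(\mathbf{u})} 1/\gamma(\mathbf{u}-\mathbf{u}_\beta)}, \quad \alpha = 1,\ldots,n(\mathbf{u}), \qquad \text{hence} \quad \sum_{\alpha=1}^{n(\mathbf{u})} \lambda^{IVD}_\alpha(\mathbf{u}) = 1$$

$$p^*_{IVD}(\mathbf{u};k) = \sum_{\alpha_k=1}^{n_k(\mathbf{u})} \lambda^{IVD}(\mathbf{u}_{\alpha_k}) \in [0,1]$$

$$m^*_{IVD}(\mathbf{u};k) = \frac{1}{p^*_{IVD}(\mathbf{u};k)} \sum_{\alpha_k=1}^{n_k(\mathbf{u})} \lambda^{IVD}(\mathbf{u}_{\alpha_k})\, z(\mathbf{u}_{\alpha_k}) \in (z_{k-1}, z_k]$$

where $\mathbf{u}_{\alpha_k}$, $\alpha_k = 1,\ldots,n_k(\mathbf{u})$, are the data locations whose values fall in class $(z_{k-1}, z_k]$.

Figure 3.14 gives the scattergrams of true values vs. the $z^*_{ISED}$ and $z^*_{IVD}$ estimates. These scattergrams should be compared to the similar scattergrams of Figures 3.7 and 3.12. All four estimates considered perform similarly in terms of local accuracy, as measured by the correlation with the true values (around 0.50). The $z^*_{IVD}$ estimate has the lowest correlation with the true values (0.48), but its mean is also the closest to the true mean value (288.0 vs. 307.9). Such similar performance is not surprising, since the 195 sample data used do not present any clustering (declustering is the main feature of kriging) and the sample statistics are quite close to the reference statistics, as shown in Figures 3.1 and 3.2. We will see, however, that when interpreted as class probabilities the SK and OK weights clearly outperform the traditional ISED and IVD weights.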
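The two traditional weightings share one formula and differ only in the distance fed to it, which the Python sketch below makes explicit. Two points are assumptions rather than details from the thesis: a zero distance (an estimation node coinciding with a datum, where the inverse weight is undefined) gives all the weight to that datum, and `toy_gamma` is a hypothetical isotropic stand-in for the anisotropic model (3.1).

```python
import numpy as np

def inverse_distance_weights(dist):
    """lambda_alpha = (1/d_alpha) / sum_beta (1/d_beta); d = |h|^2 (ISED) or gamma(h) (IVD)."""
    dist = np.asarray(dist, dtype=float)
    if np.any(dist == 0.0):
        # Assumption: a coincident datum receives all the weight
        w = (dist == 0.0).astype(float)
        return w / w.sum()
    inv = 1.0 / dist
    return inv / inv.sum()

def ised_weights(data_xy, target_xy):
    """Inverse squared Euclidean distance weights."""
    h = np.asarray(data_xy, dtype=float) - np.asarray(target_xy, dtype=float)
    return inverse_distance_weights(np.sum(h ** 2, axis=1))

def ivd_weights(data_xy, target_xy, gamma):
    """Inverse variogram distance weights; gamma is evaluated on separation vectors (n, 2)."""
    h = np.asarray(data_xy, dtype=float) - np.asarray(target_xy, dtype=float)
    return inverse_distance_weights(gamma(h))

def toy_gamma(h):
    """Hypothetical isotropic exponential variogram, a = 2.65 (unit sill)."""
    return 1.0 - np.exp(-np.linalg.norm(h, axis=1) / 2.65)

w_ised = ised_weights([(1.0, 0.0), (0.0, 2.0)], (0.0, 0.0))
w_ivd = ivd_weights([(1.0, 0.0), (0.0, 2.0)], (0.0, 0.0), toy_gamma)
```

Because the variogram flattens out beyond its range, IVD spreads weight more evenly over distant data than ISED, whose $1/|\mathbf{h}|^2$ keeps decaying.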

3.4 Checking the local class probabilities

The E-type interpretation (2.7) of the corrected OK estimate, and similarly of the SK, ISED and IVD estimates, yields local class probabilities.

[Figure 3.13 panel: histogram of Lambda_zero. Statistics: number of data 39000; mean 0.09; std. dev. 0.07; coef. of var 0.83; maximum 0.55; upper quartile 0.09; median 0.06; lower quartile 0.05; minimum 0.01.]

Figure 3.13: Histogram of the corrected SK weight attributed to the global mean.

[Figure 3.14 panels: scattergrams of true values (X axis; number of data 39000; X mean 307.929, std. dev. 574.988) vs. the traditional estimates, including the ISED estimate. One panel: Y mean 288.083, std. dev. 286.910; correlation 0.481; rank correlation 0.465. Other panel: Y mean 281.492, std. dev. 457.855; correlation 0.523.]

Figure 3.14: Scattergrams of traditional estimates vs. true values.

Ordinary kriging (or alternatively SK) followed by the translation correction yields estimated class probabilities; see relations (2.7), here recalled:

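$$z^{**}_{OK}(\mathbf{u}) = \sum_{\alpha=1}^{n(\mathbf{u})} \nu^{OK}_\alpha(\mathbf{u})\, z(\mathbf{u}_\alpha) = \sum_{k=1}^{K} p^*_{OK}(\mathbf{u};k)\, m^*_{OK}(\mathbf{u};k), \qquad p^*_{OK}(\mathbf{u};k) = \sum_{\alpha_k=1}^{n_k(\mathbf{u})} \nu^{OK}_{\alpha_k}(\mathbf{u}), \quad k = 1,\ldots,K$$

where $\nu^{OK}_\alpha(\mathbf{u})$ are the corrected (non-negative) OK weights and $m^*_{OK}(\mathbf{u};k)$ is the weighted mean of the $n_k(\mathbf{u})$ data falling in class $k$.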

The $K$ class probabilities $p^*_{OK}(\mathbf{u};k)$ constitute a discrete estimate of the $Z$-conditional distribution at node $\mathbf{u}$:

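$$F^*_{OK}(\mathbf{u}; z) = \mathrm{Prob}^*\{Z(\mathbf{u}) \le z \mid (n(\mathbf{u}))\},$$

from which various probability intervals can be retrieved. Consider for example the $p$-probability interval bounded by the $\frac{1-p}{2}$ and $\frac{1+p}{2}$ conditional quantiles, $q(\mathbf{u};\frac{1-p}{2})$ and $q(\mathbf{u};\frac{1+p}{2})$:

$$\mathrm{Prob}^*\{Z(\mathbf{u}) \in \mathrm{Int}(\mathbf{u};p) \mid (n(\mathbf{u}))\} = p \qquad (3.3)$$

with $\mathrm{Int}(\mathbf{u};p) = \big(q(\mathbf{u};\tfrac{1-p}{2}),\ q(\mathbf{u};\tfrac{1+p}{2})\big]$ as determined from the conditional distribution $F^*_{OK}(\mathbf{u}; z)$.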
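Relations (2.7) turn any set of non-negative weights into class probabilities and class means. The Python sketch below assumes classes $(z_{k-1}, z_k]$ bounded by user-chosen cutoffs, with open-ended first and last classes, and checks the E-type identity $\sum_k p^*(\mathbf{u};k)\, m^*(\mathbf{u};k) = \sum_\alpha \nu_\alpha z(\mathbf{u}_\alpha)$ on a toy example; the numbers are illustrative, not Walker Lake values.

```python
import numpy as np

def class_probabilities(data_z, weights, cutoffs):
    """Class probabilities p*(u;k) and class means m*(u;k) for classes (z_{k-1}, z_k]."""
    z = np.asarray(data_z, dtype=float)
    w = np.asarray(weights, dtype=float)
    edges = np.concatenate(([-np.inf], np.asarray(cutoffs, dtype=float), [np.inf]))
    n_classes = len(edges) - 1
    k = np.searchsorted(edges, z, side="left") - 1   # edges[k] < z <= edges[k+1]
    p = np.bincount(k, weights=w, minlength=n_classes)
    wz = np.bincount(k, weights=w * z, minlength=n_classes)
    m = np.divide(wz, p, out=np.full(n_classes, np.nan), where=p > 0)
    return p, m

z = [1.0, 5.0, 9.0, 12.0]
nu = [0.1, 0.4, 0.3, 0.2]            # corrected, non-negative weights
p, m = class_probabilities(z, nu, cutoffs=[4.0, 10.0])
ccdf_at_cutoffs = np.cumsum(p)[:-1]  # F*(u; z_k) at the two cutoffs
e_type = np.nansum(p * m)            # empty classes (m = nan) contribute nothing
```

Because every weight is non-negative, the cumulated class probabilities are automatically non-decreasing, so the resulting ccdf needs no order-relation correction.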

Deutsch (1996b) has proposed checking the accuracy of such $p$-probability intervals by comparing $p$ to the actual proportion $\bar\xi(p)$ of true $z$-values (reference or cross-validation data set) falling into the intervals $\mathrm{Int}(\mathbf{u};p)$, that is:

$$\bar\xi(p) = \frac{1}{N}\sum_{j=1}^{N} \xi(\mathbf{u}_j;p) \qquad (3.4)$$

with: $\xi(\mathbf{u}_j;p) = 1$ if $z(\mathbf{u}_j) \in \mathrm{Int}(\mathbf{u}_j;p)$, and $= 0$ if not; $N$ is the number of true values $z(\mathbf{u}_j)$ available for checking.

The comparison is done by plotting $\bar\xi(p)$ vs. $p$, a P-P plot: if the values $\bar\xi(p)$ for a series of values $p \in [0,1]$ plot along the 45° line, the probability intervals $\mathrm{Int}(\mathbf{u};p)$, hence the conditional probabilities $p^*_{OK}(\mathbf{u};k)$, are deemed accurate. If the values $\bar\xi(p)$ plot above the 45° line, the corresponding intervals are too conservative, i.e., they contain more true values than the predicted proportion $p$. Since an interval that is too wide is a safer error than one that is too narrow, an upward (conservative) deviation from the 45° line is preferred to a downward (non-conservative) deviation.
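The accuracy check (3.3)-(3.4) can be sketched in Python. It assumes each local ccdf is available as a discrete distribution (values and non-negative probabilities), uses the symmetric $p$-interval bounded by the $\frac{1-p}{2}$ and $\frac{1+p}{2}$ quantiles, and counts a true value as inside when it lies in $(q_{low}, q_{high}]$. In the toy example every node has the same uniform ccdf and the true values are spread evenly over it, so $\bar\xi(p)$ should sit on the 45° line up to discretization.

```python
import numpy as np

def ccdf_quantile(values, weights, p):
    """Smallest value z with F(z) >= p for a discrete ccdf."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    cdf = np.cumsum(np.asarray(weights, dtype=float)[order])
    cdf /= cdf[-1]
    k = int(np.searchsorted(cdf, p, side="left"))
    return values[order][min(k, len(values) - 1)]

def accuracy_curve(ccdfs, true_values, p_grid):
    """Proportion xi_bar(p) of true values inside their symmetric p-probability interval."""
    xi_bar = []
    for p in p_grid:
        inside = 0
        for (values, weights), z in zip(ccdfs, true_values):
            low = ccdf_quantile(values, weights, (1.0 - p) / 2.0)
            high = ccdf_quantile(values, weights, (1.0 + p) / 2.0)
            inside += int(low < z <= high)
        xi_bar.append(inside / len(true_values))
    return np.array(xi_bar)

grid = np.arange(100, dtype=float)
ccdfs = [(grid, np.full(100, 0.01))] * 100
p_grid = np.array([0.2, 0.5, 0.8])
xi = accuracy_curve(ccdfs, grid, p_grid)
conservative = xi >= p_grid   # True where the intervals contain at least the proportion p
```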

[Figure 3.15 panels: accuracy plots of the actual proportion vs. the probability p, for 99 values of p (X mean 0.500, std. dev. 0.286). a) Checking OK: Y mean 0.493, std. dev. 0.268, correlation 0.999. b) Y mean 0.480, std. dev. 0.264, correlation 0.999. c) Checking IVD: Y mean 0.524, std. dev. 0.270, correlation 0.997. d) Checking ISED: Y mean 0.299, std. dev. 0.194, correlation 0.984. e) IK reference: Y mean 0.497, std. dev. 0.275, correlation 0.999. f) Y mean 0.502, std. dev. 0.274, correlation 0.999.]

Figure 3.15: Accuracy scores of probability intervals as derived from a) ordinary kriging, b) simple kriging, c) inverse variogram distance weighting, d) inverse squared Euclidean distance weighting, e) multiple indicator kriging, f) median indicator kriging.

Figures 3.15a-d allow a comparison of the respective accuracies of the ccdf's derived from the four estimation algorithms, OK, SK, IVD and ISED. All 39,000 reference true values were considered, and the probability $p$ was increased from 0.01 to 0.99 in increments of 0.01. To provide a comparison yardstick, the ccdf's provided by multiple indicator kriging (using the 9 sample deciles as cutoffs) and by median indicator kriging (using model (3.1)) are given in Figures 3.15e-f. Recall that multiple IK requires solving $K$ (here 9) systems per node; median IK, OK and SK require solving one system per node; ISED and IVD do not call for any system. Multiple IK and median IK yield very similar results, and the best, followed by OK and SK, then IVD. The scores of ISED (inverse squared Euclidean distance weighting) are the worst, and they are non-conservative (below the 45° line). Remark: since the single variogram model used for median IK is rescaled from the $z$-variogram model (3.1) used for OK and, at each node $\mathbf{u}$, the same 16 data are used, the median IK weights are identical to the original OK weights. The difference observed between the plots of Figures 3.15a and 3.15f arises solely from the different procedures used for correcting order relation deviations: a shift of all OK weights in one case, a forward-backward correction of the ccdf in the case of median IK; see Deutsch & Journel (1992, p. 78). Does this mean that median IK might supplant OK as the anchor algorithm of all geostatistics? The answer is no because, first, median IK is actually an OK applied to indicator data, and second because the E-type estimate $z^*_{mIK}$

[Figure 3.16 panels: a) true vs. median IK (E-type): number of data 39000; X mean 307.929, std. dev. 574.988; Y mean 296.194, std. dev. 319.724; correlation 0.527; rank correlation 0.577. b) corrected OK vs. median IK (E-type): number of data 39000; X mean 279.156, std. dev. 355.246; Y mean 296.194, std. dev. 319.724; correlation 0.944; rank correlation 0.972.]

Figure 3.16: Scattergram of the median indicator kriging E-type estimate vs. a) true values, b) corrected ordinary kriging estimate.

the scattergram of $z^*_{mIK}$ vs. $z^{**}_{OK}$. It appears that the use of constant stationary class means $m_k$ results in a significant additional smoothing of the E-type estimate, particularly for a few large estimated values seen on the right of the scattergram of Figure 3.16b. These observations, if corroborated by experience, would lead one to prefer the corrected OK algorithm to median IK for the determination of ccdf's whenever approximating all standardized indicator variograms by the $z$-variogram model is deemed appropriate. Otherwise, multiple IK with multiple (different) indicator variograms would still be preferred, because it allows reproduction of aspects of the two-point distribution of $Z(\mathbf{u}), Z(\mathbf{u}+\mathbf{h})$ beyond the mere $Z$-covariance model. Since the only difference between modified ordinary kriging and median IK is the order relation correction (see Section 2.4.2), it is appropriate to compare the class probabilities obtained by these two techniques. The class probabilities of median IK vs. modified OK are plotted for all 10 classes; see Figures 3.17 and 3.18. The correlation coefficient for all the classes is greater than 0.96, with a maximum of 0.987 for the first class. This indicates that simulations using median indicator kriging and modified ordinary kriging would yield similar results. The advantage of the modified OK approach is that the local ccdf within a class can be constructed to match the local class means, and hence the mean of the ccdf can be made to match the estimate.
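The two order-relation treatments can be contrasted on a toy ccdf. The Python sketch below averages an upward (running maximum) and a downward (running minimum) corrected ccdf, in the spirit of the correction described by Deutsch & Journel (1992); the exact GSLIB implementation may differ in detail, so this is an illustration under that assumption. It also shows that a ccdf built by cumulating non-negative class probabilities, as produced by the translation correction, is monotone from the start.

```python
import numpy as np

def upward_downward_correction(F):
    """Average of an upward (running max) and a downward (running min) corrected ccdf."""
    F = np.clip(np.asarray(F, dtype=float), 0.0, 1.0)
    upward = np.maximum.accumulate(F)
    downward = np.minimum.accumulate(F[::-1])[::-1]
    return 0.5 * (upward + downward)

# Hypothetical indicator kriging ccdf values at 5 increasing cutoffs, with order-relation deviations
ik_ccdf = [0.10, 0.30, 0.25, 0.60, 1.05]
fixed = upward_downward_correction(ik_ccdf)

# Non-negative class probabilities (e.g. from corrected OK weights) cumulate monotonically
ok_ccdf = np.cumsum([0.1, 0.2, 0.0, 0.3, 0.4])
```

The indicator route repairs the ccdf after kriging, whereas the translation correction acts on the weights before the ccdf is built, so no repair step is needed.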

[Figure 3.17 panels: scattergrams of class probabilities, median IK (X) vs. modified OK (Y), number of data 39000. Class 1: X/Y means 0.083/0.083, std. devs. 0.149/0.141, correlation 0.987, rank correlation 0.793. Class 2: means 0.006/0.006, std. devs. 0.024/0.023, correlation 0.962, rank correlation 0.679. Class 3: means 0.083/0.082, std. devs. 0.115/0.107, correlation 0.978, rank correlation 0.870. Class 4: means 0.112/0.112, std. devs. 0.164/0.153, correlation 0.984, rank correlation 0.893. Class 5: means 0.124/0.126, std. devs. 0.167/0.159, correlation 0.985, rank correlation 0.925.]

Figure 3.17: Scattergram of probabilities, classes 1-5: median IK vs. modified OK.

[Figure 3.18 panels: scattergrams of class probabilities, median IK vs. modified OK.]

Figure 3.18: Scattergram of probabilities, classes 6-10: median IK vs. modified OK.

3.5 Conditional simulations

The OK- and SK-derived ccdf's are now used for simulation through the sequential and p-field approaches. Since these conditional distributions are similar to the distributions obtained by median IK, the resulting simulations should be close to optimal under the mosaic model, where all the indicator variograms are proportional to the $Z$-variogram (Deutsch & Journel 1992, p. 75).

3.5.1 Sequential simulations (OSSSIM)

Realizations using the sequential simulation approach present a salt-and-pepper texture common to all indicator simulations, as can be seen in Figures 3.19 and 3.20. At any particular location, only a few of the $N+1$ classes would be represented by local data; the remaining classes will have zero probability. Some of the classes, especially those with few data and negative original kriging weights, would have negligible probabilities. This results in raggedness of the ccdf's used in OSSSIM, which causes the salt-and-pepper texture. On the other hand, SGSIM produced realizations that appear smoother; this is due to the continuity of the ccdf characterizing the uncertainty. This smoothness should not be a reason for preferring SGSIM over other techniques, since SGSIM is optimal only under a multi-Gaussian hypothesis. The continuity observed in the realizations is reflected by the variograms calculated over the 10 realizations; see Figures 3.21 and 3.22. The SGSIM realizations reproduce the spatial variability correctly, within the range of stochastic fluctuations. The OSSSIM realizations (using either OK or SK) show too high a nugget effect, induced by the discontinuous local ccdf's. The OSSSIM realizations obtained through SK are expected to be more continuous than their counterparts obtained through OK, since the SK approach smooths the ccdf by introducing the marginal probabilities. This would be true only if the area being simulated were fairly homogeneous, which is not the case here, as can be seen in Figure 3.1. This explains the still too high nugget effect observed in the variograms of the OSSSIM realizations through SK; see Figures 3.21 and 3.22. Note that SISIM simulates a patch of high values in the eastern part of the study area. As seen later in Figures 3.23 and 3.24, even the p-field simulations result in a patch of relatively high values in that region. For closer examination, 10 realizations of non-parametric sequential simulations are presented in Appendix A. The variograms of the median SISIM realizations are closer to the $Z$-variograms, since the median indicator variogram, which is proportional to the $Z$-variogram, was used for estimation of the probability at all thresholds. Though the variograms of the OSSSIM and median SISIM simulations are comparable, the grey-scale maps of their realizations appear different. Recall that the only difference between median IK and the modified OK or SK is the way the order relations are corrected. The closeness of the grey-scale map to the map of the exhaustive data suggests that, for this application, the a priori translation correction applied to the weights in modified OK or SK is superior to the upward and downward correction applied in the indicator algorithms; see Deutsch & Journel (1992, p. 77).
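The variogram reproduction plots rest on experimental semivariograms computed from each realization. A minimal Python sketch along the grid axes is given below; it assumes the grid axes stand in for the principal directions, whereas N9°W and N81°E are 9° off the grid axes, so a faithful reproduction of the variogram figures would need a directional, tolerance-based calculation such as the one in GSLIB.

```python
import numpy as np

def grid_semivariogram(field, max_lag, axis):
    """gamma(h) = 1/(2 N(h)) * sum of squared differences at lag h along one grid axis."""
    f = np.asarray(field, dtype=float)
    n = f.shape[axis]
    gammas = []
    for h in range(1, max_lag + 1):
        head = np.take(f, np.arange(h, n), axis=axis)
        tail = np.take(f, np.arange(0, n - h), axis=axis)
        gammas.append(0.5 * np.mean((head - tail) ** 2))
    return np.array(gammas)

def average_over_realizations(realizations, max_lag, axis):
    """Mean experimental semivariogram over several simulated realizations."""
    return np.mean([grid_semivariogram(r, max_lag, axis) for r in realizations], axis=0)

field = np.tile(np.arange(5.0), (4, 1))             # every row is 0, 1, 2, 3, 4
gamma_rows = grid_semivariogram(field, 2, axis=0)   # between rows: identical values
gamma_cols = grid_semivariogram(field, 2, axis=1)   # along rows: values differ by h
```

A spurious nugget effect shows up as a large jump of $\gamma$ at the first lag relative to the model, which is what distinguishes the OSSSIM curves from the SGSIM ones.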

3.5.2 P-field simulations (OSPFSIM)

The OK- and SK-derived ccdf's are now used for p-field simulations; see Srivastava (1992). The p-fields are generated by non-conditional SGSIM using the rescaled $Z$-variogram, followed by transformation of the Gaussian field into uniform values. These p values are then used to draw a simulated value from each local conditional distribution, obtained by modified OK, SK and the indicator techniques. If coupled with a fast generation of the non-conditional p-field realizations, this algorithm would be very fast. The realizations are shown in Figures 3.23 and 3.24. They appear smoother than their sequential counterparts. All the realizations are comparable, since the same p-field is used; they all appear close to each other, with minute differences in their statistics. The variograms reproduced by 10 realizations are shown in Figures 3.25 and 3.26. Note that all the variograms are similar to the variograms of the 10 p-fields.

[Figure 3.19 panels: grey-scale maps and histograms of simulated values, including median SISIM (OK) and SGSIM realizations.]

Figure 3.19: Sequential simulations with ordinary kriging.

[Figure 3.20 panels: grey-scale maps and histograms of simulated values, including median SISIM and SGSIM (SK) realizations.]

Figure 3.20: Sequential simulations with simple kriging.

[Figure 3.21 panels: semivariogram reproduction along N9°W and N81°E for OSSSIM (OK), median SISIM (OK), SISIM (OK) and SGSIM (OK) realizations.]

Figure 3.21: Variogram reproduction in sequential simulations using OK.

[Figure 3.22 panels: semivariogram reproduction along N9°W and N81°E for OSSSIM (SK), median SISIM (SK), SISIM (SK) and SGSIM (SK) realizations.]

Figure 3.22: Variogram reproduction in sequential simulations using SK.

[Figure 3.23 panels: grey-scale maps and histograms of the reference data and of p-field realizations drawn from the OK ccdf's (mean 276.30) and from the median IK ccdf's (OK) (mean 279.29, std. dev. 498.71).]

Figure 3.23: P-field simulations with ordinary kriging.

[Figure 3.24 panels: grey-scale maps and histograms of the Walker Lake reference data and of p-field realizations, including one drawn from the median IK ccdf's (SK).]

Figure 3.24: P-field simulations with simple kriging.

[Figure 3.25 panels: semivariograms of the p-fields along N9°W and N81°E, and semivariogram reproduction (vs. distance) along N9°W and N81°E for the p-field realizations drawn from the OK, median IK (OK) and IK (OK) ccdf's.]

Figure 3.25: Variogram reproduction in p-field simulations using OK.

[Figure 3.26 panels: semivariogram reproduction along N9°W and N81°E.]

Figure 3.26: Variogram reproduction in p-field simulations using SK.

Chapter 4

Conclusrons

In a situation where simulation is increasingly used to model uncertainty, any improvement in the methodology for obtaining conditional distributions is a welcome step. Conditional distributions derived directly from ordinary kriging provide a faster route to simulation. Combined with FFT-based p-field generation, such distributions would yield extremely fast p-field simulations, enabling practitioners with only modest PCs to use simulation routinely in day-to-day problem solving.

The probability intervals provided by the OK-derived distributions were tested for their accuracy in containing the true value and were found to be satisfactory. Recall that these distributions are the same as those provided by median IK, except for a different correction of order relation problems; hence they would be optimal under a mosaic model.

The correction of negative weights is a crucial requirement for interpreting OK as an E-type estimate. The proposed translation-type correction is simple and requires minimal modification of existing ordinary kriging programs. It can be compared to increasing the relative nugget component of the variogram to avoid negative weights and the resulting potential negative estimates. However, the proposed correction varies from location to location and is applied only at locations with at least one negative weight.

Sequential simulations with OK- or SK-derived distributions tend to exhibit too high a nugget effect. The reason is that the OK-derived distributions can present sharply varying probabilities from class to class, whereas their parametric counterparts, e.g., Gaussian distributions, exhibit a higher degree of continuity (smoothly varying class probabilities). This problem can be solved by a p-field approach: the correlated p-field enforces continuity in the simulations.
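The ccdf construction, probability-interval check and p-field drawing summarized above can be sketched in a few lines of Python. Each datum value z(u_α) receives its corrected weight λ'_α as a probability, F*(u; z) = Σ_α λ'_α i(u_α; z) with i(u_α; z) = 1 if z(u_α) ≤ z and 0 otherwise, so the mean of this ccdf is the OK estimate built with the corrected weights. This is a minimal illustration under stated assumptions, not the code used in this study: the exact form of the translation correction (shift every weight up by the magnitude of the most negative weight, then rescale to unit sum) is assumed, all function names are hypothetical, the kriging weights are taken as given (solving the OK system is not shown), and the moving-average p-field is a crude one-dimensional stand-in for an FFT-based generator.

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate


def correct_negative_weights(weights):
    """ASSUMED translation-type correction: if any weight is negative, shift
    all weights up by |most negative weight| and rescale them to sum to one.
    Locations without negative weights are returned unchanged."""
    w_min = min(weights)
    if w_min >= 0.0:
        return list(weights)
    shifted = [w - w_min for w in weights]  # most negative weight becomes 0
    total = sum(shifted)
    return [w / total for w in shifted]


def ok_ccdf(values, weights):
    """Discrete ccdf: each datum value carries its corrected OK weight as a
    probability mass. Returns the sorted values and cumulative probabilities."""
    pairs = sorted(zip(values, correct_negative_weights(weights)))
    sorted_values = [z for z, _ in pairs]
    cum_probs = list(accumulate(w for _, w in pairs))
    return sorted_values, cum_probs


def etype(values, weights):
    """Mean of the discrete ccdf, i.e. the OK estimate with corrected weights."""
    return sum(z * w for z, w in zip(values, correct_negative_weights(weights)))


def ccdf_quantile(sorted_values, cum_probs, p):
    """Smallest datum value whose cumulative probability reaches p."""
    k = bisect_left(cum_probs, p)
    return sorted_values[min(k, len(sorted_values) - 1)]  # guard round-off at p ~ 1


def interval_coverage(cases, p):
    """Fraction of true values falling inside the symmetric p-probability
    interval of their ccdf. cases: list of (values, weights, true_value)."""
    hits = 0
    for values, weights, truth in cases:
        zs, cp = ok_ccdf(values, weights)
        low = ccdf_quantile(zs, cp, (1.0 - p) / 2.0)
        high = ccdf_quantile(zs, cp, (1.0 + p) / 2.0)
        hits += low <= truth <= high
    return hits / len(cases)


def moving_average_pfield(n, window, seed=0):
    """Crude 1-D p-field (hypothetical stand-in for an FFT generator): moving
    average of iid N(0,1) noise, rescaled to unit variance and mapped to (0,1)
    with the standard normal cdf."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    scale = 1.0 / math.sqrt(window)
    pfield = []
    for i in range(n):
        y = scale * sum(noise[i:i + window])
        pfield.append(0.5 * (1.0 + math.erf(y / math.sqrt(2.0))))
    return pfield


def pfield_simulation(local_ccdfs, pfield):
    """Each node's simulated value is the p(u)-quantile of its own ccdf,
    conditioned on the original data only (no sequential updating)."""
    return [ccdf_quantile(zs, cp, p) for (zs, cp), p in zip(local_ccdfs, pfield)]


# Usage with hypothetical weights for one location (one weight is negative).
values = [1.0, 4.0, 9.0, 2.0]
weights = [0.5, 0.4, 0.2, -0.1]
zs, cp = ok_ccdf(values, weights)
print(zs, [round(c, 3) for c in cp])      # [1.0, 2.0, 4.0, 9.0] [0.429, 0.429, 0.786, 1.0]
print(round(etype(values, weights), 4))   # 3.7857
```

For a sequential simulation, p would instead be an independent uniform draw at each node, and each simulated value would be added to the data before kriging the next node; that step requires a kriging solver and is not shown here.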
A drawback of the p-field approach is that it may generate artefact clusters of similar simulated values. However, when speed is an issue, the p-field approach is unequalled.

Finally, a point worth repeating: this method presents a non-parametric route to conditional simulation that is easier and faster than the traditional parametric sequential Gaussian simulation. It requires solving only one kriging system per location to obtain a ccdf. Unlike in Gaussian and indicator simulations, the data need not be transformed, and the simulation algorithm proceeds in the original data space.

Bibliography

Arik, A. (1992), Outlier restricted kriging: A new algorithm for handling of outlier high grade data in ore reserve estimation, in 'Proceedings of the 23rd International APCOM Symposium', Society of Engineers, Tucson, AZ, pp. 181-187.

David, M., Marcotte, D. & Soulié, M. (1984), Conditional bias in kriging and a suggested correction, in G. Verly et al., eds, 'Geostatistics for Natural Resources Characterization', Vol. 1, Reidel, Dordrecht, Holland, pp. 217-230.

Deutsch, C. (1996a), 'Correcting for negative weights in ordinary kriging', accepted for publication in Computers & Geosciences.

Deutsch, C. (1996b), Direct assessment of local accuracy and precision, in 'Report 9, Stanford Center for Reservoir Forecasting', Stanford, CA.

Deutsch, C. & Journel, A. (1992), GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York.

Isaaks, E. & Srivastava, R. (1989), An Introduction to Applied Geostatistics, Oxford University Press, New York.

Matheron, G. (1971), La théorie des variables régionalisées et ses applications, Fasc. 5, Ecole Nat. Sup. des Mines, Paris.

Srivastava, R. (1992), Reservoir characterization with probability field simulation, in 'SPE Annual Conference and Exhibition, Washington, DC', Society of Petroleum Engineers, Washington, DC, pp. 927-938. SPE paper number 24753.
