Distributions Derived from Direct Kriging

SIMULATIONS USING CONDITIONAL DISTRIBUTIONS DERIVED FROM DIRECT KRIGING

A thesis submitted to the Department of Geological and Environmental Sciences and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Master of Science

By Srinivas E. Rao
November 1996

I certify that I have read this report and that in my opinion it is fully adequate, in scope and in quality, as partial fulfillment of the degree of Master of Science in Geological and Environmental Sciences.

André G. Journel (Principal advisor)

Abstract

Building on an interpretation of ordinary kriging as an E-type (conditional expectation) estimate, local conditional distributions for the unknown values are built from the kriging weights, corrected to be all non-negative. These conditional distributions (ccdf's) compare favourably, in terms of the accuracy of probability intervals, with distribution models built from the much more demanding multiple indicator kriging. The variances of these OK-derived ccdf's depend on the data values (heteroscedastic conditional variances), as opposed to the traditional ordinary kriging (OK) variance, and as such are better measures of local estimation accuracy.

Sequential drawing from these ccdf's, with each simulated value subsequently considered as a datum, provides direct conditional simulations of the variable, shortcutting the normal score transform and back transform steps of sequential Gaussian simulation. A p-field simulation drawing from these distributions would be a very fast algorithm, allowing quick simulations in day-to-day analyses.

Acknowledgements

The reason for my application to Stanford's geostatistics program was to understand the theory and applications in breadth and depth. Now, at the time of graduation, I am content that these objectives have been fulfilled. I take this opportunity to thank Dr. Hari Shankar Pandalai, who introduced me to the field of geostatistics.
I am greatly indebted to my advisor, André G. Journel, who guided me along the road at Stanford. Discussions with him opened my eyes to new vistas and motivated me to learn ever more. To quote my colleague Ian Glacken, this thesis is a testament to his tireless editing, constant questioning, precision of thought, and peerless scientific standards. I gratefully acknowledge the help he rendered, in academics and otherwise.

I express my deep sense of gratitude to Clayton Deutsch for providing his able guidance whenever required. I am thankful for his invaluable help and insights in programming and problem solving throughout my stay.

I would like to thank my associates at Stanford, Emmanuel Gringarten, Ian Glacken, Phaedon Kyriakidis, Ting Ting Yao and Tom Tran, for their companionship, stimulating discussions, assistance and helpful comments.

Contents

Table of Contents
List of Figures
1 Introduction
  1.1 Motivation
2 OK and SK derived ccdf's
  2.1 The E-type parallel
  2.2 Sufficient conditions to get licit ccdf's and class means
  2.3 The translation alternative and its interpretation
    2.3.1 Interpretation
  2.4 Connections with median IK and RK
    2.4.1 Non-convexity
    2.4.2 Median IK
    2.4.3 Restricted kriging
  2.5 Problem of resolution: the SK solution
    2.5.1 Interpretation
    2.5.2 E-type expression
    2.5.3 Improving the marginal statistics
  2.6 Simulation from the ccdf's
    2.6.1 Extending the discrete cdf to continuous cdf
    2.6.2 The sequential simulation approach - OSSSIM
    2.6.3 The p-field simulation approach - OSPFSIM
3 The Walker Lake validation study
  3.1 The reference data set
  3.2 Comparing $Z^{*}_{OK}$ to $Z^{**}_{OK}$ and $Z^{*}_{SK}$ to $Z^{**}_{SK}$
    3.2.1 Ordinary kriging
    3.2.2 Simple kriging
  3.3 Traditional estimates
  3.4 Checking the local class probabilities
  3.5 Conditional simulations
    3.5.1 Sequential simulations
    3.5.2 P-field simulations
4 Conclusions
Bibliography

List of Figures

3.1 Exhaustive reference data set and statistics.
3.2 Sample data locations and statistics.
3.3 Variograms as calculated from the exhaustive reference data set. Model fit is the continuous line.
3.4 Ordinary kriging $z^{*}_{OK}$ estimates and statistics.
3.5 Corrected ordinary kriging $z^{**}_{OK}$ estimates and statistics.
3.6 a) Histogram of the most negative weight in each OK; b) histogram of correction: $z^{**}_{OK} - z^{*}_{OK}$.
3.7 Scattergram of OK estimate vs. true value. a) $z^{*}_{OK}$ vs. true z; b) $z^{**}_{OK}$ vs. true z.
3.8 Data configuration and kriging results at node (x = 64, y = 163).
3.9 Simple kriging (SK) estimates and statistics.
3.10 Corrected simple kriging estimates and statistics.
3.11 a) Histogram of the most negative weight in each SK; b) histogram of correction: $z^{**}_{SK} - z^{*}_{SK}$.
3.12 Scattergram of SK estimate vs. true value. a) $z^{*}_{SK}$ vs. true z; b) $z^{**}_{SK}$ vs. true z.
3.13 Histogram of corrected SK weight attributed to the global mean.
3.14 Scattergrams of traditional estimates vs. true values.
3.15 Accuracy scores of probability intervals as derived from a) ordinary kriging, b) simple kriging, c) inverse variogram distance weighting, d) inverse squared Euclidean distance weighting, e) multiple indicator kriging, f) median indicator kriging.
3.16 Scattergram of median indicator kriging E-type estimate vs. a) true values, b) corrected ordinary kriging estimate.
3.17 Scattergram of probabilities, classes 1-5: median IK vs. modified OK.
3.18 Scattergram of probabilities, classes 6-10: median IK vs. modified OK.
3.19 Sequential simulations with ordinary kriging.
3.20 Sequential simulations with simple kriging.
3.21 Variogram reproduction in sequential simulations using OK.
3.22 Variogram reproduction in sequential simulations using SK.
3.23 P-field simulations with ordinary kriging.
3.24 P-field simulations with simple kriging.
3.25 Variogram reproduction in p-field simulations using OK.
3.26 Variogram reproduction in p-field simulations using SK.

Chapter 1

Introduction

Since their inception by Matheron (1971), the simple kriging (SK) and ordinary kriging (OK) algorithms have formed the basis of geostatistics, on which other kriging and kriging-related simulation techniques are built. These methods provide an unbiased estimate of the attribute under consideration along with the variance of the estimation error. These techniques are simple and have gained popularity among many earth scientists. But kriging has its pitfalls.

In a probabilistic framework, kriging is used to model the conditional expectation of a random variable (the unknown attribute) given a specific realization of the random variables in the neighborhood (the data). The estimates, which are linear combinations of the data in the neighborhood, are smoothed, so these techniques are unsuitable for estimating proportions of extreme values. Also, the kriging variance is not conditioned on the data values; it depends only on the data configuration, that is, it is homoscedastic. Thus the kriging variance does not provide a full measure of estimation uncertainty.

Kriging produces negative estimates in certain circumstances, even when the underlying variable is non-negative. This is a consequence of negative kriging weights, which make kriging a non-convex estimator. Many studies have shown that negative kriging weights occur at the periphery of the kriging neighborhood. Deutsch (1996a) suggests a correction for such negative weights so that the resulting estimate stays within the range of the data used.

Last, kriging estimators do not reproduce the spatial correlation observed in the data. Conditional simulations produce alternative realizations with the observed spatial correlation, including the spatial variance. Thus the simulated values are not smoothed.
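The remarks above on kriging weights and the kriging variance can be illustrated with a small numerical sketch. It solves one ordinary kriging system with NumPy and applies the same weights to two different sets of data values. The exponential covariance model, its parameters, the data configuration and the function names `exp_cov` and `ok_weights` are all assumptions made for illustration; none of them is taken from the thesis.

```python
import numpy as np

def exp_cov(h, sill=1.0, a=10.0):
    """Hypothetical exponential covariance model, C(h) = sill * exp(-3h/a)."""
    return sill * np.exp(-3.0 * h / a)

def ok_weights(coords, target, cov=exp_cov):
    """Solve the ordinary kriging system for one target location.

    Returns the n kriging weights and the OK variance. Only the geometry
    (coords, target) and the covariance model enter; data values do not.
    """
    n = len(coords)
    # Data-to-data and data-to-target distances
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)
    # Augmented system: covariances plus the unbiasedness constraint
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = cov(d0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]          # weights and Lagrange parameter
    var = cov(0.0) - w @ b[:n] - mu  # OK variance
    return w, var

# Illustrative configuration: three data around a target at the origin
coords = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
target = np.array([0.0, 0.0])
w, ok_var = ok_weights(coords, target)

# Same configuration, two different sets of data values:
# the estimate changes, the kriging variance does not.
for z in (np.array([1.0, 2.0, 3.0]), np.array([10.0, 50.0, 5.0])):
    print("estimate:", w @ z, "kriging variance:", ok_var)
```

Because `ok_weights` never sees the data values, the variance it returns is necessarily the same for both data sets, which is the homoscedasticity noted in the text. Nothing in the system forces the individual weights to be non-negative; only their sum is constrained to one.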
Simulation algorithms have become commonplace in geostatistical applications, and different models are available for different scenarios. The usual prerequisite for simulation is a local conditional cumulative distribution function (ccdf) from which a simulated value is drawn. Under a multiGaussian hypothesis, the OK- or SK-derived kriging estimate and kriging variance are sufficient to characterize the local Gaussian ccdf. However, the multiGaussian assumption rarely holds true, even when the data set is univariate Gaussian. An alternative is to use non-parametric indicator kriging (IK) or indicator co-kriging. Because of the modeling and computational burden involved in indicator co-kriging, IK has become the tool of choice for building the local ccdf's.

Indicator kriging requires performing a kriging at each of a chosen number of thresholds. This calls for modeling a variogram and solving a linear system of equations at each threshold. If all the indicator variograms are proportional to each other and to the variogram of the attribute, the kriging weights obtained at all these thresholds are the same. In such a case, corresponding to a mosaic model, a single kriging system solved at the median threshold suffices: this is median indicator kriging.

Multiple indicator kriging, as opposed to median IK, exploits the differences between the indicator variograms at different thresholds. But when the indicator variograms differ greatly, order relation problems occur in the resulting ccdf. Modeling all indicator variograms with the same structures and smoothly varying parameters, thus bringing them closer to the mosaic model, would alleviate these order relation problems.

1.1 Motivation

The present work is motivated by the above discussion. The objective is to derive non-parametric IK-type ccdf's through a single OK or SK system for use in simulations. This requires interpretation of the OK or SK estimate as a conditional expectation (E-type estimate).
To allow such interpretation, negative weights, if any, must be corrected. A method for correcting the negative weights is proposed.
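The idea of reading one set of kriging weights as a local ccdf can be sketched as follows. Once the weights are non-negative and sum to one, they can be treated as probabilities attached to the neighboring data values, in the same way as a single set of median IK weights. The correction used here (clip negative weights to zero and renormalize) is only a placeholder assumption, not the correction proposed in the thesis or the one of Deutsch; the function names and numbers are likewise hypothetical.

```python
import numpy as np

def correct_weights(w):
    """Placeholder correction (an assumption): clip negatives, renormalize."""
    w = np.clip(np.asarray(w, dtype=float), 0.0, None)
    return w / w.sum()

def ccdf_from_weights(z, w):
    """Discrete ccdf: sorted data values and their cumulative probabilities."""
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(z)
    return z[order], np.cumsum(w[order])

def etype_mean_var(z, w):
    """Mean and variance of the discrete ccdf; both depend on the data values."""
    z = np.asarray(z, dtype=float)
    m = float(w @ z)
    return m, float(w @ (z - m) ** 2)

def draw(z_sorted, cum_p, rng):
    """Draw one value by inverting the discrete ccdf (no interpolation)."""
    p = rng.random()
    idx = np.searchsorted(cum_p, p, side="left")  # first index with F >= p
    return z_sorted[min(idx, len(z_sorted) - 1)]  # guard against round-off

# Illustrative kriging weights with one negative value, and the data they apply to
raw_w = [0.6, 0.5, -0.1, 0.0]
z = np.array([2.0, 5.0, 9.0, 4.0])

w = correct_weights(raw_w)
z_sorted, cum_p = ccdf_from_weights(z, w)
mean, var = etype_mean_var(z, w)

rng = np.random.default_rng(0)
simulated = [draw(z_sorted, cum_p, rng) for _ in range(5)]
print("E-type mean:", mean, "conditional variance:", var)
print("simulated values:", simulated)
```

With non-negative weights the mean is a convex combination of the data and so stays within their range, and the variance changes whenever the data values change, unlike the kriging variance. This sketch draws only from the data values themselves; it does not extend the discrete distribution to a continuous one.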
