Figure 7.15 Simple kriging results of normal score transforms of the Walker Lake data set

Figure 7.16 Probability of exceeding 5 and 10 based on normal score simple kriging of the Walker Lake data set.

7.5 Geostatistical simulation

The third field of application of geostatistics is simulating realizations of the conditional random function. Returning to Figure 5.8: in case of a wide sense stationary and multiGaussian RSF Z(x), simple kriging provides the dashed line, which is the mean of all possible conditional realizations. The aim of geostatistical simulation is to generate the individual conditional realizations themselves. There are two important reasons why sometimes individual realizations of the conditional RSF are preferred over the interpolated map that is provided by kriging: 1. kriging provides a so-called best linear prediction (it produces values that minimize the variance of the prediction error), but the resulting maps are much smoother than reality. This can again be seen from Figure 5.8. The individual realizations are very noisy and rugged, while the kriging prediction produces a smoothly varying surface. The noisy realizations have a semivariogram that resembles that of the data, so one can say that the real


variation of the property considered is much more like the realizations than the kriging map. This has repercussions if the kriging map is not the end point of the analysis (such as mapping concentrations). For instance, suppose that the goal is to produce a map of hydraulic conductivities that is to be used in a groundwater flow model. To use the kriged map as input in the groundwater flow model would produce flow lines that are probably also too smooth. Especially if the goal is to model groundwater transport, a smooth map of hydraulic conductivity will yield an underestimation of solute spreading. In that case it is better to use realizations of the random function as input. Of course, as each realization has equal probability to be drawn and is therefore an equally viable picture of reality, the question remains: which realization should then be used? The answer is: not a single realization should be analyzed, but a great number of realizations. This conclusion brings us to the second reason why realizations are often preferred over kriging maps: 2. multiple realizations as input for a model can be used for uncertainty analysis and ensemble prediction. Figure 5.8 shows that usually we only have limited information about reality and we therefore represent our uncertainty about reality with a random function (see also chapters 1 and 5). Returning to the example of hydraulic conductivity, if we are uncertain about the parameters of a groundwater model, we also want to know what the uncertainty is about the model output (heads, fluxes). So instead of analyzing a single input of hydraulic conductivity, a large number of conditional realizations of hydraulic conductivity (say 1000) are used as model input. If we use 1000 conditional realizations of hydraulic conductivity as input, we also have 1000 model runs with the groundwater model, producing (in case of a steady state groundwater model) 1000 head fields and 1000 flow fields. From this, it is possible to estimate the probability distribution of hydraulic head at each location, or the probability that a contaminant plume reaches a certain sensitive area. This way of modeling is truly stochastic modeling, and because we do not produce one prediction but an ensemble of predictions, it is often referred to as ensemble prediction. The variance of the output realizations is a measure of our uncertainty about the output (e.g. hydraulic heads) that is caused by our uncertainty (lack of perfect knowledge) about the model parameters (e.g. hydraulic conductivity). So, through this way of stochastic modeling one performs an uncertainty analysis: estimating the uncertainty about model output that is caused by uncertainty about model input or model parameters. There are several ways of performing such an analysis, as will be shown extensively in chapter 8. The method described here, i.e. generating realizations of parameters or input variables and analyzing them with a numerical model, is called Monte Carlo simulation. In Figure 7.17 the method of Monte Carlo simulation for uncertainty analysis is shown schematically.


Figure 7.17 Schematic representation of Monte Carlo simulation applied for uncertainty analysis of hydraulic conductivity in groundwater modeling. Hydraulic conductivity is spatially varying and sampled at a limited number of locations. Hydraulic conductivity is modeled as a random function. Using the observations, the parameters that characterize this function (mean, semivariogram) are estimated. Next, M realizations of this random function are simulated and used in the groundwater model. This yields M realizations of groundwater model output (e.g. head fields). From these realizations it is possible to obtain, for a given location (e.g. x0), the probability density function of the output variables (e.g. head, concentration).

The technique of Monte Carlo simulation is further explained in the next chapter. Here, we focus only on the generation of multiple realizations of the conditional random space function, commonly referred to as (conditional) geostatistical simulation. There are quite a few methods for simulating realizations of multiGaussian random space functions. The most commonly used are LU-decomposition (Alabert, 1987), the turning band method (Mantoglou and Wilson, 1982) and sequential Gaussian simulation (Gómez-Hernández and Journel, 1993), while there are even more methods for simulating non-Gaussian random functions (e.g. Armstrong and Dowd, 1994). The most flexible and nowadays most widely used simulation algorithm is sequential simulation. Sequential Gaussian simulation (sGs) will be treated here briefly. For a more elaborate description of the method one is referred to Goovaerts (1997) and Deutsch and Journel (1998). Conditional simulation with sGs needs the mean and the semivariogram of the random space function and proceeds as follows. 1. The area is divided into a finite number of grid points N (location indices x1, x2, .., xN) at which values of the conditional realizations are to be simulated. The grid points are visited in a random order. 2. For the first grid point x1 a simple kriging is performed from the given data, yielding the prediction $\hat{z}_{SK}(x_1)$ and the prediction variance $\sigma^2_{SK}(x_1)$. Under the assumption that Z(x) is stationary and multiGaussian the conditional cumulative distribution is Gaussian:


$\Pr[Z(x_1) \le z \mid z(x_i),\ i=1,\ldots,n] = \Phi\!\left(\dfrac{z - \hat{z}_{SK}(x_1)}{\sigma_{SK}(x_1)}\right)$   (7.39)

3. A random value P between zero and one is drawn from a uniform distribution U[0, 1]. Using the inverse of the conditional distribution (7.39), the random quantile P is used to draw a random value z(x1):

$z(x_1) = \hat{z}_{SK}(x_1) + \sigma_{SK}(x_1)\,\Phi^{-1}(P)$   (7.40)

4. For the second grid point x2 a simple kriging is performed using the data

and the previously simulated value z(x1) in the kriging equations (so the previously simulated value is now treated as a data point). This yields the prediction $\hat{z}_{SK}(x_2)$ and the prediction variance $\sigma^2_{SK}(x_2)$, from which the conditional cumulative distribution is built. 5. A random value P between zero and one is drawn from a uniform distribution U[0, 1] and, using the inverse of the conditional distribution, the random quantile P is used to draw a random value z(x2).

6. For the third grid point a simple kriging is performed using the data and the previously simulated values z(x1) and z(x2) in the kriging equations, yielding $\hat{z}_{SK}(x_3)$ and $\sigma^2_{SK}(x_3)$. 7. Using a random value P drawn from a uniform distribution U[0, 1], the value z(x3) is drawn and added to the data set. 8. Steps 6 and 7 are repeated, adding more and more simulated values to the conditioning data set, until all values on the grid have been simulated; the last simple kriging exercise thus yields the conditional probability:

$\Pr[Z(x_N) \le z \mid z(x_i),\ i=1,\ldots,n;\ z(x_1),\ldots,z(x_{N-1})]$.

It can be shown heuristically that by construction this procedure produces a draw (realization) from the multivariate conditional distribution $\Pr[Z(x_1)\le z_1,\ldots,Z(x_N)\le z_N \mid z(x_i),\ i=1,\ldots,n]$

(Gómez-Hernández and Journel, 1993; Goovaerts, 1997), i.e. a realization from the conditional random function. To simulate another realization the above procedure is repeated using a different random path over the grid nodes and drawing different random numbers for the quantiles P ~ U[0, 1]. Unconditional realizations of the random function can also be simulated by starting at the first grid point with a draw from the Gaussian distribution and conditioning at every step on previously simulated points only. Obviously, the number of conditioning points, and thus the size of the kriging system to be solved, increases as the simulation proceeds. This would lead to unacceptably large computer storage requirements and computation times. To avoid this, a search area is used, usually with a radius equal to the semivariogram range, while only a limited number of observations and previously simulated points within the search radius are used in the kriging system (Deutsch and Journel, 1998).
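To make steps 1-8 concrete, the sketch below implements a minimal sequential Gaussian simulation on a small one-dimensional grid. It is not the sgsim program referred to later: the exponential covariance model, the observation values and locations, and the use of all previously simulated nodes (instead of a limited search neighbourhood) are simplifying assumptions made for illustration only.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_len=20.0):
    # exponential covariance model C(h) = sill * exp(-|h| / corr_len)
    return sill * np.exp(-np.abs(h) / corr_len)

def sgs_1d(x_grid, x_obs, z_obs, sill=1.0, corr_len=20.0, seed=None):
    """Sequential Gaussian simulation on a 1-D grid, conditioning on normal-score data.
    Simple kriging with zero mean; all data plus previously simulated nodes are used
    (no search neighbourhood), so this is only suitable for small grids."""
    gen = np.random.default_rng(seed)
    x_known = list(x_obs)          # conditioning locations
    z_known = list(z_obs)          # conditioning values (already normal scores)
    sim = np.full(len(x_grid), np.nan)
    for i in gen.permutation(len(x_grid)):            # random path over the grid
        xk = np.array(x_known)
        zk = np.array(z_known)
        C = exp_cov(xk[:, None] - xk[None, :], sill, corr_len)
        C += 1e-8 * np.eye(len(xk))                   # small nugget for stability
        c0 = exp_cov(xk - x_grid[i], sill, corr_len)  # data-target covariances
        lam = np.linalg.solve(C, c0)                  # simple kriging weights
        mean_sk = lam @ zk                            # SK prediction (mean 0)
        var_sk = max(sill - lam @ c0, 0.0)            # SK variance
        sim[i] = gen.normal(mean_sk, np.sqrt(var_sk)) # draw from conditional Gaussian
        x_known.append(x_grid[i])                     # simulated node becomes data
        z_known.append(sim[i])
    return sim

x_grid = np.linspace(0.0, 100.0, 101)
x_obs = np.array([10.3, 54.7, 80.2])      # assumed observation locations
z_obs = np.array([0.5, -1.2, 0.8])        # assumed normal-score observations
realization = sgs_1d(x_grid, x_obs, z_obs, seed=1)
```

Calling the function again with a different seed (a different random path and different quantiles) yields another, equally probable, conditional realization.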

Obviously, the assumption underlying the simulation algorithm is that the RSF is stationary and multiGaussian. For a RSF to be multiGaussian it should at least have a univariate Gaussian distribution. So, if this method is applied, for instance, to the Walker Lake data set, a normal score


transformation is required. The simulation procedure for a realization of the conditional random function would then involve the following steps: 1. Perform a normal score transform of the observations (see Figure 7.13). 2. Estimate the semivariogram of the normal-score transformed data and fit a permissible semivariogram model. By definition, the mean value of the transformed RSF is zero. 3. Assuming the transformed RSF to be stationary and multiGaussian, simulate a realization of the conditional random function using sequential Gaussian simulation. 4. Back-transform the simulated values (i.e. reversing the arrows in Figure 7.13) to obtain a realization of the conditional random function of the original variable.
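A minimal sketch of steps 1 and 4 (the normal score transform and its back-transform) is given below. The plotting-position formula and the skewed example data are assumptions made for illustration; they are not part of the original text.

```python
import numpy as np
from scipy.stats import norm

def normal_score_transform(z):
    """Map data to standard-normal scores via their empirical quantiles."""
    n = len(z)
    order = np.argsort(z)
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    p = (ranks - 0.5) / n                 # plotting-position probabilities
    return norm.ppf(p)

def back_transform(y, z_ref):
    """Back-transform normal scores by interpolating the empirical quantile function."""
    z_sorted = np.sort(z_ref)
    p_ref = (np.arange(1, len(z_ref) + 1) - 0.5) / len(z_ref)
    return np.interp(norm.cdf(y), p_ref, z_sorted)

z = np.random.default_rng(0).lognormal(mean=1.0, sigma=0.8, size=200)  # skewed example data
y = normal_score_transform(z)    # ~ N(0,1) scores used for variography and sGs
z_back = back_transform(y, z)    # reverses the transform (up to ties)
```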

In the geostatistical toolbox of Deutsch and Journel (1998) the simulation program sgsim performs the normal score transform, the sequential simulation and the back-transform all together. The parameters of the semivariogram of the normal-score transforms have to be provided separately. Figure 7.18 shows two realizations of the conditional random function based on the Walker Lake data.

Figure 7.18 Two simulated realizations of a conditional random function based on the Walker Lake data set

7.6 More geostatistics

In this chapter the basic geostatistical methods have been presented. Naturally, the area of geostatistics is much more extensive. More advanced geostatistical methods are presented in various textbooks, such as that of Cressie (1993), Rivoirard (1994), Goovaerts (1997), Chilès and Delfiner (1999), and Christakos (2000). More advanced geostatistical methods are concerned with: • kriging in case of non-stationary random functions; • kriging using auxiliary information; • estimating conditional probabilities of non-Gaussian random functions;


• simulating realizations of non-Gaussian random functions (e.g. positively skewed variables such as rainfall; categorical data such as texture classes); • geostatistical methods applied to space-time random functions; • geostatistics applied to random functions defined on other metric spaces such as a sphere or river networks; • Bayesian geostatistics, i.e. using various forms of a priori information about the random function and formally updating this prior information with observations. One is referred to the above references for elaborate descriptions of these methods.

7.7 Exercises

Consider a square area of size 10×10 units. Data points are located at locations (2,3), (3,8) and (7,9) with values of a property z of 3, 8 and 5 respectively. The property is modeled with a stationary and isotropic multivariate Gaussian random space function Z(x) with known mean and an exponential semivariogram. 1. Predict the value of Z(x) at x0 = (5,5) using simple kriging. 2. Predict the value of Z(x) at x0 = (5,5) using ordinary kriging. 3. Calculate the probability that Z(5,5) > 10. 4. Predict the average value of the 10×10 area using block kriging. For calculating the necessary point-block semivariances and the average block semivariance, discretize the block with four points at locations (2,2), (2,8), (8,2), (8,8).


8 Forward stochastic modelling

8.1 Introduction

In previous chapters methods were introduced for stochastic modelling of single variables, time series and spatial fields. A hydrological property that is represented by a random variable or a random function can be the target itself, e.g. flood events (chapter 4), groundwater head series (chapter 6) and areas of high concentration in groundwater (chapter 7). Often, however, we have imperfect knowledge about some hydrological property that is used as parameter, input series, boundary condition or initial condition in a hydrological model. In that case, interest is focused on the probability distribution or some uncertainty measure of the model output, given the uncertainty about the model input. This chapter is focused on deriving these probability distributions or uncertainty measures.

More formally, consider a random variable Z that is used as input 4 for some hydrological model g to produce an output variable Y, which is also stochastic:

Y = g(Z) (8.1)

The problem to solve is then: given that we know the probability distribution $f_Z(z)$ of Z or some of its moments (e.g. mean and variance), what is the probability distribution $f_Y(y)$ of Y or what are its moments? This problem is called forward stochastic modelling, as opposed to backward or inverse (stochastic) modelling. In the latter case we have observations of Y and the unknown value of some deterministic parameter z is estimated from these observations or, if

Z is stochastic, its conditional probability distribution $f_Z(z \mid y)$ is estimated.

Obviously, the problem of forward stochastic modelling can be put in more general terms, i.e. in case the input or the output are random functions of time, space or space-time, vectors of more random variables or even vector random functions. Also, the function g() can have various forms, such as an explicit scalar or vector function, a differential equation or the outcome of a numerical model. Based on the form of g() and the form of Z the following types of relations are considered in the framework of forward stochastic modelling (see Heuvelink (1998) for a good monograph about the subject): • explicit functions of one random variable; • explicit functions of multiple random variables; • explicit vector functions; • explicit functions of random functions of time, space or space-time; • differential equations with a random parameter; • stochastic differential equations. In the following sections each of these problems is treated. For each problem type, a number of solution techniques are presented, where for each solution technique the conditions are given that should be met for its application.

4 We use “input” here, but we mean in fact (see chapter 1 for system theory definitions) “input variables”, “parameters”, “boundary conditions” or “initial conditions”.


8.2 Explicit functions of one random variable

Consider the relation between two random variables as shown in Equation (8.1).

a) Derived distributions

Goal:
• the probability density function $f_Y(y)$.
Requirements:
• the probability density $f_Z(z)$ of Z is known;
• the function g(Z) is monotonic (only increasing or only decreasing), differentiable and can be inverted.

The cumulative distribution function of Y can be obtained from the distribution function of Z as follows:

$F_Y(y) = F_Z\!\left(g^{-1}(y)\right)$   (8.2)

while the probability density function (pdf) of Y is related to the pdf of Z as (Papoulis, 1991):

$f_Y(y) = \left|\dfrac{d[g^{-1}(y)]}{dy}\right| f_Z\!\left(g^{-1}(y)\right)$   (8.3)

where $g^{-1}(y)$ is the inverse function and the term $|\cdot|$ is the absolute value of its derivative. The term $|\cdot|$ ensures that the area under $f_Y(y)$ is equal to 1.

Example Take the relation between the water height h above a weir crest and the discharge q that is used to measure discharge with a weir (this could also be a rating curve for some river):

$q = ah^b$   (8.4)

Now suppose that the water height is observed with some error, making it stochastic with pdf $f_H(h)$. The inverse of this relation and its derivative are given as:

$g^{-1}(q) = \left(\dfrac{q}{a}\right)^{1/b}$   (8.5)

$\left(g^{-1}\right)'(q) = \dfrac{1}{ab}\left(\dfrac{q}{a}\right)^{\frac{1-b}{b}}$   (8.6)

The probability density function of the discharge, $f_Q(q)$, is then given by:

$f_Q(q) = \dfrac{1}{ab}\left(\dfrac{q}{a}\right)^{\frac{1-b}{b}} f_H\!\left(\left(\dfrac{q}{a}\right)^{1/b}\right)$   (8.7)

b) Derived moments

Goal:
• the moments of Y, e.g. $\mu_Y$ and $\sigma_Y^2$.
Requirements:
• the probability density $f_Z(z)$ of Z is known.

The first two moments are then obtained through (see also 3.24):

$\mu_Y = \int_{-\infty}^{\infty} g(z)\, f_Z(z)\, dz$   (8.8)

$\sigma_Y^2 = \int_{-\infty}^{\infty} g^2(z)\, f_Z(z)\, dz - \mu_Y^2$   (8.9)

Example Consider the same rating curve (Equation 8.4) with H following a uniform distribution between a lower value $h_l$ and an upper value $h_u$:

$f_H(h) = \dfrac{1}{h_u - h_l}, \qquad h_l \le h \le h_u$   (8.10)

The mean then becomes

$\mu_Q = \int_{-\infty}^{\infty} a h^b f_H(h)\, dh = \dfrac{1}{h_u - h_l}\int_{h_l}^{h_u} a h^b\, dh = \dfrac{a\left[h_u^{\,b+1} - h_l^{\,b+1}\right]}{(b+1)(h_u - h_l)}$   (8.11)

and the variance is given by:

$\sigma_Q^2 = \dfrac{1}{h_u - h_l}\int_{h_l}^{h_u} a^2 h^{2b}\, dh - \mu_Q^2 = \dfrac{a^2\left[h_u^{\,2b+1} - h_l^{\,2b+1}\right]}{(2b+1)(h_u - h_l)} - \mu_Q^2$   (8.12)

In case (8.8) and (8.9) cannot be evaluated analytically, the integrals can of course be solved numerically, using for instance Euler-type integration methods.

c) Monte Carlo simulation

Goal:
• the probability density function $f_Y(y)$ or its moments.
Requirements:
• the probability density $f_Z(z)$ of Z is known.

The principle of Monte Carlo simulation has been explained before in chapter 7, but is repeated here. Monte Carlo simulation is the advised method if the probability distribution of the input is known, if the complete distribution of the model output is required, and if the derived density approach (a) cannot be applied or if (8.2) cannot be evaluated analytically. Monte Carlo simulation proceeds as follows:

1. Draw a realization $z_i$ of Z using the pdf $f_Z(z)$. This is achieved by calculating the distribution function from the pdf,

$F_Z(z) = \Pr[Z \le z] = \int_{-\infty}^{z} f_Z(z')\, dz'$   (8.13)

drawing a uniform deviate $u_i$ between 0 and 1 using a pseudo random number generator (e.g. Press et al., 1986), and converting $u_i$ using the inverse $z_i = F_Z^{-1}(u_i)$ (see Figure 8.1).

Figure 8.1 Drawing a random number from a given distribution function

2. Calculate the realization $y_i$ of Y by inserting $z_i$: $y_i = g(z_i)$. 3. Repeat steps 1 and 2 a large number of times (typically of the order of 1000 to 10000 draws are necessary). 4. From the M simulated output realizations $y_i,\ i=1,\ldots,M$ the probability density function or cumulative distribution function of Y can be estimated.

Example Consider again the rating curve (8.4) with parameter values a = 5 and b = 1.5, with Q in m³/d and with H in m following a Gaussian distribution with a mean of 0.3 m and a standard deviation of 0.02 m. Figure 8.2 shows the cumulative distribution function estimated from 1000 realizations of Q calculated from 1000 simulated realizations of H. Also shown is the exact cumulative distribution function calculated using (8.2). It can be seen that both distributions are very close.
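A minimal sketch of steps 1-4 for this example is given below, using the inverse of the Gaussian distribution function for step 1; the variable names and seed are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
a, b = 5.0, 1.5                          # rating curve Q = a * H**b
mu_h, sd_h = 0.3, 0.02                   # H ~ N(0.3 m, 0.02 m), as in the example

M = 1000
u = rng.uniform(0.0, 1.0, M)             # step 1: uniform deviates (Fig. 8.1)
h = norm.ppf(u, loc=mu_h, scale=sd_h)    # step 1: invert F_H, Eq. (8.13)
q = a * h**b                             # step 2: propagate through the model

q_sorted = np.sort(q)                    # step 4: empirical cdf of Q (cf. Figure 8.2)
F_q = np.arange(1, M + 1) / M
```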



Figure 8.2 Cumulative distribution functions: exact and estimated from Monte Carlo simulation.

The Monte Carlo simulation presented here uses simple random sampling: values of U are drawn from the entire range 0-1. To limit the number of realizations needed to accurately estimate the pdf of the model output, a technique called stratified random sampling can be used. In that case, the interval 0-1 is divided into a finite number of intervals, preferably of equal width (e.g. 0-0.1, 0.1-0.2, .., 0.9-1 in case of 10 intervals). In each interval a number of values of U and the associated Z are drawn. The result of this procedure is that the drawn realizations of Z are more evenly spread over the value range, so that fewer realizations are necessary to obtain accurate estimates of $f_Y(y)$.
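A possible implementation of stratified random sampling for this example is sketched below, with 10 equal-width strata and 10 draws per stratum; these numbers, and the reuse of the rating-curve example, are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

def stratified_uniform(n_strata, per_stratum, rng):
    """Draw uniform deviates stratified over equal-width intervals of [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_strata + 1)
    u = rng.uniform(edges[:-1, None], edges[1:, None],
                    size=(n_strata, per_stratum))   # per_stratum draws per interval
    return u.ravel()

rng = np.random.default_rng(3)
u = stratified_uniform(n_strata=10, per_stratum=10, rng=rng)  # 100 stratified draws
h = norm.ppf(u, loc=0.3, scale=0.02)     # stratified realizations of H
q = 5.0 * h**1.5                         # propagated through the rating curve
```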

d) Taylor expansion approximation

Goal:
• the moments of Y, e.g. $\mu_Y$ and $\sigma_Y^2$.
Requirements:
• the moments of Z, e.g. $\mu_Z$ and $\sigma_Z^2$, are known;
• the variance $\sigma_Z^2$ should not be too large.

Consider the Taylor expansion of the function g(Z) around the value $g(\mu_Z)$:

$Y = g(Z) = g(\mu_Z) + \left[\dfrac{dg(z)}{dz}\right]_{z=\mu_Z}(Z-\mu_Z) + \dfrac{1}{2}\left[\dfrac{d^2g(z)}{dz^2}\right]_{z=\mu_Z}(Z-\mu_Z)^2 + \dfrac{1}{6}\left[\dfrac{d^3g(z)}{dz^3}\right]_{z=\mu_Z}(Z-\mu_Z)^3 + \ldots$   (8.14)


The first order Taylor approximation only considers the first two terms. The expected value is then approximated as:

$\mu_Y = E[Y] \approx g(\mu_Z) + \left[\dfrac{dg(z)}{dz}\right]_{z=\mu_Z} E[(Z-\mu_Z)] = g(\mu_Z)$   (8.15)

and the variance as:

$\sigma_Y^2 = E[(Y-\mu_Y)^2] \approx \left[\dfrac{dg(z)}{dz}\right]^2_{z=\mu_Z} E[(Z-\mu_Z)^2] = \left[\dfrac{dg(z)}{dz}\right]^2_{z=\mu_Z}\sigma_Z^2$   (8.16)

Keeping the first three terms of Equation (8.14) and taking expectations yields the second order Taylor approximation .

The mean becomes:

$\mu_Y \approx g(\mu_Z) + \dfrac{1}{2}\left[\dfrac{d^2g(z)}{dz^2}\right]_{z=\mu_Z}\sigma_Z^2$   (8.17)

The general expression for the variance is very large, but can be simplified in case Z is Gaussian (see Heuvelink, 1998). Here only the expression for Gaussian Z is shown; for the full expression one is referred to Heuvelink (1998):

$\sigma_Y^2 \approx \left[\dfrac{dg(z)}{dz}\right]^2_{z=\mu_Z}\sigma_Z^2 + \dfrac{1}{4}\left[\dfrac{d^2g(z)}{dz^2}\right]^2_{z=\mu_Z}\sigma_Z^4$   (8.18)

Example One of the requirements for the Taylor approximation to work is that the variance of Z is not too large. To test this, the first and second order Taylor approximations are applied to the rating curve $Q = aH^b$ for increasing variance of H. The derivatives that are necessary for this analysis are:

$\alpha = \left[\dfrac{dg(z)}{dz}\right]_{z=\mu_Z} = abh^{b-1}\Big|_{h=\mu_H} = ab\,\mu_H^{\,b-1}$   (8.19)

$\beta = \left[\dfrac{d^2g(z)}{dz^2}\right]_{z=\mu_Z} = ab(b-1)h^{b-2}\Big|_{h=\mu_H} = ab(b-1)\,\mu_H^{\,b-2}$   (8.20)

with the first order Taylor approximation:

$\mu_Q \approx a\mu_H^{\,b}$   (8.21)

$\sigma_Q^2 \approx \alpha^2\sigma_H^2$   (8.22)

and the second order Taylor approximation:

$\mu_Q \approx a\mu_H^{\,b} + \dfrac{\beta}{2}\sigma_H^2$   (8.23)

$\sigma_Q^2 \approx \alpha^2\sigma_H^2 + \dfrac{\beta^2}{4}\sigma_H^4$   (8.24)

To be able to analyze a large range of standard deviations, the mean $\mu_H$ is set to 0.8 m (was 0.3 m). With a = 5 and b = 1.5 we have $\alpha = 6.708$ and $\beta = 4.193$. Figure 8.3 shows a plot of $\mu_Q$ and $\sigma_Q$ as a function of $\sigma_H$ as obtained from Monte Carlo simulation (1000 realizations) and with first and second order Taylor analysis. Clearly the Taylor approximation fails in estimating the mean if the variance becomes too large, although the second order method performs much better than the first. In this example the variance is approximated accurately with both methods.
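The comparison of Figure 8.3 can be reproduced approximately with the sketch below, which evaluates Equations (8.21)-(8.23) against Monte Carlo estimates for a few values of σ_H. The clipping of (rare) negative stage values at large σ_H is a pragmatic assumption of this sketch, not part of the original analysis.

```python
import numpy as np

a, b = 5.0, 1.5
mu_h = 0.8                                   # mean stage (m), as in the example
alpha = a * b * mu_h**(b - 1)                # dg/dh at the mean (= 6.708)
beta = a * b * (b - 1) * mu_h**(b - 2)       # d2g/dh2 at the mean (= 4.193)

rng = np.random.default_rng(0)
for sd_h in [0.05, 0.10, 0.20, 0.30]:
    h = rng.normal(mu_h, sd_h, 100_000)
    h = np.clip(h, 1e-6, None)               # guard against rare negative stages
    q = a * h**b
    mu_mc, sd_mc = q.mean(), q.std()
    mu_t1 = a * mu_h**b                      # Eq. (8.21), first order mean
    mu_t2 = mu_t1 + 0.5 * beta * sd_h**2     # Eq. (8.23), second order mean
    sd_t1 = abs(alpha) * sd_h                # Eq. (8.22), first order std
    print(f"sd_H={sd_h:.2f}  MC: {mu_mc:.3f}/{sd_mc:.3f}  "
          f"T1: {mu_t1:.3f}/{sd_t1:.3f}  T2 mean: {mu_t2:.3f}")
```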

At this point it is convenient to remark that the methods presented in this chapter can also be viewed from the point of view of prediction errors. So, instead of having a mean $\mu_Z$ and variance $\sigma_Z^2$ of a stochastic input variable Z, we have a predictor $\hat{Z}$ of Z and the prediction error variance $\sigma_{\hat{Z}}^2 = \mathrm{Var}[Z - \hat{Z}]$. If the prediction error is unbiased, i.e. $E[Z - \hat{Z}] = 0$, then the same equations can be used as above, but with the mean $\mu_Z$ replaced by the prediction $\hat{z}$ and the variance $\sigma_Z^2$ by the error variance $\sigma_{\hat{Z}}^2$. From the point of view of error analysis the mean value of Q then becomes:

$\mu_Q \approx a\hat{h}^{\,b} + \dfrac{\beta}{2}\sigma_{\hat{H}}^2$   (8.25)

Equation (8.25) and Figure 8.3 show that in case of non-linear models unbiased (and even optimal) predictions of the model input do not yield unbiased (and certainly not optimal) predictions of the model output (see the remarks in Chapter 1). Adding higher order correction terms such as in (8.25) produces better results.



Figure 8.3 $\mu_Q$ (left) and $\sigma_Q$ (right) as a function of the standard deviation $\sigma_H$ as obtained from Monte Carlo simulation (1000 realizations) and the first and second order Taylor approximations.

As a final remark: if the moments of Y are required, but g() is not differentiable or the variance of Z is large, then Monte Carlo simulation could be used to derive the moments of Y. However, this means that some distribution type of Z has to be assumed.

8.3 Explicit functions of multiple random variables

The following explicit function of multiple random variables is considered:

$Y = g(Z_1,\ldots,Z_m)$   (8.26)

Depending on what is asked about Y, what is known about $Z_1,\ldots,Z_m$ and the form of g(), a number of different methods can be distinguished:

a) Derived distribution in the linear and Gaussian case

Goal:
• the probability density function $f_Y(y)$.
Requirements:
• the joint probability density $f(z_1,\ldots,z_m)$ of $Z_1,\ldots,Z_m$ is known and multivariate Gaussian;
• the function g() is linear:

$Y = a + \sum_{i=1}^{m} b_i Z_i$   (8.27)

In the linear and multiGaussian case, the random variable Y is also Gaussian distributed. The multivariate Gaussian distribution of $Z_1,\ldots,Z_m$ is completely described by the mean values $\mu_1,\ldots,\mu_m$, the variances $\sigma_1^2,\ldots,\sigma_m^2$ and the correlation coefficients $\rho_{ij},\ i,j=1,\ldots,m$, with $\rho_{ij} = 1$ if $i = j$. The mean and variance of Y can then be obtained by:

$\mu_Y = a + \sum_{i=1}^{m} b_i\mu_i$   (8.28)

$\sigma_Y^2 = \sum_{i=1}^{m}\sum_{j=1}^{m} b_i b_j \rho_{ij}\sigma_i\sigma_j$   (8.29)

Note that in case the $Z_i$ are not multiGaussian, (8.28) and (8.29) are still valid expressions for the mean and the variance. However, in this case the mean $\mu_Y$ and the variance $\sigma_Y^2$ are not sufficient to characterize the complete pdf of Y.

b) Derived distribution in the non-linear and Gaussian case

Goal:
• the probability density function $f_Y(y)$.
Requirements:
• the joint probability density $f(z_1,\ldots,z_m)$ of $Z_1,\ldots,Z_m$ is known and multivariate Gaussian.

In case $Y = g(z_1,\ldots,z_m)$ is non-linear we have to derive the distribution of Y through Monte Carlo simulation. To achieve this we have to draw realizations from the joint distribution $f(z_1,\ldots,z_m)$. If this joint distribution is multivariate Gaussian this is possible through a technique called Cholesky decomposition (see Box 5). The method then consists of:
1. Draw M realizations $z_1^{(k)},\ldots,z_m^{(k)},\ k=1,\ldots,M$ of the set of random variables from $f(z_1,\ldots,z_m)$ using simulation by Cholesky decomposition.
2. Use the M sets $z_1^{(k)},\ldots,z_m^{(k)},\ k=1,\ldots,M$ as input for the function g() to get M values of y: $y^{(k)},\ k=1,\ldots,M$.
3. Estimate the distribution function or probability density function of Y from $y^{(k)},\ k=1,\ldots,M$.

In case the joint distribution $f(z_1,\ldots,z_m)$ is not multivariate Gaussian, a solution is to apply a transformation to each of the variables $Z_1,\ldots,Z_m$: $X_1 = \mathrm{Tr}(Z_1),\ldots,X_m = \mathrm{Tr}(Z_m)$, such that we can assume $X_1,\ldots,X_m$ multivariate Gaussian with $\mu_1 = \mu_2 = \ldots = \mu_m = 0$ and $\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_m^2 = 1$. If we additionally assume that the correlation coefficients are unaltered by the transformation (note that this is generally not the case!), then realizations of $X_1,\ldots,X_m$ can be simulated by Cholesky decomposition. The simulated realizations of $X_1,\ldots,X_m$ are subsequently back-transformed to realizations of $Z_1,\ldots,Z_m$, which can then be used in the Monte Carlo analysis described above.

c) Derived moments

Goal:
• the moments of Y, e.g. $\mu_Y$ and $\sigma_Y^2$.
Requirements:
• the joint probability density $f(z_1,\ldots,z_m)$ of $Z_1,\ldots,Z_m$ is known.

The first two moments of Y are then obtained through:

$\mu_Y = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(z_1,\ldots,z_m)\, f(z_1,\ldots,z_m)\, dz_1\cdots dz_m$   (8.30)

$\sigma_Y^2 = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g^2(z_1,\ldots,z_m)\, f(z_1,\ldots,z_m)\, dz_1\cdots dz_m - \mu_Y^2$   (8.31)

In practice it is highly unlikely that $f(z_1,\ldots,z_m)$ is known, other than under the assumption that it is multivariate Gaussian. Also, evaluating the integrals, even numerically, is likely to be very cumbersome. So, in practice this problem will be solved by assuming $f(z_1,\ldots,z_m)$ to be multiGaussian (at least after transformation) and using Monte Carlo simulation as explained under (b).

Box 5 Simulation by Cholesky-decomposition

The goal is to simulate realizations of the set of random variables $Z_1,\ldots,Z_m$ with multivariate Gaussian joint distribution $f(z_1,\ldots,z_m)$, with parameters $\mu_1,\ldots,\mu_m$, $\sigma_1^2,\ldots,\sigma_m^2$ and $\rho_{ij},\ i,j=1,\ldots,m$, with $\rho_{ij}=1$ if $i=j$. The following steps are taken:
1. a vector of mean values is defined: $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_m)^T$;
2. the matrix C is constructed with elements $[C_{ij}]$ given by:

$[C_{ij}] = \rho_{ij}\sigma_i\sigma_j$   (8.32)

3. the covariance matrix C is decomposed into a lower and an upper triangular matrix that are each other's transpose:

$\mathbf{C} = \mathbf{L}\mathbf{U}$ with $\mathbf{L} = \mathbf{U}^T$   (8.33)

This operation is called Cholesky decomposition (a special form of LU-decomposition, so the technique is also referred to as simulation by LU-decomposition). A routine to perform this operation can for instance be found in Press et al. (1986).
4. A realization of the random vector $\mathbf{z} = (Z_1, Z_2, \ldots, Z_m)^T$ can now be simulated by simulating a vector $\mathbf{x} = (X_1, X_2, \ldots, X_m)^T$ of independent standard Gaussian random variables using a random number generator (see 8.2) and performing the transformation:

$\mathbf{z} = \boldsymbol{\mu} + \mathbf{L}\mathbf{x}$   (8.34)

That (8.34) yields the right variables can be seen as follows. First, (8.34) is a linear transformation of Gaussian random variables, so the simulated variables are Gaussian. Second, they have the correct mean value, as:

$E[\mathbf{z}] = E[\boldsymbol{\mu}] + E[\mathbf{L}\mathbf{x}] = \boldsymbol{\mu} + \mathbf{L}E[\mathbf{x}] = \boldsymbol{\mu}$   (8.35)

and the correct covariance structure:

$E[(\mathbf{z}-\boldsymbol{\mu})(\mathbf{z}-\boldsymbol{\mu})^T] = E[\mathbf{L}\mathbf{x}(\mathbf{L}\mathbf{x})^T] = E[\mathbf{L}\mathbf{x}\mathbf{x}^T\mathbf{L}^T] = \mathbf{L}E[\mathbf{x}\mathbf{x}^T]\mathbf{L}^T = \mathbf{L}\mathbf{I}\mathbf{L}^T = \mathbf{L}\mathbf{L}^T = \mathbf{L}\mathbf{U} = \mathbf{C}$   (8.36)

So the simulated variables are indeed drawn from a multivariate Gaussian distribution with the prescribed statistics.

Note that this method can also be used to simulate realizations of multiGaussian random space functions on a grid, i.e. as an alternative to sequential simulation. In that case the random vector contains the values of the random space function at the grid nodes, $\mathbf{z} = (Z(x_1), Z(x_2), \ldots, Z(x_m))^T$, the mean is constant and the covariance matrix is constructed as:

$[C_{ij}] = C_Z(x_i, x_j)$   (8.37)
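A minimal sketch of Box 5 for two correlated input variables is given below; the means, standard deviations and correlation are assumed example values, and the sample statistics at the end serve only as a check of Equations (8.35)-(8.36).

```python
import numpy as np

# assumed example statistics for two correlated inputs Z1, Z2
mu = np.array([10.0, 2.0])
sd = np.array([2.0, 0.5])
rho = np.array([[1.0, 0.6],
                [0.6, 1.0]])

C = rho * np.outer(sd, sd)           # covariance matrix, Eq. (8.32)
L = np.linalg.cholesky(C)            # lower triangular factor, C = L L^T, Eq. (8.33)

rng = np.random.default_rng(0)
M = 5000
x = rng.standard_normal((M, 2))      # independent standard Gaussian vectors
z = mu + x @ L.T                     # Eq. (8.34): z = mu + L x, applied row-wise

# the sample mean and covariance should reproduce mu and C (Eqs. 8.35-8.36)
print(z.mean(axis=0))
print(np.cov(z, rowvar=False))
```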

d) Taylor expansion approximation

Goal:
• the moments of Y, e.g. $\mu_Y$ and $\sigma_Y^2$.
Requirements:
• the joint moments of $Z_1,\ldots,Z_m$ are known up to a certain order, e.g. $\mu_1,\ldots,\mu_m$, $\sigma_1^2,\ldots,\sigma_m^2$ and $\rho_{ij},\ i,j=1,\ldots,m$;
• the variances $\sigma_1^2,\ldots,\sigma_m^2$ should not be too large.

We first define a vector $\boldsymbol{\mu} = (\mu_1,\mu_2,\ldots,\mu_m)^T$ that contains the means of the m random input variables. Next, we consider the Taylor expansion of the function g(Z) around the value $g(\boldsymbol{\mu}) = g(\mu_1,\mu_2,\ldots,\mu_m)$:

$Y = g(\mathbf{Z}) = g(\boldsymbol{\mu}) + \sum_{i=1}^{m}\left[\dfrac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right](Z_i-\mu_i) + \dfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left[\dfrac{\partial^2 g(\mathbf{z})}{\partial z_i\,\partial z_j}(\boldsymbol{\mu})\right](Z_i-\mu_i)(Z_j-\mu_j) + \dfrac{1}{6}\sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m}\left[\dfrac{\partial^3 g(\mathbf{z})}{\partial z_i\,\partial z_j\,\partial z_k}(\boldsymbol{\mu})\right](Z_i-\mu_i)(Z_j-\mu_j)(Z_k-\mu_k) + \ldots$   (8.38)

The first order Taylor approximation only considers the first two terms. The expected value is then approximated as:

$\mu_Y = E[Y] \approx g(\boldsymbol{\mu}) + \sum_{i=1}^{m}\left[\dfrac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right]E[(Z_i-\mu_i)] = g(\boldsymbol{\mu})$   (8.39)

and the variance as:

$\sigma_Y^2 = E[(Y - E[Y])^2] \approx E\left[\left(g(\boldsymbol{\mu}) + \sum_{i=1}^{m}\left[\dfrac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right](Z_i-\mu_i) - g(\boldsymbol{\mu})\right)^2\right] = \sum_{i=1}^{m}\sum_{j=1}^{m}\left[\dfrac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right]\left[\dfrac{\partial g(\mathbf{z})}{\partial z_j}(\boldsymbol{\mu})\right]E[(Z_i-\mu_i)(Z_j-\mu_j)] = \sum_{i=1}^{m}\sum_{j=1}^{m}\left[\dfrac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right]\left[\dfrac{\partial g(\mathbf{z})}{\partial z_j}(\boldsymbol{\mu})\right]\rho_{ij}\sigma_i\sigma_j$   (8.40)

Keeping the first three terms of Equation (8.38) and taking expectations yields the second order Taylor approximation . We will only show the expression for the mean here. For the variance one is referred to Heuvelink (1998).

$\mu_Y = E[Y] \approx g(\boldsymbol{\mu}) + \sum_{i=1}^{m}\left[\dfrac{\partial g(\mathbf{z})}{\partial z_i}(\boldsymbol{\mu})\right]E[(Z_i-\mu_i)] + \dfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left[\dfrac{\partial^2 g(\mathbf{z})}{\partial z_i\,\partial z_j}(\boldsymbol{\mu})\right]E[(Z_i-\mu_i)(Z_j-\mu_j)] = g(\boldsymbol{\mu}) + \dfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left[\dfrac{\partial^2 g(\mathbf{z})}{\partial z_i\,\partial z_j}(\boldsymbol{\mu})\right]\rho_{ij}\sigma_i\sigma_j$   (8.41)

Example Consider the weir equation or rating curve $Q = Ah^B$, where A and B are random variables with statistics $\mu_A, \sigma_A^2, \mu_B, \sigma_B^2$ and $\rho_{AB}$. The first order Taylor approximation of the mean becomes:

$E[Q] \approx \mu_A h^{\mu_B}$   (8.42)

and the second order approximation:

$E[Q] \approx \mu_A h^{\mu_B} + \dfrac{1}{2}\left(\mu_A h^{\mu_B}(\ln h)^2\right)\sigma_B^2 + \left(h^{\mu_B}\ln h\right)\rho_{AB}\sigma_A\sigma_B$   (8.43)

The variance from the first order Taylor analysis is given by:

$\sigma_Q^2 \approx \left(h^{2\mu_B}\right)\sigma_A^2 + \left(2\mu_A h^{2\mu_B}\ln h\right)\rho_{AB}\sigma_A\sigma_B + \left(\mu_A^2 h^{2\mu_B}(\ln h)^2\right)\sigma_B^2$   (8.44)

As can be seen, these expressions quickly become quite extensive, especially if, due to larger variances $\sigma_A^2, \sigma_B^2$, higher order terms have to be included. The alternative then is to use Monte Carlo simulation by jointly simulating realizations of the variables A and B using Cholesky decomposition. Of course, this means that some joint probability distribution for these random variables has to be assumed.

8.4 Spatial, temporal or spatio-temporal integrals of random functions

We consider the following relationship (here we consider space, but it could also be time or space-time):

$Y_D = \int_{x\in D} g[Z(x)]\, dx$   (8.45)

a) Simple averaging of moments

Goal:
• the moments of $Y_D$, e.g. $\mu_{Y_D}$ and $\sigma_{Y_D}^2$, $\mathrm{Cov}(Y_{D_1}, Y_{D_2})$.
Conditions:
• the function g[] is linear;
• the random function Z(x) is wide sense stationary (see chapter 5);
• the mean $\mu_Z$ and the covariance function $C_Z(x_1, x_2) = C_Z(x_2 - x_1)$ are known.

If the function g[] is linear, e.g. $g[Z(x)] = a + bZ(x)$, the moments of $Y_D$ can be evaluated by spatial integration of the moments of Z(x) (see also sections 5.6 and 7.3.3).

The mean of $Y_D$ is given by

$E[Y_D] = E\left[\int_{x\in D}(a + bZ(x))\, dx\right] = a|D| + bE\left[\int_{x\in D}Z(x)\, dx\right] = a|D| + b\int_{x\in D}E[Z(x)]\, dx = a|D| + b\int_{x\in D}\mu_Z\, dx = a|D| + b\,|D|\,\mu_Z$   (8.46)

where $|D|$ is the area of the domain D, and the variance by:

$E[(Y_D - E[Y_D])^2] = E\left[\left(\int_{x\in D}(a + bZ(x))\, dx - a|D| - b|D|\mu_Z\right)^2\right] = b^2 E\left[\left(\int_{x\in D}(Z(x)-\mu_Z)\, dx\right)^2\right] = b^2 E\left[\int_{x_1\in D}(Z(x_1)-\mu_Z)\, dx_1\int_{x_2\in D}(Z(x_2)-\mu_Z)\, dx_2\right] = b^2\int_{x_2\in D}\int_{x_1\in D}E[(Z(x_1)-\mu_Z)(Z(x_2)-\mu_Z)]\, dx_1\, dx_2 = b^2\int_{x_2\in D}\int_{x_1\in D}C_Z(x_1, x_2)\, dx_1\, dx_2$   (8.47)

By the same type of derivation the covariance between spatial averages of two domains can be derived (see also sections 5.6 and 7.3.3):

$\mathrm{Cov}(Y_{D_1}, Y_{D_2}) = b^2\int_{x_2\in D_2}\int_{x_1\in D_1}C_Z(x_1, x_2)\, dx_1\, dx_2$   (8.48)

The spatial integrals can in certain cases be solved analytically (e.g. Vanmarcke, 1983), but they are usually approximated numerically, as is explained in section 7.3.3.

Note that if the random function Z(x) is wide sense stationary and multiGaussian, YD will be Gaussian distributed also and its probability density function is given through the mean (8.46) and the variance (8.47).
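The double integral in (8.47) is typically approximated by discretizing the domain and averaging the covariances between all pairs of discretization points, as sketched below. The square domain, the exponential covariance model and its parameters, and the number of discretization points are illustrative assumptions.

```python
import numpy as np

def cov_exp(h, sill=1.0, a=25.0):
    """Assumed exponential covariance model C_Z(h)."""
    return sill * np.exp(-h / a)

def variance_of_integral(xmin, xmax, ymin, ymax, n=20, b=1.0):
    """Approximate b^2 * double integral of C_Z over D x D (Eq. 8.47) by
    averaging the covariance between all pairs of discretization points."""
    xs = np.linspace(xmin, xmax, n)
    ys = np.linspace(ymin, ymax, n)
    X, Y = np.meshgrid(xs, ys)
    pts = np.column_stack([X.ravel(), Y.ravel()])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # all pair distances
    area = (xmax - xmin) * (ymax - ymin)
    # mean pairwise covariance times |D|^2 approximates the double integral
    return b**2 * cov_exp(d).mean() * area**2

print(variance_of_integral(0.0, 100.0, 0.0, 100.0))
```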

b) Monte Carlo simulation

Goal:
• the moments of $Y_D$, e.g. $\mu_{Y_D}$ and $\sigma_{Y_D}^2$, $\mathrm{Cov}(Y_{D_1}, Y_{D_2})$, or its probability density $f_{Y_D}(y)$.
Conditions:
• the multivariate probability density function of Z(x) is known.

If g() is non-linear, or we are interested in the complete probability density function, geostatistical simulation in a Monte Carlo analysis is the appropriate method. The following steps are taken:
1. generate M realizations $z^{(k)}(x),\ k=1,\ldots,M$ of the random function Z(x) using geostatistical simulation on a fine grid discretizing the domain D. If Z(x) is non-Gaussian, a transformation to a Gaussian distribution is in order, after which sequential Gaussian simulation can be applied (see sections 7.4.3 and 7.5 for elaborate descriptions of normal-score transforms and geostatistical simulation respectively);
2. the M realizations are used as input for the spatial integral (8.45), yielding M results $y_D^{(k)},\ k=1,\ldots,M$;
3. from the simulated values $y_D^{(k)},\ k=1,\ldots,M$ the moments and the probability density function of $Y_D$ can be estimated.

If the random function is observed at a number of locations, conditional realizations should be drawn (see also section 7.5). This allows one to investigate the effect of additional sampling of Z on the uncertainty about YD.

8.5 Vector functions

Consider a vector of model inputs (parameters, input variables, boundary conditions etc.) that are random variables: $\mathbf{z} = (Z_1,\ldots,Z_m)^T$. The vector of model inputs is linked to a vector of model outputs $\mathbf{y} = (Y_1,\ldots,Y_n)^T$ through a functional relationship g():

$\mathbf{y} = \mathbf{g}(\mathbf{z})$   (8.49)

The goal is to get the joint pdf or the moments of y.

a) First order analysis

Required:
• the statistical moments of y: $\boldsymbol{\mu}_y = E[\mathbf{y}] = (E[Y_1],\ldots,E[Y_n])^T$ and the covariance matrix $\mathbf{C}_{yy} = E[(\mathbf{y}-\boldsymbol{\mu}_y)(\mathbf{y}-\boldsymbol{\mu}_y)^T]$.
Conditions:
• the statistical moments of z should be known: $\boldsymbol{\mu}_z = E[\mathbf{z}] = (\mu_1,\ldots,\mu_m)^T$, $\mathbf{C}_{zz} = E[(\mathbf{z}-\boldsymbol{\mu}_z)(\mathbf{z}-\boldsymbol{\mu}_z)^T]$;
• the variances $\sigma_1^2,\ldots,\sigma_m^2$ of the elements of z should not be too large.

The first order analysis is in fact the first order Taylor approximation (see 8.3) applied to each of the elements of y. The first order approximation is given by:

$E[\mathbf{y}] \approx \mathbf{g}(\boldsymbol{\mu}_z)$   (8.50)

To obtain the covariance matrix of y, the covariance matrix of z is constructed. This is an m×m matrix with the following elements (with $\rho_{ij}$ the correlation between $Z_i$ and $Z_j$):

$[\mathbf{C}_{zz}(i,j)] = \rho_{ij}\sigma_i\sigma_j$   (8.51)

Also, the sensitivity matrix or Jacobian is required. This n×m matrix gives the derivatives of output element $y_i$ with respect to input $z_j$ and has the following form:

$\mathbf{J} = \begin{pmatrix} \dfrac{\partial y_1}{\partial z_1} & \cdots & \dfrac{\partial y_1}{\partial z_m} \\ \vdots & & \vdots \\ \dfrac{\partial y_n}{\partial z_1} & \cdots & \dfrac{\partial y_n}{\partial z_m} \end{pmatrix}$   (8.52)

Sometimes it is possible to construct this matrix analytically, i.e. if the vector function g(z) consists, for each element $y_i$ of y, of a separate explicit and differentiable function $g_i(z_1,\ldots,z_m)$. However, usually this is not the case and g(z) may represent some numerical model, e.g. a groundwater model, where the elements $y_i$ of y are state elements, perhaps defined at some grid. In this case, a sensitivity analysis must be performed by running the model g() m+1 times: one baseline run where the model inputs are set at their mean values (i.e. Equation 8.50) and one run for each model input $z_j$ where $z_j$ is slightly changed, e.g. $z_j = \mu_j + \Delta z_j$. From these runs the changes in the values of the elements $y_i$, e.g. $\Delta y_i$, are calculated and the derivatives are subsequently estimated as:

$\dfrac{\partial y_i}{\partial z_j} \approx \dfrac{\Delta y_i}{\Delta z_j}$   (8.53)

With the sensitivity matrix and the covariance matrix a first order approximation of the covariance matrix of y can be provided:

$\mathbf{C}_{yy} = \mathbf{J}\mathbf{C}_{zz}\mathbf{J}^T$   (8.54)
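A minimal sketch of this first order analysis with a numerically estimated Jacobian (Equations 8.53-8.54) is given below; the two-input, two-output model function, the input statistics and the perturbation sizes are placeholders standing in for a real numerical model.

```python
import numpy as np

def model(z):
    """Placeholder for a (possibly numerical) model y = g(z); a small
    non-linear two-input, two-output example for illustration only."""
    k, s = z
    return np.array([k * s, np.sqrt(k) + s**2])

mu_z = np.array([10.0, 0.2])                 # assumed mean input values
C_zz = np.array([[4.0, 0.02],
                 [0.02, 0.01]])              # assumed input covariance matrix

# numerical Jacobian: one baseline run plus one perturbed run per input (Eq. 8.53)
y0 = model(mu_z)
dz = 1e-4 * np.abs(mu_z)
J = np.zeros((len(y0), len(mu_z)))
for j in range(len(mu_z)):
    z_pert = mu_z.copy()
    z_pert[j] += dz[j]
    J[:, j] = (model(z_pert) - y0) / dz[j]

C_yy = J @ C_zz @ J.T                        # Eq. (8.54): first order output covariance
print(y0)
print(C_yy)
```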

Some additional remarks about this method:
• The above equations have been developed for stochastic input variables with prescribed means and variances. Of course, as is the case with the Taylor approximation in sections 8.2 and 8.3, this method can also be used as a first order approximation of a prediction error covariance. In that case the prediction equation becomes:

$\hat{\mathbf{y}} \approx \mathbf{g}(\hat{\mathbf{z}})$   (8.55)

with $\hat{\mathbf{z}} = (\hat{Z}_1,\ldots,\hat{Z}_m)^T$ the predicted values of the model inputs and $\mathbf{C}_{\hat{z}\hat{z}} = E[(\mathbf{z}-\hat{\mathbf{z}})(\mathbf{z}-\hat{\mathbf{z}})^T]$ the covariance matrix of the prediction errors; and similarly for y. Equation (8.54) then becomes:

$\mathbf{C}_{\hat{y}\hat{y}} = \mathbf{J}\mathbf{C}_{\hat{z}\hat{z}}\mathbf{J}^T$   (8.56)

• If the function g() is a matrix multiplication

$\mathbf{y} = \mathbf{A}\mathbf{z}$   (8.57)

the system is linear, the elements of the sensitivity matrix are exactly the elements of the matrix A, i.e. $[j_{ij}] = [a_{ij}]$, and an exact equation for the covariance of y is given by:

$\mathbf{C}_{yy} = \mathbf{A}\mathbf{C}_{zz}\mathbf{A}^T$   (8.58)

If on top of that z has a multivariate Gaussian distribution, then y is also multivariate Gaussian and the derived mean and variance of y completely determine its probability distribution (using Equation 3.87 with $\boldsymbol{\mu}_y$ and $\mathbf{C}_{yy}$).
• This method can also be used for transient models. Suppose that the following model applies:

$\mathbf{y}(t) = \mathbf{g}(\mathbf{z}, t)$   (8.59)

where y is the outcome of some dynamic model, e.g. a transient groundwater model, with stochastic input or parameters z. Then m+1 transient runs can be performed, i.e. the baseline and one for each perturbed parameter, and for each time of interest the sensitivity can be determined:

$\dfrac{\partial y_i}{\partial z_j}(t) \approx \dfrac{\Delta y_i(t)}{\Delta z_j}$   (8.60)

and the covariance of y(t) can be approximated at each time as:

$\mathbf{C}_{yy}(t) = \mathbf{J}(t)\mathbf{C}_{zz}\mathbf{J}^T(t)$   (8.61)

b) Monte Carlo simulation

In case of strong non-linearity of g() or large variances of the elements of z, the linear approximation no longer works. In that case Monte Carlo simulation is to be applied, as described before: 1) M realizations of z are simulated (e.g. using Cholesky decomposition as shown in Box 5); 2) the M simulated realizations of z are used as input for g(z), yielding M realizations of y; 3) the statistics of y can be estimated from the M realizations of y.

8.6 Differential equations with a random variable

Consider a partial differential equation with two random parameters, e.g. the groundwater equation in two spatial dimensions with a homogeneous and isotropic but random transmissivity T and a homogeneous storage coefficient S:

$S\dfrac{\partial H}{\partial t} = \dfrac{\partial}{\partial x}\left(T\dfrac{\partial H}{\partial x}\right) + \dfrac{\partial}{\partial y}\left(T\dfrac{\partial H}{\partial y}\right) = T\left(\dfrac{\partial^2 H}{\partial x^2} + \dfrac{\partial^2 H}{\partial y^2}\right)$   (8.62)

What can immediately be seen from (8.62) is that, even though the transmissivity T is random, it can be placed outside the derivatives because it does not change with space. If the transmissivity is a single random variable, there are effectively two ways of obtaining the statistics of the random head H, depending on whether an analytical solution is available.

1. If an analytical solution is available, Equation (8.62) can be solved for given T and S (as if they were deterministic) to produce an explicit relation between H(x,t) and S and T. This relation would generally be non-linear and a Taylor approximation could be used to derive the statistics of H(x,t) from the joint statistics of S and T: $\mu_S, \sigma_S^2, \mu_T, \sigma_T^2$ and $\rho_{ST}$. If a Taylor approximation is not appropriate, e.g. because the variances of S and T are large, then realizations of S and T can be simulated assuming some joint distribution $f(s,t)$ of S and T (if multiGaussian, Cholesky decomposition can be used). These realizations can then be plugged into the analytical solution of (8.62) to produce realizations of H(x,t) from which its statistics can be obtained. 2. If no analytical solution can be obtained, then Monte Carlo simulation in combination with a numerical method, e.g. finite elements or finite differences, is the only option. A large number M of realizations of S and T are simulated assuming a joint distribution $f(s,t)$ of S and T (if multiGaussian, Cholesky decomposition can be used). The M simulated realizations are used as parameters in the equations solved by the finite difference or finite element scheme. The numerical solution is obtained for each simulated parameter set to yield M realizations $H^{(k)}(x_i, t_j),\ k=1,\ldots,M$ at a finite number of points in space and time. From these, the statistics (mean, variance, spatial and temporal covariance) of H(x,t) can be obtained.

So, in short: if random parameters are involved in differential equations but are not random functions of space or time, they can be treated as deterministic while solving the differential equation and analyzed as stochastic variables afterwards; that is, if an analytical solution can be found.

Example Consider the following differential equation describing the concentration of some pollutant in a lake as a function of time:

$v\dfrac{dC}{dt} = -vKC + q_{in}$   (8.63)

where v is the volume of the lake (assumed constant and known), $q_{in}$ is the constant and known input load, and K is a random decay coefficient. The solution to this equation is (with known initial concentration C(0) = 0):

$C(t) = \dfrac{q_{in}}{vK}\left(1 - e^{-Kt}\right)$   (8.64)

Now the statistics of C(t) can be derived from those of K. For instance, using a first order Taylor approximation (see section 8.2) the following relation can be derived for the variance:

$\sigma_C^2(t) = \dfrac{q_{in}^2\left(e^{-\mu_K t}(1 + \mu_K t) - 1\right)^2}{v^2\mu_K^4}\,\sigma_K^2$   (8.65)

Figure 8.4 shows the development of the mean concentration with time as well as the confidence band of one standard deviation based on a first order Taylor analysis and the assumption that C is Gaussian distributed. The following parameters are used: $q_{in}/v = 100$ mg m$^{-3}$ yr$^{-1}$, $\mu_K = 0.5$ yr$^{-1}$, $\sigma_K^2 = 0.01$ yr$^{-2}$.


Figure 8.4 Evolution of the concentration of a pollutant in a lake described by Equation (8.64) with random decay rate K. The mean concentration (central line) and one-standard-deviation prediction intervals (outer lines) are approximated by a first order Taylor analysis; parameters: $q_{in}/v = 100$ mg m$^{-3}$ yr$^{-1}$, $\mu_K = 0.5$ yr$^{-1}$, $\sigma_K^2 = 0.01$ yr$^{-2}$.
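The mean curve and one-standard-deviation band of Figure 8.4 can be recomputed from Equations (8.64)-(8.65) as sketched below, using the parameter values quoted above; the time grid is an arbitrary choice.

```python
import numpy as np

q_over_v = 100.0   # q_in / v  (mg m^-3 yr^-1), as in the example
mu_k = 0.5         # mean decay rate (yr^-1)
var_k = 0.01       # variance of the decay rate (yr^-2)

t = np.linspace(0.0, 15.0, 151)
mean_c = q_over_v / mu_k * (1.0 - np.exp(-mu_k * t))            # Eq. (8.64) at mu_K
# first order Taylor variance, Eq. (8.65), with q_in^2 / v^2 = q_over_v^2
var_c = (q_over_v**2 * (np.exp(-mu_k * t) * (1.0 + mu_k * t) - 1.0)**2
         / mu_k**4) * var_k
upper = mean_c + np.sqrt(var_c)    # one-standard-deviation band as in Figure 8.4
lower = mean_c - np.sqrt(var_c)
```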

8.7 Stochastic differential equations

As a last case, consider the following differential equations:

1. Transient groundwater equation for two-dimensional flow with heterogeneous storage coefficient and transmissivity, described as random space functions:

$S(x,y)\dfrac{\partial H}{\partial t} = \dfrac{\partial}{\partial x}\left(T(x,y)\dfrac{\partial H}{\partial x}\right) + \dfrac{\partial}{\partial y}\left(T(x,y)\dfrac{\partial H}{\partial y}\right)$   (8.66)

2. Evolution of lake concentration with decay rate as a random function of time:

$v\dfrac{dC}{dt} = -vK(t)C + q_{in}$   (8.67)

In both of these cases we cannot hope to find an analytical solution given a particular realization of the random functions S(x), T(x) or K(t), as the behavior of random functions is generally wild and not described by a simple analytical expression. There are two alternatives for solving these stochastic differential equations:


The first alternative is to assume some form of stationarity about the random functions and then develop differential equations in terms of the moments of the dependent variables and the moments of the random inputs. As the latter are assumed (wide sense) stationary, their moments are constant, so that analytical solutions may be feasible.

The second alternative is to use Monte Carlo simulation, i.e. simulate realizations of the random function and use these as input for the differential equations. The differential equations are subsequently solved for each realization with a numerical scheme such as finite differences or finite elements.

In the following, examples of both approaches are given. The field of stochastic subsurface hydrology is the most advanced in terms of analytical stochastic analysis of parameter heterogeneity, with many papers in various hydrological journals, in particular in Water Resources Research. Although not very recent, extensive overviews of the advances in this field can be found in Dagan (1989) and Gelhar (1993). The applications in these books mostly pertain to flow and transport in porous media described with wide sense stationary stochastic functions and assuming infinite domains with uniform flow. Since the appearance of these books, advances have been made in finding (quasi-)analytical solutions for finite domains, non-uniform flow, random boundary conditions, unsaturated flow, two-phase flow, non-stationary (even fractal) random media and fractal porous media. A more recent book with some advanced topics is that by Zhang (2002).

Box 6 Mean square differentiable random functions and white noise

In both approaches, we would like to use the rules of standard differential and integral calculus. For these standard rules to apply, the random functions involved have to be mean square differentiable, i.e. the following limit expression should exist (a similar expression can be formulated for random functions of space):

$\lim_{\tau\to 0} E\left[\left(\dfrac{Z(t+\tau) - Z(t)}{\tau} - \dfrac{dZ}{dt}\right)^2\right] = 0$   (8.68)

So, averaged over all possible realizations, the quadratic difference between the derivative of the random function and its finite difference approximation should approach zero if we look at increasingly smaller intervals τ. If (8.68) exists, and we interpret the differential dZ/dt in the mean square sense, normal calculus rules can thus be applied.

A necessary and sufficient condition for this to apply is that the following integral is finite (Gelhar, 1993, p. 53):

$\int_{-\infty}^{\infty}\omega^2 S_Z(\omega)\, d\omega < \infty$   (8.69)

where $S_Z(\omega)$ is the spectral density function of the random function Z(t). An alternative necessary and sufficient condition is that the second derivative of the covariance function at lags approaching zero is finite (Gelhar, 1993):


$\left.\dfrac{d^2 C_Z(\tau)}{d\tau^2}\right|_{\tau=0} < \infty$   (8.70)

From equation (8.69) it can be seen that if a stationary random function is mean square differentiable, it must have a spectral density that goes to zero more rapidly than $|\omega|^{-3}$. This means that the random function should be limited with respect to the higher frequency components. From Table 5.3 it can be seen that this is not the case for, for instance, the exponential covariance function. However, Gelhar (1993, p. 53) shows how cutting off the spectrum at some maximum frequency $\omega_m$ does produce a random function that is mean square differentiable, while this weeding out of the high frequency components only slightly decreases the variance of Z(t). Such a cut-off is also physically defendable, as many hydrological variables such as porosity, hydraulic conductivity or rainfall intensity are only defined as continuous variables above a certain minimum averaging volume or interval in space or time. A natural cut-off for conductivity, for instance, could be the Representative Elementary Volume (REV) as defined by Bear (1972). The random function used has no validity for variations within this volume, or, in terms of the covariance function (8.70): it is not applied at lag differences smaller than the REV size. Given this assumption it can then be assumed that the regular band-limited stationary random functions used are mean square differentiable.

An important exception is the white noise process. Not only is this process not mean square differentiable, it also has infinite variance. This can be seen as follows. White noise can be defined as the formal derivative of Brownian motion:

$W(t) \equiv \lim_{\tau\to 0}\dfrac{B(t+\tau) - B(t)}{\tau}$   (8.71)

Brownian motion is a process for which the variance of the difference between two values sampled at two different times is proportional to the time interval between the sample times (where the mean of the difference is zero):

$E[B(t+\Delta t) - B(t)] = 0; \qquad \mathrm{Var}[B(t+\Delta t) - B(t)] \propto \Delta t \quad \forall\, t, \Delta t$   (8.72)

Using this definition in (8.71) one obtains:

 B(t +τ ) − B(t) 2  Var [W (t)] ≡ E lim   τ→0 τ 2     (8.73) E[( B(t +τ ) − B(t)) 2 ] τ = lim = lim = ∞ τ→0 τ 2 τ→0τ 2

White noise is usually not used as a model for measurable variables, but for random influences that are virtually uncorrelated in space and time, e.g. measurement noise or fluctuations in input, parameters and boundary conditions which have much smaller space or time scales than the target variable of interest. For instance, looking at the equation modelling the concentration in a lake (Equation 8.67): if the random process K(t) is a random function with a correlation scale comparable to the dynamics of the concentration variations (e.g. it may depend on temperature, which fluctuates in the order of weeks or even months; see Figure 8.4), then it would be treated with the methods described in this section. However, suppose it is meant to incorporate the influence of turbulence on the decay rate. In that case it has a time scale of seconds and is thus best modeled as a white noise process. In that case, the methods described here no longer apply and (8.67) is solved using a special form of stochastic calculus called Ito calculus (e.g. Gardiner, 1983).

a) Small perturbation analysis

First order perturbation analysis has been widely used to analyze the effect of unknown heterogeneity in hydraulic parameters on the uncertainty about hydraulic heads, fluxes and concentration plumes. The applicability of the method and the tractability of analytical solutions rest on the following conditions:
• The logarithm of hydraulic conductivity, Y = ln K, is modeled as a (locally) wide sense stationary random function. If a process is locally wide sense stationary it means that the mean of the random function may be a function of space, but it changes very slowly compared to the integral scale (see chapter 5) of the random function. So if $Y'(x) = Y(x) - \mu_Y(x)$ is the wide sense stationary residual with integral scale $I_Y$, then the condition to be met is (for two dimensions):

$I_Y\left(\dfrac{\partial\mu_Y}{\partial x} + \dfrac{\partial\mu_Y}{\partial y}\right) \ll 1$   (8.74)

• The flow takes place in an infinite domain and is uniform or at least slowly varying in space. In that case the gradient of the hydraulic head ∇H can also be assumed to be a (locally) wide sense stationary random function (that is, if hydraulic conductivity is locally wide sense stationary). It means that the following condition should hold (with $I_{\nabla H}$ the integral scale of the head gradient):

$I_{\nabla H}\left(\dfrac{\partial\mu_{\nabla H}}{\partial x} + \dfrac{\partial\mu_{\nabla H}}{\partial y}\right) \ll 1$   (8.75)

• The variance of the logarithm of hydraulic conductivity $\sigma_Y^2$ is small. As a rule of thumb one usually requires that $\sigma_Y^2 \le 1$.

Above conditions are formulated in terms of ln K because it is more convenient to work with ln K than K. There are three reasons why this is so (Gelhar, 1993). First, it is generally accepted that the logarithm of K follows a Gaussian distribution (e.g. Freeze, 1975), so that the assumption of wide sense stationary ln K means that the mean and the covariance function are sufficient to describe its multivariate probability (see chapter 5). Second, the variation in ln K is smaller than that of K, which makes the small perturbation approach applicable to a wider range of variances of K. The third reason is that the logarithm of hydraulic conductivity appears naturally in the equation for groundwater flow (as will be shown hereafter (Equation 8.77)).


The following explanation of the workings of the small perturbation approximation is taken from Gelhar (1993). We start with the steady-state groundwater flow equation in multiple dimensions:

$\dfrac{\partial}{\partial x_i}\left(K\dfrac{\partial H}{\partial x_i}\right) = 0; \qquad i = 1,\ldots,N;\ N = 1, 2\ \text{or}\ 3$   (8.76)

This equation can be rewritten as (with Y=ln K):

$\dfrac{\partial K}{\partial x_i}\dfrac{\partial H}{\partial x_i} + K\dfrac{\partial^2 H}{\partial x_i^2} = 0 \;\Rightarrow\; \dfrac{\partial K}{K\,\partial x_i}\dfrac{\partial H}{\partial x_i} + \dfrac{\partial^2 H}{\partial x_i^2} = 0 \;\Rightarrow\; \left(\text{as}\ \dfrac{\partial K}{K} = \partial(\ln K)\right)\; \dfrac{\partial^2 H}{\partial x_i^2} + \dfrac{\partial \ln K}{\partial x_i}\dfrac{\partial H}{\partial x_i} = 0 \;\Rightarrow\; \dfrac{\partial^2 H}{\partial x_i^2} + \dfrac{\partial Y}{\partial x_i}\dfrac{\partial H}{\partial x_i} = 0$   (8.77)

Equation (8.77) shows that variations in hydraulic head are driven by variations in lnK. The random functions for log conductivity and hydraulic head are written as a mean value and a perturbation as follows:

$Y(x) = \mu_Y(x) + Y'(x); \quad E[Y(x)] = \mu_Y(x); \quad E[Y'(x)] = 0$   (8.78)

$H(x) = \mu_H(x) + H'(x); \quad E[H(x)] = \mu_H(x); \quad E[H'(x)] = 0$   (8.79)

Substitution of these expansions into (8.77) yields:

$\dfrac{\partial^2(\mu_H + H')}{\partial x_i^2} + \dfrac{\partial(\mu_Y + Y')}{\partial x_i}\dfrac{\partial(\mu_H + H')}{\partial x_i} = 0 \;\Rightarrow\; \dfrac{\partial^2\mu_H}{\partial x_i^2} + \dfrac{\partial^2 H'}{\partial x_i^2} + \dfrac{\partial\mu_Y}{\partial x_i}\dfrac{\partial\mu_H}{\partial x_i} + \dfrac{\partial\mu_Y}{\partial x_i}\dfrac{\partial H'}{\partial x_i} + \dfrac{\partial Y'}{\partial x_i}\dfrac{\partial\mu_H}{\partial x_i} + \dfrac{\partial Y'}{\partial x_i}\dfrac{\partial H'}{\partial x_i} = 0$   (8.80)

Taking the expected value of this equation and considering that E[H’]= E[Y’] = 0 yields the equation for the mean:

$\dfrac{\partial^2\mu_H}{\partial x_i^2} + \dfrac{\partial\mu_Y}{\partial x_i}\dfrac{\partial\mu_H}{\partial x_i} + E\left[\dfrac{\partial Y'}{\partial x_i}\dfrac{\partial H'}{\partial x_i}\right] = 0$   (8.81)

Subtracting this equation from (8.80) yields the equation for the head perturbation:

$\dfrac{\partial^2 H'}{\partial x_i^2} + \dfrac{\partial\mu_Y}{\partial x_i}\dfrac{\partial H'}{\partial x_i} + \dfrac{\partial Y'}{\partial x_i}\dfrac{\partial\mu_H}{\partial x_i} = E\left[\dfrac{\partial Y'}{\partial x_i}\dfrac{\partial H'}{\partial x_i}\right] - \dfrac{\partial Y'}{\partial x_i}\dfrac{\partial H'}{\partial x_i}$   (8.82)


In the small perturbation approach it is assumed that perturbations Y’ and H’ are small. Hence, the products of these terms are even smaller. If these second order terms are neglected the following equations result for the mean and the perturbation:

$\dfrac{\partial^2\mu_H}{\partial x_i^2} + \dfrac{\partial\mu_Y}{\partial x_i}\dfrac{\partial\mu_H}{\partial x_i} = 0$   (8.83)

$\dfrac{\partial^2 H'}{\partial x_i^2} + \dfrac{\partial\mu_Y}{\partial x_i}\dfrac{\partial H'}{\partial x_i} + \dfrac{\partial Y'}{\partial x_i}\dfrac{\partial\mu_H}{\partial x_i} = 0$   (8.84)

To show how these equations can be solved we consider the one-dimensional case (Figure 8.5):

Figure 8.5 One-dimensional flow in a tube with random conductivity and constant and known flux q.

Figure 8.5 shows schematically the random head variation in one-dimensional flow in a tube due to random variation of hydraulic conductivity. Assuming that the log conductivity ln K is described by a wide sense stationary function with constant mean, the equation for the mean is given by (8.83) with $\partial\mu_Y/\partial x = 0$:

$\dfrac{\partial^2\mu_H}{\partial x^2} = 0$   (8.85)

It can be seen that the mean groundwater head is described by the Laplace equation. Integrating once yields:

$\dfrac{\partial\mu_H}{\partial x} = c$   (8.86)

To find the value of the constant c we first return to Darcy’s law and assume that the constant flux q is known:

$q = -K\dfrac{\partial H}{\partial x} \;\Rightarrow\; \dfrac{\partial H}{\partial x} = -\dfrac{q}{K} \;\Rightarrow\; \dfrac{\partial H}{\partial x} = -q\,e^{-Y}$   (8.87)


The Taylor approximation of exp(-Y) around µY is given by:

$e^{-Y} = e^{-\mu_Y} - e^{-\mu_Y}(Y - \mu_Y) + \dfrac{1}{2}e^{-\mu_Y}(Y - \mu_Y)^2 - \ldots$   (8.88)

Keeping the second order terms and taking expectations then yields:

$\dfrac{\partial\mu_H}{\partial x} = -q\,e^{-\mu_Y}\left(1 + \tfrac{1}{2}\sigma_Y^2\right)$   (8.89)

Combining equations (8.86) and (8.89) then yields the value for the constant c. Integrating (8.89) once more yields the solution for the mean of H, if the value $h(x=0) = h_0$ is known:

$\mu_H(x) = h_0 - q\,e^{-\mu_Y}\left(1 + \tfrac{1}{2}\sigma_Y^2\right)x$   (8.90)

The variance can be obtained by writing the perturbations as spectral representations (see Equation 5.30):

$Y'(x) = \int_{-\infty}^{\infty} e^{i\omega x}\, dX_Y(\omega)$   (8.91)

$H'(x) = \int_{-\infty}^{\infty} e^{i\omega x}\, dX_H(\omega)$   (8.92)

where dX_Y(ω) and dX_H(ω) are the complex random amplitudes for frequency ω belonging to the ln K(x) and H(x) random functions, respectively. The equation for the head perturbation (8.84) with \partial\mu_Y/\partial x = 0 and setting J = \partial\mu_H/\partial x becomes:

\frac{\partial^2 H'}{\partial x^2} = -J \frac{\partial Y'}{\partial x} \qquad (8.93)

Substitution of the spectral representations in (8.93) gives:

\frac{\partial^2}{\partial x^2}\int_{-\infty}^{\infty} e^{i\omega x}\, dX_H(\omega) = -J \frac{\partial}{\partial x}\int_{-\infty}^{\infty} e^{i\omega x}\, dX_Y(\omega) \qquad (8.94)

As the integrals are with respect to the frequencies and not to x, differentiation and integration can be interchanged, leading to:

\int_{-\infty}^{\infty} \frac{\partial^2}{\partial x^2} e^{i\omega x}\, dX_H(\omega) = -J \int_{-\infty}^{\infty} \frac{\partial}{\partial x} e^{i\omega x}\, dX_Y(\omega)

\Rightarrow\; \int_{-\infty}^{\infty} -\omega^2 e^{i\omega x}\, dX_H(\omega) = -J \int_{-\infty}^{\infty} i\omega\, e^{i\omega x}\, dX_Y(\omega) \qquad (8.95)

Because (8.95) must hold for every x, the random amplitudes must be equal for each frequency ω, which leads to the following relationship:


\omega^2\, dX_H(\omega) = J\, i\omega\, dX_Y(\omega) \qquad (8.96)

or

dX_H(\omega) = \frac{-J\, dX_Y(\omega)}{i\omega} \qquad (8.97)

The following relationship holds between the complex amplitude and its complex conjugate:

E[dX(\omega_1)\, dX^*(\omega_2)] = \begin{cases} S(\omega)\, d\omega & \text{if } \omega_1 = \omega_2 = \omega \\ 0 & \text{if } \omega_1 \neq \omega_2 \end{cases} \qquad (8.98)

By multiplying Equation (8.97) by its complex conjugate (noting that random amplitudes at different frequencies are uncorrelated) and taking the expectation, the following equation is obtained that relates the spectrum of hydraulic head to that of log conductivity:

S_H(\omega) = \frac{J^2 S_Y(\omega)}{\omega^2} \qquad (8.99)

The variance of hydraulic head is then obtained by applying equation (5.26):

\sigma_H^2 = \int_{-\infty}^{\infty} S_H(\omega)\, d\omega = J^2 \int_{-\infty}^{\infty} \frac{S_Y(\omega)}{\omega^2}\, d\omega \qquad (8.100)

An analytical solution of the integral (8.100) can be obtained when the covariance function is the hole effect model (see table 5.3), having the spectrum:

S_Y(\omega) = \frac{a^3 \sigma_Y^2 \omega^2}{\pi (1 + a^2\omega^2)^2} \qquad (8.101)

The variance then becomes:

\sigma_H^2 = J^2 a^2 \sigma_Y^2 \qquad (8.102)

Figure 8.6 shows the expected value of the head as given by (8.90) for μ_Y = 0, σ_Y² = 1, q = 0.5 m/d and h(x=0) = 100 m for a domain length of 100 m (i.e. the head gradient is −0.75 m/m). Also given are the 95% confidence intervals of H(x) as calculated from (8.102) with correlation scale parameter a = 10 m (i.e. Var[H(x)] = 28.13 m²). Figure 8.6 shows that, apart from being approximations, the solutions also contain an inconsistency. To obtain a closed form solution for the expected head (8.90) we need to know h(x=0). So the actual head variance close to x = 0 should decrease to zero, as indicated by the dotted lines in Figure 8.6. Solution (8.102) is therefore only valid for large enough x, where the influence of the boundary condition has diminished. Boundary effects are treated extensively by Gelhar (1993).
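To make the first-order result concrete, the short script below is a minimal sketch (not part of the original text) that simply evaluates (8.90) and (8.102) for assumed parameter values similar to those of Figure 8.6; all variable names are illustrative only.

```python
import numpy as np

# Assumed parameter values, in the spirit of the Figure 8.6 example
mu_Y, var_Y = 0.0, 1.0        # mean and variance of ln K
q = 0.5                        # flux (m/d)
h0 = 100.0                     # head at x = 0 (m)
a = 10.0                       # correlation scale parameter of ln K (m)
x = np.linspace(0.0, 100.0, 101)

# Equation (8.90): mean head; J is the magnitude of the mean head gradient
J = q * np.exp(-mu_Y) * (1.0 + 0.5 * var_Y)
mu_H = h0 - J * x

# Equation (8.102): head variance (only valid away from the boundary at x = 0)
var_H = J**2 * a**2 * var_Y
half_width = 1.96 * np.sqrt(var_H)          # 95% confidence half-width
lower, upper = mu_H - half_width, mu_H + half_width

print(f"mean head gradient = {-J:.2f} m/m, Var[H] = {var_H:.2f} m^2")
```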


Similar analyses can be performed on flow and transport in higher dimensions. Gelhar (1993) presents many examples of these. Solutions that are valid for larger variances can be found in Dagan (1998).

Figure 8.6 Expected value of hydraulic head (solid line) calculated with Equation (8.90) and the 95%-confidence interval (dashed lines) according to Equation (8.102); the true confidence interval close to x = 0 is given by the dotted lines.

b) Monte Carlo simulation

If some of the conditions required for small perturbation analysis are not met, Monte Carlo analysis is an alternative. Consider two-dimensional flow in a heterogeneous porous medium (Equation 8.66). Suppose that transmissivity is a random function of space and we are interested in obtaining the probability distribution of hydraulic head (or its moments, e.g. mean and variance) at each location. Monte Carlo simulation then proceeds as follows (see also section 7.5; a minimal coded sketch of this loop is given after the list):
1) simulate a large number M of realizations of the transmissivity random function T(x). In case data on transmissivity are present, conditional geostatistical simulation is used, otherwise unconditional simulation. If the pdf of T(x) is not Gaussian, it can be transformed to a Gaussian pdf using a normal-score transform (see section 7.4.3). Assuming a multivariate Gaussian distribution, the random function Y(x) = G[T(x)] can be simulated using sequential Gaussian simulation (sGs) (see section 7.5);
2) the M realizations y^(k)(x), k = 1,..,M are transformed back to transmissivity fields by taking the inverse of the normal-score transform: T^(k)(x) = G^{-1}[y^(k)(x)];
3) the numerical groundwater model solving (8.66) for the required boundary conditions is run M times with each of the simulated transmissivity fields T^(k)(x) as input, yielding M realizations of hydraulic head: h^(k)(x), k = 1,..,M;
4) from the M realizations of hydraulic head the statistics (pdf, mean, variance, spatial covariance) of the (conditional or unconditional) head random function H(x) can be estimated.
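Purely as an illustration of steps 1–4 (and not the computation used in the text), the sketch below runs the Monte Carlo loop on a one-dimensional toy problem: unconditional multivariate Gaussian simulation via a Cholesky decomposition stands in for sGs, and a tiny finite-difference solver with fixed boundary heads stands in for MODFLOW; all parameter values and function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n, dx = 50, 10.0                                   # grid nodes and spacing (m)
xc = np.arange(n) * dx
mu_lnT, var_lnT, a = np.log(100.0), 1.0, 100.0     # assumed ln T statistics
C = var_lnT * np.exp(-np.abs(xc[:, None] - xc[None, :]) / a)   # exponential covariance
L = np.linalg.cholesky(C + 1e-10 * np.eye(n))      # factor for unconditional simulation

def solve_head(T, h_left=10.0, h_right=0.0):
    """Toy steady 1-D flow with fixed heads; harmonic-mean internodal transmissivities."""
    Tint = 2.0 * T[:-1] * T[1:] / (T[:-1] + T[1:])
    A = np.zeros((n, n)); b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    b[0], b[-1] = h_left, h_right
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i + 1] = Tint[i - 1], Tint[i]
        A[i, i] = -(Tint[i - 1] + Tint[i])
    return np.linalg.solve(A, b)

M = 1000
heads = np.empty((M, n))
for k in range(M):                                 # steps 1-3: simulate, back-transform, run model
    lnT = mu_lnT + L @ rng.standard_normal(n)
    heads[k] = solve_head(np.exp(lnT))

# step 4: ensemble statistics of hydraulic head
h_mean = heads.mean(axis=0)
h_std = heads.std(axis=0, ddof=1)
```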

The great advantage of Monte Carlo simulation compared to analytical stochastic analysis is that it is very general. It can be applied for any model under any type of boundary condition, for large variances of heterogeneous parameter fields and, given the right geostatistical simulation procedure, for any kind of random function that is used. Also, conditioning on observations can be achieved quite naturally using Monte Carlo analysis, either directly (e.g. conditioning on transmissivity measurements by conditional simulation) or indirectly through inverse methods (e.g. conditioning on head and concentration measurements: Gómez-Hernández et al., 1998). Another advantage, one that is not often spoken of, is that Monte Carlo simulation is a technique that is very simple to apply and requires much less mathematical background than the analytical approaches.

A disadvantage of Monte Carlo simulation is obviously the required computation time, especially when models are large (contain a large number of grid points). As will be shown in the example hereafter, quite a large number of realizations has to be analyzed to obtain stable estimates of higher order moments and of the tails of the probability distributions, especially if the number of grid nodes is large. A further disadvantage of Monte Carlo simulation is that, although with many realizations it yields an accurate estimate of the uncertainty, it does not reveal in a straightforward manner what the major causes of this uncertainty are. For instance, the small perturbation analysis reveals through Equation (8.102) that apart from the variance of ln K (which is obvious), also the mean head gradient and the correlation scale of ln K determine the variance (i.e. uncertainty) of hydraulic head. Such information could only be obtained through Monte Carlo analysis by performing a sensitivity analysis on top of it, e.g. repeating the Monte Carlo analysis for various settings of the mean head gradient and the variance and correlation scale of ln K. In large-scale problems, this is generally not feasible due to the enormous computation times involved.

In general, if insight into the nature of uncertainty propagation in some hydrological compartment is required (a more academic question), then analytical approaches are often used. Unconditional simulation is then applied to compare its results with the analytical solutions, i.e. to check whether the assumptions made to arrive at closed form analytical solutions are valid. In applied studies, where many of the assumptions underlying analytical approaches are not met, where measurements have to be taken into account and where the goal is a single uncertainty measure of the model output, Monte Carlo analysis (using conditional simulation) is the approach that is usually followed.

Example
As an example two-dimensional steady state flow in a square domain of 1000 × 1000 m is considered. The boundary of the domain has a prescribed hydraulic head of 0.0 m. Groundwater recharge takes place at a rate of 1 mm per day. This model represents an idealized version of an island in the sea. Groundwater flow takes place through an aquifer with a constant thickness of 10 m. The domain is modeled with a numerical groundwater model (MODFLOW, McDonald and Harbaugh, 2000) using a constant cell size of 50 m. Depth averaged hydraulic conductivity is heterogeneous and isotropic and its logarithm ln K is modeled with a stationary and isotropic multivariate Gaussian random field Y(x) with mean μ_Y = ln 10 (i.e. K_G = exp(μ_Y) = 10 m/d), a variance σ_Y² = 2 and a spherical covariance function (Table 5.1) with parameter a = 500 m. A single realization (simulated with the program SGSIM from GSLIB (Deutsch and Journel, 1998)) of hydraulic conductivity is used as reality. Figure 8.7 shows the simulated reality (left) and the associated head field (right) as calculated by the groundwater model.


Figure 8.7 Simulated reality: logarithm of hydraulic conductivity (left) and hydraulic head in m (right).

If the hydraulic head is calculated with an assumed homogeneous hydraulic conductivity of k = KG = 10 m/d it can be seen from Figure 8.8 that the results are far from the assumed reality. Hence, some form of uncertainty analysis using a random function to model heterogeneity is in order.

In the following the results from a Monte Carlo analysis are demonstrated. To investigate the effect of the number of observations on the head uncertainty, four sets of data points are sampled from the “real” field of hydraulic conductivity of Figure 8.7. In a real application usually a limited budget is available to take samples or to perform well or pumping tests to measure the depth averaged hydraulic conductivity at a number of locations. The four sets of data contain 9, 16, 25 and 36 observations respectively. The observations are taken on a square regular grid (3×3, 4×4, 5×5 and 6×6). Monte Carlo analysis was thus performed five times: once with unconditional simulation, and four times with conditional simulation using the four data sets.

The results are shown in Figures 8.9 and 8.10. Figure 8.9 shows examples of the realizations of ln K(x) and the associated hydraulic head calculated with the ln K(x) realizations as input. Realizations in the top row relate to unconditional simulation, the middle row to conditional simulation with 9 observations and the bottom row to conditional simulation with 36 observations. Figure 8.10 shows the mean head field and its standard deviation as obtained from 1000 realizations. Again, the upper row presents the results from unconditional simulation and the middle and lower rows from conditional simulation with 9 and 36 observations respectively.

These figures show that as the number of conditioning data increases, both the individual realizations and the mean head field become closer to the true ones. Also, the variance decreases quite significantly if the number of conditioning data increases. This is also shown in Figure 8.11, where the average head standard deviation of the model domain is plotted against the number of conditioning points. Obviously, the effect of conditioning data on the reduction of head variance depends on the correlation scale. If the correlation scale is small, many observations are needed to achieve a given reduction in uncertainty.


Figure 8.8 Hydraulic head (m) calculated with a presumed homogeneous porous medium with hydraulic conductivity K = K_G = 10 m/d.

Previously, the required number of realizations was discussed. To show the effect of the number of realizations on the accuracy of the uncertainty estimates, Figure 8.12 is added. This figure shows the estimate of the mean head and the head standard deviation in case of unconditional simulation using 100 instead of 1000 realizations. It can be seen that 100 realizations is sufficient to obtain accurate (stable) estimates of the mean, but far from sufficient to estimate the head variance. In fact, even in the case of 1000 realizations the estimate of the standard deviation still looks asymmetric for the unconditional case. This suggests that even more realizations are needed.⁵
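As a small illustration of how the stability of the ensemble statistics can be checked (with assumed numbers, not the actual model output), one can simply recompute the mean and standard deviation at a single model node for increasing numbers of realizations:

```python
import numpy as np

# The 'head' values below are drawn from an assumed distribution purely for
# illustration; in practice they would be the outputs h^(k)(x0) of the M model runs.
rng = np.random.default_rng(seed=2)
h = rng.lognormal(mean=0.0, sigma=0.5, size=1000)

for M in (10, 50, 100, 500, 1000):
    sample = h[:M]
    print(f"M = {M:5d}: mean = {sample.mean():6.3f}, std = {sample.std(ddof=1):6.3f}")
```

Typically the estimate of the mean stabilizes much earlier than the estimate of the standard deviation or of distribution tails, in line with the observation made above.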

⁵ In case of groundwater flow with random hydraulic conductivity, the largest variance between hydraulic head realizations generally occurs when the correlation length is such that zones of high conductivity cross the domain with 50% probability, such that half of the realizations show shortcuts and half don’t. For stationary and multivariate Gaussian random fields this is generally the case for correlation lengths between 0.5 and 1 times the domain size.


Figure 8.9 Realizations of ln K(x) (left column) and associated hydraulic head in m (right column); top row: unconditional simulation; middle row: conditional simulation with 9 observations; bottom row: conditional simulation with 36 observations.


Figure 8.10 Mean head (left column) and head standard deviation (right column) obtained from 1000 realizations of ln K(x); top row: unconditional simulation; middle row: conditional simulation with 9 observations; bottom row: conditional simulation with 36 observations.


Figure 8.11 Relationship between the domain average standard deviation of hydraulic head and the number of observations used for conditioning (zero observations means unconditional simulation).

Figure 8.12 Mean head and head standard deviation in case of unconditional simulation using 100 instead of 1000 realizations.

8.8 Exercises

8.1. Consider the following fitted relation between concentration of some pollutant C and discharge Q:

C = a − b ln(Q)

with Q a random variable with pdf f_Q(q). Give an expression for the pdf of C.


8.2. For the same relationship as given in question 8.1: the mean and variance of Q are 1.5 m³/s and 0.01 m⁶/s², respectively. C is expressed in mg/l and a = 200 mg/l and b = 20 mg/l. What are the mean and variance of C, based on a first order Taylor approximation? What is the mean of C based on a second order Taylor approximation?

8.3. Consider the same relationship as used in question 8.1, but now with known discharge q and random parameters A and B:

C = A − B ln(q)

with the following statistics: A: μ_A = 200, σ_A = 10 and B: μ_B = 20, σ_B = 1 and ρ_AB = 0.7. Calculate the mean and the variance of C.

8.4. The long term average variation of the phreatic level between two water courses (see Figure) at a distance l apart is given by the following equation:

H(x) = h_0 + \frac{p}{2T}\left(lx - x^2\right), \quad 0 \le x \le l

(Figure: cross-section between the two water courses a distance l apart, showing the phreatic surface H(x), the water level h0 in the water courses and the transmissivity T of the aquifer.)

with:
H(x): groundwater elevation (m) as a function of location x
l: distance between the water courses: 200 m
h0: water level in the water courses: 1 m
p: long term average groundwater recharge (0.001 m/d)
T: transmissivity of the aquifer (m²/d).

8.5. The transmissivity is not exactly known and is treated as a random variable with mean μ_T = 10 m²/d and a variance of σ_T² = 4 m⁴/d².

a. Use a first order Taylor approximation to estimate the mean μ_H(x) and the variance σ_H²(x) of hydraulic head for x = 0, 25, 50, 75, 100, 125, 150, 175, 200 m. Make a plot of the mean head and the 95%-confidence interval, assuming H(x) Gaussian distributed.
b. Explain why the width of the confidence interval changes with location x.

8.6. Consider the following differential equation describing the concentration C of some pollutant in a lake of known volume v , with constant but random influx Qin and known decay coefficient k:


v \frac{dC}{dt} = -kC + Q_{in}

Derive an expression for the mean and the variance of the concentration.


9 State estimation and data assimilation

9.1 Introduction

As discussed in the previous chapters, the dynamic behavior of hydrological systems is often described by a model. This can either be a time series model (Chapter 6), or differential equations based on physical processes (Chapter 8). Due to simplification, discretization and schematization, the model is only an approximation of reality. Moreover, parameters and input variables are not exactly known. They are subject to uncertainty. One of the objectives of stochastic hydrology is to quantify statistically the difference between the model and reality, or in other words to quantify the model's uncertainty.

A system is called causal if the system variables depend only on past and present input. For example, a river discharge at some time t = τ depends on rainfall up to that time (−∞ < t ≤ τ); rainfall after that time (t > τ) does not affect the discharge. If we are at time t = τ, the system's behavior in the future (t = τ + ℓ, ℓ > 0) depends on the system variables at time t = τ and the future inputs (τ < t ≤ τ + ℓ).

At discrete points in time and space, the state of the system can be observed and we can compare the state estimate with the observation. If all elements of the state are observed and observation errors can be neglected, we can simply replace the state estimate by the observation. However, in practice observation errors may often be significant. Moreover, not all elements of the state can be observed directly. Therefore, we have to deal with two sources of information to estimate the state, the model prediction and the observation, both with uncertainty. The aim of data assimilation is to optimally combine both sources of information. In this chapter we focus on the linear Kalman Filter, which is a powerful method in state estimation. First, in section 9.2 we discuss the principles of Kalman Filtering and we present the state estimation algorithm. In sections 9.3 and 9.4 it is demonstrated how the Kalman Filter can be applied to time series models and spatially distributed process models, respectively. Finally in section 9.5 some applications in hydrological practice are discussed.

9.2 Formulation of Kalman filtering

9.2.1 State equation and measurement equation. In this chapter we restrict ourselves to systems of which the evolution of the state can be described by the linear state equation (9.1) in discrete time:

zt = At zt−1 + Btut + wt (9.1)


Where:
z_t is the state vector at time t;
A_t is the parameter matrix, relating the state vector at time t to the state vector at time t−1;
u_t is the input vector, representing the known input (driving forces) at time t;
B_t is the parameter matrix, relating the state vector at time t to the input at time t;
w_t is the system noise vector, representing all influences that are not described by the model (the first two terms on the right-hand side of (9.1)). The system noise also includes unknown inputs and errors in the parameter matrices. In the literature the system noise is sometimes referred to as the model error.

Systems that can be described by matrices that are independent of time ( At=A and Bt=B) are called time invariant systems.

The observation process can be described by the measurement equation (9.2):

yt = Htzt + vt (9.2)

Where:
y_t is the measurement vector at time t, containing the observations;
H_t is the measurement matrix, relating the state at time t to the measurement vector at time t;
v_t is the measurement error vector at time t.

If the monitoring system doesn’t change with time, the measurement matrix is time invariant Ht = H.

9.2.2 State estimation algorithm. In the state estimation algorithm we use the following definitions and assumptions.

- The conditional estimate of the state vector at time t given observations up to time t and the corresponding error covariance matrix are denoted as:

\hat{z}_t = E[z_t \mid t] \quad \text{and} \quad P_t = E[(z_t - \hat{z}_t)(z_t - \hat{z}_t)^T] = E[e_{t|t}\, e_{t|t}^T] \qquad (9.3)

- The conditional estimate of the state vector at time t given observations up to time t−1 and the corresponding error covariance matrix are denoted as:

\bar{z}_t = E[z_t \mid t-1] \quad \text{and} \quad M_t = E[(z_t - \bar{z}_t)(z_t - \bar{z}_t)^T] = E[e_{t|t-1}\, e_{t|t-1}^T] \qquad (9.4)

Furthermore, we assume the system noise as well as the measurement error to be mutually independent white noise processes with known statistics:

E[w_t] = 0

E[w_t w_\tau^T] = \begin{cases} Q_t & \text{if } t = \tau \\ 0 & \text{if } t \neq \tau \end{cases}

E[v_t] = 0 \qquad (9.5)

E[v_t v_\tau^T] = \begin{cases} R_t & \text{if } t = \tau \\ 0 & \text{if } t \neq \tau \end{cases}

E[w_t v_\tau^T] = 0


Let the conditional estimate of the state vector at time t−1 given observations up to time t−1 (\hat{z}_{t-1}) and the corresponding error covariance matrix (P_{t-1}) be known. Also the model (known matrices A_t and B_t) and the input u_t are known. Then, the conditional estimate of the state at time t given the observations up to time t−1 follows from equation (9.1):

\bar{z}_t = A_t \hat{z}_{t-1} + B_t u_t \qquad (9.6)

Note that from the statistics (9.5) it follows that E[w_t | t-1] = 0. The conditional estimate (9.6) is called the prediction or time up-date. The error in the prediction is obtained by subtracting (9.6) from (9.1):

e_{t|t-1} = z_t - \bar{z}_t = A_t(z_{t-1} - \hat{z}_{t-1}) + w_t = A_t\, e_{t-1|t-1} + w_t \qquad (9.7)

From (9.7) it can be seen that the error of the prediction is a function of the error of the conditional estimate at the previous time step (t−1) and the system noise at time t. If the model (matrices A_t and B_t) is well calibrated, the expected value of the error terms is equal to 0. The covariance of the prediction error can be derived from (9.7):

M_t = E[e_{t|t-1}\, e_{t|t-1}^T] = A_t\, E[e_{t-1|t-1}\, e_{t-1|t-1}^T]\, A_t^T + Q_t = A_t P_{t-1} A_t^T + Q_t \qquad (9.8)

At time t the observations y_t become available, and we would like to correct the prediction with these observations. This can be done by calculating the conditional estimate of the state vector at time t given observations up to time t. It can be proven that we get the minimum error covariance if the correction step (also called measurement up-date) is done by:

\hat{z}_t = \bar{z}_t + K_t\{y_t - H_t \bar{z}_t\} \qquad (9.9)

With

K_t = M_t H_t^T \{H_t M_t H_t^T + R_t\}^{-1} \qquad (9.10)

The matrix Kt is called the Kalman gain.

Finally with the error statistics (9.5) and the equations (9.9) and (9.10) it can be proven that the covariance matrix of the conditional state estimate at time t given observations up to time t equals:

Pt = {I − KtHt }Mt (9.11)

The measurement up-date is the optimal linear estimate of the state at time t given observations up to time t (minimal error variance). With the measurement up-date (9.9) and the corresponding covariance matrix (9.11) we can repeat the calculation for time step t+1 . The state estimation algorithm is summarized below.


Figure 9.1 Linear Kalman Filter algorithm:

initialization: \hat{z}_0, P_0

time up-date: \bar{z}_t = A_t \hat{z}_{t-1} + B_t u_t; \quad M_t = A_t P_{t-1} A_t^T + Q_t

Kalman gain: K_t = M_t H_t^T [H_t M_t H_t^T + R_t]^{-1}

measurement up-date (using y_t): \hat{z}_t = \bar{z}_t + K_t(y_t - H_t \bar{z}_t); \quad P_t = (I - K_t H_t) M_t

next time step: \bar{z}_{t+1} = A_{t+1} \hat{z}_t + B_{t+1} u_{t+1}; \quad M_{t+1} = A_{t+1} P_t A_{t+1}^T + Q_{t+1}

i. In hydrology we often have time invariant systems (A_t = A, B_t = B, Q_t = Q and R_t = R).
ii. In practice, the matrices A_t, B_t, Q_t and R_t have to be estimated or calibrated. However, calibration is beyond the scope of this chapter. More information can be found in Van Geer et al. (1992).
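A direct transcription of the algorithm of Figure 9.1 into code could look as follows. This is a minimal sketch (the function name and interface are illustrative, not from the text) assuming time invariant matrices; passing y=None signals a time step without observations, in which case only the time up-date is performed.

```python
import numpy as np

def kalman_step(z_hat, P, u, y, A, B, H, Q, R):
    """One cycle of the linear Kalman Filter of Figure 9.1 (time invariant matrices assumed)."""
    # time up-date (prediction)
    z_bar = A @ z_hat + B @ u
    M = A @ P @ A.T + Q
    if y is None:                        # no observation available: keep the prediction
        return z_bar, M
    # measurement up-date
    K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)
    z_new = z_bar + K @ (y - H @ z_bar)
    P_new = (np.eye(len(z_hat)) - K @ H) @ M
    return z_new, P_new
```

Starting from the initial estimate and covariance, repeatedly calling this function with the inputs u_t and, where available, the observations y_t produces the sequence of state estimates and error covariance matrices.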

9.3 Kalman Filtering and time series

9.3.1 Kalman Filter algorithm for Transfer function/noise model. The properties of the Kalman Filter are discussed in this section with the help of a simple transfer/noise model. Consider the simple transfer/noise model:

Z_t = \phi_1 Z_{t-1} + \omega_0 X_t + a_t \qquad (9.12)

Equation (9.12) is a scalar form of the state equation (9.1) where:

z_t = Z_t, \quad A_t = \phi_1, \quad B_t = \omega_0, \quad u_t = X_t, \quad w_t = a_t \text{ with } Q_t = \sigma_a^2 \qquad (9.13)

In chapter 6 we neglected measurement errors, but here we account for measurement errors. At observation times the measurement equation is:

y_t = Z_t + v_t \qquad (9.14)

Where we have:

H_t = 1, \quad R_t = E[v_t^2] = \sigma_v^2 \qquad (9.15)

Suppose we have an observation at each time step and that the input X_t as well as the initial conditions \hat{z}_{t-1} and P_{t-1} are known. Then, substitution of the quantities (9.13) and (9.15) into the algorithm of section 9.2 yields:


Time update:

\bar{z}_t = \phi_1 \hat{z}_{t-1} + \omega_0 X_t; \quad M_t = \phi_1^2 P_{t-1} + \sigma_a^2 \qquad (9.16)

Measurement up-date:

K_t = \frac{M_t}{M_t + \sigma_v^2}; \quad \hat{z}_t = \bar{z}_t + K_t(y_t - \bar{z}_t); \quad P_t = (1 - K_t) M_t

9.3.2 Properties of the linear Kalman Filter applied to a simple Transfer function/noise model.

Balance
From (9.16) we can see that the Kalman Filter balances the information from the model prediction and the observation.
- One extreme is that we have perfect observations (σ_v² = 0). It follows from (9.16) that K_t = 1 and therefore \hat{z}_t = y_t: the best estimate of the state at time t is equal to the observation at that time.
- The other extreme is that we have a perfect model (σ_a² = 0); it follows that the Kalman gain K_t = 0 and \hat{z}_t = \bar{z}_t. In this case we keep the model prediction and the observation does not contain any additional information.
- If the uncertainty of the model and that of the observation are equal (σ_a² = σ_v²), it follows that K_t = ½ and the best state estimate is the average of the model prediction and the observation.

In general, it can be stated that 0 ≤ K_t ≤ 1. If the model is more uncertain the Kalman gain will tend to 1, and if the observations are more uncertain the Kalman gain will tend to 0. Therefore we can consider the Kalman gain as a mechanism that balances the model and the observation according to their relative uncertainty. Because the Kalman gain is always between 0 and 1, the measurement up-date (\hat{z}_t) is always in between the time up-date (\bar{z}_t) and the observation (y_t).

Magnitude of the measurement up-date
From the algorithm (9.16) it can also be seen that the magnitude of the measurement up-date depends on the difference between the time up-date and the observation (y_t − \bar{z}_t, the innovation). If the observation is far away from the time up-date (y_t − \bar{z}_t is large), the correction in the measurement up-date is large. If the observation is close to the time up-date (y_t − \bar{z}_t is small), the correction in the measurement up-date is small.


Uncertainty reduction
Because the Kalman gain is a positive number between 0 and 1, the last equation of algorithm (9.16) shows that the variance of the measurement up-date is always smaller than (or equal to) the variance of the time up-date (P_t ≤ M_t). Also, it can be proven that the variance of the measurement up-date is always smaller than (or equal to) the variance of the measurement error (P_t ≤ σ_v²). Combining the model with observations thus yields a reduction of the uncertainty relative to the prediction with the model (time up-date) as well as relative to the observation.

Non-observed time steps
The algorithm (9.16) is valid if we have observations at each time step. In practice, at certain points in time we may not have observations and we cannot calculate the measurement up-date. At non-observed time steps, the algorithm reduces to (see also the forecast in chapter 6):

\bar{z}_t = \phi_1 \hat{z}_{t-1} + \omega_0 X_t
M_t = \phi_1^2 P_{t-1} + \sigma_a^2 \qquad (9.17)
\hat{z}_t = \bar{z}_t
P_t = M_t

In figure 9.2 we show the effect of observed and non-observed time steps. In between the observation times, the error variance grows until the next observation time. At the observation time the error variance drops below the variance of the observation error.

Figure 9.2. Error variance (P t) at observed and non-observed time steps.

State estimation with the Kalman Filter compared to open loop
The effect of state estimation with Kalman Filtering can be illustrated by comparison with an open loop estimate. An open loop estimate is a state estimate obtained without using observations. Figure 9.3 shows the comparison for a time series model of the form (9.12). In this illustration the Kalman Filter uses observations at each time step.


Figure 9.3. Comparison state estimation using Kalman Filtering and open loop.

From figure 9.3 it can be seen that at each observation time, the optimal estimate (measurement up-date) is in between the prediction (time up-date) and the observation. The open loop estimate uses only the input X_t; it does not use the observations. The time up-date is similar to that in the Kalman Filter, but there is no measurement up-date. The most important consequence is that for the open loop the errors propagate in time. If at some point in time the estimate of Z_t is too small, it tends to persist in being too small for the next time steps. The errors in the open loop estimate are correlated in time, whereas for the Kalman Filter it can be proven that the errors of the measurement up-date are uncorrelated in time and the estimates are closer to the observations.

9.3.3 Example groundwater head time series. Many time series of shallow groundwater head show a seasonal pattern, which is driven by the seasonal behavior of precipitation and evapotranspiration (see figure 9.4). The groundwater observation frequency is 24 times per year, which yields an observation interval of ca. 15.2 days. This interval is taken as the time step in the time series model. For each time step, we calculate the average precipitation excess from the available observations of precipitation and evapotranspiration. The groundwater head is described by the transfer/noise model:

h^*_t = \delta_1 h^*_{t-1} + \omega_0 p_t \qquad (9.18)

n_t = \phi_1 n_{t-1} + a_t; \qquad h_t = h^*_t + n_t + c \qquad (9.19)

Where:
h_t is the groundwater depth below surface at time t;
p_t is the precipitation excess during the time step from t−1 to t;
h^*_t is the component of the groundwater head due to the precipitation excess;
n_t is the noise component at time t;
a_t is the white noise at time t;
\delta_1, \omega_0 and \phi_1 are parameters of the transfer/noise model;
c is the long term average groundwater depth.


Figure 9.4 Situation of groundwater observation.

To formulate the transfer/noise model in term of the state equation of the Kalman Filter, we define the state vector, the input vector and the system noise vector respectively as:

z_t = \begin{bmatrix} h^*_t \\ n_t \end{bmatrix}, \quad u_t = p_t, \quad w_t = \begin{bmatrix} 0 \\ a_t \end{bmatrix} \qquad (9.20)

Using the definitions (9.20), the state equation of the form (9.1) for the model (9.18)–(9.19) becomes:

z_t = \begin{bmatrix} \delta_1 & 0 \\ 0 & \phi_1 \end{bmatrix} z_{t-1} + \begin{bmatrix} \omega_0 \\ 0 \end{bmatrix} p_t + w_t \qquad (9.21)

We have an observation of h_t at every time step. The measurement equation is:

yt = ht + vt (9.22)

Where: y_t is the observation at time t; v_t is the measurement error at time t.

The measurement error standard deviation depends on the equipment used. For monitoring shallow groundwater we estimated the measurement error standard deviation as σ_v = 0.1 cm. The parameters of the time series model, including the standard deviation of the system noise, are estimated as:

δ₁ = 0.865, ω₀ = 0.515 (cm/mm), φ₁ = 0.619, c = 23 (cm), σ_a = 20.4 (cm)

In figure 9.5 the results of the state estimation with the open loop and the Kalman Filter, respectively, are visualized. The open loop estimate is the forecast using the observations of the precipitation excess and the transfer function. In the open loop, the groundwater observations are not used, whereas in the state estimation with the Kalman Filter we have an observation of the groundwater head every three time steps.

Although both estimates in figure 9.5 resemble the seasonal pattern quite well, it is clear that the Kalman Filter estimate is closer to the observations. This effect would be even stronger if we applied a higher observation frequency.
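As an illustration only (the original data and computations are not available here), the sketch below simulates the two-component state-space form given above with synthetic precipitation excess and compares the Kalman Filter, with an observation every three time steps, to an open loop run. The forcing, the initial covariance and the random seed are assumptions; the parameter values follow the text.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
d1, w0, f1, c = 0.865, 0.515, 0.619, 23.0   # delta_1, omega_0, phi_1, c (values from the text)
sig_a, sig_v = 20.4, 0.1                     # system noise and measurement error st.dev. (cm)

nt = 240                                     # about ten years of ~15-day time steps
p = rng.gamma(shape=2.0, scale=5.0, size=nt) - 7.0   # assumed synthetic precipitation excess (mm)

A = np.array([[d1, 0.0], [0.0, f1]])
B = np.array([w0, 0.0])
Q = np.diag([0.0, sig_a**2])
H = np.array([[1.0, 1.0]])
R = np.array([[sig_v**2]])

# synthetic "truth" and observations (head = transfer component + noise component + c)
z_true = np.zeros((nt, 2))
for t in range(1, nt):
    z_true[t] = A @ z_true[t - 1] + B * p[t] + np.array([0.0, rng.normal(0.0, sig_a)])
y = z_true.sum(axis=1) + c + rng.normal(0.0, sig_v, nt)

# Kalman Filter versus open loop
z_hat, P, z_ol = np.zeros(2), np.diag([1e4, 1e4]), np.zeros(2)   # assumed initial conditions
kf_head, ol_head = np.zeros(nt), np.zeros(nt)
for t in range(1, nt):
    z_bar = A @ z_hat + B * p[t]                          # time up-date
    M = A @ P @ A.T + Q
    if t % 3 == 0:                                        # measurement up-date at observation times
        K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)
        innov = y[t] - c - (H @ z_bar)[0]                 # innovation
        z_hat = z_bar + K[:, 0] * innov
        P = (np.eye(2) - K @ H) @ M
    else:
        z_hat, P = z_bar, M
    z_ol = A @ z_ol + B * p[t]                            # open loop: never corrected
    kf_head[t], ol_head[t] = c + z_hat.sum(), c + z_ol.sum()
```

Plotting kf_head and ol_head against the synthetic observations reproduces the qualitative behavior of figures 9.5 and 9.6: the filtered estimate stays close to the observations, while the open loop errors persist in time.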


Figure 9.5. State estimation Kalman filter and open loop estimation compared to groundwater head observations.

The differences between the observations and the Kalman Filter estimate and the open loop estimate, respectively, are shown in figure 9.6. From figure 9.6 it is even clearer that the Kalman Filter estimate is closer to the observations than the open loop estimate. Moreover, the differences of the open loop estimate show more persistence in time. In other words, the errors of the open loop estimate show more temporal correlation than the Kalman Filter estimation errors.

Figure 9.6. Difference between observations and respectively the Kalman Filter estimate and the open loop estimate.

9.4 Kalman Filtering and spatially distributed systems
In section 9.3 we discussed the application of the Kalman Filter to a scalar time series model. However, Kalman Filtering can be applied to any system that can be described by a linear difference equation. In this section, we show how the Kalman Filter can be applied to a simple spatially distributed groundwater system.

9.4.1 Example uncertainty for one dimensional groundwater flow. We consider the one dimensional groundwater system shown in figure 9.7. The system can be described with the differential equation (9.23). The groundwater head is a function of the spatial coordinate and time, h(x,t). The left hand side boundary condition is the surface water level H_l and the right hand side boundary condition is H_r.


Figure 9.7. One dimensional groundwater system.

The governing differential equation is:

S \frac{\partial h}{\partial t} = T \frac{\partial^2 h}{\partial x^2} \qquad (9.23)

Where:
h is the groundwater head (a function of the space coordinate x and time t);
S is the storage coefficient;
T is the transmissivity.

The equation (9.23) is discretized by using the following difference approximations:

\frac{\partial h}{\partial t} \approx \frac{h_{i,t} - h_{i,t-1}}{\delta t} \qquad (9.24)

and

\frac{\partial^2 h}{\partial x^2} \approx \frac{(h_{i+1,t} - h_{i,t}) - (h_{i,t} - h_{i-1,t})}{(\delta x)^2} \qquad (9.25)

Substitution of (9.24) and (9.25) in (9.23) yields:

\left(1 + 2/\alpha\right) h_{i,t} - \left(1/\alpha\right) h_{i-1,t} - \left(1/\alpha\right) h_{i+1,t} = h_{i,t-1} \qquad (9.26)

with:

\alpha = \frac{S (\delta x)^2}{T\, \delta t} \qquad (9.27)

If there are N grid nodes, we define the state vector as:

 h ,1 t   M    h   i− ,1 t  h t =  h ,ti  (9.28)   hi+ ,1 t M       hN ,t 


Using (9.26) and (9.28) the linear state equation is:

ht = Ah t−1 + Bu t + wt (9.29)

Where:

\mathbf{A}^{-1} = \begin{bmatrix} (2/\alpha)+1 & -1/\alpha & 0 & \cdots & 0 \\ -1/\alpha & (2/\alpha)+1 & -1/\alpha & & \vdots \\ 0 & -1/\alpha & (2/\alpha)+1 & \ddots & \\ \vdots & & \ddots & \ddots & -1/\alpha \\ 0 & \cdots & & -1/\alpha & (2/\alpha)+1 \end{bmatrix} \qquad (9.30)

and

\mathbf{B}\mathbf{u}_t = \mathbf{A}\begin{bmatrix} 1/\alpha & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ 0 & 1/\alpha \end{bmatrix}\begin{bmatrix} H_l \\ H_r \end{bmatrix} \qquad (9.31)

At a point in time t we observe the groundwater at M grid nodes. This results in the measurement equation:

L 0 1 0 0  h1   v1        0 1 M M M y =     +   (9.32) t  M  M   M       L  0 1 0 hN  vM 

The elements of the measurement matrix ( H) all equal zero, except for the elements corresponding with the grid node where we have an observation. Those elements equal 1.

The estimation uncertainty for the groundwater system given in figure 9.7, is quantified with the covariance matrix Pt. In particular the diagonal elements hold the estimation variance of the grid nodes. At observation times we calculate Pt using the equations (9.8), (9.10) and (9.11). Analogous to (9.17) at time steps in between the observation time steps this reduces to Pt=Mt. To perform the calculations, we should know the matrices A, H, Q and R. In table 9.1 the values of the parameters which determine these matrices in this example are given.


Table 9.1 Parameter values of the example system of figure 9.7.
T: 600 m²/day
S: 0.1
δx: 150 m
δt: 15 days
q(i,j): element at row i and column j of the system noise covariance matrix Q; its value depends on the distance d(i,j) between the spatial nodes i and j
r(i,j): 1 cm² if i = j, 0 cm² if i ≠ j; r(i,j) is the element at row i and column j of the measurement error covariance matrix R

Note: If we only want to calculate the covariance matrix Pt, we don’t need to calculate the state and consequently we don’t need the matrix B and the input ut. This can be seen from the scheme given in figure 9.1.
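The covariance-only calculation can be sketched as follows. The matrix A follows from (9.30) with the values of Table 9.1; because the exact distance-based form of q(i,j) is not reproduced here, the system noise covariance Q and the initial error covariance below are assumed stand-ins, so the printed numbers are illustrative only.

```python
import numpy as np

N = 50
T, S, dx, dt = 600.0, 0.1, 150.0, 15.0
alpha = S * dx**2 / (T * dt)                   # equation (9.27)

# Equation (9.30): A^{-1} is tridiagonal; invert it to obtain A
Ainv = (np.diag(np.full(N, 2.0 / alpha + 1.0))
        + np.diag(np.full(N - 1, -1.0 / alpha), k=1)
        + np.diag(np.full(N - 1, -1.0 / alpha), k=-1))
A = np.linalg.inv(Ainv)

d = np.abs(np.subtract.outer(np.arange(N), np.arange(N))) * dx
Q = 4.0 * np.exp(-d / 600.0)                   # assumed distance-based system noise covariance (cm^2)
R = np.eye(3)                                  # measurement error covariance (cm^2), Table 9.1
H = np.zeros((3, N))
H[0, 14] = H[1, 29] = H[2, 44] = 1.0           # observations at grid nodes 15, 30 and 45

P = Q.copy()                                   # assumed initial error covariance
for t in range(1, 51):
    M = A @ P @ A.T + Q                        # time up-date of the covariance, eq. (9.8)
    if t % 5 == 0:                             # observations every 5 time steps
        K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)   # eq. (9.10)
        P = (np.eye(N) - K @ H) @ M            # eq. (9.11)
    else:
        P = M                                  # non-observed time step

var_nodes = np.diag(P)                         # estimation variance per grid node
print(var_nodes[[14, 19, 23]])                 # e.g. nodes 15, 20 and 24 as in figure 9.9
```

Note that, as stated above, the state itself and hence the matrix B and the input u_t are not needed for this calculation.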

In the example we have 50 grid nodes (N = 50) and three monitoring locations (M = 3). The monitoring locations are at the grid nodes 15, 30 and 45, and we have observations every 5 time steps. Figure 9.8 shows the variance p(i,i) as a function of the grid nodes, at five successive time steps. The variance p(i,i) is the diagonal element at row i of the matrix P_t. The influence of the observations can clearly be seen. At measurement times (T), at the monitoring locations the variance is small (< 1 cm²). In between the monitoring locations the variance is larger. During the non-observed time steps (T+1 until T+4) the variance grows. The influence of the observations is still visible, but as the time since the last observation becomes larger, the influence of the observations becomes smaller. At the next measurement time, new observations are available and the variance drops again to the level of the previous measurement time step. This cycle of the variance repeats itself. Also the effect of the boundary conditions, which are assumed to be exactly known at all time steps, is clear from figure 9.8.

Figure 9.8. Variance at 5 time steps as a function of the space coordinate x.

Figure 9.9 shows the estimation variance at the grid nodes 15, 20 and 24 as a function of time. The grid node 15 is a monitoring location whereas grid nodes 20 and 24 are in between the observation locations.


Figure 9.9. Estimation variance at three different grid nodes as a function of time.

9.4.2 Example two dimensional groundwater flow. Let’s consider the hypothetical two-dimensional domain given in figure 9.10. The size of the domain is 30 x 30 km and a numerical groundwater model is built with a square calculation grid δx = δy = 1km and the model time step δt = 1day . Similar to the one dimensional example in 9.4.1, we can construct a state equation of the form (9.29) and a measurement equation of the form (9.32). There is a monitoring network with 16 locations at regular distances, where the groundwater head is observed. The spacing between the observation screens is ∆x = ∆y = 10 km . The groundwater head is observed at regular time intervals of 10 times the calculation time step ( ∆t = 10 δt ). The system is time invariant and the matrices A, B, Q and R are known. At observation times those elements of measurement matrix H that correspond with the 16 locations of the observations equal 1, the other elements equal 0.

Using the Kalman Filter algorithm given in 9.2.2 we calculate the estimation covariance matrix (measurement up-date) at all time steps (P_t). As before, the element p(i,i) on the diagonal of P_t is the variance of the measurement up-date at grid node i. Also, at non-observed time steps we calculate the matrix P_t with P_t = M_t. The ratio between the variances in the matrix P_t and the variance of the system noise (denoted as σ_P²/σ_Q²) is plotted in figure 9.11. The figure at the left hand side is the spatial pattern of this ratio at an observation time, and the figure at the right hand side is the pattern of the ratio 9 time steps (δt) after an observation time.


Figure 9.10. Hypothetical two-dimensional groundwater flow domain.

Figure 9.11. Ratio of the estimation variance to the variance of the system noise (σ_P²/σ_Q²) at an observation time (left) and 9 time steps after an observation time (right).

From the left hand side of figure 9.11 we can easily identify the 16 observation locations as the dark spots. At the observation locations the ratio is very small (< 0.1). Thus, the uncertainty is also very small at those locations. If we move away from an observation location the ratio grows. At points far from the observation locations, such as point (15,15), the ratio exceeds 0.5. Observations provide more information in the vicinity of the observation locations than at points at some distance from the observation locations. The pattern in the left hand side of figure 9.11 is comparable to the kriging variance described in chapter 7. In the right hand side of figure 9.11, the observation locations are not identifiable. Obviously, the observations carry information about the groundwater for a limited period of time only. The ratio is much larger than at observation times. In the middle (location (15,15)) the ratio is over 2.2. The low values at the boundaries are due to the fact that in this hypothetical example the boundary conditions are known, fixed heads.

As shown in figure 9.11, the spatial patterns of the ratio at the two points in time are quite different. It is clear that the estimation variance is a function of space as well as of time. In figure 9.12 the temporal behavior of the ratio σ_P²/σ_Q² is given for two locations (see figure 9.10). At the observation times, groundwater observations are available at location (10,10). Location (15,15) is the location farthest from the observation locations.

Figure 9.12. Ratio σ_P²/σ_Q² as a function of time for the two locations (10,10) and (15,15).

From figure 9.12 it can be seen that the ratio grows with time after an observation time. At the next observation time, the ratio drops. The ratio at the monitoring location is always smaller than at the non-observed location.

9.4.3 Evaluation of a monitoring strategy. Let's have a closer look at the Kalman Filter algorithm. For a time invariant system, the only reason for the covariance matrices M_t and P_t to be functions of time is the fact that the measurement matrix H_t is a function of time. The measurement matrix H_t holds the monitoring strategy (when and where do we have observations). So the covariance matrix of the measurement up-date P_t is a function of the monitoring strategy. This can be seen from the Kalman Filter algorithm. Similar to the kriging variance, we can calculate the covariance matrix P_t for a given monitoring strategy (locations and times) without having actual observations. We can use the relationship between P_t and H_t to evaluate the effectiveness of monitoring strategies. Here the domain average ratio σ_P²/σ_Q² is adopted as evaluation criterion. The spacing between the observation locations is varied between two and ten times the grid size of the model grid (0.1 ≤ δx/∆x ≤ 0.5). The observation interval is varied between two and ten times the time step of the model (0.1 ≤ δt/∆t ≤ 0.5).


In figure 9.13 the domain average ratio σ_P²/σ_Q² is plotted against the ratios δx/∆x and δt/∆t. In this figure equipotential lines are given for the ratio σ_P²/σ_Q². Figure 9.13 shows that an increase of the network density (smaller spacing of the observation locations) or an increase of the observation frequency (smaller observation intervals) results in smaller values of the ratio σ_P²/σ_Q², and therefore in smaller confidence intervals. Also, there is a trade-off between network density and observation frequency. The equipotential lines give multiple combinations of network density and observation frequency that result in the same value of the ratio σ_P²/σ_Q². This type of analysis can be used in optimizing the monitoring strategy. If we specify a target level of the ratio σ_P²/σ_Q² we can choose the most suitable (= cheapest) combination of network density and observation frequency.
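A sketch of such an evaluation is given below. It uses a one-dimensional stand-in for the two-dimensional model and assumed noise covariances, loops over combinations of observation spacing and observation interval, and reports the time- and domain-averaged variance ratio for each combination; the function name and all numbers are illustrative.

```python
import numpy as np

def avg_variance_ratio(n_skip_space, n_skip_time, N=50, alpha=0.25, n_steps=60):
    """Domain- and time-averaged ratio of estimation variance to system-noise variance
    for an assumed 1-D system (alpha = 0.25 corresponds to the Table 9.1 values)."""
    Ainv = (np.diag(np.full(N, 2/alpha + 1)) + np.diag(np.full(N - 1, -1/alpha), 1)
            + np.diag(np.full(N - 1, -1/alpha), -1))
    A = np.linalg.inv(Ainv)
    Q = np.eye(N) * 4.0                              # assumed system noise covariance (cm^2)
    obs_nodes = np.arange(n_skip_space, N, n_skip_space)
    H = np.zeros((len(obs_nodes), N))
    H[np.arange(len(obs_nodes)), obs_nodes] = 1.0    # observation locations
    R = np.eye(len(obs_nodes))                       # assumed measurement error covariance
    P, ratios = Q.copy(), []
    for t in range(1, n_steps + 1):
        P = A @ P @ A.T + Q                          # time up-date
        if t % n_skip_time == 0:                     # measurement up-date at observation times
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
            P = (np.eye(N) - K @ H) @ P
        ratios.append(np.mean(np.diag(P)) / np.mean(np.diag(Q)))
    return np.mean(ratios)

for spacing in (2, 5, 10):                           # observation every 2, 5, 10 grid nodes
    for interval in (2, 5, 10):                      # observation every 2, 5, 10 time steps
        print(spacing, interval, round(avg_variance_ratio(spacing, interval), 2))
```

Tabulating these values for a range of spacings and intervals gives the kind of trade-off map shown in figure 9.13, from which the cheapest combination meeting a target ratio can be selected.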

Figure 9.13 Domain average ratio σ_P²/σ_Q² as a function of the ratios δx/∆x and δt/∆t.

9.5 Exercises

9.1 The state equation of a scalar process is: z_t = 0.5 z_{t−1} + 1.5 u_t + w_t and the measurement equation is y_t = z_t + v_t. The variances of w_t and v_t are … and …, respectively. In the table below values are given for the input (u_t; t = 1,..,10), the process (z_1) and the measurements (y_t; t = 3, 6, 9).

t:    1    2    3    4    5    6    7    8    9    10
u_t:  3    8    12   5    2    6    9    11   7    8
z_t:  44.5
y_t:            32             20             22

a. Calculate the Kalman Gain.


b. Calculate the time update for time steps 2 and 3.
c. Calculate the measurement update for time step 3.
d. Calculate the time update for time steps 4, 5 and 6.
e. Calculate the measurement update for time step 6.


10 References

Alabert, F.G.,1987. The practice of fast conditional simulations through the LU- decomposition of the covariance matrix. Mathematical Geology 19(5), 369-386.

Armstrong, M. and P. Dowd (Eds), 1994. Geostatistical Simulations . Proceedings of the Geostatistical Simulation Workshop, Fontainebleau, France, 27-28 May 1993, Series on Quantitative Geology and Geostatistics, volume 7. Kluwer Academic Publishers, Dordrecht, Holland.

Bear, J., 1972. Dynamics of Fluids in Porous Media . Elsevier, New York.

Bierkens, M.F.P., P.A. Finke and P. de Willigen, 2000. Upscaling and Downscaling Methods for Environmental Research. Developments in Plant and Soil Sciences 88, Springer, Berlin.

Bierkens, M.F.P., 2006. Designing a monitoring network for detecting groundwater pollution with stochastic simulation and a cost model. Stochastic Environmental Research and Risk Assessment 20, 335-351.

Box, G.E.P. and D.R. Cox, 1964. An analysis of transformations. Journal of the Royal Statistical Society, Series B 26(2): 211–252.

Box, G.E.P. and G.M. Jenkins, 1976. Time series analysis, forecasting and control. Revised Edition . Holden-Day, San Francisco.

Chatfield, C., 1989. The analysis of time series. An introduction. Fourth edition . Chapman and Hall, London.

Chilès, J.-P. and P. Delfiner, 1999. Geostatistics. Modelling Spatial Uncertainty . John Wiley & Sons, New York.

Christakos, G., 1992, Random Field Models in the Earth Sciences , Academic Press, San Diego, CA, USA

Christakos, G., 2000. Modern Spatiotemporal Geostatistics . Oxford University Press, Oxford.

Cressie, N.A.C., 1993. Statistics for Spatial Data . Revised Edition. John Wiley & Sons, New York.

David, M., 1977. Geostatistical Ore Reserve Estimation . Elsevier, Amsterdam, 364 p.

Dagan, G., 1989. Flow and Transport in Porous Formations . Springer Verlag, Berlin.


De Gruijter, J.J., D.J. Brus, M.F.P. Bierkens and M. Knotters, 2006. Sampling for Natural Resources Monitoring. Springer, Berlin.

De Marsily, G., 1986. Quantitative Hydrogeology: Groundwater Hydrology for Engineers. Academic Press, Inc, Orlando, 440 p.

Deutsch, C.V. and A.G Journel, 1998 . GSLIB, Geostatistical Software Library and User’s Guide . 2nd Edition. Oxford University Press, New York, 369 pp.

Gardiner, C.W., 1983. Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences. Second edition. Springer Verlag, Berlin.

Gelhar, L.W., 1993. Stochastic Subsurface Hydrology . Prentice Hall, Englewood Cliffs, New Jersey.

Gómez-Hernández, J.J. and A.G. Journel, 1993. Joint sequential simulation of MultiGaussian fields. In: A. Soares (Ed.), Geostatistics Tróia ’92, Volume 1, pp. 85-94. Kluwer Academic Publishers, Dordrecht.

Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation . Oxford University Press, New York, 483 p.

Grimmett, G.R. and D.R. Stirzaker, 1998. Probability and Random Processes. Oxford University Press, Oxford.

Heuvelink, G.B.M., 1998. Error Propagation in Environmental Modelling with GIS . Taylor & Francis, London.

Hipel, K.W. and A.I. McLeod, 1994 . Time series modelling of water resources and environmental systems . Elsevier, New York.

Hipel, K.W., W.C. Lennox, T.E. Unny and A.I. McLeod, 1975 . Intervention analysis in water resources. Water Resources Research 11: 855-861 .

Hosking, 1985. Journal of Hydrology 78, 393-396.

Isaaks, E.H. and R.M. Srivastava, 1989. An Introduction to Applied Geostatistics. Oxford University Press, New York, 561 p.

Journel, A.G. and C.J. Huijbregts, 1978. Mining Geostatistics . Academic Press, New York, 600 p.

Knotters, M. and M.F.P. Bierkens, 1999. Physical basis of time series models for water table depths. Water Resources Research 36(1), 181-188.

Krige, D.G., 1951. A Statistical Approach to Some Mine Valuation and Allied Problems on the Witwatersrand. Master's thesis, University of the Witwatersrand.

Koutsoyiannis, D., 2010. A random walk on water. Hydrology and Earth System Sciences 14, 585–601, 2010.

Landwehr et al. Water Resources Research 1997(10), 1055-1063.

Mantoglou, A. and J.L. Wilson, 1982. The turning bands method for simulation of random fields using line generation by the spectral method, Water Resources Research 18(5): 1379- 1394.

Matheron, G., 1970. La Théorie des Variables Régionalisées et ses Applications. Fascicule 5, Les Cahiers du Centre de Morphologie Mathématique, Ecole des Mines de Paris, Fontainebleau, 212 p.

Papoulis, A., 1991. Probability, Random Variables and Stochastic processes . Third Edition. McGraw-Hill.

Payne, R.W. (Edt.), 2000. The Guide to GenStat . Lawes Agricultural Trust, Harpenden.

Priestley, M.B., 1981. Spectral analysis and time series . Academic Press, London.

Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, 1986. Numerical Recipes: The Art of Scientific Computing , Cambridge Univ. Press, New York.

Rivoirard, J., 1994. Disjunctive Kriging and Non-linear Geostatistics . Clarendon Press, Oxford.


Stedinger, J.R., R.M. Vogel and E. Foufoula-Georgiou, 1993. Frequency analysis of extreme events. In: D.R. Maidment (editor), Handbook of Hydrology , chapter 18, McGraw-Hill, New York.

Snedecor, G.W. and W.G. Cochran, 1989. Statistical Methods. Eighth Edition. Iowa State University Press.

Te Stroet, C.B.M., 1995. Calibration of stochastic groundwater flow models; estimation of system noise statistics and model parameters . Ph.D. Thesis Delft University of Technology.

Tong, H., 1990. Non-Linear Time Series: A Dynamical System Approach. Oxford University Press, New York.

VanMarcke, E., 1983. Random Fields . MIT Press, Cambridge MA.

Van Montfort, M.A.J., 1969. Statistica Neerlandica 23, 107.



Appendix: Exam Stochastic Hydrology 2008

1. (5 points) The multivariate probability density function of a random function is Gaussian. The mean of the random function is constant, the variance is finite and constant and the covariance between the values at two locations only depends on the distance between these locations. Is this random function:
a. Intrinsic?
b. Wide sense stationary?
c. Second order stationary?
d. Strict stationary?
e. Isotropic?
Briefly explain your answers.

2. (2 points) Provide two advantages of using ordinary kriging over simple kriging.

3. (6 points) The cumulative probability distribution of root zone soil moisture content θ at a certain location is given by (see Figure):

F_\theta(\theta) = \begin{cases} 0 & \text{if } \theta \le \theta_r \\ \dfrac{\theta - \theta_r}{\theta_s - \theta_r} & \text{if } \theta_r < \theta \le \theta_s \\ 1 & \text{if } \theta > \theta_s \end{cases} \qquad (1)

with θs soil moisture at saturation and θr residual soil moisture content.

(Figure: the cumulative distribution F_θ(θ), rising from 0 at θ_r to 1 at θ_s.)

a) Give the expression for the probability density function of root zone soil moisture content.
b) Derive expressions for the mean and the variance of root zone soil moisture content.


c) θ_s = 0.45, θ_r = 0.04 and the root zone depth is 30 cm. On a given day we have a precipitation event of 30 mm. If we assume that all precipitation infiltrates and that percolation during the rain event can be neglected: what is the probability that this precipitation event generates surface runoff (i.e. that it causes the root zone to become saturated)?

4. (6 points) Consider the following isotropic covariance function of a wide-sense stationary random function with mean µZ = 10:

C_Z(h) = 20 \exp(-h/10) \qquad (2)

Observations have been made at locations (x,y) = (1,1) and (x,y) = (4,5), with values z(1,1) = 6 and z(4,5) = 13. Use simple kriging to predict the value of Z(x) at location (x,y) = (2,4) and estimate the prediction error variance.

5. (8 points) Discharge in an open channel can be predicted using Manning’s equation:

Q = \frac{A}{m} R^{2/3} S^{1/2} \qquad (3)

with Q the discharge (m³/s), A the wetted cross-sectional area (m²), R the hydraulic radius (m), m Manning's coefficient and S the slope of the water surface. In a hydraulic laboratory this equation is used to determine Manning's coefficients of different types of channel bottom material by running water over this material in an experimental flume with a known discharge and measuring the water height at two locations along the channel at a distance L from each other. From the measurements H_1 and H_2 the slope is first calculated as:

S = \frac{H_2 - H_1}{L} \qquad (4)

after which Manning's coefficient is calculated with Manning's equation as:

m = \frac{A}{Q} R^{2/3} S^{1/2} \qquad (5)

a. The water levels H_1 and H_2 are measured with a random measurement error with mean zero and the same variance σ_H². Furthermore, the measurement errors at the two locations are independent. Give an expression for the variance of the error of the slope, σ_S², as a function of σ_H².
b. Use the first order Taylor approximation to derive an expression for the variance of the estimated Manning coefficient, σ_m², as a function of the variance σ_S²; next use the result of 5a to derive the expression for σ_m² as a function of σ_H².
c. Suppose we have A = 2 m², R = 1 m, L = 10 m and Q = 0.20 m³/s, and we estimate from observations that the expected value of the slope μ_S is 0.0005 m/m. The standard deviation σ_H of the observation error that occurs when measuring the water level is σ_H = 0.001 m. Give a first order estimate of the Manning coefficient and a first order estimate of the estimation variance. What is the relative error σ_m/μ_m?
d. If the flume has a length of 20 m, what would be an easy way to improve the accuracy of the estimate of Manning's coefficient?

6. (7 points) Consider the following stochastic model describing monthly concentration of nitrogen ck (in mg/l) in a lake with time steps of one month (index k is the month number):

C_k = 0.6\, C_{k-1} + 12\, q_k + W_k \qquad (6)

with q_k the monthly nitrogen input into the lake (10³ kg/month) and W_k a zero-mean model error. The nitrogen input q_k is given in the following table:

Time (month number):     1   2   3   4   5   6   7   8   9   10
N input (10³ kg/month):  4   12  9   8   2   3   4   2   1   0

The initial concentration is 10 mg/l with an initial error variance σ²_{0|0} = 9 mg²/l². The variance of the model error is σ_W² = 36 mg²/l². At k = 3 and k = 9 observations y_k are taken. We have y_3 = 173 mg/l and y_9 = 83 mg/l. The variance of the observation error is the same for both times: σ_V² = 9 mg²/l².

a) Apply the Kalman filter to obtain the optimal prediction ĉ_k and prediction variance σ²_{k|k} = E[(ĉ_k − C_k)²] for all time steps k = 1,..,10.

b) Make a plot of ĉ_k and σ_{k|k} versus time k. In the plot of ĉ_k, also plot the results of applying the deterministic model (c_k = 0.6 c_{k−1} + 12 q_k) without using the Kalman filter. For how many months can we see the influence of a model update in our model predictions?

227