Basics in Geostatistics 3
Geostatistical Monte-Carlo methods: Conditional simulation

Hans Wackernagel

MINES ParisTech

NERSC • April 2013

http://hans.wackernagel.free.fr

Basic concepts
Geostatistics

Concepts

Geostatistical model
- The experimental variogram serves to analyze the spatial structure of a regionalized variable z(x).
- It is fitted with a variogram model, which is the structure function of a random function.
- The regionalized variable (reality) is viewed as one realization of the random function Z(x).
- Kriging: Best Linear Unbiased Estimation of point values (or spatial averages) at any location of a region.
- Conditional simulation: generate an ensemble of realizations of the random function, conditional upon data. Statistics not linearly related to data can be computed from this ensemble.

Limitations of linear geostatistics

- Adequate for Gaussian random functions: in practice the distribution function is often skewed.
- Probing with two-point statistics (covariance function, variogram): other tools are also available.
- Need for non-linear estimates, e.g. for estimating the probability of exceeding an environmental threshold or a cut-off grade in mining.

Conditional simulation techniques address all these aspects. Gaussian conditional simulation generates an ensemble of realizations on which non-linear statistics can be readily computed.

Random functions: spatial correlation structure

Stationary random function:

- the mean $m$, the variance, and the spatial distribution function exist,
- spatial correlation is described by the covariance function:

$$C(h) = E\bigl[(Z(x+h) - m) \cdot (Z(x) - m)\bigr]$$

- the variogram of a stationary random function is given by the formula:

$$\gamma(h) = C(0) - C(h)$$

Gaussian conditional simulation
Classical approach

1. Simulate realizations of a stationary Gaussian random function with known covariance function C(h).
2. Condition the realizations using simple kriging.

Gaussian simulation
1) Unconditional simulation of a Gaussian RF

Simulation of a Gaussian random function
Turning bands method (TBM)

The simulation of realizations of a Gaussian random function (GRF) can be done simply:
- determine the 1D covariance function corresponding to the 2D or 3D covariance model,
- generate directions $\theta_1, \dots, \theta_K$,
- simulate realizations of 1D processes $Y_1, \dots, Y_K$ along lines in those directions,
- project a given point onto the lines and combine the corresponding simulated values to obtain the simulated value of the 3D process at that point:

$$Y(x) = \frac{1}{\sqrt{K}} \sum_{k=1}^{K} Y_k(\langle x, \theta_k \rangle) \quad \text{for } x \in D.$$

1D covariance function corresponding to 2D or 3D isotropic covariance

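The following formulas rely on Bochner's theorem.

In 3D:

$$C_{3D}(h) = \int_0^1 C_{1D}(t\,h)\,dt \quad\text{and}\quad C_{1D}(h) = \frac{d}{dh}\bigl(h\,C_{3D}(h)\bigr)$$

In 2D:

$$C_{2D}(h) = \frac{1}{\pi}\int_0^{\pi} C_{1D}(h \sin\theta)\,d\theta \quad\text{and}\quad C_{1D}(h) = 1 + h \int_0^{\pi/2} \frac{dC_{2D}}{dh}(h \sin\theta)\,d\theta$$

Example: exponential covariance function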

The 1D model associated to a 3D exponential covariance is:

$$C_{1D}(h) = \left(1 - \frac{h}{a}\right) \exp\left(-\frac{h}{a}\right) \quad \text{with } h, a \ge 0$$
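As a rough illustration of the turning bands steps, the base R sketch below first checks the 3D integral relation numerically for the exponential model, then builds one realization on a small grid. It makes several assumptions that are not in the source: the 1D processes are simulated by Cholesky factorization on discretized lines (not by the migration method), each projected point takes the value of the nearest line node, and all names (`c1d`, `c3d`, `pts`, `theta`, `step`) and parameter values are illustrative choices.

```r
set.seed(1)
a   <- 2                                          # scale parameter (assumed value)
c3d <- function(h) exp(-h / a)                    # 3D exponential covariance
c1d <- function(h) (1 - h / a) * exp(-h / a)      # associated 1D covariance

# Numerical check of C3D(h) = integral over t in [0, 1] of C1D(t h)
h_chk   <- c(0.5, 1, 3)
c3d_num <- sapply(h_chk, function(h)
  integrate(function(t) c1d(t * h), lower = 0, upper = 1)$value)
max_err <- max(abs(c3d_num - c3d(h_chk)))         # should be close to 0

# Target points: a 10 x 10 grid in the plane z = 0 of 3D space
pts <- as.matrix(expand.grid(x = 0:9, y = 0:9, z = 0))
n   <- nrow(pts)

# K directions drawn uniformly on the unit sphere
K     <- 200
theta <- matrix(rnorm(3 * K), ncol = 3)
theta <- theta / sqrt(rowSums(theta^2))

# Projections <x, theta_k> (n x K) and a common discretization of the lines
proj <- pts %*% t(theta)
step <- 0.1
s0   <- min(proj)
m    <- ceiling((max(proj) - s0) / step)
s    <- s0 + step * (0:m)

# 1D processes with covariance c1d via Cholesky (one column per line);
# the small diagonal term is only a numerical safeguard
C1 <- c1d(abs(outer(s, s, "-"))) + 1e-8 * diag(length(s))
Yk <- t(chol(C1)) %*% matrix(rnorm(length(s) * K), ncol = K)

# Nearest line node for each projected point, then the normalized sum
idx <- round((proj - s0) / step) + 1
val <- matrix(Yk[cbind(as.vector(idx), rep(seq_len(K), each = n))], nrow = n)
y   <- rowSums(val) / sqrt(K)                     # one realization at the n points
```

Repeating the last three blocks with fresh random numbers gives further realizations. Their empirical covariance at distance h should approach exp(-h/a), up to discretization and sampling error.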

Migration method: generate Poisson points and split each interval between them into two halves set to ±1 (the mean interval length is 2a).

Figure: TBM with an exponential covariance.

Gaussian simulation
2) Conditional simulation of a Gaussian RF

Best linear unbiased estimation (BLUE): kriging

Estimation of a value $Z^\star(x_0)$ at a location $x_0$ in geographical space is performed using a linear combination, with weights $w_\alpha$, of data at neighboring locations $x_\alpha$, $\alpha = 1, \dots, n$.

Kriging with known mean $m$ (simple kriging):

$$Z^\star(x_0) = m + \sum_{\alpha=1}^{n} w_\alpha \,\bigl(Z(x_\alpha) - m\bigr)$$

Conditional simulation of Gaussian RF

$$Z_{CS}(x) = \underbrace{Z^\star(x)}_{\text{kriged from data}} + \underbrace{\bigl(Z_S(x) - Z_S^\star(x)\bigr)}_{\text{simulated kriging error}}$$
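The conditioning formula can be tried out on a 1D toy problem. The base R sketch below is only an illustration under assumptions that are not in the source: an exponential covariance with range parameter 10, four made-up data values, and unconditional realizations obtained by Cholesky factorization instead of turning bands. The simple kriging weights are computed from the usual system, i.e. data-to-data covariances on the left and node-to-data covariances on the right. All names (`cov_fun`, `sk`, `z_cs`) are hypothetical.

```r
set.seed(42)
m_known <- 0                                   # known mean (simple kriging)
a       <- 10                                  # assumed range parameter
cov_fun <- function(h) exp(-abs(h) / a)        # assumed covariance model C(h)

x_grid <- seq(0, 100, by = 1)                  # simulation nodes
i_dat  <- c(11, 36, 61, 86)                    # nodes carrying data
x_dat  <- x_grid[i_dat]
z_dat  <- c(1.2, -0.5, 0.3, -1.0)              # made-up data values

# Simple kriging weights for all nodes at once: W = C_gd %*% C_dd^{-1}
C_dd <- cov_fun(outer(x_dat, x_dat, "-"))      # data-to-data covariances
C_gd <- cov_fun(outer(x_grid, x_dat, "-"))     # node-to-data covariances
W    <- C_gd %*% solve(C_dd)
sk   <- function(v) m_known + as.vector(W %*% (v - m_known))

z_star <- sk(z_dat)                            # kriged from data

# Unconditional realizations by Cholesky factorization (stand-in for TBM)
C_gg  <- cov_fun(outer(x_grid, x_grid, "-"))
U     <- chol(C_gg + 1e-10 * diag(length(x_grid)))
n_sim <- 200
z_cs  <- sapply(seq_len(n_sim), function(i) {
  z_s <- m_known + as.vector(t(U) %*% rnorm(length(x_grid)))
  z_star + (z_s - sk(z_s[i_dat]))              # add the simulated kriging error
})

# Each column of z_cs is one conditional realization
max(abs(z_cs[i_dat, ] - z_dat))                # close to 0: data are honoured
```

The same weights `W` are applied to the data and to the simulated values at the data nodes, which is why every realization passes through the data.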

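Figure: the conditioning step shown with maps: kriging from the data + (simulation − kriging of the simulation) = conditional simulation.

Conditional simulation and kriging
Comparison with kriging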

Figure: simulation (left) and samples (right).
Figure: simple kriging (left) and conditional simulation (right).

Conditional simulation and kriging

Conditionally on the data, the mean of the conditional simulations is equal to the kriging estimate:

$$E\bigl[\,Z_{CS}(x) \mid Z(x_\alpha),\ \alpha = 1, \dots, n\,\bigr] = Z^\star(x),$$

and the variance of the conditional simulations is the kriging variance:

$$\operatorname{var}\bigl(Z_{CS}(x) \mid Z(x_\alpha),\ \alpha = 1, \dots, n\bigr) = \operatorname{var}\bigl(Z_S(x) - Z_S^\star(x)\bigr) = \sigma_K^2(x).$$

Gaussian simulation with non-Gaussian data

1. Fit a Gaussian anamorphosis function: $Z(x) = \varphi(Y(x))$.
2. Transform the data to Gaussian values: $Y(x_\alpha) = \varphi^{-1}(Z(x_\alpha))$.
3. Fit the variogram of the Gaussian random function $Y(x)$.
4. Simulate realizations $Y_S(x)$.
5. Condition $Y_S(x)$ with $Y(x_\alpha)$, thus obtaining $Y_{CS}(x)$.
6. Transform the result to the initial scale: $Z_{CS}(x) = \varphi(Y_{CS}(x))$.
7. Compute various statistics on the ensemble of realizations.

Case study
Simulating Yeu island

The island is located off the south-west coast of Bretagne.

Measurements of elevation in the sea (depths).

Kriging the elevation data

Negative kriging estimates are set to zero (below sea level).

Conditional simulation of elevation
9 realizations of Yeu island (figure)

Simulation profiles along island (figure)

Probability that elevation is above sea level (figure)

Estimation of surface (km²) of Yeu island

              Real   From kriging   Sim. min   Sim. mean   Sim. max
    Surface   23.3   22.9           15.4       23.2        31.9
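Summaries of this kind are computed realization by realization and then aggregated over the ensemble. The base R sketch below shows the bookkeeping on a purely synthetic ensemble. The random matrix `elev`, the grid size and the cell area are assumptions made for illustration and have nothing to do with the Yeu island data.

```r
set.seed(5)
n_nodes   <- 2500                      # grid nodes (assumed 50 x 50 grid)
n_sim     <- 9                         # number of realizations
cell_area <- 0.01                      # km^2 per grid cell (assumed)

# Synthetic stand-in for an ensemble of simulated elevations in metres
# (rows: nodes, columns: realizations); NOT the Yeu island data
elev <- matrix(rnorm(n_nodes * n_sim, mean = -2, sd = 10), nrow = n_nodes)

# Surface above sea level (km^2), one value per realization
surface <- colSums(elev > 0) * cell_area
c(min = min(surface), mean = mean(surface), max = max(surface))

# Volume above sea level (km^3): positive elevations in km times cell area
volume <- colSums(pmax(elev, 0) / 1000) * cell_area

# Probability map: fraction of realizations with elevation above sea level
p_above <- rowMeans(elev > 0)

# Surface read from the ensemble mean map, for comparison with mean(surface)
surface_from_mean <- sum(rowMeans(elev) > 0) * cell_area
```

Because thresholding is a non-linear operation, `surface_from_mean` generally differs from `mean(surface)`. This is why such statistics are computed on each realization rather than on an averaged or kriged map.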

From conditional simulation the volume is estimated to be 0.188 km³ (as compared to the value deduced from kriging results: 0.169 km³).

Conclusion
Summary

Gaussian random function simulations
- Adequate for simulating Gaussian random functions.
- Anamorphosis to apply them to non-Gaussian data.
- Satisfy the need for non-linear estimates, e.g. for estimating the probability of exceeding an environmental threshold or a cut-off grade in mining.
- Generate an ensemble of realisations on which non-linear statistics are readily computed.

However, in a number of applications there is a need for stochastic models beyond the random functions framework...

RGeoS case-study
Conditional simulation example with RGeoS 9.1.2

Code from the document Doc2D.pdf on the site: http://rgeos.free.fr

Loading the soil pollution data

    library(RGeoS)
    data(Exdemo_2D_pollution.table)
    DAT = Exdemo_2D_pollution.table
    data.db = db.create(DAT, flag.grid=FALSE, ndim=2, autoname=FALSE)
    data.db = db.locate(data.db, "Zn", "z", 1)
    plot(data.db, pch=21, bg.in="black", title="Zn Sample locations")
    # suppress two outliers
    hist(DAT$Zn, n=20)
    data.db = db.sel(data.db, Zn<20)
    hist(DAT$Zn[DAT$Zn<20])

Figure: Zn sample locations (left); histogram of DAT$Zn (right).

Structural analysis

    data.vario = vario.calc(data.db, lag=1, nlag=10)
    plot(data.vario, npairdw=TRUE, npairpt=TRUE)
    data.4dir.vario = vario.calc(data.db, lag=1, nlag=10, dir=c(0,45,90,135))
    plot(data.4dir.vario, title="Directional variograms")
    data.model = model.auto(data.vario, struct=c("Spherical","Exponential"),
                            title="Modelling omni-directional variogram")
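To make explicit what an omni-directional experimental variogram with `lag=1` and `nlag=10` measures, here is a base R sketch on synthetic points. It is not the RGeoS implementation: the lag classes centred on multiples of the lag, the toy coordinates and the toy values are all assumptions made for illustration.

```r
set.seed(3)
n  <- 100
xy <- cbind(runif(n, 0, 20), runif(n, 0, 20))   # toy sample locations
z  <- rnorm(n)                                  # toy values without spatial structure

lag  <- 1
nlag <- 10

d   <- as.matrix(dist(xy))                      # pairwise distances
dz2 <- outer(z, z, "-")^2                       # squared increments
up  <- upper.tri(d)                             # count each pair once

# Lag class k collects distances in [(k - 0.5) * lag, (k + 0.5) * lag)
cls  <- floor(d[up] / lag + 0.5)
keep <- cls >= 1 & cls <= nlag

vario <- data.frame(
  lag    = sort(unique(cls[keep])),
  dist   = as.vector(tapply(d[up][keep],   cls[keep], mean)),
  npairs = as.vector(tapply(dz2[up][keep], cls[keep], length)),
  gamma  = as.vector(tapply(dz2[up][keep], cls[keep], function(v) mean(v) / 2))
)
vario
```

With these uncorrelated toy values the `gamma` column fluctuates around the variance of `z`, i.e. a pure nugget effect. Spatially structured data would instead show `gamma` increasing with distance.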

Figure: experimental variogram annotated with numbers of pairs, and panel "Modelling omni-directional variogram" showing the fitted model.

Kriging

    # for KRIGING use all 102 data (suppress selection)
    data.db = db.sel(data.db)
    # kriging grid
    grid.db = db.grid.init(data.db, nodes=c(100,90))
    # defining unique neighborhood
    data.unique = neigh.input(ndim=2)
    data.db = db.locate(data.db, seq(8,9))
    data.db = db.locate(data.db, "Zn", "z")
    grid.db = kriging(data.db, grid.db, data.model, data.unique, radix="KU")

Plot kriging results

    # plot estimates with contour lines for std deviations
    plot(grid.db, name.image="KU.Zn.estim", col=topo.colors(20),
         title="Estimation (Unique Neighborhood)")
    plot(grid.db, name.contour="KU.Zn.stdev", nlevels=10, add=TRUE)
    plot(data.db, pch=21, bg.in=1, add=TRUE)
    # separate plot of std deviations
    plot(grid.db, name.image="KU.Zn.stdev", col=topo.colors(100),
         title="Std deviation (Unique Neighborhood)", zlim=c(0,2.5))
    plot(grid.db, name.contour="KU.Zn.stdev", nlevels=10, add=TRUE)

Figure: panels "Estimation (Unique Neighborhood)" and "Std deviation (Unique Neighborhood)", each with standard deviation contour lines and sample locations.

Anamorphosis

    ## anamorphosis (normal score transform & Hermite poly)
    data.anam = anam.fit(data.db, "Zn")
    # transform z to Gaussian y
    data.db = anam.z2y(data.db, "Zn", anam=data.anam)
    data.g.vario = vario.calc(data.db, nlag=10, lag=1)
    plot(data.g.vario, npairdw=TRUE, npairpt=TRUE)
    data.g.model = model.auto(data.g.vario, struct=c("Exponential"))
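The normal score part of the anamorphosis can be illustrated in base R, independently of RGeoS. The sketch below is an assumption-laden stand-in, not what `anam.fit` does internally: it uses skewed toy data in place of Zn, an empirical transform based on ranks, and linear interpolation for the back-transform instead of a Hermite polynomial expansion.

```r
set.seed(7)
z <- rlnorm(100, meanlog = 1, sdlog = 0.6)      # skewed toy data standing in for Zn
n <- length(z)

# Forward transform: rank -> cumulative frequency -> Gaussian quantile
p <- (rank(z, ties.method = "average") - 0.5) / n
y <- qnorm(p)

# Empirical anamorphosis z = phi(y): interpolate the table of (y, z) pairs;
# rule = 2 keeps the extreme data values beyond the sampled range of y
phi    <- approxfun(sort(y), sort(z), rule = 2)
z_back <- phi(y)

c(mean_y = mean(y), sd_y = sd(y))               # near 0 and 1
max(abs(z_back - z))                            # back-transform reproduces the data
```

In the workflow, `phi` would be applied to each conditionally simulated Gaussian value to return to the original scale. The constant extrapolation in the tails is a crude choice that a fitted anamorphosis model handles more carefully.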

Figure: anamorphosis function (axes: Gaussian, Raw) and experimental variogram of the Gaussian-transformed Zn with numbers of pairs.

Conditional simulation
10 realizations (100 turning bands)

    grid.db = simtub(data.db, grid.db, data.g.model, data.unique,
                     nbsimu=10, nbtuba=100)
    # transform back from Gaussian Y to Z
    grid.db = anam.y2z(grid.db, ngrep="Simu.Gaussian.Zn", anam=data.anam)
    plot(grid.db, name.image="Raw.Simu.Gaussian.Zn.S1", col=topo.colors(20))
    plot(data.db, pch=21, bg.in=1, add=TRUE)
    plot(grid.db, name.image="Raw.Simu.Gaussian.Zn.S10", col=topo.colors(20))
    plot(data.db, pch=21, bg.in=1, add=TRUE)

Figure: panels "Raw.Simu.Gaussian.Zn.S1" and "Raw.Simu.Gaussian.Zn.S2", with sample locations.

Mean and standard deviation of 10 simulations

    ## plot mean of simulations
    grid.db <- db.compare(grid.db, ngrep="Raw.Simu.Gaussian.Zn", fun="mean")
    plot(grid.db, col=topo.colors(20), zlim=c(3,13))
    plot(data.db, bg.in=1, add=TRUE, pch=21)
    ## standard deviation of simulations
    grid.db <- db.compare(grid.db, ngrep="Raw.Simu.Gaussian.Zn", fun="stdv")
    plot(grid.db, col=topo.colors(100), zlim=c(0,2.5))
    plot(data.db, bg.in=1, add=TRUE, pch=21)

Figure: panels "mean" and "stdv" of the simulations, with sample locations.

Exporting the results
Plotting them with the lattice package

    grid.db # to find out the names of columns
    Smean = db.extract(grid.db, "mean")
    Sstdv = db.extract(grid.db, "stdv")
    x1 = db.extract(grid.db, "x1")
    x2 = db.extract(grid.db, "x2")
    library(lattice) # a standard graphical package in R
    levelplot(Smean~x1*x2, main="Mean of 10 simulations", col.regions=topo.colors)
    levelplot(Sstdv~x1*x2, main="Std deviation of 10 simulations",
              col.regions=rainbow(20, start=.5, end=0.01))

Figure: lattice level plots "Mean of 10 simulations" and "Std deviation of 10 simulations" (axes: x1, x2).