Introduction Geostatistical Model structure Cokriging

Geostatistical Model, Covariance structure and Cokriging

Hans Wackernagel¹

¹Equipe de Géostatistique — MINES ParisTech www.cg.ensmp.fr

Statistics and Machine Learning Interface Meeting Manchester, July 2009

Statistics & Machine Learning • Manchester, July 2009 Introduction Geostatistical Model Covariance structure Cokriging

Introduction

Statistics & Machine Learning • Manchester, July 2009 and Gaussian processes

Geostatistics is not limited to Gaussian processes, it usually refers to the concept of random functions, it may also build on concepts from random sets theory. Geostatistics

Geostatistics: is mostly known for the techniques, nowadays deals much with geostatistical simulation.

Bayesian inference of geostatistical parameters has also become a topic of research. Sequential assimilation is an extension of geostatistics using a mechanistic model to describe the time dynamics. In this talk: we will stay with linear (Gaussian) geostatistics, concentrate on kriging in a multi-scale and multi-variate context. A typical application may be: the response surface estimation problem eventually with several correlated response variables. of parameters will not be discussed. Statistics vs Machine Learning ? Necessity of an interface meeting

Differences (subjective): geostatistics favours interpretability of the , machine learning stresses prediction performance and computational perfomance of algorithms.

Ideally both should be achieved. Geostatistics: definition

Geostatistics is an application of the theory of regionalized variables to the problem of predicting spatial phenomena.

(G. MATHERON, 1970)

Usually we consider the regionalized variable z(x) to be a realization of a random function Z(x). Stationarity

For the top series: we think of a (2nd order) stationary model

For the bottom series: a and a finite do not make sense, rather the realization of a non- without drift. Second-order stationary model Mean and covariance are translation invariant

The mean of the random function does not depend on x: h i E Z(x) = m

The covariance depends on length and orientation of the vector h linking two points x and x0 = x + h: h   i cov(Z(x),Z(x0)) = C(h) = E Z(x) − m · Z(x+h) − m Non-stationary model (without drift) Variance of increments is translation invariant

The mean of increments does not depend on x and is zero: h i E Z(x+h) − Z(x) = m(h) = 0

The variance of increments depends only on h: h i var Z (x+h)−Z (x) = 2γ(h)

This is called intrinsic stationarity. Intrinsic stationarity does not imply 2nd order stationarity. 2nd order stationarity implies stationary increments. The

With intrinsic stationarity:

1 h 2 i γ(h) = E Z (x+h) − Z(x) 2 Properties - zero at the origin γ(0) = 0 - positive values γ(h) ≥ 0 - even function γ(h) = γ(−h)

The is bounded by the variance: C(0) = σ 2 ≥ |C(h)| The variogram is not bounded. A variogram can always be constructed from a given covariance function: γ(h) = C(0) − C(h) The converse is not true. What is a variogram ?

A covariance function is a positive definite function.

What is a variogram? A variogram is a conditionnally negative definite function. In particular:

any variogram matrix Γ = [γ(xα −xβ )] is conditionally negative semi-definite,

>  > [wα ] γ(xα −xβ ) [wα ] = w Γw ≤ 0 for any set of weights with

n ∑ wα = 0. α=0 Ordinary kriging

n n ? Estimator: Z (x0) = ∑ wα Z(xα ) with ∑ wα = 1 α=1 α=1 Solving:

" n # ? argmin var(Z (x0) − Z(x0)) − 2µ( ∑ wα − 1) w1,...,wn,µ α=1 yields the system:

 n  w γ(x −x ) + µ = γ(x −x ) ∀α  ∑ β α β α 0  β =1 n  w = 1  ∑ β β =1 n 2 and the kriging variance: σK = µ + ∑ wα γ(xα −x0) α=1 Kriging the mean Stationary model: Z (x) = Y (x) + m

n n ? Estimator: M = ∑ wα Z(xα ) with ∑ wα = 1 α=1 α=1 Solving:

" n # ? argmin var(M − m) − 2µ( ∑ wα − 1) w1,...,wn,µ α=1 yields the system:

 n  w C(x −x ) − µ = 0 ∀α  ∑ β α β  β =1 n  w = 1  ∑ β β =1 2 and the kriging variance: σK = µ Kriging a component S Stationary model: Z(x) = ∑ Yu(x) + m (Yu ⊥ Yv foru 6= v) u=0

n n Estimator: Y ? (x ) = w Z(x ) with w = 0 u0 0 ∑ α α ∑ α α=1 α=1 Solving: " #   n argmin var Y ? (x ) − Y (x ) − 2µ w u0 0 u0 0 ∑ α w1,...,wn,µ α=1 yields the system:

 n  w C(x −x ) − µ = C (x −x ) ∀α  ∑ β α β u0 α 0  β =1 n  w = 0  ∑ β β =1 Mobile phone exposure of children by Liudmila KUDRYAVTSEVA http://perso.rd.francetelecom.fr/joe.wiart/ Child heads at different ages Phone position and child head Head of 12 year old child SAR exposure (simulated) Max SAR for different positions of phone The phone positions are characterized by two angles

The SAR values are normalized with respect to 1 W. Variogram Anisotropic linear variogram model Max SAR kriged map Kriging standard deviations Kriging standard deviations Different sample design Introduction Geostatistical Model Covariance structure Cokriging

Geostatistical filtering: Skagerrak SST

Statistics & Machine Learning • Manchester, July 2009 NAR16 images on 26-27 april 2005 Sea-surface temperature (SST) NAR SST − 200504261000S NAR SST − 200504261200S

6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 59˚ 59˚59˚ 59˚

58˚ 58˚58˚ 58˚

57˚ 57˚57˚ 57˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚

6 7 8 9 6 7 8 9

NAR SST − 200504262000S NAR SST − 200504270200S

6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 59˚ 59˚59˚ 59˚

58˚ 58˚58˚ 58˚

57˚ 57˚57˚ 57˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚

6 7 8 9 6 7 8 9 Nested variogram model

Nested scales modeled by sum of different : micro-scale nugget-effect of .005 small scale spherical model ( .4 deg longitude, sill .06) large scale Geostatistical filtering Small-scale variability of NAR16 images NAR SST − SHORT26apr10h NAR SST − SHORT26apr12h

6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 59˚ 59˚59˚ 59˚

58˚ 58˚58˚ 58˚

57˚ 57˚57˚ 57˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚

−1 0 1 −1 0 1

NAR SST − SHORT26apr20h NAR SST − SHORT27apr02h

6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 59˚ 59˚59˚ 59˚

58˚ 58˚58˚ 58˚

57˚ 57˚57˚ 57˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚ 6˚ 7˚ 8˚ 9˚ 10˚ 11˚

−1 0 1 −1 0 1 Introduction Geostatistical Model Covariance structure Cokriging

Geostatistical Model

Statistics & Machine Learning • Manchester, July 2009 Introduction Geostatistical Model Covariance structure Cokriging Linear model of coregionalization

The linear model of coregionalization (LMC) combines: a linear model for different scales of the spatial variation, a linear model for components of the multivariate variation.

Statistics & Machine Learning • Manchester, July 2009 Two linear models

S Linear Model of Regionalization: Z(x) = ∑ Yu(x) u=0 h i E Yu(x+h) − Yu(x) = 0 h   i E Yu(x+h) − Yu(x) · Yv (x+h) − Yv (x) = gu(h)δuv

N Linear Model of PCA: Zi = ∑ aip Yp p=1 h i E Yp = 0   cov Yp,Yq = 0 for p 6= q Linear Model of Coregionalization

Spatial and multivariate representation of Zi (x) using p u uncorrelated factors Yu (x) with coefficients aip:

S N u p Zi (x) = ∑ ∑ aip Yu (x) u=0 p=1

p Given u, all factors Yu (x) have the same variogram gu(h).

This implies a multivariate nested variogram:

S Γ(h) = ∑ Bu gu(h) u=0 Coregionalization matrices

The coregionalization matrices Bu characterize the correlation between the variables Zi at different spatial scales.

In practice: 1 A multivariate nested variogram model is fitted. 2 Each matrix is then decomposed using a PCA:

N h u i h u u i Bu = bij = ∑ aip ajp p=1

yielding the coefficients of the LMC. LMC: intrinsic correlation

When all coregionalization matrices are proportional to a matrix B:

Bu = au B we have an intrinsically correlated LMC:

S Γ(h) = B ∑ au gu(h) = B γ(h) u=0

In practice, with intrinsic correlation, the eigenanalysis of the different Bu will yield: different sets of eigenvalues, but identical sets of eigenvectors. Regionalized Multivariate Data Analysis

With intrinsic correlation: The factors are autokrigeable, i.e. the factors can be computed from a classical MDA on the variance- V =∼ B and are kriged subsequently.

With spatial-scale dependent correlation: The factors are defined on the basis of the coregionalization matrices Bu and are cokriged subsequently.

Need for a regionalized multivariate data analysis! Regionalized PCA ?

Variables Zi (x) ↓ no Intrinsic Correlation ? −→ γij (h) = ∑ Bu gu(h) u

↓ yes ↓

PCA on B PCA on Bu ↓ ↓ Transform into Y Cokrige Y ? (x) u0p0 ↓ ↓

Krige Y ? (x) −→ Map of PC p0 Introduction Geostatistical Model Covariance structure Cokriging

Geostatistical filtering: Golfe du Lion SST

Statistics & Machine Learning • Manchester, July 2009 Modeling of spatial variability as the sum of a small-scale and a large-scale process Variogram of SST

SST on 7 june 2005 The variogram of the Nar16 im- age is fitted with a short- and a long-range structure (with geo- metrical anisotropy).

The small-scale components of the NAR16 image and of corresponding MARS ocean-model output are extracted by geostatistical filtering. Geostatistical filtering Small scale (top) and large scale (bottom) features

3˚ 4˚ 5˚ 6˚ 7˚ 3˚ 4˚ 5˚ 6˚ 7˚

43˚ 43˚ 43˚ 43˚

3˚ 4˚ 5˚ 6˚ 7˚ 3˚ 4˚ 5˚ 6˚ 7˚

−0.5 0.0 0.5 −0.5 0.0 0.5 NAR16 image MARS model output

3˚ 4˚ 5˚ 6˚ 7˚ 3˚ 4˚ 5˚ 6˚ 7˚

43˚ 43˚ 43˚ 43˚

3˚ 4˚ 5˚ 6˚ 7˚ 3˚ 4˚ 5˚ 6˚ 7˚

15 20 15 20 Zoom into NE corner

3˚ 4˚ 5˚ 6˚ 7˚

8001000

900 700

100 600 600 200500 400 200500 43˚ 700 1100 43˚ 300 9001000 14001200 400 100 8001600 15001300300 1700 2400 200018002300 2100 1900 2200 300200 800

900 500 1000 1200 400 600700 1300 Bathymetry 1100 1500 1400 1700 1600 1800 1900 2000

3˚ 4˚ 5˚ 6˚ 7˚

0 500 1000 1500 2000 2500

Depth selection, scatter diagram

Direct and cross variograms in NE corner Cokriging in NE corner Small scale (top) and large scale (bottom) components

6˚18' 6˚24' 6˚30' 6˚36' 6˚42' 6˚48' 6˚54' 6˚18' 6˚24' 6˚30' 6˚36' 6˚42' 6˚48' 6˚54' 43˚30' 43˚30'

43˚24' 43˚24'

43˚18' 43˚18'

43˚12' 43˚12'

43˚06' 43˚06'

43˚00' 43˚00'

42˚54' 42˚54'

6˚18' 6˚24' 6˚30' 6˚36' 6˚42' 6˚48' 6˚54' 6˚18' 6˚24' 6˚30' 6˚36' 6˚42' 6˚48' 6˚54'

−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3 −0.3 −0.2 −0.1 0.0 0.1 0.2 0.3

NAR16 image MARS model output

6˚18' 6˚24' 6˚30' 6˚36' 6˚42' 6˚48' 6˚54' 6˚18' 6˚24' 6˚30' 6˚36' 6˚42' 6˚48' 6˚54' 43˚30' 43˚30'

43˚24' 43˚24'

43˚18' 43˚18'

43˚12' 43˚12'

43˚06' 43˚06'

43˚00' 43˚00'

42˚54' 42˚54'

6˚18' 6˚24' 6˚30' 6˚36' 6˚42' 6˚48' 6˚54' 6˚18' 6˚24' 6˚30' 6˚36' 6˚42' 6˚48' 6˚54'

19.0 19.5 20.0 20.5 21.0 21.5 22.0 19.0 19.5 20.0 20.5 21.0 21.5 22.0 Interpretation

To correct for these discrepancies between remotely sensed SST and that provided by the MARS ocean model, the latter was thoroughly revised in order better reproduce the path of the Ligurian current. Introduction Geostatistical Model Covariance structure Cokriging

Covariance structure

Statistics & Machine Learning • Manchester, July 2009 Intrinsic Correlation: Variogram Model

A simple model for the matrix Γ(h) of direct and cross variograms γij (h) is: h i Γ(h) = γij (h) = Bγ(h) where B is a positive semi-definite matrix.

In this model all variograms are proportional to a basic variogram γ(h):

γij (h) = bij γ(h) Codispersion Coefficients

A coregionalization is intrinsically correlated when the codispersion coefficients:

γij (h) ccij (h) = p γii (h)γjj (h) are constant, i.e. do not depend on spatial scale.

With the intrinsic correlation model:

bij γ(h) ccij (h) = p = rij bii bjj γ(h) the correlation rij between variables is not a function of h. Codispersion Coefficients

A coregionalization is intrinsically correlated when the codispersion coefficients:

γij (h) ccij (h) = p γii (h)γjj (h) are constant, i.e. do not depend on spatial scale.

With the intrinsic correlation model:

bij γ(h) ccij (h) = p = rij bii bjj γ(h) the correlation rij between variables is not a function of h. Intrinsic Correlation: Covariance Model

For a covariance function matrix the model becomes:

C(h) = Vρ(h) where h i V = σij is the variance-covariance matrix,

ρ(h) is an function.

The correlations between variables do not depend on the spatial scale h, hence the adjective intrinsic. In the intrinsic correlation model the multi-variate variability is separable from the spatial variation. A Test for Intrinsic Correlation

1 Compute principal components for the variable set. 2 Compute the cross-variograms between principal components.

In case of intrinsic correlation, the cross-variograms between principal components are all zero. A Test for Intrinsic Correlation

1 Compute principal components for the variable set. 2 Compute the cross-variograms between principal components.

In case of intrinsic correlation, the cross-variograms between principal components are all zero. Cross variogram: two principal components

gij(h) Correlated at short distances !!

h 900m 3.5km PC3 & PC4

=⇒ The intrinsic correlation model is not adequate! Introduction Geostatistical Model Covariance structure Cokriging

Cokriging

Statistics & Machine Learning • Manchester, July 2009 Multivariate Kriging

Kriging is optimal linear unbiased prediction applied to random functions in or time with the particular requirement that their covariance structure is known. −→ Multivariate case: co-kriging

Covariance structure: covariance functions (or variograms, generalized ) for a set of variables. Ordinary cokriging

N ni Estimator: Z ? (x ) = w i Z (x ) i0,OK 0 ∑ ∑ α i α i=1 α=1

i = constrained weights: ∑wα δi,i0 α valid for variograms, nonstationary phenomenon without drift. Data configuration & Cokriging neighborhood

Data configuration: sites of different types of measurements in a spatial/temporal domain.

Are sites shared by different measurement types — or not?

Neighborhood: a subset of data used in cokriging.

How should the cokriging neighborhood be defined?

What are the links with the covariance structure? Configurations: Iso- and Heterotopic Data

primary data secondary data

Sample sites Isotopic data are shared

Sample sites Heterotopic data may be different

Secondary data

Dense auxiliary data covers whole domain Configuration: isotopic data Auto-krigeability

A random function Z1(x) is auto-krigeable (self-krigeable), if the cross-variograms of that variable with the other variables are all proportional to the direct variogram of Z1(x):

γ1j (h) = a1j γ11(h) for j = 2,...,N

Isotopic data: auto-krigeability that the cokriging boils down to the corresponding kriging. If all variables are auto-krigeable, the set of variables is intrinsically correlated, i.e. the multivariate variation is separable from the spatial variation. Configuration: dense auxiliary data 3 cokriging neighborhoods

primary data

# target point A # secondary data

B ##C

A: neighborhood using all data B: multi-collocated neighborhood C: collocated neighborhood Neighborhood: all data

primary data # # target point secondary data

Very dense auxiliary data (e.g. remote sensing): a priori large cokriging system, potential numerical instabilities. Ways out: moving neighborhood, multi-collocated neighborhood, sparser cokriging matrix: covariance tapering. Neighborhood: multi-collocated

primary data # # target point secondary data

Multi-collocated cokriging equivalent to cokriging with all data when there is proportionality in the cross-covariance model, for all forms of cokriging: simple, ordinary, universal Neighborhood: multi-collocated Bivariate example: proportionality in the covariance model

Cokriging with all data is equivalent to cokriging with a multi-collocated neighborhood for a model with a covariance structure is of the type:

2 C11(h) = p C(h) + C1(h)

C22(h) = C(h)

C12(h) = p C(h) where p is a proportionality coefficient.

RIVOIRARD (2004) studies various examples of this kind, examining bivariate and multi-variate coregionalization models in connection with different data configurations and neighborhoods, among them the dislocated neighbohood. Acknowledgements

This work was partly funded by the PRECOC project (2006-2008) of the Franco-Norwegian Foundation (Agence Nationale de la Recherche, Research Council of Norway) and by the EU FP7 MOBI-Kids project (2009-2012). References

BANERJEE,S.,CARLIN,B., AND GELFAND,A. Hierachical Modelling and Analysis for spatial Data. Chapman and Hall, Boca Raton, 2004.

BERTINO,L.,EVENSEN,G., AND WACKERNAGEL,H. Sequential data assimilation techniques in . International Statistical Review 71 (2003), 223–241.

CHILÈS,J., AND DELFINER, P. Geostatistics: Modeling Spatial Uncertainty. Wiley, New York, 1999.

LANTUÉJOUL,C. Geostatistical Simulation: Models and Algorithms. Springer-Verlag, Berlin, 2002.

MATHERON,G. The Theory of Regionalized Variables and its Applications. No. 5 in Les Cahiers du Centre de Morphologie Mathématique. Ecole des Mines de Paris, Fontainebleau, 1970.

RASMUSSEN,C., AND WILLIAMS,C. Gaussian Processes for Machine Learning. MIT Press, Boston, 2006.

RIVOIRARD,J. On some simplifications of cokriging neighborhood. Mathematical Geology 36 (2004), 899–915.

WACKERNAGEL,H. Multivariate Geostatistics: an Introduction with Applications, 3rd ed. Springer-Verlag, Berlin, 2003. Introduction Geostatistical Model Covariance structure Cokriging

Appendix

Statistics & Machine Learning • Manchester, July 2009 Eddie tracking with circlets by Hervé CHAURIS www.geophy.ensmp.fr Detection of circular structures

Taking account of the band limited aspect of the data: Circlets applied to Skagerrak

Preprocessing (despiking, transfer on regular grid); image gradient; circlet decomposition; coefficient thresholding; Image reconstruction. Methodology Circlets applied to Skagerrak Tracking the position and diameter of an eddy-like structure Introduction Geostatistical Model Covariance structure Cokriging Potential of circlets

Preprocessing can be done by geostatistical filtering Simple and fast circlet transform (CPU cost of a few 2-D FFTs) Deal with edge effects How to integrate eddie tracking into a data assimilation procedure?

Statistics & Machine Learning • Manchester, July 2009