An Introduction to Model-Based Geostatistics

An Introduction to Model-based Geostatistics Peter J Diggle School of Health and Medicine, Lancaster University and Department of Biostatistics, Johns Hopkins University September 2009 Outline • What is geostatistics? • What is model-based geostatistics? • Two examples – constructing an elevation surface from sparse data – tropical disease prevalence mapping Example: data surface elevation Z 6 65 70 75 80 85 90 95100 5 4 Y 3 2 1 0 1 2 3 X 4 5 6 Geostatistics • traditionally, a self-contained methodology for spatial prediction, developed at Ecole´ des Mines, Fontainebleau, France • nowadays, that part of spatial statistics which is concerned with data obtained by spatially discrete sampling of a spatially continuous process Kriging: find the linear combination of the data that best predicts the value of the surface at an arbitrary location x Model-based Geostatistics • the application of general principles of statistical modelling and inference to geostatistical problems – formulate a statistical model for the data – fit the model using likelihood-based methods – use the fitted model to make predictions Kriging: minimum mean square error prediction under Gaussian modelling assumptions Gaussian geostatistics (simplest case) Model 2 • Stationary Gaussian process S(x): x ∈ IR E[S(x)] = µ Cov{S(x), S(x′)} = σ2ρ(kx − x′k) 2 • Mutually independent Yi|S(·) ∼ N(S(x),τ ) Point predictor: Sˆ(x) = E[S(x)|Y ] • linear in Y = (Y1, ..., Yn); • interpolates Y if τ 2 = 0 • called simple kriging in classical geostatistics Predictive distribution • choose the target for prediction, F(S), where S = {S(x): x ∈ A} • draw samples Si : i = 1, ..., N from [S|Y ] • then Fi = F(Si): i = 1, ..., N is a sample from required predictive distribution [F(S)|Y ] Interpolating the elevation surface Under Gaussian modelling assumptions, we need to: • identify a parametric family of correlation functions • fit the model • use the model for prediction • identify a parametric family of correlation functions The empirical variogram 1 2 (xi, Yi): i = 1, ..., n uij = ||xi − xj || vij = (yi − yj) 2 The theoretical variogram 1 2 2 V (u) = Var{Y (x) − Y (x − u)} = τ + σ {1 − ρ(u)} 2 Exploratory analysis E[vij ] = V (uij ) ⇒ smoothed scatterplot of (uij ,vij ) identifies rough shape of ρ(u) and initial estimates of model parameters geoR code: library(geoR) data(elevation) summary(elevation) vario<-variog(elevation,uvec=0.2*(0:25)) plot(vario) ?variog vario2<-variog(elevation,uvec=0.2*(0:25),trend="1st") plot(vario2) plot(vario$u,vario$v,type="l",xlim=c(0,5),ylim=c(0,7000), xlab="u",ylab="V(u)") lines(vario2$u,vario2$v,col="red") • identify a parametric family of correlation functions • fit the model 1. Classical: compute maximum likelihood estimates θˆ 2. Bayesian: prior [θ] implies posterior [θ|Y ] geoR code for option 1: mlfit<-likfit(elevation,ini.cov.pars=c(5000,2.0), cov.model="matern",kappa=1) • identify a parametric family of correlation functions • fit the model • use the model for prediction 1. Plug-in:[S|Y ; θˆ] 2. Bayesian:[S|Y ] = R [S|Y ; θ][θ|Y ]dθ geoR code for option 1: region<-matrix(c(0,0,6.4,0,6.4,6.4,0,6.4),4,2,T) grid<-pred_grid(region,by=0.2) KC<-krige.control(obj.model=mlfit) OC<-output.control(n.predictive=100) set.seed(24367) predictions<-krige.conv(geodata=elevation,locations=grid, borders=region,krige=KC,output=OC) image(predictions) points(elevation,add=T) Tropical disease prevalence mapping • “river blindness” – an endemic disease in wet tropics • donation programme of mass treatment with ivermectin • approximately 50 million people treated to date (target is 80 million by 2015) • serious adverse reactions experienced by some patients highly co-infected with Loa loa parasites • precautionary measures put in place before mass treatment in areas of high Loa loa prevalence http://www.who.int/pbd/blindness/onchocerciasis/en/ Diggle et al, Annals of Tropical Medicine and Parasitology, 101, 499–509. The Loa loa prediction problem Ground-truth survey data • random sample of subjects in each of a number of villages • blood-samples test positive/negative for Loa loa Environmental data (satellite images) • measured on regular grid to cover region of interest • elevation, green-ness of vegetation Objectives • predict local prevalence throughout study-region (Cameroon) • compute local exceedance probabilities, P(prevalence > 0.2|data) Loa loa: a generalised linear model • Latent spatial process S(x) ∼ SGP{0, σ2, ρ(u)} ρ(u) = exp(−|u|/φ) • Linear predictor d(x) = environmental variables at location x η(x) = d(x)′β + S(x) p(x) = exp{η(x)}/[1 + exp{η(x)}] • Conditional distribution for positive proportion Yi/ni Yi|S(·) ∼ Bin{ni, p(xi)} The modelling strategy • use relationship between environmental variables and ground-truth prevalence to construct preliminary predictions via logistic regression • use local deviations from regression model to estimate smooth residual spatial variation • use fitted model for predictive inference logit prevalence vs elevation logit prevalence −5 −4 −3 −2 −1 0 0 500 1000 1500 elevation logit prevalence vs max NDVI logit prevalence −5 −4 −3 −2 −1 0 0.65 0.70 0.75 0.80 0.85 0.90 Max Greeness Comparing non-spatial and spatial predictions in Cameroon Non-spatial 60 50 40 30 20 10 0 Observed prevalence (%) 0 10 20 30 Predicted prevalence - 'without ground truth data' Spatial 60 50 40 30 20 10 0 Observed prevalence (%) 0 10 20 30 40 Predicted prevalence - 'with ground truth data' (%) Probabilistic prediction in Cameroon Take-home message • model-based approach: – makes assumptions explicit – makes choice of analysis strategy less subjective – emphasises uncertainty • exceedance probabilty maps are often more useful than point predictions and standard errors • text-book linked to geoR software Diggle, P.J. and Ribeiro, P.J. (2007). Model-based Geostatistics. New York : Springer..

An Introduction to Model-Based Geostatistics

Getting Started with Geostatistics

An Introduction to Spatial Autocorrelation and Kriging

Master of Science in Geospatial Technologies Geostatistics

Conditional Simulation

Efficient Geostatistical Simulation for Spatial Uncertainty Propagation

ON the USE of NON-EUCLIDEAN ISOTROPY in GEOSTATISTICS Frank C

Geostatistics: Basic Aspects for Its Use to Understand Soil Spatial Variability

A Practical Guide to Geostatistical Mapping of Environmental Variables

Projection Pursuit Multivariate Transform

A COMPARISON of CLASSICAL STATISTICS and GEOSTATISTICS for ESTIMATING SOIL SURFACE TEMPERATURE by PARICHEHR HEMYARI Bachelor Of

A Geostatistical Simulation of a Mineral Deposit Using Uncertain Experimental Data

Model-Based Geostatistics