An Introduction to Model-Based Geostatistics

Peter J Diggle
School of Health and Medicine, Lancaster University
and Department of Biostatistics, Johns Hopkins University
September 2009

Outline
• What is geostatistics?
• What is model-based geostatistics?
• Two examples
  – constructing an elevation surface from sparse data
  – tropical disease prevalence mapping

Example: surface elevation data
[Figure: surface elevation data plotted at locations (X, Y), both axes running from 0 to 6, with elevation Z on a scale from 65 to 100]

Geostatistics
• traditionally, a self-contained methodology for spatial prediction, developed at École des Mines, Fontainebleau, France
• nowadays, that part of spatial statistics which is concerned with data obtained by spatially discrete sampling of a spatially continuous process

Kriging: find the linear combination of the data that best predicts the value of the surface at an arbitrary location $x$

Model-based Geostatistics
• the application of general principles of statistical modelling and inference to geostatistical problems
  – formulate a statistical model for the data
  – fit the model using likelihood-based methods
  – use the fitted model to make predictions

Kriging: minimum mean square error prediction under Gaussian modelling assumptions

Gaussian geostatistics (simplest case)
Model
• Stationary Gaussian process $S(x): x \in \mathbb{R}^2$
    $E[S(x)] = \mu$
    $\mathrm{Cov}\{S(x), S(x')\} = \sigma^2 \rho(\|x - x'\|)$
• Mutually independent $Y_i \mid S(\cdot) \sim N(S(x_i), \tau^2)$

Point predictor: $\hat{S}(x) = E[S(x) \mid Y]$
• linear in $Y = (Y_1, ..., Y_n)$
• interpolates $Y$ if $\tau^2 = 0$
• called simple kriging in classical geostatistics

Predictive distribution
• choose the target for prediction, $F(S)$, where $S = \{S(x): x \in A\}$
• draw samples $S_i : i = 1, ..., N$ from $[S \mid Y]$
• then $F_i = F(S_i): i = 1, ..., N$ is a sample from the required predictive distribution $[F(S) \mid Y]$

Interpolating the elevation surface
Under Gaussian modelling assumptions, we need to:
• identify a parametric family of correlation functions
• fit the model
• use the model for prediction

Step 1: identify a parametric family of correlation functions

The empirical variogram
    $(x_i, Y_i): i = 1, ..., n$
    $u_{ij} = \|x_i - x_j\|$
    $v_{ij} = \frac{1}{2}(y_i - y_j)^2$

The theoretical variogram
    $V(u) = \frac{1}{2}\mathrm{Var}\{Y(x) - Y(x - u)\} = \tau^2 + \sigma^2\{1 - \rho(u)\}$

Exploratory analysis
    $E[v_{ij}] = V(u_{ij})$ ⇒ a smoothed scatterplot of $(u_{ij}, v_{ij})$ identifies the rough shape of $\rho(u)$ and gives initial estimates of the model parameters

geoR code:

    library(geoR)
    data(elevation)
    summary(elevation)
    # empirical variogram in distance bins of width 0.2
    vario <- variog(elevation, uvec = 0.2 * (0:25))
    plot(vario)
    ?variog   # help page for variog
    # variogram of residuals after removing a first-order (linear) trend
    vario2 <- variog(elevation, uvec = 0.2 * (0:25), trend = "1st")
    plot(vario2)
    # overlay the two variograms: raw data in black, detrended in red
    plot(vario$u, vario$v, type = "l", xlim = c(0, 5), ylim = c(0, 7000),
         xlab = "u", ylab = "V(u)")
    lines(vario2$u, vario2$v, col = "red")

Step 2: fit the model
1. Classical: compute maximum likelihood estimates $\hat{\theta}$
2. Bayesian: prior $[\theta]$ implies posterior $[\theta \mid Y]$

geoR code for option 1:

    # maximum likelihood fit; ini.cov.pars gives starting values (sigma^2, phi)
    mlfit <- likfit(elevation, ini.cov.pars = c(5000, 2.0),
                    cov.model = "matern", kappa = 1)

Step 3: use the model for prediction
1. Plug-in: $[S \mid Y; \hat{\theta}]$
2. Bayesian: $[S \mid Y] = \int [S \mid Y; \theta][\theta \mid Y] \, d\theta$

geoR code for option 1:

    # corners of the square prediction region, one (x, y) pair per row
    region <- matrix(c(0, 0, 6.4, 0, 6.4, 6.4, 0, 6.4), 4, 2, byrow = TRUE)
    grid <- pred_grid(region, by = 0.2)
    KC <- krige.control(obj.model = mlfit)
    OC <- output.control(n.predictive = 100)   # 100 predictive simulations
    set.seed(24367)
    predictions <- krige.conv(geodata = elevation, locations = grid,
                              borders = region, krige = KC, output = OC)
    image(predictions)
    points(elevation, add = TRUE)

Tropical disease prevalence mapping
• “river blindness” – an endemic disease in wet tropics
• donation programme of mass treatment with ivermectin
• approximately 50 million people treated to date (target is 80 million by 2015)
• serious adverse reactions experienced by some patients highly co-infected with Loa loa parasites
• precautionary measures put in place before mass treatment in areas of high Loa loa prevalence

http://www.who.int/pbd/blindness/onchocerciasis/en/
Diggle et al, Annals of Tropical Medicine and Parasitology, 101, 499–509.
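
Before moving on to the Loa loa data, the Gaussian model of the elevation example can be checked numerically. The following is a minimal base R sketch, not the geoR implementation: it simulates data from the stationary Gaussian model, computes the simple kriging predictor, confirms that the predictor interpolates the data when the nugget variance is zero, and builds the empirical variogram cloud. The locations, the parameter values, the exponential correlation function and the helper name krige_simple are all illustrative assumptions, and the parameters are treated as known rather than estimated.

```r
# Simple kriging sketch in base R (no geoR); all numerical values are made up.
set.seed(1)
n <- 30
x <- cbind(runif(n, 0, 6), runif(n, 0, 6))      # hypothetical sampling locations
mu <- 80; sigma2 <- 25; phi <- 1.5; tau2 <- 0   # parameters assumed known

rho <- function(u) exp(-u / phi)                # exponential correlation (an assumption)

# Simulate Y from the model: Y ~ N(mu, sigma2 * rho(D) + tau2 * I)
D <- as.matrix(dist(x))                         # pairwise distances u_ij
V <- sigma2 * rho(D) + tau2 * diag(n)
y <- mu + drop(t(chol(V)) %*% rnorm(n))

# Simple kriging predictor: S_hat(x0) = mu + c0' V^{-1} (y - mu),
# where c0 holds the covariances between S(x0) and the data
krige_simple <- function(x0) {
  d0 <- sqrt(rowSums((x - matrix(x0, n, 2, byrow = TRUE))^2))
  c0 <- sigma2 * rho(d0)
  mu + sum(c0 * solve(V, y - mu))
}

# With tau2 = 0 the predictor should reproduce the data at the data locations
pred_at_data <- apply(x, 1, krige_simple)
max_abs_err <- max(abs(pred_at_data - y))

# Empirical variogram cloud: u_ij and v_ij = (y_i - y_j)^2 / 2 for i > j
u <- D[lower.tri(D)]
v <- 0.5 * outer(y, y, "-")[lower.tri(D)]^2
```

Plotting v against u, or a binned average of v, gives the kind of exploratory picture that variog produces. Setting tau2 to a positive value in the sketch removes the interpolation property, as the point-predictor slide states.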
The Loa loa prediction problem
Ground-truth survey data
• random sample of subjects in each of a number of villages
• blood-samples test positive/negative for Loa loa
Environmental data (satellite images)
• measured on regular grid to cover region of interest
• elevation, green-ness of vegetation
Objectives
• predict local prevalence throughout study-region (Cameroon)
• compute local exceedance probabilities, $P(\text{prevalence} > 0.2 \mid \text{data})$

Loa loa: a generalised linear model
• Latent spatial process
    $S(x) \sim \mathrm{SGP}\{0, \sigma^2, \rho(u)\}$
    $\rho(u) = \exp(-|u|/\phi)$
• Linear predictor
    $d(x)$ = environmental variables at location $x$
    $\eta(x) = d(x)'\beta + S(x)$
    $p(x) = \exp\{\eta(x)\}/[1 + \exp\{\eta(x)\}]$
• Conditional distribution for positive proportion $Y_i/n_i$
    $Y_i \mid S(\cdot) \sim \mathrm{Bin}\{n_i, p(x_i)\}$

The modelling strategy
• use relationship between environmental variables and ground-truth prevalence to construct preliminary predictions via logistic regression
• use local deviations from regression model to estimate smooth residual spatial variation
• use fitted model for predictive inference

[Figure: logit prevalence vs elevation]

[Figure: logit prevalence vs max NDVI (maximum greenness)]

Comparing non-spatial and spatial predictions in Cameroon
[Figure, two panels. Non-spatial: observed prevalence (%) against predicted prevalence 'without ground truth data'. Spatial: observed prevalence (%) against predicted prevalence 'with ground truth data' (%).]

Probabilistic prediction in Cameroon

Take-home message
• model-based approach:
  – makes assumptions explicit
  – makes choice of analysis strategy less subjective
  – emphasises uncertainty
• exceedance probability maps are often more useful than point predictions and standard errors
• text-book linked to geoR software

Diggle, P.J. and Ribeiro, P.J. (2007). Model-based Geostatistics. New York: Springer.
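
The exceedance probabilities used in the Loa loa example follow directly from the predictive-distribution recipe: draw samples of the latent process, transform each sample to prevalence, and count how often the threshold is exceeded. The base R sketch below illustrates only that last step. The covariate values, the coefficients beta and the draws of S(x) are invented for illustration; in particular the draws are independent normal variates standing in for genuine samples from [S | Y], which would come from the fitted spatial model.

```r
# Exceedance probability sketch; every input here is hypothetical.
set.seed(2)
N <- 1000                                   # number of predictive samples
n_loc <- 3                                  # three hypothetical prediction locations
d <- cbind(1, c(200, 800, 1400) / 1000)     # d(x): intercept and elevation in km
beta <- c(-1.0, -1.5)                       # made-up regression coefficients

# Stand-in for samples from [S | Y]: one row per sample, one column per location
S <- matrix(rnorm(N * n_loc, sd = 0.7), N, n_loc)

# Linear predictor eta(x) = d(x)'beta + S(x), added column by column
eta <- sweep(S, 2, drop(d %*% beta), "+")

# Inverse logit: p(x) = exp(eta) / (1 + exp(eta))
p <- plogis(eta)

point_pred <- colMeans(p)                   # predictive mean prevalence per location
exceed <- colMeans(p > 0.2)                 # estimate of P(prevalence > 0.2 | data)
```

Mapping exceed over a fine grid of locations, rather than three points, would give an exceedance probability map of the kind the take-home message recommends over point predictions alone.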
