Practicum Spatial Regression

Alex Schmidt IM-UFRJ Practicum : Spatial Regression Brazil

PASI Alexandra M. Schmidt June 2014 Instituto de Matem´atica ESDA EDA in geoR UFRJ - Brazil Review Spatial www.dme.ufrj.br/∼alex Modelling Introduction GP Stationarity Var. and Cov. Cov. Functions Spatial Regression Classical Approach Bayesian Approach PASI 2014 B´uzios,RJ, Brazil June 2014 www.dme.ufrj.br Practicum Exploratory (Spatial) Data Analysis Spatial Regression

Alex Schmidt IM-UFRJ Brazil

1. Non-spatial summaries PASI June 2014 Numerical summaries: Mean, median, standard deviation, range, etc. ESDA EDA in geoR Not useful for spatial data, as they ignore the Review Spatial location information. Notice that in spatial statistics Modelling Introduction data should not be regarded as having come from a GP Stationarity single population, usually they are NOT i.i.d. Var. and Cov. Cov. Functions Stem-and-leaf display: Spatial Regression Better than a numerical summary as gives a Classical Approach Bayesian Approach complete picture of the data. Again, the location is ignored, it gives no indication of the data’s spatial structure. Practicum Exploratory (Spatial) Data Analysis Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI June 2014

2. Methods used to explore large-scale spatial variation ESDA EDA in geoR 3-D scatter plot: a plot of Y versus location i Review Spatial (d = 2); Modelling Introduction Plot of Yi versus each marginal coordinate (latitude GP Stationarity and longitude); Var. and Cov. Contour plot of Y (assuming d = 2). Requires some Cov. Functions i Spatial Regression kind of smoothing operation. Classical Approach Bayesian Approach Practicum Exploratory (Spatial) Data Analysis Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI 3. Methods to explore small-scale variation: June 2014 (a) dependence ESDA Variogram cloud EDA in geoR 2 1/2 Plot (Z(si) − Z(sj )) versus (si − sj ) for all Review Spatial Modelling possible pairs of observations; Introduction The plot is often an unintelligible mess, hence it GP Stationarity may often be advisable to bin the lags and plot a Var. and Cov. Cov. Functions boxplot for each bin; Spatial Regression Note that this implicitly assumes isotropy (does not Classical Approach differentiate any directions). Bayesian Approach The square-root differences are more resistant to outliers. Practicum Exploratory (Spatial) Data Analysis Spatial Regression

Alex Schmidt IM-UFRJ Empirical semivariogram (or variogram) Brazil Plot one-half the average squared difference (or, for the variogram, merely the average squared difference) of observations lagged the same PASI June 2014 distance and direction apart, versus the lag Assume here that the data are regularly spaced ESDA Formally we plot γˆ(hu) versus hu, where EDA in geoR Review Spatial 1 X 2 Modelling γˆ(hu) = {Z(si) − Z(sj )} , Introduction 2N(hu) si−sj =hu GP Stationarity (u = 1, ··· , k), Var. and Cov. Cov. Functions Spatial Regression h1, ··· , hk are the distinct values of h represented Classical Approach Bayesian Approach in the data set, and N(hu) is the number of times that lag hu occurs in the data set Note that this implicitly assumes stationarity of some kind If d = 2, you can display as a 3-D plot or you can superimpose a few selected directions (e.g.: N-S,NW-SE, E-W, and NE-SW) on the same 2-D graph Practicum Exploratory (Spatial) Data Analysis Spatial Regression

Alex Schmidt IM-UFRJ Brazil

- Sample autocovariance function PASI Similar in many ways to the sample semivariogram June 2014 Plot of Cˆ(hu) versus hu, where ESDA EDA in geoR 1 X Review Spatial Cˆ(hu) = (Z(si) − Z)(Z(sj) − Z) Modelling N(hu) Introduction si−sj =hu GP Stationarity Var. and Cov. This is a spatial generalization of an important tool Cov. Functions Spatial Regression used by time series analysts Classical Approach Bayesian Approach - 3-D plot of correlation range versus spatial location, computed from a moving window. This method can detect nonstationarity in dependence Practicum Exploratory (Spatial) Data Analysis Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI 4. Methods used mainly to explore small-scale June 2014 variation: (b) variability ESDA 3-D plot of standard deviation versus spatial EDA in geoR Review Spatial location, computed from a moving window. May Modelling reveal nonstationarity in variability, and indicate Introduction GP which portion(s) of A is (are) different from the rest Stationarity Var. and Cov. Scatterplot of standard deviation versus mean, Cov. Functions Spatial Regression computed from a moving window. May also reveal Classical Approach nonstationarity in variability, but differently from the Bayesian Approach previous method. Strictly this method is non-spatial require("geoR") prices=read.table("house_prices.txt",header=TRUE) prices1=prices[1:500,] names(prices1) prices=na.exclude(prices1) names(prices) #standardizing covariates stdsqft=as.numeric(scale(prices$sqft,scale=TRUE)) stdage=as.numeric(scale(prices$age,scale=TRUE)) stddistfree=as.numeric(scale(prices$dist_freeway,scale=TRUE)) #using the standardizing variables prices$sqft=stdsqft prices$age=stdage prices$dist_freeway=stddistfree #transforming into a geoR object geoprices=as.geodata(prices2,coords.col=c(8,9),covar.col=2:7, data.col=1) plot(geoprices) plot(geoprices,scatter3d=TRUE) #transforming into a geoR object geoprices=as.geodata(prices2,coords.col=c(8,9),covar.col=2:7,data.col=1) plot(geoprices) plot(geoprices,scatter3d=TRUE)

#checking for duplicated coordinates dup.coords(geoprices) #jittering the duplicated locations coord=geoprices$coords coordjitter=jitter2d(geoprices$coords,max=0.001) dup.coords(coordjitter) cbind(geoprices$coords,coordjitter) geoprices$coords=coordjitter plot(geoprices) Practicum Introduction Spatial Regression

Alex Schmidt IM-UFRJ Brazil Basic Model: Data (Y) are a (partial) realization of a random process ( or random field) PASI June 2014 {Y (s): s ∈ D} ESDA EDA in geoR d where D is a fixed subset of R with positive Review Spatial Modelling d-dimensional volume. In other words, the spatial index s Introduction GP varies continously throughout the region D. Stationarity Var. and Cov. NOTE: Cov. Functions Spatial Regression A stochastic process is a collection of random variables Classical Approach Bayesian Approach X(t), t ∈ T defined on a common probability space indexed by t which is in the index set T which describes the evolution of some system. For example, X(t) could be the number of people in line at time t, or Y (s) the amount of rainfall at location s. Practicum Spatial Regression

Alex Schmidt IM-UFRJ Brazil GOAL:

Want a method of predicting Y (s0) for any s0 in D. PASI Want this method to be optimal (in some sense). June 2014 ESDA What do we need? EDA in geoR Want Y (s): s ∈ D to be continuous and ”smooth Review Spatial Modelling enough” (local stationarity) Introduction GP Stationarity description of spatial covariation Var. and Cov. Cov. Functions once we obtain the spatial covariation how to get Spatial Regression Classical Approach predicted values Bayesian Approach Basic Approach: given structure, predict. Practicum Gaussian Processes Spatial Regression

Alex Schmidt IM-UFRJ Brazil Before we proceed let us define Gaussian Processes Definition: PASI A function Y (.) taking values y(s) for s ∈ D has a June 2014 distribution with mean function m(.) ESDA EDA in geoR and function c(., .), denoted by Review Spatial Modelling Introduction Y (.) ∼ GP (m(.), c(., .)) GP Stationarity Var. and Cov. Cov. Functions if for any s1, ··· , sn ∈ D, and any n = 1, 2, ··· , the joint Spatial Regression Classical Approach distribution of Y (s1), ··· ,Y (sn) is multivariate Normal Bayesian Approach with parameters given by

E{Y (sj)} = m(sj) and Cov(Y (si),Y (sj)) = c(si, sj). Practicum Intrinsic Stationarity Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI It is defined through first differences: June 2014

ESDA E(Y (s + h) − Y (s)) = 0, EDA in geoR Review Spatial Var(Y (s + h) − Y (s)) = 2γ(h) Modelling Introduction GP The quantity 2γ(h) is known as the variogram. Stationarity Var. and Cov. γ(.) is known as the semi-variogram. Cov. Functions Spatial Regression In , 2γ(.) is treated as a parameter of the Classical Approach Bayesian Approach random process {Y (s): s ∈ D} (because it describes the covariance structure). Practicum Second Order Stationarity Spatial Regression

Alex Schmidt IM-UFRJ Brazil Statistically speaking, some further assumptions about Y PASI have to be made. Otherwise, the data represent an June 2014 incomplete sampling of a single realization, making ESDA inference impossible. EDA in geoR A random function Y (.) satisfying: Review Spatial Modelling Introduction GP E(Y (s)) = µ ∀s ∈ D Stationarity 0 0 0 Var. and Cov. Cov(Y (s) − Y (s )) = C(s − s ) ∀s, s ∈ D Cov. Functions Spatial Regression Classical Approach is defined to be second-order stationary. Furthermore, Bayesian Approach if C(s − s0) is a function only of || s − s0 || (it is not a function of the locations), then C(.) is said to be isotropic. Practicum Spatial Regression

Alex Schmidt IM-UFRJ Brazil Notice that a process which is second order stationary is also intrinsic stationary, the inverse is not necessarily true. PASI There is a stronger type of stationarity which is called June 2014 strict stationarity (joint probability distribution of the ESDA data depends only on the relative positions of the sites at EDA in geoR which the data were taken). Review Spatial Modelling If the random process Y (.) is Gaussian, we need only to Introduction GP specify its first-order and second-order properties, namely Stationarity Var. and Cov. its mean function and its . Cov. Functions Spatial Regression In practice, an assumption of second-order stationarity is Classical Approach Bayesian Approach often sufficient for inference purposes and it will be one of our basic assumptions. Practicum Covariogram and Spatial Regression

Alex Schmidt IM-UFRJ Brazil Covariogram and Correlogram If 0 0 PASI Cov(Y (s),Y (s )) = C(s − s ) June 2014 0 for all s, s ∈ D, C(.) is called the covariogram. ESDA If C(0) > 0, we can define EDA in geoR Review Spatial Modelling Introduction ρ(h) = C(h)/C(0) GP Stationarity Var. and Cov. as the correlogram. Cov. Functions Spatial Regression Properties: Classical Approach Bayesian Approach C(h) = C(−h) ρ(h) = ρ(−h) ρ(0) = 1 C(0) = V ar(Y (s)) if Y (.) is second order stationary. Practicum Relationships between covariogram Spatial Regression

Alex Schmidt and variogram IM-UFRJ Brazil Consider

0 0 PASI V ar(Y (s) − Y (s )) = V ar(Y (s)) + V ar(Y (s )) June 2014 0 −2Cov(Y (s),Y (s )) ESDA EDA in geoR If Y (.) is second order stationary, Review Spatial Modelling 0 0 Introduction V ar(Y (s) − Y (s )) = 2{C(0) − C(s − s )} GP Stationarity Var. and Cov. Cov. Functions If Y (.) is intrinsically stationary, Spatial Regression Classical Approach 2γ(h) = 2{C(0) − C(h)} Bayesian Approach

If C(h) → 0 as || h ||→ ∞ then 2γ(h) → 2C(0).(C(0) is the sill of the variogram). The variogram estimation is to be preferred to covariogram estimation. (See Cressie, p.70 for more details) Practicum Features of the Variogram Spatial Regression

Alex Schmidt IM-UFRJ Brazil γ(−h) = γ(h)

γ(0) = 0 PASI June 2014 If limh→0 γ(h) = c0 6= 0, then c0 is called the nugget effect. ESDA EDA in geoR Mathematically a nugget effect means: Review Spatial Modelling If Y is L2 continuous(processes Y (.) for which 2 Introduction E(Y (s + h) − Y (s)) → 0, as GP Stationarity || h ||→ 0), nugget effects cannot happen! Var. and Cov. Cov. Functions - So if continuity is assumed at the microscale (very Spatial Regression small h) in the Y (.) process, the only possible Classical Approach Bayesian Approach reason for c0 > 0 is measurement error. (Recall 2γ(0) is the variance of the difference between two measurements taken at exactly the same place).

In practice we only have data {y(si): i = 1, ··· , n} so we can’t say much for lags h < min{|| si − sj ||: 1 ≤ i < j ≤ n} Practicum Often in spatial prediction we assume nugget effect Spatial Regression is entirely due to measurement error. In other words, Alex Schmidt IM-UFRJ it is reasonable to envisage that a measured value at Brazil location x could be replicated, and that the resulting multiple values would not be identical. In this case, PASI an alternative is to model a ”white noise” zero-mean June 2014 process that adds extra variation to each ESDA observation, that is EDA in geoR Review Spatial Modelling Y (si) = S(si) + Zi, Introduction GP Stationarity Var. and Cov. where S(x) follows a Gaussian process with Cov. Functions 2 Spatial Regression covariance function γ(u) = σ ρ(u) such that Classical Approach Bayesian Approach ρ(0) = 1 and the Zi are mutually independent, N(0, τ 2) random variables. And the nuggect effect would be

2 2 2 2 2 2 ρY (u) = σ ρ(u)/(σ + τ ) → σ /(σ + τ ) < 1 as u → 0 Practicum Properties of the variogram Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI June 2014 (i) 2γ(.) continuous at origin implies Y (.) is L2 ESDA continuous EDA in geoR (ii) 2γ(h) does not approach 0 as h → origin implies Review Spatial Modelling Y (.) is not L2 continuous and is highly irregular Introduction GP Stationarity (iii) 2γ(.) is a positive constant (except at the origin Var. and Cov. 0 Cov. Functions where it is zero). Then Y (s) and Y (s ) are Spatial Regression 0 Classical Approach uncorrelated for any s 6= s , regardless of how close Bayesian Approach they are; Y (.) is often called white noise. Practicum Properties of the variogram (continuing) Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI (iv) 2γ(.) must be conditionally negative-definite, i.e. June 2014

n n ESDA X X EDA in geoR aiaj2γ(si − sj) ≤ 0 Review Spatial i=1 j=1 Modelling Introduction GP for any finite number of locations {s : i = 1, ··· n} Stationarity i Var. and Cov. and real numbers {a1, ··· , an} satisfying Cov. Functions n Spatial Regression P a = 0. Classical Approach i=1 i Bayesian Approach (v) 2γ(h)/ || h ||2→ 0 as || h ||→ ∞, i.e. 2γ(h) can’t increase too fast with || h ||2. Practicum Some Isotropic Parametric Covariance Spatial Regression

Alex Schmidt Functions IM-UFRJ Brazil

Most parametric variogram models used in practice will PASI include a nugget effect, and in the stationary case will June 2014 therefore be of the form ESDA EDA in geoR 2 2 Review Spatial 2γ(h) = τ + σ (1 − ρ(h)) Modelling Introduction GP ρ(h) must be a positive definite function. Also we would Stationarity Var. and Cov. usually require the model for the correlation function ρ(h) Cov. Functions Spatial Regression to incorporate the following features: Classical Approach Bayesian Approach 1. ρ(.) is monotone non-increasing in h; 2. ρ(h) → 0 as h → ∞; 3. at least one parameter in the model controls the rate at which ρ(h) decays to zero. Practicum Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI In addition we may wish to include in the model some June 2014

flexibility in the overall shape of the correlation function. ESDA Hence, a parametric model for the correlation function EDA in geoR Review Spatial can be expected to have one or two parameters, and a Modelling Introduction model for the variogram three or four (the two correlation GP Stationarity parameters plus the two variance components). Var. and Cov. Cov. Functions Spatial Regression Classical Approach Bayesian Approach Practicum How does a variogram usually look like? Spatial Regression

Alex Schmidt IM-UFRJ nugget effect - represents micros-cale variation or Brazil measurement error. It can be estimated from the

empirical variogram as the value of γ(h) for h = 0. PASI June 2014 sill - the limh→∞ γ(h) representing the variance of ESDA the random field. EDA in geoR range - the distance (if any) at which data are no Review Spatial Modelling longer autocorrelated. Introduction GP Stationarity Exemplo de Variograma Exemplo de Variograma Var. and Cov. Cov. Functions

1.2 1.2 Spatial Regression Classical Approach 1.0 1.0 Bayesian Approach 0.8 0.8 ) ) h h ( ( 0.6 0.6 γ γ 0.4 0.4 0.2 0.2 0.0 0.0

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

distancia distancia Practicum Spatial Regression

Alex Schmidt IM-UFRJ The spherical family Brazil This one parameter family of correlation function is defined by PASI June 2014

 3 1 3 ESDA 1 − 2 (h/φ) + 2 (h/φ) ; 0 ≤ h ≤ φ ρ(h; φ) = EDA in geoR 0 ; h > φ Review Spatial Modelling Introduction Because the family depends only on a scale GP Stationarity parameter φ, it gives no flexibility in shape. Var. and Cov. Cov. Functions The spherical correlation function is continuous and Spatial Regression Classical Approach twice-differentiable at the origin. Bayesian Approach Therefore corresponds to a mean-square differentiable process Y (s). Practicum Spatial Regression

Alex Schmidt IM-UFRJ The powered exponential family Brazil This two parameter family is defined by

κ PASI ρ(h) = exp{−(h/φ) }, June 2014

ESDA with φ > 0 and 0 < κ ≤ 2. EDA in geoR The corresponding process Y (s) is mean-square Review Spatial Modelling continuous (but non differentiable) if κ < 2, but Introduction GP becomes mean-square infinitely differentiable if Stationarity Var. and Cov. κ = 2. Cov. Functions Spatial Regression The exponential correlation function corresponds to Classical Approach Bayesian Approach the case where κ = 1. The case κ = 2 is called the Gaussian correlation function. Practicum Spatial Regression

Alex Schmidt IM-UFRJ The Mat´ernfamily Brazil It is defined by PASI κ−1 −1 κ June 2014 ρ(h; φ; κ) = {2 Γ(κ)} (h/φ) Kκ(h/φ) ESDA EDA in geoR where (φ, κ) are parameters and Kκ(.) denotes the Review Spatial modified Bessel function of the third kind of order κ. Modelling Introduction This family is valid for any φ > 0 and κ > 0. GP Stationarity The case κ = 0.5 is the same as the exponential Var. and Cov. Cov. Functions correlation function, Spatial Regression Classical Approach ρ(h) = exp(−h/φ). Bayesian Approach The Gaussian correlation function is the limiting case as κ → ∞. Practicum Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI One attractive feature of this family is that the parameter June 2014 κ controls the differentiability of the underlying process ESDA Y (s) in a very direct way; the integer part of κ gives the EDA in geoR number of times that Y (s) is mean-square differentiable. Review Spatial Modelling The Mat´ernfamily is probably the best choice as a Introduction GP flexible, yet simple (only two parameters) correlation Stationarity Var. and Cov. function for general case. Cov. Functions Spatial Regression Classical Approach Bayesian Approach Practicum Spatial Regression

Gaussiana Exponencial Alex Schmidt

1.0 1.0 IM-UFRJ Brazil 0.8 0.8 0.6 0.6

Correlacao Correlacao PASI 0.4 0.4 June 2014 0.2 0.2 ESDA 0.0 0.0 EDA in geoR 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 Distancia Distancia Review Spatial

Exponencial Potencia Matern Modelling Introduction 1.0 1.0 GP Stationarity 0.8

0.8 Var. and Cov. Cov. Functions 0.6 Spatial Regression 0.6 Classical Approach Correlacao Correlacao 0.4 Bayesian Approach 0.4 0.2 0.2 0.0

0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0

Distancia Distancia Practicum Spatial Regression

Alex Schmidt IM-UFRJ Brazil 1.0

gaussiana PASI exponencial June 2014 expo.potencia 0.8 matern ESDA EDA in geoR

0.6 Review Spatial Modelling

correlacao Introduction

0.4 GP Stationarity Var. and Cov. Cov. Functions

0.2 Spatial Regression Classical Approach Bayesian Approach 0.0

0.0 0.2 0.4 0.6 0.8 1.0

distancia Practicum Spatial Regression Spatial Regression

Alex Schmidt IM-UFRJ Generalized least squares (GLS) with known covariance Brazil matrix. PASI Model: June 2014 Y = Fβ + ,E() = 0, V ar() = V, ESDA where V is a completely specified positive definite EDA in geoR matrix. (e.g. V = σ2exp{−φd }, exponential Review Spatial ij ij Modelling 2 family with φ and σ known). Introduction GP Stationarity GLS estimator of β: Var. and Cov. ˆ 0 −1 −1 0 −1 Cov. Functions βGLS = (F V F) F V Y. Spatial Regression Classical Approach But in practice φ is unknown and consequently V cannot Bayesian Approach be completely specified. A natural solution is to replace φ in the evaluation of V by an estimator φ, thereby obtaining Vˆ = V(φˆ) (Estimated GLS). EGLS estimator of β: ˆ 0 ˆ −1 −1 0 ˆ −1 βGLS = (X V X) X V Y Practicum Spatial Regression Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI Classical Procedure: June 2014

1. Estimate β using ordinary least squares; ESDA EDA in geoR 2. Estimate residuals from this β estimate; Review Spatial Modelling 3. Calibrate the semi-variogram from the Introduction GP residuals; Stationarity Var. and Cov. Cov. Functions 4. Use the calibration of the semi-variogram to Spatial Regression Classical Approach estimate V; Bayesian Approach

5. Re-estimate β using βGLS. #removing some observed values for prediction predloc=c(2,10,85,91,102) prices2=prices[-predloc,]

#standard regression reg=lm(prices2[,1]~prices2[,2]+prices2[,3] +prices2[,4]+prices2[,6]) fitreg=summary(reg) fitreg

#Residual Analysis resid=matrix(c(prices2[,8],prices2[,9],fitreg$residuals), ncol=3,byrow=FALSE) georesid=as.geodata(resid,coords.col=c(1,2),data.col=3) plot(georesid) #variogram of the residuals # binned variogram vario.b <- variog(georesid, max.dist=1) # variogram cloud vario.c <- variog(georesid, max.dist=1, op="cloud") #binned variogram and stores the cloud vario.bc <- variog(georesid, max.dist=1, bin.cloud=TRUE) # smoothed variogram #vario.s <- variog(georesid, max.dist=1, op="sm", band=0.2) # plotting the variograms: par(mfrow=c(2,2)) plot(vario.c, main="variogram cloud") plot(vario.bc, bin.cloud=TRUE, main="clouds for binned variogram") plot(vario.b, main="binned variogram") #plot(vario.s, main="smoothed variogram") #OLS estimate of the variogram varioresid=variog(georesid,max.dist=0.08) #ols estimates of the variogram parameters olsvari=variofit(varioresid,ini=c(0.02,0.1)) summary(olsvari) par(mfrow=c(1,1)) plot(variog(georesid,max.dist=0.10)) lines(olsvari) #Envelopes for an empirical variogram by simulating #data for given model parameters. #Computes bootstrap paremeter estimates resid.mc.env <- variog.mc.env(georesid, obj.variog = varioresid) plot(varioresid,ylim=c(0,0.1)) lines(resid.mc.env)

#fitting the parameters of a variogram "by eye" eyefit(varioresid) #fitting a spatial regression model meantrend=trend.spatial(~sqft+age+bedrooms+dist_freeway, geoprices)

#a set of initial values #spatialreg1=likfit(geoprices,trend=meantrend, ini.cov.pars=c(0.15,0.1),nospatial=TRUE) load(file="spatialreg1.RData") #save("spatialreg1", file="spatialreg1.RData") summary(spatialreg1) #spatial interpolation for locations left out from inference names(prices) xp=prices[predloc,c(2:4,7)] meantrend.loc <- trend.spatial(~sqft+age+bedrooms+ dist_freeway, xp) kc <- krige.conv(geoprices, loc=locationpred, krige=krige.control(trend.d=meantrend, trend.l=meantrend.loc, obj.model=spatialreg1), output=output.control(n.pred=1000)) names(kc) dim(kc$simul) par(mfrow=c(3,2), mar=c(3,3,0.5,0.5)) for(i in 1:5){ hist(kc$simul[i,], main="", prob=T) lines(density(kc$simul[i,])) abline(v=prices[predloc,1][i], col=2) } Practicum Review of Spatial Regression

Alex Schmidt IM-UFRJ Brazil

Recall that PASI 1. S(.) is a stationary Gaussian process with June 2014 2 E[S(s)] = 0, V ar(S(s)) = σ and correlation ESDA function EDA in geoR Review Spatial ρ(φ; h) = Corr(S(s),S(s − h)); Modelling Introduction GP 2. The conditional distribution of Yi given S(.) is Stationarity 2 Var. and Cov. Gaussian with mean µ(xi) + S(xi) and variance τ ; Cov. Functions Pp Spatial Regression 3. µ(s) = βjfj(s) for known explanatory Classical Approach j=1 Bayesian Approach variables fj(.);

4. Yi : i = 1, 2, ··· , n are mutually independent, conditional on S(.). These assumptions imply that the joint distribution of Y Practicum is multivariate Normal, Spatial Regression Alex Schmidt 2 2 IM-UFRJ Y | θ ∼ Nn(Fβ, σ R + τ I) Brazil where:

PASI June 2014

θ = (β, σ2, γ, τ 2), ESDA EDA in geoR Review Spatial Modelling Introduction GP β = (β1, ··· , βp), Stationarity Var. and Cov. Cov. Functions Spatial Regression Classical Approach F is the n × p matrix with jth Bayesian Approach column fj(x1), ··· , fj(x),

I is the n × n identity matrix,

th R is the n × n matrix with (i, j) element ρ(uij; γ), where uij =|| xi − xj ||, the Euclidean distance between xi and xj, and γ are the parameters in the chosen parametric correlation function. Practicum Parameter Estimation Spatial Regression

Alex Schmidt IM-UFRJ In our model the vector of parameters is given by Brazil θ = (β, σ2, γ, τ 2). From Bayes’ Theorem, PASI June 2014

π(θ) ∝ p(θ) l(θ) ESDA EDA in geoR Review Spatial It follows that Modelling Introduction 2 2 2 2 −1/2 GP π(β, σ , γ, τ ) ∝ p(β, σ , γ, τ ) | V | Stationarity Var. and Cov.   Cov. Functions 1 0 −1 Spatial Regression exp − (y − Fβ) V (y − Fβ) (1) Classical Approach 2 Bayesian Approach

where V = σ2R + τ 2I. Usually we assume that the parameters are independent a priori, then

p(β, σ2, τ 2, γ) = p(β) p(σ2) p(τ 2) p(γ) Practicum Predictive Inference Spatial Regression

Alex Schmidt IM-UFRJ Brazil Notice that PASI Z June 2014 p(S | Y) = p(S | Y, θ) p(θ | Y)dθ ESDA EDA in geoR The Bayesian predictive distribution is an average of Review Spatial Modelling classical predictive distributions for particular values of θ, Introduction GP weighted according to the posterior distribution of θ. Stationarity Var. and Cov. The effect of the above averaging is typically to make the Cov. Functions Spatial Regression predictions more conservative, in the sense that the Classical Approach Bayesian Approach variance of the predictive distribution is usually expected to be larger than the variance of the distribution (S | Y, θˆ), obtained by plugging an estimate of θ into the classical predictive distribution. Practicum Prediction with uncertainty only in the mean Spatial Regression

Alex Schmidt parameter IM-UFRJ Brazil

PASI Assume σ2 and φ are known and τ 2 = 0. June 2014

Assume a Gaussian prior mean for the parameter ESDA β ∼ N(m , σ2V ), where σ2 is the (assumed known) EDA in geoR β β Review Spatial variance of S(x), the posterior is given by Modelling Introduction GP ˆ ˆ Stationarity β | Y ∼ N(β; Vβ) Var. and Cov. Cov. Functions Spatial Regression where Classical Approach Bayesian Approach βˆ = (V−1 + F0R−1F)−1(V−1m + F0R−1y) β β β and Vˆ = σ2(V−1 + F0R−1F)−1 β β Prediction of Y0 = S(x0) at an arbitrary location x0, Practicum Spatial Regression 2 p(Y0 | Y, σ , φ) = Alex Schmidt Z IM-UFRJ 2 2 Brazil p(Y0 | Y, β, σ , φ)p(β | Y, σ , φ)dβ

This predictive distribution is Gaussian with mean and PASI June 2014 variance given by ESDA 0 −1 −1 0 −1 −1 EDA in geoR E[Y0 | Y] = (F0 − r R F)(V + F R F) β Review Spatial −1 Modelling V mβ + Introduction β GP 0 −1 0 −1 Stationarity [r R + (F0 − r R F) Var. and Cov. Cov. Functions (V−1F0R−1F)−1F0R−1]Y Spatial Regression β Classical Approach Bayesian Approach 2 0 −1 V [Y0 | Y] = σ [R0 − r R r + (F − r0R−1F)(V−1 + F0R−1F)−1 0 β 0 −1 (F0 − r R F]. where r is the vector of correlations between S(x0) and S(xi): i = 1, ··· , n. Practicum Spatial Regression

The predictive variance has 3 components: the first and Alex Schmidt IM-UFRJ second components represent the marginal variance for Brazil Y0 and the variance reduction after observing Y, whilst the third component accounts for the additional PASI uncertainty due to the unknown value of β. This last June 2014 component reduces to zero if Vβ = 0, since this formally ESDA corresponds to β being known beforehand. EDA in geoR Review Spatial In the limit as all diagonal elements of Modelling Introduction Vβ → ∞, these formulae for the predictive mean and GP Stationarity variance correspond exactly to the universal kriging Var. and Cov. Cov. Functions predictor and its associated kriging variance, which in Spatial Regression Classical Approach turn reduce to the formulae for ordinary kriging if the Bayesian Approach mean value surface is assumed constant. ⇒ ordinary and universal kriging can be interpreted as Bayesian prediction under prior ignorance about the mean (and known σ2). Practicum Prediction with uncertainty in all model Spatial Regression

Alex Schmidt parameters IM-UFRJ Brazil In this case all the quantities in the model are considered unknown ⇒ need to specify their prior distribution. PASI Likelihood is like in equation (1); June 2014

ESDA Bayesian Model completed with prior distribution for EDA in geoR θ = (β, σ2, γ, τ 2) and assume, for example, Review Spatial 2 κ Modelling Vij = σ exp(−φ(dij) ); in this case γ = (φ, κ); Introduction GP 2 Stationarity p(β) ∼ Np(0, σ V ), for some fixed diagonal matrix Var. and Cov. β Cov. Functions V ; Spatial Regression β Classical Approach Bayesian Approach φ must be positive, then one possibility is p(φ) ∼ Ga(a2, b2);

κ could have an uniform prior on the interval [0.05, 2].

2 Usually we use p(σ ) ∼ IG(a1, b1) and 2 p(τ ) ∼ IG(a3, b3) (conjugacy of the normal-gamma) . Practicum Posterior distribution Spatial Regression

Alex Schmidt IM-UFRJ Brazil Assume θ = (β, σ2, φ, τ 2, κ) then

 1 PASI π(θ) ∝ | V(γ) |−1/2 exp − (y − Xβ)0V(γ)−1 June 2014 2 o ESDA (y − Xβ) EDA in geoR Review Spatial ( p ) Modelling 2 −n/2 1 X 2 Introduction ×(σ ) exp − (βi − µi) GP 2σ2 Stationarity i=1 Var. and Cov. Cov. Functions (a2−1) Spatial Regression ×(φ) exp {−b2φ} Classical Approach Bayesian Approach 2 −(a1+1)  2 ×(σ ) exp −b1/σ

2 −(a3+1)  2 ×(τ ) exp −b3/τ

⇒ Unknown distribution ⇒ need of approximation. Usually, we use MCMC methods. Practicum Predictive Inference Spatial Regression

Alex Schmidt IM-UFRJ Recall that Brazil Z p(S | Y) = p(S | Y, θ) p(θ | Y)dθ PASI June 2014

Suppose we can draw samples from the joint ESDA posterior distribution for θ, i.e EDA in geoR Review Spatial (1) (2) (L) Modelling θ , θ , ··· , θ ∼ π(θ) Introduction GP Stationarity Then Var. and Cov. Cov. Functions Z Spatial Regression Classical Approach p(S | Y) = p(S | Y, θ) p(θ | Y)dθ Bayesian Approach

L 1 X ≈ p(S | Y, θ(i)) L i=1 ↓ This is Monte Carlo integration Practicum Predictive Inference (Bayesian kriging) Spatial Regression

Alex Schmidt IM-UFRJ Brazil Prediction of Y (s0) at a new site s0 associated covariates x = x(s ) 0 0 PASI Predictive distribution: June 2014 Z ESDA p(y(s ) | y, X, X ) = p(y(s , θ | y, X, X )dθ EDA in geoR 0 0 0 0 Review Spatial Modelling Z Introduction = p(y(s0) | y, θ, X, X0)p(θ | y, X)dθ GP Stationarity Var. and Cov. Cov. Functions Spatial Regression p(y(s0) | y, θ, X, X0) is normal since Classical Approach Bayesian Approach p(y(s0), y | θ, X, X0) is! ⇒ easy Monte Carlo estimate using composition with Gibbs draws θ(1), ··· , θ(L): for each θ(l) drawn (l) from p(θ | y, X), draw Y (s0) from (l) p(y(s0) | y, θ , X, X0) Practicum Predictive Inference Spatial Regression

Alex Schmidt IM-UFRJ Suppose we want to predict at a set of m sites, say Brazil S0 = {s01, ··· , s0m} We could individually predict each site PASI June 2014 “independently” using method of the previous slide ESDA BUT joint prediction may be of interest, e.g., EDA in geoR bivariate predictive distributions to reveal pairwise Review Spatial Modelling dependence, to reflect posterior associations in the Introduction GP realized surface Stationarity Var. and Cov. Form the unobserved vector Cov. Functions Spatial Regression Y = (Y (s , ··· ,Y (s )), with X as covariate Classical Approach 0 01 0m 0 Bayesian Approach matrix for S0, and compute Z p(y0 | y, X, X0) = p(y0 | y, θ, X, X0)p(θ | y, X)dθ

Again, posterior sampling using composition sampling Practicum Main References Spatial Regression

Alex Schmidt IM-UFRJ Brazil

PASI June 2014

Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004) ESDA Hierarchical Modeling and Analysis of Spatial Data. EDA in geoR Review Spatial New York: Chapman and Hall Modelling Introduction Cressie, N.A. (1993) Statistics for Spatial Data. GP Stationarity John Wiley & son Inc. Var. and Cov. Cov. Functions Spatial Regression Diggle, P.J. e Ribeiro Jr., P.J. (2007) Model based Classical Approach Geostatistics. Springer Series in Statistics Bayesian Approach