Kriging: An Introduction to Concepts and Applications
Nicholas M. Giner, Esri

Agenda
• What is interpolation?
• Interpolation applications
• Spatial autocorrelation
• Deterministic vs. geostatistical interpolators
• Building up interpolation
• Kriging theory
• Empirical Bayesian Kriging (EBK)
• EBK Regression
• EBK 3D
• Areal Interpolation

What is interpolation?
• The process of predicting values at unknown locations using values at known locations
• Transforms measurements of a continuous phenomenon into a continuous surface
• Interpolation predicts within the region covered by the data; extrapolation predicts outside it

Interpolation applications
• Many continuous phenomena (z)
  - Elevation
  - Soil (pH, nutrient levels, porosity)
  - Precipitation / Snowfall
  - Temperature
  - Windspeed
  - Air pollution / Air quality
  - Ozone
  - Water quality
  - Mining
  - Heavy metal concentrations
  - Environmental contaminants
  - Noise
  - Disease occurrence

Spatial autocorrelation
• Tobler’s First Law of Geography
  - “…everything is related to everything else, but near things are more related than distant things”
• O’Sullivan and Unwin, 2003
  - “If geography is worth studying at all, it must be because phenomena do not vary randomly across space”

Deterministic vs. geostatistical interpolators
• Deterministic interpolators
  - Based on mathematical functions, not statistical theory
  - Model parameters are chosen by the user
  - Do not include randomness
  - No estimates of prediction error (uncertainty/accuracy/confidence)
  - Examples: Inverse Distance Weighting (IDW), Spline, Global Polynomial Interpolation
• Geostatistical interpolators
  - Based on mathematical functions and statistical theory
  - Model parameters are estimated from the data (spatial autocorrelation)
  - Include randomness to approximate the variation present in geographic data
  - Produce estimates of prediction error (uncertainty/accuracy/confidence)
  - Example: Kriging

Two components of all interpolators
• Neighborhood definition: distance or number of points
• Estimation function: the mathematics used to make the estimate (e.g. determining the weights)

Building up interpolation
Source: Geographic Information Analysis, O’Sullivan and Unwin
• Average of all data points: 49
• Local spatial average: 40.75
  - All points in the local neighborhood are weighted equally
• Inverse Distance Weighted (IDW): 41.01
  - Closer points have higher weights and more influence
• Inverse Distance Weighted (IDW): 49.8
  - More influence from the points below, simply because they are within the neighborhood and closer in distance
• Kriging: 56.2
  - Prediction is based on how correlated points are as a function of distance
  - There can be negative weights

Geostatistics and Kriging
• Geostatistics: statistics of spatially correlated data
• Quantify spatial autocorrelation and incorporate it into the interpolation
• Kriging: the “optimal” interpolator, provided the data meet certain conditions (assumptions)
  - Based on the foundational work by Daniel Krige and Georges Matheron in the 1950s-1960s predicting gold ores in South Africa
  - Main idea
    is that spatial data can be decomposed into two main components
    1) Deterministic variation (global trend)
       • Can be a constant mean or a mathematical function
    2) Spatially correlated, random variation (local autocorrelation)

Z(s) = µ + ε(s)
Prediction = mean + error

What makes it “optimal”?
• Estimates the true value, on average (unbiased)
• Lowest expected prediction error
• Can use information about covariates
• Can be generalized to different geometries
• Estimates a prediction distribution at each location (not just one value)
• Kriging assumptions
  - Normally distributed
  - No trends
  - Spatially autocorrelated
  - Stationary

Kriging assumption: Normal distribution
• If your input data is normally distributed, the predicted distribution is guaranteed to be normally distributed as well
• Many transformation options are available if it is not
(Figures: Histogram, QQ Plot)

Kriging assumption: No trends
• Systematic patterns and trends in an area might impact the interpolation
• Trade-off with spatial autocorrelation

Kriging assumption: Spatial autocorrelation
• How correlated points are, based on how far apart they are from one another
• Once you know the expected correlation between known values at a given distance, you can predict the value at unknown locations

Kriging assumption: Stationarity
• The correlation between points is defined only by the distance between them, not their location
  - Mean stationarity
  - Local stationarity

Kriging workflow
1) Map your data
2) Exploratory Spatial Data Analysis (ESDA), configure options
3) Variography: describe spatial variation in the data
4) Fit model: summarize spatial variation with a mathematical function
5) Use model to determine weights in search neighborhood
6) Interpolate
7) Evaluate (cross-validation)
8) Repeat steps 2-7

Demo #1
Map the data, Geostatistical Wizard, ESDA, Configure options

Variography (Modeling)
• Examining and modeling spatial autocorrelation
1) Calculate the empirical semivariogram
   - Calculate the distance and difference between each pair of points
   - Semivariogram(distance h) = 0.5 * average[(value at location i - value at location j)^2]
2) Bin the semivariogram
   - Group the pairs of locations into a specified range of distances (lags)
3) Average the semivariogram
   - Calculate the average distance and difference (semivariance) for each lag
4) Fit a model
   - Find the best-fit line for the average semivariances

Semivariogram
• Represents the expected difference in data value for pairs of points that are a given distance apart, regardless of their spatial location
• Nugget: semivariance at 0 distance (measurement error)
• Range: distance at which autocorrelation falls off and semivariance becomes constant; there is no more spatial structure in the data, and points are uncorrelated beyond the range (data correlation)
• Sill: constant semivariance value beyond the range (data variance)

Demo #2
Simple kriging

Validation
• Full validation
  - Split data into ~80% training, ~20% testing
• Cross-validation (“leave-one-out”)
  - Remove a single known point, use all remaining points to interpolate at that location, then compare the measured value to the predicted value
• Diagnostics
  - Predictions should be unbiased (e.g.
    over- and under-predictions should cancel each other out)
    - Mean Error should be near zero (unbiased)
    - Mean Standardized Error should be near zero
  - Predictions should be close to known values
    - Root Mean Square Error (RMSE) should be as small as possible
  - Assessment of model stability and accuracy of standard errors
    - Root Mean Square Standardized should be close to 1
    - Average Standard Error should be close to RMSE

Empirical Bayesian Kriging (EBK)
• Automates the most difficult aspects of building a valid kriging model
• Not as many parameters
• Relaxes the stationarity assumption of kriging
• More accurate estimates of prediction standard errors
• Accounts for the uncertainty of assuming one (true) semivariogram

How EBK works
1. Divide data into local subsets of a given size (can overlap)
2. For each subset, estimate the semivariogram
3. Use this semivariogram to simulate a new set of values for the points (sim #1)
4. Produce a semivariogram from the simulated points (semiv #1)
5. Repeat steps 3-4 many times, resulting in a distribution of semivariograms
6. Mix the local prediction surfaces together to get the final surface

Demo #3
EBK

EBK Regression Prediction
• Combines regression with kriging
• Allows covariates (explanatory variables) to improve predictions
• Both regression models and kriging models are estimated locally
• Uses Principal Components Analysis (PCA)

Kriging
Prediction = mean + error
• Mean is constant and error term is estimated
• Estimation focuses on the error terms, and does little with the mean

Regression Kriging
Prediction (DV) = intercept + (v1 * coef1) + (v2 * coef2) + … + (vk * coefk) + error
• Regression equation estimates the mean for kriging from surrounding points
• Error is modeled with the semivariogram, and kriging is performed
• If the semivariogram is flat, you essentially have OLS
• If there are no explanatory variables, you essentially have simple kriging

Regression (OLS)
Prediction (DV) = intercept + (v1 * coef1) + (v2 * coef2) + … + (vk * coefk) + error
• Error term is assumed to be random noise (unmodellable)
• Estimation focuses on the mean, and does little with the error terms

Demo #4
EBK Regression

EBK 3D
• Applies the EBK model to 3D
  - Distances are calculated using 3D Euclidean distance
  - Subsets are created in 3D
  - Search neighborhoods are 3D
  - Vertical trend can be removed
• Elevation Inflation Factor
  - Vertical variation happens at a different rate than horizontal variation

Demo #5
EBK 3D

Areal Interpolation
• Applies kriging theory to polygon data
• Two main use cases
  - Fill missing data
  - Downscale from larger polygons to smaller polygons
• Three data inputs
  - Average (Gaussian)
  - Rate (Binomial)
  - Count (Poisson)

Demo #5
Areal Interpolation
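
The first three variography steps (pairwise differences, binning into lags, averaging per lag) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the equal-width lag convention, and the four sample points are hypothetical and do not come from the slides or from any Esri tool.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Binned empirical semivariogram (hypothetical helper, not an Esri API).

    Lag b collects pairs whose separation lies in
    [b * lag_width, (b + 1) * lag_width); pairs farther apart than
    n_lags * lag_width are ignored.
    """
    # Step 1: distance and half squared difference for every pair, once each.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diff = 0.5 * (values[i] - values[j]) ** 2

    # Step 2: assign each pair to a lag bin.
    bins = np.floor(dist / lag_width).astype(int)

    # Step 3: average distance and semivariance per non-empty lag.
    lag_dist, semivar, counts = [], [], []
    for b in range(n_lags):
        mask = bins == b
        if mask.any():
            lag_dist.append(dist[mask].mean())
            semivar.append(half_sq_diff[mask].mean())
            counts.append(int(mask.sum()))
    return np.array(lag_dist), np.array(semivar), np.array(counts)

# Made-up sample: four points on a line, one unit apart.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
values = np.array([1.0, 2.0, 4.0, 7.0])
lags, gamma, n_pairs = empirical_semivariogram(coords, values, lag_width=1.0, n_lags=4)
```

In this toy sample the semivariance grows with lag distance, which is the pattern a fitted model (step 4) would then summarize with a nugget, range, and sill.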

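
Workflow step 5 ("use model to determine weights") can be illustrated with a small solver. The sketch below uses ordinary kriging with a zero-nugget spherical semivariogram; both are choices made for this example rather than anything the slides prescribe (Demo #2 uses simple kriging), and the sill, range, and sample points are made up.

```python
import numpy as np

def spherical(h, sill, rng):
    """Spherical semivariogram with zero nugget (an assumed model choice)."""
    r = np.minimum(np.asarray(h, dtype=float) / rng, 1.0)
    return sill * (1.5 * r - 0.5 * r ** 3)

def ordinary_kriging(coords, values, target, sill, rng):
    """Solve the ordinary kriging system for a single target location."""
    n = len(values)
    # Semivariances between all pairs of known points.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical(d, sill, rng)
    A[n, n] = 0.0  # Lagrange-multiplier corner
    # Semivariances between known points and the target; the final 1
    # enforces that the weights sum to one (unbiasedness).
    b = np.ones(n + 1)
    b[:n] = spherical(np.linalg.norm(coords - target, axis=1), sill, rng)
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    prediction = float(weights @ values)
    variance = float(weights @ b[:n] + mu)  # kriging variance
    return prediction, variance, weights

# Made-up sample: four points on the corners of a unit square.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([1.0, 2.0, 3.0, 4.0])
pred, var, w = ordinary_kriging(coords, values, np.array([0.5, 0.5]), sill=1.0, rng=3.0)
```

Unlike IDW, the weights come from the fitted semivariogram rather than from distance alone, and the same solve yields a prediction variance. The weights are constrained to sum to one but not to be positive, which is why negative weights can occur.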
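
The validation diagnostics can be computed directly from cross-validation output. The sketch below assumes three arrays are already available (measured values, leave-one-out predictions, and prediction standard errors); the function name and sample numbers are hypothetical, and the Average Standard Error is taken here as the root mean of the squared standard errors, which is an assumption about the definition.

```python
import numpy as np

def cv_diagnostics(measured, predicted, std_error):
    """Summary statistics for cross-validation results (hypothetical helper)."""
    err = predicted - measured  # prediction errors
    z = err / std_error         # standardized errors
    return {
        "mean_error": float(err.mean()),                  # near 0: unbiased
        "rmse": float(np.sqrt((err ** 2).mean())),        # as small as possible
        "mean_standardized_error": float(z.mean()),       # near 0
        "rms_standardized": float(np.sqrt((z ** 2).mean())),  # close to 1
        "average_standard_error": float(np.sqrt((std_error ** 2).mean())),  # close to RMSE
    }

# Made-up cross-validation output for four points.
measured = np.array([10.0, 12.0, 9.0, 11.0])
predicted = np.array([11.0, 11.0, 10.0, 10.0])
std_error = np.array([1.0, 1.0, 1.0, 1.0])
diag = cv_diagnostics(measured, predicted, std_error)
```

In this toy case the over- and under-predictions cancel, so the mean error is zero, and the standard errors match the typical error size, so the Root Mean Square Standardized value is 1.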