FedGIS Conference February 24–25, 2016 | Washington, DC

Getting Started with Geostatistics

Brett Rose

Objectives

• Using as an integrating factor.

• Introduce a variety of core tools available in GIS.

• Demonstrate the utility of spatial statistics and geostatistics for a range of applications.

• Modeling relationships and driving decisions

Outline

• Spatial Statistics vs Geostatistics
• Geostatistical Workflow
• Exploratory Spatial Analysis
• Interpolation methods
• Deterministic vs Geostatistical
• Geostatistical Analyst in ArcGIS
• Demos
• Class Exercise

In ArcGIS

Geostatistical Analyst
• interactive ESDA
• interactive modeling including variography
• many models (6 + cokriging)
• pre-processing of data
  - Decluster, detrend, transformation
• model diagnostics and comparison

Spatial Analyst
• rich set of tools to perform cell-based (raster) analysis
• kriging (2 models), IDW, nearest neighbor …

Spatial Statistics
• ESDA GP tools
• analyzing the distribution of geographic features
• identifying spatial patterns

What are spatial statistics in a GIS environment?

• Software-based tools, methods, and techniques developed specifically for use with geographic data.
• Spatial statistics:
  – Describe and model spatial distributions, spatial patterns, spatial processes, and spatial relationships.
  – Incorporate space (area, length, proximity, orientation, and/or spatial relationships) directly into their mathematics.

In many ways spatial statistics extend what the eyes and mind do intuitively to assess spatial patterns, trends and relationships.
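As one illustration of a statistic that builds proximity directly into its mathematics, the sketch below computes global Moran's I with inverse-distance weights. The function name `morans_i`, the weighting scheme, and the sample data are assumptions chosen for illustration; this is not the ArcGIS implementation.

```python
import math

def morans_i(points, values):
    """Global Moran's I using inverse-distance weights (an illustrative choice).

    Assumes all points are distinct; coincident points would divide by zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # weighted cross-products of deviations
    w_sum = 0.0   # sum of all pairwise weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            w = 1.0 / d  # nearer neighbors get more weight
            cross += w * dev[i] * dev[j]
            w_sum += w
    return (n / w_sum) * (cross / sum(d * d for d in dev))

# Hypothetical data: two well-separated groups, low values in one, high in the other
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
vals = [1, 1, 1, 5, 5, 5]
print(morans_i(pts, vals))
```

Values above zero suggest that similar values sit near each other; values below zero suggest that neighbors tend to differ.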

Toolsets and tools for Spatial Statistics

• Core functionality with ArcGIS 10
• Most tools delivered with their source code.
• Most tools available at all license levels.

Focus on 4 toolsets:
1. Analyzing Patterns
2. Mapping Clusters
3. Measuring Geographic Distributions
4. Modeling Spatial Relationships

What is geostatistics?

The statistics of spatially correlated data

[Figure: Semivariogram. Semivariance plotted against distance between samples, with the nugget, sill, and range labeled.]

Geostatistical Analysis …

• Exploratory Spatial Data Analysis
  – QQplot, Trend Analysis, …
• Deterministic interpolation
  – IDW, GPI, RBF, LPI
• Geostatistical Interpolation
  – Kriging / CoKriging
  – Ordinary, Simple, Universal, Indicator, Probability, Disjunctive
• Interpolation in the presence of barriers
  – Kernel Smoothing, Diffusion Kernel
• Network Design
• Geostatistical Simulation
• …

Why Spatial Statistics/Geostatistics?

Proof is required in the decision-making process to ensure that the problem and data are accurately described and that the predictions are made correctly.

Geostatistical Analyst - Example

Where is it used?

Geostatistical Analyst application for radiocesium contamination threshold

Geostatistical Analyst application for organic matter in Illinois

Geostatistical Analyst application for ozone in California

Geostatistical Workflow

Map the data

Exploratory Spatial Data Analysis (ESDA)

• Examine the distribution of your data

• Look for global and local outliers

• Look for global trends

• Examine local variation

• Examine spatial autocorrelation

Exploratory Spatial Data Analysis

The first step in statistical data analysis is to verify three data features: dependency, stationarity, and distribution.
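One common way to check for spatial dependency is an empirical semivariogram: half the squared difference between paired values, averaged within distance bins. The sketch below is a minimal version under stated assumptions; the function name, the equal-width lag binning, and the sample transect are hypothetical and do not reproduce the Geostatistical Analyst tools.

```python
import math

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Average 0.5 * (z_i - z_j)^2 over point pairs, binned by separation distance."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            k = int(d // lag_width)  # index of the lag bin for this pair
            if k < n_lags:
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    # (bin midpoint, semivariance, number of pairs) for each non-empty bin
    return [((k + 0.5) * lag_width, sums[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical transect: values increase steadily along x
pts = [(float(x), 0.0) for x in range(10)]
vals = [float(x) for x in range(10)]
for midpoint, gamma, pairs in empirical_semivariogram(pts, vals, 1.0, 5):
    print(midpoint, gamma, pairs)
```

If semivariance is small for short separations and grows with distance, nearby samples are more alike than distant ones, which is the dependency that kriging relies on. A semivariance that keeps rising without leveling off can also hint at a trend, that is, non-stationarity.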

With information on dependency, stationarity, and distribution, you can proceed to the modeling step of the geostatistical data analysis, kriging.

Exploratory Spatial Data Analysis

Graphs interact with each other and with the map:

• Variography
• Distribution analysis
• Identification of global trends
• Analysis of data stationarity and variability

Additional tools inside Wizard:

• Trend analysis
• Distribution modeling
• Variography
• Declustering
• Search neighborhood
• Cross-correlation

Exploratory Spatial Data Analysis

The probable origin of the arsenic lies in the outcrops of hard rocks higher up the Ganges River catchments that were eroded in the recent geological past and redeposited in Bangladesh by ancient courses of the Ganges River.

ESDA Tools

• Histogram

• Normal QQ Plot and General QQ Plot

• Trend Analysis

• Voronoi Map

• Semivariogram/Covariance cloud

• Crosscovariance cloud

Geostatistical Analyst Methods for Spatial Interpolation

• Deterministic
  – Inverse Distance Weighted
  – Global Polynomial
  – Local Polynomial
  – Radial Basis Functions
    • Thin plate spline, Spline with tension, Multiquadric, Inverse multiquadric, Completely regularized spline kernels
• Geostatistical
  – Kriging
  – Cokriging

Global Polynomial

When to use GPI

• Fitting a surface to the sample points when the surface varies slowly from region to region over the area of interest (for example, pollution over an industrial area).
• Examining and/or removing the effects of long-range or global trends. In such circumstances, the technique is often referred to as trend surface analysis.

Inverse Distance Weighted (IDW)

IDW is an exact interpolator. The output surface is sensitive to clustering and the presence of outliers. IDW assumes that the phenomenon being modeled is driven by local variation, which can be captured (modeled) by defining an adequate search neighborhood. Since IDW does not provide prediction standard errors, justifying the use of this model may be problematic.

Local Polynomial Interpolation (LPI)

When to use LPI

• Useful for creating smooth surfaces and identifying long-range trends
• In earth sciences our variables often have short-range variation and long-range trends
• LPI can capture short-range variation

Note: LPI is sensitive to neighborhood distance, and small search areas may create empty areas.

Radial Basis Functions

• Thin-plate spline
• Spline with tension
• Completely regularized spline
• Multiquadric function
• Inverse multiquadric function

[Figure: panels labeled IDW and RBF]

When to use RBF

• For creating smooth surfaces from large datasets
• Gently varying surfaces such as elevation
• Inappropriate for large changes in surface values that occur within a short distance.

Geostatistical Analyst

Exploratory Spatial Data Analysis - ESDA

• Outliers
  – Incorrect data or the most influential data?
• Spatial Dependency
  – If not, then why use Geostatistics?
• Distribution
  – How close to Gaussian distribution?
• Stationarity
  – Data preprocessing if non-stationary

Geostats Toolbox

Geoprocessing tools

Previously only available through the Wizard or from the GA toolbar

New features

What is new in 10

Improvements to the Wizard

• Overall window can be resized
• Individual panels can be resized
• New dialog layout and functionality
• Model parameter optimization
• Variography dialog includes average pairs
• Parameter help is available
  – on the dialog
  – compiled help.

Wizard demo

[Figure callouts: Resizable, Help]

Local Polynomial Interpolation (LPI)

• Prediction
• Prediction standard errors – new
  – indicate the uncertainty associated with the value predicted for each location
• Spatial condition numbers – new
  – measure of how stable or unstable the solution of the prediction equations is for a specific location
• Different kernel functions – new
• Geoprocessing tool – new

Demo: LPI

[Figure labels: … of prediction, Condition number]

What is kriging?

• A weighted, moving-average estimation technique based on Geostatistics that uses the spatial correlation of point measurements to estimate values at adjacent, unmeasured points

• Associates uncertainty with the predictions

[Figure: correlation plotted against distance]
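To make the "weighted, moving-average" idea concrete, here is a minimal ordinary kriging sketch in plain Python. It is an illustration under stated assumptions, not Geostatistical Analyst's implementation: the spherical semivariogram model, its parameter values (nugget 0, sill 1, range 10), and all function names are hypothetical choices.

```python
import math

def spherical(h, nugget=0.0, sill=1.0, rng=10.0):
    """Spherical semivariogram model; the default parameters are assumptions."""
    if h == 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(points, values, target, gamma=spherical):
    """Return (prediction, kriging variance, weights) at one target location."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Semivariances between samples, plus the unbiasedness constraint
    # (weights sum to 1) enforced through a Lagrange multiplier.
    a = [[gamma(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return prediction, variance, weights

# Hypothetical samples on the corners of a unit square
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [1.0, 2.0, 3.0, 4.0]
print(ordinary_kriging(pts, vals, (0.5, 0.5)))
```

The weights come from the semivariogram rather than from distance alone, and the kriging variance is the uncertainty that accompanies each prediction.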

Interpolation with barriers

Demo: Interpolation with Barriers

[Figure: panels labeled Diffusion Kernel and GA's IDW]

• Create Spatially Balanced Points
  - based on a priori inclusion probabilities
  - output is spatially balanced
  - the spatial dependence between samples is minimized.

• Densify Sampling Network
  - based on a predefined geostatistical kriging layer
  - uses, inter alia, the Std Error of Prediction surface to determine where new locations are required or which can be removed.

Demo: Sampling Network Design

Densify Sampling Network

Create Spatially Balanced Points

Subset Features

…partitioned by generating random values from a uniform [0,1] distribution
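The partitioning idea can be sketched in a few lines: draw one uniform random value per feature and compare it with the training fraction. The function name `subset_features`, its parameters, and the 80/20 split are assumptions for illustration, not the signature of the ArcGIS tool.

```python
import random

def subset_features(features, training_fraction=0.8, seed=None):
    """Split features into training and test lists using uniform random draws."""
    rng = random.Random(seed)  # a seed makes the split repeatable
    training, test = [], []
    for feature in features:
        # random() returns a value in [0, 1); values below the fraction go to training
        if rng.random() < training_fraction:
            training.append(feature)
        else:
            test.append(feature)
    return training, test

# Hypothetical feature IDs
train, test = subset_features(list(range(100)), training_fraction=0.8, seed=42)
print(len(train), len(test))
```

Because each feature is assigned independently, the split sizes are close to, but not exactly, the requested proportions.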

"NBRTYPE=Smooth S_MAJOR=1 S_MINOR=1 ANGLE=0 SMOOTH_FACTOR=0"

Cross Validation

How good are our predictions?

|Observed – Predicted|

Cross validation does not prove that the model is correct, merely that it is not grossly incorrect [Cressie, 1990].

ModelBuilder
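The ModelBuilder labels on this slide appear to describe a loop that runs IDW on the ozone data for a series of power values, cross-validates each run, and collects the resulting RMS errors. A rough plain-Python equivalent is sketched below, using leave-one-out cross validation and an unstandardized root mean square error. All function names and the sample data are hypothetical, and the sketch uses every remaining point as the neighborhood rather than a search neighborhood.

```python
import math

def idw_predict(points, values, target, power=2.0):
    """Inverse distance weighted prediction at one target location."""
    num = den = 0.0
    for (x, y), z in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return z  # exact interpolator: reproduce the measured value
        w = d ** -power
        num += w * z
        den += w
    return num / den

def loo_rmse(points, values, power):
    """Leave-one-out cross validation: root mean square of observed minus predicted."""
    squared = 0.0
    for i in range(len(points)):
        # Hide sample i, predict it from the others, and compare with the observation
        other_points = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        predicted = idw_predict(other_points, other_values, points[i], power)
        squared += (values[i] - predicted) ** 2
    return math.sqrt(squared / len(points))

def best_power(points, values, powers):
    """Collect the cross-validation RMSE for each power and pick the smallest."""
    scores = {p: loo_rmse(points, values, p) for p in powers}
    return min(scores, key=scores.get), scores

# Hypothetical measurements
pts = [(0, 0), (1, 0), (2, 1), (3, 3), (4, 1), (5, 2)]
vals = [10.0, 12.0, 15.0, 30.0, 18.0, 21.0]
power, scores = best_power(pts, vals, [1, 2, 3, 4])
print(power, scores)
```

As the Cressie quotation above cautions, a low cross-validation error only shows that the chosen power is not grossly wrong for these samples.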

[Figure: ModelBuilder model. Labels: Ozone, IDW, GA lyr, For PowerValue, Root Square Standardized, Cross Validation, Collect Values, Output RMS]

Moving Window Kriging

Output = Points

Output = Contours

Output = Raster

Output = Prediction, Prediction SE, Probability, Quantile, Condition number

Large input datasets

• LPI and IDW can now manage very large input datasets.
  - IDW with roughly 2 billion input points
  - contained in more than 400,000 multipoints
  - output raster of 250 by 250 columns and rows in 20 hours.