Kriging: an Introduction to Concepts and Applications Eric Krause Prahlad Jat Sessions of Note… Tuesday • Interpolating Surfaces in Arcgis (1:00-2:00 SDCC Rm33c)

Total Page:16

File Type:pdf, Size:1020Kb

Kriging: an Introduction to Concepts and Applications Eric Krause Prahlad Jat Sessions of Note… Tuesday • Interpolating Surfaces in Arcgis (1:00-2:00 SDCC Rm33c) Kriging: An Introduction to Concepts and Applications Eric Krause Prahlad Jat Sessions of note… Tuesday • Interpolating Surfaces in ArcGIS (1:00-2:00 SDCC Rm33C) • Kriging: An Introduction to Concepts and Applications (2:30-3:30 SDCC Rm33C) • Geostatistical Analyst: An Introduction (4:00-5:00 SDCC Rm30C) Wednesday • Surface Interpolation in ArcGIS (11:15-12:00 SDCC Demo Theater 10) • Empirical Bayesian Kriging and EBK Regression Prediction in ArcGIS (2:30-3:15 SDCC Demo Theater 10) Thursday • Geostatistics in Practice: Learning Interpolation Through Examples (8:30-9:30 SDCC Rm30A) • Polygon-to-Polygon Predictions Using Areal Interpolation (11:15-12:00 SDCC Demo Theater 10) • Geostatistical Analyst: An Introduction (1:00-2:00 SDCC Rm30A) • Using Living Atlas Data and ArcGIS Pro for 3D Interpolation (2:30-3:30 SDCC Rm 30C) • Interpolating Surfaces in ArcGIS (4:00-5:00 SDCC Rm15A) • Kriging: An Introduction to Concepts and Applications (4:00-5:00 SDCC Rm15B) 2 Geostatistical Analyst Resources http://esriurl.com/GeostatGetStarted • GeoNet – community.esri.com - Blogs - Free textbook and datasets - Spatial Statistical Analysis For GIS Users - Lots of discussions and Q&A • Learn GIS – learn.arcgis.com - Model Water Quality Using Interpolation - Analyze Urban Heat Using Kriging - Interpolate 3D Oxygen Measurements in Monterey Bay Outline • Introduction to interpolation • Introduction to kriging • Validating interpolation results • Empirical Bayesian Kriging and EBK Regression Prediction • Empirical Bayesian Kriging 3D • Areal Interpolation • Questions What is interpolation? • Predict values at unknown locations using values at measured locations • Many interpolation methods: kriging, IDW, LPI, etc What is autocorrelation? Tobler’s first law of geography: "Everything is related to everything else, but near things are more related than distant things." Building up interpolation • Given the data on the right, how would you predict the value of the red point? • Simplest answer: guess the average of all the points: 49 - Ignores spatial information Building up interpolation • Next best guess: average of a local neighborhood • Average of points within 12km buffer: 40.75 - Better, but points that are closer should have more influence Building up interpolation • Next best guess: Weight points by inverse distance • Weighted average of points within 12km buffer: 41.01 - This seems like a reasonable guess with no obvious problems Building up interpolation • What about this point? • More points below than above Building up interpolation • What about this point? • More points below than above • Inverse-distance weighted average in 40km buffer: 49.8 Building up interpolation • What about this point? • More points below than above • Kriging prediction: 56.2 - Some weights are even negative Semivariogram What is kriging? • Kriging is the optimal interpolation method if the data meets certain conditions. • What are these conditions? - Normally distributed - Stationary - No trends • How do I check these conditions? - Exploratory Analysis and charting What is an “optimal” interpolator? • Estimates the true value, on average • Lowest expected prediction error • Able to use extra information, such as covariates • Can be generalized to polygons (Areal interpolation, Geostatistical simulations) • Estimates a prediction distribution, not just a single prediction Kriging output surface types Prediction Error of Predictions Probability Quantile Kriging workflow 1. Explore the data – check kriging assumptions 2. Configure options: trend removal, transformations, etc 3. Estimate a semivariogram model 4. Validate the results 5. Repeat steps 2-4 as necessary 6. Map the data for decision-making Does my data follow a normal distribution? • Histogram chart - Symmetric and bell-shaped - Look for outliers - Mean ≈ Median • What can I do if my data is not normally distributed? - Apply a transformation - Log, Box Cox, Arcsin, Normal Score Transformation Why the normal distribution is important • Kriging predicts mean and standard error at every location Why the normal distribution is important • Kriging predicts mean and standard error at every location Skewed distributions and outliers Skewed Distribution Outlier Assessing models using cross validation • Used to determine the reliability of the model - Iteratively discard each sample - Use remaining points to estimate value at measured location - Compare predicted versus measured value • Calculates various statistics - Root-mean-square : root of average squared deviation from true value - Smaller is better - Mean : the average of the deviations - Should be close to zero - Root-mean-square standardized : measures whether standard errors are estimated correctly - Should be close to one - Average standard error : should be small and close to the root-mean-square Kriging in the Geostatistical Wizard Demo Empirical Bayesian Kriging • Advantages - Requires minimal interactive modeling, spatial relationships are modeled automatically - Usually more accurate, especially for small or nonstationary datasets - Uses local models to capture small scale effects - Doesn’t assume one model fits the entire data - Standard errors of prediction are more accurate than other kriging methods • Disadvantages - Processing is slower than other kriging methods - Limited customization How does EBK work? 1. Divide the data into subsets of a given size - Controlled by “Subset Size” parameter - Subsets can overlap, controlled by “Overlap Factor” 2. For each subset, estimate the semivariogram 3. Simulate data at input point locations and estimate new semivariogram 4. Repeat step 3 many times. This results in a distribution of semivariograms 5. Mix the local surfaces together to get the final surface. EBK Regression Prediction • Allows you to use explanatory variable rasters to improve predictions • Automatically extracts useful information from explanatory variables • Uses Principle Components to handle multicollinearity EBK and EBK Regression Prediction Demo Empirical Bayesian Kriging 3D • New in ArcGIS Pro 2.3 • Interpolate points measured in 3D using EBK • Visualize 2D transects using Range Slider • Export to rasters, contours, and 3D points EBK 3D Demo Areal Interpolation Obesity by school zone Obesity surface and Obesity by census block error surface • Predict data in a different geometry - School zones to census tracts • Estimate values for missing data Areal Interpolation Workflow .
Recommended publications
  • Mathematical Construction of Interpolation and Extrapolation Function by Taylor Polynomials Arxiv:2002.11438V1 [Math.NA] 26 Fe
    Mathematical Construction of Interpolation and Extrapolation Function by Taylor Polynomials Nijat Shukurov Department of Engineering Physics, Ankara University, Ankara, Turkey E-mail: [email protected] , [email protected] Abstract: In this present paper, I propose a derivation of unified interpolation and extrapolation function that predicts new values inside and outside the given range by expanding direct Taylor series on the middle point of given data set. Mathemati- cal construction of experimental model derived in general form. Trigonometric and Power functions adopted as test functions in the development of the vital aspects in numerical experiments. Experimental model was interpolated and extrapolated on data set that generated by test functions. The results of the numerical experiments which predicted by derived model compared with analytical values. KEYWORDS: Polynomial Interpolation, Extrapolation, Taylor Series, Expansion arXiv:2002.11438v1 [math.NA] 26 Feb 2020 1 1 Introduction In scientific experiments or engineering applications, collected data are usually discrete in most cases and physical meaning is likely unpredictable. To estimate the outcomes and to understand the phenomena analytically controllable functions are desirable. In the mathematical field of nu- merical analysis those type of functions are called as interpolation and extrapolation functions. Interpolation serves as the prediction tool within range of given discrete set, unlike interpola- tion, extrapolation functions designed to predict values out of the range of given data set. In this scientific paper, direct Taylor expansion is suggested as a instrument which estimates or approximates a new points inside and outside the range by known individual values. Taylor se- ries is one of most beautiful analogies in mathematics, which make it possible to rewrite every smooth function as a infinite series of Taylor polynomials.
    [Show full text]
  • On Multivariate Interpolation
    On Multivariate Interpolation Peter J. Olver† School of Mathematics University of Minnesota Minneapolis, MN 55455 U.S.A. [email protected] http://www.math.umn.edu/∼olver Abstract. A new approach to interpolation theory for functions of several variables is proposed. We develop a multivariate divided difference calculus based on the theory of non-commutative quasi-determinants. In addition, intriguing explicit formulae that connect the classical finite difference interpolation coefficients for univariate curves with multivariate interpolation coefficients for higher dimensional submanifolds are established. † Supported in part by NSF Grant DMS 11–08894. April 6, 2016 1 1. Introduction. Interpolation theory for functions of a single variable has a long and distinguished his- tory, dating back to Newton’s fundamental interpolation formula and the classical calculus of finite differences, [7, 47, 58, 64]. Standard numerical approximations to derivatives and many numerical integration methods for differential equations are based on the finite dif- ference calculus. However, historically, no comparable calculus was developed for functions of more than one variable. If one looks up multivariate interpolation in the classical books, one is essentially restricted to rectangular, or, slightly more generally, separable grids, over which the formulae are a simple adaptation of the univariate divided difference calculus. See [19] for historical details. Starting with G. Birkhoff, [2] (who was, coincidentally, my thesis advisor), recent years have seen a renewed level of interest in multivariate interpolation among both pure and applied researchers; see [18] for a fairly recent survey containing an extensive bibli- ography. De Boor and Ron, [8, 12, 13], and Sauer and Xu, [61, 10, 65], have systemati- cally studied the polynomial case.
    [Show full text]
  • Easy Way to Find Multivariate Interpolation
    IJETST- Vol.||04||Issue||05||Pages 5189-5193||May||ISSN 2348-9480 2017 International Journal of Emerging Trends in Science and Technology IC Value: 76.89 (Index Copernicus) Impact Factor: 4.219 DOI: https://dx.doi.org/10.18535/ijetst/v4i5.11 Easy way to Find Multivariate Interpolation Author Yimesgen Mehari Faculty of Natural and Computational Science, Department of Mathematics Adigrat University, Ethiopia Email: [email protected] Abstract We derive explicit interpolation formula using non-singular vandermonde matrix for constructing multi dimensional function which interpolates at a set of distinct abscissas. We also provide examples to show how the formula is used in practices. Introduction Engineers and scientists commonly assume that past and currently. But there is no theoretical relationships between variables in physical difficulty in setting up a frame work for problem can be approximately reproduced from discussing interpolation of multivariate function f data given by the problem. The ultimate goal whose values are known.[1] might be to determine the values at intermediate Interpolation function of more than one variable points, to approximate the integral or to simply has become increasingly important in the past few give a smooth or continuous representation of the years. These days, application ranges over many variables in the problem different field of pure and applied mathematics. Interpolation is the method of estimating unknown For example interpolation finds applications in the values with the help of given set of observations. numerical integrations of differential equations, According to Theile Interpolation is, “The art of topography, and the computer aided geometric reading between the lines of the table” and design of cars, ships, airplanes.[1] According to W.M.
    [Show full text]
  • An Introduction to Spatial Autocorrelation and Kriging
    An Introduction to Spatial Autocorrelation and Kriging Matt Robinson and Sebastian Dietrich RenR 690 – Spring 2016 Tobler and Spatial Relationships • Tobler’s 1st Law of Geography: “Everything is related to everything else, but near things are more related than distant things.”1 • Simple but powerful concept • Patterns exist across space • Forms basic foundation for concepts related to spatial dependency Waldo R. Tobler (1) Tobler W., (1970) "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240. Spatial Autocorrelation (SAC) – What is it? • Autocorrelation: A variable is correlated with itself (literally!) • Spatial Autocorrelation: Values of a random variable, at paired points, are more or less similar as a function of the distance between them 2 Closer Points more similar = Positive Autocorrelation Closer Points less similar = Negative Autocorrelation (2) Legendre P. Spatial Autocorrelation: Trouble or New Paradigm? Ecology. 1993 Sep;74(6):1659–1673. What causes Spatial Autocorrelation? (1) Artifact of Experimental Design (sample sites not random) Parent Plant (2) Interaction of variables across space (see below) Univariate case – response variable is correlated with itself Eg. Plant abundance higher (clustered) close to other plants (seeds fall and germinate close to parent). Multivariate case – interactions of response and predictor variables due to inherent properties of the variables Eg. Location of seed germination function of wind and preferred soil conditions Mechanisms underlying patterns will depend on study system!! Why is it important? Presence of SAC can be good or bad (depends on your objectives) Good: If SAC exists, it may allow reliable estimation at nearby, non-sampled sites (interpolation).
    [Show full text]
  • Spatial Interpolation Methods
    Page | 0 of 0 SPATIAL INTERPOLATION METHODS 2018 Page | 1 of 1 1. Introduction Spatial interpolation is the procedure to predict the value of attributes at unobserved points within a study region using existing observations (Waters, 1989). Lam (1983) definition of spatial interpolation is “given a set of spatial data either in the form of discrete points or for subareas, find the function that will best represent the whole surface and that will predict values at points or for other subareas”. Predicting the values of a variable at points outside the region covered by existing observations is called extrapolation (Burrough and McDonnell, 1998). All spatial interpolation methods can be used to generate an extrapolation (Li and Heap 2008). Spatial Interpolation is the process of using points with known values to estimate values at other points. Through Spatial Interpolation, We can estimate the precipitation value at a location with no recorded data by using known precipitation readings at nearby weather stations. Rationale behind spatial interpolation is the observation that points close together in space are more likely to have similar values than points far apart (Tobler’s Law of Geography). Spatial Interpolation covers a variety of method including trend surface models, thiessen polygons, kernel density estimation, inverse distance weighted, splines, and Kriging. Spatial Interpolation requires two basic inputs: · Sample Points · Spatial Interpolation Method Sample Points Sample Points are points with known values. Sample points provide the data necessary for the development of interpolator for spatial interpolation. The number and distribution of sample points can greatly influence the accuracy of spatial interpolation.
    [Show full text]
  • Spatial Autocorrelation: Covariance and Semivariance Semivariance
    Spatial Autocorrelation: Covariance and Semivariancence Lily Housese ­P eters GEOG 593 November 10, 2009 Quantitative Terrain Descriptorsrs Covariance and Semivariogram areare numericc methods used to describe the character of the terrain (ex. Terrain roughness, irregularity) Terrain character has important implications for: 1. the type of sampling strategy chosen 2. estimating DTM accuracy (after sampling and reconstruction) Spatial Autocorrelationon The First Law of Geography ““ Everything is related to everything else, but near things are moo re related than distant things.” (Waldo Tobler) The value of a variable at one point in space is related to the value of that same variable in a nearby location Ex. Moranan ’s I, Gearyary ’s C, LISA Positive Spatial Autocorrelation (Neighbors are similar) Negative Spatial Autocorrelation (Neighbors are dissimilar) R(d) = correlation coefficient of all the points with horizontal interval (d) Covariance The degree of similarity between pairs of surface points The value of similarity is an indicator of the complexity of the terrain surface Smaller the similarity = more complex the terrain surface V = Variance calculated from all N points Cov (d) = Covariance of all points with horizontal interval d Z i = Height of point i M = average height of all points Z i+d = elevation of the point with an interval of d from i Semivariancee Expresses the degree of relationship between points on a surface Equal to half the variance of the differences between all possible points spaced a constant distance apart
    [Show full text]
  • Comparison of Gap Interpolation Methodologies for Water Level Time Series Using Perl/Pdl
    Revista de Matematica:´ Teor´ıa y Aplicaciones 2005 12(1 & 2) : 157–164 cimpa – ucr – ccss issn: 1409-2433 comparison of gap interpolation methodologies for water level time series using perl/pdl Aimee Mostella∗ Alexey Sadovksi† Scott Duff‡ Patrick Michaud§ Philippe Tissot¶ Carl Steidleyk Received/Recibido: 13 Feb 2004 Abstract Extensive time series of measurements are often essential to evaluate long term changes and averages such as tidal datums and sea level rises. As such, gaps in time series data restrict the type and extent of modeling and research which may be accomplished. The Texas A&M University Corpus Christi Division of Nearshore Research (TAMUCC-DNR) has developed and compared various methods based on forward and backward linear regression to interpolate gaps in time series of water level data. We have developed a software system that retrieves actual and harmonic water level data based upon user provided parameters. The actual water level data is searched for missing data points and the location of these gaps are recorded. Forward and backward linear regression are applied in relation to the location of missing data or gaps in the remaining data. After this process is complete, one of three combinations of the forward and backward regression is used to fit the results. Finally, the harmonic component is added back into the newly supplemented time series and the results are graphed. The software created to implement this process of linear regression is written in Perl along with a Perl module called PDL (Perl Data Language). Generally, this process has demonstrated excellent results in filling gaps in our water level time series.
    [Show full text]
  • Augmenting Geostatistics with Matrix Factorization: a Case Study for House Price Estimation
    International Journal of Geo-Information Article Augmenting Geostatistics with Matrix Factorization: A Case Study for House Price Estimation Aisha Sikder ∗ and Andreas Züfle Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA 22030, USA; azufl[email protected] * Correspondence: [email protected] Received: 14 January 2020; Accepted: 22 April 2020; Published: 28 April 2020 Abstract: Singular value decomposition (SVD) is ubiquitously used in recommendation systems to estimate and predict values based on latent features obtained through matrix factorization. But, oblivious of location information, SVD has limitations in predicting variables that have strong spatial autocorrelation, such as housing prices which strongly depend on spatial properties such as the neighborhood and school districts. In this work, we build an algorithm that integrates the latent feature learning capabilities of truncated SVD with kriging, which is called SVD-Regression Kriging (SVD-RK). In doing so, we address the problem of modeling and predicting spatially autocorrelated data for recommender engines using real estate housing prices by integrating spatial statistics. We also show that SVD-RK outperforms purely latent features based solutions as well as purely spatial approaches like Geographically Weighted Regression (GWR). Our proposed algorithm, SVD-RK, integrates the results of truncated SVD as an independent variable into a regression kriging approach. We show experimentally, that latent house price patterns learned using SVD are able to improve house price predictions of ordinary kriging in areas where house prices fluctuate locally. For areas where house prices are strongly spatially autocorrelated, evident by a house pricing variogram showing that the data can be mostly explained by spatial information only, we propose to feed the results of SVD into a geographically weighted regression model to outperform the orginary kriging approach.
    [Show full text]
  • Bayesian Interpolation of Unequally Spaced Time Series
    Bayesian interpolation of unequally spaced time series Luis E. Nieto-Barajas and Tapen Sinha Department of Statistics and Department of Actuarial Sciences, ITAM, Mexico [email protected] and [email protected] Abstract A comparative analysis of time series is not feasible if the observation times are different. Not even a simple dispersion diagram is possible. In this article we propose a Gaussian process model to interpolate an unequally spaced time series and produce predictions for equally spaced observation times. The dependence between two obser- vations is assumed a function of the time differences. The novelty of the proposal relies on parametrizing the correlation function in terms of Weibull and Log-logistic survival functions. We further allow the correlation to be positive or negative. Inference on the model is made under a Bayesian approach and interpolation is done via the posterior predictive conditional distributions given the closest m observed times. Performance of the model is illustrated via a simulation study as well as with real data sets of temperature and CO2 observed over 800,000 years before the present. Key words: Bayesian inference, EPICA, Gaussian process, kriging, survival functions. 1 Introduction The majority of the literature on time series analysis assumes that the variable of interest, say Xt, is observed on a regular or equally spaced sequence of times, say t 2 f1; 2;:::g (e.g. Box et al. 2004; Chatfield 1989). Often, time series are observed at uneven times. Unequally spaced (also called unevenly or irregularly spaced) time series data occur naturally in many fields. For instance, natural disasters, such as earthquakes, floods and volcanic eruptions, occur at uneven intervals.
    [Show full text]
  • Multivariate Lagrange Interpolation 1 Introduction 2 Polynomial
    Alex Lewis Final Project Multivariate Lagrange Interpolation Abstract. Explain how the standard linear Lagrange interpolation can be generalized to construct a formula that interpolates a set of points in . We will also provide examples to show how the formula is used in practice. 1 Introduction Interpolation is a fundamental topic in Numerical Analysis. It is the practice of creating a function that fits a finite sampled data set. These sampled values can be used to construct an interpolant, which must fit the interpolated function at each date point. 2 Polynomial Interpolation “The Problem of determining a polynomial of degree one that passes through the distinct points and is the same as approximating a function for which and by means of a first degree polynomial interpolating, or agreeing with, the values of f at the given points. We define the functions and , And then define We can generalize this concept of linear interpolation and consider a polynomial of degree n that passes through n+1 points. In this case we can construct, for each , a function with the property that when and . To satisfy for each requires that the numerator of contains the term To satisfy , the denominator of must be equal to this term evaluated at . Thus, This is called the Th Lagrange Interpolating Polynomial The polynomial is given once again as, Example 1 If and , then is the polynomial that agrees with at , and ; that is, 3 Multivariate Interpolation First we define a function be an -variable multinomial function of degree . In other words, the , could, in our case be ; in this case it would be a 3 variable function of some degree .
    [Show full text]
  • Geostatistical Approach for Spatial Interpolation of Meteorological Data
    Anais da Academia Brasileira de Ciências (2016) 88(4): 2121-2136 (Annals of the Brazilian Academy of Sciences) Printed version ISSN 0001-3765 / Online version ISSN 1678-2690 http://dx.doi.org/10.1590/0001-3765201620150103 www.scielo.br/aabc Geostatistical Approach for Spatial Interpolation of Meteorological Data DERYA OZTURK1 and FATMAGUL KILIC2 1Department of Geomatics Engineering, Ondokuz Mayis University, Kurupelit Campus, 55139, Samsun, Turkey 2Department of Geomatics Engineering, Yildiz Technical University, Davutpasa Campus, 34220, Istanbul, Turkey Manuscript received on February 2, 2015; accepted for publication on March 1, 2016 ABSTRACT Meteorological data are used in many studies, especially in planning, disaster management, water resources management, hydrology, agriculture and environment. Analyzing changes in meteorological variables is very important to understand a climate system and minimize the adverse effects of the climate changes. One of the main issues in meteorological analysis is the interpolation of spatial data. In recent years, with the developments in Geographical Information System (GIS) technology, the statistical methods have been integrated with GIS and geostatistical methods have constituted a strong alternative to deterministic methods in the interpolation and analysis of the spatial data. In this study; spatial distribution of precipitation and temperature of the Aegean Region in Turkey for years 1975, 1980, 1985, 1990, 1995, 2000, 2005 and 2010 were obtained by the Ordinary Kriging method which is one of the geostatistical interpolation methods, the changes realized in 5-year periods were determined and the results were statistically examined using cell and multivariate statistics. The results of this study show that it is necessary to pay attention to climate change in the precipitation regime of the Aegean Region.
    [Show full text]
  • Interpolation, Extrapolation, Etc
    Interpolation, extrapolation, etc. Prof. Ramin Zabih http://cs100r.cs.cornell.edu Administrivia Assignment 5 is out, due Wednesday A6 is likely to involve dogs again Monday sections are now optional – Will be extended office hours/tutorial Prelim 2 is Thursday – Coverage through today – Gurmeet will do a review tomorrow 2 Hillclimbing is usually slow m Lines with Lines with errors < 30 errors < 40 b 3 Extrapolation Suppose you only know the values of the distance function f(1), f(2), f(3), f(4) – What is f(5)? This problem is called extrapolation – Figuring out a value outside the range where you have data – The process is quite prone to error – For the particular case of temporal data, extrapolation is called prediction – If you have a good model, this can work well 4 Interpolation Suppose you only know the values of the distance function f(1), f(2), f(3), f(4) – What is f(2.5)? This problem is called interpolation – Figuring out a value inside the range where you have data • But at a point where you don’t have data – Less error-prone than extrapolation Still requires some kind of model – Otherwise, crazy things could happen between the data points that you know 5 Analyzing noisy temporal data Suppose that your data is temporal – Radar tracking of airplanes, or DJIA There are 3 things you might wish to do – Prediction: given data through time T, figure out what will happen at time T+1 – Estimation: given data through time T, figure out what actually happened at time T • Sometimes called filtering – Smoothing: given data through
    [Show full text]