
Efficient kriging for real-time spatio-temporal interpolation

P2.28  20th Conference on Probability and Statistics in the Atmospheric Sciences

Balaji Vasan Srinivasan, Ramani Duraiswami∗, Raghu Murtugudde†
University of Maryland, College Park, MD

∗Balaji Vasan Srinivasan and Ramani Duraiswami are with the Perceptual Interfaces and Reality Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742. Email: [balajiv,ramani]@umiacs.umd.edu
†Raghu Murtugudde is with the Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD 20742. Email: [email protected]

Abstract

Atmospheric data is often recorded at scattered station locations. While the data is generally available over a long period of time, it cannot be used directly to extract coherent patterns and mechanistic correlations. The only recourse is to spatially and temporally interpolate the data, both to organize the station recordings on a regular grid and to query the data for predictions at a particular location or time of interest. Spatio-temporal interpolation approaches require the evaluation of weights at each point of interest. A widely used interpolation approach is kriging. However, kriging has a computational cost that scales as the cube of the number of data points $N$, resulting in cubic time complexity for each point of interest, which leads to a time complexity of $O(N^4)$ for interpolation at $O(N)$ points. In this work, we formulate the kriging problem to first reduce the computational cost to $O(N^3)$. We use an iterative solver (Saad, 2003), and further accelerate the solver using fast summation algorithms like GPUML (Srinivasan and Duraiswami, 2009) or FIGTREE (Morariu et al., 2008). We illustrate the speedup on synthetic datasets and compare the performance with other standard kriging approaches to demonstrate the substantial improvement in performance of our approach. We then apply the developed approach to ocean color data from the Chesapeake Bay and present some quantitative analysis of the kriged results.

1 Introduction

Kriging (Isaaks and Srivastava, 1989) is a group of geostatistical techniques to interpolate the value of a random field (e.g., the elevation, $z$, of the landscape as a function of the geographic location) at an unobserved location from observations of its value at nearby locations. It belongs to a family of linear least squares estimation algorithms that are used in several geostatistical applications. It has its origin in mining applications, where it was used to estimate the changes in ore grade within a mine (Krige, 1951). Kriging has since been applied in several scientific disciplines, including atmospheric science, environmental monitoring and soil management.

Kriging can be linear or non-linear. Simple kriging, ordinary kriging and universal kriging are linear variants. Indicator kriging, log-normal kriging and disjunctive kriging are non-linear variants that were developed to account for models where the best predictor is not linear. Moyeed and Papritz (2002) show that the performance of linear and non-linear kriging is comparable except in the case of skewed data, where non-linear kriging performs better.

With improved sensors and ease of data collection, the amount of data available to krige has increased severalfold. One drawback of kriging is its computational cost for large data sizes. One approach to allow kriging of large data is local neighborhood kriging, where only the closest observations are used for each prediction. Although computationally attractive, such methods require a local neighborhood for each location where a prediction is made, and predicting on a fine grid is still computationally demanding. Another disadvantage is the discontinuity in prediction along the peripheries of the local regions.

Another strategy for acceleration is to approximate the covariance matrix to obtain a sparser kriging system. Furrer et al. (2006) use tapering to sparsify the covariance matrix in simple kriging and thus reduce the complexity of the solution. Memarsadeghi and Mount (2007) extend a similar idea to ordinary kriging. Kammann and Wand (2003) use a low rank approximation of the covariance matrix to reduce the space and time complexity. Sakata et al. (2004) use the Sherman-Morrison-Woodbury formula on a sparsified covariance matrix with spatial sorting.
The performance of all these approaches depends on the underlying data, and degrades dramatically whenever the recording stations are located close to each other. Alternatively, fast algorithms to solve the exact kriging problem have also been proposed. Hartman and Hössjer (2008) build a Gaussian Markov random field over the station and kriging points of interest to accelerate the kriging solution. Memarsadeghi et al. (2008) use fast summation algorithms (Yang et al., 2004; Raykar and Duraiswami, 2007) to krige in linear time ($O(N)$). However, the speedup in these approaches is dependent on the distribution of the station and kriged locations.

In this paper, we propose to use a graphical processing unit (GPU) based fast algorithm to solve the exact kriging system. We first formulate the kriging problem so that it can be solved in $O(kN^2)$ time using iterative solvers (Saad, 2003), with $k$ the number of solver iterations, and then parallelize the computations on a GPU to achieve "near" real-time performance. Unlike the approaches discussed above, our acceleration is independent of the station locations and their distribution, and does not rely on covariance approximation.

The paper is organized as follows. In Section 2, we introduce the kriging problem and derive the linear system for ordinary kriging. In Section 3, we propose our modification of the kriging formulation to solve it in $O(kN^2)$. In Section 4, we discuss the acceleration of our formulation on a graphical processor. In Section 5, we discuss our experiments on various synthetic, spatial and spatio-temporal datasets to illustrate the performance of our kriging algorithm.

2 Linear kriging

Linear kriging is divided into simple kriging (known mean), ordinary kriging (unknown but constant mean) and universal kriging (the mean is an unknown linear combination of known functions), depending on the mean value specification. We restrict the discussion here to ordinary kriging; however, the approach that we propose in Section 3 and the accelerations in Section 4 are generic and apply to the other categories as well.

Ordinary kriging is widely used because it is statistically the best linear unbiased estimator (B.L.U.E.). Ordinary kriging is linear because its estimates are linear combinations of the available data. It is unbiased because it attempts to keep the mean residual at zero. Finally, it is called best because it tries to minimize the residual variance.

2.1 B.L.U.E. formulation

Let the data be sampled at $N$ locations $(x_1, x_2, \ldots, x_N)$, and let the corresponding values be $v_1, v_2, \ldots, v_N$. The value at an unknown location $\hat{x}_j$ is estimated as a weighted linear combination of the $v_i$,

$$\tilde{v}_j = \sum_{i=1}^{N} w_i v_i. \qquad (1)$$

Here, $\tilde{v}_j$ is the estimate, and $\hat{v}_j$ denotes the actual (unknown) value at $\hat{x}_j$. To find the weights, the values $v_i$ and $\hat{v}_j$ are assumed to be stationary random functions,

$$E[v_i] = E[\hat{v}_j] = E[v]. \qquad (2)$$

For unbiased estimates,

$$E[\hat{v}_j - \tilde{v}_j] = 0 \;\Rightarrow\; E[v] - \sum_{i=1}^{N} w_i E[v] = 0 \;\Rightarrow\; \sum_{i=1}^{N} w_i = 1. \qquad (3)$$

Let the residue be

$$r_j = \tilde{v}_j - \hat{v}_j. \qquad (4)$$

The residual variance is then given by

$$\mathrm{Var}(r_j) = \mathrm{Cov}\{\tilde{v}_j \tilde{v}_j\} - 2\,\mathrm{Cov}\{\tilde{v}_j \hat{v}_j\} + \mathrm{Cov}\{\hat{v}_j \hat{v}_j\}. \qquad (5)$$

The first term can be simplified as

$$\mathrm{Cov}\{\tilde{v}_j \tilde{v}_j\} = \mathrm{Var}\{\tilde{v}_j\} = \mathrm{Var}\Big\{\sum_{i=1}^{N} w_i v_i\Big\} = \sum_{i=1}^{N} \sum_{k=1}^{N} w_i w_k\, \mathrm{Cov}\{v_i v_k\} = \sum_{i=1}^{N} \sum_{k=1}^{N} w_i w_k \hat{C}_{ik}. \qquad (6)$$

The second term can be written as

$$\mathrm{Cov}\{\tilde{v}_j \hat{v}_j\} = \mathrm{Cov}\Big\{\Big(\sum_{i=1}^{N} w_i v_i\Big) \hat{v}_j\Big\} = \sum_{i=1}^{N} w_i\, \mathrm{Cov}\{v_i \hat{v}_j\} = \sum_{i=1}^{N} w_i \hat{C}_{i0}.$$

Finally, assuming that the random variables have the same variance $\sigma_v^2$, the third term is

$$\mathrm{Cov}\{\hat{v}_j \hat{v}_j\} = \sigma_v^2. \qquad (7)$$

Substituting Eqs. (6)-(7) into Eq. (5),

$$\mathrm{Var}(r_j) = \sum_{i=1}^{N} \sum_{k=1}^{N} w_i w_k \hat{C}_{ik} - 2 \sum_{i=1}^{N} w_i \hat{C}_{i0} + \sigma_v^2. \qquad (8)$$

For ordinary kriging, the weights $w$ are found by minimizing $\mathrm{Var}(r_j)$ with respect to $w$ subject to the constraint in Eq. (3). This can be written as the minimization of the penalized cost function

$$J(w) = \sum_{i=1}^{N} \sum_{k=1}^{N} w_i w_k \hat{C}_{ik} - 2 \sum_{i=1}^{N} w_i \hat{C}_{i0} + \sigma_v^2 + 2\lambda \Big( \sum_{i=1}^{N} w_i - 1 \Big), \qquad (9)$$

with $2\lambda$ the Lagrange multiplier. Taking derivatives of $J$ with respect to $w_i$ and $\lambda$,

$$\frac{\partial J}{\partial w_i} = 2 \sum_{k=1}^{N} w_k \hat{C}_{ik} - 2 \hat{C}_{i0} + 2\lambda, \qquad (10)$$

$$\frac{\partial J}{\partial \lambda} = 2 \Big( \sum_{i=1}^{N} w_i - 1 \Big). \qquad (11)$$

Setting these derivatives to zero, we get the following system to solve for the weights $w$ and the multiplier $\lambda$:

$$\begin{pmatrix} \hat{C}_{11} & \cdots & \hat{C}_{1N} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \hat{C}_{N1} & \cdots & \hat{C}_{NN} & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_N \\ \lambda \end{pmatrix} = \begin{pmatrix} \hat{C}_{10} \\ \vdots \\ \hat{C}_{N0} \\ 1 \end{pmatrix} \;\Rightarrow\; \hat{C} w = \hat{c}. \qquad (12)$$

In order to obtain the kriged output at $M$ locations, Eq. (12) needs to be solved at each location, i.e.,

$$v_*^{T} = \hat{v}^{T} \hat{C}^{-1} \hat{C}_*, \qquad (13)$$

where, with $\tilde{1}, \ldots, \tilde{M}$ indexing the kriged locations,

$$v_* = \begin{pmatrix} \tilde{v}_{\tilde{1}} \\ \vdots \\ \tilde{v}_{\tilde{M}} \end{pmatrix}, \qquad \hat{v} = \begin{pmatrix} v_1 \\ \vdots \\ v_N \\ 0 \end{pmatrix}, \qquad \hat{C}_* = \begin{pmatrix} \hat{C}_{\tilde{1}1} & \cdots & \hat{C}_{\tilde{M}1} \\ \vdots & \ddots & \vdots \\ \hat{C}_{\tilde{1}N} & \cdots & \hat{C}_{\tilde{M}N} \\ 1 & \cdots & 1 \end{pmatrix}.$$

2.2 Covariance functions

There are two ways of specifying the covariances $\hat{C}_{ij}$: by a standard function, or by evaluating them empirically at each location. The latter approach is not suitable when there is a large number of stations, or when it is required to krige at a non-station location; a functional form of the covariance is preferred in these cases. The covariance function is generally chosen to reflect prior information. In the absence of such knowledge, the Gaussian function is the most widely used covariance (Isaaks and Srivastava, 1989),

$$k_{ij} = s \exp\left( -\frac{\| x_i - x_j \|^2}{h^2} \right). \qquad (14)$$

Another advantage of a functional representation is that it is possible to krige by computing the matrix $\hat{C}$ on the fly, thus saving a lot of storage.

2.3 Computation complexity

Evaluation of a single set of weights $w$ for a given location requires the solution of the system in Eq. (12) and has a computational complexity of $O(N^3)$, $N$ being the number of samples. Further, to get the weights at $M$ locations, the complexity increases to $O(MN^3)$. For $M \approx N$, the asymptotic complexity is $O(N^4)$; this is undesirable for large $N$.

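To make this computational structure concrete, the following is a minimal NumPy sketch of the direct formulation: the Gaussian covariance of Eq. (14) and one solve of Eq. (12) per prediction point. The function and variable names are ours (no kriging package is assumed), and $s = 1$ is an arbitrary choice.

```python
import numpy as np

def gaussian_cov(X, Y, h, s=1.0):
    """Gaussian covariance of Eq. (14) between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return s * np.exp(-d2 / h ** 2)

def ordinary_krige_direct(X, v, Xstar, h):
    """Naive ordinary kriging: solve the (N+1)x(N+1) system of Eq. (12)
    for every prediction point, O(M N^3) in total."""
    N = X.shape[0]
    C = np.ones((N + 1, N + 1))            # bordered covariance matrix C-hat
    C[:N, :N] = gaussian_cov(X, X, h)
    C[N, N] = 0.0
    vstar = np.empty(Xstar.shape[0])
    for j in range(Xstar.shape[0]):
        c = np.ones(N + 1)                 # right-hand side (C_10, ..., C_N0, 1)
        c[:N] = gaussian_cov(X, Xstar[j:j + 1], h).ravel()
        wl = np.linalg.solve(C, c)         # weights w_1..w_N and lambda
        vstar[j] = wl[:N] @ v              # the estimate of Eq. (1)
    return vstar
```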
3 Proposed approach

Without loss of generality, the system in Eq. (13) can be transposed,

$$v_* = \hat{C}_*^{T} \hat{C}^{-1} \hat{v}. \qquad (15)$$

The covariance functions $\hat{C}_{ij}$ are assumed to be symmetric in our discussion, so $\hat{C}^{T} = \hat{C}$ and transposing makes no difference; the strategy, however, applies to asymmetric covariance matrices as well. The kriging now comprises the solution of one linear system followed by a covariance matrix-vector product, resulting in a complexity of $O(N^3 + N^2)$. This is an order reduction from the original formulation. Note that in this approach the weights are never evaluated explicitly, so their storage is also avoided. Writing the two steps of the new formulation,

$$\text{solve for } y:\;\; \hat{C} y = \hat{v}; \qquad v_* = \hat{C}_*^{T} y. \qquad (16)$$

Such a formulation makes kriging similar to training and prediction in Gaussian process regression (Rasmussen and Williams, 2005), a Bayesian machine learning approach (Bishop, 2006). The Gaussian process formulation allows the definition of a variance $V_{\tilde{j}}$ (Rasmussen and Williams, 2005) at the $\tilde{j}$-th kriged location,

$$V_{\tilde{j}} = \hat{C}_{\tilde{j}\tilde{j}} - \hat{V}_{\tilde{j}}, \qquad (17)$$

where $\hat{V}_{\tilde{j}}$ is given by

$$\hat{V}_{\tilde{j}} = \begin{pmatrix} \hat{C}_{\tilde{j}1} \\ \vdots \\ \hat{C}_{\tilde{j}N} \end{pmatrix}^{T} \begin{pmatrix} \hat{C}_{11} & \cdots & \hat{C}_{1N} \\ \vdots & \ddots & \vdots \\ \hat{C}_{N1} & \cdots & \hat{C}_{NN} \end{pmatrix}^{-1} \begin{pmatrix} \hat{C}_{\tilde{j}1} \\ \vdots \\ \hat{C}_{\tilde{j}N} \end{pmatrix}. \qquad (18)$$

For simple kriging, iterative solution techniques like conjugate gradients (CG) and GMRES (Saad, 2003) can be used. For ordinary kriging, the linear system in Eq. (16) is a "saddle-point" problem; SymmLQ (Paige and Saunders, 1975) performs very well for such problems and is used here.
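A sketch of the reformulated two-step procedure of Eq. (16) follows. SciPy does not ship SymmLQ, so MINRES, its close relative for symmetric indefinite (saddle-point) systems, stands in for it here; `gaussian_cov` is the helper from the previous sketch, and the variance helper follows Eqs. (17)-(18). All names are ours.

```python
import numpy as np
from scipy.sparse.linalg import minres

def ordinary_krige_iterative(X, v, Xstar, h):
    """Eq. (16): one solve of C-hat y = v-hat, then a single covariance
    matrix-vector product for all M prediction points."""
    N = X.shape[0]
    C = np.ones((N + 1, N + 1))
    C[:N, :N] = gaussian_cov(X, X, h)
    C[N, N] = 0.0
    vhat = np.append(v, 0.0)               # data vector padded with 0, Eq. (13)
    y, info = minres(C, vhat)              # saddle-point solve (SymmLQ in the paper)
    assert info == 0, "solver did not converge"
    Cstar_T = np.ones((Xstar.shape[0], N + 1))
    Cstar_T[:, :N] = gaussian_cov(Xstar, X, h)
    return Cstar_T @ y                     # v_* = C-hat_*^T y

def kriging_variance(X, xstar, h):
    """Predictive variance of Eqs. (17)-(18) at a single location xstar."""
    c = gaussian_cov(X, xstar[None, :], h).ravel()
    Cnn = gaussian_cov(X, X, h)
    return float(gaussian_cov(xstar[None, :], xstar[None, :], h)
                 - c @ np.linalg.solve(Cnn, c))
```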
3.1 Possible accelerations

The key computation in each iteration of SymmLQ/CG/GMRES is the covariance matrix-vector product. By using fast single-processor algorithms (Morariu et al., 2008) or parallelization, the cost of this matrix-vector product in each iteration can be significantly reduced.

Single-processor accelerations use approximation algorithms to compute the matrix-vector product in linear time, $O(N)$, within some known error bound.


Figure 1: Growth in the CPU and GPU speeds over the last 6 years on benchmarks (Image from NVIDIA (2008))

However, their performance is dependent on the distribution of the station locations; maximum speedups are obtained when the data are recorded at relatively uniform locations. Alternatively, the matrix-vector product can be divided across multiple cores. No approximation is involved here, and this has a complexity of $O(N^2)$. An advantage of this approach is that it is independent of the locations.

Strategy: Both these approaches have their own advantages and disadvantages. A parallel implementation is independent of the station locations and hence is useful when the data are recorded at sparse locations. If it is known a priori that the station locations are (to some degree) equally spaced, single-processor based accelerations are more beneficial. In this paper, we use a parallel approach on a graphical processor to accelerate the matrix-vector product. For well-gridded data, our approach will be marginally slower compared to a linear approximation like FIGTREE (Morariu et al., 2008).

4 Graphical processors

Computer chip-makers are no longer able to easily improve the speed of processors, with the result that computer architectures of the future will have more cores rather than more capable, faster cores. This era of multi-core computing requires that algorithms be adapted to the data parallel architecture; if algorithms can be so devised, the benefit is a significant improvement in performance. A particularly capable set of data parallel processors are the graphical processors, which have evolved into highly capable compute coprocessors. A graphical processing unit (GPU) is a highly parallel, multi-threaded, multi-core processor with tremendous computational horsepower.

In 2008, while the fastest Intel CPU could achieve only ∼50 Gflops theoretically, GPUs could achieve ∼950 Gflops on actual benchmarks (NVIDIA, 2008). Fig. 1 shows the relative growth in the speeds of NVIDIA GPUs and Intel CPUs as of 2008 (similar numbers are reported for AMD/ATI CPUs and GPUs). The recently announced FERMI architecture significantly improves these benchmarks. Moreover, GPU power utilization per flop is an order of magnitude better. GPUs are particularly well-suited for data parallel computation and are designed as a single-program-multiple-data (SPMD) architecture with very high arithmetic intensity (the ratio of arithmetic operations to memory operations). However, the GPU does not have the functionalities of a CPU like task scheduling; therefore, it is most efficiently used to assist the CPU rather than replace it.

The covariance matrix-vector product can also be viewed as a weighted summation of covariance functions, and GPU-accelerated summations are available open source in GPUML (Srinivasan and Duraiswami, 2009); we used this to accelerate the kriging. NVIDIA GPUs are equipped with an easy programming model called Compute Unified Device Architecture (CUDA) (NVIDIA, 2008), which was used here.

4.1 Speedup

For the Gaussian covariance function, the speedup obtained with GPUML is shown in Fig. 2a. By using GPUML in each iteration, the speedup is further increased over that for SymmLQ (ordinary kriging) in Fig. 2b.

Figure 2: Speedup obtained with GPUML (Srinivasan and Duraiswami, 2009) for (a) the covariance matrix-vector product and (b) SymmLQ-based kriging.
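As a CPU-side illustration of this weighted-summation view, the sketch below wraps the ordinary-kriging matrix of Eq. (12) as a matrix-free SciPy `LinearOperator`: each solver iteration evaluates covariance rows on the fly in blocks, which is exactly the $O(N^2)$ kernel that GPUML parallelizes. The blocking scheme and names are ours; GPUML's actual interface is not shown.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def okrige_operator(X, h, s=1.0, block=1024):
    """Matrix-free (N+1)x(N+1) ordinary-kriging operator: the covariance
    matrix of Eq. (12) is never stored, only applied to vectors."""
    N = X.shape[0]

    def matvec(q):
        w, lam = q[:N], q[N]
        out = np.empty(N + 1)
        for i0 in range(0, N, block):      # evaluate covariance rows in blocks
            d2 = ((X[i0:i0 + block, None, :] - X[None, :, :]) ** 2).sum(-1)
            out[i0:i0 + block] = (s * np.exp(-d2 / h ** 2)) @ w + lam
        out[N] = w.sum()                   # constraint row (1 ... 1 0)
        return out

    return LinearOperator((N + 1, N + 1), matvec=matvec, dtype=np.float64)

# usage with the solver of Eq. (16):
# y, info = minres(okrige_operator(X, h), np.append(v, 0.0))
```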

Figure 3: Synthetic surface generated and sampled to test the performance of our kriging.

5 Experiments

All our experiments were performed on an Intel Core2 quad-core 3GHz processor with 4GB RAM. We used the NVIDIA GTX280 graphical processor, which has 240 cores arranged as 30 multiprocessors and a common 1GB global memory.

5.1 Experiment 1: Performance analysis

In this experiment, we tested our kriging approach on the 2-dimensional synthetic data of Sakata et al. (2004). The data was sampled randomly on a 2-d grid, and the value at each sampled location was generated using the relation

$$f(x_1, x_2) = \sin(0.4 x_1) + \cos(0.2 x_2 + 0.2), \qquad 0.0 \le x_1, x_2 \le 10.0. \qquad (19)$$

Figure 4: Performance comparison across different kriging approaches for the synthetic data in Sakata et al. (2004): (a) residues, (b) time taken.
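For reference, the sampling of Eq. (19) can be reproduced in a few lines; the sample count and random seed below are our choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000                                     # number of sampled "stations" (ours)
X = rng.uniform(0.0, 10.0, size=(N, 2))      # random sites, 0 <= x1, x2 <= 10
v = np.sin(0.4 * X[:, 0]) + np.cos(0.2 * X[:, 1] + 0.2)   # Eq. (19)
```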

The surface represented by this function is shown in Fig. 3. Although the surface is not very complex, we used different sampling sizes to test the speed of various kriging approaches against the "number of stations". We compared our approach against several open-source kriging packages: Dace kriging (Lophaven et al., 2002), Davis kriging (Lafleur, 1998) (an algorithm based on Davis (1990)) and mGstat kriging (Hansen, 2004). All these packages are highly optimized for best kriging performance. The comparison of the residues and the time taken by the various kriging approaches is shown in Fig. 4. It can be seen that the proposed approach has the best time performance for comparable residues.

In order to further emphasize the quality of our kriging, we performed spatial interpolation on the "Stanford bunny" point cloud data¹ shown in Fig. 5a using the ordinary kriging approach. The point cloud contains 35947 points along the surface of the bunny. In order to krige this data, the cloud was extended to size 104502 by adding points along the normals inside and outside the surface and assigning the values 0, −1 and 1 respectively. The surface was then kriged at 8,000,000 points (a regular 200 × 200 × 200 spatial grid), from which the isosurface was extracted using standard routines (Turk and O'Brien, 2002). Our GPU implementation took only 7 minutes to krige the 8 million points, while a direct implementation would have taken ∼2 days on a state-of-the-art PC. The quality of the resulting interpolation can be seen in Fig. 5b.

¹http://graphics.stanford.edu/data/3Dscanrep/

5.2 Experiment 2: Chesapeake Bay data

We applied our kriging to ocean color data from the Chesapeake bay. The ocean color is an indicator of the concentration of chlorophyll in the water. The data are composite maps of SeaWiFS chlorophyll measured at 2km resolution in the Chesapeake bay region, recorded every 7 days between 1998 and 2002 by NASA. The region across which the data was recorded is shown in Fig. 6. There were some days for which no recording was available.

We used the Gaussian covariance (Eq. 14) and set the length scales ($h$ in Eq. 14) to 0.06 degrees along the latitude and longitude axes and 15 days along the temporal axis. It was observed that on average our approach took 70−80s to krige the data for a single day (a direct implementation would have taken ∼8 hours for the same). The kriging results for the 15th day of each month in the year are shown in Fig. 7.
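One simple way to realize the anisotropic space-time covariance described above is to rescale each coordinate by its length scale before applying the isotropic Gaussian of Eq. (14). The paper does not spell out its exact construction, so the helper below is an assumption on our part; only the scale values (0.06 degrees, 0.06 degrees, 15 days) come from the text.

```python
import numpy as np

scales = np.array([0.06, 0.06, 15.0])   # deg latitude, deg longitude, days

def scaled_gaussian_cov(X, Y, scales, s=1.0):
    """Eq. (14) after dividing each coordinate (lat, lon, time) by its
    length scale, giving one covariance over space and time jointly."""
    Xs, Ys = X / scales, Y / scales
    d2 = ((Xs[:, None, :] - Ys[None, :, :]) ** 2).sum(-1)
    return s * np.exp(-d2)
```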

Figure 5: Spatial interpolation of the "Stanford bunny" point cloud data: (a) point cloud, (b) kriged surface. Our approach took only 7 minutes; a direct implementation would have taken ∼2 days.

Figure 6: Region across which ocean color data were available (indicated in white color)

It can be seen that the chlorophyll concentration is low during the cooler periods of the year due to the relative inactivity of the algae, whereas it increases severalfold during the warmer periods (April−August) because of the blooms.

Fig. 8 shows the result of a variability analysis on the kriged ocean color data over the entire 5-year period. It can be observed that the ocean color has a large variability in regions very close to land. This is in line with what is expected: it is well known that human activity in the land regions contributes to a large extent to the chlorophyll concentration, and hence to a larger variation; this further validates our kriging.

6 Conclusion

In this paper, we have proposed a new formulation of the kriging problem that can be solved using iterative solvers like conjugate gradients and SymmLQ. We further accelerate each iteration on a GPU by using fast covariance matrix-vector products. The resulting kriging is illustrated to be faster than many available tools and is further validated on synthetic and recorded data.

Acknowledgement

This work is a part of the Chesapeake Bay Forecast System and we gratefully acknowledge NOAA for funding the project. We also thank Drs. Nail Gumerov, Clarissa Anderson and Jim Beauchamp for providing the data used in this paper and for the valuable feedback that helped us improve our kriging tool.

References

Bishop, C., 2006: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc.

Davis, J., 1990: Statistics and Data Analysis in Geology. John Wiley & Sons, Inc., New York, NY, USA.

Furrer, R., M. Genton, and D. Nychka, 2006: Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15, 502–523.

Hansen, T., 2004: mGstat: A geostatistical Matlab toolbox. Online web resource. URL http://mgstat.sourceforge.net/

Hartman, L. and O. Hössjer, 2008: Fast kriging of large data sets with Gaussian Markov random fields. Computational Statistics & Data Analysis, 52, 2331–2349.

Isaaks, E. and R. Srivastava, 1989: Applied Geostatistics. Oxford University Press, 542 pp.

Kammann, E. E. and M. P. Wand, 2003: Geoadditive models. Journal of the Royal Statistical Society, Series C (Applied Statistics), 52, 1–18.

Figure 7: Kriging results using the 7-day Chesapeake bay ocean color data on the 15th day of each month in 2002: (a) January, (b) February, (c) March, (d) April, (e) May, (f) June, (g) July, (h) August, (i) September, (j) October, (k) November, (l) December.

Figure 8: Variability analysis: (a) mean and (b) standard deviation of the kriged ocean color over the 5-year period from 1998 to 2002.

Krige, D., 1951: A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa, 52, 119–139.

Lafleur, C., 1998: Matlab kriging toolbox. Online web resource. URL http://www2.imm.dtu.dk/~hbn/dace/

Lophaven, S., H. Nielsen, and J. Søndergaard, 2002: DACE, a Matlab kriging toolbox. Online web resource. URL http://www2.imm.dtu.dk/~hbn/dace/

Memarsadeghi, N. and D. Mount, 2007: Efficient implementation of an optimal interpolator for large spatial data sets. Proceedings of the 7th International Conference on Computational Science, Part II, Springer-Verlag, Berlin, Heidelberg, 503–510.

Memarsadeghi, N., V. C. Raykar, R. Duraiswami, and D. M. Mount, 2008: Efficient kriging via fast matrix-vector products. Aerospace Conference, 2008 IEEE, 1–7.

Morariu, V., B. Srinivasan, V. Raykar, R. Duraiswami, and L. Davis, 2008: Automatic online tuning for fast Gaussian summation. Advances in Neural Information Processing Systems. URL http://sourceforge.net/projects/figtree/

Moyeed, R. and A. Papritz, 2002: An empirical comparison of kriging methods for nonlinear spatial point prediction. Mathematical Geology, 34, 365–386.

NVIDIA, 2008: NVIDIA CUDA Programming Guide 2.0.

Paige, C. C. and M. A. Saunders, 1975: Solution of sparse indefinite systems of linear equations. SIAM Journal on Numerical Analysis, 12, 617–629.

Rasmussen, C. and C. Williams, 2005: Gaussian Processes for Machine Learning. The MIT Press.

Raykar, V. and R. Duraiswami, 2007: The improved fast Gauss transform with applications to machine learning. Large Scale Kernel Machines, 175–201.

Saad, Y., 2003: Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics.

Sakata, S., F. Ashida, and M. Zako, 2004: An efficient algorithm for kriging approximation and optimization with large-scale sampling data. Computer Methods in Applied Mechanics and Engineering, 193, 385–404.

Srinivasan, B. and R. Duraiswami, 2009: Scaling kernel machine learning algorithm via the use of GPUs. GPU Technology Conference, NVIDIA Research Summit. URL http://www.umiacs.umd.edu/~balajiv/GPUML.htm

Turk, G. and J. O'Brien, 2002: Modelling with implicit surfaces that interpolate. ACM Trans. Graph., 21, 855–873.

Yang, C., R. Duraiswami, and L. Davis, 2004: Efficient kernel machines using the improved fast Gauss transform. Advances in Neural Information Processing Systems.
