Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations

2018 IEEE International Conference on Cluster Computing Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations Sameh Abdulah, Hatem Ltaief, Ying Sun, Marc G. Genton, and David E. Keyes Extreme Computing Research Center Computer, Electrical, and Mathematical Sciences and Engineering Division, King Abdullah University of Science Technology, Thuwal, Saudi Arabia. [email protected], [email protected], [email protected], [email protected], [email protected]. Abstract—Maximum likelihood estimation is an important Equations (PDEs) to estimate conditions at specific output statistical technique for estimating missing data, for example points based on semi-empirical models and assimilated mea- in climate and environmental applications, which are usually surements. This conventional approach translates the original large and feature data points that are irregularly spaced. In particular, the Gaussian log-likelihood function is the de facto big data problem into a large-scale simulation problem, solved model, which operates on the resulting sizable dense covariance globally, en route to particular quantities of interest, and it matrix. The advent of high performance systems with advanced relies on PDE solvers to extract performance from the targeted computing power and memory capacity have enabled full sim- architectures. ulations only for rather small dimensional climate problems, An alternative available in many use cases is to estimate solved at the machine precision accuracy. The challenge for high dimensional problems lies in the computation requirements missing quantities of interest from a statistical model. Until of the log-likelihood function, which necessitates O(n2) storage recently, the computation used in statistical models, like using and O(n3) operations, where n represents the number of given field data to estimate parameters of a Gaussian log-likelihood spatial locations. This prohibitive computational cost may be function and then evaluating that distribution to estimate reduced by using approximation techniques that not only enable phenomena where field data are not available, was intractable large-scale simulations otherwise intractable, but also maintain the accuracy and the fidelity of the spatial statistics model. for very large meteorological and environmental datasets. This In this paper, we extend the Exascale GeoStatistics software is due to the arithmetic complexity, for which a key step grows framework (i.e., ExaGeoStat1) to support the Tile Low-Rank as the cube of the problem size [1], i.e., increasing the problem (TLR) approximation technique, which exploits the data sparsity size by a factor of 10 requires 1, 000X more work (and 100X of the dense covariance matrix by compressing the off-diagonal more memory). tiles up to a user-defined accuracy threshold. The underlying linear algebra operations may then be carried out on this In [2], we have introduced Exascale GeoStatistics software data compression format, which may ultimately reduce the framework (ExaGeoStat) as a high performance software arithmetic complexity of the maximum likelihood estimation for geospatial statistics in climate and environment modeling. and the corresponding memory footprint. Performance results The proposed software calculates the core statistical opera- of TLR-based computations on shared and distributed-memory tion, i.e., the Maximum Likelihood Estimation (MLE), up to systems attain up to 13X and 5X speedups, respectively, compared to full accuracy simulations using synthetic and real datasets (up the machine precision accuracy for only rather small spatial to 2M), while ensuring adequate prediction accuracy. datasets due to memory limitations. ExaGeoStat relies Index Terms—massively parallel algorithms, machine learning on the asynchronous task-based dense linear algebra library algorithms, applied computing mathematics and statistics, max- Chameleon [3] associated with the dynamic runtime system imum likelihood optimization, geo-statistics applications StarPU [4] to exploit the underlying computing power toward large-scale systems. I. INTRODUCTION The existing hardware landscape with its limited memory Current massively parallel systems provide unprecedented capacity, and even with its high thread concurrency, still computing power with up to millions of execution threads. appears unfriendly for large-scale simulations due to the This hardware technology evolution comes at the expense of aforementioned curse of dimensionality. Piggybacking on the a limited memory capacity per core, which may prevent sim- renaissance in hierarchically low rank computational linear ulations of big data problems. In particular, climate/weather algebra, we propose to exploit data sparsity in the resulting, simulations usually rely on a complex set of Partial Differential apparently dense, covariance matrix by compressing the off- diagonal blocks up to a specific application-dependent accu- 1https://github.com/ecrc/exageostat racy. 2168-9253/18/$31.00 ©2018 IEEE 98 DOI 10.1109/CLUSTER.2018.00089 Indeed, this work extends our ExaGeoStat software, in The remainder of the paper is organized as follows. Sec- the context of climate and environmental simulations, by tion II covers different MLE approximation techniques that reducing the memory footprint and the arithmetic complexity have been proposed in the literature. Section III illustrates the of the MLE to alleviate the dimensionality bottleneck. We climate modeling structure used as a backbone for this work. employ the Tile Low-Rank (TLR) data format for the com- Section IV recalls the necessary background on the Matern´ pression, as implemented in the Hierarchical Computations on covariance functions. Section V describes the Tile Low-Rank Manycore Architectures (HiCMA2) numerical library. HiCMA (TLR) approximation technique and the HiCMA TLR approx- relies on task-based programming model and is deployed on imation library and its integration into the ExaGeoStat shared [5] and distributed-memory systems [6] via StarPU. framework. Section VI highlights the ExaGeoStat frame- The asynchronous execution achieved via StarPU is even work with its software stack. Section VII defines both the more critical for HiCMA’s workload, characterized by lower synthetic datasets and the two climate datasets obtained from arithmetic intensity, since it permits to mitigate the latency large geographic regions, i.e., the Mississippi River Basin overhead engendered by the data movement. and the Middle-East region that are used to evaluate the proposed TLR method. Performance results and accuracy A. Contributions analysis are presented in Section VIII, using both synthetic and real environmental datasets, and we conclude in Section IX. The contributions of this paper are sixfold. • We propose an accurate and amenable MLE framework II. RELATED WORK using TLR-based approximation format to reduce the Approximation techniques to reduce arithmetic complexities prohibitive complexity of the apparently dense covariance and memory footprint for large-scale climate and environmen- matrix computation. tal applications are well-established in the literature. Sun et • We provide a TLR solution for the prediction operation al. [7] have discussed several of these methods such as Kalman to impute values related to non-sampled locations. filtering [8], moving averages [9], Gaussian predictive pro- • We demonstrate the applicability of our approximation cesses [10], fixed-rank kriging [11], covariance tapering [12], technique on both synthetic (up to 1M locations) and real [13], and low-rank splines [14]. All these methods depend on datasets (i.e., soil moisture from the Mississippi Basin low-rank models, where a latent process is used with lower area and wind speed from the Middle East area). dimension, and eventually result in a low-rank representation • We port the ExaGeoStat simulation framework to a of the covariance matrix. Although these methods propose myriad of shared and distributed-memory systems using several possibilities to reduce the complexity of generating and a single source code to enhance user productivity, thanks computing the domain covariance matrix, several restrictions to its modular software stack. limit their functionality [15], [16]. • We conduct a comprehensive performance evaluation to On the other hand, low-rank off-diagonal matrix approx- highlight the effectiveness of the TLR-based approxi- imation techniques have gained a lot of attention to cope mation method compared to the original full accuracy with covariance matrices of high dimension. In the liter- approach. The experimental platforms include shared- ature, these are commonly referred as hierarchical matri- memory Intel Xeon Haswell / Broadwell / KNL / Skylake ces or H-matrices [17], [18]. The development of various high-end HPC servers and the distributed-memory Cray data compression techniques such as Hierarchically Semi- XC40 Shaheen-2 supercomputer. Separable (HSS) [19], H2-matrices [20]–[22], Hierarchically • We perform a thorough qualitative analysis to assess the Off-Diagonal Low-Rank (HODLR) [23], Block/Tile Low- accuracy of the estimation of the Matern´ covariance pa- Rank (BLR/TLR) [5], [6], [24], [25] increases their impact on rameters as well as the prediction operation. Performance a wide range of scientific applications. Each of the aforemen- results of TLR-based MLE computations on shared and tioned data compression formats has pros

Load more