Article Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data Fei Hu 1, Mengchao Xu 1, Jingchao Yang 1, Yanshou Liang 1, Kejin Cui 1, Michael M. Little 2, Christopher S. Lynnes 2, Daniel Q. Duffy 2 and Chaowei Yang 1,* 1 NSF Spatiotemporal Innovation Center and Department of Geography and GeoInformation Science, George Mason University, Fairfax, VA 22030, USA;
[email protected] (F.H.);
[email protected] (M.X.);
[email protected] (J.Y.);
[email protected] (Y.L.);
[email protected] (K.C.) 2 NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA;
[email protected] (M.L.);
[email protected] (C.L.);
[email protected] (D.D.) * Correspondence:
[email protected]; Tel.: +1-703-993-4742 Received: 26 February 2018; Accepted: 5 April 2018; Published: 7 April 2018 Abstract: Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address these challenges, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers were also developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data and GeoSpark/SpatialHadoop were enhanced from Spark/Hadoop to handle vector data. However, there are few studies to systematically compare and evaluate the features and performances of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets.