Big Data and Cognitive Computing

Article

Function Modeling Improves the Efficiency of Spatial Modeling Using Big Data from Remote Sensing

John Hogland * and Nathaniel Anderson

Rocky Mountain Research Station, U.S. Forest Service, Missoula, MT 59801, USA; [email protected]
* Correspondence: [email protected]; Tel.: +1-406-329-2138

Received: 26 June 2017; Accepted: 10 July 2017; Published: 13 July 2017

Abstract: Spatial modeling is an integral component of most geographic information systems (GISs). However, conventional GIS modeling techniques can require substantial processing time and storage space and have limited statistical and machine learning functionality. To address these limitations, many have parallelized spatial models using multiple coding libraries and have applied those models in a multiprocessor environment. Few, however, have recognized the inefficiencies associated with the underlying spatial modeling framework used to implement such analyses. In this paper, we identify a common inefficiency in processing spatial models and demonstrate a novel approach to address it using lazy evaluation techniques. Furthermore, we introduce a new coding library that integrates Accord.NET and ALGLIB numeric libraries and uses lazy evaluation to facilitate a wide range of spatial, statistical, and machine learning procedures within a new GIS modeling framework called function modeling. Results from simulations show a 64.3% reduction in processing time and an 84.4% reduction in storage space attributable to function modeling. In an applied case study, this translated to a reduction in processing time from 2247 h to 488 h and a reduction in storage space from 152 terabytes to 913 gigabytes.

Keywords: function modeling; remote sensing; machine learning; geographic information systems

1. Introduction

Spatial modeling has become an integral component of geographic information systems (GISs) and remote sensing. Combined with classical statistical and machine learning algorithms, spatial modeling in GIS has been used to address wide-ranging questions in a broad array of disciplines, from epidemiology [1] and climate science [2] to geosciences [3] and natural resources [4–6]. However, in most GISs, the current workflow used to integrate statistical and machine learning algorithms and to process raster models limits the types of analyses that can be performed. This process can be generally described as a series of sequential steps: (1) build a sample data set using a GIS; (2) import that sample data set into statistical software such as SAS [7], R [8], or MATLAB [9]; (3) define a relationship (e.g., predictive regression model) between response and explanatory variables that can be used within a GIS to create predictive surfaces; and then (4) build a representative spatial model within a GIS that uses the outputs from the predictive model to create spatially explicit surfaces. Often, the multi-software complexity of this practice warrants building tools that streamline and automate many aspects of the process, especially the export and import steps that bridge different software. However, a number of challenges place significant limitations on producing final outputs in this manner, including learning additional software, implementing predictive model outputs, managing large data sets, and handling the long processing time and large storage space requirements associated with this workflow [10,11]. These challenges have intensified over the past decade because large, fine-resolution remote sensing data sets, such as meter and sub-meter imagery and Lidar, have become widely available and less expensive to procure, but the tools to use such data efficiently and effectively have not always kept pace, especially in the desktop environment.


To address some of these limitations, advances in both GIS and statistical software have focused on integrating functionality through coding libraries that extend the capabilities of any software package. Common examples include RSGISlib [12], GDAL [13], SciPy [14], ALGLIB [15], and Accord.NET [16]. At the same time, new processing techniques have been developed to address common challenges with big data that aim to more fully leverage improvements in computer hardware and software configurations. For example, parallel processing libraries such as OpenCL [17] and CUDA [18] are stable and actively being used within the GIS community [19,20]. Similarly, frameworks such as Hadoop [21] are being used to facilitate cloud computing, and offer improvements in big data processing by partitioning processes across multiple CPUs within a large server farm, thereby improving user access, affordability, reliability, and data sharing [22,23]. While the integration, functionality, and capabilities of GIS and statistical software continue to expand, the underlying framework of how procedures and methods are used within spatial models in GIS tends to remain the same, which can impose artificial limitations on the type and scale of analyses that can be performed.

Spatial models are typically composed of multiple sequential operations. Each operation reads data from a given data set, transforms the data, and then creates a new data set (Figure 1). In programming, this is called eager evaluation (or strict semantics) and is characterized by a flow that evaluates all expressions (i.e., arguments) regardless of the need for the values of those expressions in generating final results [24].
Though eager evaluation is intuitive and used by many traditional programming languages, creating and reading new data sets at each step of a model in GIS comes at a high processing and storage cost, and is not viable for large area analysis outside of the supercomputing environment, which is not currently available to the vast majority of GIS users.

Figure 1. A schematic comparing conventional modeling (CM), which employs eager evaluation, to function modeling (FM), which uses lazy evaluation. The conceptual difference between the two modeling techniques is that FM delays evaluation until results are needed and produces results without storing intermediate data sets to disk (green squares). When input and output operations or disk storage space are the primary bottlenecks in running a spatial model, this difference in evaluation can substantially reduce processing time and storage requirements.

In contrast, lazy evaluation (also called lazy reading and call-by-need) can be employed to perform the same types of analyses, reducing the number of reads and writes to disk, which reduces processing time and storage. In lazy evaluation, actions are delayed until their results are required, and results can be generated without requiring the independent evaluation of each expression [24]. From a mathematical perspective, this is similar to function composition, in which functions are sequentially combined into a composite function that is capable of generating results without evaluating each function independently. The theoretical foundations of lazy evaluation employed in functional programming languages date back to as early as 1971, as described by Jian et al. [25]. Tremblay [26] provides a thorough discussion of the different classes of functional programming languages and their expressiveness as it relates to the spectrum of strict and non-strict evaluation. He defines strict (i.e., eager) approaches as evaluating all of the arguments before starting the evaluation of the body of the function and evaluating the body of the function only if all arguments are defined, and lazy approaches as evaluating the body of the function before evaluating any of the arguments and evaluating the arguments only if needed while evaluating the body. In GISs, operations in conventional spatial models are evaluated and combined eagerly in ways that generate intermediate data sets whose values are written to disk and used to produce final results. In a lazy evaluation context, evaluation is delayed, intermediate data sets are only generated if needed, and final data sets are not stored to disk but are instead processed dynamically when needed (Figure 1). For example, though an analysis may be conducted over a very large area at high resolution, final results can be generated easily in a call-by-need fashion for a smaller extent being visualized on a screen or map, without generating the entire data set for the full extent of the analysis.

In this paper, we introduce a new GIS spatial modeling framework called function modeling (FM) and highlight some of the advantages of processing big data spatial models using lazy evaluation techniques. Simulations and two case studies are used to evaluate the costs and benefits of alternative methods, and an open source .NET coding library built around ESRI (Redlands, CA, USA) ArcObjects is discussed. The coding library facilitates a wide range of big data spatial, statistical, and machine learning type analyses, including FM.
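The difference between the two evaluation strategies can be sketched in a few lines of code. The following C# fragment is illustrative only: the arrays stand in for raster data sets, the composed delegate stands in for a function raster chain, and none of the names belong to the RMRS Raster Utility API or to ArcObjects.

```csharp
using System;
using System.Linq;

// A minimal sketch (not the RMRS Raster Utility API) contrasting eager and
// lazy evaluation for a three-step raster model: add 5, multiply by 2,
// then threshold at 100. Arrays stand in for raster data sets.
class LazyVsEagerDemo
{
    static void Main()
    {
        double[] raster = Enumerable.Range(0, 1_000_000)
                                    .Select(i => (double)(i % 255))
                                    .ToArray();

        // Eager (CM): every step materializes a full intermediate array,
        // the in-memory analogue of writing an intermediate raster to disk.
        double[] step1 = raster.Select(v => v + 5).ToArray();
        double[] step2 = step1.Select(v => v * 2).ToArray();
        double[] eager = step2.Select(v => v >= 100 ? 1.0 : 0.0).ToArray();

        // Lazy (FM): the three operations are composed into one function;
        // nothing is evaluated or stored until cells are actually read.
        Func<double, double> composite = v => (v + 5) * 2 >= 100 ? 1.0 : 0.0;
        var lazySurface = raster.Select(composite);   // deferred query

        // Only the cells actually requested are computed, e.g., a small
        // display window rather than the full extent.
        double[] window = lazySurface.Skip(500_000).Take(9).ToArray();
        Console.WriteLine($"eager: {eager[500_000]}, lazy: {window[0]}");
    }
}
```

In the eager version, each step allocates a full intermediate array; in the lazy version, only the composed function is stored until cells are read, which is the behavior Figure 1 depicts.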

2. Materials and Methods

2.1. .NET Libraries

To streamline the spatial modeling process, we developed a series of coding libraries that leverage concepts of lazy evaluation using function raster data sets [10,11] and integrate numeric, statistical, and machine learning algorithms with ESRI's ArcObjects [27]. Combined, these libraries facilitate FM, which allows users to chain modeling steps and complex algorithms into one raster data set or field calculation without writing the outputs of intermediate modeling steps to disk. Our libraries were built using an object-oriented design, the .NET framework, ESRI's ArcObjects [27], ALGLIB [15], and Accord.net [16], which are accessible to computer programmers and analysts through our subversion site [28] and packaged in an add-in toolbar for ESRI ArcGIS [11]. The library and toolbar together are referred to here as the U.S. Forest Service Rocky Mountain Research Station Raster Utility (the RMRS Raster Utility) [28].

The methods and procedures within the RMRS Raster Utility library parallel many of the functions found in ESRI's Spatial Analyst extension, including focal, aggregation, resampling, convolution, remap, local, arithmetic, mathematical, logical, zonal, surface, and conditional operations. However, the libraries used in this study support multiband manipulations and perform these transformations without writing intermediate or final output raster data sets to disk. The spatial modeling framework used in the Utility focuses on storing only the manipulations occurring to data sets and applying those transformations dynamically at the time data are read. In addition to including common types of raster transformations, the utility also includes multiband manipulation procedures such as gray level co-occurrence matrix (GLCM), time series transformations, landscape metrics, entropy and angular second moment calculations for focal and zonal procedures, image normalization, and other statistical and machine learning transformations that are integrated directly into this modeling framework.

While the statistical and machine learning transformations can be used to build surfaces and calculate records within a tabular field, they do not in themselves define the relationships between response and explanatory variables like a predictive model. To define these relationships, we built
a suite of classes that perform a wide variety of statistical testing and predictive modeling using many of the optimization algorithms and mathematical procedures found within ALGLIB and Accord.net [15,16]. Typically, in natural resource applications that use GIS, including the case studies presented in Section 2.3, analysts use samples drawn from a population to test hypotheses and create generalized associations (e.g., an equation or procedure) between variables of interest that are expensive to collect in the field (i.e., response variables) and variables that are less costly and thought to be related to the response variables (i.e., explanatory variables; Figures 2 and 3). For example, it is common to use the information contained in remotely sensed images to predict the physical characteristics of vegetation that is otherwise measured accurately and relatively expensively with ground plots. While the inner workings and assumptions of the algorithms used to develop these relationships are beyond the scope of this paper, it is worth noting that the classes developed to utilize these algorithms within a spatial context provide computer programmers and analysts with the ability to use these techniques in the FM context relatively easily, without developing code for such algorithms themselves. Combined with FM, the sampling, modeling, and surface creation workflow can be streamlined to produce outputs that answer questions and display relationships between response and explanatory variables in a spatially explicit manner. However, improvements in efficiency and effectiveness over conventional techniques are not always known. These .NET libraries were used to test the hypothesis that FM provides significant improvement over conventional techniques and to quantify any related effects on storage and processing time.
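As a concrete illustration of this sample, model, and surface workflow, the sketch below applies a fitted prediction rule cell by cell at read time. It assumes a simple two-band linear model with made-up coefficients; it does not reproduce the Utility's classes or the ALGLIB/Accord.NET calls the library actually uses.

```csharp
using System;
using System.Linq;

// Sketch of the sample -> model -> surface workflow under a simple
// assumed linear model; coefficients are illustrative placeholders,
// as if estimated from field plots.
class PredictiveSurfaceSketch
{
    // Hypothetical fitted coefficients: intercept, band1, band2.
    static readonly double[] Beta = { 12.4, 0.8, -0.3 };

    // The prediction rule applied per cell when the surface is read.
    static double Predict(double band1, double band2) =>
        Beta[0] + Beta[1] * band1 + Beta[2] * band2;

    static void Main()
    {
        double[] band1 = { 10, 40, 90 };   // toy explanatory rasters
        double[] band2 = { 5, 15, 25 };

        // Deferred prediction surface: values exist only when enumerated,
        // so no intermediate prediction raster is written to disk.
        var surface = band1.Zip(band2, Predict);
        Console.WriteLine(string.Join(", ", surface));
    }
}
```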

Figure 2. An example of the types of interactive graphics produced within the RMRS Raster Utility toolbar that visually describe the generalized association (predictive model) between board foot volume per acre of trees greater than 5 inches in diameter (the response variable) and twelve explanatory variables derived from remotely sensed imagery. The figure illustrates the functional relationship between volume and one explanatory variable called "mean-Band3", while holding all other explanatory variable values at 30% of their sampled range.

Figure 3. An example of the interactive graphics produced within the RMRS Raster Utility toolbar that depicts the importance of each variable within a predictive model by systematically removing one variable from the model and comparing changes in root mean squared error (RMSE). The model used is a generalized association between board foot volume per acre and twelve explanatory variables derived from remotely sensed imagery, and was developed using a random forest algorithm and the RMRS Raster Utility toolbar.

2.2. Simulations

From a theoretical standpoint, FM should improve processing efficiency by reducing the processing time and storage space associated with spatial modeling, primarily because it requires fewer input and output processes. However, to justify the use of FM methods, it is necessary to quantify the extent to which FM actually reduces processing time and storage space. To evaluate FM, we designed, ran, and recorded processing time and storage space associated with six simulations, each including 12 models, varying the size of the raster data sets used in each simulation. From the results of those simulations, we compared and contrasted FM with conventional modeling (CM) techniques using linear regression and analysis of covariance (ANCOVA). All CM techniques were directly coded against ESRI's ArcObject Spatial Analyst classes to minimize CM processing time and provide a fair comparison.

Spatial modeling scenarios within each simulation ranged from one arithmetic operation to twelve operations that included arithmetic, logical, conditional, focal, and local type analyses. Counts of each type of operation included in each model are shown in Table 1. In the interest of conserving space, composite functions for each of these models are not included and defined here, but, as described in the introduction, each model could be presented in mathematical form. Each modeling scenario created a final raster output and was run against six raster data sets ranging in size from 1,000,000 to 121,000,000 total cells (1 m² grain size), incrementally increasing total raster size by 2000 columns and 2000 rows at each step (adding 4 million cells at each step, Figure 4). Cell bit depth remained constant as floating type numbers across all scenarios and simulations.
Simulations were carried out on a Hewlett-Packard EliteBook 8740 using an I7 Intel quad core processor and standard internal 5500 revolutions per minute (RPM) hard disk drives (HDD).

Table 1. The 12 models and their spatial operations used to compare and contrast function modeling and conventional raster modeling in the simulations. Model number (column 1) also indicates the total number of processes performed for a given model. Each of the six simulations included processing all 12 models on increasingly large data sets.

Type and Number of Spatial Operations

Model/Total Operations   Arithmetic, Add (+)   Arithmetic, Multiply (*)   Logical (≥)   Focal (Mean 7,7)   Conditional (True/False)   Convolution (Sum, 5,5)   Local (Sum Σ)
 1                       1                     -                          -             -                  -                          -                        -
 2                       1                     1                          -             -                  -                          -                        -
 3                       1                     1                          1             -                  -                          -                        -
 4                       2                     1                          1             -                  -                          -                        -
 5                       2                     1                          1             1                  -                          -                        -
 6                       2                     2                          1             1                  -                          -                        -
 7                       2                     2                          1             1                  1                          -                        -
 8                       3                     2                          1             1                  1                          -                        -
 9                       3                     2                          1             1                  1                          1                        -
10                       3                     3                          1             1                  1                          1                        -
11                       3                     3                          1             1                  1                          1                        1
12                       4                     3                          1             1                  1                          1                        1

Figure 4. Extent, size, and layout of each image within simulations.

These simulations employed stepwise increases in the total number of cells used in the analysis. It is worth noting here that this is a different approach than maintaining a fixed area (i.e., extent) and changing granularity, gradually moving from coarse to fine resolution and larger numbers of smaller and smaller cells. In this simulation, the algorithm used would produce the same result in both situations. However, some algorithms identify particular structures in an image and try to use characteristics like structural patterns and local homogeneity to improve efficiency, especially when attempting to discern topological relations, potentially using map-based measurements in making such determinations. In these cases, granularity may be important, and comparing the performance of different approaches would benefit from using both raster size and grain size simulations.

To determine the overhead of creating and using an FM, we also conducted a simulation using focal mean analyses that varied neighborhood window size across six images. Each iteration of the simulation gradually increased the neighborhood window size for the focal mean procedure from a 10 by 10 to a 50 by 50 cell neighborhood (in increments of 10 cells) to evaluate the impact of processing more cells within a neighborhood, while holding the image and number of input and output procedures constant. Images used in this simulation were the same as in the previous simulation and contained between 1,000,000 and 121,000,000 total cells. In this scenario, CM and FM techniques were expected to show no difference with regard to processing time and storage space using a paired t-test comparison, and any differences recorded in processing or storage can be attributed to the overhead associated with creating and using an FM.
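A harness along the following lines can reproduce the spirit of these simulations, although the actual simulations were coded against ArcObjects Spatial Analyst classes; the in-memory arrays, sizes, and the add-one operation below are placeholders.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Sketch of a timing harness in the spirit of the simulations above;
// the arrays and operations are stand-ins, not the ArcObjects-based
// code actually used in the study.
class SimulationHarness
{
    static void Main()
    {
        foreach (int cells in new[] { 1_000_000, 9_000_000, 25_000_000 })
        {
            var raster = new double[cells];
            var sw = Stopwatch.StartNew();

            // Eager chain: one materialized intermediate per operation.
            double[] tmp = raster;
            for (int op = 0; op < 12; op++)
                tmp = tmp.Select(v => v + 1).ToArray();
            long eagerMs = sw.ElapsedMilliseconds;

            sw.Restart();
            // Lazy chain: compose 12 operations, evaluate once on read.
            Func<double, double> f = v => v;
            for (int op = 0; op < 12; op++)
            {
                Func<double, double> g = f;  // capture current composite
                f = v => g(v) + 1;
            }
            double checksum = raster.Sum(v => f(v));
            long lazyMs = sw.ElapsedMilliseconds;

            Console.WriteLine($"{cells,11:N0} cells: eager {eagerMs} ms, " +
                              $"lazy {lazyMs} ms (checksum {checksum:N0})");
        }
    }
}
```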

2.3. Case Studies

To more fully explore the use of FM to analyze data and create predictive surfaces in real applications, we evaluated the efficiencies associated with two case studies: (1) a recent study of the Uncompahgre National Forest (UNF) in the US state of Colorado [6,29] and (2) a study focused on the forests of the Montana State Department of Natural Resources and Conservation (DNRC) [unpublished data]. For the UNF study, we used FM, fixed radius cluster plots, and fine resolution imagery to develop a two-stage classification and estimation procedure that predicts mean values of forest characteristics across 1 million ha at a spatial resolution of 1 m². The base data used to create these predictions consisted of National Agricultural Imagery Program (NAIP) color infrared (CIR) imagery [30], which contained a total of 10 billion pixels for the extent of the study area. Within this study, we created no fewer than 365 explanatory raster surfaces, many of which represented a combination of multiple raster functions, as well as 52 predictive models and 64 predictive surfaces. All of these outputs were generated at the extent and spatial resolution of the NAIP imagery (1 m², 10 billion pixels).

For the DNRC study, we used FM, variable radius plots, and NAIP imagery to predict basal area per acre (BAA), trees per acre (TPA), and board foot volume per acre (BFA) for more than 2.2 billion pixels (~0.22 million ha) in northwest Montana, USA at the spatial resolution of 1 m². In this study, we created 12 explanatory variables, three predictive models, and three raster surfaces depicting mean plot estimates of BAA, TPA, and BFA. Average plot extent was used to extract spectral and textural values in the DNRC study and was determined by calculating the average limiting distance of trees within each plot given the basal area factor used in the field measurement (ranging between 5 and 20) and the diameter of each tree measured.

While it would be ideal to directly compare CM with FM for the UNF and DNRC case studies, ESRI's libraries do not have procedures to perform many of the analyses and raster transformations used in these studies, so CM was not incorporated directly into the analysis. As an alternative, to evaluate the time savings associated with using FM in these examples, we used our simulated results and estimated the number of individual CM functions required to create GLCM explanatory variables and model outputs. Processing time was based on the number of CM functions required to produce equivalent explanatory variables and predictive surfaces. In many instances, multiple CM functions were required to create one explanatory variable or final output. The number of functions was then multiplied by the corresponding proportional increase in processing time associated with CM when compared to FM. Storage space was calculated based on the number of intermediate steps used to create explanatory variables and final outputs. In all cases, the space associated with intermediate outputs from FM and CM was calculated without data compression and summed to produce a final storage space estimate.
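The estimation arithmetic described above reduces to a few multiplications. The sketch below shows the form of the calculation with placeholder inputs, not the case-study values.

```csharp
using System;

// Form of the case-study estimate: CM cost approximated from the number
// of equivalent CM functions and the proportional CM-to-FM increases
// observed in the simulations. All inputs below are placeholders.
class CaseStudyEstimate
{
    static void Main()
    {
        int cmFunctions = 30;            // CM functions needed for outputs
        double fmHoursPerFunction = 0.7; // observed FM cost per function
        double cmTimeRatio = 2.8;        // CM/FM time ratio from simulations
        double gbPerIntermediate = 8.0;  // uncompressed size per data set

        double cmHours = cmFunctions * fmHoursPerFunction * cmTimeRatio;
        double cmStorageGb = cmFunctions * gbPerIntermediate;

        Console.WriteLine($"Estimated CM time: {cmHours:F0} h, " +
                          $"storage: {cmStorageGb:F0} GB");
    }
}
```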

3. Results

3.1. Simulations

To evaluate the gains in efficiency of FM, we compared it with CM across six simulations, each running 12 models of increasing complexity. FM significantly reduced both processing time (F(df = 1) = 303.89; p-value < 0.001) and storage space (F(df = 1) = 3.64; p-value = 0.058) when compared to CM (Figures 5 and 6). FM took 26.13 min and a total of 13,299.09 megabytes to perform all six simulations, while CM took 73.15 min and 85,098.27 megabytes. This represents a 2.8-fold increase in processing time and a 6.4-fold increase in storage space moving from FM to CM. Theoretically, processing time and storage space associated with creating raster data for a given model, computer configuration, set of operations, and data type should be a linear function of the total number of cells within a raster data set and the total number of operations within the model, as shown by:

Time_i (seconds) = β_i (cells × operations)_i
Space_i (megabytes) = β_i (cells)_i        (1)

where i denotes the technique, either FM or CM in this case. Our empirically derived FM time and space models nicely fit the linear paradigm (R² = 0.96 and 0.99, respectively). However, for the CM techniques, the model for Time (R² = 0.91) deviated slightly and the model for Space (R² = 0.10) deviated substantially from theoretical distributions, indicating that, in addition to the modeling operations, extra operations occurred to facilitate raster processing and that some of those extra operations produced intermediate raster data sets, generating wider variation in time and space requirements than would be expected based on these equations (Equation (1)).
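For readers who want to reproduce the fit, Equation (1) is a no-intercept least-squares slope, which can be estimated directly as β = Σxy / Σx². The observations below are made up for illustration, not the measured values behind Figures 5 and 6.

```csharp
using System;
using System.Linq;

// Worked sketch of Equation (1): a no-intercept least-squares fit,
// beta = sum(x*y) / sum(x*x), with x = cells * operations and
// y = observed seconds. The data points are illustrative only.
class Equation1Fit
{
    static void Main()
    {
        double[] x = { 1e6 * 1, 9e6 * 6, 25e6 * 12, 49e6 * 12 };
        double[] y = { 0.4, 21.0, 115.0, 230.0 };   // hypothetical seconds

        double beta = x.Zip(y, (xi, yi) => xi * yi).Sum()
                    / x.Sum(xi => xi * xi);

        // Predicted processing time for an unseen scenario.
        double tHat = beta * (81e6 * 8);
        Console.WriteLine($"beta = {beta:E3} s per cell-operation; " +
                          $"predicted = {tHat:F1} s");
    }
}
```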

Figure 5. Conventional modeling (CM, blue) and function modeling (FM, yellow) regression results for processing time.


Figure 6. Conventional modeling (CM, blue) and function modeling (FM, yellow) regression results for storage space.

CM and FM time equation coefficients indicate that, on average, CM processing time was 278% longer than FM processing time, a nearly three-fold increase in processing time. Model fit for FM and CM space equations varied greatly between techniques and could not be compared reliably in the same manner. For the CM methodology, storage space appeared to vary depending on the types of operations within the model and often decreased well after the creation of the final data set. This outcome is assumed to be caused by delayed removal of intermediate data sets. Overall though, CM total storage space was 640% larger than FM storage space for the six simulations.

When comparing processing time and storage space between FM and CM techniques for varying neighborhood window sizes, there was a substantial difference between the two methods (Figure 7; paired t-test, t(29) = 3.82; p-value < 0.001). As expected, processing time increased as the number of cells increased within the neighborhood window, while storage space remained the same across CM and FM iterations. However, FM processing time was significantly faster than CM processing time and CM storage space was twice that of FM. Upon further inspection, these differences stem from the creation of intermediate data sets for the CM procedure even when only one process is specified, and also from differences in the underlying algorithms for calculating mean values within a focal neighborhood window.
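The last point deserves a note. One way a focal mean implementation can make window size nearly free, offered here as a plausible explanation rather than a documented detail of either library, is a summed-area table, which reduces any window mean to four lookups:

```csharp
using System;

// Summed-area (integral image) focal mean: after one O(rows*cols) pass,
// the mean of any w x w window costs four lookups, independent of w.
// Offered as one plausible algorithmic difference, not the documented
// internals of either the Utility or Spatial Analyst.
class FocalMeanSketch
{
    static double[,] SummedArea(double[,] img)
    {
        int rows = img.GetLength(0), cols = img.GetLength(1);
        var s = new double[rows + 1, cols + 1];
        for (int r = 1; r <= rows; r++)
            for (int c = 1; c <= cols; c++)
                s[r, c] = img[r - 1, c - 1] + s[r - 1, c]
                        + s[r, c - 1] - s[r - 1, c - 1];
        return s;
    }

    // Mean of the w x w window whose top-left cell is (r, c).
    static double WindowMean(double[,] s, int r, int c, int w) =>
        (s[r + w, c + w] - s[r, c + w] - s[r + w, c] + s[r, c]) / (w * w);

    static void Main()
    {
        var img = new double[1000, 1000];
        for (int r = 0; r < 1000; r++)
            for (int c = 0; c < 1000; c++)
                img[r, c] = r + c;

        double[,] s = SummedArea(img);
        // A 50 x 50 window costs the same four lookups as a 10 x 10 window.
        Console.WriteLine($"10x10 mean: {WindowMean(s, 0, 0, 10):F1}");
        Console.WriteLine($"50x50 mean: {WindowMean(s, 0, 0, 50):F1}");
    }
}
```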

Figure 7. Conventional modeling (CM) and function modeling (FM) processing time for a focal mean procedure performed on images of varying sizes (cells) using five different neighborhood window sizes.

3.2. UNF and DNRC Case Studies

Although simulated results quantify improvements related to using FM, the UNF and DNRC case studies illustrate these efficiencies in a more concrete and applied manner. In the UNF study, we created no fewer than 365 explanatory raster surfaces, 52 predictive models, and 64 predictive surfaces at the extent and spatial resolution of the NAIP imagery (1 million ha study area at 1 m² resolution). To compare the two approaches, it was assumed that all potential explanatory surfaces were created for CM to facilitate sampling and statistical model building. In contrast, FM only sampled explanatory surfaces and calculated values for the variables selected in the final models.

As described previously, CM was not incorporated directly into the analysis because ESRI's libraries do not have procedures to perform many of the analyses and raster transformations required. CM performance was estimated based on the results of the simulations. For the land cover classification stage of the UNF study, we estimated that to reproduce the explanatory variables and probabilistic outputs using CM techniques, it would take approximately 283 processing h on a Hewlett-Packard EliteBook 8740 using an I7 Intel quad core processor and internal 5500 RPM HDD and would require at least 19 terabytes of storage space. Using FM, these same procedures took approximately 37 processing h and a total of 140 gigabytes of storage space. Similarly, for the second stage of the classification and estimation approach, we estimated a processing time of approximately 2247 h with an associated storage requirement of 152 terabytes using CM.
Using FM, these same procedures took approximately 488 h and required roughly 913 gigabytes of storage.

For the DNRC case study, we estimated that creating the explanatory and final predictive surfaces using CM would require building 30 raster surfaces, taking approximately 97 h to complete and 239.3 gigabytes of storage space. Using FM, this same procedure took approximately 21 h to complete and only 24.7 gigabytes of storage space (Figure 8).

Figure 8. Example of a basal area per acre (BAA) surface created using variable radius plots, National Agricultural Imagery Program (NAIP) image texture, and a random forest algorithm within FM. As areas transition from low to high BAA, colors transition from purple to green in the BAA graphic. Note, only forested areas are represented in the modeled outputs. Non-forested areas such as water in the southeastern portion of the NAIP imagery were not used to train the random forest model and do not accurately reflect BAA in the modeled outputs.

4. Discussion

The underlying concept behind FM is lazy evaluation, which has been used in programming for decades and is conceptually similar to function composition in mathematics. Using this technique in GIS removes the need to store intermediate data sets to disk in the spatial modeling process and allows models to process data only for the spatial areas needed at the time they are read. In CM, as spatial models become more complex (i.e., more modeling steps), the number of intermediate outputs increases. These intermediate outputs significantly increase the amount of processing time and storage space required to run a specified model. In contrast, FM does not store intermediate data sets, thereby significantly reducing model processing time and storage space.

In this study, we quantified this effect by comparing FM to CM methodologies using simulations that created final raster outputs from multiple, incrementally complex models. However, this does not capture the full benefit of operating in the FM environment. Because FM does not need to store outputs to disk in a conventional format (e.g., Grid or Imagine format), and all functions occurring to source raster data sets can be stored and used in a dynamic fashion to display, visualize, symbolize, retrieve, and manipulate function data sets at a small fraction of the time and space it takes to create a final raster output, this framework allows extremely efficient workflows and data exploration. For example, if we were to perform the same six simulations presented here but replace the requirement of storing the final output as a conventional raster format with creating a function model, it would take less than 1 second and would require less than 50 kilobytes of storage to finish the same simulations.
results can The then results be stored can then as abe conventional stored as a conv rasterentional at a later raster time at a if later needed. time Toif needed. further To illustrate further thisillustrate point, this we conductedpoint, we conducted a simulation a tosimulation quantify theto quantify time and the storage time requiredand storage to create required various to create size various size data sets from a function in FM, and then read it back into a raster object using various neighborhood window sizes. Results show that storage space and processing time to read FM objects Big Data Cogn. Comput. 2017, 1, 3 12 of 14

are extremely small (kilobytes and milliseconds) and that the underlying algorithms behind those procedures have a much larger impact than the overhead associated with lazy evaluation. Based on these results, for the UNF and DNRC case studies, FM facilitated an analysis that simply would have been too lengthy (and costly) to perform using CM.

Though it is gaining traction, the majority of GIS professionals use CM and have not been exposed to FM concepts. Furthermore, FM is not included in most "off-the-shelf" GISs, making it difficult for the typical analyst to make the transition, even if FM results in better functionality, significant efficiency gains, and improved workflow. Also, even if coding libraries are available, many GIS users do not have the programming skills necessary to execute FM outside of a graphical interface of some kind. To simplify and facilitate the use of FM and complex statistical and machine learning algorithms within GIS, we have created a user-friendly add-in toolbar called the RMRS Raster Utility (Figure 9) that provides quick access to a wide array of spatial analyses, while allowing users to easily store and retrieve FM models. The toolbar was specifically developed for ESRI's GIS, one of the most widely used commercial platforms, but the underlying libraries have applicability in almost all GISs.
Using[15] and the Accord.net toolbar in the [16], ESRI provide environment, access to FMnumeric, models statistical, can be loaded and machine into ArcMap learning as a algorithms function raster used datato facilitate set and the can statistical be used interchangeably and machine learning with all procedures ESRI raster within spatial our operations. classes.

Figure 9. The RMRS Raster Utility toolbar is a free, public source, object oriented library packaged as an ESRI add-in that simplifies data acquisition, raster sampling, and statistical and spatial modeling, while reducing the processing time and storage space associated with raster analysis.

Currently, our solution contains 1016 files, over 105,000 lines of code, 609 classes, and 107 forms that facilitate functionality ranging from data acquisition to raster manipulation and statistical and machine learning types of analyses, in addition to FM. Structurally, the utility contains two primary projects that consist of an ESRI Desktop class library ("esriUtil") and an ArcMap add-in ("servicesToolBar"). The project esriUtil contains most of the functionality of our solution, while the servicesToolBar project wraps that functionality into a toolbar accessible within ArcMap. In addition, two open source numeric class libraries are used within our esriUtil project. These libraries, ALGLIB [15] and Accord.net [16], provide access to the numeric, statistical, and machine learning algorithms used to facilitate the statistical and machine learning procedures within our classes.

While FM is efficient compared to CM and uses almost no storage space, there may be circumstances that warrant saving FMs to disk in a conventional raster format. Obviously, if intermediate data sets are useful products in themselves or serve as necessary inputs for other, independent FMs, FMs can be saved as conventional raster data sets at any point along the FM chain. For example, if one FM (FM1) is read multiple times to create a new FM (FM2) and the associated time it takes to read multiple instances of FM1 is greater than the combined read and write time of saving FM1 to a conventional raster format (i.e., many transformations within the first FM), then it would be advantageous to save FM1 to a conventional raster format and use the newly created conventional raster data set in FM2. Also, some processes that are hierarchical in nature, where the same operations
occur for multiple embedded processes, may benefit from a similar strategic approach to storing specific steps in an FM as a conventional raster data set. Nonetheless, results from our simulations and case studies demonstrate that FM substantially reduces processing time and storage space. Given accessible tools and appropriate training, GIS professionals may find that these benefits outweigh the costs of learning new techniques to incorporate FM into their analysis and modeling of large spatial data sets.
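The FM1/FM2 trade-off above is essentially a caching decision, which the sketch below illustrates. Here Lazy<T> stands in for materializing FM1 once instead of recomputing it on every downstream read; the transformation itself is an arbitrary stand-in.

```csharp
using System;
using System.Linq;

// Sketch of the materialization trade-off described above: if FM2 reads
// FM1 several times, computing FM1 once and caching it (here, Lazy<T>
// standing in for a saved conventional raster) avoids repeating the
// expensive composite transformation on every read.
class MaterializeSketch
{
    static void Main()
    {
        double[] source = Enumerable.Range(0, 1_000_000)
                                    .Select(i => (double)i).ToArray();

        // FM1: an expensive composite transformation, evaluated on read.
        Func<double, double> fm1 = v => Math.Sqrt(Math.Exp(Math.Sin(v)));

        // Materialized once, on first use, then shared by later reads.
        var fm1Cached = new Lazy<double[]>(() => source.Select(fm1).ToArray());

        // FM2 reads FM1 three times; with caching, fm1 runs once per cell
        // instead of three times per cell.
        double fm2 = 0;
        for (int read = 0; read < 3; read++)
            fm2 += fm1Cached.Value.Sum();

        Console.WriteLine($"FM2 result: {fm2:F2}");
    }
}
```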

Acknowledgments: This work was funded by the United States Department of Agriculture (USDA), US Forest Service, Rocky Mountain Research Station, and by the USDA National Institute of Food and Agriculture, Biomass Research and Development Initiative award no. 2011-10006-30357 and Agriculture and Food Research Initiative Competitive award no. 2013-68005-21298.

Author Contributions: John Hogland proposed, built, and applied the concepts of Function Modeling, developed the RMRS Raster Utility coding libraries and repository, designed and implemented the simulations and case studies, and contributed to the writing of the manuscript; Nathaniel Anderson contributed to the writing of the manuscript and the development of figures and tables, described the theoretical background of eager and lazy evaluation coding strategies, and contributed to the design of the simulations and case studies.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Auchincloss, A.H.; Gebreab, S.Y.; Mair, C.; Diez Roux, A.V. A Review of Spatial Methods in Epidemiology, 2000–2010. Annu. Rev. Public Health 2012, 33, 107–122. [CrossRef] [PubMed]
2. Patenaude, G.; Milne, R.; Dawson, T. Synthesis of remote sensing approaches for forest carbon estimation: Reporting to the Kyoto Protocol. Environ. Sci. Policy 2005, 8, 161–178. [CrossRef]
3. Pradhan, B.; Lee, S.; Buchroithner, M.F. A GIS-based back-propagation neural network model and its cross-application and validation for landslide susceptibility analyses. Comput. Environ. Urban Syst. 2010, 34, 216–235. [CrossRef]
4. Martin, P.H.; LeBoeuf, E.J.; Dobbins, J.P.; Daniel, E.B.; Abkowitz, M.D. Interfacing GIS with water resource models: A state-of-the-art review. J. Am. Water Resour. Assoc. 2005, 41, 1471–1487. [CrossRef]
5. Reynolds-Hogland, M.; Mitchell, M.; Powell, R. Spatio-temporal availability of soft mast in clearcuts in the Southern Appalachians. For. Ecol. Manag. 2006, 237, 103–114.
6. Wells, L.A.; Chung, W.; Anderson, N.M.; Hogland, J.S. Spatial and temporal quantification of forest residue volumes and delivered costs. Can. J. For. Res. 2016, 46, 832–843. [CrossRef]
7. SAS Institute. SAS/STAT(R) 9.3 User's Guide. Available online: https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#titlepage.htm (accessed on 3 March 2017).
8. The R Project for Statistical Computing. Available online: http://www.r-project.org/ (accessed on 3 March 2017).
9. MATLAB. MathWorks-MATLAB and Simulink for Technical Computing. Available online: http://www.mathworks.com/ (accessed on 3 March 2017).
10. Hogland, J.S.; Anderson, N.M. Improved analyses using function datasets and statistical modeling. In Proceedings of the 2014 ESRI Users Conference, San Diego, CA, USA, 14–18 July 2014; Environmental Systems Research Institute: Redlands, CA, USA, 2014; pp. 1–14.
11. Hogland, J.S.; Anderson, N.M. Estimating FIA Plot Characteristics Using NAIP Imagery, Function Modeling, and the RMRS Raster Utility Coding Library; U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station: Portland, OR, USA, 2015; pp. 340–344.
12. Bunting, P.; Clewley, D.; Lucas, R.M.; Gillingham, S. The Remote Sensing and GIS Software Library (RSGISLib). Comput. Geosci. 2014, 62, 216–226. [CrossRef]
13. GDAL. GDAL—Geospatial Data Abstraction Library: Version 2.2.1; Open Source Geospatial Foundation, 2017. Available online: http://gdal.osgeo.org (accessed on 11 July 2017).
14. Jones, E.; Oliphant, T.; Peterson, P. SciPy: Open Source Scientific Tools for Python, 2001. Available online: http://www.scipy.org/ (accessed on 11 July 2017).
15. Bochkanov, S. ALGLIB. Available online: http://www.alglib.net/ (accessed on 3 March 2017).
16. Souza, C. Accord.Net Framework. Available online: http://accord-framework.net/ (accessed on 3 March 2017).
17. Banger, R.; Bhattacharyya, K. OpenCL Programming by Example; Packt Publishing Ltd.: Birmingham, UK, 2013; p. 304.
18. Sanders, J.; Kandrot, E. CUDA by Example: An Introduction to General-Purpose GPU Programming; Addison-Wesley: Upper Saddle River, NJ, USA, 2011; p. 290.
19. Ortega, L.; Rueda, A. Parallel drainage network computation on CUDA. Comput. Geosci. 2010, 36, 171–178. [CrossRef]
20. Walsh, S.; Saar, M.; Baily, M.; Lilja, D. Accelerating geoscience and engineering system simulations on graphics hardware. Comput. Geosci. 2009, 35, 2353–2364. [CrossRef]

21. White, T. Hadoop: The Definitive Guide, 4th ed.; Storage and Analysis at Internet Scale; O'Reilly Media: Sebastopol, CA, USA, 2012; p. 756.
22. Yang, C.; Michael, G.; Huang, Q.; Nebert, D.; Raskin, R.; Xu, Y.; Bambacus, M.; Fay, D. Spatial cloud computing: How can the geospatial sciences use and help shape cloud computing? Int. J. Digit. Earth 2011, 4, 305–329. [CrossRef]
23. Xia, J.; Yang, C.; Liu, K.; Gui, Z.; Li, Z.; Huang, Q.; Li, R. Adopting cloud computing to optimize spatial web portals for better performance to support Digital Earth and other global geospatial initiatives. Int. J. Digit. Earth 2015, 8, 451–475. [CrossRef]
24. Launchbury, J. Lazy imperative programming. In Proceedings of the ACM Sigplan Workshop on State in Programming Languages, Copenhagen, Denmark, June 1993; Yale University Research Report YALEU/DCS/RR-968; Yale University: New Haven, CT, USA; pp. 46–56.
25. Jian, S.; Liu, K.; Wang, X. A survey on concepts and state of the art of functional programming languages. In Proceedings of the 2014 International Symposium on Systems and Computer Technology, Shanghai, China, 15–17 November 2014; Taylor and Francis Group: London, UK, 2014; pp. 71–77.
26. Tremblay, G. Lenient evaluation is neither strict nor lazy. Comput. Lang. 2000, 26, 43–66. [CrossRef]
27. ESRI. ArcObjects SDK 10 Microsoft .NET Framework—ArcObjects Library Reference (Spatial Analyst). Available online: http://help.arcgis.com/en/sdk/10.0/arcobjects_net/componenthelp/index.html#/PrincipalComponents_Method/00400000010q000000/ (accessed on 3 March 2017).
28. U.S. Forest Service Software Center. RMRS Raster Utility. Available online: https://collab.firelab.org/software/projects/rmrsraster (accessed on 3 March 2017).
29. Hogland, J.; Anderson, N.; Chung, W.; Wells, L. Estimating forest characteristics using NAIP imagery and ArcObjects. In Proceedings of the 2014 ESRI Users Conference, San Diego, CA, USA, 14–18 July 2014; Environmental Systems Research Institute: Redlands, CA, USA, 2014; pp. 1–22.
30. U.S. Department of Agriculture. National Agriculture Imagery Program (NAIP) Information Sheet. Available online: http://www.fsa.usda.gov/Internet/FSA_File/naip_info_sheet_2013.pdf (accessed on 3 March 2017).

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).