Computers, Environment and Urban Systems 65 (2017) 28–40

Contents lists available at ScienceDirect

Computers, Environment and Urban Systems

journal homepage: www.elsevier.com/locate/ceus

Integration of genetic algorithm and multiple kernel support vector regression for modeling urban growth

Hossein Shafizadeh-Moghadam a,⁎, Amin Tayyebi b, Mohammad Ahmadlou c, Mahmoud Reza Delavar c,d, Mahdi Hasanlou c a Department of GIS and Remote Sensing, Tarbiat Modares University, Tehran, b Geospatial Big Data Engineer, Monsanto, MO, United States c School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran d Center of Excellence in Geomatic Engineering in Disaster Management, School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran article info abstract

Article history: There are two main issues of concern for land change scientists to consider. First, selecting appropriate and inde- Received 30 November 2016 pendent land cover change (LCC) drivers is a substantial challenge because these drivers usually correlate with Received in revised form 10 April 2017 each other. For this reason, we used a well-known machine learning tool called genetic algorithm (GA) to select Accepted 10 April 2017 the optimum LCC drivers. In addition, using the best or most appropriate LCC model is critical since some of them Available online xxxx are limited to a specific function, to discover non-linear patterns within land use data. In this study, a support vec- tor regression (SVR) was implemented to model LCC as SVRs use various linear and non-linear kernels to better Keywords: Land cover change identify non-linear patterns within land use data. With such an approach, choosing the appropriate kernels to Support vector regression model LCC is critical because SVR kernels have a direct impact on the accuracy of the model. Therefore, various Genetic algorithm linear and non-linear kernels, including radial basis function (RBF), sigmoid (SIG), polynomial (PL) and linear Kernels (LN) kernels, were used across two phases: 1) in combination with GA, and 2) without GA present. The simulated Total operating characteristic maps resulting from each combination were evaluated using a recently modified version of the receiver operat- ing characteristics (ROC) tool called the total operating characteristic (TOC) tool. The proposed approach was ap- plied to simulate urban growth in County, which is located in the north of Iran. As a result, an SVR-GA-RBF model achieved the highest area under curve (AUC) value at 94% while the lowest AUC was achieved when using the SVR-LN model at 71%. The results show that the synergy between GA and SVR can effectively optimize the variables selection process used when developing an LCC model, and can enhance the predictive accuracy of SVR. © 2017 Elsevier Ltd. All rights reserved.

1. Introduction Yeh, 2002; Yang & Lo, 2003). Accordingly, numerous data mining and machine learning techniques have been developed to simulate LCC The process of land cover change (LCC) is a function of interrelated over the last three decades (see Tayyebi, 2013 for more details). The biophysical, economic, social, cultural and political driving forces or fac- availability of a wide spectrum of LCC modeling techniques has opened tors (Turner, Lambin, & Reenberg, 2007). Urbanization is one of the the opportunity for researchers to select those methods most appropri- most important dimensions of land cover systems, owing to the fact ate in helping to answer their research questions (Turner et al., 2007; that it can have a variety of impacts on different environmental factors Tayyebi et al., 2013 and Tayyebi, Pijanowski, Linderman, & Gratton, such as climate (Tayyebi & Jenerette, 2016), water quality (Zhou, 2014). Huang, Pontius, & Hong, 2016), the intensity of agricultural land cover Over the last three decades, various machine learning, data mining, (Song, Pijanowski, & Tayyebi, 2015; Tayyebi et al., 2016), and biodiver- statistics- and process-based LCC models have been used to understand sity and deforestation levels (Lambin & Geist, 2008). To develop a better LCC processes (Hu & Lo, 2007; Kamusoko & Gamba, 2015; Pijanowski et understanding of the effects of urbanization on the environment, and to al., 2014; Rienow & Goetzke, 2015; Shafizadeh-Moghadam, Hagenauer, quantify the driving forces behind complex urban systems, the use of in- Farajzadeh, & Helbich, 2015). When using LCC models, two key issues novative data mining and machine learning approaches is vital (Li & within land change science have tended to arise (Tayyebi, 2013). First, because LCC drivers operate on a variety of spatial and temporal scales, LCC modelers always have to decide on the best way to select the appro- ⁎ Corresponding author. priate and independent LCC drivers. This is due the fact that some of LCC E-mail addresses: h.shafi[email protected] (H. Shafizadeh-Moghadam), [email protected] (M. Ahmadlou), [email protected] (M.R. Delavar), models require the use of independent LCC drivers for modeling pur- [email protected] (M. Hasanlou). poses. Second, LCC modelers also have to decide on the best LCC

http://dx.doi.org/10.1016/j.compenvurbsys.2017.04.011 0198-9715/© 2017 Elsevier Ltd. All rights reserved. H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40 29 model to use to model the LCC in question (Pontius et al., 2008) since applied on continuous data; however, LCC explanatory factors tend to users might not have a deep understanding of the existing LCC models consist of a combination of continuous and categorical data groupings in place and their capabilities. This limitation also arises as some LCC (Dormann et al., 2013). GA uses stochastic search methods inspired by models are limited to one function when identifying non-linear patterns natural evolution principles (Davis, 1991; Engelbrecht, 2007) to select within a land cover dataset. Among the large number of LCC studies to the most effective LCC drivers. For example, Tang, Wang, and Yao be found within the land change science literature, these issues remain (2007) coupled GA with Markov chain models to carry out a feasibility largely unresolved. study of the potential for remote sensing to predict future landscape Among the LCC models available, machine learning techniques (e.g., change. They found GA to be useful in helping to demonstrate spatial in- ANN and SVM) have gained increasing attention among scholars formation in a spatio-temporal model. On the basis of SVR and GA, (Kamusoko & Gamba, 2015; Pijanowski, Brown, Shellito, & Manik, Nieto, Fernández, de Cos Juez, Lasheras, and Muñiz (2013) suggested a 2002; Rienow & Goetzke, 2015; Shafizadeh-Moghadam, Asghari, hybrid approach known as a GA–SVR to predict the presence of Taleai, Helbich, & Tayyebi, 2017; Tayyebi & Pijanowski, 2014). One of cyanotoxins in the Trasona reservoir in northern Spain, but did not in- the main reasons why the ANN and SVM models are so popular is that vestigate the role of various kernels in the performance of the model, both approaches provide a variety of functions (e.g., ANN) and kernels nor did they compare their suggested model against SVR without the (e.g., SVM) able to model the complexity and non-linearity of urban dy- GA component added. The second objective of this study is to combine namics (Shafizadeh-Moghadam & Helbich, 2015; Tayyebi, Pijanowski, & the GA approach with various SVM kernels, the aim being to reduce the Pekin, 2015 and Tayyebi, Pijanowski, & Tayyebi, 2011). These properties high dimensionality levels of the input variables and provide the opti- give users a choice of functions and kernels so as to model LCC. In the mum selection of predictive variables. portfolio of available LCC models, being able to understand their perfor- Carrying out an accuracy assessment of the suitability and accuracy mance, characteristics, strengths and limitations is a scientific require- of the simulated maps produced by a model is of great importance ment during the model selection phase (Pontius et al., 2008; Tayyebi (Pontius & Schneider, 2001), and a variety of calibration metrics have et al., 2014). SVM is a well-known machine learning technique which been used in land change science to this end (Tayyebi et al., 2014). transforms input data to a higher dimension in order to solve non-linear Among them, receiver operating characteristic (ROC) – one of the classification or regression problems (Cortes & Vapnik, 1995), and is in- most common accuracy matrices – has received a lot of attention in dependent of any prior knowledge (Rienow & Goetzke, 2015). SVM land change science circles (e.g., Hu & Lo, 2007; Rienow & Goetzke, learning problems can be expressed as convex quadratic programming 2015). ROC evaluates the predictive ability of a suitability map for bina- problems, the aim of which is to seek the global, optimal solution (Lin & ry classification, using a reference map for each given threshold, varying Yan, 2016). SVM can, therefore, avoid the issue of local extremes, a from 0 to 1. However, ROC has recently been criticized by scientists (e.g., problem that can occur when using other machine learning techniques Golicher, Ford, Cayuela, & Newton, 2012; Pontius & Si, 2014); for exam- such as ANN (Vapnik & Vapnik, 1998). SVM has been shown to be an ef- ple, for failing in cases where some types of error are more important fective tool to create LCC maps as part of an LCC modeling approach than others (Pontius & Parmentier, 2014). ROC also fails to reveal the (Rienow & Goetzke, 2015; Yang, Li, & Shi, 2008). The core functionality size of each entry in the contingency table for each threshold. As an al- of the SVM approach is the use of kernels which can take both linear and ternative, Pontius and Si (2014) recently introduced the total operating non-linear forms. Kernels play an important role in creating the predic- characteristic (TOC) approach to rectify the limitations of ROC. The area tion accuracy of SVM models, and the most common kernels that have under the curve for TOC is the same as for ROC. In this study we com- been used with the SVM approach include the radial basis function pared the performance of both ROC and TOC in the evaluation of SVR (RBF), sigmoid (SIG), polynomial (PL) and linear (LN) kernels. However, kernels. To confirm the efficiency of the proposed framework, SVR we know of no study carried out that has recommended the best and with varying kernels was implemented with and without the GA, and most appropriate kernels to use when modeling LCC. As a result, the the results were compared. first objective of this research is to evaluate the influence of linear and non-linear kernels on the accuracy of the simulated urban growth 2. Study area and dataset maps produced using the SVM approach. When using the LCC modeling process, several environmental and 2.1. Study area socio-economic variables can influence urban growth (Hu & Lo, 2007). As well as the large volume of satellite images available, which are In this study, we used our models on the city of Rasht, the capital of often computationally expensive to process, the large number of explan- in Iran and the largest and most populous city on the atory variables involved can give rise to the issue of computational time Caspian Sea coast. The city is located at 37° 53′ N and 49° 58′ E, and being required (Pijanowski et al., 2014). Moreover, collinearity, which has an area of approximately 180 km2 (Fig. 1). In its 2011 census, the refers to the dependency among predicting factors (Dormann et al., county's population was 920,000. The city has six districts, these being 2013), is another issue which can arise from the use of a large number , Khoshke Bijar, Kuchesfahan, Lashte Nesha, Sangar and Cen- of predictors. The problem of collinearity is particularly serious when a tral. The conjunction of the Caspian Sea coast with the plains and moun- model has to be adjusted and prepared in the light of data coming tainous regions set behind has made urban Rasht one of the major from one district or time point that used to make a prediction in another tourist centers in Iran, attracting thousands of tourists annually. area, or data with a different or obscure collinearity structure (Pontius, In recent decades the city has experienced increasing population Huffaker, & Denman, 2004). The selection of the most effective predictive growth and urban expansion. Similar to other large Iranian cities and variables is essential when modeling LCC. For example, Pal and Foody provincial capitals, industrialization is a key feature in this region. In (2010) revealed that the performance of SVM varies as a function of Gilan Province as a whole, the people, services, industry and investment the number of input features. In another study, Shafizadeh-Moghadam are concentrated in Rasht, and the city's economic growth has influ- et al. (2015) showed that the design and even the types of functions enced most of the province's peripheral cities. Due to urban expansion used to model LCC can affect the accuracy of the model produced. in recent decades, many peripheral villages have been absorbed into To achieve a better performance, therefore, it is important to pay Rasht's urban zones. careful attention to the optimum selection of the model's features. For this purpose, several statistical techniques such as principle component 2.2. Dataset analysis (PCA; Dormann et al., 2013) and factor analysis, as well as evo- lutionary techniques such as the genetics algorithm (GA) have been The information required for this study was derived from Landsat used in previous studies (Shan, Alkheder, & Wang, 2008). PCA can be satellite images, plus we extracted road networks from topographic 30 H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40

Fig. 1. The study area of Rasht County (right) which shows the true color composition of Landsat 2015, located in northern Iran (left).

maps. To map the spatiotemporal urban footprint and land cover clas- of total cells) while the cells that were changed from built-up to non- ses, Landsat images for 1985 (TM, May 1985), 2000 (ETM+, May built up areas (urban loss) were 453 and 311, respectively. The urban 2000) and 2015 (Landsat 8, April 2015) were processed and projected losses were not considered for modeling process and removed from fur- to UTM Zone 39 North, with a 30 m spatial resolution. A set of 11 driving ther analysis. Moreover, the built-up areas and water bodies in 1985 forces (Fig. 2) were employed as explanatory variables. The selection of were excluded from the modeling process. Fig. 3 illustrates the urban these driving forces was based on previous experimental studies (Hu & expansion that took place between 1985 and 2000, and between 2000 Lo, 2007; Liu et al., 2008; Shafizadeh-Moghadam et al., 2015) and the and 2015. We used the 1985–2000 time interval for the training run, local characteristics of the region. and 2000–2015 time interval for the testing run. The classification procedure was conducted using a supervised algo- rithm called maximum likelihood. The classification process resulted in 3. Methods five land cover classes including built-up areas, crop lands, open lands, water bodies and forests. To prepare the data, the land cover classes After the data had been prepared, eleven explanatory factors were were aggregated into a binary class including built-up and non-built associated with the SVR-RBF, SVR-PL, SVR-SIG, and SVR-LN models up areas. Aiming at improving the accuracy of the binary land cover used for the training run. Then, the GA was integrated with each maps, a process called post-classification phase was then followed. model to perform the optimum feature selection for the SVR kernels The post-classification process was performed using expert knowledge (Fig. 4). The subsequent procedure involved assessing the predictive ac- from study area as well as using majority filters to merge the isolated curacy of the models' outputs using the TOC and ROC statistical cells and finally using Google Earth and topographic maps for the measures. accuracy assessment. The built-up areas covered 4.36%, 7.33% and 12.50% in 1985, 2000 and 2015, respectively. In contrast, the non-built 3.1. Support vector regression up areas covered 95.64%, 92.67% and 87.50% in 1985, 2000 and 2015, respectively. SVMs are popular machine learning methods and are used to esti- To measure the accuracy of land cover maps, 350 sample points mate distribution, classification and regression tasks which were first were randomly distributed in entire landscape across built-up and introduced by Cortes and Vapnik (1995). SVMs use a non-linear tech- non-built up areas. Land cover maps in 1985, 2000 and 2015 had the nique which transforms the original covariate into a higher or infinite overall accuracy of 87%, 86% and 88% respectively. To extract the LCC dimensional feature space, the aim of which is to find an optimal sepa- maps from the two sequent times, a spatial overlay operation was per- rating hyperplane or a set of hyperplanes(Moguerza & Muñoz, 2006). formed. LCC map during 1985–2000 and 2000–2015 showed cells For utilizing classification purposes using SVM technique, most of re- changed from non-built up to built-up areas (called urban growth) searchers used the Support Vector Classification (SVC). On the other which were 39,226 cells (2.97% of total cells) and 68,313 cells (5.17% hand, for regression application using SVM technique, it is common to H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40 31

Fig. 2. Spatial explanatory variables of urban growth extracted from the Landsat image of 1985 for model calibration.

use the Support Vector Regression (SVR). SVR incorporates the same suitable SVR parameters (ε,C) and utilized kernel parameter (s). Never- principles as SVM for classification, with only a few minor differences: theless, the main idea is always the same for classification and regres- 1) with output result being a real number, it becomes difficult to predict sion problem (to minimize error and individualize the hyperplane the information that has infinite possibilities. In regression problem, a which maximizes the margin). To model LCC, we used the ε-SVR margin of tolerance (ε) is set to approximate the SVM which is already model, which results in the creation of a suitability map. requested from the problem, and 2) the algorithm is more complicated. There are four types of kernel commonly used (Pradhan, 2013). This complication is corresponded to estimation and selection of the They include linear (LN), polynomial (PL), radial basis function 32 H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40

Fig. 3. Urbanization between 1985–2000 and 2000–2015, as well as the exclusionary zone covering current built-up areas and water bodies in 1985.

(RBF), and sigmoid (SIG). Each of these types of kernel requires spe- Voženílek, 2011). If the value of C is small, then a higher degree of cific parameters to be set and coefficients to be determined, so that training errors can be expected (Tehrany, Pradhan, & Jebur, 2014). the parameterization and proper selection of these kernels is able For RBF, SIG and LN kernels, the degree of non-linearity (Bui, to control the level of accuracy and success of the SVR (Cortes & Pradhan, Lofman, Revhaug, & Dick, 2012) is controlled by the kernel Vapnik, 1995). In this study, we used the LIBSVM package in width or gamma (γ) parameter while for the PL and SIG kernels, d is MATLAB, which is a library for support vector machine implementa- the polynomial degree and r is the bias term (Hasanlou, tion (Chang & Lin, 2011). During the parameterization process there Samadzadegan, & Homayouni, 2015; Pradhan, 2013; Tehrany et al., are four parameters which should be carefully adjusted. The C 2014). Table 1 shows the structure of ε-SVR and the SIG, LN, PL and parameter, known as a regularization parameter or penalty factor, RBF kernels as implemented in LIBSVM (Chang & Lin, 2011). n is set to avoid models over-fitting, and adjusts the trade-off between Consider a set of training points, {(x1,z1), …,(xl,zl)}, where xi ϵ R is 1 training errors and margins (Marjanović,Kovačević, Bajat, & a feature vector and zi ϵ R is the target output. Under the given

Fig. 4. Urban growth modeling flowchart. H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40 33

Table 1 surfaces, offering a list of optimum variables, and being able to carry Different type of kernels and parameters in the SVR method. out simultaneous searches of a wide sample of cost surfaces (Haupt & Kernel type Formula Estimation parameters Haupt, 2004). While the output of LCC models have binary status, a binary feature Linear k(x,y)=xTy C, ε T d selection was used as the encoding strategy, as a result of which eleven Polynomial k(x,y)=(γx y+β0) d, γ, β0,C,ε 2 Radial basis function k(x,y)=e(−γ ‖x−y‖ ) γ,C,ε bits' of binary chromosome (eleven input feature sets) were found. If Sigmoid K(x,y)=tanh((γxTy+r). γ, r,C,ε the selection is filled by zeroes, it means that the associated feature must be ignored. The problem of feature selection was modelled as fol- lows: Each individual has a size of n (=11) features (genes), as shown parameters of C N 0andε N 0, the standard form of SVR (V. Vapnik, in Fig. 2, and each gene represents bit strings of 0 s and 1 s which are 2013)is: chosen for coding. Since n features form one combination, chromosome is arranged as comprised of n individual feature sequence numbers l l min 1 ωT ω þ ∑ ξ þ ∑ ξ ð Þ which are arranged in a serial mode (Fig. 5). C i C i 1 ω; b; ξ; ξ 2 i¼1 i¼1 In addition, the RMSE criterion (Eq. (5)) was also used as a fitness function within the GA algorithm, which means individuals with the ωT φðÞþ− ≤ε þ ξ ; minimum RMSE were selected as final optimal features. The RMSE of a subject to xi b zi i model prediction with respect to the estimated variable Xmodel is de- −ωT φðÞ− ≤ε þ ξ ; fined as the square root of the mean squared error. After solving SVR zi xi b i model, we can apply decision functions to predict labels of the testing ξ ; ξ ≥ ; ¼ ; …; : data. Let x , … ,x be the testing data and f(x )=X , … ,f(x )= i i 0 i 1 l 1 n 1 model,1 n Xmodel,n be the decision values predicted by the tuned model. If the φ ξ ; ξ true labels of testing data are known and denoted as X , … ,X , where (xi)mapsxi into a higher-dimensional space, i i are slack var- obs,1 obs,n iables to cope with otherwise infeasible constraints of the optimization we evaluate the prediction results by the following measure. N fl problem, and C 0 determines the trade-off between the atness and sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ε ÀÁ the amount up to which deviations larger than are tolerated. This cor- ∑n − 2 i¼1 Xobs;i Xmo del;i responds to dealing with a so called ε–insensitive loss function |ξ|ε de- RMSE ¼ ð5Þ scribed by n 0 if jjξ bε where Xobs is observed values and Xmodel is modelled values for each run jjξ ε ¼ ð2Þ jjξ −ε otherwise of individuals i. Table 2 presents all the parameter sets for the GA, for which samples Due to the possible high dimensionality of the vector variable ω,the were chosen randomly. In order to ensure reliability of the results, 20 following dual problem is usually solved: GA runs for each dataset were performed. Those features that appeared more frequently were then selected. Tuning the parameters and opti- l l min 1 ðÞα−α T ðÞþα−α ε ∑ðÞþα þ α ∑ ðÞα −α ðÞ mum feature selection for each SVR kernel type facilitated the evalua- Q i i yi i i 3 α; α 2 i¼1 i¼1 tion of the SVR method using different kernel models. To assess the effectiveness of the GA, SVR was implemented with and without the GA. subject to eT ðÞ¼α−α 0; 3.3. Model validation 0≤αi; αi ≤C; i ¼ 1; …; l: Accuracy assessment of the simulated urban growth maps, which T where e=[1,…,1] is the vector of all ones, αi; αi are Lagrange multi- shows how well the reference and simulated maps align, is an essential pliers, Q is an l×l positive semi definite matrix, Qij =K(xi,xj)≡φ(xi)- step (Azari, Tayyebi, Helbich, & Reveshty, 2016; Pijanowski, Tayyebi, T φ(xj), and K(xi,xj) is the kernel function. Delavar, & Yazdanpanah, 2010). It is performed through comparison After solving problem (3), the approximate function is: of three maps: the reference map at the start of simulation time, the

l ∑ðÞ−αi þ αi KxðÞþi; x b: ð4Þ i¼1

The model output (α∗−α), supports vectors and other information such as kernel parameters in the model for prediction.

3.2. Genetic algorithms

The input dimension is one of the key parameters controlling the ac- curacy of the SVM outcomes. Identifying the most explanatory and un- correlated factors (variable selection) can play a very important role in dimensionality reduction (Mas, Puig, Palacio, & Sosa-López, 2004). We integrated SVR models with GA to optimize the feature selection pro- cess. The GA was first proposed by Holland (1975) which is an optimi- zation algorithm that finds its roots in genetics and natural selection. GA allows the evolution of population to achieve a maximum fitunder specified rules (Haupt & Haupt, 2004). GA offers several advantages such as optimizing both continuous and discrete variables independent of the derived information, handling a large number of variables, being suitable for parallel computers, working well with highly complex cost Fig. 5. Binary coding in bit string mode for selecting optimum feature. 34 H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40

Table 2 and GA-SVR models, urban growth in the Rasht County for 2015 was GA parameters. predicted in validation run. Parameters Value Having executed the model, transition suitability maps were obtain- ed for each model. The quantity of change was calculated by comparing Population size 20 Crossover rate 0.7 the reference land cover maps from 2000 to 2015. Between the years Elitism Ration 1 2000 and 2015, a total of 68,313 cells converted to the built-up. Cells Mutation Ratio 0.05 were sorted in the suitability map and converted in order of decreasing Crossover Method Two point suitability until the quantity of change between the time points was Max Iterations 20 Elite count 1 met and urban growth maps in 2015 for various models were produced (Pijanowski et al., 2002; Pijanowski et al., 2014; Shafizadeh-Moghadam et al., 2015). The value of cells in suitability map vary from 0 to 1 where reference map at the end of simulation time, and the simulation map at cells with values closer to 0 and 1 are more likely for “no-change” and the end of simulation (Brown et al., 2013). Therefore, in this study data “change”, respectively. However, the value of cells in simulated map taken from between 1985 and 2000 were used to calibrate the model are either 0 or 1 which represent “ no-change” and “change” between while the data taken from between 2000 and 2015 were used to vali- two times, respectively. date the model. To assess the accuracy of the modeling approaches used in this study, simulated maps of 2015 were overlaid with the ob- 4. Results and discussion served change map of 2000 and 2015 and the following categories were obtained (Brown et al., 2013; Shafizadeh-Moghadam, Asghari, Machine learning algorithms are designed to detect patterns across Tayyebi, & Taleai, 2017): 1) reference change simulated correctly as space and time. Various LCC models have various purposes; thus, an change (i.e., hits), 2) reference change simulated incorrectly as persis- “ideal” model does not exist because ideal depends on the purpose. tence (i.e., misses), 3) reference persistence simulated incorrectly as LCC models are designed to detect patterns during the calibration time change (i.e., false alarms), and 4) reference persistence simulated cor- interval, and then extrapolate those patterns into the validation time in- rectly as persistence (i.e., correct rejections). Furthermore, the perfor- terval. It is possible that LCC model detects appropriate patterns during mance of the models was assessed using the TOC metrics. the calibration interval, but does not predict accurately during the vali- TOC converts a suitability map into a simulated map and then com- dation interval since the process of urban growth during the calibration pares the simulated map with a reference map using a contingency interval can be different from the process of urban growth during the table. The suitability map contains cells with values which range from validation interval. Furthermore, the reference maps are usually con- 0 (less likely to change) to 1 (more likely to change), while the simulat- taminated with errors, so this might directly affect calibration and vali- ed map contains cells with values of either 0 (cells which do not change dation runs. Moreover, model selection, model limitations, data scarcity, over the selected time intervals) or 1 (cells which change over the se- sources of uncertainty, and the complexity of driving forces and their lected time intervals). TOC results can be summarized as hits (cells cor- functionality are among key challenges when modeling LCC. Hence, rectly predicted as having changed), misses (cells incorrectly predicted selecting independent drivers before modeling and selecting an appro- as not having changed), false alarms (cells incorrectly predicted as hav- priate LCC model has always been a main issue within land change sci- ing changed) and correct rejections (cells correctly predicted as having ence circles. remained unchanged). Although the ROC tool is frequently applied in LCC science (Rienow &Goetzke,2015;Shafizadeh-Moghadam et al., 2015), a number of stud- 4.1. SVR without GA ies have argued against using it (Pontius & Si, 2014). One of the most im- portant limitations is that the ROC fails to reveal the size of each entry in Parameters associated with each kernel in the SVR model were the contingency table for each threshold (Pontius & Si, 2014). Thus, tuned. This tuning process was conducted using the gird search (GS) ap- Pontius and Si (2014) recently suggested using the TOC method as an al- proach, which lead to minimizing the RMSE and maximizing the R2 ternative to overcome ROC's limitations. While the TOC method creates values. The GS approach is a simple yet exhaustive search tool used to a curve very similar to the ROC method, it includes enough information effectively find hyper-parameters in a pre-defined hyper-parameter so that the user can recreate a two-by-two presence and absence ma- space area (Surendar, Mohankumar, Anand, & Prasanna, 2015). In trix. Also, for each threshold, the four members of the contingency tuning process, we first tuned the processing component (SVR parame- table can be observed at once on the TOC curve (Cook, 2007; Golicher ters) and in second step, the input component was tuned. Thus, by fix- et al., 2012). TOC's horizontal axis represents the size of a study area ing the SVR parameters (processing component), we tuned the input and the vertical axis shows the total number of cells that have changed component (suitable features) by using GA feature selection procedure. in a reference map. The TOC tool also includes two dashed lines in which Table 3 shows the tuned parameters associated with each kernel with the maximum and minimum lines outline a mathematically possible incorporating full feature sets. Using this approach, the gamma (γ), C area where the ROC curve can appear. Interpreting TOC is also more in- and epsilon (ε) parameters for the RBF kernel, the gamma (γ), coeffi- tuitive because the axis has no longer the ratio of contingency table's cient (r), C and epsilon (ε) parameters for the SIG kernel, the gamma member like ROC curve. (γ), degree (d), coefficient (r), C and epsilon (ε)parametersforthePL kernel, and the epsilon (ε) and C parameters for the LN kernel were all 3.4. Model implementation tuned.

To set up the LCC models, cells between 1985 and 2000 – those that had been transformed to built-up from all other classes – were coded as Table 3 1, while those that remained unchanged were coded as 0. The resulting Tuned parameters obtained by the grid search method. binary map was set as the target variable, with a set of 11 driving forces Model Gamma Coef0 Degree Epsilon C RMSE R2 set at the initial time (1985), as shown in Fig. 2, representing the explan- (γ) (r) (d) (ε) (cost) atory variables. The entire model calibration including training and test- SVR-LN –––0.0730 0.6325 0.288 0.814 ing (Pontius et al., 2004; Tayyebi et al., 2013) was programmed in the SVR-PL 0.0625 0.0235 4 0.0740 0.5635 0.284 0.838 MATLAB software. During the model calibration phase, 60% of the cells SVR-RBF 0.0915 –– 0.0965 0.5240 0.270 0.844 SVR-SIG 0.0430 0.0030 – 0.0830 0.5185 0.280 0.832 were used for training and 40% for testing purposes. Using the SVR H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40 35

Table 4 between 1985 and 2000 were allocated to the highest values. Fig. 6 Performance of the SVR model without GA. shows the error map obtained by overlaying the maps simulated by Kernels RMSE R2 the SVR models on to the reference maps. Since the focus of this paper was to simulate urban growth, the size of hits and misses by each kernel LN Testing 0.300 0.784 Training 0.288 0.814 is of great importance. Visual exploration of the error maps show dis- PL Testing 0.296 0.792 crepancy in the pattern and extent of hits and misses among different Training 0.284 0.838 kernels. In detail, the results show that SVR-RBF outperformed other RBF Testing 0.286 0.838 kernel types so that having the highest hits and lowest misses while Training 0.270 0.844 SIG Testing 0.274 0.826 the SVR-LN was the least accurate one. On the other hand, SVR-SIG ap- Training 0.280 0.832 peared more accurate than SVR-PL.

4.2. SVR with GA The performance of the models for each kernel by considering the training and testing data, are illustrated in Table 4. As explained earlier, The performance of the models for each kernel, including the GA and the result of execution was a suitability map which was sorted in de- taking the training and testing data into account, are illustrated in Table scending order and the amount of cells transformed to built-up areas 5. The results show that the best performance was achieved by the RBF

Fig. 6. Error maps obtained by overlaying the reference maps for 2000 and 2015 on the SVR produced map for 2015. 36 H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40

Table 5 generations, by which time the fitness values had reached a stable min- Performance of the SVR-GA model. imum of 0.25. Kernels RMSE R2 4.2.1. Feature selection using GA LN Testing 0.243 0.836 Training 0.221 0.884 Running the four combined SVR models with GA (these being the PL Testing 0.246 0.823 SVR-LN-GA, SVR-PL-GA, SVR-RBF-GA and SVR-SIG-GA approaches) Training 0.213 0.894 showed that a different set of factors contributed to each model. RBF Testing 0.235 0.861 Among the input variables, two of them were selected by the GA to be Training 0.218 0.885 SIG Testing 0.242 0.850 associated with all the kernel types; easting and northing. Together, Training 0.229 0.873 these factors reflect the importance of geographic coordinates and spa- tial aspects when modeling urban growth. In accordance with the east- ing and northing results, the models indicated that new construction is kernel with RMSE, at 0.23 and R2 = 0.86. Fig. 7 shows the best fitness likely to occur close to existing built-up areas. Excluding the SVR-RBF- values and the means of the fitness values (RMSE) for each generation GA, which considered seven factors (distance to built-up areas, distance of the different SVR kernel types, and for a single GA run. As illustrated, to the sea, distance to a river, distance to a forest, height, and northing trends of approaching to the best chromosome were reasonable for each and easting coordinates; see Table 6), the other three models included kernel type. Reasonable means that the trend of both ‘best fitness’ and six factors (Table 6). Important factors for the SVR-LN-GA model were ‘mean fitness’ minimizes with the progress of generation for incorporat- distance to agriculture areas, distance to the sea, distance to a river, dis- ed kernels. We conducted a training across generations to identify a tance to a forest, and northing and easting coordinates. For the SVR-PL- training generation that would produce model results that deviated GA model, the factors considered were distance to a built-up area, dis- from the observed values. For nearly all the models, mean fitness values tance to a river, distance to a road, height, and northing and easting co- started at around 0.30 and dropped rapidly over 10 generations. Simi- ordinates. For the SVR-SIG-GA model, factors seen as most important larly, the best fitness values started around 0.27 and dropped rapidly were distance to the sea, distance to a built-up area, distance to a over 8 generations (Fig. 7). We halted the training process at 20 road, height, and northing and easting coordinates. As shown, in this

Fig. 7. Fitness values achieved by the proposed GA approach. H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40 37

Table 6 model some of the 11 contributing factors were important in a kernel, Feature selection by GA for the SVR technique using various kernels. while being completely ignored by the other models. For example, dis- Independent variable in SVR-GA-LN SVR-GA-PL SVR-GA-RBF SVR-GA-SIG tance to crop land was important for only the SVR-LN-GA model, and 1985 played no role at all in the other models Table 6 shows the feature selec- Distance to crop lands √ –– – tion by GA for the SVR technique using various kernels during the model Distance to sea √ – √√ calibration. The variables belong to the year 1985. Distance to built-up areas – √√ √ √√√ – Distance to river 4.2.2. Performance of the SVR-GA models Distance to forest √ – √ – Distance to roads – √ – √ The results show that using GA led to an improvement in the rate of Height – √√ √ cells correctly predicted as having changed by 16.8%, 9%, 6%, and 5.5% for Slope ––– – the SVR-LN, SVR-PL, SVR-RBF and SVR-SIG models respectively. GA also ––– – Aspect increased the accuracy of the rate of cells correctly predicted as having Easting parameter √√√ √ Northing parameter √√√ √ not changed by the SVR-LN, SVR-PL, SVR-RBF and SVR-SIG models by 8.5%, 5%, 6.5%, and 3% respectively. These results confirm that the

Fig. 8. Error map obtained by overlaying the reference maps for the years 2000 and 2015 with the map for 2015 predicted by the SVR-GA models. 38 H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40

a) RBF b) SIG

c) PL d) LN

Fig. 9. TOC performance curves for the a) SVR-LN-GA, b) SVR-PL-GA, c) SVR-RBF-GA and d) SVR-SIG-GA models. The blue and red ROC curves show the results of modeling with and without GA integration respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. Sizes of FOM and PCM for SVR-LN, SVR-PL, SVR-RBF and SVR-RBF with and without GA. H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40 39 number of input features changes the accuracy of SVR kernels, as men- This enables RBF kernel to account for the spatial heterogeneity of geo- tioned by Pal and Foody (2010). It was also important to investigate graphic data in which global patterns might be contradicted by local how many of the cells predicted as having changed or not changed patterns (Anselin, 1995; Fotheringham, Brunsdon, & Charlton, 2002). matched the reference maps. In this regard, the level of discrepancy be- Also, the RBF kernel devotes the highest weight to the center of kernel tween the SVR kernels with and without using GA was remarkable. Fig. and the weights decrease with further distance from the center. This at- 8 shows the error map obtained by overlaying the maps simulated by tribute can be seen in the form of the first law of geography or spatial the SVR-GA models on to the reference maps. autocorrelation stating that the closer objects are the more similar and ROC and TOC curves were used to assess the predictive ability of the thus they get more weights in modeling than the more distant objects. transition suitability maps, which were evaluated against the corre- On the other hand, in both SVR-GA-RBF and SVR-RBF models, the LN sponding reference urban growth maps (for between 2000 and 2015). kernel was the least accurate one. This is due to the fact that the LN ker- For the analysis of TOC, only the non-urban pixels in 2000 were consid- nel separates the input space linearly while the driving forces of urban ered, meaning the urban pixels in 2000 are removed from the analysis. growth operate in a very complex way. As seen from Fig. 9, the SVR-RBF-GA and SVR-SIG-GA models produced a sharp rise at the beginning when compared to the SVR-PL-GA and SVR-LN-GA models. The results of the area under the curve (AUC) 5. Conclusion were 94%, 90% 86% and 80% for the SVR-RBF-GA, SVR-SIG-GA, SVR-PL- GA and SVR-LN-GA models respectively (Fig. 9). A comparison of the Assuming that a rich dataset is available, being able to carry out an performance of the different kernels when using and not using GA re- optimized feature selection process will have an important effect on vealed the effectiveness of GA when selecting features. Integrating the the generalization ability of LCC models. This paper highlights the im- GA and SVM tools increased the AUC by 5% for the SVR-SIG model, by portance of selecting the right features and kernels to the accuracy of 9% for the SVR-LN, by 3% for the SVR-RBF model, and by 4% for the maps produced when using the SVR model. In this study, the authors SVR-PL model, when compared to no integration at all. These results calibrated the SVR model with LN, PL, SIG, and RBF kernels, in order to show that the SVR-RBF-GA model was better at simulating urbanization simulate urban growth. To show the importance of feature selection than the other model combinations. This finding is in agreement with on model performance, once all the selected explanatory factors had the work of Rienow and Goetzke (2015), whose SVM model with an been associated with the selected kernels, the GA was used to carry RBF kernel achieved an AUC value of 0.94. out an optimum feature selection process. To evaluate the performance fi Further assessment indicated that from the 68,300 cells between of each model, TOC statistical measures were employed. The ndings 2000 and 2015 that were transformed into the built-up class, 55.5%, showed that the GA was able to explicitly enhance the accuracy and 58%, 66.5% and 62% were correctly classified by the SVR-LN-GA, SVR- generalization ability of the kernels utilized. The results revealed that PL-GA, SVR-RBF-GA and SVR-SIG-GA models respectively. Also, the per- the highest performance belonged to the SVR-RBF-GA model and the centage correctly predicted as “not changing” ranged between 95% and lowest to the SVR-LN-GA model. The authors conclude that using hybrid 96%. LCC models can act as an effective alternative when wishing to over- The highest percentage of correct predictions belonged to the SVR- come the limitations of individual models, and can result in substantial RBF-GA model and the lowest percentage to the SVR-LN-GA model accuracy improvements. (Fig. 10). This finding mirrors the figures achieved by the TOC statistical index. The reason that LN resulted in the lowest accuracy can be attrib- References uted to the fact that the LN kernel appeals to linearly separable features whereas urban growth and its associated driving forces are both com- Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical Analysis, fi 27(2), 93–115. plex and non-linear (Sha zadeh-Moghadam et al., 2015). The percent- Azari, M., Tayyebi, A., Helbich, M., & Reveshty, M. A. (2016). Integrating cellular automata, ages of Misses and False Alarms were exactly same for each model. artificial neural network, and fuzzy set theory to simulate threatened orchards: Ap- The reason was that the quantity of urban growth between two time in- plication to Maragheh, Iran. GIScience & Remote Sensing, 53(2), 183–205. tervals was fixed by the researcher. Brown, D. G., Band, L. E., Green, K. O., Irwin, E. G., Jain, A., Lambin, E. F., ... Verburg, P. H. (2013). Advancing land change modeling: Opportunities and research requirements. Washington DC: The National Academies Press. 4.3. Comparing SVR and SVR-GA Bui, D. T., Pradhan, B., Lofman, O., Revhaug, I., & Dick, O. B. (2012). Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Computers & Geosciences, 45,199–211. We found that modeling urban growth by considering the full fea- Chang, C. -C., & Lin, C. -J. (2011). LIBSVM: A library for support vector machines. ACM ture sets (SVR modeling) compared to modeling optimum feature sets Transactions on Intelligent Systems and Technology (TIST), 2(3), 27. (SVR-GA) is less accurate (Figs. 6 and 8). This result is in line with Pal Cook, N. R. (2007). Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation, 115(7), 928–935. and Foody's (2010) that revealed the number of input features affects Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), the SVM performance and thus selection of the most predictive vari- 273–297. ables is an essential step. Inclusion of GA, thus, showed to be an impor- Davis, L. (1991). Handbook of genetic algorithms. Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., ... Leitão, P. J. (2013). tant step reducing the adverse effects of high dimensional data such as Collinearity: A review of methods to deal with it and a simulation study evaluating multi-collinearity or correlation among drivers (Tayyebi, Tayyebi, their performance. Ecography, 36(1), 27–46. Arsanjani, Moghadam, & Omrani, 2016; Tayyebi et al., 2014). Moreover, Engelbrecht, A. P. (2007). Computational intelligence: An introduction. John Wiley & Sons. Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2002). Geographically weighted regres- quantitative procedures such as TOC curve (Fig. 9) showed that model- sion: The analysis of spatially varying relationships. New York: Wiley. ing urban growth had higher accuracy utilizing SVR modeling with op- Golicher, D., Ford, A., Cayuela, L., & Newton, A. (2012). Pseudo-absences, pseudo-models timum feature selection method (GA). and pseudo-niches: Pitfalls of model selection based on the area under the curve. – Moreover, SVR models were tested using various kernels. It was International Journal of Geographical Information Science, 26(11), 2049 2063. Hasanlou, M., Samadzadegan, F., & Homayouni, S. (2015). SVM-based hyperspectral found that the RBF kernel had the best performance compared to image classification using intrinsic dimension. Arabian Journal of Geosciences, 8(1), other utilized kernels in both SVR-GA-RBF and SVR-RBF scenarios 477–487. (Table 6). Similarly, Pradhan (2013) found that RBF kernel achieved Haupt, R. L., & Haupt, S. E. (2004). Practical genetic algorithms. John Wiley & Sons. Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis the best prediction results among other kernels. This is due to the fact with applications to biology, control, and artificial intelligence. U Michigan Press. that the RBF kernel is constructed based on the principles that are in Hu, Z., & Lo, C. (2007). Modeling urban growth in Atlanta using logistic regression. harmony with the characteristics of geographic data. First, in RBF kernel, Computers, Environment and Urban Systems, 31(6), 667–688. Kamusoko, C., & Gamba, J. (2015). Simulating urban growth using a Random Forest-Cel- the input data is not presented as a single layer to the kernel (global pat- lular Automata (RF-CA) model. ISPRS International Journal of Geo-Information, 4(2), tern), so the kernel trains over series of points to reach local pattern. 447–470. 40 H. Shafizadeh-Moghadam et al. / Computers, Environment and Urban Systems 65 (2017) 28–40

Lambin, E. F., & Geist, H. J. (2008). Land-use and land-cover change: Local processes and Shafizadeh-Moghadam, H., Asghari, A., Tayyebi, A., & Taleai, M. (2017). Coupling machine global impacts. Springer Science & Business Media. learning, tree-based and statistical models with cellular automata to simulate urban Li, X., & Yeh, A. G. O. (2002). Neural-network-based cellular automata for simulating mul- growth. Computers, Environment and Urban Systems, 64,297–308. tiple land use changes using GIS. International Journal of Geographical Information Shan, J., Alkheder, S., & Wang, J. (2008). Genetic algorithms for the calibration of cellular Science, 16(4), 323–343. automata urban growth modeling. Photogrammetric Engineering & Remote Sensing, Lin, Z., & Yan, L. (2016). A support vector machine classifier based on a new kernel func- 74(10), 1267–1277. tion model for hyperspectral data. GIScience & Remote Sensing, 53(1), 85–101. Song, W., Pijanowski, B. C., & Tayyebi, A. (2015). Urban expansion and its consumption of Liu, X., Li, X., Shi, X., Wu, S., & Liu, T. (2008). Simulating complex urban development using high-quality farmland in Beijing, China. Ecological Indicators, 54,60–70. kernel-based non-linear cellular automata. Ecological Modelling, 211(1), 169–181. Surendar, V., Mohankumar, V., Anand, S., & Prasanna, V. D. (2015). Estimation of state of Marjanović, M., Kovačević, M., Bajat, B., & Voženílek, V. (2011). Landslide susceptibility as- charge of a lead acid battery using support vector regression. Procedia Technology, 21, sessment using SVM machine learning algorithm. Engineering Geology, 123(3), 264–270. 225–234. Tang, J., Wang, L., & Yao, Z. (2007). Spatio-temporal urban landscape change analysis Mas, J. -F., Puig, H., Palacio, J. L., & Sosa-López, A. (2004). Modelling deforestation using GIS using the Markov chain model and a modified genetic algorithm. International and artificial neural networks. Environmental Modelling & Software, 19(5), 461–471. Journal of Remote Sensing, 28(15), 3255–3271. Moguerza, J. M., & Muñoz, A. (2006). Support vector machines with applications. Tayyebi, A. (2013). Simulating land use land cover change using data mining and machine Statistical Science,322–336. learning algorithms. Nieto, P. G., Fernández, J. A., de Cos Juez, F., Lasheras, F. S., & Muñiz, C. D. (2013). Hybrid Tayyebi, A., & Jenerette, G. D. (2016). Increases in the climate change adaption effective- modelling based on support vector regression with genetic algorithms in forecasting ness and availability of vegetation across a coastal to desert climate gradient in met- the cyanotoxins presence in the Trasona reservoir (northern Spain). Environmental ropolitan Los Angeles, CA, USA. Science of the Total Environment, 548,60–71. Research, 122,1–10. Tayyebi, A., Meehan, T. D., Dischler, J., Radloff, G., Ferris, M., & Gratton, C. (2016). Pal, M., & Foody, G. M. (2010). Feature selection for classification of hyperspectral data by SmartScape™: A web-based decision support system for assessing the tradeoffs SVM. IEEE Transactions on Geoscience and Remote Sensing, 48(5), 2297–2307. among multiple ecosystem services under crop-change scenarios. Computers and Pijanowski, B. C., Brown, D. G., Shellito, B. A., & Manik, G. A. (2002). Using neural networks Electronics in Agriculture, 121,108–121. and GIS to forecast land use changes: A land transformation model. Computers, Tayyebi, A., Pekin, B. K., Pijanowski, B. C., Plourde, J. D., Doucette, J. S., & Braun, D. (2013). Environment and Urban Systems, 26(6), 553–575. Hierarchical modeling of urban growth across the conterminous USA: Developing Pijanowski, B. C., Tayyebi, A., Delavar, M. R., & Yazdanpanah, M. J. (2010). Urban expan- meso-scale quantity drivers for the Land Transformation Model. Journal of Land Use sion simulation using geospatial information system and artificial neural networks. Science, 8(4), 422–442. International Journal of Environmental Research, 3(4), 493–502. Tayyebi, A., & Pijanowski, B. C. (2014). Modeling multiple land use changes using ANN, Pijanowski, B. C., Tayyebi, A., Doucette, J., Pekin, B. K., Braun, D., & Plourde, J. (2014). Abig CART and MARS: Comparing tradeoffs in goodness of fit and explanatory power of data urban growth simulation at a national scale: Configuring the GIS and neural net- data mining tools. International Journal of Applied Earth Observation and work based land transformation model to run in a high performance computing Geoinformation, 28,102–116. (HPC) environment. Environmental Modelling & Software, 51,250–268. Tayyebi, A., Pijanowski, B. C., Linderman, M., & Gratton, C. (2014). Comparing three global Pontius, R. G., Huffaker, D., & Denman, K. (2004). Useful techniques of validation for spa- parametric and local non-parametric models to simulate land use change in diverse tially explicit land-change models. Ecological Modelling, 179(4), 445–461. areas of the world. Environmental Modelling & Software, 59,202–221. Pontius, R. G., & Parmentier, B. (2014). Recommendations for using the relative operating Tayyebi, A., Pijanowski, B. C., & Pekin, B. K. (2015). Land use legacies of the Ohio River characteristic (ROC). Landscape Ecology, 29(3), 367–382. Basin: Using a spatially explicit land use change model to assess past and future im- Pontius, R. G., & Schneider, L. C. (2001). Land-cover change model validation by an ROC pacts on aquatic resources. Applied Geography, 57,100–111. method for the Ipswich watershed, Massachusetts, USA. Agriculture, Ecosystems & Tayyebi, A., Pijanowski, B. C., & Tayyebi, A. H. (2011). An urban growth boundary model Environment, 85(1), 239–248. using neural networks, GIS and radial parameterization: An application to Tehran, Pontius, R. G., & Si, K. (2014). The total operating characteristic to measure diagnostic abil- Iran. Landscape and Urban Planning, 100(1), 35–44. ity for multiple thresholds. International Journal of Geographical Information Science, Tayyebi, A., Tayyebi, A. H., Arsanjani, J. J., Moghadam, H. S., & Omrani, H. (2016). FSAUA: A 28(3), 570–583. framework for sensitivity analysis and uncertainty assessment in historical and fore- Pontius, R. G., Boersma, W., Castella, J. C., Clarke, K., de Nijs, T., Dietzel, C., Duan, Z., Fotsing, casted land use maps. Environmental Modelling & Software, 84,70–84. E., Goldstein, N., Kok, K., & Koomen, E. (2008). Comparing the input, output, and val- Tehrany, M. S., Pradhan, B., & Jebur, M. N. (2014). Flood susceptibility mapping using a idation maps for several models of land change. The Annals of Regional Science, 42(1), novel ensemble weights-of-evidence and support vector machine models in GIS. 11–37. Journal of Hydrology, 512,332–343. Pradhan, B. (2013). A comparative study on the predictive ability of the decision tree, sup- Turner, B. L., Lambin, E. F., & Reenberg, A. (2007). The emergence of land change science port vector machine and neuro-fuzzy models in landslide susceptibility mapping for global environmental change and sustainability. Proceedings of the National using GIS. Computers & Geosciences, 51,350–365. Academy of Sciences, 104(52), 20666–20671. Rienow, A., & Goetzke, R. (2015). Supporting SLEUTH – enhancing a cellular automaton Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business with support vector machines for urban growth modeling. Computers, Environment Media. and Urban Systems, 49,66–81. Vapnik, V. N., & Vapnik, V. (1998). Statistical learning theory. Vol. 1.New York: Wiley. Shafizadeh-Moghadam, H., Asghari, A., Taleai, M., Helbich, M., & Tayyebi, A. (2017). Sen- Yang,Q.,Li,X.,&Shi,X.(2008).Cellular automata for simulating land use changes based sitivity analysis and accuracy assessment of the land transformation model using cel- on support vector machines. Computers & Geosciences, 34(6), 592–602. lular automata. GIScience and Remote Sensing,1–18. Yang, X., & Lo, C. (2003). Modelling urban growth and landscape changes in the Atlanta Shafizadeh-Moghadam, H., Hagenauer, J., Farajzadeh, M., & Helbich, M. (2015). Perfor- metropolitan area. International Journal of Geographical Information Science, 17(5), mance analysis of radial basis function networks and multi-layer perceptron net- 463–488. works in modeling urban change: A case study. International Journal of Geographical Zhou, P., Huang, J., Pontius, R. G., & Hong, H. (2016). New insight into the correlations be- Information Science, 29(4), 606–623. tween land use and water quality in a coastal watershed of China: Does point source Shafizadeh-Moghadam, H., & Helbich, M. (2015). Spatiotemporal variability of urban pollution weaken it? Science of the Total Environment, 543,591–600. growth factors: A global and local perspective on the megacity of Mumbai. International Journal of Applied Earth Observation and Geoinformation, 35,187–198.