Environmental Earth Sciences (2021) 80:75 https://doi.org/10.1007/s12665-020-09345-0

ORIGINAL ARTICLE

Developing geographic weighted regression (GWR) technique for monitoring soil salinity using sentinel‑2 multispectral imagery

Mohammad Mahdi Taghadosi1 · Mahdi Hasanlou1

Received: 6 June 2020 / Accepted: 16 December 2020 © The Author(s), under exclusive licence to Springer-Verlag GmbH, DE part of Springer Nature 2021

Abstract Soil salinity is a widespread natural hazard that negatively infuences soil fertility and crop productivity. Using the potential of earth observation data and remote sensing technologies provides an opportunity to address this environmental issue and makes it possible to identify salt-afected regions accurately. While most of the utilized methods and model development techniques for monitoring soil salinity to date have been globally considered and tried to detect salinity and create predic- tive maps with a single regression algorithm, fewer studies have investigated the potential of local models and weighted regression techniques for estimating soil salinity. Accordingly, this research deals with monitoring surface soil salinity by the potential of Sentinel-2 multispectral imagery using the geographic weighted regression (GWR) technique. The feld study was conducted in an area that has sufered from salinization, and the salinity of several soil samples was measured to be used as a source of ground truth data. The most efcient satellite features, which accurately predict surface soil salinity by its higher spectral refectance, were derived from the Sentinel-2 data to be used as explanatory variables in the analysis. The GWR algorithm was then implemented with a fxed Gaussian kernel, and the optimized bandwidth was calculated in a calibration process using the cross-validation score (CV score). The results of the analysis proved that the GWR method has a great capability to predict soil salinity with an accuracy of two decimal places. The visual interpretation of the local estimates of coefcients and local t-values for each predictor variable has also been provided, which highlights the local variations in the study site. Finally, the achieved results were compared with the outcomes obtained from implementing two global regression techniques, Support Vector Machines (SVM), and Multiple Linear Regression (MLR), which confrmed the higher performance of the GWR algorithm.

Keywords Soil salinity · Sentinel-2 multispectral imagery · Geographic weighted regression · Local variation

Introduction regions (Elhag 2016; Cuevas et al. 2019). Accordingly, in the last few years, the potential of modern technologies and Soil salinity is a natural environmental hazard that has a advanced methods like remote sensing (RS) has been widely destructive infuence on soil quality and crop productiv- investigated for monitoring soil salinity (Taghadosi et al. ity in agricultural lands (Allbed and Kumar 2013; Gorji 2018). Remotely sensed data makes it possible to monitor et al. 2017). Regarding the importance of conservation of soil salinity rapidly and cost-efectively and can characterize natural resources and to increase the sustainability of crop salt-afected regions at larger scales due to its wide spatial production, a reliable method should be sought for soil coverage (Brunner et al. 2007; Allbed and Kumar 2013). reclamation and preventing further salinization of afected Over the past few years, optical remotely sensed data have been widely used for salinity monitoring through high spec- tral refectance of salt-afected regions in the visible and * Mahdi Hasanlou [email protected] near-infrared range of the electromagnetic spectrum. The main strategy is to establish a relationship between satellite Mohammad Mahdi Taghadosi [email protected] features as predictor variables and measured soil salinity values as responses to build a satellite-based model for salin- 1 School of Surveying and Geospatial Engineering, College ity estimation in study sites. A variety of satellite features, of Engineering, University of Tehran, P. O. Box 11155‑4563, known as salinity indices (SI), which are mostly developed 1439957131 Tehran,

Vol.:(0123456789)1 3 75 Page 2 of 14 Environmental Earth Sciences (2021) 80:75 based on the ratio between the visible and NIR spectral analysis showed an accuracy of no more than R2 = 0.52 and bands, have been investigated for this task to construct a R2 = 0.61 for HH and VV polarization, respectively. high precision mathematical model for soil salinity estima- One of the main obstacles in providing accurate results tion. However, the obtained accuracies in previous research for monitoring and mapping soil salinity by the potential of have been varied and largely depend on various factors such RS is that the data analysis procedure and algorithm devel- as the spatial/spectral resolutions of sensors, the used meth- opment techniques are usually globally considered. This may odology, study sites and other related issues. Such variations weaken the strength and efciency of predictive models in in model performance to predict target variables through estimating dependent variables on a large scale because the remote sensing and machine learning approaches are not model tries to estimate coefcients and predict all dependent limited to soil salinity studies, but expand to other related variables at the same time and with only one global equa- felds, as reported in Veronesi and Schillaci (2019). They tion (Mennis 2006; Matthews and Yang 2012). So that the compared popular regression algorithms for soil organic car- accuracy cannot go beyond a certain limit. In other words, it bon (SOC) mapping in semi-arid Mediterranean region of would be difcult to consider an inclusive model to predict Sicily, and pointed out it is unclear from the literature which salinity at a regional scale for soils of diferent character- technique performs well for SOC mapping. They mentioned istics (e.g., soil texture, constituents, moisture level, etc.). that uncertainty is a challenging issue in predicting response The situation exacerbates when soils with diferent land variables by machine learning strategies. cover types are considered (Taghadosi and Hasanlou 2017). There have been similar fndings in the literature that For instance, in green areas, spectral refectance of vegeta- imply the accuracy of a satellite-based model depends on tion covers or plants decreases due to salinity occurrence, varying factors, and developing regression algorithms to pre- whereas in bare soil land salt-afected regions can be easily dict target variables may be a challenging issue. Zhang et al. detected through high spectral refectance of salt deposits, (2015) evaluated the potential of MODIS time series vegeta- which are apparent in the site. In such conditions, the spatial tion indices data for monitoring soil salinity in the Yellow locations of salinity occurrence in the scene and the salinity River Delta, China. The results showed that the correla- ranges, play a signifcant role in properly estimating soil tion of the satellite-derived feature and soil salinity highly salinity, which suggests using local models and weighted depends on land cover heterogeneity. The best ft model regression techniques for modeling and mapping salinity reached no more than R2 = 0.70 using a binomial regres- rather than a global model for the entire scene. Using Geo- sion model. Some authors also investigated the potential of graphic Weighted Regression (GWR) method enables us to various satellite sensors and their derived features to predict signifcantly improve the models’ accuracy and guarantee salinization. Ivushkin et al. (2019) used three diferent UAV the strength and reliability of the predictive maps (Wang sensors to measure the salinity stress of quinoa plants in the et al. 2005; Kala et al. 2017; Guo et al. 2017) because it Netherlands. The multiple linear regression model was used builds a unique model for each regression point according to relate remotely sensed data and ground truth data meas- to the spatial location of observations (Murayama 2012). urements. However, the achieved results reached no more Despite the great potential of the GWR method in mod- than R2 = 0.46 when the acquired data from all three sensors eling scale-dependent and nonstationary relationships were combined. In another study, Allbed et al. (2014) used between predictor and response variables, limited studies the high spatial resolution IKONOS satellite data with pixel have focused on addressing an environmental issue by the size 1 m to assess soil salinity in the area that was dominated potential of remotely sensed data and the GWR technique. In by date palm. Various salinity and vegetation indices were a case study in China, Li et al. (2010) used the GWR method investigated to predict soil salinity in the site, and the best to investigate the nonstationary spatial relationships between results were achieved with R2 = 78% using SAVI and SI-T Land Surface Temperature (LST) of an urban area and envi- indices. In a case study in India, George and Kumar (2015) ronmental factors. The results of the analysis indicated that investigated the potential of Hyperion hyperspectral satel- the GWR method has a higher performance than the tradi- lite data to map various soil salinity levels. The Support tional Ordinary Least Square (OLS) model and also provides Vector Machine technique was used as a classifer, and the detailed information about the local variation of LST in the map of soil salinity was generated with an overall accuracy study site. In another study, Georganos et al. (2017) assessed of 78.13% and a kappa coefcient of 0.71. Li et al. (2014) the potential of the GWR technique to model the nonlinear also utilized the AIEM backscattering model to simulate the and scale-dependent relationship between vegetation pro- backscattering coefcient of saline soils by the analysis of ductivity and rainfall. The results proved that the proposed complex dielectric constant. The RadarSAT-2 data in HH method provides better accuracy than conventional methods, and VV polarization were acquired from the study site, and and enhances the power of analysis. the backscattering coefcients of soil samples were extracted To build a stronger and specifc body of research on the and compared with simulated values. The results of the topic, address the knowledge gaps in relevant studies of the

1 3 Environmental Earth Sciences (2021) 80:75 Page 3 of 14 75 feld, and to highlight the novelty of the present study, a Overall, there were about 120 studies found in this scope. systematic review on monitoring soil salinity using remotely GWR techniques were applied in areas such as estimating sensed data and weighted regression techniques, has been PM2.5 concentration, urban growth and its relationship conducted in three main stages based on Web of Science with spatial variables, forest cover mapping, precipitation (WoS) research database. estimates, climate warming and thermal degradation of In the frst step, the studies related to monitoring and permafrost, estimating Leaf Area Index (LAI), above- modeling soil salinity by the potential of remotely sensed ground biomass (AGB) estimation, etc. In most cases, the data have been searched with these keywords: soil salin- weighted regression algorithm has been compared with ity, salinization, salt-affected regions, remote sensing, other learning strategies and showed better performance remotely sensed data, earth observation data, and sorted in terms of model goodness of ft and residual analysis. In by their citation. The review of the most cited articles has some articles, the GWR algorithm is combined with sta- then been performed to identify the used dataset, features, tistical approaches like ordinary kriging and build a mixed and explanatory variables, model development techniques, weighted regression to achieve higher accuracy. There and their fndings for monitoring soil salinity using remotely have also been cases that used a combination of constant sensed data. Overall, around 800 research papers were found and spatially varying predictor variables to enhance model with these keywords. In terms of the dataset, the potential of performance. multispectral optical imagery has been widely investigated In the fnal stage of the systematic review, the potential due to the wide spatial coverage of the satellite products, fne of the weighted regression technique specifcally developed spatial resolution, and the ease of accessibility to some satel- for monitoring soil salinity using remotely sensed data have lite data like the Landsat archive and Sentinel-2 MSI. Imag- been assessed. The keywords to search for related materials ing spectroscopy and hyperspectral airborne sensors were were soil salinity, salinization, geographic weighted regres- also used in some research to analyze the spectral behavior sion, GWR, local models, spatial non-stationarity, remote of saline soil in narrow and contiguous bands. Besides, the sensing. Overall, around 20 articles were found in this area, number of research works that characterize soil salinity from which is relatively low and suggests only a few research- passive microwave sensors or SAR data products has also ers investigated the potential of weighted regression tech- increased in recent years. Timely monitoring of soil salin- niques for soil salinity monitoring through remotely sensed ity and evaluating the rate of changes in afected regions is data. In a case study in western Jilin Province, China, Nie another area of focus in this feld that is widely investigated et al. (2020) assessed the secondary salinization problem in by multi-temporal Landsat imagery. In terms of features agricultural felds using the geographic weighted regression and predictor variables, spectral bands and their derived Kriging (GWRK) method and found that the algorithm can features, e.g., soil & vegetation indices, had the most con- predict the spatial distribution of soil salinity efciently. In tribution in soil salinity estimation. Some studies used other another study, Tran et al. (2020) used the GWR algorithm sources of ancillary data and environmental variables, such to investigate the spatial relationship between measured as elevation data, topographic features, climate variables, soil salinity values and Landsat 8 data with their derived soil texture, etc., and combined them with spectral features features, such as spectral bands, band ratio, and soil/vegeta- to detect saline soils. With regards to model development tion indices. The results revealed that the spatial correlation techniques and learning strategies, a variety of classifcation between the mentioned predictors and response variables algorithms, linear and nonlinear regression methods (e.g., could be efectively analyzed through the GWR technique. ML, SVM, MLR, regression tree analysis, OLS, PLSR, krig- Li et al. (2019) also evaluated the potential of 64 environ- ing, etc.) have been utilized to fnd a proper relationship mental variables, including Landsat products and their between predictors and response variables. Over the last derived features, digital elevation model, and other ancil- years, more advanced learning strategies, hybrid machine lary data for soil salinity monitoring using MLR, GWR, and learning approaches, and rule-based fuzzy classifers were RF algorithms. The results indicated the higher accuracy of also applied to characterize soil salinity. GWR and RF techniques for estimating soil salinity in dry In the next step, the studies relevant to the geographic seasons. weighted regression technique as an alternative to the most Noteworthy is that authors in previous research rarely widely used global learning approaches have been sought focused on the interpretation of the large number of outputs in the remote sensing feld with these keywords: geo- acquired from implementing the GWR algorithm. In most graphic weighted regression, GWR, local models, spatial cases, the reports of the accuracy obtained from statistical non-stationarity, remote sensing. In this respect, a variety analysis were compared with global learning approaches, of environmental applications that have been monitored and higher achieved performance of the algorithm was through weighted regression algorithms and local mod- concluded. In the present study, the output variables were els were addressed, and their outcomes were evaluated. explained, the visual interpretation of the numerical values

1 3 75 Page 4 of 14 Environmental Earth Sciences (2021) 80:75 were produced, and their relationship with soil salinity levels annual precipitation in this region is about 115.5 mm. The was discussed in detail. highest and lowest temperatures occur in July with around Based on the issues raised above, this research aims at 39.7 c, and December with 0.4 c, respectively. (Fallahi et al. monitoring soil salinity, a hazardous environmental prob- 1983; Taghadosi et al. 2018). The study site is mainly com- lem, in an area that has sufered from severe salinization and posed of bare soils and sparse vegetation, and wheat and tries to accurately predict salinity of the surface soil by the barley cultivated in few parts (Fallahi et al., 1983; Tagha- potential of the GWR method using remotely sensed data. dosi et al. 2019). The soil texture varies from silt loam to Soil salinity was measured in the study site to be used as the silty clay loam, with a surface color of yellowish-brown to response variables, and Sentinel-2 multispectral imagery and dark brown (Fallahi et al. 1983). Salinization has occurred in its derived features were employed as the predictors. The this area due to its vicinity to Salt Lake , which makes outcomes of the GWR analysis were fnally compared with the soil high and extremely saline (Taghadosi et al. 2019). the results achieved from implementing two global regres- Figure 1 shows the Kuh-Sefd village in the central district sion techniques, Support Vector Machines (SVM), and Mul- of Qom , Iran. These datasets can be found online at tiple Linear Regression (MLR) in the site. http://rslab​.ut.ac.ir. The central part of the Kuh-Sefd village, with an area of around 15 km × 15 km, was considered for the ongoing Materials and methods analysis. The surface refectance varies from place to place in the selected part. As can be seen in Fig. 2, the Salinity Study site, feld observation, and measuring soil of surface soil is apparent in the east and north-east of this salinity region, and soil with dark brown color is obvious in the western part. Sparse vegetation can also be observed in the In this study, Kuh-Sefd village, a rural area in the central south-west and a few points at the central part of the site. district of Qom county Iran, has been considered for moni- Regarding the spectral diferences in the area of interest, toring soil salinity. This area has an arid climate, high evapo- sample points were collected in most of the mentioned land transpiration rate, and little rainfall (Fallahi et al. 1983). The cover classes.

Figure 1. The location of the Kuh-Sefd village in Qom County, Iran

1 3 Environmental Earth Sciences (2021) 80:75 Page 5 of 14 75

Table 1 Geographical coordinates of soil samples and their measured EC values

Number Longitude Latitude Electrical Conductivity (ds.m-1)

1 51° 11′ 29ʺ 34° 47′ 52ʺ 2.870 2 51° 11′ 52ʺ 34° 47′ 55ʺ 3.010 3 51° 12′ 16ʺ 34° 48′ 5ʺ 0.403 4 51° 12′ 41ʺ 34° 48′ 24ʺ 28.500 5 51° 12′ 57ʺ 34° 48′ 22ʺ 2.970 6 51° 13′ 12ʺ 34° 48′ 21ʺ 0.250 7 51° 13′ 51ʺ 34° 48′ 17ʺ 3.670 8 51° 12′ 44ʺ 34° 48′ 30ʺ 19.900 9 51° 12′ 42ʺ 34° 48′ 35ʺ 0.673 10 51° 13′ 14ʺ 34° 48′ 48ʺ 30.800 11 51° 13′ 16ʺ 34° 48′ 42ʺ 43.900 12 51° 13′ 27ʺ 34° 48′ 48ʺ 38.600 13 51° 13′ 28ʺ 34° 48′ 40ʺ 44.000 14 51° 13′ 34ʺ 34° 48′ 40ʺ 32.800 Figure 2. The distribution of the sample points in the study site 15 51° 13′ 34ʺ 34° 48′ 45ʺ 31.200 16 51° 12′ 14" 34° 48′ 7ʺ 1.129 17 51° 12′ 14ʺ 34° 48′ 15ʺ 13.130 Field observation was carried out in March 2017 in the 18 51° 12′ 17ʺ 34° 48′ 1ʺ 42.500 study site and the Electrical Conductivity (EC) of 58 soil 19 51° 12′ 17ʺ 34° 47′ 56ʺ 27.100 samples, which were randomly collected at topsoil depth 20 51° 11′ 59ʺ 34° 48′ 22ʺ 0.421 (0–20 cm), was measured by conductivity-meter (WTW 21 51° 11′ 53ʺ 34° 48′ 23ʺ 0.425 Cond 330i), in the unit of decisiemens per meter (dS/m), 22 51° 12′ 14ʺ 34° 48′ 48ʺ 12.150 with the method of soil–water extract, proposed by Richards 23 51° 12′ 28ʺ 34° 49′ 10ʺ 35.400 (1954). The details of this procedure can be found in Tagha- 24 51° 12′ 22ʺ 34° 49′ 10ʺ 7.510 dosi et al. (2019). The measured EC values with their spe- 25 51° 12′ 37ʺ 34° 49′ 34ʺ 7.410 cifc spatial locations are shown in Table 1. These values are 26 51° 12′ 42ʺ 34° 49′ 32ʺ 7.540 the indicator of salinity levels in the site and are considered 27 51° 10′ 56ʺ 34° 47′ 7ʺ 2.620 as regression points in the next steps of analysis. Figure 2 28 51° 11′ 1ʺ 34° 47′ 5ʺ 7.160 illustrates the distribution of sample points in the study site. 29 51° 10′ 51ʺ 34° 46′ 53ʺ 13.220 30 51° 10′ 46ʺ 34° 46′ 55ʺ 6.090 Remotely sensed data 31 51° 10′ 50ʺ 34° 46′ 42ʺ 8.100 32 51° 11′ 8ʺ 34° 47′ 13ʺ 47.500 The Sentinel-2A Multispectral Satellite Imagery (MSI) 33 51° 11′ 9ʺ 34° 47′ 17ʺ 4.090 was considered for this research. Sentinel-2 is an earth 34 51° 11′ 39ʺ 34° 47′ 14ʺ 7.320 observation data and part of the Copernicus mission that 35 51° 11′ 43ʺ 34° 47′ 14ʺ 9.450 has been developed and operated by the European Space 36 51° 11′ 43ʺ 34° 47′ 12ʺ 12.320 Agency (ESA). It provides optical imagery in 13 distinctive 37 51° 12′ 31ʺ 34° 47′ 17ʺ 6.000 spectral bands and a high spatial resolution of 10–60 m for 38 51° 12′ 33ʺ 34° 47′ 22ʺ 4.000 Copernicus services and applications like land management, 39 51° 12′ 27ʺ 34° 47′ 21ʺ 8.060 agricultural and forestry, risk mapping, etc. The image data 40 51° 13′ 0ʺ 34° 47′ 17ʺ 8.790 in the L1C processing level, which is freely available, was 41 51° 13′ 3ʺ 34° 46′ 57ʺ 8.720 acquired from the study site on 4 March 2017, alongside 42 51° 13′ 11ʺ 34° 46′ 56ʺ 6.650 feld observation. 43 51° 10′ 53ʺ 34° 48′ 11ʺ 2.840 The pre-processing of the acquired Sentinel-2A data is 44 51° 10′ 46ʺ 34° 48′ 12ʺ 4.150 performed by converting the L1C data product to L2A level 45 51° 9′ 13ʺ 34° 48′ 26ʺ 1.033 which is an ortho-image corrected refectance product at the 46 51° 9′ 48ʺ 34° 48′ 40ʺ 3.040 bottom of the atmosphere (Suhet and Hoersch, 2015). The 47 51° 9′ 46ʺ 34° 48′ 42ʺ 3.560 procedure was done with the Sen2Cor tool, a processor for

1 3 75 Page 6 of 14 Environmental Earth Sciences (2021) 80:75

Table 1 (continued) the pre-processed Sentinel-2 L2A multispectral bands to be Number Longitude Latitude Electrical used as predictor variables in the next step of the analysis. Conductivity In addition to the derived features, the performance of the (ds.m-1) individual spectral bands of the Sentinel-2 L2A, have been 48 51° 9′ 56ʺ 34° 49′ 7ʺ 6.700 evaluated for monitoring soil salinity in this study. It should 49 51° 9′ 56ʺ 34° 49′ 5ʺ 9.520 be noted that band 1 (coastal aerosol), band 9 (water vapor), 50 51° 10′ 21ʺ 34° 49′ 31ʺ 9.130 and band 10 (cirrus) were excluded from the ongoing analy- 51 51° 10′ 19ʺ 34° 49′ 33ʺ 4.800 sis due to their inadequate information. Table 2 illustrates all 52 51° 10′ 20ʺ 34° 49′ 35ʺ 9.190 the explanatory variables that were utilized for the analysis. 53 51° 9′ 33ʺ 34° 49′ 1ʺ 33.000 The corresponding predictor variables of each sample 54 51° 9′ 20ʺ 34° 49′ 2ʺ 26.800 point were extracted from the image data according to 55 51° 9′ 20ʺ 34° 48′ 58ʺ 0.349 Table 2 (20 values for each regression point) and stored as 56 51° 9′ 7ʺ 34° 48′ 55ʺ 3.180 explanatory variables for the analysis. As mentioned above, 57 51° 8′ 51ʺ 34° 49′ 38ʺ 12.850 the ongoing analysis aims to use the weighted regression 58 51° 8′ 52ʺ 34° 49′ 35ʺ 6.740 technique and local models to estimate soil salinity in the study site, and then compare the obtained results with global regression techniques, which will now be discussed in detail. generating Sentinel-2 L2A data, developed by Mueller-Wilm et al. (2016). Used methodology

Satellite feature extraction and defning predictor Geographic weighted regression (GWR) technique variables Most of the utilized regression analysis and predictive Soil salinity can be characterized through the high spectral models to date have been globally considered. The GWR refectance of salt-afected lands in the visible and near- technique, on the other hand, focuses on the spatial distri- infrared range of the electromagnetic spectrum. This can be bution of sample points in the study site and investigates formulated using the spectral bands of remotely sensed data, methods to ft local models based on regression points in which diferent band combinations are used to highlight (Murayama 2012). The GWR algorithm works by a search afected regions. Over the past years, numerous band com- window that moves from one regression point to another. binations, known as salinity indices (SI), have been devel- At each step, all the observations around the regression oped for salinity detection from satellite imagery. Some of point or within the search window are selected and a them have been found to highly correlate with salinity lev- weight is assigned to them based on their distance from els and proven to have the best performance in estimating the regression point (Fotheringham et al. 2002). The local soil salinity. The present study exploited the most efcient model or regression algorithm is then ftted to the selected salinity indices for characterizing soil salinity, according points or subsets, which results in having a unique model to their higher achieved performance in previous studies for each regression point. These local models can then be (Weng et al. 2010; Allbed and Kumar 2013; Vermeulen and developed to predict non-regression points with x–y coor- Niekerk 2016). These salinity indices were derived from dinates. This may improve the performance and accuracy

Table 2 Satellite features that Bands Wavelength (µm) Index Formula Index Formula were extracted to characterize Red soil salinity (1 + L) (NIR− ) , L = 1 Red2 Green2 Blue 0.49 SAVI (NIR+Red+L) SI-4 √( + ) Green 0.56 R −NIR Red2 NIR2 Red 0.665 NDSI NIR+R BI √( + ) Red Edge 1 0.705 Red×NIR (Green2 + Red2 + NIR2 Red Edge 2 0.74 SI-1 Green SI-5 Red Edge 3 0.783 Blue×Red Blue Red NIR 0.842 SI-2 Green SI-6 √ × Red Edge 4 0.865 Blue−Red Green2 Red2 NIR2 SWIR 1 1.61 SI-3 Blue+Red SI-7 √( + + ) SWIR 2 2.19

1 3 Environmental Earth Sciences (2021) 80:75 Page 7 of 14 75 because the algorithm tries to ft a local model for very 2 2 dij = (xi − xj) +(yi − yj) (7) few points rather than a global model for the entire scene. A global regression model is defned as Eq. (1): b is the distance threshold, also known as bandwidth, and y =  +  x +  determines the area which is covered by the search window. i 0 k ik i (1) k The bandwidth can be considered in two general forms, fxed and adaptive. Selecting the type of spatial kernel depends on y where i is the predicted value of the dependent variable for the points’ distribution in the study site. If the sample points i th observation, k is the parameter estimate for the explana- are randomly selected, the fxed kernel will achieve better k x tory variable , 0 is the intercept, ik is the value of the inde- results (Brunsdon et al. 1996). The adaptive kernel is more i pendent variable for , and i is the error of regression point suitable for data points that are distributed normally or based i . This equation changes in the GWR method in the form of on a predefned scheme. Eq. (2) and a separate regression algorithm is developed for The key factor in calculating the weight matrix is fnd- each regression point. ing the best distance threshold or bandwidth, which can be chosen either manually or optimized through a calibration y i = 0(ui, vi)+ k(ui, vi)xik + i k (2) process (Lu et al. 2011). In this work, the best bandwidth was calculated by the calibration process using The Cross- , where 0(ui vi) is the intercept for i th regression point and Validation algorithm (CV score), in the form of Eq. (8), in , k(ui vi) is the local estimate for the explanatory variable k which the algorithm tries to minimize the cross-validation in regression point i . The above equation can be written in score. the form of Eq. (3): n 2 −1 CVscore = (y − y )  (u , v )=(XT W u , v X) XT W u , v y i i≠i (8) i i i i i i (3) i=1 X y where is the matrix of predictor variables, is the vec- where n is the number of observations, and yi and yi are tor of dependent variables, and W is the diagonal matrix observed and predicted dependent variables, respectively. , of the geographical weight of observations. (ui vi) is also Another factor that is calculated in the GWR analysis is 1 considered as Eq. (4) which consists of k + coefcients of the t-value that measures the mean diference between the explanatory variables for each regression point. two datasets expressed in units of standard error and can be used to detect the areas where the relationship between  u , v  = 0 u , v , … ,  u , v  i i i i k i i (4) sample and modeled data is statistically signifcant. The absolute magnitude of the t-value increases when the difer- The weight matrix (W ) is applied to calibrate local mod- ence between the two groups increases. It can be described els, in which a weight is assigned to each observation based as follows: on its distance to each regression point. The closer the obser- vations are to each regression point, the greater the weight − − y− y is assigned to them, and vice versa. The calculation of the t = S.E( y) diagonal weight matrix is based on kernel functions. Gauss- ian and Bi-square kernel functions are the most efcient − − where y and y are the mean values of modeled and sampled algorithms for this task, as illustrated in Eqs. (5), (6) data, respectively, and S.E( y) is the standard error of the 2 estimator. In the analysis, local estimates of t-values were � �d � � ⎧ exp − ij�b if dij < b calculated for local models, and the local maps were pro- wij = ⎪ (5) ⎨ 0 otherwise duced for each predictor variable. ⎪ According to the mentioned criteria, in this study, the ⎩ fxed Gaussian kernel was selected based on the random distribution of the soil samples in the study site and the best 2 2 ⎧ � �d � � bandwidth was calculated using the CV score algorithm. 1 − ij�b if dij < b wij = ⎪ (6) ⎨ 0 otherwise Global regression algorithms, MLR and SVM ⎪ ⎩ where dij is the distance between regression point i and To assess the performance of the proposed GWR methodol- observation point j , usually considered as Euclidian distance ogy, the results of the local models should be compared with in the form of Eq. (7) global regression algorithms. The relationship between pre- dictors and response variables might be linear or nonlinear

1 3 75 Page 8 of 14 Environmental Earth Sciences (2021) 80:75 in the global model. Therefore, there is a need to consider and the coefcient of determination ­(R2), which then com- regression algorithms that perform well in both cases. In this pared with the GWR algorithm. respect, the two most common learning strategies, Multiple Linear Regression (MLR) for the case of linear modeling and Support Vector Regression (SVR) for nonlinear distribu- Results tion, were considered to develop the global salinity model of the entire scene. The GWR algorithm was implemented with 58 regression As a frst global approach, the relationship between pre- points, which were collected from the study site. The Sen- dictor variables, illustrated in Table 2 and measured EC val- tinel-2 satellite features and measured EC values were con- ues, described in part 1: Study site, feld observation, and sidered as explanatory and response variables, respectively. measuring soil salinity, was established using a best-ft line Based on the results of the GWR analysis and developed to create a global linear model for the entire scene, as shown local models, 400 non-regression points, in a grid of 20 rows in Eq. 1. In the second global regression technique, the sup- by 20 columns, were then extracted from the scene (Fig. 3) port vector machine regression algorithm, known as ε-SVR, to predict local geographical coefcients in a short interval was developed to predict the surface salinity of the study of 1 km × 1 km around the study site. site using a single global model. The details of the proposed The results of the analysis can be represented in the form methodology, including the ε-SVR algorithm and formula, of a large table, which includes (1) local estimate of coef- can be found in another study of this study site performed by fcients for each predictor variables, (2) local standard errors, the same authors in Taghadosi et al. 2019. The Genetic Opti- (3) local pseudo-t-values, and (4) local coefcients of deter- mization Algorithm (GA) was then implemented on these mination (R2) for each x–y coordinates. Due to a large num- global regression techniques to improve the efciency of the ber of outputs, a subset of explanatory variables including, developed models and fnd the most efcient explanatory Red, NIR, SI-1, SI-2, SI-5, have been selected and the visual variables. Finally, the performance of the regression tech- interpretation of the numerical values was obtained, for local niques was assessed using root mean square error (RMSE) estimates of coefcients (Fig. 4) and local pseudo-t-values

Figure 3. Distribution of 400 non-regression points in a grid of 20 rows by 20 columns in the study site

1 3 Environmental Earth Sciences (2021) 80:75 Page 9 of 14 75

(Fig. 5). The map of local coefcients of determination (R2) global regression algorithms: support vector machines, and has also been shown in Fig. 6. multiple linear regression method. The same dataset was The map of local estimates of coefcients, Fig. 4a–e, used in this study, and predictor variables were extracted shows how diferent the coefcients are and how they change from the Sentinel-2 data as discussed in the “Used Method- around the study site. When the coefcients of an explana- ology part II” section. The comparison of the results shows tory variable are close to each other in a particular area of that the GWR method outperformed other approaches in the site, it means that the local models are more similar in both cases of using all explanatory variables and model that region, which may suggest a more inclusive model can improvements with optimization algorithms. Table 3 illus- be implemented there. For instance, the local estimates of trates the outputs of the regression techniques for modeling coefcients in the NIR band (Fig. 4b) have similar values in soil salinity in the study site. the western part of the study site. A similar pattern can be seen in this region for the Red band (Fig. 4a), SI-1 (Fig. 4c), SI-2 (Fig. 4d), and SI-5 (Fig. 4e) indices. This provides an Discussion opportunity to apply a more comprehensive model in this region, rather than a unique model for each point. In con- Developing local models around the study site, expands the trast, when the coefcients are widely diferent, the local number of coefcients for each point, for the aim of reach- models vary from place to place, which means that a global ing high-level accuracy. These local coefcients have almost algorithm cannot be used there. similar values in a specifc land cover class, as shown in For t-values maps and residuals, as shown in Fig. 5a–e, Fig. 4, especially apparent in a, b, e subfgures, which sug- the absolute values show the accuracy of local models gests interpreting these coefcients as separators between around the site. The greater the absolute t-values and residu- diferent land cover classes. In this research, the study site als, the lower the accuracy of local models, and vice-versa. has been selected on the criteria of including a variety of In the study site, the greater positive and negative t-values land cover classes. There were four main land cover types in can be observed at edges, where the spectral refectance of the study site, including saline surface soil, yellowish-brown the pixels changes rapidly across a boundary. For instance, soil, sparse vegetation, and dark brown soil. Mapping the in the middle of the study site, where soil with dark brown local estimates of coefcients efectively divides the scene color reached the center part of the Kuh-Sefd, greater abso- into these four land cover types. In Fig. 4, the local coef- lute t-values, shown with light and dark blue, were observed fcients in the west and the north–west of the study region in Fig. 5. This pattern has been more or less reproduced have similar values, which corresponds to dark brown soil in all subfgures. Higher t-values can also be seen in other pixels in the natural color composite of the study site. The places. In most cases, it occurred when a land cover class local coefcients of sparse vegetation can also be observed changes to another, e.g. from sparse vegetation to the Kuh- from the south-west to the central part of the Kuh-Sefd Sefd center part, from the Kuh-Sefd center part to the dark village. The yellowish-brown soil exists in the south and brown soil, from the dark brown soil to highly saline surface the south-east of the study site that has almost similar pixel soil, from saline surface soil to sparse vegetation, etc. This values in Fig. 4a, b, e. The pixel values in this part were suggests that the local models do not have the optimum per- mixed with sparse vegetation coefcients in some areas. The formance in high-frequency regions. saline surface soil that appears in the north and the north- The great potential of the proposed GWR methodology east of the site has non-homogenous spectral refectance, in estimating soil salinity can be deduced by Fig. 6, where which results in varying coefcient values. However, they local R2 values have the optimum performance in most almost preserve the spatial pattern of this land cover type, places. However, in the middle and north-east of the scene, especially in subfgures 4c and 4d. The strength of the local the obtained accuracy was lower, decreased to less than 0.60 coefcients in separating land cover classes also depends in some parts. The lower accuracy of the local models in the on the predictor variables or the satellite features which is central part of the study site, confrms the visual interpreta- used. As shown in Fig. 4, the mentioned land cover types are tion of the results obtained in Fig. 5, which showed a higher more separable by the coefcients of the Red band (Fig. 4a), range of absolute t-values in this area. For the north-east NIR band (Fig. 4b), and SI-5 (Fig. 4e). Varying coefcients regions, lower R2 values might be due to the non-homogene- of an explanatory variable indicate the variation of the ous spectral response in this part, which results in decreasing local models at short distances, which implies the failure of the efciency of local models. model establishment in the case of using a global regression In the fnal step, the outcomes of the GWR analysis were algorithm. compared with the obtained results from the study that has At edge areas, where two or more land cover classes were been done by the same authors (Taghadosi et al. 2019) for separated, local models changed rapidly due to the signif- modeling soil salinity in the Kuh-Sefd district using two cant variations in the local coefcients. These changes had

1 3 75 Page 10 of 14 Environmental Earth Sciences (2021) 80:75

Figure 4. The maps of local estimates of coefcients obtained from the results of the GWR analysis in the study site. a Red. b NIR. c SI-1. d SI-2. e SI-5

1 3 Environmental Earth Sciences (2021) 80:75 Page 11 of 14 75

Figure 5. The maps of local Pseudo t-values obtained from the results of the GWR analysis in the study site. a Red. b NIR. c SI-1. d SI-2. e SI-5

1 3 75 Page 12 of 14 Environmental Earth Sciences (2021) 80:75

Figure 6. The map of local R2 values in the study site

negative impacts on the performance of the local models, east and the north-east, and dark brown soil is situated at the as revealed by higher absolute t-values at edges, shown in top of that. As can be seen in Fig. 5a–e, absolute t-values light and dark blue colors in Fig. 5, especially apparent in have the maximum amounts in this region and indicates the Fig. 5a, b, e in the south-west and south-east regions. The worst performance of the local models there. The weakness weakness of the developed local models in the mentioned of the local models in the central parts of the scene can also regions occurred due to changes in land cover classes. be confrmed in Fig. 6, where the lowest rate of R2 values Therefore, it can be concluded that increasing the number occurred in these regions. of land cover types, fails in an inclusive model to perform The high rate of R2 value in the analysis is a challenging well in predicting response variables. In the study site, the issue that may occur as a result of the overftting problem. central part of the Kuh-Sefd, connected to diferent land This can negatively infuence the performance and generaliz- cover classes: It is related to sparse vegetation from the south ability of the model. One solution is to use cross-validation and the south-east, adjacent to saline surface soil from the that estimates model accuracy in an iteration process, in which the samples were split into k folds to be used as new test observations in each iteration. Having a limited number Table 3 Results of the analysis using Multiple Linear Regression of soil samples, a variety of land cover classes in the study (MLR), Support Vector Regression (SVR), and Geographic Weighted site, and using numerous predictor variables increase the Regression model complexity and raise the risk of overftting. 2 Developed algorithm R (%) RMSE Using the local regression method is a promising approach to estimate soil salinity by the potential of Sen- Multiple linear regression (MLR) 63.13 6.38 tinel-2 multispectral imagery. This technique signifcantly Improved MLR 83.40 4.28 enhances the performance of the regression algorithm and Support vector regression (SVR) RBF kernel 87.42 5.19 accurately predicts soil salinity values in the study site. Improved SVR RBF kernel 86.58 5.05 Comparing the achieved results with the outcomes obtained Geographic weighted regression (GWR) 99.78 0.86 from implementing two global regression techniques in the

1 3 Environmental Earth Sciences (2021) 80:75 Page 13 of 14 75 site, with the same dataset and predictor variables, confrms References the higher performance of the developed method. However, it has some limitations. One issue is the high computational Allbed A, Kumar L (2013) Soil salinity mapping and monitoring in arid time and memory consumption, as the technique tries to and semi-arid regions using remote sensing technology: a review. Adv Remote Sens 2013. https://doi.org/https​://doi.org/10.4236/ develop multiple local models and fts a unique algorithm to ars.2013.24040​ each regression point. In this study, we had a limited number Allbed A, Kumar L, Aldakheel YY (2014) Assessing soil salinity using of regression points, 58, and creating local models for these soil salinity and vegetation indices derived from IKONOS high- numbers was not complicated. However, implementing the spatial resolution imageries: applications in a date palm domi- nated region. Geoderma 230–231:1–8. https​://doi.org/10.1016/j. GWR method for larger datasets and big data analysis may geode​rma.2014.03.025 be challenging. Brunner P, Li HT, Kinzelbach W, Li WP (2007) Generating soil electri- The visual interpretation of the results, and showing the cal conductivity maps at regional level by integrating measure- outputs in the form of maps and fgures is another chal- ments on the ground and remote sensing data. Int J Remote Sens 28:3341–3361. https​://doi.org/10.1080/01431​16060​09286​41 lenging issue in applying the GWR technique, as mentioned Brunsdon C, Fotheringham AS, Charlton ME (1996) Geographi- in Matthews and Yang (2012). For each predictor variable, cally weighted regression: a method for exploring spatial some interpretive maps can be developed based on the nonstationarity. Geographical Anal 28:281–298. https​://doi. obtained numerical values, which results in having numer- org/10.1111/j.1538-4632.1996.tb009​36.x Cuevas J, Daliakopoulos IN, del Moral F et al (2019) A review of ous output maps, especially when the number of explanatory soil-improving cropping systems for soil salinization. Agronomy variables is large. The obtained maps can be used to divide 9:295. https​://doi.org/10.3390/agron​omy90​60295​ the scene into diferent patches, which are representative of Elhag M (2016) Evaluation of diferent soil salinity mapping using the diferent soil characteristics in the study site. Using dif- remote sensing techniques in arid ecosystems, Saudi Arabia. J Sens 2016:e7596175. https​://doi.org/10.1155/2016/75961​75 ferent ancillary data, geomorphological maps of the study Fallahi S, Banaei MH, Eskandarzadeh Y (1983) Report of semi- site, texture analysis, and various environmental factors detailed studies of soil sience Qom-Masileh. Soil and water would be benefcial to improve the efciency of constructed research institute. Iran Tech J 628:1–149 patches in the study site. Fotheringham AS, Brunsdon C, Charlton M (2002) Geographically weighted regression: the analysis of spatially varying relation- ships, 1st edn. Hoboken, NJ, USA, Wiley, Chichester, England Georganos S, Abdi AM, Tenenbaum DE, Kalogirou S (2017) Examin- Conclusion ing the NDVI-rainfall relationship in the semi-arid Sahel using geographically weighted regression. J Arid Environ 146:64–74. https​://doi.org/10.1016/j.jarid​env.2017.06.004 The present study focused on developing local regression George J, Kumar S (2015) Hyperspectral remote sensing in character- techniques, as an alternative to commonly used global izing soil salinity severity using SVM technique -a case study of regression approaches, for soil salinity estimation by the alluvial plains. Int J Adv Remote Sens GIS 2015:1344–1360461. potential of high-resolution sentinel-2 multispectral imagery. https://doi.org/https​://doi.org/10.23953​/cloud​.ijars​g.122 Gorji T, Sertel E, Tanik A (2017) Monitoring soil salinity via remote Evaluating the obtained results proved that the GWR algo- sensing technology under data scarce conditions: a case study rithm outperformed other global regression techniques, both from Turkey. Ecol Ind 74:384–391. https://doi.org/10.1016/j.ecoli​ ​ linear and nonlinear functions, as it considers spatial vari- nd.2016.11.043 ations and nonstationary relationships between predictors Guo L, Chen Y, Shi T et al (2017) Exploring the role of the spatial characteristics of visible and near-infrared refectance in predict- and response variables. In comparison with other relevant ing soil organic carbon density. ISPRS Int J Geo-Information studies, the numerical outputs obtained from implementing 6:308. https​://doi.org/10.3390/ijgi6​10030​8 the GWR technique were analyzed in detail in this research, Harris P, Fotheringham S, Crespo R, Charlton M (2010) The use of and their strengths and drawbacks to characterize the study geographically weighted regression for spatial prediction: an evaluation of models using simulated data sets. Math Geosci site with regard to salinity hazard were investigated. The 42:657–680. https​://doi.org/10.1007/s1100​4-010-9284-7 visual interpretation of the results was obtained in this study, Harris P, Brunsdon C, Fotheringham S (2011) Links, comparisons and and the maps of local estimates of coefcients, t-values, extensions of the geographically weighted regression model when and local R2 values were provided for each predictor. The used as a spatial predictor. Stochastic Hydrol Hydraul 25:123– 138. https​://doi.org/10.1007/s0047​7-010-0444-6 local estimates of coefcients have similar values in a spe- Ivushkin K, Bartholomeus H, Bregt AK et al (2019) UAV based soil cifc land cover class, which makes it possible to separate salinity assessment of cropland. Geoderma 338:502–512. https://​ main land cover types in the study site. The obtained maps doi.org/10.1016/j.geode​rma.2018.09.046 from t-values and local R2 revealed that the local regression Kala AK, Tiwari C, Mikler AR, Atkinson SF (2017) A comparison of least squares regression and geographically weighted regression algorithms do not have the optimum performance at edges, modeling of West Nile virus risk based on environmental param- where two or more land cover classes were separated. This eters. PeerJ 5:e3070. https​://doi.org/10.7717/peerj​.3070 suggests that increasing the number of land cover types, Li S, Zhao Z, Miaomiao X, Wang Y (2010) Investigating spatial non- results in the failure of a global regression approach to per- stationary and scale-dependent relationships between urban sur- face temperature and environmental factors using geographically form well in estimating soil salinity.

1 3 75 Page 14 of 14 Environmental Earth Sciences (2021) 80:75

weighted regression. Environ Modelling Softw 25:1789–1800. Scudiero E, Teatini P, Manoli G et al (2018) Workfow to establish https​://doi.org/10.1016/j.envso​ft.2010.06.011 time-specifc zones in precision agriculture by spatiotemporal Li Y, Zhao K, Ren J et al (2014) Analysis of the dielectric constant integration of plant and soil sensing data. Agronomy 8:253. https​ of saline-alkali soils and the efect on radar backscattering coef- ://doi.org/10.3390/agron​omy81​10253​ fcient: a case study of soda alkaline saline soils in Western Jilin Suhet, Hoersch B (2015) Sentinel-2 user handbook. ESA Standard Province using RADARSAT-2 data. Sci World J 2014:e563015. Document. Issue 1. Revision 1. European Space Agency (ESA). https​://doi.org/10.1155/2014/56301​5 Taghadosi MM, Hasanlou M (2017) Trend analysis of soil salinity in Li Z, Li Y, Xing A et al (2019) Spatial prediction of soil salinity in diferent land cover types using LANDSAT TIME SERIES DATA a semiarid oasis: environmental sensitive variable selection and (case study Bakhtegan Salt Lake). In: ISPRS—International model comparison. Chin Geogra Sci 29:784–797. https​://doi. Archives of the Photogrammetry, Remote Sensing and Spatial org/10.1007/s1176​9-019-1071-x Information Sciences. Copernicus GmbH, pp 251–257 Lloyd CD (2010) Nonstationary models for exploring and mapping Taghadosi MM, Hasanlou M, Eftekhari K (2018) Soil salinity map- monthly precipitation in the United Kingdom. Int J Climatol ping using dual-polarized SAR Sentinel-1 imagery. Int J Remote 30:390–405. https​://doi.org/10.1002/joc.1892 Sens 0:1–16. https://doi.org/https​://doi.org/10.1080/01431​ Lu B, Charlton M, Fotheringhama AS (2011) Geographically weighted 161.2018.15127​67 regression using a non-euclidean distance metric with a study on Taghadosi MM, Hasanlou M, Eftekhari K (2019) Retrieval of soil London house price data. Proc Environ Sci 7:92–97. https​://doi. salinity from Sentinel-2 multispectral imagery. Euro J Remote org/10.1016/j.proen​v.2011.07.017 Sens 52:138–154. https://doi.org/10.1080/22797​ 254.2019.15718​ ​ Matthews SA, Yang T-C (2012) Mapping the results of local statistics: 70 Using geographically weighted regression. Demogr Res 26:151– Tran T, Tran D, Pham H, et al (2020) Exploring spatial relationship 166. https​://doi.org/10.4054/DemRe​s.2012.26.6 between electrical conductivity and spectral salinity indices in the Mennis J (2006) Mapping the results of geographically weighted regres- mekong delta. J Environ Sci Manag 23 sion. Cartographic J 43:171–179. https​://doi.org/10.1179/00087​ Vermeulen D, Niekerk A van (2016) Evaluation of a worldview-2 0406X​11465​8 image for soil salinity monitoring in a moderately afected irri- Mueller-Wilm U, Devignot O, Pessiot L (2016) S2 MPC Sen2Cor gated area. JARS, JARSC4 10:026025. https://doi.org/https://doi.​ confguration and user manual (S2-PDGS MPC-L2A-SUM-V2.3 org/10.1117/1.JRS.10.02602​5 Issue: 01). European Space Agency (ESA). http://step.esa.int/ Veronesi F, Schillaci C (2019) Comparison between geostatistical and main/third​-party​-plugi​ns 2/sen2cor/. machine learning models as predictors of topsoil organic carbon Murayama Y (2012) Progress in geospatial analysis. Springer Science with a focus on local uncertainty estimation. Ecol Ind 101:1032– & Business Media 1044. https​://doi.org/10.1016/j.ecoli​nd.2019.02.026 Nakaya T, Charlton M, Lewis P, et al (2014) GWR4 user manual. Wang Q, Ni J, Tenhunen J (2005) Application of a geographically- Windows Application for Geographically Weighted Regression weighted regression analysis to estimate net primary production Modelling. of Chinese forest ecosystems. Glob Ecol Biogeogr 14:379–393 Nie S, Bian J, Zhou Y (2020) Estimating the spatial distribution of soil Weng Y-L, Gong P, Zhu Z-L (2010) A spectral index for estimating salinity with geographically weighted regression kriging and its soil salinity in the yellow river delta region of China using EO-1 relationship to groundwater in the Western Jilin Irrigation Area, hyperion data. Pedosphere 20:378–388. https​://doi.org/10.1016/ Northeast China. Pol J Environ Stud 30:283–294. https://doi. S1002​-0160(10)60027​-6 org/https​://doi.org/10.15244​/pjoes​/12198​8 Zhang T-T, Qi J-G, Gao Y et al (2015) Detecting soil salinity with Richards LA (1954) Diagnosis and improvement of Saline and Alkali MODIS time series VI data. Ecol Ind 52:480–489. https​://doi. Soils. Soil Sci 78:154 org/10.1016/j.ecoli​nd.2015.01.004 Schillaci C, Acutis M, Lombardo L et al (2017) Spatio-temporal topsoil organic carbon mapping of a semi-arid Mediterranean region: the Publisher’s Note Springer Nature remains neutral with regard to role of land use, soil texture, topographic indices and the infu- jurisdictional claims in published maps and institutional afliations. ence of remote sensing data to modelling. Sci Total Environ 601–602:821–832. https://doi.org/10.1016/j.scito​ tenv.2017.05.239​ Schillaci C, Saia S, Acutis M (2018) Modelling of soil organic car- bon in the mediterranean area: a systematic map. ROL 46/2018: https://doi.org/https​://doi.org/10.3301/ROL.2018.68

1 3