Applying GRACE satellite data and hydro- meteorological parameters for predicting the combined terrestrial evapotranspiration index (CTEI) based on an improved SVM and a greedy regression approach
Ahmed Elbeltagi ( [email protected] ) Mansoura University Faculty of Agriculture Mehdi Jamei shohadaye university Ankur Srivastava Newcastle University Iman Ahmadianfar -- Mahfuzur Rahman - Harun I. Gitari ty Jaydeo K. Dharpure e
Research Article
Keywords: Drought index, Climate change, Greedy regression method, GRACE, Remote sensing, Support vector machine
Posted Date: February 23rd, 2021
DOI: https://doi.org/10.21203/rs.3.rs-222279/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License 1 Applying GRACE satellite data and hydro-meteorological parameters for
2 predicting the combined terrestrial evapotranspiration index (CTEI) based on
3 an improved SVM and a greedy regression approach
1,2,* 3 4 5 4 Ahmed Elbeltagi , Mehdi Jamei , Ankur Srivastava , Iman Ahmadianfar , Mahfuzur
5 Rahman6 , Harun I. Gitari7, Jaydeo K. Dharpure 8,*
6 1Agricultural Engineering Dept., Faculty of Agriculture, Mansoura University, Mansoura 35516, 7 Egypt
8 2College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, 9 China
10 3Faculty of Engineering, Shohadaye Hoveizeh University of Technology, Dasht-e Azadegan, Iran
11 4School of Engineering, The University of Newcastle, Callaghan, NSW 2308, Australia
12 5Department of Civil Engineering, Behbahan Khatam Alanbia University of Technology, 13 Behbahan, Iran
14 6 International University of Business Agricultural Technology, Dhaka 1230, Bangladesh
15 7 Department of Agricultural Science and Technology, School of Agriculture and Enterprise 16 Development, Kenyatta University, P. O. Box 43844-00100, Nairobi, Kenya.
17 8 Centre of Excellence in Disaster Mitigation and Management (CoEDMM), Indian Institute of 18 Technology Roorkee, Uttarakhand, India
19 *Corresponding author: Correspondence to Ahmed Elbeltagi ([email protected]);
20 Jaydeo K. Dharpure ([email protected]). 21 Applying GRACE satellite data and hydro-meteorological parameters for
22 predicting the combined terrestrial evapotranspiration index (CTEI) based on
23 an improved SVM and a greedy regression approach
24 Abstract
25 Drought has become relatively one of the widespread extreme events that affect socio-economic
26 sphere, ecological balance, environmental conditions, crop-yields, climate feedback mechanism,
27 and water resources management. Monitoring of the drought is one of the challenging tasks,
28 hence required to develop new techniques in prediction of this natural calamity accurately and
29 timely. In this work, Combined Terrestrial Evapotranspiration Index (CTEI) that is forced
30 through precipitation (P) and potential evapotranspiration (PET) is used for estimating drought.
31 Also, this is a novel index developed for Indus-Ganga river basin, which also uses a Gravity
32 Recovery and Climate Experiment (GRACE) terrestrial water storage anomalies (TWSA) data.
33 Thirteen input-variables comprised of GRACE-TWSA, Global Land Data Assimilation System
34 GLDAS-TWSA, Groundwater Storage Anomaly (GWSA), P, Land Surface Temperature
35 ( LST), Wind Speed (WS), Evapotranspiration (ET), Runoff, air temperature ( ), PET, net-
36 shortwave (SWN) and long-wave (LWN) along with net radiation (NR) radiations were
37 implemented for predicting CTEI. Further, thirteen different combinations were analyzed using
38 an improved Sequential Minimal Optimization (SMO) algorithm for support vector machine
39 (SVM) and a greedy linear regression method. Therefore, the objectives of this study are to
40 develop several combinations based on machine learning models used for modeling CTEI,
41 compare the accuracy and stability of these models in CTEI prediction, and select the best
42 outcomes have been provided from these models. The performed error measures and statistical 43 indicators prove that Combo 2 and 3 owing to SVM model are superior to the best combinations
44 of Greedy model. Overall based on the results, the improved SVM is superior to Greedy model in
45 estimating CTEI. SVM (Combo2: R=0.81) in spite of having a higher range of relative error, is
46 superior to Greedy model (Combo3, R=0.77) for estimating CTEI. This method can be one of
47 promising alternatives for estimating CTEI over major river basins across the world.
48 Keywords: Drought index; Climate change; Greedy regression method; GRACE; Remote
49 sensing; Support vector machine.
50 1 Introduction
51 One of the most significant catastrophic phenomena in nature is drought that has
52 potentially severe outcomes on agriculture, socio-economic, and ecosystems (Wilhite 2000; Mo
53 2011). Drought-related disasters have been a large increase by changing climate worldwide over
54 the past decades, although with different intensities (Allen et al. 2011). In addition, with respect
55 to the global warming phenomenon, the frequency and effect of drought are gradually growing;
56 hence, it is very important to make a deep study on the drought phenomenon. With the purpose of
57 quantifying drought, many researchers used hydrological and meteorological factors affecting
58 drought to create various drought indices and compare drought frequency, duration, and intensity
59 and its effect on temporal and spatial scales (Marcos-Garcia et al. 2017; Oloruntade et al. 2017).
60 In this regard, several drought indices were introduced as a primary factor for predicting
61 drought such as Palmer Drought Severity Index (PDSI) (Palmer 1965), Standardized Precipitation
62 Index (SPI) (McKee et al. 1993), and Standardized Runoff Index (SRI) (Shukla and Wood, 2008;
63 Paul et al., 2018),Vegetation Drought Response Index (VegDRI) (Brown et al. 2008),
64 Standardized Stream flow Index (SSI) (Shukla and Wood 2008; Vicente-Serrano et al. 65 2010),Standardized Precipitation Evapotranspiration Index (SPEI) (Vicente-Serrano et al. 2010),
66 Multivariate Standardized Drought Index (MSDI) (Hao and AghaKouchak 2013), Gravity
67 Recovery and Climate Experiment (GRACE)derived Drought Severity Index (DSI) (Feng et al.
68 2017; Zhao et al. 2017),water storage deficit index (WSDI) (Sun et al. 2018), Combined
69 Climatologic Deviation Index (CCDI) (Sinha et al. 2019), and Combined Terrestrial
70 Evapotranspiration Index (CTEI) (Dharpure et al. 2020).
71 Among the above-stated drought indices, the SPI, PDSI, and SRI are the most common
72 indicators applied to drought characterization. The SPI uses the long-term precipitation dataset
73 (Wang et al. 2016), while the PDSI considers evaporation, precipitation, soil moisture, runoff,
74 and other variables (Zargar et al. 2011) to estimate long-term drought periods. Besides, the SPEI
75 is a drought indicator based on the PDSI and SPI for expressing meteorological drought. Based
76 on the SPEI, Wang and Chen (2014) evaluated the application of this indicator in China and
77 indicated that it has a good capability to present drought occurrence. Zhao et al. (2017)
78 demonstrated that the SPEI was suitable for monitoring drought in both short- and long-term in
79 China. Rivera et al. (2017) presented that the SSI can precisely show the extreme hydrological
80 drought in the Central Andes, Argentina. Tirivarombo et al, (2018) proved that the SPEI can
81 better describe drought than the SPI in the Kafue basin due to consider the evapotranspiration
82 factor. Dharpure et al.(2020) presented the CTEI by combining the GRACE and meteorological
83 variables such as precipitation (P) and potential evapotranspiration (PET). They indicated that
84 this index has several advantages compared with the other drought indices. For instance, it can
85 decrease simulation, numerical modeling, and parameterization. Also, the CTEI yielded a
86 measurement of climatic parameters and water storage variations to determine hydrological
87 drought occurrences. 88 Predicting drought to decrease the damage made by the drought phenomenon is an
89 inevitable issue. The most commonly utilized prediction models are artificial neural network
90 (ANN) Borji et al., (2016),support vector machines (SVMs) (Ahmad et al. 2010; Weng 2012).
91 Khan et al. (2020)developed wavelet-based hybrid ANN-autoregressive integrated moving
92 average (ARIMA) models for predicting the SPI and the Standard Index of Annual Precipitation
93 (SIAP). They indicated that the proposed model can provide very promising accuracy to predict
94 the drought events. Li et al. (2020) used three data models (i.e., SVM, random forest (RF), and
95 ARIMA) to predict the SPEI in the Guangzhong area in China. The authors showed that the RF
96 and SVM models have better performance to predict the SPEI than the ARIMA model. Three
97 data-driven models (i.e., SVM, ANN, and k-Nearest Neighbor (KNN)) were investigated for
98 predicting drought over Pakistan (Khan et al., 2020). According to the results, the SVM and ANN
99 were more accurate than the KNN to predict the droughts.
100 The main goal of the present study was to introduce a novel data-driven model for
101 predicting extreme droughts over India based on the Combined Terrestrial Evapotranspiration
102 Index (CTEI). This study employs the CTEI as the drought index because of its incorporation of
103 the GRACE data and meteorological variables (i.e., P and PET), thus better show the drought
104 than the other indices (Dharpure et al. 2020). In this study, two novel models called SMO-SVM
105 and greedy models are utilized for predicting the CTEI, which are not yet employed in
106 developing drought prediction models. Therefore, an efficiency comparison between the CTEI
107 prediction methods developed with the SMO-SVM and greedy methods will make an unrivaled
108 opportunity for specifying their pros and cons. This study is also the first research which
109 implemented the development of India-wide drought prediction models. Consequently, the 110 proposed drought prediction models can help in better managing water resources systems
111 required for India’s agriculture-dependent society.
112
113 2 Material and methods
114 2.1 Study Area
115 The Ganga river basin (GRB) is one of the most populous (about 440 million people)
116 river systems in the World (Anand et al. 2018). The basin is situated in the northern part of the
117 country and lies between latitude 21º 32’ 8.6” - 31º 27’ 36.2” N and longitude 73º 14’ 33.4” - 90º
118 53’ 18.9” E, over an area of 10,86,000 km2 (Fig. 1).The GRB outspreads in four countries, i.e.,
119 India (79%), Nepal (14%), Bangladesh (4%), and China (3%). In India, it covers 8,61,452 km2,
120 which is nearly 26% of the total geographic area of the country(India-WRIS 2012). The GRB
121 originates in the Himalayan Mountains at the snout of the Gangotri glacier at an elevation of
122 ~7000 m a.s.l. The confluence of the Bhagirathi River and Alaknanda Rivers joins in the town of
123 Devprayag, then officially called the Ganga River. The main tributaries of the Ganga River are
124 the Yamuna, the Ramganga, the Gomti, the Ghaghra, the Sone, the Gandak, the Kosi, and the
125 Mahananda. And, it flows for about 2,510 km, generally southeastward, through a vast plain to
126 the Bay of Bengal.
127 The main source of water in the Ganga River is a surface runoff generated by the
128 precipitation (~66%), base flow (~14%), glacier melt (~11.5%), and snowmelt (~8.5%). The
129 GRB received 84% of total rainfall during the monsoon season (June to October). Though, the
130 monsoon season accounts for 75% of the rain in the upper basin and 85% of the rain in the lower
131 basin (Shrestha et al. 2015). The elevation range varies from sea level to the highest mountain
132 peak (~8850 m a.s.l) in the world. 133 2.2 Data used
134 2.2.1 GRACE Terrestrial Water Storage anomalies (TWSA)
135 In this study, the 168 monthly solutions of GRACE data (level-3 RL-05, special
136 harmonics) at 1°×1° spatial resolution were acquired from three research agencies, i.e., Center
137 for Space Research (CSR) at the University of Austin/Texas, NASA Jet Propulsion Laboratory
138 (JPL) and the German Research Centre for Geosciences (GFZ) were used for TWSA changes
139 from January 2003 to December 2016 (14 years) (Table 1). However, 17 months of GRACE
140 solutions were missing during the observation period; therefore, the missing solution of a
141 particular month was replaced by the available solutions of sequent months (before and after)of
142 the missing months (Long et al. 2015; Yang et al. 2017). The GRACE TWSA obtained from
143 three data centre’s solution (JPL, GFZ, and CSR) were averaged for reducing the noise in the
144 gravity field (Sakumura et al. 2014; Xiao et al. 2015). The TWSA includes the groundwater
145 storage (GWSA), soil moisture storage (SMSA), canopy water storage (CWSA), surface water
146 storage (SWSA), and snow water equivalent (SWEA) anomaly, expressed by Eq. 1.
147 (1)
148 2.2.2 GLDAS observation
149 GLDAS is a joint project designed by NASA, the National Oceanic and Atmospheric
150 Administration (NOAA), and the National Centers Environmental Prediction (NCEP) by
151 integrating the hydrological components obtained from the ground and satellite-based observation
152 with fine spatial and temporal resolution (Rodell et al. 2004). The details of the data products are
153 described by Rui and Beaudoing (2018). The GLDAS data comprises four Land Surface models
154 (LSM) data, i.e., the Community Land Model (CLM2.0) (Dai et al. 2003), Variable Infiltration 155 Capacity (VIC) (Liang et al., 1994; Srivastava et al., 2020a, 2020b; Maza et al., 2020), NOAH
156 (Chen et al. 1996; Ek et al. 2003), and Mosaic (Koster and Suarez 1996). The spatial resolution of
157 all the LSMs data is about 1°× 1°. For monthly TWS data through GLDAS, a summation of
158 monthly soil moisture (SM) layer, snow water equivalent (SWE), and the Canopy Water Storage
159 (CWS) from four LSM datasets were used from 2003 to 2016 (Table 1). The average of four
160 LSM datasets was used to estimate the monthly TWS with minimum bias (Yang et al. 2017).
161 Other researchers adopted a similar approach(Ahmed and Abdelmohsen 2018; Wu et al. 2019).
162 None of these LSMs dataset includes groundwater storage and surface water storage (SWS)(Dai
163 et al. 2003; Rodell et al. 2004). We assume that the SWS in the study area likely to be a minor
164 component or small contribution over the region; therefore, we have neglected in this study.
165 Numerous studies neglect the SWS changes for GWS estimation (Strassberg et al. 2007; Rodell
166 et al. 2009; Tiwari et al. 2009). The GLDAS TWSA was converted into anomalies with the same
167 consideration of GRACE data (baseline period of January 2004 to December 2009). The GWSA
168 is obtained by subtracting the model-based from the (Sun et al. 2019)
169 by rearranging Eq. 1. GLDAS has been used in numerous studies to isolate the GWSA from the
170 GRACE derived TWSA in different regions of the world (Rodell et al. 2007; Leblanc et al. 2009;
171 Tiwari et al. 2009).
172 2.2.3 Tropical Rainfall Measuring Mission (TRMM)
173 The Tropical Rainfall Measuring Mission (TRMM) 3B43monthly research Version 7
174 (TRMM-3B43-V7) precipitation data at 0.25º × 0.25º spatial resolution was used over the Ganga
175 river basins during the period 2003–2016 (Huffman et al. 2018) (Table 1). This product is
176 designed for global precipitation analysis, and its algorithm combines several instruments
177 (Huffman et al. 2007). TRMM precipitation products were widely used worldwide. Several 178 authors have compared the TRMM data with observational data and reported good accuracy (Jia
179 et al. 2011; Yang et al. 2017; Khan et al. 2018).
180 2.2.4 Potential Evapotranspiration (PET)
181 The daily global potential evapotranspiration (PET) with a spatial resolution of 1°× 1°
182 datasets were used from 2003 to 2016 (Table 1). This data is generated from the climate
183 parameter, i.e., extracted from the Global Data Assimilation System (GDAS) analysis fields. The
184 NOAA generates the GDAS data every 6 hours, and it is freely available on the USGS website
185 https://earlywarning.usgs.gov/fews/product/81. The daily PET is calculated on a spatial basis
186 using the Penman-Monteith equation (Allen et al. 2006; Shuttleworth 1992; Srivastava et al.
187 2017, 2018; Kumari and Srivastava 2020). The monthly and yearly PET was obtained using the
188 accumulation of daily data.
189 2.2.5 CTEI Description and Calculation
190 This index was developed by Dharpure et al. (2020) using hydrological and climatological
191 conditions over the Indus, Ganga, and Brahmaputra river basins, and also highlighted that the
192 CTEI was highly correlated with the ground observation wells. The CTEI index was derived
193 using satellite data and hydro-meteorological parameters for the period 2003–2016. They have
194 also revealed that this index is able to quantify the drought occurrence and their severity on
195 spatial as well as temporal scale. This model presents a comprehensive picture to estimate the
196 drought events at regional scale where the ground observation is limited. The normalized
197 distribution of all input variables in modeling CTEI is illustrated in Fig.2. 198 2.3 Methodology
199 2.3.1 SMO-SVM Algorithm
200 One of the most advanced and relatively new machine-learning algorithms in the data-
201 dominated science and research field can be assigned to Support Vector Machine (SVM)
202 algorithms. These SVMs are basically structured on the statistical learning theories and developed
203 from the conceptual optimization hypothesis (Vapnik 1995; Vapnik 1998). It is generally used in
204 order to achieve the best generalization capability for both the empirical relations and confidence
205 intervals of machine learning. These SVMs have always been shown to perform extremely well
206 and efficient for the optimizations and regressions studies (Vapnik et al. 1997; Collobert et al.
207 2001). These are assumed and proven to be highly robust in nature for extremely noise mixed
208 data in comparison to the other local models and algorithms which uses traditional chaotic
209 methods. The SVM estimation function (F) in any given regression scenario can be defined as:
210 2)
211 Where W is the weight age vector; Tf represents the nonlinear transfer function, which projects
212 the input vectors towards a very high dimension feature space; and b is the constant variable. Fig.
213 3 shows the schematic representation of the SVM algorithm.
214 A convex dual optimization problem was introduced by Vapnik (1995) that created an
215 insensitivity loss function. Several algorithms have been developed and suggested for solving
216 these dual optimization issues in the SVM (Shevade et al. 2000; Schölkopf and Smola 2002;
217 Adamala and Srivastava 2018). In this current study, the Sequential Minimal Optimization (SMO)
218 algorithm is used along with the SVM algorithm in order to solve the dual optimization problem.
219 Several studies, such as Platt (1999) and Schölkopf and Smola (2002), have used these SMO in 220 combination with SVMs. One of the major advantages for using SMO is that one can obtain any
221 analytical solution of the subset that can be directly obtained without the help of a quadratic
222 optimizer. Henceforth, all the model parameters of SVMs are trained by using the SMO
223 algorithm in this study. The programming codes from the Library for Support Vector Machines
224 (LIBSVM) are used for the calibration and validation of all the datasets (Chang and Lin 2011).
225 2.3.2 A Greedy Method
226 A greedy strategy is a simple, intuitive algorithmic paradigm that starts optimal solution
227 from the status quo and is used for solving problem-solving heuristic, and repeatedly chooses the
228 ‘superior’ choice from a fixed number of options based on the existing rather than following
229 situations until no alternative is available (Black 2005).There has been a wide existence of this
230 method in the literature for several decades.
231 In the greedy algorithm, there are several input parameters such as positive real numbers
232 , min, < 1, and a function : N × N → N. These parameters , min, and controls the
233 iteration duration for each simulation. The function takes the values and as inputs and sends
234 backs the several numbers of iterations towards the algorithm main stage (Fig.4). In order to
235 provide the best solutions, it is required to run the considerable iterations for maximizing the best
236 fit and its approximation. Mainly there are two steps to carry out this whole process, such as:
237 Initially, we apply a greedy algorithm in order to obtain the attainable solution, which is followed
238 by the destruction phase. In this phase, we eliminate some of the elements from the current
239 solution keeping the partial soluton aside. In the third stage of this process, which is called as
240 construction phase, we implement the algorithm to the partial solution to obtain a different
241 feasible solution.Further, we repeat the second and third stages until the best condition is
242 satisfied. The combination of all variables was implemented as inputs to two models, as shown in 243 Table 2 to select the best combination ofinput variables that has a high correlation with CTEI.
244 The analyzed datasets were divided into two sections: the training period represented 80% of the
245 dataset, and the remaining (20%) was for the testing period.
246 2.4 Statistical analysis
247 The actual data of CTEI and modeled values were compared through the period of this
248 study. To evaluate the accuracy of models, the following statistical indicators have been selected
249 using root mean square error, coefficient of determination, and mean absolute error (Elbeltagi et
250 al. 2020 b, a, c, d). All parameters defined as follows: is an observed or actual value,
251 is simulated or foreseen value, is the mean value of reference samples, and N is
252 the total number of data points.
253 1. Root mean square error
254 RMSE is the sample standard deviation of the differences between foreseen and actual values. It 255 is given by:
256 (3)
257 2. Mean Absolute Error
258 The mean absolute error evaluates the mean magnitude of the errors in a set of forecasts, without
259 considering their sign. It’s an average over the test sample of the absolute differences between
260 foreseen and actual values. It is defined as:
261 (4)
262 3. Relative Absolute Error
263 The relative absolute error takes the total absolute error and normalizes it by dividing by the total
264 absolute error of the simple predictor.
265 (5)
266 4. Root Relative Squared Error
267 The relative squared error takes the total squared error and normalizes it by dividing by the total
268 squared error of the simple predictor. By taking the square root of the relative squared error one
269 reduces the error to the same dimensions as the quantity being predicted.
270 (6)
271 3 Results and Discussions
272 The descriptive statistics were analyzed for the data collected over the studied years, as
273 shown in Table 3. This table showed that the SD had the highest value of 32.74 at SWN and the
274 lowest value of 0.33 at WS. As mentioned before, 13 input variables comprised of GRACE TWSA,
275 GLDAS TWSA, GWSA, P, LST, WS, ET, Runoff, Air Temp, NR, SWN, LWN, and PET were
276 implemented for prediction of CTEI. Among available all influence inputs, 13 different
277 combinations were examined using two soft computing models, namely a Greedy algorithm and
278 SMO-SVM models. Table 4 demonstrates the performance metrics of the Greedy model, based
279 on all datasets, for all combinations. The result clearly shows that Combo3 (R=0.77, RMSE=0.49,
280 MAE=0.38, and RAE=63.23%) and Combo2(R=0.77, RMSE=0.5, MAE=0.37, and 281 RAE=63.63%) by having highest correlation coefficient and lowest error metrics are identified as
282 the superior input combination for prediction of CTEI, respectively. Besides, the Combo13
283 (R=0.51 and RMSE=0.67) has the worth performance among all combinations. The statistical
284 criteria for the SMO-SVM model are listed in Table 5. It is crystal clear that Combo2 and
285 Combo3 have promising predictive performance in term of (R=0.81, RMSE=0.46, MAE=0.35
286 and RAE=57.25%) and (R=0.80, RMSE=0.46, MAE=0.34 and RAE=59.27%), respectively. The
287 statistical measures for training and testing phases are tabulated in Tables A and B in the
288 appendix. Fig. 5 depicted the comparison between Greedy and SMO-SVM models in terms of
289 five considered metrics for all combinations. A closer look at the bar plot of metrics reveals that
290 the best performance has occurred in Combo 2 and 3 for both the AI-based model, and it can be
291 included that SMO-SVM is superior to the Greedy model in estimating CTEI. Fig. 6 shows the
292 scatter plots of all combinations for Greedy and SMO-SVM models in both training and testing
293 phases. The distribution of the predicted points around line 1:1 demonstrates that for both Greedy
294 and SMO-SVM models, Combo2 and Combo3 have more consistency with the measured value
295 of CTEI. Although, the Combos 2, 3, and even Combo 4 of the SMO-SVM model perform better
296 than Combo 2 and Combo 3 of the greedy model. The results obtained generated better
297 performance compared to the findings of El Ibrahimi and Baali (2018), who applied SVM for
298 predicting standardized precipitation index (SPI) and found that the determination coefficients
299 (R2) ranged from 0.52 to 0.77 for training and testing periods. As well, the developed models
300 achieved an acceptable result than Dikshit et al. (2020), who stated that the highest correlation
301 coefficient for modeling Standard Precipitation Evaporation Index (SPEI) was 0.75 by using
302 SVM. In addition, These results are in agreement with those from Borji et al., (2016), who
303 simulated the stream flow drought index (SDI)of different multi timescales and indicated that R2
304 ranged from 0.65 to 0.99. Also, SVM has been used for predicting drought based on SPEI by Deo 305 et al. (2018) and found that R2 ranged from 0.52 to 0.98. Moreover, these findings were extremely
306 an acceptable and agree with those suggested by Fung et al., (2019), who applied an improved
307 SVM for estimation of SPEI-1, SPEI-3 and SPEI-6, at various timescales and their findings
308 resulted that R2 ranged from 0.83 to 0.95 for training and from 0.79 to 0.90 for validation periods.
309 Furthermore, our results are similar to the results of (Komasi et al. 2018), who found that SVM
310 algorithm-generated R2 of 0.86 for modeling SPI. Besides, these results coincide with the
311 observations of Shamshirband et al.(2020), who found that R2 varied from 0.66 to 0.92 for
312 prediction of SPEI, and varied from 0.67 to 0.90 for prediction of SPI during six-time delays.
313 Moreover, the model finding were similar to those revealed by Zhang et al., (2019), who forecast
314 the drought by applying SVM and illustrated that the R2 between SPEI calculated versus
315 observed values were 0.827 to 0.833.
316 To get a better evaluation, visualization was applied by using a violin plot for each
317 combination of models in Fig.7. Considering the shape of the violin plots, it can be concluded
318 that for Greedy model the Combos 1, 2, and 3 have the closed predictive performance in
319 predicting CTEI and Combo7 stand in forth rank. Also, in SMO-SVM the Combo 1, 2, 3, and 4
320 can yield promising results in comparison other combination. Moreover, the Taylor diagrams for
321 testing phase are depicted in Fig.5 for two considered model. Comparison between cut come of
322 two model prove that Combo2, 3 and 4 in SMO-SVM and Combo2 and Combo3 in Greedy
323 model due to having minimal physical distance to target point are identified as the best
324 combination for prediction of CTEI. Hereafter, Combos 2 and 3 form two models are examined
325 for error analysis to introduce the most efficient and accurate predictive model.
326 Fig.8 depicted the violin distribution function (Left) and diffusion plots (Right) of
327 relative error for the best selected combination of Greedy and SMO-SVM model in testing phase. 328 The error analysis the best selection combinations in testing phase implies that optimum relative
329 error range for Greedy model is belong to Comb2 with ( 405 Er 680 ) and for SMO-SVM
330 model is owing to the Combo2 with ( 627 Er 788 ).Besides, the attained result demonstrated
331 that the SMO-SVM (Combo2, R=0.81) in spite of having higher range of relative error is superior
332 to the Greedy model (Combo3, R=0.77) for estimation of CTEI.
333 Eventually, Fig.9 shows the cumulative frequency of relative percentage absolute error
334 (RPAE) for all best combination in both AI models in tree scales comprised of all datasets,
335 training and testing stages. According to the Fig.10, more than 60% of all data points for Greedy-
336 Combo2, Greedy-Combo3, SMO-SVM-Combo2, and SMO-SVM-Combo3 have the RPAE less
337 than79%, 75.35%, 64.23%, and 67.2, respectively. Furthermore, for 90% of all data points,
338 Greedy and SMO-SVM models can predict CTEI by the RPAE less than356.36, 356.36, 355, and
339 356.25 for second and third combinations, respectively. The cumulative frequency of RPAE for
340 training dataset (80% of all datasets) shows that the best selection input combinations of SMO-
341 SVM model are fairly promising approaches for prediction of CTEI. So, the performed error
342 analysis proves that Combo 2 and Combo 3 owing to SMO-SVM model are superior to the best
343 combinations of Greedy model. However, due to the close accuracy of the performance of the
344 Combo2 and Combo3 can be inferred that the Combo3 with less input variable (11 influence
345 variables) is considered as the first priority model for prediction of CTEI. It is worth noting that
346 the entire selected alternative AI models for range of ( 0.5 CTEI 0.5 ) cannot achieve to the
347 high accuracy for prediction of CTEI. 348 4 Conclusion
349 Assessment of drought is one of the most important tasks in the present condition as it
350 leads to several adverse effects on the soil-water-atmosphere cycles of the earth systems across
351 the different climate of the world. There exists a several techniques and machine learning
352 methods in the literature to quantify drought; however based on CTEI this is one of the unique
353 study that compares and contrast the relative role of improve SMO algorithm for support vector
354 machine and a greedy linear method. Thirteen input variables were implemented for prediction of
355 CTEI using an improved SMO algorithm for support vector machine and a greedy linear
356 regression method. Several combinations of each method were made to determine the best fit by
357 using the statistical indicators and error measures. Based on the results of greedy model, we
358 found that Combo3 and Combo2 respectively yield the highest performance than other combos
359 with having correlation coefficient, R=0.77, RMSE=0.49, MAE=0.38 and RAE=63.23%) and
360 (R=0.77, RMSE=0.5, MAE=0.37 and RAE=63.63%). These two combos have the highest value
361 of R and lowest error metrics and it suggested as the superior input combination for prediction of
362 CTEI, over IGB river basin. On the other hand, the least efficiency was shown by combo13 with
363 R = 0.51 and RMSE=57.25% among all other 13 combinations. When coming towards the
364 results of SMO-SVM mode, combo2 and combo2 were having highest performance with R=0.81,
365 RMSE=0.46, MAE=0.35 and RAE=57.25%) and (R=0.80, RMSE=0.46, MAE=0.34 and
366 RAE=59.27% respectively. These combinations are similar to the greedy model but SMO-SVM
367 has slight better performance. Among all the combos, the combo2 and combo3 are well suited for
368 the estimation of CTEI over IGB using both the soft computing techniques. Further, in support to
369 other error measure the RPAE also showed that combo2 and combo3 of Greedy and SMO-SMV
370 model performed better with having less than 79%, 75.35%, 64.23%, and 67.2, respectively. 371 Apart from this in the Greedy model the Combos 3 also has the closed predictive performance in
372 predicting CTEI and followed by Combo7. On the other hand, for SMO-SVM Combo 3 and 4
373 also yield promising results in comparison to rest of the other combinations. The cumulative
374 frequency of RPAE for training dataset (80% of all datasets) shows that the best selection input
375 combinations of SMO-SVM model are fairly promising approaches for prediction of CTEI. So,
376 the perform error analysis proves that Combo 2 and Combo 3 owing to SMO-SVM model are
377 superior to the best combinations of Greedy model and can be one of the promising alternatives
378 for the estimation of CTEI over large river basins of different hydro-climatic regions of the world.
379
380 Acknowledgments: Thanks to the National Centre for Polar and Ocean Research, Goa, India,
381 and Indian Institute of Technology, Roorkee, India
382 Authors Contribution
383 Ahmed Elbeltagi had the original idea of the research. Ahmed Elbeltagi, Jaydeo K. Dharpure,
384 Mehdi Jamei, Ankur Srivastava, Iman Ahmadianfar: Conceptualization, Formal analysis, Writing
385 - original draft, Writing - review & editing. Mahfuzur Rahman, Harun I. Gitari: Writing- review
386 & editing. All authors read and accept the manuscript.
387 Funding: This research did not receive any specific grant from funding agencies in the public, 388 commercial, or not-for-profit sectors.
389 Data Availability Statements: The datasets generated during and/or analysed during the current 390 study are available from the corresponding author on reasonable request.
391 Compliance with ethical standards 392 Ethics approval: The authors confirm that this article is original research and has not been 393 published or presented previously in any journal or conference in any language (in whole or in 394 part).
395 Conflict of interest: The authors declare that they have no conflict of interest
396 Consent to participate and consent to publish: The authors declare that they have consent to 397 participate and consent to publish
398
399 References
400 Adamala S, Srivastava A (2018) Comparative evaluation of daily evapotranspiration using
401 artificial neural network and variable infiltration capacity models. Agricultural Engineering
402 International: CIGR Journal, J 20(1):32–39
403 Ahmad S, Kalra A, Stephen H (2010) Estimating soil moisture using remote sensing data: A
404 machine learning approach. Adv Water Resour 33:69–80
405 Ahmed M, Abdelmohsen K (2018) Quantifying Modern Recharge and Depletion Rates of the
406 Nubian Aquifer in Egypt. Surv Geophys 39:729–751. doi: 10.1007/s10712-018-9465-3
407 Allen PM, Harmel RD, Dunbar JA, Arnold JG (2011) Upland contribution of sediment and
408 runoff during extreme drought: A study of the 1947-1956 drought in the Blackland Prairie,
409 Texas. J Hydrol 407:1–11. doi: 10.1016/j.jhydrol.2011.04.039
410 Allen RG, Perieira LS, Raes D, Smith M (2006) Irrigation and Drainage Paper Crop No. 56
411 Anand J, Gosain AK, Khosa R, Srinivasan R (2018) Regional scale hydrologic modeling for
412 prediction of water balance, analysis of trends in streamflow and variations in streamflow: 413 The case study of the Ganga River basin. J Hydrol Reg Stud 16:32–53. doi:
414 10.1016/j.ejrh.2018.02.007
415 Borji M, Malekian A, Salajegheh A, Ghadimi M (2016) Multi-time-scale analysis of hydrological
416 drought forecasting using support vector regression (SVR) and artificial neural networks
417 (ANN). Arab J Geosci 9:. doi: 10.1007/s12517-016-2750-x
418 Brown JF, Wardlow BD, Tadesse T, et al (2008) The Vegetation Drought Response Index
419 (VegDRI): A New Integrated Approach for Monitoring Drought Stress in Vegetation.
420 GIScience Remote Sens 45:16–46. doi: 10.2747/1548-1603.45.1.16
421 Chen F, Mitchell K, Schaake J, et al (1996) Modeling of land surface evaporation by four
422 schemes and comparison with FIFE observations. J Geophys Res Atmos 101:7251–7268.
423 doi: 10.1029/95JD02165
424 Dai Y, Zeng X, Dickinson RE, et al (2003) The common land model. Bull Am Meteorol Soc
425 84:1013–1023. doi: 10.1175/BAMS-84-8-1013
426 Deo RC, Salcedo-Sanz S, Carro-Calvo L, Saavedra-Moreno B (2018) Drought prediction with
427 standardized precipitation and evapotranspiration index and support vector regression
428 models. Elsevier Inc.
429 Dharpure JK, Goswami A, Patel A, et al (2020) Drought characterization using the Combined
430 Terrestrial Evapotranspiration Index over the Indus, Ganga and Brahmaputra river basins.
431 Geocarto Int 0:1–25. doi: 10.1080/10106049.2020.1756462
432 Dikshit A, Pradhan B, Alamri AM (2020) Temporal hydrological drought index forecasting for
433 New South Wales, Australia using machine learning approaches. Atmosphere (Basel) 11:1– 434 13. doi: 10.3390/atmos11060585
435 Ek MB, Mitchell KE, Y. L, et al (2003) Implementation of Noah land surface model advances in
436 the National Centers for Environmental Prediction operational mesoscale Eta model. J
437 Geophys Res 108:8851. doi: 10.1029/2002JD003296
438 El Ibrahimi A, Baali A (2018) Application of several artificial intelligence models for forecasting
439 meteorological drought using the standardized precipitation index in the Saïss Plain
440 (Northern Morocco). Int J Intell Eng Syst 11:267–275. doi: 10.22266/ijies2018.0228.28
441 Elbeltagi A, Deng J, Wang K, et al (2020a) Modeling long-term dynamics of crop
442 evapotranspiration using deep learning in a semi-arid environment. Agric Water Manag
443 241:106334. doi: 10.1016/j.agwat.2020.106334
444 Elbeltagi A, Deng J, Wang K, Hong Y (2020b) Crop Water footprint estimation and modeling
445 using an arti fi cial neural network approach in the Nile Delta , Egypt. Agric Water Manag
446 235:106080. doi: 10.1016/j.agwat.2020.106080
447 Elbeltagi A, Zhang L, Deng J, et al (2020c) Modeling monthly crop coefficients of maize based
448 on limited meteorological data : A case study in Nile Delta , Egypt. Comput Electron Agric
449 173:105368. doi: 10.1016/j.compag.2020.105368
450 Feng Y, Cui N, Gong D, et al (2017) Evaluation of random forests and generalized regression
451 neural networks for daily reference evapotranspiration modelling. Agric Water Manag
452 193:163–173. doi: 10.1016/j.agwat.2017.08.003
453 Fung KF, Huang YF, Koo CH, Mirzaei M (2019) Improved SVR machine learning models for
454 agricultural drought prediction at downstream of Langat River Basin, Malaysia. J Water 455 Clim Chang 1–16. doi: 10.2166/wcc.2019.295
456 Hao Z, AghaKouchak A (2013) A Nonparametric Multivariate Multi-Index Drought Monitoring
457 Framework. J Hydrometeorol 15:89–101. doi: 10.1175/jhm-d-12-0160.1
458 Huffman GJ, Bolvin DT, Braithwaite D, et al (2018) Algorithm Theoretical Basis Document
459 (ATBD) Version 5.2 NASA - NASA Global Precipitation Measurement (GPM) Integrated
460 Multi-satellitE Retrievals for GPM (IMERG) - Algorithm Theoretical Basis Document
461 (ATBD) Version 5.2. IMERG Algorithm Theor Basis Doc IMERG Algo:1–31. doi:
462 https://pmm.nasa.gov/resources/documents/gpm-integrated-multi-satellite-retrievals-gpm-
463 imerg-algorithm-theoretical-basis-
464 Huffman GJ, Bolvin DT, Nelkin EJ, et al (2007) The TRMM Multisatellite Precipitation Analysis
465 (TMPA): Quasi-Global, Multiyear, Combined-Sensor Precipitation Estimates at Fine Scales.
466 J Hydrometeorol 8:38–55. doi: 10.1175/JHM560.1
467 India-WRIS (2012) River basin atlas of India. RRSC-West, NRSC, ISRO, Jodhpur, India
468 Jia S, Zhu W, Lu A, Yan T (2011) A statistical spatial downscaling algorithm of TRMM
469 precipitation based on NDVI and DEM in the Qaidam Basin of China. Remote Sens Environ
470 115:3069–3079. doi: 10.1016/j.rse.2011.06.009
471 Khan A, Koch M, Chinchilla K (2018) Evaluation of Gridded Multi-Satellite Precipitation
472 Estimation (TRMM-3B42-V7) Performance in the Upper Indus Basin (UIB). Climate 6:76.
473 doi: 10.3390/cli6030076
474 Khan MMH, Muhammad NS, El-Shafie A (2020a) Wavelet based hybrid ANN-ARIMA models
475 for meteorological drought forecasting. J Hydrol 590:125380 476 Khan N, Sachindra DA, Shahid S, et al (2020b) Prediction of droughts over Pakistan using
477 machine learning algorithms. Adv Water Resour 139:103562
478 Komasi M, Sharghi S, Safavi HR (2018) Wavelet and cuckoo search-support vector machine
479 conjugation for drought forecasting using Standardized Precipitation Index (case study:
480 Urmia Lake, Iran). J Hydroinformatics 20:975–988. doi: 10.2166/hydro.2018.115
481 Koster RD, Suarez MJ (1996) Energy and Water Balance Calculations in the Mosaic LSM.
482 NASA Tech Memo 9:76
483 Leblanc MJ, Tregoning P, Ramillien G, et al (2009) Basin-scale, integrated observations of the
484 early 21st century multiyear drought in Southeast Australia. Water Resour Res 45:1–10. doi:
485 10.1029/2008WR007333
486 Li J, Zhang S, Huang L, et al (2020) Drought prediction models driven by meteorological and
487 remote sensing data in Guanzhong Area, China. Hydrol Res 1–17. doi: 10.2166/nh.2020.184
488 Liang X, Lettenmaier DP, Wood EF, Burges SJ (1994) A simple hydrologically based model of
489 land surface water and energy fluxes for general circulation models. J Geophys Res
490 99:14415. doi: 10.1029/94JD00483
491 Long D, Yang Y, Wada Y, et al (2015) Deriving scaling factors using a global hydrological
492 model to restore GRACE total water storage changes for China’s Yangtze River Basin.
493 Remote Sens Environ 168:177–193. doi: 10.1016/j.rse.2015.07.003
494 Malone BP, Styc Q, Minasny B, McBratney AB (2017) Digital soil mapping of soil carbon at the
495 farm scale: A spatial downscaling approach in consideration of measured and uncertain data.
496 Geoderma 290:91–99. doi: 10.1016/j.geoderma.2016.12.008 497 Marcos-Garcia P, Lopez-Nicolas A, Pulido-Velazquez M (2017) Combined use of relative
498 drought indices to analyze climate change impact on meteorological and hydrological
499 droughts in a Mediterranean basin. J Hydrol 554:292–305
500 McKee TB, Doesken NJ, Kleist J (1993) The relationship of drought frequency and duration to
501 time scales. In: Proceedings of the 8th Conference on Applied Climatology. Boston, pp 179–
502 183
503 Mo KC (2011) Drought onset and recovery over the United States. J Geophys Res Atmos 116:
504 Oloruntade AJ, Mohammad TA, Ghazali AH, Wayayok A (2017) Analysis of meteorological and
505 hydrological droughts in the Niger-South Basin, Nigeria. Glob Planet Change 155:225–233
506 Palmer WC (1965) Meteorological droughts. US Department of Commerce Weather Bureau
507 Research Paper 45, 58 pp
508 Rivera JA, Penalba OC, Villalba R, Araneo DC (2017) Spatio-temporal patterns of the 2010–
509 2015 extreme hydrological drought across the Central Andes, Argentina. Water 9:652
510 Rodell M, Chen J, Kato H, et al (2007) Estimating groundwater storage changes in the
511 Mississippi River basin (USA) using GRACE. Hydrogeol J 15:159–166. doi:
512 10.1007/s10040-006-0103-7
513 Rodell M, Houser PR, Jambor U, et al (2004) The Global Land Data Assimilation System. Am
514 Meteorol Soc
515 Rodell M, Velicogna I, Famiglietti JS (2009) Satellite-based estimates of groundwater depletion
516 in India. Nature 460:999–1002. doi: 10.1038/nature08238 517 Rui H, Beaudoing HK (2018) README Document for GLDAS Version 2 Data Products.
518 Goddard Earth Sci Data Inf Serv Cent (GES DISC) 1–32. doi:
519 http://hydro1.sci.gsfc.nasa.gov/
520 data/s4pa/GLDAS/GLDAS_NOAH10_M.2.0/doc/README_ GLDAS2.pdf
521 Sakumura C, Bettadpur S, Bruinsma S (2014) Ensemble prediction and intercomparison analysis
522 of GRACE time-variable gravity field models. Geophys Res Lett 41:1389–1397. doi:
523 10.1002/2013GL058632
524 Shamshirband S, Hashemi S, Salimi H, et al (2020) Predicting Standardized Streamflow index for
525 hydrological drought using machine learning models. Eng Appl Comput Fluid Mech
526 14:339–350. doi: 10.1080/19942060.2020.1715844
527 Shrestha AB, Agrawal NK, Alfthan B, et al (2015) The Himalayan Climate and Water Atlas
528 Shukla S, Wood AW (2008) Use of a standardized runoff index for characterizing hydrologic
529 drought. Geophys Res Lett 35:1–7. doi: 10.1029/2007GL032487
530 Shuttlewor WJ (1992) Evaporation
531 Sinha D, Syed TH, Reager JT (2019) Utilizing combined deviations of precipitation and
532 GRACE-based terrestrial water storage as a metric for drought characterization: A case
533 study over major Indian river basins. J Hydrol 572:294–307. doi:
534 10.1016/j.jhydrol.2019.02.053
535 Strassberg G, Scanlon BR, Rodell M (2007) Comparison of seasonal terrestrial water storage
536 variations from GRACE with groundwater-level measurements from the High Plains
537 Aquifer (USA). Geophys Res Lett 34:1–5. doi: 10.1029/2007GL030139 538 Sun AY, Scanlon BR, Zhang Z, et al (2019) Combining Physically Based Modeling and Deep
539 Learning for Fusing GRACE Satellite Data: Can We Learn From Mismatch? Water Resour
540 Res 1179–1195. doi: 10.1029/2018WR023333
541 Sun Z, Zhu X, Pan Y, et al (2018) Drought evaluation using the GRACE terrestrial water storage
542 deficit over the Yangtze River Basin, China. Sci Total Environ 634:727–738. doi:
543 10.1016/j.scitotenv.2018.03.292
544 Tirivarombo S, Osupile D, Eliasson P (2018) Drought monitoring and analysis: standardised
545 precipitation evapotranspiration index (SPEI) and standardised precipitation index (SPI).
546 Phys Chem Earth, Parts A/B/C 106:1–10
547 Tiwari VM, Wahr J, Swenson S (2009) Dwindling groundwater resources in northern India, from
548 satellite gravity observations. Geophys Res Lett 36:1–5. doi: 10.1029/2009GL039401
549 Vicente-Serrano SM, Beguería S, López-Moreno JI (2010) A multiscalar drought index sensitive
550 to global warming: the standardized precipitation evapotranspiration index. J Clim 23:1696–
551 1718
552 WANG L, CHEN W (2014) Applicability analysis of standardized precipitation
553 evapotranspiration index in drought monitoring in China. Plateau Meteorol 33:423–431
554 Wang Y, Li J, Feng P, Hu R (2016) Analysis of drought characteristics over Luanhe River basin
555 using the joint deficit index. J Water Clim Chang 7:340–352
556 Weng Q (2012) Remote sensing of impervious surfaces in the urban areas: Requirements,
557 methods, and trends. Remote Sens Environ 117:34–49
558 Wilhite DA (2000) Drought as a natural hazard: concepts and definitions 559 Wu Q, Si B, He H, Wu P (2019) Determining Regional-Scale Groundwater Recharge with
560 GRACE and GLDAS. Remote Sens 11:154. doi: 10.3390/rs11020154
561 Xiao R, He X, Zhang Y, et al (2015) Monitoring groundwater variations from satellite gravimetry
562 and hydrological models: A comparison with in-situ measurements in the mid-atlantic
563 region of the United States. Remote Sens 7:686–703. doi: 10.3390/rs70100686
564 Yang P, Xia J, Zhan C, et al (2017) Monitoring the spatio-temporal changes of terrestrial water
565 storage using GRACE data in the Tarim River basin between 2002 and 2015. Sci Total
566 Environ 595:218–228. doi: 10.1016/j.scitotenv.2017.03.268
567 Zargar A, Sadiq R, Naser B, Khan FI (2011) A review of drought indices. Environ Rev 19:333–
568 349
569 Zhang Y, Yang H, Cui H, Chen Q (2019) Comparison of the Ability of ARIMA, WNN and SVM
570 Models for Drought Forecasting in the Sanjiang Plain, China. Nat Resour Res 29:1447–1464.
571 doi: 10.1007/s11053-019-09512-6
572 Zhao M, Geruo A, Velicogna I, Kimball JS (2017) A Global Gridded Dataset of GRACE
573 Drought Severity Index for 2002–14: Comparison with PDSI and SPEI and a Case Study of
574 the Australia Millennium Drought. J Hydrometeorol 18:2117–2129. doi: 10.1175/JHM-D-
575 16-0182.1
576 Figures
Figure 1
Location map of the Ganga river basin and its tributaries along with varying elevation.
Figure 2 The normalized distribution of all input variables in modeling CTEI
Figure 3
A schematic daigram to explain the SVM algorithm (adapted from Gong et al., 2016) Figure 4
A schematic daigram to explain the Greedy algorithm (adapted from Cerrone et al., 2017)
Figure 5
The performance metrics bar plots for greedy and SVM-SMO models Figure 6
The scatter plots of all combinations for Greedy and SMO-SVM models (part 1) Figure 7
The comparison between the distribution function of measured and predicted CTEI in models for all datasets Figure 8
The Taylor diagram of all combination for Greedy (Up) and SMO-SVM (Down) models in testing phase Figure 9 the relative error (%) distribution function (Left) and measured CTEI versus relative error (%) (Right) for greedy and SMO-SVM model in superior combinations for testing phase Figure 10
The cumulative frequency of relative absolute error (%) for superior predictive model for index of CTEI
Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.
Tables.docx