<<

Applying GRACE satellite data and hydro- meteorological parameters for predicting the combined terrestrial evapotranspiration index (CTEI) based on an improved SVM and a greedy regression approach

Ahmed Elbeltagi (  [email protected] ) Mansoura University Faculty of Agriculture Mehdi Jamei shohadaye university Ankur Srivastava Newcastle University Iman Ahmadianfar -- Mahfuzur Rahman - Harun I. Gitari ty Jaydeo K. Dharpure e

Research Article

Keywords: Drought index, Climate change, Greedy regression method, GRACE, Remote sensing, Support vector machine

Posted Date: February 23rd, 2021

DOI: https://doi.org/10.21203/rs.3.rs-222279/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License 1 Applying GRACE satellite data and hydro-meteorological parameters for

2 predicting the combined terrestrial evapotranspiration index (CTEI) based on

3 an improved SVM and a greedy regression approach

1,2,* 3 4 5 4 Ahmed Elbeltagi , Mehdi Jamei , Ankur Srivastava , Iman Ahmadianfar , Mahfuzur

5 Rahman6 , Harun I. Gitari7, Jaydeo K. Dharpure 8,*

6 1Agricultural Engineering Dept., Faculty of Agriculture, Mansoura University, Mansoura 35516, 7 Egypt

8 2College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, 9

10 3Faculty of Engineering, Shohadaye Hoveizeh University of Technology, Dasht-e Azadegan, Iran

11 4School of Engineering, The University of Newcastle, Callaghan, NSW 2308,

12 5Department of Civil Engineering, Behbahan Khatam Alanbia University of Technology, 13 Behbahan, Iran

14 6 International University of Business Agricultural Technology, Dhaka 1230, Bangladesh

15 7 Department of Agricultural Science and Technology, School of Agriculture and Enterprise 16 Development, Kenyatta University, P. O. Box 43844-00100, Nairobi, Kenya.

17 8 Centre of Excellence in Disaster Mitigation and Management (CoEDMM), Indian Institute of 18 Technology Roorkee, Uttarakhand, India

19 *Corresponding author: Correspondence to Ahmed Elbeltagi ([email protected]);

20 Jaydeo K. Dharpure ([email protected]). 21 Applying GRACE satellite data and hydro-meteorological parameters for

22 predicting the combined terrestrial evapotranspiration index (CTEI) based on

23 an improved SVM and a greedy regression approach

24 Abstract

25 Drought has become relatively one of the widespread extreme events that affect socio-economic

26 sphere, ecological balance, environmental conditions, crop-yields, climate feedback mechanism,

27 and water resources management. Monitoring of the drought is one of the challenging tasks,

28 hence required to develop new techniques in prediction of this natural calamity accurately and

29 timely. In this work, Combined Terrestrial Evapotranspiration Index (CTEI) that is forced

30 through precipitation (P) and potential evapotranspiration (PET) is used for estimating drought.

31 Also, this is a novel index developed for Indus-Ganga river basin, which also uses a Gravity

32 Recovery and Climate Experiment (GRACE) terrestrial water storage anomalies (TWSA) data.

33 Thirteen input-variables comprised of GRACE-TWSA, Global Land Data Assimilation System

34 GLDAS-TWSA, Groundwater Storage Anomaly (GWSA), P, Land Surface Temperature

35 ( LST), Wind Speed (WS), Evapotranspiration (ET), Runoff, air temperature ( ), PET, net-

36 shortwave (SWN) and long-wave (LWN) along with net radiation (NR) radiations were

37 implemented for predicting CTEI. Further, thirteen different combinations were analyzed using

38 an improved Sequential Minimal Optimization (SMO) algorithm for support vector machine

39 (SVM) and a greedy linear regression method. Therefore, the objectives of this study are to

40 develop several combinations based on machine learning models used for modeling CTEI,

41 compare the accuracy and stability of these models in CTEI prediction, and select the best

42 outcomes have been provided from these models. The performed error measures and statistical 43 indicators prove that Combo 2 and 3 owing to SVM model are superior to the best combinations

44 of Greedy model. Overall based on the results, the improved SVM is superior to Greedy model in

45 estimating CTEI. SVM (Combo2: R=0.81) in spite of having a higher range of relative error, is

46 superior to Greedy model (Combo3, R=0.77) for estimating CTEI. This method can be one of

47 promising alternatives for estimating CTEI over major river basins across the world.

48 Keywords: Drought index; Climate change; Greedy regression method; GRACE; Remote

49 sensing; Support vector machine.

50 1 Introduction

51 One of the most significant catastrophic phenomena in nature is drought that has

52 potentially severe outcomes on agriculture, socio-economic, and ecosystems (Wilhite 2000; Mo

53 2011). Drought-related disasters have been a large increase by changing climate worldwide over

54 the past decades, although with different intensities (Allen et al. 2011). In addition, with respect

55 to the global warming phenomenon, the frequency and effect of drought are gradually growing;

56 hence, it is very important to make a deep study on the drought phenomenon. With the purpose of

57 quantifying drought, many researchers used hydrological and meteorological factors affecting

58 drought to create various drought indices and compare drought frequency, duration, and intensity

59 and its effect on temporal and spatial scales (Marcos-Garcia et al. 2017; Oloruntade et al. 2017).

60 In this regard, several drought indices were introduced as a primary factor for predicting

61 drought such as Palmer Drought Severity Index (PDSI) (Palmer 1965), Standardized Precipitation

62 Index (SPI) (McKee et al. 1993), and Standardized Runoff Index (SRI) (Shukla and Wood, 2008;

63 Paul et al., 2018),Vegetation Drought Response Index (VegDRI) (Brown et al. 2008),

64 Standardized Stream flow Index (SSI) (Shukla and Wood 2008; Vicente-Serrano et al. 65 2010),Standardized Precipitation Evapotranspiration Index (SPEI) (Vicente-Serrano et al. 2010),

66 Multivariate Standardized Drought Index (MSDI) (Hao and AghaKouchak 2013), Gravity

67 Recovery and Climate Experiment (GRACE)derived Drought Severity Index (DSI) (Feng et al.

68 2017; Zhao et al. 2017),water storage deficit index (WSDI) (Sun et al. 2018), Combined

69 Climatologic Deviation Index (CCDI) (Sinha et al. 2019), and Combined Terrestrial

70 Evapotranspiration Index (CTEI) (Dharpure et al. 2020).

71 Among the above-stated drought indices, the SPI, PDSI, and SRI are the most common

72 indicators applied to drought characterization. The SPI uses the long-term precipitation dataset

73 (Wang et al. 2016), while the PDSI considers evaporation, precipitation, soil moisture, runoff,

74 and other variables (Zargar et al. 2011) to estimate long-term drought periods. Besides, the SPEI

75 is a drought indicator based on the PDSI and SPI for expressing meteorological drought. Based

76 on the SPEI, Wang and Chen (2014) evaluated the application of this indicator in China and

77 indicated that it has a good capability to present drought occurrence. Zhao et al. (2017)

78 demonstrated that the SPEI was suitable for monitoring drought in both short- and long-term in

79 China. Rivera et al. (2017) presented that the SSI can precisely show the extreme hydrological

80 drought in the Central , Argentina. Tirivarombo et al, (2018) proved that the SPEI can

81 better describe drought than the SPI in the Kafue basin due to consider the evapotranspiration

82 factor. Dharpure et al.(2020) presented the CTEI by combining the GRACE and meteorological

83 variables such as precipitation (P) and potential evapotranspiration (PET). They indicated that

84 this index has several advantages compared with the other drought indices. For instance, it can

85 decrease simulation, numerical modeling, and parameterization. Also, the CTEI yielded a

86 measurement of climatic parameters and water storage variations to determine hydrological

87 drought occurrences. 88 Predicting drought to decrease the damage made by the drought phenomenon is an

89 inevitable issue. The most commonly utilized prediction models are artificial neural network

90 (ANN) Borji et al., (2016),support vector machines (SVMs) (Ahmad et al. 2010; Weng 2012).

91 Khan et al. (2020)developed wavelet-based hybrid ANN-autoregressive integrated moving

92 average (ARIMA) models for predicting the SPI and the Standard Index of Annual Precipitation

93 (SIAP). They indicated that the proposed model can provide very promising accuracy to predict

94 the drought events. et al. (2020) used three data models (i.e., SVM, random forest (RF), and

95 ARIMA) to predict the SPEI in the Guangzhong area in China. The authors showed that the RF

96 and SVM models have better performance to predict the SPEI than the ARIMA model. Three

97 data-driven models (i.e., SVM, ANN, and k-Nearest Neighbor (KNN)) were investigated for

98 predicting drought over Pakistan (Khan et al., 2020). According to the results, the SVM and ANN

99 were more accurate than the KNN to predict the droughts.

100 The main goal of the present study was to introduce a novel data-driven model for

101 predicting extreme droughts over India based on the Combined Terrestrial Evapotranspiration

102 Index (CTEI). This study employs the CTEI as the drought index because of its incorporation of

103 the GRACE data and meteorological variables (i.e., P and PET), thus better show the drought

104 than the other indices (Dharpure et al. 2020). In this study, two novel models called SMO-SVM

105 and greedy models are utilized for predicting the CTEI, which are not yet employed in

106 developing drought prediction models. Therefore, an efficiency comparison between the CTEI

107 prediction methods developed with the SMO-SVM and greedy methods will make an unrivaled

108 opportunity for specifying their pros and cons. This study is also the first research which

109 implemented the development of India-wide drought prediction models. Consequently, the 110 proposed drought prediction models can help in better managing water resources systems

111 required for India’s agriculture-dependent society.

112

113 2 Material and methods

114 2.1 Study Area

115 The Ganga river basin (GRB) is one of the most populous (about 440 million people)

116 river systems in the World (Anand et al. 2018). The basin is situated in the northern part of the

117 country and lies between latitude 21º 32’ 8.6” - 31º 27’ 36.2” N and longitude 73º 14’ 33.4” - 90º

118 53’ 18.9” E, over an area of 10,86,000 km2 (Fig. 1).The GRB outspreads in four countries, i.e.,

119 India (79%), Nepal (14%), Bangladesh (4%), and China (3%). In India, it covers 8,61,452 km2,

120 which is nearly 26% of the total geographic area of the country(India-WRIS 2012). The GRB

121 originates in the Himalayan Mountains at the snout of the Gangotri glacier at an elevation of

122 ~7000 m a.s.l. The confluence of the Bhagirathi River and Alaknanda Rivers joins in the town of

123 Devprayag, then officially called the Ganga River. The main tributaries of the Ganga River are

124 the Yamuna, the Ramganga, the Gomti, the Ghaghra, the Sone, the Gandak, the Kosi, and the

125 Mahananda. And, it flows for about 2,510 km, generally southeastward, through a vast to

126 the Bay of .

127 The main source of water in the Ganga River is a surface runoff generated by the

128 precipitation (~66%), base flow (~14%), glacier melt (~11.5%), and snowmelt (~8.5%). The

129 GRB received 84% of total rainfall during the monsoon season (June to October). Though, the

130 monsoon season accounts for 75% of the rain in the upper basin and 85% of the rain in the lower

131 basin (Shrestha et al. 2015). The elevation range varies from sea level to the highest mountain

132 peak (~8850 m a.s.l) in the world. 133 2.2 Data used

134 2.2.1 GRACE Terrestrial Water Storage anomalies (TWSA)

135 In this study, the 168 monthly solutions of GRACE data (level-3 RL-05, special

136 harmonics) at 1°×1° spatial resolution were acquired from three research agencies, i.e., Center

137 for Space Research (CSR) at the University of Austin/Texas, NASA Jet Propulsion Laboratory

138 (JPL) and the German Research Centre for Geosciences (GFZ) were used for TWSA changes

139 from January 2003 to December 2016 (14 years) (Table 1). However, 17 months of GRACE

140 solutions were missing during the observation period; therefore, the missing solution of a

141 particular month was replaced by the available solutions of sequent months (before and after)of

142 the missing months (Long et al. 2015; Yang et al. 2017). The GRACE TWSA obtained from

143 three data centre’s solution (JPL, GFZ, and CSR) were averaged for reducing the noise in the

144 gravity field (Sakumura et al. 2014; Xiao et al. 2015). The TWSA includes the groundwater

145 storage (GWSA), soil moisture storage (SMSA), canopy water storage (CWSA), surface water

146 storage (SWSA), and snow water equivalent (SWEA) anomaly, expressed by Eq. 1.

147 (1)

148 2.2.2 GLDAS observation

149 GLDAS is a joint project designed by NASA, the National Oceanic and Atmospheric

150 Administration (NOAA), and the National Centers Environmental Prediction (NCEP) by

151 integrating the hydrological components obtained from the ground and satellite-based observation

152 with fine spatial and temporal resolution (Rodell et al. 2004). The details of the data products are

153 described by Rui and Beaudoing (2018). The GLDAS data comprises four Land Surface models

154 (LSM) data, i.e., the Community Land Model (CLM2.0) (Dai et al. 2003), Variable Infiltration 155 Capacity (VIC) (Liang et al., 1994; Srivastava et al., 2020a, 2020b; Maza et al., 2020), NOAH

156 (Chen et al. 1996; Ek et al. 2003), and Mosaic (Koster and Suarez 1996). The spatial resolution of

157 all the LSMs data is about 1°× 1°. For monthly TWS data through GLDAS, a summation of

158 monthly soil moisture (SM) layer, snow water equivalent (SWE), and the Canopy Water Storage

159 (CWS) from four LSM datasets were used from 2003 to 2016 (Table 1). The average of four

160 LSM datasets was used to estimate the monthly TWS with minimum bias (Yang et al. 2017).

161 Other researchers adopted a similar approach(Ahmed and Abdelmohsen 2018; et al. 2019).

162 None of these LSMs dataset includes groundwater storage and surface water storage (SWS)(Dai

163 et al. 2003; Rodell et al. 2004). We assume that the SWS in the study area likely to be a minor

164 component or small contribution over the ; therefore, we have neglected in this study.

165 Numerous studies neglect the SWS changes for GWS estimation (Strassberg et al. 2007; Rodell

166 et al. 2009; Tiwari et al. 2009). The GLDAS TWSA was converted into anomalies with the same

167 consideration of GRACE data (baseline period of January 2004 to December 2009). The GWSA

168 is obtained by subtracting the model-based from the (Sun et al. 2019)

169 by rearranging Eq. 1. GLDAS has been used in numerous studies to isolate the GWSA from the

170 GRACE derived TWSA in different of the world (Rodell et al. 2007; Leblanc et al. 2009;

171 Tiwari et al. 2009).

172 2.2.3 Tropical Rainfall Measuring Mission (TRMM)

173 The Tropical Rainfall Measuring Mission (TRMM) 3B43monthly research Version 7

174 (TRMM-3B43-V7) precipitation data at 0.25º × 0.25º spatial resolution was used over the Ganga

175 river basins during the period 2003–2016 (Huffman et al. 2018) (Table 1). This product is

176 designed for global precipitation analysis, and its algorithm combines several instruments

177 (Huffman et al. 2007). TRMM precipitation products were widely used worldwide. Several 178 authors have compared the TRMM data with observational data and reported good accuracy (Jia

179 et al. 2011; Yang et al. 2017; Khan et al. 2018).

180 2.2.4 Potential Evapotranspiration (PET)

181 The daily global potential evapotranspiration (PET) with a spatial resolution of 1°× 1°

182 datasets were used from 2003 to 2016 (Table 1). This data is generated from the climate

183 parameter, i.e., extracted from the Global Data Assimilation System (GDAS) analysis fields. The

184 NOAA generates the GDAS data every 6 hours, and it is freely available on the USGS website

185 https://earlywarning.usgs.gov/fews/product/81. The daily PET is calculated on a spatial basis

186 using the Penman-Monteith equation (Allen et al. 2006; Shuttleworth 1992; Srivastava et al.

187 2017, 2018; Kumari and Srivastava 2020). The monthly and yearly PET was obtained using the

188 accumulation of daily data.

189 2.2.5 CTEI Description and Calculation

190 This index was developed by Dharpure et al. (2020) using hydrological and climatological

191 conditions over the Indus, Ganga, and Brahmaputra river basins, and also highlighted that the

192 CTEI was highly correlated with the ground observation wells. The CTEI index was derived

193 using satellite data and hydro-meteorological parameters for the period 2003–2016. They have

194 also revealed that this index is able to quantify the drought occurrence and their severity on

195 spatial as well as temporal scale. This model presents a comprehensive picture to estimate the

196 drought events at regional scale where the ground observation is limited. The normalized

197 distribution of all input variables in modeling CTEI is illustrated in Fig.2. 198 2.3 Methodology

199 2.3.1 SMO-SVM Algorithm

200 One of the most advanced and relatively new machine-learning algorithms in the data-

201 dominated science and research field can be assigned to Support Vector Machine (SVM)

202 algorithms. These SVMs are basically structured on the statistical learning theories and developed

203 from the conceptual optimization hypothesis (Vapnik 1995; Vapnik 1998). It is generally used in

204 order to achieve the best generalization capability for both the empirical relations and confidence

205 intervals of machine learning. These SVMs have always been shown to perform extremely well

206 and efficient for the optimizations and regressions studies (Vapnik et al. 1997; Collobert et al.

207 2001). These are assumed and proven to be highly robust in nature for extremely noise mixed

208 data in comparison to the other local models and algorithms which uses traditional chaotic

209 methods. The SVM estimation function (F) in any given regression scenario can be defined as:

210 2)

211 Where W is the weight age vector; Tf represents the nonlinear transfer function, which projects

212 the input vectors towards a very high dimension feature space; and b is the constant variable. Fig.

213 3 shows the schematic representation of the SVM algorithm.

214 A convex dual optimization problem was introduced by Vapnik (1995) that created an

215 insensitivity loss function. Several algorithms have been developed and suggested for solving

216 these dual optimization issues in the SVM (Shevade et al. 2000; Schölkopf and Smola 2002;

217 Adamala and Srivastava 2018). In this current study, the Sequential Minimal Optimization (SMO)

218 algorithm is used along with the SVM algorithm in order to solve the dual optimization problem.

219 Several studies, such as Platt (1999) and Schölkopf and Smola (2002), have used these SMO in 220 combination with SVMs. One of the major advantages for using SMO is that one can obtain any

221 analytical solution of the subset that can be directly obtained without the help of a quadratic

222 optimizer. Henceforth, all the model parameters of SVMs are trained by using the SMO

223 algorithm in this study. The programming codes from the Library for Support Vector Machines

224 (LIBSVM) are used for the calibration and validation of all the datasets (Chang and Lin 2011).

225 2.3.2 A Greedy Method

226 A greedy strategy is a simple, intuitive algorithmic paradigm that starts optimal solution

227 from the status quo and is used for solving problem-solving heuristic, and repeatedly chooses the

228 ‘superior’ choice from a fixed number of options based on the existing rather than following

229 situations until no alternative is available (Black 2005).There has been a wide existence of this

230 method in the literature for several decades.

231 In the greedy algorithm, there are several input parameters such as positive real numbers

232 , min, < 1, and a function : N × N → N. These parameters , min, and controls the

233 iteration duration for each simulation. The function takes the values and as inputs and sends

234 backs the several numbers of iterations towards the algorithm main stage (Fig.4). In order to

235 provide the best solutions, it is required to run the considerable iterations for maximizing the best

236 fit and its approximation. Mainly there are two steps to carry out this whole process, such as:

237 Initially, we apply a greedy algorithm in order to obtain the attainable solution, which is followed

238 by the destruction phase. In this phase, we eliminate some of the elements from the current

239 solution keeping the partial soluton aside. In the third stage of this process, which is called as

240 construction phase, we implement the algorithm to the partial solution to obtain a different

241 feasible solution.Further, we repeat the second and third stages until the best condition is

242 satisfied. The combination of all variables was implemented as inputs to two models, as shown in 243 Table 2 to select the best combination ofinput variables that has a high correlation with CTEI.

244 The analyzed datasets were divided into two sections: the training period represented 80% of the

245 dataset, and the remaining (20%) was for the testing period.

246 2.4 Statistical analysis

247 The actual data of CTEI and modeled values were compared through the period of this

248 study. To evaluate the accuracy of models, the following statistical indicators have been selected

249 using root mean square error, coefficient of determination, and mean absolute error (Elbeltagi et

250 al. 2020 b, a, c, d). All parameters defined as follows: is an observed or actual value,

251 is simulated or foreseen value, is the mean value of reference samples, and N is

252 the total number of data points.

253 1. Root mean square error

254 RMSE is the sample standard deviation of the differences between foreseen and actual values. It 255 is given by:

256 (3)

257 2. Mean Absolute Error

258 The mean absolute error evaluates the mean magnitude of the errors in a set of forecasts, without

259 considering their sign. It’s an average over the test sample of the absolute differences between

260 foreseen and actual values. It is defined as:

261 (4)

262 3. Relative Absolute Error

263 The relative absolute error takes the total absolute error and normalizes it by dividing by the total

264 absolute error of the simple predictor.

265 (5)

266 4. Root Relative Squared Error

267 The relative squared error takes the total squared error and normalizes it by dividing by the total

268 squared error of the simple predictor. By taking the square root of the relative squared error one

269 reduces the error to the same dimensions as the quantity being predicted.

270 (6)

271 3 Results and Discussions

272 The descriptive statistics were analyzed for the data collected over the studied years, as

273 shown in Table 3. This table showed that the SD had the highest value of 32.74 at SWN and the

274 lowest value of 0.33 at WS. As mentioned before, 13 input variables comprised of GRACE TWSA,

275 GLDAS TWSA, GWSA, P, LST, WS, ET, Runoff, Air Temp, NR, SWN, LWN, and PET were

276 implemented for prediction of CTEI. Among available all influence inputs, 13 different

277 combinations were examined using two soft computing models, namely a Greedy algorithm and

278 SMO-SVM models. Table 4 demonstrates the performance metrics of the Greedy model, based

279 on all datasets, for all combinations. The result clearly shows that Combo3 (R=0.77, RMSE=0.49,

280 MAE=0.38, and RAE=63.23%) and Combo2(R=0.77, RMSE=0.5, MAE=0.37, and 281 RAE=63.63%) by having highest correlation coefficient and lowest error metrics are identified as

282 the superior input combination for prediction of CTEI, respectively. Besides, the Combo13

283 (R=0.51 and RMSE=0.67) has the worth performance among all combinations. The statistical

284 criteria for the SMO-SVM model are listed in Table 5. It is crystal clear that Combo2 and

285 Combo3 have promising predictive performance in term of (R=0.81, RMSE=0.46, MAE=0.35

286 and RAE=57.25%) and (R=0.80, RMSE=0.46, MAE=0.34 and RAE=59.27%), respectively. The

287 statistical measures for training and testing phases are tabulated in Tables A and B in the

288 appendix. Fig. 5 depicted the comparison between Greedy and SMO-SVM models in terms of

289 five considered metrics for all combinations. A closer look at the bar plot of metrics reveals that

290 the best performance has occurred in Combo 2 and 3 for both the AI-based model, and it can be

291 included that SMO-SVM is superior to the Greedy model in estimating CTEI. Fig. 6 shows the

292 scatter plots of all combinations for Greedy and SMO-SVM models in both training and testing

293 phases. The distribution of the predicted points around line 1:1 demonstrates that for both Greedy

294 and SMO-SVM models, Combo2 and Combo3 have more consistency with the measured value

295 of CTEI. Although, the Combos 2, 3, and even Combo 4 of the SMO-SVM model perform better

296 than Combo 2 and Combo 3 of the greedy model. The results obtained generated better

297 performance compared to the findings of El Ibrahimi and Baali (2018), who applied SVM for

298 predicting standardized precipitation index (SPI) and found that the determination coefficients

299 (R2) ranged from 0.52 to 0.77 for training and testing periods. As well, the developed models

300 achieved an acceptable result than Dikshit et al. (2020), who stated that the highest correlation

301 coefficient for modeling Standard Precipitation Evaporation Index (SPEI) was 0.75 by using

302 SVM. In addition, These results are in agreement with those from Borji et al., (2016), who

303 simulated the stream flow drought index (SDI)of different multi timescales and indicated that R2

304 ranged from 0.65 to 0.99. Also, SVM has been used for predicting drought based on SPEI by Deo 305 et al. (2018) and found that R2 ranged from 0.52 to 0.98. Moreover, these findings were extremely

306 an acceptable and agree with those suggested by Fung et al., (2019), who applied an improved

307 SVM for estimation of SPEI-1, SPEI-3 and SPEI-6, at various timescales and their findings

308 resulted that R2 ranged from 0.83 to 0.95 for training and from 0.79 to 0.90 for validation periods.

309 Furthermore, our results are similar to the results of (Komasi et al. 2018), who found that SVM

310 algorithm-generated R2 of 0.86 for modeling SPI. Besides, these results coincide with the

311 observations of Shamshirband et al.(2020), who found that R2 varied from 0.66 to 0.92 for

312 prediction of SPEI, and varied from 0.67 to 0.90 for prediction of SPI during six-time delays.

313 Moreover, the model finding were similar to those revealed by Zhang et al., (2019), who forecast

314 the drought by applying SVM and illustrated that the R2 between SPEI calculated versus

315 observed values were 0.827 to 0.833.

316 To get a better evaluation, visualization was applied by using a violin plot for each

317 combination of models in Fig.7. Considering the shape of the violin plots, it can be concluded

318 that for Greedy model the Combos 1, 2, and 3 have the closed predictive performance in

319 predicting CTEI and Combo7 stand in forth rank. Also, in SMO-SVM the Combo 1, 2, 3, and 4

320 can yield promising results in comparison other combination. Moreover, the Taylor diagrams for

321 testing phase are depicted in Fig.5 for two considered model. Comparison between cut come of

322 two model prove that Combo2, 3 and 4 in SMO-SVM and Combo2 and Combo3 in Greedy

323 model due to having minimal physical distance to target point are identified as the best

324 combination for prediction of CTEI. Hereafter, Combos 2 and 3 form two models are examined

325 for error analysis to introduce the most efficient and accurate predictive model.

326 Fig.8 depicted the violin distribution function (Left) and diffusion plots (Right) of

327 relative error for the best selected combination of Greedy and SMO-SVM model in testing phase. 328 The error analysis the best selection combinations in testing phase implies that optimum relative

329 error range for Greedy model is belong to Comb2 with ( 405 Er  680 ) and for SMO-SVM

330 model is owing to the Combo2 with ( 627 Er  788 ).Besides, the attained result demonstrated

331 that the SMO-SVM (Combo2, R=0.81) in spite of having higher range of relative error is superior

332 to the Greedy model (Combo3, R=0.77) for estimation of CTEI.

333 Eventually, Fig.9 shows the cumulative frequency of relative percentage absolute error

334 (RPAE) for all best combination in both AI models in tree scales comprised of all datasets,

335 training and testing stages. According to the Fig.10, more than 60% of all data points for Greedy-

336 Combo2, Greedy-Combo3, SMO-SVM-Combo2, and SMO-SVM-Combo3 have the RPAE less

337 than79%, 75.35%, 64.23%, and 67.2, respectively. Furthermore, for 90% of all data points,

338 Greedy and SMO-SVM models can predict CTEI by the RPAE less than356.36, 356.36, 355, and

339 356.25 for second and third combinations, respectively. The cumulative frequency of RPAE for

340 training dataset (80% of all datasets) shows that the best selection input combinations of SMO-

341 SVM model are fairly promising approaches for prediction of CTEI. So, the performed error

342 analysis proves that Combo 2 and Combo 3 owing to SMO-SVM model are superior to the best

343 combinations of Greedy model. However, due to the close accuracy of the performance of the

344 Combo2 and Combo3 can be inferred that the Combo3 with less input variable (11 influence

345 variables) is considered as the first priority model for prediction of CTEI. It is worth noting that

346 the entire selected alternative AI models for range of ( 0.5 CTEI  0.5 ) cannot achieve to the

347 high accuracy for prediction of CTEI. 348 4 Conclusion

349 Assessment of drought is one of the most important tasks in the present condition as it

350 leads to several adverse effects on the soil-water-atmosphere cycles of the earth systems across

351 the different climate of the world. There exists a several techniques and machine learning

352 methods in the literature to quantify drought; however based on CTEI this is one of the unique

353 study that compares and contrast the relative role of improve SMO algorithm for support vector

354 machine and a greedy linear method. Thirteen input variables were implemented for prediction of

355 CTEI using an improved SMO algorithm for support vector machine and a greedy linear

356 regression method. Several combinations of each method were made to determine the best fit by

357 using the statistical indicators and error measures. Based on the results of greedy model, we

358 found that Combo3 and Combo2 respectively yield the highest performance than other combos

359 with having correlation coefficient, R=0.77, RMSE=0.49, MAE=0.38 and RAE=63.23%) and

360 (R=0.77, RMSE=0.5, MAE=0.37 and RAE=63.63%). These two combos have the highest value

361 of R and lowest error metrics and it suggested as the superior input combination for prediction of

362 CTEI, over IGB river basin. On the other hand, the least efficiency was shown by combo13 with

363 R = 0.51 and RMSE=57.25% among all other 13 combinations. When coming towards the

364 results of SMO-SVM mode, combo2 and combo2 were having highest performance with R=0.81,

365 RMSE=0.46, MAE=0.35 and RAE=57.25%) and (R=0.80, RMSE=0.46, MAE=0.34 and

366 RAE=59.27% respectively. These combinations are similar to the greedy model but SMO-SVM

367 has slight better performance. Among all the combos, the combo2 and combo3 are well suited for

368 the estimation of CTEI over IGB using both the soft computing techniques. Further, in support to

369 other error measure the RPAE also showed that combo2 and combo3 of Greedy and SMO-SMV

370 model performed better with having less than 79%, 75.35%, 64.23%, and 67.2, respectively. 371 Apart from this in the Greedy model the Combos 3 also has the closed predictive performance in

372 predicting CTEI and followed by Combo7. On the other hand, for SMO-SVM Combo 3 and 4

373 also yield promising results in comparison to rest of the other combinations. The cumulative

374 frequency of RPAE for training dataset (80% of all datasets) shows that the best selection input

375 combinations of SMO-SVM model are fairly promising approaches for prediction of CTEI. So,

376 the perform error analysis proves that Combo 2 and Combo 3 owing to SMO-SVM model are

377 superior to the best combinations of Greedy model and can be one of the promising alternatives

378 for the estimation of CTEI over large river basins of different hydro-climatic regions of the world.

379

380 Acknowledgments: Thanks to the National Centre for Polar and Ocean Research, Goa, India,

381 and Indian Institute of Technology, Roorkee, India

382 Authors Contribution

383 Ahmed Elbeltagi had the original idea of the research. Ahmed Elbeltagi, Jaydeo K. Dharpure,

384 Mehdi Jamei, Ankur Srivastava, Iman Ahmadianfar: Conceptualization, Formal analysis, Writing

385 - original draft, Writing - review & editing. Mahfuzur Rahman, Harun I. Gitari: Writing- review

386 & editing. All authors read and accept the manuscript.

387 Funding: This research did not receive any specific grant from funding agencies in the public, 388 commercial, or not-for-profit sectors.

389 Data Availability Statements: The datasets generated during and/or analysed during the current 390 study are available from the corresponding author on reasonable request.

391 Compliance with ethical standards 392 Ethics approval: The authors confirm that this article is original research and has not been 393 published or presented previously in any journal or conference in any language (in whole or in 394 part).

395 Conflict of interest: The authors declare that they have no conflict of interest

396 Consent to participate and consent to publish: The authors declare that they have consent to 397 participate and consent to publish

398

399 References

400 Adamala S, Srivastava A (2018) Comparative evaluation of daily evapotranspiration using

401 artificial neural network and variable infiltration capacity models. Agricultural Engineering

402 International: CIGR Journal, J 20(1):32–39

403 Ahmad S, Kalra A, Stephen H (2010) Estimating soil moisture using remote sensing data: A

404 machine learning approach. Adv Water Resour 33:69–80

405 Ahmed M, Abdelmohsen K (2018) Quantifying Modern Recharge and Depletion Rates of the

406 Nubian Aquifer in Egypt. Surv Geophys 39:729–751. doi: 10.1007/s10712-018-9465-3

407 Allen PM, Harmel RD, Dunbar JA, Arnold JG (2011) Upland contribution of sediment and

408 runoff during extreme drought: A study of the 1947-1956 drought in the Blackland Prairie,

409 Texas. J Hydrol 407:1–11. doi: 10.1016/j.jhydrol.2011.04.039

410 Allen RG, Perieira LS, Raes D, Smith M (2006) and Drainage Paper Crop No. 56

411 Anand J, Gosain AK, Khosa R, Srinivasan R (2018) Regional scale hydrologic modeling for

412 prediction of water balance, analysis of trends in streamflow and variations in streamflow: 413 The case study of the Ganga River basin. J Hydrol Reg Stud 16:32–53. doi:

414 10.1016/j.ejrh.2018.02.007

415 Borji M, Malekian A, Salajegheh A, Ghadimi M (2016) Multi-time-scale analysis of hydrological

416 drought forecasting using support vector regression (SVR) and artificial neural networks

417 (ANN). Arab J Geosci 9:. doi: 10.1007/s12517-016-2750-x

418 Brown JF, Wardlow BD, Tadesse T, et al (2008) The Vegetation Drought Response Index

419 (VegDRI): A New Integrated Approach for Monitoring Drought Stress in Vegetation.

420 GIScience Remote Sens 45:16–46. doi: 10.2747/1548-1603.45.1.16

421 Chen F, Mitchell K, Schaake J, et al (1996) Modeling of land surface evaporation by four

422 schemes and comparison with FIFE observations. J Geophys Res Atmos 101:7251–7268.

423 doi: 10.1029/95JD02165

424 Dai Y, Zeng X, Dickinson RE, et al (2003) The common land model. Bull Am Meteorol Soc

425 84:1013–1023. doi: 10.1175/BAMS-84-8-1013

426 Deo RC, Salcedo-Sanz S, Carro-Calvo L, Saavedra-Moreno B (2018) Drought prediction with

427 standardized precipitation and evapotranspiration index and support vector regression

428 models. Elsevier Inc.

429 Dharpure JK, Goswami A, Patel A, et al (2020) Drought characterization using the Combined

430 Terrestrial Evapotranspiration Index over the Indus, Ganga and Brahmaputra river basins.

431 Geocarto Int 0:1–25. doi: 10.1080/10106049.2020.1756462

432 Dikshit A, Pradhan B, Alamri AM (2020) Temporal hydrological drought index forecasting for

433 New South Wales, Australia using machine learning approaches. Atmosphere (Basel) 11:1– 434 13. doi: 10.3390/atmos11060585

435 Ek MB, Mitchell KE, Y. L, et al (2003) Implementation of Noah land surface model advances in

436 the National Centers for Environmental Prediction operational mesoscale Eta model. J

437 Geophys Res 108:8851. doi: 10.1029/2002JD003296

438 El Ibrahimi A, Baali A (2018) Application of several artificial intelligence models for forecasting

439 meteorological drought using the standardized precipitation index in the Saïss Plain

440 (Northern Morocco). Int J Intell Eng Syst 11:267–275. doi: 10.22266/ijies2018.0228.28

441 Elbeltagi A, Deng J, Wang K, et al (2020a) Modeling long-term dynamics of crop

442 evapotranspiration using deep learning in a semi-arid environment. Agric Water Manag

443 241:106334. doi: 10.1016/j.agwat.2020.106334

444 Elbeltagi A, Deng J, Wang K, Hong Y (2020b) Crop Water footprint estimation and modeling

445 using an arti fi cial neural network approach in the Nile Delta , Egypt. Agric Water Manag

446 235:106080. doi: 10.1016/j.agwat.2020.106080

447 Elbeltagi A, Zhang L, Deng J, et al (2020c) Modeling monthly crop coefficients of maize based

448 on limited meteorological data : A case study in Nile Delta , Egypt. Comput Electron Agric

449 173:105368. doi: 10.1016/j.compag.2020.105368

450 Feng Y, Cui N, Gong D, et al (2017) Evaluation of random forests and generalized regression

451 neural networks for daily reference evapotranspiration modelling. Agric Water Manag

452 193:163–173. doi: 10.1016/j.agwat.2017.08.003

453 Fung KF, Huang YF, Koo CH, Mirzaei M (2019) Improved SVR machine learning models for

454 agricultural drought prediction at downstream of Langat River Basin, Malaysia. J Water 455 Clim Chang 1–16. doi: 10.2166/wcc.2019.295

456 Hao Z, AghaKouchak A (2013) A Nonparametric Multivariate Multi-Index Drought Monitoring

457 Framework. J Hydrometeorol 15:89–101. doi: 10.1175/jhm-d-12-0160.1

458 Huffman GJ, Bolvin DT, Braithwaite D, et al (2018) Algorithm Theoretical Basis Document

459 (ATBD) Version 5.2 NASA - NASA Global Precipitation Measurement (GPM) Integrated

460 Multi-satellitE Retrievals for GPM (IMERG) - Algorithm Theoretical Basis Document

461 (ATBD) Version 5.2. IMERG Algorithm Theor Basis Doc IMERG Algo:1–31. doi:

462 https://pmm.nasa.gov/resources/documents/gpm-integrated-multi-satellite-retrievals-gpm-

463 imerg-algorithm-theoretical-basis-

464 Huffman GJ, Bolvin DT, Nelkin EJ, et al (2007) The TRMM Multisatellite Precipitation Analysis

465 (TMPA): Quasi-Global, Multiyear, Combined-Sensor Precipitation Estimates at Fine Scales.

466 J Hydrometeorol 8:38–55. doi: 10.1175/JHM560.1

467 India-WRIS (2012) River basin atlas of India. RRSC-West, NRSC, ISRO, Jodhpur, India

468 Jia S, Zhu W, A, Yan T (2011) A statistical spatial downscaling algorithm of TRMM

469 precipitation based on NDVI and DEM in the Qaidam Basin of China. Remote Sens Environ

470 115:3069–3079. doi: 10.1016/j.rse.2011.06.009

471 Khan A, Koch M, Chinchilla K (2018) Evaluation of Gridded Multi-Satellite Precipitation

472 Estimation (TRMM-3B42-V7) Performance in the Upper Indus Basin (UIB). Climate 6:76.

473 doi: 10.3390/cli6030076

474 Khan MMH, Muhammad NS, El-Shafie A (2020a) Wavelet based hybrid ANN-ARIMA models

475 for meteorological drought forecasting. J Hydrol 590:125380 476 Khan N, Sachindra DA, Shahid S, et al (2020b) Prediction of droughts over Pakistan using

477 machine learning algorithms. Adv Water Resour 139:103562

478 Komasi M, Sharghi S, Safavi HR (2018) Wavelet and cuckoo search-support vector machine

479 conjugation for drought forecasting using Standardized Precipitation Index (case study:

480 Urmia Lake, Iran). J Hydroinformatics 20:975–988. doi: 10.2166/hydro.2018.115

481 Koster RD, Suarez MJ (1996) Energy and Water Balance Calculations in the Mosaic LSM.

482 NASA Tech Memo 9:76

483 Leblanc MJ, Tregoning P, Ramillien G, et al (2009) Basin-scale, integrated observations of the

484 early 21st century multiyear drought in Southeast Australia. Water Resour Res 45:1–10. doi:

485 10.1029/2008WR007333

486 Li J, Zhang S, Huang L, et al (2020) Drought prediction models driven by meteorological and

487 remote sensing data in Guanzhong Area, China. Hydrol Res 1–17. doi: 10.2166/nh.2020.184

488 Liang X, Lettenmaier DP, Wood EF, Burges SJ (1994) A simple hydrologically based model of

489 land surface water and energy fluxes for general circulation models. J Geophys Res

490 99:14415. doi: 10.1029/94JD00483

491 Long D, Yang Y, Wada Y, et al (2015) Deriving scaling factors using a global hydrological

492 model to restore GRACE total water storage changes for China’s River Basin.

493 Remote Sens Environ 168:177–193. doi: 10.1016/j.rse.2015.07.003

494 Malone BP, Styc Q, Minasny B, McBratney AB (2017) Digital soil mapping of soil carbon at the

495 farm scale: A spatial downscaling approach in consideration of measured and uncertain data.

496 Geoderma 290:91–99. doi: 10.1016/j.geoderma.2016.12.008 497 Marcos-Garcia P, Lopez-Nicolas A, Pulido-Velazquez M (2017) Combined use of relative

498 drought indices to analyze climate change impact on meteorological and hydrological

499 droughts in a . J Hydrol 554:292–305

500 McKee TB, Doesken NJ, Kleist J (1993) The relationship of drought frequency and duration to

501 time scales. In: Proceedings of the 8th Conference on Applied Climatology. Boston, pp 179–

502 183

503 Mo KC (2011) Drought onset and recovery over the United States. J Geophys Res Atmos 116:

504 Oloruntade AJ, Mohammad TA, Ghazali AH, Wayayok A (2017) Analysis of meteorological and

505 hydrological droughts in the Niger-South Basin, Nigeria. Glob Planet Change 155:225–233

506 Palmer WC (1965) Meteorological droughts. US Department of Commerce Weather Bureau

507 Research Paper 45, 58 pp

508 Rivera JA, Penalba OC, Villalba R, Araneo DC (2017) Spatio-temporal patterns of the 2010–

509 2015 extreme hydrological drought across the Central Andes, Argentina. Water 9:652

510 Rodell M, Chen J, Kato H, et al (2007) Estimating groundwater storage changes in the

511 Mississippi River basin (USA) using GRACE. Hydrogeol J 15:159–166. doi:

512 10.1007/s10040-006-0103-7

513 Rodell M, Houser PR, Jambor U, et al (2004) The Global Land Data Assimilation System. Am

514 Meteorol Soc

515 Rodell M, Velicogna I, Famiglietti JS (2009) Satellite-based estimates of groundwater depletion

516 in India. Nature 460:999–1002. doi: 10.1038/nature08238 517 Rui H, Beaudoing HK (2018) README Document for GLDAS Version 2 Data Products.

518 Goddard Earth Sci Data Inf Serv Cent (GES DISC) 1–32. doi:

519 http://hydro1.sci.gsfc.nasa.gov/

520 data/s4pa/GLDAS/GLDAS_NOAH10_M.2.0/doc/README_ GLDAS2.pdf

521 Sakumura C, Bettadpur S, Bruinsma S (2014) Ensemble prediction and intercomparison analysis

522 of GRACE time-variable gravity field models. Geophys Res Lett 41:1389–1397. doi:

523 10.1002/2013GL058632

524 Shamshirband S, Hashemi S, Salimi H, et al (2020) Predicting Standardized Streamflow index for

525 hydrological drought using machine learning models. Eng Appl Comput Fluid Mech

526 14:339–350. doi: 10.1080/19942060.2020.1715844

527 Shrestha AB, Agrawal NK, Alfthan B, et al (2015) The Himalayan Climate and Water Atlas

528 Shukla S, Wood AW (2008) Use of a standardized runoff index for characterizing hydrologic

529 drought. Geophys Res Lett 35:1–7. doi: 10.1029/2007GL032487

530 Shuttlewor WJ (1992) Evaporation

531 Sinha D, Syed TH, Reager JT (2019) Utilizing combined deviations of precipitation and

532 GRACE-based terrestrial water storage as a metric for drought characterization: A case

533 study over major Indian river basins. J Hydrol 572:294–307. doi:

534 10.1016/j.jhydrol.2019.02.053

535 Strassberg G, Scanlon BR, Rodell M (2007) Comparison of seasonal terrestrial water storage

536 variations from GRACE with groundwater-level measurements from the High

537 Aquifer (USA). Geophys Res Lett 34:1–5. doi: 10.1029/2007GL030139 538 Sun AY, Scanlon BR, Zhang Z, et al (2019) Combining Physically Based Modeling and Deep

539 Learning for Fusing GRACE Satellite Data: Can We Learn From Mismatch? Water Resour

540 Res 1179–1195. doi: 10.1029/2018WR023333

541 Sun Z, Zhu X, Pan Y, et al (2018) Drought evaluation using the GRACE terrestrial water storage

542 deficit over the Yangtze River Basin, China. Sci Total Environ 634:727–738. doi:

543 10.1016/j.scitotenv.2018.03.292

544 Tirivarombo S, Osupile D, Eliasson P (2018) Drought monitoring and analysis: standardised

545 precipitation evapotranspiration index (SPEI) and standardised precipitation index (SPI).

546 Phys Chem Earth, Parts A/B/C 106:1–10

547 Tiwari VM, Wahr J, Swenson S (2009) Dwindling groundwater resources in northern India, from

548 satellite gravity observations. Geophys Res Lett 36:1–5. doi: 10.1029/2009GL039401

549 Vicente-Serrano SM, Beguería S, López-Moreno JI (2010) A multiscalar drought index sensitive

550 to global warming: the standardized precipitation evapotranspiration index. J Clim 23:1696–

551 1718

552 WANG L, CHEN W (2014) Applicability analysis of standardized precipitation

553 evapotranspiration index in drought monitoring in China. Plateau Meteorol 33:423–431

554 Wang Y, Li J, Feng P, Hu R (2016) Analysis of drought characteristics over Luanhe River basin

555 using the joint deficit index. J Water Clim Chang 7:340–352

556 Weng Q (2012) Remote sensing of impervious surfaces in the urban areas: Requirements,

557 methods, and trends. Remote Sens Environ 117:34–49

558 Wilhite DA (2000) Drought as a natural hazard: concepts and definitions 559 Wu Q, Si B, He H, Wu P (2019) Determining Regional-Scale Groundwater Recharge with

560 GRACE and GLDAS. Remote Sens 11:154. doi: 10.3390/rs11020154

561 Xiao R, He X, Zhang Y, et al (2015) Monitoring groundwater variations from satellite gravimetry

562 and hydrological models: A comparison with in-situ measurements in the mid-atlantic

563 region of the United States. Remote Sens 7:686–703. doi: 10.3390/rs70100686

564 Yang P, Xia J, Zhan C, et al (2017) Monitoring the spatio-temporal changes of terrestrial water

565 storage using GRACE data in the Tarim River basin between 2002 and 2015. Sci Total

566 Environ 595:218–228. doi: 10.1016/j.scitotenv.2017.03.268

567 Zargar A, Sadiq R, Naser B, Khan FI (2011) A review of drought indices. Environ Rev 19:333–

568 349

569 Zhang Y, Yang H, Cui H, Chen Q (2019) Comparison of the Ability of ARIMA, WNN and SVM

570 Models for Drought Forecasting in the , China. Nat Resour Res 29:1447–1464.

571 doi: 10.1007/s11053-019-09512-6

572 Zhao M, Geruo A, Velicogna I, Kimball JS (2017) A Global Gridded Dataset of GRACE

573 Drought Severity Index for 2002–14: Comparison with PDSI and SPEI and a Case Study of

574 the Australia Millennium Drought. J Hydrometeorol 18:2117–2129. doi: 10.1175/JHM-D-

575 16-0182.1

576 Figures

Figure 1

Location map of the Ganga river basin and its tributaries along with varying elevation.

Figure 2 The normalized distribution of all input variables in modeling CTEI

Figure 3

A schematic daigram to explain the SVM algorithm (adapted from Gong et al., 2016) Figure 4

A schematic daigram to explain the Greedy algorithm (adapted from Cerrone et al., 2017)

Figure 5

The performance metrics bar plots for greedy and SVM-SMO models Figure 6

The scatter plots of all combinations for Greedy and SMO-SVM models (part 1) Figure 7

The comparison between the distribution function of measured and predicted CTEI in models for all datasets Figure 8

The Taylor diagram of all combination for Greedy (Up) and SMO-SVM (Down) models in testing phase Figure 9 the relative error (%) distribution function (Left) and measured CTEI versus relative error (%) (Right) for greedy and SMO-SVM model in superior combinations for testing phase Figure 10

The cumulative frequency of relative absolute error (%) for superior predictive model for index of CTEI

Supplementary Files

This is a list of supplementary les associated with this preprint. Click to download.

Tables.docx