www.nature.com/scientificreports

OPEN Spatial regression and geostatistics discourse with empirical application to precipitation data in Oluyemi A. Okunlola1, Mohannad Alobid2*, Olusanya E. Olubusoye3, Kayode Ayinde4, Adewale F. Lukman1 & István Szűcs2

In this study, we propose a robust approach to handling geo-referenced data and discuss its statistical analysis. The linear regression model has been found inappropriate in this type of study. This motivates us to redefine its error structure to incorporate the spatial components inherent in the data into the model. Therefore, four spatial models emanated from the re-definition of the error structure. We fitted the spatial and the non-spatial linear model to the precipitation data and compared their results. All the spatial models outperformed the non-spatial model. The Spatial Autoregressive with additional autoregressive error structure (SARAR) model is the most adequate among the spatial models. Furthermore, we identified the hot and cold spot locations of precipitation and their spatial distribution in the study area.

The ordinary least squares (OLS) regression has become a household name in many disciplines, especially when there is a need to investigate the cause-and-effect relationship between a response variable and one or more covariates [1]. However, the reliability of OLS results depends on certain assumptions commonly called the "Gauss-Markov assumptions". One of the most stringent of these assumptions is that the error terms in the model should be independent. The violation of this assumption in classical regression makes inference on the coefficients invalid because of inflated standard errors. In real-life situations, this assumption of OLS is often unattainable because observations located in space are related to their nearby units [2]. The quest for a new framework that accounts for the dependence structure in the data, to fill the vacuum in classical regression, led to spatial statistics. This study is motivated to discuss a simplified approach that accounts for spatial dependence in the regression model, to illustrate spatial regression analysis, and to apply the technique to investigate a linear relationship between precipitation and its likely predictors, namely northing, easting, and elevation [3,4]. An eminent technique in spatial statistics is the model with spatially autoregressive factors either in the dependent variable or in the error term. Through a Monte Carlo experiment, ref. [5] used this model type to investigate the unbiasedness and consistency properties of the model as against ordinary least squares (OLS). Asymptotically, the authors found that the OLS and spatial models converged when the spatial effect parameter is negligible. The necessity of a model that includes a spatial effect is a new development in geography; however, it has been widely applied in many other fields in recent years. In climatology, the role of statistics cannot be over-emphasized, and mathematical statistics is a viable tool with wide application in climatology research [6].
They also reported that climatology, to a large degree, is the study of the statistics of climate, and that these statistics have been described using several adjectives depending upon whether they define relationships in time (serial correlation, lagged correlation), space (spatial correlation, tele-connection), or between different climate variables (cross-correlation) [6]. It is a known fact that many fields of interest in climate experiments exhibit substantial spatial correlation. The spatial autocorrelation inherent in the data can be addressed by spatial statistics and other related approaches [7]. Precipitation/rainfall is the climate variable that has been studied more widely than other climatic variables. It will continue to receive the interest of researchers as the ongoing process of global warming persists, especially

1Department of Mathematical and Computer Sciences, University of Medical Sciences, Ondo City, , Nigeria. 2Faculty of Economics and Business, Institute of Applied Economic Sciences, University of Debrecen, 4032 Debrecen, Hungary. 3Department of Statistics, University of , Ibadan, , Nigeria. 4Department of Statistics, Federal University of Technology, , Ondo State, Nigeria. *email: mohannad.alobid@ econ.unideb.hu

Scientific Reports | (2021) 11:16848 | https://doi.org/10.1038/s41598-021-96124-x

in the developing countries that are prone to climate change. The instability of climate is a threat to agricultural products, especially where there is dependence on agriculture for a good livelihood. Most importantly, subsistence with poor irrigation becomes unbearable [8]. It is needless to argue that every facet of human life is connected to precipitation, and its variability, seasonality and extremity have many consequences for humans and the health of plants [9,10]. Excess precipitation can result in flooding, damage to structures, roadways and buildings, and pollution of surface and groundwater [11,12]. Researchers have made several attempts to determine the predictors of precipitation using several statistical methodologies in developed and developing countries. Accordingly, they established that precipitation increases with an increase in elevation, especially when used as a single predictor to enhance the precipitation patterns [13–18]. Precipitation is a complex phenomenon that is affected by many factors depending on geographical and topographical settings. In the geographical sense, reports show that the distribution of precipitation depends on slope, exposure, orientation and other derivatives of elevation [19,20]. Similarly, regional topographic variables such as distance to the Mediterranean, characterization of the general shape of the Alps, and distance to corresponding features of the Alps were found to influence heavy rains, while local measures of topography (e.g. altitude, slope, or azimuth) were less influential [14]. A strong and positive relationship exists between the considered variables and precipitation because of the orographic effect of the mountain terrain [17]. A study conducted in Kelantan state, Malaysia, using multiple linear regression to determine the dominant predictors of precipitation among easting, northing, elevation, slope and wind speed, showed that easting, northing and wind speed were the dominant predictors of precipitation [3].
Studies on precipitation or rainfall modelling in Nigeria are rare. Most authors focused on predicting precipitation or rainfall using regression and artificial neural networks. Reference [21] compared quadratic and Poisson regression with artificial models (multilayer feed-forward neural network, cascade feed-forward neural network, and radial basis neural networks) to predict monthly rainfall in Jigawa State, Nigeria, using average temperature, minimum temperature, maximum temperature, relative humidity, sunshine duration and solar radiation as predictors. They reported that both quadratic and Poisson regression performed better than the artificial models. Reference [22] compared the performance of linear regression and artificial neural network (ANN) to ensure reliable prediction of monthly rainfall, , Nigeria. They used a dataset that covered thirty-seven (37) years (1981–2017) and was collected from Kano meteorological station. The Southern Oscillation Index (SOI), Niño1 + 2, Niño3, Niño3.4 and Niño4, which are climatic indices commonly used in monitoring El Niño–Southern Oscillation (ENSO), were used as the predictors in both the linear regression and the ANN. This study showed that ANN had a higher predictive power than the linear model, and they recommended that ANN should be used with ENSO indices in the prediction of monthly rainfall for the study area. Reference [23] used data obtained from the archives of the Nigerian Meteorological Agency (NIMET) for seasonal rainfall prediction in State, Nigeria.
They made use of monthly means of Sea Surface Temperature (°C), Air Temperature (°C), Specific Humidity, Relative Humidity (%) and Uwind (m/s) at the surface and different pressure levels (750 hPa, 800 hPa, 1000 hPa) from January to May for a period of 32 years (1986–2017) as predictors, and they buttressed the findings of Ahmad and Mustapha (2018) that ANN had superior predictive power to the multiple linear regression model. Similarly, ref. [24] developed a model using ANN for the prediction of precipitation and evapotranspiration. The predictors considered in the model were a combination of some large-scale climate indices (El Niño Southern Oscillation (ENSO) and North Atlantic Oscillation (NAO)) and meteorological variables (average air temperature, maximum temperature, minimum temperature, mean speed, mean solar radiation, sunshine hours). They alluded to the fact that the meteorological variables and climatic indices were important in the prediction of standardized precipitation and evapotranspiration. Despite the increasing attention on precipitation and inherent spatial correlation problems, few studies have employed spatial statistical analysis and regression modelling. Some studies discussed precipitation mapping and spatial–temporal analysis using the time series approach with no attention to spatial regression representation [25–27]. The current study takes off from the existing works on precipitation and extends the scope using various exploratory data analyses and spatial regression models. Hence, this study proposes a robust approach to handling geo-referenced data and discusses its statistical analysis. It is hypothesized that the spatial models will provide a better fit than the OLS. Precipitation is modelled as a function of easting, northing, and elevation to verify this hypothesis. These predictors are selected based on the literature and the availability of data.
It is expected that the study will offer salient information on the distribution of precipitation regarding the location in space and provide a guide for informed decisions on water management planning for agriculture and other purposes.

Spatial data concept and model formulation

The conventional non-spatial sample of n independent observations $y_i$, $i = 1, \dots, n$, that is linearly related to the matrix X is known to have a data generating process (DGP) of the form:

$$
y_i = X_i\beta + u_i, \qquad u_i \sim N(0, \sigma^2), \qquad i = 1, \dots, n \tag{1}
$$

This specification indicates that each observation has an underlying mean of $X_i\beta$ and a random component $u_i$. From the classical point of view, for a situation where i represents regions or points in space, the observed values at one location (or region) are independent of those at other locations. Statistically, independent observations imply that $E(u_i u_j) = E(u_i)E(u_j) = 0$. The assumption of independence greatly simplifies models, but in spatial contexts this simplification seems unattainable.


Figure 1. An illustration of the contiguity-based neighbourhood.

Region    Neighbours
1         2
2         1, 3
3         2, 4, 5
4         3, 5
5         3, 4

Table 1. A five-region queen contiguity relation.

Conversely, “spatial dependence reflects a situation where values observed at one location or region depend on the values of neighbouring observations at nearby locations”. In this case, if observations i = 1 and j = 2 represent neighbours (perhaps regions with borders that touch), the situation suggests a simultaneous data generating process of the form:

$$
\begin{aligned}
y_i &= \alpha_i y_j + X_i\beta + u_i, & u_i &\sim N(0, \sigma^2), & i &= 1 \\
y_j &= \alpha_j y_i + X_j\beta + u_j, & u_j &\sim N(0, \sigma^2), & j &= 2
\end{aligned}
\tag{2}
$$

This assertion emanates from the fact that the value assumed by $y_i$ depends on that of $y_j$ and vice versa. The very notion of spatial dependence indicates the need to ascertain which other units in the spatial system have an impact on the particular unit under consideration. Formally, this is conveyed in the topological notion of a neighbourhood. This quantification of the locational aspect of our sample data can be done in several ways. The contiguity-based neighbourhood, such as rook (common side), bishop (common vertex) or queen (common side or vertex), is a common form of representation. Figure 1 illustrates the definition of the various contiguity-based neighbourhoods between sites $s_i$ and $s_j$, while Table 1 gives a queen contiguity relation among the five regions.

From Table 1, region 2 is a neighbour to region 1 and, by the symmetric property, region 1 must be a neighbour to region 2. Similarly, regions 1 and 3; 2, 4 and 5; 3 and 5; and 3 and 4 are neighbours to regions 2, 3, 4 and 5, respectively. These give rise to the spatial weight matrix W, which reflects the first-order contiguity relation among the five regions. W is expressed as:

$$
W = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0
\end{pmatrix}
$$

W is symmetric, and it has zeros on the main diagonal. This is done to prevent a unit from being a neighbour to itself. The spatial weights matrix is row-standardized to have row sums of unity and to produce a spatially weighted average term Wy of the dependent variable in the spatial lag model. Consequently, the spatial parameter associated with Wy has an intuitive interpretation as a spatial autocorrelation coefficient; row-standardization also facilitates the maximum likelihood (ML) estimation of spatial models. As a result, row-standardization has become a convention in practice without further investigation. However, it may not be appropriate in some situations. Hence, the standardized W is given as:

$$
w^{s}_{ij} = \frac{w_{ij}}{\sum_{j} w_{ij}} \quad \text{such that} \quad \sum_{j} w^{s}_{ij} = 1 \tag{3}
$$
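The construction of W from Table 1 and the row-standardization in Eq. (3) can be sketched in a few lines of Python. This is only an illustrative sketch: the names `neighbours`, `W_s` and `spatial_lag`, and the five `y` values, are assumptions introduced here and are not part of the study's data or software.

```python
# Queen-contiguity neighbour lists copied from Table 1 (regions numbered 1..5).
neighbours = {1: [2], 2: [1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
n = len(neighbours)

# Binary first-order contiguity matrix W: w_ij = 1 if j is a neighbour of i.
W = [[1 if (j + 1) in neighbours[i + 1] else 0 for j in range(n)]
     for i in range(n)]

# Row-standardize as in Eq. (3): divide every entry by its row sum.
W_s = [[w / sum(row) for w in row] for row in W]


def spatial_lag(weights, values):
    """Return the spatial lag (weights times values) as a list."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Hypothetical values for the five regions (illustration only, not study data).
y = [10.0, 20.0, 30.0, 40.0, 50.0]
lag = spatial_lag(W_s, y)

print(W)
print(lag)
```

With these hypothetical values, the lag for region 3 is the plain average of the values in regions 2, 4 and 5, which is the weighted-average interpretation of Wy described in the text.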


$$
W_s = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
1/2 & 0 & 1/2 & 0 & 0 \\
0 & 1/3 & 0 & 1/3 & 1/3 \\
0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 1/2 & 1/2 & 0
\end{pmatrix}
$$

The multiplication of the 5 × 5 row-standardized matrix, $W_s$, with the 5 × 1 vector of y values taken by each region produces $W_s y$, commonly called the spatial lag vector of the dependent variable, as illustrated below:

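$$
W_s y = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
1/2 & 0 & 1/2 & 0 & 0 \\
0 & 1/3 & 0 & 1/3 & 1/3 \\
0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 1/2 & 1/2 & 0
\end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{pmatrix}
=
\begin{pmatrix}
y_2 \\
0.5y_1 + 0.5y_3 \\
\tfrac{1}{3}y_2 + \tfrac{1}{3}y_4 + \tfrac{1}{3}y_5 \\
0.5y_3 + 0.5y_5 \\
0.5y_3 + 0.5y_4
\end{pmatrix}
$$

Model formulation

If the expression in Eq. (1) is restated in matrix form and the error structure takes the form $u = \rho W y + \varepsilon$ or $u = \lambda W u + \varepsilon$, the two resulting models are called the spatial lag and spatial error models, respectively. Mathematically, they are given as:

$$
y = \rho W y + X\beta + \varepsilon \tag{4}
$$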

$$
y = X\beta + u \tag{5}
$$

where $u = \lambda W u + \varepsilon$ and $\varepsilon \sim N(0, \sigma^2 I_n)$. From (4), the implied DGP is given as:

$$
y = (I_n - \rho W)^{-1} X\beta + (I_n - \rho W)^{-1}\varepsilon \tag{6}
$$
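The step from Eq. (4) to Eq. (6) can be written out explicitly. The short derivation below assumes that $I_n - \rho W$ is invertible; for a row-standardized W this holds when $|\rho| < 1$. That condition is not stated in the text and is added here as an assumption.

```latex
\begin{aligned}
y &= \rho W y + X\beta + \varepsilon \\
(I_n - \rho W)\, y &= X\beta + \varepsilon \\
y &= (I_n - \rho W)^{-1} X\beta + (I_n - \rho W)^{-1} \varepsilon
\end{aligned}
```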

The model statement in (6) can be “interpreted as indicating that the expected value of each observation $y_i$ will depend on the mean value $X\beta$ plus a linear combination of values taken by neighbouring observations scaled by the dependence parameter, $\rho$” [28]. The infinite series expansion of $(I_n - \rho W)^{-1}$ is given in (7) according to [28,29].

$$
(I_n - \rho W)^{-1} = I_n + \rho W + \rho^2 W^2 + \rho^3 W^3 + \dots \tag{7}
$$

Hence, the re-expression of the SAR DGP for the vector y shown in Eq. (6) is as follows:

$$
y = X\beta + \rho W X\beta + \rho^2 W^2 X\beta + \rho^3 W^3 X\beta + \dots + \varepsilon + \rho W\varepsilon + \rho^2 W^2\varepsilon + \rho^3 W^3\varepsilon + \dots \tag{8}
$$

The idea expressed in (8) is that the rows of the weight matrix W are constructed to signify first-order contiguous neighbours. Equally, the matrix $W^2$ reflects second-order contiguous neighbours, that is, those that are neighbours to the first-order neighbours. This connotes that the neighbours of the neighbours of an observation i include observation i itself. Hence, $W^2$ has positive elements on the diagonal. “The implication of this is that higher-order spatial lags can lead to a connectivity relation for an observation i such that $W^2\varepsilon$ will extract observations from the vector $\varepsilon$ that point back to itself”. This is in stark contrast with the conventional independence relation in ordinary least-squares regression, where the Gauss-Markov assumption rules out dependence of $\varepsilon_i$ on other observations j by assuming zero covariances between i and j in the data generating process [30]. The DGP for the spatial error model shown in (5), where the disturbances exhibit spatial dependence, is given as

$$
y = X\beta + \varepsilon + \lambda W\varepsilon + \lambda^2 W^2\varepsilon + \lambda^3 W^3\varepsilon + \dots \tag{9}
$$

From the foregoing, it is clear that in the spatial lag model the spatially lagged dependent variable captures the spatial dependence between the cross-sectional units, whereas in the spatial error model the spatial autocorrelation term captures the spatial dependence. Reference [31] posited two economic arguments in support of SEM over the SAR and spatial Durbin models. Firstly, they argued that the SEM model constitutes a fuller representation of the spatial dependence than the SAR and spatial Durbin models (the latter being an extension of the SAR model in which the lag effects of the dependent and independent variables are included in the model specification).
This is because with the SEM model the spatial dependence can be influenced by other considerations in addition to shocks to the spatially lagged dependent variable. Secondly, they considered a situation where the total demand is disaggregated into two categories, 1 and 2. A Wald test of the whole set of coefficients from the model for category 1 against the set of coefficients from the model for category 2, which is necessary to establish whether there is more to be learnt from disaggregating the data, can be performed with ease on the spatial error model. This is because the set of explanatory variables will be the same for a pair of SEM models. However, such a test cannot be performed on a pair of SAR models or a pair of spatial Durbin models because the spatially lagged dependent variables will differ in the two models. Though an exhaustive discussion of the spatial Durbin model will not be considered in this study, it must be remarked that this kind of model was developed to account for spatial dependence in the independent variables. This rationale stems from the idea that dependence in spatial relationships occurs not only in the dependent variable but also in the explanatory variables.
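The series expansion in Eq. (7) and the remark that $W^2$ has positive diagonal elements can be checked numerically on the five-region example. The sketch below is an illustration under assumptions: the value ρ = 0.5 and the truncation order K = 60 are arbitrary choices made here, and the helper names `matmul` and `identity` are hypothetical.

```python
def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def identity(size):
    """Return the size x size identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


# Row-standardized queen-contiguity matrix for the five regions of Table 1.
W_s = [
    [0, 1, 0, 0, 0],
    [1 / 2, 0, 1 / 2, 0, 0],
    [0, 1 / 3, 0, 1 / 3, 1 / 3],
    [0, 0, 1 / 2, 0, 1 / 2],
    [0, 0, 1 / 2, 1 / 2, 0],
]
n = len(W_s)
rho = 0.5  # hypothetical dependence parameter, chosen only for illustration

# Second-order neighbours: the diagonal of W^2 is positive because every
# region is a neighbour of its own neighbours.
W2 = matmul(W_s, W_s)
diag_W2 = [W2[i][i] for i in range(n)]

# Partial sum S = I + rho W + rho^2 W^2 + ... + rho^K W^K of Eq. (7).
K = 60
S = identity(n)
term = identity(n)
for _ in range(K):
    term = [[rho * v for v in row] for row in matmul(term, W_s)]
    S = [[s + t for s, t in zip(row_s, row_t)] for row_s, row_t in zip(S, term)]

# If S approximates (I - rho W)^{-1}, then (I - rho W) S is close to I.
A = [[(1.0 if i == j else 0.0) - rho * W_s[i][j] for j in range(n)]
     for i in range(n)]
residual = max(abs(v - (1.0 if i == j else 0.0))
               for i, row in enumerate(matmul(A, S))
               for j, v in enumerate(row))

print(diag_W2)
print(residual)
```

The check avoids an explicit matrix inverse: multiplying the truncated series by $I_n - \rho W$ leaves only the remainder term $\rho^{K+1} W^{K+1}$, which is negligible for $|\rho| < 1$.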


Statistics               Precipitation   Northing     Easting      Elevation
Before transformation
  Mean                   100.78          479,291      809,128.8    267.5
  Std. Dev               42.27           221,803.6    256,689      206.22
  Minimum                35.89           171,369      317,099      4
  Maximum                186.77          830,299      1,500,000    1344
  Skewness               0.41            0.18         0.47         1.22
  Kurtosis               2.02            1.51         2.91         6.3
After transformation
  Mean                   4.52            13.55        12.96        5.13
  Std. Dev               0.44            0.33         0.51         1.18
  Minimum                3.58            12.67        12.05        1.39
  Maximum                5.23            14.22        13.63        7.2
  Skewness               −0.2            −0.34        −0.2         −1.11
  Kurtosis               2.04            2.78         1.61         3.55

Correlation matrix       LnPrecipitation   LnNorthing   LnEasting   LnElevation
  LnPrecipitation        1
  LnNorthing             −0.203**          1
  LnEasting              0.195**           0.233**      1
  LnElevation            −0.665**          0.227**      −0.127*     1

Table 2. Statistical properties of the variables. **Correlation is significant at the 0.01 level (2-tailed). *Correlation is significant at the 0.05 level (2-tailed).

Another representative of the family of spatial regression models that is of interest in this study is the one that includes both endogenous interaction effects and interaction effects among the error terms. Based on [31] and related works, this model type was advocated for in the World Conference of the Spatial Econometrics Association held in 2017. Reference [32] labelled this model the Kelejian-Prucha model after their article in 1998, since they were the first to set out an estimation method for this model, also when the spatial weights matrix used to specify the spatial lag and the spatial error structure is the same. It was, however, named the Spatial Autoregressive with additional Autoregressive error structure (SARAR) or Cliff-Ord type spatial model by [31] themselves. Reference [33] termed the model “Spatial Autoregressive Confused” (SAC) [17]. The specification takes the form:

$$
y = \rho W_1 y + X\beta + u \tag{10}
$$

where $u = \lambda W_2 u + \varepsilon$ and $\varepsilon \sim N(0, \sigma^2 I_n)$. The DGP of the model is of the form:

$$
y = (I_n - \rho W_1)^{-1} X\beta + (I_n - \rho W_1)^{-1}(I_n - \lambda W_2)^{-1}\varepsilon \tag{11}
$$

At first glance, the specification appears to represent a mixture of spatial dependence in the dependent variable and in the disturbances, represented by $W_1 y$ and $W_2 u$, respectively. A more formal examination of the specification produced from a mixture of spatial dependence in the dependent variable and the disturbances is provided by [34].

Result and discussion

Variables’ description and data screening for spatial autocorrelation. Most statistical procedures and inferences usually work well on the assumption of normality of the data. The data used for the study were explored for this essential criterion. The statistical properties of the variables are presented in Table 2, which shows that the spread of the variables from their central level is substantial, hence the high coefficients of variation. However, northing is less dispersed (CV = 32%) when compared with precipitation, easting and elevation.
There is a moderate level of skewness in the variables except for elevation. Due to the instability and skewness tendencies, all the variables were log-transformed, and this enhanced their statistical properties. For instance, the skewness decreased for all the variables, while the leptokurtic and platykurtic nature of the variables was smoothed to yield approximately normally distributed variables. The low level of interrelationship among the independent variables depicted by the correlation matrix is a signal that the selected variables satisfy the assumption of absence of multicollinearity. Also worthy of note is the relationship of the selected independent variables with precipitation. A negative correlation was established between precipitation and northing, which indicated that precipitation decreased from south towards the north. However, a positive correlation existed between precipitation and easting, and this indicated that precipitation increased from west towards the east [35].


[Moran scatter plot. Panel title: Moran's I = 0.8288, P-value = 0.0010. Horizontal axis: An_Pre; vertical axis: spatially lagged An_Pre; legend: WAn_Pre, fitted values.]

Figure 2. Moran’s I scatter plot for annual precipitation.

                   Moran's I                      Geary's C
Variables          I       Z        p value       C       Z         p value
Precipitation      0.83    25.52    0.000         0.17    −24.50    0.000
Northing           0.92    28.47    0.000         0.08    −26.97    0.000
Easting            0.64    19.83    0.000         0.36    −19.18    0.000
Elevation          0.13    5.28     0.099         0.82    −1.83     0.067

Table 3. Measures of global spatial autocorrelation.
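For readers who want to see how the two global statistics in Table 3 are defined, the following sketch computes Moran's I and Geary's C from their textbook formulas. It is not the software used in the study: the function names and the toy values of `y` are assumptions introduced here, and `W` is the binary five-region matrix built from Table 1.

```python
def morans_i(y, W):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * (num / den)


def gearys_c(y, W):
    """Global Geary's C: (n-1) sum_ij w_ij (y_i-y_j)^2 / (2 S0 sum_i z_i^2)."""
    n = len(y)
    mean = sum(y) / n
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in y)
    return ((n - 1) * num) / (2 * s0 * den)


# Binary queen-contiguity matrix for the five regions of Table 1.
W = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]
# Hypothetical values with a smooth spatial trend (illustration only).
y = [1.0, 2.0, 3.0, 4.0, 5.0]

moran = morans_i(y, W)
geary = gearys_c(y, W)
print(moran, geary)
```

Values of I above zero and values of C below one both point to positive spatial autocorrelation, which is the direction of the estimates reported in Table 3.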

The dependent variable was first diagnosed for spatial autocorrelation using Moran’s I scatter plot (Fig. 2) [36]. The data in the plot are standardized so that units on the graph are expressed in standard deviations from the mean [37]. The horizontal axis shows the standardized value of precipitation for a county, and the vertical axis shows the standardized value of the average precipitation (WAn_Pre) for that county’s neighbours as defined by the order-one queen weights matrix. The slope of the regression line through these points expresses the global Moran’s I, and this is estimated to be 0.8288 with a p value of 0.0010 in this study [37,38]. Similarly, global measures of spatial autocorrelation were computed for the variables using both Moran’s I and Geary’s C. As shown in Table 3, all the variables have positive and significant spatial autocorrelation, and this implies that similar values of each variable occur near their contiguous locations. The upper right quadrant of the Moran’s I scatter plot shows counties with above-average precipitation whose neighbouring counties also have above-average precipitation (high-high) [39,40]. These are regarded as the hot spot locations, while the lower left quadrant, which shows counties with below-average precipitation values and neighbours also with below-average values (low-low), contains the cold spot locations [39,40]. The lower right quadrant displays counties with above-average precipitation surrounded by counties with below-average values (high-low), and the upper left quadrant contains the reverse (low-high) [41,42]. They are called spatial outliers. Figure 3 (top, bottom) shows the Local Indicator of Spatial Autocorrelation (LISA) and significance maps, respectively. These maps shed light on the clustering suspected in the Moran’s I scatter plot. The red colour in this figure (top) depicts the hot spot locations of precipitation, and there are 82 such locations in our data, predominantly in the southern regions of the country.
In the same vein, the blue colour represents the cold spots, i.e., clusters with low levels of precipitation, and there are 89 points with this attribute in the dataset. Additional information provided by the significance map (Fig. 3, bottom) indicated that 63 (19.3%), 51 (15.6%) and 59 (18.1%) of the sample observations showed statistically significant local spatial autocorrelation at the 5%, 1% and 0.1% levels of significance, respectively.
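The quadrant logic of the Moran scatter plot described above can be sketched as follows. This is a minimal illustration, assuming the row-standardized five-region matrix from Table 1 and hypothetical values of `y`; the function name `moran_quadrants` is introduced here. It only assigns quadrants and does not carry out the significance testing that underlies the LISA maps.

```python
def moran_quadrants(y, W_s):
    """Label each unit by its Moran scatter plot quadrant.

    The value is compared with the mean (high/low) and so is the
    row-standardized spatial lag of the centred values.
    """
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    lag = [sum(w * v for w, v in zip(row, z)) for row in W_s]
    labels = []
    for zi, li in zip(z, lag):
        if zi > 0 and li > 0:
            labels.append("high-high")   # hot spot candidate
        elif zi < 0 and li < 0:
            labels.append("low-low")     # cold spot candidate
        elif zi > 0 and li < 0:
            labels.append("high-low")    # spatial outlier
        elif zi < 0 and li > 0:
            labels.append("low-high")    # spatial outlier
        else:
            labels.append("on-axis")     # value or lag equals the mean
    return labels


# Row-standardized queen-contiguity matrix for the five regions of Table 1.
W_s = [
    [0, 1, 0, 0, 0],
    [1 / 2, 0, 1 / 2, 0, 0],
    [0, 1 / 3, 0, 1 / 3, 1 / 3],
    [0, 0, 1 / 2, 0, 1 / 2],
    [0, 0, 1 / 2, 1 / 2, 0],
]
# Hypothetical values (illustration only, not study data).
y = [1.0, 2.0, 3.0, 6.0, 8.0]

labels = moran_quadrants(y, W_s)
print(labels)
```

Dividing the centred values by their standard deviation, as is done for the plot, would not change the signs and therefore would not change the quadrant labels.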

Spatial variability and continuity. The knowledge of the spatial clustering in the data from the previous subsection necessitated further exploration with geostatistical tools. The 3-D surface map presented in Fig. 4 describes the interrelationship of precipitation in the study area with other geographic variables. It was noted from the map that precipitation values increase with decreases in the height above sea level (elevation) and in latitudinal values, whereas longitudinal values have an irregular pattern as one moves from west to east. This result implies that locations at high latitudes tend to experience low precipitation while those at low latitudes have high precipitation. Equally, high-altitude locations have low precipitation compared with those at low altitudes. From a geographical perspective, latitude is simply a measure of how far one is from the equator while elevation measures how high one is above sea level, so the locations that are far from the equator


Figure 3. LISA (top) and significance (bottom) maps showing the spatial distribution of precipitation.

or have high elevation are prone to cold weather or climate compared with those at low latitudes. This explains the uneven distribution of precipitation in the northern and southern parts of Nigeria. The locations in the core north of the country are at high latitudes and altitudes and, by implication, far from the equator. This result is of great relevance in agriculture and water resource management. Further, a Gaussian model was fitted to the experimental variogram of the precipitation data, and the basic spatial parameters were calculated using R statistical software (Fig. 5). The calculated parameters were the nugget, range and sill. The nugget is the value at which the model intercepts the y-axis, and it can be interpreted as the variance at zero distance between a unit and its neighbour; the range is the distance where the model first flattens, and this can be interpreted as the distance beyond which the values of the variable become spatially independent [43]; the value that the model attains at the range is called the sill, and it is interpreted as the semivariance at the lag distance at which one value of a variable no longer influences neighbouring values (discontinuity). The nugget, range, and sill for the variable under investigation were found to be 50, 0.3 and 1200, respectively. The ratio of the


Figure 4. Precipitation surface map (mm).

Figure 5. Variogram plot for precipitation [29].
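The reported variogram parameters (nugget 50, range 0.3, sill 1200) can be plugged into a Gaussian variogram model to see how the nugget-to-sill ratio arises. The sketch below is an illustration under assumptions: the parameterization $\gamma(h) = c_0 + c\,(1 - e^{-(h/a)^2})$ is one common form of the Gaussian model and may differ from the one used by the authors' R routine, the reported sill is treated as the total sill, and the function name is hypothetical.

```python
import math


def gaussian_variogram(h, nugget, sill, range_):
    """Gaussian variogram model (assumed parameterization, see lead-in).

    gamma(0) = 0 by definition; for h > 0 the curve starts at the nugget
    and rises smoothly towards the total sill.
    """
    if h == 0:
        return 0.0
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.0 - math.exp(-(h / range_) ** 2))


# Parameter values as reported in the text for precipitation.
nugget, range_, sill = 50.0, 0.3, 1200.0

# Nugget-to-sill ratio used to grade the strength of spatial dependence.
nugget_ratio = nugget / sill
print(f"{100 * nugget_ratio:.1f}%")

# Semivariance at a few lag distances (same units as the range).
for h in (0.0, 0.1, 0.3, 1.0):
    print(h, round(gaussian_variogram(h, nugget, sill, range_), 2))
```

A small ratio means that almost all of the variance is spatially structured rather than unexplained at zero distance, which is why a low nugget-to-sill ratio is read as strong spatial dependence.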

nugget variance to the sill is the spatial coefficient parameter [43,44], and for this study it was estimated to be 4.2%. Following the classification of [10,45], this value indicated strong spatial autocorrelation in the precipitation series. A diagnostic check was conducted for spatial dependence in the OLS regression after it had been confirmed that spatial clustering is present in the dataset. The aim here is to unearth the type of spatial effect present in the dataset and to model it as appropriate. To achieve this, the Lagrange Multiplier specification tests for spatial lag and error ($LM_{LAG}$ and $LM_{ERROR}$) were conducted on the residuals extracted from the fitted OLS regression [42,46]. If neither the $LM_{LAG}$ nor the $LM_{ERROR}$ statistic rejects the null hypothesis, then OLS is appropriate. If one of the LM statistics rejects the null hypothesis but the other does not, then the decision is straightforward: the alternative spatial regression model that matches the test statistic that rejects the null hypothesis is estimated [47,48]. When there is conflict, that is, when both the $LM_{LAG}$ and $LM_{ERROR}$ statistics reject the null hypothesis, then to select an adequate model the focus shifts to


Test                              Statistic   df   p value
Spatial error
  Moran’s I                       17.576      1    0.000
  Lagrange multiplier             287.857     1    0.000
  Robust Lagrange multiplier      30.101           0.000
Spatial lag
  Lagrange multiplier             258.502     1    0.000
  Robust Lagrange multiplier      0.746       1    0.388
SARMA
  Lagrange multiplier             288.603     1    0.000

Table 4. Diagnostic tests for spatial dependence in OLS regression.
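The specification search based on Table 4 can be expressed as a small decision function. This is a hedged sketch of the rule as described in the surrounding text, not the authors' code: the dictionary layout, the function name `choose_model`, the 5% threshold and the tie-breaking by the larger robust statistic are assumptions made here. The block also checks a standard property of these tests, namely that the SARMA statistic equals the LM statistic for one alternative plus the robust statistic for the other.

```python
# (statistic, p value) pairs copied from Table 4.
table4 = {
    "lm_error": (287.857, 0.000),
    "rlm_error": (30.101, 0.000),
    "lm_lag": (258.502, 0.000),
    "rlm_lag": (0.746, 0.388),
    "lm_sarma": (288.603, 0.000),
}


def choose_model(tests, alpha=0.05):
    """Pick OLS, SAR (lag) or SEM (error) from LM diagnostics."""
    err_sig = tests["lm_error"][1] < alpha
    lag_sig = tests["lm_lag"][1] < alpha
    if not err_sig and not lag_sig:
        return "OLS"                      # no spatial dependence detected
    if err_sig and not lag_sig:
        return "SEM"
    if lag_sig and not err_sig:
        return "SAR"
    # Both LM tests reject: fall back on the robust forms.
    r_err_sig = tests["rlm_error"][1] < alpha
    r_lag_sig = tests["rlm_lag"][1] < alpha
    if r_err_sig and not r_lag_sig:
        return "SEM"
    if r_lag_sig and not r_err_sig:
        return "SAR"
    # Both (or neither) robust forms reject: prefer the larger statistic.
    return "SEM" if tests["rlm_error"][0] >= tests["rlm_lag"][0] else "SAR"


choice = choose_model(table4)

# Consistency check: LM-SARMA = LM-error + robust LM-lag
#                             = LM-lag + robust LM-error.
sarma_a = table4["lm_error"][0] + table4["rlm_lag"][0]
sarma_b = table4["lm_lag"][0] + table4["rlm_error"][0]

print(choice, round(sarma_a, 3), round(sarma_b, 3))
```

Applied to Table 4, the rule selects the spatial error model, because both LM tests reject while only the robust error test remains significant.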

Independent variables            OLS         SAR          SEM          DURBIN       SARAR
LogEasting                       0.195*      0.137*       0.214*       0.226*       0.216*
LogNorthing                      0.050       0.043        0.017        0.006        −0.010
LogElevation                     −0.050      −0.038       −0.004       0.000        0.008
wx_LogEasting                                                          −0.206
wx_LogNorthing                                                         0.030
wx_LogElevation                                                        −0.016
Intercept                        1.467***    −0.940***    1.443***     0.325***     3.942***
Rho (ρ)                                      0.726***                  0.777***     −0.496***
Lambda (λ)                                                0.779***                  0.919***
Model selection criteria
Sigma (four values reported; column alignment lost): 0.569, 0.547, 0.547, 0.501
AIC                              787.523     609.734      591.883      597.636      581.789
BIC                              802.671     632.456      614.605      631.718      608.297
Weight matrix (Queen order 1)    None        326 × 326    326 × 326    326 × 326    326 × 326

Table 5. Predictors of annual precipitation and selection criteria estimates. * p < 0.10, ** p < 0.05, *** p < 0.01.
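The model comparison in the last panel of Table 5 amounts to ranking the five models by AIC and BIC, where smaller values indicate a better trade-off between fit and complexity. The following sketch reproduces that ranking from the tabulated values; the dictionary and variable names are introduced here for illustration.

```python
# (AIC, BIC) pairs copied from Table 5.
criteria = {
    "OLS": (787.523, 802.671),
    "SAR": (609.734, 632.456),
    "SEM": (591.883, 614.605),
    "DURBIN": (597.636, 631.718),
    "SARAR": (581.789, 608.297),
}

# Smaller is better for both criteria.
best_aic = min(criteria, key=lambda m: criteria[m][0])
best_bic = min(criteria, key=lambda m: criteria[m][1])
worst_aic = max(criteria, key=lambda m: criteria[m][0])

# AIC differences relative to the best model.
delta_aic = {m: round(v[0] - criteria[best_aic][0], 3)
             for m, v in criteria.items()}

ranking = sorted(criteria, key=lambda m: criteria[m][0])
print(best_aic, best_bic, worst_aic)
print(ranking)
print(delta_aic)
```

Both criteria point to the same model at the top and to OLS at the bottom, so the conclusion does not depend on which of the two is used.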

robust forms of the test statistics [40–42,48]. Typically, only one of them will be significant (for example $LM_{ERROR}$, as in Table 4), or one will be more significant than the other. It is important to note that when both robust forms are significant, a model matching the (most) significant statistic is estimated. When both are highly significant, the model with the larger value of the test statistic is considered appropriate; however, there may be other causes of misspecification [40]. The LM-SARMA will tend to be significant when either of them is appropriate [23]. From the foregoing illustration, the SEM model was appropriate if a choice was to be made between it and the SAR counterpart. Observe that both the $LM_{LAG}$ and $LM_{ERROR}$ were significant but the robust form of the spatial lag test was insignificant [40]. To enhance further comparison, five models were estimated, of which the first is non-spatial while the other four take a spatial specification. The models included traditional linear regression, spatial lag (SAR), spatial error (SEM), spatial Durbin and SARAR. The spatial models were estimated by maximizing the corresponding likelihood, while the non-spatial model was estimated by the OLS method. Table 5 reports the summary of the results from the five models. Columns 1, 2, 3, 4 and 5 give the estimates obtained from the OLS, SAR, SEM, Durbin and SARAR models, respectively. A close look at the results showed that the OLS, SAR and SEM estimates were alike in terms of sign and significance but differ in sign when compared with the Durbin and SARAR estimates. However, the OLS was characterized by either under- or over-estimation of the coefficients. For instance, OLS over-estimated the coefficient for easting by 42.9% compared with the SAR model, while it under-estimated this coefficient by 8.6%, 13.6% and 9.5% when compared with the SEM, DURBIN and SARAR models, respectively.
This inflation or deflation of the coefficients was not surprising given the significant spatial dependence in the dataset. This result also buttresses the position of refs. 49–51 that non-spatial OLS is devastating and should be avoided unless interdependence is known to be very weak or non-existent. The spatial effect parameters (λ and ρ) in the spatial models are found to be highly significant. The SAR, SEM and Durbin models have spatial effects of ρ = 0.726 (p < 0.01), λ = 0.779 (p < 0.01) and ρ = 0.777 (p < 0.01), respectively. The SAR coefficient indicates the association between the dependent variable and its values at contiguous locations, the SEM coefficient shows the association of the error term with that of neighbouring observations, while the Durbin coefficient gives an idea of the level of dependence on the spatial lag of the independent variables. In the case of SARAR, ρ is negative and significant while λ is positive and significant (ρ = −0.496, p < 0.01; λ = 0.919, p < 0.01).
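The test-based choice between the lag and error specifications described earlier in this section can be sketched as a small decision function. This is a minimal sketch, assuming the usual workflow in which the plain LM tests are read first and the robust forms are consulted only when both are significant. The function name `choose_spatial_model`, its arguments, the 0.05 threshold and the tie-breaking rule when the robust forms agree are illustrative assumptions, not part of the paper.

```python
def choose_spatial_model(lm_lag, lm_error, rlm_lag, rlm_error, alpha=0.05):
    """Suggest OLS, SAR or SEM from Lagrange multiplier test results.

    Each argument is a (statistic, p_value) pair. All names are hypothetical.
    """
    lag_sig = lm_lag[1] < alpha
    err_sig = lm_error[1] < alpha

    # Neither plain test rejects: no evidence of spatial dependence.
    if not lag_sig and not err_sig:
        return "OLS"

    # Exactly one plain test rejects: estimate the matching model.
    if lag_sig != err_sig:
        return "SAR" if lag_sig else "SEM"

    # Both plain tests reject: consult the robust forms.
    rlag_sig = rlm_lag[1] < alpha
    rerr_sig = rlm_error[1] < alpha
    if rlag_sig != rerr_sig:
        return "SAR" if rlag_sig else "SEM"

    # Robust forms agree (assumption: fall back on the larger statistic).
    return "SAR" if rlm_lag[0] > rlm_error[0] else "SEM"


# Made-up numbers (not those of Table 4): both plain tests significant,
# robust lag test insignificant, robust error test significant.
suggestion = choose_spatial_model(
    lm_lag=(50.0, 0.001),
    lm_error=(80.0, 0.001),
    rlm_lag=(1.2, 0.27),
    rlm_error=(31.0, 0.001),
)
print(suggestion)
```

With these illustrative inputs the function suggests the spatial error model, which mirrors the pattern reported in the text. The rule only ranks SAR against SEM; the Durbin and SARAR models are compared separately through the model selection criteria.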

Scientific Reports | (2021) 11:16848 | https://doi.org/10.1038/s41598-021-96124-x

The spatial effect coefficients of the SARAR model give the level of spatial association between the dependent variable and its neighbours as well as between the error and that of its connected regions. The model selection criteria statistics are presented in the last panel of Table 5. On both criteria, the OLS value is the highest among the models. This indicates that OLS performs poorly in the presence of spatial clustering and that the spatial models will produce more robust parameter estimates. This finding is in agreement with the earlier report by ref. 3 that spatial models are superior to non-spatial OLS when a spatial effect is detected in the model. Overall, the SARAR model produced the best fit for the regression relation because its model selection criteria values were the smallest among the spatial models. Based on the selected model, only easting significantly explained precipitation. However, easting and elevation exerted a positive impact on precipitation while the impact of northing was negative. This implies that precipitation increases with easting and elevation, whereas it decreases as northing increases. The positive effect of easting indicates an increase in precipitation for any unit movement from west towards east, while the negative effect of northing (though not significant) depicts a decrease in precipitation for any unit movement from south towards north.

Conclusion

This study discussed the rationale for an alternative technique to the conventional regression assumption of independence of observations and applied the approach to build a regression relationship between three predictors (easting, northing and elevation) and precipitation. Exploratory data analysis tools were used to detect spatial autocorrelation and hot and cold spots of precipitation in the study area. The results agreed with previous studies on the superiority of spatial models over OLS. Since the spatial models achieved a significant improvement over their traditional counterpart, they were not only the correct specification but also a more effective approach. However, this study added that a spatial model that simultaneously accounts for spatial effects in the dependent variable and the error term provides a better fit than the SAR and SEM models used in the earlier studies of refs. 4 and 52 for precipitation modelling.

The spatial modelling approach discussed is quite rich and provides the basis for choosing a particular regression specification, unlike the orthodox framework where the model is imposed on the data without investigating what the data reveal about themselves and how they should be modelled. Data exploration is very important, and it is one of the key ways of avoiding misspecification and misleading results.

Materials and methods

Study area. The study was conducted in Nigeria, a country in the Sub-Saharan region. The country is situated in West Africa and is bordered in the North and Northeast by the Niger Republic and the Republic of Chad, respectively. It also shares a boundary with the Republic of Cameroon in the East and the Republic of Benin in the West. To the South, Nigeria is bordered by approximately 850 km of the Atlantic Ocean, stretching from Badagry in the West to Rio del Rey in the East. It lies within latitudes 4°–14° N and longitudes 3°–13° E, with a total land area of 923,768 square kilometres (Fig. 6, top)53. Nigeria has two distinct seasons: dry and wet. These seasons are based on the proximity of each region or location in the nation to the Intertropical Convergence Zone (ITCZ). The dry season runs from October to March while the wet season runs from April to September, with June and July often the wettest months (Fig. 6, bottom) (online resource: https://www.britannica.com/place/Nigeria/Climate).

Data and methods. The data were sourced from the Nigeria Malaria Indicator Survey (MIS) of 2015. The suitability of the MIS for the study was based on its national representativeness and its provision of the geo-referenced information required for spatial modelling. The geographical covariates were provided in shapefile format and consist of climatic variables for 329 clusters. The precipitation data for the year 2015 for all 329 clusters and their respective coordinates were extracted, but only 326 observations were suitable for analysis after removing inconsistent cases. Refs. 3, 4, 35 and 52 modelled precipitation as a function of easting, northing and elevation, and this specification is adopted in this study due to limited data. The easting and northing variables for each cluster were obtained by transforming the latitudes and longitudes of these 326 locations to Standard Universal Transverse Mercator (UTM) coordinates using the PAleontological STatistics (PAST) software. By definition, the northing value is the distance of the position from the equator in metres while the easting value is the distance from the central meridian (longitude) of the UTM zone used (the study area spans three UTM zones, namely 31, 32 and 33). Before model estimation, the variables were diagnosed for spatial variability and clustering. Various exploratory tools were used to describe and visualize spatial distributions, identify atypical locations or spatial outliers, and discover patterns of association and cold or hot spots39,41–43,54,55. Firstly, a 3-D surface contour map was used to examine the spatial variability of precipitation along the lines of longitude and latitude as well as its behaviour relative to height above sea level. Secondly, the variogram plot was used to study the precipitation data for a possible tendency towards spatial dependence and discontinuity.
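The statement that the study area spans UTM zones 31, 32 and 33 can be checked with the standard zone rule, under which zones are 6° wide and numbered eastwards from 180° W. The following is a minimal sketch; the function names and the three sample longitudes (approximate values for Lagos, Abuja and Maiduguri) are illustrative assumptions and are not taken from the survey data. It does not perform the full latitude/longitude to easting/northing projection that was carried out in PAST.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM longitudinal zone (1-60) for a longitude in degrees east.

    The special zone boundaries around Norway and Svalbard are ignored.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must lie in [-180, 180]")
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    return min(zone, 60)  # longitude 180 is assigned to zone 60


def central_meridian(zone):
    """Central meridian of a UTM zone, in degrees east."""
    return 6 * zone - 183


# Approximate longitudes, used only for illustration.
sample_longitudes = {"Lagos": 3.4, "Abuja": 7.5, "Maiduguri": 13.2}
zones = {city: utm_zone(lon) for city, lon in sample_longitudes.items()}
print(zones)
```

Because easting is measured relative to the central meridian of each zone, easting values from different zones are not directly comparable unless all points are projected onto a single common zone. Standard UTM also reports easting with a 500,000 m false easting added, so that values west of the central meridian stay positive.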
The spatial weighting matrix was created in GeoDa software using the queen definition of neighbour discussed in Section "Spatial data concept and model formulation", formatted as an "spmat" object and imported into STATA software for further exploration of the data. Basic information about the spatial weighting matrix is presented in Table 6. The number of neighbours per cluster ranges between 2 and 12, with each cluster having about 6 neighbours on average and a total of 1920 links. The 3-D surface contour map, variogram plot, weighting matrix creation and regression modelling were carried out using the Surfer, R, GeoDa and STATA software packages, respectively. Each software package was chosen based on the ease of undertaking the assigned task and the time of execution.


Figure 6. Map of Nigeria Showing the Six Geopolitical Zones (top) and Monthly distribution of Precipitation of Nigeria (bottom).

Matrix       Description
Dimension    326 × 326
Total        1920
Minimum      2
Mean         5.889751
Maximum      12

Table 6. Summary of spatial-weighting object, W.
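The quantities reported in Table 6 (total, minimum, mean and maximum number of links) and the row standardization applied to W can be illustrated on a toy example. The sketch below is only an illustration, assuming a regular 3 × 3 grid of cells rather than the 326 survey clusters. All function names are hypothetical, and it does not reproduce how GeoDa derives queen contiguity for the actual data.

```python
def queen_neighbours(n_rows, n_cols):
    """Queen contiguity on a regular grid: cells that share an edge or a
    corner are neighbours. Returns {cell_index: [neighbour_indices]}."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours[r * n_cols + c] = [
                rr * n_cols + cc
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
    return neighbours


def summarise_links(neighbours):
    """Total, minimum, mean and maximum number of links per unit."""
    counts = [len(js) for js in neighbours.values()]
    total = sum(counts)
    return {
        "n": len(counts),
        "total": total,
        "min": min(counts),
        "mean": total / len(counts),
        "max": max(counts),
    }


def row_standardise(neighbours):
    """Return W as nested dicts in which each row of weights sums to 1."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbours.items()}


grid = queen_neighbours(3, 3)
summary = summarise_links(grid)
W = row_standardise(grid)
print(summary)
```

On this grid the corner cells have 3 neighbours, the edge cells 5 and the centre cell 8, so the mean number of links is simply the total divided by the number of cells. After row standardization, the spatial lag Wy of a variable is the average of its values over each unit's neighbours.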


The weighting matrix created by the "spmat" command was used to produce a cluster map for precipitation and was thereafter converted to a text file by using the "export" command in STATA. The resulting text file was saved as a "dta" file and imported as a row-standardized weighting matrix of the "spatwmat" object. This "spatwmat" weighting matrix format was used for the global and local indicators of spatial autocorrelation as well as to produce Moran's I scatter plot.

$$y = \rho W y + X\beta + W X\theta + u, \qquad u = \lambda W u + \varepsilon \tag{12}$$

$$|\lambda| < 1,\quad |\rho| < 1,\quad |\theta| < 1$$

The general spatial regression expressed in Eq. (12) was transformed into five different models by imposing zero restrictions on the parameters rho (ρ), lambda (λ) and theta (θ). This produces four spatial regression models and one non-spatial regression model. When ρ, λ and θ are all zero, the traditional OLS model (Eq. 1) is recovered. The SAR and SEM models expressed in Eqs. 3 and 4 arise when λ = 0, θ = 0 and when ρ = 0, θ = 0, respectively. The Durbin and SARAR models result from the restrictions λ = 0 and θ = 0, respectively. In this expression, y (precipitation) is the dependent variable, X is the matrix of exogenous variables, namely northing, easting and elevation, and W is the spatial weighting matrix summarized in Table 6. ρ, λ and θ are the coefficients of the spatially lagged dependent variable, the spatially lagged error term and the spatially lagged independent variables, respectively, while ε is the independent and identically distributed error term. All the spatial models were estimated using the "spmlreg" STATA module, which is based on the maximum likelihood method.
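To make the nesting explicit, the five specifications can be written out case by case. The display below is a sketch obtained by substituting each set of zero restrictions into Eq. (12). The tabular layout is added here for illustration, and the equation numbers of the individual models in the earlier sections are not reproduced.

```latex
\begin{aligned}
\text{OLS}    &: \rho = \lambda = \theta = 0 & y &= X\beta + \varepsilon \\
\text{SAR}    &: \lambda = 0,\ \theta = 0    & y &= \rho W y + X\beta + \varepsilon \\
\text{SEM}    &: \rho = 0,\ \theta = 0       & y &= X\beta + u, \quad u = \lambda W u + \varepsilon \\
\text{Durbin} &: \lambda = 0                 & y &= \rho W y + X\beta + W X\theta + \varepsilon \\
\text{SARAR}  &: \theta = 0                  & y &= \rho W y + X\beta + u, \quad u = \lambda W u + \varepsilon
\end{aligned}
```

Read this way, SAR and SEM are each nested in SARAR (set λ = 0 or ρ = 0), and SAR is also nested in the Durbin model (set θ = 0), which is why the model selection criteria in Table 5 can be compared across all five fits.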

Received: 25 March 2021; Accepted: 29 July 2021

References
1. Larrabee, B., Scott, H. M. & Bello, N. M. Ordinary least squares regression of ordered categorical data: inferential implications for practice. J. Agric. Biol. Environ. Stat. 19, 373–386 (2014).
2. Tobler, W. R. Philosophy in Geography 379–386 (Springer, 1979).
3. Anees, M. T. et al. Spatial estimation of average daily precipitation using multiple linear regression by using topographic and wind speed variables in tropical climate. J. Environ. Eng. Landsc. Manag. 26(4), 299–316. https://doi.org/10.3846/jeelm.2018.6337 (2018).
4. Satagopan, J. & Rajagopalan, B. Comparing spatial estimation techniques for precipitation analysis. In Stochastic and Statistical Methods in Hydrology and Environmental Engineering, Water Science and Technology Library Vol. 10/3 (eds Hipel, K. W. et al.) (Springer, Dordrecht, 1994).
5. Olubusoye, O. E., Okunlola, O. A. & Korter, G. O. Estimating bias of omitting spatial effect in spatial autoregressive (SAR) model. Inter. J. Stat. Appl. 5, 150–156 (2015).
6. Zwiers, F. W. & Von Storch, H. On the role of statistics in climate research. Int. J. Climatol. J. R. Meteorol. Soc. 24, 665–680 (2004).
7. Unwin, D. J. in International Encyclopedia of Human Geography (eds Rob Kitchin & Nigel Thrift) 452–457 (Elsevier, 2009).
8. Gitz, V., Meybeck, A., Lipper, L., Young, C. D. & Braatz, S. Climate change and food security: risks and responses. Food and Agriculture Organization of the United Nations (FAO) Report 110 (2016).
9. Adewole, O. O. & Serifat, F. Modelling rainfall series in the geo-political zones of Nigeria. J. Environ. Earth Sci. 5, 100–111 (2015).
10. Yasrebi, J. et al. Spatial variability of soil fertility properties for precision agriculture in Southern Iran. J. Appl. Sci. 8, 1642–1650 (2008).
11. Winter, T. C., Harvey, J. W., Franke, O. L. & Alley, W. M. Groundwater and Surface Water: A Single Resource Vol. 1139 (US Geological Survey, 1998).
12. Semmler, T. & Jacob, D. Modeling extreme precipitation events – a climate change simulation for Europe. Global Planet. Change 44, 119–127 (2004).
13. Goovaerts, P. Geostatistical approaches for incorporating elevation into the spatial interpolation of rainfall. J. Hydrol. 228, 113–129 (2000).
14. Kieffer Weisse, A. & Bois, P. Topographic effects on statistical characteristics of heavy rainfall and mapping in the French Alps. J. Appl. Meteorol. 40, 720–740 (2001).
15. Marquinez, J., Lastra, J. & Garcia, P. Estimation models for precipitation in mountainous regions: the use of GIS and multivariate analysis. J. Hydrol. 270, 1–11 (2003).
16. Kyriakidis, P. C., Miller, N. L. & Kim, J. A spatial time series framework for simulating daily precipitation at regional scales. J. Hydrol. 297, 236–255 (2004).
17. Arora, M., Singh, P., Goel, N. K. & Singh, R. D. Spatial distribution and seasonal variability of rainfall in a mountainous basin in the Himalayan Region. Water Resour. Manag. 20, 489–508 (2006).
18. Hession, S. L. & Moore, N. A spatial regression analysis of the influence of topography on monthly rainfall in East Africa. Int. J. Climatol. 31(10), 1440–1456 (2011).
19. Diodato, N. The influence of topographic co-variables on the spatial variability of the precipitation over small regions of complex terrain. Int. J. Climatol. 25, 351–363 (2005).
20. Oettli, P. & Camberlin, P. Influence of topography on monthly rainfall distribution over East Africa. Climate Res. 283, 199–212 (2005).
21. Youssef, K. & Hüseyin, G. Do quadratic and Poisson regression models help to predict monthly rainfall? Desalination Water Treat. 215, 288–318. https://doi.org/10.5004/dwt.2021.26397 (2021).
22. Ahmad, A. B. & Mustapha, B. M. Monthly rainfall prediction using artificial neural network: a case study of Kano Nigeria. Environ. Earth Sci. Res. J. 5(2), 37–41. https://doi.org/10.18280/eesrj.050201 (2018).
23. Peter, E. E. & Precious, E. E. Skill comparison of multiple-linear regression model and artificial neural network model in seasonal rainfall prediction-North East Nigeria. Asian Res. J. Math. 11(2), 1–10 (2018).
24. Ogunrinde, A. T., Oguntunde, P. G., Fasinmirin, J. T. & Akinwumiju, A. S. Application of artificial neural network for forecasting standardized precipitation and evapotranspiration index: a case study of Nigeria. Eng. Rep. https://doi.org/10.1002/eng2.12194 (2020).
25. Huang, Y. et al. Spatial and temporal variability in the precipitation concentration in the upper reaches of the Hongshui River basin, southwestern China. Adv. Meteorol. 2018 (2018).
26. Gajbhiye, S., Meshram, C., Singh, S. K., Srivastava, P. K. & , T. Precipitation trend analysis of Sindh River basin, India, from a 102-year record (1901–2002). Atmos. Sci. Lett. 17, 71–77 (2016).
27. Odekunle, T., Orinmoogunje, I. & Ayanlade, A. Application of GIS to assess rainfall variability impacts on crop yield in Guinean Savanna part of Nigeria. Afr. J. Biotechnol. 6 (2007).


28. LeSage, J. P. An introduction to spatial econometrics. Open Ed. J. 123, 19–44 (2008).
29. Debreu, G. & Herstein, I. N. Nonnegative square matrices. Econometrica 21, 597–607 (1953).
30. LeSage, J. P. & Pace, R. K. Handbook of Applied Spatial Analysis 355–376 (Springer, 2010).
31. Anthony, J. G., Karligash, K. & Robin, S. The economic case for the spatial error model with an application to state vehicle usage in the U.S. Science of the Total Environment 407, 3 (2012).
32. Kelejian, H. H. & Prucha, I. R. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real Estate Finance Econ. 17, 99–121 (1998).
33. Elhorst, J. P. Applied spatial econometrics: raising the bar. Spat. Econ. Anal. 5, 9–28 (2010).
34. LeSage, J. P. What regional scientists need to know about spatial econometrics. Available at SSRN 2420725 (2014).
35. Ram, D., Ravindra, B. & Soonil, R. Geostatistical Approaches for Estimating Rainfall over Mauritius. Conference: 3rd Research Week 2009–2010, International Conference, University of Mauritius. https://www.researchgate.net/publication/236329459_Geostatistical_Approaches_For_Estimating_Rainfall_Over_Mauritius
36. Rey, S. J. et al. Open geospatial analytics with PySAL. ISPRS Int. J. Geo Inf. 4, 815–836 (2015).
37. Anselin, L. An introduction to spatial autocorrelation analysis with GeoDa (University of Illinois, Champaign-Urbana, Illinois, 2003).
38. Anselin, L. Spatial data science for an enhanced understanding of urban dynamics. The Cities Papers (2015).
39. Anselin, L. Chapter eight the Moran scatterplot as an ESDA tool to assess local instability in spatial association. Spatial Anal. 4, 121 (1996).
40. Okunlola, O. A. & Oyeyemi, O. T. Spatio-temporal analysis of association between incidence of malaria and environmental predictors of malaria transmission in Nigeria. Sci. Rep. 9, 1–11 (2019).
41. Anselin, L. Exploring spatial data with GeoDa™: a workbook. Center for Spatially Integrated Social Science (2005).
42. Anselin, L., Syabri, I. & Kho, Y. GeoDa: an introduction to spatial data analysis. Geogr. Anal. 38, 5–22 (2006).
43. López-Granados, F. et al. Spatial variability of agricultural soil parameters in southern Spain. Plant Soil 246, 97–105 (2002).
44. Mehrjardi, R. T., Jahromi, M. Z. & Heidari, A. Spatial Distribution of Groundwater Quality with Geostatistics (Case Study: Yazd-Ardakan Plain) 1. (2008).
45. Nayanaka, V., Vitharana, W. & Mapa, R. Geostatistical analysis of soil properties to support spatial sampling in a paddy growing alfisol (2010).
46. Kuswantoro, H. & Zen, S. Performance of acid-tolerant soybean promising lines in two planting seasons. Int. J. Biol. 5, 49 (2013).
47. Ly, S., Charles, C. & Degré, A. Different methods for spatial interpolation of rainfall data for operational hydrology and hydrological modeling at watershed scale: a review. Biotechnol. Agron. Soc. Environ. 17, 392–406 (2013).
48. Matthews, S. A. in GISPopSci Workshop, Friday, June. 267–281.
49. An, Y. & Wan, L. Monitoring spatial changes in manufacturing firms in Seoul metropolitan area using firm life cycle and locational factors. Sustainability 11, 3808 (2019).
50. Franzese, R. J. Jr. & Hays, J. C. Spatial econometric models of cross-sectional interdependence in political science panel and time-series-cross-section data. Polit. Anal. 15, 140–164 (2007).
51. Anselin, L. Spatial Econometrics: Methods and Models (Springer, 1998).
52. Baron, K. & Aldstadt, J. An ArcGIS Application of Spatial Statistics to Precipitation Modeling. Journal of Hydrologic Engineering (2002). https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.210.3308&rep=rep1&type=pdf
53. National Malaria Elimination Programme (NMEP), National Population Commission (NPopC), National Bureau of Statistics (NBS), and ICF International. 2016. Nigeria Malaria Indicator Survey 2015. , Nigeria, and Rockville, Maryland, USA: NMEP, NPopC, and ICF International.
54. George, G. (ed.) Spatial Analysis Methods and Practice: Describe – Explore – Explain through GIS 59–146 (Cambridge University Press, 2020).
55. Kulldorff, M. A spatial scan statistic. Commun. Stat.-Theory Methods 26, 1481–1496 (1997).

Acknowledgements
This publication was supported by the construction EFOP-3.6.3-VEKOP-16–2017-00007 ("Young researchers from talented students – Supporting scientific career in research activities in higher education"). The project was supported by the European Union, co-financed by the European Social Fund.

Author contributions
Conceptualization, O.O.A. and O.O.E.; methodology, O.O.A. and M.A.; software, O.O.A. and L.A.F.; validation, M.A.; formal analysis, O.O.A. and M.A.; investigation, A.K. and I.S.; resources, I.S.; data curation, O.O.A.; writing (original draft preparation), all authors; writing (review and editing), M.A. and I.S.; visualization, O. and M.A.; supervision, M.A.

Competing interests
The authors declare no competing interests.

Additional information
Correspondence and requests for materials should be addressed to M.A.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021
