Best Weighted Selection in Handling Error Heterogeneity Problem on Spatial Regression Model
Total Page:16
File Type:pdf, Size:1020Kb
Best Weighted Selection in Handling Error Heterogeneity Problem on Spatial Regression Model Sri Sulistijowati Handajani1, Cornelia Ardiana Savita2, Hasih Pratiwi1, and Yuliana Susanti1 1Statistics of Study Program FMIPA, Universitas Sebelas Maret, Jl.Ir. Sutami 36 A Kentingan, Surakarta, Indonesia 2Mathematics of Study Program FMIPA, Universitas Sebelas Maret, Jl.Ir. Sutami 36 A Kentingan, Surakarta, Indonesia Keywords: Spatial Regression Model, Heterogeneity in Error, Ensemble Technique, R2, RMSE. Abstract: Spatial regression model is a regression model that is formed because of the relationship between independent variables with dependent variable with spasial effect. This is due to a strong relationship of observation in a location with other adjacent locations. One of assumptions in spatial regression model is homogeneous of error variance, but we often find the diversity of data in several different locations. This causes the assumption is not met. One such case is the poverty case data in Central Java Province. The objective of this research is to get the best model from this data with the heterogeneity in error. Ensemble technique is done by simulating noises (m) from normal distribution with mean nol and a standard deviation σ of the spasial model error taken and adding noise to the dependent variable. The technique is done by comparing the queen weighted and the cross-correlation normalization weighted in forming the model. Furthermore, with these two weights, the results will be compared using R2 and RMSE on the poverty case data in province of Central Java. Both of weights are calculated to determine the significant factors that give influence on poverty and to choose the best model. The results of the case study show that the spatial regression model of the SEM ensemble already does not have a variance error that is not homogeneous and the model using cross-normalization weight is better than the spatial regression model of SEM ensemble with Queen contiguity weight. 1 INTRODUCTION obtained without considering the spatial effect then the conclusions obtained will be invalid. Regression modeling is one form of classical Models with spatial effects have been proposed by modeling that is used to provide a model of the Qu (2013), i.e. the Spatial Auto Regressive (SAR) relationship between independent variables with model which shows the dependence of observation on dependent variable. Fulfillment of the assumptions of the dependent variable (autoregression) between the this modeling is necessary to ensure that the model is locations. While Mac Millen (1992) discusses the good and can be used for prediction. The problem that model of Spatial Error Model (SEM) which indicates often arises is when the model does not meet the an error correlation between locations. necessary assumptions. In spatial models that formed, there arose another The condition of data in the field that the more problem that is the heterogeneity of errors that diverse the pattern resulted in the development of resulted in instability in the parameter estimation. The existing classical models. One of the assumptions that instability of the parameter prediction resulted in less often is not met is the autocorrelation of errors. If the valid results. This is also stated by Dimopoulus, dependent variable is drawn from several adjacent Tsiros, Serelis and Chronopoulou (2004) regarding areas, this often results in errors of the resulting its application in the neural network model for model being correlated. The phenomena studied often various nonlinear problems. This instability is a show significant associations or interactions of weakness of the model that is formed. variables in adjacent areas, as expressed by Anselin Several approaches to the classical regression (1988). The development of a regression model by model have been done to overcome these assumption adding spatial effects can eliminate any dependencies deviations. Such as the robust regression method by between errors. This is in accordance with the results Chen (2002) with its emphasis on the detection of of Lesage (1997) which states that if the model extreme observations called outliers and its resistance 293 Handajani, S., Savita, C., Pratiwi, H. and Susanti, Y. Best Weighted Selection in Handling Error Heterogeneity Problem on Spatial Regression Model. DOI: 10.5220/0008521002930299 In Proceedings of the International Conference on Mathematics and Islam (ICMIs 2018), pages 293-299 ISBN: 978-989-758-407-7 Copyright c 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved ICMIs 2018 - International Conference on Mathematics and Islam to the model. Approach of regression model with 푀 independent variable of the data in the i-th estimation has also been proposed by Montgomery observation unit and the t-time, 푊 is the standardized and Peck (2006) and Yuliana and Susanti (2008). The spatial weighted row matrix, α is the intercept, β is the model solutions with S estimation have been parameter of the independent variable, 푢 spatial error introduced by Rousseeuw and Yohai (1984), and MM 푖푡 estimation has been discussed also by Yohai (1987). in the i-th region of time t, and 휀푖푡 is the model error The problems that arise is how to approach the on the i-th observation and the th time. regression model that error in addition to The model (1) is further simplified to be autocorrelation also there is heterogeneity in the error obtained from the classical regression model. 퐘 = 퐙후 + (퐈 − 훌퐖)−ퟏ휺 To overcome the errors that contain autocorrelation is by spatial regression modeling. In where Z = [I : X] dan 후 = [휶 ∶ 휷]푻 . this research will be discussed the problem of heterogeneity of error on spatial regression model By estimating the parameters using the maximum with solution which will be applied in this research is likelihood method, it is necessary to first form the ensemble method. According to Mevik, Segtnan and likelihood function. Using the Jacobian Naes (2005) and Canuto, Oliveira, Junior, Santos and transformation is obtained Abreu (2005), ensemble techniques can be used to reduce the diversity contained in predictive models 휕휀 휕(퐘−퐙후)(퐈−훌W) and can improve prediction accuracy. This method is | | = | | = |퐈 − 훌퐖|, as one solution by combining k spatial regression 휕퐘 휕퐘 model formed from the addition of noise. The approach is through non-hybrid ensemble approach and likelihood function : and hybrid ensemble approach. The principle of non- 푇 hybrid ensemble method is to combine estimation 2 |퐈−훌퐖| ((퐘−퐙후)(퐈−훌퐖)) (퐘−퐙후)(퐈−훌퐖) 퐿(훾, 휆, 휎 ) = 푛 푒푥푝 (− 2 ) results from simulation of one model to a final (2휋휎2)2 2휎 estimate. While hybrid ensemble method involves (2) several suitable models and combine the predicted simulation results generated by each model into one To facilitate the estimation of the parameters, the final prediction. In this study we studied non-hybrid two sections of (2) are logged and the following ensemble approach by comparing queen weights with results are obtained spatial weights of cross-correlation normalization and using R2 and RMSE indicators to select the best 푛 푛 푙푛퐿(훾, 휆, 휎2) = − ln 2π − ln 휎2 approach. 2 2 푇 ((풀 − 풁휸)(푰 − 흀푾)) (풀 − 풁휸)(푰 − 흀푾) + ln|푰 − 흀푾| (− ) 2휎2 2 SPATIAL PANEL REGRESSION MODEL (3) In the data taken based on time and location, the 2 analyzes were performed using panel data analysis. By partially deriving (3) against 휎 , 훾 and λ and Due to the effect of spatial effect in panel data making it equal to zero, we get the estimator for 휎2, analysis so that the appropriate model used is spatial 훾 and λ as follows panel regression model. One of the spatial panel regression models is the panel spatial error model (SEM) (Lesage, 2009) yang sebelumnya 2 휕푙푛퐿(훾, 휆, 휎 ) dikembangkan dari model SEM yang diusulkan = Anselin (2003). The SEM panel regression model is 휕휎2 푇 푌 = 훼 + 푋 훽 + 푢 ; 푢 = 휆푊푢 + 휀 (1) 푛 ((퐘 − 퐙후)(퐈 − 훌퐖)) (퐘 − 퐙후)(퐈 − 훌퐖) 푖푡 푖푡 푖푡 푖푡 푖푡 푖푡 − + = 0 2휎2 2휎22 (퐘 − 퐙후 − 훒퐖퐘)푇(퐘 − 퐙후 − 훒퐖퐘) where 푌푖푡 being the dependent variable of the data in 휎̂2 = th 푛 the 푖 observation unit and the t-time,푋푖푡is the 294 Best Weighted Selection in Handling Error Heterogeneity Problem on Spatial Regression Model 휕푙푛퐿(훾, 휆, 휎2) 3. Determine independent variables that = 휕훾 significantly influence the percentage of poor people with stepwise method to form a simple (퐙(훌퐖퐘)푻) − 퐙(퐘)푻 + (퐈 − 훌퐖) ( ) linear regression model. 훾(퐙푇풁 − 퐙(퐙훌퐖)푻)푇 4. Determine the weighted matrix by using the = 0 휎2 spatial matrix of Queen contiguity and the cross- correlation normalization matrix. (퐘푇퐙 − 퐙(훌퐖퐘)푻)(퐈 − 훌퐖) 훾̂ = 5. Detect spatial effect by using Moran Index test. (퐙푇퐙 − 퐙(퐙훌퐖)푻)(퐈 − 훌퐖) 6. Test the LM to determine the effect of spatial dependence. 7. Establish a corresponding spatial regression 휕푙푛퐿(훾, 휆, 휎2) = model and test its assumptions. 휕휆 푛 8. Add noise which is generated k times from the −휔 1 푖 푇 푇 normal distribution with zero mean and model ∑ − 2 ((퐈 − 훌퐖)(2(퐙후퐖퐘) − 퐙후(퐙후퐖) )) 1 − 휆휔푖 휎 푖=1 error variance of σ and giving a zero value to the = 0 negative value data, to dependent variable to generate k new data. 9. In the k new data is done spatial regression 푛 휎2 −휔 modeling as follows. ̂ 푖 푇 휆 = ( (∑ )) + 퐙후(퐙후퐖) a. Test Lagrange for spatial dependence 퐈 − 훌퐖 1 − 휆휔푖 푖=1 b. Test the Breusch Pagan to test for spatial 푇 − 2(퐙후퐖퐘) diversity c. Estimate model parameters and test their significance d. Measuring the goodness of the spatial 3 EXPERIMENTAL DETAILS regression model with R2. 10. Establish an ensemble model which is a 1. Derive estimation the parameters for the SEM composite of k spatial regression models by spatial panel regression model with maximum calculating the average coefficients of the model.