Poverty Data Modelling in North Sumatera Province Using Geographically Weighted Regression (GWR) Method

Kristina Pestaria Sinaga
Graduate Student of Mathematics, University of North Sumatera
email: [email protected]

Abstract. In an ordinary regression equation, a response variable is related to several predictor variables through a single set of estimated parameters. These parameters describe the relationship of each predictor variable with the response variable. When applied to spatial data, however, this model is not always valid, because differences between locations can lead to different model estimates. One analysis that accommodates spatial conditions is a locally linear regression called Geographically Weighted Regression (GWR). The basic idea of the GWR model is to use geographical position, or location, as a weight in estimating the model parameters. The GWR parameter estimates are obtained by Weighted Least Squares (WLS), giving different weights to every location where data are observed. In many GWR analyses, including this research, the weights come from a Gaussian kernel, which requires a bandwidth: a distance parameter that determines how far the influence of each location extends. The optimum bandwidth can be obtained by minimizing the cross-validation value. This research aims to compare the results of the global regression model with those of the GWR model in predicting the poverty percentage. The case-study data cover 33 cities/regencies in North Sumatera province.

Received dd-mm-2015, Accepted dd-mm-2015.
2015 Mathematics Subject Classification:
Key words and Phrases: GWR, WLS, Kernel Function, Poverty.

1. INTRODUCTION

Poverty is one of the fundamental issues that concern governments all over the world. In Indonesia, poverty is still one of the biggest problems[1].
Both the central government and local governments have tried to implement policies and programs to overcome poverty, but much remains to be accomplished. An important aspect of overcoming poverty is determining how poverty is measured. Reliable measurement can be a very important element in policy making regarding the lives of the poor[2]. To know the number, spread, and condition of the poor in a given area, an accurate poverty measurement is needed so that policies and programs can be effective. BPS has also developed a method to obtain an operational criterion for determining the number of poor households[3]. This method was used in the 2005 Pendataan Sosial Ekonomi (PSE) (Social Economic Census), which used 14 indicator variables to determine poverty status.

In practice, however, this method of determining the poverty rate is still global; in other words, it applies identically to all locations being observed. In fact, conditions in one location are not always the same as in other locations, possibly because of geographical factors (spatial variation), social and cultural background, and other circumstances surrounding the location. A global model for determining the poverty rate is therefore not suitable when spatial heterogeneity is present. One effect of spatial heterogeneity is spatially varying regression parameters. Global regression assumes that the estimated regression parameters are constant, meaning that they are the same at every point in the study area. If the regression parameters are spatially heterogeneous, global regression becomes less capable of explaining the real data phenomena[7]. This research aims to model poverty in North Sumatera province in 2013 using a fixed Gaussian kernel weight and to test the GWR model parameters.

2. LITERATURE REVIEW

2.1 Linear Regression

The method most often used to describe the relationship between a response variable and predictor variables is regression. The linear regression model for p predictor variables and n observations, in matrix form, is [11, 12]:

\[
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix}
1 & X_{11} & X_{12} & \cdots & X_{1p} \\
1 & X_{21} & X_{22} & \cdots & X_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{n1} & X_{n2} & \cdots & X_{np}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\tag{1}
\]

Equation (1) is the general form of the regression equation in matrix notation. Here $Y$ is the $n \times 1$ response vector, $X$ is the $n \times (p+1)$ predictor matrix, $\beta$ is the $(p+1) \times 1$ parameter vector, and $\varepsilon$ is the $n \times 1$ error vector.

Model (1) is also called the global regression model because it assumes that the relationship between the response variable and the predictor variables is constant, so the parameter estimates are the same at every place where the data were collected[4, 11]. The parameters of the global regression model are usually estimated by Ordinary Least Squares (OLS)[13]. For n observations with p independent variables, the regression model can be written as:

\[
y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, 2, \ldots, n
\tag{2}
\]

where $\beta_0, \beta_1, \ldots, \beta_p$ are the model parameters and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are errors assumed to be independent and identically normally distributed with zero mean and constant variance. In this model, the relationship between the independent variables and the dependent variable is considered constant across geographical locations[11, 12]. The estimator of the model parameters is:

\[
\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)^T = (X^T X)^{-1} X^T Y
\tag{3}
\]

The test statistic $F_{value}$ of the regression model is [11]:

\[
F_{value} = \frac{MSR}{MSE}
\tag{4}
\]

and $H_0$ is rejected if $F_{value} > F_{table(\alpha,\, p,\, n-p-1)}$.

The coefficient of determination can be computed from the analysis of variance table [11]:

\[
R^2 = \frac{SSR}{SST}
\tag{5}
\]

A partial test is carried out to identify which parameters are significant and can be used in the model [11, 12]. The test statistic $t_{value}$ of the regression model [11] is:

\[
t_{value} = \frac{\hat{\beta}_k}{S(\hat{\beta}_k)}
\tag{6}
\]

The parameter is significant in the model if $|t_{value}| > t_{table(\alpha/2,\, n-p-1)}$.

2.2 Geographically Weighted Regression (GWR)

Geographically Weighted Regression (GWR) is an extension of the global regression model into a weighted regression model[4, 8]. In GWR, the response variable is modelled with coefficients that depend on the location of each area. The GWR model can be formulated as:

\[
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i
\tag{7}
\]

in which:
$y_i$ : observed value of the response variable at location i
$(u_i, v_i)$ : coordinate point (longitude, latitude) of location i
$\beta_k(u_i, v_i)$ : regression coefficient of predictor variable k at location i
$x_{ik}$ : observed value of predictor k at observation i
$\varepsilon_i$ : random error of observation i

Hypothesis testing in the GWR model relies on the following assumptions:
1. The error terms $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are assumed to be independent and identically normally distributed with zero mean and constant variance.
2. If $\hat{y}_i$ is an estimator of $y_i$ at location i, then at all locations ($i = 1, 2, \ldots, n$) $\hat{y}_i$ is an unbiased estimator of $E(y_i)$; that is, $E(\hat{y}_i) = E(y_i)$ for all i.

2.2.1 Building the GWR Model

Spatial weights express the position of the observations relative to one another. Nearby and moderately distant locations receive large weights, while distant locations receive small weights. A kernel function determines the size of the weight for each location in the GWR model[9]. The weight functions can be written as follows:

1. Gaussian:
\[
w_j(u_i, v_i) = \exp\left[ -\frac{1}{2} \left( \frac{d_{ij}}{h} \right)^2 \right]
\]

2. Adaptive Gaussian:
\[
w_j(u_i, v_i) = \exp\left[ -\frac{1}{2} \left( \frac{d_{ij}}{h_i} \right)^2 \right]
\]

3. Bisquare:
\[
w_j(u_i, v_i) =
\begin{cases}
\left( 1 - \left( \frac{d_{ij}}{h} \right)^2 \right)^2, & \text{for } d_{ij} \le h, \\
0, & \text{for } d_{ij} > h.
\end{cases}
\]

4. Adaptive Bisquare:
\[
w_j(u_i, v_i) =
\begin{cases}
\left( 1 - \left( \frac{d_{ij}}{h_i} \right)^2 \right)^2, & \text{for } d_{ij} \le h_i, \\
0, & \text{for } d_{ij} > h_i.
\end{cases}
\]

5. Tricube:
\[
w_j(u_i, v_i) =
\begin{cases}
\left( 1 - \left( \frac{d_{ij}}{h} \right)^3 \right)^3, & \text{for } d_{ij} \le h, \\
0, & \text{for } d_{ij} > h.
\end{cases}
\]

6. Adaptive Tricube:
\[
w_j(u_i, v_i) =
\begin{cases}
\left( 1 - \left( \frac{d_{ij}}{h_i} \right)^3 \right)^3, & \text{for } d_{ij} \le h_i, \\
0, & \text{for } d_{ij} > h_i.
\end{cases}
\]

with $d_{ij} = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}$, the Euclidean distance between location $(u_i, v_i)$ and location $(u_j, v_j)$, and $h$ a non-negative parameter known as the smoothing parameter or bandwidth (in the adaptive versions, $h_i$ is a bandwidth specific to location i). When the weights come from a kernel function, the choice of bandwidth is very important because the bandwidth controls the balance between the fit of the curve to the data and the smoothness of the estimate[10]. The method used to choose the optimum bandwidth is Cross Validation (CV), defined as:

\[
CV(h) = \sum_{i=1}^{n} \left( y_i - \hat{y}_{\neq i}(h) \right)^2
\tag{8}
\]

with:
$\hat{y}_{\neq i}(h)$ : fitted value of $y_i$ when the observation at location $(u_i, v_i)$ is omitted from the estimation process
$\hat{y}_i(h)$ : fitted value of $y_i$ when the observation at location $(u_i, v_i)$ is included in the estimation process
$n$ : total sample size

2.2.2 Model Hypothesis Testing (GWR)

The goodness of fit of the GWR model is tested using the following hypotheses:

$H_0$ : $\beta_k(u_i, v_i) = \beta_k$ (there is no difference between OLS and GWR)
$H_1$ : at least one $\beta_k(u_i, v_i) \neq \beta_k$ (there is a difference between OLS and GWR)

Test statistic:

\[
F_{value} = \frac{(RSS_{OLS} - RSS_{GWR}) / v}{RSS_{GWR} / \delta_i}
\tag{9}
\]

Rejection region: reject $H_0$ if $F_{value} > F_{(\alpha;\, df_1;\, df_2)}$.

Parameter significance at each location is assessed with a partial test on the local parameters. This test determines whether each $\beta_k(u_i, v_i)$ has a significant partial effect on the response variable in the GWR model.
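The estimation steps of Section 2.2.1 (Gaussian kernel weights, a weighted least squares fit at each location, and bandwidth selection by minimizing CV(h) in Equation (8)) can be illustrated with a minimal sketch. It is not the implementation used in this research. To stay dependency-free it assumes a single predictor, so each local fit reduces to the closed-form weighted simple regression. The function names (`gaussian_weights`, `local_wls`, `gwr_fit`, `cv_score`, `best_bandwidth`), the candidate bandwidths, and the synthetic grid data are all hypothetical and are not the North Sumatera data.

```python
import math

def gaussian_weights(coords, i, h):
    """Fixed Gaussian kernel: w_j(u_i, v_i) = exp(-0.5 * (d_ij / h)**2)."""
    ui, vi = coords[i]
    return [math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / h) ** 2)
            for uj, vj in coords]

def local_wls(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
    sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def gwr_fit(coords, x, y, h):
    """Local coefficients (b0, b1) estimated at every observation location."""
    return [local_wls(x, y, gaussian_weights(coords, i, h))
            for i in range(len(y))]

def cv_score(coords, x, y, h):
    """CV(h): observation i gets zero weight when the model is fitted at i."""
    total = 0.0
    for i in range(len(y)):
        w = gaussian_weights(coords, i, h)
        w[i] = 0.0  # omit observation i from its own local fit
        b0, b1 = local_wls(x, y, w)
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total

def best_bandwidth(coords, x, y, candidates):
    """Pick the candidate bandwidth with the smallest CV score."""
    return min(candidates, key=lambda h: cv_score(coords, x, y, h))

# Synthetic 4 x 3 grid (assumption): the slope grows with the u coordinate.
coords = [(float(u), float(v)) for u in range(4) for v in range(3)]
x = [float((7 * j) % 5 + 1) for j in range(len(coords))]
y = [2.0 + (1.0 + 0.5 * u) * xj for (u, _), xj in zip(coords, x)]

h = best_bandwidth(coords, x, y, [0.5, 1.0, 2.0, 4.0])
print("selected bandwidth:", h)
for (u, v), (b0, b1) in zip(coords, gwr_fit(coords, x, y, h)):
    print(f"({u:.0f}, {v:.0f}): b0 = {b0:.3f}, b1 = {b1:.3f}")
```

With p predictors the local fit generalizes to the weighted form of Equation (3), $\hat{\beta}(u_i, v_i) = (X^T W_i X)^{-1} X^T W_i Y$, where $W_i$ is the diagonal matrix of the weights $w_j(u_i, v_i)$. A grid search over candidate bandwidths is only one simple way to minimize CV(h); it is used here for readability.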
