Proceeding ICST (2021) e-ISSN: 2722-7375 Vol. 2, June 2021

Poverty data modelling in West Nusa Tenggara Province using panel data regression analysis

Shilvia Aodia, Nurul Fitriyani, Marwan

Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Mataram, Jl. Majapahit 62, Mataram

Author’s e-mail: [email protected], [email protected], [email protected]

Abstract. Indonesia still faces poverty issues, including in one of its provinces, West Nusa Tenggara. This study aimed to build a poverty model and determine the dominant factors affecting the number of poor people in West Nusa Tenggara Province using panel data regression analysis. The fixed-effect model, with a different intercept for each individual (regency/city), was selected as the panel data regression model. Based on the research conducted, the best model is

$$\log_{10}\hat{Y} = \hat{\beta}_{0i} + 0.234908\,\log_{10}X_1 - 2.112122\,\log_{10}X_2$$

The individual intercepts were as follows: Bima Regency at 7.347; Dompu Regency at 7.101; West Lombok Regency at 7.508; Central Lombok Regency at 7.559; East Lombok Regency at 7.714; North Lombok Regency at 7.376; Sumbawa Regency at 7.346; West Sumbawa Regency at 7.020; Bima City at 6.905; and Mataram City at 7.310. The two most dominant factors affecting the number of poor people in West Nusa Tenggara Province were population ($X_1$) and the Human Development Index ($X_2$), with a positive and a negative effect, respectively. The model explains 64.4% of the variation in the number of poor people in West Nusa Tenggara Province.

Keywords: Fixed-effect model; panel data regression model; poverty; West Nusa Tenggara

ICST conference, December 14th 2020, published online: June 1st 2021

1. Introduction

Many countries face the problem of poverty, including Indonesia [1]. In Indonesia, poverty is marked by the number of people living below the poverty line [2]. The government continues to address this problem, and several efforts have succeeded in reducing the poverty rate in Indonesia [1]. However, Indonesia's growing population and the diverse characteristics of its regions make poverty difficult to overcome [3]. According to Central Statistics Agency (BPS) data from September 2018, West Nusa Tenggara Province is one of the ten provinces with high poverty rates in Indonesia. Therefore, the government must pay special attention to the province, adopt policies suited to its characteristics, and know the factors that affect the number of poor people in West Nusa Tenggara.

In this research, we used regression analysis to determine the factors that affect the number of poor people in West Nusa Tenggara [4]. Regression analysis studies the dependence of one dependent variable on one or more independent variables in order to estimate or predict the population mean of the dependent variable from known values of the independent variables. This makes it suitable for determining the factors that affect the number of poor people in West Nusa Tenggara Province. However, ordinary regression analysis is not suitable when the data are panel data, that is, data combining cross-section and time series, because effects of individual units or time units may remain. Panel data regression analysis addresses this condition. Panel data make it possible to see economic effects that cannot be separated between individuals over several periods and that cannot be obtained from cross-section or time-series data alone [5]. The advantages of panel data include the ability to accommodate heterogeneity, more informative and more varied data, more degrees of freedom, greater efficiency, better suitability for studying dynamic change, a better ability to detect and measure effects that are unobservable in pure cross-section or pure time-series data, and reduced bias [6].

2. Research Methods

This is applied research carried out using RStudio software. The study uses secondary data from the Central Statistics Agency (BPS) of West Nusa Tenggara Province. The data cover the ten districts/cities of West Nusa Tenggara Province and are annual, from 2014 to 2018. The dependent variable is the number of poor people, in persons ($Y$). The independent variables are the population, in persons ($X_1$); the human development index, as an index value ($X_2$); the average age at first marriage of women aged ten years and over, in years ($X_3$); and the percentage of illiteracy in the population aged 15 years and over, in percent ($X_4$).

The data analysis steps carried out in this study are as follows.

1. Preparing the dependent and independent variables and exploring the general poverty data.
2. Checking the multicollinearity assumption in the regression model using the Variance Inflation Factor,
   $$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \quad j = 1, 2, \ldots, k$$
   where $R_j^2$ is the coefficient of determination obtained by regressing $X_j$ on the other independent variables.
3. Performing a regression analysis using the Ordinary Least Squares (OLS) method, $\hat{\beta} = (X'X)^{-1}X'Y$, and then choosing the best independent variables for the model using the Backward Elimination method.
4. Estimating the parameters of the Common Effect Model using the Ordinary Least Squares method, $\hat{\beta}_{mg} = (X'X)^{-1}X'Y$; of the Fixed Effect Model using the Within Group (WG) method, $\hat{\beta}_k = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{Y}$ for the individual effect and $\hat{\beta}_k^{*} = (\tilde{X}^{*\prime}\tilde{X}^{*})^{-1}\tilde{X}^{*\prime}\tilde{Y}^{*}$ for the time effect; and of the Random Effect Model using the Generalized Least Squares (GLS) method, $\hat{\beta}_{mpa} = (X'V^{-1}X)^{-1}(X'V^{-1}Y - X'\mu)$.
5. Selecting a panel data regression estimation model between the Common Effect Model and the Fixed Effect Model using the Chow test with the following equation.


   $$\mathrm{Chow} = \frac{(JKG_{mg} - JKG_{mpt})/(N-1)}{JKG_{mpt}/(NT-N-k)}$$
   for the individual effect, and
   $$\mathrm{Chow} = \frac{(JKG_{mg} - JKG_{mpt})/(T-1)}{JKG_{mpt}/(NT-T-k)}$$
   for the time effect, with $JKG_{mg} = Y'Y - \hat{\beta}'X'Y$ and $JKG_{mpt} = \tilde{Y}'\tilde{Y} - \hat{\beta}_k'\tilde{X}'\tilde{Y}$. Here $JKG$ is the residual sum of squares, and the subscripts $mg$, $mpt$, and $mpa$ refer to the Common Effect, Fixed Effect, and Random Effect Models, respectively.
6. Selecting the panel data regression estimation model between the Fixed Effect Model and the Random Effect Model using the Hausman test with the following equation.
   $$\mathrm{Hausman} = (\hat{\beta}_{mpa} - \hat{\beta}_{mpt})'\,[\mathrm{var}(\hat{\beta}_{mpa} - \hat{\beta}_{mpt})]^{-1}\,(\hat{\beta}_{mpa} - \hat{\beta}_{mpt})$$
7. Selecting the panel data regression estimation model between the Common Effect Model and the Random Effect Model using the Lagrange Multiplier (LM) test.
8. Testing the residual assumptions.
   a. Normality test, using the Jarque-Bera test with the following equation.
      $$JB = n\left[\frac{S^2}{6} + \frac{(K-3)^2}{24}\right]$$
      with
      $$S = \frac{\frac{1}{n}\sum_{i=1}^{n}(\varepsilon_i - \bar{\varepsilon})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(\varepsilon_i - \bar{\varepsilon})^2\right)^{3/2}} \quad \text{and} \quad K = \frac{\frac{1}{n}\sum_{i=1}^{n}(\varepsilon_i - \bar{\varepsilon})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(\varepsilon_i - \bar{\varepsilon})^2\right)^{2}}$$
   b. Homoscedasticity test, using the Lagrange Multiplier test with the following equation.
      $$LM = \frac{NT}{2(T-1)}\left[\frac{\sum_{i=1}^{N}\left(\sum_{t=1}^{T}\varepsilon_{it}\right)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T}\varepsilon_{it}^2} - 1\right]^2$$
   c. Independence test, using the Durbin-Watson test with the following equation.
      $$d = \frac{\sum_{i=2}^{n}(\varepsilon_i - \varepsilon_{i-1})^2}{\sum_{i=1}^{n}\varepsilon_i^2}$$
9. Handling any violation of the residual assumptions. If the residuals are not normally distributed, a data transformation is used. If the residuals are heteroscedastic, GLS is used, and if the residuals are not independent, Feasible Generalized Least Squares (FGLS) is used.
10. Testing the feasibility of the selected panel data regression model by testing its parameters simultaneously, with
    $$F_{count} = \frac{R^2/2}{(1-R^2)/(NT-N-K)}$$
    and partially, with $t_{count} = \hat{\beta}_i / SE(\hat{\beta}_i)$, and by checking the coefficient of determination of the model,
    $$R^2 = \frac{\hat{\beta}'X'Y - (Y'\mathbf{1}\mathbf{1}'Y/n)}{Y'Y - (Y'\mathbf{1}\mathbf{1}'Y/n)}$$
    The significant coefficients are then interpreted to obtain the best model.
11. Drawing conclusions.
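As a minimal sketch of step 5, the two Chow statistics can be computed directly from the residual sums of squares of the Common Effect and Fixed Effect Models. The function names and the numeric inputs below are hypothetical and only illustrate the arithmetic; Python is used here for illustration, whereas the study itself was carried out in RStudio.

```python
def chow_individual(jkg_mg, jkg_mpt, N, T, k):
    """Chow statistic for the individual effect (Common vs. Fixed Effect Model)."""
    numerator = (jkg_mg - jkg_mpt) / (N - 1)
    denominator = jkg_mpt / (N * T - N - k)
    return numerator / denominator


def chow_time(jkg_mg, jkg_mpt, N, T, k):
    """Chow statistic for the time effect (Common vs. Fixed Effect Model)."""
    numerator = (jkg_mg - jkg_mpt) / (T - 1)
    denominator = jkg_mpt / (N * T - T - k)
    return numerator / denominator


# Panel dimensions of the study: 10 districts/cities, 5 years, 2 regressors.
N, T, k = 10, 5, 2

# Hypothetical residual sums of squares, chosen only to show the calculation.
chow_ind = chow_individual(jkg_mg=100.0, jkg_mpt=40.0, N=N, T=T, k=k)
chow_t = chow_time(jkg_mg=100.0, jkg_mpt=95.0, N=N, T=T, k=k)

print(round(chow_ind, 3))
print(round(chow_t, 3))
```

Each statistic is then compared with the F table value for the corresponding degrees of freedom: a value above the table value favors the Fixed Effect Model.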

3. Results and Discussion

3.1. Data Exploration

Figure 1 shows the number of poor people in West Nusa Tenggara from 2014 to 2018. East Lombok Regency has the highest number of poor people in every year. This is thought to follow from its population, which is the largest of any district. The district/city with the fewest poor people is Bima City, where the number of poor people is only in the range of tens of thousands.

Figure 1. Number of Poor Population in West Nusa Tenggara 2014-2018 (Thousands of People)

The graph shows that each district/city has its own poverty trajectory. The line for East Lombok Regency lies at the top, clearly separated from the others, which shows that this district has the largest number of poor people. The lines differ between districts/cities because each has a different level of poverty.

3.2. Regression Analysis

Regression analysis is carried out after checking the multicollinearity assumption, that is, whether the independent variables are related to one another, using the Variance Inflation Factor (VIF) values from the regression model. The VIF values of the independent variables, from the regression model estimated with the OLS method, are as follows.

Table 1. Variance Inflation Factor Values of the Independent Variables

Independent Variable    VIF Value
X1                      2.503
X2                      1.612
X3                      4.763
X4                      2.618

All of the resulting VIF values are less than ten, so there is no indication of multicollinearity among the independent variables. The results of the regression model parameter estimation are as follows.
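Since $\mathrm{VIF}_j = 1/(1 - R_j^2)$, each VIF in Table 1 implies an auxiliary $R_j^2 = 1 - 1/\mathrm{VIF}_j$. The sketch below inverts the formula for the tabulated values and applies the threshold of ten used in the text; the function names are hypothetical.

```python
def vif(r_squared_j):
    """Variance Inflation Factor from the auxiliary R^2 of regressor j."""
    return 1.0 / (1.0 - r_squared_j)


def r_squared_from_vif(vif_j):
    """Invert VIF_j = 1 / (1 - R_j^2) to recover the auxiliary R_j^2."""
    return 1.0 - 1.0 / vif_j


# VIF values as reported in Table 1.
table1_vif = {"X1": 2.503, "X2": 1.612, "X3": 4.763, "X4": 2.618}

implied_r2 = {name: r_squared_from_vif(v) for name, v in table1_vif.items()}
below_threshold = {name: v < 10 for name, v in table1_vif.items()}

for name in table1_vif:
    print(name, round(implied_r2[name], 3), below_threshold[name])
```

Even the largest value, for $X_3$, corresponds to an auxiliary $R_j^2$ of about 0.79, which is still below the 0.9 implied by a VIF of ten.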


Table 2. Regression Analysis Results

Variable    Estimate      t-count
Constant    343358.000     3.523
X1               0.145    17.443
X2           -2405.000    -5.339
X3           -8345.000    -1.803
X4            -161.690    -0.229

Based on the regression analysis results, the constant, $X_1$, and $X_2$ have $|t_{count}| > t_{table}$, so these parameters are significant, whereas $X_3$ and $X_4$ have $|t_{count}| < t_{table}$, so these parameters are not significant. The Backward Elimination method is used to determine the best independent variables to include in the model. If independent variables are not significant in the model even though they are closely related to the dependent variable, multicollinearity may be present, and the model is not optimal for making predictions. The variable $X_4$ was insignificant and became the first independent variable to be excluded from the model. The regression parameters were then re-estimated using the independent variables $X_1$, $X_2$, and $X_3$. Next, $X_3$ became the second independent variable excluded from the model, and the regression parameters were re-estimated using the independent variables $X_1$ and $X_2$. The results of the regression analysis with the Backward Elimination method were as follows.

Table 3. Regression Analysis Results of the Backward Elimination Method

Variable    Estimate      t-count
Constant    191745.225     7.129
X1               0.157    27.588
X2           -2826.388    -7.339

After Backward Elimination, $|t_{count}| > t_{table}$ for all remaining parameters, so the best independent variables included in the model were the population ($X_1$) and the human development index ($X_2$).
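The elimination procedure described above can be sketched as a loop that refits the OLS model and drops the variable with the smallest $|t_{count}|$ until every remaining variable exceeds $t_{table}$. This is an illustrative sketch, not the authors' RStudio code: the data are synthetic (built so that only the first two columns matter), the function names are hypothetical, and the threshold of 2.0 is an assumed stand-in for the table value.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_counts(columns, y):
    """Fit y on an intercept plus the named columns; return {name: t_count}."""
    names = list(columns)
    rows = [[1.0] + [columns[name][i] for name in names] for i in range(len(y))]
    p = len(names) + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    residuals = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in residuals) / (len(y) - p)
    return {name: beta[j + 1] / math.sqrt(sigma2 * inv[j + 1][j + 1])
            for j, name in enumerate(names)}


def backward_elimination(columns, y, t_table):
    """Drop the least significant variable until every |t_count| > t_table."""
    kept = dict(columns)
    while kept:
        t_counts = ols_t_counts(kept, y)
        weakest = min(t_counts, key=lambda name: abs(t_counts[name]))
        if abs(t_counts[weakest]) > t_table:
            break
        del kept[weakest]
    return list(kept)


# Synthetic, mutually orthogonal design (not the study's data).
h1 = [1, 1, 1, 1, -1, -1, -1, -1]
h2 = [1, 1, -1, -1, 1, 1, -1, -1]
h3 = [1, -1, 1, -1, 1, -1, 1, -1]
columns = {
    "X1": h1,
    "X2": h2,
    "X3": h3,                               # irrelevant by construction
    "X4": [a * b for a, b in zip(h1, h2)],  # irrelevant by construction
}
noise = [0.1 * a * b * c for a, b, c in zip(h1, h2, h3)]
y = [1 + 2 * a - 3 * b + e for a, b, e in zip(h1, h2, noise)]

selected = backward_elimination(columns, y, t_table=2.0)
print(selected)
```

In this synthetic case the two irrelevant columns are removed one at a time and the loop stops once only significant variables remain, mirroring the two-step elimination reported in the text.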

3.3. Panel Data Regression Analysis

Panel data regression analysis was carried out to form a model equation relating the independent variables to the number of poor people in West Nusa Tenggara, taking into account individual and time characteristics. This analysis is also useful for understanding the relationship between the independent variables and the dependent variable. The steps taken are model specification, residual assumption testing, and handling of assumption violations.

Model specification determines the best estimator of the panel data regression model parameters for the observed panel data. Parameters are estimated with three approaches: the Common Effect Model, the Fixed Effect Model, and the Random Effect Model. The Common Effect Model combines the cross-section and time-series data into one unit, ignoring individual and time effects. The results of the Common Effect Model analysis on the panel data can be seen in the following table.

Table 4. Estimation Results of Common Effect Model Parameters

Variable    Estimate      t-count
Constant    191745.225     7.129
X1               0.157    27.588
X2           -2826.388    -7.339

Based on the Common Effect Model results, $|t_{count}| > t_{table}$, so the independent variables $X_1$ and $X_2$ are significant. It can be concluded that, at the 5% significance level, the independent variables that affect the number of poor people in West Nusa Tenggara are the population ($X_1$) and the human development index ($X_2$). The next estimation uses the Fixed Effect Model. The results for the Fixed Effect Model with the individual effect can be seen in the following table.

Table 5. Estimation Results of Fixed Effect Model Parameters (Individual Effect)

Variable    Estimate    t-count
X1            -0.194     -1.866
X2          -950.652     -1.151

Based on the results of the Fixed Effect Model with the individual effect, $|t_{count}| < t_{table}$, so the independent variables $X_1$ and $X_2$ are not significant. It can be concluded that, at the 5% significance level, no independent variable affects the number of poor people in West Nusa Tenggara. The results for the Fixed Effect Model with the time effect can be seen in the following table.

Table 6. Estimation Results of Fixed Effect Model Parameters (Time Effect)

Variable    Estimate     t-count
X1             -0.157     26.527
X2          -2785.709     -6.735

Based on the results of the Fixed Effect Model with the time effect, $|t_{count}| > t_{table}$, so the independent variables $X_1$ and $X_2$ are significant. It can be concluded that, at the 5% significance level, the independent variables that affect the number of poor people in West Nusa Tenggara are the population ($X_1$) and the human development index ($X_2$). The last estimation uses the Random Effect Model. The results of the Random Effect Model analysis using the GLS method can be seen in the following table.


Table 7. Estimation Results of Random Effect Model Parameters

Variable    Estimate      t-count
Constant    196097.174     9.368
X1               0.147    23.912
X2           -2820.927    -9.542

Based on the Random Effect Model results, $|t_{count}| > t_{table}$, so the independent variables $X_1$ and $X_2$ are significant. It can be concluded that, at the 5% significance level, the independent variables that affect the number of poor people in West Nusa Tenggara are the population ($X_1$) and the human development index ($X_2$).

The individual effect or time effect is determined using the Chow test, with hypothesis $H_0$ that the Common Effect Model is the right model and $H_1$ that the Fixed Effect Model is the right model. The Chow test for the Fixed Effect Model with the individual effect yields a Chow value of 68.178, so Chow > $F_{table}$ and $H_0$ is rejected. It can be concluded that, at the 5% significance level, the Fixed Effect Model with the individual effect is the right model. The Chow test for the Fixed Effect Model with the time effect yields a Chow value of 0.234, so Chow < $F_{table}$ and $H_0$ is not rejected. It can be concluded that, at the 5% significance level, the Fixed Effect Model with the time effect is not the right model.

Next, the Random Effect Model is compared with the previously selected Fixed Effect Model. The choice between the two uses the Hausman test, with hypothesis $H_0$ that the Random Effect Model is the right model and $H_1$ that the Fixed Effect Model is the right model. The Hausman test yields a value of 15.894, so Hausman > $\chi^2_{table}$ and $H_0$ is rejected. It can be concluded that, at the 5% significance level, the Fixed Effect Model is the right model. The Lagrange Multiplier test was not carried out because the selected model was the Fixed Effect Model. Based on the estimation of the Fixed Effect Model parameters with the individual effect, the following model is obtained.

$$\hat{Y} = \hat{\beta}_{0i} - 0.1936686\,X_1 - 950.65197\,X_2$$

Because the Fixed Effect Model with the individual effect is the right model, the model has a different intercept for each individual and a common slope. Each individual intercept is obtained using the WG method with individual effects from the following equation, in which the bars denote averages over time for individual $i$.

$$\hat{\beta}_{0i} = \bar{Y}_i - \hat{\beta}_1\bar{X}_{1i} - \hat{\beta}_2\bar{X}_{2i} - \cdots - \hat{\beta}_K\bar{X}_{Ki}$$

The results of estimating the intercept of each individual using the WG method can be seen in the following table.

Table 8. Estimated Individual Intercepts

Regency/City              $\hat{\beta}_{0i}$
Bima Regency              225641.861
Dompu Regency             143036.930
West Lombok Regency       300879.618
Central Lombok Regency    381012.224
East Lombok Regency       501912.627
North Lombok Regency      170056.490
Sumbawa Regency           218215.024
West Sumbawa Regency      114215.832
Bima City                 116930.971
Mataram City              207376.037
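The Within Group calculation behind such intercepts can be sketched for two regressors: each variable is demeaned within its individual, the common slopes are estimated from the demeaned data, and each intercept is recovered from the individual means. The panel below is a small noise-free toy example with hypothetical individuals A, B, and C and hypothetical coefficients; it is not the study's data, and the function name is an assumption.

```python
def within_estimator(panel):
    """Within Group estimate for two regressors.

    panel maps each individual to a list of (y, x1, x2) observations.
    Returns (b1, b2, intercepts), with one intercept per individual.
    """
    means = {}
    s11 = s12 = s22 = s1y = s2y = 0.0
    for unit, obs in panel.items():
        t = len(obs)
        mean_y = sum(o[0] for o in obs) / t
        mean_1 = sum(o[1] for o in obs) / t
        mean_2 = sum(o[2] for o in obs) / t
        means[unit] = (mean_y, mean_1, mean_2)
        for y, x1, x2 in obs:
            # Deviations from the individual's own time averages.
            dy, d1, d2 = y - mean_y, x1 - mean_1, x2 - mean_2
            s11 += d1 * d1
            s12 += d1 * d2
            s22 += d2 * d2
            s1y += d1 * dy
            s2y += d2 * dy
    # Solve the 2x2 normal equations of the demeaned regression (Cramer's rule).
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # Intercept of individual i: mean(Y_i) - b1 * mean(X1_i) - b2 * mean(X2_i).
    intercepts = {u: my - b1 * m1 - b2 * m2 for u, (my, m1, m2) in means.items()}
    return b1, b2, intercepts


# Hypothetical noise-free panel: y = intercept_i + 0.5 * x1 - 2.0 * x2.
TRUE_INTERCEPTS = {"A": 10.0, "B": 20.0, "C": 30.0}
X1 = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [5, 5, 6, 8]}
X2 = {"A": [3, 1, 4, 1], "B": [5, 9, 2, 6], "C": [5, 3, 5, 8]}
panel = {
    u: [(TRUE_INTERCEPTS[u] + 0.5 * a - 2.0 * b, a, b) for a, b in zip(X1[u], X2[u])]
    for u in TRUE_INTERCEPTS
}

b1, b2, intercepts = within_estimator(panel)
print(b1, b2, intercepts)
```

Because the toy data contain no noise, the estimator recovers the slopes and the three intercepts up to floating-point error, which makes the role of the individual means in the intercept formula easy to check.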

The largest estimated intercept is that of East Lombok Regency, 501912.627, meaning that the individual effect in East Lombok Regency is the highest among the districts/cities of West Nusa Tenggara.

Next, the residual assumptions of the selected model are tested to identify any unfulfilled assumptions. The residual normality test determines whether the residuals are normally distributed. The Jarque-Bera test has hypothesis $H_0$ that the residuals are normally distributed and $H_1$ that the residuals are not normally distributed. The JB value obtained is 67.105, so JB > $\chi^2_{table}$ and $H_0$ is rejected. It can be concluded that, at the 5% significance level, the residuals are not normally distributed, that is, the residual normality assumption is not fulfilled.

The residual homoscedasticity test determines whether the residual variance is homogeneous. The Lagrange Multiplier test has hypothesis $H_0$ that heteroscedasticity does not occur and $H_1$ that the residuals are heteroscedastic. The LM value obtained is 6.25, so LM < $\chi^2_{table}$ and $H_0$ is not rejected. It can be concluded that, at the 5% significance level, there is no residual heteroscedasticity, that is, the residual homoscedasticity assumption is fulfilled.

The final residual assumption test is residual independence, which uses the Durbin-Watson test with hypothesis $H_0$ that the residuals are independent and $H_1$ that the residuals are not independent. The value obtained is $d = 2.046$ with $d_L = 1.4625$, so $d > d_L$ and $H_0$ is not rejected. It can be concluded that, at the 5% significance level, the residuals are independent, that is, the residual independence assumption is fulfilled.
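The three residual statistics used here can be sketched directly from their definitions in the Research Methods section. The function names and the toy residuals are hypothetical, and the critical value 5.991 is the 5% chi-square value for 2 degrees of freedom, which is assumed (not stated in the text) to be the table value used for the Jarque-Bera test.

```python
def jarque_bera(residuals):
    """JB = n * (S^2 / 6 + (K - 3)^2 / 24), with sample skewness S and kurtosis K."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)


def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def lagrange_multiplier(residuals_by_unit):
    """LM statistic from a list of N residual series, each of length T."""
    n_units = len(residuals_by_unit)
    n_periods = len(residuals_by_unit[0])
    numerator = sum(sum(unit) ** 2 for unit in residuals_by_unit)
    denominator = sum(e * e for unit in residuals_by_unit for e in unit)
    scale = n_units * n_periods / (2 * (n_periods - 1))
    return scale * (numerator / denominator - 1) ** 2


# Toy residuals, chosen so the statistics are easy to verify by hand.
toy = [1.0, -1.0, 1.0, -1.0]
jb = jarque_bera(toy)
d = durbin_watson(toy)
lm = lagrange_multiplier([[1.0, -1.0], [1.0, -1.0]])
print(jb, d, lm)

# Decision rule applied to the JB values reported in the paper
# (before and after the data transformation), under the assumed 2 df.
CHI2_5PCT_2DF = 5.991
for jb_reported in (67.105, 3.733):
    decision = "reject H0" if jb_reported > CHI2_5PCT_2DF else "do not reject H0"
    print(jb_reported, decision)
```

Under that assumed critical value, the reported JB of 67.105 leads to rejection of normality, while the post-transformation value of 3.733 does not.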
Based on the tests of the residual assumptions for the Fixed Effect Model with individual effects estimated by the WG method, the homoscedasticity and independence assumptions are met, but the normality assumption is not. Because the violation concerns residual normality, it is handled with a data transformation. The transformation applied to the dependent and independent variables was the base-10 logarithm ($\log_{10}$). The results of the Fixed Effect Model with the individual effect after the data transformation can be seen in the following table.

Table 9. Parameter Estimates After Handling

Variable    Estimate
X1           0.234908
X2          -2.112112

The following model is obtained based on the estimation of the Fixed Effect Model parameters with individual effects after handling.

$$\log_{10}\hat{Y} = \hat{\beta}_{0i} + 0.234908\,\log_{10}X_1 - 2.112122\,\log_{10}X_2$$


The results of estimating each individual's intercept using the WG method after handling can be seen in the following table.

Table 10. Estimated Individual Intercepts After Handling

Regency/City              $\hat{\beta}_{0i}$
Bima Regency              7.347
Dompu Regency             7.101
West Lombok Regency       7.508
Central Lombok Regency    7.559
East Lombok Regency       7.714
North Lombok Regency      7.376
Sumbawa Regency           7.346
West Sumbawa Regency      7.020
Bima City                 6.905
Mataram City              7.310

The largest estimated intercept is that of East Lombok Regency, 7.714, meaning that the individual effect in East Lombok Regency is the highest among the regencies/cities of West Nusa Tenggara. This reflects the effect of differing district/city characteristics, as seen in the data exploration.

Next, the residual assumptions are tested again after handling. The Jarque-Bera normality test gives a JB value of 3.733, so JB < $\chi^2_{table}$. It can be concluded that, at the 5% significance level, the residuals are normally distributed, that is, the residual normality assumption is met. The Lagrange Multiplier homoscedasticity test gives an LM value of 6.25, so LM < $\chi^2_{table}$. It can be concluded that, at the 5% significance level, the residual variance is homogeneous, that is, the residual homoscedasticity assumption is fulfilled. The final residual assumption test, residual independence using the Durbin-Watson test, gives $d$ equal to 5 with $d_L = 1.4625$, so $d > d_L$. It can be concluded that, at the 5% significance level, the residuals are independent, that is, the residual independence assumption is fulfilled.

3.4. Model Evaluation and Interpretation

The model is evaluated using the feasibility tests of the panel data regression model, namely simultaneous parameter testing, partial parameter testing, and the coefficient of determination. Simultaneous parameter testing uses the F test, with hypothesis $H_0$ that the model is not appropriate and $H_1$ that the model is appropriate. The value obtained is $F_{count} = 34.371$, so $F_{count} > F_{table}$ and $H_0$ is rejected. It can be concluded that, at the 5% significance level, the parameters are simultaneously significant, that is, the model used is the right model. Therefore, the independent variables that affect the number of poor people in West Nusa Tenggara were the population ($X_1$) and the human development index ($X_2$).
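The reported $F_{count}$ can be checked against the coefficient of determination of the final model, $R^2 = 0.644$, with $N = 10$ districts/cities, $T = 5$ years, and $K = 2$ independent variables. The sketch below assumes that the numerator of the F statistic is $R^2$ divided by the number of independent variables (two); the function name is hypothetical.

```python
def f_count(r_squared, n_individuals, n_periods, n_regressors):
    """F statistic: (R^2 / K) / ((1 - R^2) / (NT - N - K)).

    Dividing the numerator by K is an assumption; for this model K = 2.
    """
    denominator_df = n_individuals * n_periods - n_individuals - n_regressors
    return (r_squared / n_regressors) / ((1 - r_squared) / denominator_df)


f_value = f_count(r_squared=0.644, n_individuals=10, n_periods=5, n_regressors=2)
print(round(f_value, 3))  # 34.371
```

Under these assumptions the calculation reproduces the reported value of 34.371, with 38 degrees of freedom in the denominator.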

Partial parameter testing uses the t-test, with hypothesis $H_0$ that the parameter is not significant and $H_1$ that the parameter is significant. The results of partial parameter testing using the t-test can be seen in the following table.


Table 11. Partial Test Results of the Model Parameters

Parameter                            t-count
Constant (Bima Regency)              4174.390
Constant (Dompu Regency)             4034.798
Constant (West Lombok Regency)       4266.178
Constant (Central Lombok Regency)    4294.975
Constant (East Lombok Regency)       4383.428
Constant (North Lombok Regency)      4191.123
Constant (Sumbawa Regency)           4174.265
Constant (West Sumbawa Regency)      3988.608
Constant (Bima City)                 3923.309
Constant (Mataram City)              4153.553
X1                                      0.612
X2                                     -4.458

Based on the partial parameter testing results using the t-test, $|t_{count}|$ for the constants and $X_2$ is larger than $t_{table}$, so the constants and the parameter of $X_2$ are significant. In contrast, $t_{count}$ for $X_1$ is smaller than $t_{table}$, so the parameter of $X_1$ is not significant. The coefficient of determination shows how much of the variation in the dependent variable the model can explain. The value obtained is $R^2 = 0.644$, which means that the model explains 64.4% of the variation in the dependent variable, the number of poor people in West Nusa Tenggara.

After the panel data regression analysis, the best model obtained is the Fixed Effect Model with the individual effect, estimated on the transformed data. The best model is as follows.

$$\log_{10}\hat{Y} = \hat{\beta}_{0i} + 0.234908\,\log_{10}X_1 - 2.112122\,\log_{10}X_2$$

The individual intercepts are presented in the following table.

Table 12. Individual Intercept Values

Regency/City              $\hat{\beta}_{0i}$
Bima Regency              7.347
Dompu Regency             7.101
West Lombok Regency       7.508
Central Lombok Regency    7.559
East Lombok Regency       7.714
North Lombok Regency      7.376
Sumbawa Regency           7.346
West Sumbawa Regency      7.020
Bima City                 6.905
Mataram City              7.310

The model above can be written as follows.

$$\hat{Y} = \frac{\hat{\beta}_{0i}^{*}\,X_1^{0.234908}}{X_2^{2.112122}}, \quad \text{with } \hat{\beta}_{0i} = \log_{10}\hat{\beta}_{0i}^{*}$$

Based on the model above, it can be concluded that:

1. Poverty is positively related to population size ($X_1$): the larger the population, the larger the number of poor people.

2. Poverty is negatively related to the human development index ($X_2$): the higher the Human Development Index, the smaller the number of poor people.
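The equivalence of the logarithmic and multiplicative forms, and the size of the two effects, can be illustrated numerically. In the sketch below the coefficients and the East Lombok Regency intercept are taken from the model, while the population and index values are hypothetical inputs chosen only for illustration; the function names are also assumptions.

```python
import math

B1 = 0.234908    # coefficient of log10(X1), population
B2 = -2.112122   # coefficient of log10(X2), human development index
B0_EAST_LOMBOK = 7.714  # intercept reported for East Lombok Regency


def predict_log_form(b0, x1, x2):
    """Prediction from log10(Y) = b0 + B1 * log10(X1) + B2 * log10(X2)."""
    return 10 ** (b0 + B1 * math.log10(x1) + B2 * math.log10(x2))


def predict_power_form(b0, x1, x2):
    """Prediction from Y = 10**b0 * X1**B1 / X2**(-B2)."""
    return (10 ** b0) * x1 ** B1 / x2 ** (-B2)


# Hypothetical inputs, not observed values from the study.
population, hdi = 1_000_000, 65.0

y_log = predict_log_form(B0_EAST_LOMBOK, population, hdi)
y_power = predict_power_form(B0_EAST_LOMBOK, population, hdi)

# Effect of a 1% increase in each variable, holding the other fixed.
ratio_population = predict_log_form(B0_EAST_LOMBOK, population * 1.01, hdi) / y_log
ratio_hdi = predict_log_form(B0_EAST_LOMBOK, population, hdi * 1.01) / y_log

print(y_log, y_power)
print(ratio_population, ratio_hdi)
```

Because the model is linear in logarithms, the coefficients act as elasticities: a 1% larger population raises the predicted number of poor people by roughly 0.23%, and a 1% higher Human Development Index lowers it by roughly 2%, regardless of the starting values.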

4. Conclusion

Based on the analysis, the conclusions of this study are:

1. The best poverty model for West Nusa Tenggara Province using panel data regression is the Fixed Effect Model with the individual effect:
   $$\log_{10}\hat{Y} = \hat{\beta}_{0i} + 0.234908\,\log_{10}X_1 - 2.112122\,\log_{10}X_2$$
   with individual intercepts of 7.347 for Bima Regency, 7.101 for Dompu Regency, 7.508 for West Lombok Regency, 7.559 for Central Lombok Regency, 7.714 for East Lombok Regency, 7.376 for North Lombok Regency, 7.346 for Sumbawa Regency, 7.020 for West Sumbawa Regency, 6.905 for Bima City, and 7.310 for Mataram City. Based on the coefficient of determination, the model explains 64.4% of the variation in the number of poor people in West Nusa Tenggara.
2. Based on the best poverty model obtained, the most dominant factors affecting the number of poor people in West Nusa Tenggara Province are the population ($X_1$) and the human development index ($X_2$).

Acknowledgment

This work was supported by the Statistics Area of Expertise, Universitas Mataram, through PNBP Universitas Mataram funding for the year 2020.

References

[1] Badan Perencanaan Pembangunan Nasional, 2018, Analisis Wilayah dengan Kemiskinan Tinggi, Kedeputian Bidang Kependudukan dan Ketenagakerjaan Kementerian PPN/Bappenas, Jakarta.
[2] Badan Perencanaan Pembangunan Nasional, 2005, Strategi Nasional Penanggulangan Kemiskinan, Kedeputian Bidang Kependudukan dan Ketenagakerjaan Kementerian PPN/Bappenas, Jakarta.
[3] Bank Dunia, 2007, Era Baru dalam Pengentasan Kemiskinan di Indonesia, PT. Graha Info Kreasi, Jakarta.
[4] Badan Pusat Statistik, 2019, Statistik Indonesia 2019, CV. Dharmaputra, Jakarta.
[5] Gujarati, D.N. and Dawn C. Porter, 2009, Basic Econometrics, 5th Edition, Douglas Reiner, New York.
[6] Baltagi, B.H., 2005, Econometric Analysis of Panel Data, 3rd Edition, John Wiley & Sons Ltd, Chichester.
