Advances in Science, Technology and Engineering Systems Journal Vol. 5, No. 5, 193-200 (2020) ASTESJ www.astesj.com ISSN: 2415-6698

Evaluation of Disadvantaged Regions in East Java Based on the 33 Indicators of the Ministry of Villages, Development of Disadvantaged Regions, and Transmigration Using the Ensemble ROCK (Robust Clustering Using Link) Method

Luluk Wulandari, Yuniar Farida*, Aris Fanani, Nurissaidah Ulinnuha, Putroue Keumala Intan

UIN Sunan Ampel Surabaya, Mathematics Department, Surabaya, 60237, Indonesia

ARTICLE INFO
Article history:
Received: 01 July, 2020
Accepted: 14 August, 2020
Online: 10 September, 2020

Keywords:
Disadvantaged region
Clustering of numeric data
Clustering of categorical data
Ensemble ROCK method
The Ministry of Villages, Development of Disadvantaged Regions, and Transmigration (Indonesian: Kementerian Desa, Pembangunan Daerah Tertinggal, dan Transmigrasi)

ABSTRACT
East Java is a large province in Indonesia, in which Surabaya is the second largest metropolitan city after Jakarta. Various problems of development inequality in East Java caused it to be designated a disadvantaged area in 2015. Disadvantaged regions are determined every 5 years using 6 criteria and 33 indicators set by the Ministry of Villages, Development of Disadvantaged Regions, and Transmigration. However, none of the previous studies on the determination of disadvantaged regions has applied the 33 indicators as a whole. This study therefore evaluates the determination of disadvantaged regions using all 33 indicators set by the Ministry of Villages, Development of Disadvantaged Regions, and Transmigration. The criteria data are the results of the 2014 and 2018 surveys and consist of both numerical and categorical data. The method used is ensemble Robust Clustering Using Link (ROCK), a clustering method that can accommodate mixed categorical and numerical data and that uses the concept of links to measure the similarity or closeness between a pair of data points. The best clustering for evaluating the determination of disadvantaged regions in 2020 consists of 4 clusters, with the smallest Sw/Sb ratio of 0.3873984 and an optimum threshold value of 0.04. The clustering results place the Trenggalek, Bondowoso, Situbondo, , Tuban, Pamekasan, Sumenep, Bangkalan, and Sampang regions as disadvantaged regions in East Java.

*Corresponding Author: Yuniar Farida, UIN Sunan Ampel Surabaya, Indonesia, +62 81252347261, [email protected]
https://dx.doi.org/10.25046/aj050524

1. Introduction

Based on Presidential Regulation of the Republic of Indonesia Number 131 of 2015 concerning the Determination of Disadvantaged Regions in 2015-2019, East Java Province is one of the 21 provinces in Indonesia that are lagging. Moreover, East Java is the only province in Java that has disadvantaged districts or cities. A study is therefore needed to evaluate the development inequality that has left some regions in East Java behind.

Article 6 paragraph 1 of Government Regulation Number 78 of 2014 states that disadvantaged regions are determined every 5 years based on criteria and indicators established by the Ministry of Villages, Development of Disadvantaged Regions, and Transmigration. The last determination was made in 2015, as listed in Presidential Regulation Number 131 of 2015, and the next will be made in 2020. In this study, the criteria data are the 2014 and 2018 surveys obtained from the Central Statistics Agency, in the form of village potential data, people's welfare statistics, and the profile of each province over a number of years. The 2014 data are used for comparison with the government's 2015 determination of disadvantaged regions, while the 2018 data are used to predict the 2020 determination. The results of this study are expected to give a relevant picture of which regions have the potential to be left behind in the future, so that district and city governments can adopt policies suited to the characteristics of each region to keep it from being left behind.

In practice, the government determines disadvantaged regions based on Article 6 paragraph 2 of Presidential Regulation Number 131 of 2015, using composite aspects and range values. Statistically, both methods are suitable only for numerical data. In reality, however, the indicators used to determine the status of disadvantaged regions are not all numerical; several are categorical. If composite aspects and range values are used for the analysis, they cannot accommodate all 6 criteria and 33 indicators. This calls for a method that can handle both categorical and numerical data. A statistical method that can cluster mixed data is the ensemble method [1]-[3], and the ensemble method used in this study is Robust Clustering Using Link (ROCK). Ensemble ROCK is a clustering method that uses the concept of links to measure the similarity or closeness between a pair of data points [4], [5]. Its advantage is better accuracy than agglomerative hierarchical methods, together with good scalability [6].

Ensemble ROCK has proven optimal for clustering mixed data in various cases [7]. Shashi Sharma and Ram Lal Yadav showed that ensemble ROCK performs better than K-Means for cluster analysis [8]. Similarly, Dwi Harid Setiadi found that, for mapping disadvantaged regions, ensemble ROCK performs better than the SWFM ensemble method [9]. Alvionita compared the SWFM and ROCK methods for grouping orange accessions and found that ROCK had better grouping performance than SWFM [10]. Therefore, this study evaluates disadvantaged regions in East Java based on the indicators of the Ministry of Villages, Development of Disadvantaged Regions, and Transmigration using the ensemble ROCK method.

2. Related Works

Several evaluations of disadvantaged regions have been carried out in recent years. Anik Djuridah evaluated the status of disadvantaged regions using discriminant analysis [11]. That research determined only the number of indicators that influence whether an area is classified as left behind, without establishing which regions belong to the disadvantaged group and which do not. Similarly, Satria, Herman, and Fajar analyzed the development of disadvantaged regions in East Java using Location Quotient and Esteban-Marquillas Shift Share analysis [12], but used only the GRDP (Gross Regional Domestic Product) variable.

Furthermore, Dwi Hariadi Setiadi, in his final project, mapped the disadvantaged districts and cities using the ensemble Similarity Weight and Filter Method (SWFM) and Robust Clustering Using Link (ROCK) [9]. That study used only 5 criteria and 13 indicators. The five criteria were infrastructure, regional characteristics, economy, human resources (HR), and regional financial capacity; the accessibility criterion was not included. In contrast, the regulations of the Ministry of Villages, Development of Disadvantaged Regions, and Transmigration, listed in Law No. 78 of 2014 and explained in Article 2 paragraphs 1, 2, and 3 of Presidential Regulation No. 131 of 2015, state that the determination of disadvantaged regions uses six criteria (community economy, human resources, facilities and infrastructure, regional financial capacity, accessibility, and regional characteristics) consisting of 33 indicators.

Based on the related studies above, no research has evaluated disadvantaged regions using all of the specified criteria and indicators as a whole. This study therefore evaluates disadvantaged regions based on all the criteria and indicators set by the Indonesian Ministry of Villages, Development of Disadvantaged Regions, and Transmigration.

3. Theoretical Framework

3.1. Factor Analysis

Factor analysis is used to reduce the research variables, both numerical and categorical, using the Principal Component Analysis (PCA) method. The technique examines the relationships among the original, mutually correlated variables and summarizes them into a smaller set of new, uncorrelated variables [13]-[15]. The first step is to test whether the variables are adequate for factoring, using the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) and Bartlett's test. If the KMO value is greater than 0.5, the variable adequacy requirement is fulfilled and the data are adequate for factoring. The hypotheses for Bartlett's test are:

H0: The partial correlation formed from the data is not sufficient for factoring
H1: The partial correlation formed from the data is sufficient for factoring

If sig < α (α = 0.05), H0 is rejected, and it can be concluded that the partial correlation formed from the data is sufficient for factoring [16], [17].

3.2. K-Means

K-Means clustering partitions data into K groups, where K is the number of groups determined by the researcher. In this research, the numerical data are clustered using K-Means. The K-Means algorithm is as follows [17], [18]:

a. Determine the desired number of clusters k.
b. Randomly choose k initial centroids.
c. Compute the distance from each observation to each cluster center using the Euclidean distance, and assign each observation to the closest center:

$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2} \tag{1}$$

where
d(x_i, x_j): Distance between objects i and j

x_ik: The value of object i on the k-th variable
x_jk: The value of object j on the k-th variable

d. Compute the average value of each cluster as follows:

$$T_{kj} = \frac{x_{1j} + x_{2j} + \cdots + x_{nj}}{n} \tag{2}$$

where
T_kj: The average value of cluster k on variable j
n: Number of data points in the cluster

e. Take these averages as the new centroids and recompute the distance of each object to them using (1), reassigning each object to its closest centroid.
f. If the cluster memberships are not yet stable, repeat steps d and e until they no longer change.

The optimum grouping is validated using the R-Square and Pseudo F-statistic values; the optimum number of groups is indicated by the highest R-Square and Pseudo F-statistic values [19]. The Pseudo F-statistic is calculated as:

$$\text{Pseudo } F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \tag{3}$$

where R² is

$$R^2 = \frac{SSB}{SST} \tag{4}$$

The R-Square calculation involves three measures of diversity: total diversity, diversity within groups, and diversity between groups [10]. They are calculated as:

$$SSB = SST - SSW \tag{5}$$

$$SST = \sum_{j=1}^{m}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_{j}\right)^2 \tag{6}$$

$$SSW = \sum_{h=1}^{k}\sum_{j=1}^{m}\sum_{i=1}^{n_h}\left(x_{ijh} - \bar{x}_{jh}\right)^2 \tag{7}$$

where
SST: Sum of Squares Total
SSW: Sum of Squares Within Groups
SSB: Sum of Squares Between Groups
m: The number of numerical variables in the observation data
k: The number of groups
n: Total number of objects under observation
n_h: Number of members in group h
x_ij: The value of object i on variable j
x̄_j: Average of variable j over all objects
x_ijh: The value of object i on variable j in group h
x̄_jh: Average of variable j in group h

3.3. K-Modes

K-Modes is an extension of the K-Means method designed specifically for categorical data [20], [18]. It uses an efficient frequency-based algorithm to find modes [21], [18]. K-Modes adapts K-Means with the following modifications:

a. The distance between two data points X and Y is the number of features on which X and Y do not match. The dissimilarity between objects X and Y is given by:

$$d(X, Y) = \sum_{j=1}^{e} \delta(X_j, Y_j) \tag{8}$$

where
e: Number of features
δ(X_j, Y_j): Matching value, defined as

$$\delta(X_j, Y_j) = \begin{cases} 0, & X_j = Y_j \\ 1, & X_j \neq Y_j \end{cases} \tag{9}$$

b. Replace the mean values (averages) with mode values.
c. Mode values are found from data frequencies; the centroid is formed from the mode of each feature.

The optimum grouping of categorical data is validated using the value r, given by:

$$r = \frac{1}{n}\sum_{h=1}^{k} q_h \tag{10}$$

where
n: The number of observations
q_h: The highest number of objects (dominance) in group h, with h = 1, 2, ..., k

3.4. Ensemble ROCK

The ensemble ROCK method uses the concept of a link to measure the similarity and closeness of a pair of data points [22], [23], [4]. The steps of clustering with the ensemble ROCK method are:

a. Calculate sim(X_i, X_j) as a similarity measure as follows:

$$sim(X_i, X_j) = \frac{|X_i \cap X_j|}{|X_i \cup X_j|}, \quad i \neq j \tag{11}$$

where
X_i: The i-th observation
X_j: The j-th observation

b. Determine the neighbors and calculate the link value. Two observations are neighbors if sim(X_i, X_j) ≥ θ, and link(X_i, X_j) is the number of common neighbors of X_i and X_j. The link between clusters C_i and C_j is:

$$link(C_i, C_j) = \sum_{X_i \in C_i,\; X_j \in C_j} link(X_i, X_j) \tag{12}$$

c. Calculate the goodness measure G(C_i, C_j) as follows:

$$G(C_i, C_j) = \frac{link(C_i, C_j)}{(n_i + n_j)^{1+2f(\theta)} - n_i^{1+2f(\theta)} - n_j^{1+2f(\theta)}} \tag{13}$$

where link(C_i, C_j) is the number of links over all possible pairs of objects in C_i and C_j, n_i and n_j are the numbers of objects in C_i and C_j, and f(θ) is the threshold function f(θ) = (1 − θ)/(1 + θ), where θ (0 < θ < 1) is a threshold value determined by the researcher.

d. Compare the results of clustering from each threshold

$$SSW = \sum_{h=1}^{k}\left(\frac{n_{.h}}{2} - \frac{1}{2n_{.h}}\sum_{i=1}^{p} n_{ih}^2\right) = \frac{n}{2} - \frac{1}{2}\sum_{h=1}^{k}\frac{1}{n_{.h}}\sum_{i=1}^{p} n_{ih}^2 \tag{18}$$

$$SSB = \frac{1}{2}\sum_{h=1}^{k}\frac{1}{n_{.h}}\sum_{i=1}^{p} n_{ih}^2 - \frac{1}{2n}\sum_{i=1}^{p} n_{i.}^2 \tag{19}$$

where
n_ih: The number of observations in category i and group h, with h = 1, 2, 3, ..., k
n_.h = Σ_{i=1}^{p} n_ih: The number of observations in group h
n_i. = Σ_{h=1}^{k} n_ih: The number of observations in category i

4. Research Method

4.1. Research data

The research data were obtained from the Central Statistics Agency (BPS) of East Java (https://jatim.bps.go.id/). The data consist of survey data from 2014 and 2018. The 2014 data serve as a comparison with the determination of disadvantaged regions in 2015 by the Ministry of Villages,