International Journal of Advanced Science and Technology Vol. 29, No. 5, (2020), pp. 4497 - 4506

Mining Data Analysis Using CRISP-DM to Implement Successfully Multipurpose Financing at Astra Credit Companies (ACC) Branch Bogor

Erwin Husein1, Emil R. Kaburuan*2, Mauritsius Tuga3
Information Systems Management Department, BINUS Graduate Program – Master of Information Systems Management, Bina Nusantara University, Jakarta, Indonesia 11480

Abstract
Growing living needs have encouraged the emergence of finance companies, because financing has become one way to help people meet their needs. Astra Credit Companies (ACC) is a finance company that provides financing in the form of a multipurpose loan. To optimize the company's financing, a customer segmentation analysis is required so that financing offers can be better targeted. This study focuses on financing for the purchase of second-hand vehicles. Customers are classified based on job, vehicle type, installment, tenor, credit quality, and down payment, using the k-means clustering algorithm within the CRISP-DM method. The result is the formation of five groups, each with its own characteristics. To offer multipurpose financing secured by customers' cars, the company can target prospective customers according to these segments, which makes marketing easier.

Keywords: Financing, Clustering, K-means clustering, CRISP-DM

1. Introduction
Growing living needs have encouraged the emergence of finance companies, because financing has become one way to help people meet their needs. Astra Credit Companies (ACC) is Astra's finance company, formed from a combination of four Astra finance companies (PT Astra Sedaya Finance, PT Swadharma Bhakti Sedaya Finance, PT Astra Auto Finance, and PT Staco Estika Sedaya Finance) together with a billing-services company, PT Pratama Sadya Sadhana. ACC was established on 15 July 1982. Its network spans almost all major cities in Indonesia: ACC currently has 75 branch offices in 59 cities and is still growing. Given this large network, this research focuses on one of ACC's 75 branches. The selected branch must be representative of ACC's large branches, according to the following criteria:
• The branch has an OSA (Outstanding Amount) of more than one trillion rupiah.
• The branch currently has more than ten thousand customers.
• The branch has more than 50 employees.
Multipurpose is one of ACC's financing products. It allows customers to obtain funds directly by pledging their four-wheeled vehicle as collateral, and the funds may be used for any purpose. The loan term is up to 36 months (3 years), with a loan amount ranging from 20 million rupiah up to

ISSN: 2005-4238 IJAST 4497 Copyright ⓒ 2020 SERSC

hundreds of millions of rupiah per unit financed, and the Multipurpose product reaches up to 4,000 financed units per month. To optimize the company's financing, a customer segmentation analysis is required so that financing offers can be better targeted. Clustering is a data mining technique that can be used to perform customer segmentation analysis at Astra Credit Companies (ACC). It groups customers by similarity: customers with many similarities are placed in one cluster, while dissimilar customers are placed in other clusters. The algorithm used in this study is k-means clustering, a simple and effective algorithm for finding clusters in data [1]. The process model used is the Cross-Industry Standard Process for Data Mining (CRISP-DM), which consists of six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. In the study "Credit Risk Analysis of Motorcycles at PT.X Finance (Case Study: Gresik and Lamongan Branches)", [2] used the k-means clustering algorithm to determine the indicators that influence credit risk, and also to group customers based on age, employment sector, down payment percentage, income, motorcycle type, motorcycle condition, and on-the-road (OTR) price. The k-means algorithm is also used by [3] in "Analysis of Segmentation, Targeting, and Positioning of Car Financing at PT Adira Dinamika Multifinance Tbk, Manado Branch". In this research, customer segmentation analysis is performed for the Bogor branch of Astra Credit Companies (ACC) using the k-means clustering algorithm. The analysis focuses on motor vehicle purchase financing.
The analysis classifies customers based on age, monthly installment amount, income, tenor, on-the-road (OTR) price, down payment, job type, home ownership status, and vehicle type. Customer segmentation is expected to make financing offers easier, since offers can be directed at prospective customers who match each segment.

2. Literature Review
A. Data Mining
Data mining is the process of discovering meaningful correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques [1]. According to [1], data mining has six functions: description, estimation, prediction, classification, clustering, and association. Each function is explained below.
1) Description. The results of a data mining model should describe patterns as clearly as possible, in a way that lends itself to intuitive interpretation and explanation.
2) Estimation. Estimation is similar to classification, except that the target variable is numeric rather than categorical. It is used to estimate unknown quantities from a set of data, such as the population mean and the population variance.
3) Prediction.

Prediction is similar to classification and estimation, except that prediction estimates outcomes that lie in the future, based on existing data.
4) Classification. In classification, the target variable is categorical and is usually partitioned into classes such as low, medium, and high. The data mining model examines a large data set in which each record contains information on the target variable and the predictor variables. New records are then classified based on what the algorithm learned from that data set.
5) Clustering. Clustering groups records, observations, or cases into classes of similar objects. It partitions the entire data set into relatively homogeneous groups in which similarity within each group is maximized and similarity to records outside the group is minimized.
6) Association. Association is used to find rules that quantify the relationship between two or more attributes. The rules take an if-then form and are accompanied by measures of support and confidence.
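To make support and confidence concrete, here is a minimal Python sketch, assuming each record is represented as a set of attribute values; the four records are hypothetical and not taken from the ACC data set. Support is the share of records that contain an itemset, and the confidence of "if A then C" is the support of A and C together divided by the support of A.

```python
from typing import List, Set


def support(records: List[Set[str]], itemset: Set[str]) -> float:
    """Share of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(records: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of 'if antecedent then consequent' = support(A and C) / support(A)."""
    return support(records, antecedent | consequent) / support(records, antecedent)


# Hypothetical customer records, each a set of attribute values (not ACC data).
records = [
    {"private employee", "toyota", "tenor 48"},
    {"private employee", "toyota", "tenor 36"},
    {"business owner", "daihatsu", "tenor 48"},
    {"private employee", "daihatsu", "tenor 48"},
]
print(support(records, {"private employee", "toyota"}))      # 0.5
print(confidence(records, {"private employee"}, {"toyota"}))  # 2/3, about 0.667
```

Here the rule "if private employee then toyota" holds in two of the three private-employee records, which is why its confidence is 2/3.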

B. K-means Clustering Algorithm
The k-means algorithm partitions objects so that objects in the same group are similar to one another and dissimilar to objects in other groups; its objective is high intra-group similarity and low inter-group similarity. It is a centroid-based partitioning technique in which the centroid of each group represents that group's characteristics. Conceptually, the centroid is the group's center point, and it can be defined in various ways, such as the mean or the medoid of the objects assigned to the group. The dissimilarity between objects is measured by the Euclidean distance between two points. According to [1,4], the k-means clustering algorithm has four steps:
1) Determine the number of groups.
2) Randomly select as many data records as there are groups and use them as the initial centroids.
3) Assign each data record to the group whose centroid is nearest; this is done for all data records. Then compute the ratio of the Between-Cluster Variation (BCV) to the Within-Cluster Variation (WCV). If the ratio is greater than in the previous iteration, the algorithm proceeds to the next step; otherwise, it stops.
4) Update each centroid to the average of the data in its group and repeat step 3.
BCV is represented by the Euclidean distance between the group centroids. Suppose objects p1, p2, p3, ..., pn have three attributes x, y, and z and are to be grouped into three groups C1, C2, and C3 with centroids m1, m2, and m3. The Euclidean distance is then calculated with equation (1), and BCV with equation (2).
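The four steps can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the Weka tool used later in this study: attributes are already numeric, BCV is taken as the sum of Euclidean distances between every pair of centroids, WCV as the SSE, and the loop stops as soon as BCV/WCV no longer increases. The function names and the toy points are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two records with the same attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Step 3: give each record the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points]


def bcv(centroids: List[Point]) -> float:
    """Between-cluster variation: sum of distances between every pair of centroids."""
    k = len(centroids)
    return sum(euclidean(centroids[i], centroids[j])
               for i in range(k) for j in range(i + 1, k))


def wcv(points: List[Point], labels: List[int], centroids: List[Point]) -> float:
    """Within-cluster variation as SSE: squared distance of each record to its centroid."""
    return sum(euclidean(p, centroids[c]) ** 2 for p, c in zip(points, labels))


def update(points: List[Point], labels: List[int], old: List[Point]) -> List[Point]:
    """Step 4: move each centroid to the mean of its members (keep it if the group is empty)."""
    new = []
    for j, centroid in enumerate(old):
        members = [p for p, c in zip(points, labels) if c == j]
        if members:
            new.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new.append(centroid)
    return new


def kmeans(points: List[Point], k: int, seed: int = 0, max_iter: int = 100):
    """Steps 2-4; step 1 (choosing k) is left to the caller."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # Step 2: k random records as initial centroids
    best_ratio = -math.inf
    labels = assign(points, centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)     # Step 3: nearest-centroid assignment
        within = wcv(points, labels, centroids)
        ratio = bcv(centroids) / within if within > 0 else math.inf
        if ratio <= best_ratio:                # stop once BCV/WCV stops increasing
            break
        best_ratio = ratio
        centroids = update(points, labels, centroids)  # Step 4, then back to step 3
    return labels, centroids


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Using the BCV/WCV ratio as the stopping rule mirrors step 3 as described above; many other implementations instead stop when cluster assignments no longer change.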

WCV is represented by the Sum of Squared Errors (SSE), that is, the sum of the squared distances between each object and the centroid of its group. The WCV calculation is given in equation (3), where k is the number of groups, Ci is the i-th group, and mi is the centroid of group i. The ratio of BCV to WCV is given in equation (4).
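For reference, the usual forms of these quantities, written with the symbols defined above (three attributes x, y, z; groups C_1, C_2, C_3 with centroids m_1, m_2, m_3; k groups in general), are sketched below. This is a reconstruction of the standard definitions and may differ in notation from the original equations (1) to (4).

```latex
\begin{align}
d(p, m_i) &= \sqrt{(x_p - x_{m_i})^2 + (y_p - y_{m_i})^2 + (z_p - z_{m_i})^2} \tag{1}\\
BCV &= d(m_1, m_2) + d(m_1, m_3) + d(m_2, m_3) \tag{2}\\
WCV = SSE &= \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2 \tag{3}\\
\text{ratio} &= \frac{BCV}{WCV} \tag{4}
\end{align}
```

A larger ratio means the groups are farther apart relative to how spread out their members are, which is why step 3 continues only while the ratio increases.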

3. Research Method
The model used in this study is the Cross-Industry Standard Process for Data Mining (CRISP-DM). CRISP-DM was developed in 1996 by analysts representing DaimlerChrysler, SPSS, and NCR [8,9,10]. The model is shown in Figure 1.

Figure 1. Phases of the CRISP-DM model [4,5,6,7]

The stages of the CRISP-DM model are as follows:
1) Business Understanding Phase. This phase focuses on understanding the research objectives from a business perspective. The objectives are then translated into a data mining problem definition, and a plan is designed to achieve them. The tasks in this phase are to determine the business objectives, assess the situation, determine the data mining goals, and produce a project plan.

2) Data Understanding Phase. This phase begins with initial data collection, followed by activities to become familiar with the data, identify data quality problems, discover first insights into the data, or detect interesting subsets to form hypotheses about hidden information. The tasks in this phase are to collect the data, describe the data, and verify data quality.
3) Data Preparation Phase. This phase covers all activities that construct the data set used for modeling. The tasks are to select, clean, construct, integrate, and format the data.
4) Modeling Phase. In this phase, modeling techniques or algorithms are selected and applied. The tasks are to select the modeling technique, carry out the learning phase, build the model, and assess the model.
5) Evaluation Phase. Before the model is finally deployed, it must be evaluated and reviewed to confirm that it meets the business objectives. After evaluation, the data mining results can be used. The tasks are to evaluate the results, review the models that have been built, and determine the next steps.
6) Deployment Phase. In the deployment phase, the knowledge gained from data mining is organized and presented in a form that users can apply. The tasks are to plan deployment, produce a final report, and review the project.

4. Results and Discussions
A. Business Understanding
The business process studied here is the financing of motor vehicle purchases at the Bogor branch of Astra Credit Companies (ACC), with a focus on the Multipurpose financing product. The ACC Bogor branch was selected because it meets the branch selection criteria described earlier:
• The branch has an OSA (Outstanding Amount) of more than one trillion rupiah.
• The branch currently has more than ten thousand customers.
• The branch has more than 50 employees.
The Astra Credit Companies (ACC) Bogor branch is located at Jln. Raya Pajajaran No. 24, Babakan, Bogor Tengah, Bogor City, West Java 16128. The branch office was established in 1987 and has been operating for 32 years. ACC Bogor is a class AB branch with a total OSA (Outstanding Amount) of Rp 1.142 trillion. It currently has 10,836 customers, excluding customers who have paid off their loans, and 104 employees. ACC Bogor's current business objective is to increase Multipurpose product sales, with a target of 100 financed units per month. The data mining goal is to predict how many prospective customers will take new financing using Multipurpose products

at ACC Bogor, drawn from the database of active customers whose loans will soon be paid off and customers who have already paid off.
B. Data Understanding
The data used in this study are customer data from the Astra Credit Companies (ACC) Bogor branch. Given the size of ACC Bogor's customer database, covering both customers who have settled their loans and active customers, ACC Bogor should have no difficulty selling Multipurpose products. The team obtained 2,104 records from the database of customers who have already paid off or will pay off within three months. The data set has 24 columns and 2,104 rows, giving the research a large amount of customer information to mine. Customer interest in Multipurpose products was analyzed using the attributes Job, Vehicle Type, Installment, Tenor, Credit Quality, and DP (down payment).

Figure 2. Customer data, ACC Bogor branch
C. Data Preparation
In the data preparation stage, the data collected during the data understanding stage are selected. Selection is done at the attribute level, and the attributes used are Job, Vehicle Type, Installment, Tenor, Credit Quality, and DP. The data are then cleaned: unnecessary data are removed, records with empty attribute values are deleted or repaired, and values are standardized for consistency. Examples of the data used can be seen in Figure 2. To simplify the modeling process, every nominal attribute is transformed into numeric codes based on the frequency of each value: the most frequent value is coded 1, the second most frequent is coded 2, and so on. The initialization of the job attribute is shown in Table 2.
Table 1. Customer data of the ACC Bogor branch after cleaning

The initialization of each attribute can be seen in Table 2 to Table 7.

Table 2. Job data initialization
Code  Frequency  Job (PEKERJAAN)
1     874        KARYAWAN SWASTA (private employee)
2     419        PENGUSAHA (business owner)
3     229        TIDAK BEKERJA (not working)
4     206        PEGAWAI NEGERI SIPIL (civil servant)
5     189        PEKERJA MANDIRI (self-employed worker)
6     48         PERUSAHAAN SEDANG (medium enterprise)
7     44         ANGGOTA MILITER (military member)
8     22         PEGAWAI BUMN/BUMD (state- or regional-owned enterprise employee)
9     19         PERUSAHAAN BESAR (large enterprise)
10    15         PEGAWAI SWASTA (private-sector employee)
11    9          PERUSAHAAN PERORANGAN (sole proprietorship)
12    7          PERUSAHAAN KECIL (small enterprise)
13    7          WIRASWASTA (entrepreneur)
14    6          IBU RUMAH TANGGA (housewife)
15    2          KARYAWAN ACC (ACC employee)
16    1          PELAJAR (student)
17    1          PENSIUNAN (retiree)
18    1          POLISI (police officer)
19    1          PROFESI (professional)
20    1          SUPIR (driver)
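The frequency-based coding used in Tables 2 to 7 can be sketched as follows, assuming each attribute is available as a list of category strings. Ties in frequency are broken alphabetically, an assumption that happens to match the order of the tied rows in Table 2; the function name is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def frequency_codes(values: Iterable[str]) -> Dict[str, int]:
    """Code 1 for the most frequent category, 2 for the next, and so on.

    Ties are broken alphabetically (an assumption; it matches the tied rows in Table 2).
    """
    counts = Counter(values)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {category: code for code, (category, _) in enumerate(ranked, start=1)}


jobs = ["KARYAWAN SWASTA"] * 3 + ["PENGUSAHA"] * 2 + ["WIRASWASTA", "PERUSAHAAN KECIL"]
codes = frequency_codes(jobs)
print(codes)  # {'KARYAWAN SWASTA': 1, 'PENGUSAHA': 2, 'PERUSAHAAN KECIL': 3, 'WIRASWASTA': 4}
coded_jobs = [codes[job] for job in jobs]  # numeric column that could be used for modeling
```

Note that such codes are ordinal labels of frequency, not measurements, so distances between them (for example between codes 1 and 2) carry no business meaning of their own.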

Table 3. Vehicle type data initialization
Code  Frequency  Vehicle type (JENIS KENDARAAN)
1     239        GRAND AVANZA
2     224        AYLA
3     175        TOYOTA AGYA
4     158        DAIHATSU GRANMAX
5     124        TOYOTA CALYA
6     107        DAIHATSU GREAT XENIA
7     88         TOYOTA ALL NEW AVANZA
8     82
9     79         TOYOTA RUSH
10    76
11    49         TOYOTA ALL NEW INNOVA
12    46         DAIHATSU XENIA (AIR BAG)
13    41         (AIR BAG)
14    37         DAIHATSU XENIA
15    34         TOYOTA KIJANG INNOVA
16    28         TOYOTA YARIS
17    27         HONDA JAZZ
18    25         TOYOTA NEW DYNA
19    23         TOYOTA SIENTA
20    22         TOYOTA ALL NEW YARIS
21    20         TOYOTA ETIOS
22    19         DAIHATSU LUXIO

Table 4. Installment data initialization

Code  Frequency  Monthly installment (ANGSURAN, Rp)
1     1869       1,000,000-5,999,999
2     202        6,000,000-10,999,999
3     28         11,000,000-15,999,999
4     3          >=21,000,000
5     2          16,000,000-20,999,999

Table 5. Tenor (installment period) data initialization
Code  Frequency  Tenor (months)
1     1063       48
2     463        36
3     453        60
4     100        24
5     14         12
6     11         72

Table 6. OTR (On the Road price) data initialization
Code  Frequency  OTR (Rp)
1     1548       50,000,000-199,999,999
2     500        200,000,000-399,999,999
3     51         400,000,000-599,999,999
4     3          >=600,000,000
5     2          <50,000,000
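Tables 4 and 6 group numeric amounts into ranges before the codes are assigned. The grouping procedure is not described in the text, so the sketch below is an assumption: it maps a monthly installment in rupiah to the Table 4 ranges using the lower bound of each range. The constant names, the function name, and the sample amounts are hypothetical.

```python
import bisect
from typing import List

# Lower bounds (rupiah) of the installment ranges listed in Table 4.
LOWER_BOUNDS: List[int] = [1_000_000, 6_000_000, 11_000_000, 16_000_000, 21_000_000]
LABELS: List[str] = [
    "1,000,000-5,999,999",
    "6,000,000-10,999,999",
    "11,000,000-15,999,999",
    "16,000,000-20,999,999",
    ">=21,000,000",
]


def installment_range(amount: int) -> str:
    """Return the Table 4 range label for a monthly installment in rupiah."""
    if amount < LOWER_BOUNDS[0]:
        raise ValueError("Table 4 lists no range below Rp 1,000,000")
    # bisect_right counts the lower bounds <= amount; minus one gives the range index.
    return LABELS[bisect.bisect_right(LOWER_BOUNDS, amount) - 1]


monthly = [3_450_000, 7_200_000, 4_100_000, 23_000_000]  # hypothetical installments
print([installment_range(a) for a in monthly])
# ['1,000,000-5,999,999', '6,000,000-10,999,999', '1,000,000-5,999,999', '>=21,000,000']
```

The range labels would then be coded by frequency rather than by magnitude, which is why in Table 4 the range ">=21,000,000" (3 records) receives code 4 and "16,000,000-20,999,999" (2 records) receives code 5.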

Table 7. Down payment (DP) data initialization

D. Data Mining Modelling
This stage consists of three processes:
• Select modeling technique: the first step in modeling, which uses the modeling technique determined in the business understanding phase.
• Build model: the model is run according to the procedures of the modeling tool.
• Generate test design: the quality and validity of the model are tested using the prepared data set.
The data mining model in this study is built with the k-means clustering algorithm with five groups (k = 5), using Weka version 3.8. The clustering results for the customer data of the Astra Credit Companies (ACC) Bogor branch can be seen in Figure 4.
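The group descriptions in the Evaluation stage read like cluster profiles: the most frequent category of each nominal attribute and an average value for tenor. Weka's SimpleKMeans reports centroids in a similar way (the mean for numeric attributes and the mode for nominal ones). A minimal sketch of building such profiles from cluster labels follows; the attribute names, records, and labels are hypothetical, and this is not the Weka implementation itself.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, List


def profile(records: List[Dict[str, Any]], labels: List[int],
            nominal: List[str], numeric: List[str]) -> Dict[int, Dict[str, Any]]:
    """Per cluster: member count, most frequent value of each nominal attribute,
    and mean of each numeric attribute (rounded to two decimals)."""
    result: Dict[int, Dict[str, Any]] = {}
    for cluster in sorted(set(labels)):
        members = [r for r, c in zip(records, labels) if c == cluster]
        summary: Dict[str, Any] = {"size": len(members)}
        for attr in nominal:
            summary[attr] = Counter(r[attr] for r in members).most_common(1)[0][0]
        for attr in numeric:
            summary[attr] = round(mean(r[attr] for r in members), 2)
        result[cluster] = summary
    return result


# Hypothetical records and cluster labels (not the ACC data).
records = [
    {"job": "KARYAWAN SWASTA", "vehicle": "TOYOTA CALYA", "tenor": 36},
    {"job": "KARYAWAN SWASTA", "vehicle": "TOYOTA CALYA", "tenor": 48},
    {"job": "PENGUSAHA", "vehicle": "DAIHATSU GRANMAX", "tenor": 48},
    {"job": "PENGUSAHA", "vehicle": "DAIHATSU GRANMAX", "tenor": 60},
    {"job": "KARYAWAN SWASTA", "vehicle": "DAIHATSU GRANMAX", "tenor": 24},
]
labels = [0, 0, 1, 1, 1]
for cluster, summary in profile(records, labels, ["job", "vehicle"], ["tenor"]).items():
    print(cluster, summary)
```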

Figure 3. ACC Bogor branch customer data in .csv format

Figure 4. Clustering of ACC Bogor branch customer data using k-means

E. Evaluation
The data mining results must be evaluated to determine whether the modeling results match the goals set in the business understanding stage. The evaluation of the customer segmentation results for the Astra Credit Companies (ACC) Bogor branch is as follows:
a. The first group consists of 281 members. Customers in this group work as private employees, previously owned a Toyota Calya, have a financing tenor of 37.40 months, pay installments between Rp 1 million and Rp 5.9 million, made a down payment (DP) above 36%, and own vehicles priced between Rp 50 million and Rp 199 million.
b. The second group consists of 432 members. Customers in this group work as private employees, previously owned a vehicle (model not stated), have a financing tenor of 47 months, pay installments between Rp 1 million and Rp 5.9 million, made a DP in the range of 15% to 20.9%, and own vehicles priced between Rp 50 million and Rp 199 million.
c. The third group consists of 387 members. Customers in this group work as private employees, previously owned a Daihatsu GranMax, have a financing tenor of 44.56 months, pay installments between Rp 1 million and Rp 5.9 million, made a DP in the range of 15% to 20.9%, and own vehicles priced between Rp 50 million and Rp 199 million.
d. The fourth group consists of 288 members. Customers in this group work as private employees, previously owned a Toyota Grand Avanza, have a financing tenor of 43.75 months, pay installments between Rp 1 million and Rp 5.9 million, made a DP in the range of 21% to 25.9%, and own vehicles priced between Rp 50 million and Rp 199 million.
e. The fifth group consists of 385 members. Customers in this group work as private employees, previously owned a Toyota All New Avanza, have a financing tenor of 55.32 months, pay installments between Rp 1 million and Rp 5.9 million, made a DP in the range of 21% to 25.9%, and own vehicles priced between Rp 50 million and Rp 199 million.
F. Deployment
The deployment stage consists of two processes:
• Plan deployment: gives an overview of the plan for preparing the reports to be produced.
• Produce final report: provides a visualization of the reports prepared according to the deployment plan.
In this study, the CRISP-DM method was carried out only up to the evaluation stage. The deployment phase was not carried out because this research focused only on motor vehicle financing; other types of financing should be analyzed before deployment.

5. Conclusions and Recommendations

The CRISP-DM steps do not explicitly list the data set as a deliverable of the data preparation tasks at any stage, but the data set should nonetheless be archived and documented. The data set will not correspond exactly to the tasks of each stage, but information about the data used should be included in every report submitted. Data mining can create an intelligent business environment: it makes obtaining the necessary information more efficient, provides specialized tools with much better analytical capabilities, reduces the load that reporting and analytics place on operational systems, and extends these capabilities to users throughout the business process. The results show that customer segmentation of the Astra Credit Companies (ACC) Bogor branch can be performed with the k-means clustering algorithm. The customer data are divided into five groups, each with its own characteristics. To offer Multipurpose financing secured by customers' cars, the company can target prospective customers who work as private employees, own vehicles priced between Rp 50 million and Rp 199 million, and drive Toyota or Daihatsu brands. Future research could apply data mining to other types of financing so that all financing marketing is well targeted.

References

[1] Larose, D. T. (2014). Discovering Knowledge in Data: An Introduction to Data Mining. New Jersey: John Wiley & Sons, Inc.; Pal, N. R., & Jain, L. C. (2005). Advanced Techniques in Knowledge Discovery and Data Mining. New York: Springer.
[2] Rozi, Dwi JF, & John, Kresnayana. (2013). "Credit Risk Analysis Motorcycles on PT.X Finance (Case Study Regional Branch Office Gresik and Lamongan)". Journal of Science and Arts Pomits, 2(2).
[3] Kembuan, Precylia C., et al. (2014). "Analysis Segmentation, Targeting, and Positioning Car Financing in PT.Adira Dynamics Multifinance Tbk Branch Manado". EMBA Journal, 2(3), September 2014.
[4] Bramer, M. (2007). Principles of Data Mining. London: Springer-Verlag.
[5] Gorunescu, F. (2011). Data Mining: Concepts, Models, and Techniques. Berlin: Springer-Verlag Berlin Heidelberg.
[6] Han, J., et al. (2011). Data Mining: Concepts and Techniques (3rd ed.). USA: Morgan Kaufmann Publishers.
[7] Kantardzic, M. (2011). Data Mining: Concepts, Models, Methods, and Algorithms. Hoboken, New Jersey: John Wiley & Sons, Inc.
[8] Shafique, U., & Qaiser, H. (2014). A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA). International Journal of Innovation and Scientific Research, 12(1), 217-222.
[9] Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to Data Mining. New York: Pearson.
[10] Yan, J., Zhang, C., Zha, H., Gong, M., Sun, C., Huang, J., et al. (2015). On Machine Learning towards Predictive Sales Pipeline Analytics. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (pp. 1945-1951).
