
K-SVM: An Effective SVM Algorithm Based on K-means Clustering

Yukai Yao, Yang Liu, Yongqing Yu, Hong Xu, Weiming Lv, Zhao Li, Xiaoyun Chen*
School of Information Science and Engineering, Lanzhou University, Lanzhou, China, 730000
Email: [email protected], [email protected], [email protected], {hxu11, lvwm10, lizh10, chenxy}@lzu.edu.cn

Abstract—Support Vector Machine (SVM) is one of the most popular and effective classification algorithms and has attracted much attention in recent years. As an important large margin classifier, SVM is dedicated to finding the optimal separating hyperplane between two classes, which gives it outstanding generalization ability. In order to find the optimal hyperplane, we commonly take most of the labeled records as our training set. However, since the separating hyperplane is determined by only a few crucial samples (Support Vectors, SVs), we need not train the SVM model on the whole training set. This paper presents a novel approach based on a clustering algorithm, in which only a small subset is selected from the original training set to act as the final training set. Our algorithm selects the most informative samples using the K-means clustering algorithm, and the SVM classifier is built by training on those selected samples. Experiments show that our approach greatly reduces the scale of the training set, thus effectively saving the training and predicting time of SVM, while at the same time guaranteeing the generalization performance.

Index Terms—SVM model, K-means clustering, Kernel function, predict

I. INTRODUCTION

With the rapid development of pattern recognition and machine learning, a lot of data mining algorithms have been proposed by experts, through which researchers can find much interesting hidden information in observational data. Classification is one of the most important of these tasks; it can be used to recognize, predict or classify currently unseen data. Generally, machine learning algorithms can be divided into three categories: Supervised Learning algorithms, Unsupervised Learning algorithms, and Semi-supervised Learning algorithms.

The common supervised learning algorithms include regression and classification analysis. The data classification process includes two stages: the first is the learning stage, the aim of which is to build a classifier by analyzing the labeled data; the second is the predicting stage, which uses the established model for prediction. The model should have enough generalization ability, i.e., the model should not only have good classification performance on the training data, but also achieve high classification accuracy on future data, which is supposed to have the same statistical distribution as the training data. The main classification algorithms include Decision Tree, Bayes, Neural Network, Support Vector Machine (SVM), etc.

Support Vector Machine (SVM) [1-2] is one of the most popular and effective algorithms in machine learning. SVM is based on the structural risk minimization criterion, and its goal is to find the optimal separating hyperplane, for which the separating margin should be maximized. This approach improves the generalization ability of the learning machine and addresses problems such as non-linear, high-dimensional data separation and classification with little prior knowledge. SVM is used in systems for face recognition [3-4], road sign recognition and other similar application areas [5] because of its sound theoretical foundation and good generalization ability in practical applications. SVM works well in both linear and non-linear conditions, and finding the optimal separating hyperplane is the key to separating the data. For non-linear situations, SVM exploits the kernel trick to map low-dimensional data into a high-dimensional feature space. In practical applications, SVM makes use of all the labeled data to find the separating rule, but training on large scale data brings a higher computation cost. In order to decrease the computational complexity, two kinds of solutions can be exploited: one is to improve the algorithm itself, such as the Least Squares SVM [6-7] and SMO [8] (Sequential Minimal Optimization) under a semi-positive definite kernel; the other is to decrease the number of input vectors.

The main task of clustering [9] is to group objects into clusters, where objects in the same cluster are more similar than those in different clusters. Clustering can find the relationships among data objects in an unsupervised way. A lot of clustering algorithms have been proposed and improved, aiming to enhance the efficiency and accuracy. According to the cluster model, clustering algorithms can be categorized into centroid-based clustering, hierarchical clustering, distribution-based clustering, density-based clustering, etc.


In this paper, we present an efficient approach for minimizing the sample space. The final training data are selected from the clustering result; they carry massive information, make an important contribution to building the SVM model, and can effectively decrease the scale of the training set on the premise of guaranteeing the accuracy of the model.

The rest of this paper is organized as follows: Section II is a brief introduction to SVM and K-means clustering. The details of the improved algorithm are described in Section III. Section IV is mainly about the experiments. Section V is the conclusion of this paper.

II. RELATED WORK

A. Support Vector Machine

In the linearly separable, binary classification case, the goal of SVM is to find an optimal hyperplane [10] which clearly separates the two classes with a maximal separating margin. The margin is defined as the geometrical distance of the blank space between the two classes. The greater the margin, the better the generalization ability of the SVM classifier. The boundaries of this margin are hyperplanes parallel to the separating hyperplane.

Suppose a training set X = {x_1, x_2, ..., x_n} and the corresponding label set Y = {y_1, y_2, ..., y_n} are given; a sample can be expressed as {(x_i, y_i)}, x_i ∈ R^d, y_i ∈ {+1, −1}, i ∈ {1, 2, ..., n}, where d is the dimension of the input space and n is the number of samples. For the standard SVM, w and b are the weight vector and the bias of the optimal hyperplane, respectively. The separating function can be written as formula 1:

    w·x + b = 0    (1)

We can rescale formula 1 so that the samples in the dataset meet the following requirements in formulas 2 and 3:

    w·x_i + b ≥ +1, where y_i = +1    (2)

    w·x_i + b ≤ −1, where y_i = −1    (3)

and the boundary functions of the separating margin can be defined with formula 4:

    w·x + b = +1  and  w·x + b = −1    (4)

For all the data points in the training set, the following constraints must be satisfied (linearly separable problem):

    y_i (w·x_i + b) ≥ 1    (5)

According to the definition of the separating margin and the formulas of the separating hyperplane and separating boundaries, it is not hard to get the value of the classifying margin, which is 2/||w||. Maximizing the value of the separating margin is equal to minimizing the value of ||w||². Generally, we solve this constrained optimization problem with Lagrange multipliers. We construct the following Lagrange function with formula 6:

    L(w, b, α) = (1/2) wᵀw − Σ_{i=1}^{n} α_i [ y_i (w·x_i + b) − 1 ]    (6)

Easily, we can obtain the following formula 7:

    w = Σ_{i=1}^{n} α_i y_i x_i ,   Σ_{i=1}^{n} α_i y_i = 0    (7)

and by substituting formula 7 into the Lagrange function 6 we get the corresponding dual problem, which is described as formula 8:

    max W(α) = Σ_{i=1}^{n} α_i − (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j y_i y_j x_iᵀx_j
    s.t.  Σ_{i=1}^{n} α_i y_i = 0 ,   α_i ≥ 0 ,  i = 1, ..., n    (8)

And the Karush-Kuhn-Tucker complementary conditions are formula 9:

    α_i [ y_i (w·x_i + b) − 1 ] = 0 ,  i = 1, ..., n    (9)

Consequently, we can distinguish the Support Vectors (SVs) from the other vectors. The vectors whose α_i are nonzero are called SVs; they are the ones that determine the optimal separating hyperplane.

The dual problem in formula 8 (the original problem is called the Primal Problem) is a typical convex quadratic programming optimization problem. After determining the Lagrange multipliers α_i*, we can get w* through equation 7:

    w* = Σ_{i=1}^{n} α_i* y_i x_i    (10)

According to the information of the SVs, we can also compute b*, and then the optimal hyperplane function can be obtained:

    b* = 1 − w*ᵀ x_SV  for  y_SV = +1    (11)
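The following short sketch illustrates formulas (7), (10) and (11) on synthetic data: only the support vectors (the samples with nonzero α_i) are needed to recover w* and b*. It is written in Python with scikit-learn's SVC purely for illustration (the implementation reported later in this paper uses Matlab and the libsvm toolbox); the toy data, the large value of C used to approximate the hard-margin case, and all variable names are assumptions of this sketch.

    # Illustrative sketch only: recover w* and b* from the support vectors of a linear SVM.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
    y = np.array([-1] * 20 + [1] * 20)

    clf = SVC(kernel='linear', C=1e3).fit(X, y)        # large C approximates the hard-margin case

    # dual_coef_ stores alpha_i * y_i for the support vectors only (all other alphas are zero),
    # so formula (10), w* = sum_i alpha_i y_i x_i, reduces to a sum over the SVs.
    w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
    print("number of SVs:", len(clf.support_))         # only a few samples determine the hyperplane
    print("w* from the SVs:", w, "  b* from the solver:", clf.intercept_[0])

    # Formula (11): b* can also be read off an (unbounded) SV with label +1.
    positive_sv = clf.support_vectors_[y[clf.support_] == 1][0]
    print("b* from a positive SV:", 1.0 - w @ positive_sv)

The observation that the trained model depends only on these few samples is exactly what motivates the sample-selection strategy of Section III.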


But in practical applications, we may not find a clear separating hyperplane that differentiates the data, because of the complexity of the dataset. In such conditions, we allow a few samples to lie on the wrong side of the separating hyperplane, and accordingly, the maximal margin classifier in this setting is the so-called Soft Margin SVM [11-12]. The constraints become:

    y_i (w·x_i + b) ≥ 1 − ξ_i    (12)

where ξ_i is called the slack variable. The idea of the "Soft Margin" is to improve the generalization ability of SVM. In order to maximize the margin, the optimization problem is equivalent to a quadratic programming problem [12], which can be expressed with formula 13:

    min_{w,b}  (1/2) ||w||² + C Σ_{i=1}^{n} ξ_i
    s.t.  y_i (w·x_i + b) ≥ 1 − ξ_i ,  ξ_i ≥ 0 ,  i = 1, 2, ..., n    (13)

where C is called the penalty parameter; it is a positive constant selected by the user and is used to keep the tradeoff between the model complexity and the number of misclassified data. Using Lagrange multipliers, we can transform the original optimization problem into its dual problem, which is described in formula 14:

    max L(w, b, α) = Σ_{i=1}^{n} α_i − (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j y_i y_j x_iᵀx_j
    s.t.  Σ_{i=1}^{n} y_i α_i = 0 ,  0 ≤ α_i ≤ C ,  i = 1, 2, ..., n    (14)

where α_i is the Lagrange multiplier with respect to the i-th inequality. At the same time, the Karush-Kuhn-Tucker complementary conditions are [13]:

    α_i [ y_i (w·x_i + b) − 1 + ξ_i ] = 0 ,  (C − α_i) ξ_i = 0 ,  i = 1, 2, ..., n    (15)

Therefore, the examples that are on or close to the boundary hyperplanes of the maximal margin are called Support Vectors (SVs); those vectors whose α_i are nonzero determine the optimal separating hyperplane [10], and all the other α_i are equal to zero.

Finally, we can get the decision function, which is decided by the Lagrange multipliers and the SVs, and is described in formula 16:

    f(x) = sgn( Σ_{i=1}^{n} α_i y_i (x_i·x) + b )    (16)

In order to solve non-linear issues, the kernel technique [14-16] is used to map the low-dimensional data into a high-dimensional, even infinite-dimensional, feature space, through which the non-linearly separable problem can be transformed into a linearly separable one. The kernel technique effectively avoids the explicit evaluation of the non-linear mapping φ: R^d → H, which maps a sample x_i in the original space to φ(x_i) in the high-dimensional feature space H. Using the kernel function K(x_i, x_j) = φ(x_i)·φ(x_j), the inner product in the mapped feature space can be replaced by a kernel evaluation; the essence is to substitute the inner products in the feature space with computations on the original data space. The final decision function (formula 16) is then revised to the form described in formula 17:

    f(x) = sgn( Σ_{i=1}^{n} α_i y_i K(x_i, x) + b )    (17)
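As a small illustration of formula (17), the sketch below trains an RBF-kernel SVM and recomputes its decision value by summing α_i y_i K(x_i, x) over the support vectors plus the bias. scikit-learn is assumed for convenience (not the Matlab/libsvm setup used later in the paper); note that its RBF kernel is parameterized as K(x_i, x_j) = exp(−γ||x_i − x_j||²), i.e. γ = 1/(2σ²) in the notation of Section IV, and the data and parameter values are made up for the example.

    # Illustrative sketch only: formula (17) recomputed by hand for a Gaussian (RBF) kernel.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-1.5, 1.0, (30, 2)), rng.normal(1.5, 1.0, (30, 2))])
    y = np.array([-1] * 30 + [1] * 30)

    gamma = 0.5                                    # gamma = 1 / (2 * sigma^2)
    clf = SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X, y)

    def decision(x):
        # f(x) = sum_i alpha_i y_i K(x_i, x) + b, where the sum runs over the SVs only
        k = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
        return clf.dual_coef_[0] @ k + clf.intercept_[0]

    x_new = np.array([0.3, -0.2])
    print(decision(x_new), clf.decision_function([x_new])[0])   # the two values should agree
    print("predicted label:", int(np.sign(decision(x_new))))    # the sgn(.) of formula (17)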

B. K-means

Clustering aims to divide the data into groups, where each group is composed of similar data; in other words, the similarity between data points in the same group is greater than the similarity between points in different groups.

K-means is a clustering algorithm widely used in the data mining field. It is used to partition the data into K groups, and it has a high efficiency on data partitioning, especially on large datasets. As an unsupervised learning algorithm, and unlike classification, we do not know the real result clusters before executing the algorithm. Because the number of clusters is unknown, the algorithm usually takes the desired number of groups as input, and in real applications we commonly decide it using an empirical value [17-19].

K-means is a very simple algorithm based on similarity. The measure of similarity plays an important role in the process of clustering: similar points are assigned to the same cluster, and dissimilar ones to different clusters. We usually use the Euclidean distance to measure the similarity between two data points.

A different similarity metric will usually not change the result, but the result of K-means is sensitive to the initial centroids. Two factors matter: one is the value of K, and the other is the initial selection of the centroids. The algorithm implements an iterative technique; this process does not stop until the centroids of all the clusters no longer change. That is to say, the grouping is done by minimizing the sum of squared distances between the objects and the corresponding centroids [17-19].

In the K-means algorithm, the choice of the initial centers is the key to getting a precise result. Choosing proper initial centroids gives a good result; if not, the result gets worse: a large, low-density cluster may be divided into pieces, or two close clusters may be merged into one group, etc. So we usually choose the initial centroids randomly, or use prior knowledge to label some of them in order to achieve a good result.

K-means iterates between two steps until convergence. The first step assigns each point to the closest cluster based on the Euclidean distance. The second step updates the mean value of each cluster, i.e., the cluster centroid. Note that each iteration needs N × K comparisons, where N is the size of the dataset and K is the number of clusters that we desire to get.

K-means [17-19] needs only one parameter, the number of clusters, which illustrates that it is simple and effective. It is a most common algorithm using an iterative refinement technique. Given a dataset of d-dimensional vectors D = {x_i | i = 1, 2, ..., N}, where x_i ∈ R^d is the i-th data point, each point falls in one and only one partition, and it belongs to the nearest cluster. The algorithm is deemed to have converged when the assignments no longer change.
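The two-step iteration described above can be written down in a few lines. The sketch below is a minimal NumPy version, given only for illustration (the experiments in this paper use the authors' Matlab environment); the random initialization, the fixed iteration cap and the toy data are assumptions of the sketch.

    # Minimal K-means sketch: alternate the assignment step and the centroid-update step.
    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]     # random initial centers
        for _ in range(n_iter):
            # assignment step: each point goes to the closest centroid (N x K Euclidean comparisons)
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # update step: each centroid becomes the mean of the points assigned to it
            new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                      else centroids[j] for j in range(k)])
            if np.allclose(new_centroids, centroids):                # centroids no longer change
                break
            centroids = new_centroids
        return labels, centroids

    X = np.vstack([np.random.default_rng(1).normal(-2.0, 1.0, (25, 2)),
                   np.random.default_rng(2).normal(2.0, 1.0, (25, 2))])
    labels, centroids = kmeans(X, k=2)
    print(centroids)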


As K-means is a heuristic algorithm, there is no guarantee that it will converge to the global optimum, and the result may depend on the initial clusters. Since the result of the K-means algorithm is uncertain, we usually run it multiple times, and the cluster result is determined through a voting mechanism. The steps of this algorithm are described as follows.

Steps of K-means:
1. At the beginning, randomly choose K points in the database as the initial cluster centers.
2. repeat
3.   assign each object to the most similar cluster, based on the mean value of the objects in each cluster.
4.   update the mean value of each cluster.
5. until the mean values of the clusters no longer change.

III. K-SVM: SVM ALGORITHM BASED ON K-MEANS CLUSTERING

A. The Rule of Selecting Samples

The goal of our algorithm is to sift out the important samples, that is to say, to find the most important samples carrying massive information. In Section II, we have already mentioned the definition of Support Vectors (SVs), which are commonly the boundary points between clusters. However, the boundary points are easily misclassified in clustering. Inspired by this observation, our algorithm uses the clustering approach to select boundary points [20], which are more likely to be SVs.

The cluster assumption [21-22] is our theoretical basis for selecting the most informative samples. It can be described as follows: if the distance between two samples is relatively small, then the samples tend to have the same classification label. In other words, the intra-cluster distance is smaller than the inter-cluster distance, so the decision hyperplane between clusters should be located in a sparse region. Under the cluster assumption, the learning algorithm can analyze the data distribution in feature space and tune the location of the decision boundary. In order to find the decision boundary, we can search for the points on or close to the boundaries of the clusters; the locations of these points determine the decision boundary. Such points are difficult to label: if the distance from one of them to the opposite cluster is small enough, it is more easily misclassified in the clustering process.

For the binary classification problem in Figure 1, the blue lines are the boundary hyperplanes of the maximal margin which we are going to find. They are determined by the six black points, which are located near or on the boundaries of the two clusters, respectively. All the points in Figure 1 form the original training set; the goal of our algorithm is to select a small set which contains the six black points. So our approach aims to use the clustering algorithm to select the misclassified points, and then, according to the difference between the labels of these samples and their neighbors, to select some of them for training. Thus we can reduce the scale of the training set effectively.

Figure 1. Sketch map of binary classification. The blue lines are the boundary hyperplanes, the margin is maximized, and the black points are the SVs.

The black points in Figure 1 are the ones misclassified in the clustering process; we can use the information they carry for training and building the prediction model.
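To make this first step concrete, the sketch below flags the samples whose cluster assignment disagrees with their true label, i.e. the candidate boundary points discussed above. It is an illustrative Python/scikit-learn fragment, not the paper's Matlab code; mapping each cluster to its majority true label is an assumption of the sketch (with K = 2 the paper simply compares the cluster label with the class label).

    # Illustrative sketch only: collect the points misclassified by K-means (the set misC below).
    import numpy as np
    from sklearn.cluster import KMeans

    def misclassified_set(X, y, k=2, seed=0):
        clusters = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        cluster_label = {}
        for c in np.unique(clusters):
            values, counts = np.unique(y[clusters == c], return_counts=True)
            cluster_label[c] = values[counts.argmax()]      # majority true label of the cluster
        cluster_pred = np.array([cluster_label[c] for c in clusters])
        return np.where(cluster_pred != y)[0]               # indices of the candidate boundary points

    # usage (hypothetical arrays): misC = misclassified_set(X_train, y_train)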


Suppose that we only consider the positions of the misclassified points; several situations may occur, as shown in Figure 2.

Figure 2. The positions of the red misclassified points: (a) and (b).

Figure 2 shows the two situations that may occur. For Figure 2 (a), it can be seen that the red misclassified points are mixed; if we use only the information offered by them, the accuracy of the training model becomes worse. To deal with this situation, these points could be removed from the original dataset and the rest used to train the SVM model, but this approach is not efficient, because the number of misclassified points is sometimes small, and training on the remaining data brings little benefit. For Figure 2 (b), the blue lines are the proper hyperplanes and the red ones are the hyperplanes built from the red misclassified points; we can see that there is a great deviation compared with the original hyperplanes.

In order to solve those problems, we start from the neighbors of the misclassified points. In Figure 3 and Figure 4, the red points are the misclassified ones, and the black ones are their neighbors. We want to choose the neighbors whose true labels are different from that of the misclassified point, which is shown in Figure 3, or, when the number of neighbors with the same label is equal to the number with a different label, the misclassified point itself, which is shown in Figure 4. We believe that those points carry massive information.

Figure 3. The black points are the ones to be selected.

Figure 4. The red misclassified points are the most uncertain ones and are selected.

According to the analysis above, the implementation process of our algorithm can be divided into two steps. In the first step, we run the clustering algorithm on the original training set; as a result, each data point gets a cluster label. Then we compare the cluster label of each data point with its true label; if they are different, we have reason to suspect that the point may be located in the edge region between the two clusters. In the second step, we check the labels of its N-nearest neighbors and choose some of them as the candidate training set. The way to choose points is shown in Figure 5. In Figure 5, the red points represent those misclassified points, and we choose the black ones in Figure 5 (a) and (b) to construct the candidate training set.

Figure 5. The way of choosing points: (a) and (b).

As can be seen from Figure 5, the way of choosing samples is reasonable. In Figure 5, the green lines are the adjusted boundary hyperplanes, and the red lines in Figure 5 (b) are the worse ones. The experiments show that the old hyperplanes and the new ones have almost the same generalization ability.

B. Algorithm Description

Given a training set X = {x_1, x_2, ..., x_n} and the corresponding label set Y = {y_1, y_2, ..., y_n}, a sample can be expressed as {(x_i, y_i)}. At the beginning we run K-means on the training set X; each sample gets a cluster label, and if the cluster label of a sample is different from its true label, it is added into the misclassified set misC. Let y_ik be the cluster label of x_i after the clustering process and y_i be its true label; we define the misclassified set with formula 18:

    misC = { x_i | y_ik ≠ y_i }    (18)

For each member of this set, we find its N-nearest neighbors. We define NNeighbor as a sample's N-nearest neighbor set and NNLabel as its corresponding label set.


Algorithm I: K-SVM
Input: X, Y
Output: Model
begin
1. misC = ∅, NT = ∅   /* initialize */
2. K-means(k, X)   /* k is given by the user or by prior knowledge */
3. for each x_i ∈ X {
4.   if y_ik ≠ y_i
5.     misC = misC ∪ {x_i} ; }
6. for each x_j ∈ misC {
7.   NNeighbor_jm = findNeighbor(x_j, n)   /* n is the number of neighbors and NNeighbor_jm represents the m-th neighbor of the j-th sample */
8.   NNLabel_jm   /* NNLabel_jm is the corresponding label set of NNeighbor_jm */
9.   if |{m : NNLabel_jm ≠ y_j}| = |{m : NNLabel_jm = y_j}|
10.    NT = NT ∪ {x_j}
11.  if NNLabel_jm ≠ y_j
12.    NT = NT ∪ {NNeighbor_jm} }
13. model = svmtrain(NT, NTlabel)
end

According to the way of selecting samples, we pick out the points that meet the requirements and add them into the new training set NT. After sifting, we get a small subset compared with the original training set, which is utilized for training and prediction. Algorithm I describes the details of our algorithm in pseudo code.

Algorithm II: findNeighbor(x, n)
Input: X, x, n
Output: NNeighbor
begin
1. for each x_i ∈ X {
2.   for each x_h ∈ X {
3.     if x_i ≠ x_h, dis = Σ_{j=1}^{d} (x_ij − x_hj)²   /* x ∈ R^d, and x_ij is the j-th dimension of the i-th sample */ }}
4. NNeighbor is the set of the samples corresponding to the n smallest dis values.
end

The way of selecting samples is effective: it can not only reduce the scale of the training set but also guarantee the accuracy and avoid overfitting. Algorithm II aims to find the N-nearest neighbors of a sample.

Our algorithm aims to find as many samples as possible that could be SVs. In this way, it saves time for establishing the SVM model and makes the training process converge as soon as possible. The computational complexity depends on the number of SVs: the fewer the SVs, the lower the computational complexity.
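Putting Algorithms I and II together, the fragment below is one possible end-to-end sketch of K-SVM in Python. scikit-learn's KMeans, NearestNeighbors and SVC stand in for the Matlab/libsvm calls of the actual implementation (svmtrain in Algorithm I); the cluster-to-label mapping and the fallback for an empty candidate set are assumptions added for robustness, and all parameter values are placeholders.

    # Illustrative end-to-end sketch of K-SVM (not the authors' Matlab/libsvm implementation).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import NearestNeighbors
    from sklearn.svm import SVC

    def k_svm_fit(X, y, k=2, n_neighbors=5, C=1.0, gamma='scale', seed=0):
        # Step 1: cluster, then flag the points whose cluster label disagrees with the true label.
        clusters = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        cluster_label = {}
        for c in np.unique(clusters):
            values, counts = np.unique(y[clusters == c], return_counts=True)
            cluster_label[c] = values[counts.argmax()]
        mis_idx = np.where(np.array([cluster_label[c] for c in clusters]) != y)[0]

        # Step 2: inspect the N-nearest neighbors of every flagged point (Algorithm II).
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
        _, neigh = nn.kneighbors(X[mis_idx]) if len(mis_idx) else (None, [])
        selected = set()
        for i, row in zip(mis_idx, neigh):
            nbrs = [j for j in row if j != i][:n_neighbors]
            same = sum(y[j] == y[i] for j in nbrs)
            diff = len(nbrs) - same
            selected.update(j for j in nbrs if y[j] != y[i])   # rule of Figure 3 (lines 11-12)
            if same == diff:                                   # rule of Figure 4 (lines 9-10)
                selected.add(i)

        # Step 3: train the SVM on the reduced set NT (fall back to all data if NT is too small).
        NT = sorted(int(j) for j in selected)
        if len(NT) < 2 or len(np.unique(y[NT])) < 2:
            NT = list(range(len(y)))
        model = SVC(kernel='rbf', C=C, gamma=gamma).fit(X[NT], y[NT])
        return model, NT

    # usage (hypothetical arrays): model, NT = k_svm_fit(X_train, y_train, k=2, n_neighbors=5)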

IV. EXPERIMENTS

In our experiments, we focus on solving the binary classification problem, and the training dataset is collected from the original dataset through stratified sampling [23]. Our algorithm and LIBSVM are run on three datasets, respectively. Each dataset is divided into two parts, the training set and the testing set. The results show that the number of SVs decreases greatly and the prediction accuracy is the same as, or a little higher than, that of LIBSVM. The algorithm is implemented with Matlab and the libsvm toolbox. The kernel function we use here is the Gaussian kernel:

    K(x_i, x_j) = exp( −||x_i − x_j||² / (2σ²) )

A. Experiment Datasets

TABLE 1: INFORMATION OF DATASETS

DataSet          Train (Number × dimension)    Test (Number × dimension)
breastcancer     448 × 9                       235 × 9
heart_scale      180 × 13                      90 × 13
liverdisorders   230 × 6                       115 × 6

Table 1 describes the details of the datasets in our experiment. These datasets are binary classification problems which can be downloaded from the Repository of machine learning databases of the well-known University of California at Irvine (UCI) [24]. They are medical data obtained from real life. The instances are described by attributes, some of which are linear and some are nominal.

B. Experiment Results

We commonly use the accuracy rate [25] to evaluate the performance of binary classification algorithms. In our experiments, we compare the results from the following aspects: the number of samples in the training set, the number of SVs, the accuracy rate, and the approximate consuming time.
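For readers who want to reproduce the style of this evaluation without the exact UCI files used here, the following Python sketch shows the accuracy-rate protocol on a stand-in dataset: scikit-learn's built-in breast-cancer data replaces the UCI breastcancer set, the labels are mapped to {−1, +1}, a stratified split plays the role of the stratified sampling mentioned above, and an RBF (Gaussian) kernel SVM is trained. The dataset, split ratio and parameters are assumptions of the sketch, so the numbers it prints are not comparable to the tables below.

    # Illustrative evaluation sketch only (stand-in data; not the experiments reported below).
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    data = load_breast_cancer()
    X = StandardScaler().fit_transform(data.data)
    y = np.where(data.target == 1, 1, -1)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, stratify=y, random_state=0)

    clf = SVC(kernel='rbf', gamma='scale', C=1.0).fit(X_tr, y_tr)    # Gaussian kernel, as above
    print("number of SVs:", int(clf.n_support_.sum()))
    print("accuracy rate: %.4f%%" % (100 * accuracy_score(y_te, clf.predict(X_te))))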


The result in Table 2 shows that the new training set is a small set selected from the original one. It illustrates that our way of reducing the scale of the training set is effective; the number of training samples decreases to different degrees.

TABLE 2: THE NUMBER OF TRAINING SAMPLES

DataSet          Original training set size    New training set size
breastcancer     448                           33
heart_scale      180                           38
liverdisorders   230                           34

TABLE 3: THE NUMBER OF SVS

DataSet          Original number of SVs    New number of SVs
breastcancer     69                        14
heart_scale      96                        26
liverdisorders   210                       18

As can be seen in Table 3, the number of SVs obtained by our algorithm is reduced obviously, and the performance of our algorithm is almost the same as LIBSVM, because our algorithm excludes the redundant samples which make no contribution to establishing the SVM model. Table 4 shows the accuracy rates of our algorithm and LIBSVM.

TABLE 4: THE ACCURACY RATE

DataSet          Original accuracy rate    New accuracy rate
breastcancer     98.7234%                  99.5745%
heart_scale      84.4444%                  84.4444%
liverdisorders   63.4783%                  63.4783%

The accuracy rate of our algorithm is almost the same as the original one; in other words, our algorithm guarantees the generalization ability while utilizing fewer SVs, and it gets the same or a better accuracy rate. The approximate consuming time is also saved effectively; it is the average time over 5 runs of the algorithms, and Table 5 shows the comparison of the average time.

The number of training samples determines the computational complexity; if we use only the most valuable samples for training, the consuming time can be saved significantly. The consuming time includes the time for establishing the SVM model and the time for predicting.

TABLE 5: THE COMPARISON OF THE AVERAGE TIME

DataSet          Average original time (s)    Average new time (s)
breastcancer     0.01273                      0.00720
heart_scale      0.01263                      0.00519
liverdisorders   0.01745                      0.00487

In a word, our way of dealing with the training set is effective. Our algorithm can not only reduce the scale of the training set, and thus greatly reduce the algorithm complexity, but also guarantee the generalization performance of our algorithm.

V. CONCLUSION

In this paper, we dedicate ourselves to solving the problem of reducing the scale of the training set with the method of clustering. We use the K-means clustering approach to select the few most informative samples, which are used to construct our real training set. Experimental results show that our algorithm reaches the goal of reducing the scale of the training set and greatly reduces the training and predicting time, meanwhile assuring the generalization ability of our K-SVM algorithm.

In the future, we will try to use our approach on large-scale datasets, and explore a more precise way to select SVs.

ACKNOWLEDGMENTS

This paper is supported by the Fundamental Research Funds for the Central Universities (lzujbky-2013-229) and is funded in part by the Mountain Torrent Disaster Prevention projects from the Hydrology and Water Resources Survey Bureau in Qinghai Province. Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies.


REFERENCES

[1] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[2] Jing Bai, Lihong Yang, Xueying Zhang, "Parameter Optimization and Application of Support Vector Machine Based on Parallel Artificial Fish Swarm Algorithm", Journal of Software, vol. 8, no. 3, pp. 673-679, 2013.
[3] Riadh Ksantini, Boubakeur Seddik Boufama, Imran Shafiq Ahmad, "A New KSVM + KFD Model for Improved Classification and Face Recognition", Journal of Multimedia, vol. 6, no. 1, pp. 39-47, 2011.
[4] Guodong Guo, Stan Z. Li, "Face Recognition by Support Vector Machines", Automatic Face and Gesture Recognition, pp. 196-201, 2000.
[5] S. Maldonado-Bascon, S. Lafuente-Arroyo, "Road-Sign Detection and Recognition Based on Support Vector Machines", IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 264-278, 2007.
[6] J. A. K. Suykens, J. Vandewalle, "Least Squares Support Vector Machine Classifiers", Neural Processing Letters, vol. 9, no. 3, pp. 293-300, 1999.
[7] L. Lukas, J. A. K. Suykens, "Least Squares Support Vector Machines: a multi two-spiral benchmark problem", Proceedings of the Indonesian Student Scientific Meeting, pp. 289-292, 2001.
[8] John C. Platt, "Fast Training of Support Vector Machines Using Sequential Minimal Optimization", 1998.
[9] Yu Zong, Ping Jin, Dongguan Xu, Rong Pan, "A Clustering Algorithm based on Local Accumulative Knowledge", Journal of Computers, vol. 8, no. 2, pp. 365-371, 2013.
[10] Corinna Cortes, Vladimir Vapnik, "Support-vector networks", Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[11] William S. Noble, "What is a support vector machine", Nature Biotechnology, vol. 24, pp. 1565-1567, 2006.
[12] Qiang Wu, "SVM Soft Margin Classifiers: Linear Programming versus Quadratic Programming", Neural Computation, vol. 17, no. 5, pp. 1160-1187, 2005.
[13] Guang-Bin Huang, Xiaojian Ding, "Optimization method based extreme learning machine for classification", Neurocomputing, vol. 74, no. 1-3, pp. 155-163, 2010.
[14] Colin Campbell, "Kernel methods: a survey of current techniques", Neurocomputing, vol. 48, no. 1-4, pp. 63-84, 2002.
[15] G. Baudat, "Generalized Discriminant Analysis Using a Kernel Approach", Neural Computation, vol. 12, no. 10, pp. 2385-2404, 2006.
[16] B. Scholkopf and A. J. Smola, Learning With Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, 2002.
[17] Juanying Xie, Shuai Jiang, Weixin Xie, Xinbo Gao, "An Efficient Global K-means Clustering Algorithm", Journal of Computers, vol. 6, no. 2, pp. 271-279, 2011.
[18] S. Sujatha, A. Shanthi Sona, "New Fast K-Means Clustering Algorithm using Modified Centroid Selection Method", International Journal of Engineering Research & Technology (IJERT), vol. 2, no. 2, pp. 1-9, 2013.
[19] Lin Yujun, Luo Ting, Yao Sheng, Mo Kaikai, Xu Tingting, "An improved clustering method based on k-means", Fuzzy Systems and Knowledge Discovery (FSKD), pp. 734-737, 2012.
[20] Y. T. Gu, G. R. Liu, "A boundary point interpolation method for stress analysis of solids", Computational Mechanics, vol. 28, no. 1, pp. 47-54, 2002.
[21] Olivier Chapelle, Jason Weston, "Cluster Kernels for Semi-Supervised Learning", Advances in Neural Information Processing Systems 15, pp. 585-592, 2003.
[22] Olivier Chapelle, Alexander Zien, "Semi-Supervised Classification by Low Density Separation", 10th Int'l Workshop on Artificial Intelligence and Statistics, pp. 57-64, 2005.
[23] David Freedman, Robert Pisani, Statistics, Norton, New York, 2007.
[24] C. Blake, C. Merz, "UCI Repository of Machine Learning Databases", Available: http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998 [Online].
[25] P. Williams, S. Li, "A geometrical method to improve performance of support vector machine", IEEE Transactions on Neural Networks, vol. 18, no. 3, pp. 942-947, 2007.

Yukai Yao received the BS degree from the Department of Computer Science and Technology, Northwest Normal University, in 1997, and the MA degree from the School of Information Science and Engineering, Lanzhou University, in 2011. He is now a doctoral student in the School of Information Science and Engineering, Lanzhou University. His research interests include high performance computing, pattern recognition and data mining. He is a member of CCF and IET.

Yang Liu received the BS degree from the School of Computer Science and Technology, Zhengzhou University, in 2011. She is now a master student in the School of Information Science and Engineering, Lanzhou University. Her research interests include Artificial Intelligence, Data Warehouse and Data Mining. She is a member of IET.

Xiaoyun Chen received the BS degree from the Department of Computer Technology, Jilin University, and the MA degree from the Institute of Atomic Energy, Chinese Academy of Sciences, in 1995. He is a Professor and PhD supervisor, the Director of the Institute of Computer Software and Theory, the Director of the national Linux Technical Training and Promotion Center of Lanzhou University, and the Director of the IBM Technology Center of Lanzhou University. He is a Senior Member of CCF, a member of the Database Special Committee, a member of the Theoretical Computer Science Special Committee, and a member of the Liberal Arts Computer Basic Teaching Instruction Committee of the Ministry of Education. His main research fields are data warehouse design and construction, data mining algorithms and applications, special data mining, high performance computing, parallel data mining, search engine technology and methods, weather information processing, big data, etc.
