Support Vector Machines: The Interface to libsvm in package e1071

by David Meyer
FH Technikum Wien, Austria

September 16, 2021

A smaller version of this article appeared in R-News, Vol. 1/3, 9.2001.

"Hype or Hallelujah?" is the provocative title used by Bennett & Campbell (2000) in an overview of Support Vector Machines (SVM). SVMs are currently a hot topic in the machine learning community, generating the kind of enthusiasm that Artificial Neural Networks once did. Far from being a panacea, SVMs nevertheless represent a powerful technique for general (nonlinear) classification, regression and outlier detection with an intuitive model representation.

The package e1071 offers an interface to the award-winning [1] C++ implementation by Chih-Chung Chang and Chih-Jen Lin, libsvm (current version: 2.6), featuring:

- C- and ν-classification
- one-class-classification (novelty detection)
- ε- and ν-regression

and includes:

- linear, polynomial, radial basis function, and sigmoidal kernels
- formula interface
- k-fold cross validation

For further implementation details on libsvm, see Chang & Lin (2001).

[1] The library won the IJCNN 2001 Challenge by solving two of three problems: the Generalization Ability Challenge (GAC) and the Text Decoding Challenge (TDC). For more information, see: http://www.csie.ntu.edu.tw/~cjlin/papers/ijcnn.ps.gz.

Basic concept

SVMs were developed by Cortes & Vapnik (1995) for binary classification. Their approach may be roughly sketched as follows:

- Class separation: basically, we are looking for the optimal separating hyperplane between the two classes by maximizing the margin between the classes' closest points (see Figure 1). The points lying on the boundaries are called support vectors, and the middle of the margin is our optimal separating hyperplane;
- Overlapping classes: data points on the "wrong" side of the discriminant margin are weighted down to reduce their influence ("soft margin");
- Nonlinearity: when we cannot find a linear separator, data points are projected into a (usually) higher-dimensional space where the data points effectively become linearly separable (this projection is realised via kernel techniques);
- Problem solution: the whole task can be formulated as a quadratic optimization problem which can be solved by known techniques.

A program able to perform all these tasks is called a Support Vector Machine.

[Figure 1: Classification (linearly separable case). The figure labels the margin, the separating hyperplane, and the support vectors.]

Several extensions have been developed; the ones currently included in libsvm are:

- ν-classification: this model allows for more control over the number of support vectors (see Schölkopf et al., 2000) by specifying an additional parameter ν which approximates the fraction of support vectors;
- One-class-classification: this model tries to find the support of a distribution and thus allows for outlier/novelty detection;
- Multi-class classification: basically, SVMs can only solve binary classification problems. To allow for multi-class classification, libsvm uses the one-against-one technique by fitting all binary subclassifiers (k(k-1)/2 of them for k classes) and finding the correct class by a voting mechanism;
- ε-regression: here, the data points lie in between the two borders of the margin which is maximized under suitable conditions to avoid outlier inclusion;
- ν-regression: with analogous modifications of the regression model as in the classification case.

Usage in R

The R interface to libsvm in package e1071, svm(), was designed to be as intuitive as possible. Models are fitted and new data are predicted as usual, and both the vector/matrix and the formula interface are implemented.
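A minimal sketch of the two calling conventions, assuming the iris data that ships with R (our choice for illustration, not a data set used in this article). The second half illustrates how svm() picks its mode from the type of y (factor, numeric, or omitted). The helper mode_of() and the vector type_names are hypothetical names introduced here; they assume that the fitted object stores the mode as an integer code in the order used by the package's print method.

```r
library(e1071)

## iris ships with R and is used here only for illustration
x <- as.matrix(iris[, 1:4])
y <- iris$Species

## the same classifier fitted through both interfaces
m_formula <- svm(Species ~ ., data = iris)
m_matrix  <- svm(x, y)

## share of training observations on which the two fits agree
agreement <- mean(as.character(predict(m_formula, iris[, 1:4])) ==
                  as.character(predict(m_matrix, x)))
print(agreement)

## assumption: model$type is an integer code (starting at 0) that
## indexes the mode names in the order given below
type_names <- c("C-classification", "nu-classification",
                "one-classification", "eps-regression", "nu-regression")
mode_of <- function(model) type_names[model$type + 1]

mode_factor  <- mode_of(svm(x, y))              # y is a factor
mode_numeric <- mode_of(svm(x[, -1], x[, 1]))   # y is numeric
mode_missing <- mode_of(svm(x))                 # y is omitted
print(c(mode_factor, mode_numeric, mode_missing))
```

Both interfaces apply the same defaults (scaling, RBF kernel), so the two fits are expected to give matching predictions; the formula interface is usually more convenient when the data already live in a data frame.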
As expected for R's statistical functions, the engine tries to be smart about the mode to be chosen, using the dependent variable's type (y): if y is a factor, the engine switches to classification mode; otherwise, it behaves as a regression machine; if y is omitted, the engine assumes a novelty detection task.

Examples

In the following two examples, we demonstrate the practical use of svm() along with a comparison to classification and regression trees as implemented in rpart().

Classification

In this example, we use the glass data from the UCI Repository of Machine Learning Databases for classification. The task is to predict the type of a glass on the basis of its chemical analysis. We start by splitting the data into a train and test set:

> library(e1071)
> library(rpart)
> data(Glass, package="mlbench")
> ## split data into a train and test set
> index <- 1:nrow(Glass)
> testindex <- sample(index, trunc(length(index)/3))
> testset <- Glass[testindex,]
> trainset <- Glass[-testindex,]

Both for the SVM and the partitioning tree (via rpart()), we fit the model and try to predict the test set values:

> ## svm
> svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
> svm.pred <- predict(svm.model, testset[,-10])

(The dependent variable, Type, has column number 10. cost is a general penalizing parameter for C-classification and gamma is the radial basis function-specific kernel parameter.)
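For orientation, the roles of cost and gamma can be written out. What follows is the standard soft-margin C-classification problem and the RBF kernel as they are commonly stated, not a formula taken from this article. The symbols w, b, ξ_i, φ and n are introduced here for illustration; C corresponds to cost, γ to gamma, and the class labels are coded as y_i ∈ {-1, +1}.

```latex
\[
\begin{aligned}
\min_{w,\,b,\,\xi}\quad
  & \tfrac{1}{2}\, w^{\top} w + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad
  & y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i,
    \qquad \xi_i \ge 0, \qquad i = 1, \dots, n, \\[4pt]
\text{with kernel}\quad
  & K(u, v) = \phi(u)^{\top} \phi(v)
    = \exp\!\left( -\gamma \, \lVert u - v \rVert^{2} \right).
\end{aligned}
\]
```

A larger C penalizes margin violations more heavily, and a larger γ makes the kernel more local, so that each training point influences a smaller neighbourhood.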
> ## rpart
> rpart.model <- rpart(Type ~ ., data = trainset)
> rpart.pred <- predict(rpart.model, testset[,-10], type = "class")

A cross-tabulation of the true versus the predicted values yields:

> ## compute svm confusion matrix
> table(pred = svm.pred, true = testset[,10])
    true
pred  1  2  3  5  6  7
   1 18  3  1  0  0  0
   2 12 19  1  1  0  4
   3  0  1  2  0  0  0
   5  0  0  0  2  0  0
   6  0  0  0  0  2  0
   7  0  0  0  0  0  5

> ## compute rpart confusion matrix
> table(pred = rpart.pred, true = testset[,10])
    true
pred  1  2  3  5  6  7
   1 19  4  0  0  0  1
   2  9 16  1  0  1  1
   3  1  1  3  0  0  0
   5  0  2  0  3  1  1
   6  0  0  0  0  0  0
   7  1  0  0  0  0  6

Finally, we compare the performance of the two methods by computing the respective accuracy rates and the kappa indices (as computed by classAgreement(), also contained in package e1071). In Table 1, we summarize the results of 10 replications: Support Vector Machines show better results.

          method  Min.  1st Qu.  Median  Mean  3rd Qu.  Max.
Accuracy  svm     0.58  0.6      0.61    0.62  0.63     0.68
          rpart   0.37  0.42     0.45    0.45  0.48     0.55
Kappa     svm     0.61  0.63     0.64    0.65  0.66     0.7
          rpart   0.44  0.5      0.5     0.52  0.54     0.59

Table 1: Performance of svm() and rpart() for classification (10 replications)

Non-linear ε-Regression

The regression capabilities of SVMs are demonstrated on the ozone data. Again, we split the data into a train and test set.

> library(e1071)
> library(rpart)
> data(Ozone, package="mlbench")
> ## split data into a train and test set
> index <- 1:nrow(Ozone)
> testindex <- sample(index, trunc(length(index)/3))
> testset <- na.omit(Ozone[testindex,-3])
> trainset <- na.omit(Ozone[-testindex,-3])

> ## svm
> svm.model <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.0001)
> svm.pred <- predict(svm.model, testset[,-3])
> crossprod(svm.pred - testset[,3]) / length(testindex)
         [,1]
[1,] 10.97535

> ## rpart
> rpart.model <- rpart(V4 ~ ., data = trainset)
> rpart.pred <- predict(rpart.model, testset[,-3])
> crossprod(rpart.pred - testset[,3]) / length(testindex)
        [,1]
[1,] 23.1046

We compare the two methods by the mean squared error (MSE); see Table 2 for a summary of 10 replications. Again, as for classification, svm() does a better job than rpart(); in fact, much better.

        Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
svm      8.58  10.17    11.48   11.32  12.33    14.28
rpart   14.48  17.33    19.97   19.31  21.26    22.44

Table 2: Performance of svm() and rpart() for regression (Mean Squared Error, 10 replications)

Elements of the svm object

The function svm() returns an object of class "svm", which includes, among others, the following components:

- SV: matrix of support vectors found;
- labels: their labels in classification mode;
- index: index of the support vectors in the input data (could be used, e.g., for their visualization as part of the data set).

If the cross-validation feature is enabled, the svm object will contain some additional information described below.

Other main features

- Class Weighting: if one wishes to weight the classes differently (e.g., in case of asymmetric class sizes, to avoid a disproportionate influence of bigger classes on the margin), weights may be specified in a vector with named components. In case of two classes A and B, we could use something like:

  m <- svm(x, y, class.weights = c(A = 0.3, B = 0.7))

- Cross-validation: to assess the quality of the training result, we can perform a k-fold cross-validation on the training data by setting the parameter cross to k (default: 0). The svm object will then contain some additional values, depending on whether classification or regression is performed.
Values for classification:

- accuracies: vector of accuracy values for each of the k predictions
- tot.accuracy: total accuracy

Values for regression:

- MSE: vector of mean squared errors for each of the k predictions
- tot.MSE: total mean squared error
- scorrcoef: squared correlation coefficient (of the predicted and the true values of the dependent variable)

Tips on practical use

- Note that SVMs may be very sensitive to the proper choice of parameters, so always check a range of parameter combinations, at least on a reasonable subset of your data.
- For classification tasks, you will most likely use C-classification with the RBF kernel (default), because of its good general performance and the small number of parameters (only two: C and γ). The authors of libsvm suggest trying small and large values for C (like 1 to 1000) first, then deciding which are better for the data by cross validation, and finally trying several γ's for the better C's.
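The cross parameter and the advice on checking a range of parameters can be sketched together. This is an illustrative example on the iris data that ships with R (not a data set used in this article). It uses tune.svm() from e1071 to search a small joint grid over cost and gamma, which is a simplification of the two-stage search described above; the grid values themselves are arbitrary choices made here.

```r
library(e1071)

set.seed(1)  # tune.svm() assigns observations to folds at random

## 5-fold cross-validation requested directly from svm()
cv_model <- svm(Species ~ ., data = iris, cross = 5)
print(cv_model$accuracies)    # one accuracy value per fold
print(cv_model$tot.accuracy)  # total accuracy

## joint grid search over cost and gamma, each candidate
## evaluated by 5-fold cross-validation
grid <- tune.svm(Species ~ ., data = iris,
                 cost = 10^(0:3), gamma = 10^(-3:0),
                 tunecontrol = tune.control(cross = 5))
print(grid$best.parameters)   # best (gamma, cost) pair on this grid
print(grid$best.performance)  # its cross-validated error
```

On larger data sets the grid can be run on a subset first, in line with the first tip, and then refined around the best values.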
