
An Algorithm for the Exact Computation of the Centroid of Higher Dimensional Polyhedra and its Application to Kernel Machines

Frederic Maire
Smart Devices Laboratory, School of SEDC, IT Faculty, Queensland University of Technology, 2 George Street, GPO Box 2434, Brisbane Q 4001, Australia.

Abstract

The Support Vector Machine (SVM) solution corresponds to the centre of the largest sphere inscribed in version space. Alternative approaches like Bayes Point Machines (BPM) and Analytic Centre Machines have suggested that the generalization performance can be further enhanced by considering other possible centres of version space, like the centroid (centre of mass) or the analytic centre. We present an algorithm to compute exactly the centroid of higher dimensional polyhedra, then derive approximation algorithms to build a new learning machine whose performance is comparable to BPM. We also show that for regular kernel matrices (Gaussian kernels for example), the SVM solution can be obtained by solving a linear system of equalities.

1. Introduction

In the Kernel Machine framework [5, 8], a feature mapping x → φ(x) from an input space to a feature space is given (generally, implicitly via a kernel function), as well as a training set T of pattern vectors and their class labels (x^1, y_1), ..., (x^m, y_m), where the class labels are in {−1, +1}. The learning problem is formulated as a search problem for a linear classifier (a weight vector w) in the feature space. Because only the direction of w matters for classification purposes, without loss of generality we can restrict the search for w to the unit sphere. The set of weight vectors w that classify the training set correctly is called version space and denoted by V(T). Version space is the region of feature space defined as the intersection of the unit sphere and the polyhedral cone of the feature space

    {w | ∀i ∈ [1, m], ⟨w, y_i φ(x^i)⟩ ≥ 0}.

The training algorithm of a Support Vector Machine (SVM) returns the central direction w_svm (a unit vector) of the largest spheric cone contained in the polyhedral cone {w | ∀i ∈ [1, m], ⟨w, y_i φ(x^i)⟩ ≥ 0}. The weight vector w_svm can be expressed as a linear combination of the vectors y_i φ(x^i). That is, there exist (α_1, ..., α_m) such that

    w_svm = Σ_{i=1..m} α_i y_i φ(x^i).

The kernel trick is that for some feature spaces and mappings φ, there exist easily computable kernel functions k defined on the input space such that k(x, y) = ⟨φ(x), φ(y)⟩. A new input vector x is classified with the sign of

    ⟨w_svm, φ(x)⟩ = Σ_{i=1..m} α_i y_i ⟨φ(x^i), φ(x)⟩ = Σ_{i=1..m} α_i y_i k(x^i, x).

With a kernel function k, the computation of inner products ⟨φ(x), φ(y)⟩ does not require the explicit knowledge of φ. In fact, for a given kernel function k, there may exist many suitable mappings φ.

Bayes Point Machines (BPM) are a well-founded improvement over SVM which approximate the Bayes-optimal decision by the centroid (also known as the centre of mass or barycentre) of version space. It happens that the Bayes point is very close to the centroid of version space in high dimensional spaces. The Bayes point achieves better generalization performance in comparison to SVM [6, 9, 1, 10].

An intuitive way to see why the centroid is a good choice is to view version space as an (infinite) committee of experts who all are consistent with the training set. A new and unlabelled input vector corresponds to a hyperplane in feature space that may split version space in two. It is reasonable to use the opinion of the majority of the experts that were consistent with the training set to predict the class label of a new pattern. The expert that agrees the most with the majority vote on new inputs is precisely the Bayes point.
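The kernelized decision rule above only needs the kernel values k(x^i, x). The following minimal sketch illustrates it with a Gaussian kernel; the toy data, the coefficients α_i and the kernel width are made-up assumptions for illustration, not values from the paper (the α_i would normally come from SVM or BPM training):

```python
import numpy as np

def gaussian_kernel(u, v, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), a standard Mercer kernel
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

def classify(x_new, X, y, alpha, sigma=1.0):
    # sign of <w_svm, phi(x_new)> = sum_i alpha_i * y_i * k(x^i, x_new)
    s = sum(a * yi * gaussian_kernel(xi, x_new, sigma)
            for a, yi, xi in zip(alpha, y, X))
    return 1 if s >= 0 else -1

# Hypothetical training set: two points per class on the real line
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1, -1, 1, 1])
alpha = np.array([0.5, 0.5, 0.5, 0.5])  # made-up expansion coefficients

print(classify(np.array([1.5]), X, y, alpha))   # near the +1 cluster -> 1
print(classify(np.array([-1.5]), X, y, alpha))  # near the -1 cluster -> -1
```

Note that φ never appears explicitly; only kernel evaluations against the training patterns are required.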
Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03) 0-7695-1978-4/03 $17.00 © 2003 IEEE

In a standard committee machine, for each new input we seek the opinions of a finite number of experts, then take a majority vote, whereas with a BPM, the expert that most often agrees with the majority vote of the infinite committee (version space) is delegated the task of classifying the new inputs.

Following Rujan [7], Herbrich and Graepel [2] introduced two algorithms to stochastically approximate the centroid of version space: a billiard sampling algorithm and a sampling algorithm based on the well-known perceptron algorithm.

In this paper, we present an algorithm to compute exactly the centroid of a polyhedron in high dimensional spaces. From this exact algorithm, we derive an algorithm to approximate a centroid position in a polyhedral cone. We show empirically that the corresponding machine presents better generalization capability than SVMs on a number of benchmark data sets.

In Section 2, we introduce an algorithm to compute exactly the centroid of higher dimensional polyhedra. In Section 3, we show a simple algorithm to compute the SVM solution of regular kernels. In Section 4, we sketch the idea of Balancing Board Machines.
In Section 5, some implementation issues are considered and some experimental results are presented.

2. Exact Computation of the Centroid of a Higher Dimensional Polyhedron

A polyhedron P is the intersection of a finite number of half-spaces. It is best represented by a system of non-redundant linear inequalities P = {x | Ax ≤ b}. Recall that the 1-volume is the length, the 2-volume is the surface area and the 3-volume is the every-day-life volume. The algorithm that we introduce for computing the centroid of an n-dimensional polyhedron is an extension of the work by Lasserre [4], who showed that the n-dimensional volume V(n, A, b) of a polyhedron P is related to the (n − 1)-dimensional volumes of its facets and the row vectors of its matrix A by the following formula:

    V(n, A, b) = (1/n) × Σ_{i=1..m} (b_i / ‖a_i‖) × V_i(n − 1, A, b)

where a_i denotes the i-th row of A and V_i(n − 1, A, b) denotes the (n − 1)-dimensional volume of the i-th facet P ∩ {x | a_i^T x = b_i}. We obtain the centroid and the (n − 1)-volume of a facet by variable elimination.

It is useful to observe that the computation of the volume and the centroid of an (n − 1)-dimensional polyhedron in an n-dimensional space is identical to the computation of the volume and the centroid of a facet of an n-dimensional polyhedron.

Algorithm 1 [G, V] = measurePolyhedron(P)
Require: P = {x | Ax ≤ b} non-empty and irredundant
Ensure: G is the centroid of P, and V its volume
  {m is the number of rows of A, n is its number of columns}
  for i = 1 to m do
    {Compute recursively the centroid G_Fi and the (n − 1)-volume V_Fi of each facet F_i of P}
    [G_Fi, V_Fi] = measurePolyhedron(F_i)
  end for
  {Compute G_E, the centroid of the envelope of P}
  G_E = Σ_i (V_Fi / Σ_j V_Fj) × G_Fi
  for i = 1 to m do
    {Compute the centroid G_Ci and the n-volume V_Ci of each cone C_i = cone(G_E, F_i) rooted at G_E and generated by F_i}
    Compute h_i, the distance from G_E to the hyperplane containing F_i
    V_Ci = (h_i / n) × V_Fi
    G_E G_Ci = (n / (n + 1)) × G_E G_Fi   {as vectors}
  end for
  V = Σ_i V_Ci
  G = Σ_i (V_Ci / V) × G_Ci
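To make the recursion of Algorithm 1 concrete, here is a simplified sketch of the two dimensional base case (our own illustration in Python, not the paper's Matlab implementation): for a convex polygon, the facets are the edges, a facet's 1-volume is the edge length and its centroid is the edge midpoint, so no further recursion is needed.

```python
import numpy as np

def measure_polygon(vertices):
    """Centroid and area of a convex polygon (vertices given in order),
    following Algorithm 1 with n = 2: facets are edges."""
    n = 2
    pts = np.asarray(vertices, dtype=float)
    m = len(pts)
    mids, lens = [], []
    for i in range(m):
        p, q = pts[i], pts[(i + 1) % m]
        mids.append((p + q) / 2.0)          # facet centroid G_Fi
        lens.append(np.linalg.norm(q - p))  # facet 1-volume V_Fi
    mids, lens = np.array(mids), np.array(lens)
    # Centroid G_E of the envelope (boundary) of P: length-weighted midpoints
    g_env = (lens[:, None] * mids).sum(axis=0) / lens.sum()
    vol, moment = 0.0, np.zeros(2)
    for i in range(m):
        p, q = pts[i], pts[(i + 1) % m]
        # h_i: distance from G_E to the line containing the edge
        edge = q - p
        normal = np.array([-edge[1], edge[0]]) / np.linalg.norm(edge)
        h = abs(np.dot(g_env - p, normal))
        v_cone = h * lens[i] / n                            # V_Ci = (h_i/n) V_Fi
        g_cone = g_env + (n / (n + 1)) * (mids[i] - g_env)  # G_Ci
        vol += v_cone
        moment += v_cone * g_cone
    return moment / vol, vol

G, V = measure_polygon([(0, 0), (2, 0), (2, 1), (0, 1)])
print(G, V)  # centroid (1.0, 0.5), area 2.0 for the 2-by-1 rectangle
```

Each edge, together with the envelope centroid G_E, defines a triangle (a 2-dimensional cone); the polygon's area and centroid are recovered as the volume-weighted combination of these cones, exactly as in the general algorithm.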
Geometrically, this variable elimination amounts to projecting the facet onto an axis-parallel hyperplane, then computing the volume and the centroid of this projection recursively in a lower dimensional space. From the volume and centroid of the projected facet, we can derive the centroid and volume of the original facet.

The n-volume V and the centroid G of a cone rooted at 0 are related to the (n − 1)-volume V_h and the centroid G_h of its intersection with the hyperplane x_n = h by the following equalities:

    V = ∫_0^h V_x dx = ∫_0^h V_h × (x/h)^(n−1) dx = (h/n) × V_h

    V × OG = ∫_0^h OG_x × V_x dx

    OG = (n / (n + 1)) × OG_h

where OG denotes the vector from the apex O to the centroid G, and V_x and G_x refer to the intersection of the cone with the hyperplane x_n = x. The above formulae were derived by considering the n-fold integral defining the n-dimensional volume. These formulae allow a recursive computation of the centroid of a polyhedron P by partitioning P into polyhedral cones generated by its facets.

The Matlab code for this algorithm is available at http://www.fit.qut.edu.au/∼maire/G

3. Computing the Spheric Centre of a Polyhedral Cone Derived from a Non-Singular Mercer Kernel Matrix

Let P = {x | Ax ≤ 0} be the non-empty polyhedral cone derived from a non-singular kernel matrix. The matrix A is square (m = n).

These formulae allow us to derive an approximation of the volume and the centroid of a polyhedron once we have approximations for the volumes and the centroids of its facets. Because the balancing board algorithm requires several board centroid estimations, it is desirable to recycle intermediate results as much as possible to achieve a significant reduction in computation time. Because the intersection of
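The cone equalities of Section 2, V = (h/n) × V_h and OG = (n/(n + 1)) × OG_h, can be sanity-checked numerically. The Monte Carlo sketch below (our own check, not part of the paper) uses a pyramid in n = 3 with apex at the origin and a unit-square cross-section at height x_3 = h, for which the formulae predict a volume of h/3 and a centroid at height 3h/4:

```python
import numpy as np

rng = np.random.default_rng(0)
h = 1.0
N = 200_000
# Sample uniformly in the bounding box [-0.5, 0.5]^2 x [0, h]
pts = rng.uniform([-0.5, -0.5, 0.0], [0.5, 0.5, h], size=(N, 3))
# A point is inside the pyramid if it lies in the scaled square
# cross-section of side z/h at its height z
scale = pts[:, 2] / h
inside = (np.abs(pts[:, 0]) <= 0.5 * scale) & (np.abs(pts[:, 1]) <= 0.5 * scale)
vol_mc = inside.mean() * 1.0 * 1.0 * h  # bounding box volume is h
z_centroid_mc = pts[inside, 2].mean()
print(vol_mc)         # close to h/3   (V = (h/n) V_h with n = 3, V_h = 1)
print(z_centroid_mc)  # close to 3h/4  (OG = (n/(n+1)) OG_h)
```

The estimates agree with the closed-form values to within the sampling error, which supports the scaling argument V_x = V_h (x/h)^(n−1) used in the derivation.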