A Fast Linear Separability Test by Projection of Positive Points on Subspaces
Yogananda A P [email protected]
Applied Research Group, Satyam Computer Services Limited, SID Block, IISc, Bangalore 560012, India

M Narasimha Murthy [email protected]
Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India

Lakshmi Gopal [email protected]
Applied Research Group, Satyam Computer Services Limited, SID Block, IISc, Bangalore 560012, India

Appearing in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s).

Abstract

A geometric and non-parametric procedure for testing if two finite sets of points are linearly separable is proposed. The Linear Separability Test is equivalent to a test that determines if a strictly positive point h > 0 exists in the range of a matrix A (related to the points in the two finite sets). The algorithm proposed in the paper iteratively checks if a strictly positive point exists in a subspace by projecting a strictly positive vector with equal co-ordinates (p) on the subspace. At the end of each iteration, the subspace is reduced to a lower dimensional subspace. The test is completed within r ≤ min(n, d+1) steps, for both linearly separable and non separable problems (r is the rank of A, n is the number of points and d is the dimension of the space containing the points). The worst case time complexity of the algorithm is O(nr^3) and the space complexity of the algorithm is O(nd). A small review of some of the prominent algorithms and their time complexities is included. The worst case computational complexity of our algorithm is lower than the worst case computational complexity of the Simplex, Perceptron, Support Vector Machine and Convex Hull algorithms, if d < n^{2/3}.

1. Introduction

The linear separability test determines if a hyperplane separates two sets P and Q ⊂ R^d. A hyperplane H = {v | w^t v + b = 0, v ∈ R^d, w ∈ R^d, b ∈ R, w and b fixed} separates the two sets if

    w^t y + b > 0  for all y ∈ P
    w^t y + b < 0  for all y ∈ Q,  or equivalently  w^t(-y) + b(-1) > 0  for all y ∈ Q.

Let P = {v'_i | v'_i ∈ R^{1×d}, i = 1, ..., N_1}, Q = {u'_i | u'_i ∈ R^{1×d}, i = 1, ..., N_2}, n = N_1 + N_2, x = [w_1, w_2, ..., w_d, b]^t, and

    A = [  v'_{1,1}    v'_{1,2}   ...   v'_{1,d}    1
           v'_{2,1}    v'_{2,2}   ...   v'_{2,d}    1
           ...
           v'_{N1,1}   v'_{N1,2}  ...   v'_{N1,d}   1
          -u'_{1,1}   -u'_{1,2}   ...  -u'_{1,d}   -1
          -u'_{2,1}   -u'_{2,2}   ...  -u'_{2,d}   -1
           ...
          -u'_{N2,1}  -u'_{N2,2}  ...  -u'_{N2,d}  -1 ].

The linear separability test is equivalent to verifying if B = {x | Ax > 0} is non-empty. Many algorithms have been proposed for the test, and a comprehensive discussion of the algorithms can be found in (Elizondo, 2006) and Ch. 5 of (Duda et al., 2000).

1.1. Algorithms

A general discussion of the Linear Programming, Quadratic Programming, Convex Hull and Perceptron algorithms to solve the linear separability problem is included in this subsection. The time complexities of the algorithms are discussed subsequently.

1.1.1. Linear Programming

Let p = [1, 1, ..., 1]^t ∈ R^n and ε ∈ R. The linear programming formulation for solving the problem is

    min_{x, ε}  ε
    subject to  Ax + εp ≥ 1,  ε ≥ 0.

B ≠ ∅ if and only if the optimal value ε = ε* = 0.

The linear programming problem is solved either by Simplex or Interior Point methods. A feasible solution to the above problem is easy to determine. Simplex solves the optimization problem iteratively by moving on the boundary of the convex polyhedral region defined by the constraints. The Interior Point algorithms can move through the interior of the convex region.
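For concreteness, the sketch below builds the matrix A from P and Q as defined in the introduction and solves the linear program above with scipy.optimize.linprog. The solver, the helper names (build_A, linearly_separable_lp), the example points and the numerical tolerance are our own choices for illustration, not part of the paper.

# Sketch of the LP test of Section 1.1.1 (illustrative, not the paper's code).
# Decision variables are (x, eps) with x = [w_1, ..., w_d, b]^t; we minimize
# eps subject to Ax + eps*p >= 1 and eps >= 0.
import numpy as np
from scipy.optimize import linprog

def build_A(P, Q):
    """Stack rows [v', 1] for points in P and [-u', -1] for points in Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    top = np.hstack([P, np.ones((P.shape[0], 1))])
    bottom = -np.hstack([Q, np.ones((Q.shape[0], 1))])
    return np.vstack([top, bottom])

def linearly_separable_lp(P, Q, tol=1e-9):
    A = build_A(P, Q)
    n, d1 = A.shape
    c = np.zeros(d1 + 1)            # objective: minimize eps (the last variable)
    c[-1] = 1.0
    # Ax + eps*p >= 1 rewritten for linprog as -(Ax + eps*p) <= -1.
    A_ub = -np.hstack([A, np.ones((n, 1))])
    b_ub = -np.ones(n)
    bounds = [(None, None)] * d1 + [(0, None)]   # x free, eps >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.success and res.fun <= tol        # optimal eps = 0 iff B is non-empty

# Example with two separable sets in R^2.
P = [[2.0, 2.0], [3.0, 1.5]]
Q = [[-1.0, -1.0], [-2.0, 0.5]]
print(linearly_separable_lp(P, Q))   # expected: True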
1.1.2. Quadratic Programming

Let ξ ∈ R^{n×1} and let x = [w_1, w_2, ..., w_d, b]^t as defined previously. Let C > 0 be an arbitrarily large constant.

    min_{w, b, ξ}  (1/2)||w||^2 + C||ξ||^2
    subject to  Ax ≥ 1 − ξ,  ξ ≥ 0.

B ≠ ∅ if and only if ||ξ||^2 = ||ξ*||^2 = 0 in the optimal solution.

The feasible region determined by the constraints is convex and polyhedral. An incremental algorithm finds a separating hyperplane for pairs of points in P and Q, and refines a common separating hyperplane between all pairs of points. If the problem is linearly separable, the common separator can be written as a linear combination of very few pairs of boundary points in P and Q, called support vectors (Cristianini & Taylor, 2000). SVM finds a hyperplane which has a large margin (the perpendicular distance of the points in P and Q closest to the hyperplane). Given that most real life data sets are linearly non separable (Wasserman, 1989, Linear Separability: Ch. 2), SVM identifies a unique hyperplane which has few misclassifications irrespective of the distributions from which the sets P and Q are drawn, provided the feature vectors in P and Q are drawn independently and identically from fixed distributions and n is large (Cristianini & Taylor, 2000, Ch. 4). The number of iterations that the SVM requires to find a separator for linearly separable problems is independent of the proximity of P and Q (Tsang et al., 2005).

1.1.3. Convex Hull Separation

The convex hull of a set of points is the smallest convex region that encloses the points of the set. Let Cv(P) be the convex hull of P and let Cv(Q) be the convex hull of Q. B is non-empty if and only if Cv(P) ∩ Cv(Q) = ∅. There are many algorithms to compute the convex hull of a finite set of points. Quick Hull is a fast incremental algorithm for finding the convex hull (Barber et al., 1996). Given an initial convex hull (a simplex of d+1 points in R^d), each unprocessed point is either inside or outside the current convex hull. If the point is within the convex hull, it is discarded; otherwise a convex hull is constructed with the new point as one of its vertices.

1.1.4. Perceptron

Perceptron is an iterative procedure to find the separating hyperplane between the sets P and Q. The algorithm terminates only if the two sets are separable. It has been proved that the algorithm converges in a finite number of iterations when there is a separating hyperplane (Duda et al., 2000). In every iteration, x is changed to increase the sum of the non-positive co-ordinates of the vector Ax (the misclassified examples in P and Q). Let A_i ∈ R^{1×(d+1)}, i = 1, ..., n, be the rows of matrix A and let η > 0 be a fixed small value. Algorithm 1 is the Batch Perceptron algorithm for verifying B ≠ ∅.

Algorithm 1  B = {x | Ax > 0, A ∈ R^{n×(d+1)}, x ∈ R^{d+1}}; test if B ≠ ∅
  Input: n × (d+1) matrix A
  L ← 0                          // L = 1 if B is non-empty
  x ← random vector
  a(x) ← Ax
  if a(x) > 0 then
    L ← 1
  end if
  while L == 0 do
    x ← x + η Σ_{i : a_i(x) ≤ 0} A_i^t
    a(x) ← Ax
    if a(x) > 0 then
      L ← 1
    end if
  end while
  return L
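As an illustration, the following is a transcription of Algorithm 1 into Python/NumPy. The iteration cap max_iter, the particular value of η (eta) and the random initialisation are our additions, since the paper's loop terminates only when the sets are separable; treat this as a sketch rather than the paper's procedure verbatim.

# Sketch of Algorithm 1 (Batch Perceptron) for testing whether B is non-empty.
import numpy as np

def batch_perceptron_test(A, eta=0.1, max_iter=10000, rng=None):
    """Return 1 if some x with Ax > 0 is found, else 0 after max_iter updates."""
    rng = np.random.default_rng(rng)
    n, d1 = A.shape
    x = rng.standard_normal(d1)        # x <- random vector
    for _ in range(max_iter):
        a = A @ x                      # a(x) <- Ax
        if np.all(a > 0):              # if a(x) > 0 then L <- 1
            return 1
        # x <- x + eta * (sum of rows A_i with a_i(x) <= 0)
        x = x + eta * A[a <= 0].sum(axis=0)
    return 0                           # no separator found within the cap

# Example: A built from the 2-D sets P, Q of the LP sketch
# (rows [v', 1] for P and [-u', -1] for Q).
A = np.array([[ 2.0,  2.0,  1.0],
              [ 3.0,  1.5,  1.0],
              [ 1.0,  1.0, -1.0],
              [ 2.0, -0.5, -1.0]])
print(batch_perceptron_test(A))        # expected: 1 (the sets are separable)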
1.2. Complexity

A few algorithms converge in a finite number of iterations if and only if B ≠ ∅. The Perceptron algorithm and the Ho-Kashyap procedure belong to this category. Upper bounds on the number of iterations for convergence of these algorithms on linearly separable problems can be found in (Duda et al., 2000) and (Elizondo, 2006). The bounds are dependent on the proximity of the two sets P, Q and the number of points in the two sets. The Perceptron algorithm has a time complexity of O(n^2) in the number of points. Examples of algorithms which converge in a finite number of iterations for both linearly separable and non separable problems are the Linear Programming algorithms, the Convex Hull procedure and the Quadratic Programming algorithms. The Convex Hull procedure provides a direct solution to the minimum number of misclassifications of any hyperplane, subsequent to identifying the intersection region of the convex hulls of the two sets P and Q. The worst case time complexity of the Linear Programming algorithms (LP solution by Simplex) and the Convex Hull procedure is exponential in the dimension (Elizondo, 2006), though on most problems Simplex has a low computational complexity of O(n).

Either the projected point z(p) has strictly positive co-ordinates, or some negative co-ordinates, or some zero co-ordinates. If z(p) has strictly positive co-ordinates, then the sets P and Q are linearly separable. If z(p) ≥ 0, the test is performed on linear combinations of the rows of matrix A with indices I_z. If z(p) has some negative co-ordinates, the test is performed on a lower dimensional subspace W_{r−1} ⊂ W_r which excludes z(p). At each step the dimensionality of the subspace reduces, and a test in a one dimensional subspace is needed in the final step. It is easy to verify whether a one dimensional subspace is a scaling of a strictly positive point. Hence, the maximum number of steps needed for the completion of the test is equal to the dimension of the range of A.
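To make the projection step concrete, the sketch below computes z(p) as the least-squares projection of the all-ones vector p onto the range of A and reports which of the three cases above applies. It covers only a single projection step, not the paper's full subspace-reduction iteration; the helper name, the tolerance and the use of numpy.linalg.lstsq for the projection are our assumptions.

# Sketch of one projection step: project p = (1, ..., 1) onto range(A) and
# classify the sign pattern of z(p).
import numpy as np

def projection_step(A, tol=1e-10):
    n = A.shape[0]
    p = np.ones(n)                              # strictly positive vector with equal co-ordinates
    x, *_ = np.linalg.lstsq(A, p, rcond=None)   # least-squares solution of Ax ~ p
    z = A @ x                                   # z(p): projection of p onto the range of A
    if np.all(z > tol):
        return "strictly positive: P and Q are linearly separable", z
    if np.all(z >= -tol):
        return "non-negative with zeros: continue with the rows indexed by I_z", z
    return "has negative co-ordinates: continue in a lower dimensional subspace", z

# Example with the same separable A as in the Perceptron sketch.
A = np.array([[ 2.0,  2.0,  1.0],
              [ 3.0,  1.5,  1.0],
              [ 1.0,  1.0, -1.0],
              [ 2.0, -0.5, -1.0]])
print(projection_step(A)[0])   # expected: the strictly positive case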