A Fast Linear Separability Test by Projection of Positive Points on Subspaces

Yogananda A P ([email protected]), Applied Research Group, Satyam Computer Services Limited, SID Block, IISc, Bangalore 560012, India
M Narasimha Murthy ([email protected]), Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India
Lakshmi Gopal ([email protected]), Applied Research Group, Satyam Computer Services Limited, SID Block, IISc, Bangalore 560012, India

Abstract

A geometric and non-parametric procedure for testing whether two finite sets of points are linearly separable is proposed. The linear separability test is equivalent to a test that determines whether a strictly positive point $h > 0$ exists in the range of a matrix $A$ (related to the points in the two finite sets). The algorithm proposed in the paper iteratively checks if a strictly positive point exists in a subspace by projecting a strictly positive vector with equal co-ordinates ($p$) on the subspace. At the end of each iteration, the subspace is reduced to a lower dimensional subspace. The test is completed within $r \le \min(n, d+1)$ steps, for both linearly separable and non-separable problems ($r$ is the rank of $A$, $n$ is the number of points and $d$ is the dimension of the space containing the points). The worst case time complexity of the algorithm is $O(nr^3)$ and the space complexity is $O(nd)$. A small review of some of the prominent algorithms and their time complexities is included. The worst case computational complexity of our algorithm is lower than the worst case computational complexity of Simplex, Perceptron, Support Vector Machine and Convex Hull algorithms, if $d < n^{2/3}$.

(Appearing in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s).)

1. Introduction

The linear separability test determines if a hyperplane separates two sets $P$ and $Q \subset R^d$. A hyperplane $H = \{v \mid w^t v + b = 0,\ v \in R^d,\ w \in R^d,\ b \in R,\ w, b \text{ fixed}\}$ separates the two sets if

    $w^t y + b > 0 \quad \forall y \in P$
    $w^t y + b < 0 \quad \forall y \in Q$, or equivalently $w^t(-y) + b(-1) > 0 \quad \forall y \in Q$.

Let $P = \{v'_i \mid v'_i \in R^{1 \times d},\ i = 1, \dots, N_1\}$, $Q = \{u'_i \mid u'_i \in R^{1 \times d},\ i = 1, \dots, N_2\}$, $n = N_1 + N_2$, $x = [w_1, w_2, \dots, w_d, b]^t$ and

    $A = \begin{bmatrix} v'_{1,1} & v'_{1,2} & \dots & v'_{1,d} & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ v'_{N_1,1} & v'_{N_1,2} & \dots & v'_{N_1,d} & 1 \\ -u'_{1,1} & -u'_{1,2} & \dots & -u'_{1,d} & -1 \\ \vdots & \vdots & & \vdots & \vdots \\ -u'_{N_2,1} & -u'_{N_2,2} & \dots & -u'_{N_2,d} & -1 \end{bmatrix}$

The linear separability test is equivalent to verifying that $B = \{x \mid Ax > 0\}$ is non-empty. Many algorithms have been proposed for the test; a comprehensive discussion can be found in (Elizondo, 2006) and Ch. 5 of (Duda et al., 2000).
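To make the construction concrete, the following Python sketch (our illustration, not code from the paper; numpy is assumed and build_A is a hypothetical helper name) assembles $A$ from two labelled point sets:

    import numpy as np

    def build_A(P, Q):
        # Rows [v, 1] for v in P and -[u, 1] for u in Q, so that
        # P and Q are linearly separable iff Ax > 0 has a solution.
        P, Q = np.atleast_2d(P), np.atleast_2d(Q)
        top = np.hstack([P, np.ones((len(P), 1))])
        bottom = -np.hstack([Q, np.ones((len(Q), 1))])
        return np.vstack([top, bottom])

    # XOR example from Section 2.1.1 below: linearly non-separable
    A = build_A([(0, 1), (1, 0)], [(0, 0), (1, 1)])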

1.1. Algorithms

A general discussion of Linear Programming, Quadratic Programming, Convex Hull and Perceptron algorithms for solving the linear separability problem is included in this subsection. The time complexities of the algorithms are discussed subsequently.

1.1.1. Linear Programming

Let $p = [1, 1, \dots, 1]^t \in R^n$ and $\tau \in R$. The linear programming formulation for solving the problem is

    $\min_{x, \tau} \ \tau$
    subject to $Ax + \tau p \ge 1$, $\tau \ge 0$.

$B \ne \emptyset$ if and only if the optimal $\tau = \tau^* = 0$.

The linear programming problem is solved either by Simplex or Interior Point methods. A feasible solution to the above problem is easy to determine. Simplex solves the optimization problem iteratively by moving on the boundary of the convex polyhedral region defined by the constraints. Interior Point algorithms can move through the interior of the convex region.
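For reference, a minimal sketch of this LP test using scipy.optimize.linprog (our illustration; the paper's experiments use MATLAB's linprog, and the tolerance 1e-9 is an assumption):

    import numpy as np
    from scipy.optimize import linprog

    def lp_separable(A):
        # Variables y = [x, tau]; minimize tau subject to Ax + tau*p >= 1,
        # tau >= 0. The sets are separable iff the optimum tau* is 0.
        n, m = A.shape
        c = np.zeros(m + 1)
        c[-1] = 1.0                                # objective: minimize tau
        A_ub = np.hstack([-A, -np.ones((n, 1))])   # -Ax - tau*p <= -1
        b_ub = -np.ones(n)
        bounds = [(None, None)] * m + [(0, None)]  # x free, tau >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        return res.fun <= 1e-9                     # tau* == 0  =>  B non-empty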
1.1.2. Quadratic Programming

Let $\xi \in R^{n \times 1}$ and let $x = [w_1, w_2, \dots, w_d, b]^t$ as defined previously. Let $C > 0$ be an arbitrarily large constant.

    $\min_{w, b} \ \frac{1}{2}\|w\|^2 + C\|\xi\|^2$
    subject to $Ax \ge 1 - \xi$, $\xi \ge 0$.

$B \ne \emptyset \iff \|\xi\|^2 = \|\xi^*\|^2 = 0$ in the optimal solution.

The feasible region determined by the constraints is convex and polyhedral. An incremental algorithm finds a separating hyperplane for pairs of points in $P$ and $Q$, and refines a common separating hyperplane between all pairs of points. If the problem is linearly separable, the common separator can be written as a linear combination of very few pairs of boundary points in $P$ and $Q$, called support vectors (Cristianini & Taylor, 2000). SVM finds a hyperplane with a large margin (the perpendicular distance of the points in $P$ and $Q$ closest to the hyperplane). Given that most real life data sets are linearly non-separable (Wasserman, 1989, Linear Separability: Ch. 2), SVM identifies a unique hyperplane with few misclassifications irrespective of the distributions from which the sets $P$ and $Q$ are drawn, provided the feature vectors in $P$ and $Q$ are drawn independently and identically from fixed distributions and $n$ is large (Cristianini & Taylor, 2000, Ch. 4). The number of iterations that SVM requires to find a separator for linearly separable problems is independent of the proximity of $P$ and $Q$ (Tsang et al., 2005).

1.1.3. Convex Hull Separation

The convex hull of a set of points is the smallest convex region that encloses the points of the set. Let $C_v(P)$ be the convex hull of $P$ and $C_v(Q)$ the convex hull of $Q$. $B$ is non-empty if and only if $C_v(P) \cap C_v(Q) = \emptyset$. There are many algorithms to compute the convex hull of a finite set of points. Quick Hull is a fast incremental algorithm for finding the convex hull (Barber et al., 1996). Given an initial convex hull (a simplex of $d+1$ points in $R^d$), each unprocessed point is either inside or outside the current convex hull. If the point is within the convex hull, it is discarded; otherwise a convex hull is constructed with the new point as one of the vertices.

1.1.4. Perceptron

The Perceptron is an iterative procedure to find the separating hyperplane between the sets $P$ and $Q$. The algorithm terminates only if the two sets are separable. It has been proved that the algorithm converges in a finite number of iterations when there is a separating hyperplane (Duda et al., 2000). In every iteration, $x$ is changed to increase the sum of the non-positive co-ordinates of the vector $Ax$ (misclassified examples in $P$ and $Q$). Let $A_i \in R^{1 \times (d+1)}$, $i = 1, \dots, n$ be the rows of matrix $A$. Let $\eta > 0$ be a fixed small value. Algorithm 1 is the Batch Perceptron algorithm for verifying $B \ne \emptyset$.

Algorithm 1  $B = \{x \mid Ax > 0,\ A \in R^{n \times (d+1)},\ x \in R^{d+1}\}$, test if $B \ne \emptyset$
    Input: $n \times (d+1)$ matrix $A$
    $L \leftarrow 0$  // $L = 1$ if $B$ is non-empty
    $x \leftarrow$ random vector
    $a(x) \leftarrow Ax$
    if $a(x) > 0$ then
        $L \leftarrow 1$
    end if
    while $L == 0$ do
        $x \leftarrow x + \eta \sum_{a_i(x) \le 0} A_i^t$
        $a(x) \leftarrow Ax$
        if $a(x) > 0$ then
            $L \leftarrow 1$
        end if
    end while
    return $L$
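A direct Python transcription of Algorithm 1 might look as follows (a sketch; the iteration cap is our addition, since the paper's loop does not terminate on non-separable data):

    import numpy as np

    def batch_perceptron(A, eta=0.1, max_iter=100000):
        # Returns (1, x) with Ax > 0 if found; (0, x) if undecided
        # within max_iter iterations.
        x = np.random.randn(A.shape[1])
        for _ in range(max_iter):
            a = A @ x
            if np.all(a > 0):
                return 1, x
            x = x + eta * A[a <= 0].sum(axis=0)  # add misclassified rows
        return 0, x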

1.2. Complexity

A few algorithms converge in a finite number of iterations if and only if $B \ne \emptyset$. The Perceptron algorithm and the Ho-Kashyap procedure belong to this category. Upper bounds on the number of iterations for convergence of these algorithms on linearly separable problems can be found in (Duda et al., 2000) and (Elizondo, 2006). The bounds depend on the proximity of the two sets $P, Q$ and the number of points in the two sets. The Perceptron algorithm has time complexity $O(n^2)$ in the number of points. Examples of algorithms which converge in a finite number of iterations for both linearly separable and non-separable problems are Linear Programming algorithms, the Convex Hull procedure and Quadratic Programming algorithms. The Convex Hull procedure provides a direct solution to the minimum number of misclassifications of any hyperplane, subsequent to identifying the intersection region of the convex hulls of the two sets $P$ and $Q$. The worst case time complexity of Linear Programming algorithms (LP solution by Simplex) and the Convex Hull procedure is exponential in the dimension (Elizondo, 2006), though on most problems Simplex has a low computational complexity of $O(n)$. Interior point algorithms for solving LP have worst case complexity $O(n^3 L)$, where $L$ is a measure of the machine precision (Gonzaga, 1988). The worst case computational complexity of Quadratic Programming algorithms is $O(n^3)$. The algorithm presented in this paper (Subspace Projection, SP) is a method to verify whether a strictly positive vector exists in a subspace. It avoids computing a hyperplane which misclassifies the least number of points.

2. Iterative Reduction of Subspaces

In this section, a verification scheme for the existence of $x \in R^{d+1}$ such that $Ax > 0$ ($A$ is an $n \times (d+1)$ matrix) is described. Consider an orthonormal basis of the range of $A$. Let $U = \{u_1, u_2, \dots, u_r \in R^n\}$ be the basis, and let the subspace spanned by $U$ be $W_r$. Let a positive point $p$ in $R^n$ be projected on the subspace $W_r$, and let the projected point be $z(p) = (z_1(p), z_2(p), \dots, z_n(p))^t \in W_r$. In brief, the following results are proved in this section. If $z(p)$ is a strictly positive point, then $x$ exists. If $z(p) \ge 0$ and $I_z = \{i \mid z_i(p) = 0\}$ is non-empty, a strictly positive point exists in $W_r$ if and only if a vector $v$ with $v_i > 0\ \forall i \in I_{z(p)}$ exists in $W_r$. If $p > 0$ has equal co-ordinates and $z(p) = 0$, then a strictly positive vector does not exist in $W_r$. If $z_i(p) < 0$ for some $i$, an $r-1$ dimensional subspace $W_{r-1}$ of $W_r$, not including $z(p)$, is obtained by intersecting $W_r$ with a unique $n-1$ dimensional subspace of $R^n$: $W_{r-1}$ is the intersection of $W_r$ and $H_{n-1} = \{v \in R^n \mid v^t e_j = 0\}$, $j = \arg\min_{i,\, z_i(p) < 0} \frac{p_i}{p_i - z_i(p)}$. A strictly positive vector exists in $W_r$ if and only if a positive vector exists in $W_{r-1}$. A simple method to compute an upper bound on the number of misclassifications is proved in the last theorem.

The algorithm performs the test by projecting points with positive co-ordinates on $W_r$. The projected point $z(p)$ either has strictly positive co-ordinates, or some negative co-ordinates, or some zero co-ordinates. If $z(p)$ has strictly positive co-ordinates, then the sets $P$ and $Q$ are linearly separable. If $z(p) \ge 0$, the test is performed on linear combinations of the rows of matrix $A$ with indices $I_z$. If $z(p)$ has some negative co-ordinates, the test is performed on a lower dimensional subspace $W_{r-1} \subset W_r$ which excludes $z(p)$. At each step the dimensionality of the subspace reduces, and a test in a one dimensional subspace is needed in the final step. It is easy to verify whether a one dimensional subspace is a scaling of a strictly positive point. Hence the maximum number of steps needed for the completion of the test is equal to the dimension of the range of $A$.

If a vector with strictly positive co-ordinates $h > 0$ exists in the range of the matrix $A$, then a separating hyperplane exists. The following theorem proves the same result when $h = z(p) > 0$.

Theorem 2.1  If $z(p) > 0$, then $\exists x$ such that $Ax > 0$.

Proof  $z(p) \in W_r$. $W_r$ is the range of $A$. Hence $\exists x$ such that $Ax = z(p) > 0$.

If $z(p)$ is not a strictly positive vector, either $z(p) \ge 0$ or $z_i(p) < 0$ for some $i$. Let $z(p) \ge 0$ and let $I_z$ be the co-ordinate indices corresponding to $z_i(p) = 0$. The following two theorems prove that $h > 0$ exists in the range of $A$ if and only if a vector with positive values in the co-ordinate indices $I_z$ exists in the range of $A$. Consequently, to verify whether $h > 0$ exists in $W_r$, it is sufficient to test whether $h' > 0$ exists in a subspace of $R^{|I_z|}$. The subspace $W(I_z)$ is the set of linear combinations of $v_j = (A_{I_z(1),j}, A_{I_z(2),j}, \dots, A_{I_z(|I_z|),j})$, $j = 1, \dots, d+1$, where $A_{i,j}$ is the element in the $i$-th row and $j$-th column of $A$.

Theorem 2.2  If a strictly positive vector exists in $W_r$ and $z(p) \ge 0$, a vector with positive values in the co-ordinates $I_{z(p)}$ exists in $W_r$.

Proof  Let $h > 0 \in W_r$. $h$ has positive values in the co-ordinates $I_{z(p)}$.

Theorem 2.3  If a vector with positive values in the co-ordinates $I_{z(p)}$ exists in $W_r$ and if $z(p) \ge 0$, then a strictly positive vector exists in $W_r$.

Proof  Let $v \in W_r$ with $v_i > 0\ \forall i \in I_{z(p)}$. Then $\alpha z(p) + v > 0$ for $\alpha > \max_{i \notin I_{z(p)}} \frac{-v_i}{z_i(p)}$.
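The case analysis so far (Theorems 2.1-2.3, together with Theorem 2.4 below) can be summarized in a short sketch (ours, not the authors' code; the tolerance tol is an assumed numerical safeguard not discussed in the paper):

    import numpy as np

    def projection_case(A, tol=1e-10):
        # Project the all-ones vector p onto the range of A and report
        # which case applies at this step of the test.
        n = A.shape[0]
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        U = U[:, s > tol * s.max()]      # orthonormal basis of W_r = range(A)
        z = U @ (U.T @ np.ones(n))       # z(p): projection of p on W_r
        if np.all(z > tol):
            return "separable"           # Theorem 2.1: Ax = z(p) > 0
        if np.all(np.abs(z) <= tol):
            return "not separable"       # Theorem 2.4: z(p) = 0
        if np.all(z >= -tol):
            return "recurse on I_z"      # Theorems 2.2/2.3
        return "reduce to W_{r-1}"       # Theorems 2.5/2.6 below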

If $|I_z| = n$, the subspace $W(I_z)$ is the same as $W_r$. The following theorem proves that $h > 0$ does not exist in $W_r$ if $|I_z| = n$ (i.e. $z(p) = 0$), provided the positive vector $p$ has equal co-ordinates.

Theorem 2.4  If $p$ is the unit vector with all components equal, and $z(p) = 0$, then $W_r$ does not contain a strictly positive vector.

Proof  Every vector $x \in R^n$ of the form $0 \le x_i \le \frac{1}{\sqrt{n}}\ \forall i$ is at a distance less than or equal to 1 from $p$. Every point of $W_r - \{0\}$ is at a distance greater than 1 from $p$, because 0 is the unique closest point in $W_r$ to $p$, and 0 is at distance 1 from $p$. Hence, other than 0, no vector of $W_r$ is in the set $\{x = (x_1, x_2, \dots, x_n) \in R^n \mid 0 \le x_i \le \frac{1}{\sqrt{n}},\ i \in 1, \dots, n\}$. Hence $h = \alpha x$, $\alpha > 0$ does not exist in $W_r$, and so $h > 0$ does not exist in $W_r$.

It is easy to see that if $z(p) = 0$, then $z(\alpha p) = \alpha z(p) = 0$, $\alpha \in R$.

Let $p = [1, 1, \dots, 1]^t$ be projected on $W_r$, let the projection be $z(p)$, and let $z_i(p) < 0$ for some $i$. The following two theorems prove that a vector $h > 0$ exists in $W_r$ if and only if a vector $h' > 0$ exists in a subspace $W_{r-1}$ of lesser dimension. $W_{r-1}$ does not include $z(p)$, and one of the co-ordinates of the vectors $v$ in $W_{r-1}$ is zero ($z_i(p) < 0$ and $v_i = 0$). If more than one co-ordinate of $z(p)$ is less than zero, the co-ordinate with index $i = \arg\min_i z_i(p)$ is set to zero. Hence the linear separability test can be performed in a lower dimensional subspace.

Theorem 2.5  If a strictly positive vector exists in $W_r$ and $z_i(p) < 0$ for some $i$, there exists a positive vector in $W_{r-1}$.

Proof  Let $h > 0 \in W_r$. Consider a non-negative point $l$ closest to $z(p)$ on the line joining $z(p)$ and $p$ (Figure 1). $l_j = 0$ for some $j$ such that $z_j(p) < 0$, and $z(l) = z(p)$. Let $z(l) - l = \sum_i \omega_i w_i^{\perp}$ (the $w_i^{\perp}$ are orthonormal to $W_r$ and belong to the complementary subspace). Let a point $q$ be projected on $W_r$, let the projected point be $z(q)$, and let $z(q) - q = \sum_i \lambda_i w_i^{\perp}$. If $(q - l)^t (z(l) - l) \le 0$, then $\sum_i \omega_i^2 \le \sum_i \lambda_i \omega_i \le \sqrt{\sum_i \lambda_i^2}\,\sqrt{\sum_i \omega_i^2}$, and hence $\sum_i \omega_i^2 \le \sum_i \lambda_i^2$: the least distance of $q$ from $W_r$ is greater than or equal to the least distance of $l$ from $W_r$.

Let $S_l(z) = \{i \mid z_i(l) < 0,\ l_i = 0\}$ and $q = l + \sum_{i \in S_l(z)} \delta_i e_i$, $\delta_i > 0$. Then $(q - l)^t (z(l) - l) = \sum_{i \in S_l(z)} \delta_i z_i(l) < 0$. Hence $q$ is at a distance greater than that of $l$ from $W_r$. If every positive point in the set $T = \{v \mid v \in R^n,\ v \ne 0,\ v_j = 0\ \forall j \in S_l(z)\}$ has a projection $z(v) \in W_r$ such that $z_j(v) < 0$, $j \in S_l(z)$, then every point $q > 0$ is at a distance greater than 0 from the subspace $W_r$. Contradiction ($h > 0 \in W_r$ exists; geometrically, all points of $T$ cannot be above $W_r$).

Hence a point $t \in T$ exists either on or below the subspace $W_r$ ($z_j(t) > 0$ for some $j \in S_l(z)$). The signed distance of points of $T$ from $W_r$ is a continuous function. Hence the distance between $z(v)$ and $v \in T$ takes all values between $d(t, z(t))$ and $d(l, z(l))$. Therefore a point $t' \in T$ exists at a distance 0 from $W_r$. $T \subseteq H_{n-1} = \{v \in R^n \mid v^t e_j = 0\}$, $j \in S_l(z)$, $j = \arg\min_{i,\, z_i(p) < 0} \frac{p_i}{p_i - z_i(p)}$. Let $W_{r-1}$ be the intersection of $H_{n-1}$ and $W_r$. It is easy to prove that this intersection is at most $r-1$ dimensional (vectors of $W_{r-1} = W_r \cap H_{n-1}$ are linearly independent of $z(p)$, as $z_j(p) < 0$ and $v_j = 0\ \forall v \in W_{r-1}$). Hence, if $h > 0$ exists in $W_r$, a point $t' \ge 0$ exists in $W_{r-1}$.

The existence of a positive point $t' \ge 0$ can be verified by projecting a positive point $p' \ge 0 \in H_{n-1}$ on $W_{r-1}$. Hence the test is recursive. The recursion has a maximum depth of $r$, because testing whether a one dimensional subspace has a strictly positive point is trivial. Let $R^n = H_n$. Then $H_n, H_{n-1}, \dots, H_{n-r+k}, \dots, H_{n-r+1}$ and $W_r, W_{r-1}, \dots, W_k, \dots, W_1$ form a decreasing sequence of subspaces.

[Figure 1. Projection of $p > 0$ on the subspace $W_r$: the axes $e_1, e_2, e_3$, the points $l \ge 0$, $t' \ge 0 \in H_{n-1}$ and $h > 0$, and the projections $z(p)$ (with $z_3(p) < 0$) and $t' \in W_{r-1} \subset W_r$ are shown.]

Theorem 2.6  If $z_i(p) < 0$ for some $i$ and $v \ge 0$ exists in $W_r$ such that $v_i > 0\ \forall i \in G_{z(p)} = \{i \mid z_i(p) \le 0\}$, then $W_r$ contains a strictly positive vector.

Proof  $h = z(p) + \alpha v > 0$ exists in $W_r$, for $\alpha > \max_{i \in G_{z(p)}} \frac{-z_i(p)}{v_i}$.
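The reduction step of Theorem 2.5 amounts to combining each basis vector with $z(p)$ so that the chosen co-ordinate $j$ vanishes. A sketch under the notation above (our illustration and helper name, not the paper's code):

    import numpy as np

    def reduce_subspace(U, z, p):
        # U: columns form an orthonormal basis of W_r; z = z(p) has some
        # negative co-ordinate. Returns spanning vectors of
        # W_{r-1} = W_r intersected with H_{n-1}: column k becomes
        # u_k - (u_{j,k}/z_j) z, whose j-th co-ordinate is zero and
        # which stays inside W_r.
        neg = np.where(z < 0)[0]
        j = neg[np.argmin(p[neg] / (p[neg] - z[neg]))]
        return U - np.outer(z, U[j] / z[j])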

Let the standard ordered basis of $R^n$ be $B_n = \{e_1, e_2, \dots, e_n\}$. Let the basis vectors of $H_{n-r+1}$ be $B_1 = \{e_{i_1}, e_{i_2}, \dots, e_{i_k}\}$ and let $p_1 = e_{i_1} + e_{i_2} + \dots + e_{i_k} \in H_{n-r+1}$. Let $z(p_1)$ be the projection of $p_1$ on $W_1$, and let $z(p_k)$ be the projection of $p_k \in H_{n-r+k}$ on $W_k$. The following theorem proves that the number of non-positive co-ordinates of some vector $v \in W_r$ is equal to the number of non-positive co-ordinates of $z(p_1)$ (excluding the co-ordinates of $B_n - B_k$), assuming $z_{i_1}(p_r) < 0, z_{i_2}(p_{r-1}) < 0, \dots, z_{i_r}(p_1) < 0$ for some indices $i_1, i_2, \dots, i_r$. Equivalently, there exists a vector $v$ in $W_r$ such that the number of strictly positive co-ordinates of $v$ is equal to the number of strictly positive co-ordinates of $z(p_1)$ plus $n - k$. The minimum number of misclassifications of the points in $P$ and $Q$ is bounded by the number of non-positive co-ordinates of a vector $v \in W_r$.

Theorem 2.7  The minimum number of misclassifications of any hyperplane $M$ is bounded by the cardinality of $N_{z(p_1)}(W_1) = \{i \mid z_i(p_1) \le 0,\ z(p_1) \in W_1,\ p_1 > 0,\ p_1 \in H_{n-r+1},\ p_1 = [1, 1, \dots, 0, 1, \dots, 1]^t\}$, for a linearly non-separable problem.

Proof  Let $p_k$ be a vector with equal co-ordinates in the standard ordered basis of $H_{n-r+k}$, and the rest of the co-ordinates set to 0. Let $z(p_k)$ be the projection of $p_k$ on $W_k$, $k \le r$. Let $r = 1$. $\exists x$ such that $Ax = z(p_1)$; hence $M \le |N_{z(p_1)}(W_1)|$. Let $r > 1$ and $v_2 = \alpha z(p_1) - z(p_2)$, $\alpha > 1 + \max_{z_i(p_1) > 0} \frac{z_i(p_2)}{z_i(p_1)}$. Hence $v_i > 0\ \forall i \in \{j \mid z_j(p_1) > 0 \vee (z_i(p_1) = 0,\ z_i(p_2) < 0)\}$. $v_2 \in W_2$ contains $|N_{z(p_1)}(W_1)| + |\{i \mid p_{2,i} = 0\}|$ co-ordinates less than or equal to zero. Similarly, let $v_3 = \alpha' v_2 - z(p_3)$, $\alpha' > \max_{v_i > 0} \frac{z_i(p_3)}{v_i}$; the number of co-ordinates less than or equal to 0 is $|N_{z(p_1)}(W_1)| + |\{i \mid p_{3,i} = 0\}|$. By induction, $v_k = \alpha v_{k-1} - z(p_k)$ (with $\alpha$ chosen as above) has $|N_{z(p_1)}(W_1)| + |\{i \mid p_{k,i} = 0\}|$ co-ordinates less than or equal to zero. $|\{i \mid p_{r,i} = 0\}| = 0$ ($p_r > 0$ is a vector with all co-ordinates equal in $R^n$). Hence $v_r$ has $|N_{z(p_1)}(W_1)|$ co-ordinates less than or equal to zero.

Further, it follows that the minimum number of misclassifications $U_B$ is bounded by the minimum of $|N_{z(p_1)}(W_1)|$ and $|N_{-z(p_1)}(W_1)|$.

2.1. Examples

2.1.1. XOR Problem

Let $P = \{(0, 1), (1, 0)\}$, $Q = \{(0, 0), (1, 1)\}$. Then

    $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \\ -1 & -1 & -1 \end{bmatrix}$

An orthonormal basis of the range of $A$ is

    $U = \begin{bmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{2} \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{2} \\ 0 & -\frac{1}{\sqrt{2}} & -\frac{1}{2} \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{2} \end{bmatrix}$

The projection of $p = \frac{1}{2}(1, 1, 1, 1)^t$ on the range of $A$ is $U(U^t p) = (0, 0, 0, 0)^t$. Hence no strictly positive point exists in $W_r$, and $P$ and $Q$ are linearly non-separable.

2.1.2. AND Problem

Let $P = \{(1, 1)\}$, $Q = \{(0, 0), (0, 1), (1, 0)\}$. Then

    $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & -1 \\ 0 & -1 & -1 \\ -1 & 0 & -1 \end{bmatrix}$

The projection of $\frac{1}{2}(1, 1, 1, 1)^t$ on the range of $A$ is $(0.25, 0.75, 0.25, 0.25)^t$. A strictly positive point exists in $W_r$. Hence $P$ and $Q$ are linearly separable.

2.2. Algorithm

In Algorithm 2, an iterative procedure to verify whether a strictly positive point exists in the subspace $W_r$ (the range of $A$) is presented. It repeatedly projects a point $p \ge 0$ on progressively smaller subspaces. In the algorithm, the projection of $p$ on $W_r$ is performed by first orthonormalizing the vectors which span $W_r$. This can be replaced by a function that solves the least squares problem $\arg\min_x \|Ax - p\|^2$. For simplicity of presentation, the recursion is described as an iterative procedure.

Algorithm 2  $B = \{x \mid Ax > 0,\ A \in R^{n \times (d+1)},\ x \in R^{d+1}\}$, test if $B \ne \emptyset$
    Input: $n \times (d+1)$ matrix $A$
    $L \leftarrow 0$  // $L = 1$ if $B$ is non-empty
    $A \leftarrow$ orthonormal basis of the range of $A$
    $r \leftarrow \mathrm{rank}(A)$
    $N \leftarrow r$
    $z \leftarrow 0 \in R^n$
    while ($N \ne 0$) and ($L == 0$) and ($A \ne \emptyset$) do
        $p \leftarrow [1, 1, \dots, 1]^t \in R^n$
        $z \leftarrow A(A^t p)$
        if $z > 0$ then
            $L \leftarrow 1$  // by Theorem 2.1
        else if $z == 0$ then
            $N \leftarrow 1$  // by Theorem 2.4
        else if $z \ge 0$ then
            $I_z \leftarrow \{i \mid z_i = 0\}$  // by Theorems 2.2 and 2.3
            $i' \leftarrow 0$
            for $i = 1$ to $n$ do
                if $i \in I_z$ then
                    $i' \leftarrow i' + 1$
                    $A_{i',k} \leftarrow A_{i,k}\ \forall k = 1, \dots, r$
                end if
            end for
            $n \leftarrow |I_z|$
        else
            $j \leftarrow \arg\min_{i,\, z_i(p) < 0} \frac{p_i}{p_i - z_i(p)}$  // by Theorems 2.5 and 2.6
            for $k = 1$ to $r$ do
                $\alpha \leftarrow A_{j,k} / z_j(p)$
                $A_{i,k} \leftarrow A_{i,k} - \alpha z_i(p)\ \forall i \in 1, \dots, n$
            end for
            $E_z \leftarrow \{i \mid z_i(p) < 0,\ A_{i,k} = 0\ \forall k = 1, \dots, r\}$
            $i' \leftarrow 0$
            for $i = 1$ to $n$ do
                if $i \notin E_z$ then
                    $i' \leftarrow i' + 1$
                    $A_{i',k} \leftarrow A_{i,k}\ \forall k = 1, \dots, r$
                end if
            end for
            $n \leftarrow n - |E_z|$
        end if
        $A \leftarrow$ orthonormal basis of the range of $A$
        $r \leftarrow \mathrm{rank}(A)$
        $N \leftarrow N - 1$
    end while
    $U_B \leftarrow \min(|\{i \mid z_i \le 0,\ i = 1, \dots, n\}|,\ |\{i \mid z_i \ge 0,\ i = 1, \dots, n\}|)$  // upper bound on the minimum number of misclassifications by any hyperplane, by Theorem 2.7
    return $L$, $U_B$
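A compact numpy rendering of Algorithm 2 is sketched below (our reading of the pseudocode, with assumed numerical tolerances; it is not the authors' C++/MATLAB implementation). On the XOR matrix of Section 2.1.1 it returns $L = 0$, and on the AND matrix it returns $L = 1$:

    import numpy as np

    def orth_range(M, tol):
        # Orthonormal basis of the column space of M (empty-safe).
        U, s, _ = np.linalg.svd(M, full_matrices=False)
        if s.size == 0 or s.max() <= tol:
            return U[:, :0]
        return U[:, s > tol * s.max()]

    def sp_test(A, tol=1e-10):
        # Returns (L, UB): L = 1 iff {x : Ax > 0} is non-empty,
        # UB = upper bound on the minimum misclassifications (Thm 2.7).
        U = orth_range(np.asarray(A, float), tol)
        z = np.zeros(U.shape[0])
        L, N = 0, U.shape[1]                 # N starts at r = rank(A)
        while N != 0 and L == 0 and U.shape[0] > 0:
            p = np.ones(U.shape[0])
            z = U @ (U.T @ p)                # z(p): projection of p
            if np.all(z > tol):
                L = 1                        # Theorem 2.1
            elif np.all(np.abs(z) <= tol):
                N = 1                        # Theorem 2.4: stop
            elif np.all(z >= -tol):
                U = U[np.abs(z) <= tol]      # keep rows I_z (Thms 2.2/2.3)
            else:
                neg = np.where(z < -tol)[0]
                j = neg[np.argmin(p[neg] / (p[neg] - z[neg]))]
                U = U - np.outer(z, U[j] / z[j])   # Thms 2.5/2.6
                dead = (z < -tol) & (np.abs(U).max(axis=1) <= tol)
                U = U[~dead]                 # drop rows E_z
            if L == 0 and U.shape[0] > 0:
                U = orth_range(U, tol)       # re-orthonormalize
            N -= 1
        UB = min(int((z <= tol).sum()), int((-z <= tol).sum()))
        return L, UB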

3. Experiments

The Subspace Projection algorithm, the Linear Programming procedure, the Perceptron algorithm and Support Vector Machines (SVMlight (Joachims, 1999)) give identical results on a representative set of benchmark datasets from the UCI Machine Learning Repository (Newman et al., 1998). The SVM algorithm is trained with a large value of $C$ ($C = 10^{12}$). Experiments were performed on a 3.2GHz Pentium 4 machine with 512MB RAM. In Table 1, the execution times of SVMlight (implemented in C++) and Subspace Projection (implemented in C++) are compared (processing time includes file read). The execution times of the linear programming procedure (LP, implemented as the optimized MATLAB function linprog), the Perceptron algorithm (PR, implemented in MATLAB) and Subspace Projection (SP, implemented in MATLAB) are compared in Table 2 (processing time excludes file read). The last column describes whether the data sets are linearly separable (LS) or not. The Perceptron algorithm does not converge on linearly non-separable problems; hence its execution time on linearly non-separable problems is not included in Table 2.

3.1. Description of Datasets

In the Iris1 dataset, Iris-setosa examples are considered to be the set $P$; Iris-versicolor and Iris-virginica examples are considered to be the set $Q$. In the Iris2 dataset, Iris-virginica examples are considered to be the set $P$; Iris-setosa and Iris-versicolor examples are considered to be the set $Q$. The Breast Cancer dataset has attributes with values in a specified range (e.g. 0-4). Such attributes are split into two derived attributes (e.g. 0 and 4). Nominal attributes are converted into discrete values. Examples with missing values are removed from the dataset. In the Glass1 dataset, class 1 (building windows, float processed) feature vectors are considered to be the set $P$ and the rest of the 6 classes are considered to be the set $Q$. Similarly, in the Glass2 dataset, class 2 (building windows, non-float processed) feature vectors are considered to be the set $P$ and the rest of the 6 classes are considered to be the set $Q$. A small sample of the Spam dataset (Spam1) is used for testing the algorithm, because small samples of the spam dataset are linearly separable. The Ionosphere dataset (Ion) is a binary classification problem of radar signals. The column containing the indices of the training samples is ignored in all the datasets. $E_S$ is the number of misclassified points of the hyperplane found by SVM; $U_B$ is the upper bound on the misclassifications as computed by our algorithm. Table 3 is a comparison of $E_S$ and $U_B$. The test results suggest that Subspace Projection (SP) has much lower time complexity than LP, PR and SVM. The time complexity of SP is less than $O(nr^3)$ due to compiler optimizations. On the spam dataset, the Perceptron converges fast only if the initial value of the vector $x$ is chosen close to an optimal vector $x^*$ satisfying $Ax^* > 0$.

Table 1. Computation time of SVM and SP in seconds.

    Data    n    d   r   SVM    SP    LS
    Iris1   150  4   5   0.03   0.03  YES
    Iris2   150  4   5   17.1   0.05  NO
    Ion     351  33  34  22.8   0.14  NO
    Breast  277  12  10  16.7   0.05  NO
    Glass1  214  9   10  46.8   0.03  NO
    Glass2  214  9   10  46.8   0.03  NO
    Spam1   359  57  58  307    0.33  YES

3.2. Scalability

The algorithm checks linear separability in linear time, if the dimension is fixed.

A randomly labeled data set with a large number of points is used to verify this result. It is a collection of linearly non-separable data sets with points in 153 dimensions. The number of points $n$ is varied between 500 and 1500 in steps of 100. The plot of the CPU time in seconds of our algorithm for various values of $n$ is presented in Figure 2. From the plot it appears that the number of computations increases almost linearly with the number of points.

Table 2. Computation time of LP, PR and SP in seconds (—: PR does not converge).

    Data    n    d   r   LP     PR    SP    LS
    Iris1   150  4   5   0.26   .002  .003  YES
    Iris2   150  4   5   0.46   —     .008  NO
    Ion     351  33  34  3.95   —     0.45  NO
    Breast  277  12  10  10.89  —     0.02  NO
    Glass1  214  9   10  4.33   —     0.02  NO
    Glass2  214  9   10  5.2    —     0.02  NO
    Spam1   359  57  58  3.97   14.1  1.20  YES

[Figure 2. Computational complexity of SP on a randomly labeled dataset: CPU time in seconds versus the number of points $n$.]

Table 3. Number of misclassifications by SVM ($E_S$) and Subspace Projection ($U_B$).

    Data        n    d   E_S  U_B
    Iris1       150  4   0    0
    Iris2       150  4   2    4
    Ionosphere  351  33  102  126
    Breast      277  12  174  133
    Glass1      214  9   98   92
    Glass2      214  9   98   105
    Spam1       359  57  0    0

3.3. Conclusions

SP is a simple non-parametric algorithm. Experimental results and complexity estimates suggest SP is faster than Linear Programming, the Perceptron and SVM. It has a lower worst case time complexity than the Convex Hull algorithm. On linearly non-separable problems, SVM takes considerably more time than our algorithm to find a hyperplane with low error. Hence SP can be used to quickly identify an initial hyperplane with few misclassifications; subsequently the hyperplane with the least misclassifications can be obtained by training an SVM, hence reducing the training time of the SVM. On linearly separable problems, SP finds a separating hyperplane very fast compared to SVM if the points in the two classes are close. Identifying the maximum margin hyperplane given a separating hyperplane is geometrically not a difficult problem. Hence SP can be used as a method for seeding a primal space SVM algorithm. However, our algorithm is not incremental. SP can be made incremental if incremental projection can be achieved. Another interesting problem which can be explored is a procedure to find a point in the range of $A$ with few negative co-ordinates, by taking projections of the convex region $v > 0$, $v \in R^n$ on the range of $A$.

3.3.1. Convex Hulls

SP can be used to test whether a separating hyperplane exists between a finite set of points $P$ and a single point set $Q = \{v \in R^d\}$. Hence SP can be used to test whether a point $v$ is inside or outside the convex hull of the set $P$ in linear time. Consequently, an incremental procedure to identify the vertices of the convex hull, without constructing the faces of the convex hull, can be developed from our algorithm.
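A sketch of this membership test, reusing the build_A and sp_test helpers from the earlier sketches (our names, not the paper's; note that a point on the boundary of the hull is reported as inside, since no strict separator exists for it):

    import numpy as np

    def outside_hull(P, v):
        # v lies strictly outside the convex hull of P iff P and the
        # singleton {v} are linearly separable (Section 3.3.1).
        A = build_A(np.asarray(P), np.asarray(v)[None, :])
        L, _ = sp_test(A)
        return bool(L)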

Acknowledgments

We want to thank the reviewers for their valuable comments that significantly improved the presentation.

References

Barber, C. B., Dobkin, D. P., & Huhdanpaa, H. (1996). The quickhull algorithm for convex hulls. ACM Transactions on Mathematical Software, 22, 469-483.

Cristianini, N., & Taylor, J. S. (2000). An introduction to support vector machines. Cambridge: Cambridge University Press.

Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern classification. New York: Wiley-Interscience.

Elizondo, D. (2006). The linear separability problem: Some testing methods. IEEE Transactions on Neural Networks, 17, 330-344.

Gonzaga, C. C. (1988). An algorithm for solving linear programming problems in $O(n^3 L)$ operations. In N. Megiddo (Ed.), Progress in mathematical programming, 1-28. Springer-Verlag.

Joachims, T. (1999). Making large-scale SVM learning practical. In B. Schölkopf, C. Burges and A. Smola (Eds.), Advances in kernel methods - support vector learning. MIT Press.

Newman, D., Hettich, S., Blake, C., & Merz, C. (1998). UCI repository of machine learning databases.

Tsang, I. W., Kwok, J. T., & Cheung, P.-M. (2005). Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6, 363-392.

Wasserman, P. (1989). Neural computing theory and practice. Van Nostrand Reinhold.