United States Patent [19]          [11] Patent Number: 5,640,492
Cortes et al.                      [45] Date of Patent: Jun. 17, 1997

[54] SOFT MARGIN CLASSIFIER

[75] Inventors: Corinna Cortes, New York, N.Y.; Vladimir Vapnik, Middletown, N.J.

[73] Assignee: Lucent Technologies Inc., Murray Hill, N.J.

[21] Appl. No.: 268,361

[22] Filed: Jun. 30, 1994

[51] Int. Cl.: G06E 1/00; G06E 3/00

[52] U.S. Cl.: 395/23

[58] Field of Search: 395/23

[56] References Cited

U.S. PATENT DOCUMENTS

4,120,049  10/1978  Thaler et al.      365/230
4,122,443  10/1978  Thaler et al.      340/146.3
5,040,214   8/1991  Grossberg et al.   381/43
5,214,716   5/1993  Refregier et al.   382/42
5,239,594   8/1993  Yoda               382/15
5,239,619   8/1993  Takatori et al.    395/23
5,245,696   9/1993  Stork et al.       395/13
5,263,124  11/1993  Weaver et al.      382/37
5,271,090  12/1993  Boser              395/21
5,333,209   7/1994  Sinden et al.      382/13

OTHER PUBLICATIONS

B.E. Boser, I. Guyon, and V.N. Vapnik, "A Training Algorithm for Optimal Margin Classifiers", Proceedings of the 4th Workshop on Computational Learning Theory, vol. 4, San Mateo, CA: Morgan Kaufman, 1992.
Y. Le Cun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel, "Handwritten Digit Recognition with a Back-Propagation Network" (D. Touretzky, Ed.), Advances in Neural Information Processing Systems, vol. 2, Morgan Kaufman, 1990.
A.N. Refenes et al., "Stock Ranking: Neural Networks Vs. Multiple Linear Regression", IEEE International Conf. on Neural Networks, vol. 3, San Francisco, CA, IEEE, pp. 1419-1426, 1993.
M. Rebollo et al., "A Mixed Integer Programming Approach to Multi-Spectral Image Classification", Pattern Recognition, vol. 9, No. 1, Jan. 1977, pp. 47-51, 54-55, and 57.
V. Uebele et al., "Extracting Fuzzy Rules From Pattern Classification Neural Networks", Proceedings of the 1993 International Conference on Systems, Man and Cybernetics, Le Touquet, France, 17-20 Oct. 1993, vol. 2, pp. 578-583.
C. Cortes et al., "Support-Vector Networks", Machine Learning, vol. 20, No. 3, Sep. 1995, pp. 273-297.
Bottou et al., "Comparison of Classifier Methods: A Case Study in Handwritten Digit Recognition", Proc. of the 12th IAPR Conf., Oct. 9-13, 1994, pp. 77-82.
B. Schulmeister, "The Piecewise Linear Classifier DIPOL92", ECML-94, pp. 411-414.
R. Courant and D. Hilbert, Methods of Mathematical Physics, Interscience: New York, pp. 122-141, 1953.
D.G. Luenberger, Linear and Non-Linear Programming, Addison-Wesley: Reading, MA, pp. 326-330, 1984.
V.N. Vapnik, Estimation of Dependences Based on Empirical Data, Springer-Verlag: New York, pp. 355-367, 1982.
M. Aizerman, E. Braverman, and L. Rozonoer, "Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning", Automation and Remote Control, vol. 25, pp. 821-837, Jun. 1964.

Primary Examiner: Robert W. Downs
Assistant Examiner: A. Katbab

[57] ABSTRACT

A soft margin classifier and method are disclosed for processing input data of a training set into classes separated by soft margins adjacent to optimal hyperplanes. Slack variables are provided, allowing erroneous or difficult data in the training set to be taken into account in determining the optimal hyperplane. Inseparable data in the training set are separated without removal of the data obstructing separation, by determining the optimal hyperplane having a minimal number of erroneous classifications of the obstructing data. The parameters of the optimal hyperplane generated from the training set determine decision functions, or separators, for classifying empirical data.

31 Claims, 7 Drawing Sheets


FIG. 3 (Sheet 3 of 7): Classification procedure

 60  START CLASSIFICATION PROCEDURE
 65  RECEIVE DATA VECTORS OF A TRAINING SET FROM A TRAINING PATTERN DATA SOURCE
 70  PARSE RECEIVED DATA VECTORS TO OBTAIN CLASS LABELS
 75  RECEIVE AN INPUT SEPARATOR VALUE REPRESENTING A CLASS FOR CLASSIFICATION OF THE DATA VECTORS
 80  DETERMINE PARAMETERS OF A SEPARATOR CLASSIFYING DATA VECTORS IN THE CLASS ACCORDING TO THE SEPARATOR VALUE:
 85    TRANSFORM INPUT DATA VECTORS
 90    DETERMINE PARAMETERS OF AN OPTIMAL MULTIDIMENSIONAL SURFACE SEPARATING DATA VECTORS BY SOFT MARGINS USING SLACK VARIABLES
 95    DETERMINE A DECISION FUNCTION FROM THE PARAMETERS CORRESPONDING TO THE SEPARATOR VALUE
100  GENERATE A CLASSIFICATION SIGNAL FROM THE DECISION FUNCTION TO INDICATE MEMBERSHIP STATUS OF EACH DATA VECTOR IN THE CLASS
105  OUTPUT THE CLASSIFICATION SIGNAL

FIG. 4 (Sheet 4 of 7): Parameter determination subroutine

110  START DETERMINING PARAMETERS OF OPTIMAL MULTIDIMENSIONAL SURFACE
115  MINIMIZE A COST FUNCTION BY QUADRATIC OPTIMIZATION
120  DETERMINE WEIGHT VECTOR AND BIAS
125  DETERMINE MINIMUM NON-NEGATIVE SLACK VARIABLES SATISFYING SOFT MARGIN CONSTRAINTS
130  RETURN

SOFT MARGIN CLASSIFIER

BACKGROUND OF THE INVENTION

1. Field of the Invention

This disclosure relates to automated data classifiers. In particular, this disclosure relates to an apparatus and method for performing two-group classification of input data in automated data processing applications.

2. Description of the Related Art

Automated systems for data classifying applications such as, for example, pattern identification and optical character recognition, process sets of input data by dividing the input data into more readily processible subsets. Such data processing employs at least two-group classifications; i.e., the classifying of input data into two subsets.

As known in the art, some learning systems such as artificial neural networks (ANN) require training from input training data to allow the trained learning systems to perform on empirical data within a predetermined error tolerance. In one example, as described in Y. Le Cun et al., "Handwritten Digit Recognition with a Back-Propagation Network" (D. Touretzky, Ed.), ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, Volume 2, Morgan Kaufman, 1990, a five-layer back-propagation neural network is applied to handwritten digit recognition on a U.S. Postal Service database of 16x16 pixel bit-mapped digits containing 7300 training patterns and 2000 test patterns recorded from actual mail.

One classification method known in the art is the Optimal Margin Classifier (OMC) procedure, described in B.E. Boser, I. Guyon, and V.N. Vapnik, "A Training Algorithm for Optimal Margin Classifiers", PROCEEDINGS OF THE FOURTH WORKSHOP OF COMPUTATIONAL LEARNING THEORY, Vol. 4, Morgan Kaufman, San Mateo, Calif., 1992. An application of the OMC method is described in commonly assigned U.S. patent application No. 08/097,785, filed Jul. 27, 1993, and entitled AN OPTIMAL MARGIN MEMORY BASED DECISION SYSTEM, which is incorporated herein by reference.

Generally, using a set of vectors in n-dimensional space as input data, the OMC classifies the input data with non-linear decision surfaces, where the input patterns undergo a non-linear transformation to a new space using convolution of dot products for linear separation by optimal hyperplanes in the transformed space, such as shown in FIG. 1 for two-dimensional vectors in classes indicated by X's and O's. In this disclosure the term "hyperplane" means an n-dimensional surface and includes 1-dimensional and 2-dimensional surfaces, i.e., points and lines, respectively, separating classes of data in higher dimensions. In FIG. 1 the classes of data vectors may be separated by a number of hyperplanes 2, 4. The OMC determines an optimal hyperplane 6 separating the classes.

In situations having original training patterns, or dot-product-transformed training patterns, which are not linearly separable, learning systems trained therefrom may address the inseparability by increasing the number of free parameters, which introduces potential over-fitting of the data. Alternatively, inseparability may be addressed by pruning from consideration the training patterns obstructing separability, as described in V.N. Vapnik, ESTIMATION OF DEPENDENCES BASED ON EMPIRICAL DATA, New York: Springer-Verlag, pp. 355-369, 1982, followed by a restart of the training process on the pruned set of training patterns. The pruning involves a local decision with respect to a decision surface to locate and remove the obstructing data, such as erroneous, outlying, or difficult training patterns.

It is preferable to absorb such separation-obstructing training patterns within soft margins between classes, and to classify training patterns that may not be linearly separable using a global approach in locating such separation-obstructing training patterns. It is also advantageous to implement a training method which avoids restarting the training upon detection of difficult training patterns for pruning.

SUMMARY

A soft margin classification system is disclosed for differentiating data vectors to produce a classification signal indicating the membership status of each data vector in a class. The soft margin classification system includes a processing unit having memory for storing the data vectors in a training set; stored programs including a data vector processing program; and a processor controlled by the stored programs. The processor includes determining means for determining parameters including slack variables from the data vectors, the parameters representing a multidimensional surface differentiating the data vectors with respect to the class; and means for generating the classification signal from the data vectors and the parameters.

The generating means evaluates a decision function of the parameters and each data vector to indicate membership of a respective data vector in the class, to generate the classification signal. The determining means determines the parameters, including a weight vector and a bias, for all of the data vectors in a training set, and determines a minimum non-negative value of the slack variables for each data vector from a plurality of constraints.

The determining means minimizes a cost function to satisfy a plurality of constraints, and determines the weight vector and bias representing an optimal hyperplane separating classes A, B. An input device is provided for inputting the training set of data vectors, and the processor includes means for transforming the data vectors using convolution of dot products.

A method is also disclosed for differentiating pattern vectors to indicate membership in a class, comprising the steps of storing data vectors in memory; processing the data vectors using stored programs including a data vector processing program; determining parameters including slack variables from the data vectors, the parameters representing a multidimensional surface differentiating the data vectors with respect to a class; and generating a classification signal from the data vectors and the parameters.

BRIEF DESCRIPTION OF THE DRAWINGS

The features of the disclosed soft margin classifier and method will become more readily apparent and may be better understood by referring to the following detailed description of an illustrative embodiment of the present invention, taken in conjunction with the accompanying drawings, where:

FIG. 1 illustrates an example of two-dimensional classification by the OMC method;
FIG. 2 shows the components of the soft margin classifier disclosed herein;
FIG. 3 illustrates a block diagram of the operation of the soft margin classifier;
FIG. 4 illustrates a block diagram of a subroutine implementing a parameter determination procedure;
FIG. 5 illustrates exemplary bit map digits as training patterns;
FIG. 6 shows an example of soft margin classification of pattern vectors; and
FIG. 7 illustrates error contributions from the slack variables.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now in specific detail to the drawings, with like reference numerals identifying similar or identical elements, as shown in FIG. 2, the present disclosure describes an apparatus and method implementing a soft margin classifier 10, which includes a processing unit 15 having a processor 20, memory 25, and stored programs 30 including a matrix reduction program; an input device 35; and an output device 40. In an exemplary embodiment, the processing unit 15 is preferably a SPARC workstation available from Sun Microsystems, Inc., having associated RAM memory and a 400 MB capacity hard or fixed drive as memory 25. The processor 20 operates using the UNIX operating system to run application software as the stored programs 30, providing application programs and subroutines implementing the disclosed soft margin classifier system and methods.

The processor 20 receives commands and training pattern data from a training pattern data source 45 through the input device 35, which includes a keyboard and/or a data reading device such as a disk drive for receiving the training pattern data from storage media such as a floppy disk. The received training pattern data are stored in memory 25 for further processing to determine the parameters of an optimal hyperplane as described below.

The parameters are used by the processor 20 as a decision function to classify input empirical data, generating a classification signal indicating the input empirical data as being a member or non-member of a specified class corresponding to a separator value input by the user through the input device 35. The classification signal is sent to an output device 40, such as a display, for displaying the input data classified by the decision function. Alternatively, the output device 40 may include specialized graphics programs to convert the generated classification signal to a displayed graphic of a multidimensional hyperplane representing the decision function. In additional embodiments, the generated classification signal includes a weight vector, a bias, and slack variables listed in a file with the input training patterns, for output as columns or tables of text by the output device 40, which may be a display or a hard copy printer.

The soft margin classifier 10 performs the application programs and subroutines, described hereinbelow in conjunction with FIGS. 3-4, which are implemented from compiled source code in the C programming language with a LISP interface.

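In one of the embodiments just described, the generated classification signal is a weight vector, a bias, and slack variables listed in a file with the input training patterns, output as columns of text. The short C sketch below illustrates only that output step; the file layout, field names, and function name are assumptions made for illustration, not taken from the patent's C/LISP sources.

/* Illustrative text output of a classification signal: the bias and
 * weight vector as header lines, then one row per training pattern
 * with its label, slack variable, and membership signal.
 * The layout is an assumption, not the patent's file format. */
#include <stdio.h>

static int write_signal_file(const char *path,
                             const double *w, int k, double b,
                             const int *labels, const double *slack,
                             const int *signal, int m)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    fprintf(fp, "# bias %.6f\n# weight", b);
    for (int j = 0; j < k; j++)
        fprintf(fp, " %.6f", w[j]);
    fprintf(fp, "\n# label  slack  signal\n");

    for (int i = 0; i < m; i++)
        fprintf(fp, "%d  %.6f  %+d\n", labels[i], slack[i], signal[i]);

    fclose(fp);
    return 0;
}
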
The present invention includes a method for generating a classification signal from input training patterns, including the steps of storing data vectors in memory; processing the data vectors using stored programs including a data vector processing program; determining parameters including slack variables from the data vectors, the parameters representing a multidimensional surface differentiating the data vectors with respect to a class; and generating a classification signal from the data vectors and the parameters.

In an exemplary embodiment, as shown in FIGS. 3-4, the soft margin classifier 10 starts a classification procedure in step 60 using the data vector processing program, including the steps of receiving data vectors of the training set from a training pattern data source in step 65; parsing the received data vectors to obtain class labels in step 70; receiving, in step 75, an input separator value representing a class which is selected by a user through input device 35 for classification of the data vectors; and determining parameters of a separator classifying data vectors in the class according to the separator value in step 80, which includes transforming the input data vectors in step 85 according to a predetermined vector mapping; determining parameters of an optimal multidimensional surface separating the parsed data vectors by soft margins using slack variables in step 90; and determining a decision function from the parameters corresponding to the separator value in step 95.

After determining the parameters in step 80, the classification procedure performs the steps of generating a classification signal from the decision function to indicate the membership status of each data vector in the class in step 100, and outputting the classification signal in step 105.

In step 90, the determination of parameters is performed by a subroutine, as shown in FIG. 4, including the steps of starting the determination of parameters of the optimal multidimensional surface in step 110; minimizing a cost function by quadratic optimization in step 115; determining the weight vector and bias in step 120; determining the minimum non-negative slack variables satisfying the soft margin constraints in step 125; and returning the determined weight vector, bias, and slack variables in step 130 to proceed to step 95 in FIG. 3.

In an exemplary embodiment, the soft margin classifier 10 generates ten separators, one for each digit 0, 1, . . . , 9, where a separator is a classification signal or set of parameters determining a decision function derived from the training set. The decision function derived from the training set is used to classify empirical input data with respect to the selected class.

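The ten-separator arrangement can be pictured as a small driver loop: one set of parameters per digit, and one decision-function evaluation per digit when classifying an input vector. The C sketch below is only an illustration of that arrangement; the structure, names, and fixed dimension are assumptions, and the parameters themselves are presumed to come from the FIG. 4 subroutine.

/* Illustrative one-vs-rest arrangement: ten separators, one per digit.
 * The parameters (w, b) of each separator are assumed to have been
 * determined by the soft margin procedure of FIG. 4. */
#include <stdio.h>

#define NDIGITS 10
#define K       256   /* dimension of the transformed vectors (example) */

struct separator {
    double w[K];      /* weight vector */
    double b;         /* bias          */
};

/* Decision function f(x) = w . x + b for one separator. */
static double decision_value(const struct separator *s, const double x[K])
{
    double f = s->b;
    for (int j = 0; j < K; j++)
        f += s->w[j] * x[j];
    return f;
}

/* Report the membership status of x for every digit class (step 100). */
static void classify_digits(const struct separator sep[NDIGITS],
                            const double x[K])
{
    for (int d = 0; d < NDIGITS; d++)
        printf("digit %d: %s\n", d,
               decision_value(&sep[d], x) >= 0.0 ? "member" : "non-member");
}
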
In the exemplary embodiment for use in recognizing bit-mapped digits, each pattern t_i of data in the training set T includes a bit map s_i and a label y_i of a digit D, where the pattern s_i, when reformatted, determines the shape and appearance of the digit D. Each data pattern t_i is referred to as a vector, but t_i may be in other equivalent or comparable configurations; for example, matrices, data blocks, coordinates, etc.

Each data pattern t_i is associated with the set Y_D by its corresponding label y_i, where Y_D is the set of all patterns which are bit maps of digit D. Data pattern t_i may thus be represented by the vector (s_i, y_i), indicating that pattern s_i has label y_i; s_i ∈ Y_D (belongs to set Y_D); and s_i represents a bit map of digit D. Also associated with Y_D is its complement Y_D' = not(Y_D), where Y_D' is the set of all patterns t_i having corresponding bit map data s_i which do not represent a bit map of digit D. Thus, Y_D ∪ Y_D' = T.

As shown in FIG. 5, fourteen patterns are bit maps of digits, with the corresponding label of the represented digit respectively below each. Therefore, a data pattern s_1, shown as reference 50 in FIG. 5, is a bit map of digit 6, so s_1 is associated with Y_6 and s_1 ∈ Y_6. Similarly, data pattern s_2, shown as reference 55 in FIG. 5, is a bit map of digit 5, so s_2 ∈ Y_5, s_2 ∉ Y_6, and s_1 ∉ Y_5.

In the above examples, s_1, s_2 ∈ T are bit maps of 16x16 pixels, so the length of each data pattern s_1, s_2 is 16x16 = 256 bits, and each is therefore represented in the above example by a 256-component vector.

In operation, the soft margin classifier 10 parses each pattern t_i = (s_i, y_i) off its label y_i, leaving the data vectors s_i, and then each vector s_i is transformed by convolution of a dot product to a transformed data vector x_i.

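The parsing step described above pairs each training pattern with its digit label and then strips the label off, leaving a 256-component data vector and a record of whether the pattern belongs to Y_D or to its complement. A minimal C sketch of that step follows; the structure and field names are illustrative assumptions only.

/* Illustrative parsing of a training pattern t_i = (s_i, y_i):
 * a 16x16 bit map (256 components) and its digit label. */
#include <string.h>

#define NPIX 256                      /* 16 x 16 = 256 components */

struct pattern {
    unsigned char s[NPIX];            /* bit map data vector      */
    int y;                            /* digit label, 0..9        */
};

/* Parse the pattern off its label: copy the data vector s_i and
 * report membership in Y_D (1) or its complement Y_D' (0). */
static int parse_pattern(const struct pattern *t, int D,
                         unsigned char s_out[NPIX])
{
    memcpy(s_out, t->s, NPIX);
    return t->y == D;
}
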
The transformation permits classification using non-linear decision surfaces; i.e., a non-linear transformation is performed on the input patterns in the input space to a new transformed space permitting linear separation by optimal hyperplanes of the OMC method in the references cited above.

A predetermined k-dimensional vector function

    V: R^n → R^k    (1)

maps each n-dimensional training vector s_i to a new k-dimensional training vector x_i = V(s_i). As described below, a k-dimensional weight vector w and a bias b are then constructed for each digit D such that a decision function f for class A is a linear separator of the transformed training vectors x_i, where

    f(x_i) = w · x_i + b    (2)

In the exemplary embodiment, A = Y_D and B = Y_D' for generating a decision function or separator classifying input data vectors to determine the membership status of the input data vectors in each class A, B; i.e., to determine whether or not each input data vector represents digit D.

In constructing the separator using optimal hyperplanes as in the OMC method, the weight vector may be written as:

    w = Σ_{i∈A} α_i x_i − Σ_{i∈B} β_i x_i    (3)

where the x_i are the transformed training data vectors. The linearity of the dot product implies that the decision function f for class A for unknown empirical data x depends on the dot product according to

    f(x) = Σ_{i∈A} α_i (x_i · x) − Σ_{i∈B} β_i (x_i · x) + b    (4)

The classification method may be generalized by considering different forms of the dot product

    K(u, v) = V(u) · V(v)    (5)

According to the Hilbert-Schmidt Theorem, as described in R. Courant and D. Hilbert, METHODS OF MATHEMATICAL PHYSICS, Interscience, New York, pp. 122-141, 1953, any symmetric function K(u, v) can be expanded in the form

    K(u, v) = Σ_i λ_i V_i(u) V_i(v)    (6)

where the λ_i ∈ R and the V_i are eigenvalues and eigenfunctions, respectively, of the integral equation

    ∫ K(u, v) V_i(v) dv = λ_i V_i(u)    (7)

A sufficient condition to ensure a positive norm of the transformed vectors is that all the eigenvalues in the expansion of Eq. (6) above are positive. To guarantee that these coefficients are positive, it is necessary and sufficient (according to Mercer's Theorem) that the condition

    ∫∫ K(u, v) g(u) g(v) du dv > 0    (8)

is satisfied for all g such that

    ∫ g²(u) du < ∞    (9)

Functions that satisfy Mercer's Theorem can therefore be used as dot products. As described in M. Aizerman et al., "Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning", Automation and Remote Control, 25:821-837, June 1964, potential functions are of the form

    K(u, v) = exp(−‖u − v‖ / σ)    (10)

In the Boser et al. publication, cited above, the optimal hyperplane method was combined with the method of convolution of the dot product, and, in addition to the potential functions as in Eq. (10) above, polynomial classifiers of the form

    K(u, v) = (u · v + 1)^d    (11)

were considered.

Using different dot products K(u, v), one can construct different learning machines with arbitrarily shaped decision surfaces. All of these learning machines follow the same solution scheme as the original optimal hyperplane method and have the same advantage of effective stopping criteria. A polynomial of the form of Eq. (11) is used in the soft margin classifier 10 described herein. In use, it was found that the raw classification error on the U.S. Postal Service database was between 4.3% and 4.7% for d = 2 to 4, and the error decreased as d increased. With d = 4, the raw error was 4.3%.

By increasing the degree d, simple non-linear transformations as in Eq. (11) eventually lead to separation, but the dimensionality of the separating space may become larger than the number of training patterns. Better generalization ability of the learning machine, and less potential over-fitting of the data, can be achieved if the method allows for errors on the training set, as implemented in the soft margin classifier 10 and method described herein.

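A compact C sketch of the polynomial dot product of Eq. (11) and of a decision function written in the expanded form of Eqs. (3)-(4) is given below. The coefficient arrays (alpha for class A patterns, beta for class B patterns), the row-major storage of the training vectors, and the parameter names are illustrative assumptions; only the kernel and decision-function forms come from the equations above.

/* Polynomial convolution of the dot product, Eq. (11):
 *   K(u, v) = (u . v + 1)^d
 * and a decision function in the expanded form of Eqs. (3)-(4). */
#include <math.h>
#include <stddef.h>

static double poly_kernel(const double *u, const double *v, size_t n, int d)
{
    double dot = 0.0;
    for (size_t i = 0; i < n; i++)
        dot += u[i] * v[i];
    return pow(dot + 1.0, (double)d);
}

/* f(x) = sum_{i in A} alpha_i K(s_i, x) - sum_{j in B} beta_j K(s_j, x) + b
 * Training vectors are stored row-major: sA[i*n .. i*n+n-1] is pattern i. */
static double decision_function(const double *x, size_t n, int d,
                                const double *sA, const double *alpha, size_t mA,
                                const double *sB, const double *beta, size_t mB,
                                double b)
{
    double f = b;
    for (size_t i = 0; i < mA; i++)
        f += alpha[i] * poly_kernel(&sA[i * n], x, n, d);
    for (size_t j = 0; j < mB; j++)
        f -= beta[j] * poly_kernel(&sB[j * n], x, n, d);
    return f;
}
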
The OMC method described above uses margins on either side of an intermediate hyperplane separating the transformed vectors, to thus classify the transformed vectors as being on one side or the other of the intermediate hyperplane. The margins are determined by the +1 and −1 in the equation

    w · x_i + b ≥ 1    if x_i ∈ A
    w · x_i + b ≤ −1   if x_i ∈ B    (12)

In the soft margin classifier 10, the margin is "soft"; i.e., the hyperplane separating the classified data vectors in the transformed space depends on the values of slack variables. The margins are determined from the slack variables such that

    w · x_i + b ≥ 1 − ξ_i(w, b)    if x_i ∈ A
    w · x_i + b ≤ −1 + ξ_i(w, b)   if x_i ∈ B    (13)

where ξ_i(w, b) for each pattern is a non-negative slack variable. The value of ξ_i(w, b) is a function of the parameters (w, b) of the decision surface, and it is the smallest non-negative number that makes the transformed data pattern x_i satisfy the inequality in Eq. (13).

FIG. 6 illustrates the use of slack variables in relation to an optimal hyperplane between vectors of classes A, B, represented by X's and O's respectively. Hyperplanes 135, 140 correspond to w · x + b = −1 and w · x + b = 1, respectively, with soft margins extending from each hyperplane 135, 140. Hyperplane 145 corresponds to w · x + b = 0 and is intermediate of hyperplanes 135, 140.

As shown in FIG. 6, data vector 150 is in class A but is classified by the soft margin classifier 10 as a member of class B, since data vector 150 is present in the region, determined by the hyperplanes 135, 140, classifying data vectors as members of class B. Similarly, data vectors 155, 160 from class B are classified as being in class A by the soft margin classifier 10 using hyperplanes 135, 140. As shown in FIG. 6, the values of the slack variables are greater than 1 for data vectors 150, 155, 160; i.e., data vectors 150, 155, 160 are deemed erroneous data. These erroneous data vectors are taken into account by the soft margin classifier 10, as described below, without removal of these data vectors from the training set and without restart of the classification method, in determining the optimal hyperplanes. FIG. 6 also illustrates data vectors 165, 170 of classes A and B, respectively, having ξ_i = 0; i.e., lying on hyperplanes 135, 140, respectively.

The soft margin classifier 10 determines the weight vector w and the bias b having the fewest number of errors on the two sets A, B of patterns. A value of ξ_i(w, b) > 1 corresponds to an error on pattern x_i, since the pattern is classified according to the sign of w · x_i + b.
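Once (w, b) is fixed, each slack variable of Eq. (13) has a closed form: it is the smallest non-negative number restoring the margin inequality for its pattern, and a value greater than 1 marks the pattern as an error (vectors 150, 155, 160 of FIG. 6). The C sketch below computes the slack variables exactly that way; the function and variable names are illustrative.

/* Smallest non-negative slack satisfying Eq. (13) for fixed (w, b):
 *   class A:  xi = max(0, 1 - (w . x + b))
 *   class B:  xi = max(0, 1 + (w . x + b))
 * A slack greater than 1 corresponds to a classification error. */
#include <stddef.h>

static double dot(const double *a, const double *b, size_t k)
{
    double s = 0.0;
    for (size_t i = 0; i < k; i++)
        s += a[i] * b[i];
    return s;
}

/* in_A is 1 for a pattern of class A, 0 for a pattern of class B. */
static double slack(const double *w, double b, const double *x,
                    size_t k, int in_A)
{
    double f = dot(w, x, k) + b;
    double xi = in_A ? 1.0 - f : 1.0 + f;
    return xi > 0.0 ? xi : 0.0;
}

/* Error test used in the discussion of FIG. 6: xi > 1. */
static int is_error(double xi)
{
    return xi > 1.0;
}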

In the OMC, a function Φ was defined by the following equation:

    Φ(w, b) = Σ_{i∈A,B} θ(ξ_i(w, b))    (14)

As illustrated in FIG. 7, θ is the step function: θ(x) = 1 if x > 0 and zero otherwise, shown as reference 175. The minimizing of Eq. (14) is a highly non-linear problem. An analytic expression for an upper bound on the number of errors is obtained through the constraint

    ε(w, b) = Σ_{i∈A,B} ξ_i(w, b)^Λ    (15)

where Λ ≥ 0.

As shown in FIG. 7, for Λ > 1 the cost of an error increases more than linearly with the pattern's deviation from the desired value, as shown by the graph 180 of ξ^Λ. As also seen in FIG. 7, non-errors, i.e., patterns having 0 < ξ < 1, make small contributions ξ^Λ, shown as reference 185, to the sum ε(w, b), approximating the zeroing-out of the non-errors by the step function. As Λ → 0, all ξ > 1 contribute with the same cost of 1, as with the step function, since ξ^Λ is approximately 1, shown as reference 190; this is preferable to the contribution of each ξ when Λ ≥ 1.

A constraint of the form in Eq. (15) departs from the quadratic programming problem of determining the optimal hyperplane in the OMC, and departs from guaranteed convergence time, unless Λ = 1 or 2. In the preferred embodiment of the soft margin classifier 10, Λ = 1 is the best mode, since it provides a constraint that is near to the desired constraint. In alternative embodiments, a hybrid method is implemented by the soft margin classifier 10, with Λ = 2 for 0 ≤ ξ ≤ 1 and Λ = 1 for ξ ≥ 1, which provides a moderate increase in computational complexity.
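The two cost notions above can be compared numerically. The small C program below is a sketch under the assumption that Eq. (14) sums the step function over the slacks and Eq. (15) sums ξ_i^Λ; it evaluates both for a handful of arbitrary example slack values, so the effect of choosing Λ = 1 versus Λ close to 0 discussed in connection with FIG. 7 can be inspected.

/* Step-function count of Eq. (14) versus the analytic sum of Eq. (15)
 * for a given exponent Lambda.  Example slack values are arbitrary. */
#include <math.h>
#include <stdio.h>

static double step(double x)            /* theta(x): 1 if x > 0, else 0 */
{
    return x > 0.0 ? 1.0 : 0.0;
}

static double phi_step(const double *xi, int m)                  /* Eq. (14) */
{
    double sum = 0.0;
    for (int i = 0; i < m; i++)
        sum += step(xi[i]);
    return sum;
}

static double eps_bound(const double *xi, int m, double lambda)  /* Eq. (15) */
{
    double sum = 0.0;
    for (int i = 0; i < m; i++)
        if (xi[i] > 0.0)
            sum += pow(xi[i], lambda);
    return sum;
}

int main(void)
{
    double xi[] = { 0.0, 0.2, 0.9, 1.3, 2.5 };   /* example slacks */
    printf("step count:          %.1f\n", phi_step(xi, 5));
    printf("bound, Lambda=1:     %.2f\n", eps_bound(xi, 5, 1.0));
    printf("bound, Lambda=0.05:  %.2f\n", eps_bound(xi, 5, 0.05));
    return 0;
}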

A unique solution to the quadratic optimization problem is provided by the cost function of Eq. (16), which combines the error bound ε(w, b) of Eq. (15) with a ‖w‖² term, where η and γ are parameters of the cost function that are kept fixed under the optimization. The cost function is minimized in step 115 in FIG. 4 with respect to the following constraints:

    w · x_i + b ≥ 1 − ξ_i    if x_i ∈ A
    w · x_i + b ≤ −1 + ξ_i   if x_i ∈ B
    ξ_i ≥ 0
    Σ_i ξ_i ≤ ε    (17)

The terms in the cost function serve different purposes. The ε term enforces a small number of errors. When several solutions exist with the same value of ε, the unique one that also minimizes ‖w‖² is chosen. The ‖w‖² term is common with the OMC, and it tends to keep the distance between the convex hulls of the correctly classified patterns as large as possible; this term is included to obtain a unique solution, so its multiplier γ is kept at a small positive value.

Eq. (16) can be solved in the dual space of the Lagrange multipliers, in a similar manner as in the OMC for determining optimal hyperplanes. An additional m+1 non-negative Lagrange multipliers are introduced: m multipliers ε_i enforce the constraints ξ_i ≥ 0, and a multiplier δ enforces the last constraint of Eq. (17). The optimal solution is found to be the (w*, b*) which minimizes the expression W of Eq. (18), where

    w = Σ_{i∈A} α_i x_i − Σ_{i∈B} β_i x_i    (19)

and

    Σ_{i∈A} α_i = Σ_{i∈B} β_i    (20)

In the soft margin classifier system and method disclosed herein, the function W in Eq. (18) is altered to the form of Eq. (21), with respect to which the 2m+1 non-negative multipliers are maximized, where the (2m+1)-dimensional vector of multipliers is defined in Eq. (22) and A is a positive definite (2m+1)×(2m+1) matrix.

The optimal weight vector w* and bias b* are determined in step 120 in FIG. 4 by minimizing the cost function of Eq. (16) over w, b and the slack variables (Eq. (23)), under the constraints

    w · x_i + b ≥ 1 − ξ_i    if x_i ∈ A
    w · x_i + b ≤ −1 + ξ_i   if x_i ∈ B
    ξ_i ≥ 0
    Σ_i ξ_i ≤ ε    (24)

The vectors U, 1, u, v, e, of dimension n+m+1, are defined as block vectors in Eqs. (25)-(26), and non-negative Lagrange multipliers α, β, ε_i and δ are applied for the constraints. The Lagrange function corresponding to the optimization problem is, with this notation, given by Eq. (27), where Q is an (n+m+1)×(n+m+1) diagonal matrix, Eq. (28).

The minimization of Eq. (27) with respect to U is performed according to the following Kuhn-Tucker conditions:

    w = Σ_{i∈A} α_i x_i − Σ_{i∈B} β_i x_i
    Σ_{i∈A} α_i = Σ_{i∈B} β_i
    α_i, β_i, ε_i, δ ≥ 0,    i = 0, 1, . . . , m
    (w · x_i + b − 1 + ξ_i) α_i = 0    for x_i ∈ A
    (w · x_i + b + 1 − ξ_i) β_i = 0    for x_i ∈ B
    ε_i ξ_i = 0,    i = 0, 1, . . . , m
    (Σ_i ξ_i − ε) δ = 0    (29)

together with relations expressing each slack variable ξ_i in terms of α_i (or β_i, according to the sign of the label y_i), ε_i and δ.

Back-substituting the first five expressions of Eq. (29) into Eq. (27) determines a maximization problem in the (2m+1)-dimensional multiplier space of α, β, ε, δ:

    (α*, β*, ε*, δ*) = arg max_{α_i, β_i, ε_i, δ} W    (30)
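Two of the Kuhn-Tucker relations above are straightforward to check numerically once candidate multipliers are available: the expansion of the weight vector over the training vectors (Eqs. (19), (29)) and the balance between the multiplier sums of the two classes (Eq. (20)). The C sketch below performs only those two computations; the multiplier values themselves are assumed to come from the quadratic optimization, which is not reproduced here.

/* Reconstruct w from candidate multipliers, Eqs. (19)/(29), and check
 * the balance condition of Eq. (20).  Multipliers are assumed given. */
#include <math.h>
#include <stddef.h>

/* w = sum_{i in A} alpha_i x_i - sum_{i in B} beta_i x_i
 * Training vectors are stored row-major: xA[i*k .. i*k+k-1], etc. */
static void weight_from_multipliers(double *w, size_t k,
                                    const double *xA, const double *alpha,
                                    size_t mA,
                                    const double *xB, const double *beta,
                                    size_t mB)
{
    for (size_t j = 0; j < k; j++)
        w[j] = 0.0;
    for (size_t i = 0; i < mA; i++)
        for (size_t j = 0; j < k; j++)
            w[j] += alpha[i] * xA[i * k + j];
    for (size_t i = 0; i < mB; i++)
        for (size_t j = 0; j < k; j++)
            w[j] -= beta[i] * xB[i * k + j];
}

/* Eq. (20): the alphas over A and the betas over B must sum equally. */
static int multipliers_balanced(const double *alpha, size_t mA,
                                const double *beta, size_t mB, double tol)
{
    double sa = 0.0, sb = 0.0;
    for (size_t i = 0; i < mA; i++) sa += alpha[i];
    for (size_t i = 0; i < mB; i++) sb += beta[i];
    return fabs(sa - sb) <= tol;
}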

From the above, W is a quadratic form of the multipliers, due to the introduction of the multiplier vector defined in Eq. (22). One obtains

    W = c · γ − γᵀ A γ    (31)

with the vector c and the block matrix A, containing the matrix G and its sub-blocks h and k, given by Eqs. (32)-(34), where the matrix H within the block matrix A collects the signed dot products between the training vectors, and m* is the number of patterns with non-zero ξ_i in the solution.

The constraint of Eq. (35), bounding the sum of the slack variables, is to be valid when the sum is taken over all non-zero ξ_i. It also has to be fulfilled for any subset of the ξ_i. Lagrange multipliers δ_1, δ_2, . . . , δ_p, corresponding to every subset of patterns considered to form the intermediate solutions, are therefore introduced. The dimension of the Lagrange multiplier space is now n+m+p, and the Kuhn-Tucker conditions read as in Eq. (29), with the single multiplier δ replaced by the subset multipliers δ_j (Eq. (36)); the last column and row of the G matrix are widened to a band of width p. For every subset j of which ξ_i is a member, there will be an entry k, as defined above, in positions (i, n+m+j), (i+m, n+m+j), (n+m+j, i) and (n+m+j, i+m). The bottom right p×p block involves the counts of vectors belonging to the subsets, m' + m'' being the number of vectors in both subsets 1 and 2. The off-diagonal part of this block may be ignored in order to perform the optimization with respect to the individual δ_j's independently, and also to reduce the number of support patterns in the intermediate solutions. Thus, the obtained solution provides an accurate approximation for (w*, b*) to a true optimum.

Upon determining (w*, b*) in step 120 in FIG. 4 and minimizing the slack variables in step 125 in FIG. 4, the classification signal is then generated with respect to class A from the sign of the value of the decision function f(x), where x is empirical data transformed by a dot product mapping function and input to the determined decision function. Referring to Eq. (13) in conjunction with FIG. 6, for non-erroneous patterns the slack variables satisfy the inequality 0 ≤ ξ_i ≤ 1.

4. The system of claim 3 wherein the determining means determines the parameters including a weight vector and a bias from a plurality of constraints.

5. The system of claim 4 wherein the generating means evaluates the decision function from a linear sum of the bias and a dot product of the weight vector and a respective data vector.

6. The system of claim 4 wherein the determining means determines a minimum non-negative value of the slack variables for each data vector from the weight vector and the bias satisfying the plurality of constraints.

7. A classifier for classifying data vectors to produce a