Statistical Pattern Recognition: A Review

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 22, NO. 1, JANUARY 2000

Anil K. Jain, Fellow, IEEE, Robert P.W. Duin, and Jianchang Mao, Senior Member, IEEE

Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

Index Terms: Statistical pattern recognition, classification, clustering, feature extraction, feature selection, error estimation, classifier combination, neural networks.

A.K. Jain is with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824.
R.P.W. Duin is with the Department of Applied Physics, Delft University of Technology, 2600 GA Delft, the Netherlands.
J. Mao is with the IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120.
Manuscript received 23 July 1999; accepted 12 Oct. 1999. Recommended for acceptance by K. Bowyer. IEEECS Log Number 110296.
0162-8828/00/$10.00 © 2000 IEEE

1 INTRODUCTION

By the time they are five years old, most children can recognize digits and letters. Small characters, large characters, handwritten, machine printed, or rotated: all are easily recognized by the young. The characters may be written on a cluttered background, on crumpled paper or may even be partially occluded. We take this ability for granted until we face the task of teaching a machine how to do the same. Pattern recognition is the study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound and reasonable decisions about the categories of the patterns. In spite of almost 50 years of research, design of a general purpose machine pattern recognizer remains an elusive goal.

The best pattern recognizers in most instances are humans, yet we do not understand how humans recognize patterns. Ross [140] emphasizes the work of Nobel Laureate Herbert Simon whose central finding was that pattern recognition is critical in most human decision making tasks: "The more relevant patterns at your disposal, the better your decisions will be. This is hopeful news to proponents of artificial intelligence, since computers can surely be taught to recognize patterns. Indeed, successful computer programs that help banks score credit applicants, help doctors diagnose disease and help pilots land airplanes depend in some way on pattern recognition... We need to pay much more explicit attention to teaching pattern recognition." Our goal here is to introduce pattern recognition as the best possible way of utilizing available sensors, processors, and domain knowledge to make decisions automatically.

1.1 What is Pattern Recognition?

Automatic (machine) recognition, description, classification, and grouping of patterns are important problems in a variety of engineering and scientific disciplines such as biology, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing. But what is a pattern? Watanabe [163] defines a pattern "as opposite of a chaos; it is an entity, vaguely defined, that could be given a name." For example, a pattern could be a fingerprint image, a handwritten cursive word, a human face, or a speech signal. Given a pattern, its recognition/classification may consist of one of the following two tasks [163]: 1) supervised classification (e.g., discriminant analysis) in which the input pattern is identified as a member of a predefined class, 2) unsupervised classification (e.g., clustering) in which the pattern is assigned to a hitherto unknown class. Note that the recognition problem here is being posed as a classification or categorization task, where the classes are either defined by the system designer (in supervised classification) or are learned based on the similarity of patterns (in unsupervised classification).

Interest in the area of pattern recognition has been renewed recently due to emerging applications which are not only challenging but also computationally more demanding (see Table 1). These applications include data mining (identifying a "pattern," e.g., correlation, or an outlier in millions of multidimensional patterns), document classification (efficiently searching text documents), financial forecasting, organization and retrieval of multimedia databases, and biometrics (personal identification based on various physical attributes such as face and fingerprints). Picard [125] has identified a novel application of pattern recognition, called affective computing which will give a computer the ability to recognize and express emotions, to respond intelligently to human emotion, and to employ mechanisms of emotion that contribute to rational decision making. A common characteristic of a number of these applications is that the available features (typically, in the thousands) are not usually suggested by domain experts, but must be extracted and optimized by data-driven procedures.

TABLE 1 Examples of Pattern Recognition Applications [table body not recovered]

The rapidly growing and available computing power, while enabling faster processing of huge data sets, has also facilitated the use of elaborate and diverse methods for data analysis and classification. At the same time, demands on automatic pattern recognition systems are rising enormously due to the availability of large databases and stringent performance requirements (speed, accuracy, and cost). In many of the emerging applications, it is clear that no single approach for classification is "optimal" and that multiple methods and approaches have to be used. Consequently, combining several sensing modalities and classifiers is now a commonly used practice in pattern recognition.

The design of a pattern recognition system essentially involves the following three aspects: 1) data acquisition and preprocessing, 2) data representation, and 3) decision making. The problem domain dictates the choice of sensor(s), preprocessing technique, representation scheme, and the decision making model. It is generally agreed that a well-defined and sufficiently constrained recognition problem (small intraclass variations and large interclass variations) will lead to a compact pattern representation and a simple decision making strategy. Learning from a set of examples (training set) is an important and desired attribute of most pattern recognition systems. The four best known approaches for pattern recognition are: 1) template matching, 2) statistical classification, 3) syntactic or structural matching, and 4) neural networks. These models are not necessarily independent and sometimes the same pattern recognition method exists with different interpretations. Attempts have been made to design hybrid systems involving multiple models [57]. A brief description and comparison of these approaches is given below and summarized in Table 2.

TABLE 2 Pattern Recognition Models [table body not recovered]

1.2 Template Matching

One of the simplest and earliest approaches to pattern recognition is based on template matching. Matching is a generic operation in pattern recognition which is used to determine the similarity between two entities (points, curves, or shapes) of the same type. In template matching, a template (typically, a 2D shape) or a prototype of the pattern to be recognized is available. The pattern to be recognized is matched against the stored template while taking into account all allowable pose (translation and rotation) and scale changes. The similarity measure, often a correlation, may be optimized based on the available training set. Often, the template itself is learned from the training set. Template matching is computationally demanding, but the availability of faster processors has now made this approach more feasible.

The rigid template matching mentioned above, while effective in some application domains, has a number of disadvantages. For

1.4 Syntactic Approach

In many recognition problems involving complex patterns,
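As an illustration of the rigid template matching described in Section 1.2, the following is a minimal sketch, not the authors' method. It assumes images are small 2D lists of numbers, uses zero-mean normalized correlation as the similarity measure, and searches over translations only (rotation and scale changes are not handled). The function names normalized_correlation and match_template are hypothetical and chosen for this example.

```python
import math


def normalized_correlation(patch, template):
    """Zero-mean normalized correlation between two equal-size 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat patch or template carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom


def match_template(image, template):
    """Try every translation of the template; return (best_score, (row, col))."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_score, best_pos = -2.0, None  # correlation is always >= -1
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = normalized_correlation(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos


# Illustrative data (assumption): a plus-shaped template pasted into a blank image.
template = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]
image = [[0] * 6 for _ in range(6)]
for dr in range(3):
    for dc in range(3):
        image[2 + dr][3 + dc] = template[dr][dc]

score, position = match_template(image, template)
print(position, round(score, 3))
```

The exhaustive search over all translations is what makes the approach computationally demanding: the cost grows with the product of image size and template size, and it would grow further if rotations and scales were also enumerated.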
