Gradient-Based Learning Applied to Document Recognition

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner

Abstract— Multilayer Neural Networks trained with the back-propagation algorithm constitute the best example of a successful Gradient-Based Learning technique. Given an appropriate network architecture, Gradient-Based Learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional Neural Networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques.

Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called Graph Transformer Networks (GTN), allows such multi-module systems to be trained globally using Gradient-Based methods so as to minimize an overall performance measure.

Two systems for on-line handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of Graph Transformer Networks.

A Graph Transformer Network for reading a bank check is also described. It uses Convolutional Neural Network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.

Keywords— Neural Networks, OCR, Document Recognition, Machine Learning, Gradient-Based Learning, Convolutional Neural Networks, Graph Transformer Networks, Finite State Transducers.

Nomenclature

  GT      Graph transformer.
  GTN     Graph transformer network.
  HMM     Hidden Markov model.
  HOS     Heuristic over-segmentation.
  K-NN    K-nearest neighbor.
  NN      Neural network.
  OCR     Optical character recognition.
  PCA     Principal component analysis.
  RBF     Radial basis function.
  RS-SVM  Reduced-set support vector method.
  SDNN    Space displacement neural network.
  SVM     Support vector method.
  TDNN    Time delay neural network.
  V-SVM   Virtual support vector method.

(The authors are with the Speech and Image Processing Services Research Laboratory, AT&T Labs-Research, Schulz Drive, Red Bank, NJ. E-mail: {yann,leonb,yoshua,haffner}@research.att.com. Yoshua Bengio is also with the Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, Québec, Canada.)

I. Introduction

Over the last several years, machine learning techniques, particularly when applied to neural networks, have played an increasingly important role in the design of pattern recognition systems. In fact, it could be argued that the availability of learning techniques has been a crucial factor in the recent success of pattern recognition applications such as continuous speech recognition and handwriting recognition.

The main message of this paper is that better pattern recognition systems can be built by relying more on automatic learning and less on hand-designed heuristics. This is made possible by recent progress in machine learning and computer technology. Using character recognition as a case study, we show that hand-crafted feature extraction can be advantageously replaced by carefully designed learning machines that operate directly on pixel images. Using document understanding as a case study, we show that the traditional way of building recognition systems by manually integrating individually designed modules can be replaced by a unified and well-principled design paradigm, called Graph Transformer Networks, that allows training all the modules to optimize a global performance criterion.

Since the early days of pattern recognition it has been known that the variability and richness of natural data, be it speech, glyphs, or other types of patterns, make it almost impossible to build an accurate recognition system entirely by hand. Consequently, most pattern recognition systems are built using a combination of automatic learning techniques and hand-crafted algorithms. The usual method of recognizing individual patterns consists in dividing the system into the two main modules shown in figure 1. The first module, called the feature extractor, transforms the input patterns so that they can be represented by low-dimensional vectors or short strings of symbols that (a) can be easily matched or compared, and (b) are relatively invariant with respect to transformations and distortions of the input patterns that do not change their nature. The feature extractor contains most of the prior knowledge and is rather specific to the task. It is also the focus of most of the design effort, because it is often entirely hand-crafted. The classifier, on the other hand, is often general-purpose and trainable. One of the main problems with this approach is that the recognition accuracy is largely determined by the ability of the designer to come up with an appropriate set of features. This turns out to be a daunting task which, unfortunately, must be redone for each new problem. A large amount of the pattern recognition literature is devoted to describing and comparing the relative merits of different feature sets for particular tasks.

[Fig. 1. Traditional pattern recognition is performed with two modules: a fixed feature extractor and a trainable classifier. Raw input -> FEATURE EXTRACTION MODULE -> feature vector -> TRAINABLE CLASSIFIER MODULE -> class scores.]

Historically, the need for appropriate feature extractors was due to the fact that the learning techniques used by the classifiers were limited to low-dimensional spaces with easily separable classes. A combination of three factors has changed this vision over the last decade. First, the availability of low-cost machines with fast arithmetic units allows one to rely more on brute-force "numerical" methods than on algorithmic refinements. Second, the availability of large databases for problems with a large market and wide interest, such as handwriting recognition, has enabled designers to rely more on real data and less on hand-crafted feature extraction to build recognition systems. The third and very important factor is the availability of powerful machine learning techniques that can handle high-dimensional inputs and can generate intricate decision functions when fed with these large data sets. It can be argued that the recent progress in the accuracy of speech and handwriting recognition systems can be attributed in large part to an increased reliance on learning techniques and large training data sets. As evidence of this fact, a large proportion of modern commercial OCR systems use some form of multi-layer Neural Network trained with back-propagation.

In this study, we consider the tasks of handwritten character recognition (Sections I and II) and compare the performance of several learning techniques on a benchmark data set for handwritten digit recognition (Section III). While more automatic learning is beneficial, no learning technique can succeed without a minimal amount of prior knowledge about the task. In the case of multi-layer neural networks, a good way to incorporate knowledge is to tailor the architecture to the task. Convolutional Neural Networks, introduced in Section II, are an example of specialized neural network architectures which incorporate knowledge about the invariances of 2D shapes by using local connection patterns and by imposing constraints on the weights. A comparison of several methods for isolated handwritten digit recognition is presented in Section III. To go from the recognition of individual characters to the recognition of words and sentences in documents, the idea of combining multiple modules trained to reduce the overall error is introduced in Section IV. Recognizing variable-length objects such as handwritten words using multi-module systems is best done if the modules manipulate directed graphs. This leads to the concept of the trainable Graph Transformer Network (GTN), also introduced in Section IV. Section V describes the now classical method of heuristic over-segmentation for recognizing words or other character strings. Discriminative and non-discriminative gradient-based techniques for training a recognizer at the word level, without requiring manual segmentation and labeling, are presented in Section VI. Section VII presents the promising Space-Displacement Neural Network approach that eliminates the need for segmentation heuristics by scanning a recognizer at all possible locations on the input. In Section VIII, it is shown that trainable Graph Transformer Networks can be formulated as multiple generalized transductions based on a general graph composition algorithm. The connections between GTNs and Hidden Markov Models, commonly used in speech recognition, are also treated. Section IX describes a globally trained GTN system for recognizing handwriting entered in a pen computer. This problem is known as "on-line" handwriting recognition, since the machine must produce immediate feedback as the user writes. The core of the system is a Convolutional Neural Network. The results clearly demonstrate the advantages of training a recognizer at the word level, rather than training it on pre-segmented, hand-labeled, isolated characters. Section X describes a complete GTN-based system for reading handwritten and machine-printed bank checks. The core of the system is the Convolutional Neural Network called LeNet-5, described in Section II. This system is in commercial use in the NCR Corporation line of check recognition systems for the banking industry. It is reading millions of checks per month in several banks across the US.

A. Learning from Data

There are several approaches to automatic machine learning, but one of the most successful, popularized in recent years by the neural network community, can be called "numerical" or gradient-based learning. The learning machine computes a function Y^p = F(Z^p, W), where Z^p is the p-th input pattern and W represents the collection of adjustable parameters in the system. In a pattern recognition setting, the output Y^p may be interpreted as the recognized class label of pattern Z^p, or as scores or probabilities associated with each class. A loss function E^p = D(D^p, F(W, Z^p)) measures the discrepancy between D^p, the "correct" or desired output for pattern Z^p, and the output produced by the system. The average loss function E_train(W) is the average of the errors E^p over a set of labeled examples called the training set {(Z^1, D^1), ..., (Z^P, D^P)}. In the simplest setting, the learning problem consists in finding the value of W that minimizes E_train(W). In practice, the performance of the system on the training set is of little interest. The more relevant measure is the error rate of the system in the field, where it would be used in practice. This performance is estimated by measuring the accuracy on a set of samples disjoint from the training set, called the test set. Much theoretical and experimental work has shown that the gap between the expected error rate on the test set E_test and the error rate on the training set E_train decreases with the number of training samples approximately as

    E_test - E_train = k (h/P)^α

where P is the number of training samples, h is a measure of the "effective capacity" or complexity of the machine, α is a number between 0.5 and 1.0, and k is a constant. This gap always decreases when the number of training samples increases. Furthermore, as the capacity h increases, E_train decreases. Therefore, when increasing the capacity h, there is a trade-off between the decrease of E_train and the increase of the gap, with an optimal value of the capacity h that achieves the lowest generalization error E_test. Most learning algorithms attempt to minimize E_train as well as some estimate of the gap. A formal version of this is called structural risk minimization, and is based on defining a sequence of learning machines of increasing capacity, corresponding to a sequence of subsets of the parameter space such that each subset is a superset of the previous subset. In practical terms, Structural Risk Minimization is implemented by minimizing E_train + βH(W), where the function H(W) is called a regularization function and β is a constant. H(W) is chosen such that it takes large values on parameters W that belong to high-capacity subsets of the parameter space. Minimizing H(W) in effect limits the capacity of the accessible subset of the parameter space, thereby controlling the trade-off between minimizing the training error and minimizing the expected gap between the training error and the test error.

B. Gradient-Based Learning

The general problem of minimizing a function with respect to a set of parameters is at the root of many issues in computer science. Gradient-Based Learning draws on the fact that it is generally much easier to minimize a reasonably smooth, continuous function than a discrete (combinatorial) function. The loss function can be minimized by estimating the impact of small variations of the parameter values on the loss function. This is measured by the gradient of the loss function with respect to the parameters. Efficient learning algorithms can be devised when the gradient vector can be computed analytically (as opposed to numerically through perturbations). This is the basis of numerous gradient-based learning algorithms with continuous-valued parameters. In the procedures described in this article, the set of parameters W is a real-valued vector, with respect to which E(W) is continuous, as well as differentiable almost everywhere. The simplest minimization procedure in such a setting is the gradient descent algorithm, where W is iteratively adjusted as follows:

    W_k = W_{k-1} - ε ∂E(W)/∂W.

In the simplest case, ε is a scalar constant. More sophisticated procedures use a variable ε, substitute a diagonal matrix for it, or substitute an estimate of the inverse Hessian matrix, as in Newton or Quasi-Newton methods. The Conjugate Gradient method can also be used. However, Appendix B shows that, despite many claims to the contrary in the literature, the usefulness of these second-order methods for large learning machines is very limited.

A popular minimization procedure is the stochastic gradient algorithm, also called the on-line update. It consists in updating the parameter vector using a noisy, or approximated, version of the average gradient. In the most common instance of it, W is updated on the basis of a single sample:

    W_k = W_{k-1} - ε ∂E^{p_k}(W)/∂W.

With this procedure the parameter vector fluctuates around an average trajectory, but it usually converges considerably faster than regular gradient descent and second-order methods on large training sets with redundant samples (such as those encountered in speech or character recognition). The reasons for this are explained in Appendix B. The properties of such algorithms applied to learning have been studied theoretically since the 1960's, but practical successes for non-trivial tasks did not occur until the mid eighties.
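To make the two update rules above concrete, the following sketch contrasts one full-gradient step with one pass of stochastic (single-sample) updates for a generic differentiable loss. The quadratic loss, the toy data, and the function names are illustrative assumptions, not part of the systems described in this paper.

    import numpy as np

    def loss_grad(W, z, d):
        # Gradient of a simple squared-error loss E^p = 0.5 * (W . z - d)^2
        # for one pattern (z, d); stands in for dE^p/dW of any differentiable loss.
        return (W @ z - d) * z

    def batch_gradient_step(W, Z, D, eps):
        # One iteration of plain gradient descent: average the per-pattern
        # gradients over the whole training set, then update W once.
        g = np.mean([loss_grad(W, z, d) for z, d in zip(Z, D)], axis=0)
        return W - eps * g

    def stochastic_gradient_pass(W, Z, D, eps):
        # One pass of the on-line (stochastic) update: W moves after every
        # single sample, using that sample's gradient as a noisy estimate.
        for z, d in zip(Z, D):
            W = W - eps * loss_grad(W, z, d)
        return W

    # Toy usage: 200 random patterns with a linear target.
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(200, 10))
    D = Z @ rng.normal(size=10)
    W = np.zeros(10)
    for _ in range(20):
        W = stochastic_gradient_pass(W, Z, D, eps=0.01)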

C. Gradient Back-Propagation

Gradient-Based Learning procedures have been used since the late 1950's, but they were mostly limited to linear systems. The surprising usefulness of such simple gradient descent techniques for complex machine learning tasks was not widely realized until the following three events occurred. The first event was the realization that, despite early warnings to the contrary, the presence of local minima in the loss function does not seem to be a major problem in practice. This became apparent when it was noticed that local minima did not seem to be a major impediment to the success of early non-linear gradient-based learning techniques such as Boltzmann machines. The second event was the popularization, by Rumelhart, Hinton, and Williams and others, of a simple and efficient procedure to compute the gradient in a non-linear system composed of several layers of processing: the back-propagation algorithm. The third event was the demonstration that the back-propagation procedure applied to multi-layer neural networks with sigmoidal units can solve complicated learning tasks. The basic idea of back-propagation is that gradients can be computed efficiently by propagation from the output to the input. This idea was described in the control theory literature of the early sixties, but its application to machine learning was not generally realized then. Interestingly, the early derivations of back-propagation in the context of neural network learning did not use gradients, but "virtual targets" for units in intermediate layers, or minimal disturbance arguments. The Lagrange formalism used in the control theory literature provides perhaps the best rigorous method for deriving back-propagation, and for deriving generalizations of back-propagation to recurrent networks and to networks of heterogeneous modules. A simple derivation for generic multi-layer systems is given in Section I-E.

The fact that local minima do not seem to be a problem for multi-layer neural networks is somewhat of a theoretical mystery. It is conjectured that if the network is oversized for the task, as is usually the case in practice, the presence of "extra dimensions" in parameter space reduces the risk of unattainable regions. Back-propagation is by far the most widely used neural-network learning algorithm, and probably the most widely used learning algorithm of any form.

D. Learning in Real Handwriting Recognition Systems

Isolated handwritten character recognition has been extensively studied in the literature (see the cited reviews), and was one of the early successful applications of neural networks. Comparative experiments on recognition of individual handwritten digits are reported in Section III. They show that neural networks trained with Gradient-Based Learning perform better than all other methods tested here on the same data. The best neural networks, called Convolutional Networks, are designed to learn to extract relevant features directly from pixel images (see Section II).

One of the most difficult problems in handwriting recognition, however, is not only to recognize individual characters, but also to separate characters from their neighbors within the word or sentence, a process known as segmentation. The technique that has become the "standard" for doing this is called Heuristic Over-Segmentation. It consists in generating a large number of potential cuts between characters using heuristic image processing techniques, and subsequently selecting the best combination of cuts based on scores given to each candidate character by the recognizer. In such a model, the accuracy of the system depends upon the quality of the cuts generated by the heuristics, and on the ability of the recognizer to distinguish correctly segmented characters from pieces of characters, multiple characters, or otherwise incorrectly segmented characters. Training a recognizer to perform this task poses a major challenge because of the difficulty in creating a labeled database of incorrectly segmented characters. The simplest solution consists in running the images of character strings through the segmenter and then manually labeling all the character hypotheses. Unfortunately, not only is this an extremely tedious and costly task, it is also difficult to do the labeling consistently. For example, should the right half of a cut-up 4 be labeled as a 1 or as a non-character? Should the right half of a cut-up 8 be labeled as a 3?

The first solution, described in Section V, consists in training the system at the level of whole strings of characters rather than at the character level. The notion of Gradient-Based Learning can be used for this purpose. The system is trained to minimize an overall loss function which measures the probability of an erroneous answer. Section V explores various ways to ensure that the loss function is differentiable, and therefore lends itself to the use of Gradient-Based Learning methods. Section V introduces the use of directed acyclic graphs whose arcs carry numerical information as a way to represent the alternative hypotheses, and introduces the idea of GTN.

The second solution, described in Section VII, is to eliminate segmentation altogether. The idea is to sweep the recognizer over every possible location on the input image, and to rely on the "character spotting" property of the recognizer, i.e. its ability to correctly recognize a well-centered character in its input field, even in the presence of other characters besides it, while rejecting images containing no centered characters. The sequence of recognizer outputs obtained by sweeping the recognizer over the input is then fed to a Graph Transformer Network that takes linguistic constraints into account and finally extracts the most likely interpretation. This GTN is somewhat similar to Hidden Markov Models (HMMs), which makes the approach reminiscent of classical speech recognition. While this technique would be quite expensive in the general case, the use of Convolutional Neural Networks makes it particularly attractive because it allows significant savings in computational cost.

E. Globally Trainable Systems

As stated earlier, most practical pattern recognition systems are composed of multiple modules. For example, a document recognition system is composed of a field locator, which extracts regions of interest, a field segmenter, which cuts the input image into images of candidate characters, a recognizer, which classifies and scores each candidate character, and a contextual post-processor, generally based on a stochastic grammar, which selects the best grammatically correct answer from the hypotheses generated by the recognizer. In most cases, the information carried from module to module is best represented as graphs with numerical information attached to the arcs. For example, the output of the recognizer module can be represented as an acyclic graph where each arc contains the label and the score of a candidate character, and where each path represents an alternative interpretation of the input string. Typically, each module is manually optimized, or sometimes trained, outside of its context. For example, the character recognizer would be trained on labeled images of pre-segmented characters. Then the complete system is assembled, and a subset of the parameters of the modules is manually adjusted to maximize the overall performance. This last step is extremely tedious, time-consuming, and almost certainly suboptimal.

A better alternative would be to somehow train the entire system so as to minimize a global error measure such as the probability of character misclassifications at the document level. Ideally, we would want to find a good minimum of this global loss function with respect to all the parameters in the system. If the loss function E measuring the performance can be made differentiable with respect to the system's tunable parameters W, we can find a local minimum of E using Gradient-Based Learning. However, at first glance, it appears that the sheer size and complexity of the system would make this intractable.

To ensure that the global loss function E^p(Z^p, W) is differentiable, the overall system is built as a feed-forward network of differentiable modules. The function implemented by each module must be continuous and differentiable almost everywhere with respect to the internal parameters of the module (e.g. the weights of a Neural Net character recognizer, in the case of a character recognition module), and with respect to the module's inputs. If this is the case, a simple generalization of the well-known back-propagation procedure can be used to efficiently compute the gradients of the loss function with respect to all the parameters in the system. For example, let us consider a system built as a cascade of modules, each of which implements a function X_n = F_n(W_n, X_{n-1}), where X_n is a vector representing the output of the module, W_n is the vector of tunable parameters in the module (a subset of W), and X_{n-1} is the module's input vector (as well as the previous module's output vector). The input X_0 to the first module is the input pattern Z^p. If the partial derivative of E^p with respect to X_n is known, then the partial derivatives of E^p with respect to W_n and X_{n-1} can be computed using the backward recurrence

    ∂E^p/∂W_n = (∂F/∂W)(W_n, X_{n-1}) · ∂E^p/∂X_n
    ∂E^p/∂X_{n-1} = (∂F/∂X)(W_n, X_{n-1}) · ∂E^p/∂X_n

where (∂F/∂W)(W_n, X_{n-1}) is the Jacobian of F with respect to W evaluated at the point (W_n, X_{n-1}), and (∂F/∂X)(W_n, X_{n-1}) is the Jacobian of F with respect to X. The Jacobian of a vector function is a matrix containing the partial derivatives of all the outputs with respect to all the inputs. The first equation computes some terms of the gradient of E^p(W), while the second equation generates a backward recurrence, as in the well-known back-propagation procedure for neural networks. We can average the gradients over the training patterns to obtain the full gradient. It is interesting to note that in many instances there is no need to explicitly compute the Jacobian matrix. The above formula uses the product of the Jacobian with a vector of partial derivatives, and it is often easier to compute this product directly without computing the Jacobian beforehand. By analogy with ordinary multi-layer neural networks, all but the last module are called hidden layers because their outputs are not observable from the outside. In more complex situations than the simple cascade of modules described above, the partial derivative notation becomes somewhat ambiguous and awkward. A completely rigorous derivation in more general cases can be done using Lagrange functions.
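As an illustration of the backward recurrence above, the sketch below runs back-propagation through a cascade of generic modules using only Jacobian-vector products, as the text suggests. The two-module example (an affine map followed by a tanh), the squared-error loss, and all names are assumptions made for the illustration.

    import numpy as np

    class Affine:
        # Module X_n = W X_{n-1} + b, with tunable parameters (W, b).
        def __init__(self, W, b):
            self.W, self.b = W, b
        def forward(self, x):
            self.x = x
            return self.W @ x + self.b
        def backward(self, dE_dout):
            # Jacobian-vector products: the Jacobians are never formed explicitly.
            dE_dW = np.outer(dE_dout, self.x)   # dE/dW_n
            dE_db = dE_dout                     # dE/db_n
            dE_dx = self.W.T @ dE_dout          # dE/dX_{n-1}
            return (dE_dW, dE_db), dE_dx

    class Tanh:
        # Parameterless squashing module.
        def forward(self, x):
            self.y = np.tanh(x)
            return self.y
        def backward(self, dE_dout):
            return (), (1.0 - self.y ** 2) * dE_dout

    def backprop(modules, z, target):
        # Forward pass through the cascade, then the backward recurrence.
        x = z
        for m in modules:
            x = m.forward(x)
        dE_dx = x - target            # gradient of 0.5 * ||x - target||^2
        grads = []
        for m in reversed(modules):
            g, dE_dx = m.backward(dE_dx)
            grads.append(g)
        return list(reversed(grads))

    rng = np.random.default_rng(0)
    net = [Affine(rng.normal(size=(5, 8)), np.zeros(5)), Tanh()]
    grads = backprop(net, rng.normal(size=8), np.zeros(5))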

Traditional multi-layer neural networks are a special case of the above, where the state information X_n is represented with fixed-sized vectors, and where the modules are alternated layers of matrix multiplications (the weights) and component-wise sigmoid functions (the neurons). However, as stated earlier, the state information in complex recognition systems is best represented by graphs with numerical information attached to the arcs. In this case, each module, called a Graph Transformer, takes one or more graphs as input and produces a graph as output. Networks of such modules are called Graph Transformer Networks (GTN). Sections IV, VI, and VIII develop the concept of GTNs, and show that Gradient-Based Learning can be used to train all the parameters in all the modules so as to minimize a global loss function. It may seem paradoxical that gradients can be computed when the state information is represented by essentially discrete objects such as graphs, but that difficulty can be circumvented, as shown later.

II. Convolutional Neural Networks for Isolated Character Recognition

The ability of multi-layer networks trained with gradient descent to learn complex, high-dimensional, non-linear mappings from large collections of examples makes them obvious candidates for image recognition tasks. In the traditional model of pattern recognition, a hand-designed feature extractor gathers relevant information from the input and eliminates irrelevant variabilities. A trainable classifier then categorizes the resulting feature vectors into classes. In this scheme, standard, fully-connected multi-layer networks can be used as classifiers. A potentially more interesting scheme is to rely as much as possible on learning in the feature extractor itself. In the case of character recognition, a network could be fed with almost raw inputs (e.g. size-normalized images). While this can be done with an ordinary fully-connected feed-forward network with some success for tasks such as character recognition, there are problems.

Firstly, typical images are large, often with several hundred variables (pixels). A fully-connected first layer with, say, one hundred hidden units would already contain several tens of thousands of weights. Such a large number of parameters increases the capacity of the system and therefore requires a larger training set. In addition, the memory requirement to store so many weights may rule out certain hardware implementations. But the main deficiency of unstructured nets for image or speech applications is that they have no built-in invariance with respect to translations or local distortions of the inputs. Before being sent to the fixed-size input layer of a neural net, character images (or other 2D or 1D signals) must be approximately size-normalized and centered in the input field. Unfortunately, no such preprocessing can be perfect: handwriting is often normalized at the word level, which can cause size, slant, and position variations for individual characters. This, combined with variability in writing style, will cause variations in the position of distinctive features in input objects. In principle, a fully-connected network of sufficient size could learn to produce outputs that are invariant with respect to such variations. However, learning such a task would probably result in multiple units with similar weight patterns positioned at various locations in the input, so as to detect distinctive features wherever they appear on the input. Learning these weight configurations requires a very large number of training instances to cover the space of possible variations. In convolutional networks, described below, shift invariance is automatically obtained by forcing the replication of weight configurations across space.

Secondly, a deficiency of fully-connected architectures is that the topology of the input is entirely ignored. The input variables can be presented in any (fixed) order without affecting the outcome of the training. On the contrary, images (or time-frequency representations of speech) have a strong 2D local structure: variables (or pixels) that are spatially or temporally nearby are highly correlated. Local correlations are the reason for the well-known advantages of extracting and combining local features before recognizing spatial or temporal objects, because configurations of neighboring variables can be classified into a small number of categories (e.g. edges, corners...). Convolutional Networks force the extraction of local features by restricting the receptive fields of hidden units to be local.

A. Convolutional Networks

Convolutional Networks combine three architectural ideas to ensure some degree of shift, scale, and distortion invariance: local receptive fields, shared weights (or weight replication), and spatial or temporal sub-sampling. A typical convolutional network for recognizing characters, dubbed LeNet-5, is shown in figure 2. The input plane receives images of characters that are approximately size-normalized and centered. Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer. The idea of connecting units to local receptive fields on the input goes back to the Perceptron in the early 60s, and was almost simultaneous with Hubel and Wiesel's discovery of locally-sensitive, orientation-selective neurons in the cat's visual system. Local connections have been used many times in neural models of visual learning. With local receptive fields, neurons can extract elementary visual features such as oriented edges, end-points, and corners (or similar features in other signals such as speech spectrograms). These features are then combined by the subsequent layers in order to detect higher-order features. As stated earlier, distortions or shifts of the input can cause the position of salient features to vary. In addition, elementary feature detectors that are useful on one part of the image are likely to be useful across the entire image. This knowledge can be applied by forcing a set of units, whose receptive fields are located at different places on the image, to have identical weight vectors. Units in a layer are organized in planes within which all the units share the same set of weights. The set of outputs of the units in such a plane is called a feature map. Units in a feature map are all constrained to perform the same operation on different parts of the image. A complete convolutional layer is composed of several feature maps (with different weight vectors), so that multiple features can be extracted at each location. A concrete example of this is the first layer of LeNet-5, shown in figure 2. Units in the first hidden layer of LeNet-5 are organized in 6 planes, each of which is a feature map. A unit in a feature map has 25 inputs connected to a 5 by 5 area in the input, called the receptive field of the unit. Each unit therefore has 25 trainable coefficients plus a trainable bias. The receptive fields of contiguous units in a feature map are centered on correspondingly contiguous units in the previous layer. Therefore the receptive fields of neighboring units overlap. For example, in the first hidden layer of LeNet-5, the receptive fields of horizontally contiguous units overlap by 4 columns and 5 rows. As stated earlier, all the units in a feature map share the same set of 25 weights and the same bias, so they detect the same feature at all possible locations on the input. The other feature maps in the layer use different sets of weights and biases, thereby extracting different types of local features. In the case of LeNet-5, at each input location six different types of features are extracted by six units in identical locations in the six feature maps. A sequential implementation of a feature map would scan the input image with a single unit that has a local receptive field, and store the states of this unit at corresponding locations in the feature map. This operation is equivalent to a convolution, followed by an additive bias and a squashing function, hence the name convolutional network. The kernel of the convolution is the set of connection weights used by the units in the feature map. An interesting property of convolutional layers is that if the input image is shifted, the feature map output will be shifted by the same amount, but will be left unchanged otherwise. This property is at the basis of the robustness of convolutional networks to shifts and distortions of the input.
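The sequential implementation described above (scan a single shared-weight unit over the image, then add a bias and squash the result) can be sketched as follows. The input size, the tanh squashing, and the function names are assumptions made for the example.

    import numpy as np

    def feature_map(image, kernel, bias):
        # One feature map: slide a single 5x5 shared-weight unit over the
        # input, add the shared bias, and pass the sum through a squashing
        # function. Output is (H-4) x (W-4), e.g. 28x28 for a 32x32 input.
        kh, kw = kernel.shape
        H, W = image.shape
        out = np.empty((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                receptive_field = image[i:i + kh, j:j + kw]
                out[i, j] = np.sum(receptive_field * kernel) + bias
        return np.tanh(out)   # squashing function

    # A convolutional layer is several feature maps, each with its own
    # kernel and bias, all applied to the same input.
    rng = np.random.default_rng(0)
    image = rng.normal(size=(32, 32))
    kernels = rng.normal(scale=0.1, size=(6, 5, 5))
    biases = np.zeros(6)
    layer_output = np.stack([feature_map(image, k, b)
                             for k, b in zip(kernels, biases)])
    print(layer_output.shape)   # (6, 28, 28)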

Once a feature has been detected, its exact location becomes less important. Only its approximate position relative to other features is relevant. For example, once we know that the input image contains the endpoint of a roughly horizontal segment in the upper left area, a corner in the upper right area, and the endpoint of a roughly vertical segment in the lower portion of the image, we can tell the input image is a 7. Not only is the precise position of each of those features irrelevant for identifying the pattern, it is potentially harmful because the positions are likely to vary for different instances of the character. A simple way to reduce the precision with which the position of distinctive features is encoded in a feature map is to reduce the spatial resolution of the feature map. This can be achieved with so-called sub-sampling layers, which perform a local averaging and a sub-sampling, reducing the resolution of the feature map and reducing the sensitivity of the output to shifts and distortions. The second hidden layer of LeNet-5 is a sub-sampling layer. This layer comprises six feature maps, one for each feature map in the previous layer. The receptive field of each unit is a 2 by 2 area in the previous layer's corresponding feature map. Each unit computes the average of its four inputs, multiplies it by a trainable coefficient, adds a trainable bias, and passes the result through a sigmoid function. Contiguous units have non-overlapping contiguous receptive fields. Consequently, a sub-sampling layer feature map has half the number of rows and columns as the feature maps in the previous layer. The trainable coefficient and bias control the effect of the sigmoid non-linearity. If the coefficient is small, then the unit operates in a quasi-linear mode, and the sub-sampling layer merely blurs the input. If the coefficient is large, sub-sampling units can be seen as performing a "noisy OR" or a "noisy AND" function, depending on the value of the bias.
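A sub-sampling feature map of this kind can be sketched as below: non-overlapping 2x2 averages, scaled by one shared coefficient and shifted by one shared bias, then squashed. The tanh squashing and the function name are assumptions made for the example.

    import numpy as np

    def subsample_map(feature_map, coeff, bias):
        # Average each non-overlapping 2x2 block, multiply by the trainable
        # coefficient, add the trainable bias, and squash. The output has
        # half the rows and columns of the input (e.g. 28x28 -> 14x14).
        H, W = feature_map.shape
        blocks = feature_map.reshape(H // 2, 2, W // 2, 2)
        local_avg = blocks.mean(axis=(1, 3))
        return np.tanh(coeff * local_avg + bias)

    rng = np.random.default_rng(0)
    c1_map = rng.normal(size=(28, 28))
    s2_map = subsample_map(c1_map, coeff=0.5, bias=0.0)
    print(s2_map.shape)   # (14, 14)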

Successive layers of convolutions and sub-sampling are typically alternated, resulting in a "bi-pyramid": at each layer, the number of feature maps is increased as the spatial resolution is decreased. Each unit in the third hidden layer in figure 2 may have input connections from several feature maps in the previous layer. The convolution/sub-sampling combination, inspired by Hubel and Wiesel's notions of "simple" and "complex" cells, was implemented in Fukushima's Neocognitron, though no globally supervised learning procedure such as back-propagation was available then. A large degree of invariance to geometric transformations of the input can be achieved with this progressive reduction of spatial resolution, compensated by a progressive increase of the richness of the representation (the number of feature maps).

Since all the weights are learned with back-propagation, convolutional networks can be seen as synthesizing their own feature extractor. The weight sharing technique has the interesting side effect of reducing the number of free parameters, thereby reducing the "capacity" of the machine and reducing the gap between test error and training error. The network in figure 2 contains roughly 340,000 connections, but only 60,000 trainable free parameters, because of the weight sharing.

Fixed-size Convolutional Networks have been applied to many applications, among others handwriting recognition, machine-printed character recognition, on-line handwriting recognition, and face recognition. Fixed-size convolutional networks that share weights along a single temporal dimension are known as Time-Delay Neural Networks (TDNNs). TDNNs have been used in phoneme recognition (without sub-sampling), spoken word recognition (with sub-sampling), on-line recognition of isolated handwritten characters, and signature verification.

[Fig. 2. Architecture of LeNet-5, a Convolutional Neural Network, here for digit recognition: INPUT 32x32 -> C1: 6 feature maps 28x28 (convolutions) -> S2: 6 feature maps 14x14 (sub-sampling) -> C3: 16 feature maps 10x10 (convolutions) -> S4: 16 feature maps 5x5 (sub-sampling) -> C5: 120 units (full connection) -> F6: 84 units (full connection) -> OUTPUT: 10 units (Gaussian connections). Each plane is a feature map, i.e. a set of units whose weights are constrained to be identical.]

B. LeNet-5

This section describes in more detail the architecture of LeNet-5, the Convolutional Neural Network used in the experiments. LeNet-5 comprises 7 layers, not counting the input, all of which contain trainable parameters (weights). The input is a 32x32 pixel image. This is significantly larger than the largest character in the database (at most 20x20 pixels centered in a 28x28 field). The reason is that it is desirable that potential distinctive features such as stroke end-points or corners can appear in the center of the receptive field of the highest-level feature detectors. In LeNet-5, the set of centers of the receptive fields of the last convolutional layer (C3, see below) form a 20x20 area in the center of the 32x32 input. The values of the input pixels are normalized so that the background level (white) corresponds to a value of -0.1 and the foreground (black) corresponds to 1.175. This makes the mean input roughly 0 and the variance roughly 1, which accelerates learning.

In the following, convolutional layers are labeled Cx, sub-sampling layers are labeled Sx, and fully-connected layers are labeled Fx, where x is the layer index.

Layer C1 is a convolutional layer with 6 feature maps. Each unit in each feature map is connected to a 5x5 neighborhood in the input. The size of the feature maps is 28x28, which prevents connections from the input from falling off the boundary. C1 contains 156 trainable parameters and 122,304 connections.

Layer S2 is a sub-sampling layer with 6 feature maps of size 14x14. Each unit in each feature map is connected to a 2x2 neighborhood in the corresponding feature map in C1. The four inputs to a unit in S2 are added, then multiplied by a trainable coefficient, and added to a trainable bias. The result is passed through a sigmoidal function. The 2x2 receptive fields are non-overlapping; therefore feature maps in S2 have half the number of rows and columns as feature maps in C1. Layer S2 has 12 trainable parameters and 5,880 connections.

Layer C3 is a convolutional layer with 16 feature maps. Each unit in each feature map is connected to several 5x5 neighborhoods at identical locations in a subset of S2's feature maps. Table I shows the set of S2 feature maps combined by each C3 feature map. Why not connect every S2 feature map to every C3 feature map? The reason is twofold. First, a non-complete connection scheme keeps the number of connections within reasonable bounds. More importantly, it forces a break of symmetry in the network: different feature maps are forced to extract different (hopefully complementary) features because they get different sets of inputs. The rationale behind the connection scheme in Table I is the following. The first six C3 feature maps take inputs from every contiguous subset of three feature maps in S2. The next six take input from every contiguous subset of four. The next three take input from some discontinuous subsets of four. Finally, the last one takes input from all S2 feature maps. Layer C3 has 1,516 trainable parameters and 151,600 connections.

TABLE I
Each column indicates which S2 feature maps are combined by the units in a particular feature map of C3.

         0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    0    X . . . X X X . . X  X  X  X  .  X  X
    1    X X . . . X X X . .  X  X  X  X  .  X
    2    X X X . . . X X X .  .  X  .  X  X  X
    3    . X X X . . X X X X  .  .  X  .  X  X
    4    . . X X X . . X X X  X  .  X  X  .  X
    5    . . . X X X . . X X  X  X  .  X  X  X

Layer S4 is a sub-sampling layer with 16 feature maps of size 5x5. Each unit in each feature map is connected to a 2x2 neighborhood in the corresponding feature map in C3, in a similar way as C1 and S2. Layer S4 has 32 trainable parameters and 2,000 connections.

Layer C5 is a convolutional layer with 120 feature maps. Each unit is connected to a 5x5 neighborhood on all 16 of S4's feature maps. Here, because the size of S4 is also 5x5, the size of C5's feature maps is 1x1: this amounts to a full connection between S4 and C5. C5 is labeled as a convolutional layer, instead of a fully-connected layer, because if the LeNet-5 input were made bigger with everything else kept constant, the feature map dimension would be larger than 1x1. This process of dynamically increasing the size of a convolutional network is described in Section VII. Layer C5 has 48,120 trainable connections.

Layer F6 contains 84 units (the reason for this number comes from the design of the output layer, explained below) and is fully connected to C5. It has 10,164 trainable parameters.
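As a check on the layer sizes quoted above, the short sketch below recomputes the trainable parameter count of each layer of LeNet-5 from the connection pattern just described; the helper name is an assumption, and the totals simply restate the counts given in the text.

    # Trainable parameters per layer, derived from the architecture above.
    conv_params = lambda n_maps, n_inputs: n_maps * (n_inputs * 5 * 5 + 1)

    c1 = conv_params(6, 1)                         # 156
    s2 = 6 * 2                                     # coefficient + bias per map = 12
    # C3: 6 maps see 3 S2 maps, 6 see 4, 3 see 4 (discontinuous), 1 sees all 6.
    c3 = conv_params(6, 3) + conv_params(6, 4) + conv_params(3, 4) + conv_params(1, 6)
    s4 = 16 * 2                                    # 32
    c5 = conv_params(120, 16)                      # 48,120
    f6 = 84 * (120 + 1)                            # 10,164

    print(c1, s2, c3, s4, c5, f6)                  # 156 12 1516 32 48120 10164
    print(c1 + s2 + c3 + s4 + c5 + f6)             # 60,000 trainable parameters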

than the more common of N co de also called place

As in classical neural networks units in layers up to F

co de or grandmother cell co de for the outputs is that

compute a dot pro duct b etween their input vector and their

non distributed co des tend to b ehave badly when the num

weightvector to whichabiasisadded This weighted sum

ber of classes is larger than a few dozens The reason is

denoted a for unit i is then passed through a sigmoid

i

that output units in a nondistributed co de must be o

squashing function to pro duce the state of unit i denoted

most of the time This is quite dicult to achieve with

by x

i

sigmoid units Yet another reason is that the classiers are

x f a

i i

haracters but also to re often used to not only recognize c

ject noncharacters RBFs with distributed co des are more

The squashing function is a scaled hyp erb olic tangent

appropriate for that purp ose b ecause unlike sigmoids they

are activated within a well circumscrib ed region of their in f aA tanhSa
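A minimal sketch of one F6-style unit with this squashing function follows; the slope value S used here is only a placeholder, since the text defers the exact choice to Appendix A.

    import numpy as np

    A = 1.7159     # amplitude of the squashing function (from the text)
    S = 2.0 / 3.0  # slope at the origin; placeholder value, see Appendix A

    def squash(a):
        # Scaled hyperbolic tangent f(a) = A tanh(S a).
        return A * np.tanh(S * a)

    def f6_unit(x, w, b):
        # Dot product between input and weight vector, plus bias, then squashing.
        return squash(np.dot(w, x) + b)

    state = f6_unit(np.ones(120), np.full(120, 0.01), 0.0)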

Finally, the output layer is composed of Euclidean Radial Basis Function (RBF) units, one for each class, with 84 inputs each. The output of each RBF unit y_i is computed as follows:

    y_i = Σ_j (x_j - w_ij)^2.

In other words, each output RBF unit computes the Euclidean distance between its input vector and its parameter vector. The further away the input is from the parameter vector, the larger the RBF output. The output of a particular RBF can be interpreted as a penalty term measuring the fit between the input pattern and a model of the class associated with the RBF. In probabilistic terms, the RBF output can be interpreted as the unnormalized negative log-likelihood of a Gaussian distribution in the space of configurations of layer F6. Given an input pattern, the loss function should be designed so as to get the configuration of F6 as close as possible to the parameter vector of the RBF that corresponds to the pattern's desired class.
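The sketch below computes these Euclidean RBF outputs for all classes at once; the array names and the random F6 state are assumptions made for the example.

    import numpy as np

    def rbf_outputs(x, W):
        # x: the 84-dimensional state of layer F6.
        # W: one 84-dimensional parameter (target) vector per class, as rows.
        # y_i = sum_j (x_j - w_ij)^2, i.e. squared Euclidean distance per class.
        return np.sum((W - x) ** 2, axis=1)

    rng = np.random.default_rng(0)
    x = rng.normal(size=84)
    W = rng.choice([-1.0, 1.0], size=(10, 84))   # stylized +/-1 targets, one per digit
    y = rbf_outputs(x, W)
    predicted_class = int(np.argmin(y))           # smallest penalty wins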

The parameter vectors of these units were chosen by hand and kept fixed (at least initially). The components of those parameter vectors were set to -1 or +1. While they could have been chosen at random with equal probabilities for -1 and +1, or even chosen to form an error correcting code, they were instead designed to represent a stylized image of the corresponding character class drawn on a 7x12 bitmap (hence the number 84). Such a representation is not particularly useful for recognizing isolated digits, but it is quite useful for recognizing strings of characters taken from the full printable ASCII set. The rationale is that characters that are similar, and therefore confusable, such as uppercase O, lowercase o, and zero, or lowercase l, digit 1, square brackets, and uppercase I, will have similar output codes. This is particularly useful if the system is combined with a linguistic post-processor that can correct such confusions. Because the codes for confusable classes are similar, the outputs of the corresponding RBFs for an ambiguous character will be similar, and the post-processor will be able to pick the appropriate interpretation. Figure 3 gives the output codes for the full ASCII set.

[Fig. 3. Initial parameters of the output RBFs for recognizing the full ASCII set: stylized 7x12 bitmaps of the printable ASCII characters, ! through ~.]

Another reason for using such distributed codes, rather than the more common "1 of N" code (also called place code, or grandmother cell code), is that non-distributed codes tend to behave badly when the number of classes is larger than a few dozens. The reason is that output units in a non-distributed code must be off most of the time. This is quite difficult to achieve with sigmoid units. Yet another reason is that the classifiers are often used not only to recognize characters, but also to reject non-characters. RBFs with distributed codes are more appropriate for that purpose because, unlike sigmoids, they are activated within a well-circumscribed region of their input space, outside of which non-typical patterns are more likely to fall.

The parameter vectors of the RBFs play the role of target vectors for layer F6. It is worth pointing out that the components of those vectors are +1 or -1, which is well within the range of the sigmoid of F6, and therefore prevents those sigmoids from getting saturated. In fact, +1 and -1 are the points of maximum curvature of the sigmoids. This forces the F6 units to operate in their maximally non-linear range. Saturation of the sigmoids must be avoided because it is known to lead to slow convergence and ill-conditioning of the loss function.

C. Loss Function

The simplest output loss function that can be used with the above network is the Maximum Likelihood Estimation criterion (MLE), which in our case is equivalent to the Minimum Mean Squared Error (MSE). The criterion for a set of training samples is simply

    E(W) = (1/P) Σ_p y_{D^p}(Z^p, W),

where y_{D^p} is the output of the D^p-th RBF unit, i.e. the one that corresponds to the correct class of input pattern Z^p. While this cost function is appropriate for most cases, it lacks three important properties. First, if we allow the parameters of the RBFs to adapt, E(W) has a trivial, but totally unacceptable, solution: all the RBF parameter vectors are equal, and the state of F6 is constant and equal to that parameter vector. In this case the network happily ignores the input, and all the RBF outputs are equal to zero. This collapsing phenomenon does not occur if the RBF weights are not allowed to adapt. The second problem is that there is no competition between the classes. Such a competition can be obtained by using a more discriminative training criterion, dubbed the MAP (maximum a posteriori) criterion, similar to the Maximum Mutual Information criterion sometimes used to train HMMs. It corresponds to maximizing the posterior probability of the correct class D^p (or minimizing the logarithm of the probability of the correct class), given that the input image can come from one of the classes or from a background "rubbish" class label. In terms of penalties, it means that in addition to pushing down the penalty of the correct class like the MSE criterion, this criterion also pulls up the penalties of the incorrect classes:

    E(W) = (1/P) Σ_p ( y_{D^p}(Z^p, W) + log( e^{-j} + Σ_i e^{-y_i(Z^p, W)} ) ).

The negative of the second term plays a "competitive" role. It is necessarily smaller than (or equal to) the first term; therefore this loss function is positive. The constant j is positive, and prevents the penalties of classes that are already very large from being pushed further up. The posterior probability of the rubbish class label would be the ratio of e^{-j} to e^{-j} + Σ_i e^{-y_i(Z^p, W)}. This discriminative criterion prevents the previously mentioned "collapsing effect" when the RBF parameters are learned, because it keeps the RBF centers apart from each other. In Section VI, we present a generalization of this criterion for systems that learn to classify multiple objects in the input (e.g. characters in words or in documents).
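A direct transcription of this criterion, using the RBF penalties from the earlier sketch, might look as follows; the value chosen for the constant j and the function names are assumptions made for the illustration.

    import numpy as np

    def map_loss(penalties, correct_class, j=1.0):
        # penalties: array of RBF outputs y_i(Z^p, W) for one pattern.
        # Push down the penalty of the correct class and pull up the others
        # through the log-sum-exp competitive term (j is the rubbish-class
        # constant; 1.0 is an arbitrary placeholder value).
        competition = np.log(np.exp(-j) + np.sum(np.exp(-penalties)))
        return penalties[correct_class] + competition

    def map_criterion(all_penalties, labels, j=1.0):
        # Average the per-pattern losses over the training set.
        return np.mean([map_loss(y, d, j) for y, d in zip(all_penalties, labels)])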

Computing the gradient of the loss function with respect to all the weights in all the layers of the convolutional network is done with back-propagation. The standard algorithm must be slightly modified to take account of the weight sharing. An easy way to implement it is to first compute the partial derivatives of the loss function with respect to each connection, as if the network were a conventional multi-layer network without weight sharing. Then the partial derivatives of all the connections that share a same parameter are added to form the derivative with respect to that parameter.
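In code, that accumulation step is just a sum over the connections tied to each shared parameter; the data layout below (a vector of per-connection gradients plus an index of which shared parameter each connection uses) is an assumption made for the sketch.

    import numpy as np

    def shared_parameter_gradients(connection_grads, param_index, n_params):
        # connection_grads: gradient of the loss w.r.t. each individual
        #   connection, computed as if no weights were shared.
        # param_index: for each connection, the id of the shared parameter it uses.
        # The gradient w.r.t. a shared parameter is the sum over its connections.
        grads = np.zeros(n_params)
        np.add.at(grads, param_index, connection_grads)
        return grads

    # Example: 4 connections tied to 2 shared weights (ids 0, 0, 1, 1).
    g = shared_parameter_gradients(np.array([0.1, -0.2, 0.3, 0.4]),
                                   np.array([0, 0, 1, 1]), n_params=2)
    # g == [-0.1, 0.7]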

Such a large architecture can be trained very efficiently, but doing so requires the use of a few techniques that are described in the appendix. Section A of the appendix describes details such as the particular sigmoid used and the weight initialization. Sections B and C describe the minimization procedure used, which is a stochastic version of a diagonal approximation to the Levenberg-Marquardt procedure.

III. Results and Comparison with Other Methods

While recognizing individual digits is only one of many problems involved in designing a practical recognition system, it is an excellent benchmark for comparing shape recognition methods. Though many existing methods combine a hand-crafted feature extractor and a trainable classifier, this study concentrates on adaptive methods that operate directly on size-normalized images.

A. Database: the Modified NIST set

The database used to train and test the systems described in this paper was constructed from NIST's Special Database 3 and Special Database 1, which contain binary images of handwritten digits. NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this can be found in the fact that SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test set among the complete set of samples. Therefore it was necessary to build a new database by mixing NIST's datasets.

SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 are available, and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern #0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern #35,000 to make a full set of 60,000 test patterns. In the experiments described here, we only used a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3), but we used the full 60,000 training samples. The resulting database was called the Modified NIST, or MNIST, dataset.

The original black and white (bilevel) images were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing (image interpolation) technique used by the normalization algorithm. Three versions of the database were used. In the first version, the images were centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field. In some instances, this 28x28 field was extended to 32x32 with background pixels. This version of the database will be referred to as the regular database. In the second version of the database, the character images were deslanted and cropped down to 20x20 pixel images. The deslanting computes the second moments of inertia of the pixels (counting a foreground pixel as 1 and a background pixel as 0), and shears the image by horizontally shifting the lines so that the principal axis is vertical. This version of the database will be referred to as the deslanted database. In the third version of the database, used in some early experiments, the images were reduced to 16x16 pixels. The regular database (60,000 training examples, 10,000 test examples, size-normalized to 20x20 and centered by center of mass in 28x28 fields) is available at http://www.research.att.com/~yann/ocr/mnist. Figure 4 shows examples randomly picked from the test set.

[Fig. 4. Size-normalized examples from the MNIST database.]

B. Results

Several versions of LeNet-5 were trained on the regular MNIST database. 20 iterations through the entire training data were performed for each session. The values of the global learning rate η (defined in Appendix C) were decreased using the following schedule: 0.0005 for the first two passes, 0.0002 for the next three, 0.0001 for the next three, 0.00005 for the next 4, and 0.00001 thereafter. Before each iteration, the diagonal Hessian approximation was reevaluated on 500 samples, as described in Appendix C, and kept fixed during the entire iteration. The parameter μ was set to 0.02. The resulting effective learning rates during the first pass varied between approximately 7x10^-5 and 0.016 over the set of parameters. The test error rate stabilizes after around 10 passes through the training set at 0.95%. The error rate on the training set reaches 0.35% after 19 passes.
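Read literally, the learning-rate schedule above can be written as a small lookup; the function name is an assumption, and the pass indices simply follow the counts given in the text (2 + 3 + 3 + 4 passes, then the final rate).

    def global_learning_rate(pass_index):
        # pass_index counts passes through the training set, starting at 0.
        schedule = [(2, 0.0005), (5, 0.0002), (8, 0.0001), (12, 0.00005)]
        for last_pass, rate in schedule:
            if pass_index < last_pass:
                return rate
        return 0.00001  # thereafter

    rates = [global_learning_rate(k) for k in range(20)]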

the images were centered in a x image by comput

have rep orted observing the common phenomenon of over

ing the center of mass of the pixels and translating the

training when training neural networks or other adaptive

image so as to p osition this point at the center of the

algorithms on various tasks When overtraining o ccurs

x eld In some instances this x eld was ex

the training error keeps decreasing over time but the test

tended to x with background pixels This version of

error go es through a minimum and starts increasing after

the database will be referred to as the regular database

a certain numb er of iterations While this phenomenon is

In the second version of the database the character im

very common it was not observed in our case as the learn

ages were deslanted and cropp ed down to x pixels im

ing curves in gure show A p ossible reason is that the

ages The deslanting computes the second moments of in

learning rate was k ept relatively large The eect of this is

ertia of the pixels counting a foreground pixel as and a

that the weights never settle down in the lo cal minimum

background pixel as and shears the image by horizon

but keep oscillating randomly Because of those uctua

tally shifting the lines so that the principal axis is verti

tions the average cost will b e lower in a broader minimum

cal This version of the database will b e referred to as the

Therefore sto chastic gradient will have a similar eect as

version of the database deslanted database In the third

a regularization term that favors broader minima Broader

used in some early exp eriments the images were reduced

minima corresp ond to solutions with large entropyof the

to x pixels The regular database training

parameter distribution which is b enecial to the general

examples test examples sizenormalized to x

ization error

and centered by center of mass in x elds is avail

able at httpwwwresearchattcomyannocrmnist

The inuence of the training set size was measured by

Figure shows examples randomly picked from the test set

training the network with and exam

ples The resulting training error and test error are shown

B Results

in gure It is clear that even with sp ecialized architec

tures such as LeNet more training data would improve

Several versions of LeNet were trained on the regular

the accuracy

MNIST database iterations through the entire train

ing data were p erformed for each session The values of Toverify this hyp othesis we articially generated more

the global learning rate see Equation in App endix C training examples by randomly distorting the original

for a denition was decreased using the following sched training images The increased training set was comp osed

for the rst two passes for the next plus instances of ule of the original patterns

PROC OF THE IEEE NOVEMBER

[Figure: error rate (%) versus number of training set iterations, with curves for the training and test errors.]
Fig. Training and test error of LeNet-5 as a function of the number of passes through the 60,000-pattern training set (without distortions). The average training error is measured on the fly as training proceeds; this explains why the training error appears to be larger than the test error at first. Convergence is attained after roughly ten passes through the training set.

Fig. Examples of distortions of ten training patterns.

[Figure: training and test error rate (%) of LeNet-5 versus training set size (x1000), with and without artificially distorted training data.]
Fig. Training and test errors of LeNet-5 achieved using training sets of various sizes. This graph suggests that a larger training set could improve the performance of LeNet-5. The hollow squares show the test error when more training patterns are artificially generated using random distortions. The test patterns are not distorted.

[Figure: the test patterns misclassified by LeNet-5, each shown with its correct label and the network's answer.]
Fig. The test patterns misclassified by LeNet-5. Below each image is displayed the correct answer (left) and the network answer (right). These errors are mostly caused either by genuinely ambiguous patterns or by digits written in a style that is under-represented in the training set.

The increased training set was composed of the 60,000 original patterns plus a large number of instances of distorted patterns with randomly picked distortion parameters. The distortions were combinations of the following planar affine transformations: horizontal and vertical translations, scaling, squeezing (simultaneous horizontal compression and vertical elongation, or the reverse), and horizontal shearing. The figure above shows examples of distorted patterns used for training. When distorted data was used for training, the test error rate dropped to 0.8% (from 0.95% without deformation). The same training parameters were used as without deformations. The total length of the training session was left unchanged (20 passes of 60,000 patterns each). It is interesting to note that the network effectively sees each individual sample only twice over the course of these 20 passes.

The figure above shows the misclassified test examples. Some of those examples are genuinely ambiguous, but several are perfectly identifiable by humans, although they are written in an under-represented style. This shows that further improvements are to be expected with more training data.

C. Comparison with Other Classifiers

For the sake of comparison, a variety of other trainable classifiers was trained and tested on the same database. An early subset of these results was presented in an earlier publication. The error rates on the test set for the various methods are shown in the figure below.


[Figure: test-set error rates (%): Linear 12.0; [deslant] Linear 8.4; Pairwise 7.6; K-NN Euclidean 5.0; [deslant] K-NN Euclidean 2.4; 40 PCA + quadratic 3.3; 1000 RBF + linear 3.6; [16x16] Tangent Distance 1.1; SVM poly 4 1.1; RS-SVM poly 5 1.0; [dist] V-SVM poly 9 0.8; 28x28-300-10 4.7; [dist] 28x28-300-10 3.6; [deslant] 20x20-300-10 1.6; 28x28-1000-10 4.5; [dist] 28x28-1000-10 3.8; 28x28-300-100-10 3.05; [dist] 28x28-300-100-10 2.5; 28x28-500-150-10 2.95; [dist] 28x28-500-150-10 2.45; [16x16] LeNet-1 1.7; LeNet-4 1.1; LeNet-4 / Local 1.1; LeNet-4 / K-NN 1.1; LeNet-5 0.95; [dist] LeNet-5 0.8; [dist] Boosted LeNet-4 0.7.]
Fig. Error rate on the test set for various classification methods. [deslant] indicates that the classifier was trained and tested on the deslanted version of the database. [dist] indicates that the training set was augmented with artificially distorted examples. [16x16] indicates that the system used the 16x16 pixel images. The uncertainty in the quoted error rates is about 0.1%.

C.1 Linear Classifier, and Pairwise Linear Classifier

Possibly the simplest classifier that one might consider is a linear classifier. Each input pixel value contributes to a weighted sum for each output unit. The output unit with the highest sum (including the contribution of a bias constant) indicates the class of the input character. On the regular data, the error rate is 12.0%; the network has 7,850 free parameters. On the deslanted images, the test error rate is 8.4%; the network has 4,010 free parameters. The deficiencies of the linear classifier are well documented, and it is included here simply to form a basis of comparison for more sophisticated classifiers. Various combinations of sigmoid units, linear units, gradient descent learning, and learning by directly solving linear systems gave similar results.

A simple improvement of the basic linear classifier was tested. The idea is to train each unit of a single-layer network to separate each class from each other class. In our case, this layer comprises 45 units, one for each pair of classes. Unit i/j is trained to produce +1 on patterns of class i, -1 on patterns of class j, and is not trained on other patterns. The final score for class i is the sum of the outputs of all the units labeled i/x minus the sum of the outputs of all the units labeled y/i, for all x and y. The error rate on the regular test set was 7.6%.

C.2 Baseline Nearest Neighbor Classifier

Another simple classifier is a K-nearest-neighbor classifier with a Euclidean distance measure between input images. This classifier has the advantage that no training time, and no brain on the part of the designer, are required. However, the memory requirement and recognition time are large: the complete set of 60,000 twenty-by-twenty pixel training images (about 24 Megabytes at one byte per pixel) must be available at run time. Much more compact representations could be devised with a modest increase in error rate. On the regular test set the error rate was 5.0%. On the deslanted data, the error rate was 2.4%, with k = 3. Naturally, a realistic Euclidean distance nearest-neighbor system would operate on feature vectors rather than directly on the pixels, but since all of the other systems presented in this study operate directly on the pixels, this result is useful for a baseline comparison.
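A minimal sketch of such a Euclidean nearest-neighbour baseline follows (illustrative only; the array names, shapes, and the toy stand-in data are assumptions, and a real implementation would batch the distance computations over the 60,000 templates).

```python
import numpy as np

def knn_euclidean_predict(x, train_images, train_labels, k=3):
    """Classify one image by voting among the k training images
    closest in plain Euclidean (pixel-space) distance."""
    diffs = train_images.reshape(len(train_images), -1) - x.reshape(1, -1)
    dists = np.einsum("ij,ij->i", diffs, diffs)   # squared distances
    nearest = np.argsort(dists)[:k]
    votes = np.bincount(train_labels[nearest], minlength=10)
    return int(np.argmax(votes))

# Usage with random stand-in data (real use: the 60,000 20x20 templates).
rng = np.random.default_rng(0)
train_images = rng.random((1000, 20, 20))
train_labels = rng.integers(0, 10, size=1000)
test_image = rng.random((20, 20))
print(knn_euclidean_predict(test_image, train_images, train_labels))
```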


C.3 Principal Component Analysis (PCA) and Polynomial Classifier

Following earlier work, a preprocessing stage was constructed which computes the projection of the input pattern on the 40 principal components of the set of training vectors. To compute the principal components, the mean of each input component was first computed and subtracted from the training vectors. The covariance matrix of the resulting vectors was then computed and diagonalized using Singular Value Decomposition. The 40-dimensional feature vector was used as the input of a second-degree polynomial classifier. This classifier can be seen as a linear classifier whose inputs are preceded by a module that computes all products of pairs of input variables. The error on the regular test set was 3.3%.

C.4 Radial Basis Function Network

Following earlier work, an RBF network was constructed. The first layer was composed of 1,000 Gaussian RBF units with 28x28 inputs, and the second layer was a simple 1,000-input, 10-output linear classifier. The RBF units were divided into 10 groups of 100. Each group of units was trained on all the training examples of one of the 10 classes using the adaptive K-means algorithm. The second-layer weights were computed using a regularized pseudo-inverse method. The error rate on the regular test set was 3.6%.

C.5 One-Hidden-Layer Fully Connected Multilayer Neural Network

Another classifier that we tested was a fully connected multilayer neural network with two layers of weights (one hidden layer), trained with the version of back-propagation described in Appendix C. The error on the regular test set was 4.7% for a network with 300 hidden units, and 4.5% for a network with 1,000 hidden units. Using artificial distortions to generate more training data brought only marginal improvement: 3.6% for 300 hidden units and 3.8% for 1,000 hidden units. When deslanted images were used, the test error jumped down to 1.6% for a network with 300 hidden units.

It remains somewhat of a mystery that networks with such a large number of free parameters manage to achieve reasonably low testing errors. We conjecture that the dynamics of gradient descent learning in multilayer nets has a "self-regularization" effect. Because the origin of weight space is a saddle point that is attractive in almost every direction, the weights invariably shrink during the first few epochs (recent theoretical analyses seem to confirm this). Small weights cause the sigmoids to operate in the quasi-linear region, making the network essentially equivalent to a low-capacity, single-layer network. As the learning proceeds, the weights grow, which progressively increases the effective capacity of the network. This seems to be an almost perfect, if fortuitous, implementation of Vapnik's "Structural Risk Minimization" principle. A better theoretical understanding of these phenomena, and more empirical evidence, are definitely needed.

C.6 Two-Hidden-Layer Fully Connected Multilayer Neural Network

To see the effect of the architecture, several two-hidden-layer multilayer neural networks were trained. Theoretical results have shown that any function can be approximated by a one-hidden-layer neural network. However, several authors have observed that two-hidden-layer architectures sometimes yield better performance in practical situations. This phenomenon was also observed here. The test error rate of a 28x28-300-100-10 network was 3.05%, a much better result than the one-hidden-layer network, obtained using marginally more weights and connections. Increasing the network size to 28x28-500-150-10 yielded only marginally improved error rates: 2.95%. Training with distorted patterns improved the performance somewhat: 2.5% error for the 28x28-300-100-10 network, and 2.45% for the 28x28-500-150-10 network.

C.7 A Small Convolutional Network: LeNet-1

Convolutional Networks are an attempt to solve the dilemma between small networks that cannot learn the training set and large networks that seem over-parameterized. LeNet-1 was an early embodiment of the Convolutional Network architecture, and it is included here for comparison purposes. The images were down-sampled to 16x16 pixels and centered in the 28x28 input layer. Although about 100,000 multiply/add steps are required to evaluate LeNet-1, its convolutional nature keeps the number of free parameters to only about 3,000. The LeNet-1 architecture was developed using our own version of the USPS (US Postal Service zip codes) database, and its size was tuned to match the available data. LeNet-1 achieved a 1.7% test error. The fact that a network with such a small number of parameters can attain such a good error rate is an indication that the architecture is appropriate for the task.

C.8 LeNet-4

Experiments with LeNet-1 made it clear that a larger convolutional network was needed to make optimal use of the large size of the training set. LeNet-4 and, later, LeNet-5 were designed to address this problem. LeNet-4 is very similar to LeNet-5, except for the details of the architecture. It contains four first-level feature maps, followed by four sub-sampling maps connected in pairs to the first-level feature maps, then sixteen feature maps, followed by sixteen sub-sampling maps, followed by a fully connected layer with 120 units, followed by the output layer (10 units). LeNet-4 contains about 260,000 connections and has about 17,000 free parameters. Test error was 1.1%. In a series of experiments, we replaced the last layer of LeNet-4 with a Euclidean Nearest-Neighbor classifier, and with the "local learning" method of Bottou and Vapnik, in which a local linear classifier is retrained each time a new test pattern is shown. Neither of those methods improved the raw error rate, although they did improve the rejection performance.
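For concreteness, the RBF baseline of Section C.4 above (K-means prototypes for each class, a Gaussian first layer, and a regularized pseudo-inverse fit of the linear second layer) can be sketched as follows. The width parameter, the ridge constant, the small toy data, and the plain K-means routine are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kmeans(X, k, iters=20, rng=None):
    """Plain K-means; returns k prototype vectors."""
    rng = np.random.default_rng() if rng is None else rng
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers

def fit_rbf_classifier(X, y, n_classes=10, per_class=100, sigma=5.0, ridge=1e-3, rng=None):
    """First layer: Gaussian RBF units whose centers are K-means prototypes
    of each class; second layer: linear weights fitted to one-hot targets
    by a regularized pseudo-inverse (ridge) solution."""
    rng = np.random.default_rng() if rng is None else rng
    centers = np.vstack([kmeans(X[y == c], per_class, rng=rng) for c in range(n_classes)])
    def rbf_features(Z):
        d2 = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    Phi = rbf_features(X)
    T = np.eye(n_classes)[y]                      # one-hot targets
    W = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ T)
    return lambda Z: np.argmax(rbf_features(Z) @ W, axis=1)

# Tiny usage example with synthetic data standing in for pixel images.
rng = np.random.default_rng(0)
X = rng.random((2000, 64)); y = rng.integers(0, 10, size=2000)
predict = fit_rbf_classifier(X, y, per_class=5, rng=rng)
print(predict(X[:5]))
```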


C.9 Boosted LeNet-4

Following theoretical work by R. Schapire, Drucker et al. developed the "boosting" method for combining multiple classifiers. Three LeNet-4s are combined: the first one is trained the usual way; the second one is trained on patterns that are filtered by the first net, so that the second machine sees a mix of patterns, half of which the first net got right and half of which it got wrong; finally, the third net is trained on new patterns on which the first and the second nets disagree. During testing, the outputs of the three nets are simply added. Because the error rate of LeNet-4 is very low, it was necessary to use the artificially distorted images (as with LeNet-5) in order to get enough samples to train the second and third nets. The test error rate was 0.7%, the best of any of our classifiers. At first glance, boosting appears to be three times more expensive than a single net. In fact, when the first net produces a high-confidence answer, the other nets are not called; the average computational cost is about 1.75 times that of a single net.

[Figure: percentage of test patterns that must be rejected to achieve 0.5% error: [deslant] K-NN Euclidean 8.1; [16x16] Tangent Distance 1.9; SVM poly 4 1.8; [deslant] 20x20-300-10 3.2; [16x16] LeNet-1 3.7; LeNet-4 1.8; LeNet-4 / Local 1.4; LeNet-4 / K-NN 1.6; [dist] Boosted LeNet-4 0.5.]
Fig. Rejection performance: percentage of test patterns that must be rejected to achieve 0.5% error for some of the systems.

C.10 Tangent Distance Classifier (TDC)

The Tangent Distance Classifier (TDC) is a nearest-neighbor method where the distance function is made insensitive to small distortions and translations of the input image. If we consider an image as a point in a high-dimensional pixel space (where the dimensionality equals the number of pixels), then an evolving distortion of a character traces out a curve in pixel space. Taken together, all these distortions define a low-dimensional manifold in pixel space. For small distortions, in the vicinity of the original image, this manifold can be approximated by a plane, known as the tangent plane. An excellent measure of "closeness" for character images is the distance between their tangent planes, where the set of distortions used to generate the planes includes translations, scaling, skewing, squeezing, rotation, and line-thickness variations. A test error rate of 1.1% was achieved using 16x16 pixel images. Prefiltering techniques using simple Euclidean distance at multiple resolutions allowed the number of necessary Tangent Distance calculations to be reduced.

C.11 Support Vector Machine (SVM)

Polynomial classifiers are well-studied methods for generating complex decision surfaces. Unfortunately, they are impractical for high-dimensional problems, because the number of product terms is prohibitive. The Support Vector technique is an extremely economical way of representing complex surfaces in high-dimensional spaces, including polynomials and many other types of surfaces.

A particularly interesting subset of decision surfaces is the ones that correspond to hyperplanes that are at a maximum distance from the convex hulls of the two classes in the high-dimensional space of the product terms. Boser, Guyon, and Vapnik realized that any polynomial of degree k in this "maximum margin" set can be computed by first computing the dot product of the input image with a subset of the training samples (called the "support vectors"), elevating the result to the k-th power, and linearly combining the numbers thereby obtained. Finding the support vectors and the coefficients amounts to solving a high-dimensional quadratic minimization problem with linear inequality constraints. For the sake of comparison, we include here the results reported by Burges and Scholkopf. With a regular SVM, their error rate on the regular test set was 1.4%. Cortes and Vapnik had reported an error rate of 1.1% with an SVM on the same data, using a slightly different technique. The computational cost of this technique is very high: about 14 million multiply-adds per recognition. Using Scholkopf's Virtual Support Vectors technique (V-SVM), 1.0% error was attained. More recently, Scholkopf (personal communication) has reached 0.8% using a modified version of the V-SVM. Unfortunately, V-SVM is extremely expensive: about twice as much as the regular SVM. To alleviate this problem, Burges has proposed the Reduced Set Support Vector technique (RS-SVM), which attained about 1% on the regular test set, with a computational cost of only about 650,000 multiply-adds per recognition, i.e. only about 60% more expensive than LeNet-5.

D. Discussion

A summary of the performance of the classifiers is shown in the accompanying figures. The first shows the raw error rate of the classifiers on the 10,000-example test set. Boosted LeNet-4 performed best, achieving a score of 0.7%, closely followed by LeNet-5 at 0.8%.

The next figure shows the number of patterns in the test set that must be rejected to attain 0.5% error for some of the methods. Patterns are rejected when the value of the corresponding output is smaller than a predefined threshold. In many applications, rejection performance is more significant than raw error rate. The score used to decide upon the rejection of a pattern was the difference between the scores of the top two classes. Again, Boosted LeNet-4 has the best performance. The enhanced versions of LeNet-4 did better than the original LeNet-4, even though the raw accuracies were identical.
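The rejection rule just described (reject when the margin between the two highest-scoring classes falls below a threshold) can be sketched as follows; the threshold value and the convention that higher scores are better are assumptions made for illustration.

```python
import numpy as np

def classify_with_rejection(scores, threshold=0.5):
    """scores: 1-D array of per-class scores (higher is better).
    Returns the predicted class, or None if the difference between the
    top two scores is below the rejection threshold."""
    order = np.argsort(scores)[::-1]
    margin = scores[order[0]] - scores[order[1]]
    return int(order[0]) if margin >= threshold else None

print(classify_with_rejection(np.array([0.1, 2.3, 2.2, 0.4])))   # rejected (None)
print(classify_with_rejection(np.array([0.1, 3.0, 1.0, 0.4])))   # class 1
```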


[Figure: number of multiply-accumulate operations (in thousands) for the recognition of a single character: Linear 4; Pairwise 36; [deslant] K-NN Euclidean 24,000; 40 PCA+quadratic 39; 1000 RBF 794; [16x16] Tangent Distance 20,000; SVM poly 4 14,000; RS-SVM poly 5 650; [dist] V-SVM poly 9 28,000; [deslant] 20x20-300-10 123; 28x28-1000-10 795; 28x28-300-100-10 267; 28x28-500-150-10 469; [16x16] LeNet-1 100; LeNet-4 260; LeNet-4 / Local 20,000; LeNet-4 / K-NN 10,000; LeNet-5 401; Boosted LeNet-4 460.]
Fig. Number of multiply-accumulate operations for the recognition of a single character, starting with a size-normalized image.

The figure above shows the number of multiply-accumulate operations necessary for the recognition of a single size-normalized image for each method. Expectedly, neural networks are much less demanding than memory-based methods. Convolutional Neural Networks are particularly well suited to hardware implementations because of their regular structure and their low memory requirements for the weights. Single-chip mixed analog-digital implementations of LeNet-5's predecessors have been shown to operate at speeds in excess of 1,000 characters per second. However, the rapid progress of mainstream computer technology renders those exotic technologies quickly obsolete. Cost-effective implementations of memory-based techniques are more elusive, due to their enormous memory and computational requirements.

Training time was also measured. K-nearest neighbors and TDC have essentially zero training time. While the single-layer net, the pairwise net, and the PCA+quadratic net could be trained in less than an hour, the multilayer net training times were expectedly much longer, but only required 10 to 20 passes through the training set. This amounts to two to three days of CPU time to train LeNet-5 on a Silicon Graphics Origin 2000 server, using a single 200 MHz R10000 processor. It is important to note that while the training time is somewhat relevant to the designer, it is of little interest to the final user of the system. Given the choice between an existing technique and a new technique that brings marginal accuracy improvements at the price of considerable training time, any final user would choose the latter.

[Figure: memory requirements (in thousands of variables): Linear 4; Pairwise 35; [deslant] K-NN Euclidean 24,000; 40 PCA+quadratic 40; 1000 RBF 794; [16x16] Tangent Distance 25,000; SVM poly 4 14,000; RS-SVM poly 5 650; [dist] V-SVM poly 5 28,000; [deslant] 20x20-300-10 123; 28x28-1000-10 795; 28x28-300-100-10 267; 28x28-500-150-10 469; [16x16] LeNet-1 3; LeNet-4 17; LeNet-4 / Local 24,000; LeNet-4 / K-NN 24,000; LeNet-5 60; Boosted LeNet-4 51.]
Fig. Memory requirements, measured in number of variables, for each of the methods. Most of the methods require only about one byte per variable for adequate performance.

The figure above shows the memory requirements, and therefore the number of free parameters, of the various classifiers, measured in terms of the number of variables that need to be stored. Most methods require only about one byte per variable for adequate performance. However, Nearest-Neighbor methods may get by with a few bits per pixel for storing the template images. Not surprisingly, neural networks require much less memory than memory-based methods.

The overall performance depends on many factors, including accuracy, running time, and memory requirements. As computer technology improves, larger-capacity recognizers become feasible. Larger recognizers in turn require larger training sets. LeNet-1 was appropriate to the available technology in 1989, just as LeNet-5 is appropriate now. In 1989, a recognizer as complex as LeNet-5 would have required several weeks of training and more data than was available, and was therefore not even considered. For quite a long time, LeNet-1 was considered the state of the art. The local learning classifier, the optimal margin classifier, and the tangent distance classifier were developed to improve upon LeNet-1, and they succeeded at that. However, they in turn motivated a search for improved neural network architectures. This search was guided in part by estimates of the capacity of various learning machines, derived from measurements of the training and test error as a function of the number of training examples. We discovered that more capacity was needed. Through a series of experiments in architecture, combined with an analysis of the characteristics of recognition errors, LeNet-4 and LeNet-5 were crafted.

We find that boosting gives a substantial improvement in accuracy with a relatively modest penalty in memory and computing expense. Also, distortion models can be used to increase the effective size of a data set without actually requiring the collection of more data.

The Support Vector Machine has excellent accuracy, which is most remarkable because, unlike the other high-performance classifiers, it does not include a priori knowledge about the problem. In fact, this classifier would do just as well if the image pixels were permuted with a fixed mapping and lost their pictorial structure. However, reaching levels of performance comparable to the Convolutional Neural Networks can only be done at considerable expense in memory and computational requirements. The reduced-set SVM requirements are within a factor of two of the Convolutional Networks, and the error rate is very close. Improvements of those results are expected as the technique is relatively new.

When plenty of data is available, many methods can attain respectable accuracy. The neural-net methods run much faster and require much less space than memory-based techniques. The neural nets' advantage will become more striking as training databases continue to increase in size.

E. Invariance and Noise Resistance

Convolutional networks are particularly well suited for recognizing or rejecting shapes with widely varying size, position, and orientation, such as the ones typically produced by heuristic segmenters in real-world string recognition systems.

In an experiment like the one described above, the importance of noise resistance and distortion invariance is not obvious. The situation in most real applications is quite different.


Characters must generally be segmented out of their context prior to recognition. Segmentation algorithms are rarely perfect and often leave extraneous marks in character images (noise, underlines, neighboring characters), or sometimes cut characters too much and produce incomplete characters. Those images cannot be reliably size-normalized and centered. Normalizing incomplete characters can be very dangerous: for example, an enlarged stray mark can look like a genuine character. Therefore, many systems have resorted to normalizing the images at the level of fields or words. In our case, the upper and lower profiles of entire fields (amounts in a check) are detected and used to normalize the image to a fixed height. While this guarantees that stray marks will not be blown up into character-looking images, it also creates wide variations of the size and vertical position of characters after segmentation. Therefore, it is preferable to use a recognizer that is robust to such variations. The figure below shows several examples of distorted characters that are correctly recognized by LeNet-5. It is estimated that accurate recognition occurs for scale variations up to about a factor of two, vertical shift variations of plus or minus about half the height of the character, and rotations of up to plus or minus 30 degrees. While fully invariant recognition of complex shapes is still an elusive goal, it seems that Convolutional Networks offer a partial answer to the problem of invariance, or robustness, with respect to geometrical distortions.

The figure also includes examples of the robustness of LeNet-5 under extremely noisy conditions. Processing those images would pose insurmountable problems of segmentation and feature extraction to many methods, but LeNet-5 seems able to robustly extract salient features from these cluttered images. The training set used for the network shown here was the MNIST training set with salt-and-pepper noise added: each pixel was randomly inverted with a small probability. More examples of LeNet-5 in action are available on the Internet at http://www.research.att.com/~yann/ocr.

IV. Multi-Module Systems and Graph Transformer Networks

The classical back-propagation algorithm, as described and used in the previous sections, is a simple form of Gradient-Based Learning. However, it is clear that the gradient back-propagation equations given earlier describe a more general situation than simple multilayer feed-forward networks composed of alternated linear transformations and sigmoidal functions. In principle, derivatives can be back-propagated through any arrangement of functional modules, as long as we can compute the product of the Jacobians of those modules by any vector. Why would we want to train systems composed of multiple heterogeneous modules? The answer is that large and complex trainable systems need to be built out of simple, specialized modules. The simplest example is LeNet-5, which mixes convolutional layers, sub-sampling layers, fully connected layers, and RBF layers. Another, less trivial example, described in the next two sections, is a system for recognizing words that can be trained to simultaneously segment and recognize words, without ever being given the correct segmentation.

[Figure: a trainable system composed of heterogeneous modules F0(X0), F1(X0, X1, W1), F2(X2, W2), and F3(X3, X4), with trainable parameters W1 and W2, external input Z, desired output D, intermediate state variables X1 through X5, and a loss function module producing E.]
Fig. A trainable system composed of heterogeneous modules.

The figure shows an example of a trainable multi-modular system. A multi-module system is defined by the function implemented by each of the modules and by the graph of interconnection of the modules to each other. The graph implicitly defines a partial order according to which the modules must be updated in the forward pass. For example, in the figure, module 0 is first updated, then modules 1 and 2 are updated (possibly in parallel), and finally module 3. Modules may or may not have trainable parameters. Loss functions, which measure the performance of the system, are themselves implemented as modules. In the simplest case, the loss function module receives an external input that carries the desired output. In this framework, there is no qualitative difference between trainable parameters (W1 and W2 in the figure), external inputs and outputs (Z, D, E), and intermediate state variables (X1, X2, X3, X4, X5).

A. An Object-Oriented Approach

Object-Oriented programming offers a particularly convenient way of implementing multi-module systems. Each module is an instance of a class. Module classes have a "forward propagation" method (or member function) called fprop, whose arguments are the inputs and outputs of the module. For example, computing the output of module 3 in the figure can be done by calling the method fprop on module 3 with the arguments X3, X4, X5. Complex modules can be constructed from simpler modules by simply defining a new class whose slots will contain the member modules and the intermediate state variables between those modules. The fprop method for the class simply calls the fprop methods of the member modules, with the appropriate intermediate state variables or external inputs and outputs as arguments. Although the algorithms are easily generalizable to any network of such modules, including those whose influence graph has cycles, we will limit the discussion to the case of directed acyclic graphs (feed-forward networks).

Computing derivatives in a multi-module system is just as simple. A "backward propagation" method, called bprop, can be defined for each module class for that purpose.
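The object-oriented scheme just outlined might look like the following minimal Python sketch. The paper's own implementation is written in a Lisp dialect; the class and method names here are only meant to mirror the fprop/bprop convention and are otherwise assumptions.

```python
import numpy as np

class State:
    """Holds a value computed in the forward pass and a slot for the
    gradient accumulated in the backward pass."""
    def __init__(self, value):
        self.x = np.asarray(value, dtype=float)
        self.dx = np.zeros_like(self.x)

class Sigmoid:
    """A module with no parameters: fprop writes its output state,
    bprop adds the back-propagated gradient to its input state."""
    def fprop(self, inp, out):
        out.x = 1.0 / (1.0 + np.exp(-inp.x))
    def bprop(self, inp, out):
        inp.dx += out.dx * out.x * (1.0 - out.x)

class Linear:
    """A module with a trainable weight matrix W (bias omitted)."""
    def __init__(self, n_in, n_out, rng=np.random.default_rng(0)):
        self.W = State(0.1 * rng.standard_normal((n_out, n_in)))
    def fprop(self, inp, out):
        out.x = self.W.x @ inp.x
    def bprop(self, inp, out):
        self.W.dx += np.outer(out.dx, inp.x)   # gradient w.r.t. parameters
        inp.dx += self.W.x.T @ out.dx          # gradient w.r.t. input

# Forward pass in topological order, backward pass in reverse order.
x0, x1, x2 = State([1.0, -2.0]), State(np.zeros(3)), State(np.zeros(3))
lin, sig = Linear(2, 3), Sigmoid()
lin.fprop(x0, x1); sig.fprop(x1, x2)
x2.dx = np.ones(3)                             # pretend dE/dx2 from a loss module
sig.bprop(x1, x2); lin.bprop(x0, x1)
print(x0.dx, lin.W.dx.shape)
```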


[Figure: examples of unusual, distorted, and noisy characters processed by LeNet-5, with the network's output labels.]
Fig. Examples of unusual, distorted, and noisy characters correctly recognized by LeNet-5. The grey-level of the output label represents the penalty (lighter for higher penalties).

The bprop method of a module takes the same arguments as the fprop method. All the derivatives in the system can be computed by calling the bprop method on all the modules in reverse order compared to the forward propagation phase. The state variables are assumed to contain slots for storing the gradients computed during the backward pass, in addition to storage for the states computed in the forward pass. The backward pass effectively computes the partial derivatives of the loss E with respect to all the state variables and all the parameters in the system. There is an interesting duality property between the forward and backward functions of certain modules. For example, a sum of several variables in the forward direction is transformed into a simple fan-out (replication) in the backward direction. Conversely, a fan-out in the forward direction is transformed into a sum in the backward direction. The software environment used to obtain the results described in this paper, called SN, uses the above concepts. It is based on a home-grown object-oriented dialect of Lisp with a compiler to C.

The fact that derivatives can be computed by propagation in the reverse graph is easy to understand intuitively. The best way to justify it theoretically is through the use of Lagrange functions. The same formalism can be used to extend the procedures to networks with recurrent connections.

B. Special Modules

Neural networks and many other standard pattern recognition techniques can be formulated in terms of multi-modular systems trained with Gradient-Based Learning. Commonly used modules include matrix multiplications and sigmoidal modules, the combination of which can be used to build conventional neural networks. Other modules include convolutional layers, sub-sampling layers, RBF layers, and "softmax" layers. Loss functions are also represented as modules whose single output produces the value of the loss. Commonly used modules have simple bprop methods. In general, the bprop method of a function F is a multiplication by the Jacobian of F. Here are a few commonly used examples: the bprop method of a fan-out (a "Y" connection) is a sum, and vice versa; the bprop method of a multiplication by a coefficient is a multiplication by the same coefficient; the bprop method of a multiplication by a matrix is a multiplication by the transpose of that matrix; the bprop method of an addition with a constant is the identity.


Interestingly, certain non-differentiable modules can be inserted in a multi-module system without adverse effect. An interesting example is the multiplexer module. It has two (or more) regular inputs, one switching input, and one output. The module selects one of its inputs, depending upon the (discrete) value of the switching input, and copies it onto its output. While this module is not differentiable with respect to the switching input, it is differentiable with respect to the regular inputs. Therefore, the overall function of a system that includes such modules will be differentiable with respect to its parameters as long as the switching input does not depend upon the parameters. For example, the switching input can be an external input.

Another interesting case is the min module. This module has two or more inputs and one output. The output of the module is the minimum of the inputs. The function of this module is differentiable everywhere, except on the switching surface, which is a set of measure zero. Interestingly, this function is continuous and reasonably regular, and that is sufficient to ensure the convergence of a Gradient-Based Learning algorithm.

The object-oriented implementation of the multi-module idea can easily be extended to include a bbprop method that propagates Gauss-Newton approximations of the second derivatives. This leads to a direct generalization, for modular systems, of the second-derivative back-propagation equation given in the Appendix.

The multiplexer module is a special case of a much more general situation, described at length in Section VIII, where the architecture of the system changes dynamically with the input data. Multiplexer modules can be used to dynamically rewire (or reconfigure) the architecture of the system for each new input pattern.

C. Graph Transformer Networks

Multi-module systems are a very flexible tool for building large trainable systems. However, the descriptions in the previous sections implicitly assumed that the set of parameters, and the state information communicated between the modules, are all fixed-size vectors. The limited flexibility of fixed-size vectors for data representation is a serious deficiency for many applications, notably for tasks that deal with variable-length inputs (e.g., continuous speech recognition and handwritten word recognition), or for tasks that require encoding relationships between objects or features whose number and nature can vary (invariant perception, scene analysis, recognition of composite objects). An important special case is the recognition of strings of characters or words.

More generally, fixed-size vectors lack flexibility for tasks in which the state must encode probability distributions over sequences of vectors or symbols, as is the case in linguistic processing. Such distributions over sequences are best represented by stochastic grammars or, in the more general case, by directed graphs in which each arc contains a vector (stochastic grammars are special cases in which the vector contains probabilities and symbolic information). Each path in the graph represents a different sequence of vectors. Distributions over sequences can be represented by interpreting elements of the data associated with each arc as parameters of a probability distribution, or simply as a penalty. Distributions over sequences are particularly handy for modeling linguistic knowledge in speech or handwriting recognition systems: each sequence, i.e. each path in the graph, represents an alternative interpretation of the input. Successive processing modules progressively refine the interpretation. For example, a speech recognition system might start with a single sequence of acoustic vectors, transform it into a lattice of phonemes (a distribution over phoneme sequences), then into a lattice of words (a distribution over word sequences), and then into a single sequence of words representing the best interpretation.

In our work on building large-scale handwriting recognition systems, we have found that these systems could much more easily and quickly be developed and designed by viewing the system as a network of modules that take one or several graphs as input and produce graphs as output. Such modules are called Graph Transformers, and the complete systems are called Graph Transformer Networks, or GTNs. Modules in a GTN communicate their states and gradients in the form of directed graphs whose arcs carry numerical information (scalars or vectors).

[Figure: (a) a traditional multilayer architecture passing fixed-size vectors between layers; (b) a multilayer Graph Transformer Network passing graphs between Graph Transformers.]
Fig. Traditional neural networks and multi-module systems communicate fixed-size vectors between layers. Multi-Layer Graph Transformer Networks are composed of trainable modules that operate on and produce graphs whose arcs carry numerical information.

From the statistical point of view, the fixed-size state vectors of conventional networks can be seen as representing the means of distributions in state space. In variable-size networks, such as the Space-Displacement Neural Networks described in Section VII, the states are variable-length sequences of fixed-size vectors. They can be seen as representing the mean of a probability distribution over variable-length sequences of fixed-size vectors. In GTNs, the states are represented as graphs, which can be seen as representing mixtures of probability distributions over structured collections (possibly sequences) of vectors (see the figure above).

One of the main points of the next several sections is to show that Gradient-Based Learning procedures are not limited to networks of simple modules that communicate through fixed-size vectors, but can be generalized to GTNs.
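To make the earlier remark about the min module concrete, here is a small sketch (an assumption about implementation style, not the paper's code) in which the backward pass routes the full gradient to whichever input achieved the minimum; this is the gradient used everywhere except on the measure-zero switching surface.

```python
import numpy as np

class MinModule:
    """output = min(inputs); the gradient flows only to the arg-min input."""
    def fprop(self, inputs):
        self.inputs = np.asarray(inputs, dtype=float)
        self.argmin = int(np.argmin(self.inputs))
        return self.inputs[self.argmin]

    def bprop(self, grad_output):
        grad_inputs = np.zeros_like(self.inputs)
        grad_inputs[self.argmin] = grad_output
        return grad_inputs

m = MinModule()
print(m.fprop([3.0, 1.5, 2.0]))   # 1.5
print(m.bprop(1.0))               # [0. 1. 0.]
```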


Gradient back-propagation through a Graph Transformer takes gradients with respect to the numerical information in the output graph and computes gradients with respect to the numerical information attached to the input graphs, and with respect to the module's internal parameters. Gradient-Based Learning can be applied as long as differentiable functions are used to produce the numerical data in the output graph from the numerical data in the input graph and the function's parameters.

The second point of the next several sections is to show that the functions implemented by many of the modules used in typical document processing systems (and other image recognition systems), though commonly thought to be combinatorial in nature, are indeed differentiable with respect to their internal parameters as well as with respect to their inputs, and are therefore usable as part of a globally trainable system.

In most of the following, we will purposely avoid making references to probability theory. All the quantities manipulated are viewed as penalties, or costs, which if necessary can be transformed into probabilities by taking exponentials and normalizing.

V. Multiple Object Recognition: Heuristic Over-Segmentation

One of the most difficult problems of handwriting recognition is to recognize not just isolated characters, but strings of characters, such as zip codes, check amounts, or words. Since most recognizers can only deal with one character at a time, we must first segment the string into individual character images. However, it is almost impossible to devise image analysis techniques that will infallibly segment naturally written sequences of characters into well-formed characters.

The recent history of automatic speech recognition is here to remind us that training a recognizer by optimizing a global criterion (at the word or sentence level) is much preferable to merely training it on hand-segmented phonemes or other units. Several recent works have shown that the same is true for handwriting recognition: optimizing a word-level criterion is preferable to solely training a recognizer on pre-segmented characters, because the recognizer can learn not only to recognize individual characters, but also to reject mis-segmented characters, thereby minimizing the overall word error.

This section and the next describe in detail a simple example of a GTN that addresses the problem of reading strings of characters, such as words or check amounts. The method avoids the expensive and unreliable task of hand-truthing the result of the segmentation, often required in more traditional systems trained on individually labeled character images.

A. Segmentation Graph

A now classical method for word segmentation and recognition is called Heuristic Over-Segmentation. Its main advantage over other approaches to segmentation is that it avoids making hard decisions about the segmentation, by taking a large number of different segmentations into consideration. The idea is to use heuristic image processing techniques to find candidate cuts of the word or string, and then to use the recognizer to score the alternative segmentations thereby generated. The process is depicted in the figure below. First, a number of candidate cuts are generated. Good candidate locations for cuts can be found by locating minima in the vertical projection profile, or minima of the distance between the upper and lower contours of the word. Better segmentation heuristics are described in Section X. The cut generation heuristic is designed so as to generate more cuts than necessary, in the hope that the "correct" set of cuts will be included. Once the cuts have been generated, alternative segmentations are best represented by a graph, called the segmentation graph. The segmentation graph is a Directed Acyclic Graph (DAG) with a start node and an end node. Each internal node is associated with a candidate cut produced by the segmentation algorithm. Each arc between a source node and a destination node is associated with an image that contains all the ink between the cut associated with the source node and the cut associated with the destination node. An arc is created between two nodes if the segmentor decided that the ink between the corresponding cuts could form a candidate character. Typically, each individual piece of ink would be associated with an arc. Pairs of successive pieces of ink would also be included, unless they are separated by a wide gap, which is a clear indication that they belong to different characters. Each complete path through the graph contains each piece of ink once and only once. Each path corresponds to a different way of associating pieces of ink together so as to form characters.

[Figure: a word image, its candidate cuts, and the resulting segmentation graph.]
Fig. Building a segmentation graph with Heuristic Over-Segmentation.
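A minimal sketch of how such a segmentation graph might be assembled from a list of candidate cut positions is given below. The data structures, the maximum number of pieces joined per arc, and the gap test are illustrative assumptions, not the heuristics of Section X.

```python
def build_segmentation_graph(cuts, max_span=2, wide_gap=lambda a, b: False):
    """cuts: candidate cut positions (including the string's start and end),
    in increasing order.  Nodes are cut indices; each arc carries the ink
    (represented here simply by its horizontal interval) between two cuts.
    An arc joining more than one piece of ink is dropped if any internal
    gap is judged 'wide'."""
    arcs = []
    for i in range(len(cuts) - 1):
        for j in range(i + 1, min(i + max_span, len(cuts) - 1) + 1):
            # Reject multi-piece segments separated by a clearly wide gap.
            if any(wide_gap(cuts[k], cuts[k + 1]) for k in range(i, j - 1)):
                continue
            arcs.append({"src": i, "dst": j, "segment": (cuts[i], cuts[j])})
    return {"start": 0, "end": len(cuts) - 1, "arcs": arcs}

# Example: four candidate cuts produce arcs for single and paired pieces.
g = build_segmentation_graph([0, 11, 20, 33])
for a in g["arcs"]:
    print(a)
```

Every path from the start node to the end node of the returned graph uses each piece of ink exactly once, as required.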


B. Recognition Transformer and Viterbi Transformer

A simple GTN to recognize character strings is shown in the figure below. It is composed of two graph transformers, called the recognition transformer T_rec and the Viterbi transformer T_vit. The goal of the recognition transformer is to generate a graph, called the interpretation graph or recognition graph G_int, that contains all the possible interpretations for all the possible segmentations of the input. Each path in G_int represents one possible interpretation of one particular segmentation of the input. The role of the Viterbi transformer is to extract the best interpretation from the interpretation graph.

[Figure: the string recognition GTN: a segmenter produces the segmentation graph G_seg; the recognition transformer T_rec produces the interpretation graph G_int; the Viterbi transformer T_vit extracts the Viterbi path G_vit and its Viterbi penalty.]
Fig. Recognizing a character string with a GTN. For readability, only the arcs with low penalties are shown.

[Figure: detail of the recognition transformer: each arc of the segmentation graph, carrying a candidate segment image and the penalty given by the segmentor, is expanded into one interpretation-graph arc per character class, with an attached label and the penalty produced by the character recognizer.]
Fig. The recognition transformer refines each arc of the segmentation graph into a set of arcs in the interpretation graph, one per character class, with attached penalties and labels.

The recognition transformer T_rec takes the segmentation graph G_seg as input and applies the recognizer for single characters to the images associated with each of the arcs in the segmentation graph. The interpretation graph G_int has almost the same structure as the segmentation graph, except that each arc is replaced by a set of arcs from and to the same nodes. In this set of arcs, there is one arc for each possible class for the image associated with the corresponding arc in G_seg. As shown in the figure, to each arc is attached a class label and the penalty that the image belongs to this class, as produced by the recognizer. If the segmentor has computed penalties for the candidate segments, these penalties are combined with the penalties computed by the character recognizer to obtain the penalties on the arcs of the interpretation graph. Although combining penalties of different natures seems highly heuristic, the GTN training procedure will tune the penalties and take advantage of this combination anyway. Each path in the interpretation graph corresponds to a possible interpretation of the input word. The penalty of a particular interpretation for a particular segmentation is given by the sum of the arc penalties along the corresponding path in the interpretation graph. Computing the penalty of an interpretation independently of the segmentation requires combining the penalties of all the paths with that interpretation. An appropriate rule for combining the penalties of parallel paths is given in Section VI-C.

The Viterbi transformer produces a graph G_vit with a single path. This path is the path of least cumulated penalty in the interpretation graph. The result of the recognition can be produced by reading off the labels of the arcs along the graph G_vit extracted by the Viterbi transformer. The Viterbi transformer owes its name to the famous Viterbi algorithm, an application of the principle of dynamic programming for finding the shortest path in a graph efficiently. Let $c_i$ be the penalty associated with arc $i$, with source node $s_i$ and destination node $d_i$ (note that there can be multiple arcs between two nodes). In the interpretation graph, arcs also carry a label $l_i$. The Viterbi algorithm proceeds as follows. Each node $n$ is associated with a cumulated Viterbi penalty $v_n$. Those cumulated penalties are computed in any order that satisfies the partial order defined by the interpretation graph (which is directed and acyclic). The start node is initialized with the cumulated penalty $v_{\mathrm{start}} = 0$. The other cumulated penalties $v_n$ are computed recursively from the $v$ values of their parent nodes, through the upstream arcs $U_n = \{\, \text{arc } i \text{ with destination } d_i = n \,\}$:

$$ v_n = \min_{i \in U_n} \left( c_i + v_{s_i} \right). $$

Furthermore, the value of $i$ for each node $n$ that minimizes the right-hand side is noted $m_n$, the minimizing entering arc. When the end node is reached, we obtain in $v_{\mathrm{end}}$ the total penalty of the path with the smallest total penalty. We call this penalty the Viterbi penalty, and this sequence of arcs and nodes the Viterbi path. To obtain the Viterbi path, with nodes $n_1 \dots n_T$ and arcs $i_1 \dots i_{T-1}$, we trace back these nodes and arcs, starting with $n_T$ = the end node and recursively using the minimizing entering arc, $i_t = m_{n_{t+1}}$ and $n_t = s_{i_t}$, until the start node is reached. The label sequence can then be read off the arcs of the Viterbi path.
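The recursion above translates directly into code. The following sketch assumes arcs are given as (source, destination, penalty, label) tuples and that node indices already form a topological order (both assumptions made for illustration).

```python
import math

def viterbi(n_nodes, arcs, start=0, end=None):
    """arcs: list of (src, dst, penalty, label) with src < dst, so that the
    natural node order 0..n_nodes-1 is a valid topological order.
    Implements v_n = min over entering arcs of (c_i + v_{src_i}), keeps
    back-pointers, and returns (Viterbi penalty, label sequence)."""
    end = n_nodes - 1 if end is None else end
    entering = [[] for _ in range(n_nodes)]
    for idx, (s, d, c, lab) in enumerate(arcs):
        entering[d].append(idx)
    v = [math.inf] * n_nodes
    m = [None] * n_nodes                    # minimizing entering arc m_n
    v[start] = 0.0
    for n in range(n_nodes):
        for idx in entering[n]:
            s, d, c, lab = arcs[idx]
            if c + v[s] < v[n]:
                v[n], m[n] = c + v[s], idx
    labels, n = [], end                     # trace back the Viterbi path
    while n != start:
        s, d, c, lab = arcs[m[n]]
        labels.append(lab)
        n = s
    return v[end], labels[::-1]

# Toy interpretation graph: two segmentations of a two-character string.
arcs = [(0, 1, 0.4, "3"), (0, 1, 2.0, "5"), (1, 2, 0.6, "4"),
        (0, 2, 3.5, "8")]                   # one arc spanning both pieces
print(viterbi(3, arcs))                     # (1.0, ['3', '4'])
```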


VI. Global Training for Graph Transformer Networks

The previous section describes the process of recognizing a string using Heuristic Over-Segmentation, assuming that the recognizer is trained so as to give low penalties for the correct class label of correctly segmented characters, high penalties for erroneous categories of correctly segmented characters, and high penalties for all categories for badly formed characters. This section explains how to train the system at the string level to do the above, without requiring manual labeling of character segments. This training will be performed with a GTN whose architecture is slightly different from the recognition architecture described in the previous section.

In many applications, there is enough a priori knowledge about what is expected from each of the modules to train them separately. For example, with Heuristic Over-Segmentation one could individually label single-character images and train a character recognizer on them, but it might be difficult to obtain an appropriate set of non-character images to train the model to reject wrongly segmented candidates. Although separate training is simple, it requires additional supervision information that is often lacking or incomplete (the correct segmentation and the labels of incorrect candidate segments). Furthermore, it can be shown that separate training is sub-optimal.

The following sections describe four different gradient-based methods for training GTN-based handwriting recognizers at the string level: Viterbi training, discriminative Viterbi training, forward training, and discriminative forward training. The last one is a generalization to graph-based systems of the MAP criterion introduced in Section II-C. Discriminative forward training is somewhat similar to the so-called Maximum Mutual Information criterion used to train HMMs in speech recognition. However, our rationale differs from the classical one. We make no recourse to a probabilistic interpretation, but show that, within the Gradient-Based Learning approach, discriminative training is a simple instance of the pervasive principle of error-correcting learning.

Training methods for graph-based sequence recognition systems, such as HMMs, have been extensively studied in the context of speech recognition. Those methods require that the system be based on probabilistic generative models of the data, which provide normalized likelihoods over the space of possible input sequences. Popular HMM learning methods, such as the Baum-Welsh algorithm, rely on this normalization. The normalization cannot be preserved when non-generative models, such as neural networks, are integrated into the system. Other techniques, such as discriminative training methods, must be used in this case. Several authors have proposed such methods to train neural network/HMM speech recognizers at the word or sentence level.

Other globally trainable sequence recognition systems avoid the difficulties of statistical modeling by not resorting to graph-based techniques. The best example is Recurrent Neural Networks (RNNs). Unfortunately, despite early enthusiasm, the training of RNNs with gradient-based techniques has proved very difficult in practice.

The GTN techniques presented below simplify and generalize the global training methods developed for speech recognition.

A. Viterbi Training

During recognition, we select the path in the interpretation graph that has the lowest penalty, using the Viterbi algorithm. Ideally, we would like this path of lowest penalty to be associated with the correct label sequence as often as possible. An obvious loss function to minimize is therefore the average, over the training set, of the penalty of the lowest-penalty path associated with the correct label sequence. The goal of training will be to find the set of recognizer parameters (the weights, if the recognizer is a neural network) that minimizes the average penalty of this "correct" lowest-penalty path. The gradient of this loss function can be computed by back-propagation through the GTN training architecture shown in the figure below.

[Figure: the Viterbi training GTN: the recognition transformer produces the interpretation graph G_int; a path selector, constrained by the desired label sequence, produces the constrained interpretation graph G_c; a Viterbi transformer extracts the best constrained path G_cvit, whose penalties are summed to give the constrained Viterbi penalty C_cvit.]
Fig. Viterbi training GTN architecture for a character string recognizer based on Heuristic Over-Segmentation.

This training architecture is almost identical to the recognition architecture described in the previous section, except that an extra graph transformer, called a path selector, is inserted between the interpretation graph and the Viterbi transformer. This transformer takes the interpretation graph and the desired label sequence as input. It extracts from the interpretation graph those paths that contain the correct (desired) label sequence. Its output graph G_c is called the constrained interpretation graph (also known as forced alignment in the HMM literature), and contains all the paths that correspond to the correct label sequence. The constrained interpretation graph is then sent to the Viterbi transformer, which produces a graph G_cvit with a single path. This path is the "correct" path with the lowest penalty. Finally, a path scorer transformer takes G_cvit and simply computes its cumulated penalty C_cvit by adding up the penalties along the path. The output of this GTN is the loss function for the current pattern,

$$ E_{\mathrm{vit}} = C_{\mathrm{cvit}}. $$

The only label information that is required by the above system is the sequence of desired character labels. No knowledge of the correct segmentation is required on the part of the supervisor, since the system chooses, among the segmentations in the interpretation graph, the one that yields the lowest penalty.


the loss function for the current pattern that integrate neural networks with time alignment

or hybrid neuralnetworkHMM systems

E C

vit cvit

While it seems simple and satisfying this training ar

The only lab el information that is required by the ab ove

chitecture has a aw that can potentially be fatal The

system is the sequence of desired character lab els No

problem was already mentioned in Section IIC If the

knowledge of the correct segmentation is required on the

recognizer is a simple neural network with sigmoid out

part of the sup ervisor since it cho oses among the segmen

put units the minimum of the loss function is attained

tations in the interpretation graph the one that yields the

not when the recognizer always gives the rightanswer but

lowest p enalty

when it ignores the input and sets its output to a constant

The pro cess of backpropagating gradients through the

all the comp onents This is vector with small values for

Viterbi training GTN is now described. As explained in Section IV, the gradients must be propagated backwards through all modules of the GTN, in order to compute gradients in preceding modules and thereafter tune their parameters. Back-propagating gradients through the path scorer is quite straightforward: the partial derivatives of the loss function with respect to the individual penalties on the constrained Viterbi path G_cvit are equal to 1, since the loss function is simply the sum of those penalties. Back-propagating through the Viterbi transformer is equally simple: the partial derivatives of E_vit with respect to the penalties on the arcs of the constrained graph G_c are 1 for those arcs that appear in the constrained Viterbi path G_cvit, and 0 for those that do not. Why is it legitimate to back-propagate through an essentially discrete function such as the Viterbi transformer? The answer is that the Viterbi transformer is nothing more than a collection of min functions and adders put together, and it was shown in Section IV that gradients can be back-propagated through min functions without adverse effects. Back-propagation through the path selector transformer is similar to back-propagation through the Viterbi transformer: arcs in G_int that appear in G_c have the same gradient as the corresponding arc in G_c (i.e., 1 or 0, depending on whether the arc appears in G_cvit), while the other arcs, i.e. those that have no alter ego in G_c because they do not contain the right label, have a gradient of 0. During the forward propagation through the recognition transformer, one instance of the single-character recognizer was created for each arc in the segmentation graph, and the state of each recognizer instance was stored. Since each arc penalty in G_int is produced by an individual output of a recognizer instance, we now have a gradient (1 or 0) for each output of each instance of the recognizer. Recognizer outputs that have a non-zero gradient are part of the correct answer and will therefore have their value pushed down. The gradients present on the recognizer outputs can be back-propagated through each recognizer instance. For each recognizer instance, we obtain a vector of partial derivatives of the loss function with respect to the recognizer instance parameters. All the recognizer instances share the same parameter vector, since they are merely clones of each other; therefore the full gradient of the loss function with respect to the recognizer's parameter vector is simply the sum of the gradient vectors produced by the individual recognizer instances. Viterbi training, though formulated differently, is often used in HMM-based speech recognition systems, and similar algorithms have been applied to speech recognition systems.
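As an illustration only (this sketch is not the implementation used in the systems described here; the graph encoding and function names are invented), the cumulated Viterbi penalty can indeed be written as mins and adders, and the resulting sub-gradient with respect to the arc penalties is 1 on the arcs of the selected lowest-penalty path and 0 elsewhere:

from collections import defaultdict

def viterbi_penalty_and_gradient(arcs, start, end, topo_order):
    """arcs: list of (source, dest, penalty). topo_order lists nodes sources-first.
    Returns the best cumulated penalty and, for each arc, the gradient of that
    penalty with respect to the arc penalty (1 on the best path, 0 elsewhere)."""
    upstream = defaultdict(list)
    for k, (s, d, c) in enumerate(arcs):
        upstream[d].append((k, s, c))
    best, argbest = {start: 0.0}, {}
    for n in topo_order:
        if n == start:
            continue
        k, s, c = min(upstream[n], key=lambda ksc: ksc[2] + best[ksc[1]])  # min function
        best[n] = c + best[s]                                              # adder
        argbest[n] = (k, s)                                                # arc achieving the min
    grads = [0.0] * len(arcs)
    n = end
    while n != start:            # walk back along the selected path
        k, s = argbest[n]
        grads[k] = 1.0
        n = s
    return best[end], grads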

This failure mode, in which the recognizer can minimize the loss by ignoring its input and producing a constant output, is known as the collapse problem. The collapse only occurs if the recognizer outputs can simultaneously take their minimum value. If, on the other hand, the recognizer's output layer contains RBF units with fixed parameters, then there is no such trivial solution: a set of RBFs with fixed, distinct parameter vectors cannot simultaneously take their minimum value, and in this case the complete collapse described above does not occur. However, this does not totally prevent the occurrence of a milder collapse, because the loss function still has a flat spot for a trivial solution with constant recognizer output. This flat spot is a saddle point, but it is attractive in almost all directions and is very difficult to get out of using gradient-based minimization procedures. If the parameters of the RBFs are allowed to adapt, then the collapse problem reappears, because the RBF centers can all converge to a single vector, and the underlying neural network can learn to produce that vector and ignore the input. A different kind of collapse occurs if the widths of the RBFs are also allowed to adapt. The collapse only occurs if a trainable module such as a neural network feeds the RBFs. The collapse does not occur in HMM-based speech recognition systems, because they are generative systems that produce normalized likelihoods for the input data (more on this later). Another way to avoid the collapse is to train the whole system with respect to a discriminative training criterion, such as maximizing the conditional probability of the correct interpretations (the correct sequence of class labels) given the input image.

Another problem with Viterbi training is that the penalty of the answer cannot be used reliably as a measure of confidence, because it does not take low-penalty (high-scoring) competing answers into account.

B. Discriminative Viterbi Training

A modification of the training criterion can circumvent the collapse problem described above and at the same time produce more reliable confidence values. The idea is to not only minimize the cumulated penalty of the lowest-penalty path with the correct interpretation, but also to somehow increase the penalty of competing, and possibly incorrect, paths that have a dangerously low penalty. This type of criterion is called discriminative, because it plays the good answers against the bad ones. Discriminative training procedures can be seen as attempting to build appropriate separating surfaces between classes, rather than to model individual classes independently of each other.


Fig. Discriminative Viterbi training GTN architecture for a character string recognizer based on Heuristic Over-Segmentation. Quantities in square brackets are penalties computed during the forward propagation; quantities in parentheses are partial derivatives computed during the backward propagation.


For example, modeling the conditional distribution of the classes given the input image is more discriminative (focussing more on the classification surface) than having a separate generative model of the input data associated to each class (which, with class priors, yields the whole joint distribution of classes and inputs). This is because the conditional approach does not need to assume a particular form for the distribution of the input data.

One example of discriminative criterion is the difference between the penalty of the Viterbi path in the constrained graph and the penalty of the Viterbi path in the unconstrained interpretation graph, i.e. the difference between the penalty of the best correct path and the penalty of the best path (correct or incorrect). The corresponding GTN training architecture is shown in the figure above. The left side of the diagram is identical to the GTN used for non-discriminative Viterbi training. This loss function reduces the risk of collapse because it forces the recognizer to increase the penalty of wrongly recognized objects. Discriminative training can also be seen as another example of error correction procedure, which tends to minimize the difference between the desired output (computed in the left half of the GTN in the figure) and the actual output (computed in the right half).

Let the discriminative Viterbi loss function be denoted E_dvit, and let us call C_cvit the penalty of the Viterbi path in the constrained graph and C_vit the penalty of the Viterbi path in the unconstrained interpretation graph:

\[ E_{dvit} = C_{cvit} - C_{vit} \]

E_dvit is always positive, since the constrained graph is a subset of the paths in the interpretation graph and the Viterbi algorithm selects the path with the lowest total penalty. In the ideal case, the two paths C_cvit and C_vit coincide, and E_dvit is zero.

Back-propagating gradients through the discriminative Viterbi GTN adds some negative training to the previously described non-discriminative training. The same figure shows how the gradients are back-propagated. The left half is identical to the non-discriminative Viterbi training GTN, therefore the back-propagation is identical. The gradients back-propagated through the right half of the GTN are multiplied by -1, since C_vit contributes to the loss with a negative sign. Otherwise the process is similar to the left half. The gradients on arcs of G_int get positive contributions from the left half and negative contributions from the right half. The two contributions must be added, since the penalties on G_int arcs are sent to the two halves through a "Y" connection in the forward pass. Arcs in G_int that appear neither in G_vit nor in G_cvit have a gradient of zero: they do not contribute to the cost. Arcs that appear in both G_vit and G_cvit also have zero gradient: the -1 contribution from the right half cancels the +1 contribution from the left half. In other words, when an arc is rightfully part of the answer, there is no gradient. If an arc appears in G_cvit but not in G_vit, the gradient is +1: the arc should have had a lower penalty to make it into G_vit. If an arc is in G_vit but not in G_cvit, the gradient is -1: the arc had a low penalty, but should have had a higher penalty since it is not part of the desired answer.
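For illustration, a minimal sketch of the loss and of the resulting arc gradients (not the original implementation; the indicator vectors are assumed to come from two Viterbi passes such as the one sketched earlier, one on the constrained graph and one on the full interpretation graph):

def discriminative_viterbi_loss(c_cvit, c_vit):
    # E_dvit = C_cvit - C_vit, always >= 0
    return c_cvit - c_vit

def discriminative_viterbi_arc_gradients(on_constrained_best, on_unconstrained_best):
    # +1 where the arc lies only on the best constrained path,
    # -1 where it lies only on the best unconstrained path,
    #  0 where it lies on both, or on neither
    return [float(a) - float(b)
            for a, b in zip(on_constrained_best, on_unconstrained_best)]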

Variations of this technique have been used for speech recognition. Driancourt and Bottou used a version of it where the loss function is saturated to a fixed value. This can be seen as a generalization of the Learning Vector Quantization (LVQ) loss function. Other variations of this method use not only the Viterbi path, but the K best paths. The discriminative Viterbi algorithm does not have the flaws of the non-discriminative version, but there are problems nonetheless. The main problem is that the criterion does not build a margin between the classes. The gradient is zero as soon as the penalty of the constrained Viterbi path is equal to that of the Viterbi path. It would be desirable to push up the penalties of the wrong paths when they are dangerously close to the good one. The following section presents a solution to this problem.

C. Forward Scoring, and Forward Training

While the penalty of the Viterbi path is perfectly appropriate for the purpose of recognition, it gives only a partial picture of the situation. Imagine the lowest penalty paths corresponding to several different segmentations produced the same answer (the same label sequence). Then it could be argued that the overall penalty for the interpretation should be smaller than the penalty obtained when only one path produced that interpretation, because multiple paths with identical label sequences are more evidence that the label sequence is correct. Several rules can be used to compute the penalty associated to a graph that contains several parallel paths. We use a combination rule borrowed from a probabilistic interpretation of the penalties as negative log posteriors. In a probabilistic framework, the posterior probability for the interpretation should be the sum of the posteriors for all the paths that produce that interpretation. Translated in terms of penalties, the penalty of an interpretation should be the negative logarithm of the sum of the negative exponentials of the penalties of the individual paths. The overall penalty will be smaller than all the penalties of the individual paths.

Given an interpretation, there is a well known method, called the forward algorithm, for computing the above quantity efficiently. The penalty computed with this procedure for a particular interpretation is called the forward penalty. Consider again the concept of constrained graph, the subgraph of the interpretation graph which contains only the paths that are consistent with a particular label sequence. There is one constrained graph for each possible label sequence (some may be empty graphs, which have infinite penalties). Given an interpretation, running the forward algorithm on the corresponding constrained graph gives the forward penalty for that interpretation. The forward algorithm proceeds in a way very similar to the Viterbi algorithm, except that the operation used at each node to combine the incoming cumulated penalties, instead of being the min function, is the so-called logadd operation, which can be seen as a "soft" version of the min function:


\[ f_n = \operatorname{logadd}_{i \in U_n}\,(c_i + f_{s_i}) \]

where f_start = 0, U_n is the set of upstream arcs of node n, c_i is the penalty on arc i (s_i denoting the source node of arc i), and:

\[ \operatorname{logadd}(x_1, x_2, \dots, x_n) = -\log \sum_{i=1}^{n} e^{-x_i} \]

Note that, because of numerical inaccuracies, it is better to factorize the largest e^{-x_i} (corresponding to the smallest penalty) out of the logarithm.

An interesting analogy can be drawn if we consider that a graph on which we apply the forward algorithm is equivalent to a neural network on which we run a forward propagation, except that multiplications are replaced by additions, the additions are replaced by log-adds, and there are no sigmoids.

One way to understand the forward algorithm is to think about multiplicative scores (e.g., probabilities) instead of additive penalties on the arcs, with score = exp(-penalty). In that case, the Viterbi algorithm selects the path with the largest cumulative score (with scores multiplied along the path), whereas the forward score is the sum of the cumulative scores associated to each of the possible paths from the start to the end node. The forward penalty is always lower than the cumulated penalty on any of the paths, but if one path dominates (with a much lower penalty), its penalty is almost equal to the forward penalty. The forward algorithm gets its name from the forward pass of the well-known Baum-Welsh algorithm for training Hidden Markov Models. Section VIII-E gives more details on the relation between this work and HMMs.

The advantage of the forward penalty with respect to the Viterbi penalty is that it takes into account all the different ways to produce an answer, and not just the one with the lowest penalty. This is important if there is some ambiguity in the segmentation, since the combined forward penalty of two paths C_1 and C_2 associated with the same label sequence may be less than the penalty of a path C_3 associated with another label sequence, even though the penalty of C_3 might be less than any one of C_1 or C_2.

The forward training GTN is only a slight modification of the previously introduced Viterbi training GTN. It suffices to turn the Viterbi transformers of that architecture into forward scorers that take an interpretation graph as input and produce the forward penalty of that graph on output. Then the penalties of all the paths that contain the correct answer are lowered, instead of just that of the best one.

Back-propagating through the forward penalty computation (the forward transformer) is quite different from back-propagating through a Viterbi transformer. All the penalties of the input graph have an influence on the forward penalty, but penalties that belong to low-penalty paths have a stronger influence. Computing derivatives with respect to the forward penalties f_n computed at each node n of a graph is done by back-propagation through the graph:

\[ \frac{\partial E}{\partial f_n} = \sum_{i \in D_n} \frac{\partial E}{\partial f_{d_i}} \, e^{\,f_{d_i} - c_i - f_n} \]

where D_n = {arc i with source s_i = n} is the set of downstream arcs from node n, and d_i denotes the destination node of arc i. From the above derivatives, the derivatives with respect to the arc penalties are obtained:

\[ \frac{\partial E}{\partial c_i} = \frac{\partial E}{\partial f_{d_i}} \, e^{\,f_{d_i} - c_i - f_{s_i}} \]

This can be seen as a soft version of the back-propagation through a Viterbi scorer and transformer. All the arcs in G_c have an influence on the loss function; the arcs that belong to low-penalty paths have a larger influence. Back-propagation through the path selector is the same as before: the derivatives with respect to G_int arcs that have an alter ego in G_c are simply copied from the corresponding arc in G_c, and the derivatives with respect to the other arcs are 0.
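The following sketch (illustrative only; the graph encoding, node ordering and names are assumptions, not the authors' code) implements the forward recursion with the numerically stable logadd described above, together with the arc-penalty derivatives just given:

import math
from collections import defaultdict

def logadd(xs):
    # -log(sum_i exp(-x_i)), factorizing out the smallest penalty for stability
    m = min(xs)
    return m - math.log(sum(math.exp(m - x) for x in xs))

def forward_and_gradient(arcs, start, end, topo_order):
    """arcs: list of (source, dest, penalty). topo_order lists nodes sources-first.
    Returns the forward penalty of the graph and dE/d(penalty) for every arc,
    where E is that forward penalty."""
    upstream = defaultdict(list)
    for k, (s, d, c) in enumerate(arcs):
        upstream[d].append((k, s, c))
    # forward pass: f[n] = logadd over upstream arcs of (c_i + f[s_i])
    f = {start: 0.0}
    for n in topo_order:
        if n != start:
            f[n] = logadd([c + f[s] for _, s, c in upstream[n]])
    # backward pass: dE/df_n accumulates exp(f_d - c - f_n) weighted terms,
    # and dE/dc_i = dE/df_d * exp(f_d - c_i - f_s)
    dEdf = defaultdict(float)
    dEdf[end] = 1.0
    grads = [0.0] * len(arcs)
    for n in reversed(topo_order):
        for k, s, c in upstream[n]:
            w = dEdf[n] * math.exp(f[n] - c - f[s])
            grads[k] = w
            dEdf[s] += w
    return f[end], grads

# toy usage: two parallel paths from 'a' to 'b'
arcs = [("a", "b", 1.0), ("a", "b", 2.0)]
E, g = forward_and_gradient(arcs, "a", "b", ["a", "b"])
# E is below min(1.0, 2.0); the two gradients sum to 1 (soft-min weights)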

Several authors have applied the idea of back-propagating gradients through a forward scorer to train speech recognition systems, including Bridle and his alpha-net model and Haffner and his TDNN model, but these authors recommended discriminative training, as described in the next section.

Fig. Discriminative Forward Training GTN architecture for a character string recognizer based on Heuristic Over-Segmentation.

D. Discriminative Forward Training

The information contained in the forward penalty can be used in another discriminative training criterion, which we will call the discriminative forward criterion. This criterion corresponds to maximization of the posterior probability of choosing the paths associated with the correct interpretation. This posterior probability is defined as the exponential of minus the constrained forward penalty, normalized by the exponential of minus the unconstrained forward penalty. Note that the forward penalty of the constrained graph is always larger than or equal to the forward penalty of the unconstrained interpretation graph. Ideally, we would like the forward penalty of the constrained graph to be equal to the forward penalty of the complete interpretation graph.


Equality between those two quantities is achieved when the combined penalties of the paths with the correct label sequence are negligibly small compared to the penalties of all the other paths, or, equivalently, when the posterior probability associated to the paths with the correct interpretation is almost 1, which is precisely what we want. The corresponding GTN training architecture is shown in the Discriminative Forward Training figure above.

Let the difference be denoted E_dforw, and let us call C_cforw the forward penalty of the constrained graph and C_forw the forward penalty of the complete interpretation graph:

\[ E_{dforw} = C_{cforw} - C_{forw} \]

E_dforw is always positive, since the constrained graph is a subset of the paths in the interpretation graph, and the forward penalty of a graph is always larger than the forward penalty of a subgraph of this graph. In the ideal case, the penalties of incorrect paths are infinitely large, therefore the two penalties coincide and E_dforw is zero.
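Writing out the definitions given above makes the probabilistic reading of this loss explicit (this is only a restatement of the text, not an additional result):

\[
e^{-C_{forw}} = \sum_{\text{paths } p} e^{-\mathrm{penalty}(p)}, \qquad
e^{-C_{cforw}} = \sum_{\text{correct-label paths } p} e^{-\mathrm{penalty}(p)},
\]
\[
P(\text{correct label sequence} \mid \text{input}) \;=\; \frac{e^{-C_{cforw}}}{e^{-C_{forw}}} \;=\; e^{-(C_{cforw}-C_{forw})} \;=\; e^{-E_{dforw}} \;\in\; (0, 1].
\]

Minimizing E_dforw therefore maximizes this posterior, and E_dforw = 0 corresponds to a posterior of 1.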

Readers familiar with the Boltzmann machine connectionist model might recognize the constrained and unconstrained graphs as analogous to the clamped phase (constrained by the observed values of the output variable) and the free (unconstrained) phase of the Boltzmann machine algorithm.

Back-propagating derivatives through the discriminative forward GTN distributes gradients more evenly than in the Viterbi case. Derivatives are back-propagated through the left half of the GTN, down to the interpretation graph. Derivatives are negated and back-propagated through the right half, and the result for each arc is added to the contribution from the left half. Each arc in G_int now has a derivative. Arcs that are part of a correct path have a positive derivative. This derivative is very large if an incorrect path has a lower penalty than all the correct paths. Similarly, the derivatives with respect to arcs that are part of a low-penalty incorrect path have a large negative derivative. On the other hand, if the penalty of a path associated with the correct interpretation is much smaller than that of all other paths, the loss function is very close to 0 and almost no gradient is back-propagated. The training therefore concentrates on examples of images which yield a classification error, and furthermore, it concentrates on the pieces of the image which cause that error. Discriminative forward training is an elegant and efficient way of solving the infamous credit assignment problem for learning machines that manipulate dynamic data structures such as graphs. More generally, the same idea can be used in all situations where a learning machine must choose between discrete alternative interpretations.

As previously, the derivatives on the interpretation graph penalties can then be back-propagated into the character recognizer instances. Back-propagation through the character recognizer gives derivatives on its parameters. All the gradient contributions for the different candidate segments are added up to obtain the total gradient associated to one pair (input image, correct label sequence), that is, one example in the training set. A step of stochastic gradient descent can then be applied to update the parameters.

E. Remarks on Discriminative Training

In the above discussion, the global training criterion was given a probabilistic interpretation, but the individual penalties on the arcs of the graphs were not. There are good reasons for that. For example, if some penalties are associated to the different class labels, they would have to sum to 1 (class posteriors) or integrate to 1 over the input domain (likelihoods).

Let us first discuss the first case (class posterior normalization). This local normalization of penalties may eliminate information that is important for locally rejecting all the classes, e.g., when a piece of image does not correspond to a valid character class because some of the segmentation candidates may be wrong. Although an explicit "garbage class" can be introduced in a probabilistic framework to address that question, some problems remain, because it is difficult to characterize such a class probabilistically and to train a system in this way (it would require a density model of unseen or unlabeled samples).

The probabilistic interpretation of individual variables plays an important role in the Baum-Welsh algorithm, in combination with the Expectation-Maximization procedure. Unfortunately, those methods cannot be applied to discriminative training criteria, and one is reduced to using gradient-based methods. Enforcing the normalization of the probabilistic quantities while performing gradient-based learning is complex, inefficient, time consuming, and creates ill-conditioning of the loss function. Following earlier work, we therefore prefer to postpone normalization as far as possible, in fact until the final decision stage of the system. Without normalization, the quantities manipulated in the system do not have a direct probabilistic interpretation.

Let us now discuss the second case (using a generative model of the input). Generative models build the boundary indirectly, by first building an independent density model for each class and then performing classification decisions on the basis of these models. This is not a discriminative approach, in that it does not focus on the ultimate goal of learning, which in this case is to learn the classification decision surface. Theoretical arguments suggest that estimating input densities when the real goal is to obtain a discriminant function for classification is a suboptimal strategy. In theory, the problem of estimating densities in high-dimensional spaces is much more ill-posed than finding decision boundaries.

Even though the internal variables of the system do not have a direct probabilistic interpretation, the overall system can still be viewed as producing posterior probabilities for the classes. In fact, assuming that a particular label sequence is given as the desired sequence to the GTN in the Discriminative Forward Training figure, the exponential of minus E_dforw can be interpreted as an estimate of the posterior probability of that label sequence given the input. The sum of those posteriors for all the possible label sequences is 1. Another approach would consist of directly minimizing an approximation of the number of misclassifications. We prefer to use the discriminative forward loss function because it causes less numerical problems during the optimization.


We will see in Section X-C that this is a good way to obtain scores on which to base a rejection strategy. The important point being made here is that one is free to choose any parameterization deemed appropriate for a classification model. The fact that a particular parameterization uses internal variables with no clear probabilistic interpretation does not make the model any less legitimate than models that manipulate normalized quantities.

An important advantage of global and discriminative training is that learning focuses on the most important errors, and the system learns to integrate the ambiguities from the segmentation algorithm with the ambiguities of the character recognizer. In Section IX we present experimental results with an on-line handwriting recognition system that confirm the advantages of using global training versus separate training. Experiments in speech recognition with hybrids of neural networks and HMMs also showed marked improvements brought by global training.

VII. Multiple Object Recognition: Space Displacement Neural Network

There is a simple alternative to explicitly segmenting images of character strings using heuristics. The idea is to sweep a recognizer at all possible locations across a normalized image of the entire word or string, as shown in the figure below.

Fig. Explicit segmentation can be avoided by sweeping a recognizer at every possible location in the input field.

With this technique, no segmentation heuristics are required, since the system essentially examines all the possible segmentations of the input. However, there are problems with this approach. First, the method is in general quite expensive: the recognizer must be applied at every possible location on the input, or at least at a large enough subset of locations so that misalignments of characters in the field of view of the recognizer are small enough to have no effect on the error rate. Second, when the recognizer is centered on a character to be recognized, the neighbors of the center character will be present in the field of view of the recognizer, possibly touching the center character; therefore the recognizer must be able to correctly recognize the character in the center of its input field, even if neighboring characters are very close to, or touching, the central character. Third, a word or character string cannot be perfectly size normalized: individual characters within a string may have widely varying sizes and baseline positions, so the recognizer must be very robust to shifts and size variations.

These three problems are elegantly circumvented if a convolutional network is replicated over the input field. First of all, as shown in Section III, convolutional neural networks are very robust to shifts and scale variations of the input image, as well as to noise and extraneous marks in the input. These properties take care of the latter two problems mentioned in the previous paragraph. Second, convolutional networks provide a drastic saving in computational requirement when replicated over large input fields. A replicated convolutional network, also called a Space Displacement Neural Network or SDNN, is shown in the figure below.

Fig. A Space Displacement Neural Network is a convolutional network that has been replicated over a wide input field.

While scanning a recognizer can be prohibitively expensive in general, convolutional networks can be scanned or replicated very efficiently over large, variable-size input fields. Consider one instance of a convolutional net and its alter ego at a nearby location. Because of the convolutional nature of the network, units in the two instances that look at identical locations on the input have identical outputs, therefore their states do not need to be computed twice. Only a thin "slice" of new states that are not shared by the two network instances needs to be recomputed. When all the slices are put together, the result is simply a larger convolutional network whose structure is identical to the original network, except that the feature maps are larger in the horizontal dimension. In other words, replicating a convolutional network can be done simply by increasing the size of the fields over which the convolutions are performed, and by replicating the output layer accordingly. The output layer effectively becomes a convolutional layer: an output whose receptive field is centered on an elementary object will produce the class of this object, while an in-between output may indicate no character or contain rubbish. The outputs can be interpreted as evidence for the presence of objects at all possible positions in the input field.
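The computational saving from replication can be seen on a toy one-dimensional example. The sketch below is an illustration under simplifying assumptions, not the actual network used here: a single shared convolution kernel followed by a linear read-out. It checks that sliding a small recognizer window across a wide field produces exactly the same outputs as computing the shared feature map once over the whole field.

import numpy as np

rng = np.random.default_rng(0)
kernel = rng.standard_normal(5)          # shared convolution kernel
readout = rng.standard_normal(4)         # shared output weights over 4 features

def feature_map(x):
    # valid cross-correlation: one feature value per input position
    return np.correlate(x, kernel, mode="valid")

def recognizer(window):
    # the "single character" recognizer sees a window of 8 samples
    return float(readout @ feature_map(window)[:4])

def sdnn(x):
    # replicate the recognizer over the whole field by computing the
    # feature map once and sliding only the read-out
    f = feature_map(x)
    return np.array([readout @ f[i:i + 4] for i in range(len(f) - 3)])

x = rng.standard_normal(32)              # a wide input field
scanned = np.array([recognizer(x[i:i + 8]) for i in range(len(x) - 7)])
assert np.allclose(scanned, sdnn(x))     # same outputs, shared computation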

The SDNN architecture seems particularly attractive for recognizing cursive handwriting, where no reliable segmentation heuristic exists. Although the idea of SDNN is quite old, and very attractive in its simplicity, it has not generated wide interest until recently because, as stated above, it puts enormous demands on the recognizer. In speech recognition, where the recognizer is at least one order of magnitude smaller, replicated convolutional networks are easier to implement, for instance in Haffner's Multi-State TDNN model.

A. Interpreting the Output of an SDNN with a GTN

The output of an SDNN is a sequence of vectors which encode the likelihoods, penalties, or scores of finding a character of a particular class label at the corresponding location in the input. A post-processor is required to pull out the best possible label sequence from this vector sequence. An example of SDNN output is shown in the figure below.

Fig. An example of multiple character recognition with SDNN. With SDNN, no explicit segmentation is performed.

Very often, individual characters are spotted by several neighboring instances of the recognizer, a consequence of the robustness of the recognizer to horizontal translations. Also quite often, characters are erroneously detected by recognizer instances that see only a piece of a character; for example, a recognizer instance that only sees the right third of a character might output the label of another character. How can we eliminate those extraneous characters from the output sequence and pull out the best interpretation? This can be done using a new type of Graph Transformer with two input graphs, as shown in the figure below.

Fig. A Graph Transformer pulls out the best interpretation from the output of the SDNN.

The sequence of vectors produced by the SDNN is first coded into a linear graph with multiple arcs between pairs of successive nodes. Each arc between a particular pair of nodes contains the label of one of the possible categories, together with the penalty produced by the SDNN for that class label at that location. This graph is called the SDNN output graph.
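As a small illustration (the data and names below are invented), coding the SDNN output into such a linear graph amounts to creating, between each pair of successive nodes, one arc per candidate class carrying the corresponding penalty:

def sdnn_output_graph(penalty_vectors, labels):
    # one penalty vector per horizontal location; nodes are 0..len(penalty_vectors)
    arcs = []
    for t, penalties in enumerate(penalty_vectors):
        for label, penalty in zip(labels, penalties):
            arcs.append((t, t + 1, label, penalty))
    return arcs

# e.g. three locations, two candidate classes per location
graph = sdnn_output_graph([[0.2, 1.5], [2.0, 0.1], [0.3, 0.9]], ["3", "4"])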

The second input graph to the transformer is a grammar transducer, more specifically a finite-state transducer, that encodes the relationship between input strings of class labels and corresponding output strings of recognized characters. The transducer is a weighted finite-state machine (a graph) where each arc contains a pair of labels and possibly a penalty. Like a finite-state machine, a transducer is in a state and follows an arc to a new state when an observed input symbol matches the first symbol in the symbol pair attached to the arc. At this point the transducer emits the second symbol in the pair, together with a penalty that combines the penalty of the input symbol and the penalty of the arc. A transducer therefore transforms a weighted symbol sequence into another weighted symbol sequence. The graph transformer shown in the figure performs a composition between the recognition graph and the grammar transducer. This operation takes every possible sequence corresponding to every possible path in the recognition graph and matches them with the paths in the grammar transducer. The composition produces the interpretation graph, which contains a path for each corresponding output label sequence. This composition operation may seem combinatorially intractable, but it turns out there exists an efficient algorithm for it, described in more detail in Section VIII.

B. Experiments with SDNN

In a series of experiments, LeNet was trained with the goal of being replicated so as to recognize multiple characters without segmentation. The data was generated from the previously described Modified NIST set as follows. Training images were composed of a central character flanked by two side characters picked at random in the training set. The separation between the bounding boxes of the characters was chosen at random. In other instances, no central character was present, in which case the desired output of the network was the blank space class. In addition, training images were degraded with salt and pepper noise (random pixel inversions).

The figures show a few examples of successful recognitions of multiple characters by the LeNet SDNN. Standard techniques based on Heuristic Over-Segmentation would fail miserably on many of those examples. As can be seen on these examples, the network exhibits striking invariance and noise resistance properties. While some authors have argued that invariance requires more sophisticated models than feed-forward neural networks, LeNet exhibits these properties to a large extent.


Fig. An SDNN applied to a noisy image of a digit string. The digits shown in the SDNN output represent the winning class labels, with a lighter grey level for high-penalty answers.

Similarly, it has been suggested that accurate recognition of multiple overlapping objects requires explicit mechanisms that would solve the so-called feature binding problem. As can be seen in the figures, the network is able to tell the characters apart even when they are closely intertwined, a task that would be impossible to achieve with the more classical Heuristic Over-Segmentation technique. The SDNN is also able to correctly group disconnected pieces of ink that form characters. Good examples of that are shown in the upper half of the figure above. In the top left example, the two central characters are more connected to each other than they are connected with themselves, yet the system correctly identifies them as separate objects. The top right example is interesting for several reasons: first, the system correctly identifies the three individual ones; second, the left half and right half of a disconnected character are correctly grouped, even though no geometrical information could decide whether to associate the left half with the vertical bar on its left or on its right. The right half of that character does cause the appearance of an erroneous extra character on the SDNN output, but this one is removed by the character model transducer, which prevents characters from appearing on contiguous outputs. Another important advantage of SDNN is the ease with which they can be implemented on parallel hardware. Specialized analog/digital chips have been designed and used in character recognition and in image preprocessing applications. However, the rapid progress of conventional processor technology with reduced-precision vector arithmetic instructions (such as Intel's MMX) makes the success of specialized hardware hypothetical at best.

Short video clips of the LeNet SDNN can be viewed at http://www.research.att.com/~yann/ocr.

C. Global Training of SDNN

In the above experiments, the string images were artificially generated from individual characters. The advantage is that we know in advance the location and the label of the important character. With real training data, the correct sequence of labels for a string is generally available, but the precise location of each corresponding character in the input image is unknown.

In the experiments described in the previous section, the best interpretation was extracted from the SDNN output using a very simple graph transformer. Global training of an SDNN can be performed by back-propagating gradients through such graph transformers, arranged in architectures similar to the ones described in Section VI.


This is somewhat equivalent to modeling the output of an SDNN with a Hidden Markov Model. Globally trained, variable-size TDNN/HMM hybrids have been used for speech recognition and on-line handwriting recognition, and Space Displacement Neural Networks have been used in combination with HMMs or other elastic matching methods for handwritten word recognition.

Fig. A globally trainable SDNN/HMM hybrid system expressed as a GTN.

The figure above shows the graph transformer architecture for training an SDNN/HMM hybrid with the discriminative forward criterion. The top part is comparable to the top part of the Discriminative Forward Training architecture. On the right side, the composition of the recognition graph with the grammar gives the interpretation graph with all the possible legal interpretations. On the left side, the composition is performed with a grammar that only contains paths with the desired sequence of labels; this has a somewhat similar function to the path selector used in the previous section. As in Section VI-D, the loss function is the difference between the forward score obtained from the left half and the forward score obtained from the right half. To back-propagate through the composition transformer, we need to keep a record of which arc in the recognition graph originated which arcs in the interpretation graph. The derivative with respect to an arc in the recognition graph is equal to the sum of the derivatives with respect to all the arcs in the interpretation graph that originated from it. Derivatives can also be computed for the penalties on the grammar graph, allowing them to be learned as well. As in the previous example, a discriminative criterion must be used, because using a non-discriminative criterion could result in a collapse effect if the network's output RBFs are adaptive. The above training procedure can be equivalently formulated in terms of HMMs. Early experiments in zip code recognition, and more recent experiments in on-line handwriting recognition, have demonstrated the idea of globally-trained SDNN/HMM hybrids. SDNN is an extremely promising and attractive technique for OCR, but so far it has not yielded better results than Heuristic Over-Segmentation. We hope that these results will improve as more experience is gained with these models.

D. Object Detection and Spotting with SDNN

An interesting application of SDNNs is object detection and spotting. The invariance properties of convolutional networks, combined with the efficiency with which they can be replicated over large fields, suggest that they can be used for "brute force" object spotting and detection in large images. The main idea is to train a single convolutional network to distinguish images of the object of interest from images present in the background. In utilization mode, the network is replicated so as to cover the entire image to be analyzed, thereby forming a two-dimensional Space Displacement Neural Network. The output of the SDNN is a two-dimensional plane in which activated units indicate the presence of the object of interest in the corresponding receptive field. Since the sizes of the objects to be detected within the image are unknown, the image can be presented to the network at multiple resolutions, and the results at the multiple resolutions combined. The idea has been applied to face location, address block location on envelopes, and hand tracking in video.

To illustrate the method, we will consider the case of face detection in images. First, images containing faces at various scales are collected. Those images are filtered through a zero-mean Laplacian filter so as to remove variations in global illumination and low-spatial-frequency illumination gradients. Then, training samples of faces and non-faces are manually extracted from those images. The face sub-images are then size normalized so that the height of the entire face is roughly constant, while keeping fairly large variations (within a factor of two). The scale of background sub-images is picked at random. A single convolutional network is trained on those samples to classify face sub-images from non-face sub-images.

When a scene image is to be analyzed, it is first filtered through the Laplacian filter and sub-sampled at powers-of-two resolutions. The network is replicated over each of the multiple resolution images. A simple voting technique is used to combine the results from the multiple resolutions.
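A rough sketch of this utilization mode is given below. It assumes a hypothetical fully_convolutional_face_scorer function standing in for the trained, replicated network, omits the Laplacian filtering step, and uses naive subsampling and a naive minimum size, so it only illustrates the multi-resolution scanning loop described in the text:

import numpy as np

def detect_faces(image, fully_convolutional_face_scorer, threshold):
    detections = []
    img, scale = image, 1
    while min(img.shape) >= 32:            # assumed minimum field of view
        scores = fully_convolutional_face_scorer(img)     # 2-D activation map
        for r, c in zip(*np.where(scores > threshold)):
            # map back to approximate original-image coordinates
            detections.append((r * scale, c * scale, scale, float(scores[r, c])))
        img = img[::2, ::2]                # sub-sample at powers of two
        scale *= 2
    # results from the different resolutions can then be combined, e.g. by voting;
    # here we simply return them ordered by score
    detections.sort(key=lambda d: -d[3])
    return detections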

A two-dimensional version of the global training method described in the previous section can be used to alleviate the need to manually locate faces when building the training sample. Each possible location is seen as an alternative interpretation, i.e. one of several parallel arcs in a simple graph that only contains a start node and an end node.

Other authors have used neural networks or other classifiers, such as Support Vector Machines, for face detection with great success. Their systems are very similar to the one described above, including the idea of presenting the image to the network at multiple scales.


But since those systems do not use convolutional networks, they cannot take advantage of the speedup described here, and have to rely on other techniques, such as pre-filtering and real-time tracking, to keep the computational requirement within reasonable limits. In addition, because those classifiers are much less invariant to scale variations than convolutional networks, it is necessary to multiply the number of scales at which the images are presented to the classifier.

VIII. Graph Transformer Networks and Transducers

In Section IV, Graph Transformer Networks (GTN) were introduced as a generalization of multi-layer, multi-module networks where the state information is represented as graphs instead of fixed-size vectors. This section reinterprets the GTNs in the framework of Generalized Transduction, and proposes a powerful graph composition algorithm.

A. Previous Work

Numerous authors in speech recognition have used Gradient-Based Learning methods that integrate graph-based statistical models (notably HMMs) with acoustic recognition modules, mainly Gaussian mixture models, but also neural networks. Similar ideas have been applied to handwriting recognition (see the references for a review). However, there has been no proposal for a systematic approach to multi-layer graph-based trainable systems. The idea of transforming graphs into other graphs has received considerable interest in computer science, through the concept of weighted finite-state transducers. Transducers have been applied to speech recognition and language translation, and proposals have been made for handwriting recognition. This line of work has been mainly focused on efficient search algorithms and on the algebraic aspects of combining transducers and graphs (called acceptors in this context), but very little effort has been devoted to building globally trainable systems out of transducers. What is proposed in the following sections is a systematic approach to automatic training in graph-manipulating systems. A different approach to graph-based trainable systems, called Input-Output HMM, has also been proposed.

B. Standard Transduction

In the established framework of finite-state transducers, discrete symbols are attached to arcs in the graphs. Acceptor graphs have a single symbol attached to each arc, whereas transducer graphs have two symbols (an input symbol and an output symbol). A special null symbol is absorbed by any other symbol (when concatenating symbols to build a symbol sequence). Weighted transducers and acceptors also have a scalar quantity attached to each arc. In this framework, the composition operation takes as input an acceptor graph and a transducer graph, and builds an output acceptor graph. Each path in this output graph (with symbol sequence S_out) corresponds to one path (with symbol sequence S_in) in the input acceptor graph and one path, with a corresponding pair of input-output sequences (S_out, S_in), in the transducer graph. The weights on the arcs of the output graph are obtained by adding the weights from the matching arcs in the input acceptor and transducer graphs. In the rest of the paper, we will call this graph composition operation using transducers the (standard) transduction operation.

A simple example of transduction is shown in the composition figure of the next subsection. In this simple example, the input and output symbols on the transducer arcs are always identical. This type of transducer graph is called a grammar graph. To better understand the transduction operation, imagine two tokens sitting each on the start nodes of the input acceptor graph and the transducer graph. The tokens can freely follow any arc labeled with a null input symbol. A token can follow an arc labeled with a non-null input symbol if the other token also follows an arc labeled with the same input symbol. We have an acceptable trajectory when both tokens reach the end nodes of their graphs (i.e., the tokens have reached the terminal configuration). This trajectory represents a sequence of input symbols that complies with both the acceptor and the transducer. We can then collect the corresponding sequence of output symbols along the trajectory of the transducer token. The above procedure produces a tree, but a simple technique described in Section VIII-C can be used to avoid generating multiple copies of certain subgraphs, by detecting when a particular output state has already been seen.

The transduction operation can be performed very efficiently, but it presents complex bookkeeping problems concerning the handling of all combinations of null and non-null symbols. If the weights are interpreted as probabilities (normalized appropriately), then an acceptor graph represents a probability distribution over the language defined by the set of label sequences associated to all possible paths (from the start to the end node) in the graph.

An example of application of the transduction operation is the incorporation of linguistic constraints (a lexicon or a grammar) when recognizing words or other character strings. The recognition transformer produces the recognition graph (an acceptor graph) by applying the neural network recognizer to each candidate segment. This acceptor graph is composed with a transducer graph for the grammar. The grammar transducer contains a path for each legal sequence of symbols, possibly augmented with penalties to indicate the relative likelihoods of the possible sequences. The arcs contain identical input and output symbols. Another example of transduction was mentioned in Section V: the path selector used in the heuristic over-segmentation training GTN is implementable by a composition. The transducer graph is a linear graph which contains the correct label sequence. The composition of the interpretation graph with this linear graph yields the constrained graph.

C. Generalized Transduction

If the data structures associated to each arc took only a finite number of values, composing the input graph and an appropriate transducer would be a sound solution.


For our applications, however, the data structures attached to the arcs of the graphs may be vectors, images or other high-dimensional objects that are not readily enumerated. We present a new composition operation that solves this problem.

Instead of only handling graphs with discrete symbols and penalties on the arcs, we are interested in considering graphs whose arcs may carry complex data structures, including continuous-valued data structures such as vectors and images. Composing such graphs requires additional information:

- When examining a pair of arcs (one from each input graph), we need a criterion to decide whether to create corresponding arc(s) and node(s) in the output graph, based on the information attached to the input arcs. We can decide to build an arc, several arcs, or an entire sub-graph with several nodes and arcs.
- When that criterion is met, we must build the corresponding arc(s) and node(s) in the output graph, and compute the information attached to the newly created arc(s) as a function of the information attached to the input arcs.

These functions are encapsulated in an object called a Composition Transformer. An instance of Composition Transformer implements three methods:

check(arc1, arc2) compares the data structures pointed to by arcs arc1 (from the first graph) and arc2 (from the second graph) and returns a boolean indicating whether corresponding arc(s) should be created in the output graph.

fprop(ngraph, upnode, downnode, arc1, arc2) is called when check(arc1, arc2) returns true. This method creates new arcs and nodes between nodes upnode and downnode in the output graph ngraph, and computes the information attached to these newly created arcs as a function of the information attached to the input arcs arc1 and arc2.

bprop(ngraph, upnode, downnode, arc1, arc2) is called during training, in order to propagate gradient information from the output sub-graph between upnode and downnode into the data structures on arc1 and arc2, as well as with respect to the parameters that were used in the fprop call with the same arguments. This assumes that the function used by fprop to compute the values attached to its output arcs is differentiable.
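As a minimal illustration of this interface (class and field names are invented; this is not the implementation used in the systems described here), a Composition Transformer specialized to standard transduction can be written as follows, with check testing symbol equality, fprop creating one arc carrying the output symbol and the sum of the two penalties, and bprop copying the gradient back unchanged onto both input arcs:

from dataclasses import dataclass

@dataclass
class Arc:
    symbol: str            # acceptor symbol, or transducer input symbol
    out_symbol: str = ""   # transducer output symbol (unused on acceptor arcs)
    penalty: float = 0.0
    grad: float = 0.0

@dataclass
class OutputArc:
    symbol: str
    penalty: float
    sources: tuple          # the (acceptor arc, transducer arc) pair
    grad: float = 0.0

class StandardTransduction:
    def check(self, arc1: Arc, arc2: Arc) -> bool:
        # build output arcs only when the acceptor symbol matches the
        # transducer's input symbol
        return arc1.symbol == arc2.symbol

    def fprop(self, ngraph: list, upnode, downnode, arc1: Arc, arc2: Arc):
        # one new arc: transducer output symbol, summed penalties
        ngraph.append((upnode, downnode,
                       OutputArc(arc2.out_symbol,
                                 arc1.penalty + arc2.penalty,
                                 (arc1, arc2))))

    def bprop(self, ngraph: list, upnode, downnode, arc1: Arc, arc2: Arc):
        # the output penalty is a plain sum, so the gradient flows back
        # unchanged to both input arcs
        for _, _, out in ngraph:
            if out.sources[0] is arc1 and out.sources[1] is arc2:
                arc1.grad += out.grad
                arc2.grad += out.grad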

The check method can be seen as constructing a dynamic architecture of functional dependencies, while the fprop method performs a forward propagation through that architecture to compute the numerical information attached to the arcs. The bprop method performs a backward propagation through the same architecture to compute the partial derivatives of the loss function with respect to the information attached to the arcs. This is illustrated in the figure below.

Fig. Example of composition of the recognition graph with the grammar graph, in order to build an interpretation that is consistent with both of them (in this example, the interpretations cut, cap and cat, with cumulated penalties 2.0, 0.8 and 1.4 respectively). During the forward propagation (dark arrows), the methods check and fprop are used. Gradients (dashed arrows) are back-propagated with the application of the method bprop.

The pseudo-code below shows a simplified generalized graph composition algorithm.

Function generalized_composition(PGRAPH graph1,
                                 PGRAPH graph2,
                                 PTRANS trans)
Returns PGRAPH
{
   // Create new graph
   PGRAPH ngraph = new_graph()

   // Create map between token positions
   // and nodes of the new graph
   PNODE map[PNODE, PNODE] = new_empty_map()
   map[endnode(graph1), endnode(graph2)] = endnode(ngraph)

   // Recursive subroutine for simulating tokens
   Function simtokens(PNODE node1, PNODE node2)
   Returns PNODE
   {
      PNODE currentnode = map[node1, node2]
      // Check if already visited
      If (currentnode == nil)
         // Record new configuration
         currentnode = ngraph->create_node()
         map[node1, node2] = currentnode
         // Enumerate the possible non-null
         // joint token transitions
         For ARC arc1 in down_arcs(node1)
            For ARC arc2 in down_arcs(node2)
               If (trans->check(arc1, arc2))
                  PNODE newnode = simtokens(down_node(arc1),
                                            down_node(arc2))
                  trans->fprop(ngraph, currentnode,
                               newnode, arc1, arc2)
      // Return node in composed graph
      Return currentnode
   }

   // Perform token simulation
   simtokens(startnode(graph1), startnode(graph2))
   Delete map
   Return ngraph
}

Fig. Pseudo-code for a simplified generalized composition algorithm. To simplify the presentation, we do not handle null transitions nor implement dead-end avoidance. The two main components of the composition appear clearly here: (a) the recursive function simtokens enumerating the token trajectories, and (b) the associative array map used for remembering which nodes of the composed graph have been visited.

This simplified algorithm does not handle null transitions, and does not check whether the token trajectory is acceptable (i.e., whether both tokens simultaneously reach the end nodes of their graphs). The management of null transitions is a straightforward modification of the token simulation function: before enumerating the possible non-null joint token transitions, we loop on the possible null transitions of each token, recursively call the token simulation function, and finally call the method fprop. The safest way of identifying acceptable trajectories consists in running a preliminary pass to identify the token configurations from which we can reach the terminal configuration (i.e., both tokens on the end nodes). This is easily achieved by enumerating the trajectories in the opposite direction: we start on the end nodes and follow the arcs upstream. During the main pass, we only build the nodes that allow the tokens to reach the terminal configuration.

Graph composition using transducers (i.e., standard transduction) is easily and efficiently implemented as a generalized transduction: the method check simply tests the equality of the input symbols on the two arcs, and the method fprop creates a single arc whose symbol is the output symbol on the transducer's arc.

The composition between pairs of graphs is particularly useful for incorporating linguistic constraints in a handwriting recognizer. Examples of its use are given in the on-line handwriting recognition system described in Section IX and in the check reading system described in Section X.

In the rest of the paper, the term Composition Transformer will denote a Graph Transformer based on the generalized transduction of multiple graphs. The concept of generalized transduction is a very general one: in fact, many of the graph transformers described earlier in this paper, such as the segmenter and the recognizer, can be formulated in terms of generalized transduction.


In this case, the generalized transduction does not take two input graphs but a single input graph. The method fprop of the transformer may create several arcs, or even a complete subgraph, for each arc of the initial graph. In fact, the pair (check, fprop) itself can be seen as procedurally defining a transducer.

In addition, it can be shown that the generalized transduction of a single graph is theoretically equivalent to the standard composition of this graph with a particular transducer graph. However, implementing the operation this way may be very inefficient, since the transducer can be very complicated.

In practice, the graph produced by a generalized transduction is represented procedurally, in order to avoid building the whole output graph (which may be huge when, for example, the interpretation graph is composed with the grammar graph). We only instantiate the nodes which are visited by the search algorithm during recognition (e.g., Viterbi). This strategy propagates the benefits of pruning algorithms (e.g., beam search) throughout the Graph Transformer Network.

D. Notes on the Graph Structures

Section VI has discussed the idea of global training by back-propagating gradients through simple graph transformers. The bprop method is the basis of the back-propagation algorithm for generic graph transformers. A generalized composition transformer can be seen as dynamically establishing functional relationships between the numerical quantities on the input and output arcs. Once the check function has decided that a relationship should be established, the fprop function implements the numerical relationship. The check function establishes the structure of the ephemeral network inside the composition transformer.

Since fprop is assumed to be differentiable, gradients can be back-propagated through that structure. Most parameters affect the scores stored on the arcs of the successive graphs of the system. A few threshold parameters may determine whether an arc appears or not in the graph. Since non-existing arcs are equivalent to arcs with very large penalties, we only consider the case of parameters affecting the penalties.

In the kind of systems we have discussed until now (and the application described in Section X), much of the knowledge about the structure of the graph that is produced by a Graph Transformer is determined by the nature of the Graph Transformer, but it may also depend on the value of the parameters and on the input. It may also be interesting to consider Graph Transformer modules which attempt to learn the structure of the output graph. This might be considered a combinatorial problem and not amenable to Gradient-Based Learning, but a solution to this problem is to generate a large graph that contains the graph candidates as sub-graphs, and then select the appropriate sub-graph.


E. GTN and Hidden Markov Models

GTNs can be seen as a generalization and an extension of HMMs. On the one hand, the probabilistic interpretation can be either kept (with penalties being log-probabilities), pushed to the final decision stage (with the difference of the constrained forward penalty and the unconstrained forward penalty being interpreted as negative log-probabilities of label sequences), or dropped altogether (the network just represents a decision surface for label sequences in input space). On the other hand, Graph Transformer Networks extend HMMs by allowing to combine, in a well-principled framework, multiple levels of processing, or multiple models (e.g., Pereira et al. have been using the transducer framework for stacking HMMs representing different levels of processing in automatic speech recognition).

Unfolding an HMM in time yields a graph that is very similar to our interpretation graph (at the final stage of processing of the Graph Transformer Network, before Viterbi recognition). It has nodes n(t, i) associated to each time step t and state i in the model. The penalty c_i for an arc from n(t-1, j) to n(t, i) then corresponds to the negative log-probability of emitting observed data o_t at position t and going from state j to state i in the time interval (t-1, t). With this probabilistic interpretation, the forward penalty is the negative logarithm of the likelihood of the whole observed data sequence, given the model.
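To make the forward penalty concrete, here is a minimal sketch of the log-add recursion over a small acyclic trellis. The graph encoding, node names and penalty values are illustrative assumptions, not taken from the paper.

```python
import math

def logadd(a, b):
    # Penalty-domain "sum": -log(exp(-a) + exp(-b)), computed stably.
    m = min(a, b)
    return m - math.log(math.exp(m - a) + math.exp(m - b))

def forward_penalty(arcs, nodes, start, end):
    """Forward penalty of an acyclic graph.
    arcs  : (source, destination, penalty) triples, penalties = -log probabilities
    nodes : all nodes listed in topological order, `start` first
    The result is -log of the sum over all start-to-end paths of
    exp(-(total path penalty)), i.e. the negative log-likelihood of the data
    under the probabilistic interpretation described above."""
    outgoing = {}
    for s, d, p in arcs:
        outgoing.setdefault(s, []).append((d, p))
    f = {n: float("inf") for n in nodes}
    f[start] = 0.0
    for n in nodes:
        if f[n] == float("inf"):
            continue
        for d, p in outgoing.get(n, []):
            f[d] = logadd(f[d], f[n] + p)
    return f[end]

# A toy 2-state, 2-step trellis: nodes (t, i); arc penalties play the role of
# -log P(o_t, state i at time t | state j at time t-1).
nodes = ["start", (1, 0), (1, 1), (2, 0), (2, 1), "end"]
arcs = [("start", (1, 0), 0.9), ("start", (1, 1), 1.6),
        ((1, 0), (2, 0), 0.5), ((1, 0), (2, 1), 2.0),
        ((1, 1), (2, 0), 1.2), ((1, 1), (2, 1), 0.7),
        ((2, 0), "end", 0.0), ((2, 1), "end", 0.0)]
print(forward_penalty(arcs, nodes, "start", "end"))
```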

In Section VI we mentioned that the collapsing phenomenon can occur when non-discriminative loss functions are used to train neural network/HMM hybrid systems. With classical HMMs with fixed preprocessing, this problem does not occur because the parameters of the emission and transition probability models are forced to satisfy certain probabilistic constraints: the sum or the integral of the probabilities of a random variable over its possible values must be one. Therefore, when the probability of certain events is increased, the probability of other events must automatically be decreased. On the other hand, if the probabilistic assumptions in an HMM (or other probabilistic model) are not realistic, discriminative training, discussed in Section VI, can improve performance, as this has been clearly shown for speech recognition systems.

The Input-Output HMM model (IOHMM) is strongly related to graph transformers. Viewed as a probabilistic model, an IOHMM represents the conditional distribution of output sequences given input sequences (of the same or a different length). It is parameterized from an emission probability module and a transition probability module. The emission probability module computes the conditional emission probability of an output variable, given an input value and the value of a discrete "state" variable. The transition probability module computes conditional transition probabilities of a change in the value of the "state" variable, given an input value. Viewed as a graph transformer, it assigns an output graph (representing a probability distribution over the sequences of the output variable) to each path in the input graph. All these output graphs have the same structure, and the penalties on their arcs are simply added in order to obtain the complete output graph. The input values of the emission and transition modules are read off the data structure on the input arcs of the IOHMM Graph Transformer. In practice, the output graph may be very large, and needs not be completely instantiated (i.e., it is pruned: only the low-penalty paths are created).

IX. An On-Line Handwriting Recognition System

Natural handwriting is often a mixture of different "styles": lower case printed, upper case, and cursive. A reliable recognizer for such handwriting would greatly improve interaction with pen-based devices, but its implementation presents new technical challenges. Characters taken in isolation can be very ambiguous, but considerable information is available from the context of the whole word. We have built a word recognition system for pen-based devices based on four main modules: a preprocessor that normalizes a word, or word group, by fitting a geometrical model to the word structure; a module that produces an "annotated image" from the normalized pen trajectory; a replicated convolutional neural network that spots and recognizes characters; and a GTN that interprets the network's output by taking word-level constraints into account. The network and the GTN are jointly trained to minimize an error measure defined at the word level.

In this work, we have compared a system based on SDNNs (such as described in Section VII), and a system based on Heuristic Over-Segmentation (such as described in Section V). Because of the sequential nature of the information in the pen trajectory (which reveals more information than the purely optical input from an image), Heuristic Over-Segmentation can be very efficient in proposing candidate character cuts, especially for non-cursive script.

A. Preprocessing

Input normalization reduces intra-character variability, simplifying character recognition. We have used a word normalization scheme based on fitting a geometrical model of the word structure. Our model has four "flexible" lines representing respectively the ascenders line, the core line, the base line, and the descenders line. The lines are fitted to local minima or maxima of the pen trajectory. The parameters of the lines are estimated with a modified version of the EM algorithm to maximize the joint probability of observed points and parameter values, using a prior on parameters that prevents the lines from collapsing on each other.

The recognition of handwritten characters from a pen trajectory on a digitizing surface is often done in the time domain. Typically, trajectories are normalized, and local geometrical or dynamical features are extracted. The recognition may then be performed using curve matching, or other classification techniques such as TDNNs. While these representations have several advantages, their dependence on stroke ordering and individual writing styles makes them difficult to use in high-accuracy, writer-independent systems that integrate the segmentation with the recognition.


"Script" "Script"

Viterbi Graph Viterbi Graph

Beam Search Beam Search Transformer Transformer

Interpretation Graph Interpretation Graph

Language Compose Model Compose

Recognition Graph Recognition Graph

Character Recognition Compose Transformer Model SDNN Output AMAP Graph SDNN AMAP Computation Transformer

Segmentation Graph AMAP

Segmentation Transformer AMAP Computation Normalized Word Normalized Word

Word Normalization Word Normalization

Fig An online handwriting recognition GTN based on heuristic

Fig An online handwriting recognition GTN based on Space

oversegmentation

Displacement Neural Network

Since the intent of the writer is to produce a legible image, it seems natural to preserve as much of the pictorial nature of the signal as possible, while at the same time exploit the sequential information in the trajectory. For this purpose we have designed a representation scheme called AMAP, where pen trajectories are represented by low-resolution images in which each picture element contains information about the local properties of the trajectory. An AMAP can be viewed as an "annotated image" in which each pixel is a five-element feature vector: four features are associated to four orientations of the pen trajectory in the area around the pixel, and the fifth one is associated to local curvature in the area around the pixel. A particularly useful feature of the AMAP representation is that it makes very few assumptions about the nature of the input trajectory. It does not depend on stroke ordering or writing speed, and it can be used with all types of handwriting (capital, lower case, cursive, punctuation, symbols). Unlike many other representations (such as global features), AMAPs can be computed for complete words without requiring segmentation.

lo cal curvature in the area around the pixel A particu was varied according to the width of the input word Once

larly useful feature of the AMAP representation is that it the numb er of subsampling layers and the sizes of the ker

makes very few assumptions ab out the nature of the input nels are chosen the sizes of all the layers including the

tra jectory It do es not dep end on stroke ordering or writ input are determined unambiguously The only architec

ing sp eed and it can b e used with all typ es of handwriting tural parameters that remain to b e selected are the num

capital lower case cursive punctuation symb ols Un b er of feature maps in eachlayer and the information as

like many other representations such as global features to what feature map is connected to what other feature

AMAPs can b e computed for complete words without re map In our case the subsampling rates were chosen as

quiring segmentation small as possible x and the kernels as small as pos


Kernel sizes in the upper layers are chosen to be as small as possible while satisfying the size constraints mentioned above. Larger architectures did not necessarily perform better and required considerably more time to be trained. A very small architecture with half the input field also performed worse, because of insufficient input resolution. Note that the input resolution is nonetheless much less than for optical character recognition, because the angle and curvature provide more information than would a single grey level at each pixel.

C. Network Training

Training proceeded in two phases. First, we kept the centers of the RBFs fixed, and trained the network weights so as to minimize the output distance of the RBF unit corresponding to the correct class. This is equivalent to minimizing the mean-squared error between the previous layer and the center of the correct-class RBF. This bootstrap phase was performed on isolated characters. In the second phase, all the parameters, network weights and RBF centers, were trained globally to minimize a discriminative criterion at the word level.

With the Heuristic Over-Segmentation approach, the GTN was composed of four main Graph Transformers:
1. The Segmentation Transformer performs the Heuristic Over-Segmentation, and outputs the segmentation graph. An AMAP is then computed for each image attached to the arcs of this graph.
2. The Character Recognition Transformer applies the convolutional network character recognizer to each candidate segment, and outputs the recognition graph, with penalties and classes on each arc.
3. The Composition Transformer composes the recognition graph with a grammar graph representing a language model incorporating lexical constraints.
4. The Beam Search Transformer extracts a good interpretation from the interpretation graph. This task could have been achieved with the usual Viterbi Transformer. The Beam Search algorithm, however, implements pruning strategies which are appropriate for large interpretation graphs.

With the SDNN approach, the main Graph Transformers are the following:
1. The SDNN Transformer replicates the convolutional network over the whole word image, and outputs a recognition graph that is a linear graph with class penalties for every window centered at regular intervals on the input image.
2. The Character-Level Composition Transformer composes the recognition graph with a left-to-right HMM for each character class.
3. The Word-Level Composition Transformer composes the output of the previous transformer with a language model incorporating lexical constraints, and outputs the interpretation graph.
4. The Beam Search Transformer extracts a good interpretation from the interpretation graph.

In this application, the language model simply constrains the final output graph to represent sequences of character labels from a given dictionary. Furthermore, the interpretation graph is not actually completely instantiated: the only nodes created are those that are needed by the Beam Search module. The interpretation graph is therefore represented procedurally rather than explicitly; a sketch of this kind of pruned search is given below.
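The sketch below illustrates beam-style pruning over a penalty graph: only the lowest-penalty partial paths are kept alive at each step, so a large interpretation graph never needs to be fully instantiated. The beam width, graph encoding and penalty values are assumptions made for the example, not values from the paper.

```python
import heapq

def beam_search(graph, start, end, beam_width=3):
    """Expand partial paths step by step, keeping only the `beam_width`
    lowest-penalty hypotheses.  `graph` maps a node to a list of
    (label, penalty, next_node) arcs; returns (total_penalty, labels) of the
    best complete path found, or None if no path reaches `end`."""
    beam = [(0.0, start, [])]          # (accumulated penalty, node, labels so far)
    best = None
    while beam:
        new_beam = []
        for penalty, node, labels in beam:
            if node == end:
                if best is None or penalty < best[0]:
                    best = (penalty, labels)
                continue
            for label, p, nxt in graph.get(node, []):
                new_beam.append((penalty + p, nxt, labels + [label]))
        # prune: keep only the lowest-penalty partial hypotheses
        beam = heapq.nsmallest(beam_width, new_beam, key=lambda h: h[0])
    return best

# Toy usage on a small recognition-like graph.
g = {0: [("3", 0.1, 1), ("B", 23.6, 1)],
     1: [("$", 0.2, 2), ("*", 0.4, 2)],
     2: [("5", 0.3, "end")]}
print(beam_search(g, 0, "end"))
```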

A crucial contribution of this research was the joint training of all graph transformer modules within the network with respect to a single criterion, as explained in Sections VI and VIII. We used the Discriminative Forward loss function on the final output graph: minimize the forward penalty of the constrained interpretation (i.e., along all the "correct" paths) while maximizing the forward penalty of the whole interpretation graph (i.e., along all the paths). During global training, the loss function was optimized with the stochastic diagonal Levenberg-Marquardt procedure described in Appendix C, that uses second derivatives to compute optimal learning rates. This optimization operates on all the parameters in the system, most notably the network weights and the RBF centers.
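The Discriminative Forward loss named above can be sketched directly in terms of path penalties: the forward penalty restricted to the correct paths minus the forward penalty of the full graph. For brevity the sketch below operates on explicitly enumerated path penalties rather than on the graph recursion shown earlier; the penalty values are made up for illustration.

```python
import math

def forward_penalty(path_penalties):
    # Penalty-domain sum over a set of complete paths:
    # -log( sum over paths of exp(-penalty(path)) ), computed stably.
    m = min(path_penalties)
    return m - math.log(sum(math.exp(m - p) for p in path_penalties))

def discriminative_forward_loss(all_paths, correct_paths):
    """E_dforw = C_dforw - C_forw: forward penalty restricted to the paths
    carrying the correct label sequence, minus the forward penalty of the full
    graph.  The loss is non-negative, and vanishes when all the low-penalty
    paths agree with the correct labels."""
    return forward_penalty(correct_paths) - forward_penalty(all_paths)

# Toy usage: total penalties of four interpretation paths, two of which carry
# the correct label sequence.
all_paths = [1.2, 3.0, 2.1, 4.5]
correct_paths = [1.2, 2.1]
print(discriminative_forward_loss(all_paths, correct_paths))
```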

D. Experimental Results

In the first set of experiments, we evaluated the generalization ability of the neural network classifier coupled with the word normalization preprocessing and AMAP input representation. All results are in writer-independent mode (different writers in training and testing). Initial training on isolated characters was performed on a large database of hand-printed characters, with classes covering upper case, lower case, digits, and punctuation. Tests on a database of isolated characters were performed separately on the four types of characters: upper case, lower case, digits, and punctuation. Experiments were performed with the network architecture described above. To enhance the robustness of the recognizer to variations in position, size, orientation, and other distortions, additional training data was generated by applying local affine transformations to the original characters.

The second and third set of experiments concerned the recognition of lower case words (writer independent). The tests were performed on a database of whole words. First we evaluated the improvements brought by the word normalization to the system. For the SDNN/HMM system we have to use word-level normalization, since the network sees one whole word at a time. With the Heuristic Over-Segmentation system, and before doing any word-level training, we measured word and character error rates (adding insertions, deletions and substitutions) when the search was constrained within a word dictionary, first with character-level normalization. When using the word normalization preprocessing instead of a character-level normalization, both word and character error rates dropped substantially. This suggests that normalizing the word in its entirety is better than first segmenting it and then normalizing and processing each of the segments.


Fig. Comparative results (character error rates) showing the improvement brought by global training on the SDNN/HMM hybrid and on the Heuristic Over-Segmentation system (HOS), without and with a 25K-word dictionary.

    System                              no global training    with global training
    SDNN/HMM, no language model                12.4                   8.2
    HOS, no language model                      8.5                   6.3
    HOS, 25K word lexicon                       2                     1.4

In the third set of experiments, we measured the improvements obtained with the joint training of the neural network and the post-processor with the word-level criterion, in comparison to training based only on the errors performed at the character level. After initial training on individual characters, as above, global word-level discriminative training was performed with a database of lower case words. For the SDNN/HMM system, without any dictionary constraints, word and character error rates dropped after word-level training. For the Heuristic Over-Segmentation system and a slightly improved architecture, without any dictionary constraints, the error rates also dropped after word-level training, and with a word dictionary they dropped further (the character error rates are shown in the figure above). Even lower error rates can be obtained by drastically reducing the size of the dictionary.

These results clearly demonstrate the usefulness of globally trained Neural-Net/HMM hybrids for handwriting recognition. This confirms similar results obtained earlier in speech recognition.

X. A Check Reading System

This section describes a GTN based Check Reading System, intended for immediate industrial deployment. It also shows how the use of Gradient-Based Learning and GTNs make this deployment fast and cost-effective while yielding an accurate and reliable solution.

The verification of the amount on a check is a task that is extremely time and money consuming for banks. As a consequence, there is a very high interest in automating the process as much as possible. Even a partial automation would result in considerable cost reductions. The threshold of economic viability for automatic check readers, as set by the bank, is that a given fraction of the checks be read with less than a prescribed error rate, the other checks being rejected and sent to human operators. In such a case, we describe the performance of the system by the fractions of checks correctly recognized, rejected, and in error. The system presented here was one of the first to cross that threshold on representative mixtures of business and personal checks.

Checks contain at least two versions of the amount. The Courtesy amount is written with numerals, while the Legal amount is written with letters. On business checks, which are generally machine-printed, these amounts are relatively easy to read, but quite difficult to find due to the lack of standard for business check layout. On the other hand, these amounts on personal checks are easy to find but much harder to read.

For simplicity (and speed requirements), our initial task is to read the Courtesy amount only. This task consists of two main steps. First, the system has to find, among all the fields (lines of text), the candidates that are the most likely to contain the courtesy amount. This is obvious for many personal checks, where the position of the amount is standardized. However, as already noted, finding the amount can be rather difficult in business checks, even for the human eye. There are many strings of digits, such as the check number, the date, or even "not to exceed" amounts, that can be confused with the actual amount. In many cases, it is very difficult to decide which candidate is the courtesy amount before performing a full recognition. Second, in order to read (and choose) some Courtesy amount candidates, the system has to segment the fields into characters, read and score the candidate characters, and finally find the best interpretation of the amount using contextual knowledge represented by a stochastic grammar for check amounts.

The GTN methodology was used to build a check amount reading system that handles both personal checks and business checks.

A. A GTN for Check Amount Recognition

We now describe the successive graph transformations that allow this network to read the check amount (cf. the check reader figure below). Each Graph Transformer produces a graph whose paths encode and score the current hypotheses considered at this stage of the system.

The input to the system is a trivial graph with a single arc that carries the image of the whole check.

The field location transformer T_field first performs classical image analysis (including connected component analysis, ink density histograms, layout analysis, etc.) and heuristically extracts rectangular zones that may contain the check amount. T_field produces an output graph, called the field graph, such that each candidate zone is associated with one arc that links the start node to the end node. Each arc contains the image of the zone, and a penalty term computed from simple features extracted from the zone (absolute position, size, aspect ratio, etc.). The penalty term is close to zero if the features suggest that the field is a likely candidate, and is large if the field is deemed less likely to be an amount. The penalty function is differentiable, therefore its parameters are globally tunable.


Fig. A complete check amount reader implemented as a single cascade of Graph Transformer modules. Successive graph transformations progressively extract higher level information.

An arc may represent separate dollar and cent amounts as a sequence of fields. In fact, in handwritten checks, the cent amount may be written over a fractional bar, and not aligned at all with the dollar amount. In the worst case, one may find several cent amount candidates (above and below the fraction bar) for the same dollar amount.

The segmentation transformer T_seg, similar to the one described in Section VIII, examines each zone contained in the field graph, and cuts each image into pieces of ink using heuristic image processing techniques. Each piece of ink may be a whole character or a piece of character. Each arc in the field graph is replaced by its corresponding segmentation graph that represents all possible groupings of pieces of ink. Each field segmentation graph is appended to an arc that contains the penalty of the field in the field graph. Each arc carries the segment image, together with a penalty that provides a first evaluation of the likelihood that the segment actually contains a character. This penalty is obtained with a differentiable function that combines a few simple features, such as the space between the pieces of ink, or the compliance of the segment image with a global baseline, and a few tunable parameters. The segmentation graph represents all the possible segmentations of all the field images. We can compute the penalty for one segmented field by adding the arc penalties along the corresponding path. As before, using a differentiable function for computing the penalties will ensure that the parameters can be optimized globally.

The segmenter uses a variety of heuristics to find candidate cuts. One of the most important ones is called "hit and deflect". The idea is to cast lines downward from the top of the field image. When a line hits a black pixel, it is deflected so as to follow the contour of the object. When a line hits a local minimum of the upper profile, i.e. when it cannot continue downward without crossing a black pixel, it is just propagated vertically downward through the ink. When two such lines meet each other, they are merged into a single cut. The procedure can be repeated from the bottom up. This strategy allows the separation of touching characters, such as double zeros.
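A much-simplified sketch of one such cut line follows. The one-pixel-per-row descent, the small deflection range and the binary-image encoding are assumptions made for the example; the real heuristic, including the bottom-up pass and the merging of lines, is more elaborate.

```python
import numpy as np

def hit_and_deflect(img, col, max_deflect=3):
    """Trace one downward cut line through a binary image (1 = ink), starting
    at column `col`.  Returns the list of (row, column) positions visited."""
    rows, cols = img.shape
    r, c = 0, col
    path = [(r, c)]
    while r < rows - 1:
        if img[r + 1, c] == 0:                    # free pixel below: go down
            r += 1
        else:
            # deflect sideways to follow the contour of the object
            for dc in range(1, max_deflect + 1):
                if c - dc >= 0 and img[r + 1, c - dc] == 0:
                    c -= dc; r += 1; break
                if c + dc < cols and img[r + 1, c + dc] == 0:
                    c += dc; r += 1; break
            else:
                # local minimum of the upper profile: push straight through the ink
                r += 1
        path.append((r, c))
    return path

# Toy usage: a 6x5 image with a short vertical bar of ink in column 2.
img = np.zeros((6, 5), dtype=int)
img[2:5, 2] = 1
print(hit_and_deflect(img, col=2))
```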

The recognition transformer T_rec iterates over all segment arcs in the segmentation graph and runs a character recognizer on the corresponding segment image. In our case, the recognizer is LeNet, the Convolutional Neural Network described in Section II, whose weights constitute the largest and most important subset of tunable parameters. The recognizer classifies segment images into one of the classes of the full printable ASCII set, plus a rubbish class for unknown symbols or badly-formed characters. Each arc in the input graph is replaced, in the output graph, by one arc per class. Each of those arcs contains the label of one of the classes, and a penalty that is the sum of the penalty of the corresponding arc in the input (segmentation) graph and the penalty associated with classifying the image in the corresponding class, as computed by the recognizer. In other words, the recognition graph represents a weighted trellis of scored character classes. Each path in this graph represents a possible character string for the corresponding field. We can compute a penalty for this interpretation by adding the penalties along the path. This sequence of characters may or may not be a valid check amount.

The composition transformer T_gram selects the paths of the recognition graph that represent valid character sequences for check amounts. This transformer takes two graphs as input: the recognition graph and the grammar graph. The grammar graph contains all possible sequences of symbols that constitute a well-formed amount. The output of the composition transformer, called the interpretation graph, contains all the paths in the recognition graph that are compatible with the grammar. The operation that combines the two input graphs to produce the output is a generalized transduction (see Section VIII-A). A differentiable function is used to compute the data attached to the output arc from the data attached to the input arcs. In our case, the output arc receives the class label of the two arcs, and a penalty computed by simply summing the penalties of the two input arcs (the recognizer penalty, and the arc penalty in the grammar graph). Each path in the interpretation graph represents one interpretation of one segmentation of one field on the check. The sum of the penalties along the path represents the "badness" of the corresponding interpretation, and combines evidence from each of the modules along the process, as well as from the grammar.

The Viterbi transformer finally selects the path with the lowest accumulated penalty, corresponding to the best grammatically correct interpretation.


B. Gradient-Based Learning

Each stage of this check reading system contains tunable parameters. While some of these parameters could be manually adjusted (for example the parameters of the field locator and segmenter), the vast majority of them must be learned, particularly the weights of the neural net recognizer.

Prior to globally optimizing the system, each module's parameters are initialized with reasonable values. The parameters of the field locator and the segmenter are initialized by hand, while the parameters of the neural net character recognizer are initialized by training on a database of pre-segmented and labeled characters. Then, the entire system is trained globally from whole check images labeled with the correct amount. No explicit segmentation of the amounts is needed to train the system: it is trained at the check level.

The loss function E minimized by our global training procedure is the Discriminative Forward criterion described in Section VI: the difference between (a) the forward penalty of the constrained interpretation graph (constrained by the correct label sequence), and (b) the forward penalty of the unconstrained interpretation graph. Derivatives can be back-propagated through the entire structure, although it is only practical to do it down to the segmenter.

C. Rejecting Low Confidence Checks

In order to be able to reject checks which are the most likely to carry erroneous Viterbi answers, we must rate them with a confidence, and reject the check if this confidence is below a given threshold. To compare the unnormalized Viterbi Penalties of two different checks would be meaningless when it comes to decide which answer we trust the most.

The optimal measure of confidence is the probability of the Viterbi answer given the input image. As seen in Section VI-E, given a target sequence (which, in this case, would be the Viterbi answer), the discriminative forward loss function is an estimate of the logarithm of this probability. Therefore, a simple solution to obtain a good estimate of the confidence is to reuse the interpretation graph to compute the discriminative forward loss E_dforw, using as our desired sequence the Viterbi answer. This is summarized in the figure below, with confidence = exp(-E_dforw).

Fig. Additional processing required to compute the confidence. (The interpretation graph is sent to a Path Selector fed with the Viterbi answer and to two Forward Scorers; their outputs C_dforw and C_forw are combined into E_dforw = C_dforw - C_forw, and the confidence is exp(-E_dforw).)
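In code, the confidence is simply the exponential of the negated discriminative forward loss computed with the Viterbi answer as the target; treating the sign this way follows the reading of the figure above, and the numeric examples are made up.

```python
import math

def confidence(e_dforw):
    """Confidence of a check reading: exp(-E_dforw), where E_dforw is the
    discriminative forward loss computed with the Viterbi answer taken as the
    desired label sequence (see the loss sketch given earlier)."""
    return math.exp(-e_dforw)

print(confidence(0.05), confidence(3.0))   # a high- and a low-confidence reading
```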

D. Results

A version of the above system was fully implemented and tested on machine-print business checks. This system is basically a generic GTN engine with task-specific heuristics encapsulated in the check and fprop methods. As a consequence, the amount of code to write was minimal: mostly the adaptation of an earlier segmenter into the segmentation transformer. The system that deals with handwritten or personal checks was based on earlier implementations that used the GTN concept in a restricted way.

The neural network classifier was initially trained on a large number of character images from various origins spanning the entire printable ASCII set. This contained both handwritten and machine-printed characters that had been previously size normalized at the string level. Additional images were generated by randomly distorting the original images using simple affine transformations. The network was then further trained on character images that had been automatically segmented from check images and manually truthed. The network was also initially trained to reject non-characters that resulted from segmentation errors. The recognizer was then inserted in the check reading system, and a small subset of the parameters were trained globally (at the field level) on whole check images.

On business checks that were automatically categorized as machine printed, the performance, measured by the fractions of correctly recognized, erroneous, and rejected checks, was clearly better than that of the previous system on the same test set. A check is categorized as machine-printed when characters that are near a standard position Dollar sign are detected as machine printed, or when, if nothing is found in the standard position, at least one courtesy amount candidate is found somewhere else. The improvement is attributed to three main causes. First, the neural network recognizer was bigger, and trained on more data. Second, because of the GTN architecture, the new system could take advantage of grammatical constraints in a much more efficient way than the previous system. Third, the GTN architecture provided extreme flexibility for testing heuristics, adjusting parameters, and tuning the system. This last point is more important than it seems. The GTN framework separates the "algorithmic" part of the system from the "knowledge-based" part of the system, allowing easy adjustments of the latter. The importance of global training was only minor in this task because the global training only concerned a small subset of the parameters.

An independent test performed by systems integrators showed the superiority of this system over other commercial Courtesy amount reading systems. The system was integrated in NCR's line of check reading systems. It has been fielded in several banks across the US and has been reading millions of checks per day since then.


XI. Conclusions

During the short history of automatic pattern recognition, increasing the role of learning seems to have invariably improved the overall performance of recognition systems. The systems described in this paper are more evidence to this fact. Convolutional Neural Networks have been shown to eliminate the need for hand-crafted feature extractors. Graph Transformer Networks have been shown to reduce the need for hand-crafted heuristics, manual labeling, and manual parameter tuning in document recognition systems. As training data becomes plentiful, as computers get faster, and as our understanding of learning algorithms improves, recognition systems will rely more and more on learning, and their performance will improve.

Just as the back-propagation algorithm elegantly solved the credit assignment problem in multi-layer neural networks, the gradient-based learning procedure for Graph Transformer Networks introduced in this paper solves the credit assignment problem in systems whose functional architecture dynamically changes with each new input. The learning algorithms presented here are in a sense nothing more than unusual forms of gradient descent in complex, dynamic architectures, with efficient back-propagation algorithms to compute the gradient. The results in this paper help establish the usefulness and relevance of gradient-based minimization methods as a general organizing principle for learning in large systems.

It was shown that all the steps of a document analysis system can be formulated as graph transformers through which gradients can be back-propagated. Even in the non-trainable parts of the system, the design philosophy in terms of graph transformation provides a clear separation between domain-specific heuristics (e.g. segmentation heuristics) and generic, procedural knowledge (the generalized transduction algorithm).

It is worth pointing out that data generating models (such as HMMs) and the Maximum Likelihood Principle were not called upon to justify most of the architectures and the training criteria described in this paper. Gradient-based learning applied to global discriminative loss functions guarantees optimal classification and rejection without the use of "hard to justify" principles that put strong constraints on the system architecture, often at the expense of performance.

More specifically, the methods and architectures presented in this paper offer generic solutions to a large number of problems encountered in pattern recognition systems:

Feature extraction is traditionally a fixed transform, generally derived from some expert prior knowledge about the task. This relies on the probably incorrect assumption that the human designer is able to capture all the relevant information in the input. We have shown that the application of Gradient-Based Learning to Convolutional Neural Networks allows to learn appropriate features from examples. The success of this approach was demonstrated in extensive comparative digit recognition experiments on the NIST database.

Segmentation and recognition of objects in images cannot be completely decoupled. Instead of taking hard segmentation decisions too early, we have used Heuristic Over-Segmentation to generate and evaluate a large number of hypotheses in parallel, postponing any decision until the overall criterion is minimized.

Hand truthing images to obtain segmented characters for training a character recognizer is expensive and does not take into account the way in which a whole document or sequence of characters will be recognized (in particular the fact that some segmentation candidates may be wrong, even though they may look like true characters). Instead we train multi-module systems to optimize a global measure of performance, which does not require time consuming detailed hand-truthing, and yields significantly better recognition performance, because it allows to train these modules to cooperate towards a common goal.

Ambiguities inherent in the segmentation, character recognition, and linguistic model should be integrated optimally. Instead of using a sequence of task-dependent heuristics to combine these sources of information, we have proposed a unified framework in which generalized transduction methods are applied to graphs representing a weighted set of hypotheses about the input. The success of this approach was demonstrated with a commercially deployed check reading system that reads millions of business and personal checks per day: the generalized transduction engine resides in only a few hundred lines of code.

Traditional recognition systems rely on many hand-crafted heuristics to isolate individually recognizable objects. The promising Space Displacement Neural Network approach draws on the robustness and efficiency of Convolutional Neural Networks to avoid explicit segmentation altogether. Simultaneous automatic learning of segmentation and recognition can be achieved with Gradient-Based Learning methods.

This paper presents a small number of examples of graph transformer modules, but it is clear that the concept can be applied to many situations where the domain knowledge or the state information can be represented by graphs. This is the case in many audio signal recognition tasks, and visual scene analysis applications. Future work will attempt to apply Graph Transformer Networks to such problems, with the hope of allowing more reliance on automatic learning, and less on detailed engineering.

Appendices

A. Pre-conditions for faster convergence

As seen before, the squashing function used in our Convolutional Networks is f(a) = A tanh(S a). Symmetric functions are believed to yield faster convergence, although the learning can become extremely slow if the weights are too small. The cause of this problem is that in weight space the origin is a fixed point of the learning dynamics, and although it is a saddle point, it is attractive in almost all directions.


For our simulations, we use values of A and S chosen so that the equalities f(1) = 1 and f(-1) = -1 are satisfied. The rationale behind this is that the overall gain of the squashing transformation is around 1 in normal operating conditions, and the interpretation of the state of the network is simplified. Moreover, the absolute value of the second derivative of f is a maximum at +1 and -1, which improves the convergence towards the end of the learning session. This particular choice of parameters is merely a convenience, and does not affect the result.

Before training, the weights are initialized with random values using a uniform distribution whose range is inversely proportional to F_i, where F_i is the number of inputs (fan-in) of the unit to which the connection belongs. Since several connections share a weight, this rule could be difficult to apply, but in our case all connections sharing a same weight belong to units with identical fan-ins. The reason for dividing by the fan-in is that we would like the initial standard deviation of the weighted sums to be in the same range for each unit, and to fall within the normal operating region of the sigmoid. If the initial weights are too small, the gradients are very small and the learning is slow. If they are too large, the sigmoids are saturated and the gradient is also very small. The standard deviation of the weighted sum scales like the square root of the number of inputs when the inputs are independent, and it scales linearly with the number of inputs if the inputs are highly correlated. We chose to assume the second hypothesis since some units receive highly correlated signals.
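A minimal sketch of this squashing function and fan-in-scaled initialization follows. The numerical values of A, S and of the scale constant are lost in this copy of the text; the values used below (A = 1.7159, S = 2/3, scale = 2.4) are commonly quoted for this kind of scaled tanh and are used here purely as assumptions.

```python
import numpy as np

# Assumed constants (not recoverable from this copy of the text).
A, S = 1.7159, 2.0 / 3.0

def squash(a):
    """Scaled hyperbolic tangent f(a) = A tanh(S a)."""
    return A * np.tanh(S * a)

def init_weights(fan_in, n_weights, scale=2.4, seed=0):
    """Uniform initialization whose range shrinks with the fan-in, so that the
    initial weighted sums fall in the sigmoid's normal operating region.
    The `scale` constant is an assumption for illustration."""
    rng = np.random.default_rng(seed)
    bound = scale / fan_in
    return rng.uniform(-bound, bound, size=n_weights)

# Toy usage: initialize 10 weights for a unit with a fan-in of 25.
print(squash(1.0), init_weights(25, 10))
```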

B. Stochastic Gradient vs Batch Gradient

Gradient-Based Learning algorithms can use one of two classes of methods to update the parameters. The first method, dubbed "Batch Gradient", is the classical one: the gradients are accumulated over the entire training set, and the parameters are updated after the exact gradient has been so computed. In the second method, called "Stochastic Gradient", a partial, or noisy, gradient is evaluated on the basis of one single training sample (or a small number of samples), and the parameters are updated using this approximate gradient. The training samples can be selected randomly or according to a properly randomized sequence. In the stochastic version, the gradient estimates are noisy, but the parameters are updated much more often than with the batch version. An empirical result of considerable practical importance is that on tasks with large, redundant data sets, the stochastic version is considerably faster than the batch version, sometimes by orders of magnitude. Although the reasons for this are not totally understood theoretically, an intuitive explanation can be found in the following extreme example. Let us take an example where the training database is composed of two copies of the same subset. Then accumulating the gradient over the whole set would cause redundant computations to be performed. On the other hand, running Stochastic Gradient once on this training set would amount to performing two complete learning iterations over the small subset. This idea can be generalized to training sets where there exist no precise repetition of the same pattern, but where some redundancy is present. In fact, stochastic update must be better when there is redundancy, i.e., when a certain level of generalization is expected.
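The two update schemes can be contrasted on a toy least-squares problem; the model, step size and synthetic data below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

def batch_gradient(epochs=50, lr=0.1):
    # Accumulate the exact gradient over the whole set, then update once.
    w = np.zeros(3)
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(X)
        w -= lr * grad
    return w

def stochastic_gradient(epochs=50, lr=0.1):
    # Update after every single sample: many more (noisy) updates per epoch.
    w = np.zeros(3)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

print(batch_gradient(), stochastic_gradient())
```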

Many authors have claimed that second-order methods should be used in lieu of gradient descent for neural net training. The literature abounds with recommendations for classical second-order methods such as the Gauss-Newton or Levenberg-Marquardt algorithms, for Quasi-Newton methods such as the Broyden-Fletcher-Goldfarb-Shanno method (BFGS) or Limited-storage BFGS, or for various versions of the Conjugate Gradients (CG) method. Unfortunately, all of the above methods are unsuitable for training large neural networks on large data sets. The Gauss-Newton and Levenberg-Marquardt methods require O(N^3) operations per update, where N is the number of parameters, which makes them impractical for even moderate size networks. Quasi-Newton methods require "only" O(N^2) operations per update, but that still makes them impractical for large networks. Limited-Storage BFGS and Conjugate Gradient require only O(N) operations per update, so they would appear appropriate. Unfortunately, their convergence speed relies on an accurate evaluation of successive "conjugate descent directions", which only makes sense in "batch" mode. For large data sets, the speed-up brought by these methods over regular batch gradient descent cannot match the enormous speed-up brought by the use of stochastic gradient. Several authors have attempted to use Conjugate Gradient with small batches, or batches of increasing sizes, but those attempts have not yet been demonstrated to surpass a carefully tuned stochastic gradient. Our experiments were performed with a stochastic method that scales the parameter axes so as to minimize the eccentricity of the error surface.

C. Stochastic Diagonal Levenberg-Marquardt

Owing to the reasons given in Appendix B, we prefer to update the weights after each presentation of a single pattern, in accordance with stochastic update methods. The patterns are presented in a constant random order, and the training set is typically repeated several times.

Our update algorithm is dubbed the Stochastic Diagonal Levenberg-Marquardt method, where an individual learning rate (step size) is computed for each parameter (weight) before each pass through the training set. These learning rates are computed using the diagonal terms of an estimate of the Gauss-Newton approximation to the Hessian (second derivative) matrix. This algorithm is not believed to bring a tremendous increase in learning speed, but it converges reliably without requiring extensive adjustments of the learning parameters. It corrects major ill-conditioning of the loss function that is due to the peculiarities of the network architecture and the training data. The additional cost of using this procedure over standard stochastic gradient descent is negligible.


At each learning iteration, a particular parameter w_k is updated according to the following stochastic update rule:

    w_k \leftarrow w_k - \epsilon_k \frac{\partial E^p}{\partial w_k}

where E^p is the instantaneous loss function for pattern p. In Convolutional Neural Networks, because of the weight sharing, the partial derivative of E^p with respect to w_k is the sum of the partial derivatives with respect to the connections that share the parameter w_k:

    \frac{\partial E^p}{\partial w_k} = \sum_{(i,j) \in V_k} \frac{\partial E^p}{\partial u_{ij}}

where u_{ij} is the connection weight from unit j to unit i, and V_k is the set of unit index pairs (i, j) such that the connection between i and j shares the parameter w_k, i.e.:

    u_{ij} = w_k \quad \forall (i,j) \in V_k

As stated previously, the step sizes ε_k are not constant but are function of the second derivative of the loss function along the axis w_k:

    \epsilon_k = \frac{\eta}{\mu + h_{kk}}

where μ is a hand-picked constant and h_kk is an estimate of the second derivative of the loss function E with respect to w_k. The larger h_kk, the smaller the weight update. The parameter μ prevents the step size from becoming too large when the second derivative is small, very much like the "model-trust" methods and the Levenberg-Marquardt methods in non-linear optimization. The exact formula to compute h_kk from the second derivatives with respect to the connection weights is:

    h_{kk} = \sum_{(i,j) \in V_k} \; \sum_{(k,l) \in V_k} \frac{\partial^2 E}{\partial u_{ij} \, \partial u_{kl}}

However, we make three approximations. The first approximation is to drop the off-diagonal terms of the Hessian with respect to the connection weights in the above equation:

    h_{kk} = \sum_{(i,j) \in V_k} \frac{\partial^2 E}{\partial u_{ij}^2}

Naturally, the terms \partial^2 E / \partial u_{ij}^2 are the average over the training set of the local second derivatives:

    \frac{\partial^2 E}{\partial u_{ij}^2} = \frac{1}{P} \sum_{p=1}^{P} \frac{\partial^2 E^p}{\partial u_{ij}^2}

Those local second derivatives with respect to connection weights can be computed from local second derivatives with respect to the total input of the downstream unit:

    \frac{\partial^2 E^p}{\partial u_{ij}^2} = \frac{\partial^2 E^p}{\partial a_i^2} \, x_j^2

where x_j is the state of unit j, and \partial^2 E^p / \partial a_i^2 is the second derivative of the instantaneous loss function with respect to the total input to unit i (denoted a_i). Interestingly, there is an efficient algorithm to compute those second derivatives which is very similar to the back-propagation procedure used to compute the first derivatives:

    \frac{\partial^2 E^p}{\partial a_i^2} = f'(a_i)^2 \sum_k u_{ki}^2 \frac{\partial^2 E^p}{\partial a_k^2} + f''(a_i) \frac{\partial E^p}{\partial x_i}

Unfortunately, using those derivatives leads to well-known problems associated with every Newton-like algorithm: these terms can be negative, and can cause the gradient algorithm to move uphill instead of downhill. Therefore, our second approximation is a well-known trick called the Gauss-Newton approximation, which guarantees that the second derivative estimates are non-negative. The Gauss-Newton approximation essentially ignores the non-linearity of the estimated function (the Neural Network in our case), but not that of the loss function. The back-propagation equation for Gauss-Newton approximations of the second derivatives is:

    \frac{\partial^2 E^p}{\partial a_i^2} = f'(a_i)^2 \sum_k u_{ki}^2 \frac{\partial^2 E^p}{\partial a_k^2}

This is very similar to the formula for back-propagating the first derivatives, except that the sigmoid's derivative and the weight values are squared. The right-hand side is a sum of products of non-negative terms, therefore the left-hand side term is non-negative.

The third approximation we make is that we do not run the average of the local second derivatives over the entire training set, but run it on a small subset of the training set instead. In addition, the re-estimation does not need to be done often, since the second-order properties of the error surface change rather slowly. In the experiments described in this paper, we re-estimate the h_kk on a small number of patterns before each training pass through the training set. Since this subset is small compared to the training set, the additional cost of re-estimating the h_kk is negligible. The estimates are not particularly sensitive to the particular subset of the training set used in the averaging. This seems to suggest that the second-order properties of the error surface are mainly determined by the structure of the network, rather than by the detailed statistics of the samples. This algorithm is particularly useful for shared-weight networks, because the weight sharing creates ill-conditioning of the error surface. Because of the sharing, one single parameter in the first few layers can have an enormous influence on the output. Consequently, the second derivative of the error with respect to this parameter may be very large, while it can be quite small for other parameters elsewhere in the network. The above algorithm compensates for that phenomenon.

Unlike most other second-order acceleration methods for back-propagation, the above method works in stochastic mode. It uses a diagonal approximation of the Hessian. Like the classical Levenberg-Marquardt algorithm, it uses a "safety" factor μ to prevent the step sizes from getting too large if the second derivative estimates are small. Hence the method is called the Stochastic Diagonal Levenberg-Marquardt method.
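As a rough illustration of these update rules, here is a minimal sketch for a single-layer tanh network with squared loss. The constants (eta, mu, subset size, epochs) and the network itself are assumptions made for the example, not the paper's configuration.

```python
import numpy as np

def sdlm_train(X, T, epochs=10, eta=0.05, mu=0.02, subset=100, seed=0):
    """Minimal sketch of stochastic diagonal Levenberg-Marquardt for a
    one-layer network y = tanh(W x) with squared loss.  h_kk is the diagonal
    Gauss-Newton estimate f'(a_i)^2 * x_j^2 averaged over a small subset, and
    each weight gets its own step size eta / (mu + h_kk)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    W = rng.uniform(-1.0 / n_in, 1.0 / n_in, size=(n_out, n_in))
    for _ in range(epochs):
        # re-estimate the diagonal second-derivative terms on a small subset
        h = np.zeros_like(W)
        idx = rng.choice(len(X), size=min(subset, len(X)), replace=False)
        for x in X[idx]:
            fprime2 = (1.0 - np.tanh(W @ x) ** 2) ** 2   # f'(a_i)^2, non-negative
            h += np.outer(fprime2, x ** 2)
        h /= len(idx)
        step = eta / (mu + h)                            # per-weight learning rates eps_k
        # one stochastic pass over the training set
        for i in rng.permutation(len(X)):
            a = W @ X[i]
            y = np.tanh(a)
            grad = np.outer((y - T[i]) * (1.0 - y ** 2), X[i])
            W -= step * grad
    return W

# Toy usage: learn a small random mapping.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
T = np.tanh(X @ rng.normal(size=(8, 4)) * 0.3)
print(sdlm_train(X, T).shape)
```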


Acknowledgments

Some of the systems described in this paper are the work of many researchers now at AT&T and Lucent Technologies. In particular, Christopher Burges, Craig Nohl, Troy Cauble and Jane Bromley contributed much to the check reading system. Experimental results described in Section III include contributions by Chris Burges, Aymeric Brunot, Harris Drucker, Larry Jackel, Urs Muller, Bernhard Scholkopf, and Patrice Simard. The authors wish to thank Fernando Pereira, John Denker, and Isabelle Guyon for helpful discussions, Charles Stenard and Ray Higgins for providing the applications that motivated some of this work, and Lawrence R. Rabiner and Lawrence D. Jackel for relentless support and encouragements.

References

E. Bienenstock, F. Fogelman-Soulie, and G. Weisbuch, Eds., Les Houches, Springer-Verlag.
D. B. Parker, "Learning-logic," Tech. Rep., Sloan School of Management, MIT, Cambridge, Mass.
Y. LeCun, Modeles connexionnistes de l'apprentissage (connectionist learning models), Ph.D. thesis, Universite P. et M. Curie.
Y. LeCun, "A theoretical framework for back-propagation," in Proceedings of the Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds., CMU, Pittsburgh, Pa., Morgan Kaufmann.
L. Bottou and P. Gallinari, "A framework for the cooperation of learning algorithms," in Advances in Neural Information Processing Systems, D. Touretzky and R. Lippmann, Eds., Denver, Morgan Kaufmann.
C. Y. Suen, C. Nadal, R. Legault, T. A. Mai, and L. Lam, "Computer recognition of unconstrained handwritten numerals," Proceedings of the IEEE, Special issue on Optical Character Recognition.
S. N. Srihari, "High-performance reading machines," Proceedings of the IEEE, Special issue on Optical Character Recognition.

Y LeCun L D Jackel B Boser J S Denker H P Graf

I Guyon D Henderson R E Howard and W Hubbard

R O Duda and P E Hart Pattern Classication And Scene

Handwritten digit recognition Applications of neural net

A nalysis Wiley and Son

chips and automatic learning IEEE Communication pp

Y LeCun B Boser J S Denker D Henderson R E Howard

Novemb er invited pap er

W Hubbard and L D Jackel applied to

J Keeler D Rumelhart and W K Leow Integrated seg

handwritten zip co de recognition Neural Computationvol

mentation and recognition of handprinted numerals in Neu

no pp Winter

ral Information Processing Systems R P Lippmann J M

S Seung H Somp olinsky and N Tishby Statistical mechan

Moody and D S Touretzky Eds vol pp Morgan

ics of learning from examples Physical Review Avol pp

Kaufmann Publishers San Mateo CA

Ofer Matan Christopher J C Burges Yann LeCun and

V N Vapnik E Levin and Y LeCun Measuring the vc

John S Denker Multidigit recognition using a space dis

dimension of a learning machine Neural Computationvol

placement neural network in Neural Information Processing

no pp

SystemsJMMoody S J Hanson and R P Lippman Eds

C Cortes L Jackel S Solla V N Vapnik and J Denker

vol Morgan Kaufmann Publishers San Mateo CA

Learning curves asymptotic values and rate of convergence

L R Rabiner A tutorial on hidden Markov mo dels and se

in Advances in Neural Information Processing Systems JD

lected applications in sp eech recognition Proceedings of the

Cowan G Tesauro and J Alsp ector Eds San Mateo CA

IEEEvol no pp February

pp Morgan Kaufmann

H A Bourlard and N Morgan CONNECTIONIST SPEECH

V N Vapnik The Nature of Statistical Learning Theory

RECOGNITION A Hybrid ApproachKluwer Academic Pub

Springer NewYork

lisher Boston

V N Vapnik Statistical Learning The ory John Wiley Sons

D H Hub el and T N Wiesel Receptive elds bino cular

NewYork

interaction and functional architecture in the cats visual cor

W H Press B P FlannerySATeukolsky and W T Vet

tex Journal of Physiology Londonvol pp

terling Numerical Recipes The Art of Scientic Computing

Cambridge University Press Cambridge

K Fukushima Cognitron A selforganizing multilayered neu

S I Amari A theory of adaptive pattern classiers IEEE

ral network Biological Cyberneticsvol no pp

Transactions on Electronic Computers vol EC pp

November

K Fukushima and S Miyake Neo cognitron A new algorithm

Ya Tsypkin Adaptation and Learning in automatic systems

for pattern recognition tolerant of deformations and shifts in

Academic Press

p osition Pattern Recognitionvol pp

Ya Tsypkin Foundations of the theory of learning systems

M C Mozer The perception of multiple objects Aconnec

Academic Press

tionist approach MIT PressBradford Bo oks Cambridge MA

M Minsky and O Selfridge Learning in random nets in

th London symposium on Information Theory London

Y LeCun Generalization and network design strategies in

pp

Connectionism in Perspective R Pfeifer Z Schreter F Fogel

D H Ackley G E Hinton and T J Sejnowski A learning

man and L Steels Eds Zurich Switzerland Elsevier

algorithm for b oltzmann machines Cognitive Sciencevol

an extended version was published as a technical rep ort of the

pp

UniversityofToronto

G E Hinton and T J Sejnowski Learning and relearning

Y LeCun B Boser J S Denker D Henderson R E Howard

cessing in Boltzmann machines in Paral lel DistributedPro

W Hubbard and L D Jackel Handwritten digit recognition

Explorations in the Microstructure of Cognition Volume

with a backpropagation network in Advances in Neural In

Foundations D E Rumelhart and J L McClelland Eds MIT

formation Processing Systems NIPSDavid Touretzky

Press Cambridge MA

Ed Denver CO Morgan Kaufmann

D E Rumelhart G E Hinton and R J Williams Learning

G L Martin Centeredob ject integrated segmentation and

internal representations by error propagation in Paral lel dis

recognition of overlapping handprinted characters Neural

tributedprocessing Explorations in the microstructureofcog

Computationvol no pp

nitionvol I pp Bradford Bo oks Cambridge MA

J Wang and J Jean Multiresolution neural networks for om

nifontcharacter recognition in Proceedings of International

A E Jr Bryson and YuChi Ho Applied Optimal Control

Conference on Neural Networks vol I I I pp

Blaisdell Publishing Co

Y Bengio Y LeCun C Nohl and C Burges Lerec A

Y LeCun A learning scheme for asymmetric threshold net

NNHMM hybrid for online handwriting recognition Neural

works in Proceedings of Cognitiva Paris France

Computationvol no

pp

Y LeCun Learning pro cesses in an asymmetric threshold S Lawrence C Lee Giles A C Tsoi and A D Back Face

network in Disordered systems and biological organization recognition A convolutional neural network approach IEEE


son J D Cowan and C L Giles Eds San Mateo CA Transactions on Neural Networks vol no pp

pp Morgan Kaufmann

P Simard Y LeCun and Denker J Ecient pattern recog K J Lang and G E Hinton A time delayneuralnetwork


Yoshua Bengio received his B.Eng. in electrical engineering from McGill University, and also received an M.Sc. and a Ph.D. in computer science from McGill University. He was then a post-doctoral fellow at the Massachusetts Institute of Technology, after which he joined AT&T Bell Laboratories, which later became AT&T Labs-Research. He subsequently joined the faculty of the computer science department of the Université de Montréal, where he is now an associate professor. Since his first work on neural networks, his research interests have been centered around learning algorithms, especially for data with a sequential or spatial nature, such as speech, handwriting, and time series.

Patrick Haffner graduated from the Ecole Polytechnique, Paris, France, and from the Ecole Nationale Supérieure des Télécommunications (ENST), Paris, France, and received his Ph.D. in speech and signal processing from ENST. He worked with Alex Waibel on the design of the TDNN and the MS-TDNN architectures at ATR (Japan) and Carnegie Mellon University. As a research scientist for CNET/France-Télécom in Lannion, France, he developed connectionist learning algorithms for telephone speech recognition. He then joined AT&T Bell Laboratories and worked on the application of Optical Character Recognition and transducers to the processing of financial documents, and later joined the Image Processing Services Research Department at AT&T Labs-Research. His research interests include statistical and connectionist models for sequence recognition, machine learning, speech and image recognition, and information theory.

Yann LeCun received a Diplôme d'Ingénieur from the Ecole Supérieure d'Ingénieur en Electrotechnique et Electronique, Paris, and a Ph.D. in Computer Science from the Université Pierre et Marie Curie, Paris, during which he proposed an early version of the back-propagation learning algorithm for neural networks. He then joined the Department of Computer Science at the University of Toronto as a research associate, and later joined the Adaptive Systems Research Department at AT&T Bell Laboratories in Holmdel, NJ, where he worked among other things on neural networks, machine learning, and handwriting recognition. Following AT&T's second breakup, he became head of the Image Processing Services Research Department at AT&T Labs-Research. He is serving on the board of the Machine Learning Journal and has served as associate editor of the IEEE Transactions on Neural Networks. He is general chair of the Machines that Learn workshop held every year in Snowbird, Utah, and has served as program co-chair of IJCNN, INNC, and NIPS. He is a member of the IEEE Neural Networks for Signal Processing Technical Committee. He has published numerous technical papers and book chapters on neural networks, machine learning, pattern recognition, handwriting recognition, document understanding, image processing, VLSI design, and information theory. In addition to the above topics, his current interests include video-based user interfaces, image compression, and content-based indexing of multimedia material.

Léon Bottou received a Diplôme from the Ecole Polytechnique, Paris, a Magistère en Mathématiques Fondamentales et Appliquées et Informatiques from the Ecole Normale Supérieure, Paris, and a Ph.D. in Computer Science from the Université de Paris-Sud, during which he worked on speech recognition and proposed a framework for stochastic gradient learning and global training. He then joined the Adaptive Systems Research Department at AT&T Bell Laboratories, where he worked on neural networks, statistical learning theory, and local learning algorithms. He returned to France as a research engineer at ONERA, and then became chairman of Neuristique S.A., a company making neural network simulators and traffic forecasting software. He eventually came back to AT&T Bell Laboratories, where he worked on graph transformer networks for optical character recognition. He is now a member of the Image Processing Services Research Department at AT&T Labs-Research. Besides learning algorithms, his current interests include arithmetic coding, image compression, and indexing.