International Journal of Machine Tools & Manufacture 41 (2001) 419–430

Training multilayered perceptrons for pattern recognition: a comparative study of four training algorithms

D.T. Pham a,*, S. Sagiroglu b

a Intelligent Systems Laboratory, Manufacturing Engineering Centre, Cardiff University, P.O. Box 688, Cardiff CF24 3TE, UK
b Intelligent Systems Research Group, Engineering Department, Faculty of Engineering, Erciyes University, Kayseri 38039, Turkey

Received 25 May 2000; accepted 18 July 2000

* Corresponding author. Tel.: +44-029-20-874429; fax: +44-029-20-874003. E-mail address: [email protected] (D.T. Pham).

Abstract

This paper presents an overview of four algorithms used for training multilayered perceptron (MLP) neural networks, and the results of applying those algorithms to teach different MLPs to recognise control chart patterns and to classify wood veneer defects. The algorithms studied are standard backpropagation (BP), Quickprop (QP), Delta-Bar-Delta (DBD) and Extended-Delta-Bar-Delta (EDBD). The results show that, overall, BP was the best algorithm for the two applications tested. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Multilayered perceptrons; Training algorithms; Control chart pattern recognition; Wood inspection

Nomenclature

$a$  amplitude of cyclic variations in a cyclic pattern
$A$, $A_\alpha$, $A_\mu$  positive constants
ALECO  algorithm for learning efficiently with constrained optimization
AVI  automated visual inspection
$b$  shift position in an upward shift pattern and a downward shift pattern
BP  backpropagation
CG  conjugate gradient
DBD  delta-bar-delta
$D_{ji}(k-1)$  weighted average of gradients of the error surface
$E(w_{ji})$  sum of squared errors
EDBD  extended-delta-bar-delta
$f$  neuron activation function
$g$  gradient of an increasing trend pattern or a decreasing trend pattern
$k$  iteration number
$M$  number of patterns in the training set
MLP  multilayered perceptron
$N$  number of outputs
$p(t)$  value of the sampled data point at time $t$
QP  quick propagation
$r(\cdot)$  a function that generates random numbers
rmse  root-mean-squared error
$s$  magnitude of the shift
$t$  discrete time at which the monitored process variable is sampled
$T$  period of a cycle in a cyclic pattern
$w_{ji}$  weight of connection between neurons $i$ and $j$
$x_i$  input signal
$y_{dj}$  desired output value of neuron $j$
$y_j$  actual output value of neuron $j$
$\alpha$  learning coefficient
$\alpha_{max}$  preset upper bound of $\alpha$
$\gamma$  maximum growth factor
$\gamma_\alpha$, $\gamma_\mu$  positive coefficients
$\delta$  weight decay coefficient
$\Delta w_{ji}$  change to weight $w_{ji}$
$\varepsilon$  quadratic term coefficient
$\eta$  nominal mean value of the process variable under observation
$\theta$  positive coefficient
$\mu$  momentum coefficient
$\mu_{max}$  preset upper bound of $\mu$
$\sigma$  standard deviation of the process variable under observation
$\varphi$, $\varphi_\alpha$, $\varphi_\mu$  positive coefficients

1. Introduction

Artificial neural networks have been applied to different areas of engineering such as dynamic system identification [1], complex system modelling [2,3], control [1,4] and design [5]. An important group of applications has been classification [4,6–13]. Both supervised neural networks [8,11–13] and unsupervised networks [9,14] have proved good alternatives to traditional pattern recognition systems because of features such as the ability to learn and generalise, smaller training set requirements, fast operation, ease of implementation and tolerance of noisy data. However, of all available types of network, multilayered perceptrons (MLPs) are the most commonly adopted.

There are many algorithms for training MLP networks [3,4,6,15–26]. The popular backpropagation (BP) algorithm is simple but reportedly has a problem with slow convergence. Various related algorithms have been introduced to address that problem [16–18].

A number of researchers have carried out comparative studies of MLP training algorithms. For example, Charalambous [19] evaluated BP against the conjugate gradient (CG) algorithm. The results for three test problems showed that the performance of CG was superior to that of standard BP. Karras and Perantonis [20] assessed five learning algorithms: quick propagation (QP), on-line and off-line BP, delta-bar-delta (DBD) and an algorithm known as ALECO (algorithm for learning efficiently with constrained optimization). ALECO gave the best performance on several simple test problems.

This paper reports on an evaluation of four algorithms on the training of different MLPs for two realistic industrially oriented applications. The four algorithms studied were standard BP, QP, DBD and extended DBD (EDBD). QP, DBD and EDBD were chosen because they are better-known relations of the standard BP algorithm. The test applications were recognition of control chart patterns and classification of wood veneer defects. The paper briefly describes the algorithms and the benchmark applications, and presents the results of the evaluation.

2. MLP training algorithms

As shown in Fig. 1, an MLP consists of three types of layer: an input layer, an output layer and one or more hidden layers. Neurons in the input layer act only as buffers for distributing the input signals $x_i$ to neurons in the hidden layer. Each neuron $j$ in the hidden layer sums its input signals $x_i$ after multiplying them by the strengths of the respective connection weights $w_{ji}$ and computes its output $y_j$ as a function of the sum, i.e.,

$$y_j = f\Big(\sum_i w_{ji} x_i\Big), \qquad (1)$$

where $f$ is usually a sigmoidal or hyperbolic tangent function. The outputs of neurons in the output layer are computed similarly.

Fig. 1. Backpropagation network.
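To make Eq. (1) concrete, here is a minimal sketch (ours, not the authors' implementation) of a forward pass through a one-hidden-layer MLP with a sigmoidal activation. The layer sizes match the control chart networks of Section 4; all variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # A sigmoidal activation function f, one of the choices named above
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, w_output):
    # Each hidden neuron j computes y_j = f(sum_i w_ji * x_i)  -- Eq. (1)
    y_hidden = sigmoid(w_hidden @ x)
    # Outputs of the output-layer neurons are computed similarly
    return sigmoid(w_output @ y_hidden)

rng = np.random.default_rng(0)
x = rng.normal(size=60)                       # a 60-point input pattern
w_hidden = rng.uniform(-0.1, 0.1, (40, 60))   # 40 hidden neurons (Section 4.1)
w_output = rng.uniform(-0.1, 0.1, (6, 40))    # 6 output neurons, one per class
print(forward(x, w_hidden, w_output))         # six activations in (0, 1)
```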

Training a network consists of adjusting its weights using a training algorithm. The training algorithms adopted in this study optimise the weights by attempting to minimise the sum of squared differences between the desired and actual values of the output neurons, namely:

$$E = \frac{1}{2}\sum_j (y_{dj} - y_j)^2, \qquad (2)$$

where $y_{dj}$ is the desired value of output neuron $j$ and $y_j$ is the actual output of that neuron. Each weight $w_{ji}$ is adjusted by adding an increment $\Delta w_{ji}$ to it. $\Delta w_{ji}$ is selected to reduce $E$ as rapidly as possible. The adjustment is carried out over several training iterations until a satisfactorily small value of $E$ is obtained or a given number of iterations is reached. How $\Delta w_{ji}$ is computed depends on the training algorithm adopted.
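As an illustration, the error of Eq. (2) for a single training tuple of a six-output classifier could be computed as follows (the output values here are invented for the example):

```python
import numpy as np

def sum_squared_error(y_desired, y_actual):
    # E = (1/2) * sum_j (y_dj - y_j)^2   -- Eq. (2)
    return 0.5 * np.sum((y_desired - y_actual) ** 2)

y_d = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])   # desired outputs
y   = np.array([0.8, 0.1, 0.0, 0.2, 0.0, 0.1])   # actual network outputs
print(sum_squared_error(y_d, y))                  # 0.05
```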

2.1. BP algorithm

The BP algorithm [15] gives the change $\Delta w_{ji}(k)$ in the weight of the connection between neurons $i$ and $j$ at iteration $k$ as:

$$\Delta w_{ji}(k) = -\alpha \frac{\partial E}{\partial w_{ji}(k)} + \mu\, \Delta w_{ji}(k-1), \qquad (3)$$

where $\alpha$ is called the learning coefficient, $\mu$ the momentum coefficient and $\Delta w_{ji}(k-1)$ the weight change in the immediately preceding iteration. Training an MLP by BP involves presenting it sequentially with all training tuples (input, target output). Differences between the target output $y_d(k)$ and the actual output $y(k)$ of the MLP are propagated back through the network to adapt its weights. A training iteration is completed after a tuple in the training set has been presented to the network and the weights updated.
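A minimal sketch of the update rule of Eq. (3), assuming the gradient has already been obtained by backpropagating the output errors; the default coefficient values are those quoted in Section 4:

```python
import numpy as np

def bp_delta_w(grad_e, prev_delta_w, alpha=0.1, mu=0.2):
    # Eq. (3): dw(k) = -alpha * dE/dw(k) + mu * dw(k-1)
    return -alpha * grad_e + mu * prev_delta_w

# Usage: the previous increment is carried over between iterations
grad_e = np.array([0.3, -0.1])                        # invented gradient values
delta_w = bp_delta_w(grad_e, prev_delta_w=np.zeros(2))
```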

2.2. QP algorithm

QP was developed as a method of improving the rate of convergence to a minimum value of $E(w_{ji})$ by using information about the curvature of the error surface $E(w_{ji})$ [16]. An underlying assumption of QP is that $E(w_{ji})$ is a paraboloid. QP employs the gradient $\partial E/\partial w_{ji}$ at two points, $w_{ji}(k)$ and $w_{ji}(k-1)$, to find the minimum of $E$. The weight change $\Delta w_{ji}(k)$ is computed as follows:

$$\Delta w_{ji}(k) = -\alpha \frac{\partial E}{\partial w_{ji}(k)} + \mu\, \Delta w_{ji}(k-1), \qquad (4)$$

where

$$\mu = \frac{\partial E/\partial w_{ji}(k)}{[\partial E/\partial w_{ji}(k-1)] - [\partial E/\partial w_{ji}(k)]}.$$

Note that the momentum coefficient $\mu$ is variable and thus QP can be viewed as a form of BP that employs a dynamic momentum coefficient. Another interpretation of QP is obtained by rearranging Eq. (4) as follows:

$$\Delta w_{ji}(k) = \left(-\alpha + \frac{\Delta w_{ji}(k-1)}{[\partial E/\partial w_{ji}(k-1)] - [\partial E/\partial w_{ji}(k)]}\right) \frac{\partial E}{\partial w_{ji}(k)}. \qquad (5)$$

This shows that QP is a variation of BP where the learning coefficient is adaptable. Further details on QP are given in Ref. [26], which describes the heuristics adopted to prevent the search for a minimum value of $E(w_{ji})$ from proceeding in the wrong direction.
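In code, the rearranged rule of Eq. (5) might look as follows; the guard against a vanishing denominator stands in for the safeguarding heuristics of Ref. [26], which are not reproduced here:

```python
def qp_delta_w(grad_now, grad_prev, prev_delta_w, alpha=0.1):
    # Eq. (5): dw(k) = (-alpha + dw(k-1)/(g(k-1) - g(k))) * g(k),
    # where g denotes the gradient dE/dw_ji
    denom = grad_prev - grad_now
    if denom == 0.0:
        return -alpha * grad_now   # fall back to a plain gradient step
    return (-alpha + prev_delta_w / denom) * grad_now
```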

2.3. DBD algorithm

Aimed at speeding up the training of MLPs, DBD [17] is based on the hypothesis that a learning coefficient suitable for one weight may not be appropriate for all weights. By assigning a learning coefficient to each weight and permitting it to change over time, more freedom is introduced to facilitate convergence towards a minimum value of $E(w_{ji})$. The weight change is given by:

$$\Delta w_{ji}(k) = -\alpha_{ji}(k) \frac{\partial E}{\partial w_{ji}(k)}, \qquad (6)$$

where $\alpha_{ji}(k)$ is the learning coefficient assigned to the connection from neuron $i$ to neuron $j$. The learning coefficient change is given as:

$$\Delta \alpha_{ji}(k) = \begin{cases} A & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} > 0 \\[4pt] -\varphi\, \alpha_{ji}(k) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} < 0 \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (7)$$

where $D_{ji}(k-1)$ represents a weighted average of $\partial E/\partial w_{ji}(k-1)$ and $\partial E/\partial w_{ji}(k-2)$ given by

$$D_{ji}(k-1) = (1-\theta)\frac{\partial E}{\partial w_{ji}(k-1)} + \theta \frac{\partial E}{\partial w_{ji}(k-2)}, \qquad (8)$$

and $\varphi$, $\theta$ and $A$ are positive constants. From Eq. (7), it can be noted that the algorithm increments the learning coefficients linearly but decrements them geometrically.
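The DBD rules of Eqs. (6) to (8) can be sketched for a single weight as below; the defaults are the empirical values of Section 4 and the function names are ours:

```python
def weighted_average_gradient(grad_k1, grad_k2, theta=0.7):
    # Eq. (8): D(k-1) = (1 - theta) * dE/dw(k-1) + theta * dE/dw(k-2)
    return (1.0 - theta) * grad_k1 + theta * grad_k2

def dbd_delta_alpha(d_prev, grad_now, alpha_ji, A=0.01, phi=0.5):
    # Eq. (7): increment linearly, decrement geometrically
    if d_prev * grad_now > 0:
        return A                  # same sign: grow the coefficient by A
    if d_prev * grad_now < 0:
        return -phi * alpha_ji    # sign change: shrink it geometrically
    return 0.0

def dbd_delta_w(alpha_ji, grad_now):
    # Eq. (6): each weight carries its own learning coefficient alpha_ji(k)
    return -alpha_ji * grad_now
```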

2.4. Extended DBD algorithm

As its name implies, this algorithm [18] is an extension of DBD. It also aims to decrease the training time for MLPs. The changes in weights are calculated as:

$$\Delta w_{ji}(k) = -\alpha_{ji}(k) \frac{\partial E(k)}{\partial w_{ji}(k)} + \mu_{ji}(k)\, \Delta w_{ji}(k-1), \qquad (9)$$

where $\alpha_{ji}(k)$ and $\mu_{ji}(k)$ are learning and momentum coefficients, respectively. The use of momentum in EDBD is one of the differences between it and DBD. The learning coefficient change is given as:

$$\Delta \alpha_{ji}(k) = \begin{cases} A_\alpha \exp(-\gamma_\alpha |D_{ji}(k)|) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} > 0 \\[4pt] -\varphi_\alpha\, \alpha_{ji}(k) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} < 0 \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (10)$$

where $A_\alpha$, $\varphi_\alpha$ and $\gamma_\alpha$ are positive constants and $D_{ji}$ is computed as in the case of DBD. The momentum coefficient change is obtained as:

$$\Delta \mu_{ji}(k) = \begin{cases} A_\mu \exp(-\gamma_\mu |D_{ji}(k)|) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} > 0 \\[4pt] -\varphi_\mu\, \mu_{ji}(k) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} < 0 \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (11)$$

where $A_\mu$, $\varphi_\mu$ and $\gamma_\mu$ are positive constants. Note that, unlike for DBD, increments in the learning coefficient are not constant, but vary as an exponentially decreasing function of the magnitude of the weighted average gradient component $D_{ji}(k)$. To prevent oscillations in the values of the weights, $\alpha_{ji}(k)$ and $\mu_{ji}(k)$ are kept below preset upper bounds $\alpha_{max}$ and $\mu_{max}$.
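A corresponding sketch of Eqs. (10) and (11), again with the Section 4 values as defaults; the closing comments show the clipping against the upper bounds mentioned above:

```python
import numpy as np

def edbd_coefficient_changes(d_now, d_prev, grad_now, alpha_ji, mu_ji,
                             A_alpha=0.095, A_mu=0.01,
                             phi_alpha=0.1, phi_mu=0.01,
                             gamma_alpha=0.0, gamma_mu=0.0):
    # Eqs. (10) and (11); the sign of D(k-1) * dE/dw(k) selects the case
    sign = d_prev * grad_now
    if sign > 0:
        delta_alpha = A_alpha * np.exp(-gamma_alpha * abs(d_now))
        delta_mu = A_mu * np.exp(-gamma_mu * abs(d_now))
    elif sign < 0:
        delta_alpha = -phi_alpha * alpha_ji
        delta_mu = -phi_mu * mu_ji
    else:
        delta_alpha = delta_mu = 0.0
    return delta_alpha, delta_mu

# The updated coefficients are then kept below the preset upper bounds,
# alpha_max = mu_max = 2.0 as in Section 4:
# alpha_ji = min(alpha_ji + delta_alpha, 2.0)
# mu_ji = min(mu_ji + delta_mu, 2.0)
```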

3. Test problems

3.1. Control chart pattern recognition

Control charts are used in Statistical Process Control to give information on the state of a process. Fig. 2 depicts the six main types of pattern that might be observed on a control chart. These are: normal, cyclic, downward shift, upward shift, increasing trend and decreasing trend patterns. Except for normal patterns, all other patterns indicate that the process being monitored is not functioning correctly and requires adjustment.

Fig. 2. Control chart patterns.

For this study, in order to obtain a large number of patterns of all these different types, they were generated synthetically. Each pattern was taken as a time series of 60 data points. The following equations were used to create the data points for the various patterns:

1. normal patterns:
$$p(t) = \eta + r(t)\sigma, \qquad (12)$$

2. cyclic patterns:
$$p(t) = \eta + r(t)\sigma + a \sin(2\pi t/T), \qquad (13)$$

3. increasing trend patterns:
$$p(t) = \eta + r(t)\sigma + gt, \qquad (14)$$

4. decreasing trend patterns:
$$p(t) = \eta + r(t)\sigma - gt, \qquad (15)$$

5. upward shift patterns:
$$p(t) = \eta + r(t)\sigma + bs, \qquad (16)$$

6. downward shift patterns:
$$p(t) = \eta + r(t)\sigma - bs, \qquad (17)$$

where $\eta$ is the nominal mean value of the process variable under observation (set to 80), $\sigma$ is the standard deviation of the process variable (set to 5), $a$ is the amplitude of cyclic variations in a cyclic pattern (set to 15 or less), $g$ is the gradient of an increasing trend pattern or a decreasing trend pattern (set in the range 0.2 to 0.5), $b$ indicates the shift position in an upward shift pattern and a downward shift pattern ($b = 0$ before the shift and $b = 1$ at the shift and thereafter), $s$ is the magnitude of the shift (set between 7.5 and 20), $r(\cdot)$ is a function that generates random numbers normally distributed between $-3$ and 3, $t$ is the discrete time at which the monitored process variable is sampled (set within the range 0 to 59), $T$ is the period of a cycle in a cyclic pattern (set between 4 and 12 sampling intervals) and $p(t)$ is the value of the sampled data point at time $t$.
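The following sketch is one reading of Eqs. (12) to (17); in particular, bounding $r(\cdot)$ by clipping the normal deviates to $(-3, 3)$ is our assumption, and the default parameter values are simply picked from the quoted ranges:

```python
import numpy as np

rng = np.random.default_rng(0)

def r(n):
    # Normally distributed random numbers; clipping to (-3, 3) is one
    # reading of "normally distributed between -3 and 3"
    return np.clip(rng.standard_normal(n), -3.0, 3.0)

def make_pattern(kind, eta=80.0, sigma=5.0, a=15.0, g=0.35,
                 s=12.0, shift_at=30, T=8, n=60):
    t = np.arange(n)
    p = eta + r(n) * sigma                      # Eq. (12): normal pattern
    if kind == "cyclic":
        p += a * np.sin(2.0 * np.pi * t / T)    # Eq. (13)
    elif kind == "increasing":
        p += g * t                              # Eq. (14)
    elif kind == "decreasing":
        p -= g * t                              # Eq. (15)
    elif kind == "upward_shift":
        b = (t >= shift_at).astype(float)       # b = 0 before the shift, 1 after
        p += b * s                              # Eq. (16)
    elif kind == "downward_shift":
        b = (t >= shift_at).astype(float)
        p -= b * s                              # Eq. (17)
    return p
```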

In total, 498 patterns were generated, 83 of each type. 366 patterns were used for training the MLP classifiers and the rest for testing the trained classifiers. Each MLP had 60 input neurons, one for each data point in the time series, and six output neurons, one for each control chart pattern type.
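The paper does not give the target coding explicitly; a one-of-six coding consistent with six output neurons, built on the generator sketched above, would pair each series with a binary target vector:

```python
import numpy as np

kinds = ["normal", "cyclic", "increasing", "decreasing",
         "upward_shift", "downward_shift"]

def make_tuple(kind):
    # Input: the 60 data points; target: hypothetical one-of-six coding
    x = make_pattern(kind)          # generator sketched in Section 3.1
    y_d = np.zeros(len(kinds))
    y_d[kinds.index(kind)] = 1.0
    return x, y_d
```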

3.2. Classification of wood veneer defects

Wood veneer sheets are used to manufacture plywood. The quality of the sheets determines that of the resulting plywood boards, and the number and type of defects present affect the quality of a sheet. Because of the speeds of production in a plywood factory, human inspectors cannot reliably perform the task of examining veneer sheets for defects and assessing their quality [27]. An automated visual inspection (AVI) system is therefore required [28]. In the final stages of the operation of an AVI system for veneer sheet inspection, features of objects detected in images of a sheet are extracted and provided to a classifier. The function of the latter is to identify the nature of the objects represented by the input features.

In this study, 17 features were employed to classify each object into one of 13 types: clear wood plus 12 different defect categories, namely bark, coloured streak, curly grain, discoloured patch, hole, pin knot, rotten knot, rough patch, sound knot, split, streak and worm hole. In total, 232 feature vectors were obtained from 18 images of veneer sheets containing the various types of defect as well as from defect-free sheets. 186 vectors were used for training the MLP classifiers and the remainder for testing the trained classifiers. Each classifier had 17 input neurons (one neuron for each feature) and 13 output neurons (one neuron for each object class).

4. Tests and results

For each test problem, two types of MLP classifier were constructed: a classifier with one hidden layer of neurons and a classifier with two hidden layers of neurons. Those numbers of hidden layers were chosen as they had been found appropriate for most classification problems [5,11,13,16,26]. Both types of classifier were trained using the BP, QP, DBD and EDBD algorithms. Thus eight classifiers were obtained for each of the test problems. The values of the training parameters adopted for the algorithms were determined empirically. They were as follows:

1. BP: $\alpha = 0.1$ and $\mu = 0.2$;
2. QP: $\alpha = 0.1$;
3. DBD: $A = 0.01$, $\varphi = 0.5$ and $\theta = 0.7$;
4. EDBD: $\alpha_{max} = \mu_{max} = 2.0$, $A_\alpha = 0.095$, $A_\mu = 0.01$, $\varphi_\alpha = 0.1$, $\varphi_\mu = 0.01$, $\gamma_\alpha = 0.0$, $\gamma_\mu = 0.0$ and $\theta = 0.7$.

The version of the QP algorithm employed in this study [26] required three other parameters to be chosen: the coefficient $\varepsilon$ of the quadratic term (the term designed to lead to the minimum of the paraboloid) in the expression for $\Delta w_{ji}$, the "maximum growth factor" $\gamma$, and the weight decay coefficient $\delta$ [26]. The values of $\varepsilon$, $\gamma$ and $\delta$ were also found empirically, as $\varepsilon = 1.0$, $\gamma = 2.0$ and $\delta = 0.0001$.
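For reference, the empirically determined values can be gathered into one structure (a convenience of ours, not a structure from the paper):

```python
# Training parameters from Section 4, keyed by algorithm
PARAMS = {
    "BP":   {"alpha": 0.1, "mu": 0.2},
    "QP":   {"alpha": 0.1, "epsilon": 1.0, "gamma": 2.0, "delta": 0.0001},
    "DBD":  {"A": 0.01, "phi": 0.5, "theta": 0.7},
    "EDBD": {"alpha_max": 2.0, "mu_max": 2.0, "A_alpha": 0.095,
             "A_mu": 0.01, "phi_alpha": 0.1, "phi_mu": 0.01,
             "gamma_alpha": 0.0, "gamma_mu": 0.0, "theta": 0.7},
}
```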

At the beginning of training, the weights of all the classifiers were initialised randomly to small values in the range $-0.1$ to 0.1, in accordance with the usual practice in MLP training [6].

Two series of tests were conducted: one aimed at assessing the speeds of convergence of the different training algorithms, and the other at evaluating the accuracies of the classifiers obtained. In the first series of tests, the classifiers were trained until a specified value of $E$ was reached, and the number of training iterations required to achieve that value of $E$ was recorded. The actual stopping criterion employed was the root-mean-squared error, defined as

$$\mathrm{rmse} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(y_{dj}^{(i)} - y_j^{(i)}\big)^2}, \qquad (18)$$

where $N$ is the number of outputs and $M$ is the number of patterns in the training set. In the second series of tests, training was carried out for a fixed number of iterations and the accuracies of the resulting classifiers when applied to patterns in the training set and the test set were noted.
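Eq. (18) transcribes directly, assuming the desired and actual outputs for the $M$ training patterns are stacked into $M \times N$ arrays:

```python
import numpy as np

def rmse(y_desired, y_actual):
    # Eq. (18): sqrt( (1/MN) * sum_i sum_j (y_dj(i) - y_j(i))^2 )
    M, N = y_desired.shape
    return np.sqrt(np.sum((y_desired - y_actual) ** 2) / (M * N))
```

In the first series of tests, training stopped once this value fell below 0.03 (Tables 1 and 3).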

4.1. Control chart pattern recognition results

The four pattern recognisers with a single hidden layer had 40 neurons in that layer. The other four pattern recognisers with two hidden layers had 40 neurons in the first hidden layer and 20 neurons in the second hidden layer. Those numbers of neurons were chosen empirically as there is no reliable method for systematically determining them. The results for the convergence tests are shown in Table 1. Table 2 presents the results of the accuracy tests. It can be seen that the training of some neural networks did not converge in a finite number of iterations. This could be due to the inappropriateness of the structure of those neural networks for the given application. For network 2, which appears to have an appropriate structure, the BP training algorithm required the smallest number of iterations and still produced an acceptably accurate classifier. The rates of convergence of QP, DBD and EDBD were disappointing, with DBD yielding the poorest performance in this respect.

Table 1
Convergence results for control chart pattern recognition (rmse set to 0.03)

Network number  Neurons in the hidden layer(s)  Learning algorithm  Number of iterations
1               40                              BP                  >500,000
2               40×20                           BP                  5855
3               40                              QP                  >500,000
4               40×20                           QP                  181,169
5               40                              DBD                 >500,000
6               40×20                           DBD                 >500,000
7               40                              EDBD                211,547
8               40×20                           EDBD                278,159

Table 2
Accuracy results for control chart pattern recognition (number of iterations set to 500,000)

Network number  Neurons in the hidden layer(s)  Learning algorithm  Training set accuracy (%)  Test set accuracy (%)
1               40                              BP                  100.0                      94.70
2               40×20                           BP                  100.0                      95.45
3               40                              QP                  100.0                      94.70
4               40×20                           QP                  100.0                      96.21
5               40                              DBD                 99.7                       96.21
6               40×20                           DBD                 100.0                      96.21
7               40                              EDBD                100.0                      95.45
8               40×20                           EDBD                100.0                      97.73

4.2. Wood veneer defect classification results

The single-hidden-layer pattern recognisers had 33 hidden neurons and the double-hidden-layer pattern recognisers had 17 neurons in each hidden layer. Again, the numbers of hidden neurons were found empirically. Tables 3 and 4 present the results of the convergence and accuracy tests, respectively. It can be noted that the BP algorithm again provided a very strong performance, being the fastest as well as generating the classifier with the best accuracy. The QP, DBD and EDBD algorithms once more had poor convergence rates, with QP and DBD failing to reach the preset rmse value within a finite number of training iterations.

Table 3
Convergence results for wood inspection (rmse set to 0.03)

Network number  Neurons in the hidden layer(s)  Learning algorithm  Number of iterations
1               33                              BP                  >500,000
2               17×17                           BP                  29,573
3               33                              QP                  >500,000
4               17×17                           QP                  >500,000
5               33                              DBD                 >500,000
6               17×17                           DBD                 >500,000
7               33                              EDBD                484,157
8               17×17                           EDBD                121,457

Table 4
Accuracy results for wood inspection (number of iterations set to 500,000)

Network number  Neurons in the hidden layer(s)  Learning algorithm  Training set accuracy (%)  Test set accuracy (%)
1               33                              BP                  99.46                      84.78
2               17×17                           BP                  100.0                      86.96
3               33                              QP                  96.24                      84.78
4               17×17                           QP                  100.0                      80.43
5               33                              DBD                 88.17                      82.61
6               17×17                           DBD                 89.25                      82.61
7               33                              EDBD                100.0                      84.78
8               17×17                           EDBD                100.0                      78.26

5. Conclusion

Two industrially oriented problems of realistic levels of difficulty were used to assess the performance of four MLP training algorithms. All four algorithms belong to the well-known class of techniques that train by error backpropagation. The standard BP algorithm is the simplest of the algorithms evaluated: its implementation requires the selection of only two parameters, as opposed to three for DBD, four for QP and nine for EDBD. The need to choose larger numbers of parameters in the latter algorithms increases the possibility of setting their values incorrectly, which could be the reason for their poorer performances compared with BP. Another possible explanation for the inferior results in the case of QP is that the assumption of a paraboloidal $E(w_{ji})$ surface does not hold for the test problems. Finally, even for problems where algorithms such as QP, DBD and EDBD could converge to minimum error values in fewer iterations than achievable with BP, their training times might still not be shorter than for BP because they require more computations per iteration. In conclusion, BP remains the algorithm of choice for training MLPs, although care must be paid to the design of MLP structures appropriate for the given tasks.

Acknowledgements

The authors would like to thank the Royal Society and the Turkish Science Research Council for supporting Dr S. Sagiroglu’s stay at the University of Wales Cardiff.

References

[1] D.T. Pham, X. Liu, Neural Networks for Identification, Prediction and Control, Springer-Verlag, London, 1995.
[2] D.T. Pham, S. Sagiroglu, Synergistic neural models of a robot sensor for part orientation detection, Robotica 13 (1995) 531–538.
[3] D.T. Pham, S. Sagiroglu, Three methods of training multi-layer perceptrons to model a robot sensor, Proceedings of the IMechE, Part B, Journal of Engineering Manufacture 210 (1996) 69–76.
[4] A. Maren, C. Harston, R. Pap, Handbook of Neural Computing Applications, Academic Press, London, 1990.
[5] S. Sagiroglu, K. Güney, M. Erler, Resonant frequency calculation for circular microstrip antennas using artificial neural networks, International Journal of Microwave and Millimeter-Wave Computer-Aided Engineering 8 (1998) 220–227.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
[7] D.T. Pham, E. Oztemel, Control chart pattern recognition using combinations of multi-layered perceptrons and learning-vector quantisation neural networks, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 207 (1994) 113–118.
[8] D.T. Pham, E. Oztemel, Control chart pattern recognition using learning vector quantisation networks, International Journal of Production Research 32 (1994) 721–729.
[9] D.T. Pham, A.B. Chan, Control chart pattern recognition using a new type of self-organising neural network, Proceedings of the IMechE, Part I, Journal of Systems and Control Engineering 212 (1998) 115–127.
[10] D.T. Pham, R.J. Alcock, Artificial intelligence techniques for processing segmented images of wood boards, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 212 (1998) 119–129.
[11] P.R. Drake, M.S. Packianather, A decision tree of neural networks for classifying images of wood veneer, International Journal of Advanced Manufacturing Technology 14 (1998) 280–285.
[12] D.T. Pham, S. Sagiroglu, Neural network classification of defects in veneer boards, Technical Report, Cardiff University, Cardiff, Wales, 1999.
[13] D.T. Pham, R.J. Alcock, Automated visual inspection of wood boards: selection of features for defect classification by a neural network, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 213 (1999) 231–245.
[14] D.T. Pham, A.B. Chan, A novel self-organising neural network for control chart pattern recognition, in: P.K. Chawdhry, R. Roy, R.K. Pant (Eds.), Soft Computing in Engineering Design and Manufacturing, Springer-Verlag, London, 1998, pp. 381–390.
[15] D.E. Rumelhart, J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, MIT Press, Cambridge, MA, 1986.
[16] S.E. Fahlman, An empirical study of learning speed in backpropagation networks, Technical Report CMU-CS-88-162, Carnegie Mellon University, Pittsburgh, PA, 1988.
[17] R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks 1 (1988) 295–307.
[18] A.A. Minai, R.D. Williams, Back-propagation heuristics: a study of the extended delta-bar-delta algorithm, in: Proceedings of the International Joint Conference on Neural Networks, San Diego, CA, 17–21 June, vol. 1, 1990, pp. 595–600.
[19] C. Charalambous, Conjugate-gradient algorithm for efficient training of artificial neural networks, IEE Proceedings G: Circuits, Devices and Systems 139 (1992) 301–310.
[20] D.A. Karras, S.J. Perantonis, Comparison of learning algorithms for feedforward networks in large scale networks and problems, in: Proceedings of the International Joint Conference on Neural Networks, Nagoya, Japan, 25–29 October, 1993, pp. 532–535.
[21] D. Alpsan, M. Towsey, O. Ozdamar, A.C. Tsoi, D.N. Ghista, Efficacy of modified backpropagation and optimisation methods on a real-world medical problem, Neural Networks 8 (6) (1995) 945–962.
[22] K.W. Tang, G. Pingle, G. Srikant, Artificial neural networks for the diagnosis of coronary artery disease, Journal of Intelligent Systems 7 (1997) 307–338.
[23] Y. Solano, H. Ikeda, A comparative study of eight learning algorithms for artificial neural networks based on a real application, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E81-A (1998) 355–357.
[24] J.M. Hannan, J.M. Bishop, Comparison of fast training algorithms over two real problems, in: Proceedings of the 5th International Conference on Artificial Neural Networks, Reading, UK, IEE, London, 1997, pp. 1–6.
[25] F. Stager, M. Agarwal, Three methods to speed up the training of feedforward and feedback perceptrons, Neural Networks 10 (1997) 1435–1443.
[26] NeuralWare, Neural Computing: A Technology Handbook for Professional II/PLUS and NeuralWorks Explorer, Technical Publication Unit, Pittsburgh, PA, 1996.
[27] W. Polzleitner, G. Schwingshakl, Real-time surface grading of profiled wooden boards, Industrial Metrology 2 (1992) 283–298.
[28] D.T. Pham, R.J. Alcock, Automatic detection of defects on birch wood boards, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 210 (1996) 45–52.
Bishop, Comparison of fast training algorithms over two real problems, in: Proceedings of the 5th International Conference on Artificial Neural Networks, Reading, UK, IEE, London, 1997, pp. 1–6. [25] F. Stager, M. Agarwal, Three methods to speed up the training of feedforward and feedback perceptrons, Neural Networks 10 (1997) 1435–1443. [26] NeuralWare Handbook, Neural Computing, A Technology Handbook for Professional II/PLUS and NeuralWorks Explorer, Technical Publication Unit, Pittsburgh, PA, 1996. [27] W. Polzleitner, G. Schwingshakl, Real-time surface grading of profiled wooden boards, Industrial Metrology 2 (1992) 283–298. [28] D.T. Pham, R.J. Alcock, Automatic detection of defects on birch wood boards, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 210 (1996) 45–52.