International Journal of Machine Tools & Manufacture 41 (2001) 419–430

Training multilayered perceptrons for pattern recognition: a comparative study of four training algorithms

D.T. Pham a,*, S. Sagiroglu b

a Intelligent Systems Laboratory, Manufacturing Engineering Centre, Cardiff University, P.O. Box 688, Cardiff CF24 3TE, UK
b Intelligent Systems Research Group, Engineering Department, Faculty of Engineering, Erciyes University, Kayseri 38039, Turkey

Received 25 May 2000; accepted 18 July 2000

* Corresponding author. Tel.: +44-029-20-874429; fax: +44-029-20-874003. E-mail address: [email protected] (D.T. Pham).

Abstract

This paper presents an overview of four algorithms used for training multilayered perceptron (MLP) neural networks, and the results of applying those algorithms to teach different MLPs to recognise control chart patterns and to classify wood veneer defects. The algorithms studied are standard backpropagation (BP), Quickprop (QP), Delta-Bar-Delta (DBD) and Extended-Delta-Bar-Delta (EDBD). The results show that, overall, BP was the best algorithm for the two applications tested. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Multilayered perceptrons; Training algorithms; Control chart pattern recognition; Wood inspection

Nomenclature

$a$  amplitude of cyclic variations in a cyclic pattern
$A$, $A_\alpha$, $A_\mu$  positive constants
ALECO  algorithm for learning efficiently with constrained optimization
AVI  automated visual inspection
$b$  shift position in an upward shift pattern and a downward shift pattern
BP  backpropagation
CG  conjugate gradient
DBD  delta-bar-delta
$D_{ji}(k-1)$  weighted average of gradients of the error surface
$E(w_{ji})$  sum of squared errors
EDBD  extended-delta-bar-delta
$f$  neuron activation function
$g$  gradient of an increasing trend pattern or a decreasing trend pattern
$k$  iteration number
$M$  number of patterns in the training set
MLP  multilayered perceptron
$N$  number of outputs
$p(t)$  value of the sampled data point at time $t$
QP  quick propagation
$r(\cdot)$  a function that generates random numbers
rmse  root-mean-squared error
$s$  magnitude of the shift
$t$  discrete time at which the monitored process variable is sampled
$T$  period of a cycle in a cyclic pattern
$w_{ji}$  weight of connection between neurons $i$ and $j$
$x_i$  input signal
$y_{dj}$  desired output value of neuron $j$
$y_j$  actual output value of neuron $j$
$\alpha$  learning coefficient
$\alpha_{max}$  preset upper bound of $\alpha$
$\gamma$  maximum growth factor
$\gamma_\alpha$, $\gamma_\mu$  positive coefficients
$\delta$  weight decay coefficient
$\Delta w_{ji}$  change to weight $w_{ji}$
$\varepsilon$  quadratic term coefficient
$\eta$  nominal mean value of the process variable under observation
$\theta$  positive coefficient
$\mu$  momentum coefficient
$\mu_{max}$  preset upper bound of $\mu$
$\sigma$  standard deviation of the process variable under observation
$\varphi$, $\varphi_\alpha$, $\varphi_\mu$  positive coefficients

1. Introduction

Artificial neural networks have been applied to different areas of engineering such as dynamic system identification [1], complex system modelling [2,3], control [1,4] and design [5]. An important group of applications has been classification [4,6–13]. Both supervised neural networks [8,11–13] and unsupervised networks [9,14] have proved good alternatives to traditional pattern recognition systems because of features such as the ability to learn and generalise, smaller training set requirements, fast operation, ease of implementation and tolerance of noisy data. However, of all available types of network, multilayered perceptrons (MLPs) are the most commonly adopted.

There are many algorithms for training MLP networks [3,4,6,15–26]. The popular backpropagation (BP) algorithm is simple but reportedly has a problem with slow convergence. Various related algorithms have been introduced to address that problem [16–18].

A number of researchers have carried out comparative studies of MLP training algorithms. For example, Charalambous [19] evaluated BP against the conjugate gradient (CG) algorithm. The results for three test problems showed that the performance of CG was superior to that of standard BP. Karras and Perantonis [20] assessed five learning algorithms: quick propagation (QP), on-line and off-line BP, delta-bar-delta (DBD) and an algorithm known as ALECO (algorithm for learning efficiently with constrained optimization). ALECO gave the best performance on several simple test problems.

This paper reports on an evaluation of four algorithms on the training of different MLPs for two realistic industrially oriented applications. The four algorithms studied were standard BP, QP, DBD and extended DBD (EDBD). QP, DBD and EDBD were chosen because they are better-known relations of the standard BP algorithm. The test applications were recognition of control chart patterns and classification of wood veneer defects. The paper briefly describes the algorithms and the benchmark applications, and presents the results of the evaluation.

2. MLP training algorithms

As shown in Fig. 1, an MLP consists of three types of layer: an input layer, an output layer and one or more hidden layers. Neurons in the input layer act only as buffers for distributing the input signals $x_i$ to neurons in the hidden layer. Each neuron $j$ in the hidden layer sums its input signals $x_i$ after multiplying them by the strengths of the respective connection weights $w_{ji}$ and computes its output $y_j$ as a function of the sum, i.e.,

$$y_j = f\Big(\sum_i w_{ji} x_i\Big), \qquad (1)$$

where $f$ is usually a sigmoidal or hyperbolic tangent function. The outputs of neurons in the output layer are computed similarly.

Fig. 1. Backpropagation network.
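To make Eq. (1) concrete, here is a minimal sketch (ours, not the authors' implementation) of a forward pass through a one-hidden-layer MLP with a sigmoidal activation. The layer sizes match the control chart networks of Section 4; all variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # A sigmoidal activation function f, one of the choices named above
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, w_output):
    # Each hidden neuron j computes y_j = f(sum_i w_ji * x_i)  -- Eq. (1)
    y_hidden = sigmoid(w_hidden @ x)
    # Outputs of the output-layer neurons are computed similarly
    return sigmoid(w_output @ y_hidden)

rng = np.random.default_rng(0)
x = rng.normal(size=60)                       # a 60-point input pattern
w_hidden = rng.uniform(-0.1, 0.1, (40, 60))   # 40 hidden neurons (Section 4.1)
w_output = rng.uniform(-0.1, 0.1, (6, 40))    # 6 output neurons, one per class
print(forward(x, w_hidden, w_output))         # six activations in (0, 1)
```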

Training a network consists of adjusting its weights using a training algorithm. The training algorithms adopted in this study optimise the weights by attempting to minimise the sum of squared differences between the desired and actual values of the output neurons, namely:

$$E = \frac{1}{2}\sum_j (y_{dj} - y_j)^2, \qquad (2)$$

where $y_{dj}$ is the desired value of output neuron $j$ and $y_j$ is the actual output of that neuron. Each weight $w_{ji}$ is adjusted by adding an increment $\Delta w_{ji}$ to it. $\Delta w_{ji}$ is selected to reduce $E$ as rapidly as possible. The adjustment is carried out over several training iterations until a satisfactorily small value of $E$ is obtained or a given number of iterations is reached. How $\Delta w_{ji}$ is computed depends on the training algorithm adopted.
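As an illustration, the error of Eq. (2) for a single training tuple of a six-output classifier could be computed as follows (the output values here are invented for the example):

```python
import numpy as np

def sum_squared_error(y_desired, y_actual):
    # E = (1/2) * sum_j (y_dj - y_j)^2   -- Eq. (2)
    return 0.5 * np.sum((y_desired - y_actual) ** 2)

y_d = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])   # desired outputs
y   = np.array([0.8, 0.1, 0.0, 0.2, 0.0, 0.1])   # actual network outputs
print(sum_squared_error(y_d, y))                  # 0.05
```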

2.1. BP algorithm

The BP algorithm [15] gives the change $\Delta w_{ji}(k)$ in the weight of the connection between neurons $i$ and $j$ at iteration $k$ as:

$$\Delta w_{ji}(k) = -\alpha \frac{\partial E}{\partial w_{ji}(k)} + \mu\, \Delta w_{ji}(k-1), \qquad (3)$$

where $\alpha$ is called the learning coefficient, $\mu$ the momentum coefficient and $\Delta w_{ji}(k-1)$ the weight change in the immediately preceding iteration. Training an MLP by BP involves presenting it sequentially with all training tuples (input, target output). Differences between the target output $y_d(k)$ and the actual output $y(k)$ of the MLP are propagated back through the network to adapt its weights. A training iteration is completed after a tuple in the training set has been presented to the network and the weights updated.
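A minimal sketch of the update rule of Eq. (3), assuming the gradient has already been obtained by backpropagating the output errors; the default coefficient values are those quoted in Section 4:

```python
import numpy as np

def bp_delta_w(grad_e, prev_delta_w, alpha=0.1, mu=0.2):
    # Eq. (3): dw(k) = -alpha * dE/dw(k) + mu * dw(k-1)
    return -alpha * grad_e + mu * prev_delta_w

# Usage: the previous increment is carried over between iterations
grad_e = np.array([0.3, -0.1])                        # invented gradient values
delta_w = bp_delta_w(grad_e, prev_delta_w=np.zeros(2))
```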

2.2. QP algorithm

QP was developed as a method of improving the rate of convergence to a minimum value of $E(w_{ji})$ by using information about the curvature of the error surface $E(w_{ji})$ [16]. An underlying assumption of QP is that $E(w_{ji})$ is a paraboloid. QP employs the gradient $\partial E/\partial w_{ji}$ at two points, $w_{ji}(k)$ and $w_{ji}(k-1)$, to find the minimum of $E$. The weight change $\Delta w_{ji}(k)$ is computed as follows:

$$\Delta w_{ji}(k) = -\alpha \frac{\partial E}{\partial w_{ji}(k)} + \mu\, \Delta w_{ji}(k-1), \qquad (4)$$

where

$$\mu = \frac{\partial E/\partial w_{ji}(k)}{[\partial E/\partial w_{ji}(k-1)] - [\partial E/\partial w_{ji}(k)]}.$$

Note that the momentum coefficient $\mu$ is variable and thus QP can be viewed as a form of BP that employs a dynamic momentum coefficient. Another interpretation of QP is obtained by rearranging Eq. (4) as follows:

$$\Delta w_{ji}(k) = \left(-\alpha + \frac{\Delta w_{ji}(k-1)}{[\partial E/\partial w_{ji}(k-1)] - [\partial E/\partial w_{ji}(k)]}\right) \frac{\partial E}{\partial w_{ji}(k)}. \qquad (5)$$

This shows that QP is a variation of BP where the learning coefficient is adaptable. Further details on QP are given in Ref. [26], which describes the heuristics adopted to prevent the search for a minimum value of $E(w_{ji})$ from proceeding in the wrong direction.
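In code, the rearranged rule of Eq. (5) might look as follows; the guard against a vanishing denominator stands in for the safeguarding heuristics of Ref. [26], which are not reproduced here:

```python
def qp_delta_w(grad_now, grad_prev, prev_delta_w, alpha=0.1):
    # Eq. (5): dw(k) = (-alpha + dw(k-1)/(g(k-1) - g(k))) * g(k),
    # where g denotes the gradient dE/dw_ji
    denom = grad_prev - grad_now
    if denom == 0.0:
        return -alpha * grad_now   # fall back to a plain gradient step
    return (-alpha + prev_delta_w / denom) * grad_now
```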

2.3. DBD algorithm

Aimed at speeding up the training of MLPs, DBD [17] is based on the hypothesis that a learning coefficient suitable for one weight may not be appropriate for all weights. By assigning a learning coefficient to each weight and permitting it to change over time, more freedom is introduced to facilitate convergence towards a minimum value of $E(w_{ji})$. The weight change is given by:

$$\Delta w_{ji}(k) = -\alpha_{ji}(k) \frac{\partial E}{\partial w_{ji}(k)}, \qquad (6)$$

where $\alpha_{ji}(k)$ is the learning coefficient assigned to the connection from neuron $i$ to neuron $j$. The learning coefficient change is given as:

$$\Delta \alpha_{ji}(k) = \begin{cases} A & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} > 0 \\[4pt] -\varphi\, \alpha_{ji}(k) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} < 0 \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (7)$$

where $D_{ji}(k-1)$ represents a weighted average of $\partial E/\partial w_{ji}(k-1)$ and $\partial E/\partial w_{ji}(k-2)$ given by

$$D_{ji}(k-1) = (1-\theta)\frac{\partial E}{\partial w_{ji}(k-1)} + \theta \frac{\partial E}{\partial w_{ji}(k-2)}, \qquad (8)$$

and $\varphi$, $\theta$ and $A$ are positive constants. From Eq. (7), it can be noted that the algorithm increments the learning coefficients linearly but decrements them geometrically.
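The DBD rules of Eqs. (6) to (8) can be sketched for a single weight as below; the defaults are the empirical values of Section 4 and the function names are ours:

```python
def weighted_average_gradient(grad_k1, grad_k2, theta=0.7):
    # Eq. (8): D(k-1) = (1 - theta) * dE/dw(k-1) + theta * dE/dw(k-2)
    return (1.0 - theta) * grad_k1 + theta * grad_k2

def dbd_delta_alpha(d_prev, grad_now, alpha_ji, A=0.01, phi=0.5):
    # Eq. (7): increment linearly, decrement geometrically
    if d_prev * grad_now > 0:
        return A                  # same sign: grow the coefficient by A
    if d_prev * grad_now < 0:
        return -phi * alpha_ji    # sign change: shrink it geometrically
    return 0.0

def dbd_delta_w(alpha_ji, grad_now):
    # Eq. (6): each weight carries its own learning coefficient alpha_ji(k)
    return -alpha_ji * grad_now
```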

2.4. Extended DBD algorithm

As its name implies, this algorithm [18] is an extension of DBD. It also aims to decrease the training time for MLPs. The changes in weights are calculated as:

$$\Delta w_{ji}(k) = -\alpha_{ji}(k) \frac{\partial E(k)}{\partial w_{ji}(k)} + \mu_{ji}(k)\, \Delta w_{ji}(k-1), \qquad (9)$$

where $\alpha_{ji}(k)$ and $\mu_{ji}(k)$ are learning and momentum coefficients, respectively. The use of momentum in EDBD is one of the differences between it and DBD. The learning coefficient change is given as:

$$\Delta \alpha_{ji}(k) = \begin{cases} A_\alpha \exp(-\gamma_\alpha |D_{ji}(k)|) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} > 0 \\[4pt] -\varphi_\alpha\, \alpha_{ji}(k) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} < 0 \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (10)$$

where $A_\alpha$, $\varphi_\alpha$ and $\gamma_\alpha$ are positive constants and $D_{ji}$ is computed as in the case of DBD. The momentum coefficient change is obtained as:

$$\Delta \mu_{ji}(k) = \begin{cases} A_\mu \exp(-\gamma_\mu |D_{ji}(k)|) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} > 0 \\[4pt] -\varphi_\mu\, \mu_{ji}(k) & \text{if } D_{ji}(k-1)\,\dfrac{\partial E}{\partial w_{ji}(k)} < 0 \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (11)$$

where $A_\mu$, $\varphi_\mu$ and $\gamma_\mu$ are positive constants. Note that, unlike for DBD, increments in the learning coefficient are not constant, but vary as an exponentially decreasing function of the magnitude of the weighted average gradient component $D_{ji}(k)$. To prevent oscillations in the values of the weights, $\alpha_{ji}(k)$ and $\mu_{ji}(k)$ are kept below preset upper bounds $\alpha_{max}$ and $\mu_{max}$.
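A corresponding sketch of Eqs. (10) and (11), again with the Section 4 values as defaults; the closing comments show the clipping against the upper bounds mentioned above:

```python
import numpy as np

def edbd_coefficient_changes(d_now, d_prev, grad_now, alpha_ji, mu_ji,
                             A_alpha=0.095, A_mu=0.01,
                             phi_alpha=0.1, phi_mu=0.01,
                             gamma_alpha=0.0, gamma_mu=0.0):
    # Eqs. (10) and (11); the sign of D(k-1) * dE/dw(k) selects the case
    sign = d_prev * grad_now
    if sign > 0:
        delta_alpha = A_alpha * np.exp(-gamma_alpha * abs(d_now))
        delta_mu = A_mu * np.exp(-gamma_mu * abs(d_now))
    elif sign < 0:
        delta_alpha = -phi_alpha * alpha_ji
        delta_mu = -phi_mu * mu_ji
    else:
        delta_alpha = delta_mu = 0.0
    return delta_alpha, delta_mu

# The updated coefficients are then kept below the preset upper bounds,
# alpha_max = mu_max = 2.0 as in Section 4:
# alpha_ji = min(alpha_ji + delta_alpha, 2.0)
# mu_ji = min(mu_ji + delta_mu, 2.0)
```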

3. Test problems

3.1. Control chart pattern recognition

Control charts are used in Statistical Process Control to give information on the state of a process. Fig. 2 depicts the six main types of pattern that might be observed on a control chart. These are: normal, cyclic, downward shift, upward shift, increasing trend and decreasing trend patterns. Except for normal patterns, all other patterns indicate that the process being monitored is not functioning correctly and requires adjustment.

Fig. 2. Control chart patterns.

For this study, in order to obtain a large number of patterns of all these different types, they were generated synthetically. Each pattern was taken as a time series of 60 data points. The following equations were used to create the data points for the various patterns:

1. normal patterns:
$$p(t) = \eta + r(t)\sigma, \qquad (12)$$

2. cyclic patterns:
$$p(t) = \eta + r(t)\sigma + a \sin(2\pi t/T), \qquad (13)$$

3. increasing trend patterns:
$$p(t) = \eta + r(t)\sigma + gt, \qquad (14)$$

4. decreasing trend patterns:
$$p(t) = \eta + r(t)\sigma - gt, \qquad (15)$$

5. upward shift patterns:
$$p(t) = \eta + r(t)\sigma + bs, \qquad (16)$$

6. downward shift patterns:
$$p(t) = \eta + r(t)\sigma - bs, \qquad (17)$$

where $\eta$ is the nominal mean value of the process variable under observation (set to 80), $\sigma$ is the standard deviation of the process variable (set to 5), $a$ is the amplitude of cyclic variations in a cyclic pattern (set to 15 or less), $g$ is the gradient of an increasing trend pattern or a decreasing trend pattern (set in the range 0.2 to 0.5), $b$ indicates the shift position in an upward shift pattern and a downward shift pattern ($b = 0$ before the shift and $b = 1$ at the shift and thereafter), $s$ is the magnitude of the shift (set between 7.5 and 20), $r(\cdot)$ is a function that generates random numbers normally distributed between $-3$ and 3, $t$ is the discrete time at which the monitored process variable is sampled (set within the range 0 to 59), $T$ is the period of a cycle in a cyclic pattern (set between 4 and 12 sampling intervals) and $p(t)$ is the value of the sampled data point at time $t$.
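The following sketch is one reading of Eqs. (12) to (17); in particular, bounding $r(\cdot)$ by clipping the normal deviates to $(-3, 3)$ is our assumption, and the default parameter values are simply picked from the quoted ranges:

```python
import numpy as np

rng = np.random.default_rng(0)

def r(n):
    # Normally distributed random numbers; clipping to (-3, 3) is one
    # reading of "normally distributed between -3 and 3"
    return np.clip(rng.standard_normal(n), -3.0, 3.0)

def make_pattern(kind, eta=80.0, sigma=5.0, a=15.0, g=0.35,
                 s=12.0, shift_at=30, T=8, n=60):
    t = np.arange(n)
    p = eta + r(n) * sigma                      # Eq. (12): normal pattern
    if kind == "cyclic":
        p += a * np.sin(2.0 * np.pi * t / T)    # Eq. (13)
    elif kind == "increasing":
        p += g * t                              # Eq. (14)
    elif kind == "decreasing":
        p -= g * t                              # Eq. (15)
    elif kind == "upward_shift":
        b = (t >= shift_at).astype(float)       # b = 0 before the shift, 1 after
        p += b * s                              # Eq. (16)
    elif kind == "downward_shift":
        b = (t >= shift_at).astype(float)
        p -= b * s                              # Eq. (17)
    return p
```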

In total, 498 patterns were generated, 83 of each type. 366 patterns were used for training the MLP classifiers and the rest for testing the trained classifiers. Each MLP had 60 input neurons, one for each data point in the time series, and six output neurons, one for each control chart pattern type.
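The paper does not give the target coding explicitly; a one-of-six coding consistent with six output neurons, built on the generator sketched above, would pair each series with a binary target vector:

```python
import numpy as np

kinds = ["normal", "cyclic", "increasing", "decreasing",
         "upward_shift", "downward_shift"]

def make_tuple(kind):
    # Input: the 60 data points; target: hypothetical one-of-six coding
    x = make_pattern(kind)          # generator sketched in Section 3.1
    y_d = np.zeros(len(kinds))
    y_d[kinds.index(kind)] = 1.0
    return x, y_d
```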

3.2. Classification of wood veneer defects

Wood veneer sheets are used to manufacture plywood. The quality of the sheets determines that of the resulting plywood boards, and the number and type of defects present affect the quality of a sheet. Because of the speeds of production in a plywood factory, human inspectors cannot reliably perform the task of examining veneer sheets for defects and assessing their quality [27]. An automated visual inspection (AVI) system is therefore required [28]. In the final stages of the operation of an AVI system for veneer sheet inspection, features of objects detected in images of a sheet are extracted and provided to a classifier. The function of the latter is to identify the nature of the objects represented by the input features.

In this study, 17 features were employed to classify each object into one of 13 types: clear wood plus 12 different defect categories, namely bark, coloured streak, curly grain, discoloured patch, hole, pin knot, rotten knot, rough patch, sound knot, split, streak and worm hole. In total, 232 feature vectors were obtained from 18 images of veneer sheets containing the various types of defect as well as from defect-free sheets. 186 vectors were used for training the MLP classifiers and the remainder for testing the trained classifiers. Each classifier had 17 input neurons (one neuron for each feature) and 13 output neurons (one neuron for each object class).

4. Tests and results

For each test problem, two types of MLP classifier were constructed: a classifier with one hidden layer of neurons and a classifier with two hidden layers of neurons. Those numbers of hidden layers were chosen as they had been found appropriate for most classification problems [5,11,13,16,26]. Both types of classifier were trained using the BP, QP, DBD and EDBD algorithms. Thus eight classifiers were obtained for each of the test problems. The values of the training parameters adopted for the algorithms were determined empirically. They were as follows:

1. BP: $\alpha = 0.1$ and $\mu = 0.2$;
2. QP: $\alpha = 0.1$;
3. DBD: $A = 0.01$, $\varphi = 0.5$ and $\theta = 0.7$;
4. EDBD: $\alpha_{max} = \mu_{max} = 2.0$, $A_\alpha = 0.095$, $A_\mu = 0.01$, $\varphi_\alpha = 0.1$, $\varphi_\mu = 0.01$, $\gamma_\alpha = 0.0$, $\gamma_\mu = 0.0$ and $\theta = 0.7$.

The version of the QP algorithm employed in this study [26] required three other parameters to be chosen: the coefficient $\varepsilon$ of the quadratic term (the term designed to lead to the minimum of the paraboloid) in the expression for $\Delta w_{ji}$, the "maximum growth factor" $\gamma$, and the weight decay coefficient $\delta$ [26]. The values of $\varepsilon$, $\gamma$ and $\delta$ were also found empirically, as $\varepsilon = 1.0$, $\gamma = 2.0$ and $\delta = 0.0001$.
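For reference, the empirically determined values can be gathered into one structure (a convenience of ours, not a structure from the paper):

```python
# Training parameters from Section 4, keyed by algorithm
PARAMS = {
    "BP":   {"alpha": 0.1, "mu": 0.2},
    "QP":   {"alpha": 0.1, "epsilon": 1.0, "gamma": 2.0, "delta": 0.0001},
    "DBD":  {"A": 0.01, "phi": 0.5, "theta": 0.7},
    "EDBD": {"alpha_max": 2.0, "mu_max": 2.0, "A_alpha": 0.095,
             "A_mu": 0.01, "phi_alpha": 0.1, "phi_mu": 0.01,
             "gamma_alpha": 0.0, "gamma_mu": 0.0, "theta": 0.7},
}
```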

At the beginning of training, the weights of all the classifiers were initialised randomly to small values in the range $-0.1$ to 0.1, in accordance with the usual practice in MLP training [6].

Two series of tests were conducted: one aimed at assessing the speeds of convergence of the different training algorithms, and the other at evaluating the accuracies of the classifiers obtained. In the first series of tests, the classifiers were trained until a specified value of $E$ was reached, and the number of training iterations required to achieve that value of $E$ was recorded. The actual stopping criterion employed was the root-mean-squared error, defined as

$$\mathrm{rmse} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(y_{dj}^{(i)} - y_j^{(i)}\big)^2}, \qquad (18)$$

where $N$ is the number of outputs and $M$ is the number of patterns in the training set. In the second series of tests, training was carried out for a fixed number of iterations and the accuracies of the resulting classifiers when applied to patterns in the training set and the test set were noted.
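Eq. (18) transcribes directly, assuming the desired and actual outputs for the $M$ training patterns are stacked into $M \times N$ arrays:

```python
import numpy as np

def rmse(y_desired, y_actual):
    # Eq. (18): sqrt( (1/MN) * sum_i sum_j (y_dj(i) - y_j(i))^2 )
    M, N = y_desired.shape
    return np.sqrt(np.sum((y_desired - y_actual) ** 2) / (M * N))
```

In the first series of tests, training stopped once this value fell below 0.03 (Tables 1 and 3).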

4.1. Control chart pattern recognition results

The four pattern recognisers with a single hidden layer had 40 neurons in that layer. The other four pattern recognisers with two hidden layers had 40 neurons in the first hidden layer and 20 neurons in the second hidden layer. Those numbers of neurons were chosen empirically as there is no reliable method for systematically determining them. The results for the convergence tests are shown in Table 1. Table 2 presents the results of the accuracy tests. It can be seen that the training of some neural networks did not converge in a finite number of iterations. This could be due to the inappropriateness of the structure of those neural networks for the given application. For network 2, which appears to have an appropriate structure, the BP training algorithm required the smallest number of iterations and still produced an acceptably accurate classifier. The rates of convergence of QP, DBD and EDBD were disappointing, with DBD yielding the poorest performance in this respect.

Table 1
Convergence results for control chart pattern recognition (rmse set to 0.03)

Network number  Neurons in the hidden layer(s)  Learning algorithm  Number of iterations
1               40                              BP                  >500,000
2               40×20                           BP                  5855
3               40                              QP                  >500,000
4               40×20                           QP                  181,169
5               40                              DBD                 >500,000
6               40×20                           DBD                 >500,000
7               40                              EDBD                211,547
8               40×20                           EDBD                278,159

Table 2
Accuracy results for control chart pattern recognition (number of iterations set to 500,000)

Network number  Neurons in the hidden layer(s)  Learning algorithm  Training set accuracy (%)  Test set accuracy (%)
1               40                              BP                  100.0                      94.70
2               40×20                           BP                  100.0                      95.45
3               40                              QP                  100.0                      94.70
4               40×20                           QP                  100.0                      96.21
5               40                              DBD                 99.7                       96.21
6               40×20                           DBD                 100.0                      96.21
7               40                              EDBD                100.0                      95.45
8               40×20                           EDBD                100.0                      97.73

4.2. Wood veneer defect classification results

The single-hidden-layer pattern recognisers had 33 hidden neurons and the double-hidden-layer pattern recognisers had 17 neurons in each hidden layer. Again, the numbers of hidden neurons were found empirically. Tables 3 and 4 present the results of the convergence and accuracy tests, respectively. It can be noted that the BP algorithm again provided a very strong performance, being the fastest as well as generating the classifier with the best accuracy. The QP, DBD and EDBD algorithms once more had poor convergence rates, with QP and DBD failing to reach the preset rmse value within a finite number of training iterations.

Table 3
Convergence results for wood inspection (rmse set to 0.03)

Network number  Neurons in the hidden layer(s)  Learning algorithm  Number of iterations
1               33                              BP                  >500,000
2               17×17                           BP                  29,573
3               33                              QP                  >500,000
4               17×17                           QP                  >500,000
5               33                              DBD                 >500,000
6               17×17                           DBD                 >500,000
7               33                              EDBD                484,157
8               17×17                           EDBD                121,457

Table 4
Accuracy results for wood inspection (number of iterations set to 500,000)

Network number  Neurons in the hidden layer(s)  Learning algorithm  Training set accuracy (%)  Test set accuracy (%)
1               33                              BP                  99.46                      84.78
2               17×17                           BP                  100.0                      86.96
3               33                              QP                  96.24                      84.78
4               17×17                           QP                  100.0                      80.43
5               33                              DBD                 88.17                      82.61
6               17×17                           DBD                 89.25                      82.61
7               33                              EDBD                100.0                      84.78
8               17×17                           EDBD                100.0                      78.26

5. Conclusion

Two industrially oriented problems of realistic levels of difficulty were used to assess the performance of four MLP training algorithms. All four algorithms belong to the well-known class of techniques that train by error backpropagation. The standard BP algorithm is the simplest of the algorithms evaluated: its implementation requires the selection of only two parameters, as opposed to three for DBD, four for QP and nine for EDBD. The need to choose larger numbers of parameters in the latter algorithms increases the possibility of setting their values incorrectly, which could be the reason for their poorer performances compared with BP. Another possible explanation for the inferior results in the case of QP is that the assumption of a paraboloidal $E(w_{ji})$ surface does not hold for the test problems. Finally, even for problems where algorithms such as QP, DBD and EDBD could converge to minimum error values in fewer iterations than achievable with BP, their training times might still not be shorter than for BP because they require more computations per iteration. In conclusion, BP remains the algorithm of choice for training MLPs, although care must be paid to the design of MLP structures appropriate for the given tasks.

Acknowledgements

The authors would like to thank the Royal Society and the Turkish Science Research Council for supporting Dr S. Sagiroglu’s stay at the University of Wales Cardiff.

References

[1] D.T. Pham, X. Liu, Neural Networks for Identification, Prediction and Control, Springer-Verlag, London, 1995.
[2] D.T. Pham, S. Sagiroglu, Synergistic neural models of a robot sensor for part orientation detection, Robotica 13 (1995) 531–538.
[3] D.T. Pham, S. Sagiroglu, Three methods of training multi-layer perceptrons to model a robot sensor, Proceedings of the IMechE, Part B, Journal of Engineering Manufacture 210 (1996) 69–76.
[4] A. Maren, C. Harston, R. Pap, Handbook of Neural Computing Applications, Academic Press, London, 1990.
[5] S. Sagiroglu, K. Güney, M. Erler, Resonant frequency calculation for circular microstrip antennas using artificial neural networks, International Journal of Microwave and Millimeter-Wave Computer-Aided Engineering 8 (1998) 220–227.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
[7] D.T. Pham, E. Oztemel, Control chart pattern recognition using combinations of multi-layered perceptrons and learning-vector quantisation neural networks, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 207 (1994) 113–118.
[8] D.T. Pham, E. Oztemel, Control chart pattern recognition using learning vector quantisation networks, International Journal of Production Research 32 (1994) 721–729.
[9] D.T. Pham, A.B. Chan, Control chart pattern recognition using a new type of self-organising neural network, Proceedings of the IMechE, Part I, Journal of Systems and Control Engineering 212 (1998) 115–127.
[10] D.T. Pham, R.J. Alcock, Artificial intelligence techniques for processing segmented images of wood boards, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 212 (1998) 119–129.
[11] P.R. Drake, M.S. Packianather, A decision tree of neural networks for classifying images of wood veneer, International Journal of Advanced Manufacturing Technology 14 (1998) 280–285.
[12] D.T. Pham, S. Sagiroglu, Neural network classification of defects in veneer boards, Technical Report, Cardiff University, Cardiff, Wales, 1999.
[13] D.T. Pham, R.J. Alcock, Automated visual inspection of wood boards: selection of features for defect classification by a neural network, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 213 (1999) 231–245.
[14] D.T. Pham, A.B. Chan, A novel self-organising neural network for control chart pattern recognition, in: P.K. Chawdhry, R. Roy, R.K. Pant (Eds.), Soft Computing in Engineering Design and Manufacturing, Springer-Verlag, London, 1998, pp. 381–390.
[15] D.E. Rumelhart, J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, MIT Press, Cambridge, MA, 1986.
[16] S.E. Fahlman, An empirical study of learning speed in backpropagation networks, Technical Report CMU-CS-88-162, Carnegie Mellon University, Pittsburgh, PA, 1988.
[17] R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks 1 (1988) 295–307.
[18] A.A. Minai, R.D. Williams, Back-propagation heuristics: a study of the extended delta-bar-delta algorithm, in: Proceedings of the International Joint Conference on Neural Networks, San Diego, CA, 17–21 June, vol. 1, 1990, pp. 595–600.
[19] C. Charalambous, Conjugate-gradient algorithm for efficient training of artificial neural networks, IEE Proceedings G: Circuits, Devices and Systems 139 (1992) 301–310.
[20] D.A. Karras, S.J. Perantonis, Comparison of learning algorithms for feedforward networks in large scale networks and problems, in: Proceedings of the International Joint Conference on Neural Networks, Nagoya, Japan, 25–29 October, 1993, pp. 532–535.
[21] D. Alpsan, M. Towsey, O. Ozdamar, A.C. Tsoi, D.N. Ghista, Efficacy of modified backpropagation and optimisation methods on a real-world medical problem, Neural Networks 8 (6) (1995) 945–962.
[22] K.W. Tang, G. Pingle, G. Srikant, Artificial neural networks for the diagnosis of coronary artery disease, Journal of Intelligent Systems 7 (1997) 307–338.
[23] Y. Solano, H. Ikeda, A comparative study of eight learning algorithms for artificial neural networks based on a real application, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E81-A (1998) 355–357.
[24] J.M. Hannan, J.M. Bishop, Comparison of fast training algorithms over two real problems, in: Proceedings of the 5th International Conference on Artificial Neural Networks, Reading, UK, IEE, London, 1997, pp. 1–6.
[25] F. Stager, M. Agarwal, Three methods to speed up the training of feedforward and feedback perceptrons, Neural Networks 10 (1997) 1435–1443.
[26] NeuralWare, Neural Computing: A Technology Handbook for Professional II/PLUS and NeuralWorks Explorer, Technical Publication Unit, Pittsburgh, PA, 1996.
[27] W. Polzleitner, G. Schwingshakl, Real-time surface grading of profiled wooden boards, Industrial Metrology 2 (1992) 283–298.
[28] D.T. Pham, R.J. Alcock, Automatic detection of defects on birch wood boards, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 210 (1996) 45–52.
Bishop, Comparison of fast training algorithms over two real problems, in: Proceedings of the 5th International Conference on Artificial Neural Networks, Reading, UK, IEE, London, 1997, pp. 1–6. [25] F. Stager, M. Agarwal, Three methods to speed up the training of feedforward and feedback perceptrons, Neural Networks 10 (1997) 1435–1443. [26] NeuralWare Handbook, Neural Computing, A Technology Handbook for Professional II/PLUS and NeuralWorks Explorer, Technical Publication Unit, Pittsburgh, PA, 1996. [27] W. Polzleitner, G. Schwingshakl, Real-time surface grading of profiled wooden boards, Industrial Metrology 2 (1992) 283–298. [28] D.T. Pham, R.J. Alcock, Automatic detection of defects on birch wood boards, Proceedings of the IMechE, Part E, Journal of Process Mechanical Engineering 210 (1996) 45–52.