The Neocognitron as a System for Handwritten Character Recognition: Limitations and Improvements
David R. Lovell

A thesis submitted for the degree of Doctor of Philosophy
Department of Electrical and Computer Engineering
University of Queensland
March 14, 1994

This document was prepared using TeX and LaTeX. Figures were prepared using tgif, which is copyright © 1992 William Chia-Wei Cheng (william@cs.UCLA.edu). Graphs were produced with gnuplot, which is copyright © 1991 Thomas Williams and Colin Kelley. TeX is a trademark of the American Mathematical Society.

Statement of Originality

The work presented in this thesis is, to the best of my knowledge and belief, original, except as acknowledged in the text, and the material has not been submitted, either in whole or in part, for a degree at this or any other university.

David R. Lovell, March 14, 1994

Abstract

This thesis is about the neocognitron, a neural network that was proposed by Fukushima in 1979. Inspired by Hubel and Wiesel's serial model of processing in the visual cortex, the neocognitron was initially intended as a self-organizing model of vision; however, we are concerned with the supervised version of the network, put forward by Fukushima in 1983. Through "training with a teacher", Fukushima hoped to obtain a character recognition system that was tolerant of shifts and deformations in input images. Until now, though, it has not been clear whether Fukushima's approach has resulted in a network that can rival the performance of other recognition systems.

In the first three chapters of this thesis, the biological basis, operational principles and mathematical implementation of the supervised neocognitron are presented in detail. At the end of this thorough introduction, we consider a number of important issues that have not previously been addressed.¹ How should S-cell selectivity and other parameters be chosen so as to maximize the network's performance?
How sensitive is the network's classification ability to the supervisor's choice of training patterns? Can the neocognitron achieve state-of-the-art recognition rates and, if not, what is preventing it from doing so?

¹ At least not with any proven degree of success.

Chapter 4 looks at the Optimal Closed-Form Training (OCFT) algorithm, a method for adjusting S-cell selectivity, suggested by Hildebrandt in 1991. Experiments reveal flaws in the assumptions behind OCFT and provide motivation for the development and testing (in Chapter 5) of three new algorithms for selectivity adjustment: SOFT, SLOG and SHOP. Of these methods, SHOP is shown to be the most effective, determining appropriate selectivity values through the use of a validation set of handwritten characters. SHOP serves as a method for probing the behaviour of the neocognitron and is used to investigate the effect of cell masks, skeletonization of input data and choice of training patterns on the network's performance.

Even though SHOP is the best selectivity adjustment algorithm to be described to date, the system's peak correct recognition rate (for isolated ZIP code digits from the CEDAR database) is around 75% (with 75% reliability) after SHOP training. It is clear that the neocognitron, as originally described by Fukushima, is unable to match the performance of today's most accurate digit recognition systems, which typically achieve 90% correct recognition with near 100% reliability.

After observing the neocognitron's failure to exploit the distinguishing features of different kinds of digits in its classification of images, Chapter 6 proposes modifications to enhance the network's ability in this regard. Using this new architecture, a correct classification rate of 84.62% (with 96.36% reliability) was obtained on CEDAR ZIP codes, a substantial improvement but still a level of performance that is somewhat less than state-of-the-art recognition rates.
Chapter 6 concludes with a critical review of the hierarchical feature extraction paradigm.

The final chapter summarizes the material presented in this thesis and draws the significant findings together in a series of conclusions. In addition to the investigation of the neocognitron, this thesis also contains a derivation of statistical bounds on the errors that arise in multilayer feedforward networks as a result of weight perturbation (Appendix E).

List of Publications

[1] D. R. Lovell, A. C. Tsoi and T. Downs, "A Comparison of Transfer Functions for Feature Extracting Layers in the Neocognitron," in Proceedings of the Third Australian Conference on Neural Networks, P. Leong and M. Jabri, Eds. Sydney: Sydney University Department of Electrical Engineering, pp. 202-205, February 1992.

[2] D. R. Lovell, P. L. Bartlett and T. Downs, "Error and Variance Bounds on Sigmoidal Neurons with Weight and Input Errors," Electronics Letters, 28, no. 8, pp. 760-762, April 1992.

[3] D. R. Lovell, D. Simon and A. C. Tsoi, "Improving the performance of the neocognitron," in Proceedings of the Fourth Australian Conference on Neural Networks, P. Leong and M. Jabri, Eds. Sydney: Sydney University Department of Electrical Engineering, pp. 22-25, February 1993.

[4] D. R. Lovell and P. L. Bartlett, "Error and variance bounds in multilayer neural networks," in Proceedings of the Fourth Australian Conference on Neural Networks, P. Leong and M. Jabri, Eds. Sydney: Sydney University Department of Electrical Engineering, pp. 161-164, February 1993.

[5] D. R. Lovell, A. C. Tsoi and T. Downs, "Comments on 'Optimal Training of Thresholded Linear Correlation Classifiers'," IEEE Transactions on Neural Networks, 4, no. 2, pp. 367-368, March 1993.

[6] D. R. Lovell and A. C. Tsoi, "Determining selectivities in the neocognitron," in Proceedings of the World Congress on Neural Networks, vol. 3. New Jersey: International Neural Network Society, pp. 695-699, July 1993.
[7] D. R. Lovell and A. C. Tsoi, "Is the neocognitron capable of state-of-the-art digit recognition?," in Proceedings of the Fifth Australian Conference on Neural Networks, A. C. Tsoi and T. Downs, Eds. Brisbane: University of Queensland Department of Electrical and Computer Engineering, pp. 94-97, February 1994.

Acknowledgments

Acknowledgments are curious creatures; composed almost last, yet printed almost first, they must convey the author's immense gratitude with sincerity, clarity and without omission.¹

¹ The sincerity is no problem; I'll do my best with the other two...

I would like to thank Professor Tom Downs, my principal supervisor, for his guidance, wisdom and meticulous proofreading of my work over the past years. To Professor Ah Chung Tsoi, my associate supervisor, I owe a debt of gratitude for suggesting the topic investigated in this thesis and providing enthusiastic assistance throughout my candidature.

I am also pleased to acknowledge the assistance that I have received from Professor Kunihiko Fukushima, inventor of the neocognitron. Professor Thomas Hildebrandt and Dr. Nicolas Darbel are two researchers who have worked closely with the neocognitron and have supplied me with valuable comments and useful information. I very much appreciate the help they have given me.

I am thankful to have been on the receiving end of the ideas and advice of Dr. Cyril Latimer and the patient counsel of Biological Sciences' statistician, Joan Hendrikz.

Almost any work that is associated with neuroscience owes a great deal to the animals that have been sacrificed in the name of brain research. We must not forget the cost of knowledge.

To the Australian Government and taxpayers throughout the land, thanks for the Australian Postgraduate Research Award that I held from 1990-92.
I am also very grateful for being a recipient of the Maude Walker Scholarship in 1990 and for being employed by the University of Queensland as a lecturer in the Department of Electrical and Computer Engineering from 1993 onwards.

Proofreaders are responsible for taking the scratches out of, and putting the polish on to a piece of literature. The following people have helped to remove the mitsakes from this thesus: Dr. David Liley, Hans Christian Andersen (no relation), Shaun Hayes, Mum and Dad, and Rachel Tyson.

This next list of friends have my thanks for providing countless bits of useful information and words of encouragement: Dr. Peter Bartlett (I'll always look up to you, Peter), Dr. Nick Redding, Steven Young, Paul Murtagh, Dr. Andrew Back, Dr. Mark Schulz, Dr. Janet Wiles and Dr. Ray Lister.

Without the help of the Department's office staff and Gerri Weier, Len Allen, Greg Angus, Pat White, Len Payne and Clary Harridge, I wouldn't have stood a chance of getting this thesis done. The little things in life (a photocopier, a telephone, a computer, a postgraduate scholarship and work as a tutor/lecturer) make a big difference and I really appreciate the assistance that people have given me in this regard.

Finally, I would like to thank my friends (particularly those who have shared houses with me) and family. To Fiona, Nana and Grandpa, and especially Mum and Dad I am thankful for your patience, love and encouragement for the past twenty-five years. And to Rachel, who understands best of all, thank you for your love and support; I could not have done this without you.

This thesis is dedicated to my Grandpa, Roger Albert Lovell.

Contents

Statement of Originality
Abstract
List of Publications
Acknowledgments
Contents
List of Figures
List of Tables
List of Experiments
List of Algorithms