Automatic Character Recognition: A State-of-the-Art Report
Technical Note 112
PB 161613

AUTOMATIC CHARACTER RECOGNITION
A STATE-OF-THE-ART REPORT

U. S. DEPARTMENT OF COMMERCE
NATIONAL BUREAU OF STANDARDS

THE NATIONAL BUREAU OF STANDARDS

Functions and Activities

The functions of the National Bureau of Standards are set forth in the Act of Congress, March 3, 1901, as amended by Congress in Public Law 619, 1950. These include the development and maintenance of the national standards of measurement and the provision of means and methods for making measurements consistent with these standards; the determination of physical constants and properties of materials; the development of methods and instruments for testing materials, devices, and structures; advisory services to government agencies on scientific and technical problems; invention and development of devices to serve special needs of the Government; and the development of standard practices, codes, and specifications. The work includes basic and applied research, development, engineering, instrumentation, testing, evaluation, calibration services, and various consultation and information services. Research projects are also performed for other government agencies when the work relates to and supplements the basic program of the Bureau or when the Bureau's unique competence is required. The scope of activities is suggested by the listing of divisions and sections on the inside of the back cover.

Publications

The results of the Bureau's research are published either in the Bureau's own series of publications or in the journals of professional and scientific societies.
The Bureau itself publishes three periodicals available from the Government Printing Office: The Journal of Research, published in four separate sections, presents complete scientific and technical papers; the Technical News Bulletin presents summary and preliminary reports on work in progress; and Basic Radio Propagation Predictions provides data for determining the best frequencies to use for radio communications throughout the world. There are also five series of nonperiodical publications: Monographs, Applied Mathematics Series, Handbooks, Miscellaneous Publications, and Technical Notes.

A complete listing of the Bureau's publications can be found in National Bureau of Standards Circular 460, Publications of the National Bureau of Standards, 1901 to June 1947 ($1.25), the Supplement to National Bureau of Standards Circular 460, July 1947 to June 1957 ($1.50), and Miscellaneous Publication 240, July 1957 to June 1960 (Includes Titles of Papers Published in Outside Journals 1950 to 1959) ($2.25); available from the Superintendent of Documents, Government Printing Office, Washington 25, D. C.

NATIONAL BUREAU OF STANDARDS
Technical Note 112
MAY 1961

AUTOMATIC CHARACTER RECOGNITION
A STATE-OF-THE-ART REPORT

Mary Elizabeth Stevens

NBS Technical Notes are designed to supplement the Bureau's regular publications program. They provide a means for making available scientific data that are of transient or limited interest. Technical Notes may be listed or referred to in the open literature. They are for sale by the Office of Technical Services, U. S. Department of Commerce, Washington 25, D. C.

DISTRIBUTED BY
UNITED STATES DEPARTMENT OF COMMERCE
OFFICE OF TECHNICAL SERVICES
WASHINGTON 25, D. C.

Price

ACKNOWLEDGMENTS

The survey of automatic character recognition techniques covered in this report has been conducted over a period of several years by teams of National Bureau of Standards personnel including, at various times, S. N. Alexander, D. Boyle, L. Cahn, H. D. Cook, M. A. Fishier, R. A. Kirsch, J. L. Little, L. C. Ray, M. E. Stevens, G. H. Urban, and A. Wetter. Much of the information herein was obtained through the courtesy of personnel of other organizations, especially those engaged in the design and development of reader devices, whose cooperation is gratefully acknowledged.

The bibliography, which appears under separate cover as NBS Report 7175-B, was prepared by F. Y. Neeland and M. E. Stevens, with the able assistance of F. L. Gottlieb. Editorial, clerical, drafting, and typing services were most ably provided by A. K. Smilow, A. Bucek, E. Pratt, and H. B. Grantham.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
1. INTRODUCTION
   1.1 Background
   1.2 General Observations
   1.3 Presentation
2. AREAS OF APPLICABILITY OF AUTOMATIC READING TECHNIQUES
3. CONTROLLED SOLUTIONS TO CHARACTER RECOGNITION PROBLEMS
   3.1 Preprinting of Input Material
   3.2 Quality Control of Input Material
   3.3 Stylization and Standardization
   3.4 Limitation of Vocabulary
4. THE CHARACTER RECOGNITION PROCESS
   4.1 Some Common Methods for Character Recognition
      4.1.1 Template Matching
      4.1.2 Peephole Template Matching
      4.1.3 Coordinate Description Matching
      4.1.4 Characteristic Waveform Matching
      4.1.5 Vector Crossings
      4.1.6 Criterial Feature Analysis
      4.1.7 Curve-following Recognition Techniques
   4.2 Process Steps in Character Recognition
      4.2.1 Input and Transformations of a Source Pattern
      4.2.2 Matching-Recognition-Identification
      4.2.3 Target Pattern Selection and Output
5. OPERATIONAL REQUIREMENTS IN AUTOMATIC CHARACTER RECOGNITION
   5.1 Overall Requirements
   5.2 Specific Requirements
   5.3 Special Difficulties with Typewritten Material
   5.4 Difficulties with Other Types of Special Material
   5.5 Criteria for Performance Measurement
6. DESIGN CHARACTERISTICS OF SELECTED SYSTEMS
   6.1 Scanning and Transformations of Source Patterns
      6.1.1 Special Transformations in Source-to-Input-Pattern Processing
      6.1.2 Characteristics of Selected Coordinate Description Methods
      6.1.3 Various Methods of Input Pattern Improvement
      6.1.4 Examples of Criterial Feature Extraction
   6.2 Pattern Comparison Processing
   6.3 Bases for Recognition-Identification Decisions
   6.4 General Characterization and Relative Advantages and Disadvantages of Selected Systems
7. FURTHER PROSPECTS FOR CHARACTER RECOGNITION DEVELOPMENT
   7.1 Machine Simulation
   7.2 Self-Adjusting and Self-Setting Systems
   7.3 Use of Context in the Automatic Recognition of Characters and Words
8. POTENTIALLY RELATED RESEARCH IN PATTERN RECOGNITION
   8.1 The Search for Relative Invariance
   8.2 The Search for Pattern Separability
   8.3 Automatic Classification of Patterns
   8.4 Machine Models of Perception, Recognition, and Pattern Generalization
9. CONCLUSION
APPENDIX

LIST OF TABLES

Table I. Representative Limitations of Vocabulary
Table II. Relationships Between Economic Break-Even Points and Reject Rates
Table III. Examples of Coordinate Description Resolution
Table IV. Combinations of Design Characteristics in Selected Systems
Table V. Frequencies of Upper Case Characters
Table VI. Frequencies of Upper Case Characters, Excluding Proper Names
Table VII. Frequencies of Initial Words
Table VIII. Sample Rules for Interpreting the Ambiguous Character of Figure 23

LIST OF FIGURES

Figure 1. Samples of Typed Material for Test of Automatic Character Recognition
Figure 2. Sample of Tally Roll Record Read by Machine
Figure 3. Examples of Special Fonts for Use with Magnetic Ink Character Readers
Figure 4. Examples of Externally Coded Special Fonts
Figure 5. Examples of Internally Coded, Area-Covering Fonts
Figure 6. Examples of 'Cut' Fonts
Figure 7. Examples of Special Fonts for Optical Reading Equipment
Figure 8. Construction of Characters in the 5x7 Grid
Figure 9. The Recommended 5x7 Font
Figure 10. Some of the Alternate Characters Possible in the 5x7 Font
Figure 11. A Possible 5x9 Standardized Font
Figure 12. A Generalized Recognition Process
Figure 13. Some Common Methods of Character Recognition
Figure 14. Peephole Template for Cyrillic Characters
Figure 15. Terms Used in Identifying Type Style Characteristics
Figure 16. Hand Written Numerals, Vector Crossings Technique
Figure 17. Example of Samples Obtained from Different Typewriters
Figure 18. Samples from Two Different Typewriters of the Same Make and Model
Figure 19. Example of Broken Strokes and Overlap
Figure 20. Two Lines from the Same Typewriter
Figure 21. Example of Bleeding and Noisy Characters
Figure 22. Examples of Equivalent Patterns in Z-Threshold Local Averaging
Figure 23. Ambiguous Character

AUTOMATIC CHARACTER RECOGNITION: A STATE-OF-THE-ART REPORT

M. E. Stevens

ABSTRACT

A state-of-the-art report on current progress in automatic character recognition is presented. Areas of applicability and possibilities for controlled solutions to automatic character reading problems are discussed. Some commonly used methods for character recognition, the steps involved in a generalized recognition process, and comparative characteristics of certain representative character recognition systems are considered. Prospects for further progress, including potentially related research in pattern recognition, are reported.

1. INTRODUCTION

Current progress in research looking toward large-scale information selection and retrieval systems or toward mechanized translation has been accompanied by increased attention to problems of input, file preparation, and file maintenance.
These two are important areas for the potential application of automatic information processing techniques. In both there is an