Framework for Automatic Identification of Paper Watermarks with Chain Codes” Successfully Achieved Five Important Goals
Total Page:16
File Type:pdf, Size:1020Kb
FRAMEWORK FOR AUTOMATIC IDENTIFICATION OF PAPER WATERMARKS WITH CHAIN CODES A DISSERTATION in Electrical and Computer Engineering and Telecommunications and Computer Networking Presented to the Faculty of the University of Missouri - Kansas City in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY by PLAMEN DOYNOV Dipl. Eng., Technical University of Sofia, Sofia, Bulgaria M.E.E., Catholic University of America, Washington, D.C., USA Kansas City, Missouri 2017 © 2017 PLAMEN DOYNOV ALL RIGHTS RESERVED FRAMEWORK FOR AUTOMATIC IDENTIFICATION OF PAPER WATERMARKS WITH CHAIN CODES Plamen Doynov, Candidate for the Doctor of Philosophy University of Missouri-Kansas City, 2017 ABSTRACT In this dissertation, I present a new framework for automated description, archiving, and identification of paper watermarks found in historical documents and manuscripts. The early manufacturers of paper have introduced the embedding of identifying marks and patterns as a sign of a distinct origin and perhaps as a signature of quality. Thousands of watermarks have been studied, classified, and archived. Most of the classification categories are based on image similarity and are searchable based on a set of defined contextual descriptors. The novel method presented here is for automatic classification, identification (matching) and retrieval of watermark images based on chain code descriptors (CC). The approach for generation of unique CC includes a novel image preprocessing method to provide a solution for rotation and scale invariant representation of watermarks. The unique codes are truly reversible, providing high ratio lossless compression, fast searching, and image matching. The development of a novel distance measure for CC comparison is also presented. Examples for the complete process are given using the recently acquired watermarks digitized with hyper-spectral imaging of Summa Theologica, the work of Antonino Pierozzi (1389 – 1459). The performance of the algorithm on large datasets is demonstrated using watermarks datasets from well-known library catalogue collections. iii APPROVAL PAGE The faculty listed below, appointed by the Dean of the School of Graduate Studies, have examined a thesis titled “Framework for Automatic Identification of Paper Watermarks with Chain Codes," presented by Plamen Doynov, candidate for the Doctor of Philosophy degree, and certify that in their opinion it is worthy of acceptance. Supervisory Committee Reza Derakhshani, Ph.D., Committee Chair Department of Computer Science and Electrical Engineering Ghulam M. Chaudhry, Ph.D. Department of Computer Science and Electrical Engineering Appie Van de Liefvoort, Ph.D. Department of Computer Science and Electrical Engineering Deendayal Dinakarpandian, Ph.D. Department of Computer Science and Electrical Engineering Jeffrey Rydberg-Cox, Ph.D. Department of English, Director of Classical and Ancient Studies Program, Affiliated Faculty, Department of Computer Science and Electrical Engineering iv CONTENTS ABSTRACT ............................................................................................................................ iii LIST OF ILLUSTRATIONS .................................................................................................. ix LIST OF TABLES ................................................................................................................ xiv ACKNOWLEDGMENTS ..................................................................................................... xv Chapter 1. INTRODUCTION ....................................................................................................... 1 1.1 Research motivation.............................................................................................. 2 1.2 Thesis objective .................................................................................................... 3 1.3 Thesis contributions .............................................................................................. 5 1.4 Thesis overview .................................................................................................... 6 2. PAPER AND PAPER WATERMARKS .................................................................... 9 2.1 The history of paper .............................................................................................. 9 2.2 Origin and evolution of paper watermarks ......................................................... 13 2.3 The Study of watermarks—literature review ...................................................... 19 2.3.1 Imaging and reproduction techniques for watermarks ............................. 25 2.3.2 Description, classification and storage ..................................................... 29 2.3.3 Current methods for search and retrieval ................................................. 31 2.4 Hyperspectral imaging of historical documents .......................................... 52 2.4.1 Hyperspectrial imaging system for historical documents ........................ 60 2.4.2 Processing of the hyperspectral cube ....................................................... 75 2.4.3 Watermark image extraction .................................................................... 87 2.5 Watermark image preprocessing ......................................................................... 88 v 2.6 Watermark image binarization and skeletonization ............................................ 92 2.7 An alternative non-contact method for watermarks reproduction ...................... 97 2.8 Conclusions ....................................................................................................... 100 3. AUTOMATIC IDENTIFICATION OF PAPER WATERMARKS ....................... 102 3.1 Introduction ........................................................................................................ 102 3.2 Shape descriptors .............................................................................................. 102 3.3 Chain codes as image descriptors ..................................................................... 111 3.4 Chain code generation for watermarks (from image to chain code) ................. 115 3.5 Reverse transform (from chain code to the original image) ............................. 118 3.6 Chain code as a lossless compression ............................................................... 119 3.7 Limitations of chain codes ................................................................................ 126 3.8 Conclusions ....................................................................................................... 126 4. ROTATION, SCALE AND TRANSLATION INVARIANT CHAIN CODE ....... 129 4.1 Introduction ....................................................................................................... 129 4.2 Watermark minimum boundary box (MBB) .................................................... 130 4.3 Minimum boundary box attributes.................................................................... 132 4.4 Minimum boundary box for rotation and scale invariance ............................... 133 4.5 Watermark recto and vero images .................................................................... 134 4.6 Conclusions ....................................................................................................... 137 5. COMPARISON OF RST_INVARIANT CHAIN CODES ..................................... 138 5.1 Introduction ....................................................................................................... 138 5.2 Comparing chain codes with the same length................................................... 139 5.3 Comparing chain codes with different length ................................................... 140 vi 5.4 Chain code distance (CCD) as measure of watermark similiarity .................... 142 5.5 Application of CCD for watermarks comparison ............................................. 146 5.6 Conclusions ....................................................................................................... 153 6. AUTOMATIC IDENTIFICATION OF WATERMARKS WITH CHAIN CODES ........................................................................................... 155 6.1 Introduction ....................................................................................................... 155 6.2 Watermark datasets ........................................................................................... 155 6.3 Performance evaluation .................................................................................... 156 6.3.1 Dataset sample size ............................................................................ 157 6.3.2 receiver operating characteristics ....................................................... 159 6.3.3 Experimental results using CCD ........................................................ 160 6.4 Alternative methods for watermark identifications .......................................... 164 6.4.1 Machine learning and image matching .............................................. 165 6.4.2 Convolutional Neural Networks (CNN) ............................................ 166 6.4.3 Watermarks dataset for use with CNN .............................................. 167 6.4.4 Experimental results of watermarks matching using CNN................ 175 6.4.5 Bag of visual words ........................................................................... 178 6.4.6 Experimental results using BoVW....................................................