A Language for Specifying and Comparing Table Recognition Strategies
Total Page:16
File Type:pdf, Size:1020Kb
A Language for Specifying and Comparing Table Recognition Strategies by Richard Zanibbi A thesis submitted to the School of Computing in conformity with the requirements for the degree of Doctor of Philosophy Queen's University Kingston, Ontario, Canada December 2004 Copyright c Richard Zanibbi, 2004 Abstract Table recognition algorithms may be described by models of table location and struc- ture, and decisions made relative to these models. These algorithms are usually defined informally as a sequence of decisions with supporting data observations and transformations. In this investigation, we formalize these algorithms as strategies in an imitation game, where the goal of the game is to match table interpretations from a chosen procedure as closely as possible. The chosen procedure may be a person or persons producing `ground truth,' or an algorithm. To describe table recognition strategies we have defined the Recognition Strat- egy Language (RSL). RSL is a simple functional language for describing strategies as sequences of abstract decision types whose results are determined by any suit- able decision method. RSL defines and maintains interpretation trees, a simple data structure for describing recognition results. For each interpretation in an interpreta- tion tree, we annotate hypothesis histories which capture the creation, revision, and rejection of individual hypotheses, such as the logical type and structure of regions. We present a proof-of-concept using two strategies from the literature. We demon- strate how RSL allows strategies to be specified at the level of decisions rather than ii algorithms, and we compare results of our strategy implementations using new tech- niques. In particular, we introduce historical recall and precision metrics. Con- ventional recall and precision characterize hypotheses accepted after a strategy has finished. Historical recall and precision provide additional information by describing all generated hypotheses, including any rejected in the final result. iii Co-Authorship Chapter 2 was written in collaboration with my supervisors Dr. Dorothea Blostein and Dr. James R. Cordy. Chapter 2 previously appeared as a paper in the International Journal of Document Analysis and Recognition (Volume 7, Number 1, September 2004). iv Acknowledgements This dissertation is the culmination of years of work, much of it being comprised of tangential research projects, personal interests, and plain wastes of time. I wish to acknowledge here those who helped insure that this document was actually completed, rather than forgotten in a dark corner. First I wish to thank my supervisors, Dr. Dorothea Blostein and Dr. James R. Cordy, for putting a lot of faith in me and allowing me to make my own mistakes (of which there were many). The central ideas and approach in this thesis were significantly refined through their comments and oversight of my work. I have been lucky enough to get help from friends when I needed it. In no particular order, these include: Jeremy Bradbury, Chris McAloney, Burton Ma, Harinder Aujla, James Wasserman, Michael Lantz, Dan Ghica, Dean Jin, Edward Lank, Ken Whelan, Amber Simpson, Jiro Inoue, and Michelle Crane. Each of these people endured rambling explanations and complaints about my work, and offered helpful advice and information anyway. Thanks so much, guys. Dr. John Handley and Dr. George Nagy have provided me with helpful insights into pattern recognition and the research profession. In the summer of 2003, Dr. Nagy reminded me that if I didn't finish my thesis in a couple years' time, my daugh- ter would start school before I stopped. Avoiding that situation has been a great v motivator. I want to thank the members of Queen's School of Computing support staff that I've dealt with over the last six years: Debby Robertson, Irene Lafleche, Sandra Pryal, Nancy Barker, Tom Bradshaw, Gary Powley, Richard Linley, and Dave Dove. They made my time in the School of Computing enjoyable and productive, and saved me on a few occasions. I am dedicating this dissertation to Katey, my wife, and Alexandra, our daugh- ter. Katey has supported me through this whole process, and having her to bounce research ideas off of has been a great help. Alix is a healthy dose of reality, fun, and happiness. I love them both very much. In closing, I'm very happy to report that I managed to finish school before Alix started. vi Statement of Originality This thesis and the research to which it refers are the product of my own work. Any ideas or quotations from the work of other people, published or otherwise, are fully acknowledged in accordance with the standard referencing practices in Computer Sci- ence. I gratefully acknowledge the helpful guidance and support of my supervisors, Dr. Dorothea Blostein and Dr. James R. Cordy. The second chapter of this thesis was written in collaboration with Dr. Blostein and Dr. Cordy, and appeared previ- ously in the International Journal of Document Analysis and Recognition (Volume 7, Number 1, September 2004). vii Contents Abstract ii Co-Authorship iv Acknowledgements v Statement of Originality vii Contents viii List of Figures xiii 1 Introduction 1 1.1 Table Structure Recognition . 2 1.2 Problem Statement . 7 1.3 Overview of Chapters . 9 2 A Survey of Table Recognition 11 2.1 Introduction . 11 2.2 Table Models . 13 2.2.1 Physical and Logical Structure of Tables . 16 viii 2.2.2 A Table Model for Generation: Wang's Model . 18 2.2.3 Table Models for Recognition . 19 2.3 Observations . 23 2.4 Transformations . 26 2.5 Inferences . 29 2.5.1 Classifiers, Segmenters, and Parsers . 29 2.5.2 Inference Sequencing . 33 2.6 Performance Evaluation . 35 2.6.1 Recall and Precision . 37 2.6.2 Edit Distance and Graph Probing . 38 2.6.3 Experimental Design . 39 2.7 Conclusion . 39 3 A Functional Language for Recognition Strategies 41 3.1 Motivation . 41 3.2 Table Recognition as Imitation Games . 44 3.3 Interpretation Trees . 49 3.4 RSL Strategies . 51 3.4.1 A Simple Example . 55 3.5 Table Models in RSL . 57 3.5.1 Physical Structure: Region Geometry . 58 3.5.2 Logical Structure: Region Types and Relations . 59 3.5.3 Hypothesis History . 61 3.6 Inferences in RSL . 62 3.6.1 External Functions . 63 ix 3.6.2 Confidences . 66 3.7 Observations in RSL . 67 3.7.1 Hypothesis Observations . 67 3.7.2 Parameter Observations . 70 3.8 Transformations (Book-Keeping) in RSL . 71 3.9 RSL Operations in Detail . 71 3.9.1 Terminology . 72 3.9.2 Region Creation and Classification . 74 3.9.3 Region Segmentation . 77 3.9.4 Relations on Regions . 80 3.9.5 Rejecting Region Type and Relation Hypotheses . 81 3.9.6 Accepting and Rejecting Interpretations . 83 3.9.7 Conditional Application of Strategies to Interpretations . 85 3.9.8 Parameter Adaptation . 86 3.9.9 File and Terminal Output . 87 3.10 Representing Multiple Inference Results . 88 3.11 Summary . 90 4 Implementation 91 4.1 The RSL Compiler . 91 4.2 The TXL Programming Language . 93 4.3 Translating RSL Programs to TXL . 96 4.3.1 RSL Header to TXL Main Function Translation . 98 4.3.2 RSL Strategy to TXL Function Translation . 100 x 4.3.3 RSL External Function Call to TXL Function Translation . 104 4.4 Implementing External Functions in TXL . 107 4.5 Running Translated Strategies: the RSL Library . 109 4.6 RSL Data . 111 4.6.1 Interpretation Trees and Log Files . 111 4.6.2 Adapted Parameters . 115 4.6.3 Interpretation Graphs . 116 4.7 Recovering Previous Interpretations . 119 4.8 Metrics Based on Hypothesis Histories . 120 4.9 Visualizing and Creating Interpretations . 122 4.9.1 Visualization . 123 4.9.2 Manually Creating Interpretations . 124 4.10 Visualizing Table Models . 129 4.11 Summary . 133 5 Specifying and Comparing Strategies 134 5.1 Handley's Structure Recognition Algorithm . 135 5.2 Hu et al.'s Structure Recognition Algorithm . 137 5.3 Summary Graphs for RSL Strategies . 140 5.4 Ground Truth and Imitation Games . 142 5.5 Illustrative Example: A Cell Imitation Game . 147 5.5.1 Game Definition . 148 5.5.2 Game Outcome . 150 xi 5.5.3 Analysis Using Hypothesis Histories . 151 5.6 Summary . 156 6 Conclusion 159 6.1 Contributions . 160 6.2 Directions for Future Work . 162 6.3 Summary . 166 Bibliography 167 A RSL Operation Summary 184 B RSL Syntax 190 B.1 TXL Grammar Syntax . 190 B.2 RSL Grammar . 191 C Handley's Structure Recognition Algorithm in RSL 196 D Hu et al.'s Structure Recognition Algorithm in RSL 206 E Table Cell Interpretations 211 xii List of Figures 1.1 Table from UW-I database, image h01c . 4 1.2 Table from UW-I database, image a038 . 5 1.3 Table from UW-I database, image v00c . 6 2.1 The Table Recognition Process . 12 2.2 Table Anatomy . 15 2.3 A Table Describing Document Recognition Journals . 18 2.4 Grid Describing the Location of Cells for the Table in Figure 2.3 . 19 2.5 Partial HTML Source Code for the Table in Figure 2.3 . 20 2.6 A Table Describing Available Document Recognition Software. 21 2.7 Types of Structures in Table Models for Recognition . 22 2.8 Observations in the Table Recognition Literature . 25 2.9 Transformations in the Table Recognition Literature. 27 2.10 Classifiers: Inferences Used to Assign Structure and Relation Types . 30 2.11 Segmenters: Inferences Used to Locate Structures . 32 2.12 Parsers: Inferences Used to Relate Structures . 34 3.1 Table Recognition as Imitation Games . 45 3.2 Interpretation Trees . 50 xiii 3.3 Simple RSL Strategy for Table Structure Recognition .