A Language for Specifying and Comparing Table Recognition Strategies

A Language for Specifying and Comparing Table Recognition Strategies

by Richard Zanibbi

A thesis submitted to the School of Computing in conformity with the requirements for the degree of Doctor of Philosophy

Queen's University
Kingston, Ontario, Canada
December 2004

Copyright © Richard Zanibbi, 2004

Abstract

Table recognition algorithms may be described by models of table location and structure, and decisions made relative to these models. These algorithms are usually defined informally as a sequence of decisions with supporting data observations and transformations. In this investigation, we formalize these algorithms as strategies in an imitation game, where the goal of the game is to match table interpretations from a chosen procedure as closely as possible. The chosen procedure may be a person or persons producing 'ground truth,' or an algorithm.

To describe table recognition strategies we have defined the Recognition Strategy Language (RSL). RSL is a simple functional language for describing strategies as sequences of abstract decision types whose results are determined by any suitable decision method. RSL defines and maintains interpretation trees, a simple data structure for describing recognition results. For each interpretation in an interpretation tree, we annotate hypothesis histories which capture the creation, revision, and rejection of individual hypotheses, such as the logical type and structure of regions.

We present a proof-of-concept using two strategies from the literature. We demonstrate how RSL allows strategies to be specified at the level of decisions rather than algorithms, and we compare the results of our strategy implementations using new techniques. In particular, we introduce historical recall and precision metrics. Conventional recall and precision characterize hypotheses accepted after a strategy has finished. Historical recall and precision provide additional information by describing all generated hypotheses, including any rejected in the final result.

Co-Authorship

Chapter 2 was written in collaboration with my supervisors, Dr. Dorothea Blostein and Dr. James R. Cordy. Chapter 2 previously appeared as a paper in the International Journal of Document Analysis and Recognition (Volume 7, Number 1, September 2004).

Acknowledgements

This dissertation is the culmination of years of work, much of it comprising tangential research projects, personal interests, and plain wastes of time. I wish to acknowledge here those who helped ensure that this document was actually completed, rather than forgotten in a dark corner.

First I wish to thank my supervisors, Dr. Dorothea Blostein and Dr. James R. Cordy, for putting a lot of faith in me and allowing me to make my own mistakes (of which there were many). The central ideas and approach in this thesis were significantly refined through their comments and oversight of my work.

I have been lucky enough to get help from friends when I needed it. In no particular order, these include: Jeremy Bradbury, Chris McAloney, Burton Ma, Harinder Aujla, James Wasserman, Michael Lantz, Dan Ghica, Dean Jin, Edward Lank, Ken Whelan, Amber Simpson, Jiro Inoue, and Michelle Crane. Each of these people endured rambling explanations and complaints about my work, and offered helpful advice and information anyway. Thanks so much, guys.

Dr. John Handley and Dr. George Nagy have provided me with helpful insights into pattern recognition and the research profession. In the summer of 2003, Dr. Nagy reminded me that if I didn't finish my thesis in a couple of years' time, my daughter would start school before I stopped. Avoiding that situation has been a great motivator.

I want to thank the members of Queen's School of Computing support staff that I've dealt with over the last six years: Debby Robertson, Irene Lafleche, Sandra Pryal, Nancy Barker, Tom Bradshaw, Gary Powley, Richard Linley, and Dave Dove. They made my time in the School of Computing enjoyable and productive, and saved me on a few occasions.

I am dedicating this dissertation to Katey, my wife, and Alexandra, our daughter. Katey has supported me through this whole process, and having her to bounce research ideas off of has been a great help. Alix is a healthy dose of reality, fun, and happiness. I love them both very much. In closing, I'm very happy to report that I managed to finish school before Alix started.

Statement of Originality

This thesis and the research to which it refers are the product of my own work. Any ideas or quotations from the work of other people, published or otherwise, are fully acknowledged in accordance with the standard referencing practices in Computer Science. I gratefully acknowledge the helpful guidance and support of my supervisors, Dr. Dorothea Blostein and Dr. James R. Cordy. The second chapter of this thesis was written in collaboration with Dr. Blostein and Dr. Cordy, and appeared previously in the International Journal of Document Analysis and Recognition (Volume 7, Number 1, September 2004).

Contents

Abstract
Co-Authorship
Acknowledgements
Statement of Originality
Contents
List of Figures
1 Introduction
  1.1 Table Structure Recognition
  1.2 Problem Statement
  1.3 Overview of Chapters
2 A Survey of Table Recognition
  2.1 Introduction
  2.2 Table Models
    2.2.1 Physical and Logical Structure of Tables
    2.2.2 A Table Model for Generation: Wang's Model
    2.2.3 Table Models for Recognition
  2.3 Observations
  2.4 Transformations
  2.5 Inferences
    2.5.1 Classifiers, Segmenters, and Parsers
    2.5.2 Inference Sequencing
  2.6 Performance Evaluation
    2.6.1 Recall and Precision
    2.6.2 Edit Distance and Graph Probing
    2.6.3 Experimental Design
  2.7 Conclusion
3 A Functional Language for Recognition Strategies
  3.1 Motivation
  3.2 Table Recognition as Imitation Games
  3.3 Interpretation Trees
  3.4 RSL Strategies
    3.4.1 A Simple Example
  3.5 Table Models in RSL
    3.5.1 Physical Structure: Region Geometry
    3.5.2 Logical Structure: Region Types and Relations
    3.5.3 Hypothesis History
  3.6 Inferences in RSL
    3.6.1 External Functions
    3.6.2 Confidences
  3.7 Observations in RSL
    3.7.1 Hypothesis Observations
    3.7.2 Parameter Observations
  3.8 Transformations (Book-Keeping) in RSL
  3.9 RSL Operations in Detail
    3.9.1 Terminology
    3.9.2 Region Creation and Classification
    3.9.3 Region Segmentation
    3.9.4 Relations on Regions
    3.9.5 Rejecting Region Type and Relation Hypotheses
    3.9.6 Accepting and Rejecting Interpretations
    3.9.7 Conditional Application of Strategies to Interpretations
    3.9.8 Parameter Adaptation
    3.9.9 File and Terminal Output
  3.10 Representing Multiple Inference Results
  3.11 Summary
4 Implementation
  4.1 The RSL Compiler
  4.2 The TXL Programming Language
  4.3 Translating RSL Programs to TXL
    4.3.1 RSL Header to TXL Main Function Translation
    4.3.2 RSL Strategy to TXL Function Translation
    4.3.3 RSL External Function Call to TXL Function Translation
  4.4 Implementing External Functions in TXL
  4.5 Running Translated Strategies: the RSL Library
  4.6 RSL Data
    4.6.1 Interpretation Trees and Log Files
    4.6.2 Adapted Parameters
    4.6.3 Interpretation Graphs
  4.7 Recovering Previous Interpretations
  4.8 Metrics Based on Hypothesis Histories
  4.9 Visualizing and Creating Interpretations
    4.9.1 Visualization
    4.9.2 Manually Creating Interpretations
  4.10 Visualizing Table Models
  4.11 Summary
5 Specifying and Comparing Strategies
  5.1 Handley's Structure Recognition Algorithm
  5.2 Hu et al.'s Structure Recognition Algorithm
  5.3 Summary Graphs for RSL Strategies
  5.4 Ground Truth and Imitation Games
  5.5 Illustrative Example: A Cell Imitation Game
    5.5.1 Game Definition
    5.5.2 Game Outcome
    5.5.3 Analysis Using Hypothesis Histories
  5.6 Summary
6 Conclusion
  6.1 Contributions
  6.2 Directions for Future Work
  6.3 Summary
Bibliography
A RSL Operation Summary
B RSL Syntax
  B.1 TXL Grammar Syntax
  B.2 RSL Grammar
C Handley's Structure Recognition Algorithm in RSL
D Hu et al.'s Structure Recognition Algorithm in RSL
E Table Cell Interpretations

List of Figures

1.1 Table from UW-I database, image h01c
1.2 Table from UW-I database, image a038
1.3 Table from UW-I database, image v00c
2.1 The Table Recognition Process
2.2 Table Anatomy
2.3 A Table Describing Document Recognition Journals
2.4 Grid Describing the Location of Cells for the Table in Figure 2.3
2.5 Partial HTML Source Code for the Table in Figure 2.3
2.6 A Table Describing Available Document Recognition Software
2.7 Types of Structures in Table Models for Recognition
2.8 Observations in the Table Recognition Literature
2.9 Transformations in the Table Recognition Literature
2.10 Classifiers: Inferences Used to Assign Structure and Relation Types
2.11 Segmenters: Inferences Used to Locate Structures
2.12 Parsers: Inferences Used to Relate Structures
3.1 Table Recognition as Imitation Games
3.2 Interpretation Trees
3.3 Simple RSL Strategy for Table Structure Recognition
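
The distinction the abstract draws between conventional and historical recall and precision can be made concrete with a small sketch. The Python fragment below is illustrative only and is not taken from the thesis or from RSL: the representation of a hypothesis history as (hypothesis, accepted) pairs and all names in it are assumptions made for this example; the thesis defines its own metrics over hypothesis histories in Section 4.8.

    # Illustrative sketch (not from the thesis): conventional metrics score only the
    # hypotheses accepted in the final result, while historical metrics score every
    # hypothesis generated during recognition, including ones later rejected.

    def recall_precision(proposed, truth):
        """Return (recall, precision) for a set of proposed hypotheses."""
        matched = proposed & truth
        recall = len(matched) / len(truth) if truth else 1.0
        precision = len(matched) / len(proposed) if proposed else 1.0
        return recall, precision

    def compare(history, truth):
        """history: list of (hypothesis, accepted_in_final_result) pairs."""
        accepted = {h for h, kept in history if kept}
        generated = {h for h, _ in history}
        return {
            "conventional": recall_precision(accepted, truth),
            "historical": recall_precision(generated, truth),
        }

    if __name__ == "__main__":
        # Toy input: ground truth contains cells c1-c4; the strategy generated
        # c1, c2, c3 and c5, but rejected c2 and c5 before finishing.
        truth = {"c1", "c2", "c3", "c4"}
        history = [("c1", True), ("c2", False), ("c3", True), ("c5", False)]
        print(compare(history, truth))

On this toy input, conventional recall is 0.5 and conventional precision is 1.0, while historical recall rises to 0.75 because the correct cell c2 was generated and later rejected, and historical precision falls to 0.75 because the spurious hypothesis c5 was generated at some point. This is the kind of additional information about a strategy's behaviour that hypothesis histories are intended to expose.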