Language Independent Statistical Grammar Checking Approach and Compare It to Existing Approaches

Hochschule Darmstadt & Reykjavík University
Departments of Computer Science

LISGrammarChecker: Language Independent Statistical Grammar Checking

Master Thesis to achieve the academic degree Master of Science (M.Sc.)

Verena Henrich and Timo Reuter
February 2009

First advisor: Prof. Dr. Bettina Harriehausen-Mühlbauer
Second advisor: Hrafn Loftsson, Ph.D., Assistant Professor

“There is something funny about me grammar checking a paper about grammar checking...”
William Scott Harvey

Abstract

People produce ever more text, and they increasingly use computers to do so. Grammatical correctness is often very important, and thus grammar checkers are applied. Most current grammar checkers are rule-based, but they often do not work as well as users expect. To counteract this problem, new approaches use statistical data instead of rules as a basis. This work introduces such a grammar checker: LISGrammarChecker, a Language Independent Statistical Grammar Checker.

This work hypothesizes that it is possible to check grammar up to a certain extent by using only statistical data. The approach should facilitate grammar checking even in those languages where rule-based grammar checking is an insufficient solution, e.g. because the language is so complex that a mapping of all grammatical features to a set of rules is not possible.

LISGrammarChecker extracts n-grams from correct sentences to build up a statistical database in a training phase. This data is used to find errors and propose error corrections. It contains bi-, tri-, quad-, and pentagrams of tokens and bi-, tri-, quad-, and pentagrams of part-of-speech tags. To detect errors, every sentence is analyzed with regard to its n-grams. These n-grams are compared to those in the database. If an n-gram is not found in the database, it is assumed to be incorrect. Every incorrect n-gram is assigned an error point that depends on the type of n-gram.

Evaluation results show that this approach works for different languages, although the accuracy of the grammar checking varies. The variation is due to differences in the morphological richness of the languages. The reliability of the statistical data is very important, i.e. it is mandatory to provide enough data of good quality to find all grammatical errors. The more tags the tagset contains, the more grammatical features can be represented. Thus the quality of the statistical data and the tagset used influence the quality of the grammar checking result. The statistical data, i.e. the n-grams of tokens, can be extended by n-grams from the Internet. Despite all improvements, reliably finding all grammatical errors remains difficult. We counteract this problem by combining the statistical approach with selected language-dependent rules.

Contents

I. Introduction
  1. Introduction
    1.1. Motivation
    1.2. Goal and Definition
    1.3. Structure of this Document
  2. Fundamentals
    2.1. Natural Languages and Grammar Checking
      2.1.1. Definition: The Grammar of a Natural Language
      2.1.2. Tokenization
      2.1.3. Grammar Checking
      2.1.4. Types of Grammatical Errors
      2.1.5. Definition: n-grams
      2.1.6. Multiword Expressions
      2.1.7. Sphere of Words
      2.1.8. Language Specialities
    2.2. Corpora: Collections of Text
      2.2.1. Definition: Corpus
      2.2.2. Sample Corpora
    2.3. Part-of-Speech Tagging
      2.3.1. Tagset
      2.3.2. Types of PoS Taggers
      2.3.3. Combined Tagging
  3. Related Work
    3.1. Rule-based Approaches
      3.1.1. Microsoft Word 97 Grammar Checker
      3.1.2. LanguageTool for OpenOffice
    3.2. Statistical Approaches
      3.2.1. Differential Grammar Checker
      3.2.2. n-gram based approach
    3.3. Our Approach: LISGrammarChecker

II. Statistical Grammar Checking
  4. Requirements Analysis
    4.1. Basic Concept and Idea
      4.1.1. n-gram Checking
      4.1.2. Word Class Agreements
      4.1.3. Language Independence
    4.2. Requirements for Grammar Checking with Statistics
    4.3. Programming Language
    4.4. Data Processing with POSIX Shells
    4.5. Tokenization
    4.6. Part-of-Speech Tagging
      4.6.1. Combination of PoS Taggers
      4.6.2. Issues with PoS Tagging
    4.7. Statistical Data Sources
    4.8. Data Storage
  5. Design
    5.1. Interaction of the Components
    5.2. User Interface: Input and Output
    5.3. Training Mode
      5.3.1. Input in Training Mode
      5.3.2. Data Gathering
    5.4. Grammar Checking Mode
      5.4.1. Input in Checking Mode
      5.4.2. Grammar Checking Methods
      5.4.3. Error Counting
      5.4.4. Correction Proposal
      5.4.5. Grammar Checking Output
    5.5. Tagging
    5.6. Data
  6. Implementation
    6.1. User Interaction
    6.2. Tokenization
    6.3. Tagging
    6.4. External Program Calls
    6.5. Training Mode
    6.6. Checking Mode
      6.6.1. Checking Methods
      6.6.2. Internet Functionality
      6.6.3. Correction Proposal
      6.6.4. Grammar Checking Output
    6.7. Database
      6.7.1. Database Structure/Model
      6.7.2. Communication with the Database

III. Evaluation
  7. Test Cases
    7.1. Criteria for Testing
      7.1.1. Statistical Training Data
      7.1.2. Input Data for Checking
      7.1.3. Auxiliary Tools
      7.1.4. PoS Tagger and Tagsets
    7.2. Operate Test Cases
      7.2.1. Case 1: Self-made Error Corpus (English), Penn Treebank Tagset
      7.2.2. Case 2: Same as Case 1, Refined Statistical Data
      7.2.3. Case 3: Self-made Error Corpus (English), Brown Tagset
      7.2.4. Case 4: Self-made Error Corpus (German)
      7.2.5. Case 5: Several Errors in Sentence (English)
    7.3. Operate Test Cases with Upgraded Program
      7.3.1. Case 6: Self-made Error Corpus (English), Brown Tagset
      7.3.2. Case 7: Self-made Error Corpus with Simple Sentences (English)
    7.4. Program Execution Speed
      7.4.1. Training Mode
      7.4.2. Checking Mode
  8. Evaluation
    8.1. Program Evaluation
      8.1.1. Correct Statistical Data
      8.1.2. Large Amount of Statistical Data
      8.1.3. Program Execution Speed
      8.1.4. Language Independence
      8.1.5. Internet Functionality
      8.1.6. Encoding
      8.1.7. Tokenization
    8.2. Error Classes
    8.3. Evaluation of Test Cases 1-5
    8.4. Program Extensions
      8.4.1. Possibility to Use More Databases at Once
      8.4.2. More Hybrid n-grams
      8.4.3. Integration of Rules
      8.4.4. New Program Logic: Combination of Statistics with Rules
    8.5. Evaluation of Upgraded Program

IV. Concluding Remarks
  9. Conclusion
  10. Future work
    10.1. More Statistical Data
    10.2. Encoding
    10.3. Split Long Sentences
    10.4. Statistical Information About Words and Sentences
    10.5. Use n-gram Amounts
    10.6. Include more Rules
    10.7. Tagset that Conforms to Requirements
    10.8. Graphical User Interface
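
As an illustration of the approach summarized in the abstract, the training and checking phases might be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: it covers token n-grams only (no part-of-speech tags, Internet lookup, or correction proposals), it tokenizes by whitespace, and the names `train`, `check`, and `ERROR_POINTS`, as well as the weight values, are hypothetical. The abstract says only that the error point depends on the type of n-gram.

```python
from collections import Counter

# Hypothetical weights: the abstract does not state the actual values.
# Here an unknown short n-gram counts more than an unknown long one.
ERROR_POINTS = {2: 4, 3: 3, 4: 2, 5: 1}
SIZES = (2, 3, 4, 5)  # bi-, tri-, quad-, and pentagrams


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(sentences):
    """Training phase: count every n-gram in sentences assumed correct."""
    database = Counter()
    for sentence in sentences:
        tokens = sentence.split()  # naive whitespace tokenization (assumption)
        for n in SIZES:
            database.update(ngrams(tokens, n))
    return database


def check(sentence, database):
    """Checking phase: n-grams missing from the database are assumed
    incorrect and each adds its error points to the sentence score."""
    tokens = sentence.split()
    unknown = [g for n in SIZES for g in ngrams(tokens, n) if g not in database]
    score = sum(ERROR_POINTS[len(g)] for g in unknown)
    return score, unknown


if __name__ == "__main__":
    db = train(["the cat sits on the mat", "the dog sits on the rug"])
    for text in ("the cat sits on the mat", "the cat sit on the mat"):
        score, unknown = check(text, db)
        print(text, "->", score, "error points,", len(unknown), "unknown n-grams")
```

With so little training data, many correct sentences would also be flagged, which reflects the abstract's point that enough statistical data of good quality is mandatory.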