The Ispell and Aspell Command Line Spellcheckers SPELLING LESSON

Total Page:16

File Type:pdf, Size:1020Kb

The Ispell and Aspell Command Line Spellcheckers SPELLING LESSON Command Line: Ispell and Aspell LINUXUSER The Ispell and Aspell command line spellcheckers SPELLING LESSON www.sxc.hu Nobody is safe from typos and jumbled up words. Spellcheckers like Ispell and Aspell keep the letters in the right places. BY HEIKE JURZIK pellcheckers in office packages strate how to call both Ispell and Aspell coded, define a character set using the -T and mail programs help find while working with the Vim and option. Stypos and suggest alternative (X)Emacs editors. There is no need to specify the diction- spellings. Ispell and Aspell proofread ary and character set options each time your files at the command line, and, al- Simply Ispell you run the program; instead you can though they can’t replace a good diction- Ispell runs interactively at the command add appropriate entries to the ~/.bashrc ary, they are extremely reliable helpers. line. If the spellchecker finds an error, it Bash configuration file in your home Ispell is a mature application that has suggests an alternative spelling. If you directory: been on various Unix derivatives for decide not to accept the suggestion, you many years now. The GNU project intro- can add the word to your personal dic- export U duced Aspell as a potential successor. tionary, or just ignore the term for the DICTIONARY=dictionary name Aspell comes with dictionaries for vari- rest of the document. export U ous languages and can handle UTF-8 (in The simple command syntax is CHARSET=character set contrast to Ispell). This article shows you how to proof- ispell filename Then reparse the configuration file using read at the command line. I’ll describe the following command: some tips and tricks, and I’ll demon- You may need to run Ispell with the -d (for “dictionary”) option, depending on source ~/.bashrc Command Line the dictionary you need. A quick glance at the /usr/lib/ispell directory (Figure 1) Interaction Required Although GUIs such as KDE or GNOME are useful for various tasks, if you intend tells you which dictionaries are available Ispell uses the top line to display un- to get the most out of your Linux ma- on your system. You should see a num- known words. If the spellchecker finds chine, you will need to revert to the ber of dictionaries with the .aff suffix – similar words in the dictionary, it gives good old command line from time to Ispell uses these dictionaries to manage you an enumerated list of the alternative time. Apart from that, you will probably hyphenation and accented characters in words (Figure 2). To accept one of these be confronted with various scenarios the various languages. suggestions, simply press the key for the where some working knowledge will be Some Linux distributions use different number that appears with the word in extremely useful in finding your way names for the dictionaries. To make sure the enumerated list to tell Ispell to re- through the command line jungle. that special characters are correctly en- place the word. WWW.LINUX - MAGAZINE.COM ISSUE 63 FEBRUARY 2006 85 LINUXUSER Command Line: Ispell and Aspell Figure 1: The Ispell on our lab machine understands British and US English. Figure 2: Ispell offers similar words as alternatives. If you do not want to replace the word actually specifying the option for creat- .html or .htm suffix as an HTML file. In that Ispell thinks is wrong, you can ei- ing a backup copy. This is sometimes the this case, the spellchecker ignores any ther press the space key to ignore the default behavior as specified by the HTML tags it discovers during the spell- word once, press [A] to ignore the word package maintainer. If prefer not to cre- checking session. The only exception is for the rest of the Ispell session, or press ate backup copies, you can disable this the ALT attribute, which you can use to [I] to store the word permanently. In the feature by setting the -x flag. define alternative text for an embedded latter case, Ispell adds the term to your image. personal dictionary. Other Formats If the file suffix is missing or in capi- Your personal word lists are stored as Ispell has the ability to spellcheck the tals (.HTM), you can tell Ispell that the hidden files in your home directory. The HTML and TeX/ LaTeX formats in addi- file is HTML when launching the pro- filename is made up of .ispell_ and the tion to text files. The spellchecker auto- gram: name of the dictionary you used. For ex- matically identifies any file with the ample, if you were working with the ispell -H file.HTM ngerman dictionary, your personal word Spellcheckers in list would be .ispell_ngerman. Vim and Emacs The same thing applies to XML and If Ispell discovers an error but does Ispell and Aspell run in the background SGML files: pass the -H option to the not give you a (reasonable) alternative, in many KDE and Gnome applications, program when launching Ispell to tell you can press [R] (for “replace”) and and will check your documents at a click. the spellchecker to ignore tags when type your changes at the prompt. Ispell But it is quite easy to talk Vim and spellchecking. (Some distributions, such replaces the word throughout the text. (X)Emacs into cooperating. as Debian, use -h (with a lowercase “h”) A single line in the Vim configuration file for this option, whereas others, such as The End of the Story (~/.vimrc) gives you a macro. To map a Suse, use a capital -H.) Ispell quits after spellchecking, but you function key [F10] to the spellchecker, Ispell identifies TeX-/ LaTeX files by can alternatively press [Q] to quit the add the following line to the file for reference to the .tex file suffix and does Ispell: spellchecker and discard any changes. not spellcheck formatting instructions. Ispell will prompt you to determine if map <F10> :w!<CR>:U Ispell uses backslash, which normally you really do want to discard the !ispell %<CR>:e! %<CR> introduces control sequences in the lay- changes. You can then press either [Y] to If you prefer Aspell, use the following out system, to identify formatting, how- quit the program or [N] to carry on spell- entry instead: ever, text in curly brackets is spell- checking. The alternative is to press [X] checked. The spellchecker identifies map <F10> :w!<CR>:U to quit Ispell, but store any changes you have made. !aspell -c %<CR>:e! %<CR> Heike Jurzik studied Place any command line options you German, Computer Creating Backups need after the command. Science and English Ispell gives you a safe option if you spec- If you use Emacs or Xemacs, add the fol- at the University of Cologne, Germany. ify the -b parameter when launching the lowing line to the configuration file (~/. She discovered program. This tells the spellchecker to emacs or ~/.xemacs/custom.el): Linux in 1996 and (setq-default U create a backup for each file it checks. has been fascinated The backup copies are identifiable by a ispell-program-name "aspell") with the scope of the Linux com- .bak or ~ suffix and contain the original THEAUTHOR or: mand line ever since. In her leisure text. time you might find Heike hanging (setq-default U Many distributions have versions of out at Irish folk sessions or visiting Ispell that create backups without you ispell-program-name "ispell") Ireland. 86 ISSUE 63 FEBRUARY 2006 WWW.LINUX - MAGAZINE.COM Command Line: Ispell and Aspell LINUXUSER comments, which are escaped with the can use your package manager to install dictionaries below your home directory, percent character (%) in TeX-/ LaTeX the missing resource. Just like in Ispell, and – in a similar approach to Ispell – it files, and checks them for typos. you can use the -d option to specify a uses a combination of the word .aspell. dictionary. If you have a UTF-8 encoded and the current dictionary to identify the Aspell Alternative text, don’t forget to let Aspell know by file. Aspell is an alternative spellchecker for specifying --encoding=utf-8. Aspell also quits automatically after the command line. This program is the completing the spellcheck. You can press designated successor for Ispell, and it al- Interaction Required [X] to quit the program and store any ready offers a lot more. For example, the Aspell highlights unknown terms and changes you have made up to that point. current versions (0.60.x) support the gives you a numbered list of suggested To quit the spellchecker and discard your UTF-8 character set. The generic syntax replacements. To accept a suggestion, changes, press [B] and confirm when is as follows: press the appropriate number and press prompted. [Enter]. aspell -c filename If you need to replace a word with a A Question of Format completely different word, press [R] and Just like Ispell, Aspell supports various It is not typically necessary to explicitly type the replacement for the word you file formats. You can specify the -H (for specify a dictionary, as Aspell evaluates are replacing. In contrast to Ispell, this HTML/ SGML/ XML) or -t (for TeX/ your language settings. However, if you option in Aspell only replaces the cur- LaTeX). see an error telling you the correct dic- rent instance of the word. If you want If you would like to migrate to the tionary could not be found, you might Aspell to replace all instances of the newer spellchecker, the Aspell package like to check if you really have the nec- word in the document, you need to press includes a Perl script titled aspell-import essary dictionary installed on your sys- [Shift]+[R] instead.
Recommended publications
  • GNU Emacs GNU Emacs
    GNU Emacs GNU Emacs Reference Card Reference Card Computing and Information Technology Computing and Information Technology Replacing Text Replacing Text Replace foo with bar M-x replace-string Replace foo with bar M-x replace-string foo bar foo bar Query replacement M-x query replace Query replacement M-x query replace then replace <Space> then replace <Space> then skip to next <Backspace> then skip to next <Backspace> then replace and exit <Period> then replace and exit <Period> then replace all ! then replace all ! then go back ^ then go back ^ then exit query <Esc> then exit query <Esc> Setting a Mark Setting a Mark Set mark here C-<Space> or C-@ Set mark here C-<Space> or C-@ Exchange point and mark C-x C-x Exchange point and mark C-x C-x Mark paragraph M-h Mark paragraph M-h Mark entire file C-x h Mark entire file C-x h Sorting Text Sorting Text Normal sort M-x sort-lines Normal sort M-x sort-lines Reverse sort <Esc>-1 M-x sort lines Reverse sort <Esc>-1 M-x sort lines Inserting Text Inserting Text Scan file fn M-x view-file fn Scan file fn M-x view-file fn Write buffer to file fn M-x write-file fn Write buffer to file fn M-x write-file fn Insert file fn into buffer M-x insert-file fn Insert file fn into buffer M-x insert-file fn Write region to file fn M-x write-region fn Write region to file fn M-x write-region fn Write region to end of file fn M-x append-to-file fn Write region to end of file fn M-x append-to-file fn Write region to beginning of Write region to beginning to specified buffer M-x prepend-to-buffer specified buffer
    [Show full text]
  • Emacspeak — the Complete Audio Desktop User Manual
    Emacspeak | The Complete Audio Desktop User Manual T. V. Raman Last Updated: 19 November 2016 Copyright c 1994{2016 T. V. Raman. All Rights Reserved. Permission is granted to make and distribute verbatim copies of this manual without charge provided the copyright notice and this permission notice are preserved on all copies. Short Contents Emacspeak :::::::::::::::::::::::::::::::::::::::::::::: 1 1 Copyright ::::::::::::::::::::::::::::::::::::::::::: 2 2 Announcing Emacspeak Manual 2nd Edition As An Open Source Project ::::::::::::::::::::::::::::::::::::::::::::: 3 3 Background :::::::::::::::::::::::::::::::::::::::::: 4 4 Introduction ::::::::::::::::::::::::::::::::::::::::: 6 5 Installation Instructions :::::::::::::::::::::::::::::::: 7 6 Basic Usage. ::::::::::::::::::::::::::::::::::::::::: 9 7 The Emacspeak Audio Desktop. :::::::::::::::::::::::: 19 8 Voice Lock :::::::::::::::::::::::::::::::::::::::::: 22 9 Using Online Help With Emacspeak. :::::::::::::::::::: 24 10 Emacs Packages. ::::::::::::::::::::::::::::::::::::: 26 11 Running Terminal Based Applications. ::::::::::::::::::: 45 12 Emacspeak Commands And Options::::::::::::::::::::: 49 13 Emacspeak Keyboard Commands. :::::::::::::::::::::: 361 14 TTS Servers ::::::::::::::::::::::::::::::::::::::: 362 15 Acknowledgments.::::::::::::::::::::::::::::::::::: 366 16 Concept Index :::::::::::::::::::::::::::::::::::::: 367 17 Key Index ::::::::::::::::::::::::::::::::::::::::: 368 Table of Contents Emacspeak :::::::::::::::::::::::::::::::::::::::::: 1 1 Copyright :::::::::::::::::::::::::::::::::::::::
    [Show full text]
  • Emacspeak User's Guide
    Emacspeak User's Guide Jennifer Jobst Revision History Revision 1.3 July 24,2002 Revised by: SDS Updated the maintainer of this document to Sharon Snider, corrected links, and converted to HTML Revision 1.2 December 3, 2001 Revised by: JEJ Changed license to GFDL Revision 1.1 November 12, 2001 Revised by: JEJ Revision 1.0 DRAFT October 19, 2001 Revised by: JEJ This document helps Emacspeak users become familiar with Emacs as an audio desktop and provides tutorials on many common tasks and the Emacs applications available to perform those tasks. Emacspeak User's Guide Table of Contents 1. Legal Notice.....................................................................................................................................................1 2. Introduction.....................................................................................................................................................2 2.1. What is Emacspeak?.........................................................................................................................2 2.2. About this tutorial.............................................................................................................................2 3. Before you begin..............................................................................................................................................3 3.1. Getting started with Emacs and Emacspeak.....................................................................................3 3.2. Emacs Command Conventions.........................................................................................................3
    [Show full text]
  • Spelling Correction: from Two-Level Morphology to Open Source
    Spelling Correction: from two-level morphology to open source Iñaki Alegria, Klara Ceberio, Nerea Ezeiza, Aitor Soroa, Gregorio Hernandez Ixa group. University of the Basque Country / Eleka S.L. 649 P.K. 20080 Donostia. Basque Country. [email protected] Abstract Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to this kind of languages and there are two-level based descriptions for very different languages. After doing the morphological description for a language, it is easy to develop a spelling checker/corrector for this language. However, what happens if we want to use the speller in the "free world" (OpenOffice, Mozilla, emacs, LaTeX, ...)? Ispell and similar tools (aspell, hunspell, myspell) are the usual mechanisms for these purposes, but they do not fit the two-level model. In the absence of two-level morphology based mechanisms, an automatic conversion from two-level description to hunspell is described in this paper. previous work and reuse the morphological information. 1. Introduction Two-level morphology (Koskenniemi, 1983; Beesley & Karttunen, 2003) has been applied successfully to the 2. Free software for spelling correction morphological description of highly inflected languages. Unfortunately there are not open source tools for spelling There are two-level based descriptions for very different correction with these features: languages (English, German, Swedish, French, Spanish, • It is standardized in the most of the applications Danish, Norwegian, Finnish, Basque, Russian, Turkish, (OpenOffice, Mozilla, emacs, LaTeX, ...). Arab, Aymara, Swahili, etc.). • It is based on the two-level morphology. After doing the morphological description, it is easy to The spell family of spell checkers (ispell, aspell, myspell) develop a spelling checker/corrector for the language fits the first condition but not the second.
    [Show full text]
  • Hunspell – the Free Spelling Checker
    Hunspell – The free spelling checker About Hunspell Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex word compounding or character encoding. Hunspell interfaces: Ispell-like terminal interface using Curses library, Ispell pipe interface, OpenOffice.org UNO module. Main features of Hunspell spell checker and morphological analyzer: - Unicode support (affix rules work only with the first 65535 Unicode characters) - Morphological analysis (in custom item and arrangement style) and stemming - Max. 65535 affix classes and twofold affix stripping (for agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.) - Support complex compoundings (for example, Hungarian and German) - Support language specific features (for example, special casing of Azeri and Turkish dotted i, or German sharp s) - Handle conditional affixes, circumfixes, fogemorphemes, forbidden words, pseudoroots and homonyms. - Free software (LGPL, GPL, MPL tri-license) Usage The src/tools dictionary contains ten executables after compiling (or some of them are in the src/win_api): affixcompress: dictionary generation from large (millions of words) vocabularies analyze: example of spell checking, stemming and morphological analysis chmorph: example of automatic morphological generation and conversion example: example of spell checking and suggestion hunspell: main program for spell checking and others (see manual) hunzip: decompressor of hzip format hzip: compressor of
    [Show full text]
  • Finite State Recognizer and String Similarity Based Spelling
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by BRAC University Institutional Repository FINITE STATE RECOGNIZER AND STRING SIMILARITY BASED SPELLING CHECKER FOR BANGLA Submitted to A Thesis The Department of Computer Science and Engineering of BRAC University by Munshi Asadullah In Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in Computer Science and Engineering Fall 2007 BRAC University, Dhaka, Bangladesh 1 DECLARATION I hereby declare that this thesis is based on the results found by me. Materials of work found by other researcher are mentioned by reference. This thesis, neither in whole nor in part, has been previously submitted for any degree. Signature of Supervisor Signature of Author 2 ACKNOWLEDGEMENTS Special thanks to my supervisor Mumit Khan without whom this work would have been very difficult. Thanks to Zahurul Islam for providing all the support that was required for this work. Also special thanks to the members of CRBLP at BRAC University, who has managed to take out some time from their busy schedule to support, help and give feedback on the implementation of this work. 3 Abstract A crucial figure of merit for a spelling checker is not just whether it can detect misspelled words, but also in how it ranks the suggestions for the word. Spelling checker algorithms using edit distance methods tend to produce a large number of possibilities for misspelled words. We propose an alternative approach to checking the spelling of Bangla text that uses a finite state automaton (FSA) to probabilistically create the suggestion list for a misspelled word.
    [Show full text]
  • SDL Tridion Docs Release Notes
    SDL Tridion Docs Release Notes SDL Tridion Docs 14 July 2019 ii SDL Tridion Docs Release Notes 1 Welcome to Tridion Docs Release Notes 1 Welcome to Tridion Docs Release Notes This document contains the complete Release Notes for SDL Tridion Docs 14. Customer support To contact Technical Support, connect to the Customer Support Web Portal at https://gateway.sdl.com and log a case for your SDL product. You need an account to log a case. If you do not have an account, contact your company's SDL Support Account Administrator. Acknowledgments SDL products include open source or similar third-party software. 7zip Is a file archiver with a high compression ratio. 7-zip is delivered under the GNU LGPL License. 7zip SFX Modified Module The SFX Modified Module is a plugin for creating self-extracting archives. It is compatible with three compression methods (LZMA, Deflate, PPMd) and provides an extended list of options. Reference website http://7zsfx.info/. Akka Akka is a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event- driven applications on the JVM. Amazon Ion Java Amazon Ion Java is a Java streaming parser/serializer for Ion. It is the reference implementation of the Ion data notation for the Java Platform Standard Edition 8 and above. Amazon SQS Java Messaging Library This Amazon SQS Java Messaging Library holds the Java Message Service compatible classes, that are used for communicating with Amazon Simple Queue Service. ANTLR ANTLR is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files.
    [Show full text]
  • Spelling Correction: from Two-Level Morphology to Open Source
    Spelling Correction: from two-level morphology to open source Iñaki Alegria, Klara Ceberio, Nerea Ezeiza, Aitor Soroa, Gregorio Hernandez Ixa group. University of the Basque Country / Eleka S.L. 649 P.K. 20080 Donostia. Basque Country. [email protected] Abstract Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to this kind of languages and there are two-level based descriptions for very different languages. After doing the morphological description for a language, it is easy to develop a spelling checker/corrector for this language. However, what happens if we want to use the speller in the "free world" (OpenOffice, Mozilla, emacs, LaTeX, ...)? Ispell and similar tools (aspell, hunspell, myspell) are the usual mechanisms for these purposes, but they do not fit the two-level model. In the absence of two-level morphology based mechanisms, an automatic conversion from two-level description to hunspell is described in this paper. previous work and reuse the morphological information. 1. Introduction Two-level morphology (Koskenniemi, 1983; Beesley & Karttunen, 2003) has been applied successfully to the 2. Free software for spelling correction morphological description of highly inflected languages. Unfortunately there are not open source tools for spelling There are two-level based descriptions for very different correction with these features: languages (English, German, Swedish, French, Spanish, • It is standardized in the most of the applications Danish, Norwegian, Finnish, Basque, Russian, Turkish, (OpenOffice, Mozilla, emacs, LaTeX, ...). Arab, Aymara, Swahili, etc.). • It is based on the two-level morphology. After doing the morphological description, it is easy to The spell family of spell checkers (ispell, aspell, myspell) develop a spelling checker/corrector for the language fits the first condition but not the second.
    [Show full text]
  • Categorization of Various Essential Datasets and Methods for Textual Spelling Detection and Normalization
    10 Categorization of Various Essential Datasets and Methods for Textual Spelling Detection and Normalization Molouk Sadat Hosseini Beheshti PhD in General Linguistics; Assistant Professor; Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran; Corresponding Auther [email protected] Hadi Abdi Ghavidel Msc. Graduate in Computational Linguistics; Sharif University of Technology; Tehran, Iran [email protected] Received: 26, Jan. 2016 Accepted: 2, Aug. 2016 Abstract: One of the most primary phases of automatic text processing is spelling error detection and grapheme normalization. Storing textual Iranian Research Institute for Science and Technology documents faces several problems without passing this phase, which ISSN 2251-8223 causes a disturbance in retrieving the documents automatically. Therefore, eISSN 2251-8231 specialists in the fields of natural language processing and computational Indexed by SCOPUS, ISC, & LISTA linguistics usually make an attempt to sample various data through Vol. 32 | No. 4 | pp. 1143-1170 presenting ideal methods and algorithms in order to reach the normalized Summer 2017 data. Several researches have been conducted on English and some other languages, which have been followed by a certain amount of researches on Farsi too. Sometimes, these several researches have remained to be a pure study and sometimes they have been released as a product. This paper carries out the categorization of the different methods and essential datasets in these researches and depicts each
    [Show full text]
  • Ocrspell: an Interactive Spelling Correction System for OCR Errors in Text
    UNLV Retrospective Theses & Dissertations 1-1-1996 OCRspell: An interactive spelling correction system for OCR errors in text Eric Stofsky University of Nevada, Las Vegas Follow this and additional works at: https://digitalscholarship.unlv.edu/rtds Repository Citation Stofsky, Eric, "OCRspell: An interactive spelling correction system for OCR errors in text" (1996). UNLV Retrospective Theses & Dissertations. 3222. http://dx.doi.org/10.25669/t1t6-9psi This Thesis is protected by copyright and/or related rights. It has been brought to you by Digital Scholarship@UNLV with permission from the rights-holder(s). You are free to use this Thesis in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/ or on the work itself. This Thesis has been accepted for inclusion in UNLV Retrospective Theses & Dissertations by an authorized administrator of Digital Scholarship@UNLV. For more information, please contact [email protected]. INFORMATION TO USERS This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter &ce, while others may be from any type of computer printer. The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely afreet reproduction. In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted.
    [Show full text]
  • IRIX® 6.5.19 Update Guide
    IRIX® 6.5.19 Update Guide 1600 Amphitheatre Pkwy. Mountain View, CA 94043-1351 Telephone (650) 960-1980 FAX (650) 961-0595 February 2003 Dear Valued Customer, SGI® is pleased to present the new IRIX® 6.5.19 maintenance and feature release. Starting with IRIX® 6.5, SGI created a new software upgrade strategy, which delivers both the maintenance (6.5.19m) and feature (6.5.19f) streams to our support contract customers. This upgrade is part of a family of releases that periodically enhances IRIX 6.5. There are several benefits to this release strategy: it provides periodic fixes to IRIX, it assists in managing upgrades, and it supports all platforms. Additional information on this strategy and how it affects you is included in the updated Installation Instructions manual contained in this package. If you need assistance, please visit the Supportfolio™ online website at http://support.sgi.com/ or contact your local support provider. In conjunction with the release of IRIX® 6.5.15, SGI added to the existing life cycle management categories the Limited Support Mode that customizes services we deliver to our users. This new support mode is targeted for open source products. We now offer eight modes of service 2 for software supported by SGI: Active, Maintenance, Limited, Legacy, Courtesy, Divested, Retired, and Expired. Active Mode is our highest level of service. It applies to products that are being actively developed and maintained and are orderable through general distribution. Software fixes for all levels of problems can be expected. Maintenance Mode software is maintained and is still an important part of our product mix.
    [Show full text]
  • A Survey on Creation of Hindi-Spell Checker to Improve the Processing of OCR
    International Journal of Research in Engineering, Science and Management 517 Volume-2, Issue-3, March-2019 www.ijresm.com | ISSN (Online): 2581-5792 A Survey on Creation of Hindi-Spell Checker to Improve the Processing of OCR Shabana Pathan1, Nikhil Khuje2, Punam Kolhe3 1Assistant Professor, Department of Information Technology, SVPCET, Nagpur, India 2,3Student, Department of Information Technology, SVPCET, Nagpur, India Abstract: This project investigates the principles of Optical categorized under non-word errors which occur when the Character Recognition used in the Tesseract OCR engine and correct spelling of the word is known but the word is mistyped techniques to improve its efficiency and processing. This mainly by mistake. These errors are mostly related to the wrong key focuses on improving the Tesseract OCR efficiency for Hindi/Devanagari languages. Improving Hindi/Devanagari text press. For example, typing आपमान for अपमान. Real-word extraction will increase Tesseract's performance and in turn will errors are those error words that are acceptable words in the draw developers to contribute towards Hindi /Devanagari OCR. dictionary but not correct according to sentence. For example, The project introduces the efficiency of new Tesseract OCR मेरा घर उस और है(incorrect) for मेरा घर उस ओर है(correct) और version 4.0 beta and heads us towards its capabilities which reflects in its output. Spell checker analyzes the written text in is an acceptable word in the Hindi dictionary but it occurs as an order to identify any misspellings and gives best correct error for ओर word. Possibility of spelling mistakes in Hindi suggestions for those misspellings.
    [Show full text]