Direct Speech Access

Total Page:16

File Type:pdf, Size:1020Kb

Direct Speech Access Emacspeak ±Direct Speech Access T. V. Raman Adobe Systems i E-mail: [email protected] Voice-mail: 1 (415) 962-3945 Abstract Keywords Emacspeak is a full-¯edged speech output inter- Direct Speech Access, Access to UNIX worksta- face to Emacs, and is being used to provide direct tions. speech access to a UNIX workstation. The kind of speech access provided by Emacspeak is qual- itatively different from what conventional screen- 1 Introduction readers provide Ðemacspeak makes applications speakÐ as opposed to speaking the screen. Emacspeak is an Emacs subsystem that allows Emacspeak is the ®rst full-¯edged speech output the user to get feedback using synthesized speech. system that will allow someone who cannot see to Traditionally, screen reading programs have al- work directly on a UNIX system (Until now, the lowed a visually impaired user to get feedback only option available to visually impaired users has using synthesized speech. Such programs have been to use a talking PC as a terminal.) Emacspeak been commercially available for well over a dec- is built on top of Emacs. Once Emacs is started, the ade. Most of them run on PC's under DOS, and user gets complete spoken feedback. there are now a few screen-readers for the Win- I currently use Emacspeak at work on my SUN dows platform. Early screen-reading programs re- SparcStation and have also used it on a DECAL- lied on the character representation of the contents PHA workstation under Digital UNIX while at Di- of the screen to produce the spoken feedback. A gital's CRL1 . I also use Emacspeak as the only signi®cant amount of research and development speech output system on my laptop running Linux. has been carried out to provide access to Graph- Emacspeak is available on the Internet: ical User Interfaces (GUI). These provide spoken access by ®rst constructing an off-screen model FTP ftp://crl.dec.com/pub/digital/emacspeak/ (OSM) Ða data structure that encapsulates the in- formation displayed visuallyÐand then using this WWW http://www.research.digital.com/CRL OSM to provide spoken feedback. The best and perhaps the most complete speech access system The work described in this paper was performed at Di- to the GUI is Screenreader/2 (ScreenReader For gital Equipment Corporation's Cambridge Research Lab. OS/2) developed by Dr. Jim Thatcher at the IBM 1 Emacspeak was developed as a spare-time project while Watson Research Center [Tha94]. This package I worked at Digital's Cambridge Research Lab (CRL). provides robust spoken access to applications un- der the OS2 Presentation Manager and Windows 3.1. Commercial packages for Microsoft Windows 3.1 provide varying levels of spoken access to the GUI. The Mercator project [ME92, WKES94, MW94, Myn94] has focused on providing spoken Ða crufty if workable solution on the desktop. access to the X-Windows system. However, this was clearly impractical in the mo- As is clear from the above, screen-readers for the bile environment ÐI would have had to carry two UNIX environment have been conspicuous in their laptops! 3 absence2 . The available options at the time were: Emacspeak is an emacs subsystem that provides complete speech access under UNIX. Emacspeak Run DOS on the laptop and be limited to a will always have the shortcoming that it will only highly restricted environment. work under Emacs. This said, there is very little Run Windows 3.1 on the laptop with a com- that cannot be done inside Emacs, so it's not a real mercial screen-access package for Windows shortcoming. Ðmost of which were still ¯aky to say the Emacspeak does have a signi®cant advantage: least. since it runs inside Emacs, a structure-sensitive, fully customizable environment, Emacspeak of- Run Linux on the laptop and get the advant- ten has more context-speci®c information about ages of a 32-bit OS with full multitasking cap- what it is speaking than its commercial counter- abilities. parts. In this sense, Emacspeak is not a ªscreen- readerº, it is a subsystem that produces speech out- Run OS2 with IBM ScreenReader. put. A traditional screen-reader speaks the con- At the time I approached the problem, Linux was tent of the screen, leaving it to the user to inter- the most attractive solution Ðexcept that there was pret the visually laid-out information. Emacspeak, no speech access system available for Linux (or on the other hand, treats speech as a ®rst-class out- any other UNIX). put modality; it speaks the information in a man- ner that is easy to comprehend when listening. The traditional screen-reading paradigm suffers from a 3 Development Of Emacspeak severe shortcoming Ðthe user has to interpret the semantics encapsulated in the visual layout in or- Initially, I decided to write a speech-enabling ex- der to arrive at the meaning of the information dis- tension to Emacs as opposed to writing a conven- played by an application. In contrast, Emacspeak tional screen-reader at the TTY level in order to Ða direct speech access systemÐ speech-enables get a working system in the most expedient man- speci®c user applications to to speak the informa- ner editting editing possible. The ®rst working pro- tion that is being conveyed to the user. By doing totype of Emacspeak took under a week to design this, Emacspeak has much more contextual know- and implement. Once this prototype was working, ledge about the information being spoken than the advantages of the speech-enabling approach does a conventional screen-reading program. outlined earlier became apparent. I then decided to turn Emacspeak into more than a prototype Ð Emacspeak turned into my full-time speech access 2 Motivation interface. Using Emacs' power Emacspeak was motivated by my desire to run and ¯exibility, it has proven straightforward to add a multitasking OS on my laptop. Before Emac- modules that customize how different applications speak, the only way I could access a UNIX work- provide spoken feedback, e.g., depending on the station was via a PC emulating a talking terminal major/minor mode of a given buffer. Note that 2 This means that most visually impaired computer users 3 October 1994 face the additional handicap of being DOS-impaired ± a far more serious problem! the basic speech functionality provided by Emac- AUCTEX Editing (LA )TEX documents. speak is suf®cient to use most Emacs packages ef- fectively; adding package-speci®c customizations DIRED Emacs' ®le manager. makes the interaction much smoother. This is be- OOBR A powerful code viewer for browsing ob- cause package-speci®c extensions can take advant- ject oriented code. The interface provided is age of the application context to provide appropri- similar to the one provided in the Small talk ate feedback. world. Emacs-19's font-locking facilities are extended to the speech output as well; for instance, a user GDB A powerful interface for debugging C and ++ can customize the system to have different types C code. of text spoken using different kinds of voices ++ (speech fonts). Currently, this feature is used to CC-Mode CandC editing extensions. provide ªvoice lockingº for many popular editing INFO Browsing online documentation. modes like c-mode, tcl-mode, perl-mode, emacs- lisp-mode etc. MAN Browsing online UNIX MAN pages. Emacspeak currently comes with speech exten- sions for several popular Emacs subsystems and Folding Using Emacs as a structured folding ed- editing modes. I would like to thank their respect- itor. ive authors for their wonderful work which makes TEMPO A package that allows for editing tem- Emacs more than a text editor ÐEmacs is a fully plates. customizable user environment. Here is a partial list of the various editing modes ISPELL A powerful interactive spell checker. and applications supported by Emacspeak: CALC A powerful symbolic algebra calculator. W3 A powerful Emacs-based WWW browser. ETERM Launching a terminal inside HTML-HELPER Publishing on the WWW. Emacs. This extension enables you to login to another system and get spoken feedback, VM A mail reader. as well as running programs that can only be GNUS A USENET news reader. run from the shell. With this extension, Emac- speak can do everything that a screen-reader BBDB The Insidious Big Brother Data Base. This written at the TTY level would achieve. For is a powerful rolodex system that can be used instance, you can run a VI 4 inside the terminal to maintain and automatically update a rolo- emulator and get complete spoken feedback dex. from Emacspeak. CALENDAR A tool for maintaining appoint- BUFFER-MENU Navigating through the list of ments etc. currently open buffers. HYPERBOLE A powerful hypertext and hyper- Comint Interacting with command interpreters button system that allows the user to organize running in an inferior process. This allows the and ®nd information. user to run inferior UNIX shells, Lisp or TCL processes etc. ROLO Yet another rolodex system that comes with Hyperbole. EDIFF A powerful interface that allows the user to compare ®les, apply patches etc. KOUTL An outlining editor for maintaining structured documents. 4 VI is the default editor on most UNIX systems TCL Supports editing and interactive debugging these applications. Clearly, the screen-reading ap- of TCL scripts. proach has the advantage that providing spoken ac- cess does not require direct cooperation from the PERL Supports editing of PERL scripts. underlying application Ðbut this is also its primary shortcoming. 4 Advantages The speech-enabling approach forces speci®c applications to treat speech as a ®rst-class I/O me- Emacspeak is a speech output system, and as such dium. This means that the speech output modules describing the output from Emacspeak in print is do not have to wait until the information is ®nally clearly impossible.
Recommended publications
  • GNU Emacs GNU Emacs
    GNU Emacs GNU Emacs Reference Card Reference Card Computing and Information Technology Computing and Information Technology Replacing Text Replacing Text Replace foo with bar M-x replace-string Replace foo with bar M-x replace-string foo bar foo bar Query replacement M-x query replace Query replacement M-x query replace then replace <Space> then replace <Space> then skip to next <Backspace> then skip to next <Backspace> then replace and exit <Period> then replace and exit <Period> then replace all ! then replace all ! then go back ^ then go back ^ then exit query <Esc> then exit query <Esc> Setting a Mark Setting a Mark Set mark here C-<Space> or C-@ Set mark here C-<Space> or C-@ Exchange point and mark C-x C-x Exchange point and mark C-x C-x Mark paragraph M-h Mark paragraph M-h Mark entire file C-x h Mark entire file C-x h Sorting Text Sorting Text Normal sort M-x sort-lines Normal sort M-x sort-lines Reverse sort <Esc>-1 M-x sort lines Reverse sort <Esc>-1 M-x sort lines Inserting Text Inserting Text Scan file fn M-x view-file fn Scan file fn M-x view-file fn Write buffer to file fn M-x write-file fn Write buffer to file fn M-x write-file fn Insert file fn into buffer M-x insert-file fn Insert file fn into buffer M-x insert-file fn Write region to file fn M-x write-region fn Write region to file fn M-x write-region fn Write region to end of file fn M-x append-to-file fn Write region to end of file fn M-x append-to-file fn Write region to beginning of Write region to beginning to specified buffer M-x prepend-to-buffer specified buffer
    [Show full text]
  • Emacspeak — the Complete Audio Desktop User Manual
    Emacspeak | The Complete Audio Desktop User Manual T. V. Raman Last Updated: 19 November 2016 Copyright c 1994{2016 T. V. Raman. All Rights Reserved. Permission is granted to make and distribute verbatim copies of this manual without charge provided the copyright notice and this permission notice are preserved on all copies. Short Contents Emacspeak :::::::::::::::::::::::::::::::::::::::::::::: 1 1 Copyright ::::::::::::::::::::::::::::::::::::::::::: 2 2 Announcing Emacspeak Manual 2nd Edition As An Open Source Project ::::::::::::::::::::::::::::::::::::::::::::: 3 3 Background :::::::::::::::::::::::::::::::::::::::::: 4 4 Introduction ::::::::::::::::::::::::::::::::::::::::: 6 5 Installation Instructions :::::::::::::::::::::::::::::::: 7 6 Basic Usage. ::::::::::::::::::::::::::::::::::::::::: 9 7 The Emacspeak Audio Desktop. :::::::::::::::::::::::: 19 8 Voice Lock :::::::::::::::::::::::::::::::::::::::::: 22 9 Using Online Help With Emacspeak. :::::::::::::::::::: 24 10 Emacs Packages. ::::::::::::::::::::::::::::::::::::: 26 11 Running Terminal Based Applications. ::::::::::::::::::: 45 12 Emacspeak Commands And Options::::::::::::::::::::: 49 13 Emacspeak Keyboard Commands. :::::::::::::::::::::: 361 14 TTS Servers ::::::::::::::::::::::::::::::::::::::: 362 15 Acknowledgments.::::::::::::::::::::::::::::::::::: 366 16 Concept Index :::::::::::::::::::::::::::::::::::::: 367 17 Key Index ::::::::::::::::::::::::::::::::::::::::: 368 Table of Contents Emacspeak :::::::::::::::::::::::::::::::::::::::::: 1 1 Copyright :::::::::::::::::::::::::::::::::::::::
    [Show full text]
  • Emacspeak User's Guide
    Emacspeak User's Guide Jennifer Jobst Revision History Revision 1.3 July 24,2002 Revised by: SDS Updated the maintainer of this document to Sharon Snider, corrected links, and converted to HTML Revision 1.2 December 3, 2001 Revised by: JEJ Changed license to GFDL Revision 1.1 November 12, 2001 Revised by: JEJ Revision 1.0 DRAFT October 19, 2001 Revised by: JEJ This document helps Emacspeak users become familiar with Emacs as an audio desktop and provides tutorials on many common tasks and the Emacs applications available to perform those tasks. Emacspeak User's Guide Table of Contents 1. Legal Notice.....................................................................................................................................................1 2. Introduction.....................................................................................................................................................2 2.1. What is Emacspeak?.........................................................................................................................2 2.2. About this tutorial.............................................................................................................................2 3. Before you begin..............................................................................................................................................3 3.1. Getting started with Emacs and Emacspeak.....................................................................................3 3.2. Emacs Command Conventions.........................................................................................................3
    [Show full text]
  • Spelling Correction: from Two-Level Morphology to Open Source
    Spelling Correction: from two-level morphology to open source Iñaki Alegria, Klara Ceberio, Nerea Ezeiza, Aitor Soroa, Gregorio Hernandez Ixa group. University of the Basque Country / Eleka S.L. 649 P.K. 20080 Donostia. Basque Country. [email protected] Abstract Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to this kind of languages and there are two-level based descriptions for very different languages. After doing the morphological description for a language, it is easy to develop a spelling checker/corrector for this language. However, what happens if we want to use the speller in the "free world" (OpenOffice, Mozilla, emacs, LaTeX, ...)? Ispell and similar tools (aspell, hunspell, myspell) are the usual mechanisms for these purposes, but they do not fit the two-level model. In the absence of two-level morphology based mechanisms, an automatic conversion from two-level description to hunspell is described in this paper. previous work and reuse the morphological information. 1. Introduction Two-level morphology (Koskenniemi, 1983; Beesley & Karttunen, 2003) has been applied successfully to the 2. Free software for spelling correction morphological description of highly inflected languages. Unfortunately there are not open source tools for spelling There are two-level based descriptions for very different correction with these features: languages (English, German, Swedish, French, Spanish, • It is standardized in the most of the applications Danish, Norwegian, Finnish, Basque, Russian, Turkish, (OpenOffice, Mozilla, emacs, LaTeX, ...). Arab, Aymara, Swahili, etc.). • It is based on the two-level morphology. After doing the morphological description, it is easy to The spell family of spell checkers (ispell, aspell, myspell) develop a spelling checker/corrector for the language fits the first condition but not the second.
    [Show full text]
  • Hunspell – the Free Spelling Checker
    Hunspell – The free spelling checker About Hunspell Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex word compounding or character encoding. Hunspell interfaces: Ispell-like terminal interface using Curses library, Ispell pipe interface, OpenOffice.org UNO module. Main features of Hunspell spell checker and morphological analyzer: - Unicode support (affix rules work only with the first 65535 Unicode characters) - Morphological analysis (in custom item and arrangement style) and stemming - Max. 65535 affix classes and twofold affix stripping (for agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.) - Support complex compoundings (for example, Hungarian and German) - Support language specific features (for example, special casing of Azeri and Turkish dotted i, or German sharp s) - Handle conditional affixes, circumfixes, fogemorphemes, forbidden words, pseudoroots and homonyms. - Free software (LGPL, GPL, MPL tri-license) Usage The src/tools dictionary contains ten executables after compiling (or some of them are in the src/win_api): affixcompress: dictionary generation from large (millions of words) vocabularies analyze: example of spell checking, stemming and morphological analysis chmorph: example of automatic morphological generation and conversion example: example of spell checking and suggestion hunspell: main program for spell checking and others (see manual) hunzip: decompressor of hzip format hzip: compressor of
    [Show full text]
  • Texing in Emacs Them
    30 TUGboat, Volume 39 (2018), No. 1 TEXing in Emacs them. I used a simple criterion: Emacs had a nice tutorial, and Vim apparently did not (at that time). Marcin Borkowski I wince at the very thought I might have chosen Abstract wrong! And so it went. I started with reading the In this paper I describe how I use GNU Emacs to manual [8]. As a student, I had a lot of free time work with LAT X. It is not a comprehensive survey E on my hands, so I basically read most of it. (I still of what can be done, but rather a subjective story recommend that to people who want to use Emacs about my personal usage. seriously.) I noticed that Emacs had a nice TEX In 2017, I gave a presentation [1] during the joint mode built-in, but also remembered from one of GUST/TUG conference at Bachotek. I talked about the BachoTEXs that other people had put together my experiences typesetting a journal (Wiadomo´sci something called AUCTEX, which was a TEX-mode Matematyczne, a journal of the Polish Mathematical on steroids. Society), and how I utilized LAT X and GNU Emacs E In the previous paragraph, I mentioned modes. in my workflow. After submitting my paper to the In order to understand what an Emacs mode is, let proceedings issue of TUGboat, Karl Berry asked me me explain what this whole Emacs thing is about. whether I'd like to prepare a paper about using Emacs with LATEX. 1 Basics of Emacs Well, I jumped at the proposal.
    [Show full text]
  • O Software Livre Como Alternativa Para a Inclusão Digital Do Deficiente Visual
    Universidade Estadual de Campinas Faculdade de Engenharia Elétrica e de Computação O Software Livre como Alternativa para a Inclusão Digital do Deficiente Visual Autor: Samer Eberlin Orientador: Prof. Dr. Luiz César Martini Dissertação de Mestrado apresentada à Facul- dade de Engenharia Elétrica e de Computação da Universidade Estadual de Campinas como parte dos requisitos para obtenção do título de Mestre em Engenharia Elétrica. Banca Examinadora José Raimundo de Oliveira, Dr. DCA/FEEC/Unicamp Luiz César Martini, Dr. DECOM/FEEC/Unicamp Rita de Cassia Ietto Montilha, Dra. CEPRE/FCM/Unicamp Yuzo Iano, Dr. DECOM/FEEC/Unicamp Campinas, SP – Brasil Abril/2006 FICHA CATALOGRÁFICA ELABORADA PELA BIBLIOTECA DA ÁREA DE ENGENHARIA E ARQUITETURA - BAE - UNICAMP Eberlin, Samer Eb37s O software livre como alternativa para a inclusão digital do deficiente visual / Samer Eberlin. −−Campinas, SP: [s.n.], 2006. Orientador: Luiz César Martini. Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. 1. Acessibilidade. 2. Tecnologia educacional. 3. Inclusão digital. 4. Software livre. 5. Software de comunicação. 6. Síntese da voz. I. Martini, Luiz César. II. Universidade Estadual de Campinas. Faculdade de Engenharia Elétrica e de Computação. III. Título. Titulo em Inglês: The free software as an alternative for digital cohesion of visually impaired people Palavras-chave em Inglês: Accessibility, Assistive technology, Digital cohesion, Free software, Screen reader, Voice synthesizer Área de concentração: Telecomunicações e Telemática Titulação: Mestre em Engenharia Elétrica Banca examinadora: José Raimundo de Oliveira, Rita de Cássia Ietto Montilha e Yuzo Iano Data da defesa: 19/04/2006 ii Resumo A acelerada difusão do software “livre”, tanto no Brasil como no exterior, vem se mos- trando cada vez mais evidente nos mais diversos âmbitos (governo, empresas, escolas, etc.).
    [Show full text]
  • Finite State Recognizer and String Similarity Based Spelling
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by BRAC University Institutional Repository FINITE STATE RECOGNIZER AND STRING SIMILARITY BASED SPELLING CHECKER FOR BANGLA Submitted to A Thesis The Department of Computer Science and Engineering of BRAC University by Munshi Asadullah In Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in Computer Science and Engineering Fall 2007 BRAC University, Dhaka, Bangladesh 1 DECLARATION I hereby declare that this thesis is based on the results found by me. Materials of work found by other researcher are mentioned by reference. This thesis, neither in whole nor in part, has been previously submitted for any degree. Signature of Supervisor Signature of Author 2 ACKNOWLEDGEMENTS Special thanks to my supervisor Mumit Khan without whom this work would have been very difficult. Thanks to Zahurul Islam for providing all the support that was required for this work. Also special thanks to the members of CRBLP at BRAC University, who has managed to take out some time from their busy schedule to support, help and give feedback on the implementation of this work. 3 Abstract A crucial figure of merit for a spelling checker is not just whether it can detect misspelled words, but also in how it ranks the suggestions for the word. Spelling checker algorithms using edit distance methods tend to produce a large number of possibilities for misspelled words. We propose an alternative approach to checking the spelling of Bangla text that uses a finite state automaton (FSA) to probabilistically create the suggestion list for a misspelled word.
    [Show full text]
  • Installing Emacspeak HOWTO
    Installing Emacspeak HOWTO Jennifer Jobst James Van Zandt <[email protected]> Revision History Revision 1.1 July 23, 2002 SDS Updated the maintainer of this document to Sharon Snider, corrected links, and converted to XML. Revision 1.0 December 4, 2001 JEJ First release Revision 1.0 DRAFT November 9, 2001 JEJ DRAFT Revision Emacspeak HOWTO 1996-2001 JVZ Previously, this document was known as the Emacspeak HOW- TO, and was written and maintained by Mr. James Van Zandt. Abstract This document contains the installation instructions for the Emacspeak audio desktop application for Linux. Please send any comments, or contributions via e-mail to Sharon Snider [mailto:[email protected]]. This docu- ment will be updated regularly with new contributions and suggestions. Table of Contents Legal Notice ...................................................................................................................... 2 Introduction ........................................................................................................................ 2 Documentation Conventions .................................................................................................. 2 Requirements ...................................................................................................................... 2 Linux Distributions ...................................................................................................... 2 Emacs ......................................................................................................................
    [Show full text]
  • SDL Tridion Docs Release Notes
    SDL Tridion Docs Release Notes SDL Tridion Docs 14 July 2019 ii SDL Tridion Docs Release Notes 1 Welcome to Tridion Docs Release Notes 1 Welcome to Tridion Docs Release Notes This document contains the complete Release Notes for SDL Tridion Docs 14. Customer support To contact Technical Support, connect to the Customer Support Web Portal at https://gateway.sdl.com and log a case for your SDL product. You need an account to log a case. If you do not have an account, contact your company's SDL Support Account Administrator. Acknowledgments SDL products include open source or similar third-party software. 7zip Is a file archiver with a high compression ratio. 7-zip is delivered under the GNU LGPL License. 7zip SFX Modified Module The SFX Modified Module is a plugin for creating self-extracting archives. It is compatible with three compression methods (LZMA, Deflate, PPMd) and provides an extended list of options. Reference website http://7zsfx.info/. Akka Akka is a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event- driven applications on the JVM. Amazon Ion Java Amazon Ion Java is a Java streaming parser/serializer for Ion. It is the reference implementation of the Ion data notation for the Java Platform Standard Edition 8 and above. Amazon SQS Java Messaging Library This Amazon SQS Java Messaging Library holds the Java Message Service compatible classes, that are used for communicating with Amazon Simple Queue Service. ANTLR ANTLR is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files.
    [Show full text]
  • Spelling Correction: from Two-Level Morphology to Open Source
    Spelling Correction: from two-level morphology to open source Iñaki Alegria, Klara Ceberio, Nerea Ezeiza, Aitor Soroa, Gregorio Hernandez Ixa group. University of the Basque Country / Eleka S.L. 649 P.K. 20080 Donostia. Basque Country. [email protected] Abstract Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to this kind of languages and there are two-level based descriptions for very different languages. After doing the morphological description for a language, it is easy to develop a spelling checker/corrector for this language. However, what happens if we want to use the speller in the "free world" (OpenOffice, Mozilla, emacs, LaTeX, ...)? Ispell and similar tools (aspell, hunspell, myspell) are the usual mechanisms for these purposes, but they do not fit the two-level model. In the absence of two-level morphology based mechanisms, an automatic conversion from two-level description to hunspell is described in this paper. previous work and reuse the morphological information. 1. Introduction Two-level morphology (Koskenniemi, 1983; Beesley & Karttunen, 2003) has been applied successfully to the 2. Free software for spelling correction morphological description of highly inflected languages. Unfortunately there are not open source tools for spelling There are two-level based descriptions for very different correction with these features: languages (English, German, Swedish, French, Spanish, • It is standardized in the most of the applications Danish, Norwegian, Finnish, Basque, Russian, Turkish, (OpenOffice, Mozilla, emacs, LaTeX, ...). Arab, Aymara, Swahili, etc.). • It is based on the two-level morphology. After doing the morphological description, it is easy to The spell family of spell checkers (ispell, aspell, myspell) develop a spelling checker/corrector for the language fits the first condition but not the second.
    [Show full text]
  • Categorization of Various Essential Datasets and Methods for Textual Spelling Detection and Normalization
    10 Categorization of Various Essential Datasets and Methods for Textual Spelling Detection and Normalization Molouk Sadat Hosseini Beheshti PhD in General Linguistics; Assistant Professor; Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran; Corresponding Auther [email protected] Hadi Abdi Ghavidel Msc. Graduate in Computational Linguistics; Sharif University of Technology; Tehran, Iran [email protected] Received: 26, Jan. 2016 Accepted: 2, Aug. 2016 Abstract: One of the most primary phases of automatic text processing is spelling error detection and grapheme normalization. Storing textual Iranian Research Institute for Science and Technology documents faces several problems without passing this phase, which ISSN 2251-8223 causes a disturbance in retrieving the documents automatically. Therefore, eISSN 2251-8231 specialists in the fields of natural language processing and computational Indexed by SCOPUS, ISC, & LISTA linguistics usually make an attempt to sample various data through Vol. 32 | No. 4 | pp. 1143-1170 presenting ideal methods and algorithms in order to reach the normalized Summer 2017 data. Several researches have been conducted on English and some other languages, which have been followed by a certain amount of researches on Farsi too. Sometimes, these several researches have remained to be a pure study and sometimes they have been released as a product. This paper carries out the categorization of the different methods and essential datasets in these researches and depicts each
    [Show full text]