Weighted Finite-State Methods for Spell-Checking and Correction Tommi a Pirinen
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
GNU Emacs GNU Emacs
GNU Emacs GNU Emacs Reference Card Reference Card Computing and Information Technology Computing and Information Technology Replacing Text Replacing Text Replace foo with bar M-x replace-string Replace foo with bar M-x replace-string foo bar foo bar Query replacement M-x query replace Query replacement M-x query replace then replace <Space> then replace <Space> then skip to next <Backspace> then skip to next <Backspace> then replace and exit <Period> then replace and exit <Period> then replace all ! then replace all ! then go back ^ then go back ^ then exit query <Esc> then exit query <Esc> Setting a Mark Setting a Mark Set mark here C-<Space> or C-@ Set mark here C-<Space> or C-@ Exchange point and mark C-x C-x Exchange point and mark C-x C-x Mark paragraph M-h Mark paragraph M-h Mark entire file C-x h Mark entire file C-x h Sorting Text Sorting Text Normal sort M-x sort-lines Normal sort M-x sort-lines Reverse sort <Esc>-1 M-x sort lines Reverse sort <Esc>-1 M-x sort lines Inserting Text Inserting Text Scan file fn M-x view-file fn Scan file fn M-x view-file fn Write buffer to file fn M-x write-file fn Write buffer to file fn M-x write-file fn Insert file fn into buffer M-x insert-file fn Insert file fn into buffer M-x insert-file fn Write region to file fn M-x write-region fn Write region to file fn M-x write-region fn Write region to end of file fn M-x append-to-file fn Write region to end of file fn M-x append-to-file fn Write region to beginning of Write region to beginning to specified buffer M-x prepend-to-buffer specified buffer -
Powering Inclusion: Artificial Intelligence and Assistive Technology
POLICY BRIEF MARCH 2021 Powering Inclusion: Artificial Intelligence and Assistive Technology Assistive technology (AT) is broadly defined as any equipment, product or service that can make society more inclusive. Eye- glasses, hearing aids, wheelchairs or even some mobile applica- tions are all examples of AT. This briefing explores the power of Key facts using Artificial Intelligence (AI) to enhance the design of current and future AT, as well as to increase its reach. The UN Convention on the Context Rights of Persons with Dis- Globally, over 1 billion people are currently in need of AT. Lack of access to basic AT excludes individuals, reduces their ability to live independent abilities (UNCRPD) estab- lives [3], and is a barrier to the realisation of the Sustainable Development lished AT provision as a Goals (SDGs) [2]. human right [1]. Advances in AI offer the potential to develop and enhance AT and gain new insights into the scale and nature of AT needs (although the risks of potential bias and discrimination in certain AI-based tools must also be Over 1 billion people are acknowledged). In June 2019, the Conference of State Parties to the UN currently in need of AT. Convention on the Rights of Persons with Disabilities (CRPD) recognized that AI has the potential to enhance inclu- By 2050, this number is sion, participation, and independence for people with disabilities [5]. AI- predicted to double [2]. enabled tools – such as predictive text, visual recognition, automatic speech- to-text transcription, speech recognition, and virtual assistants - have experi- enced great advances in the last few years and are already being Only 10% of those who used by people with vision, hearing, mobility and learning impairments. -
Automatic Correction of Real-Word Errors in Spanish Clinical Texts
sensors Article Automatic Correction of Real-Word Errors in Spanish Clinical Texts Daniel Bravo-Candel 1,Jésica López-Hernández 1, José Antonio García-Díaz 1 , Fernando Molina-Molina 2 and Francisco García-Sánchez 1,* 1 Department of Informatics and Systems, Faculty of Computer Science, Campus de Espinardo, University of Murcia, 30100 Murcia, Spain; [email protected] (D.B.-C.); [email protected] (J.L.-H.); [email protected] (J.A.G.-D.) 2 VÓCALI Sistemas Inteligentes S.L., 30100 Murcia, Spain; [email protected] * Correspondence: [email protected]; Tel.: +34-86888-8107 Abstract: Real-word errors are characterized by being actual terms in the dictionary. By providing context, real-word errors are detected. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus. Then, the probability of a word being a real-word error is computed. On the other hand, state-of-the-art approaches make use of deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model were implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to correct them. For that, different types of error were generated in correct sentences by using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing Citation: Bravo-Candel, D.; López-Hernández, J.; García-Díaz, with patient information. -
Emacspeak — the Complete Audio Desktop User Manual
Emacspeak | The Complete Audio Desktop User Manual T. V. Raman Last Updated: 19 November 2016 Copyright c 1994{2016 T. V. Raman. All Rights Reserved. Permission is granted to make and distribute verbatim copies of this manual without charge provided the copyright notice and this permission notice are preserved on all copies. Short Contents Emacspeak :::::::::::::::::::::::::::::::::::::::::::::: 1 1 Copyright ::::::::::::::::::::::::::::::::::::::::::: 2 2 Announcing Emacspeak Manual 2nd Edition As An Open Source Project ::::::::::::::::::::::::::::::::::::::::::::: 3 3 Background :::::::::::::::::::::::::::::::::::::::::: 4 4 Introduction ::::::::::::::::::::::::::::::::::::::::: 6 5 Installation Instructions :::::::::::::::::::::::::::::::: 7 6 Basic Usage. ::::::::::::::::::::::::::::::::::::::::: 9 7 The Emacspeak Audio Desktop. :::::::::::::::::::::::: 19 8 Voice Lock :::::::::::::::::::::::::::::::::::::::::: 22 9 Using Online Help With Emacspeak. :::::::::::::::::::: 24 10 Emacs Packages. ::::::::::::::::::::::::::::::::::::: 26 11 Running Terminal Based Applications. ::::::::::::::::::: 45 12 Emacspeak Commands And Options::::::::::::::::::::: 49 13 Emacspeak Keyboard Commands. :::::::::::::::::::::: 361 14 TTS Servers ::::::::::::::::::::::::::::::::::::::: 362 15 Acknowledgments.::::::::::::::::::::::::::::::::::: 366 16 Concept Index :::::::::::::::::::::::::::::::::::::: 367 17 Key Index ::::::::::::::::::::::::::::::::::::::::: 368 Table of Contents Emacspeak :::::::::::::::::::::::::::::::::::::::::: 1 1 Copyright ::::::::::::::::::::::::::::::::::::::: -
Enabling Non-Speech Experts to Develop Usable Speech-User Interfaces
Enabling Non-Speech Experts to Develop Usable Speech-User Interfaces Anuj Kumar CMU-HCII-14-105 August 2014 Human-Computer Interaction Institute School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 USA Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Copyright © Anuj Kumar 2014. All rights reserved. Committee: Florian Metze Co-Chair, Carnegie Mellon University Matthew Kam Co-Chair, Carnegie Mellon University & American Institutes for Research Dan Siewiorek Carnegie Mellon University Tim Paek Microsoft Research The research described in this dissertation was supported by the National Science Foundation under grants IIS- 1247368 and CNS-1205589, Siebel Scholar Fellowship 2014, Carnegie Mellon’s Human-Computer Interaction Institute, and Nokia. Any findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the above organizations or corporations. KEYWORDS Human-Computer Interaction, Machine Learning, Non-Experts, Rapid Prototyping, Speech-User Interfaces, Speech Recognition, Toolkit Development. II Dedicated to Mom and Dad for their eternal support, motivation, and love III IV ABSTRACT Speech user interfaces (SUIs) such as Apple’s Siri, Microsoft’s Cortana, and Google Now are becoming increasingly popular. However, despite years of research, such interfaces really only work for specific users, such as adult native speakers of English, when in fact, many other users such as non-native speakers or children stand to benefit at least as much, if not more. The problem in developing SUIs for such users or for other acoustic or language situations is the expertise, time, and cost in building an initial system that works reasonably well, and can be deployed to collect more data or also to establish a group of loyal users. -
Welsh Language Technology Action Plan Progress Report 2020 Welsh Language Technology Action Plan: Progress Report 2020
Welsh language technology action plan Progress report 2020 Welsh language technology action plan: Progress report 2020 Audience All those interested in ensuring that the Welsh language thrives digitally. Overview This report reviews progress with work packages of the Welsh Government’s Welsh language technology action plan between its October 2018 publication and the end of 2020. The Welsh language technology action plan derives from the Welsh Government’s strategy Cymraeg 2050: A million Welsh speakers (2017). Its aim is to plan technological developments to ensure that the Welsh language can be used in a wide variety of contexts, be that by using voice, keyboard or other means of human-computer interaction. Action required For information. Further information Enquiries about this document should be directed to: Welsh Language Division Welsh Government Cathays Park Cardiff CF10 3NQ e-mail: [email protected] @cymraeg Facebook/Cymraeg Additional copies This document can be accessed from gov.wales Related documents Prosperity for All: the national strategy (2017); Education in Wales: Our national mission, Action plan 2017–21 (2017); Cymraeg 2050: A million Welsh speakers (2017); Cymraeg 2050: A million Welsh speakers, Work programme 2017–21 (2017); Welsh language technology action plan (2018); Welsh-language Technology and Digital Media Action Plan (2013); Technology, Websites and Software: Welsh Language Considerations (Welsh Language Commissioner, 2016) Mae’r ddogfen yma hefyd ar gael yn Gymraeg. This document is also available in Welsh. -
An Improved Text Sentiment Classification Model Using TF-IDF and Next Word Negation
An Improved Text Sentiment Classification Model Using TF-IDF and Next Word Negation Bijoyan Das Sarit Chakraborty Student Member, IEEE Member, IEEE, Kolkata, India Abstract – With the rapid growth of Text sentiment tickets. Although this is a very trivial problem, text analysis, the demand for automatic classification of classification can be used in many different areas as electronic documents has increased by leaps and bound. follows: The paradigm of text classification or text mining has been the subject of many research works in recent time. Most of the consumer based companies use In this paper we propose a technique for text sentiment sentiment classification to automatically classification using term frequency- inverse document generate reports on customer feedback. It is an frequency (TF-IDF) along with Next Word Negation integral part of CRM. [2] (NWN). We have also compared the performances of In medical science, text classification is used to binary bag of words model, TF-IDF model and TF-IDF analyze and categorize reports, clinical trials, with ‘next word negation’ (TF-IDF-NWN) model for hospital records etc. [2] text classification. Our proposed model is then applied Text classification models are also used in law on three different text mining algorithms and we found on the various trial data, verdicts, court the Linear Support vector machine (LSVM) is the most transcripts etc. [2] appropriate to work with our proposed model. The Text classification models can also be used for achieved results show significant increase in accuracy Spam email classification. [7] compared to earlier methods. In this paper we have demonstrated a study on the three different techniques to build models for text I. -
Extraction of Predictive Document Spans with Neural Attention
SpanPredict: Extraction of Predictive Document Spans with Neural Attention Vivek Subramanian, Matthew Engelhard, Samuel Berchuck, Liqun Chen, Ricardo Henao, and Lawrence Carin Duke University {vivek.subramanian, matthew.engelhard, samuel.berchuck, liqun.chen, ricardo.henao, lcarin}@duke.edu Abstract clinical notes, for example, attributing predictions to specific note content assures clinicians that the In many natural language processing applica- tions, identifying predictive text can be as im- model is not relying on data artifacts that are not portant as the predictions themselves. When clinically meaningful or generalizable. Moreover, predicting medical diagnoses, for example, this process may illuminate previously unknown identifying predictive content in clinical notes risk factors that are described in clinical notes but not only enhances interpretability, but also al- not captured in a structured manner. Our work is lows unknown, descriptive (i.e., text-based) motivated by the problem of autism spectrum dis- risk factors to be identified. We here formal- order (ASD) diagnosis, in which many early symp- ize this problem as predictive extraction and toms are behavioral rather than physiologic, and address it using a simple mechanism based on linear attention. Our method preserves dif- are documented in clinical notes using multiple- ferentiability, allowing scalable inference via word descriptions, not individual terms. Morever, stochastic gradient descent. Further, the model extended and nuanced descriptions are important decomposes predictions into a sum of contri- in many common document classification tasks, for butions of distinct text spans. Importantly, we instance, the scoring of movie or food reviews. require only document labels, not ground-truth Identifying important spans of text is a recurring spans. -
Learning to Read by Spelling Towards Unsupervised Text Recognition
Learning to Read by Spelling Towards Unsupervised Text Recognition Ankush Gupta Andrea Vedaldi Andrew Zisserman Visual Geometry Group Visual Geometry Group Visual Geometry Group University of Oxford University of Oxford University of Oxford [email protected] [email protected] [email protected] tttttttttttttttttttttttttttttttttttttttttttrssssss ttttttt ny nytt nr nrttttttt ny ntt nrttttttt zzzz iterations iterations bcorote to whol th ticunthss tidio tiostolonzzzz trougfht to ferr oy disectins it has dicomered Training brought to view by dissection it was discovered Figure 1: Text recognition from unaligned data. We present a method for recognising text in images without using any labelled data. This is achieved by learning to align the statistics of the predicted text strings, against the statistics of valid text strings sampled from a corpus. The figure above visualises the transcriptions as various characters are learnt through the training iterations. The model firstlearns the concept of {space}, and hence, learns to segment the string into words; followed by common words like {to, it}, and only later learns to correctly map the less frequent characters like {v, w}. The last transcription also corresponds to the ground-truth (punctuations are not modelled). The colour bar on the right indicates the accuracy (darker means higher accuracy). ABSTRACT 1 INTRODUCTION This work presents a method for visual text recognition without read (ri:d) verb • Look at and comprehend the meaning of using any paired supervisory data. We formulate the text recogni- (written or printed matter) by interpreting the characters or tion task as one of aligning the conditional distribution of strings symbols of which it is composed. -
Detecting Personal Life Events from Social Media
Open Research Online The Open University’s repository of research publications and other research outputs Detecting Personal Life Events from Social Media Thesis How to cite: Dickinson, Thomas Kier (2019). Detecting Personal Life Events from Social Media. PhD thesis The Open University. For guidance on citations see FAQs. c 2018 The Author https://creativecommons.org/licenses/by-nc-nd/4.0/ Version: Version of Record Link(s) to article on publisher’s website: http://dx.doi.org/doi:10.21954/ou.ro.00010aa9 Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners. For more information on Open Research Online’s data policy on reuse of materials please consult the policies page. oro.open.ac.uk Detecting Personal Life Events from Social Media a thesis presented by Thomas K. Dickinson to The Department of Science, Technology, Engineering and Mathematics in partial fulfilment of the requirements for the degree of Doctor of Philosophy in the subject of Computer Science The Open University Milton Keynes, England May 2019 Thesis advisor: Professor Harith Alani & Dr Paul Mulholland Thomas K. Dickinson Detecting Personal Life Events from Social Media Abstract Social media has become a dominating force over the past 15 years, with the rise of sites such as Facebook, Instagram, and Twitter. Some of us have been with these sites since the start, posting all about our personal lives and building up a digital identify of ourselves. But within this myriad of posts, what actually matters to us, and what do our digital identities tell people about ourselves? One way that we can start to filter through this data, is to build classifiers that can identify posts about our personal life events, allowing us to start to self reflect on what we share online. -
Emacspeak User's Guide
Emacspeak User's Guide Jennifer Jobst Revision History Revision 1.3 July 24,2002 Revised by: SDS Updated the maintainer of this document to Sharon Snider, corrected links, and converted to HTML Revision 1.2 December 3, 2001 Revised by: JEJ Changed license to GFDL Revision 1.1 November 12, 2001 Revised by: JEJ Revision 1.0 DRAFT October 19, 2001 Revised by: JEJ This document helps Emacspeak users become familiar with Emacs as an audio desktop and provides tutorials on many common tasks and the Emacs applications available to perform those tasks. Emacspeak User's Guide Table of Contents 1. Legal Notice.....................................................................................................................................................1 2. Introduction.....................................................................................................................................................2 2.1. What is Emacspeak?.........................................................................................................................2 2.2. About this tutorial.............................................................................................................................2 3. Before you begin..............................................................................................................................................3 3.1. Getting started with Emacs and Emacspeak.....................................................................................3 3.2. Emacs Command Conventions.........................................................................................................3 -
Spelling Correction: from Two-Level Morphology to Open Source
Spelling Correction: from two-level morphology to open source Iñaki Alegria, Klara Ceberio, Nerea Ezeiza, Aitor Soroa, Gregorio Hernandez Ixa group. University of the Basque Country / Eleka S.L. 649 P.K. 20080 Donostia. Basque Country. [email protected] Abstract Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to this kind of languages and there are two-level based descriptions for very different languages. After doing the morphological description for a language, it is easy to develop a spelling checker/corrector for this language. However, what happens if we want to use the speller in the "free world" (OpenOffice, Mozilla, emacs, LaTeX, ...)? Ispell and similar tools (aspell, hunspell, myspell) are the usual mechanisms for these purposes, but they do not fit the two-level model. In the absence of two-level morphology based mechanisms, an automatic conversion from two-level description to hunspell is described in this paper. previous work and reuse the morphological information. 1. Introduction Two-level morphology (Koskenniemi, 1983; Beesley & Karttunen, 2003) has been applied successfully to the 2. Free software for spelling correction morphological description of highly inflected languages. Unfortunately there are not open source tools for spelling There are two-level based descriptions for very different correction with these features: languages (English, German, Swedish, French, Spanish, • It is standardized in the most of the applications Danish, Norwegian, Finnish, Basque, Russian, Turkish, (OpenOffice, Mozilla, emacs, LaTeX, ...). Arab, Aymara, Swahili, etc.). • It is based on the two-level morphology. After doing the morphological description, it is easy to The spell family of spell checkers (ispell, aspell, myspell) develop a spelling checker/corrector for the language fits the first condition but not the second.