Language Model Techniques in Machine Translation Diplomarbeit
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Automatic Correction of Real-Word Errors in Spanish Clinical Texts
sensors Article Automatic Correction of Real-Word Errors in Spanish Clinical Texts Daniel Bravo-Candel 1,Jésica López-Hernández 1, José Antonio García-Díaz 1 , Fernando Molina-Molina 2 and Francisco García-Sánchez 1,* 1 Department of Informatics and Systems, Faculty of Computer Science, Campus de Espinardo, University of Murcia, 30100 Murcia, Spain; [email protected] (D.B.-C.); [email protected] (J.L.-H.); [email protected] (J.A.G.-D.) 2 VÓCALI Sistemas Inteligentes S.L., 30100 Murcia, Spain; [email protected] * Correspondence: [email protected]; Tel.: +34-86888-8107 Abstract: Real-word errors are characterized by being actual terms in the dictionary. By providing context, real-word errors are detected. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus. Then, the probability of a word being a real-word error is computed. On the other hand, state-of-the-art approaches make use of deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model were implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to correct them. For that, different types of error were generated in correct sentences by using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing Citation: Bravo-Candel, D.; López-Hernández, J.; García-Díaz, with patient information. -
Improving Neural Language Models with A
Published as a conference paper at ICLR 2017 IMPROVING NEURAL LANGUAGE MODELS WITH A CONTINUOUS CACHE Edouard Grave, Armand Joulin, Nicolas Usunier Facebook AI Research {egrave,ajoulin,usunier}@fb.com ABSTRACT We propose an extension to neural network language models to adapt their pre- diction to the recent history. Our model is a simplified version of memory aug- mented networks, which stores past hidden activations as memory and accesses them through a dot product with the current hidden activation. This mechanism is very efficient and scales to very large memory sizes. We also draw a link between the use of external memory in neural network and cache models used with count based language models. We demonstrate on several language model datasets that our approach performs significantly better than recent memory augmented net- works. 1 INTRODUCTION Language models, which are probability distributions over sequences of words, have many appli- cations such as machine translation (Brown et al., 1993), speech recognition (Bahl et al., 1983) or dialogue agents (Stolcke et al., 2000). While traditional neural networks language models have ob- tained state-of-the-art performance in this domain (Jozefowicz et al., 2016; Mikolov et al., 2010), they lack the capacity to adapt to their recent history, limiting their application to dynamic environ- ments (Dodge et al., 2015). A recent approach to solve this problem is to augment these networks with an external memory (Graves et al., 2014; Grefenstette et al., 2015; Joulin & Mikolov, 2015; Sukhbaatar et al., 2015). These models can potentially use their external memory to store new information and adapt to a changing environment. -
Formatting Instructions for NIPS -17
System Combination for Machine Translation Using N-Gram Posterior Probabilities Yong Zhao Xiaodong He Georgia Institute of Technology Microsoft Research Atlanta, Georgia 30332 1 Microsoft Way, Redmond, WA 98052 [email protected] [email protected] Abstract This paper proposes using n-gram posterior probabilities, which are estimated over translation hypotheses from multiple machine translation (MT) systems, to improve the performance of the system combination. Two ways using n-gram posteriors in confusion network decoding are presented. The first way is based on n-gram posterior language model per source sentence, and the second, called n-gram segment voting, is to boost word posterior probabilities with n-gram occurrence frequencies. The two n-gram posterior methods are incorporated in the confusion network as individual features of a log-linear combination model. Experiments on the Chinese-to-English MT task show that both methods yield significant improvements on the translation performance, and an combination of these two features produces the best translation performance. 1 Introduction In the past several years, the system combination approach, which aims at finding a consensus from a set of alternative hypotheses, has been shown with substantial improvements in various machine translation (MT) tasks. An example of the combination technique was ROVER [1], which was proposed by Fiscus to combine multiple speech recognition outputs. The success of ROVER and other voting techniques give rise to a widely used combination scheme referred as confusion network decoding [2][3][4][5]. Given hypothesis outputs of multiple systems, a confusion network is built by aligning all these hypotheses. The resulting network comprises a sequence of correspondence sets, each of which contains the alternative words that are aligned with each other. -
Unified Language Model Pre-Training for Natural
Unified Language Model Pre-training for Natural Language Understanding and Generation Li Dong∗ Nan Yang∗ Wenhui Wang∗ Furu Wei∗† Xiaodong Liu Yu Wang Jianfeng Gao Ming Zhou Hsiao-Wuen Hon Microsoft Research {lidong1,nanya,wenwan,fuwei}@microsoft.com {xiaodl,yuwan,jfgao,mingzhou,hon}@microsoft.com Abstract This paper presents a new UNIfied pre-trained Language Model (UNILM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirec- tional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UNILM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UNILM achieves new state-of- the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm. 1 Introduction Language model (LM) pre-training has substantially advanced the state of the art across a variety of natural language processing tasks [8, 29, 19, 31, 9, 1]. -
Analyzing Political Polarization in News Media with Natural Language Processing
Analyzing Political Polarization in News Media with Natural Language Processing by Brian Sharber A thesis presented to the Honors College of Middle Tennessee State University in partial fulfillment of the requirements for graduation from the University Honors College Fall 2020 Analyzing Political Polarization in News Media with Natural Language Processing by Brian Sharber APPROVED: ______________________________ Dr. Salvador E. Barbosa Project Advisor Computer Science Department Dr. Medha Sarkar Computer Science Department Chair ____________________________ Dr. John R. Vile, Dean University Honors College Copyright © 2020 Brian Sharber Department of Computer Science Middle Tennessee State University; Murfreesboro, Tennessee, USA. I hereby grant to Middle Tennessee State University (MTSU) and its agents (including an institutional repository) the non-exclusive right to archive, preserve, and make accessible my thesis in whole or in part in all forms of media now and hereafter. I warrant that the thesis and the abstract are my original work and do not infringe or violate any rights of others. I agree to indemnify and hold MTSU harmless for any damage which may result from copyright infringement or similar claims brought against MTSU by third parties. I retain all ownership rights to the copyright of my thesis. I also retain the right to use in future works (such as articles or books) all or part of this thesis. The software described in this work is free software. You can redistribute it and/or modify it under the terms of the MIT License. The software is posted on GitHub under the following repository: https://github.com/briansha/NLP-Political-Polarization iii Abstract Political discourse in the United States is becoming increasingly polarized. -
A Study on Neural Network Language Modeling
A Study on Neural Network Language Modeling Dengliang Shi [email protected] Shanghai, Shanghai, China Abstract An exhaustive study on neural network language modeling (NNLM) is performed in this paper. Different architectures of basic neural network language models are described and examined. A number of different improvements over basic neural network language mod- els, including importance sampling, word classes, caching and bidirectional recurrent neural network (BiRNN), are studied separately, and the advantages and disadvantages of every technique are evaluated. Then, the limits of neural network language modeling are ex- plored from the aspects of model architecture and knowledge representation. Part of the statistical information from a word sequence will loss when it is processed word by word in a certain order, and the mechanism of training neural network by updating weight ma- trixes and vectors imposes severe restrictions on any significant enhancement of NNLM. For knowledge representation, the knowledge represented by neural network language models is the approximate probabilistic distribution of word sequences from a certain training data set rather than the knowledge of a language itself or the information conveyed by word sequences in a natural language. Finally, some directions for improving neural network language modeling further is discussed. Keywords: neural network language modeling, optimization techniques, limits, improve- ment scheme 1. Introduction Generally, a well-designed language model makes a critical difference in various natural language processing (NLP) tasks, like speech recognition (Hinton et al., 2012; Graves et al., 2013a), machine translation (Cho et al., 2014a; Wu et al., 2016), semantic extraction (Col- lobert and Weston, 2007, 2008) and etc. -
Automatic Spelling Correction Based on N-Gram Model
International Journal of Computer Applications (0975 – 8887) Volume 182 – No. 11, August 2018 Automatic Spelling Correction based on n-Gram Model S. M. El Atawy A. Abd ElGhany Dept. of Computer Science Dept. of Computer Faculty of Specific Education Damietta University Damietta University, Egypt Egypt ABSTRACT correction. In this paper, we are designed, implemented and A spell checker is a basic requirement for any language to be evaluated an end-to-end system that performs spellchecking digitized. It is a software that detects and corrects errors in a and auto correction. particular language. This paper proposes a model to spell error This paper is organized as follows: Section 2 illustrates types detection and auto-correction that is based on n-gram of spelling error and some of related works. Section 3 technique and it is applied in error detection and correction in explains the system description. Section 4 presents the results English as a global language. The proposed model provides and evaluation. Finally, Section 5 includes conclusion and correction suggestions by selecting the most suitable future work. suggestions from a list of corrective suggestions based on lexical resources and n-gram statistics. It depends on a 2. RELATED WORKS lexicon of Microsoft words. The evaluation of the proposed Spell checking techniques have been substantially, such as model uses English standard datasets of misspelled words. error detection & correction. Two commonly approaches for Error detection, automatic error correction, and replacement error detection are dictionary lookup and n-gram analysis. are the main features of the proposed model. The results of the Most spell checkers methods described in the literature, use experiment reached approximately 93% of accuracy and acted dictionaries as a list of correct spellings that help algorithms similarly to Microsoft Word as well as outperformed both of to find target words [16]. -
N-Gram-Based Machine Translation
N-gram-based Machine Translation ∗ JoseB.Mari´ no˜ ∗ Rafael E. Banchs ∗ Josep M. Crego ∗ Adria` de Gispert ∗ Patrik Lambert ∗ Jose´ A. R. Fonollosa ∗ Marta R. Costa-jussa` Universitat Politecnica` de Catalunya This article describes in detail an n-gram approach to statistical machine translation. This ap- proach consists of a log-linear combination of a translation model based on n-grams of bilingual units, which are referred to as tuples, along with four specific feature functions. Translation performance, which happens to be in the state of the art, is demonstrated with Spanish-to-English and English-to-Spanish translations of the European Parliament Plenary Sessions (EPPS). 1. Introduction The beginnings of statistical machine translation (SMT) can be traced back to the early fifties, closely related to the ideas from which information theory arose (Shannon and Weaver 1949) and inspired by works on cryptography (Shannon 1949, 1951) during World War II. According to this view, machine translation was conceived as the problem of finding a sentence by decoding a given “encrypted” version of it (Weaver 1955). Although the idea seemed very feasible, enthusiasm faded shortly afterward because of the computational limitations of the time (Hutchins 1986). Finally, during the nineties, two factors made it possible for SMT to become an actual and practical technology: first, significant increment in both the computational power and storage capacity of computers, and second, the availability of large volumes of bilingual data. The first SMT systems were developed in the early nineties (Brown et al. 1990, 1993). These systems were based on the so-called noisy channel approach, which models the probability of a target language sentence T given a source language sentence S as the product of a translation-model probability p(S|T), which accounts for adequacy of trans- lation contents, times a target language probability p(T), which accounts for fluency of target constructions. -
Eusipco 2013 1569741253
EUSIPCO 2013 1569741253 1 2 3 4 PLSA ENHANCED WITH A LONG-DISTANCE BIGRAM LANGUAGE MODEL FOR SPEECH 5 RECOGNITION 6 7 Md. Akmal Haidar and Douglas O’Shaughnessy 8 9 INRS-EMT, 6900-800 de la Gauchetiere Ouest, Montreal (Quebec), H5A 1K6, Canada 10 11 12 13 ABSTRACT sition to extract the semantic information for different words and documents. In PLSA and LDA, semantic properties of 14 We propose a language modeling (LM) approach using back- words and documents can be shown in probabilistic topics. 15 ground n-grams and interpolated distanced n-grams for The PLSA latent topic parameters are trained by maximizing 16 speech recognition using an enhanced probabilistic latent the likelihood of the training data using an expectation max- 17 semantic analysis (EPLSA) derivation. PLSA is a bag-of- imization (EM) procedure and have been successfully used 18 words model that exploits the topic information at the docu- for speech recognition [3, 5]. The LDA model has been used 19 ment level, which is inconsistent for the language modeling successfully in recent research work for LM adaptation [6, 7]. 20 in speech recognition. In this paper, we consider the word A bigram LDA topic model, where the word probabilities are 21 sequence in modeling the EPLSA model. Here, the predicted conditioned on their preceding context and the topic proba- 22 word of an n-gram event is drawn from a topic that is cho- bilities are conditioned on the documents, has been recently 23 sen from the topic distribution of the (n-1) history words. investigated [8]. -
Spell Checker in CET Designer
Linköping University | Department of Computer Science Bachelor thesis, 16 ECTS | Datateknik 2016 | LIU-IDA/LITH-EX-G--16/069--SE Spell checker in CET Designer Rasmus Hedin Supervisor : Amir Aminifar Examiner : Zebo Peng Linköpings universitet SE–581 83 Linköping +46 13 28 10 00 , www.liu.se Upphovsrätt Detta dokument hålls tillgängligt på Internet – eller dess framtida ersättare – under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår. Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och admin- istrativ art. Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sam- manhang som är kränkande för upphovsmannenslitterära eller konstnärliga anseende eller egenart. För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/. Copyright The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circum- stances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. -
Master's Thesis Preparation Report
Eindhoven University of Technology MASTER Improving the usability of the domain-specific language editors using artificial intelligence Rajasekar, Nandini Award date: 2021 Link to publication Disclaimer This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration. General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain Department of Mathematics and Computer Science Department of Electrical Engineering Improving the usability of the domain-specific language editors using artificial intelligence Master Thesis Preparation Report Nandini Rajasekar Supervisors: Prof. Dr. M.G.J. van den Brand M. Verano Merino, MSc 1.0 Eindhoven, July 2020 Contents Contents ii List of Figures iii 1 Introduction 1 1.1 Language Oriented Programming............................1 1.2 Language Workbenches.................................1 1.3 Artificial Intelligence...................................2 2 Motivation 4 3 Goal and Approach5 3.1 Goal............................................5 3.2 Research questions....................................5 3.3 Approach.........................................5 4 Literature Survey7 5 Prototype 10 5.1 Next steps........................................ -
Chapter 4 N-Grams
Chapter 4 N-Grams But it must be recognized that the notion “probability of a sentence” is an entirely useless one, under any known interpretation of this term. Noam Chomsky (1969, p. 57) Anytime a linguist leaves the group the recognition rate goes up. Fred Jelinek (then of the IBM speech group) (1988)1 Being able to predict the future is not always a good thing. Cassandra of Troy had the gift of foreseeing but was cursed by Apollo that her predictions would never be believed. Her warnings of the destruction of Troy were ignored and to simplify, let’s just say that things just didn’t go well for her later. Predicting words seems somewhat less fraught, and in this chapter we take up this idea of word prediction. What word, for example, is likely to follow Please turn your homework ... Hopefully, most of you concluded that a very likely word is in, or possibly over, but probably not the. We formalize this idea of word prediction with probabilistic N-gram models called N-gram models, which predict the next word from the previous N 1 words. An N-gram is an N-token sequence of words: a 2-gram (more commonly called− a bigram) is a two-word sequence of words like “please turn”, “turn your”, or ”your homework”, and a 3-gram (more commonly called a trigram) is a three-word sequence of words like “please turn your”, or “turn your homework”. An N-gram model, as we will explore in detail, is then a model which computes the last word ofan N-gram from the previous ones.2 Such statistical models of word sequences are also called language Language model models or LMs.