PLSA ENHANCED WITH A LONG-DISTANCE BIGRAM LANGUAGE MODEL FOR SPEECH RECOGNITION

Md. Akmal Haidar and Douglas O'Shaughnessy

INRS-EMT, 6900-800 de la Gauchetiere Ouest, Montreal (Quebec), H5A 1K6, Canada

ABSTRACT

We propose a language modeling (LM) approach that uses background n-grams and interpolated distanced n-grams for speech recognition through an enhanced probabilistic latent semantic analysis (EPLSA) derivation. PLSA is a bag-of-words model that exploits topic information at the document level, which is not well matched to language modeling for speech recognition. In this paper, we take the word sequence into account in the EPLSA model: the predicted word of an n-gram event is drawn from a topic that is chosen from the topic distribution of the (n-1) history words. Because the EPLSA model cannot capture long-range topic information from outside the n-gram event, distanced n-grams are incorporated in interpolated form (IEPLSA) to cover long-range information. A cache-based LM, which models re-occurring words, is also incorporated through unigram scaling into the EPLSA and IEPLSA models, which model topical words. Our proposed approaches yield significant reductions in perplexity and word error rate (WER) over a PLSA-based LM approach on the Wall Street Journal (WSJ) corpus.

Index Terms— language model, topic model, speech recognition, cache-based LM, long-distance n-grams

1. INTRODUCTION

Statistical n-gram LMs play a vital role in speech recognition and many other applications. They use local context information by modeling text as a Markovian sequence. However, n-gram LMs lack long-range information, which degrades performance. The cache-based LM was one of the earliest efforts to capture long-range information: it increases the probability of words that appeared earlier in a document when predicting the next word [1]. Recently, latent topic modeling techniques have been used broadly in topic-based language modeling to compensate for the weaknesses of n-gram LMs. Several techniques such as Latent Semantic Analysis (LSA) [2], PLSA [3], and latent Dirichlet allocation (LDA) [4] have been studied to extract latent semantic information from a training corpus. All these methods are based on a bag-of-words assumption. LSA performs a word-document matrix decomposition to extract semantic information for different words and documents. In PLSA and LDA, the semantic properties of words and documents are expressed through probabilistic topics. The PLSA latent topic parameters are trained by maximizing the likelihood of the training data with an expectation-maximization (EM) procedure and have been used successfully for speech recognition [3, 5]. The LDA model has also been used successfully in recent work on LM adaptation [6, 7]. A bigram LDA topic model, in which the word probabilities are conditioned on their preceding context and the topic probabilities are conditioned on the documents, has recently been investigated [8]. A similar model in the PLSA framework, called the bigram PLSA model, was introduced recently [9]. An updated bigram PLSA model was proposed in [10], where the topic is further conditioned on the bigram history context. A topic-based language model was also proposed in which the topic information is obtained from the n-gram history through a Dirichlet distribution [11] and from the long-distance history (topic cache) through multinomial distributions [12].

In [13], a PLSA technique enhanced with long-distance bigrams was used to incorporate long-term word dependencies when determining word clusters. This motivates us to present LM approaches for speech recognition that use distanced n-grams. In this paper, we use the default n-grams with an enhanced PLSA derivation to form the EPLSA n-gram model. Here, the observed n-gram events contain the history words and the predicted word. The EPLSA model extracts topic information from the history words, and the current word is then predicted based on that topic information. However, the EPLSA model does not capture topic information from outside the n-gram events. We therefore propose interpolated distanced n-grams (IEPLSA) and cache-based models to bring long-term word dependencies into the EPLSA model. The n-gram probabilities of the IEPLSA model are computed by mixing the component distanced word probabilities for the topics with the interpolated topic information for the histories. Furthermore, a cache-based LM is incorporated into the EPLSA and IEPLSA models, as the cache-based LM models a different part of the language than the EPLSA/IEPLSA models.

The rest of this paper is organized as follows. Section 2 reviews PLSA. The proposed EPLSA and IEPLSA models are described in Section 3. Section 4 compares the PLSA, bigram PLSA, EPLSA and IEPLSA models. The unigram scaling of the cache-based model into the topic models is explained in Section 5. Section 6 describes the experiments. Finally, conclusions are drawn in Section 7.

2. PLSA MODEL

The PLSA model [3] can be described by the following generative procedure. First, a document D_j (j = 1, 2, ..., N) is selected with probability P(D_j). A topic z_k (k = 1, 2, ..., K) is then chosen with probability P(z_k | D_j), and finally a word w_i (i = 1, 2, ..., V) is generated with probability P(w_i | z_k). Here, the observed variables are w_i and D_j, whereas the unobserved variable is z_k. The joint distribution of the observed data can be written as:

P(D_j, w_i) = P(D_j) P(w_i | D_j) = P(D_j) \sum_{k=1}^{K} P(z_k | D_j) P(w_i | z_k),   (1)

where the word probability P(w_i | D_j) is computed as:

P(w_i | D_j) = \sum_{k=1}^{K} P(z_k | D_j) P(w_i | z_k).   (2)

The model parameters P(w_i | z_k) and P(z_k | D_j) are computed using the EM algorithm [3].
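To make the EM estimation of Section 2 concrete, here is a minimal sketch (not the authors' implementation; the toy counts and variable names are ours) that estimates P(w_i | z_k) and P(z_k | D_j) from a word-document count matrix and then recovers P(w_i | D_j) via Eq. (2):

```python
import numpy as np

def train_plsa(counts, num_topics, iters=50, seed=0):
    """EM for PLSA: counts is a (V, N) matrix of word-document counts n(w_i, D_j)."""
    rng = np.random.default_rng(seed)
    V, N = counts.shape
    p_w_z = rng.random((V, num_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)          # P(w_i | z_k)
    p_z_d = rng.random((num_topics, N))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)          # P(z_k | D_j)
    for _ in range(iters):
        # E-step: posterior P(z_k | D_j, w_i) proportional to P(w_i | z_k) P(z_k | D_j)
        post = p_w_z[:, :, None] * p_z_d[None, :, :]   # shape (V, K, N)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate the parameters from the expected counts
        expected = counts[:, None, :] * post           # n(w_i, D_j) * posterior
        p_w_z = expected.sum(axis=2)                   # sum over documents
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=0)                   # sum over words
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + 1e-12
    return p_w_z, p_z_d

# Toy corpus: 3 vocabulary words, 2 documents (counts made up for illustration).
counts = np.array([[3.0, 0.0], [1.0, 2.0], [0.0, 4.0]])
p_w_z, p_z_d = train_plsa(counts, num_topics=2)
p_w_d = p_w_z @ p_z_d                                  # Eq. (2): P(w_i | D_j)
```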
3. PROPOSED EPLSA AND IEPLSA MODELS

3.1. EPLSA

Representing a document D_j as a sequence of words, the joint distribution of the document and the previous (n-1) history words h of the current word w_i can be described as [13]:

P(D_j, h) = P(h) \prod_{w_i \in D_j} P_d(w_i | h),   (3)

where P_d(w_i | h) is the distanced n-gram model and d represents the distance between the words in the n-grams. The probability P_d(w_i | h) can therefore be computed in a manner similar to the PLSA derivation [3, 13]. For d = 1, P_d(w_i | h) is the default background n-gram model, and we define it as the enhanced PLSA (EPLSA) model. The graphical model of EPLSA is shown in Figure 1, and the model is given by:

P_{EPLSA}(w_i | h) = \sum_{k=1}^{K} P(w_i | z_k) P(z_k | h).   (4)

Fig. 1. The graphical model of the EPLSA model. The shaded circle represents the observed variables. H and V denote the number of histories and the size of the vocabulary.

The parameters of the model are computed using the EM algorithm as follows.

E-step:

P(z_k | h, w_i) = \frac{P(w_i | z_k) P(z_k | h)}{\sum_{k'=1}^{K} P(w_i | z_{k'}) P(z_{k'} | h)},   (5)

M-step:

P(w_i | z_k) = \frac{\sum_h n(h, w_i) P(z_k | h, w_i)}{\sum_{i'} \sum_h n(h, w_{i'}) P(z_k | h, w_{i'})},   (6)

P(z_k | h) = \frac{\sum_{i'} n(h, w_{i'}) P(z_k | h, w_{i'})}{\sum_{k'} \sum_{i'} n(h, w_{i'}) P(z_{k'} | h, w_{i'})}.   (7)

3.2. IEPLSA

The EPLSA model does not capture long-distance information. To incorporate long-range characteristics, we use distanced n-grams in the EPLSA model. Incorporating the interpolated distanced n-grams into EPLSA, the model can be written as [13]:

P_{IEPLSA}(w_i | h) = \sum_{k=1}^{K} \Big[ \sum_d \lambda_d P_d(w_i | z_k) \Big] P(z_k | h),   (8)

where the \lambda_d are the weights of the component probabilities, estimated on held-out data using the EM algorithm, and P_d(w_i | z_k) are the word probabilities for topic z_k obtained by using the distanced n-grams in the IEPLSA training. Here d represents the distance between words in the n-gram events, with d = 1 describing the default n-grams. For example, the distanced n-grams of the phrase "Speech in Life Sciences and Human Societies" are shown in Table 1 for the distances d = 1, 2.

Table 1. Distanced n-grams for the phrase "Speech in Life Sciences and Human Societies"

Distance | Bigrams | Trigrams
d=1 | Speech in, in Life, Life Sciences, Sciences and, and Human, Human Societies | Speech in Life, in Life Sciences, Life Sciences and, Sciences and Human, and Human Societies
d=2 | Speech Life, in Sciences, Life and, Sciences Human, and Societies | Speech Life and, in Sciences Human, Life and Societies
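As an illustration of how the distanced n-gram events in Table 1 are formed, the following small sketch (ours, assuming plain whitespace tokenization) enumerates the distance-d n-grams of a word sequence; d = 1 reproduces the default n-grams:

```python
def distanced_ngrams(words, n, d):
    """Distance-d n-grams: adjacent positions within one n-gram are d words apart
    in the original sequence (d = 1 gives the default n-grams)."""
    span = (n - 1) * d                                   # index range covered by one event
    return [tuple(words[i + j * d] for j in range(n))
            for i in range(len(words) - span)]

phrase = "Speech in Life Sciences and Human Societies".split()
print(distanced_ngrams(phrase, n=2, d=2))
# [('Speech', 'Life'), ('in', 'Sciences'), ('Life', 'and'),
#  ('Sciences', 'Human'), ('and', 'Societies')]
print(distanced_ngrams(phrase, n=3, d=2))
# [('Speech', 'Life', 'and'), ('in', 'Sciences', 'Human'), ('Life', 'and', 'Societies')]
```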
The parameters of the IEPLSA model are computed as follows.

E-step:

P_d(z_k | h, w_i) = \frac{P_d(w_i | z_k) P(z_k | h)}{\sum_{k'=1}^{K} P_d(w_i | z_{k'}) P(z_{k'} | h)},   (9)

M-step:

P_d(w_i | z_k) = \frac{\sum_h n_d(h, w_i) P_d(z_k | h, w_i)}{\sum_{i'} \sum_h n_d(h, w_{i'}) P_d(z_k | h, w_{i'})},   (10)

P(z_k | h) = \frac{\sum_{i'} \sum_d \lambda_d n_d(h, w_{i'}) P_d(z_k | h, w_{i'})}{\sum_{k'} \sum_{i'} \sum_d \lambda_d n_d(h, w_{i'}) P_d(z_{k'} | h, w_{i'})}.   (11)

As the cache-based LM (which models re-occurring words) differs from both the background model (which models short-range information) and the EPLSA and IEPLSA models (which model topical words), we can integrate the cache model to adapt P_L(w_i | h) through unigram scaling as [15, 16]:

P_{Adapt}(w_i | h) = \frac{P_L(w_i | h) \, \delta(w_i)}{Z(h)},   (14)

with

Z(h) = \sum_{w_i} \delta(w_i) \, P_L(w_i | h),   (15)

where Z(h) is a normalization term, which guarantees that the adapted probabilities sum to one.
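The unigram-scaling adaptation of Eqs. (14)-(15) can be sketched as follows (our illustration, not the paper's code); the per-word scaling factor δ(w_i) is assumed to be precomputed from the cache model, since its exact definition falls outside this excerpt:

```python
def adapt_with_unigram_scaling(p_l_given_h, delta):
    """Unigram scaling (Eqs. 14-15): rescale P_L(w | h) by delta(w) and renormalize."""
    z_h = sum(delta[w] * p for w, p in p_l_given_h.items())         # Eq. (15): Z(h)
    return {w: p * delta[w] / z_h for w, p in p_l_given_h.items()}  # Eq. (14)

# Hypothetical numbers for a three-word vocabulary; delta(w) > 1 boosts words
# favoured by the cache component, delta(w) < 1 discounts the others.
p_l = {"stocks": 0.5, "fell": 0.3, "sharply": 0.2}
delta = {"stocks": 1.8, "fell": 1.0, "sharply": 0.7}
p_adapt = adapt_with_unigram_scaling(p_l, delta)
assert abs(sum(p_adapt.values()) - 1.0) < 1e-12   # Z(h) keeps it a distribution
```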