WordNet and Distributional Analysis: A Class-based Approach to Lexical Discovery

From: AAAI Technical Report WS-92-01. Compilation copyright © 1992, AAAI (www.aaai.org). All rights reserved.

Philip Resnik*
Department of Computer and Information Science
University of Pennsylvania, Philadelphia, PA 19104, USA
resnik@linc.cis.upenn.edu

Introduction

It has become common in statistical studies of natural language data to use measures of lexical association, such as the information-theoretic measure of mutual information, to extract useful relationships between words (e.g. [Church et al., 1989; Church and Hanks, 1989; Hindle, 1990]). For example, [Hindle, 1990] uses an estimate of mutual information to calculate what nouns a verb can take as its subjects and objects, based on distributions found within a large corpus of naturally occurring text.

Lexical association has its limits, however, since oftentimes either the data are insufficient to provide reliable word/word correspondences, or a task requires more abstraction than word/word correspondences permit. In this paper I present a generalization of lexical association techniques that addresses these limitations by facilitating statistical discovery of facts involving word classes rather than individual words. Although defining association measures over classes (as sets of words) is straightforward in theory, making direct use of such a definition is impractical because there are simply too many classes to consider. Rather than considering all possible classes, I propose constraining the set of possible word classes by using WordNet [Beckwith et al., 1991; Miller, 1990], a broad-coverage lexical/conceptual hierarchy.

Section 1 very briefly reviews mutual information as it is commonly used to discover lexical associations, and Section 2 discusses some limitations of this approach. In Section 3 I apply mutual information to word/class relationships, and propose a knowledge-based interpretation of "class" that takes advantage of the WordNet taxonomy. In Section 4 I apply the technique to a problem involving verb argument relationships, and in Sections 5 and 6 I mention some related work and suggest further applications for which such class-based investigations might prove useful.

* I would like to acknowledge the support of an IBM graduate fellowship, and to thank Eric Brill, Aravind Joshi, David Magerman, Christine Nakatani, and Michael Niv for their helpful comments. This research was also supported in part by the following grants: ARO DAAL03-89-C-0031, DARPA N00014-90-J-1863, NSF IRI 90-16592, and Ben Franklin 91S.3078C-1.

Word/Word Relationships

Mutual information is an information-theoretic measure of association frequently used with natural language data to gauge the "relatedness" between two words x and y. The mutual information of x and y is defined as follows:

    I(x; y) = log [ Pr(x, y) / (Pr(x) Pr(y)) ]    (1)

Intuitively, the probability of seeing x and y together, Pr(x, y), gives some idea as to how related they are. However, if x and y are both very common, then it is likely that they appear together frequently simply by chance and not as a result of any relationship between them. In order to correct for this possibility, Pr(x, y) is divided by Pr(x)Pr(y), which is the probability that x and y would have of appearing together by chance if they were independent. Taking the logarithm of this ratio gives mutual information some desirable properties; for example, its value is respectively positive, zero, or negative according to whether x and y appear together more frequently, as frequently, or less frequently than one would expect if they were independent.[1]

[1] The definition of mutual information is also related to other information-theoretic notions such as relative entropy; see [Cover and Thomas, 1991] for a clear discussion of these and related topics. See [Church et al., 1991] for a discussion of mutual information and other statistics as applied to lexical analysis.
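As a concrete illustration of how equation (1) can be estimated in practice, here is a minimal Python sketch. It is not from the paper: it assumes simple maximum-likelihood probability estimates from raw counts, and the function name and arguments are illustrative.

```python
import math

def mutual_information(count_xy, count_x, count_y, n_pairs, n_words):
    """Estimate I(x;y) = log2( Pr(x,y) / (Pr(x)Pr(y)) ) from raw counts,
    using simple maximum-likelihood probability estimates."""
    if count_xy == 0:
        return float("-inf")        # pair never observed together
    p_xy = count_xy / n_pairs       # joint probability of the pair
    p_x = count_x / n_words         # marginal probability of x
    p_y = count_y / n_words         # marginal probability of y
    return math.log2(p_xy / (p_x * p_y))

# Interpretation: a score near 6 means the pair is seen about 2**6 = 64
# times more often than expected if x and y were independent; a score
# near 0 means roughly chance-level co-occurrence.
```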
Let us consider two examples of how mutual information can be used. [Church and Hanks, 1989] define the association ratio between two words as a variation on mutual information in which word y must follow word x within a window of w words. The association ratio is then used to discover "interesting associations" within a large corpus of naturally occurring text -- in this case, the 1987 Associated Press corpus. Table 1 gives some of the word pairs found in this manner. An application of such information is the discovery of interesting lexico-syntactic regularities. Church and Hanks write:

    ... We believe the association ratio can also be used to search for interesting lexico-syntactic relationships between verbs and typical arguments/adjuncts .... [For example, i]t happens that set ... off was found 177 times in the 1987 AP Corpus of approximately 15 million words ... Quantitatively, I(set; off) = 5.9982, indicating that the probability of set ... off is almost 64 times greater than chance.

    Association ratio   x          y
    11.3                honorary   doctor
    11.3                doctors    dentists
    10.7                doctors    nurses
    9.4                 doctors    treating
    9.0                 examined   doctor
    8.9                 doctors    treat
    8.7                 doctor     bills

    Table 1: Some interesting associations with "Doctor" (data from Church and Hanks 1989)

As a second example, consider Hindle's [1990] application of mutual information to the discovery of predicate argument relations. Unlike Church and Hanks, who restrict themselves to surface distributions of words, Hindle investigates word co-occurrences as mediated by syntactic structure. A six-million-word sample of Associated Press news stories was parsed in order to construct a collection of subject/verb/object instances. On the basis of these data, Hindle calculates a co-occurrence score (an estimate of mutual information) for verb/object pairs and verb/subject pairs. Table 2 shows some of the verb/object pairs for the verb drink that occurred more than once, ranked by co-occurrence score, "in effect giving the answer to the question 'what can you drink?'" [Hindle, 1990, p. 270].

    Co-occurrence score   verb    object
    11.75                 drink   tea
    11.75                 drink   Pepsi
    11.75                 drink   champagne
    10.53                 drink   liquid
    10.20                 drink   beer
    9.34                  drink   wine

    Table 2: High-scoring verb/object pairs for drink (a portion of Hindle 1990, Table 2)
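To make the kind of ranking shown in Table 2 concrete, the sketch below scores (verb, object) pairs extracted from parsed text and ranks the objects of a given verb. It is in the spirit of Hindle's co-occurrence score but is not his estimator (his counting and smoothing details are omitted); the function name, parameters, and the `parsed_pairs` variable in the comment are illustrative assumptions.

```python
import math
from collections import Counter

def rank_objects(pairs, verb, min_count=2):
    """Rank the object nouns of `verb` by a mutual-information-style score,
    keeping only pairs seen at least `min_count` times.
    `pairs` is a list of (verb, object) tuples taken from parsed sentences."""
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    obj_counts = Counter(o for _, o in pairs)
    n = len(pairs)
    scored = []
    for (v, o), c in pair_counts.items():
        if v != verb or c < min_count:
            continue
        score = math.log2((c / n) / ((verb_counts[v] / n) * (obj_counts[o] / n)))
        scored.append((score, o, c))
    return sorted(scored, reverse=True)

# Example: rank_objects(parsed_pairs, "drink") would answer, in effect,
# "what can you drink?" for whatever corpus produced `parsed_pairs`.
```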
Word/Word Limitations

The application of mutual information at the level of word pairs suffers from two obvious limitations:

  • the corpus may fail to provide sufficient information about relevant word/word relationships, and
  • word/word relationships, even those supported by the data, may not be the appropriate relationships to look at for some tasks.

Let us consider these limitations in turn.

First, as in all statistical applications, it must be possible to estimate probabilities accurately. Although larger and larger corpora are increasingly available, the specific task under consideration often can restrict the choice of corpus to one that provides a smaller sample than necessary to discover all the lexical relationships of interest. This can lead some lexical relationships to go unnoticed.

For example, the Brown Corpus [Francis and Kucera, 1982] has the attractive (and for some tasks, necessary) property of providing a sample that is balanced across many genres. In an experiment using the Brown Corpus, modelled after Hindle's [1990] investigation of verb/object relationships, I calculated the mutual information between verbs and the nouns that appeared as their objects.[2] Table 3 shows objects of the verb open; as in Table 2, the listing includes only verb/object pairs that were encountered more than once.

[2] The experiment differed from that described in [Hindle, 1990] primarily in the way the direct object of the verb was identified; see Section 4 for details.

    Co-occurrence score   verb   object      count
    5.8470                open   closet      2
    5.8470                open   chapter     2
    5.7799                open   mouth       7
    5.5840                open   window      5
    5.4320                open   store       2
    5.4045                open   door        26
    5.0170                open   scene       3
    4.6246                open   season      2
    4.2621                open   church      2
    3.6246                open   minute      2
    3.2621                open   eye         7
    3.1101                open   statement   2
    1.5991                open   time        2
    1.3305                open   way         3

    Table 3: Verb/object pairs for open (with count > 1)

Attention to the verb/object pairs that occurred only once, however, led to an interesting observation. Included among the "discarded" object nouns was the following set: discourse, engagement, reply, program, conference, and session. Although each of these appeared as the object of open only once -- certainly too infrequently to provide reliable estimates -- this set, considered as a whole, reveals an interesting fact about some kinds of things that can be opened, roughly captured by the notion of communications. More generally, several pieces of statistically unreliable information at the lexical level may nonetheless capture a useful statistical regularity when combined. This observation motivates an approach to lexical association that makes such combinations possible.

A second limitation of the word/word relationship is simply this: some tasks are not amenable to treatment using lexical relationships alone. An example is the automatic discovery of verb selectional preferences for natural language systems. Here, the relationship of interest holds not between a verb and a noun, but between the verb and a class of nouns (e.g. between eat and nouns that are [+EDIBLE]). Given a table built using lexical statistics, such as Table 2 or 3, no single lexical item necessarily stands out.

Coherent Classes

The search for a verb's most typical object nouns requires at most |N| evaluations of the association function, where N is the set of nouns, and can thus be done exhaustively. An exhaustive search among object classes is impractical, however, since the number of classes is exponential. Clearly some way to constrain the search is needed.
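Before turning to WordNet, the following sketch shows the counting idea behind the class-based generalization motivated above: pooling the counts of individually rare object nouns into a single verb/class association. The hand-picked noun set, the estimator, and the names are illustrative assumptions only; the paper's actual formulation, which draws its candidate classes from the WordNet noun hierarchy, is developed in Section 3.

```python
import math
from collections import Counter

# Hand-picked stand-in for a noun class; in the paper, candidate classes
# come from the WordNet noun hierarchy rather than from a manual list.
COMMUNICATION = {"discourse", "engagement", "reply",
                 "program", "conference", "session"}

def verb_class_score(pairs, verb, noun_class):
    """Pool counts over a class of nouns and compute one verb/class
    association score, so that objects seen only once can still
    contribute to a reliable class-level statistic."""
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    obj_counts = Counter(o for _, o in pairs)
    n = len(pairs)
    joint = sum(c for (v, o), c in pair_counts.items()
                if v == verb and o in noun_class)
    class_total = sum(c for o, c in obj_counts.items() if o in noun_class)
    if joint == 0 or class_total == 0:
        return float("-inf")
    return math.log2((joint / n) / ((verb_counts[verb] / n) * (class_total / n)))
```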