Semantic Structure

Total Pages: 16

File Type: PDF, Size: 1020 KB

The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Semantic Structure-Based Word Embedding by Incorporating Concept Convergence and Word Divergence

Qian Liu,1,2 Heyan Huang,1,3∗ Guangquan Zhang,2 Yang Gao,1,4 Junyu Xuan,2 Jie Lu2
1. Department of Computer Science, Beijing Institute of Technology, China
2. DeSI Lab, Centre for Artificial Intelligence, University of Technology Sydney, Australia
3. Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, China
4. Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, China
∗Corresponding author
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Representing the semantics of words is a fundamental task in text processing. Several research studies have shown that text and knowledge bases (KBs) are complementary sources for word embedding learning. Most existing methods only consider relationships within word pairs in their usage of KBs. We argue that the structural information of well-organized words within the KBs is able to convey more effective and stable knowledge for capturing the semantics of words. In this paper, we propose a semantic structure-based word embedding method, and introduce concept convergence and word divergence to reveal semantic structures in the word embedding learning process. To assess the effectiveness of our method, we use WordNet for training and conduct extensive experiments on word similarity, word analogy, text classification and query expansion. The experimental results show that our method outperforms state-of-the-art methods, including methods trained solely on the corpus and methods trained on both the corpus and the KBs.

[Figure 1: the example sentences and WordNet subgraphs shown in the figure are not recoverable from this extraction. Caption: We mark two types of words in the sentences with blue color and red color, respectively. The underlined words are their context in the corpus. The graphs on the right are their semantic structures generated from WordNet.]

Introduction

Understanding and representing the sense of text is a fundamental task in both information retrieval (IR) and natural language processing (NLP). Previous research has expended great effort on constructing distributed representations of words (also known as word embeddings) as the atomic components of text by embedding the semantic and syntactic properties of the surface text into low-dimensional dense vectors. Trained word embeddings have achieved overwhelming success in various real-world applications, e.g., document retrieval (Bengio, Courville, and Vincent 2013; Passalis and Tefas 2016; Roy et al. 2016), text classification (Lampos, Zou, and Cox 2017), question answering (Shen et al. 2017), and sentiment classification (Bollegala, Mu, and Goulermas 2016).

Most of this research directs attention entirely towards learning word representations from a large unlabeled corpus, using either prediction-based methods (Collobert et al. 2011; Mikolov et al. 2013b; 2013a; Cao and Lu 2017), which learn word representations by predicting the co-occurrence of words in a given context, or counting-based methods (Pennington, Socher, and Manning 2014), which learn word representations through global matrix factorization based on counts of co-occurring words. These corpus-based methods mainly consider a word's co-occurrence information and, therefore, generally learn similar embeddings for words with similar contexts.

In the past few years, some efforts have focused on learning word representations beyond the corpus by considering external knowledge bases constructed by human experts, such as semantic lexicons and concept graphs (Ponzetto and Navigli 2010; Bollegala, Maehara, and Kawarabayashi 2015; Liu et al. 2015a; Bollegala et al. 2016; Goikoetxea, Agirre, and Soroa 2016). Most previously proposed methods simply use relations within word pairs, e.g., constraining words to belong to one semantic category (Yu and Dredze 2014) or constructing a regularizer to model words in particular semantic relations (Bollegala et al. 2016). As such, this work did not fully explore the comprehensive structures in the KBs.

In this paper, we argue that effective word embeddings should contain the semantic structures within the knowledge base. We illustrate how the semantic structures can be a complementary source for word embeddings in Fig. 1. As shown in the sentences, football, basketball, trainer, and coach usually share similar contexts and therefore tend to receive similar representations from corpus-based methods. The semantic structures on the right side, in contrast, clearly define these words at different semantic granularities and abstractions: the four words are located in two different subgraphs, showing that they belong to different concepts, and football and basketball are not directly linked in their subgraph, showing that they hold different attributes. On the other hand, compared with relations in word pairs, comprehensively modeling a word's structural features with its directly and indirectly linked words in the KBs could be more stable and reliable (Xuan et al. 2016; 2017).

To this end, we propose a semantic structure-based word embedding method called SENSE, and introduce concept convergence and word divergence to implement semantic structure modeling in the word embedding learning process. The basic idea can be intuitively explained as follows: football and basketball are both related to ball (concept convergence), but they also hold different attributes since they are only indirectly linked in the graph (word divergence). We evaluate our word embedding method using extensive intrinsic and extrinsic evaluations. The experimental results show that modeling semantic structures in the knowledge base by incorporating concept convergence and word divergence makes embeddings significantly more powerful, and results in consistent performance improvements across real-world applications.

This paper departs from previous work in that it explores the global structural information of words in the knowledge base, not the local relations that exist between two words. The main contributions can be summarized as follows:
• We design a novel approach for learning word embeddings that considers the relatively stable and reliable semantic structures within the KBs.
• We design the principle of preserving semantic structures by converging words to their concept on the upper level and diverging words on the same sense level. We show that this principle is effective and easy to implement in the word embedding training process.
• To validate this method, we conducted extensive experiments on semantic property testing, document retrieval, and text classification. The experimental results show that the proposed method significantly outperforms the state-of-the-art methods.

Related Work

Retrofitting (Faruqui et al. 2015) refines word representations using relational information from semantic lexicons; the method encourages linked words to have similar vector representations. Johansson et al. (2015) presented a method to embed a semantic network, which consists of linked word senses, into a pre-trained continuous-vector word space, considering that vectors for polysemous words can be decomposed into a convex combination of sense vectors and that the vector for a sense is kept similar to those of its neighbors in the network. Goikoetxea et al. (2016) learned word representations from text and WordNet independently, and then explored both simple and sophisticated methods to combine them, showing that a simple concatenation of independently learned embeddings outperforms more complex combination techniques on word similarity and relatedness datasets.

In contrast to the combined methods, several studies have jointly leveraged semantic lexicons and corpus-based methods. The RCM method (Yu and Dredze 2014) is a relation constrained model which introduces a training objective that incorporates both a neural language model objective and a semantic knowledge objective; in RCM, the knowledge base functions as word similarity information to improve the performance of word embedding. Xu et al. (2014) leveraged both relational and categorical knowledge to produce word representations (RC-NET), combining this with the Skip-gram method. Liu et al. (2015a) represent semantic knowledge as a number of ordinal similarity inequalities over related word pairs to learn semantic word embeddings (SWE). Bollegala et al. (2015) proposed a method that learns word representations by considering the semantic relations in which words co-occur, and Bollegala et al. (2016) also proposed a joint word representation learning method that simultaneously predicts the co-occurrences of two words in a sentence, subject to the relational constraints given by a semantic lexicon. Although these studies consider semantic information from an external knowledge base in the learning process, they do not leverage high-quality semantic structures to improve word embeddings. Our work in this paper can be categorized as a joint learning method that incorporates both co-occurrence information and semantic structures.
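
The excerpt above describes the convergence/divergence principle but does not include its objective function. The following is a rough, editor-added sketch (not the authors' SENSE implementation) of how such terms could be attached to a corpus-based loss; the lookup tables `concept_of` and `siblings_of`, the margin, and the weights are hypothetical stand-ins for structures that would be read from WordNet.

```python
# Illustrative sketch of adding "concept convergence" and "word divergence"
# terms to a skip-gram style objective. Not the authors' released code.
import torch
import torch.nn.functional as F

def structure_loss(word_vecs, concept_vecs, word_ids, concept_of, siblings_of,
                   margin=0.5, alpha=1.0, beta=1.0):
    """Pull each word toward its upper-level concept (convergence) and push
    apart words on the same sense level that are only indirectly linked
    (divergence)."""
    converge, diverge = 0.0, 0.0
    for w in word_ids:
        wv = word_vecs[w]
        cv = concept_vecs[concept_of[w]]
        # concept convergence: a word should stay close to its concept vector
        converge = converge + (1.0 - F.cosine_similarity(wv, cv, dim=0))
        for s in siblings_of[w]:
            # word divergence: indirect siblings kept at least `margin` apart
            diverge = diverge + F.relu(F.cosine_similarity(wv, word_vecs[s], dim=0) - margin)
    return alpha * converge + beta * diverge

# Toy usage: 4 words (football, basketball, trainer, coach), 2 concepts (ball, person)
word_vecs = torch.nn.Parameter(torch.randn(4, 50))
concept_vecs = torch.nn.Parameter(torch.randn(2, 50))
concept_of = {0: 0, 1: 0, 2: 1, 3: 1}
siblings_of = {0: [1], 1: [0], 2: [3], 3: [2]}
loss = structure_loss(word_vecs, concept_vecs, [0, 1, 2, 3], concept_of, siblings_of)
loss.backward()  # in practice this term would be added to the usual corpus loss
```
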
Recommended publications
  • Semantic Relatedness and Taxonomic Word Embeddings (arXiv:2002.06235v1)
Semantic Relatedness and Taxonomic Word Embeddings
Magdalena Kacmajor (Innovation Exchange, IBM Ireland), John D. Kelleher (ADAPT Research Centre, Technological University Dublin), Filip Klubička (ADAPT Research Centre, Technological University Dublin), Alfredo Maldonado (Underwriters Laboratories)

Abstract
This paper connects a series of papers dealing with taxonomic word embeddings. It begins by noting that there are different types of semantic relatedness and that different lexical representations encode different forms of relatedness. A particularly important distinction within semantic relatedness is that of thematic versus taxonomic relatedness. Next, we present a number of experiments that analyse taxonomic embeddings that have been trained on a synthetic corpus that has been generated via a random walk over a taxonomy. These experiments demonstrate how the properties of the synthetic corpus, such as the percentage of rare words, are affected by the shape of the knowledge graph the corpus is generated from. Finally, we explore the interactions between the relative sizes of natural and synthetic corpora on the performance of embeddings when taxonomic and thematic embeddings are combined.

1 Introduction
Deep learning has revolutionised natural language processing over the last decade. A key enabler of deep learning for natural language processing has been the development of word embeddings. One reason for this is that deep learning intrinsically involves the use of neural network models and these models only work with numeric inputs. Consequently, applying deep learning to natural language processing first requires developing a numeric representation for words. Word embeddings provide a way of creating numeric representations that have proven to have a number of advantages over traditional numeric representations of language.
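
A minimal sketch of the random-walk corpus generation described in this excerpt, under the assumption that the taxonomy is available as an adjacency list; the toy graph, walk length, and sentence count are illustrative, not the paper's settings.

```python
# Generate a synthetic "taxonomic" corpus by random walks over a taxonomy graph.
import random

# Undirected toy taxonomy: each node lists its neighbours (hypernyms/hyponyms).
taxonomy = {
    "entity": ["animal", "artifact"],
    "animal": ["entity", "dog", "cat"],
    "artifact": ["entity", "ball"],
    "dog": ["animal"], "cat": ["animal"], "ball": ["artifact"],
}

def random_walk_corpus(graph, n_sentences=1000, walk_len=10, seed=0):
    rng = random.Random(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(n_sentences):
        node = rng.choice(nodes)            # start each pseudo-sentence at a random node
        sentence = [node]
        for _ in range(walk_len - 1):
            node = rng.choice(graph[node])  # hop to a random neighbour
            sentence.append(node)
        corpus.append(" ".join(sentence))
    return corpus

synthetic = random_walk_corpus(taxonomy)
print(synthetic[0])
# The synthetic sentences can then be fed to any word-embedding trainer, alone
# or mixed with a natural corpus, to obtain taxonomic embeddings.
```
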
  • Combining Word Embeddings with Semantic Resources
AutoExtend: Combining Word Embeddings with Semantic Resources
Sascha Rothe (LMU Munich), Hinrich Schütze (LMU Munich)

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings that incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The obtained embeddings live in the same vector space as the input word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet, GermaNet, and Freebase as semantic resources. AutoExtend achieves state-of-the-art performance on Word-in-Context Similarity and Word Sense Disambiguation tasks.

1. Introduction
Unsupervised methods for learning word embeddings are widely used in natural language processing (NLP). The only data these methods need as input are very large corpora. However, in addition to corpora, there are many other resources that are undoubtedly useful in NLP, including lexical resources like WordNet and Wiktionary and knowledge bases like Wikipedia and Freebase. We will simply refer to these as resources. In this article, we present AutoExtend, a method for enriching these valuable resources with embeddings for non-word objects they describe; for example, AutoExtend enriches WordNet with embeddings for synsets. The word embeddings and the new non-word embeddings live in the same vector space. Many NLP applications benefit if non-word objects described by resources, such as synsets in WordNet, are also available as embeddings.
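
AutoExtend itself learns synset embeddings through an encoding/decoding constraint with a sparse tensor formalization, which is more involved than can be shown here. As a much simpler stand-in for the core idea that synsets receive vectors in the same space as words, the sketch below just averages the pre-trained embeddings of a synset's member words; the vectors and synset membership are toy values.

```python
# Simplified stand-in for placing synsets in the word embedding space:
# a synset vector is the mean of its member words' vectors.
import numpy as np

word_vecs = {                               # pre-trained word embeddings (toy, 4-dim)
    "coach":   np.array([0.9, 0.1, 0.0, 0.2]),
    "trainer": np.array([0.8, 0.2, 0.1, 0.1]),
    "bus":     np.array([0.0, 0.9, 0.7, 0.0]),
}
synset_members = {                          # which words lexicalize each (toy) synset
    "coach.n.01": ["coach", "trainer"],     # the "instructor" sense
    "coach.n.02": ["coach", "bus"],         # the "vehicle" sense
}

synset_vecs = {
    s: np.mean([word_vecs[w] for w in members], axis=0)
    for s, members in synset_members.items()
}
for s, v in synset_vecs.items():
    print(s, np.round(v, 2))
```
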
  • Probabilistic Topic Modelling with Semantic Graph
Probabilistic Topic Modelling with Semantic Graph
Long Chen, Joemon M. Jose, Haitao Yu, Fajie Yuan, and Huaizhi Zhang
School of Computing Science, University of Glasgow, Sir Alwyns Building, Glasgow, UK

Abstract. In this paper we propose a novel framework, topic model with semantic graph (TMSG), which couples topic model with the rich knowledge from DBpedia. To begin with, we extract the disambiguated entities from the document collection using a document entity linking system, i.e., DBpedia Spotlight, from which two types of entity graphs are created from DBpedia to capture local and global contextual knowledge, respectively. Given the semantic graph representation of the documents, we propagate the inherent topic-document distribution with the disambiguated entities of the semantic graphs. Experiments conducted on two real-world datasets show that TMSG can significantly outperform the state-of-the-art techniques, namely, author-topic model (ATM) and topic model with biased propagation (TMBP).

Keywords: Topic model · Semantic graph · DBpedia

1 Introduction
Topic models, such as Probabilistic Latent Semantic Analysis (PLSA) [7] and Latent Dirichlet Allocation (LDA) [2], have been remarkably successful in analyzing textual content. Specifically, each document in a document collection is represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. Such a paradigm is widely applied in various areas of text mining. In view of the fact that the information used by these models is limited to the document collection itself, some recent progress has been made on incorporating external resources, such as time [8], geographic location [12], and authorship [15], into topic models.
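
A generic graph-propagation step in the spirit of "propagating the topic-document distribution with the disambiguated entities": documents that share DBpedia entities smooth each other's topic distributions. The adjacency matrix, mixing weight, and sizes are illustrative, and the actual TMSG inference couples such propagation with the topic model itself.

```python
# Smooth document-topic distributions over a document graph built from shared entities.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_topics = 5, 3

theta = rng.dirichlet(np.ones(n_topics), size=n_docs)     # document-topic distributions
A = rng.integers(0, 2, size=(n_docs, n_docs))              # doc-doc links via shared entities (toy)
A = np.maximum(A, A.T); np.fill_diagonal(A, 0)

# Row-normalize the adjacency so each document averages over its neighbours.
row_sums = A.sum(axis=1, keepdims=True)
P = np.divide(A, row_sums, out=np.zeros_like(A, dtype=float), where=row_sums > 0)

alpha = 0.3                                                 # neighbour influence
theta_smoothed = (1 - alpha) * theta + alpha * P @ theta
theta_smoothed /= theta_smoothed.sum(axis=1, keepdims=True)
print(np.round(theta_smoothed, 3))
```
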
  • Semantic Memory: a Review of Methods, Models, and Current Challenges
Psychonomic Bulletin & Review, https://doi.org/10.3758/s13423-020-01792-x
Semantic memory: A review of methods, models, and current challenges
Abhilasha A. Kumar

Abstract
Adult semantic memory has been traditionally conceptualized as a relatively static memory system that consists of knowledge about the world, concepts, and symbols. Considerable work in the past few decades has challenged this static view of semantic memory, and instead proposed a more fluid and flexible system that is sensitive to context, task demands, and perceptual and sensorimotor information from the environment. This paper (1) reviews traditional and modern computational models of semantic memory, within the umbrella of network (free association-based), feature (property generation norms-based), and distributional semantic (natural language corpora-based) models, (2) discusses the contribution of these models to important debates in the literature regarding knowledge representation (localist vs. distributed representations) and learning (error-free/Hebbian learning vs. error-driven/predictive learning), and (3) evaluates how modern computational models (neural network, retrieval-based, and topic models) are revisiting the traditional "static" conceptualization of semantic memory and tackling important challenges in semantic modeling such as addressing temporal, contextual, and attentional influences, as well as incorporating grounding and compositionality into semantic representations. The review also identifies new challenges
  • Word-Like Character N-Gram Embedding
Word-like character n-gram embedding
Geewook Kim, Kazuki Fukui and Hidetoshi Shimodaira
Department of Systems Science, Graduate School of Informatics, Kyoto University; Mathematical Statistics Team, RIKEN Center for Advanced Intelligence Project

[Table 1, whose contents are garbled in this extraction. Caption: Top-10 2-grams in Sina Weibo and 4-grams in Japanese Twitter (Experiment 1). Words are indicated by boldface and space characters are marked by ␣.]

Abstract
We propose a new word embedding method called word-like character n-gram embedding, which learns distributed representations of words by embedding word-like character n-grams. Our method is an extension of the recently proposed segmentation-free word embedding, which directly embeds frequent character n-grams from a raw corpus. However, its n-gram vocabulary tends to contain too many non-word n-grams. We solved this problem by introducing an idea of expected word frequency. Compared to the previously proposed methods, our method can embed more words, along with the words that are not included in a given basic word dictionary. Since our method does not rely on word segmentation with rich word dictionaries, it is especially effective when the …

…segmentation tools are used to determine word boundaries in the raw corpus. However, these segmenters require rich dictionaries for accurate segmentation, which are expensive to prepare and not always available.
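
A sketch of the frequent character n-gram vocabulary that segmentation-free embedding starts from; the paper's actual contribution, the expected-word-frequency filter, is not reproduced here, and the corpus and thresholds are toy values.

```python
# Build a character n-gram vocabulary directly from raw, unsegmented text.
from collections import Counter

corpus = "ありがとうございます ありがとう フォローありがとうございます"
text = corpus.replace(" ", "␣")            # mark spaces, as in the paper's examples

def char_ngrams(s, n_min=2, n_max=4):
    for n in range(n_min, n_max + 1):
        for i in range(len(s) - n + 1):
            yield s[i:i + n]

counts = Counter(char_ngrams(text))
min_freq = 2
vocab = [g for g, c in counts.most_common() if c >= min_freq]
print(vocab[:10])    # frequent n-grams such as "ありがと", "とうござ", ...
# Each n-gram in `vocab` would then get its own embedding, trained skip-gram
# style on the raw corpus; the paper's filter prunes n-grams unlikely to be words.
```
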
  • Knowledge Graphs on the Web – an Overview (arXiv:2003.00719v3 [cs.AI])
Knowledge Graphs on the Web – an Overview
Nicolas Heist, Sven Hertling, Daniel Ringler, and Heiko Paulheim
Data and Web Science Group, University of Mannheim, Germany

Abstract. Knowledge Graphs are an emerging form of knowledge representation. While Google coined the term Knowledge Graph first and promoted it as a means to improve their search results, they are used in many applications today. In a knowledge graph, entities in the real world and/or a business domain (e.g., people, places, or events) are represented as nodes, which are connected by edges representing the relations between those entities. While companies such as Google, Microsoft, and Facebook have their own, non-public knowledge graphs, there is also a larger body of publicly available knowledge graphs, such as DBpedia or Wikidata. In this chapter, we provide an overview and comparison of those publicly available knowledge graphs, and give insights into their contents, size, coverage, and overlap.

Keywords. Knowledge Graph, Linked Data, Semantic Web, Profiling

1. Introduction
Knowledge Graphs are increasingly used as means to represent knowledge. Due to their versatile means of representation, they can be used to integrate different heterogeneous data sources, both within as well as across organizations. [8,9] Besides such domain-specific knowledge graphs which are typically developed for specific domains and/or use cases, there are also public, cross-domain knowledge graphs encoding common knowledge, such as DBpedia, Wikidata, or YAGO. [33] Such knowledge graphs may be used, e.g., for automatically enriching data with background knowledge to be used in knowledge-intensive downstream applications.
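
One concrete way to inspect the public knowledge graphs surveyed here is to query DBpedia's SPARQL endpoint. The sketch below assumes the SPARQLWrapper package and network access; the chosen entity and the LIMIT are arbitrary examples.

```python
# Query DBpedia's public SPARQL endpoint for a few facts about one entity.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Association_football> ?p ?o }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["p"]["value"], "->", row["o"]["value"])
```
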
  • Learned in Speech Recognition: Contextual Acoustic Word Embeddings
LEARNED IN SPEECH RECOGNITION: CONTEXTUAL ACOUSTIC WORD EMBEDDINGS
Shruti Palaskar, Vikas Raunak and Florian Metze
Carnegie Mellon University, Pittsburgh, PA, U.S.A.

Abstract
End-to-end acoustic-to-word speech recognition models have recently gained popularity because they are easy to train, scale well to large amounts of training data, and do not require a lexicon. In addition, word models may also be easier to integrate with downstream tasks such as spoken language understanding, because inference (search) is much simplified compared to phoneme, character or any other sort of sub-word units. In this paper, we describe methods to construct contextual acoustic word embeddings directly from a supervised sequence-to-sequence acoustic-to-word speech recognition model using the learned attention distribution. On a suite of 16 standard sentence evaluation tasks, our embeddings show competitive performance against a word2vec model trained on the …

…model [10, 11, 12] trained for direct Acoustic-to-Word (A2W) speech recognition [13]. Using this model, we jointly learn to automatically segment and classify input speech into individual words, hence getting rid of the problem of chunking or requiring pre-defined word boundaries. As our A2W model is trained at the utterance level, we show that we can not only learn acoustic word embeddings, but also learn them in the proper context of their containing sentence. We also evaluate our contextual acoustic word embeddings on a spoken language understanding task, demonstrating that they can be useful in non-transcription downstream tasks. Our main contributions in this paper are the following: 1. We demonstrate the usability of attention not only for aligning words to acoustic frames without any forced alignment but also for constructing Contextual Acoustic Word Embeddings (CAWE).
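
A minimal sketch of the embedding-construction step described above: given a trained attention-based A2W model, a word's contextual acoustic embedding is the attention-weighted combination of encoder frame states. The arrays below are random stand-ins for a trained model's outputs, and the final aggregation across occurrences is one simple option, not necessarily the paper's exact choice.

```python
# Contextual acoustic word embedding as an attention-weighted sum of encoder frames.
import numpy as np

rng = np.random.default_rng(0)
n_frames, hidden = 120, 256

encoder_states = rng.normal(size=(n_frames, hidden))   # acoustic frame representations
logits = rng.normal(size=n_frames)                      # attention scores for one output word
attn = np.exp(logits - logits.max())
attn /= attn.sum()                                      # softmax over frames

cawe = attn @ encoder_states                            # contextual acoustic word embedding
print(cawe.shape)                                       # (256,)
# Averaging such vectors over all occurrences of a word type is one way to get a
# single embedding per word, comparable against text-based word2vec vectors.
```
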
  • Large Semantic Network Manual Annotation 1 Introduction
Large Semantic Network Manual Annotation
Václav Novák
Institute of Formal and Applied Linguistics, Charles University, Prague

Abstract
This abstract describes a project aiming at manual annotation of the content of natural language utterances in parallel text corpora. The formalism used in this project is MultiNet – Multilayered Extended Semantic Network. The annotation should be incorporated into the Prague Dependency Treebank as a new annotation layer.

1 Introduction
A formal specification of the semantic content is the aim of numerous semantic approaches such as TIL [6], DRT [9], MultiNet [4], and others. As far as we can tell, there is no large "real life" text corpus manually annotated with such markup. The projects usually work only with automatically generated annotation, if any [1, 6, 3, 2]. We want to create a parallel Czech-English corpus of texts annotated with the corresponding semantic network.

1.1 Prague Dependency Treebank
From the linguistic viewpoint, there are language resources, such as the Prague Dependency Treebank (PDT), which contain a deep manual analysis of texts [8]. PDT contains annotations on three layers, namely morphological, analytical (shallow dependency syntax) and tectogrammatical (deep dependency syntax). The units of each annotation level are linked with corresponding units on the preceding level. The morphological units are linked directly with the original text. The theoretical basis of the treebank lies in the Functional Generative Description of language system [7]. PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted for the current computational-linguistics research needs. The corpus itself is embedded into the latest annotation technology.
  • Knowledge-Powered Deep Learning for Word Embedding
Knowledge-Powered Deep Learning for Word Embedding
Jiang Bian, Bin Gao, and Tie-Yan Liu
Microsoft Research, {jibian,bingao,tyliu}@microsoft.com

Abstract. The basis of applying deep learning to solve natural language processing tasks is to obtain high-quality distributed representations of words, i.e., word embeddings, from large amounts of text data. However, text itself usually contains incomplete and ambiguous information, which makes it necessary to leverage extra knowledge to understand it. Fortunately, text itself already contains well-defined morphological and syntactic knowledge; moreover, the large amount of text on the Web enables the extraction of plenty of semantic knowledge. Therefore, it makes sense to design novel deep learning algorithms and systems in order to leverage the above knowledge to compute more effective word embeddings. In this paper, we conduct an empirical study on the capacity of leveraging morphological, syntactic, and semantic knowledge to achieve high-quality word embeddings. Our study explores these types of knowledge to define a new basis for word representation, provide additional input information, and serve as auxiliary supervision in deep learning, respectively. Experiments on an analogical reasoning task, a word similarity task, and a word completion task have all demonstrated that knowledge-powered deep learning can enhance the effectiveness of word embedding.

1 Introduction
With the rapid development of deep learning techniques in recent years, increasing attention has been drawn to training complex and deep models on large amounts of data, in order to solve a wide range of text mining and natural language processing (NLP) tasks [4, 1, 8, 13, 19, 20].
  • Universal Or Variation? Semantic Networks in English and Chinese
Universal or variation? Semantic networks in English and Chinese

Understanding the structures of semantic networks can provide great insights into lexico-semantic knowledge representation. Previous work reveals a small-world structure in English, a structure with the following properties: short average path lengths between words and strong local clustering, together with a scale-free degree distribution in which most nodes have few connections while a small number of nodes have many connections [1]. However, it is not clear whether such semantic network properties hold across human languages. In this study, we investigate the universal structures and cross-linguistic variations by comparing the semantic networks in English and Chinese.

Network description. To construct the Chinese and the English semantic networks, we used Chinese Open Wordnet [2,3] and English WordNet [4]. The two wordnets have different word forms in Chinese and English but common word meanings. Word meanings are connected not only to word forms, but also to other word meanings if they form relations such as hypernyms and meronyms (Figure 1).

1. Cross-linguistic comparisons
Analysis. The large-scale structures of the Chinese and the English networks were measured with two key network metrics, small-worldness [5] and scale-free distribution [6].
Results. The two networks have similar size and both exhibit small-worldness (Table 1). However, the small-worldness is much greater in the Chinese network (σ = 213.35) than in the English network (σ = 83.15); this difference is primarily due to the higher average clustering coefficient (ACC) of the Chinese network. The scale-free distributions are similar across the two networks, as indicated by ANCOVA, F(1, 48) = 0.84, p = .37.
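
A sketch of the two measurements reported above (small-worldness and the degree distribution), computed with networkx on a toy graph standing in for a wordnet-derived network; σ values depend on the random reference graphs, so exact numbers will differ from the study's.

```python
# Small-worldness and degree statistics for a toy semantic-network stand-in.
import networkx as nx

G = nx.connected_watts_strogatz_graph(n=200, k=6, p=0.1, seed=0)  # toy small-world graph

sigma = nx.sigma(G, niter=5, nrand=5, seed=0)        # sigma > 1 suggests small-worldness
acc = nx.average_clustering(G)                        # average clustering coefficient (ACC)
path_len = nx.average_shortest_path_length(G)
degrees = sorted((d for _, d in G.degree()), reverse=True)

print(f"sigma={sigma:.2f}, ACC={acc:.2f}, avg path length={path_len:.2f}")
print("top degrees:", degrees[:10])                   # inspect the tail for scale-free behaviour
```
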
  • Local Homology of Word Embeddings
Local Homology of Word Embeddings
Tadas Temčinas

Abstract
Topological data analysis (TDA) has been widely used to make progress on a number of problems. However, it seems that TDA application in natural language processing (NLP) is in its infancy. In this paper we try to bridge the gap by arguing why TDA tools are a natural choice when it comes to analysing word embedding data. We describe a parallelisable unsupervised learning algorithm based on local homology of datapoints and show some experimental results on word embedding data. We see that local homology of datapoints in word embedding data contains some information that can potentially be used to solve the word sense disambiguation problem.

1. Introduction
Intuitively, stratification is a decomposition of a topological space into manifold-like pieces. When thinking about stratification learning and word embeddings, it seems intuitive that vectors of words corresponding to the same broad topic would constitute a structure, which we might hope to be a manifold. Hence, for example, by looking at the intersections between those manifolds or singularities on the manifolds (both of which can be recovered using local homology-based algorithms (Nanda, 2017)) one might hope to find vectors of homonyms like 'bank' (which can mean either a river bank or a financial institution). This, in turn, has potential to help solve the word sense disambiguation (WSD) problem in NLP, which is pinning down a particular meaning of a word used in a sentence when the word has multiple meanings. In this work we present a clustering algorithm based on local homology, which is more relaxed than the stratification…
  • Word Embedding, Sense Embedding and Their Application to Word Sense Induction
Word Embeddings, Sense Embeddings and their Application to Word Sense Induction
Linfeng Song
The University of Rochester, Computer Science Department, Rochester, NY 14627
Area Paper, April 2016

Abstract
This paper investigates the cutting-edge techniques for word embedding, sense embedding, and our evaluation results on large-scale datasets. Word embedding refers to a family of methods that learn a distributed dense vector for each word in a vocabulary. Traditional word embedding methods first obtain the co-occurrence matrix and then perform dimension reduction with PCA. Recent methods use neural language models that directly learn word vectors by predicting the context words of the target word. Moving one step forward, sense embedding learns a distributed vector for each sense of a word. These methods either define a sense as a cluster of contexts where the target word appears or define a sense based on a sense inventory. To evaluate the performance of the state-of-the-art sense embedding methods, I first compare them on the dominant word similarity datasets, then compare them on my experimental settings. In addition, I show that sense embedding is applicable to the task of word sense induction (WSI). We are the first to show that sense embedding methods are competitive on WSI by building sense-embedding-based systems that demonstrate highly competitive performances on the SemEval 2010 WSI shared task. Finally, I propose several possible future research directions on word embedding and sense embedding. The University of Rochester Computer Science Department supported this work.

Contents: 1 Introduction; 2 Word Embedding; 2.1 Skip-gram Model …
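
A sketch of the "traditional" pipeline mentioned in the abstract: build a word-word co-occurrence matrix from a corpus, then reduce it with truncated SVD (a close relative of PCA) to obtain dense word vectors. The corpus, window size, and dimensionality are toy choices.

```python
# Co-occurrence counting followed by truncated SVD to get dense word vectors.
import numpy as np

corpus = [
    "the coach trains the team",
    "the trainer trains the players",
    "football and basketball are ball games",
]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

window = 2
C = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                C[idx[w], idx[sent[j]]] += 1          # count neighbours within the window

U, S, _ = np.linalg.svd(C, full_matrices=False)       # factorize the co-occurrence matrix
dim = 5
embeddings = U[:, :dim] * S[:dim]                     # each row is a word vector
print(embeddings[idx["coach"]])
```
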