The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Semantic Structure-Based Word Embedding by Incorporating Concept Convergence and Word Divergence

Qian Liu,1,2 Heyan Huang,1,3∗ Guangquan Zhang,2 Yang Gao,1,4 Junyu Xuan,2 Jie Lu2
1. Department of Computer Science, Beijing Institute of Technology, China
2. DeSI Lab, Centre for Artificial Intelligence, University of Technology Sydney, Australia
3. Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, China
4. Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, China
∗Corresponding author. Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Representing the semantics of words is a fundamental task in text processing. Several research studies have shown that text and knowledge bases (KBs) are complementary sources for word embedding learning. Most existing methods only consider relationships within word-pairs in their usage of KBs. We argue that the structural information of well-organized words within the KBs is able to convey more effective and stable knowledge for capturing the semantics of words. In this paper, we propose a semantic structure-based word embedding method and introduce concept convergence and word divergence to reveal semantic structures in the word embedding learning process. To assess the effectiveness of our method, we use WordNet for training and conduct extensive experiments on word similarity, word analogy, text classification and query expansion. The experimental results show that our method outperforms state-of-the-art methods, including methods trained solely on the corpus and others trained on both the corpus and the KBs.

Figure 1: We mark two types of words in the sentences in blue and red, respectively. The underlined words are their context in the corpus. The graphs on the right are their semantic structures generated from WordNet.

Introduction

Understanding and representing the sense of text is a fundamental task in both information retrieval (IR) and natural language processing (NLP). Previous research has expended great effort on constructing distributed representations of words (also known as word embeddings) as the atomic components of text, embedding the semantic and syntactic properties of the surface text into low-dimensional dense vectors. Trained word embeddings have achieved overwhelming success in various real-world applications, e.g., document retrieval (Bengio, Courville, and Vincent 2013; Passalis and Tefas 2016; Roy et al. 2016), text classification (Lampos, Zou, and Cox 2017), question answering (Shen et al. 2017), and sentiment classification (Bollegala, Mu, and Goulermas 2016).

Most of this research directs attention entirely towards learning word representations from a large unlabeled corpus, using either prediction-based methods (Collobert et al. 2011; Mikolov et al. 2013b; 2013a; Cao and Lu 2017), which learn word representations by predicting the co-occurrence of words in a given context, or counting-based methods (Pennington, Socher, and Manning 2014), which learn word representations through global matrix factorization over counts of co-occurring words. These corpus-based methods mainly consider a word's co-occurrence information and, therefore, generally learn similar embeddings for words with similar contexts.

In the past few years, some efforts have focused on learning word representations beyond the corpus by considering external knowledge bases constructed by human experts, such as semantic lexicons and concept graphs (Ponzetto and Navigli 2010; Bollegala, Maehara, and Kawarabayashi 2015; Liu et al. 2015a; Bollegala et al. 2016; Goikoetxea, Agirre, and Soroa 2016). Most previously proposed methods simply use relations within word-pairs, e.g., constraining words belonging to one semantic category (Yu and Dredze 2014) or constructing a regularizer to model words in particular semantic relations (Bollegala et al. 2016). As such, this work did not fully explore the comprehensive structures in the KBs.

In this paper, we argue that effective word embeddings should capture the semantic structures within the knowledge base. We illustrate how semantic structures can be a complementary source for word embeddings in Fig. 1. As shown in the sentences, football, basketball, trainer, and coach usually share similar contexts and thus tend to have similar representations under corpus-based methods. The semantic structures on the right, in contrast, clearly define these words at different semantic granularities and abstractions: the four words are located in two different subgraphs, showing that they belong to different concepts, and football and basketball are not directly linked in their subgraph, showing that they hold different attributes. On the other hand, compared with relations in word-pairs, comprehensively modeling a word's structural features with its directly and indirectly linked words in the KBs could be more stable and reliable (Xuan et al. 2016; 2017).
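The structures on the right of Fig. 1 are generated from WordNet, and one natural way to obtain such concept subgraphs is to follow WordNet's hypernym links for each sense of a word. The sketch below does exactly that with NLTK's WordNet interface; it is an illustration only, not the paper's code, and the choice of relations and number of senses is an assumption made for the example.

```python
# Illustrative sketch: reading concept structure from WordNet via NLTK.
# Assumes nltk is installed and the WordNet data has been downloaded
# (nltk.download("wordnet")); the word list mirrors the Fig. 1 example.
from nltk.corpus import wordnet as wn

def hypernym_structure(word, max_senses=2):
    """Map each of the first few senses of `word` to its direct hypernym concepts."""
    return {
        synset.name(): [h.name() for h in synset.hypernyms()]
        for synset in wn.synsets(word)[:max_senses]
    }

for w in ["football", "basketball", "coach", "trainer"]:
    print(w, hypernym_structure(w))
```

Senses that converge to a common ancestor concept fall into the same subgraph, while words located in different subgraphs belong to different concepts, which is the structural signal Fig. 1 is meant to convey.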
To this end, we propose a semantic structure-based word embedding method called SENSE. Moreover, we introduce concept convergence and word divergence to implement semantic structure modeling in the word embedding learning process. The basic idea can be explained intuitively: football and basketball are both related to the concept ball (concept convergence), but they also hold different attributes since they are only indirectly linked in the graph (word divergence). We evaluate our word embedding method using extensive intrinsic and extrinsic evaluations. The experimental results show that modeling semantic structures in the knowledge base by incorporating concept convergence and word divergence makes embeddings significantly more powerful and yields consistent performance improvements across real-world applications.

This paper departs from previous work in that it explores the global structural information of words in the knowledge base, not just the local relations that exist between two words. The main contributions can be summarized as follows:
• We design a novel approach for learning word embeddings that considers the relatively stable and reliable semantic structures within the KBs.
• We design the principle of preserving semantic structures by converging words to their concept on the upper level and diverging words on the same sense level. We show that this principle is effective and easy to implement in the word embedding training process (a sketch follows this list).
• To validate the method, we conducted extensive experiments on semantic property testing, document retrieval, and text classification. The results show that the proposed method significantly outperforms state-of-the-art methods.
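To make this principle concrete, the sketch below shows one way concept-convergence and word-divergence terms could be attached to a corpus objective such as Skip-gram. It is a minimal sketch under assumed design choices, not SENSE's actual formulation (which is defined later in the paper): the weights alpha and beta, the squared-distance penalties, and the margin are invented for the illustration.

```python
# Hedged sketch: one possible convergence/divergence penalty.
# alpha, beta, the squared distances, and the margin are illustrative
# assumptions, not the objective used by SENSE.
import numpy as np

def structure_penalty(emb, concept_to_words, divergent_pairs,
                      alpha=0.1, beta=0.1, margin=1.0):
    """emb: dict word -> embedding vector (np.ndarray).
    concept_to_words: concept -> words that should converge to it.
    divergent_pairs: same-level word pairs that should stay apart."""
    loss = 0.0
    # Concept convergence: pull each word towards the centroid of its concept.
    for words in concept_to_words.values():
        centroid = np.mean([emb[w] for w in words], axis=0)
        loss += alpha * sum(np.sum((emb[w] - centroid) ** 2) for w in words)
    # Word divergence: push indirectly linked words apart up to a margin.
    for w1, w2 in divergent_pairs:
        dist = np.linalg.norm(emb[w1] - emb[w2])
        loss += beta * max(0.0, margin - dist) ** 2
    return loss
```

In a joint training setup, a penalty of this kind would simply be added to the corpus loss at each update, so that the structural constraints and the co-occurrence signal are optimized together.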
Related Work

One line of research combines KB information with word embeddings that are learned separately from the corpus. The retrofitting method (Faruqui et al. 2015) refines word representations using relational information from semantic lexicons; it encourages linked words to have similar vector representations, which are then embedded in a semantic network that consists of linked word senses in a continuous-vector word space. Johansson et al. (2015) presented a method to embed a semantic network into a pre-trained word embedding, considering that vectors for polysemous words can be decomposed into a convex combination of sense vectors and that the vector for a sense is kept similar to those of its neighbors in the network. Goikoetxea et al. (2016) learned word representations from text and WordNet independently and then explored both simple and sophisticated methods to combine them, showing that a simple concatenation of independently learned embeddings outperforms more complex combination techniques on word similarity and relatedness datasets.

In contrast to these combined methods, several studies jointly leverage semantic lexicons and corpus-based methods. The RCM method (Yu and Dredze 2014) is a relation-constrained model that introduces a training objective incorporating both a neural language model objective and a semantic knowledge objective; in RCM, the knowledge base serves as word similarity information to improve the quality of the embeddings. Xu et al. (2014) leveraged both relational and categorical knowledge, combined with the Skip-gram method, to produce word representations (RC-NET). Liu et al. (2015a) represent semantic knowledge as a number of ordinal similarity inequalities over related word pairs to learn semantic word embeddings (SWE). Bollegala et al. (2015) proposed a method that considers the semantic relations in which words co-occur to learn word representations, and Bollegala et al. (2016) further proposed a joint word representation learning method that simultaneously predicts the co-occurrences of two words in a sentence, subject to the relational constraints given by a semantic lexicon. Although these studies consider semantic information from an external knowledge base during learning, they do not leverage high-quality semantic structures to improve word embeddings. Our work can be categorized as a joint learning method that incorporates both co-occurrence information and semantic structures.
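At a high level, the joint methods surveyed above share one pattern: a corpus objective and a knowledge-base term are optimized together, and the individual models differ in how each term is defined. The schematic below is a generic illustration of that pattern, not a formula taken from any of the cited papers; the trade-off weight and the split into two terms are assumed notation.

```latex
% Generic joint-learning pattern (illustrative notation only):
%   J_corpus: corpus-based objective (e.g., a Skip-gram likelihood)
%   J_KB:     term encoding lexicon or structure constraints
%   \lambda:  trade-off weight between the two sources
J(\Theta) = J_{\text{corpus}}(\Theta) + \lambda\, J_{\text{KB}}(\Theta)
```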