Improved Word Representation Learning with Sememes
Yilin Niu1∗, Ruobing Xie1∗, Zhiyuan Liu1,2†, Maosong Sun1,2
1 Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, National Lab for Information Science and Technology, Tsinghua University, Beijing, China
2 Jiangsu Collaborative Innovation Center for Language Ability, Jiangsu Normal University, Xuzhou 221009, China
∗ indicates equal contribution
† Corresponding author: Z. Liu ([email protected])

Abstract

Sememes are minimum semantic units of word meanings, and the meaning of each word sense is typically composed of several sememes. Since sememes are not explicit for each word, people manually annotate word sememes and form linguistic common-sense knowledge bases. In this paper, we show that word sememe information can improve word representation learning (WRL), which maps words into a low-dimensional semantic space and serves as a fundamental step for many NLP tasks. The key idea is to utilize word sememes to accurately capture the exact meaning of a word within its specific context. More specifically, we follow the framework of Skip-gram and present three sememe-encoded models to learn representations of sememes, senses and words, where we apply an attention scheme to detect word senses in various contexts. We conduct experiments on two tasks, word similarity and word analogy, and our models significantly outperform baselines. The results indicate that WRL can benefit from sememes via the attention scheme, and also confirm that our models are capable of correctly modeling sememe information.

1 Introduction

Sememes are defined as minimum semantic units of word meanings, and there exists a limited closed set of sememes that can compose the semantic meanings of an open set of concepts (i.e., word senses). However, sememes are not explicit for each word. Hence, people manually annotate word sememes and build linguistic common-sense knowledge bases.

HowNet (Dong and Dong, 2003) is one such knowledge base, which annotates each concept in Chinese with one or more relevant sememes. Different from WordNet (Miller, 1995), the philosophy of HowNet emphasizes the significance of parts and attributes as represented by sememes. HowNet has been widely utilized in word similarity computation (Liu and Li, 2002) and sentiment analysis (Xianghua et al., 2013), and in Section 3.2 we give a detailed introduction to sememes, senses and words in HowNet.

In this paper, we aim to incorporate word sememes into word representation learning (WRL) and learn improved word embeddings in a low-dimensional semantic space. WRL is a fundamental and critical step in many NLP tasks such as language modeling (Bengio et al., 2003) and neural machine translation (Sutskever et al., 2014).

There has been much research on learning word representations, among which word2vec (Mikolov et al., 2013) achieves a good balance between effectiveness and efficiency. In word2vec, each word corresponds to one single embedding, ignoring the polysemy of most words. To address this issue, (Huang et al., 2012) introduces a multi-prototype model for WRL, conducting unsupervised word sense induction and learning embeddings according to context clusters. (Chen et al., 2014) further utilizes the synset information in WordNet to instruct word sense representation learning.
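For background, a standard formulation of the Skip-gram objective referred to above is sketched here (the notation, with a corpus w_1, ..., w_N, window size K and vocabulary V, is ours rather than the paper's): the model maximizes the log-probability of context words given each target word,

$$
\mathcal{L} = \sum_{i=1}^{N} \sum_{\substack{-K \le j \le K \\ j \ne 0}} \log \Pr(w_{i+j} \mid w_i),
\qquad
\Pr(w_c \mid w_i) = \frac{\exp(\mathbf{w}_c^{\top}\mathbf{w}_i)}{\sum_{w' \in V} \exp(\mathbf{w}'^{\top}\mathbf{w}_i)},
$$

where bold symbols denote word embeddings. In practice the softmax over V is approximated, e.g. by hierarchical softmax or negative sampling, to keep training efficient.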
From these previous studies, we conclude that word sense disambiguation is critical for WRL, and we believe that the sememe annotation of word senses in HowNet can provide necessary semantic regularization for both tasks. To explore its feasibility, we propose a novel Sememe-Encoded Word Representation Learning (SE-WRL) model, which detects word senses and learns representations simultaneously. More specifically, this framework regards each word sense as a combination of its sememes, and iteratively performs word sense disambiguation according to contexts and learns representations of sememes, senses and words by extending Skip-gram in word2vec (Mikolov et al., 2013). In this framework, an attention-based method is proposed to automatically select appropriate word senses according to contexts. To take full advantage of sememes, we propose three different learning and attention strategies for SE-WRL.
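To make the attention idea concrete, the following is a minimal sketch (our illustration under simplifying assumptions, not the paper's exact models): sense embeddings are composed by averaging sememe embeddings, the context vector scores each sense by dot product, and a softmax turns the scores into attention weights that softly disambiguate the word. The sense and sememe names are borrowed from the 苹果 ("apple") example discussed later in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
dim = 8

# Toy sememe embeddings (random here; in training they are learned
# jointly with sense and word embeddings).
sememe_emb = {s: rng.normal(size=dim)
              for s in ["computer", "bring", "SpeBrand", "fruit"]}

# Each sense is regarded as a combination of its sememes; this sketch
# simply averages the sememe vectors.
senses = {
    "Apple_brand": ["computer", "bring", "SpeBrand"],
    "apple_fruit": ["fruit"],
}
sense_emb = {name: np.mean([sememe_emb[s] for s in sems], axis=0)
             for name, sems in senses.items()}

# A context vector, e.g. the average of surrounding word embeddings.
context = rng.normal(size=dim)

# Attention: score each sense against the context, then normalize.
names = list(sense_emb)
att = softmax(np.array([context @ sense_emb[n] for n in names]))

# Soft word sense disambiguation: the word's embedding in this context
# is the attention-weighted sum of its sense embeddings.
word_emb = sum(a * sense_emb[n] for a, n in zip(att, names))
print({n: round(float(a), 3) for n, a in zip(names, att)})
```

Plugging such context-dependent word embeddings into the Skip-gram objective is what lets sense detection and representation learning proceed jointly; the three strategies mentioned above differ in how sememes are aggregated and where attention is applied.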
In experiments, we evaluate our framework on two tasks, word similarity and word analogy, and further conduct case studies on sememe, sense and word representations. The evaluation results show that our models significantly outperform other baselines, especially on word analogy. This indicates that our models can build better knowledge representations with the help of sememe information, and also implies the potential of our models for word sense disambiguation.

The key contributions of this work are as follows: (1) To the best of our knowledge, this is the first work to utilize sememes in HowNet to improve word representation learning. (2) We successfully apply the attention scheme to detect word senses and learn representations according to contexts with the help of the sememe annotation in HowNet. (3) We conduct extensive experiments and verify the effectiveness of incorporating word sememes for improved WRL.

2 Related Work

2.1 Word Representation

Recent years have witnessed great advances in word representation learning. It is simple and straightforward to represent words using one-hot representations, but one-hot representations usually struggle with the data sparsity issue and neglect semantic relations between words.

To address these issues, (Rumelhart et al., 1988) proposes the idea of distributed representation, which projects all words into a continuous low-dimensional semantic space, considering each word as a vector. Distributed word representations are powerful and have been widely utilized in many NLP tasks, including neural language models (Bengio et al., 2003; Mikolov et al., 2010), machine translation (Sutskever et al., 2014; Bahdanau et al., 2015), parsing (Chen and Manning, 2014) and text classification (Zhang et al., 2015). They are capable of encoding semantic meanings in vector space, serving as fundamental and essential inputs for many NLP tasks.

Many efforts have been devoted to learning better word representations. With the exponential growth of text corpora, model efficiency has become an important issue. (Mikolov et al., 2013) proposes two models, CBOW and Skip-gram, achieving a good balance between effectiveness and efficiency. These models assume that the meanings of words can be well reflected by their contexts, and learn word representations by maximizing the predictive probabilities between words and their contexts. (Pennington et al., 2014) further utilizes matrix factorization on a word affinity matrix to learn word representations. However, these models arrange only one vector for each word, regardless of the fact that many words have multiple senses. (Huang et al., 2012; Tian et al., 2014) utilize multi-prototype vector models to learn word representations and build distinct vectors for each word sense. (Neelakantan et al., 2015) presents an extension to the Skip-gram model for learning non-parametric multiple embeddings per word. (Rothe and Schütze, 2015) also utilizes an autoencoder to jointly learn word, sense and synset representations in the same semantic space.

This paper, for the first time, jointly learns representations of sememes, senses and words. The sememe annotation in HowNet provides useful semantic regularization for WRL. Moreover, the unified representations incorporated with sememes also provide more explicit explanations of both word and sense embeddings.

2.2 Word Sense Disambiguation and Representation Learning

Word sense disambiguation (WSD) aims to computationally identify word senses or meanings in a certain context. There are mainly two approaches to WSD, namely supervised methods and knowledge-based methods. Supervised methods usually take the surrounding words or senses as features and use classifiers like SVM for word sense disambiguation (Lee et al., 2004), but they are heavily limited by the time-consuming human annotation of training data.

On the contrary, knowledge-based methods utilize large external knowledge resources such as knowledge bases or dictionaries to suggest possible senses for a word. (Banerjee and Pedersen, 2002) exploits the rich hierarchy of semantic relations in WordNet (Miller, 1995) for an adapted dictionary-based WSD algorithm. (Bordes et al., 2011) introduces synset information in WordNet to WRL. (Chen et al., 2014) considers synsets in WordNet as different word senses, and jointly conducts word sense disambiguation and word / sense representation learning. (Guo et al., 2014) considers bilingual datasets to learn sense-specific word representations. Moreover, (Jauhar et al., 2015) proposes two approaches to learn sense-specific word representations that are grounded to ontologies. (Pilehvar and Collier, 2016) utilizes personalized PageRank to learn de-conflated semantic representations of words.

In this paper, we follow the knowledge-based approach and automatically detect word senses according to the contexts with the help of sememe information in HowNet.

3.2 Sememes, Senses and Words in HowNet

Fig. 1 illustrates the sememe annotation of the word 苹果 ("apple"). The word "apple" actually has two main senses, shown on the second layer: one is a sort of juicy fruit (apple), and the other is a famous computer brand (Apple brand). The third and following layers are the sememes explaining each sense. For instance, the first sense Apple brand indicates a computer brand, and thus has the sememes computer, bring and SpeBrand.

From Fig. 1 we can find that the sememes of many senses in HowNet are annotated with various relations, such as define and modifier, and form complicated hierarchical structures. In this paper, for simplicity, we only consider all annotated sememes of each sense as a sememe set, without considering their internal structure. HowNet assumes that the limited annotated sememes can well represent senses and words in real-world scenarios, and thus sememes are expected to be useful for both WSD and WRL.
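To make this flattening step concrete, here is a small hypothetical sketch (the tree layout and the placement of relations are our invention for illustration; HowNet's actual encoding differs): an annotation tree for the sense Apple brand is collapsed into the plain sememe set used by the models.

```python
# Hypothetical tree-shaped annotation for the sense "Apple brand";
# the sememe names (computer, bring, SpeBrand) and relation names
# (define, modifier) follow the running example, but the structure
# itself is illustrative, not HowNet's real format.
annotation = {
    "sememe": "computer",
    "children": [
        {"relation": "modifier", "sememe": "SpeBrand"},
        {"relation": "define", "sememe": "bring"},
    ],
}

def sememe_set(node):
    """Flatten an annotation tree into the set of its sememes,
    discarding relations and hierarchy, as the paper does."""
    result = {node["sememe"]}
    for child in node.get("children", []):
        result |= sememe_set(child)
    return result

print(sememe_set(annotation))  # the set {computer, bring, SpeBrand}
```

After this reduction, every sense is just a set of sememes, which is exactly the form consumed by the attention sketch shown in the Introduction.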