word2set: WordNet-Based Word Representation Rivaling Neural Word Embedding for Lexical Similarity and Sentiment Analysis

Sergio Jimenez, Instituto Caro y Cuervo, Bogotá D.C., COLOMBIA
Fabio A. González, MindLab Research Group, Universidad Nacional de Colombia, Bogotá D.C., COLOMBIA
Alexander Gelbukh, CIC, Instituto Politécnico Nacional, Mexico City, MEXICO
George Dueñas, CIC, Instituto Politécnico Nacional, Mexico City, MEXICO

IEEE Computational Intelligence Magazine, May 2019. Digital Object Identifier 10.1109/MCI.2019.2901085. Date of publication: 10 April 2019.
Corresponding author: A. Gelbukh (Email: gelbukh@cic.ipn.mx).

Abstract—Measuring lexical similarity using WordNet has a long tradition. In the last decade, it has been challenged by distributional methods, and more recently by neural word embedding. In recent years, several larger lexical similarity benchmarks have been introduced, on which word embedding has achieved state-of-the-art results. The success of such methods has eclipsed the use of WordNet for predicting human judgments of lexical similarity. We propose a new set cardinality-based method for measuring lexical similarity, which exploits the WordNet graph, obtaining a word representation, which we call word2set, based on related neighboring words. We show that the features extracted from set cardinalities computed using this word representation, when fed into a support vector regression classifier trained on a dataset of common synonyms and antonyms, produce results competitive with those of word-embedding approaches. On the task of predicting the lexical sentiment polarity, our WordNet set-based representation significantly outperforms the classical measures and achieves the performance of neural embeddings. Although word embedding is still the best approach for these tasks, our method significantly reduces the gap between the results shown by knowledge-based approaches and by distributional representations, without requiring a large training corpus. It is also more effective for less-frequent words.

I. INTRODUCTION

Automatic understanding of human language is the main goal of the natural language processing field. Given the intrinsic compositionality of human language, the relationships between lexical units (i.e., words) play an important role in this process. In particular, recognizing lexical similarity and lexical relatedness is a key component that endows automatic systems with the ability to relate pairs of sentences that use different words but are close in their meaning. Traditionally, apart from edit distance [1], [2], computational linguists have used two main resources to tackle this task: linguistic knowledge manually coded by lexicographers, such as WordNet, and large corpora. Recently proposed corpus-based methods known as word embeddings have outperformed knowledge-based methods by using neural networks trained on very large corpora. However, the availability and quality of manually coded knowledge or very large corpora vary for different languages and domains. With this, word embedding is not a clear choice in all scenarios. In this paper, we show that knowledge-based methods exploiting all lexical-semantic relationships encoded in WordNet can be competitive with word embedding.

Sentiment analysis is closely related to measuring lexical similarity, since semantically similar words tend to have similar polarity. Fig. 1 shows how pairs of words that both have either positive or negative polarity have, on average, a greater similarity than any other combination. In addition, the cross-like pattern in those graphs shows that neutral words have very low similarity with words of any polarity. With this, the representations used for lexical and textual similarity can also be useful for sentiment analysis [3]. In this context, practically all systems of sentiment analysis rely on a mechanism to determine the sentiment polarity of words. In this paper, we show that there is a large performance gap between predictors of lexical sentiment polarity based on word embeddings [4], [5] and those based on the classical WordNet-based measures [6], [7], [8], [9], [10]. Recently, Li et al. [11] widened that gap by proposing word embeddings optimized for sentiment analysis. Despite this, we demonstrate that WordNet can be used to rival the performance of neural embeddings on the tasks of both lexical similarity and lexical sentiment polarity classification. This brings the WordNet-based methods back into the game.

WordNet [14] is a lexical database that links words in a graph connected by relationships of synonymy, hyperonymy, hyponymy, etc. It has been used for more than 20 years for addressing many NLP tasks, particularly lexical similarity. Lexical similarity functions based on WordNet use graph measures to provide a numerical score of the similarity between two so-called synsets (sets of synonyms in WordNet) or two words [6], [10]. The functions proposed almost two decades ago mainly rely on the is-a hierarchy, the path length between concepts, and the depth of the concepts in the hierarchy; Fig. 2 illustrates these components. Another important concept added to this approach is information content [7], which represents the amount of information conveyed by a concept, computed by combining counts of lexical units in corpora aware of the WordNet is-a hierarchy [7], [8], [9].

Another common approach to address lexical similarity uses the so-called distributional hypothesis of meaning: "words with similar meaning will occur with similar neighbors if enough material is available" [15]. This approach involves the construction of a matrix whose entries contain the number of times a word (rows) occurs in a particular context (columns) across corpora. The context of a word can be a fixed-size window, a sentence, a paragraph, etc. The goal in this approach is to obtain a vectorial representation of the words in a metric feature space and to combine it with the cosine similarity (or other metrics) to provide a similarity score between pairs of words. The resulting matrix is large and sparse, which requires reducing the dimensionality by limiting the size of the word vocabulary or using techniques such as latent semantic analysis (LSA) [16], non-negative matrix factorization (NMF) [17], or random indexing [18], among others. Agirre et al. [19] compared some distributional and WordNet-based approaches, concluding that the former consistently outperformed the latter for lexical similarity. They also differentiated the similarity and relatedness tasks, and since that time, most benchmarks for lexical semantics clearly differentiate these categories.
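To make the count-based pipeline above concrete, the following minimal sketch (ours, not from the original paper) builds a small word-by-word co-occurrence matrix, reduces its dimensionality with truncated SVD in the spirit of LSA [16], and compares two words with the cosine similarity; the toy corpus, window size, and number of dimensions are illustrative assumptions.

```python
# Minimal sketch of a count-based distributional model: co-occurrence
# counts + truncated SVD (LSA-style) + cosine similarity.
# The corpus, window size, and dimensionality are illustrative only.
from collections import Counter
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog are pets",
]
window = 2  # fixed-size context window

tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Word-by-word co-occurrence counts within the window.
counts = Counter()
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[(index[w], index[sent[j]])] += 1

M = np.zeros((len(vocab), len(vocab)))
for (r, c), v in counts.items():
    M[r, c] = v

# Reduce the dimensionality (here to 5) and compare two words by cosine.
vectors = TruncatedSVD(n_components=5, random_state=0).fit_transform(M)
sim = cosine_similarity(vectors[index["cat"]].reshape(1, -1),
                        vectors[index["dog"]].reshape(1, -1))[0, 0]
print(f"cosine(cat, dog) = {sim:.3f}")
```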


Fig. 1. Average lexical similarity of words according to their polarity in two sentiment lexicons [12], [13]. Words were compared using [4].

Fig. 2. Elements involved in the classical approach for lexical similarity using WordNet: the shortest path between two words, their depth in the is-a hierarchy, their least common subsumer, and the information content of the concepts.

Among distributional approaches, neural word embedding [4], [5], [20] has received great attention. In this approach, instead of first obtaining word contexts and then reducing the dimensionality, the dimension of the space is fixed and a single iterative procedure attempts to learn a language model from corpora, obtaining an optimal word representation. This language model aims to build either a prediction model for each word given a large number of contexts (continuous bag-of-words) or a prediction model for contexts given the words (skip-gram) [4]. Baroni et al. [21] compared traditional distributional methods against word embedding, concluding that word embedding was superior in performance on several tasks, including lexical similarity and relatedness.

Both WordNet-based and distributional methods have their advantages and disadvantages in practical applications. For example, while WordNet-based approaches are aware of the different senses of a word, distributional methods merge senses into a single representation. In contrast, when a language other than English is used, a WordNet is a resource difficult and costly to obtain, whereas the text corpora required by distributional approaches are generally available for major languages. Recently, Aletras and Stevenson [22] proposed a hybrid approach combining word embedding and WordNet, obtaining very competitive results but observing only a marginal contribution of the WordNet component to the overall performance.

There is an important gap in performance between WordNet-based and word-embedding approaches for lexical similarity. In this paper, we significantly reduce this difference. We present a new method that uses WordNet to build lexical similarity and relatedness functions. Our method exploits the WordNet graph in a novel way by representing words by their neighboring words in the graph; see Fig. 3. In addition, we use supervised learning and a set of cardinality-based features extracted from this representation. From a relatively large set of features, an optimal subset is selected in a supervised way. For training such models, we have developed a new dataset, W1500, based on a list of common synonyms and antonyms. We compare our method against the studies by Pennington et al. [5], Aletras and Stevenson [22], and Baroni et al. [21] using identical experimental setups.
We also use other publicly available benchmarks for testing, making a total of 14 comparisons. We survey the state-of-the-art results for these benchmarks and compare them with the results obtained in this paper.

We also compare our representation based on words neighboring in WordNet with pre-trained word2vec [4] and GloVe [5] representations on the task of predicting the sentiment polarity of words in a lexicon. For this, we use as benchmarks the Affective Norms for English Words (ANEW) [12] and SenticNet 1 to 4 [13], [23], [24], [25]. In this configuration, our representation obtained results similar to word embeddings and considerably better than the classical WordNet approaches.

The rest of the paper is organized as follows. In Section II, we describe our method. In Sections III and IV, we evaluate our method on the tasks of lexical similarity and lexical sentiment classification. In Section V, we discuss the results. Finally, concluding remarks are given in Section VI.

Online resource for word2set: http://www.gelbukh.com/resources/word2set. Supplementary materials and the source code implementation of the word2set representation generator have been provided. The data include the baseline W1500 lexical relatedness dataset and the word2set representation of all words included in WordNet.

II. OUR METHOD: CARDINALITY-BASED LEXICAL SIMILARITY

In practice, most of the methods for lexical similarity and relatedness based on WordNet rely on the taxonomy formed by hypernym-hyponym relationships [6], [7], [8], [9], [10]. Recall that the nodes in such a taxonomy are synsets, i.e., sets of synonym words (lemmas) interchangeable in some contexts [14]; for example, the synset labeled car.n.01 contains the lemmas car, auto, automobile, and motorcar. Apart from hypernyms and hyponyms, other types of relationships also play an important role in lexical similarity and relatedness. For instance, the synsets bird.n.01 and angel.n.01 can be considered somehow related because both have the same part-holonym wing.n.01. However, they are separated by 17 steps in the is-a taxonomy formed by hypernym-hyponym relationships in WordNet. Therefore, current measures based on that hierarchy return a very low similarity score for that pair. To address this issue, our word representation consists of sets containing related words from the neighborhood of a word in the WordNet graph. Lexical similarity functions based on WordNet usually take synsets as arguments, which requires word sense disambiguation [26] to be applied to the words being compared. Our representation is used to model and compare words rather than synsets, allowing a direct comparison against distributional methods. Clearly, if the words are disambiguated in advance, their word representation should include only words related to the correct senses, producing a less noisy representation; however, in this paper, we do not consider this scenario. In addition, unlike classical WordNet-based approaches focusing on a particular relationship such as hypernymy, we use all types of relationships available in WordNet.

Fig. 3. Outline of our method for exploiting the WordNet graph for lexical similarity: find the words related to word1 and word2 in WordNet, build the two sets R1 and R2 of their neighboring words, find the intersection of the sets, |R1 ∩ R2|, and measure the similarity of the sets.

For each synset, WordNet provides a textual definition (gloss) describing the meaning of the synset. Each synset also has a set of lexical representations (lemmas), which in turn can be linked to other lemmas by relationships at the lemma level, such as antonymy, pertainymy, and related forms, among others. Lemmas may also be related indirectly through synsets, thereby inheriting all the synset-level relationships. For our representation, the neighbor words (lemmas) of a word (lemma) are obtained by following all possible synset- and lemma-level relationships. The obtained set of words is further enriched by extracting keywords from the textual definitions of the synsets directly related to the given word.

First, our algorithm produces a representation from WordNet for each word in an unsupervised way by constructing the set of its related words in the graph. At this point, any resemblance coefficient based on cardinality, such as Jaccard, Dice, cosine, or soft cosine [27], among others, can be used to provide a similarity score for a pair of words. However, the choice of such a coefficient is arbitrary. Alternatively, following Jimenez et al. [28], such a coefficient can be learned from training data using as features the cardinalities of both sets and of their intersection, as well as various algebraic combinations of these three factors. The process, described below in detail, is summarized as follows: (i) take a set of word pairs for training, each labeled with a gold-standard lexical similarity or relatedness; (ii) represent each word as a set of related words extracted from WordNet; (iii) extract 17 cardinality-based factors from each pair of representing sets; (iv) recombine these factors into 272 rational features; (v) determine a reduced set of features in a supervised way; and (vi) fit a regression model using the reduced set of features to a gold standard of similarity or relatedness; see Fig. 4.

A. Word Representation by Neighboring Words

Let w be a word. The set R_w of its neighboring words can be obtained from WordNet using the following procedure,¹ which we call word2set.

¹A Python script implementing this procedure is available at https://github.com/sgjimenezv/neighboring_words.

The set RelatedSynsets_w of synsets related to w is the union of RelatedSynsets_s for s in Synsets_w, where Synsets_w is the set of synsets that contain the word w and RelatedSynsets_s is the set of synsets related to s, i.e., the union of the sets of synsets connected with s in WordNet by one of the following relations: Hypernyms, Instance Hypernyms, Hyponyms, Instance Hyponyms, Member Holonyms, Member Meronyms, Substance Holonyms, Part Holonyms, Substance Meronyms, Part Meronyms, Attributes, Entailments, Causes, Also See, Verb Groups, and Similar To.

The set Lemmas_w of lemmas associated with the word w is the union of Lemmas_s for s in RelatedSynsets_w, where Lemmas_s is the set of lemmas associated with the synset s.

The set AllRelatedLemmas_w of all lemmas related to w is obtained by expanding the set Lemmas_w with their related lemmas, i.e., by adding to it the union of all RelatedLemmas_l for l in Lemmas_w, where RelatedLemmas_l is the set of lemmas related to the lemma l by one of the following relations: Antonyms, Pertainyms, Topic Domains, Region Domains, Usage Domains, Derivationally Related Forms, Hypernyms, Instance Hypernyms, Hyponyms, Instance Hyponyms, Member Holonyms, Member Meronyms, Substance Holonyms, Part Holonyms, Substance Meronyms, Part Meronyms, Attributes, Entailments, Causes, Also Sees, Verb Groups, and Similar To.

Now, the set R_w of the words related to w is obtained as AllRelatedLemmas_w plus the union of all DefinitionKeywords_s for s in RelatedSynsets_w, where DefinitionKeywords_s is the set of words obtained from the text of the definition of the synset s after removing stopwords. In our experiments, we used the list of English stopwords from NLTK [29], as well as the following words, which are frequent and mostly uninformative in WordNet's definitions: act, action, another, become, body, capable, cause, change, coming, consisting, containing, especially, etc, form, giving, group, lacking, made, make, move, one, order, part, particular, people, person, persons, place, position, property, quality, relating, resulting, small, somebody, someone, something, state, time, two, used, using, usually, whose.

With this, the set R_w of words related to w is composed of all its neighboring lemmas and the keywords in the definitions of its related synsets. Table I shows examples of such a representation for words from the rare-word (RW) dataset [30]. One can see that the majority of the representing words are meaningfully related to the represented word. Some poorly related words could arise from less-common senses of the related synsets or from unrelated words extracted from glosses. The average number of representing words was 115 in RW and 211 in the Stanford Contextual Word Similarities (SCWS) dataset [31], which contains fewer rare words. Hence, common words seem to have better connectivity in WordNet's graph compared to less common words [32].
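The following is a minimal sketch of this neighbor-collection step using NLTK's WordNet interface; it follows only a subset of the synset- and lemma-level relations listed above and approximates the keyword extraction from glosses, so it is an illustration of the idea rather than the authors' published script (see footnote 1).

```python
# Illustrative sketch of word2set neighbor collection with NLTK's WordNet API.
# Requires the 'wordnet' and 'stopwords' NLTK data packages; only a subset of
# the relations listed in the text is followed here.
from nltk.corpus import wordnet as wn, stopwords

SYNSET_RELATIONS = ("hypernyms", "hyponyms", "member_holonyms",
                    "member_meronyms", "part_holonyms", "part_meronyms",
                    "attributes", "entailments", "causes", "also_sees",
                    "verb_groups", "similar_tos")
LEMMA_RELATIONS = ("antonyms", "pertainyms", "derivationally_related_forms")
STOP = set(stopwords.words("english"))  # plus the extra gloss stopwords above

def word2set(word):
    """Return the set R_w of words related to `word` in WordNet."""
    related_synsets = set()
    for s in wn.synsets(word):
        related_synsets.add(s)
        for rel in SYNSET_RELATIONS:
            related_synsets.update(getattr(s, rel)())
    related_words = set()
    for s in related_synsets:
        # Lemmas of the related synsets plus their lemma-level neighbors.
        for lemma in s.lemmas():
            related_words.add(lemma.name().replace("_", " "))
            for rel in LEMMA_RELATIONS:
                related_words.update(l.name().replace("_", " ")
                                     for l in getattr(lemma, rel)())
        # Keywords from the gloss of each related synset.
        related_words.update(w for w in s.definition().split()
                             if w.isalpha() and w not in STOP)
    return related_words

# Example: print(sorted(word2set("wealthy"))[:10])
```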
B. Features

Once two words, a and b, are represented by their sets of related words, R_a and R_b, respectively, these sets are to be compared to provide a similarity score that reflects the degree of similarity or relatedness between them. The first option is to use an off-the-shelf resemblance coefficient based on cardinality, such as Jaccard [33] or Dice [34]. These are rational expressions that combine the three cardinalities |R_a|, |R_b| and |R_a ∩ R_b| (or alternatively |R_a ∪ R_b|) to produce a similarity score between 0 and 1. Most of these coefficients produce metrics with desirable properties such as transitivity. However, generally, they fail to adapt to particular tasks, yielding suboptimal performance. Alternatively, parameterized coefficients [35], [36], [37] provide some degree of adaptability, allowing the adjustment of parameters using training data.

When training data is available, regression can be used for training a similarity function adapted to the particular task. This approach has been shown to be effective for several NLP tasks in the recent SemEval campaigns [28], [38], [39].

The cardinality-based features extracted from a pair of sets usually comprise the cardinalities of all possible areas of their Venn diagram. Some additional features were derived by combining the basic features into rational coefficients in an attempt to capture non-linear relationships. Although some of these feature sets produced high-quality prediction models, the feature sets were somewhat arbitrary and their selection was guided by intuition. For our lexical similarity task, a relatively large set of features was extracted using a set of 17 factors to be combined in rational terms. Table II shows this set of factors, f_i, i = 0, ..., 16, which are combined in rational terms f_i/f_j, i ≠ j, to produce the features for our model. These factors are the building blocks of many resemblance coefficients; for example, f_4 is the matching coefficient and f_4/f_11 is the cosine coefficient.

The rationale for the selection of the 17 factors is as follows. Factor f_0 allows the factors and their multiplicative inverses to be features. Factors f_1 to f_7 are the 7 possible areas delimited in the Venn diagram of two intersected sets. Factors f_8 to f_14 are different commonly used instances of the generalized mean between |R_a| and |R_b|, corresponding to the minimum, maximum, arithmetic mean, geometric mean, quadratic mean, cubic mean, and harmonic mean. Finally, factors f_15 and f_16 are variants of f_8 and f_9 used by the Symmetrical Tversky's Ratio Model [40] and by the family of cardinality-based similarity measures proposed by De Baets et al. [41].

This methodology produces a relatively large feature set of 272 features by recombining only three basic cardinalities. The goal of using this relatively large feature set is to be able to use the training data not only to learn a suitable similarity function but also to obtain a good representation for the words.
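A minimal sketch of these features is shown below: the 17 factors of Table II computed from a pair of sets and recombined into the 272 rational features f_i/f_j with i ≠ j. The small epsilon added to the denominators is our own assumption to avoid division by zero; the paper does not specify how such cases are handled.

```python
# Sketch of the cardinality-based features: the 17 factors of Table II and
# their 272 pairwise ratios f_i/f_j (i != j).
from math import sqrt

def cardinality_factors(ra, rb):
    a, b = len(ra), len(rb)
    return [
        1.0,                      # f0
        a,                        # f1
        b,                        # f2
        len(ra | rb),             # f3
        len(ra & rb),             # f4
        len(ra - rb),             # f5
        len(rb - ra),             # f6
        len(ra ^ rb),             # f7  symmetric difference
        min(a, b),                # f8
        max(a, b),                # f9
        (a + b) / 2,              # f10 arithmetic mean
        sqrt(a * b),              # f11 geometric mean
        sqrt((a**2 + b**2) / 2),  # f12 quadratic mean
        ((a**3 + b**3) / 2) ** (1 / 3),         # f13 cubic mean
        2 * a * b / (a + b) if a + b else 0.0,  # f14 harmonic mean
        min(len(ra - rb), len(rb - ra)),        # f15
        max(len(ra - rb), len(rb - ra)),        # f16
    ]

def rational_features(ra, rb, eps=1e-9):
    f = cardinality_factors(ra, rb)
    return [f[i] / (f[j] + eps)
            for i in range(len(f)) for j in range(len(f)) if i != j]

# 17 * 16 = 272 features per word pair, e.g.:
# rational_features({"rich", "wealth"}, {"rich", "money"})
```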

C. Feature Selection

Our method for feature selection is supervised. Suppose we have a training dataset composed of n pairs of words,


Fig. 4. The architecture of our lexical-similarity measuring algorithms: unsupervised COSINE and supervised SVR.

TABLE I. EXAMPLES OF OUR WORD REPRESENTATION BASED ON NEIGHBORING WORDS IN WORDNET

w: R_w
cognizance: apprehension, aware, awareness, certain, clear, cognisance, cognisant, cognise, cognizance, cognizant, cognize, conscious, consciousness, feel, followed, gained, general, incognizance, incognizant, individuality, intuitive, ken, know, knowing, knowingness, knowledge, mental, often, perceived, perceiving, perception, range, realization, scope, self-aware, self-awareness, sense, sensify, showing, sometimes, unaware, unawareness, understand, understanding, ...
incubate: animal, arise, breed, brood, brooder, conditions, conducive, copulate, cover, develop, development, differentiation, eggs, emerge, environment, evolution, evolve, female, given, grow, growth, hatch, hatchery, hatching, horses, incubate, incubation, incubator, individuals, multiplication, multiply, natural, offspring, plant, process, procreate, procreation, procreative, produce, progress, promote, reproduce, reproduction, reproductive, seat, sit, sit down, take, unfold, ...
subdivide: apart, carve up, come, dissever, divide, divisible, part, partitive, parts, pieces, portions, separate, separation, smaller, split, split up, subdivide, subdivider, subdivision, subdivisions, ...
wealthy: abundant, affluence, affluent, flush, loaded, material, money, moneyed, poor, possessing, possessions, rich, richness, supply, value, wealth, wealthiness, wealthy, ...

TABLE II. FACTORS FOR RATIONAL FEATURES

f0 = 1
f1 = |Ra|
f2 = |Rb|
f3 = |Ra ∪ Rb|
f4 = |Ra ∩ Rb|
f5 = |Ra \ Rb|
f6 = |Rb \ Ra|
f7 = |Ra △ Rb|
f8 = min(|Ra|, |Rb|)
f9 = max(|Ra|, |Rb|)
f10 = (|Ra| + |Rb|) / 2
f11 = sqrt(|Ra| × |Rb|)
f12 = sqrt((|Ra|² + |Rb|²) / 2)
f13 = ((|Ra|³ + |Rb|³) / 2)^(1/3)
f14 = 2 × |Ra| × |Rb| / (|Ra| + |Rb|)
f15 = min(|Ra \ Rb|, |Rb \ Ra|)
f16 = max(|Ra \ Rb|, |Rb \ Ra|)

where each pair is annotated with a gold-standard similarity or relatedness, and the 272 features described above are extracted for each training pair. This dataset, D, is a matrix of size n × 272, with a target vector T, of size n × 1, containing the gold standard. Let D^k be a matrix of size n × k containing a selection of the k best features obtained from a linear regression model that fits D to T. In our experiments, we used the KBest feature selection method implemented in Scikit-learn [42]. Basically, the KBest method selects features by ranking each feature individually by the correlation of a simple regression model built using the feature against the target. To avoid overfitting, we performed the feature selection process on each one of the ten partitions of a tenfold cross-validation split and randomly shuffled the samples 30 times, for a total of 300 partitions of D and T; we denote such partitions by D_{i,j}, T_{i,j}, where i indexes the shuffles and j the folds. We obtained the k best features, D^k_{i,j}, for each partition, and included in the final set of features the ones selected in at least 80% of the 300 partitions (using thresholds from 65% to 95% did not significantly change the selected feature set).

D. Regression

Upon extracting and selecting the features, we combined them using support vector regression [43]. The support vector parameters were set to C = 100 and γ = 0.001, and the most appropriate kernel for the task was RBF. All features were standardized before training by subtracting their mean and dividing by their standard deviation. The means and standard deviations for each feature were saved to apply the same transformation to the test data.
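A compact sketch of this selection-plus-regression stage with scikit-learn is shown below. The voting over 300 partitions and the SVR parameters follow the description above, while the exact voting implementation and the helper signatures are our own simplification; D is assumed to be the n × 272 feature matrix and T the gold-standard scores.

```python
# Sketch of Sections II-C and II-D: KBest feature selection voted over
# repeated tenfold splits, followed by an RBF support vector regressor.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import RepeatedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def select_stable_features(D, T, k=89, threshold=0.80):
    # Count, over 10 folds x 30 shuffles = 300 partitions, how often each
    # feature is among the k best, and keep those above the voting threshold.
    votes = np.zeros(D.shape[1])
    splitter = RepeatedKFold(n_splits=10, n_repeats=30, random_state=0)
    n_partitions = 0
    for train_idx, _ in splitter.split(D):
        kbest = SelectKBest(f_regression, k=k).fit(D[train_idx], T[train_idx])
        votes[kbest.get_support()] += 1
        n_partitions += 1
    return np.where(votes / n_partitions >= threshold)[0]

def fit_similarity_model(D, T, selected):
    scaler = StandardScaler().fit(D[:, selected])   # standardize the features
    model = SVR(kernel="rbf", C=100, gamma=0.001)   # parameters from the text
    model.fit(scaler.transform(D[:, selected]), T)
    return scaler, model
```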
E. An Inexpensive Training Dataset W1500

Existing benchmarks for lexical similarity and relatedness were built by selecting a set of pairs of words and aggregating ten or more human judgments on a fixed numerical scale to obtain a gold standard [15], [30], [31], [44], [45]. Recent benchmarks, such as MEN [46], were built by aggregating 50 binary judgments for each pair in an attempt to reduce the noise and cognitive load of using a numerical scale. This methodology requires a large amount of costly manual work and limits the convenience of supervised approaches, such as the one presented in this work. There is also the inconvenience that the word pairs used in benchmarks are selected with particular criteria, such as common nouns, making the models trained with such data less applicable to other types of words.

As an affordable alternative, we built a dataset using publicly available lists of common English synonyms² and antonyms.³ We collected 500 pairs of synonyms and 500 pairs of antonyms. The synonym pairs were labeled with a lexical similarity score of 1, and the antonym pairs were labeled with a constant c. We experimentally determined c = 0.2 to be meaningful for the lexical similarity task. A third subset of 500 pairs was obtained from random combinations of words from the synonyms and antonyms subsets, with manual verification that the two words are unrelated. The pairs in this third subset were labeled with a lexical similarity score of zero.

We refer to this dataset as W1500.⁴ Its purpose was to show that supervised models for predicting lexical similarity, trained with such a simple resource, can perform competitively.

²http://www.englishleap.com/vocabulary/synonyms
³http://www.englisch-hilfen.de/en/words/synonyms.htm
⁴https://sites.google.com/site/sergiojimenezvargas/W1500.txt

III. RESULTS ON LEXICAL SIMILARITY

In our experiments, we compare the performance of our method with that of classical measures based solely on WordNet and measures based on word embedding. We show that our methods drastically reduce the performance gap between WordNet and word-embedding approaches.

A. Datasets

We used the benchmarks previously used by Pennington et al. [5] for word similarity: WS353 (Finkelstein et al. [45]), MC (Miller and Charles [44]), RG (Rubenstein and Goodenough [15]), SCWS (Huang et al. [31]), and RW (Luong et al. [30]). In addition, we also compared our approach with that of Aletras and Stevenson [22]. The comparison used the same six datasets as they employed: MC, RG, WS353, the semantic (WSS) and relatedness (WSR) partitions of WS353 introduced by Agirre et al. [19], and the MEN dataset (relatedness) introduced by Bruni et al. [46].

B. Performance Measure

The usual measure for assessing the performance of a lexical similarity method is Spearman's correlation coefficient r. To provide a single measure across different sets of datasets, we use the simple average of r obtained on each dataset.

C. Classical WordNet-Based Measures

The group of baselines comprises classical lexical similarity measures based on WordNet implemented in NLTK [29]: path (the inverse of the number of edges between two synsets), lch (Leacock and Chodorow [6]), wup (Wu and Palmer [10]), lin.b/lin.s (Lin's measure [8] using the Brown or SemCor corpora for information content calculation), res.b/res.s (Resnik [7]), and jcn.b/jcn.s (Jiang and Conrath [9]). In our lexical similarity experiments, to compare two words, the set of synsets associated with each word is obtained and the maximum value of the similarity function over their Cartesian product is returned.

D. Number of Selected Features

The feature selection method described in Section II-C was applied to the W1500 dataset, obtaining an optimum value of k = 89 features.
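For reference, a minimal sketch of such a classical baseline computed with NLTK is given below, taking the maximum synset-to-synset score over the Cartesian product of the two words' synsets, as described above. Only path and wup are shown; the information-content measures (res, jcn, lin) plug in analogously but also require an information-content dictionary.

```python
# Sketch of a classical WordNet baseline: word-to-word similarity as the
# maximum synset-to-synset score over the Cartesian product of synsets.
from itertools import product
from nltk.corpus import wordnet as wn

def classical_similarity(word1, word2, measure="path"):
    best = 0.0
    for s1, s2 in product(wn.synsets(word1), wn.synsets(word2)):
        if measure == "path":
            score = s1.path_similarity(s2)
        elif measure == "wup":
            score = s1.wup_similarity(s2)
        else:
            raise ValueError("unsupported measure in this sketch")
        if score is not None:
            best = max(best, score)
    return best

# Example: classical_similarity("car", "automobile") -> 1.0 (shared synset)
```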
E. Lexical Similarity Results

We generated results for the classical WordNet measures (Section III-C) and our methods. These results are compared with three studies that published results for distributional methods using the same performance measure (Spearman's r) and datasets. The supervised method presented in Section II is labeled "SVR (this paper)". A second method is reported using the word representation presented in Section II-A combined with the cosine coefficient, "COSINE (this paper)":

sim(R_w1, R_w2) = |R_w1 ∩ R_w2| / sqrt(|R_w1| × |R_w2|).

The first study corresponds to the work by Pennington et al. [5], who introduced GloVe, a word-embedding method that combines evidence from the local context and the global counts. GloVe vectors are compared against word2vec (CBOW) [4] and an SVD baseline, all of them in a 300-dimensional space and trained on the same corpora. The name of each method includes the size of the training corpora in billions of tokens. The best-performing method in this group is GloVe 42B, which produces r̄ = 0.6996; see Fig. 5.

The second study corresponds to the work by Aletras and Stevenson [22]. They proposed several hybrid models that combined word embedding and WordNet (H models) and compared them with word embeddings trained on a 2.8B-token corpus (D model). H* is the best method of this group, with r̄ = 0.72; see Fig. 6.

Finally, Baroni et al. [21] compared word2vec embedding ("predict" models) against models based on token counts in corpora, i.e., distributional semantic models (DSM), combined with techniques such as SVD or NNMF. They considered 48 configurations of the former and 36 of the latter, varying parameters such as the final dimensionality and the context window size. The "each" results correspond to the best configuration for each benchmark and "all" to the single best configuration across all benchmarks. Models in this group were trained using the same 2.8B-token corpus used by Aletras and Stevenson [22]. Methods D and "count.each" are equivalent and, as expected, obtain very similar results; see Fig. 7. The best-performing method in the latter two studies was "predict.each", with r̄ = 0.78. Model H* was outperformed by "predict.all"; this is a more comparable pair since both have a single configuration across all benchmarks. Unfortunately, the results of Pennington et al. are not comparable with those of Baroni et al. and Aletras and Stevenson due to the use of different datasets.

Fig. 5. Performance comparison of our systems (red) against the results published by Pennington et al. [5] (blue) and the classical WordNet measures (green) for the lexical similarity task, in terms of Spearman's r average for MC, RG, WS353, SCWS and RW.

Fig. 6. Performance comparison of our systems (red) against the results published by Aletras and Stevenson [22] (blue) and the classical WordNet measures (green) for the lexical similarity task, in terms of Spearman's r average for MC, RG, WS353, WSS, WSR, and MEN.

Fig. 7. Performance comparison of our systems (red) against the results published by Baroni et al. [21] (blue) and the classical WordNet measures (green) for the lexical similarity task, in terms of Spearman's r average for RG, WS353, WSS, WSR and MEN.

F. Updated State of the Art on Lexical Similarity

Baroni et al. [21] compiled state-of-the-art results for several benchmarks and tasks, including lexical similarity. We extend this compilation by including the datasets MC, SCWS, and RW, and update it with the results of Pennington et al. [5], this work, and others. The additional lexical similarity and relatedness datasets are: YP-130 (similarity) [47], MTURK287 (relatedness) [48], MTURK771 (relatedness) [49], Rel-122 (relatedness) [50], Verb-143 [51], and the recently introduced SimLex-999 (similarity) [52]. Table III shows the results for 14 datasets for both the supervised method presented in this paper (SVR) and the state of the art.

One can easily see that our method is a competitive alternative for the task of lexical similarity, but less competitive for the lexical relatedness task. The largest gaps are on the benchmarks that include, partially or totally, word pairs associated by relatedness, i.e., WS353, WSR, MEN, MTurk287, MTurk771, and Rel-122. These gaps range from 49.35% (WSR) to 11.78% (Rel-122), which is significant. On the benchmarks characterized by similarity relationships in word pairs, we obtained a gap of 10%–20% on RG and RW, below 10% on the other datasets, and state-of-the-art results on SimLex-999 (results for Verb-143 are not comparable).
G. Ablation Study

In order to determine the importance of the different types of semantic relations of WordNet in the SVR method, an ablation study was carried out; see Table IV. The performance measure is based on the average of the correlation r obtained with all the possible combinations of training and testing of the datasets from Table III plus our W1500 dataset. The average also includes the mean of ten rounds of tenfold cross-validation for each dataset. This generates 15 × 15 = 225 runs for each ablation configuration, whose standard deviations are reported in the Std. column.

TABLE III. UPDATED STATE OF THE ART (SOA) FOR LEXICAL SIMILARITY AND RELATEDNESS, PAIRED WITH RESULTS FROM THIS PAPER (SVR)

Dataset | Task | n† | SVR r | SOA r | Reference
MC | sim. | 30 | 0.8507 | 0.91 | Patwardhan and Pedersen (2006) [53]
YP-130 | sim. | 130 | 0.7385 | 0.747 | Taieb et al. (2013) [54]
RG | sim. | 65 | 0.7273 | 0.90 | Patwardhan and Pedersen (2006) [53]
WSS | sim. | 203 | 0.7233 | 0.80 | Baroni et al. (2014) [21]
MEN | rel. | 3,000 | 0.6140 | 0.80 | Baroni et al. (2014) [21]
SCWS | sim. | 1,997 | 0.5703 | 0.6104 | Li et al. (2014) [55]
WS353 | both | 353 | 0.5688 | 0.81 | Halawi et al. (2012) [49]
MTurk771 | rel. | 771 | 0.5447 | 0.727 | Halawi et al. (2012) [49]
SimLex-999 | sim. | 999 | 0.5327 | 0.52 | SVR (this paper); previously Hill et al. (2014) [56]
Rel-122 | rel. | 122 | 0.4711 | 0.534 | Szumlanski et al. (2013) [50]
MTurk287 | rel. | 287 | 0.4524 | 0.737 | Halawi et al. (2012) [49]
RW | sim. | 2,034 | 0.4175 | 0.478 | Pennington et al. (2014) [5]
WSR | rel. | 252 | 0.3647 | 0.72 | Agirre et al. (2009) [19]
Verb-143 | sim. | 143 | 0.3280 | 0.642†† | Baker et al. (2014) [51]
†n is the number of word pairs in the dataset. ††This result corresponds to Pearson's correlation.
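Each SVR entry in the table above is a Spearman correlation between the scores predicted for a benchmark's word pairs and its gold standard; the following minimal sketch (with made-up numbers) shows how that figure is computed.

```python
# Sketch of the performance measure: Spearman's r between predicted and
# gold-standard similarity scores for one benchmark (toy numbers only).
from scipy.stats import spearmanr

gold = [9.0, 7.5, 3.2, 1.0, 0.4]           # human judgments for five pairs
predicted = [0.91, 0.80, 0.35, 0.20, 0.10]  # scores from the trained model
r, _ = spearmanr(gold, predicted)
print(f"Spearman r = {r:.3f}")              # 1.0 here: identical ranking
```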

The results show that the most significant semantic relations for our method are hypernymy and hyponymy (i.e., the is-a hierarchy), followed by the aggregation of 10 minority relations in WordNet. The contribution of the holonymy-meronymy relation is small. Withdrawal of the antonymy relation even gave a marginal improvement in both performance and variance.

H. Best Features

Table V shows the 25 features that were selected the greatest number of times using the same 15 datasets from the ablation study. The % columns show the percentage of datasets in which the feature was selected for the model. Note that the most-used factor was f4 (i.e., |Ra ∩ Rb|) and that all the factors were used at least once among the 25 most-selected features, with the exception of f0 = 1.

IV. RESULTS ON LEXICAL SENTIMENT CLASSIFICATION

In this section, we evaluate our word representation based on neighbors in WordNet on the task of predicting the sentiment polarity score of a word. As before, the COSINE method is compared with the classical WordNet measures and the word embeddings.

A. Sentiment Data

We used for our experiments sentiment lexicons widely known to the sentiment analysis community: ANEW [12] and SenticNet versions 1 to 4 [13], [23], [24], [25] (recently, SenticNet 5 has become available [57]). Each dataset is a list of entries composed of a word (or multiword expression) and a numerical score that indicates its sentiment polarity. From all datasets, we selected single-word entries. Table VI shows the number of resulting words, two extreme examples with their polarity scores, and the number of coincidences between the dataset and the resources used to obtain the word representations. The column SentiWN corresponds to the number of coincidences with SentiWordNet [58], which is used as a baseline for comparison.

B. Prediction Method

To predict the sentiment score of a word and to provide a testbed for comparison, we use a regression method that can use word embedding, our representation of neighboring words, or the classical WordNet-based measures. Such a method is (again) a Support Vector Regression (SVR), which is based on a kernel that can be constructed either from a vectorial representation (a linear kernel of vector dot products) or from a pair-wise similarity matrix (a Gram matrix) of the words to classify. By setting the parameters of the SVR and varying the representation of the data, we can evaluate the performance of each representation for the task in question. The parameters used for the SVR were ε = 0.1 and γ = 1/n, where n is the number of examples.

C. Experimental Setup

The pre-trained word embedding representations used are GloVe 42B⁵ and CBOW 100B,⁶ which correspond to the same representations as used by Pennington et al. [5]. Therefore, we use the same names for these methods as used in Fig. 5.

To build a pseudo-Gram matrix M from the matrix S of pair-wise word similarity scores obtained from the classical WordNet-based measures and our methods, S needs to be transformed to fulfill symmetry and positiveness. The transformation used is M = (S + min(S)) · (S + min(S))ᵀ, where the function min returns a scalar with the minimum entry of the matrix and + adds the scalar to all elements of the matrix; thus S + min(S) is a non-negative spatial translation of S. The operator (·)ᵀ is the transpose of a matrix. Thus, the final matrix multiplication gives a symmetric matrix. Even using this transformation, the measures res and jcn did not produce a Gram matrix suitable for the optimization of the SVR.
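A small sketch of this transformation and its use as a precomputed kernel for an SVR is shown below; the similarity matrix S and the polarity scores y are toy stand-ins, and the shift is implemented so that the smallest entry of S becomes zero, which is the non-negative translation described above.

```python
# Sketch of the pseudo-Gram construction and its use as a precomputed SVR kernel.
import numpy as np
from sklearn.svm import SVR

def pseudo_gram(S):
    # Non-negative translation of S (equivalent to adding |min(S)| when the
    # minimum is negative), then multiplication by its transpose to obtain a
    # symmetric, positive semidefinite matrix usable as a kernel.
    shifted = S - S.min()
    return shifted @ shifted.T

S = np.array([[1.0, 0.6, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.3, 1.0]])     # toy word-to-word similarity scores
y = np.array([0.9, 0.7, -0.8])      # toy gold sentiment polarities

model = SVR(kernel="precomputed", epsilon=0.1)  # epsilon from the text;
model.fit(pseudo_gram(S), y)        # gamma does not apply to a precomputed kernel
print(model.predict(pseudo_gram(S)))
```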
The aggregation method used to obtain the word-to-word similarity from synset-to-synset similarities for the classical WordNet measures changed from the previous lexical similarity experiments: instead of the max operator, we selected the first synset for each word in the lexicographical order of WordNet.

⁵http://nlp.stanford.edu/data/glove.42B.300d.zip
⁶At https://code.google.com/archive/p/word2vec/ see "Pre-trained word and phrase vectors".

TABLE IV. ABLATION STUDY

Ablation setting | # removed | Performance | Std. | diff.
All 19 relationships | 0 | 0.5404 | 0.1783 | —
HYPERNYMS/HYPONYMS removed | 4 | 0.4641 | 0.1492 | 14.12%
Other relationships removed | 10 | 0.5134 | 0.1945 | 5.01%
HOLONYMS/MERONYMS removed | 4 | 0.5304 | 0.1827 | 1.85%
ANTONYMS removed | 1 | 0.5414 | 0.1773 | –0.18%

TABLE V. THE TOP-25 SELECTED FEATURES

feature % feature % feature % feature % feature % f4/f14 100% f4/f8 93% f16/f8 87% f6/f2 73% f4/f3 67% f4/f11 93% f4/f12 93% f15/f9 87% f10/f3 73% f7/f13 53% f4/f10 93% f3/f10 93% f4/f2 80% f5/f1 67% f16/f14 47% f7/f10 93% f15/f8 93% f4/f13 80% f7/f12 67% f15/f4 47% f4/f9 93% f16/f9 87% f7/f3 73% f4/f1 67% f15/f14 47%

TABLE VI. SIZES OF THE DATASETS USED IN THE EXPERIMENTS, TOGETHER WITH EXAMPLES AND WITH THE NUMBER OF MATCHES WITH OTHER RESOURCES USED

Dataset | # words | positive | negative | WordNet | Word2Vec | GloVe | SentiWN
ANEW | 1033 | paradise: 8.720 | suicide: −1.25 | 1030 | 1031 | 1033 | 1017
SenticNet1 | 3036 | esteem: 0.976 | offend: −0.99 | 2975 | 2809 | 2965 | 2932
SenticNet2 | 6422 | heavenly: 0.941 | acne: −0.969 | 6192 | 6172 | 6349 | 6176
SenticNet3 | 14893 | radiancy: 0.964 | unhinge: −0.975 | 14741 | 13239 | 14220 | 14679
SenticNet4 | 23497 | indorse: 0.964 | vitiate: −0.980 | 21141 | 19908 | 22324 | 20847

The max operator is convenient for the lexical similarity task because when humans judge the similarity of pairs of polysemous words, they unconsciously select the two closest senses [15], [44]. However, in the lexical sentiment classification task, the words are out of context; thus, the choice of the first sense selected for a word by a lexicographer seems a better option. In fact, the choices of the maximum and the average performed significantly poorer in our experiments. The choice of the performance measure is (again) Spearman's r correlation coefficient, which measures the ability of the regressor to produce a list of words ranked by their sentiment polarity correlated with the gold-standard scores. For evaluation, we used a tenfold cross-validation setting, and the reported results are the average of ten random shuffles of the dataset.

The baseline for comparing the performance of the predictors was built using the polarity scores from SentiWordNet 3.0. The score for a word was obtained by aggregating the scores of the synsets where the word occurs, using the available code.⁷ The correlations r between these scores and the scores from the ANEW and SenticNet datasets are the external benchmark of comparison for the results obtained by our predictors.

D. Results

Fig. 8 shows the results of the polarity lexical sentiment classification task carried out using all datasets. In this figure, the error bars indicate 2 standard deviations obtained from the ten random shuffles carried out for each measure. As before, the blue bars correspond to neural embeddings, green bars to the classical WordNet-based measures, and the red bar to our method. The gray bar corresponds to the baseline from SentiWordNet [58].

Given that SenticNet versions 2 to 4 provide sentic values for other affective dimensions (pleasantness, attention, sensitivity, and aptitude), we carried out additional experiments for these dimensions; see Fig. 9. For these experiments, we used PATH as representative of the classical measures based on WordNet, which obtained the best and most consistent results in the prediction of the affective dimension polarity.

V. DISCUSSION

Let us first analyze the results for the lexical similarity task. Figs. 5 to 7 show that there is a considerable performance gap between the methods based on word embedding (blue bars) and those based on the classical WordNet measures (green bars). Our method, labeled "SVR (this paper)", outperformed the classical WordNet measures in all evaluation settings. The minimum performance gap of 35% was observed in Fig. 5 against the jcn.b measure. Clearly, our measure is a better alternative to the classical WordNet measures. A possible explanation is that the representation based on related words exploits a greater number of relationships in WordNet compared to the classical approaches. With 20 being the maximum depth of the WordNet taxonomy, theoretically, the largest number of relationships taken into account in a single comparison is 40. In contrast, the size of the sets of related words is in the order of hundreds, or thousands in some cases, implying that an even greater number of relationships were considered in obtaining them. Since our representation is large and considerably more informative, it is reasonable to expect better robustness against noise and thus better performance, as was indeed the case.
⁷http://sentiwordnet.isti.cnr.it/code/SentiWordNetDemoCode.java

In addition, since the late 1990s, when the classical WordNet measures were proposed, until now, no alternative


Fig. 8. Results (r) for the Lexical Polarity Classification task using ten shuffles of tenfold cross-validation rounds. Error bars show 2 standard deviations over the 100 resulting train-test runs.


Fig. 9. Results (r) for the Lexical Classification task of sentic values for another four dimensions of affective valence (i.e., pleasantness, attention, sensitivity, and aptitude). Scores are the average over 10 random shuffles of 10-fold cross-validation rounds and error bars depict 2 standard deviations. has been proposed to significantly improve the results using In comparison with the state of the art, our method failed only the WordNet graph. to exceed the performance of the best methods based on word embedding. Comparison with the Pennington et al.’s study (Fig. 5) shows that our method performs better than word In contrast, the other measure, “COSINE (this paper)”, embeddings trained with a relatively small corpus (6 billion SenticNet2 SenticNet3 SenticNet4 obtained the worst performance in all evaluation settings. This tokens) and worse than embeddings trained on a large corpus means that the formulation of the combination method is not (42 and 100 billion tokens). a simple function for this task, justifying the use of supervised learning to obtain a suitable similarity function. Although Regarding the lexical sentiment classification task, results supervised learning is generally criticized for its reliance on show the same performance gap between neural embeddings costly labeled data, the use of the W1500 dataset provides an and classical WordNet approaches on the ANEW and SentiNet affordable alternative easy to replicate in any language. Thus, 1 and 2 datasets. On these three datasets, the classical measures the similarity function learned by support vector regression is gave performance relatively close to that of our baseline. obtained by constraints imposed by synonyms, antonyms, and In contrast, the regressors based on neural embeddings and random pairs of unrelated words. This approach is comparable our representation performed significantly better. On these to that of Halawi et al. [49], who used similar constraints to datasets, similar to the lexical similarity task, our method improve a distributional word representation. narrowed that performance gap. However, on the SenticNet 3 and 4 datasets, the neural [3] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and embeddings (blue bars) do not overperform the baseline, while C. Potts, “Learning word vectors for sentiment analysis,” in Proc. 49th Annual Meeting of the Association for Computational : our method (red bar) obtained the best results. A possible Human Language Technologies - Volume 1 (ACL HTL 2011), (Portland, explanation for this is that SenticNet 3 and 4 are considerably Oregon, USA), pp. 142–150, ACL, June 2011. larger than the other datasets and therefore include words [4] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in that are less frequent. Since the quality of the representation Advances in Neural Information Processing Systems 26 (NIPS 2013), of a word when using neural embeddings depends on the (Lake Tahoe, Nevada, USA), pp. 3111–3119, Dec. 2013. number of times it occurs in the corpus, the representation [5] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors for word representation,” in Proc. 2014 Conf. on Empiricial Methods in of rare words is noisy. In WordNet, which was compiled Natural Language Processing (EMNLP 2014), vol. 12, (Doha, Qatar), manually by lexicographers, the quality of the representation pp. 1532–1543, Oct. 2014. 
of a word in principle is independent of the frequency of [6] C. Leacock and M. Chodorow, “Combining local context and WordNet similarity for word sense identification,” in WordNet: An Electronic its use. In this way, our method obtains consistent results Lexical Database (C. Fellbaum, ed.), pp. 265–283, MIT Press, 1998. in a wider spectrum of frequencies, as compared with the [7] P. Resnik, “ in a taxonomy: An information-based neural embeddings. Moreover, our method is the only one that measure and its application to problems of ambiguity in natural lan- exceeded the baseline on all 5 data sets considered. guage,” Journal of Artificial Intelligence Research, vol. 11, pp. 95–130, July 1999. Additionally, Fig. 9 shows universality of our representation, [8] D. Lin, “An information-theoretic definition of similarity,” in Proc. 15th since consistency of the results on the polarity dimension pre- Conf. on (ICML 1998), (Madison, Wisconsin, USA), diction was preserved for the other four affective dimensions. pp. 296–304, Morgan Kaufmann Publishers Inc., July 1998. [9] J. J. Jiang and D. W. Conrath, “Semantic similarity based on corpus statistics and lexical taxonomy,” in Proceedings of Conf. Research on Computational Linguistics (ROCLING/IJCLCLP), (Taipei, Taiwan), VI.CONCLUSIONS pp. 19–33, The Association for Computational Linguistics and Chinese Language Processing (ACLCLP), Aug. 1997. We have presented a novel approach to exploit the WordNet [10] Z. Wu and M. Palmer, “Verbs semantics and lexical selection,” in Proc. graph to build lexical similarity functions and lexical sentiment 32nd Annual Meeting of the Association for Computational Linguistics classifiers. It leverages a novel word representation, which we (ACL 1994), (Las Cruces, New Mexico, USA), pp. 133–138, ACL, June called word2set, based on the sets of words neighboring the 1994. [11] Y. Li, Q. Pan, T. Yang, S. Wang, J. Tang, and E. Cambria, “Learning given word in WordNet. word representations for sentiment analysis,” Cognitive Computation, Our method produces results considerably better than clas- vol. 9, pp. 843–851, Dec. 2017. sical WordNet-based approaches and competitive with those [12] M. M. Bradley and P. J. Lang, “Affective norms for English words (ANEW): Instruction manual and affective ratings,” Tech. Rep. 1, of neural embeddings. It uses supervised learning from a University of Florida, 1999. dataset easy to construct for any language. We have tested [13] E. Cambria, R. Speer, C. Havasi, and A. Hussain, “SenticNet: A publicly our approach on all lexical similarity and relatedness bench- available semantic resource for opinion mining,” in Proc. 2010 AAAI Fall Symposium: Commonsense Knowledge, vol. 10, (Arlington, Virginia, marks available to date, obtaining state-of-the-art results on USA), pp. 14–18, Nov. 2010. predicting human judgments on similarity. Our method is less [14] C. Fellbaum, ed., WordNet: An Electronic Lexical Database. MIT Press, effective for relatedness, but still superior to other classical 1998. [15] H. Rubinstein and J. B. Goodenough, “Contextual correlates of syn- WordNet-based measures. For the task of predicting affective onymy,” Communications of the ACM, vol. 8, pp. 627–633, Oct. 1965. valences of words, our representation is the best alternative [16] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and when considering a wide spectrum of both frequent and rare R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, pp. 
ACKNOWLEDGMENTS

The work was done with partial support from the Mexican Government via SNI, CONACYT, and Instituto Politécnico Nacional, grants SIP 20196437 and 20196021, to A. Gelbukh. The work was done while A. Gelbukh was on a sabbatical stay at the Research Institute for Information and Language Processing, University of Wolverhampton, with a grant provided by the Sabbatical program of CONACYT, Mexico.

REFERENCES

[1] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," Soviet Physics Doklady, vol. 10, pp. 707–710, Aug. 1966.
[2] H. Gómez-Adorno, I. Markov, J. Baptista, G. Sidorov, and D. Pinto, "Discriminating between similar languages using a combination of typed and untyped character n-grams and words," in Proc. 4th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2017), (Valencia, Spain), pp. 137–145, ACL, Apr. 2017.
[6] C. Leacock and M. Chodorow, "Combining local context and WordNet similarity for word sense identification," in WordNet: An Electronic Lexical Database (C. Fellbaum, ed.), pp. 265–283, MIT Press, 1998.
[7] P. Resnik, "Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language," Journal of Artificial Intelligence Research, vol. 11, pp. 95–130, July 1999.
[8] D. Lin, "An information-theoretic definition of similarity," in Proc. 15th Int. Conf. on Machine Learning (ICML 1998), (Madison, Wisconsin, USA), pp. 296–304, Morgan Kaufmann Publishers Inc., July 1998.
[9] J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," in Proc. Conf. on Research on Computational Linguistics (ROCLING/IJCLCLP), (Taipei, Taiwan), pp. 19–33, The Association for Computational Linguistics and Chinese Language Processing (ACLCLP), Aug. 1997.
[10] Z. Wu and M. Palmer, "Verbs semantics and lexical selection," in Proc. 32nd Annual Meeting of the Association for Computational Linguistics (ACL 1994), (Las Cruces, New Mexico, USA), pp. 133–138, ACL, June 1994.
[11] Y. Li, Q. Pan, T. Yang, S. Wang, J. Tang, and E. Cambria, "Learning word representations for sentiment analysis," Cognitive Computation, vol. 9, pp. 843–851, Dec. 2017.
[12] M. M. Bradley and P. J. Lang, "Affective norms for English words (ANEW): Instruction manual and affective ratings," Tech. Rep. 1, University of Florida, 1999.
[13] E. Cambria, R. Speer, C. Havasi, and A. Hussain, "SenticNet: A publicly available semantic resource for opinion mining," in Proc. 2010 AAAI Fall Symposium: Commonsense Knowledge, vol. 10, (Arlington, Virginia, USA), pp. 14–18, Nov. 2010.
[14] C. Fellbaum, ed., WordNet: An Electronic Lexical Database. MIT Press, 1998.
[15] H. Rubenstein and J. B. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, pp. 627–633, Oct. 1965.
[16] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, "Indexing by latent semantic analysis," Journal of the American Society for Information Science, vol. 41, pp. 391–407, Sept. 1990.
[17] D. Lee and S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems (NIPS 2001), (Vancouver, British Columbia, Canada), pp. 556–562, Dec. 2001.
[18] M. Sahlgren, "An introduction to random indexing," in Methods and Applications of Semantic Indexing Workshop at the 7th Conf. on Terminology and Knowledge Engineering (TKE 2005), (Copenhagen, Denmark), July 2005.
[19] E. Agirre, E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, and A. Soroa, "A study on similarity and relatedness using distributional and WordNet-based approaches," in Proc. Human Language Technologies: The 2009 Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL 2009), pp. 19–27, ACL, June 2009.
[20] Y. Bengio, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137–1155, Feb. 2003.
[21] M. Baroni, G. Dinu, and G. Kruszewski, "Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors," in Proc. 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), (Baltimore, Maryland, USA), pp. 238–247, ACL, June 2014.
[22] N. Aletras and M. Stevenson, "A hybrid distributional and knowledge-based model of lexical semantics," in Proc. 4th Joint Conf. on Lexical and Computational Semantics (*SEM 2015), (Denver, Colorado, USA), pp. 20–29, ACL, June 2015.
[23] E. Cambria, C. Havasi, and A. Hussain, "SenticNet 2: A semantic and affective resource for opinion mining and sentiment analysis," in Proc. 25th Florida Artificial Intelligence Research Society Conf. (FLAIRS 2012), (Marco Island, Florida, USA), pp. 202–207, May 2012.
[24] E. Cambria, D. Olsher, and D. Rajagopal, "SenticNet 3: A common and common-sense knowledge base for cognition-driven sentiment analysis," in Proc. 28th AAAI Conf. on Artificial Intelligence (AAAI 2014), (Quebec, Canada), pp. 1515–1521, July 2014.
[25] E. Cambria, S. Poria, R. Bajpai, and B. Schuller, "SenticNet 4: A semantic resource for sentiment analysis based on conceptual primitives," in Proc. 26th Conf. on Computational Linguistics: Technical Papers (COLING 2016), (Osaka, Japan), pp. 2666–2677, Dec. 2016.
[26] G. Sidorov and F. Viveros-Jiménez, "One sense per discourse heuristic for improving precision of WSD methods based on lexical intersections with the context," POLIBITS, vol. 57, pp. 45–50, June 2018.
[27] G. Sidorov, A. Gelbukh, H. Gómez-Adorno, and D. Pinto, "Soft similarity and soft cosine measure: Similarity of features in vector space model," Computación y Sistemas, vol. 18, pp. 491–504, Oct. 2014.
[28] S. Jimenez, C. Becerra, and A. Gelbukh, "SOFTCARDINALITY: Hierarchical text overlap for student response analysis," in Proc. 7th Workshop on Semantic Evaluation (SemEval 2013), (Atlanta, Georgia, USA), pp. 280–284, ACL, June 2013.
[29] S. Bird and E. Loper, "NLTK: The natural language toolkit," in Proc. ACL 2004: Interactive Poster and Demonstration Sessions, (Barcelona, Spain), pp. 31–34, ACL, July 2004.
[30] M.-T. Luong, R. Socher, and C. D. Manning, "Better word representations with recursive neural networks for morphology," in Proc. 17th Conf. on Computational Natural Language Learning (CoNLL 2013), (Sofia, Bulgaria), pp. 104–113, ACL, Aug. 2013.
[31] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proc. 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012), vol. 1, (Jeju Island, Korea), pp. 873–882, ACL, July 2012.
[32] H. Calvo and A. Gelbukh, "Is the most frequent sense of a word better connected in a semantic network?," in Proc. ICIC 2015: Advanced Intelligent Computing Theories and Applications (D.-S. Huang and K. Han, eds.), no. 9227 in Lecture Notes in Computer Science, pp. 491–499, Springer, 2015.
[33] P. Jaccard, "Étude comparative de la distribution florale dans une portion des Alpes et des Jura," Bulletin de la Société Vaudoise des Sciences Naturelles, pp. 547–579, Dec. 1901.
[34] L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, pp. 297–302, July 1945.
[35] A. Tversky, "Features of similarity," Psychological Review, vol. 84, pp. 327–352, July 1977.
[36] B. De-Baets, S. Janssens, and H. De-Meyer, "On the transitivity of a parametric family of cardinality-based similarity measures," International Journal of Approximate Reasoning, vol. 50, pp. 104–116, Jan. 2009.
[37] S. Jimenez, C. Becerra, and A. Gelbukh, "Soft cardinality: A parameterized similarity function for text comparison," in Proc. 1st Joint Conf. on Lexical and Computational Semantics (*SEM 2012), (Montreal, Canada), pp. 449–453, ACL, June 2012.
[38] S. Jimenez, C. Becerra, and A. Gelbukh, "Soft cardinality + ML: Learning adaptive similarity functions for cross-lingual textual entailment," in Proc. 1st Joint Conf. on Lexical and Computational Semantics (*SEM 2012), (Montreal, Canada), pp. 684–688, ACL, June 2012.
[39] S. Jimenez, G. Dueñas, J. Baquero, and A. Gelbukh, "UNAL-NLP: Combining soft cardinality features for semantic textual similarity, relatedness and entailment," in Proc. 8th Workshop on Semantic Evaluation (SemEval 2014), (Dublin, Ireland), pp. 732–742, ACL, Aug. 2014.
[40] S. Jimenez, C. Becerra, and A. Gelbukh, "SOFTCARDINALITY-CORE: Improving text overlap with distributional measures for semantic textual similarity," in Second Joint Conf. on Lexical and Computational Semantics, Volume 1: Proc. Main Conf. and the Shared Task: Semantic Textual Similarity (*SEM 2013), (Atlanta, Georgia, USA), pp. 194–201, June 2013.
[41] B. De-Baets, H. De-Meyer, and H. Naessens, "A class of rational cardinality-based similarity measures," Journal of Computational and Applied Mathematics, vol. 132, pp. 51–69, July 2001.
[42] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, Oct. 2011.
[43] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support vector regression machines," in Advances in Neural Information Processing Systems 10 (NIPS 1997), vol. 9, (Denver, Colorado, USA), pp. 155–161, Dec. 1997.
[44] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, pp. 1–28, Jan. 1991.
[45] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin, "Placing search in context: The concept revisited," ACM Transactions on Information Systems, vol. 20, pp. 116–131, Jan. 2002.
[46] E. Bruni, J. Uijlings, M. Baroni, and N. Sebe, "Distributional semantics with eyes: Using image analysis to improve computational representations of word meaning," in Proc. 20th ACM Conf. on Multimedia (MM 2012), (Nara, Japan), pp. 1219–1228, Nov. 2012.
[47] D. Yang and D. M. W. Powers, "Verb similarity on the taxonomy of WordNet," in Proc. 3rd Global WordNet Conf. (GWC 2006), (Jeju Island, Korea), pp. 121–128, Jan. 2006.
[48] K. Radinsky, E. Agichtein, E. Gabrilovich, and S. Markovitch, "A word at a time: Computing word relatedness using temporal semantic analysis," in Proc. 20th Conf. on World Wide Web (WWW 2011), (Hyderabad, India), pp. 337–346, ACM, Apr. 2011.
[49] G. Halawi, G. Dror, E. Gabrilovich, and Y. Koren, "Large-scale learning of word relatedness with constraints," in Proc. 18th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD 2012), (Beijing, China), pp. 1406–1414, Aug. 2012.
[50] S. Szumlanski, F. Gomez, and V. K. Sims, "A new set of norms for semantic relatedness measures," in Proc. 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), (Sofia, Bulgaria), pp. 890–895, Aug. 2013.
[51] S. Baker, R. Reichart, and A. Korhonen, "An unsupervised model for instance level subcategorization acquisition," in Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP 2014), (Doha, Qatar), pp. 278–289, Oct. 2014.
[52] F. Hill, R. Reichart, and A. Korhonen, "SimLex-999: Evaluating semantic models with (genuine) similarity estimation," Computational Linguistics, vol. 41, pp. 665–695, Dec. 2015.
[53] S. Patwardhan and T. Pedersen, "Using WordNet-based context vectors to estimate the semantic relatedness of concepts," in Proc. EACL 2006 Workshop on Making Sense of Sense: Bringing Computational Linguistics and Psycholinguistics Together, (Trento, Italy), pp. 1–8, Apr. 2006.
[54] M. A. H. Taieb, M. B. Aouicha, and A. B. Hamadou, "Computing semantic relatedness using Wikipedia features," Knowledge-Based Systems, vol. 50, pp. 260–278, Sept. 2013.
[55] C. Li, B. Xu, G. Wu, T. Zhuang, X. Wang, and W. Ge, "Improving word embeddings via combining with complementary languages," in Proc. Canadian AI, pp. 313–318, Springer, LNAI 8436, May 2014.
[56] F. Hill, K. Cho, S. Jean, C. Devin, and Y. Bengio, "Not all neural embeddings are born equal," arXiv preprint arXiv:1410.0718, Oct. 2014.
[57] E. Cambria, S. Poria, D. Hazarika, and K. Kwok, "SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings," in Proc. 32nd AAAI Conf. on Artificial Intelligence (AAAI 2018), pp. 1795–1802, Feb. 2018.
[58] S. Baccianella, A. Esuli, and F. Sebastiani, "SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining," in Proc. 7th Conf. on Language Resources and Evaluation (LREC 2010), vol. 10, (Malta), pp. 2200–2204, May 2010.