
WordNet Embeddings

Chakaveh Saedi, António Branco, João António Rodrigues, João Ricardo Silva
University of Lisbon
NLX-Natural Language and Speech Group, Department of Informatics
Faculdade de Ciências, Campo Grande, 1749-016 Lisboa, Portugal
{chakaveh.saedi, antonio.branco, joao.rodrigues, jsilva}@di.fc.ul.pt

Abstract

Semantic networks and semantic spaces have been two prominent approaches to represent lexical meaning. While a unified account of lexical meaning relies on one being able to convert between these representations, in both directions, the conversion direction from semantic networks into semantic spaces started to attract more attention only recently. In this paper we present a methodology for this conversion and assess it with a case study. When it is applied over WordNet, the performance of the resulting embeddings in a mainstream semantic similarity task is very good, substantially superior to the performance of word embeddings based on very large collections of texts like word2vec.

1 Introduction

The study of lexical semantics has been at the core of research on language science and technology, as the meaning of linguistic forms results from the meaning of their lexical units and from the way these are combined (Pelletier, 2016). How to represent lexical semantics has thus been a central topic of inquiry. Three broad families of approaches have emerged in this respect, namely those advocating that lexical semantics is represented as a semantic network (Quillan, 1966), a feature-based model (Minsky, 1975; Bobrow and Norman, 1975), or a semantic space (Harris, 1954; Osgood et al., 1957).

In terms of data structures, under a semantic network approach, the meaning of a lexical unit is represented as a node in a graph whose edges encode different types of semantic relations holding among the units (e.g. hypernymy, meronymy, etc.). In a feature-based model, the meaning of a lexical unit is represented by a hash table where a key is the lexical unit of interest and the respective value is a set of other units denoting typical characteristics of the denotation of the unit in the key (e.g. role, usage, shape, etc.). Under a semantic space perspective, in turn, the meaning of a lexical unit is represented by a vector in a high-dimensional space, where each dimension is based on some frequency level of co-occurrence with the other units in contexts of language usage.

The motivation for these three families of lexical representation is to be found in their different suitability and success in explaining a wide range of empirical phenomena, in terms of how these are manifest in ordinary language usage and how they are elicited in laboratory experimentation. These phenomena are related to the acquisition, storage and retrieval of lexical knowledge (e.g. the spreading activation effect (Meyer and Schvaneveldt, 1971), the fan effect (Anderson, 1974), among many others) and to how this knowledge interacts with other cognitive faculties or tasks, including categorization (Estes, 1994), reasoning (Rips, 1975), problem solving (Holyoak and Koh, 1987), learning (Ross, 1984), etc.

In the scope of the formal and computational modeling of lexical semantics, these approaches have inspired a number of initiatives to build repositories of lexical knowledge. Popular examples of such repositories are, for semantic networks, WordNet (Fellbaum, 1998), for feature-based models, Small World of Words (De Deyne et al., 2013), and, for semantic spaces, word2vec (Mikolov et al., 2013a), among many others. Interestingly, to achieve the highest quality, repositories of different types typically resort to different empirical sources of data. For instance, WordNet is constructed on the basis of systematic lexical intuitions handled by human experts; the information encoded in Small World of Words is elicited from laypersons; and word2vec is built on the basis of the co-occurrence frequency of lexical units in a collection of documents.

Even when motivated in the first place by psycholinguistic research goals, these repositories of lexical knowledge have been extraordinarily important for language technology. They have been instrumental for major advances in language processing tasks and applications such as word sense disambiguation, part-of-speech tagging, named entity recognition and language understanding (e.g. (Li and Jurafsky, 2015)), parsing (e.g. (Socher et al., 2013)), entailment detection (e.g. (Baroni et al., 2012)), discourse analysis (e.g. (Ji and Eisenstein, 2014)), among many others.¹

¹ For the vast number of applications of WordNet, see http://lit.csci.unt.edu/∼wordnet

The proliferation of different types of representation for the same object of research is common in science, and searching for a unified rendering of a given research domain has been a major goal in many disciplines. To a large extent, such a search focuses on finding ways of converting one type of representation into another. Once this is made possible, it brings not only the theoretical satisfaction of a better unified insight into the research object, but also the important instrumental reward of reapplying results, resources and tools obtained under one representation to the other representations, thus opening the potential for further research advances.

This is the case also for research on lexical semantics. Establishing whether and how any given lexical representation can be converted into another representation is important for a more unified account of it. On the language science side, this will likely enhance the plausibility of our empirical modeling of how the mind-brain handles lexical meaning. On the language technology side, in turn, this will permit the reuse of resources and new ways of combining different sources of lexical information for better application results.

In the present paper, we seek to contribute towards a unified account of lexical semantics. We report on the methodology we used to convert from a semantic network based representation of lexical meaning into a semantic space based one, and on the successful evaluation results obtained when applying that methodology. We resorted to Princeton WordNet version 3 as a repository of the lexical semantics of English, represented as a semantic graph, and converted a subgraph of it with half of its concepts into wnet2vec, a collection of vectors in a high-dimensional space. These WordNet embeddings were evaluated under the same conditions that semantic space based repositories like word2vec are, namely under the processing task of determining the semantic similarity between pairs of lexical units. The evaluation results obtained for wnet2vec are around 15% superior to the results obtained for word2vec on the same mainstream evaluation data set, SimLex-999 (Hill et al., 2016).

2 Distributional vectors from ontological graphs

For a given word w, its distributional representation ~w (aka its embedding) is a high-dimensional vector whose elements ~wi record real-valued scores expressing the strength of the semantic affinity of w with the other words in the vocabulary. The usual source of these scores, and ultimately the empirical base of word embeddings, has been the frequency of co-occurrence between words taken from large collections of text.

The goal here, instead, is to use semantic networks as the empirical source of word embeddings. This will permit the lexical knowledge that is encoded in a semantic graph to be re-encoded as an embeddings matrix compiling the distributional vectors of the words in the vocabulary.

To determine the strength of the semantic affinity of two words from their representation in a semantic graph, we follow this intuition: the larger the number of paths and the shorter the paths connecting any two nodes, the stronger their affinity.

To make this intuition operative we resort to the following procedure, to be refined later on. First, the semantic graph G is represented as an adjacency matrix M such that, if two nodes of G with words wi and wj are related by an edge representing a direct semantic relation between them, the element Mij is set to 1 (and to 0 otherwise). Second, to enrich M with scores that represent the strength of the semantic affinity of nodes not directly connected with each other by an edge, the following cumulative iteration is resorted to:

M_G^{(n)} = I + \alpha M + \alpha^2 M^2 + \ldots + \alpha^n M^n \qquad (1)

where I is the identity matrix; the n-th power of the transition matrix, M^n, is the matrix where each element M^n_ij counts the number of paths of length n between nodes i and j; and α < 1 is a decay factor determining how longer paths are dominated by shorter ones.
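As a minimal sketch of this first step, the fragment below builds a toy adjacency matrix over an invented four-word vocabulary and accumulates the first n terms of iteration (1) with numpy; the vocabulary, the edge list and the helper name katz_accumulate are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

# Toy vocabulary and undirected semantic edges (illustrative only).
vocab = ["dog", "canine", "animal", "cat"]
edges = [("dog", "canine"), ("canine", "animal"), ("cat", "animal")]
idx = {w: i for i, w in enumerate(vocab)}

# Adjacency matrix M: M[i, j] = 1 iff words i and j are directly related.
M = np.zeros((len(vocab), len(vocab)))
for a, b in edges:
    M[idx[a], idx[b]] = M[idx[b], idx[a]] = 1.0

def katz_accumulate(M, alpha=0.75, n=10):
    """Accumulate equation (1): I + alpha*M + alpha^2*M^2 + ... + alpha^n*M^n."""
    acc = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for _ in range(n):
        term = alpha * (term @ M)  # next term alpha^k * M^k
        acc += term
    return acc

MG_n = katz_accumulate(M)
```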

Third, this iterative procedure is pursued until it converges to the matrix M_G, which can be obtained analytically by an inverse matrix operation:²

M_G = \sum_{e=0}^{\infty} (\alpha M)^e = (I - \alpha M)^{-1} \qquad (2)

² This is equation (7.63) in (Newman, 2010), where it is presented as a regular equivalence measure termed Katz similarity.
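A direct rendering of the closed form in (2), in line with footnote 3 below, which reports that the inversion was done with numpy; the function name is our own and the snippet assumes a dense matrix that fits in memory.

```python
import numpy as np

def affinity_matrix(M, alpha=0.75):
    """Equation (2): M_G = (I - alpha*M)^(-1), the Katz-similarity matrix of the graph."""
    n = M.shape[0]
    return np.linalg.inv(np.eye(n) - alpha * M)
```

For the full 120k × 120k WordNet matrix this inversion is the memory and time bottleneck discussed in Subsection 4.3, which is why the reported experiments operate on subgraphs.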

3 WordNet embeddings

In order to assess this procedure, we use it to convert a mainstream ontological graph into an embeddings matrix. We use Princeton WordNet (Fellbaum, 1998) as our working semantic network. This is a lexical ontology for English, with over 120k concepts that are related by over 25 types of semantic relations, comprising over 155k words (lemmas) from the categories Noun (with 117k words), Verb, Adjective and Adverb.

The quality of the resulting semantic space (based on a semantic network) is assessed by resorting to the mainstream procedure to evaluate semantic spaces: (i) it is used to solve the task of determining the semantic similarity between words in a mainstream test data set used in the literature; (ii) its performance is compared to the performance of a mainstream semantic space (based on a text collection), namely word2vec (Mikolov et al., 2013b), which serves as our baseline.

The base data set was obtained by extracting a subgraph from WordNet that supports a 60k word distributional matrix. All parts of speech in WordNet were considered.

The nodes in WordNet are related by different types of semantic relations (e.g. hypernymy, meronymy, etc.). Relations of different types were taken into account with identical weight for the sake of the conversion of the graph into a matrix.

Upon applying the conversion procedure by resolving equation (2),³ its outcome M_G was subject to the Positive Point-wise Mutual Information transformation (PMI+), seeking to reduce the possible bias introduced by the conversion towards words with more senses. For the sound application of the conversion, each line in M_G was then normalized with the L2-norm, so that every row corresponds to a normalized vector, playing the role of a row in a transition matrix. Finally, we used Principal Component Analysis (PCA) (Wold et al., 1987) to transform the matrix, reducing the size of the vectors and setting the dimension of the encoded semantic space to 850.

³ We used linalg.inv from the numpy package for the inverse matrix calculation.
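The post-processing chain just described (PMI+ transform, row normalization, PCA down to 850 dimensions) can be sketched as follows; the ppmi helper is our own illustrative implementation of a standard positive PMI transform, sklearn is an assumption (the paper does not say which PCA implementation was used), and MG refers to the affinity matrix from the earlier sketch, assumed here to be full-sized rather than the toy example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

def ppmi(X):
    """Positive pointwise mutual information transform of a non-negative matrix."""
    total = X.sum()
    rows = X.sum(axis=1, keepdims=True)
    cols = X.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((X * total) / (rows * cols))
    pmi[~np.isfinite(pmi)] = 0.0          # zero cells and empty rows/columns
    return np.maximum(pmi, 0.0)

# MG is the affinity matrix obtained from equation (2);
# it must have at least 850 columns for the reduction below.
X = ppmi(MG)
X = normalize(X, norm="l2", axis=1)       # L2-normalize each row
wnet2vec = PCA(n_components=850).fit_transform(X)  # reduce to 850 dimensions
```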
To assess the quality of the resulting semantic space, we resorted to the test data set SimLex-999 (Hill et al., 2016), containing a list of 999 pairs of words. Each pair is associated with a score, on a 0-10 scale, that indicates the strength of the semantic similarity between the words in that pair. For each pair, the cosine between the vectors of the two words in the resulting embedding matrix is calculated and mapped onto the 0-10 scale. The outcome is compared to the gold standard scores in SimLex-999 by means of Spearman's rank correlation coefficient.⁴ The respective scores are displayed in Table 1.

Model      Similarity
wnet2vec   0.50
word2vec   0.44

Table 1: Performance in the semantic similarity task over SimLex-999, given by Spearman's coefficient (higher is better).

⁴ We used the evaluate_word_pairs function from the gensim package (Řehůřek and Sojka, 2010) to determine the performance of both semantic spaces, the wnet2vec and the word2vec embeddings.
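A hedged sketch of this evaluation loop: cosine similarity between the two word vectors, rescaled to SimLex's 0-10 range, and compared to the gold scores with Spearman's rank correlation from scipy. Footnote 4 reports that the authors used gensim's evaluate_word_pairs instead; simlex_pairs is an assumed, pre-loaded list of (word1, word2, gold_score) triples, word2row an assumed word-to-row-index mapping for the wnet2vec matrix, and the linear rescaling is one simple choice (any monotone mapping leaves Spearman's coefficient unchanged).

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

gold, predicted = [], []
for w1, w2, score in simlex_pairs:              # assumed: 999 scored pairs
    if w1 in word2row and w2 in word2row:       # skip pairs outside the vocabulary
        gold.append(score)
        sim = cosine(wnet2vec[word2row[w1]], wnet2vec[word2row[w2]])
        predicted.append((sim + 1) / 2 * 10)    # map cosine [-1, 1] onto the 0-10 scale

rho, _ = spearmanr(gold, predicted)
print(f"Spearman correlation: {rho:.2f}")
```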

4 Discussion

These results indicate a clear advantage, of around 15%, for the WordNet embeddings, scoring 0.50, over the word2vec embeddings, scoring 0.44. This indicates that the proposed conversion procedure is very effective.

The WordNet embeddings constitute a semantic space empirically based on an internal language resource: on a systematic elicitation and recording of the semantic relations between words, thus being closely aligned with the lexical knowledge in the minds of speakers. Word2vec, in turn, is a semantic space empirically based on an external language resource: on records of contingent language usage, namely texts that were produced by a population of language users and happened to be collected together. Hence, while words related by some semantic relation are likely to be linked in WordNet, they may happen to rarely or never occur in relevant context windows, as practical constraints on the production and usage of language may not favor that. This may help to explain the advantage of wnet2vec over word2vec.⁵

The conversion procedure is composed of a number of steps, each of which may receive a range of configurations. This opens a large experimental space, of which the experiment in Section 3 instantiates one set of coordinates. In the remainder of the present section we justify the empirical settings used and discuss the lessons learned by exploring this experimental space. The conversion procedure will be revisited in a backwards fashion, from its final to its initial steps, with the experiments being performed over the 60k subset identified in Subsection 4.3.

4.1 Matrix manipulation

Vector dimension: There have been studies indicating the positive effect of reducing the dimensionality of the semantic space (e.g. (Underhill et al., 2007; Grünauer and Vincze, 2015)). We experimented with a range of final vector dimensions, namely sizes 100, 300, 850, 1000 and 3000, also over evaluation data sets other than just SimLex-999.⁶ The results consistently indicated that size 850 leads to better performance.⁷

Dimensionality reduction: We compared two different techniques for dimensionality reduction, PCA (Wold et al., 1987) and a neural network approach. For the neural solution, an encoder-decoder architecture with a Sigmoid activation function was employed. The model was trained using a Nadam optimizer with binary cross-entropy as the loss metric. Experimentation consistently indicated that PCA is substantially more successful.

Normalization and bias: We contrasted the performance of the WordNet embeddings obtained with and without normalization of the distributional vectors. Results consistently indicated the advantage of doing normalization, even if by a small margin, with a delta of around 0.08. Ablation tests were also done with respect to PMI+, which indicated a clear advantage of applying it.

⁵ Naturally, the comparative advantage between a semantic space based on a semantic network and another based on a collection of texts also depends on the sizes of the network and of the collection. The training corpus of the word2vec-GoogleNews-vectors we used is one of the largest, with an impressive amount of 100 billion tokens and a vocabulary of 3 million types, which, differently from the vocabulary units in WordNet, are wordforms, not lemmas (Mikolov et al., 2013a).
⁶ More on evaluation data sets in Section 4.4.
⁷ The vector size in the word2vec embeddings is 300.
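For the neural alternative mentioned under Dimensionality reduction, a minimal encoder-decoder along the lines described (sigmoid activations, Nadam optimizer, binary cross-entropy loss) could look like the Keras sketch below; the exact layer layout, number of epochs and batch size are our own assumptions, with only the 850-dimensional bottleneck taken from the text.

```python
from tensorflow import keras

def build_autoencoder(input_dim, bottleneck_dim=850):
    """Encoder-decoder with sigmoid activations, trained to reconstruct its input."""
    inputs = keras.Input(shape=(input_dim,))
    encoded = keras.layers.Dense(bottleneck_dim, activation="sigmoid")(inputs)
    decoded = keras.layers.Dense(input_dim, activation="sigmoid")(encoded)
    autoencoder = keras.Model(inputs, decoded)
    encoder = keras.Model(inputs, encoded)
    autoencoder.compile(optimizer="nadam", loss="binary_crossentropy")
    return autoencoder, encoder

# X is the (vocabulary x vocabulary) matrix to be reduced, scaled to [0, 1]:
# autoencoder, encoder = build_autoencoder(X.shape[1])
# autoencoder.fit(X, X, epochs=20, batch_size=128)
# reduced = encoder.predict(X)   # 850-dimensional vectors
```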
4.2 Graph manipulation

Decay factor: The best results were achieved with α = 0.75, after experimenting with values in the range 0.65 to 0.85.

Picking semantic relations: Concepts in WordNet are connected via semantic relations of different types. The relations of Hypernymy/Hyponymy, Synonymy and Antonymy play an essential role in structuring a semantic network, as without them the network could not exist. We undertook experiments where either all semantic relations or only these kernel relations were taken into account for the conversion procedure, with results indicating a clear advantage for using all relations.

Weighting semantic relations: In the definition of a semantic network, some types of relations appear as necessary (e.g. Hypernymy), while others appear as more secondary (e.g. Meronymy). It might thus happen that the conversion of a semantic network into a semantic space could be optimized if different weights were assigned to different relations accordingly. We ran an experiment where different weights were assigned to different relations, with hypernymy, hyponymy, antonymy and synonymy weighted 1, a second group of relations 0.8, and the remaining relations 0.5; and another experiment where all types of semantic relation were assigned the same weight. Better results were obtained with the latter.
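As an illustration of the relation-weighting experiment, the sketch below builds the adjacency matrix with a per-relation-type weight instead of a uniform 1; the relation names, the grouping into weight tiers and the helper name are our own illustrative choices, mirroring the configuration described above only loosely.

```python
import numpy as np

# Per-relation-type weights for one of the configurations tested in 4.2:
# kernel relations 1.0, an illustrative second group 0.8, all remaining types 0.5.
RELATION_WEIGHTS = {
    "hypernym": 1.0, "hyponym": 1.0, "antonym": 1.0, "synonym": 1.0,
    "meronym": 0.8, "holonym": 0.8,   # illustrative members of the 0.8 tier
}
DEFAULT_WEIGHT = 0.5

def weighted_adjacency(n_words, typed_edges):
    """typed_edges: iterable of (i, j, relation_type) index pairs with a relation label."""
    M = np.zeros((n_words, n_words))
    for i, j, rel in typed_edges:
        w = RELATION_WEIGHTS.get(rel, DEFAULT_WEIGHT)
        M[i, j] = M[j, i] = w
    return M
```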

4.3 Base data sets

Subgraphs: The conversion procedure relies on equation (2), whose complexity is dominated by the calculation of the inverse matrix, which is of cubic order in the number of words. For the Princeton WordNet graph, with over 120k concepts, given that the size of M is over 120k × 120k, this calculation and the overall conversion of the ontological graph into the final embeddings matrix face substantial challenges in terms of memory footprint. To cope with this issue, we resorted to initial subgraphs of manageable size.⁸

We reduced the size of M by eliminating the sparser rows (rows with more zero elements), corresponding to eliminating words whose concepts have a lower number of outgoing edges in WordNet. Rows were ordered by decreasing density (increasing number of zero elements), with rows of identical density randomly ordered among themselves; the first 25k, 30k, 45k and 60k rows were extracted and used in the conversion process. To maximize the overlap with the test set SimLex-999, its words present in WordNet were retained. The performance scores of the resulting models are displayed in Table 2.

Random subgraphs     25k    30k    45k    60k
Semantic similarity  0.45   0.47   0.49   0.50

Table 2: Performance of wnet2vec in the similarity task over SimLex-999 (Spearman's coefficient).

⁸ To invert a 60k matrix, numpy used all memory available in a machine with 32 CPUs at 2.50GHz and 430 GB of RAM.
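A sketch of this row-selection step, under the assumptions that M is the full 0/1 adjacency matrix, words the corresponding vocabulary list, and simlex_words the set of SimLex-999 words present in WordNet; ties in density are broken randomly, as described above, and the function name is ours.

```python
import numpy as np

def select_subgraph(M, words, keep=60000, must_keep=frozenset()):
    """Keep the `keep` least sparse rows (plus all words in `must_keep`)."""
    rng = np.random.default_rng(0)
    nonzeros = (M != 0).sum(axis=1)
    # Sort by decreasing number of nonzero cells; ties are broken randomly.
    order = sorted(range(len(words)),
                   key=lambda i: (-nonzeros[i], rng.random()))
    chosen = set(order[:keep]) | {i for i, w in enumerate(words) if w in must_keep}
    rows = sorted(chosen)
    return M[np.ix_(rows, rows)], [words[i] for i in rows]

# sub_M, sub_words = select_subgraph(M, vocab, keep=60000, must_keep=simlex_words)
```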
The larger the size of the WordNet subgraph, the better the performance of the resulting embeddings. As they contain more concepts, which on average are closer to each other, larger subgraphs tend to be denser and to generate less sparse adjacency matrices. This supports semantic spaces with distributional vectors carrying more discriminative information on the semantic affinity of a word with respect to the others.

The progression of scores in Table 2, for subgraphs with matrices in the range 25k-60k, supports the conjecture that, when enough computational means are available and the full 155k word WordNet can be used, the performance of the resulting embeddings may still improve by a substantial margin over the result now observed for the 60k matrix, which covers less than half of the words.

Additionally, we experimented with two specific subgraphs that were not randomly extracted from WordNet, namely: the subgraph supporting the matrix with the 13k most frequent words of English;⁹ and the subgraph supporting the matrix with the 13k words used in (De Deyne et al., 2016),¹⁰ which had been selected to act as cue words in psycholinguistic experiments for eliciting associated words from subjects. The performance results of the resulting models are displayed in Table 3.

Specific subgraphs   13k most frequent   13k cue words
Similarity           0.47                0.50

Table 3: Performance of wnet2vec in the similarity task over SimLex-999, given by Spearman's coefficient. The first row indicates the sizes of the matrices supported by the specific subgraphs.

These matrices have less than 1/4 of the size of the 60k matrix, and yet they show a better than expected approximation to its performance, taking into account the progression registered in Table 2. These results indicate that larger size is not the only factor improving the performance of WordNet embeddings. Very interestingly, they seem to indicate that more commonly used words may support semantic spaces that are more accurate at discriminating semantic similarity.

Frequency of occurrence in texts plays no direct role in the conversion of semantic networks into semantic spaces by equation (2). Hence this effect likely results from the fact, captured by one of the Zipfian word distributions, that on average more frequent words are more ambiguous than less frequent ones: on average, more frequent words express more concepts (that is, they occur in more WordNet synsets) and thus enter into more outgoing edges in the semantic network, and this should support less sparse vectors in the semantic space. This explanation is empirically supported by the fact that the word ambiguity rates are 2.7 and 2.8 in the subgraphs with the 13k cue words and with the 13k most frequent words, respectively, while there is a lower word ambiguity rate of around 1.5 for the subgraph with 60k words.¹¹

Parts of Speech: Princeton WordNet covers nouns, verbs, adjectives and adverbs. Nouns (117k) are the largest portion of all words (155k) in the graph and, among the different POS, they support the densest subgraph of semantic relations. We ran experiments with words from all POS categories and with only Nouns considered. While the results obtained with Nouns only (0.44) are not that distant from the results obtained with all POS (0.50), the latter setting consistently showed better performance.

⁹ To reach 13k, we used the 10k most common English words, as determined by n-gram frequency analysis of Google's Trillion Word Corpus, from (Kaufman, 2017), supplemented with non-repeating words from Wiktionary frequency lists (Wiktionary, 2017).
¹⁰ Available from https://smallworldofwords.org/en
¹¹ This is obtained by counting n lemmas for a word that enters WordNet under n POS categories. The word ambiguity rate of the whole WordNet is 1.3.

4.4 Testing data and metrics

To assess the robustness of the results obtained, experiments were undertaken with: (i) yet another evaluation metric, namely Pearson's correlation coefficient; (ii) further evaluation data sets for semantic similarity, namely RG1965 (Rubenstein and Goodenough, 1965) and Wordsim-353-Similarity (Agirre et al., 2009); and (iii) testing over another task, namely semantic relatedness, with the evaluation data sets Wordsim-353-Relatedness (Agirre et al., 2009), MEN (Bruni et al., 2012) and MTurk-771 (Halawi et al., 2012). In these experiments we used our best settings, with a random 60k subgraph, and our second best settings, namely the best model based on a specific 13k subgraph, cf. Subsection 4.3.

Additional metric: The evaluation scores obtained over SimLex-999 with Pearson's coefficient are basically aligned with the scores already obtained with Spearman's coefficient, confirming the superiority of the WordNet embeddings.

Additional data sets: Even though they have a number of test pairs much lower than SimLex-999 and were built under less standard procedures, and thus support less reliable results, we evaluated our models over the Wordsim353-S and RG1965 data sets. Wnet2vec showed competitive performance when put side by side with word2vec, even though its scores were not superior. With these smaller alternative data sets, the results for the specific 13k model were slightly superior to the results for the random 60k model.

Additional task: The relation "semantic relatedness" is broader and less well defined than the relation "semantic similarity". Experiments with a second task of determining semantic relatedness showed that word2vec performs clearly better on this task than on the task of semantic similarity, while wnet2vec in general performs worse on it. Wnet2vec is thus less prone than word2vec to be fooled by words that are just semantically related but not necessarily similar. This indicates that the superiority of wnet2vec in the similarity task results from an enhanced discriminative capacity, with it being better both at judging as similar words that are actually similar, and at judging as non-similar not only words that are clearly non-similar but also words that are semantically related, and thus may be close to being similar.

The results obtained with these experiments are displayed in Table 4.¹²

data set       task    size   overlap %   w2vec  n2vec 13k s  n2vec 60k r   w2vec  n2vec 13k s  n2vec 60k r
                                           (Spearman coefficient)            (Pearson coefficient)
SimLex-999     simil   999    99.8        0.44   0.50         0.50          0.45   0.52         0.51
RG1965         simil   65     100.0       0.75   0.65         0.56          0.75   0.75         0.72
Wordsim353-S   simil   203    98.0        0.74   0.65         0.51          0.73   0.67         0.58
Wordsim353-R   relat   252    97.6        0.61   0.32         0.31          0.58   0.33         0.30
MEN            relat   3000   44.9        0.70   0.46         0.45          0.68   0.48         0.45
MTURK-771      relat   771    99.7        0.66   0.54         0.53          0.63   0.54         0.52

Table 4: Performance of different models in the semantic similarity (simil) and relatedness (relat) tasks over different data sets, measured by Spearman's and Pearson's coefficients. Models used: word2vec (w2vec); wnet2vec with the best specific 13k subgraph (n2vec 13k s); and wnet2vec with the random 60k subgraph (n2vec 60k r), cf. Subsection 4.3. The overlap with the vocabulary of the wnet2vec 60k random model appears in the fourth column.

¹² Pairs in the evaluation data set but not in the semantic space do not count towards the evaluation score: the proportion of vocabulary overlap does not affect the scoring.
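A sketch of how both coefficients can be computed while ignoring out-of-vocabulary pairs, as footnote 12 describes; model_similarity is an assumed callable returning the model's score for a pair, pairs an assumed list of (word1, word2, gold) triples from any of the data sets above, and the function name is ours.

```python
from scipy.stats import pearsonr, spearmanr

def correlations(pairs, model_similarity, vocabulary):
    """Spearman and Pearson over the pairs whose two words are in the vocabulary."""
    gold, pred = [], []
    for w1, w2, score in pairs:
        if w1 in vocabulary and w2 in vocabulary:   # OOV pairs are simply skipped
            gold.append(score)
            pred.append(model_similarity(w1, w2))
    return spearmanr(gold, pred)[0], pearsonr(gold, pred)[0]
```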
5 Related work

From semantic spaces to semantic networks: There has been a long research tradition on semantic networks enhanced with information extracted from text, including distributional vectors, which in the limit may encompass semantic networks obtained from semantic spaces. By way of illustration, among many others, this includes work on semantic relations determined from patterns based on regular expressions, either hand-crafted (Hearst, 1992) or learned from corpora (Snow et al., 2005); work on semantic relations predicted by classifiers running over distributional vectors (Baroni et al., 2012; Roller et al., 2014; Weeds et al., 2014); and work on semantic relations obtained with deep learning that integrates distributional information and patterns of grammatical dependency relations (Shwartz et al., 2016), including the hard task of distinguishing synonymy from antonymy (Nguyen et al., 2017). While highly relevant for a unified account of lexical semantics, this line of research addresses the conversion direction, from semantic spaces to semantic networks, that is not the major focus of this paper.

From semantic networks to semantic spaces: Work in the conversion direction that is of interest here is more recent. By way of illustration, among others, one can mention (Faruqui et al., 2015), which explored retrofitting to refine distributional representations using relational information, and (Yu and Dredze, 2014), which also focused on refining word embeddings with lexical knowledge, but which do not address the goal of obtaining semantic spaces solely on the basis of semantic networks, as we do here.

That is also the aim of recent work like (Camacho-Collados et al., 2015), who improve the embeddings built from data sets made of selected Wikipedia pages by resorting to the local, one-edge relations of each relevant word in the WordNet graph.

Further recent works worth mentioning include (Vendrov et al., 2015), which resorted to order embeddings, which however do not preserve distance and/or directionality under the relevant semantic relations, and (Nickel and Kiela, 2017), which experimented with computing embeddings not in Euclidean but in hyperbolic space, namely the Poincaré ball model. A shortcoming of these proposals is that their outcome is not easily plugged into neural models. They are also not fit for evaluation on external tasks, like the semantic similarity task, their evaluation being based rather on their ability to complete missing edges in ontological graphs. In contrast, an example of the suitability of wnet2vec for being plugged into neural models and of its application in a downstream task is reported in (Rodrigues et al., 2018), where these embeddings support the prediction of brain activation with neural networks.

There has also been a long tradition of research on learning vector embeddings from multi-relational data, of which, among many others, one can refer to (Bordes et al., 2013), (Lin et al., 2015) and (Nickel et al., 2016). Though to a large extent these are generic approaches for graph-to-vector conversion, here too the major focus has been on exploring these models for their ability to complete missing relations in knowledge bases rather than on experimenting with them in natural language processing and lexical semantics.

Other related approaches worth noting are (De Deyne et al., 2016) and (Goikoetxea et al., 2015). While also being based on the iterative conversion procedure used here, the first concentrates, however, on converting not a semantic network but a fragment of the lexicon represented under a feature-based approach into a semantic space. While seeking to obtain WordNet embeddings, the second resorts, however, not to a genuine conversion procedure but to a lossy intermediate "textual" representation: it generates sequences of words by concatenating words visited by random walks over the WordNet graph; this "artificial text" is a partial and contingent reflection of the semantic network and is used to obtain distributional vectors by resorting to typical text-based word embedding techniques.

Distances in a semantic graph: The task of determining the semantic similarity between two words can be performed not only on the basis of the similarity of their respective vectors in a semantic space, but also on the basis of the distance between the respective concepts in a lexical semantic network, like WordNet. There has been a long research tradition on this issue, whose major proposals include (Jiang and Conrath, 1997), (Lin, 1998), (Leacock and Chodorow, 1998), (Hirst and St-Onge, 1998) and (Resnik, 1999), among others, which received comparative assessments in (Ferlez and Gams, 2004) and (Budanitsky and Hirst, 2006), including their correlation with human judgments.

In this context, the work by (Hughes and Ramage, 2007), which resorts to random graph walks over WordNet edges, is worth noting. Differently from our approach, its goal is to obtain word-specific stationary probability distributions (such that the semantic affinity of two words is based on the similarity of their probability distributions), rather than to obtain vectorial representations for words in a shared distributional semantic space.

The focus of the present paper is on an effective method to convert a semantic network into a semantic space, with the graph-based affinity obtained by the chaining of "local" one-edge distances ensured by the iteration in (1)-(2) being central to that goal.

It will be interesting to understand whether it is possible to consider, as an alternative, graph-based metrics of semantic similarity for any two nodes anywhere in the graph, resorting to the "non-local" multi-edge distance between the two input words. It remains to be understood whether they can serve as the basis of "all vs. all" procedures for an exhaustive screening of the graph that remain computationally tractable, thus keeping up with an effective method for graph-to-matrix conversion of an entire lexical semantic network that resists the possible computational blow-up.

6 Conclusions

In this paper, we offer a contribution towards a unified account of lexical semantics. We propose a methodology to convert from semantic networks, which are encoded in ontological graphs and empirically based on systematic linguistic intuitions (in their higher quality incarnations), to semantic spaces, which are encoded in distributional vectors and empirically based on very large collections of texts (in their higher quality implementations). This conversion methodology relies on a straightforward yet powerful intuition: the larger the number of paths and the shorter the paths connecting two nodes in an ontological graph, the stronger their semantic affinity. Iteration (1) makes this intuition operative in order to generate a distributional matrix from an ontological graph.

We also report on the results of assessing this conversion methodology with a case study, namely by applying it to a subgraph of WordNet with less than half of its words (60k), randomly selected from the ones whose senses have a larger number of outgoing edges. The resulting distributional vectors, wnet2vec, were evaluated on the mainstream task of determining the semantic similarity of words arranged in pairs, against the mainstream gold standard SimLex-999, with very good results. The performance of wnet2vec was around 15% superior to the performance of word2vec, trained on a 100 billion token collection of texts. This indicates that the proposed conversion procedure is very effective and that the WordNet embeddings are competitive when compared to text-based embeddings.

It is nevertheless worth underlining that the research goal of this paper was not to search for word embeddings that outperform all previous proposals known in the literature in terms of intrinsic evaluation tasks, like semantic similarity, or when they are embedded in larger systems. Its research goal was rather to demonstrate that it is feasible to create very effective word embeddings from semantic networks with a straightforward yet powerful method of conversion from semantic networks to semantic spaces which, given its simplicity, offers the promise to generalize well to types of lexical networks and ontologies other than just WordNet, which was the case study used here.

The fact that less than half of the words in WordNet were used in the reported experiment reinforces this positive expectation with respect to the strength of the proposed approach, and points towards future work that will seek to use larger portions of WordNet as computational limitations are overcome.

The results reported in this paper thus hint at very promising research avenues, including, among others, experiments with further ontologies of different domains, empirical origins, etc.; cross-lingual triangulation involving aligned wordnets and aligned embeddings; reciprocal reinforcement of ontological graphs and distributional vectors; and other metrics of semantic affinity in a graph.

The wnet2vec data and code and their future updates are distributed at https://github.com/nlx-group/WordNetEmbeddings

Acknowledgments

The research results presented here were partly supported by the PORTULAN/CLARIN Infrastructure for the Science and Technology of Language, by the National Infrastructure for Distributed Computing (INCD) of Portugal, and by the ANI/3279/2016 grant.

References

Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Paşca, and Aitor Soroa. 2009. A study on similarity and relatedness using distributional and WordNet-based approaches. In NAACL-HLT 2009, pages 19–27.
John Robert Anderson. 1974. Retrieval of propositional information from long-term memory. Cognitive Psychology, 6(4):451–474.
Marco Baroni, Raffaella Bernardi, Ngoc-Quynh Do, and Chung-chieh Shan. 2012. Entailment above the word level in distributional semantics. In EACL 2012, pages 23–32.
Daniel G. Bobrow and Donald Arthur Norman. 1975. Some principles of memory schemata. In Representation and Understanding: Studies in Cognitive Science, pages 131–149. Elsevier.
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26, pages 2787–2795.
Elia Bruni, Gemma Boleda, Marco Baroni, and Nam-Khanh Tran. 2012. Distributional semantics in technicolor. In ACL 2012, pages 136–145.
Alexander Budanitsky and Graeme Hirst. 2006. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1):13–47.
José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. NASARI: a novel approach to a semantically-aware representation of items. In NAACL-HLT 2015, pages 567–577.
Simon De Deyne, Daniel J. Navarro, and Gert Storms. 2013. Better explanations of lexical and semantic cognition using networks derived from continued rather than single-word associations. Behavior Research Methods, 45(2):480–498.
Simon De Deyne, Amy Perfors, and Daniel J. Navarro. 2016. Predicting human similarity judgments with distributional models: The value of word associations. In COLING 2016, pages 1861–1870.
William K. Estes. 1994. Classification and Cognition. Oxford University Press.
Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. 2015. Retrofitting word vectors to semantic lexicons. In NAACL-HLT 2015, pages 1606–1615.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.
Jure Ferlez and Matjaz Gams. 2004. Shortest-path semantic distance measure in WordNet v2.0. Informatica, 28:381–386.
Josu Goikoetxea, Aitor Soroa, and Eneko Agirre. 2015. Random walks and neural network language models on knowledge bases. In NAACL-HLT 2015, pages 1434–1439.
Andreas Grünauer and Markus Vincze. 2015. Using dimension reduction to improve the classification of high-dimensional data. arXiv preprint arXiv:1505.06907.
Guy Halawi, Gideon Dror, Evgeniy Gabrilovich, and Yehuda Koren. 2012. Large-scale learning of word relatedness with constraints. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1406–1414. ACM.
Zellig S. Harris. 1954. Distributional structure. Word, 10(2-3):146–162.
Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In COLING 1992, pages 539–545.
Felix Hill, Roi Reichart, and Anna Korhonen. 2016. SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41:665–695.
G. Hirst and D. St-Onge. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 305–332. MIT Press.
Keith J. Holyoak and Kyunghee Koh. 1987. Surface and structural similarity in analogical transfer. Memory & Cognition, 15(4):332–340.
Thad Hughes and Daniel Ramage. 2007. Lexical semantic relatedness with random graph walks. In EMNLP-CoNLL 2007, Prague, Czech Republic.
Yangfeng Ji and Jacob Eisenstein. 2014. Representation learning for text-level discourse parsing. In ACL 2014, pages 13–24.
J. Jiang and D. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics.
Josh Kaufman. 2017. 10,000 most common English words in Google's trillion word corpus. https://github.com/first20hours/google-10000-english.
C. Leacock and M. Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. In C. Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 265–285. MIT Press.
Jiwei Li and Dan Jurafsky. 2015. Do multi-sense embeddings improve natural language understanding? arXiv preprint arXiv:1506.01070.
D. Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning.

Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI'15, pages 2181–2187.
David E. Meyer and Roger W. Schvaneveldt. 1971. Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90(2):227.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. GoogleNews-vectors-negative300.bin.gz available at https://code.google.com/archive/p/word2vec/.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119.
Marvin Minsky. 1975. A framework for representing knowledge. In The Psychology of Computer Vision. McGraw-Hill.
Mark Newman. 2010. Networks: An Introduction. Oxford University Press.
Kim Anh Nguyen, Sabine Schulte im Walde, and Ngoc Thang Vu. 2017. Distinguishing antonyms and synonyms in a pattern-based neural network. arXiv preprint arXiv:1701.02962.
Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. 2016. Holographic embeddings of knowledge graphs. In AAAI'16, pages 1955–1961.
Maximilian Nickel and Douwe Kiela. 2017. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems 30, pages 6341–6350.
Charles E. Osgood, George J. Suci, and Percy H. Tannenbaum. 1957. The Measurement of Meaning. Urbana: University of Illinois Press.
Francis Jeffrey Pelletier. 2016. Semantic compositionality. In The Oxford Research Encyclopedia of Linguistics. Oxford University Press.
M. Ross Quillan. 1966. Semantic Memory. Technical report, Bolt Beranek and Newman Inc., Cambridge, MA.
Radim Řehůřek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50. European Language Resources Association.
P. Resnik. 1999. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11.
Lance J. Rips. 1975. Inductive judgments about natural categories. Journal of Verbal Learning and Verbal Behavior, 14(6):665–681.
João António Rodrigues, Ruben Branco, João Ricardo Silva, Chakaveh Saedi, and António Branco. 2018. Predicting brain activation with WordNet embeddings. In Proceedings of the 8th Workshop on Cognitive Aspects of Computational Language Learning and Processing (CogACLL 2018), at the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia. Association for Computational Linguistics.
Stephen Roller, Katrin Erk, and Gemma Boleda. 2014. Inclusive yet selective: Supervised distributional hypernymy detection. In COLING 2014, pages 1025–1036.
Brian H. Ross. 1984. Remindings and their effects in learning a cognitive skill. Cognitive Psychology, 16(3):371–416.
Herbert Rubenstein and John B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633.
Vered Shwartz, Yoav Goldberg, and Ido Dagan. 2016. Improving hypernymy detection with an integrated path-based and distributional method. In ACL 2016, pages 2389–2398.
Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. In Advances in Neural Information Processing Systems 17, pages 1297–1304.
Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. 2013. Parsing with compositional vector grammars. In ACL 2013, pages 455–465.
David G. Underhill, Luke K. McDowell, David J. Marchette, and Jeffrey L. Solka. 2007. Enhancing text analysis via dimensionality reduction. In IEEE-IRI 2007, pages 348–353.
Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. 2015. Order-embeddings of images and language. arXiv preprint arXiv:1511.06361.
Julie Weeds, Daoud Clarke, Jeremy Reffin, David Weir, and Bill Keller. 2014. Learning to distinguish hypernyms and co-hyponyms. In COLING 2014, pages 2249–2259.
Wiktionary. 2017. Wiktionary: Frequency lists. https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists.
Svante Wold, Kim Esbensen, and Paul Geladi. 1987. Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3):37–52.
Mo Yu and Mark Dredze. 2014. Improving lexical embeddings with semantic knowledge. In ACL 2014, pages 545–550.
