Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation
Alexander Panchenko‡, Fide Marten‡, Eugen Ruppert‡, Stefano Faralli†, Dmitry Ustalov∗, Simone Paolo Ponzetto†, and Chris Biemann‡
‡Language Technology Group, Department of Informatics, Universität Hamburg, Germany
†Web and Data Science Group, Department of Informatics, Universität Mannheim, Germany
∗Institute of Natural Sciences and Mathematics, Ural Federal University, Russia
fpanchenko,marten,ruppert,[email protected] fsimone,[email protected] [email protected]
arXiv:1707.06878v1 [cs.CL] 21 Jul 2017

Abstract

Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.

1 Introduction

The notion of word sense is central to computational lexical semantics. Word senses can be either encoded manually in lexical resources or induced automatically from text. The former knowledge-based sense representations, such as those found in the BabelNet lexical semantic network (Navigli and Ponzetto, 2012), are easily interpretable by humans due to the presence of definitions, usage examples, taxonomic relations, related words, and images. The cost of such interpretability is that every element mentioned above is encoded manually in one of the underlying resources, such as Wikipedia. Unsupervised knowledge-free approaches, e.g. (Di Marco and Navigli, 2013; Bartunov et al., 2016), require no manual labor, but the resulting sense representations lack the above-mentioned features enabling interpretability. For instance, systems based on sense embeddings are based on dense uninterpretable vectors. Therefore, the meaning of a sense can be interpreted only on the basis of a list of related senses.

We present a system that brings interpretability of the knowledge-based sense representations into the world of unsupervised knowledge-free WSD models. The contribution of this paper is the first system for word sense induction and disambiguation that is unsupervised, knowledge-free, and interpretable at the same time. The system is based on the WSD approach of Panchenko et al. (2017) and is designed to reach the interpretability level of knowledge-based systems, such as Babelfy (Moro et al., 2014), within an unsupervised knowledge-free framework. Implementation of the system is open source.¹ A live demo featuring several disambiguation models is available online.²

¹ https://github.com/uhh-lt/wsd
² http://jobimtext.org/wsd

2 Related Work

In this section, we list prominent WSD systems with openly available implementations.

Knowledge-Based and/or Supervised Systems. IMS (Zhong and Ng, 2010) is a supervised all-words WSD system that allows users to integrate additional features and different classifiers. By default, the system relies on linear support vector machines with multiple features. The AutoExtend (Rothe and Schütze, 2015) approach can be used to learn embeddings for lexemes and synsets of a lexical resource. These representations were successfully used to perform WSD using IMS.

DKPro WSD (Miller et al., 2013) is a modular, extensible Java framework for word sense disambiguation. It implements multiple WSD methods and also provides an interface to evaluation datasets. The PyWSD³ project also provides implementations of popular WSD methods, but these are implemented in the Python language.

Babelfy (Moro et al., 2014) is a system based on BabelNet that implements a multilingual graph-based approach to entity linking and WSD based on the identification of candidate meanings using the densest subgraph heuristic.

Knowledge-Free and Unsupervised Systems. Neelakantan et al. (2014) proposed a multi-sense extension of the Skip-gram model that features an open implementation. AdaGram (Bartunov et al., 2016) is a system that learns sense embeddings using a Bayesian extension of the Skip-gram model and provides WSD functionality based on the induced sense inventory. SenseGram (Pelevina et al., 2016) is a system that transforms word embeddings to sense embeddings via graph clustering and uses them for WSD. Other methods to learn sense embeddings were proposed, but these do not feature open implementations for WSD.

Among all listed systems, only Babelfy implements a user interface supporting interpretable visualization of the disambiguation results.

³ https://github.com/alvations/pywsd

3 Unsupervised Knowledge-Free Interpretable WSD

This section describes (1) how WSD models are learned in an unsupervised way from text and (2) how the system uses these models to enable human interpretable disambiguation in context.

3.1 Induction of the WSD Models

Figure 1 presents the architecture of the WSD system. As one may observe, no human labor is used to learn interpretable sense representations and the corresponding disambiguation models. Instead, these are induced from the input text corpus using the JoBimText approach (Biemann and Riedl, 2013) implemented using the Apache Spark framework⁴, enabling seamless processing of large text collections.

[Figure 1: Software and functional architecture of the WSD system. Components: Induction of the WSD Models (Scala/Spark), §3.1; RESTful WSD API (Scala/Play), §3.2; WSD Web Interface (JS/React), §3.3.]

Induction of a WSD model consists of several steps. First, a graph of semantically related words, i.e. a distributional thesaurus, is extracted. Second, word senses are induced by clustering of an ego-network of related words (Biemann, 2006). Each discovered word sense is represented as a cluster of words. Next, the induced sense inventory is used as a pivot to generate sense representations by aggregation of the context clues of cluster words. To improve interpretability of the sense clusters, they are labeled with hypernyms, which are in turn extracted from the input corpus using Hearst (1992) patterns. Finally, the obtained WSD model is used to retrieve a list of sentences that characterize each sense. Sentences that mention a given word are disambiguated and then ranked by prediction confidence. Top sentences are used as sense usage examples. For more details about the model induction process, refer to Panchenko et al. (2017). Currently, the following WSD models induced from a text corpus are available:

Word senses based on cluster word features. This model uses the cluster words from the induced word sense inventory as sparse features that represent the sense.

Word senses based on context word features. This representation is based on a sum of word vectors of all cluster words in the induced sense inventory weighted by distributional similarity scores.

Super senses based on cluster word features. To build this model, induced word senses are first globally clustered using the Chinese Whispers graph clustering algorithm (Biemann, 2006). The edges in this sense graph are established by disambiguation of the related words (Faralli et al., 2016; Ustalov et al., 2017). The resulting clusters represent semantic classes grouping words sharing a common hypernym, e.g. “animal”. This set of semantic classes is used as an automatically learned inventory of super senses: there is only one global sense inventory shared among all words, in contrast to the two previous traditional “per word” models. Each semantic class is labeled with hypernyms. This model uses words belonging to the semantic class as features.

Super senses based on context word features. This model relies on the same semantic classes as the previous one but, instead, sense representations are obtained by averaging vectors of words sharing the same class.

⁴ http://spark.apache.org

3.2 WSD API

To enable fast access to the sense inventories and effective parallel predictions, the WSD models ob- [...]

[...] an ambiguous word and its context. The output of the system is a ranked list of all word senses of the ambiguous word ordered by relevance to the input context. By default, only the best matching sense is displayed. The user can quickly understand the meaning of each induced sense by looking at the hypernym and the image representing the sense. Faralli and Navigli (2012) showed that Web search engines can be used to acquire information about word senses. We assign an image to each word in the cluster by querying an image search API⁹ using a query composed of the ambiguous word and its hypernym, e.g. “jaguar animal”. The first hit of this query is selected to represent the induced word sense. Interpretability of each sense is further ensured by providing to the user the list of related senses, the list of the most salient context clues, and the sense usage examples (cf. Figure 2). Note that all these elements are obtained without manual intervention.

Finally, the system provides the reasons behind the sense predictions by displaying the context words that triggered the prediction. Each common feature is clickable, so a user is able to trace back sense cluster words containing this context feature.
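The ranking and explanation behaviour described above can be illustrated with a minimal Python sketch. It is not the system's implementation (which, according to Figure 1, is written in Scala); it assumes that each sense is a sparse bag of weighted features and that relevance is the summed weight of the features shared with the context. The `Sense` class, the function names, the sense identifiers, and all weights are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Sense:
    """Hypothetical induced sense: an identifier, a hypernym label, sparse features."""
    sense_id: str
    hypernym: str
    features: Dict[str, float]  # feature word -> weight (illustrative values only)


def rank_senses(senses: List[Sense], context_words: List[str]) -> List[Tuple[Sense, float, List[str]]]:
    """Order senses by the summed weight of features that also occur in the context.

    The matched words are returned with each score so that a prediction
    can be explained by the context words that triggered it.
    """
    context = {w.lower() for w in context_words}
    ranked = []
    for sense in senses:
        matched = sorted(w for w in sense.features if w in context)
        score = sum(sense.features[w] for w in matched)
        ranked.append((sense, score, matched))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def image_query(word: str, sense: Sense) -> str:
    """Compose an image search query from the ambiguous word and its hypernym."""
    return f"{word} {sense.hypernym}"


# Toy sense inventory for "jaguar"; not taken from any real induced model.
JAGUAR_SENSES = [
    Sense("jaguar#0", "animal", {"leopard": 0.9, "tiger": 0.8, "jungle": 0.5, "prey": 0.4}),
    Sense("jaguar#1", "car", {"mercedes": 0.9, "engine": 0.7, "dealer": 0.5, "drive": 0.4}),
]

ranked = rank_senses(JAGUAR_SENSES, "The jaguar stalked its prey through the jungle".split())
best, score, matched = ranked[0]
print(best.sense_id, matched, image_query("jaguar", best))
```

Returning the matched words alongside each score mirrors the idea of showing the user which context words triggered a prediction; a model based on dense context vectors would need a different similarity measure, such as cosine similarity.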