Interpretability in Word Sense Disambiguation using Tsetlin Machine

Rohan Kumar Yadav, Lei Jiao, Ole-Christoffer Granmo and Morten Goodwin
Centre for Artificial Intelligence Research, University of Agder, Grimstad, Norway

Keywords: Tsetlin Machine, Word Sense Disambiguation, Interpretability.

Abstract: Word Sense Disambiguation (WSD) is a longstanding unresolved task in Natural Language Processing. The challenge lies in the fact that words with the same spelling can have completely different senses, sometimes depending on subtle characteristics of the context. A weakness of the state-of-the-art supervised models, however, is that they can be difficult to interpret, making it harder to check whether they capture senses accurately or not. In this paper, we introduce a novel Tsetlin Machine (TM) based supervised model that distinguishes word senses by means of conjunctive clauses. The clauses are formulated based on contextual cues, represented in propositional logic. Our experiments on the CoarseWSD-balanced dataset indicate that the learned word senses can be relatively effortlessly interpreted by analyzing the converged model of the TM. Additionally, the classification accuracy is higher than that of FastText-Base and similar to that of FastText-CommonCrawl.

1 INTRODUCTION

Word Sense Disambiguation (WSD) is one of the unsolved tasks in Natural Language Processing (NLP) (Agirre and Edmonds, 2007) with rapidly increasing importance, particularly due to the advent of chatbots. WSD consists of distinguishing the meaning of homographs – identically spelled words whose sense or meaning depends on the surrounding context words in a sentence or a paragraph. WSD is one of the main NLP tasks that still lacks a perfect solution for sense classification and indication (Navigli et al., 2017), and it usually fails to be integrated into NLP applications (de Lacalle and Agirre, 2015). Many supervised approaches attempt to solve the WSD problem by training a model on sense-annotated data (Liao et al., 2010). However, most of them fail to produce interpretable models. Because word senses can be radically different depending on the context, interpretation errors can have adverse consequences in real applications, such as chatbots. It is therefore crucial for a WSD model to be easily interpretable for human beings, by showing the significance of the context words for WSD.

NLP is one of the disciplines applied in chatbots, and with the recent proliferation of chatbots, the limitations of state-of-the-art WSD have become increasingly apparent. In real-life operation, chatbots are notoriously poor at distinguishing the meanings of words with multiple senses, distinct for different contexts. For example, let us consider the word "book" in the sentence "I want to book a ticket for the upcoming movie". Although a traditional classifier can classify "book" as "reservation" rather than "reading material", it does not give us an explanation of how it learns the meaning of the target word "book". An unexplained model raises several questions, like: "How can we trust the model?" or "How did the model make the decision?". Answering these questions would undoubtedly make a chatbot more trustworthy. In particular, deciding word senses for the wrong reasons may lead to undesirable consequences, e.g., leading the chatbot astray or falsely categorizing a CV. Introducing a high level of interpretability while maintaining classification accuracy is a challenge that state-of-the-art NLP techniques so far have failed to solve satisfactorily.

Although some rule-based methods, like decision trees, are somewhat easy to interpret, other methods, such as Deep Neural Networks (DNNs), are out of reach for comprehensive interpretation (Wang et al., 2018). Despite the excellent accuracy achieved by DNNs, their "black box" nature impedes their impact (Rudin, 2018). It is difficult for human beings to interpret the decision-making process of artificial neurons: the weights and biases of deep neural networks are fine-tuned continuous values, which makes it intricate to distinguish the context words that drive a classification decision.




Some straightforward techniques such as the Naive Bayes classifier, decision trees, and support vector machines are therefore still widely used because of their simplicity and interpretability. However, they provide reasonable accuracy only when the data is limited.

In this paper, we aim to obtain human-interpretable classification of the CoarseWSD-balanced dataset, using the recently introduced Tsetlin Machine (TM). Our goal is to achieve a viable balance between accuracy and interpretability by introducing a novel model for linguistic patterns. The TM is a human-interpretable pattern recognition method that composes patterns in propositional logic. Recently, it has provided accuracy comparable to that of DNNs with arguably less computational complexity, while maintaining high interpretability. We demonstrate how our model learns pertinent patterns based on the context words, and explore which context words drive the classification decisions for each particular word sense. The rest of the paper is arranged as follows: related work on WSD and the TM is explained in Section 2. Our TM-based WSD architecture, the learning process, and our approach to interpretability are covered in Section 3. Section 4 presents the experimental results for interpretability and accuracy. We conclude the paper in Section 5.

2 RELATED WORK

The research area of WSD is attracting increasing attention in the NLP community (Agirre and Edmonds, 2007) and has lately experienced rapid progress (Yuan et al., 2016; Tripodi and Pelillo, 2017; Hadiwinoto et al., 2019). In all brevity, WSD methods can be categorized into two groups: knowledge-based and supervised WSD. Knowledge-based methods involve selecting the sense of an ambiguous word from the semantic structure of lexical knowledge bases (Navigli and Velardi, 2004). For instance, the semantic structure of BabelNet has been used to measure word similarity (Dongsuk et al., 2018). The benefit of such models is that they do not require annotated or unannotated data, but they rely heavily on synset relations. Regarding supervised WSD, traditional approaches generally depend on extracting features from the context words that are present around the target word (Zhong and Ng, 2010).

The success of deep learning has significantly fueled WSD research. For example, Le et al. have reproduced the state-of-the-art performance of an LSTM-based approach to WSD on several openly available datasets: GigaWord, SemCor (Miller et al., 1994), and OMSTI (Taghipour and Ng, 2015). Apart from traditional supervised WSD, embedding is becoming increasingly popular for capturing the senses of words (Mikolov et al., 2013). Further, Majid et al. improve the state-of-the-art supervised WSD by assigning vector coefficients to obtain more precise context representations, and then applying PCA dimensionality reduction to find a better transformation of the features (Sadi et al., 2019). Salomonsson presents a supervised classifier based on a bidirectional LSTM for the lexical sample task of the Senseval dataset (Kågebäck and Salomonsson, 2016).

Contextually-aware embedding has also been extensively addressed with other approaches across many disciplines. Perhaps the most relevant is the work on neural network embedding (Rezaeinia et al., 2019; Khattak et al., 2019; Lazreg et al., 2020). There is a fundamental difference between our work and previous ones in terms of interpretability: existing methods yield complex vectorized embeddings, which can hardly be claimed to be human-interpretable.

Furthermore, natural language processing has, in recent years, been dominated by neural network-based attention mechanisms (Vaswani et al., 2017; Sonkar et al., 2020). Even though attention and the attention-based transformer (Devlin et al., 2018) implementations provide state-of-the-art results, the methods are overly complicated and far from interpretable. The recently introduced work (Loureiro et al., 2020) shows how contextual information influences the sense of a word via an analysis of WSD on BERT.

All these contributions clearly show that supervised neural models can achieve state-of-the-art performance in terms of accuracy without considering external language-specific features. However, such neural network models are criticized for being difficult to interpret due to their black-box nature (Buhrmester et al., 2019). To introduce interpretability, we employ the newly developed TM for WSD in this study. The TM paradigm is inherently interpretable, producing rules in propositional logic (Granmo, 2018). TMs have demonstrated promising results in various classification tasks involving image data (Granmo et al., 2019), NLP tasks (Yadav et al., 2021; Bhattarai et al., 2020; Berge et al., 2019; Saha et al., 2020), and board games (Granmo, 2018). Although the TM operates on binary data, recent work suggests that a threshold-based representation of continuous input allows the TM to perform successfully beyond binary data, e.g., applied to disease outbreak forecasting (Abeyrathna et al., 2019). Additionally, the convergence of the TM has been analysed in (Zhang et al., 2020).


[Figure 1 illustrates the pipeline on two example corpora, "Apple will launch Iphone 12 next year." and "I like apple more than orange.": stop words are removed, the remaining words are stemmed, and each text is encoded as a binary vector over the shared vocabulary.]

Figure 1: Preprocessing of text corpus for input to TM.
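The preprocessing shown in Fig. 1 can be sketched in a few lines of Python. This is our own minimal illustration, assuming the NLTK package for stop words and the Porter stemmer; the helper names are illustrative and not taken from the paper's implementation.

```python
# Minimal sketch of the Fig. 1 pipeline, assuming NLTK is installed and
# nltk.download("stopwords") has been run; helper names are illustrative.
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    """Lower-case, strip punctuation, drop stop words, and stem the rest."""
    tokens = [w.strip(".,!?") for w in text.lower().split()]
    return [stemmer.stem(w) for w in tokens if w and w not in stop_words]

def binarize(tokens, vocabulary):
    """Binary bag of words: x_k = 1 iff vocabulary word k occurs in the context."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

corpus = ["Apple will launch Iphone 12 next year.", "I like apple more than orange."]
processed = [preprocess(doc) for doc in corpus]
vocabulary = sorted({w for doc in processed for w in doc})
X = [binarize(doc, vocabulary) for doc in processed]
```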

3 SYSTEM ARCHITECTURE FOR WORD SENSE DISAMBIGUATION

3.1 Basic Concept of Tsetlin Machine for Classifying Word Senses

At the core of the TM one finds a novel game-theoretic scheme that organizes a decentralized team of Tsetlin Automata (TAs). The scheme guides the TAs to learn arbitrarily complex propositional formulae in disjunctive normal form (DNF). Despite its capacity to learn complex nonlinear patterns, a TM is still interpretable in the sense that it decomposes problems into self-contained sub-patterns that can be interpreted in isolation. Each sub-pattern is represented as a conjunctive clause, i.e., a conjunction of literals, with each literal representing either an input bit or its negation. Accordingly, both the representation and the evaluation of sub-patterns are Boolean. This makes the TM computationally efficient and hardware-friendly compared with other methods. In the following paragraphs, we present how the TM architecture can be used for WSD.

The first step in our architecture for WSD, shown in Fig. 1, is to remove the stop words from the text corpus and then stem the remaining words (in this work, we used the PorterStemmer package). Thereafter, each word is assigned a propositional variable $x_k \in \{0,1\}$, $k \in \{1,2,\ldots,n\}$, determining the presence or absence of that word in the context, with $n$ being the size of the vocabulary. Let $X = [x_1, x_2, \ldots, x_n]$ be the feature vector (input) for the TM, which is thus a simple bag of words constructed from the text corpus, as shown in Fig. 1.

The above feature vector is then fed to a TM classifier, whose overall architecture is shown in Fig. 2. A multiclass Tsetlin Machine consists of multiple TMs, and each TM has several TA teams, as expanded in Fig. 2(b). We first cover how the TM performs classification before we show how the classification rules are formed to perform WSD. As shown in Fig. 2(b), $X$ is the input to the TM. For our purpose, each sense is seen as a class, and the context of the word to be disambiguated is the feature vector (the bag of words). If there are $q$ classes and $m$ sub-patterns per class, the classification problem can be solved using $q \times m$ conjunctive clauses $C_i^j$, $1 \le j \le q$, $1 \le i \le m$:

$$C_i^j = \Big(\bigwedge_{k \in I_i^j} x_k\Big) \wedge \Big(\bigwedge_{k \in \bar{I}_i^j} \neg x_k\Big), \qquad (1)$$

where $I_i^j$ and $\bar{I}_i^j$ are non-overlapping subsets of the input variable indexes. The subsets decide which of the propositional variables take part in the clause and whether they are negated or not. In more detail, the indices in $I_i^j$ represent the literals that are included as is, while the indices in $\bar{I}_i^j$ correspond to the negated ones. The propositional variables or their negations are connected by the conjunction operator to form a clause $C_i^j(X)$, exemplified in Eq. (2):

$$C_i^j(X) = x_1 \wedge \neg x_3 \wedge \ldots \wedge x_{k-1} \wedge \neg x_k. \qquad (2)$$

To distinguish the class pattern from other patterns (1-vs-all), clauses with odd indexes are assigned positive polarity (+) and the even-indexed ones are assigned negative polarity (−). Clauses with positive polarity vote for the target class, while clauses with negative polarity vote against it. Finally, a summation operator aggregates the votes by subtracting the number of negative votes from the number of positive votes, per Eq. (3):

$$f^j(X) = \sum_{i=1}^{m} (-1)^{i-1}\, C_i^j(X). \qquad (3)$$
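To make the clause semantics concrete, here is a small sketch (our own illustration, not the paper's implementation) that evaluates clauses of the form in Eqs. (1)-(2) on a binary input and aggregates the votes per Eq. (3). Clauses are represented as pairs of index sets, which in a real TM are learned by the TAs.

```python
def clause_output(x, include, include_negated):
    """Eqs. (1)-(2): a conjunctive clause outputs 1 iff every included
    variable is 1 and every negated-included variable is 0."""
    return int(all(x[k] == 1 for k in include) and
               all(x[k] == 0 for k in include_negated))

def class_sum(x, clauses):
    """Eq. (3): odd-indexed clauses (i = 1, 3, ...) vote for the class,
    even-indexed clauses vote against it."""
    return sum((-1) ** (i - 1) * clause_output(x, inc, neg)
               for i, (inc, neg) in enumerate(clauses, start=1))

# Example: the clause x1 AND NOT x3 votes for the class; x2 votes against it.
x = [1, 0, 0, 1]
print(class_sum(x, [([0], [2]), ([1], [])]))  # -> 1
```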


Figure 2: The architecture of (a) the multiclass Tsetlin Machine, and (b) a TA team that forms the clause $C_i^j$, $1 \le j \le q$, $1 \le i \le m$.

In a multi-class TM, the final decision is made by an argmax operator that classifies the input based on the highest sum of votes, as shown in Eq. (4):

$$y = \underset{j}{\arg\max}\, \big(f^j(X)\big). \qquad (4)$$

3.2 Training of the Proposed Scheme

The training of the TM is explained in detail in (Granmo, 2018). Our focus here is how the word senses are captured from data. Let us consider one training example $(X, \hat{y})$. The input vector $X$ – a bag of words – represents the input to the TM. The target $\hat{y}$ is the sense of the target word.

Multiple teams of TAs are responsible for TM learning. As shown in Fig. 2(b), a clause is assigned one TA per literal. A TA is a deterministic automaton that learns the optimal action among the set of actions provided by the environment. The environment, in this particular application, is the training samples together with the updating rule of the TA, which is detailed in (Granmo, 2018). Each TA in the TM has $2N$ states and decides between two actions, Action 1 and Action 2, as shown in Fig. 3. The present state of the TA decides its action: Action 1 is performed in states 1 to $N$, whereas Action 2 is performed in states $N+1$ to $2N$. The selected action is rewarded or penalized by the environment. When a TA receives a reward, it reinforces the performed action by moving away from the centre (towards the left or right end). If a penalty happens, the TA moves towards the centre to weaken the performed action, eventually switching to the other action.

Figure 3: Representation of the two actions of a TA.

In the TM, each TA chooses either to exclude (Action 1) or include (Action 2) its assigned literal. Based on the decisions of the TA team, the structure of the clause is determined, and the clause can therefore generate an output for the given input $X$.
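The two-action automaton can be sketched as follows. This is a minimal illustration under the state convention just described; the decision of when to reward or penalize a TA is part of the TM's feedback game and is detailed in (Granmo, 2018).

```python
class TsetlinAutomaton:
    """Two-action TA with 2N states: states 1..N select Action 1 (exclude),
    states N+1..2N select Action 2 (include)."""

    def __init__(self, n_states=100):
        self.N = n_states
        self.state = n_states  # start on the exclude side of the boundary

    def action(self):
        return "exclude" if self.state <= self.N else "include"

    def reward(self):
        # Reinforce the current action by moving away from the centre.
        if self.action() == "exclude":
            self.state = max(1, self.state - 1)
        else:
            self.state = min(2 * self.N, self.state + 1)

    def penalize(self):
        # Weaken the current action by moving towards the centre;
        # repeated penalties eventually flip exclude <-> include.
        self.state += 1 if self.action() == "exclude" else -1
```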


Thereafter, the state of each TA is updated based on its current state, the output of the clause $C_i^j$ for the training input $X$, and the target $\hat{y}$.

We illustrate the training process by way of example, showing how a clause is built by excluding and including words. We consider the bag of words for "Text Corpus 2" in Fig. 1, (apple, like, orange, more), converted into the binary form "Input 2". As per Fig. 4, there are eight TAs with $N = 100$ states per action that co-produce a single clause. Four TAs (TA, to the left in Fig. 4) vote for including "more", "like", "orange", and "apple", whereas the other four (TA', to the right in Fig. 4) vote against them. The TAs that are moving away from the central states are receiving rewards, while those moving towards the central states are receiving penalties. In Fig. 4, from the TAs to the left, we obtain the clause $C_1 = \text{"apple"} \wedge \text{"like"}$ (as the clause describes a sub-pattern within a single class, we drop the class superscript in the notation $C_i^j$). The word "orange" is excluded for now. However, after observing more evidence from "Input 2", the TA of "orange" is penalized for its current action, eventually making it change its action from exclude to include. In this way, after more updates, the word "orange" is included in the clause, making $C_1 = \text{"apple"} \wedge \text{"like"} \wedge \text{"orange"}$, which increases the precision of the sub-pattern and thereby of the classification.

Figure 4: Eight TAs with 100 states per action that learn whether to exclude or include a specific word (or its negation) in a clause.

3.3 Interpretable Classification Process

We now detail the interpretability once the TM has been trained. In brief, the interpretability is based on the analysis of clauses. Let us consider the noun "apple" as the target word. For simplicity, we consider two senses of "apple", i.e., Company as sense $s_1$ and Fruit as sense $s_2$. The text corpus for $s_1$ relates to apple being a company, whereas for $s_2$ it relates to apple being a fruit.

Let us consider a test sample $I_{test} = [\text{apple}, \text{launch}, \text{iphone}, \text{next}, \text{year}]$ and how its sense is classified based on the context words. This set of words is first converted to binary form based on a bag of words, as described earlier in Fig. 1.

To extract the clauses that vote for sense $s_1$, the test sample $I_{test}$ is passed to the model, and the clauses that vote for the presence of sense $s_1$ are observed, as shown in Fig. 5. The literals formed by the TM are expressed as indices of the tokens; for ease of understanding, they have been replaced by the corresponding word tokens. In Fig. 5, a green box shows that a literal is non-negated, whereas a red box denotes the negated form of the literal. For example, one of the sub-patterns is the clause $C_3 = \text{apple} \wedge \neg\text{orange} \wedge \neg\text{more}$. These clauses consist of the included literals in conjunctive form. Since the clauses in the TM are trained sample-wise, there exist several randomly placed literals in each clause. These random literals occur because of randomly picked words that do not affect the classification. Such literals are designated non-important literals, and their frequency of occurrence is low. On the other hand, the literals that have a higher frequency among the clauses are considered important literals and hence make a significant impact on the classification. Here, we emphasize separating important and non-important literals for easy human interpretation. The general concept for finding the important words for a certain sense is to observe the frequency of appearance of each word in the trained clauses. To do that, in the above example, once the TM is trained, the literals in the clauses that output 1, i.e., vote for the presence of the class $s_1$ for $I_{test}$, are collected first, as shown in Eq. (5):

$$L_t = \bigcup_{k,\, j:\, C^j = 1} \{x_k^j, \neg x_k^j\}, \qquad (5)$$

where $x_k^j$ is the $k$-th literal, i.e., $x_k$, that appears in clause $j$, and $\neg x_k^j$ is its negation. Note that a certain literal $x_k$ may appear many times in $L_t$ due to the multiple clauses that output 1. Clearly, $L_t$ is a set of literals (words) that appear in all clauses that contribute to the classification of class $s_1$. The next step is to find the frequently appearing literals (words) in $L_t$, which correspond to the important words. We define a function $\beta(h, H)$ that returns the number of occurrences of the element $h$ in the set $H$. We can then formulate the set of counts for all literals $x_k$ and their negations $\neg x_k$ in $L_t$, $k \in \{1,2,\ldots,n\}$, as shown in Eq. (6):

$$S_t = \Big\{\bigcup_{k=1:n} \beta(x_k, L_t),\ \bigcup_{k=1:n} \beta(\neg x_k, L_t)\Big\}. \qquad (6)$$

We rank the counts in the set $S_t$ in descending order and consider the first $\eta$ percent in the ranking as the important literals. Similarly, we define the last $\eta$ percent in the ranking as the non-important literals.
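The frequency analysis of Eqs. (5)-(6) can be sketched as follows. The data layout (each firing clause given as a pair of include/negated-include index sets) and the helper names are our own assumptions, not the paper's code.

```python
from collections import Counter

def literal_counts(fired_clauses, vocabulary):
    """Eqs. (5)-(6): pool the literals of all clauses that output 1 (L_t)
    and count how often each literal or its negation appears (S_t)."""
    counts = Counter()
    for include, include_negated in fired_clauses:
        counts.update(vocabulary[k] for k in include)                   # x_k
        counts.update("not " + vocabulary[k] for k in include_negated)  # not x_k
    return counts

def split_by_importance(counts, eta=0.1):
    """Rank literals by frequency: the top eta share is deemed important,
    the bottom eta share non-important."""
    ranked = [literal for literal, _ in counts.most_common()]
    cut = max(1, int(eta * len(ranked)))
    return ranked[:cut], ranked[-cut:]
```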
In brief, the interpretability is based on contribute to the classification of class s1. The next the analysis of clauses. Let us consider the noun “ap- step is to find frequently appearing literals (words) in ple” as the target word. For simplicity, we consider Lt , which correspond to the important words. We de- two senses of “apple”, i.e., Company as sense s1 and fine a function, β(h,H), which returns the number of Fruit as sense s2. The text corpus for s1 is related to the elements h in the set H. We can then formulate the apple being a company, whereas for s2 it is related a set of the numbers for all literals xk and their nega- to the apple being a fruit. tions ¬xk in Lt , k ∈ {1,2,...,n}, as shown in Eq. (6): Let us consider a test sample Itest = n [ [ o [apple,launch,iphone,next,year] and how its sense St = β(xk,Lt ), β(¬xk,Lt ) . (6) is classified based on the context words. This set of k=1:n k=1:n words is first converted to binary form based on a bag We rank the number of elements in set St in de- of words as described earlier in Fig. 1. scending order and consider the first η percent in the rank as the important literals. Similarly, we define the 2As the clause describes a sub-pattern within the same last η percent in the rank as non-important literals. To class, we ignore the superscript for different classes in no- distinguish the important literals more precisely, sev- j tation Ci . eral independent experiments can be carried out for a

406 Interpretability in Word Sense Disambiguation using Tsetlin Machine

[Figure 5 layout: example clauses over the tokens (apple, launch, iphone, next, year), with binary input (1, 0, 0, 1, 0) and literal indexes 0, 1, 2, ..., k-1, k; clause rows include, e.g., (apple, like, iphone, orange) and (apple, orange, more).]

Figure 5: Structure of the clauses formed by the combination of sub-patterns. Green indicates literals that are included in their original form, red indicates literals that are included in negated form, and blue (empty) boxes indicate the absence of a literal, since not all clauses have the same number of literals.

To distinguish the important literals more precisely, several independent experiments can be carried out for a certain sense. Following the same concept, the literals from $M$ different experiments can be collected into one set $L_t(total)$, as shown in Eq. (7):

$$L_t(total) = \bigcup_{e=1}^{M} (L_t)_e, \qquad (7)$$

where $(L_t)_e$ is the set of literals for the $e$-th experiment. Similarly, the counts of all literals in these experiments, stored in the set $S_t(total)$ shown in Eq. (8), are again ranked; the top $\eta$ percent are deemed important literals and the last $\eta$ percent non-important literals. The parameter $\eta$ is to be tuned according to the level of human interpretation required for a certain task.

$$S_t(total) = \Big\{\bigcup_{k=1:n} \beta(x_k, L_t(total)),\ \bigcup_{k=1:n} \beta(\neg x_k, L_t(total))\Big\}. \qquad (8)$$

4 EVALUATIONS

We present here the classification and interpretation results on the CoarseWSD-balanced dataset. The dataset contains 20 words, each with two or more senses to be classified. We select four of them to evaluate our model. The reason for selecting only four words rather than all 20 is that we want to show that the TM preserves interpretability while maintaining state-of-the-art accuracy; four words are enough to represent the trade-off between interpretability and accuracy. The details of the four datasets are shown in Table 1. To train the TM for this task, we use the same configuration of hyperparameters for all the target words. More specifically, we set the number of clauses, the specificity $s$, and the target $T$ to 500, 5, and 80 for Apple and JAVA, and to 250, 3, and 30 for Spring and Crane. After the model is trained for each target, we validate our results using test data.

Table 1: Senses associated with each word that is to be classified.

Dataset   Sense 1     Sense 2    Sense 3
Apple     fruit       company    NA
JAVA      computer    location   NA
Spring    hydrology   season     device
Crane     machine     bird       NA
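For concreteness, a training run with this configuration might look as follows. The sketch assumes the open-source pyTsetlinMachine package (whose constructor takes the arguments in the order clauses, T, s); the toy data and labels are placeholders rather than the paper's actual setup.

```python
import numpy as np
from pyTsetlinMachine.tm import MultiClassTsetlinMachine

# Toy binarized contexts (rows) over a small vocabulary; labels are sense ids.
X_train = np.array([[1, 0, 0, 1, 1],
                    [1, 1, 1, 0, 0]], dtype=np.uint32)
y_train = np.array([0, 1], dtype=np.uint32)  # e.g., 0 = company, 1 = fruit

# Hyperparameters reported for Apple and JAVA: 500 clauses, T = 80, s = 5.
tm = MultiClassTsetlinMachine(500, 80, 5.0)
tm.fit(X_train, y_train, epochs=100)
print(tm.predict(X_train))
```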


To illustrate the interpretability, let us take a sample and extract the literals that are responsible for the classification of the input sentence: "former apple ceo, steve jobs, holding a white iphone 4". Once this input is passed through the model, the TM predicts its sense as company, and we examine the clauses that output 1. We collect all the literals present in each such clause and calculate the number of appearances of each literal. The number of appearances of each literal for the selected sample after one experiment is shown in Figs. 6 and 7 by a blue line; after five experiments, it is shown by a red line. Clearly, it makes sense that the most frequently appearing negated literals in Fig. 6, i.e., "not tree", "not fruit", "not cherries", etc., indicate that the word "apple" does not mean a fruit but a company. Nevertheless, as stated in the previous section, there are also some literals that are randomly placed in the clauses and are non-repetitive, because their counts refuse to climb for the same input; these are marked as non-important literals, as shown in Fig. 7.

Figure 6: Count of the first 30 literals that are in negated form for classifying the sense of apple as company (considered important literals). The counted literals include fruit-related words such as "tree", "fruit", "cherries", "orange", "tomato", and "orchard".

Figure 7: Count of the last 30 literals that are in negated form for classifying the sense of apple as company (considered non-important literals). The counted literals include unrelated words such as "iso", "ntp", "ssd", "visa", and "pilot".

In addition to the interpretability of the TM-based approach, the accuracy is also an important parameter for performance evaluation. Even though the selected datasets have binary sense classification, we use Micro-F1 and Macro-F1 as the evaluation metrics, following (Loureiro et al., 2020). Since the interpretation of WSD is the main concern of this paper, we compare our work with this latest benchmark. Table 2 shows the comparison of Micro- and Macro-F1 scores on the CoarseWSD dataset for four different methods: FastText-Base (FTX-B), FastText-CommonCrawl (FTX-C), 1 Neural Network BERT-Base (BRT-B), and our proposed TM. FTX-B is a fastText linear classifier without pre-trained embeddings, and FTX-C is a fastText linear classifier with pre-trained embeddings from Common Crawl. These are considered the standard baselines for this dataset (Loureiro et al., 2020).

Table 2: Results on the full CoarseWSD-balanced dataset for 4 different models: FastText-Base (FTX-B), FastText-CommonCrawl (FTX-C), 1 Neural Network BERT-Base (BRT-B), and Tsetlin Machine (TM).

                 Micro-F1                        Macro-F1
Datasets   FTX-B   FTX-C   BRT-B   TM       FTX-B   FTX-C   BRT-B   TM
Apple      96.3    97.8    99.0    97.58    96.6    97.7    99.0    97.45
JAVA       98.7    99.5    99.6    99.38    61.1    84.1    99.8    99.35
Spring     86.9    92.5    97.4    90.78    78.8    96.4    97.2    90.76
Crane      87.9    94.9    94.2    93.63    88.0    94.8    94.1    93.62

Our proposed TM-based WSD easily outperforms the FTX-B baseline and is close to FTX-C, without even using pre-trained embeddings. However, the TM falls short of BERT's performance, BERT being a huge language model that achieves state-of-the-art performance on most tasks. This shows that the TM not only provides interpretability for WSD but also performs close to the state of the art.
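The reported scores can be computed in the usual way. The following sketch uses scikit-learn's f1_score, with placeholder gold labels and predictions rather than the paper's actual outputs.

```python
from sklearn.metrics import f1_score

# Placeholder gold labels and predictions for one target word.
y_true = ["company", "fruit", "company", "fruit"]
y_pred = ["company", "fruit", "fruit", "fruit"]

micro = f1_score(y_true, y_pred, average="micro")
macro = f1_score(y_true, y_pred, average="macro")
print(f"Micro-F1: {micro:.3f}, Macro-F1: {macro:.3f}")
```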

5 CONCLUSIONS

This paper proposed a sense categorization approach based on the recently introduced TM. Although various methods achieve good accuracy for sense classification on the CoarseWSD-balanced dataset, many machine learning algorithms fail to provide the human interpretation needed to explain a particular classification. To overcome this issue, we presented a TM-based sense classifier that learns formulae from the text corpus, utilizing conjunctive clauses to capture the particular features of each category. Numerical results indicate that the TM-based approach is human-interpretable while achieving competitive accuracy, which shows its potential for further WSD studies. In conclusion, we believe that the novel TM-based approach can have a significant impact on sense identification, which is a very important factor in chatbots and other WSD tasks.


REFERENCES

Abeyrathna, K. D., Granmo, O.-C., Zhang, X., Jiao, L., and Goodwin, M. (2019). The regression Tsetlin machine: A novel approach to interpretable nonlinear regression. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 378.

Agirre, E. and Edmonds, P. (2007). Word sense disambiguation: Algorithms and applications. Springer, Dordrecht.

Berge, G. T., Granmo, O., Tveit, T. O., Goodwin, M., Jiao, L., and Matheussen, B. V. (2019). Using the Tsetlin machine to learn human-interpretable rules for high-accuracy text categorization with medical applications. IEEE Access, 7:115134–115146.

Bhattarai, B., Granmo, O.-C., and Jiao, L. (2020). Measuring the novelty of natural language text using the conjunctive clauses of a Tsetlin machine text classifier. ArXiv, abs/2011.08755.

Buhrmester, V., Münch, D., and Arens, M. (2019). Analysis of explainers of black box deep neural networks for computer vision: A survey.

de Lacalle, O. L. and Agirre, E. (2015). A methodology for word sense disambiguation at 90% based on large-scale crowdsourcing. In SEM@NAACL-HLT.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Dongsuk, O., Kwon, S., Kim, K., and Ko, Y. (2018). Word sense disambiguation based on word similarity calculation using word vector representation from a knowledge-based graph. In COLING.

Granmo, O.-C. (2018). The Tsetlin machine - a game theoretic bandit driven approach to optimal pattern recognition with propositional logic.

Granmo, O.-C., Glimsdal, S., Jiao, L., Goodwin, M., Omlin, C. W., and Berge, G. T. (2019). The convolutional Tsetlin machine.

Hadiwinoto, C., Ng, H. T., and Gan, W. C. (2019). Improved word sense disambiguation using pre-trained contextualized word representations.

Kågebäck, M. and Salomonsson, H. (2016). Word sense disambiguation using a bidirectional LSTM. In CogALex@COLING.

Khattak, F. K., Jeblee, S., Pou-Prom, C., Abdalla, M., Meaney, C., and Rudzicz, F. (2019). A survey of word embeddings for clinical text. Journal of Biomedical Informatics: X, 4:100057.

Lazreg, M. B., Goodwin, M., and Granmo, O.-C. (2020). Combining a context aware neural network with a denoising autoencoder for measuring string similarities. Computer Speech & Language, 60:101028.

Liao, K., Ye, D., and Xi, Y. (2010). Research on enterprise text knowledge classification based on knowledge schema. In 2010 2nd IEEE International Conference on Information Management and Engineering, pages 452–456.

Loureiro, D., Rezaee, K., Pilehvar, M. T., and Camacho-Collados, J. (2020). Language models and word sense disambiguation: An overview and analysis.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. ArXiv, abs/1310.4546.

Miller, G. A., Chodorow, M., Landes, S., Leacock, C., and Thomas, R. G. (1994). Using a semantic concordance for sense identification. In HLT.

Navigli, R., Camacho-Collados, J., and Raganato, A. (2017). Word sense disambiguation: A unified evaluation framework and empirical comparison. In EACL.

Navigli, R. and Velardi, P. (2004). Structural semantic interconnection: A knowledge-based approach to word sense disambiguation. In SENSEVAL@ACL.

Rezaeinia, S. M., Rahmani, R., Ghodsi, A., and Veisi, H. (2019). Sentiment analysis based on improved pre-trained word embeddings. Expert Systems with Applications, 117:139–147.

Rudin, C. (2018). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.

Sadi, M. F., Ansari, E., and Afsharchi, M. (2019). Supervised word sense disambiguation using new features based on word embeddings. J. Intell. Fuzzy Syst., 37:1467–1476.

Saha, R., Granmo, O.-C., and Goodwin, M. (2020). Mining interpretable rules for sentiment and semantic relation analysis using Tsetlin machines. In Artificial Intelligence XXXVII, pages 67–78. Springer International Publishing.

Sonkar, S., Waters, A. E., and Baraniuk, R. G. (2020). Attention word embedding. arXiv preprint arXiv:2006.00988.

Taghipour, K. and Ng, H. T. (2015). One million sense-tagged instances for word sense disambiguation and induction. In CoNLL.

Tripodi, R. and Pelillo, M. (2017). A game-theoretic approach to word sense disambiguation. Computational Linguistics, 43(1):31–70.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Wang, Y., Wang, L., Rastegar-Mojarad, M., Moon, S., Shen, F., Afzal, N., Liu, S., Zeng, Y., Mehrabi, S., Sohn, S., and Liu, H. (2018). Clinical information extraction applications: A literature review. Journal of Biomedical Informatics, 77:34–49.

Yadav, R. K., Jiao, L., Granmo, O.-C., and Goodwin, M. (2021). Human-level interpretable learning for aspect-based sentiment analysis. In The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21). AAAI.

Yuan, D., Richardson, J., Doherty, R., Evans, C., and Altendorf, E. (2016). Semi-supervised word sense disambiguation with neural models. In COLING.

Zhang, X., Jiao, L., Granmo, O.-C., and Goodwin, M. (2020). On the convergence of Tsetlin machines for the identity- and not operators. arXiv preprint arXiv:2007.14268.

Zhong, Z. and Ng, H. T. (2010). It makes sense: A wide-coverage word sense disambiguation system for free text. In ACL.
