Improving Word Sense Disambiguation with Linguistic Knowledge from a Sense Annotated Treebank


Kiril Simov, Alexander Popov, Petya Osenova
Linguistic Modeling Department, IICT-BAS
{kivs|alex.popov|petya}@bultreebank.org

Proceedings of Recent Advances in Natural Language Processing, pages 596–603, Hissar, Bulgaria, Sep 7–9 2015.

Abstract

In this paper we present an approach for the enrichment of WSD knowledge bases with data-driven relations from a gold standard corpus (annotated with word senses, valency information, syntactic analyses, etc.). We focus on Bulgarian as a use case, but our approach is scalable to other languages as well. For the purpose of exploring such methods, the Personalized PageRank algorithm was used. The reported results show that the addition of new knowledge improves the accuracy of WSD by approximately 10.5%.

1 Introduction

Solutions to WSD-related tasks usually employ lexical databases, such as wordnets and ontologies. However, lexical databases suffer from sparseness in the availability and density of relations. One approach towards remedying this problem is BabelNet (Navigli and Ponzetto, 2012), which relates several lexical resources: WordNet[1], DBpedia, Wiktionary, etc. Although such a setting takes into consideration the role of lexical and world knowledge, it does not incorporate contextual knowledge learned from actual texts (such as collocational patterns, for example). This happens because the knowledge sources for WSD systems usually capture only a fraction of the relations between entities in the world. Many important relations are not present in ontological resources but could be learned from texts.

One possible approach to handling this sparseness issue is the incorporation of relations from sense annotated corpora into the lexical databases. We decided to focus on this line of research, using the Bulgarian sense annotated treebank (Sensed BulTreeBank) to extract semantic relations and add them to the lexical resources. The hypothesis that this enrichment would lead to better WSD for Bulgarian was tested in the context of the Personalized PageRank algorithm.

The structure of the paper is as follows: the next section discusses related work on the topic. Section 3 presents the Bulgarian sense annotated treebank. Section 4 focuses on the Bulgarian syntactic and lexical resources. Section 5 introduces the WSD experiments and results. Section 6 concludes the paper.

2 Related Work

Knowledge-based systems for WSD have proven to be a good alternative to supervised systems, which require large amounts of manually annotated training data. In contrast, knowledge-based systems require only a knowledge base and no additional corpus-dependent information. An especially popular knowledge-based disambiguation approach has been the use of graph-based algorithms known under the name of "Random Walk on Graph" (Agirre et al., 2014). Most approaches exploit variants of the PageRank algorithm (Brin and Page, 2012). Agirre and Soroa (2009) apply a variant of the algorithm to Word Sense Disambiguation by translating WordNet into a graph in which the synsets are represented as vertices and the relations between them are represented as edges between the nodes. The resulting graph is called a knowledge graph in this paper. Calculating the PageRank vector Pr is accomplished through solving the equation:

    Pr = cMPr + (1 − c)v    (1)

where M is an N × N transition probability matrix (N being the number of vertices in the graph), c is the damping factor and v is an N × 1 vector. In the traditional, static version of PageRank the values of v are all equal (1/N), which means that in the case of a random jump each vertex is equally likely to be selected. Modifying the values of v effectively changes these probabilities and thus makes certain nodes more important. The version of PageRank for which the values in v are not uniform is called Personalized PageRank.

The words in the text that are to be disambiguated are inserted as nodes in the knowledge graph and are connected to their potential senses via directed edges (by default, a context window of at least 20 words is used for each disambiguation). These newly introduced nodes serve to inject initial probability mass (via the v vector) and thus to make their associated sense nodes especially relevant in the knowledge graph. Applying the Personalized PageRank algorithm iteratively over the resulting graph determines the most appropriate sense for each ambiguous word. The method has been boosted by the addition of new relations and by developing variations and optimizations of the algorithm (Agirre and Soroa, 2009). It has also been applied to the task of Named Entity Disambiguation (Agirre et al., 2015).
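Equation (1) defines Pr only implicitly. The paper does not spell out the solution method, but for PageRank-family models the standard choice is fixed-point iteration, which converges for any damping factor c < 1 (a well-known property, noted here for clarity):

```latex
% Power iteration for Eq. (1), starting from the personalization vector:
%   Pr^{(0)} = v
\mathrm{Pr}^{(t+1)} = c\,M\,\mathrm{Pr}^{(t)} + (1 - c)\,\mathbf{v}
% Since c < 1, the matrix I - cM is invertible and the unique solution is
% the closed form:
\mathrm{Pr} = (1 - c)\,(I - cM)^{-1}\,\mathbf{v}
```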
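To make the procedure concrete, here is a minimal runnable sketch of Personalized PageRank for WSD as described above: context words are inserted as nodes linked to their candidate senses by directed edges, the v vector concentrates probability mass on those word nodes, and equation (1) is iterated. The toy graph, sense labels, damping factor and iteration count are illustrative assumptions, not the paper's actual resources or settings.

```python
import numpy as np

def personalized_pagerank(edges, n, seed_nodes, c=0.85, iters=50):
    """Iterate Pr = c*M*Pr + (1-c)*v, with v concentrated on the seed nodes."""
    # Column-stochastic transition matrix: M[j, i] = 1/outdeg(i) for edge i -> j.
    out_deg = np.bincount([i for i, _ in edges], minlength=n)
    M = np.zeros((n, n))
    for i, j in edges:
        M[j, i] = 1.0 / out_deg[i]
    # Personalization vector: probability mass is injected only at the
    # context-word nodes (a uniform v would give static PageRank instead).
    v = np.zeros(n)
    v[seed_nodes] = 1.0 / len(seed_nodes)
    pr = v.copy()
    for _ in range(iters):
        pr = c * (M @ pr) + (1 - c) * v
    return pr

# Toy knowledge graph: nodes 0-3 are synsets, nodes 4-5 are ambiguous
# context words, each linked to its candidate senses by directed edges.
synsets = ["bank#1 (river)", "bank#2 (finance)", "money#1", "water#1"]
edges = [
    (0, 3), (3, 0),  # bank#1 <-> water#1 (semantic relation)
    (1, 2), (2, 1),  # bank#2 <-> money#1
    (4, 0), (4, 1),  # context word "bank" -> both candidate senses
    (5, 2),          # context word "money" -> its single sense
]
pr = personalized_pagerank(edges, n=6, seed_nodes=[4, 5])
best = max([0, 1], key=lambda s: pr[s])  # disambiguate "bank"
print("chosen sense for 'bank':", synsets[best])  # -> bank#2 (finance)
```

Ranking each ambiguous word's candidate senses by their converged scores selects the sense reinforced by the rest of the context; with a WordNet-scale graph the principle is identical, only N is much larger.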
Montoyo et al. (2005) present a combination of knowledge-based and supervised systems for WSD, which demonstrates that the two approaches can boost one another, due to the fundamentally different types of knowledge they utilise (paradigmatic vs. syntagmatic). They explore a knowledge-based system that uses heuristics for WSD depending on the position of a word's potential senses in the WordNet knowledge base; for supervised machine learning based on an annotated corpus, they explore a Maximum Entropy model that takes into account multiple features from the context of the word to be disambiguated. This earlier line of research demonstrates that combining paradigmatic and syntagmatic information is a fruitful strategy, but it performs the combination in a postprocessing step, i.e. by merging the output of two separate systems; it also still relies on manually annotated data for the supervised disambiguation. Building on the already mentioned work on graph-based approaches, it is possible to combine paradigmatic and syntagmatic information in another way: by incorporating both into the knowledge graph. This is the approach described in the current paper.

The success of knowledge-based WSD approaches apparently depends on the quality of the knowledge graph: whether the knowledge represented in terms of nodes and relations (arcs) between them is sufficient for the algorithm to pick the correct senses of ambiguous words. Several extensions of the knowledge graph constructed on the basis of WordNet have been proposed and implemented. An approach similar to the one presented here is described in Agirre and Martinez (2002), which explores the extraction of syntactically supported semantic relations from manually annotated corpora. In that piece of research, SemCor, a semantically annotated corpus, was processed with the MiniPar dependency parser, and the subject-verb and object-verb relations were then extracted. The new relations were represented on several levels: as word-to-class and class-to-class relations. The extracted selectional relations were then added to WordNet and used in the WSD task. The chief difference with the approach described here is that the set of relations used is bigger (it also includes indirect-object-to-verb relations, noun-to-modifier relations, etc.). Another difference is that the new relations in the present work are not added as selectional relations, but as semantic relations between the corresponding synsets. This means that the specific syntactic role of the participant is not taken into account; only the connectedness between the participant and the event is registered in the knowledge graph.
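The following sketch illustrates this enrichment step under stated assumptions: relation instances extracted from a sense-annotated treebank are collapsed into plain synset-to-synset edges, discarding the syntactic role label. The triple format and the synset identifiers are invented for illustration; the actual extraction operates on BulTreeBank analyses.

```python
# Hypothetical (head synset, dependent synset, syntactic relation) triples,
# as they might be extracted from a sense- and syntax-annotated corpus.
extracted = [
    ("eat.v.01", "person.n.01", "subject-verb"),
    ("eat.v.01", "bread.n.01", "object-verb"),
    ("give.v.01", "child.n.01", "indirect-object-to-verb"),
    ("house.n.01", "big.a.01", "noun-to-modifier"),
]

def enrich(kg_edges, triples):
    """Add each relation instance as an edge between the two synsets.

    The syntactic role is deliberately dropped: only the connectedness
    between participant and event is registered in the knowledge graph.
    """
    for head, dep, _role in triples:
        kg_edges.add((head, dep))
        kg_edges.add((dep, head))  # make the relation traversable both ways
    return kg_edges

# A few original WordNet edges, then the data-driven enrichment.
kg = {("bread.n.01", "food.n.01"), ("food.n.01", "bread.n.01")}
kg = enrich(kg, extracted)
print(len(kg), "edges after enrichment")  # -> 10
```

Discarding the role label is what distinguishes these edges from the selectional relations of Agirre and Martinez (2002): the enriched graph records only that two synsets co-occurred in a syntactic dependency, which is exactly the kind of edge a Personalized PageRank walk can exploit.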
3 The Bulgarian Sense Annotated Treebank

The sense annotation process over BulTreeBank (BTB) was organized in three layers: verb valency frames (Osenova et al., 2012); senses of verbs, nouns, adjectives and adverbs; and DBpedia URIs over named entities. However, in the experiment presented here we used mainly the annotated senses of nouns and verbs (together with the valency information), as well as the concept mappings to WordNet. For that reason we do not discuss the DBpedia annotation here; a brief outline can be found in Popov et al. (2014).

The sense annotation was organized as follows: the lemmatized words per part-of-speech (POS) from BTB received all their possible senses from the explanatory dictionary of Bulgarian and from our Core WordNet[2]. When two competing definitions came from both resources, preference was given to the one that was mapped to WordNet. In the ambiguous cases the correct sense was selected according to the context of usage. For the purposes of the evaluation, some of the files were independently and manually checked by two individual annotators. In total, 92,000 running words have been mapped to word senses; thus, about 43% of all the treebank tokens have been associated with senses.

The word forms annotated with senses mapped to WordNet synsets number 69,333, consisting of nouns and verbs. Of these, 12,792 word forms have been used for testing, and the rest have been used for relation extraction. About 20,000 word forms are now in the process of being mapped to WordNet synsets; most of them are adjectives and adverbs.

4 Bulgarian Syntactic and Lexical Resources

[…] open for developing a language-specific hierarchy. The valency lexicon consists of around 18,000 verb frames extracted from the BTB. The participants in these frames have ontological constraints. At the moment, the verb senses are mapped to WordNet, but the constraints over arguments are not synchronized with the WordNet concepts in their levels of granularity and specificity. This synchronization is planned as a next step in our work, in order to further enrich the knowledge graph.

5 Experiments

5.1 Description of the WSD tool

The experiments that serve to illustrate the out- […]

[1] In this work we used version 3.0 of Princeton WordNet: https://wordnet.princeton.edu/.
[2] Available at http://compling.hss.ntu.edu.sg/omw/