Explicit Semantic Decomposition for Definition Generation


Jiahuan Li∗  Yu Bao∗  Shujian Huang†  Xinyu Dai  Jiajun Chen
National Key Laboratory for Novel Software Technology, Nanjing University, China
{lijh,baoy}@smail.nju.edu.cn, {huangsj,daixinyu,chenjj}@nju.edu.cn
∗ Equal contribution   † Corresponding author

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 708–717, July 5–10, 2020. © 2020 Association for Computational Linguistics

Abstract

Definition generation, which aims to automatically generate dictionary definitions for words, has recently been proposed to assist the construction of dictionaries and to help people understand unfamiliar texts. However, previous works hardly consider explicitly modeling the "components" of definitions, leading to under-specific generation results. In this paper, we propose ESD, namely Explicit Semantic Decomposition for definition generation, which explicitly decomposes the meaning of words into semantic components and models them with discrete latent variables for definition generation. Experimental results show that ESD achieves substantial improvements over strong previous baselines on the WordNet and Oxford benchmarks.

Word        captain
Reference   the person in charge of a ship
Generated   the person who is a member of a ship

Table 1: An example of the definitions of the word "captain". Reference is from the Oxford dictionary, and Generated is from the method of Ishiwatari et al. (2019).

1 Introduction

Dictionary definitions, which provide explanatory sentences for word senses, play an important role in natural language understanding for humans. It is common practice for people to consult a dictionary when encountering unfamiliar words (Fraser, 1999). However, it is often the case that we cannot find satisfying definitions for words that are rarely used or newly created. To assist dictionary compilation and to help human readers understand unfamiliar texts, generating definitions automatically is of practical significance.

Noraset et al. (2017) first propose definition modeling, the task of generating the dictionary definition for a given word from its embedding. Gadetsky et al. (2018) extend the work by incorporating word sense disambiguation to generate context-aware word definitions. Both methods adopt a variant of the encoder-decoder architecture, in which the word to be defined is mapped to a low-dimensional semantic vector by an encoder, and the decoder is responsible for generating the definition given that semantic vector.

Although the existing encoder-decoder architectures (Gadetsky et al., 2018; Ishiwatari et al., 2019; Washio et al., 2019) yield reasonable generation results, they rely heavily on the decoder to extract thorough semantic components of the word, leading to under-specific definitions, i.e., definitions that miss some semantic components. As illustrated in Table 1, to generate a precise definition of the word "captain", one needs to know that a "captain" is a person, that a "captain" is related to a ship, and that a "captain" manages or is in charge of the ship; here person, ship, and manage are three semantic components of the word "captain". However, because these semantic components are not explicitly modeled, the model misses the component "manage" for the word "captain".

Linguists and lexicographers define a word by decomposing its meaning into its semantic components and expressing them in natural language sentences (Wierzbicka, 1996). Inspired by this, Yang et al. (2019) incorporate sememes (Bloomfield, 1949; Dong and Dong, 2003), i.e., minimum units of semantic meaning in human languages, into the task of generating definitions in Chinese. However, it is just as, if not more, time-consuming and expensive to label the semantic components of words as it is to write definitions manually.

In this paper, we propose to explicitly decompose the meaning of words into semantic components for definition generation. We introduce a group of discrete latent variables to model the underlying semantic components. Extending the established training techniques for discrete latent variables used in representation learning (Roy et al., 2018) and machine translation (van den Oord et al., 2017; Kaiser et al., 2018; Shu et al., 2019), we further propose two auxiliary losses to ensure that the introduced latent variables capture the word semantics. Experimental results show that our method achieves significant improvements over previous methods on two definition generation datasets. We also show that our model indeed learns meaningful and informative latent codes and generates more precise and specific definitions.

2 Background

In this section, we introduce the original definition modeling task and two extensions of it.

2.1 Definition Modeling

Definition modeling was first proposed by Noraset et al. (2017). The goal of the original task is to generate a natural language description D = d_{1:T} for a given word w_*. The authors view it as a conditional language modeling task:

    p(D \mid w_*) = \prod_{t=1}^{T} p(d_t \mid d_{<t}, w_*)    (1)

The main drawback of this formulation is that it cannot handle words with multiple different meanings, such as "spring" and "bank", whose senses can only be disambiguated using their contexts.
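To make the factorization in Eqn. 1 concrete, the sketch below scores a definition under a toy conditional language model whose decoder state is initialized from the embedding of the word being defined. This is only an illustration of the formulation, not the authors' implementation; all module sizes and tensor names are invented for the example.

```python
import torch
import torch.nn as nn

class ToyDefinitionLM(nn.Module):
    """Illustrative model of Eqn. 1: p(D | w*) = prod_t p(d_t | d_<t, w*)."""

    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The defined word's embedding initializes the decoder state.
        self.init_h = nn.Linear(emb_dim, hid_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def log_likelihood(self, word_ids, defn_ids):
        # word_ids: (B,); defn_ids: (B, T), starting with a <s> token.
        w = self.embed(word_ids)                          # (B, E)
        h0 = torch.tanh(self.init_h(w)).unsqueeze(0)      # (1, B, H)
        states, _ = self.rnn(self.embed(defn_ids[:, :-1]),
                             (h0, torch.zeros_like(h0)))  # teacher forcing
        log_probs = self.out(states).log_softmax(-1)      # (B, T-1, V)
        targets = defn_ids[:, 1:]
        # Sum of per-token log-probabilities = log p(D | w*).
        return log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).sum(-1)

model = ToyDefinitionLM(vocab_size=1000)
print(model.log_likelihood(torch.tensor([3]), torch.tensor([[1, 42, 7, 2]])))
```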
2.2 Word Context for Definition Modeling

To tackle the problem of polysemy in definition generation, Gadetsky et al. (2018) introduce the task of Context-aware Definition Generation (CDG), in which a usage example C = c_{1:|C|} of the target word is given to help disambiguate the meaning of the word.

For example, given the word "bank" and its context "a bank account", the goal of the task is to generate a definition like "an organization that provides financial services". However, if the input context is changed to "He jumped into the river and swam to the opposite bank.", then the appropriate definition would be "the side of a river". They extend Eqn. 1 to make use of the given context as follows:

    p(D \mid w_*, C) = \prod_{t=1}^{T} p(d_t \mid d_{<t}, w_*, C)    (2)

2.3 Decomposed Semantics for Definition Modeling

Linguists consider the process of defining a word to be one of decomposing its meaning into constituent components and describing them in natural language sentences (Goddard and Wierzbicka, 1994; Wierzbicka, 1996). Previously, Yang et al. (2019) take sememes as one kind of such semantic components and leverage the external sememe annotations of HowNet (Dong and Dong, 2003) to help definition generation. They formalize the task of definition generation, given a word w_* and its sememes s, as follows:

    p(D \mid w_*, s) = \prod_{t=1}^{T} p(d_t \mid d_{<t}, w_*, s)    (3)

Although their method is shown to generate more accurate definitions, it assumes that sememe annotations are available for each word, which can be unrealistic in real-world scenarios.

3 Approach

In this section, we present ESD, namely Explicit Semantic Decomposition for context-aware definition generation.

3.1 Modeling Semantic Components with Discrete Latent Variables

It is linguistically motivated that to define a word is to decompose its meaning into constituent components and describe them in natural language sentences (Goddard and Wierzbicka, 1994; Wierzbicka, 1996). We assume that there exists a set of discrete latent variables z = z_{1:M} that model the semantic components of w_*, where M is a hyperparameter denoting the number of decomposed components. The marginal likelihood of a definition D that we would like to maximize, given a target word w_* and its context C, can then be written as:

    p_\theta(D \mid w_*, C) = \sum_{z} p_\theta(z \mid w_*, C) \, p_\theta(D \mid w_*, C, z)

However, it is generally computationally intractable to sum over all configurations of the latent variables. To address this issue, we instead introduce an approximate posterior q_\phi(z \mid w_*, C, D) and optimize the evidence lower bound (ELBO) of the log-likelihood \log p_\theta(D \mid w_*, C) for training:

    J_{ELBO} = E_{q_\phi(z \mid w_*, C, D)} [\log p_\theta(D \mid z, w_*, C)]
             - KL(q_\phi(z \mid w_*, C, D) \,\|\, p_\theta(z \mid w_*, C))
             \le \log p_\theta(D \mid w_*, C)    (4)

During training, both the posterior distribution q_\phi(z \mid w_*, C, D) and the prior distribution p_\theta(z \mid w_*, C) are computed, and z is sampled from the posterior.
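As a rough sketch of how the objective in Eqn. 4 can be computed, the snippet below assumes that the posterior and the prior each factorize into M independent categorical distributions over K candidate components, so the KL term has a closed form, and that z is sampled with straight-through Gumbel-softmax so gradients reach the posterior network. These are common choices for discrete latents (cf. Kaiser et al., 2018), not necessarily the exact ones used by ESD; all function and tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

def sample_z(posterior_logits, tau=1.0):
    """Draw one-hot z ~ q_phi(z | w*, C, D) with straight-through gradients."""
    # posterior_logits: (B, M, K) for M latent variables over K components.
    return F.gumbel_softmax(posterior_logits, tau=tau, hard=True)

def elbo(defn_log_prob, posterior_logits, prior_logits):
    """Eqn. 4: E_q[log p_theta(D | z, w*, C)] - KL(q_phi || p_theta).

    defn_log_prob:    (B,) decoder log-likelihood of D given a sampled z
    posterior_logits: (B, M, K) logits of q_phi(z | w*, C, D)
    prior_logits:     (B, M, K) logits of p_theta(z | w*, C)
    """
    log_q = posterior_logits.log_softmax(-1)
    log_p = prior_logits.log_softmax(-1)
    # Closed-form KL between categoricals, summed over the M variables.
    kl = (log_q.exp() * (log_q - log_p)).sum(-1).sum(-1)   # (B,)
    return (defn_log_prob - kl).mean()
```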
3.2 Architecture

As illustrated in Figure 1, ESD consists of a word encoder, a context encoder, a semantic component predictor, a definition decoder, and a definition encoder used by the posterior network. The encoders map the given word w_* to its representation r_* and the context C = c_{1:|C|} to its contextual representation H = h_{1:|C|}. The semantic component predictor is responsible for predicting the semantic components z = z_{1:M}. Finally, the decoder generates the target definition from the semantic components z, the word representation r_*, and the context representation H.

[Figure 1: Neural architecture of ESD, including the word encoder, context encoder, the decoder, and the definition encoder for the posterior networks.]

3.2.1 Encoder

As in Ishiwatari et al. (2019), our encoder consists of two parts, namely a word encoder and a context encoder.

Word Encoder. The word encoder is responsible for mapping the word w_* to a low-dimensional vector r_*, and consists of a word embedding and a character-level encoder. The word embedding is initialized with large-scale pretrained word embeddings such as GloVe (Pennington et al., 2014) or FastText (Bojanowski et al., 2017), and is kept fixed during training. Previous works (Noraset et al., 2017; Ishiwatari et al., 2019) also show that morphological information can be helpful for definition generation, so we employ a convolutional neural network (Krizhevsky et al., 2012) to encode the character sequence of the word. We concatenate the word embedding and the character encoding to obtain the word representation r_*.

Context Encoder. We adopt a standard bi-directional LSTM network (Sundermeyer et al., 2012) to encode the context; it takes the word embedding sequence of the context C = c_{1:|C|} and outputs a hidden state sequence H = h_{1:|C|}.
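The following sketch mirrors the encoder just described: a frozen pretrained word embedding concatenated with a character-level CNN encoding forms r_*, and a bi-directional LSTM over the context embeddings produces H. It is a minimal sketch with illustrative dimensions and names, not the authors' released code.

```python
import torch
import torch.nn as nn

class ESDStyleEncoder(nn.Module):
    """Word encoder (embedding + char CNN) and BiLSTM context encoder."""

    def __init__(self, vocab_size, char_vocab_size, word_dim=300,
                 char_dim=32, char_filters=64, ctx_dim=256):
        super().__init__()
        # Word embedding: initialized from GloVe/FastText and kept fixed.
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.word_emb.weight.requires_grad = False
        # Character-level CNN capturing morphological information.
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_filters,
                                  kernel_size=3, padding=1)
        # Standard bi-directional LSTM over the context tokens.
        self.ctx_rnn = nn.LSTM(word_dim, ctx_dim // 2,
                               batch_first=True, bidirectional=True)

    def forward(self, word_id, char_ids, ctx_ids):
        # word_id: (B,); char_ids: (B, L); ctx_ids: (B, |C|)
        w = self.word_emb(word_id)                            # (B, word_dim)
        c = self.char_emb(char_ids).transpose(1, 2)           # (B, char_dim, L)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values    # (B, char_filters)
        r_star = torch.cat([w, c], dim=-1)                    # word representation r*
        H, _ = self.ctx_rnn(self.word_emb(ctx_ids))           # (B, |C|, ctx_dim)
        return r_star, H
```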