
Explicit Semantic Decomposition for Definition Generation

Jiahuan Li∗  Yu Bao∗  Shujian Huang†  Xinyu Dai  Jiajun Chen
National Key Laboratory for Novel Software Technology, Nanjing University, China
{lijh,baoy}@smail.nju.edu.cn, {huangsj,daixinyu,chenjj}@nju.edu.cn

Abstract

Definition generation, which aims to automatically generate definitions for words, has recently been proposed to assist the construction of dictionaries and help people understand unfamiliar texts. However, previous works hardly consider explicitly modeling the "components" of definitions, leading to under-specific generation results. In this paper, we propose ESD, namely Explicit Semantic Decomposition for definition generation, which explicitly decomposes the meaning of words into semantic components and models them with discrete latent variables. Experimental results show that ESD achieves substantial improvements over strong previous baselines on the WordNet and Oxford benchmarks.

1 Introduction

Dictionary definitions, which provide explanatory sentences for word senses, play an important role in natural language understanding for humans. It is common practice to consult a dictionary when encountering unfamiliar words (Fraser, 1999). However, it is often the case that we cannot find satisfying definitions for words that are rarely used or newly created. To assist dictionary compilation and help human readers understand unfamiliar texts, generating definitions automatically is of practical significance.

Noraset et al. (2017) first propose definition modeling, the task of generating the dictionary definition for a given word from its embedding. Gadetsky et al. (2018) extend the work by incorporating disambiguation to generate context-aware word definitions. Both methods adopt a variant of the encoder-decoder architecture, where the word to be defined is mapped to a low-dimensional semantic vector by an encoder, and the decoder is responsible for generating the definition given the semantic vector.

Word        captain
Reference   the person in charge of a ship
Generated   the person who is a member of a ship

Table 1: An example of the definitions of the word "captain". Reference is from the Oxford dictionary and Generated is from the method of Ishiwatari et al. (2019).

Although the existing encoder-decoder architectures (Gadetsky et al., 2018; Ishiwatari et al., 2019; Washio et al., 2019) yield reasonable generation results, they rely heavily on the decoder to extract thorough semantic components of the word, leading to under-specific definition generation results, i.e. missing some semantic components. As illustrated in Table 1, to generate a precise definition of the word "captain", one needs to know that "captain" refers to a person, that "captain" is related to a ship, and that "captain" manages or is in charge of the ship, where person, ship, and manage are three semantic components of the word "captain". However, due to the lack of explicit modeling of these semantic components, the model misses the semantic component "manage" for the word "captain".

Linguists and lexicographers define a word by decomposing its meaning into its semantic components and expressing them in natural language sentences (Wierzbicka, 1996). Inspired by this, Yang et al. (2019) incorporate sememes (Bloomfield, 1949; Dong and Dong, 2003), i.e. minimum units of semantic meaning in human languages, into the task of generating definitions in Chinese. However, it is just as, if not more, time-consuming and expensive to label the components of words than to write definitions manually.

∗ Equal contribution  † Corresponding author

708 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 708–717, July 5–10, 2020. © 2020 Association for Computational Linguistics

In this paper, we propose to explicitly decompose the meaning of words into semantic components for definition generation. We introduce a group of discrete latent variables to model the underlying semantic components, extending the established training techniques for discrete latent variables used in representation learning (Roy et al., 2018) and machine translation tasks (van den Oord et al.,

river and swam to the bank.", then the appropriate definition would be "the side of a river". They extend Eqn. 1 to make use of the given context as follows:

p(D|w∗, C) = ∏_{t=1}^{T} p(d_t | d_{<t}, w∗, C)
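The chain-rule factorization above (each definition token conditioned on the previous tokens, the word, and the context) can be illustrated with a toy conditional model. The bigram probability table below is invented for the example and merely stands in for the neural decoder:

```python
import math

# Hypothetical toy bigram table standing in for the neural decoder's
# conditionals p(d_t | d_{<t}, w*, C); all numbers are invented.
toy_model = {
    ("<s>", "the"): 0.6, ("<s>", "a"): 0.4,
    ("the", "side"): 0.5, ("the", "bank"): 0.5,
    ("side", "of"): 0.9, ("side", "</s>"): 0.1,
    ("of", "a"): 0.8, ("of", "the"): 0.2,
    ("a", "river"): 0.7, ("a", "bank"): 0.3,
    ("river", "</s>"): 1.0,
}

def definition_log_prob(tokens):
    """Chain rule: log p(D) = sum_t log p(d_t | d_{t-1}) under the toy model."""
    lp, prev = 0.0, "<s>"
    for tok in tokens + ["</s>"]:
        lp += math.log(toy_model.get((prev, tok), 1e-9))  # tiny prob for unseen pairs
        prev = tok
    return lp

print(definition_log_prob("the side of a river".split()))
```

Under such a factorization, a definition that fits the context ("the side of a river") receives a much higher log-probability than one built from pairs the model has never seen.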

J_ELBO = E_{qφ(z|w∗, C, D)}[log pθ(D|z, w∗, C)] − KL(qφ(z|w∗, C, D) || pθ(z|w∗, C)) ≤ log pθ(D|w∗, C)   (4)

At the training phase, both the posterior distribution qφ(z|w∗, C, D) and the prior distribution pθ(z|w∗, C) are computed, and z is sampled from the posterior distribution. At the testing phase, due to the lack of D, we only compute the prior distribution pθ(z|w∗, C) and obtain z by applying arg max to it.

Note that for simplicity of notation, we denote qφ(zi|w∗, C, D) and pθ(zi|w∗, C) as qi and pi in the following sections, respectively.

3.2 Model Architecture

As shown in Figure 1, ESD is composed of three modules: an encoder stack, a decoder, and a semantic components predictor. Before detailing each component of ESD, we first give a brief overview of the architecture.

Following the common practice of context-aware definition models (Gadetsky et al., 2018; Ishiwatari et al., 2019), we first encode the source word w∗ to get the word representation r∗.

Context Encoder  We adopt a standard bidirectional LSTM network (Sundermeyer et al., 2012) to encode the context; it takes the word embedding sequence of the context C = c_{1:|C|} and outputs a hidden state sequence H = h_{1:|C|}.

3.2.2 Semantic Components Predictor

For the proposed ESD, we need to model both the semantic components posterior qφ(z|w∗, C, D) and the prior pθ(z|w∗, C).

Semantic Components Posterior Approximator  Exactly modeling the true posterior qφ(z|w∗, C, D) is usually intractable. Therefore, we adopt an approximation method to simplify posterior inference (Zhang et al., 2016). Following the spirit of VAE (Bowman et al., 2016), we use neural networks for better approximation in this paper. Specifically, we first compute the representation H_D = h′_{1:T} of the definition D = d_{1:T} with a bidirectional LSTM network. We then obtain the representations of the definition D and the context C with max-pooling operations:

h_D = max-pooling(h′_{1:T})   (5)
h_C = max-pooling(h_{1:|C|})  (6)

With these representations, as well as the word representation r∗, we compute the posterior approximation qi of zi as follows:

qi = softmax(W_i^q [r∗; h_C; h_D] + b_i^q)

where W_i^q and b_i^q are the parameters of the semantic components posterior approximator.

Semantic Components Prior Model  Similar to the posterior, we model the prior pi of zi with a neural network, using the representation h_C (computed by Eqn. 6) and r∗ as follows:

pi = softmax(W_i^p [r∗; h_C] + b_i^p)

where W_i^p and b_i^p are the parameters of the semantic components prior model.

3.2.3 Definition Decoder

Given the word w∗, the context C, and the semantic component latent variables z, our decoder adopts an LSTM to model the probability of generating the definition D:

p(D|w∗, C, z) = ∏_{t=1}^{T} p(d_t | d_{<t}, w∗, C, z)

Finally, we adopt a GRU-like (Cho et al., 2014) gate mechanism to allow the decoder to dynamically fuse information from the word representation r∗, the context vector c_t, and the semantic context vector o_t, which can be calculated as follows:

f_t = [r∗; c_t; o_t]
u_t = σ(W_u [f_t; s_t] + b_u)
v_t = σ(W_r [f_t; s_t] + b_r)
ŝ_t = tanh(W_s [v_t ⊙ f_t; s_t] + b_s)
s′_t = (1 − u_t) ⊙ s_t + u_t ⊙ ŝ_t

where W∗ and b∗ are weight matrices and bias terms, respectively.

3.3 Learning

The loss function in Eqn. 4 serves as our primary training objective. Besides, since the latent variables are designed to model the semantic components, we propose two auxiliary losses to ensure that these latent variables learn informative codes and capture the decomposed semantics.

Semantic Completeness Objective  In order to generate accurate definitions, the introduced latent variables must capture all perspectives of the word semantics. For example, it is impossible to pre-
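As a concrete illustration of the gate mechanism above, the following pure-Python sketch applies the four gate equations element-wise. The toy dimensions and random weights stand in for the learned parameters W∗ and b∗; this is an illustrative re-implementation, not the authors' code:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    # W: list of rows, x: vector
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fuse(f, s, Wu, bu, Wr, br, Ws, bs):
    """GRU-like fusion of feature vector f = [r*; c_t; o_t] into decoder state s."""
    fs = f + s                                                  # concatenation [f; s]
    u = [sigmoid(x + b) for x, b in zip(matvec(Wu, fs), bu)]    # update gate u_t
    v = [sigmoid(x + b) for x, b in zip(matvec(Wr, fs), br)]    # gate v_t over features
    vf = [vi * fi for vi, fi in zip(v, f)]                      # v_t ⊙ f_t
    s_hat = [math.tanh(x + b)
             for x, b in zip(matvec(Ws, vf + s), bs)]           # candidate state ŝ_t
    return [(1 - ui) * si + ui * shi                            # (1 − u_t) ⊙ s_t + u_t ⊙ ŝ_t
            for ui, si, shi in zip(u, s, s_hat)]

random.seed(0)
df, ds = 6, 4                                                   # toy sizes (assumed)
rand_mat = lambda r, c: [[random.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]
Wu, bu = rand_mat(ds, df + ds), [0.0] * ds
Wr, br = rand_mat(df, df + ds), [0.0] * df
Ws, bs = rand_mat(ds, df + ds), [0.0] * ds

f = [random.uniform(-1, 1) for _ in range(df)]                  # stand-in for [r*; c_t; o_t]
s = [random.uniform(-1, 1) for _ in range(ds)]
s_next = fuse(f, s, Wu, bu, Wr, br, Ws, bs)
print(len(s_next))  # the fused state keeps the decoder-state size (4 here)
```

Because the new state is a convex combination of the old state and a tanh-bounded candidate, each fused component stays in a bounded range, which is the point of the update gate.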

L_base = −J_ELBO

The first variant of ESD (denoted ESD-def) adds the optimization of semantic completeness and semantic diversity, and is optimized with:

L_ESD-def = L_base + α L_com^(def) + β L_div

Grounded on the annotated sememes, the second variant of ESD (denoted ESD-sem) is optimized with:

L_ESD-sem = L_base + α L_com^(sem) + β L_div

4 Experiments

4.1 Experimental Setting

Datasets  To demonstrate the effectiveness of our method, we conduct experiments on two datasets used in previous work (Ishiwatari et al., 2019): WordNet¹ and Oxford². Each entry in the datasets is a triple of a word, a piece of its usage example, and its corresponding dictionary definition.

¹https://wordnet.princeton.edu/
²https://en.oxforddictionaries.com/

Baselines

1. I-Attention (Gadetsky et al., 2018) uses the context to disambiguate the word embedding and cannot utilize context information at decoding time.

2. LOG-CaD (Ishiwatari et al., 2019) is similar to our architecture, but without modeling the semantic components.

3. Pip-sem is an intuitive pipeline that consists of a sememe predictor and a definition generator. The sememe predictor is trained on HowNet and is responsible for annotating words in the definition generation datasets. The definition generator then generates definitions given the word, the context, and the pseudo sememe annotations.

Metrics  We adopt two automatic metrics that are often used in generation tasks: BLEU (Papineni et al., 2002) and Meteor (Denkowski and Lavie, 2014). BLEU considers the exact match between generation results and references and is the most common metric used to evaluate generation systems. Following previous work, we compute

Model                                   WordNet            Oxford
                                        BLEU    METEOR     BLEU    METEOR
I-Attention (Gadetsky et al., 2018)     23.77   /          17.25   /
LOG-CaD (Ishiwatari et al., 2019)       24.79   /          18.53   /
*LOG-CaD                                24.70   8.66       18.24   8.43
†Pip-sem                                25.52   11.33      19.89   11.10
ESD-def                                 25.75   11.52      19.98   10.79
†ESD-sem                                26.48   12.45      20.86   11.86

Table 2: BLEU and Meteor scores on the WordNet and Oxford datasets. '†' indicates models that incorporate external sememe annotations during training. '*' denotes our reimplementation of the previous model.
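For reference, the flavor of the sentence-level BLEU computation used for the scores above can be sketched as follows. This is a simplified version with add-one smoothing; actual evaluations use standard toolkits with their own smoothing schemes:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precision with
    add-one smoothing, geometric mean over orders, and brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())  # clipped counts
        total = sum(c_ngrams.values())
        # add-one smoothing keeps the geometric mean finite when overlap is 0
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec)

ref = "the person in charge of a ship"
print(sentence_bleu(ref, ref))  # -> 1.0 (identical strings score highest)
print(sentence_bleu("a person who is a member of a ship", ref))
```

An exact copy of the reference scores 1.0, while the under-specific hypothesis is penalized for every higher-order n-gram it fails to match.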

the sentence-level BLEU score. We also consider Meteor (Denkowski and Lavie, 2014), a metric that takes synonyms, stemming, and paraphrases into consideration while calculating the score. The Meteor score is said to favor word choice over word order, and recall over precision (Denkowski and Lavie, 2014). We use the recommended hyperparameters to compute Meteor scores.

4.2 Automatic Evaluation

The results, as measured by the automatic evaluation metrics BLEU and Meteor, are presented in Table 2.

ESD significantly improves the quality of definition generation by a large margin. On all benchmark datasets, our ESD variant that incorporates sememes achieves the best generation performance, in both BLEU and Meteor scores. It is worth noting that the improvement in Meteor is larger than the improvement in BLEU, i.e. 3.79 vs. 1.78 on WordNet, and 3.43 vs. 2.62 on Oxford, indicating that our model is better at recalling semantically correct words. This is consistent with our motivation of addressing the under-specificity problem.

Decomposing semantics is indeed helpful for definition modeling. The models that generate definitions from explicitly decomposed semantics (Pip-sem, ESD-def, and ESD-sem) achieve remarkable improvements over the competitors without decomposed component modeling (I-Attention and LOG-CaD). The comparison between ESD-def, I-Attention, and LOG-CaD is fair because none of them has access to external sememe annotations during training and testing. Notably, ESD-sem also improves over Pip-sem by a large margin. This shows that the way our method leverages sememe annotations, i.e. using them as external signals of word semantics, is more effective than a simple annotate-then-generate pipeline.

4.3 Human Evaluation

To further compare the proposed method with the strongest previous method (the LOG-CaD model), we performed a human evaluation of the generated definitions. We randomly selected 100 samples from the test set of the Oxford dataset, and invited four people with at least CET-6 level English skills to rate the output definitions in terms of fluency and semantic completeness on a scale from 1 to 5. The averaged scores are presented in Table 3. Definitions generated by our method are rated higher in terms of semantic completeness while achieving comparable fluency.

Model      Fluency   Semantic Completeness
LOG-CaD    3.53      3.01
ESD-def    3.55      3.45

Table 3: Human annotated scores on the Oxford dataset.

4.4 Ablation Study

We also perform an ablation study to quantify the effect of the different model components.

     L_base   L_div   L_com^(def)   L_com^(sem)   Meteor
1    ✓                                            8.99
2    ✓        ✓                                   9.15
3    ✓                ✓                           11.09
4    ✓                              ✓             11.88
5    ✓        ✓       ✓                           11.56
6    ✓        ✓                     ✓             12.43
7    ✓        ✓       ✓             ✓             12.87

Table 4: Ablation study on the development set of the Oxford dataset.

Semantic completeness objective  We can see that the semantic completeness objective L_com^(∗) leads to a substantial improvement in Meteor score (Lines 3 and 4 vs. Line 1), which indicates that the gain obtained by our model does not come from trivially adopting the conditional VAE framework for the definition generation task.

Semantic diversity objective  The experimental results show that although using the semantic diversity objective on its own yields no gains (Line 2 vs. Line 1), regularizing the model to learn diverse latent codes in combination with the semantic completeness objective does improve generation performance (Line 5 vs. Line 3 and Line 6 vs. Line 4).

5 Analysis

To gain more insight into the improvement provided by the proposed method, we perform several analyses in this section.

5.1 Influence of the number of components

To validate that explicit decomposition of word semantics is beneficial for definition generation, we compare the performance of models with different numbers of latent variables, and plot the results in Figure 2.

Figure 2: The Meteor scores of ESD on the Oxford test set with different M and K, where M is the number of discrete latent variables used in ESD, and K is the number of categories.

Overall, using multiple latent variables with the same number of categories achieves noticeable improvements over M=1, i.e. an encoder-decoder model with a word prediction mechanism. However, it is not the case that we should adopt as many latent variables as possible. The reason is that a word generally has a limited number of semantic components (3-10 in HowNet), and having too many components in the latent model damages performance. It is interesting to see that when we set the number of components M to 8, the optimal number of categories K is 256. As the total number of semantic units we are modeling is M × K, this approximately equals the number of sememes in HowNet.

5.2 Improvements on different word types

The goal of the definition generation task is to accelerate dictionary compilation or to help humans with unfamiliar text. In both application scenarios, it is more important to generate content words that describe the semantics of the given word than function words or phrases such as "refer to" and "of or relating to". To understand which kinds of words our model achieves the largest improvements on, we evaluate the Meteor scores of the baseline model and our model under different values of δ, where δ is a hyperparameter used by Meteor that controls how much we prefer content words over function words.

Figure 3: Comparison between LOG-CaD and ESD-def with different parameter δ. δ controls how much we prefer content words over function words; a larger δ implies a stronger preference for content words.

Figure 3 shows the results. We can see that as our preference for content words increases, the performance of both the baseline model and our model decreases, indicating that it is more difficult for current definition generation models to generate useful content words than function words. However, the gap between the baseline model and our model grows as δ increases, which shows that the gains of our model come mainly from content words rather than function words.
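The effect of a content-versus-function weight like δ can be illustrated with a toy weighted matching score. This is an illustrative sketch, not the actual Meteor algorithm; the function-word list and the scoring below are invented for the example:

```python
# Toy sketch of a content/function-word weighted match score. Illustrative
# only, NOT the real Meteor computation; the function-word list is invented.
FUNCTION_WORDS = {"a", "an", "the", "of", "or", "to", "who", "is", "in", "and"}

def weight(token, delta):
    # content-word matches count with weight delta, function words with 1 - delta
    return (1.0 - delta) if token in FUNCTION_WORDS else delta

def weighted_match_score(candidate, reference, delta):
    cand, ref = candidate.split(), reference.split()
    ref_set = set(ref)
    matched = [t for t in cand if t in ref_set]
    w_match = sum(weight(t, delta) for t in matched)
    precision = w_match / sum(weight(t, delta) for t in cand)
    recall = w_match / sum(weight(t, delta) for t in ref)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)  # harmonic mean

ref = "the person in charge of a ship"
hyp = "a person who is a member of ship"
print(weighted_match_score(hyp, ref, 0.2))  # function-word matches dominate
print(weighted_match_score(hyp, ref, 0.8))  # content-word matches dominate
```

Raising delta shifts credit from easy function-word matches ("a", "of", "the") toward the content words ("person", "ship") that actually carry the definition's meaning, which is the axis Figure 3 varies.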

Word       militia
Context    The militia repelled attacks from without and denied the executive the means to oppress from within.
Reference  a group of people who are not professional soldiers but who have had military training and can act as an army
LOG-CaD    a group of people engaged in a military force
ESD-def    a group of people engaged in a military force and not very skillful

Word       captain
Context    The captain gave the order to abandon ship
Reference  the person in charge of a ship
LOG-CaD    a person who is a member of ship
ESD-def    a person who is the leader of a ship

Table 5: Examples from LOG-CaD and ESD-def. We highlight the parts that differ between the two models in red.
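The per-example differences highlighted in Table 5 can also be located mechanically. A small token-level sketch using Python's standard difflib, applied to the two systems' outputs for "captain":

```python
import difflib

# Token-level diff between the two systems' outputs for "captain" (Table 5).
log_cad = "a person who is a member of ship".split()
esd_def = "a person who is the leader of a ship".split()

sm = difflib.SequenceMatcher(a=log_cad, b=esd_def)
for op, a0, a1, b0, b1 in sm.get_opcodes():
    if op != "equal":
        print(op, log_cad[a0:a1], "->", esd_def[b0:b1])
```

The non-equal opcodes isolate exactly the span where the models diverge ("member" versus "leader"), which is the kind of difference the red highlighting in Table 5 marks by hand.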

word      z1  z2  z3  z4  z5  z6  z7  z8
red       54  7B  9C  60  A1  A7  F5  C7
yellow    54  92  7F  22  A1  A7  F5  55
blue      6A  E5  7F  22  A1  A7  F5  C7
cat       7A  E3  C4  22  A1  A7  F5  3B
dog       7A  43  C4  60  A1  A7  F5  3B
penguin   7A  C3  C4  60  A1  BE  F5  3B

Table 6: Examples of the learned latent codes. Each line shows a word with the hexadecimal identifiers of its corresponding latent codes. Color words like "red", "yellow", and "blue" share most parts of their latent codes with each other, while words from different groups like "red" and "cat" share fewer parts.

5.3 Case Studies

Examples of learned latent codes  In Table 6, we show some examples of latent codes learned on the WordNet dataset. We can see that our model does learn informative codes: words with similar meanings are assigned similar latent codes, while codes of words with different meanings tend to differ.

Examples of generated definitions  We also list several generation samples in Table 5. The definitions generated by our method are more semantically complete than those of previous works, and they indeed capture fine-grained semantic components that the baseline model ignores. For example, it is necessary to know that a militia has unprofessional military skills, which distinguishes the meaning of militia from that of army. The definition generated by the baseline model ignores this perspective, while our model does describe the unprofessional nature of militia by generating "not very skillful", thanks to its ability to model fine-grained semantic components.

6 Related Work

Definition Generation  Definition modeling was first proposed by Noraset et al. (2017). They take a word embedding as input and generate a definition of the word. An obvious drawback is that their model cannot handle polysemous words. Recently, several works (Ni and Wang, 2017; Gadetsky et al., 2018; Ishiwatari et al., 2019) consider the context-aware definition generation task, where a context is introduced to disambiguate the senses of words. They all adopt an encoder-decoder architecture and rely heavily on the decoder to extract the semantic components of the word, thus leading to under-specific definitions. In contrast, we introduce a group of discrete latent variables to model these semantic components explicitly.

Semantic Decomposition and Decomposed Semantics  It is recognized by linguists that human beings understand complex meanings by decomposing them into components that are latent in the meaning. Wierzbicka (1996) proposes that different languages share a set of atomic concepts that cannot be further decomposed, i.e. semantic primitives, and that all complex concepts can be semantically composed from these primitives. Dong and Dong (2003) introduce a similar idea. They call the atomic concepts sememes, and present a knowledge base, HowNet, in which the senses of words are annotated with sememes. HowNet has been shown to be helpful for many NLP tasks, such as word representation learning (Niu et al., 2017), relation extraction (Li et al., 2019), and aspect extraction (Luo et al., 2019). Previously, Yang et al. (2019) proposed to use sememe annotations as a direct input when generating definitions, which can suffer from the data sparsity problem. In this paper, we instead leverage HowNet as an external supervising signal for the latent variables during training, and try to learn the knowledge into the model itself.

7 Conclusion

We proposed ESD, a context-aware definition generation model that explicitly models the decomposed semantics of words. Specifically, we model the decomposed semantics as discrete latent variables, and train with auxiliary losses to ensure that the model learns informative latent codes for definition modeling. As a result, ESD leads to significant improvements over previous strong baselines on two established definition datasets. Quantitative and qualitative analyses showed that our model can generate more meaningful, specific, and accurate definitions.

In future work, we plan to seek better ways to guide the learning of the latent variables, such as using the dynamic routing (Sabour et al., 2017) method to align the latent variables and sememes, and to learn more explainable latent codes.

Acknowledgments

We would like to thank the anonymous reviewers for their insightful comments. Shujian Huang is the corresponding author. This work is supported by the National Science Foundation of China (No. U1836221, 61772261) and the Jiangsu Provincial Research Foundation for Basic Research (No. BK20170074).

References

Yu Bao, Hao Zhou, Shujian Huang, Lei Li, Lili Mou, Olga Vechtomova, Xinyu Dai, and Jiajun Chen. 2019. Generating sentences from disentangled syntactic and semantic spaces. In ACL, pages 6008–6019.

Leonard Bloomfield. 1949. A set of postulates for the science of language. IJAL, 15(4):195–202.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. TACL, 5:135–146.

Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. In CoNLL, pages 10–21.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In EMNLP, pages 1724–1734.

Michael Denkowski and Alon Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the EACL 2014 Workshop on Statistical Machine Translation.

Zhendong Dong and Qiang Dong. 2003. HowNet - a hybrid language and knowledge resource. In NLPKE, pages 820–824.

Carol Fraser. 1999. The role of consulting a dictionary in reading and vocabulary learning. Canadian Journal of Applied Linguistics, 2:73–89.

Artyom Gadetsky, Ilya Yakubovskiy, and Dmitry Vetrov. 2018. Conditional generators of words definitions. In ACL, pages 266–271.

Cliff Goddard and Anna Wierzbicka. 1994. Semantic and Lexical Universals: Theory and Empirical Findings, volume 25. John Benjamins Publishing.

Shonosuke Ishiwatari, Hiroaki Hayashi, Naoki Yoshinaga, Graham Neubig, Shoetsu Sato, Masashi Toyoda, and Masaru Kitsuregawa. 2019. Learning to describe unknown phrases with local and global contexts. In NAACL, pages 3467–3476.

Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova. 2019. Disentangled representation learning for text style transfer. In ACL.

Lukasz Kaiser, Samy Bengio, Aurko Roy, Ashish Vaswani, Niki Parmar, Jakob Uszkoreit, and Noam Shazeer. 2018. Fast decoding in sequence models using discrete latent variables. In ICML, pages 2395–2404.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1097–1105.

Ziran Li, Ning Ding, Zhiyuan Liu, Haitao Zheng, and Ying Shen. 2019. Chinese relation extraction with multi-grained information and external linguistic knowledge. In ACL, pages 4377–4386.

Ling Luo, Xiang Ao, Yan Song, Jinyao Li, Xiaopeng Yang, Qing He, and Dong Yu. 2019. Unsupervised neural aspect extraction with sememes. In IJCAI, pages 5123–5129.

Ke Ni and William Yang Wang. 2017. Learning to explain non-standard English words and phrases. CoRR, abs/1709.09254.

Yilin Niu, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2017. Improved word representation learning with sememes. In ACL, pages 2049–2058.

Thanapon Noraset, Chen Liang, Larry Birnbaum, and Doug Downey. 2017. Definition modeling: Learning to define word embeddings in natural language. In AAAI.

Aaron van den Oord, Oriol Vinyals, et al. 2017. Neural discrete representation learning. In Advances in Neural Information Processing Systems, pages 6306–6315.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311–318.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In EMNLP, pages 1532–1543.

Aurko Roy, Ashish Vaswani, Niki Parmar, and Arvind Neelakantan. 2018. Towards a better understanding of vector quantized autoencoders. arXiv.

Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. 2017. Dynamic routing between capsules. CoRR, abs/1710.09829.

Raphael Shu, Jason Lee, Hideki Nakayama, and Kyunghyun Cho. 2019. Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior. In AAAI.

Martin Sundermeyer, Ralf Schlüter, and Hermann Ney. 2012. LSTM neural networks for language modeling. In InterSpeech.

Koki Washio, Satoshi Sekine, and Tsuneaki Kato. 2019. Bridging the defined and the defining: Exploiting implicit lexical semantic relations in definition modeling. In EMNLP-IJCNLP, pages 3519–3525.

Rongxiang Weng, Shujian Huang, Zaixiang Zheng, Xin-Yu Dai, and Jiajun Chen. 2017. Neural machine translation with word predictions. In EMNLP, pages 136–145.

Anna Wierzbicka. 1996. Semantics: Primes and Universals. Oxford University Press.

Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, and Maosong Sun. 2017. Lexical sememe prediction via word embeddings and matrix factorization. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 4200–4206. AAAI Press.

Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, and Erhong Yang. 2019. Incorporating sememes into Chinese definition modeling. arXiv preprint arXiv:1905.06512.

Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, and Min Zhang. 2016. Variational neural machine translation. In EMNLP.
