
Decompose, Fuse and Generate: A Formation-Informed Method for Chinese Definition Generation

Hua Zheng1,2∗, Damai Dai1,2∗, Lei Li1,2, Tianyu Liu1,2, Zhifang Sui1,2,3, Baobao Chang1,2,3, Yang Liu1,2†
1Department of Computer Science and Technology, Peking University
2Key Lab of Computational Linguistics (MOE), Peking University
3Peng Cheng Laboratory, China
{zhenghua,daidamai,tianyu0421,szf,chbb,liuyang}@pku.edu.cn, [email protected]

Abstract

In this paper, we tackle the task of Definition Generation (DG) in Chinese, which aims at automatically generating a definition for a word. Most existing methods take the source word as an indecomposable semantic unit. However, in parataxis languages like Chinese, word meanings can be composed using the word formation process, where a word ("桃花", peach-blossom) is formed by formation components ("桃", peach; "花", flower) using a formation rule (Modifier-Head). Inspired by this process, we propose to enhance DG with word formation features. We build a formation-informed dataset and propose a model DeFT, which Decomposes words into formation features, dynamically Fuses different features through a gating mechanism, and generaTes word definitions. Experimental results show that our method is both effective and robust.[1]

∗ Equal contribution.
† Corresponding author.
[1] The code is available at https://github.com/Hunter-DDM/DeFT-naacl2021.

[Figure 1: Word formation process for the polysemous word "白花". With morphemes and a formation rule specified, the process can construct and distinguish each meaning: "白花1" (White flower.) via Modifier-Head, from "白" (white) and "花1" (flower); "白花2" (Vainly spend.) via Adverb-Verb, from "白" (vainly) and "花2" (spend). Definitions are simplified for ease of reading.]

1 Introduction

Definition Generation (DG) aims at automatically generating an explanatory text for a word. This task is of practical importance to assist dictionary construction, especially in highly productive languages like Chinese (Yang et al., 2020). Most existing methods take the source word as an indecomposable lexico-semantic unit, using features like word embeddings (Noraset et al., 2017) and contexts (Gadetsky et al., 2018; Ishiwatari et al., 2019). Recently, Yang et al. (2020) and Li et al. (2020) achieve improvements by decomposing the word meaning into different semantic components.

In decomposing the word meaning, the word formation process is an intuitive and informative way that has not been explored in DG so far. In parataxis languages like Chinese, a word is formed by formation components, i.e., morphemes, and a formation rule. As shown in Figure 1, the polysemous word "白花" holds two meanings, "白花1" and "白花2", which can be distinguished by different morphemes ("白1; 花1" vs. "白2; 花2") and different rules (Modifier-Head vs. Adverb-Verb). Such an intuitive formation process can clearly and unambiguously construct the word meaning.

Inspired by the word formation process in Chinese, we propose to enhance DG with formation features. First, we build a formation-informed dataset under expert annotation. Next, we design a DG model DeFT, which Decomposes words into formation features, Fuses different features through a gating mechanism, and generaTes definitions.

Our contributions are as follows: (1) We are the first to propose using word formation features to enhance DG, and we design a formation-informed model DeFT. (2) We build a new formation-informed DG dataset under expert annotation. (3) Experimental results show that our method brings a substantial performance improvement, and maintains robust performance even with only word formation features.

2 Related Work

Definition Generation: Noraset et al. (2017) first propose the DG task and use word embeddings as the main input. The following methods add contexts for disambiguation (Gadetsky et al., 2018; Ishiwatari et al., 2019) or word-pair embeddings to capture lexical relations (Washio et al., 2019).

Recent methods attempt to decompose the word meaning by using HowNet sememes (Yang et al., 2020) or by modeling latent variables (Li et al., 2020).

Semantic Components: To systematically define words, linguists decompose the word meaning into semantic components (Wierzbicka, 1996). Following this idea, HowNet (Dong and Dong, 2006) uses manually created sememes to describe the semantic aspects of words. Recent studies also show that leveraging subword information produces better embeddings (Park et al., 2018; Lin and Liu, 2019; Zhu et al., 2019), but these methods lack a clear distinction among different formation rules.

Formation Rule | Use Case | %
Modifier-Head | 红花 (red-flower) | 38.62
Parallel | 昏花 (dizzy-dim) | 22.87
Verb-Object | 花钱 (spend-money) | 16.44
Adverb-Verb | 白花 (vainly-spend) | 8.45
Single Morpheme | 花生 (peanut) | 3.51

Table 1: Examples of word formation rules and use cases. % denotes the instance percentage.

Morpheme (ID) | Morpheme Definition
花1 (07361-01) | 花朵 (flower)
花2 (07361-06) | 模糊;迷乱 (dim; blurred)
花3 (07361-09) | 用;耗费 (use; spend)

Table 2: Three example morphemes and definitions for the character "花". We give each morpheme a unique ID, C-M (C is the character rank, M is the morpheme rank).

3 Word Formation Process in Chinese

It is linguistically motivated to explore the word formation process to better understand words. Instead of combining roots and affixes, Chinese words are formed by characters in a parataxis way (Li et al., 2018). Here, we introduce two formation features and construct a formation-informed dataset.

3.1 Formation components and rules

Chinese formation components are morphemes, defined as the smallest meaning-bearing units (Zhu, 1982). Morphemes are unambiguous in representing word meanings, since they can distinguish different meanings and uses of each character in a word, like "白1" and "白2" in Figure 1. Morphemes are also productive in constructing words, since over 99.48% of Chinese words are formed using a small set of nearly 20,000 morphemes (Fu, 1988). These properties make morphemes highly effective as formation components.

Formation rules specify how morphemes are combined to form words in a parataxis way. For example, the Modifier-Head rule uses the first morpheme to modify the second morpheme. Following the study of Liu et al. (2018), we adopt 16 Chinese formation rules and show the top 5 by instance percentage in Table 1. Complete descriptions of the 16 formation rules are provided in Appendix A.

3.2 Formation-informed dataset

We construct a DG dataset under expert annotation, which contains morphemes and formation rules. Each entry consists of (1) the source word, (2) morphemes and morpheme definitions, (3) the formation rule, (4) a context (a sentence containing the source word), and (5) the source word definition.

To ensure full coverage and fine granularity, we extract data from the 5th edition of the Contemporary Chinese Dictionary published by the Commercial Press[2], one of the most influential Chinese dictionaries. We collect 45,311 Chinese disyllabic word entries with contexts and definitions. To annotate them, we also collect 10,527 Chinese characters and 20,855 morphemes with definitions.

[2] https://www.cp.com.cn

Our annotators include two professors and six graduate students majoring in Chinese linguistics. Given the definition, they annotate each word with its formation rule (as shown in Table 1) and morpheme IDs (as shown in Table 2). Each entry is cross-validated by three independent annotators and reviewed by one. The detailed annotation process includes the following three steps:

(1) Equipped with the definition, annotators annotate each entry with two morpheme IDs (selected from the morphemes of each character) and a formation rule (selected from the 16 formation rules). Each entry is independently annotated by three annotators, who also note down a confidence score. If the three annotations are the same, turn to (3); otherwise, turn to (2).

(2) Another annotator reviews the conflicting annotations and confidence scores, and decides the final annotation. Turn to (3).

(3) The annotation is collected as an entry into the final dataset.

It takes one minute on average for each annotator to annotate an entry. Only 8,193 out of 45,311 entries enter Phase (2) in the whole process.
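To make the three-step procedure concrete, the following minimal Python sketch replays the adjudication logic; the Annotation container and all field names are hypothetical illustrations, not the released data format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Annotation:
    """One annotator's labels for a word entry (hypothetical field names)."""
    morpheme_ids: tuple      # two IDs in the Table 2 format, e.g. ("07361-01", ...)
    rule: str                # one of the 16 formation rules, e.g. "Modifier-Head"
    confidence: float        # the annotator's self-reported confidence

def adjudicate(annotations: List[Annotation]) -> Optional[Annotation]:
    """Step (1): three independent annotations; if unanimous, the entry goes
    straight to step (3). Otherwise it is deferred to step (2), the expert
    review, signalled here by returning None."""
    labels = {(a.morpheme_ids, a.rule) for a in annotations}
    return annotations[0] if len(labels) == 1 else None
```

Under this logic, exactly the entries that fail the unanimity check (8,193 of 45,311 in our data) reach the reviewing step (2).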

4 Approach

4.1 Task formulation

We extend the DG setting in Ishiwatari et al. (2019) to incorporate the word formation features F = {morph_1, morph_2, rule}, where morph_i is the i-th morpheme definition sentence and rule is the formation rule. The training goal is to maximize the likelihood of the ground-truth definition D = d_{1:T} given the source word w_*, the context sentence C = c_{1:n}, and the word formation features F:

p(D | w_*, C, F) = \prod_{t=1}^{T} p(d_t | d_{<t}, w_*, C, F).
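Concretely, training minimizes the negative log-likelihood of the gold definition under teacher forcing. Below is a minimal PyTorch sketch with illustrative names; the paper does not spell out this code.

```python
import torch
import torch.nn.functional as F

def definition_nll(step_logits: torch.Tensor, gold_ids: torch.Tensor,
                   pad_id: int) -> torch.Tensor:
    """Negative log-likelihood of the ground-truth definition d_1:T.
    step_logits: (T, vocab) unnormalized scores, one row per step, i.e.
                 the model's p(d_t | d_<t, w_*, C, F) before softmax;
    gold_ids:    (T,) gold definition token ids."""
    return F.cross_entropy(step_logits, gold_ids, ignore_index=pad_id)
```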

[Figure 2: An illustration of DeFT. The definition generator uses the seed vector as input, and dynamically fuses different features using a gating mechanism.]

4.2.1 Seed vector

We first employ a Bi-LSTM (Graves and Schmidhuber, 2005) to encode morph_i. Then, we combine the morpheme encodings m_i into a comprehensive morpheme embedding r_m with a rule-specific linear layer, which captures different semantic relations:

m_i = Bi-LSTM([morph_i]),
r_m = W_m^{(rule)} [m_1; m_2] + b_m^{(rule)}.

We then use a linear layer to combine r_m and the pretrained source word embedding w_* to obtain the seed vector r_* as the initial generator input:

r_* = W_r [r_m; w_*] + b_r.
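The following PyTorch sketch mirrors these two equations. The dimensions, the pooling of Bi-LSTM states into m_i (here simply the last time step), and the ModuleList holding the 16 rule-specific layers are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SeedVector(nn.Module):
    def __init__(self, emb_dim: int, hid_dim: int, num_rules: int = 16):
        super().__init__()
        # Shared Bi-LSTM encoder for both morpheme definition sentences
        self.encoder = nn.LSTM(emb_dim, hid_dim, bidirectional=True,
                               batch_first=True)
        # One rule-specific linear layer per formation rule: W_m^(rule), b_m^(rule)
        self.rule_linears = nn.ModuleList(
            [nn.Linear(4 * hid_dim, hid_dim) for _ in range(num_rules)]
        )
        self.seed_linear = nn.Linear(hid_dim + emb_dim, hid_dim)  # W_r, b_r

    def forward(self, morph1, morph2, rule_id, word_emb):
        # morph_i: (batch, seq, emb_dim) embedded morpheme definition sentences.
        # m_i: pooled Bi-LSTM states (last time step, as a simplification).
        m1 = self.encoder(morph1)[0][:, -1]           # (batch, 2 * hid_dim)
        m2 = self.encoder(morph2)[0][:, -1]
        # r_m = W_m^(rule) [m1; m2] + b_m^(rule)
        r_m = self.rule_linears[rule_id](torch.cat([m1, m2], dim=-1))
        # r_* = W_r [r_m; w_*] + b_r
        return self.seed_linear(torch.cat([r_m, word_emb], dim=-1))
```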

4.2.2 Definition generator

As the generator, we employ an LSTM followed by a GRU-like gate GRU-GATE(·) (Cho et al., 2014), which dynamically fuses different features:

h_t = LSTM(d_{t-1}, h_{t-1}),
h'_t = GRU-GATE(h_t, feat_t),
feat_t = [r_m; w_*; a_*; g_t; c_t],

where h_t is the LSTM hidden state at the t-th step, h'_t is the gated hidden state, d_{t-1} is the embedding of the previous definition word (d_0 is initialized as r_*), and feat_t denotes the features that dynamically control the generation process. We explain a_*, g_t, and c_t as follows. a_* is the character-level embedding, obtained by combining the embeddings ch_i of the characters in w_* with a linear layer (see Figure 2). As labeled in Figure 2, g_t is the gated attended morpheme vector, computed by attending over the morpheme encodings with per-step gates, and c_t is the attended context vector, computed by attending over the context C with h_t.
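The gate itself is not fully specified in the excerpted equations; below is one plausible GRU-style parameterization in the spirit of Cho et al. (2014), offered as a sketch rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GruGate(nn.Module):
    """GRU-like gate fusing the LSTM hidden state h_t with feat_t.
    An assumed update/reset parameterization; treat as a sketch."""
    def __init__(self, hid_dim: int, feat_dim: int):
        super().__init__()
        self.update = nn.Linear(hid_dim + feat_dim, hid_dim)
        self.reset = nn.Linear(hid_dim + feat_dim, hid_dim)
        self.proposal = nn.Linear(hid_dim + feat_dim, hid_dim)

    def forward(self, h_t, feat_t):
        # z decides how much of the feature-conditioned proposal
        # overwrites the LSTM state at this step.
        z = torch.sigmoid(self.update(torch.cat([h_t, feat_t], -1)))
        r = torch.sigmoid(self.reset(torch.cat([h_t, feat_t], -1)))
        h_tilde = torch.tanh(self.proposal(torch.cat([r * h_t, feat_t], -1)))
        return (1 - z) * h_t + z * h_tilde  # gated hidden state h'_t
```

Letting z vary per step is what allows different features to dominate different phases of generation, which the ablation in Section 5.3 probes by replacing the gate with a simple average.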

Table 5: Human evaluation guideline for raters. Note that the evaluation results on these two metrics may show different trends. For example, a predicted definition with an opposite meaning to the ground-truth definition may receive a high coverage score but a low overall score.

The human evaluation results are consistent with the automatic evaluation: formation features are effective and DeFT performs the best.

5.3 Analysis

Ablation study: Based on DeFT, we perform an ablation study regarding MorphGATE and the formation rule in Table 6. (1) For MorphGATE, we replace it with a simple average function, which leads to a drop in performance. This reveals that different morphemes take effect in different generation phases. (2) For the formation rule, we replace the rule-specific layers with a rule-shared layer, leading to a more serious performance drop. This verifies that distinguishing the specific formation rule can assist word meaning construction.

Model | BLEU (Δ) | ROUGE-L (Δ)
DeFT | 26.42 | 38.58
w/o MorphGATE | 25.81 (0.61↓) | 38.01 (0.57↓)
w/o Formation Rules | 25.52 (0.90↓) | 37.40 (1.18↓)

Table 6: Ablation study of DeFT.

Formation features can assist disambiguation: We present the generated definitions for a polysemous word in Figure 3. The example shows that using only the word feature (W) cannot distinguish different meanings. By contrast, using only the formation features (F) can capture the meaning difference and disambiguate the word (use vs. money). Further, DeFT (F+W+C) generates the exactly correct definition by fusing different features. Due to space limits, we put two additional interesting analyses on formation rules in Appendix B.

[Figure 3: Generation examples for a polysemous word "零用" using different features. For sense 1 (Morph1: piecemeal, Morph2: use, Rule: Adverb-Verb; ground truth: "Use in a piecemeal way."), W outputs "Money uses money.", F outputs "Piecemeal use.", and F+W+C (DeFT) outputs "Use in a piecemeal way.". For sense 2 (Morph1: piecemeal, Morph2: expense, Rule: Modifier-Head; ground truth: "Piecemeal expense."), W outputs "Money uses money.", F outputs "Piecemeal money.", and F+W+C (DeFT) outputs "Piecemeal expense.".]

Formation features are more feasible and effective compared with sememes: Sememes are expert-crafted words used to describe the semantic aspects of words. For annotation cost, annotating sememes is as expensive as writing definitions (Li et al., 2020), whereas annotating formation features is a simple multiple-choice task with 1.98 choices on average. For effectiveness, we conduct experiments using sememe embeddings from Yang et al. (2020) as additional features. Results show that, based on W, adding sememes brings a BLEU improvement of 0.52, lower than the improvement of 2.30 from F+W. Further, based on DeFT, adding sememes even introduces noise and decreases BLEU by 0.35. This indicates that, compared with sememes, formation features are more feasible and effective.

6 Conclusion

In this paper, we propose to use formation features to enhance DG. We build a formation-informed dataset and design a model DeFT, which decomposes words into formation features and fuses features via a gating mechanism. Experimental results show that our method is both effective and robust.

Acknowledgments

We would like to thank all the anonymous reviewers and annotators for their helpful advice on various aspects of this work, including Fuqian Wu, Ming Liu, Yaqi Yin, Yue Wang, etc. This paper is supported by the National Natural Science Foundation of China (No. 62036001, U19A2065) and the National Social Science Foundation of China (No. 16YY137, 18ZDA295).

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR 2015.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.

Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP 2014, pages 1724–1734.

Zhendong Dong and Qiang Dong. 2006. HowNet and the Computation of Meaning. World Scientific.

Joseph L. Fleiss and Jacob Cohen. 1973. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33(3):613–619.

Yonghe Fu. 1988. Modern Chinese General Character List: Appendix of the Most and Secondly Frequently Used Characters. Yuwen Jianshe.

Artyom Gadetsky, Ilya Yakubovskiy, and Dmitry P. Vetrov. 2018. Conditional generators of words definitions. In ACL 2018, pages 266–271.

Alex Graves and Jürgen Schmidhuber. 2005. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6):602–610.

Shonosuke Ishiwatari, Hiroaki Hayashi, Naoki Yoshinaga, Graham Neubig, Shoetsu Sato, Masashi Toyoda, and Masaru Kitsuregawa. 2019. Learning to describe unknown phrases with local and global contexts. In NAACL-HLT 2019, pages 3467–3476.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR 2015.

Jiahuan Li, Yu Bao, Shujian Huang, Xinyu Dai, and Jiajun Chen. 2020. Explicit semantic decomposition for definition generation. In ACL 2020, pages 708–717.

Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, and Xiaoyong Du. 2018. Analogical reasoning on Chinese morphological and semantic relations. In ACL 2018, pages 138–143.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In ACL 2004 Workshop on Text Summarization Branches Out, pages 74–81.

Zi Lin and Yang Liu. 2019. Implanting rational knowledge into distributed representation at morpheme level. In AAAI 2019, pages 2954–2961.

Yang Liu, Zi Lin, and Sichen Kang. 2018. Towards a description of Chinese morpheme conceptions and semantic composition of word. Journal of Chinese Information Processing, 32(2):12–21.

Thanapon Noraset, Chen Liang, Larry Birnbaum, and Doug Downey. 2017. Definition modeling: Learning to define word embeddings in natural language. In AAAI 2017, pages 3259–3266.

Hyun-jung Park, Min-chae Song, and Kyung-Shik Shin. 2018. Sentiment analysis of Korean reviews using CNN: Focusing on morpheme embedding. Journal of Intelligence and Information Systems, 24(2):59–83.

Koki Washio, Satoshi Sekine, and Tsuneaki Kato. 2019. Bridging the defined and the defining: Exploiting implicit lexical semantic relations in definition modeling. In EMNLP-IJCNLP 2019, pages 3519–3525.

Anna Wierzbicka. 1996. Semantics: Primes and Universals. Oxford University Press, UK.

Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, and Erhong Yang. 2020. Incorporating sememes into Chinese definition modeling. IEEE/ACM Transactions on Audio, Speech and Language Processing, 28:1669–1677.

Dexi Zhu. 1982. Yufa Jiangyi (Lectures on Grammar). The Commercial Press, China.

Yi Zhu, Ivan Vulic, and Anna Korhonen. 2019. A systematic study of leveraging subword information for learning word representations. In NAACL-HLT 2019, pages 912–932.

A Formation-Informed Dataset Details

We show the complete descriptions of the 16 formation rules in Table 7.

Word Formation Rule | Explanation | Use Case | %
定中 dìng·zhōng (Modifier-Head) | morph1 modifies morph2 (noun). | 红花 hóng·huā (red-flower) | 38.62
联合 lián·hé (Parallel) | morph1 and morph2 are similar, contrasting or complementary. | 昏花 hūn·huā (dizzy-dim) | 22.87
述宾 shù·bīn (Verb-Object) | morph1 operates on morph2. | 花钱 huā·qián (spend-money) | 16.44
状中 zhuàng·zhōng (Adverb-Verb) | morph1 modifies morph2 (verb). | 白花 bái·huā (vainly-spend) | 8.45
单纯 dān·chún (Single Morpheme) | The word is a single morpheme. | 花生 huā·shēng (peanut) | 3.51
连谓 lián·wèi (Verb-Consequence) | morph2 is the consequence of morph1. | 休息 xiū·xi (stop-rest) | 3.43
后缀 hòu·zhuì (Suffixation) | morph2 is the suffix of morph1. | 花头 huā·tóu (trick-∅) | 2.70
述补 shù·bǔ (Verb-Complement) | morph2 is the action that follows morph1. | 压低 yā·dī (press-down) | 1.28
主谓 zhǔ·wèi (Subject-Predicate) | morph1 is the subject of morph2. | 眼花 yǎn·huā (eyesight-dim) | 1.06
重叠 chóng·dié (Overlapping) | morph1 and morph2 are the same. | 白白 bái·bái (vainly-vainly) | 0.59
方位 fāng·wèi (Entity-Position) | morph1 is an entity, morph2 is a position. | 期中 qī·zhōng (semester-mid) | 0.37
介宾 jiè·bīn (Preposition-Object) | morph1 is a preposition, morph2 is an object. | 凭空 píng·kōng (from-nowhere) | 0.31
名量 míng·liàng (Noun-Quantifier) | morph2 is the quantifier of morph1. | 花朵 huā·duǒ (flower-bud) | 0.13
数量 shù·liàng (Number-Quantifier) | morph1 is a number, morph2 is a quantifier. | 一点 yì·diǎn (one-dot) | 0.10
前缀 qián·zhuì (Prefixation) | morph1 is the prefix of morph2. | 老师 lǎo·shī (∅-teacher) | 0.10
复量 fù·liàng (Quantifier-Quantifier) | Both morph1 and morph2 are quantifiers. | 千米 qiān·mǐ (kilo-meter) | 0.03

Table 7: Descriptions of the total 16 formation rules. ∅ denotes the affix and % denotes the instance percentage. The first and the third columns are in the format of "Chinese characters - Chinese phonetic notation - (English translation)". To help understand these rules, we give a simple explanation of the relation between the two morphemes in the second column.

B Additional Analysis on Formation Rules

Here we provide some additional interesting analyses that reveal the specific properties and influences of word formation rules.

B.1 The similarity among different formation rules

In Section 4.2.1, we produce a comprehensive morpheme embedding using a rule-specific linear layer. We study the relations of the weight matrices W_m^{(rule)} in these layers by resizing them into vectors and calculating their pairwise cosine similarity, as shown in Figure 4 and sketched below.

[Figure 4: We take the Verb-Object, Overlapping and Noun-Quantifier weight matrices as three examples and display their similarity with all the other 15 formation rules in a heatmap. The color goes deeper for more similar formation rules. The top 3 most similar formation rules are shown in bold.]
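A short PyTorch sketch of this comparison, assuming the 16 weight matrices have been collected into a list (names illustrative):

```python
import torch
import torch.nn.functional as F

def rule_weight_similarity(rule_weights: list) -> torch.Tensor:
    """rule_weights: the 16 rule-specific matrices W_m^(rule).
    Flatten each matrix into a vector and return the (16, 16) matrix
    of pairwise cosine similarities, as visualized in Figure 4."""
    vecs = torch.stack([w.flatten() for w in rule_weights])  # (16, d)
    vecs = F.normalize(vecs, dim=1)                          # unit-norm rows
    return vecs @ vecs.T
```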

Figure 4 (a) shows that Overlapping is most similar to Suffixation, Prefixation, and Single Morpheme. Interestingly, word meanings constructed by these four formation rules share a similar pattern of using only one morpheme. For example, in Suffixation, only the first morpheme carries the meaning, like "花头" (trick-∅).

Figure 4 (b) shows that Noun-Quantifier is most similar to Quantifier-Quantifier and Number-Quantifier. Word meanings constructed by these three formation rules all have a quantifier morpheme. For example, in Number-Quantifier, the first morpheme is a number and the second morpheme is a quantifier, like "一点" (one-dot).

Figure 4 (c) shows that Verb-Object is most similar to Modifier-Head, Parallel and Adverb-Verb. Word meanings constructed by Verb-Object, Modifier-Head and Adverb-Verb share a similar pattern of using the first morpheme to operate on the second morpheme. For example, in Adverb-Verb, the first morpheme modifies the second verb morpheme, like "白花" (vainly-spend).

B.2 The specific impact of formation rules

In Table 8, we generate definitions using different formation rules for the same word. Results show that each predicted definition indicates a clear pattern of the used formation rule. The Modifier-Head rule uses touched to modify sad, and outputs a noun; the Adverb-Verb rule outputs an adjective in a similar modifying way; the Parallel rule outputs a single meaning of sad with different parts-of-speech. However, only the correct rule (Verb-Consequence) captures the cause-and-effect semantic aspect and outputs the correct definition. This reveals the impact of formation rules in the word formation process.

感伤 | Morph1: 感触 (touched) | Morph2: 伤心 (sad)
Verb-Consequence (✓) | 因有所感触而悲伤。 (Sad due to being touched.)
Modifier-Head | 感触的悲伤。 (Sadness of being touched.)
Parallel | 伤心。 (Sad.)
Adverb-Verb | 难过的伤心。 (Woefully sad.)

Table 8: A case study of 4 predicted definitions of "感伤" using 1 correct formation rule (✓) and 3 others.
