Solving and Generating Chinese Character Riddles
Total Page:16
File Type:pdf, Size:1020Kb
Solving and Generating Chinese Character Riddles + Chuanqi Tan† ∗ Furu Wei‡ Li Dong Weifeng Lv† Ming Zhou‡ †State Key Laboratory of Software Development Environment, Beihang University, China + ‡Microsoft Research Asia University of Edinburgh + †[email protected] [email protected] ‡ fuwei, mingzhou @microsoft.com †[email protected] { } Abstract and a corresponding solution. The character rid- dle is one of the most popular forms of various rid- Chinese character riddle is a riddle game in dles in which the riddle solution is a single Chinese which the riddle solution is a single Chi- character. While English words are strings of let- nese character. It is closely connected with ters together, Chinese characters are composed of the shape, pronunciation or meaning of Chi- radicals that associate with meaning or metaphor. nese characters. The riddle description (sen- In other words, Chinese characters are usually posi- tence) is usually composed of phrases with rich linguistic phenomena (such as pun, sim- tioned into some common structures, such as upper- ile, and metaphor), which are associated to lower structure, left-right structure, inside-outside different parts (namely radicals) of the so- structure, which means they can be decomposed lution character. In this paper, we propose into other characters or radicals. For example, “好” a statistical framework to solve and generate (good), a character with left-right structure, can be Chinese character riddles. Specifically, we decomposed into “女” (daughter) and “子” (son). As learn the alignments and rules to identify the illustrated in Figure 1(a), the left part of “好” is “女” metaphors between phrases in riddles and rad- 子 女 子 icals in characters. Then, in the solving phase, and the right part is “ ”. “ ” and “ ” are called we utilize a dynamic programming method the “radical” of “好”. Figure 1(b) is another exam- to combine the identified metaphors to obtain ple of the character “思” (miss) with an upper-lower candidate solutions. In the riddle generation structure. phase, we use a template-based method and a replacement-based method to obtain candi- 好 date riddle descriptions. We then use Rank- 田 good ing SVM to rerank the candidates both in the field solving and generation process. Experimental 思 results in the solving task show that the pro- 女 子 miss 心 posed method outperforms baseline methods. daughter son heart We also get very promising results in the gen- (a) Left-Right Structure (b) Upper-Lower Structure eration task according to human judges. Figure 1: Examples of the structure of Chinese characters 1 Introduction One of the most important characteristics of char- acter riddle lies in the structure of Chinese charac- The riddle is regarded as one of the most unique ters. Unlike the common riddles which imply the and vital elements in traditional Chinese culture, object in the riddle descriptions, character riddles which is usually composed of a riddle description pay more attention to structures such as combination ∗The work was done when the first author and the third of radicals and decomposition of characters. Ac- author were interns at Microsoft Research Asia. cording to these characteristics, metaphors in the 846 Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 846–855, Austin, Texas, November 1-5, 2016. c 2016 Association for Computational Linguistics 千 里 会 千 金 2 Related Work thousand kilometer meet thousand gold To the best of our knowledge, no previous work has 马 女 studied on Chinese riddles. For other languages, horse daughter there are a few approaches concentrated on solv- 妈 ing English riddles. Pepicello and Green (1984) de- mother scribe the various strategies incorporated in riddles. (De Palma and Weiner, 1992; Weiner and De Palma, Figure 2: An example of Chinese character riddle: The solution 1993) use the knowledge representation system to “妈” is composed of the radical “女” derived from “千 金” and solve English riddles that consist of a single sen- “马” derived from “千 里”. tence question followed by a single sentence an- swer. They propose to build the relation between the riddles always imply the radicals of characters. phonemic representation and their associated lexi- We show an example of a Chinese character rid- cal concepts. Binsted and Ritchie (1994) imple- dle in Figure 2. The riddle description is “千 里 会 ment a program JAPE which generates riddles from 千 金” and the riddle solution is “妈”. In this exam- humour-independent lexical entries and evaluate the ple, “千 里” (thousand kilometer) aligns with “马” behaviour of the program by 120 children (Binsted (horse) because in Chinese culture it is said that a et al., 1997). Olaosun and Faleye (2015) identify good horse can run thousands of kilometers per day. meaning construction strategies in selected English Furthermore, “千 金” (thousand gold) aligns with riddles in the web and account for the mental pro- “女” (daughter) because of the analogy that a daugh- cesses involved in their production, which shows ter is very important in the family. The final solution that the meaning of a riddle is an imposed mean- “妈” is composed of these two metaphors because ing that relates to the logical, experiential, linguistic, the radical “女” meets the radical “马”. Radicals can literary and intuitive judgments of the riddles. Be- be derived not only from the meaning of metaphors, sides, there are some studies in Yoruba(Ak´ınyem´ı, but also from the structure of characters. We will de- 2015b; Ak´ınyem´ı, 2015a; Magaji, 2014). All of scribe the alignments and rules in detail in Section 3. these works focus on the semantic meaning, which In this paper, we propose a statistical framework is different from Chinese character riddles that focus to solve and generate Chinese character riddles. We on the structure of characters. show our pipeline in Figure 3. First, we learn the Another popular word game is Crossword Puzzles common alignments and the combination rules from (CPs) that normally has the form of a square or rect- large riddle-solution pairs which are mined from the angular grid of white and black shaded squares. The Web. The alignments and rules are used to identify white squares on the border of the grid or adjacent to the metaphors in the riddles. Second, in the solving the black ones are associated with clues. Compared phase, we utilize a dynamic programming algorithm with our riddle task, the clues in the CPs are derived on the basis of the alignments and rules to figure out from each question where the radicals in solution are the candidate solutions. For the generating phase, derived from the metaphors in the riddles. Proverb we use a template-based method and a replacement- (Littman et al., 2002) is the first system for the au- based method based on the decomposition of the tomatic resolution of CPs. Ernandes et al. (2005) character to generate the candidate riddles. Finally, utilize a web-search module to find sensible candi- we employ Ranking SVM to rank the candidates in dates to questions expressed in natural language and both the solving and generation task. We conduct get the final answer by ranking the candidates. And the evaluation on 2,000 riddles in the riddle solving the rule-based module and the dictionary module are task and 100 Chinese characters in the riddle gener- mentioned in his work. The tree kernel is used to ation task. Experimental results show that the pro- rerank the candidates proposed by Barlacchi et al. posed method outperforms baseline methods in the (2014) for automatic resolution of crossword puz- solving task. We also get very promising results in zles. the generation task according to human judges. From another perspective, there are a few projects 847 Offline Learning Phrase-Radical Alignment Alignment Riddle/Solution Pairs Rule Table and Rule Learning Table Riddle Solving Solution Solution Candidate Solution Riddle Description Ranking Generation Riddle Generation Riddle Riddle Candidate Solution (Chinese Riddle Description Ranking Generation Character) Figure 3: The pipeline of offline learning, riddle solving and riddle generation on Chinese language cultures, such as the couplet scribe the simple metaphors, e.g. “千 里” aligns generation and the poem generation. A statistical “马”, which aligns the phrase and the radical by the machine translation (SMT) framework is proposed meaning. We employ a statistical framework with to generate Chinese couplets and classic Chinese po- a word alignment algorithm to automatically mine etry (He et al., 2012; Zhou et al., 2009; Jiang and phrase-radical metaphors from riddle dataset. Con- Zhou, 2008). Jiang and Zhou (2008) use a phrase- sidering the alignment is often represented as the based SMT model with linguistic filters to generate matching between successive words in the riddle and Chinese couplets satisfied couplet constraints, using a radical in the solution, we propose two methods both human judgments and BLEU scores as the eval- specifically to extract alignments. The first method uation. Zhou et al. (2009) use the SMT model to in according with (Och and Ney, 2003) is described generate quatrain with a human evaluation. He et al. as follows. With a riddle description q and corre- (2012) generate Chinese poems with the given topic sponding solution s, we tokenize the input riddle words by combining a statistical machine translation q to character as (w1, w2, . , wn) and decompose model with an ancient poetic phrase taxonomy. Fol- the solution s into radicals as (r1, r2, . , rm). We lowing the approaches in SMT framework, it is valid count all ([w , w ], r )(i, j [1, n], k [1, m]) i j k ∈ ∈ to regard the metaphors with its radicals as the align- as alignments. The second method takes into ac- ments.