Construction of a Chinese Idiom Knowledge Base and Its Applications

Construction of a Chinese Idiom Knowledge Base and Its Applications Lei Wang Shiwen Yu Key Laboratory of Computational Key Laboratory of Computational Linguistics of Ministry of Education Linguistics of Ministry of Education, Department of English, Peking University Peking University [email protected] [email protected] Abstract words, it is the distinctive form or construction of a particular language that has a unique form Idioms are not only interesting but also or style characteristic only of that language. An distinctive in a language for its continuity idiom is also used, in most cases, with some and metaphorical meaning in its context. intention of the writer or to express certain This paper introduces the construction of emotion or attitude. Thus in nature, idioms are a Chinese idiom knowledge base by the exaggerative and descriptive and do not belong Institute of Computational Linguistics at to the plain type. Peking University and describes an Therefore, to classify idioms according to experiment that aims at the automatic its emotional property or descriptive property is emotion classification of Chinese idioms. important for many practical applications. In In the process, we expect to know more recent years, emotion classification has become about how the constituents in a fossilized a very popular task in the area of Natural composition like an idiom function so as Language Processing (NLP), which tries to to affect its semantic or grammatical predict sentiment (opinion, emotion, etc.) from properties. As an important Chinese texts. Most research has focused on subjectivity language resource, our idiom knowledge (subjective/objective) or polarity base will play a major role in applications (positive/neutral/negative) classification. The such as linguistic research, teaching applications with this respect include human or Chinese as a foreign language and even as machine translation, automatic text a tool for preserving this non-material classification or Teaching Chinese as a Foreign Chinese cultural and historical heritage. Language (TCFL). For example, when a student learning Chinese as a foreign language encounters an idiom in his or her reading or 1 Introduction conversation, for better understanding it is important for him or her to know whether the An idiom is a multi-word expression that has a idiom is used to indicate an appreciative or figurative meaning that is comprehended in derogatory sense which is very crucial to regard to a common use of that expression that understand the attitude of the idiom user. is separate from the literal meaning or definition Another example is that long articles about of the words of which it is made (McArthur, politics in newspapers often include a lot of 1992). From a linguistic perspective, idioms are idiom usage to boost their expressiveness and usually presumed to be figures of speech that these idioms may carry emotional information. are contradictory to the principle of Obviously by knowing the emotional compositionality. The words that construct an inclination we may easily obtain a clue about idiom no longer keep their original meaning or the general attitude of the particular medium. popular sense, while in the process of its We may even be able to detect or monitor formation it develops a specialized meaning as automatically the possible hostile attitude from an entity whose sense is different from the certain electronic media which today provide so literal meanings of the constituent elements. huge amount of information that seems hard for Although an idiom is an expression not human processing on a daily basis. readily analyzable from its grammatical The rest of this paper is organized as structure or from the meaning of its component follows. Section 2 describes the construction of 10 Proceedings of the Workshop on Multiword Expressions: from Theory to Applications (MWE 2010), pages 10–17, Beijing, August 2010 a Chinese Idiom Knowledge Base (CIKB) and carried by the few characters, as Chinese idioms introduces its several applications so far. are often closely related with the fable, story or Section 3 concludes the related work that serves historical account from which they were as the basis of the building of CIKB and the originally born. As their constructs remain emotion classification experiment introduced in stable through history, Chinese idioms do not this paper. Section 4 describes the classification follow the usual lexical pattern and syntax of method, feature settings, the process of emotion modern Chinese language which has been classification and the analysis of the result. reformed many a time. They are instead highly Section 5 includes conclusions and our future compact and resemble more ancient Chinese work. language in many linguistic features. Usually a Chinese idiom reflects the moral 2 Chinese Idioms and Chinese Idiom behind the story that it is derived. (Lo, 1997) Knowledge Base For example, the idiom ―破釜沉舟‖ (pò fǔ chén zhōu) literally means ―smash the cauldrons and sink the boats.‖ It was based on a historical Generally an idiom is a metaphor — a term story where General Xiang Yu in Qin Dynasty requiring some background knowledge, (221 B. C. – 207 B. C.) ordered his army to contextual information, or cultural experience, destroy all cooking utensils and boats after they mostly to use only within a particular language, crossed a river into the enemy’s territory. He where conversational parties must possess and his men won the battle for their ―life or common cultural references. Therefore, idioms death‖ courage and ―no-retreat‖ policy. are not considered part of the language, but part Although there are similar phrases in English, of a nation’s history, society or culture. As such as ―burning bridges‖ or ―crossing the culture typically is localized, idioms often can Rubicon‖, this particular idiom cannot be used only be understood within the same cultural in a losing scenario because the story behind it background; nevertheless, this is not a definite does not indicate a failure. Another typical rule because some idioms can overcome cultural barriers and easily be translated across example is the idiom ―瓜田李下‖ (guā tián lǐ languages, and their metaphoric meanings can xià) which literally means ―melon field, under still be deduced. Contrary to common the plum trees‖. Metaphorically it implies a knowledge that language is a living thing, suspicious situation. Derived from a verse idioms do not readily change as time passes. called 《君子行》 (jūn zǐ xíng, meaning ―A Some idioms gain and lose favor in popular Gentleman’s Journey‖) from Eastern Han literature or speeches, but they rarely have any Dynasty (A. D. 25 – A. D. 220), the idiom is actual shift in their constructs as long as they do originated from two lines of the poem ―瓜田不 not become extinct. In real life, people also 纳履，李下不整冠‖ (guā tián bù nà lǚ, lǐ xià bù have a natural tendency to over exaggerate what zhěng guān) which describe a code of conduct they mean or over describe what they see or for a gentleman that says ―Don't adjust your hear sometimes and this gives birth to new shoes in a melon field and don’t tidy your hat idioms by accident. under plum trees‖ in order to avoid suspicion of Most Chinese idioms (成语: chéng1 yǔ, stealing. However, most Chinese idioms do not literally meaning ―set phrases‖) are derived possess an allusion nature and are just a from ancient literature, especially Chinese combination of morphemes that will give this classics, and are widely used in written Chinese set phrase phonetic, semantic or formal texts. Some idioms appear in spoken or expressiveness. For example, the idiom ―欢天 vernacular Chinese. The majority of Chinese 喜地‖ (huān tiān xǐ dì, metaphorically meaning idioms consist of four characters, but some have ―be highly delighted‖) literally means ―happy fewer or more. The meaning of an idiom heaven and joyful earth‖; or in the idiom ―锒铛 usually surpasses the sum of the meanings 入狱‖ (láng dāng rù yù, meaning ―be thrown into the jail‖), the word ―锒铛‖ is just the sound 1 The marks on the letters in a Pinyin are for the five tones of a prisoner’s fetters. of Chinese characters. 11 For the importance of idioms in Chinese literal translation of an idiom will not reflect its language and culture, an idiom bank with about metaphorical meaning generally, it will still be 6,790 entries were included in the most of value to those who expect to get familiar influential Chinese language knowledge base – with the constituent characters and may want to the Grammatical Knowledge base of connect its literal meaning with its metaphorical Contemporary Chinese (GKB) completed by meaning, especially for those learners of the Institute of Computational Linguistics at Chinese as a foreign language. Peking University (ICL), which has been working on language resources for over 20 years and building many knowledge bases on Chinese language. Based on that, the Chinese Core 3,000 Idiom Knowledge Base (CIKB) had been constructed from the year 2004 to 2009 and Basic 11,000 collects more than 38, 000 idioms with more semantic and pragmatic properties added. Basically the properties of each entry in CIKB 38,117 CIKB can be classified into four categories: lexical, semantic, syntactic and pragmatic, each of which also includes several fields in its container -- the SQL database. Table 1 shows Figure 1. The hierarchical structure of CIKB. the details about the fields. The idioms are classified into three grades Categories Properties in terms of appearance in texts and complexity Lexical idiom, Pinyin2, full Pinyin3, of annotation. The most commonly used 3,000 bianti4, explanation, origin idioms serve as the core idioms based on the Semantic synonym, antonym, literal statistics obtained from the corpus of People’s translation, free translation, Daily (the year of 1998), a newspaper that has English equivalent the largest circulation in China.

Construction of a Chinese Idiom Knowledge Base and Its Applications

The Fatigue Properties and Damage of the Corroded Steel Bars Under the Constant-Amplitude Fatigue Load

Is Shuma the Chinese Analog of Soma/Haoma? a Study of Early Contacts Between Indo-Iranians and Chinese

Last Name First Name/Middle Name Course Award Course 2 Award 2 Graduation

Dual-Band Plasmonic Perfect Absorber Based on the Hybrid Halide Perovskite in the Communication Regime

Achieving Wide-Band Linear-To-Circular Polarization Conversion Using Ultra-Thin Bi-Layered Metasurfaces

Immunogenicity Evaluation of Inactivated Virus and Purified Proteins of Porcine Circovirus Type 2 in Mice

Self-Study Syllabus on Chinese Foreign Policy

Chinese Language and Characters

Current Status of the Chinese National Twin Registry

Origin Narratives: Reading and Reverence in Late-Ming China

A Hypothesis on the Origin of the Yu State

Recent Publications