Analyzing Sentiment in Classical Chinese Poetry
Yufang Hou and Anette Frank
Institute for Computational Linguistics, Heidelberg University, Germany
(hou|frank)@cl.uni-heidelberg.de

Abstract
Although sentiment analysis in Chinese social media has attracted a lot of interest in recent years, it has been less explored in traditional Chinese literature (e.g., classical Chinese poetry) due to the lack of sentiment lexicon resources. In this paper, we propose a weakly supervised approach based on Weighted Personalized PageRank (WPPR) to create a sentiment lexicon for classical Chinese poetry. We evaluate our lexicon intrinsically and extrinsically. We show that our graph-based approach outperforms a well-known PMI-based approach (Turney and Littman, 2003) in both evaluation settings. On the basis of our sentiment lexicon, we analyze sentiment in the Complete Anthology of Tang Poetry. We extract topics associated with positive (negative) sentiment using a position-aware sentiment-topic model. We further compare sentiment among different poets of the Tang Dynasty (AD 618 – 907).

1 Introduction
Classical Chinese poetry is a precious cultural heritage with a history of over 3,000 years, and the Tang Dynasty (AD 618 – 907) is widely viewed as its zenith. The Complete Anthology of Tang Poetry, edited during the Qing Dynasty (1644 – 1911), contains over 42,860 poems in 900 volumes by more than 2,500 poets. The collection offers a magnificent insight into all aspects of social life in that period.

Research on sentiment/emotion and imagery analysis of Tang poetry is an active subfield in Chinese philology, with a vast literature (Watson, 1971; Kao and Mei, 1971; Kao and Mei, 1978). In this paper, we seek to analyze the sentiment (i.e., positive or negative) of textual elements in Tang poetry from a computational perspective. Specifically, we propose a novel graph-based method to create a sentiment lexicon for classical Chinese poetry. Such a lexicon is a valuable resource for other computational research on classical Chinese poetry, such as semantic analysis (Lee and Tak-sum, 2012) or poetry generation (He et al., 2012; Zhang and Lapata, 2014).

Turney and Littman (2003) propose a PMI-based algorithm to estimate the semantic orientation, or polarity, of a word. The semantic orientation of a given word is calculated by comparing its similarity to positive reference words (e.g., excellent or beautiful) with its similarity to negative reference words (e.g., poor or bad). Instead of calculating the similarity between a given word and each positive (negative) reference word separately, we apply Weighted Personalized PageRank (WPPR) to measure the similarity between the given word and all positive (negative) reference words simultaneously in a lexical network that we build from a poetry corpus. Because the lexical network is analyzed as a whole, our graph-based method is able to find a globally optimal solution (Section 3).

We evaluate our poetry sentiment lexicon intrinsically and extrinsically. For the intrinsic evaluation, we compile two test datasets. The first dataset contains 933 words (532 positive and 401 negative) taken from three Chinese sentiment lexicons¹. The second dataset contains 55 words taken from the literature on imagery analysis of Tang poetry. These words reflect common imagery in classical Chinese poetry and carry fixed emotional connotations. For instance, the character 猿 (ape) often relates to sadness, anxiety and distress, while the character for "lotus" is the symbol of beauty, love and rectitude. We show that our method outperforms the very competitive PMI-based approach on both datasets (Section 4.1). Our method also outperforms this baseline on an extrinsic evaluation task: predicting the sentiment orientation of classical Chinese poetry (Section 4.2).

¹ Although these lexicons are for contemporary Chinese, some words keep the same meaning and polarity in classical Chinese poetry.

On the basis of our sentiment lexicon, we analyze sentiment in the Complete Anthology of Tang Poetry. We first analyze topic distributions under positive/negative sentiment in Tang poetry using a position-aware sentiment-topic model (Section 5.1). We then compare sentiment among different poets of the Tang Dynasty (Section 5.2).

The main contributions of our work are:

• We propose a graph-based method to build a sentiment lexicon for classical Chinese poetry. Our method is weakly supervised and does not rely on existing lexical resources (e.g., WordNet). It can be easily ported to other domains/languages.

• We evaluate our sentiment lexicon systematically and demonstrate that it can be used to analyze the sentiment orientation of classical Chinese poetry.

• We analyze sentiment in Tang poetry on the basis of our sentiment lexicon. We apply a position-aware sentiment-topic model to extract themes that are tightly associated with positive/negative sentiment. The model builds in specific assumptions that characterize sentiment expression in classical Chinese poetry: it assumes that lexical items from the same region are generated from a single sentiment-topic pair. We compare sentiment among different famous poets and show that our results are in accordance with studies in Chinese philology.

The poetry sentiment lexicon described in this paper and all test datasets are freely available at http://www.cl.uni-heidelberg.de/~hou/resources.mhtml.

2 Related Work
Sentiment lexicons. In recent years, considerable attention has been given to the creation of large polarity (positive and negative) lexicons, including various corpus-based approaches (Turney and Littman, 2003; Kanayama and Nasukawa, 2006; Kaji and Kitsuregawa, 2007; Kiritchenko et al., 2014) and dictionary-based approaches (Kamps et al., 2004; Esuli and Sebastiani, 2005; Mohammad et al., 2009; Baccianella et al., 2010). Previous graph-based approaches create sentiment lexicons from existing lexical resources such as WordNet or thesauri (Takamura et al., 2005; Rao and Ravichandran, 2009; Hassan et al., 2011), but no such lexical resources exist for classical Chinese poetry. We therefore choose a corpus-based approach.

While our approach for building sentiment lexicons is domain independent, in this paper we apply it to classical Chinese poetry. This is not a trivial task. A variety of reliable resources exist for English sentiment analysis, but only a few sentiment lexicons are available for Chinese, and these were developed for contemporary Chinese. They therefore only partially cover the vocabulary of classical Chinese poetry, and there may also be divergences in meaning due to language change over several thousand years.

To improve sentiment analysis for Chinese, one line of work seeks to leverage rich English sentiment resources through machine translation (Wan, 2008; Wan, 2009; He et al., 2010). These approaches depend on the quality of machine translation, and translating classical Chinese poetry into English is hard even for professional translators. Our work is similar to Zagibalov and Carroll (2008) in that both approaches are weakly supervised. They build a sentiment lexicon iteratively, starting from a small set of seed items and several lexical patterns (negated adverbial constructions) that can indicate lexical polarity. However, such lexical patterns (e.g., 不 (not) 很 (quite) + 满意 (satisfied) (target word)) are not applicable to classical Chinese poetry.

Computational analysis of classical Chinese poetry. Previous work has focused on classical Chinese poetry generation (Zhou et al., 2010; He et al., 2012; Zhang and Lapata, 2014). Lee and Kong (2012) develop a dependency treebank for the Complete Anthology of Tang Poetry. On the basis of this corpus, Lee and Tak-sum (2012) quantitatively analyze the semantic content and word usage in the Complete Anthology of Tang Poetry. By comparing classical poetry and contemporary prose, Voigt and Jurafsky (2013) find that the use of classical characters in Chinese poetry decreased across the century.

Only a few works have tried to analyze sentiment in classical Chinese poetry. Hu (2001) proposes "similarity search" using word association measures. For instance, given typical emotional words meaning "sadness" or "sorrow" (e.g., 哀), the system can find words associated with sad emotions, such as 南浦 (southern shore, a place often used to hold farewell parties in ancient China). However, he does not analyze sentiment in classical Chinese poetry quantitatively. Based on manually annotated data, Luo (2009) analyzes the sentiment of classical Chinese Song poetry across different poets. To the best of our knowledge, there is no publicly available sentiment lexicon for classical Chinese poetry.

3 Building a Sentiment Lexicon for Classical Chinese Poetry
In this section, we briefly introduce Weighted Personalized PageRank (WPPR).

… PageRank vector $R$ over $G$ can be calculated as follows:

$$R = \alpha M R + (1 - \alpha) P \qquad (1)$$

where $\alpha$ is the damping factor, whose value is usually set in the range [0.85, 0.95]. $P$ is an $N \times 1$ vector with $P_i = 1/|S|$ for $v_i \in S$ and zero otherwise, i.e., all vertices in the set of sentiment seeds $S$ have equal prior probability.

Equation 1 can be viewed as the result of a random walk process starting from the seed nodes, where at each step the random walker jumps back to the seed nodes $S$ with probability $1 - \alpha$. The final rank of vertex $v_i$, biased towards the set $S$ (the bias is encoded in $P$), represents the probability that a random walk over the weighted graph (edge weights are encoded in $M$) ends on vertex $v_i$ after a sufficiently large number of steps.

3.2 Lexical Network Construction
To create a sentiment lexicon for classical Chinese