Word Sense and Subjectivity
Janyce Wiebe
Department of Computer Science
University of Pittsburgh
[email protected]

Rada Mihalcea
Department of Computer Science
University of North Texas
[email protected]

Abstract

Subjectivity and meaning are both important properties of language. This paper explores their interaction, and brings empirical evidence in support of the hypotheses that (1) subjectivity is a property that can be associated with word senses, and (2) word sense disambiguation can directly benefit from subjectivity annotations.

1 Introduction

There is growing interest in the automatic extraction of opinions, emotions, and sentiments in text (subjectivity), to provide tools and support for various NLP applications. Similarly, there is continuous interest in the task of word sense disambiguation, with sense-annotated resources being developed for many languages, and a growing number of research groups participating in large-scale evaluations such as SENSEVAL.

Though both of these areas are concerned with the semantics of a text, over time there has been little interaction, if any, between them. In this paper, we address this gap, and explore possible interactions between subjectivity and word sense.

There are several benefits that would motivate such a joint exploration. First, at the resource level, the augmentation of lexical resources such as WordNet (Miller, 1995) with subjectivity labels could support better subjectivity analysis tools, and principled methods for refining word senses and clustering similar meanings. Second, at the tool level, an explicit link between subjectivity and word sense could help improve methods for each, by integrating features learned from one into the other in a pipeline approach, or through joint simultaneous learning.

In this paper we address two questions about word sense and subjectivity. First, can subjectivity labels be assigned to word senses? To address this question, we perform two studies. The first (Section 3) investigates agreement between annotators who manually assign the labels subjective, objective, or both to WordNet senses. The second study (Section 4) evaluates a method for automatic assignment of subjectivity labels to word senses. We devise an algorithm relying on distributionally similar words to calculate a subjectivity score, and show how it can be used to automatically assess the subjectivity of a word sense.

Second, can automatic subjectivity analysis be used to improve word sense disambiguation? To address this question, the output of a subjectivity sentence classifier is input to a word-sense disambiguation system, which is in turn evaluated on the nouns from the SENSEVAL-3 English lexical sample task (Section 5). The results of this experiment show that a subjectivity feature can significantly improve the accuracy of a word sense disambiguation system for those words that have both subjective and objective senses.

A third obvious question is, can word sense disambiguation help automatic subjectivity analysis? However, due to space limitations, we do not address this question here, but rather leave it for future work.

2 Background

Subjective expressions are words and phrases being used to express opinions, emotions, evaluations, speculations, etc. (Wiebe et al., 2005). A general covering term for such states is private state, “a state that is not open to objective observation or verification” (Quirk et al., 1985).¹ There are three main types of subjective expressions:²

(1) references to private states:

    His alarm grew.
    He absorbed the information quickly.
    He was boiling with anger.

(2) references to speech (or writing) events expressing private states:

    UCC/Disciples leaders roundly condemned the Iranian President’s verbal assault on Israel.
    The editors of the left-leaning paper attacked the new House Speaker.

(3) expressive subjective elements:

    He would be quite a catch.
    What’s the catch?
    That doctor is a quack.

¹ Note that sentiment, the focus of much recent work in the area, is a type of subjectivity, specifically involving positive or negative opinion, emotion, or evaluation.
² These distinctions are not strictly needed for this paper, but may help the reader appreciate the examples given below.

Work on automatic subjectivity analysis falls into three main areas. The first is identifying words and phrases that are associated with subjectivity, for example, that think is associated with private states and that beautiful is associated with positive sentiments (e.g., (Hatzivassiloglou and McKeown, 1997; Wiebe, 2000; Kamps and Marx, 2002; Turney, 2002; Esuli and Sebastiani, 2005)). Such judgments are made for words. In contrast, our end task (in Section 4) is to assign subjectivity labels to word senses.

The second is subjectivity classification of sentences, clauses, phrases, or word instances in the context of a particular text or conversation, either subjective/objective classifications or positive/negative sentiment classifications (e.g., (Riloff and Wiebe, 2003; Yu and Hatzivassiloglou, 2003; Dave et al., 2003; Hu and Liu, 2004)).

The third exploits automatic subjectivity analysis in applications such as review classification (e.g., (Turney, 2002; Pang and Lee, 2004)), mining texts for product reviews (e.g., (Yi et al., 2003; Hu and Liu, 2004; Popescu and Etzioni, 2005)), summarization (e.g., (Kim and Hovy, 2004)), information extraction (e.g., (Riloff et al., 2005)), and question answering (e.g., (Yu and Hatzivassiloglou, 2003; Stoyanov et al., 2005)).

Most manual subjectivity annotation research has focused on annotating words, out of context (e.g., (Heise, 2001)), or sentences and phrases in the context of a text or conversation (e.g., (Wiebe et al., 2005)). The new annotations in this paper are instead targeting the annotation of word senses.

3 Human Judgment of Word Sense Subjectivity

To explore our hypothesis that subjectivity may be associated with word senses, we developed a manual annotation scheme for assigning subjectivity labels to WordNet senses,³ and performed an inter-annotator agreement study to assess its reliability. Senses are classified as S(ubjective), O(bjective), or B(oth). Classifying a sense as S means that, when the sense is used in a text or conversation, we expect it to express subjectivity; we also expect the phrase or sentence containing it to be subjective.

³ All our examples and data used in the experiments are from WordNet 2.0.

We saw a number of subjective expressions in Section 2. A subset is repeated here, along with relevant WordNet senses. In the display of each sense, the first part shows the synset, gloss, and any examples. The second part (marked with =>) shows the immediate hypernym.

His alarm grew.
    alarm, dismay, consternation – (fear resulting from the awareness of danger)
    => fear, fearfulness, fright – (an emotion experienced in anticipation of some specific pain or danger (usually accompanied by a desire to flee or fight))

He was boiling with anger.
    seethe, boil – (be in an agitated emotional state; “The customer was seething with anger”)
    => be – (have the quality of being; (copula, used with an adjective or a predicate noun); “John is rich”; “This is not a good answer”)

What’s the catch?
    catch – (a hidden drawback; “it sounds good but what’s the catch?”)
    => drawback – (the quality of being a hindrance; “he pointed out all the drawbacks to my plan”)

That doctor is a quack.
    quack – (an untrained person who pretends to be a physician and who dispenses medical advice)
    => doctor, doc, physician, MD, Dr., medico

Before specifying what we mean by an objective sense, we give examples.

The alarm went off.
    alarm, warning device, alarm system – (a device that signals the occurrence of some undesirable event)
    => device – (an instrumentality invented for a particular purpose; “the device is small enough to wear on your wrist”; “a device intended to conserve water”)

The water boiled.
    boil – (come to the boiling point and change from a liquid to vapor; “Water boils at 100 degrees Celsius”)
    => change state, turn – (undergo a transformation or a change of position or action; “We turned from Socialism to Capitalism”; “The people turned against the President when he stole the election”)

He sold his catch at the market.
    catch, haul – (the quantity that was caught; “the catch was only 10 fish”)
    => indefinite quantity – (an estimated quantity)

The duck’s quack was loud and brief.
    quack – (the harsh sound of a duck)
    => sound – (the sudden occurrence of an audible event; “the sound awakened them”)

While we expect phrases or sentences containing subjective senses to be subjective, we do not necessarily expect phrases or sentences containing objective senses to be objective. Consider the following examples:

    Will someone shut that damn alarm off?
    Can’t you even boil water?

While these sentences contain objective senses [...]

[...] have both S and O senses and 16 do not (according to Judge 1). Among the 16 that do not have both S and O senses, 8 have only S senses and 8 have only O senses. All of the subsets are balanced between nouns and verbs. Table 1 shows the contingency table for the two annotators’ judgments on this data. In addition to S, O, and B, the annotation scheme also permits U(ncertain) tags.

            S     O     B     U   Total
    S      39     0     0     4      43
    O       3    73     2     4      82
    B       1     0     3     1       5
    U       3     2     0     3       8
    Total  46    75     5    12     138

Table 1: Agreement on balanced set (Agreement: 85.5%, κ: 0.74)

Overall agreement is 85.5%, with a Kappa (κ) value of 0.74. For 12.3% of the senses, at least one annotator’s tag is U. If we consider these cases to be borderline and exclude them from the study, percent agreement increases to 95% and κ rises to 0.90. Thus, annotator agreement is especially high when both are certain.

Considering only the 16-word subset with both S and O senses (according to Judge 1), κ is .75, and for the 16-word subset for which Judge 1 gave only S or only O senses, κ is .73.
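
The agreement figures reported for Table 1 can be recomputed from its cell counts. The sketch below is illustrative rather than the authors' own procedure: it assumes the reported κ is Cohen's kappa for two annotators, which the text does not state explicitly, and it assumes rows belong to one annotator and columns to the other (the result is the same either way). The names `TABLE_1`, `agreement_and_kappa`, and `drop_label` are hypothetical helpers introduced here.

```python
# Counts transcribed from Table 1; label order is S, O, B, U.
# Assumption: rows are one annotator's tags, columns the other's.
LABELS = ["S", "O", "B", "U"]
TABLE_1 = [
    [39, 0, 0, 4],
    [3, 73, 2, 4],
    [1, 0, 3, 1],
    [3, 2, 0, 3],
]


def agreement_and_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square contingency table."""
    total = sum(sum(row) for row in table)
    # Observed agreement: share of senses on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / total
    # Chance agreement: product of the two annotators' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


def drop_label(table, labels, label):
    """Remove every sense that either annotator tagged with `label`."""
    keep = [i for i, name in enumerate(labels) if name != label]
    return [[table[i][j] for j in keep] for i in keep]


all_agree, all_kappa = agreement_and_kappa(TABLE_1)
certain = drop_label(TABLE_1, LABELS, "U")
cert_agree, cert_kappa = agreement_and_kappa(certain)

print(f"all senses: agreement={all_agree:.1%}, kappa={all_kappa:.2f}")
print(f"without U:  agreement={cert_agree:.1%}, kappa={cert_kappa:.2f}")
# all senses: agreement=85.5%, kappa=0.74
# without U:  agreement=95.0%, kappa=0.90
```

Under these assumptions the counts reproduce the reported values. Dropping the U row and column leaves 121 of the 138 senses, so 17 senses (12.3%) carry at least one U tag, which matches the proportion given in the text.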