Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

Cecilia Ovesdotter Alm
Department of English
College of Liberal Arts
Rochester Institute of Technology
[email protected]

Abstract

This opinion paper discusses subjective natural language problems in terms of their motivations, applications, characterizations, and implications. It argues that such problems deserve increased attention because of their potential to challenge the status of theoretical understanding, problem-solving methods, and evaluation techniques in computational linguistics. The author supports a more holistic approach to such problems; a view that extends beyond opinion mining or sentiment analysis.

1 Introduction

Interest in subjective meaning and individual, interpersonal or social, poetic/creative, and affective dimensions of language is not new to linguistics or computational approaches to language. Language analysts, including computational linguists, have long acknowledged the importance of such topics (Bühler, 1934; Lyons, 1977; Jakobson, 1996; Halliday, 1996; Wiebe et al, 2004; Wilson et al, 2005). In computational linguistics and natural language processing (NLP), current efforts on subjective natural language problems are concentrated on the vibrant field of opinion mining and sentiment analysis (Liu, 2010; Täckström, 2009), and ACL-HLT 2011 lists Sentiment Analysis, Opinion Mining and Text Classification as a subject area. The terms subjectivity or subjectivity analysis are also established in the NLP literature to cover these topics of growing inquiry.

The purpose of this opinion paper is not to provide a survey of subjective natural language problems. Rather, it intends to launch discussions about how subjective natural language problems have a vital role to play in computational linguistics and in shaping fundamental questions in the field for the future. An additional point of departure is that a continuing focus on primarily the fundamental distinction of facts vs. opinions (implicitly, denotative vs. connotative meaning) is, alas, somewhat limiting. An expanded scope of problem types will benefit our understanding of subjective language and approaches to tackling this family of problems.

It is definitely reasonable to assume that problems involving subjective perception, meaning, and language behaviors will diversify and earn increased attention from computational approaches to language. Banea et al already noted: “We have seen a surge in interest towards the application of automatic tools and techniques for the extraction of opinions, emotions, and sentiments in text (subjectivity)” (p. 127) (Banea et al, 2008). Therefore, it is timely and useful to examine subjective natural language problems from different angles. The following account is an attempt in this direction. The first angle that the paper comments upon is what motivates investigatory efforts into such problems. Next, the paper clarifies what subjective natural language processing problems are by providing a few illustrative examples of some relevant problem-solving and application areas. This is followed by discussing yet another angle of this family of problems, namely what some of their characteristics are. Finally, potential implications for the field of computational linguistics at large are addressed, with the hope that this short piece will spawn continued discussion.


Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: short papers, pages 107–112, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics

2 Motivations

The types of problems under discussion here are fundamental language tasks, processes, and phenomena that mirror and play important roles in people’s daily social, interactional, or affective lives. Subjective natural language processing problems represent exciting frontier areas that directly relate to advances in artificial natural language behavior, improved intelligent access to information, and more agreeable and comfortable language-based human-computer interaction. As just one example, interactional systems continue to suffer from a bias toward ‘neutral’, unexpressive (and thus communicatively cumbersome) language.

From a practical, application-oriented point of view, dedicating more resources and efforts to subjective natural language problems is a natural step, given the wealth of available written, spoken or multimodal texts and information associated with creativity, socializing, and subtle interpretation. From a conceptual and methodological perspective, automatic subjective text analysis approaches have potential to challenge the state of theoretical understanding, problem-solving methods, and evaluation techniques. The discussion will return to this point in section 5.

3 Applications

Subjective natural language problems extend well beyond sentiment and opinion analysis. They involve a myriad of topics–from linguistic creativity via inference-based forecasting to generation of social and affective language use. For the sake of illustration, four such cases are presented below (bearing in mind that the list is open-ended).

3.1 Case 1: Modeling affect in language

A range of affective computing applications apply to language (Picard, 1997). One such area is automatically inferring affect in text. Work on automatic affect inference from language data has generally involved recognition or generation models that contrast a range of affective states either along affect categories (e.g. angry, happy, surprised, neutral, etc.) or dimensions (e.g. arousal and pleasantness). As one example, Alm developed an affect dataset and explored automatic prediction of affect in text at the sentence level that accounted for different levels of affective granularity (Alm, 2008; Alm, 2009; Alm, 2010). There are other examples of the strong interest in affective NLP or affective interfacing (Liu et al, 2003; Holzman and Pottenger, 2003; Francisco and Gervás, 2006; Kalra and Karahalios, 2005; Généreux and Evans, 2006; Mihalcea and Liu, 2006). Affective semantics is difficult for many automatic techniques to capture because rather than simple text-derived ‘surface’ features, it requires sophisticated, ‘deep’ natural language understanding that draws on subjective human knowledge, interpretation, and experience. At the same time, approaches that accumulate knowledge bases face issues such as the artificiality and limitations of trying to enumerate rather than perceive and experience human understanding.

3.2 Case 2: Image sense discrimination

Image sense discrimination refers to the problem of determining which images belong together (or not) (Loeff et al, 2006; Forsyth et al, 2009). What counts as the sense of an image adds subjective complexity. For instance, images capture “both word and iconographic sense distinctions ... CRANE can refer to, e.g. a MACHINE or a BIRD; iconographic distinctions could additionally include birds standing, vs. in a marsh land, or flying, i.e. sense distinctions encoded by further descriptive modification in text.” (p. 547) (Loeff et al, 2006). In other words, images can evoke a range of subtle, subjective meaning phenomena. Challenges for annotating images according to lexical meaning (and the use of verification as one way to assess annotation quality) have been discussed in depth, cf. (Alm et al, 2006).

3.3 Case 3: Multilingual communication

The world is multilingual and so are many human language technology users. Multilingual applications have strong potential to grow. Arguably, future generations of users will increasingly demand tools capable of effective multilingual tasking, communication and inference-making (besides expecting adjustments to non-native and cross-linguistic behaviors). The challenges of code-mixing include dynamically adapting sociolinguistic forms and functions, and they involve both flexible, subjective sense-making and perspective-taking.

3.4 Case 4: Individualized iCALL

A challenging problem area of general interest is language learning. State-of-the-art intelligent computer-assisted language learning (iCALL) approaches generally bundle language learners into a homogeneous group. However, learners are individuals exhibiting a vast range of differences. The subjective aspects here are at another level than meaning. Language learners apply personalized strategies to acquisition, and they have a myriad of individual communicative needs, motivations, backgrounds, and learning goals. A framework that recognizes subjectivity in iCALL might exploit such differences to create tailored acquisition flows that address learning curves and proficiency enhancement in an individualized manner. Countering boredom can be an additional positive side-effect of such approaches.

4 Characterizations

It must be acknowledged that a problem such as inferring affective meaning from text is a substantially different kind of ‘beast’ compared to predicting, for example, part-of-speech tags.¹ Identifying such problems and tackling their solutions is also becoming increasingly desirable with the boom of personalized, user-generated content. It is a useful intellectual exercise to consider what the general characteristics of this family of problems are. This initial discussion is likely not complete, nor is completeness the scope of this piece. The following list is rather intended as a set of departure points to spark discussion.

¹ No offense intended to POS tagger developers.

• Non-traditional intersubjectivity: Subjective natural language processing problems are generally problems of meaning or communication where so-called intersubjective agreement does not apply in the same way as in traditional tasks.

• Theory gaps: A particular challenge is that subjective language phenomena are often less understood by current theory. As an example, in the affective sciences there is a vibrant debate–indeed a controversy–on how to model or even define a concept such as emotion.

• Variation in human behavior: Humans often vary in their assessments of these language behaviors. The variability could reflect, for example, individual preferences and perceptual differences, and the fact that humans adapt, readjust, or change their mind according to situation details. Humans (e.g. dataset annotators) may be sensitive to sensory demands, cognitive fatigue, and external factors that affect judgements made at a particular place and point in time. Arguably, this behavioral variation is part of the given subjective language problem.

• Absence of real ‘ground truth’? For such problems, acceptability may be a more useful concept than ‘right’ and ‘wrong’. A particular solution may be acceptable/unacceptable rather than accurate/erroneous, and there may be more than one acceptable solution. (Recognizing this does not exclude that acceptability may in clear, prototypical cases converge on just one solution, but this scenario may not apply to a majority of instances.) This central characteristic is, conceptually, at odds with interannotator agreement ‘targets’ and standard performance measures, potentially creating an abstraction gap to be filled. If we recognize that (ground) truth is, under some circumstances, a less useful concept–a problem reduction and simplification that is undesirable because it does not reflect the behavior of language users–how should evaluation then be approached with rigor?

• Social/interpersonal focus: Many problems in this family concern inference (or generation) of complex, subtle dimensions of meaning and information, informed by experience or socioculturally influenced language use in real-situation contexts (including human-computer interaction). They tend to tie into sociolinguistic and interactional insights on language (Mesthrie et al, 2009).

• Multimodality and interdisciplinarity: Many of these problems have an interactive and humanistic basis. Multimodal inference is arguably also of importance. For example, written web texts are accompanied by visual matter (‘texts’), such as images, videos, and text aesthetics (font choices, etc.). As another example, speech is accompanied by biophysical cues, visible gestures, and other perceivable indicators.

It must be recognized that, as one would expect, one cannot ‘neatly’ separate out problems of this type, but core characteristics such as non-traditional intersubjectivity, variation in human behavior, and recognition of absence of real ‘ground truth’ may be quite useful to understand and appropriately model problems, methods, and evaluation techniques.

5 Implications

The cases discussed above in section 3 are just selections from the broad range of topics involving aspects of subjectivity, but at least they provide glimpses at what can be done in this area. The list could be expanded to problems intersecting with the digital humanities, healthcare, economics or finance, and political science, but such discussions go beyond the scope of this paper. Instead, the last item on this agenda concerns the broader, disciplinary implications that subjective natural language problems raise.

• Evaluation: If the concept of “ground truth” needs to be reassessed for subjective natural language processing tasks, different and alternative evaluation techniques deserve careful thought. This requires openness to alternative assessment metrics (beyond precision, recall, etc.) that fit the problem type. For example, evaluating user interaction and satisfaction, as Liu et al (2003) did for an affective email client, may be relevant. Similarly, analysis of acceptability (e.g. via user or annotation verification) can be informative. MOS testing for speech and visual systems has such flavors. Measuring pejoration and amelioration effects on other NLP tasks for which standard benchmarks exist is another such route. In some contexts, other measures of quality of life improvements may help complement (or, if appropriate, substitute) standard evaluation metrics. These may include ergonomics, personal contentment, cognitive and physical load (e.g. counting task steps or load broken down into units), or safety increase and non-invasiveness (e.g. attention upgrade when performing a complex task). Combining standard metrics of system performance with alternative assessment methods may provide especially valuable holistic evaluation information.

• Dataset annotation: Studies of human annotations generally report on interannotator agreement, and many annotation schemes and efforts seek to reduce variability. That may not be appropriate (Zaenen, 2006), considering these kinds of problems (Alm, 2010). Rather, it makes sense to take advantage of corpus annotation as a resource, beyond computational work, for investigation into actual language behaviors associated with the set of problems dealt with in this paper (e.g. variability vs. trends and language–culture–domain dependence vs. independence). For example, label-internal divergence and intraannotator variation may provide useful understanding of the language phenomenon at stake; surveys, video recordings, think-alouds, or interviews may give additional insights on human (annotator) behavior. The genetic computation community has theorized concepts such as user fatigue and devised robust algorithms that integrate interactional, human input in effective ways (Llorà et al, 2005; Llorà et al, 2005). Such insights can be exploited. Reporting sociolinguistic information in datasets can be useful for many problems, assuming that it is feasible and ethical for a given context.

• Analysis of ethical risks and gains: Overall, how language and technology coalesce in society is rarely covered; but see Sproat (2010) for an important exception. More specifically, whereas ethics has been discussed within the field of affective computing (Picard, 1997), how ethics applies to language technologies remains an unexplored area. Ethical interrogations (and guidelines) are especially important as language technologies continue to be refined and migrate to new domains. Potential problematic implications of language technologies–or how disciplinary contributions affect the linguistic world–have rarely been a point of discussion. However, there are exceptions. For example, there are convincing arguments for gains that will result from an increased engagement with topics related to endangered languages and language documentation in computational linguistics (Bird, 2009), see also Abney and Bird (2010). By implication, such efforts may contribute to linguistic and cultural sustainability.

• Interdisciplinary mixing: Given that many subjective natural language problems have a humanistic and interpersonal basis, investigatory ‘mixing’ efforts that reach outside the computational linguistics community in multidisciplinary networks seem particularly pivotal. As an example, to improve assessment of subjective natural language processing tasks, lessons can be learned from the human-computer interaction and social computing communities, as well as from the digital humanities. In addition, attention to multimodality will benefit increased interaction as it demands vision or tactile specialists, etc.²

• Intellectual flexibility: Engaging with problems that challenge black and white, right vs. wrong answers, or even tractable solutions, presents opportunities for intellectual growth. These problems can constitute an opportunity for training new generations to face challenges.

² When thinking along multimodal lines, we might stand a chance at getting better at creating core models that apply successfully also to signed languages.

6 Conclusion

To conclude: there is a strong potential–or, as this paper argues, a necessity–to expand the scope of computational linguistic research into subjectivity. It is important to recognize that there is a broad family of relevant subjective natural language problems with theoretical and practical, real-world anchoring. The paper has also pointed out that there are certain aspects that deserve special attention. For instance, there are evaluation concepts in computational linguistics that, at least to some degree, detract attention away from how subjective perception and production phenomena actually manifest themselves in natural language. In encouraging a focus on efforts to achieve ‘high-performing’ systems (as measured along traditional lines), there is risk involved–the sacrificing of opportunities for fundamental insights that may lead to a more thorough understanding of language uses and users. Such insights may in fact decisively advance language science and artificial natural language intelligence.

Acknowledgments

I would like to thank anonymous reviewers and colleagues for their helpful comments.

References

Abney, Steven and Steven Bird. 2010. The Human Language Project: Building a Universal Corpus of the world’s languages. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 88-97.
Alm, Cecilia Ovesdotter. 2009. Affect in Text and Speech. VDM Verlag: Saarbrücken.
Alm, Cecilia Ovesdotter. 2010. Characteristics of high agreement affect annotation in text. Proceedings of the LAW IV workshop at the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 118-122.
Alm, Cecilia Ovesdotter. 2008. Affect Dataset. GNU Public License.
Alm, Cecilia Ovesdotter and Xavier Llorà. 2006. Evolving emotional prosody. Proceedings of INTERSPEECH 2006 - ICSLP, Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, USA, 1826-1829.
Alm, Cecilia Ovesdotter, Nicolas Loeff, and David Forsyth. 2006. Challenges for annotating images for sense disambiguation. Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora, at the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, 1-4.
Banea, Carmen, Rada Mihalcea, Janyce Wiebe, and Samer Hassan. 2008. Multilingual subjectivity analysis using machine translation. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 127-135.
Bird, Steven. 2009. Last words: Natural language processing and linguistic fieldwork. Journal of Computational Linguistics, 35 (3), 469-474.
Bühler, Karl. 1934. Sprachtheorie: Die Darstellungsfunktion der Sprache. Stuttgart: Gustav Fischer Verlag.
Forsyth, David, Tamara Berg, Cecilia Ovesdotter Alm, Ali Farhadi, Julia Hockenmaier, Nicolas Loeff, and Gang Wang. 2009. Words and pictures: categories, modifiers, depiction, and iconography. In S. J. Dickinson, et al (Eds.). Object Categorization: Computer and Human Vision Perspectives, 167-181. Cambridge: Cambridge Univ. Press.
Francisco, Virginia and Pablo Gervás. 2006. Exploring the compositionality of emotions in text: Word emotions, sentence emotions and automated tagging. AAAI-06 Workshop on Computational Aesthetics: Artificial Intelligence Approaches to Beauty and Happiness.
Généreux, Michel and Roger Evans. 2006. Distinguishing affective states in weblog posts. AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs, 40-42.
Halliday, Michael A. K. 1996. Linguistic function and literary style: An inquiry into the language of William Golding’s The Inheritors. Weber, Jean Jacques (ed). The Stylistics Reader: From Roman Jakobson to the Present. London: Arnold, 56-86.
Holzman, Lars E. and William Pottenger. 2003. Classification of emotions in Internet chat: An application of machine learning using speech phonemes. LU-CSE-03-002, Lehigh University.
Jakobson, Roman. 1996. Closing statement: Linguistics and poetics. Weber, Jean Jacques (ed). The Stylistics Reader: From Roman Jakobson to the Present. London: Arnold, 10-35.
Kalra, Ankur and Karrie Karahalios. 2005. TextTone: Expressing emotion through text. Interact 2005, 966-969.
Liu, Bing. 2010. Sentiment analysis and subjectivity. Handbook of Natural Language Processing, second edition. Nitin Indurkhya and Fred J. Damerau (Eds.). Boca Raton: CRC Press, 627-666.
Liu, Hugo, Henry Lieberman, and Ted Selker. 2003. A model of textual affect sensing using real-world knowledge. International Conference on Intelligent User Interfaces, 125-132.
Llorà, Xavier, Kumara Sastry, David E. Goldberg, Abhimanyu Gupta, and Lalitha Lakshmi. 2005. Combating user fatigue in iGAs: Partial ordering, Support Vector Machines, and synthetic fitness. Proceedings of the Genetic and Evolutionary Computation Conference.
Llorà, Xavier, Francesc Alías, Lluís Formiga, Kumara Sastry and David E. Goldberg. 2005. Evaluation consistency in iGAs: User contradictions as cycles in partial-ordering graphs. IlliGAL TR No 2005022, University of Illinois at Urbana-Champaign.
Loeff, Nicolas, Cecilia Ovesdotter Alm, and David Forsyth. 2006. Discriminating image senses by clustering with multimodal features. Proceedings of the 21st International Conference on Computational Linguistics and the 44th ACL, Sydney, Australia, 547-554.
Lyons, John. 1977. Semantics volumes 1, 2. Cambridge: Cambridge University Press.
Mesthrie, Rajend, Joan Swann, Ana Deumert, and William Leap. 2009. Introducing Sociolinguistics, 2nd ed. Amsterdam: John Benjamins.
Mihalcea, Rada and Hugo Liu. 2006. A corpus-based approach to finding happiness. AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs, 139-144.
Picard, Rosalind W. 1997. Affective Computing. Cambridge, Massachusetts: MIT Press.
Sproat, Richard. 2010. Language, Technology, and Society. Oxford: Oxford University Press.
Täckström, Oscar. 2009. A literature survey of methods for analysis of subjective language. SICS Technical Report T2009:08, ISSN 1100-3154.
Wiebe, Janyce, Theresa Wilson, Rebecca Bruce, Matthew Bell, and Melanie Martin. 2004. Learning subjective language. Journal of Computational Linguistics 30 (3), 277-308.
Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, 347-354.
Zaenen, Annie. 2006. Mark-up barking up the wrong tree. Journal of Computational Linguistics 32 (4), 577-580.
