An Empirical Study of Verbal Irony; Identification, Interpretation, and Its Role in Turn-Taking Discourse
Total Page:16
File Type:pdf, Size:1020Kb
AN EMPIRICAL STUDY OF VERBAL IRONY; IDENTIFICATION, INTERPRETATION, AND ITS ROLE IN TURN-TAKING DISCOURSE By DEBANJAN GHOSH A dissertation submitted to the School of Graduate Studies Rutgers, The State University of New Jersey in partial fulfillment of the requirements for the degree of Doctor of Philosophy Graduate Program in Communication, Information and Library Studies written under the direction of Smaranda Muresan, Ph.D. and approved by New Brunswick, New Jersey May, 2018 ABSTRACT OF THE DISSERTATION An Empirical Study of Verbal Irony; Identification, Interpretation, and its Role in Turn-taking Discourse By DEBANJAN GHOSH Dissertation Director: Smaranda Muresan, Ph.D. Human communication often involves the use of figurative language, such as verbal irony or sarcasm, where the speakers usually mean the opposite of what they say. In this dissertation, I address three problems regarding verbal irony: automatic identifica- tion of verbal irony and its characteristics from social media platforms, interpretation of verbal irony, and examining the role of verbal irony in identifying dis(agreement) relations in discussion forums. To automatically detect verbal irony I propose compu- tational models that are based on theoretical underpinnings of irony. I first reframe the question of irony identification as a word-sense disambiguation problem to understand how particular target words are used in the literal or figurative sense. Next, I thoroughly analyze two characteristics of irony; irony markers, and irony factors. I propose em- pirical models to identify irony, irrespective of contextual knowledge as well as with conversation context. I also analyze the context to understand what triggers an ironic reply and perform user studies to explain the machine learning model predictions. Re- garding the interpretation of irony, I offer a typology of linguistic strategies for verbal irony interpretation and link it to various theoretical linguistic frameworks. I design ii computational models to capture these strategies and present empirical studies aimed to answer two questions: (1) what is the distribution of linguistic strategies used by hearers to interpret ironic messages; (2) do hearers adopt similar strategies for interpreting the speaker’s ironic intent? Finally, I turn to the application of irony to show how the use of irony-based features assists in identifying argumentative relations from discussion forums. I perform this research on two types of social media datasets: self-labeled data (e.g., microblogging platforms such as Twitter and Reddit threads), and crowdsource-labeled corpus (e.g., Internet Argumentative Corpus). iii Acknowledgments This dissertation would not have been possible without the support of my advisor, Smaranda Muresan. Smara always gave me the freedom to pursue my research in the direction I wanted and was ready to share her ideas and recommendations. Even when I tended to deviate, she was patient and brainstormed with me on every occasion. Smara’s extreme attention to detail in conducting experiments and writing is inspiring. It was a privilege to work together. I am also thankful to my brilliant committee members. Mark Aakhus and Nina Wacholder introduced me to the fantastic world of argumentation research, and I will always cherish their comments, suggestions, and the exchange of ideas during SALTS group meetings. I am thankful to Nick Belkin for his critique, and ideas on how to sharpen the research questions. Thanks to Kathy McKeown for her suggestions and comments. More than a committee member, Kathy has always been an immense influ- ence on me as an NLP researcher. Apart from my committee member, I am indebted to Elisabeth Camp, who directed me to literature from linguistics as well as philosophy on figurative language. I am also grateful to the esteemed faculty of SCI, especially Marie Radford, Jen Theiss, and Mar- ija Dalbello who taught me to think in interdisciplinary terms. I am thankful to Rohini Srihari who introduced me to the field of computational linguistics at the University of Buffalo while guiding my Masters thesis. Thanks to my wonderful colleagues at Thomson Reuters R&D group. In between my M.S. and the Ph.D. I spent a great time at Thomson Reuters working on many real-world NLP problems. Thanks to Francisco Pereira, my mentor at Siemens research who introduced me to the area of brain repre- sentation of semantic knowledge. My cohorts at Rutgers University: Xiaofeng Li, Vanessa Kitzie, Robyn Kaplan, iv Shahe Sanentz, Alexa Bolanos, Surabhi Sahay, Ralph Gigliotti have provided the intel- lectual environment and warm friendliness conducive to conduct research. I will always cherish the great time we had during our Ph.D. coursework. Likewise, I am incredibly lucky to be a part of the vibrant NLP research group at Columbia University and Center for Computational Learning Systems (CCLS). Thanks to Vinod Prabhakaran, Apoorv Agarwal, Or Biran, Heba Elfardy, Noura Farra, Daniel Bauer, Sara Rosenthal, and Chris Hidey for suggestions during presentations and NLP meetings. Extra thanks go to Elena Musi and Olivia Winn, fellow lab mates at Columbia University, for their numerous rec- ommendations during various stages of this dissertation. During my Ph.D., I collaborated with a brilliant group of undergraduate, graduate students and postdoctoral scholars. I thank Elena Musi for collaboration on multiple projects on argument mining and hope we will collaborate more in the future. I thank Weiwei Guo for pointing me to the word embedding approach during the word sense- disambiguation research, Anusha Balakrishnan for her brilliant work on detecting sar- casm across time-frame, Alex Fabrri for his diligent research on conversation context, Aquila Khanam and Yubo Han for their annotation and research on identifying argu- ments in persuasive writing, and Rituparna Mukherjee, Renato Vieira, Daniel Chaparro, and Pedro Perez Sanchez for their help in conducting various AMTurk experiments on cross-lingual sarcasm detection. I am grateful to the DARPA DEFT grant as well as SALTS scholarship that sup- ported my dissertation and numerous travels to conference venues. Thanks to Joan Chabrak, Allison Machiaverna, and Danielle Lopez who help graduate students at ev- ery stage of their Ph.D. at SCI. Likewise, many thanks to Idrija Ibrahimagic and Kathy Hickey at Columbia University who handled all the paperwork and conduct everything so smoothly. In Calcutta, my parents Rama Ghosh and Nani Gopal Ghosh were surprised by my decision to leave the current job for pursue a Ph.D. but they always kept the faith in me even when I was uncertain. I am incredibly fortunate for their love and support. Thanks to my sisters Ratnabali Basu, Dipabali Acharjee, Chaitali Dutta, and Kakali Ghosh for everything! They were my first teachers in every aspect of my life. Thanks to my v brothers-in-law, Kalyan Basu, Ashish Acharjee, Rajib Dutta, and Kaushik Ghosh for their supports. Big hug to my nieces, Ranjabati, Aishani, and Riddhi. Also a big thanks to my friends from the University at Buffalo: Indranil Sarkar, Radhika Barua, Souvik Chattopadhyay, Anindita Banerjee, Kaushik Roychoudhuri, Senjuti Gupta, Tarashankar Shaw, and Dinesh Kumar, for their love and support. Finally, thanks to my partner, pillar, and best friend, Nabaparna Ghosh. Without her encouragement, frankly, I would not have started the Ph.D. journey. As a fellow researcher, she is an inspiration in so many ways. Besides that, her love and uncondi- tional support have been the most vital in putting together this dissertation. And for all such reasons, this dissertation is dedicated to her. vi Table of Contents Abstract ::::::::::::::::::::::::::::::::::::: ii Acknowledgments ::::::::::::::::::::::::::::::: iv List of Tables :::::::::::::::::::::::::::::::::: xi List of Figures ::::::::::::::::::::::::::::::::: xiv 1. Introduction ::::::::::::::::::::::::::::::::: 1 1.1. Overview . 1 1.1.1. Irony Identification . 4 1.1.2. Irony Interpretation . 6 1.1.3. Role of Irony to Detect Dis(agreement) Relations . 7 1.2. Irony and its Characteristics . 8 1.3. Contributions . 10 1.4. Structure of the Chapters . 13 2. Data ::::::::::::::::::::::::::::::::::::: 15 2.1. Overview . 15 2.2. Datasets . 17 2.2.1. Twitter Dataset . 17 2.2.2. Reddit Corpus . 20 2.2.3. Internet Argument Corpus . 21 3. Verbal Irony Identification ::::::::::::::::::::::::: 25 3.1. Overview . 25 3.2. Background and Related Work . 26 vii 3.2.1. Theoretical Approaches of Irony Analysis . 26 3.2.2. Computational Approaches for Irony Identification . 29 3.3. Introduction to Irony Markers and Irony Factors . 32 3.3.1. Theoretical Overview of Irony Markers . 33 3.3.1.1. Tropes as Irony Markers: . 34 3.3.1.2. Morpho-syntactic Irony Markers: . 34 3.3.1.3. Typographic Irony Markers: . 35 3.3.2. Theoretical Overview of Irony Factors . 35 3.4. Data and Preprocessing . 37 3.4.1. Twitter Data . 38 3.4.2. Internet Argument Corpus . 38 3.4.3. Reddit Corpus . 39 3.4.4. Dense Representation of Input Data . 39 3.5. Utterance-level Analysis . 41 3.5.1. Empirical Analysis of Irony Markers . 41 3.5.1.1. Irony Markers as Discriminative Features . 43 3.5.1.2. Generalization of Irony Markers . 48 3.5.2. Modeling of Irony Factors . 51 3.5.2.1. Literal/Ironic Sense Disambiguation (LISD) task . 51 3.5.2.1.1. Identifying Target Words: . 52 3.5.2.1.2. Disambiguation Task: . 54 3.5.2.2. Identifying Incongruence at the Utterance Level . 61 3.6. Role of Conversation Context in Verbal Irony Identification . 67 3.6.1. Computational Models and Experimental Setup . 71 3.6.2. Results and Discussion . 76 3.6.2.1. Prior Turn as Conversation Context . 76 3.6.2.2. Prior Turn and Subsequent Turn as Conversation Con- text . 81 3.6.3. Qualitative Analysis . 82 viii 3.6.3.1. Crowdsourcing Experiments . 83 3.6.3.1.1. Crowdsourcing Experiment 1 . 83 3.6.3.1.2. Crowdsourcing Experiment 2 . 84 3.6.3.2. Comparing Turkers’ Answers with Attention Models 85 3.6.3.3.