
ABSTRACT

Title of Document: SPIN: LEXICAL SEMANTICS, TRANSITIVITY, AND THE IDENTIFICATION OF IMPLICIT SENTIMENT

Stephan Charles Greene Doctor of Philosophy, 2007

Directed By: Professor Philip Resnik, Department of Linguistics and Institute for Advanced Computer Studies

Current interest in automatic sentiment analysis is motivated by a variety of information requirements. The vast majority of work in sentiment analysis has been specifically targeted at detecting subjective statements and mining opinions. This dissertation focuses on a different but related problem that to date has received relatively little attention in NLP research: detecting implicit sentiment, or spin, in text. This text classification task is distinguished from other sentiment analysis work in that there is no assumption that the documents to be classified with respect to sentiment are necessarily overt expressions of opinion. They rather are documents that might reveal a perspective. This dissertation describes a novel approach to the identification of implicit sentiment, motivated by ideas drawn from the literature on lexical semantics and argument structure, supported and refined through psycholinguistic experimentation. A relationship predictive of sentiment is established for components of meaning that are thought to be drivers of verbal argument selection and linking and to be arbiters of what is foregrounded or backgrounded in discourse. In computational experiments employing targeted lexical selection for verbs and nouns, a set of features reflective of these components of meaning is extracted for the terms. As observable proxies for the underlying semantic components, these features are exploited using machine learning methods for text classification with respect to perspective. After initial experimentation with manually selected lexical resources, the method is generalized to require no manual selection or hand tuning of any kind. The robustness of this linguistically motivated method is demonstrated by successfully applying it to three distinct text domains under a number of different experimental conditions, obtaining the best classification accuracies yet reported for several sentiment classification tasks. A novel graph-based combination method is introduced which further improves classification accuracy by integrating statistical classifiers with models of inter-document relationships.

SPIN: LEXICAL SEMANTICS, TRANSITIVITY, AND THE IDENTIFICATION OF IMPLICIT SENTIMENT

by

Stephan Charles Greene

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2007

Advisory Committee:
Professor Philip Resnik, Chair
Dr. Donald Hindle
Professor Jeffrey Lidz
Professor V.S. Subrahmanian
Professor Amy Weinberg

© Copyright by Stephan Charles Greene 2007

Dedication

This dissertation is dedicated to

Sabrina Parker Greene and Owen Gates Greene

Hey, guys—I’m done!


Acknowledgements

The completion of this dissertation, at times a remote and preposterous notion, would not have come about without the substantial support of many generous individuals.

My advisor, Philip Resnik, has given his time and talent in ways that simply cannot be adequately acknowledged. Philip has provided constant, immeasurable intellectual challenge and inspiration. His energy and enthusiasm are renowned and I benefited greatly from them time and again. I thank him also for always believing in the work. At a practical level, in any situation, large or small, Philip was able to identify a realistic set of parameters that in each case proved essential to engineering this endeavor successfully. Buying a house around the corner from me was, of course, the ultimate expression of his dedication to my work.

Co-advisor Don Hindle has been the project’s sage, a sounding board for ideas who always responded with insight, wisdom, and practical advice. Don dedicated substantial amounts of his time to this project but was remarkably efficient and concise in saying just the right thing at the right time. On top of countless discussions about the work itself, his instruction to “visualize the dissertation” provided perhaps the three most powerful words spoken to me in the course of this effort.

I thank my committee members, Professors Amy Weinberg, Jeff Lidz, and VS Subrahmanian for their insightful questions, their support, and the contribution of their valuable time. I owe the Department of Linguistics at the University of Maryland, along with the Computer Science department, a great deal of gratitude for their extended support. Maryland was a good fit in balancing my life as a working professional, a family guy, and a student. I particularly thank those professors and others who have taught or worked with me over these years that have been the most rewarding educational experience of my life: Norbert Hornstein, David Lightfoot, Juan Uriagereka, Andrea Zukowski, Colin Phillips, Paul Pietroski, Laura Benua, Linda Lombardi, Mark Arnold, Bonnie Dorr, Mari Broman Olsen, and Rebecca Hwa.

In my professional life I have had the great fortune to work with many very talented and supportive people. I thank George Krupka in particular for his help in making this achievement possible. Thanks also go to Elena Spivak, Lorraine Bryan, Tony Davis, Cheinan Marks, and Anand Kumar.

For generously sharing their work (and sometimes, data), I thank Wei-Hao Lin, Theresa Wilson, Janyce Wiebe, Lillian Lee, Matt Thomas, Bo Pang, Ed Kako and Talke MacFarland.

I thank Becky Bishop Resnik for frequently alerting me to related work that I would not have discovered otherwise. I thank Chip Denman for crucial help with statistical analyses. Thanks to Frank Keller and the WebExp development group at the University of Edinburgh for sharing and assisting with their system. Thanks to Mark Damrongsri for Web administration.

I thank my parents, my in-laws, the rest of my family, and my friends for giving me endless encouragement and support. My kids, Sabrina and Owen, have been rooting for me like nobody else, despite the intrusion that this work has been. I thank them for their tolerance, and their smiles.

My wife, Linda Parker Gates, is an essential partner to whatever success I am able to claim. She inspires me and envisions things for me that I could never imagine alone. And she has encouraged, cajoled, and hounded me, as necessary, to bring this work to fruition. Thanks for seeing this through with me.


Table of Contents

Dedication ...... ii
Acknowledgements ...... iii
Table of Contents ...... v
List of Tables ...... vii
List of Figures ...... ix
List of Equations ...... x
1 Identifying Implicit Sentiment ...... 1
1.1 Introduction and Overview ...... 1
1.2 Defining the Task ...... 3
1.3 Related Work ...... 7
1.4 Dissertation Roadmap ...... 9
2 Connecting Lexical Semantics to Perceived Sentiment: Psycholinguistic Evidence ...... 11
2.1 Introduction ...... 11
2.2 Background: Lexical Semantics and the Syntax and Semantics of Transitivity ...... 13
2.3 Investigative Approach ...... 17
2.4 Psycholinguistic Investigation: Linking Transitivity Components to Sentiment Construal ...... 19
2.4.1 Experiment 1 ...... 20
2.4.1.1 Stimuli and Procedure ...... 20
2.4.1.2 Participants ...... 22
2.4.1.3 Analysis and Results ...... 22
2.4.2 Experiment 2 ...... 24
2.4.2.1 Stimuli and Procedure ...... 25
2.4.2.2 Participants ...... 28
2.4.2.3 Analysis and Results: Effect of Surface Encoding on Perceived Sentiment ...... 28
2.4.2.4 Analysis and Results: Effect of Underlying Semantic Components on Perceived Sentiment ...... 29
2.4.2.5 Analysis and Results: Empirical Correspondence with Lexical Semantic Analysis ...... 33
2.5 Summary and Discussion ...... 37
3 Linguistically Informed Identification of Implicit Sentiment: The Death Penalty Debate ...... 39
3.1 Introduction and Overview ...... 39
3.2 Defining OPUS Features ...... 39
3.2.1 Engineering ...... 40
3.2.2 Document Feature Extraction ...... 41
3.3 The DP Corpus ...... 43
3.3.1 Corpus Preparation ...... 43
3.3.2 Characteristics of the DP Corpus ...... 43
3.4 PRO/CON Sentiment Classification Experiments with the DP Corpus: The Kill Verbs ...... 45
3.4.1 Document Features ...... 45
3.4.2 Classification Results ...... 46
3.5 Identifying Domain Relevant Terms ...... 49
3.6 PRO/CON Sentiment Classification Experiments with the DP Corpus: Automatically Identified Verbs ...... 51
3.7 Summary and Discussion ...... 53
4 Linguistically Informed Identification of Implicit Sentiment: Extension to Additional Domains ...... 54
4.1 Domain Extension 1: Bitter Lemons ...... 54
4.1.1 The Bitter Lemons Corpus ...... 54
4.1.2 Previous Classification Work Using the BL Corpus ...... 56
4.1.3 Experiments with the BL Corpus using OPUS Features ...... 56
4.1.4 Results ...... 58
4.1.5 Comparing Unigrams with OPUS Features ...... 61
4.1.6 Analysis and Discussion ...... 64
4.2 Domain Extension 2: Congressional Speech ...... 65
4.2.1 The Congressional Speech Corpus ...... 65
4.2.2 The Corpus ...... 66
4.2.3 Initial SVM Classification of the CS Corpus ...... 69
4.2.4 Modeling Inter-Speech Segment Relationships ...... 72
4.2.5 Joining OPUS Features with Inter-Speech Segment Relationships ...... 79
4.2.6 Classifier Combination ...... 80
4.2.7 Discussion and Contributions ...... 82
5 Summary of Contributions and Future Work ...... 84
5.1 Summary of Contributions ...... 84
5.2 Future Work ...... 85
5.2.1 Extension to Additional Corpora and Domains ...... 85
5.2.2 Methodological Extensions ...... 87
Appendix 1 – Experimental Sentences for Experiment 1 ...... 89
Appendix 2 – Experimental Stimulus Headlines ...... 91
Appendix 3 – Scatter Plots ...... 92
Appendix 4 – Grammatical Relations of the Stanford Parser ...... 94
Appendix 5 – Development of the Death Penalty Corpus ...... 101
Appendix 6 – Graph Union vs. Feature Union ...... 104
Bibliography ...... 106


List of Tables

Table 1 – Various definitions for attitude (from Oskamp (1991)) ...... 4
Table 2. A Mapping Between Dowty (1991) and Hopper and Thompson (1980) ...... 14
Table 3. Experimental questions and corresponding semantic components and/or proto-role properties ...... 21
Table 4. ANOVA results for the semantic components for kill verbs with Form as fixed effect ...... 23
Table 5. Significantly different component ratings, nominalized kill stimuli only ...... 24
Table 6. Significantly different component ratings, active kill verb stimuli only ...... 24
Table 7. ANOVA results for effect of headline form on sympathy for perpetrator ...... 29
Table 8. ANOVA results for effect of verb class on sympathy for perpetrator ...... 29
Table 9. KMO and Bartlett's Test for the set of all Predictor variables ...... 34
Table 10. Factor loadings for each variable, by component (4) ...... 35
Table 11. Rotated factor loadings for each variable, by component (4) ...... 35
Table 12. Factor loadings for each variable, by component (3) ...... 36
Table 13. Rotated factor loadings for each variable, by component (3) ...... 37
Table 14 – The DP Corpus ...... 44
Table 15 - Sentiment classification accuracy of the DP Corpus, evaluated for three algorithms using 10-fold cross validation, in percent correct ...... 46
Table 16 - Sentiment classification accuracy for the DP Corpus, evaluated for three algorithms using initial Web-site specific train-test splits, in percent correct ...... 47
Table 17 - Classification accuracy for weighted average percent correct across 4-fold train-test experiments ...... 48
Table 18 - Sentiment classification accuracy for the DP Corpus, evaluated for three algorithms using two-by-two matrix of web-site specific train-test splits, in percent correct ...... 48
Table 19 - Classification accuracy based on frequent non-kill verbs from the DP Corpus, in percent correct ...... 49
Table 20 - Classification accuracy for weighted average percent correct across 4-fold train-test experiments ...... 52
Table 21 - Sentiment classification accuracy for the DP Corpus, evaluated for SVM classifiers built with the kill verb set, compared with the automatically derived verb set, using two-by-two matrix of web-site specific train-test splits, in percent correct ...... 52
Table 22 - Descriptive Statistics for the BL Corpus (Lin et al. 2006) ...... 55
Table 23 - Classification accuracy, in percent, for the classifiers of Lin et al. (2006) ...... 56
Table 24 – Sample values of ρ, corresponding term list sizes, and example terms ...... 57
Table 25 - Descriptive Statistics for the Congressional Speech Corpus ...... 68
Table 26 - Classification Accuracy for baseline and SVM classifiers from TPL06, in percent ...... 69
Table 27 - Classification Accuracy for baseline, SVM, and graph model classifiers, in percent ...... 78
Table 28 - Classification accuracy, comparing graph model classifiers using two different base SVM classifiers, in percent. Asterisks denote statistical significance between the U-SVM and OPUS-SVM classifier accuracies (p < 0.05) ...... 80
Table 29 – Classification accuracy for the graph union of two SVM models, in percent. Asterisk indicates statistically significant difference ...... 82
Table 30 - Grammatical relations available from the Stanford Parser, taken from de Marneffe et al. (2006) and the javadoc documentation available from the parser's distribution ...... 100
Table 31 - Classification accuracy for the graph union of two SVM models, in percent. Asterisks indicate significant improvements of Graph Union over Feature Union ...... 104


List of Figures

Figure 1. Sample stimulus item from Experiment 1 ...... 22
Figure 2. Sample stimulus presentation from Experiment 2 ...... 28
Figure 3. Regression model with , , and Verb as predictors ...... 31
Figure 4. Regression model with Volition, Punctuality, and Verb as predictors ...... 32
Figure 5. Regression model with Kinesis (Movement), Punctuality, and Verb as predictors ...... 33
Figure 6 – An example sentence with an instance of the TRANS-murder feature ...... 42
Figure 7 - An example sentence with an instance of the NOOBJECT-kill feature ...... 42
Figure 8 – Automatically derived domain-relevant verbs for the DP Corpus ...... 51
Figure 9 – SVM Classifier Accuracy using OPUS features for the BL Corpus, Test Scenario 1, with the DeterminerFilter applied ...... 59
Figure 10 – SVM Classifier Accuracy using OPUS features for the BL Corpus, Test Scenario 1, with the GeneralFilter applied ...... 60
Figure 11 – SVM Classifier Accuracy using OPUS features for the BL Corpus, Test Scenario 2, with the DeterminerFilter applied ...... 61
Figure 12 – SVM Classifier Accuracy using OPUS features for the BL Corpus, Test Scenario 2, with the GeneralFilter applied ...... 61
Figure 13 – Accuracy of OPUS SVM classifiers compared with the corresponding OD-Unigram SVM classifiers, Test Scenario 1 ...... 63
Figure 14 – Accuracy of OPUS SVM classifiers compared with the corresponding OD-Unigram SVM classifiers, Test Scenario 2 ...... 64
Figure 15 - YEA-NAY SVM classifier accuracy for various values of ρ, using the DeterminerFilter ...... 70
Figure 16 - YEA-NAY SVM classifier accuracy for various values of ρ, using the GeneralFilter ...... 71
Figure 17 - YEA-NAY SVM classifier accuracy for various values of ρ, using the GeneralFilter and domain-relevant terms with a minimum frequency of 25 ...... 71
Figure 18 – The YEA-NAY classification task, including inter-item relations, modeled as a graph. The wgt(r) arcs represent same-speaker and between-speaker links ...... 74
Figure 20 – The YEA-NAY classification task, including inter-item relations, modeled as a graph, using the graph union classifier combination method implemented with multiple ind svm arcs ...... 81


List of Equations

Equation 1 - The Relative Frequency Ratio ...... 50
Equation 2 - Cost of class assignment for speech segments in a debate ...... 73
Equation 3 – Normalization of agreement weight ...... 77
Equation 4 – Final arc weight for an agreement link ...... 78


1 Identifying Implicit Sentiment

1.1 Introduction and Overview

The automatic detection of sentiment in text is generating great interest and has become an important and active area of research in natural language processing. Interest in this problem is motivated by a variety of information requirements. For example, for certain information seekers, search by keyword or classification by topic is insufficient because their specific information needs require a more refined level of classification. They require topical information that is filtered by the attitude or sentiment toward the topic of interest. The ability to deliver information that is characterized in this manner will help reduce the costs of information seeking and provide more valuable resources to analysts and decision makers who routinely confront an ever increasing mountain of information.

As specific examples, consider the following applications of automated sentiment analysis:

• Media Studies and Political Economy. Analysis of sentiment and opinion is central to many studies in journalism, media studies, and political economy (Gentzkow and Shapiro 2006a, 2006b; Groseclose and Milyo, 2005; Quinn et al. 2006). Automated filtering or classification of information with respect to sentiment can greatly facilitate this kind of research.

• Email classification and routing for prioritization. In a customer service setting, electronic messages are typically classified by topic for the purpose of routing the messages to appropriately trained customer service representatives. It would be valuable to companies to enhance topic-based routing by detecting the presence of negative sentiments (anger, frustration) in a customer’s message. Some preliminary work has been done in this specific area (Durbin et al. 2003). The potential benefits include improved customer retention and the gathering of information about the nature of previous contacts with customer service.

• Forum and Blog Search. Online forums and blogs are an increasingly widespread medium for the exchange of experiences, opinions, and complaints. Keyword search of forums can help find messages generally related to areas of interest. It would be far more useful to be able to search for or filter messages based on their positive or negative sentiment toward a particular subject.

• Business Intelligence. 1 Classifying forum, blog, and email data by attitude or sentiment can serve business and marketing intelligence purposes. What is the ‘word on the street’ with respect to a certain product, service, or enterprise as a whole? What are some key quotes or phrases that vividly illustrate these opinions? How do these opinions spread online and what influence do they have?

• Open Source Intelligence. E.g., what points of view are evident in the Arab press about potential new sanctions against Iran over its nuclear program?

• Citizen-Government Communications. Governmental institutions in the United States are working to solicit citizen input via the Internet, with such goals as electronic rulemaking ( eRulemaking ) for the “electronic collection, distribution, synthesis, and analysis of public commentary in the regulatory rulemaking process,” which may “[alter] the citizen government relationship” (Shulman and Schlosberg, 2002; Thomas, Pang, and Lee, 2006).

1 Commercialization of such information services is well underway. See, for example, http://www.nielsenbuzzmetrics.com , http://www.biz360.com , and http://bazaarvoice.com/ .

With these kinds of applications in mind, there are several specific threads of research in sentiment analysis, as the problem is generally referred to. Sentiment analysis has been investigated at both the document and sub-document level. Opinion mining in particular seeks to distinguish objective vs. subjective statements, operating at the level of a clause, sentence, or passage. Document-level sentiment classification seeks to determine if a document indicates a positive opinion or shows support for, or a negative opinion or opposition to, the topic of discussion. Movie and product reviews in particular have received significant attention for this task.

Within the broader realm of sentiment analysis, this dissertation focuses on the automatic identification of implicit sentiment, at the document level. This is a text classification task that I distinguish from other work in sentiment analysis in two ways. First, I do not assume that the documents to be classified with respect to their sentiment are necessarily overt expressions of opinion. They rather are documents that might reveal a perspective. As described by Lin et al. (2006), this means the identification of the point-of-view from which the document is produced. Second, because of the distinction between implicit and explicit sentiment, I do not attempt to distinguish subjective from objective statements or exploit the distinction in building text classifiers.

Many different approaches to sentiment analysis and sentiment classification have been reported in the literature and they are reviewed in Section 1.3. In this dissertation, I develop a novel approach to the identification of implicit sentiment. The approach is motivated by ideas drawn from the literature on lexical semantics and argument structure, which I support and refine through psycholinguistic experimentation. I employ targeted lexical selection for verbs and nouns and extract semantically motivated features for the selected terms, including argument structure. I then exploit these features with machine learning methods for large scale text classification with respect to perspective. While I begin with some manually selected lexical resources, I generalize the method to require no manual selection or hand tuning of any kind. I show this linguistically motivated method to provide satisfying results across several text domains, and obtain the best classification accuracies yet reported for several classification tasks.


My text classification method rests on several foundational ideas. In this work I take seriously the idea that differences in linguistic form always indicate at least some difference in meaning (Bolinger, 1968). This is particularly relevant to identifying implicit sentiment. The hypothesis is that speakers employ specific constructions in a manner that exploits these (sometimes subtle) differences in meaning in a way that reveals, intentionally or not, aspects of the speakers’ point-of-view. Here, ‘differences in linguistic form’ is meant to include everything from classic diathesis alternations such as the dative alternation, the causative-inchoative alternation, and the active-passive alternation, to differences in the nominal forms for discourse referents (such as the use of proper nouns or –er nominalizations), and on down to specific lexemes. In this dissertation I exploit reflections of these cues as features for text classification tasks.

Another foundation of my approach is theoretical work that has articulated a decompositional semantics for transitive clauses (Dowty 1991, Hopper and Thompson 1980). I take the semantic components of transitivity as outlined by Dowty (1991) and Hopper and Thompson (1980) as principles underlying observed differences in meaning among forms. These principles have been proposed in the contexts of argument linking and in an analysis of the nature of transitivity in grammar and discourse, and thus have an established role at the syntax-lexical semantics interface. Both syntactic form and lexical entries are believed to reflect the operation of these principles. Chapter 2 provides additional background on these ideas and presents evidence from psycholinguistic experiments that support and motivate the use of linguistically informed features in text classification for sentiment.

Finally, I consider the fact that the level of semantic analysis required to understand how the semantic components of transitivity are actually reflected in any given clause is well beyond the current capabilities of natural language processing systems. However, I subscribe to the notion, as articulated by Gamon (2004a), that a language processing system used consistently will at least be consistent and systematic in its errors. In that context, machine learning techniques can effectively identify and exploit features amidst the noise. The key, therefore, is to identify features that are observable reflexes of the semantic components and that can be practically extracted by current NLP techniques, even if noisily so. I will thus refer to the linguistically motivated features I use as Observable Proxies for Underlying Semantics (OPUS). 2

2 Not to be confused with the OPUS software system described in Burstein (1979).
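To make the idea concrete, the sketch below illustrates the general pattern of turning parsed clauses into transitivity-oriented features for a set of target verbs. It is an expository illustration only, not the implementation used in this dissertation: the dependency triples, relation names, and target verb list are invented, and the actual OPUS feature definitions and extraction procedure are given in Chapter 3.

from collections import Counter

# Illustrative sketch: derive transitivity-style features for target verbs
# from dependency triples, e.g. TRANS-kill when "kill" has a direct object
# and NOOBJECT-kill when it does not. The target lexicon and relation names
# are toy assumptions, not the actual OPUS configuration.
TARGET_VERBS = {"kill", "murder", "execute"}   # assumed target lexicon

def opus_like_features(dep_triples):
    """dep_triples: list of (head_lemma, relation, dependent_lemma) tuples."""
    features = Counter()
    heads_with_object = {h for h, rel, _ in dep_triples if rel == "dobj"}
    for head, rel, dep in dep_triples:
        if head in TARGET_VERBS:
            # lexical-positional feature: which argument fills which slot
            features[f"{rel}-{head}-{dep}"] += 1
    for verb in TARGET_VERBS:
        if any(h == verb for h, _, _ in dep_triples):
            tag = "TRANS" if verb in heads_with_object else "NOOBJECT"
            features[f"{tag}-{verb}"] += 1
    return features

# Toy parse of "Israel kills three militants":
triples = [("kill", "nsubj", "Israel"), ("kill", "dobj", "militant")]
print(opus_like_features(triples))
# e.g. Counter({'nsubj-kill-Israel': 1, 'dobj-kill-militant': 1, 'TRANS-kill': 1})

Feature vectors of this kind can then be handed to a standard text classifier; the noise introduced by imperfect parsing is tolerated, per the reasoning above, because the errors are largely systematic.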

1.2 Defining the Task

Amid the flurry of recent NLP research in sentiment analysis, a variety of related terms are invoked: attitude, opinion, emotion, subjectivity, bias, slant, spin, perspective, and point-of-view. Unlike the objects of interest in tasks such as named entity extraction, information extraction, and fact-oriented question answering, defining precisely what is meant by these terms, or how they differ among themselves, is not easily accomplished. Psychologists and social scientists studying these matters try to be more precise in their definitions, though there, as well, consistency and precision can be difficult to establish (Ekman, 1994; Oskamp, 1991). For example, Oskamp (1991) illustrates various levels of definition for attitude, which appear to at least try to establish some compatibility with each other (see Table 1).

Level: Definition

Comprehensive: An attitude is a mental or neural state of readiness, organized through experience, exerting a directive or dynamic influence upon the individual's response to all objects and situations with which it is related. (Allport, 1935)

Simple: Attitudes are likes and dislikes. (Bem, 1970)

Emphasis on Evaluation: Attitude is a psychological tendency that is expressed by evaluating a particular entity with some degree of favor or disfavor. (Eagly & Chaiken, 1993)

Emphasis on Learning and Consistency: An attitude is a learned predisposition to respond in a consistently favorable or unfavorable manner with respect to a given object. (Fishbein & Ajzen, 1975)

Table 1 – Various definitions for attitude (from Oskamp, 1991)

Yet despite Oskamp’s careful treatment of the term attitude, he and others in that community tend to use the term opinion almost interchangeably. Within the NLP community, there is considerably less concern about formal and precise definitions for these concepts. However, several more or less commonly accepted task definitions have emerged within NLP research.

Under the main heading of ‘affect analysis’, several subtasks have been the target of recent research. A kind of ‘general’ affect analysis task is to determine the overall affective or emotional content of text, either with respect to multiple affect categories (Subasic and Huettner, 2000; Read, 2004) or in selecting one of several possible affect categories as primary (Mishne 2005 – therein called ‘mood’). There is also the task of classifying texts at the document or message level as a whole with respect to their attitude, positive or negative. This task has been investigated in the context of knowing in advance that the text in question purports to indicate some polar attitude (Pang et al. 2002; Pang and Lee 2004; Read 2005) as well as in the harder but more interesting context of extracting attitude (Grefenstette 2004) or affect (Subasic and Huettner 2000) in more general expository text. More fine-grained applications determine opinion with respect to specific entities, and provide both a polar assessment of the opinion (positive or negative) as well as a measure of its intensity (Cesarano et al. 2006, 2007). There is also the task of distinguishing subjective from objective text passages (Bethard et al. 2004; Riloff and Wiebe, 2003; Wiebe et al. 1999; among others), otherwise known as the identification of ‘opinion’ clauses. While seemingly highly related, I view the identification of opinion clauses as a task distinct from my focus here. Lin et al. (2006) distinguish the task of identifying the perspective or point-of-view from which a document is written. The NLP tasks investigated in this dissertation are most closely aligned with Lin’s task definition (and indeed, I experiment with Lin’s corpus data and classification tasks in Chapter 4). Central to my work is the notion that clauses that are not obviously subjective will be useful in determining the perspective of texts which may or may not be overtly evaluative.

This brings us to the related notions of bias, slant, and spin. Again, precise definitions are difficult to establish. But at the least, these terms imply a hidden and deliberate attempt to frame entities and events in a particular way so as to serve some agenda. The notion that such attempts are hidden captures the fact that accusations of bias and slant are made in situations where an expectation of neutrality is operative. The notions of perspective or point-of-view as I use them here can be considered related to bias, slant, and spin. Perspective is distinguished from these, however, because it can be assumed even in situations where neutrality is not specifically expected. At the same time, perspective is distinguished from opinion or subjectivity in that one can exhibit perspective without necessarily expressing an opinion or making subjective statements.

A few anecdotal examples will help illustrate some of these points. The first example comes from the Washington Post (Blaine, 2006), in an article on the controversy over the listing of killer whales in Puget Sound as an endangered species. The article notes that supporters of the listing generally referred to the animals as ‘orcas’, while opponents generally referred to them as ‘killer whales’. Here, lexical choice is employed as a ‘spin’ tactic.

Another example comes from a study by the media watchdog group HonestReporting.com. They conducted an informal study of Reuters newswire service headlines related to the Israeli-Palestinian conflict. 3 One example they give is as follows:

• “Israeli Tank Kills 3 Militants in Gaza – Witnesses” Israel named as perpetrator; Palestinians ("Militants") named as victims; killing described in active voice.
• “Israeli Girl Killed, Fueling Cycle of Violence” Palestinian not named as perpetrator; killing described in passive voice.

HonestReporting.com accused Reuters of systematic bias in favor of the Palestinians and against Israel. They claimed, based on the headline data they collected and studied, that Reuters routinely phrased such headlines in a way that portrayed Israelis as aggressors and Palestinians as victims. As a journalistic organization, Reuters is particularly vulnerable to the accusation of bias. Note that the example headlines shown above are not specifically subjective or opinion-expressing statements. If a person were simply describing such events to a friend, a systematic pattern of expression like this would be an example of perspective. 4

3 See http://www.honestreporting.com/articles/critiques/Study_Reuters_Headlines.asp .
4 I discuss the HonestReporting.com example in greater detail in Chapter 2.

Consider another example, taken from Gentzkow and Shapiro (2006b). They provide the opening portion from three different news reports of the same event, a battle involving American troops in the Iraqi city of Samarra on December 2, 2003.

Fox News:

In one of the deadliest reported firefights in Iraq since the fall of Saddam Hussein’s regime, US forces killed at least 54 Iraqis and captured eight others while fending off simultaneous convoy ambushes Sunday in the northern city of Samarra.

The New York Times:

American commanders vowed Monday that the killing of as many as 54 insurgents in this central Iraqi town would serve as a lesson to those fighting the United States, but Iraqis disputed the death toll and said anger against America would only rise.

The English-language Web site of satellite network Al Jazeera (AlJazeera.net):

The US military has vowed to continue aggressive tactics after saying it killed 54 Iraqis following an ambush, but commanders admitted they had no proof to back up their claims. The only corpses at Samarra’s hospital were those of civilians, including two elderly Iranian visitors and a child.

Gentzkow and Shapiro (2006b) comment as follows: “All the accounts are based on the same set of underlying facts. Yet by selective omission, choice of words, and varying credibility ascribed to the primary source, each conveys a radically different impression of what actually happened. The choice to slant information in this way is what we will mean … by media bias.” By using the term choice , Gentzkow and Shapiro imply that these journalistic organizations have deliberately framed these events in a particular manner, and hence the accusation of bias. I will remain agnostic with respect to that claim, but this example is highly illustrative. The three event descriptions in this example are, as in the Reuters example, not obviously subjective or opinion-bearing statements. And again, outside a journalistic context, one might attribute the differences to variations in the perspective of the authors.

These examples illustrate the kind of implicit sentiment that I target in my psycholinguistic experiments and computational classification tasks. My use of linguistically motivated features in some sense formalizes, in an engineered setting, heretofore descriptive, informal, and/or anecdotal beliefs about how writers reveal their implicit sentiments: that speakers and writers spin their language in a way favorable to their position, not only in opinion-expressing text but also in expressions that are not specifically opinionated.

1.3 Related Work

I review here some of the most significant and closely related work in computational sentiment analysis. I first focus on lexically oriented methods, and then consider methods using additional feature types such as structural features, all described in the context of the particular tasks to which they are applied.

Significant success has been achieved in sentiment analysis using lexically based approaches, often targeting adjectives and adverbs. Much work has been done to automatically acquire opinion word resources (Gamon and Aue, 2005; Hatzivassiloglou and McKeown 1997; Baroni and Vegnaduzzo 2004; Takamura et al., 2005; Turney and Littman, 2003). Turney and Littman’s pointwise mutual information method has been applied successfully in a number of sentiment analysis applications, including opinion retrieval from corpora of email and blogs (Oard et al. 2006; Wu et al. 2006) and affect classification in short story fiction (Read 2004).
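To make the mechanism concrete, the core quantity in Turney and Littman’s approach is the semantic orientation of a candidate word, estimated from its association with small positive and negative seed sets. A standard statement of the PMI-based version is roughly the following (the seed words shown are illustrative; in the original formulation the probabilities were estimated from search-engine hit counts):

\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}

\mathrm{SO\text{-}PMI}(w) = \sum_{p \in P} \mathrm{PMI}(w, p) \; - \; \sum_{n \in N} \mathrm{PMI}(w, n)

where P and N are seed sets such as {good, nice, excellent, ...} and {bad, nasty, poor, ...}; a word is treated as positively oriented when its score is positive, with the magnitude of the score serving as a measure of strength.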

Other lexically based approaches to sentiment analysis have used lexicons manually crafted specifically for sentiment or affect analysis (Durbin et al 2003; Nigam and Hurst, 2004, 2006). Cesarano et al. (2006, 2007) semi-automatically developed an opinion-expressing word bank by applying a flexible word scoring function based on document-level human annotations. The annotations included harshness scores that reflect the degree to which documents were considered positive or negative. In their OASYS opinion analysis system, the word bank scores feed into several document scoring functions that both qualitatively assess opinion with respect to specific topics, as well as generate a quantitative measure of that opinion. (Cesarano et al., 2006, 2007; Benamara et al., 2007). 5 Unlike many other research systems, OASYS runs continuously and has been applied to a growing inventory of millions of documents.

Subasic and Huettner’s (2000) affect analysis system relies on a fully manually developed word list and distinguishes affect as a set of emotional categories that does not address opinion types directly. Subasic and Huettner describe an affect word lexicon of about 4000 terms, of various parts of speech. Each word is manually annotated with an affect category and a centrality and intensity value with respect to that category. The lexicon was shown to be useful for determining the overall affect content of documents, though the affect categories are numerous and sometimes quite subtle in their distinctions (e.g. repulsion, aversion). Their affect measurement system provides only gross overall scores for each affect category, and does not provide information about specific entities or events to which the affect is attributed, in contrast to the OASYS system, which assesses opinion directly with respect to specific topics.

5 See http://oasys.umiacs.umd.edu/oasys/ for more information and to try the system directly.

Grefenstette et al. (2004) used a modified form of Subasic and Huettner’s lexicon to build an opinion mining application. They removed the centrality and intensity values for each lexical entry, but tagged each entry with respect to whether the word had positive or negative connotations, similar to much other work, e.g. the General Inquirer Dictionary (Stone 1997). Extracting news stories about a particular person, they calculated the simple ratio of positive to negative affect words in a text window around the mention of the person. This approach was tested by extracting relevant text from sites with known slants, and it was shown to be a fairly effective, gross measure of attitude toward a person. This work is most notable for its attempt to detect slant or bias in ostensibly objective text.
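A minimal sketch of the windowed measure just described follows; the word lists, window size, and tokenization are invented for illustration and are not the resources Grefenstette et al. actually used.

# Toy version of a windowed positive/negative ratio around mentions of a
# target entity; the lexicons and window size are illustrative assumptions.
POSITIVE = {"praise", "honest", "success"}
NEGATIVE = {"scandal", "corrupt", "failure"}

def window_ratio(tokens, target, window=10):
    pos = neg = 0
    for i, tok in enumerate(tokens):
        if tok == target:
            for w in tokens[max(0, i - window): i + window + 1]:
                if w in POSITIVE:
                    pos += 1
                elif w in NEGATIVE:
                    neg += 1
    return pos / neg if neg else float("inf")

tokens = "the senator faced a new scandal but also won praise for candor".split()
# counts affect words within 10 tokens of each mention of "senator"
print(window_ratio(tokens, "senator"))   # 1.0 for this toy sentence

A ratio well above 1 suggests predominantly positive framing of the entity in the surrounding text, and a ratio well below 1 suggests the opposite.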

Following Klavans and Kan (1998), Chesley et al. (2006) use proprietary verb class definitions and Wiktionary 6-derived adjective information for sentiment classification of blogs.

6 See http://en.wiktionary.org/wiki/ .

Building on lexically-driven work, a number of researchers have examined syntactic and other more linguistically-informed approaches to sentiment analysis. In a series of studies, Wilson and colleagues have built opinion extraction systems trained on corpora with detailed, manual annotation to extract lexical and syntactic features for the tasks of extracting opinion clauses of varying strengths, and recognizing the contextual polarity of expressions (Wilson et al., 2004; Wilson et al., 2005; Wilson et al. 2006). Similarly, Bethard et al. (2004) use a small corpus manually annotated with propositional information to experiment with their newly defined task of propositional opinion detection.

Riloff and Wiebe (2003) describe a bootstrapping method that starts with a small set of lexical seed terms that are known to have been successfully used for subjectivity detection. They build high-precision subjectivity classifiers from these terms which are then used to bootstrap a set of linguistically richer structural clues for subjectivity detection using extraction pattern methods.

Yi et al. (2003) build a sentiment lexicon and couple that with a part-of-speech based sentiment pattern database and successfully apply their system to the product review domain. Similarly, in applying sentiment classification to the noisy domain of customer feedback data, Gamon (2004b) successfully uses part-of-speech trigrams but also improves results a bit further using deeper linguistic features such as semantic relation trigrams, constituent structure patterns, and tense information.

Building on adjective-driven approaches, Whitelaw et al. (2005) use semi-automated methods to build a database of appraisal groups, phrase-level items built from a lexicon of adjectives and their modifiers, resulting in a set of expressions such as “very good” and “not terribly funny”. They successfully apply this method to the movie review domain.

Among prior authors, Gamon’s (2004b) research is perhaps closest to the work described here, in that he uses some features based on a sentence’s logical form, generated using a proprietary system. However, his features are templatic in nature in that they do not couple specific lexical entries with their logical form. Hearst (1992) and Mulder et al. (2004) describe systems that make use of argument structure features coupled with lexical information, but neither provides implementation details nor experimental results. To my knowledge, no previous research in sentiment analysis draws an explicit and empirically supported connection between theoretically motivated work in lexical semantics and readers’ perception of sentiment.

1.4 Dissertation Roadmap

The dissertation proceeds as follows:

Chapter 2 provides psycholinguistic evidence for a predictive connection between the underlying semantic properties of transitive clauses and readers' perceptions of the author's implicit sentiment. This result lays the groundwork for the exploitation of features based on these semantic properties for the classification of text with respect to implicit sentiment. The results in this chapter also contribute support for the psychological reality of the semantic properties proposed in the lexical semantics literature.

Chapter 3 presents my method for extracting OPUS features, as defined in Section 1.1. I then introduce the semantic field of kill verbs and discuss my motivation for selecting them for investigation into sentiment classification. This leads to a discussion of the development of a corpus of documents related to the death penalty. I first demonstrate the value of OPUS features for document-level implicit sentiment classification using this corpus. I then demonstrate improvements over baseline features when using a manually developed but well-motivated list of target terms related to killing. Finally, I successfully extend the method to employ a list of target terms that is fully automatically derived. The results of this chapter confirm the hypothesis that classification of text for implicit sentiment can be improved with the use of features motivated by the results in Chapter 2.

Chapter 4 extends the methods of Chapter 3 and applies them to two additional domains using corpora publicly available to the research community: one focused on the Israeli-Palestinian conflict and the other comprising U.S. Congressional floor debates. In these experiments, I extend the use of OPUS features for classification, and introduce a novel classifier combination method. I obtain the best classification accuracies yet reported for these corpora. By successfully extending the method of Chapter 3 to additional domains, I provide strong evidence for the generality of the method.


Chapter 5 provides discussion and conclusions, and outlines a number of areas for future work.


2 Connecting Lexical Semantics to Perceived Sentiment: Psycholinguistic Evidence

2.1 Introduction

This chapter introduces and reviews ideas from the literature on lexical and constructional semantics, framing these ideas in terms of how they underlie my focus on linguistic features for sentiment classification. The primary goal of this chapter is to provide evidence for a predictive connection between underlying semantic properties of transitive clauses and readers' perceptions of the author's implicit sentiment. In the computational work reported in Chapters 3 and 4, I exploit features that are observable proxies for these underlying semantic properties in experiments on text classification with respect to implicit sentiment. In establishing this connection between underlying semantic properties and sentiment, I provide a foundation, grounded in linguistic theory and supported by psycholinguistic evidence, for the linguistic features I experiment with computationally. Moreover, I provide some evidence for why they work.

As a vehicle for investigating the connection between semantic properties and implicit sentiment, I experiment with data modeled in part on the Reuters headline data collected and studied by the media watchdog organization HonestReporting.com, mentioned in Chapter 1. That organization conducted a study that claimed to have found bias, against Israel and in favor of the Palestinians, by the ostensibly neutral Reuters news wire service. This study utilized a small corpus of Reuters headlines describing acts of violence, primarily killings, collected over a one-month period. Their analysis hinged upon factors such as the presence or absence of volitional agents and use of passive voice in the headlines. To quote one example from their analysis, they present the following comparison of these two headlines:

"Israel Kills Three Militants; Gaza Deal Seen Close" Israel named as perpetrator; Palestinians ("Militants") named as victims; described in active voice.

"Bus Blows Up in Central Jerusalem" Palestinian not named as perpetrator; Israelis not named as victims; described in passive voice (sic).

The analysis is flawed in several respects and is rightly criticized in a blog entry by Pullum (2004). However, it is interesting that the study of the Reuters data, while informal and flawed, focused on elements of language use such as named volitional agents, present or omitted objects, lexical choice of nominalization, and (an inaccurate analysis of) active vs. passive voice. Related elements of language use such as number of participants, volition, agency, kinesis, affectedness and telicity have been studied at length in the scientific literature in a wide variety of settings and frameworks. They have served as the underpinnings for an explanation of garden path effects in reduced relative clauses (Filip et al., 2001), for a seminal theory of thematic roles and argument selection (Dowty, 1991), and in a comprehensive analysis of the fundamental role of transitivity in grammar and discourse (Hopper and Thompson, 1980).

The work of Dowty (1991) is particularly relevant in this regard. Dowty’s theory of “thematic proto-roles” is based on the premise that the surface expression of (verbs’) arguments in linguistic expressions is closely connected to properties of those arguments and of the event. For example, if the referent of an argument is volitional and causal with respect to the event communicated by the verb, properties traditionally associated with the thematic role of agent, then it is more likely to surface in subject position. If the referent of the argument undergoes a change of state and is causally affected by another participant in the event, properties traditionally associated with a patient thematic role, then it is more likely to surface as an object. Similarly, Hopper and Thompson (1980) describe semantic transitivity as a complex of gradient properties that connect conceptual and semantic features to specific morphosyntactic reflexes in eventive clauses across a wide variety of languages. They demonstrate, for example, a correlation between syntactic transitivity and a high degree of agency (being human or otherwise autonomous and exercising independent causation over another entity). Moreover, they show that clauses exhibiting high degrees of any of the gradient properties they distinguish tend to be the clauses that are foregrounded in discourse. I thus predict that the expression of sentiment is connected in part with how particular entities are profiled in this manner, which can be revealed in text by the grammatical relations in which they appear.

(1) "Israeli Troops Shoot Dead Palestinian in W. Bank"

(2) "Israeli Girl Killed, Fueling Cycle of Violence"

Examples (1) and (2) illustrate the potential connections among semantic properties, surface form, and sentiment. Both (1) and (2) use highly causative verbs, but (1) contains a volitional agent, an overt result expression, and an overt, highly affected object; in contrast, (2) omits the overt agent, leaving no argument for which to infer volition.

The experiments in this chapter formally investigate the question of whether these underlying semantic properties and their observable forms are predictive of the implicit sentiment expressed by those forms. The chapter proceeds as follows. In Section 2.2, I discuss in greater detail the semantic properties introduced by Dowty (1991) and Hopper and Thompson (1980) that I link to the perception of implicit sentiment, and I review additional related literature. In Section 2.3, I describe the framework which guides the experimental designs within the chapter. I then preview the main results of the chapter with respect to this framework. In the heart of the chapter, Section 2.4, I report on the psycholinguistic investigation of the connection between semantic properties and perception of sentiment.

2.2 Background: Lexical Semantics and the Syntax and Semantics of Transitivity

Much of the research in lexical semantics concerns how the semantic properties of particular verbs, or classes of verbs, are linked to the syntactic phenomena associated with those verbs. Dowty (1991) and Hopper and Thompson (1980) propose theories that could be roughly described as ‘decompositional’ in the sense that they factor out specific elements of semantics that appear to trigger particular syntactic reflexes. Dowty (1991) focuses on the elimination of the traditional thematic roles of agent and patient, which are typically linked to grammatical roles such as subject and object. He describes a theory in which these thematic roles are replaced by prototype roles (‘proto-agent’ and ‘proto-patient’), each typically characterized by a set of specific attributes. His argument linking theory is essentially that whichever NP in a clause has the most proto-agent properties is linked as the subject, and whichever NP in a clause has the most proto-patient properties is linked as the object.
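As a schematic illustration only (the property inventory and the counting procedure below are a simplification for exposition, not Dowty's formal proposal), the argument selection principle can be rendered as a property-counting procedure:

# Schematic rendering of Dowty-style argument selection: the NP bearing more
# proto-agent entailments is linked to subject, the NP bearing more
# proto-patient entailments to object. Property sets are simplified.
PROTO_AGENT = {"volition", "sentience", "causes_event", "movement",
               "independent_existence"}
PROTO_PATIENT = {"change_of_state", "incremental_theme", "causally_affected",
                 "stationary", "dependent_existence"}

def link_arguments(np_properties):
    """np_properties: dict mapping NP -> set of entailed properties."""
    def score(props, role_set):
        return len(props & role_set)
    subject = max(np_properties, key=lambda np: score(np_properties[np], PROTO_AGENT))
    obj = max(np_properties, key=lambda np: score(np_properties[np], PROTO_PATIENT))
    return subject, obj

# "The boy broke the vase": the boy is volitional, sentient, causal, and moving;
# the vase changes state and is causally affected.
args = {
    "the boy": {"volition", "sentience", "causes_event", "movement"},
    "the vase": {"change_of_state", "causally_affected", "stationary"},
}
print(link_arguments(args))   # ('the boy', 'the vase')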

Hopper and Thompson (1980) focus on the core linguistic notion of Transitivity from a clausal perspective. 7 They distinguish a set of semantic components constituting Transitivity, and show how each component can have a ‘high’ or ‘low’ value with respect to its contribution to Transitivity. Thus each semantic component works to indicate the degree to which some ‘transfer’, ‘change’, or ‘effect’ takes place in an event represented by a Transitive linguistic encoding. 8 Their central claim is the Transitivity Hypothesis, which essentially says that if a clause exhibits a high Transitivity value for any of the components, then other components exhibited elsewhere in the clause will also be high in their Transitivity values. The converse is implicit in their hypothesis. Importantly, they show that clauses exhibiting high Transitivity are mostly the ones that encode the foregrounded or profiled events within a narrative, which will potentially be reflective of the kind of variation in event profiling discussed in Section 1.2.

Table 2 shows one possible mapping between Dowty’s proto-role properties and Hopper and Thompson’s components of Transitivity.

7 From this point forward, following Hopper and Thompson (1980) I will use ‘Transitivity’ with a capital ‘T’ to distinguish the idea of semantic transitivity from that of syntactic transitivity, which refers more specifically to the structural presence of an object of the verb.
8 I use the term ‘encoding’ in fairly standard fashion to indicate a particular surface realization, from among the set of possible realizations.


Dowty Proto-Role Property | Hopper and Thompson Component

Volitional involvement in the event or state | Volition
Sentience (and/or Perception) |
Cause event or change of state in object | Agency
Movement relative to the position of another participant | Kinesis
Exists independently of the event |

Undergoes change of state, causally effected by another participant | Affectedness of Object
Does not exist independently of the event | Object individuation
Stationary relative to movement of another participant | Kinesis
(Exists independently of the event?) | Subject-object individuation
(Incremental Theme?) | Punctuality, Aspect (Telicity)

Table 2. A Mapping Between Dowty (1991) and Hopper and Thompson (1980)

In a similar vein, Pustejovsky (1991) proposes a ‘decompositional’ approach, one that is centered within lexical entries themselves. Pustejovsky’s framework decomposes lexical entries into four substructures. His ‘Qualia’ structure appeals to the notions of object creation and individuation, among others. Levin (1993), in a by now classic study, organizes English verbs into classes according to diathesis alternations as a means of isolating syntactically relevant elements of verb meaning. In doing so, she appeals to the notions of kinesis, aspect, change of state, and agency as factors relevant for classifying verbs by their syntactic behavior.

More recent work has begun to establish the psychological reality of such lexical semantic theories. Kako (2006a) experimentally established the psychological reality of Dowty’s (1991) proto-roles hypothesis. In Kako’s experiments, subjects reliably attributed Dowty’s proto-agent properties to syntactic subjects, and reliably attributed Dowty’s proto-patient properties to syntactic objects, even when nonsense words were used. Wright (2001) found that distinctions between internally and externally caused change of state verbs could be explained in terms of concepts similar to those distinguished by Hopper and Thompson (1980). Wright found that internally caused change of state verbs were only marginally acceptable in transitive constructions. In contrast, the more prototypically transitive externally caused change of state verbs were found to be much more acceptable in transitive constructions. Wright appeals to the notions of agency, volition, and control as central to an explanation of the syntactic distribution and selectional restrictions of internally and externally caused change of state verbs.


Additionally, research has begun to examine the semantics of syntactic frames themselves (Fisher, Gleitman and Gleitman, 1991; Goldberg 1995; McKoon and Ratcliff 2003, 2007). Stefanowitsch and Gries (2003) propose associations between meanings and syntactic frames for a variety of different constructions. Particularly interesting are several constructions they claim inherently carry negative sentiment: 9

• The [X think nothing of V_gerund] construction – This idiomatic expression is primarily associated with contextually undesirable or risky actions, e.g. In their present mood people would think nothing of mortgaging themselves for years ahead in order to acquire some trifling luxury.

• The [into-] causative construction: [S_agent V O_patient/agent into A_gerund] – This construction is frequently associated with verbs of force, coercion, and trickery, e.g. He tricked me into employing him.

• The [cause X] construction – This construction predominantly indicates a negative view of an event, e.g. I am sorry to have caused you some inconvenience.

9 These constructions, however, are generally too rare to be usefully exploited for sentiment analysis.

Kako (2006b) showed that syntactic frames carry meaning irrespective of particular lexical entries. Lemmens (1998) attempts to synthesize both lexical meaning and constructional meaning into a theory of argument structure. Lemmens aims to allow more naturally for novel usages, integrating an explanation for coercion effects or sense extensions; in this flexibility, his approach is akin to Pustejovsky (1991). Corpus-driven examples are crucial in this kind of work, and Lemmens (1998) in particular uses a large set of corpus examples, almost exclusively, as the data to be explained. The trend in this work is toward a view that ‘argument structure’ is not a static property of verbs, but rather should be thought of as the expression of the dynamic interplay among lexical and structural factors, both of which carry meaning. The dynamic interplay is driven by the modulation of meaning at the core of creative language use and language change. The investigation reported here does not specifically commit to or further this line of research, but the interplay between underlying semantic properties and observable structural features remains an important theme throughout. My approach to text classification will use corpus-based features that will possibly reflect novel usages and will use machine learning to discriminate usage trends for verbs and nouns, putting actual usage at center stage.

In order to begin to establish a connection between event encoding choice and implicit sentiment, several reasonably well established facts point to useful starting points. Hopper and Thompson (1980) and Lemmens (1998) focus squarely on the concept of semantic Transitivity. Transitivity is fundamental to language, and causation is fundamental to transitivity. As described by Levin (1993), McKoon and MacFarland (2000, 2002) and others, the source of causation in an event can be internal or external to the object of the caused event. Change of state verbs in particular show this distinction. Consider these examples:

(1) The wind eroded the beach. (2) The boy broke the vase.

(3) The beach eroded. (4) The vase broke.

Both erode and break exhibit the causative-inchoative alternation, as shown in (1)- (4). The verb erode is considered an internally caused change of state verb because erosion is an event that comes about due to the inherent properties of the argument that changes state, the beach . The verb break , however, is an externally caused change of state verb because there necessarily must be some external agent that brings about the event denoted by break even when that agent is unexpressed, as in (4) (Levin and Rappaport Hovav, 1995). McKoon and MacFarland (2000, 2002) posit that externally caused change of state verbs carry a lexical semantic event template containing two subevents, as in (5), and internally caused change of state verbs carry a lexical semantic event template with only one subevent, as in (6):

(5) (y CAUSE (BECOME (x <STATE>)))

(6) (BECOME (x <STATE>))

McKoon and MacFarland attribute the ability of an internally caused change of state verb to appear in a transitive frame, as in (1), to a “content” portion of the verb’s lexical semantic structure. They distinguish the notion of “inherent participant,” in which verbs seem to lexically specify (through “content”) a restricted set of possible causers. Thus for literal uses of verbs like erode as in (1), there is a selectional restriction to entities like the wind . Further, McKoon and MacFarland report psycholinguistic evidence of increased processing times for externally caused change of state verbs over internally caused change of state verbs on a variety of tasks, regardless of the (in-)transitivity of the construction in which they appear. They attribute this distinction in processing times to the increased processing load of the two subevent template.

Lemmens (1998) characterizes the difference between internal and external causation in a distinct but analogous manner. He separates the two into a paradigmatic opposition between transitivity and ergativity, terms he uses in the following circumscribed manner. External causation is captured within the transitive paradigm, while internal causation is captured within the ergative paradigm. The transitive paradigm profiles properties of the agent, and the ergative paradigm profiles properties of the patient. Analogous to McKoon and MacFarland’s notion of the “content” portion of a verb licensing an “inherent participant,” Lemmens introduces the notion of the possibility of external instigation of the internally-caused event. This characterizes the subject in a sentence like (1) as a co-participant, along with the object, in the change of state event. In this sense, the notion that the event is possible due to some properties of the object (internal causation) is retained. In contrast, external causation remains necessary in sentences like (4).

By whatever means, speakers utilize these fundamental facts about language. Lexemes and constructions are chosen by speakers to elicit a particular construal within comprehenders. 10 That is to say, speakers take an interest in having listeners interpret their language in their preferred and intended way. As mentioned earlier in this section, Kako (2006a) has begun to show that specific semantic components of Transitivity are reliably attributed to subject position in two participant clauses. Extracting lexical-positional features thus should reflect these facts as they play out in real text, such as which event participants are more likely to be, or have been portrayed as, volitional agents. Because the prototypically causative verbs of killing are the objects of previous study and are likely to be good exemplars of the semantic components of Transitivity, the experiments discussed in this chapter investigate this verb class and the construal of transitive clauses projected from its members. The results of these experiments provide evidence for the psychological reality of a decompositional view of Transitivity, and establish the connection between the semantic components of Transitivity and the perception of sentiment.

2.3 Investigative Approach

Levin (1993) works to explain the behavior of verbs by attempting to isolate and identify the components of meaning that appear to trigger that behavior. The focus in that work is syntactic behavior, diathesis alternations in particular. Additionally, by design her research program attempts to push to the limit the notion that such behavior can be explained in terms of verb meaning alone. Levin works to explain syntactic reflexes through the preliminary grouping of verbs according to shared alternations and common semantic elements. Levin (1993) established the groundwork for additional research into determining how elements of meaning are syntactically reflected, and in what ways. Subsequent work has begun to suggest possible modifications to some of the preliminary Levin classes, and to reconsider the characterization of certain elements of meaning. The limits of an account based on verb meaning alone are reflected, within Levin’s preliminary verb classes, in the need to cross-classify verbs in unprincipled ways, and in the fact that ultimately, many of the verb classes show little coherence with respect to the syntactic behavior of their verbs. Many exceptions must be noted for such classes. In an example discussed in Lemmens (1998), Levin defines a POISON class of verbs that includes asphyxiate, crucify, drown, electrocute, strangle, shoot, suffocate, and more. Some of these verbs allow an inchoative usage, while others do not:

7(a) The witch poisoned Snow White. (Levin 1993)
7(b) *Snow White poisoned. (Levin 1993)

8(a) Somebody drowned Esther Williams. (Lemmens 1998)
8(b) Esther Williams drowned. (Lemmens 1998)

10 I use construal in a relatively non-technical sense to simply indicate the interpretation a comprehender attributes to a particular linguistic signal.

Levin (1993) thus cross-classifies verbs like drown in the SUFFOCATE class, described as verbs “related to the disruption of breathing.” In addition to being somewhat arbitrary, this verb class begins to approach the kind of specificity that the program of generalization over verb classes was intended to eliminate.

Moreover, while such cross-classification does begin to address the syntactic alternation issue, it does not highlight the important semantic distinctions at work. Verbs like poison, crucify , and electrocute embody strong elements of an instrument or means, and often then by extension an agent distinct from the object, aligning with external causation. Verbs like drown and asphyxiate , on the other hand, profile the attributes of the object undergoing the event, aligning with internal causation. This generalization over event participant profiles could be a more natural explanation for the differences in syntactic behavior shown in (7) and (8), and Lemmens uses the agent- or patient-profiling tendencies of the kill verbs as the primary criterion for classifying them.

Viewing these differences in syntactic behavior from the perspective of the event participant profiles projected from a verb and the way these profiles are amplified by particular encodings prompts the present investigation of the semantic components of Transitivity. Specifically, subject-object individuation, affectedness of object, volition, agency, and movement relative to other participants all seem to be directly related to the question of whether the lexical semantics of a verb profiles the agent or the patient. The presence or absence of specific participants in any given encoding could further modulate the degree to which an encoding exhibits each of these semantic components.

In Section 2.4, my results confirm the hypothesis that the form of an event encoding affects the perceived sentiment regarding participants in the event. I also show that experimental participants were sensitive to the element of internal causation, or patient-profiling, in ergative verbs. Participants rated these verbs as exhibiting more sympathetic sentiment than transitive verbs within my experimental paradigm. Using multiple linear regression models, I establish that the degree to which specific semantic components of Transitivity are exhibited within a clause predicts the implicit sentiment attributed to the clause. A principal components analysis further supports the connection between the semantic components and implicit sentiment, and provides additional evidence for Dowty's distinction between Proto-Agent and Proto-Patient properties.

Taken together, the results in this chapter establish that a set of underlying components of meaning, motivated by the lexical semantics literature, can be used as the basis for statistical classifier models that predict author sentiment on the collective basis of sentences that are not necessarily overtly subjective or evaluative.


2.4 Psycholinguistic Investigation: Linking Transitivity Components to Sentiment Construal

In this section, I present a formal investigation of the hypothesis that semantic components of Transitivity in sentences predict the construal of sentiment in those sentences.

The specific focus I take in investigating links to the interpretation of sentiment is motivated by two studies in particular. The first study is the scholarly work of Lemmens (1998), which as discussed in Chapter 1, studies verbs of killing in depth. Lemmens notes that the choice of this semantic field was motivated by the generally held belief that verbs of killing are causative verbs par excellence (Lakoff, 1987; Wierzbicka, 1980). Moreover, especially in literal usage, verbs of killing connote a strong element of physical force, considered fundamental to transitivity (Hopper and Thompson 1980). The second motivating study, discussed in Section 2.1, is the informal work that the media watchdog HonestReporting.com conducted with respect to the Reuters news wire service headlines about violent acts in the Israeli-Palestinian conflict. I thus investigate the link between semantic components of Transitivity and implicit sentiment using experimental materials fashioned as newswire headlines about acts of killing.

I designed and executed two experiments to carry out the investigative program just described. In Experiment 1 I asked participants to rate transitive sentences with respect to the semantic components of Transitivity using an established software suite for web-based data collection.11 I then analyzed this data for statistically significant differences in component ratings with respect to verb class and construction type, in the vein of the results reported by Kako (2006a, 2006b) and Wright (2001). The sentence data from Experiment 1 was used again in Experiment 2, an experiment designed specifically to measure sentiment in headline-like sentences related to acts of violence, modeled in part on the Reuters data. In the analysis of data in Experiment 2, I used the data from Experiment 1 to build statistical models in which semantic component ratings are the predictors for the measurements of sentiment in Experiment 2.

11 Both experiments were conducted using a customized version of WebExp (Keller, et al. 1998). WebExp is a collection of Java classes that implement functionality to conduct psycholinguistic experiments over the Web. It provides a number of important features, such as non-intrusive participant authentication, response timing, and management of stimulus randomization. Additionally, the Java-based approach prevents participants from returning to previous items to review or change responses, and disallows missing responses. And importantly, data collected using WebExp has been shown to be comparable to data collected in a traditional laboratory setting (Keller and Alexopoulou 2001).

Because WebExp supported only two basic experimental paradigms, sentence completion and magnitude estimation, I developed a custom user interface in Java, overlaid on the core WebExp components, which provided client-server communication, response timing, stimulus data management and randomization, and participant authentication.


2.4.1 Experiment 1

This experiment presented transitive sentences with verbs of killing and asked participants to rate them with respect to a variety of semantic components. It was conducted using the WebExp software.

2.4.1.1 Stimuli and Procedure

The primary stimuli were 24 sentences with verbs of killing. The sentences are listed in Appendix 1 – Experimental Sentences for Experiment 1. The 24 sentences use 11 verbs of killing. There are six verb instances for each of two verb classes: the ‘transitive’ class, which tends toward the externally caused end of the spectrum, and the ‘ergative’ class, which tends toward the internally caused end. 12 Each verb instance is then presented in two formats. The first is a transitive syntactic frame with a human agent as subject, and the second is a nominalization of the verb as subject with the verb ‘kill’ as the main verb. The kill verb sentences in Experiment 1 were also used as experimental material in Experiment 2, in a form altered slightly to appear to be newswire headlines (e.g. ‘The terrorists slaughtered nine hostages.’ here became ‘Terrorists slaughter nine hostages’ in Experiment 2). Further details on kill verb related materials are provided in the discussion of Experiment 2 in Section 2.4.2. Twenty-four distractor sentences with externally and internally caused change of state verbs, taken from McKoon and MacFarland (2000), were mixed in with the presentation of the kill verb sentences.

Each sentence was presented to participants above a list of questions about that sentence. Responses to each question were given on a 1 to 7 Likert scale. Sentences for each participant were presented in a unique random order, with no more than one sentence from the same block ever presented in a row. The blocks are defined by verb class and/or form, as shown in Appendix 1 – Experimental Sentences for Experiment 1. Participants were required to provide a response to every question. If participants attempted to proceed to the next item before completing all responses, a warning was issued and the missing item was highlighted.
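To make the block-randomization constraint concrete, the following minimal Python sketch shows one way to produce an order in which no two adjacent items come from the same block. It is illustrative only (the actual experiments relied on WebExp's stimulus randomization), and the item and block labels are hypothetical.

import random

def block_randomize(items, max_tries=1000):
    # Shuffle items so that no two adjacent items share a block label.
    # `items` is a list of (item_id, block_label) pairs; simple rejection sampling.
    for _ in range(max_tries):
        order = items[:]
        random.shuffle(order)
        if all(a[1] != b[1] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("could not satisfy the adjacency constraint")

# Hypothetical stimuli: 24 items spread over four blocks
# (e.g. transitive/ergative verb class x active/nominalized form).
stimuli = [(f"item{i}", f"block{i % 4}") for i in range(24)]
print([block for _, block in block_randomize(stimuli)])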

The questions asked of each item probed both Dowty’s proto-role properties and Hopper and Thompson’s Transitivity components. A sample stimulus is shown in Figure 1. The semantic components and/or proto-role properties corresponding to each question are shown in Table 3. Drawing on Kako (2006a), the Dowty proto-agent property “Causing an event or change of state in another participant” was factored out into two questions, one related to a change of state, and another to the causing of an event. Both can be considered factors in Hopper and Thompson’s Agency component.

The phrase “IN THIS EVENT” is used repeatedly in order to encourage participants to focus on the particular event described in the sentence, rather than on the entities or events denoted in general. The subject and object phrases are repeated within each question, rather than relying on them ‘carrying down’ from the question lead-in.

12 The transitive class had five members, and the ergative class had six members. Thus, for the transitive class, the prototype verb ‘kill’ is used twice in order to balance the number of sentences per class. See section 2.4.2 for further detail on these materials.

Question in Experiment 1 | Dowty Proto-Role Property | Hopper and Thompson Component

IN THIS EVENT how likely is it that…
[subject] chose to be involved? | Volitional involvement in the event or state | Volition
[subject] was aware of being involved? | Sentience (and/or Perception) |
[subject] caused a change in the [object]? | Cause change of state in object | Agency
[subject] made something happen? | Cause event | Agency
[subject] moved? | Movement relative to the position of another participant | Kinesis
[subject] existed before this event took place? | Exists independently of the event |

IN THIS EVENT how likely is it that…
[object] was changed in some way? | Undergoes change of state, causally effected by another participant | Affectedness of Object
[object] was created as a result of? | Does not exist independently of the event |
[object] was stationary? | Stationary relative to movement of another participant | Kinesis

IN THIS EVENT, how distinct or specific is [object]? | | Object individuation
IN THIS EVENT, how distinct is [subject] from [object]? | (Exists independently of the event?) | Subject-object individuation

How likely is it that…
THIS EVENT happened quickly? | (Incremental Theme?) | Punctuality
THIS EVENT was completed or ended? | | Aspect (Telicity)

Table 3. Experimental questions and corresponding semantic components and/or proto-role properties


Figure 1. Sample stimulus item from Experiment 1

2.4.1.2 Participants

A total of 18 participants completed the experiment. All were volunteers and were native speakers of English.

2.4.1.3 Analysis and Results

Linear mixed model ANOVAs were run with kill verb class as the fixed effect and each semantic component as a dependent variable. Experiment 1 is a repeated measures design in the sense that each participant is measured repeatedly against the different verb classes, and indeed repeatedly against each specific verb class. The mixed model allows for the participant effect and works to separate the variance internal to each participant from the variance in the dependent variables. Over the two kill verb classes, transitive and ergative, two components differed significantly: Kinesis (Movement), F(1,430) = 6.471, p = .011, and Punctuality, F(1,430) = 25.617, p < .001. Telicity was nearly significant, F(1,430) = 3.240, p = .073. When the ergative verbs are subdivided into two subgroups as suggested by Lemmens (1998), additional significant differences are revealed for Kinesis (Stationary), F(2,429) = 4.066, p = .018, and Telicity, F(2,429) = 3.355, p = .036. Kinesis and Punctuality are most reliably significantly different across verb classes. Along with Telicity, these three components logically make sense as important components for the prototypically transitive kill verbs, given their strong association with physical force.
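As a rough illustration of this kind of analysis (not the original one, which was run in a standard statistics package), the Python sketch below fits a linear mixed model with verb class as a fixed effect and participant as a random intercept using statsmodels; the file and column names are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format ratings: one row per participant x sentence, with
# the rating for a single semantic component (here, Punctuality).
df = pd.read_csv("experiment1_punctuality.csv")   # columns: participant, verb_class, rating

# Verb class as fixed effect, participant as random intercept, approximating
# the repeated-measures mixed-model ANOVA reported above.
model = smf.mixedlm("rating ~ verb_class", data=df, groups=df["participant"])
result = model.fit()
print(result.summary())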


Because the data were perfectly balanced for the appearance of human agents, Volition and Sentience were neutralized in the data. This in effect seems to have transferred some of the differentiation of the components from being sourced in the verbs to being sourced in the syntactic frame in which they were presented. Indeed, analyzing the difference between the means with the stimulus sentence form as the fixed effect supports this view. The means, F ratios, and p values for this analysis are shown in Table 4.

Kill Verb Items | Active Transitive Form | Nominalized Form | Significance
Volition | 6.523 | 1.241 | F(1,430) = 3261.111, p < .001
Sentience | 6.472 | 1.269 | F(1,430) = 2575.531, p < .001
Cause Change of State | 6.903 | 6.755 | n.s.
Cause Event | 6.912 | 6.106 | F(1,430) = 30.630, p < .001
Kinesis (Movement) | 6.296 | 2.213 | F(1,430) = 712.190, p < .001
Independent Existence | 6.639 | 2.037 | F(1,430) = 944.241, p < .001
Causal Effect | 6.907 | 6.875 | n.s.
No Independent Existence | 1.319 | 1.176 | n.s.
Kinesis (Stationary) | 3.551 | 3.236 | F(1,430) = 5.503, p = .019
Object Individuation | 5.625 | 5.454 | n.s.
Subject - Object Individuation | 6.514 | 5.75 | F(1,430) = 20.096, p < .001
Punctuality | 5.176 | 5.014 | n.s.
Telicity | 6.273 | 6.569 | F(1,430) = 6.912, p = .009
Table 4. ANOVA results for the semantic components for kill verbs with Form as fixed effect.

When the transitive syntactic frame is held constant but the verb is varied, the data support the conclusion that the presence of an ergative verb in the frame has the effect of reducing Transitivity ratings, and thus such sentences are considered to have less of the ‘effect’ or ‘transfer’ associated with Transitivity. For example, the degree to which a human subject exhibits volitional involvement in the event can be modulated downward by the choice of verb. This difference is the type of distinction I predict to have implications for sentiment detection. Consider each of the kill verb sentence forms, the active transitive (transitive effective) and nominalized form, in isolation. The components for which significant differences were found for each form by the transitive-ergative verb class distinction are shown in Table 5 and Table 6. This data supports Lemmens’s reorganization of the kill verbs primarily with respect to whether they lexically profile the agent or the patient. For the nominalized form, the transitive verb class was rated significantly higher for both Kinesis (Movement), which is movement of the subject, and Independent Existence of the subject. This reflects the fact that the ergative verbs denote actions that are intimately tied to the patient. The activity denoted is rated less likely to involve movement independent from the patient, and it is rated less likely to exist independently of the patient. For the active forms, we see that the transitive verb class is rated significantly higher for Volition and Sentience. All of these sentences had human agents, yet the human agents for the sentences with ergative verbs, which profile properties of the patient, were rated lower on the scale for Volition and Sentience.


A pattern appears to emerge from these results in which patient-profiling, or internally caused, changes of state are considered to be events that are slower to unfold than agent-profiling, or externally caused, changes of state. Punctuality differs for both forms, with the transitive class rated higher in both cases. This indicates that greater punctuality is attributed to verbs such as shoot, assassinate, and poison than to verbs like choke, suffocate, and starve. Telicity was rated significantly higher for the transitive verb class for the active form only. This appears to stem from the fact that all the transitive verbs are lexically telic with respect to the killing event except perhaps for poison and shoot, whereas all of the ergative verbs are lexically atelic with respect to the killing event except for drown and perhaps suffocate. That telicity was significantly different for the active form only can be attributed to all the nominalized sentence forms being unambiguous with respect to the telicity of the killing event (more details about the verbs used and the stimulus data are found in the discussion of Experiment 2 in Section 2.4.2).

Nominalized Form Only | Ergative | Transitive | Significance
Kinesis (Movement) | 1.63 | 2.796 | F(1,214) = 23.382, p < .001
Independent Existence | 1.583 | 2.491 | F(1,214) = 12.977, p < .001
Punctuality | 4.704 | 5.324 | F(1,214) = 9.665, p = .002
Table 5. Significantly different component ratings, nominalized kill verb stimuli only

Active Form Only | Ergative | Transitive | Significance
Volition | 6.324 | 6.722 | F(1,214) = 9.010, p = .003
Sentience | 6.259 | 6.685 | F(1,214) = 8.188, p = .005
Punctuality | 4.769 | 5.583 | F(1,214) = 16.343, p < .001
Telicity | 6.046 | 6.5 | F(1,214) = 7.213, p = .008
Table 6. Significantly different component ratings, active kill verb stimuli only

Overall, the significant differences for the ratings of the semantic components of transitivity observed in Experiment 1 support the claims of the theoretical work that distinguished the components. The ratings correlate with verb classes and sentence forms in the ways we would expect, and these results add further support to the literature providing evidence that the components are psychologically real.

2.4.2 Experiment 2

Within the broader investigational goal of linking semantic components to implicit sentiment, Experiment 2 was designed with two specific goals in mind. First, I wanted to establish that different surface event encodings for violent acts between opposing sides in a conflict reliably trigger different responses in comprehenders. It seems intuitive that whether or not an encoding explicitly refers to the perpetrator would make a difference in how the encoding portrays the event in the minds of comprehenders. The first goal of Experiment 2 was to establish this fact experimentally. (Recall that the manner in which the perpetrator is mentioned in event encodings was at the heart of the claims in the study by HonestReporting.com.)

Second, in addition to showing that the encoding manipulations reliably alter the perceived sentiment toward the event, Experiment 2 shows that perceptions of implicit sentiment can be predicted by the semantic component ratings gathered in Experiment 1.

2.4.2.1 Stimuli and Procedure

Short newspaper-like paragraphs describing conflicts resulting in death were presented to participants. Specific circumstances of each scenario aside, for our purposes here we refer to the main participants in each event as the perpetrator and the victim. Alternate newspaper headlines were shown for the event paragraphs, and experimental subjects were asked to rate them with respect to how sympathetic they perceive them to be to the perpetrator and to the victim. Here sympathy, as a measure of favor or bias, is the particular sentiment examined (cf. the discussion in Section 1.2).

For the paragraph description of each conflict, the following criteria held:

• There was an obvious nominal referent for both the perpetrator and the victim
• It was clear that the victim dies
• In the scenario, the perpetrator was directly responsible for the resulting death, not indirectly (e.g. through negligence)
• Some paragraphs were based on actual news stories, and some were not. In either case:
  o No proper names (persons and places) were used, to avoid any inadvertent emotional reactions or legal issues
  o The description was not completely devoid of emotional impact. We wanted readers to have some emotional basis with which to judge the headlines.

The verbs of killing used were as follows, taken from Lemmens (1998), and based on his suggested reorganization of verbs of killing in Levin (1993) according to their degree of agent- or patient-profiling properties, as discussed in Section 2.3.

o kill
o slaughter
o assassinate
o shoot [action]
o poison [instrument]
----
o strangle
o smother
o choke
o drown
o suffocate
o starve


The verbs are listed above in two blocks, the first block ‘transitive’ and the second block ‘ergative’. The verbs are ordered from most ‘transitive’ or agent-profiling (kill, …) to most ‘ergative’ or patient-profiling (strangle, …) in the sense of Lemmens (1998). This corresponds to a continuum in which the prototypical meaning of the verb goes from most externally caused to most internally caused.

For each paragraph description, there are three alternate headline types:

• Transitive effective (causative)
• Transitive effective with nominalized agent
• Passive with no ‘by’ phrase

Each verb is used once per headline type, except the verb ‘kill’ which is used twice per headline type. This was done to balance the number of instances for transitive and ergative verbs, as there are six ergative verbs in the list above, but only five transitive ones.

The first, the transitive effective (causative) construction, explicitly refers to the perpetrator and victim as the subject and object, respectively. The second is also a transitive effective, but with a nominalization of the act as the subject. The third is a passive with no ‘by’ phrase included. An example of each is:

(1) Terrorists slaughter nine hostages
(2) Slaughter kills nine hostages
(3) Nine hostages are slaughtered

These forms were chosen in order to test the effects of explicitly naming both participants, or leaving the perpetrator unnamed by either nominalizing the act or using a passive construct with no perpetrator named. The complete set of headline stimulus items is listed in Appendix 2 – Experimental Stimulus Headlines .

As shown in (1)-(3), the nominal referent forms for the participants were used consistently among the items. Nominalized forms, as in (2), always use ‘kill’ as their main verb, as this is the prototype verb in the semantic field of killing. It provides the least information about how the killing was accomplished, and as the prototype, it has the largest and most varied distribution in the field. Conversely, for the scenarios whose headlines themselves use the verb ‘kill’, the nominalized form requires some other nominalization as its subject, so that the headline does not read ‘Killing kills victim’. For these two cases in the data, an appropriate nominalization drawn from the event description was used (e.g., ‘explosion’):

(1) Terrorists kill eight marketgoers
(2) Explosion kills eight marketgoers
(3) Eight marketgoers are killed


The stem of the chosen nominalization always appeared in the event description in either verbal or nominal form.

It was important to not make it obvious to participants in the experiment what was being manipulated. Thus distractor data were developed to be interleaved with the stimulus items. Distractor data adhered to the same criteria as described above for verbs of killing. The differences were:

o A conflict was at the core, however:
  o It did not result in death
  o It might involve violence, or it might be more abstract (e.g. litigation)
o Non-causative verbs were used (e.g. accuse, sue, deny, pressure, etc.)
o There were not three alternate headlines, but rather one per distractor event description. Half the distractor headlines ranged a bit more freely in structure than the stimulus data (e.g., a prepositional phrase was added) in order to distract from the more rigid format of the stimuli, while the other half conformed to the format.
o The single headlines for the distractor items were evenly distributed among the three basic headline forms

The headline data were divided into three variants for the experiment, such that each participant saw data from all three conditions (headline forms), but each participant saw exactly one condition for each event description, and each participant saw the same number of headlines for each of the three conditions. Thus, in a sense, this design is between-subjects for specific stimulus items, but within-subjects overall for the three conditions to be tested. For the three conditions corresponding to the three headline types, the stimulus data were block randomized as in Experiment 1 such that no two items in sequence were of the same form (Keller, et al. 1998). There were 24 distractor items, also randomized, providing a two to one ratio of distractors to stimuli.
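A minimal sketch of this kind of counterbalanced assignment (rotating the three headline forms across event descriptions, Latin-square style) is shown below; it is purely illustrative, and the event identifiers are hypothetical rather than taken from the actual materials.

FORMS = ["transitive_effective", "nominalized_effective", "passive"]

def assign_conditions(event_ids, participant_index):
    # Rotate headline forms across events so that each participant sees exactly
    # one form per event description and an equal number of each form overall.
    return {event: FORMS[(i + participant_index) % len(FORMS)]
            for i, event in enumerate(event_ids)}

events = ["event%d" % i for i in range(12)]   # 12 hypothetical kill-verb scenarios
print(assign_conditions(events, participant_index=0))
print(assign_conditions(events, participant_index=1))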

The headline data for form types (1) and (2) (i.e., not the passive condition) are the headlines that appeared in sentential form as the kill verb stimulus data in Experiment 1, as discussed in Section 2.4.1.1.

Using the WebExp framework, stimuli were presented in sequence, and two questions were asked about each story and headline pair. The first asked how sympathetic the headline was to the perpetrator, and the second asked how sympathetic the headline was to the victim. The question about the perpetrator is the one of interest here, as the manipulations were performed with respect to the perpetrator, and our hypothesis centers on sentiment toward the perpetrator. The question about the victim serves as a distractor. A sample stimulus item presentation from the experiment is shown in Figure 2. The interface emphasized the word HEADLINE in order to remind participants that the question is about the headline relative to the event, not the event itself or the event description.


Figure 2. Sample stimulus presentation from Experiment 2

2.4.2.2 Participants

A total of 31 participants completed the experiment. All were volunteers and were native speakers of English.

2.4.2.3 Analysis and Results: Effect of Surface Encoding on Perceived Sentiment

A mixed model ANOVA was run with headline form as the fixed effect. The overall effect of form was significant. The means, F ratio, and p value are shown in Table 7. The pairwise differences between forms were tested for significance using a Bonferroni correction. The Transitive Effective Form received significantly lower sympathy ratings than both the Nominalized Effective Form (p < .001) and the Passive Form (p < .001). No significant difference was found between the Nominalized Effective Form and the Passive Form.
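For illustration only, a rough Python approximation of such Bonferroni-corrected pairwise comparisons is sketched below (the original analysis used mixed-model contrasts in a statistics package); the file and column names are hypothetical.

from itertools import combinations
import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Hypothetical ratings: one row per response, with columns 'form' and 'sympathy'.
df = pd.read_csv("experiment2_sympathy.csv")

pairs, pvals = [], []
for a, b in combinations(sorted(df["form"].unique()), 2):
    _, p = stats.ttest_ind(df.loc[df["form"] == a, "sympathy"],
                           df.loc[df["form"] == b, "sympathy"])
    pairs.append((a, b))
    pvals.append(p)

# Bonferroni adjustment over the three pairwise comparisons.
reject, p_adj, _, _ = multipletests(pvals, method="bonferroni")
for pair, p, sig in zip(pairs, p_adj, reject):
    print(pair, round(p, 4), "significant" if sig else "n.s.")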


Rating | Transitive Effective Form | Nominalized Effective Form | Passive Form | Significance
Sympathy for Perpetrator | 2.29 | 3.524 | 3.444 | F(2,369) = 33.902, p < .001
Table 7. ANOVA results for effect of headline form on sympathy for perpetrator.

This result confirms the hypothesis that the form of the event encoding affects sentiment about the subject of the event encoding, and specifically that the most Transitive surface encoding predicts the least perceived sympathy for the perpetrator. We also fail to reject the null hypothesis that the Nominalized and Passive Forms do not differ with respect to attitude toward the subject.

A mixed model ANOVA was run with verb class as the fixed effect. Here again, verb class had a significant effect on attitude. The means, F ratio, and p value are shown in Table 8.

Rating | Ergative Verb Class | Transitive Verb Class | Significance
Sympathy for Perpetrator | 3.258 | 2.914 | F(1,370) = 5.430, p = .020
Table 8. ANOVA results for effect of verb class on sympathy for perpetrator.

I interpret this result to mean that participants were sensitive to the element of internal causation, or patient-profiling, in the ergative verbs. In the Nominalized and Passive Forms, the ergative verbs render the encodings ambiguous with respect to the very existence of a volitional or sentient agent. Thus on the whole these verbs are rated as more sympathetic toward the perpetrator than the transitive verbs. This result also confirms that participants were successful in evaluating the headlines directly, rather than basing their responses on the conflict vignettes.

2.4.2.4 Analysis and Results: Effect of Underlying Semantic Components on Perceived Sentiment

Having established that the different encoding forms had a significant effect on the sympathy ratings, the next step in the analysis was to investigate the central question of this chapter: Can the sympathy ratings gathered in Experiment 2 be predicted by the semantic component ratings gathered in Experiment 1? I developed several multiple linear regression models to address this question.

It was necessary to use two distinct participant pools for Experiments 1 and 2 because each experiment revealed information about the manipulations in the other which would be likely to alter the outcome if known to the participants. Because participant data could not be correlated for participants between the experiments, the regression models were run over the mean values of each observation in the experimental data.

The items of comparison in the regression are the 24 stimulus sentences that bridged both experiments. These are the 12 transitive effective kill verb sentences, and the 12 nominalized kill verb sentences. Note that the passive sentences were used only within Experiment 2 because many of the questions in Experiment 1 were inapplicable to passive sentences.

The general rule of thumb for multiple regression models is that between five and ten observed items are necessary for each independent variable in the model. With 24 observed items, we must limit ourselves here to three or four independent variables to serve as predictors of the dependent variable, which is the rating of sympathy for the perpetrator. Experiment 1 gathered data on 13 semantic components and we also have the fixed effects of verb class and individual verb, all as potential predictors. However, it is not predicted that all semantic components will be active simultaneously. Hopper and Thompson (1980) and Dowty (1991) explicitly describe their lists of attributes as a set of possible items, of which only some or perhaps even just one will be exhibited within any particular clause. Because the clauses and verbs used in the data were generally uniform, we would expect some consistency with respect to potential predictor variables. The key to this analysis, then, is to find the right set of predictors.

I first produced scatter plots of the dependent variable against each semantic component predictor variable. These are shown in Appendix 3 – Scatter Plots. Based on which variables appeared to have the strongest relationships with the dependent variable, I ran several regression models with three to four predictors in an exploratory manner. While significant models with R² values in the range of .6 to .8 could be found rather easily, often not all predictor variables were significant within the models. In addition, considering the possible set of predictors from a logical perspective, it seemed that a model using Volition, Verb, and Telicity made sense. Volition is clearly important to the data in this study. Verb seemed likely to remain important because, despite the semantic factoring, each distinct verb still contributes unique content. As one participant remarked, “‘Strangle’ just always says something nasty.” Telicity seemed important because it reflects both lexical and grammatical factors. And indeed, these three variables produce a strongly predictive model, as summarized in Figure 3.


Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | .880(a) | .775 | .741 | .42195 | 1.896
a Predictors: (Constant), VERBITEM, VOLITION, TELICITY
b Dependent Variable: SYMPATHY

ANOVA
Model | Sum of Squares | df | Mean Square | F | Sig.
1 Regression | 12.260 | 3 | 4.087 | 22.953 | .000(a)
  Residual | 3.561 | 20 | .178 | |
  Total | 15.821 | 23 | | |
a Predictors: (Constant), VERBITEM, VOLITION, TELICITY
b Dependent Variable: SYMPATHY

Coefficients
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 (Constant) | .517 | 1.686 | | .307 | .762
  VOLITION | -.200 | .036 | -.653 | -5.594 | .000
  TELICITY | .589 | .257 | .271 | 2.290 | .033
  VERBITEM | -.105 | .029 | -.392 | -3.606 | .002
a Dependent Variable: SYMPATHY
Figure 3. Regression model with Volition, Telicity, and Verb as predictors

This model accounts for 77.5% of the variation in the sympathy ratings. The model itself is significant with p < .001, and each of the predictor variables is significant within the model. The adjusted R² value is often used in cases where the number of items is small relative to the number of predictor variables. It is reasonable to assume that this is the case with the amount of data in this study, but even with the adjusted value, the model accounts for 74.1% of the variance in the dependent variable.
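As a hedged illustration of this kind of model (the original analysis was run in a standard statistics package over the 24 item-level means), the Python sketch below fits an ordinary least squares regression with statsmodels and reports R² and adjusted R²; the file name, column names, and the numeric coding of the Verb variable are all assumptions.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical item-level means: one row per bridging sentence (24 rows),
# with mean Experiment 1 component ratings, a numeric verb code, and the
# mean Experiment 2 sympathy rating.
items = pd.read_csv("item_means.csv")

# Multiple linear regression with Volition, Telicity, and Verb as predictors
# of sympathy, along the lines of the model summarized in Figure 3.
ols = smf.ols("sympathy ~ volition + telicity + verb_item", data=items).fit()
print(ols.rsquared, ols.rsquared_adj)   # cf. R² = .775, adjusted R² = .741 above
print(ols.summary())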

Because the scatterplots for Telicity and Punctuality are so similar, I ran another model substituting Punctuality for Telicity as a predictor variable. This model, summarized in Figure 4, is virtually identical to the previous model. It has a slightly higher R² value, and slightly lower p values for most of the individual predictors.

With Punctuality and Telicity apparently serving nearly identically as predictors, I sought to explore a bit further. Because Kinesis (Movement) behaved most nearly identically to Volition (aside from Sentience), based on its scatterplot and bivariate R² value, I next ran a stepwise regression model with Volition, Punctuality, Verb, and Kinesis (Movement) as candidate predictors. The stepwise regression procedure is an algorithm that iteratively adds and removes predictors in search of a well-fitting subset. Interestingly, it settled on a model with Kinesis (Movement), Punctuality, and Verb as the best set of predictors, swapping out Volition in favor of Kinesis. The model is summarized in Figure 5. This model accounts for around 80% of the variation in the dependent variable, with all predictors significant in the model.
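The original stepwise procedure was run in a statistics package; the simplified forward-selection sketch below (adding at each step the candidate with the smallest p-value under an entry threshold, and omitting the removal step of a full stepwise procedure) is included only to make the idea concrete, with hypothetical file and column names.

import pandas as pd
import statsmodels.formula.api as smf

def forward_select(df, response, candidates, enter_p=0.05):
    # Greedy forward selection: repeatedly add the remaining candidate whose
    # coefficient has the smallest p-value, as long as it falls below enter_p.
    selected, remaining = [], list(candidates)
    while remaining:
        best_p, best_var = None, None
        for var in remaining:
            formula = response + " ~ " + " + ".join(selected + [var])
            p = smf.ols(formula, data=df).fit().pvalues[var]
            if best_p is None or p < best_p:
                best_p, best_var = p, var
        if best_p is None or best_p >= enter_p:
            break
        selected.append(best_var)
        remaining.remove(best_var)
    return selected

items = pd.read_csv("item_means.csv")   # hypothetical item-level means
print(forward_select(items, "sympathy",
                     ["volition", "punctuality", "verb_item", "kinesis_movement"]))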

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | .883(a) | .780 | .747 | .41750 | 1.933
a Predictors: (Constant), PUNCTUAL, VERBITEM, VOLITION
b Dependent Variable: SYMPATHY

ANOVA
Model | Sum of Squares | df | Mean Square | F | Sig.
1 Regression | 12.334 | 3 | 4.111 | 23.588 | .000(a)
  Residual | 3.486 | 20 | .174 | |
  Total | 15.821 | 23 | | |
a Predictors: (Constant), PUNCTUAL, VERBITEM, VOLITION
b Dependent Variable: SYMPATHY

Coefficients
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 (Constant) | 3.264 | .501 | | 6.510 | .000
  VOLITION | -.241 | .032 | -.787 | -7.462 | .000
  VERBITEM | -.096 | .028 | -.357 | -3.393 | .003
  PUNCTUAL | .223 | .093 | .254 | 2.405 | .026
a Dependent Variable: SYMPATHY
Figure 4. Regression model with Volition, Punctuality, and Verb as predictors


Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | .905(a) | .819 | .792 | .37831 | 1.511
a Predictors: (Constant), VERBITEM, KINMOVE, PUNCTUAL
b Dependent Variable: SYMPATHY

ANOVA
Model | Sum of Squares | df | Mean Square | F | Sig.
1 Regression | 12.958 | 3 | 4.319 | 30.181 | .000(a)
  Residual | 2.862 | 20 | .143 | |
  Total | 15.821 | 23 | | |
a Predictors: (Constant), VERBITEM, KINMOVE, PUNCTUAL
b Dependent Variable: SYMPATHY

Coefficients
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 (Constant) | 3.024 | .451 | | 6.709 | .000
  KINMOVE | -.314 | .037 | -.838 | -8.495 | .000
  PUNCTUAL | .349 | .087 | .397 | 4.020 | .001
  VERBITEM | -.095 | .026 | -.355 | -3.723 | .001
a Dependent Variable: SYMPATHY
Figure 5. Regression model with Kinesis (Movement), Punctuality, and Verb as predictors

2.4.2.5 Analysis and Results: Empirical Correspondence with Lexical Semantic Analysis

Having explored these various models and having observed what appears to be the clustering of predictor variables, I ran a principal components analysis (PCA) to determine if in fact such clustering is operative, and if so, to what extent the clustering is consistent with the perspective of the lexical semantics literature that identified these variables, as discussed in Sections 2.2 and 2.3.

Two component analyses were run over all 13 semantic component variables along with the verb variable. First, before the analyses, the KMO measure and Bartlett’s test were run to ensure that the data is amenable to a principal components analysis. Results are shown in Table 9. Both tests determine whether the strengths of the correlations between the variables are large enough to proceed with a PCA. A KMO value closer to 1.0 indicates that a PCA is admissible for the data. For our variables here, the KMO value is 0.711, which is reasonably high. Also, the Bartlett’s test results, with p < .001, indicate that we can reject the hypothesis that the variables are not correlated enough to perform a PCA.

KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy | .711
Bartlett's Test of Sphericity | Approx. Chi-Square 351.769, df 91, Sig. .000
Table 9. KMO and Bartlett's Test for the set of all Predictor variables

Using both the Scree Plot method and the Kaiser criterion of retaining all factor components in a PCA for which the Eigenvalue is greater than 1.0 results in a PCA in which we retain four components. This factorization accounts for about 83% of the total variance among the variables. As the factor loadings in the component matrix (Table 10) and rotated component matrix (Table 11) make clear, the predictor variables cluster reasonably well along logical groupings, though the groupings are altered somewhat by the rotation. 13 The component matrix shows that component 1 comprises mostly the agent-oriented predictors. Component 2 comprises mostly the patient-oriented predictors. Component 3 joins Punctuality and Telicity into what might be usefully called the temporal predictors. Lastly, component 4 captures the verb itself.

The rotated component matrix moves Kinesis (Stationary) from component 1 to component 2, and No Independent Existence appears to shift from component 1 to component 4. In the former case, this shifts a patient-oriented variable to component 2 with the other patient-oriented variables. The case with No Independent Existence is less straightforward. Its factor loading value suggested it should stay in component 1, yet the statistical software sorted it as if it were in component 4. In any case, this variable did not exhibit significant differences over sentence form or kill verb class, and its factor loading values are quite low.

13 The rotation of the component matrix attempts to make the components as orthogonal as possible.


Component Matrix

Component | 1 | 2 | 3 | 4
Cause Event | .930 | -.031 | .146 | -.066
Sentience | .925 | -.307 | .133 | .002
Volition | .920 | -.326 | .127 | .001
Subject - Object Individuation | .916 | .091 | -.011 | -.071
Independent Existence | .903 | -.222 | .107 | .033
Kinesis (Movement) | .871 | -.328 | .274 | -.022
Kinesis (Stationary) | .639 | .360 | -.471 | .190
No Independent Existence | .350 | -.274 | -.264 | .317
Causal Effect | .438 | .790 | -.088 | -.264
Object Individuation | .453 | .695 | .007 | .124
Cause Change of State | .579 | .618 | -.128 | -.210
Punctuality | .056 | .128 | .901 | -.061
Telicity | -.390 | .450 | .673 | .140
Verb | .131 | .244 | .103 | .906
Extraction Method: Principal Component Analysis. a 4 components extracted.
Table 10. Factor loadings for each variable, by component (4).

Rotated Component Matrix

Component | 1 | 2 | 3 | 4
Volition | .973 | .092 | -.112 | .034
Sentience | .971 | .109 | -.101 | .038
Kinesis (Movement) | .968 | .045 | .037 | .013
Independent Existence | .912 | .173 | -.102 | .079
Cause Event | .873 | .359 | -.009 | .016
Subject - Object Individuation | .773 | .490 | -.122 | .022
Causal Effect | .062 | .934 | .085 | -.101
Cause Change of State | .242 | .845 | -.026 | -.066
Object Individuation | .127 | .775 | .116 | .270
Kinesis (Stationary) | .298 | .647 | -.462 | .274
Punctuality | .231 | -.027 | .884 | .010
Telicity | -.353 | .067 | .809 | .210
No Independent Existence | .341 | -.087 | -.403 | .285
Verb | .023 | .097 | .072 | .945
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. a Rotation converged in 5 iterations.
Table 11. Rotated factor loadings for each variable, by component (4)

Because the multiple regression models of the Experiment 2 data used three predictor variables, a PCA that retained only three components was also run. In this case, the three retained components accounted for about 75% of the total variance. The component matrix is shown in Table 12 and the rotated component matrix is shown in Table 13. Again we see the same general factorization into agent-, patient-, and temporally oriented clusters. Interestingly, the Verb variable is placed into component 2, along with the mostly patient-oriented variables. This could be reflective of the well-known fact from word sense disambiguation research that the object of a verb generally shares more information with the verb sense than does the subject (Olsen and Resnik 1997; Resnik 1997).

Component Matrix

Component | 1 | 2 | 3
Cause Event | .930 | -.031 | .146
Sentience | .925 | -.307 | .133
Volition | .920 | -.326 | .127
Subject - Object Individuation | .916 | .091 | -.011
Independent Existence | .903 | -.222 | .107
Kinesis (Movement) | .871 | -.328 | .274
Kinesis (Stationary) | .639 | .360 | -.471
No Independent Existence | .350 | -.274 | -.264
Causal Effect | .438 | .790 | -.088
Object Individuation | .453 | .695 | .007
Cause Change of State | .579 | .618 | -.128
Verb | .131 | .244 | .103
Punctuality | .056 | .128 | .901
Telicity | -.390 | .450 | .673
Extraction Method: Principal Component Analysis. a 3 components extracted.
Table 12. Factor loadings for each variable, by component (3)


Rotated Component Matrix

Component | 1 | 2 | 3
Volition | .972 | .103 | -.110
Sentience | .971 | .121 | -.100
Kinesis (Movement) | .967 | .058 | .038
Independent Existence | .911 | .190 | -.101
Cause Event | .868 | .365 | -.021
Subject - Object Individuation | .767 | .490 | -.139
Causal Effect | .046 | .906 | .036
Cause Change of State | .229 | .823 | -.067
Object Individuation | .121 | .815 | .102
Kinesis (Stationary) | .298 | .673 | -.468
Verb | .044 | .259 | .136
Punctuality | .227 | .004 | .883
Telicity | -.353 | .123 | .817
No Independent Existence | .351 | -.046 | -.376
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. a Rotation converged in 4 iterations.
Table 13. Rotated factor loadings for each variable, by component (3)

The PCA results support the claim that the multiple regression models with three predictor variables defined above have validity as models of the correlation between the semantic component ratings and the sympathy (i.e. sentiment) rating. The factorings shown by the PCA results support the particular choices of predictor variables used. Moreover, the variables identified by the PCA show a close correspondence to linguistically motivated groupings of semantic properties, particularly Dowty's distinction between Proto-Agent and Proto-Patient properties.
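To make the retention procedure concrete, here is a rough Python sketch of a PCA over standardized item-level means using scikit-learn, keeping components whose eigenvalues exceed 1.0 (the Kaiser criterion). It only approximates the original SPSS-style analysis, the file and column names are hypothetical, and the varimax rotation used for the rotated matrices is not shown.

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical item-level means for the 13 semantic components plus the verb
# variable (one row per bridging sentence); 'sympathy' is excluded here.
X = pd.read_csv("item_means.csv").drop(columns=["sympathy"])

# Standardize so the PCA effectively operates on the correlation matrix,
# then retain components whose eigenvalue exceeds 1.0 (Kaiser criterion).
Z = StandardScaler().fit_transform(X)
pca = PCA().fit(Z)
keep = pca.explained_variance_ > 1.0
print("components retained:", keep.sum())
print("variance accounted for:", pca.explained_variance_ratio_[keep].sum())

# Unrotated loadings, comparable in spirit to the component matrices above.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(pd.DataFrame(loadings[:, keep], index=X.columns))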

2.5 Summary and Discussion

This chapter confirms the hypothesis that manipulation of event encodings in specific ways yields specific effects on the sentiment perceived by the readers of those encodings. Different encodings were shown to exhibit varying degrees of the semantic components of Transitivity. Furthermore, the regression models demonstrate the predictive power that these matters of degree have for the implicit sentiment attributed to those encodings.

These results formally establish that we can evaluate implicit sentiment on the basis of surface encodings, and that such evaluation can focus on observable features such as named agents, present or omitted objects, lexical choice of verb and nominalization, and voice. The underlying semantic components of Transitivity, with the role they play in the profiling of events and event participants, have been shown to track with these surface features. For example, I have shown that comprehenders are sensitive to the element of internal causation, or patient-profiling, in the ergative verbs, and that the semantic components of Transitivity are uniformly reduced in degree for such verbs.

In Chapters 3 and 4, I apply these results by building statistical classifier models that identify and exploit such surface features for the practical task of document level text classification with respect to (implicit) sentiment and perspective.

Beyond establishing the connection to perceived sentiment, the results in this chapter join those of Kako (2006a, 2006b), McKoon and MacFarland (2000, 2002) and Filip et al. (2001) in highlighting the value of exploring semantic components empirically. As discussed at the outset of this chapter, the behavior of verbs can be viewed as dependent not only on a verb’s meaning, but also on the construction in which it appears and the entities over which it predicates. Each of these elements can modulate the other in complex ways. What I have begun to examine in this chapter is how commonly identified semantic elements might characterize verb behavior. Dowty's theory demonstrates that certain semantic elements are associated with certain argument positions. McKoon and MacFarland (2000, 2002) and Wright (2001) produced distributional information that showed statistically significant differences in the types of entities used as arguments for ostensibly similar verbs. Hopper and Thompson's framework, while geared more toward clauses on the whole, shows how the common semantic components can vary in degree. I have tried to show here, in a preliminary way, that the kind of distributional information revealed by McKoon and MacFarland correlates with variations in the degree to which the common semantic elements are exhibited.

I have also begun to unify recurrent themes in work in lexical semantics and argument structure under a common psycholinguistic investigation. Linking theories in argument structure research (Dowty 1991, Grimshaw 1990), verb classifications in lexical semantics research (Levin 1993), and research on the generative lexicon (Pustejovsky 1991) and transitivity in grammar and discourse (Hopper and Thompson 1980) all make reference, in various guises and forms, to the semantic components measured here. The results reported here support the hypothesis that the semantic components are at work in sentences in ways that vary reliably along verb class lines.

While diathesis alternations represent a fascinating and important phenomenon, additional aspects of verb distributions are worthy of attention. Levin (1993) suggests that a syntactic alternate sometimes is accompanied by a change in meaning. Others have argued that a syntactic alternation always implies a change in meaning (Bolinger, 1968; Lemmens 1998). The change in meaning need not rise to the level of a change in word sense for the verb, though it can. But even when the verb sense remains constant, a syntactic alternation places a different construal on an event. The results in this chapter also suggest that different construals correlate with different values for the semantic components of Transitivity.


3 Linguistically Informed Identification of Implicit Sentiment: The Death Penalty Debate

3.1 Introduction and Overview

In Chapter 2, I established evidence for the hypothesis that, in combination with its main verb, the degrees to which a clause exhibits (some subset of) the underlying semantic components of Transitivity comprise a strongly predictive model of perceived sentiment. This model was evaluated according to a standard criterion for psycholinguistic models: the semantic component ratings accounted for a large percentage of the variance in sentiment ratings. In this chapter, I experiment with the task of text classification for implicit sentiment. The challenge in this work is to render observable representations of the underlying (and hence unobservable) semantic components, and to use these observable representations to build classifier models. This effort will be evaluated according to a standard criterion for models in computational linguistics: they must make good predictions on previously unseen data.

This chapter is structured as follows. In Section 3.2 I discuss my approach to rendering observable representations of the underlying semantic components. I define such representations with respect to usages of terms identified as particularly relevant to a domain. I provide details on their definition and the method for instantiating them as document features (Observable Proxies for Underlying Semantics, or OPUS features). In Section 3.3 I introduce the Death Penalty Corpus (hereafter referred to as the DP Corpus). I summarize its motivation and preparation, and describe its dimensions and characteristics. In Section 3.4 I report on text classification experiments using OPUS features for verbs of killing, where the classification task is to determine if a document is written from the pro-death penalty perspective or the anti-death penalty perspective. I report baseline results and obtain improvements using OPUS features. Generalizing the approach, in Section 3.5 I describe a corpus-based technique for automatically extracting verbs for the death penalty domain. In Section 3.6 I report on further experiments with sentiment classification using OPUS features for the automatically identified verb set, where I obtain additional improvements in classification accuracy.

3.2 Defining OPUS Features

In order to exploit the predictive power that the underlying semantic components of Transitivity have for implicit sentiment, I use observable grammatical relations, drawn from the usages of terms determined to be relevant to a domain, as proxies for them. The current state of the art in dependency parsing allows us to extract reasonably accurate grammatical relations from unrestricted text. The relations, in conjunction with their targeted lexical content, serve as proxies for the kinds of Transitivity phenomena that were shown to be active and predictive through the experimental collection of linguistic intuition data in Chapter 2.


As discussed in Chapter 2, verbs of killing have been a frequent target for in-depth linguistic analysis and are considered to be a set of prototypically causative verbs (Lemmens, 1998; Levin 1993). Chapter 2 investigated the family of kill verbs from a psycholinguistic perspective, and they were found to be good exemplars of the set of decompositional properties of transitivity identified by Hopper and Thompson (1980). Verbs of killing are prominent in a number of domains, for example the domain of violent crime reporting, as seen in the experimental materials in Chapter 2. Additional examples include wartime journalism, abortion, and the death penalty debate. As these properties have been shown to be linked to the realization of predicate argument structure, in this chapter I investigate the use of OPUS features instantiated over verbs of killing for the purpose of document level sentiment classification.

3.2.1 Feature Engineering

Producing OPUS features for terms requires syntactic as well as some semantic analysis. After evaluating several different parsers, I chose the Stanford Parser (Klein and Manning, 2003) for extracting document features based on grammatical relations in sentences.14 Although nothing in principle restricts my approach to this parser, important practical factors in this decision were as follows:

• The parser performs at about the state of the art, and is reasonably fast and quite robust against degenerate input (Klein and Manning, 2003).
• The parser provides a medium-sized set of grammatical relations motivated by practical concern for real-world applications (de Marneffe et al., 2006). The set of relations is discussed in more detail later in this section.
• The parser is written in Java and is thus platform independent. It also provides an extensive Application Programmer Interface (API) enabling custom Java programs to parse corpora and extract information.

The Stanford Parser includes both an unlexicalized probabilistic context free grammar (PCFG) component, trained on the Penn Treebank (Marcus, 1993), as well as a lexicalized dependency grammar component (Klein and Manning, 2002). I use the PCFG component alone due to its speed and sufficient accuracy (Klein and Manning, 2003).

The parser's primary output is Penn Treebank-style constituent parses. As discussed in detail in de Marneffe et al. (2006), an integrated module within the parser extracts lexical dependencies between individual words, on the basis of the constituent parses. These are typed dependencies which indicate the grammatical relation between the words, for example subject, object, and indirect object. These typed dependencies are primarily generated using a set of manually developed pattern expressions in a constituent tree-oriented pattern-matching language called tregex

14 See http://nlp.stanford.edu/software/lex-parser.shtml for more information and to download the parser.


(Levy and Andrew, 2006), quite similar to tgrep2.15,16 The patterns are matched over the constituent phrase structure parses produced by the core parser, and the grammatical relations, or typed dependencies, are indicated by the matched patterns. The pattern-based grammatical relations module is tightly integrated with the constituent parser, and as such the system acts in effect like a batch mode version of the interactive Linguist's Search Engine (Resnik and Elkiss, 2005), with the built-in patterns operating as a library of established, typed queries.

The set of grammatical relations available from the Stanford Parser is based in part on those described in (Carroll et al., 1999) and (King et al., 2003). The relations are arranged hierarchically from general to specific, with the most specific relation chosen wherever possible. Appendix 4 – Grammatical Relations of the Stanford Parser shows the grammatical relations in their hierarchical context, along with descriptions and examples of each.

3.2.2 Document Feature Extraction

To extract OPUS features for a document, I generate a constituent parse for each of its sentences, from which the grammatical relations are then extracted. In general, each grammatical relation triple involving a relevant term can give rise to two features, one for relation-head and the other for relation-modifier. I describe specific features in the context of the relevant experiments.
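
To make the mapping from relation triples to features concrete, the following minimal Python sketch (not the Java client actually used in this work; the function and variable names are hypothetical) turns a list of typed-dependency triples into the relation-head and relation-modifier features just described, restricted to a set of targeted terms.

    # Minimal sketch: typed-dependency triples to OPUS-style features.
    # Relations are assumed to be available as (relation, governor, dependent)
    # string triples extracted from the parser output.

    def opus_features(relations, target_terms):
        """Collect relation-head and relation-modifier features for targeted terms."""
        features = set()
        for rel, gov, dep in relations:
            if gov in target_terms or dep in target_terms:
                features.add(f"{gov}-{rel}")   # relation-head, e.g. murder-nsubj
                features.add(f"{rel}-{dep}")   # relation-modifier, e.g. nsubj-prisoner
        return features

    # Using the relations from the sentence in Figure 6:
    relations = [("nsubj", "murder", "prisoner"),
                 ("aux", "murder", "will"),
                 ("dobj", "murder", "guard")]
    print(sorted(opus_features(relations, {"murder"})))
    # ['aux-will', 'dobj-guard', 'murder-aux', 'murder-dobj',
    #  'murder-nsubj', 'nsubj-prisoner']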

The subject and object are the primary grammatical relations that should be expected to reflect usage patterns for what kinds of entities are being encoded as the agents and patients of actions. These relations then serve as proxies for the degrees to which the particular verbs and their arguments exhibit, for example, the components of Volition, Sentience, Kinesis, Punctuality, and Telicity, for which we saw significant differences with respect to subjects in Chapter 2. NP-internal relations can then propagate that information to the additional parts of compound nouns (in the nn-relation). Adverbial and adjectival modifiers typically represent intensifications or other modifications to those values. The negation modifier tends to invert those values.

In addition to the grammatical relations the parser provides, in this chapter I experimented with two additional features extracted from a sentence's constituent tree and the built-in grammatical relations: transitive usages with overt NP subject and NP object (the TRANS feature), and sentences with no overt NP object (the NOOBJECT feature). For example, consider the example sentence in Figure 6, shown along with part of its constituent parse tree, a subset of its grammatical relations, and features yielded by those relations. The clause in bold would generate an instance of the feature TRANS-murder. The sentence in Figure 7 is an example of a sentence that would generate an instance of the feature NOOBJECT-kill.

15 See http://tedlab.mit.edu/~dr/Tgrep2/ for more information. 16 Some relations are identified through post-processing with procedural code that examines the constituents and pattern-based relations.


Sentence:

Life Without Parole does not eliminate the risk that the prisoner will murder a guard, a visitor, or another inmate.

Constituent Parse (excerpt):

(S (NP (DT the) (NN prisoner)) (VP (MD will) (VP (VB murder) (NP (NP (DT a) (NN guard)) (, ,)

Grammatical relations (excerpt):

nsubj(murder, prisoner)
aux(murder, will)
dobj(murder, guard)

OPUS features (excerpt):

TRANS-murder, murder-nsubj, nsubj-prisoner, murder-aux, aux-will, murder-dobj, dobj-guard

Figure 6 – An example sentence with an instance of the TRANS-murder feature

At the same time, we should never ignore the risks of allowing the inmate to kill again.

OPUS features (excerpt): NOOBJECT-kill

Figure 7 - An example sentence with an instance of the NOOBJECT-kill feature.

The TRANS- feature extracts canonical, syntactically transitive usages of relevant verbs where there is an NP subject and an NP object. Instances of this feature for verbs can stand as a proxy for the cluster of Transitivity component values that it carries, which will vary in the manner shown in Chapter 2. The NOOBJECT- feature, as in the example in Figure 7, often captures a habitual reading. This feature carries, in this example, reduced Object Individuation, and reduced Punctuality. The habitual reading invokes Dowty’s proto-patient role of Incremental Theme, which I suggested in Table 2 might be mapped to Hopper and Thompson’s Punctuality component.
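
The TRANS- and NOOBJECT- features can be approximated from the same relation output. The sketch below is a simplification: in this work these features are derived from the constituent parse together with the built-in relations, whereas this version looks only at dependency triples and part-of-speech information, and its names are hypothetical.

    # Simplified sketch of TRANS- and NOOBJECT- features, using only the
    # dependency triples for a sentence (the original derives them from the
    # constituent tree together with the grammatical relations).

    def clause_features(relations, sentence_verbs, target_verbs):
        """Flag targeted verbs as TRANS (overt subject and object) or NOOBJECT."""
        has_subj = {gov for rel, gov, dep in relations if rel == "nsubj"}
        has_obj = {gov for rel, gov, dep in relations if rel == "dobj"}
        features = set()
        for verb in set(sentence_verbs) & set(target_verbs):
            if verb in has_subj and verb in has_obj:
                features.add(f"TRANS-{verb}")      # e.g. TRANS-murder in Figure 6
            elif verb not in has_obj:
                features.add(f"NOOBJECT-{verb}")   # verb occurs with no overt object
        return features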

The OPUS features in Figure 6 capture the fact that an event takes place with prisoner as the subject, representing a volitional action. Because OPUS features are defined with respect to targeted terms, the features reflect this volitional action with


respect to events treated as particularly salient in the domain under consideration. Similarly, the syntactic transitivity exhibited by the TRANS-murder and murder-dobj features can correlate with high degrees of Volition, Agency, Kinesis, and Affectedness of the Object for the targeted murder event.

Although there are undoubtedly other ways to extract meaningful features that reflect underlying semantic properties, OPUS features extracted in this manner are intuitively plausible, avoid the data sparseness issues that would accompany the use of full relation triples, and, as I will show in Sections 3.4 and 3.6, yield positive results in practical experimentation.

3.3 The DP Corpus

A natural domain in which to explore verbs of killing is the death penalty debate. The death penalty issue is a considerably polarized one that triggers passionate argumentation from both sides. As both the crimes and the punishments in this domain involve acts of killing, I predicted that the documents related to this debate would exhibit extensive use of kill verbs. I built a corpus of documents discussing the death penalty and found this prediction to be confirmed.

3.3.1 Corpus Preparation

I developed the DP Corpus to provide textual material representing the viewpoints of both the pro-death penalty and anti-death penalty movements. The DP Corpus consists of documents downloaded from web sites that clearly identify themselves with one of these two polar sentiments and that make materials available in support of their position. I collected documents from five pro-death penalty sites and three anti-death penalty sites.

My working assumption was that it would be possible to create ground truth “pro” and “anti” (con) labels at the document level without requiring extensive human annotation, by associating document sentiment with the pro- or anti-death penalty sentiment identified by the document’s web site of origin. Based on partial human annotation, this assumption proved to be correct. Appendix 5 – Development of the Death Penalty Corpus provides extensive details on the preparation and annotation of the DP Corpus.

3.3.2 Characteristics of the DP Corpus

The DP Corpus consists of documents downloaded from five pro-death penalty sites and three anti-death penalty sites. Table 14 identifies the sites (with shorthand labels that will be used to refer to them), the number of documents in the corpus from each, and a brief description of the documents from each site.


PRO sites:
  www.prodeathpenalty.com (PRO1): 117 documents. A variety of document types including descriptions of particular death penalty cases, op-eds, journalistic pieces, fact sheets and briefs on particular cases.
  www.clarkprosecutor.org (PRO2): 437 documents. A few broad discussions of the role of the death penalty in criminal justice, and many documents describing the crimes and punishments in individual capital cases.
  www.yesdeathpenalty.com (PRO3): 26 documents. Primarily somewhat scholarly articles exploring specific arguments in support of the death penalty. Also contains a number of scientific and op-ed style pieces.
  www.thenewamerican.com (PRO4): 7 documents. Magazine op-ed style pieces.
  www.dpinfo.com (PRO5): 9 documents. Essay and op-ed style pieces.
  PRO total: 596 documents.

CON sites:
  www.deathpenaltyinfo.org (CON1): 319 documents. Reports and fact sheets related to the death penalty. This site specifically claims to not offer opinions.
  www.nodeathpenalty.org (CON2): 212 documents. Journalism-like reports about the death penalty and particular death penalty cases.
  www.deathpenalty.org (CON3): 65 documents. Fact sheets and editorials on the death penalty.
  CON total: 596 documents.

Table 14 – The DP Corpus


3.4 PRO/CON Sentiment Classification Experiments with the DP Corpus: The Kill Verbs

3.4.1 Document Features

I define a feature vector for each document in the corpus containing the following features:

• Features based on kill verb grammatical relations. The set of grammatical relations found from the constituent parse of each sentence in the document is extracted. For each relation in which a kill verb form appears as either the governor or dependent term, I extract a binary feature identified with the combination of the verb and relation. All relations were eligible to be extracted in features. See Figure 6 for an example, which shows three relations for the verb murder.
• Features based on kill verb nominalizations. Grammatical relations in which instances of kill verb nominalizations occurred were extracted. Any relation to a kill verb was filtered out, as these were already accounted for with the kill verb features. Binary features for the nominalization were grouped together according to whether the nominalization was the governor or dependent in the relation, and thus were not articulated to the level of the individual relations as the verb features were (e.g. from "The merciless killer felt no remorse," the features nsubj-killer and amod-killer would be grouped as two instances of the same feature).
• Unigrams. Unigram frequency features were included for the kill verbs and their nominalizations only.
• Two additional OPUS features, as described in Section 3.2.2: TRANS- and NOOBJECT-

Features were included for all occurrences of 14 kill verbs and their nominalizations. The verbs were: kill, slaughter, assassinate, shoot, poison, strangle, smother, choke, drown, suffocate, starve, murder, execute, stab. The nominalizations were: killer, killing, slaughterer, slaughter, assassin, assassination, shooter, shooting, poisoner, poisoning, strangler, strangling, smotherer, smothering, choker, choking, drowner, drowning, suffocater, suffocation, starver, starvation, murderer, murder, executioner, execution, stabber, stabbing.17 The list of verbs includes all the kill verbs used in the experimental materials in Chapter 2, with the addition of murder, execute, and stab. Murder and stab are discussed briefly in Lemmens (1998). The verb execute can be ambiguous in other contexts, but it is quite frequent in the DP Corpus and is used there primarily, and unambiguously, in its kill sense. This list of verbs and nominalizations generated 1016 distinct features.
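
The different treatment of verbs and nominalizations amounts to a simple naming rule when a feature is built from a relation. The hypothetical sketch below shows only that naming step (it ignores the unigram and TRANS-/NOOBJECT- features, and the grouped feature labels are my own, not those used in the implementation).

    # Naming rule only: verb features are articulated by relation, while
    # nominalization features are grouped by the nominalization's role.
    # The grouped feature labels here are illustrative, not the original names.

    def name_feature(rel, gov, dep, kill_verbs, nominalizations):
        if gov in kill_verbs:
            return f"{gov}-{rel}"           # e.g. murder-dobj
        if dep in kill_verbs:
            return f"{rel}-{dep}"           # e.g. nsubj-kill
        if gov in nominalizations:
            return f"{gov}-as-governor"     # all relations collapse together
        if dep in nominalizations:
            return f"{dep}-as-dependent"    # nsubj-killer and amod-killer both land here
        return None                         # relation does not involve a tracked term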

17 The –er nominalizations have been argued to exhibit strong indications of volition and agency (Lemmens 1998). Some of these nominalizations will of course not be attested in the corpus, but I included them for consistency across all the verbs.


3.4.2 Classification Results

Using the feature vectors thus defined, I built classifiers using three different algorithms: Naïve Bayes, C4.5 decision trees, and SVM, all implemented with the WEKA package of machine learning tools (Witten and Frank, 2005). SVM classifiers are my main target, as these typically have shown the best performance in text classification (Joachims, 1998). I included the other algorithms in these initial experiments largely for exploratory purposes. My primary interest is in comparing feature sets, not learning algorithms.

The task is to classify documents as being pro-death penalty (PRO) or anti-death penalty (CON). I conducted baseline classification experiments using only word n-grams as features. In preliminary experiments I tested both unigrams and bigrams, using both word forms and stems. The performance among these did not differ significantly, so I arbitrarily chose stemmed bigrams as the baseline. In order to control, in these experiments, for the difference in the number of features available to the classifier, I extracted the 1016 most frequent stemmed bigrams as the baseline feature set. Performance with these features was high, hovering around 90%. Evaluation was done using 10-fold cross-validation. All three algorithms achieved similar baseline performance (see Table 15).

Feature Set         Naïve Bayes   C4.5    SVM
baseline n-grams        91.69     91.19   93.37
OPUS                    89.18     89.43   92.20

Table 15 - Sentiment classification accuracy of the DP Corpus, evaluated for three algorithms using 10-fold cross validation, in percent correct.

Classification performance based on OPUS was also high; again around 90% evaluated using 10-fold cross validation for the same three classification algorithms. Using the corrected resampled t-test as integrated into WEKA’s Experimenter tool, the results for the baseline and OPUS feature sets in Table 15 are not significantly different.

In this initial experiment, the OPUS features did not improve upon baseline, but it is notable that they achieved parity with baseline given that they are generated from a set of only 14 verbs and their nominalizations.
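
The comparison just described can be reproduced in outline with any standard toolkit. The sketch below uses scikit-learn as a stand-in for the WEKA implementations actually used here (so the algorithms and their defaults differ, and C4.5 is replaced by a generic decision tree); X_ngram, X_opus, and y are placeholders for binary document-by-feature matrices built as described above and the PRO/CON labels, not the original data.

    # Sketch of the 10-fold cross-validation comparison, with scikit-learn
    # standing in for WEKA.  X_ngram / X_opus are assumed binary feature
    # matrices and y the PRO/CON document labels (placeholders, not real data).

    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import LinearSVC

    def evaluate(X, y):
        classifiers = {
            "Naive Bayes": BernoulliNB(),
            "Decision tree (in place of C4.5)": DecisionTreeClassifier(),
            "Linear SVM": LinearSVC(),
        }
        for name, clf in classifiers.items():
            scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation
            print(f"{name}: {100 * scores.mean():.2f}% correct")

    # evaluate(X_ngram, y)   # baseline n-gram features
    # evaluate(X_opus, y)    # OPUS features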

I next investigated a more realistic scenario: one in which test documents are not sampled from the same data sources as the training data (as they effectively are under cross-validation). I segregated documents in the DP Corpus according to their web sites of origin, and conducted a series of experiments in which the training data and the test sets were from distinct sets of web sites. In these experiments, I found that the OPUS features significantly outperformed the baseline features.

Assignment of each site’s documents to training and test sets was driven by the number of documents each represented within the corpus (see Table 14). For the first


experiment under this approach, there seemed a natural split of web sites into train and test sets because sites PRO3-PRO5 and CON3 have far fewer documents than the other sites. Thus these sites were set aside as test data, with documents from sites PRO1, PRO2, CON1 and CON2 serving as training data. A small number of documents were eliminated (arbitrarily) in order to balance the training and test sets across the two sentiment classes, resulting in a total of 1062 training documents and 84 test documents. Results for this experiment are shown in Table 16.

Using this web-site specific train-test split, overall classification performance dropped quite a bit, as one might expect when making the test data more distinct from the training data. But the OPUS feature set performed significantly better than baseline in all cases. Results are shown in Table 16. Here the t-test is not appropriate, as evaluation was not done using cross-validation. Instead, I use the Sign Test, a non-parametric matched pairs test similar to the Wilcoxon Matched-Pairs Signed-Ranks Test.18 The baseline and OPUS classifiers are matched in pairs by document with respect to the error of their classification prediction for each document.19
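
For reference, the Sign Test as used here reduces to a two-sided binomial test over the discordant pairs, i.e. documents on which exactly one of the two classifiers is correct. The sketch below is a minimal implementation of that idea, not the web-based implementation cited in the footnote.

    # Minimal Sign Test over matched classifier predictions.  correct_a and
    # correct_b are per-document booleans for the two classifiers being compared.

    from math import comb

    def sign_test(correct_a, correct_b):
        n_plus = sum(a and not b for a, b in zip(correct_a, correct_b))
        n_minus = sum(b and not a for a, b in zip(correct_a, correct_b))
        n = n_plus + n_minus                       # ties contribute nothing
        if n == 0:
            return n_plus, n_minus, 1.0
        k = max(n_plus, n_minus)
        # two-sided p-value under a fair-coin binomial model
        p = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)
        return n_plus, n_minus, p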

Train-test split: PRO1+PRO2+CON1+CON2 train; PRO3+PRO5+CON3 test

Feature Set         Naïve Bayes   C4.5    SVM
baseline n-grams        48.81     50.00   55.95
OPUS                    66.67     60.71   66.67

Sign Test: Naïve Bayes n+ = 21, n- = 6, p < 0.006; C4.5 n+ = 10, n- = 1, p < 0.01; SVM n+ = 10, n- = 1, p < 0.01.

Table 16 - Sentiment classification accuracy for the DP Corpus, evaluated for three algorithms using initial Web-site specific train-test splits, in percent correct.

I next performed a series of similar experiments where train and test splits were defined exhaustively using a two-by-two matrix of the four web sites with the greatest number of documents as seen in Table 14 (PRO1, PRO2, CON1, CON2) – in effect, a site-wise cross validation. The results were similar to those for the experiment summarized in Table 16 with a single web-site specific train-test split.

Table 17 shows the weighted average percent correct score as an aggregate of the four-fold test sets, where the weighting took into account the different numbers of documents in the test sets. I tested the significance of the weighted average percent correct by aggregating the matched pairs of the test items as input to the Sign Test. The results are shown in Table 17. Again, the classifiers using OPUS features score significantly higher than the baseline features. Table 18 breaks out the results by individual fold of the four-fold cross-validation.
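
Concretely, weighting by test-set size makes the aggregate score equivalent to pooling the folds' test documents; with n_f test documents and percent correct acc_f in fold f, the aggregate reported in Table 17 is

    \text{weighted percent correct} = \frac{\sum_f n_f \, \mathrm{acc}_f}{\sum_f n_f}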

18 I used the implementation provided at http://www.fon.hum.uva.nl/Service/Statistics/Sign_Test.html . Note that the Sign Test is considered an insensitive test. 19 Going forward, all significance tests reported will use the Sign Test in this manner unless specifically noted otherwise.


Feature Set         Naïve Bayes   C4.5    SVM
baseline n-grams        68.63     71.63   68.37
OPUS                    82.42     77.84   82.09

Sign Test: Naïve Bayes n+ = 295, n- = 84, p < 0.0001; C4.5 n+ = 303, n- = 208, p < 0.0001; SVM n+ = 362, n- = 152, p < 0.0001.

Table 17 - Classification accuracy for weighted average percent correct across 4-fold train-test experiments

CON2+PRO1 train, CON1+PRO2 test:
  baseline n-grams: Naïve Bayes 57.31, C4.5 49.29, SVM 39.15
  OPUS:             Naïve Bayes 92.22, C4.5 75.24, SVM 77.83
  Sign Test:        Naïve Bayes n+ = 178, n- = 30, p < 0.0001; C4.5 n+ = 187, n- = 77, p < 0.0001; SVM n+ = 234, n- = 70, p < 0.0001

CON1+PRO2 train, CON2+PRO1 test:
  baseline n-grams: Naïve Bayes 46.58, C4.5 77.35, SVM 64.53
  OPUS:             Naïve Bayes 60.26, C4.5 61.54, SVM 73.08
  Sign Test:        Naïve Bayes n+ = 41, n- = 9, p < 0.0001; C4.5 n+ = 17, n- = 54, p < 0.0001; SVM n+ = 35, n- = 15, p < 0.006

CON2+PRO2 train, CON1+PRO1 test:
  baseline n-grams: Naïve Bayes 86.75, C4.5 79.49, SVM 87.18
  OPUS:             Naïve Bayes 90.60, C4.5 91.45, SVM 89.32
  Sign Test:        Naïve Bayes n.s.; C4.5 n+ = 30, n- = 2, p < 0.0001; SVM n.s.

CON1+PRO1 train, CON2+PRO2 test:
  baseline n-grams: Naïve Bayes 77.59, C4.5 81.50, SVM 82.29
  OPUS:             Naïve Bayes 81.03, C4.5 80.56, SVM 85.58
  Sign Test:        Naïve Bayes n+ = 62, n- = 40, p < 0.04; C4.5 n.s.; SVM n.s.

Table 18 - Sentiment classification accuracy for the DP Corpus, evaluated for three algorithms using a two-by-two matrix of web-site specific train-test splits, in percent correct.

I next investigated the question of whether or not it is kill verb usages and the encoding of events they embody that is truly driving the improved classification accuracies obtained. In an experiment designed to address this question, I extracted the same set of OPUS features, but here for the 14 most frequent verbs found in the DP Corpus that were not in the list of kill verbs, along with their nominalizations. These verbs were: say, sentence, appeal, find, tell, convict, take, see, hear, punish, testify, rob, give, deny . This resulted in a classifier using 1518 distinct features. I


compared classification using this feature set against the baseline classifiers, and found that, unlike the kill verb-based feature set, it did not differ significantly from baseline performance. The results are summarized in Table 19. This experiment establishes that it is not simply term frequency or the presence of particular grammatical relations that the kill-verb OPUS models were able to exploit. Rather, the specific kinds of event encodings employing the strongly causative kill verbs are the effective features for our sentiment classification task.20

Train-test split: PRO1+PRO2+CON1+CON2 train; PRO3+PRO5+CON3 test

Feature Set                     Naïve Bayes   C4.5    SVM
baseline n-grams                    48.81     50.00   55.95
OPUS (frequent non-kill verbs)      57.14     51.19   55.95

Sign Test: n.s. for all three algorithms.

Table 19 - Classification accuracy based on frequent non-kill verbs from the DP Corpus, in percent correct.

The experimental results in this section confirmed that the two sides of the death penalty debate tend to use the kill verbs in ways that are different enough to be exploited by machine learning algorithms for sentiment classification with respect to that debate. While several kill verbs are among the most frequent in the corpus, the results in Table 19 confirm that it is not the case that simply looking at frequent verbs is necessarily useful. The results build on the findings in Chapter 2 by showing that the two sides in the debate appear to encode their discussions of killing events in a way that might, for example, attribute more Volition to criminals on the pro-death penalty side, and more Volition to the State on the anti-death penalty side. The DP Corpus contains narratives of violent crimes and of State executions, precisely where one might expect such manipulations to be exhibited. The use of OPUS features for machine learning demonstrates that it is not necessary to accomplish full natural language understanding in order to detect and exploit such patterns of differing usage between the two sides.

3.5 Identifying Domain Relevant Terms

Generalizing sentiment classification with the use of OPUS features requires that it be able to address domains for which we do not necessarily have a well-studied and motivated set of prototypically transitive verbs of physical force. To that end, in this section I apply a technique for automatically extracting sets of terms to be targeted for OPUS feature extraction and use in classification, in the manner that was done for

20 It is possible that these results might be due in part simply to preferences for using different subsets of verbs for the two sides in the death penalty debate. I address this question in the experiments reported in Section 3.6.


the kill verbs. In Section 3.6 I report on the successful application of such an automatically extracted verb set to the classification task with the DP Corpus.

I will use the description domain relevant for terms that are particularly characteristic of a corpus rather than the more common domain specific . The latter implies that the terms are technical and/or not found elsewhere to any degree. For example, the kill verbs are often invoked metaphorically and are not specific to the death penalty, but of course they are highly relevant to it, and thus to the DP Corpus.

Several methods exist for identifying significant terms within a single corpus, for example the χ2 test and log-likelihood ratios (Dunning, 1993). However, my concern is with the relevance of individual terms to a particular domain relative to the world at large; therefore I use the relative frequency ratio (Damerau, 1993) to determine what terms are domain relevant in a corpus, comparing against a large reference corpus of general text.21

The relative frequency ratio Rrf is defined in Equation 1, where

Fdc = the frequency of the term in the domain corpus
Ndc = the total number of tokens in the domain corpus
Frc = the frequency of the term in the reference corpus
Nrc = the total number of tokens in the reference corpus

Rrf = (Fdc / Ndc) / (Frc / Nrc)

Equation 1 - The Relative Frequency Ratio 22

I use the British National Corpus (BNC) as my reference corpus. The BNC “is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written.” 23,24 The BNC is thus advantageous because it is both very large as well as representative of text from a wide variety of domains and genres.

21 Some corpus linguists use the term monitor corpus rather than reference corpus (Sinclair, 1991) 22 Note that I have inverted the ratio from the way it is formulated in Manning and Schütze (1999, following Damerau, 1993). In this way, a "higher" number means "more domain relevant." 23 See http://www.natcorp.ox.ac.uk/ for more information. 24 I was originally concerned that spelling differences between British and American English would be an issue. However, in fact the BNC contains many instances of American English spellings. Normalizing between the two spellings and summing frequencies across variants would likely be advantageous, but for present purposes using the raw frequencies of the American spellings proved to be sufficient.


Conveniently, a number of researchers have made available term frequency tables derived from the BNC, and I have incorporated these rather than processing the BNC directly (Leech et al., 2001; Kilgarriff, 1997; Pedler, 2003). For my experiments I have used a table of frequencies for lemmatized, part-of-speech tagged terms for the entire BNC, with no minimum frequency cutoffs in effect.25 For compatibility, the frequencies for domain corpus terms were also calculated by part-of-speech.
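
A minimal sketch of this ranking step is shown below. It assumes the frequency tables have already been loaded as dictionaries keyed by (lemma, part of speech); the function names, the handling of terms unseen in the reference corpus, and the minimum-count cutoff (mirroring the 800-occurrence threshold used in Section 3.6) are assumptions of the sketch rather than details of the original implementation.

    # Sketch of relative frequency ratio (Equation 1) ranking for domain relevance.
    # domain_freq and reference_freq map (lemma, pos) -> count; domain_tokens and
    # reference_tokens are the corresponding corpus sizes in tokens.

    def relative_frequency_ratio(term, domain_freq, reference_freq,
                                 domain_tokens, reference_tokens):
        f_dc = domain_freq.get(term, 0)
        f_rc = reference_freq.get(term, 0)
        if f_rc == 0:                          # unseen in the reference corpus
            return float("inf") if f_dc else 0.0
        return (f_dc / domain_tokens) / (f_rc / reference_tokens)

    def domain_relevant_terms(domain_freq, reference_freq,
                              domain_tokens, reference_tokens,
                              pos="verb", min_count=800):
        """Return terms of the given part of speech with Rrf > 1, highest first."""
        candidates = [t for t, c in domain_freq.items()
                      if c >= min_count and t[1] == pos]
        scores = {t: relative_frequency_ratio(t, domain_freq, reference_freq,
                                              domain_tokens, reference_tokens)
                  for t in candidates}
        return sorted((t for t, r in scores.items() if r > 1),
                      key=lambda t: scores[t], reverse=True)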

3.6 PRO/CON Sentiment Classification Experiments with the DP Corpus: Automatically Identified Verbs

Using the relative frequency ratio, I extracted all domain-relevant verbs for the DP Corpus from among all verbs that occurred a minimum of 800 times in the DP Corpus. There were 117 verbs identified as such, and these are shown in Figure 8, in order from highest to lowest Rrf.26

testify, convict, sentence, execute, aggravate, file, strangle, affirm, stab, schedule, rape, rob, violate, overturn, accord, murder, confess, pronounce, plead, shoot, kill, deny, arrest, condemn, commit, fire, witness, request, steal, review, appeal, decline, grant, rule, die, reject, state, impose, conclude, question, charge, beat, drive, attempt, release, admit, refuse, present, recommend, conduct, order, serve, receive, argue, determine, suffer, seek, issue, claim, note, discover, enter, fail, strike, find, identify, result, return, tell, include, indicate, arrive, sign, force, stop, say, pull, support, reveal, live, raise, ask, visit, drop, believe, hear, love, represent, regard, occur, hit, decide, express, involve, prove, stay, walk, consider, write, spend, end, place, fight, plan, face, base, continue, leave, call, hold, watch, allow, try, obtain, cause, begin, set

Figure 8 – Automatically derived domain-relevant verbs for the DP Corpus

Upon qualitative inspection, the set of verbs in Figure 8 appears to be satisfyingly representative of terms that are relevant to the death penalty domain. Note that it includes six of the 14 kill verbs used in the experiments in Section 3.4.27 At the same time, the set introduces, among others, many verbs which have some senses in which they are transitive verbs of physical force: rape, rob, steal, beat, strike, force, fight .

I repeated the experiments summarized in Table 17, using the two-by-two web site based cross validation evaluation scenario, but this time employing OPUS features based on the 117 automatically identified verbs in Figure 8. Note that in these experiments, I used only the verbs, with no nouns or corresponding nominalizations. I compare the results against two baselines. First, extracting OPUS features for the 117

25 See List 1.1 at http://www.comp.lancs.ac.uk/ucrel/bncfreq/flists.html . 26 The relative frequency ratio does not admit of a significance test. As a general rule, we can interpret ratio values greater than one as indicating domain relevance, and values less than one as indicating the opposite, with the magnitude of the values indicating a matter of degree for those designations. In Chapter 4 I introduce the use of a threshold value ρ, used to control the number of domain relevant terms used in classification experiments; here all verbs with R rf > 1 are included. 27 The kill verbs not in this list are slaughter, assassinate, poison, smother, choke, drown, suffocate, starve .


verbs resulted in 7552 features, so I compare against a baseline of the top 7552 bigrams. I also compare against a baseline of unigram features for the 117 verbs themselves. I report results for SVM classifiers only. Table 20 shows the average pairwise percent correct for each classifier model. In all experiments, the OPUS features perform better than both baselines, significantly so in three cases (I report significance for each baseline relative to OPUS).

Feature Set              SVM
baseline bigrams         71.96 †
baseline Rrf unigrams    84.51 ‡
OPUS                     88.10

† n+ = 367, n- = 149, p < 0.001
‡ n+ = 206, n- = 151, p < 0.01

Table 20 - Classification accuracy for weighted average percent correct across 4-fold train-test experiments

There are several notable facts about the results in Table 20. First, the accuracy achieved for the unigram-based classifiers is quite high, especially considering that these represent only 117 features. This result provides strong support for the idea of focusing on domain relevant terms. OPUS features, representing a more detailed usage-driven articulation of those term features, increase accuracy even further. Second, and even more important, in three of the four train-test splits the accuracies obtained for the terms derived automatically using Rrf represent improvements over the results obtained using the hand-picked kill verbs and nominalizations. This comparison is illustrated in Table 21.

CON1/PRO1 train, CON2/PRO2 test:
  OPUS – kill verbs: 85.58
  OPUS – Rrf verbs:  89.66
  n.s.

CON2/PRO2 train, CON1/PRO1 test:
  OPUS – kill verbs: 89.32
  OPUS – Rrf verbs:  87.18
  n.s.

CON1/PRO2 train, CON2/PRO1 test:
  OPUS – kill verbs: 73.08
  OPUS – Rrf verbs:  78.63
  n.s.

CON2/PRO1 train, CON1/PRO2 test:
  OPUS – kill verbs: 77.83
  OPUS – Rrf verbs:  91.51
  n+ = 93, n- = 35, p << 0.001

Table 21 - Sentiment classification accuracy for the DP Corpus, evaluated for SVM classifiers built with the kill verb set, compared with the automatically derived verb set, using a two-by-two matrix of web-site specific train-test splits, in percent correct.


3.7 Summary and Discussion

In this chapter I defined OPUS features and confirmed the hypothesis that OPUS features for kill verbs are highly effective for the classification of sentiment in the death penalty debate. By extracting observable features of event encodings related to killing, the machine learning classifiers were able to detect patterns of usage that were distinct for each side in the death penalty debate. These regularities were discernible with no attempt to target evaluative, opinionated, or subjective language over objective language. This result shows that the underlying semantic properties, as indicated by the results in Chapter 2, were likely to be operative in the language of this debate and that current tools in NLP can usefully exploit them. In Section 3.6 I showed that we can obtain further improvement in classification accuracy for our task with a fully automatic end-to-end process involving no manual selection or tuning of any kind. I obtain the best accuracies using OPUS features for automatically derived terms, in almost all cases beating unigram and bigram baselines as well as OPUS features for manually selected terms. Moreover, OPUS features based on domain-relevant terms outperform unigram features for the terms themselves.

The challenge I take up next is to determine if this method can be successfully employed in other domains. Many of the documents in the DP Corpus are not explicitly opinion pieces, and many contain quite dispassionate language. Nonetheless, an issue such as the death penalty does exhibit some uniformity in the lines of argumentation. In addition to particular capital cases that remain active topics of discussion for long periods of time, topics such as deterrence and recidivism are recurrent in the debate. Thus the possibility that a certain repetitive polarity exists in the debate must be considered. We cannot expect to find similar phenomena in many other domains. Chapter 4 necessarily takes up this question.


4 Linguistically Informed Identification of Implicit Sentiment: Extension to Additional Domains

A major test of my approach to text classification is to see how well it generalizes. Confidence in the approach increases if it can be successfully extended to additional domains. In this chapter, I report on experiments with two additional corpora. The first corpus is a collection of essays and commentaries related to the Israeli-Palestinian conflict, and the second corpus is a collection of United States Congressional floor debate speeches.

In the experiments discussed in this chapter, I modified my approach to exclude any use of unigrams and to apply a more focused use of linguistically informed features only. I also introduce a novel classifier combination method in my work with the Congressional data. For the classification tasks defined for each corpus, I achieve the highest accuracies yet reported.

4.1 Domain Extension 1: Bitter Lemons

4.1.1 The Bitter Lemons Corpus

The web site www.bitterlemons.org is the source of the Bitter Lemons corpus, hereafter referred to as the BL Corpus.28 In its own words,

“Bitterlemons.org is a website that presents Israeli and Palestinian viewpoints on prominent issues of concern. It focuses on the Palestinian-Israeli conflict and peace process. It is produced, edited and partially written by Ghassan Khatib, a Palestinian, and Yossi Alpher, an Israeli. Its goal is to contribute to mutual understanding through the open exchange of ideas. Bitterlemons.org aspires to impact the way Palestinians, Israelis and others worldwide think about the Palestinian-Israeli conflict.”

The BL Corpus has a number of interesting properties. First, its topic area is one of significant interest and considerable controversy, yet the general tenor of the web site is one that eschews an overly shrill or extreme style of writing. This quality of the writing makes the BL Corpus a nearly ideal test bed for exploring the problem of automatically identifying the point of view from which the documents are written. In their work with the corpus, Lin et al. are to be credited with distinguishing the task and originating the phrase identifying perspective , which captures the idea of

28 See http://perspective.informedia.cs.cmu.edu/demo/bitterlemons/data for more information and to download the corpus. The BL Corpus was prepared and distributed by Wei-Hao Lin and Theresa Wilson and made publicly available on January 25, 2007. The Bitter Lemons editors, Ghassan Khatib and Yossi Alpher, kindly agreed to make the data available for research purposes and their cooperation is here gratefully acknowledged. The BL Corpus is relatively new and thus to date the published research utilizing the corpus is limited to that of its developers, Wei-Hao Lin and colleagues.


identifying sentiment as point-of-view, a task distinct from distinguishing subjective from objective text, identifying overtly evaluative language or explicit statements of opinion or position, and/or using any of those elements to classify document-level sentiment. For my purposes, I consider this notion essentially the same as what I have described as identifying implicit sentiment.

Additionally, the structure of the Bitter Lemons web site is such that the BL Corpus has a very natural balance to it. The site is organized into weekly editions. Each week, the two editors of the site each write a piece on a designated issue, event, or topic within the greater Israeli-Palestinian conflict. Also in each weekly edition, two guest editors, one from each side, contribute a piece, sometimes in the form of an interview. Thus each week there are two pieces from the Palestinian perspective, and two pieces from the Israeli perspective. The corpus is therefore naturally balanced between the two sides and across the specific subtopics that are discussed. Also, following Lin et al. (2006), I take advantage of the natural split of editor and guest contributions to create training and test sets. Table 22 summarizes the contents of the BL Corpus as described by Lin et al. (2006).

                               Palestinian   Israeli
Written by editors                     148       149
Written by guests                      149       148
Total number of documents              297       297
Average document length              740.4     816.1
Number of sentences                   8963      9640

Table 22 - Descriptive Statistics for the BL Corpus (Lin et al. 2006)

Lin and Hauptmann (2006) describe a method for determining if two document collections are written from different perspectives. In their experiments, they test their method by comparing the BL Corpus against a corpus widely used in topically-oriented text classification, the Reuters-21578 corpus.29 Documents in the Reuters corpus are annotated with respect to a list of 135 topics. Lin and Hauptmann's method involves computing the Kullback-Leibler (KL) divergence (Kullback and Leibler 1951) between the statistical distributions of two corpora being compared, where in their formulation documents in the corpora are represented using bag-of-words feature vectors. They find that corpora written from two different perspectives will reliably have a KL-divergence in a middle range, whereas in contrast, corpora that are on two different topics have a high KL-divergence and corpora written on the same topic or from the same perspective have a low KL-divergence. These results show that, for corpora on the same topic, "the authors of different perspectives write or speak in a similar vocabulary, but with emphasis on different words." Of particular note for my purposes, their method showed the Palestinian and Israeli halves of the BL Corpus to indeed exhibit differing perspectives.
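
For reference, the KL-divergence statistic at the heart of this test can be computed from smoothed unigram distributions roughly as in the sketch below; this illustrates only the general statistic, not Lin and Hauptmann's exact estimation procedure.

    # KL divergence between two corpora represented as bag-of-words counts.
    # Add-one smoothing over the shared vocabulary avoids zero probabilities.

    from math import log

    def kl_divergence(counts_p, counts_q):
        vocab = set(counts_p) | set(counts_q)
        total_p = sum(counts_p.values()) + len(vocab)
        total_q = sum(counts_q.values()) + len(vocab)
        divergence = 0.0
        for word in vocab:
            p = (counts_p.get(word, 0) + 1) / total_p
            q = (counts_q.get(word, 0) + 1) / total_q
            divergence += p * log(p / q)
        return divergence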

29 See http://www.ics.uci.edu/~kdd/databases/reuters21578/reuters21578.html for more information and to download the corpus.


4.1.2 Previous Classification Work Using the BL Corpus

Lin et al. (2006) report on experiments in document- and sentence-level text classification with respect to perspective using the BL corpus. The classification task is to determine if a document is written from a Palestinian perspective or an Israeli perspective. I will directly compare my results with their document-level results. Using unigram frequency feature vectors, they report results for a baseline SVM classifier and two Naïve Bayes classifiers. They used a linear kernel for their SVM classifier and optimized it by finding the best parameters using grid methods. Their first Naïve Bayes classifier (NB-M) uses maximum a posteriori estimation and the second (NB-B) uses full Bayesian inference. Their interest is in part to compare generative models like Naïve Bayes with discriminative models like SVM. I continue to use an SVM classifier, but compare my results to all of Lin et al.'s, as their best results are achieved using Naïve Bayes.

Lin et al. follow an evaluation approach similar to the one I employed for the DP Corpus in Chapter 3. Evaluation is conducted under the following two scenarios:

• Test Scenario 1: The classifier is trained on the guest documents, and the model is tested on the editor documents.
• Test Scenario 2: The classifier is trained on the editor documents, and the model is tested on the guest documents.

The results reported by Lin et al. under this evaluation framework are shown in Table 23. As can be seen there, using simple unigrams, Lin et al. achieve quite high accuracy.30

Training Set   Test Set   Classifier Model   Accuracy
Guests         Editors    SVM                   88.22
Guests         Editors    NB-M                  93.27
Guests         Editors    NB-B                  93.46
Editors        Guests     SVM                   81.48
Editors        Guests     NB-M                  84.85
Editors        Guests     NB-B                  85.85

Table 23 - Classification accuracy, in percent, for the classifiers of Lin et al. (2006)

4.1.3 Experiments with the BL Corpus using OPUS Features

As in the earlier experiments with the DP Corpus, I conducted classification experiments with the BL Corpus using OPUS features, motivated by the results of Chapter 2 that showed the impact that lexical semantics and event encoding choice can have on readers' perceptions of sentiment. In these experiments, feature vectors

30 Lin et al. also report results using 10-fold cross-validation, but I focus here on the more interesting evaluation scenarios just described.


contained only OPUS features driven by automatically extracted lists of domain-relevant verbs and nouns.31

I introduced two new variations to the experimental setup. First, rather than always extracting OPUS features for all terms identified as domain relevant by the method described in Section 3.5, I introduced a threshold value ρ, which is used as a cutoff point for such terms. With a particular value of ρ in effect, an experiment only targets terms for which log(Rrf) ≥ ρ, where Rrf is the relative frequency ratio score for a term as defined in Equation 1 of Section 3.5. The higher the value of ρ, the more "relevant" the domain terms must be according to their relative frequency ratio scores. Note, then, that as ρ increases, the sizes of the term lists decrease. Table 24 shows selected values of ρ, the sizes of the corresponding term lists, and the top 30 terms from each list, for nouns and verbs from the BL Corpus.

Verbs:
  ρ = 0.5: 953 domain-relevant terms. Top 30: fulfill, neutralize, pressure, given, favor, disengage, dismantle, maneuver, legitimize, reoccupy, preoccupy, mandate, acquiesce, annex, confiscate, pend, preempt, delegitimize, envision, reelect, democratize, negate, accord, honor, stabilize, reciprocate, practice, rebuff, forego, peacemake
  ρ = 1.5: 234 domain-relevant terms. Top 30: same as for ρ = 0.5
  ρ = 2.0: 115 domain-relevant terms. Top 30: same as for ρ = 0.5

Nouns:
  ρ = 0.5: 1823 domain-relevant terms. Top 30: roadmap, disengagement, today, media, neighbor, intifada, incitement, favor, behavior, settler, statehood, outpost, terrorism, accordance, redeployment, reoccupation, dynamic, extremist, ceasefire, quo, coexistence, roadblock, latter, escalation, neighborhood, annexation, democratization, ramification, occupation, quartet
  ρ = 1.5: 478 domain-relevant terms. Top 30: same as for ρ = 0.5
  ρ = 2.0: 276 domain-relevant terms. Top 30: same as for ρ = 0.5

Table 24 – Sample values of ρ, corresponding term list sizes, and example terms.

31 In these experiments I used a new version of the Stanford Parser, released in summer 2006 (version 1.5.1, available at http://nlp.stanford.edu/software/StanfordParser-2006-06-11.tar.gz ). This version featured some improvements to the grammatical relations output (e.g. improved tregex patterns for some relations). Also, the parser’s API changed considerably from the previous release, rendering the Java code of my feature extraction client completely obsolete. Rather than investing heavily in additional client code that might be difficult to support in the future, I decided to use only the grammatical relations output as provided directly by the parser. Thus in all experiments going forward, I did not do the additional feature post-processing that I did earlier for the DP Corpus (e.g. extracting the TRANS feature).


Second, I introduced filters on which of the parser's native grammatical relations were extracted when populating the document feature vectors. Rather than extracting all relations, I experimented with extracting only subsets of them, targeting for removal small sets of certain relations that (based largely on intuitive judgment) one would suspect to have little value with respect to my hypothesis. That is to say, the relations lack value in that they do not usefully reflect any of the underlying semantic components of transitivity investigated in Chapter 2 that I have hypothesized to be exploitable through grammatical relations. I report results for two filters (a minimal sketch of this kind of relation filtering follows the list below):

1. DeterminerFilter. This filter extracted all grammatical relations from the parser except two:

• det (determiner)
• predet (predeterminer)

2. GeneralFilter. This filter extracted all grammatical relations from the parser except the following:

• det (determiner)
• predet (predeterminer)
• preconj (preconjunct)
• prt (phrasal verb particle)
• aux (auxiliary verbs)
• auxpas (passive auxiliary verbs)
• cc (coordination)
• punct (punctuation)
• complm (complementizer)
• mark (marker)
• rel (relative)
• ref (referent)
• expl (expletive)
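
The sketch below illustrates this kind of relation filtering, assuming relations arrive as (relation, governor, dependent) triples; the relation names are copied from the list above, but the code itself is hypothetical and the names should be adjusted to whatever inventory the parser actually emits.

    # Sketch of the GeneralFilter: drop relations judged unlikely to reflect the
    # underlying Transitivity components before OPUS features are built.
    # Relation names follow the list above; adjust to the parser's inventory.

    GENERAL_FILTER = {"det", "predet", "preconj", "prt", "aux", "auxpas",
                      "cc", "punct", "complm", "mark", "rel", "ref", "expl"}

    def filter_relations(relations, excluded=GENERAL_FILTER):
        """Keep only relation triples whose relation type is not excluded."""
        return [(rel, gov, dep) for rel, gov, dep in relations
                if rel not in excluded]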

4.1.4 Results

For a broad range of values for ρ, I found that the OPUS features consistently beat the best reported results in Lin et al. (2006) for test scenario 1. Figure 9 shows the results with the DeterminerFilter in effect. In the graphs in Figure 9-Figure 14, the left-hand vertical axis indicates values of ρ, which are plotted as marks. Classifier accuracies are plotted as lines and are scaled to the right-hand vertical axis. Thus each tick mark along the horizontal axis represents an individual experiment, showing the ρ value in effect for verbs, the ρ value in effect for nouns, and the corresponding OPUS SVM classifier accuracy.


My results are shown in the data series labeled OPUS . Lin’s best Naïve Bayes result (NB-B, 93.46%) and SVM result (88.22%) are shown for comparison. I varied ρ separately for the noun and verb term lists, and for all combinations, the OPUS feature-based classifiers perform better. The average accuracy across the 70 individual experiments represented in Figure 9 is 95.67%. The best experiment achieved 97.64% accuracy and is the highest reported yet for this task. This represents a 64% reduction in error against NB-B, and an 80% reduction in error against Lin’s SVM.

[Figure 9 chart: "Classification Accuracy, BL Corpus Test Scenario 1 (DeterminerFilter)". Left vertical axis: Term Threshold (ρ); right vertical axis: Percent Correct; horizontal axis: Individual Experiment (ρ values and accuracy); data series: ρ (Verb), ρ (Noun), OPUS, Lin 2006 NB-B, Lin 2006 SVM.]

Figure 9 – SVM Classifier Accuracy using OPUS features for the BL Corpus, Test Scenario 1, with the DeterminerFilter applied

It is notable that I obtained these results using all default parameters to WEKA’s SVM implementation, whereas Lin’s SVM results were obtained with parameters optimized using grid methods (we both use linear kernels). This result holds up under a number of different conditions, which I turn to next.

Figure 10 shows the results of a larger set of experiments with the GeneralFilter applied, again compared to Lin's NB-B and SVM results. The average accuracy across the 423 individual experiments represented in Figure 10 is 95.41%. The best experiment, as in the results with the DeterminerFilter, achieved 97.64% accuracy (though here using different values of ρ). While eight of the experiments scored slightly below the accuracy of NB-B (results between 92.93% and 93.27%), overall the results are as robust as those in Figure 9, even though the classifiers built with the GeneralFilter use fewer features. This suggests that some of the features removed by


the GeneralFilter were useful for classification, but the gains in accuracy can nonetheless be largely preserved after their removal.32

Turning to test scenario 2, the first thing to notice, as seen in Table 23, is that accuracy for all of Lin’s models is uniformly lower than for test scenario 1. This is not terribly surprising: it is likely that training a classifier on the more uniform authorship of the editor documents builds a model that generalizes less well to the more diverse authorship of the guest documents (though accuracy is still quite high). Another likely factor is that the editor-authored documents comprise a smaller training set, consisting of 7,899 sentences, while the guest documents have a total of 11,033 sentences, a 28% difference.

[Figure 10 chart: "Classification Accuracy, BL Corpus Test Scenario 1 (GeneralFilter)". Left vertical axis: Term Threshold (ρ); right vertical axis: Percent Correct; horizontal axis: Individual Experiment (ρ values and accuracy); data series: ρ (Verb), ρ (Noun), OPUS, Lin 2006 NB-B, Lin 2006 SVM.]

Figure 10 – SVM Classifier Accuracy using OPUS features for the BL Corpus, Test Scenario 1, with the GeneralFilter applied

My results for test scenario 2 exhibit the same pattern as Lin’s, with my classifiers generally improving upon Lin’s SVM accuracy and achieving about the same accuracy as Lin’s Naïve Bayes models. Figure 11 summarizes the results with the DeterminerFilter. Average accuracy was 81.96%, and the maximum accuracy obtained was 83.84%. Results improved with the GeneralFilter, as shown in Figure 12. Average accuracy in this case was 83.12%, and the maximum accuracy obtained was 85.86%.

32 This can be advantageous in situations where feature selection is critical, as when memory and speed are at a premium given a task with sufficiently large feature vectors and/or large training sets (Brank et al. 2002).


[Figure 11 chart: "Classification Accuracy, BL Corpus Test Scenario 2 (DeterminerFilter)". Left vertical axis: Term Threshold (ρ); right vertical axis: Percent Correct; horizontal axis: Individual Experiment (ρ values and accuracy); data series: ρ (Verb), ρ (Noun), OPUS, Lin 2006 NB-B, Lin 2006 SVM.]

Figure 11 – SVM Classifier Accuracy using OPUS features for the BL Corpus, Test Scenario 2, with the DeterminerFilter applied

[Figure 12 chart: "Classification Accuracy, BL Corpus Test Scenario 2 (GeneralFilter)". Left vertical axis: Term Threshold (ρ); right vertical axis: Percent Correct; horizontal axis: Individual Experiment (ρ values and accuracy); data series: ρ (Verb), ρ (Noun), OPUS, Lin 2006 NB-B, Lin 2006 SVM.]

Figure 12 – SVM Classifier Accuracy using OPUS features for the BL Corpus, Test Scenario 2, with the GeneralFilter applied

4.1.5 Comparing Unigrams with OPUS Features

Given the high accuracy achieved with Lin's unigram-based classifiers, a natural question to consider is how much the OPUS features actually contribute. Recall from Chapter 3 that each extracted grammatical relation results in two features, one for the


governor term and one for the dependent term, tagged by the relation that joins them. Thus the set of terms that appear in the OPUS feature sets is the union of the domain-relevant terms automatically extracted from the corpus with all the terms that occur with them in the extracted relations. It is possible that it is merely this particular resulting set of terms that leads to more accurate classifier models.

To examine the contribution of OPUS features, I compared the OPUS feature-based SVM classifiers against their corresponding unigram-based SVM classifiers. I will call the latter classifier type an OD-Unigram classifier (for OPUS-Derived Unigram). As a concrete example of what goes into an OD-Unigram feature set, consider the sentence Sharon ordered the strike . Suppose further that the verb order is in the list of domain-relevant verbs for an experiment, and that it is the only term in the sentence that occurs in either the verb or noun list of domain-relevant terms. The parser makes the following relations available for this sentence:

nsubj(order, Sharon) det(strike, the) dobj(order, strike)

The det relation is filtered out, and the nsubj and dobj relations are retained. This results in four features in an OPUS feature vector:

1. order-nsubj
2. nsubj-Sharon
3. order-dobj
4. dobj-strike

Thus there are three unigrams reflected in these features: Sharon, order, strike. The feature set of an OD-Unigram classifier is collected by gathering all such unigrams as are reflected in the OPUS feature set of a given experiment. I built OD-Unigram classifiers for the unigram lists corresponding to the same set of experiments described in section 4.1.3 (i.e. over the same ranges and combinations of values of ρ).
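
Deriving the OD-Unigram feature set from an OPUS feature set is mechanical. The sketch below assumes feature names have the form governor-relation or relation-dependent with a hyphen separator and that the inventory of relation names is known; the abbreviated relation set shown is illustrative only.

    # Recover the unigram vocabulary reflected in a set of OPUS features.
    # Assumes names of the form "<governor>-<relation>" or "<relation>-<dependent>".

    RELATION_NAMES = {"nsubj", "dobj", "aux", "det", "amod"}   # abbreviated inventory

    def od_unigrams(opus_features, relation_names=RELATION_NAMES):
        unigrams = set()
        for feature in opus_features:
            left, _, right = feature.partition("-")
            unigrams.add(right if left in relation_names else left)
        return unigrams

    print(sorted(od_unigrams({"order-nsubj", "nsubj-Sharon",
                              "order-dobj", "dobj-strike"})))
    # ['Sharon', 'order', 'strike']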

The OPUS classifiers perform better than the OD-Unigram classifiers in all but eight of the 423 experiments. The accuracy of the OD-Unigram classifiers is essentially flat at the level of Lin's NB-B classifier, averaging 93.49% (see Figure 13). Comparing classification accuracy across matched pairs of experiments, a Wilcoxon Matched-Pairs Signed-Ranks Test shows the overall difference in accuracy between the OPUS classifiers and the OD-Unigram classifiers to be highly significant (W+ = 86633, W- = 103, N = 416, p << 0.001).33

33 I used the implementation available at http://www.fon.hum.uva.nl/Service/Statistics/Signed_Rank_Test.html


[Figure 13 chart: "Classification Accuracy, OPUS vs. OD Unigrams, BL Corpus, Test Scenario 1 (GeneralFilter)". Left vertical axis: Term Threshold (ρ); right vertical axis: Percent Correct; horizontal axis: Individual Experiment (ρ values and accuracy); data series: ρ (Verb), ρ (Noun), OPUS, OD Unigrams.]

Figure 13 – Accuracy of OPUS SVM classifiers compared with the corresponding OD-Unigram SVM classifiers, Test Scenario 1

A similar pattern is seen in test scenario 2, though, as in the comparison of my results with Lin's, it is a bit less pronounced. The OD-Unigram classifiers have average accuracy of 81.75%, and are essentially flat at this level, which is quite near Lin's SVM classifier accuracy of 81.48% (the latter being based on the full vocabulary of the corpus). Over the same ranges for values of ρ, the OPUS classifiers perform better in all but 31 of the 423 experiments (see Figure 14). Comparing classification accuracy across matched pairs of experiments, a Wilcoxon Matched-Pairs Signed-Ranks Test shows the overall difference in accuracy between the OPUS classifiers and the OD-Unigram classifiers to be highly significant (W+ = 75104, W- = 2711, N = 394, p << 0.001).


[Figure 14 chart: "Classification Accuracy, OPUS vs. OD Unigrams, BL Corpus, Test Scenario 2 (GeneralFilter)". Left vertical axis: Term Threshold (ρ); right vertical axis: Percent Correct; horizontal axis: Individual Experiment (ρ values and accuracy); data series: ρ (Verb), ρ (Noun), OPUS, OD Unigrams.]

Figure 14 – Accuracy of OPUS SVM classifiers compared with the corresponding OD-Unigram SVM classifiers, Test Scenario 2

4.1.6 Analysis and Discussion

Lin et al. (2006) remark that "perspective is reflected in the words that are chosen by the writers." They also note that in the BL Corpus, there is a great deal of overlap in the set of very frequent (non-stop) words used by authors from both perspectives, but that the two sides emphasize those words differently. In unigram models, emphasis equates to simple frequency. Lin highlights as a particular example that the word "palestinian" is used more frequently than the word "israel" in documents written from the Palestinian perspective, with the reverse being true for documents written from the Israeli perspective. The unigram-based classifier results that we have seen show that these facts can be learned and exploited by learning algorithms to excellent effect.

But note that it is not a given that the words "palestinian" and "israel" would have the distributions that they have here. The relatively muted nature of the documents in the BL Corpus means that its Palestinian authors refer to Israel by name. The phrase "Zionist entity" appears only once in the corpus, in a document of Palestinian perspective, but within the same document the author also uses the term "Israeli." Extrapolating from the distribution of the words "palestinian" and "israel," one might conjecture that the term "sharon" would appear more frequently in Israeli documents, and the term "arafat" more frequently in Palestinian documents. In the BL Corpus, the reality is quite different. Israeli documents mention Sharon 51% more often, and Arafat 163% more often, than Palestinian documents do. As found in Pang and Lee (2002), human intuitions about what words best emphasize a particular perspective can sometimes be highly mistaken.


I have shown that emphasis can be found at a source where we intuitively expect to find it: in the actual usages of words, not just their mere presence. Authors from different perspectives discussing the same topic will encode the same events with different emphasis. The results in Chapter 2 showed that different event encodings will suggest different degrees of the underlying semantic components of transitivity. Components such as volition, punctuality, aspect, and kinesis, for example, can alter the implied ascriptions to event participants. Learning algorithms can effectively exploit the observable proxies for these semantic components in OPUS features.

The results summarized in Figure 13 and Figure 14 show that, across a variety of vocabulary subsets, classifiers based on unigrams perform fairly uniformly. That level of performance is exceeded when OPUS features are used to distinguish different instances of those unigrams. In analyzing the better performance of their Naïve Bayes models over their SVM model, Lin et al. (2006) speculate that it may be explained by the advantage of generative models over discriminative models on small training sets, as established by Ng and Jordan (2002). I nonetheless achieve sizeable improvements with SVM classifiers when using OPUS feature sets. The increase in accuracy is not uniform, however, and appears to be highly sensitive to the particular sets of terms reflected in the feature sets, a pattern that suggests a wide variety of ways in which term usage differs between the two perspectives studied here.

The BL Corpus experiments mark the second domain in which I have demonstrated positive results from employing OPUS features. The marginal results for test scenario 2 suggest that its reduced training sets may approach the minimum training set size at which my method remains effective. In section 4.2, I report results using a much larger and more topically diverse corpus.

4.2 Domain Extension 2: Congressional Speech

4.2.1 The Congressional Speech Corpus

In this section, I apply OPUS features in experiments on a corpus of United States Congressional floor-debate speeches, hereafter referred to as the CS corpus (for Congressional Speech). The CS corpus was developed by Lillian Lee and colleagues at Cornell University and generously made publicly available to the research community in December 2006.34 They conducted document-level sentiment classification experiments with the corpus, including extensions of their maximum flow/minimum cut graph model approach (Pang and Lee 2004), and published their results in Thomas, Pang, and Lee (2006), hereafter referred to as TPL06.

The CS corpus has a number of properties that make it both attractive for and suitable to the current work. First, Congressional speech, and political discourse more

34 See http://www.cs.cornell.edu/home/llee/data/convote.html for more information and to download the corpus.


generally, is a domain of interest to the general public that also generates intense interest and study across a wide variety of academic disciplines. Outside of natural language processing, for example, Congressional speech has been studied in political science, economics, and journalism (Gentzkow and Shapiro 2006a, Groseclose and Milyo 2003, Quinn et al. 2006). The rise of political blogging further adds to the available volume of electronic political text exhibiting perspective and sentiment. Automated or semi-automated analysis of such text for perspective and sentiment is thus potentially an application of considerable value.

In addition to being the subject of a good deal of study, Congressional transcript data is publicly available. The CS corpus and the Congressional speech data studied by Gentzkow and Shapiro (2006a) are both drawn from transcripts for the year 2005, and thus are likely to have at least some overlap. However, to my knowledge, the only published research to date that specifically utilizes the CS corpus is that of the corpus’s originators. Given that, my experimental design and evaluation follow those of TPL06 closely in order to facilitate the most direct comparison possible.

TPL06 conducted classification experiments where the task was to determine if each individual speech segment (in the test set) represented text expressing support or opposition (a vote of YEA or NAY) to pending legislation. TPL06 used standard SVM-based classification techniques which they combined with a novel graph model classification framework that incorporates inter-speech segment relationships. In this section, I describe the corpus, summarize TPL06’s methods and results, and report on my improvements to their results. These improvements were achieved using both an improved SVM classifier based on the OPUS features which are the focus of this dissertation, as well as a novel classifier combination method applied within TPL06’s graph model framework.

4.2.2 The Corpus

This section summarizes the preparation of the CS corpus as described by TPL06. The CS corpus was built from publicly available data extracted from the GovTrack website (http://govtrack.us). GovTrack is an independent website, run by Joshua Tauberer, that gathers and makes available diverse public information related to Congress, its members, and their legislative and fundraising activities, including voting records. GovTrack provides floor-debate speeches in a convenient format, separated into distinct HTML files for each debate. TPL06 were thus able to build the corpus relatively easily as sets of speech segments grouped into specific debates.

Each “document” in the corpus consists of an individual, uninterrupted segment of speech at the podium on the floor of the U.S. House of Representatives. Each such document is referred to as a “speech segment” because there is a dialogue-like structure to debates among representatives on the floor. Speech segments are grouped together into debates, as indicated by the original transcripts. A sample speech segment is shown here:


Mr. Speaker, I thank the gentleman from Massachusetts for yielding me this time. Because the Central American Free Trade Agreement can not pass on its merits, its supporters are attempting a last-minute bid to win desperately needed votes later this evening, probably very late this evening, on the Central American Free Trade Agreement. This bill before us purports to address the imbalanced trade relationship with china. We all know it will not do that. But what it is is just another cynical attempt to buy what is very well documented in this nation's pro-free trade, pro-CAFTA media, very well documented in the media; this is just another cynical attempt to buy votes on CAFTA, among other cynical attempts to buy votes on CAFTA. This fails, as the gentleman from Maryland (Mr. Cardin) said, as the gentleman from New Jersey members of Congress should be troubled that this bill has been introduced only in order to push through another trade priority. We should not have to approve a job-killing trade deal with Central America in order to get the chance to vote on a toothless China bill. I will say that again: we should not have to approve a job-killing trade deal with Central America in order to get a chance to vote on this toothless China bill. There are no assurances even that the Senate has plans to consider this half measure, and it is surely unlikely to ever become law. Aggressively counteracting China’s unfair trade practices should be a top trade priority. The gentleman from Michigan (Mr. Levin) and the gentleman from Maryland (Mr. Cardin), members of the committee on ways and means, they want it to be, but it should have nothing to do with CAFTA. Unfortunately, for the past 5 years, the administration has done nothing to curb China’s illegal trade activities. It is always words over action. In the past 5 years, our government has refused to enforce domestic trade laws with regard to China, failed to take advantage of WTO mechanisms to challenge China’s violations of international trade rules, balked at taking any concrete action on China 's manipulation of its currency; what I hear from my manufacturers in Akron , in Lorain , and in Elyria almost every week. Our government has proposed eliminating funding for China enforcement activities and our government's proposed Congressional efforts to address China 's unfair trade practices through legislation. This bill fails to resolve these problems. Instead of demanding action, it calls for more reports and more studies to tell us what we already know, that China is simply not playing fair. Congress may get only one chance, Mr. Speaker, to act on China trade this year. Wasting that opportunity on this ineffective bill is a betrayal of America’s working families, of our small manufacturers, and of our long-term economic security. Congress should not be fooled by this lose- lose proposition. A toothless bill on China will not make CAFTA any better.

TPL06 initially extracted all available transcripts of floor debates in the U.S. House of Representatives for the year 2005. They also collected the voting records for all roll-call votes during that year. TPL06 used this vote data in two main ways. First, the ground-truth label of each speech segment was determined automatically by correlating the YEA or NAY label of the segment with the final vote of its speaker on the bill under debate. Second, TPL06 were able to focus the corpus on debates regarding “controversial” bills, which they defined operationally as debates in which the losing side generated at least 20% of the speeches. This was an important step because we are interested primarily in analyzing the language used in situations where the two sides in a debate have substantive differences, and a non-trivial number of floor speeches concern tangential or uncontroversial topics (e.g., recognizing Flag Day).
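A minimal sketch of these two preparation steps, under assumed data structures rather than the distributed corpus format, is as follows.

```python
# Label each speech segment with its speaker's final roll-call vote and keep
# only "controversial" debates (losing side >= 20% of the speeches). The
# dictionary formats here are hypothetical, chosen only for illustration.
def label_and_filter(debates, votes, min_losing_share=0.20):
    """debates: {debate_id: [(speaker_id, text), ...]}
    votes:   {(speaker_id, debate_id): 'YEA' or 'NAY'}"""
    kept = {}
    for debate_id, segments in debates.items():
        labeled = [(text, votes[(speaker, debate_id)]) for speaker, text in segments]
        n_yea = sum(1 for _, label in labeled if label == "YEA")
        n_nay = len(labeled) - n_yea
        if min(n_yea, n_nay) / len(labeled) >= min_losing_share:
            kept[debate_id] = labeled
    return kept
```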


TPL06 took several additional steps in the preparation of the CS corpus that are idiosyncratic to the nature of Congressional speech:

We automatically discarded those speech segments belonging to a class of formulaic, generally one-sentence utterances focused on the yielding of time on the house floor (for example, “Madam Speaker, I am pleased to yield 5 minutes to the gentleman from Massachusetts”), as such speech segments are clearly off-topic.35 We also removed speech segments containing the term “amendment”, since we found during initial inspection that these speeches generally reflect a speaker’s opinion on an amendment, and this opinion may differ from the speaker’s opinion on the underlying bill under discussion.
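In effect, these preparation steps reduce to two lexical checks. The following minimal sketch assumes, per footnote 35 below, that the operational criterion for the formulaic time-yielding segments was simply the presence of the term “yield”; the substring matching is an assumption of the sketch.

```python
# Drop speech segments that contain "yield" (formulaic time-yielding remarks)
# or "amendment" (opinions that may target an amendment rather than the bill).
def keep_segment(text):
    lowered = text.lower()
    return "yield" not in lowered and "amendment" not in lowered

segments = [
    "Madam Speaker, I am pleased to yield 5 minutes to the gentleman from Massachusetts.",
    "Mr. Speaker, this bill fails America's working families.",
]
print([s for s in segments if keep_segment(s)])   # keeps only the second segment
```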

TPL06 created training, test, and development (parameter-tuning) splits of the corpus by randomly selecting debates for each split. Each split represents about 70%, 20%, and 10% of the data, respectively. Table 25 provides a summary of the corpus. In another important step, TPL06 ensured that speech segments remained grouped by debate; they required that all speech segments from an individual debate appear in the same split of the corpus. I consider this requirement to serve two main purposes. First, the task in all these experiments is to classify the perspective of speech segments with respect to their expression of support or opposition to legislation, not with respect to the topic of the legislation. The task is binary pro/con classification irrespective of the topic. By keeping entire debates within a single split of the corpus, the focus is kept on sentiment classification because to at least some degree it has been ensured that speech segments on the same topic are not present in both the training set and the test set. This makes it less likely that features reflective of topic will be responsible for classification performance on the test set. Second, as noted by TPL06 and described below, the use of graph modeling within the classification framework developed by TPL06 is intended to take advantage of relationships between speech segments that are defined only within a debate. In support of direct comparisons, my experiments use exactly the corpus splits defined by TPL06.

                                              corpus total   training set   test set   development set
speech segments                                    3857           2740          860           257
debates                                              53             38           10             5
average number of speech segments per debate         72.8           72.1         86.0          51.4
average number of speakers per debate                32.1           30.9         41.1          22.6
Table 25 - Descriptive Statistics for the Congressional Speech Corpus

The remainder of this chapter is organized as follows. First, I discuss the results of experiments with the CS corpus that build SVM classifiers using OPUS features, and compare my results to TPL06. I then present the graph minimum cut framework developed by TPL06. Following that, I report the results I achieve within that framework, again comparing to TPL06, and then extend it with a novel method of

35 Note that many (potentially formulaic or off-topic) single sentence speech segments remain in the corpus, as they do not contain the term “yield,” which was the simple criterion applied to automatically remove such speech segments. For example, one such speech segment consists of only the text “Mr. Speaker, I demand a recorded vote.”


classifier combination. A discussion and summary of contributions concludes the chapter.

4.2.3 Initial SVM Classification of the CS Corpus

This section presents the methods and results of classification experiments with the CS corpus prior to the introduction of graph models with inter-document relationships.

I begin by reviewing previous results with the CS Corpus. TPL06 provide two simple baselines, which they improve upon with an SVM classifier. The first is a majority baseline, in which all items are classified according to whether the YEA or NAY votes were in the majority. The second baseline was intended to test whether “the task can be reduced to simple lexical checks,” as stated by TPL06: they classified each segment by the signed difference between the number of terms containing the stem “support” and the number of terms containing the stem “oppos”, choosing the majority class in the event of a difference of zero. Their SVM classifier is based on a feature vector of simple unigrams, with no feature selection of any kind: the unigrams are not stemmed, stop words are not removed, all tokens in the corpus are features, and binary presence-of-feature values are used. The performance of these classifiers on the CS corpus test set is shown in Table 26. The unigram-based SVM, hereafter referred to as U-SVM, clearly outperforms the two baseline classifiers.

YEA or NAY classification of speech segments    Classification Accuracy
Majority baseline                                       58.37
“support” – “oppos”                                     62.67
U-SVM                                                   66.05
Table 26 - Classification Accuracy for baseline and SVM classifiers from TPL06, in percent
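The second baseline is simple enough to sketch directly. The code below is a minimal illustration of the check as described, with the majority class (assumed here to be YEA) used to break ties.

```python
# Classify a tokenized speech segment by the signed difference between tokens
# containing the stem "support" and tokens containing the stem "oppos".
def support_oppose_baseline(tokens, majority_class="YEA"):
    score = sum("support" in t for t in tokens) - sum("oppos" in t for t in tokens)
    if score > 0:
        return "YEA"
    if score < 0:
        return "NAY"
    return majority_class

# Two "support"-stem tokens and one "oppos"-stem token give a score of +1 -> YEA.
print(support_oppose_baseline("we supported and will keep supporting it despite the opposition".split()))
```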

I take the U-SVM classifier as my initial point of comparison. I ran experiments using SVM classifiers built in the same manner that I used for the BitterLemons corpus (Section 4.1).36 In initial experiments, I used domain-relevant term lists that were extracted from all terms in the CS corpus, i.e. any term with a minimum frequency of one was eligible to be included if its relative frequency ratio was positive (see Section 3.x). The SVM classifiers were then trained on feature vectors populated by the OPUS features of the domain-relevant terms, extracted from the grammatical relations in which they occurred. I tested term threshold (ρ) values for the domain-

36 There are some differences, though not in how the features are defined. In all SVM classification experiments with the CS corpus, I used the SVMLight (Joachims 1999) implementation of the learning algorithm rather than WEKA’s. This choice was necessary for two reasons. First, TPL06 used SVMLight, and following their method, I used all default parameter values for SVMLight (which includes using linear kernels). Second, WEKA is unable to handle the memory requirements of a corpus this size, given the much larger set of training and test instances and the concomitantly larger feature vectors. Aside from this difference, the experimental method for the grammatical-relation-based SVM classifiers I build is the same as that described for the BL corpus.


relevant term lists that ranged between 0.0 and 3.0, with the same value used for both the verb and noun term lists. The first experiments used the DeterminerFilter as introduced in Section 4.1.3. For all values of ρ tested, the corresponding SVM models outperformed the U-SVM model. In Figure 15, the accuracy achieved in these experiments is graphed relative to U-SVM.37

[Figure 15 chart data: SVM classifier accuracy (percent correct) by term threshold ρ.
ρ:             0.0    0.2    0.4    0.6    1.0    1.5    2.0    2.5    3.0
OPUS-SVM:      68.72  68.49  68.84  68.72  68.26  68.72  69.53  68.60  67.79
U-SVM (TPL06): 66.05 at every value of ρ]

Figure 15 - YEA-NAY SVM classifier accuracy for various values of ρ, using the DeterminerFilter
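The experimental loop behind Figure 15 (and the figures that follow) can be summarized as below. This is a minimal sketch only: select_terms and extract_features are caller-supplied placeholders standing in for the Chapter 3 procedures for domain-relevant term selection and OPUS feature extraction, and scikit-learn's LinearSVC stands in for SVMLight; none of these reproduce the original implementation.

```python
from sklearn.svm import LinearSVC
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import accuracy_score

def sweep_term_threshold(train_docs, y_train, test_docs, y_test,
                         select_terms, extract_features,
                         rhos=(0.0, 0.2, 0.4, 0.6, 1.0, 1.5, 2.0, 2.5, 3.0)):
    """select_terms(docs, rho) and extract_features(doc, terms) are hypothetical
    placeholders for the domain-relevant term and OPUS feature procedures."""
    results = {}
    for rho in rhos:
        terms = select_terms(train_docs, rho)                 # verbs/nouns above threshold rho
        vec = DictVectorizer()
        X_train = vec.fit_transform([extract_features(d, terms) for d in train_docs])
        X_test = vec.transform([extract_features(d, terms) for d in test_docs])
        model = LinearSVC().fit(X_train, y_train)             # linear kernel, default settings
        results[rho] = accuracy_score(y_test, model.predict(X_test))
    return results
```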

The next set of experiments substituted the GeneralFilter for the DeterminerFilter. The results are summarized in Figure 16. For all values of ρ except two (ρ = 2.0 and 2.5), these results are equal to or better than the results for the DeterminerFilter, and thus again also outperform the U-SVM model. The best result, at ρ = 0.4, achieves 70.00% accuracy, a significant improvement over U-SVM (n+ = 104, n− = 138, p < 0.05).38

37 A Wilcoxon Matched-Pairs Signed-Ranks Test over the matched pairs of experiments shows the differences to be significant (W+ = 45, W− = 0, N = 9, p < 0.01).

38 A Wilcoxon Matched-Pairs Signed-Ranks Test over the matched pairs of experiments shows the differences to be significant (W+ = 45, W− = 0, N = 9, p < 0.01).

[Figure 16 chart data: SVM classifier accuracy (percent correct) by term threshold ρ.
ρ:             0.0    0.2    0.4    0.6    1.0    1.5    2.0    2.5    3.0
OPUS-SVM:      68.84  69.53  70.00  69.19  68.26  68.84  69.42  68.26  67.79
U-SVM (TPL06): 66.05 at every value of ρ]

Figure 16 - YEA-NAY SVM classifier accuracy for various values of ρ, using the GeneralFilter

A third set of experiments used domain-relevant term lists extracted with the constraint that the terms have a minimum frequency of 25 in the CS corpus. Using the general relation filter, the results are shown in Figure 17.39

[Figure 17 chart data: SVM classifier accuracy (percent correct) by term threshold ρ.
ρ:             0.0    0.2    0.4    0.6    1.0    1.5    2.0    2.5    3.0
OPUS-SVM:      68.84  69.53  69.65  69.42  68.37  69.19  69.30  68.72  67.79
U-SVM (TPL06): 66.05 at every value of ρ]

Figure 17 - YEA-NAY SVM classifier accuracy for various values of ρ, using the GeneralFilter and domain-relevant terms with a minimum frequency of 25.

Taken together, these results demonstrate that OPUS features consistently outperform unigram features in SVM classifiers for our task. This result is robust across various

39 A Wilcoxon Matched-Pairs Signed-Ranks Test over the matched pairs of experiments shows the differences to be significant (W+ = 45, W− = 0, N = 9, p < 0.01).


parameter settings for ρ as well as two different corpus frequency threshold values. The results also primarily favor the use of the GeneralFilter, which sustains or improves accuracy relative to the DeterminerFilter, by removing grammatical relations that are not directly related to argument structure, adjuncts, or NP-internal structure (see the discussion in Section 4.1.3).

4.2.4 Modeling Inter-Speech Segment Relationships

A number of tasks in machine learning and natural language processing can benefit from the consideration of inter-item relationships, rather than considering each item independently. Semi-supervised methods in machine learning, for example, attempt to label unlabeled training items by considering various similarity measures between a labeled item and some number of unlabeled items. This is valuable because labeled items are often available only in small numbers and can be expensive to produce, while unlabeled items can be collected cheaply and in larger quantities (Blum and Chawla 2001).

The use of minimum cuts in graphs has been shown to be a natural and effective means of exploiting inter-item relationships. In this section we explore the use of graph minimum cuts as a method to further improve the results on our classification task with the CS corpus. The use of minimum cuts in graphs for various natural language processing tasks is an increasingly active area of research (Goldberg and Zhu 2006; Pang and Lee 2004; Thomas, Pang and Lee 2006). Here I summarize the method and describe the specific instantiation of it for our experimental task as originated by TPL06. I then introduce a novel extension to the method, a new classifier combination technique, which achieves the best results thus far on the task.

In my work with the CS corpus to this point, I have used only intra-item features within a straightforward SVM framework. However, TPL06 identify, based on the dialogue-like nature of the CS corpus, two particularly interesting inter-item relationships.40 The first is same-speaker relations. In most cases, all speech segments by the same speaker within a debate can be reasonably expected to express consistent sentiment, YEA or NAY, within the debate. And, in fact, the method by which ground-truth labels were assigned to speech segments ensures that they are labeled as such, whether actually true or not. Second, in many speech segments, speakers make reference to other House members, often for the purpose of expressing agreement with their position on the bill under debate. Thus there exist agreement relations between items by different speakers.

TPL06 exploit these relations using the following classification framework. Let x1, x2, …, xn be the set of speech segments within a debate. Let c represent the class, YEA or NAY, with which each item may be labeled. We define a non-negative function ind_svm(x, c) that provides a score indicating the strength of preference that item x be classified as c. The function is so named because it is defined precisely in terms of

40 Many other possible relations exist, including numerous measures of document similarity as studied in work on semi-supervised classification.


information obtained from the SVM models of the previous section, which are based on the features for individual items. Next, assume we have weighted relations, with wgt(r) indicating the weight of relation r between some pairs of items. A non-negative weight indicates the degree to which it is preferable that the two items within the relation receive the same classification label. These relations can represent the same-speaker and agreement links between speech segments as described above. Any class labeling C = c(x1), c(x2), …, c(xn) for the segments xi in a debate can then be assigned the partition cost defined by Equation 2.

cost(C) = \sum_{x} ind_{svm}(x, \bar{c}(x)) + \sum_{x,x' : c(x) \neq c(x')} \; \sum_{r \text{ between } x,x'} wgt(r)

Equation 2 - Cost of class assignment for speech segments in a debate

Here, c̄(x) is the class opposite the class c(x) assigned to x by the given labeling. Over all items x, Equation 2 sums the individual score assigned to each item for the class label not chosen, plus the sum of the weights of any relations x has to other items that do not receive the same label as x. Minimizing this partition cost thus represents the optimal way to label speech segments: individual item scores drive label assignments, while strong relations between items work to keep those items from receiving different labels.
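As a concrete illustration of Equation 2, the following minimal sketch computes the partition cost for a toy two-segment debate; all weights are invented for illustration.

```python
# ind[x][c] is the SVM-derived preference that segment x be labeled c;
# relations holds weighted links between segment pairs. Per Equation 2, each
# item pays the score of the class it was NOT given, and each relation whose
# endpoints receive different labels pays its weight.
def partition_cost(labeling, ind, relations):
    other = {"YEA": "NAY", "NAY": "YEA"}
    item_cost = sum(ind[x][other[c]] for x, c in labeling.items())
    link_cost = sum(w for (x1, x2), w in relations.items()
                    if labeling[x1] != labeling[x2])
    return item_cost + link_cost

ind = {"s1": {"YEA": 8000, "NAY": 2000}, "s2": {"YEA": 4000, "NAY": 6000}}
relations = {("s1", "s2"): 5000}                                     # e.g. a same-speaker link
print(partition_cost({"s1": "YEA", "s2": "YEA"}, ind, relations))    # 2000 + 6000 = 8000
print(partition_cost({"s1": "YEA", "s2": "NAY"}, ind, relations))    # 2000 + 4000 + 5000 = 11000
```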

This optimization problem seems intractable, as there are 2^n possible binary partitions of the xi. If we represent the problem as a graph, however, we can take advantage of well-developed graph-theoretic algorithms that provide a solution. We construct the graph as follows. The set of vertices in a graph G is defined as {v1, v2, …, vn, s, t}, where each vi corresponds to a speech segment xi in a debate. The nodes s and t, called the source and the sink respectively, correspond to the two classes of our binary classification task. For all vi, two arcs are added to the graph, connecting each vi to the source and the sink. If we consider the source to represent class c and the sink to represent class c̄, the weights on the arcs to the source are given the value of the function ind_svm(x, c) and the weights on the arcs to the sink are given the value ind_svm(x, c̄). Arcs between the vi, representing same-speaker and agreement arcs between speech segments, are weighted with the wgt(r) values. An example of such a graph is shown in Figure 18. Note that the wgt(r) arcs schematically represent both same-speaker and agreement arcs, and thus not all pairs of vi nodes are connected by them.


Figure 18 – The YEA-NAY classification task, including inter-item relations, modeled as a graph. The wgt(r) arcs represent same-speaker and between-speaker agreement links.

Pang and Lee (2004) provide a succinct definition of a graph minimum cut:

A cut (S, T) of graph G is a partition of its nodes into sets S = {s} ∪ S′ and T = {t} ∪ T′, where s ∉ S′, t ∉ T′. Its cost cost(S, T) is the sum of the weights of all arcs crossing from S to T. A minimum cut of G is one of minimum cost.

A minimum cut of G is thus a partition of the graph into two disjoint subgraphs, one containing the source node s and a subset S′ of the vi nodes, and the other containing the sink node t and a subset T′ of the vi nodes. A cut corresponds to a labeling of the segments that has a cost equal to the partition cost of Equation 2. The solution to the optimization problem thus reduces to finding a minimum cut.

Computing the minimum cut(s) of a graph like the one in Figure 18 is a well-studied problem and thus several algorithms exist for the corresponding maximum flow problem. This graph-theoretic problem has practical application to many network distribution problems, including transportation flow modeling and electronic network modeling. Given a graph of the sort in Figure 18, if we consider the weights assigned to each arc as a maximum capacity between its vertices, the solution to the maximum flow problem answers the question What is the maximum flow from the source to the sink, subject to the capacities of the individual arcs across all paths from the source to the sink? It has been shown that the maximum flow solution corresponds directly to the minimum cut, and thus algorithms that solve this problem can provide the set of


arcs in the minimum cut. Typically, the solution to the maximum flow problem will have some arcs in the graph that ‘operate’ at less than capacity (i.e. whose weight is reduced in the solution). These changes in weights influence the minimum cut, and thus allow for the relabeling of items in our classification task.

Restating the definition a different way, the minimum cut can be thought of as the set of arcs that, if removed, would partition the graph in two (subject to the constraints described above), and for which the sum of the capacities of the removed arcs is a minimum. A useful (and real) analogy is to think of a wartime situation in which one side would like to disable its enemy’s supply distribution network at lowest cost. Given an appropriate model of the enemy’s network, the arcs of the minimum cut would indicate the optimal set of supply routes to cut.

At least four algorithms exist for the maximum flow problem. Chekuri (1997) provides experimental comparisons among some of them. We have used Andrew Goldberg’s HIPR program, an efficient implementation of the push-relabel algorithm (Cherkassky and Goldberg 1997). 41,42 These algorithms have been shown to compute an exact solution in polynomial time. In fact, as Pang and Lee (2004) note, running time can be near linear in practice.
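The construction just described can be reproduced in a few lines with any off-the-shelf max-flow implementation. The sketch below uses networkx rather than HIPR, purely for illustration, with the same toy weights as in the earlier cost example.

```python
import networkx as nx

def classify_debate(ind, relations):
    """ind: {segment: (weight_to_source, weight_to_sink)}; source = YEA, sink = NAY.
    relations: {(seg1, seg2): weight} for same-speaker / agreement links."""
    G = nx.DiGraph()
    for x, (w_yea, w_nay) in ind.items():
        G.add_edge("source", x, capacity=w_yea)
        G.add_edge(x, "sink", capacity=w_nay)
    for (x1, x2), w in relations.items():
        G.add_edge(x1, x2, capacity=w)
        G.add_edge(x2, x1, capacity=w)
    # max flow = min cut; segments left on the source side are labeled YEA
    _, (source_side, _sink_side) = nx.minimum_cut(G, "source", "sink")
    return {x: ("YEA" if x in source_side else "NAY") for x in ind}

ind = {"s1": (8000, 2000), "s2": (4000, 6000)}
# The 5000-weight link pulls s2 to YEA despite its own lean toward NAY.
print(classify_debate(ind, {("s1", "s2"): 5000}))
```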

In the public CS corpus distribution, along with the speech segments themselves, Lee et al. provide the graph arc weights and the supporting spreadsheets used to calculate arc weights for the debate graphs. Their intention was to invite comparison with other experiments on this corpus, and sharing this data greatly facilitates such comparisons.

There are three types of weighted arcs in the debate graphs: individual SVM score arcs, same-speaker arcs, and agreement arcs. Here we describe each in detail, including the precise method used by TPL06 to calculate the weights for each type.

1. The individual SVM scores [ind_svm(x, c)] are based on the signed distance d(x) from the feature vector representing x to the decision plane of the SVM classifier model. If the SVM model produces a positive value for this distance for an item, it is classified as YEA; negative values indicate a classification of NAY.

Each distance value is normalized by dividing it by the standard deviation of all scores in the debate containing the speech segment. For each speech segment, the sum of the weights on the arcs to the source and the sink is always 10000. The specific weights are determined as follows:

41 The C-language software, available free for research purposes, can be downloaded at http://www.avglab.com/andrew/soft.html . 42 Our graphs are described in DIMACS format, the standard input format to HIPR as well as other implementations of maximum flow algorithms. See http://lpsolve.sourceforge.net/5.5/DIMACS_maxf.htm for a detailed description


- A speech segment with a normalized score at or below -2 is assigned an arc weight of 0 to the source and an arc weight of 10000 to the sink. That is to say, normalized scores that are heavily in the negative direction are fully weighted in the direction of the NAY class.

- A speech segment with a normalized score at or above +2 is assigned an arc weight of 10000 to the source and an arc weight of 0 to the sink. That is to say, normalized scores that are heavily in the positive direction are fully weighted in the direction of the YEA class.

- A speech segment with a normalized score between -2 and +2 is assigned arc weights as follows:

weight_of_arc_from_source = (normalized score + 2) * 2500.

The weight of the arc to the sink is computed by subtracting the value above from 10000, i.e., ind_svm(x, c̄) =def 10000 − ind_svm(x, c). (A code sketch of this weighting, together with the agreement arc weighting described below, follows the description of the agreement arcs.)

2. Same-speaker arcs are assigned an effectively infinite weight. The specific value is not relevant, as long as it is high enough to ensure that a minimum cut of the graph will never separate two speech segments from the same speaker.

3. Agreement arcs are weighted based on a separate agreement classifier model built by TPL06. They consider only explicit by-name references to other speakers, which simplifies the task of building this classifier and ensures no error in identifying the existence of the references themselves. Then, to determine if the reference is an instance of agreement, they built an SVM classifier based on a presence-of-unigram feature vector derived from a text window 30 tokens before and 20 tokens after the reference, including the reference itself.43 As an illustrative example, consider the speech segment excerpt shown in Figure 19. This excerpt contains numerous by-name references to other members of Congress, all of which are clearly instances of agreement between the speaker and the named members. The text window around the reference to Congressman Larsen is highlighted.

43 They determined this text window size through tuning on the development set.


It is not even clear we can move it to another bill at this point. Yet, it is the only bill standing, and it is a bipartisan effort to try to address this scourge that is crossing the country. I thank Chairman Sensenbrenner; also majority leader Roy Blunt, who has been an early leader in this charge; Chairman Barton of the energy and commerce committee for his willingness to have this. I would also thank the several members who have worked so hard to make this comprehensive anti-meth legislation happen. In particular, I would like to thank Representatives Mark Kennedy, Darlene Hooley of Oregon, Dave Reichert and John Peterson, because they provided much of the content of this comprehensive bill and their consistently strong leadership on the house floor. I would also like to thank the four co-chairmen of the congressional meth caucus, Congressmen Larsen , Calvert, Boswell and Cannon, for their staffs' assistance in putting this together so we could have a bipartisan effort. Congressman Tom Osborne has crusaded on this house floor and across the country on behalf of anti-meth legislation, as has Congressmen Baird, Wamp, Boozman, King, Gordon and so many others. This would not be happening today if we did not have this bipartisan coalition, and I hope it becomes law.

Figure 19 – A speech segment excerpt showing by-name references. The reference classifier text window around a reference to Congressman Larsen is highlighted.

Using a ground-truth labeling method similar to that used for the speech segments themselves, each reference in the training set is labeled as an instance of agreement if the two speakers voted the same way on the bill considered in the debate, and as a disagreement otherwise. The agreement classifier achieves accuracy of about 80%. Only references categorized as agreements (those with a positive score) are used. We use the agreement weights provided by TPL06 without change, though we extend their experimental use of them.

Two free parameters are employed when calculating agreement weights. The first, θagr , is used in normalizing the scores according to the formula in Equation 3:

normalized\_score = \frac{raw\_svm\_score - \theta_{agr}}{\text{std. dev. of all reference scores in the debate}}

Equation 3 – Normalization of agreement weight

The parameter θagr serves to vary the precision of the agreement references used. That is to say, as θagr increases, more agreement instances fail to achieve a positive normalized score, so that only instances with higher confidence (higher raw SVM scores) are retained. This can be valuable because false positives among the agreement arcs can draw a speech segment toward an erroneous change in label, whereas false negatives only leave agreement information unused. TPL06 report results for θagr = 0 and for θagr set to the mean agreement score assigned by the SVM agreement classifier to all references within a debate (the per-debate mean), resulting in the use of only above-average scores. We report additional results for θagr = 1.5, but do not explore


additional values for θagr because at higher thresholds the number of positively valued reference weights shrinks substantially.

For references with a positive normalized score, the final arc weight is computed employing α, the second free parameter, as in Equation 4:

arc\_weight = normalized\_score \times 2500 \times \alpha

Equation 4 – Final arc weight for an agreement link

The parameter α serves to scale the weights and is tuned on the corpus development set to a value that maximizes accuracy. Given the final arc weight value, vertices representing a single speech segment for each of the two speakers involved in the reference are joined with two directed arcs, one in each direction, both with the given arc weight. Which vertices are chosen is arbitrary, and only a single pair of vertices representing the two speakers is connected in this manner, because the infinite weights of the same-speaker arcs effectively propagate the agreement arcs to all vertices associated with those speakers.
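The arc-weight calculations described in items 1 and 3 above can be sketched as follows. The use of the population standard deviation and the example value of alpha are assumptions of this sketch; those implementation details are not part of the corpus distribution.

```python
import statistics

def ind_svm_weights(distances):
    """distances: {segment: signed SVM distance d(x)} for one debate.
    Returns {segment: (weight_to_source, weight_to_sink)}, always summing to 10000."""
    sd = statistics.pstdev(distances.values())
    weights = {}
    for x, d in distances.items():
        z = max(-2.0, min(2.0, d / sd))        # normalized scores beyond +/-2 are fully weighted
        to_source = (z + 2) * 2500             # YEA (source) side
        weights[x] = (to_source, 10000 - to_source)
    return weights

def agreement_arc_weights(ref_scores, theta_agr=0.0, alpha=0.8):
    """ref_scores: {reference_id: raw agreement-classifier score} for one debate.
    theta_agr and alpha correspond to the two free parameters; 0.8 is a placeholder."""
    sd = statistics.pstdev(ref_scores.values())
    weights = {}
    for ref, raw in ref_scores.items():
        normalized = (raw - theta_agr) / sd           # Equation 3
        if normalized > 0:                            # keep only positively scored references
            weights[ref] = normalized * 2500 * alpha  # Equation 4
    return weights
```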

To evaluate classification performance, a graph for each debate in the test set is constructed as just described, the minimum cut is computed, the final classification decisions are derived from it, and the resulting accuracy is computed. TPL06 report the results in Table 27, an augmented version of Table 26, to which we add the final line showing accuracy with θagr = 1.5.

YEA or NAY classification of speech segments                               Classification Accuracy
Majority baseline                                                                 58.37
“support” – “oppos”                                                               62.67
SVM (unigrams)                                                                    66.05
SVM with same-speaker arcs                                                        67.21
SVM with same-speaker arcs and agreement arcs, θagr = 0                           70.81
SVM with same-speaker arcs and agreement arcs, θagr = per-debate mean             70.81
SVM with same-speaker arcs and agreement arcs, θagr = 1.5                         67.33
Table 27 - Classification Accuracy for baseline, SVM, and graph model classifiers, in percent

The CS corpus distribution provides all of the raw and derived SVM scores, as well as all of the raw and derived agreement scores, needed to produce the results in rows 3 through 6 of Table 27. It does not provide the constructed graphs themselves, the sets of predictions made, or certain other specifics (e.g., what value in practice achieves the infinite weighting of same-speaker arcs). Because I aim for the most direct


comparison possible, in order to validate that my experimental setup faithfully adheres to TPL06, I ran the corpus distribution data for the experiments reported in rows 3 through 6 through my implementation of the experiment. I exactly reproduced the accuracy scores reported by TPL06 to three decimal places, and I can thus be confident that my implementation is a faithful instantiation of their experimental setup.

The results in Table 27 show that the addition of same-speaker arcs improves accuracy, and the addition of agreement arcs improves it further. As the precision of the agreement arcs is increased, accuracy remains unchanged at θagr = per-debate mean (though the set of predictions differs) and then drops for θagr = 1.5. We consider these facts more closely below in light of additional experiments. The increase from 66.05% to 67.21% with the addition of same-speaker arcs is not a significant difference. The improvement to 70.81% accuracy with the addition of agreement arcs over the SVM alone (66.05%) is significant (n+ = 155, n− = 196, p < 0.05), as is the improvement over the same-speaker result (67.21%; n+ = 49, n− = 80, p < 0.01).44 The drop to 67.33% is also significant (n+ = 78, n− = 48, p < 0.01).
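The Sign Test used throughout this section can be run as a simple binomial test over the disagreement counts. The sketch below is one common formulation; the mapping of n+ and n− to particular classifiers, and the two-sided alternative, are assumptions of the sketch.

```python
from scipy.stats import binomtest

def sign_test(preds_a, preds_b, gold):
    """Count items on which exactly one classifier is correct, in each direction,
    and test the split against chance (0.5). Prediction lists are placeholders."""
    n_plus = sum(a == g != b for a, b, g in zip(preds_a, preds_b, gold))   # only A correct
    n_minus = sum(b == g != a for a, b, g in zip(preds_a, preds_b, gold))  # only B correct
    p = binomtest(n_plus, n_plus + n_minus, 0.5, alternative="two-sided").pvalue
    return n_plus, n_minus, p
```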

4.2.5 Joining OPUS Features with Inter-Speech Segment Relationships

Buoyed by the improvements in YEA-NAY classification presented in section 4.2.3, my expectation was that these improvements would carry through to graph minimum cut classifier models that integrate inter-speech segment relationships. A classifier that performs better in isolation would probably provide better individual classifier scores from which we derive the weights for the arcs connecting items to the source and sink.

To investigate this possibility, I executed the same experiment as summarized in Table 27, replacing the initial SVM classification scores from TPL06’s unigram feature vectors with my SVM scores from the best-performing OPUS-based feature vectors. These results are shown in Table 28, which repeats part of Table 27 for convenience.

44 TPL06 do not report significance for these results. I determined significance based on my version of the experiments, again using the Sign Test.


YEA or NAY classification of speech segments                                U-SVM    OPUS-SVM
SVM only                                                                    66.05     70.00*
SVM arcs plus same-speaker arcs                                             67.21     70.81*
SVM arcs plus same-speaker arcs and agreement arcs, θagr = 0                70.81     68.37
SVM arcs plus same-speaker arcs and agreement arcs, θagr = per-debate mean  70.81     70.93
SVM arcs plus same-speaker arcs and agreement arcs, θagr = 1.5              67.33     70.12
Table 28 - Classification accuracy, comparing graph model classifiers using two different base SVM classifiers, in percent. Asterisks denote statistically significant differences between the U-SVM and OPUS-SVM classifier accuracies (p < 0.05)

As noted earlier, our OPUS feature-based SVM model, OPUS-SVM, performs significantly better than U-SVM, and this remains true when adding same-speaker arcs to each model. We observe a different pattern when adding agreement arcs to OPUS-SVM. Considering the accuracies reported for OPUS-SVM in column 3 of Table 28 in isolation, they do not differ significantly from each other, except for the value 68.37% (at θagr = 0) which is a difference that is highly significant (n+ = 29, n - = 8, p < 0.001). These results suggest that the same-speaker and agreement arcs are not significantly helping the better-performing OPUS-SVM. Moreover, the lowest precision set of agreement arcs actually damages performance significantly. As the precision of agreement arcs increases, the OPUS-SVM performs slightly better, and appears to maintain its performance while successfully incorporating limited, high- precision agreement information. U-SVM appears more dependent on agreement arcs for improvement, as evidenced by the dropoff where the highest threshold value θagr = 1.5 further restricts the agreement information available. If speakers who are truly in agreement on a debate have similarities in the way they speak in that debate, then the linguistically motivated OPUS-SVM model alone might have exploited some of that information, rendering redundant the (high-precision) agreement arc information.

4.2.6 Classifier Combination

Both U-SVM and OPUS-SVM show varying degrees of benefit from the addition of inter-speech segment relations, but overall the two models exhibit different patterns of performance. It thus may be advantageous to explore combinations of the two classifiers. Classifier combination has been a heavily researched problem, but the graph modeling paradigm employed here invites an approach which to my knowledge


is novel in the min-cut framework: build graphs with multiple ind_svm arcs from the source and to the sink for each vi, one for each distinct SVM model.45 This idea is illustrated in Figure 20 (cf. Figure 18). I will call this classifier combination method the graph union method.

Figure 20 – The YEA-NAY classification task, including inter-item relations, modeled as a graph, using the graph union classifier combination method implemented with multiple ind_svm arcs.
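A minimal sketch of the graph union construction is given below, again using networkx for illustration. Because parallel arcs between the same pair of nodes are equivalent, for max-flow purposes, to a single arc whose capacity is their sum, the sketch simply accumulates the capacities contributed by each classifier.

```python
import networkx as nx

def build_union_graph(ind_by_model, relations):
    """ind_by_model: one dict per classifier, each {segment: (weight_to_source, weight_to_sink)}.
    relations: {(seg1, seg2): weight} for same-speaker / agreement links."""
    G = nx.DiGraph()
    for ind in ind_by_model:
        for x, (w_yea, w_nay) in ind.items():
            for u, v, w in (("source", x, w_yea), (x, "sink", w_nay)):
                if G.has_edge(u, v):
                    G[u][v]["capacity"] += w      # accumulate arcs from multiple models
                else:
                    G.add_edge(u, v, capacity=w)
    for (x1, x2), w in relations.items():
        G.add_edge(x1, x2, capacity=w)
        G.add_edge(x2, x1, capacity=w)
    return G

# The minimum cut is then computed exactly as before, e.g. (with placeholder inputs)
# nx.minimum_cut(build_union_graph([u_svm_ind, opus_svm_ind], relations), "source", "sink")
```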

I ran experiments testing this approach. I began with a straightforward combination of the two SVM models, and then successively added the same-speaker arcs and then the agreement arcs with our three threshold values. The results are shown in Table 29.

45 This idea is of course not restricted to SVM classifiers – an arc weight derived from any kind of classifier model could be added this way.


YEA or NAY classification of speech segments                                Graph Union of U-SVM and OPUS-SVM
SVM arcs                                                                           70.35
SVM arcs plus same-speaker arcs                                                    73.37*
SVM arcs plus same-speaker arcs and agreement arcs, θagr = 0                       73.49
SVM arcs plus same-speaker arcs and agreement arcs, θagr = per-debate mean         73.85
SVM arcs plus same-speaker arcs and agreement arcs, θagr = 1.5                     74.19
Table 29 – Classification accuracy for the graph union of two SVM models, in percent. Asterisk indicates statistically significant difference.

The accuracy achieved with only the arcs associated with the two SVM models increases slightly to 70.35%. This does not differ significantly from OPUS-SVM alone (70.00%, Table 28), but is improved over U-SVM alone at 66.05% (n+ = 155, n− = 192, p ≤ 0.0533). With the addition of same-speaker arcs, accuracy jumps significantly to 73.37% (n+ = 39, n− = 65, p < 0.02) and continues to increase as agreement arcs are added with increasing precision. The accuracies achieved with the agreement arcs do not differ significantly from one another or from the same-speaker result. All of these results, however, remain significantly better than the graph union of just the arcs from the two SVM models, and significantly better than the corresponding results for each SVM model in isolation. The graph union method of classifier combination thus appears to be an effective way to take advantage of the individual strengths of each SVM classifier and of the information available from same-speaker and (high-precision) agreement arcs. These results are the highest accuracies yet reported on this task.

4.2.7 Discussion and Contributions

In this chapter I extended the method described in Chapter 3, and applied OPUS features to classification tasks in two additional domains, using the BitterLemons and Congressional Speech corpora. In both cases I have demonstrated benefit from the deeper levels of linguistic processing, including the extraction of features reflective of argument structure, as motivated by the lexical semantics literature and the psycholinguistic results of Chapter 2.

Pang and Lee (2004), in introducing the minimum cut framework for sentiment classification, note that “it is perfectly legitimate to use knowledge-rich algorithms employing deep linguistic knowledge about sentiment indicators to derive the individual scores. And we could also simultaneously use knowledge lean methods to assign the association scores.” I have successfully taken up that challenge here.


Pang and Lee further explain that a major strength of their classification framework is the ability to combine intra-item and inter-item information in a principled fashion, rather than attempting to place inter-item information, perhaps in some synthetic form, within the kind of individual item feature vectors that serve as input to “traditional” classification algorithms such as SVM or Naïve Bayes. Furthermore, Blum and Chawla (2001) show that the arc weighting functions used in the graph minimum cut framework are of crucial importance, having substantial impact on the quality of the classification performance. I have shown here that improved individual classifier scores result in better arc-weighting functions. Additionally, the distinct feature sets that are the basis for these improved arc-weighting functions are best combined by building them into better individual classifiers and then combining them using the graph union approach. The graph union method performed better overall, and achieves its best performance when combining the best underlying SVM models. (See Appendix 6 – Graph Union vs. Feature Union for additional evidence supporting the graph union method).

Another result of my experiments with the CS corpus is that the addition of the kinds of inter-item relations considered here, same-speaker and (machine-learned) agreement links, does not uniformly lead to improvements, though the overall impression that they are helpful is maintained. Exploiting inter-item information should continue to be an interesting area of research, given that some common kinds of texts for which we would like to identify perspective and sentiment are precisely those that are likely to exhibit repeated authorship and inter-author agreement (and disagreement), such as forums, blogs, and other periodic writings by diarists and columnists. The rise of social networking sites such as MySpace and Facebook provides another potentially rich source of inter-author relationships that could be identified and exploited with similar or perhaps even better certainty than was possible with the Congressional speech corpus.


5 Summary of Contributions and Future Work

5.1 Summary of Contributions

The vast majority of work in sentiment analysis has been specifically targeted at the detection of subjective statements and the mining of opinions. In this dissertation I have addressed a different but related problem that to date has received relatively little attention in NLP research: detecting implicit sentiment, or spin, in text. Drawing on ideas from the literature in lexical semantics, I first established a relationship predictive of sentiment for components of meaning that are thought to be drivers of verbal argument selection and linking (Dowty 1991) and to be arbiters of what is foregrounded or backgrounded in discourse (Hopper and Thompson, 1980). I then used observable proxies for the expression of these meaning components, a set of linguistically motivated features, as input to machine learning methods for building statistical classifier models. This approach yielded models that performed significantly better than baseline models, and achieved the best performance yet published on the classification tasks I executed. I demonstrated the robustness of the approach by successfully applying it to three distinct text domains under a number of different experimental conditions. Sentiment analysis has been acknowledged as a more difficult problem than topic classification (Pang et al., 2002), and my approach establishes a new front on which to pursue this difficult problem.

Here I summarize the specific contributions of this dissertation, by chapter:

Chapter 2: The analysis of the experimental results in Section 2.4.2 confirmed the hypothesis that manipulation of event encodings in specific ways yields specific effects on the sentiment perceived by the readers of those encodings. Different encodings were shown to exhibit varying degrees of the semantic components of Transitivity. Drawing from the experimental data in Section 2.4.1, regression models demonstrated the predictive power that these matters of degree have for the implicit sentiment attributed to those encodings. These results established a link between elements of meaning and surface form, setting the stage for the classification experiments in Chapters 3 and 4. Additionally, the psycholinguistic results of Chapter 2 contribute additional evidence that the semantic components of Transitivity are psychologically real, and that they can provide insights into distributional facts such as the frequency with which certain words appear in certain constructions.

Chapter 3: The classification experiments in Chapter 3 successfully applied the results of Chapter 2 to a practical problem. In these experiments, I extracted observable features of event encodings related to killing according to the methods described in Section 3.2. I then used them to build classifier models in which machine learning algorithms were able to detect patterns of usage that were distinct for each side in the death penalty debate. These models outperformed baseline models. Notably, they did so without any explicit attempt to distinguish opinionated or


subjective language from objective language. In section 3.6 I applied a technique for corpus relevant term identification, and demonstrated that I could obtain further improvement in classification accuracy with a fully automatic end-to-end process. I obtained the best accuracies using OPUS features for automatically derived terms, outperforming unigram and bigram baselines as well as OPUS features for manually selected terms.

Chapter 4: In this chapter I demonstrated the robustness of the method in Chapter 3 by applying it to corpora from two additional domains. In both cases I demonstrated benefit from the linguistically motivated features inspired by the results of Chapter 2. In Section 4.1.4 I reported experimental results that are the best yet achieved for the BL Corpus. In Section 4.2.3 I reported significantly improved classifier models for the CS corpus, prior to the introduction of same-speaker or agreement links. In 4.2.6, I introduced a novel classifier combination method, the graph union method, that successfully extended the graph minimum cut classification approach to achieve further significant improvements in classification for the CS Corpus, again achieving the best results yet reported. The graph union classifier combination method contributes a particularly promising vehicle for exploration into different methods of producing OPUS features and combining them with baseline features and inter-item relationships, which are commonly available in sentiment-oriented task settings.

5.2 Future Work

5.2.1 Extension to Additional Corpora and Domains

The three corpora that I have worked with here were each useful individually and contrasted well with one another. The DP Corpus was probably the most topically focused, and had appreciable amounts of narrative text, mostly describing crime events and court proceedings. Narrative is the type of text most precisely considered by Hopper and Thompson (1980). The BL Corpus was also reasonably topic-focused, though it exhibited more diversity within the domain of the Palestinian-Israeli conflict. While ongoing issues such as the occupation, the intifada, and the status of Jerusalem might elicit long-term discussion, the BitterLemons web site is more focused on weekly events. It thus seems less likely to show the discussion of long-term debates in the way that the DP Corpus does for issues like deterrence and recidivism. The CS Corpus is no doubt the most topically diverse corpus, which likely explains why classification accuracies were the lowest for this corpus.

A research program further exploring and extending the generality of my method would proceed with further experimentation within the domains I have considered in this dissertation, as well as within entirely different domains.

Work with each of the corpora used here could be extended, beginning perhaps with the following:


• The DP Corpus was constructed from documents downloaded in 2005. Since that time, the death penalty issue has seen significant developments, such as moratoria on executions and several important court decisions. Extending the corpus would provide interesting opportunities for additional experiments, perhaps testing models built on older text against test sets of newer text. Related to the DP Corpus, the death penalty memoranda prepared by Alberto Gonzales when he was legal counsel to Texas Governor George W. Bush have been made public and have become the topic of much discussion. There have been claims, for example, that the memoranda exhibit a systematic omission of certain kinds of facts. It would be interesting to experiment with these documents, perhaps using Lin and Hauptmann’s (2006) document collection perspective divergence metric, to see whether the memoranda as a collection can be measured as being more closely aligned with one side or the other in the DP Corpus.

• BitterLemons.org continues its weekly format, and there is about two years’ worth of additional material available since the time the BL Corpus was constructed. Here too it would be interesting to test models built on older data against test sets of newer data. Additionally, the union of all the data would increase the size and therefore the interest of the corpus overall. Moreover, while the guest-authored half of the BL Corpus represents many different writers, a non-trivial number of guest authors have made multiple weekly contributions. This fact practically begs us to integrate the SVM classifier models built for the BL Corpus into a graph minimum cut framework with same-“speaker” links for these repeated guest-author items.

• Congressional floor debate speeches of course never end, and the Congressional Record assures the continued public availability of debate transcripts. Thus the CS Corpus could be extended and used for additional experimentation. Additionally, the CS Corpus has items labeled by party affiliation, which could make for another potentially enlightening set of classification experiments. Moreover, following Gentzkow and Shapiro (2006a) and Quinn et al. (2006), it would be interesting to use my approach to pursue work similar to that done in political economy for tracking the spread of spin, sometimes referred to as the dissemination of talking points or “the echo-chamber effect.” Another line of pursuit would be to build better agreement classifiers for defining agreement links, using my OPUS-feature-based method.

Many possibilities exist for experimentation within additional domains. Among the set of different areas of potential interest for experimentation with respect to spin, some have corpora fairly readily available:

• Bill O’Reilly transcripts, which have been analyzed with respect to O’Reilly’s use of propaganda techniques (Conway et al., 2007)

• Transcripts from the defunct Crossfire show on CNN, which featured two participants from each side, explicitly identified as having a left- or right-leaning perspective.


5.2.2 Methodological Extensions

The psycholinguistic investigation reported in Chapter 2 provides a basis for additional exploration into the relationships among variations in lexical and constructional meaning and sentiment. Numerous other constructions could be examined in psycholinguistic experimentation in order to determine their contribution to perceived sentiment when employed in particular event encodings. In addition to the constructions mentioned in Section 2.2, prepositional phrases and resultatives would likely be important constructions in the study of sentiment, given what seem to be their strong links to semantic components such as kinesis, change of state, and affectedness. Any clear results from such an investigation could inform improved relation filters, or perhaps relation weighting, building on the relation filters discussed in Section 4.1.3.

The degree to which the purely structural aspects of constructions contribute to the conveyance of perceived sentiment could be investigated by employing nonsense words as in Kako (2006a, 2006b). Context, perhaps using the vignette approach from Section 2.4.2.1, could be manipulated to modulate the interpretation of the nonsense verbs.

More generally, constructions could be directly employed as features, in the same vein as the TRANS feature in Section 3.2.2, rather than indirectly through bilexical grammatical relations. The machine learning framework used here would serve as the substrate for a constructions-level linguistic analysis of the link between surface encoding and sentiment. Data sparsity could be addressed in a number of ways, including smoothing with backoff methods based on verb class or WordNet synsets. Experiments extending my approach to sentence or passage-level classification would likely benefit from a feature space enhanced in these ways.

As mentioned in Section 1.1, machine learning methods can be quite effective, even in an error-filled feature space, if the errors are at least reasonably systematic. Nonetheless, it is likely that the linguistically informed features that drove the document-level sentiment classification improvements reported here could produce further improvements if the features contained fewer errors. Experimentation with other tools, parsers in particular, would be useful for comparing their relative performance on tasks such as my implicit sentiment classification task. A parser whose lexicon could easily be augmented would better address domain-relevant multi-word terms and named entities. Additional levels of processing, such as anaphora resolution, could enhance the richness of the feature data extracted.

The procedure for the identification of domain-relevant terms could certainly be enhanced as well. Simple measures would include normalizing British and American spelling differences in the BNC data, as mentioned in Section 3.5, and implementing better smoothing for terms that do not occur in the reference corpus. Terms could be post-filtered against a dictionary or WordNet to reduce noise. Finally, the relative frequency ratio itself, as the core criterion for domain relevance, could be replaced with a more sophisticated method such as the language model approach developed by Tomokiyo and Hurst (2003). Success in automatically identifying the terms for which linguistic features are extracted is all the more important for another endeavor: expanding my approach to additional languages.
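For concreteness, the baseline relative frequency ratio criterion can be stated in a few lines of code. This is a minimal sketch that assumes both corpora are available as token lists; the smoothing constant used for terms unseen in the reference corpus is an arbitrary choice, not the value used in my experiments.

```python
from collections import Counter

def relative_frequency_ratios(domain_tokens, reference_tokens, smoothing=0.5):
    """Score candidate terms by relative frequency ratio against a reference corpus.

    Terms that are much more frequent in the domain corpus than in the
    reference corpus (e.g. the BNC) are treated as domain relevant.
    """
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    dom_total, ref_total = sum(dom.values()), sum(ref.values())
    scores = {}
    for term, count in dom.items():
        p_dom = count / dom_total
        # Smoothing keeps the ratio finite for terms absent from the reference corpus.
        p_ref = (ref[term] + smoothing) / (ref_total + smoothing * len(dom))
        scores[term] = p_dom / p_ref
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```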


Appendix 1 – Experimental Sentences for Experiment 1

High transitive frequency Externally Caused Change of State Verbs

The huge radiators dissipated the heat.
The scandal frayed party unity.
The sunset reddened the evening sky.
The cook thawed the holiday turkey.
The solution oxidized the scrap metal.
The weather stiffened his joints.

High transitive frequency Internally Caused Change of State Verbs

The intense sun blistered the paint.
Acid rain corroded the building.
The storm eroded the beach.
The yeast cultures fermented the beer.
The plants sprouted tender buds.
The violent storms swelled the sea.

Low transitive frequency Externally Caused Change of State Verbs

The police abated violent crime.
The stroke atrophied the right brain.
The fire alarm awoke the residents.
The French chef crumbled the cheese.
Religious extremists exploded the bomb.
The famous violinist vibrated the strings.

Low transitive frequency Internally Caused Change of State Verbs

The plants bloomed yellow blossoms.
The heavy traffic deteriorated the bridge.
The rare disease rotted the potatoes.
The constant rain rusted the car.
The regulations stagnated private investments.
The intense heat wilted the crowd.

Active-Form Transitive Kill Verb Sentences

Terrorists killed eight marketgoers.
The rebel killed the army officer.
Terrorists slaughtered nine hostages.
The city council member assassinated a rival candidate.
Gunmen shot the opposition leader.
The daughters poisoned the blind woman.

Active-Form Ergative Kill Verb Sentences

The man in custody strangled the deputy.
The woman smothered her grandmother.
The local man choked the vagrant.
The teenager drowned the young boy.
The man suffocated the 24-year old woman.
The caretaker starved the severely ill woman.

Nominalized-Form Transitive Kill Verb Sentences

The explosion killed eight marketgoers.
The attack killed the army officer.
The slaughter killed nine hostages.
The assassination plot killed a rival candidate.
The shooting killed the opposition leader.
The poisoning killed the blind woman.

Nominalized-Form Ergative Kill Verb Sentences

The strangling killed the deputy.
The smothering killed the grandmother.
The choking killed the vagrant.
The drowning killed the young boy.
Suffocation killed the 24-year old woman.
Starvation killed the severely ill woman.


Appendix 2 – Experimental Stimulus Headlines

verb | Transitive Effective Headline | Nominalized Transitive Headline | Passive Headline
kill | Terrorists kill eight marketgoers | Explosion kills eight marketgoers | Eight marketgoers are killed
kill | Rebel kills army officer | Attack kills army officer | Army officer is killed
slaughter | Terrorists slaughter nine hostages | Slaughter kills nine hostages | Nine hostages are slaughtered
assassinate | City council member assassinates rival candidate | Assassination plot kills rival candidate | Rival candidate is assassinated
shoot | Gunmen shoot opposition leader | Shooting kills opposition leader | Opposition leader is shot
poison | Daughters poison blind woman | Poisoning kills blind woman | Blind woman is poisoned
strangle | Man in custody strangles deputy | Strangling kills deputy | Deputy is strangled
smother | Woman smothers grandmother | Smothering kills grandmother | Grandmother is smothered
choke | Local man chokes vagrant | Choking kills vagrant | Vagrant is choked
drown | Teenager drowns young boy | Drowning kills young boy | Young boy is drowned
suffocate | Man suffocates 24-year old woman | Suffocation kills 24-year old woman | Twenty-four year old woman is suffocated
starve | Caretaker starves severely ill woman | Starvation kills severely ill woman | Severely ill woman is starved


Appendix 3 – Scatter Plots

Scatter plots for Independent Variables (Semantic Components) by Dependent Variable (Sympathy for Perpetrator)

Appendix 4 – Grammatical Relations of the Stanford Parser

Typed Dependency / Grammatical Relation – Description and Example(s)

aux (auxiliary): An auxiliary of a clause is a non-main verb of the clause. Example: "Reagan has died" → aux(died, has)

auxpass (passive auxiliary): A passive auxiliary of a clause is a non-main verb of the clause which contains the passive information. Example: "Kennedy has been killed" → auxpass(killed, been)

cop (copula): A copula is the relation between the complement of a copular verb and the copular verb. Examples: "Bill is big" → cop(big, is); "Bill is an honest man" → cop(man, is)

conj (conjunct): A conjunct is the relation between two elements connected by a conjunction word. Example: "Bill is big and honest" → conj(big, honest)

cc (coordination): A coordination is the relation between an element and a conjunction. Example: "Bill is big and honest." → cc(big, and)

pred (predicate): The predicate of a clause is the main VP of that clause; the predicate of a subject is the predicate of the clause to which the subject belongs. Example: "Reagan died" → pred(Reagan, died)

arg (argument): An argument of a VP is a subject or complement of that VP; an argument of a clause is an argument of the VP which is the predicate of that clause. Example: "Clinton defeated Dole" → arg(defeated, Clinton), arg(defeated, Dole)

subj (subject): The subject of a VP is the noun or clause that performs or experiences the VP; the subject of a clause is the subject of the VP which is the predicate of that clause. Examples: "Clinton defeated Dole" → subj(defeated, Clinton); "What she said is untrue" → subj(is, What she said)

nsubj (nominal subject): A nominal subject is a subject which is a noun phrase. Example: "Clinton defeated Dole" → nsubj(defeated, Clinton)

nsubjpass (passive nominal subject): A nominal passive subject is a subject of a passive which is a noun phrase. Example: "Dole was defeated by Clinton" → nsubjpass(defeated, Dole)

csubj (clausal subject): A clausal subject is a subject which is a clause. Examples (the subject is "what she said" in both): "What she said makes sense" → csubj(makes, said); "What she said is untrue" → csubj(untrue, said)

comp (complement): A complement of a VP is any object (direct or indirect) of that VP, or a clause or adjectival phrase which functions like an object; a complement of a clause is a complement of the VP which is the predicate of that clause. Examples: "She gave me a raise" → comp(gave, me), comp(gave, a raise); "I like to swim" → comp(like, to swim)

obj (object): An object of a VP is any direct object or indirect object of that VP; an object of a clause is an object of the VP which is the predicate of that clause. Example: "She gave me a raise" → obj(gave, me), obj(gave, raise)

dobj (direct object): The direct object of a VP is the noun phrase which is the (accusative) object of the verb; the direct object of a clause is the direct object of the VP which is the predicate of that clause. Example: "She gave me a raise" → dobj(gave, raise)

iobj (indirect object): The indirect object of a VP is the noun phrase which is the (dative) object of the verb; the indirect object of a clause is the indirect object of the VP which is the predicate of that clause. Example: "She gave me a raise" → iobj(gave, me)

pobj (object of preposition): The object of a preposition is the head of a noun phrase following the preposition. (The preposition in turn may be modifying a noun, verb, etc.)46 Example: "I sat on the chair" → pobj(on, chair)

attr (attributive): Example: "What is that?" → attr(is, What)

ccomp (clausal complement with internal subject): A clausal complement of a VP or an ADJP is a clause with internal subject which functions like an object of the verb or of the adjective; a clausal complement of a clause is the clausal complement of the VP or of the ADJP which is the predicate of that clause. Such clausal complements are usually finite (though there are occasional remnant English subjunctives). Examples: "He says that you like to swim" → ccomp(says, like); "I am certain that he did it" → ccomp(certain, did)

xcomp (clausal complement with external subject): An xcomp complement of a VP or an ADJP is a clausal complement with an external subject. (For now, only "TO-clauses" are recognized, as well as participial clauses.) These xcomps are always non-finite. Examples: "I like to swim" → xcomp(like, swim); "I am ready to leave" → xcomp(ready, leave)

compl (complementizer): A complementizer of a clausal complement is the word introducing it. Example: "He says that you like to swim" → complm(like, that)

mark (marker – word introducing an advcl): A marker of an adverbial clausal complement is the word introducing it. Example: "U.S. forces have been engaged in intense fighting after insurgents launched simultaneous attacks" → mark(launched, after)

rel (relative – word introducing a rcmod): A relative of a relative clause is the head word of the WH-phrase introducing it. Examples: "I saw the man that you love" → rel(love, that); "I saw the man whose wife you love" → rel(love, wife)

46 I use the Stanford Parser's "collapsed" form of the relations, which in some cases reduces two relations to one. For prepositions, the prep and pobj relations are combined. For example, the phrase "reason for a change" produces the relations prep(reason-2, for-3) and pobj(for-3, change-6). These relations are "collapsed" to the single relation prep_for(reason, change).

acomp (adjectival complement): An adjectival complement of a VP is an adjectival phrase which functions like an object of the verb; an adjectival complement of a clause is the adjectival complement of the VP which is the predicate of that clause. Example: "She looks very beautiful" → acomp(looks, very beautiful)

agent (agent): The agent of a passive VP is the complement introduced by "by" and doing the action. Example: "The man has been killed by the police" → agent(killed, police)

ref (referent): The "referent" grammatical relation. A referent of an NP is a relative word introducing a relative clause modifying the NP. Example: "I saw the man that you love" → ref(man, that)

expl (expletive): This relation captures an existential there. Example: "There is a statue in the corner" → expl(is, there)

punct (punctuation): This is used for any piece of punctuation in a clause, if punctuation is being retained in the typed dependencies. Example: "Go home!" → punct(Go, !)

mod (modifier): A modifier of a VP is any constituent that serves to modify the meaning of the VP (but is not an ARGUMENT of that VP); a modifier of a clause is a modifier of the VP which is the predicate of that clause. Examples: "I swam in the pool last night" → mod(swam, in the pool), mod(swam, last night)

advcl (adverbial clause modifier): An adverbial clause modifier of a VP is a clause modifying the verb (temporal clauses, consequences, conditional clauses, etc.). Examples: "The accident happened as night was falling" → advcl(happened, falling); "If you know who did it, you should tell the teacher" → advcl(know, tell)

purpcl (purpose clause modifier): A purpose clause modifier of a VP is a clause headed by "(in order) to" specifying a purpose. Note: at present we only recognize ones that have "in order to", as otherwise, given our surface representations, we can't distinguish these from xcomps. We can also recognize "to" clauses introduced by "be VBN". Example: "He talked to the president in order to secure the account" → purpcl(talked, secure)

tmod (temporal modifier): A temporal modifier of a VP or an ADJP is any constituent that serves to modify the meaning of the VP or the ADJP by specifying a time; a temporal modifier of a clause is a temporal modifier of the VP which is the predicate of that clause. Example: "I swam in the pool last night" → tmod(swam, last night)

rcmod (relative clause modifier): A relative clause modifier of an NP is a relative clause modifying the NP. The link points from the head noun of the NP to the head of the relative clause, normally a verb. Examples: "I saw the man that you love" → rcmod(man, love); "I saw the book which you bought" → rcmod(book, bought)

amod (adjectival modifier): An adjectival modifier of an NP is any adjectival phrase that serves to modify the meaning of the NP. Example: "Sam eats red meat" → amod(meat, red)

infmod (infinitival modifier): Example: "points to establish are ..." → infmod(points, establish)

partmod (participial modifier): A participial modifier of an NP or VP is a VP[part] that serves to modify the meaning of the NP or VP. Examples: "truffles picked during the spring are tasty" → partmod(truffles, picked); "Bill picked Fred for the team demonstrating his incompetence" → partmod(picked, demonstrating)

num (numeric modifier): A numeric modifier of an NP is any number phrase that serves to modify the meaning of the NP. Example: "Sam eats 3 sheep" → num(sheep, 3)

number (element of compound number): A compound number modifier is a part of a number phrase or currency amount. Example: "I lost $3.2 billion" → number($, billion)

appos (appositional modifier): An appositional modifier of an NP is an NP that serves to modify the meaning of the NP. Example: "Sam, my brother, eats red meat" → appos(Sam, brother)

nn (noun compound modifier): A noun compound modifier of an NP is any noun that serves to modify the head noun. Note that this has all nouns modify the rightmost noun, a la Penn headship rules. There is no intelligent noun compound analysis. Example: "Oil price futures" → nn(futures, oil), nn(futures, price)

abbrev (abbreviation modifier): An abbreviation modifier of an NP is an NP that serves to abbreviate the NP. Example: "The Australian Broadcasting Corporation (ABC)" → abbrev(Corporation, ABC)

advmod (adverbial modifier): An adverbial modifier of a word is an RB or ADVP that serves to modify the meaning of the word. Example: "genetically modified food" → advmod(modified, genetically)

neg (negation modifier): The negation modifier is the relation between a negation word and the word it modifies. Examples: "Bill is not a scientist" → neg(scientist, not); "Bill doesn't drive" → neg(drive, n't)

poss (possession modifier): Examples: "their offices" → poss(offices, their); "Bill's clothes" → poss(clothes, Bill)

possessive (possessive modifier – 's): Example: "John's book" → possessive(John, 's)

prt (phrasal verb particle): The "phrasal verb particle" relation identifies phrasal verbs. Example: "They shut down the station." → prt(shut, down)

det (determiner): Examples: "The man is here" → det(man, the); "In which city do you live?" → det(city, which)

predet (predeterminer): Example: "All the boys are here" → predet(boys, all)

preconj (preconjunct): Example: "Both the boys and the girls are here" → preconj(boys, both)

prep (prepositional modifier): A prepositional modifier of an NP or VP is any prepositional phrase that serves to modify the meaning of the NP or VP. Examples: "I saw a cat in a hat" → prep(cat, in); "I saw a cat with a telescope" → prep(saw, with)

sdep (semantic dependent): The "semantic dependent" grammatical relation has been introduced as a supertype for the controlling subject relation.

xsubj (controlling subject): Example: "Tom likes to eat fish" → xsubj(eat, Tom)

Table 30 - Grammatical relations available from the Stanford Parser, taken from de Marneffe et al. (2006) and the javadoc documentation available from the parser's distribution.
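As footnote 46 describes, the collapsed representation folds prep and pobj relations into a single prep_X relation. The following is a minimal sketch of that collapsing step, assuming the uncollapsed relations arrive as strings with word indices (the parser can also produce collapsed relations directly; my actual extraction code may differ).

```python
import re

# Parse strings like "prep(reason-2, for-3)" into (relation, (word, index), (word, index)).
REL = re.compile(r"^(\w+)\((.+?)-(\d+), (.+?)-(\d+)\)$")

def collapse_prepositions(relations):
    """Collapse prep/pobj pairs into single prep_X relations (cf. footnote 46)."""
    parsed = []
    for r in relations:
        m = REL.match(r)
        if m:
            rel, gov, gov_i, dep, dep_i = m.groups()
            parsed.append((rel, (gov, int(gov_i)), (dep, int(dep_i))))

    # Index each pobj relation by its governing preposition token.
    pobj_by_prep = {gov: dep for rel, gov, dep in parsed if rel == "pobj"}

    collapsed = []
    for rel, gov, dep in parsed:
        if rel == "pobj":
            continue  # folded into the collapsed prep_X relation below
        if rel == "prep" and dep in pobj_by_prep:
            obj = pobj_by_prep[dep]
            collapsed.append(f"prep_{dep[0]}({gov[0]}, {obj[0]})")
        else:
            collapsed.append(f"{rel}({gov[0]}, {dep[0]})")
    return collapsed

# collapse_prepositions(["prep(reason-2, for-3)", "pobj(for-3, change-6)"])
# -> ["prep_for(reason, change)"]
```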


Appendix 5 – Development of the Death Penalty Corpus

The Web represents a vast resource of text for language and language technology research, but the data comes with well-known problems.47 This appendix provides details on the procedures I used to prepare web-based data for research purposes, and the rationale behind those procedures. It is based on my experience preparing data for the DP Corpus.

The idea behind the DP Corpus is to provide textual material representing the viewpoints of both the pro-death penalty and anti-death penalty movements. The specific web sites from which documents were collected were found through Google searches, and are discussed in Section 3.3.2.

The corpus includes documents that were downloaded from other sites through links. The “crawling” or “spidering” was carried out using the wget tool (http://www.gnu.org/software/wget). Link depth was limited to five levels. Roughly 1000 (non-image) documents from each side of the death penalty debate were downloaded.

Rationale for Cleaning Procedures

After downloading the initial documents, it quickly became clear that several issues common to web-based data indicated the need for both automated and manual cleaning and filtering of the documents. The issues summarized here provide the rationale for the reasonably intensive cleansing of the data that I carried out:

1. Repetitive headers, footers, and other navigational material (aka HTML “furniture”). My interest is in the textual commentary, prose, and real language use in the documents. But these repetitive elements can be too easily picked up by learning algorithms and dominate the results. Additionally, models based on these textual elements would certainly not transfer well to documents from other origins. These needed to be removed.
2. Some documents had virtually no real content in them. These are often forms, purely navigational elements, and the like. Some documents contained tables of numeric data, with virtually no sentential content. These are outside the scope of interest. Of course, image files, stylesheets, and the like must also be removed.
3. Some documents could be plausibly judged as neutral and/or off topic. The task we are focusing on is sentiment detection, not topic classification.

47 There is an active community developing tools to help manage web-based corpus research. For example, see http://www.sketchengine.co.uk/WebBootCaT.htm. There is also a series of ACL workshops devoted entirely to building corpora from the web (see http://cental.fltr.ucl.ac.be/wac3). This year’s workshop includes a competitive shared task devoted specifically to clean-up procedures for web-derived corpus documents, highlighting the shared concerns motivating the procedures described here.


4. Some documents were exact copies of other documents found under different names.

Data Preparation and Cleansing Procedures

The first step in the data preparation procedure was to extract the text from all downloaded documents, including HTML, MS Word, and PDF documents. The raw text was then sentence-delimited.

The first pass at data cleaning was an automatic, heuristic series of filters:

1. These file types were retained: MS Office documents, PDF, HTML, XML, XHTML, and many other content-full formats (e.g., WordPerfect, RTF). Non-text file types were removed (.jpg, .gif, .css).
2. Some boilerplate data filtering was employed. When text was extracted and sentence-delimited, a small set of items was removed (e.g., copyright notices).
3. An additional layer of filters worked to automatically weed out useless documents. For each document, I performed the following tests (a minimal sketch of these filters follows this list):
• If the file had fewer than ten total sentences, I rejected the file. Ideally this threshold would be made configurable, because it clearly would not work for documents such as message posts.
• If the difference between the longest and shortest sentence was less than five tokens, I rejected the file. The motivation here is that the range in sentence length is not large enough for the document to be prose; it was likely tabular data, for example.
• If more than half the sentences had length less than three, I rejected the file. Again, the file is probably tabular data, mis-tagged media, or something else un-prose-like.
• If two times the standard deviation of sentence token length was less than five (i.e., 95% of the sentences differ from the mean length by less than five tokens), I rejected the file. This was another approach to eliminating tabular or list data files.
• For files that passed all these tests, I eliminated sentences of token length less than three. I also eliminated each sentence for which 90% or more of its tokens were punctuation or digits.
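The sentence-statistics filters in step 3 can be summarized in a few lines of code. This is a minimal sketch, assuming each document has already been tokenized into sentences of tokens; the thresholds mirror the ones above but are exposed as parameters.

```python
import statistics
import string

def keep_document(sentences, min_sentences=10):
    """Heuristic prose filter along the lines of the tests listed above (a sketch).

    `sentences` is a list of token lists for one document.
    """
    if len(sentences) < min_sentences:
        return False
    lengths = [len(s) for s in sentences]
    if max(lengths) - min(lengths) < 5:            # not enough range to be prose
        return False
    if sum(1 for n in lengths if n < 3) > len(lengths) / 2:
        return False                               # mostly fragments: tables, lists
    if 2 * statistics.pstdev(lengths) < 5:         # sentence lengths too uniform
        return False
    return True

def clean_sentences(sentences):
    """For retained documents, drop very short or punctuation/digit-heavy sentences."""
    kept = []
    for s in sentences:
        junk = sum(1 for t in s if all(c in string.punctuation + string.digits for c in t))
        if len(s) >= 3 and junk / len(s) < 0.9:
            kept.append(s)
    return kept
```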

Running this regimen against the DP Corpus, a visual inspection of all the rejected material indicated that it was virtually all tabular data, lists, headers, footers, navigation links, long runs of punctuation, and other noise. The precision of what was eliminated was very high, though recall turned out to be fairly low. Rejection rates of documents based on this procedure for the DP Corpus as a whole ranged from two to seven percent. For accepted documents, between four and 11 percent of sentences were rejected.

After experimenting with the data produced from the procedure defined so far, I found that large amounts of HTML furniture remained in the files. Additionally, random inspection of individual files found lingering navigation files, forms, tabular data, and off-topic documents. Because of this, a manual inspection and filtering of the documents was conducted. The procedure was as follows:

1. I generated a spreadsheet that provided, for each document, hyperlinks to both the original document and its sentence-delimited extracted text. For each document, columns were provided for:
• sentiment (PRO, CON, neutral, N/A)
• general document tone (legal, scientific, journalistic, op-ed – this is informal)
• special notes or comments
• a Boolean flag indicating whether to retain the document
• a Boolean flag indicating whether the sentence-delimited file was manually edited in this phase
2. I inspected each document in its original form and judged its sentiment, contentfulness, and overall relevance. Off-topic or purely navigational files, forms, duplicates, etc. were marked to be eliminated. Marginal documents were marked with special notes (e.g., biblical excerpts). Retained documents were hand-edited if deemed necessary, in their sentence-delimited version, to remove repetitive footer or navigational material that had survived to this point, and were marked as such.

The manual procedure just described was conducted for the PRO half of the DP Corpus in its entirety, and for about 100 documents from the ANTI half. As 596 documents were retained from the PRO half of the corpus, in order to expedite matters the 596 largest documents from the ANTI half (subject to the judgments already made for the 100 inspected files) were taken to be the complementary half of the corpus. I had found that, in general, the non-substantive documents tend to be the smallest, and thus the larger documents were retained.

A final automated procedure was applied, necessitated in part because the time-consuming manual procedure was not carried out in its entirety: all sentences across each half of the corpus were scanned for duplicate sentences. In almost every case these were additional footers, table headers, or other formatting material, and in some cases they appeared to be publishing errors. Additionally, this method was used to find remaining documents that were entire duplicates of each other. In these cases, one copy of the document was retained and the others were eliminated. The final corpus has 1152 documents, split evenly between PRO and CON.

A valuable enhancement to the DP Corpus would be to conduct annotation for ground truth labels by one or more additional human annotators. This would facilitate the calculation of inter-annotator agreement and provide an upper bound on the document-level sentiment classification task for this corpus. However, I am confident that inter-annotator agreement would be exceptionally high. While I did find neutral or off-topic documents, I found no documents whose ground truth label of PRO or CON would be the opposite of the one associated with its web site of origin.
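The final duplicate scan could be implemented with simple exact matching, as in the sketch below; the data layout (a mapping from document id to its list of sentences) is a hypothetical simplification of my actual pipeline.

```python
import hashlib
from collections import defaultdict

def find_duplicates(docs):
    """Find sentences repeated across documents and documents that are exact copies.

    `docs` maps a document id to its list of sentence strings.
    """
    sentence_locations = defaultdict(set)
    doc_fingerprints = defaultdict(list)
    for doc_id, sentences in docs.items():
        for sent in sentences:
            sentence_locations[sent.strip().lower()].add(doc_id)
        digest = hashlib.md5("\n".join(sentences).encode("utf-8")).hexdigest()
        doc_fingerprints[digest].append(doc_id)

    repeated_sentences = {s: ids for s, ids in sentence_locations.items() if len(ids) > 1}
    duplicate_docs = [ids for ids in doc_fingerprints.values() if len(ids) > 1]
    return repeated_sentences, duplicate_docs
```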


Appendix 6 – Graph Union vs. Feature Union

To further examine the value of the graph union approach to classifier combination introduced in Section 4.2.6, I considered an alternative: build a single SVM classifier using a feature union of the features used to build the two separate SVM models. For OPUS-SVM, I naturally have the feature vectors available, but for U-SVM, I have only the raw hyperplane distances and derived scores as provided by TPL06. Thus I built my own unigram-based SVM classifier, trained on the training set of the CS Corpus. It is difficult to exactly reproduce the tokenization used by TPL06 to produce U-SVM, so this model, U-SVM 2, is not identical. For that reason, I ran experiments testing both the feature union of U-SVM 2 and OPUS-SVM and the graph union of the two.

Matt Thomas (p.c.) provided the token list used to build U-SVM, so we can at least consider an overall comparison of the feature sets. U-SVM’s feature vectors are based on a set of 32,012 tokens, and U-SVM 2’s feature vectors are based on a set of 23,936 tokens. The U-SVM token set has 99.38% recall of the U-SVM 2 token set (74.31% precision), so the smaller feature set is very nearly a proper subset of the larger. A qualitative inspection of the differences indicates that the tokenization differs primarily with regard to punctuation. For example, tokens in the U-SVM set appear not to separate the sentence-ending period from the final word of the sentence, which naturally would inflate the size of the token set, as we see here.

YEA or NAY classification of speech segments | U-SVM 2 | Feature Union U-SVM 2 + OPUS-SVM | Graph Union U-SVM 2 + OPUS-SVM
SVM | 66.51 | 67.67 | 68.37
SVM with same-speaker arcs | 67.33 | 67.91 | 69.42*
SVM with same-speaker arcs and agreement arcs, θagr = 0 | 67.09 | 64.77 | 68.72*
SVM with same-speaker arcs and agreement arcs, θagr = | 65.00 | 65.93 | 69.42*
SVM with same-speaker arcs and agreement arcs, θagr = 1.5 | 66.63 | 66.63 | 67.79

Table 31 - Classification accuracy for the graph union of two SVM models, in percent. Asterisks indicate significant improvements of Graph Union over Feature Union.

Table 31 summarizes the results of experiments with U-SVM 2 alone, and as a feature union and graph union with OPUS-SVM.


Accuracy for the U-SVM 2 classifier alone is initially (not significantly) higher than U-SVM, but it fails to benefit as much from the addition of inter-item relations. The more interesting result is that the graph union of the classifiers uniformly performs better than the feature union of the classifiers. This improvement is significant for the same-speaker condition (n+ = 4, n- = 17, p < 0.01) and the agreement conditions for θagr = 0 (n+ = 3, n- = 37, p << 0.01) and θagr = (n+ = 11, n- = 41, p << 0.01). Overall these results support the conclusion that the novel graph union method has appreciable value.
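To make the contrast concrete, the sketch below shows how scores from two classifiers can be folded into a single minimum cut problem, in the spirit of the graph union method. It is only an illustration: the score pooling, capacity scaling, and helper names are assumptions, not the exact formulation of Section 4.2.6.

```python
import networkx as nx

def graph_union_labels(items, scores_a, scores_b, links):
    """Label items YEA/NAY by a minimum cut over the union of two classifiers' scores.

    `scores_a` and `scores_b` map item ids to (assumed) probabilities of the YEA
    class from the two SVMs; `links` is a list of (item, item, weight) association
    arcs such as same-speaker or agreement links.
    """
    G = nx.DiGraph()
    for item in items:
        yea = (scores_a[item] + scores_b[item]) / 2.0   # simple score pooling
        G.add_edge("YEA", item, capacity=yea)           # cost of labeling the item NAY
        G.add_edge(item, "NAY", capacity=1.0 - yea)     # cost of labeling the item YEA
    for u, v, w in links:
        G.add_edge(u, v, capacity=w)                    # cost of separating linked items
        G.add_edge(v, u, capacity=w)

    _, (yea_side, _) = nx.minimum_cut(G, "YEA", "NAY")
    return {item: ("YEA" if item in yea_side else "NAY") for item in items}
```

A feature union, by contrast, simply concatenates the two feature representations for each item and trains a single classifier; the results in Table 31 suggest that combining the models at the graph level is the more effective of the two strategies.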


Bibliography

Allport, G.W. (1935). Attitudes. In C.M. Murchison (ed.), Handbook of Social Psychology. Winchester, MA: Clark University Press.

Baroni, M., & Vegnaduzzo, S. (2004). Identifying subjective adjectives through web-based mutual information. In Proceedings of KONVENS-04 (pp. 17–24). Vienna. Retrieved from http://sslmit.unibo.it/baroni/publications/konvens2004/wmiKONV.pdf

Bem, D. J. (1970). Beliefs, attitudes and human affairs . Belmont, CA: Wadsworth.

Benamara, F., Cesarano, C., Picariello, A., Reforgiato, D., & Subrahmanian, V.S. (2007). Sentiment analysis: adjectives and adverbs are better than adjectives alone. In Proceedings of ICWSM 2007. Retrieved from http://oasys.umiacs.umd.edu/oasys/papers/icwsmV2.pdf

Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., & Jurafsky, D. (2004). Automatic extraction of opinion propositions and their holders. In J. G. Shanahan, J. Wiebe, & Y. Qu (Eds.), Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications , Stanford:ACM. Retrieved from http://www.stanford.edu/~jurafsky/SS404BethardS.pdf

Blum, A., & Chawla, S. (2001). Learning from labeled and unlabeled data using graph mincuts. In Proceedings of ICML (pp.19–26).

Bolinger, D. L. (1968). Entailment and the meaning of structures. Glossa, 2 , 119–127.

Brank, J., Grobelnik, M., Milić-Frayling, N., & Mladenić, D. (2002). Feature selection using linear support vector machines. Microsoft Technical Report MSR-TR-2002-63.

Burstein, M. H., (1979). The use of object-specific knowledge in natural language processing. In Proceedings of the 17th Annual Meeting of the Association for Computational Linguistics (pp. 53-58). San Diego:ACL. Retrieved from http://acl.ldc.upenn.edu/P/P79/P79-1013.pdf .

Carroll, J., Minnen, G., & Briscoe, T. (1999). Corpus annotation for parser evaluation. In Proceedings of the EACL workshop on Linguistically Interpreted Corpora (LINC), (pp. 35-41). Retrieved from http://xxx.lanl.gov/PS_cache/cs/pdf/9907/9907013v1.pdf .

Cesarano, C., Dorr, B., Picariello, A., Reforgiato, D., Sagoff, A., & Subrahmanian, V.S. (2006, March). OASYS: An opinion analysis system. In Proceedings of AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs. Retrieved from http://oasys.umiacs.umd.edu/oasys/papers/oct31.ps

Cesarano, C., Picariello, A., Reforgiato, D., & Subrahmanian, V.S. (2007). The OASYS 2.0 opinion analysis system. Accepted as DEMO to ICWSM 2007. Retrieved from http://oasys.umiacs.umd.edu/oasys/papers/icwsm-demo.pdf

Chekuri, C. S., Goldberg, A. V., Karger, D. R., Levine, M. S., & Stein, C. (1997, January). Experimental study of minimum cut algorithms. In 8th Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 324-333).

Cherkassky, B.V., & Goldberg, A.V. (1997). On implementing push-relabel method for the maximum flow problem. Algorithmica, 19 , 390-410.

Chesley, P., Vincent, B., Xu, L., & Srihari, R. (2006). Using verbs and adjectives to automatically classify blog sentiment. In Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational Approaches to Analyzing Weblogs , Stanford:AAAI. Retrieved from http://www.aaai.org/Papers/Symposia/Spring/2006/SS-06-03/SS06-03- 005.pdf

Conway, M., Grabe, M. E., & Grieves, K. (2007). Villains, victims and the virtuous in Bill O’Reilly’s “No-Spin Zone”: Revisiting world war propaganda techniques. Journalism Studies , 8, 197-223.

Damerau, F. J. (1993). Generating and evaluating domain-oriented multi-word terms from texts. Information Processing and Management , 29 , 433–447.

de Marneffe, M., MacCartney, B. & Manning, C. D. (2006). Generating typed dependency parses from phrase structure parses. In Proceedings of LREC 2006 . Retrieved from http://www-nlp.stanford.edu/~wcmac/papers/td- lrec06.pdf .

Dowty, D. R. (1991). Thematic proto-roles and argument selection. Language , 67 , 547-619.

Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19, 61-74.

Durbin, S. D., Richter, J. N., & Warner, D. (2003). A system for affective rating of texts. In Proceedings of the 3rd Workshop on Operational Text Classification, 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Washington, DC . Retrieved from http://www.cs.montana.edu/~richter/affective_rating.pdf .


Eagly, A. and Chaiken, S. (1993). The psychology of attitudes . Fort Worth, TX: Harcourt Brace Jovanovich.

Ekman, P. (1994). Moods, emotions, and traits. In P. Ekman and R. J. Davidson (Eds.), The Nature of Emotion . Oxford: Oxford University Press.

Filip, H., Tanenhaus, M. K., Carlson, G. N., Allopenna, P. D., & Blatt, J. (2001). Reduced relatives judged hard require constraint-based analyses. In S. Stevenson and P. Merlo (Eds.), Lexical representations in sentence processing . Cambridge: Cambridge University Press

Fishbein, M., & Ajzen, I. (1975). Belief, attitude, intention, and behavior: an introduction to theory and research . Reading, MA: Addison-Wesley.

Fisher, C., Gleitman, H., & Gleitman, L. R. (1991). On the semantic content of frames. Cognitive Psychology, 23 , 331-392.

Gamon, M. (2004a). Linguistic correlates of style: authorship classification with deep linguistic analysis features. In Proceedings of COLING 2004 (pp. 611-617). Retrieved from http://research.microsoft.com/nlp/publications/coling2004_authorship.pdf .

Gamon, M. (2004b). Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of COLING 2004 , (pp 841-847). Retrieved from http://research.microsoft.com/nlp/publications/coling2004_sentiment.pdf .

Gamon, M., & Aue, A. (2005). Automatic identification of sentiment vocabulary: exploiting low association with known sentiment terms. In Proceedings of the Workshop on Feature Engineering for Machine Learning in Natural Language Processing at ACL 2005 , (pp. 57-64). Retrieved from http://research.microsoft.com/nlp/publications/sentiment_ACL_workshop_20 05.pdf .

Gentzkow, M., & Shapiro, J. (2006a, November). What drives media slant? evidence from U.S. newspapers. Retrieved from http://home.uchicago.edu/~jmshapir/biasmeas111306.pdf

Gentzkow, M., & Shapiro, J. (2006b). Media bias and reputation. Journal of Political Economy , 114 , 280-316. Retrieved from http://home.uchicago.edu/~jmshapir/bias.pdf .

Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure . Chicago:University of Chicago Press.


Goldberg, A. B., & Zhu, J. (2006). Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. In TextGraphs: HLT/NAACL Workshop on Graph-based Algorithms for Natural Language Processing. Retrieved from http://pages.cs.wisc.edu/~goldberg/publications/sslsa.pdf

Grefenstette, G., Qu, Y., Shanahan, J. G., & Evans, D. A. (2004). Coupling niche browsers and affect analysis for an opinion mining application. In Proceedings of RIAO 2004 . Retrieved from http://www.riao.org/Proceedings- 2004/papers/0140.pdf

Grimshaw, J. (1990). Argument structure . Cambridge, MA:MIT Press.

Groseclose, T., & Milyo, J. (2005). A measure of media bias. The Quarterly Journal of Economics , 120 , 1191-1237. Retrieved from http://www.polisci.ucla.edu/faculty/groseclose/pdfs/MediaBias.pdf .

Harden, B. (2006, July 26). On Puget Sound, it’s orca vs. inc. The Washington Post , p. A3.

Hatzivassiloglou, V., & McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the ACL. Somerset, NJ:Association for Computational Linguistics (pp. 174-181).

Hearst, M. (1992). Direction-Based Text Interpretation as an Information Access Refinement. In P. Jacbos (ed.) Text-Based Intelligent Systems , Hillsdale, NJ:Lawrence Erlbaum.

Hopper, P., & Thompson, S. A. (1980). Transitivity in grammar and discourse. Language , 57 , 251-295.

Joachims, T. (1998). Text categorization with support vector machines: learning with many relevant features. In Proceedings of the European Conference on Machine Learning (ECML). (pp. 137-142). Berlin: Springer.

Joachims, T. (1999). Making large-Scale SVM Learning Practical. In B. Schölkopf, C. Burges & A. Smola (Eds.), Advances in Kernel Methods - Support Vector Learning , (pp. 137-142) . Cambridge, MA:MIT-Press.

Kako, E. (2006a). Thematic role properties of subjects and objects. Cognition , 101,1- 42.

Kako, E. (2006b). The semantics of syntactic frames. Language and Cognitive Processes , 21 , 562-575.


Keller, F., & Alexopoulou, T. (2001). Phonology competes with syntax: experimental evidence for the interaction of word order and accent placement in the realization of information structure. Cognition, 79 , 301-372. Retrieved from http://homepages.inf.ed.ac.uk/keller/papers/cognition01.html

Keller, F., Corley, M., Corley, S., & Konieczny, L. (1998). WebExp: A java toolbox for web-based psychological experiments. Technical Report HCRC/TR-99, Human Communication Research Centre, University of Edinburgh. Retrieved from http://www.hcrc.ed.ac.uk/web_exp/doc/users_guide.html

Kilgarriff, A. (1997). Putting frequencies into the dictionary. International Journal of Lexicography, 10 , 135-155. Retrieved from http://www.kilgarriff.co.uk/Publications/1996-K-IJLFreqs.pdf

King, T. H., Crouch, R., Riezler, S., Dalrymple, M., & Kaplan, R. (2003). The PARC 700 dependency bank. In 4th International Workshop on Linguistically Interpreted Corpora (LINC-03). Retrieved from http://citeseer.ist.psu.edu/cache/papers/cs/29258/http:zSzzSzwww2.parc.comz SzistlzSzmemberszSzriezlerzSzPAPERSzSzLINC03.pdf/king03parc.pdf .

Klavans, J., & Kan, M. (1998). Role of verbs in document analysis. In Proceedings of the Conference COLING-ACL , (pp. 680-686). Montreal:ACL.

Klein, D., & Manning, C. D. (2002, December). Fast exact inference with a factored model for natural language parsing. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in Neural Information Processing Systems 15, (pp. 3-10). Cambridge, MA: MIT Press.

Klein, D., & Manning, C. D. (2003). Accurate unlexicalized parsing. In Proceedings of the 41st Meeting of the Association for Computational Linguistics, (pp. 423-430). Somerset, NJ:ACL. Retrieved from http://nlp.stanford.edu/pubs/unlexicalized-parsing.pdf

Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics , 22 , 79–86.

Lakoff, G. (1987). Women, fire and dangerous things: what categories reveal about the mind . Chicago:University of Chicago Press.

Leech, G., Rayson, P., & Wilson, A. (2001). Word frequencies in written and spoken English: based on the British National Corpus . London:Longman.

Lemmens, M. (1998). Lexical perspectives on transitivity and ergativity. Amsterdam:John Benjamins.

Levin, B., & Rappaport Hovav, M. (1995). Unaccusativity: at the syntax-lexical semantics interface. Cambridge, MA:MIT Press.


Levin, B. (1993). English verb classes and alternations: a preliminary investigation . Chicago:University of Chicago Press.

Levy, R., & Andrew, G. (2006). Tregex and Tsurgeon: tools for querying and manipulating tree data structures. In Proceedings of LREC 2006 . Retrieved from http://ling.ucsd.edu/~rlevy/papers/levy_andrew_lrec2006.pdf .

Lin, W., & Hauptmann, A. (2006). Are these documents written from different perspectives? A test of different perspectives based on statistical distribution divergence. In Proceedings of ACL 2006 , (pp. 1057-1064). Sydney:ACL. Retrieved from http://idee.inf.cs.cmu.edu/cmu_home/papers/lin06do_these_docum_convey_d iffer_persp.pdf .

Lin, W., Wilson, T., Wiebe, J., & Hauptmann, A. (2006). Which side are you on? Identifying perspectives at the document and sentence levels. In Proceedings of Tenth Conference on Natural Language Learning (CoNLL), (pp. 109-116). Retrieved from http://acl.ldc.upenn.edu/W/W06/W06-2900.pdf .

Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics , 19, 313-330. Retrieved from http://acl.ldc.upenn.edu/J/J93/J93-2004.pdf .

McKoon, G., & Ratcliff, R. (2003). Meaning through syntax: language comprehension and the reduced relative clause construction. Psychological Review , 110 , 490-525.

McKoon, G., & Ratcliff, R. (2007). Interactions of meaning and syntax: Implications for models of sentence comprehension. Journal of Memory and Language, 56 , 270-290

McKoon, G., & MacFarland, T. (2000). Externally and internally caused change of state verbs. Language , 76 , 833-858.

McKoon, G., & MacFarland, T. (2002). Event templates in the lexical representations of verbs. Cognitive Psychology, 45, 1-44.

Mishne, G. (2005). Experiments with mood classification in blog posts. In Style2005 - 1st workshop on stylistic analysis of text for information access, SIGIR 2005 . Retrieved from http://staff.science.uva.nl/~gilad/pubs/style2005- blogmoods.pdf

Mulder, M., Nijholt, A., den Uyl, M., & Terpstra, P. (2004). A lexical grammatical implementation of affect. In Proceedings of TSD-04, the 7th international conference: text, speech and dialogue, Lecture notes in computer science, 3206, (pp. 171–178). Brno, CZ:Springer-Verlag. Retrieved from http://wwwhome.cs.utwente.nl/~anijholt/artikelen/tsd2004.pdf

Nigam, K., & Hurst, M. (2004). Towards a robust metric of opinion. In AAAI Spring Symposium on Exploring Attitude and Affect in Text . Retrieved from http://www.kamalnigam.com/papers/metric-EAAT04.pdf .

Nigam, K. & Hurst, M. (2006). Towards a robust metric of polarity. In J. Shanahan, Y. Qu, & J. Wiebe. (Eds.). Computing attitude and affect in text: theory and applications . Dordrecht, The Netherlands: Springer.

Ng, A.Y., & Jordan, M. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In Advances in Neural Information Processing Systems 14 . Cambridge, MA: MIT Press. Retrieved from http://citeseer.ist.psu.edu/cache/papers/cs/26676/http:zSzzSzwww- 2.cs.cmu.eduzSzGroupszSzNIPSzSzNIPS2001zSzpaperszSzpsgzzSzAA28.pd f/on-discriminative-vs-generative.pdf .

Oard, D., Elsayed, T., Wang, J., Wu, Y., Zhang, P., Abels, E., Lin, J., Soergel, D. (2006). TREC 2006 at Maryland: blog, enterprise, legal and QA tracks. In Proceedings of TREC 2006 . Retrieved from http://trec.nist.gov/pubs/trec15/papers/umd.blog.ent.legal.qa.final.pdf .

Olsen, M. B., & Resnik, P. (1997). Implicit object constructions and the (in)transitivity continuum. In 33rd Regional Meeting of the Chicago Linguistics Society , (pp. 327-336). Retrieved from http://umiacs.umd.edu/~resnik/pubs/cls-97.paper.pdf.

Oskamp, S. (1991). Attitude and opinions , 2nd Ed. Englewood Cliffs, New Jersey: Prentice Hall.

Pang, B., & Lee, L. (2004). A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the Association for Computational Linguistics (ACL-2004), (pp. 271-278). Barcelona:ACL. Retrieved from http://acl.ldc.upenn.edu/P/P04/P04-1035.pdf .

Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2002) , (pp. 79- 86). Philadelphia:ACL. Retrieved from http://acl.ldc.upenn.edu/W/W02/W02- 1011.pdf .

Pedler, J. (2003). A corpus-based update of a computer-usable dictionary. In Proceedings of the 8th International Symposium on Social Communication (pp. 487-492) .


Pullum, G. K. (2003, December 17). Passive voice and bias in Reuter headlines about Israelis and Palestinians. Retrieved from http://itre.cis.upenn.edu/~myl/languagelog/archives/000236.html

Pustejovsky, J. (1991). The generative lexicon. Computational Linguistics , 17 , 409- 441.

Quinn, K., Monroe, B., Colaresi, M., Crespin, M., & Radev, D. (2006). An automated method of topic-coding legislative speech over time with application to the 105th-108th U.S. Senate. Draft retrieved from http://www.people.fas.harvard.edu/~kquinn/papers/TopicsMethodDavis.pdf

Resnik, P. (1997, April 4-5) Selectional preference and sense disambiguation. Presented at the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How? Washington, D.C. in conjunction with ANLP-97. Retrieved from http://www.ims.uni- stuttgart.de/~light/tueb_html/resnik2.ps .

Resnik, P., & Elkiss, A. (2005). The Linguist's Search Engine: An overview. In Proceedings of the ACL Interactive Poster and Demonstration Sessions , (pp. 31-36). Ann Arbor:ACL. Retrieved from http://acl.ldc.upenn.edu/P/P05/P05- 3009.pdf .

Read, J. (2004). Recognising affect in text using pointwise-mutual information. (Masters thesis, University of Sussex, 2004). Retrieved from http://www.informatics.sussex.ac.uk/users/jlr24/papers/read-us04.pdf

Read, J. (2005). Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In Proceedings of the ACL 2005 Student Research Workshop , (pp. 43-48). Ann Arbor:ACL. Retrieved from http://acl.ldc.upenn.edu/P/P05/P05-2008.pdf .

Riloff, E., & Wiebe, J. (2003). Learning extraction patterns for subjective expressions. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP-03). Retrieved from http://acl.ldc.upenn.edu/W/W03/W03-1014.pdf .

Shulman, S., & Schlosberg, D. (2002). Electronic rulemaking: New frontiers in public participation. Paper presented at the annual meeting of the American Political Science Association, Boston Marriott Copley Place, Sheraton Boston & Hynes Convention Center, Boston, Massachusetts. Retrieved from http://www.allacademic.com/meta/p66302_index.html

Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.


Stefanowitsch, A., & Gries, S. Th. (2003). Collostructions: investigating the interaction of words and constructions. International Journal of Corpus Linguistics, 8, 209-243.

Stone, P.J. (1997). Thematic text analysis: new agendas for analyzing text content. In C. Roberts (ed.), Text Analysis for the Social Sciences , (pp. 35-54). Mahwah:NJ:Lawrence Erlbaum Associates.

Subasic, P., & Huettner, A. (2001). Affect Analysis of Text Using Fuzzy Semantic Typing. In IEEE Transactions on Fuzzy Systems, 9, (pp. 483-496). Retrieved from http://www.clairvoyancecorp.com/talks/FuzzyTypingCP.pdf

Takamura, H., Inui, T., & Okumura, M. (2005). Extracting emotional polarity of words using spin model. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05) , (pp. 133-140). Ann Arbor:ACL. Retrieved from http://acl.ldc.upenn.edu/P/P05/P05-1017.pdf

Thomas, M., Pang, B., & Lee, L. (2006). Get out the vote: determining support or opposition from Congressional floor-debate transcripts. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing , (pp. 327-335). Sydney:ACL. Retrieved from http://www.aclweb.org/anthology/W/W06/W06-1639 , http://www.cs.cornell.edu/home/llee/papers/tpl-convote.dec06.pdf

Tomokiyo, T., & Hurst, M. (2003). A language model approach to keyphrase extraction. In Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment . Retrieved from http://acl.ldc.upenn.edu/W/W03/W03-1805.pdf .

Turney, P. D., & Littman, M. L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems , 21 , 315-346.

Wiebe, J., Bruce, R., & O’Hara, T. (1999). “Development and use of a gold standard data set for subjectivity classifications”. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL-99), (pp. 246– 253). Retrieved from http://acl.ldc.upenn.edu/P/P99/P99-1032.pdf .

Wierzbicka, A. (1980). Lingua mentalis: the semantics of natural language . New York:Academic Press.

Wilson, T., Wiebe, J., & Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing , (pp. 347-354). Vancouver:ACL. Retrieved from http://www.aclweb.org/anthology/H/H05/H05-1044 .


Wilson, T., Wiebe, J., & Hwa, R. (2006). Recognizing strong and weak opinion clauses. Computational Intelligence, 22 , 73-99.

Wilson, T., Wiebe, J., & Hwa, R. (2004). Just how mad are you? Finding strong and weak opinion clauses. In Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-2004). Retrieved from http://homepages.inf.ed.ac.uk/twilson/pubs/hltemnlp05.pdf .

Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques , 2nd Edition. San Francisco:Morgan Kaufmann.

Wright, S. (2001). Internally caused and externally caused change of state verbs . (Retrieved from ProQuest Dissertations & Theses).

Wu, Y., Oard, D., & Sobroff, I. (2006). An exploratory study of the W3C mailing list test collection for retrieval of emails with pro/con arguments. Presented at The Third Conference on Email and Anti-Spam , July 27-28, 2006, Mountain View, California. Retrieved from www.ceas.cc/2006/26.pdf .

Yi, J., Nasukawa, T., Bunescu, R., & Niblack, W. (2003). Sentiment Analyzer: Extracting Sentiments about a Given Topic using Natural Language Processing Techniques. In ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining , (pp. 427-434). Washington: IEEE Computer Society.
