The Linguistics of Sentiment Analysis
Portland State University
PDXScholar
University Honors Theses, University Honors College
5-24-2013

Laurel Hart, Portland State University

Recommended Citation
Hart, Laurel, "The Linguistics of Sentiment Analysis" (2013). University Honors Theses. Paper 20. https://doi.org/10.15760/honors.19

This Thesis is brought to you for free and open access. It has been accepted for inclusion in University Honors Theses by an authorized administrator of PDXScholar.

by Laurel Hart
An undergraduate honors thesis submitted in partial fulfillment of the requirements for the degree of Bachelor of Arts in University Honors and Applied Linguistics
Thesis Adviser: George Tucker Childs
Portland State University, 2013

Abstract

Computational linguistics is a field that was founded by linguists, but more recently is the domain of more computer scientists than linguists. Use of data-driven and machine learning methods for computational linguistics applications is now more common than hand-written linguistic rules. In order for a linguist to enter the field, it is essential that he or she be familiar with methods and techniques from computer science. The purpose of this paper is two-fold. The first is to serve as a linguist's introduction to concepts from outside of linguistics that are used in computational linguistics. The second purpose is to illustrate the use of linguistic features for a specific task known as sentiment analysis. This task involves determining the sentiment of a piece of text. By way of examining linguistics within sentiment analysis, this paper will begin to gesture at the potential role for linguists in the modern field of computational linguistics as a whole.
The goal is to encourage and enable linguists to re-engage with computational linguistics by providing a suitable introductory work.

Table of Contents

Part I - Introduction
Part II - Groundwork
  1 Computational Linguistics Natural Language Processing
    1.1 Building a corpus for NLP
      1.1.1 Amazon
      1.1.2 Livejournal
      1.1.3 Twitter
    1.2 Pre-processing steps
    1.3 Building a corpus, continued
  2 Features, models, and tools of NLP
    2.1 Unigrams and Bag of Words (BoW) models
    2.2 Named Entities
    2.3 Part-of-speech tagging
    2.4 Pattern extraction
  3 Algorithms and Measuring Success
  4 Machine Learning
    4.1 Supervised Learning
    4.2 Unsupervised Learning
    4.3 Other types of learning
  5 Summary
Part III - Sentiment Analysis
  1 Sentiment Analysis
    1.1 Building a Corpus, Revisited
    1.2 Applications
  2 Specifics
    2.1 Polarity / Opinion Scoring
      2.1.1 Text Classification Approach
      2.1.2 Sentiment Lexicon Approach
      2.1.3 Linguistic features
    2.2 Affective State / Mood Classification
      2.2.1 Challenges
      2.2.2 Approaches
      2.2.3 Linguistic features
      2.2.4 Applications
    2.3 Sarcasm Detection
      2.3.1 Linguistic features
      2.3.2 Applications
  3 Summary
Part IV - Conclusions
References
Selected References by Topic
  Polarity / Opinion Scoring
  Affective State / Mood Classification
  Sarcasm Detection
Appendix I: Abbreviation Glossary

Part I - Introduction

Though this paper is titled The Linguistics of Sentiment Analysis, an equally accurate alternate title would be Sentiment Analysis for Linguists.
It is undeniable that most of the work in the field of sentiment analysis, and even more broadly in the field of computational linguistics, is not only carried out by but also written for an audience more specialized in computer science and statistics than in linguistics. Even one of the most extensive and widely-referenced surveys of the field of sentiment analysis leaves discussion of linguistic contributions to the field at "...we were directly charged to focus on information-access applications, as opposed to work of more purely linguistic interest. We stress that the importance of work in the latter vein is absolutely not in question,"[1] and says little more on the subject. The purpose of this paper is not to show that focusing on the computational side of the field is in any way wrong, but to begin to re-access a field which has become separated from other studies of language. Over the course of its development, the field of computational linguistics has shifted its emphasis from linguistics to computer science: today, strides in computational linguistics are made not by linguistic discoveries, but by computational optimizations and exploration of statistical properties. Currently, it is nearly impossible for a linguist with no computer science background to participate in computational linguistics, but quite commonplace for a computer scientist with little knowledge of linguistics to participate. This field, which was formerly dominated by linguists, has become so steeped in specialized, technical language that it is not approachable without equally specialized training. The contribution of linguistics is "not in question," but the role of linguists in modern computational linguistics