
Supervised and Semi-Supervised Statistical Models for Word-Based Sentiment Analysis

Von der Fakultät Informatik, Elektrotechnik und Informationstechnik der Universität Stuttgart zur Erlangung der Würde eines Doktors der Philosophie (Dr. phil.) genehmigte Abhandlung

Vorgelegt von Christian Scheible aus Mühlacker

Hauptberichter: Prof. Dr. Hinrich Schütze
Mitberichter: Prof. Dr. Bernhard Mitschang

Tag der mündlichen Prüfung: 8. Juli 2014

Institut für Maschinelle Sprachverarbeitung der Universität Stuttgart

2014

Contents

1 Introduction
  1.1 Contributions in this Thesis
  1.2 Structure of this Thesis

2 Statistical Natural Language Processing
  2.1 Supervised, Unsupervised, and Semi-Supervised Models
  2.2 Representing Language for Statistical Modeling
  2.3 Learning Strategies
  2.4 Discriminative Models
    2.4.1 Maximum Entropy Model
    2.4.2 Neural Networks
    2.4.3 Regularization
  2.5 Graph-Based Natural Language Processing
    2.5.1 Graph Theory
    2.5.2 Graph Representations of Natural Language
    2.5.3 Graph Algorithms
  2.6 Evaluation
    2.6.1 Evaluation Measures
    2.6.2 Hypothesis Tests
    2.6.3 Agreement Measures
  2.7 Summary

3 Sentiment Analysis
  3.1 Concepts in Sentiment Analysis
    3.1.1 Polarity
    3.1.2 Subjectivity
  3.2 Automatic Sentiment Analysis
    3.2.1 Document Level
    3.2.2 Sentence/Clause Level
    3.2.3 Entity/Aspect Level
  3.3 Feature Representations
  3.4 Summary

4 Sentiment Classification with Active Learning
  4.1 Introduction
  4.2 Methods
    4.2.1 Crowdsourcing with Amazon Mechanical Turk
    4.2.2 Annotation System
    4.2.3 User Interface
    4.2.4 Quality Control
  4.3 Experiments
    4.3.1 Experimental Setup
    4.3.2 Results
  4.4 Related Work
    4.4.1 Crowdsourcing
    4.4.2 Active Learning for Sentiment Analysis
  4.5 Summary

5 Bootstrapping Sentiment Classifiers from Word-Document Graphs
  5.1 Introduction
  5.2 Background
    5.2.1 Polarity Induction with Word Graphs
    5.2.2 Average Polarity
  5.3 Methods
    5.3.1 Polarity PageRank
    5.3.2 Document Classification with Polarity PageRank
    5.3.3 Bootstrapping
  5.4 Experiments
    5.4.1 Experimental Setup
    5.4.2 Experiments and Results
    5.4.3 Analysis of Word-Document Graphs
  5.5 Related Work
    5.5.1 Lexical Knowledge in Document Classification
    5.5.2 Random Walk Methods for Polarity Induction
    5.5.3 Bootstrapping
    5.5.4 Conclusion
  5.6 Summary

6 Sentiment Relevance
  6.1 Introduction
  6.2 Exploring Sentiment Relevance
    6.2.1 Sentiment Relevance Corpus
    6.2.2 Sentiment Relevance vs. Subjectivity
  6.3 Methods
    6.3.1 Discourse Constraints with Minimum Cut
    6.3.2 Feature Extraction
  6.4 Distant Supervision
    6.4.1 Initial Distant Supervision Experiment
    6.4.2 Further Experimental Setup
    6.4.3 Experiments and Results
    6.4.4 Conclusion
  6.5 Transfer Learning
    6.5.1 Experimental Setup
    6.5.2 Experiments and Results
    6.5.3 Conclusion
  6.6 Related Work
    6.6.1 Related Concepts
    6.6.2 Distant Supervision and Transfer Learning
    6.6.3 Feature Extraction
    6.6.4 Conclusion
  6.7 Summary

7 Compositionality of Recursive Autoencoders
  7.1 Introduction
  7.2 Semi-Supervised Recursive Autoencoders
  7.3 Methods for Automatic Structural Simplification
  7.4 Experiments
    7.4.1 Experimental Setup
    7.4.2 Human Evaluation
    7.4.3 Automatic Structural Simplification
    7.4.4 Discussion
  7.5 Summary

8 Conclusions and Future Work
  8.1 Contributions
  8.2 Future Work

A Resources
  A.1 Tools
    A.1.1 Stanford Classifier
    A.1.2 HIPR
    A.1.3 Mate Tagger and Parser
  A.2 Data
    A.2.1 Text Corpora
    A.2.2 Lexical Resources
    A.2.3 Other Datasets

Bibliography

Abstract

Ever since its inception, sentiment analysis has relied heavily on methods that use words as their basic unit. Even today, such methods deliver top performance. This way of representing data for sentiment analysis is known as the clue model. It offers practical advantages over more sophisticated approaches: It is easy to implement and statistical models can be trained efficiently even on large datasets. However, the clue model also has notable shortcomings. First, clues are highly redundant across examples, and thus training based on annotated data is potentially inefficient. Second, clues are treated context-insensitively, i.e., the sentiment expressed by a clue is assumed to be the same regardless of context. In this thesis, we address these shortcomings. We propose two approaches to reduce redundancy: First, we use active learning, a method for automatic data selection guided by the statistical model to be trained. We show that active learning can speed up the training process for document classification significantly, reducing clue redundancy. Second, we present a graph-based approach that uses annotated clue types rather than annotated documents which contain clue instances. We show that using a random-walk model, we can train a highly accurate document classifier. We next investigate the context-dependency of clues. We first introduce sentiment relevance, a novel concept that aims at identifying content that contributes to the overall sentiment of a review. We show that even when no annotated sentiment relevance data is available, a high-accuracy sentiment relevance classifier can be trained using transfer learning and distant supervision. Second, we perform a linguistically motivated analysis and simplification of a compositional sentiment analysis model. We find that the model captures linguistic structures poorly. Further, it can be simplified without any loss of accuracy.

Deutsche Zusammenfassung

Eine der frühesten Methoden zur automatischen Sentimentanalyse nutzt Merkmalsrepräsentationen, die auf Wortvorkommen beruhen. Dieser Ansatz zur Datenrepräsentation ist unter dem Namen Clue-Modell bekannt, da die Terme in einer größeren Spracheinheit Schlüsselwörter (Clues) für deren Sentiment sind. Das Clue-Modell ist noch immer einer der beliebtesten und erfolgreichsten Ansätze, da es einige praktische Vorteile gegenüber anderen Verfahren bietet: Es ist einfach zu implementieren und statistische Modelle sind mit einer solchen Repräsentation auch auf großen Datensätzen effizient trainierbar. Allerdings hat das Modell auch Nachteile. Erstens treten Schlüsselwörter redundant auf und kommen in vielen Trainingsbeispielen vor, so dass überwachtes Lernen ineffizient sein kann. Zweitens werden Schlüsselwörter kontextunabhängig behandelt, d.h., das durch einen Begriff ausgedrückte Sentiment ist unabhängig vom Kontext immer gleich. In dieser Dissertation stellen wir Lösungsansätze für diese beiden Nachteile vor. Um Redundanz zu vermeiden, verwenden wir zunächst Active Learning, eine Methode des maschinellen Lernens, bei der das statistische Modell die Auswahl der Trainingsbeispiele vornimmt. Unsere Ergebnisse zeigen, dass wir durch Active Learning gleiche Klassifikationsgenauigkeit bei reduzierten Kosten erreichen, indem wir Redundanz zwischen Dokumenten vermeiden. Ein weiterer Ansatz zur Vermeidung von Redundanz beruht darauf, die Schlüsselwörter direkt zu annotieren. Annotierte Schlüsselwörter werden dann in einem graphbasierten Modell zur Dokumentenklassifikation verwendet. Wir zeigen, dass ein Random-Walk-Modell Dokumente mit hoher Genauigkeit klassifizieren kann. Um die Kontextabhängigkeit von Inhalten zu bestimmen, führen wir die Idee der Sentiment-Relevanz ein. Als sentiment-relevant bezeichnen wir Inhalt, der zum Gesamtsentiment eines Dokuments beiträgt. Wir zeigen, dass wir selbst ohne annotierte Sentiment-Relevanz-Daten mit hoher Genauigkeit sentiment-relevanten Inhalt erkennen können. Dazu nutzen wir zwei Techniken des maschinellen Lernens: Transfer Learning und Distant Supervision. Zum Schluss untersuchen wir ein kompositionelles Modell zur Sentimentanalyse auf seine linguistischen Eigenschaften. Wir zeigen, dass das Modell nur schlecht linguistische Struktur erkennt. Zudem kann das Modell ohne Genauigkeitsverlust stark vereinfacht werden.

List of Tables

2.1 Examples of activation functions along with their ranges
2.2 Confusion matrix of true and predicted class
4.1 Results for sentiment classification on the movie corpus
5.1 Classifier accuracies
6.1 TL/in-task F̄1 for P&L and SR corpora
6.2 Percentage of false positive sentences for sentiment relevant (fpSR) and nonrelevant (fpSNR) containing specific words. Training on P&L, evaluation on SR
6.3 Distant supervision classification results
6.4 20 highest-weighted features in the best-performing model
6.5 Transfer learning classification results
6.6 Transfer learning results after introducing an imbalance to the training set
6.7 20 highest-weighted features in the best-performing model
7.1 Accuracy using different structure simplification techniques
7.2 25 most positive and most negative words determined by their softmax outputs
A.1 MDS statistics
A.2 CoreLex basic types
A.3 Selected CoreLex classes

List of Figures

2.1 Artificial neuron
2.2 Multi-layer neural network
2.3 Compact neural network notation
2.4 Autoencoder
2.5 Neural language model
3.1 Example reviews from different web services
3.2 Compositional analysis of the sentence “Not a bad journey at all.”
4.1 Example HIT as displayed on MTurk
4.2 Active learning vs. random sampling without counter-spam measures
4.3 Worker accuracies of all workers from all runs
4.4 Active learning vs. random sampling
4.5 Adaptive voting vs. single annotation on AL run 1
4.6 Performance using gold annotations (run 1)
4.7 Feature weights assigned by the AL classifier at different points of the active learning process
5.1 Word graph – 1-neighborhood of the word warm
5.2 Word-document graph
5.3 Complete Wikipedia word graph
5.4 Wikipedia word graph with word labels
5.5 1-neighborhood of powerful
5.6 Complete word-document graph for the books domain
6.1 Example passage from a document from the SR corpus
6.2 Example distant supervision graph
6.3 Base classifier results by document
6.4 Example transfer learning graph
6.5 F̄1 measure for different values of c
7.1 Encoding steps of the recursive autoencoder model
7.2 Four methods for RAE simplification
7.3 Example RAE trees

List of Abbreviations

AP      average polarity
BL      baseline
CEE     cross-entropy error
DNN     deep neural network
DS      distant supervision
HIT     human intelligence task
HITS    Hyperlink-Induced Topic Search
IMDb    Internet Movie Database
IMS     Institut für Maschinelle Sprachverarbeitung
L-BFGS  limited-memory Broyden-Fletcher-Goldfarb-Shanno
LA      label accuracy
MaxEnt  maximum entropy
MDS     Multi-Domain Sentiment Dataset
MinCut  minimum cut
MSE     mean squared error
MTurk   Amazon Mechanical Turk
NLM     neural language model
NLP     natural language processing
NN      neural network
P&L     Pang & Lee
PPR     Polarity PageRank
PR      PageRank
RAE     recursive autoencoder
SNR     sentiment nonrelevance
SR      sentiment relevance
TL      transfer learning
WA      worker accuracy

Acknowledgments

First of all, I would like to thank my advisor Prof. Dr. Hinrich Schütze. Without him, this thesis would not have been written. He gave me the freedom to explore and at the same time helped me find the right path. I would also like to thank my second reviewer, Prof. Dr. Bernhard Mitschang, and the chair, Prof. Dr. Sebastian Padó, for being part of the doctoral committee and for their helpful comments in the final stages of the thesis. Thanks to all collaborators with whom I had the opportunity to work during the course of my PhD – Florian Laws, Lukas Michelbacher, Jimmy Dubuisson, and Jean-Pierre Eckmann – as well as to all the student research assistants of the D7 project: Onur Bazarkaya, Max Kisselew, Ina Rösiger, Beate Rothmund, Abdallah Salama, Svetlana Smekalova, Katja Wienicke, and Sihem Zidi. I am thankful for all the helpful discussions with and comments from numerous colleagues. These are, first of all, the members of the sentiment analysis group – Khalid Al Khatib, Andrea Glaser, Charles Jochim, Wiltrud Kessler – and the deep learning group – Qi Han and Tobias Schnabel. Further, these include Anders Björkelund, Andre Blessing, Daniel Duran, Florian Laws, Thomas Müller, Kyle Richardson, Stephen Roller, Wolfgang Seeker, and Jason Utt, as well as probably many others who I regretfully fail to recognize here. For proofreading, I am indebted to Andre Blessing, Charles Jochim, Wiltrud Kessler, Kyle Richardson, and Jason Utt. Special thanks go to all my office mates – Wiltrud Kessler, Andrea Glaser, Jason Utt, and Abhijeet Gupta – for a great time, as well as to Charles Jochim for being the best next-door neighbor one could wish for. I am grateful for having had the opportunity to work at Sony while writing the thesis, made possible by Thomas Kemp. The work there was interesting and challenging, and it made for a more than welcome distraction from writing. Lastly, I would like to thank my family for always being there and for not asking any questions. And Nina for bearing with everything. The work in this thesis was supported by the DFG as part of the Sonderforschungsbereich 732 “Incremental Specification in Context”.


1 Introduction

The open exchange of opinions is becoming increasingly important. The world wide web, for example, is an endless resource of opinions. People frequently express their opinions using online platforms designed for reviewing, or publish their thoughts on the social web. With the availability of information in large quantities comes the need for browsing it efficiently. For example, a popular product on the online retail site Amazon.com can have several thousand reviews; the Playstation 4 video game system had already been reviewed over 3,000 times a week after its release.1 For a reader, it is impossible to read all reviews to find out about commonly pointed out up- or downsides of a product. Examples like this motivate sentiment analysis, the automatic analysis of opinions expressed through natural language. The concept of sentiment analysis covers a wide range of interesting problems that are related to the way humans express emotions and opinions about various entities. Sentiment has been a focus of natural language processing (NLP) research for a long time. Early computational efforts were made on the automatic recognition of sentiment on the word level, on rule-based analysis of sentiment on the web, as well as on the detection of subjective expressions. The classical sentiment experiment that arguably had the most impact in both the NLP and machine learning communities was the automatic prediction of the overall sentiment expressed in reviews with simple statistical machine learning models. These results paved the way for many approaches to automatic sentiment classification on user-generated data. In this thesis, we will follow the latter line of work. The general task that we address is sentiment classification, the prediction of the polarity (i.e., the degree of positivity or negativity) of a complete linguistic unit. We tackle this problem through statistical natural language processing, where a statistical model is trained to automatically make predictions about unseen examples. The linguistic units that we analyze are either sentences or documents containing expressions of sentiment. In statistical natural language processing, the way data is formally represented is

1 http://www.amazon.com/dp/B00BGA9WK2

crucial. In sentiment analysis, this question is particularly important as sentiment is highly compositional and can be expressed through arbitrarily complex linguistic structures. The overall sentiment expressed by a phrase depends on the basic expressions it contains and the way they interact. There are cases where individual words or phrases act as clues (e.g., awesome) that are sufficient to detect sentiment, but in others, the overall sentiment will be determined by composition of meaning (e.g., not awesome at all). This view has been formally captured by two models: the clue model, which is the underlying method of the approaches mentioned above, where the words in a larger linguistic unit are viewed as clues that give away the overall polarity of the unit; and the polarity shifter model, where the polarities of clues are (recursively) changed by modifying expressions. We will describe these models in more detail in Chapter 2. The clue model is a popular choice for many sentiment analysis tasks. It is straightforward to implement and keeps the required learning effort low. If enough data is supplied to the model, it also works very well, leading, for example, to high classification accuracies for the detection of the overall sentiment in a document. In contrast, the polarity shifter model is difficult to learn as the set of possible modifiers (and modifying construction types) is open. For this reason, work on sentiment classification often resorts to the clue model exclusively. Thus, researching this model is of high interest in both theory and practice. While, as mentioned above, the clue model achieves reasonable results, there are also several drawbacks to it. We will point out two major issues of the model which we will address in this thesis.

Redundancy Clues will be highly redundant across examples. This observation is related to the well-known power-law distribution of words, as described for example by Zipf (1935). As a consequence, there will be a small number of clues that appear in many examples, as well as a large number of rare clues. For example, words like good or bad will be frequent, and thus will also be sufficient indicators for many examples. Words like delightful or marvelous are rarer. However, to improve the results any further – particularly by covering short examples – we require information about rare clues. We pursue two possible approaches that address this problem. Both aim at efficiently extending the set of clues about which we have

knowledge. In the following, we imagine a supervised statistical modeling scenario where we have training data annotated by humans. As human labor is expensive, annotation incurs costs. As a first attempt to increase coverage, we could obtain more training examples (e.g., documents or sentences). In this case, the way we select new examples is crucial. If we employ manual annotation, any redundancy will lead to unnecessary additional costs. Thus, we want new examples to be as informative as possible. As we will later show, this process can be addressed through active learning, a machine learning technique where the learner decides which examples are selected for annotation. Annotating whole examples to improve the clue model is in a sense backwards. Unlike other natural language representations, clues can easily be annotated directly by humans. For this reason, we are interested in a statistical model that enables us to use knowledge about clue polarity. However, including clue knowledge into a document model is not straightforward. We propose an approach where clues and documents are jointly represented in a graph.

Context Insensitivity By design, the clue model works by assigning a polarity to each word independently of its context. As mentioned before, this assumption is convenient as it drastically simplifies statistical clue models. However, in reality, the polarity of a word can change, depending on the context in which it occurs. For example, the adjective light can have different polarities depending on the object it describes. In theory, the distinction between prior polarity, i.e., the polarity that a word has without any context, and contextual polarity has been proposed. Recognizing contextual polarity is difficult as context effects can arise from different domains, topics, and objects being described. This implies that a large amount of supervision is needed to make contextual polarity universal. Thus, to keep matters simple, in many applications, each clue is taken to always indicate the same polarity, regardless of its context. Naturally, this approach may lead to the misinterpretation of clues, which could potentially hurt the model’s prediction accuracy. One way to address the problem of contextuality is to view its recognition as a pre-processing step. In this thesis, we introduce the concept of sentiment relevance

to address topic effects. The goal is to distinguish contexts that are important for sentiment detection from unimportant ones. As mentioned previously, it has also been proposed to address the problem of contextuality through polarity shifters. As shifters are difficult to learn automatically, fully compositional approaches have been proposed. These models promise a holistic approach to sentiment analysis as we do not have to worry about modeling the detection of modifiers. However, their properties are poorly understood. We will provide a detailed analysis of a compositional neural network model with respect to its linguistic interpretability and to the contribution of the structures it produces. Finally, we are interested in efficient solutions. As one of the main motivations of the clue model is its simplicity, we would like this property reflected in our approaches. Thus we aim for methods that improve the clue model while keeping both computation and utility costs low.

1.1 Contributions in this Thesis

In this thesis, we make the following contributions concerning the two challenges discussed above.

Redundancy

• We show that we achieve high-accuracy polarity prediction through active learning even with noisy annotations. We perform document classification experiments on Amazon Mechanical Turk in which active learning outperforms random example selection significantly. Using active learning, the cost for achieving a certain level of accuracy can be lowered significantly. Qualitative analysis of the feature weights of the classifiers over time shows that useful features are identified faster using active learning than with random selection (Laws et al., 2011).

• We introduce a graph framework for joint classification of words and documents. We show that using a random walk algorithm, polarity information for words can be exploited to classify documents more accurately than with


a baseline averaging method. To this end, we adopt Personalized PageRank, calling the resulting algorithm Polarity PageRank. Finally, we show that a maximum entropy classifier trained in a subsequent bootstrapping step improves the results further (Scheible and Schütze, 2012).

Context Insensitivity

• We introduce the concept of sentiment relevance. It aims at identifying content that does not contribute to the overall sentiment in a document. We first contrast sentiment relevance to subjectivity, a related concept, through annotation and classification experiments. We next introduce two methods for automatic sentiment relevance prediction, distant supervision and transfer learning. Distant supervision makes use of a domain-specific database to create an initial annotation for unlabeled examples. Transfer learning learns from a related concept, subjectivity, for which labeled data is available and uses adaptation techniques to apply this knowledge to sentiment relevance. These approaches are supported by a graph-based unsupervised sequence model. We show that both significantly outperform the respective baseline methods (Scheible and Schütze, 2013b).

• We present an analysis of the Semi-Supervised Recursive Autoencoder (RAE), a compositional neural network model. The RAE takes a sentence as its input and constructs a tree where each node has a feature vector. The vectors of the tree are used to predict the sentiment of the sentence, which can influence the vectors further. As these structures are generated automatically and in a semi-supervised fashion, their interpretation is unclear. We perform two experiments to analyze them. First, we have humans check the structures for syntactic and semantic coherence. We find that neither of these properties is reflected well in the RAE trees. Second, we introduce methods for automatic tree simplification. We show that only part of the RAE trees actually contributes to high-accuracy sentiment classification (Scheible and Schütze, 2013a).


Efficiency

• In all our approaches, we use efficient methods for computation. Most of our experiments build on either maximum entropy models with fast optimization or efficient graph algorithms. In the case of the costly compositional neural network model, we show that it can be simplified structurally, which speeds up computation.

1.2 Structure of this Thesis

The remainder of this thesis is structured as follows. We first introduce the background that we make use of in this thesis. In Chapter 2, we give an overview of statistical natural language processing. We focus on the general ideas as well as the different models that we apply throughout the thesis. Chapter 3 introduces concepts relevant to sentiment analysis that are important to the understanding of this thesis. We then turn to the four empirical studies introduced above. Chapter 4 is concerned with active learning. We first describe our annotation setup on Amazon Mechanical Turk. We then run experiments and discuss the results, including an analysis of the features learned by the classifier. In Chapter 5, we present our work on graph-based sentiment classification with Polarity PageRank. First, we describe the Polarity PageRank method. Second, we conduct experiments using graphs constructed with data from Wikipedia and Amazon.com. We perform error analysis taking into account the structural properties of the graph. Chapter 6 is centered around the concept of sentiment relevance. We introduce a sentiment-relevance annotated dataset and describe the annotation process. We next show experimentally that the concept differs from related notions. We then present experiments on two approaches to automatic sentiment relevance classification. Chapter 7 contains experiments on compositional sentiment analysis with the RAE model. We first describe the model, pointing out important implementation details. We introduce several ways of simplifying the tree structures the model

induces. Finally, we present the results of two experiments: human analysis of RAE trees and automatic tree simplification. Chapter 8 concludes the thesis and provides an outlook on future work. In Appendix A, we describe the tools and datasets used in the thesis in more detail.


2 Statistical Natural Language Processing

Over the years, statistical modeling has become the dominant approach to natural language processing. It has led to broad-coverage applications capable of handling vast amounts of data and is responsible for substantial improvements in various NLP tasks such as part-of-speech tagging, syntactic parsing, and word sense disambiguation (Norvig, 2011). As an example of a successful practical application of statistical NLP, IBM recently demonstrated how large-scale combination of statistical systems can outperform humans in question answering.2 We will now give a brief overview of the general ideas behind statistical modeling and how it relates to NLP. Hastie et al. (2001) define a statistical model as a mathematical model, i.e., a function f : x ↦ y that predicts an output y for a given x. If y is categorical, we call f a classification function. The function f can be found for example by learning from some given input data for which the true output is known. Norvig (2011) provides a more general view. He proposes to distinguish the following properties of statistical models:

• A mathematical model, defined through a mathematical function f : x ↦ y.

• A probabilistic model, where a probability involving properties of x and y is estimated.

• A trained model, which is selected or improved by using a collection of examples. This selection may be performed through the adjustment of parameters within the model.

A model qualifies as statistical if any of these properties apply. Although current research is dominated by models that are at least mathematical and trained (and often

2 http://www.ibm.com/watson

probabilistic), we can see that the term “statistical model” covers a large variety of different types of models. The application of such models to linguistic problems has a longstanding history. The origins of statistical models of language go back to the mathematical theory of communication of Shannon (1948). Many of the ideas expressed in Shannon’s work, such as n-gram models of language, are still popular today. They inspired the development of powerful language models and led to many efforts for formally representing natural language through features to make them accessible to statistical models. The process of finding good feature representations for a task is called feature engineering. Mitchell (1997) provides a well-known formal definition for the learning aspect of statistical models, the machine learning problem:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P , if its performance at tasks in T , as measured by P , improves with experience E.

This abstract formulation of machine learning raises an important question. How can we use data to improve the quality of a statistical model? One answer to this question comes in the form of supervised, semi-supervised, and unsupervised learning, which we will look at in more detail in Section 2.1. In Section 2.2, we will take a look at how language can be represented formally as input to a statistical model. Section 2.3 describes several learning strategies that can be used to improve or simplify model learning. We next introduce the statistical modeling approaches we use, focusing on discriminative models (Section 2.4) and graph-based approaches (Section 2.5). In Section 2.6 we describe the evaluation measures, hypothesis tests, and agreement measures that we apply in this thesis.

Notation During this thesis, we will use the following notation. Row vectors will be typeset as lowercase bold letters (e.g., v = (v_1, ..., v_n) with elements v_i), matrices as capitalized bold letters (e.g., M with elements m_ij). 1 is an all-ones vector, and 0 an all-zeros vector. v^⊤, a column vector, denotes the transpose of v. [u, v] denotes the horizontal concatenation of two vectors u and v, [u; v] their vertical concatenation (a matrix with two rows). Horizontal concatenation can also

be expressed with the operator ⊕. Sets will be written as calligraphic symbols, e.g., s ∈ S. We will refer to the set of classes c assigned by a model as C. Sometimes, we need to distinguish instances of variables for different types t, which we express through a superscript index, e.g., w^(t).

We write the set of n examples as x_k ∈ X and the corresponding outputs as y_k ∈ Y. The set of observed examples is denoted by X̂, the set of target outputs by Ŷ. The (labeled) data D = [⟨x̂_i, ŷ_i⟩ | 1 ≤ i ≤ n] is ordered, consisting of pairs of examples and their targets. Examples and labels can be converted into a vectorized representation where examples are represented by the vector x_k and the corresponding target outputs by y_k, which is a |C|-dimensional vector where each dimension corresponds to a c ∈ C. A collection of n examples together can be viewed as a matrix X = [x_1; ...; x_n] where each row contains a vector representing an example, and the target outputs as Y = [y_1; ...; y_n].

The goal of a statistical model is to learn a function f_W : x ↦ y, where W is a matrix which determines the behavior of f. In cases where f outputs a probability, we can write it as p_w or simply p.
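To make this notation concrete, the following short NumPy sketch (illustrative only, not part of the thesis formalism; all variable names are chosen for the example) builds a small data matrix X from example row vectors and shows the two concatenation operations:

```python
import numpy as np

# Two example (row) vectors u and v.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# [u, v]: horizontal concatenation (also written u ⊕ v) -> one long row vector.
horizontal = np.concatenate([u, v])   # shape (6,)

# [u; v]: vertical concatenation -> a matrix with two rows.
vertical = np.vstack([u, v])          # shape (2, 3)

# A collection of n examples as a matrix X = [x_1; ...; x_n], one example per row,
# and one-hot target vectors y_k with one dimension per class c in C.
X = np.vstack([u, v])                 # n = 2 examples, 3 features each
Y = np.array([[1, 0],                 # example 1 belongs to class 0
              [0, 1]])                # example 2 belongs to class 1

print(horizontal.shape, vertical.shape, X.shape, Y.shape)
```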

2.1 Supervised, Unsupervised, and Semi-Supervised Models

Machine learning is the problem of finding the function f. There are two major paradigms for learning f inductively: supervised and unsupervised learning. In supervised learning, we assume that we have observed data, the training data or labeled data D, consisting of some examples X̂ with (correct) corresponding outputs Ŷ. We can then learn f as a trained model, applying the machine learning principle. In most cases, this means that we minimize some measurement of the error the model makes on this data. For mathematical models, this can be accomplished for example by numerically optimizing the model parameters. In unsupervised learning, we are not given any information about Ŷ. This means that we not only do not know the correct outputs for some given examples X̂ (commonly referred to as unlabeled data), but in some cases, we may have incomplete or no knowledge about the set of possible labels C. Instead, f will map an example to

a set of anonymous classes, whose cardinality we may have to determine by some other means. The anonymous classes are usually referred to as clusters. Since we cannot measure the error of f in terms of prediction, we have to resort to other means of determining its quality. For example, we can maximize the likelihood of the data under the model. Semi-supervised learning combines elements of supervised and unsupervised learning. As the name implies, we are given some information about Ŷ, while some other part may be missing. Among other possible scenarios, this may mean that we have correct outputs for some but not all of the data. We can then try to leverage additional information from the unlabeled data in combination with supervised learning on the labeled data. While this is probably the most prominent setup in semi-supervised learning, it is not the only one. Others include missing information, such as incomplete sequence order, and noisy or mismatched input data.

2.2 Representing Language for Statistical Modeling

In many applications of machine learning it is necessary to pre-process data to make it accessible to the model. In natural language processing, the data is usually a piece of text, i.e., a string of symbols from a fixed alphabet (e.g., all characters defined in Unicode). Human readers can interpret these sequences as words, which may be marked as such through spaces in the text in some languages. It has been assumed that readers then interpret a sequence of words through hierarchical composition, constructing the meaning of phrases and sentences incrementally (Chomsky, 1965). Conversely, many popular statistical models work with vector inputs, i.e., data consisting of distinct dimensions, called features, which are each associated with a quantity. Features are often assumed to be conditionally independent, i.e., the co-occurrence of two features does not change their interpretation. This contradicts the aforementioned process of how humans understand language. For this reason, the way in which language is converted into a feature representation is crucial

to the success of any machine learning approach. This process is called feature extraction.

Formally, φ_i(x) is a feature function which quantifies a specific property (indexed by i ∈ F, the set of possible features) of an example x:

φ_i(x) = the number of times example x has feature i

If we represent the featurized data as a vector x, we can populate this vector with the feature function by mapping each feature i ∈ F to a dimension of x. For some models such as the maximum entropy model which we introduce in Section 2.4, the function has to be strictly binary-valued to encode the presence or absence of the feature. A simple and popular approach to feature extraction for representing texts is to assume that word order does not matter. We interpret a text x as a set of its words w ∈ x and ignore the order in which they occurred. This approach is known as the bag-of-words model, since we can view the process as taking all words from the text and throwing them in a bag, losing sequence information in the process. We obtain the (binary) bag-of-words model through the following feature function:

\phi_i(x) = \begin{cases} 1 & \text{if } x \text{ contains word } w_i \\ 0 & \text{else} \end{cases}

The bag-of-words representation assumes that it is sufficient to use individual words as indicators. The model can easily be extended to extract arbitrary symbolic parts from the text. For example, one could define a feature function that extracts all word sequences of length n, i.e., all n-grams:

\phi_{\langle w_1, \dots, w_n \rangle}(x) = \begin{cases} 1 & \text{if } x \text{ contains the word sequence } w_1 \dots w_n \\ 0 & \text{else} \end{cases}
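As an illustration of these feature functions, the following sketch (hypothetical code, assuming simple whitespace tokenization and lowercasing, which are not prescribed by the definitions above) extracts binary bag-of-words and n-gram indicator features from a text:

```python
def bag_of_words_features(text):
    """Binary bag-of-words: phi_w(x) = 1 iff word w occurs in x."""
    return {("unigram", w): 1 for w in text.lower().split()}

def ngram_features(text, n=2):
    """Binary n-gram features: phi_<w1..wn>(x) = 1 iff the word sequence occurs in x."""
    words = text.lower().split()
    return {("ngram", tuple(words[i:i + n])): 1 for i in range(len(words) - n + 1)}

# Combine unigram and bigram indicator features for one example text.
features = {}
features.update(bag_of_words_features("not a bad journey at all"))
features.update(ngram_features("not a bad journey at all", n=2))
print(sorted(features))
```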

Feature extraction is a crucial step in natural language processing, so it is not surprising that it has been the focus of many publications. Linguistically motivated approaches have been proposed, employing a rich pre-processing pipeline including part-of-speech tagging and syntactic parsing among other steps. Feature functions are then defined manually to heuristically extract information from the resulting

structural analysis. Ultimately, the feature extraction paradigm might not be sufficient to cover all complex phenomena that occur in natural language. For this reason, many improvements have been proposed on the model side, with the goal of handling language structure within the model. One popular approach uses kernel methods in order to enable existing models to handle structured data whose explicit enumeration would be intractable. For example, tree kernels (Moschitti, 2006) for support vector machines are capable of performing efficient pairwise comparisons of syntactic structures. More recently, models combining vector space semantics and supervised machine learning have emerged. The basic idea behind these models is that the meaning of words and phrases is encoded in some vector space which serves as the domain of a supervised classifier. Embeddings of words into this space are learned automatically, taking the subsequent prediction task into account. Thus, this area of research is also known as representation learning. There, neural network models of language have enjoyed much success, such as neural language models which we introduce in Section 2.4.2.

2.3 Learning Strategies

In machine learning, the way the learner approaches the problem can play a decisive role. The data used for training a model is particularly important as it directly influences what the model learns. We will first introduce two strategies to iteratively improve a statistical model, active learning and self-training. We then present two strategies for semi-supervised learning that circumvent the need for specifically produced, manually annotated training data.

Active Learning

Active learning (AL) is a machine learning technique for joint data annotation and training of a supervised classifier. The general active learning process is viewed as a cycle of interaction between the annotator and the classifier. First, the classifier is trained on an already labeled dataset. It can then request specific additional

input from the annotator. With this new information, the system is re-trained and the active learning cycle starts again. Active learning potentially lowers the overall cost of annotation since the system can ask for specific information it requires to improve itself. We will introduce the basic active learning concepts that we apply based on the survey by Settles (2009). Uncertainty selection with pool-based sampling (Lewis and Gale, 1994) is a simple implementation of the AL idea. Here, the cycle is designed as follows: First, we annotate a small set of examples as the initial training set. In addition, we collect a large pool of unannotated examples. We then train a supervised classifier on the training set. This classifier is applied to the examples in the pool. For each example in the pool, the classifier calculates how uncertain it is about its prediction. We then take strongly uncertain examples, have them annotated, and add them to the training set. Finally, the classifier is re-trained. The intuition behind this process is that the most uncertain examples will be maximally informative for the classifier. In order to perform uncertainty selection, we need to define a way of measuring uncertainty and a selection criterion based on this uncertainty measure. We assume that our classifier can estimate a probability p(c|x) for a class c given an example x. Ranking the classes by decreasing probability p(c_i|x), where

(c_1, ..., c_n) is the ranked list of classes, we obtain the margin metric (Scheffer et al., 2001)

M(x) = p(c_1|x) − p(c_2|x)

which calculates the difference in probability between the two most probable classes. The simplest way of making use of this metric is to select the most uncertain example, i.e., the example with the smallest margin, x = arg min_x M(x), annotate it and re-train. However, re-training might be an expensive process in some applications, so we would like to avoid it if possible. Batch annotation addresses this problem by selecting the n most uncertain examples for annotation in one AL iteration. Batch selection might however yield a subset of redundant examples (Brinker, 2003). It is thus important that new labels are incorporated quickly as otherwise, available information is not used and we might annotate redundant examples. Staleness (Haertel et al., 2010) quantifies this effect as the number of new annotations we have obtained since the classifier was retrained.
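The following sketch shows one pool-based uncertainty-selection step with the margin metric. It is an illustration under stated assumptions, not the annotation system of Chapter 4; predict_proba stands for any classifier function that returns class probabilities for an example, and the batch size is arbitrary.

```python
import numpy as np

def margin(probs):
    """M(x) = p(c1|x) - p(c2|x) for the two most probable classes."""
    top_two = np.sort(np.asarray(probs))[-2:]
    return top_two[1] - top_two[0]

def select_batch(predict_proba, pool, batch_size=10):
    """Pick the batch_size pool examples with the smallest margin (most uncertain)."""
    margins = [margin(predict_proba(x)) for x in pool]
    ranked = np.argsort(margins)          # ascending: smallest margin first
    return [pool[i] for i in ranked[:batch_size]]

# Usage (one AL iteration): train a classifier, call select_batch on the pool,
# have the returned examples annotated, add them to the training set, retrain.
```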


Self-Training and Bootstrapping

Self-training (also referred to as bootstrapping) is a machine learning technique for increasing the amount of training data for a classifier through automatic annotation of unlabeled data. In a typical setting, a small set of labeled data and a large set of unlabeled data are available, the latter commonly referred to as the pool. A classifier is trained on the labeled data and applied to the unlabeled data. We can then take a subset of the automatically classified examples for which the classifier has the highest confidence and add them to the training set. If the confidence measure is reliable, these examples are likely to be correct and after re-training, the quality of the classifier should increase. In its basic setup, this technique is similar to the previously introduced active learning – the classifier guides the selection of examples from a pool. In contrast, however, self-training selects the most confidently classified examples rather than the most uncertain ones since we will use the predicted label directly instead of consulting a human annotator. Self-training can be applied iteratively. After selecting examples from the pool, adding them to the training set and re-training the classifier, we can again classify the remaining examples of the pool to improve the labeling. Note that it is possible to keep examples in the pool even after we added them to the training set. This is sometimes referred to as sampling with replacement. The intuition is that if a selected example has been misclassified previously, we might be able to correct this mistake later on with a better classifier. The self-training process may be initialized in several ways. A straightforward idea is to start with a small set of labeled examples to produce an initial classifier to automatically label the pool. This classifier is typically a statistical model and it is the one that will be improved through self-training. Another approach is to provide an initial classifier, which we will refer to as the base classifier, instead of an initial labeled dataset. This version of self-training works without having to label any data in order to start the self-training process. Thus, an arbitrary classifier can be used to perform the initial labeling, such as a rule-based classifier. Note, however, that the base classifier needs to be capable of estimating confidence in order to select an initial set of training examples.
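A minimal sketch of this loop, assuming generic fit and predict_proba callables and confidence measured as the highest class probability (both assumptions for illustration, not the implementation used later in the thesis), could look as follows:

```python
def self_train(train, pool, fit, predict_proba, n_select=100, iterations=5):
    """Generic self-training loop (a sketch; fit and predict_proba are assumed callables).

    train: list of (example, label) pairs; pool: unlabeled examples.
    In each iteration, the most confidently classified pool examples are added
    to the training set with their predicted labels. Examples stay in the pool
    (sampling with replacement), so earlier mistakes may be corrected later.
    """
    model = fit(train)
    for _ in range(iterations):
        scored = []
        for x in pool:
            probs = predict_proba(model, x)          # dict: label -> probability
            label = max(probs, key=probs.get)
            scored.append((probs[label], x, label))
        scored.sort(reverse=True, key=lambda t: t[0])  # highest confidence first
        train = train + [(x, y) for _, x, y in scored[:n_select]]
        model = fit(train)
    return model
```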


Distant Supervision

Distant supervision (DS, Mintz et al., 2009) is a machine learning technique that makes use of weak indicators in order to learn to classify examples. A weak indicator can be any source of information that is annotated with the same set of labels as the examples for which we want to make predictions. The typical distant supervision setting uses a database of indicators for training. In addition, a pool of unlabeled examples is required. First, the database of indicators is used to annotate the examples in the pool. The way this annotation is accomplished is open. For example, the database could contain labeled features that match the examples. If multiple database entries match an example, a decision scheme is needed to select a label; occurrence counts or probability estimates are frequently used (e.g., Go et al., 2009; Surdeanu et al., 2010). Based on these annotated examples, we can train a supervised classifier as proposed in the original paper. This constitutes a bootstrapping step. Through this step, we are able to make use of the complete set of features that we can extract from the classifier, promising further improvements over the distantly annotated results.
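As a rough illustration (not the setup of Chapter 6), the following sketch labels an example from a database of labeled indicator features using a simple majority-count decision scheme; the database contents and feature representation are invented for the example:

```python
from collections import Counter

def distant_label(example_features, indicator_db):
    """Label an example from a database of labeled indicator features.

    indicator_db maps a feature (e.g., a word) to a label. Majority count over
    matching indicators is one common decision scheme; ties and non-matches
    yield no label (None)."""
    votes = Counter(indicator_db[f] for f in example_features if f in indicator_db)
    if not votes:
        return None
    (best, n), *rest = votes.most_common(2)
    if rest and rest[0][1] == n:      # tie between two labels -> no decision
        return None
    return best

# Illustrative indicator database and example.
db = {"awesome": "positive", "terrible": "negative"}
print(distant_label({"an", "awesome", "movie"}, db))   # -> "positive"
```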

Transfer Learning

Transfer learning (TL, Thrun, 1996) is concerned with making use of knowledge gained from learning a model for a certain task when faced with another task that is related but different. Most of the time, we will not be able to straightforwardly apply a learned model for one task to another task without encountering significant problems. To make knowledge transfer between tasks easier, Thrun proposes the following possible techniques:

Memory-based learning Transductive classification functions that work directly on the input examples rather than on an abstract representation of their properties in a learned model are thought to be beneficial for transfer learning. This is motivated by the fact that in these approaches, examples can be weighted against each other dynamically, e.g., by using a distance measure that relates examples from the two tasks.


Better distance measurements Directly related to this, Thrun argues that improving the distance measure may lead to improved transfer. He suggests that a distance function could be learned automatically.

Adapting representations Orthogonally to the distance function, the feature representations of the two tasks may be addressed. Thrun proposes to modify feature spaces in order to facilitate knowledge transfer. Although the original proposal intends to automatically learn a feature space mapping – an idea related to representation learning (cf. Bengio et al., 2013) – a well-selected feature set may be sufficient.

In this thesis, we focus on improving feature representations for transfer learning.

2.4 Discriminative Models

Discriminative models work by modeling the posterior probability p(y|x) of an output y given the example x directly (Bishop, 2006). This distinguishes them from generative models which consist of a process of probabilistic steps, the generative story, that together form the posterior (effectively modeling the joint probability p(y, x)). Discriminative models enjoy popularity in NLP and other fields as they are more flexible than purely generative models and often yield superior performance. In the following sections, we will present two prominent classes of models, the maximum entropy model and the neural network model.

2.4.1 Maximum Entropy Model

The maximum entropy model is particularly popular in NLP (Berger et al., 1996). It is in part motivated by issues regarding feature extraction from which many NLP tasks suffer. Most feature functions produce features that are not statistically independent. For example, the extraction of both unigrams and bigrams leads to dependent features as the occurrence of a word in a unigram implies that it is also part of a bigram. More generally, the words in a text cannot be considered statistically independent as they were produced in sequence with specific semantics in mind. Therefore, word-based features extracted from a text will be correlated.


It is important that this property is taken into consideration in a statistical model. Some models, such as Naive Bayes, assume that all features are statistically independent. The maximum entropy (MaxEnt) model does not make this assumption. It is based on the maximum entropy principle which states that one should use the model which is closest to a uniform distribution unless contrary information is available. This causes the so-called explaining away effect (Wellman and Henrion, 1993), meaning that only those features which explain the training data best receive high weights. We follow Nigam et al. (1999) and Mount (2011) for the introduction of MaxEnt models. Formally, the MaxEnt model maximizes the entropy H(p) of the distribution p(c|x):

H(p) = -\sum_{c \in C} \sum_{x \in X} p(c|x) \log p(c|x)

Further, we add constraints that (i) p(c|x) needs to be maximal for the correct label, and (ii) that p needs to be a probability distribution. Formulating the Lagrangian and finding an optimum, we obtain the following model:

p(c|x) = \frac{\exp\left(\sum_{i \in F} w_i^{(c)} \phi_i(x)\right)}{\sum_{c' \in C} \exp\left(\sum_{i \in F} w_i^{(c')} \phi_i(x)\right)}

This exponential model function is called softmax in the context of neural networks. We will discuss it in more detail in the following section. The feature weights w^(c) can be learned by maximizing the model's log-likelihood LL_w(D),

LL_w(D) = \log \prod_{\langle \hat{x}, \hat{y} \rangle \in D} p(\hat{y} \mid \hat{x}),

for the data on a set of training examples through numerical optimization, applying for example gradient-based or quasi-Newton methods. The log-likelihood function is related to the cross-entropy function which is used for neural network training. Indeed, the MaxEnt model can itself be interpreted as a neural network.
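For illustration, the following NumPy sketch computes the MaxEnt posterior p(c|x) and the log-likelihood on a toy dataset. It is a didactic example, not the implementation used in the experiments, and the layout of the weight matrix (one row per class) is an assumption.

```python
import numpy as np

def maxent_probs(W, x):
    """p(c|x) = exp(w^(c) . phi(x)) / sum_c' exp(w^(c') . phi(x)).

    W: |C| x |F| weight matrix (one row of feature weights per class),
    x: binary feature vector phi(x)."""
    scores = W @ x
    scores = scores - scores.max()        # numerical stability, does not change p
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

def log_likelihood(W, X, y):
    """LL_w(D): sum over examples of log p(y_hat | x_hat)."""
    return sum(np.log(maxent_probs(W, x)[c]) for x, c in zip(X, y))

# Toy data: 2 classes, 3 binary features (values are illustrative only).
W = np.zeros((2, 3))
X = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
y = [0, 1]
print(maxent_probs(W, X[0]), log_likelihood(W, X, y))
```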


Figure 2.1: Artificial neuron (input units x_1, ..., x_4 feeding a single output unit y)

2.4.2 Neural Networks

Introduction

Neural networks (NN) are a class of mathematical models inspired by biology. The basic unit of a neural network is the artificial neuron, which is a model of a natural neuron that occurs in the brain. The following introduction in part follows Bishop (2006). Formally, a neuron is a function that takes a vector x of input variables and returns an output y = f(x), where f is the output function whose result y is the activation of the neuron. Neurons in neural networks are usually visualized as graphs. Figure 2.1 shows the graph for a single neuron. The nodes in the network are called units. The input units x_1, ..., x_4 are shown at the bottom, the output unit y at the top. An edge indicates that the value of the origin unit is used to compute the value of the target unit. A neural network consists of a set of neurons that are combined in some way, for example by composition of their output functions. One of the simplest neural networks is the perceptron (Rosenblatt, 1958). It consists of a single neuron, like the one shown in Figure 2.1, that makes a classification decision based on a linear activation function of the inputs x, a weight vector w, and a bias term b:

y = f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{else} \end{cases}

The weights and the bias are parameters that can be optimized numerically based on training data. The perceptron is still popular today in NLP, particularly for applications with complicated learning problems such as structured prediction where


Figure 2.2: Multi-layer neural network. Hidden nodes are shaded in gray. (Input units x_1, ..., x_4; hidden layer units y_1^(1), ..., y_5^(1); output layer units y_1^(2), y_2^(2), y_3^(2).)

efficient training is key (Smith, 2011). The model led to the general form of a neuron with the output function

f(x) = h(w · x + b), where we first compute the linear combination of the inputs x with the weights w and bias b and then apply an activation function h, which may be non-linear. Logistic regression is an instance of this model, consisting of a single neuron with the logistic sigmoid h(x) = sigmoid(x) = 1/(1 + e^{−x}) as its activation function. Logistic regression is closely related to the previously introduced maximum entropy model (cf. Mount, 2011). As mentioned earlier, multiple neurons may be composed to form a complex network structure. If multiple neurons work on the same input units, each yielding a different output y_i, we call this structure a layer. To make subsequent notation easier, we can for each layer collect the weights W = [w_1; ...; w_n], the biases b = (b_1, ..., b_n), and the outputs y = (y_1, ..., y_n) = f(x). Multiple layers can be combined through the composition of their output functions. For n layers with output functions f^(1), ..., f^(n), the final network output is

y = (f^(n) ∘ ... ∘ f^(1))(x) = f^(n)(f^(n−1)(... (f^(1)(x)))).
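As a concrete example of this general form (a sketch with made-up numbers, not a trained model), the following snippet evaluates a single neuron once with a threshold activation, corresponding to the perceptron decision, and once with the logistic sigmoid:

```python
import numpy as np

def neuron(x, w, b, h):
    """General neuron: linear combination of the inputs followed by activation h."""
    return h(np.dot(w, x) + b)

step = lambda a: 1.0 if a > 0 else 0.0           # perceptron decision rule
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))     # logistic regression output

# Illustrative values only.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.1, -0.3])
b = 0.1
print(neuron(x, w, b, step), neuron(x, w, b, sigmoid))
```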

This type of network setup is called feedforward as we can pass information only in the direction towards the output. There is no generally accepted convention


Figure 2.3: Compact neural network notation (a vector x connected to a vector y through a layer with output function f_W)

for numbering layers. We adopt the convention where the input units are not counted as a layer. This has the advantage that the number of layers is equal to the number of parameter sets that need to be learned. An example graph for a multi-layer NN is shown in Figure 2.2. The network has two layers: an output layer (with three units y_1^(2), y_2^(2), y_3^(2)) which makes a prediction; and a hidden layer (with five units y_1^(1), ..., y_5^(1)) which consists of network-internal values that are not specified in the training data and thus cannot be optimized directly. In this model, calculating derivatives in the hidden layers is not straightforward as we can measure errors directly only at the output. Therefore, errors made at the output layer must recursively influence the weights of preceding layers as well. Backpropagation (Rumelhart et al., 1986) is an efficient algorithm for calculating the derivatives of a multi-layer network, making use of the chain rule for derivatives efficiently. The algorithm computes the derivatives iteratively, starting at the output layer, and uses the errors of each layer to compute the derivatives in the preceding layer. For details of the algorithm, refer to Bishop (2006). The multi-layer perceptron is an instance of this idea in which each of the activation functions h^(1), ..., h^(n) is again the logistic sigmoid.
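The composition of layers can be made concrete with a small sketch. The code below is illustrative only: it uses random, untrained weights and mirrors the 4-5-3 architecture of Figure 2.2, computing a forward pass but no backpropagation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def layer(x, W, b, h):
    """One layer: y = h(W x + b)."""
    return h(W @ x + b)

def feedforward(x, params):
    """Compose layers, y = f^(n)(... f^(1)(x)); params is a list of (W, b, h) triples."""
    for W, b, h in params:
        x = layer(x, W, b, h)
    return x

rng = np.random.default_rng(0)
params = [(rng.normal(size=(5, 4)), np.zeros(5), sigmoid),   # hidden layer, 5 units
          (rng.normal(size=(3, 5)), np.zeros(3), sigmoid)]   # output layer, 3 units
print(feedforward(np.array([1.0, 0.0, -1.0, 0.5]), params))
```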

Vector Graph Notation

To make graph notation more readable and compact, we will employ a vector-based layout, which has been used previously for example by Bengio et al. (2003). Each layer is represented as a single node representing a vector, and a connection between the nodes indicates that the layers are fully connected. An example of this notation is shown in Figure 2.3. The graph shows two vectors with unit values x and y, respectively, that are connected through an NN layer with output function


Name                  Function                                        Range
identity              identity(x) = x                                 [−∞, ∞]
hyperbolic tangent    tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})       [−1, 1]
logistic sigmoid      sigmoid(x) = 1 / (1 + e^{−x})                   [0, 1]
softmax               softmax(x, i) = e^{x_i} / Σ_j e^{x_j}           [0, 1]

Table 2.1: Examples of activation functions along with their ranges.

f which is parameterized with W. Sometimes, it is useful to split up layers into several vector boxes to indicate that these vectors are being concatenated.

Activation Functions

One motivation for using an activation function is that the dependency between the input and the expected output is not necessarily linear. Further, it may be desired that the outputs of the network are bounded, e.g., to the interval [0, 1], or even normalized (i.e., the sum over all output units is 1). Ideally, the activation function should be differentiable to make numerical optimization easier. We will briefly introduce the activation functions that are relevant in this thesis. Their definitions are shown in Table 2.1. All of these functions are smooth, i.e., infinitely differentiable. We will next give some motivation for each of the functions.

Identity The most straightforward function is the identity function that simply passes through the output of the linear equation.

Hyperbolic tangent The hyperbolic tangent, a function with an s-shaped graph, ranges between -1 and 1. It is often considered when the outputs are expected to be centered around zero, e.g., for normally distributed data with zero mean.

Logistic sigmoid The logistic sigmoid function, which again has an s-shaped graph, returns a value between zero and one for each output unit, which makes it suitable for predictions that are encoded as binary vectors, as they are found for example in co-occurrence vectors in information retrieval.


Softmax The softmax function returns a value between zero and one for each unit; the sum over all units is 1. Thus, softmax is particularly useful when the activations of all units are to be interpreted jointly as a probability distribution. The computation of softmax differs markedly from the other functions as it is the only one where the activation of one unit depends directly on the activations of all other units through normalization. The function that returns a vector with softmax activations for all units in a layer is simply written y = softmax(x). Thus, the result over all training examples will be a matrix Y whose rows represent the examples and whose columns represent the individual classes.
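The activation functions from Table 2.1 can be written down directly; the following Python/numpy sketch is purely illustrative (the shift by the maximum in softmax is a numerical-stability trick, not part of the definition).

```python
# Sketch of the activation functions from Table 2.1 (illustrative only).
import numpy as np

def identity(x):
    return x

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtracting the maximum does not change the result because softmax is
    # shift-invariant; it merely avoids numerical overflow in exp.
    e = np.exp(x - np.max(x))
    return e / e.sum()
```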

Loss functions

The optimization objective for supervised neural network training is based on a loss function that measures how well the network performs on the training data. As the name implies, the output of the loss function needs to be minimized to obtain the best-performing model (as opposed to maximizing the likelihood of the data under the model, as is common for MaxEnt). There are multiple loss functions which are common in the literature, and which are typically paired with specific output activation functions. We will briefly introduce the two most common functions. In the following, let ŷ be the target value and y the predicted output. N denotes the number of examples, D the output dimensionality, i.e., the number of units.

Mean squared error (MSE) The MSE function assumes that the errors are normally distributed. It is defined as

MSE(ŷ, y) = (1/N) Σ_{n=1}^{N} (ŷ_n − y_n)²

It is frequently used in combination with the tanh and linear activation functions.


Cross-entropy error (CEE) The CEE function measures the cross-entropy of the output and the target values. As a consequence, it is only applicable if the predictions and targets are probabilities. It is a natural choice for sigmoid and softmax activation functions. When used in combination with one of these functions, the computation of the derivatives in backpropagation can be simplified significantly. The CEE function for the multiclass case, i.e., when using softmax activation, is defined as follows:

CEE(Ŷ, Y) = − Σ_{n=1}^{N} Σ_{d=1}^{D} ŷ_{nd} ln y_{nd}
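As a minimal sketch, both loss functions can be computed as follows, assuming numpy arrays with one example per row (an illustrative convention, not prescribed by the definitions above).

```python
# Sketch of the two loss functions; Y_hat holds the targets, Y the predictions.
import numpy as np

def mse(Y_hat, Y):
    # Mean squared error, averaged over the N examples (rows); the sum over the
    # output dimensions is an assumption about the multi-output layout.
    return np.mean(np.sum((Y_hat - Y) ** 2, axis=1))

def cee(Y_hat, Y):
    # Cross-entropy error for softmax outputs Y and probabilistic (e.g. one-hot)
    # targets Y_hat; the small constant guards against log(0).
    return -np.sum(Y_hat * np.log(Y + 1e-12))
```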

Optimization

Although research on optimization is not within the scope of this thesis, we briefly want to mention some points we consider important. Most importantly, neural network objective functions are in general not guaranteed to be convex. Thus, if we search for the minimum of such a function using convex optimization, we will find only local optima. While local optima can still be good enough to do well on a task, the result depends highly on initialization – which has consequently been the focus of much research. Often, convex optimization methods (such as quasi-Newton optimization (e.g., L-BFGS) or stochastic gradient descent) are applied to non-convex objectives, with the hope that the local optimum is sufficient for good performance.

Autoencoders

The autoencoder is a neural network model which has received much attention in recent research. It takes an input and tries to reproduce it at the output. This is accomplished by using an optimization objective that minimizes a loss function measuring the difference between the input and the output. The typical autoencoder setup is shown in Figure 2.4. It consists of two layers, a hidden layer, called the encoding layer (enc), and an output layer, called the decoding or reconstruction


Figure 2.4: Autoencoder. The input x is mapped by the encoding layer f^(enc) to a hidden representation h, from which the reconstruction layer f^(rec) produces the output y.

layer (rec):

y = f^(rec)(f^(enc)(x)) = h^(rec)(W^(rec) h^(enc)(W^(enc) x + b^(enc)) + b^(rec))

The hope is that the hidden units learn to generalize from the input, identifying interesting correlations between the input features. The values of the hidden layer can be interpreted as a compressed representation of the input. In order to be able to reproduce the input, the number of output units must equal the number of input units. The number of hidden units is often chosen to be smaller than the number of input units to prevent the network from simply copying the input, the goal being to reduce the dimensionality of the data. Nonlinear activation functions can help to prevent copying as well. Depending on the way the data is encoded, certain combinations of activation functions and loss functions are more beneficial than others (e.g., sigmoid and CEE for binary input vectors). The idea of finding a representation with reduced dimensionality is not unlike the motivation for applying methods such as principal component analysis or singular value decomposition. In fact, they are closely related to autoencoders. Bourlard and Kamp (1988) for example show that singular value decomposition is equivalent to a two-layer feedforward network with linear activation and a mean squared error objective.
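The following sketch trains a toy autoencoder with sigmoid activations and the cross-entropy error on a binary input vector, using the simplified gradient that results from this combination. All dimensions, the learning rate, and the single-example training loop are illustrative assumptions rather than the setup used in this thesis.

```python
# Minimal autoencoder sketch: sigmoid encoder and decoder, cross-entropy error
# on a binary input vector (illustrative toy dimensions and hyperparameters).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 50, 10                       # fewer hidden than input units
W_enc = rng.normal(0, 0.1, (n_hidden, n_in)); b_enc = np.zeros(n_hidden)
W_rec = rng.normal(0, 0.1, (n_in, n_hidden)); b_rec = np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruct(x):
    h = sigmoid(W_enc @ x + b_enc)            # encoding layer
    y = sigmoid(W_rec @ h + b_rec)            # reconstruction layer
    return h, y

x = (rng.random(n_in) < 0.2).astype(float)    # toy binary input vector
for _ in range(100):                          # gradient steps on one example
    h, y = reconstruct(x)
    delta_rec = y - x                         # CEE + sigmoid simplification
    delta_enc = (W_rec.T @ delta_rec) * h * (1 - h)
    W_rec -= 0.1 * np.outer(delta_rec, h); b_rec -= 0.1 * delta_rec
    W_enc -= 0.1 * np.outer(delta_enc, x); b_enc -= 0.1 * delta_enc
```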


Deep Neural Networks

Deep neural networks (DNNs), i.e., networks with a high number of layers, are desirable from a learning point of view. It has been shown that neural networks are universal approximators (McCulloch and Pitts, 1943; Hornik, 1991), i.e., roughly speaking, they can approximate any continuous function. DNNs are advantageous in this respect as they can reduce the learning effort required for good approximation (Bengio, 2009). For a long time, training arbitrarily deep neural networks was considered too difficult for most applications. Combining multiple layers with non-linear activation functions makes the resulting objective function non-convex, which greatly complicates optimization. In some cases, for example on certain image processing tasks (LeCun et al., 1998), convex optimization finds useful local optima for DNNs, but in general, DNNs were unsuccessful. Hinton and Salakhutdinov (2006) introduced a novel technique for learning deep networks. The key idea is to first pre-train each layer in the DNN in an autoencoder setup to initialize the weights for subsequent training of the full model. This effect is related to non-convexity: the authors argue that pre-training serves to initialize the model weights beneficially so that convex optimization finds a better local optimum. These results have led to much subsequent research where DNNs were shown to outperform state-of-the-art models in areas like speech recognition (Dahl et al., 2012) and image processing (Krizhevsky et al., 2012). Note that these are all cases where the data is easy to represent numerically. Transferring this success to text-based NLP remains a challenge.

Continuous Vector Space Language Models

The structure of language makes the application of DNNs to NLP difficult. DNNs originated in image processing where the neighborhood of two entities (pixels) has a straightforward interpretation due to their spatial relation. In NLP, neighborhood relations are not always easy to interpret and do not necessarily translate into a semantic relationship. Often, long-distance dependencies hinder the use of locally sensitive models. Further, the basic units in image processing are numerical, as opposed to the categorical word tokens found in text.


This problem motivates the use of continuous vector space models. Here, words are embedded in a continuous vector space, i.e., each word is represented by a continuous vector that is automatically learned. These spaces are usually low-dimensional compared to the high-dimensional sparse feature spaces commonly used in NLP. One of the first models to learn word representations is the neural language model (NLM, Bengio et al., 2003). We will briefly introduce this model as it is the foundation on which the compositional models used in this thesis build. For notation purposes, we first assume that we represent each word type as an integer w ∈ ℕ. We let L be the representation matrix where each row contains the vector for a certain word, and we define a lookup operation l_i on it that returns the ith row of L. In a sense, the idea behind the NLM is similar to that of the autoencoder: given a set of context words, we want to be able to reconstruct the target word. In contrast to the autoencoder, the NLM sets up the problem as making a context prediction. The model takes a string of words as its input. Given a target word w_i, we take a windowed context of k words to the left and to the right of the word: (w_{i−k}, ..., w_{i−1}, w_{i+1}, ..., w_{i+k}). The input to the network will be the concatenation of these words' vectors, x = [l_{w_{i−k}}, ..., l_{w_{i−1}}, l_{w_{i+1}}, ..., l_{w_{i+k}}]. The network setup is shown in Figure 2.5: a two-layer network architecture where the first layer uses a tanh nonlinearity and the second layer uses softmax. All units except the output units are hidden. The network output will be a prediction of the identity of the target word w_i (i.e., a probability distribution indicating it):

y = softmax(W^(2) tanh(W^(1) x + b^(1)) + b^(2))

The NLM is trained by learning both the usual weight parameters and the matrix L, the input representations of the words. To improve the quality of the resulting word representations, randomly constructed negative examples have been shown to be helpful (Collobert and Weston, 2008). Sometimes, skip connections are used to connect the input layer directly to the prediction layer (Bengio et al., 2003). The purpose of learning word representations is two-fold.

Figure 2.5: Neural language model. The word representation vectors l_{w_{i−2}}, l_{w_{i−1}}, l_{w_{i+1}}, l_{w_{i+2}} are concatenated (⊕) into the input x, which is passed through a tanh hidden layer h and a softmax output layer y = p(w_n).

First, the representations may be used in any task where vector representations of words are required, e.g., in information retrieval. Second, the representations serve as pre-training for subsequent hierarchical neural network models. We will present and analyze such a model in Chapter 7.
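A forward pass of such a model can be sketched as follows; the vocabulary size, vector dimensionality, window size, hidden-layer size, and the toy word ids are illustrative assumptions, and training (including negative examples and skip connections) is omitted.

```python
# Sketch of a neural language model forward pass (illustrative toy dimensions).
import numpy as np

rng = np.random.default_rng(0)
V, dim, k = 1000, 50, 2                    # vocabulary size, vector size, window
L = rng.normal(0, 0.1, (V, dim))           # representation matrix, one row per word
n_hidden = 100
W1 = rng.normal(0, 0.1, (n_hidden, 2 * k * dim)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (V, n_hidden));           b2 = np.zeros(V)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_target(context_word_ids):
    # Concatenate the 2k context words' vectors and predict a distribution
    # over the vocabulary for the target word.
    x = np.concatenate([L[w] for w in context_word_ids])
    return softmax(W2 @ np.tanh(W1 @ x + b1) + b2)

# Example: context (w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}) given as integer ids.
p = predict_target([12, 7, 85, 3])
print(p.argmax(), p.sum())                 # most probable target id; sums to 1
```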

2.4.3 Regularization

An important goal in supervised learning is to find a model that generalizes well, meaning that it works well on both the test and training data. Models that capture the training data well but not the test data are said to overfit the training data. It is thus desirable to take measures to avoid overfitting. For this reason, it has become best practice to regularize discriminative models. As we can freely manipulate weights (as opposed to, e.g., probabilistic generative models where the parameters are required to be probabilities), we can simply impose a constraint on them. A common method for regularization restricts the growth of the model parameters w, which prevents overfitting. To accomplish this, the original optimization objective

E_w(•) is modified to jointly minimize a norm of w, for example its L2 norm ||w||_2 = sqrt(Σ_i |w_i|²):

E_w^(reg)(•) = E_w(•) + ||w||_2² / (2σ²)

Here, σ is a parameter controlling the degree of regularization. This regularization term corresponds to a Gaussian prior on the weight vector where σ² is the variance. All discriminative models used in this thesis are regularized.
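As a sketch, adding the regularization term to an arbitrary objective and its gradient looks as follows; the function names and the interface are illustrative assumptions.

```python
# Sketch of L2 regularization: the penalty and its gradient are added to the
# unregularized objective and gradient (interface is illustrative).
import numpy as np

def regularized_objective(w, objective, gradient, sigma=1.0):
    # objective(w) and gradient(w) are assumed to compute E_w and its gradient.
    penalty = np.dot(w, w) / (2 * sigma ** 2)
    grad_penalty = w / sigma ** 2
    return objective(w) + penalty, gradient(w) + grad_penalty
```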

2.5 Graph-Based Natural Language Processing

Recently, graph-based representations of natural language have emerged as a powerful alternative to purely inductive learning methods (Mihalcea and Radev, 2011). One idea which has strongly influenced graph-based approaches to NLP is transduction, where knowledge from the training data is directly transferred to the test data instead of training a mathematical model. This idea is naturally applicable to graphs as they encode relations between objects. Often, properties such as transitivity may be exploited.


We will next give a brief introduction to graph theory, based on the text by Diestel (2000), focusing on the concepts that find application in this thesis. We then discuss ways to represent natural language through graphs. Finally, we introduce the graph algorithms that we apply.

2.5.1 Graph Theory

We formalize directed graphs as G = (V, E, W), where V is the set of nodes v, E the set of directed edges ⟨u, v⟩ between two nodes u and v, and W a matrix of edge weights w_uv between u and v. We next describe some basic concepts of graph theory. Note that all definitions can easily be modified for undirected graphs by assuming unordered pairs (u, v) as edges and a symmetric W.

Degree The out-degree g_out(i) of a node i is the number of edges that start at i, i.e.,

g_out(i) = |{⟨i, j⟩ | j ∈ V ∧ ⟨i, j⟩ ∈ E}|.

The in-degree g_in is defined analogously for edges that end at i.

Distance The distance d(i, j) of two nodes i and j is the length of the shortest path between them; d(i, i) = 0 and if there is no path between i and j, d(i, j) = ∞.

Subgraph H = (V′, E′, W′) is a subgraph of a graph G = (V, E, W), written as H ⊆ G, iff V′ ⊆ V and E′ ⊆ E and W′ is the appropriate submatrix of W.

Connected graph A graph (or subgraph) is called connected iff there is a path between any pair of its nodes.

Component A connected subgraph H of G is a component of G iff there is no connected H′ with H ⊂ H′ ⊆ G, i.e., iff H is a maximal connected subgraph.

Neighborhood The k-neighborhood of a node i is the subgraph N = (V′, E′, W′) of G = (V, E, W) where V′ = {j | d(i, j) ≤ k ∧ j ∈ V} and E′ = {⟨i, j⟩ | ⟨i, j⟩ ∈ E} ∪ {⟨j, i⟩ | ⟨j, i⟩ ∈ E} and W′ is the appropriate submatrix of W.


2.5.2 Graph Representations of Natural Language

There are many ways of representing language in this formalism for different NLP problems (based on Mihalcea and Radev, 2011, Chapter 4). We will briefly introduce the setups that we apply in this thesis. Co-occurrence graphs model co-occurrence relations between objects. In the most basic setup, the word graph, each node represents a word type. Undirected edges between words represent co-occurrence of the words in a text corpus. One motivation behind this graph structure is that when choosing the right relations, the graph might implicitly encode semantic information, such as semantic similarity between two words. The graph is constructed from some text by taking a window of n words to the left and right of a word w_x and adding an undirected edge (w_x, w_{x+i}) to the graph for each −n ≤ i ≤ n, i ≠ 0. The edges are weighted by the overall co-occurrence count of the word pair. If necessary, this graph can be made directed by splitting each edge into an ingoing edge ⟨u, v⟩ and an outgoing edge ⟨v, u⟩. This approach has several drawbacks. First, it is noisy as not all words in the neighborhood have a strong relationship. Second, it leads to a large, dense graph which may be hard to process. To address these issues, a more conservative approach has been proposed, leading to a syntactic dependency graph. Related words are extracted based on pre-defined syntactic patterns, such as verb-object or conjunction relations. If two words are found in such a relation in a corpus, they are assumed to be co-occurring and an edge is added to the graph as described above. A further extension to this framework is to include edges that do not strictly model co-occurrence but rather occurrence. Due to the lack of a term in the literature for this structure, we call it an occurrence network, analogously to the previously introduced concepts. For example, we can introduce a node representing a document d and create edges (d, w) between d and each word w ∈ d. To model discourse phenomena in language, adjacency graphs have been proposed. Adjacency graphs are a stricter form of co-occurrence graphs where the edges between two nodes are weighted by how close the objects they represent are to each other. This can be accomplished by defining a weight function based on proximity or even a strict cutoff after a number of neighbors. This way, the

resulting graph will have chain-like structures representing sequence information about linguistic entities. An interesting but frequently overlooked distinction is whether the nodes in the graph represent types or tokens. Word graphs usually model word types in order to achieve semantic generalization. Discourse graphs often use tokens to model objects in context, for example to identify informative sentences for summarization. Sometimes, a hybrid approach may be desired, for example for word sense disambiguation.

2.5.3 Graph Algorithms

Most popular graph-based machine learning algorithms are designed for transductive learning. This means that they do not perform parameter-based training of mathematical models. Instead, they try to make use of the training information directly, for example by propagation, segmentation, or the computation of node-node relatedness. This also emphasizes the importance of choosing a suitable graph structure. We will introduce two graph algorithms, PageRank and minimum cut.

PageRank

PageRank is an algorithm for measuring how well-connected each node in a graph is (cf. Bianchini et al., 2005). It was introduced by Page et al. (1999) to rank web pages by user preference in order to improve the results of their web search engine. They accomplished this by viewing the web as a graph where web sites are nodes and hyperlinks between the pages define edges between the nodes. The underlying concept of PageRank is that of a random walk, which was described initially by Pearson (1905). We formalize random walks based on the introduction by Abney (2010). A Markov chain is a stochastic process that models the transition between a set of states V. It is defined through the transition probabilities p_ij ∈ P for all states i and j. States and transitions between them can be represented as a directed graph G = (V, E, P) with P as the row-normalized edge weight matrix, where E = {(i, j) | p_ij ≠ 0}. An important property of a Markov chain is ergodicity. A Markov chain is ergodic iff it is irreducible and aperiodic. Irreducibility is given if there is a path

with non-zero probability between any two states. Aperiodicity requires that there is no periodic path between any two states. Any ergodic Markov chain has a unique steady-state probability distribution. This distribution is the dominant left eigenvector of the transition probability matrix P. We refer to the dominant left eigenvector as the rank vector r. It can be computed as the fixed point of the following equation, e.g., by the power method:

r = r P (2.1)

Ergodicity is difficult to ensure in practical applications as we generally have little influence on the structure of the graphs that arise. To avoid this issue on the web graph, the transition matrix of the graph can be altered to give every pair of nodes a non-zero transition probability through interpolation with another non-zero probability matrix, a process known as teleportation (cf. Haveliwala, 2003). For web data, teleportation could be interpreted as the surfer manually moving to a different web site rather than following a link. Formally, teleportation can be introduced into an existing Markov chain as follows. Assume that we are given an N × N transition matrix A for a Markov chain, an N-dimensional vector of target teleportation probabilities t, and an overall teleportation probability (1 − α). We can then compute the ergodic probability matrix P through interpolation as

P = αA + (1 − α) 1 t^⊤.    (2.2)

In standard PageRank, t is the uniform distribution: all nodes are equally likely to be teleportation targets.

Personalized PageRank PageRank measures the overall connectedness of a node in the graph. In many applications, however, it is of interest to measure the connectedness with respect to a set of relevant nodes which we call seed nodes. One algorithm that solves this problem is Personalized PageRank, also called Topic- Specific PageRank. It was originally intended to compute the PageRank of web pages with respect to a certain topic or focus of interest. Personalized PageRank is computed with respect to a set of seed nodes S that represent known instances of relevance. Based on this set, we will bias PageRank to

give higher weight to nodes that are well-connected to these nodes and therefore are good representatives of the topic that dominates S. We accomplish this by setting the teleportation probabilities t_i in t non-uniformly based on whether a node n_i is in S:

t_i = 1/|S| if n_i ∈ S, and t_i = 0 otherwise.    (2.3)

Thus, teleportation will occur only to nodes in S. This causes r to have high values for nodes that are well-connected to nodes in S.
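The following sketch computes (Personalized) PageRank by the power method, following Equations 2.1 to 2.3; the row-normalized toy matrix, the seed set, and the parameter values are illustrative assumptions.

```python
# Sketch of (Personalized) PageRank by power iteration.
import numpy as np

def personalized_pagerank(A, seeds=None, alpha=0.85, iters=100):
    n = A.shape[0]
    if seeds is None:                      # standard PageRank: uniform teleportation
        t = np.full(n, 1.0 / n)
    else:                                  # Equation 2.3: teleport only to seed nodes
        t = np.zeros(n)
        t[list(seeds)] = 1.0 / len(seeds)
    P = alpha * A + (1 - alpha) * np.outer(np.ones(n), t)   # Equation 2.2
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = r @ P                          # Equation 2.1: r = rP
    return r

# Toy example: a small row-stochastic chain graph with node 0 as the only seed.
A = np.array([[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]], dtype=float)
print(personalized_pagerank(A, seeds={0}))
```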

Minimum Cut

The minimum cut (MinCut) problem is the problem of finding a segmentation of a graph into two parts where the sum of the weights of the edges that have to be removed is minimal. We will formalize the problem based on Abney (2010) and briefly discuss algorithms for computing the MinCut. We write the complement of a set of nodes S as S̄. Each set of nodes S has a boundary ∂S which is the set of all edges that connect nodes in S to nodes in S̄. The size of this boundary can be computed as the sum of the weights of ∂S, and is also referred to as a cut:

size(∂S) = cut(S, S̄) = Σ_{⟨a,b⟩ ∈ S × S̄} (w_ab + w_ba)

The problem of finding the cut with the smallest boundary size is the minimum cut problem. size(∂S) is the value of the cut. The result is a partitioning of the nodes of the graph into two sets S and S¯ with optimal similarity within each set and optimal dissimilarity between the sets. Much research covers the problem of finding the minimum cut. In this thesis, we will make use of a special case of MinCut, the s-t-MinCut where two nodes, s and t, are required to be on different sides of the cut, i.e. s ∈ S and t ∈ S¯. Many algorithms, such as push-relabel algorithms (Cherkassky and Goldberg, 1995), exploit the equivalence of the s-t-MinCut problem to the maximum flow (MaxFlow) problem for flow networks. In this paradigm, the graph is viewed as a flow network. Edge weights correspond to the capacity of flow between two nodes.


Edges must be directed and there must be one node each representing the source, where flow can enter the network, and the sink, where the flow leaves the network. The goal of the MaxFlow problem is to find the maximum flow value from the source s to the sink t, which is equal to the value of the cut. When using blocking algorithms such as the push-relabel algorithm (Cherkassky and Goldberg, 1995), the graph is partitioned by the “frontier” of blocked edges.
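As an illustration of this MaxFlow formulation, the following sketch computes an s-t-MinCut with the networkx library; the library choice, the toy graph, and its capacities are illustrative assumptions and not the setup used in the experiments of this thesis.

```python
# Sketch: s-t-MinCut via the MaxFlow equivalence, using networkx.
import networkx as nx

G = nx.DiGraph()
# Edge weights are interpreted as flow capacities (toy values).
G.add_edge("s", "a", capacity=3.0)
G.add_edge("s", "b", capacity=1.0)
G.add_edge("a", "b", capacity=1.0)
G.add_edge("a", "t", capacity=1.0)
G.add_edge("b", "t", capacity=3.0)

cut_value, (S, S_bar) = nx.minimum_cut(G, "s", "t")
print(cut_value)   # value of the cut = maximum flow from s to t
print(S, S_bar)    # partition of the nodes into the two sides of the cut
```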

2.6 Evaluation

2.6.1 Evaluation Measures

For the evaluation of classification results, we resort to well-known measures from information retrieval (Manning et al., 2008). All evaluation measures presented in the following section rely on some basic counts on a test collection of data D. The basic measurements are the counts of true positives (tp), true negatives (tn), false positives (fp) and false negatives (fn) with respect to each class c of each example. These depend on whether the class predicted by the classifier matches the expected prediction, i.e. the true class, as shown in Table 2.2.

                      true
                  c         ¬c
predicted    c    tp(c)     fp(c)
             ¬c   fn(c)     tn(c)

Table 2.2: Confusion matrix of true and predicted class

We define tp(c), tn(c), fp(c), and fn(c) to denote the number of times the respective events occurred in the collection for a class c. Based on these count statistics, we define our evaluation measures. The most basic measure is accuracy (Acc). Here, we simply measure the ratio of correctly classified examples on the collection:

Acc = ( Σ_{c∈C} tp(c) ) / |D|


Accuracy is a good measure when classes are distributed uniformly in the collection. However, as class imbalances grow more pronounced, high accuracy might be attained by a classifier that has a bias towards the majority class. Precision and recall are often used as an alternative, providing a more detailed analysis of the classifier's behavior with respect to each class c. Precision (P(c)) measures the relative frequency of correctly classified examples among those that were predicted to belong to c:

P(c) = tp(c) / (tp(c) + fp(c))

Conversely, recall (R(c)) measures the relative frequency of correctly classified examples among the set of examples whose correct class is c:

R(c) = tp(c) / (tp(c) + fn(c))

The harmonic mean of precision and recall is called the F measure. In this thesis, we use the balanced F measure, or F1 measure, i.e., precision and recall are weighted equally:

F_1(c) = 2 P(c) R(c) / (P(c) + R(c))

We use the macro-averaged F measure (abbreviated as F̄_1) over all classes as a measure for overall classifier performance:

F̄_1 = ( Σ_{c∈C} F_1(c) ) / |C|

The F measure of each class contributes equally to the macro average. This way, class-imbalances in the data do not have an influence on the calculation of the result.
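These measures can be computed from the basic counts as in the following sketch, assuming a single predicted label per example; the toy label sequences are illustrative.

```python
# Sketch of accuracy and macro-averaged F1 from gold and predicted labels.
from collections import Counter

def evaluate(gold, predicted):
    classes = set(gold) | set(predicted)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    f_scores = []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] > 0 else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] > 0 else 0.0
        f_scores.append(2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0)
    accuracy = sum(tp.values()) / len(gold)
    macro_f1 = sum(f_scores) / len(classes)
    return accuracy, macro_f1

print(evaluate(["pos", "neg", "pos", "neg"], ["pos", "pos", "pos", "neg"]))
```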

2.6.2 Hypothesis Tests

When developing and evaluating natural language processing systems, it is important to know whether using one method instead of another improves the results. For this reason, we perform the aforementioned evaluation steps. However, from a


Algorithm 1 Approximate randomization test

Input: y1, y2, ŷ, k
Ensure: |y1| = |y2|
 1: m0 ← eval(y1, ŷ) − eval(y2, ŷ)
 2: s ← 0
 3: for i = 1 to k do
 4:     y1′ ← y1, y2′ ← y2
 5:     for j = 1 to |y1| do
 6:         if Bernoulli(0.5) = 1 then
 7:             swap(y1′[j], y2′[j])
 8:         end if
 9:     end for
10:     m ← eval(y1′, ŷ) − eval(y2′, ŷ)
11:     if m ≥ m0 then
12:         s ← s + 1
13:     end if
14: end for
15: return s/k

statistical point of view, it could still be possible that one method outperformed the other by chance. To check this, we can make use of statistical hypothesis tests. We briefly introduce the general problem based on Yeh (2000).

Assume that we have two systems which produced the outputs y1 and y2, which we evaluate according to some evaluation function eval (such as accuracy or F1). We make the null hypothesis that there is no difference between the outputs of eval for y1 and y2. We then estimate the probability of the observed difference between y1 and y2 given the null hypothesis. If this probability is lower than a threshold – the confidence level – we reject the null hypothesis and say that the difference between the results is statistically significant. There is a large variety of hypothesis tests available for the previously introduced evaluation measures. As noted by Yeh(2000), many of these tests assume statis- tical independence between the two systems. This assumption is mostly wrong in practice as many systems share components, for example by using the same core feature sets. Such tests underestimate statistical significance and therefore tend to

yield false negative results. In this thesis, we adopt the approximate randomization test (Noreen, 1989; Yeh, 2000). The underlying idea is to test the null hypothesis through simulation. If there is indeed no difference between the two systems, the results they produce should be interchangeable. We can test this by permuting the predictions of the systems between each other. In total, this would lead to 2^n permutations for n examples. For each permutation, we check whether the difference between the systems according to our evaluation measure is at least as large as the one for the unpermuted systems. The relative frequency of how often this occurs tells us how likely it is that we observed an improvement by chance. Since in practice we deal with large datasets, we generally cannot perform 2^n permutations. For this reason, we approximate the result by generating a sufficiently large number of permutations randomly. The approximate randomization algorithm is shown in Algorithm 1. It has a free parameter k which determines the number of random samples. In addition, we need to specify an evaluation measure eval. Each sample is generated by randomly permuting the predictions for each test example between the two systems. In each iteration, we test whether the difference m between the two results is at least as large as the initial difference m0. The probability of accepting the null hypothesis is p = s/k, where s is the number of trials in which m ≥ m0. This test has several advantages. First, it does not make the independence assumption described above. Second, it is applicable to any evaluation measure and does not need to be derived again if the measure is changed. On the other hand, as the name implies, the test is not exact but instead iteratively tests permutations of the results. The result becomes more reliable with increasing k. For this reason, it is more expensive to compute than an exact test. Unless stated otherwise, we run the test with 10,000 iterations and use a confidence level of p = 0.05.
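A direct transcription of Algorithm 1 into Python might look as follows; the evaluation function, its argument order, and the toy labels are illustrative assumptions.

```python
# Sketch of the approximate randomization test (Algorithm 1); eval_fn plays
# the role of the evaluation measure eval, e.g. accuracy or F1.
import random

def approximate_randomization(y1, y2, gold, eval_fn, k=10000, seed=1):
    assert len(y1) == len(y2)
    rng = random.Random(seed)
    m0 = eval_fn(y1, gold) - eval_fn(y2, gold)
    s = 0
    for _ in range(k):
        p1, p2 = list(y1), list(y2)
        for j in range(len(p1)):
            if rng.random() < 0.5:          # Bernoulli(0.5): swap the predictions
                p1[j], p2[j] = p2[j], p1[j]
        m = eval_fn(p1, gold) - eval_fn(p2, gold)
        if m >= m0:
            s += 1
    return s / k                            # estimated p-value

def accuracy(pred, gold):
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

gold = ["pos", "neg", "pos", "neg"]
print(approximate_randomization(["pos", "neg", "pos", "pos"],
                                ["neg", "neg", "pos", "pos"], gold, accuracy, k=1000))
```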

2.6.3 Agreement Measures

Annotation of linguistic properties by humans is an essential foundation of computational linguistics. Both supervised training and evaluation of NLP methods

53 2 Statistical Natural Language Processing require human annotations. Consequentially, assessing the quality of such annota- tions received much attention. In order to check whether human annotators work in a consistent and reproducible manner, one can directly compare the work of two or more annotators using an agreement measure. The following introduction is based on the surveys by Artstein and Poesio(2008) and Gwet(2008). Defining an agreement measure is not straightforward. For example, we could simply measure the relative frequency of times the annotators agree. This measure would, however, not take into account the fact that the distribution of answers is usually skewed. This is why measures of pure observed agreement do not yield a realistic estimate. To this end, it has been proposed to correct the observed agreement probability for chance agreement, which lead to the kappa measure (κ):

κ = (P_a − P_e) / (1 − P_e)

P_a is the probability of observed agreement, and P_e the probability of agreement by chance. There is no general consensus regarding how chance agreement should be calculated, which is why there are many versions of κ. We measure agreement with Fleiss's κ. It has the advantage that it can be defined so that it is applicable to an arbitrary number of annotators. Assume N examples which can be annotated with K categories. Let A be the total number of annotators. Let a_nk be the number of annotators who annotated example n with category k. Then, the observed agreement P_a can be computed as follows for an arbitrary number of annotators:

P_a = (1/N) Σ_{n=1}^{N} [ (1 / binom(A, 2)) Σ_{k=1}^{K} binom(a_nk, 2) ]

Essentially, we calculate the agreement for each example n as the number of pairwise agreeing annotators, binom(a_nk, 2), normalized over all possible pairings. We then take the mean over all examples. For Fleiss's κ, chance agreement is defined based on a single distribution over all annotators. The probability for each category k is simply

p_k = (1 / (A N)) Σ_{n=1}^{N} a_nk.


Chance agreement P_e can then be calculated as

P_e = Σ_{k=1}^{K} p_k².
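The following sketch computes Fleiss's κ from a count matrix as defined above; the toy annotation counts are illustrative.

```python
# Sketch of Fleiss's kappa from a count matrix a with a[n][k] = number of
# annotators who assigned category k to example n (A annotators per example).
from math import comb

def fleiss_kappa(a, A):
    N, K = len(a), len(a[0])
    # Observed agreement P_a: pairwise agreements per example, averaged.
    p_a = sum(sum(comb(a_nk, 2) for a_nk in row) / comb(A, 2) for row in a) / N
    # Chance agreement P_e from the pooled category distribution.
    p = [sum(row[k] for row in a) / (A * N) for k in range(K)]
    p_e = sum(pk ** 2 for pk in p)
    return (p_a - p_e) / (1 - p_e)

# Toy example: 3 annotators, 4 examples, 2 categories.
print(fleiss_kappa([[3, 0], [2, 1], [0, 3], [1, 2]], A=3))
```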

2.7 Summary

In this chapter, we motivated and introduced statistical natural language pro- cessing. We showed how language can be represented for statistical processing and introduced a set of learning strategies. We presented several statistical models that are frequently applied in NLP. Finally, we described the means of evaluation we employ in this thesis.


3 Sentiment Analysis

Sentiment Analysis is an area of natural language processing that is concerned with capturing attitudes expressed in linguistic expressions. Sentiment describes an opinion or attitude expressed by an individual, the opinion holder, about an entity, the target. Attitudes – “relatively enduring, affectively colored beliefs, preferences, and predispositions towards objects or persons” (Scherer, 2003) – are different from emotions – “brief episodes of synchronized responses” (Scherer, 2003) as a reaction to external influences. This distinguishes sentiment analysis from other problems such as emotion analysis where the general emotional state (influenced by various external factors) is of interest, and not the attitude towards a specific target. The degree and direction of sentiment (i.e., how positive or negative it is) is called its polarity. Research on the emotional potential of linguistic expressions has a long history in linguistics. An important related concept that has been introduced early is connotation, the additional emotional or cultural meaning of an expression (cf. Linke et al., 1994). Connotation is an important linguistic foundation for expressing sentiment. Computational approaches to sentiment emerged in the 1990s. It has been recognized early that identifying the author’s attitude towards a topic is interesting, for example in information retrieval (Hearst, 1992). Research on fully automatic sentiment analysis goes back to Spertus(1997) who identifies hostile messages on the web with a combination of rules and a lexicon. The breakthrough of statistical models for large-scale sentiment analysis was introduced by Pang et al.(2002) who showed that user-supplied polarity ratings of product reviews can be predicted automatically with high accuracy through supervised classification. Following this research, numerous sentiment analysis approaches on a broad range of linguistic levels have been presented. These range from low-level units like words and phrases (Turney, 2002), to sentences (Pang and Lee, 2005) to fine-grained document analysis (Hu and Liu, 2004). Often, sentiment analysis is supported by large vocabulary lexical resources such as polarity lexicons (Wilson et al., 2005b). Recently, contextual properties have become increasingly important, reflecting the

fact that word senses (Wiebe and Mihalcea, 2006), domains (Mullen and Collier, 2004), and topics (Blitzer et al., 2007) influence the way sentiment is expressed. An important research focus in sentiment analysis is subjectivity analysis (Wiebe, 1990), where the objective is to automatically determine whether a statement is subjective (i.e., reflecting an individual's inner state of mind) or objective (i.e., verifiable in reality). Sentiment analysis also plays a role in practical applications and downstream tasks. Sentiment analysis systems may be components of information retrieval (Hearst, 1992) systems, or may be used to summarize the vast amount of opinionated content that has become available (e.g., Hu and Liu, 2004). Amazon.com, for example, recently started to present summarizing quotes from reviews, including a quantitative estimation of how often similar statements were found in other reviews. Sentiment analysis also has a longstanding history in automatic stock market analysis (e.g., Baker and Wurgler, 2007; He et al., 2013), where the automatic analysis of investor and customer sentiment can be used to predict stock price developments. The rest of this chapter is structured as follows. We will first introduce basic concepts of sentiment analysis such as polarity and subjectivity in Section 3.1. In Section 3.2, we present approaches to the automatic classification and detection of sentiment on different linguistic levels. Finally, we discuss possible feature representations for sentiment analysis in Section 3.3.

3.1 Concepts in Sentiment Analysis

3.1.1 Polarity

The direction and strength of sentiment is called polarity. The simplest and most common polarity scheme assumes two categories, positive and negative. These two categories constitute the extreme ends of a discrete or continuous scale. This defi- nition covers most voting schemes used in practice, such as

• thumbs up/down (e.g., Facebook, YouTube),

• positive, neutral, negative (e.g., eBay), or


• star ratings (e.g., Amazon, IMDb).

Often, polarity is mapped to the [−1, 1] interval, assuming that −1 is the most negative polarity possible, and 1 the most positive one. There is some ambiguity regarding the center of the scale (0), which is commonly described as neutral. Note however that this can also mean a more or less balanced mix of positive and negative content (Koppel and Schler, 2006). It has been recognized that this data is difficult to assess even for humans (Kim and Hovy, 2004), which is why data from this category is sometimes omitted from experiments to simplify the problem (e.g., Blitzer et al., 2007). Recently, more complex polarity problems, bordering emotion analysis, have been investigated. Socher et al.(2011), for example, propose the use of the scheme from the experience project3 which collects user-submitted stories. Here, readers can vote for one of six tags – you rock, tehee, I understand, sorry, hugs, and wow, just wow, which cover a broader emotional spectrum. The result of these votings is a distribution over tags, which is significantly more difficult to model than previous schemes that simply require the prediction of discrete classes. In this thesis, we are concerned with binary polarity prediction problems, thus focusing only on the two extreme points (positive and negative) of the polarity scale.

Prior and Contextual Polarity

Naturally, clues may be ambiguous, leading to different polarities in different con- texts. The polarity of a clue out of context is called the prior polarity, the polarity in context is the contextual polarity (Wilson et al., 2005b). Contextual effects may arise for different reasons. Some clues are perceived differently depending on the target. For example, funny can express positive sentiment when used in the movie domain (funny story) or negative sentiment when used to describe food (funny taste). Sometimes, one can observe cultural effects as well, such as warm beer which triggers different reactions in different communities. The automatic recog- nition of contextual polarity has been attempted using various linguistic context features (Wilson et al., 2005b). However, it remains a challenge as domain and

3http://www.experienceproject.com/, which has since abolished this voting scheme

topic effects have a strong influence. Thus, in this thesis, we will investigate only prior polarity.

Polarity Clues and Polarity Lexicons

The most frequently investigated basic units in sentiment analysis are words. Commonly, each word or multi-word unit is viewed as having a polarity which contributes to the sentiment of higher-order units. Interestingly, at the word level, sentiment analysis touches related fields such as emotion analysis, because the notion of expressing sentiment towards a target vanishes. This distinction is often ignored if the target is known. In this case, words and phrases are treated as clues for the expressed sentiment. We will refer to this approach as the clue model. Polarity lexicons are an important resource in sentiment analysis. A polarity lexicon is a collection of clue words or phrases annotated with polarity. Often, they contain additional information such as part of speech (e.g., Wilson et al., 2005b), word senses (e.g., Su and Markert, 2008), or domain information (e.g., Choi and Cardie, 2009). There are not many polarity lexicons available that have been fully manually annotated. Thus, much research has focused on automatically generating or extending lexicons by exploiting the semantic relatedness of clues (Scheible et al., 2010; Hassan and Radev, 2010). Polarity lexicons may be used as a standalone resource or in combination with a statistical classifier, which is, however, not a trivial task; this problem is one of the main research questions addressed in this thesis.

3.1.2 Subjectivity

It has been recognized in philosophy that there is a distinction between subjective impressions of an individual and objective reality (Solomon, 1977). This distinction manifests itself in human communication as well, leading to subjective and objective language (Banfield, 1982). Statements can be subjective or objective depending on their verifiability (cf. Liu, 2012, Section 2.4):

Subjective A subjective statement expresses the inner state of mind of an individual. It is not directly verifiable. An example of a subjective statement is


the following sentence:4

(3.1) One of the greatest romantic comedies of the past decade.

This sentence expresses the viewer’s subjective view of the movie, particularly due to the subjective expression greatest.

Objective An objective statement is factual. It describes observable reality and it is directly verifiable by individuals other than the one who made the state- ment. The following sentence is considered objective:

(3.2) The movie won a Golden Globe for best foreign film and an Oscar.

It is objectively verifiable that the movie in question has indeed won said awards.

The automatic classification of expressions into these categories is called subjec- tivity analysis. It has been recognized that subjectivity is not an indicator for whether an expres- sion bears sentiment or not (cf. Liu, 2010). Consider Example 3.3, which contains a factual statement.

(3.3) I have to admit I walked out of Runteldat.

The truth of this statement can be verified in reality (e.g., by observers who saw the author walk out), thus it is objective. Yet, the sentence expresses negative sentiment towards the movie that it describes as the author walked out of the movie because he did not enjoy it. This example illustrates that subjectivity and polarity are orthogonal. This notion is reflected in several annotation studies (e.g., Wiebe et al., 2005; Su and Markert, 2008) where the two concepts were treated accordingly Nevertheless, subjectivity has been used as a criterion of how interesting an expression is for sentiment analysis. Pang and Lee(2005) for example propose subjectivity-based selection for summarizing reviews. Many researchers (e.g., Wil- son et al., 2005a; Lin and He, 2009; Heerschop et al., 2011) use subjectivity analysis as a pre-processing step to filtering data for sentiment classification.

4All following examples are taken from the Pang & Lee sentence dataset.


Note also that the definition of subjectivity given above is not universally ac- cepted. While we adopt the classical definition from philosophy as described above, many related concepts are often presented under the guise of subjectivity. Most notably, the approach by Pang and Lee(2005) uses proxy data in place of any for- mal definition of subjectivity, showing that practical concerns are often of higher importance than formal correctness.

3.2 Automatic Sentiment Analysis

Automatic sentiment analysis is concerned with predicting the polarity of unseen data. The term covers a broad range of problems and approaches. In this section, we will discuss three general types of sentiment analysis problems. There is a highly diverse set of methods tackling these problems which cannot easily be summarized briefly. We will introduce the relevant methodology in more detail in the respective sections. In general, the methods can be divided into two major classes: rule-based and statistical approaches. In this thesis, we focus mainly on statistical approaches. In today's research, many different views on automatic sentiment analysis exist, leading to different tasks. The most prominent difference between them is the granularity of analysis. Sentiment analysis has been performed on multiple linguistic levels. The dominant tasks are the prediction of polarity on the document, sentence/clause, and entity/aspect level. In this section, we will introduce these levels based on the overview by Liu (2010). In the experimental part of the thesis, we work on either the document or the sentence level.

3.2.1 Document Level

The goal of document-level sentiment analysis is to predict the overall polarity expressed in a document. Typically, the documents on which this type of analysis is performed are ones in which the author evaluates only a single entity, such as reviews of products, hotels, or movies. Figure 3.1 shows three example documents. Figure 3.1a is a product review from Amazon.com in which a user/customer reviews a book. In addition to the

review text, the user supplies a rating on a categorical polarity scale, in this case spanning 1–5 stars. Figure 3.1b shows a hotel review from TripAdvisor, a travel website. Again, we see a user-supplied review as well as an overall rating which is supplemented by several ratings of certain aspects of the stay, such as the location of the hotel. The automatic prediction of aspects would in this case be a document-level prediction task (as addressed for example by Titov and McDonald, 2008). It is also possible to view it as an entity-level task (cf. Section 3.2.3). Figure 3.1c contains a movie review from the Internet Movie Database (IMDb). Again, the reviewer can rate the movie on a pre-defined scale of 1–10 stars. In comparison, we note that this review is much longer than the other two. The average document length may vary between domains (Blitzer et al., 2007). The task of predicting document-level polarity can be cast as a standard text classification problem. The problem can then be addressed using well-established techniques, such as maximum entropy classification (Pang et al., 2002) using word-based feature representations (cf. Section 3.3). There are several assumptions involved in the text classification approach. First, it is assumed that the whole text is concerned with a single target, namely the product that is the subject of the review. Second, the author is assumed to be the opinion holder. Third, the clue features are assumed to be noisy but overall predictive of the document-level polarity. These assumptions might not hold in general for arbitrary statements. In news texts, for example, opinions by different holders about various targets can be re-stated by the news reporter. There are also exceptions in seemingly unproblematic genres such as product reviews. In the following example,5 the author makes a comparison to a different product, which might be wrongly recognized as negative sentiment towards the reviewed product:

(3.4) You can get very cheap ones (around $5) that look like EON’s, but they usually say it lasts 5 years, with 1-hour of continuous use. That definitely looks bad in comparison to this one, which claims a 168 hour use.

As documents are typically long enough, violations of the three assumptions above may be disregarded as noise. Instead, we rely on the sufficient availability of un- problematic clues. Thus, the approach often fails for short documents which might

5Taken from http://www.amazon.com/product-reviews/B000095RB7


(a) Example review from Amazon.com (http://www.amazon.com/review/R3HA8K97VUEYEW/)

(b) Example review from tripadvisor.com (http://www.tripadvisor.com/ShowUserReviews-g187291- d232577-r50749804-) Figure 3.1: Example reviews from different web services


(c) Example review from imdb.com (http://www.imdb.com/title/tt0286716/reviews)

be less rich in clues. For example, consider this (complete) document:6

(3.5) This thermometer lasted about a month before going in to permanent "Hi" mode. What a ripoff!

The only direct clue in this document is ripoff which, in a supervised approach, must occur in the training data in order to correctly classify the example.

Supervision through User-Supplied Ratings A strategy frequently employed in document-level sentiment analysis is to use the user-supplied ratings for supervi- sion. Often, the ratings are first mapped to a common simplified scale (e.g., 5 stars to binary). Using the user’s rating as the label of a review is an appealing idea as it makes annotated data available in vast amounts and thus eliminates any need for manual annotation. It has been argued, however, that using product ratings as labels for the accompanying texts constitutes a form of noisy labeling. Ganu et al.(2009), for example, show that star ratings are far from being perfectly corre- lated with sentiment annotated by readers, and that manually annotated reviews produce better recommendations. There are many possible causes to this effect. First, there are usually no stan- dards to which users have to adhere. Some online reviewing systems associate star ratings with verbal assessments (e.g., very good, very bad, great or disappointing), but usually, there are no detailed guidelines that give specific instructions for the rating process. Second, some readers see the review text as complementary to the rating, and thus, the rating may be inconsistent with the views expressed in the text. These results and observations show that although sentiment classification is often described as supervised, caution must be taken with viewing star ratings as a gold standard. Overall, the document-level approach has been found to be too coarse for many practical applications. Thus, finer-grained analysis levels have been proposed, which we review next.

6Example taken from the Multi-Domain dataset, kitchen domain.


3.2.2 Sentence/Clause Level

Documents often exhibit a clear overall polarity (likely to be expressed in the concluding portion, cf. Pang and Lee, 2004). Smaller units such as sentences or clauses are significantly harder to analyze. The task of predicting the polarity of a sentence is again a problem of text classification, where the text, represented through word-based features, is the input to a statistical model. Currently, the most successful clue-based approach uses a combination of Naive Bayes with a Support Vector Machine (Wang and Manning, 2012). As described previously, in this setup, sentiment analysis relies on the availability of clue words. However, due to the significantly shorter length of sentences, not every sentence may actually contain a useful clue. Consider the negative sentence7

(3.6) Might best be enjoyed as a daytime soaper.

where the reader has to understand that the movie in question is similar to a daytime soap, which is a somewhat derogatory term for a TV show of low production quality made for casual entertainment. In addition, the sentence contains the word best, which has the potential to be mistaken as positive in the clue model. Many sentences also require the resolution of hedges and conditionals, such as

(3.7) Fans of Plympton’s shorts may marginally enjoy the film.

(3.8) This book fills a much-needed gap.8

Sentences may also be ambiguous when read out of context, or may contain expres- sions of mixed sentiment. An early formal model for coping with complex polarity structure was introduced by Polanyi and Zaenen(2006). In their model, polarity- bearing clues are modified through so-called polarity shifters, being intensified, diminished, or reversed. For example, marginally in sentence 3.7 diminishes the polarity of enjoy. Note that while the shifter model works well for sentence 3.7, sentence 3.8 is more difficult as gap is an uncommon clue, and the shifters much- needed and fills are highly specific to that clue (i.e., they would not be shifters in other contexts).

7Example taken from the Pang & Lee sentence dataset. 8A quote attributed to Moses Hadas.



Figure 3.1: Compositional analysis of the sentence "Not a bad journey at all.". Each node in the tree has positive (+), neutral (0), or negative (−) polarity. The polarity of leaf nodes may be lexical while higher-order nodes receive their polarity by composition of their children. Thus, "bad" has negative polarity, and the phrase "not a bad journey" has positive polarity.


To generalize this concept, it has been suggested that sentence sentiment follows a hierarchical compositional structure, formalized for example through constituency (Fahrni and Klenner, 2008) or dependency trees (Nakagawa et al., 2010). In the resulting tree, each node is assigned a polarity; the overall sentence polarity can be read at the root node. Non-terminal polarities are the result of shifter-like operations recursively carried out at each node. An example analysis in the constituency framework can be found in Figure 3.1. Such structures may be generated with a manually defined grammar (e.g., Fahrni and Klenner, 2008) or learned automatically, either in an unsupervised fashion (e.g., Socher et al., 2011) or from annotated example trees (Socher et al., 2013).

3.2.3 Entity/Aspect Level

Liu defines an entity as any object about which sentiment is expressed. An entity thus constitutes a type of target as introduced previously. An aspect is any property or part of the target, for example the lens of a camera. An example for aspects can be found in Figure 3.1b where the user rated various aspects of his stay at the hotel, such as location and service. We discuss the notions of entities and aspects only briefly as they are beyond the scope of this thesis. Sentiment analysis of entities or aspects does not focus on a single linguistic unit for analysis. Instead, all information about the entity is collected and used for making a prediction. This task has been formalized as fine-grained sentiment analysis (e.g., Yang and Cardie, 2013) where the relations between opinions, targets, and sometimes opinion holders have to be recognized. In contrast to the problems discussed above, this task involves structured prediction comparable to other NLP problems such as semantic role labeling. Liu argues that analysis on the entity level is superior to considering individual units (such as single phrases or sentences) as knowledge about the target is necessary for resolving ambiguities. Conversely, entity-level sentiment analysis requires holder and target detection, which leads to a significantly more complicated machine learning task.


3.3 Feature Representations

Word-based clue representations are the most important features in sentiment analysis. Despite the fact that sentiment exhibits complicated syntactic properties, bag-of-words representations are sufficient for high-accuracy classification: For in-domain classification of binary polarity, accuracies of up to approximately 90% for documents and between 80 and 85% for sentences have been achieved using only word features (Wang and Manning, 2012). Note that the results may vary depending on the domain and the environment (e.g., news vs. web data). Such representations are also appealing due to the fact that low-level linguistic entities (e.g., words or phrases) can express sentiment and may thus be classified the same way as higher-level entities (e.g., documents). This enables joint models of sentiment on different levels. For example, when using bag-of-words features, we can exploit this fact by labeling the features with sentiment directly (e.g., through a polarity lexicon). This feature knowledge could then be incorporated into a statistical model, for example as a prior in a Bayesian model (e.g., Melville et al., 2009; Settles, 2011). Using clues to label larger example types constitutes a form of distant supervision where the polarity lexicon is the supervision database. While some investigations into linguistic features for document classification have been conducted (Gamon, 2004; Ng et al., 2006), the common result so far has been that most intuitively helpful linguistic features such as syntactic relations do not increase the quality of sentiment analysis in practice. In contrast, syntactic kernels have yielded promising improvements for sentence classification (Matsumoto et al., 2005). The intuition behind such models is that shifters can be modeled with the syntactic structure. However, such approaches rely on clean syntactic analysis which is not easy on web data (Foster et al., 2011).
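To illustrate the word-based clue representation, the following sketch classifies documents with binary bag-of-words features and a logistic regression classifier using scikit-learn; the library choice, the toy documents, and the default hyperparameters are illustrative assumptions and do not reflect the experimental setups in this thesis.

```python
# Sketch: bag-of-words features plus a linear classifier for document polarity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_docs = ["a wonderful, touching movie", "dull and predictable plot"]
train_labels = ["positive", "negative"]

clf = make_pipeline(CountVectorizer(binary=True), LogisticRegression())
clf.fit(train_docs, train_labels)
print(clf.predict(["a touching plot"]))
```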

3.4 Summary

In this chapter, we gave an introduction to sentiment analysis. We first described concepts that relate to sentiment. Next, we introduced automatic sentiment analysis on the document, sentence, and entity levels. Finally, we discussed feature representations for sentiment analysis.

4 Sentiment Classification with Active Learning

4.1 Introduction^9

The predominant approach to document-level sentiment classification is supervised learning, which depends on annotated training examples. Unfortunately, annotating data is an expensive and time-consuming process which traditionally involves paid labor of domain experts – humans who (i) understand the task at hand, (ii) speak the language in which the data is written, (iii) are capable of using an appropriate user interface for annotation, and (iv) work according to some predefined standards. In sentiment analysis, it is possible to exploit user-supplied ratings for supervision. It has been argued, however, that this is not always an advisable strategy (Ganu et al., 2009), as criterion (iv) is usually violated in online reviewing systems – there are simply no guidelines. Unfortunately, guidelines are an important tool for achieving high annotation quality (Pustejovsky and Stubbs, 2012). Thus, annotating polarity for sentiment analysis in a more rigorous environment would be generally desirable.

The way training data is selected has a strong influence on the performance of a supervised system. Supervised, clue-based sentiment classifiers are often difficult to improve after a certain point as the clues occurring in documents are highly redundant. However, there are also many documents which contain only unseen clues – for example very short documents – on which the classifier fails to make correct predictions. To address this problem, it would be helpful to train the model on a training set that offers high coverage of the test data. This motivates the use of active learning, a technique where the classifier has a direct influence on the process of selecting data for annotation. As data annotation is expensive, we are interested in a cost-efficient solution.

^9 This chapter describes joint work on active learning with Florian Laws. The author of this thesis carried out the sentiment analysis experiments.


There are two parameters that influence the overall cost of a trained system. The first is the individual cost of a training example, i.e., the amount of money we have to spend to get one example annotated. The second is the number of examples we need to annotate. Lowering either of these will result in a lower overall cost. Of course, we want to lower overall cost without negatively affecting the system's accuracy.

To lower the individual cost, we can resort to non-expert annotators. Recently, various web services that serve as marketplaces for tasks have emerged. One such service is Amazon Mechanical Turk (MTurk), a web platform for offering small tasks to workers. MTurk workers are usually willing to annotate at very competitive price levels, going as low as $0.01 per example. Regarding the criteria for experts stated above, we find that on MTurk, (i) is less critical for sentiment analysis of documents as humans are natural experts at understanding sentiment; assessing document-level sentiment is thus a straightforward task. (ii) may be a problem depending on the language in question as MTurk workers are mostly English-speaking. We will thus conduct experiments on English data. Note, however, that we have little control over the actual skills of MTurk annotators. (iii) depends on the task. We will focus on document annotation, where a simple binary decision has to be made. Other tasks, particularly those involving relations of entities (e.g., fine-grained sentiment analysis), may not be as easy to accomplish on MTurk. (iv) may again be problematic for complex tasks, as long annotation guidelines may be ignored by workers.

Lowering the number of examples is not as straightforward. Simply using fewer training examples will most likely lead to lower system accuracy. As mentioned above, to get good coverage of clues, it would be an advantage to pick the most informative examples from our data and eliminate all redundancies. In general, however, finding a subset of the data that fits this criterion is difficult. Often, researchers apply heuristics to select a minimal subset of interesting examples; however, such a selection could over-represent some phenomena and under-represent others that the researcher missed. This is particularly problematic in the clue model as it is hard for humans to guess which clues would be helpful to the system. To eliminate direct human influence, it has been proposed to automate the selection process. We could let the system select

informative examples for us by using some measure of confidence. We would then get these examples annotated by humans and retrain the system on the new training set, after which new unlabeled examples could be re-rated by the classifier. This process is an instance of active learning (AL). As the system suggests informative examples, we would expect a better performance level with fewer annotated examples than if we selected examples randomly.

While both active learning and crowdsourcing are popular techniques in NLP, there is little research on whether the two work in combination. Combining them, however, is promising as we would expect large cost reductions since the two techniques are orthogonal. Most crucially, annotators on MTurk are in general not domain experts. Therefore, they produce noisy labels, which can be a problem for active learning. We believe, however, that they should be able to perform a review annotation task with little difficulty. Nevertheless, we can never be sure that all workers carry out their work responsibly, particularly with spam being a frequently reported issue on MTurk. For this reason, we will evaluate annotation quality and investigate ways to improve it.

We have another interest in document-level sentiment analysis besides obtaining high-quality sentiment labels. Statistical sentiment classifiers using clue features essentially learn the importance of each clue for a class during training. Active learning influences the way these clues are learned since the classifier should pick examples that help to disambiguate clues. We would like to analyze this behavior and how the size and sample of the training data influence it.

The rest of the chapter is structured as follows. First, we introduce the methods used in our experiments in Section 4.2. In Section 4.3, we describe our experiments and discuss the results. Finally, we give an overview of related work in Section 4.4.

4.2 Methods

4.2.1 Crowdsourcing with Amazon Mechanical Turk

Amazon Mechanical Turk (MTurk)^10 is a web service for online work. Requesters can post bulk tasks which are then offered to workers. It is required that the

^10 http://www.mturk.com/

tasks can be broken down into small, repetitive instances, called HITs (Human Intelligence Tasks). For example, part-of-speech annotation of a corpus can be broken down into individual sentences, each of which constitutes a HIT. These are then offered to workers, where the number of workers a_i that work on the same HIT x_i is a parameter the requester has to specify. Workers can decide for each HIT whether or not they want to work on it. This process is often referred to as crowdsourcing as the task at hand is outsourced to a crowd. In practice, the set of workers may change quickly, with some workers annotating only a few HITs and others annotating large quantities. We will investigate this in more detail in our experiments. Of course, this fluctuation may mean that we will get many annotations by completely untrained workers who have no experience with our task. For this reason, we will establish a means of quality control.

There are two different models of how requesters interface with MTurk. In batch mode, the requester designs an HTML template with a number of variables which are later instantiated by the individual HITs that the requester uploads to MTurk as a table. This setup is straightforward to use; however, it is also relatively static as all aforementioned parameters have to be set before the experiment even begins.

Notably, all a_i must be equal. The template, and thus the number of variables, is fixed and cannot be customized for each example. Further, we have no control over the order in which MTurk offers the HITs to the workers. These points are problematic for an active learning experiment as we want to avoid selecting data beforehand. External questions are the alternative to this setup. Here, the requester sets up an HTTP server which is then registered with MTurk. After posting HITs, the server returns an HTML page for a HIT once it is requested. MTurk includes both a HIT ID and a worker ID in the request. The HTML page for each HIT can thus be created dynamically during the course of the experiment. In particular, the requester does not have to specify a list of examples beforehand and instead has control over which worker gets which example. This setup is much more flexible but also involves significantly more work for the requester. A minimal sketch of such an external-question front end is shown below.
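To make the external-question flow concrete, the following Python sketch uses only the standard library. It is an illustration, not the system used in our experiments: select_next_example and render_hit are hypothetical placeholders, and we only assume that MTurk passes identifiers such as workerId as query parameters, as described above.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def select_next_example(worker_id):
    # Hypothetical back-end call: return the currently most uncertain example
    # that this worker has not yet annotated.
    return {"doc_id": 0, "text": "example review text"}

def render_hit(example):
    # Hypothetical helper: build the HTML annotation form for one HIT.
    return "<html><body><p>%s</p></body></html>" % example["text"]

class HitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        worker_id = params.get("workerId", ["unknown"])[0]
        page = render_hit(select_next_example(worker_id))
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(page.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("", 8080), HitHandler).serve_forever()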

74 4.2 Methods

There are several parameters that determine the cost of an experiment on MTurk:

• The number of HITs n we post in total, i.e., the number of examples we get annotated.

• The number of workers a_i we require to work on a HIT x_i, i.e., the number of annotations we get for each example.

• The reward r paid to each worker for a HIT.

The amount we pay for an experiment is the budget

B = \sum_{i=1}^{n} a_i \cdot r \cdot m, \qquad (4.1)

which can be simplified to B = n \cdot a \cdot r \cdot m in the batch case. The parameter m = 1 + c introduces c = 0.1, the commission paid to Amazon, into the equation. For example, if we have n = 10 HITs for which we would like a = 3 workers each at r = $0.01 per HIT, we would spend a total of B = 10 · 3 · $0.01 · 1.1 = $0.33, of which $0.30 are paid to workers and $0.03 to Amazon.

Quality control is important when annotating data, and even more so if we outsource this task to anonymous workers. MTurk offers basic capabilities for quality control. Before the experiment begins, a requester may put several restrictions on the experiment in place. Each worker has an approval rate, which is the ratio of their submitted HITs that have been accepted (see below). Requesters can set a minimum approval rate threshold on their HITs to exclude bad workers. Additionally, it may be required that the workers be from a specific country (e.g., only workers from the USA), or that the workers have a set of qualifications which can be obtained by passing tests set up by requesters. Naturally, these restrictions will reduce the number of workers who approach a task. After the experiment, requesters can reject submissions with inferior quality, influencing the workers' approval ratings. However, this option must be used carefully as wrongfully rejecting many workers may result in a bad reputation for the requester. This system seems reliable at first; however, due to the anonymity of workers, it may be exploited by creating multiple accounts or by pushing one's approval rate by working on self-submitted HITs.^11 Recently, MTurk has introduced Masters, a subset of workers that Amazon considers highly

^11 http://www.behind-the-enemy-lines.com/2010/10/be-top-mechanical-turk-worker-you-need.html

qualified according to their accuracy on certain image and text processing tasks. It is, however, controversial whether Masters perform better than the general set of workers (Kandasamy et al., 2012; Difallah et al., 2013). At the time of our experiments, Masters had not been introduced.

It has also been noted that spam is an increasing problem on Mechanical Turk (Ipeirotis et al., 2010). Some spammers even use automated means or work in groups to circumvent countermeasures by requesters. This emphasizes the need for good quality-control measures on MTurk.

The optimal reward a requester should pay for a HIT is difficult to determine on MTurk. If the reward is too low, workers might be unwilling to participate. If it is too high, the cost advantage of using MTurk might disappear. Fort et al. (2011) note that in several experiments, rewards were not correlated with annotation quality. While better compensation may motivate workers to perform better, HITs with high rewards also tend to attract spammers.

4.2.2 Annotation System

An important requirement for an active learning implementation is that it enables fast annotation/re-training cycles to minimize wait times for annotators and staleness. We employ the annotation system by Laws (2013), which is capable of performing annotation and classification concurrently. The system is strictly separated into a front end and a back end.

The system's front end interfaces with MTurk workers through external questions. If a worker requests a HIT, the system generates an HTML page based on which annotation is required next and serves it through HTTP.

The back end of the system manages annotations and controls the classifier process. It offers a choice of different AL strategies, including uncertainty selection from a pool, which we use for our experiments. In the pool-based strategy, the back end keeps the unlabeled data in a queue which is sorted decreasingly by the uncertainty of the examples. If an example is needed for annotation, it is taken from the head of the queue and passed to the front end for display. Once an example has been annotated, it is removed from the queue and added to the training set. The classifier is re-trained as often as possible and necessary.

Figure 4.1: Example HIT as displayed on MTurk. Workers have to type out the sentiment label as well as a randomly selected word from the review. The random word is marked bold in the review text.

Generally, there is no guarantee that the system is done with re-training before the next example arrives. The queue can be reordered concurrently by applying the classifier to the data and re-calculating uncertainties. After the re-ordering of the examples in the queue, we have selected a new batch of examples. The queue strategy can thus be viewed as a batch strategy where we intend to keep batches as small as possible.

4.2.3 User Interface

We created a user interface for sentiment annotation, shown in Figure 4.1. The top part of the form contains a brief task description, followed by the example selected for annotation and the input area. We present one document per HIT. Underneath the input area are more detailed annotation guidelines. They contain a short description for each label, along with examples. For our initial experiment, we implemented the label selection with radio buttons, which proved to be a weak point that could easily be exploited by spammers. We therefore switched to a modified user interface, which we describe next.

4.2.4 Quality Control

Bad workers and spammers are a massive issue on MTurk. There are many individuals who try to game the system by submitting bad HITs, while others simply do not understand the task or the language well enough. We are therefore interested in taking measures to ensure high annotation quality.

Adaptive Voting

On MTurk, we are dealing with noisy labels from untrained annotators. In such situations, it may be harmful to rely on the decision of a single annotator. In cases where each annotator assigns a single label to an example, it is natural to take a committee approach. This means that we would obtain multiple competing labels for each example based on which the final label is selected through voting (each annotator votes for a label). This method has two parameters: the number of annotations we obtain, and the type of majority we require to accept a label


(e.g., more than 50% of annotators choosing the label). The system by Laws (2013) offers a generalization of this approach, called adaptive voting. Here, we iteratively request labels from different annotators until acceptance or rejection criteria are reached. These criteria depend on the following parameters:

• α, the threshold of agreement

• m, the minimum number of votes

• d, the maximum number of votes

We obtain m ≤ a_i ≤ d labels for each example x_i, referring to this process as d-voting. If the ratio of annotators voting for the most frequent label is greater than or equal to α once a_i ≥ m, we accept the label. We discard the example if we have not reached agreement after d annotations. When using adaptive voting, we have to wait until we reach agreement between the annotators. As proposed by Laws (2013), we re-train the classifier using tentative labels obtained with majority voting even before the voting criteria are fulfilled. When the voted annotation is available, the tentative label is replaced.
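The acceptance logic can be written down in a few lines. The following Python sketch is our own illustration of the rule just described, not the implementation by Laws (2013); the function name and return convention are ours.

from collections import Counter

def adaptive_vote(labels, alpha=0.75, m=2, d=4):
    """Decide whether to accept a label, request another vote, or discard.

    labels: the annotations collected so far for one example.
    Returns ("accept", label), ("request_more", None), or ("discard", None).
    """
    if len(labels) >= m:
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= alpha:
            return "accept", label          # agreement threshold reached
    if len(labels) >= d:
        return "discard", None              # no agreement after d votes
    return "request_more", None             # ask another annotator

# Example: two identical votes already reach alpha = 0.75.
print(adaptive_vote(["positive", "positive"]))   # ('accept', 'positive')
print(adaptive_vote(["positive", "negative"]))   # ('request_more', None)

In the actual system, a tentative majority label is used for re-training until this decision becomes final, as described above.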

Counter-Spam Measures

We carried out preliminary experiments (Section 4.3.2) and found that an interface with radio buttons for the document label (positive or negative) leads to a high number of spam submissions which hurts the classification results. In this setup, we achieve a classification accuracy of only 55%, which is slightly above chance. We re-designed our template to make automated submissions more difficult. Annotators have to type out the label, and, in addition, a randomly selected word from the document, as shown in Figure 4.1.

Additionally, we observed that spammers would often resort to submitting the same answer over and over again to all HITs. As it is very unlikely that the system actually selects examples with the same label repeatedly, we could perform a binomial test to check whether the set of labels submitted by the worker deviates from a uniform distribution. We first collect a sufficiently large number (k) of examples


from an annotator. We then calculate the relative frequency r = c_1 / \sum_i c_i of the most frequent label c_1 submitted by the worker. If, given a threshold t_r, r > t_r over the first k HITs submitted by the worker, the worker is a suspected spammer and will be blocked from working on further HITs. Assuming the true label distribution is indeed uniform, this algorithm will make a false positive decision with a probability of P_{Bin}(X ≥ k · t_r).^12

With these two simple techniques in place, we were able to improve the average individual worker accuracy to around 75%, and the average voted label correctness to around 90%.
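The check and the false-positive probability from footnote 12 can be reproduced with a short Python sketch. This is our own illustration; it uses SciPy's binomial distribution and assumes two equally likely classes under the null hypothesis.

import math
from collections import Counter
from scipy.stats import binom

def is_suspected_spammer(labels, t_r=0.9, k=20):
    # Flag a worker whose first k submitted labels are dominated by one class.
    first_k = labels[:k]
    if len(first_k) < k:
        return False                          # not enough evidence yet
    c1 = Counter(first_k).most_common(1)[0][1]
    return c1 / k > t_r

def false_positive_prob(t_r=0.9, k=20, p=0.5):
    # P_Bin(X >= k * t_r) for a truly uniform binary labeler (footnote 12).
    threshold = math.ceil(k * t_r)
    return binom.sf(threshold - 1, k, p)      # sf(x) = P(X > x)

print(false_positive_prob())                  # ~2e-4, i.e. p <= 0.001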

4.3 Experiments

4.3.1 Experimental Setup

We chose the well-known document-level sentiment classification task of Pang et al. (2002) on the movie corpus (described in more detail in Appendix A.2.1). This dataset contains 1000 positive and 1000 negative movie reviews. The task is to predict binary document-level polarity.

We split this corpus randomly into an active learning pool consisting of 1500 reviews and a test set containing 500 reviews. We use a maximum entropy (MaxEnt) model for classification (Stanford Classifier implementation, cf. Appendix A.1.1). We extract bag-of-words feature representations from the documents. As the dataset is balanced, we evaluate classifier accuracy. We check statistical significance of accuracy improvements with the approximate randomization test (Section 2.6.2), using a significance level of p < 0.05. To find out whether active learning is effective, we compare it to random sampling where we simply select documents randomly. We use the same parameter settings for both experiments.

We set up the experiments as follows. At the beginning, the system starts with a set of two randomly sampled documents, one for each class, which are annotated with gold labels. We put the remaining 1498 documents in the unlabeled pool. These documents are then iteratively selected for annotation through uncertainty selection and random sampling, respectively, and posted as HITs on MTurk. Each

^12 Using our setup of t_r = 0.9 and k = 20, we implicitly conduct a binomial test at p ≤ 0.001.


HIT contains a single document, and we pay a reward of $0.01 per HIT. For adaptive voting, we use an agreement threshold of α = 0.75 and voting parameters m = 2 and d = 4 (4-voting). In addition, we simulate the case where each example receives only a single annotation by selecting the first annotation for each example. This is to reduce the number of experiments we run on MTurk. Of course, the results for this experiment will not be entirely accurate as selection was performed by a classifier using voted labels. The threshold for automatically blocking spammers is set to t_r = 0.9, and the check is performed after the first k = 20 examples. This means that workers who submit the same label more than 90% of the time are considered spammers. We also require all workers to have an approval rating of 100%.

In total, we conducted two runs of each setup (uncertainty selection and random sampling). The respective runs were started two days apart from each other in order to avoid having too many workers participate in both experiments. We always compare runs at equal budgets, i.e., the total reward we pay. As we always pay the same reward for all HITs, we simplify the calculation of the budget to

B = \sum_{i=1}^{n} a_i, \qquad (4.2)

i.e., the sum of the counts of annotations a_i that we obtained for each example x_i.

4.3.2 Results

Annotation Quality

Annotation quality is a critical point in NLP experiments – even more so when using crowdsourcing, since annotation is carried out by non-experts. To evaluate annotation quality, we first define measures for worker performance. Since we have gold annotations available for our data, we can compare them to the workers' annotations directly. We define label accuracy (LA) as the relative frequency of correctly annotated documents according to our gold standard. Worker accuracy (WA) is the label accuracy for the subset of documents annotated by a certain worker. Finally, we also look at classification accuracy (Acc), measuring the proportion of correctly classified documents we achieve with a classifier trained on this data.


Figure 4.2: Active learning vs. Random sampling without counter-spam measures

Classification accuracy of course depends directly on the quality of the annotated training material. Note, however, related to the argument from Section 3.2, that star ratings might not be perfect gold annotations. Thus, labels produced by human annotators may actually be of higher quality than the star ratings that serve as gold data for this dataset. With this argument in mind, we would not expect perfect agreement between annotated and gold labels.

Without counter-spam measures We first run an experiment without our counter-spam measures in place. Figure 4.2 shows the classification results we achieve with this setup. Generally, classification accuracy is low, around 55% for random selection and 50% for active learning. This problem is caused by spam submissions. We encountered two types of spammers.



Figure 4.3: Worker accuracies of all workers from all runs. Each point represents a single worker. Blue crosses: run 1, green points: run 2. Solid line: linear regression, dashed line: locally weighted regression.

One type selects classes randomly, the other always selects the same class. Most notably, one worker submitted around 600 positive labels (and only 50 negative ones), around half of which were incorrect. One reason why the active learning classifier performs even worse than random selection is that the selection process is misled by wrongly labeled examples. Since active learning selects examples with high uncertainty, a wrong label will have more impact on the classifier and thus influence the uncertainty scores of all examples in the pool, causing further errors. Spam also leads to high disagreement between the annotators. We had to discard around 43% of the documents because they were below the agreement threshold.


With counter-spam measures After enabling counter-spam measures, we check annotation quality at a budget of 1130. Figure 4.3 shows worker accuracy jointly for all workers from all runs. We plot worker accuracy in relation to the number of examples the individual worker has annotated in total. We find that, in general, there are many workers who submit only correct labels. However, we also note that accuracy decreases somewhat for workers who submit many HITs. It is difficult to determine whether those workers are simply getting tired after annotating many documents or whether they are intentionally submitting spam. We believe, however, that the two points at the far right of the plot are spammers as they submitted a high quantity of HITs, achieving only around 50% accuracy, which is the expected performance when guessing.

Label accuracy for single annotations (shown in Table 4.1) is between 74.8% and 76% at this point on average for the RS and AL runs, respectively. This shows that it is indeed difficult to obtain high-quality annotations from single workers on MTurk. We will later show that this level of label accuracy actually leads to inferior classification accuracy. After applying 4-voting, we get an increase to 89% label accuracy for both settings. The relative frequency of discarded examples is much lower in this experiment, around 28%. This means that there is much more agreement between the annotators.

Random Sampling vs. Active Learning

For our first experiment on classification quality, we compare the results from the active learning (AL) and random sampling (RS) setups. We compare random sampling and active learning at equal budgets. We report numbers for three budgets, 1130, 1500, and 1756, in Table 4.1.

For the smaller budget of 1130, we see a significantly improved performance when using active learning (line 4) instead of random sampling (line 2). In fact, even the single-annotation active learning classifier (line 3) is significantly better than the random sampling classifier using adaptive voting (line 2), by 3.6%. While we still see a significant advantage of active learning over random sampling at a budget of 1500, it is no longer significant at 1756.

(a) 2 individual runs for active learning and random sampling

(b) Means over both respective runs

Figure 4.4: Active learning vs. Random sampling


              Budget 1130                        Budget 1500        Budget 1756
              #train  Acc     cost/doc  LA       #train  Acc        #train  Acc
RS  1  S        1130  70.4    1         74.8        –      –           –      –
    2  4-v       450  71.2    2.51      89.6       600    76.0        735    79.2
AL  3  S        1130  74.8†‡  1         76.0      1500    73.2         –      –
    4  4-v       455  77.4†   2.48      89.0       604    81.0†‡      715    81.8

Table 4.1: Results for sentiment classification on the movie corpus with different experimental setups. #train = size of training set (documents), LA = label accuracy, S = single annotation, 4-v = 4-voting. Budgets 1130 and 1500 are shown for run 1; budget 1756 is an average over both runs. † indicates that the accuracy result is a statistically significant improvement over the corresponding RS result, ‡ over the previous line (approximate randomization test, p < 0.05).

Figure 4.4 shows the overall performance of active learning compared with a baseline of randomly selected documents. Active learning consistently performs better than random sampling after annotating around 300 documents. Active learning run 2 performs worse than run 1 because the annotation quality is lower at the beginning of the experiment. We find that these results exhibit similar behavior to simulated active learning (cf. Li et al., 2012). The curve rises steeply at the beginning and flattens as more labeled examples are acquired. At around 600 documents, the active learning curve reaches a somewhat steady level as random sampling catches up, and, as described above, the difference between the methods is no longer statistically significant.

The initial gains through active learning are small. There are several possible reasons for this. First, the label set is small compared to other tasks such as named entity recognition. Second, there is a set of common clues that covers a high number of examples (cf. Wilson et al., 2005b). As they occur in many examples, it is possible to find a sufficiently large set quickly through random sampling. After some time, uncertainty selection will disprefer redundant examples, and thus examples containing unseen clues are selected instead. We will come back to this effect during error analysis.

We also measured the staleness of the examples in the queue. It turns out that


Figure 4.5: Adaptive voting vs. single annotation on AL run 1

re-training the sentiment classifier is fast enough so that the re-ordering of the queue finishes before the next example is requested most of the time. This means that we have an average batch size close to 1.

Adaptive Voting vs. Single Annotation

Using a voting strategy with a fixed budget means making a trade-off between the number of examples annotated and the number of annotations obtained per example. Increasing the size of the training set by having more examples annotated is useful because it increases diversity in the training set. On the other hand, having more labels for each training example might increase annotation quality, making the training data less noisy, but it also increases the cost of each example. Thus,


finding the right trade-off between quantity and certainty is difficult.

We carry out experiments comparing two voting settings, one with 4-voting and one using a single annotation. We generate the single annotation data from the voted data of run 1. This means that we have the same total number of annotated documents for both experiments. However, the budget will be lower for the single annotation experiment when using the same amount of data, so we cannot compare the results for the full budget. Instead, we use two smaller budgets. The first one, 1130, is taken from the previously collected data for both active learning and random sampling. We also extended the active learning run to 1500 single annotations, which we obtained from MTurk in a separate experiment.

Results are shown in Table 4.1. At 1130, we find that there is an improvement of 2.6% when using 4-voting instead of single annotations for active learning, and 0.8% for random sampling. These improvements are not statistically significant. At a budget of 1500, we obtain a significant difference of 7.8%. Note that the single annotations were still selected based on 4-voted labels, which might have had a positive impact on the results.

Figure 4.5 shows accuracy over cost for the two annotation strategies. The results show that, initially, accuracy increases more rapidly for single annotations, possibly due to the fact that the classifier is simply trained on much more data (around 2.5 times as much, as the cost per annotation in Table 4.1 shows). However, the single annotation run plateaus much earlier than the 4-voting run. From Table 4.1 we can see that the voted label accuracy is significantly higher than the one for single annotations. In the long run, a small set of high-quality annotations seems to produce a better classifier than a large set of noisy annotations. One possible cause of this effect could be that active learning finds fewer informative examples at late stages, so voting improves performance by getting the most informative examples right. Additionally, single annotation is of course more susceptible to spam submissions. We saw earlier that spam has a considerable impact on the active learning process.


Figure 4.6: Performance using gold annotations (run 1)

MTurk vs. Gold Labels

We showed that 4-voting works considerably better than using single annotations. Conversely, we are also interested in what happens when we use gold annotations instead. There are two possible setups: First, we keep the order of examples as they were selected with MTurk labels (MTurk selection, gold labels). This will show us whether the classification accuracy could be improved with better labels at any point. Second, we run a complete active learning simulation with gold labels, i.e., we also select examples based on gold information (gold selection, gold labels). The results for these experiments on MTurk run 1 are shown in Figure 4.6. The curves show that the two experiments with gold data and the MTurk run differ only slightly. This is not surprising as we have already seen that 4-voting produces

high-quality annotations. As noted before, we do not expect perfect agreement between human annotators and reviewer ratings.

Error Analysis

As noted earlier, we are interested in investigating the effect of active learning on the lexical features. For this reason, we examine the weights the classifier assigns to a selected set of features at different stages of the active learning process (Figure 4.7). For different training set sizes, we plot the feature weights of the active learning (Figure 4.7a) and random sampling (Figure 4.7b) classifiers, as well as each feature's rank in the list of features ordered decreasingly by absolute weight (Figures 4.7c and 4.7d).

We can see that while both methods are able to discover obvious features – such as bad, which is ranked high early in both active learning and random sampling – we also observe that random sampling fails to discover the feature great, which is pushed up early in active learning. Conversely, random sampling seems to lead to a fairly conservative assignment of weights while active learning produces more highly weighted features. This effect is expected as active learning aims to get labels for documents close to the decision boundary, which leads to a disambiguation of features.

Interestingly, we also observe an overfitting effect with a peak at around 400 documents for active learning. Here, features such as movies or even do are given some importance by the classifier, possibly due to correlations of these features with one of the classes in the training data. It is at this point in the experiment that we see the highest improvements of active learning over random sampling. Figure 4.4 shows the two methods diverging at around 250 examples, which coincides with the point at which several features start to experience rapid ranking changes.

In the final classifier (772 documents), active learning has reached a stable state where features are ranked sensibly. While bad is traditionally considered an informative indicator, good is not. While active learning briefly considers good a helpful feature, it quickly drops it after the 100-document mark.
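The rank statistic used in this analysis can be computed directly from a trained linear model's weight vector. A minimal Python sketch, assuming a dictionary mapping features to weights (the numbers below are made up for illustration and not tied to the Stanford Classifier's API):

def feature_ranks(weights):
    # Rank features by decreasing absolute weight (rank 1 = most important).
    ordered = sorted(weights, key=lambda f: abs(weights[f]), reverse=True)
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}

# Hypothetical snapshot of clue weights at one point of the learning process:
snapshot = {"bad": -0.42, "great": 0.31, "good": 0.05, "movies": 0.12, "do": 0.02}
ranks = feature_ranks(snapshot)
for clue in ("good", "bad", "great", "movies", "do"):
    print(clue, snapshot[clue], ranks[clue])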


(a) AL weights (b) RS weights (c) AL ranks (d) RS ranks

Figure 4.7: Feature weights assigned by the AL and RS classifiers at different points of the learning process to some example features. We show both the weight and rank of each feature. Ranks are plotted on a logarithmic scale.


4.4 Related Work

4.4.1 Crowdsourcing

Crowdsourcing has become increasingly popular in NLP due to its ease of use and because it offers a reasonable trade-off between quality and efficiency. Quality control has been an important issue for crowdsourced annotation.

Sentiment analysis was among the first tasks to be tackled through crowdsourcing. This is not surprising as recognizing sentiment is natural to humans, which reduces the required amount of annotator training compared to more formalized tasks such as syntactic annotation. For example, experiments on emotion labeling were conducted by Snow et al. (2008). Akkaya et al. (2010) use MTurk to perform subjectivity word sense disambiguation. They generally obtain high annotation accuracy, which they further improve by imposing reputation restrictions on the workers. This reduces spam significantly. Taboada et al. (2011) use MTurk to evaluate polarity lexicons. They use voting schemes and agreement checks to improve the quality of the annotations.

While the work summarized above performs crowdsourcing for sentiment tasks, it does not address active learning. It remains unclear from these publications whether a combination of active learning and crowdsourcing is successful.

4.4.2 Active Learning for Sentiment Analysis

Brew et al. (2010) presented one of the first studies on crowdsourcing and active learning for sentiment analysis. They perform sentiment detection in social media. In their experiments, they use a small set of volunteer annotators recruited for the task. This differs significantly from the completely anonymous setting on MTurk, where requesters have little control over who carries out the annotations.

Dasgupta and Ng (2009) propose a combination of different machine learning techniques to improve sentiment classification. They identify difficult examples from the data using spectral clustering. These examples are then passed to humans for annotation. This constitutes a form of active learning as the clustering step performs example selection. The authors show that this process yields a better performance on a small set of labeled documents than traditional, iterative active

learning. The experiments are again simulated.

Li et al. (2012) introduce co-selection, a technique for improving active learning under heavily skewed class distributions. Class imbalances are a typical phenomenon in sentiment analysis as often the positive class is drastically over-represented. The authors find that their method improves active learning in simulated experiments.

There is little research on the role of features in active learning. Settles (2011) presents an interesting approach to active learning where humans can label both features and examples, which is particularly interesting for sentiment applications. The approach combines the two types of information by adjusting the priors of a Bayesian model.

To summarize, we note that there is little work on the combination of active learning and crowdsourcing in general, and none on the performance of this combination in a sentiment analysis task.

4.5 Summary

In this chapter, we presented an approach to sentiment classification using a combination of crowdsourcing and active learning. We used Amazon Mechanical Turk to annotate reviews from the movie corpus. We showed that counter-spam measures and voting strategies are effective for obtaining high-quality polarity annotations. The selection of documents presented to the workers for annotation was guided through active learning. We showed that with a pool-based uncertainty-sampling strategy, it is possible to achieve significant improvements over random sampling when measuring classification accuracy for equal annotation budgets.

We analyzed the behavior of the active learning process through several experiments. First, we compared the crowdsourcing-based setup to a gold label setup and found no significant difference between the two. Second, we analyzed the clues that are promoted during the selection process. We found that active learning identifies a larger set of interesting clues faster than random selection. However, we also experience periods of overfitting which slow down the active learning process. This shows that while active learning can help to expand the set of clues efficiently, it can also produce misleading clues.


5 Bootstrapping Sentiment Classifiers from Word-Document Graphs

5.1 Introduction

Clues can be annotated on the same polarity scale as documents. Thus, they can be used directly to determine the polarity of a document. This constitutes a form of distant supervision, as we perform supervision through a database of word-level polarity – a polarity lexicon – in order to predict document-level polarity. This type of approach was among the first for automatic sentiment analysis (e.g., Turney, 2002) and remains popular to date (cf. Taboada et al., 2011).

Turney (2002) employs a simple method that involves calculating the average polarity over all clues in a document. A major drawback of this method is that it does not model the interaction between the documents. It only makes use of the information in a polarity lexicon rather than reasoning transitively over clues shared by documents. We argue that the latter would be desirable as documents that share many features are likely to have the same polarity.

We introduce a novel graph-based method for bootstrapping a sentiment classifier from a polarity lexicon that addresses this shortcoming. Graphs offer a way of representing words and documents in a joint formal structure. The structure we introduce, henceforth referred to as word-document graphs, is capable of encoding both word-document and word-word relations. In order to predict the polarity of documents, we apply Personalized PageRank to the word-document graph, making use of the steady-state distributions of PageRank.

As we have seen earlier, redundancies are a bottleneck in sentiment classification. The distant supervision approach proposed above contributes towards our goal of reducing redundancies among the learned clues. In a polarity lexicon, clues are

annotated out of context, i.e., by type rather than by token. Thus, we annotate each clue only once, which is intuitively more efficient than increasing the coverage of clues through active learning on the document level. Type-level annotation is motivated by the same independence assumptions we already made on the document level: each word has the same polarity regardless of its context.

There are known issues related to lexicon-based distant supervision. First, related to the aforementioned independence assumption, prior polarity is not correct across all contexts and domains (Kanayama and Nasukawa, 2006). Second, not all clues might be covered by either the lexicon or the word-document graph. Therefore, we propose to perform an additional bootstrapping step in which we use the document labels generated through distant supervision to train a supervised classifier. This model will have access to all unigram features in the text. Thus, it can use a larger feature space than the distant supervision model and can assign weights to the clues by indirectly making use of the polarity lexicon.

The rest of the chapter is structured as follows. We first introduce the necessary background in Section 5.2. Next, we give an overview of novel methods, introducing word-document graphs and Polarity PageRank (PPR), a new semi-supervised sentiment classifier that integrates lexicon induction with document classification (Section 5.3). We then conduct experiments with PPR in Section 5.4. We show that PPR yields improved polarity classification performance over average polarity on an English reference review corpus. Classification accuracy on documents can be further improved by bootstrapping a statistical classifier from the PPR output.

5.2 Background

5.2.1 Polarity Induction with Word Graphs

The induction of polarity on a word graph was first introduced by Hatzivassiloglou and McKeown (1997). In their approach, the authors construct a syntactic dependency graph (cf. Section 2.5) of conjunction relations between words. Formally, a word graph is an undirected graph G = (V, E, W) where the nodes V represent words, the edges E represent a semantic relation between the words, and W represents the strength of the relation. Data for the word graph used by Hatzivassiloglou

and McKeown (1997) is collected by counting conjunctions in a text corpus. The edges in the graph are weighted by the frequencies of these occurrences. In the original paper, words coordinated with and are assumed to share polarities while words coordinated with but are expected to have opposing polarities. Consider the following examples:

(5.1) nice and good

(5.2)* lovely and despicable

(5.3) awful but funny

(5.4)* lovely but entertaining

(5.5)? good and bad

The phrases in 5.1 and 5.3 are examples of what we would expect to find in actual data. In contrast, example 5.4 is implausible. Examples 5.2 and 5.5 are fringe cases. It is of course possible to use and conjunctions for contrastive purposes, and thus we have to expect cases with words of opposing polarities. However, this phenomenon is expected to be rare and is therefore treated as noise.

We intend to apply PageRank, a probabilistic Markov chain random walk approach, to the graph. The random walk is guided by transition probabilities, and thus we cannot easily make use of negative links. For this reason, we restrict ourselves to and edges in the graph. An example of such a graph is provided in Figure 5.1. The figure shows the neighborhood of distance 1 for the word warm.

This type of graph is an instance of a syntactic dependency graph. However, it is not necessary to actually use syntactic parses for finding conjunctions. Indeed, using parses may actually lead to increased errors because coordination relations are commonly among the most difficult syntactic structures for parsers to analyze correctly (McDonald, 2006). Alternatively, conjunction relations may be detected based on part-of-speech patterns. We can specify such a pattern through a regular expression for which we search for matches in the sequence of part-of-speech tags in a corpus. We use the following expression to extract adjective conjunctions from the corpus:


Nodes shown: warm and its conjunction neighbors subtropical, tropical, temperate, funny, comfortable, cool, intimate, humid, moist, caring, pleasant, hot, dry, and loving (all adjectives, English).

Figure 5.1: Word graph – 1-neighborhood of the word warm

[pos="JJ"] ([pos=","] [pos="JJ"])* ([pos=","]? "and" [pos="JJ"])+

The pattern extracts conjunctions with and as well as possibly preceding words that are enumerated with commas. It is of course applicable to any other part of speech by replacing the adjective tag JJ with the respective tag. It is important that all matched words have the same part of speech, as otherwise the pattern tends to overgenerate, for example by matching borders of larger coordinated units such as coordinated verbal phrases (e.g., we would extract home and took from he went home and took a nap).
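The extraction can be emulated without a corpus query engine by scanning the POS-tag sequence directly. The Python sketch below is our own simplification of the pattern above: it only finds adjacent "JJ and JJ" pairs rather than full comma-separated enumerations, and it assumes (token, tag) pairs as produced by any POS tagger.

from collections import Counter

def and_conjunctions(tagged_sentence):
    # Extract (adjective, adjective) pairs joined by "and" from one sentence.
    # tagged_sentence: list of (token, pos) tuples.
    pairs = []
    for (w1, t1), (w2, t2), (w3, t3) in zip(tagged_sentence,
                                            tagged_sentence[1:],
                                            tagged_sentence[2:]):
        if t1 == "JJ" and w2.lower() == "and" and t3 == "JJ":
            pairs.append((w1.lower(), w3.lower()))
    return pairs

# Edge weights of the word graph are conjunction co-occurrence counts:
edge_counts = Counter()
sentence = [("nice", "JJ"), ("and", "CC"), ("good", "JJ")]
edge_counts.update(and_conjunctions(sentence))
print(edge_counts)  # Counter({('nice', 'good'): 1})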

5.2.2 Average Polarity

One of the earliest and most popular methods for using clues to determine the polarity of a document is average polarity, which we adopt as our baseline method. It was first applied by Turney (2002). Given a polarity lexicon of clues, we simply average the polarity values of all clues that occur in the document. The average

polarity pol(d) of all words w in a document d is

pol(d) = \frac{1}{|d|} \sum_{w \in d} pol_l(w),

where pol_l is the lexical polarity taken from the polarity lexicon. We assume that the lexicon has polarity ratings in [−1, 1]. We can then classify the document d by checking whether its score is positive or negative:

y_d = \begin{cases} \text{positive} & \text{if } pol(d) \geq 0 \\ \text{negative} & \text{else} \end{cases} \qquad (5.6)

The classification confidence c of a document d is assessed through c(d) = |pol(d)|. Note that we still assign a class to documents where pol(d) = 0. This classification divides the set of documents into two partitions, i.e., each document is either positive or negative. This decision follows related work (e.g., Hassan and Radev, 2010) and incorporates the majority baseline into the model. This way, nodes for which PPR cannot make a prediction are assigned the majority class, which is more likely to be correct. The intuition behind this is similar to that of class priors in probabilistic models.

Average polarity has several drawbacks. First, it is very sensitive to incorrect entries in the polarity dictionary. For example, the Wilson et al. (2005b) lexicon lists need as a negative word. This might be appropriate in some contexts, but it is arguably not the prior polarity of the word. Second, there is no interaction between the documents, i.e., the occurrence of clues in several documents is not taken into account in the classification decision. Third, it is not trivial to combine this method with document-level supervision.
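Equation 5.6 and the confidence score translate directly into code. A minimal Python sketch, assuming a polarity lexicon with scores in [−1, 1], a tokenized document, and the reading of the formula in which words absent from the lexicon contribute a polarity of 0:

def average_polarity(tokens, lexicon):
    # Average lexical polarity over all tokens of a document (Turney-style).
    # lexicon: dict mapping word -> polarity in [-1, 1]; unknown words count as 0.
    if not tokens:
        return 0.0
    return sum(lexicon.get(w, 0.0) for w in tokens) / len(tokens)

def classify(tokens, lexicon):
    score = average_polarity(tokens, lexicon)
    label = "positive" if score >= 0 else "negative"   # Equation 5.6
    confidence = abs(score)                            # c(d) = |pol(d)|
    return label, confidence

lexicon = {"good": 1.0, "bad": -1.0, "awful": -1.0}
print(classify("a good but slightly awful movie".split(), lexicon))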

5.3 Methods

5.3.1 Polarity PageRank

To infer the polarity of unlabeled nodes in a graph of similarity relations between words, we adopt a random-walk-based approach. For that, we make use of

Personalized PageRank for polarity, calling the resulting algorithm Polarity PageRank (PPR). We assume that we know the polarities of some nodes, for example from an existing polarity lexicon. These nodes make up the seed sets S^+, the set of positive nodes, and S^−, the set of negative nodes. The idea behind PPR is to use these seed sets for two independent runs (positive and negative) of Personalized PageRank. In the positive (resp., negative) run, the teleportation vector t^+ (resp., t^−) is set as defined in Equation 2.3 so as to give high weights to nodes whose polarity is known to be positive (resp., negative). We apply PPR to a dependency graph, so the final transition matrix will be computed from the adjacency matrix of the dependency graph and the teleportation vector following Equation 2.2. Applying PageRank yields a steady-state vector for each class, i.e., r^+ and r^− in our case. Each vector will contain high probabilities for those nodes that are well-connected to the respective seed set, which signifies that they are important to the respective class. From these vectors, we can calculate the polarity of a node i (similarly to the previously defined Equation 5.6) as

y_i = \begin{cases} \text{positive} & \text{if } r_i^+ \geq r_i^- \\ \text{negative} & \text{else} \end{cases} \qquad (5.7)

The assignment of a class for nodes where r_i^+ = r_i^- follows the same principle we introduced for average polarity in Section 5.2.2.

PPR bears some resemblance to the method of Hassan and Radev (2010) (cf. Section 5.5). However, it is simpler, using standard eigenvector computations that are available in any numerical software library, and it is also more efficient, avoiding the need for expensive Monte Carlo sampling. Computational simplicity and efficiency are of particular importance as the graphs we will work with are large due to the fact that they represent both words and documents.

There are some theoretical concerns regarding the application of PageRank in this particular type of setup. It has been recognized that applying Personalized PageRank to undirected graphs can lead to close-to-uniform stationary distributions (Grolmusz, 2012). This is most problematic in cases where teleportation vectors are dense and uniform. For sparse vectors where only a small number of non-zero entries have uniform weights, as is the case in our setup, the stationary

distribution is expected to deviate from the uniform distribution, which is sufficient for classifying nodes.
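A compact way to see how PPR works is a power-iteration sketch over the adjacency matrix of the (weighted, undirected) graph. The NumPy code below is our own illustration, not the implementation used in the experiments; it follows the standard personalized PageRank update with teleportation restricted to the seed set.

import numpy as np

def personalized_pagerank(A, seeds, alpha=0.85, iters=100):
    # A: symmetric adjacency matrix with non-negative edge weights.
    # seeds: indices of seed nodes; teleportation is uniform over them.
    n = A.shape[0]
    row_sums = A.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0             # guard against isolated nodes
    P = A / row_sums                          # row-stochastic transition matrix
    t = np.zeros(n)
    t[seeds] = 1.0 / len(seeds)               # sparse teleportation vector
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = alpha * (P.T @ r) + (1 - alpha) * t
    return r

def polarity_pagerank(A, pos_seeds, neg_seeds):
    # Two independent runs; a node is positive iff r+_i >= r-_i (Equation 5.7).
    r_pos = personalized_pagerank(A, pos_seeds)
    r_neg = personalized_pagerank(A, neg_seeds)
    return np.where(r_pos >= r_neg, "positive", "negative"), r_pos, r_neg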

5.3.2 Document Classification with Polarity PageRank

The main focus in this chapter is to improve the integration of clues into a statistical model for document classification. This approach can be viewed as a form of distant supervision as we do not annotate documents directly but rather rely on a database of clues. Common methods in this scenario are variations of average polarity, the use of lexical scores as hard-coded features for a more sophisticated classifier (e.g., Melville et al., 2009), or a mixture of both approaches (e.g., He, 2010). We will discuss these approaches in more detail in Section 5.5. The main drawback of these approaches is that they do not offer a principled joint model of word and document polarity.

To this end, we propose a graph framework that integrates polarity induction and lexicon application into a unified step. In this model, the occurrence of a word in a document is represented as an edge in the graph, following the definition of an occurrence network in Section 2.5. This way of formalizing the problem is not unlike many information retrieval methods that also view words and documents as the same formal object (e.g., Turtle and Croft, 1991). A similar framework has been used for sentiment aspect analysis by Zhang et al. (2010).

We define a joint graph structure, extending the concept of word graphs (see Section 5.2.1) to include documents. We introduce the word-document graph here, an undirected graph which contains both word-word and word-document edges. We adopt the word-word edges from the word graph. In addition, word-document edges are introduced between each document node d_j and all terms t_i that occur in it (i.e., (i, j) ∈ E for all t_i ∈ d_j). The degree of association between a term and a document is given by their normalized term frequency tf (Salton and McGill, 1986),

tf_{ij} = \frac{n_{ij}}{\sum_k n_{kj}},

where n_{ij} is the occurrence count of term t_i in document d_j.

In contrast to the standard setup of PageRank in document retrieval, the documents are not linked directly to each other. Instead, document nodes are linked


Nodes shown: document nodes books/all.review:313 and books/all.review:518, and word nodes social, book, no, performance, and other.

Figure 5.2: Word-document graph. Word nodes and word-word edges are shown in blue, document nodes and word-document edges in magenta.

only to word nodes. As word nodes can be linked to each other, the relationships between documents are defined through the relationships of their terms. The relations necessary for these links do not have to be obtained from the documents in the graph but can be gathered from external sources as well, in our case through a word graph of conjunctions. Figure 5.2 contains an example of a word-document graph, consisting of 5 word nodes (blue) and 2 document nodes (magenta). By including documents as nodes in the graph, the documents themselves can be labeled with polarity using PPR. We simply use Equation 5.7 for the joint graph. Confidence for a node i is assessed by the absolute log ratio of r_i^+ and r_i^-,

conf(i) = \left| \log \frac{r_i^+}{r_i^-} \right| = \left| \log r_i^+ - \log r_i^- \right|.

The word-document graph framework is flexible and can be easily extended. As an example, word-document graphs could be made bilingual. Given word-document graphs in two languages, the graphs can be combined by adding links

from a standard bilingual dictionary that translates between the languages (cf. Scheible, 2010). A further possible modification is the introduction of multiple edge types (cf. Scheible et al., 2010).
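To make the construction concrete, the following sketch assembles the adjacency matrix of a word-document graph from conjunction counts (word-word edges) and normalized term frequencies (word-document edges). It is a simplified illustration under the assumption that word nodes are indexed first, followed by document nodes; the classifier sketches above can then be run on the resulting matrix.

import numpy as np
from collections import Counter

def word_document_graph(word_edges, documents, vocab):
    # word_edges: dict mapping (word_i, word_j) -> conjunction count.
    # documents: list of token lists; vocab: list of words.
    w2i = {w: i for i, w in enumerate(vocab)}
    n = len(vocab) + len(documents)
    A = np.zeros((n, n))
    for (w1, w2), count in word_edges.items():            # word-word edges
        if w1 in w2i and w2 in w2i:
            A[w2i[w1], w2i[w2]] = A[w2i[w2], w2i[w1]] = count
    for j, tokens in enumerate(documents):                # word-document edges
        counts = Counter(t for t in tokens if t in w2i)
        total = sum(counts.values())
        for w, n_ij in counts.items():
            tf = n_ij / total                              # normalized term frequency
            A[w2i[w], len(vocab) + j] = A[len(vocab) + j, w2i[w]] = tf
    return A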

5.3.3 Bootstrapping

The word-document graph model with PPR offers the advantage of having a joint model of words and documents. However, as PPR is transductive, we are not making use of some of the elements of statistical learning. Most notably, the ability to weight features based on their importance to a class is absent. This motivates the use of bootstrapping (cf. Section 2.3) where we train a statistical classifier using an automatically labeled pool of examples, exploiting confidence estimates that have been made for each of the predictions.

We propose the following process. Given a set of unlabeled documents, along with a polarity lexicon and a word graph, we first predict document labels using PPR. We then use the confidence measure we defined for PPR to select the most confidently labeled documents. Using these documents, we then self-train a statistical classifier and apply it to the set of documents. The motivation behind this approach is that we can produce a base classifier and label a small set of documents with the highest confidence using a polarity lexicon. These documents and their annotations are training input for a supervised classifier that learns weights for all clues in the data, not just those that are represented in the word graph. In addition, this approach makes it possible to use more complex features, such as n-grams.
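A minimal sketch of this bootstrapping step is given below, using scikit-learn's LogisticRegression as a stand-in for the MaxEnt classifier; the PPR labels and confidence scores are assumed to be given, and the class balancing and iterative selection described later in the experimental setup are omitted for brevity.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def bootstrap(doc_texts, ppr_labels, ppr_confidence, seed_fraction=0.15):
    """Self-train a supervised classifier on the most confident PPR predictions.

    doc_texts:      list of raw document strings
    ppr_labels:     list of 'pos'/'neg' labels predicted by Polarity PageRank
    ppr_confidence: list of confidence values (absolute log ratios)
    """
    order = sorted(range(len(doc_texts)),
                   key=lambda i: ppr_confidence[i], reverse=True)
    seed = order[: int(seed_fraction * len(order))]      # most confident documents

    vectorizer = CountVectorizer(ngram_range=(1, 2))     # unigram + bigram features
    X = vectorizer.fit_transform(doc_texts)
    classifier = LogisticRegression(max_iter=1000)       # stand-in for MaxEnt
    classifier.fit(X[seed], [ppr_labels[i] for i in seed])
    return classifier.predict(X)                         # relabel the whole collection
```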

5.4 Experiments

5.4.1 Experimental Setup

Experiments are carried out on the Multi-Domain Sentiment Dataset (MDS; Blitzer et al., 2007). This dataset is described in more detail in Appendix A.2.1. Testing the method on different domains is interesting because domain effects can have a significant influence on the performance of the method. For each category, all available

reviews (positive, negative, and unlabeled) are merged into a joint collection. Since we do not use the available document labels for classification, we can evaluate on the complete dataset. We create an individual word-document graph for each domain, as we found that domain interaction effects can lead to significantly lower accuracy.

We construct an English word graph by extracting adjectives that occurred in conjunctions with and from the English Wikipedia as described in Section 5.2.1, using a Wikipedia dump downloaded on 2/1/2009. The edges for the resulting graph are weighted by conjunction occurrence counts. We use the English polarity lexicon of Wilson et al. (2005b). We can make use of only those entries that are represented as (word) nodes in the graph, as not all words are covered by Wikipedia or the MDS. Thus, we use only part of the lexicon, around 2000 words.

We run PPR until convergence on the dataset, taking around 70 iterations for each run. We set the PPR teleportation parameter α = 0.85, a standard setting in information retrieval. For bootstrapping, we use a maximum entropy (MaxEnt) model (Stanford classifier implementation, see Appendix A.1.1) with bag-of-words features. We perform a single self-training iteration based on the PPR output, taking the 15% most confidently classified documents of the collection as training data, using a class-balanced sample. We then iteratively select (without replacement) the best 1% of the remaining documents. We use the pre-processed version of the MDS, thus the features available to the classifier are the unigrams and bigrams as extracted by the original creators of the dataset.

We try two variations of Polarity PageRank. In the first version, we label the documents based on the teleportation vector constructed from the given polarity lexicon with a single PageRank run for each class (referred to as PPR). The second version first calculates the positive and negative eigenvectors on the word graph in a first PPR run and then uses them as teleportation vectors on the word-document graph in a second PPR run (referred to as PPR+).
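The construction of the conjunction-based word graph can be sketched as follows. The snippet assumes POS-tagged sentences with Penn-Treebank-style adjective tags (JJ, JJR, JJS) as input and is meant as an illustration, not as the exact extraction pipeline used on the Wikipedia dump.

```python
from collections import defaultdict

def extract_conjunction_graph(tagged_sentences):
    """Count adjective pairs joined by 'and' in POS-tagged sentences.

    tagged_sentences: iterable of sentences, each a list of (token, tag) pairs.
    Returns a dict mapping unordered adjective pairs to conjunction counts.
    """
    edge_counts = defaultdict(int)
    for sentence in tagged_sentences:
        for i in range(1, len(sentence) - 1):
            (w1, t1), (conj, _), (w2, t2) = sentence[i - 1], sentence[i], sentence[i + 1]
            if conj.lower() == "and" and t1.startswith("JJ") and t2.startswith("JJ"):
                edge = tuple(sorted((w1.lower(), w2.lower())))
                edge_counts[edge] += 1        # edge weight = conjunction count
    return edge_counts
```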

5.4.2 Experiments and Results

The top section of Table 5.1 shows the accuracy of the average polarity (AP) and Polarity PageRank (PPR, PPR+) base classifiers before bootstrapping on


Method         Books   DVD    Electronics  Kitchen  Mean
AP (base)      51.2    58.5   57.1         56.6     55.9
PPR (base)     60.3†   63.4†  62.0†        62.0†    61.9
PPR+ (base)    68.7†   67.7†  67.1†        67.6†    67.8
AP (MaxEnt)    70.4    69.6   74.1         75.7     72.5
PPR (MaxEnt)   69.3    71.0   74.2         77.6     73.0
PPR+ (MaxEnt)  71.1    72.5†  76.7†        80.3†    75.2

Table 5.1: Classifier accuracies. † denotes a significant improvement with p < 0.05 over the AP result of the same section.

the different corpus splits for each domain. The overall accuracies in the last column are means over the four domains. We apply the approximate randomization significance test to the results and let † denote a significant improvement with p < 0.05 over the AP result of the same section. We see significant improvements over the baseline for all domains using PPR instead of AP, and another significant boost when using PPR+. The latter result comes as no surprise as the intermediary PPR step serves to expand the lexicon, enabling PPR+ to use significantly more input data.

Based on these results we apply bootstrapping and self-training as described above. The accuracies of the resulting classifiers are listed in the bottom section of Table 5.1. The improvements of the PPR maximum entropy models over the AP models are smaller than the margin of the base models, which is due to the availability of more features to the classifiers. We see significant improvements over the AP MaxEnt baseline only when using PPR+, except for the books domain where neither PPR nor PPR+ yields a significant improvement. This result can be explained by looking at the average document length across domains (cf. Appendix A.2.1). Documents are the longest on average for the books domain and the shortest for the kitchen domain. This means that the AP classifier can make better confidence estimates as the overall number of clues in a document is larger. If a decision is made with high confidence, it is most likely correct. Therefore, the margin between AP and PPR is

smaller for books than for kitchen documents. In our experiments, we found that subsequent self-training iterations do not help to improve the results significantly. One reason for that mirrors an effect we already observed in Chapter 4: It is difficult to make improvements after learning a basic set of good lexical indicators.

The books domain is the most difficult domain overall, followed closely by the DVD domain. One reason for this is that these domains contain confounding clues, such as plot descriptions, which are wrongly recognized as polarity clues by the classifier. A separate step detecting clues that are non-relevant to sentiment analysis may be desirable. We will investigate such an approach in the following chapter. Error analysis also shows that the bootstrapping step manages to correct the polarity of some document nodes which are not well connected in the graph and therefore are difficult or impossible to classify with PPR. This motivates a deeper analysis of the graph, which we present in the following section.

5.4.3 Analysis of Word-Document Graphs

In this section, we provide some analysis of the graph’s properties by means of visualization. First, we visualize the complete Wikipedia word graph with the nodes organized through a force-directed layout.13 In Figure 5.3, we notice that the graph consists of several disconnected subgraphs. There are many small connected components and singleton nodes that are unreachable from the largest connected component, shown at the edges of the figure. In the largest connected component, the force-directed layout produces several clusters. To analyze them, we filter the graph and look only at nodes with a degree of at least 25 (Figure 5.4). Using node labels, we can identify the topics of three of the four large clusters. The top-left cluster contains adjectives describing origin and nationality (e.g., French, Italian, and European). Nationality terms are strongly connected among each other and rarely occur in conjunction with other words. This cluster is connected to the rest of the graph through certain hub nodes that are less specific, such as foreign and international. The top-right cluster contains adjectives describing temporal or spatial relations (e.g., first, final, and early). It is rather small as such expressions

13 We use the implementation in Gephi, available at https://marketplace.gephi.org/plugin/openord-layout/

are either rare or not productive. The bottom-left cluster contains words related to society and professions (e.g., political, cultural, and professional). The bottom-right cluster contains the remainder of the adjectives, which are mostly qualitative (e.g., long, powerful, popular, and new). Presumably, these adjectives are the most interesting for sentiment classification. The two most obvious sentiment indicators good and bad are both part of this cluster as well. The four clusters are interconnected through hub nodes such as other and many. Direct connections are more rare, as conjunctions of words that are less semantically related are implausible. This supports the conjunction hypothesis of Section 5.2.1.

We next look at an example from the bottom-right cluster in more detail. We show the neighborhood of distance 1 for the word powerful, a central word in the cluster, in Figure 5.5. We can see that even this small neighborhood is well-connected. Many of the words in this subgraph are positive according to the polarity lexicon, such as successful or beautiful. The neighborhood graph also contains words that could be considered ambiguous, such as expensive, whose polarity depends on the context in which it is used. There are also direct connections from powerful to negative words such as evil and corrupt. This shows once more that the assumptions made in Section 5.2.1 are not universally true.

Figure 5.6 shows the word-document graph for the books domain. We can see that the majority of the nodes in the graph is part of a large connected component with two obvious main clusters. In addition, we have many unreachable nodes and connected components, some of which contain document nodes. These nodes are difficult to label through PPR as they can only be classified if at least one of the words they are connected to is in the seed set.

The two main clusters of the largest connected component are straightforward to interpret: The cluster in the bottom part of the figure contains mostly word-word relations, shown in blue. The cluster on top contains mostly word-document relations, shown in magenta. In the top cluster, we also find word nodes that share no edge with the main word graph, i.e., words that occur in documents but not in any conjunction. We also see that the two clusters are sparsely connected through direct word-document edges (magenta edges between the clusters). Examining these edges shows that the document nodes mostly connect to the cluster of qualitative adjectives described above.


Figure 5.3: Complete Wikipedia word graph. Color intensity and node size visual- ize the degree of each node.


Figure 5.4: Wikipedia word graph with word labels. We show all nodes with a degree of at least 25. Manually identified coherent clusters circled.


Figure 5.5: 1-neighborhood of powerful. Color intensity denotes connection strength to powerful.

These observations suggest that further pre-processing of the graph, e.g., removing unnecessary or misleading content, could potentially help to improve sentiment analysis.

5.5 Related Work

5.5.1 Lexical Knowledge in Document Classification

Leveraging lexical clue knowledge for review classification has been the subject of several related publications. Turney (2002) induces a polarity lexicon by measuring the association of terms with a set of seed words whose polarity is known. The resulting lexicon is used for classifying reviews by calculating the average polarity of each document. Turney concludes that these averages are highly correlated with the actual polarity of the documents.


Figure 5.6: Complete word-document graph for the books domain. Word nodes and word-word edges are shown in blue, document nodes and word-document edges in magenta. Major regions of the graph are circled.


He (2010) presents a self-training approach for review classification. The importance of each lexicon item is taken into account and is estimated from the unlabeled texts. This leads to an increase of accuracy in review classification. The focus of He's work is on correctly estimating the importance of each feature for sentiment classification.

Settles (2011) introduces a Bayesian model using Dirichlet priors for feature labeling. The model aims at improving active learning for document classification tasks, including sentiment analysis. Clue information can be introduced by modifying the priors. The advantage of this model is that it is theoretically well-motivated. In its current form, it limits the user to a Naive Bayes model, which might not be preferred for all applications (cf. for example Pang et al., 2002, who show that MaxEnt performs better on movie review classification).

5.5.2 Random Walk Methods for Polarity Induction

Zhang et al. (2010) perform aspect extraction with the HITS algorithm, a random walk method closely related to PageRank. Their graph setup is based on dependency graphs of predicate-argument relations that encode the relation of aspects and opinion words. They evaluate their approach on a 4-domain dataset which is not publicly available.

Random walk methods have been applied successfully for polarity induction on word graphs. Hassan and Radev (2010) induce a polarity lexicon by constructing a graph where edges represent taxonomic relations from WordNet, such as hypernymy. Polarity is propagated on this graph using a random walk model that handles positive and negative words separately. Their approach outperforms the previous state-of-the-art method for polarity clue induction. In contrast, we adopt document classification accuracy as our evaluation metric.

5.5.3 Bootstrapping

Bootstrapping has previously been shown to be an effective tool in sentiment analysis. Wiebe and Riloff (2005) introduce a bootstrapping approach for subjectivity classification that learns patterns of subjectivity clues from unannotated texts. These

clues serve as a source for a Naive Bayes classifier that produces additional high-confidence input for the pattern learner. Initial rules need to be hand-crafted, which requires linguistic expert knowledge about a language.

5.5.4 Conclusion

In comparison to the related document classification approaches summarized above, our method has three advantages. First, our initial classifier uses an efficient, well-researched graph framework based on PageRank. Second, the bootstrapping step introduces supervised classification techniques, weighting clues in the process and leveraging information from the entire feature space. Third, our approach requires little knowledge about a language, therefore eliminating the need for linguistic expert knowledge.

5.6 Summary

In this chapter, we introduced Polarity PageRank, a new semi-supervised sentiment classifier that integrates lexicon induction with document classification in one unified graph-theoretic formalism. We were able to show that Polarity PageRank-based document classification improved results over document classification with average polarity. These accuracy improvements translate to increased performance of subsequently trained maximum entropy classifiers across all domains. We showed that through bootstrapping, we can take advantage of the entire feature space available.

We also observed a well-known shortcoming of the clue model: All clues are treated as equally relevant independently of context, which misleads the classifier. We will further address this issue in Chapter 6. Possible future work consists of including more sophisticated features in the graph, e.g., negation-based features or bigrams. It is an open question whether PPR would be as accurate as bootstrapping if the full feature set could be made available to it. In addition, the approach can be extended to multilingual sources by using multiple word-document graphs and a bilingual dictionary.


6 Sentiment Relevance

6.1 Introduction

It is generally recognized in sentiment analysis that only a subset of the content of a document contributes to the sentiment it conveys (cf. Liu, 2010). We made this observation ourselves: In the previous chapter, we found the counter-intuitive result that our classifier performs better for shorter documents. We attribute this phenomenon to topic effects that introduce misleading clues. For this reason, we believe that automatically identifying such distracting content could be beneficial to downstream sentiment analysis tasks. In this chapter, we address the concept of sentiment relevance, which formalizes this issue.

Filtering unwanted contexts is an established pre-processing step in sentiment analysis (Liu, 2012). To identify such content, some authors make use of the distinction between the categories subjective and objective (see Section 3.1.2). Subjective statements refer to the internal state of mind of a person, which cannot be observed. In contrast, objective statements can be verified by observing and checking reality. Some sentiment analysis systems filter out objective language and predict sentiment based on subjective language only because objective statements do not directly reveal sentiment. Even though the categories subjective/objective are well-established in philosophy, we argue that they are not optimal for context filtering. We instead introduce the notion of sentiment relevance (SR). A sentence or linguistic expression is sentiment relevant if it contains information about the sentiment the document conveys; it is sentiment nonrelevant (SNR) otherwise. Although there is overlap between the two notions, they are different. Consider the following examples for subjective and objective sentences:

(6.1) Bruce Banner, a genetics researcher with a tragic past, suffers a horrible accident.

(6.2) The movie won a Golden Globe for best foreign film and an Oscar.


Sentence 6.1 is subjective because assessments like tragic past and horrible accident are subjective to the reader and writer. Sentence 6.2 is objective since we can check the truth of the statement, i.e., whether the movie did win the award. However, even though sentence 6.1 has negative subjective content, it is not sentiment relevant because it is about the plot of the movie and can appear in a glowingly positive review. Conversely, sentence 6.2 contributes to the positive opinion expressed by the author through a factual statement. Subjectivity and sentiment relevance are two distinct concepts that do not imply each other: Generally, objective sentences can be sentiment relevant, and subjective sentences can be sentiment nonrelevant. Below, we demonstrate empirically that subjectivity and sentiment relevance differ.

Ideally, we would like to have at our disposal a large annotated training set for our new concept of sentiment relevance. Unfortunately, this resource does not yet exist. For this reason, we investigate two semi-supervised approaches to sentiment relevance classification that do not require sentiment-relevance-labeled data. The first approach uses distant supervision (cf. Section 2.3). We create an initial set of distantly labeled examples based on domain-specific metadata that we extract from a public database and show that this improves performance by 5.8% F̄1 compared to a clue model baseline. The second approach uses transfer learning (cf. Section 2.3). We show that transfer learning improves F̄1 by 12.6% for sentiment relevance classification when we use a feature representation based on lexical taxonomies that supports knowledge transfer.

In the following, we analyze sentiment relevance on the sentence level. Our motivation is two-fold. On the one hand, sentences are a low-level linguistic structure where sentiment relevance manifests itself in a relatively unambiguous manner. On the other hand, it is easy to segment texts into sentences as opposed to other linguistic structures with identifiable sentiment relevance, such as phrases. However, sentiment relevance is also a discourse phenomenon: authors tend to structure documents into sentiment relevant passages and sentiment nonrelevant passages. To impose this discourse constraint, we employ a sequence model. We represent each document as a graph of sentences and apply minimum cut.

The rest of the chapter is structured as follows. First, we describe the sentiment relevance annotation process and show experimentally how sentiment relevance

differs from subjectivity (Section 6.2). Section 6.3 introduces the methods applied in this chapter and the features we extract. Then, we turn to the description and results of our experiments on distant supervision (Section 6.4) and transfer learning (Section 6.5). Finally, in Section 6.6, we review previous work and show how sentiment relevance relates to it.

6.2 Exploring Sentiment Relevance

Since sentiment relevance is a novel concept, we first need to create an annotated corpus in order to conduct computational experiments. In this section, we begin by describing the annotation process, including an agreement study. Then, we conduct experiments to compare sentiment relevance to the most closely related concept, subjectivity. We focus on sentiment relevance in the movie domain for several reasons. First, it is a relatively closed domain with a fixed set of aspects that has been the focus of much prior research. Second, we found the quality of movie reviews to be generally higher than that of product reviews. Third, non-relevant content occurs frequently in movie reviews as they tend to feature long plot descriptions. This leads to a less skewed class distribution.

6.2.1 Sentiment Relevance Corpus

To create a sentiment-relevance-annotated corpus, the SR corpus, we randomly selected 125 documents from the movie corpus (cf. Appendix A.2.1), using the texts from the raw HTML files since the processed version does not have capitalization. Two human judges annotated the sentences for sentiment relevance, using the labels SR and SNR. If no decision can be made because a sentence contains both sentiment relevant and sentiment nonrelevant linguistic material, it is marked as uncertain. We excluded 360 sentences that were labeled uncertain from the evaluation. Annotator agreement is discussed in Section 6.2.2. In total, the SR corpus contains 2759 sentiment relevant and 728 sentiment nonrelevant sentences. Figure 6.1 shows an excerpt from the corpus.


O  SNR  Braxton is a gambling addict in deep to Mook (Ellen Burstyn), a local bookie.
S  SNR  Kennesaw is bitter about his marriage to a socialite (Rosanna Arquette), believing his wife to be unfaithful.
S  SR   The plot is twisty and complex, with lots of lengthy flashbacks, and plenty of surprises.
S  SR   However, there are times when it is needlessly complex, and at least one instance the storytelling turns so muddled that the answers to important plot points actually get lost.
S  SR   Take a look at L. A. Confidential, or the film's more likely inspiration, The Usual Suspects for how a complex plot can properly be handled.

Figure 6.1: Example passage from a document from the SR corpus with subjectiv- ity (S/O) and sentiment relevance (SR/SNR) annotations

6.2.2 Sentiment Relevance vs. Subjectivity

We will next contrast sentiment relevance against two different notions of subjectivity. While there is a formal definition of subjectivity (cf. Section 3.1.2), in recent research, the term "subjectivity" has also been used to label approaches that are conceptually related but different from the traditional definition. Pang and Lee (2004) use a practical definition of subjectivity that suits their needs for training a machine learning system. They create their subjectivity corpus (henceforth P&L corpus, described in more detail in Appendix A.2.1) by gathering content from selected resources. As a rule, they extract subjective examples from quote snippets from Rotten Tomatoes,14 a review aggregation website, each of which summarizes a review and is therefore highly subjective. Objective content is collected by sampling sentences from IMDb plot descriptions. These would in reality contain a mix of subjective and objective content. We compare sentiment relevance to both the theoretical and empirical definition of subjectivity in the following experiments.

14http://www.rottentomatoes.com/


Annotation Experiment

First, we study agreement between human annotators, comparing manual sentiment relevance and subjectivity annotations. We had 762 sentences annotated for sentiment relevance by both annotators. We calculate inter-annotator agreement with Fleiss' κ and find an agreement of κ = 0.69. In addition, we obtained subjectivity annotations for the same data on Amazon Mechanical Turk, deciding each label through a vote of three, with an agreement of κ = 0.61. However, the agreement of the subjectivity and relevance annotations after voting, assuming that subjectivity equals relevance, is only at κ = 0.48. This suggests that there is indeed a measurable difference between subjectivity and relevance. We asked an independent annotator to examine the 225 examples where the annotations disagree; this annotator found that 83.5% of these cases are correctly identified as different.

The feedback we received from our annotators provided valuable insights into the difficulties of the sentiment relevance prediction problem. Both annotators found it difficult to annotate reviews of movies whose plot they were not familiar with. Thus, they often had to consult external sources like Wikipedia to read up on the movie before starting the annotation. They reported that they found the distinction between actor and character names particularly difficult and claimed that there are many ambiguous cases that could only be resolved with the knowledge of whether a person mentioned in the text was fictitious or not. This feedback in part inspired some of our later feature extraction methods. In addition, the annotators sometimes missed subtle expressions of sentiment, possibly due to insufficient second-language knowledge. This issue is to be expected when annotating in a second language; in sentiment analysis, however, it is difficult to fix through the usual countermeasures, such as improving annotation guidelines, because the set of such expressions is too diverse.

Classification Experiment

We now show that although the P&L selection criteria (quote snippets vs. plot descriptions) bear resemblance to the definition of sentiment relevance, the two concepts are different. To show this, we assume that the subjective data is sentiment relevant and the objective data is sentiment nonrelevant. We divide both the


                 test
             P&L     SR
train  P&L   89.7    68.5
       SR    67.4    76.4

Table 6.1: TL/in-task F̄1 for P&L and SR corpora

vocabulary                  fpSR   fpSNR
{actor, director, story}      0     7.5
{good, bad, great}           11.5   4.8

Table 6.2: Percentage of false positive sentences for sentiment relevant (fpSR) and nonrelevant (fpSNR) containing specific words. Training on P&L, evaluation on SR.

SR and P&L corpora into training and test set at a ratio of 50:50 and train the Stanford maximum entropy (MaxEnt) classifier with bag-of-words features. Macro-averaged F1 (F̄1) for the four possible training-test combinations is shown in Table 6.1. The results indicate that the classes defined by the two labeled sets are different. A classifier trained on P&L performs worse by around 8% on SR than a classifier trained on SR (68.5 vs. 76.4). A classifier trained on SR performs worse by more than 20% on P&L than a classifier trained on P&L (67.4 vs. 89.7).

Note that the classes are not balanced in the sentiment relevance data while they are balanced in the subjectivity data. This can cause a misestimation of class probabilities and lead to the observed performance drops. Indeed, if we either balance the sentiment relevance data or introduce an imbalance in the subjectivity data by undersampling the minority class of the test set (SNR) in the P&L training set, we can significantly increase F̄1 to 74.8% and 77.9%, respectively, in the noisy label transfer setting. Note however that this step is difficult in practical applications if the actual label distribution is unknown. Also, in a realistic setting the distribution of the data is what it is – it cannot be adjusted to the training

set. We will conduct further experiments on balancing in Section 6.5, which show that an unsupervised sequence model is superior to artificial manipulation of class imbalances.

Error analysis for the classifier trained on P&L shows that many sentences misclassified as sentiment relevant (fpSR) contain polar words; e.g.,

(6.3) Then, the situation turns bad.

In contrast, sentences misclassified as sentiment nonrelevant (fpSNR) contain named entities or plot and movie business vocabulary; e.g.,

(6.4) Tim Roth delivers the most impressive acting job by getting the body language right.

The word count statistics in Table 6.2 show this for three polar words and for three plot/movie business words. The P&L-trained classifier seems to have a strong bias to classify sentences with polar words as sentiment relevant even if they are not, perhaps because most training instances for the category quote are highly subjective, so that there is insufficient representation of less emphatic sentiment relevant sentences. These snippets rarely contain plot/movie-business words, so that the P&L-trained classifier assigns almost all sentences with such words to the category sentiment nonrelevant.

6.3 Methods

6.3.1 Discourse Constraints with Minimum Cut

It has been recognized that sentiment relevance, just like related phenomena such as subjectivity, has sequential properties (cf. Taboada et al., 2009). This means that sentences with the same property are likely to occur in sequence. For this reason, it makes sense to impose the discourse constraint that a sentiment relevant (resp. sentiment nonrelevant) sentence tends to follow a sentiment relevant (resp. sentiment nonrelevant) sentence. This means that we need to optimize a trade-off between supervised and unsupervised information sources. This problem can be solved through graph segmentation using minimum cut (MinCut).


Following Pang and Lee (2004), we formalize the constraint by representing each document as a graph, encoding the strength of the constraint in the edges, and using MinCut to find the optimal solution. For each document, consisting of N sentences, we create a discourse network, which is a directed graph G = (V, E, W) with N + 2 nodes: N sentence nodes (i ∈ V for i ∈ {1, ..., N}, where i represents the document's ith sentence) and a source (s ∈ V) and a sink node (t ∈ V). We define source and sink to represent the classes sentiment relevance and sentiment nonrelevance, respectively. Thus, we also refer to the source as SR and the sink as SNR.

We will now introduce two types of edges, individual edges and association edges. Individual edges connect sentence nodes to source or sink nodes. Each sentence node i has a directed edge ⟨s, i⟩ ∈ E from the source to it and a directed edge ⟨i, t⟩ ∈ E to the sink. The individual weights between a sentence i and the source/sink node (ind(i, s) = w_si and ind(i, t) = w_it, respectively) are set according to some confidence measure for assigning it to the corresponding class. Association edges connect pairs of sentences. Each pair of sentence nodes i and j is connected through a directed edge ⟨i, j⟩ ∈ E if i < j. Pang and Lee (2004) propose several ways of weighting association edges. We use squared decay, where the weight w_ij on the association edge is set to

$$w_{ij} = \frac{c}{(j - i)^2}.$$

c is a free parameter that controls the scaling of the association weights in relation to the individual weights. In practice, we did not find significant differences between the proposed measures. Pang and Lee (2004) also propose a threshold t on the sentence distance (j − i) ≤ t above which w_ij is set to zero. We found that this threshold does not affect our results significantly either, so we do not apply it (which can be formally expressed as t = ∞).

The resulting graph is a directed acyclic flow network from source to sink. On this graph, we compute the minimum cut. It solves the problem of trading off the confidence of the classification decisions for "discourse coherence" (Pang and Lee, 2004). The discourse constraint has the effect that high-confidence labels are propagated over the sequence. As a result, outliers with low confidence are

eliminated and we get a "smoother" label sequence. To compute minimum cuts, we use the push-relabel maximum flow method (Cherkassky and Goldberg, 1995).
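A minimal sketch of the discourse network construction is given below, using networkx to represent the flow network; the individual weights ind_sr and ind_snr are assumed to be supplied by the respective confidence measure.

```python
import networkx as nx

def build_discourse_network(ind_sr, ind_snr, c=1.0):
    """Flow network over one document's sentences for the MinCut segmentation.

    ind_sr[i], ind_snr[i]: individual weights of sentence i towards SR / SNR
    c: scaling constant of the squared-decay association weights
    """
    n = len(ind_sr)
    g = nx.DiGraph()
    for i in range(n):
        g.add_edge("s", i, capacity=ind_sr[i])           # source represents SR
        g.add_edge(i, "t", capacity=ind_snr[i])          # sink represents SNR
        for j in range(i + 1, n):
            g.add_edge(i, j, capacity=c / (j - i) ** 2)  # association edge, squared decay
    return g

# Example: cut_value, (sr_side, snr_side) = nx.minimum_cut(g, "s", "t")
```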

Unsupervised Parameter Optimization

We need to find values for multiple free parameters related to the sequence model. Unfortunately, we cannot perform supervised optimization on held-out data as we do not have any labeled data that includes sequence information. We therefore resort to a proxy measure. We first define a run as a sequence of sentences with the same label. The run count is the number of runs that occur within a document. For example, the excerpt shown in Figure 6.1 contains two runs, a sentiment nonrelevant and a sentiment relevant one. With this definition, we can compute the median and mean run counts in the corpus, referring to them as the median run count and mean run count, respectively. We set each parameter p to the value that produces a median run count for the predicted labels that is closest to the median run count in the corpus. In case of a tie, we also look at the proximity of the mean run count. We assume that the optimal median run count is known. In practice, it can be estimated from a small number of documents.15 The median run count is non-differentiable, but we do not expect it to vary strongly under small changes of p, so we use grid search to find its optimal value.
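This proxy measure can be computed with a few lines of code. The sketch below assumes a function predict(p) that returns the predicted label sequences for a parameter value p; the mean-run-count tie-breaking is omitted.

```python
from statistics import median

def run_count(labels):
    """Number of maximal runs of identical labels in one document."""
    return sum(1 for i, label in enumerate(labels) if i == 0 or label != labels[i - 1])

def select_parameter(candidates, predict, target_median):
    """Grid search: pick the value whose median run count is closest to the target."""
    def gap(p):
        counts = [run_count(doc_labels) for doc_labels in predict(p)]
        return abs(median(counts) - target_median)
    return min(candidates, key=gap)
```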

6.3.2 Feature Extraction

Choosing features is crucial in situations where no high-quality training data is available, as is the case in both distant supervision and transfer learning. We therefore introduce features which are more likely to support knowledge transfer. We are primarily interested in features that are robust and support generalization. We propose two linguistically motivated feature types for sentiment relevance classification that meet these requirements.

15Later experiments show that the estimate does not need to be too exact and can be “eyeballed”.


Generalization through Semantic Features

To generalize over concepts, we make use of lexical taxonomies. A set of generalizations can be found by making a cut in the taxonomy hierarchy and defining the concepts there as base classes. For nouns, the taxonomy is WordNet (Miller, 1995), for which CoreLex (CX, Buitelaar, 1998) gives a set of basic types and classes. For verbs, we use VerbNet (VN, Kipper et al., 2008), which already contains base classes. These resources are described in more detail in Appendix A.2.2.

In order to generalize over verbs and nouns, it is necessary to have the corpus annotated with parts of speech. We use the automatic part-of-speech tagger by Bohnet (2010) to perform part-of-speech tagging as well as lemmatization (see Appendix A.1.3 for a description of the tool). The tagger returns Penn Treebank tags (Marcus et al., 1993), so nouns in the corpus will have a tag matching the pattern NN.*, and verbs will match VB.*.

We extend the existing feature representation by adding the base classes from the taxonomies as follows. For each noun, we look up the class in CoreLex and add it as a feature. For example, the noun teacher has the base class human, so we add human as a separate feature, which we write as CX:human. For each verb, we look up the base class in VerbNet and add it as a feature. For example, the verb suggest occurs in the VerbNet base class say, so we add a feature VN:say to the feature representation. We refer to these feature sets as CoreLex (CX) and VerbNet (VN) features and to their combination as semantic features (SEM).
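A minimal sketch of this feature augmentation, assuming the CoreLex and VerbNet mappings are available as plain dictionaries:

```python
def add_semantic_features(tagged_lemmas, corelex, verbnet):
    """Append CX:/VN: base-class features to the lexical features of a sentence.

    tagged_lemmas: list of (lemma, pos) pairs produced by the tagger
    corelex:       dict mapping noun lemmas to CoreLex basic types
    verbnet:       dict mapping verb lemmas to VerbNet base classes
    """
    features = [lemma for lemma, _ in tagged_lemmas]      # plain lexical features
    for lemma, pos in tagged_lemmas:
        if pos.startswith("NN") and lemma in corelex:
            features.append("CX:" + corelex[lemma])       # e.g. teacher -> CX:human
        elif pos.startswith("VB") and lemma in verbnet:
            features.append("VN:" + verbnet[lemma])       # e.g. suggest -> VN:say
    return features
```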

Named Entities

It has been recognized that in the movie domain, certain named entities correlate with either plot description or review content (Zhuang et al., 2006). Thus, using named entities as features in a sentiment relevance classifier is promising. However, as standard NER systems do not capture named entity types that are specialized to the movie domain, we cannot apply an out-of-the-box NER system. Instead, we opt for a lexicon-based approach similar to that of Zhuang et al. (2006). We use the IMDb movie metadata database (Appendix A.2.3) from which we extract three different types of named entities:


• actors

• directors, screenwriters, and composers

• characters from movies

Many entries are unsuitable for NER, e.g., dog and man are frequently listed as characters. We address this problem by filtering out all entries in the database that also appear in lower case in a list of English words extracted from the dict.cc dictionary (Appendix A.2.3). A name can be ambiguous between the types. For example, John Williams is a frequent name that occurs in all three categories. We disambiguate by calculating the maximum likelihood estimate of

$$p(t \mid n) = \frac{f(n, t)}{\sum_{t'} f(n, t')}$$

over all entries in the database, where n is a name, t is one of the three types, and f(n, t) is the number of times n occurs in the database as type t. This estimate helps to disambiguate many examples for which a humanly salient choice exists. For our example, we learn that John Williams is most likely to be of the personnel type, corresponding with the association that the name John Williams in the movie business most likely refers to the composer.

We also calculate these probabilities for all tokens that make up a name. While this may cause errors, it can also help in many cases where the name obviously belongs to a type. Skywalker (as in Luke Skywalker), for example, is very likely a character reference.

In addition, we use a set of simple co-reference rules to propagate annotations to related terms. If a capitalized word occurs, we check whether it is part of an already recognized named entity. For example, if we encounter Robin and we previously encountered Robin Hood, we assume that the two entities match. This rule has precedence over NER, so if a name matches a labeled entity, we do not attempt to label it through NER. Personal pronouns will match the most recently encountered named entity.

Data analysis reveals that apposition plays an important role in the structure of reviews. Many mentions of actors in parentheses are actually additional information for character mentions. As an example, in the sentence


(6.5) Tracy Flick (Reese Witherspoon) is an over-achiever [. . . ].

the actor name Reese Witherspoon is only mentioned to say who the character was played by. Thus, we always interpret a name preceding an actor in parentheses as a character mention. This serves two purposes. First, we avoid adding unhelpful features. Second, we can recognize character mentions for which IMDb provides insufficient information. We encode this case as a designated apposition feature. The set of all features described above is referred to as named entities (NE).
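The maximum likelihood disambiguation described above can be sketched as follows; db_entries stands for the (name, type) pairs extracted from the IMDb database, and the type names are illustrative.

```python
from collections import defaultdict

def type_probabilities(db_entries):
    """Maximum likelihood estimate p(t | n) from (name, type) database entries."""
    counts = defaultdict(lambda: defaultdict(int))
    for name, ne_type in db_entries:
        counts[name][ne_type] += 1
    probabilities = {}
    for name, type_counts in counts.items():
        total = sum(type_counts.values())
        probabilities[name] = {t: f / total for t, f in type_counts.items()}
    return probabilities

def most_likely_type(name, probabilities):
    """Resolve an ambiguous name to its most probable entity type."""
    if name not in probabilities:
        return None
    return max(probabilities[name], key=probabilities[name].get)
```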

Sequential Features

Following previous sequence classification work with maximum entropy models (e.g., Ratnaparkhi, 1996), we use selected features of adjacent sentences. If a sentence contains a feature F, we add the feature to the following sentence, indicating that it was inherited from the previous sentence. We write this new feature as F+1. For example, if a character feature occurs in a sentence, its +1 counterpart is added to the following sentence. For sentiment relevance classification, we perform this operation only for NE features as they are restricted to a few classes and thus will not enlarge the feature space notably. We refer to this feature set as sequential features (SQ).
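A minimal sketch of this propagation step; the NE feature prefixes used here are illustrative placeholders for the actual feature names.

```python
NE_PREFIXES = ("actor", "personnel", "character")   # illustrative NE feature names

def add_sequential_features(sentence_features):
    """Copy NE features of each sentence to its successor with a '+1' suffix.

    sentence_features: list of feature lists, one per sentence, in document order.
    """
    augmented = [list(features) for features in sentence_features]
    for i in range(1, len(sentence_features)):
        inherited = [f + "+1" for f in sentence_features[i - 1]
                     if f.startswith(NE_PREFIXES)]
        augmented[i].extend(inherited)
    return augmented
```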

6.4 Distant Supervision

Since a large labeled resource for sentiment relevance classification is not yet available, we investigate semi-supervised methods for creating sentiment relevance classifiers. In this section, we show how to bootstrap a sentiment relevance classifier by distant supervision (DS). Even though we do not have sentiment relevance annotations, there are sources of metadata for the movie domain that we can leverage for distant supervision. Specifically, movie databases like IMDb contain both metadata about the plot, in particular the characters of a movie, and metadata about the "creators" who were involved in the production of the movie: actors, writers, directors, and composers. We will now make the following assumptions: Statements about characters usually describe the plot and are not sentiment relevant. Statements about the

creators tend to be evaluations of their contributions – positive or negative – to the movie. Based on these assumptions, we formulate a classification rule: Count occurrences of NE features and label sentences that contain a majority of creator features, i.e., actor and personnel features, as SR, and sentences that contain a majority of character features as SNR. Ties are labeled with the majority class, SR. This is a form of distant supervision in that we use the IMDb metadata database as described in Section 6.3.2 to automatically label sentences based on which features from the database they contain.
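The labeling rule amounts to a simple count comparison, sketched below with illustrative feature type names:

```python
def distant_label(ne_features):
    """Label one sentence by the majority of creator vs. character NE features.

    ne_features: list of NE feature types found in the sentence,
                 e.g. ["actor", "character", "character"]. Ties default to SR.
    """
    creators = sum(1 for f in ne_features if f in ("actor", "personnel"))
    characters = sum(1 for f in ne_features if f == "character")
    return "SNR" if characters > creators else "SR"
```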

6.4.1 Initial Distant Supervision Experiment

The simple labeling rule described above covers 1583 sentences with an F̄1 score of 67.2%. We call these labels, inferred from NE metadata, distant supervision (DS) labels. To increase coverage, we train a MaxEnt classifier (Stanford classifier, cf. Appendix A.1.1) on the labeled examples. This model achieves an F̄1 of 61.2% on the SR corpus (Table 6.3, line 2). As this classifier uses training data that is biased towards a specialized case (sentences containing the named entity types actors and characters), it does not generalize well to other sentiment relevance problems and thus yields lower performance on the full dataset.

This distant supervision setup suffers from two issues. First, the classifier only sees a subset of examples that contain named entities, making generalization to other types of expressions difficult. Second, there is no way to control the quality of the input to the classifier, as we have no confidence measure for our distant supervision labeling rule. We will address these two issues by introducing an intermediate step, the unsupervised sequence model introduced in Section 6.3.1. We thus first apply MinCut as described in the following paragraphs and then select the most confident examples as training material to bootstrap the supervised classifier.

6.4.2 Further Experimental Setup

MinCut Setup

We follow the general MinCut setup described in Section 6.3.1. Each document is represented as a graph of sentence nodes and special source/sink nodes which

represent SR/SNR. As explained above, we next assume that actors and directors indicate relevance and characters indicate nonrelevance. Accordingly, we define f_SR to be the number of actor and personnel features occurring in a sentence, and f_SNR the frequency of character features. We then set the individual weight between a sentence node i and the source/sink nodes to ind(i, x) = f_x where x ∈ {SR, SNR}. The MinCut parameter c is set to 1; we wish to give the association scores high weights as (i) the individual scores are unreliable due to sparseness of the named entity features, and (ii) there might be long spans that have individual weights with zero values. Figure 6.2 shows an example graph created with this method. We will discuss this example in more detail during error analysis.

Confidence-Based Data Selection

We use the output of the base classifier to train supervised models. Since the MinCut model is based on a weak assumption, it will make many false decisions. To eliminate them, we use only those documents as training data where the base classifier has confidence. As the confidence measure for a document, we use the maximum flow value v – the maximal "amount of fluid" flowing through the document. The max-flow min-cut theorem (Ford and Fulkerson, 1956) states that if the flow value is low, then the cut was found by cutting edges with a lower weight sum. Thus, we assume that it was easier to calculate. This means that the document is more likely to have been segmented correctly. Following this assumption, we train a MaxEnt classifier on the k% of documents that have the lowest maximum flow values v, where k is a parameter which we optimize using the run count method introduced in Section 6.3.1.16
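A minimal sketch of this selection step; flow_values are assumed to be the maximum flow values returned by the MinCut runs, and the default k = 0.4 corresponds to the optimum found by the run count method (footnote 16).

```python
def select_training_documents(documents, flow_values, k=0.4):
    """Keep the fraction k of documents with the lowest maximum flow values.

    flow_values[i] is the max-flow value of the MinCut run on documents[i];
    lower values are treated as higher segmentation confidence.
    """
    order = sorted(range(len(documents)), key=lambda i: flow_values[i])
    cutoff = int(k * len(documents))
    return [documents[i] for i in order[:cutoff]]
```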

6.4.3 Experiments and Results

Table 6.3 shows sentiment relevant (F1(SR)), sentiment nonrelevant (F1(SNR)), and macro average (F̄1) F1 values for different setups with this parameter. We compare the following basic setups:

16Grid search on [0, 100] with a step size of 10 yields an optimum for k = 40% corresponding to a maximum flow value v ≤ 4000.


Figure 6.2: Example distant supervision graph for the negative document #671 from the Pang & Lee movie corpus showing edge weights from the IMDb database. Individual edges are shown in blue, association edges in green. Character NE’s are shown in purple, personnel NE’s in orange. Phrases in parentheses containing an NE type change the label and are marked accordingly.


  Model                Features    F1(SR)  F1(SNR)  F̄1
1 Majority BL          –           88.3    0.0      44.2
2 MaxEnt (DS labels)   NE          79.8    42.6     61.2^1
3 DSMinCut             NE          79.6    48.2     63.9^12
4 DSMinCut + MaxEnt    NE          84.8    46.4     65.6^12
5 DSMinCut + MaxEnt    NE+SEM      85.2    48.0     66.6^124
6 DSMinCut + MaxEnt    NE+SQ       84.8    49.2     67.0^1234
7 DSMinCut + MaxEnt    NE+SQ+SEM   84.5    49.1     66.8^1234

Table 6.3: Distant supervision classification results: F1(SR), F1(SNR), and F̄1. Superscript numbers indicate a significant improvement over the corresponding line.

1. The majority baseline (BL) when choosing the most frequent label (SR).

2. A MaxEnt baseline trained on DS labels without application of MinCut.

3. The base classifier using MinCut as described above (referred to as DSMinCut).

Further, we train supervised MaxEnt classifiers (lines 4–7) based on the labels from DSMinCut (line 3), using the named entity (NE), semantic (SEM), and sequential (SQ) features introduced in Section 6.3.2. We test statistical significance using the approximate randomization test with document permutations at p < .05. We achieve classification results above the baseline using the DSMinCut base classifier (line 3) and a considerable improvement through distant supervision. We found that all classifiers using DS labels and MinCut are significantly better than MaxEnt trained on purely rule-based DS labels (line 2). Also, the MaxEnt models using SQ features (lines 6 and 7) are significantly better than the MinCut base classifier (line 3). For comparison to a chain-based sequence model, we train a 2nd order Conditional Random Field classifier (CRF, McCallum, 2002). However, the improvements over the corresponding MaxEnt models are small (≤ 1%) and not statistically significant.

Figure 6.3: Base classifier results by document: F̄1 measure vs. flow. Green line: linear regression, red line: locally weighted regression.

We found that both semantic (lines 5,7) and sequential (lines 6,7) features help to improve the classifier. The best model (line 6) performs better than MinCut (3) by 3.1% and better than training on purely rule-generated DS labels (line 2) by 5.8%. However, we did not find a cumulative effect (line 7) of the two feature sets. We will examine minimum cut and the semantic features in more detail next.

Minimum Cut

To verify that our parameter optimization method selected a reasonable flow value threshold, we investigate the correlation of F̄1 and flow. Figure 6.3 shows F̄1 and the flow value for each document as a point in the plot. Fitting a linear regression

model (green line) to the data reveals that lower flow values are correlated with slightly higher F̄1. As the relation between the variables does not appear to be strictly linear, we apply locally weighted regression (red curve, Cleveland and Devlin, 1988). It shows two peaks around flow values of 2500 and 12000. This suggests that the maximum flow value threshold of 4000 is a reasonable choice – F̄1 tends to decrease rapidly after this point and only recovers much later.

Looking at the sentence graphs in more detail, we can identify several reasons for why the model fails in some cases. Take the graph shown in Figure 6.2 as an example. First, ambiguous examples are difficult to handle, such as the 1st and 6th sentences, which contain both sentiment relevant and nonrelevant content. One possibility to address this issue would be a better segmentation of the data – possibly supported by a sentiment relevance classifier – at the sub-sentence level. Alternatives may include machine learning extensions such as hierarchical models. A second problem is that in cases like the example, there are large gaps between nodes with correct seed information. In the example, only the first, second, and last node actually contribute a non-zero individual score, which makes the prediction of the correct labels for nodes in the middle more difficult. This problem arises due to our choice of named entities as the source of distant supervision, which, while being helpful indicators, occur only sparsely in reviews. Third, the distant supervision rules are noisy, as is evident in the 1st sentence, which contains an actor mention despite being sentiment nonrelevant.

Semantic Features

We will next analyze the structure of the bootstrapped classifiers. Examining the highest-weighted features of the best model using DS labels (line 6) shows that NE features are dominant (Table 6.4). This correlation is not surprising as the seed labels were induced based on NE features. Thus, it may as well be a sign of overfitting. The top weighted feature shown in the table is the actor apposition feature resulting from our parentheses rule. As expected, this feature and its sequential counterpart are good indicators for non-relevant content. Words like see and theater that are commonly used to describe a movie theater


feature          wSR
(NE feature)     1.83
(NE feature)     1.27
(NE feature)    -1.15
(NE feature)     1.01
(NE feature)    -0.86
(NE feature)     0.85
(NE feature)    -0.81
see              0.80
remember         0.78
theater          0.72
bad              0.70
well            -0.70
kilgore          0.61
previews         0.60
tame             0.57
debate           0.54
delightful       0.53
dangerous        0.53
thrown           0.53
things           0.53
(NE feature)     0.42

Table 6.4: 20 highest weighted features and NE & SQ features in the best-performing model. wSR is the weight in the sentiment relevant class.

visit are considered highly relevant. Also relevant are subjective words, e.g., bad, tame, and delightful. Interestingly, some subjective features have high weights for sentiment nonrelevance, such as horrible with wSNR = 0.32. These words could be thematically fitting for non-relevant content; however, this could also indicate that the classifier is overfitting to the training data.

Generally, the quality of NER is crucial in this task. While IMDb is in general a thoroughly compiled database, it is not perfect. For example, all main characters in Groundhog Day are listed with their first name only even though the full names are given in the movie, which poses a difficulty for our pattern matching NER approach. Some entries are even left incomplete intentionally to avoid spoiling the plot. The data also contains ambiguities between characters and titles (e.g., Forrest Gump) that are impossible to resolve with our maximum likelihood method. In some types of movies, e.g., documentaries, the distinction between characters and actors makes little sense. Furthermore, ambiguities like occurrences of common names such as John are impossible to resolve if there is no earlier full referring expression.

6.4.4 Conclusion

The results of our experiments using distant supervision show that a sentiment relevance classifier can be trained successfully by labeling data with a couple of simple feature rules, with MinCut-based input significantly outperforming the baseline. Named entity recognition, accomplished with data extracted from a domain-specific database, plays a significant role in creating an initial labeling.

6.5 Transfer Learning

One reason for the low performance of the distant supervision approach is that it uses only a small amount of training data. Less than 50% of the corpus is selected for training. To address this problem, we now investigate a second semi-supervised method for sentiment relevance classification, transfer learning (TL). In our transfer learning setting, we will train a classifier on a related but different task and use it for sentiment relevance prediction. This way, we can exploit external

labeled data which might be available in significantly larger quantities. In transfer learning, the key to success is to find a generalized feature representation that supports knowledge transfer. We use our semantic feature generalization method to introduce such features. We again use MinCut to impose discourse constraints. This time, we first classify the data using a supervised classifier and then use MinCut to smooth the sequences.

6.5.1 Experimental Setup

Data, Feature Extraction, and Classification Setup

For training our supervised base classifier, we use the P&L movie subjectivity corpus (Pang and Lee, 2004, cf. Appendix A.2.1). For evaluation, we use the SR corpus. The P&L corpus consists of 5000 highly subjective quote review snippets from Rotten Tomatoes and 5000 plot sentences from IMDb plot descriptions which the authors use as objective data. As the data was collected automatically using heuristics, the resulting labels can be viewed either as noisy sentiment relevance labels or as noisy subjectivity labels. Compared to distant supervision on a database, the advantage of training on P&L is that the training set is much larger, containing around 7 times as much data as was selected for supervised training in the distant supervision experiment. We assume that the quote examples are sentiment relevant and the plot examples are sentiment nonrelevant, which constitutes a transfer learning setup from subjectivity to sentiment relevance. A drawback of the published version of the P&L corpus is that capitalization was removed, which makes the extraction of NE features difficult. For this reason, we extract only the semantic features (SEM). The baseline (BL) uses a simple bag-of-words representation of sentences for classification which we then extend with semantic features. We train a MaxEnt classifier for each of the feature sets.
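As an illustration, a minimal version of such a base classifier can be set up as follows (a sketch assuming scikit-learn; LogisticRegression serves as the MaxEnt model, and appending semantic class tokens such as "CX:hum" to each sentence is a simplified stand-in for our SEM feature extraction):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_base_classifier(sentences, labels):
    # Bag-of-words MaxEnt baseline; the permissive token pattern keeps
    # appended class tokens like "CX:hum" or "VN:murder" intact.
    clf = make_pipeline(
        CountVectorizer(token_pattern=r"[^\s]+", lowercase=True),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(sentences, labels)
    return clf

# clf = train_base_classifier(train_sentences, train_labels)
# p_sr = clf.predict_proba(test_sentences)[:, 1]  # probability of the SR class if labels are {0: SNR, 1: SR}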



Figure 6.4: Example transfer learning graph for the negative document #671 from the Pang & Lee movie corpus showing edge weights from the best-performing model. Individual edges are shown in blue, association edges in green.


                base classifier                       MinCut
  Model    F1(SR)  F1(SNR)   F̄1          F1(SR)  F1(SNR)   F̄1
1 BL       81.1    58.6      69.9         87.2    67.8      77.5 B
2 CX       82.9    60.1      71.5 B       89.0    70.3      79.7 BM
3 VN       85.6    62.1      73.9 B       91.4    73.6      82.5 BM
4 CX+VN    88.3    62.9      75.6 B       92.7    72.2      82.5 BM

Table 6.5: Transfer learning classification results: F1(SR), F1(SNR), and F̄1. B indicates a significant improvement over the BL base classifier, M over BL MinCut.

MinCut Setup

We again implement the basic MinCut setup from Section 6.3, constructing a graph for each document. We set the individual weight ind(i, x) on the edge between sentence node i and source or sink node x ∈ {SR, SNR} to the estimate p(x|i) returned by the MaxEnt base classifier. Figure 6.4 shows an example of a graph constructed in this setup. The parameter c of the MinCut model is tuned using the run count method described in Section 6.3.1.17
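The following sketch illustrates this construction (assuming the networkx max-flow implementation; the association weight between adjacent sentence nodes is simplified here to the constant parameter c rather than the weighting described in Section 6.3.1):

import networkx as nx

def mincut_labels(probs_sr, c=0.8):
    # probs_sr[i] = p(SR | sentence i) from the MaxEnt base classifier.
    g = nx.DiGraph()
    n = len(probs_sr)
    for i, p in enumerate(probs_sr):
        # Individual weights: edge from the source (SR) and edge to the sink (SNR).
        g.add_edge("SR", i, capacity=p)
        g.add_edge(i, "SNR", capacity=1.0 - p)
    for i in range(n - 1):
        # Association edges between neighboring sentences impose the
        # discourse constraint; both directions so either cut direction pays c.
        g.add_edge(i, i + 1, capacity=c)
        g.add_edge(i + 1, i, capacity=c)
    _, (sr_side, _snr_side) = nx.minimum_cut(g, "SR", "SNR")
    return {i: ("SR" if i in sr_side else "SNR") for i in range(n)}

# A low-confidence sentence surrounded by confident SR sentences is pulled
# to the SR side by its neighbors:
# mincut_labels([0.9, 0.45, 0.95, 0.05])  ->  {0: 'SR', 1: 'SR', 2: 'SR', 3: 'SNR'}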

6.5.2 Experiments and Results

Table 6.5 shows the performance of the MaxEnt base classifiers and MinCut classifiers for different feature sets. As we would expect, the baseline performance of the supervised classifier evaluated on the SR corpus is low: 69.9% (Table 6.5, line 1). MinCut significantly boosts the performance by 7.9% to 77.5% (line 1), a result similar to that of Pang and Lee (2004). Adding semantic features improves supervised classification significantly by 5.7% (75.6% on line 4). When MinCut and both types of semantic features are used together, these improvements are partially cumulative: an improvement over the baseline by 12.6% to 82.5% (line 4). In the following sections, we investigate the influence of both aspects in more detail.

17 Grid search on [0.1, 1] yields values of c ∈ {0.8, 0.9}, depending on the experiment (cf. Figure 6.5).

[Figure 6.5: four panels, (a) BL, (b) CX, (c) VN, (d) CX+VN, each plotting F̄1 and median/mean run count against the parameter c.]

Figure 6.5: F̄1 measure for different values of c. Horizontal line: optimal median run count. Circle: selected point.


Minimum Cut

To illustrate the run-based parameter optimization criterion, we show F̄1 and median/mean run lengths for different values of c for the four setups from Table 6.5 in Figure 6.5. Due to differences in the base classifier, the optimum of c varies between the experiments. The optimization criterion does not always correlate perfectly with F̄1. However, we found no statistically significant difference between the selected result and the highest F̄1 value.

We also experiment with a training set where an artificial class imbalance is introduced, matching the 80:20 imbalance of SR:SNR in the sentiment relevance corpus. The result of this experiment is shown in Table 6.6. The BL and CX base classifiers perform significantly better than their balanced counterparts (cf. Table 6.5). However, with the introduction of VN, results drop markedly. This may be attributed to a bias towards the majority class that is amplified when using the VN feature set. After applying MinCut, we found that while the results for BL with and without imbalances do not differ significantly, models using CX and VN features and imbalances are actually significantly inferior to the respective balanced versions. While the artificially imbalanced models have the potential to perform better in the base classifier setup, the balanced models seem to work better in combination with MinCut, producing the best overall model. This result suggests that MinCut is more effective at coping with class imbalances than artificial adjustments to the class distribution.

This experiment shows why MinCut is successful for transfer learning. It can exploit test set information without supervision as the MinCut graph is built directly on each test set review. If high-confidence information is "seeded" within a document and then spread to neighbors, mistakes with low confidence are corrected. This way, MinCut also leads to a compensation of differences in class distribution.

Once again, we conduct some error analysis of the graphs produced by the model. Looking at Figure 6.4, we see that, as expected, not all classification decisions made by the base classifier were correct, such as the 5th sentence (". . . dreary lighting . . . "), which received an SNR label. However, since the sentence occurs in the neighborhood of nodes that are mostly predicted to be SR with high confidence, this mistake can be corrected through MinCut. In turn, the MinCut model may be misled by


                base classifier                       MinCut
  Model    F1(SR)  F1(SNR)   F̄1          F1(SR)  F1(SNR)   F̄1
1 BL       89.3    60.4      74.8 †       92.6    65.9      79.3
2 CX       90.0    62.2      76.1 †       92.4    64.2      78.3
3 VN       90.3    57.9      74.1         91.6    54.6      73.1 †
4 CX+VN    90.3    58.6      74.5         90.3    58.6      74.5 †

Table 6.6: Transfer learning results after introducing an imbalance to the training set: F1(SR), F1(SNR), and F̄1. † indicates a significant difference to the respective cell in Table 6.5.

confident decisions for ambiguous nodes like the one representing the 6th sentence.

Semantic Features

The results presented above are evidence that semantic features are robust to the differences between the classification tasks subjectivity and sentiment relevance that we documented in Table 6.1. We will now take a look at the highest-weighted features of the best-performing model (Table 6.7). Meaningful semantic feature classes receive high weights. We will briefly review some of them. A table explaining the relevant CoreLex class abbreviations can be found in Appendix A.2.2. For nouns, for example, the human groups class (CX:hum) from CoreLex contains, among other entries, professions that are frequently associated with non-relevant plot descriptions. CX:gsl contains nationality nouns. High-weighted VerbNet base classes include VN:murder and VN:cope, containing verbs relevant to plot descriptions (e.g., kill and manage), which are highly nonrelevant. The base class VN:light_emission contains verbs such as shine which also have a subjective meaning, which is why the classifier selects it as a good feature for sentiment relevance. This result may be a sign of the metaphorical nature of sentiment (e.g., bright is good, dark is bad, as described by Lakoff and Johnson, 1980). Interestingly, pronouns are recognized as strong features. I is a good feature for sentiment relevant content as self-reference indicates that the


feature        wSR      feature               wSR
–              1.56     CX:agh               -0.70
its            1.17     CX:pas               -0.70
director       1.01     VN:tape               0.66
he            -1.00     VN:occurrence        -0.65
book          -0.99     VN:cope              -0.60
material       0.99     CX:gsl               -0.59
interesting    0.98     CX:lor               -0.57
screen         0.98     VN:battle            -0.56
she           -0.97     CX:ara                0.54
here           0.96     VN:coil               0.54
my             0.94     VN:light_emission     0.51
performance    0.93     VN:begin             -0.48
they          -0.93     CX:atp                0.47
him           -0.92     CX:hum               -0.47
-             -0.91     VN:turn              -0.46
evil          -0.90     VN:engender          -0.45
film's         0.88     CX:atr                0.45
i              0.88     VN:appeal             0.44
characters     0.87     VN:establish         -0.44
kung-fu       -0.86     VN:murder            -0.43

Table 6.7: 20 highest-weighted features in the best-performing model. Left: all features, right: CX and VN features. wSR is the weight in the sentiment relevant class.

reviewer is stating their opinion. Third-person pronouns (he, she, and they) occur more frequently in sentiment non-relevant examples, mostly in long passages with plot descriptions.

6.5.3 Conclusion

These experiments demonstrate that sentiment relevance classification improves considerably through transfer learning if semantic feature generalization and unsupervised sequence classification through MinCut are applied. We found that our unsupervised selection criterion performs well and provided further analysis of the relation of MinCut to class distribution balancing. We showed that our feature sets provide useful generalization over semantic concepts.

6.6 Related Work

6.6.1 Related Concepts

Conceptually, our work is most closely related to that of Taboada et al. (2009), who define a fine-grained classification that is similar to sentiment relevance on the highest level. However, unlike our study, they do not experimentally compare their classification scheme to prior work to show that this scheme is different. In addition, they work on the paragraph level. However, paragraphs often contain a mix of sentiment relevant and sentiment nonrelevant sentences. We use the minimum cut method and are therefore able to incorporate discourse-level constraints in a more flexible fashion, giving preference to "relevance-uniform" paragraphs without mandating them. Täckström and McDonald (2011) develop a fine-grained annotation scheme that includes sentiment nonrelevance as one of five categories. However, they do not use the category sentiment nonrelevance directly in their experiments and do not evaluate classification accuracy for it. We do not use their dataset as we target the movie domain for the aforementioned quality reasons. Changing both the domain (movies to products) and the task (subjectivity to sentiment relevance) would give rise to interactions that we wanted to avoid in our study.


The notion of annotator rationales (Zaidan et al., 2007) has some overlap with our notion of sentiment relevance. Yessenalina et al. (2010) use rationales in a multi-level model to integrate sentence-level information into a document classifier. Neither paper presents a direct gold standard evaluation of the accuracy of rationale detection. Sentiment relevance is also related to review mining (e.g., Ding et al., 2008) and similar sentiment retrieval techniques (e.g., Eguchi and Lavrenko, 2006) in that they aim to find phrases, sentences, or snippets that are relevant for sentiment, either with respect to certain features or with a focus on high-precision retrieval (cf. Liu, 2010). However, finding a few sentiment relevant items with high precision is much easier than the task we address: exhaustive classification of all sentences. In the past, subjectivity has been used to detect nonrelevant content, so we view it as a related concept as well. Many publications have addressed subjectivity in sentiment analysis. Two important papers that are based on the original philosophical definition of the term (internal state of mind vs. external reality) are (Wilson and Wiebe, 2003) and (Riloff and Wiebe, 2003). As we argue above, if the goal is to identify parts of a document that are useful or non-useful for sentiment analysis, then sentiment relevance is a better notion to use. Researchers have implicitly deviated from the philosophical definition because they were primarily interested in satisfying the needs of a particular task. For example, Pang and Lee (2004) use a minimum cut graph model for review summarization. Because they do not directly evaluate the results of subjectivity classification, it is not clear to what extent their method is able to identify subjectivity correctly.

6.6.2 Distant Supervision and Transfer Learning

Täckström and McDonald (2011) solve a similar sequence problem by applying a distantly supervised classifier with an unsupervised hidden sequence component. Their setup differs from ours as our focus lies on pattern-based distant supervision instead of distant supervision using documents for sentence classification. They also work on a different task, polarity prediction. Transfer learning has been applied previously in sentiment analysis (Tan and Cheng, 2009), targeting polarity detection.

6.6.3 Feature Extraction

Named-entity features in movie reviews were first used by Zhuang et al. (2006), in the form of feature-opinion pairs (e.g., a positive opinion about the acting). They show that recognizing plot elements (e.g., script) and classes of people (e.g., actor) benefits review summarization. We follow their approach by using IMDb to define named entity features. We extend their work by introducing methods for labeling partial uses of names and pronominal references and taking into account the maximum likelihood estimate for disambiguation. We address a different problem (sentiment relevance vs. opinions) and use different methods (graph-based and statistical vs. rule-based).

6.6.4 Conclusion

In summary, no direct evaluation of sentiment relevance has been performed previously. One of our contributions is that we provide a single-domain gold standard for sentiment relevance, created based on clear annotation guidelines, and use it for direct evaluation. Another contribution is that we show that generalization based on semantic classes improves sentiment relevance classification. While previous work has shown the utility of other types of feature generalization for sentiment and subjectivity analysis (e.g., syntax and part-of-speech, Riloff and Wiebe, 2003), semantic classes have so far not been exploited.

6.7 Summary

A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We introduced sentiment relevance to make this distinction and argued that it better reflects the requirements of sentiment analysis systems. Our experiments

demonstrated that sentiment relevance and subjectivity are related but different. Since a large labeled sentiment relevance resource does not yet exist, we investigated semi-supervised approaches to sentiment relevance classification that do not require sentiment relevance labeled data. We showed that a combination of different techniques gives us the best results: semantic generalization features, discourse constraints implemented via the minimum cut graph-theoretic method, automatic "distant" labeling based on a domain-specific metadata database, and transfer learning to exploit existing labels for a related classification problem. There are several possible directions for future work on the sentiment relevance classification problem. First, the obvious combination of transfer learning and distant supervision has not been explored yet. Using both approaches jointly could potentially yield further improvements. In addition, distant supervision from the paragraph-labeled data by Taboada et al. (2009) is possible. Another research direction is domain adaptation. As mentioned above, a similar dataset exists for different product review domains. Using further feature-space modifying techniques that have been promising in polarity classification (e.g., Glorot et al., 2011), domain adaptation could be attempted for sentiment relevance.


7 Compositionality of Recursive Autoencoders

7.1 Introduction

In the previous chapters, we have addressed sentiment analysis exclusively with the clue model. We have experienced both upsides and downsides of the model. It works well even when faced with a small set of indicators; however, it becomes difficult to improve beyond that point. Most crucially, the model is very sensitive to domain (cf. Chapter 5) and topic effects (cf. Chapter 6). Further, sentiment may in general be expressed through statements of arbitrary complexity. Interestingly, we often find a clue at the core of such statements which is then modified throughout the structure. This has been recognized previously, for example by Polanyi and Zaenen (2006) who formalize these mechanisms of modification as "polarity shifters" (cf. Section 3.2). There have been efforts to automatically learn polarity shifters (e.g., Choi and Cardie, 2008; Wiegand et al., 2010); however, most of them focus on specific syntactic constructions and thus have very limited coverage. Recent research on fully compositional models for sentiment analysis (e.g., Yessenalina and Cardie, 2011) promises a holistic approach that addresses these limitations. In this chapter, we will provide a more in-depth analysis of a compositional model that makes use of neural networks. Deep neural networks (DNNs) have been gaining increasing attention in Natural Language Processing. Many of the traditional NLP tasks have been addressed, such as syntactic parsing (Socher et al., 2010), semantic role labeling (Collobert et al., 2011), machine translation (Deselaers et al., 2009), and document classification (Glorot et al., 2011). A frequently cited DNN model for NLP tasks was introduced by Socher et al. (2011): the Semi-Supervised Recursive Autoencoder (RAE). This model pursues a pairwise word composition strategy, representing each word as a vector and

recursively applying an autoencoder function to unify pairs of nodes, yielding a binary tree. One appealing property of RAEs is that the resulting structure lends itself to a syntactic or even semantic interpretation. However, so far, no in-depth analysis of RAEs in terms of these structures has been performed. Our main interest in this chapter is to analyze the behavior of RAEs as a compositional model. We will investigate the following key questions: (i) Are the structures produced by RAEs interpretable syntactically or semantically? (ii) Can the structures be simplified? Addressing these questions will show whether the RAE is compositional both from a linguistic and a machine learning point of view. We will analyze these points empirically in a sentiment classification task for which the RAE has been applied previously (Socher et al., 2011). We introduce two methods for analysis. First, we let humans rate the structures according to syntactic and semantic criteria. Second, we try to simplify the RAE structures automatically and evaluate the resulting models on a classification task. The rest of the chapter is structured as follows. In Section 7.2, we describe RAEs, particularly highlighting some details regarding implementation. We then introduce different ways of structural simplification in Section 7.3. We present two types of experiments in Section 7.4. Section 7.4.2 contains error analysis of RAEs conducted by human annotators. In Section 7.4.3 we carry out the experiments on structural simplifications.

7.2 Semi-Supervised Recursive Autoencoders

The central model in this chapter is the Semi-Supervised Recursive Autoencoder (RAE, Socher et al., 2011). This section describes this model and discusses some important implementation details.18 The RAE is a structural model for sequences. It recursively applies an autoencoder to construct a tree structure over the words in a sentence. Each word, i.e., each leaf of the tree, is represented by a vector which is independent of the context in which the word occurs. Each non-leaf node represents a feature vector which is the encoding yielded by an autoencoder of the feature vectors of its children.

18 Some of these details can be found in the RAE implementation available at http://www.socher.org/.



Figure 7.1: Encoding steps of the recursive autoencoder model. Note that all nodes are hidden as the input representations can be changed during training.

The model is semi-supervised as the target task can influence the representations. In addition to the usual autoencoder optimization objective of minimizing the reconstruction error, the model is also jointly trained to minimize the classification error made at each node in an external task. Compared to models of compositionality in sentiment analysis, the RAE reflects both aspects of the clue model and the polarity shifter model. Each node in the tree has a polarity through the classification model (i.e., it serves as a clue). The representation, and thus the polarity, of non-terminal nodes is influenced by pairwise composition, which can be viewed as a shifter operation.

The basic representation of each word $w_i$ is a randomly initialized vector $r_{w_i}$ of dimensionality D.19 The vectors are stored in a matrix R where every row represents one word. This representation is enhanced using an embedding matrix L of the same structure which is optimized during training. The final representation of a word w is obtained as the sum of its entries in R and L,

x_w = r_w + l_w.   (7.1)

19In the original publication, initialization with pre-trained NLM vectors resulted in a slight improvement.
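As a concrete illustration (a sketch assuming arrays R and L whose rows are indexed through a vocabulary dictionary; the names are ours):

def word_vector(w, vocab, R, L):
    # Equation 7.1: random base vector plus learned embedding for word w.
    i = vocab[w]
    return R[i] + L[i]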


The word vectors x_w make up the leaf nodes of the tree. Trees are then constructed by iteratively joining two adjacent nodes using an autoencoder with the usual two-layer architecture. Figure 7.1 illustrates the forward pass of the encoding layers.

The encoding layer takes the feature vectors of two nodes, n1 and n2, and outputs a combined representation r:

f^{(enc)}([n_1, n_2]) = h^{(enc)}([n_1, n_2] W^{(enc)} + b^{(enc)})

r = f^{(enc)}([n_1, n_2]) / || f^{(enc)}([n_1, n_2]) ||_2

Socher et al. (2011) argue that normalization of the representation vector serves to compensate for n-gram length. Subsequently, the reconstruction layer tries to reproduce the original inputs after normalization of the outputs:20

f^{(rec)}(r) = h^{(rec)}(r W^{(rec)} + b^{(rec)})

[n'_1, n'_2] = f^{(rec)}(r) / || f^{(rec)}(r) ||_2

h^{(enc)} and h^{(rec)} are the (non-linear) activation functions of the respective layers. Socher et al. (2011) use h^{(enc)} = h^{(rec)} = tanh.21 W^{(enc)} and W^{(rec)} are the weight matrices, b^{(enc)} and b^{(rec)} the bias vectors of the respective layers. Note that D, the dimensionality of r, needs to be the same as that of n_1 and n_2 so that the autoencoder can be applied recursively. The resulting output of the autoencoder serves as the representation of a new node that has the two encoded nodes as its children. The combination operation is carried out greedily by first autoencoding the pair of nodes that has minimal reconstruction error E^{(rec)}. Reconstruction errors are calculated with an adaptation of squared error that penalizes errors at higher-level nodes less than lower-level nodes, which is controlled with a factor β. Each node output is then used to predict the sentence label individually using

20 Note that, as shown in the source code, the output vectors n'_1 and n'_2 are normalized independently.
21 Note that while the paper suggests a linear activation function for reconstruction in Equation 3, the code shows that tanh is being used for reconstruction as well.

a softmax layer

c = f^{(cl)}(r) = softmax(r W^{(cl)} + b^{(cl)}),   (7.2)

where W^{(cl)} is the classification weight matrix, b^{(cl)} the classification bias vector, and r the representation of a node. c is the prediction, a probability distribution over all possible classes. The error function E^{(cl)} on this classification task is cross-entropy. There is a trade-off between two objectives that are minimized: the reconstruction error E^{(rec)} that specifies how well the resulting node represents the two children, and the classification error E^{(cl)} that measures how well the correct label of the sentence can be predicted from the information at the node. The two objectives are interpolated linearly with a coefficient α. Note that the embedding matrix L is optimized with both objectives as well, so the basic word representations can also be changed depending on their contribution to the supervised classification output. The derivatives of the model are calculated with backpropagation through structure (Goller and Küchler, 1996). For batch optimization, arithmetic means of the errors Ē^{(rec)} and Ē^{(cl)} and the corresponding gradients are calculated over all nodes and all examples. Locally optimal weights are then found numerically through optimization with L-BFGS (Nocedal, 1980). The way the model is applied to predict classes for unseen data differs from the way used in training. Although a (node) classifier has been trained before, Socher et al. (2011) first perform a "feature extraction" step and then train a sentence classifier. In contrast to RAE training where the errors of the individual node predictions are averaged, they first calculate the arithmetic mean of all node features to get a single feature vector for the tree. Let n_1, ..., n_k be the features of the k nodes in a tree. Feature extraction returns

n̄ = (1/k) Σ_i n_i

as the combined vector representation of the tree. In the original implementation, the overall representation used for classification is the concatenation of the mean representation and the vector at the top node of the tree, n_top:

n = [n̄; n_top]
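A sketch of this feature extraction step (assuming NumPy and that the node vectors of one tree are stacked row-wise with the top node in the last row):

import numpy as np

def extract_features(node_vectors, use_top=False):
    # node_vectors: (k, D) array holding all node representations of one tree.
    mean_vec = node_vectors.mean(axis=0)             # the mean representation
    if use_top:
        # concatenation with the top node, as in the original implementation
        return np.concatenate([mean_vec, node_vectors[-1]])
    return mean_vec                                  # variant used in our experiments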


We found that using the top feature did not improve the results significantly, so we will leave this feature out in our experiments. Finally, a single-layer softmax neural network is trained on this representation on the training data:

f^{(cl2)}(n) = softmax(n W^{(cl2)} + b^{(cl2)})   (7.3)

The model is of the same type as the softmax model used during training, but the input representations are different – node averages instead of individual nodes. Taking the mean of all nodes is a pooling operation. Similar operations have been applied successfully in NLP previously, for example by Collobert et al. (2011) who calculate the maximum over the dimensions of their representation vectors. In preliminary experiments, we found that replacing the mean operation by a maximum operation is a possible alternative but hurts the results slightly. The RAE has several hyperparameters that need to be set: first, the interpolation weight α of the reconstruction and classification errors; second, the penalty β of higher-level node errors versus leaf errors. In addition, the weight matrices of each autoencoder layer and the softmax classifier are regularized with the L2 norm, giving rise to the regularization parameters σ^{(enc)}, σ^{(rec)}, and σ^{(cl)}.
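To make the construction procedure concrete, the following sketch implements the greedy pairing loop in plain NumPy. It is a simplification: the reconstruction error is unweighted squared error (the β-weighting and the joint classification objective are omitted), the independent normalization of the reconstructed children (footnote 20) is ignored, and all names are ours, not those of the reference implementation.

import numpy as np

def encode_pair(n1, n2, W_enc, b_enc, W_rec, b_rec):
    # One autoencoder step: encode two child vectors, normalize, and measure
    # how well the children can be reconstructed from the encoding.
    x = np.concatenate([n1, n2])
    r = np.tanh(x @ W_enc + b_enc)
    r = r / np.linalg.norm(r)
    reconstruction = np.tanh(r @ W_rec + b_rec)
    rec_error = np.sum((x - reconstruction) ** 2)
    return r, rec_error

def build_tree(leaves, W_enc, b_enc, W_rec, b_rec):
    # Greedily merge the adjacent pair with minimal reconstruction error
    # until a single root remains; returns the vectors of all created nodes.
    nodes = list(leaves)
    all_nodes = list(leaves)
    while len(nodes) > 1:
        candidates = [encode_pair(nodes[i], nodes[i + 1],
                                  W_enc, b_enc, W_rec, b_rec)
                      for i in range(len(nodes) - 1)]
        best = int(np.argmin([err for _, err in candidates]))
        r, _ = candidates[best]
        nodes[best:best + 2] = [r]
        all_nodes.append(r)
    return all_nodes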

7.3 Methods for Automatic Structural Simplification

In a complex model like the RAE it is difficult to see which parts are responsible for the results it achieves. In order to analyze the contributions of the automatically generated structures, we try to simplify them automatically. We will measure the performance of each model by evaluating it on a task. The result will then show whether the omission of the structure has an influence. We introduce four ways to structurally simplify RAE trees.

Tree Level Cuts

Our approach aims at determining the influence of higher level nodes in the tree. One straightforward operation that achieves this is a level cut where we remove


(a) Tree level cuts (b) Subtree selection

(c) Window selection (d) Random embeddings

Figure 7.2: Four methods for RAE simplification. Square nodes are random embeddings, first circled layers are embeddings. Green color indicates selection, the other color indicates omission.

nodes above a certain level. We count levels starting at the leaves, the basic units for the RAE. All terminal nodes t are defined to have level ℓ(t) = 1, and each non-terminal n with children ⟨c_1, c_2⟩ has the level ℓ(n) = max(ℓ(c_1), ℓ(c_2)) + 1. We implement level cuts by first computing the full tree and then pruning away all nodes that have a level

ℓ_n > ℓ_max. This process is sketched in Figure 7.2a, which shows a cut at ℓ_max = 2.
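A minimal sketch of the level computation and the cut, assuming a simple binary-tree node class (the names are ours, not those of the RAE implementation):

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right    # leaves have no children

def level(node):
    # Terminal nodes have level 1; inner nodes sit one level above their
    # highest child, exactly as defined above.
    if node.left is None and node.right is None:
        return 1
    return max(level(node.left), level(node.right)) + 1

def level_cut(node, l_max):
    # Drop every node whose level exceeds l_max; what remains is the list of
    # subtree roots (and leaves) kept for feature extraction.
    if level(node) <= l_max:
        return [node]
    return level_cut(node.left, l_max) + level_cut(node.right, l_max)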

Subtree Selection

Another approach to simplification follows from the idea that not every word is important for sentiment classification, but rather that there is a region in the

sentence, centered around a clue, that is sufficient to recognize sentiment (Tu et al., 2012). A good example is the tree in Figure 7.3b. In order to predict the correct sentiment of the sentence, it is sufficient to analyze the second clause ("it never took off and always seemed static"). This way of simplification is orthogonal to level cuts. Level cuts reduce the amount of structure induced by the autoencoder but keep the complete input. Subtree selection reduces the input, but it makes use of a well-defined subtree instead of the full representation. In order to select a region, we greedily select a central word (indicated in green in Figure 7.2b): We apply the softmax prediction function of the autoencoder (Equation 7.2) to each word in the sequence and pick the one with the lowest E^{(cl)} as the central word. For training examples, we compute the error for the gold class. For testing examples, we compute the score over all classes and select the word with the overall minimal error. Starting from the central node, we traverse the tree towards the top node and select the largest subtree produced by the RAE whose top node n has a level of

ℓ_n ≤ ℓ_max (e.g., ℓ_max = 3 in Figure 7.2b).
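Reusing the level function and node class from the sketch above, and assuming each node additionally stores a parent pointer, the selection step can be written as:

def select_subtree(central_leaf, l_max):
    # Climb from the central word towards the root and keep the largest
    # subtree whose top node still has level <= l_max.
    best = central_leaf
    node = central_leaf.parent        # hypothetical parent pointer
    while node is not None and level(node) <= l_max:
        best = node
        node = node.parent
    return best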

Window Selection

The window approach is based on a similar motivation. Here, we again identify a central word as described above and take the representations of all words within a window of size w to either side as input to the classifier. All other words are ignored. We include the central word in the count. For example, w = 3 means that we take the two words to the left and to the right of the central word and drop everything else; and w = 1 only uses the central word and no context. Figure 7.2c shows an example for w = 2. In this approach, only the leaves (i.e., the random initializations plus the embeddings) of the trees are used. The rest of the tree is ignored.
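The corresponding extraction is a simple slice around the chosen central index (a sketch; the central index is determined as described for subtree selection):

def window_selection(leaf_vectors, center, w):
    # Keep the central word plus (w - 1) words to each side; w = 1 keeps
    # only the central word itself.
    lo = max(0, center - (w - 1))
    hi = center + w                   # exclusive upper bound
    return leaf_vectors[lo:hi]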

Random Embeddings

To analyze the role of the learned embeddings, we also conduct an experiment where we train an RAE in the standard setup but do not add the embeddings during feature extraction. This means that Equation 7.1 simplifies to x_w = r_w.


This is illustrated in Figure 7.2d. This experiment will show (i) whether sufficient information can be encoded in the higher-level nodes of the tree to make useful predictions, and (ii) whether the RAE is robust to the loss of information.

7.4 Experiments

Error analysis proves to be difficult for automatically generated representations. In general, the dimensions produced by autoencoding in NLP applications cannot be interpreted easily. Therefore, we resort to empirical evaluation in the context of sentiment analysis. The following experiments are designed to analyze the learned word and phrase representations, particularly with respect to the compositional effects of the model. We will conduct two types of experiments that will shed light on the compositionality of the RAE model. First, we let human annotators analyze the tree structures manually. Second, we run classification experiments with the automatic structural simplification methods introduced above.

7.4.1 Experimental Setup

In order to reproduce the original setup as closely as possible, we address the same task and dataset as Socher et al. (2011): sentiment classification for the sentence polarity dataset by Pang and Lee (2005), which is described in Appendix A.2.1. It contains 10,662 sentences from movie reviews that were manually labeled as expressing positive or negative sentiment towards the movie. We use the RAE implementation provided at http://www.socher.org. We set the hidden layer dimensionality to D = 50, the default setup in the code. All experiments are carried out on the predefined 90:10 training-testing split. We use accuracy as the evaluation measure as the class distribution in the dataset is balanced. We set the model parameters described in Section 7.2 as suggested by the authors.


(a) Sentence 1: not a bad journey at all .

Figure 7.3: Example RAE trees (cont. on the following pages). Leaf nodes contain the word that is encoded by them. All nodes contain the sentiment predicted using softmax, 1 being positive and 0 negative sentiment. We follow Socher et al. (2011) and replace words for which no embedding is available by "*UNKNOWN*".

7.4.2 Human Evaluation

Linguists are used to tree structures representing syntactic properties of language. Naturally, it is tempting to adapt this view to RAE tree structures. In this section, we will provide manual analyses of their syntactic and semantic coherence conducted by human annotators.

Syntactic Coherence

First, we will show that there is a large divergence between traditional syntactic trees as most theories of grammar would posit them and the trees produced by the RAE. We analyze two phenomena: coordinating conjunction and negation. We selected these particular phenomena as they are considered particularly important for sentiment analysis. Negation plays a key role as a polarity reverser. Conjunction is thought to be mostly sentiment-preserving. Our annotation task is set up as follows. We randomly select 10 example sentences for each grammatical phenomenon, conjunction and negation, from the

(b) Sentence 2: though everything might be literate and smart , it never took off and always seemed static .

(c) Sentence 3: the *UNKNOWN* elaborate continuation of " the lord of the rings " trilogy is so huge that a column of words can not adequately describe *UNKNOWN* peter *UNKNOWN* 's expanded vision of *UNKNOWN* . r . r . *UNKNOWN* 's *UNKNOWN* .

dataset. Candidates for random selection were identified through keyword search (not for negation, and and or for conjunction). We compute a tree representation for each sentence with the trained RAE. We then ask two human annotators to judge whether the parse of each example was correct with respect to the phenomenon. The human judges found unanimously that out of the 20 examples, none were correctly analyzed by the RAE. This result shows that it is unlikely that the RAE captures these syntactic structures. Of course, it may be argued that these phenomena could be particularly difficult to analyze automatically. While conjunctions are notoriously difficult even for supervised syntactic analysis models, negations are considered less problematic (McDonald, 2006; Collins, 1999). We also recognize that the RAE trained on sentiment data does not have the objective to learn syntactic structures, but to classify sentences. It may not be necessary to get the syntactic structure right in order to achieve this goal. However, we find that this result is important to understand the behavior of the model.

Error Analysis The example trees in Figure 7.3 illustrate these results. We will first take a look at examples for negation. In sentence 7.3a, the autoencoder used not at a low level in the tree, constituting a modification of a. Their joint representation is itself joined with bad. The correct analysis would use not as a modifier to a joint structure a bad journey where bad and journey are combined directly. Sentences 7.3b and 7.3c represent cases where the autoencoder wrongly introduced long distances between the negating and the negated phrase (never to took off and not to describe). We now turn to conjunction. In sentence 7.3b, we find an instance of a coordination of two clauses. The clause always seemed static should receive a joint analysis and should then be modified by and. Instead, and is put in a subtree containing two words from each of the coordinated clauses. One explanation for the resulting structures could be the greedy construction process. As RAEs are trained by joining the least error-prone combinations first, pairings of frequent words are common. For example, in sentence 7.3a, frequent words are joined first (not to a, and all to .). The most uncommon words are added last (bad and journey). In around 75% of all occurrences, periods are adjoined

directly to their neighbors, which is not desirable from a syntactic or semantic point of view (instead, punctuation is commonly attached to the top node).

Semantic Coherence

We will now analyze the behavior of RAEs from a semantic point of view. As we have seen earlier, sentiment analysis, particularly in a sparse data situation such as a sentence classification task, is thought to be able to benefit from a model capable of semantic composition. In the clue model, sentiment is thought to be carried by a set of polarity-indicating expressions, for example adjectives like great or awful in combination with their close syntactic environment. In the polarity shifter model, the sentiment of a clue can then be modified by applying an intensifying or reversing construction. A simple check for whether RAEs are able to learn compositionality is to check the produced trees for instances of these modifiers – shifters and reversers – and see whether they behave as expected. Shifters such as very or little are difficult to analyze as there is no straightforwardly quantifiable result that one would expect to occur in a binary classification setup. We would simply be unable to see a change in the label if the polarity is not changed. Therefore, we consider only reversers in this analysis. There is no consensus as to which words constitute the set of reversers. Often, reversing properties of a word are context-dependent (Polanyi and Zaenen, 2006; Kessler and Schütze, 2012). For this reason, we picked a small set of reversers with general applicability: not/n't, no, and never.

160 7.4 Experiments does not reverse sentiment (the goal being to mimic the 31% rate of reversals). We asked two human judges to check the examples for correct reversal. The judges unanimously found that only 3 out of 10 candidates are behaving correctly, con- firming that reversal is very likely not a property captured correctly by the RAE model.

Error Analysis We again turn to Figure 7.3 for examples of errors. In sen- tence 7.3a, not does not reverse the polarity of a – which is correct reverser behav- ior. However, modifying bad with the resulting structure still does not reverse. The polarity of the whole sentence seems to be determined by the polarity of journey which gets reversed at the top node, leading to misclassification. In sentence 7.3b, never should reverse took off, yielding a negative sentiment overall. However at the point where the two phrases are joined, the topmost node depicted, positive sentiment is predicted. We conclude that reverser behavior is not modeled well by the RAE. We reiterate that the effect of reversers is quite complex and in many contexts – e.g., “not awesome, but pretty good” – they do not simply reverse sentiment. However, the results indicate that the syntactic and semantic role of reversers is not modeled well in those cases where they act as simple reversers.

7.4.3 Automatic Structural Simplification

In the previous experiments, we showed that the structures generated by RAEs cannot be interpreted easily in terms of traditional linguistic insight from syntax and semantics. As mentioned above, this does not imply that the structures do not help in the classification task.22 We will now turn to the empirical evaluation of the contributions of these structures in a sentence polarity prediction task. We apply the four simplification methods we introduced above.

Level Cuts during Feature Extraction

In the first experiment, we train the RAE to produce full trees and apply level cuts only in the feature extraction stage. We report accuracies in Table 7.1, column

22A well-known insight which is captured by a popular quote: “Airplanes don’t flap their wings.”

161 7 Compositionality of Recursive Autoencoders

train+ rand- `max/w extract extract embed subtree window 1 77.30 77.67 58.07 25.33 25.33 2 75.98 68.95 62.89 25.33 30.96 3 75.61 54.60 66.98 26.17 73.55 5 74.86 52.60 69.14 27.67 74.20 7 76.08 54.04 72.20 55.25 74.86 10 76.17 62.02 74.11 71.58 75.61 15 76.75 63.23 77.39 77.02 77.20 20 76.75 63.60 77.49 76.74 77.20 ∞ 76.75 77.24 77.49 77.20 77.21

Table 7.1: Accuracy using different structure simplification techniques

extract, for different values of `max. For reference, the trees produced by the RAE on the whole dataset have a mean height of 10 and a maximum height of 23. First note that the best accuracy is achieved by cutting directly above the leaves, i.e. using no trees at all. Increasing the maximum level lowers accuracy significantly; when using `max = 5, for example, we lose over 2 percentage points compared to `max = 0. Only when `max is increased further, accuracy recovers. This result suggests that the leaf representations – the embeddings – carry great weight in classification. It also suggests that the nodes close to the top of the tree contribute some stability. Of course, one could argue that the level nodes are still being trained as a part of a larger structure and that this structure has a positive effect on the resulting representations. The following experiments will however show that this is not the case.

Level Cuts during Training

One could argue that using a fully-trained RAE to extract pruned trees is unfair since the model is still able to use the full information induced during training. In addition, most of the computation time is spent during RAE training as learning

162 7.4 Experiments such complex structures is very costly. In the following experiment, we perform level cuts during both training and feature extraction, using the same maximum level `max. The column train+extract in Table 7.1 shows the results for this experiment.

First, we observe that we get a well-performing model for `max = 1 in both RAE training and feature extraction. This result shows that the embeddings of the RAE are not affected by the structure since at this point, no structure is generated at all. Next, we can see that accuracy drops quickly as we introduce more levels and only recovers after raising the threshold to ∞, using full trees. A possible explanation for this phenomenon is that when enforcing low levels, there are also fewer training instances for the RAE and thus the resulting models are worse. Another possibility is that when full trees are constructed, all applications of the RAE depend on each other since errors are propagated through the structure. Thus, inconsistencies should be optimized away. However, there are fewer inconsistencies in lower-level cuts since the resulting subtrees are likely to be disconnected. This is further supported by the fact that mid-level nodes hurt the result even in the case where the trees are fully trained, which we demonstrated above. These experiments show again that the best accuracy is achieved by a model that does not use the tree structures. Our conclusion from this evidence is that the strength of the RAE lies in the embeddings, not in the induced tree structure.

Random Embeddings

In order to demonstrate the significance of the embeddings, we again train full RAE trees but use the random representations as leaves for constructing trees during feature extraction. The results of this run are shown in Table 7.1, column rand-embed.

Naturally, using only the random representations (at level `max = 1) we achieve low accuracy. Note that the results are still over chance level, an effect which may be caused by the randomly generated representations being equivalent to low- dimensional random indexing (Kanerva et al., 2000). Nevertheless, using higher- level tree representations successively increases accuracy to a similar level as ob- served in the previous experiment.

163 7 Compositionality of Recursive Autoencoders

Our interpretation of this experiment is that while the trees seem to be able to create useful representations of the underlying words, these representations are redundant with respect to the embeddings. Combining both does not lead to improved classification accuracy. This property may however be helpful in cases where noisy input representations are expected.

Subtree Selection

We now turn to subtree selection. As stated previously, recent results suggest that sentiment mostly uses locally compositional structures, so it might be sufficient to use part of a sentence to classify the data. We perform subtree selection on trees from the fully trained RAE model. Table 7.1, column subtree shows the results for this experiment. We observe low accuracies for low `max. Only when using large contexts (recall that the max- imum number of levels is around 23), the results become competitive. From this experiment, it is not clear whether the height of the trees or the size of the contexts – which grows with the height – is responsible for the gain. We will investigate this issue in the following experiment. Error analysis shows that there are cases where an unsuitable central word is selected. For example, in sentence 7.3b, the selected central word is might, which occurs in the first part of the sentence that should actually be disregarded. In some cases, overfitting of the classifier will select a word that would commonly be considered to be non-polar, e.g., and, which is selected in sentence 7.2c. A list of the most confidently classified words by the softmax node classifier is shown in Table 7.2. We find that this list of features mirrors in part the ones found early by active learning in Chapter4. Overfitting still seems to be a problem (e.g., jokes), although some of the possible candidates for overfitting could in fact be considered clues (movies made for cinema are usually of superior quality compared to movies for video or tv). Intuitively, there should be cases where sentiment is reversed, and thus the central word should have the opposite polarity of the sentence. However, we found that selecting based on the gold label during training improves the results slightly (probably because these cases are in the majority).

164 7.4 Experiments

feature sigmoid(n) feature sigmoid(n) moving 0.9970 bad 0 culture 0.9956 dull 0 powerful 0.9954 boring 0.0019 enjoyable 0.9945 fails 0.0047 touching 0.9922 worst 0.0066 portrait 0.9920 jokes 0.0067 warm 0.9918 flat 0.0077 solid 0.9916 too 0.0102 cinema 0.9913 problem 0.0108 engrossing 0.9906 mediocre 0.0116 provides 0.9899 title 0.0121 captures 0.9878 tv 0.0123 thoughtful 0.9876 stupid 0.0130 wonderful 0.9875 routine 0.0132 rare 0.9849 lack 0.0137 beauty 0.9845 badly 0.0152 inventive 0.9831 mess 0.0152 heart 0.9802 video 0.0153 journey 0.9802 neither 0.0160 human 0.9793 waste 0.0188 always 0.9788 supposed 0.0202 heartwarming 0.9763 generic 0.0205 riveting 0.9742 pointless 0.0207 lively 0.9738 loud 0.0216 masterpiece 0.9717 silly 0.0228

Table 7.2: 25 most positive and most negative words determined by their softmax outputs (shown: positive probability) of the fully-trained RAE

165 7 Compositionality of Recursive Autoencoders

Window Selection

As a last experiment, we concentrate on the word embeddings as they seem to be sufficient to achieve high accuracies on this task. This will also show whether subtrees or embedded words were responsible for the improvements with increasing tree height in the previous section. We vary the window size w starting from 1, which is only the word itself, to ∞ (the maximum number of words any sentence has). The data has a mean sentence length of around 21 words and a maximum sentence length of 63 words. Results are shown in Table 7.1, column window. While a small window size (1 or 2) produces bad results. This might be an effect of choosing the wrong central word, an issue we already examined more closely in the subtree selection case. Increasing window size, accuracy rises rapidly starting at w = 3. This means that we lose less than 4 percentage points compared to the best result when using only 5 words from each sentence. Taking a window of 15 words (in each direction, including the center, i.e., a total of 31 words) is necessary to match the best classifier. There are around 16% of the data that have more than 31 words, so it seems that for sentences of that length, there is sufficient material in the window that the model can exploit.

7.4.4 Discussion

We presented multiple experiments in which we simplified the RAE tree struc- tures. All of these experiments point towards the embedding having the strongest influence on the end result. If embeddings are not used, accuracy drops almost to chance level. Using full trees and no embeddings seem to have the same effect as using only embeddings. However, using both representations together does not yield any improvement. This suggests that there is a large overlap between what the trees model and what the embeddings model – pointing towards a feature selection effect. Further, we saw evidence that the RAE trees produce robust rep- resentations. Even when using random inputs, the trees are able to extract useful information.

166 7.5 Summary


In this chapter, we presented linguistically-oriented analyses of a compositional sentiment model, the Semi-Supervised Recursive Autoencoder. We conducted two different experiments concerning the structures learned and generated by RAEs. In a human annotation experiment, we showed that there is no simple way to interpret the structures produced by RAEs in terms of traditional linguistic theo- ries of syntax and semantics. We then automatically reduced the tree structure in different ways and showed that for sentence-level sentiment analysis, the embed- ded words were sufficient to achieve state-of-the-art accuracy. Our experiments on window selection suggest that a structure as simple as a well-chosen subset of the words in a sentence produces a good model. Overall, we conclude that structural simplifications are feasible for sentence- level sentiment prediction. Unfortunately, this result is disappointing regarding the compositionality of the model. It confirms the observations we also made in previous experiments in this thesis: We can achieve high sentiment classification performance when resorting to an instance of the clue model: averaging word representations.

167 7 Compositionality of Recursive Autoencoders

168 8 Conclusions and Future Work

8.1 Contributions

In this thesis, we made several contributions to sentiment analysis. We will discuss them individually for each approach and then give an overall conclusion.

Redundancy of Clues In Chapter4, we showed that active learning of doc- ument polarity in a noisy environment can be accomplished successfully. In our experiments using Amazon Mechanical Turk, active learning significantly outper- formed a random selection baseline. Active learning leads to an overall cost reduc- tion while achieving the same classification result. We found that the quality of annotations can be ensured by simple counter-spam measures. We analyzed the classifiers during the active learning process to shed some light on how features are learned. We showed that active learning can identify interesting features faster, reducing redundancies in the data. In Chapter5, we presented a novel approach to sentiment classification that al- lows for a seamless integration of word and document knowledge in a joint model. This approach addresses redundancy by directly labeling clues instead of docu- ments. We proposed a unified graph representation of words and documents with a subsequent label propagation step through random walks. This model is efficient, using well-known eigenvector calculations. We showed that polarity induction with Personalized PageRank performs significantly better than the baseline using aver- age polarity. We further bootstrap a supervised statistical classifier to fully exploit the word feature space from which we gain additional improvements. In these ex- periments, we identified the insensitivity of clues to topical contexts as one of the major drawbacks of the clue model.

Contextuality of Clues In Chapter6, we addressed the problem of topical context through sentiment relevance, a novel concept for identifying useful content

169 8 Conclusions and Future Work for sentiment analysis. We created a new dataset that has sentiment relevance la- bels on the sentence level. We then presented two approaches to semi-supervised sentiment classification. The first approach, distant supervision, uses a database with domain information. The second, transfer learning, makes use of a conceptu- ally related but different dataset of a larger size. We found that both approaches benefit from using features that offer some way of semantic generalization (lexical taxonomies and domain-specific named entity types). Both approaches apply a minimum cut graph model to introduce a sequence constraint that compensates for low-confidence misclassifications and class imbalance effects. Chapter7 presented research on the compositionality of sentiment. While com- positional sentiment models are emerging, linguistic analysis of such models is lacking. We present experiments with the Semi-Supervised Recursive Autoencoder (RAE), a compositional neural network model that jointly learns to represent lan- guage and to make predictions. We show that the structures produced by the mod- els are difficult to interpret linguistically for humans. We found that the structures can also be simplified greatly without any loss of accuracy, and that the automat- ically learned vector representations of words carry most of the weight. This leads to a more efficient and less complicated model.

General Remarks The first major underlying theme of our work was the role of clues and their interaction in sentiment prediction. We showed both the strengths and the weaknesses of the clue model. It performs well even in sparse data situations, but it is easily distracted by topic effects. Therefore, it is both difficult to improve and difficult to beat. The clue model lends itself to approaches where clues and larger linguistic units are mixed, such as a graph-based approach. Going beyond clues by modeling compositionality is a desirable goal to that end. However, achieving composition without extensive supervision is not an easy task.

The second theme was efficiency. We showed that sentiment classification can be addressed with simple and efficient models, and that annotation cost efficiency can be improved by clever example selection. In addition, we were able to simplify an established neural network model for sentiment classification and showed that only some of its substructures are needed for high performance.


8.2 Future Work

Active Learning In Chapter 4, we applied only the most basic active learning techniques. It has, however, been shown (Settles, 2011) that active learning of feature labels can improve annotation speed drastically. We could adapt this idea to different classifier types, such as the joint model of words and documents presented in Chapter 5, or even the compositional model in Chapter 7. The latter approach would be very appealing, as annotating structural data for full supervision is particularly costly.

Feature Knowledge The experiments in Chapter 5 focused on in-domain classification in a single language. The graph framework is easily extensible to multiple languages. While multilingual sentiment classification has been addressed in the past, no work on multilingual feature knowledge integration exists. Domain adaptation is a challenge for this approach, as words can have different polarities in different contexts. A possible approach to this problem would be unsupervised node splitting, which has been applied successfully in syntactic parsing (Petrov and Klein, 2007); multiple latent nodes could then represent a word’s polarity in different contexts.

Sentiment Relevance So far, we have explored sentiment relevance in a relatively limited scope. The experiments presented in Chapter 6 use movie review data. While this makes for a simple setup, there are phenomena that do not occur in this text genre. For example, one type of particularly non-relevant content is complaints about shipping and handling in product reviews. For this reason, sentiment relevance should be explored with a more general perspective in the future. This includes covering different domains, which may give rise to a need for domain adaptation.

Our sentiment relevance classification approaches are based on two different semi-supervised learning strategies. The first obvious improvement is a combination of both approaches, which might lead to a cumulative improvement. Further research could also explore different types of feature generalization. Currently, we perform generalization with external hand-crafted databases. This step could be augmented by or replaced with an automatic representation learning approach, for example using state-of-the-art neural network models.

Compositional Sentiment Analysis In Chapter 7, we analyzed the Recursive Autoencoder for compositionality. This model was among the first to model sentiment through automatically learned task-specific hierarchical structures. In the meantime, various improvements have been suggested. As such research is often motivated from a machine learning perspective, it would be valuable to conduct similar linguistic analyses for the successors of the RAE.

General Remarks While we found that the clue model can be improved modularly through the approaches we proposed, a compositional approach is ultimately desired. However, we also showed that modeling compositionality in sentiment analysis is still a challenging task. Although various models have been proposed as alternatives to the ones presented in this thesis (e.g., Socher et al., 2013), they require a vast amount of supervision. Such approaches do not yet address complications like domain and topic changes. Linguistic phenomena such as irony, metaphors, and idiomatic expressions pose further challenges. Ultimately, sentiment is grounded in the minds of humans, who can employ all means of language to express it.

A Resources

A.1 Tools

A.1.1 Stanford Classifier

The Stanford Classifier (Manning and Klein, 2003) implements a maximum entropy model (cf. Section 2.4.1). The model weights are learned using a Quasi-Newton optimizer. The input and output format for the classifier is one example per line. The tool is tailored for NLP, so it has some linguistic feature extraction capabilities, supporting string manipulation such as lowercasing as well as feature extraction through simple tokenization and n-gram extraction. Thus, it is possible to input raw texts. These and other options can be specified in a configuration file. Unless specified otherwise, we use the following extraction settings:

1.useSplitWords=true
1.splitWordsRegexp=\\s
useClassFeature=true

In this setup, we apply unigram extraction to the first column in the input file. Further, we use the class feature to model class distributions. We use the standard MaxEnt hyperparameter settings. The ColumnDataClassifier class offers a command line tool for standalone use. The Stanford Classifier is implemented in Java and is freely available.23

23 http://nlp.stanford.edu/software/classifier.shtml
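To make the setup concrete, a complete configuration file might look as follows. This is only a sketch: the file names are placeholders, and the trainFile and testFile property names as well as the command line below follow the tool’s documentation as we recall it and may differ slightly between versions.

trainFile=train.tsv
testFile=test.tsv
1.useSplitWords=true
1.splitWordsRegexp=\\s
useClassFeature=true

Assuming one example per line with the gold label and the raw text in separate tab-separated columns (the 1. prefix above refers to the column holding the text), the classifier can then be run standalone with a command along the lines of java -cp stanford-classifier.jar edu.stanford.nlp.classify.ColumnDataClassifier -prop example.prop.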

A.1.2 HIPR

HIPR is a tool for finding the minimum cut of a graph. HIPR implements the push-relabel method of Cherkassky and Goldberg (1995). HIPR uses the DIMACS graph format24 for inputs and outputs. The tool is implemented in C and is available online.25
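For illustration, the following is a small, hand-constructed input in the standard DIMACS maximum-flow syntax (a toy graph, not taken from our experiments): the problem line declares four nodes and five arcs, the n lines mark node 1 as the source (s) and node 4 as the sink (t), and each a line gives an arc with its capacity.

c toy maximum-flow problem with 4 nodes and 5 arcs
p max 4 5
n 1 s
n 4 t
a 1 2 3
a 1 3 1
a 2 3 1
a 2 4 1
a 3 4 3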

A.1.3 Mate Tagger and Parser

The Mate parser or Bohnet parser (Bohnet, 2010) is a graph-based dependency parser. The parser expects tokenized and sentence segmented inputs. It returns dependency trees in the CoNLL 2012 format.26 It is available for multiple languages and comes with a part-of-speech tagger that has been cross-trained with the parsing model. The underlying statistical model of both the tagger and the parser is a perceptron trained with MIRA (Crammer et al., 2006). The tool is implemented in Java and freely available.27 For our experiments, we used the development version available at the IMS.

A.2 Data

A.2.1 Text Corpora

Movie Corpus

The movie corpus (Pang et al., 2002) is a collection of 2000 movie reviews from the Internet Movie Database (IMDb). The corpus consists of 1000 positive and 1000 negative reviews. The data was collected at a time when IMDb had no consistent rating system, and so various star- and grade-based systems had to be consolidated. The corpus is supplied by the authors in a pre-processed form; in particular, tokenization and global lowercasing have been performed. Alternatively, the raw HTML files are supplied, from which the full texts can be extracted. We opted for the latter form of the data, as many NLP systems such as part-of-speech taggers, syntactic parsers, and named entity recognizers rely on capitalization features and some have special tokenization requirements. We extract the text from the HTML files with the goal of reproducing the structure of the published data as closely as possible. We apply the same sentence splitting tool, MXTERMINATOR (Reynar and Ratnaparkhi, 1997).28

24 http://dimacs.rutgers.edu/Challenges/
25 http://www.igsystems.com/hipr/
26 http://conll.cemantix.org/2012/data.html
27 http://code.google.com/p/mate-tools/


                      labeled                unlabeled
Domain         # docs   # features    # docs   # features
books           2,000      481,173     4,465    1,022,277
DVD             2,000      470,995     3,586      847,556
electronics     2,000      309,629     5,681      868,336
kitchen         2,000      265,469     5,945      785,861

Table A.1: MDS statistics. Counts of documents and features in the labeled and unlabeled parts of the dataset.


Sentence Polarity Dataset

Pang and Lee (2005) introduced the sentence polarity dataset. It consists of 10,662 sentences from movie reviews, each sentence labeled as positive or negative. The dataset is balanced, consisting of 5,331 examples for each class. The data was collected automatically from Rotten Tomatoes,29 a film review aggregation website that collects reviews from online sources such as newspapers and automatically calculates an overall score and verdict (rotten for negative, fresh for positive). For each source, Rotten Tomatoes lists a quote snippet that summarizes the review. The negative part of the dataset contains random snippets from rotten reviews, the positive part from fresh reviews.

Multi-Domain Sentiment Dataset

The Multi-Domain Sentiment Dataset (Blitzer et al., 2007) contains product reviews from amazon.com. The dataset covers four different domains: books, DVDs, electronics, and kitchen. For each domain, a labeled set of 2000 reviews (1000 positive and 1000 negative) and an unlabeled set of varying size (between 3500 and 6000 reviews) are provided (Table A.1).

28 ftp://ftp.cis.upenn.edu/pub/adwait/jmx/jmx.tar.gz
29 http://www.rottentomatoes.com/


The publicly available version of the dataset is the result of heavy pre-processing, as the data provided consists only of extracted features (unigrams and bigrams). Thus, we went back to the HTML versions as well and extracted the full text of the reviews. The dataset was originally used for domain adaptation. However, the different domains provide interesting challenges themselves even in an in-domain setting, such as heavy variation in average document length (compare, for example, books and kitchen).

Subjectivity Dataset

The subjectivity dataset (P&L, Pang and Lee, 2004) contains 10,000 sentences or snippets that were automatically gathered to reflect the authors’ concept of subjectivity. The subjective data (quote) consists of summarizing quote snippets from Rotten Tomatoes. These quotes are usually highly subjective, as they summarize the opinion expressed in a single review. There is a high overlap between the quote part and the sentence polarity dataset described above. Objective sentences (plot) are taken from IMDb plot summaries. In practice, this data contains a mixture of subjective and objective material; this issue is discussed further in Chapter 6. The authors supply 5,000 examples for each class. The main version of the dataset has been preprocessed with the same techniques as the previous datasets by the same authors; however, the raw HTML files for this dataset are incomplete. For this reason, we were unable to recover an unprocessed version.

A.2.2 Lexical Resources

Wilson Subjectivity and Polarity Clues

The subjectivity clues (Wilson et al., 2005b) are a lexical resource providing both subjectivity and polarity for English words and phrases. Each entry consists of a word (word1, which may be a stemmed form, indicated by stemmed1), the part of speech for each word (pos1), and the degree of subjectivity (type) and polarity of the entry (priorpolarity). The following listing contains some example entries.

type=strongsubj len=1 word1=sad pos1=adj stemmed1=n priorpolarity=negative
type=strongsubj len=1 word1=sad pos1=anypos stemmed1=y priorpolarity=negative
type=strongsubj len=1 word1=sadden pos1=verb stemmed1=y priorpolarity=negative
type=strongsubj len=1 word1=sadly pos1=anypos stemmed1=n priorpolarity=negative
type=strongsubj len=1 word1=sadness pos1=noun stemmed1=n priorpolarity=negative
type=weaksubj len=1 word1=safe pos1=adj stemmed1=n priorpolarity=positive

The way the lexicon was created is somewhat opaque, as the authors state that it was “collected from a number of resources”, including both manually and automatically labeled resources. The authors claim that a majority of the clues were collected using the extraction pattern technique of Riloff and Wiebe (2003). However, this technique is aimed at subjectivity detection, so the origin of the polarity labels remains unclear.
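Since every entry is a flat list of key=value pairs, the lexicon is easy to read programmatically. The following Python sketch (our own illustration, not part of the lexicon release) parses a single entry into a dictionary:

def parse_clue(line):
    """Parse one subjectivity-clue entry of the form key1=value1 key2=value2 ..."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")
        fields[key] = value
    return fields

entry = parse_clue("type=strongsubj len=1 word1=sad pos1=adj "
                   "stemmed1=n priorpolarity=negative")
# entry["word1"] == "sad" and entry["priorpolarity"] == "negative"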

WordNet

WordNet (Miller, 1995) is a lexical semantic resource for English. Multiple words can form a synset, a set of words that may be used synonymously within a word sense. For example, the words ignition, firing, lighting, kindling, and inflammation form the synset with the ID 00378479, which is described by the gloss the act of setting something on fire. Additionally, hyponymy relations are defined between synsets, leading to a taxonomy structure. These relations may be used for generalization over objects (e.g., both bread and cheese are a type of food). We use WordNet 3.0 in this thesis. This version lists 117,798 nouns in 82,115 synsets. WordNet also contains words of other parts of speech, such as adjectives and verbs; however, they have much lower coverage and their taxonomies are relatively flat. Although WordNet contains more verbs than VerbNet, we cannot use it for our purposes, as we require a pre-defined set of base classes. CoreLex, which we describe next, defines such classes for WordNet nouns but not for verbs.
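To illustrate this kind of generalization programmatically, the following Python sketch uses the NLTK interface to WordNet (an assumption made purely for illustration; our experiments do not depend on NLTK) to check whether a noun has a sense that is dominated by a food synset:

from nltk.corpus import wordnet as wn

def generalizations(noun):
    """Collect all transitive hypernyms over all noun senses of the given word."""
    hypernyms = set()
    for sense in wn.synsets(noun, pos=wn.NOUN):
        hypernyms.update(sense.closure(lambda s: s.hypernyms()))
    return hypernyms

food_senses = set(wn.synsets("food", pos=wn.NOUN))
for word in ("bread", "cheese"):
    # True if some sense of the word is a (transitive) hyponym of a 'food' synset.
    print(word, bool(generalizations(word) & food_senses))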

CoreLex

CoreLex (Buitelaar, 1998) defines base classes (basic types) with respect to polysemy on WordNet. Each basic type is itself a node in WordNet. The author then defines CoreLex classes which are combinations of these types. In total, there are 44 basic types and 371 classes. The list of basic types is shown in Table A.2. The complete list of base class definitions is too long to reproduce here, so we selected the ones that are mentioned in the thesis (Table A.3). The complete list can be found online.30


type  synset id  polysemous class
abs   00012670   abstraction
act   00016649   act, human_action, human_activity
agt   00004473   causal_agent, cause, causal_agency
anm   00008030   animal, animate_being, beast, brute, creature, fauna
art   00011607   artifact, artefact
atr   00017586   attribute
cel   00003711   cell
chm   08907331   compound, chemical_compound
chm   08805286   chemical_element, element
com   00018599   communication
con   06465491   consequence, effect, outcome, result, upshot
ent   00002403   entity
evt   00016459   event
fod   00011263   food, nutrient
frm   00014558   shape, form
grb   05115837   biological_group
grp   00017008   group, grouping
grs   05119847   social_group
grs   05116476   people
hum   00004865   person, individual, someone, mortal, human, soul
lfr   00002728   life_form, organism, being, living_thing
lme   08322690   linear_measure, long_measure
loc   00014314   location
log   05450515   region
mea   00018966   measure, quantity, amount, quantum
mic   00740781   microorganism
nat   00009919   natural_object
nat   05715416   body_of_water, water
nat   05720524   land, dry_land, earth, ground, solid_ground, terra_firma
plt   00008894   plant, flora, plant_life
phm   00019295   phenomenon
pho   00009469   object, inanimate_object, physical_object
pos   00017394   possession
pro   08239006   process
prt   05650477   part, piece
psy   00012517   psychological_feature
qud   08310215   definite_quantity
qui   08310433   indefinite_quantity
rel   00017862   relation
spc   00015245   space
sta   00015437   state
sub   00010368   substance, matter
tme   09065837   time_period, period, period_of_time, amount_of_time
tme   09092294   time_unit, unit_of_time
tme   00014882   time

Table A.2: CoreLex basic types


class  basic types
agh    agt hum
pas    act atr pos
pas    atr pos
pas    atr pos rel
pas    atr pos sta
gsl    grs log
lor    log rel
ara    act art atr psy
ara    art atr
ara    art atr psy
ara    art atr sta
atp    atr psy
atp    atr psy sta
hum    grp hum
hum    grp hum nat
hum    grs hum
hum    hum
atr    atr
atr    atr pro
atr    atr pro sta
atr    atr sta

Table A.3: Selected CoreLex classes


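To illustrate how the basic types in Table A.2 can be used, the following Python sketch (our own simplification, again using NLTK’s WordNet interface as an assumption) maps a noun to basic types by checking which anchor synsets dominate its senses. Only three basic types are included here; a real implementation would cover the full inventory of Table A.2.

from nltk.corpus import wordnet as wn

# Illustrative anchors for three basic types from Table A.2 (WordNet 3.0 synset names).
BASIC_TYPES = {
    "fod": wn.synset("food.n.01"),      # food, nutrient
    "hum": wn.synset("person.n.01"),    # person, individual, someone, ...
    "art": wn.synset("artifact.n.01"),  # artifact, artefact
}

def basic_types(noun):
    """Return the basic types whose anchor synset dominates some noun sense of the word."""
    types = set()
    for sense in wn.synsets(noun, pos=wn.NOUN):
        ancestors = set(sense.closure(lambda s: s.hypernyms())) | {sense}
        types.update(name for name, anchor in BASIC_TYPES.items() if anchor in ancestors)
    return types

print(basic_types("actor"))  # which of the three example types dominate senses of 'actor'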

VerbNet

VerbNet (Kipper et al., 2008) is a semantic lexicon of English verbs. VerbNet categorizes 5,726 verbs into 273 base classes. Each base class features a set of thematic roles, selectional restrictions on arguments, and frames. We will not list all base classes here, as we find that their names are self-explanatory compared to the CoreLex classes. Instead, we refer to the online reference.31

A.2.3 Other Datasets

IMDb Cast Database

The IMDb cast database32 contains the complete cast listings of IMDb. We use the snapshot from 08/19/2011. The database is organized in several parts. The actors and actresses parts list, for each actor or actress, the characters they played in each movie. The file is organized by actor, and each line contains a single role. Character names are listed in brackets. In case the actor appeared as himself or herself, the entry is marked accordingly. In addition, some entries contain extra information, for example Gordy Jr. (10 Years) in the listing below. The following is an excerpt of the listing for the actor Michael Cera.

Cera, Michael
  2010 MTV Movie Awards (2010) (TV) [Himself]
  Arrested Development (2012) [George-Michael Bluth]
  Comic-Con 2007 Live (2007) (TV) [Himself]
  Darling Darling (2005) [Harold] <1>
  Exit 9 (2003) (TV) [Charles]
  Extreme Movie (2008) [Fred] <4>
  Frequency (2000) [Gordy Jr. (10 Years)] <13>

30 http://www.cs.brandeis.edu/~paulb/CoreLex/corelex.html
31 http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
32 http://www.imdb.com/interfaces/


The director, screenwriter, and composer databases follow a similar structure, listing for each person the films they participated in. The following listing is taken from the entry for the director Christopher Nolan.

Nolan, Christopher (I)
  Batman Begins (2005)
  Doodlebug (1997) (as Chris Nolan)
  Following (1998)
  Inception (2010)
  Insomnia (2002/I)
  Memento (2000)
  The Dark Knight (2008)
  The Dark Knight Rises (2012)
  The Prestige (2006)
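The role lines in the actor and actress files follow a fairly regular pattern, so they can be parsed with a regular expression. The following Python sketch is our own illustration based on the excerpts above; the pattern is an assumption and does not cover every corner case of the full database.

import re

ROLE_RE = re.compile(
    r"""^(?P<title>.+?)\s+                  # film or show title
        \((?P<year>\d{4}(?:/[IVX]+)?)\)     # production year, possibly with a numbering suffix
        (?P<markers>(?:\s+\([A-Z]+\))*)     # optional markers such as (TV)
        (?:\s+\[(?P<character>[^\]]*)\])?   # character name in brackets
        (?:\s+<(?P<billing>\d+)>)?\s*$      # optional billing position
    """,
    re.VERBOSE,
)

line = "Frequency (2000) [Gordy Jr. (10 Years)] <13>"
match = ROLE_RE.match(line)
if match:
    print(match.group("title"), match.group("year"),
          match.group("character"), match.group("billing"))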

Dict.cc Bilingual Lexicon

The dict.cc dictionary33 is a user-edited online dictionary. We use the German-English version dated 05/05/2008. In this version, not all entries are tagged with part of speech. Generally, verbs are untagged, so we heuristically assume that all entries whose English forms begin with to are verbs. This pattern will, however, overgenerate, for example by matching idioms that start with to. Counting the tags, we get a total of 50,748 adjectives, 230,674 nouns, 11,984 adverbs, 48,634 verbs, and a remainder of 75,458 uncategorized entries. The following listing contains some example entries.

abdrehen [z.B. Thermostat] :: to turn down
Abdrehen {n} :: turning
Abdrehen {n} [Bearbeiten] :: lathe machining
ab dreizehn Jahre :: thirteen years of age and over [Br.]
abdriften :: to make leeway
abdriften :: to space out [mentally]
Abdriftkorrektur {f} :: drift correction
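The verb heuristic described above takes only a few lines to implement. The following Python sketch is our own illustration of the rule, applied to individual dictionary lines in the German :: English format:

def is_verb(entry):
    """Heuristic: an entry whose English side starts with 'to ' is treated as a verb."""
    german, _, english = entry.partition("::")
    return english.strip().startswith("to ")

print(is_verb("abdriften :: to make leeway"))               # True
print(is_verb("Abdriftkorrektur {f} :: drift correction"))  # False
# Note that, as discussed above, the rule overgenerates for idioms starting with 'to'.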

33 http://www.dict.cc/


Bibliography

Abney, S. (2010). Semisupervised learning for computational linguistics. CRC Press.

Akkaya, C., Conrad, A., Wiebe, J., and Mihalcea, R. (2010). Amazon Mechanical Turk for subjectivity word sense disambiguation. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pages 195–203.

Artstein, R. and Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555–596.

Baker, M. and Wurgler, J. (2007). Investor sentiment in the stock market. Journal of Economic Perspectives, 21(2):129–152.

Banfield, A. (1982). Unspeakable sentences: Narration and representation in the language of fiction. Routledge & Kegan Paul.

Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127.

Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828.

Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research (JMLR), 3:1137–1155.

Berger, A. L., Pietra, V. J. D., and Pietra, S. A. D. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71.

Bianchini, M., Gori, M., and Scarselli, F. (2005). Inside PageRank. ACM Transactions on Internet Technology (TOIT), 5(1):92–128.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

Blitzer, J., Dredze, M., and Pereira, F. (2007). Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL), pages 440–447.


Bohnet, B. (2010). Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling), pages 89–97.

Bourlard, H. and Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59(4-5):291–294.

Brew, A., Greene, D., and Cunningham, P. (2010). Using crowdsourcing and active learning to track sentiment in online media. In Proceedings of the 19th European Conference on Artificial Intelligence (ECAI), pages 145–150.

Brinker, K. (2003). Incorporating diversity in active learning with support vector machines. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), pages 59–66.

Buitelaar, P. (1998). CoreLex: Systematic polysemy and underspecification. PhD thesis, Brandeis University.

Cherkassky, B. V. and Goldberg, A. V. (1995). On implementing push-relabel method for the maximum flow problem. In Integer Programming and Combinatorial Optimization, volume 920, pages 157–171. Springer.

Choi, Y. and Cardie, C. (2008). Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 793–801.

Choi, Y. and Cardie, C. (2009). Adapting a polarity lexicon using integer linear programming for domain-specific sentiment classification. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 590–598.

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Cleveland, W. S. and Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83(403):596–610.

Collins, M. (1999). Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania.

Collobert, R. and Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 160–167.


Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research (JMLR), 12:2493–2537.

Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., and Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research (JMLR), 7:551–585.

Dahl, G. E., Yu, D., Deng, L., and Acero, A. (2012). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):30–42.

Dasgupta, S. and Ng, V. (2009). Mine the easy, classify the hard: A semi-supervised approach to automatic sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (AFNLP), pages 701–709.

Deselaers, T., Hasan, S., Bender, O., and Ney, H. (2009). A deep learning approach to machine transliteration. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 233–241.

Diestel, R. (2000). Graph Theory. Springer.

Difallah, D. E., Demartini, G., and Cudré-Mauroux, P. (2013). Pick-a-crowd: Tell me what you like, and i’ll tell you what to do. In Proceedings of the 22nd international conference on World Wide Web (WWW), pages 367–374.

Ding, X., Liu, B., and Yu, P. S. (2008). A holistic lexicon-based approach to opinion mining. In Proceedings of the 2008 International Conference on Web Search and Data Mining (WSDM), pages 231–240.

Eguchi, K. and Lavrenko, V. (2006). Sentiment retrieval using generative models. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 345–354.

Fahrni, A. and Klenner, M. (2008). Old wine or warm beer: Target-specific sentiment analysis of adjectives. In Proceedings of the Symposium on Affective Language in Human and Machine (AISB), pages 60–63.

Ford, L. R. and Fulkerson, D. R. (1956). Maximal flow through a network. Canadian Journal of Mathematics, 8(3):399–404.


Fort, K., Adda, G., and Cohen, K. B. (2011). Amazon mechanical turk: Gold mine or coal mine? Computational Linguistics, 37(2):413–420.

Foster, J., Cetinoglu, O., Wagner, J., Le Roux, J., Nivre, J., Hogan, D., and van Genabith, J. (2011). From news to comment: Resources and benchmarks for parsing the language of Web 2.0. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), pages 893–901.

Gamon, M. (2004). Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of the 20th International Conference on Computational Linguistics (Coling), pages 841–847.

Ganu, G., Elhadad, N., and Marian, A. (2009). Beyond the stars: Improving rating predictions using review text content. In Proceedings of the Twelfth International Workshop on the Web and Databases (WebDB).

Glorot, X., Bordes, A., and Bengio, Y. (2011). Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning (ICML), pages 513–520.

Go, A., Bhayani, R., and Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford.

Goller, C. and Küchler, A. (1996). Learning task-dependent distributed representations by backpropagation through structure. In IEEE International Conference on Neural Networks, pages 347–352.

Grolmusz, V. (2012). A note on the PageRank of undirected graphs. CoRR, abs/1205.1960.

Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1):29–48.

Haertel, R., Felt, P., Ringger, E. K., and Seppi, K. (2010). Parallel active learning: Eliminating wait time with minimal staleness. In Proceedings of the NAACL HLT 2010 Workshop on Active Learning for Natural Language Processing, pages 33–41.

Hassan, A. and Radev, D. R. (2010). Identifying text polarity using random walks. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 395–403. Association for Computational Linguistics.


Hastie, T., Tibshirani, R., and Friedman, J. J. H. (2001). The elements of statistical learning. Springer.

Hatzivassiloglou, V. and McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. In Proceedings of the 8th Conference on European Chapter of the Association for Computational Linguistics (EACL), pages 174–181.

Haveliwala, T. H. (2003). Topic-Sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, 15(4):784–796.

He, Y. (2010). Learning sentiment classification model from labeled features. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM), pages 1685–1688.

He, Z., Liu, S., Li, M., Zhou, M., Zhang, L., and Wang, H. (2013). Learning entity representation for entity disambiguation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL): Short Papers, pages 30–34.

Hearst, M. A. (1992). Direction-based text interpretation as an information access refinement. In Text-based Intelligent Systems, pages 257–274. L. Erlbaum Associates Inc.

Heerschop, B., van Iterson, P., Hogenboom, A., Frasincar, F., and Kaymak, U. (2011). Analyzing sentiment in a large set of web data while accounting for negation. In Proceedings of the 7th Atlantic Web Intelligence Conference (AWIC), pages 195–205.

Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257.

Hu, M. and Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 168–177.

Ipeirotis, P. G., Provost, F., and Wang, J. (2010). Quality management on amazon mechanical turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation, pages 64–67.


Kanayama, H. and Nasukawa, T. (2006). Fully automatic lexicon expansion for domain-oriented sentiment analysis. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 355–363.

Kandasamy, D. M., Curtis, K., Fox, A., and Patterson, D. (2012). Diversity within the crowd. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work Companion (CSCW), pages 115–118.

Kanerva, P., Kristofersson, J., and Holst, A. (2000). Random indexing of text samples for latent semantic analysis. In Proceedings of the 22nd Annual Conference of the Cognitive Science Society, page 1036.

Kessler, W. and Schütze, H. (2012). Classification of inconsistent sentiment words using syntactic constructions. In Proceedings of the 28th International Conference on Computational Linguistics (Coling), pages 569–578.

Kim, S.-M. and Hovy, E. (2004). Determining the sentiment of opinions. In Proceedings of the 20th International Conference on Computational Linguistics (Coling), pages 1367–1373.

Kipper, K., Korhonen, A., Ryant, N., and Palmer, M. (2008). A large-scale classification of English verbs. Language Resources and Evaluation, 42(1):21–40.

Koppel, M. and Schler, J. (2006). The importance of neutral examples for learning sentiment. Computational Intelligence, 22(2):100–109.

Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS) 25, pages 1106–1114.

Lakoff, G. and Johnson, M. (1980). Metaphors we live by. University of Chicago Press.

Laws, F. (2013). Effective active learning for complex natural language processing tasks. PhD thesis, Universität Stuttgart.

Laws, F., Scheible, C., and Schütze, H. (2011). Active learning with Amazon Mechanical Turk. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1546–1556.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.


Lewis, D. D. and Gale, W. A. (1994). A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3–12.

Li, S., Ju, S., Zhou, G., and Li, X. (2012). Active learning for imbalanced sentiment classification. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing (EMNLP) and Computational Natural Language Learning (CoNLL), pages 139–148.

Lin, C. and He, Y. (2009). Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), pages 375–384.

Linke, A., Markus, N., and Portmann, P. (1994). Studienbuch Linguistik. Niemeyer.

Liu, B. (2010). Sentiment analysis and subjectivity. In Handbook of Natural Language Processing. CRC Press.

Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1):1–167.

Manning, C. and Klein, D. (2003). Optimization, maxent models, and conditional estimation without magic. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Tutorials - Volume 5, page 8.

Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press.

Marcus, M. P., Marcinkiewicz, M. A., and Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Matsumoto, S., Takamura, H., and Okumura, M. (2005). Sentiment classification using word sub-sequences and dependency sub-trees. In Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pages 301–311.

McCallum, A. (2002). Mallet: A machine learning for language toolkit. http://mallet.cs.umass.edu.

McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133.


McDonald, R. (2006). Discriminative learning and spanning tree algorithms for dependency parsing. PhD thesis, University of Pennsylvania.

Melville, P., Gryc, W., and Lawrence, R. D. (2009). Sentiment analysis of blogs by combining lexical knowledge with text classification. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 1275–1284.

Mihalcea, R. and Radev, D. (2011). Graph-based natural language processing and information retrieval. Cambridge University Press.

Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the Association for Computing Machinery (ACM), 38(11):39–41.

Mintz, M., Bills, S., Snow, R., and Jurafsky, D. (2009). Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (AFNLP), pages 1003–1011.

Mitchell, T. M. (1997). Machine learning. McGraw Hill.

Moschitti, A. (2006). Making tree kernels practical for natural language learning. In Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 113–120.

Mount, J. (2011). The equivalence of logistic regression and maximum entropy models. Technical report, Win-Vector LLC Data Science Consulting.

Mullen, T. and Collier, N. (2004). Incorporating topic information into semantic analysis models. In The Companion Volume to the Proceedings of 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 190–193.

Nakagawa, T., Inui, K., and Kurohashi, S. (2010). Dependency tree-based sentiment classification using CRFs with hidden variables. In Proceedings of the 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 786–794.

Ng, V., Dasgupta, S., and Arifin, S. M. N. (2006). Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews. In Proceedings of the Conference on Computational Linguistics (Coling) and Association for Computational Linguistics (ACL) Main Conference Poster Sessions, pages 611–618.


Nigam, K., Lafferty, J., and McCallum, A. (1999). Using maximum entropy for text classification. In Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67.

Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151):773–782.

Noreen, E. W. (1989). Computer Intensive Methods for Hypothesis Testing: An Introduction. Wiley.

Norvig, P. (2011). On Chomsky and the two cultures of statistical learning. Online essay. http://norvig.com/chomsky.html.

Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab.

Pang, B. and Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL), Main Volume, pages 271–278.

Pang, B. and Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 115–124.

Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79–86.

Pearson, K. (1905). The problem of the random walk. Nature, 72(1865):294.

Petrov, S. and Klein, D. (2007). Learning and inference for hierarchically split PCFGs. In Proceedings of the 22nd National Conference on Artificial Intelligence (NCAI), pages 1663–1666.

Polanyi, L. and Zaenen, A. (2006). Contextual valence shifters. In Computing Attitude and Affect in Text: Theory and Applications, volume 20, pages 1–10. Springer.

Pustejovsky, J. and Stubbs, A. (2012). Natural Language Annotation for Machine Learning – a Guide to Corpus-Building for Applications. O’Reilly.


Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. In Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 133–142.

Reynar, J. C. and Ratnaparkhi, A. (1997). A maximum entropy approach to identifying sentence boundaries. In Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 16–19.

Riloff, E. and Wiebe, J. (2003). Learning extraction patterns for subjective expressions. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 105–112.

Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(9):533–535.

Salton, G. and McGill, M. J. (1986). Introduction to Modern Information Retrieval. McGraw-Hill.

Scheffer, T., Decomain, C., and Wrobel, S. (2001). Active hidden Markov models for information extraction. In Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis (IDA), pages 309–318.

Scheible, C. (2010). Sentiment translation through lexicon induction. In Proceedings of the ACL 2010 Student Research Workshop (SRW), pages 25–30.

Scheible, C., Laws, F., Michelbacher, L., and Schütze, H. (2010). Sentiment translation through multi-edge graphs. In Proceedings of the 26th International Conference on Computational Linguistics (Coling): Posters, pages 1104–1112.

Scheible, C. and Schütze, H. (2012). Bootstrapping sentiment labels for unannotated documents with Polarity PageRank. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC).

Scheible, C. and Schütze, H. (2013a). Cutting recursive autoencoder trees. In Proceedings of the International Conference on Learning Representations (ICLR) 2013.

Scheible, C. and Schütze, H. (2013b). Sentiment relevance. In Proceedings of the 51st Meeting of the Association for Computational Linguistics (ACL), pages 954–963.


Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech communication, 40(1):227–256.

Settles, B. (2009). Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison.

Settles, B. (2011). Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1467–1478.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27:397–423.

Smith, N. A. (2011). Linguistic Structure Prediction. Morgan and Claypool.

Snow, R., O’Connor, B., Jurafsky, D., and Ng, A. (2008). Cheap and fast – but is it good? evaluating non-expert annotations for natural language tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 254–263.

Socher, R., Manning, C., and Ng, A. (2010). Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop.

Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., and Manning, C. D. (2011). Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 151–161.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1631–1642.

Solomon, R. C. (1977). The logic of emotion. Noûs, 11(1):41–49.

Spertus, E. (1997). Smokey: Automatic recognition of hostile messages. In Proceedings of the 14th National Conference on Artificial Intelligence (AAAI) and 9th Innovative Applications of Artificial Intelligence Conference (IAAI), pages 1058–1065.

Su, F. and Markert, K. (2008). Eliciting subjectivity and polarity judgements on word senses. In Proceedings of the Workshop on Human Judgements in Computational Linguistics, pages 42–50.


Surdeanu, M., McClosky, D., Tibshirani, J., Bauer, J., Chang, A. X., Spitkovsky, V. I., and Manning, C. D. (2010). A simple distant supervision approach for the TAC-KBP slot filling task. In Proceedings of the Third Text Analysis Conference (TAC 2010).

Taboada, M., Brooke, J., and Stede, M. (2009). Genre-based paragraph classification for sentiment analysis. In Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 62–70.

Taboada, M., Brooke, J., Tofiloski, M., Voll, K., and Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307.

Täckström, O. and McDonald, R. (2011). Discovering fine-grained sentiment with latent variable structured prediction models. In Proceedings of the 33rd European Conference on Advances in Information Retrieval (ECIR), pages 368–374.

Tan, S. and Cheng, X. (2009). Improving SCL model for sentiment-transfer learning. In Proceedings of Human Language Technologies (HLT): The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (ACL), Companion Volume: Short Papers, pages 181–184.

Thrun, S. (1996). Is learning the n-th thing any easier than learning the first? In Proceedings of Advances in Neural Information Processing Systems (NIPS) 9, pages 640–646.

Titov, I. and McDonald, R. (2008). A joint model of text and aspect ratings for sentiment summarization. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL): Human Language Technology (HLT), pages 308–316.

Tu, Z., He, Y., Foster, J., van Genabith, J., Liu, Q., and Lin, S. (2012). Identifying high-impact sub-structures for convolution kernels in document-level sentiment classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL): Short Papers, pages 338–343.

Turney, P. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 417–424.

Turtle, H. and Croft, W. B. (1991). Evaluation of an inference network-based retrieval model. ACM Transactions on Information Systems, 9(3):187–222.


Wang, S. and Manning, C. (2012). Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL): Short Papers, pages 90–94.

Wellman, M. P. and Henrion, M. (1993). Explaining “explaining away”. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(3):287–292.

Wiebe, J. and Mihalcea, R. (2006). Word sense and subjectivity. In Proceedings of the 21st International Conference on Computational Linguistics (Coling) and 44th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1065–1072.

Wiebe, J. and Riloff, E. (2005). Creating subjective and objective sentence classifiers from unannotated texts. Computational Linguistics and Intelligent Text Processing, pages 486–497.

Wiebe, J., Wilson, T., and Cardie, C. (2005). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210.

Wiebe, J. M. (1990). Recognizing Subjective Sentences: A Computational Investigation of Narrative Text. PhD thesis, State University of New York at Buffalo.

Wiegand, M., Balahur, A., Roth, B., Klakow, D., and Montoyo, A. (2010). A survey on the role of negation in sentiment analysis. In Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pages 60–68.

Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., Cardie, C., Riloff, E., and Patwardhan, S. (2005a). OpinionFinder: A system for subjectivity analysis. In Proceedings of the Conference on Human Language Technology (HLT) and Empirical Methods in Natural Language Processing (EMNLP): Interactive Demonstrations, pages 34–35.

Wilson, T. and Wiebe, J. (2003). Annotating opinions in the world press. In Proceedings of the 4th SIGdial Workshop on Discourse and Dialogue, pages 13–22.

Wilson, T., Wiebe, J., and Hoffmann, P. (2005b). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on Human Language Technology (HLT) and Empirical Methods in Natural Language Processing (EMNLP), pages 347–354.

Yang, B. and Cardie, C. (2013). Joint inference for fine-grained opinion extraction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), pages 1640–1649.


Yeh, A. (2000). More accurate tests for the statistical significance of result differences. In Proceedings of the 18th Conference on Computational Linguistics (Coling), pages 947–953.

Yessenalina, A. and Cardie, C. (2011). Compositional matrix-space models for sentiment analysis. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 172–182.

Yessenalina, A., Yue, Y., and Cardie, C. (2010). Multi-level structured models for document-level sentiment classification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1046–1056.

Zaidan, O., Eisner, J., and Piatko, C. (2007). Using “annotator rationales” to improve machine learning for text categorization. In Proceedings of the 2007 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 260–267.

Zhang, L., Liu, B., Lim, S. H., and O’Brien-Strain, E. (2010). Extracting and ranking product features in opinion documents. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling): Posters, pages 1462–1470.

Zhuang, L., Jing, F., and Zhu, X.-Y. (2006). Movie review mining and summarization. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM), pages 43–50.

Zipf, G. K. (1935). The psycho-biology of language. Houghton Mifflin.
