Sentiment Analysis Using a Novel Human Computation Game
Total Page:16
File Type:pdf, Size:1020Kb
Sentiment Analysis Using a Novel Human Computation Game Claudiu-Cristian Musat THISONE Alireza Ghasemi Boi Faltings Artificial Intelligence Laboratory (LIA) Ecole Polytechnique Fed´ erale´ de Lausanne (EPFL) IN-Ecublens, 1015 Lausanne, Switzerland [email protected] Abstract data is obtained from people using human computa- tion platforms and games. We also prove that the In this paper, we propose a novel human com- method can provide not only labelled texts, but peo- putation game for sentiment analysis. Our ple also help by selecting sentiment-expressing fea- game aims at annotating sentiments of a col- tures that can generalize well. lection of text documents and simultaneously constructing a highly discriminative lexicon of Human computation is a newly emerging positive and negative phrases. paradigm. It tries to solve large-scale problems by Human computation games have been widely utilizing human knowledge and has proven useful used in recent years to acquire human knowl- in solving various problems (Von Ahn and Dabbish, edge and use it to solve problems which are 2004; Von Ahn, 2006; Von Ahn et al., 2006a). infeasible to solve by machine intelligence. To obtain high quality solution from human com- We package the problems of lexicon construc- putation, people should be motivated to make their tion and sentiment detection as a single hu- best effort. One way to incentivize people for sub- man computation game. We compare the re- mitting high-quality results is to package the prob- sults obtained by the game with that of other well-known sentiment detection approaches. lem at hand as a game and request people to play Obtained results are promising and show im- it. This process is called gamification. The game provements over traditional approaches. design should be such that the solution to the main problems can be formed by appropriately aggregat- ing results of played games. 1 Introduction In this work, we propose a cooperative human We propose a novel solution for the analysis of sen- computation game for sentiment analysis called timent expressed in text media. Novel corpus based Guesstiment. It aims at annotating sentiment of and lexicon based sentiment analysis methods are a collection of text documents, and simultaneously created each year. The continual emergence of con- constructing a lexicon of highly polarized (positive ceptually similar methods for this known problem and negative) words which can further be used for shows that a satisfactory solution has still not been sentiment detection tasks. By playing a collabora- found. We believe that the lack of suitable labelled tive game, people rate hotel reviews as positive and data that could be used in machine learning tech- negative and select words and phrases within the re- niques to train sentiment classifiers is one of the ma- views that best express the chosen polarity. jor reasons the field of sentiment analysis is not ad- We compare these annotations with those ob- vancing more rapidly. tained during a former crowd-sourcing survey and Recognizing that knowledge for understanding prove that packaging the problem as a game can im- sentiment is common sense and does not require ex- prove the quality of the responses. We also com- perts, we plan to take a new approach where labelled pare our approach with the state-of-the-art machine 1 Proceedings of the 3rd Workshop on the People’s Web Meets NLP, ACL 2012, pages 1–9, Jeju, Republic of Korea, 8-14 July 2012. c 2012 Association for Computational Linguistics learning techniques and prove the superiority of hu- (Settles, 2011a) is another approach which aims man cognition for this task. In a third experiment at combining active learning with crowd-sourcing we use the same annotations in a multi faceted opin- for text classification. The principal contribution of ion classification problem and find that results are their work is that as well as document annotation, superior to those obtained using known linguistic re- they use human computation also to perform fea- sources. ture selection. (Law et al., ) is another recent work In (section 2) we review the literature related to which proposes a game for acquisition of attribute- our work. We then outline the game and its rules value pairs from images. (section 3). We compare the Guesstiment results to the state-of-the-art machine learning, standard 2.2 Human Computation Games crowd-sourcing methods and sentiment dictionar- Luis Von Ahn, the pioneer of the field of human ies(section 4) and conclude the paper with ideas for computation, designed a game to encourage play- future work (section 5). ers to semantically annotate a large corpus of images (Von Ahn and Dabbish, 2004). It was the first human 2 Related Work computation game. Following Von Ahn’s work, more researchers In this section we review the important literature re- were encouraged to package computational prob- lated and similar to our work. Sine we propose a lems as joyful games and have a group of non- human computation approach for sentiment analy- expert users play them (Von Ahn, 2006). Verbosity sis, we start by reviewing the literature on human (Von Ahn et al., 2006a) was designed with the goal computation and the closely related field of crowd- of gathering common sense knowledge about words. sourcing. Then we move on by having a brief look KissKissBan (Ho et al., 2009) was another game for on the human computation and knowledge acquisi- image annotation. tion games proposed so far by the researchers. Fi- Peekaboom(Von Ahn et al., 2006b) aimed at im- nally, we briefly review major sentiment analysis age segmentation and ”Phrase Detectives” (Cham- methods utilized by the researchers. berlain et al., 2008) was used to help constructing an anamorphic corpus for NLP tasks. Another human 2.1 Human Computation and Crowd-Sourcing computation game is described in (Riek et al., 2011) The literature on human computation is highly over- whose purpose is semantic annotation of video data. lapping with that of crowd-sourcing, as they are closely connected. The two terms are sometimes 2.3 Sentiment Analysis used interchangeably although they are slightly dif- The field of sentiment analysis and classification ferent. Crowd-sourcing in its broadest form, ”is the currently mostly deals with extracting sentiment act of taking a job traditionally performed by a des- from text data. Various methods (Turney, 2002; ignated agent (usually an employee) and outsourc- Esuli and Sebastiani, 2006; Taboada et al., 2011; ing it to an undefined, generally large group of peo- Pang et al., 2002) have been proposed for effec- ple in the form of an open call”(Quinn and Bederson, tive and efficient sentiment extraction of large col- 2011; Howe, 2006). Since the first use of the word lections of text documents. crowd-sourcing by J. Howe (Howe, 2006), there has Sentiment classification methods are usually di- been a lot of interest in this field due to the wide ac- vided into two main categories: Lexicon based tech- cessibility of anonymous crowd workers across the niques and methods based on machine learning. In web. lexicon-based methods, a rich lexicon of polarized The work described in (Rumshisky, 2011) uses words is used to find key sentences and phrases in crowd-sourcing to perform word sense disambigua- text documents which can be used to describe senti- tion on a corpus. In (Vondrick et al., 2010), crowd- ment of the whole text (Taboada et al., 2011). Ma- sourcing is used for video annotation. Moreover, chine learning methods, on the other hand, treat the (Christophe et al., 2010) has used crowd-sourcing sentiment detection problem as a text classification for satellite image analysis. task (Pang et al., 2002). 2 Most of the research has been oriented towards round finishes. The game continues by introducing finding the overall polarity of whole documents. The different sentences to the teams and hence gathers problem was broken down even more by using of information about polarity of terms and their corre- the faceted opinion concept (Liu, 2010). The goal of sponding context. this attempt was to determine precisely what aspects of the concepts the expressed opinions should be 3 The Proposed Game linked to. We will use this distinction to assess our In this section we propose a novel human compu- method’s viability in both overall and multi faceted tation game called Guesstiment. We use the infor- opinion analysis. mation provided while playing this game to obtain a The work in (Brew et al., 2010) is an attempt to reliable dataset of sentiment annotated data as well use crowd-sourcing for sentiment analysis. The au- as a lexicon of highly polarized positive and negative thors use a crowd of volunteers for the analysis of words. sentiments of economical news items. Users provide Having two by-products as the result of playing annotations which are then used to learn a classifier instead of merely trying to obtain document annota- to discriminate positive articles from negatives. It tions is the most important contribution of Guessti- uses active learning to select a diverse set of arti- ment. The idea of using crowd-sourcing for feature cles for annotations so that a generalizable, precise extraction has already been used in (Settles, 2011b), classifier can be learned from annotated data. The but not as a human computation game. In the rest of work in (Zhou et al., 2010) is another approach to the following section, we will discuss the game play use active learning to improve sentiment classifica- and rules of Guesstiment. tion. It uses a deep network to learn a sentiment classifier in a semi-supervised manner. Moreover, 3.1 Rules of Guesstiment this method uses active learning to learn from unla- Guesstiment is a two-player asynchronous game.