New Avenues in Opinion Mining and Sentiment Analysis
KNOWLEDGE-BASED APPROACHES TO CONCEPT-LEVEL SENTIMENT ANALYSIS

Erik Cambria, National University of Singapore
Björn Schuller, Technical University of Munich
Yunqing Xia, Tsinghua University
Catherine Havasi, Massachusetts Institute of Technology

The Web holds valuable, vast, and unstructured information about public opinion. Here, the history, current use, and future of opinion mining and sentiment analysis are discussed, along with relevant techniques and tools.

IEEE Intelligent Systems, March/April 2013. 1541-1672/13/$31.00 © 2013 IEEE. Published by the IEEE Computer Society. www.computer.org/intelligent

Others’ opinions can be crucial when it’s time to make a decision or choose among multiple options. When those choices involve valuable resources (for example, spending time and money to buy products or services), people often rely on their peers’ past experiences. Until recently, the main sources of information were friends and specialized magazines or websites. Now, the “social web” provides new tools to efficiently create and share ideas with everyone connected to the World Wide Web. Forums, blogs, social networks, and content-sharing services help people share useful information. This information is unstructured, however, and because it’s produced for human consumption, it’s not something that’s “machine processable.” Capturing public opinion about social events, political movements, company strategies, marketing campaigns, and product preferences is garnering increasing interest from the scientific community (for the exciting open challenges), and from the business world (for the remarkable marketing fallouts and for possible financial market prediction). The resulting emerging fields are opinion mining and sentiment analysis. Although commonly used interchangeably to denote the same field of study, opinion mining and sentiment analysis actually focus on polarity detection and emotion recognition, respectively. Because the identification of sentiment is often exploited for detecting polarity, however, the two fields are usually combined under the same umbrella or even used as synonyms. Both fields use data mining and natural language processing (NLP) techniques to discover, retrieve, and distill information and opinions from the World Wide Web’s vast textual information.

Mining opinions and sentiments from natural language is challenging, because it requires a deep understanding of the explicit and implicit, regular and irregular, and syntactical and semantic language rules. Sentiment analysis researchers struggle with NLP’s unresolved problems: coreference resolution, negation handling, anaphora resolution, named-entity recognition, and word-sense disambiguation. Opinion mining is a very restricted NLP problem, because the system only needs to understand the positive or negative sentiments of each sentence and the target entities or topics. Therefore, sentiment analysis is an opportunity for NLP researchers to make tangible progress on all fronts of NLP, and potentially have a huge practical impact.

Many companies use opinion mining and sentiment analysis as part of their research. For instance, companies use opinion mining to create and automatically maintain review and opinion-aggregation websites. Their systems continuously gather a wide array of information from the Web, such as product reviews, brand perception, and political issues. Other systems might also use opinion mining and sentiment analysis as subcomponent technology to improve customer relationship management and recommendation systems through positive and negative customer feedback. Similarly, opinion mining and sentiment analysis might detect and exclude “flames” (overly heated or antagonistic language) in social communication and enhance antispam systems.

Companies use sentiment analysis to develop marketing strategies by assessing and predicting public attitudes toward their brand. Research and development focuses on designing automatic tools that crawl online reviews and condense the information gathered. Numerous companies already provide tools that track public viewpoints on a large scale by offering graphical summarizations of trends and opinions in the blogosphere. Developing opinion-tracking systems is commercially important. Also, several tools already exist to help companies extract and analyze information from blogs about large-scale trends in customers’ opinions about products; those tools include SenticNet (http://sentic.net), Luminoso (http://luminoso.com), Factiva (http://dowjones.com/factiva), Attensity (http://attensity.com), and Converseon (http://converseon.com). Most existing tools and research, however, are limited to polarity evaluation or mood classification according to a limited set of emotions. Such methods mainly rely on parts of text in which people explicitly express emotional states, and therefore the tools can’t capture a reviewer’s implicitly expressed opinion or sentiment. To better consider the state of this field, we discuss here the past, present, and future trends of sentiment analysis by delving into the evolution of opinion mining systems. More comprehensive surveys on sentiment analysis can be found elsewhere.¹⁻³

Common Sentiment Analysis Tasks

The basic task of opinion mining is polarity classification. Polarity classification occurs when a piece of text stating an opinion on a single issue is classified as one of two opposing sentiments. Reviews such as “thumbs up” versus “thumbs down,” or “like” versus “dislike” are examples of polarity classification. Polarity classifications also identify pro and con expressions in online reviews and help make the product evaluations more credible.

Agreement detection is another form of binary sentiment classification. Agreement detection determines whether a pair of text documents should receive the same or different sentiment-related labels. After the system identifies the polarity classification, it might assign degrees of positivity to the polarity; that is, it might locate the opinion on a continuum between positive and negative. Also, it can classify multimedia resources according to mood and emotional content for purposes such as affective human-machine interaction, troll filtering, and cyber-issue detection. If the text doesn’t contain strong opinions or covers more than one issue or item, new challenges arise, such as subjectivity detection and opinion-target identification. Distinguishing between subjective and objective text helps classify the sentiment. Moreover, a piece of text might have a polarity without necessarily containing an opinion; for example, a news article can be classified into good or bad news without being subjective.

Typically, a system performs sentiment analysis over on-topic documents (using, for example, the results of a topic-based search engine). However, several studies suggest that managing these two tasks jointly might benefit overall performance. For example, a document’s off-topic passages might contain irrelevant affective information and create inaccurate global-sentiment polarity about the main topic. Also, a document might contain information on multiple topics that interest the user. In such instances, it’s important to identify topics and separate the opinions associated with each topic.

Evolution of Opinion Mining

Currently, opinion mining and sentiment analysis rely on vector extraction to represent the most salient and important text features. We can use this vector to classify the most relevant features. Two commonly used features are term frequency and presence. Presence is a binary-valued feature vector in which the entries indicate only whether a term occurs (value 1) or doesn’t (value 0). Presence forms a more effective basis to review polarity classification and reveals an interesting difference: although recurrent keywords indicate a topic, repeated terms might not reflect the overall sentiment.

It’s possible to add other term-based features to the features vector. Position refers to how a token’s position in a text unit might affect the text’s sentiment. Further, we might consider presence n-grams (typically bigrams and trigrams) to be useful features. Some methods also rely on the distance between terms. General textual analysis uses part of speech (POS) information (for example, nouns, ad-

mostly on linguistic heuristics. For example, in their work on polarity classification, Vasileios Hatzivassiloglou and Kathleen McKeown discuss how two classes of interest represent opposites.⁴ These opposite constraints help the system with label decisions.

These approaches were unable to detect novel expression of sentiment. Consequently, later work focused on propagating the valence of seed words (for which the polarity is known) to terms that co-occur with

research, Bo Pang and Lillian Lee attempted to partially address this problem by incorporating location information into the feature set.⁷

More recent studies emphasize the importance of position in sentiment summarization. For example, the incipits of articles in topic-based summarization usually indicate the text’s sentiment. However, the last n sentences of a product review often best summarize the document’s overall sentiment, almost as well