
Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach

Hongning Wang, Yue Lu, ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
{wang296, yuelu2, czhai}@uiuc.edu

ABSTRACT

In this paper, we define and study a new opinionated text data analysis problem called Latent Aspect Rating Analysis (LARA), which aims at analyzing opinions expressed about an entity in an online review at the level of topical aspects, to discover each individual reviewer's latent opinion on each aspect as well as the relative emphasis on different aspects when forming the overall judgment of the entity. We propose a novel probabilistic rating regression model to solve this new text mining problem in a general way. Empirical experiments on a hotel review data set show that the proposed latent rating regression model can effectively solve the problem of LARA, and that the detailed analysis of opinions at the level of topical aspects enabled by the proposed model can support a wide range of application tasks, such as aspect opinion summarization, entity ranking based on aspect ratings, and analysis of reviewers' rating behavior.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Text Mining

General Terms: Algorithms, Experimentation

Keywords: Opinion and sentiment analysis, review mining, latent rating analysis

1. INTRODUCTION

With the emergence and advancement of Web 2.0, more and more people can freely express opinions on all kinds of entities such as products and services. These reviews are useful to other users for making informed decisions and to merchants for improving their service. However, the volume of reviews grows so rapidly that it is becoming increasingly difficult for users to wade through numerous reviews to find the needed information. Much work has been done to alleviate this problem, including extracting information from reviews [18, 16, 26], summarizing users' opinions, categorizing reviews according to opinion polarities [20, 6, 7], and extracting comparative sentences from reviews [12, 13]. Nevertheless, with the current techniques, it is still hard for users to easily digest and exploit the large number of reviews due to inadequate support for understanding each individual reviewer's opinions at the fine-grained level of topical aspects.

[Figure 1: A Sample Hotel Review. Hotel name: Four Seasons Hotel George V Paris; reviewer: trollydollySydney. Review text: "Lovely location, however, for 820 euros this was really bad value. The room was nice, but could have been anywhere in the world - it felt like a chain hotel in the worst sense. The room was tiny! Normally Four Seasons have mind blowing service and although they were nice it was not amazing. We had just been to Claridge's in London which was fantastic and half the price. It wasn't bad, but it wasn't great and not worth the money. A coke was 10 euros! There was no free wireless - all in all very average."]

Consider a typical hotel review shown in Figure 1. This review discusses multiple aspects of the hotel, such as price, room condition, and service, but the reviewer only gives an overall rating for the hotel; without an explicit rating on each aspect, a user would not be able to easily know the reviewer's opinion on each aspect. Going beyond the overall rating to know the opinions of a reviewer on different aspects is important, because different reviewers may give a hotel the same overall rating for very different reasons. For example, one reviewer may have liked the location, but another may have enjoyed the room. In order to help users tell this difference, it is necessary to understand a reviewer's rating on each of the major rating aspects (i.e., rating factors) of a hotel. Furthermore, even if we can reveal the rating on an aspect such as "price", it may still be insufficient, because "cheap" may mean different price ranges for different reviewers. Even the same reviewer may use a different standard to define "cheap" depending on how critical other factors (e.g., location) are; intuitively, when a reviewer cares more about the location, the reviewer would tend to be more willing to tolerate a higher price. To understand such subtle differences, it is necessary to further reveal the relative importance weight that a reviewer placed on each aspect when assigning the overall rating.

To achieve such a deeper and more detailed understanding of a review, we propose to study a novel text mining problem called Latent Aspect Rating Analysis (LARA). Given a set of reviews with overall ratings, LARA aims at analyzing opinions expressed in each review at the level of topical aspects to discover each individual reviewer's latent rating on each aspect, as well as the relative importance weight on different aspects when forming the overall judgment.

Revealing the latent aspect ratings and aspect weights in each individual review would enable a wide range of application tasks. For example, the revealed latent ratings on different aspects can immediately support aspect-based opinion summarization; aspect weights are directly useful for analyzing reviewers' rating behaviors; and the combination of latent ratings and aspect weights can support personalized aspect-level ranking of entities, by using only those reviews from reviewers whose aspect weights are similar to those preferred by an individual user.

While existing work on opinion summarization has addressed the LARA problem to a certain extent, no previous work has attempted to infer the latent aspect ratings at the level of each individual review, nor has it attempted to infer the weights a reviewer placed on different aspects. (See Section 2 for a more detailed review of all the related work.)

To solve this new mining problem, we propose a two-stage approach based on a novel latent rating regression model. In the first stage, we employ a bootstrapping-based algorithm to identify the major aspects (guided by a few seed words describing the aspects) and segment reviews. In the second stage, we propose a generative Latent Rating Regression (LRR) model, which aims at inferring aspect ratings and weights for each individual review based only on the review content and the associated overall rating. More specifically, the basic idea of LRR is to assume that the overall rating is "generated" based on a weighted combination of the latent ratings over all the aspects, where the weights model the relative emphasis that the reviewer has placed on each aspect when giving the overall rating. We further assume that the latent rating of each aspect depends on the content in the segment of a review discussing the corresponding aspect, through a regression model. In other words, we may also view the latent rating on each aspect as being "generated" by another weighted sum of word features, where the weights indicate the corresponding sentimental polarities. Since we do not observe the ratings on different aspects, the response variable of this regression model (i.e., the aspect rating) is latent.

We evaluate the proposed LRR model on a hotel data set crawled from TripAdvisor (www.tripadvisor.com). Experiment results show that the proposed LRR model can effectively decompose the overall rating of a given review into ratings on different aspects and reveal the relative weights placed on those aspects by the reviewer. We also show that the results obtained from the LRR model can support several application tasks, including aspect opinion summarization, personalized entity ranking, and rating behavior analysis of reviewers.

2. RELATED WORK

To the best of our knowledge, no previous work has studied the proposed LARA problem, but there are several lines of related work.

Analysis of the overall sentiment of review text has been extensively studied. Related research started from a definition of binary classification of a given piece of text into the positive or negative class [6, 7, 20, 5, 14]. Later, the definition was generalized to a multi-point rating scale [19, 9]. Many approaches have been proposed to solve the problem, including supervised, unsupervised, and semi-supervised approaches, but they all attempt to predict an overall sentiment class or rating of a review, which is not as informative as revealing aspect ratings, as we attempt to do.

Since an online review usually contains multiple opinions on multiple aspects, some recent work has started to predict aspect-level ratings instead of one overall rating. For example, Snyder et al. [23] show that modeling the dependencies among aspects using the good grief algorithm can improve the prediction of aspect ratings. In [24], Titov et al. propose to extract aspects and predict the corresponding ratings simultaneously: they use topics to describe aspects and incorporate a regression model fed by the ground-truth ratings. However, they have assumed that the aspect ratings are explicitly provided in the training data. In contrast, we assume the aspect ratings are latent, which is a more general and more realistic scenario.

Summarization is a generally useful technique to combat information overload. A recent human evaluation [15] indicates that sentiment-informed summaries are strongly preferred over non-sentiment baselines, suggesting the usefulness of modeling sentiment and aspects when summarizing opinions. However, existing work on aspect-based summarization [10, 21, 18, 26] only aimed at aggregating all the reviews and representing the major opinions on different aspects for a given topic. While aggregated opinions can present a general picture of a topic, the details in each review are lost; furthermore, the differences among reviews/reviewers are not considered, thus the aggregated sentiment is based on reviewers with different tastes. Recent work by Lu et al. [17] is the closest to ours, but their goal is still to generate an aggregated summary with aspect ratings inferred from overall ratings. Most importantly, none of the previous work considers the reviewer's emphasis on different aspects, i.e., the aspect weights. Our work aims at inferring both the aspect ratings and the aspect weights at the level of individual reviews; the result can be useful for multiple tasks, including opinion-based entity ranking and analysis of user rating behavior, in addition to "rated aspect summarization".

3. PROBLEM DEFINITION

In this section, we formally define the problem of Latent Aspect Rating Analysis (LARA).

As a computational problem, LARA assumes that the input is a set of reviews of some interesting entity (e.g., a hotel), where each review has an overall rating. Such a format of reviews is quite common on most merchant web sites, e.g., Amazon (www.amazon.com) and Epinions (www.epinions.com), and the number of such reviews is growing constantly.

Formally, let $D = \{d_1, d_2, \ldots, d_{|D|}\}$ be a set of review text documents for an interesting entity or topic, where each review document $d \in D$ is associated with an overall rating $r_d$. We also assume that there are $n$ unique words in the vocabulary $V = \{w_1, w_2, \ldots, w_n\}$.

Definition (Overall Rating) An overall rating $r_d$ of a review document $d$ is a numerical rating indicating different levels of overall opinion of $d$, i.e., $r_d \in [r_{min}, r_{max}]$, where $r_{min}$ and $r_{max}$ are the minimum and maximum ratings respectively.
We further assume that we are given $k$ aspects, which are rating factors that potentially affect the overall rating of the given topic. For example, for hotel reviews, possible aspects may include "price" and "location." An aspect is specified through a few keywords, and provides a basis for latent aspect rating analysis.

Definition (Aspect) An aspect is a (typically very small) set of words that characterize a rating factor in the reviews. For example, words such as "price", "value", and "worth" can characterize the price aspect of a hotel. We denote an aspect by $A_i = \{w \mid w \in V, A(w) = i\}$, where $A(\cdot)$ is a mapping function from a word to an aspect label.

Definition (Aspect Ratings) An aspect rating $s_d$ is a $k$-dimensional vector, where the $i$-th dimension $s_{di}$ is a numerical measure indicating the degree of satisfaction demonstrated in the review $d$ toward the aspect $A_i$, with $s_{di} \in [r_{min}, r_{max}]$. A higher rating means a more positive sentiment towards the corresponding aspect.

Definition (Aspect Weights) An aspect weight $\alpha_d$ is a $k$-dimensional vector, where the $i$-th dimension $\alpha_{di}$ is a numerical measure indicating the degree of emphasis placed by the reviewer of review $d$ on aspect $A_i$, where we require $\alpha_{di} \in [0, 1]$ and $\sum_{i=1}^{k} \alpha_{di} = 1$ to make the weights easier to interpret and comparable across different reviews. A higher weight means more emphasis is put on the corresponding aspect.

Definition (Latent Aspect Rating Analysis (LARA)) Given a review collection $D$ about a topic $T$, where each review document $d$ is associated with an overall rating $r_d$, and $k$ aspects $\{A_1, A_2, \ldots, A_k\}$ to be analyzed, the problem of Latent Aspect Rating Analysis (LARA) is to discover each individual review's rating $s_{di}$ on each of the $k$ aspects, as well as the relative emphasis $\alpha_{di}$ the reviewer has placed on each aspect.

Informally, LARA aims at discovering the latent aspect ratings and aspect weights in each individual review $d$ based on all the review contents and the associated overall ratings. While the most general setup of the LARA problem would consist of also discovering the possibly unknown aspects, in addition to discovering the latent ratings/weights on different aspects, in this paper we assume that we are given a few keywords describing each aspect. This assumption is realistic, since for any given entity type it is feasible to manually specify the major aspects in this way; besides, such a setup also gives a user control over which aspects to analyze.
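For concreteness, the input and output of LARA can be summarized with the following minimal sketch (Python; the type and field names are ours, not part of the problem definition):

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Review:
        text: str                     # the review document d
        overall_rating: float         # r_d in [r_min, r_max]

    @dataclass
    class LaraOutput:
        aspect_ratings: List[float]   # s_d: one rating per aspect, in [r_min, r_max]
        aspect_weights: List[float]   # alpha_d: entries in [0, 1], summing to 1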
4. METHODS

A major challenge in solving the problem of LARA is that we do not have detailed supervision about the latent rating on each aspect, even though we are given a few keywords describing the aspects. Another challenge is that it is unclear how we can discover the relative weight placed by a reviewer on each aspect. To solve these challenges, we propose a novel Latent Rating Regression (LRR) model to tie both the latent ratings and the latent weights with the contents of a review on the one hand, and with the overall rating of the review on the other. Specifically, we assume that the reviewer generates the overall rating of a review based on a weighted combination of his/her ratings on all aspects, and the rating on each aspect is generated based on another weighted combination of the words in the review that discuss the corresponding aspect. After fitting such a two-fold regression model to all the review data, we would be able to obtain the latent aspect ratings and weights, thus solving the problem of LARA.

Since the LRR model assumes that we know which words are discussing which aspects in a review, we first perform aspect segmentation in a review document based on the given keywords describing the aspects, to obtain text segment(s) for each aspect. Thus, our overall approach consists of two stages, which we now discuss in detail.

4.1 Aspect Segmentation

The goal of this first step is to map the sentences in a review into subsets corresponding to each aspect. Since we assume that only a few keywords are specified to describe each aspect, we design a bootstrapping algorithm to obtain more related words for each aspect.

Algorithm: Aspect Segmentation
Input: A collection of reviews $\{d_1, d_2, \ldots, d_{|D|}\}$, a set of aspect keyword lists $\{T_1, T_2, \ldots, T_k\}$, vocabulary $V$, selection threshold $p$, and iteration step limit $I$.
Output: Reviews split into sentences with aspect assignments.
Step 0: Split all reviews into sentences, $X = \{x_1, x_2, \ldots, x_M\}$;
Step 1: Match the aspect keywords in each sentence of $X$ and record the matching hits for each aspect $i$ in $Count(i)$;
Step 2: Assign each sentence an aspect label by $a_i = \arg\max_i Count(i)$. If there is a tie, assign the sentence multiple aspects;
Step 3: Calculate the $\chi^2$ measure of each word in $V$;
Step 4: Rank the words under each aspect with respect to their $\chi^2$ values and join the top $p$ words for each aspect into the corresponding aspect keyword list $T_i$;
Step 5: If the aspect keyword lists are unchanged or the iteration exceeds $I$, go to Step 6; else go to Step 1;
Step 6: Output the annotated sentences with aspect assignments.

Figure 2: Bootstrapping method for aspect segmentation.

Specifically, the basic workflow of the proposed Aspect Segmentation Algorithm is as follows: given the seed words for each aspect and all the review text as input, we assign each sentence to the aspect that shares the maximum term overlap with that sentence; based on this initial aspect annotation, we calculate the dependencies between aspects and words with the Chi-Square ($\chi^2$) statistic [25], and include the words with high dependencies in the corresponding aspect keyword list. These steps are repeated until the aspect keyword lists are unchanged or the number of iterations exceeds the limit. The full description of the algorithm is in Figure 2. The $\chi^2$ statistic to compute the dependency between a term $w$ and aspect $A_i$ is defined as:

$$\chi^2(w, A_i) = \frac{C \times (C_1 C_4 - C_2 C_3)^2}{(C_1 + C_3) \times (C_2 + C_4) \times (C_1 + C_2) \times (C_3 + C_4)}$$

where $C_1$ is the number of times $w$ occurs in sentences belonging to aspect $A_i$, $C_2$ is the number of times $w$ occurs in sentences not belonging to $A_i$, $C_3$ is the number of sentences of aspect $A_i$ that do not contain $w$, $C_4$ is the number of sentences that neither belong to aspect $A_i$ nor contain word $w$, and $C$ is the total number of word occurrences.

After aspect segmentation, we obtain $k$ partitions of each review $d$, and represent them as a $k \times n$ feature matrix $W_d$, where $W_{dij}$ is the frequency of word $w_j$ in the text assigned to aspect $A_i$ of $d$, normalized by the total count of words in the text of that aspect.
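A minimal sketch of this bootstrapping loop, assuming whitespace tokenization and naive sentence splitting (all helper names are ours): it labels sentences by keyword overlap, scores words with the $\chi^2$ formula above, and grows each keyword list by the top-$p$ words.

    import re
    from collections import Counter

    def chi_square(c1, c2, c3, c4):
        # chi^2(w, A_i) from the formula above; C = c1 + c2 + c3 + c4
        c = c1 + c2 + c3 + c4
        denom = (c1 + c3) * (c2 + c4) * (c1 + c2) * (c3 + c4)
        return c * (c1 * c4 - c2 * c3) ** 2 / denom if denom else 0.0

    def aspect_segmentation(reviews, seeds, p=5, max_iter=10):
        keywords = {a: set(ws) for a, ws in seeds.items()}
        sents = [s.lower().split() for r in reviews
                 for s in re.split(r'[.!?]+', r) if s.strip()]
        labels = []
        for _ in range(max_iter):
            # Steps 1-2: label sentences by maximum keyword overlap (ties -> multiple labels)
            labels = []
            for sent in sents:
                hits = {a: len(kw & set(sent)) for a, kw in keywords.items()}
                best = max(hits.values())
                labels.append({a for a, h in hits.items() if h == best and best > 0})
            # Steps 3-4: chi^2 of every word against every aspect; keep the top-p words
            changed = False
            for a in keywords:
                in_s = [s for s, l in zip(sents, labels) if a in l]
                out_s = [s for s, l in zip(sents, labels) if a not in l]
                occ_in = Counter(w for s in in_s for w in s)
                occ_out = Counter(w for s in out_s for w in s)
                score = {w: chi_square(occ_in[w], occ_out[w],
                                       sum(1 for s in in_s if w not in s),
                                       sum(1 for s in out_s if w not in s))
                         for w in occ_in}
                top = sorted(score, key=score.get, reverse=True)[:p]
                if not set(top) <= keywords[a]:
                    keywords[a] |= set(top)
                    changed = True
            if not changed:   # Step 5: keyword lists stabilized
                break
        return labels, keywords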
4.2 Latent Rating Regression Model (LRR)

In the second stage, based on the aspect segmentation results in each review, we apply a novel Latent Rating Regression (LRR) model to analyze both the aspect ratings $s_d$ and the aspect weights $\alpha_d$.

4.2.1 The Generation Assumption

Our assumption about a reviewer's rating behavior is as follows: to generate an opinionated review, the reviewer first decides the aspects she wants to comment on; then, for each aspect, the reviewer carefully chooses the words to express her opinions. The reviewer then forms a rating on each aspect based on the sentiments of the words she used to discuss that aspect. Finally, the reviewer assigns an overall rating depending on a weighted sum of all the aspect ratings, where the weights reflect the relative emphasis she has placed on each aspect.

4.2.2 The LRR Model

The LRR model is a regression model that formally captures the generation process discussed above. Recall that after aspect segmentation, for each review $d$ we have a word frequency matrix $W_d$, which gives the normalized frequency of words in each aspect. The LRR model treats $W_d$ as the independent variables (i.e., the features of review $d$) and the overall rating $r_d$ of the review as the response variable (i.e., the variable to predict). In order to model the latent ratings on different aspects and the latent weights on the aspects, the LRR model further assumes that the overall rating is not directly determined by the word frequency features, but rather based on a set of latent ratings on different aspects, which are more directly determined by the word frequency features.

Formally, as defined in Section 3, $\alpha_d$ and $s_d$ are the review-level $k$-dimensional aspect weight vector and aspect rating vector, respectively. The reviewer of $d$ is assumed to first generate an aspect rating for each $A_i$ as a linear combination of $W_{di}$ and $\beta_i$, i.e.,

$$s_{di} = \sum_{j=1}^{n} \beta_{ij} W_{dij} \quad (1)$$

where $\beta_i \in \Re^n$ indicates the word sentiment polarities on aspect $A_i$.

Then, the reviewer would generate the overall rating based on the weighted sum of $\alpha_d$ and $s_d$, i.e., $\alpha_d^T s_d = \sum_{i=1}^{k} \alpha_{di} s_{di}$. Specifically, the overall rating is assumed to be a sample drawn from a Gaussian distribution with mean $\alpha_d^T s_d$ and variance $\delta^2$, which reflects the uncertainty of the overall rating predictions. Putting it all together, we have

$$r_d \sim N\left(\sum_{i=1}^{k} \alpha_{di} \sum_{j=1}^{n} \beta_{ij} W_{dij}, \; \delta^2\right) \quad (2)$$

Intuitively, the key idea here is to bridge the gap between the observed overall rating and the detailed text descriptions by introducing the latent aspect weights $\alpha_d$ and the term sentiment weights $\beta$, which enable us to model the overall rating based on the ratings of specific aspects.

Looking further into rating behaviors, we find that reviewers' emphasis on different aspects can be complicated: 1) different reviewers might have different preferences for the aspects, e.g., business travelers may emphasize internet service while honeymoon couples may pay more attention to rooms; 2) aspects are not independent, especially when the aspects overlap, e.g., an emphasis on cleanliness would indicate a preference for room too. In order to take the diversity of reviewers' preferences into consideration, we further treat the aspect weights $\alpha_d$ in each review $d$ as a set of random variables drawn from an underlying prior distribution for the whole corpus. Furthermore, to capture the dependencies among different aspects, we employ a multivariate Gaussian distribution as the prior for the aspect weights, i.e.,

$$\alpha_d \sim N(\mu, \Sigma) \quad (3)$$

where $\mu$ and $\Sigma$ are the mean and covariance parameters.

Combining Eq (2) and (3), we get a Bayesian regression problem. The probability of the observed overall rating in a given review under our LRR model is given by:

$$P(r_d \mid d) = P(r_d \mid \mu, \Sigma, \delta^2, \beta, W_d) = \int p(\alpha_d \mid \mu, \Sigma)\; p\left(r_d \,\Big|\, \sum_{i=1}^{k} \alpha_{di} \sum_{j=1}^{n} \beta_{ij} W_{dij}, \; \delta^2\right) d\alpha_d \quad (4)$$

where $r_d$ and $W_d$ are the observed data in review $d$, $\Theta = (\mu, \Sigma, \delta^2, \beta)$ is the set of corpus-level model parameters, and $\alpha_d$ is the latent aspect weight for review $d$. Note that we assume $\delta^2$ and $\beta$ do not depend on individual reviewers, and they are thus also corpus-level model parameters. A graphical model illustration of LRR is given in Figure 3.

[Figure 3: Graphical Representation of LRR. The outer box represents reviews, while the inner box represents the composition of latent aspect ratings and word descriptions within a review.]

Suppose we are given the LRR model parameters $\Theta = (\mu, \Sigma, \delta^2, \beta)$; we can apply the model to get the aspect ratings and weights in each review as follows: (1) the latent aspect rating $s_d$ in a particular review $d$ can be calculated by Eq (1); (2) we appeal to the maximum a posteriori (MAP) estimation method to retrieve the most probable value of $\alpha_d$ in the given review.
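Given the parameters, step (1) above is a direct computation: Eq (1) and the Gaussian mean of Eq (2) are two dot products. A small numpy sketch (variable names are ours), with a toy example of k = 2 aspects and n = 3 words:

    import numpy as np

    def aspect_ratings(beta, W_d):
        # Eq (1): s_di = sum_j beta_ij * W_dij -- one linear predictor per aspect
        return (beta * W_d).sum(axis=1)               # shape (k,)

    def overall_mean(alpha_d, beta, W_d):
        # mean of the Gaussian in Eq (2): alpha_d^T s_d
        return float(alpha_d @ aspect_ratings(beta, W_d))

    beta = np.array([[1.0, -0.5, 0.0],                # word polarities, aspect 1
                     [0.2,  0.0, 0.8]])               # word polarities, aspect 2
    W_d = np.array([[0.5, 0.5, 0.0],                  # normalized word freq., aspect 1
                    [0.0, 0.0, 1.0]])                 # normalized word freq., aspect 2
    alpha_d = np.array([0.7, 0.3])
    print(aspect_ratings(beta, W_d))                  # -> [0.25 0.8]
    print(overall_mean(alpha_d, beta, W_d))           # -> 0.415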
The objective function of MAP estimation for review $d$ is defined as:

$$\mathcal{L}(d) = \log p(\alpha_d \mid \mu, \Sigma)\; p\left(r_d \,\Big|\, \sum_{i=1}^{k} \alpha_{di} \sum_{j=1}^{n} \beta_{ij} W_{dij}, \; \delta^2\right) \quad (5)$$

We expand this objective function and collect all the terms involving $\alpha_d$ in each review (denoted $\mathcal{L}(\alpha_d)$), so that:

$$\hat{\alpha}_d = \arg\max_{\alpha_d} \mathcal{L}(\alpha_d) = \arg\max_{\alpha_d} \left[ -\frac{(r_d - \alpha_d^T s_d)^2}{2\delta^2} - \frac{1}{2}(\alpha_d - \mu)^T \Sigma^{-1} (\alpha_d - \mu) \right] \quad (6)$$

subject to

$$\sum_{i=1}^{k} \alpha_{di} = 1, \qquad 0 \le \alpha_{di} \le 1 \;\text{ for } i = 1, 2, \ldots, k$$

To solve this constrained non-linear optimization problem, we apply the conjugate-gradient-interior-point method with the following formula for the derivatives with respect to $\alpha_d$:

$$\frac{\partial \mathcal{L}(\alpha_d)}{\partial \alpha_d} = -\frac{(\alpha_d^T s_d - r_d)\, s_d}{\delta^2} - \Sigma^{-1}(\alpha_d - \mu)$$
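As an illustration of this inference step, the sketch below maximizes $\mathcal{L}(\alpha_d)$ by gradient ascent using the derivative above. For simplicity, the conjugate-gradient-interior-point solver is replaced by a crude projection (clipping to [0, 1] and renormalizing) to satisfy the constraints; this simplification is ours, not the authors' exact solver.

    import numpy as np

    def infer_alpha(s_d, r_d, mu, Sigma_inv, delta2, steps=500, lr=0.01):
        # MAP estimate of alpha_d (Eq 6) by projected gradient ascent
        alpha = mu.copy()
        for _ in range(steps):
            grad = -(alpha @ s_d - r_d) * s_d / delta2 - Sigma_inv @ (alpha - mu)
            alpha = alpha + lr * grad
            alpha = np.clip(alpha, 1e-8, 1.0)   # crude projection onto the
            alpha = alpha / alpha.sum()         # constraint set (the simplex)
        return alpha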
4.2.3 Discussion

LRR is neither a purely supervised nor a purely unsupervised model, but it has interesting connections with several existing supervised and unsupervised models.

On the one hand, in terms of its objective function, LRR is similar to a supervised regression model, since both fit the observed overall ratings (see Eq (2)). However, unlike a regular supervised model, LRR does not learn a model for predicting the overall rating of a review; instead, in LRR we are more interested in analyzing the hidden ratings and weights on each aspect implied by the observed overall ratings (though LRR can also be used to predict the overall rating). From another perspective, according to Eq (1), there is another regression model embedded (with the aspect rating as the response variable), but the aspect ratings are not observed directly; thus the only supervision we have is the observed overall rating, which we assume is a weighted sum of these aspect ratings. This is a major distinction between our LRR model and traditional supervised regression models.

On the other hand, the LRR model also behaves similarly to unsupervised methods, in the sense that we do not require the availability of training data with known aspect ratings and yet can infer the latent aspect ratings. Specifically, to analyze the latent aspect ratings in a set of reviews, we need to first find the optimal model parameters $\Theta = (\mu, \Sigma, \delta^2, \beta)$ for the data set, and then predict the latent ratings $s_d$ using the learned parameters. In addition, when new data come, we need to update the parameters accordingly. However, LRR is not a traditional unsupervised method either, because we do have indirect supervision from the overall ratings.

It is also interesting to compare our LRR model with standard topic models, such as LDA [2]. In LDA, we are interested in the latent word distributions that can characterize topics, while in LRR we attempt to discover word weights that can characterize linguistic patterns associated with aspect ratings. One significant difference between the two models is that LDA is fully unsupervised, but LRR is partially supervised: although we do not have direct supervision on each aspect rating, the overall rating imposes constraints on the aspect ratings and thus provides indirect supervision.

4.3 LRR Model Estimation

In the previous section, we discussed how to apply our LRR model to infer the aspect weights $\alpha_d$ in each review $d$ when given the model $\Theta = (\mu, \Sigma, \delta^2, \beta)$. In this section, we discuss how to estimate these model parameters using the Maximum Likelihood (ML) estimator, i.e., how to find the optimal $\hat{\Theta} = (\hat{\mu}, \hat{\Sigma}, \hat{\delta}^2, \hat{\beta})$ that maximizes the probability of observing all the overall ratings. The log-likelihood function on the whole set of reviews is:

$$\mathcal{L}(D) = \sum_{d \in D} \log p(r_d \mid \mu, \Sigma, \delta^2, \beta, W_d) \quad (7)$$

Thus the ML estimate is

$$\hat{\Theta} = \arg\max_{\Theta} \sum_{d \in D} \log p(r_d \mid \mu, \Sigma, \delta^2, \beta, W_d).$$

To compute this ML estimate, we first randomly initialize all the parameter values to obtain $\Theta^{(0)}$, and then use the following EM-style algorithm to iteratively update and improve the parameters by alternately executing an E-step and an M-step in each iteration:

E-Step: For each review $d$ in the corpus, infer the aspect rating $s_d$ and aspect weight $\alpha_d$ based on the current parameters $\Theta^{(t)}$ (the subscript $t$ indicates the iteration), using Eq (1) and (6).

M-Step: Given the inferred aspect ratings $s_d$ and aspect weights $\alpha_d$ based on the current parameters $\Theta^{(t)}$, update the model parameters to obtain $\Theta^{(t+1)}$ by maximizing the "complete likelihood", i.e., the probability of observing all the variables, including the overall ratings $r_d$ and the inferred aspect ratings $s_d$ and aspect weights $\alpha_d$ for all the reviews.

First, we look at updating the two parameters of the Gaussian prior distribution of the aspect weights $\alpha_d$. Here our goal is to maximize the probability of observing all the $\alpha_d$ computed in the E-step for all the reviews; thus we have the following updating formulae, based on the ML estimation for a Gaussian distribution:

$$\mu^{(t+1)} = \arg\max_{\mu} \; -\sum_{d \in D} (\alpha_d - \mu)^T \Sigma^{-1} (\alpha_d - \mu) = \frac{1}{|D|} \sum_{d \in D} \alpha_d \quad (8)$$

$\Sigma^{(t+1)}$ is given by

$$\arg\max_{\Sigma} \left[ -|D| \log |\Sigma| - \sum_{d \in D} (\alpha_d - \mu^{(t+1)})^T \Sigma^{-1} (\alpha_d - \mu^{(t+1)}) \right]$$

That is,

$$\Sigma^{(t+1)} = \frac{1}{|D|} \sum_{d \in D} (\alpha_d - \mu^{(t+1)})(\alpha_d - \mu^{(t+1)})^T \quad (9)$$

Second, we look at how to update $\beta$ and $\delta^2$. Since $\alpha_d$ is assumed to be known, we can update $\delta^2$ and $\beta$ to maximize $P(r_d \mid \alpha_d, \delta^2, \beta, W_d)$ (defined in Eq (2)). Solving this optimization problem, we have the following updating formulae:

$$\delta^2_{(t+1)} = \arg\max_{\delta^2} \left[ -|D| \log \delta^2 - \frac{\sum_{d \in D} (r_d - \alpha_d^T s_d)^2}{\delta^2} \right] = \frac{1}{|D|} \sum_{d \in D} (r_d - \alpha_d^T s_d)^2 \quad (10)$$

$$\beta_{(t+1)} = \arg\max_{\beta} \; -\sum_{d \in D} \frac{\left(r_d - \sum_{i=1}^{k} \alpha_{di} \beta_i^T W_{di}\right)^2}{2\delta^2_{(t+1)}} \quad (11)$$

The closed-form solution for $\beta$ requires the inversion of a $|V| \times |V|$ matrix, which is expensive to compute directly. To avoid this, we apply a gradient-based method to find the optimal $\beta$, with the following gradients:

$$\frac{\partial \mathcal{L}(\beta)}{\partial \beta_i} = \sum_{d \in D} \left( \sum_{i'=1}^{k} \alpha_{di'} \beta_{i'}^T W_{di'} - r_d \right) \alpha_{di} W_{di}$$

The E-step and M-step are repeated until the likelihood value of Eq (7) converges.
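The whole estimation procedure, with the closed-form M-step updates of Eq (8)-(10) and a gradient step toward the optimum of Eq (11), can be sketched as follows (reusing infer_alpha from the sketch above; the small ridge terms for numerical stability are our addition):

    import numpy as np

    def fit_lrr(W, r, iters=50, lr=0.01):
        # W: (|D|, k, n) aspect-segmented feature matrices; r: (|D|,) overall ratings
        D, k, n = W.shape
        mu, Sigma = np.full(k, 1.0 / k), np.eye(k)
        delta2, beta = 1.0, np.zeros((k, n))
        for _ in range(iters):
            Sigma_inv = np.linalg.inv(Sigma)
            # E-step: aspect ratings by Eq (1), aspect weights by Eq (6)
            s = np.einsum('kn,dkn->dk', beta, W)
            alpha = np.stack([infer_alpha(s[d], r[d], mu, Sigma_inv, delta2)
                              for d in range(D)])
            # M-step: Eq (8), Eq (9), Eq (10)
            mu = alpha.mean(axis=0)
            diff = alpha - mu
            Sigma = diff.T @ diff / D + 1e-6 * np.eye(k)
            resid = (alpha * s).sum(axis=1) - r
            delta2 = float(np.mean(resid ** 2)) + 1e-6
            # gradient step on beta toward the optimum of Eq (11)
            beta -= lr * np.einsum('d,dk,dkn->kn', resid, alpha, W) / D
        return mu, Sigma, delta2, beta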
5. EXPERIMENT RESULTS

In this section, we first describe the review data set we used for evaluating the LRR model and then discuss the experiment results.

5.1 Data Set and Preprocessing

We crawled 235,793 hotel reviews from TripAdvisor over a one-month period (from February 14, 2009 to March 15, 2009). We chose this data set for evaluation because, in addition to the overall rating, reviewers also provided 7 aspect ratings in each review - value, room, location, cleanliness, check in/front desk, service, and business service, ranging from 1 star to 5 stars - which can serve as ground truth for quantitative evaluation of latent aspect rating prediction. The data is available at http://times.cs.uiuc.edu/~wang296/Data.

We first performed simple preprocessing on these reviews: 1) converting words to lower case; 2) removing punctuation, stop words defined in [1], and terms occurring fewer than 5 times in the corpus; and 3) stemming each word to its root with the Porter stemmer [22].

Since we only have ground-truth aspect ratings on the 7 pre-defined aspects, we have to ensure the same aspects are used in our prediction. Therefore, we manually selected a few seed words for each pre-defined aspect and used them as input to the aspect segmentation algorithm described in Section 4.1, where we set the selection threshold p = 5 and the iteration step limit I = 10 in our experiments. Table 1 shows the initial aspect terms used.

Table 1: Aspect Seed Words
Value: value, price, quality, worth
Room: room, suite, view, bed
Location: location, traffic, minute, restaurant
Cleanliness: clean, dirty, maintain, smell
Check In/Front Desk: stuff, check, help, reservation
Service: service, food, breakfast, buffet
Business service: business, center, computer, internet

After aspect segmentation, we discarded those sentences that failed to be associated with any aspect. If we required every review to contain descriptions of all 7 aspects, there would be only 780 reviews left, covering 184 hotels. To avoid sparseness and missing aspect descriptions in the reviews, we therefore concatenated all the reviews commenting on the same hotel into a new "review" (which we call an "h-review") and averaged the overall/aspect ratings over them as the ground-truth ratings. After this processing, we have a corpus with 1,850 hotels ("h-reviews") and 108,891 reviews; the details are given in Table 2.

Table 2: Evaluation Corpus Statistics
Number of Hotels: 1850
Number of Reviews: 108891
Sentences per Review: 8.21 ± 4.02
Words per Aspect: 9.57 ± 6.21
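A sketch of the preprocessing pipeline above, using NLTK's Porter stemmer (the actual stopword list is the external Onix list [1], so a caller-supplied set is assumed here):

    import re
    from collections import Counter
    from nltk.stem import PorterStemmer

    def preprocess(reviews, stopwords, min_count=5):
        stem = PorterStemmer().stem
        docs = [[stem(w) for w in re.findall(r'[a-z]+', text.lower())
                 if w not in stopwords] for text in reviews]
        # drop terms occurring fewer than min_count times in the corpus
        freq = Counter(w for doc in docs for w in doc)
        return [[w for w in doc if freq[w] >= min_count] for doc in docs]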
5.2 Qualitative Evaluation

We first show three sample results generated by the proposed LRR model for qualitative evaluation.

Aspect-level Hotel Analysis: One simple way to judge the quality of a given hotel is to check its overall rating. However, this rough criterion loses the detailed assessment of the quality of different aspects: it fails to tell the differences among hotels at the aspect level. To examine the capability of our LRR model to make this distinction, we randomly selected 3 hotels with the same overall rating of 4.2 (on average) but different aspect ratings, and applied LRR to predict the hidden aspect ratings. The prediction results are shown in Table 3, with the ground-truth ratings in parentheses (due to space limitations, we only show the results for the first four aspects).

Table 3: Aspect rating prediction for different hotels
Hotel | Value | Room | Location | Cleanliness
Grand Mirage Resort | 4.2 (4.7) | 3.8 (3.1) | 4.0 (4.2) | 4.1 (4.2)
Gold Coast Hotel | 4.3 (4.0) | 3.9 (3.3) | 3.7 (3.1) | 4.2 (4.7)
Eurostars Grand Marina Hotel | 3.7 (3.8) | 4.4 (3.8) | 4.1 (4.9) | 4.5 (4.8)

We can see that although these three hotels have the same overall rating, they differ in the detailed aspects: Grand Mirage Resort and Gold Coast Hotel both have better prices (high ratings for "value"), while Eurostars Grand Marina Hotel has a better location and room conditions. This information is valuable to users who have different requirements on aspects.

Reviewer-level Hotel Analysis: Even for the same hotel, different reviewers may hold different opinions on an aspect. The LRR model can further support such detailed analysis by predicting aspect ratings at the individual review level. To demonstrate this function, we selected the subset of reviews including all 7 aspect descriptions (780 reviews and 184 hotels) to examine the variance across different types of reviewers. In Table 4, two reviewers both give Hotel Riu Palace Punta Cana an overall rating of 4 stars, but they do not agree on every aspect: reviewer 1 rated the hotel's cleanliness higher than the other aspects, while reviewer 2 thought its value and location were the best part. Identifying such disagreements and providing the evidence (aspect ratings) would better help users make informed decisions based on reviews.

Table 4: Aspect rating prediction for different reviewers of Hotel Riu Palace Punta Cana
Reviewer | Value | Room | Location | Cleanliness
Mr.Saturday | 3.7 (4.0) | 3.5 (4.0) | 3.7 (4.0) | 5.8 (5.0)
Salsrug | 5.0 (5.0) | 3.0 (3.0) | 5.0 (4.0) | 3.5 (4.0)

Corpus-Specific Word Sentiment Orientation: In addition to predicting the latent aspect ratings for the whole text, LRR can also identify words' sentiment orientations. Different from traditional unsupervised sentiment classification methods, which rely on a predefined lexicon, LRR can uncover such sentiment information directly from the given data. In Table 5, we show some interesting results of LRR by listing the top 5 words with positive weights and the top 5 words with negative weights for each aspect, and we compare them with the opinion annotations in SentiWordNet [8]. (Due to space limitations, we only show term weights for the first 4 aspects.) We can find some interesting results: the word "ok" is positive as defined by SentiWordNet, but in our corpus reviewers use this word to comment on something barely acceptable; the words "linen", "walk" and "beach" do not have opinion annotations in SentiWordNet since they are nouns, while LRR assigns them positive sentiment orientations, likely because "linen" may suggest that the "cleanliness" condition is good, and "walk" and "beach" might imply that the location of a hotel is convenient. Thus, LRR can provide us with word orientation information that is specific to the given domain, which may be useful for augmenting an existing sentiment lexicon for specific domains.

Table 5: Term weights under aspects
Value | Rooms | Location | Cleanliness
resort 22.80 | view 28.05 | restaurant 24.47 | clean 55.35
value 19.64 | comfortable 23.15 | walk 18.89 | smell 14.38
excellent 19.54 | modern 15.82 | bus 14.32 | linen 14.25
worth 19.20 | quiet 15.37 | beach 14.11 | maintain 13.51
quality 18.60 | spacious 14.25 | perfect 13.63 | spotlessly 8.95
bad -24.09 | carpet -9.88 | wall -11.70 | smelly -0.53
money -11.02 | smell -8.83 | bad -5.40 | urine -0.43
terrible -10.01 | dirty -7.85 | mrt -4.83 | filthy -0.42
overprice -9.06 | stain -5.85 | road -2.90 | dingy -0.38
cheap -7.31 | ok -5.46 | website -1.67 | damp -0.30
5.3 Quantitative Evaluation

Baseline Algorithms: To the best of our knowledge, no previous work has attempted to solve exactly the same problem as ours. The closest work is [17], in which the authors proposed two methods, Local prediction and Global prediction, to solve a similar problem. We therefore take these two methods as our baselines for comparison. We also include another baseline approach, in which we take the overall rating of a review as the aspect ratings of that review to train a supervised model; we implement this method using the Support Vector Regression (SVR) model [3] and name it SVR-O. Besides, as an upper bound, we also test a fully supervised algorithm, SVR-A, i.e., an SVR model fed with the ground-truth aspect ratings for training, and compare it with what LRR can achieve without such supervision. We use the RBF kernel with default parameters as implemented in the libsvm package [4] for both SVR-O and SVR-A. All the models are evaluated on the same data set: for LRR and the two methods in [17], we use the whole data set for both training and testing; for the SVR-based models, we perform 4-fold cross validation and report the mean performance.

Measures: We use four different measures to quantitatively evaluate the methods: (1) mean square error on aspect rating prediction ($\Delta^2_{aspect}$), (2) aspect correlation inside reviews ($\rho_{aspect}$), (3) aspect correlation across reviews ($\rho_{review}$), and (4) Mean Average Precision (MAP) [11], a measure frequently used in information retrieval for evaluating ranking accuracy.

Formally, suppose $s^*_{di}$ is the ground-truth rating for aspect $A_i$. $\Delta^2_{aspect}$ directly measures the difference between the predicted aspect rating $s_{di}$ and $s^*_{di}$, and is defined as:

$$\Delta^2_{aspect} = \frac{1}{k \times |D|} \sum_{d=1}^{|D|} \sum_{i=1}^{k} (s_{di} - s^*_{di})^2$$

$\rho_{aspect}$ measures how well the predicted aspect ratings preserve the relative order of aspects within a review given by their ground-truth ratings. For example, in a review the reviewer may have liked the location better than the cleanliness, and $\rho_{aspect}$ assesses whether the predicted ratings give the same preference order. $\rho_{aspect}$ is defined as:

$$\rho_{aspect} = \frac{1}{|D|} \sum_{d=1}^{|D|} \rho_{s_d, s^*_d}$$

where $\rho_{s_d, s^*_d}$ is the Pearson correlation between the two vectors $s_d$ and $s^*_d$.

Similarly, $\rho_{review}$ is defined as the following average Pearson correlation:

$$\rho_{review} = \frac{1}{k} \sum_{i=1}^{k} \rho(\vec{s}_i, \vec{s}_i^{\,*})$$

where $\vec{s}_i$ and $\vec{s}_i^{\,*}$ are the predicted and ground-truth rating vectors for aspect $A_i$ across all the reviews. It tells us whether the predicted ratings and the ground-truth ratings for aspect $A_i$ give a similar ranking of all the reviews on this aspect. Such a ranking can answer questions such as "Which hotel has the best service?".

However, $\rho_{review}$ puts equal emphasis on all items and does not reflect the quality of the top-ranked ones, which intuitively is more important from a user's perspective. Therefore, we also use MAP to evaluate the model's ranking accuracy over reviews. More specifically, we treat the top 10 reviews ranked by the ground-truth aspect ratings as the relevant reviews, and see whether we would be able to rank these top 10 reviews at the top if we use the predicted aspect ratings to rank the reviews. We rank all the hotels according to each of the 7 aspects and calculate MAP at a cutoff of 10 reviews.
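For reference, the first three measures translate directly into code (using scipy's pearsonr; pred and truth are |D| x k arrays of predicted and ground-truth aspect ratings; the function names are ours):

    import numpy as np
    from scipy.stats import pearsonr

    def delta2_aspect(pred, truth):
        # mean squared error over all (review, aspect) pairs
        return float(np.mean((pred - truth) ** 2))

    def rho_aspect(pred, truth):
        # mean within-review Pearson correlation over the k aspects
        return float(np.mean([pearsonr(p, t)[0] for p, t in zip(pred, truth)]))

    def rho_review(pred, truth):
        # mean across-review Pearson correlation, computed per aspect
        return float(np.mean([pearsonr(pred[:, i], truth[:, i])[0]
                              for i in range(pred.shape[1])]))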

Result Analysis: We report the performance of all five algorithms measured by the four metrics in Table 6. Since SVR-A is fully supervised while the others are not, we list it separately on the last line as an upper bound. The best performance in each measure among the non-SVR-A models is: $\Delta^2_{aspect}$ and $\rho_{review}$ by Local prediction, and $\rho_{aspect}$ and MAP@10 by LRR.

Table 6: Comparison with other models
Method | $\Delta^2_{aspect}$ | $\rho_{aspect}$ | $\rho_{review}$ | MAP@10
Local prediction | 0.588 | 0.136 | 0.783 | 0.131
Global prediction | 0.997 | 0.279 | 0.584 | 0.000
SVR-O | 0.591 | 0.294 | 0.581 | 0.358
LRR | 0.896 | 0.464 | 0.618 | 0.379
SVR-A | 0.306 | 0.557 | 0.673 | 0.473

A general observation is that LRR performs much better than all the other non-SVR-A models on $\rho_{aspect}$ and MAP@10, but it does not perform the best on $\Delta^2_{aspect}$ and $\rho_{review}$. A high $\rho_{aspect}$ means that LRR can better distinguish the ratings of different aspects within a review. Note that such information about the relative preferences on different aspects cannot be obtained from only an overall rating. In addition, the high MAP@10 values show that LRR can also better retrieve the top 10 hotels based on each aspect rating than the other methods, leading to more useful ranking results from a user's perspective, since it is the top-ranked results that affect user satisfaction most.

Note that $\Delta^2_{aspect}$ measures the deviation of each predicted aspect rating from the ground-truth rating independently; thus it does not reflect how well the relative order of aspects is preserved. Consider, for example, a case with only three aspects. One review has an overall rating of 4 and ground-truth aspect ratings of (3, 4, 5). A naive prediction of (4, 4, 4), which cannot differentiate the aspects, would have $\Delta^2_{aspect} = 0.67$, but another prediction, (2, 3, 4), which does capture the real difference between the aspects, would have $\Delta^2_{aspect} = 1$, which is higher (thus worse). Indeed, it can be observed that the Local prediction method achieves the best $\Delta^2_{aspect}$ of 0.588, but it also under-performs the other methods by having the lowest $\rho_{aspect}$, which is actually the more important factor in applications.

By further investigating the ranking accuracy of reviews based on the predicted aspect ratings, we can see that the two measures $\rho_{review}$ and MAP@10 lead to different conclusions. This is expected, because $\rho_{review}$ measures the overall correlation over all the 1,850 h-reviews, while MAP@10 only cares about the top 10. Local prediction does score the highest in $\rho_{review}$, but it scores poorly in terms of MAP@10. This indicates that it outperforms LRR at the lower rankings rather than at the top ones, which users usually care about most.

Note that SVR fed with overall ratings did not achieve desirable performance, which to some extent confirms our assumption that there are differences between the aspect ratings and the overall rating. As a result, looking only at the overall ratings is not sufficient. Finally, not surprisingly, LRR does not perform as well as SVR-A, which was trained with ground-truth aspect ratings. However, LRR does not require any annotated aspect ratings for training, and can thus be applied in more application scenarios than SVR-A.

Computational Complexity: Efficiency of mining algorithms is an important issue in real applications. The major computational overhead of LRR is in solving the non-linear optimization problems in Eq (6) and (11). The convergence rate of the training procedure for LRR depends on the size of the model parameters $\Theta$, the number of reviews $|D|$, and the iteration step limit $l$. The complexity is roughly $O(k(n + k + 1)|D|l)$, which is linear in the number of reviews. For our data set, the algorithm finishes in less than 3 minutes on a Pentium 4 2.8 GHz CPU / 2 GB memory desktop.
5.4 Applications

The detailed understanding of opinions obtained using LRR can potentially be useful for many applications. Here we present three sample applications.

Aspect-Based Summarization: Since LRR can infer the aspect ratings $s_d$ for each review $d$, we can easily aggregate the aspect ratings of all the reviews about the same hotel to generate one numerical rating for each aspect of the hotel (e.g., $\frac{1}{|D|}\sum_{d \in D} s_d$). Such aspect ratings for a hotel can be regarded as a concise aspect-based opinion summary of the hotel. On top of that, we can also select sentences in each review about the given hotel by calculating the aspect scores according to Eq (1), and selecting the highest- and lowest-scored sentences for each aspect, to help users better understand the opinions on different aspects.

We show a sample aspect-based summary generated in this way in Table 7. We can see that reviewers agree that Hotel Max's price is excellent considering its great location in Seattle. However, there is also room for improvement: a poor heating system and the charge for Internet access. This kind of detailed information would be very useful to users for digesting the essential opinions in a large number of reviews.

Table 7: Aspect-based Comparative Summarization (Hotel Max in Seattle)
Value (3.1): "Truly unique character and a great location at a reasonable price Hotel Max was an excellent choice for our recent three night stay in Seattle."
Value (1.7): "Overall not a negative experience, however considering that the hotel industry is very much in the impressing business there was a lot of room for improvement."
Room (3.7): "We chose this hotel because there was a Travelzoo deal where the Queen of Art room was $139.00/night."
Room (1.2): "Heating system is a window AC unit that has to be shut off at night or guests will roast."
Location (3.5): "The location, a short walk to downtown and Pike Place market, made the hotel a good choice."
Location (2.1): "when you visit a big metropolitan city, be prepared to hear a little traffic outside!"
Business Service (2.7): "You can pay for wireless by the day or use the complimentary Internet in the business center behind the lobby though."
Business Service (0.9): "My only complaint is the daily charge for internet access when you can pretty much connect to wireless on the streets anymore."
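A sketch of this aggregation and sentence selection (reusing the per-aspect linear scoring of Eq (1); sent_W is assumed to hold one n-dimensional normalized feature vector per candidate sentence, built the same way as W_d):

    import numpy as np

    def hotel_aspect_summary(review_Ws, beta):
        # average the per-review aspect ratings (Eq 1) over all reviews of one hotel
        s = np.stack([(beta * W_d).sum(axis=1) for W_d in review_Ws])
        return s.mean(axis=0)                       # one aggregate rating per aspect

    def extreme_sentences(sentences, sent_W, beta, aspect):
        # highest- and lowest-scored sentence for the given aspect
        scores = [float(beta[aspect] @ w) for w in sent_W]
        order = np.argsort(scores)
        return sentences[order[-1]], sentences[order[0]]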
User Rating Behavior Analysis: By inferring the hidden aspect weights $\alpha_d$ for each individual review, we can learn the relative emphasis placed by a reviewer on different aspects, which can be regarded as knowledge about a user's rating behavior. One potential application of analyzing reviewers' rating behavior is to discover what factors have the most influence on reviewers' judgments when they make such evaluations. To look into this, we selected two groups of hotels with different price ranges: one group has prices over $800, which we call "expensive hotels," while the other group has prices below $100 and is called "cheap hotels." For each group, we then selected the top 10 and bottom 10 hotels based on their average overall ratings, resulting in four subgroups of hotels. We show the average aspect weights $\alpha_d$ of these four subgroups in Table 8. (In the group of "expensive hotels," the lowest overall rating is 3 stars.)

Table 8: User behavior analysis
Aspect | Expensive Hotel, 5 Stars | Expensive Hotel, 3 Stars | Cheap Hotel, 5 Stars | Cheap Hotel, 1 Star
Value | 0.134 | 0.148 | 0.171 | 0.093
Room | 0.098 | 0.162 | 0.126 | 0.121
Location | 0.171 | 0.074 | 0.161 | 0.082
Cleanliness | 0.081 | 0.163 | 0.116 | 0.294
Service | 0.251 | 0.101 | 0.101 | 0.049

It is interesting to note that reviewers give the "expensive hotels" high ratings mainly due to their nice service and locations, while they give low ratings because of undesirable room conditions and overpricing. In contrast, reviewers give the "cheap hotels" high ratings mostly because of the good price/value and good location, while giving low ratings for poor cleanliness.

Additionally, those numerical ratings may carry different meanings for different reviewers: users on a low budget might give a cheaper hotel a 5-star rating for "value", while others seeking better service might also give a more expensive hotel a 5-star rating for its "value". Only predicting the ratings for each aspect is not enough to reveal such subtle differences between users, but the inferred aspect weights can be useful for understanding such differences, as a user on a low budget is presumably more likely to place more weight on "value" than on "service". To look into this, we selected the hotels with the same 5-star rating on the "value" aspect from 4 different cities - Amsterdam, Barcelona, Florence and San Francisco - which have the largest numbers of hotels in our corpus. We then ranked these hotels according to their ratios of value/location weight, value/room weight, and value/service weight, respectively, and for each ratio we calculated the average real price of the top-10 and bottom-10 hotels. The results are shown in Table 9.

Table 9: Subgroups of hotels with 5 stars on "value"
City | AvgPrice | Group | Val/Loc | Val/Rm | Val/Ser
Amsterdam | 241.6 | top-10 | 190.7 | 214.9 | 221.1
Amsterdam | 241.6 | bot-10 | 270.8 | 333.9 | 236.2
Barcelona | 280.8 | top-10 | 270.2 | 196.9 | 263.4
Barcelona | 280.8 | bot-10 | 330.7 | 266.0 | 203.0
San Fran. | 261.3 | top-10 | 214.5 | 249.0 | 225.3
San Fran. | 261.3 | bot-10 | 321.1 | 311.1 | 311.4
Florence | 272.1 | top-10 | 269.4 | 248.9 | 220.3
Florence | 272.1 | bot-10 | 298.9 | 293.4 | 292.6

We find that hotels with relatively higher "value" weights tend to have a lower price, while hotels with higher "location", "room" and "service" weights tend to have a higher price. This suggests that even though these hotels all have the same rating on the "value" aspect, people who place more weight on value than on other aspects (i.e., who really care about price) prefer a cheaper hotel, while those who place a higher weight on another aspect such as "location" or "service" (than on "price") would accept a higher price. Thus, the inferred aspect weights $\alpha_d$ can be very useful for revealing users' rating behavior.

Personalized Ranking: Ranking hotels based on their inferred ratings on each aspect is already very useful to users. Here we show that the learned weights on the different aspects, at the level of each individual review, enable us to further personalize such a ranking by selectively using only the reviews written by reviewers whose rating behavior is similar to that of the current user. Specifically, given a specific user's weighting preference as a query, we can select the reviewers whose weighting preferences are similar, and rank the hotels based only on the reviews written by this subset of reviewers.

To show the effectiveness of LRR in supporting such personalized ranking, consider a sample query Query = {value weight: 0.9, others: 0.016}, which indicates that the user cares most about "value" and does not care much about the other aspects. We use two different ranking approaches: 1) approach 1 simply ranks the hotels by the predicted aspect ratings without considering the input query; 2) approach 2 selects the top 10% of reviewers who have the closest aspect weights (i.e., $\alpha_d$) to the query and predicts the associated hotels' aspect ratings based only on those selected reviews. In both approaches, we rank hotels based on the weighted sum of the predicted aspect ratings over all the aspects, with the weights defined in the query (so the rating on "value" contributes most to the score), and show the top-5 hotels returned by each approach in Table 10.

Table 10: Personalized Hotel Ranking
Approach 1:
Hotel | Overall Rating | Price | Location
Majestic Colonial | 5.0 | 339 | Punta Cana
Agua Resort | 5.0 | 753 | Punta Cana
Majestic Elegance | 5.0 | 537 | Punta Cana
Grand Palladium | 5.0 | 277 | Punta Cana
Iberostar | 5.0 | 157 | Punta Cana
Approach 2:
Hotel | Overall Rating | Price | Location
Elan Hotel Modern | 5.0 | 216 | Los Angeles
Marriott San Juan Resort | 4.0 | 354 | San Juan
Punta Cana Club | 5.0 | 409 | Punta Cana
Comfort Inn | 5.0 | 155 | Boston
Hotel Commonwealth | 4.5 | 313 | Boston

It is interesting to see that although the top-5 results from approach 1 all have 5-star overall ratings (presumably they also have high ratings on "value", since the ranking is based on the weights specified in the query), their prices tend to be much higher than those of the top-5 results returned by approach 2; indeed, the average price of the top-5 hotels from approach 1 is $412.6, while that of the top-5 hotels from approach 2 is only $289.4, which is much lower. (The average price of all the hotels in the data set is $334.3.) Intuitively, for this sample query, the results of approach 2 are more useful to the user. This means that, through the selective use of reviews from reviewers who have weight preferences similar to the query, approach 2 is able to personalize the ranking and correctly place more weight on the "value" aspect, ensuring that the top-ranked hotels really have relatively low prices.
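Approach 2 can be sketched as follows (Euclidean distance between weight vectors is our assumption here; the paper does not state which similarity measure is used):

    import numpy as np

    def personalized_rank(query_w, alphas, hotel_ids, s, top_frac=0.1):
        # keep the top_frac reviewers whose weights alpha_d are closest to the query
        keep = np.argsort(np.linalg.norm(alphas - query_w, axis=1))
        keep = keep[: max(1, int(top_frac * len(alphas)))]
        per_hotel = {}
        for i in keep:                              # pool the selected reviews per hotel
            per_hotel.setdefault(hotel_ids[i], []).append(s[i])
        # score each hotel by the query-weighted average of its aspect ratings
        score = {h: float(np.mean(v, axis=0) @ query_w) for h, v in per_hotel.items()}
        return sorted(score, key=score.get, reverse=True)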
6. CONCLUSIONS

In this paper, we defined a novel text mining problem named Latent Aspect Rating Analysis (LARA) to analyze opinions expressed in online reviews at the level of topical aspects. LARA takes a set of review texts with overall ratings and a specification of aspects as input, and discovers each individual reviewer's latent ratings on the given aspects and the relative emphasis the reviewer has placed on different aspects. To solve this problem, we proposed a novel Latent Rating Regression (LRR) model. Our empirical experiments on a hotel review data set show that the proposed LRR model can effectively solve the problem of LARA, revealing interesting differences in aspect ratings even when the overall ratings are the same, as well as differences in users' rating behavior. The results also show that the detailed analysis of opinions at the level of topical aspects enabled by the proposed model can support multiple application tasks, including aspect opinion summarization, ranking of entities based on aspect ratings, and analysis of reviewers' rating behavior.

Our work opens up a novel direction in text mining where the focus is on analyzing latent ratings in opinionated text. There are many interesting future research directions to explore. For example, although we defined LARA based on reviews, LARA is clearly also applicable to any set of opinionated text documents (e.g., weblogs) with overall ratings, to achieve a detailed understanding of the opinions expressed; it would be interesting to explore other possible application scenarios. Besides, our LRR model is not strictly limited to word features; other kinds of features could easily be embedded into the model. Also, in our definition of LARA, we assumed that the aspects are specified in the form of a few keywords. While this is feasible and gives users control over the aspects to be analyzed, there may also be situations where such keywords are not available. It would be very interesting to further explore LARA in such a setting, where we would aim at discovering the latent rating aspects in addition to the latent ratings and weights.
7. ACKNOWLEDGMENTS

We thank the anonymous reviewers for their useful comments. This paper is based upon work supported in part by an IBM Faculty Award, an Alfred P. Sloan Research Fellowship, and by the National Science Foundation under grants IIS-0347933, IIS-0713581, IIS-0713571, and CNS-0834709.

8. REFERENCES

[1] Onix text retrieval toolkit stopword list. http://www.lextek.com/manuals/onix/stopwords1.html.
[2] D. Blei, A. Ng, and M. Jordan. Latent dirichlet allocation. The Journal of Machine Learning Research, 3:993-1022, 2003.
[3] C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
[4] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[5] H. Cui, V. Mittal, and M. Datar. Comparative experiments on sentiment classification for online product reviews. In Twenty-First National Conference on Artificial Intelligence, volume 21, page 1265, 2006.
[6] K. Dave, S. Lawrence, and D. M. Pennock. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In WWW '03, pages 519-528, 2003.
[7] A. Devitt and K. Ahmad. Sentiment polarity identification in financial news: A cohesion-based approach. In Proceedings of ACL'07, pages 984-991, 2007.
[8] A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, volume 6, 2006.
[9] A. Goldberg and X. Zhu. Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization. In HLT-NAACL 2006 Workshop on Textgraphs: Graph-based Algorithms for Natural Language Processing, 2006.
[10] M. Hu and B. Liu. Mining and summarizing customer reviews. In W. Kim, R. Kohavi, J. Gehrke, and W. DuMouchel, editors, KDD, pages 168-177. ACM, 2004.
[11] K. Jarvelin and J. Kekalainen. IR evaluation methods for retrieving highly relevant documents. In Proceedings of SIGIR'00, pages 41-48. ACM, 2000.
[12] N. Jindal and B. Liu. Identifying comparative sentences in text documents. In Proceedings of SIGIR '06, pages 244-251, New York, NY, USA, 2006. ACM.
[13] H. Kim and C. Zhai. Generating comparative summaries of contradictory opinions in text. In Proceedings of CIKM'09, pages 385-394, 2009.
[14] S. Kim and E. Hovy. Determining the sentiment of opinions. In Proceedings of COLING, volume 4, pages 1367-1373, 2004.
[15] K. Lerman, S. Blair-Goldensohn, and R. T. McDonald. Sentiment summarization: Evaluating and learning user preferences. In EACL, pages 514-522, 2009.
[16] B. Liu, M. Hu, and J. Cheng. Opinion observer: Analyzing and comparing opinions on the web. In WWW '05, pages 342-351, 2005.
[17] Y. Lu, C. Zhai, and N. Sundaresan. Rated aspect summarization of short comments. In Proceedings of WWW'09, pages 131-140, 2009.
[18] S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushima. Mining product reputations on the web. In KDD '02, pages 341-349, 2002.
[19] B. Pang and L. Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL, pages 115-124, 2005.
[20] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In EMNLP 2002, pages 79-86, 2002.
[21] A.-M. Popescu and O. Etzioni. Extracting product features and opinions from reviews. In Proceedings of HLT '05, pages 339-346, Morristown, NJ, USA, 2005. Association for Computational Linguistics.
[22] M. Porter. An algorithm for suffix stripping. Program, 14(3):130-137, 1980.
[23] B. Snyder and R. Barzilay. Multiple aspect ranking using the good grief algorithm. In Proceedings of NAACL HLT, pages 300-307, 2007.
[24] I. Titov and R. McDonald. A joint model of text and aspect ratings for sentiment summarization. In ACL '08, pages 308-316, 2008.
[25] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of ICML'97, pages 412-420, 1997.
[26] L. Zhuang, F. Jing, and X. Zhu. Movie review mining and summarization. In Proceedings of CIKM 2006, page 50. ACM, 2006.