Terminology Extraction Approaches for Product Aspect Detection in Customer Reviews

Jürgen Broß
Institute of Computer Science
Freie Universität Berlin
14195 Berlin, Germany
[email protected]

Heiko Ehrig
Neofonie GmbH
Robert-Koch-Platz 4
10115 Berlin, Germany
[email protected]

Abstract

In this paper, we address the problem of identifying relevant product aspects in a collection of online customer reviews. Being able to detect such aspects represents an important subtask of aspect-based review mining systems, which aim at automatically generating structured summaries of customer opinions. We cast the task as a terminology extraction problem and examine the utility of varying term acquisition heuristics, filtering techniques, variant aggregation methods, and relevance measures. We evaluate the different approaches on two distinct datasets (hotel and camera reviews). For the best configuration, we find significant improvements over a state-of-the-art baseline method.

1 Introduction

Identifying significant terms in a corpus constitutes a core task in natural language processing. Fields of application are for example glossary extraction (Kozakov et al., 2004) or ontology learning (Navigli and Velardi, 2004). In this work, we particularly focus on the application scenario of aspect-based customer review mining (Hu and Liu, 2004; Dave et al., 2003). It is best described as a summarization task, where the goal is to summarize the opinions expressed in customer reviews. Typically, the problem is decomposed into three subtasks: 1) identify mentions of relevant product aspects, 2) identify sentiment expressions and determine their polarity, and 3) aggregate the sentiments for each aspect. In this paper, we only consider the first subtask, i.e., finding relevant product aspects in reviews.

More precisely, we define the problem setting as follows: Input is a homogeneous collection of customer reviews, i.e., all reviews refer to a single product type (e.g., digital cameras or hotels). The goal is to automatically derive a lexicon of the most relevant aspects related to the product type. For example, given a set of hotel reviews, we want to determine aspects such as "room size", "front desk staff", "sleep quality", and so on. In general, product aspects may occur as nominal (e.g., "image stabilization"), named (e.g., "SteadyShot feature"), pronominal (e.g., "it"), or implicit mentions (e.g., "reduction of blurring from camera shake"). We explicitly restrict the task to finding nominal aspect mentions¹.

The contribution of this paper is to explicitly cast the problem setting as a terminology extraction (TE) task and to examine the utility of methods that have been proven beneficial in this context. Most related work does not consider this close relationship and rather presents ad-hoc approaches. Our main contributions are as follows:
– We experiment with varying term acquisition methods, propose a set of new term filtering approaches, and consider variant aggregation techniques typically applied in TE systems.
– We compare the utility of different term relevance measures and experiment with combinations of these measures.
– We propose and assess a new method that filters erroneous modifiers (adjectives) in term candidates. Our method exploits information obtained from pros/cons summaries of customer reviews.
– Our best configuration improves over a state-of-the-art baseline by up to 7 percentage points.

The remainder of the paper is organized as follows: In Section 2, we cover related work, with a focus on unsupervised approaches. Section 3 describes the TE methods we examine in this study. Section 4 introduces our evaluation datasets and Section 5 presents experiments and results. We summarize and conclude in Section 6.
¹ Nominal mentions account for over 80% of all mentions in our datasets. Also in other corpora, the ratio is quite similar, e.g., (Kessler et al., 2010).


2 Related Work

[Figure 1: Conceptual overview of related work in product aspect detection. Sentence-level analysis is cast as text categorization, either multi-class classification (supervised) or topic modeling (unsupervised); mention-level analysis is realized by sequence labeling (supervised) or lexicon-based methods (unsupervised). The lexicon-based, mention-level setting is the focus of this paper.]

Figure 1 provides a conceptual overview of different tasks and approaches in the research area. Basically, we differentiate related work by the granularity of analysis, distinguishing between sentence level and mention level analysis. While at the sentence level, the goal is to decide whether a given sentence refers to one or more predefined aspects, fine-grained mention level analysis aims at discovering each individual mention of a relevant product aspect (e.g., "The image stabilization works well, but I didn't like the poor battery life.").

We address aspect detection at the mention level and our methods fall into the category of (unsupervised) lexicon-based approaches. In contrast to supervised methods, lexicon-based approaches do not rely on labeled training data and thus scale better across domains². The common approach is to crawl a corpus of reviews and to apply frequency-based methods to extract a lexicon of product aspects from the dataset. Approaches differ in the way corpus statistics are computed and to which extent linguistic features are exploited. Section 2.1 briefly describes the most relevant previous works and Section 2.2 provides an assessment of the different approaches.

2.1 Creating Product Aspect Lexicons

Hu and Liu (2004) cast the problem as a frequent itemset mining task and apply the well-known Apriori algorithm (Agrawal and Srikant, 1994). Inherent drawbacks of this approach³ are heuristically treated in a post-processing step. Whereas Hu and Liu's method exclusively examines documents of the input collection, Popescu and Etzioni (2005) propose to incorporate the Web as a corpus. They assess a term candidate's domain relevance by computing the pointwise mutual information (PMI) (Zernik, 1991) between the candidate term and some predefined phrases that are associated with the product type. The PMI score is used to prune term candidates.

A further approach is to utilize a contrastive background corpus to determine the domain relevance of terms. For instance, Yi et al. (2003) use the likelihood ratio test (LRT) to compute a confidence value that a term candidate originates from the relevant review corpus. The computed score is used to rank term candidates. Also Scaffidi et al. (2007) follow the basic idea of using a contrastive corpus, but simply compare relative frequency ratios instead of computing a confidence value. Other exemplary works consider the utility of statistical language models (Wu et al., 2009), propose multilevel latent semantic association (Guo et al., 2009), or examine a double propagation approach that leverages the correlation between product aspects and sentiment bearing words (Zhang et al., 2010). Product aspect lexicons may also be created manually, e.g., Carenini et al. (2005) or Bloom et al. (2007) follow this approach. Naturally, a manual approach does not scale well across domains.

2.2 Assessment of Lexicon-Based Approaches

Our goal in this section is to select a state-of-the-art method that we can use as a baseline in our experiments. Unfortunately, it is quite difficult to assess the relative performance of the different approaches as the evaluation datasets and methodologies often vary. Popescu and Etzioni (2005) compare their results to the method by Hu and Liu (2004) and report significantly improved results. However, their method relies on the private "Know-it-all" information extraction system and is therefore not suited as a baseline. Scaffidi et al. (2007) only assess the precision of the extracted aspect lexicon. Their methodology does not allow to measure recall, which renders their comparison to Hu's method rather useless⁴. Furthermore, the results are quite questionable as the number of extracted aspects is extremely small (8-12 aspects compared to around a thousand with our approach). Also Yi et al. (2003) only report results of an intrinsic evaluation for their LRT-approach. A systematic comparison of Hu's frequent itemset mining and Yi's LRT-approach is conducted by Jakob (2011). His results show that "the Likelihood Ratio Test based approach generally yielded better results". In the absence of other valid comparative studies, we therefore select the LRT-approach as a baseline method for our experiments.

² For instance, (Jakob and Gurevych, 2010) report that F-scores for their sequence labeling method decrease by up to 25 percentage points in cross domain settings.
³ The word order is not recognized and sub-terms of multi-word terms are not necessarily valid terms in natural language.
⁴ Without considering recall, the precision can easily be tweaked by adjusting threshold values.

3 Terminology Extraction for Product Aspect Detection

[Figure 2: Pipeline architecture of a TE system. The pipeline runs from the document collection through linguistic pre-processing, candidate acquisition, candidate filtering, variant aggregation and counting, and candidate ranking and selection to the final term dictionary, which may undergo manual revision.]

A typical TE system follows the pipeline architecture depicted in Figure 2. Depending on the specific application domain, the implementation of the individual pipeline steps may differ widely. For example, we will see in the next section that the examined acquisition and filtering methods are highly tailored to the domain of customer reviews. In contrast, the underlying concepts for the definition of term relevance are applicable across domains. From the multitude of statistical measures proposed in the literature⁵, we can distill mainly three underlying concepts: (1) contrastive domain relevance, (2) intra domain relevance, and (3) term cohesion. We will experiment with measures for all of the three concepts. The following subsections describe how we implement the individual steps of the extraction pipeline (for the majority of steps, we propose several alternative approaches, which will be subject to experimentation).

3.1 Linguistic Preprocessing

We preprocess all text documents by means of a part-of-speech tagger⁶ (which also performs tokenization, sentence splitting, and lemmatization). All tokens are further normalized by case folding.

3.2 Candidate Acquisition

The candidate acquisition component initially decides which phrases are further considered and which are directly discarded. Defining too restrictive filters may lower the recall, whereas too unconstrained filters may decrease the precision.

Part-of-Speech Tag Filter We experiment with two POS-tag filters: BNP1 and BNP2. As a baseline (BNP1), we use the "base noun phrase pattern" proposed in (Yi et al., 2003):

  BNP1 := NN | NN NN | JJ NN | NN NN NN | JJ NN NN | JJ JJ NN

It restricts candidates to a maximum length of three words (adjectives or nouns), where adjectives must only occur as pre-modifiers to nouns. As an alternative, we examine the utility of a more relaxed pattern (BNP2). This pattern matches terms of arbitrary length, also allows for plural forms, and matches proper nouns (identified by the tags NNP or NNPS):

  BNP2 := (JJ)* (NN\w{0,2})+

Domain Specific Heuristics Acquisition heuristics put further constraints on the validity of term candidates. As a baseline, we consider two heuristics proposed in (Yi et al., 2003):
– The definite base noun phrase (DBNP) heuristic restricts the BNPs to phrases that are preceded by the definite article "the".
– The beginning definite base noun phrase (BBNP) heuristic restricts valid candidates to DBNPs that occur at the beginning of a sentence, followed by a verb phrase (e.g., "The picture quality is great.").

As an alternative, we propose two other heuristics. Both are based on the hypothesis that the occurrence of sentiment expressions in the context of a candidate is a good indicator for the candidate's validity. Sentiment expressions are detected with a small hand-crafted sentiment lexicon composed of 520 strongly positive/negative adjectives. We experiment with two different strategies:
– The sentiment bearing sentence (SBS) heuristic only considers candidates that occur in sentences where at least one sentiment expression is detected.
– The sentiment bearing pattern (SBP) heuristic defines a set of four simple syntactic patterns that relate candidate terms to sentiment expressions. Only candidates that match one of the patterns are further considered.
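To make the acquisition step concrete, the following minimal Python sketch applies the BNP2 pattern to a POS-tagged sentence. This is our own illustration, not code from the described system: the (token, tag) input format and the function name are assumptions, and the tags are taken to be Penn Treebank tags as produced by the Stanford tagger.

```python
import re

NOUN_TAG = re.compile(r"NN\w{0,2}")  # matches NN, NNS, NNP, NNPS

def bnp2_candidates(tagged_sentence):
    """Return maximal token spans whose tag sequence matches (JJ)*(NN\\w{0,2})+."""
    candidates = []
    i, n = 0, len(tagged_sentence)
    while i < n:
        j = i
        while j < n and tagged_sentence[j][1] == "JJ":
            j += 1  # optional run of adjective pre-modifiers
        k = j
        while k < n and NOUN_TAG.fullmatch(tagged_sentence[k][1]):
            k += 1  # mandatory run of nouns
        if k > j:  # at least one noun: tokens i..k form a candidate
            candidates.append(" ".join(tok for tok, _ in tagged_sentence[i:k]))
            i = k
        else:
            i += 1  # an adjective run without a noun head is not a candidate
    return candidates

tagged = [("the", "DT"), ("great", "JJ"), ("picture", "NN"), ("quality", "NN"),
          ("is", "VBZ"), ("great", "JJ")]
print(bnp2_candidates(tagged))  # -> ['great picture quality']
```

Note that the sentiment adjective "great" is captured here as a pre-modifier; pruning such modifiers is exactly the job of the candidate filters described in Section 3.3.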

⁵ For example, consult (Kageura and Umino, 1996) for a thorough literature survey on terminology extraction.
⁶ http://nlp.stanford.edu/software/corenlp.shtml

3.3 Candidate Filtering

Although the candidate acquisition heuristics focus on high precision, they generate a considerable number of irrelevant candidates. These can be pruned by further domain specific filters:

Review Stop Word Filter We compile a list of review specific stop words and discard each candidate term that contains at least one of the words. The list (176 entries) has been constructed based on observations on a development dataset and by (intelligent) extrapolation of these findings. Roughly categorized, it includes sentiment bearing nouns (e.g., "complaint"), review related terms (e.g., "bottom line"), purchase related phrases (e.g., "delivery"), mentions of persons (e.g., "wife"), and phrases of reasoning (e.g., "decision").

Pre-Modifier Filter Both presented part-of-speech filters (BNP1/2) allow nouns to be modified by multiple adjectives. Unfortunately, this leads to the extraction of many invalid terms (e.g., "great/JJ design/NN" or "new/JJ design/NN"). Quite frequently, sentiment bearing adjectives such as "great", "fantastic", or "bad" are erroneously extracted. We utilize our hand-crafted sentiment lexicon to prune these modifiers. Another type is related to adjectives that act as universal modifiers in terms (e.g., "new", "long", or "red"). For such adjectives we cannot compile a stop word list. We experiment with two different methods for filtering universal modifiers. As a baseline, we examine a filter proposed by Kozakov et al. (2004) as part of their GlossEx glossary extraction system. As a second approach, we propose a method that uses signals from pros/cons summaries of reviews (Section 3.6).

Product Name Filter As we are only interested in finding nominal aspect mentions, we need to discard all candidate terms that refer to product or brand names. For this purpose, we automatically generate a stop word list by exploiting meta data (on products and brands) that is associated with the crawled customer reviews. Whenever a term candidate contains a token that is present in the appropriate stop word list, the candidate is discarded.

3.4 Variant Aggregation

The goal of this step is to find all variants of a term and to identify a canonical representation. For example, the variants "auto-focus", "auto focus", "autofocus", or "auto focuss" should be mapped to the canonical form "auto focus". The purpose of this step is twofold: (1) higher lexicon coverage and (2) preventing potential problems with data sparseness during candidate ranking. Following Kozakov et al. (2004), we implement heuristics for finding symbolic, compounding, and misspelling variants. In addition, we implement a method that considers compositional variants of the form "room size" vs. "size of the room".
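As an illustration of this step, the sketch below groups symbolic and compounding variants under a normalization key and rewrites compositional variants of the "size of the room" type. It is a simplified stand-in for the heuristics of Kozakov et al. (2004), not the actual implementation; misspelling variants are omitted and all names are hypothetical.

```python
import re
from collections import defaultdict

def variant_key(term):
    """Collapse case, hyphens, and whitespace: 'Auto-Focus' -> 'autofocus'."""
    return re.sub(r"[\s\-]+", "", term.lower())

def normalize_compositional(term):
    """Rewrite 'X of (the) Y' to 'Y X', e.g. 'size of the room' -> 'room size'."""
    m = re.fullmatch(r"(.+?) of (?:the )?(.+)", term)
    return f"{m.group(2)} {m.group(1)}" if m else term

def aggregate(term_counts):
    """Merge variant counts; the most frequent surface form becomes canonical."""
    groups = defaultdict(list)
    for term, freq in term_counts.items():
        groups[variant_key(normalize_compositional(term))].append((freq, term))
    return {max(group)[1]: sum(f for f, _ in group) for group in groups.values()}

counts = {"auto focus": 50, "auto-focus": 7, "autofocus": 12,
          "size of the room": 3, "room size": 40}
print(aggregate(counts))  # -> {'auto focus': 69, 'room size': 43}
```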

3.5 Candidate Ranking and Selection

Candidate ranking is at the core of each terminology extraction system. As it is unclear which relevance measure performs best in our context, we experiment with different approaches and also consider reasonable combinations of individual scores. Except for the newly proposed diversity value score, the selected measures are all taken from previous research in terminology extraction. We therefore only briefly discuss the other measures and refer to the original literature for more details.

Raw Frequency (Intra Domain) The ranking is simply determined by the raw occurrence frequency of a term.

Relative Frequency Ratio (Contrastive) This ranking (MRFR) is based on the comparison of relative frequency ratios in two corpora. While the original measure (Damerau, 1993) is only defined for single word terms, Kozakov et al. (2004) show how to extend the definition to multi-word terms.

Likelihood Ratio Test (Contrastive) This ranking can be considered as a more robust version of the MRFR approach. Put simply, it additionally computes confidence scores for the relative frequency ratios, which allows to prevent problems with low frequency terms. The score is based on the likelihood ratio test (LRT). Yi et al. (2003) describe how the score is computed in our context.

Generalized Dice Coefficient (Term Cohesion) To measure the association between words of a complex term, Park et al. (2002) introduce a measure that generalizes the Dice coefficient (Dice, 1945). The measure gives higher scores to terms with high co-occurrence frequencies.

Diversity Value (Intra Domain) Based on the observation that nested word sequences that appear frequently in longer terms are likely to represent the key parts or features of a product, we propose a measure that gives higher scores to such "key terms" (e.g., "lens" occurs in terms such as "autofocus lens", "zoom lens", "macro lens", "lens cap", or "lens cover"). Inspired by the C-Value score (Frantzi and Ananiadou, 1996), we define the measure as:

  $\mathrm{diversity\text{-}score}(ws) = \log_2(|ws|_t + 1) \cdot \frac{\sum_{w_i \in ws} f(w_i) \cdot \log_2(|T^{*}_{w_i}| + 1)}{|ws|_t}$ ,

where $|ws|_t$ denotes the number of tokens of a word sequence $ws$, $w_i$ refers to the i-th token in $ws$, and $T^{*}_{w_i}$ describes the set of other candidate terms that contain the token $w_i$. The function $f(w_i)$ returns the frequency of the token $w_i$ in the considered text corpus.

Combining Ranking Measures As the presented ranking measures are based on different definitions of term significance, it is reasonable to compute a combined score (e.g., combining a term's contrastive relevance with its strength of cohesion). Since the different measures are not directly comparable, we compute a combined score by considering the individual rankings: Let $T$ be the set of extracted candidate terms and let $R_i(t)$ be a function that ranks candidates $t \in T$. Using a weight $\omega_i$ for each of the $n$ selected measures, we compute the final rank of a candidate $t$ as:

  $\mathrm{weighted\text{-}rank}(t) = \sum_{i=1}^{n} \omega_i \cdot R_i(t)$ , where $\sum_{i=1}^{n} \omega_i = 1$.

For our experiments, we chose equal weights for each ranking measure, i.e., $\omega_i = 1/n$.
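Both the diversity value and the rank combination translate directly into code. The following sketch is a naive transcription (quadratic in the number of candidate terms); the variable names are our own, token_freq is assumed to map tokens to corpus frequencies, and each element of rankings to map terms to rank positions under one measure.

```python
import math

def diversity_score(term, candidate_terms, token_freq):
    """diversity-score(ws) as defined above, for a whitespace-joined term."""
    tokens = term.split()
    n = len(tokens)  # |ws|_t
    total = 0.0
    for tok in tokens:
        # |T*_wi|: number of other candidate terms that contain the token
        t_wi = sum(1 for t in candidate_terms if t != term and tok in t.split())
        total += token_freq.get(tok, 0) * math.log2(t_wi + 1)
    return math.log2(n + 1) * total / n

def weighted_rank(term, rankings, weights):
    """Combine rank positions R_i(t) with weights that sum to 1."""
    return sum(w * r[term] for w, r in zip(weights, rankings))

terms = ["zoom lens", "macro lens", "lens cap", "battery life"]
freq = {"zoom": 120, "macro": 40, "lens": 800, "cap": 60, "battery": 300, "life": 280}
print(round(diversity_score("zoom lens", terms, freq), 1))  # -> 1004.9
```

As intended, the frequent nested token "lens" dominates the score of "zoom lens", while a term whose tokens occur in no other candidate would score 0.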
3.6 Pros/Cons Pre-Modifier Filter

Some sentiment bearing pre-modifiers are domain or aspect-specific (e.g., "long battery life")⁷. The GlossEx filter (see Section 3.3) cannot cope with this type of modification. To identify such pre-modifiers, we propose to exploit signals from structured pros/cons summaries that typically accompany a customer review. We hypothesize that valid pre-modifiers (e.g., "digital" in "digital camera") occur similarly distributed with their head noun in both lists of pros and lists of cons. Invalid pre-modifiers, i.e., aspect-specific sentiment words, are likely to occur either more often in lists of pros or lists of cons. We design a simple likelihood ratio test to operationalize this assumption. In particular, we consider the probabilities $p_1 = Pr(pm \mid head; pros)$ and $p_2 = Pr(pm \mid head; cons)$, where $p_1$ ($p_2$) denotes the probability in a corpus of pros (cons) lists that $pm$ occurs as pre-modifier with the head noun $head$. To design a hypothesis test, we assume as null hypothesis $H_0$ that $p = p_1 = p_2$ (equal distribution in pros and cons) and as alternative hypothesis that $p_1 \neq p_2$ (unequal distribution). We calculate the likelihood ratio $\lambda$ and utilize the value $-2 \log \lambda$ to reject $H_0$ at a desired confidence level (in that case, we prune the pre-modifier $pm$).

4 Datasets

We evaluate our approaches on datasets of hotel and digital camera reviews. We crawled around 500,000 hotel reviews from Tripadvisor.com and approximately 200,000 digital camera reviews from Amazon.com, Buzzillions.com, and Epinions.com. From each of the two crawls, we randomly sample 20,000 reviews, which we use as foreground corpora for the terminology extraction task⁸. As a background corpus, we utilize a 100,000 document subset (randomly sampled) of the "ukWaC corpus" (Baroni et al., 2009).

4.1 Evaluation Corpora

To evaluate our approaches, we manually annotate a subset of the crawled reviews. In particular, we randomly sample subsets of 150 hotel and 150 camera reviews that do not overlap with the foreground corpora. Following prior work on sentiment analysis (Wiebe et al., 2005; Polanyi and Zaenen, 2006), we decompose an opinion into two functional constituents: sentiment expressions and sentiment targets. In addition, we consider nominal mentions of product aspects that are not targeted by a sentiment expression. We annotate a document by marking relevant spans of text with the appropriate annotation type, setting the type's properties (e.g., the polarity of a sentiment expression), and relating the annotations to each other. Table 1 summarizes the statistics of the created evaluation corpora (regarding sentiment targets and nominal aspect mentions).

statistic                      hotel     camera
documents                        150        150
sentences                      1,682      1,416
tokens                        29,249     24,765
nominal aspect mentions        2,066      1,918
  (incl. sentiment targets)
avg. tokens per mention         1.28        1.4
distinct mentions                490        477

Table 1: Basic corpus statistics.
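Returning to the pre-modifier filter of Section 3.6, the test can be operationalized as a standard likelihood ratio test for two binomial proportions (the same machinery that underlies the LRT relevance ranking in Section 3.5). The sketch below is our own illustration under that assumption: k1/n1 counts how often the pre-modifier occurs with the head noun relative to all occurrences of the head noun in pros lists, and k2/n2 likewise for cons lists.

```python
import math

def _binom_log_l(k, n, p):
    """Binomial log-likelihood, clamping p away from the 0/1 edge cases."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)

def prune_premodifier(k1, n1, k2, n2, threshold=10.83):
    """True if -2 log(lambda) exceeds the threshold, i.e. the pre-modifier
    is unequally distributed over pros and cons lists."""
    p, p1, p2 = (k1 + k2) / (n1 + n2), k1 / n1, k2 / n2
    llr = -2 * (_binom_log_l(k1, n1, p) + _binom_log_l(k2, n2, p)
                - _binom_log_l(k1, n1, p1) - _binom_log_l(k2, n2, p2))
    return llr > threshold

# e.g. "long" with head noun "battery life": frequent in pros, rare in cons
print(prune_premodifier(k1=90, n1=200, k2=5, n2=180))  # -> True
```

The default threshold of 10.83 corresponds to the 99.9% confidence level used in Section 5.7; 3.84 would give the 95% level.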

⁷ See also (Fahrni and Klenner, 2008).
⁸ Larger corpora did not improve our results.

5 Experiments and Results

5.1 Evaluation Methods

We conduct intrinsic and extrinsic evaluation of the approaches. Intrinsic evaluation refers to assessing the quality of the generated product aspect lexicons. For this purpose, we manually inspect the extracted lexicons and report results in terms of precision (share of correct entries) or precision@n (the precision of the n highest ranked lexicon entries). For extrinsic evaluation (evaluation in use), we apply the extracted lexicons for the task of aspect detection in customer review documents. To match lexicon entries in review texts, we apply the Aho-Corasick algorithm (Aho and Corasick, 1975). If multiple matches overlap, we select the left-most, longest-matching, highest-scoring lexicon entry (thus guaranteeing a set of non-overlapping matches). Only exact matches are counted as true positives. We further differentiate between two evaluation scenarios:
– Scenario A: In this scenario, the task is to extract all product aspects, irrespective of being target of a sentiment expression or not. We thus define the union of sentiment target and aspect mention annotations as reference (gold standard). Any extraction that matches either a sentiment target or an aspect mention is considered a true positive.
– Scenario B: This scenario considers the task of detecting sentiment targets. As it is not our goal to assess the accuracy of sentiment expression detection, we provide the extraction algorithm with perfect (gold standard) knowledge on the presence of sentiment expressions and their relations to sentiment targets (in effect, the algorithm only considers matches that overlap a sentiment target).

5.2 Baseline Results (Yi et al. Method)

To make our results comparable to other existing methods, we first set a baseline by applying a state-of-the-art approach on our datasets. As motivated in Section 2.2, the LRT-approach by Yi et al. (2003) represents our baseline. We can easily implement Yi's method with our terminology extraction framework by using the BNP1 POS-tag filter, the BBNP acquisition heuristic, and the LRT-score for ranking. We select all terms with a minimum LRT-score of 3.84⁹ and do not apply any candidate filtering or variant aggregation.

The baseline method produces lexicons with 1,182 (hotel) and 953 (digital camera) entries. Due to our significantly larger foreground corpora, the dictionaries' sizes are by far larger than reported by (Yi et al., 2003) or by (Ferreira et al., 2008). Intrinsic evaluation of the lexicons reveals precision values of 61.2% (hotel) and 67.6% (camera). For precision@40, we find values of 62.5% (hotel) and 80.0% (camera).

scenario    precision   recall   f-measure
hotel A       55.1%     73.0%      62.8%
hotel B       81.3%     71.2%      75.9%
camera A      65.0%     72.5%      68.6%
camera B      76.8%     69.9%      73.2%

Table 2: Extrinsic evaluation results for the baseline approach.

Table 2 reports the extrinsic evaluation results for the baseline configuration. Naturally, the precision values obtained for scenario A are lower than for the "synthetic" scenario B (where partial matches are the only possible source for false positives). Recall values in both scenarios are moderately high with around 70%.

If not otherwise stated, the configurations in the following sections apply the BNP1 acquisition pattern, the BBNP heuristic, and the LRT-ranking with a minimum score of 3.84.

5.3 Effectiveness of Candidate Filtering

In this section, we analyze the influence of candidate filtering (baseline: Yi's method). When applying all filters jointly (except for the pros/cons filter), the resulting lexicons consist of 975 (hotel) and 767 (camera) entries. Compared to the baseline, the (intrinsic) precision of the lexicons improves by around 10 percentage points (hotel) and 14 percentage points (camera). Each individual filter has a positive effect on the precision, where the GlossEx filter has the greatest influence (+5 percentage points in both corpora).

scenario    precision        recall           f-measure
hotel A     56.9% (+1.8*)    75.2% (+2.2*)    64.8% (+2.0*)
hotel B     85.7% (+4.4*)    75.1% (+3.9*)    80.0% (+4.1*)
camera A    69.2% (+4.2*)    74.3% (+1.8*)    71.7% (+3.1*)
camera B    79.3% (+2.5*)    72.2% (+2.3*)    75.6% (+2.4*)

Table 3: Results with activated candidate filters.

Table 3 shows that the improved lexicon precision also leads to better results for the product aspect extraction task. The observed f-measure values increase by up to 4.1 percentage points compared to the baseline method. All improvements are statistically significant¹⁰. The increase in recall is mainly due to successful pruning of false modifiers.

⁹ 3.84 is the critical value of the χ²-distribution for one degree of freedom at a confidence level of 95%.
¹⁰ We use the * notation to indicate statistically significant differences. If not otherwise stated, significance is reported at the 99% confidence level.
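The overlap-resolution rule from Section 5.1 is independent of the string matcher itself. The following sketch is illustrative: matches are assumed to be (start, end, score, term) tuples, e.g. as produced by an Aho-Corasick automaton over the review text, and the function keeps the left-most, longest, highest-scoring non-overlapping subset.

```python
def resolve_overlaps(matches):
    """Select non-overlapping matches, preferring left-most, then longest,
    then highest-scoring; matches are (start, end, score, term) tuples."""
    ordered = sorted(matches, key=lambda m: (m[0], -(m[1] - m[0]), -m[2]))
    selected = []
    for m in ordered:
        # keep m only if it does not overlap any already selected match
        if all(m[1] <= s[0] or m[0] >= s[1] for s in selected):
            selected.append(m)
    return sorted(selected)

matches = [(10, 14, 3.2, "room"), (10, 19, 5.7, "room size"), (15, 19, 2.1, "size")]
print(resolve_overlaps(matches))  # -> [(10, 19, 5.7, 'room size')]
```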

5.4 Effectiveness of Variant Aggregation

In this section, we examine the influence of the different variant aggregation techniques (baseline: Yi's method + filters). To assess the effectiveness of variant aggregation, we only evaluate extrinsically (since we primarily expect a higher coverage of the lexicons). Table 4 compares the results with variant aggregation to the results of the previous section (all filters activated).

scenario    precision       recall          f-measure
hotel A     56.7% (-0.2)    75.1% (-0.1)    64.6% (-0.2)
hotel B     85.5% (-0.2)    75.1% (0.0)     79.9% (-0.1)
camera A    69.8% (+0.6)    74.8% (+0.5)    72.2% (+0.5)
camera B    80.7% (+1.4)    73.0% (+0.8)    76.7% (+1.1)

Table 4: Results with variant aggregation.

The results show that variant aggregation has only marginal effects. Although we can measure improved results for the camera corpus, the differences are rather small and not statistically significant. For the hotel corpus, the influence is even lower. To understand the reasons for the insignificant effect, we perform a mistake analysis of the false negatives in scenario B. In particular, we compare the false negatives with and without variant aggregation. For the hotel corpus, we only find 18 out of 251 false negatives (7.2%) that are candidates for variant aggregation. In the ideal case (variant aggregation successfully recognizes all the candidates), this translates to a maximum gain of 1.8 percentage points in recall. For the camera dataset, we calculate a maximum gain of 2.4 percentage points. Our results deviate from the ideal case for mainly two reasons: (1) Most variants occur rarely and the ones that occur in the evaluation corpora do not occur in the foreground corpora. (2) Some variants (e.g., misspellings) are so frequent in the foreground corpus that the LRT-ranking already selects them as independent terms.

5.5 Influence of Acquisition Methods

This section examines the influence of the different acquisition patterns and heuristics. We only report results for the hotel dataset as the results for the camera corpus are similar. Table 5 shows results for scenario B (all filters and aggregation methods activated).

            precision       recall          f-measure
heuristic   BNP1    BNP2    BNP1    BNP2    BNP1    BNP2
—           80.7%   79.5%   70.7%   71.7%   75.4%   75.4%
SBS         81.1%   80.0%   72.2%   72.9%   76.4%   76.3%
DBNP        83.2%   82.4%   73.6%   75.2%   78.1%   78.6%
SBP         87.0%   84.5%   74.6%   75.8%   80.3%   79.9%
BBNP        85.5%   85.5%   75.1%   77.7%   79.9%   81.5%

Table 5: Extrinsic evaluation results with varying acquisition patterns and heuristics (hotel dataset).

As could be expected, the more relaxed acquisition pattern BNP2 trades precision for an increased recall (+1-2 percentage points). The results further show that the use of appropriate acquisition heuristics is quite important. We can improve the f-measure by up to 6.1 percentage points. We find that the SBP and BBNP heuristics perform best on our datasets. The differences in f-measure, compared to the other two heuristics, are statistically significant (not shown in the table). As the BBNP heuristic is easier to implement and shows comparable results, we conclude that it is preferable over the SBP method.

5.6 Influence of Ranking Functions

We now examine the influence of the different ranking measures (all filters and variant aggregation are activated). To rule out the influence of varying lexicon sizes, we choose a fixed size for each dataset (determined by the number of terms that exhibit an LRT-score greater than 3.84). For larger lexicons, we prune the entries with the lowest scores. For each configuration, we apply all filter and variant aggregation approaches. Table 6 shows the intrinsic evaluation results.

            hotel               camera
measure     precision   p@40    precision   p@40
frequency   41.6%       55.0%   44.8%       70.0%
dice        39.0%       55.0%   43.5%       87.5%
diversity   66.4%       77.5%   76.7%       70.0%
lrt         69.6%       72.5%   81.1%       87.5%
mrfr        72.0%       87.5%   81.4%       92.5%

Table 6: Intrinsic evaluation results with the five different ranking measures.

We can clearly observe that the contrastive relevance measures (LRT and MRFR) outperform the intra domain and term cohesion measures. The MRFR-ranking shows better results than the LRT-ranking in both corpora, especially w.r.t. precision@40. The improved results with contrastive measures are also reflected by our extrinsic evaluation.

Table 7 presents the results for scenario A, considering the measures in isolation and in selected combinations (using equal weights).

               hotel                     camera
measure        prec.    rec.     F       prec.    rec.     F
frequency      45.3%    79.1%    57.6%   50.7%    77.8%    61.4%
dice           44.7%    78.3%    56.9%   50.4%    77.5%    61.1%
diversity      51.4%    72.3%    60.1%   64.5%    73.8%    68.8%
lrt            56.7%    75.1%    64.6%   69.8%    74.8%    72.2%
mrfr           60.2%    67.3%    63.5%   73.1%    72.8%    73.0%
all            46.6%    79.3%    58.7%   52.6%    78.7%    63.0%
mrfr-dice      47.7%    78.2%    59.2%   55.3%    78.2%    64.8%
lrt-div.       47.8%    73.5%    57.9%   57.0%    75.1%    64.8%
mrfr-lrt       56.9%    73.5%    64.2%   68.2%    73.7%    70.8%
mrfr-freq.     51.8%    77.8%    62.2%   61.2%    76.1%    67.9%
mrfr-lrt-div.  53.3%    74.5%    62.2%   66.8%    75.4%    70.8%
mrfr-div.      57.9%    73.1%    64.6%   71.8%    72.5%    72.1%

Table 7: Extrinsic evaluation results for varying ranking methods (scenario A).

Compared to raw frequency, the contrastive measures exhibit f-measure values that are between 7 (hotel) and 11.6 (camera) percentage points higher. We hypothesized that the combination of different relevance concepts (e.g., contrastive + term cohesion) could improve the system's performance, but the obtained results do not confirm this hypothesis.

5.7 Effectiveness of Pros/Cons Filter

In this section, we examine the pros/cons pre-modifier filter. The reported results are based on pros/cons corpora composed of 100,000 (hotel) and 50,000 (camera) documents. We set the threshold for the hypothesis test to 10.83, corresponding to a 99.9% confidence level. Table 8 presents the results of additionally applying this filter (baseline: all other filters and variant aggregation activated).

scenario    precision        recall          f-measure
hotel A     58.0% (+1.1*)    76.3% (+1.1)    65.9% (+1.1*)
hotel B     88.9% (+3.2*)    77.4% (+2.4*)   82.8% (+2.8*)
camera A    71.7% (+2.5*)    76.2% (+1.9)    73.9% (+2.2*)
camera B    83.4% (+4.0*)    75.1% (+2.9*)   79.0% (+3.4*)

Table 8: Results with active pros/cons filter.

We can observe statistically significant improvements with gains in f-measure of up to 3.4 percentage points. Examining the resulting lexicons, we find that the filter successfully pruned around 40 false pre-modifiers, which increases the (intrinsic) precision by around 3 percentage points for both datasets. Despite the relatively few lexicon entries that are altered by means of the filter, we observe the mentioned (significant) gains in f-measure. For both datasets, this is mainly because the affected lexicon entries exhibit a high occurrence frequency in the evaluation datasets (e.g., "large room" or "low price").

6 Conclusions

Identifying the most relevant aspects of a given product or product type constitutes an important subtask of an aspect-based review mining system. In this work, we explicitly cast the task as a terminology extraction problem. We were interested whether methods that have been proven beneficial in TE systems also help in our application scenario. Additionally, we proposed and evaluated some new term acquisition heuristics, candidate filtering techniques, and a ranking measure. The results show that our terminology extraction approach allows to generate quite accurate product aspect lexicons (precision up to 85%), which in turn allow for f-measures of up to 74% for an aspect detection task and up to 83% for a (synthetic) sentiment target detection task. Compared to a relevant baseline approach (Yi et al., 2003), we observe increases in f-measure by 3-7 percentage points for different evaluation scenarios.

With regard to the different configurations of our system, we made the following observations:
– Improved results are mainly due to the proposed candidate filtering techniques. Each individual filter has been found to be beneficial. The proposed pros/cons filter raised the f-measure by up to 3.4 percentage points.
– The choice of the acquisition heuristic is important. We measured differences of up to 6.1 percentage points in f-measure. The SBP and BBNP heuristics performed best. The relaxed BNP2 pattern increases the recall and is a reasonable choice if extracted lexicons are manually post-processed.
– The variant aggregation techniques had only a marginal effect.
– The contrastive relevance measures LRT and MRFR performed best. Neither the proposed diversity value score, nor combinations of different relevance measures proved to be beneficial.
– In summary, we suggest to use the BNP2 acquisition pattern and the BBNP or SBP acquisition heuristic, to activate all mentioned filters, and to use a contrastive relevance measure for ranking. Whereas variant aggregation was not beneficial within the TE pipeline, it is nonetheless important and should be considered downstream, i.e., during application of the extracted lexicons.

References

Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th VLDB, pages 487–499, San Francisco, CA, USA.

Alfred V. Aho and Margaret J. Corasick. 1975. Efficient string matching: An aid to bibliographic search. Comm. of the ACM, 18(6):333–340.

M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. 2009. The WaCky Wide Web: A collection of very large linguistically processed web-crawled corpora. LREC, 43:209–226.

K. Bloom, N. Garg, and S. Argamon. 2007. Extracting appraisal expressions. In Proceedings of the NAACL HLT 2007, pages 308–315. ACL.

G. Carenini, R. T. Ng, and E. Zwart. 2005. Extracting knowledge from evaluative text. In Proceedings of the K-CAP '05, pages 11–18. ACM.

F. J. Damerau. 1993. Generating and evaluating domain-oriented multi-word terms from texts. Information Proc. and Management, 29(4):433–447.

Kushal Dave, Steve Lawrence, and David M. Pennock. 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the WWW '03, pages 519–528, New York, NY, USA. ACM.

Lee R. Dice. 1945. Measures of the amount of ecologic association between species. Ecology, 26(3).

A. Fahrni and M. Klenner. 2008. Old wine or warm beer: Target-specific sentiment analysis of adjectives. In Symposion on Affective Language in Human and Machine, AISB Convention, pages 60–63.

L. Ferreira, N. Jakob, and I. Gurevych. 2008. A comparative study of feature extraction algorithms in customer reviews. In Proceedings of the 2008 International Conference on Semantic Computing, pages 144–151. IEEE Computer Society.

K. T. Frantzi and S. Ananiadou. 1996. Extracting nested collocations. In Proceedings of the 16th COLING, pages 41–46. ACL.

H. Guo, H. Zhu, Z. Guo, X. Zhang, and Z. Su. 2009. Product feature categorization with multilevel latent semantic association. In Proceedings of the 18th CIKM, pages 1087–1096. ACM.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the 10th ACM SIGKDD, pages 168–177. ACM.

N. Jakob and I. Gurevych. 2010. Extracting opinion targets in a single- and cross-domain setting with conditional random fields. In Proceedings of the EMNLP '10, pages 1035–1045. ACL.

Niklas Jakob. 2011. Extracting Opinion Targets from User-Generated Discourse with an Application to Recommendation Systems. Ph.D. thesis, Technische Universität Darmstadt.

K. Kageura and B. Umino. 1996. Methods of automatic term recognition: A review. Terminology, 3(2):259–289.

J. S. Kessler, M. Eckert, L. Clark, and N. Nicolov. 2010. The 2010 ICWSM JDPA sentiment corpus for the automotive domain. In Proceedings of the 4th AAAI Conference on Weblogs and Social Media Data Workshop Challenge.

L. Kozakov, Y. Park, T. Fin, Y. Drissi, Y. Doganata, and T. Cofino. 2004. Glossary extraction and utilization in the information search and delivery system for IBM technical support. IBM Systems Journal, 43(3):546–563.

R. Navigli and P. Velardi. 2004. Learning domain ontologies from document warehouses and dedicated web sites. Comp. Linguistics, 30(2):151–179.

Youngja Park, Roy J. Byrd, and Branimir K. Boguraev. 2002. Automatic glossary extraction: Beyond terminology identification. In Proceedings of the 19th COLING, pages 1–7. ACL.

Livia Polanyi and Annie Zaenen. 2006. Contextual valence shifters. In Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series, chapter 1, pages 1–10. Springer Netherlands, Berlin/Heidelberg.

A.-M. Popescu and O. Etzioni. 2005. Extracting product features and opinions from reviews. In Proceedings of HLT EMNLP '05, pages 339–346. ACL.

C. Scaffidi, K. Bierhoff, E. Chang, M. Felker, H. Ng, and C. Jin. 2007. Red Opal: Product-feature scoring from reviews. In Proceedings of the 8th ACM Conference on Electronic Commerce, pages 182–191.

J. Wiebe, T. Wilson, and C. Cardie. 2005. Annotating expressions of opinions and emotions in language. LREC, 39(2):165–210, May.

Y. Wu, Q. Zhang, X. Huang, and L. Wu. 2009. Phrase dependency parsing for opinion mining. In Proceedings of the EMNLP '09, pages 1533–1541. ACL.

J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack. 2003. Sentiment Analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Proceedings of the 3rd ICDM, pages 427–434. IEEE Comput. Soc.

U. Zernik. 1991. Lexical Acquisition: Exploiting On-line Resources to Build a Lexicon. Lawrence Erlbaum.

L. Zhang, B. Liu, S. H. Lim, and E. O'Brien-Strain. 2010. Extracting and ranking product features in opinion documents. In Proceedings of the 23rd COLING: Posters, pages 1462–1470. ACL.
