A Demo Search Engine for Products
Beibei Li, Anindya Ghose, Panagiotis G. Ipeirotis
Department of Information, Operations, and Management Sciences
Leonard N. Stern School of Business, New York University
New York, New York 10012, USA

ABSTRACT

Most product search engines today build on models of relevance devised for information retrieval. However, the decision mechanism that underlies the process of buying a product is different from the process of locating relevant documents or objects. We propose a theory model for product search based on expected utility theory from economics. Specifically, we propose a ranking technique in which we rank highest the products that generate the highest surplus after the purchase. We instantiate our research by building a demo search engine for hotels that takes into account consumers' heterogeneous preferences and also accounts for the varying hotel price. Moreover, we achieve this without explicitly asking for the preferences or purchasing histories of individual consumers, but by using aggregate demand data. This new ranking system is able to recommend to consumers products with the "best value for money" in a privacy-preserving manner. The demo is accessible at http://nyuhotels.appspot.com/

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Algorithms, Economics, Experimentation, Measurement

Keywords
Consumer Surplus, Economics, Product Search, Ranking, Text Mining, User-Generated Content, Utility Theory

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others.
WWW 2011, March 28–April 1, 2011, Hyderabad, India.
ACM 978-1-4503-0637-9/11/03.

1. INTRODUCTION

It is now widely acknowledged that online search for products is increasing in popularity, as more and more users search for and purchase products on the Internet. Most search engines for products today are based on models of relevance from "classic" information retrieval theory [9] or use variants of faceted search [11] to facilitate browsing. However, the decision mechanism that underlies the process of buying a product is different from the process of finding a relevant document or object. Customers do not simply seek something relevant to their search, but also try to identify the "best" deal that satisfies their specific criteria. Today's product search engines provide only rudimentary ranking facilities for search results, typically using a single ranking criterion such as price, best selling, or more recently, customer review ratings. This approach has quite a few shortcomings. First, it ignores the multidimensional preferences of consumers. Second, it fails to leverage the information generated by the online communities, going beyond simple numerical ratings. Third, it hardly takes into account the heterogeneity of consumers. These drawbacks call for a recommendation strategy for products that can better model consumers' underlying purchase behavior, to capture their multidimensional preferences and heterogeneous tastes.

Recommender systems [1] could fix some of these problems but, to the best of our knowledge, existing techniques still have limitations. First, most recommendation mechanisms require consumers to log into the system. However, in reality many consumers browse only anonymously. Due to the lack of any meaningful, personalized recommendations, consumers do not feel compelled to log in before purchasing. Even when they log in, before or after a purchase, consumers are reluctant to give out their individual demographic information for many reasons (e.g., time constraints, privacy issues, or lack of incentives). Therefore, most context information is missing at the individual consumer level. Second, for goods with a low purchase frequency for an individual consumer, such as hotels, cars, or real estate, there are few repeated purchases we could leverage towards building a predictive model (i.e., models based on collaborative filtering). Third, and potentially more importantly, as privacy issues become increasingly noticeable today, marketers may not be able to observe the individual-level purchase history of each consumer (or consumer segment). Instead, the only information available is at an aggregate level (e.g., market share or units sold). As a consequence, many algorithms that rely on knowing individual-level behavior cannot derive consumer preferences from such aggregate data.

Alternative techniques try to identify the "Pareto optimal" set of results [2]. Unfortunately, the feasibility of this approach diminishes as the number of product characteristics increases. With more than five or six characteristics, the probability of a point being classified as "Pareto optimal" dramatically increases. As a consequence, the set of Pareto optimal results soon includes every product.

In our work, we design a new ranking system for recommendation that leverages economic modeling. We aim at making recommendations based on a better understanding of the underlying "causality" of consumers' purchase decisions. Our algorithm learns consumer preferences based on the largely anonymous, publicly observed distributions of consumer demographics as well as the observed aggregate-level purchases (i.e., anonymous purchases and market shares in NYC and LA), not by learning from the identified behavior or demographics of each individual. We instantiate our research by building a demo search engine for hotels, using a unique data set containing transactions from Nov. 2008 to Jan. 2009 for US hotels from a major travel web site. Our extensive user studies, using more than 15000 user judgments, demonstrate an overwhelming preference for the ranking generated by our techniques, compared to a large number of existing strong baselines.

The major contributions of our research are: (1) We present a causal model, based on economic theory, to capture the decision-making process of consumers, leading to a better understanding of consumer preferences. The causal model relaxes the assumption of a "consistent environment" across training and testing data sets: we can now have changes in the environment and can predict what should happen under such changes. (2) We infer personal preferences from aggregate data, in a privacy-preserving manner. (3) We propose a ranking method using the notion of surplus, which is derived from a "generative" user behavior model. (4) We present an extensive experimental study: using six hotel markets and 15000 user evaluations in blind tests, we demonstrate that the generated rankings are significantly better than existing approaches.

2. THEORY MODEL

In this section, we first introduce the background of the expected utility theory, characteristics-based theory, and economic surplus. Then we discuss how we leverage these concepts in our setting and empirically estimate our model.

2.1 Background

Our model is derived from expected utility and rational choice theories. A fundamental notion in utility theory is that each consumer is endowed with an associated utility function $U$, which is "a measure of the satisfaction from consumption of various goods and services." The rationality assumption states that each person tries to maximize his or her own utility.

More formally, assume that the consumer has a choice across products $X_1, \ldots, X_n$, and each product $X_j$ has a price $p_j$. Buying a product involves the exchange of money for a product. Therefore, to analyze the purchasing behavior we need to have two components for the utility function: (1) Utility of Product: the utility that the consumer will get by buying the product $X_j$, and (2) Utility of Money: the utility that the consumer will lose by paying the price $p_j$ for product $X_j$.

On one hand, the decision to purchase product $X_j$ generates a [...] utility after purchasing a product. This idea naturally generates a ranking order: the products that generate the highest consumer surplus should be ranked on top.

2.2 The BLP Model

The key to our model is to identify the different product characteristics and estimate the corresponding weights assigned by consumers towards the characteristics and the price of the product. However, different consumers hold different evaluations towards the product characteristics and towards money. To capture the consumer heterogeneity, we use the Random-Coefficient Logit Model [3] (also known as BLP). This model incorporates consumer heterogeneity by assuming that consumers have idiosyncratic tastes towards product characteristics. In other words, the coefficients $\beta$ and $\alpha$ in equations 1 and 2 are different for each consumer. Based on this, we define the utility surplus for consumer $i$ to buy product $X_j$ as

$$
\begin{aligned}
US_j^i &= U_h(X_j) - \left[ U_m(I^i) - U_m(I^i - p_j) \right] + \varepsilon_j^i \\
&= \underbrace{\sum_k \beta^{ik} \cdot x_j^k + \xi_j}_{\text{Utility of product}} - \underbrace{\alpha^i p_j}_{\text{Utility of money}} + \underbrace{\varepsilon_j^i}_{\text{Stochastic error}}.
\end{aligned}
\tag{3}
$$

Here, $I^i$ is the income of consumer $i$, $p_j$ is the price of product $X_j$, $U_m$ is the utility of money (parameterized by the user-specific weight scalar $\alpha^i$), and $U_h$ is the utility of the product purchased (parameterized by the user-specific weight vector $\beta^i$). Note that $\xi_j$ is a product-specific disturbance scalar summarizing unobserved characteristics of product $X_j$, whereas $\varepsilon_j^i$ is a stochastic choice error term that is assumed to be i.i.d. across products and consumers in the selection process. The parameters to be estimated are $\alpha^i$ and $\beta^i$, which represent the weights that consumer $i$ assigns towards "money" and towards different observed product characteristics, respectively.

The technical details for the model estimation are in [7]. To better understand our model, let's consider an example.

Example 1. Suppose that we have two cities, A and B, and two types of consumers: business trip travelers and family trip travelers. City A is a business destination (e.g., New York City) with 80% of the travelers being business travelers and 20% families. City B is mainly a family destination (e.g., Orlando) with 10% business travelers and 90% family travelers. In city A, we have two hotels: Hilton (A1) and Doubletree (A2).
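As a rough illustration of how equation (3) can induce a ranking, the following Python sketch computes the deterministic part of the utility surplus for a few consumers with different coefficients and sorts products by their average surplus. It is only a sketch under stated assumptions: the product names, characteristic values, and coefficient values are hypothetical, and averaging the surplus over a list of consumers is a simplification chosen here for illustration, not the estimation procedure described in [7]. The stochastic error term is dropped because it is assumed i.i.d. across products and so does not change the ordering of expected surplus.

```python
def utility_surplus(beta, alpha, x, xi, price):
    """Deterministic part of equation (3): sum_k beta^{ik} x_j^k + xi_j - alpha^i p_j."""
    return sum(b * v for b, v in zip(beta, x)) + xi - alpha * price


def mean_surplus(product, consumers):
    """Average surplus of one product over a list of (beta, alpha) consumers."""
    x, xi, price = product
    total = sum(utility_surplus(beta, alpha, x, xi, price) for beta, alpha in consumers)
    return total / len(consumers)


def rank_products(products, consumers):
    """Return product names sorted by average surplus, highest first."""
    return sorted(products, key=lambda name: mean_surplus(products[name], consumers), reverse=True)


# Hypothetical data (not from the paper): two characteristics per product,
# stored as (characteristics x_j, unobserved quality xi_j, price p_j).
products = {
    "hotel_a": ((4.0, 1.0), 0.0, 200.0),
    "hotel_b": ((3.0, 0.0), 0.0, 100.0),
}

# Hypothetical consumers as (beta^i, alpha^i): same tastes, different price sensitivity.
price_insensitive = ((1.0, 0.5), 0.01)
price_sensitive = ((1.0, 0.5), 0.03)

ranking_all = rank_products(products, [price_insensitive, price_sensitive])
ranking_insensitive = rank_products(products, [price_insensitive])
```

With these made-up numbers, the price-insensitive consumer alone prefers the more expensive `hotel_a`, while the mixed population ranks `hotel_b` first, which shows how heterogeneity in $\alpha^i$ alone can change the ordering.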