AWESOME – A Data Warehouse-based System for Adaptive Recommendations

Andreas Thor, Erhard Rahm
University of Leipzig, Germany
{thor, rahm}@informatik.uni-leipzig.de

Abstract

Recommendations are crucial for the success of large websites. While there are many ways to determine recommendations, the relative quality of these recommenders depends on many factors and is largely unknown. We propose a new classification of recommenders and comparatively evaluate their relative quality for a sample website. The evaluation is performed with AWESOME (Adaptive website recommendations), a new data warehouse-based recommendation system capturing and evaluating user feedback on presented recommendations. Moreover, we show how AWESOME performs an automatic and adaptive closed-loop website optimization by dynamically selecting the most promising recommenders based on continuously measured recommendation feedback. We propose and evaluate several alternatives for dynamic recommender selection including a powerful machine learning approach.

1 Introduction

Recommendations are crucial for the success of large websites to effectively guide users to relevant information. E-commerce sites offering thousands of products cannot solely rely on standard navigation and search features but need to apply recommendations to help users quickly find "interesting" products or services. With many users and products, manual generation of recommendations is much too laborious and ineffective. Hence a key question becomes how recommendations should be generated automatically to optimally serve the users of a website.

There are many ways to automatically generate recommendations taking into account different types of information (e.g. product characteristics, user characteristics, or buying history) and applying different statistical or data mining approaches ([JKR02], [KDA02]). Sample approaches include recommendations of top-selling products (overall or per product category), new products, similar products, products bought together by customers, products viewed together in the same web session, or products bought by similar customers. Obviously, the relative utility of these recommendation approaches (recommenders for short) depends on the website, its users and other factors, so that there cannot be a single best approach. Website developers thus have to decide which approaches they should support and where and when they should be applied. Surprisingly, little information is available in the open literature on the relative quality of different recommenders. Hence, one focus of our work is an approach for comparative quantitative evaluations of different recommenders.

Advanced websites, such as Amazon [LSY03], support many recommenders but apparently are unable to select the most effective approach per user or product. They overwhelm the user with many different types of recommendations, leading to huge web pages and reduced usability. While commercial websites often consider the buying behaviour for generating recommendations, the usage (navigation) behaviour on the website remains largely unexploited. We believe this is a major shortcoming since the navigation behaviour contains detailed information on the users' interests not reflected in the purchase data. Moreover, the web usage behaviour contains valuable user feedback not only on products or other content but also on the presented recommendations. The utilization of this feedback to automatically and adaptively improve recommendation quality is a major goal of our work.

AWESOME (Adaptive website recommendations) is a new data warehouse-based website evaluation and recommendation system under development at the University of Leipzig. It contains an extensible library of recommender algorithms that can be comparatively evaluated for real websites based on user feedback.

[Figure 1: AWESOME architecture. The application server records web accesses and recommendation feedback for the current context; an ETL process loads content, user and web usage data into the web data warehouse for web usage analysis and recommender evaluation; recommender selection, the recommender library with precomputed recommendations, and a recommendation filter produce the presented recommendations.]

Moreover, AWESOME can perform an automatic closed-loop website optimization by dynamically selecting the most promising recommenders for a website access. This selection is based on the continuously measured recommendation quality of the different recommenders, so that AWESOME automatically adapts to changing user interests and changing content. To support high performance and scalability, quality characteristics of recommenders and recommendations are largely precomputed. AWESOME is fully operational and in continuous use at a sample website; adaptation to further sites is in preparation.

The main contributions of this paper are as follows:
- Presentation of the AWESOME architecture for warehouse-based recommender evaluation and for scalable adaptive website recommendations
- A new classification of recommenders for websites supporting a comparison of different approaches. We show how sample approaches fit the classification and propose a new recommender for users coming from search engines.
- A comparative quantitative evaluation of several recommenders for a sample website. The considered recommenders cover a large part of our classification's design space.
- Description and comparative evaluation of several rule-based approaches for dynamic recommender selection. In particular, a machine learning approach for feedback-based recommender selection is presented.

In the next section we present the AWESOME architecture and the underlying data warehouse approach. We then outline our recommender classification and sample recommenders (Section 3). Section 4 contains the comparative evaluation of several recommenders for a non-commercial website. In Section 5 we describe and evaluate approaches for dynamic recommender selection. Related work is briefly reviewed in Section 6 before we conclude.

2 Architecture

2.1 Overview

Fig. 1 illustrates the overall architecture of AWESOME, which is closely integrated with the application server running the website. AWESOME is invoked for every website access, specified by a so-called context including information from the current HTTP request such as URL, timestamp and user-related data. For such a context, AWESOME dynamically generates a list of recommendations which are displayed by the application server together with the requested website content. Recommendations are automatically determined by a variety of algorithms from an extensible recommender library. The recommenders use information on the usage history of the website and additional information maintained in a web data warehouse. The recommendations are subject to a final filter step to avoid the presentation of unsuitable or irrelevant recommendations (e.g., recommendation of the current page or the homepage).

Dynamic selection of recommendations is a two-step process. For a given context, AWESOME first selects the most appropriate recommender(s). This recommender selection is controlled by a moderate number of selection rules. For evaluation purposes, we support several selection strategies for determining and adapting these rules, in particular automatic approaches based on user feedback on previously presented recommendations. This recommendation feedback is also recorded in the web data warehouse. For the chosen recommender(s), the best recommendations for the current context are selected in the second step. For performance reasons, these recommendations are precomputed (and periodically refreshed) and can thus quickly be looked up at runtime.
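
The second step can thus be served by a simple lookup. The following is a minimal sketch of such a lookup, assuming a hypothetical PrecomputedRecommendations table with illustrative column names; MySQL syntax (LIMIT) is used since AWESOME keeps the precomputed recommendations in a MySQL database (see Section 2.2).

    -- Minimal sketch of the runtime lookup (step 2); table and column
    -- names are illustrative assumptions, not AWESOME's actual schema.
    SELECT Recommendation
    FROM PrecomputedRecommendations
    WHERE Recommender = 'SER'    -- recommender chosen in step 1
      AND ContentID = 4711       -- key derived from the current context
    ORDER BY Score DESC          -- recommender-specific ranking
    LIMIT 2;                     -- two recommendations are shown per page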

Separating the selection of recommenders and recommendations makes it easy to add new recommenders. Moreover, using recommendation feedback at the level of recommenders is simpler and more stable than trying to use this feedback for individual recommendations, e.g. specific web pages or products. One problem with the latter approach is that individual pages/products are frequently added and that there is no feedback available for such new content. Conversely, removing content would result in a loss of the associated recommendation feedback.

AWESOME is based on a comprehensive web data warehouse integrating information on the website structure and content (e.g., product catalog), website users and customers, the website usage history and recommendation feedback. The application server continuously records the users' web accesses and which presented recommendations have been and which ones have NOT been followed. During an extensive ETL (extract, transform, load) process (including data cleaning, session and user identification) the usage data and recommendation feedback is added to the warehouse.

The warehouse serves several purposes. Most importantly, it is the common data platform for all recommenders and keeps feedback for the dynamic recommender selection, thus enabling an automatic closed-loop website optimization. However, it can also be used for extensive offline evaluations, e.g., using OLAP tools, not only for web usage analysis but also for a comparative evaluation of different recommenders and of different strategies for recommender selection. This functionality of AWESOME allows us to systematically evaluate the various approaches under a large range of conditions. It is also an important feature for website designers to fine-tune the recommendation system, e.g. to deactivate or improve less effective recommenders.

The current AWESOME implementation runs on different servers. The warehouse is on a dedicated machine running MS SQL Server. The recommendation engine runs on a Unix-based application server where the precomputed recommendations and selection rules are maintained in a MySQL database. In the following, we provide some more details on the ETL process and the warehouse. More information on the recommenders and selection strategies is presented in the subsequent sections.

2.2 ETL process, data warehouse

The ETL workflow to refresh the data warehouse is executed periodically, e.g. once a day. It processes the web log files of the application server and other data sources (e.g., on the website and users). The standard log files of web servers are not sufficient for our purposes because to obtain sufficient recommendation feedback we need to record all presented recommendations and whether or not they have been followed. We thus decided to use tailored application server logging to record this information. Application server logging also enables us to apply effective approaches for session and user identification and early elimination of crawler accesses, thus supporting high data quality.

The AWESOME extensions of the application server are implemented by PHP programs and run together with standard web servers such as Apache. We use two log files: a web usage and a recommendation log file with the formats shown in Table 1. The recommendation log file records all presented recommendations and is required for our recommender evaluation. It allows us to determine positive and negative user feedback, i.e. whether or not a presented recommendation was clicked. For each presented recommendation, we also record the relative position of the recommendation on the page, the generating recommender and the used selection strategy.

a) Web usage log
Page: requested page
Date, Time: date and time of the request
Client: IP address of the user's computer
Referrer: referring URL
Session ID: session identifier
User ID: user identifier

b) Recommendation log
Pageview ID: page view where the recommendation has been presented
Recommendation: recommended content
Position: position of this recommendation inside a recommendation list
Recommender: recommender that generated the recommendation
Strategy: strategy that selected the applied recommender

Table 1: Log file formats

The web usage log file adds two elements to the standard Common Log Format (CLF) of common web servers: session ID and user ID. The session ID is generated by the application server and stored inside a temporary cookie on the user's computer (if enabled). These cookies allow a highly reliable session reconstruction [SMB+03]. If the user does not accept temporary cookies, we use heuristic algorithms using client and referrer information for session identification [CMS99]. User IDs are stored inside permanent cookies and are used for user identification. If the user does not accept permanent cookies, user recognition is not done. About 85% of the users of our prototype website accept at least temporary cookies. Hence, cookies support a good balance between data quality and user acceptance, in contrast to client-side tracking approaches requiring the application of Java applets or the like [SBF01].
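
The recommendation log of Table 1b could, for instance, be loaded into a relational staging table along the lines of the following sketch; names and types are our illustrative assumptions, not AWESOME's actual schema.

    -- Illustrative staging table for the recommendation log (Table 1b);
    -- names and types are assumptions.
    CREATE TABLE RecommendationLog (
      PageviewID     INT,           -- page view where the recommendation was presented
      Recommendation VARCHAR(255),  -- recommended content, e.g. the target URL
      Position       INT,           -- position inside the recommendation list
      Recommender    VARCHAR(100),  -- recommender that generated the recommendation
      Strategy       VARCHAR(100)   -- strategy that selected the applied recommender
    );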

We use several approaches to detect and eliminate web crawler requests. First, we utilize an IP address list of known crawlers (e.g., from Google) to avoid logging their accesses. In addition, AWESOME eliminates all sessions containing special page views that can only be reached by following links invisible to humans. Finally, we analyze the navigation behavior and attributes like session length, average time between two requests and number of requests with blank referrer to distinguish between human user sessions and web crawler sessions [TK00].

The web data warehouse is a relational database with a "galaxy" schema consisting of several fact tables sharing several dimensions. Like in previous approaches on web usage analysis [KM00], we use separate fact tables for page views, sessions, and – for commercial sites – purchases. In addition, we use a recommendation fact table as shown in Fig. 2. The details of the dimension and fact tables depend on the website, e.g. on how the content (e.g., products), users or customers are categorized. In the example of Fig. 2 there are two content dimensions for different hierarchical categorizations. Other dimensions such as user, customer, region and date are also hierarchically organized to allow evaluations at different levels of detail. The recommendation fact table represents the positive and negative user feedback on recommendations. Each record in this table refers to one presented recommendation. The ID attributes are foreign keys on the various dimension tables and identify the recommender algorithm, the recommended content, the customer and details of the context (content, user, time, etc.) for which the recommendation was presented. Three Boolean measures are used to derive recommendation quality metrics (see Section 4). Accepted indicates whether or not the recommendation was directly accepted (clicked) by the user, while Viewed specifies whether the recommended content was viewed later during the respective session (i.e. the recommendation was a useful hint). Purchased is only used for e-commerce websites to indicate whether or not the recommended product was purchased during the current session.

The ETL process updates all affected warehouse dimension and fact tables. Moreover, the quality metrics for all recommenders as well as the rules for dynamic recommender selection are updated (see Sections 4 and 5).

[Figure 2: Schema subset for recommendations (simplified). The recommendation fact table with the measures Accepted, Viewed and Purchased and its dimension tables: two content dimensions (Content Type 1, Content Type 2), Recommender, Date, Time, Region, User, Customer and Session.]
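
As a concrete illustration, the fact table of Fig. 2 could be declared roughly as follows; the column names follow the figure, while the types, the page view key and the 0/1 encoding of the Boolean measures are our assumptions.

    -- Minimal sketch of the recommendation fact table of Fig. 2;
    -- a page view key is added because the metrics of Section 4
    -- are defined over page views.
    CREATE TABLE RecommendationFact (
      RecommendationID INT,       -- presented recommendation
      PageviewID       INT,       -- page view of the presentation (assumed)
      ContentID        INT,       -- FK: content dimension (hierarchy 1)
      ContentID2       INT,       -- FK: content dimension (hierarchy 2)
      RecommenderID    INT,       -- FK: recommender algorithm
      DateID           INT,       -- FK: date dimension
      TimeID           INT,       -- FK: time dimension
      RegionID         INT,       -- FK: region dimension
      UserID           INT,       -- FK: user dimension
      CustomerID       INT,       -- FK: customer dimension (commercial sites)
      SessionID        INT,       -- session in which the recommendation was shown
      Accepted         SMALLINT,  -- Boolean measure (1/0): directly clicked?
      Viewed           SMALLINT,  -- Boolean measure (1/0): reached later in the session?
      Purchased        SMALLINT   -- Boolean measure (1/0): purchased? (e-commerce only)
    );
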
3 Recommenders

A recommender generates for a given web page request, specified by a context, an ordered list of recommendations. Such recommendations link to current website content, e.g. pages describing a product or providing other information or services. Recommendations are usually presented as titled links with a short description or preview.

To calculate recommendations, recommenders can make use of the information available in the context as well as additional input, e.g. recorded purchase and web usage data. We distinguish between three types of context information relevant for determining recommendations:
- Current content, i.e. the currently viewed content (page view, product, …) and its related information such as content categories
- Current user, e.g. identified by a cookie, and associated information, e.g. her previous purchases, previous web usage, interest preferences, or current session
- Additional information available from the HTTP request (current date and time, user's referrer, …)

3.1 Recommender classification

Given the many possibilities to determine recommendations, there have been several attempts to classify recommenders ([Bu02], [KDA02], [SKR01], [TH01]). These classifications typically started from a given set of recommenders and tried to come up with a set of criteria covering all considered recommenders. This led to rather complex and specialized classifications with criteria that are only relevant for a subset of recommenders. Moreover, new recommenders can easily require additional criteria to keep the classification complete. For example, [SKR01] introduce a large number of specialized criteria for e-commerce recommenders such as input from target customers, community inputs, degree of personalization, etc.

To avoid these problems we propose a general top-level classification of website recommenders focusing on the usable input data, in particular the context information. This classification may be refined by taking additional aspects into account, but already leads to a distinction of major recommender types, thus illustrating the design space. Moreover, the classification helps to compare different recommenders and guides us in the evaluation of different approaches.

Fig. 3 illustrates our recommender classification and indicates where sample approaches fit in. We classify recommenders based on three binary criteria, namely whether or not they use information on the current content, the current user, and recorded usage (or purchase) history of users. This leads to a distinction of eight types of recommenders (Fig. 3). We specify each recommender type by a three-character code describing whether (+) or not (–) each of the three types of information is used. For instance, type [+,+,–] holds for recommenders that use information on the current content and current user, but do not take into account user history.

[Figure 3: Top-level classification of recommenders. A decision tree over the three binary criteria (current content, current user, user history) yields eight types with sample approaches: [–,–,–] most recent content; fixed recommendations, e.g. special offers; random. [–,–,+] most frequently viewed (purchased) content; highest increase rate. [–,+,–] "New for you"; search engine recommender. [–,+,+] collaborative filtering; usage profiles. [+,–,–] similarity; random in current category. [+,–,+] association rules (e.g., most frequent successor). [+,+,–] "New for you" in current category. [+,+,+] association rules in a user group.]

The first classification criterion considers whether or not a recommender uses the current content, i.e. the currently requested page or product. A sample content-based approach (type [+,–,–]) is to recommend content that is most similar to the current content, e.g. based on text-based similarity metrics such as TF/IDF. Content-based recommenders may also use generalized information on the content category (e.g., to recommend products within the current content category). Sample content-insensitive recommenders (type [–,–,–]) are to recommend the most recent content, e.g. added within the last week, or to give a fixed recommendation at each page, e.g. for a special offer.

At the second level we consider whether or not a recommender utilizes information on the current user. User-based approaches could thus provide special recommendations for specific user subsets, e.g. returning users or customers, or based on personal interest profiles. Recommenders could also recommend content for individuals, e.g. new additions since a user's last visit ("New for you"). We developed a new recommender of type [–,+,–] for users coming from a search engine such as Google. This search engine recommender (SER) utilizes the fact that the HTTP referrer information typically contains the search terms (keywords) of the user [KMT00]. SER recommends the website content (different from the current page that was reached from the search engine) that best matches these keywords. The SER implementation in AWESOME utilizes a predetermined search index of the website to quickly provide the recommendations at runtime.

With the third classification criterion we differentiate recommenders by their use of user history information. For commercial sites, recommenders can consider information on previous product purchases of customers. Another example is the evaluation of the previous navigation patterns of website users. Simple recommenders of type [–,–,+] recommend the most frequently purchased/viewed content (top-seller) or the content with the highest recent increase of interest.

While not made explicit in the classification, recommenders can utilize additional information beyond current content, current user and history, e.g. the current date or time. Furthermore, additional classification criteria could be considered, such as the metrics used for ranking recommendations (e.g. similarity metrics, relative or absolute access/purchase frequencies, recency, monetary metrics, etc.) or the type of analysis algorithm (simple statistics, association rules, clustering, etc.).

3.2 Additional approaches

Interesting recommenders often consider more than one of the three main types of user input. We briefly describe some examples to further illustrate the power and flexibility of our classification and to introduce approaches that are considered in our evaluation.

[+,–,+]: Association rule based recommenders such as "Users who bought this item also bought …", made famous by Amazon [LSY03], consider the current content (item) and purchase history but are independent of the current user (i.e. every user sees the same recommendations for an item). Association rules can also be applied on web usage history to recommend content which is frequently viewed together within a session.

[–,+,+]: Information on navigation/purchase history can be used to determine usage profiles [MDL+02] or groups of similar users, e.g. by collaborative filtering approaches. Recommenders can assign the current user to a user group (either based on previous sessions or the current session) and recommend content most popular for this group. In our evaluation we test a personal interests recommender, which is applicable to returning users. It determines the most frequently accessed content categories per user as an indication of her personal interests. When the user returns to the website, the most frequently accessed content of the respective categories is recommended.

[+,+,+]: A recommender of this type could use both user groups (as discussed for [–,+,+]) and association rules to recommend to the current user those items that were frequently accessed (purchased) by similar users in addition to the current content.

4 Recommender evaluation

The AWESOME prototype presented in Section 2 allows us to systematically evaluate recommenders for a given website. In Section 4.2, we demonstrate this for a sample non-commercial website. Before that, we introduce several metrics for measuring recommendation quality which are needed for our evaluation of recommenders and selection strategies.

4.1 Evaluation metrics

To evaluate the quality of presented recommendations we utilize the Accepted, Viewed, and Purchased measures recorded in the recommendation fact table (Section 2.2). The first two are always applicable, while the last one only applies for commercial websites. We further differentiate between metrics at two levels of granularity, namely with respect to page views and with respect to user sessions.

Acceptance rate is a straightforward, domain-independent metric for recommendation quality. It indicates the share of page views for which at least one presented recommendation was accepted, i.e. clicked. The definition thus is

AcceptanceRate = |P_A| / |P|

where P is the set of all page views containing a recommendation and P_A the subset of page views with an accepted recommendation. Analogously we define a session-oriented quality metric

SessionAcceptanceRate = |S_A| / |S|

where S is the set of all user sessions and S_A the set of sessions for which at least one of the presented recommendations was accepted.

Recommendations can also be considered of good quality if the user does not directly click them but reaches the associated content later in the session (hence, the recommendation was a correct prediction of user interests). Let P_V be the set of all page views for which any of the presented recommendations was reached later in the user session. We define

ViewRate = |P_V| / |P|

The corresponding metric at the session level is

SessionViewRate = |S_V| / |S|

where S_V is the set of all user sessions with at least one page view in P_V. Obviously, every accepted recommendation is also a viewed recommendation, i.e. P_A ⊆ P_V ⊆ P and S_A ⊆ S_V ⊆ S, so that view rates are always larger than or equal to the acceptance rates.

In commercial sites, product purchases are of primary interest. Note that purchase metrics should be session-oriented because the number of page views needed to finally purchase a product is of minor interest. A useful metric for recommendation quality is the share of sessions S_AP containing a purchase that followed an accepted recommendation of the product. Hence, we define the following metric:

RecommendedPurchaseRate = |S_AP| / |S|

Obviously, it holds that S_AP ⊆ S_A ⊆ S.
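
As an illustration, the page view acceptance rate can be aggregated per recommender directly on the warehouse; the following sketch assumes the illustrative fact table layout from Section 2.2 and a Recommender dimension table.

    -- Sketch: AcceptanceRate = |P_A| / |P| per recommender, over the
    -- illustrative fact table of Section 2.2 (one row per presented
    -- recommendation); a page view counts as accepted if at least one
    -- of its presented recommendations was clicked.
    SELECT r.Name,
           COUNT(DISTINCT CASE WHEN f.Accepted = 1 THEN f.PageviewID END) * 1.0
             / COUNT(DISTINCT f.PageviewID) AS AcceptanceRate
    FROM RecommendationFact f
    JOIN Recommender r ON r.RecommenderID = f.RecommenderID
    GROUP BY r.Name;
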
4.2 Sample evaluation

We implemented and tested the AWESOME approach for recommender evaluation for a sample website, namely the website of our database group (http://dbs.uni-leipzig.de). We use two content hierarchies and Fig. 4 shows a fragment of one of them together with some numbers on the relative size and access frequencies. The website contains more than 3100 pages and receives about 2000 human page views per day (excluding accesses from members of our database group and from crawlers). As indicated in Fig. 4, about 89% of the content is educational study material, which receives about 82% of the page views.

[Figure 4: Example of content hierarchy. Root http://dbs.uni-leipzig.de (3109 available pages / 2000 page views daily) with categories such as Study (89% of pages / 82% of page views; e.g. Course Material, Exercises), Research (6% / 8%; e.g. Projects, Publications) and Navigation (0.8% / 6%; e.g. Index, Menu).]

We changed the existing website to show two recommendations on each page so that approx. 4000 recommendations are presented every day. For each page view AWESOME dynamically selects one recommender and presents its two top recommendations (see example in Fig. 5) for the respective context as described in Section 2. We implemented and included more than 100 recommenders in our recommender library. Many of them are variations of other approaches, e.g. considering different user categories or utilizing history data for different periods of time. Due to space constraints we only present results for the six representative recommenders of different types listed in Table 2, which were already introduced in Section 3. The presented results refer to the period from December 1st, 2003 until January 31st, 2004.

Type      Name                 New users  Returning users  Σ
[–,–,–]   Most recent          (0.42%)    (0.00%)          (0.38%)
[–,–,+]   Most frequent        1.00%      0.62%            0.92%
[–,+,–]   SER                  2.84%      1.95%            2.79%
[–,+,+]   Personal Interests   –          1.54%            1.54%
[+,–,–]   Similarity           1.65%      0.82%            1.56%
[+,–,+]   Association Rules    1.16%      0.68%            1.08%
          Σ                    1.82%      1.09%            1.69%

Table 2: Acceptance rate vs. user type

[Figure 5: Recommendation screenshot.]

[Figure 6: Acceptance rate vs. page type. Bar chart (0% to 3.5%) comparing the acceptance rates of the Most frequent, SER, Personal Interests, Similarity and Association Rules recommenders on Navigation, Study and Others pages.]

The AWESOME warehouse infrastructure allows us to aggregate and evaluate recommendation quality metrics for a huge variety of constellations (combinations of dimension attributes), in particular for different recommenders. For our evaluation we primarily use (page view) acceptance rates as the most precise metric¹. The average acceptance rate for all considered recommenders was 1.34%; the average view rate was 14.54%. For sessions containing more than one page view the average session acceptance rate was 8.16%, and the session view rate was 25.24%. These rather low acceptance rates are influenced by the fact that every single web page contains a full navigation menu with 78 links (partly nested in sub-menus) and that we consciously do not highlight recommendations using big fonts or the like. Note, however, that reported "click-thru" metrics are in a range comparable to our acceptance rates [CLP02]. Furthermore, the absolute values are less relevant for our evaluation than the relative differences between recommenders.

Table 2 shows the observed acceptance rates for the six recommenders, differentiating between new and returning users. Fig. 6 compares the recommenders w.r.t. the current page type. As expected there are significant differences between recommenders. For our website the search engine recommender (SER) achieved the best average acceptance rates (2.79%), followed by the similarity and personal interests recommenders. On the other hand, simple approaches such as recommending the most frequently accessed or most recent content achieved only poor average results.

To more closely analyze the differences for different user and content types, one has to take into account that some recommenders are not always applicable. For instance, the personal interests recommender is only applicable for returning users (about 15% for our website). Similarly, SER can only be applied for users coming from a search engine, 95% of which turned out to be new users of the website. In Fig. 6 we only show results for recommenders with a minimum support of 5%, i.e. they were applied for at least 5% of all page views of the respective page type. In Table 2 the results for the most recent recommender are shown in parentheses because the minimal support could not be achieved due to only few content additions during the considered time period.

Table 2 shows that new users are more likely to accept recommendations than returning users. Except for the personal interests recommender, this holds for all recommenders. An obvious explanation is that returning users (e.g., students for our website) often know where to find relevant information on the website. We also observed that the first page view of a session has a much higher acceptance rate (4.82%) than later page views in a session (1.28%). In the latter value, the last page view of a session is not considered, because its acceptance rate obviously equals 0².

Fig. 6 illustrates that the relative quality of recommenders differs for different contexts. While the SER recommender achieved the best average results for Study and Others pages, the most frequent recommender received the best user feedback on navigation pages. For study pages and non-search-engine users (when SER is not applicable), either the personal interests or the similarity recommender promises the best recommendation quality.

While these observations are site-specific, they illustrate that the best recommender depends on context attributes such as the current content or current user. A careful OLAP analysis may help to determine manually which recommender should be selected in which situation. However, for larger and highly dynamic websites this is difficult and labor-intensive, so that recommender selection should be automatically optimized.

¹ The recommendations presented during a session typically come from different recommenders, making the session-oriented quality metrics unsuitable for evaluating recommenders. Session acceptance rates will be used in Section 5. The RecommendedPurchaseRate does not apply for non-commercial sites.
² Layout aspects and other factors also influence acceptance rates. For instance, from the two recommendations shown per page the acceptance rate of the first one was about 50% higher compared to the second recommendation.

{ Usertype=’new user’ AND ContentCategory1=’Navigation’ } ➠ ‘Most frequent’ [0.6]
{ Referrer=’search engine’ } ➠ ‘SER’ [0.8]
{ Clienttype=’university’ AND Usertype=’returning user’ } ➠ ‘Personal interest’ [0.4]

Figure 7: Examples of selection rules

5 Adaptive recommender selection

AWESOME supports a dynamic selection of recommenders for every website access. This selection is based on selection rules. Rules may either be manually defined or automatically generated. We first present the structure and use of selection rules. We then propose two approaches to automatically generate selection rules which utilize recommendation feedback to adapt to changing conditions. Finally we present a short evaluation to compare the different approaches for recommender selection.

5.1 Rule-based recommender selection

Recommender selection entails the dynamic selection of the most promising recommenders for a given context. Selection rules therefore have the following structure:

ContextPattern ➠ recommender [weight]

Here the context pattern is a sequence of values from different context attributes (which are represented as dimension attributes in our warehouse). Typically, only a subset of attributes is specified, implying that there is no value restriction for the unspecified attributes. On the right-hand side of selection rules, recommender uniquely identifies an algorithm of the recommender library and weight is a real number specifying the importance of the rule. Fig. 7 shows some examples of such selection rules. In AWESOME, we maintain all rules in a single SelectionRules table.

Selection rules allow a straightforward and efficient implementation of recommender selection. It entails a match step to find all rules with a context pattern matching the current context. The rules with the highest weights then indicate the recommenders to be applied. The number of recommenders to choose is typically fixed. In this paper, we focus on the selection of only one recommender, i.e. we choose the rule with the highest weight. The SQL query of Fig. 8 can be used to perform the sketched selection process. Since there may be several matching rules per recommender, the ranking could also be based on the average instead of the maximal weight per recommender.

Example: Consider a new user who reaches the website from a search engine. If her current page belongs to the navigation category, only the first two rules in Fig. 7 match. We select the recommender with the highest weight – SER.

The rule-based recommender selection is highly flexible. Selection rules allow the dynamic consideration of different parts of the current context, and the weights can be used to indicate different degrees of certainty. Rules can easily be added, deleted or modified independently from other rules. Moreover, rules can be specified manually, e.g. by website editors, or be generated automatically. Another option is a hybrid strategy with automatically generated rules that are subsequently modified or extended manually, e.g. to enforce specific considerations.

5.2 Generating selection rules

We present two approaches to automatically generate selection rules, which have been implemented in AWESOME. Both approaches use the positive and negative feedback on previously presented recommendations. The first approach uses the aggregation and query functionality of the data warehouse to determine selection rules. The second approach is more complex and uses a machine learning algorithm to learn the most promising recommender for different context constellations.

5.2.1 Query-based top recommender

This approach takes advantage of the data warehouse query functionality. It generates selection rules as follows:

1. Find all relevant context patterns in the recommendation fact table, i.e. context patterns exceeding a minimal support
2. For every such context pattern P do
   a) Find the recommender R with the highest acceptance rate A
   b) Add the selection rule P ➠ R [A]
3. Delete inapplicable rules

The first step ensures that only context constellations with a minimal number of occurrences are considered. This is important to avoid the generalization of very rare and special situations (overfitting problem). Note that step 1 checks all possible context patterns, i.e. any of the context attributes may be unspecified, which is efficiently supported by the CUBE operator (SQL extension: GROUP BY CUBE) [GBL+95]. AWESOME is based on a commercial RDBMS providing this operator. For every such context pattern, we run a query to determine the recommender with the highest acceptance rate and produce a corresponding selection rule.
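
The following sketch illustrates steps 1 and 2a, assuming a denormalized view RecommendationFeedback that joins the fact table with its dimension tables, three example context attributes, and an assumed support threshold; ISO SQL / SQL Server 2008+ CUBE syntax is shown. Per context pattern, the rule generation then keeps the recommender with the highest acceptance rate (steps 2a/2b).

    -- Sketch of steps 1 and 2a; all names and the threshold are
    -- illustrative assumptions. GROUP BY CUBE aggregates every subset
    -- of the context attributes in one pass; NULL then stands for
    -- "attribute unspecified", as in the selection rules. The AVG is a
    -- per-presentation proxy for the acceptance rate.
    SELECT Usertype, Referrer, ContentCategory1, Recommender,
           COUNT(*) AS Support,
           AVG(CASE WHEN Accepted = 1 THEN 1.0 ELSE 0.0 END) AS AcceptanceRate
    FROM RecommendationFeedback
    GROUP BY Recommender, CUBE (Usertype, Referrer, ContentCategory1)
    HAVING COUNT(*) >= 50;  -- minimal support (threshold value assumed)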

Finally, we perform a rule pruning, taking into account that we only want to determine the top recommender per context. We observe that for a rule A with a more general context pattern and a higher weight than a rule B, the latter will never be applied (every context that matches rule B also matches rule A, but A will be selected due to its higher weight). Hence, we eliminate all such inapplicable rules in step 3 to limit the total number of rules.

SELECT Recommender, MAX(Weight)
FROM SelectionRules
WHERE ((RuleContextAttribute1 = CurrentContextAttribute1)
       OR (RuleContextAttribute1 IS NULL))
  AND ((RuleContextAttribute2 = CurrentContextAttribute2)
       OR (RuleContextAttribute2 IS NULL))
  AND …
GROUP BY Recommender
ORDER BY MAX(Weight) DESC

Figure 8: SQL query for selection strategy execution

5.2.2 Machine-learning approach

Recommender selection can be interpreted as a classifier selecting one recommender from a predefined set of recommenders. Hence, machine learning (classification) algorithms can be applied to generate selection rules. Our approach utilizes a well-known classification algorithm constructing a decision tree based on training instances (the Weka J48 algorithm [WF00]). To apply this approach, we thus have to transform recommendation feedback into training instances. An important requirement is that the generation of training data must be completely automatic so that the periodic re-calculation of selection rules to incorporate new recommendation feedback is not delayed by the need for human intervention.

The stored recommendation feedback indicates for each presented recommendation, its associated context attributes, and the used recommender whether or not the recommendation was accepted. A naïve approach to generate training instances would simply select a random sample from the recommendation fact table (Fig. 2), e.g. in the format (context, recommender, accepted). However, classifiers using such training instances would rarely predict a successful recommendation since the vast majority of the instances may represent negative feedback (> 98% for the sample website). Ignoring negative feedback is also no solution since the number of accepted recommendations is heavily influenced by the different applicability of recommenders and not only by their recommendation quality. Therefore, we propose a more sophisticated approach that determines the number of training instances according to the acceptance rates:

1. Find all relevant feedback combinations (context, recommender)
2. For every combination c do
   a) Determine the acceptance rate for c. Scale and round it to compute the integer weight n_c
   b) Add the instance (context, recommender) n_c times to the training data
3. Apply the decision tree algorithm
4. Rewrite the decision tree into selection rules

In step 1, we do not evaluate context patterns (as in the previous approach), which may leave some context attributes unspecified. We only consider fully specified context attributes and select those combinations exceeding a minimal number of recommendation presentations. For each such relevant combination c (context, recommender), we use its acceptance rate to determine the number of training instances n_c. To determine n_c, we linearly scale the respective acceptance rate from the 0 to 1 range by multiplying it with a constant k and rounding to an integer value. For example, assume 50 page views for the combination of the context ("returning user", "search engine", "Navigation", …) and the recommender "Most frequent". If there are 7 accepted recommendations for this combination (i.e. an acceptance rate of 0.14) and k = 100, we add n_c = 14 identical instances of the combination to the training data. This procedure ensures that recommenders with a high acceptance rate produce more training instances than less effective recommenders and therefore have a higher chance to be predicted.
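
In warehouse terms, steps 1 and 2a could be sketched as follows, reusing the illustrative RecommendationFeedback view from Section 5.2.1 and k = 100; each result row would then be emitted n_c times as identical training instances.

    -- Sketch of steps 1 and 2a: training-instance counts n_c per fully
    -- specified (context, recommender) combination; names, k and the
    -- threshold are assumptions. With 7 accepted out of 50 presentations
    -- and k = 100, this yields n_c = 14, as in the example above.
    SELECT Usertype, Referrer, ContentCategory1, Recommender,
           ROUND(100 * AVG(CASE WHEN Accepted = 1 THEN 1.0 ELSE 0.0 END), 0) AS Nc
    FROM RecommendationFeedback
    GROUP BY Usertype, Referrer, ContentCategory1, Recommender
    HAVING COUNT(*) >= 50;  -- minimal number of presentations (assumed)
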
The resulting set of training instances is the input for the classification algorithm producing a decision tree. With the help of cross-validation, all training instances are simultaneously used as test instances. The final decision tree can easily be rewritten into selection rules. Every path from the root to a leaf defines a context pattern where all unspecified context attributes are set to NULL. Each leaf specifies a recommender, and the rule weight is set to the relative fraction of correctly classified instances provided by the classification algorithm.

5.3 Evaluation of selection rules

To evaluate the effectiveness of the two presented selection strategies we tested them with AWESOME on the sample website introduced in Section 4.2. For comparison purposes we also evaluated two sets of manually specified rules and a random recommender selection, giving a total of five approaches (see Table 3). For every user session AWESOME uniformly selected one of the strategies; the chosen strategy is additionally recorded in the recommendation log file for evaluation purposes. We applied the selection strategies from January 1st until February 25th, 2004.

Table 4 shows the average number of rules per selection strategy as well as the average delay to dynamically select a recommender. Even for the two automatic approaches the number of rules is moderate (250 – 2000) and permitted very fast recommender selection of less than 20 ms on average on a Unix server (execution time for the query in Fig. 8).

Top-Rec: Automatic strategy of Section 5.2.1 (query-based)
Decision Tree: Automatic strategy of Section 5.2.2 (machine learning)
Manual 1: The most frequently viewed pages per content category are recommended. For returning users, this category is derived from previous sessions of the current user. For new users, the category of the current page is used.
Manual 2: For search engine users, the search engine recommender is applied. Otherwise the content similarity recommender (for course material pages) or the association rule recommender (for other pages) is selected.
Random: Random selection of a recommender

Table 3: Tested selection strategies

Strategy        Nr. of rules  Selection time  Acceptance rate  Session acceptance rate
Top-Rec         ~ 2000        ~ 14 ms         1.25%            9.58%
Decision Tree   ~ 250         ~ 12 ms         1.64%            12.54%
Manual 1        24            ~ 13 ms         0.96%            7.11%
Manual 2        5             ~ 13 ms         1.84%            12.47%
Random          137           ~ 19 ms         0.89%            6.51%

Table 4: Comparison of selection strategies

Table 4 also shows the average (page view) acceptance rates and session acceptance rates for the five selection strategies. The two automatic feedback-based strategies for recommender selection (Top-Rec, Decision Tree) showed significantly better average quality than random selection and the first manual policy. The machine learning approach (Decision Tree) and the Manual 2 policy were the best strategies. Note that the very effective strategy Manual 2 utilizes background knowledge about the website structure and typical user groups (students, researchers) as well as evaluation results obtained after an extensive manual OLAP analysis (partially presented in Section 4.2), such as the effectiveness of the search engine recommender. The fact that the completely automatic machine learning algorithm achieves comparable effectiveness is thus a very positive result. It indicates the feasibility of the automatic closed-loop optimization for generating recommendations and the high value of using feedback to significantly improve recommendation quality without manual effort.

The comparison of the two automatic strategies shows that the machine learning approach performs much better than the query-based top recommender approach. The decision tree approach uses significantly fewer rules and was able to order the context attributes according to their relevance. The most significant attributes appear in the upper part of the decision tree and therefore have a big influence on the selection process. On the other hand, Top-Rec handles all context attributes equally and uses many more rules. So recommender selection was frequently based on less relevant attributes, resulting in 25% poorer acceptance rates.

The warehouse infrastructure of AWESOME allows us to analyze the recommendation quality of selection strategies for many conditions, similar to the evaluation of individual recommenders (Section 4.2). Figure 9 shows the session acceptance rates of the best two selection strategies w.r.t. user type, referrer, and entry page type, i.e. the page type of the first session page view. We observe that the manual strategy is more effective for search engine users by always applying the SER recommender to them. This helped it to also get slightly better results for new users and sessions starting with an access to study material. On the other hand, the machine learning approach was significantly more effective for users not coming from search engines and for returning users. These results indicate that the automatically generated selection rules help generate good recommendations in many cases without the need for extensive manual evaluations, e.g. using OLAP tools. Still, the overall quality can likely be improved by adding a few manual rules (with high weight) to incorporate background knowledge. We will investigate such hybrid strategies in future work.

[Figure 9: SessionAcceptanceRate w.r.t. user type, referrer, and session entry page type. Bar chart (0% to 25%) comparing the Manual 2 and Decision Tree strategies for user types (new, returning), referrers (search engine, bookmark, other) and entry page types (navigation, study, others).]

6 Related work

An overview of previous recommendation systems and the applied techniques can be found in [JKR02], [KDA02] and [SCD+00]. [LSY03] describes the Amazon recommendation algorithms, which are primarily content (item)-based and also heavily use precomputation to achieve scalability to many users. [Bu02] surveys and classifies so-called hybrid recommendation systems which combine several recommenders. To improve hybrid recommendation systems, [SKR02] proposes to manually assign weights to recommenders to influence recommendations. [MN03] presents a hybrid recommendation system switching between different recommenders based on the current page's position within a website. The Yoda system [SC03] uses information on the current session of a user to dynamically select recommendations from several predefined recommendation lists. In contrast to AWESOME, these previous hybrid recommendation systems do not evaluate or use recommendation feedback.

[LK01] sketches a simple hybrid recommendation system using recommendation feedback to a limited extent. They measure which recommendations produced by three different recommenders are clicked to determine a weight per recommender (with a metric corresponding to our view rate). These weights are used to combine and rank recommendations from the individual recommenders. In contrast to AWESOME, negative recommendation feedback and the current context are not considered for recommender evaluation. Moreover, there is no automatic closed-loop adaptation, but the recommender weights are determined by an offline evaluation.

The evaluation of recommendation systems and the quantitative comparison of recommenders have received little attention so far. [KCR02] monitored users that were told to solve certain tasks on a website, e.g. to find specific information. By splitting users into two groups (with recommendations vs. without) the influence of the recommendation system is measured. Other studies [GH02], [HC02] asked users to explicitly rate the quality of recommendations. This approach obviously is labor-intensive and cannot be applied to compare many different recommenders.

[GH02] and [SKK+00] discuss several metrics for recommendation quality, in particular the use of the information retrieval metrics precision and recall. The studies determine recommendations based on an offline evaluation of web log or purchase data; the precision metric, for instance, indicates how many of the recommendations were reached within the same session (thus corresponding to our view rate). In contrast to our evaluation, these studies are not based on really presented recommendations and measured recommendation feedback, so that the predicted recommendation quality remains unverified.

In [HMA+02] a methodology is presented for evaluating two competing recommenders. It underlines the importance of such an online evaluation and discusses different evaluation aspects. Cosley et al. developed the REFEREE framework to compare different recommenders for the CiteSeer website [CLP02]. Click metrics (e.g., how often a user followed a link or downloaded a paper), which are similar to the acceptance rates used in our study, are used to measure recommendation quality.

As an alternative to the two-level approach for selecting recommendations, we recently started to investigate how to directly determine suitable recommendations without prior selection of recommenders [GR04]. The approach requires that different recommenders produce comparable weights for individual recommendations. Reinforcement learning approaches can be used to consider user feedback for individual recommendations. A comparative evaluation of this approach with the presented two-level scheme is subject to future work.

7 Summary

We presented AWESOME, a new data warehouse-based website evaluation and recommendation system. It allows the coordinated use of a large number of recommenders to automatically generate website recommendations. Recommendations are dynamically determined by a flexible rule-based approach selecting the most promising recommender for the respective context. AWESOME supports a completely automatic generation and optimization of selection rules to minimize website administration overhead and quickly adapt to changing situations. This optimization is based on a continuous measurement of user feedback on presented recommendations. To our knowledge, AWESOME is the first system enabling such a completely automatic closed-loop website optimization. The use of data warehouse technology and the precomputation of recommendations support scalability and fast web access times.

We presented a simple but general recommender classification. It distinguishes eight types of recommenders based on whether or not they consider input information on the current content, the current user and the user history. To evaluate the quality of recommendations and recommenders, we proposed the use of several acceptance rate metrics based on measured recommendation feedback. We used these metrics for a detailed comparative evaluation of different recommenders and different recommender selection strategies for a sample website. Our results so far indicate that the use of machine learning is most promising for an automatic feedback-based recommendation selection. The presented policy is able to automatically determine suitable training data so that its periodic re-execution to consider new feedback does not require human intervention.

In future work, we will adapt AWESOME to additional websites, in particular e-shops, to further verify and fine-tune the presented approach. We will investigate hybrid selection strategies where automatically generated selection rules are complemented by a limited number of manually specified rules utilizing site-specific optimization criteria or specific background knowledge. Finally, we will explore specific recommendation opportunities such as selecting the best recommender for product bundling (cross-selling).

Acknowledgements

We thank Nick Golovin and Robert Lokaiczyk for fruitful discussions and help with the implementation. The first author was funded by the German Research Foundation within the Graduiertenkolleg "Knowledge Representation".

References

[Bu02] Burke, R.: Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction 12(4), 2002
[CMS99] Cooley, R., Mobasher, B., Srivastava, J.: Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems 1(1), 1999
[CLP02] Cosley, D., Lawrence, S., Pennock, D. M.: REFEREE: An open framework for practical testing of recommender systems using ResearchIndex. Proc. 28th VLDB Conf., 2002
[GBL+95] Gray, J., Bosworth, A., Layman, A., Pirahesh, H.: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. Proc. 12th IEEE International Conference on Data Engineering (ICDE), 1995
[GH02] Geyer-Schulz, A., Hahsler, M.: Evaluation of Recommender Algorithms for an Internet Information Broker based on Simple Association Rules and on the Repeat-Buying Theory. Proc. ACM WebKDD Workshop, 2002
[GR04] Golovin, N., Rahm, E.: Reinforcement Learning Architecture for Web Recommendations. Proc. Int. Conf. on Information Technology (ITCC), 2004
[HC02] Heer, J., Chi, E. H.: Separating the Swarm: Categorization Methods for User Sessions on the Web. Proc. Conf. on Human Factors in Computing Systems, 2002
[HMA+02] Hayes, C., Massa, P., Avesani, P., Cunningham, P.: An on-line evaluation framework for recommender systems. Proc. Workshop on Personalization and Recommendation in E-Commerce, 2002
[JKR02] Jameson, A., Konstan, J., Riedl, J.: AI Techniques for Personalized Recommendation. Tutorial at the 18th National Conf. on Artificial Intelligence (AAAI), 2002
[KCR02] Kim, K., Carroll, J. M., Rosson, M. B.: An Empirical Study of Web Personalization Assistants: Supporting End-Users in Web Information Systems. Proc. IEEE 2002 Symp. on Human Centric Computing Languages and Environments, 2002
[KDA02] Koutri, M., Daskalaki, S., Avouris, N.: Adaptive Interaction with Web Sites: an Overview of Methods and Techniques. Proc. 4th Int. Workshop on Computer Science and Information Technologies (CSIT), 2002
[KM00] Kimball, R., Merz, R.: The Data Webhouse Toolkit – Building the Web-Enabled Data Warehouse. Wiley Computer Publishing, New York, 2000
[KMT00] Kushmerick, N., McKee, J., Toolan, F.: Toward zero-input personalization: Referrer-based page recommendation. Proc. Int. Conf. on Adaptive Hypermedia and Adaptive Web-based Systems, 2000
[LK01] Lim, M., Kim, J.: An Adaptive Recommendation System with a Coordinator Agent. Proc. 1st Asia-Pacific Conference on Web Intelligence: Research and Development, 2001
[LSY03] Linden, G., Smith, B., York, J.: Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Distributed Systems Online 4(1), 2003
[MDL+02] Mobasher, B., Dai, H., Luo, T., Nakagawa, M.: Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization. Data Mining and Knowledge Discovery, Kluwer, 6(1), 2002
[MN03] Mobasher, B., Nakagawa, M.: A Hybrid Web Personalization Model Based on Site Connectivity. Proc. ACM WebKDD Workshop, 2003
[SBF01] Shahabi, C., Banaei-Kashani, F., Faruque, J.: A Reliable, Efficient, and Scalable System for Web Usage Data Acquisition. Proc. ACM WebKDD Workshop, 2001
[SC03] Shahabi, C., Chen, Y.: An Adaptive Recommendation System without Explicit Acquisition of User Relevance Feedback. Distributed and Parallel Databases 14(2), 2003
[SCD+00] Srivastava, J., Cooley, R., Deshpande, M., Tan, P.-N.: Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations 1(2), 2000
[SKK+00] Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Analysis of recommendation algorithms for e-commerce. Proc. ACM E-Commerce, 2000
[SKR01] Schafer, J. B., Konstan, J. A., Riedl, J.: Electronic Commerce Recommender Applications. Journal of Data Mining and Knowledge Discovery 5(1/2), 2001
[SKR02] Schafer, J. B., Konstan, J. A., Riedl, J.: Meta-recommendation systems: user-controlled integration of diverse recommendations. Proc. 11th Int. Conf. on Information and Knowledge Management (CIKM), 2002
[SMB+03] Spiliopoulou, M., Mobasher, B., Berendt, B., Nakagawa, M.: A Framework for the Evaluation of Session Reconstruction Heuristics in Web Usage Analysis. INFORMS Journal on Computing, Special Issue on Mining Web-Based Data for E-Business Applications, 15(2), 2003
[TH01] Terveen, L., Hill, W.: Human-Computer Collaboration in Recommender Systems. In: Carroll, J. (ed.): Human Computer Interaction in the New Millennium. Addison-Wesley, New York, 2001
[TK00] Tan, P., Kumar, V.: Modeling of Web Robot Navigational Patterns. Proc. ACM WebKDD Workshop, 2000
[WF00] Witten, I. H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000
