A Data Warehouse-Based System for Adaptive Website Recommendations

Total Page:16

File Type:pdf, Size:1020Kb

A Data Warehouse-Based System for Adaptive Website Recommendations AWESOME – A Data Warehouse-based System for Adaptive Website Recommendations Andreas Thor Erhard Rahm University of Leipzig, Germany {thor, rahm}@informatik.uni-leipzig.de Abstract There are many ways to automatically generate rec- ommendations taking into account different types of in- Recommendations are crucial for the success of formation (e.g. product characteristics, user characteris- large websites. While there are many ways to de- tics, or buying history) and applying different statistical or termine recommendations, the relative quality of data mining approaches ([JKR02], [KDA02]). Sample these recommenders depends on many factors approaches include recommendations of top-selling prod- and is largely unknown. We propose a new clas- ucts (overall or per product category), new products, simi- sification of recommenders and comparatively lar products, products bought together by customers, evaluate their relative quality for a sample web- products viewed together in the same web session, or site. The evaluation is performed with products bought by similar customers. Obviously, the AWESOME (Adaptive website recommenda- relative utility of these recommendation approaches (rec- tions), a new data warehouse-based recommen- ommenders for short) depends on the website, its users dation system capturing and evaluating user and other factors so that there cannot be a single best ap- feedback on presented recommendations. More- proach. Website developers thus have to decide about over, we show how AWESOME performs an which approaches they should support and where and automatic and adaptive closed-loop website op- when they should be applied. Surprisingly, little informa- timization by dynamically selecting the most tion is available in the open literature on the relative qual- promising recommenders based on continuously ity of different recommenders. Hence, one focus of our measured recommendation feedback. We pro- work is an approach for comparative quantitative evalua- pose and evaluate several alternatives for dy- tions of different recommenders. namic recommender selection including a power- Advanced websites, such as Amazon [LSY03], sup- ful machine learning approach. port many recommenders but apparently are unable to select the most effective approach per user or product. 1 Introduction They overwhelm the user with many different types of recommendations leading to huge web pages and reduced Recommendations are crucial for the success of large web usability. While commercial websites often consider the sites to effectively guide users to relevant information. E- buying behaviour for generating recommendations, the commerce sites offering thousands of products cannot usage (navigation) behaviour on the website remains solely rely on standard navigation and search features but largely unexploited. We believe this a major shortcoming need to apply recommendations to help users quickly find since the navigation behaviour contains detailed informa- “interesting” products or services. With many users and tion on the users’ interests not reflected in the purchase products manual generation of recommendations is much data. Moreover, the web usage behaviour contains valu- too laborious and ineffective. Hence a key question be- able user feedback not only on products or other content comes how should recommendations be generated auto- but also on the presented recommendations. The utiliza- matically to optimally serve the users of a website. tion of this feedback to automatically and adaptively im- prove recommendation quality is a major goal of our Permission to copy without fee all or part of this material is granted work. provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the AWESOME (Adaptive website recommendations) is a publication and its date appear, and notice is given that copying is by new data warehouse-based website evaluation and rec- permission of the Very Large Data Base Endowment. To copy ommendation system under development at the University otherwise, or to republish, requires a fee and/or special permission from of Leipzig. It contains an extensible library of recommen- the Endowment Proceedings of the 30th VLDB Conference, der algorithms that can be comparatively evaluated for Toronto, Canada, 2004 real websites based on user feedback. Moreover, 384 Record web access and recommendation feedback Recommendations Rec. Filter Current context Application Server User history Recommender Selection Recom- Recommender Precomputed Selection Presented Library mender Recommendations Recommendations Web Usage Analysis Recommendation User history Content User Web Data Data feedback Data Warehouse User history & Recommender ETL process Recommendation feedback Evaluation Figure 1: AWESOME architecture AWESOME can perform an automatic closed-loop web- lated work is briefly reviewed in Section 6 before we con- site optimization by dynamically selecting the most prom- clude. ising recommenders for a website access. This selection is based on the continuously measured recommendation 2 Architecture quality of the different recommenders so that AWESOME automatically adapts to changing user interests and chang- 2.1 Overview ing content. To support high performance and scalability, quality characteristics of recommenders and recommenda- Fig. 1 illustrates the overall architecture of AWESOME tions are largely precomputed. AWESOME is fully op- which is closely integrated with the application server erational and in continuous use at a sample website; adop- running the website. AWESOME is invoked for every tion to further sites is in preparation. website access, specified by a so-called context including The main contributions of this paper are as follows: information from the current HTTP request such as URL, - Presentation of the AWESOME architecture for timestamp and user-related data. For such a context, warehouse-based recommender evaluation and for AWESOME dynamically generates a list of recommenda- scaleable adaptive website recommendations tions which are displayed by the application server to- - A new classification of recommenders for websites gether with the requested website content. Recommenda- supporting a comparison of different approaches. tions are automatically determined by a variety of algo- We show how sample approaches fit the classifica- rithms from an extensible recommender library. The rec- tion and propose a new recommender for users ommenders use information on the usage history of the coming from search engines. website and additional information maintained in a web - A comparative quantitative evaluation of several data warehouse. The recommendations are subject to a recommenders for a sample website. The consid- final filter step to avoid the presentation of unsuitable or ered recommenders cover a large part of our classi- irrelevant recommendations (e.g., recommendation of the fication’s design space. current page or the homepage). - Description and comparative evaluation of several Dynamic selection of recommendations is a two-step rule-based approaches for dynamic recommender process. For a given context, AWESOME first selects the selection. In particular, a machine learning ap- most appropriate recommender(s). This recommender proach for feedback-based recommender selection selection is controlled by a moderate number of selection is presented. rules. For evaluation purposes, we support several selec- In the next section we present the AWESOME archi- tion strategies for determining and adapting these rules, in tecture and the underlying data warehouse approach. We particular automatic approaches based on user feedback then outline our recommender classification and sample on previously presented recommendations. This recom- recommenders (Section 3). Section 4 contains the com- mendation feedback is also recorded in the web data parative evaluation of several recommenders for a non- warehouse. For the chosen recommender(s), the best rec- commercial website. In Section 5 we describe and evalu- ommendations for the current context are selected in the ate approaches for dynamic recommender selection. Re- second step. For performance reasons, these recommenda- 385 tions are precomputed (and periodically refreshed) and they have been followed. We thus decided to use tailored can thus quickly be looked up at runtime. application server logging to record this information. Ap- Separating the selection of recommenders and recom- plication server logging also enables us to apply effective mendations makes it easy to add new recommenders. approaches for session and user identification and early Moreover, using recommendation feedback at the level of elimination of crawler accesses, thus supporting high data recommenders is simpler and more stable than trying to quality. use this feedback for individual recommendations, e.g. The AWESOME extensions of the application server specific web pages or products. One problem with the are implemented by PHP programs and run together with latter approach is that individual pages/products are fre- standard web servers such as Apache. We use two log quently added and that there is no feedback available for files: a web usage and a recommendation log file with the such new content. Conversely, removing content would formats shown in Table 1. The recommendation log file result in a loss of the associated recommendation feed- records all presented recommendations and is required for back. our recommender
Recommended publications
  • Reorganisation of Adaptive Websites Using Web Usage Mining Techniques Dr
    International Journal of Computer & Organization Trends –Volume 4 Issue 3 May to June 2014 Reorganisation of Adaptive Websites using Web Usage Mining Techniques Dr. Ananthi Sheshasaayee#1, V.Vidyapriya#2 #1Head and Associate Professor, Department of Computer Science, Quaid-E-Millath Government College for Women (A), Chennai #2.Research Scholar, Department of Computer Science, Quaid-E-Millath Government College for Women (A), Chennai Abstract— Web Usage Mining is that area of Web Mining In practice, the three Web mining tasks above could which deals with the extraction of interesting knowledge be used in isolation or combined in an application, from logging information produced by Web servers. The especially in Web content and structure mining since motive of mining is to find users’ access models the Web documents might also contain links. Web automatically and quickly from the vast Web log data, such content mining is the process of extracting knowledge as frequent access paths, frequent access page groups and user clustering. Through web usage mining, the server log, from the content of documents or their descriptions. registration information and other relative information left Web document text mining, resource discovery based by user access can be mined with the user access mode on concepts indexing or agent; based technology may which will provide foundation for decision making of also fall in this category. Web structure mining is the organizations. Adaptive web sites are web sites that process of inferring knowledge from the World Wide automatically improve their organization and presentation Web organization and links between references and by learning from their user access patterns.
    [Show full text]
  • Intelligent Web Caching Using Adaptive Regression Trees, Splines, Random Forests and Tree Net
    Intelligent Web Caching Using Adaptive Regression Trees, Splines, Random Forests and Tree Net Sarina Sulaiman, Siti Mariyam Shamsuddin Ajith Abraham Soft Computing Research Group Machine Intelligence Research Labs (MIR Labs) Faculty of Computer Science and Information Systems Washington 98092, USA Universiti Teknologi Malaysia, Johor, Malaysia http://www.mirlabs.org [email protected], [email protected] [email protected] Abstract— Web caching is a technology for improving network sometimes httpd accelerator, to replicate the fact that it caches traffic on the internet. It is a temporary storage of Web objects objects for many clients but normally from one server [1]. This (such as HTML documents) for later retrieval. There are three paper investigates the performance of Classification and significant advantages to Web caching; reduced bandwidth Regression Trees (CART), Multivariate Adaptive Regression consumption, reduced server load, and reduced latency. These Splines (MARS), Random Forest (RF) and TreeNet (TN) for rewards have made the Web less expensive with better classification in Web caching. performance. The aim of this research is to introduce advanced machine learning approaches for Web caching to decide either to The rest of the paper is organised as follows: Section 2 cache or not to the cache server, which could be modelled as a describes the related works, followed by introduction about the classification problem. The challenges include identifying attributes machine learning approaches in Section 3. Section 4 on ranking and significant improvements in the classification experimental setup and Section 5 illustrates the performance accuracy. Four methods are employed in this research; evaluation of the proposed approaches. Section 6 discusses the Classification and Regression Trees (CART), Multivariate Adaptive result from the experiment and finally, Section 7 concludes the Regression Splines (MARS), Random Forest (RF) and TreeNet article.
    [Show full text]
  • A Study of Design Aspects of Web Personalization for Online Users in India
    A Study of Design Aspects of Web Personalization for Online Users in India A Thesis submitted to Gujarat Technological University for the Award of Doctor of Philosophy in Management By Darshana Desai Enrollment No: 129990992004 Under supervision of Prof. Dr. Satendra Kumar GUJARAT TECHNOLOGICAL UNIVERSITY AHMEDABAD August - 2017 A Study of Design Aspects of Web Personalization for Online Users in India A Thesis submitted to Gujarat Technological University for the Award of Doctor of Philosophy in Management By Darshana Desai Enrollment No: 129990992004 under supervision of Prof. Dr. Satendra Kumar GUJARAT TECHNOLOGICAL UNIVERSITY AHMEDABAD August – 2017 © DARSHANA DESAI DECLARATION I declare that the thesis entitled A study of design aspects of web personalization for online users in India submitted by me for the degree of Doctor of Philosophy is the record of research work carried out by me during the period from October 2012 to December 2016 under the supervision of Prof. Dr. Satendra Kumar Professor and Research Head at C. K Shah Vijapurwala Institute of Management & Research(CKSVIM), Vadodara, Gujarat and this has not formed the basis for the award of any degree, diploma, associateship, fellowship, titles in this or any other University or other institution of higher learning. I further declare that the material obtained from other sources has been duly acknowledged in the thesis. I shall be solely responsible for any plagiarism or other irregularities, if noticed in the thesis. Signature of the Research Scholar: ………………………….. Date: …………………. Name of Research Scholar: Darshana Desai Place: Vadodara iv. CERTIFICATE I certify that the work incorporated in the thesis entitled A study of design aspects of web personalization for online users in India submitted by Shri / Smt.
    [Show full text]
  • Applications of Machine Learning in Software Engineering
    APPLICATIONS OF MACHINE LEARNING IN SOFTWARE ENGINEERING Manthan Patel1, Vivek Kumar Prasad2 1Student, Department of Computer Engineering, Nirma University, (India) 2Professor, Department of Computer Engineering, Nirma University, (India) ABSTRACT Machine Learning, the branch of Artificial Intelligence, is an ability for machine to learn, unlearn and relearn things from human activities as well as environment and gives more relevant result or drives towards it. It is more feasible as well as cost effective because it nullifies the manual efforts. It became a well-known topic due to its usages and therefore many people are trying to learn it. This paper mainly focuses on a brief about the different types of applications in software engineering and challenges for development. Keywords: Applications, Challenges, Machine learning I. INTRODUCTION The goal of learning for anything or anyone is to solve the problem from previous records or experiences. Learning for machines is nothing but to become more familiar with the thing in such a way that can help in many ways like weather prediction, recommendation based on the taste, deciding route in traffic, diagnosing samples with most accurate output, etc. Mainly a machine learning does the statistical analysis of data and extract and use most relevant data for solving the current situation or for prediction of future events. If we consider an example of data analysis from a warehouse which produce goods, the machine will take raw data as input and manipulate them such that the data will help the owner to understand as well as predict about profit making products among them so that the overall lose can be reduced.
    [Show full text]
  • Maquetación 1
    DIGIBÍS ® Newsletter. No. 19. January-June, 2018 Information about enriched digitization, software for Libraries, Archives and Museums and international standards . Archivo Carmen Martín Gaite Con el módulo de descripciones archivísticas de DIGIBIB Adapted to all devices The Municipal Archive of Castellón SUMMARY Editorial 3 INTERNATIONAL DIGICLIC DIGIBÍ S ® Newsletter Europeana CEO Tachi Hernando de Larramendi 3rd International EuropeanaTech Conference 4 Project Manager Europeana Business Plan 2018: Xavier Agenjo Bullón CFO democratizing culture 6 Nuria Ruano Penas IT Manager Jesús L. Domínguez Muriel Art Director EVENTS Antonio Otiñano Martínez Sales Manager Courses Javier Mas García Technology Coordinator Course-workshop of the Ignacio Francisca Hernández Carrascal Larramendi Foundation in the AEF 7 Administration Department María Luz Ruiz Rodríguez (coord.) José María Alcega Barroeta IT Department Feli Matarranz de Antonio (coord.) GLAM APPLICATIONS Alejandra Arri Pacheco Andrés Felipe Botero Zapata Julio Diago García Virtual Libraries Carlos Henche Hernández Luis Panadero Guardeño Linked data in the Fernando Román Ortega Innovation Department Virtual Library of Castilla y León 8 Paulo César Juanes Hernández (coord.) Noemí Barbero Urbano Virtual archive of the María Isabel Campillejo Suárez Susana Hernández Rubio municipality of Castellón 10 Montserrat Martínez Guerra Digitization Department The Polymath Virtual Library Francisco Viso Parra (coord.) María José Escuté Serrano present in Wikidata 12 Amando Martínez Catalán Javier Ramos
    [Show full text]