UNIVERSITY OF CINCINNATI

Date:______

I, ______, hereby submit this work as part of the requirements for the degree of: ______ in: ______

It is entitled:

This work and its defense approved by:

Chair: ______

A Concept-Based Framework and Algorithms For Recommender Systems

by

Shriram Narayanaswamy

Bachelor of Engineering (Hons.), Electronics and Instrumentation

Birla Institute of Technology and Science (BITS), Pilani

A Master's thesis submitted to the faculty of

University of Cincinnati

in partial fulfillment of the requirements for the degree of

Master of Science

Department of Computer Science

University of Cincinnati

June 2007

ABSTRACT

In today's consumer-driven world, people are faced with the problem of plenty. Choices abound everywhere, be it in movies, books or music. Recommender systems spare the user the frustration of searching for the proverbial needle in the haystack by offering recommendations based on a user's personal preferences. In this thesis, a generic framework and algorithms for building concept-based recommender systems are presented. A concept-based approach leverages the deep structure of a ratings database and reveals complex, higher-level inter-relationships between entities in the data. The algorithms encode user preferences from a ratings database into concepts using collaborative filtering, organize concepts into lattices efficiently, and enable fast querying of the lattices for recommendations. We apply our algorithms to two real-world datasets and demonstrate their capability to generate quality recommendations in real time.


ACKNOWLEDGMENTS

I would like to express my heartfelt gratitude to Dr. Raj Bhatnagar, my advisor, for being a constant source of motivation and for guiding me through this thesis. I would also like to thank Dr. Ali Minai and Dr. Carla Purdy for their valuable suggestions and feedback. Thanks are due to my friends for all their encouragement. And finally, this work is dedicated to my parents, Ms. Revathi and Mr. Narayanaswamy; I can only begin to be grateful for their unconditional love, support and everything else...

Contents

Table of Contents v

List of Figures viii

List of Tables x

1 Introduction 1

1.1 Recommender Systems ...... 1

1.1.1 Why Recommender Systems? ...... 2

1.1.2 How to make recommendations? ...... 3

1.2 Concept-based approach ...... 4

1.3 Problem Statement ...... 5

1.4 Contributions ...... 6

1.5 Approach to the problem ...... 7

2 Related Research 10

2.1 Information Filtering ...... 10

2.2 Collaborative Filtering ...... 12

2.3 Concept Based Information Retrieval ...... 13

3 Framework & Algorithms 14

3.1 The Framework ...... 14

3.2 Components of the Framework ...... 16

3.3 Concept Generation ...... 18

3.4 Algorithm for Concept Generation ...... 18

3.4.1 Description of the algorithm ...... 20

3.4.2 Illustration of the algorithm ...... 22

3.4.3 Complexity Analysis ...... 24

3.5 Concept Partition ...... 25

3.5.1 Algorithm for Concept Partition ...... 26

3.5.2 Description of the algorithm ...... 27

3.5.3 Illustration of the algorithm ...... 28

3.5.4 Complexity Analysis of the algorithm ...... 30

3.6 Lattice Generation ...... 31

3.6.1 Algorithm for lattice generation ...... 31

3.6.2 Description of the algorithm ...... 34

3.6.3 Illustration of the algorithm ...... 35

3.6.4 Correctness Proof of the algorithm ...... 38

3.7 Lattice Querying ...... 41

3.7.1 Algorithm for Lattice Querying ...... 42

3.7.2 Description of the algorithm ...... 43

3.7.3 Illustration of the algorithm ...... 44

4 A Joke Recommendation GUI 46

4.1 Components of the UI ...... 46

5 Case Study - Jester and MovieLens Datasets 52

5.1 Processing the Jester Dataset ...... 53

5.2 Processing the MovieLens Dataset ...... 54

5.3 Concept Generation - Identifying repeatedly rated jokes/movies . 54

5.3.1 Exploding space requirements in subspace clustering of SCuBA 55

5.3.2 Results - Basic pair-wise comparisons Vs Optimized algorithm 56

5.4 Timing the Model Building Algorithms ...... 59

5.5 Space Requirement for the Model Building algorithms ...... 62

5.6 Real-time performance of Algorithm: Lattice Discover ...... 64

5.7 An Improved Performance ...... 67

6 Conclusion and Future Direction 69

6.1 What’s good in a concept based approach? ...... 69

6.2 Future Direction and Caveats ...... 70

Bibliography 72

List of Figures

1.1 A Sample Concept Lattice ...... 5

3.1 A framework for concept-based recommender systems ...... 16

3.2 A sparse dataset and its associated lattice ...... 26

3.3 Lattice after combining levels 1 and 2 ...... 36

3.4 Lattice after combining levels 1, 2 and 3 ...... 37

3.5 Lattice after combining all levels from 1 through 5 ...... 39

3.6 Example of a typical lattice to search for making recommendations 44

4.1 Main Screen of the GUI Applet for Joke recommendation . . . . . 48

4.2 GUI for Joke Recommendation: User has already rated the following jokes: Joke #2, #5 and #6 and is currently viewing joke #7 ...... 49

4.3 GUI for Joke Recommendation: User is viewing recommendations. 50

4.4 The Clear All Button clears all ratings and recommendations and repopulates the Joke Database ...... 51

5.1 Naive Vs Optimized Concept Generation - Jester ...... 57

5.2 Naive Concept Generation - MovieLens ...... 57

5.3 Optimized Concept Generation - MovieLens ...... 58

5.4 Percentage Pruning: Naive Vs Optimized Concept Generation - Jester dataset ...... 58

5.5 Percentage Pruning: Naive Vs Optimized Concept Generation - MovieLens dataset ...... 59

5.6 Jester dataset: Time taken to generate concepts ...... 60

5.7 MovieLens dataset: Time taken to generate concepts ...... 60

5.8 Jester dataset: Time taken to generate lattices from concepts . . . 61

5.9 MovieLens dataset: Time taken for concept generation, partition and lattice building ...... 61

5.10 Jester dataset: Space required to generate concepts ...... 63

5.11 Movielens dataset: Space required to generate concepts ...... 63

5.12 Jester dataset: Space required to generate lattices from concepts . 64

5.13 Jester dataset: Precision Measurement ...... 66

5.14 Jester dataset: Recall Measurement ...... 66

5.15 Jester dataset: Query processing time ...... 67

5.16 MovieLens dataset: Precision Measurement ...... 68

List of Tables

3.1 A typical ratings database ...... 15

3.2 Concept Generation: Initial database ...... 22

3.3 Concept Generation: currentTuple = {1 2 3 4}. Iteration 1 . . . . 22

3.4 Concept Generation: currentTuple = {1 2 4}. Iteration 2 . . . . . 23

3.5 Concept Generation: currentTuple = {2 3 4}. Iteration 3 . . . . . 23

3.6 Concept Generation: currentTuple = {5 6 7}. Iteration 4 . . . . . 23

3.7 Concept Generation: currentTuple = {2 3}. Iteration 5 ...... 23

3.8 Concept Generation: currentTuple = {5 6}. Iteration 6 ...... 24

3.9 Concept Partition: After Sorting ...... 28

3.10 Concept Partition: currentConcept = {1 2 4}. Visited[1,2,4] = 0. Iteration 1. conceptSet[1] = (1 2 4) (2 4) (2) ...... 29

3.11 Concept Partition: currentConcept = {2 3 4}. Visited[2 3 4] = 0. Iteration 2. conceptSet[2] = (2 3 4) (2 3) (2 4) (2) ...... 29

3.12 Concept Partition: currentConcept = {5 6}. Visited[5 6] = 0. Iteration 5. conceptSet[3] = (5 6) (6) ...... 29

3.13 Concept Partition: currentConcept = {6 7}. Visited[6 7] = 0. Iteration 6. conceptSet[3] = (6 7) (6) ...... 30

3.14 After partitioning ...... 30

3.15 Lattice Generation: After applying splitByLength() ...... 36

Chapter 1

Introduction

1.1 Recommender Systems

In today's consumer-driven world, people are faced with the problem of plenty. Choices abound everywhere, be it in movies, books or music. Consumers are forced to sift through a number of choices before they discover what they need. This is often time-consuming and frustrating. Given today's fast-paced lifestyle, a slow and painstaking search for that elusive item of choice is surely not a sustainable option. People would rather look at items that are customized to their interests and preferences. The bottom line is that customers want personalization. But how do we achieve personalization? A good approach is to listen to people whose interests match ours and try the things they recommend [17]. Our work adopts this very spirit to make recommendations to users.

A recommender system is a platform for providing recommendations to users based on their personal likes and dislikes. The following definition from Wikipedia [1] describes a recommender system concisely and accurately.

Recommender systems are a specific type of information filtering (IF) technique that attempt to present to the user information items (movies, music, books, news, web pages) the user is interested in. To do this the user's profile is compared to some reference characteristics. These characteristics may be from the information item (the content-based approach) or the user's social environment (the collaborative filtering approach).

Let us try and understand the important elements of this definition. A recommender system is described as an information filtering technique that uses reference characteristics to group users together. In the following two sections, we explain how a recommender system functions as an information filtering technique and how reference characteristics can be used to make recommendations. We then briefly study concept-based knowledge structures called lattices. Section 1.3, titled Problem Statement, gives a precise definition of the problem we are attempting to solve. We then describe our expectations from this work and our contributions. Section 1.5 explains how recommender systems can be built through a concept-based framework. We conclude the chapter with a brief outline of the rest of the thesis.

1.1.1 Why Recommender Systems?

With the advent of the web, people have been engulfed in a sea of knowledge. Unfortunately, searching through this vast space of knowledge is time-consuming and frustrating. However, this does not diminish the utility of the web as a significant information source. Hence, massive efforts have been undertaken to present a user with the most pertinent information related to his/her search, as fast as possible. Search engines are a shining example of such endeavors. Search engines recommend web pages to users based on the information they seek. In today's world, we are flooded by choices in a wide variety of things, not just webpages. There are hundreds of thousands of movies, books, news articles and songs to choose from. Depending on our likes and dislikes, much of this is unwanted, irrelevant or redundant. Recommender systems help users wade through a complex sea of choices and show them items they may like. In this sense, recommender systems function as filters, filters that show us only what we desire! Examples of popular recommender systems are Pandora [2], MovieLens [3] and Reader's Robot [4].

1.1.2 How to make recommendations?

The fundamental problem in information filtering is computing whether a given item is likely to interest a user or not. The outcome of such a computation is either boolean, a crisp yes or no, or a score that represents the degree to which a person may like that item. Such a score helps determine whether an item should be recommended to a user. Quantifying a user's taste can be achieved through reference characteristics. Reference characteristics, such as an item's utility and its aesthetic value, are reasons why an item may appeal to a user. In machine learning, these reference characteristics are often referred to as features.

Based on the nature of the reference characteristics, two broad categories of information filtering exist, namely content-based filtering and collaborative filtering. The ideology of content-based filtering is that the content of an entity determines whether a given user likes it or not. Collaborative filtering, on the other hand, is a less descriptive approach. It does not strive to look at why someone likes a certain item. Since people who rate items in common share some underlying commonality, collaborative filtering groups these people together. Each group then consists of users and items that appeal to like-minded people, and a new user who joins a group is recommended items from its list.
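The grouping idea above can be sketched in a few lines of Python. All user names, item identifiers and the overlap threshold below are hypothetical, and real collaborative filtering systems use graded ratings and similarity measures rather than simple set overlap:

```python
# Sketch of the collaborative-filtering idea: users who rate enough items
# in common form a group, and the group's other items become candidate
# recommendations. All data below is hypothetical.

ratings = {
    "alice": {"item1", "item2", "item3"},
    "bob":   {"item1", "item2", "item4"},
    "carol": {"item5", "item6"},
}

def recommend(new_user_items, ratings, min_overlap=2):
    """Recommend items rated by users who share at least min_overlap
    items with the new user, excluding items the user already rated."""
    recs = set()
    for items in ratings.values():
        if len(items & new_user_items) >= min_overlap:
            recs |= items - new_user_items
    return recs

print(sorted(recommend({"item1", "item2"}, ratings)))  # ['item3', 'item4']
```

Here a new user who rated item1 and item2 matches the first two users and inherits their remaining items.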

1.2 Concept-based approach

A lattice is a structure that provides a natural way to formalize and study the ordering of objects. The ability to order items based on certain constraints is crucial to a recommender system, as it helps evaluate items and choose one recommendation over another. These constraints are usually dependent on the specificity of a user's query and his/her need for detail in the response. A lattice structure orders sets of items based on the level of detail (or granularity) of each concept; a concept with many items is highly specialized, while a concept with fewer items is comparatively more general. Such an ordering of items can guide the search for a recommendation appropriate to a user's needs.

Formally, a partially ordered set V is a lattice L = (V, ≤) when, for any two elements x and y in V, the supremum x ∨ y and the infimum x ∧ y always exist [24]. The infimum of two elements is the greatest common lower bound, while the supremum is the least common upper bound. A concept is a pair (O, A), where O is the set of objects that possess the properties represented by the attributes in set A, and A is the set of attributes that are possessed by the objects in set O. O is called the extent of the concept and A is called the intent of the concept [24]. A concept lattice is a lattice of concepts; each element of the lattice is a concept, and any two concepts are related based on their extent and intent. Since ordering is based on sets of attributes, concepts that express more attributes are considered specialized while concepts that contain fewer attributes are general. Based on this definition, the bottom-most node in the lattice is the most specialized concept while the top-most node is the most general concept. As we go from top to bottom, concepts show increasing specialization, while a bottom-up traversal shows increasing generalization. In recommender systems, the users are mapped as objects and the items they rate are mapped as attributes.

Figure 1.1: A Sample Concept Lattice

Let us consider a movie recommender system that adopts this approach. The concept lattice shown in Figure 1.1 contains a partial list of movies starring Tom Hanks, ordered in a lattice based on the genre of each movie. Each movie is an attribute, and a list of movies is referred to as a pattern. A concept is made up of a pattern, i.e., only attributes and no objects. {{1,2,3,4,5,6}, φ} is the most general concept in the lattice since it contains no attributes (genres, in this case). The concepts {{6}, {C,A,D}} and {{1,2,4}, {C,R,D}} are the most specialized since they belong to 3 genres. In this lattice, {{1,2,3,4,6}, {C}} is a parent of {{1,2,3,4}, {C,R}} since {1,2,3,4,6} ⊃ {1,2,3,4} and {C} ⊂ {C,R}. Similarly, {{6}, {C,A,D}} is a child of {{1,2,4,6}, {C,R}} since {6} ⊂ {1,2,4,6} and {C,A,D} ⊃ {C,R}.
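The parent/child relation in this example can be checked mechanically. The following sketch encodes a concept as an (extent, intent) pair of sets; the helper name `is_ancestor` is illustrative, and the set values are taken from the Figure 1.1 example above:

```python
# A concept as an (extent, intent) pair. Per the definition above, a parent
# has a strictly larger extent (more objects) and a strictly smaller
# intent (fewer attributes).

def is_ancestor(p, c):
    """True if concept p is above concept c in the lattice."""
    extent_p, intent_p = p
    extent_c, intent_c = c
    # frozenset ">" / "<" test proper superset / proper subset
    return extent_p > extent_c and intent_p < intent_c

parent = (frozenset({1, 2, 3, 4, 6}), frozenset({"C"}))
child  = (frozenset({1, 2, 3, 4}),    frozenset({"C", "R"}))

assert is_ancestor(parent, child)
assert not is_ancestor(child, parent)
```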

1.3 Problem Statement

Given databases of users and items, a ratings database consists of pairs of users and the items rated by them, along with additional information such as timestamps. Given such a ratings database and a set of preliminary ratings by a new user, the basic requirement of a recommendation system is to offer the largest and most pertinent set of recommendations to the new user. The largest set of recommendations refers to the maximum number of items in the items database that could be recommended to the user. The pertinent items are the ones that the new user is most likely to rate in the future.

1.4 Contributions

Little or no attention has been paid to the deep structure inherent in a ratings database and to how its organization may be exploited to enhance the quality of recommendations. By using a lattice to order items, we can evaluate candidate recommendations and choose one over another depending on the granularity of a user's query. For example, a user with very few ratings conveys little about his/her preferences and is given generalized recommendations, as opposed to a user with many ratings and thus a more refined set of preferences. In typical recommender systems, this may not be possible because the specificity of a user's query is often neglected while making recommendations.

One of the distinguishing capabilities of this approach is the extraction of latent higher level knowledge. Exploring pathways in a lattice of movies, for example, could reveal a structure of abstract ontological categories, such as movie genres, and interrelationships among genres. Such complex inter-relationships can be easily observed by adopting a concept-lattice based knowledge representation scheme. Such higher level patterns can then be coupled with other dimensions such as age, gender or ethnicity to further discern trends in user preferences.

Additionally, a hierarchical structure of concepts enables a quantitative evaluation of the usefulness of an item to a user. Finally, traversing up and down the lattice ensures that we are searching for items based on a common thread/theme, and choosing one item over another does not violate this theme. This is in contrast to cluster-based models, wherein sets of items corresponding to varied themes may be aggregated due to partial overlap of items; thus, choosing one set of items over another does not guarantee similar properties. Typical model-based approaches do not use hierarchical structures and may not be able to achieve the capabilities of our approach.

Although concept lattice-based information retrieval (IR) schemes have been proposed earlier [5], they rely on annotations using keywords to index items. We cannot adopt these techniques because collaborative filtering avoids using features/descriptions for each item in a database. Also, the existence of ratings databases in different realms of life creates a need for a generic approach that can operate in a domain-independent manner. We strive to address these inadequacies in our two-fold contribution. First, we present a unified approach to concept-based knowledge discovery using a generic and flexible four-component framework to generate recommendations from a ratings database. Second, we present algorithms that can efficiently convert user preferences into concepts, organize them into lattices, and then query the lattices for recommendations. We then present results of applying our algorithms to two different real-world datasets and demonstrate improvement compared to other approaches.

1.5 Approach to the problem

We adopt a collaborative filtering approach in this work. It rests on the following assumption: people have an underlying reason for liking an item. If a number of people like a certain item, then they have matching interests at least with respect to that item. If a certain group of people like a number of items in common, they have some underlying generalized preferences in common that are manifested through their choice of these items. Now, given a new user's choice of items, we identify groups of users whose ratings match our user, and then recommend to the new user other items that each group has reviewed/rated/enjoyed/liked.

In order to implement such a system, we adopt a concept-based framework. Entities in the data are represented as concepts, while their characteristics are represented as attributes of these concepts. The concepts are then organized into a lattice of concepts where parent-child relationships exist between any two concepts. In this work, each concept is a repeated pattern of rated items from a ratings database. A pattern consists of a set of items, each of which is an attribute of that pattern.

Consider a movie recommender system that adopts this approach. Now, each concept in the lattice is a pattern consisting of a list of movies that have been rated by a significant number of users. Each attribute of a concept is a movie from the pattern contained in the concept. For example, a concept C may consist of the following pattern (a list of famous World War II movies):

(Tora Tora Tora, Pearl Harbor, Enemy at the Gates, Saving Private Ryan, The Thin Red Line, U-571, The Dirty Dozen)

Each movie is an attribute, while the entire list of movies is referred to as a pattern. A concept is made up of a pattern, i.e., only attributes and no objects. In this setup, a concept c1 is a parent of c2 if it contains a subset of the items present in c2. Similarly, a concept c2 is a child of c1 if it contains a superset of the items present in c1. Once such a lattice is constructed with all parent-child relationships, a new user's ratings are looked up in the lattice and the closest matching concept (pattern) is found. Recommendations consist of those items that are in the pattern and not already rated by the new user.
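A simplified sketch of this lookup, under the assumption that concepts are plain sets of titles and that "closest matching" just means largest overlap; the thesis's optimized algorithm instead walks the lattice structure, and `query` and the patterns below are illustrative only:

```python
# Find the pattern that best overlaps a user's rated items and recommend
# its unrated items. A linear scan stands in for the real lattice search.

def query(patterns, user_rated):
    best = max(patterns, key=lambda p: len(p & user_rated))
    return best - user_rated  # items of the matched pattern not yet rated

c1 = {"Tora Tora Tora", "Pearl Harbor", "Saving Private Ryan"}
c2 = {"Tora Tora Tora", "Saving Private Ryan"}

recs = query([c1, c2], {"Tora Tora Tora", "Saving Private Ryan"})
assert recs == {"Pearl Harbor"}
```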

For example, a concept C1 consisting of (Tora Tora Tora, Pearl Harbor, Enemy at the Gates, Saving Private Ryan, The Thin Red Line, U-571, The Dirty Dozen) is a child of concept C2 consisting of (Tora Tora Tora, Saving Private Ryan). Clearly, C2 is a parent of C1.

In the following chapter, we explore prior work in the area of recommender systems and the application of concept-based approaches to this problem. In Chapter 3, we describe the concept-based framework for recommender systems. We also study a set of algorithms that may be used by the components of the framework. We look at an applet-based Graphical User Interface for recommendation in Chapter 4. Chapter 5 presents the results of applying the proposed framework and algorithms on the Jester and MovieLens datasets. Finally, Chapter 6 offers conclusions and future directions for research in this area.

Chapter 2

Related Research

2.1 Information Filtering

Malone et al. [7] describe three different categories of information filtering, namely content-based, social and economic, used to predict a user's response to an article. As described earlier, content-based filtering is concerned with filtering information based on its content. All keyword-based search algorithms employ this approach. For example, early search engines used string search algorithms on documents annotated with a set of keywords, and any search query would look up this keyword dictionary to determine whether a document was relevant to the search or not. The basic keyword search was later augmented with complex functions such as weight vectors to filter out irrelevant documents [8][9].

Social filtering uses people's subjective judgment to identify interesting content or filter out objectionable content. Also known as collaborative filtering, it groups users and/or items based on the reasoning that people with similar preferences like similar items [17]. For instance, successful e-commerce websites such as Amazon.com recommend additional items to customers using the past history of similar customers. People in a group share similar preferences, and if a new member joins the group, his/her interests are extrapolated from the interests of the group. Collaborative filtering does not strictly require a group to define preferences, since a moderator can be nominated and the group's preferences defined based on the moderator's preferences.

An early system that implemented both content and collaborative filtering was Tapestry [10]. Tapestry was developed at Xerox PARC and was intended for mail filtering. The effort was to augment an existing content-based system with the power of collaborative filtering. Emails at Xerox PARC were being filtered based on their content, but the addition of people's reactions to reading attachments helped determine whether people found an attachment useful or not. All system users were set up with mail filters, and a user's filter could access others' filters. Filtering could now be done based on the content of an incoming document as well as on whether other receivers found it useful. Another application used collaborative filtering for Usenet News filtering [18].

GroupLens [11] improves on the work in Tapestry by adding two important components. They use a client-server model to support evaluations from multiple sites. They also support aggregate evaluations, wherein past correlations to recommendations are considered for new recommendations.

In the rest of this chapter, we shall examine approaches to collaborative filtering in particular.

2.2 Collaborative Filtering

Breese et al. [12] take a comprehensive look at some of the early measures used to determine the closeness of two users based on the items they rate. They subdivide collaborative filtering methods into memory-based methods and model-based methods. In a memory-based approach, recommendations are obtained by aggregating the ratings of similar users. A number of similarity measures have been proposed to group similar users. Popular measures include cosine-based similarity [28], the Pearson correlation coefficient [27] and, later, extensions such as default voting and case amplification [12]. In model-based approaches, many probabilistic [25][26] and clustering [29] techniques have been employed to represent user preferences.
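As an illustration of the memory-based measures mentioned above, a minimal cosine similarity between two users' rating vectors over co-rated items; the vectors are hypothetical, and Pearson correlation would additionally mean-center each user's ratings:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(round(cosine([5, 3, 4], [5, 3, 4]), 6))  # identical ratings -> 1.0
```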

Pennock et al. [13] describe a hybrid of model- and memory-based methods in which a personality diagnosis measure is used to probabilistically determine the personality type of a user with respect to other users, and also whether they will like a recommendation or not. Our approach is similar in spirit to Pennock and others in the sense of determining closeness to a given user. However, we differ in that we do not explicitly employ a probabilistic measure but use a lattice-based knowledge representation scheme. An interesting approach called Horting was proposed by Wolf and others [20], who use a graph-theoretic approach to collaborative filtering. This approach offers the unique advantage of finding transitive relationships between ratings by traversing the graph.

Sarwar and others look at singular value decomposition (SVD) as a means to reduce the size of the ratings database. Clearly, the number of users is growing at a tremendous rate, and scalability of the algorithms is a vital factor. The authors report that for extremely sparse datasets, the performance of basic collaborative filtering approaches is far superior; however, for denser datasets, SVD provides a scalable alternative [14]. Maltz and Ehrlich study the possibility of active information filtering, wherein users proactively send pointers to others when they find interesting articles, movies, etc. [21].

In more recent work, Herlocker and others have undertaken an in-depth survey of the evaluation schemes for recommender systems [19]. Efforts have also been made to design interfaces for users who only occasionally connect to recommender systems. The scenario discussed is that of a user wondering what movie to rent for the night at a video store. With no recommender system close at hand, should the user return empty-handed? The authors explore the possibility of a recommender system on a PDA [15].

2.3 Concept Based Information Retrieval

Much of the work in concept-based information retrieval can be found in the natural language processing realm. Priss uses concept-based lattices for document retrieval [5] [30] [31] [32], adopting a facet-driven approach wherein domain knowledge is used to encode conceptual relationships between objects in the domain. For instance, in document retrieval, each document is represented by a set of keywords or index terms, and queries are matched against these keywords to identify relevant documents. We cannot adopt these techniques because collaborative filtering avoids using keywords/descriptions for each item in a database.

Chapter 3

Framework & Algorithms

In this chapter, we describe the framework for a formal-concept-based recommendation system. As we observed in earlier chapters, formal concept analysis involves representing information in the form of lattices and using the hierarchical structure of lattices to discover knowledge from data. The proposed framework provides a means to process raw data into concepts, convert them into knowledge lattices, and query the lattices for recommendations. We also propose algorithms that have been optimized to achieve these tasks.

3.1 The Framework

In recommendation systems, raw data is usually in the form of user ratings. A user, in this context, is one who rates items of his/her choice from an item database. These items are typically movies, music, articles, research papers, consumer electronic goods, software, among others. A rating is usually some ordinal quantity. For example, users may assign a numerical score or a letter grade (having some defined ordering) to the items in the database. As an example, consider a movie database such as IMDb. A user is a typical movie watcher who wishes to rate some of the movies he/she has viewed in the past. They may give each movie a rating between 1 and 5; 1 for poor and 5 for excellent. Each user may rate multiple movies, and any movie may be rated by multiple users. Thus, there is an inherent many-to-many relationship between items and users. Without loss of generality, we assume a format such as the one in Table 3.1.

Table 3.1: A typical ratings database

User-Id  Item-Id  Score
1        5        3
1        6        2
1        7        1
2        5        5
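In code, a database in the Table 3.1 format might be held as (user, item, score) tuples and grouped per user; a sketch using the table's four rows (the representation is illustrative, not the thesis's data structure):

```python
from collections import defaultdict

# Rows of Table 3.1 as (User-Id, Item-Id, Score) tuples.
rows = [(1, 5, 3), (1, 6, 2), (1, 7, 1), (2, 5, 5)]

by_user = defaultdict(dict)
for user, item, score in rows:
    # Many-to-many relation: a user may rate many items,
    # and an item may be rated by many users.
    by_user[user][item] = score

assert by_user[1] == {5: 3, 6: 2, 7: 1}
assert by_user[2] == {5: 5}
```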

The first step towards mining using a concept-based approach is to identify concepts in the domain. Once concepts are identified, the next step is to organize the concepts in the form of a lattice. The final step is to query the lattice for information. The focus of this thesis is to present this framework and a set of algorithms that can be used by components of the framework in order to produce good recommendations. We have also proposed metrics that quantify a good recommendation, taking into account both the pertinence of the recommendation as well as its coverage over the possible recommendations.

Figure 3.1: A framework for concept-based recommender systems

3.2 Components of the Framework

Consider the framework in Figure 3.1. It has four main components. Each component is discussed below:

• Optimized Concept Generation: This component achieves the first step of the process of discovering knowledge from ratings databases, namely, converting the user ratings into concepts. Our algorithm improves upon the basic subspace clustering algorithm presented in SCuBA by Agarwal et al. [16]. The basic algorithm of SCuBA accumulated a large number of patterns rated by an insignificant fraction of users. The pruning step was applied after all comparisons were made, and hence all the generated concepts had to be held in memory until the final step. We have overcome this space inefficiency by optimizing the comparison process: we prune patterns below a certain number of ratings at the end of each iteration of the pair-wise comparison process. We have achieved a 60-70% reduction in the space required to store the generated concepts (see Section 5.3.1 for experimental results). Although itemset mining algorithms such as APriori (frequent itemsets) [33] and MAFIA (maximal itemsets) [34] could be applied, these algorithms are designed to be complete (to discover all possible itemsets). Since we do not require a complete algorithm, we chose to implement a pair-wise comparison based approach to discover repeated patterns of items.

• Concept Partition: In sparse databases, generated concepts may have few items in common, and so placing them in the same lattice may not be possible if parent concepts are to be strict subsets of child concepts. The optimized lattice search algorithm presented in Section 3.7 requires that this parent-child relationship be preserved; forcing dissimilar concepts into the same lattice would violate this relationship and preclude the optimization achieved by our algorithm. The presence of disconnected concepts in a lattice would have undesirable consequences such as slower lattice generation and querying, and inaccurate recommendations. This component partitions concepts into multiple subsets of concepts, each belonging to a different lattice.

• Lattice Generation: Once concepts are identified, we need an efficient means to organize them into lattices. This component builds a lattice from a list of concepts.

• Lattice Querying: Once the lattice is built, we can query the lattice for information. In this case, we can query the lattice for a recommendation based on the past history (previous ratings) of a user. This component uses algorithms that can discover recommendations efficiently from large lattices.

In the following sections, we present algorithms that are designed for speed and space efficiency.

3.3 Concept Generation

Given a ratings database, the following algorithm converts user ratings into concepts. Each concept is a set of items that meets a support threshold δ. The support threshold determines the minimum number of occurrences of a pattern needed to deem it frequent. Concepts are treated as sets, i.e., they are ordered based on the number of items rated and the actual items themselves, not on the sequence in which the items occur. For example, given three concepts A = {1, 2, 3, 4}, B = {2, 3, 4} and C = {2, 3}, the subset ordering is C < B < A.
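As a small sketch (in Python, which is not the thesis's implementation language; the names A, B, C simply mirror the example above), treating concepts as sets makes the subset ordering directly checkable:

```python
# Concepts behave as sets: the order in which items were rated is
# irrelevant, and the subset ordering falls out of set comparison.
A = frozenset({1, 2, 3, 4})
B = frozenset({2, 3, 4})
C = frozenset({2, 3})

assert C < B < A                # strict subset chain: C below B below A
assert frozenset({3, 2}) == C   # item sequence does not matter
```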

3.4 Algorithm for Concept Generation

Algorithm conceptGenerate

Input: A database of user ratings

Output: A list of patterns that occur frequently in the database

1. frequentPatterns ←[]

2. Sort the database in the decreasing order of the number of ratings

3. sizeOfDatabase ←size of the database

(∗ get the number of tuples in the database ∗)

4. Clear localHashTable

(∗ the key of the hash is the pattern, the value is its count ∗)

5. Clear globalHashTable

(∗ the key of the hash is the pattern, the value is its count ∗)

6. currentLength ←-1

7. previousLength ←-1

8. for i ← 1 to sizeOfDatabase − 1
9. do

10. currentTuple ←database[i]

11. currentLength ←length of currentTuple

12. for j ←i+1 to sizeOfDatabase − 1

13. do

14. compareTuple ←database[j]

15. intersect ←currentTuple ∩ compareTuple

16. if intersect exists in localHashTable

17. then increment the count of the pattern

18. else insert the pattern into localHashTable and initialize

count to 1

19. endif

20. endfor

21. if previousLength IS NOT EQUAL TO currentLength

22. then

23. for k ← 1 to sizeOfGlobalHashTable

24. do

25. if pattern length > currentLength and support for pattern is less than δ

26. then remove this pattern

27. endif

28. endfor

29. previousLength ←currentLength

30. endif

31. Update globalHashTable with contents of localHashTable

32. Clear localHashTable

33. endfor

34. sizeOfGlobalHash ←size of global hashtable

35. for i ← 1 to sizeOfGlobalHash

36. do

37. if globalHashTable[i].count ≥ δ
38. then Add globalHashTable[i] to frequentPatterns

39. endif

40. endfor

3.4.1 Description of the algorithm

The concept generation algorithm (conceptGenerate) has been described in the previous section. This algorithm is based on the subspace clustering algorithm proposed by Agarwal et al. [16]. The central idea of the subspace clustering algorithm of SCuBA is pair-wise comparison between adjacent rows of a ratings database.

The algorithm computes the intersecting ratings (called patterns) for each such comparison. The algorithm keeps track of the number of occurrences of each pattern and uses a minimum threshold to determine useful patterns. This can be viewed as a subspace clustering approach, since each item in a user's ratings is a dimension, and by computing pair-wise intersections we compute the subspaces that are of interest.

However, the number of patterns grows rapidly with the number of users in the ratings database. A large percentage of the generated patterns have very little support, yet they are held in memory and pruned only after all iterations are completed. In this work, we have proposed an optimization that provides significant improvement over the existing algorithm (see the next chapter for experimental results). Instead of computing row-wise intersections directly from the ratings database, we sort the users in the decreasing order of the number of items rated. Then we can use a simple property to prune the global hash table after each iteration and moderate its size. The pruning is based on the following property:

Property: Given two sets of items S1 and S2, the cardinality of their intersection is always less than or equal to the cardinality of the smaller of the two sets. Formally, given S1 and S2, if |S1| ≤ |S2| then |S1 ∩ S2| ≤ |S1|. The ratings database is initially sorted in the decreasing order of the number of ratings (Line 2). Next, we proceed to perform pair-wise comparisons. To do pair-wise comparisons, we need to execute a dual for-loop. This is shown on Lines

8 and 12. The outer for-loop uses the variable i while the inner loop uses j. The function of the outer loop is to traverse all the elements in the list, while the inner loop computes the intersection (called a pattern) of each rating with the other ratings in the database. Each step of the inner loop updates a local hash table with a new pattern (Line 18) or increments the count of an existing pattern (Line

17).

If the length of the current rating is the same as the next, then the local cache (local hash table) is copied into the global hash table and then emptied (Lines 31 & 32). If the size of the current rating is larger than that of the next rating, then we can prune those patterns whose length is greater than or equal to the length of the next rating if they do not meet the minimum support threshold δ (Lines 21-30). This is because future iteration elements (ratings) cannot generate concepts whose length is greater than or equal to their own length (refer to the aforementioned property). Note that sorting in the descending order of length is vital here, since it facilitates the optimization.

3.4.2 Illustration of the algorithm

Given a database D in Table 3.2 and δ = 2.

Table 3.2: Concept Generation: Initial database

1 2 3 4
1 2 4
2 3 4
5 6 7
2 3
5 6
6 7

Each table from 3.3 to 3.8 has three rows. Rows 1 and 2 are the ratings from the database that are compared. Row 3 is the result of the intersection.

After Iteration 1 (shown in Table 3.3), all concepts of length 3 or more can be pruned, since the length of the next element {1 2 4} is 3 and it cannot find overlaps of length 3 or more. In this case, only {2 3 4} and {1 2 4} qualify, but neither meets the δ = 2 condition and hence both are pruned.

Now, after Iteration 4 (shown in Table 3.6), all concepts of length 2 or more can be pruned, since the length of the next element {6 7} is 2 and it cannot find overlaps of length 2 or more. In this case, {2 3}, {2 4}, {5 6}, {6 7} qualify, but only {2 3} meets the δ = 2 condition and hence all the others are pruned. After the final pruning, only ({φ}, 6) and ({2, 3}, 2) remain.

Table 3.3: Concept Generation: currentTuple = {1 2 3 4}. Iteration 1

{1 2 3 4} | {1 2 3 4} | {1 2 3 4} | {1 2 3 4} | {1 2 3 4} | {1 2 3 4}
{1 2 4}   | {2 3 4}   | {5 6 7}   | {2 3}     | {5 6}     | {6 7}
{1 2 4}   | {2 3 4}   | φ         | {2 3}     | φ         | φ

22 Table 3.4: Concept Generation: currentTuple = {1 2 4}. Iteration 2

{1 2 4} | {1 2 4} | {1 2 4} | {1 2 4} | {1 2 4}
{2 3 4} | {5 6 7} | {2 3}   | {5 6}   | {6 7}
{2 4}   | φ       | {2}     | φ       | φ

Table 3.5: Concept Generation: currentTuple = {2 3 4}. Iteration 3

{2 3 4} | {2 3 4} | {2 3 4} | {2 3 4}
{5 6 7} | {2 3}   | {5 6}   | {6 7}
φ       | {2 3}   | φ       | φ

Table 3.6: Concept Generation: currentTuple = {5 6 7}. Iteration 4

{5 6 7} | {5 6 7} | {5 6 7}
{2 3}   | {5 6}   | {6 7}
φ       | {5 6}   | {6 7}

Table 3.7: Concept Generation: currentTuple = {2 3}. Iteration 5

{2 3} | {2 3}
{5 6} | {6 7}
φ     | φ

23 Table 3.8: Concept Generation: currentTuple = {5 6}. Iteration 6

{5 6}
{6 7}
φ

Ignoring φ, frequentPatterns = ({2, 3}, 2)

3.4.3 Complexity Analysis

Average Space Complexity of Concept Generation

Given a database of n users,

Average Case assumption: Suppose each user is compared with m other users, producing m candidate patterns. Given a support threshold δ, assume that m/2 patterns have support greater than δ. Total number of patterns:

= (n−1)/2 + (n−2)/2 + (n−3)/2 + ... + (n−(n−1))/2
= [(n−1) + (n−2) + (n−3) + ... + (n−(n−1))] / 2
= n(n−1)/4
≈ O(n²)

Worst Case assumption: Suppose each user is compared with m other users, producing m candidate patterns. Given a support threshold δ, assume that all m patterns have support greater than δ. Total number of patterns:

= (n−1) + (n−2) + (n−3) + ... + (n−(n−1))
= n(n−1)/2
≈ O(n²)

Average Time Complexity of Concept Generation

The average time complexity is independent of the nature of the data. All the pair-wise comparisons need to be done to determine whether patterns meet the support threshold δ. Hence the time complexity is always O(n²), where n is the number of users.
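The quadratic count of pair-wise comparisons can be sanity-checked directly (a trivial Python check, not part of the thesis):

```python
from itertools import combinations

# n users yield n(n-1)/2 pair-wise comparisons, i.e. O(n^2).
for n in (2, 7, 100):
    assert len(list(combinations(range(n), 2))) == n * (n - 1) // 2
```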

3.5 Concept Partition

This component partitions a list of concepts into smaller lists of disjoint concepts.

The case for multiple lattices

To illustrate the need for concept partitioning, let us consider the sparse dataset shown in Figure 3.2. By applying the algorithm described in the previous section, we can generate the following concepts: (E), (A B), (D E), (C D E), (A B C).

If we apply the requirement that all child concepts be strict supersets of their parents (w.r.t. rated items), there are two disconnected segments in this lattice: {(A B C), (A B)} and {(E), (D E), (C D E)}. Clearly, these two subsets of concepts belong to different lattices, and this algorithm partitions them.

Figure 3.2: A sparse dataset and its associated lattice

3.5.1 Algorithm for Concept Partition

Algorithm conceptPartition

Input: A list of unique patterns

Output: Groups of concepts/patterns that belong to disjoint lattices.

1. SizeOfConcepts ←number of unique patterns in frequentPatterns

2. conceptSet ←[ ][ ]

3. visited[SizeOfConcepts] ←0

4. Sort frequentPatterns in the decreasing order of concept length.

5. for i ← 1 to SizeOfConcepts

6. if visited[i] EQUALS 1

7. then continue; endif

8. currentConcept ←frequentPatterns [i]

9. Clear conceptSet[i][]

10. do

11. count ←0

12. for j ←i+1 to SizeOfConcepts

13. do
14. compareConcept ←frequentPatterns[j]
15. if compareConcept ⊂ currentConcept
16. then subset ←true
17. else subset ←false

18. endif

19. if subset equals true

20. then visited[j] ←1

21. conceptSet[i][count] ←compareConcept

22. Increment count

23. endif

24. endfor

25. endfor

3.5.2 Description of the algorithm

The basic idea of the algorithm is the following. Given a list of concepts, we first sort it in the descending order of concept length (Line 4). This way we ensure that the longest concepts are considered first by the algorithm. This is important because the aim of the algorithm is to identify subsets of the largest concepts in the list and group them into one lattice. Once sorted, we begin by looking at each element E (referred to as currentConcept) in the list (Line 5) and identifying all concepts M that occur after it (Line 12) and are subsets of E (Lines 15-18).

A concept C1 with items {i1,...,ik} is a subset of a concept C2 with items {i1,...,im} iff {i1,...,ik} ⊂ {i1,...,im}. All those concepts M (including E) are then grouped under one partition (Line 21). Each partition is represented by a row of the variable conceptSet. We use a variable called visited to ensure that a concept which is already a subset of an existing partition's concepts does not seed a new partition of its own. If a node is a subset, it is marked as visited (Line 20) and will not be considered for creating a new partition. This is done in Line 6: if a node is visited, the loop simply continues to the next node. In the following example, Iterations 3 and 4 illustrate this situation. The nodes {2 3} and {2 4} have been visited during previous iterations (Iterations 1 and 2), so they are not considered for creating a new partition and the algorithm continues with the next available node.
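The partitioning step can be sketched as follows (an illustrative Python rendering, not the thesis code; note that, as in the worked example, a subset may legitimately appear in several partitions, while the visited set only prevents it from seeding a new one):

```python
def concept_partition(concepts):
    # Sort in decreasing order of length so that maximal concepts seed
    # the partitions (Line 4 of the pseudocode).
    concepts = sorted(map(frozenset, concepts), key=len, reverse=True)
    visited = set()
    partitions = []
    for i, head in enumerate(concepts):
        if head in visited:        # a visited concept never seeds a partition
            continue
        group = [head]
        for other in concepts[i + 1:]:
            if other < head:       # proper subset of the seed concept
                visited.add(other)
                group.append(other)  # subsets may appear in several groups
        partitions.append(group)
    return partitions

# The eight concepts of the worked example split into four lattices.
concepts = [{1, 2, 4}, {2, 3, 4}, {2, 3}, {2, 4}, {5, 6}, {6, 7}, {6}, {2}]
print(len(concept_partition(concepts)))   # 4
```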

3.5.3 Illustration of the algorithm

Table 3.9: Concept Partition: After Sorting

1,2,4
2,3,4
2,3
2,4
5,6
6,7
6
2

Each table from 3.10 to 3.13 has three rows. Rows 1 and 2 are concepts. Row 3 is the result of determining whether row 2 is a subset of row 1 (true) or not (false).

Table 3.10: Concept Partition: currentConcept = {1 2 4}. Visited[1 2 4] = 0. Iteration 1. conceptSet[1] = (1 2 4) (2 4) (2).

{1 2 4} | {1 2 4} | {1 2 4} | {1 2 4} | {1 2 4} | {1 2 4} | {1 2 4}
{2 3 4} | {2 3}   | {2 4}   | {5 6}   | {6 7}   | {6}     | {2}
false   | false   | true    | false   | false   | false   | true

Table 3.11: Concept Partition: currentConcept = {2 3 4}. Visited[2 3 4] = 0. Iteration 2. conceptSet[2] = (2 3 4) (2 3) (2 4) (2)

{2 3 4} | {2 3 4} | {2 3 4} | {2 3 4} | {2 3 4} | {2 3 4}
{2 3}   | {2 4}   | {5 6}   | {6 7}   | {6}     | {2}
true    | true    | false   | false   | false   | true

Iteration 3: Concept Partition: currentConcept = (2, 3). Visited[2 3]

= 1. CONTINUE

Iteration 4: Concept Partition: currentConcept = (2, 4). Visited[2 4]

= 1. CONTINUE

Table 3.12: Concept Partition: currentConcept = {5 6}. Visited[5 6] = 0. Iteration 5. conceptSet[3] = (5 6) (6)

{5 6} | {5 6} | {5 6}
{6 7} | {6}   | {2}
false | true  | false

Table 3.13: Concept Partition: currentConcept = {6 7}. Visited[6 7] = 0. Iteration 6. conceptSet[4] = (6 7) (6)

{6 7} | {6 7}
{6}   | {2}
true  | false

Iteration 7: Concept Partition: currentConcept = (6). Visited[6] = 1.

CONTINUE

Iteration 8: Concept Partition: currentConcept = (2). Visited[2] = 1.

CONTINUE

Table 3.14: After partitioning

Lattice 1: (1 2 4) (2 4) (2)
Lattice 2: (2 3 4) (2 3) (2 4) (2)
Lattice 3: (5 6) (6)
Lattice 4: (6 7) (6)

3.5.4 Complexity Analysis of the algorithm

In the worst-case situation, all concepts are disjoint w.r.t. each other, and we would need to iterate over the entire list of n concepts. Hence the time complexity would be O(n²).

Proof: Given a list of n concepts, we need to calculate the time complexity of the algorithm. In the worst case, each concept is disjoint from every other concept. Hence the total number of comparisons equals:

= (n−1) + (n−2) + (n−3) + ... + (n−(n−1))
= n(n−1)/2
≈ O(n²)

However, if the data has many concepts that are linked to each other, then after every iteration many concepts would be marked as visited and hence would not be considered for finding subsets.

3.6 Lattice Generation

This component helps build lattices from a list of concepts.

3.6.1 Algorithm for lattice generation

Algorithm latticeGenerate

Input: A list of unique concepts conceptList

Output: A lattice of concepts

1. Sort the concepts in increasing order of concept length; alternatively, start the algorithm from the bottom of the list if the list was obtained from the conceptPartition algorithm.

2. largestLength ←length of the longest concept in conceptList.

3. conceptSize ←sizeOf(conceptList)

4. conceptsBySize ←splitByLength(conceptList)

5. currentLattice ←[]

6. for i ← 1 to largestLength

7. do

8. currentLattice ←combineLevels(currentLattice, conceptsBySize,i)

9. endfor

10. return currentLattice

11.

12. splitByLength(conceptList)

Input: A list of unique concepts conceptList

Output: Split concepts by their length

13. largestLength ←length of the longest concept in conceptList

14. conceptSize ←size of conceptList

15. Initialize conceptsBySize[largestLength][ ]

16. currentLength ←1

17. count ←0

18. for i ← 1 to conceptSize

19. do

20. if length of conceptList[i] equals currentLength

21. then conceptsBySize[currentLength][count] ←conceptList[i]

22. Increment count

23. else Increment currentLength

24. count ←0

25. conceptsBySize[currentLength][count] ←conceptList[i]

26. endif

27. endfor

28. return conceptsBySize

29. end splitByLength

30.

31. combineLevels(currentLattice, conceptsBySize, nextLevel)

Input: A lattice,the list of all concepts available, and the next level

Output: A lattice with concepts from the next level added to the input lattice

32. numberConcepts ←number of the concepts at next level of conceptsBySize

33. levelConcepts ←[]

34. for i ← 1 to numberConcepts

35. do

36. newNode ←conceptsBySize[nextLevel][i]

37. currentLevel ←nextLevel - 1

(∗ start from the bottommost level of the current lattice ∗)

38. elusiveConcepts ←newNode

39. checkContains ←newNode

40. while elusiveConcepts is not empty and currentLevel is greater than

0

41. do

42. levelConcepts ←conceptsBySize[currentLevel]

43. numConcepts ←number of concepts in levelConcepts

44. for j ← 1 to numConcepts

45. do
46. if levelConcepts[j] ⊂ newNode and levelConcepts[j] ∩ checkContains is not NULL

47. then add parent child relationship for levelConcepts[j]

and newNode

48. elusiveConcepts ←elusiveConcepts ∩ (newNode − levelConcepts[j])

49. Increment count

50. endif

51. endfor

(∗ checkContains ensures that cyclic parent redundancy is avoided ∗)

52. checkContains ←elusiveConcepts

53. currentLevel ←currentLevel - 1

54. count ←0

55. endwhile

56. endfor

57. end combineLevels

3.6.2 Description of the algorithm

The basic idea of the algorithm is to build the final lattice incrementally. In order

to achieve incremental building, concepts are first partitioned into groups having an equal number of ratings. This is done using the splitByLength() method.

The splitByLength() method employs a variable called conceptsBySize, a two-dimensional array. Each row of this array consists of a list of concepts (patterns) of equal length, while any two different rows contain concepts (patterns) of differing lengths.

The method goes through all the concepts in the list (Line 18) and places each

concept in the appropriate row of conceptsBySize (Lines 21 & 25). Each row of

conceptsBySize is then added to a growing lattice in the increasing order of pattern

length (which equals the number of items in a concept/pattern).

The primary requirement of a lattice building algorithm is to avoid cyclic

redundancy. Cyclic redundancy occurs when ancestors of a node are also added

as immediate parents. For instance, consider three concepts A, B and C, where A is a child of B and B is a child of C. We need to ensure that only B is listed as a parent of A, and not C, although C is a valid ancestor of A. This is done in order to ensure that the lineage of a node is traced along a unique path. In order to ensure that there is no cyclic redundancy, Aparna [6] relies on an explicit ancestor search approach that marks all ancestors of a new node as not being parents.

The motivation behind this approach is to avoid cyclic redundancy implicitly as opposed to the exhaustive and time consuming method proposed in Aparna’s thesis. By merging levels incrementally, we need not do an exhaustive search of the lattice each time. The merging process is done by the combineLevels() method

(Line 8).

The combineLevels() method goes through all the concepts in the current level that need to be added to the lattice (Line 34). The algorithm merges the new group of concepts, each l ratings long, with the bottom of the lattice while keeping track of the items in the new concept that do not have a parent yet. These items are called elusiveConcepts (Line 48). When a parent is found, elusiveConcepts is recomputed to exclude the newly found parent's items. If, after merging with the bottom level, elusiveConcepts is not empty, we try merging with one level higher, and so on, until the top of the lattice is reached or until elusiveConcepts becomes empty (Line 40).
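The level-wise merge with elusiveConcepts and checkContains can be sketched as follows (a hypothetical Python rendering of latticeGenerate/combineLevels; "parents" maps each concept to its immediate parents only, with ancestors skipped implicitly):

```python
def build_lattice(concepts):
    # Group concepts by length: shortest level first (splitByLength).
    by_len = {}
    for c in map(frozenset, concepts):
        by_len.setdefault(len(c), []).append(c)
    levels = [by_len[k] for k in sorted(by_len)]
    parents = {}
    for li, level in enumerate(levels):
        for node in level:
            parents[node] = []
            elusive = check = node      # items of node still lacking a parent
            for lower in reversed(levels[:li]):   # walk down from the level below
                for cand in lower:
                    # cand is an immediate parent if it is a proper subset of
                    # node and still covers some item that had no parent at
                    # the previous level (the checkContains test).
                    if cand < node and cand & check:
                        parents[node].append(cand)
                        elusive = elusive & (node - cand)
                check = elusive         # refresh checkContains per level
                if not elusive:
                    break               # every item of node has a parent
    return parents

# The concepts of Table 3.15; parents of (1 2 3 4 5 6) come out as
# (1 2 3), (2 3 4), (4 5) and (5 6), as in the worked example.
cs = [(1,), (2,), (3,), (4,), (5,), (6,), (7,),
      (2, 3), (4, 5), (5, 6), (1, 2, 3), (2, 3, 4),
      (1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6, 7)]
p = build_lattice(cs)
print(sorted(map(sorted, p[frozenset({1, 2, 3, 4, 5, 6})])))
# -> [[1, 2, 3], [2, 3, 4], [4, 5], [5, 6]]
```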

3.6.3 Illustration of the algorithm

After applying splitByLength(), Table 3.15 shows the concepts grouped by length.

Table 3.15: Lattice Generation: After applying splitByLength()

Level 1: (1) (2) (3) (4) (5) (6) (7)
Level 2: (2 3) (4 5) (5 6)
Level 3: (1 2 3) (2 3 4)
Level 4: (1 2 3 4 5 6)
Level 5: (1 2 3 4 5 6 7)

Combining Level 1 and Level 2:

• Adding (2 3). Elusive element = (2 3). Considering Level 1: Add (2) as parent. Elusive element = (2 3) - (2) = (3). Add (3) as parent. Elusive element = (3) - (3) = φ.

• Adding (4 5). Elusive element = (4 5). Considering Level 1: Elusive element = φ after adding (4) and (5) as parents.

• Lattice generated after this step is shown in Figure 3.3

Figure 3.3: Lattice after combining levels 1 and 2

Combining current lattice with Level 3:

• Adding (1 2 3). Elusive element = (1 2 3). newNode = (1 2 3)

– Considering Level 2: Add (2 3) as a parent. (4 5) and (5 6) are not subsets. Elusive element = (1 2 3) - (2 3) = (1) = checkContains.

Figure 3.4: Lattice after combining levels 1, 2 and 3

– Considering Level 1: Search for those elements that contain 1. If a node does not contain 1, then it is not a subset or it is an ancestor. This is how we implicitly encode ancestor relationships and avoid searching the lattice each time a new node is added. Add (1) as a parent. Do not add (2) or (3), because levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (2) or (3). Do not add (4) (5) (6) (7) because they are not subsets of newNode.

• Adding (2 3 4). Elusive element = (2 3 4). newNode = (2 3 4)

– Considering Level 2: Add (2 3) as a parent. (4 5) and (5 6) are not subsets. Elusive element = (2 3 4) - (2 3) = (4) = checkContains.

– Considering Level 1: Add (4) as a parent. Do not add (2) or (3), because levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (2) or (3). Do not add (1) (5) (6) (7) because they are not subsets of newNode.

• Lattice generated after this step is shown in Figure 3.4

Combining current lattice with Level 4:

• Adding (1 2 3 4 5 6). newNode = (1 2 3 4 5 6)

– Considering Level 3: Add (1 2 3) and (2 3 4) as parents. Elusive element = checkContains = (5 6).

– Considering Level 2: Add (4 5) and (5 6) as parents. Do not add (2 3), because levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (2 3).

Combining previous lattice with Level 5:

• Adding (1 2 3 4 5 6 7). newNode = (1 2 3 4 5 6 7)

– Considering Level 4: Add (1 2 3 4 5 6) as a parent. Elusive element = checkContains = (7).

– Considering Level 3: Do not add any nodes, since levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (1 2 3) or (2 3 4).

– Considering Level 2: Do not add any nodes, since levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (2 3), (4 5) or (5 6).

– Considering Level 1: Add (7) as a parent. Do not add any other nodes, since levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (1), (2), (3), (4), (5) or (6).

• Lattice generated after combining all levels is shown in Figure 3.5

3.6.4 Correctness Proof of the algorithm

In order to prove the correctness of the algorithm, we need to show three things:

• The algorithm adds all the parents for a new node

• The algorithm adds only the correct parents for a new node

• The algorithm adds only the immediate parents, and not any of the ancestors, for a new node

Figure 3.5: Lattice after combining all levels from 1 through 5

Hypothesis: The algorithm does not add ancestors for a new node

To Prove: Given a new node n and its parent m1, there does not exist any node m2 which is a parent of both m1 and n.

Proof: Suppose there exists m2 such that m2 is a parent of both m1 and n. Since m2 is a parent of m1, m2 ⊂ m1, and m1 is visited before m2 in the level-wise merging process. Before considering m2, elusiveConcepts = n − ∪p, where p ranges over the parents of n found so far. Since m1 is a parent, elusiveConcepts ⊆ n − m1, which excludes all elements of m2 (as m2 ⊂ m1), so m2 ∩ elusiveConcepts = φ. Hence m2 will not be added as a parent of n.

Hypothesis: The algorithm adds all the parents for a new node

Proof: Given a new node n with parents p, let node m be a valid parent that is not added as a parent. To be added as a parent, the condition m ⊂ n and m ∩ checkContains ≠ φ must be satisfied:

• If m is a valid parent, then m ⊂ n is always true.

• If m ∩ checkContains = φ, then m is a parent but also an ancestor, and hence is not added. Otherwise, m is always added as a parent.

The algorithm terminates only when checkContains = φ or when level 1 (with concepts of length 1) is reached. When checkContains ≠ φ at the end of a particular level l, ∃ e ∈ n such that e ∉ ∪p, and the search continues. Now, a node r at level l−1 or lower is a parent if ∃ e ∈ n such that e ∉ ∪p, e ∈ r and r ⊂ n.

This is repeated up to level 1 or until checkContains = φ. If we complete searching level 1, clearly there are no more concepts to search and all possible parents have been added. However, if checkContains = φ at some level p, then all elements have at least one parent, which implies that all remaining nodes (at level p−1 or lower) are either not valid parents or are ancestors, and hence should not be added as parents of n. This guarantees that all parents of n are added without exception.

Hypothesis: The algorithm adds only the correct parents for a new node

Proof: Given a new node ’n’ at level l of length nl, we want to find only correct parents of the node while ensuring that ancestors of a valid parent are avoided.

A node is added as parent iff it is a subset of node ’n’. Suppose we add an invalid parent. Then p should contain atleast one element ei that is not in n. However this contradicts our assumption that we add p only if p ( n. Hence we add only the correct parents.

3.7 Lattice Querying

This component uses an algorithm that can discover recommendable items efficiently from large lattices. The motivation for this algorithm is as follows:

Upward Closure: In a lattice, if an element e is not present in a concept C, then it will not be present in any parent of C. Thus, when searching for a minimum number of matches η, if a concept does not have η matches, none of its parents will have η matches, and hence they need not be considered as candidates for recommendation. Upward closure can be applied here only because we assume every parent is a proper subset of its children.

3.7.1 Algorithm for Lattice Querying

Algorithm latticeQuerying

Input: A user query Q of items of the form (UserId, RatingId1, RatingId2), where each rating is an item in the database, a minimum match threshold η, and a minimum to-recommend threshold δ

Output: A list of items that the user is likely to rate in the future.

1. Candidate Nodes ←Bottommost node of the lattice

2. Good Nodes = [ ]

3. while all nodes have not been visited

4. do

5. if node ⊇ Q AND cardinality of (node ∩ Q) ≥ η AND node is NOT in Banned Nodes

6. then Add all parents of node to Candidate Nodes that are not in

Banned Nodes

7. Add (node, cardinality of (node ∩ Q)) to Good Nodes

8. else Add all parents of node to Banned Nodes

9. endif

10. Remove node from Candidate Nodes

11. node ←Get next node in Candidate Nodes

12. endwhile

13. Sort Good Nodes in the descending order of cardinality of (node ∩ Q).

14. if Good Nodes is EMPTY

15. then return NULL
16. else Return the top node n in Good Nodes which has at least δ items to recommend

17. endif

3.7.2 Description of the algorithm

The search starts from the bottommost node of the lattice (Line 1) and proceeds all the way up to the top of the lattice. Each candidate node has to produce an overlap of at least η items (Line 5) if it is to be considered a candidate for recommendation. If a given node does not have η matches, then it can safely be ignored. Also, due to the upward closure property, all parents of this node can be ignored as well (Line 8). This produces a large saving as compared to ignoring just the one node; even in a moderately connected lattice, this can achieve a lot of pruning. If a given node has η matches, then all its parents are possible candidates and are added to the variable Candidate Nodes (Line 6). If a node is no longer a candidate, it is added to Banned Nodes (Line 8). Thus, when a node is considered as a candidate, we check to make sure that it is not banned (Line 5), and if a parent of a node is banned, that parent is not added to Candidate Nodes (Line 6). This is necessary since each node can be reached along multiple paths, and some of these paths may be useful while others may not; even if a single path leading to a node is unfavorable, that node and everything above it in the lattice can be avoided. Once the entire lattice is traversed, the good nodes are sorted in decreasing order of their degree of overlap with the input query and the top node is returned.
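This bottom-up search with banned-node pruning can be sketched as follows (an illustrative Python rendering; following the worked example, a node qualifies when it contains the whole query, and the η overlap test from the pseudocode is kept alongside it):

```python
from collections import deque

def query_lattice(bottom, parents, query, eta, delta):
    query = frozenset(query)
    candidates = deque([bottom])
    banned, good, seen = set(), [], set()
    while candidates:
        node = candidates.popleft()
        if node in seen:
            continue
        seen.add(node)
        if query <= node and len(node & query) >= eta and node not in banned:
            good.append(node)                       # a Good Node
            candidates.extend(p for p in parents.get(node, [])
                              if p not in banned)
        else:
            # upward closure: every parent of a failed node is banned
            banned.update(parents.get(node, []))
    good.sort(key=lambda n: len(n & query), reverse=True)
    for n in good:
        if len(n - query) >= delta:  # at least delta items left to recommend
            return n
    return None

# A small hypothetical lattice fragment (not Figure 3.6 verbatim).
bottom = frozenset({1, 2, 3, 4, 5, 6})
parents = {
    bottom: [frozenset({1, 2, 3, 4, 5}), frozenset({2, 3, 4, 5, 6})],
    frozenset({1, 2, 3, 4, 5}): [frozenset({1, 2, 3})],
    frozenset({2, 3, 4, 5, 6}): [frozenset({3, 5, 6})],
}
# Returns the bottom node (1 2 3 4 5 6); items 1, 2, 3 are the
# recommendations, since they are in the node but not in the query.
print(query_lattice(bottom, parents, {4, 5, 6}, eta=2, delta=3))
```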

Figure 3.6: Example of a typical lattice to search for making recommendations

3.7.3 Illustration of the algorithm

Consider the lattice L in Figure 3.6

Given the lattice L and query Q = 4 5 6. η = 2. δ = 3.

• Candidate = (1 2 3 4 5 6). Banned Nodes = () Good Nodes = ()

• Node = (1 2 3 4 5 6) contains Q. Node ∩ Q = 3. Candidate = (1 2 3 4 5), (2 3 4 5 6), (1 3 4 5 6). Banned Nodes = (). Good Nodes = {(1 2 3 4 5 6), 3}

• Node = (1 2 3 4 5) does NOT contain Q. Banned Nodes = (1 2 3), (2 3 5), (2), (3). Candidate = (2 3 4 5 6), (1 3 4 5 6). Good Nodes = {(1 2 3 4 5 6), 3}

• Node = (2 3 4 5 6) contains Q. Node ∩ Q = 3. Banned Nodes = (1 2 3), (2 3 5), (2), (3). Good Nodes = {{(1 2 3 4 5 6), 3}, {(2 3 4 5 6), 3}}. Candidate = (1 3 4 5 6). (2 3 5) is in Banned Nodes and hence is not added to Candidate.

• Node = (1 3 4 5 6) contains Q. Node ∩ Q = 3. Banned Nodes = (1 2 3), (2 3 5), (2), (3). Candidate = (3 5 6). Good Nodes = {{(1 2 3 4 5 6), 3}, {(2 3 4 5 6), 3}, {(1 3 4 5 6), 3}}

• Node = (3 5 6) does NOT contain Q. Banned Nodes = (1 2 3), (2 3 5), (2), (3), (5). Candidate = ()

• Candidate is empty. TERMINATE

• Choose the topmost node in Good Nodes. This element is the longest candidate, since we start from the bottom of the lattice. In this case, the recommendation is (1 2 3 4 5 6).

Chapter 4

A Joke Recommender System

In this chapter, we look at a graphical user interface (GUI) implementation of a recommender system for jokes. We test and apply our framework and algorithms on the Jester dataset. A detailed description of this dataset can be found in the next chapter on results and also on the Jester Dataset webpage [22]. Since most recommender systems are going to run off web pages, we have implemented the GUI as a Java applet. Applets can be embedded easily into HTML pages using the applet tag or viewed using an appletviewer. The same GUI could be converted into a Frame-based Java desktop application within a short period of time, retaining much of the code.

4.1 Components of the UI

The GUI consists of three JList widgets. The first JList contains the list of all jokes in the joke database. It is supported by a scroll pane that allows scrolling through the jokes. At any time, a maximum of 14 jokes is displayed in the list box. The second JList is the set of rated jokes for a new user. The final JList displays the recommended jokes. All three JList widgets support joke display, i.e., clicking on an item in any of the list boxes displays the corresponding joke in a panel below.

The GUI also contains four buttons from the Java Swing library. Two buttons serve to rate items: they transfer items from the first list box, named Joke Database, to the list box named Rated Jokes (using the >> button) and back (using the << button). Two other buttons, named Recommend and Clear All, are also presented to the user.

The Recommend button initiates the recommender engine, fetches the recommendations and displays them in the Recommended Jokes list box. The Clear All button clears all the rated and recommended items and re-populates the Joke Database list with all the items in the database.

Finally, the GUI contains a JLabel panel that displays the jokes as images in its Icon. Screenshots of the GUI can be found on the following pages.

Figure 4.1: Main Screen of the GUI Applet for Joke recommendation

Figure 4.2: GUI for Joke Recommendation: The user has already rated jokes #2, #5 and #6 and is currently viewing joke #7.

Figure 4.3: GUI for Joke Recommendation: User is viewing recommendations.

Figure 4.4: The Clear All button clears all ratings and recommendations and repopulates the Joke Database.

Chapter 5

Case Study - Jester and MovieLens Datasets

In this chapter, we investigate the performance of the proposed framework and algorithms on two real-world datasets, namely the Jester dataset and the MovieLens dataset. The Jester dataset is a collaborative filtering dataset [22] that contains ratings for 100 jokes, each rated between -10 and +10, from 73,421 users, collected between April 1999 and May 2003. This dataset was chosen because it is highly dense (a density of about 0.7) and is capable of producing highly connected lattices. To measure density, we convert the ratings into a binary matrix of users and jokes, assigning 1 if a user has rated a joke and 0 otherwise, and compute the fraction of 1s over all the cells in the matrix. A dense dataset implies that a significant number of jokes have been rated by users, which increases the number of commonly rated jokes across users. When users rate a number of jokes in common, the concepts generated from the ratings are highly linked (through these common jokes).

This translates to a densely connected lattice of concepts as opposed to a lattice with very few parent-child relationships. This density makes the Jester dataset unique, since typical collaborative filtering datasets have many users rating very few items.
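The density measure just described can be sketched as follows. This is an illustrative sketch: the class name is assumed, and we follow the raw Jester convention (described in the next section) of 99 meaning "not rated".

```java
// Sketch of the density measure: ratings become a binary matrix
// (1 = rated, 0 = not rated) and density is the fraction of 1s.
public class Density {
    public static double density(double[][] ratings) {
        int ones = 0, cells = 0;
        for (double[] row : ratings) {
            for (double r : row) {
                cells++;
                if (r != 99.0) ones++;  // this cell is a 1 in the binary matrix
            }
        }
        return (double) ones / cells;
    }

    public static void main(String[] args) {
        double[][] toy = {
            { 4.25, 99.0, -2.5 },
            { 99.0, 8.0, 1.75 }
        };
        System.out.println(density(toy)); // 4 of 6 cells carry a rating
    }
}
```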

The MovieLens dataset is another popular collaborative filtering dataset [23]. Two versions of this dataset exist; one contains 100,000 ratings by 943 users on 1,682 movies, the other contains 1,000,000 ratings by 6,040 users on 3,900 movies. Such sparse datasets are typical of ratings databases. Both versions have a density of around 5%. Here, the density is computed as the fraction of actual ratings over the possible number of ratings; for example, the smaller version has 100,000 ratings out of a possible 943 × 1682 ratings. We apply our algorithms on both these typical yet contrasting datasets to analyze their performance.

5.1 Processing the Jester Dataset

As mentioned above, the Jester dataset contains continuous ratings for each of the 100 jokes; a score of 99 indicates that a user has not rated a particular joke. The dataset is available as three Excel files, and we chose to test our algorithms on the file that contains the most ratings, which covers 23,500 users. Since we are concerned only with which jokes a user has rated and not with the accompanying scores, we discard the scores in our experiments. After executing our pre-processing scripts on the dataset, we end up with a database in the following format:

UserId, JokeId1, JokeId2, ..., JokeIdm
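The pre-processing step for one row can be sketched as follows; the class and method names are illustrative, not the actual pre-processing scripts.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the pre-processing step: from a row of raw Jester scores
// (where 99 marks an unrated joke) keep only the ids of rated jokes.
public class Preprocess {
    // scores[j] is the rating of joke j+1; 99.0 means "not rated"
    public static String toRatedLine(int userId, double[] scores) {
        List<String> ids = new ArrayList<>();
        for (int j = 0; j < scores.length; j++) {
            if (scores[j] != 99.0) ids.add("JokeId" + (j + 1));
        }
        return "UserId" + userId + ", " + String.join(", ", ids);
    }

    public static void main(String[] args) {
        double[] row = { -7.82, 99.0, 4.25, 99.0, 8.5 };
        System.out.println(toRatedLine(1, row)); // jokes 1, 3 and 5 were rated
    }
}
```

The same shape of transformation applies to the MovieLens data in the next section, with the absence of an entry, rather than a 99 sentinel, marking an unrated movie.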

5.2 Processing the MovieLens Dataset

The MovieLens dataset contains discrete ratings between 1 and 5 for each of the 1,682 movies. The ratings are explicit; hence, the lack of an entry implies that a user has not rated that movie. Since we are concerned only with which movies a user has rated and not with the accompanying scores, we discard the scores in our experiments. After executing our pre-processing scripts on the dataset, we end up with a database in the following format:

UserId, MovieId1, MovieId2, ..., MovieIdm

5.3 Concept Generation - Identifying Repeatedly Rated Jokes/Movies

If a new user N rates a few items and these items are also found in the histories of other users U, then other items rated by the users U are likely candidates to be rated by N. For example, if a new user rated items 1 through 5, and a large number of users who rated items 1 through 5 also rated item #6, then it is likely that the new user will also rate item #6. This rests on the assumption that there exist underlying patterns in the manner in which users rate items; it is the basis of many item-based collaborative filtering approaches and the underlying assumption of our work. In the case of joke recommendation, it is natural to expect users with similar tastes to rate similar jokes. Thus, given a new user N who rated jokes JN = {j1, ..., jn}, if we can find a set of jokes JU that contains most of the jokes in JN along with other jokes JO, we can recommend the jokes JO to user N. Applying the concept generation algorithm on this dataset yields a list of concepts, each consisting of a set of joke ids. In order to build our model, we used 10,000 user ratings from the database. For each dataset, the model was built using 80% of the data while the remaining 20% was used for testing. All experiments were performed on a Dell desktop with 2 GB of RAM and a Pentium 4 3.2 GHz processor running SUSE Linux.
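The co-occurrence intuition above can be sketched with a plain overlap count. This is emphatically not the concept-lattice algorithm of this thesis, only an illustration of the underlying assumption; all names are assumed.

```java
import java.util.*;

// Hypothetical sketch of the item co-occurrence idea: score every item the
// new user has not rated by how many users with overlapping histories rated
// it, then recommend the k top-scoring items.
public class CoOccurrence {
    public static List<Integer> recommend(Set<Integer> newUser,
                                          List<Set<Integer>> histories, int k) {
        Map<Integer, Integer> score = new HashMap<>();
        for (Set<Integer> h : histories) {
            // only users whose history overlaps the new user's ratings matter
            boolean overlaps = false;
            for (int item : newUser) if (h.contains(item)) { overlaps = true; break; }
            if (!overlaps) continue;
            for (int item : h)
                if (!newUser.contains(item)) score.merge(item, 1, Integer::sum);
        }
        List<Integer> items = new ArrayList<>(score.keySet());
        items.sort((a, b) -> score.get(b) - score.get(a)); // most frequent first
        return items.subList(0, Math.min(k, items.size()));
    }

    public static void main(String[] args) {
        List<Set<Integer>> hist = Arrays.asList(
            new HashSet<>(Arrays.asList(1, 2, 3, 6)),
            new HashSet<>(Arrays.asList(1, 2, 6)),
            new HashSet<>(Arrays.asList(4, 5, 7)));
        // a new user who rated items 1 and 2; item 6 co-occurs most often
        System.out.println(recommend(new HashSet<>(Arrays.asList(1, 2)), hist, 1));
    }
}
```

The concept-based approach replaces this flat pair-wise scan with concepts organized into a lattice, which is what makes the later real-time querying possible.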

5.3.1 Exploding space requirements in the subspace clustering of SCuBA

As explained earlier, SCuBA uses a basic pair-wise comparison algorithm to identify repeatedly rated items. Although the algorithm identifies many patterns, it suffers from having to store an explosive number of candidate patterns prior to pruning. In our optimized algorithm, we propose a means to prune the table at specific stages, and we have observed a significant reduction in space requirements. The graph in Figure 5.1 shows the actual number of patterns generated by the basic algorithm for the Jester dataset, along with the number of patterns generated by our optimized concept generation algorithm.

Figure 5.2 shows the concepts generated using the naive algorithm proposed in SCuBA, and Figure 5.3 shows the concepts generated using our optimized concept generation algorithm for the MovieLens dataset. In order to study the performance of the algorithm along the twin dimensions of the dataset (number of users and number of jokes), we experimented with lattices built from 1,000, 5,000 and 10,000 users, and observed the performance over 20, 40, 60, 80 and 100 jokes available to the user. Similarly, for the MovieLens dataset, we experimented with 200, 400 and 800 users and 100, 200, 400 and 800 movies.

5.3.2 Results - Basic pair-wise comparisons vs. the optimized algorithm

We can clearly observe that the number of concepts generated by the basic algorithm grows rapidly with the number of items rated. In the Jester dataset, for example, for 5,000 users and 40 rated jokes approximately 9,000 patterns are generated, while doubling the number of items increases the candidate patterns to over 125,000. In contrast, for 5,000 users and 40 rated jokes our approach produces just over 1,000 concepts, while doubling the number of jokes produces 18,000 concepts. In the MovieLens dataset, the naive approach produces a maximum of nearly 252,000 concepts for 800 users and an equal number of movies, while our algorithm produces only about one-third as many (81,000 concepts) for the same configuration. Clearly, the number of concepts generated by our algorithm is much smaller than with the basic approach.

Another perspective on the pruning results is to observe the average pruning percentages. As is evident from the graphs, we observe an average 80% pruning of the candidate patterns generated by the naive approach. This is uniform over tests on a wide range of user counts (1,000, 2,000, 5,000 and 10,000) and of items rated (20, 40, 60, 80, 100). The pruning achieves this space saving at the cost of an initial sorting of the user database; since standard sorting algorithms run in O(n log n) time, the overhead is well worth it. The graph in Figure 5.4 shows the percentage pruning obtained by our optimization step for the Jester dataset; the corresponding figure for the MovieLens dataset is Figure 5.5.

Figure 5.1: Naive Vs Optimized Concept Generation - Jester

Figure 5.2: Naive Concept Generation - MovieLens

Figure 5.3: Optimized Concept Generation - MovieLens

Figure 5.4: Percentage Pruning: Naive Vs Optimized Concept Generation - Jester dataset

Figure 5.5: Percentage Pruning: Naive Vs Optimized Concept Generation - MovieLens dataset

5.4 Timing the Model Building Algorithms

Concept generation is a memory- and time-intensive approach to identifying subspaces in the dataset. However, as mentioned in previous sections, the performance of this algorithm is comparable to that of a number of existing subspace clustering algorithms. For a large 10,000 × 100 dataset (Jester), concepts were generated in roughly 2 hours. The largest MovieLens dataset was the complete dataset itself (100,000 ratings), for which concepts were generated in a few milliseconds over one minute. The time required to build the lattices is of the order of a few minutes: for the 10,000 × 100 dataset, the lattice was built in roughly 3 minutes, and for the MovieLens dataset with 100,000 ratings in under 70 seconds. It is important to keep in mind that the Jester dataset is extremely dense; hence its lattice has many parent-child relationships, which means that each node has many links to other nodes. This tends to increase the lattice building time, and it would not be surprising to observe shorter lattice building times for sparser datasets. Note that time requirements for Concept Partition have not been reported because they remained consistently under 1 second for all variations in the Jester data, namely varying the number of items rated and the number of users.

The MovieLens model building time is shown in Figure 5.7. This is comparable with the time values shown in Figure 14 of the paper by Agarwal et al. [16]. We observe that the total model building time for the MovieLens dataset (shown in Figure 5.9) is largely linear, similar to the model building times they observe.

Figure 5.6: Jester dataset: Time taken to generate concepts

Figure 5.7: MovieLens dataset: Time taken to generate concepts

Figure 5.8: Jester dataset: Time taken to generate lattices from concepts

Figure 5.9: MovieLens dataset: Time taken for concept generation, partition and lattice building

5.5 Space Requirements for the Model Building Algorithms

Note that space requirements for Concept Partition have not been reported because they remained consistently under 1 megabyte for all variations in the data, namely varying the number of items rated and the number of users. Space requirements for the MovieLens dataset in the lattice generation phase have not been provided because of their very low memory footprint.

Figure 5.10: Jester dataset: Space required to generate concepts

Figure 5.11: MovieLens dataset: Space required to generate concepts

Figure 5.12: Jester dataset: Space required to generate lattices from concepts

5.6 Real-time Performance of the Algorithm: Lattice Discover

In order to study the real-time performance of the algorithm, we built a lattice of nearly 10,500 nodes using 10,000 user ratings from the aforementioned dataset. After the model was built, we randomly chose 1,000 user ratings from the remaining 13,500 users and created 4 datasets, each containing 250 users.

These are labeled Jester Set1 through Jester Set4. In order to measure query performance in terms of processing speed and accuracy of results, we calculated the time required, precision and recall in the following experiment. Similar experiments have been performed on the MovieLens dataset by Agarwal et al. [16].

From each row of the user ratings, we extract a portion of the ratings and label them as query terms. These could be the initial ratings made by a new user of an online rating system. The goal of the recommender system is to predict possible jokes of interest to the new user; these are referred to as the target terms. Recommendations made by the system are referred to as recommended terms. For instance, given a list of ratings (Joke1, Joke2, Joke3, ..., Joke50), we may use the first 15 (Joke1, ..., Joke15) as query terms and (Joke16, ..., Joke50) as target terms. Suppose the recommender offers recommended terms = (Joke19, Joke31, Joke35, Joke56); three of the four recommendations lie in the target set, so precision = (3/4) * 100 = 75% and recall = (3/35) * 100 ≈ 10%. The following definitions are used to calculate the metrics.

Time: Amount of time in milliseconds required to retrieve a recommendation for a given set of query terms.

Precision: (|RecommendedTerms ∩ TargetTerms| / |RecommendedTerms|) * 100

Recall: (|RecommendedTerms ∩ TargetTerms| / |TargetTerms|) * 100
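Under these definitions, the worked example above can be checked with a short sketch; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Precision and recall over sets of item ids, as percentages.
public class Metrics {
    public static double precision(Set<String> recommended, Set<String> target) {
        Set<String> hit = new HashSet<>(recommended);
        hit.retainAll(target);                     // |Recommended ∩ Target|
        return 100.0 * hit.size() / recommended.size();
    }

    public static double recall(Set<String> recommended, Set<String> target) {
        Set<String> hit = new HashSet<>(recommended);
        hit.retainAll(target);
        return 100.0 * hit.size() / target.size();
    }

    public static void main(String[] args) {
        // target terms Joke16..Joke50 and the recommended terms from the example
        Set<String> target = new HashSet<>();
        for (int j = 16; j <= 50; j++) target.add("Joke" + j);
        Set<String> rec = new HashSet<>(
            Arrays.asList("Joke19", "Joke31", "Joke35", "Joke56"));
        System.out.println(precision(rec, target)); // 75.0
        System.out.println(recall(rec, target));    // about 8.6, i.e. roughly 10%
    }
}
```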

The size of the query terms ranges between 5 and 15% of the maximum number of items in the dataset. This range was chosen because users are likely to rate very few items before requesting recommendations. As can be noted in Figure 5.13, the average precision for the Jester dataset is about 97%, while the average recall (Figure 5.14) is about 65%. Although recall can be increased by offering more recommendations, this comes at the cost of precision; since it is more important for users to get the right recommendations than to get many recommendations, we chose to favor precision over recall. Finally, in Figure 5.15 we observe that the average query processing time is approximately 2 seconds, an acceptable time to wait for a recommendation.

Figure 5.13: Jester dataset: Precision Measurement

Figure 5.14: Jester dataset: Recall Measurement

Figure 5.15: Jester dataset: Query processing time

5.7 Improved Performance

Agarwal et al. use the MovieLens dataset to compute the precision of their approach. The precision for the 100,000-ratings database using our approach is shown in Figure 5.16. We compare this with Figure 11 in the paper on SCuBA by Agarwal et al. [16], which shows constantly degrading precision as the percentage of query terms increases from 5 to 50%: their precision at 5% is roughly 30%, while the precision at 50% is nearly 10%. Our approach, on the other hand, shows steady performance independent of the number of items considered. Although our performance is slightly poorer at lower percentages, our precision values remain consistent, and this is essential for a good overall system: when the number of query terms increases, users expect the system to have learnt their preferences well and will not tolerate degrading performance.

Figure 5.16: MovieLens dataset: Precision Measurement

Chapter 6

Conclusion and Future Direction

In this chapter, we examine the strengths and drawbacks of our approach in depth, and we suggest possible directions for future research. This work is merely a starting point for interesting research in this powerful approach to better recommender systems, and we sincerely believe that innovative future augmentations can greatly enhance the performance of the current model.

6.1 What’s good in a concept-based approach?

We believe that the concept-based approach is generic and can be adapted to data of any nature. Concepts are an abstraction, and entities in any data can be molded to represent concepts; characteristics of entities become attributes of the corresponding concepts in this framework. The basic working of a recommender system is one of clustering, i.e., segmenting concepts into groups such that items within a group are similar to each other and different from those in other groups. By this token, the choice of how to involve the attributes in grouping or segmenting individual entities of the data is up to the designer. In our approach, each item in the ratings database is an attribute, while a concept is a set of such items. Another strength of this approach is that both the model-building time and the real-time performance are acceptable for any modern application.

6.2 Future Direction and Caveats

Our basic model can easily be augmented by adding user statistics to each node in the lattice to guide the search for information. A basic augmentation is to include the number of users who lend support to a repeated pattern in the ratings database. In our approach, we retain this count until the end of Concept Generation, where we prune patterns based on support; we do not utilize user statistics in the search for information. Nodes may also include summarized information about a perspective of the user. For example, consider a gender-based analysis of the data: the perspective in this case is gender, and we may include counts of the numbers of men and women along with the ratings in each node.

Suppose we wish to study the scoring pattern of the users (i.e., patterns in the scores assigned by users to the items in the ratings database). We could store the mean and standard deviation of the scores assigned by users to each item in a concept node. This has some interesting applications: maintaining such statistics could help us discern patterns in the focus of a group of users. A sample conclusion is that people who rate items 1 through 5 are divided in their opinion of item 6 but are unequivocal in their dislike for item 7; this may be observed in the values of the mean and standard deviation of those items as we traverse the lattice.
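A minimal sketch of these per-item statistics follows, assuming the population standard deviation; the class and method names are illustrative.

```java
// Mean and standard deviation of the scores assigned to one item by the
// users supporting a concept node.
public class ItemStats {
    public static double mean(double[] scores) {
        double sum = 0;
        for (double s : scores) sum += s;
        return sum / scores.length;
    }

    // population standard deviation of the scores
    public static double stdDev(double[] scores) {
        double m = mean(scores), sq = 0;
        for (double s : scores) sq += (s - m) * (s - m);
        return Math.sqrt(sq / scores.length);
    }

    public static void main(String[] args) {
        double[] divided = { -9, 8, -7, 9 };    // users split on this item
        double[] disliked = { -8, -9, -8, -9 }; // unequivocal dislike
        System.out.println(mean(divided) + " +/- " + stdDev(divided));
        System.out.println(mean(disliked) + " +/- " + stdDev(disliked));
    }
}
```

A high standard deviation flags a divisive item, while a low standard deviation around a negative mean flags a uniformly disliked one, which is exactly the distinction drawn in the sample conclusion above.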

Another possible direction is to look at relating a set of lattices given a set of items. This is the problem of lattice intersection over query items. A simple solution is to union the results from all the matching nodes across the lattices. However, this may not be entirely accurate, as the query terms may have found a more accurate match in one lattice than in another; hence it is essential to bias the final recommendations based on the level of the match in each lattice.

Enhancements to the lattice search can be employed to improve the real-time performance of the system. Encoded domain knowledge can be used to guide the search for faster responses. If there is a natural partitioning in the items of the ratings database, it would translate into multiple concept lists after the concept partitioning process, and these can be used to produce smaller lattices that can be searched faster and, possibly, in parallel.

A major bottleneck that we foresee in excessive augmentation of concepts is computational cost. Designers should bear in mind that each additional attribute requires additional storage and processing time; computing and storing complicated statistical information may degrade the real-time performance. The choice of statistics guides the information discovery in the knowledge structures, and complicating them may limit the candidate recommendations severely.

It is quite evident that the opportunities for future research are immense in the knowledge representation and discovery aspects of formal concept-based recommender systems.

Bibliography

[1] http://en.wikipedia.org/wiki/Recommender_system - Definition of Recommender Systems on Wikipedia

[2] http://www.pandora.com/ - Pandora Music Recommender System created by the Music Genome Project

[3] http://movielens.umn.edu/ - MovieLens from the GroupLens Research group at the University of Minnesota

[4] http://www.tnrdlib.bc.ca/rr.html - Reader’s Robot - Book Recommender System

[5] Priss, Uta. "Lattice-based Information Retrieval." Knowledge Organization, Vol. 27, No. 3, 2000, pp. 132-142.

[6] Yardi, Aparna Arvind. "Concept Based Information Organization and Retrieval." Masters Thesis, University of Cincinnati, 2006.

[7] Malone, Thomas W., Grant, Kenneth R., Turbak, Franklyn A., Brobst, Stephen A., and Cohen, Michael D. Intelligent Information Sharing Systems. Communications of the ACM, 30, 5 (1987), pp. 390-402.

[8] Foltz, Peter W., and Dumais, Susan T. Personalized Information Delivery: An Analysis of Information Filtering Methods. Communications of the ACM, 35, 12 (1992), pp. 51-60.

[9] Salton, Gerard, and Buckley, Christopher. Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, 24, 5 (1988), pp. 513-523.

[10] Goldberg, David, Nichols, David, Oki, Brian M., and Terry, Douglas. Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM, 35, 12 (1992), pp. 61-70.

[11] Resnick, Paul, Iacovou, Neophytos, Suchak, Mitesh, Bergstorm, Peter, and Riedl, John. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of the ACM Conference on Computer Supported Cooperative Work, 1994, pp. 175-186.

[12] Breese, John S., Heckerman, David, and Kadie, Carl. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence, 1998, pp. 43-52.

[13] Pennock, David, Horvitz, Eric, Lawrence, Steve, and Giles, C. Lee. Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-based Approach. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000, pp. 473-480.

[14] Sarwar, Badrul, Karypis, George, Konstan, Joseph, and Riedl, John. Application of Dimensionality Reduction in Recommender System - A Case Study. ACM Web Knowledge Discovery in Databases (WebKDD) Workshop, 2000.

[15] Miller, Bradley N., Albert, Istvan, Lam, Shyong K., Konstan, Joseph A., and Riedl, John. MovieLens Unplugged: Experiences with an Occasionally Connected Recommender System. Proceedings of the ACM Conference on Intelligent User Interfaces (Accepted Poster), 2003.

[16] Agarwal, Nitin, Haque, Ehtesham U., Liu, Huan, and Parsons, Lance. "A Subspace Clustering Framework for Research Group Collaboration." International Journal on Information Technology and Web Engineering, 1(1), pp. 35-58, 2006.

[17] Shardanand, Upendra, and Maes, Pattie. "Social Information Filtering: Algorithms for Automating Word of Mouth." Proceedings of the CHI'95 Conference on Human Factors in Computing Systems, ACM Press, Vol. 1, pp. 210-217, 1995.

[18] Konstan, Joseph A., Miller, Bradley N., Maltz, David, Herlocker, Jonathan L., Gordon, Lee R., and Riedl, John. "GroupLens: Applying Collaborative Filtering to Usenet News." Communications of the ACM, Vol. 40, No. 3, pp. 77-87, 1997.

[19] Herlocker, Jonathan L., Konstan, Joseph A., Terveen, Loren G., and Riedl, John. "Evaluating Collaborative Filtering Recommender Systems." ACM Transactions on Information Systems (TOIS), Vol. 22, Issue 1, pp. 5-53, 2004.

[20] Aggarwal, Charu, Wolf, Joel L., Wu, Kun-Lung, and Yu, Philip S. Horting Hatches an Egg: A New Graph-Theoretic Approach to Collaborative Filtering. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, San Diego, CA, pp. 201-212, 1999.

[21] Maltz, David, and Ehrlich, Kate. Pointing the Way: Active Collaborative Filtering. Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI'95, ACM Press, pp. 202-209, 1995.

[22] http://ieor.berkeley.edu/~goldberg/jester-data/ - Jester Online Joke Recommender Dataset Webpage

[23] http://www.grouplens.org/node/12#attachments - MovieLens dataset from the GroupLens Research Group at the University of Minnesota

[24] Ganter, Bernhard, and Wille, Rudolf. Formal Concept Analysis: Mathematical Foundations. Translated by C. Franzke. Springer-Verlag New York, Inc., 1997.

[25] Getoor, Lise, and Sahami, Mehran. Using Probabilistic Relational Models for Collaborative Filtering. Working Notes of the KDD Workshop on Web Usage Analysis and User Profiling, 1999.

[26] Goldberg, Ken, Roeder, Theresa, Gupta, Dhruv, and Perkins, Chris. Eigentaste: A Constant Time Collaborative Filtering Algorithm. Information Retrieval, 4(2), pp. 133-151, 2001.

[27] Resnick, Paul, Iacovou, Neophytos, Suchak, Mitesh, Bergstorm, Peter, and Riedl, John. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of the ACM Conference on Computer Supported Cooperative Work, 1994, pp. 175-186.

[28] Sarwar, Badrul M., Karypis, George, Konstan, Joseph A., and Riedl, John. Item-based Collaborative Filtering Recommendation Algorithms. Proceedings of the World Wide Web Conference, pp. 285-295, 2001.

[29] Ungar, Lyle, and Foster, Dean. Clustering Methods for Collaborative Filtering. Proceedings of the Workshop on Recommendation Systems, 1998.

[30] Priss, Uta. "Knowledge Discovery in Databases Using Formal Concept Analysis." Bulletin of ASIS, 27, 1, 2000, pp. 18-20.

[31] Priss, Uta. "Faceted Information Representation." In Stumme, Gerd (ed.), Working with Conceptual Structures: Proceedings of the 8th International Conference on Conceptual Structures, Shaker Verlag, Aachen, 2000, pp. 84-94.

[32] Priss, Uta. "A Graphical Interface for Document Retrieval Based on Formal Concept Analysis." In Santos, Eugene (ed.), Proceedings of the 8th Midwest Artificial Intelligence and Cognitive Science Conference, AAAI Technical Report CF-97-01, 1997, pp. 66-70.

[33] Agrawal, Rakesh, and Srikant, Ramakrishnan. Fast Algorithms for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 487-499, 1994.

[34] Burdick, Douglas, Calimlim, Manuel, and Gehrke, Johannes. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. pp. 443-452, 2001.