UNIVERSITY OF CINCINNATI

Date:______

I, ______, hereby submit this work as part of the requirements for the degree of: ______ in: ______

It is entitled:

This work and its defense approved by:

Chair: ______

A Concept-Based Framework and Algorithms For Recommender Systems

by

Shriram Narayanaswamy

Bachelor of Engineering (Hons.), Electronics and Instrumentation

Birla Institute of Technology and Science (BITS), Pilani

A Master's thesis submitted to the faculty of

University of Cincinnati

in partial fulfillment of the requirements for the degree of

Master of Science

Department of Computer Science

University of Cincinnati

June 2007

ABSTRACT

In today's consumer-driven world, people are faced with the problem of plenty. Choices abound everywhere, be it in movies, books or music. Recommender systems spare the user the frustration of searching for the proverbial needle in the haystack by offering recommendations based on a user's personal preferences. In this thesis, a generic framework and algorithms for building concept-based recommender systems are presented. A concept-based approach leverages the deep structure of a ratings database and reveals complex, higher-level inter-relationships between entities in the data. The algorithms encode user preferences from a ratings database into concepts using collaborative filtering, organize concepts into lattices efficiently, and enable fast querying of the lattices for recommendations. We apply our algorithms to two real-world datasets and demonstrate their capability to generate quality recommendations in real time.


ACKNOWLEDGMENTS

I would like to express my heartfelt gratitude to Dr. Raj Bhatnagar, my advisor, for being a constant source of motivation and for guiding me through this thesis. I would also like to thank Dr. Ali Minai and Dr. Carla Purdy for their valuable suggestions and feedback. Thanks are due to my friends for all their encouragement. And finally, this work is dedicated to my parents, Ms. Revathi and Mr. Narayanaswamy; I can only begin to be grateful for their unconditional love, support and everything else...

Contents

Table of Contents v

List of Figures viii

List of Tables x

1 Introduction 1

1.1 Recommender Systems ...... 1

1.1.1 Why Recommender Systems? ...... 2

1.1.2 How to make recommendations? ...... 3

1.2 Concept-based approach ...... 4

1.3 Problem Statement ...... 5

1.4 Contributions ...... 6

1.5 Approach to the problem ...... 7

2 Related Research 10

2.1 Information Filtering ...... 10

2.2 Collaborative Filtering ...... 12

2.3 Concept Based Information Retrieval ...... 13

3 Framework & Algorithms 14

3.1 The Framework ...... 14

3.2 Components of the Framework ...... 16

3.3 Concept Generation ...... 18

3.4 Algorithm for Concept Generation ...... 18

3.4.1 Description of the algorithm ...... 20

3.4.2 Illustration of the algorithm ...... 22

3.4.3 Complexity Analysis ...... 24

3.5 Concept Partition ...... 25

3.5.1 Algorithm for Concept Partition ...... 26

3.5.2 Description of the algorithm ...... 27

3.5.3 Illustration of the algorithm ...... 28

3.5.4 Complexity Analysis of the algorithm ...... 30

3.6 Lattice Generation ...... 31

3.6.1 Algorithm for lattice generation ...... 31

3.6.2 Description of the algorithm ...... 34

3.6.3 Illustration of the algorithm ...... 35

3.6.4 Correctness Proof of the algorithm ...... 38

3.7 Lattice Querying ...... 41

3.7.1 Algorithm for Lattice Querying ...... 42

3.7.2 Description of the algorithm ...... 43

3.7.3 Illustration of the algorithm ...... 44

4 A Joke Recommendation GUI 46

4.1 Components of the UI ...... 46

5 Case Study - Jester and MovieLens Datasets 52

5.1 Processing the Jester Dataset ...... 53

5.2 Processing the MovieLens Dataset ...... 54

5.3 Concept Generation - Identifying repeatedly rated jokes/movies . 54

5.3.1 Exploding space requirements in subspace clustering of SCuBA 55

5.3.2 Results - Basic pair-wise comparisons Vs Optimized algorithm 56

5.4 Timing the Model Building Algorithms ...... 59

5.5 Space Requirement for the Model Building algorithms ...... 62

5.6 Real-time performance of Algorithm: Lattice Discover ...... 64

5.7 An Improved Performance ...... 67

6 Conclusion and Future Direction 69

6.1 What’s good in a concept based approach? ...... 69

6.2 Future Direction and Caveats ...... 70

Bibliography 72

List of Figures

1.1 A Sample Concept Lattice ...... 5

3.1 A framework for concept-based recommender systems ...... 16

3.2 A sparse dataset and its associated lattice ...... 26

3.3 Lattice after combining levels 1 and 2 ...... 36

3.4 Lattice after combining levels 1, 2 and 3 ...... 37

3.5 Lattice after combining all levels from 1 through 5 ...... 39

3.6 Example of a typical lattice to search for making recommendations 44

4.1 Main Screen of the GUI Applet for Joke recommendation . . . . . 48

4.2 GUI for Joke Recommendation: User has already rated the following jokes: Joke #2, #5 and #6 and is currently viewing joke #7 ...... 49

4.3 GUI for Joke Recommendation: User is viewing recommendations. 50

4.4 The Clear All Button clears all ratings and recommendations and repopulates the Joke Database ...... 51

5.1 Naive Vs Optimized Concept Generation - Jester ...... 57

5.2 Naive Concept Generation - MovieLens ...... 57

5.3 Optimized Concept Generation - MovieLens ...... 58

5.4 Percentage Pruning: Naive Vs Optimized Concept Generation - Jester dataset ...... 58

5.5 Percentage Pruning: Naive Vs Optimized Concept Generation - MovieLens dataset ...... 59

5.6 Jester dataset: Time taken to generate concepts ...... 60

5.7 MovieLens dataset: Time taken to generate concepts ...... 60

5.8 Jester dataset: Time taken to generate lattices from concepts . . . 61

5.9 MovieLens dataset: Time taken for concept generation, partition and lattice building ...... 61

5.10 Jester dataset: Space required to generate concepts ...... 63

5.11 Movielens dataset: Space required to generate concepts ...... 63

5.12 Jester dataset: Space required to generate lattices from concepts . 64

5.13 Jester dataset: Precision Measurement ...... 66

5.14 Jester dataset: Recall Measurement ...... 66

5.15 Jester dataset: Query processing time ...... 67

5.16 MovieLens dataset: Precision Measurement ...... 68

List of Tables

3.1 A typical ratings database ...... 15

3.2 Concept Generation: Initial database ...... 22

3.3 Concept Generation: currentTuple = {1 2 3 4}. Iteration 1 . . . . 22

3.4 Concept Generation: currentTuple = {1 2 4}. Iteration 2 . . . . . 23

3.5 Concept Generation: currentTuple = {2 3 4}. Iteration 3 . . . . . 23

3.6 Concept Generation: currentTuple = {5 6 7}. Iteration 4 . . . . . 23

3.7 Concept Generation: currentTuple = {2 3}. Iteration 5 ...... 23

3.8 Concept Generation: currentTuple = {5 6}. Iteration 6 ...... 24

3.9 Concept Partition: After Sorting ...... 28

3.10 Concept Partition: currentConcept = {1 2 4}. Visited[1,2,4] = 0. Iteration 1. conceptSet[1] = (1 2 4) (2 4) (2) ...... 29

3.11 Concept Partition: currentConcept = {2 3 4}. Visited[2 3 4] = 0. Iteration 2. conceptSet[2] = (2 3 4) (2 3) (2 4) (2) ...... 29

3.12 Concept Partition: currentConcept = {5 6}. Visited[5 6] = 0. Iteration 5. conceptSet[3] = (5 6) (6) ...... 29

3.13 Concept Partition: currentConcept = {6 7}. Visited[6 7] = 0. Iteration 6. conceptSet[3] = (6 7) (6) ...... 30

3.14 After partitioning ...... 30

3.15 Lattice Generation: After applying splitByLength() ...... 36

Chapter 1

Introduction

1.1 Recommender Systems

In today's consumer-driven world, people are faced with the problem of plenty. Choices abound everywhere, be it in movies, books or music. Consumers are forced to sift through a number of choices before they discover what they need. This is often time-consuming and frustrating. Given today's fast-paced lifestyle, a slow and painstaking search for that elusive item of choice is surely not a sustainable option. People would rather look at items that are customized to their interests and preferences. The bottom line is that customers want personalization. But how do we achieve personalization? A good approach is to listen to people whose interests match ours and try the things they recommend [17]. Our work adopts this very spirit to make recommendations to users.

A recommender system is a platform for providing recommendations to users based on their personal likes and dislikes. The following definition from Wikipedia [1] describes a recommender system concisely and accurately.

Recommender systems are a specific type of information filtering (IF) technique that attempt to present to the user information items (movies, music, books, news, web pages) the user is interested in. To do this the user's profile is compared to some reference characteristics. These characteristics may be from the information item (the content-based approach) or the user's social environment (the collaborative filtering approach).

Let us try and understand the important elements of this definition. A recommender system is described as an information filtering technique that uses reference characteristics to group users together. In the following two sections, we explain how a recommender system functions as an information filtering technique and how reference characteristics can be used to make recommendations. We then briefly study concept-based knowledge structures called lattices. Section 1.3, titled Problem Statement, gives a precise definition of the problem we are attempting to solve. We then describe our expectations from this work and our contributions. Section 1.5 explains how recommender systems can be built through a concept-based framework. We conclude the chapter with a brief outline of the rest of the thesis.

1.1.1 Why Recommender Systems?

With the advent of the web, people have been engulfed in a sea of knowledge. Unfortunately, searching through this vast space of knowledge is time-consuming and frustrating. However, this does not diminish the utility of the web as a significant information source. Hence, massive efforts have been undertaken to present a user with the most pertinent information related to his/her search, as fast as possible. Search engines are a shining example of such endeavors. Search engines recommend web pages to users based on the information they seek. In today's world, we are flooded by choices in a wide variety of things, not just webpages. There are hundreds of thousands of movies, books, news articles and songs to choose from. Depending on our likes and dislikes, much of this is unwanted, irrelevant or redundant. Recommender systems help users wade through a complex sea of choices and show them items they may like. In this sense, recommender systems function as filters, filters that show us only what we desire! Examples of popular recommender systems are Pandora [2], MovieLens [3] and Reader's Robot [4].

1.1.2 How to make recommendations?

The fundamental problem in information filtering is computing whether a given item is likely to interest a user or not. The outcome of such a computation is either boolean, a crisp yes or no, or a score that represents the degree to which a person may like that item. Such a score helps determine whether an item should be recommended to a user. Quantifying a user's taste can be achieved through reference characteristics. Reference characteristics, such as an item's utility and its aesthetic value, are reasons why an item may appeal to a user. In machine learning, these reference characteristics are often referred to as features.

Based on the nature of the reference characteristics, two broad categories of information filtering exist, namely content-based filtering and collaborative filtering. The ideology of content-based filtering is that the content of an entity determines whether a given user likes it or not. Collaborative filtering, on the other hand, is a less descriptive approach. It does not strive to look at why someone likes a certain item. Since people who rate items in common share some underlying commonality, collaborative filtering groups these people together. Each group then consists of users and items that appeal to like-minded people, and a new user who joins a group is recommended items from its list.
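The grouping idea above can be sketched in a few lines of Python. All user names, item identifiers and the overlap threshold below are hypothetical, and real collaborative filtering systems use graded ratings and similarity measures rather than simple set overlap:

```python
# Sketch of the collaborative-filtering idea: users who rate enough items
# in common form a group, and the group's other items become candidate
# recommendations. All data below is hypothetical.

ratings = {
    "alice": {"item1", "item2", "item3"},
    "bob":   {"item1", "item2", "item4"},
    "carol": {"item5", "item6"},
}

def recommend(new_user_items, ratings, min_overlap=2):
    """Recommend items rated by users who share at least min_overlap
    items with the new user, excluding items the user already rated."""
    recs = set()
    for items in ratings.values():
        if len(items & new_user_items) >= min_overlap:
            recs |= items - new_user_items
    return recs

print(sorted(recommend({"item1", "item2"}, ratings)))  # ['item3', 'item4']
```

Here a new user who rated item1 and item2 matches the first two users and inherits their remaining items.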

1.2 Concept-based approach

A lattice is a structure that provides a natural way to formalize and study the ordering of objects. The ability to order items based on certain constraints is crucial to a recommender system, as it helps evaluate items and choose one recommendation over another. These constraints are usually dependent on the specificity of a user's query and his/her need for detail in the response. A lattice structure orders sets of items based on the level of detail (or granularity) of each concept; a concept with many items is highly specialized, while a concept with fewer items is comparatively more general. Such an ordering of items can guide the search for a recommendation appropriate to a user's needs.

Formally, a partially ordered set V is a lattice L = (V, ≤) when, for any two elements x and y in V, the supremum x ∨ y and the infimum x ∧ y always exist [24]. The infimum of two elements is the greatest common lower bound, while the supremum is the least common upper bound. A concept is a pair (O, A), where O is the set of objects that possess the properties represented by the attributes in set A, and A is the set of attributes that are possessed by the objects in set O. O is called the extent of the concept and A is called the intent of the concept [24]. A concept lattice is a lattice of concepts; each element of the lattice is a concept, and any two concepts are related based on their extent and intent. Since ordering is based on sets of attributes, concepts that express more attributes are considered specialized while concepts that contain fewer attributes are general. Based on this definition, the bottom-most node in the lattice is the most specialized concept while the top-most node is the most general concept. As we go from top to bottom, concepts show increasing specialization, while a bottom-up traversal shows increasing generalization. In recommender systems, the users are mapped as objects and the items they rate are mapped as attributes.

Figure 1.1: A Sample Concept Lattice

Let us consider a movie recommender system that adopts this approach. The concept lattice shown in Figure 1.1 contains a partial list of movies starring Tom Hanks, ordered in a lattice based on the genre of each movie. Each movie is an attribute, and a list of movies is referred to as a pattern. A concept is made up of a pattern, i.e., only attributes and no objects. {{1,2,3,4,5,6}, φ} is the most general concept in the lattice since it contains no attributes (genres, in this case). The concepts {{6}, {C,A,D}} and {{1,2,4}, {C,R,D}} are the most specialized since they belong to 3 genres. In this lattice, {{1,2,3,4,6}, {C}} is a parent of {{1,2,3,4}, {C,R}} since {1,2,3,4,6} ⊃ {1,2,3,4} and {C} ⊂ {C,R}. Similarly, {{6}, {C,A,D}} is a child of {{1,2,4,6}, {C,R}} since {6} ⊂ {1,2,4,6} and {C,A,D} ⊃ {C,R}.
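The parent/child relation in this example can be checked mechanically. The following sketch encodes a concept as an (extent, intent) pair of sets; the helper name `is_ancestor` is illustrative, and the set values are taken from the Figure 1.1 example above:

```python
# A concept as an (extent, intent) pair. Per the definition above, a parent
# has a strictly larger extent (more objects) and a strictly smaller
# intent (fewer attributes).

def is_ancestor(p, c):
    """True if concept p is above concept c in the lattice."""
    extent_p, intent_p = p
    extent_c, intent_c = c
    # frozenset ">" / "<" test proper superset / proper subset
    return extent_p > extent_c and intent_p < intent_c

parent = (frozenset({1, 2, 3, 4, 6}), frozenset({"C"}))
child  = (frozenset({1, 2, 3, 4}),    frozenset({"C", "R"}))

assert is_ancestor(parent, child)
assert not is_ancestor(child, parent)
```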

1.3 Problem Statement

Given databases of users and items, a ratings database consists of pairs of users and the items rated by them, along with additional information such as timestamps. Given such a ratings database and a set of preliminary ratings by a new user, the basic requirement of a recommendation system is to offer the largest and most pertinent set of recommendations to the new user. The largest set of recommendations refers to the maximum number of items in the items database that could be recommended to the user. The pertinent items are the ones that the new user is most likely to rate in the future.

1.4 Contributions

Little or no attention has been paid to the deep structure inherent in a ratings database and to how its organization may be exploited to enhance the quality of recommendations. By using a lattice to order items, we can evaluate candidate recommendations and choose one over another depending on the granularity of a user's query. For example, a user with very few ratings conveys little about his/her preferences and is given generalized recommendations, as opposed to a user with many ratings and thus a more refined set of preferences. In typical recommender systems, this may not be possible because the specificity of a user's query is often neglected while making recommendations.

One of the distinguishing capabilities of this approach is the extraction of latent higher level knowledge. Exploring pathways in a lattice of movies, for example, could reveal a structure of abstract ontological categories, such as movie genres, and interrelationships among genres. Such complex inter-relationships can be easily observed by adopting a concept-lattice based knowledge representation scheme. Such higher level patterns can then be coupled with other dimensions such as age, gender or ethnicity to further discern trends in user preferences.

Additionally, a hierarchical structure of concepts enables a quantitative evaluation of the usefulness of an item to a user. Finally, traversing up and down the lattice ensures that we are searching for items based on a common thread/theme, and choosing one item over another does not violate this theme. This is in contrast to cluster-based models, wherein sets of items corresponding to varied themes may be aggregated due to partial overlap of items; thus, choosing one set of items over another does not guarantee similar properties. Typical model-based approaches do not use hierarchical structures and may not be able to achieve the capabilities of our approach.

Although concept lattice-based information retrieval (IR) schemes have been proposed earlier [5], they rely on annotations using keywords to index items. We cannot adopt these techniques because collaborative filtering avoids using features/descriptions for each item in a database. Also, the existence of ratings databases in different realms of life creates a need for a generic approach that can operate in a domain-independent manner. We strive to address these inadequacies in our two-fold contribution. First, we present a unified approach to concept-based knowledge discovery using a generic and flexible four-component framework to generate recommendations from a ratings database. Second, we present algorithms that can efficiently convert user preferences into concepts, organize them into lattices, and then query the lattices for recommendations. We then present results of applying our algorithms to two different real-world datasets and demonstrate improvement compared to other approaches.

1.5 Approach to the problem

We adopt a collaborative filtering approach in this work. It rests on the following assumption: people have an underlying reason for liking an item. If a number of people like a certain item, then they have matching interests at least with respect to that item. If a certain group of people like a number of items in common, they have some underlying generalized preferences in common that are manifested through their choice of these items. Now, given a new user's choice of items, we identify groups of users whose ratings match our user, and then recommend to the new user other items that each group has reviewed/rated/enjoyed/liked.

In order to implement such a system, we adopt a concept-based framework. Entities in the data are represented as concepts, while their characteristics are represented as attributes of these concepts. The concepts are then organized into a lattice of concepts where parent-child relationships exist between any two concepts. In this work, each concept is a repeated pattern of rated items from a ratings database. A pattern consists of a set of items, each of which is an attribute of that pattern.

Consider a movie recommender system that adopts this approach. Now, each concept in the lattice is a pattern consisting of a list of movies that have been rated by a significant number of users. Each attribute of a concept is a movie from the pattern contained in the concept. For example, a concept C may consist of the following pattern (a list of famous World War II movies):

(Tora Tora Tora, Pearl Harbor, Enemy at the Gates, Saving Private Ryan, The Thin Red Line, U-571, The Dirty Dozen)

Each movie is an attribute, while the entire list of movies is referred to as a pattern. A concept is made up of a pattern, i.e., only attributes and no objects. In this setup, a concept c1 is a parent of c2 if it contains a subset of the items present in c2. Similarly, a concept c2 is a child of c1 if it contains a superset of the items present in c1. Once such a lattice is constructed with all parent-child relationships, a new user's ratings are looked up in the lattice and the closest matching concept (pattern) is found. Recommendations consist of those items that are in the pattern and not already rated by the new user.
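A simplified sketch of this lookup, under the assumption that concepts are plain sets of titles and that "closest matching" just means largest overlap; the thesis's optimized algorithm instead walks the lattice structure, and `query` and the patterns below are illustrative only:

```python
# Find the pattern that best overlaps a user's rated items and recommend
# its unrated items. A linear scan stands in for the real lattice search.

def query(patterns, user_rated):
    best = max(patterns, key=lambda p: len(p & user_rated))
    return best - user_rated  # items of the matched pattern not yet rated

c1 = {"Tora Tora Tora", "Pearl Harbor", "Saving Private Ryan"}
c2 = {"Tora Tora Tora", "Saving Private Ryan"}

recs = query([c1, c2], {"Tora Tora Tora", "Saving Private Ryan"})
assert recs == {"Pearl Harbor"}
```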

For example, a concept C1 consisting of (Tora Tora Tora, Pearl Harbor, Enemy at the Gates, Saving Private Ryan, The Thin Red Line, U-571, The Dirty Dozen) is a child of concept C2 consisting of (Tora Tora Tora, Saving Private Ryan). Clearly, C2 is a parent of C1.

In the following chapter, we explore prior work in the area of recommender systems and the application of concept-based approaches to this problem. In Chapter 3, we describe the concept-based framework for recommender systems. We also study a set of algorithms that may be used by the components of the framework. We look at an applet-based Graphical User Interface for recommendation in Chapter 4. Chapter 5 presents the results of applying the proposed framework and algorithms on the Jester and MovieLens datasets. Finally, Chapter 6 offers conclusions and future directions for research in this area.

Chapter 2

Related Research

2.1 Information Filtering

Malone et al. [7] describe three different categories of information filtering, namely content-based, social and economic, used to predict a user's response to an article. As described earlier, content-based filtering is concerned with filtering information based on its content. All keyword-based search algorithms employ this approach. For example, early search engines used string search algorithms on documents annotated with a set of keywords, and any search query would look up this keyword dictionary to determine whether a document was relevant to the search or not. The basic keyword search was later augmented with complex functions such as weight vectors to filter out irrelevant documents [8][9].

Social filtering uses people's subjective judgment to identify interesting content or filter out objectionable content. Also known as collaborative filtering, it groups users and/or items based on the reasoning that people with similar preferences like similar items [17]. For instance, successful e-commerce websites such as Amazon.com recommend additional items to customers using the past history of similar customers. People in a group share similar preferences, and if a new member joins the group, his/her interests are extrapolated from the interests of the group. Collaborative filtering does not strictly require a group to define preferences, since a moderator can be nominated and the group's preferences defined based on the moderator's preferences.

An early system that implemented both content and collaborative filtering was Tapestry [10]. Tapestry was developed at Xerox PARC and was intended for mail filtering. The effort was to augment an existing content-based system with the power of collaborative filtering. Emails at Xerox PARC were being filtered based on their content, but the addition of people's reactions to reading attachments helped determine whether people found an attachment useful or not. All system users were set up with mail filters, and a user's filter could access others' filters. Filtering could now be done based on the content of an incoming document as well as on whether other receivers found it useful. Another application used collaborative filtering for Usenet News filtering [18].

GroupLens [11] improves on the work in Tapestry by adding two important components. They use a client-server model to support evaluations from multiple sites. They also support aggregate evaluations, wherein past correlations to recommendations are considered for new recommendations.

In the rest of this chapter, we shall examine approaches to collaborative filtering in particular.

2.2 Collaborative Filtering

Breese et al. [12] take a comprehensive look at some of the early measures used to determine the closeness of two users based on the items they rate. They subdivide collaborative filtering methods into memory-based methods and model-based methods. In a memory-based approach, recommendations are obtained by aggregating the ratings of similar users. A number of similarity measures have been proposed to group similar users. Popular measures include cosine-based similarity [28], the Pearson correlation coefficient [27] and, later, extensions such as default voting and case amplification [12]. In model-based approaches, many probabilistic [25][26] and clustering [29] techniques have been employed to represent user preferences.
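As an illustration of the memory-based measures mentioned above, a minimal cosine similarity between two users' rating vectors over co-rated items; the vectors are hypothetical, and Pearson correlation would additionally mean-center each user's ratings:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(round(cosine([5, 3, 4], [5, 3, 4]), 6))  # identical ratings -> 1.0
```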

Pennock et al. [13] describe a hybrid of model- and memory-based methods in which a personality diagnosis measure is used to probabilistically determine the personality type of a user with respect to other users, and also whether they will like a recommendation or not. Our approach is similar in spirit to Pennock and others in the sense of determining closeness to a given user. However, we differ in that we do not explicitly employ a probabilistic measure but use a lattice-based knowledge representation scheme. An interesting approach called Horting was proposed by Wolf and others [20], who use a graph-theoretic approach to collaborative filtering. This approach offers the unique advantage of finding transitive relationships between ratings by traversing the graph.

Sarwar and others look at singular value decomposition (SVD) as a means to reduce the size of the ratings database. Clearly, the number of users is growing at a tremendous rate, and scalability of the algorithms is a vital factor. The authors report that for extremely sparse datasets, the performance of basic collaborative filtering approaches is far superior; however, for denser datasets, SVD provides a scalable alternative [14]. Maltz and Ehrlich study the possibility of active information filtering, wherein users proactively send pointers to others when they find interesting articles, movies, etc. [21].

In more recent work, Herlocker and others have undertaken an in-depth survey of the evaluation schemes for recommender systems [19]. Efforts have also been made to design interfaces for users who only occasionally connect to recommender systems. The scenario discussed is that of a user wondering what movie to rent for the night at a video store. With no recommender system close at hand, should the user return empty-handed? The authors explore the possibility of a recommender system on a PDA [15].

2.3 Concept Based Information Retrieval

Much of the work in concept-based information retrieval can be found in the natural language processing realm. Priss uses concept-based lattices for document retrieval [5] [30] [31] [32], adopting a facet-driven approach wherein domain knowledge is used to encode conceptual relationships between objects in the domain. For instance, in document retrieval, each document is represented by a set of keywords or index terms, and queries are matched against these keywords to identify relevant documents. We cannot adopt these techniques because collaborative filtering avoids using keywords/descriptions for each item in a database.

Chapter 3

Framework & Algorithms

In this chapter, we describe the framework for a formal-concept-based recommendation system. As we observed in earlier chapters, formal concept analysis involves representing information in the form of lattices and using the hierarchical structure of lattices to discover knowledge from data. The proposed framework provides a means to process raw data into concepts, convert them into knowledge lattices, and query the lattices for recommendations. We also propose algorithms that have been optimized to achieve these tasks.

3.1 The Framework

In recommendation systems, raw data is usually in the form of user ratings. A user, in this context, is one who rates items of his/her choice from an item database. These items are typically movies, music, articles, research papers, consumer electronic goods, software, among others. A rating is usually some ordinal quantity. For example, users may assign a numerical score or a letter grade (having some defined ordering) to the items in the database. As an example, consider a movie database such as IMDb. A user is a typical movie watcher who wishes to rate some of the movies he/she has viewed in the past. They may give each movie a rating between 1 and 5; 1 for poor and 5 for excellent. Each user may rate multiple movies, and any movie may be rated by multiple users. Thus, there is an inherent many-to-many relationship between items and users. Without loss of generality, we assume a format such as the one in Table 3.1.

Table 3.1: A typical ratings database

User-Id  Item-Id  Score
1        5        3
1        6        2
1        7        1
2        5        5
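In code, a database in the Table 3.1 format might be held as (user, item, score) tuples and grouped per user; a sketch using the table's four rows (the representation is illustrative, not the thesis's data structure):

```python
from collections import defaultdict

# Rows of Table 3.1 as (User-Id, Item-Id, Score) tuples.
rows = [(1, 5, 3), (1, 6, 2), (1, 7, 1), (2, 5, 5)]

by_user = defaultdict(dict)
for user, item, score in rows:
    # Many-to-many relation: a user may rate many items,
    # and an item may be rated by many users.
    by_user[user][item] = score

assert by_user[1] == {5: 3, 6: 2, 7: 1}
assert by_user[2] == {5: 5}
```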

The first step towards mining using a concept-based approach is to identify concepts in the domain. Once concepts are identified, the next step is to organize the concepts in the form of a lattice. The final step is to query the lattice for information. The focus of this thesis is to present this framework and a set of algorithms that can be used by components of the framework in order to produce good recommendations. We have also proposed metrics that quantify a good recommendation, taking into account both the pertinence of the recommendation as well as its coverage over the possible recommendations.

Figure 3.1: A framework for concept-based recommender systems

3.2 Components of the Framework

Consider the framework in Figure 3.1. It has four main components. Each component is discussed below:

• Optimized Concept Generation: This component achieves the first step of the process of discovering knowledge from ratings databases, namely, converting the user ratings into concepts. Our algorithm improves upon the basic subspace clustering algorithm presented in SCuBA by Agarwal et al. [16]. The basic algorithm of SCuBA accumulated a large number of patterns rated by an insignificant fraction of users. The pruning step was applied after all comparisons were made, and hence all the generated concepts had to be held in memory until the final step. We have overcome this space inefficiency by optimizing the comparison process: we prune patterns below a certain number of ratings at the end of each iteration of the pair-wise comparison process. We have achieved a 60-70% reduction in the space required to store the generated concepts (see Section 5.3.1 for experimental results). Although itemset mining algorithms such as APriori (frequent itemsets) [33] and MAFIA (maximal itemsets) [34] could be applied, these algorithms are designed to be complete (to discover all possible itemsets). Since we do not require a complete algorithm, we chose to implement a pair-wise comparison based approach to discover repeated patterns of items.

• Concept Partition: In sparse databases, generated concepts may have few items in common, and so placing them in the same lattice may not be possible if parent concepts are to be strict subsets of child concepts. The optimized lattice search algorithm presented in Section 3.7 requires that this parent-child relationship be preserved; forcing dissimilar concepts into the same lattice would violate this relationship and preclude the optimization achieved by our algorithm. The presence of disconnected concepts in a lattice would have undesirable consequences such as slower lattice generation and querying, and inaccurate recommendations. This component partitions concepts into multiple subsets of concepts, each belonging to a different lattice.

• Lattice Generation: Once concepts are identified, we need an efficient means to organize them into lattices. This component builds a lattice from a list of concepts.

• Lattice Querying: Once the lattice is built, we can query the lattice for information. In this case, we can query the lattice for a recommendation based on the past history (previous ratings) of a user. This component uses algorithms that can discover recommendations efficiently from large lattices.

In the following sections, we present algorithms that are designed for speed and space efficiency.

3.3 Concept Generation

Given a ratings database, the following algorithm converts user ratings into concepts. Each concept is a set of items that meets a support threshold δ. The support threshold determines the minimum number of occurrences of a pattern needed to deem it frequent. Concepts are treated as sets, i.e., they are ordered based on the number of items rated and the actual items themselves, not on the sequence in which the items occur. For example, given three concepts A = {1, 2, 3, 4}, B = {2, 3, 4} and C = {2, 3}, the subset ordering is C < B < A.
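As a small sketch (in Python, which is not the thesis's implementation language; the names A, B, C simply mirror the example above), treating concepts as sets makes the subset ordering directly checkable:

```python
# Concepts behave as sets: the order in which items were rated is
# irrelevant, and the subset ordering falls out of set comparison.
A = frozenset({1, 2, 3, 4})
B = frozenset({2, 3, 4})
C = frozenset({2, 3})

assert C < B < A                # strict subset chain: C below B below A
assert frozenset({3, 2}) == C   # item sequence does not matter
```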

3.4 Algorithm for Concept Generation

Algorithm conceptGenerate

Input: A database of user ratings

Output: A list of patterns that occur frequently in the database

1. frequentPatterns ←[]

2. Sort the database in the decreasing order of the number of ratings

3. sizeOfDatabase ←size of the database

(∗ get the number of tuples in the database ∗)

4. Clear localHashTable

(∗ the key of the hash is the pattern, the value is its count ∗)

5. Clear globalHashTable

(∗ the key of the hash is the pattern, the value is its count ∗)

6. currentLength ←-1

7. previousLength ←-1

8. for i ← 1 to sizeOfDatabase − 1
9. do

10. currentTuple ←database[i]

11. currentLength ←length of currentTuple

12. for j ←i+1 to sizeOfDatabase − 1

13. do

14. compareTuple ←database[j]

15. intersect ←currentTuple ∩ compareTuple

16. if intersect exists in localHashTable

17. then increment the count of the pattern

18. else insert the pattern into localHashTable and initialize

count to 1

19. endif

20. endfor

21. if previousLength IS NOT EQUAL TO currentLength

22. then

23. for k ← 1 to sizeOfGlobalHashTable

24. do

25. if pattern length > currentLength and support for pattern is less than δ

26. then remove this pattern

27. endif

28. endfor

29. previousLength ←currentLength

30. endif

31. Update globalHashTable with contents of localHashTable

32. Clear localHashTable

33. endfor

34. sizeOfGlobalHash ←size of global hashtable

35. for i ← 1 to sizeOfGlobalHash

36. do

37. if globalHashTable[i].count ≥ δ
38. then Add globalHashTable[i] to frequentPatterns

39. endif

40. endfor

3.4.1 Description of the algorithm

The concept generation algorithm (conceptGenerate) has been described in the previous section. This algorithm is based on the subspace clustering algorithm proposed by Agarwal et al. [16]. The central idea of the subspace clustering algorithm of SCuBA is pair-wise comparison between adjacent rows of a ratings database.

The algorithm computes the intersecting ratings (called patterns) for each such comparison. The algorithm keeps track of the number of occurrences of each pattern and uses a minimum threshold to determine useful patterns. This can be viewed as a subspace clustering approach, since each item in a user's ratings is a dimension, and by computing pair-wise intersections we compute the subspaces that are of interest.

However, the number of patterns grows rapidly with the number of users in the ratings database. A large percentage of the generated patterns have very little support, yet they are held in memory and pruned only after all iterations are completed. In this work, we have proposed an optimization that provides significant improvement over the existing algorithm (see the next chapter for experimental results). Instead of computing row-wise intersections directly from the ratings database, we sort the users in the decreasing order of the number of items rated. Then we can use a simple property to prune the global hash table after each iteration and moderate its size. The pruning is based on the following property:

Property: Given two sets of items S1 and S2, the cardinality of their intersection is always less than or equal to the cardinality of the smaller of the two sets. Formally, given S1 and S2, if |S1| ≤ |S2| then |S1 ∩ S2| ≤ |S1|. The ratings database is initially sorted in the decreasing order of the number of ratings (Line 2). Next, we proceed to perform pair-wise comparisons. To do pair-wise comparisons, we need to execute a dual for-loop. This is shown on Lines

8 and 12. The outer for-loop uses the variable i while the inner loop uses j. The function of the outer loop is to traverse all the elements in the list, while the inner loop computes the intersection (called a pattern) of each rating with the other ratings in the database. Each step of the inner loop updates a local hash table with a new pattern (Line 18) or increments the count of an existing pattern (Line

17).

If the length of the current rating is the same as the next, then the local cache (local hash table) is copied into the global hash table and then emptied (Lines 31 & 32). If the size of the current rating is larger than that of the next rating, then we can prune those patterns whose length is greater than or equal to the length of the next rating if they do not meet the minimum support threshold δ (Lines 21-30). This is because future iteration elements (ratings) cannot generate concepts whose length is greater than or equal to their own length (refer to the aforementioned property). Note that sorting in the descending order of length is vital here, since it facilitates the optimization.

3.4.2 Illustration of the algorithm

Given a database D in Table 3.2 and δ = 2.

Table 3.2: Concept Generation: Initial database

1 2 3 4
1 2 4
2 3 4
5 6 7
2 3
5 6
6 7

Each table from 3.3 to 3.8 has three rows. Rows 1 and 2 are the ratings from the database that are compared. Row 3 is the result of the intersection.

After Iteration 1 (shown in Table 3.3), all concepts of length 3 or more can be pruned, since the length of the next element {1 2 4} is 3 and it cannot find overlaps of length 3 or more. In this case, only {2 3 4} and {1 2 4} qualify, but neither meets the δ = 2 condition and hence both are pruned.

Now, after Iteration 4 (shown in Table 3.6), all concepts of length 2 or more can be pruned, since the length of the next element {6 7} is 2 and it cannot find overlaps of length 2 or more. In this case, {2 3}, {2 4}, {5 6}, {6 7} qualify, but only {2 3} meets the δ = 2 condition and hence all the others are pruned. After the final pruning, only ({φ}, 6) and ({2, 3}, 2) remain.

Table 3.3: Concept Generation: currentTuple = {1 2 3 4}. Iteration 1

{1 2 3 4} | {1 2 3 4} | {1 2 3 4} | {1 2 3 4} | {1 2 3 4} | {1 2 3 4}
{1 2 4}   | {2 3 4}   | {5 6 7}   | {2 3}     | {5 6}     | {6 7}
{1 2 4}   | {2 3 4}   | φ         | {2 3}     | φ         | φ

22 Table 3.4: Concept Generation: currentTuple = {1 2 4}. Iteration 2

{1 2 4} | {1 2 4} | {1 2 4} | {1 2 4} | {1 2 4}
{2 3 4} | {5 6 7} | {2 3}   | {5 6}   | {6 7}
{2 4}   | φ       | {2}     | φ       | φ

Table 3.5: Concept Generation: currentTuple = {2 3 4}. Iteration 3

{2 3 4} | {2 3 4} | {2 3 4} | {2 3 4}
{5 6 7} | {2 3}   | {5 6}   | {6 7}
φ       | {2 3}   | φ       | φ

Table 3.6: Concept Generation: currentTuple = {5 6 7}. Iteration 4

{5 6 7} | {5 6 7} | {5 6 7}
{2 3}   | {5 6}   | {6 7}
φ       | {5 6}   | {6 7}

Table 3.7: Concept Generation: currentTuple = {2 3}. Iteration 5

{2 3} | {2 3}
{5 6} | {6 7}
φ     | φ

23 Table 3.8: Concept Generation: currentTuple = {5 6}. Iteration 6

{5 6}
{6 7}
φ

Ignoring φ, frequentPatterns = ({2, 3}, 2)

3.4.3 Complexity Analysis

Average Space Complexity of Concept Generation

Given a database of n users,

Average Case assumption: Suppose each user is compared with m other users, producing m candidate patterns. Given a support threshold δ, assume that m/2 patterns have support greater than δ. Total number of patterns:

= (n−1)/2 + (n−2)/2 + (n−3)/2 + ... + (n−(n−1))/2
= [(n−1) + (n−2) + (n−3) + ... + (n−(n−1))] / 2
= n(n−1)/4
≈ O(n²)

Worst Case assumption: Suppose each user is compared with m other users, producing m candidate patterns. Given a support threshold δ, assume that all m patterns have support greater than δ. Total number of patterns:

= (n−1) + (n−2) + (n−3) + ... + (n−(n−1))
= n(n−1)/2
≈ O(n²)

Average Time Complexity of Concept Generation

The average time complexity is independent of the nature of the data. All the pair-wise comparisons need to be done to determine whether patterns meet the support threshold δ. Hence the time complexity is always O(n²), where n is the number of users.
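The quadratic count of pair-wise comparisons can be sanity-checked directly (a trivial Python check, not part of the thesis):

```python
from itertools import combinations

# n users yield n(n-1)/2 pair-wise comparisons, i.e. O(n^2).
for n in (2, 7, 100):
    assert len(list(combinations(range(n), 2))) == n * (n - 1) // 2
```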

3.5 Concept Partition

This component partitions a list of concepts into smaller lists of disjoint concepts.

The case for multiple lattices

To illustrate the need for concept partitioning, let us consider the sparse dataset shown in Figure 3.2. By applying the algorithm described in the previous section, we can generate the following concepts: (E), (A B), (D E), (C D E), (A B C).

If we apply the requirement that all child concepts be strict supersets of their parents (w.r.t. rated items), there are two disconnected segments in this lattice: {(A B C), (A B)} and {(E), (D E), (C D E)}. Clearly, these two subsets of concepts belong to different lattices, and this algorithm partitions them.

Figure 3.2: A sparse dataset and its associated lattice

3.5.1 Algorithm for Concept Partition

Algorithm conceptPartition

Input: A list of unique patterns

Output: Groups of concepts/patterns that belong to disjoint lattices.

1. SizeOfConcepts ←number of unique patterns in frequentPatterns

2. conceptSet ←[ ][ ]

3. visited[SizeOfConcepts] ←0

4. Sort frequentPatterns in the decreasing order of concept length.

5. for i ← 1 to SizeOfConcepts

6. if visited[i] EQUALS 1

7. then continue; endif

8. currentConcept ←frequentPatterns [i]

9. Clear conceptSet[i][]

10. do

11. count ←0

12. for j ←i+1 to SizeOfConcepts

13. do
14. compareConcept ←frequentPatterns[j]
15. if compareConcept ⊂ currentConcept
16. then subset ←true
17. else subset ←false

18. endif

19. if subset equals true

20. then visited[j] ←1

21. conceptSet[i][count] ←compareConcept

22. Increment count

23. endif

24. endfor

25. endfor

3.5.2 Description of the algorithm

The basic idea of the algorithm is the following. Given a list of concepts, we first sort it in the descending order of concept length (Line 4). This way we ensure that the longest concepts are considered first by the algorithm. This is important because the aim of the algorithm is to identify subsets of the largest concepts in the list and group them into one lattice. Once sorted, we begin by looking at each element E (referred to as currentConcept) in the list (Line 5) and identifying all concepts M that occur after it (Line 12) and are subsets of E (Lines 15-18).

A concept C1 with items {i1,...,ik} is a subset of a concept C2 with items {i1,...,im} iff {i1,...,ik} ⊂ {i1,...,im}. All those concepts M (including E) are then grouped under one partition (Line 21). Each partition is represented by a row of the variable conceptSet. We use a variable called visited to ensure that a concept which is already a subset of an existing partition's concepts does not seed a new partition of its own. If a node is a subset, it is marked as visited (Line 20) and will not be considered for creating a new partition. This is done in Line 6: if a node is visited, the loop simply continues to the next node. In the following example, Iterations 3 and 4 illustrate this situation. The nodes {2 3} and {2 4} have been visited during previous iterations (Iterations 1 and 2), so they are not considered for creating a new partition and the algorithm continues with the next available node.
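The partitioning step can be sketched as follows (an illustrative Python rendering, not the thesis code; note that, as in the worked example, a subset may legitimately appear in several partitions, while the visited set only prevents it from seeding a new one):

```python
def concept_partition(concepts):
    # Sort in decreasing order of length so that maximal concepts seed
    # the partitions (Line 4 of the pseudocode).
    concepts = sorted(map(frozenset, concepts), key=len, reverse=True)
    visited = set()
    partitions = []
    for i, head in enumerate(concepts):
        if head in visited:        # a visited concept never seeds a partition
            continue
        group = [head]
        for other in concepts[i + 1:]:
            if other < head:       # proper subset of the seed concept
                visited.add(other)
                group.append(other)  # subsets may appear in several groups
        partitions.append(group)
    return partitions

# The eight concepts of the worked example split into four lattices.
concepts = [{1, 2, 4}, {2, 3, 4}, {2, 3}, {2, 4}, {5, 6}, {6, 7}, {6}, {2}]
print(len(concept_partition(concepts)))   # 4
```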

3.5.3 Illustration of the algorithm

Table 3.9: Concept Partition: After Sorting

1,2,4
2,3,4
2,3
2,4
5,6
6,7
6
2

Each table from 3.10 to 3.13 has three rows. Rows 1 and 2 are concepts. Row 3 is the result of determining whether row 2 is a subset of row 1 (true) or not (false).

Table 3.10: Concept Partition: currentConcept = {1 2 4}. Visited[1 2 4] = 0. Iteration 1. conceptSet[1] = (1 2 4) (2 4) (2).

{1 2 4} | {1 2 4} | {1 2 4} | {1 2 4} | {1 2 4} | {1 2 4} | {1 2 4}
{2 3 4} | {2 3}   | {2 4}   | {5 6}   | {6 7}   | {6}     | {2}
false   | false   | true    | false   | false   | false   | true

Table 3.11: Concept Partition: currentConcept = {2 3 4}. Visited[2 3 4] = 0. Iteration 2. conceptSet[2] = (2 3 4) (2 3) (2 4) (2)

{2 3 4} | {2 3 4} | {2 3 4} | {2 3 4} | {2 3 4} | {2 3 4}
{2 3}   | {2 4}   | {5 6}   | {6 7}   | {6}     | {2}
true    | true    | false   | false   | false   | true

Iteration 3: Concept Partition: currentConcept = (2, 3). Visited[2 3]

= 1. CONTINUE

Iteration 4: Concept Partition: currentConcept = (2, 4). Visited[2 4]

= 1. CONTINUE

Table 3.12: Concept Partition: currentConcept = {5 6}. Visited[5 6] = 0. Iteration 5. conceptSet[3] = (5 6) (6)

{5 6} | {5 6} | {5 6}
{6 7} | {6}   | {2}
false | true  | false

Table 3.13: Concept Partition: currentConcept = {6 7}. Visited[6 7] = 0. Iteration 6. conceptSet[4] = (6 7) (6)

{6 7} | {6 7}
{6}   | {2}
true  | false

Iteration 7: Concept Partition: currentConcept = (6). Visited[6] = 1.

CONTINUE

Iteration 8: Concept Partition: currentConcept = (2). Visited[2] = 1.

CONTINUE

Table 3.14: After partitioning

Lattice 1: (1 2 4) (2 4) (2)
Lattice 2: (2 3 4) (2 3) (2 4) (2)
Lattice 3: (5 6) (6)
Lattice 4: (6 7) (6)

3.5.4 Complexity Analysis of the algorithm

In the worst-case situation, all concepts are disjoint w.r.t. each other, and we would need to iterate over the entire list of n concepts. Hence the time complexity would be O(n²).

Proof: Given a list of n concepts, we need to calculate the time complexity of the algorithm. In the worst case, each concept is disjoint from every other concept. Hence the total number of comparisons equals:

= (n−1) + (n−2) + (n−3) + ... + (n−(n−1))
= n(n−1)/2
≈ O(n²)

However, if the data has many concepts that are linked to each other, then after every iteration many concepts would be marked as visited and hence would not be considered for finding subsets.

3.6 Lattice Generation

This component helps build lattices from a list of concepts.

3.6.1 Algorithm for lattice generation

Algorithm latticeGenerate

Input: A list of unique concepts conceptList

Output: A lattice of concepts

1. Sort the concepts in increasing order of concept length; alternatively, start the algorithm from the bottom of the list if the list was obtained from the conceptPartition algorithm.

2. largestLength ←length of the longest concept in conceptList.

3. conceptSize ←sizeOf(conceptList)

4. conceptsBySize ←splitByLength(conceptList)

5. currentLattice ←[]

6. for i ← 1 to largestLength

7. do

8. currentLattice ←combineLevels(currentLattice, conceptsBySize,i)

9. endfor

10. return currentLattice

11.

12. splitByLength(conceptList)

Input: A list of unique concepts conceptList

Output: Split concepts by their length

13. largestLength ←length of the longest concept in conceptList

14. conceptSize ←size of conceptList

15. Initialize conceptsBySize[largestLength][ ]

16. currentLength ←1

17. count ←0

18. for i ← 1 to conceptSize

19. do

20. if length of conceptList[i] equals currentLength

21. then conceptsBySize[currentLength][count] ←conceptList[i]

22. Increment count

23. else Increment currentLength

24. count ←0

25. conceptsBySize[currentLength][count] ←conceptList[i]

26. endif

27. endfor

28. return conceptsBySize

29. end splitByLength

30.

31. combineLevels(currentLattice, conceptsBySize, nextLevel)

Input: A lattice,the list of all concepts available, and the next level

Output: A lattice with concepts from the next level added to the input lattice

32. numberConcepts ←number of the concepts at next level of conceptsBySize

33. levelConcepts ←[]

34. for i ← 1 to numberConcepts

35. do

36. newNode ←conceptsBySize[nextLevel][i]

37. currentLevel ←nextLevel - 1

(∗ start from the bottommost level of the current lattice ∗)

38. elusiveConcepts ←newNode

39. checkContains ←newNode

40. while elusiveConcepts is not empty and currentLevel is greater than

0

41. do

42. levelConcepts ←conceptsBySize[currentLevel]

43. numConcepts ←number of concepts in levelConcepts

44. for j ← 1 to numConcepts

45. do
46. if levelConcepts[j] ⊂ newNode and levelConcepts[j] ∩ checkContains is not NULL

47. then add parent child relationship for levelConcepts[j]

and newNode

48. elusiveConcepts ←elusiveConcepts ∩ (newNode − levelConcepts[j])

49. Increment count

50. endif

51. endfor

(∗ checkContains ensures that cyclic parent redundancy is avoided ∗)

52. checkContains ←elusiveConcepts

53. currentLevel ←currentLevel - 1

54. count ←0

55. endwhile

56. endfor

57. end combineLevels

3.6.2 Description of the algorithm

The basic idea of the algorithm is to build the final lattice incrementally. In order

to achieve incremental building, concepts are first partitioned into groups having an equal number of ratings. This is done using the splitByLength() method.

The splitByLength() method employs a variable called conceptsBySize, a two-dimensional array. Each row of this array consists of a list of concepts (patterns) of equal length, while any two different rows contain concepts (patterns) of differing lengths.

The method goes through all the concepts in the list (Line 18) and places each

concept in the appropriate row of conceptsBySize (Lines 21 & 25). Each row of

conceptsBySize is then added to a growing lattice in the increasing order of pattern

length (which equals the number of items in a concept/pattern).

The primary requirement of a lattice building algorithm is to avoid cyclic

redundancy. Cyclic redundancy occurs when ancestors of a node are also added

as immediate parents. For instance, consider three concepts A, B and C, where A is a child of B and B is a child of C. We need to ensure that only B is listed as a parent of A, and not C, although C is a valid ancestor of A. This is done in order to ensure that the lineage of a node is traced along a unique path. In order to ensure that there is no cyclic redundancy, Aparna [6] relies on an explicit ancestor search approach that marks all ancestors of a new node as not being parents.

The motivation behind this approach is to avoid cyclic redundancy implicitly as opposed to the exhaustive and time consuming method proposed in Aparna’s thesis. By merging levels incrementally, we need not do an exhaustive search of the lattice each time. The merging process is done by the combineLevels() method

(Line 8).

The combineLevels() method goes through all the concepts in the current level that need to be added to the lattice (Line 34). The algorithm merges the new group of concepts, each l ratings long, with the bottom of the lattice while keeping track of the items in the new concept that do not have a parent yet. These items are called elusiveConcepts (Line 48). When a parent is found, elusiveConcepts is recomputed to exclude the newly found parent's items. If, after merging with the bottom level, elusiveConcepts is not empty, we try merging with one level higher, and so on, until the top of the lattice is reached or until elusiveConcepts becomes empty (Line 40).
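The level-wise merge with elusiveConcepts and checkContains can be sketched as follows (a hypothetical Python rendering of latticeGenerate/combineLevels; "parents" maps each concept to its immediate parents only, with ancestors skipped implicitly):

```python
def build_lattice(concepts):
    # Group concepts by length: shortest level first (splitByLength).
    by_len = {}
    for c in map(frozenset, concepts):
        by_len.setdefault(len(c), []).append(c)
    levels = [by_len[k] for k in sorted(by_len)]
    parents = {}
    for li, level in enumerate(levels):
        for node in level:
            parents[node] = []
            elusive = check = node      # items of node still lacking a parent
            for lower in reversed(levels[:li]):   # walk down from the level below
                for cand in lower:
                    # cand is an immediate parent if it is a proper subset of
                    # node and still covers some item that had no parent at
                    # the previous level (the checkContains test).
                    if cand < node and cand & check:
                        parents[node].append(cand)
                        elusive = elusive & (node - cand)
                check = elusive         # refresh checkContains per level
                if not elusive:
                    break               # every item of node has a parent
    return parents

# The concepts of Table 3.15; parents of (1 2 3 4 5 6) come out as
# (1 2 3), (2 3 4), (4 5) and (5 6), as in the worked example.
cs = [(1,), (2,), (3,), (4,), (5,), (6,), (7,),
      (2, 3), (4, 5), (5, 6), (1, 2, 3), (2, 3, 4),
      (1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6, 7)]
p = build_lattice(cs)
print(sorted(map(sorted, p[frozenset({1, 2, 3, 4, 5, 6})])))
# -> [[1, 2, 3], [2, 3, 4], [4, 5], [5, 6]]
```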

3.6.3 Illustration of the algorithm

After applying splitByLength(), Table 3.15 shows the concepts grouped by length.

Table 3.15: Lattice Generation: After applying splitByLength()

Level 1: (1) (2) (3) (4) (5) (6) (7)
Level 2: (2 3) (4 5) (5 6)
Level 3: (1 2 3) (2 3 4)
Level 4: (1 2 3 4 5 6)
Level 5: (1 2 3 4 5 6 7)

Combining Level 1 and Level 2:

• Adding (2 3). Elusive element = (2 3). Considering Level 1: Add (2) as parent. Elusive element = (2 3) - (2) = (3). Add (3) as parent. Elusive element = (3) - (3) = φ.

• Adding (4 5). Elusive element = (4 5). Considering Level 1: Elusive element = φ after adding (4) and (5) as parents.

• Lattice generated after this step is shown in Figure 3.3

Figure 3.3: Lattice after combining levels 1 and 2

Combining current lattice with Level 3:

• Adding (1 2 3). Elusive element = (1 2 3). newNode = (1 2 3)

– Considering Level 2: Add (2 3) as a parent. (4 5) and (5 6) are not subsets. Elusive element = (1 2 3) - (2 3) = (1) = checkContains.

Figure 3.4: Lattice after combining levels 1, 2 and 3

– Considering Level 1: Search for those elements that contain 1. If a node does not contain 1, then it is not a subset or it is an ancestor. This is how we implicitly encode ancestor relationships and avoid searching the lattice each time a new node is added. Add (1) as a parent. Do not add (2) or (3), because levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (2) or (3). Do not add (4) (5) (6) (7) because they are not subsets of newNode.

• Adding (2 3 4). Elusive element = (2 3 4). newNode = (2 3 4)

– Considering Level 2: Add (2 3) as a parent. (4 5) and (5 6) are not subsets. Elusive element = (2 3 4) - (2 3) = (4) = checkContains.

– Considering Level 1: Add (4) as a parent. Do not add (2) or (3), because levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (2) or (3). Do not add (1) (5) (6) (7) because they are not subsets of newNode.

• Lattice generated after this step is shown in Figure 3.4

Combining current lattice with Level 4:

• Adding (1 2 3 4 5 6). newNode = (1 2 3 4 5 6)

– Considering Level 3: Add (1 2 3) and (2 3 4) as parents. Elusive element = checkContains = (5 6).

– Considering Level 2: Add (4 5) and (5 6) as parents. Do not add (2 3), because levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (2 3).

Combining previous lattice with Level 5:

• Adding (1 2 3 4 5 6 7). newNode = (1 2 3 4 5 6 7)

– Considering Level 4: Add (1 2 3 4 5 6) as a parent. Elusive element = checkContains = (7).

– Considering Level 3: Do not add any nodes, since levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (1 2 3) or (2 3 4).

– Considering Level 2: Do not add any nodes, since levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (2 3), (4 5) or (5 6).

– Considering Level 1: Add (7) as a parent. Do not add any other nodes, since levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (1), (2), (3), (4), (5) or (6).

• Lattice generated after combining all levels is shown in Figure 3.5

3.6.4 Correctness Proof of the algorithm

In order to prove the correctness of the algorithm, we need to show three things:

• The algorithm adds all the parents for a new node

• The algorithm adds only the correct parents for a new node

• The algorithm adds only the immediate parents, and not any of the ancestors, for a new node

Figure 3.5: Lattice after combining all levels from 1 through 5

Hypothesis: The algorithm does not add ancestors for a new node

To Prove: Given a new node n and its parent m1, there does not exist any node m2 which is a parent of both m1 and n.

Proof: Suppose there exists m2 such that m2 is a parent of both m1 and n. Since m2 is a parent of m1, m2 ⊂ m1, and m1 is visited before m2 in the level-wise merging process. Before considering m2, elusiveConcepts = n − ∪p, where p ranges over the parents of n found so far. Since m1 is a parent, elusiveConcepts ⊆ n − m1, which excludes all elements of m2 (as m2 ⊂ m1), so m2 ∩ elusiveConcepts = φ. Hence m2 will not be added as a parent of n.

Hypothesis: The algorithm adds all the parents for a new node

Proof: Given a new node n with parents p, let node m be a valid parent that is not added as a parent. To be added as a parent, the condition m ⊂ n and m ∩ checkContains ≠ φ must be satisfied:

• If m is a valid parent, then m ⊂ n is always true.

• If m ∩ checkContains = φ, then m is a parent but also an ancestor, and hence is not added. Otherwise, m is always added as a parent.

The algorithm terminates only when checkContains = φ or when level 1 (with concepts of length 1) is reached. When checkContains ≠ φ at the end of a particular level l, ∃ e ∈ n such that e ∉ ∪p, and the search continues. Now, a node r at level l−1 or lower is a parent if ∃ e ∈ n such that e ∉ ∪p, e ∈ r and r ⊂ n.

This is repeated up to level 1 or until checkContains = φ. If we complete searching level 1, clearly there are no more concepts to search and all possible parents have been added. However, if checkContains = φ at some level p, then all elements have at least one parent, which implies that all remaining nodes (at level p−1 or lower) are either not valid parents or are ancestors, and hence should not be added as parents of n. This guarantees that all parents of n are added without exception.

Hypothesis: The algorithm adds only the correct parents for a new node

Proof: Given a new node ’n’ at level l of length nl, we want to find only correct parents of the node while ensuring that ancestors of a valid parent are avoided.

A node is added as parent iff it is a subset of node ’n’. Suppose we add an invalid parent. Then p should contain atleast one element ei that is not in n. However this contradicts our assumption that we add p only if p ( n. Hence we add only the correct parents.

3.7 Lattice Querying

This component uses an algorithm that can discover recommendable items efficiently from large lattices. The motivation for this algorithm is as follows:

Upward Closure: In a lattice, if an element e is not present in a concept C, then it will not be present in any parent of C. Thus, when searching for a minimum number of matches η, if a concept does not have η matches, none of its parents will have η matches, and hence they need not be considered as candidates for recommendation. Upward closure can be applied here only because we assume every parent is a proper subset of its children.

3.7.1 Algorithm for Lattice Querying

Algorithm latticeQuerying

Input: A user query Q of items of the form (UserId, RatingId1, RatingId2), where each rating is an item in the database, a minimum match threshold η, and a minimum to-recommend threshold δ

Output: A list of items that the user is likely to rate in the future.

1. Candidate Nodes ←Bottommost node of the lattice

2. Good Nodes = [ ]

3. while all nodes have not been visited

4. do

5. if node ⊇ Q AND cardinality of (node ∩ Q) ≥ η AND node is NOT in Banned Nodes

6. then Add all parents of node to Candidate Nodes that are not in

Banned Nodes

7. Add (node, cardinality of (node ∩ Q)) to Good Nodes

8. else Add all parents of node to Banned Nodes

9. endif

10. Remove node from Candidate Nodes

11. node ←Get next node in Candidate Nodes

12. endwhile

13. Sort Good Nodes in the descending order of cardinality of (node ∩ Q).

14. if Good Nodes is EMPTY

15. then return NULL
16. else Return the top node n in Good Nodes which has at least δ items to recommend

17. endif

3.7.2 Description of the algorithm

The search starts from the bottommost node of the lattice (Line 1) and proceeds all the way up to the top of the lattice. Each candidate node has to produce an overlap of at least η items (Line 5) if it is to be considered a candidate for recommendation. If a given node does not have η matches, then it can safely be ignored. Also, due to the upward closure property, all parents of this node can be ignored as well (Line 8). This produces a large saving as compared to ignoring just the one node; even in a moderately connected lattice, this can achieve a lot of pruning. If a given node has η matches, then all its parents are possible candidates and are added to the variable Candidate Nodes (Line 6). If a node is no longer a candidate, it is added to Banned Nodes (Line 8). Thus, when a node is considered as a candidate, we check to make sure that it is not banned (Line 5), and if a parent of a node is banned, that parent is not added to Candidate Nodes (Line 6). This is necessary since each node can be reached along multiple paths, and some of these paths may be useful while others may not; even if a single path leading to a node is unfavorable, that node and everything above it in the lattice can be avoided. Once the entire lattice is traversed, the good nodes are sorted in decreasing order of their degree of overlap with the input query and the top node is returned.
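This bottom-up search with banned-node pruning can be sketched as follows (an illustrative Python rendering; following the worked example, a node qualifies when it contains the whole query, and the η overlap test from the pseudocode is kept alongside it):

```python
from collections import deque

def query_lattice(bottom, parents, query, eta, delta):
    query = frozenset(query)
    candidates = deque([bottom])
    banned, good, seen = set(), [], set()
    while candidates:
        node = candidates.popleft()
        if node in seen:
            continue
        seen.add(node)
        if query <= node and len(node & query) >= eta and node not in banned:
            good.append(node)                       # a Good Node
            candidates.extend(p for p in parents.get(node, [])
                              if p not in banned)
        else:
            # upward closure: every parent of a failed node is banned
            banned.update(parents.get(node, []))
    good.sort(key=lambda n: len(n & query), reverse=True)
    for n in good:
        if len(n - query) >= delta:  # at least delta items left to recommend
            return n
    return None

# A small hypothetical lattice fragment (not Figure 3.6 verbatim).
bottom = frozenset({1, 2, 3, 4, 5, 6})
parents = {
    bottom: [frozenset({1, 2, 3, 4, 5}), frozenset({2, 3, 4, 5, 6})],
    frozenset({1, 2, 3, 4, 5}): [frozenset({1, 2, 3})],
    frozenset({2, 3, 4, 5, 6}): [frozenset({3, 5, 6})],
}
# Returns the bottom node (1 2 3 4 5 6); items 1, 2, 3 are the
# recommendations, since they are in the node but not in the query.
print(query_lattice(bottom, parents, {4, 5, 6}, eta=2, delta=3))
```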

Figure 3.6: Example of a typical lattice to search for making recommendations

3.7.3 Illustration of the algorithm

Consider the lattice L in Figure 3.6

Given the lattice L and query Q = 4 5 6. η = 2. δ = 3.

• Candidate = (1 2 3 4 5 6). Banned Nodes = () Good Nodes = ()

• Node = (1 2 3 4 5 6) contains Q. Node ∩ Q = 3. Candidate = (1 2 3 4 5), (2 3 4 5 6), (1 3 4 5 6). Banned Nodes = (). Good Nodes = {(1 2 3 4 5 6), 3}

• Node = (1 2 3 4 5) does NOT contain Q. Banned Nodes = (1 2 3), (2 3 5), (2), (3). Candidate = (2 3 4 5 6), (1 3 4 5 6). Good Nodes = {(1 2 3 4 5 6), 3}

• Node = (2 3 4 5 6) contains Q. Node ∩ Q = 3. Banned Nodes = (1 2 3), (2 3 5), (2), (3). Good Nodes = {{(1 2 3 4 5 6), 3}, {(2 3 4 5 6), 3}}. Candidate = (1 3 4 5 6). (2 3 5) is in Banned Nodes and hence is not added to Candidate.

• Node = (1 3 4 5 6) contains Q. Node ∩ Q = 3. Banned Nodes = (1 2 3), (2 3 5), (2), (3). Candidate = (3 5 6). Good Nodes = {{(1 2 3 4 5 6), 3}, {(2 3 4 5 6), 3}, {(1 3 4 5 6), 3}}

• Node = (3 5 6) does NOT contain Q. Banned Nodes = (1 2 3), (2 3 5), (2), (3), (5). Candidate = ()

• Candidate is empty. TERMINATE

• Choose the topmost node in Good Nodes. This element is the longest candidate, since we start from the bottom of the lattice. In this case, the recommendation is (1 2 3 4 5 6).

Chapter 4

A Joke Recommender System

In this chapter, we look at a graphical user interface (GUI) implementation of a recommender system for jokes. We test and apply our framework and algorithms on the Jester dataset. A detailed description of this dataset can be found in the next chapter on results and also on the Jester Dataset webpage [22]. Since most recommender systems are going to run off web pages, we have implemented the GUI as a Java applet. Applets can be embedded easily into HTML pages using the applet tag or viewed using an appletviewer. The same GUI could be converted into a Frame-based Java desktop application within a short period of time, retaining much of the code.

4.1 Components of the UI

The GUI consists of three JList widgets. The first JList contains the list of all jokes in the joke database. It is supported by a scroll pane that allows scrolling through the jokes. At any time, a maximum of 14 jokes is displayed in the list box. The second JList is the set of rated jokes for a new user. The final JList displays the recommended jokes. All three JList widgets support joke display, i.e., clicking on an item in any of the list boxes displays the corresponding joke in a panel below.

The GUI also contains four buttons from the Java Swing library. Two buttons serve to rate items: they transfer items from the first list box, named Joke Database, to the list box named Rated Jokes (using the >> button) and back (using the << button). Two other buttons, named Recommend and Clear All, are also presented to the user.

The Recommend button initiates the recommender engine, fetches the recommendations and displays them in the Recommended Jokes list box. The Clear All button clears all the rated and recommended items and re-populates the Joke Database list with all the items in the database.

Finally, the GUI contains a JLabel panel that displays the jokes as images in its Icon. Screenshots of the GUI can be found on the following pages.

Figure 4.1: Main Screen of the GUI Applet for Joke recommendation

Figure 4.2: GUI for Joke Recommendation: The user has already rated jokes #2, #5 and #6 and is currently viewing joke #7.

Figure 4.3: GUI for Joke Recommendation: User is viewing recommendations.

Figure 4.4: The Clear All button clears all ratings and recommendations and repopulates the Joke Database.

Chapter 5

Case Study - Jester and MovieLens Datasets

In this chapter, we investigate the performance of the proposed framework and algorithms on two real-world datasets, namely the Jester dataset and the MovieLens dataset. The Jester dataset is a collaborative filtering dataset [22] that contains ratings for 100 jokes, each rated between -10 and +10, from 73,421 users, collected between April 1999 and May 2003. This dataset was chosen because it is highly dense (a density of about 0.7) and is capable of producing highly connected lattices. To measure density, we convert the ratings into a binary matrix of users and jokes, assigning 1 if a user has rated a joke and 0 otherwise, and compute the fraction of 1s over all the cells in the matrix. A dense dataset implies that a significant number of jokes have been rated by users, which increases the number of commonly rated jokes across users. When users rate a number of jokes in common, the concepts generated from the ratings are highly linked (through these common jokes).

This translates to a densely connected lattice of concepts as opposed to a lattice with very few parent-child relationships. This density makes the Jester dataset unique, since typical collaborative filtering datasets have many users rating very few items.
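The density measure just described can be sketched as follows. This is an illustrative sketch: the class name is assumed, and we follow the raw Jester convention (described in the next section) of 99 meaning "not rated".

```java
// Sketch of the density measure: ratings become a binary matrix
// (1 = rated, 0 = not rated) and density is the fraction of 1s.
public class Density {
    public static double density(double[][] ratings) {
        int ones = 0, cells = 0;
        for (double[] row : ratings) {
            for (double r : row) {
                cells++;
                if (r != 99.0) ones++;  // this cell is a 1 in the binary matrix
            }
        }
        return (double) ones / cells;
    }

    public static void main(String[] args) {
        double[][] toy = {
            { 4.25, 99.0, -2.5 },
            { 99.0, 8.0, 1.75 }
        };
        System.out.println(density(toy)); // 4 of 6 cells carry a rating
    }
}
```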

The MovieLens dataset is another popular collaborative filtering dataset [23]. Two versions of this dataset exist; one contains 100,000 ratings by 943 users on 1,682 movies, the other contains 1,000,000 ratings by 6,040 users on 3,900 movies. Such sparse datasets are typical of ratings databases. Both versions have a density of around 5%. Here, the density is computed as the fraction of actual ratings over the possible number of ratings; for example, the smaller version has 100,000 ratings out of a possible 943 × 1682 ratings. We apply our algorithms on both these typical yet contrasting datasets to analyze their performance.

5.1 Processing the Jester Dataset

As mentioned above, the Jester dataset contains continuous ratings for each of the 100 jokes; a score of 99 indicates that a user has not rated a particular joke. The dataset is available as three Excel files, and we chose to test our algorithms on the file that contains the most ratings, which covers 23,500 users. Since we are concerned only with which jokes a user has rated and not with the accompanying scores, we discard the scores in our experiments. After executing our pre-processing scripts on the dataset, we end up with a database in the following format:

UserId, JokeId1, JokeId2, ..., JokeIdm
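The pre-processing step for one row can be sketched as follows; the class and method names are illustrative, not the actual pre-processing scripts.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the pre-processing step: from a row of raw Jester scores
// (where 99 marks an unrated joke) keep only the ids of rated jokes.
public class Preprocess {
    // scores[j] is the rating of joke j+1; 99.0 means "not rated"
    public static String toRatedLine(int userId, double[] scores) {
        List<String> ids = new ArrayList<>();
        for (int j = 0; j < scores.length; j++) {
            if (scores[j] != 99.0) ids.add("JokeId" + (j + 1));
        }
        return "UserId" + userId + ", " + String.join(", ", ids);
    }

    public static void main(String[] args) {
        double[] row = { -7.82, 99.0, 4.25, 99.0, 8.5 };
        System.out.println(toRatedLine(1, row)); // jokes 1, 3 and 5 were rated
    }
}
```

The same shape of transformation applies to the MovieLens data in the next section, with the absence of an entry, rather than a 99 sentinel, marking an unrated movie.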

5.2 Processing the MovieLens Dataset

The MovieLens dataset contains discrete ratings between 1 and 5 for each of the 1,682 movies. The ratings are explicit; hence, the lack of an entry implies that a user has not rated that movie. Since we are concerned only with which movies a user has rated and not with the accompanying scores, we discard the scores in our experiments. After executing our pre-processing scripts on the dataset, we end up with a database in the following format:

UserId, MovieId1, MovieId2, ..., MovieIdm

5.3 Concept Generation - Identifying Repeatedly Rated Jokes/Movies

If a new user N rates a few items and these items are also found in the histories of other users U, then other items rated by the users U are likely candidates to be rated by N. For example, if a new user rated items 1 through 5, and a large number of users who rated items 1 through 5 also rated item #6, then it is likely that the new user will also rate item #6. This rests on the assumption that there exist underlying patterns in the manner in which users rate items; it is the basis of many item-based collaborative filtering approaches and the underlying assumption of our work. In the case of joke recommendation, it is natural to expect users with similar tastes to rate similar jokes. Thus, given a new user N who rated jokes JN = {j1, ..., jn}, if we can find a set of jokes JU that contains most of the jokes in JN along with other jokes JO, we can recommend the jokes JO to user N. Applying the concept generation algorithm on this dataset yields a list of concepts, each consisting of a set of joke ids. In order to build our model, we used 10,000 user ratings from the database. For each dataset, the model was built using 80% of the data while the remaining 20% was used for testing. All experiments were performed on a Dell desktop with 2 GB of RAM and a Pentium 4 3.2 GHz processor running SUSE Linux.
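The co-occurrence intuition above can be sketched with a plain overlap count. This is emphatically not the concept-lattice algorithm of this thesis, only an illustration of the underlying assumption; all names are assumed.

```java
import java.util.*;

// Hypothetical sketch of the item co-occurrence idea: score every item the
// new user has not rated by how many users with overlapping histories rated
// it, then recommend the k top-scoring items.
public class CoOccurrence {
    public static List<Integer> recommend(Set<Integer> newUser,
                                          List<Set<Integer>> histories, int k) {
        Map<Integer, Integer> score = new HashMap<>();
        for (Set<Integer> h : histories) {
            // only users whose history overlaps the new user's ratings matter
            boolean overlaps = false;
            for (int item : newUser) if (h.contains(item)) { overlaps = true; break; }
            if (!overlaps) continue;
            for (int item : h)
                if (!newUser.contains(item)) score.merge(item, 1, Integer::sum);
        }
        List<Integer> items = new ArrayList<>(score.keySet());
        items.sort((a, b) -> score.get(b) - score.get(a)); // most frequent first
        return items.subList(0, Math.min(k, items.size()));
    }

    public static void main(String[] args) {
        List<Set<Integer>> hist = Arrays.asList(
            new HashSet<>(Arrays.asList(1, 2, 3, 6)),
            new HashSet<>(Arrays.asList(1, 2, 6)),
            new HashSet<>(Arrays.asList(4, 5, 7)));
        // a new user who rated items 1 and 2; item 6 co-occurs most often
        System.out.println(recommend(new HashSet<>(Arrays.asList(1, 2)), hist, 1));
    }
}
```

The concept-based approach replaces this flat pair-wise scan with concepts organized into a lattice, which is what makes the later real-time querying possible.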

5.3.1 Exploding space requirements in the subspace clustering of SCuBA

As explained earlier, SCuBA uses a basic pair-wise comparison algorithm to identify repeatedly rated items. Although the algorithm identifies many patterns, it suffers from having to store an explosive number of candidate patterns prior to pruning. In our optimized algorithm, we propose a means to prune the table at specific stages, and we have observed a significant reduction in space requirements. The graph in Figure 5.1 shows the actual number of patterns generated by the basic algorithm for the Jester dataset, along with the number of patterns generated by our optimized concept generation algorithm.

Figure 5.2 shows the concepts generated using the naive algorithm proposed in SCuBA, and Figure 5.3 shows the concepts generated using our optimized concept generation algorithm for the MovieLens dataset. In order to study the performance of the algorithm along the twin dimensions of the dataset (number of users and number of jokes), we experimented with lattices built from 1,000, 5,000 and 10,000 users, and observed the performance over 20, 40, 60, 80 and 100 jokes available to the user. Similarly, for the MovieLens dataset, we experimented with 200, 400 and 800 users and 100, 200, 400 and 800 movies.

5.3.2 Results - Basic pair-wise comparisons vs. the optimized algorithm

We can clearly observe that the number of concepts generated by the basic algorithm grows rapidly with the number of items rated. In the Jester dataset, for example, for 5,000 users and 40 rated jokes approximately 9,000 patterns are generated, while doubling the number of items increases the candidate patterns to over 125,000. In contrast, for 5,000 users and 40 rated jokes our approach produces just over 1,000 concepts, while doubling the number of jokes produces 18,000 concepts. In the MovieLens dataset, the naive approach produces a maximum of nearly 252,000 concepts for 800 users and an equal number of movies, while our algorithm produces only about one-third as many (81,000 concepts) for the same configuration. Clearly, the number of concepts generated by our algorithm is much smaller than with the basic approach.

Another perspective on the pruning results is to observe the average pruning percentages. As is evident from the graphs, we observe an average 80% pruning of the candidate patterns generated by the naive approach. This is uniform over tests on a wide range of user counts (1,000, 2,000, 5,000 and 10,000) and of items rated (20, 40, 60, 80, 100). The pruning achieves this space saving at the cost of an initial sorting of the user database; since standard sorting algorithms run in O(n log n) time, the overhead is well worth it. The graph in Figure 5.4 shows the percentage pruning obtained by our optimization step for the Jester dataset; the corresponding figure for the MovieLens dataset is Figure 5.5.

Figure 5.1: Naive Vs Optimized Concept Generation - Jester

Figure 5.2: Naive Concept Generation - MovieLens

Figure 5.3: Optimized Concept Generation - MovieLens

Figure 5.4: Percentage Pruning: Naive Vs Optimized Concept Generation - Jester dataset

Figure 5.5: Percentage Pruning: Naive Vs Optimized Concept Generation - MovieLens dataset

5.4 Timing the Model Building Algorithms

Concept generation is a memory- and time-intensive approach to identifying subspaces in the dataset. However, as mentioned in previous sections, the performance of this algorithm is comparable to that of a number of existing subspace clustering algorithms. For a large 10,000 × 100 dataset (Jester), concepts were generated in roughly 2 hours. The largest MovieLens dataset was the complete dataset itself (100,000 ratings), for which concepts were generated in a few milliseconds over one minute. The time required to build the lattices is of the order of a few minutes: for the 10,000 × 100 dataset, the lattice was built in roughly 3 minutes, and for the MovieLens dataset with 100,000 ratings in under 70 seconds. It is important to keep in mind that the Jester dataset is extremely dense; hence its lattice has many parent-child relationships, which means that each node has many links to other nodes. This tends to increase the lattice building time, and it would not be surprising to observe shorter lattice building times for sparser datasets. Note that time requirements for Concept Partition have not been reported because they remained consistently under 1 second for all variations in the Jester data, namely varying the number of items rated and the number of users.

The MovieLens model building time is shown in Figure 5.7. This is comparable with the time values shown in Figure 14 of the paper by Agarwal et al. [16]. We observe that the total model building time for the MovieLens dataset (shown in Figure 5.9) is largely linear, similar to the model building times they observe.

Figure 5.6: Jester dataset: Time taken to generate concepts

Figure 5.7: MovieLens dataset: Time taken to generate concepts

Figure 5.8: Jester dataset: Time taken to generate lattices from concepts

Figure 5.9: MovieLens dataset: Time taken for concept generation, partition and lattice building

5.5 Space Requirements for the Model Building Algorithms

Note that space requirements for Concept Partition have not been reported because they remained consistently under 1 megabyte for all variations in the data, namely varying the number of items rated and the number of users. Space requirements for the MovieLens dataset in the lattice generation phase have not been provided because of their very low memory footprint.

Figure 5.10: Jester dataset: Space required to generate concepts

Figure 5.11: MovieLens dataset: Space required to generate concepts

Figure 5.12: Jester dataset: Space required to generate lattices from concepts

5.6 Real-time Performance of the Algorithm: Lattice Discover

In order to study the real-time performance of the algorithm, we built a lattice of nearly 10,500 nodes using 10,000 user ratings from the aforementioned dataset. After the model was built, we randomly chose 1,000 user ratings from the remaining 13,500 users and created 4 datasets, each containing 250 users.

These are labeled Jester Set1 through Jester Set4. In order to measure query performance in terms of processing speed and accuracy of results, we calculated the time required, precision and recall in the following experiment. Similar experiments have been performed on the MovieLens dataset by Agarwal et al. [16].

From each row of the user ratings, we extract a portion of the ratings and label them as query terms. These could be the initial ratings made by a new user of an online rating system. The goal of the recommender system is to predict possible jokes of interest to the new user; these are referred to as the target terms. Recommendations made by the system are referred to as recommended terms. For instance, given a list of ratings (Joke1, Joke2, Joke3, ..., Joke50), we may use the first 15 (Joke1, ..., Joke15) as query terms and (Joke16, ..., Joke50) as target terms. Suppose the recommender offers recommended terms = (Joke19, Joke31, Joke35, Joke56); three of the four recommendations lie in the target set, so precision = (3/4) * 100 = 75% and recall = (3/35) * 100 ≈ 10%. The following definitions are used to calculate the metrics.

Time: Amount of time in milliseconds required to retrieve a recommendation for a given set of query terms.

Precision: (|RecommendedTerms ∩ TargetTerms| / |RecommendedTerms|) * 100

Recall: (|RecommendedTerms ∩ TargetTerms| / |TargetTerms|) * 100
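Under these definitions, the worked example above can be checked with a short sketch; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Precision and recall over sets of item ids, as percentages.
public class Metrics {
    public static double precision(Set<String> recommended, Set<String> target) {
        Set<String> hit = new HashSet<>(recommended);
        hit.retainAll(target);                     // |Recommended ∩ Target|
        return 100.0 * hit.size() / recommended.size();
    }

    public static double recall(Set<String> recommended, Set<String> target) {
        Set<String> hit = new HashSet<>(recommended);
        hit.retainAll(target);
        return 100.0 * hit.size() / target.size();
    }

    public static void main(String[] args) {
        // target terms Joke16..Joke50 and the recommended terms from the example
        Set<String> target = new HashSet<>();
        for (int j = 16; j <= 50; j++) target.add("Joke" + j);
        Set<String> rec = new HashSet<>(
            Arrays.asList("Joke19", "Joke31", "Joke35", "Joke56"));
        System.out.println(precision(rec, target)); // 75.0
        System.out.println(recall(rec, target));    // about 8.6, i.e. roughly 10%
    }
}
```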

The size of the query terms ranges between 5 and 15% of the maximum number of items in the dataset. This range was chosen because users are likely to rate very few items before requesting recommendations. As can be noted in Figure 5.13, the average precision for the Jester dataset is about 97%, while the average recall (Figure 5.14) is about 65%. Although recall can be increased by offering more recommendations, this comes at the cost of precision; since it is more important for users to get the right recommendations than to get many recommendations, we chose to favor precision over recall. Finally, in Figure 5.15 we observe that the average query processing time is approximately 2 seconds, an acceptable time to wait for a recommendation.

Figure 5.13: Jester dataset: Precision Measurement

Figure 5.14: Jester dataset: Recall Measurement

Figure 5.15: Jester dataset: Query processing time

5.7 Improved Performance

Agarwal et al. use the MovieLens dataset to compute the precision of their approach. The precision for the 100,000-ratings database using our approach is shown in Figure 5.16. We compare this with Figure 11 in the paper on SCuBA by Agarwal et al. [16], which shows constantly degrading precision as the percentage of query terms increases from 5 to 50%: their precision at 5% is roughly 30%, while the precision at 50% is nearly 10%. Our approach, on the other hand, shows steady performance independent of the number of items considered. Although our performance is slightly poorer at lower percentages, our precision values remain consistent, and this is essential for a good overall system: when the number of query terms increases, users expect the system to have learnt their preferences well and will not tolerate degrading performance.

Figure 5.16: MovieLens dataset: Precision Measurement

Chapter 6

Conclusion and Future Direction

In this chapter, we examine the strengths and drawbacks of our approach in depth, and we suggest possible directions for future research. This work is merely a starting point for interesting research in this powerful approach to better recommender systems, and we sincerely believe that innovative future augmentations can greatly enhance the performance of the current model.

6.1 What’s good in a concept-based approach?

We believe that the concept-based approach is generic and can be adapted to data of any nature. Concepts are an abstraction, and entities in any data can be molded to represent concepts; characteristics of entities become attributes of the corresponding concepts in this framework. The basic working of a recommender system is one of clustering, i.e., segmenting concepts into groups such that items within a group are similar to each other and different from those in other groups. By this token, the choice of how to involve the attributes in grouping or segmenting individual entities of the data is up to the designer. In our approach, each item in the ratings database is an attribute, while a concept is a set of such items. Another strength of this approach is that both the model-building time and the real-time performance are acceptable for any modern application.

6.2 Future Direction and Caveats

Our basic model can easily be augmented by adding user statistics to each node in the lattice to guide the search for information. A basic augmentation is to include the number of users who lend support to a repeated pattern in the ratings database. In our approach, we retain this count until the end of Concept Generation, where we prune patterns based on support; we do not utilize user statistics in the search for information. Nodes may also include summarized information about a perspective of the user. For example, consider a gender-based analysis of the data: the perspective in this case is gender, and we may include counts of the numbers of men and women along with the ratings in each node.

Suppose we wish to study the scoring pattern of the users (i.e., patterns in the scores assigned by users to the items in the ratings database). We could store the mean and standard deviation of the scores assigned by users to each item in a concept node. This has some interesting applications: maintaining such statistics could help us discern patterns in the focus of a group of users. A sample conclusion is that people who rate items 1 through 5 are divided in their opinion of item 6 but are unequivocal in their dislike for item 7; this may be observed in the values of the mean and standard deviation of those items as we traverse the lattice.
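A minimal sketch of these per-item statistics follows, assuming the population standard deviation; the class and method names are illustrative.

```java
// Mean and standard deviation of the scores assigned to one item by the
// users supporting a concept node.
public class ItemStats {
    public static double mean(double[] scores) {
        double sum = 0;
        for (double s : scores) sum += s;
        return sum / scores.length;
    }

    // population standard deviation of the scores
    public static double stdDev(double[] scores) {
        double m = mean(scores), sq = 0;
        for (double s : scores) sq += (s - m) * (s - m);
        return Math.sqrt(sq / scores.length);
    }

    public static void main(String[] args) {
        double[] divided = { -9, 8, -7, 9 };    // users split on this item
        double[] disliked = { -8, -9, -8, -9 }; // unequivocal dislike
        System.out.println(mean(divided) + " +/- " + stdDev(divided));
        System.out.println(mean(disliked) + " +/- " + stdDev(disliked));
    }
}
```

A high standard deviation flags a divisive item, while a low standard deviation around a negative mean flags a uniformly disliked one, which is exactly the distinction drawn in the sample conclusion above.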

Another possible direction is to look at relating a set of lattices given a set of items. This is the problem of lattice intersection over query items. A simple solution is to union the results from all the matching nodes across the lattices. However, this may not be entirely accurate, as the query terms may have found a more accurate match in one lattice than in another; hence it is essential to bias the final recommendations based on the level of the match in each lattice.

Enhancements to the lattice search can be employed to improve the real-time performance of the system. Encoded domain knowledge can be used to guide the search for faster responses. If there is a natural partitioning in the items of the ratings database, it would translate into multiple concept lists after the concept partitioning process, and these can be used to produce smaller lattices that can be searched faster and, possibly, in parallel.

A major bottleneck that we foresee in excessive augmentation of concepts is computational cost. Designers should bear in mind that each additional attribute requires additional storage and processing time; computing and storing complicated statistical information may degrade the real-time performance. The choice of statistics guides the information discovery in the knowledge structures, and complicating them may limit the candidate recommendations severely.

It is quite evident that the opportunities for future research are immense in the knowledge representation and discovery aspects of formal concept-based recommender systems.

Bibliography

[1] http://en.wikipedia.org/wiki/Recommender_system - Definition of Recommender Systems on Wikipedia

[2] http://www.pandora.com/ - Pandora Music Recommender System created by the Music Genome Project

[3] http://movielens.umn.edu/ - MovieLens from the GroupLens Research group at the University of Minnesota

[4] http://www.tnrdlib.bc.ca/rr.html - Reader’s Robot - Book Recommender System

[5] Priss, Uta. "Lattice-based Information Retrieval." Knowledge Organization, Vol. 27, No. 3, 2000, pp. 132-142.

[6] Yardi, Aparna Arvind. "Concept Based Information Organization and Retrieval." Masters Thesis, University of Cincinnati, 2006.

[7] Malone, Thomas W., Grant, Kenneth R., Turbak, Franklyn A., Brobst, Stephen A., and Cohen, Michael D. Intelligent Information Sharing Systems. Communications of the ACM, 30, 5 (1987), pp. 390-402.

[8] Foltz, Peter W., and Dumais, Susan T. Personalized Information Delivery: An Analysis of Information Filtering Methods. Communications of the ACM, 35, 12 (1992), pp. 51-60.

[9] Salton, Gerard, and Buckley, Christopher. Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, 24, 5 (1988), pp. 513-523.

[10] Goldberg, David, Nichols, David, Oki, Brian M., and Terry, Douglas. Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM, 35, 12 (1992), pp. 61-70.

[11] Resnick, Paul, Iacovou, Neophytos, Suchak, Mitesh, Bergstorm, Peter, and Riedl, John. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of the ACM Conference on Computer Supported Cooperative Work, 1994, pp. 175-186.

[12] Breese, John S., Heckerman, David, and Kadie, Carl. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence, 1998, pp. 43-52.

[13] Pennock, David, Horvitz, Eric, Lawrence, Steve, and Giles, C. Lee. Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-based Approach. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000, pp. 473-480.

[14] Sarwar, Badrul, Karypis, George, Konstan, Joseph, and Riedl, John. Application of Dimensionality Reduction in Recommender System - A Case Study. ACM Web Knowledge Discovery in Databases (WebKDD) Workshop, 2000.

[15] Miller, Bradley N., Albert, Istvan, Lam, Shyong K., Konstan, Joseph A., and Riedl, John. MovieLens Unplugged: Experiences with an Occasionally Connected Recommender System. Proceedings of the ACM Conference on Intelligent User Interfaces (Accepted Poster), 2003.

[16] Agarwal, Nitin, Haque, Ehtesham U., Liu, Huan, and Parsons, Lance. "A Subspace Clustering Framework for Research Group Collaboration." International Journal on Information Technology and Web Engineering, 1(1), pp. 35-58, 2006.

[17] Shardanand, Upendra, and Maes, Pattie. "Social Information Filtering: Algorithms for Automating Word of Mouth." Proceedings of the CHI'95 Conference on Human Factors in Computing Systems, ACM Press, Vol. 1, pp. 210-217, 1995.

[18] Konstan, Joseph A., Miller, Bradley N., Maltz, David, Herlocker, Jonathan L., Gordon, Lee R., and Riedl, John. "GroupLens: Applying Collaborative Filtering to Usenet News." Communications of the ACM, Vol. 40, No. 3, pp. 77-87, 1997.

[19] Herlocker, Jonathan L., Konstan, Joseph A., Terveen, Loren G., and Riedl, John. "Evaluating Collaborative Filtering Recommender Systems." ACM Transactions on Information Systems (TOIS), Vol. 22, Issue 1, pp. 5-53, 2004.

[20] Aggarwal, Charu, Wolf, Joel L., Wu, Kun-Lung, and Yu, Philip S. Horting Hatches an Egg: A New Graph-Theoretic Approach to Collaborative Filtering. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, San Diego, CA, pp. 201-212, 1999.

[21] Maltz, David, and Ehrlich, Kate. Pointing the Way: Active Collaborative Filtering. Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI'95, ACM Press, pp. 202-209, 1995.

[22] http://ieor.berkeley.edu/~goldberg/jester-data/ - Jester Online Joke Recommender Dataset Webpage

[23] http://www.grouplens.org/node/12#attachments - MovieLens dataset from the GroupLens Research Group at the University of Minnesota

[24] Ganter, Bernhard, and Wille, Rudolf. Formal Concept Analysis: Mathematical Foundations. Translated by C. Franzke. Springer-Verlag New York, Inc., 1997.

[25] Getoor, Lise, and Sahami, Mehran. Using Probabilistic Relational Models for Collaborative Filtering. Working Notes of the KDD Workshop on Web Usage Analysis and User Profiling, 1999.

[26] Goldberg, Ken, Roeder, Theresa, Gupta, Dhruv, and Perkins, Chris. Eigentaste: A Constant Time Collaborative Filtering Algorithm. Information Retrieval, 4(2), pp. 133-151, 2001.

[27] Resnick, Paul, Iacovou, Neophytos, Suchak, Mitesh, Bergstorm, Peter, and Riedl, John. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of the ACM Conference on Computer Supported Cooperative Work, 1994, pp. 175-186.

[28] Sarwar, Badrul M., Karypis, George, Konstan, Joseph A., and Riedl, John. Item-based Collaborative Filtering Recommendation Algorithms. Proceedings of the World Wide Web Conference, pp. 285-295, 2001.

[29] Ungar, Lyle, and Foster, Dean. Clustering Methods for Collaborative Filtering. Proceedings of the Workshop on Recommendation Systems, 1998.

[30] Priss, Uta. "Knowledge Discovery in Databases Using Formal Concept Analysis." Bulletin of ASIS, 27, 1, 2000, pp. 18-20.

[31] Priss, Uta. "Faceted Information Representation." In Stumme, Gerd (ed.), Working with Conceptual Structures: Proceedings of the 8th International Conference on Conceptual Structures, Shaker Verlag, Aachen, 2000, pp. 84-94.

[32] Priss, Uta. "A Graphical Interface for Document Retrieval Based on Formal Concept Analysis." In Santos, Eugene (ed.), Proceedings of the 8th Midwest Artificial Intelligence and Cognitive Science Conference, AAAI Technical Report CF-97-01, 1997, pp. 66-70.

[33] Agrawal, Rakesh, and Srikant, Ramakrishnan. Fast Algorithms for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 487-499, 1994.

[34] Burdick, Douglas, Calimlim, Manuel, and Gehrke, Johannes. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. pp. 443-452, 2001.