A Concept-Based Framework and Algorithms For Recommender Systems
by
Shriram Narayanaswamy
Bachelor of Engineering (Hons.), Electronics and Instrumentation
Birla Institute of Technology and Science (BITS), Pilani
A Master's thesis submitted to the faculty of
University of Cincinnati
in partial fulfillment of the requirements for the degree of
Master of Science
Department of Computer Science
University of Cincinnati
June 2007

ABSTRACT
In today's consumer-driven world, people are faced with the problem of plenty. Choices abound everywhere, be it in movies, books or music. Recommender systems spare the user the frustration of searching for the proverbial needle in a haystack by offering recommendations based on the user's personal preferences. In this thesis, a generic framework and algorithms for building concept-based recommender systems are presented. A concept-based approach leverages the deep structure of a ratings database and reveals complex, higher-level inter-relationships between entities in the data. The algorithms encode user preferences from a ratings database into concepts using collaborative filtering, organize the concepts into lattices efficiently, and enable fast querying of the lattices for recommendations. We apply our algorithms to two real-world datasets and demonstrate their ability to generate quality recommendations in real time.
ACKNOWLEDGMENTS

I would like to express my heartfelt gratitude to Dr. Raj Bhatnagar, my advisor, for being a constant source of motivation and for guiding me through this thesis. I would also like to thank Dr. Ali Minai and Dr. Carla Purdy for their valuable suggestions and feedback. Thanks are due to my friends for all their encouragement. Finally, this work is dedicated to my parents, Ms. Revathi and Mr. Narayanaswamy; I can only begin to express my gratitude for their unconditional love, support and everything else...
Contents
Table of Contents v
List of Figures viii
List of Tables x
1 Introduction 1
1.1 Recommender Systems ...... 1
1.1.1 Why Recommender Systems? ...... 2
1.1.2 How to make recommendations? ...... 3
1.2 Concept-based approach ...... 4
1.3 Problem Statement ...... 5
1.4 Contributions ...... 6
1.5 Approach to the problem ...... 7
2 Related Research 10
2.1 Information Filtering ...... 10
2.2 Collaborative Filtering ...... 12
2.3 Concept Based Information Retrieval ...... 13
3 Framework & Algorithms 14
3.1 The Framework ...... 14
3.2 Components of the Framework ...... 16
3.3 Concept Generation ...... 18
3.4 Algorithm for Concept Generation ...... 18
3.4.1 Description of the algorithm ...... 20
3.4.2 Illustration of the algorithm ...... 22
3.4.3 Complexity Analysis ...... 24
3.5 Concept Partition ...... 25
3.5.1 Algorithm for Concept Partition ...... 26
3.5.2 Description of the algorithm ...... 27
3.5.3 Illustration of the algorithm ...... 28
3.5.4 Complexity Analysis of the algorithm ...... 30
3.6 Lattice Generation ...... 31
3.6.1 Algorithm for lattice generation ...... 31
3.6.2 Description of the algorithm ...... 34
3.6.3 Illustration of the algorithm ...... 35
3.6.4 Correctness Proof of the algorithm ...... 38
3.7 Lattice Querying ...... 41
3.7.1 Algorithm for Lattice Querying ...... 42
3.7.2 Description of the algorithm ...... 43
3.7.3 Illustration of the algorithm ...... 44
4 A Joke Recommender System 46
4.1 Components of the UI ...... 46
5 Case Study - Jester and MovieLens Datasets 52
5.1 Processing the Jester Dataset ...... 53
5.2 Processing the MovieLens Dataset ...... 54
5.3 Concept Generation - Identifying repeatedly rated jokes/movies . 54
5.3.1 Exploding space requirements in subspace clustering of SCuBA 55
5.3.2 Results - Basic pair-wise comparisons Vs Optimized algorithm 56
5.4 Timing the Model Building Algorithms ...... 59
5.5 Space Requirement for the Model Building algorithms ...... 62
5.6 Real-time performance of Algorithm: Lattice Discover ...... 64
5.7 An Improved Performance ...... 67
6 Conclusion and Future Direction 69
6.1 What’s good in a concept based approach? ...... 69
6.2 Future Direction and Caveats ...... 70
Bibliography 72
List of Figures
1.1 A Sample Concept Lattice ...... 5
3.1 A framework for concept-based recommender systems ...... 16
3.2 A sparse dataset and its associated lattice ...... 26
3.3 Lattice after combining levels 1 and 2 ...... 36
3.4 Lattice after combining levels 1, 2 and 3 ...... 37
3.5 Lattice after combining all levels from 1 through 5 ...... 39
3.6 Example of a typical lattice to search for making recommendations 44
4.1 Main Screen of the GUI Applet for Joke recommendation . . . . . 48
4.2 GUI for Joke Recommendation: User has already rated the following
jokes: Joke #2, #5 and #6, and is currently viewing Joke #7 . . . . . 49
4.3 GUI for Joke Recommendation: User is viewing recommendations. 50
4.4 The Clear All Button clears all ratings and recommendations and
repopulates the Joke Database...... 51
5.1 Naive Vs Optimized Concept Generation - Jester ...... 57
5.2 Naive Concept Generation - MovieLens ...... 57
5.3 Optimized Concept Generation - MovieLens ...... 58
5.4 Percentage Pruning: Naive Vs Optimized Concept Generation -
Jester dataset ...... 58
5.5 Percentage Pruning: Naive Vs Optimized Concept Generation -
MovieLens dataset ...... 59
5.6 Jester dataset: Time taken to generate concepts ...... 60
5.7 MovieLens dataset: Time taken to generate concepts ...... 60
5.8 Jester dataset: Time taken to generate lattices from concepts . . . 61
5.9 MovieLens dataset: Time taken for concept generation, partition
and lattice building ...... 61
5.10 Jester dataset: Space required to generate concepts ...... 63
5.11 Movielens dataset: Space required to generate concepts ...... 63
5.12 Jester dataset: Space required to generate lattices from concepts . 64
5.13 Jester dataset: Precision Measurement ...... 66
5.14 Jester dataset: Recall Measurement ...... 66
5.15 Jester dataset: Query processing time ...... 67
5.16 MovieLens dataset: Precision Measurement ...... 68
List of Tables
3.1 A typical ratings database ...... 15
3.2 Concept Generation: Initial database ...... 22
3.3 Concept Generation: currentTuple = {1 2 3 4}. Iteration 1 . . . . 22
3.4 Concept Generation: currentTuple = {1 2 4}. Iteration 2 . . . . . 23
3.5 Concept Generation: currentTuple = {2 3 4}. Iteration 3 . . . . . 23
3.6 Concept Generation: currentTuple = {5 6 7}. Iteration 4 . . . . . 23
3.7 Concept Generation: currentTuple = {2 3}. Iteration 5 ...... 23
3.8 Concept Generation: currentTuple = {5 6}. Iteration 6 ...... 24
3.9 Concept Partition: After Sorting ...... 28
3.10 Concept Partition: currentConcept = {1 2 4}. Visited[1,2,4] = 0.
Iteration 1. conceptSet[1] = (1 2 4) (2 4) (2)...... 29
3.11 Concept Partition: currentConcept = {2 3 4}. Visited[2 3 4] = 0.
Iteration 2. conceptSet[2] = (2 3 4) (2 3) (2 4) (2) ...... 29
3.12 Concept Partition: currentConcept = {5 6}. Visited[5 6] = 0.
Iteration 5. conceptSet[3] = (5 6) (6) ...... 29
3.13 Concept Partition: currentConcept = {6 7}. Visited[6 7] = 0.
Iteration 6. conceptSet[3] = (6 7) (6) ...... 30
3.14 After partitioning ...... 30
3.15 Lattice Generation: After applying splitByLength() ...... 36
Chapter 1
Introduction
1.1 Recommender Systems
In today's consumer-driven world, people are faced with the problem of plenty.
Choices abound everywhere, be it in movies, books or music. Consumers are forced to sift through a number of choices before they discover what they need. This is often time-consuming and frustrating. Given today's fast-paced lifestyle, a slow and painstaking search for that elusive item of choice is surely not a sustainable option. People would rather look at items that are tailored to their interests and preferences. The bottom line is that customers want personalization. But how do we achieve personalization? A good approach is to listen to people whose interests match ours and try things they recommend [17]. Our work adopts this very spirit to make recommendations to users.
A recommender system is a platform for providing recommendations to users based on their personal likes and dislikes. The following excerpt from Wikipedia [1] gives a concise and accurate definition of a recommender system:
Recommender systems are a specific type of information filtering (IF) technique that attempt to present to the user information items (movies, music, books, news, web pages) the user is interested in. To do this the user's profile is compared to some reference characteristics. These characteristics may be from the information item (the content-based approach) or the user's social environment (the collaborative filtering approach).
Let us try to understand the important elements of this definition. A recommender system is described as an information filtering technique that uses reference characteristics to group users together. In the following two sections, we explain how a recommender system functions as an information filtering technique and how reference characteristics can be used to make recommendations. We then briefly study concept-based knowledge structures called lattices. Section 1.3, titled Problem Statement, gives a precise definition of the problem we are attempting to solve. We then describe our expectations from this work and our contributions. Section 1.5 explains how recommender systems can be built through a concept-based framework. We conclude the chapter with a brief outline of the rest of the thesis.
1.1.1 Why Recommender Systems?
With the advent of the web, people have been engulfed in a sea of knowledge. Unfortunately, searching through this vast space of knowledge is time-consuming and frustrating. This does not, however, diminish the utility of the web as a significant information source. Hence, massive efforts have been undertaken to present a user with the most pertinent information related to his/her search, as fast as possible.
Search engines are a shining example of such endeavors. Search engines recommend web pages to a user based on the information users seek. In today's world, we are flooded by choices in a wide variety of things, not just web pages. There are hundreds of thousands of movies, books, news articles and songs to choose from. Depending on our likes and dislikes, much of this is unwanted, irrelevant or redundant. Recommender systems help users wade through a complex sea of choices and show them items they may like. In this sense, recommender systems function as filters: filters that show us only what we desire! Examples of popular recommender systems are Pandora [2], MovieLens [3] and Reader's Robot [4].
1.1.2 How to make recommendations?
The fundamental problem in information filtering is computing whether a given item is likely to interest a user. The outcome of such a computation is either boolean, a crisp yes or no, or a score that represents the degree to which a person may like that item. Such a score helps determine whether an item should be recommended to a user. Quantifying a user's taste can be achieved through reference characteristics. Reference characteristics, such as an item's utility and its aesthetic value, are reasons why an item may appeal to a user. In machine learning, these reference characteristics are often referred to as features.
Based on the nature of the reference characteristics, two broad categories of information filtering exist, namely content-based filtering and collaborative filtering.
The ideology of content-based filtering is that the content of an entity determines whether a given user likes it or not. Collaborative filtering, on the other hand, is a less descriptive approach: it does not ask why someone likes a certain item. Instead, since people who rate items in common share some underlying commonality, collaborative filtering groups these people together. Each group then consists of users and items that appeal to like-minded people, and a new user who joins a group is recommended items from that group's list.
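The grouping idea behind collaborative filtering can be sketched in a few lines of Python. The data layout, the user names, and the overlap threshold below are illustrative assumptions, not part of the thesis:

```python
# Sketch of collaborative-filtering grouping: users who rate enough items
# in common are treated as like-minded, and a user is recommended items
# rated by similar users but not yet by himself/herself.

ratings = {
    "alice": {"movie1", "movie2", "movie3"},
    "bob":   {"movie1", "movie2", "movie4"},
    "carol": {"movie5", "movie6"},
}

def similar_users(user, min_overlap=2):
    """Return users who share at least min_overlap rated items with `user`."""
    mine = ratings[user]
    return [u for u, items in ratings.items()
            if u != user and len(mine & items) >= min_overlap]

def recommend(user):
    """Recommend items rated by similar users but not yet by `user`."""
    mine = ratings[user]
    pool = set()
    for u in similar_users(user):
        pool |= ratings[u] - mine
    return sorted(pool)

print(recommend("alice"))  # ['movie4']
```

Here `alice` and `bob` share two rated movies, so `bob`'s remaining item is recommended to `alice`, while `carol` shares nothing and contributes nothing.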
1.2 Concept-based approach
A lattice is a structure that provides a natural way to formalize and study the ordering of objects. The ability to order items based on certain constraints is crucial to a recommender system, as it helps evaluate items and choose one recommendation over another. These constraints usually depend on the specificity of a user's query and his/her need for detail in the response. A lattice structure orders sets of items based on the level of detail (or granularity) of each concept; a concept with many items is highly specialized, while a concept with fewer items is comparatively more general. Such an ordering of items can guide the search for a recommendation appropriate to a user's needs.
Formally, a partially ordered set V is a lattice L = (V, ≤) when, for any two elements x and y in V, the supremum x ∨ y and the infimum x ∧ y always exist [24]. The infimum of two elements is the greatest common lower bound, while the supremum is the least common upper bound. A concept is a pair (O, A), where O is the set of objects that possess the properties represented by the attributes in set A, and A is the set of attributes that are possessed by the objects in set O. O is called the extent of the concept and A is called the intent of the concept [24]. A concept lattice is a lattice of concepts; each element of the lattice is a concept, and any two concepts are related based on their extent and intent. Since ordering is based on sets of attributes, concepts that express more attributes are considered specialized, while concepts that contain fewer attributes are general. Under this definition, the bottom-most node in the lattice is the most specialized concept, while the top-most node is the most general. Going from top to bottom, concepts show increasing specialization, while a bottom-up traversal shows increasing generalization. In recommender systems, the users are mapped as objects and the
items they rate are mapped as attributes.

Figure 1.1: A Sample Concept Lattice
Let us consider a movie recommender system that adopts this approach. The concept lattice shown in figure 1.1 contains a partial list of movies starring Tom
Hanks, ordered in a lattice based on the genre of each movie. Each movie is an attribute, and a list of movies is referred to as a pattern. A concept is made up of a pattern, i.e. only attributes and no objects. {{1,2,3,4,5,6}, φ} is the most general concept in the lattice since it contains no attributes (genres, in this case).
The concepts {{6}, {C,A,D}} and {{1,2,4}, {C,R,D}} are the most specialized since they belong to 3 genres. In this lattice, {{1,2,3,4,6}, {C}} is a parent of {{1,2,3,4}, {C,R}} since {1,2,3,4,6} ⊃ {1,2,3,4} and {C} ⊂ {C,R}. Similarly, {{6}, {C,A,D}} is a child of {{1,2,4,6}, {C,R}} since {6} ⊂ {1,2,4,6} and {C,A,D} ⊃ {C,R}.
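The parent-child test just described reduces to subset comparisons on extents and intents. A minimal sketch, using concepts from the Figure 1.1 discussion (the variable names are illustrative):

```python
# A concept is a pair (extent, intent). A concept p is an ancestor of c
# when p's extent is a proper superset of c's extent and p's intent is a
# proper subset of c's intent.

def is_parent(p, c):
    """True if concept p is an ancestor of concept c in the lattice."""
    p_ext, p_int = p
    c_ext, c_int = c
    return p_ext > c_ext and p_int < c_int  # strict superset / strict subset

top     = (frozenset({1, 2, 3, 4, 5, 6}), frozenset())          # most general
comedy  = (frozenset({1, 2, 3, 4, 6}), frozenset({"C"}))
com_rom = (frozenset({1, 2, 3, 4}), frozenset({"C", "R"}))

print(is_parent(comedy, com_rom))  # True:  {1,2,3,4,6} > {1,2,3,4}, {C} < {C,R}
print(is_parent(com_rom, comedy))  # False: the relation is not symmetric
```

Python's `>` and `<` on frozensets are exactly the proper superset/subset relations, so the code mirrors the mathematical definition directly.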
1.3 Problem Statement
Given databases of users and items, a ratings database consists of pairs of users and the items rated by them, along with additional information such as timestamps. Given such a ratings database and a set of preliminary ratings by a new user, the basic requirement of a recommender system is to produce the largest and most pertinent set of recommendations for the new user. The largest set of recommendations refers to the maximum number of items in the items database that could be recommended to the user. The pertinent items are the ones that the new user is most likely to rate in the future.
1.4 Contributions
Little or no attention has been paid to the deep structure inherent in a ratings database and to how its organization may be exploited to enhance the quality of recommendations. By using a lattice to order items, we can evaluate candidate recommendations and choose one over another depending on the granularity of a user's query. For example, a user with very few ratings conveys little about his/her preferences and receives generalized recommendations, as opposed to a user with many ratings and thus a more refined set of preferences. In typical recommender systems this may not be possible, because the specificity of a user's query is often neglected while making recommendations.
One of the distinguishing capabilities of this approach is the extraction of latent higher level knowledge. Exploring pathways in a lattice of movies, for example, could reveal a structure of abstract ontological categories, such as movie genres, and interrelationships among genres. Such complex inter-relationships can be easily observed by adopting a concept-lattice based knowledge representation scheme. Such higher level patterns can then be coupled with other dimensions such as age, gender or ethnicity to further discern trends in user preferences.
Additionally, a hierarchical structure of concepts enables a quantitative evaluation of the usefulness of an item to a user. Finally, traversing up and down the lattice ensures that we are searching for items based on a common thread/theme, and choosing one item over another does not violate this theme. This is in contrast to cluster-based models, wherein sets of items corresponding to varied themes may be aggregated due to partial overlap of items, so choosing one set of items over another does not guarantee similar properties. Typical model-based approaches do not use hierarchical structures and may not be able to achieve the capabilities of our approach.
Although concept lattice-based information retrieval (IR) schemes have been proposed earlier [5], they rely on annotations using keywords to index items. We cannot adopt these techniques because collaborative filtering avoids using features/descriptions for each item in a database. Also, the existence of ratings databases in different realms of life creates a need for a generic approach that can operate in a domain-independent manner. We strive to address these inadequacies in our two-fold contribution. First, we present a unified approach to concept-based knowledge discovery using a generic and flexible four-component framework to generate recommendations from a ratings database. Second, we present algorithms that can efficiently convert user preferences into concepts, organize them into lattices, and then query the lattices for recommendations. We then present the results of applying our algorithms to two different real-world datasets and demonstrate improvement over other approaches.
1.5 Approach to the problem
We adopt a collaborative filtering approach in this work. It rests on the following assumption: people have an underlying reason for liking an item. If a number of people like a certain item, then they have matching interests at least with respect to that item. If a certain group of people like a number of items in common, they have some underlying generalized preferences in common that are manifested through the choice of these items. Now, given a new user's choice of items, we identify groups of users whose ratings match our user's, and then recommend to the new user other items that each group has rated and enjoyed.
In order to implement such a system, we adopt a concept-based framework. Entities in the data are represented as concepts, while their characteristics are represented as attributes of these concepts. The concepts are then organized into a lattice in which parent-child relationships exist between concepts. In this work, each concept is a repeated pattern of rated items from a ratings database. A pattern consists of a set of items, each of which is an attribute of that pattern.
Consider a movie recommender system that adopts this approach. Now, each
concept in the lattice is a pattern consisting of a list of movies that have been
rated by a significant number of users. Each attribute of a concept is a movie
from the pattern contained in the concept. For example, a concept C may consist
of the following pattern (a list of famous World War II movies):
(Tora Tora Tora, Pearl Harbor, Enemy at the gates, Saving Private Ryan, The
Thin Red Line, U-571, The Dirty Dozen)
Each movie is an attribute, while the entire list of movies is referred to as a pattern. A concept is made up of a pattern, i.e. only attributes and no objects.
In this setup, a concept c1 is a parent of c2 if it contains a subset of the items present in c2. Similarly, a concept c2 is a child of c1 if it contains a superset of the items present in c1. Once such a lattice is constructed with all parent-child relationships, a new user's ratings are looked up in the lattice and the closest matching concept (pattern) is found. Recommendations consist of those items that are in the pattern and not already rated by this new user.
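A hedged sketch of this lookup step, treating each concept simply as a set of items and scoring the match by overlap with the user's ratings. The overlap-based scoring rule and the example concepts are illustrative assumptions, not the thesis's exact match criterion:

```python
# Find the concept (pattern) that best overlaps a new user's rated items,
# then recommend the unrated items it contains.

concepts = [
    {"Tora Tora Tora", "Saving Private Ryan"},
    {"Tora Tora Tora", "Pearl Harbor", "Saving Private Ryan", "U-571"},
]

def recommend_from_lattice(user_items):
    """Pick the concept with the largest overlap; recommend its unrated items."""
    best = max(concepts, key=lambda c: len(c & user_items))
    return sorted(best - user_items)

rated = {"Tora Tora Tora", "Pearl Harbor"}
print(recommend_from_lattice(rated))  # ['Saving Private Ryan', 'U-571']
```

A real lattice query would exploit the parent-child ordering to avoid scanning every concept; the lattice-search algorithm of Section 3.7 is what makes this step efficient.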
For example, a concept C1 consisting of (Tora Tora Tora, Pearl Harbor, Enemy at the gates, Saving Private Ryan, The Thin Red Line, U-571, The Dirty Dozen) is a child of concept C2 consisting of (Tora Tora Tora, Saving Private Ryan).
Clearly, C2 is a parent of C1.

In the following chapter, we explore prior work in the area of recommender systems and the application of concept-based approaches to this problem. In Chapter 3, we describe the concept-based framework for recommender systems. We also study a set of algorithms that may be used by the components of the framework. We look at an applet-based graphical user interface for recommendation in Chapter 4. Chapter 5 presents the results of applying the proposed framework and algorithms on the Jester and MovieLens datasets. Finally, Chapter 6 offers conclusions and future directions for research in this area.
Chapter 2
Related Research
2.1 Information Filtering
Malone et al. [7] describe three different categories of information filtering, namely content-based, social and economic, to predict a user's response to an article. As described earlier, content-based filtering is concerned with filtering information based on its content. All keyword-based search algorithms employ this approach. For example, early search engines used string search algorithms on documents annotated with a set of keywords, and any search query would look up this keyword dictionary to determine whether a document was relevant to the search. The basic keyword search was later augmented with more complex functions such as weight vectors to filter out irrelevant documents [8][9].
Social filtering uses people's subjective judgment to identify interesting content or filter out objectionable content. Also known as collaborative filtering, it groups users and/or items based on the reasoning that people with similar preferences like similar items [17]. For instance, successful e-commerce websites such as Amazon.com recommend additional items to customers using the past history of similar customers. People in a group share similar preferences, and if a new member joins the group, his/her interests are extrapolated from the interests of the group.
Collaborative filtering does not strictly require a group to define preferences, since a moderator can be nominated and the group's preferences defined based on the moderator's preferences.
An early system that implemented both content and collaborative filtering was Tapestry [10]. Tapestry was developed at Xerox PARC and was intended for mail filtering. The effort was to augment an existing content-based system with the power of collaborative filtering. Emails at Xerox PARC were being filtered based on their content, but recording people's reactions to reading attachments helped determine whether an attachment was useful. All system users were set up with mail filters, and a user's filter could access others' filters.
Filtering could now be done based on the content of an incoming document as well as on whether other recipients found it useful. Another application used collaborative filtering for Usenet news filtering [18].
GroupLens [11] improves on the work in Tapestry by adding two important components. It uses a client-server model to support evaluations from multiple sites, and it supports aggregate evaluations wherein past correlations to recommendations are considered for new recommendations.
In the rest of this chapter, we examine approaches to collaborative filtering in particular.
2.2 Collaborative Filtering
Breese et al. [12] take a comprehensive look at some of the early measures for determining the closeness of two users based on the items they rate. They subdivide collaborative filtering methods into memory-based methods and model-based methods. In a memory-based approach, recommendations are obtained by aggregating the ratings of similar users. A number of similarity measures have been proposed to group similar users. Popular measures include cosine-based similarity [28], the Pearson correlation coefficient [27] and, later, extensions such as default voting and case amplification [12]. In model-based approaches, many probabilistic [25][26] and clustering [29] techniques have been employed to represent user preferences.
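As an illustration of a memory-based similarity measure, the Pearson correlation between two users can be computed over the items they have both rated. This is a standard formulation; the rating data below is made up for the example:

```python
import math

def pearson(u, v):
    """Pearson correlation over co-rated items.
    u, v: dicts mapping item -> rating. Returns 0.0 when fewer than
    two co-rated items exist or a user's co-ratings have no variance."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in common)
                    * sum((v[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

a = {"m1": 5, "m2": 3, "m3": 4}
b = {"m1": 4, "m2": 2, "m3": 3}
print(round(pearson(a, b), 3))  # 1.0 — identical rating pattern, shifted by 1
```

The measure deliberately centers each user's ratings on his/her own mean, so two users who rank items the same way correlate perfectly even if one systematically rates harsher than the other.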
Pennock et al. [13] describe a hybrid of model-based and memory-based methods in which a personality diagnosis measure is used to probabilistically determine the personality type of a user with respect to other users, and thereby whether the user will like a recommendation. Our approach is similar in spirit to Pennock and others in the sense of determining closeness to a given user. However, we differ in that we do not explicitly employ a probabilistic measure but use a lattice-based knowledge representation scheme. An interesting approach called Horting was proposed by Wolf and others [20], who apply a graph-theoretic approach to collaborative filtering. This approach offers the unique advantage of finding transitive relationships between ratings by traversing the graph.
Sarwar and others look at singular value decomposition (SVD) as a means to reduce the size of the ratings database. Clearly, the number of users is growing at a tremendous rate, and scalability of the algorithms is a vital factor. The authors report that for extremely sparse datasets the performance of basic collaborative filtering approaches is far superior; however, for denser datasets, SVD provides a scalable alternative [14]. Maltz and Ehrlich study the possibility of active information filtering, wherein users proactively send pointers to others when they find interesting articles, movies, etc. [21].
In more recent work, Herlocker and others have undertaken an in-depth survey of the evaluation schemes for recommender systems [19]. Efforts have also been made to design interfaces for users who connect to recommender systems only occasionally. The scenario discussed is that of a user at a video store wondering what movie to rent for the night. With no recommender system close at hand, should the user return empty-handed? The authors explore the possibility of a recommender system on a PDA [15].
2.3 Concept Based Information Retrieval
Much of the work in concept-based information retrieval can be found in the natural language processing realm. Priss uses concept lattices for document retrieval [5] [30] [31] [32]. Priss adopts a facet-driven approach wherein domain knowledge is used to encode conceptual relationships between objects in the domain. For instance, in document retrieval, each document is represented by a set of keywords or index terms, and queries are matched with these keywords to identify relevant documents. We cannot adopt these techniques because collaborative filtering avoids using keywords/descriptions for each item in a database.
Chapter 3
Framework & Algorithms
In this chapter, we describe the framework for a formal-concept-based recommender system. As observed in earlier chapters, formal concept analysis involves representing information in the form of lattices and using the hierarchical structure of lattices to discover knowledge from data. The proposed framework provides a means to process raw data into concepts, convert them into knowledge lattices, and query the lattices for recommendations. We also propose algorithms that have been optimized to achieve these tasks.
3.1 The Framework
In recommendation systems, raw data is usually in the form of user ratings. A user, in this context, is one who rates items of his/her choice from an item database.
These items are typically movies, music, articles, research papers, consumer electronic goods, or software, among others. A rating is usually some ordinal quantity.
For example, users may assign a numerical score or a letter grade (having some defined ordering) to the items in the database. As a concrete example, consider a movie
Table 3.1: A typical ratings database
User-Id   Item-Id   Score
   1         5        3
   1         6        2
   1         7        1
   2         5        5
database such as IMDb. A user is a typical movie watcher who wishes to rate some of the movies he/she has viewed in the past. Each user may give a movie a rating between 1 and 5: 1 for poor and 5 for excellent. Each user may rate multiple movies, and any movie may be rated by multiple users. Thus, there is an inherent many-to-many relationship between items and users. Without loss of generality, we assume a format such as the one in Table 3.1.
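Under this assumed format, a ratings table can be read into per-user dictionaries in a few lines of Python. The column names and the inline CSV data below are illustrative, mirroring the layout of Table 3.1:

```python
import csv
import io
from collections import defaultdict

# Rows follow the (User-Id, Item-Id, Score) layout of Table 3.1.
raw = """user_id,item_id,score
1,5,3
1,6,2
1,7,1
2,5,5
"""

ratings = defaultdict(dict)  # user -> {item: score}
for row in csv.DictReader(io.StringIO(raw)):
    ratings[int(row["user_id"])][int(row["item_id"])] = int(row["score"])

print(ratings[1])  # {5: 3, 6: 2, 7: 1}
```

The many-to-many relationship is preserved: a user maps to many items, and the same item id may appear under many users.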
The first step in mining with a concept-based approach is to identify concepts in the domain. Once concepts are identified, the next step is to organize them in the form of a lattice. The final step is to query the lattice for information. The focus of this thesis is to present this framework and a set of algorithms that can be used by its components in order to produce good recommendations. We have also proposed metrics that quantify a good recommendation, taking into account both the pertinence of the recommendation as well as its coverage of the possible recommendations.
Figure 3.1: A framework for concept-based recommender systems
3.2 Components of the Framework
Consider the framework in Figure 3.1. It has four main components, each of which is discussed below:
• Optimized Concept Generation: This component achieves the first step in discovering knowledge from ratings databases, namely converting the user ratings into concepts. The algorithm improves upon the basic subspace clustering algorithm presented in SCuBA by Agarwal et al. [16]. The basic SCuBA algorithm accumulated a large number of patterns rated by an insignificant fraction of users. Its pruning step was applied after all comparisons were made, and hence all the generated concepts had to be held in memory until the final step. We overcome this space inefficiency by optimizing the comparison process: we prune patterns with an insufficient number of ratings at the end of each iteration of the pair-wise comparison process. We have achieved a 60-70% reduction in the space required to store the generated concepts (see Section 5.3.1 for experimental results). Although itemset mining algorithms such as Apriori (frequent itemsets) [33] and MAFIA (maximal itemsets) [34] could be applied, these algorithms are designed to be complete (they discover all possible itemsets). Since we do not require a complete algorithm, we chose to implement a pair-wise comparison-based approach to discover repeated patterns of items.
• Concept Partition: In sparse databases, generated concepts may have few items in common, so placing them in the same lattice may not be possible if parent concepts are to be strict subsets of child concepts. The optimized lattice search algorithm presented in Section 3.7 requires that this parent-child relationship be preserved; forcing dissimilar concepts into the same lattice would violate the relationship and preclude the optimization achieved by our algorithm. The presence of disconnected concepts in a lattice would have undesirable consequences such as slower lattice generation and querying, and inaccurate recommendations. This component partitions concepts into multiple subsets of concepts, each belonging to a different lattice.
• Lattice Generation: Once concepts are identified, we need an efficient
means to organize them into lattices. This component helps build a lattice
from a list of concepts.
• Lattice Querying: Once the lattice is built, we can query the lattice for
information. In this case, we can query the lattice for a recommendation
based on the past history (previous ratings) of a user. This component uses
algorithms that can discover recommendations efficiently from large lattices.
In the following sections, we present algorithms that are designed for speed and space-efficiency.
3.3 Concept Generation
Given a ratings database, the following algorithm converts user ratings into concepts. Each concept is a set of items that meets a support threshold δ. The support threshold determines the minimum number of occurrences of a pattern for it to be deemed frequent. Concepts are treated as sets, i.e., they are ordered based on the number of items rated and the actual items themselves; they do not depend on the sequence in which the items occur. For example, given three concepts A = {1, 2, 3, 4}, B = {2, 3, 4} and C = {2, 3}, the subset ordering is C < B < A.
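This subset ordering can be checked directly with Python's frozenset comparison operators, where `<` denotes proper subset (a minimal sketch; the concept names A, B and C are from the example above):

```python
# Concepts from the example, modeled as immutable sets.
A = frozenset({1, 2, 3, 4})
B = frozenset({2, 3, 4})
C = frozenset({2, 3})

# On frozensets, "<" is the proper-subset relation, so the
# subset ordering C < B < A holds regardless of element order.
assert C < B < A
```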
3.4 Algorithm for Concept Generation
Algorithm conceptGenerate
Input: A database of user ratings
Output: A list of patterns that occur frequently in the database
1. frequentPatterns ← [ ]
2. Sort the database in the decreasing order of the number of ratings
3. sizeOfDatabase ← size of the database
(∗ get the number of tuples in the database ∗)
4. Clear localHashTable
(∗ the key of the hash is the pattern, the value is its count ∗)
5. Clear globalHashTable
(∗ the key of the hash is the pattern, the value is its count ∗)
6. currentLength ← -1
7. previousLength ← -1
8. for i ← 1 to sizeOfDatabase − 1
9. do
10.   currentTuple ← database[i]
11.   currentLength ← length of currentTuple
12.   for j ← i+1 to sizeOfDatabase
13.   do
14.     compareTuple ← currentTuple ∩ database[j] is computed from database[j]; compareTuple ← database[j]
15.     intersect ← currentTuple ∩ compareTuple
16.     if intersect exists in localHashTable
17.     then increment the count of the pattern
18.     else insert the pattern into localHashTable and initialize its count to 1
19.     endif
20.   endfor
21.   if previousLength IS NOT EQUAL TO currentLength
22.   then
23.     for k ← 1 to sizeOfGlobalHashTable
24.     do
25.       if pattern length ≥ currentLength and support for pattern is less than δ
26.       then remove this pattern
27.       endif
28.     endfor
29.     previousLength ← currentLength
30.   endif
31.   Update globalHashTable with contents of localHashTable
32.   Clear localHashTable
33. endfor
34. sizeOfGlobalHash ← size of globalHashTable
35. for i ← 1 to sizeOfGlobalHash
36. do
37.   if globalHashTable[i].count ≥ δ
38.   then Add globalHashTable[i] to frequentPatterns
39.   endif
40. endfor
3.4.1 Description of the algorithm
The concept generation algorithm (conceptGenerate) was presented in the previous section. This algorithm is based on the subspace clustering algorithm proposed by Agarwal et al. [16]. The central idea of the subspace clustering algorithm of SCuBA is pair-wise comparison between rows of a ratings database.
The algorithm computes the intersecting ratings (called patterns) for each such comparison, keeps track of the number of occurrences of each pattern, and uses a minimum threshold to determine useful patterns. This can be viewed as a subspace clustering approach since each item in a user's rating is a dimension, and by computing pair-wise intersections we compute the subspaces that are of interest.
However, the number of patterns grows rapidly with the number of users in the ratings database. A large percentage of the generated patterns have very little support, yet they are held in memory and pruned only after all iterations are completed. In this work, we propose an optimization that provides a significant improvement over the existing algorithm (see the next chapter for experimental results). Instead of computing row-wise intersections directly from the ratings database, we first sort the users in the decreasing order of the number of items rated. Then we can use a simple property to prune the global hash table after each iteration and moderate its size. The pruning is based on the following property:
Property: Given two sets of items S1 and S2, the cardinality of their intersection is always less than or equal to the cardinality of the smaller of the two sets. Formally, if |S1| ≤ |S2| then |S1 ∩ S2| ≤ |S1|.
The ratings database is initially sorted in the decreasing order of the number of ratings (Line 2). Next, we proceed to perform pair-wise comparisons using a dual for-loop, shown on Lines 8 and 12. The outer loop uses the variable i while the inner loop uses j. The outer loop traverses all the elements in the list, while the inner loop computes the intersection (called a pattern) of each rating with the other ratings in the database. Each step of the inner loop either updates a local hash table with a new pattern (Line 18) or increments the count of an existing pattern (Line 17).
If the length of the current rating is the same as that of the next, the local cache (local hash table) is simply copied into the global hash table and then emptied (Lines 31 & 32). If the length of the current rating is larger than that of the next rating, we can additionally prune those patterns whose length is greater than or equal to the length of the next rating if they do not meet the minimum support threshold δ (Lines 21-30). This is because future iteration elements (ratings) cannot generate patterns whose length is greater than or equal to their own length (by the aforementioned property). Note that sorting in the descending order of length is vital here, since it facilitates this optimization.
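As a concrete sketch of the optimized pair-wise comparison, the following Python function mirrors conceptGenerate: ratings are sorted by length, intersections are accumulated in a dictionary keyed by pattern, and long, low-support patterns are pruned whenever the tuple length drops. The local and global hash tables are merged into a single dictionary for brevity; function and variable names are illustrative, not from the thesis implementation, and like the thesis algorithm this is deliberately not a complete itemset miner:

```python
def concept_generate(database, delta):
    """Return {pattern: count} for patterns with support >= delta.

    database: iterable of sets of rated item ids (one set per user).
    Pruning sketch: once the tuple length drops, any stored pattern at
    least as long as the next tuple can no longer gain support, so it
    is dropped if its count is still below delta.
    """
    rows = sorted((frozenset(r) for r in database), key=len, reverse=True)
    counts = {}  # hash table: pattern -> occurrence count
    for i, current in enumerate(rows):
        for other in rows[i + 1:]:
            pattern = current & other  # pair-wise intersection
            if pattern:
                counts[pattern] = counts.get(pattern, 0) + 1
        # Prune between iterations when the tuple length decreases.
        if i + 1 < len(rows) and len(rows[i + 1]) < len(current):
            limit = len(rows[i + 1])
            counts = {p: c for p, c in counts.items()
                      if not (len(p) >= limit and c < delta)}
    # Final support filter.
    return {p: c for p, c in counts.items() if c >= delta}
```

On the database of Table 3.2 with δ = 2, this yields the single frequent pattern {2, 3} with count 2, matching the illustration in the next section.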
3.4.2 Illustration of the algorithm
Given the database D in Table 3.2 and δ = 2.
Table 3.2: Concept Generation: Initial database
(1 2 3 4)
(1 2 4)
(2 3 4)
(5 6 7)
(2 3)
(5 6)
(6 7)
Tables 3.3 to 3.8 show, for each iteration of the outer loop, the ratings compared against currentTuple and the resulting intersections.
After Iteration 1 (shown in Table 3.3), all patterns of length 3 or more can be pruned, since the length of the next element {1 2 4} is 3 and it cannot produce overlaps of length 3 or more. In this case, only {2 3 4} and {1 2 4} qualify, but neither meets the δ = 2 condition and hence both are pruned.
Similarly, after Iteration 4 (shown in Table 3.6), all patterns of length 2 or more can be pruned, since the remaining elements are of length 2 and cannot produce overlaps of length 2 or more. In this case, {2 3}, {2 4}, {5 6} and {6 7} qualify, but only {2 3} meets the δ = 2 condition and hence all the others are pruned. After the final pruning, only ({φ}, 6) and ({2, 3}, 2) remain.
Table 3.3: Concept Generation: currentTuple = {1 2 3 4}. Iteration 1
{1 2 3 4} ∩ {1 2 4} = {1 2 4}
{1 2 3 4} ∩ {2 3 4} = {2 3 4}
{1 2 3 4} ∩ {5 6 7} = φ
{1 2 3 4} ∩ {2 3} = {2 3}
{1 2 3 4} ∩ {5 6} = φ
{1 2 3 4} ∩ {6 7} = φ
Table 3.4: Concept Generation: currentTuple = {1 2 4}. Iteration 2
{1 2 4} ∩ {2 3 4} = {2 4}
{1 2 4} ∩ {5 6 7} = φ
{1 2 4} ∩ {2 3} = {2}
{1 2 4} ∩ {5 6} = φ
{1 2 4} ∩ {6 7} = φ
Table 3.5: Concept Generation: currentTuple = {2 3 4}. Iteration 3
{2 3 4} ∩ {5 6 7} = φ
{2 3 4} ∩ {2 3} = {2 3}
{2 3 4} ∩ {5 6} = φ
{2 3 4} ∩ {6 7} = φ
Table 3.6: Concept Generation: currentTuple = {5 6 7}. Iteration 4
{5 6 7} ∩ {2 3} = φ
{5 6 7} ∩ {5 6} = {5 6}
{5 6 7} ∩ {6 7} = {6 7}
Table 3.7: Concept Generation: currentTuple = {2 3}. Iteration 5
{2 3} ∩ {5 6} = φ
{2 3} ∩ {6 7} = φ
Table 3.8: Concept Generation: currentTuple = {5 6}. Iteration 6
{5 6} ∩ {6 7} = {6}
Ignoring φ, the only pattern meeting δ = 2 is frequentPatterns = ({2, 3}, 2).
3.4.3 Complexity Analysis
Average Space Complexity of Concept Generation
Given a database of n users:
Average Case assumption: Suppose each user is compared with m other users, producing m candidate patterns, and, given a support threshold δ, assume that m/2 of these patterns have support greater than δ. Total number of patterns retained:
(n−1)/2 + (n−2)/2 + (n−3)/2 + ... + (n−(n−1))/2
= [(n−1) + (n−2) + (n−3) + ... + (n−(n−1))] / 2
= n(n−1)/4
≈ O(n²)
Worst Case assumption: Suppose each user is compared with m other users, producing m candidate patterns, and assume that all m patterns have support greater than δ. Total number of patterns:
(n−1) + (n−2) + (n−3) + ... + (n−(n−1))
= n(n−1)/2
≈ O(n²)
Average Time Complexity of Concept Generation
The average time complexity is independent of the nature of the data: all pair-wise comparisons must be performed to determine which patterns meet the support threshold δ. Hence the time complexity is always O(n²), where n is the number of users.
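A quick sanity check of the quadratic comparison count (illustrative code, not from the thesis): the dual for-loop over n rows performs exactly n(n−1)/2 pair-wise intersections.

```python
def count_pairwise_comparisons(n):
    # Mirror the dual for-loop of conceptGenerate without doing any work.
    count = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            count += 1
    return count

# For the 7-row example database, 7*6/2 = 21 comparisons are made.
assert count_pairwise_comparisons(7) == 7 * 6 // 2
```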
3.5 Concept Partition
This component helps partition a list of concepts into smaller lists of disjoint concepts.
The case for multiple lattices
To illustrate the need for concept partition, let us consider the ratings database shown in Figure 3.2. By applying the algorithm described in the previous section, we can generate the following concepts: (E), (A B), (D E), (C D E), (A B C). If we apply the requirement that all children concepts should be strict supersets of their parents (w.r.t. rated items), there are two disconnected segments in this lattice: {(A B C), (A B)} and {(E), (D E), (C D E)}. Clearly, these two subsets of concepts belong to different lattices, and this algorithm partitions them.
Figure 3.2: A sparse dataset and its associated lattice
3.5.1 Algorithm for Concept Partition
Algorithm conceptPartition
Input: A list of unique patterns
Output: Groups of concepts/patterns that belong to disjoint lattices.
1. sizeOfConcepts ← number of unique patterns in frequentPatterns
2. conceptSet ← [ ][ ]
3. visited[sizeOfConcepts] ← 0
4. Sort frequentPatterns in the decreasing order of concept length
5. for i ← 1 to sizeOfConcepts
6.   if visited[i] EQUALS 1
7.   then continue; endif
8.   currentConcept ← frequentPatterns[i]
9.   conceptSet[i][ ] ← [currentConcept]
10.  do
11.    count ← 1
12.    for j ← i+1 to sizeOfConcepts
13.    do
14.      compareConcept ← frequentPatterns[j]
15.      if compareConcept ⊂ currentConcept
16.      then subset = true
17.      else subset = false
18.      endif
19.      if subset equals true
20.      then visited[j] ← 1
21.        conceptSet[i][count] ← compareConcept
22.        Increment count
23.      endif
24.    endfor
25. endfor
3.5.2 Description of the algorithm
The basic idea of the algorithm is the following. Given a list of concepts, we first sort it in the descending order of concept length (Line 4). This ensures that the longest concepts are considered first by the algorithm, which is important because the aim of the algorithm is to identify subsets of the largest concepts in the list and group them into one lattice. Once sorted, we begin by looking at each element E (referred to as currentConcept) in the list (Line 5) and identifying all concepts M that occur after it (Line 12) and are subsets of E (Lines 15-18).
A concept C1 with items {i1,...,ik} is a subset of concept C2 with items {i1,...,im} iff {i1,...,ik} ⊂ {i1,...,im}. All such concepts M (including E itself) are then grouped under one partition (Line 21). Each partition is represented by a row of the variable conceptSet. We use a variable called visited to ensure that a concept already placed in some partition does not start a new partition of its own: if a node is found to be a subset, it is marked as visited (Line 20) and will not be considered for creating a new partition. This is done in Line 6, where if a node is visited, the loop simply continues to the next node. In the following example, Iterations 3 and 4 illustrate this situation: the nodes {2 3} and {2 4} have been visited during previous iterations (Iterations 1 and 2), so they are not considered for creating a new partition and the algorithm continues with the next available node.
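A compact Python rendering of conceptPartition (a sketch with illustrative names; proper-subset comparison on frozensets plays the role of the ⊂ test):

```python
def concept_partition(concepts):
    """Group concepts into partitions headed by the largest concepts.

    Each partition starts at an unvisited concept and collects every
    later concept that is a proper subset of it.  A concept marked as
    visited never heads a new partition, though it may be listed in
    several partitions (it is a shared lower node of those lattices).
    """
    ordered = sorted((frozenset(c) for c in concepts), key=len, reverse=True)
    visited = [False] * len(ordered)
    partitions = []
    for i, head in enumerate(ordered):
        if visited[i]:
            continue
        group = [head]
        for j in range(i + 1, len(ordered)):
            if ordered[j] < head:  # proper subset of the partition head
                visited[j] = True
                group.append(ordered[j])
        partitions.append(group)
    return partitions
```

On the concepts of Table 3.9 this produces the four groups of Table 3.14, e.g., [{1, 2, 4}, {2, 4}, {2}] and [{2, 3, 4}, {2, 3}, {2, 4}, {2}].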
3.5.3 Illustration of the algorithm
Table 3.9: Concept Partition: After Sorting
(1 2 4)
(2 3 4)
(2 3)
(2 4)
(5 6)
(6 7)
(6)
(2)
Tables 3.10 to 3.13 show, for each iteration, the result of determining whether each later concept is a subset of currentConcept (true) or not (false).
Table 3.10: Concept Partition: currentConcept = {1 2 4}. Visited[1 2 4] = 0. Iteration 1. conceptSet[1] = (1 2 4) (2 4) (2).
{2 3 4} ⊂ {1 2 4}? false
{2 3} ⊂ {1 2 4}? false
{2 4} ⊂ {1 2 4}? true
{5 6} ⊂ {1 2 4}? false
{6 7} ⊂ {1 2 4}? false
{6} ⊂ {1 2 4}? false
{2} ⊂ {1 2 4}? true
Table 3.11: Concept Partition: currentConcept = {2 3 4}. Visited[2 3 4] = 0. Iteration 2. conceptSet[2] = (2 3 4) (2 3) (2 4) (2).
{2 3} ⊂ {2 3 4}? true
{2 4} ⊂ {2 3 4}? true
{5 6} ⊂ {2 3 4}? false
{6 7} ⊂ {2 3 4}? false
{6} ⊂ {2 3 4}? false
{2} ⊂ {2 3 4}? true
Iteration 3: Concept Partition: currentConcept = (2 3). Visited[2 3] = 1. CONTINUE
Iteration 4: Concept Partition: currentConcept = (2 4). Visited[2 4] = 1. CONTINUE
Table 3.12: Concept Partition: currentConcept = {5 6}. Visited[5 6] = 0. Iteration 5. conceptSet[3] = (5 6) (6).
{6 7} ⊂ {5 6}? false
{6} ⊂ {5 6}? true
{2} ⊂ {5 6}? false
Table 3.13: Concept Partition: currentConcept = {6 7}. Visited[6 7] = 0. Iteration 6. conceptSet[4] = (6 7) (6).
{6} ⊂ {6 7}? true
{2} ⊂ {6 7}? false
Iteration 7: Concept Partition: currentConcept = (6). Visited[6] = 1. CONTINUE
Iteration 8: Concept Partition: currentConcept = (2). Visited[2] = 1. CONTINUE
Table 3.14: After partitioning
Lattice1: (1 2 4) (2 4) (2)
Lattice2: (2 3 4) (2 3) (2 4) (2)
Lattice3: (5 6) (6)
Lattice4: (6 7) (6)
3.5.4 Complexity Analysis of the algorithm
In the worst case, all concepts are disjoint w.r.t. each other, and we would need to iterate over the entire list of n concepts for each concept. Hence the time complexity is O(n²).
Proof: Given a list of n concepts, in the worst case each concept is disjoint from every other concept. Hence the total number of comparisons equals
(n − 1) + (n − 2) + (n − 3) + ... + (n − (n − 1)) = n(n−1)/2 ≈ O(n²)
However, if the data has many concepts that are linked to each other, then after every iteration many concepts will be marked as visited and hence will not be considered for heading new partitions.
3.6 Lattice Generation
This component helps build lattices from a list of concepts.
3.6.1 Algorithm for lattice generation
Algorithm latticeGenerate
Input: A list of unique concepts conceptList
Output: A lattice of concepts
1. Sort the concepts in the increasing order of concept length (equivalently, start from the bottom of the list if the list was obtained from the conceptPartition algorithm)
2. largestLength ← length of the longest concept in conceptList
3. conceptSize ← sizeOf(conceptList)
4. conceptsBySize ← splitByLength(conceptList)
5. currentLattice ← [ ]
6. for i ← 1 to largestLength
7. do
8.   currentLattice ← combineLevels(currentLattice, conceptsBySize, i)
9. endfor
10. return currentLattice
12. splitByLength(conceptList)
Input: A list of unique concepts conceptList
Output: The concepts grouped by their length
13. largestLength ← length of the longest concept in conceptList
14. conceptSize ← size of conceptList
15. conceptsBySize[largestLength][ ] ← [ ]
16. currentLength ← 1
17. count ← 0
18. for i ← 1 to conceptSize
19. do
20.   if length of conceptList[i] equals currentLength
21.   then conceptsBySize[currentLength][count] ← conceptList[i]
22.     Increment count
23.   else Increment currentLength
24.     count ← 0
25.     conceptsBySize[currentLength][count] ← conceptList[i]; Increment count
26.   endif
27. endfor
28. return conceptsBySize
29. end splitByLength
31. combineLevels(currentLattice, conceptsBySize, nextLevel)
Input: A lattice, the list of all concepts available, and the next level
Output: A lattice with concepts from the next level added to the input lattice
32. numberConcepts ← number of concepts at level nextLevel of conceptsBySize
33. levelConcepts ← [ ]
34. for i ← 1 to numberConcepts
35. do
36.   newNode ← conceptsBySize[nextLevel][i]
37.   currentLevel ← nextLevel − 1
      (∗ start from the bottommost level of the current lattice ∗)
38.   elusiveConcepts ← newNode
39.   checkContains ← newNode
40.   while elusiveConcepts is not empty and currentLevel is greater than 0
41.   do
42.     levelConcepts ← conceptsBySize[currentLevel]
43.     numConcepts ← number of concepts in levelConcepts
44.     for j ← 1 to numConcepts
45.     do
46.       if levelConcepts[j] ⊂ newNode and levelConcepts[j] ∩ checkContains is not NULL
47.       then add a parent-child relationship between levelConcepts[j] and newNode
48.         elusiveConcepts ← elusiveConcepts ∩ (newNode − levelConcepts[j])
49.         Increment count
50.       endif
51.     endfor
        (∗ checkContains ensures that cyclic parent redundancy is avoided ∗)
52.     checkContains ← elusiveConcepts
53.     currentLevel ← currentLevel − 1
54.     count ← 0
55.   endwhile
56. endfor
57. end combineLevels
3.6.2 Description of the algorithm
The basic idea of the algorithm is to build the final lattice incrementally. In order to achieve incremental building, concepts are first partitioned into groups having an equal number of ratings. This is done using the splitByLength() method.
The splitByLength() method employs a variable called conceptsBySize, a two-dimensional array. Each row of this array consists of a list of concepts (patterns) of equal length, while any two rows contain concepts (patterns) of differing lengths. The method goes through all the concepts in the list (Line 18) and places each concept in the appropriate row of conceptsBySize (Lines 21 & 25). Each row of conceptsBySize is then added to a growing lattice in the increasing order of pattern length (which equals the number of items in a concept/pattern).
The primary requirement of a lattice building algorithm is to avoid cyclic redundancy. Cyclic redundancy occurs when ancestors of a node are also added as immediate parents. For instance, consider three concepts A, B and C, where A is a child of B and B is a child of C. We need to ensure that only B is listed as a parent of A, and not C, although C is a valid ancestor of A. This is done in order to ensure that the lineage of a node is traced along a unique path. To ensure that there is no cyclic redundancy, Aparna [6] relies on an explicit ancestor search approach that marks all ancestors of a new node as not-parent.
The motivation behind our approach is to avoid cyclic redundancy implicitly, as opposed to the exhaustive and time-consuming method proposed in Aparna's thesis. By merging levels incrementally, we need not do an exhaustive search of the lattice each time. The merging process is done by the combineLevels() method (Line 8).
The combineLevels() method goes through all the concepts in the current level that need to be added to the lattice (Line 34). The algorithm merges the new group of concepts, each l ratings long, with the bottom of the lattice while keeping track of items in the new concept that do not have a parent yet. These items are called elusiveConcepts (Line 48). When a parent is found, elusiveConcepts is recomputed to exclude the newly found parent's items. If, after merging with the bottom level, elusiveConcepts is not empty, we try merging with one level higher, and so on, until the top of the lattice is reached or elusiveConcepts becomes empty (Line 40).
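The level-wise merge described above can be sketched in Python as follows. This is a simplified reading of latticeGenerate/combineLevels in which the "levels" are the distinct concept lengths taken in increasing order; the function returns only the parent links, and all names are illustrative rather than from the thesis implementation:

```python
def build_lattice(concepts):
    """Return {concept: set of immediate parents} using level-wise merging.

    For each new node we scan the levels below it, from longest to
    shortest.  A node at a lower level becomes a parent only if it is a
    proper subset of the new node AND it overlaps checkContains, the
    items still lacking a parent when the previous level was finished.
    This implicitly skips ancestors without an explicit ancestor search.
    """
    nodes = sorted({frozenset(c) for c in concepts}, key=len)
    lengths = sorted({len(n) for n in nodes})  # the "levels"
    by_level = {l: [n for n in nodes if len(n) == l] for l in lengths}
    parents = {n: set() for n in nodes}
    for node in nodes:
        elusive = set(node)          # items of node with no parent yet
        check_contains = set(node)   # snapshot of elusive per level
        for level in reversed([l for l in lengths if l < len(node)]):
            for cand in by_level[level]:
                if cand < node and cand & check_contains:
                    parents[node].add(cand)
                    elusive &= node - cand
            check_contains = set(elusive)
            if not elusive:
                break
    return parents
```

Because checkContains is refreshed only at the end of each level, two concepts at the same level (such as (4 5) and (5 6)) can both become parents even though their items overlap, while true ancestors are filtered out implicitly. On the concepts of Table 3.15, this reproduces the links of Figures 3.3-3.5; for instance, the parents of (1 2 3 4 5 6) come out as (1 2 3), (2 3 4), (4 5) and (5 6).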
3.6.3 Illustration of the algorithm
After applying splitByLength(), Table 3.15 shows the concepts grouped by length.
Table 3.15: Lattice Generation: After applying splitByLength()
Level 1: (1) (2) (3) (4) (5) (6) (7)
Level 2: (2 3) (4 5) (5 6)
Level 3: (1 2 3) (2 3 4)
Level 4: (1 2 3 4 5 6)
Level 5: (1 2 3 4 5 6 7)
Combining Level 1 and Level 2:
• Adding (2 3). Elusive element = (2 3). Considering Level 1: Add (2) as a parent. Elusive element = (2 3) − (2) = (3). Add (3) as a parent. Elusive element = (3) − (3) = φ.
• Adding (4 5). Elusive element = (4 5). Considering Level 1: Elusive element = φ after adding (4) and (5) as parents.
• The lattice generated after this step is shown in Figure 3.3.
Figure 3.3: Lattice after combining levels 1 and 2
Combining the current lattice with Level 3:
• Adding (1 2 3). Elusive element = (1 2 3). newNode = (1 2 3).
– Considering Level 2: Add (2 3) as a parent. (4 5) and (5 6) are not subsets. Elusive element = (1 2 3) − (2 3) = (1) = checkContains.
– Considering Level 1: Search for those elements that contain 1. If a node does not contain 1, then it is not a subset or it is an ancestor. This is how we implicitly encode ancestor relationships and avoid searching the lattice each time a new node is added. Add (1) as a parent. Do not add (2) or (3) because levelConcepts[j] ∩ checkContains is φ for levelConcepts[j] = (2) or (3). Do not add (4), (5), (6) or (7) because they are not subsets of newNode.
• Adding (2 3 4). Elusive element = (2 3 4). newNode = (2 3 4).
– Considering Level 2: Add (2 3) as a parent. (4 5) and (5 6) are not subsets. Elusive element = (2 3 4) − (2 3) = (4) = checkContains.
– Considering Level 1: Add (4) as a parent. Do not add (2) or (3) because levelConcepts[j] ∩ checkContains is φ. Do not add (1), (5), (6) or (7) because they are not subsets of newNode.
• The lattice generated after this step is shown in Figure 3.4.
Figure 3.4: Lattice after combining levels 1, 2 and 3
Combining the current lattice with Level 4:
• Adding (1 2 3 4 5 6). newNode = (1 2 3 4 5 6).
– Considering Level 3: Add (1 2 3) and (2 3 4) as parents. Elusive element = checkContains = (5 6).
– Considering Level 2: Add (4 5) and (5 6) as parents. Do not add (2 3) because levelConcepts[j] ∩ checkContains is φ.
Combining the previous lattice with Level 5:
• Adding (1 2 3 4 5 6 7). newNode = (1 2 3 4 5 6 7).
– Considering Level 4: Add (1 2 3 4 5 6) as a parent. Elusive element = checkContains = (7).
– Considering Level 3: Do not add any nodes, since levelConcepts[j] ∩ checkContains is φ for (1 2 3) and (2 3 4).
– Considering Level 2: Do not add any nodes, since levelConcepts[j] ∩ checkContains is φ for (2 3), (4 5) and (5 6).
– Considering Level 1: Add (7) as a parent. Do not add any other nodes, since levelConcepts[j] ∩ checkContains is φ for (1), (2), (3), (4), (5) and (6).
• The lattice generated after combining all levels is shown in Figure 3.5.
3.6.4 Correctness Proof of the algorithm
In order to prove the correctness of the algorithm, we need to show three things:
• The algorithm adds all the parents for a new node
• The algorithm adds only the correct parents for a new node
• The algorithm adds only the immediate parents, and not any of the ancestors, for a new node
Figure 3.5: Lattice after combining all levels from 1 through 5
Hypothesis: The algorithm does not add ancestors for a new node
To Prove: Given a new node n and its parent m1, there does not exist any node m2 which is a parent of both m1 and n.
Proof: Suppose there exists m2 such that m2 is a parent of m1 and of n. Since m2 is a parent of m1, m1 is visited before m2 in the level-wise merging process. Before considering m2, elusiveConcepts = n − ∪p, where p ranges over the parents of n found so far. Since m1 is already a parent, elusiveConcepts excludes all elements of m1, and hence all elements of m2 (as m2 ⊂ m1), so m2 ∩ elusiveConcepts = φ. Hence m2 will not be added as a parent of n.
Hypothesis: The algorithm adds all the parents for a new node
Proof: Given a new node n with parents p, let m be a valid parent that was not added. To be added as a parent, the condition m ⊂ n and m ∩ checkContains ≠ φ must be satisfied.
• If m is a valid parent, then m ⊂ n is always true.
• If m ∩ checkContains = φ, then m is a parent but also an ancestor, and hence is not added. Otherwise, m is always added as a parent.
The algorithm terminates only when checkContains = φ or when level 1 (with concepts of length 1) is reached. When checkContains ≠ φ at the end of a particular level l, there exists e ∈ n with e ∉ ∪p, and the search continues. Now, a node r at level l−1 or lower is a parent if there exists e ∈ n with e ∉ ∪p such that e ∈ r and r ⊂ n.
This is repeated down to level 1 or until checkContains = φ. If we complete searching level 1, clearly there are no more concepts to search and all possible parents have been added. If, however, checkContains = φ at some level p, then all elements have at least one parent, which implies that all remaining nodes (at level p−1 or lower) are either not valid parents or are ancestors, and hence should not be added as parents of n. This guarantees that all parents of n are added without exception.
Hypothesis: The algorithm adds only the correct parents for a new node
Proof: Given a new node n at level l of length nl, we want to find only the correct parents of the node while ensuring that ancestors of a valid parent are avoided. A node p is added as a parent only if it is a subset of node n. Suppose we add an invalid parent p. Then p would contain at least one element ei that is not in n. However, this contradicts the condition that we add p only if p ⊂ n. Hence we add only the correct parents.
3.7 Lattice Querying
This component uses an algorithm that can discover recommendable items efficiently from large lattices. The motivation for this algorithm is as follows:
Upward Closure: In a lattice, if an element e is not present in a concept C, then it will not be present in any parent of C. Thus, when searching for a minimum number of matches η, if a concept does not have η matches, none of its parents will have η matches and hence they need not be considered as candidates for recommendation. Upward closure can be applied here only because we assume every child is a proper superset of its parent.
3.7.1 Algorithm for Lattice Querying
Algorithm latticeQuerying
Input: A user query Q of items of the form (UserId, RatingId1, RatingId2) where each rating is an item in the database, a minimum match threshold η, and a minimum to-recommend threshold δ
Output: A list of items that the user is likely to rate in the future.
1. Candidate Nodes ← Bottommost node of the lattice
2. Good Nodes ← [ ]
3. while all nodes have not been visited
4. do
5.   if Q ⊂ node AND cardinality of (node ∩ Q) ≥ η AND node is NOT in Banned Nodes
6.   then Add all parents of node that are not in Banned Nodes to Candidate Nodes
7.     Add (node, cardinality of (node ∩ Q)) to Good Nodes
8.   else Add all parents of node to Banned Nodes
9.   endif
10.  Remove node from Candidate Nodes
11.  node ← Get next node in Candidate Nodes
12. endwhile
13. Sort Good Nodes in the descending order of cardinality of (node ∩ Q)
14. if Good Nodes is EMPTY
15. then return NULL
16. else Return the top node n in Good Nodes which has at least δ items to recommend
17. endif
3.7.2 Description of the algorithm
The search starts from the bottom of the lattice (Line 1) and proceeds all the way up to the top of the lattice. Each candidate node has to produce an overlap of at least η items (Line 5) if it is to be considered a candidate for recommendation. If a given node does not have η matches, it can safely be ignored; moreover, due to the upward closure property, all parents of this node can also be ignored (Line 8). This produces a large saving as compared to ignoring just the one node, and even in a moderately connected lattice it can achieve a lot of pruning. If a given node has η matches, then all its parents are possible candidates and are added to the variable Candidate Nodes. If a node is no longer a candidate, it is added to Banned Nodes (Line 8). Thus, when a node is considered as a candidate, we check to make sure that it is not banned (Line 5), and if a parent of a node is banned, that parent is not added to Candidate Nodes (Line 6). This is necessary since each node can be reached through multiple paths, and some of these paths may be useful while others may not; even if a single path leading to a node is unfavorable, that node and everything above it in the lattice can be avoided. Once the entire lattice is traversed, the candidate good nodes are sorted in decreasing order of the degree of overlap with the input query and the top n nodes are returned.
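The pruning search can be sketched in Python as below. The lattice is given as a parent map; following the worked example, a node qualifies when it contains all of the query's items (and at least η of them), and a failed node bans its parents via upward closure. Names and the tie-breaking rule (largest overlap, then longest node) are illustrative assumptions, not from the thesis implementation:

```python
from collections import deque

def lattice_query(parents, bottom, query, eta, delta):
    """Return the best matching concept, or None.

    parents: dict mapping each node (frozenset) to its parent nodes.
    bottom:  the bottommost (largest) node, where the search starts.
    A node is 'good' if it contains the query and overlaps it by at
    least eta items; otherwise its parents are banned (upward closure:
    a parent is a subset of its child, so it cannot match either).
    """
    query = frozenset(query)
    candidates, banned, good = deque([bottom]), set(), []
    while candidates:
        node = candidates.popleft()
        if query <= node and len(node & query) >= eta and node not in banned:
            good.append(node)
            candidates.extend(p for p in parents.get(node, ())
                              if p not in banned and p not in candidates)
        else:
            banned.update(parents.get(node, ()))
    # Prefer the largest overlap, then the longest node (the bottommost
    # good node), and require at least delta items left to recommend.
    good.sort(key=lambda n: (len(n & query), len(n)), reverse=True)
    for node in good:
        if len(node - query) >= delta:
            return node
    return None
```

On the lattice of the worked example below, the call with Q = {4, 5, 6}, η = 2 and δ = 3 returns the node (1 2 3 4 5 6).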
Figure 3.6: Example of a typical lattice to search for making recommendations
3.7.3 Illustration of the algorithm
Consider the lattice L in Figure 3.6
Given the lattice L and query Q = (4 5 6), with η = 2 and δ = 3:
• Candidate Nodes = (1 2 3 4 5 6). Banned Nodes = ( ). Good Nodes = ( ).
• Node = (1 2 3 4 5 6) contains Q. |Node ∩ Q| = 3. Candidate Nodes = (1 2 3 4 5), (2 3 4 5 6), (1 3 4 5 6). Banned Nodes = ( ). Good Nodes = {(1 2 3 4 5 6), 3}.
• Node = (1 2 3 4 5) does NOT contain Q. Banned Nodes = (1 2 3), (2 3 5), (2), (3). Candidate Nodes = (2 3 4 5 6), (1 3 4 5 6). Good Nodes = {(1 2 3 4 5 6), 3}.
• Node = (2 3 4 5 6) contains Q. |Node ∩ Q| = 3. Banned Nodes = (1 2 3), (2 3 5), (2), (3). Good Nodes = {(1 2 3 4 5 6), 3}, {(2 3 4 5 6), 3}. Candidate Nodes = (1 3 4 5 6). (2 3 5) is in Banned Nodes and hence is not added to Candidate Nodes.
• Node = (1 3 4 5 6) contains Q. |Node ∩ Q| = 3. Banned Nodes = (1 2 3), (2 3 5), (2), (3). Candidate Nodes = (3 5 6). Good Nodes = {(1 2 3 4 5 6), 3}, {(2 3 4 5 6), 3}, {(1 3 4 5 6), 3}.
• Node = (3 5 6) does NOT contain Q. Banned Nodes = (1 2 3), (2 3 5), (2), (3), (5). Candidate Nodes = ( ).
• Candidate Nodes is empty. TERMINATE.
• Choose the topmost node in Good Nodes. This element is the longest candidate, since we start from the bottom of the lattice. In this case, the recommendation is (1 2 3 4 5 6).
Chapter 4
A Joke Recommender System
In this chapter, we look at a graphical user interface (GUI) implementation of a recommender system for jokes. We test and apply our framework and algorithms on the Jester dataset; a detailed description of this dataset can be found in the next chapter on results and also on the Jester dataset webpage [22]. Since most recommender systems run off web pages, we have implemented the GUI as a Java applet. Applets can be embedded easily into HTML pages using the applet tag or viewed using an appletviewer. The same GUI could be converted into a Frame-based Java desktop application in a short period of time while retaining much of the code.
4.1 Components of the UI
The GUI consists of three JList widgets. The first JList contains the list of all jokes in the joke database. It is backed by a scroll pane that supports scrolling through the jokes; at any time, a maximum of 14 jokes is displayed in the list box. The second JList holds the set of jokes rated by a new user. The final JList displays the recommended jokes. All three JList widgets support joke display, i.e., clicking on an item in any of the list boxes displays the corresponding joke in a panel below.
The GUI also contains four buttons from the Java Swing library. Two buttons serve to rate items: they transfer items from the list box named Joke Database to the list box named Rated Jokes (using the >> button) and back (using the << button). Two other buttons, named Recommend and Clear All, are also presented to the user.
The Recommend button initiates the recommender engine, fetches the recommendations and displays them in the Recommended Jokes list box. The Clear All button clears all the rated and recommended items and re-populates the Joke Database list with all the items in the database.
Finally, the GUI contains a JLabel panel that displays the jokes as images in its Icon. Screen shots of the GUI can be found on the following pages.
Figure 4.1: Main Screen of the GUI Applet for Joke recommendation
Figure 4.2: GUI for Joke Recommendation: User has already rated the following jokes: Joke #2, #5 and #6 and is currently viewing joke #7.
Figure 4.3: GUI for Joke Recommendation: User is viewing recommendations.
Figure 4.4: The Clear All Button clears all ratings and recommendations and repopulates the Joke Database.
Chapter 5
Case Study - Jester and MovieLens Datasets
In this chapter, we investigate the performance of the proposed framework and algorithms on two real-world datasets, namely the Jester dataset and the MovieLens dataset. The Jester dataset is a collaborative filtering dataset [22] containing ratings for 100 jokes, each rated between -10 and +10, from 73,421 users, collected between April 1999 and May 2003. This dataset was chosen because it is highly dense (an average density of 0.7) and is capable of producing highly connected lattices. In order to measure density, we convert the ratings into a binary matrix (of users and jokes), assigning 1 if a user has rated a joke and 0 otherwise, and then compute the fraction of 1s over all the cells in the matrix. A dense dataset implies that a significant number of jokes have been rated by users, which increases the number of commonly rated jokes across users. When users rate many jokes in common, the concepts generated from their ratings are highly linked (through these common jokes).
This translates into a densely connected lattice of concepts, as opposed to a lattice with very few parent-child relationships. This density makes the Jester dataset unique, since typical collaborative filtering datasets have many users rating very few items.
The MovieLens dataset is another popular collaborative filtering dataset [23].
Two versions of this dataset exist; one contains 100,000 ratings by 943 users over
1682 movies, the other contains 1,000,000 ratings by 6040 users on 3900 movies.
Such sparse datasets are typical of ratings databases. Both versions have a density of around 5%. Here, the density is computed as the fraction of actual ratings over the number of possible ratings. For example, the smaller version has 100,000 ratings out of a possible 943 × 1682 ratings. We apply our algorithms on both these typical yet contrasting datasets to analyze their performance.
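The density computation described above can be sketched directly. This is an illustrative sketch, not code from the thesis; the binary-matrix form (used for Jester) and the counts form (used for MovieLens) compute the same fraction of filled cells:

```python
# Density of a ratings dataset as the fraction of filled cells in the
# binary user-item matrix. `ratings` maps each user to the set of items
# they rated, so cell (u, i) is 1 iff i is in ratings[u].

def density(ratings: dict, num_items: int) -> float:
    """Fraction of 1s in the implied binary user-item matrix."""
    ones = sum(len(items) for items in ratings.values())
    return ones / (len(ratings) * num_items)

# Equivalent counts-only form, as used for MovieLens above.
def density_from_counts(num_ratings: int, num_users: int, num_items: int) -> float:
    return num_ratings / (num_users * num_items)

# Smaller MovieLens version: 100,000 ratings by 943 users over 1682 movies.
print(f"{density_from_counts(100_000, 943, 1682):.3f}")  # 0.063
```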
5.1 Processing the Jester Dataset
As mentioned above, the Jester dataset contains continuous ratings for each of the 100 jokes. A score of 99 indicates that the user has not rated that particular joke. The dataset is available as three Excel files; we chose to test our algorithms on the file containing the most ratings, the one with 23,500 users. Since we are concerned only with which jokes a user rated, and not the accompanying scores, we discard the scores in our experiments. After executing our pre-processing scripts on the dataset, we end up with a database in the following format:
UserId, JokeId1, JokeId2, ..., JokeIdm
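The pre-processing step can be sketched as follows. This is an illustrative sketch, assuming each input row holds one user's scores for the jokes with 99 as the "not rated" sentinel, as described above (the actual Excel layout may carry additional columns):

```python
# Convert one Jester-style row of joke scores into the
# "UserId, JokeId1, ..., JokeIdm" record, discarding the scores.

NOT_RATED = 99  # sentinel value for an unrated joke

def preprocess_jester_row(user_id: int, scores: list) -> list:
    """Return [UserId, JokeId1, ..., JokeIdm] for the jokes actually rated."""
    rated = [joke_id for joke_id, s in enumerate(scores, start=1)
             if s != NOT_RATED]
    return [user_id] + rated

row = [4.2, 99, -7.5, 99, 0.0]        # toy row with 5 jokes
print(preprocess_jester_row(1, row))  # [1, 1, 3, 5]
```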
5.2 Processing the MovieLens Dataset
The MovieLens dataset contains discrete ratings between 1 and 5 for each of the 1682 movies. The ratings are explicit; hence, the absence of an entry implies that a user has not rated that movie. Since we are concerned only with which movies a user rated, and not the accompanying scores, we discard the scores in our experiments. After executing our pre-processing scripts on the dataset, we end up with a database in the following format:
UserId, MovieId1, MovieId2, ..., MovieIdm
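A sketch of the corresponding MovieLens pre-processing. The 100K dataset distributes one explicit rating per line (user id, movie id, rating, timestamp, tab-separated); grouping by user and dropping the scores yields the records above:

```python
# Group explicit (user, movie, rating, timestamp) lines into
# per-user lists of rated movie ids, discarding scores and timestamps.

from collections import defaultdict

def group_ratings(lines: list) -> dict:
    """Map each user id to the sorted list of movie ids they rated."""
    movies = defaultdict(set)
    for line in lines:
        user_id, movie_id, _rating, _timestamp = line.split("\t")
        movies[int(user_id)].add(int(movie_id))
    return {u: sorted(ms) for u, ms in movies.items()}

sample = ["1\t242\t3\t881250949", "1\t302\t3\t891717742", "2\t377\t1\t878887116"]
print(group_ratings(sample))  # {1: [242, 302], 2: [377]}
```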
5.3 Concept Generation - Identifying Repeatedly Rated Jokes/Movies
If a new user N rates a few items and these items are also found in the histories of other users U, then other items rated by the users U are likely candidates to be rated by N. For example, if a new user rated items 1 through 5, and a large number of users who rated items 1 through 5 also rated item #6, then it is likely that the new user will also rate item #6. This rests on the assumption that there exist underlying patterns in the manner users rate items; it is the basis of many item-based collaborative filtering approaches and the underlying assumption of our work. In the case of joke recommendation, it is natural to expect users with similar tastes to rate similar jokes. Thus, given a new user N who rated jokes JN = {j1, ..., jn}, if we can find a set of jokes JU that contains most of the jokes in JN along with other jokes JO, we can recommend the jokes JO to user N. Applying the concept generation algorithm on this dataset yields a list of concepts, each consisting of a set of joke ids. In order to build our model, we used 10,000 user ratings from the database. For the MovieLens dataset, the model was built using 80% of the dataset while the remaining 20% was used for testing. All experiments were performed on a Dell desktop with 2 GB of RAM and a 3.2 GHz Pentium 4 processor, running the SUSE Linux operating system.
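The item-overlap idea above can be sketched as follows. This is a minimal illustration of the JN/JU/JO reasoning, not the thesis's actual lattice-based algorithm; the 0.6 overlap threshold and the function name are illustrative assumptions:

```python
# Given a new user's rated items J_N and a collection of mined concepts
# (sets of item ids), recommend the leftover items J_O of every concept
# that covers most of J_N.

def recommend(rated: set, concepts: list, min_overlap: float = 0.6) -> set:
    """Union of items J_O from concepts containing most of the user's items."""
    suggestions = set()
    for concept in concepts:
        overlap = len(rated & concept) / len(rated)
        if overlap >= min_overlap:
            suggestions |= concept - rated   # the items J_O
    return suggestions

concepts = [{1, 2, 3, 4, 5, 6}, {1, 2, 7}, {8, 9}]
print(recommend({1, 2, 3, 4, 5}, concepts))  # {6}
```

Here the first concept fully contains the new user's items 1 through 5, so its remaining item #6 is recommended, mirroring the example in the text.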
5.3.1 Exploding space requirements in the subspace clustering of SCuBA
As explained earlier, SCuBA uses a basic pair-wise comparison algorithm to identify repeatedly rated items. Although the algorithm identifies many patterns, it suffers from having to store an explosive number of candidate patterns prior to pruning. In our optimized algorithm, we propose a means to prune the table at specific stages, and we have observed a significant reduction in space requirements. The graph in Figure 5.1 shows the actual number of patterns generated by the basic algorithm for the Jester dataset, alongside the number of patterns generated by our optimized concept generation algorithm.
Figure 5.2 shows the concepts generated using the naive algorithm proposed in SCuBA, and Figure 5.3 shows the concepts generated using our optimized concept generation algorithm for the MovieLens dataset. In order to study the performance of the algorithm along both dimensions of the dataset (number of users and number of jokes), we experimented with lattices built from 1000, 5000 and 10000 users, and observed the performance over 20, 40, 60, 80 and 100 jokes available to the user. Similarly, for the MovieLens dataset, we experimented with 200, 400 and 800 users and 100, 200, 400 and 800 movies.
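The candidate explosion described above can be illustrated with a toy version of the naive pair-wise generation. This is a sketch of the general idea only (the SCuBA algorithm differs in detail); its point is that the candidate table grows with every pair of users, including duplicates, before any pruning:

```python
# Naive pair-wise candidate generation: intersect every pair of user
# transactions and store each sufficiently large intersection as a
# candidate pattern, duplicates included, prior to pruning.

from itertools import combinations

def naive_candidates(transactions: list, min_size: int = 2) -> list:
    """All pair-wise intersections with at least min_size common items."""
    candidates = []
    for a, b in combinations(transactions, 2):
        common = a & b
        if len(common) >= min_size:
            candidates.append(frozenset(common))
    return candidates

users = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4}, {3, 4, 5}]
cands = naive_candidates(users)
print(len(cands), len(set(cands)))  # 5 candidates stored, only 4 distinct
```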
5.3.2 Results - Basic pair-wise comparisons vs. optimized algorithm
We can clearly observe that the number of concepts generated by the basic algorithm grows rapidly with the number of items rated. In the Jester dataset, for example, for 5000 users and 40 rated jokes approximately 9,000 patterns are generated, while doubling the number of items increases the candidate patterns to over 125,000. In contrast, for 5000 users and 40 rated jokes our approach produces just over 1,000 concepts, while doubling the number of jokes produces 18,000 concepts. In the MovieLens dataset, the naive approach produces a maximum of nearly 252,000 concepts for 800 users and an equal number of movies, while our algorithm produces roughly one-third of that (81,000 concepts) for the same user and movie configuration. Clearly, the number of concepts generated by our algorithm is much smaller than in the basic approach.
Another perspective on the pruning results is the average pruning percentage. As is evident from the graph, we observe an average 80% reduction in the number of candidate patterns generated by the naive approach. This holds uniformly over tests on a wide range of user counts (1000, 2000, 5000 and 10000) and of items rated (20, 40, 60, 80, 100). The pruning achieves substantial space savings at the cost of an initial sort of the user database; since standard sorting algorithms run in O(n log n) time, this overhead is well worth it. The graph in Figure 5.4 shows the percentage pruning obtained by our optimization step for the Jester dataset. The corresponding figure for the MovieLens dataset is Figure 5.5.
Figure 5.1: Naive vs. Optimized Concept Generation - Jester
Figure 5.2: Naive Concept Generation - MovieLens
Figure 5.3: Optimized Concept Generation - MovieLens
Figure 5.4: Percentage Pruning: Naive vs. Optimized Concept Generation - Jester dataset
Figure 5.5: Percentage Pruning: Naive vs. Optimized Concept Generation - MovieLens dataset
5.4 Timing the Model Building Algorithms
Concept generation is a memory- and time-intensive approach to identifying subspaces in the dataset. However, as mentioned in previous sections, the performance of this algorithm is comparable to a number of existing subspace clustering algorithms. For the large 10,000 × 100 Jester dataset, concepts were generated in roughly 2 hours. The largest size for the MovieLens dataset was the complete dataset itself (100,000 ratings), for which concepts were generated in just over one minute. The time required to build the lattices is of the order of a few minutes: for the 10,000 × 100 dataset, the lattice was built in roughly 3 minutes, and for the MovieLens dataset with 100,000 ratings this was done in under 70 seconds. It is important to keep in mind that the Jester dataset is extremely dense. Hence it has many parent-child
Figure 5.6: Jester dataset: Time taken to generate concepts
Figure 5.7: MovieLens dataset: Time taken to generate concepts
Figure 5.8: Jester dataset: Time taken to generate lattices from concepts
Figure 5.9: MovieLens dataset: Time taken for concept generation, partition and lattice building
relationships in the lattice. This means that each node in the lattice has many links to other nodes, which tends to increase the lattice building time; it would not be surprising to observe shorter lattice building times for sparser datasets. Note that time requirements for Concept Partition have not been reported because they remained consistently under 1 second for all the variations in the Jester data, namely varying the number of items rated and the number of users. The MovieLens dataset model building time is shown in Figure 5.7; it is comparable with the time values shown in Figure 14 of the paper by Nitin and others [16]. We observe that the total model building time for the MovieLens dataset (shown in Figure 5.9) is largely linear, similar to the model building times observed by Nitin and others.
5.5 Space Requirements for the Model Building Algorithms
Note that space requirements for Concept Partition have not been reported because they remained consistently under 1 megabyte for all the variations in the data, namely varying the number of items rated and the number of users. Space requirements for the MovieLens dataset in the lattice generation phase have not been provided because of their very low memory footprint.
Figure 5.10: Jester dataset: Space required to generate concepts
Figure 5.11: MovieLens dataset: Space required to generate concepts
Figure 5.12: Jester dataset: Space required to generate lattices from concepts
5.6 Real-time Performance of Algorithm: Lattice Discover
In order to study the real-time performance of the algorithm, we built a lattice consisting of nearly 10,500 nodes using 10,000 user ratings from the aforementioned dataset. After the model was built, we randomly chose 1000 user ratings from the remaining 13,500 users and created 4 datasets, each containing 250 users. These are labeled Jester Set1 through Jester Set4. In order to measure query performance in terms of processing speed and accuracy of results, we calculated the time required, precision and recall based on the following experiment. Similar experiments have been performed on the MovieLens dataset by Nitin Agarwal and others [16].
From each row of the user ratings, we extract a portion of the ratings and label them as query terms. These could be the initial ratings made by a new user of an online rating system. The goal of the recommender system is to predict possible jokes of interest to the new user; these are referred to as the target terms. Recommendations made by the system are referred to as recommended terms. For instance, given a list of ratings (Joke1, Joke2, Joke3, ..., Joke50), we may use the first 15 (Joke1, ..., Joke15) as query terms and (Joke16, ..., Joke50) as target terms. Suppose the recommender offers recommended terms = (Joke19, Joke31, Joke35, Joke56); three of these fall among the target terms, so precision = 3/4 × 100 = 75% and recall = 3/35 × 100 ≈ 10%. The following definitions are used to calculate the metrics.
Time: Amount of time in milliseconds required to retrieve a recommendation for a given set of query terms.
Precision: |RecommendedTerms ∩ TargetTerms| / |RecommendedTerms| × 100
Recall: |RecommendedTerms ∩ TargetTerms| / |TargetTerms| × 100
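These metrics can be written out directly. A small sketch, reusing the worked example from the text (target terms Joke16 through Joke50, recommended terms Joke19, Joke31, Joke35 and Joke56):

```python
# Precision and recall over sets of item ids, as defined above.

def precision(recommended: set, target: set) -> float:
    """Percentage of recommended terms that are target terms."""
    return len(recommended & target) / len(recommended) * 100

def recall(recommended: set, target: set) -> float:
    """Percentage of target terms that were recommended."""
    return len(recommended & target) / len(target) * 100

target = set(range(16, 51))                # Joke16 ... Joke50
recommended = {19, 31, 35, 56}
print(precision(recommended, target))      # 75.0
print(recall(recommended, target))         # 3/35 * 100, roughly 10%
```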
The size of the query terms ranges between 5 and 15% of the maximum number of items in the dataset. This range was chosen because users are likely to rate very few items before requesting recommendations. As can be noted in Figure 5.13, the average precision for the Jester dataset is about 97%, while the average recall in Figure 5.14 is about 65%. Although recall can be increased by offering more recommendations, this comes at the cost of precision. It is more important for users to get the right recommendations than to get many recommendations; hence, we decided to trade recall for precision. Finally, in Figure 5.15 we can observe that
Figure 5.13: Jester dataset: Precision Measurement
Figure 5.14: Jester dataset: Recall Measurement
Figure 5.15: Jester dataset: Query processing time
the average query processing time is approximately 2 seconds, which is an acceptable time to wait for a recommendation.
5.7 Improved Performance
Nitin et al. use the MovieLens dataset to compute the precision of their approach. The precision for the 100,000-rating database using our approach is shown in Figure 5.16. We compare this with Figure 11 in the paper on SCuBA by Nitin and others [16]. They show steadily degrading precision values as the percentage of query terms increases from 5 to 50%: the precision at 5% is roughly 30%, while the precision at 50% is nearly 10%. Our approach, on the other hand, shows steady performance independent of the number of items considered. Although the performance is slightly poorer at lower percentages, our precision values remain consistent, and this is essential for a good overall system. When the number of query terms increases, users expect the system to have learnt their preferences well and would not tolerate degrading performance.
Figure 5.16: MovieLens dataset: Precision Measurement
Chapter 6
Conclusion and Future Direction
In this chapter, we examine the strengths and drawbacks of our approach in depth. We also suggest possible directions for future research using the approach. This work is merely a starting point for interesting research into this powerful approach to better recommendation systems, and we sincerely believe that innovative future augmentations can greatly enhance the performance of the current model.
6.1 What’s good in a concept-based approach?
We believe that the concept-based approach is generic and can be applied to data of any nature. Concepts are an abstraction, and entities in any data can be molded to represent concepts; characteristics of entities become attributes of the corresponding concepts in this framework. The basic working of a recommendation system is one of clustering, i.e., segmenting concepts into groups such that items in a group are similar to each other and different from items in other groups. By this token, the choice of how to involve the attributes in grouping the individual entities of the data is up to the designer. In our approach, each item in the ratings database is an attribute, while a concept is a set of such items. Another strength of this approach is that the model-building time and real-time performance are acceptable for any modern application.
6.2 Future Direction and Caveats
Our basic model can easily be augmented by adding user statistics to each node in the lattice to guide the search for information. A basic augmentation is to include the number of users who lend support to a repeated pattern from the ratings database. In our approach, we retain this count until the end of Concept Generation, where we prune patterns based on support; we do not otherwise utilize user statistics in the search for information. Nodes may also include summarized information about a perspective on the users. For example, consider a gender-based analysis of the data: the perspective in this case is gender, and we may include a count of the number of men and women along with the ratings in each node.
Suppose we wish to study the scoring patterns of the users, i.e., patterns in the scores assigned by users to the items in the ratings database. We could store the mean and standard deviation of the scores assigned by users to each item in a concept node. This has some interesting applications: maintaining such statistics could help us discern patterns in the focus of a group of users. A sample conclusion is that people who rate items 1 through 5 are divided in their choice of item 6 but are unequivocal in their dislike of item 7. This may be observed in the values of the mean and standard deviation of those items as we traverse the lattice.
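The augmentation suggested above can be sketched as follows. This is a hedged illustration only: the ConceptNode layout, field names and toy scores are assumptions, not the thesis's actual data structure. Using Jester-style scores in [-10, +10], users split on item 6 show a large spread, while agreed dislike of item 7 shows a low mean with a small spread:

```python
# Augment a concept node with per-item score statistics (mean, std)
# and the number of supporting users, as suggested in the text.

from dataclasses import dataclass, field
from statistics import mean, pstdev

@dataclass
class ConceptNode:
    items: frozenset
    support: int = 0                                  # users supporting the pattern
    score_stats: dict = field(default_factory=dict)   # item -> (mean, std)

def augment(node: ConceptNode, scores_by_item: dict) -> None:
    """Attach score statistics to a node; one score per supporting user."""
    node.support = len(next(iter(scores_by_item.values())))
    node.score_stats = {i: (mean(s), pstdev(s))
                        for i, s in scores_by_item.items()}

node = ConceptNode(items=frozenset({6, 7}))
augment(node, {6: [9.0, -9.0, 8.5, -8.5],      # users split on item 6
               7: [-8.0, -7.5, -8.5, -8.0]})   # unequivocal dislike of item 7
print(node.score_stats[7])  # low mean (-8.0) with small spread
```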
Another possible direction is to relate a set of lattices given a set of items. This is the problem of lattice intersection over query items. A simple solution is to union the results from all the matching nodes in the lattices. However, this may not be entirely accurate, as the query terms may have found a more accurate match in one lattice than in another. Hence, it is essential to bias the final recommendations based on the level of the match in each lattice.
Enhancements to the lattice search can be employed to improve the real-time performance of the system. Encoded domain knowledge can be used to guide the search for faster responses. If there is a natural partitioning in the items of the ratings database, it would translate into multiple concept lists after the concept partitioning process, and these can be used to produce smaller lattices that can be searched faster and, possibly, in parallel.
A major bottleneck that we foresee in excessive augmentation of concepts is computationally expensive processing. Designers should bear in mind that each additional attribute requires additional storage and processing time. Computing and storing complicated statistical information may degrade real-time performance. The choice of statistics guides the information discovery in the knowledge structures, and over-complicating them may limit the candidate recommendations severely.
It is quite evident that the opportunities for future research are immense in the knowledge representation and discovery aspects of formal concept-based recommendation systems.
Bibliography
[1] http://en.wikipedia.org/wiki/Recommender_system - Definition of Recommender Systems on Wikipedia
[2] http://www.pandora.com/ - Pandora Music Recommender System created by Music Genome Project
[3] http://movielens.umn.edu/ - MovieLens from the GroupLens Research group at the University of Minnesota
[4] http://www.tnrdlib.bc.ca/rr.html - Reader’s Robot - Book Recommender System
[5] Priss, Uta. ”Lattice-based Information Retrieval,” Knowledge Organization, Vol. 27, No. 3, 2000, pp. 132-142.
[6] Yardi, Aparna Arvind. ”Concept Based Information Organization and Retrieval.” Masters Thesis, University of Cincinnati, 2006.
[7] Malone, Thomas W., Grant, Kenneth R., Turbak, Franklyn A., Brobst, Stephen A. and Cohen, Michael D. Intelligent Information Sharing Systems. Communications of the ACM, 30, 5 (1987), pp. 390-402.
[8] Foltz, Peter W., and Dumais, Susan T. Personalized Information Delivery: An Analysis of Information Filtering Methods. Communications of the ACM, 35, 12 (1992), pp. 51-60.
[9] Salton, Gerard and Buckley, Christopher. Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, 24, 5 (1988), pp. 513-523.
[10] Goldberg, David and Nichols, David and Oki, Brian M. and Terry, Douglas. Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM, 35, 12 (1992), pp. 61-70.
[11] Resnick, Paul and Iacovou, Neophytos and Suchak, Mitesh and Bergstorm, Peter and Riedl, John. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of ACM Conference on Computer Supported Cooperative Work, 1994, pp. 175-186.
[12] Breese, John S. and Heckerman, David and Kadie, Carl. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence, 1998, pp. 43-52.
[13] Pennock, David and Horvitz, Eric and Lawrence, Steve and Giles, C. Lee. Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-based Approach. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000, pp. 473-480.
[14] Sarwar, Badrul and Karypis, George and Konstan, Joseph and Riedl, John. Application of Dimensionality Reduction in Recommender Systems - A Case Study. ACM Web Knowledge Discovery in Databases (WebKDD) Workshop, 2000.
[15] Miller, Bradley N., Albert, Istvan, Lam, Shyong K., Konstan, Joseph A., and Riedl, John. MovieLens Unplugged: Experiences with an Occasionally Connected Recommender System. Proceedings of ACM Conference on Intelligent User Interfaces (Accepted Poster), 2003.
[16] Agarwal, Nitin and Haque, Ehtesham U., and Liu, Huan and Parsons, Lance. ”A Subspace Clustering Framework for Research Group Collaboration”. International Journal on Information Technology and Web Engineering, 1(1), pp. 35-58, 2006.
[17] Shardanand, Upendra and Maes, Pattie. ”Social Information Filtering: Algorithms for Automating Word of Mouth,” Proceedings of CHI’95 Conference on Human Factors in Computing Systems, ACM Press, Vol. 1, pp. 210-217, 1995.
[18] Konstan, Joseph A., Miller, Bradley N., Maltz, David, Herlocker, Jonathan L., Gordon, Lee R., and Riedl, John. ”GroupLens: Applying Collaborative Filtering to Usenet News”, Communications of the ACM, Vol. 40, No. 3, pp. 77-87, 1997.
[19] Herlocker, Jonathan L., Konstan, Joseph A., Terveen, Loren G., and Riedl, John. ”Evaluating Collaborative Filtering Recommender Systems”, ACM Transactions on Information Systems (TOIS), Vol. 22, Issue 1, pp. 5-53, 2004.
[20] Aggarwal, Charu and Wolf, Joel L., Wu, Kun-Lung, and Yu, Philip S. Horting Hatches an Egg: A New Graph-Theoretic Approach to Collaborative Filtering. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, San Diego, CA, pp. 201-212, 1999.
[21] Maltz, David, and Ehrlich, Kate. Pointing the Way: Active Collaborative Filtering. Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI’95, ACM Press, pp. 202-209, 1995.
[22] http://ieor.berkeley.edu/~goldberg/jester-data/ - Jester Online Joke Recommender Dataset Webpage
[23] http://www.grouplens.org/node/12#attachments - MovieLens dataset from GroupLens Research Group at the University of Minnesota
[24] Ganter, Bernhard and Wille, Rudolf. Formal Concept Analysis: Mathematical Foundations. Translated by C. Franzke. Springer-Verlag New York, Inc., 1997.
[25] Getoor, Lise and Sahami, Mehran. Using Probabilistic Relational Models for Collaborative Filtering. Working Notes of the KDD Workshop on Web Usage Analysis and User Profiling, 1999.
[26] Goldberg, Ken and Roeder, Theresa and Gupta, Dhruv and Perkins, Chris. Eigentaste: A Constant Time Collaborative Filtering Algorithm. Information Retrieval, 4(2), pp. 133-151, 2001.
[27] Resnick, Paul and Iacovou, Neophytos and Suchak, Mitesh and Bergstorm, Peter and Riedl, John. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of ACM Conference on Computer Supported Cooperative Work, 1994, pp. 175-186.
[28] Sarwar, Badrul M., Karypis, George, Konstan, Joseph A., and Riedl, John. Item-based Collaborative Filtering Recommendation Algorithms. World Wide Web, pp. 285-295, 2001.
[29] Ungar, Lyle and Foster, Dean. Clustering Methods for Collaborative Filtering. Proceedings of the Workshop on Recommendation Systems, 1998.
[30] Priss, Uta. ”Knowledge Discovery in Databases Using Formal Concept Analysis.” Bulletin of ASIS, 27, 1, 2000, pp. 18-20.
[31] Priss, Uta. ”Faceted Information Representation.” In Stumme, Gerd (ed.), Working with Conceptual Structures: Proceedings of the 8th International Conference on Conceptual Structures, Shaker Verlag, Aachen, 2000, pp. 84-94.
[32] Priss, Uta. ”A Graphical Interface for Document Retrieval Based on Formal Concept Analysis.” In Santos, Eugene (ed.), Proceedings of the 8th Midwest Artificial Intelligence and Cognitive Science Conference, AAAI Technical Report CF-97-01, 1997, pp. 66-70.
[33] Agrawal, Rakesh and Srikant, Ramakrishnan. Fast Algorithms for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 487-499, 1994.
[34] Burdick, Douglas and Calimlim, Manuel and Gehrke, Johannes. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. pp. 443-452, 2001.