A Survey of Collaborative Filtering Algorithms for Binary, Positive-only Data

Koen Verstrepen* (Froomle, Antwerp, Belgium), [email protected]
Kanishka Bhaduri† (Apple, Inc., Cupertino, USA), [email protected]
Boris Cule (University of Antwerp, Antwerp, Belgium), [email protected]
Bart Goethals (Froomle, University of Antwerp, Antwerp, Belgium), [email protected]

ABSTRACT
Traditional collaborative filtering assumes the availability of explicit ratings of users for items. However, in many cases these ratings are not available and only binary, positive-only data is available. Binary, positive-only data is typically associated with implicit feedback such as items bought, videos watched, ads clicked on, etc. However, it can also be the result of explicit feedback such as likes on social networking sites. Because binary, positive-only data contains no negative information, it needs to be treated differently than rating data. As a result of the growing relevance of this problem setting, the number of publications in this field increases rapidly. In this survey, we provide an overview of the existing work from an innovative perspective that allows us to emphasize surprising commonalities and key differences.

1. INTRODUCTION
Increasingly, people are overwhelmed by an abundance of choice. Via the Internet, everybody has access to a wide variety of news, opinions, (encyclopedic) information, social contacts, books, music, videos, pictures, products, jobs, houses, and many other items, from all over the world. However, from the perspective of a particular person, the vast majority of items is irrelevant, and the few relevant items are difficult to find because they are buried under a large pile of irrelevant ones. There exist, for example, lots of books that one would enjoy reading, if only one could identify them. Moreover, not only do people fail to find relevant existing items; niche items fail to be created because it is anticipated that the target audience will not be able to find them under the pile of irrelevant items. Certain books, for example, are never written because writers anticipate they will not be able to reach a sufficiently large portion of their target audience, although the audience exists. Recommender systems contribute to overcoming these difficulties by connecting individuals with items relevant to them. A good book recommender system, for example, would typically recommend three previously unknown books that the user would enjoy reading, and that are sufficiently different from each other. Studying recommender systems specifically, and the connection between individuals and relevant items in general, is the subject of recommendation research. But the scope of recommendation research goes beyond connecting users with items. Recommender systems can, for example, also connect genes with diseases, biological targets with drug compounds, words with documents, tags with photos, etc.

*This work was done while Koen Verstrepen was working at the University of Antwerp.
†This work was done while Kanishka Bhaduri was working at Netflix, Inc.

1.1 Collaborative Filtering
Collaborative filtering is a principal problem in recommendation research. In the most abstract sense, collaborative filtering is the problem of weighting missing edges in a bipartite graph.

The concrete version of this problem that got the most attention until recently is rating prediction. In rating prediction, one set of nodes in the bipartite graph represents users, the other set of nodes represents items, an edge with weight r between user u and item i expresses that u has given i the rating r, and the task is to predict the missing ratings. Since rating prediction is a mature domain, multiple overviews exist [23; 47; 1; 59]. Recently, the attention for rating prediction diminished for multiple reasons. First, collecting rating data is relatively expensive in the sense that it requires a non-negligible effort from the users. Second, user ratings do not correlate as well with user behavior as one would expect. Users tend to give high ratings to items they think they should consume, for example a famous book by Dostoyevsky. However, they would rather read Superman comic books, which they rate much lower. Finally, in many applications, predicting ratings is not the final goal, and the predicted ratings are only used to find the most relevant items for every user. Consequently, high ratings need to be accurate, whereas the exact value of low ratings is irrelevant. However, in rating prediction high and low ratings are equally important.

Today, attention is increasingly shifting towards collaborative filtering with binary, positive-only data. In this version, edges are unweighted, an edge between user u and item i expresses that user u has given positive feedback about item i, and the task is to attach to every missing edge between a user u and an item i a score that indicates the suitability of recommending i to u. Binary, positive-only data is typically associated with implicit feedback such as items bought, videos watched, songs listened to, books lent from a library, ads clicked on, etc. However, it can also be the result of explicit feedback, such as likes on social networking sites. As a result of the growing relevance of this problem setting, the number of publications in this field increases rapidly. In this survey, we provide an overview of the existing work on collaborative filtering with binary, positive-only data from an innovative perspective that allows us to emphasize surprising commonalities and key differences. To enhance the readability, we sometimes omit the specification 'binary, positive-only' and use the abbreviated term 'collaborative filtering'.

Besides the bipartite graph, five types of extra information can be available. First, there can be item content or item metadata. In the case of books, for example, the content is the full text of the book and the metadata can include the writer, the publisher, the year it was published, etc. Methods that exclusively use this kind of information are typically classified as content based. Methods that combine this kind of information with a collaborative filtering method are typically classified as hybrid. Second, there can be user metadata such as gender, age, location, etc. Third, users can be connected with each other in an extra, unipartite graph. A typical example is a social network between the users. Fourth, an analogous graph can exist for the items. Finally, there can be contextual information such as location, date, time, intent, company, device, etc. Exploiting information besides the bipartite graph is out of the scope of this survey. Comprehensive discussions on exploiting information outside the user-item matrix have been presented [53; 72].

1.2 Relation to Other Domains
To emphasize the unique aspects of collaborative filtering, we highlight the commonalities and differences with two related data science problems: classification and association rule mining.

First, collaborative filtering is equivalent to jointly solving many one-class classification problems, in which every one-class classification problem corresponds to one of the items. In the classification problem that corresponds to item i, i serves as the class, all other items serve as the features, the users that have i as a known preference serve as labeled examples and the other users serve as unlabeled examples. Amazon.com, for example, has more than 200 million items in its catalog; hence, solving the collaborative filtering problem for Amazon.com is equivalent to jointly solving more than 200 million one-class classification problems, which obviously requires a distinct approach. However, collaborative filtering is more than efficiently solving many one-class classification problems. Because they are tightly linked, jointly solving all classification problems allows for sharing information between them. The individual classification problems share most of their features; and while i serves as the class in one of the classification problems, it serves as a feature in all other classification problems.

Second, association rule mining also assumes bipartite, unweighted data and can therefore be applied to datasets used for collaborative filtering. Furthermore, recommending item i to user u can be considered as the application of the association rule I(u) → i, with I(u) the itemset containing the known preferences of u. However, the goals of association rule mining and collaborative filtering are different. If a rule I(u) → i is crucial for recommending i to u, but irrelevant on the rest of the data, giving the rule a high score is desirable for collaborative filtering, but typically not for association rule mining.

1.3 Outline
After the Preliminaries (Sec. 2), we introduce our framework (Sec. 3) and review the state of the art along the three dimensions of our framework: Factorization Models (Sec. 4), Deviation Functions (Sec. 5), and Minimization Algorithms (Sec. 6). Finally, we discuss the usability of methods for rating prediction (Sec. 7) and conclude (Sec. 8).

2. PRELIMINARIES
We introduced collaborative filtering as the problem of weighting missing edges in a bipartite graph. Typically, however, this bipartite graph is represented by its adjacency matrix, which is called the preference matrix.

Let U be a set of users and I a set of items. We are given a preference matrix with training data R ∈ {0, 1}^(|U|×|I|). R_ui = 1 indicates that there is a known preference of user u ∈ U for item i ∈ I. R_ui = 0 indicates that there is no such information. Notice that the absence of information means that either there exists no preference or there exists a preference but it is not known.

Collaborative filtering methods compute for every user-item pair (u, i) a recommendation score s(u, i) that indicates the suitability of recommending i to u. Typically, the user-item pairs are (partially) sorted by their recommendation scores. We define the matrix S ∈ R^(|U|×|I|) as S_ui = s(u, i). Furthermore, c(x) gives the count of x, meaning

    c(x) = Σ_{i∈I} R_xi   if x ∈ U,
    c(x) = Σ_{u∈U} R_ux   if x ∈ I.

Although we conveniently call the elements of U users and the elements of I items, these sets can contain any type of object. In the case of online social networks, for example, both sets contain the people that participate in the social network, i.e., U = I, and R_ui = 1 if there exists a friendship link between persons u and i. In image tagging/annotation problems, U contains images, I contains words, and R_ui = 1 if image u was tagged with word i. In chemogenomics, an early stage in the drug discovery process, U contains active drug compounds, I contains biological targets, and R_ui = 1 if there is a strong interaction between compound u and biological target i.

Typically, datasets for collaborative filtering are extremely sparse, which makes it a challenging problem. The sparsity S, computed as

    S = 1 − (Σ_{(u,i)∈U×I} R_ui) / (|U| · |I|),   (1)

typically ranges from 0.98 to 0.999 and is visualized in Figure 1. This means that a score must be computed for approximately 99% of the (u, i)-pairs based on only 1% of the (u, i)-pairs. This is undeniably a challenging task.

3. FRAMEWORK
In the most general sense, every method for collaborative filtering is defined as a function F that computes the recommendation scores S based on the data R: S = F(R). Since different methods originate from different intuitions about the problem, they are typically explained from very different perspectives.
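As a concrete illustration of the preliminaries above, the preference matrix R, the count function c(x), and the sparsity of Equation 1 can be sketched in plain Python. This is a minimal sketch on toy data; the matrix values and helper names (`c_user`, `c_item`) are illustrative and not from the survey.

```python
# Toy binary, positive-only preference matrix R:
# R[u][i] = 1: known preference of user u for item i; 0: no information.
R = [
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
]
num_users, num_items = len(R), len(R[0])

def c_user(u):
    """c(u): number of known preferences of user u (row sum of R)."""
    return sum(R[u])

def c_item(i):
    """c(i): number of users with a known preference for item i (column sum)."""
    return sum(R[u][i] for u in range(num_users))

# Sparsity (Equation 1): fraction of user-item pairs without a known preference.
sparsity = 1 - sum(map(sum, R)) / (num_users * num_items)

print(c_user(0), c_item(0), round(sparsity, 3))
```

On real collaborative filtering datasets the same computation yields sparsities of 0.98 and above, which is why R is normally stored as a sparse matrix rather than as dense lists.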

Also the maximum margin based deviation function in Equation ?? cannot be solved with ALS because it contains the hinge loss. Rennie and Srebro propose a conjugate gradients method for minimizing this function [Rennie and Srebro 2005]. However, this method suffers from similar problems as SGD, related to the high number of terms in the loss function. Therefore, Pan and Scholz [Pan and Scholz 2009] propose a bagging approach. The essence of their bagging approach is that they do not explicitly weight every user-item pair for which R_ui = 0, but sample from all these pairs instead. They create multiple samples, and compute multiple different pairs (S^(1,1), S^(1,2)) corresponding to their samples. These computations are also performed with the conjugate gradients method. They are, however, much less intensive, since they only consider a small sample of the many user-item pairs for which R_ui = 0. The different corresponding factor matrices are finally merged by simply taking their average.

For solving Equation ??, Kabbur et al. propose to subsequently apply SGD to a number of reduced datasets that each contain all known preferences and a different random sample of the missing preferences, although an ALS-like solution seems to be within reach. When the deviation function is reconstruction based and S^(1) = R, i.e., an item-based neighborhood model is used, the user factors are fixed by definition, and the problem resembles a single ALS step. However, when constraints such as non-negativity are imposed, things get more complicated [Ning and Karypis 2011]. Therefore, Ning and Karypis adopt cyclic coordinate descent and soft thresholding [Friedman et al. 2010].

When a deviation function contains a triple summation, it contains even more terms, and the minimization with SGD becomes even more expensive. Moreover, most of the terms have only a small influence on the parameters that need to be learned, and considering them is therefore largely wasted effort. However, in some cases no other solution has been proposed yet (Eq. 22, 26). To mitigate this problem, Rendle and Freudenthaler propose to sample the terms in the deviation function not uniformly but proportionally to their impact on the parameters [Rendle and Freudenthaler 2014]. Weston et al. mitigate this problem by, for every known preference i, sampling non-preferred items j until they encounter one for which S_uj + 1 > S_ui, i.e., one that violates the hinge-loss approximation of the ground truth ranking. Consequently, their updates will be significant [Weston et al. 2013b]. Also the deviation function in Equation 24 contains a triple summation. However, Takács and Tikk [Takács and Tikk 2012] are able to propose an ALS algorithm and therefore do not suffer from the triple summation like an SGD-based algorithm would. Finally, in line with his maximum likelihood approach, Hofmann proposes to minimize the deviation function in Equation ?? by means of a regularized expectation maximization (EM) algorithm [Hofmann 2004].
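The rejection-sampling idea attributed above to Weston et al. can be sketched as follows. This is a toy illustration, not the authors' implementation: the score matrix, the data, and the function name are stand-ins, and only the sampling rule (keep drawing non-preferred items until one violates the hinge margin S_uj + 1 > S_ui) comes from the text.

```python
import random

def sample_violating_negative(S, preferred, u, i, max_tries=1000, rng=random):
    """For known preference i of user u, sample non-preferred items j until
    one violates the hinge margin, i.e. S[u][j] + 1 > S[u][i].
    Such a j yields a significant SGD update; return None if none is found."""
    num_items = len(S[u])
    candidates = [j for j in range(num_items) if j not in preferred[u]]
    for _ in range(max_tries):
        j = rng.choice(candidates)
        if S[u][j] + 1 > S[u][i]:  # hinge-loss approximation violated
            return j
    return None  # u's known preference is already well separated

random.seed(7)                       # deterministic toy run
S = [[3.0, 0.5, 2.5, 0.1]]           # toy scores of one user for four items
preferred = {0: {0}}                 # user 0 has item 0 as known preference
j = sample_violating_negative(S, preferred, 0, 0)
print(j)  # item 2 is the only violator: 2.5 + 1 > 3.0
```

The number of draws needed before a violator is found also carries information: the harder it is to find one, the better ranked the known preference already is, which WARP-style losses exploit to scale the update.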

6.2. Convex Minimization Aiolli [Aiolli 2014] proposed the convex deviation function in Equation 28 and indicates that it can be solved with any algorithm for convex optimization. Also the analytically solvable deviation functions are convex. Moreover, minimiz- ing them is equivalent to computing all the similarities involved in the model. Most works assume a brute force computation1:24 of the similarities. However, Verstrepen and K. Verstrepen et al. 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 i" i" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 Goethals [Verstrepen and Goethals 2016], recently proposed two methods that(1,2)" are eas- 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 S D" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ily0 0 0 an0 0 0 order0 0 0 0 0 of0 0 magnitude faster than the brute force computation. 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 u" u" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |I| 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 S" S(1,1)" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 =" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |U| 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1:240 0 K. Verstrepen et al. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 REFERENCES … … 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ACM0 0 0 0 Computing0 0 0 0 0 0 0 Surveys,0 0 Vol. 1, No. 1, Article 1, PublicationFabio Aiolli. date: 2013.January Efficient 2015. top-n recommendation for very large scale binary rated datasets. 
In Proceedings 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 of the 7th ACM conference on RecommenderD" systems. ACM, 273–280. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 Fabio Aiolli. 2014.|I| Convex AUC optimization for top-N recommendation with implicit feedback. In Proceed- 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ings of the 8th ACM Conference on Recommender systems. ACM, 293–296. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C.M. Bishop. 2006. and . Springer, New York, NY. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Figure 2: Matrix Factorization with 2 factor matrices 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (Eq. 3).recommender systems. In Advances in Knowledge Discovery and . Springer, 38–49. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 REFERENCES0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Mukund Deshpande and George Karypis. 2004. Item-based top-n recommendation algorithms. ACM Trans- 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Fabio0 0 Aiolli. 2013. Efficient top-n recommendationactions on Information for very large Systems scale binary(TOIS) rated22, 1 datasets.(2004), 143–177. In Proceedings of the 7th ACM conference on Recommender systems. ACM, 273–280. Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linear Fabio Aiolli. 2014. 
Convex AUCto optimizationemphasizemodels forvia that top-N coordinate therecommendation descent.matrixJournal of with recommendation implicitof statistical feedback. software In Proceed- scores33, 1 (2010),S 1. Figure 1: Typical sparsity of training data matrixings of theR 8thfor ACM Conference on Recommender systems. ACM, 293–296. is dependentEric Gaussier on and these Cyril Goutte. parameters, 2005. Relation we betweenwill write PLSA it and as NMFS(θ and). implications. In Proceedings binary, positive-only collaborative filtering. C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY. The modelof thein 28th Figure annual international2, for example, ACM SIGIR contains conference ( on+ Research) D and development in information Evangelia Christakopoulou and Georgeretrieval Karypis.. ACM, 2014. 601–602. Hoslim: Higher-order sparse linear method|U| for|I| top-n· recommender systems. Inparameters.AdvancesPrem Gopalan, in Knowledge Computing J Hofman, Discovery and all and D the Blei.Data parameters 2015.Mining Scalable. Springer, recommendation in 38–49. a factoriza- with hierarchical poisson factor- Mukund Deshpande and Georgetion Karypis. modelization. 2004. is In done Item-basedProceedings by minimizing top-n of the recommendation Thirti-first the deviation Conference algorithms. AnnualbetweenACM Trans- Conference the on Uncertainty in Artificial different perspectives. In the literature on recommenderactions on sys-Information Systems (TOIS) 22, 1 (2004), 143–177. data RIntelligenceand the (UAI-15). parameters AUAI Pressθ.. This deviation is measured tems in general and collaborative filtering specifically,Jerome Friedman, two Trevor Hastie,Thomas and Rob Hofmann. Tibshirani. 1999. 2010. Probabilistic Regularization latent pathssemantic for indexing. 
generalized In Proceedings linear of the 22nd annual interna- dominant perspectives have emerged: the modelmodels based via per- coordinate descent.by theJournaldeviationtional of ACM statistical SIGIR function software conference33, on 1( (2010),Researchθ, R). 1. and Formally development we in compute . ACM, 50–57. ∗ D spective and the memory-based perspective. Unfortunately,Eric Gaussier and Cyril Goutte.the 2005.optimalThomas Relation Hofmann.values between 2004.θ PLSAof Latent the and semantic parametersNMF and models implications. forθ collaborativeas In Proceedings filtering. ACM Transactions on Informa- of the 28th annual international ACMtion Systems SIGIR conference (TOIS) 22, on 1 (2004), Research 89–115. and development in information these two are often described as two fundamentallyretrieval separate. ACM, 601–602. Yifan Hu, Yehuda Koren,∗ and Chris Volinsky.R 2008. Collaborative filtering for implicit feedback datasets. In Prem Gopalan, J Hofman, and D Blei. 2015. Scalableθ recommendation= arg min with(θ, hierarchical) . poisson factor- classes of methods, instead of merely two perspectives on the Data Mining, 2008. ICDM’08.θ EighthD IEEE International Conference on. IEEE, 263–272. ization. In Proceedings of theDietmar Thirti-first Jannach, Conference Markus Annual Zanker, Conference Alexander on Felfernig, Uncertainty and in Gerhard Artificial Friedrich. 2010. Recommender sys- same class of methods [47; 23]. As a result, the comparisonIntelligence (UAI-15). AUAI Press. Many deviationtems: an introduction functions. Cambridge exist,and University every Press. deviation function between methods often remains superficial. Thomas Hofmann. 1999. ProbabilisticChristopher latent semantic Johnson. indexing. 2014. Logistic In Proceedings Matrix Factorization of the 22nd annual for Implicit interna- Feedback Data. 
In Workshop on Dis- We, however, will explain many different collaborativetional ACMfil- SIGIR conferencemathematically on Researchtributed Machine andexpresses development Learning in anda information different Matrix Computations retrieval interpretation. ACM, at the 50–57. Twenty-eighth of the Annual Conference on Neural Thomas Hofmann. 2004. Latentconcept semanticInformationdeviation models for Processing. collaborative Systems filtering. (NIPS)ACM. Transactions on Informa- tering methods from one and the same perspective.tion Systems As (TOIS) 22, 1 (2004), 89–115. F Third,Santosh efficiently Kabbur and computing George Karypis. the 2014. parameters NLMF: NonLinear that minimize Matrix Factorization Methods for Top-N such we facilitate the comparison and classificationYifan Hu, of Yehuda these Koren, and Chris Volinsky.Recommender 2008. Collaborative Systems. In Data filtering Mining for implicit Workshop feedback (ICDMW), datasets. 2014 In IEEE International Conference on. methods. Our perspective is a matrix factorizationData Mining, frame- 2008. ICDM’08.a deviation EighthIEEE, IEEE function167–174. International is often Conference non on trivial. IEEE, 263–272. because the major- work in which every method consists of threeDietmar fundamental Jannach, Markus Zanker,ity ofSantosh Alexander deviation Kabbur, Felfernig, Xiafunctions Ning, and and Gerhard is George non-convex Friedrich. Karypis. 2010. 2013. inRecommender Fism: the parametersfactored sys- item similarity models for top-n rec- F tems: an introduction. Cambridgeof the factorizationommender University systems. Press. model. In Proceedings In that of the case, 19th ACM minimization SIGKDD international al- conference on Knowledge building blocks: a factorization model of theChristopher recommenda- Johnson. 2014. Logistic Matrixdiscovery Factorization and data mining for Implicit. ACM, Feedback 659–667. Data. 
In Workshop on Dis- tion scores S, a deviation function that measurestributed the devia- Machine Learninggorithms andNoam Matrix Koenigstein, can Computations only Nir compute at Nice, the Twenty-eighth Ulrich parameters Paquet, Annual and Nirthat Conference Schleyen. correspond on 2012. Neural The to Xbox recommender system. In tion between the data R and the recommendationInformation scores ProcessingS, Systemsa local (NIPS)Proceedings minimum.. of the The sixth ACM initialization conference on ofRecommender the parameters systems. ACM, in 281–284. Santosh Kabbur and Georgethe Karypis.Yehuda factorization 2014. Koren NLMF: and model Robert NonLinear Bell. and Matrix 2011. the Advances chosenFactorization in hyperparameters collaborative Methods for filtering. Top-N of In Recommender systems hand- and a minimization procedure that tries to findRecommender the model Systems. In Data Miningbook. Springer, Workshop 145–186. (ICDMW), 2014 IEEE International Conference on. parameters that minimize the deviation function.IEEE, 167–174. the minimizationYehuda Koren, Robert algorithm Bell, and determine Chris Volinsky. which 2009. localMatrix minimum factorization techniques for recommender First, the factorization model computesSantosh the matrix Kabbur, ofXia Ning, andwill George besystems. found. Karypis.Computer If2013. the Fism:8 value (2009), factored of 30–37. the item deviation similarity models function for top-n in rec- this ommender systems. In ProceedingslocalWeiyang minimum of Lin, the 19thSergio is ACM not A Alvarez, SIGKDD much and internationalhigher Carolina than Ruiz. conference 2002. that Efficient ofon Knowledge the adaptive-support global association rule mining recommendation scores S as a link function l ofdiscovery a sum ofandT data mining. ACM,for 659–667. recommender systems. Data mining and knowledge discovery 6, 1 (2002), 83–105. minimum, it is considered a good minimum. 
An intuitively terms in which a term t is the product of Ft factorNoam Koenigstein, matrices: Nir Nice, UlrichGuilherme Paquet, Vale and Menezes,Nir Schleyen. Jussara 2012. M The Almeida, Xbox Fabianorecommender Belem,´ system. Marcos In Andre´ Gonc¸alves, An´ısio Lacerda, Proceedings of the sixthappealing ACM conferenceEdleno deviation on Silva Recommender De Moura, function systems Gisele is L. ACM, Pappa, worthless 281–284. Adriano if Veloso, there and exists Nivio Ziviani. no 2010. Demand-driven tag   Yehuda Koren and Robert Bell. 2011. Advances in collaborative filtering. In Recommender systems hand- T Ft algorithmrecommendation. that can efficiently In Machine Learning compute and Knowledgeparameters Discovery that in cor- Databases. Springer, 402–417. X Y (t,f) book. Springer, 145–186. S = l S . (2) Andriy Mnih and Ruslan Salakhutdinov. 2007. Probabilistic matrix factorization. In Advances in neural   Yehuda Koren, Robert Bell,respond and Chrisinformation toVolinsky. a good 2009.processing local Matrixminimum systems factorization. 1257–1264. of techniques this deviation for recommender function. t=1 f=1 systems. Computer 8 (2009),Finally, 30–37.Bamshad we Mobasher,assume a Honghua basic usageDai, Tao scenario Luo, and Miki inwhich Nakagawa. model 2001. pa-Effective personalization based on Weiyang Lin, Sergio A Alvarez,rameters and Carolinaassociation are Ruiz. recomputed rule 2002. discovery Efficient periodically from adaptive-support web usage (typically, data. association In Proceedings arule few mining timesof the 3rd international workshop on For many methods, l is the identity function, Tfor recommender= 1 and systems. Data miningWeb information and knowledge and datadiscovery management6, 1 (2002),. ACM, 83–105. 9–15. a day). Computing the recommendation scores for a user F1 = 2. 
In this case, the factorization is givenGuilherme by: Vale Menezes, Jussara M Almeida, Fabiano Belem,´ Marcos Andre´ Gonc¸alves, An´ısio Lacerda, Edleno Silva De Moura,based Gisele L on Pappa, the Adriano model, Veloso, and and extracting Nivio Ziviani. the 2010. items Demand-driven correspond- tag recommendation. In . Springer, 402–417. S = S(1,1)S(1,2). (3) Machineing to Learning the top- and KnowledgeN scores, Discovery is assumedACM in Databases Computing to be Surveys, performed Vol. 1, No. in1, Articlereal 1, Publication date: January 2015. Andriy Mnih and Ruslan Salakhutdinov. 2007. Probabilistic matrix factorization. In Advances in neural time (1,1) |U|×D information(1,2) processing systems.. 1257–1264. Specific aspects of scenarios in which models need Because of their dimensions, S R Bamshadand S Mobasher, Honghuato Dai, be Tao recomputed Luo, and Miki in Nakagawa. real time, 2001. areEffective out personalization of the scope based of on this D×|I| ∈ ∈ R are often called the user-factor matrixassociation and item- rule discovery from web usage data. In Proceedings of the 3rd international workshop on Web information and datasurvey. management. ACM, 9–15. factor matrix, respectively. Figure 2 visualizes Equation 3. In this overview, we first survey existing models in Section 4. More complex models are often more realistic, but generally Next, we compare the existing deviation functions in Sec- ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015. contain more parameters which increases the risk of overfit- tion 5. Afterwards, we discuss the minimization algorithms ting. that are used for fitting the model parameters to the devia- Second, the number of terms T , the number of factor ma- tion functions in Section 6. Finally, we discuss the applica- trices Ft and the dimensions of the factor matrices are an 1 bility of rating-based methods in Section 7 and conclude in integral part of the model, independent of the data R . 
... Section 8.

Every entry in the factor matrices, however, is a parameter that needs to be computed based on the data R. We collectively denote these parameters as θ. Whenever we want to make this dependence explicit, we write S(θ) instead of S.¹

4. FACTORIZATION MODELS

Equation 2 gives a general description of all models for collaborative filtering. In this section, we discuss how the specific collaborative filtering models map to this equation.

4.1 Basic Models

A statistically well founded method is probabilistic latent semantic analysis (pLSA) by Hofmann, which is centered around the so called aspect model [19]. Hofmann models the probability p(i|u) that a user u will prefer an item i as the mixture of D probability distributions induced by the hidden variables d:

  p(i|u) = Σ_{d=1}^{D} p(i, d|u).

Furthermore, by assuming u and i conditionally independent given d, he obtains:

  p(i|u) = Σ_{d=1}^{D} p(i|d) · p(d|u).

This model corresponds to a basic two-factor matrix factorization:

  S = S^(1,1) S^(1,2)
  S_ui = p(i|u)
  S^(1,1)_ud = p(d|u)                              (4)
  S^(1,2)_di = p(i|d),

in which the D·|U| parameters in S^(1,1) and the D·|I| parameters in S^(1,2) are a priori unknown and need to be computed based on the data R. An appealing property of this model is the probabilistic interpretation of both the parameters and the recommendation scores. Fully in line with the probabilistic foundation, the parameters are constrained as:

  S^(1,1)_ud ≥ 0
  S^(1,2)_di ≥ 0
  Σ_{i∈I} p(i|d) = 1                               (5)
  Σ_{d=1}^{D} p(d|u) = 1,

expressing that both factor matrices are non-negative and that all row sums of S^(1,1) and S^(1,2) must be equal to 1 since they represent probabilities.

Latent dirichlet allocation (LDA) is a more rigorous statistical model, which puts Dirichlet priors on the parameters p(d|u) [6; 19]. However, for collaborative filtering these priors are integrated out and the resulting model for computing recommendation scores is again a simple two-factor factorization model.

The aspect model also has a geometric interpretation. In the training data R, every user is profiled as a binary vector in an |I|-dimensional space in which every dimension corresponds to an item. Analogously, every item is profiled as a binary vector in a |U|-dimensional space in which every dimension corresponds to a user. Now, in the factorized representation, every hidden variable d represents a dimension of a D-dimensional space. Therefore, the matrix factorization S = S^(1,1) S^(1,2) implies a transformation of both user- and item-vectors to the same D-dimensional space. A row vector S^(1,1)_u· ∈ R^(1×D) is the representation of user u in this D-dimensional space, and a column vector S^(1,2)_·i ∈ R^(D×1) is the representation of item i in this D-dimensional space. Figure 2 visualizes the user-vector of user u and the item-vector of item i. Now, as a result of the factorization model, a recommendation score S_ui is computed as the dotproduct of the user-vector of u with the item-vector of i:

  S_ui = Σ_{d=1}^{D} S^(1,1)_ud S^(1,2)_di
       = ||S^(1,1)_u·|| · ||S^(1,2)_·i|| · cos(φ_ui)      (6)
       = ||S^(1,2)_·i|| · cos(φ_ui),

with φ_ui the angle between the two vectors, and ||S^(1,1)_u·|| = 1 a probabilistic constraint of the model (Eq. 5). Therefore, an item i will be recommended if its vector norm ||S^(1,2)_·i|| is large and φ_ui, the angle of its vector with the user vector, is small. From this geometric interpretation we learn that the recommendation scores computed with the model in Equation 4 contain both a personalized factor cos(φ_ui) and a non-personalized, popularity based factor ||S^(1,2)_·i||.

Many other authors adopted this two-factor model. However, they abandoned its probabilistic foundation by removing the constraints on the parameters (Eq. 5) [21; 36; 37; 55; 73; 45; 52; 61; 16; 12]:

  S = S^(1,1) S^(1,2).                             (7)

Yet another interpretation of this two-factor model is that every hidden variable d represents a cluster containing both users and items. A large value S^(1,1)_ud means that user u has a large degree of membership in cluster d. Similarly, a large value S^(1,2)_di means that item i has a large degree of membership in cluster d. As such, pLSA can be interpreted as a soft clustering model. According to this interpretation, item i will be recommended to a user u if they have high degrees of membership in the same clusters.

Although much less common, hard clustering models for collaborative filtering also exist. Hofmann and Puzicha [20] proposed the model

  p(i|u, e(u) = e, d(i) = d) = p(i) · φ(e, d),     (8)

in which e(u) indicates the cluster user u belongs to and d(i) indicates the cluster item i belongs to. Furthermore, φ(e, d) is the association value between the user-cluster e and the item-cluster d. This cluster association factor increases or decreases the probability that u likes i relative to the independence model p(i|u) = p(i). As opposed to the aspect model, this model assumes E user-clusters only containing users and D item-clusters only containing items. Furthermore, a user or item belongs to exactly one cluster. The factorization model corresponding to this approach is:

  S = S^(1,1) S^(1,2) S^(1,3) S^(1,4)
  S_ui = p(i|u)
  S^(1,1)_ue = I(e(u) = e)
  S^(1,2)_ed = φ(e, d)
  S^(1,3)_di = I(d(i) = d)
  S^(1,4)_ij = p(i) · I(i = j),

in which I(true) = 1, I(false) = 0, and E and D are hyperparameters. The |U|·E parameters in S^(1,1), the E·D parameters in S^(1,2), the D·|I| parameters in S^(1,3) and the |I| parameters in S^(1,4) need to be computed based on the data.

¹High level statistics of the data might be taken into consideration to choose the model. There is, however, no clear, direct dependence on the data.
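The basic two-factor model of Equations 4-7, and its geometric reading in Equation 6, can be sketched numerically. The following is a minimal illustration, not from the survey itself; the toy dimensions, random factor matrices, and variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, D = 4, 6, 3

# Factor matrices S1 (|U| x D) and S2 (D x |I|), cf. Equation 7.
S1 = rng.random((n_users, D))
S2 = rng.random((D, n_items))

# pLSA-style probabilistic constraints (Eq. 5): non-negative rows that sum to 1.
S1 = S1 / S1.sum(axis=1, keepdims=True)   # rows of S1 play the role of p(d|u)
S2 = S2 / S2.sum(axis=1, keepdims=True)   # rows of S2 play the role of p(i|d)

# Recommendation scores (Eq. 4): S_ui = sum_d S1[u, d] * S2[d, i].
S = S1 @ S2

# Geometric decomposition (Eq. 6): S_ui = ||S1_u|| * ||S2_i|| * cos(phi_ui).
u, i = 0, 2
norm_u = np.linalg.norm(S1[u])
norm_i = np.linalg.norm(S2[:, i])
cos_phi = S1[u] @ S2[:, i] / (norm_u * norm_i)
assert np.isclose(S[u, i], norm_u * norm_i * cos_phi)

# With the constraints of Eq. 5, every row of S is a distribution over items.
assert np.allclose(S.sum(axis=1), 1.0)
```

The last assertion follows because row-stochastic factors yield a row-stochastic product, which is exactly the probabilistic interpretation that the unconstrained model of Equation 7 gives up.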
Ungar and Foster [63] proposed a similar hard clustering method.

4.2 Explicitly Biased Models

In the above models, the recommendation score S_ui of an item i for a user u is the product of a personalized factor with an item-bias factor related to item-popularity. In Equation 6 the personalized factor is cos(φ_ui), and the bias factor is ||S^(1,2)_·i||. In Equation 8 the personalized factor is φ(e, d), and the bias factor is p(i). Other authors [28; 40; 24] proposed to model item- and user-biases as explicit terms instead of implicit factors. This results in the following factorization model:

  S = σ( S^(1,1) + S^(2,1) + S^(3,1) S^(3,2) )
  S^(1,1)_ui = b_u                                 (9)
  S^(2,1)_ui = b_i,

with σ the sigmoid link-function, and S^(1,1), S^(2,1) ∈ R^(|U|×|I|) the user- and item-bias matrices, in which all columns of S^(1,1) are identical and also all rows of S^(2,1) are identical. The |U| parameters in S^(1,1), the |I| parameters in S^(2,1), the |U|·D parameters in S^(3,1), and the D·|I| parameters in S^(3,2) need to be computed based on the data. D is a hyperparameter of the model. The goal of explicitly modeling the bias terms is to make the interaction term S^(3,1)S^(3,2) a pure personalization term. Although bias terms are commonly used for collaborative filtering with rating data, only a few works on collaborative filtering with binary, positive-only data use them.

4.3 Basic Neighborhood Models

Multiple authors proposed special cases of the basic two-factor factorization in Equation 7.

4.3.1 Item-based

In a first special case, S^(1,1) = R [10; 54; 45; 2; 34]. In this case, the factorization model is given by

  S = R S^(1,2).                                   (10)

Consequently, a user u is profiled by an |I|-dimensional binary vector R_u· and an item is profiled by an |I|-dimensional real valued vector S^(1,2)_·i. This model is often interpreted as an item-based neighborhood model because the recommendation score S_ui of item i for user u is computed as

  S_ui = R_u· · S^(1,2)_·i = Σ_{j∈I} R_uj · S^(1,2)_ji = Σ_{j∈I(u)} S^(1,2)_ji,

with I(u) the known preferences of user u, and a parameter S^(1,2)_ji typically interpreted as the similarity between items j and i, i.e., S^(1,2)_ji = sim(j, i). Consequently, S^(1,2) is often called the item-similarity matrix. The SLIM method [34] adopts this model and additionally imposes the constraints

  S^(1,2)_ji ≥ 0,  S^(1,2)_ii = 0.                 (11)

The non-negativity is imposed to enhance interpretability. The zero-diagonal is imposed to avoid finding a trivial solution for the parameters in which every item is maximally similar to itself and has zero similarity with any other item.

This model is rooted in the intuition that good recommendations are similar to the known preferences of the user. Although they do not present it as such, Gori et al. [18] proposed ItemRank, a method that is based on an interesting extension of this intuition: good recommendations are similar to other good recommendations, with a bias towards the known preferences of the user. The factorization model corresponding to ItemRank is based on PageRank [35] and given by:

  S = α · S S^(1,2) + (1 − α) · R.                 (12)

Because of the recursive nature of this model, S needs to be computed iteratively [18; 35].

4.3.2 User-based

Other authors proposed the symmetric counterpart of Equation 10 in which S^(1,2) = R [48; 2; 3]. In this case, the factorization model is given by

  S = S^(1,1) R,                                   (13)

which is often interpreted as a user-based neighborhood model because the recommendation score S_ui of item i for user u is computed as

  S_ui = S^(1,1)_u· · R_·i = Σ_{v∈U} S^(1,1)_uv · R_vi,

in which the parameter S^(1,1)_uv is interpreted as the similarity between users u and v, i.e., S^(1,1)_uv = sim(u, v). S^(1,1) is often called the user-similarity matrix. The intuition behind this model is that good recommendations are preferred by similar users. Furthermore, Aiolli [2; 3] foresees the possibility to rescale the scores such that popular items get less importance. This changes the factorization model to

  S = S^(1,1) R S^(1,3),
  S^(1,3)_ji = I(j = i) · c(i)^(−(1−β)),

in which S^(1,3) is a diagonal rescaling matrix that rescales the item scores according to the item popularity c(i), and β ∈ [0, 1] a hyperparameter. Additionally, Aiolli [3] imposes the constraint that S^(1,1) needs to be row normalized:

  ||S^(1,1)_u·|| = 1.

To the best of our knowledge, there exists no user-based counterpart for the ItemRank model of Equation 12.

4.3.3 Explicitly Biased

Furthermore, it is, obviously, possible to add explicit bias terms to neighborhood models. Wang et al. [68] proposed an item-based neighborhood model with one bias term:

  S = S^(1,1) + R S^(2,2),  S^(1,1)_ui = b_i,

and an analogous user-based model:

  S = S^(1,1) + S^(2,1) R,  S^(1,1)_ui = b_i.

4.3.4 Unified

Moreover, Verstrepen and Goethals [66] showed that the item- and user-based neighborhood models are two incomplete instances of the same general neighborhood model. Consequently, they propose KUNN, a complete instance of this general neighborhood model. The model they propose can be written as a weighted sum of a user- and an item-based model, in which the weights depend on the user u and the item i for which a recommendation score is computed:

  S = ( S^(1,1) R ) S^(1,3) + S^(2,1) ( R S^(2,3) )
  S^(1,3)_ij = I(i = j) · c(i)^(−1/2)              (14)
  S^(2,1)_uv = I(u = v) · c(u)^(−1/2).

Finally, note that the above matrix factorization based descriptions of nearest neighbors methods imply that matrix factorization methods and neighborhood methods are not two separate approaches, but two perspectives on the same approach.

4.4 Factored Similarity Neighborhood Models

The item-similarity matrix S^(1,2) in Equation 10 contains |I|² parameters, which is in practice often a very large number. Consequently, it can happen that the training data is not sufficient to accurately compute this many parameters. Furthermore, one often precomputes the item-similarity matrix S^(1,2) and performs the dotproducts R_u· · S^(1,2)_·i to compute the recommendation scores S_ui in real time. In this case, the dotproduct is between two |I|-dimensional vectors, which is often prohibitive in real time, high traffic applications. One solution is to factorize the similarity matrix², which leads to the following factorization model:

  S = R S^(1,2) S^(1,2)T,

in which every row of S^(1,2) ∈ R^(|I|×D) represents a D-dimensional item-profile vector, with D a hyperparameter [71; 8]. In this case the item-similarity matrix is equal to

  S^(1,2) S^(1,2)T,

which means that the similarity sim(i, j) between two items i and j is defined as the dotproduct of their respective profile vectors

  S^(1,2)_i· · ( S^(1,2)_j· )T.

This model only contains |I|·D parameters instead of |I|², which is much fewer since typically D ≪ |I|. Furthermore, by first precomputing the item vectors S^(1,2) and then precomputing the D-dimensional user-profile vectors given by RS^(1,2), the real time computation of a score S_ui encompasses a dotproduct between a D-dimensional user-profile vector and a D-dimensional item-profile vector. Since D ≪ |I|, this dotproduct is much less expensive and can be more easily performed in real time. Furthermore, there is no trivial solution for the parameters of this model, as is the case for the non factored item-similarity model in Equation 10. Consequently, it is never required to impose the constraints from Equation 11. To avoid scaling problems when fitting parameters, Weston et al. [71] augment this model with a diagonal normalization factor matrix:

  S = S^(1,1) R S^(1,3) S^(1,3)T                   (15)
  S^(1,1)_uv = I(u = v) · c(u)^(−1/2).             (16)

A limitation of this model is that it implies that the similarity matrix is symmetric. This might hurt the model's accuracy in certain applications such as recommending tags for images. For an image of the Eiffel tower that is already tagged Eiffel tower, for example, the tag Paris is a reasonable recommendation. However, for an image of the Louvre already tagged Paris, the tag Eiffel tower is a bad recommendation. Paterek solved this problem for rating data by representing every item by two separate D-dimensional vectors [41]. One vector represents the item if it serves as evidence for computing recommendations, the other vector represents the item if it serves as a candidate recommendation. In this way, they can model also asymmetric similarities. This idea is not restricted to rating data, and for binary, positive-only data, it was adopted by Steck [58]:

  S = S^(1,1) R S^(1,3) S^(1,4)                    (17)
  S^(1,1)_uv = I(u = v) · c(u)^(−1/2).             (18)

Kabbur et al. also adopted this idea. However, similar to Equation 9, they also add bias terms:

  S = S^(1,1) + S^(2,1) + S^(3,1) R S^(3,3) S^(3,4)
  S^(1,1)_ui = b_u
  S^(2,1)_ui = b_i                                 (19)
  S^(3,1)_uv = I(u = v) · c(u)^(−β/2),

with S^(3,3) ∈ R^(|I|×D) the matrix of item profiles when they serve as evidence, S^(3,4) ∈ R^(D×|I|) the matrix of item profiles when they serve as candidates, and β ∈ [0, 1] and D hyperparameters.

²Another solution is to enforce sparsity on the similarity matrix by means of the deviation function. This is discussed in Section 5.
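The item-based neighborhood scoring of Equation 10 and the factored-similarity speedup discussed above can be sketched in a few lines. This is our own illustrative example: the toy feedback matrix is made up, and cosine similarity is just one common way to fill the item-similarity matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary, positive-only feedback matrix R (|U| x |I|).
R = np.array([[1, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0]], dtype=float)

# --- Item-based model (Eq. 10), here instantiated with cosine similarity.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)
np.fill_diagonal(sim, 0.0)     # zero diagonal, as in the SLIM constraints (Eq. 11)
S_item = R @ sim               # S_ui sums similarities of i to u's known preferences

u, i = 0, 2
known = np.flatnonzero(R[u])   # I(u), the known preferences of user u
assert np.isclose(S_item[u, i], sim[known, i].sum())

# --- Factored similarity (Section 4.4): replace sim by W @ W.T, W of size |I| x D.
D = 2
W = rng.random((R.shape[1], D))
# Precomputing the D-dimensional user profiles R @ W turns the online score into
# a cheap D-dimensional dotproduct; associativity guarantees identical scores.
user_profiles = R @ W
S_factored = user_profiles @ W.T
assert np.allclose(S_factored, R @ (W @ W.T))
```

The final assertion is the precomputation argument of Section 4.4 in code: (R W) Wᵀ and R (W Wᵀ) give the same scores, but the former never materializes the |I| × |I| similarity matrix.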
4.5 Higher Order Neighborhood Models

The nearest neighbors methods discussed up to this point only consider pairwise interactions sim(j, i) and/or sim(u, v) and aggregate all the relevant ones for computing recommendations. Several authors [10; 31; 64; 50; 30; 33; 7] have proposed to incorporate also higher order interactions sim(J, i) and/or sim(u, V) with J ⊂ I and V ⊂ U. Also in this case we can distinguish item-based approaches from user-based approaches.

4.5.1 Item-based

For the item-based approach, most authors [10; 31; 64; 30; 7] propose to replace the user-item matrix R in the pairwise model of Equation 10 by the user-itemset matrix X:

  S = X S^(1,2)
  X_uJ = Π_{j∈J} R_uj,                             (20)

with S^(1,2) ∈ R^(D×|I|) the itemset-item similarity matrix and X ∈ {0,1}^(|U|×D) the user-itemset matrix, in which D ≤ 2^|I| is the result of an itemset selection procedure. The HOSLIM method [7] adopts this model and additionally imposes the constraints in Equation 11.

The case in which D = 2^|I|, and S^(1,2) is dense, is intractable. Tractable methods either limit D ≪ 2^|I| or impose sparsity on S^(1,2) via the deviation function. While we discuss the latter in Section 5.11, there are multiple ways to do the former. Deshpande and Karypis [10] limit the number of itemsets by limiting the size of J. Alternatively, Christakopoulou and Karypis [7] only consider itemsets J that were preferred by more than a minimal number of users. van Leeuwen and Puspitaningrum, on the other hand, limit the number of higher order itemsets by using an itemset selection algorithm based on the minimal description length principle [64]. Finally, Menezes et al. claim that it is in certain applications possible to compute all higher order interactions if one computes all higher order interactions on demand instead of in advance [31]. However, delaying the computation does not reduce its exponential complexity. Only if a large portion of the users requires recommendations on a very infrequent basis, computations for these users can be spread over a very long period of time and their approach might be feasible.

An alternative model for incorporating higher order interactions between items consists of finding the best association rule for making recommendations [50; 33]. This corresponds to the matrix factorization model

  S = X ⊗ S^(1,2),
  X_uJ = Π_{j∈J} R_uj,

with

  (A ⊗ B)_xy = max_{i=1...m} A_xi B_iy,

in which A ∈ R^(n×m) and B ∈ R^(m×k). We did not find a convincing motivation for either of the two aggregation strategies. Moreover, multiple authors report that their attempt to incorporate higher order interactions heavily increased computational costs and did not significantly improve the results [10; 64].

4.5.2 User-based

Incorporating higher order interactions between users can be achieved by replacing the user-item matrix in Equation 13 by the userset-item matrix Y [30; 60]:

  S = S^(1,1) Y
  Y_Vi = Π_{v∈V} R_vi,                             (21)

with Y ∈ {0,1}^(D×|I|) and S^(1,1) ∈ R^(|U|×D) the user-userset similarity matrix, in which D ≤ 2^|U| is the result of a userset selection procedure [30; 60].

4.6 Multi-profile Models

When multiple users, e.g., members of the same family, share a single account, or when a user has multiple distinct tastes, the above matrix factorization models can be too limited because they aggregate all the distinct tastes of an account into one vector [67; 70; 26]. Therefore, Weston et al. [70] propose MaxMF, in which they model every account with multiple vectors instead of just one. Then, for every candidate recommendation, their model chooses the vector that maximizes the score of the candidate:

  S_ui = max_{p=1...P} ( S^(1,1,p) S^(1,2) )_ui.

Kabbur and Karypis [26] argue that this approach worsens the performance for accounts with homogeneous taste or a low number of known preferences and therefore propose NLMFi, an adapted version that combines a global account profile with multiple taste-specific account profiles:

  S_ui = ( S^(1,1) S^(1,2) )_ui + max_{p=1...P} ( S^(2,1,p) S^(2,2) )_ui.

Alternatively, they also propose NLMFs, a version in which S^(2,2) = S^(1,2), i.e., the item profiles are shared between the global and the taste-specific terms:

  S_ui = ( S^(1,1) S^(1,2) )_ui + max_{p=1...P} ( S^(2,1,p) S^(1,2) )_ui.

An important downside of the above two models is that P, the number of distinct account-profiles, is a hyperparameter that is the same for every account and cannot be too large (typically 2 or 3) to avoid an explosion of the computational cost and the number of parameters. DAMIB-Cover [67], on the other hand, starts from the item-based model in Equation 10, and efficiently considers P = 2^c(u) different profiles for every account u. Specifically, every profile corresponds to a subset S^(p) of the known preferences of u, I(u). This results in the factorization model

  S_ui = max_{p=1...2^c(u)} ( R S^(1,2,p) S^(1,3) )_ui,
  S^(1,2,p)_jk = I(j = k) · |S^(p) ∩ {j}| / |S^(p)|^β,

with S^(1,2,p) a diagonal matrix that selects and rescales the known preferences of u that correspond to the subset S^(p) ⊆ I(u), and β ∈ [0, 1] a hyperparameter.

5. DEVIATION FUNCTIONS

The factorization models described in Section 4 contain many parameters, i.e., the entries in the a priori unknown factor matrices, which we collectively denote as θ. These parameters need to be computed based on the training data R. Computing all the parameters in a factorization model is done by minimizing the deviation between the training data R and the parameters θ of the factorization model. This deviation is measured by a deviation function D(θ, R). Many deviation functions exist, and every deviation function mathematically expresses a different interpretation of the concept deviation.

The majority of deviation functions is non-convex in the parameters θ. Consequently, minimization algorithms can only compute parameters that correspond to a local minimum. The initialization of the parameters in the factorization model and the chosen hyperparameters of the minimization algorithm determine which local minimum will be found. If the value of the deviation function in this local minimum is not much higher than that of the global minimum, it is considered a good minimum.

5.1 Probabilistic Scores-based

Hofmann [19] proposes to compute the optimal parameters θ* of the model as those that maximize the loglikelihood of the model, i.e., log p(R|θ), the logprobability of the known preferences given the model:

  θ* = arg max_θ Σ_{u∈U} Σ_{i∈I} R_ui log p(R_ui|θ).      (22)
Furthermore, he models p(R_ui = 1|θ) with S_ui, i.e., he interprets the scores S as probabilities. This gives

  θ* = arg max_θ Σ_{u∈U} Σ_{i∈I} R_ui log S_ui,

which is equivalent to minimizing the deviation function

  D(θ, R) = − Σ_{u∈U} Σ_{i∈I} R_ui log S_ui.       (23)

Recall that the parameters θ are not directly visible in the right hand side of this equation, but that the scores S_ui are computed based on the factorization model which contains the parameters θ, i.e., we use the short notation S_ui instead of the full notation S_ui(θ). This deviation function was also adopted by Blei et al. in their approach called latent dirichlet allocation (LDA) [6].

Furthermore, note that Hofmann only maximizes the logprobability of the observed feedback and ignores the missing preferences. This is equivalent to the assumption that there is no information in the missing preferences, which implicitly corresponds to the assumption that feedback is missing at random (MAR) [56]. Clearly, this is not a realistic assumption, since negative feedback is missing by definition, which is obviously non random. Moreover, the number of items in collaborative filtering problems is typically very large, and only a very small subset of them will be preferred by a user. Consequently, the probability that a missing preference is actually not preferred is high. Hence, in reality, the feedback is missing not at random (MNAR), and a good deviation function needs to account for this [56].

One approach is to assume, for the purposes of defining the deviation function, that all missing preferences are not preferred. This assumption is called all missing are negative (AMAN) [37]. Under this assumption, the parameters that maximize the loglikelihood of the model are computed as

  θ* = arg max_θ Σ_{u∈U} Σ_{i∈I} log p(R_ui|θ).

For binary, positive-only data, one can model p(R_ui|θ) as S_ui^(R_ui) · (1 − S_ui)^(1−R_ui). In this case, the parameters are computed as

  θ* = arg max_θ Σ_{u∈U} Σ_{i∈I} ( R_ui log S_ui + (1 − R_ui) log(1 − S_ui) ).

While the AMAN assumption is more realistic than the MAR assumption, it adopts a conceptually flawed missing data model. Specifically, it assumes that all missing preferences are not preferred, which contradicts the goal of collaborative filtering: to find the missing preferences that are actually preferred. A better missing data model still assumes that all missing preferences are not preferred. However, it attaches a lower confidence to the assumption that a missing preference is not preferred, and a higher confidence to the assumption that an observed preference is indeed preferred. One possible way to apply this missing data model was proposed by Steck [56]. Although his original approach is more general, we give a specific simplified version for binary, positive-only data:

  θ* = arg max_θ Σ_{u∈U} Σ_{i∈I} ( R_ui log S_ui + α · (1 − R_ui) log(1 − S_ui) ),     (24)

in which the hyperparameter α < 1 attaches a lower importance to the contributions that correspond to R_ui = 0. Johnson [24] proposed a very similar computation, but does not motivate why he deviates from the theoretically well founded version of Steck. Furthermore, Steck adds regularization terms to avoid overfitting and finally proposes the deviation function

  D(θ, R) = Σ_{u∈U} Σ_{i∈I} ( − R_ui log S_ui − α · (1 − R_ui) · log(1 − S_ui) + λ · α · ( ||θ_u||²_F + ||θ_i||²_F ) ),

with ||·||_F the Frobenius norm, λ a regularization hyperparameter, and θ_u, θ_i the vectors that group the model parameters related to user u and item i, respectively.
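The probabilistic deviation functions above are straightforward to write down. The sketch below is ours (toy data, made-up function names); it contrasts the MAR-style deviation of Equation 23, which ignores the missing entries, with the α-weighted objective of Equation 24 written as a deviation:

```python
import numpy as np

rng = np.random.default_rng(4)
R = (rng.random((5, 8)) < 0.3).astype(float)       # toy binary feedback matrix
S = np.clip(rng.random((5, 8)), 1e-6, 1 - 1e-6)    # toy scores, interpreted as p(R_ui=1)

def deviation_mar(S, R):
    """Eq. 23: only the known preferences contribute (MAR-style)."""
    return -(R * np.log(S)).sum()

def deviation_steck(S, R, alpha):
    """Negative objective of Eq. 24: missing entries also contribute,
    but with a lower importance alpha < 1."""
    return -(R * np.log(S) + alpha * (1 - R) * np.log(1 - S)).sum()

# With alpha = 0, the missing preferences are ignored and Eq. 24 reduces to Eq. 23.
assert np.isclose(deviation_steck(S, R, alpha=0.0), deviation_mar(S, R))
# With alpha > 0, the extra terms are non-negative, so the deviation only grows.
assert deviation_steck(S, R, alpha=0.5) >= deviation_mar(S, R)
```

Setting alpha between these extremes is exactly the compromise between the MAR and AMAN assumptions described in the text.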
5.2 Basic Squared Error-based

Most deviation functions, however, abandon the interpretation that the scores S are probabilities. In this case, one can choose to model p(R_ui|θ) with a normal distribution N(R_ui|S_ui, σ). By additionally adopting the AMAN assumption, the optimal parameters are computed as the ones that maximize the loglikelihood log p(R|θ):

  θ* = arg max_θ Σ_{u∈U} Σ_{i∈I} log N(R_ui|S_ui, σ),

which is equivalent to

  θ* = arg max_θ − Σ_{u∈U} Σ_{i∈I} (R_ui − S_ui)²
     = arg min_θ Σ_{u∈U} Σ_{i∈I} (R_ui − S_ui)².

The Eckart-Young theorem [13] states that the scores matrix S that results from these parameters θ* is the same as the one found by singular value decomposition (SVD) with the same dimensions of the training data matrix R. As such, the theorem relates the above approach of minimizing the squared error between R and S to latent semantic analysis (LSA) [9] and the SVD based collaborative filtering methods [8; 49].

Alternatively, it is possible to compute the optimal parameters as those that maximize the logposterior log p(θ|R), which relates to the loglikelihood log p(R|θ) as

  p(θ|R) ∝ p(R|θ) · p(θ).

When p(θ), the prior distribution of the parameters, is chosen to be a zero-mean, spherical normal distribution, maximizing the logposterior is equivalent to minimizing the deviation function [32]

  D(θ, R) = Σ_{u∈U} Σ_{i∈I} ( (R_ui − S_ui)² + λ_u · ||θ_u||²_F + λ_i · ||θ_i||²_F ),

with λ_u, λ_i regularization hyperparameters. Hence, maximizing the logposterior instead of the loglikelihood is equivalent to adding a regularization term. This deviation function is adopted by the FISM and NLMF methods [27; 26]. In an alternative interpretation of this deviation function, S is a factorized approximation of R, and the deviation function minimizes the squared error of the approximation. The regularization with the Frobenius norm is added to avoid overfitting. For the SLIM and HOSLIM methods, an alternative regularization term is proposed:

  D(θ, R) = Σ_{u∈U} Σ_{i∈I} (R_ui − S_ui)² + Σ_{t=1}^{T} Σ_{f=1}^{F} ( λ_R ||S^(t,f)||²_F + λ_1 ||S^(t,f)||_1 ),

with ||·||_1 the l1-norm. Whereas the role of the Frobenius norm is to avoid overfitting, the role of the l1-norm is to introduce sparsity. The combined use of both norms is called elastic net regularization, which is known to implicitly group correlated items [34]. The sparsity induced by the l1-norm regularization lowers the memory required for storing the model and allows to speed up the computation of recommendations by means of the sparse dotproduct. Even more sparsity can be obtained by fixing a majority of the parameters to 0, based on a simple feature selection method. Ning and Karypis [34] empirically show that this significantly reduces runtimes, while only slightly reducing the accuracy.

5.3 Weighted Squared Error-based

These squared error based deviation functions can also be adapted to diverge from the AMAN assumption to a missing data model that attaches lower confidence to the missing preferences [21; 37; 56]:

  D(θ, R) = Σ_{u∈U} Σ_{i∈I} W_ui (R_ui − S_ui)² + Σ_{t=1}^{T} Σ_{f=1}^{F} λ_tf ||S^(t,f)||²_F,      (25)

in which W ∈ R^(|U|×|I|) assigns weights to the values in R. The higher W_ui, the higher the confidence about R_ui. Hu et al. [21] provide two potential definitions of W:

  W_ui = 1 + β R_ui,
  W_ui = 1 + α log(1 + R_ui/ε),

with α, β, ε hyperparameters. Clearly, this method is not limited to binary data, but works on positive-only data in general. We, however, only consider its use for binary, positive-only data. Equivalently, Steck [56] proposed:

  W_ui = R_ui + (1 − R_ui) · α,

with α < 1. Additionally, Steck [57] pointed out that a preference is more likely to show up in the training data if the item is more popular. To compensate for this bias, Steck proposes to weight the known preferences non-uniformly:

  W_ui = R_ui · ( C / c(i) )^β + (1 − R_ui) · α,

with C a constant, |R| the number of non-zeros in the training data R, and β ∈ [0, 1] a hyperparameter. Analogously, a preference is more likely to be missing in the training data if the item is less popular. To compensate for this bias, Steck proposes to weight the missing preferences non-uniformly:

  W_ui = R_ui + (1 − R_ui) · C · c(i)^β,           (26)

with C a constant. Steck proposed these two weighting strategies as alternatives to each other. However, we believe that they can be combined since they are the application of the same idea to the known and missing preferences respectively.

Although they provide less motivation, Pan et al. [37] arrive at similar weighting schemes. They propose W_ui = 1 if R_ui = 1 and give three possibilities for the case R_ui = 0:

  W_ui = δ,                                        (27)
  W_ui = α · c(u),                                 (28)
  W_ui = α · ( |U| − c(i) ),                       (29)

with δ ∈ [0, 1] a uniform hyperparameter and α a hyperparameter such that W_ui ≤ 1 for all pairs (u, i) for which R_ui = 0. In the first case, all missing preferences get the same weight. In the second case, a missing preference is more negative if the user already has many preferences. In the third case, a missing preference is less negative if the item is popular. Interestingly, the third weighting scheme is orthogonal to the one of Steck in Equation 26. Additionally, Pan et al. [37] propose a deviation function with an alternative regularization:

  D(θ, R) = Σ_{u∈U} Σ_{i∈I} W_ui ( (R_ui − S_ui)² + λ ( ||θ_u||²_F + ||θ_i||²_F ) ).      (30)

Yao et al. [73] adopt a more complex missing data model that has a hyperparameter p, which indicates the overall likelihood that a missing preference is preferred. This translates into the deviation function:

  D(θ, R) = Σ_{u∈U} Σ_{i∈I} R_ui W_ui (1 − S_ui)²
          + Σ_{u∈U} Σ_{i∈I} (1 − R_ui) W_ui (p − S_ui)²
          + Σ_{t=1}^{T} Σ_{f=1}^{F} λ_tf ||S^(t,f)||²_F.      (31)

The special case with p = 0 reduces this deviation function to the one in Equation 25. An even more complex missing data model was proposed by Sindhwani et al. [55].
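The weighted squared-error deviation of Equation 25, with the first weighting scheme of Hu et al. (W = 1 + βR), can be evaluated directly for any candidate factorization. A minimal sketch with made-up toy data and our own function names:

```python
import numpy as np

rng = np.random.default_rng(2)
R = (rng.random((4, 6)) < 0.3).astype(float)   # toy binary feedback matrix
D = 3
U = rng.normal(scale=0.1, size=(4, D))         # user factor matrix (part of theta)
V = rng.normal(scale=0.1, size=(D, 6))         # item factor matrix (part of theta)

def deviation_weighted(R, U, V, beta, lam):
    """Weighted squared error (Eq. 25) with Hu et al.'s weights W = 1 + beta*R:
    every (u, i) pair contributes (AMAN-style), but known preferences
    get a higher confidence weight."""
    S = U @ V
    W = 1.0 + beta * R
    err = (W * (R - S) ** 2).sum()
    reg = lam * ((U ** 2).sum() + (V ** 2).sum())
    return err + reg

base = deviation_weighted(R, U, V, beta=40.0, lam=0.01)
# beta = 0 recovers the unweighted squared error, which can only be smaller,
# since the extra weighted terms are non-negative.
assert deviation_weighted(R, U, V, beta=0.0, lam=0.01) <= base
```

A fitting procedure (for instance alternating least squares) would then minimize this function over U and V; here we only evaluate it.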

The corresponding deviation function is

  D(θ, R) = Σ_{u∈U} Σ_{i∈I} R_ui W_ui (1 − S_ui)²
          + Σ_{u∈U} Σ_{i∈I} (1 − R_ui) W_ui (P_ui − S_ui)²
          + Σ_{t=1}^{T} Σ_{f=1}^{F} λ_tf ||S^(t,f)||²_F
          − λ_H Σ_{u∈U} Σ_{i∈I} (1 − R_ui) H(P_ui),      (32)

with P and W additional parameters of the model that need to be computed based on the data R, together with all other parameters. The last term contains the entropy function H and serves as a regularizer for P. Furthermore, they define the constraint

  1 / ( |U||I| − |R| ) · Σ_{u∈U} Σ_{i∈I: R_ui=0} P_ui = p,

which expresses that the average probability that a missing value is actually one must be equal to the hyperparameter p. To reduce the computational cost, they fix P_ui = 0 for most (u, i)-pairs and randomly choose a few (u, i)-pairs for which P_ui is computed based on the data. It seems that this simplification completely offsets the modeling flexibility that was obtained by introducing P. Additionally, they simplify W as the one-dimensional matrix factorization

  W_ui = V_u · V_i.

A conceptual inconsistency of this deviation function is that although the recommendation score is given by S_ui, P_ui could also be used. Hence, there exist two parameters for the same concept, which is, at best, ambiguous.

5.4 Maximum Margin-based

A disadvantage of the squared error-based deviation functions is their symmetry. For example, if R_ui = 1 and S_ui = 0, (R_ui − S_ui)² = 1. This is desirable behavior, because we want to penalize the model for predicting that a preference is not preferred. However, if S_ui = 2, we obtain the same penalty: (R_ui − S_ui)² = 1. This, on the other hand, is not desirable, because we do not want to penalize the model for predicting that a preference will definitely be preferred.

A maximum margin-based deviation function does not suffer from this problem [36]:

  D(θ, R) = Σ_{u∈U} Σ_{i∈I} W_ui · h( R̃_ui · S_ui ) + λ · ||S||_Σ,      (33)

with ||·||_Σ the trace norm, λ a regularization hyperparameter, h( R̃_ui · S_ui ) a smooth hinge loss given by Figure 3 [46], W given by one of the Equations 27-29, and the matrix R̃ defined as

  R̃_ui = 1 if R_ui = 1,
  R̃_ui = −1 if R_ui = 0.

This deviation function incorporates the confidence about the training data by means of W, and the missing knowledge about the degree of preference by means of the hinge loss h( R̃_ui · S_ui ). Since the degree of a preference R_ui = 1 is considered unknown, a value S_ui > 1 is not penalized if R_ui = 1.

Figure 3: Shown are the loss function values h(z) (left) and the gradients dh(z)/dz (right) for the Hinge and Smooth Hinge. Note that the gradients are identical outside the region z ∈ (0, 1) [46].

5.5 Overall Ranking Error-based

The scores S computed by a collaborative filtering method are used to personally rank all items for every user. Therefore, one can argue that it is more natural to directly optimize the ranking of the items instead of their individual scores. Rendle et al. [45] aim to maximize the area under the ROC curve (AUC), which is given by:

  AUC = 1/|U| · Σ_{u∈U} 1 / ( c(u) · ( |I| − c(u) ) ) · Σ_{R_ui=1} Σ_{R_uj=0} I(S_ui > S_uj).      (34)

If the AUC is higher, the pairwise rankings induced by the model S are more in line with the observed data R. However, because I(S_ui > S_uj) is non-differentiable, it is impossible to actually compute the parameters that (locally) maximize the AUC. Their solution is a deviation function, called the Bayesian Personalized Ranking (BPR)-criterium, which is a differentiable approximation of the negative AUC from which constant factors have been removed and to which a regularization term has been added:

  D(θ, R) = − Σ_{u∈U} Σ_{R_ui=1} Σ_{R_uj=0} log σ(S_ui − S_uj) + Σ_{t=1}^{T} Σ_{f=1}^{F} λ_tf ||S^(t,f)||²_F,      (35)

with σ(·) the sigmoid function and λ_tf regularization hyperparameters. Since this approach explicitly accounts for the missing data, it corresponds to the MNAR assumption.

Pan and Chen claim that it is beneficial to relax the BPR deviation function by Rendle et al. to account for noise in the data [38; 39]. Specifically, they propose CoFiSet [38], which allows certain violations S_ui < S_uj when R_ui > R_uj:

  D(θ, R) = − Σ_{u∈U} Σ_{I⊆I(u)} Σ_{J⊆I∖I(u)} log σ( ( Σ_{i∈I} S_ui ) / |I| − ( Σ_{j∈J} S_uj ) / |J| ) + Γ(θ),      (36)

X Σ is a complicated non-differentiable function for tive to outliers) and the Logistic (which “rewards” large ∥which∥ it is not easy to find the subdifrential. Finding good output values). We use the Smooth Hinge and the corre- descent directions for (4) is not easy. On the other hand, the sponding objective for our experiments in Section 4. with Γ(θ) a regularization term that slightly deviates from they optimize a lower bound instead. After also adding reg- the one proposed by Rendle et al. for no clear reason. Alter- ularization terms, their final deviation function is given by natively, they propose GBPR [39], which relaxes the BPR deviation function in a different way: X X  (θ, R) = Rui log σ(Sui) X X X D − (θ, R) = Γ(θ) u∈U i∈I D u∈U R =1 R =0 X  ui uj + log (1 Ruj σ(Suj Sui)) P ! − − Sgi j∈I 2 g∈Gu,i log σ α + (1 α) Sui Suj , (37) T F − · u,i − · − X X (t,f) 2 |G | + λtf S (41) || ||F t=1 f=1 with u,i the union of u with a random subset of g G { } { ∈ u Rgi = 1 , and α a hyperparameter. U\{ }| } Furthermore, Kabbur et al. [27] also aim to maximize AUC with λ a regularization constant and σ( ) the sigmoid func- with their method FISMauc. However, they propose to use tion. A disadvantage of this deviation· function is that it a different differentiable approximation of AUC: ignores all missing data, i.e., it corresponds to the MAR assumption. X X X 2 (θ, R) = ((Rui Ruj ) (Sui Suj )) An alternative for MRR is mean average precision (MAP): D − − − u∈U Rui=1 Ruj =0 T F 1 X 1 X 1 X X (t,f) 2 MAP + λtf S F . (38) = || || c(u) r>(Sui Su·) t=1 f=1 |U| u∈U R)ui=1 | X I(Suj Sui), The same model without regularization was proposed by ≥ T¨oscher and Jahrer [62]: Ruj =1

(θ, R) = which still emphasizes the top of the ranking, but less ex- D X X X 2 tremely than MRR. However, MAP is also non-smooth, pre- ((Rui Ruj ) (Sui Suj )) . (39) − − − venting its direct optimization. For MAP, too, Shi et al. u∈U R =1 R =0 ui uj derived a differentiable version, called TFMAP [51]: A similar deviation function was proposed by Tak´acsand Tikk [61]: X 1 X X (θ, R) = σ (Sui) σ (Suj Sui) D − c(u) − X X X u∈U R) =1 Ruj =1 (θ, R) = Rui w(j) ui D u∈U i∈I j∈I T F X X (t,f) 2 2 + λ S F , (42) ((Sui Suj ) (Rui Ruj )) , (40) · || || · − − − t=1 f=1 with w( ) a user-defined item weighting function. The sim- · plest choice is w(j) = 1 for all j. An alternative proposed with λ a regularization hyperparameter. Besides, the for- by Tak´acsand Tikk is w(j) = c(j). Another important mulation of TFMAP in Equation 42 is a simplified version difference is that this deviation function also minimizes the of the original, concieved for multi-context data, which is score-difference between the known preferences mutually. out of the scope of this survey. Finally, it is remarkable that both T¨oscher and Jahrer, and Dhanjal et al. proposed yet another alternative [12]. They Tak´acs and Tikk, explicitly do not add a regularization term, start from the definition of AUC in Equation 34, approx- whereas most other authors find that the regularization term imate the indicator function I(Sui Suj ) by the squared 2 − is important for their model’s performance. hinge loss (max (0, 1 + Suj Sui)) and emphasize the de- viation at the top of the ranking− by means of the hyperbolic 5.6 Top of Ranking Error-based tangent function tanh( ): · Very often, only the N highest ranked items are shown to users. Therefore, Shi et al. [52] propose to improve the top (θ, R) = of the ranking at the cost of worsening the overall ranking. D Specifically, they propose the CLiMF method, which aims ! X X X 2 mean reciprocal rank (MRR) tanh (max (0, 1 + Suj Sui)) . 
(43) to minimize the instead of the − AUC. The MRR is defined as u∈U Rui=1 Ruj=0 −1 1 X   MRR = r> max Sui Su· , Rui=1 | 5.7 k-Order Statistic-based u∈U |U| On the one hand, the deviation functions in Equations 35-40 in which r>(a b) gives the rank of a among all numbers in b try to minimize the overall rank of the known preferences. when ordered| in descending order. Unfortunately, the non- On the other hand, the deviation functions in Equations smoothness of r>() and max makes the direct optimization 41 and 43 try to push one or a few known preferences as of MRR unfeasible. Hence, Shi et al. derive a differentiable high as possible to the top of the item-ranking. Weston et version of MRR. Yet, optimizing it is intractable. Therefore, al. [71] propose to minimize a trade-off between the above two extremes: for user u, he does not use the formulation in Equation 34,   but uses the equivalent X X r>(Sui Sui Rui = 1 ) w | { | } c(u) 1 u∈U R =1 AUC u = ui c(u) ( c(u)) X I(Suj + 1 Sui) " · |I| − ! # ≥ , (44) c(u) + 1 X · r>(Suj Suk Ruk = 0 ) ( + 1)c(u) rui , Ruj =0 | { | } · |I| − 2 − Rui=1 with w( ) a function that weights the importance of the with rui the rank of item i in the recommendation list of known preference· as a function of their predicted rank among user u. Second, nDCGu is defined as all known preferences. This weighting function is user-defined and determines the trade-off between the two extremes, i.e., DCG X 1 nDCGu = , DCG = , minimizing the mean rank of the known preferences and DCGopt log(rui + 1) R =1 minimizing the maximal rank of the known preferences. Be- ui cause this deviation function is non-differentiable, Weston with DCGopt the DCG of the optimal ranking. In both et al. propose the differentiable approximation cases, Steck proposes to relate the rank rui with the recom- mendation score Rui by means of a link function   X X r>(Sui Sui Rui = 1 ) n  o (θ, R) = w | { | } ˆ D c(u) rui = max 1, 1 cdf (Sui) , (47) u∈U Rui=1 |I| · −
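The ranking criteria above are typically optimized by stochastic sampling of pairwise terms. As a concrete illustration, the following is a minimal numpy sketch of a single SGD update on the BPR criterion of Equation 35, for a basic two-factor model S = U Vᵀ; the function name, learning rate, and regularization values are illustrative, not from [45].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    """One SGD update on the BPR criterion for S = U @ V.T:
    push S_ui above S_uj for a sampled triple with R_ui = 1, R_uj = 0."""
    x = U[u] @ (V[i] - V[j])        # current margin S_ui - S_uj
    g = sigmoid(-x)                 # weight from the gradient of -log sigmoid(x)
    # compute all gradients from the old values, then apply them
    du = g * (V[i] - V[j]) - reg * U[u]
    di = g * U[u] - reg * V[i]
    dj = -g * U[u] - reg * V[j]
    U[u] += lr * du
    V[i] += lr * di
    V[j] += lr * dj
```

In training, triples (u, i, j) with Rui = 1 and Ruj = 0 are sampled repeatedly and this update is applied until a convergence criterion is met; each step increases the margin Sui − Suj for the sampled pair.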

5.7 k-Order Statistic-based

On the one hand, the deviation functions in Equations 35-40 try to minimize the overall rank of the known preferences. On the other hand, the deviation functions in Equations 41 and 43 try to push one or a few known preferences as high as possible to the top of the item-ranking. Weston et al. [71] propose to minimize a trade-off between these two extremes:

  D(θ, R) = Σ_{u∈U} Σ_{Rui=1} w( r>(Sui | {Sui | Rui = 1}) / c(u) ) · Σ_{Ruj=0} I(Suj + 1 ≥ Sui),   (44)

with w(·) a function that weights the importance of the known preferences as a function of their predicted rank among all known preferences. This weighting function is user-defined and determines the trade-off between the two extremes, i.e., minimizing the mean rank of the known preferences and minimizing the maximal rank of the known preferences. Because this deviation function is non-differentiable, Weston et al. propose the differentiable approximation

  D(θ, R) = Σ_{u∈U} Σ_{Rui=1} w( r>(Sui | {Sui | Rui = 1}) / c(u) ) · Σ_{Ruj=0} max(0, 1 + Suj − Sui) / ( N^{−1}(|I| − c(u)) ),   (45)

where they replaced the indicator function by the hinge loss and approximated the rank with N^{−1}(|I| − c(u)), in which N is the number of items k that were randomly sampled until Suk + 1 > Sui. Weston et al. [69] provide a justification for this approximation. Furthermore, they use the simple weighting function

  w( r>(Sui | {Sui | Rui = 1}) / c(u) ) = 1 if r>(Sui | S) = k, with S ⊆ {Sui | Rui = 1} a random subset of size K, and 0 otherwise,

i.e., from the set S that contains K randomly sampled known preferences, ranked by their predicted score, only the item at rank k is selected to contribute to the training error. When k is set low, the top of the ranking will be optimized at the cost of a worse mean rank. The opposite will hold when k is set high. The regularization is not done by adding a regularization term, but by forcing the norm of the factor matrices to be below a maximum, which is a hyperparameter.

Alternatively, Weston et al. also propose a simplified deviation function:

  D(θ, R) = Σ_{u∈U} Σ_{Rui=1} w( r>(Sui | {Sui | Rui = 1}) / c(u) ) · Σ_{Ruj=0} max(0, 1 + Suj − Sui).   (46)

5.8 Rank Link Function-based

The ranking-based deviation functions discussed so far are all tailor-made differentiable approximations, with respect to the recommendation scores, of a certain ranking quality measure, like the AUC, the MRR or the k-th order statistic. Steck [58] proposes a more general approach that is applicable to any ranking quality measure that is differentiable with respect to the rankings. He demonstrates his method on two ranking quality measures: AUC and nDCG. For AUCu, the AUC for user u, he does not use the formulation in Equation 34, but the equivalent

  AUCu = ( (|I| + 1)·c(u) − (c(u) + 1)·c(u)/2 − Σ_{Rui=1} rui ) / ( c(u) · (|I| − c(u)) ),

with rui the rank of item i in the recommendation list of user u. Second, nDCGu is defined as

  nDCGu = DCG / DCGopt,  DCG = Σ_{Rui=1} 1 / log(rui + 1),

with DCGopt the DCG of the optimal ranking. In both cases, Steck proposes to relate the rank rui to the recommendation score Sui by means of a link function

  rui = max{ 1, |I| · (1 − cdf(Ŝui)) },   (47)

with Ŝui = (Sui − µu)/stdu the normalized recommendation score, in which µu and stdu are the mean and standard deviation of the recommendation scores for user u, and cdf the cumulative distribution of the normalized scores. However, to know cdf, he needs to assume a distribution for the normalized recommendation scores. He motivates that a zero-mean normal distribution of the recommendation scores is a reasonable assumption. Consequently, cdf(Ŝui) = probit(Ŝui). Furthermore, he proposes to approximate the probit function with the computationally more efficient sigmoid function, or the even more efficient function

  rankSE(Ŝui) = [1 − ([1 − Ŝui]+)²]+,

with [x]+ = max{0, x}. In his pointwise approach, Steck uses Equation 47 to compute the ranks based on the recommendation scores. In his listwise approach, on the other hand, he finds the actual rank of a recommendation score by sorting the recommendation scores for every user, and uses Equation 47 only to compute the gradient of the rank with respect to the recommendation score during the minimization procedure. After adding regularization terms, he proposes the deviation function

  D(θ, R) = Σ_{u∈U} Σ_{Rui=1} ( L(Sui) + λ · (‖θu‖²_F + ‖θi‖²_F) + γ · Σ_{j∈I} S²uj ),   (48)

with θu, θi the vectors that group all model parameters related to user u and item i, respectively, λ, γ regularization hyperparameters, and L(Sui) equal to −AUCu or −nDCGu, which are a function of Sui via rui. The last regularization term is introduced to enforce an approximately normal distribution on the scores.
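The link function of Equation 47 can be sketched in a few lines of numpy. The sigmoid-for-probit substitution follows Steck's suggestion; the 1.702 scaling constant (which makes sigmoid(1.702x) track the probit) and the function names are our assumptions for illustration.

```python
import numpy as np

def approximate_ranks(S):
    """Rank link function of Equation 47: map scores to approximate ranks,
    with a scaled sigmoid standing in for the probit cdf (an assumption)."""
    S_hat = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
    cdf = 1.0 / (1.0 + np.exp(-1.702 * S_hat))
    return np.maximum(1.0, S.shape[1] * (1.0 - cdf))

def exact_ranks(S):
    """Actual 1-based descending ranks, found by sorting (the listwise approach)."""
    order = np.argsort(-S, axis=1)
    ranks = np.empty(S.shape, dtype=float)
    ranks[np.arange(S.shape[0])[:, None], order] = np.arange(1, S.shape[1] + 1)
    return ranks
```

Because Equation 47 is monotonically decreasing in the score, the approximate ranks order the items exactly like the true ranks; how close the rank values themselves are depends on how well the normality assumption holds.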

5.9 Posterior KL-divergence-based

In our framework, the optimal parameters θ* are computed as θ* = arg min_θ D(θ, R). However, we can consider this a special case of

  θ* = ψ( arg min_φ D(θ(φ), R) ),

in which ψ is chosen to be the identity function and the parameters θ are identical to the parameters φ, i.e., θ(φ) = φ. Now, some authors [28; 40; 16] propose to choose ψ() and φ differently.

Specifically, they model every parameter θj of the factorization model as a random variable with a parameterized posterior probability density function p(θj | φj). Hence, finding the variables φ corresponds to finding the posterior distributions of the model parameters θ. Because it is intractable to find the true posterior distribution p(θ | R) of the parameters, they settle for a mean-field approximation q(θ | φ), in which all variables are assumed to be independent.

Then, they define ψ as ψ(φ*) = E_{q(θ|φ*)}[θ], i.e., the point-estimate of the parameters θ equals their expected value under the mean-field approximation of their posterior distributions. Note that θj* = E_{q(θj|φj*)}[θj] because of the independence assumption.

If all parameters θj are assumed to be normally distributed with mean µj and variance σj² [28; 40], φj = (µj, σj) and θj* = E_{q(θj|φj*)}[θj] = µj. If, on the other hand, all parameters θj are assumed to be gamma distributed with shape αj and rate βj [16], φj = (αj, βj) and θj* = E_{q(θj|φj*)}[θj] = αj/βj.

Furthermore, prior distributions are defined for all parameters θ. Typically, when this approach is adopted, the underlying assumptions are represented as a graphical model [5].

The parameters φ, and therefore the corresponding mean-field approximations of the posterior distributions of the parameters θ, can be inferred by defining the deviation function as the KL-divergence of the real (unknown) posterior distribution of the parameters, p(θ | R), from the modeled posterior distribution of the parameters, q(θ | φ):

  D(θ(φ), R) = D_KL( q(θ | φ) ‖ p(θ | R) ),

which can be solved despite the fact that p(θ | R) is unknown [25]. This approach goes by the name variational inference [25].

A nonparametric version of this approach also considers D, the number of latent dimensions in the simple two-factor factorization model of Equation 7, as a parameter that depends on the data R, instead of a hyperparameter, as most other methods do [17].

Note that certain solutions for latent Dirichlet allocation [6] also use variational inference techniques. However, in this case, variational inference is part of the (variational) expectation-maximization algorithm for computing the parameters that optimize the negative log-likelihood of the model parameters, which serves as the deviation function. This is different from the methods discussed in this section, where the KL-divergence between the real and the approximate posterior is the one and only deviation function.

5.10 Convex

An intuitively appealing but non-convex deviation function is worthless if there exists no algorithm that can efficiently compute parameters that correspond to a good local minimum. Convex deviation functions, on the other hand, have only one global minimum that can be computed with one of the well-studied convex optimization algorithms. For this reason, it is worthwhile to pursue convex deviation functions.

Aiolli [3] proposes a convex deviation function based on the AUC (Eq. 34). For every individual user u ∈ U, Aiolli starts from AUCu, the AUC for u:

  AUCu = (1/(c(u) · (|I| − c(u)))) Σ_{Rui=1} Σ_{Ruj=0} I(Sui > Suj).

Next, he proposes a lower bound on AUCu:

  AUCu ≥ (1/(c(u) · (|I| − c(u)))) Σ_{Rui=1} Σ_{Ruj=0} (Sui − Suj)/2,

and interprets it as a weighted sum of margins (Sui − Suj)/2 between any known preference and any absent feedback, in which every margin gets the same weight 1/(c(u)·(|I|−c(u))). Hence, maximizing this lower bound on the AUC corresponds to maximizing the sum of margins between any known preference and any absent feedback in which every margin has the same weight. A problem with maximizing this sum is that very high margins on pairs that are easily ranked correctly can hide poor (negative) margins on pairs that are difficult to rank correctly. Aiolli proposes to replace the uniform weights with a weighting scheme that emphasizes the difficult pairs, such that the total margin is the worst possible case, i.e., the lowest possible sum of weighted margins. Specifically, he proposes to solve for every user u the joint optimization problem

  θ* = arg max_θ min_{αu*} Σ_{Rui=1} Σ_{Ruj=0} αui αuj (Sui − Suj),

where for every user u, it holds that Σ_{Rui=1} αui = 1 and Σ_{Ruj=0} αuj = 1. To avoid overfitting of α, he adds two regularization terms:

  θ* = arg max_θ min_{αu*} ( Σ_{Rui=1} Σ_{Ruj=0} αui αuj (Sui − Suj) + λp Σ_{Rui=1} αui² + λn Σ_{Rui=0} αui² ),

with λp, λn regularization hyperparameters. The model parameters θ, on the other hand, are regularized by normalization constraints on the factor matrices. Solving the above maximization for every user is equivalent to minimizing the deviation function

  D(θ, R) = Σ_{u∈U} max_{αu*} ( Σ_{Rui=1} Σ_{Ruj=0} αui αuj (Suj − Sui) − λp Σ_{Rui=1} αui² − λn Σ_{Rui=0} αui² ).   (49)

5.11 Analytically Solvable

Some deviation functions are not only convex, but also analytically solvable. This means that the parameters that minimize these deviation functions can be exactly computed from a formula and that no numerical optimization algorithm is required.

Traditionally, methods that adopt these deviation functions have been inappropriately called neighborhood- or memory-based methods. First, although these methods adopt neighborhood-based factorization models, there are also neighborhood-based methods that adopt non-convex deviation functions, such as SLIM [34] and BPR-kNN [45], which were, amongst others, discussed in Section 4. Second, a memory-based implementation of these methods, in which the necessary parameters of the factorization model are not precomputed but computed in real time when they are required, is conceptually possible, yet practically intractable in most cases. Instead, a model-based implementation of these methods, in which the factorization model is precomputed, is the best choice for the majority of applications.

5.11.1 Basic Neighborhood-based

A first set of analytically solvable deviation functions is tailored to the item-based neighborhood factorization models of Equation 10:

  S = R S^(1,2).

As explained in Section 4, the factor matrix S^(1,2) can be interpreted as an item-similarity matrix. Consequently, these deviation functions compute every parameter in S^(1,2) as

  S^(1,2)_ji = sim(j, i),

with sim(j, i) the similarity between items j and i according to some analytically computable similarity function. This is equivalent to

  S^(1,2)_ji − sim(j, i) = 0,

which is true for all (j, i)-pairs if and only if

  Σ_{j∈I} Σ_{i∈I} ( S^(1,2)_ji − sim(j, i) )² = 0.

Hence, computing the factor matrix S^(1,2) corresponds to minimizing the deviation function

  D(θ, R) = Σ_{j∈I} Σ_{i∈I} ( S^(1,2)_ji − sim(j, i) )².

In this case, the deviation function mathematically expresses the interpretation that sim(j, i) is a good predictor for preferring i if j is also preferred. The key property that determines the analytical solvability of this deviation function is the absence of products of parameters. The non-convex deviation functions in Section 6.1, on the other hand, do contain products of parameters, which are contained in the term Sui. Consequently, they are harder to solve, but allow richer parameter interactions.

A typical choice for sim(j, i) is the cosine similarity [10]. The cosine similarity between two items j and i is given by

  cos(j, i) = ( Σ_{v∈U} Rvj Rvi ) / √( c(i) · c(j) ).   (50)

Another similarity measure is the conditional probability similarity measure [10], which is, for two items i and j, given by

  condProb(j, i) = (1/c(j)) Σ_{v∈U} Rvi Rvj.   (51)

Deshpande and Karypis also proposed an adapted version:

  condProb*(j, i) = Σ_{v∈U} (Rvi Rvj) / ( c(i) · c(j)^α · c(v) ),   (52)

in which α ∈ [0, 1] is a hyperparameter. They introduced the factor 1/c(j)^α to avoid the recommendation of overly frequent items, and the factor 1/c(v) to reduce the weight of i and j co-occurring in the preferences of v if v has more preferences. Other similarity measures were proposed by Aiolli [2]:

  sim(j, i) = ( Σ_{v∈U} (Rvi Rvj) / ( c(j)^α · c(i)^(1−α) ) )^q,

with α, q hyperparameters; by Gori et al. [18]:

  sim(j, i) = ( Σ_{v∈U} Rvj Rvi ) / ( Σ_{k∈I} Σ_{v∈U} Rvj Rvk );

and by Wang et al. [68]:

  sim(j, i) = log( 1 + α · ( Σ_{v∈U} Rvj Rvi ) / ( c(j) c(i) ) ),

with α ∈ R+_0 a hyperparameter. Furthermore, Huang et al. show that sim(j, i) can also be chosen from a number of similarity measures that are typically associated with link prediction [22]. Similarly, Bellogin et al. show that typical scoring functions used in information retrieval can also be used for sim(j, i) [4].

It is common practice to introduce sparsity in S^(1,2) by defining

  sim(j, i) = sim′(j, i) · |KNN(j) ∩ {i}|,   (53)

with sim′(j, i) one of the similarity functions defined by Equations 50-52, KNN(j) the set of items l that correspond to the k highest values sim′(j, l), and k a hyperparameter. Motivated by a qualitative examination of their results, Sigurbjörnsson and Van Zwol [54] proposed additional adaptations:

  sim(j, i) = s(j) · d(i) · r(j, i) · sim′(j, i) · |KNN(j) ∩ {i}|,

with

  s(j) = ks / ( ks + |ks − log c(j)| ),   (54)
  d(i) = kd / ( kd + |kd − log c(i)| ),   (55)
  r(j, i) = kr / ( kr + (r − 1) ),   (56)

in which i is the r-th most similar item to j, and ks, kd and kr are hyperparameters. Finally, Deshpande and Karypis [10] propose to normalize sim(j, i) as

  sim(j, i) = sim″(j, i) / Σ_{l∈I\{j}} sim″(j, l),

with sim″(j, i) defined using Equation 53. Alternatively, Aiolli [2] proposes the normalization

  sim(j, i) = sim″(j, i) / ( Σ_{l∈I\{i}} sim″(l, i) )^{2(1−β)},

with β a hyperparameter.

A second set of analytically solvable deviation functions is tailored to the user-based neighborhood factorization model of Equation 13:

  S = S^(1,1) R.

In this case, the factor matrix S^(1,1) can be interpreted as a user-similarity matrix. Consequently, these deviation functions compute every parameter in S^(1,1) as

  S^(1,1)_uv = sim(u, v),

with sim(u, v) the similarity between users u and v according to some analytically computable similarity function. In the same way as for the item-based case, computing the factor matrix S^(1,1) corresponds to minimizing the deviation function

  D(θ, R) = Σ_{u∈U} Σ_{v∈U} ( S^(1,1)_uv − sim(u, v) )².

In this case, the deviation function mathematically expresses the interpretation that users u and v, for which sim(u, v) is high, prefer the same items.

Sarwar et al. [48] propose

  sim(u, v) = |KNN(u) ∩ {v}|,

with KNN(u) the set of users w that have the k highest cosine similarities cos(u, w) with user u, and k a hyperparameter. In this case, the cosine similarity is defined as

  cos(u, v) = ( Σ_{j∈I} Ruj Rvj ) / √( c(u) · c(v) ).   (57)

Alternatively, Aiolli [2] proposes

  sim(u, v) = ( Σ_{j∈I} (Ruj Rvj) / ( c(u)^α · c(v)^(1−α) ) )^q,

with α, q hyperparameters, and Wang et al. [68] propose

  sim(u, v) = log( 1 + α · ( Σ_{j∈I} Ruj Rvj ) / ( c(u) c(v) ) ),

with α a hyperparameter.

The deviation function for the unified neighborhood-based factorization model in Equation 14 is given by

  D(θ, R) = Σ_{u∈U} Σ_{v∈U} ( S^(1,1)_uv − sim(u, v) )² + Σ_{j∈I} Σ_{i∈I} ( S^(2,3)_ji − sim(j, i) )².

However, sim(j, i) and sim(u, v) cannot be chosen arbitrarily. Verstrepen and Goethals [67] show that they need to satisfy certain constraints in order to render a well-founded unification. Consequently, they propose KUNN, which corresponds to the following similarity definitions that satisfy the necessary constraints:

  sim(u, v) = Σ_{j∈I} (Ruj Rvj) / √( c(u) · c(v) · c(j) ),
  sim(i, j) = Σ_{v∈U} (Rvi Rvj) / √( c(i) · c(j) · c(v) ).

5.11.2 Higher Order Neighborhood-based

A fourth set of analytically solvable deviation functions is tailored to the higher order itemset-based neighborhood factorization model of Equation 20:

  S = X S^(1,2).

In this case, the deviation function is given by

  D(θ, R) = Σ_{j∈𝒮} Σ_{i∈I} ( S^(1,2)_ji − sim(j, i) )²,

with 𝒮 ⊆ 2^I the selected itemsets considered in the factorization model. Deshpande and Karypis [10] propose to define sim(j, i) similarly as for the pairwise interactions (Eq. 53). Alternatively, others [33; 48] proposed

  sim(j, i) = sim′(j, i) · max(0, c(j ∪ {i}) − f),

with f a hyperparameter. Lin et al. [30] proposed yet another alternative:

  sim(j, i) = sim′(j, i) · |KNN_c(i) ∩ {j}| · max(0, condProb(j, i) − c),

with KNN_c(i) the set of items l that correspond to the k highest values c(i, l), k a hyperparameter, condProb the conditional probability according to Equation 51, and c a hyperparameter. Furthermore, they define

  sim′(j, i) = ( Σ_{v∈U} Xvi Xvj ) / c(j).   (58)

A fifth and final set of analytically solvable deviation functions is tailored to the higher order userset-based neighborhood factorization model of Equation 21:

  S = S^(1,1) Y.

In this case, the deviation function is given by

  D(θ, R) = Σ_{v∈𝒮} Σ_{u∈U} ( S^(1,1)_uv − sim(v, u) )²,

with 𝒮 ⊆ 2^U the selected usersets considered in the factorization model. Lin et al. [30] proposed to define

  sim′(v, u) = ( Σ_{j∈I} Yuj Yvj ) / c(v).   (59)

Alternatively, Symeonidis et al. [60] propose

  sim(v, u) = ( ( Σ_{j∈I} Yuj Yvj ) / c(v) ) · |v| · |KNN(u)cp ∩ {v}|,

with KNN(u)cp the set of usersets w that correspond to the k highest values ( Σ_{j∈I} Yuj Ywj ) / c(w).

6. MINIMIZATION ALGORITHMS

Efficiently computing the parameters that minimize a deviation function is often non-trivial. Furthermore, there is a big difference between minimizing convex and non-convex deviation functions.

6.1 Non-convex Minimization

The two most popular families of minimization algorithms for non-convex deviation functions of collaborative filtering algorithms are gradient descent and alternating least squares. We discuss both in this section. Furthermore, we also briefly discuss a few other interesting approaches.

6.1.1 Gradient Descent

For deviation functions that assume preferences are missing at random, and consequently consider only the known preferences [52], gradient descent (GD) is generally the numerical optimization algorithm of choice. In GD, the parameters θ are randomly initialized. Then, they are iteratively updated in the direction that reduces D(θ, R):

  θ^(k+1) = θ^k − η ∇D(θ, R),

with η a hyperparameter called the learning rate. The update step is larger if the absolute value of the gradient ∇D(θ, R) is larger.
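For a concrete picture, the following minimal numpy sketch contrasts a full-gradient step with a single-term stochastic step, for a weighted squared error deviation function in the style of Equation 25 on an illustrative two-factor model S = U Vᵀ; the function names and step sizes are ours, not from any of the cited works.

```python
import numpy as np

def full_gradient_step(R, W, U, V, lr):
    """One full GD step on D = sum_{u,i} W_ui * (R_ui - U_u . V_i)^2."""
    E = W * (R - U @ V.T)              # weighted residuals
    U_new = U + lr * 2 * (E @ V)       # descend along -dD/dU = 2 * E @ V
    V_new = V + lr * 2 * (E.T @ U)
    return U_new, V_new

def sgd_step(R, W, U, V, u, i, lr):
    """Stochastic update for the single term W_ui * (R_ui - U_u . V_i)^2."""
    e = W[u, i] * (R[u, i] - U[u] @ V[i])
    # tuple assignment evaluates both right-hand sides from the old values
    U[u], V[i] = U[u] + lr * 2 * e * V[i], V[i] + lr * 2 * e * U[u]
    return U, V
```

The full step touches every (u, i) term per iteration, while the stochastic step touches one sampled term, which is exactly why the SGD variant discussed next converges faster per unit of work.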
A version of GD that converges faster is stochastic gradient descent (SGD). SGD uses the fact that

  ∇D(θ, R) = Σ_{t=1}^{T} ∇D_t(S, R),

with T the number of terms in D(S, R). Now, instead of computing ∇D(θ, R) in every iteration, only one term t is randomly sampled (with replacement) and the parameters θ are updated as

  θ^(k+1) = θ^k − η ∇D_t(S, R).

Typically, a convergence criterion of choice is only reached after every term t has been sampled multiple times on average. However, when the deviation function assumes the missing feedback is missing not at random, the summation over the known preferences, Σ_{u∈U} Σ_{Rui=1}, is replaced by a summation over all user-item pairs, Σ_{u∈U} Σ_{i∈I}, and SGD needs to visit approximately 1000 times more terms. This makes the algorithm less attractive for deviation functions that assume the missing feedback is missing not at random.

To mitigate the large number of terms in the gradient, several authors propose to sample the terms in the gradient not uniformly, but proportionally to their impact on the parameters [75; 44; 76]. These approaches have not only been proven to speed up convergence, but also to improve the quality of the resulting parameters. Weston et al., on the other hand, sample for every known preference i a number of non-preferred items j until they encounter one for which Suj + 1 > Sui, i.e., one that violates the hinge-loss approximation of the ground truth ranking. In this way, they ensure that every update significantly changes the parameters θ [71].

Additionally, due to recent advances in distributed computing infrastructure, parallel and distributed approaches have been proposed to speed up the convergence of SGD-style computations. One of the classic works is the HogWild algorithm by Recht et al. [43]. In that work, the authors assume that the rating matrix is highly sparse and hence, for any two sampled ratings, their SGD updates will likely be non-conflicting (independent), since any two such updates are unlikely to share either the user or the item vectors. As a result, HogWild drops the synchronization requirements and lets each thread or processor update a random rating value. In the worst case of a conflict, there will be contention in writing out the results. The authors prove convergence of the algorithm under a simple assumption of rating matrix sparsity.

Gemulla et al. [15] present a distributed SGD algorithm (DSGD) in which the main assumption is that some blocks of the rating matrix are mutually independent and hence their variables can be updated in parallel. DSGD uniformly grids the rating matrix R into sub-matrices or blocks, such that they are independent with respect to the rows and columns. This method generates several configurations of the original rating matrix R, which are then fed to the SGD algorithm in sequence. In the actual algorithm, there are two nested for-loops. The outer loop selects a configuration C of the independent blocks and sends it to the SGD scheduler. The latter simply spawns as many parallel threads as there are independent blocks in each configuration and applies SGD to each of them in parallel. Once the results are back, the next configuration is loaded to each processor after consolidating the results of the last iteration. This process continues until all the configurations are computed.

Zhuang et al. [78] argue that both these methods suffer from two problems. The first is the issue of locking of threads, in which a thread that is executing a slower block needs to finish before the next round of assignments can begin. Second, since the elements of the rating matrix are accessed at random, it may result in high cache-miss rates and performance degradation. To solve both these problems, the authors propose a shared-memory parallel algorithm called Fast Parallel SGD (FPSGD). There are two main features of the algorithm. To overcome the locking issue, the authors suggest continuous execution of SGD blocks by the scheduler. The only two conditions a block needs to satisfy are (1) it has to be a free block, and (2) its number of past updates has to be the smallest among all the free blocks (with ties broken randomly). The second condition is necessary because otherwise not all blocks would get the same chance of being updated. For dealing with memory discontinuity, the authors suggest updating each block in a fixed order of users and items. The randomness of the algorithm is introduced by the selection of each block for the next update. Since FPSGD always selects the next block in a deterministic fashion, one simple strategy to select the next block in a random fashion is to split the original rating matrix R into many sub-blocks (more than the number of available threads). Since many blocks will have the same update number, randomization is needed to select the next block for update. The authors call this method partial randomization. The results of experiments on a variety of datasets show that FPSGD achieves a lower RMSE in a far smaller number of iterations.

A natural strategy to optimize the update rules in matrix factorization (MF) is to do a block-wise update and let each block be handled by a separate processor/computing unit. This is the strategy proposed by Yin et al. [74]. The authors first show that most of the loss functions used in MF are decomposable, in the sense that they are sums over individual loss terms. Hence, they can easily be parallelized. Given A = WH as the model, the proposal is to split W and H into W^(I) and H^(J), such that the total loss can be written as the sum of the losses over indices I and J. Then the task is to develop a block-wise partition rule in which blocks are updated independently when updating a factor matrix (by fixing the other factor matrix). Each block can be treated as one update unit. There are several ways in which the blocks can be updated. A straightforward way is to update all blocks of H and then update all blocks of W. This approach can be referred to as concurrent block updates, since each update happens concurrently. An alternate strategy is to update some blocks of H and W alternately. This method, called the frequent block-wise update, has the advantage of faster convergence. This happens because updated block information is used in each alternating sequence. The remainder of the paper presents a recipe for implementing these algorithms in MapReduce.
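The interchangeable-block idea behind DSGD-style scheduling can be sketched in a few lines; the helper name and the diagonal rotation rule are our illustrative simplification of the configurations described by Gemulla et al.

```python
def dsgd_configurations(n_blocks):
    """Block configurations for DSGD-style scheduling: in configuration s,
    parallel worker b processes user-block b and item-block (b + s) mod n_blocks,
    so no two blocks run in parallel that share a row-range or column-range."""
    return [[(b, (b + s) % n_blocks) for b in range(n_blocks)]
            for s in range(n_blocks)]
```

Cycling through all n configurations touches every (user-block, item-block) cell of the gridded rating matrix exactly once, while each configuration's blocks can be updated concurrently without conflicts.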

Take for example the deviation function from Equation 25:

D(\theta, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2,

combined with the basic two-factor factorization model from Equation 7:

S = S^{(1,1)} S^{(1,2)}.

Like most deviation functions, this deviation function is non-convex in the parameters θ and therefore has multiple local optima. However, if one temporarily fixes the parameters in S^(1,1), it becomes convex in S^(1,2), and we can analytically find updated values for S^(1,2) that minimize this convex function and are therefore guaranteed to reduce D(θ, R). Subsequently, one can temporarily fix the parameters in S^(1,2) and in the same way compute updated values for S^(1,1) that are also guaranteed to reduce D(θ, R). One can keep alternating between fixing S^(1,1) and S^(1,2) until a convergence criterion of choice is met. Hu et al. [21], Pan et al. [37] and Pan and Scholz [36] give detailed descriptions of different ALS variations. The version by Hu et al. contains optimizations for the case in which missing preferences are uniformly weighted. Pan and Scholz [36] describe optimizations that apply to a wider range of weighting schemes. Finally, Pilászy et al. propose to further speed up the computation by only approximately solving each convex ALS step [42].

Additionally, ALS has the advantages that it does not require the tuning of a learning rate, that it can benefit from linear algebra packages such as Intel MKL, and that it needs relatively few iterations to converge. Furthermore, when the basic two-factor factorization of Equation 7 is used, every row of S^(1,1) and every column of S^(1,2) can be updated independently of all other rows or columns, respectively, which makes it fairly easy to massively parallelize the computation of the factor matrices [77].

6.1.3 Bagging

The maximum margin based deviation function in Equation 33 cannot be solved with ALS because it contains the hinge loss. Rennie and Srebro propose a conjugate gradients method for minimizing this function [46]. However, this method suffers from similar problems as SGD, related to the high number of terms in the loss function. Therefore, Pan and Scholz [36] propose a bagging approach. The essence of their bagging approach is that they do not explicitly weight every user-item pair for which Rui = 0, but sample from all these pairs instead. They create multiple samples and compute multiple different solutions S̃ corresponding to their samples. These computations are also performed with the conjugate gradients method. They are, however, much less intensive, since they only consider a small sample of the many user-item pairs for which Rui = 0. The different solutions S̃ are finally aggregated by simply taking their average.

6.2 Convex Minimization

Aiolli [3] proposed the convex deviation function in Equation 49 and indicates that it can be solved with any algorithm for convex optimization.

The analytically solvable deviation functions are also convex. Moreover, minimizing them is equivalent to computing all the similarities involved in the model. Most works assume a brute-force computation of the similarities. However, Verstrepen [65] recently proposed two methods that are an order of magnitude faster than the brute-force computation.

7. RATING BASED METHODS

Interest in collaborative filtering on binary, positive-only data only recently increased. The majority of existing collaborative filtering research assumes rating data. In this case, the feedback of user u about item i, Rui, is an integer between Bl and Bh, with Bl and Bh the most negative and positive feedback, respectively. The most typical example of such data was provided in the context of the Netflix Prize, with Bl = 1 and Bh = 5.

Technically, our case of binary, positive-only data is just a special case of rating data with Bl = Bh = 1. However, collaborative filtering methods for rating data are in general built on the implicit assumption that Bl < Bh, i.e., that both positive and negative feedback is available. Since this negative feedback is not available in our problem setting, it is not surprising that, in general, methods for rating data generate poor or even nonsensical results [21; 37; 56].

k-NN methods for rating data, for example, often use the Pearson correlation coefficient as a similarity measure. The Pearson correlation coefficient between users u and v is given by

pcc(u, v) = \frac{\sum_{j: R_{uj}, R_{vj} > 0} (R_{uj} - \bar{R}_u)(R_{vj} - \bar{R}_v)}{\sqrt{\sum_{j: R_{uj}, R_{vj} > 0} (R_{uj} - \bar{R}_u)^2} \sqrt{\sum_{j: R_{uj}, R_{vj} > 0} (R_{vj} - \bar{R}_v)^2}},

with R̄u and R̄v the average rating of u and v, respectively. In our setting with binary, positive-only data, however, Ruj and Rvj are by definition always one or zero. Consequently, R̄u and R̄v are always one. Therefore, the Pearson correlation is always zero or undefined (zero divided by zero), making it a useless similarity measure for binary, positive-only data. Even if we would hack it by omitting the mean-centering terms −R̄u and −R̄v, it would still be useless, since it would then always be equal to either one or zero.

Furthermore, when computing the score of user u for item i, user(item)-based k-NN methods for rating data typically find the k users (items) that are most similar to u (i) and that have rated i (have been rated by u) [11; 23]. On binary, positive-only data, this approach results in the nonsensical result that Sui = 1 for every (u, i)-pair.
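The degeneracy of the Pearson correlation coefficient on binary, positive-only data is easy to verify numerically. The snippet below is our own illustration (not code from any cited work): on genuine rating vectors it returns a meaningful similarity, but on binary vectors the mean-centered terms vanish and the result is 0/0:

```python
import math

def pearson(ru, rv):
    """Pearson correlation over co-rated items; None when undefined (0/0)."""
    co = [j for j in range(len(ru)) if ru[j] > 0 and rv[j] > 0]
    if not co:
        return None
    mu = sum(ru[j] for j in co) / len(co)
    mv = sum(rv[j] for j in co) / len(co)
    num = sum((ru[j] - mu) * (rv[j] - mv) for j in co)
    den = (math.sqrt(sum((ru[j] - mu) ** 2 for j in co))
           * math.sqrt(sum((rv[j] - mv) ** 2 for j in co)))
    return None if den == 0 else num / den

print(pearson([5, 3, 0, 1], [4, 2, 0, 2]))  # a meaningful value on ratings
# On binary, positive-only data every co-rated entry equals 1, so both means
# are 1, numerator and denominator are both 0, and the measure is undefined.
print(pearson([1, 1, 0, 1], [1, 0, 1, 1]))  # None
```

Dropping the mean-centering terms would turn the numerator and the denominator into identical sums over the co-rated items, so the measure would then always equal one (or be undefined when there are no co-rated items), matching the observation above.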
The matrix factorization methods for rating data are in general also not applicable to binary, positive-only data. Take for example a basic loss function for matrix factorization on rating data:

\min_{S^{(1)}, S^{(2)}} \sum_{R_{ui} > 0} \left( \left( R_{ui} - S^{(1)}_{u\cdot} S^{(2)}_{\cdot i} \right)^2 + \lambda \left( \|S^{(1)}_{u\cdot}\|_F^2 + \|S^{(2)}_{\cdot i}\|_F^2 \right) \right),

which for binary, positive-only data simplifies to

\min_{S^{(1)}, S^{(2)}} \sum_{R_{ui} = 1} \left( \left( 1 - S^{(1)}_{u\cdot} S^{(2)}_{\cdot i} \right)^2 + \lambda \left( \|S^{(1)}_{u\cdot}\|_F^2 + \|S^{(2)}_{\cdot i}\|_F^2 \right) \right).

The squared error term of this loss function is minimized when the rows and columns of S^(1) and S^(2), respectively, are all the same unit vector. This is obviously a nonsensical solution.

The matrix factorization method for rating data that uses singular value decomposition to factorize R also considers the entries where Rui = 0 and does not suffer from the above problem [49; 8]. Although this method does not produce nonsensical results, its performance has been shown to be inferior to that of methods specialized for binary, positive-only data [34; 56; 57].

In summary, although we cannot exclude the possibility that there exists a method for rating data that does perform well on binary, positive-only data, in general this is clearly not the case.

8. CONCLUSIONS

We have presented a comprehensive survey of collaborative filtering methods for binary, positive-only data. Its backbone is an innovative unified matrix factorization perspective on collaborative filtering methods, including those that are typically not associated with matrix factorization models, such as nearest neighbors methods and association rule mining-based methods. From this perspective, a collaborative filtering algorithm consists of three building blocks: a matrix factorization model, a deviation function, and a numerical minimization algorithm. By comparing methods along these three dimensions, we were able to highlight surprising commonalities and key differences.

An interesting direction for future work is to survey certain aspects that were not included in the scope of this survey. Examples are surveying the different strategies to deal with cold-start problems that are applicable to binary, positive-only data, and comparing the applicability of models and deviation functions for recomputation of models in real time upon receiving novel feedback.

9. REFERENCES

[4] A. Bellogin, J. Wang, and P. Castells. Text retrieval methods for item ranking in collaborative filtering. In P. Clough, C. Foley, C. Gurrin, G. Jones, W. Kraaij, H. Lee, and V. Mudoch, editors, Advances in Information Retrieval, volume 6611, pages 301–306. Springer Berlin Heidelberg, 2011.

[5] C. M. Bishop. Pattern recognition. Machine Learning, 128:1–58, 2006.

[6] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022, 2003.

[7] E. Christakopoulou and G. Karypis. Hoslim: Higher-order sparse linear method for top-n recommender systems. In Advances in Knowledge Discovery and Data Mining, pages 38–49. Springer, 2014.

[8] P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proc. of the fourth ACM Conf. on Recommender Systems, pages 39–46. ACM, 2010.

[9] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391–407, 1990.

[10] M. Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Transactions on Information Systems (TOIS), 22(1):143–177, 2004.

[11] C. Desrosiers and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook, pages 107–144. Springer, 2011.

[12] C. Dhanjal, R. Gaudel, and S. Clémençon. Collaborative filtering with localised ranking. In Proc. of the 29th AAAI Conf. on Artificial Intelligence, 2015.

[13] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.

[14] J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1, 2010.

[15] R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. In Proc. of the 17th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 69–77, 2011.
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[2] F. Aiolli. Efficient top-n recommendation for very large scale binary rated datasets. In Proc. of the 7th ACM Conf. on Recommender Systems, pages 273–280. ACM, 2013.

[3] F. Aiolli. Convex AUC optimization for top-n recommendation with implicit feedback. In Proc. of the 8th ACM Conf. on Recommender Systems, pages 293–296. ACM, 2014.

[16] P. Gopalan, J. Hofman, and D. Blei. Scalable recommendation with hierarchical poisson factorization. In Proc. of the 31st Annual Conf. on Uncertainty in Artificial Intelligence (UAI-15). AUAI Press, 2015.

[17] P. Gopalan, F. J. Ruiz, R. Ranganath, and D. M. Blei. Bayesian nonparametric poisson factorization for recommendation systems. Artificial Intelligence and Statistics (AISTATS), 33:275–283, 2014.

[18] M. Gori, A. Pucci, V. Roma, and I. Siena. Itemrank: A random-walk based scoring algorithm for recommender engines. In Proc. of the 20th Int. Joint Conf. on Artificial Intelligence, volume 7, pages 2766–2771, 2007.

[19] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS), 22(1):89–115, 2004.

[20] T. Hofmann and J. Puzicha. Latent class models for collaborative filtering. In Proc. of the 16th Int. Joint Conf. on Artificial Intelligence, pages 688–693, 1999.

[21] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In Proc. of the Eighth IEEE Int. Conf. on Data Mining, pages 263–272. IEEE, 2008.

[22] Z. Huang, X. Li, and H. Chen. Link prediction approach to collaborative filtering. In Proc. of the 5th ACM/IEEE-CS Joint Conf. on Digital Libraries, pages 141–142. ACM, 2005.

[23] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.

[24] C. Johnson. Logistic matrix factorization for implicit feedback data. In Workshop on Distributed Machine Learning and Matrix Computations at the Twenty-eighth Annual Conf. on Neural Information Processing Systems (NIPS), 2014.

[25] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.

[26] S. Kabbur and G. Karypis. Nlmf: Nonlinear matrix factorization methods for top-n recommender systems. In Workshop Proc. of the IEEE Int. Conf. on Data Mining, pages 167–174. IEEE, 2014.

[27] S. Kabbur, X. Ning, and G. Karypis. Fism: Factored item similarity models for top-n recommender systems. In Proc. of the 19th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 659–667. ACM, 2013.

[28] N. Koenigstein, N. Nice, U. Paquet, and N. Schleyen. The xbox recommender system. In Proc. of the sixth ACM Conf. on Recommender Systems, pages 281–284. ACM, 2012.

[29] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.

[30] W. Lin, S. A. Alvarez, and C. Ruiz. Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery, 6(1):83–105, 2002.

[31] G. V. Menezes, J. M. Almeida, F. Belém, M. A. Gonçalves, A. Lacerda, E. S. De Moura, G. L. Pappa, A. Veloso, and N. Ziviani. Demand-driven tag recommendation. In Machine Learning and Knowledge Discovery in Databases, pages 402–417. Springer, 2010.

[32] A. Mnih and R. Salakhutdinov. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, pages 1257–1264, 2007.

[33] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa. Effective personalization based on association rule discovery from web usage data. In Proc. of the 3rd Int. Workshop on Web Information and Data Management, pages 9–15. ACM, 2001.

[34] X. Ning and G. Karypis. Slim: Sparse linear methods for top-n recommender systems. In Proc. of the 11th IEEE Int. Conf. on Data Mining, pages 497–506. IEEE, 2011.

[35] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. 1999.

[36] R. Pan and M. Scholz. Mind the gaps: Weighting the unknown in large-scale one-class collaborative filtering. In Proc. of the 15th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 667–676. ACM, 2009.

[37] R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. Lukose, M. Scholz, and Q. Yang. One-class collaborative filtering. In Proc. of the Eighth IEEE Int. Conf. on Data Mining, pages 502–511. IEEE, 2008.

[38] W. Pan and L. Chen. Cofiset: Collaborative filtering via learning pairwise preferences over item-sets. In Proc. of the 13th SIAM Int. Conf. on Data Mining, pages 180–188, 2013.

[39] W. Pan and L. Chen. Gbpr: Group preference based bayesian personalized ranking for one-class collaborative filtering. In Proc. of the Twenty-Third Int. Joint Conf. on Artificial Intelligence, pages 2691–2697. AAAI Press, 2013.

[40] U. Paquet and N. Koenigstein. One-class collaborative filtering with random graphs. In Proc. of the 22nd Int. Conf. on WWW, pages 999–1008, 2013.

[41] A. Paterek. Improving regularized singular value decomposition for collaborative filtering. In Proc. of KDD Cup and Workshop, volume 2007, pages 5–8, 2007.

[42] I. Pilászy, D. Zibriczky, and D. Tikk. Fast als-based matrix factorization for explicit and implicit feedback datasets. In Proc. of the fourth ACM Conf. on Recommender Systems, pages 71–78. ACM, 2010.

[43] B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems 24, pages 693–701. 2011.

[44] S. Rendle and C. Freudenthaler. Improving pairwise learning for item recommendation from implicit feedback. In Proc. of the 7th ACM Int. Conf. on Web Search and Data Mining, pages 273–282. ACM, 2014.

[45] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. Bpr: Bayesian personalized ranking from implicit feedback. In Proc. of the Twenty-Fifth Conf. on Uncertainty in Artificial Intelligence, pages 452–461. AUAI Press, 2009.

[46] J. D. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In Proc. of the 22nd Int. Conf. on Machine Learning, pages 713–719. ACM, 2005.

[47] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor. Recommender Systems Handbook. Springer, Boston, MA, 2011.

[48] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Analysis of recommendation algorithms for e-commerce. In Proc. of the 2nd ACM Conf. on Electronic Commerce, pages 158–167. ACM, 2000.

[49] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender system – a case study. Technical report, DTIC Document, 2000.

[50] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proc. of the 10th Int. Conf. on WWW, pages 285–295. ACM, 2001.

[51] Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, A. Hanjalic, and N. Oliver. Tfmap: Optimizing map for top-n context-aware recommendation. In Proc. of the 35th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 155–164. ACM, 2012.

[52] Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, N. Oliver, and A. Hanjalic. Climf: Learning to maximize reciprocal rank with collaborative less-is-more filtering. In Proc. of the sixth ACM Conf. on Recommender Systems, pages 139–146. ACM, 2012.

[53] Y. Shi, M. Larson, and A. Hanjalic. Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Computing Surveys (CSUR), 47(1):3, 2014.

[54] B. Sigurbjörnsson and R. Van Zwol. Flickr tag recommendation based on collective knowledge. In Proc. of the 17th Int. Conf. on WWW, pages 327–336. ACM, 2008.

[55] V. Sindhwani, S. S. Bucak, J. Hu, and A. Mojsilovic. One-class matrix completion with low-density factorizations. In Proc. of the 10th IEEE Int. Conf. on Data Mining, pages 1055–1060. IEEE, 2010.

[56] H. Steck. Training and testing of recommender systems on data missing not at random. In Proc. of the 16th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 713–722. ACM, 2010.

[57] H. Steck. Item popularity and recommendation accuracy. In Proc. of the fifth ACM Conf. on Recommender Systems, pages 125–132. ACM, 2011.

[58] H. Steck. Gaussian ranking by matrix factorization. In Proc. of the 9th ACM Conf. on Recommender Systems, pages 115–122. ACM, 2015.

[59] X. Su and T. M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009:4, 2009.

[60] P. Symeonidis, A. Nanopoulos, A. N. Papadopoulos, and Y. Manolopoulos. Nearest-biclusters collaborative filtering based on constant and coherent values. Inf. Retr., 11(1):51–75, 2008.

[61] G. Takács and D. Tikk. Alternating least squares for personalized ranking. In Proc. of the sixth ACM Conf. on Recommender Systems, pages 83–90. ACM, 2012.

[62] A. Töscher and M. Jahrer. Collaborative filtering ensemble for ranking. Journal of Machine Learning Research W&CP: Proc. of KDD Cup 2011, 18:61–74, 2012.

[63] L. H. Ungar and D. P. Foster. Clustering methods for collaborative filtering. In AAAI Workshop on Recommendation Systems, volume 1, pages 114–129, 1998.

[64] M. van Leeuwen and D. Puspitaningrum. Improving tag recommendation using few associations. In Advances in Intelligent Data Analysis XI, pages 184–194. Springer, 2012.

[65] K. Verstrepen. Collaborative Filtering with Binary, Positive-Only Data. PhD thesis, University of Antwerp, 2015.

[66] K. Verstrepen and B. Goethals. Unifying nearest neighbors collaborative filtering. In Proc. of the 8th ACM Conf. on Recommender Systems, pages 177–184. ACM, 2014.

[67] K. Verstrepen and B. Goethals. Top-n recommendation for shared accounts. In Proc. of the 9th ACM Conf. on Recommender Systems, pages 59–66. ACM, 2015.

[68] J. Wang, A. P. De Vries, and M. J. Reinders. A user-item relevance model for log-based collaborative filtering. In Advances in Information Retrieval, pages 37–48. Springer, 2006.

[69] J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In Proc. of the 22nd Int. Joint Conf. on Artificial Intelligence, volume 11, pages 2764–2770, 2011.

[70] J. Weston, R. J. Weiss, and H. Yee. Nonlinear latent factorization by embedding multiple user interests. In Proc. of the 7th ACM Conf. on Recommender Systems, pages 65–68. ACM, 2013.

[71] J. Weston, H. Yee, and R. J. Weiss. Learning to rank recommendations with the k-order statistic loss. In Proc. of the 7th ACM Conf. on Recommender Systems, pages 245–248. ACM, 2013.

[72] X. Yang, Y. Guo, Y. Liu, and H. Steck. A survey of collaborative filtering based social recommender systems. Computer Communications, 41:1–10, 2014.

[73] Y. Yao, H. Tong, G. Yan, F. Xu, X. Zhang, B. K. Szymanski, and J. Lu. Dual-regularized one-class collaborative filtering. In Proc. of the 23rd ACM Int. Conf. on Information and Knowledge Management, pages 759–768. ACM, 2014.

[74] J. Yin, L. Gao, and Z. M. Zhang. Scalable nonnegative matrix factorization with block-wise updates. In Machine Learning and Knowledge Discovery in Databases: European Conf., ECML PKDD 2014, Proc., Part III, pages 337–352. 2014.

[75] W. Zhang, T. Chen, J. Wang, and Y. Yu. Optimizing top-n collaborative filtering via dynamic negative item sampling. In Proc. of the 36th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 785–788. ACM, 2013.

[76] H. Zhong, W. Pan, C. Xu, Z. Yin, and Z. Ming. Adaptive pairwise preference learning for collaborative recommendation with implicit feedbacks. In Proc. of the 23rd ACM Int. Conf. on Information and Knowledge Management, pages 1999–2002. ACM, 2014.

[77] Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the netflix prize. In Algorithmic Aspects in Information and Management, pages 337–348. Springer, 2008.

[78] Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin. A fast parallel sgd for matrix factorization in shared memory systems. In Proc. of the 7th ACM Conf. on Recommender Systems, pages 249–256, 2013.