Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems

Daniel Valcarce, Javier Parapar, Álvaro Barreiro
Information Retrieval Lab, Department of Computer Science, University of A Coruña, Spain
{daniel.valcarce, javierparapar, barreiro}@udc.es

ABSTRACT

The use of Relevance-Based Language Models for top-N recommendation has become a promising line of research. Previous works have used collection-based smoothing methods for this task. However, a recent analysis of RM1 (an estimation of Relevance-Based Language Models) in document retrieval showed that this type of smoothing method demotes the IDF effect in pseudo-relevance feedback. In this paper, we claim that the IDF effect from retrieval is closely related to the concept of novelty in recommendation. We perform an axiomatic analysis of the IDF effect on RM2, concluding that this kind of smoothing method also demotes the IDF effect in recommendation. By the same axiomatic analysis, we find that a collection-agnostic method, Additive smoothing, does not demote this property. Our experiments confirm that this alternative improves the accuracy, novelty and diversity figures of the recommendations.

CCS Concepts

•Human-centered computing → Collaborative filtering; •Information systems → Recommender systems; Language models;

Keywords

Recommender systems, Relevance-Based Language Models, collaborative filtering

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CERI '16, June 14-16, 2016, Granada, Spain
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4141-7/16/06...$15.00
DOI: http://dx.doi.org/10.1145/2934732.2934737

1. INTRODUCTION

Recommender systems are becoming increasingly popular these days. The enormous growth of the data available to users has changed the way we access information. Users are becoming more and more demanding: they are eager to receive personalised contents instead of explicitly stating their information needs. Therefore, the applicability of recommender systems is undeniable. This technology is designed for providing relevant items of information by learning from the users' past behaviour. Continuous developments in this field have been made to meet these high expectations.

We can distinguish multiple approaches to recommendation [24]. They are often classified in three main categories: content-based, collaborative filtering and hybrid techniques. Content-based approaches generate recommendations based on the item and user descriptions: they suggest items similar to those liked by the target user [9]. In contrast, collaborative filtering methods rely on the interactions (typically ratings) between users and items in the system [21]. Finally, there exist hybrid algorithms that combine both collaborative filtering and content-based approaches.

Traditionally, Information Retrieval (IR) has focused on delivering the information that users demand [1]. On the other hand, Information Filtering (IF) explores ways of selecting relevant pieces of information from a stream of data [12]. Recommender systems are active information filters: these methods learn from the users' behaviour, providing personalised suggestions. Given the similarities between IR and IF, some authors have considered them to be sibling fields or two sides of the same coin [2]. The main difference between these fields is the way in which the information need is obtained: Information Retrieval systems usually receive a query prompted by the user, whereas Information Filtering systems infer the users' needs.

Due to the closeness of IR and IF, exploiting Information Retrieval methods for recommender systems has become a fertile area of research [31, 4, 23, 27, 30]. In particular, in this paper we want to further investigate the use of Relevance-Based Language Models for recommendation. Lavrenko and Croft [17] devised the Relevance-Based Language Modelling framework for the pseudo-relevance feedback task, proposing two methods: RM1 and RM2. Later, Parapar et al. adapted this technique to the collaborative filtering scenario, achieving high figures of precision [23]. For pseudo-relevance feedback, RM1 is the preferred method; however, RM2 yields better results than RM1 in top-N recommendation.

To be effective, language models employ smoothed probability estimates. The selection of the smoothing method is crucial for the performance of language models both in Information Retrieval [33] and in recommendation [29]. A recent study presented an axiomatic analysis of smoothing methods for different pseudo-relevance feedback techniques (including RM1) [13]. The authors concluded that the traditional collection-based methods that have been used for smoothing language models in the ad hoc retrieval task [33] are not suitable for performing pseudo-relevance feedback because they demote the IDF (inverse document frequency) effect. Instead, they proposed the use of RM1 with Additive smoothing, a collection-agnostic smoothing method. This choice is supported by their axiomatic analysis and outperformed traditional alternatives in their experiments.

The IDF is a measure of term specificity [26, 25]. We claim that this concept is related to novelty in recommender systems [7]. In this paper, we study the connection between these two concepts and its implications in recommendation. Furthermore, we perform an axiomatic analysis of the IDF effect in RM2 in the context of recommendation. We study three collection-based smoothing methods (Jelinek-Mercer, Dirichlet Priors and Absolute Discounting) and a collection-agnostic method (Additive smoothing). Our goal is to determine if the application of Additive smoothing to Relevance-Based Language Models is valuable not only for the pseudo-relevance feedback task, but also for top-N recommendation. Our axiomatic analysis proves that the aforementioned collection-based smoothing methods demote the IDF effect on RM2 in recommendation as they do on RM1 for pseudo-relevance feedback. Moreover, we find that Additive smoothing neither promotes nor demotes the IDF effect on RM2. Thus, if the IDF heuristic is valuable for recommendation, Additive smoothing should work better than the other methods.

To verify this last assumption (the utility of the IDF effect in recommendation), we test experimentally the quality of the recommendations generated by RM2 with different smoothing methods. We use accuracy, diversity and novelty metrics to cover different aspects of top-N recommendation. Our experiments show that Additive smoothing provides better figures of accuracy, diversity and novelty and, at the same time, it is more stable with respect to the smoothing parameter than the collection-based smoothing methods.

In summary, the contributions of this paper are (1) an investigation of the relationship between the IDF effect in retrieval and novelty in recommendation, (2) an axiomatic analysis of the IDF effect on RM2 using the three most popular collection-based smoothing methods and Additive smoothing, and (3) an empirical comparison of the four studied smoothing methods in terms of accuracy, novelty and diversity of the recommendations, concluding that Additive smoothing is the method that provides the best figures for these metrics.

2. BACKGROUND

In this section, first, we describe the top-N recommendation task and the notation. Second, we contextualise the Relevance-Based Language Modelling approach in its original task (i.e., pseudo-relevance feedback) and, then, we describe its adaptation to top-N recommendation. Finally, we present different smoothing strategies for language models in the context of recommendation.

2.1 Top-N Recommendation

Top-N recommendation refers to the task of finding the top-N most relevant items for a user [8]. This approach contrasts with the rating prediction task, where a recommender system is supposed to predict the values of the ratings that the users will give to the items. Several authors have argued that the top-N recommendation task is more realistic than rating prediction [14, 8, 11, 24]. Therefore, this work is devoted to the top-N recommendation problem.

Recommender systems work with a set of users U and a set of items I. Collaborative filtering approaches employ the interactions between users and items to generate recommendations. This work is devoted to explicit feedback collaborative filtering techniques, which are based on ratings. A rating from a user u \in U to an item i \in I is denoted by r(u, i). Additionally, I_u refers to the set of items rated by the user u. Finally, the recommendation list for the user u of length k is represented by L_u^k.

2.2 Relevance-Based Language Models

Relevance-Based Language Models (often abbreviated as Relevance Models or simply RM) are a state-of-the-art technique for the pseudo-relevance feedback task in text retrieval [17]. Nonetheless, this framework has also been adapted to top-N recommendation with great success [23].

Pseudo-relevance feedback (PRF) is a way of expanding queries with new terms in a text retrieval system. Since a standard retrieval system returns a list of documents according to the query prompted by the user, the performance of this process depends fundamentally on the retrieval algorithm and the quality of the user's query [1]. For this reason, expanding this query with relevant terms is a way of improving the outcome of a retrieval system. Pseudo-relevance feedback is an automatic and effective query expansion approach. In PRF, we assume that the top documents retrieved with the original query are relevant (they form the pseudo-relevant set). PRF methods expand the initial query with the most relevant terms from the pseudo-relevant set. Then, the expanded query is used for performing a second retrieval, which provides the results to be presented to the user.

The PRF approach has been adapted to collaborative filtering recommendation [23] in the following way. Instead of a query, we have a user whose profile (i.e., the set of items that the user has rated) has to be expanded with new relevant items. For doing so, we use a set of neighbours or similar users. The intuitive idea is that the candidates for recommendation are those items that are relevant in the neighbourhood of the user. In this way, users play a dual role: they act as queries when they are the target user of the recommendation process, but they also act as documents of the pseudo-relevant set when they are considered as neighbours. On the other hand, items only play the role of terms in retrieval.

There exist two estimations of Relevance Models, RM1 and RM2, which differ in the assumptions they make [17]. In this work, we employed the latter estimation because it outperformed the former one in the collaborative filtering scenario [23]. In RM2, the probability of an item i under the Relevance Model of the user u is given by:

  p(i|R_u) \propto p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i|v)\, p(v)}{p(i)}\, p(j|v)    (1)

Recommendations are presented to the user ordered according to decreasing relevance, that is, decreasing values of p(i|R_u). We consider the priors p(i) and p(v) to be uniform for the sake of simplicity.

In addition, V_u is the set of neighbours of user u. These neighbourhoods are computed using a clustering technique. Following a common practice in the literature, we decided to use the k-NN algorithm [8, 21]. This method consists in finding the k most similar users to the target user using a pairwise metric. In this work, we used cosine similarity for this purpose:

  cosine(u, v) = \frac{\sum_{i \in I_u \cap I_v} r(u,i)\, r(v,i)}{\sqrt{\sum_{i \in I_u} r(u,i)^2}\, \sqrt{\sum_{i \in I_v} r(v,i)^2}}    (2)

Finally, it only remains to describe the estimation of p(i|v), which is based on the maximum likelihood estimate (MLE) over a multinomial distribution of positive ratings:

  p_{ml}(i|u) = \frac{r(u,i)}{\sum_{j \in I_u} r(u,j)}    (3)

The problem of the maximum likelihood estimate stems from its high sparsity: if a user did not rate an item, the estimate yields a value of zero. For this reason, language models are smoothed. In text retrieval, the most common practice is to smooth the MLE with the collection model [33]. Likewise, in recommendation, collection-based smoothing methods have been applied to Relevance Models [29]. Next, we present the most common collection-based smoothing methods for addressing the sparsity problem of the MLE.

2.3 Collection-Based Smoothing Methods

Using a collection-based smoothing method, the probability of an item given a user, p(i|u), is calculated by smoothing the maximum likelihood estimate p_{ml}(i|u) with the background model of the collection, p(i|C). This collection model is given by:

  p(i|C) = \frac{\sum_{v \in U} r(v,i)}{\sum_{j \in I,\, v \in U} r(v,j)}    (4)

For text retrieval, the prominent smoothing methods are Jelinek-Mercer and Dirichlet Priors [33]. In contrast, it has been shown that Absolute Discounting works better than the previous methods when Relevance Models are applied to collaborative filtering [29].

2.3.1 Jelinek-Mercer (JM)

This method performs a linear interpolation between the maximum likelihood estimate and the collection model [16], which is regulated by the parameter λ \in [0, 1]:

  p_λ(i|u) = (1 − λ)\, p_{ml}(i|u) + λ\, p(i|C)    (5)

2.3.2 Dirichlet Priors (DP)

DP is derived from a Bayesian analysis using Dirichlet priors [18]. It has a parameter µ > 0 to control the amount of smoothing applied:

  p_µ(i|u) = \frac{r(u,i) + µ\, p(i|C)}{µ + \sum_{j \in I_u} r(u,j)}    (6)

2.3.3 Absolute Discounting (AD)

AD subtracts a value of δ > 0 from the count of the rated items [20]. This discount is compensated with the background collection:

  p_δ(i|u) = \frac{\max[r(u,i) − δ, 0] + δ\, |I_u|\, p(i|C)}{\sum_{j \in I_u} r(u,j)}    (7)

Collection-based smoothing methods have in common that they substitute part of the probability mass provided by the MLE (Eq. 3) with probability mass obtained from the collection model (Eq. 4). This reallocation of probability is performed to avoid zeroes in non-rated items [33]. However, this is done in a way that popular items in the collection receive more probability than those less common. Intuitively, this type of smoothing may reduce novelty. In fact, previous studies found that RM2, compared to state-of-the-art recommenders, does not have very good results of novelty and diversity [27]. Therefore, in the next section, we explore alternatives to collection-based smoothing in order to improve novelty figures.

3. IDF EFFECT AND NOVELTY ON RM

Novelty and diversity have been two important aspects in text retrieval, to the point that several TREC tracks and tasks have been devoted to them¹. However, regarding the recommendation task, there was little interest in novelty or diversity until the beginning of the 2000s [7]. Accuracy, particularly measured with error metrics, was the primary objective of any recommender. For example, the Netflix Prize goal was to improve the accuracy of Cinematch (the Netflix recommendation system) by 10% [5]. Nowadays, research efforts have moved from the rating prediction task to the top-N recommendation task [14, 8]. Additionally, there exists a consensus on the critical importance of measuring different properties of recommender systems such as diversity and novelty [14, 15, 7].

On the other hand, in the Information Retrieval community, recent criticism has been raised against the use of collection-based smoothing methods in the context of pseudo-relevance feedback. Hazimeh and Zhai analysed the effect that this type of smoothing has on three PRF techniques [13]. In particular, they found that applying collection-based smoothing methods to RM1 conflicts with the IDF effect, a desired property of a retrieval system.

We claim that there exists a connection between the concept of novelty from recommendation and the IDF effect from Information Retrieval. In this section, we describe this relationship. We start by defining novelty and diversity in recommendation and, then, we present the IDF effect and its similarities with novelty.

¹http://trec.nist.gov/tracks.html

3.1 Novelty and Diversity

Novelty has been studied in Information Retrieval as the proportion of relevant documents in the result set that are unknown to the user [1]. Since this definition is not very pragmatic when using top-N recommenders in a collaborative filtering scenario, novelty is usually measured as how unusual the recommended items are [15]. Diversity, on the other hand, measures whether a recommender system suggests different items or, on the contrary, recommends mostly the same items [7]. Both properties, novelty and diversity, are closely connected and to some degree complementary [7].

The rationale behind the importance of novelty is that an accurate recommendation can be useless if the suggested items are already known by the user. Recommendations should also try to suggest items that users would not have discovered by themselves. However, this property, called serendipity, is difficult to measure. Thus, it is usually approximated by novelty and relevance [7]. Somehow, novelty is a similar concept to serendipity but weaker: novel recommendations provide the users with information about uncommon items, although these items could have been discovered eventually. Recommendations are considered diverse when they suggest a great variety of items instead of recommending a few popular ones.

Recommender systems that strongly focus on accuracy may give poor results on diversity and novelty metrics and vice versa. Intuitively, we can see that if we recommend to the users the most popular items for their similar neighbours, the suggestions will be accurate but diversity and novelty will suffer. On the contrary, recommending unusual items can improve novelty and diversity at the risk of making some mistaken suggestions. This is perhaps the most prominent trade-off in the field of recommender systems [34].

3.2 IDF Effect

In Information Retrieval, the inverse document frequency (IDF) is a measure of term specificity [26, 25]. It is defined as the inverse of the number of documents in the collection that contain the target term. For example, stopwords are terms that appear in almost every document and they do not provide much information. In contrast, those terms that only appear in a few documents are highly informative and help in discriminating which documents are relevant. Thus, the IDF effect gives more importance to those query terms that are more specific (i.e., with higher IDF).

IDF was not born from a formal analysis; however, it was considered a useful and robust heuristic [26]. Later, Robertson provided a theoretical justification for this term weighting function [25]. Almost all text retrieval algorithms introduce the IDF effect to weight query terms [26]. This property can be included in the retrieval model either explicitly (e.g., the vector space model or BM25 [1]) or implicitly (e.g., the probabilistic model [26] or language models [33]).

We claim that when adapting the Relevance Modelling framework to collaborative filtering [23], term specificity is related to item novelty. The IDF effect promotes specific terms over popular (and to some extent meaningless) ones. Since items play the role of terms when using Relevance Models for recommendation, promoting uncommon terms will be beneficial for improving novelty figures.

Previous work has explored different estimations of Relevance Models that promote divergent terms with great success [6, 22]. Another work in this line of research was the study of Hazimeh and Zhai [13]. They performed an axiomatic analysis of the IDF effect in several pseudo-relevance feedback methods. They found that collection-based methods (those from Sec. 2.3) penalise the IDF effect on RM1. To overcome this problem, they proposed to use Additive smoothing, which does not rely on a background collection model. Their analysis showed that this type of smoothing neither promotes nor demotes the IDF effect. However, it is not clear whether this conclusion is applicable to RM2. For this reason, we present an axiomatic analysis of the IDF effect on RM2 for recommender systems.

4. AXIOMATIC ANALYSIS OF RM2

In this section, we perform an axiomatic analysis of the IDF effect on RM2. Our goal is to examine if RM2, in the context of top-N recommendation, penalises the IDF effect as RM1 with collection-based smoothing methods does in the pseudo-relevance feedback task [13].

In recommendation, given two items with the same ratings in the neighbourhood, the IDF effect promotes the item that is less popular in the collection (in terms of probability, see Eq. 4). This property is desirable in order to enhance the novelty of the recommendations while keeping high accuracy. This effect does not conflict with accuracy because uncommon items are preferred over common items only when they have the same ratings.

Formally, we can define the IDF effect for recommendation as follows:

Definition (IDF effect). Let u be a user from the set of users U and V_u be her/his neighbourhood. Given two items i_1 and i_2 with the same ratings r(v, i_1) = r(v, i_2) ∀ v \in V_u and different popularity p(i_1|C) < p(i_2|C), a recommender system that outputs p(i_1|R_u) > p(i_2|R_u) is said to support the IDF effect.

Now we proceed to analyse RM2 axiomatically. If we assume that i_1 and i_2 are two items as in the previous definition, studying the sign of ∆ = p(i_1|R_u) − p(i_2|R_u) allows us to check whether RM2 supports the IDF effect or not. If ∆ > 0, the recommender system supports this property. On the contrary, if ∆ < 0, the algorithm violates the definition of the IDF effect. Finally, ∆ = 0 means that the system neither promotes nor demotes the IDF effect. Given the formula of RM2, ∆ is computed as follows:

  ∆ = p(i_1|R_u) − p(i_2|R_u)
    = p(i_1) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i_1|v)\, p(v)}{p(i_1)}\, p(j|v) − p(i_2) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i_2|v)\, p(v)}{p(i_2)}\, p(j|v)    (8)

If we suppose that item priors are uniform, p(i) = |I|^{−1}, we obtain:

  ∆ = p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)}\, p(i_1|v) − p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)}\, p(i_2|v)    (9)

We can observe that the sign of ∆ depends on the sign of p(i_1|v) − p(i_2|v), which may vary among smoothing methods. Therefore, we need to analyse each smoothing technique one by one. Next, we examine the three collection-based methods described in Sec. 2.3. Moreover, we present and study Additive smoothing, a collection-agnostic method, as a possible alternative to the traditional ones.

4.1 Analysis of Jelinek-Mercer

Applying Jelinek-Mercer smoothing from Eq. 5 to Eq. 9:

  ∆ = p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)} \left[(1 − λ)\, p_{ml}(i_1|v) + λ\, p(i_1|C)\right] − p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)} \left[(1 − λ)\, p_{ml}(i_2|v) + λ\, p(i_2|C)\right] < 0    (10)

we obtain that the difference is negative because λ \in [0, 1], all the probabilities are positive and p(i_1|C) < p(i_2|C) by definition. Note that p_{ml}(i_1|v) = p_{ml}(i_2|v) because both items have the same ratings. Thus, Jelinek-Mercer demotes the IDF effect for RM2 as it does for RM1 in pseudo-relevance feedback [13].

4.2 Analysis of Dirichlet Priors

Plugging the DP smoothing method from Eq. 6 into Eq. 9:

  ∆ = p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)} \cdot \frac{r(v,i_1) + µ\, p(i_1|C)}{µ + \sum_{k \in I_v} r(v,k)} − p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)} \cdot \frac{r(v,i_2) + µ\, p(i_2|C)}{µ + \sum_{k \in I_v} r(v,k)} < 0    (11)

we obtain that the difference is also negative because µ > 0, all the ratings and probabilities are positive and, by definition, p(i_1|C) < p(i_2|C). We can conclude that Dirichlet Priors violates the IDF effect for RM2. This also happens for RM1 in pseudo-relevance feedback [13].

4.3 Analysis of Absolute Discounting

Absolute Discounting was not studied in the context of pseudo-relevance feedback; however, since it is the preferred method in recommendation [29], we analyse if it supports the IDF effect:

  ∆ = p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)} \cdot \frac{r_δ(v,i_1) + δ\, |I_v|\, p(i_1|C)}{\sum_{k \in I_v} r(v,k)} − p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)} \cdot \frac{r_δ(v,i_2) + δ\, |I_v|\, p(i_2|C)}{\sum_{k \in I_v} r(v,k)} < 0    (12)

where r_δ(v, i) = \max[r(v, i) − δ, 0]. We can observe that the difference ∆ is negative taking into account that δ > 0, |I_v| > 0, all the ratings are positive and, by definition, p(i_1|C) < p(i_2|C).

In summary, the three collection-based smoothing methods demote the IDF effect on RM2 for recommendation. For this reason, next we also explore Additive smoothing as a collection-agnostic smoothing method.

4.4 Analysis of Additive Smoothing

Additive smoothing (also known as Laplace smoothing) is a collection-agnostic method. It increases all the ratings by a parameter γ > 0. If the user u has not rated the item i, that item will receive a rating value of γ. The probability estimate with this technique is computed as follows:

  p_γ(i|u) = \frac{r(u,i) + γ}{\sum_{j \in I_u} r(u,j) + γ\, |I|}    (13)

Since this method is collection-agnostic, it does not rely on the probability of an item in the collection, p(i|C), which is a measure of item popularity (the opposite of novelty). Applying the same axiomatic analysis as before:

  ∆ = p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)} \cdot \frac{r(v,i_1) + γ}{\sum_{j \in I_v} r(v,j) + γ\, |I|} − p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(j|v)\, p(v)}{p(i)} \cdot \frac{r(v,i_2) + γ}{\sum_{j \in I_v} r(v,j) + γ\, |I|} = 0    (14)

we find that this method neither supports nor violates the IDF effect, since r(v, i_1) = r(v, i_2) for every neighbour v. This result coincides with the analysis of RM1 for pseudo-relevance feedback [13].

5. EXPERIMENTS

Next, we present the empirical evaluation of Relevance Models with Additive smoothing applied in a collaborative filtering scenario. We used the MovieLens 100k and MovieLens 1M collections². These datasets were extracted from a film recommendation platform. Their statistics are presented in Table 1.

Table 1: Datasets

Dataset        | Users | Items | Ratings
MovieLens 100k | 943   | 1,682 | 100,000
MovieLens 1M   | 6,040 | 3,706 | 1,000,209

We used the training/test splits provided by the MovieLens 100k collection. Since the other dataset does not offer a default partition, we split it in the following manner: 80% of the ratings of each user are for training and the rest for test.

²http://grouplens.org/datasets/movielens

5.1 Evaluation Methodology

Traditionally, recommender systems intended to predict the ratings of unknown items. Thus, error measures such as Root Mean Squared Error (RMSE) or Mean Absolute Error (MAE) were commonly used to assess the performance of the recommenders [11]. Acknowledging that evaluating recommender systems using error metrics does not lead to better recommenders, several studies proposed the use of IR metrics for evaluating the rankings of recommendations (i.e., top-N recommendation) as well as diversity and novelty metrics [14, 8, 11, 7].

In this work, all the metrics are evaluated at a given cut-off rank, that is, taking into account only the top k recommendations computed by the recommender. The rationale for this decision is that users seldom consider more recommendations than the first ones. Thus, we are interested in studying the quality of the top suggestions generated by the system.

5.1.1 Accuracy

In contrast with error measures, top-N recommendation metrics analyse the quality of the recommendation lists. To apply these metrics, we followed the TestItems approach described by Bellogín et al. [3]. For each user, we computed a ranking composed of all the items having a test rating by some user and no training rating by the target user. The set of relevant items for the user u consists of all the items rated by the user u in the test set with a rating greater than or equal to 3. As has been acknowledged, considering non-rated items as irrelevant may underestimate the true metric value; however, it provides a better estimation of the recommender quality [19].

We used normalised discounted cumulative gain (nDCG) for measuring the quality of the recommendation list. In addition, this metric takes into account graded relevance (i.e., a higher rating is preferred) and position (i.e., relevant suggestions in the top positions are better than in bottom places). We used the standard formulation as described in [32]. Adapted to collaborative filtering, this metric is computed as follows:

  nDCG@k = \frac{1}{|U|} \sum_{u \in U} \frac{DCG_u@k}{IDCG_u@k}    (15)

where the Discounted Cumulative Gain is defined as:

  DCG_u@k = \sum_{i=1}^{k} \frac{rel(u, L_u^k[i])}{\log_2(i + 1)}    (16)

and IDCG_u@k (the ideal DCG at k for user u) refers to the maximum possible DCG up to position k for the user u. It is computed by calculating the DCG_u@k of the perfectly ranked list of items that can be recommended to user u. Last, rel(u, L_u^k[i]) is the graded relevance, for the user u, of the item located in the i-th position of the ranking list L_u^k. We consider the graded relevance to be equal to the rating if it is greater than or equal to 3; otherwise, the relevance is zero.

5.1.2 Diversity

We measured diversity with the Gini index. This coefficient is commonly used for quantifying wealth distribution inequalities, but it has also been utilised for measuring recommendation diversity [10, 11, 7]. Note that we use the complement of this metric for convenience. In this way, when the index is 0, it indicates that a single item is recommended for every user, which corresponds to the minimum diversity scenario. On the contrary, a value of 1 means that all the items are equally recommended across the users. The Gini index is computed as follows:

  Gini@k = 1 − \frac{1}{|I| − 1} \sum_{j=1}^{|I|} (2j − |I| − 1)\, p(i_j|rec@k)    (17)

where i_1, ..., i_{|I|} is the list of items sorted by increasing p(i_j|rec@k). This term refers to the probability that item i_j is recommended in some recommendation list of length k and is given by:

  p(i|rec@k) = \frac{|\{u \in U : i \in L_u^k\}|}{\sum_{u \in U} |L_u^k|}    (18)

5.1.3 Novelty

We measured novelty using an Information Theoretic metric. Zhou et al. proposed to use the mean self-information to quantify the ability of a recommender system to generate unexpected recommendations [34]. This metric is also called surprisal because it measures the improbability of an outcome. We use the concept of popularity in this metric, that is, the proportion of users that interacted with the item:

  MSI@k = \frac{1}{|U|} \sum_{u \in U} \sum_{i \in L_u^k} \log \frac{1}{pop(i)}    (19)

where the popularity of an item is computed as:

  pop(i) = \frac{|\{u \in U : i \in I_u\}|}{|U|}    (20)

Please note that this definition of popularity is equivalent to the document frequency in Information Retrieval when replacing users by documents and items by terms. Therefore, with this metric we can measure the IDF effect which, as we argued before, is directly related to novelty.

5.2 Results and Discussion

We tested the four different smoothing methods on the MovieLens 100k and MovieLens 1M datasets. We tuned the smoothing parameters γ \in {0.001, 0.01, 0.1, 1, 10}, δ \in {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}, λ \in {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and µ \in {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000} of the Additive, AD, JM and DP methods, respectively. We used the k-NN algorithm with cosine similarity for computing neighbourhoods. We set k = 50 for MovieLens 100k and k = 100 for MovieLens 1M. The results in terms of nDCG@10, Gini@10 and MSI@10 are presented in Figs. 1 and 2 for MovieLens 100k and 1M, respectively.

[Figures 1 and 2 (omitted): nDCG@10, Gini@10 and MSI@10 on MovieLens 100k and MovieLens 1M as a function of the smoothing parameters of the Additive (γ), Absolute Discounting (δ), Jelinek-Mercer (λ) and Dirichlet Priors (µ) methods.]

Smoothing is a crucial aspect of RM2: the choice of the smoothing method as well as its correct parameter optimisation notably affects the final quality of the recommendations. Although the absolute values of performance vary, the trends in both datasets are very similar. This supports the generalisation of these results to other collections.

Overall, Additive smoothing provides the best recommendations in terms of precision, diversity and novelty, followed by Absolute Discounting. Previous work reported that AD outperformed DP and JM in terms of accuracy [29]. However, this is the first study that also presents diversity and novelty figures for these methods.

In recommendation, there is always a trade-off between accuracy and diversity or novelty [34]. It is possible to improve the diversity or novelty of the recommendations at the expense of a reduction in accuracy (e.g., recommending very unpopular items to different users). Therefore, simultaneous improvements in accuracy and in novelty or diversity are highly valuable. Additive smoothing obtains notable improvements on these three aspects. This confirms the importance of the IDF effect in recommendation.

To further explore this trade-off, we plot the geometric mean of nDCG@10, Gini@10 and MSI@10. We used the geometric mean because the arithmetic mean does not address different scales properly. Figure 3 shows that Additive smoothing has the best trade-off among accuracy, diversity and novelty. In general, we can set γ between 0.001 and 0.01 to obtain good results.

An important property of Additive smoothing is its stability to changes in its parameter. We used a logarithmic scale to visualise very large variations of the parameter γ. This method only showed a small decrease in accuracy, novelty and diversity when we used enormous values of γ. AD also showed quite stable results, as stated in previous work [29], but the method deteriorates with a high amount of smoothing. In contrast, the performance of Jelinek-Mercer and Dirichlet Priors is far lower than the rest. Additionally, the computational complexity of the three collection-based methods is the same. Thus, there is no reason to consider using these smoothing methods with RM2 for recommendation.

Additive smoothing also presents another advantage over AD (and also over JM and DP): it does not depend on collection statistics. This fact not only preserves the IDF effect but also reduces the computational resources required to maintain global statistics of the collection.

6. CONCLUSIONS AND FUTURE WORK

Recommender systems have greatly improved accuracy results in the last years. Recently, research has focused on
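To make the adaptation concrete, the RM2 ranking of Eq. 1 above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes uniform priors p(i) and p(v) (whose constant factors cancel in the ranking) and additive-smoothed conditionals p(·|v) so that logarithms can be used against numerical underflow; all function and variable names are ours.

```python
from math import log

def rm2_scores(user_profile, neighbour_profiles, candidates, n_items, gamma=0.1):
    """Rank candidate items for a user with RM2 (Eq. 1).

    With uniform priors p(i) and p(v), the score is rank-equivalent to
    prod_{j in I_u} sum_{v in V_u} p(i|v) p(j|v), computed here in log space.
    """
    def p(i, profile):
        # Additive smoothing (Eq. 13): keeps every term strictly positive.
        return (profile.get(i, 0) + gamma) / (sum(profile.values()) + gamma * n_items)

    return {
        i: sum(log(sum(p(i, pv) * p(j, pv) for pv in neighbour_profiles))
               for j in user_profile)
        for i in candidates
    }
```

Items co-liked by the neighbourhood accumulate larger per-rated-item sums and therefore larger scores, which is exactly the intuition of expanding the user's profile with items relevant among the neighbours.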
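The sign analysis can also be checked numerically. The sketch below is our own minimal implementation of the estimates in Eqs. 4, 5 and 13 (not the authors' code): for two items rated identically by a neighbour but with different collection popularity, Jelinek-Mercer hands the popular item the larger smoothed probability (demoting the IDF effect), while the additive estimate treats both identically.

```python
def p_collection(i, profiles):
    # Collection model (Eq. 4): rating mass of i over the total rating mass.
    total = sum(sum(p.values()) for p in profiles.values())
    return sum(p.get(i, 0) for p in profiles.values()) / total

def p_jm(i, profile, profiles, lam=0.5):
    # Jelinek-Mercer (Eq. 5): interpolation of the MLE with the collection model.
    ml = profile.get(i, 0) / sum(profile.values())
    return (1 - lam) * ml + lam * p_collection(i, profiles)

def p_additive(i, profile, n_items, gamma=0.1):
    # Additive smoothing (Eq. 13): collection-agnostic.
    return (profile.get(i, 0) + gamma) / (sum(profile.values()) + gamma * n_items)

# Neighbour v rates i1 and i2 equally, but i2 is also rated elsewhere,
# so p(i1|C) < p(i2|C).
profiles = {'v': {'i1': 4, 'i2': 4}, 'w': {'i2': 5, 'i3': 5}}
```

Under JM, p(i1|v) < p(i2|v), so the less popular item is ranked lower despite identical neighbourhood evidence; under additive smoothing the two probabilities coincide, matching the ∆ < 0 and ∆ = 0 results of the analysis.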
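The per-user nDCG computation used in the evaluation reduces to a few lines. The sketch below uses our own naming and, for simplicity, takes the ideal ranking over the same relevance list rather than over all recommendable items, a simplification relative to the paper's IDCG definition.

```python
from math import log2

def dcg(rels, k):
    # DCG (Eq. 16): graded relevance discounted by log2(i + 1), 1-indexed rank i.
    return sum(rel / log2(pos + 1) for pos, rel in enumerate(rels[:k], start=1))

def ndcg(rels, k):
    # nDCG (Eq. 15) for a single user: DCG of the ranking over the ideal DCG.
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal else 0.0
```

A list already sorted by decreasing relevance scores exactly 1.0, and any relevant item pushed down the ranking lowers the score, which is the position sensitivity described in the text.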
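The complement-of-Gini diversity measure used in this evaluation can be sketched as follows; this is a minimal illustration with our own naming, not the authors' code.

```python
def gini_at_k(rec_lists, catalog):
    # Complement of the Gini index (Eq. 17) over the recommendation
    # probabilities p(i_j|rec@k) of Eq. 18: 1 means perfectly even item
    # exposure, 0 means a single item is recommended to every user.
    total = sum(len(l) for l in rec_lists.values())
    counts = {i: 0 for i in catalog}
    for l in rec_lists.values():
        for i in l:
            counts[i] += 1
    probs = sorted(c / total for c in counts.values())  # increasing order
    n = len(catalog)
    return 1 - sum((2 * j - n - 1) * p
                   for j, p in enumerate(probs, start=1)) / (n - 1)
```

The two boundary cases behave as described in the text: recommending every catalogue item equally often yields 1, and recommending one single item to everyone yields 0.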

Figure 1: Values of nDCG@10 (left), Gini@10 (centre) and MSI@10 (right) for RM2 using Additive, Absolute Discounting, Jelinek-Mercer and Dirichlet Priors methods on the MovieLens 100k collection, varying the smoothing parameter. Neighbourhoods are computed taking the 50 closest users according to cosine similarity.
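For concreteness, the evaluation measures above can be sketched in Python. This is a minimal illustration, not our evaluation code: the helper names are ours, we take base-2 logarithms for the self-information (Eq. 19 leaves the base implicit), and the ideal DCG is approximated here by re-sorting the relevances of the evaluated list.

```python
import math

def dcg_at_k(rels, k):
    """DCG@k (Eq. 16): graded relevances discounted by log2 of the position."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """nDCG@k for one user: DCG over the ideal DCG. Here the ideal ranking is
    approximated by sorting the relevances of the evaluated list."""
    idcg = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0

def gini_at_k(rec_lists, items):
    """Complemented Gini index (Eqs. 17-18): 1 when all items are recommended
    equally often, 0 when a single item is recommended to every user."""
    total = sum(len(l) for l in rec_lists.values())
    p = sorted(sum(1 for l in rec_lists.values() if i in l) / total
               for i in items)  # p(i_j|rec@k), sorted increasingly
    n = len(items)
    return 1 - sum((2 * (j + 1) - n - 1) * pj
                   for j, pj in enumerate(p)) / (n - 1)

def msi_at_k(rec_lists, pop):
    """Mean self-information (Eqs. 19-20): average surprisal of the
    recommended items; pop[i] is the fraction of users that rated item i."""
    return sum(math.log2(1 / pop[i])
               for l in rec_lists.values() for i in l) / len(rec_lists)

def g_measure(*scores):
    """Geometric mean of several metrics: unlike the arithmetic mean, it is
    not dominated by the metric with the largest scale (here, MSI)."""
    return math.prod(scores) ** (1 / len(scores))
```

The g_measure helper corresponds to the geometric mean plotted in Figure 3.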

Figure 2: Values of nDCG@10 (left), Gini@10 (centre) and MSI@10 (right) for RM2 using Additive, Absolute Discounting, Jelinek-Mercer and Dirichlet Priors methods on the MovieLens 1M collection, varying the smoothing parameter. Neighbourhoods are computed taking the 100 closest users according to cosine similarity.
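The four estimates of p(i|u) compared in these figures follow the standard language-modelling formulations [33], with ratings playing the role of term counts. The sketch below is ours and only illustrative: r_ui is the rating of item i by user u, profile_len the sum of the user's ratings, n_items the vocabulary size, n_distinct the number of items the user rated, and p_coll the collection-model probability of the item.

```python
def additive(r_ui, profile_len, n_items, gamma):
    # Additive (Laplace) smoothing: collection-agnostic, adds gamma
    # pseudo-counts to every item.
    return (r_ui + gamma) / (profile_len + gamma * n_items)

def jelinek_mercer(r_ui, profile_len, p_coll, lam):
    # Linear interpolation between the user model and the collection model.
    return (1 - lam) * r_ui / profile_len + lam * p_coll

def dirichlet_priors(r_ui, profile_len, p_coll, mu):
    # Bayesian smoothing with a Dirichlet prior scaled by mu.
    return (r_ui + mu * p_coll) / (profile_len + mu)

def absolute_discounting(r_ui, profile_len, n_distinct, p_coll, delta):
    # Subtracts delta from every seen rating and redistributes the discounted
    # mass according to the collection model.
    return (max(r_ui - delta, 0.0) + delta * n_distinct * p_coll) / profile_len
```

Each estimate defines a proper distribution over the item vocabulary (for Absolute Discounting, as long as δ does not exceed the smallest rating in the profile); only the Additive estimate needs no collection model p_coll.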

Figure 3: G-measure of nDCG@10, Gini@10 and MSI@10 for RM2 using Additive, Absolute Discounting, Jelinek-Mercer and Dirichlet Priors methods on the MovieLens 100k (left) and 1M (right) collections, varying the smoothing parameter. Neighbourhoods are computed taking the 50 and 100 closest users according to cosine similarity, respectively for each dataset.

enhancing many other aspects such as novelty and diversity. In the present paper, we study the connection between the IDF effect from Information Retrieval and the concept of novelty in recommendation. We analysed axiomatically how different smoothing methods affect the IDF effect on RM2. We found that collection-based methods penalise this effect, while Additive smoothing neither promotes nor demotes this property. Our experiments confirmed that Additive smoothing provides better results than collection-based smoothing methods, improving accuracy, diversity and novelty figures.

As future work, it would be interesting to develop new smoothing methods that actively promote the IDF effect in Relevance Models. Additionally, in this work we considered only uniform priors; however, it has been shown that different priors can improve the quality of recommendations [28]. We think that studying these priors axiomatically may be worthwhile.

7. ACKNOWLEDGMENTS
This work was supported by the grant TIN2015-64282-R of Ministerio de Economía y Competitividad of the Government of Spain and FEDER funds. The first author also wants to acknowledge the support of Ministerio de Educación, Cultura y Deporte of the Government of Spain under the grant FPU014/01724.

8. REFERENCES
[1] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology Behind Search. Addison Wesley, 2011.
[2] N. J. Belkin and W. B. Croft. Information Filtering and Information Retrieval: Two Sides of the Same Coin? Commun. ACM, 35(12):29–38, 1992.
[3] A. Bellogín, P. Castells, and I. Cantador. Precision-Oriented Evaluation of Recommender Systems. In RecSys '11, pages 333–336, 2011.
[4] A. Bellogín, J. Wang, and P. Castells. Bridging Memory-Based Collaborative Filtering and Text Retrieval. Inf. Retr., 16(6):697–724, 2013.
[5] J. Bennett and S. Lanning. The Netflix Prize. In Proceedings of KDD Cup and Workshop 2007, pages 3–6, 2007.
[6] C. Carpineto, R. de Mori, G. Romano, and B. Bigi. An Information-Theoretic Approach to Automatic Query Expansion. ACM Trans. Inf. Syst., 19(1):1–27, 2001.
[7] P. Castells, N. J. Hurley, and S. Vargas. Novelty and Diversity in Recommender Systems. In F. Ricci, L. Rokach, and B. Shapira, editors, Recommender Systems Handbook, pages 881–918. Springer, 2nd edition, 2015.
[8] P. Cremonesi, Y. Koren, and R. Turrin. Performance of Recommender Algorithms on Top-N Recommendation Tasks. In RecSys '10, pages 39–46, 2010.
[9] M. de Gemmis, P. Lops, C. Musto, F. Narducci, and G. Semeraro. Semantics-Aware Content-Based Recommender Systems. In F. Ricci, L. Rokach, and B. Shapira, editors, Recommender Systems Handbook, pages 119–159. Springer, 2nd edition, 2015.
[10] D. Fleder and K. Hosanagar. Blockbuster Culture's Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity. Manage. Sci., 55(5):697–712, 2009.
[11] A. Gunawardana and G. Shani. Evaluating Recommender Systems. In F. Ricci, L. Rokach, and B. Shapira, editors, Recommender Systems Handbook, pages 265–308. Springer, 2nd edition, 2015.
[12] U. Hanani, B. Shapira, and P. Shoval. Information Filtering: Overview of Issues, Research and Systems. User Model. User-Adapt. Interact., 11(3):203–259, 2001.
[13] H. Hazimeh and C. Zhai. Axiomatic Analysis of Smoothing Methods in Language Models for Pseudo-Relevance Feedback. In ICTIR '15, pages 141–150, 2015.
[14] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating Collaborative Filtering Recommender Systems. ACM Trans. Inf. Syst., 22(1):5–53, 2004.
[15] N. Hurley and M. Zhang. Novelty and Diversity in Top-N Recommendation – Analysis and Evaluation. ACM Trans. Internet Technol., 10(4):1–30, 2011.
[16] F. Jelinek and R. L. Mercer. Interpolated Estimation of Markov Source Parameters from Sparse Data. In Proceedings of the Workshop on Pattern Recognition in Practice, pages 381–397, 1980.
[17] V. Lavrenko and W. B. Croft. Relevance-Based Language Models. In SIGIR '01, pages 120–127, 2001.
[18] D. J. C. MacKay and L. C. B. Peto. A Hierarchical Dirichlet Language Model. Nat. Lang. Eng., 1(3):289–308, 1995.
[19] M. R. McLaughlin and J. L. Herlocker. A Collaborative Filtering Algorithm and Evaluation Metric that Accurately Model the User Experience. In SIGIR '04, pages 329–336, 2004.
[20] H. Ney, U. Essen, and R. Kneser. On Structuring Probabilistic Dependences in Stochastic Language Modelling. Comput. Speech Lang., 8(1):1–38, 1994.
[21] X. Ning, C. Desrosiers, and G. Karypis. A Comprehensive Survey of Neighborhood-Based Recommendation Methods. In F. Ricci, L. Rokach, and B. Shapira, editors, Recommender Systems Handbook, pages 37–76. Springer, 2nd edition, 2015.
[22] J. Parapar and Á. Barreiro. Promoting Divergent Terms in the Estimation of Relevance Models. In ICTIR '11, pages 77–88. Springer, 2011.
[23] J. Parapar, A. Bellogín, P. Castells, and Á. Barreiro. Relevance-Based Language Modelling for Recommender Systems. Inf. Process. Manage., 49(4):966–980, 2013.
[24] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor. Recommender Systems Handbook. Springer, 2nd edition, 2015.
[25] S. Robertson. Understanding Inverse Document Frequency: On Theoretical Arguments for IDF. J. Doc., 60(5):503–520, 2004.
[26] K. Spärck Jones. A Statistical Interpretation of Term Specificity and its Application in Retrieval. J. Doc., 28(1):11–21, 1972.
[27] D. Valcarce. Exploring Statistical Language Models for Recommender Systems. In RecSys '15, pages 375–378, 2015.
[28] D. Valcarce, J. Parapar, and Á. Barreiro. A Study of Priors for Relevance-Based Language Modelling of Recommender Systems. In RecSys '15, pages 237–240, 2015.
[29] D. Valcarce, J. Parapar, and Á. Barreiro. A Study of Smoothing Methods for Relevance-Based Language Modelling of Recommender Systems. In ECIR '15, pages 346–351. Springer, 2015.
[30] D. Valcarce, J. Parapar, and Á. Barreiro. Language Models for Collaborative Filtering Neighbourhoods. In ECIR '16, pages 614–625. Springer, 2016.
[31] J. Wang, A. P. de Vries, and M. J. T. Reinders. A User-Item Relevance Model for Log-based Collaborative Filtering. In ECIR '06, volume 3936, pages 37–48. Springer, 2006.
[32] Y. Wang, L. Wang, Y. Li, D. He, W. Chen, and T.-Y. Liu. A Theoretical Analysis of NDCG Ranking Measures. In COLT '13, pages 1–30. JMLR.org, 2013.
[33] C. Zhai and J. Lafferty. A Study of Smoothing Methods for Language Models Applied to Information Retrieval. ACM Trans. Inf. Syst., 22(2):179–214, 2004.
[34] T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, and Y.-C. Zhang. Solving the Apparent Diversity-Accuracy Dilemma of Recommender Systems. PNAS, 107(10):4511–4515, 2010.