
The Thirty-Second International Florida Artificial Intelligence Research Society Conference (FLAIRS-32)

Managing Popularity Bias in Recommender Systems with Personalized Re-Ranking

Himan Abdollahpouri,¹ Robin Burke,¹ Bamshad Mobasher²
¹ That Recommender Systems Lab, Department of Information Science, University of Colorado Boulder
² Web Intelligence Lab, DePaul University
[email protected], [email protected], [email protected]

Abstract

Many recommender systems suffer from popularity bias: popular items are recommended frequently while less popular, niche products are recommended rarely or not at all. However, recommending the ignored products in the long tail is critical for businesses, as those items are less likely to be discovered. In this paper, we introduce a personalized diversification re-ranking approach to increase the representation of less popular items in recommendations while maintaining acceptable recommendation accuracy. Our approach is a post-processing step that can be applied to the output of any recommendation algorithm. We show that our approach is capable of managing popularity bias more effectively, compared with an existing method based on regularization. We also examine both new and existing metrics to measure the coverage of long-tail items in the recommendation.

Introduction

Recommender systems have an important role in e-commerce and information sites, helping users find new items. One obstacle to the effectiveness of recommenders is the problem of popularity bias (Bellogín, Castells, and Cantador 2017): collaborative filtering recommenders typically emphasize popular items (those with more ratings) over other "long-tail" items (Park and Tuzhilin 2008) that may only be popular among small groups of users. Although popular items are often good recommendations, they are also likely to be well-known. So delivering only popular items will not enhance new item discovery and will ignore the interests of users with niche tastes. It may also be unfair to the producers of less popular or newer items, since they are rated by fewer users.

Figure 1 illustrates the long-tail phenomenon in recommender systems. The y axis represents the number of ratings per item and the x axis shows the product rank. The first vertical line separates the top 20% of items by popularity – these items cumulatively have many more ratings than the 80% tail items to the right. These "short head" items are the very popular items, such as blockbuster movies in a movie recommender system, that garner much more viewer attention. Similar distributions can be found in other consumer domains.

[Figure 1: The long-tail of item popularity.]

The second vertical line divides the tail of the distribution into two parts. We call the first part the long tail: these items are amenable to collaborative recommendation, even though many algorithms fail to include them in recommendation lists. The second part, the distant tail, consists of items that receive so few ratings that meaningful cross-user comparison of their ratings becomes unreliable. For these cold-start items, content-based and hybrid recommendation techniques must be employed. Our work in this paper is concerned with collaborative recommendation and therefore focuses on the long-tail segment.

We present a general and flexible approach for controlling the balance of item exposure in different portions of the item catalog as a post-processing phase for standard recommendation algorithms. Our work is inspired by (Santos, Macdonald, and Ounis 2010), where the authors introduced a novel probabilistic framework called xQuAD for Web search result diversification, which aims to generate search results that explicitly account for the various aspects associated with an under-specified query. We adapt the xQuAD approach to the popularity bias problem. Our approach enables the system designer to tune the system to achieve the desired trade-off between accuracy and better coverage of long-tail, less popular items.

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Related Work

Recommending serendipitous items from the long tail is generally considered to be a key function of recommendation (Anderson 2006), as these are items that users are less likely to know about.

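To make the short-head/long-tail split concrete, here is a small illustrative sketch (our own code, not from the paper; `split_catalog` and its arguments are hypothetical names). It partitions a catalog so that the most-rated items together account for a given share of all ratings, as in the 80%/20% division the paper uses in its experiments:

```python
from collections import Counter

def split_catalog(ratings, head_share=0.8):
    """Partition items into short-head and long-tail sets.

    `ratings` is an iterable of (user, item) interaction pairs.
    Items are taken in descending order of popularity; the
    most-rated items that together account for `head_share` of
    all ratings form the short head, and everything else falls
    into the long tail.
    """
    counts = Counter(item for _, item in ratings)
    total = sum(counts.values())
    short_head, cumulative = set(), 0
    for item, c in counts.most_common():
        if cumulative >= head_share * total:
            break
        short_head.add(item)
        cumulative += c
    long_tail = set(counts) - short_head
    return short_head, long_tail
```

The resulting popularity threshold (the rating count of the least popular short-head item) is data-dependent, which is consistent with the paper reporting different cut-offs for its two datasets.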
Authors in (Brynjolfsson, Hu, and Smith 2006) showed that 30-40% of book sales are represented by titles that would not normally be found in brick-and-mortar stores.

Long-tail items are also important for generating a fuller understanding of users' preferences. Systems that use active learning to explore each user's profile will typically need to present more long-tail items, because these are the ones that the user is less likely to know about and where users' preferences are more likely to be diverse (Resnick et al. 2013).

Finally, long-tail recommendation can also be understood as a social good. A market that suffers from popularity bias will lack opportunities to discover more obscure products and will be, by definition, dominated by a few large brands or well-known artists (Celma and Cano 2008). Such a market will be more homogeneous and offer fewer opportunities for innovation and creativity.

The idea of the long tail of item popularity and its impact on recommendation quality has been explored by some researchers (Brynjolfsson, Hu, and Smith 2006; Park and Tuzhilin 2008). In those works, the authors tried to improve the performance of the recommender system in terms of accuracy and precision, given the long tail in the ratings. Our work, instead, focuses on reducing popularity bias and balancing the representation of items across the popularity distribution.

A regularization-based approach to improving long-tail recommendations is found in (Abdollahpouri, Burke, and Mobasher 2017a). One limitation of that work is that it is restricted to factorization models, where the long-tail preference can be encoded in terms of the latent factors. In contrast, a re-ranking approach can be applied to the output of any algorithm. Another limitation of that work is that it does not account for user tolerance towards long-tail items: the fact that there may be some users only interested in popular items. In our model, we take personalization of long-tail promotion into account as well.

Finally, there is substantial research in recommendation diversity, where the goal is to avoid recommending too many similar items (Zhou et al. 2010; Castells, Vargas, and Wang 2011; Zhang and Hurley 2008). Personalized diversity is another related area of research, where the amount of diversification depends on the user's tolerance for diversity (Eskandanian, Mobasher, and Burke 2017; Wasilewski and Hurley 2018). Another work similar to ours is (Vargas, Castells, and Vallet 2012), where the authors used a modified version of xQuAD, called relevance-based xQuAD, for intent-oriented diversification of search results and recommendations. Another work used a similar approach for fairness-aware recommendation (Liu and Burke 2018), where xQuAD was used to achieve a fair representation of items from different item providers. Our work differs from all these previous diversification approaches in that it does not depend on the characteristics of items, but rather on their relative popularity.

Controlling Popularity Bias

xQuAD

Result diversification has been studied in the context of information retrieval, especially for web search engines, which have a similar goal: to find a ranking of documents that together provide complete coverage of the aspects underlying a query (Santos et al. 2015). EXplicit Query Aspect Diversification (xQuAD) (Santos, Macdonald, and Ounis 2010) explicitly accounts for the various aspects associated with an under-specified query. Items are selected iteratively by estimating how well a given document satisfies an uncovered aspect.

In adapting this approach, we seek to recognize the differences among users in their interest in long-tail items. Uniformly increasing the diversity of items with different popularity levels in the recommendation lists may work poorly for some users. We propose a variant that adds a personalized bonus to the items that belong to the under-represented group (i.e., the long-tail items). The personalization factor is determined based on each user's historical interest in long-tail items.

Methodology

We build on the xQuAD model to control popularity bias in recommendation algorithms. We assume that for a given user u, a ranked recommendation list R has already been generated by a base recommendation algorithm. The task of the modified xQuAD method is to produce a new re-ranked list S (|S| < |R|) that manages popularity bias while still being accurate.

The new list is built iteratively according to the following criterion:

    P(v|u) + λ P(v, S'|u)    (1)

where P(v|u) is the likelihood of user u ∈ U being interested in item v ∈ V, independent of the items on the list so far, as predicted by the base recommender. The second term P(v, S'|u) denotes the likelihood of user u being interested in an item v that is not in the currently generated list S.

Intuitively, the first term incorporates ranking accuracy while the second term promotes diversity between the two categories of items (i.e., short head and long tail). The parameter λ controls how strongly popularity bias control is weighted overall. The item that scores most highly under Equation 1 is added to the output list S, and the process is repeated until S has reached the desired length.

To achieve a more diverse recommendation list containing items from both the short head and the long tail, the marginal likelihood P(v, S'|u) over the two item categories, long tail (Γ) and short head (Γ'), is computed by:

    P(v, S'|u) = Σ_{d ∈ {Γ, Γ'}} P(v, S'|d) P(d|u)    (2)

Following the approach of (Santos, Macdonald, and Ounis 2010), we assume that the remaining items are independent of the current contents of S and that the items are independent of each other given the short-head and long-tail categories.

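The iterative criterion in Equation 1 amounts to a greedy loop: at each step, every remaining candidate is scored by its base relevance plus the λ-weighted diversity term, the argmax is moved into S, and the process repeats. The following is a minimal sketch under our own illustrative names, not code from the paper: `base_score` stands in for P(v|u) and `diversity_score` for P(v, S'|u).

```python
def rerank(candidates, base_score, diversity_score, lam, k):
    """Greedy re-ranking following the criterion of Equation 1.

    candidates            -- ranked list R from the base recommender
    base_score(v)         -- stands in for P(v|u), the base relevance
    diversity_score(v, S) -- stands in for P(v, S'|u), the marginal
                             diversity gain of v given the list S so far
    lam                   -- the trade-off parameter lambda
    k                     -- desired length of the re-ranked list S
    """
    S, remaining = [], list(candidates)
    while remaining and len(S) < k:
        best = max(remaining,
                   key=lambda v: base_score(v) + lam * diversity_score(v, S))
        S.append(best)
        remaining.remove(best)
    return S
```

With `lam = 0` the loop reproduces the base ranking; larger values trade ranking accuracy for the diversity objective.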
Under these assumptions, we can compute P(v, S'|d) in Eq. 2 as

    P(v, S'|d) = P(v|d) P(S'|d) = P(v|d) ∏_{i ∈ S} (1 − P(i|d, S))    (3)

By substituting Equation 3 into Equation 2, we obtain

    P(v, S'|u) = Σ_{d ∈ {Γ, Γ'}} P(d|u) P(v|d) ∏_{i ∈ S} (1 − P(i|d, S))    (4)

where P(v|d) is equal to 1 if v ∈ d and 0 otherwise.

We measure P(i|d, S) in two different ways to produce two different algorithms. The first way is to use the same function as P(v|d): an indicator function that equals 1 when item i in list S already covers category d and 0 otherwise. We call this method Binary xQuAD; it is how the original xQuAD was introduced. Another method, which we present in this paper, is to use the ratio of items in list S that cover category d. We call this method Smooth xQuAD.

The likelihood P(d|u) is the measure of user preference over the different item categories. In other words, it measures how much each user is interested in short-head items versus long-tail items. We calculate this likelihood as the ratio of items in the user profile that belong to category d.

In order to select the next item to add to S, we compute a re-ranking score for each item in R\S according to Eq. 4. For an item v' ∈ d, if S does not cover d, then an additional positive term will be added to the estimated user preference P(v'|u). The chance that it will be selected is therefore larger, balancing accuracy and popularity bias.

In Binary xQuAD, the product term ∏_{i ∈ S} (1 − P(i|d, S)) is only equal to 1 if the current items in S have not yet covered the category d. Binary xQuAD is therefore optimizing for a minimal re-ranking of the original list, including the best long-tail item it can but not seeking diversity beyond that.

Experiment

In this section, we test our proposed algorithm on two public datasets. The first is the well-known MovieLens 1M dataset, which contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users (Harper and Konstan 2015). The second is the Epinions dataset, gathered from a consumer opinion site where users can review items (Massa and Avesani 2007). This dataset has a total of 664,824 ratings given by 40,163 users to 139,736 items. In MovieLens, each user has a minimum of 20 ratings, but in Epinions, there are many users with only a single rated item.

Following the data reduction procedure in (Abdollahpouri, Burke, and Mobasher 2017a), we removed users who had fewer than 20 ratings from the Epinions dataset, as users with longer profiles are much more likely to have long-tail items in their profiles. The MovieLens dataset already consists only of users with more than 20 ratings. The retained users were those likely to have rated enough long-tail items so that our objective could be evaluated in a train / test scenario. We also removed distant long-tail items from each dataset using a limit of 20 ratings; the number 20 was chosen to be consistent with the cut-off for users.

After filtering, the MovieLens dataset has 6,040 users who rated 3,043 movies with a total of 995,492 ratings, a reduction of about 0.4%. Applying the same criteria to the Epinions dataset decreases the data to 220,117 ratings given by 8,144 users to 5,195 items, a reduction of around 66%. We split the items in both datasets into two categories, long tail (Γ) and short head (Γ'), such that short-head items correspond to 80% of the ratings while long-tail items have the remaining 20%. We plan to consider other divisions of the popularity distribution in future work. For MovieLens, the short-head items were those with more than 506 ratings. In Epinions, a short-head item needed only to have more than 73 ratings.

Evaluation

The experiments compare four algorithms. Since we are concerned with ranking performance, we chose as our baseline algorithm RankALS, a pair-wise learning-to-rank algorithm. We also include the regularized long-tail diversification algorithm of (Abdollahpouri, Burke, and Mobasher 2017a), indicated as LT-Reg in the figures. We used the output from RankALS as input for the two re-ranking variants described above, Binary xQuAD and Smooth xQuAD, marked Binary and Smooth in the figures. We compute lists of length 100 from RankALS and pass these to the re-ranking algorithms to compute the final list of 10 recommendations for each user. We used the implementation of RankALS in LibRec 2.0¹ for all experiments.

In order to evaluate the effectiveness of the algorithms in mitigating popularity bias, we use four different metrics:

Average Recommendation Popularity (ARP): This measure from (Yin et al. 2012) calculates the average popularity of the recommended items in each list. For each recommendation list, we measure the average number of ratings of its items. More formally:

    ARP = (1 / |U_t|) Σ_{u ∈ U_t} ( Σ_{i ∈ L_u} φ(i) ) / |L_u|    (5)

where φ(i) is the number of times item i has been rated in the training set, L_u is the recommended list of items for user u, and |U_t| is the number of users in the test set.

Average Percentage of Long Tail Items (APLT): As used in (Abdollahpouri, Burke, and Mobasher 2017a), this metric measures the average percentage of long-tail items in the recommended lists. It is defined as follows:

    APLT = (1 / |U_t|) Σ_{u ∈ U_t} |{i : i ∈ (L_u ∩ Γ)}| / |L_u|    (6)

This measure gives us the average percentage of items in users' recommendation lists that belong to the long-tail set.

Average Coverage of Long Tail items (ACLT): We introduce another metric to evaluate how much exposure long-tail items get in the entire recommendation.

¹ www.librec.net

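The popularity-bias metrics of Equations 5-7 are straightforward to compute from the recommendation lists and training-set rating counts. A small illustrative sketch, with our own helper names: `rec_lists` maps each test user u to their list L_u, `phi` maps each item to its training rating count, and `long_tail` is the set Γ.

```python
def popularity_bias_metrics(rec_lists, phi, long_tail):
    """Compute ARP, APLT, and ACLT (Equations 5-7).

    rec_lists -- dict mapping each test user u to their list L_u
    phi       -- dict mapping item i to its training rating count
    long_tail -- the set of long-tail items (Gamma)
    """
    n_users = len(rec_lists)
    arp = aplt = aclt = 0.0
    for items in rec_lists.values():
        # ARP: per-list mean popularity, averaged over users
        arp += sum(phi[i] for i in items) / len(items)
        tail_hits = [i for i in items if i in long_tail]
        # APLT: per-list fraction of long-tail items, averaged over users
        aplt += len(tail_hits) / len(items)
        # ACLT: per-list count of long-tail items, averaged over users
        aclt += len(tail_hits)
    return arp / n_users, aplt / n_users, aclt / n_users
```

Note that APLT and ACLT respond differently to an algorithm that shows every user the same few long-tail items, which is exactly the failure mode the text discusses next.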
One problem with APLT is that it could be high even if all users get the same set of long-tail items. ACLT measures what fraction of the long-tail items the recommender has covered:

    ACLT = (1 / |U_t|) Σ_{u ∈ U_t} Σ_{i ∈ L_u} 1(i ∈ Γ)    (7)

where 1(i ∈ Γ) is an indicator function that equals 1 when i is in Γ. This function is related to the Aggregate Diversity metric of (Adomavicius and Kwon 2012), but it looks only at the long-tail part of the item catalog.

In addition to the aforementioned long-tail diversity metrics, we also evaluate the accuracy of the ranking algorithms in order to examine the diversity-accuracy trade-offs. For this purpose, we use the standard Normalized Discounted Cumulative Gain (NDCG) measure of ranking accuracy.

Results

Figure 2 shows the results for the Epinions dataset across the different algorithms using a range of values for λ. (Note that the LT-Reg algorithm uses the parameter λ to control the weight placed on the long-tail regularization term.) All results are averages from five-fold cross-validation using an 80%-20% split for training and testing, respectively. As expected, the diversity scores improve for all algorithms, with some loss of ranking accuracy. Differences between the algorithms are evident, however. The exposure metric (ACLT) plot shows that the two re-ranking algorithms, and especially the Smooth version, do a much better job of exposing items across the long-tail inventory than the regularization method. The ranking accuracy shows that, as expected, the Binary version does slightly better, as it performs minimal adjustment to the ranked lists. LT-Reg is not as effective at promoting long-tail items, either by the list-wise APLT measure or by the catalog-wise ACLT.

Another view of the same results is provided in Figure 3. Here we look at the long-tail diversity metrics relative to NDCG loss, which clarifies the patterns seen in Figure 2. We see that the Binary and Smooth algorithms are fairly similar in terms of the diversity-accuracy trade-off, while LT-Reg has a distinctly lower and flatter improvement curve with increased loss of ranking accuracy. The ARP metric is the only one on which the algorithms are fairly similar, especially at lower values of NDCG loss.

[Figure 2: Results for the Epinions dataset.]
[Figure 3: Comparison of popularity bias control for different algorithms at different levels of NDCG loss (Epinions).]

The MovieLens dataset shows different relative performance across the algorithms, as seen in Figure 4. The Smooth re-ranking method shows a more distinct benefit, and LT-Reg is somewhat more effective. This finding is confirmed in the relative results shown in Figure 5, which also shows the algorithms having quite similar values for the ARP metric, in spite of the differences on the other metrics.

[Figure 4: Results for the MovieLens dataset.]
[Figure 5: Comparison of popularity bias control for different algorithms at different levels of NDCG loss (MovieLens).]

Comparing the two datasets, we see that long-tail diversification is more of a challenge in the sparser Epinions dataset. With 10% NDCG loss, it is possible to bring exposure to around 15% of the long-tail catalog in Epinions, whereas for MovieLens, 0.2% loss yields an equivalent or greater benefit. LT-Reg is much less effective. (In both datasets, the baseline value is very close to zero.) The average number of long-tail items in each recommendation list shows a similar pattern.

In the sparser dataset, the Binary and Smooth methods are similar in performance, but differences appear in MovieLens, where the Smooth algorithm shows stronger improvement, particularly in the ACLT measure. This effect is most likely due to the fact that in the sparser data, it is more difficult to find a single long-tail item to promote into a recommendation list, and there is a greater accuracy cost in doing so. In MovieLens, higher-quality long-tail items appear more often, and the Smooth objective values the promotion of multiple such items into the recommendation lists.

Another conclusion we can draw is that the ARP measure is not a good measure of long-tail diversity when used only on its own. It has the benefit of not requiring the experimenter to set a threshold distinguishing long-tail and short-head items. However, as we see here, algorithms can have very similar ARP performance yet be quite different in their ability to cover the long-tail catalog and to promote long-tail items to users. It is therefore important to look at all these metrics together.

Conclusion and future work

Adequate coverage of long-tail items is an important factor in the practical success of business-to-consumer organizations and information providers. Since short-head items are likely to be well known to many users, the ability to recommend items outside of this band of popularity will determine whether a recommender system can introduce users to new products and experiences. Yet it is well known that recommendation algorithms are biased towards popular items.

In this paper, we presented re-ranking approaches for long-tail promotion and compared them to a state-of-the-art model-based approach. On two datasets, we were able to show that the re-ranking methods boost long-tail items while keeping the accuracy loss small, compared to the model-based technique. We also showed that the average recommendation popularity (ARP) measure from (Yin et al. 2012) is not a good metric on its own for evaluating long-tail promotion, as algorithms might have similar ARP performance but quite different performance on other measures of popularity bias. It is better to use it along with other metrics, such as APLT and ACLT, to get an accurate picture of the effectiveness of the algorithms.

One interesting area for future work would be using this model for multistakeholder recommendation, where the system needs to make recommendations in the presence of different stakeholders providing the products (Abdollahpouri, Burke, and Mobasher 2017b; Burke and Abdollahpouri 2016; Burke et al. 2016). In those cases, another parameter could be used to control the priority of each stakeholder in the system.

References

Abdollahpouri, H.; Burke, R.; and Mobasher, B. 2017a. Controlling popularity bias in learning-to-rank recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems, 42–46. ACM.

Abdollahpouri, H.; Burke, R.; and Mobasher, B. 2017b. Recommender systems as multi-stakeholder environments. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization (UMAP 2017). ACM.

Adomavicius, G., and Kwon, Y. 2012. Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering 24(5):896–911.

Anderson, C. 2006. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion.

Bellogín, A.; Castells, P.; and Cantador, I. 2017. Statistical biases in information retrieval metrics for recommender systems. Information Retrieval Journal 20(6):606–634.

Brynjolfsson, E.; Hu, Y. J.; and Smith, M. D. 2006. From niches to riches: Anatomy of the long tail. Sloan Management Review 47(4):67–71.

Burke, R., and Abdollahpouri, H. 2016. Educational recommendation with multiple stakeholders. In Third International Workshop on Educational Recommender Systems.

Burke, R.; Abdollahpouri, H.; Mobasher, B.; and Gupta, T. 2016. Towards multi-stakeholder utility evaluation of recommender systems. In Workshop on Surprise, Opposition, and Obstruction in Adaptive and Personalized Systems (UMAP 2016).

Castells, P.; Vargas, S.; and Wang, J. 2011. Novelty and diversity metrics for recommender systems: Choice, discovery and relevance. In Proceedings of the International Workshop on Diversity in Document Retrieval (DDR), 29–37. ACM Press.

Celma, Ò., and Cano, P. 2008. From hits to niches?: Or how popular artists can bias music recommendation and discovery. In Proceedings of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition, 5. ACM.

Eskandanian, F.; Mobasher, B.; and Burke, R. 2017. A clustering approach for personalizing diversity in collaborative recommender systems. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, 280–284. ACM.

Harper, F. M., and Konstan, J. A. 2015. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5(4):19.

Liu, W., and Burke, R. 2018. Personalizing fairness-aware re-ranking. arXiv preprint arXiv:1809.02921.

Massa, P., and Avesani, P. 2007. Trust-aware recommender systems. In Proceedings of the 2007 ACM Conference on Recommender Systems, 17–24. ACM.

Park, Y.-J., and Tuzhilin, A. 2008. The long tail of recommender systems and how to leverage it. In Proceedings of the 2008 ACM Conference on Recommender Systems, 11–18. ACM.

Resnick, P.; Garrett, R. K.; Kriplean, T.; Munson, S. A.; and Stroud, N. J. 2013. Bursting your (filter) bubble: Strategies for promoting diverse exposure. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work Companion, 95–100. ACM.

Santos, R. L.; Macdonald, C.; and Ounis, I. 2010. Exploiting query reformulations for web search result diversification. In Proceedings of the 19th International Conference on World Wide Web, 881–890. ACM.

Santos, R. L.; Macdonald, C.; Ounis, I.; et al. 2015. Search result diversification. Foundations and Trends in Information Retrieval 9(1):1–90.

Vargas, S.; Castells, P.; and Vallet, D. 2012. Explicit relevance models in intent-oriented information retrieval diversification. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, 75–84. ACM.

Wasilewski, J., and Hurley, N. 2018. Intent-aware item-based collaborative filtering for personalised diversification. In Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, 81–89. ACM.

Yin, H.; Cui, B.; Li, J.; Yao, J.; and Chen, C. 2012. Challenging the long tail recommendation. Proceedings of the VLDB Endowment 5(9):896–907.

Zhang, M., and Hurley, N. 2008. Avoiding monotony: Improving the diversity of recommendation lists. In Proceedings of the 2008 ACM Conference on Recommender Systems, 123–130. ACM.

Zhou, T.; Kuscsik, Z.; Liu, J.-G.; Medo, M.; Wakeling, J. R.; and Zhang, Y.-C. 2010. Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences 107(10):4511–4515.