Correcting for Selection Bias in Learning-To-Rank Systems
arXiv:2001.11358v2 [cs.IR] 12 May 2020

Zohreh Ovaisi ([email protected]), University of Illinois at Chicago
Ragib Ahsan ([email protected]), University of Illinois at Chicago
Yifan Zhang ([email protected]), Sun Yat-sen University
Kathryn Vasilaky ([email protected]), California Polytechnic State University
Elena Zheleva ([email protected]), University of Illinois at Chicago

ABSTRACT
Click data collected by modern recommendation systems are an important source of observational data that can be utilized to train learning-to-rank (LTR) systems. However, these data suffer from a number of biases that can result in poor performance for LTR systems. Recent methods for bias correction in such systems mostly focus on position bias, the fact that higher ranked results (e.g., top search engine results) are more likely to be clicked even if they are not the most relevant results given a user's query. Less attention has been paid to correcting for selection bias, which occurs because clicked documents are reflective of what documents have been shown to the user in the first place. Here, we propose new counterfactual approaches which adapt Heckman's two-stage method and account for selection and position bias in LTR systems. Our empirical evaluation shows that our proposed methods are much more robust to noise and have better accuracy compared to existing unbiased LTR algorithms, especially when there is moderate to no position bias.

KEYWORDS
recommender systems, learning-to-rank, position bias, selection bias

1 INTRODUCTION
The abundance of data found online has inspired new lines of inquiry about human behavior and the development of machine-learning algorithms that learn individual preferences from such data. Patterns in such data are often driven by the underlying algorithms supporting online platforms, rather than by naturally occurring user behavior. For example, interaction data from social media news feeds, such as user clicks and comments on posts, reflect not only latent user interests but also news feed personalization and what the underlying algorithms chose to show to users in the first place. Such data in turn are used to train new news feed algorithms, propagating the bias further [9]. This can lead to phenomena such as filter bubbles and echo chambers, and it can challenge the validity of social science research that relies on found data [26, 30].

One of the places where these biases surface is in personalized recommender systems, whose goal is to learn user preferences from available interaction data. These systems typically rely on learning procedures to estimate the parameters of new ranking algorithms that are capable of ranking items based on inferred user preferences, in a process known as learning-to-rank (LTR) [32]. Much of the work on unbiasing the parameter estimation for learning-to-rank systems has focused on position bias [29], the bias caused by the position where a result was displayed to a user. Position bias makes higher ranked results (e.g., top search engine results) more likely to be clicked even if they are not the most relevant.

Algorithms that correct for position bias typically assume that all relevant results have a non-zero probability of being observed (and thus clicked) by the user, and they focus on boosting the relevance of lower ranked relevant results [29]. However, users rarely have the chance to observe all relevant results, either because the system chose to show a truncated list of the top k recommended results or because users do not spend the time to peruse through tens to hundreds of ranked results. In this case, lower ranked, relevant results have zero probability of being observed (and clicked) and never get the chance to be boosted in LTR systems. This leads to selection bias in clicked results, which is the focus of our work.

Here, we frame the problem of learning to rank as a counterfactual problem of predicting whether a document would have been clicked had it been observed. In order to recover from selection bias for clicked documents, we focus on identifying the relevant documents that were never shown to users. Our formulation is different from previous counterfactual formulations, which correct for position bias and study the likelihood of a document being clicked had it been placed in a higher position given that it was placed in a lower position [29].

Here, we propose a general framework for recovering from selection bias that stems from both limited choices given to users and position bias. First, we propose Heckmanrank, an algorithm for addressing selection bias in the context of learning-to-rank systems. By adapting Heckman's two-stage method, an econometric tool for addressing selection bias, we account for the limited choice given to users and the fact that some items are more likely to be shown to a user than others. Because this correction method is very general, it is applicable to any type of selection bias in which the system's decision to show documents can be learned from features. Because Heckmanrank treats selection as a binary variable, we propose two bias-correcting ensembles that account for the nuanced probability of being selected due to position bias and combine Heckmanrank with existing position-bias correction methods.

Our experimental evaluations demonstrate the utility of our proposed method when compared to state-of-the-art algorithms for unbiased learning-to-rank. Our ensemble methods have better accuracy compared to existing unbiased LTR algorithms under realistic selection bias assumptions, especially when the position bias is not severe. Moreover, Heckmanrank is more robust to noise than both ensemble methods and position-bias correcting methods across different position bias assumptions. The experiments also show that selection bias affects the performance of LTR systems even in the absence of position bias, and that Heckmanrank is able to correct for it.

2 RELATED WORK
Here, we provide the context for our work and present the three areas that best encompass our problem: bias in recommender systems, selection bias correction, and unbiased learning-to-rank.

Bias in recommender systems. Many technological platforms, such as recommendation systems, tailor items to users by filtering and ranking information according to user history. This process influences the way users interact with the system and how the data collected from users is fed back to the system, and it can lead to several types of biases. Chaney et al. [9] explore a closely related problem called algorithmic confounding bias, where live systems are retrained to incorporate data that were influenced by the recommendation algorithm itself. Their study highlights the fact that training recommendation platforms with naive data that are not debiased can cause a severe decrease in the utility of such systems. For example, "echo chambers" are a consequence of this problem [17, 20], where users are limited to an increasingly narrower choice set over time, which can lead to a phenomenon called polarization [18]. Popularity bias is another bias affecting recommender systems and is studied by Celma and Cano [8]. Popularity bias refers to the idea that a recommender system will display the most popular items to a user, even if they are not the most relevant to a user's query. Recommender systems can also affect users' decision-making processes, known as decision bias, and Chen et al. [13] show how understanding this bias can improve recommender systems. Position bias is yet another type of bias that is studied in the context of learning-to-rank systems and refers to documents …

Selection bias correction. … bias, i.e., when a variable can affect both treatment and control. We leverage this work in our discussion of identifiability under selection bias. The work most related to ours comprises the studies by Hernández-Lobato et al. [22], Schnabel et al. [37], and Wang et al. [43]. Both Schnabel et al. [37] and Hernández-Lobato et al. [22] use a matrix factorization model to represent data (ratings by users) that are missing not-at-random, where Schnabel et al. [37] outperform Hernández-Lobato et al. [22]. More recently, Joachims et al. [29] propose a position debiasing approach in the context of learning-to-rank systems as a more general approach compared to Schnabel et al. [37]. Throughout, we compare our results to Joachims et al. [29], although it should be noted that the latter deals with a more specific bias, position bias, than what we address here. Finally, Wang et al. [43] address selection bias due to confounding, whereas we address selection bias that is treatment-dependent only.

Unbiased learning-to-rank. The problem we study here investigates debiasing data in learning-to-rank systems. There are two approaches to LTR systems, offline and online, and the work we propose here falls in the category of offline LTR systems. Offline LTR systems learn a ranking model from historical click data and interpret clicks as absolute relevance indicators [2, 7, 12, 16, 24, 27–29, 36, 41, 42]. Offline approaches must contend with the many biases that found data are subject to, including position and selection bias, among others. For example, Wang et al. [41] use a propensity weighting approach to overcome position bias. Similarly, Joachims et al. [29] propose a method to correct for position bias by augmenting SVMrank learning with an inverse propensity score defined for clicks rather than queries. They demonstrate that Propensity-Weighted SVMrank outperforms a standard Ranking SVMrank by accounting for position bias. More recently, Agarwal et al. [1] proposed nDCG SVMrank, which outperforms Propensity-Weighted SVMrank [29], but only when position bias is severe. We show that our proposed algorithm outperforms [29] when position bias is not severe.