AutoDebias: Learning to Debias for Recommendation
Jiawei Chen1,*, Hande Dong1,*, Yang Qiu1, Xiangnan He1,†, Xin Xin2, Liang Chen3, Guli Lin4, Keping Yang4
1University of Science and Technology of China, 2University of Glasgow, 3Sun Yat-Sen University, 4Alibaba Group
{cjwustc@, donghd@mail., yangqiu@mail., hexn@}ustc.edu.cn, [email protected], [email protected], {guli.lingl, shaoyao}@taobao.com
* Jiawei Chen and Hande Dong contribute equally to the work. † Xiangnan He is the corresponding author.

ABSTRACT
Recommender systems rely on user behavior data, such as ratings and clicks, to build personalization models. However, the collected data is observational rather than experimental, causing various biases in the data that significantly affect the learned model. Most existing work on recommendation debiasing, such as inverse propensity scoring and imputation approaches, focuses on one or two specific biases and lacks the universal capacity to account for mixed or even unknown biases in the data.

Towards this research gap, we first analyze the origin of biases from the perspective of risk discrepancy, which represents the difference between the expectation of the empirical risk and the true risk. Remarkably, we derive a general learning framework that summarizes most existing debiasing strategies, each of which can be recovered by specifying some parameters of the general framework. This provides a valuable opportunity to develop a universal solution for debiasing, e.g., by learning the debiasing parameters from data. However, the training data lacks important signals of how the data is biased and what unbiased data looks like. To move this idea forward, we propose AutoDebias, which leverages another (small) set of uniform data to optimize the debiasing parameters by solving a bi-level optimization problem with meta-learning. Through theoretical analyses, we derive the generalization bound for AutoDebias and prove its ability to acquire the appropriate debiasing strategy. Extensive experiments on two real datasets and a simulated dataset demonstrate the effectiveness of AutoDebias. The code is available at https://github.com/DongHande/AutoDebias.

CCS CONCEPTS
• Information systems → Recommender systems.

KEYWORDS
Recommendation, Bias, Debias, Meta-learning

ACM Reference Format:
Jiawei Chen, Hande Dong, Yang Qiu, Xiangnan He, Xin Xin, Liang Chen, Guli Lin and Keping Yang. 2021. AutoDebias: Learning to Debias for Recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), July 11–15, 2021, Virtual Event, Canada. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3404835.3462919

1 INTRODUCTION
Being able to provide personalized suggestions to each user, recommender systems (RS) have been widely used in countless online applications. Recent years have witnessed flourishing publications on recommendation, most of which aim at inventing machine learning models to fit user behavior data [17, 44, 52]. However, these models may deteriorate in real-world RS, as the behavior data is often full of biases. In practice, the data is observational rather than experimental and is often affected by many factors, including but not limited to self-selection of the user (selection bias) [19, 33], the exposure mechanism of the system (exposure bias) [34, 47], public opinions (conformity bias) [25, 29], and the display position (position bias) [22, 23]. These biases make the data deviate from reflecting true user preference. Hence, blindly fitting the data without considering its biases would yield unexpected results, e.g., amplifying the long-tail effect [1] and the previous-model bias [28].

Given the wide existence of data biases and their large impact on the learned model, we cannot overemphasize the importance of proper debiasing for practical RS. Existing efforts on recommendation (or learning-to-rank) biases can be divided into three major categories: 1) data imputation [19, 42], which assigns pseudo-labels to missing data to reduce variance; 2) inverse propensity scoring (IPS) [39, 47], a counterfactual technique that reweighs the collected data for expectation-unbiased learning; and 3) generative modeling [27], which assumes a generation process of the data and reduces the biases accordingly (a toy sketch contrasting the first two strategies follows the list below). Despite their effectiveness in some scenarios, we argue that they suffer from two limitations:
• Lacking Universality. These methods are designed for addressing one or two biases of a specific scenario, e.g., IPS for selection bias [39] and click models for position bias [12]. Thus, when facing real data that commonly contains multiple types of biases, these methods fall short.
• Lacking Adaptivity. The effectiveness of these methods is guaranteed only when the debiasing configurations (e.g., pseudo-labels, propensity scores, or the data-generating process) are properly specified. However, obtaining such proper configurations is quite difficult, requiring domain expertise that thoroughly understands the biases in the data and how they affect the model. Even worse, the optimal configurations may evolve over time, as new users/items/interactions may change the data distribution, which has been a nightmare for practitioners who must manually tune the configurations continually.
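The following toy sketch (our own illustration, not the paper's code; all quantities, including the fixed 0.3 propensity, are synthetic assumptions) shows how the first two strategies modify the empirical risk over the same set of logged interactions:

```python
# Toy illustration of data imputation vs. inverse propensity scoring.
# All data here is synthetic; the constant propensity is an assumption.
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 1000
observed = rng.random(n_pairs) < 0.3                 # which user-item pairs were logged
label = rng.integers(0, 2, n_pairs).astype(float)    # observed feedback r (e.g., clicks)
pred = rng.random(n_pairs)                           # model predictions f_theta(u, i)

def sq_loss(y_hat, y):
    return (y_hat - y) ** 2

# 1) Data imputation: assign a pseudo-label m to the missing pairs (here m = 0).
pseudo_label = 0.0
imputation_risk = np.mean(np.where(observed,
                                   sq_loss(pred, label),
                                   sq_loss(pred, pseudo_label)))

# 2) IPS: reweigh each observed pair by the inverse of its observation probability.
propensity = np.full(n_pairs, 0.3)                   # assumed known or estimated
ips_risk = np.sum(observed * sq_loss(pred, label) / propensity) / n_pairs

print(f"imputation risk: {imputation_risk:.4f}, IPS risk: {ips_risk:.4f}")
```

Both estimators hinge on a debiasing configuration (the pseudo-label and the propensity), which is exactly the kind of quantity that is hard to specify by hand.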
Considering the shortcomings of existing work, we believe it is essential to develop a universal debiasing solution, which not only accounts for multiple biases and their combinations, but also spares the human effort of identifying biases and tuning the configurations. To achieve this goal, we first review the common biases and debiasing strategies, offering two important insights: (1) although different biases have different origins and properties, they can all be formulated as a risk discrepancy between the empirical risk and the true risk, resulting from the inconsistency between the distribution from which the training data is collected and the one used for unbiased testing; (2) the success of most recent debiasing strategies can be attributed to their specific configurations that offset this discrepancy during model training. Based on these insights, we propose a general debiasing framework that reduces the risk discrepancy and subsumes most debiasing strategies — each strategy can be recovered by specifying the parameters of the framework. This framework provides a valuable opportunity to develop a universal debiasing solution for recommendation — we can perform automatic debiasing by learning the debiasing parameters of the framework.

Now the question lies in how to optimize the debiasing parameters. Obviously, the biased training data lacks important signals of how the data is biased and what unbiased data looks like. To deal with this problem, we propose to leverage another set of uniform data to supervise the learning of the debiasing parameters. The uniform data is assumed to be collected by a random logging policy [39], reflecting user preference in an unbiased way. We make full use of this important evidence, optimizing the debiasing parameters by minimizing the loss on the uniform data. Specifically, we formulate the process as a bi-level optimization problem, where the debiasing parameters serve as the hyper-parameters for learning the recommender model, and optimize the debiasing parameters with a meta-learning technique [14]. We conduct theoretical analyses on the learning framework, proving that: (1) the optimum learned under such an objective approximates the best case where biases
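To make the bi-level formulation concrete, here is a schematic PyTorch sketch under our own simplifying assumptions (it is not the released AutoDebias implementation): the debiasing parameters phi are modeled as per-example weights on the biased set D_T, a virtual gradient step on the weighted training loss produces an updated model, and phi is then adjusted to reduce that model's loss on the uniform set D_U, following the generic meta-learning recipe.

```python
# Schematic bi-level / meta-learning sketch (illustrative, not the official code).
# theta: recommender parameters; phi: debiasing parameters (per-example logits).
import torch

torch.manual_seed(0)
d, n_t, n_u = 8, 64, 16
x_t, y_t = torch.randn(n_t, d), torch.rand(n_t)    # toy biased training data (D_T)
x_u, y_u = torch.randn(n_u, d), torch.rand(n_u)    # toy uniform data (D_U)

theta = torch.randn(d, requires_grad=True)         # linear recommender: f(x) = x @ theta
phi = torch.zeros(n_t, requires_grad=True)         # debiasing parameters
lr_theta, lr_phi = 0.1, 0.01

for step in range(200):
    # Inner step: virtual update of theta on the phi-weighted empirical risk over D_T.
    # create_graph=True keeps the dependence on phi so meta-gradients can flow back.
    weights = torch.sigmoid(phi)
    inner_loss = (weights * (x_t @ theta - y_t) ** 2).mean()
    grad_theta = torch.autograd.grad(inner_loss, theta, create_graph=True)[0]
    theta_virtual = theta - lr_theta * grad_theta

    # Outer step: evaluate the virtual model on D_U and update phi (the hyper-parameters).
    outer_loss = ((x_u @ theta_virtual - y_u) ** 2).mean()
    grad_phi = torch.autograd.grad(outer_loss, phi)[0]
    with torch.no_grad():
        phi -= lr_phi * grad_phi

    # Actual update of theta with the refreshed debiasing parameters.
    real_loss = (torch.sigmoid(phi).detach() * (x_t @ theta - y_t) ** 2).mean()
    grad_theta = torch.autograd.grad(real_loss, theta)[0]
    with torch.no_grad():
        theta -= lr_theta * grad_theta

print(f"final loss on uniform data: {((x_u @ theta - y_u) ** 2).mean().item():.4f}")
```

The actual AutoDebias parameterization of the debiasing parameters differs from this per-example-weight assumption and is given in the released code linked above.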
The main contributions of this work are summarized as follows:
• Proposing a new method that leverages uniform data to learn the optimal debiasing strategy, with theoretical guarantees.
• Conducting experiments on three types of data (explicit feedback, implicit feedback, and simulated list feedback) to validate the effectiveness of our proposal.

2 PRELIMINARY
In this section, we formulate the recommendation task and review various biases in it from the risk-discrepancy perspective.

Table 1: Notations and Definitions.
  Notation        Annotation
  D_T             a biased training set with entries {(u_k, i_k, r_k)}_{1 <= k <= |D_T|}, collected from user interaction history
  D_U             an unbiased uniform set with entries {(u_l, i_l, r_l)}_{1 <= l <= |D_U|}, collected with a uniform logging policy
  p_T(u, i, r)    the data distribution from which D_T is collected
  p_U(u, i, r)    the ideal unbiased data distribution
  f_θ(·, ·)       the RS model that maps a user-item pair to a prediction
  L̂_T(f)          the empirical risk of f on D_T
  L(f)            the true risk of f
  S_1             the subspace of U × I × R with constraints p_T(u, i, r) > 0 and p_U(u, i, r) > 0
  S_0             the subspace of U × I × R with constraints p_T(u, i, r) = 0 and p_U(u, i, r) > 0

2.1 Task Formulation
Suppose we have a recommender system with a user set U and an item set I. Let u (or i) denote a user (or an item) in U (or I), and let r ∈ R denote the feedback (e.g., rating value, click, or retention time) given by a user to an item. The collected historical behavior data D_T can be denoted as a set of triplets {(u_k, i_k, r_k)}_{1 <= k <= |D_T|} generated from an unknown distribution p_T(u, i, r) over the user-item-label space U × I × R. The task of a recommender system can be stated as follows: learn a recommendation model f_θ(·, ·) from D_T so that it captures user preference and makes high-quality recommendations.
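A minimal sketch of this formulation, assuming a matrix-factorization form for f_θ (an assumption for illustration only; the framework does not prescribe a particular model class), fits the triplets in D_T with plain SGD on a squared loss:

```python
# Minimal sketch: fit f_theta(u, i) to triplets (u, i, r) drawn from D_T.
# The matrix-factorization form and all data below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 50, 40, 4
D_T = [(rng.integers(n_users), rng.integers(n_items), float(rng.integers(0, 2)))
       for _ in range(2000)]                       # toy biased training triplets

P = 0.1 * rng.standard_normal((n_users, dim))      # user embeddings
Q = 0.1 * rng.standard_normal((n_items, dim))      # item embeddings
lr = 0.05

def f(u, i):
    """Recommender model: predicted preference of user u for item i."""
    return P[u] @ Q[i]

for epoch in range(10):
    for u, i, r in D_T:                            # SGD on the empirical risk over D_T
        err = f(u, i) - r                          # squared-loss gradient (constants folded into lr)
        P[u], Q[i] = P[u] - lr * err * Q[i], Q[i] - lr * err * P[u]
```

Because these triplets are drawn from p_T rather than the ideal distribution p_U, minimizing this empirical risk alone is exactly the kind of biased training that the rest of the paper seeks to correct.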