
Statistical Analysis of Bayes Optimal Subset Ranking

David Cossock, Yahoo Inc., Santa Clara, CA, USA ([email protected])
Tong Zhang, Rutgers University, NJ, USA ([email protected])

Abstract—The ranking problem has become increasingly important in modern applications of statistical methods in automated decision making systems. In particular, we consider a formulation of the statistical ranking problem which we call subset ranking, and focus on the DCG (discounted cumulated gain) criterion that measures the quality of items near the top of the rank-list. Similar to error minimization for binary classification, direct optimization of natural ranking criteria such as DCG leads to a non-convex optimization problem that can be NP-hard. Therefore a computationally more tractable approach is needed. We present bounds that relate the approximate optimization of DCG to the approximate minimization of certain regression errors. These bounds justify the use of convex learning formulations for solving the subset ranking problem. The resulting estimation methods are not conventional, in that we focus on the estimation quality in the top-portion of the rank-list. We further investigate the asymptotic statistical behavior of these formulations. Under appropriate conditions, the consistency of the estimation schemes with respect to the DCG metric can be derived.

(Partially supported by NSF grant DMS-0706805. Part of the work was done when the second author was at Yahoo Inc.)

I. INTRODUCTION

We consider the general ranking problem, where a computer system is required to rank a set of items based on a given input. In such applications, the system often needs to present only a few top ranked items to the user. Therefore the quality of the system output is determined by the performance near the top of its rank-list.

Ranking is especially important in electronic commerce and many internet applications, where personalization and information based decision making are critical to the success of such businesses. The decision making process can often be posed as a problem of selecting top candidates from a set of potential alternatives, leading to a conditional ranking problem. For example, in a recommender system, the computer is asked to choose a few items a user is most likely to buy based on the user's profile and buying history. The selected items will then be presented to the user as recommendations. Another important example that affects millions of people everyday is the internet search problem, where the user presents a query to the search engine, and the search engine then selects a few web-pages that are most relevant to the query from the whole web. The quality of a search engine is largely determined by the top-ranked results it can display on the first page. Internet search is the main motivation of this theoretical study, although the model presented here can be useful for many other applications. For example, another ranking problem is ad placement in a web-page (either a search result or some content page) according to revenue-generating potential.

Although there has been much theoretical investigation of the ranking problem in recent years, many authors have only considered the global ranking problem, where a single ranking function is used to order a fixed set of items. However, in web-search, a different ranking of web-pages is needed for each different query. That is, we want to find a ranking function that is conditioned on (or local to) the query. We may look at the problem from an equivalent point of view: in web-search, instead of considering many different ranking functions of web-pages (one for each query), we may consider a single ranking function that depends on the (query,web-page) pair. Given a query q, we only need to rank the subset of all possible (query,web-page) pairs restricted to query q. Moreover, we may have a pre-processing step to filter out documents unlikely to be relevant to the query q, so that the subset of (query,web-page) pairs to be ranked is further reduced. We call this framework subset ranking. A formal mathematical definition will be given in Section III.

For web-search (and many other ranking problems), we are only interested in the quality of the top choices; the evaluation of the system output is different from that under many traditional error metrics such as classification error. In this setting, a useful figure of merit should focus on the top portion of the rank-list. This characteristic of ranking problems has not been carefully explored in earlier studies (except for a recent paper [24], which also touched on this issue). The purpose of this paper is to develop some theoretical results for converting a ranking problem into a convex optimization problem that can be efficiently solved. The resulting formulation focuses on the quality of the top ranked results. The theory can be regarded as an extension of the related theory for convex risk minimization formulations for classification, which has drawn much attention recently in the statistical learning literature [3], [18], [26], [27], [30], [31].

Due to our motivation from web-search, this paper focuses on ranking problems that measure quality only in the top portion of the rank-list. However, it is important to note that in some other applications, global ranking criteria such as Spearman's rank correlation and Kendall's τ metric are used. Such global metrics have been investigated in a number of papers in recent years.

We organize the paper as follows. Section II discusses earlier work on global ranking and pair-wise ranking. Section III introduces the subset ranking problem. We define two ranking metrics: one is the DCG measure which we focus on in this paper, and the other is a measure that counts the number of correctly ranked pairs. The latter has been studied recently by several authors in the context of pair-wise preference learning. The order induced by the regression function is shown to be optimal with respect to both metrics. Section IV introduces some basic estimation methods for ranking. Although this paper focuses on the least squares regression based formulation, we also briefly discuss other possible loss functions approximating the optimal order. The later sections develop bounds and consistency arguments for a faster convergence rate to optimal ranking than is possible with naive (uniform) regression. Section V contains the main theoretical results of this paper, where we show that the approximate minimization of certain regression errors leads to the approximate optimization of the ranking metrics defined earlier. This implies that asymptotically the non-convex ranking problem can be solved using regression methods that are convex. Section VI presents the regression learning formulation derived from the theoretical results in Section V. Similar methods are currently used to optimize Yahoo's production search engine. Section VII studies the generalization and asymptotic behavior of regression learning, where we focus on the L2-regularization approach. Together with earlier theoretical results, we can establish the consistency of regression based ranking under appropriate conditions.

II. RANKING AND PAIR-WISE PREFERENCE LEARNING

In the standard (global) ranking problem, a set of items is ranked relative to each other according to a single criterion. The goal is to learn a ranking (or linear ordering) for all items from a small set of training items (with partially defined preference relations among them), so that the remaining items can be ranked according to the same criterion. However, as we have explained in the introduction, this paper considers subset ranking, where only a subset of items is ranked according to an input q (representing the query in the web-search example). Since our motivation is document retrieval, we will follow the convention of using q to represent a query and p to represent pages to be retrieved. Further discussion on ranking in the document retrieval domain and some mathematical formulations can be found in [17], [21], although we use different notations here.

In the context of document retrieval, we may consider ranking as a prediction problem. The traditional prediction problem in statistical machine learning assumes that we observe an input vector q ∈ Q, so as to predict an unobserved output p ∈ P. However, in a ranking problem, if we assume P = {1, . . . , m} contains m possible values, then instead of predicting a value in P, we predict a permutation of P that gives an optimal ordering of P. That is, if we denote by P! the set of permutations of P, then the goal is to predict an output in P!. There are two fundamental issues: first, how to measure the quality of a ranking; second, how to learn a good ranking procedure from historical data.

At first glance, it may seem that we can simply cast the ranking problem as an ordinary prediction problem where the output space becomes P!. However, the number of permutations in P! is m!, which can be extremely large even for small m. Therefore it is not practical to solve the ranking problem directly without imposing certain structure on the search space. Moreover, in practice, given a training point q ∈ Q, we are generally not given an optimal permutation in P! as the observed output. Instead, we may observe another form of output from which an optimal ranking can be inferred, but which may also contain extra information. For example, in web-search, we may observe the relevance score or click-through-rate of a web-page with respect to a query, obtained either from human editorial judgment or from search-logs.
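The point about the size of P! can be made concrete: m! explodes even for modest subset sizes, which is why a scoring function is used instead of predicting a permutation directly. A minimal illustration (the specific values of m are our own choice):

```python
import math

# The output space P! for a subset of m items has m! elements, so
# casting ranking as classification over permutations is intractable
# even for modest m; ranking by a scoring function sidesteps this.
for m in (5, 10, 20):
    print(m, math.factorial(m))
```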

The training procedure should take advantage of such information.

A common method of obtaining an optimal permutation in P! is via a scoring function which maps a pair (q, p) in Q × P to a real-valued number r(q, p). For each q, the predicted permutation in P! induced by this scoring function is defined as the ordering of p ∈ P sorted with non-increasing value r(q, p). This is the method we will focus on in this paper. In subset ranking, we consider an input space X = Q × P, and the goal is to learn a scoring function r(q, p) on X, so that the items (q, p) can be ranked using this function.

Although the ranking problem has received considerable interest in machine learning due to many important applications in information processing systems, the problem has not been extensively studied in the traditional statistical literature. A relevant statistical model is ordinal regression [20]. In this model, we are still interested in predicting a single output. We consider the input space X = Q × P, and for each x = (q, p), we observe an output value y ∈ Y. Moreover, we assume that the values in Y = {1, . . . , L} are ordered, and the cumulative probability P(y ≤ j|x) (j = 1, . . . , L) has the form γ(P(y ≤ j|x)) = θ_j + g_β(x). In this model, both γ(·) and g_β(·) have known functional forms, and θ and β are model parameters.

Note that the ordinal regression model induces a stochastic preference relationship on the input space X. Consider two samples (x_1, y_1) and (x_2, y_2) on X × Y. We say x_1 ≺ x_2 if and only if y_1 < y_2. This leads to a classification problem that takes a pair of inputs x_1 and x_2 and tries to predict whether x_1 ≺ x_2 or not (that is, whether the corresponding outputs satisfy y_1 < y_2 or not). In this formulation, the optimal prediction rule to minimize classification error is induced by the ordering of g_β(x) on X, because if g_β(x_1) < g_β(x_2), then P(y_1 < y_2 | x_1, x_2) > 0.5 (based on the ordinal regression model), which is consistent with the Bayes rule. Motivated by this observation, an SVM ranking method is proposed in [15]. The idea is to reformulate ordinal regression as a model to learn a preference relation on the input space X, which can be learned using pair-wise classification. Given the parameter β̂ learned from training data, the scoring function is simply r(q, p) = g_β̂(x), where x = (q, p).

The pair-wise preference learning model has become the major ranking topic in the machine learning literature. For example, in addition to SVM, a similar method based on AdaBoost is proposed in [12]. The idea was also used in optimizing the Microsoft web-search system [6].

A number of researchers have proposed theoretical analysis of ranking based on the pair-wise ranking model. The criterion is to minimize the error of pair-wise preference prediction when we draw two pairs x_1 and x_2 randomly from the input space X. That is, given a scoring function g : X → R, the ranking loss is:

  E_{(X_1,Y_1)} E_{(X_2,Y_2)} [I(Y_1 < Y_2) I(g(X_1) ≥ g(X_2)) + I(Y_1 > Y_2) I(g(X_1) ≤ g(X_2))]   (1)
  = E_{X_1,X_2} [P(Y_1 < Y_2 | X_1, X_2) I(g(X_1) ≥ g(X_2)) + P(Y_1 > Y_2 | X_1, X_2) I(g(X_1) ≤ g(X_2))],

where I(·) denotes the indicator function. For binary output y = 0, 1, it is known that this metric is equivalent to the AUC measure (area under the ROC curve) for binary classifiers up to a scaling, and it is closely related to the Mann-Whitney-Wilcoxon statistic [14]. In the machine learning literature, theoretical analysis has focused mainly on this ranking criterion (for example, see [1], [2], [8], [23]).

The pair-wise preference learning model has some limitations. First, although the criterion in (1) measures the global pair-wise ranking quality, it is not the best metric to evaluate some practical ranking systems such as those in web-search. In such applications, a system does not need to rank all data-pairs, but only a subset of them each time. Furthermore, the top-most positions are of primary importance. Another issue with the pair-wise preference learning model is that the scoring function is usually learned by minimizing a convex relaxation of the pair-wise classification error, similar to large margin classification. However, if the preference relationship is noisy (that is, not all preference relations can be satisfied by a single ranking order), then an important question that should be addressed is whether such a learning algorithm leads to a Bayes optimal ranking function in the large sample limit. Unfortunately, for general risk minimization formulations, the problem is quite complex and difficult if the decision rule is induced by a single-variable scoring function of the form r(x). The regression formulations considered in this paper have simpler (although less general) forms, which can be more easily analyzed.

The problem of Bayes optimality in the pair-wise learning model was partially investigated in [8], but with a decision rule of a general form r(x_1, x_2): we predict x_1 ≺ x_2 if r(x_1, x_2) < 0. To our knowledge, this method is not widely used in practice because a naive application can lead to contradictions: we may predict r(x_1, x_2) < 0, r(x_2, x_3) < 0, and r(x_3, x_1) < 0. Therefore in order to use such a method effectively for ranking, there needs to be a mechanism to resolve such
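To make criterion (1) concrete, here is a small sketch of its empirical (plug-in) version on a finite sample; the function name and the array representation of the scores g(x_i) and grades y_i are our own illustration, not notation from the paper:

```python
def pairwise_ranking_loss(scores, grades):
    """Empirical version of the pair-wise criterion (1): a pair (i, j)
    with grades[i] < grades[j] contributes an error when the scoring
    function does not rank j strictly above i (ties count as errors)."""
    n, errors, pairs = len(scores), 0, 0
    for i in range(n):
        for j in range(n):
            if grades[i] < grades[j]:
                pairs += 1
                if scores[i] >= scores[j]:
                    errors += 1
    return errors / pairs if pairs else 0.0

# A score ordering that agrees with the grades has zero loss;
# the reversed ordering misranks every preference pair.
print(pairwise_ranking_loss([0.1, 0.5, 0.9], [0, 1, 2]))  # 0.0
print(pairwise_ranking_loss([0.9, 0.5, 0.1], [0, 1, 2]))  # 1.0
```

For binary grades this quantity is exactly one minus the AUC of the scorer, which is the equivalence mentioned above.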

contradictions (see [9]). For example, one possibility is to define a scoring function f(x) = Σ_{x'} r(x, x'), and rank the data accordingly. Another possibility is to use a sorting method (such as quick-sort) directly with the comparison function given by r(x_1, x_2). In order to show that such contradiction resolution methods are well behaved asymptotically, it is necessary to analyze the corresponding error. Although useful, such analysis is beyond the scope of this paper.

It is worth mentioning that there are more elaborate ideas in the preference learning framework that can address some limitations of global pair-wise ranking. One proposal that involves only partially defined preference relations has been considered in [25]. Their methods allow optimization of a convex surrogate that can be more tightly coupled with the goals of ranking, while still enabling fast optimization. The methods are directly motivated by considering the noiseless situation where all preference relationships can be fully satisfied. If the preference relationships cannot be fully satisfied, the behavior of their methods is not well understood. In contrast, the regression formulations considered in this paper are easier to analyze. Our results imply that the methods we propose are well behaved even when the ranking problem contains noise (that is, when we cannot find a ranking function that preserves all preference relationships).

III. SUBSET RANKING MODEL

The global pair-wise preference learning model in Section II has some limitations. In this paper, we shall describe a model more relevant to some practical ranking systems such as web-search.

A. Problem definition

The problem of web-search is to rank web-pages (denoted by p) based on a query q. As explained in the introduction, one may consider this problem as learning a global ranking function that orders (query,web-page) pairs. However, given a query q, we only need to rank the subset of all (query,web-page) pairs that are consistent with the query q. Our goal is to rank the (query,web-page) pairs according to how relevant the web-page is to the query. As mentioned in Section II, this is achieved by taking (q, p) as input, and estimating a scoring function r(q, p), so that a more relevant pair gets a higher score.

In practical applications, the standard input to a machine learning algorithm is usually not the original input, but a feature vector created from the input. Therefore from now on, we denote the space of observable feature vectors as X. In web-search, feature vectors are constructed from (q, p). For each x ∈ X, we also observe a non-negative real-valued variable y that measures the quality of x (this corresponds to the relevance score of the (query,web-page) pair represented by x). Denote by S the set of all finite subsets of X, which may possibly contain elements that are redundant (this happens when two (query,web-page) pairs map to the same feature vector). Note that sets with repeated membership are sometimes referred to as multisets in the literature. We do not differentiate set and multiset here, as this detail is not essential in the paper. In subset ranking, each instance is an ordered finite subset S = {x_1, . . . , x_m} ∈ S (i.e., the set of (query,web-page) pairs consistent with the given query q in the web-search example). Note that the actual order of the items in the set is of no importance; the numerical subscripts are for notational purposes only, so that permutations can be more conveniently defined.

In our model, we randomly draw an ordered subset S = {x_1, . . . , x_m} ∈ S consisting of feature vectors x_j in X; at the same time, we are given a set of real-valued grades {y_j} = {y_1, . . . , y_m} such that for each j, y_j corresponds to x_j (representing its relevance score). Whether the size m of the set should be a random variable has no importance in our analysis. In this paper we assume that it is fixed for simplicity.

Based on the observed subset S = {x_1, . . . , x_m}, the system is required to output an ordering (ranking) of the items in the set. Using our notation, this ordering can be represented as a permutation J = [j_1, . . . , j_m] of [1, . . . , m]. Our goal is to produce a permutation such that y_{j_i} is in decreasing order for i = 1, . . . , m. A quality criterion is used to evaluate the system-produced ranking of subset S based on the observed grades {y_j}. Our goal is to maximize the expected value of this quality criterion over random draws of S and {y_j} from an underlying distribution D. A formal definition is given below, where for notational simplicity, we state it with fixed m.

Definition 1: In subset ranking, we are given a set of items X. Let S = X^m be the set of (ordered) finite subsets of X of cardinality m. Let Q(J, {y_j}) be a real-valued quality function, where we denote by J a permutation of [1, . . . , m], and by {y_j} = [y_1, . . . , y_m] ∈ R^m a vector of m grades. A ranking function r(S) maps an ordered subset S ∈ S to a permutation J. Assume that S = {x_1, . . . , x_m} and {y_j} are drawn randomly from an underlying distribution D, which is invariant under any simultaneous permutation of (x_j, y_j) for j = 1, . . . , m (that is, the ordering of S is of no importance). Our goal is to find a ranking function r(S)

to maximize the expected subset ranking quality

  E_{(S,{y_j})∼D} Q(r(S), {y_j}).

The machine learning problem is to estimate r(S) from a set of n training pairs {(S, {y_j})} that are independently drawn from D.

In practical applications, each available position i can be associated with a weight c_i that measures the importance of that position. Now, given the grades y_j (j = 1, . . . , m), a very natural measure of the quality of the rank-list J = [j_1, . . . , j_m] is the following weighted sum:

  DCG(J, [y_j]) = Σ_{i=1}^{m} c_i y_{j_i}.

We assume that {c_i} is a pre-defined sequence of non-increasing non-negative discount factors that may or may not depend on S. This metric, described in [16] as DCG (discounted cumulated gain), is one of the preferred metrics used in the evaluation of internet search systems, including the production systems of Yahoo and of Microsoft [6]. In this context, a typical choice of c_i is to set c_i = 1/log(1 + i) when i ≤ k and c_i = 0 when i > k, for some k. One may also use other choices, such as letting c_i be the probability of a user viewing (or clicking) the result at position i.

Although introduced in the IR context and applied to web-search, the DCG criterion can be useful for many other ranking applications such as recommender systems. By choosing a decaying sequence of c_i, this measure naturally focuses on the quality of the top portion of the rank-list. This is in contrast with the pair-wise error criterion in (1), which does not distinguish the top portion of the rank-list from the bottom portion.

The global pair-wise preference learning metric (1) can also be adapted to the subset ranking setting. We may consider the following weighted total of correctly ranked pairs minus incorrectly ranked pairs:

  T(J, [y_j]) = (2/(m(m−1))) Σ_{i=1}^{m−1} Σ_{i'=i+1}^{m} (y_{j_i} − y_{j_{i'}}).

If the output label y_i takes binary values, and the subset S = X is global (we may assume that it is finite), then this metric is equivalent to (1). Although we pay special attention to the DCG metric, we shall also include some analysis of the T criterion for completeness.

For the DCG criterion, the goal is to train a ranking function r that can take a subset S ∈ S as input, and produce an output permutation J = r(S) such that the expected DCG is as large as possible:

  DCG(r) = E_S DCG(r, S),   (2)

where

  DCG(r, S) = Σ_{i=1}^{m} c_i E_{y_{j_i}|(x_{j_i},S)} y_{j_i}.   (3)

Note that in the above equation, we let r(S) = [j_1, . . . , j_m]. We also assume that x_{j_i} is the j_i-th item in S, as in Definition 1. The notation E_{y_{j_i}|(x_{j_i},S)} y_{j_i} denotes the expectation of the score y_{j_i} corresponding to the item x_{j_i} ∈ S. We condition on x_{j_i} instead of the index j_i in order to emphasize that the ordering information in S is not important (see Definition 1). That is, y_{j_i} depends on x_{j_i} rather than on the index j_i itself. Conditioned on the value x_{j_i}, the distribution of y_{j_i} is not affected when we reorder S.

Similar to (2) and (3), we can define the following quantities:

  T(r) = E_S T(r, S),   (4)

where, if we let r(S) = [j_1, . . . , j_m], then

  T(r, S) = (2/(m(m−1))) Σ_{i=1}^{m−1} Σ_{i'=i+1}^{m} (E_{y_{j_i}|(x_{j_i},S)} y_{j_i} − E_{y_{j_{i'}}|(x_{j_{i'}},S)} y_{j_{i'}}).   (5)

Similar to the concept of the Bayes classifier in classification, we can define the Bayes ranking function that optimizes the DCG and T measures. Based on the conditional formulations in (3) and (5), we have the following result:

Theorem 1: Given a set S ∈ S, for each x_j ∈ S, we define the Bayes-scoring function as

  f_B(x_j, S) = E_{y_j|(x_j,S)} y_j.

An optimal Bayes ranking function r_B(S) that maximizes (5) returns a rank list J = [j_1, . . . , j_m] such that f_B(x_{j_i}, S) is in descending order: f_B(x_{j_1}, S) ≥ f_B(x_{j_2}, S) ≥ · · · ≥ f_B(x_{j_m}, S). An optimal Bayes ranking function r'_B(S) that maximizes (3) returns a rank list J = [j_1, . . . , j_m] such that c_k > c_{k'} implies f_B(x_{j_k}, S) ≥ f_B(x_{j_{k'}}, S).

Proof: Without confusion, we use the same notation J = [j_1, . . . , j_m] to denote the rank list from an optimal Bayes ranking function r_B(S) for (5), or from r'_B(S) for (3). Given any k, k' ∈ {1, . . . , m}, we consider a modified ranking function that returns a rank list J' derived from J by switching j_k and j_{k'}. That is, we define J' = [j'_1, . . . , j'_m], where j'_i = j_i when i ≠ k, k', and j'_k = j_{k'}, j'_{k'} = j_k. Due to the optimality of the Bayes ranking function, we have T(J, S) ≥ T(J', S) for (5) and DCG(J, S) ≥ DCG(J', S) for (3).
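Theorem 1's claim for the DCG criterion can be checked numerically on a toy subset: with the discount c_i = 1/log(1 + i) mentioned above, and with the grades standing in for the Bayes scores f_B, sorting in descending order attains the maximum DCG over all m! permutations. A small sketch (function and variable names are ours):

```python
import math
from itertools import permutations

def dcg(grades_in_rank_order, k=10):
    # DCG(J, [y_j]) = sum_i c_i * y_{j_i}, with c_i = 1/log(1+i)
    # for i <= k and c_i = 0 for i > k, as in the text.
    return sum(y / math.log(1 + i)
               for i, y in enumerate(grades_in_rank_order[:k], start=1))

# The grades play the role of the Bayes scores f_B(x_j, S).
scores = [0.2, 0.9, 0.4, 0.7]
brute_force = max(dcg([scores[j] for j in perm])
                  for perm in permutations(range(len(scores))))
descending = dcg(sorted(scores, reverse=True))
assert abs(brute_force - descending) < 1e-12
```

The agreement is an instance of the rearrangement inequality: pairing the largest scores with the largest (non-increasing) discounts maximizes the weighted sum.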

We consider the T-criterion first. In order to show that f_B(x_{j_k}, S) is in descending order, we only need to show that f_B(x_{j_{k+1}}, S) ≤ f_B(x_{j_k}, S) for k = 1, . . . , m−1. Consider k' = k+1; it is easy to check that T(J', S) − T(J, S) = 4(f_B(x_{j_{k+1}}, S) − f_B(x_{j_k}, S))/(m(m−1)). Therefore T(J', S) ≤ T(J, S) implies that f_B(x_{j_{k+1}}, S) ≤ f_B(x_{j_k}, S). This proves the first claim of the theorem. Note that the reverse is also true, because all ranking orders with f_B(x_{j_k}, S) in descending order have the same T-value.

Now we consider the DCG-criterion. Given any k, k', we have DCG(J', S) − DCG(J, S) = (c_k − c_{k'})(f_B(x_{j_{k'}}, S) − f_B(x_{j_k}, S)). Now c_k > c_{k'} and DCG(J', S) ≤ DCG(J, S) imply that f_B(x_{j_k}, S) ≥ f_B(x_{j_{k'}}, S). This proves the second claim. Note that the reverse is also true, because all such ranking orders have the same DCG-value.

The result indicates that the optimal ranking can be induced by a single-variable scoring function of the form f(x, S): X × S → R. In this paper, f(x, S) is meaningful only on the subset of X × S with x ∈ S. However, for notational simplicity, we consider the domain of f(x, S) to be X × S, where we simply let f(x, S) = 0 when x ∉ S. We use the specific function form f(x, S) to emphasize that the order of S is of no importance. In fact, in practical applications, we usually consider a function f(x, S) that is independent of the ordering of S. That is, if S' contains the same elements as S in a different order, then f(x, S') = f(x, S).

Due to the dependency of the conditional probability of y on S, the optimal scoring function also depends on S. Therefore a complete solution of the subset ranking problem can be difficult when m is large. In practice, in order to remove the set dependency, we may consider an encoding of S into a set-dependent feature vector g(S), and then incorporate this information into the feature vector x. If we are able to find such salient features, then the augmented feature vector contains all necessary information of S. Given the augmented x ∈ S, we may then assume that y_j is conditionally independent of S. That is, f_B(x_j, S) = E_{y_j|(x_j,S)} y_j = E_{y_j|x_j} y_j = f_B(x_j), which removes the dependence on S. Note that this formulation explicitly allows predictive features which are functions of the result set S, in addition to those which depend exclusively on x_j.

One may also consider the combinatorial problem of finding a global set-independent scoring function f(x_j) that approximately maximizes the DCG (i.e., approximately preserves the ranking order of f_B(x_j, S)). In the general case, this problem is computationally difficult. To see this, we may consider for simplicity that X is finite; moreover, each subset only contains two elements, and one is preferred over the other (deterministically). Now in the subset learning model, such a preference relationship x ≺ x' between two elements x, x' ∈ X can be denoted by a directed edge from x to x'. In this setting, finding a global scoring function that approximates the optimal set-dependent Bayes scoring rule is equivalent to finding a maximum subgraph that is acyclic. In general, this problem is known to be NP-hard as well as APX-hard [11]: the class APX consists of problems having an approximation to within a factor 1 + c of the optimum for some c. If any APX-hard problem admits a polynomial time approximation scheme, then P=NP.

B. Web-search example

Web-search is a concrete example of subset ranking. In this application, a user submits a query q, and expects the search engine to return a rank-list of web-pages {p_j} such that a more relevant page is placed before a less relevant page. In a typical internet search engine, the system takes a query and uses a simple ranking formula for initial filtering, which limits the set of web-pages to an initial pool {p_j} of size m (e.g., m = 100000). After this initial ranking, the system utilizes a more complicated second-stage ranking process, which re-orders the pool. This critical stage is the focus of this paper. This step generates a feature vector x_j for each page p_j in the initial pool, using information about the query q, the page p_j, query/page relationships, or the result pool itself. The feature vector can encode various types of information, such as the query length, characteristics of the pool, the number of query terms that match in the title or body of p_j, the web linkage of p_j, etc. The set of all possible feature vectors x_j is X. The ranking algorithm only observes a list of feature vectors {x_1, . . . , x_m} with each x_j ∈ X. A human editor is presented with a pair (q, p_j) and assigns a score s_j on a scale, e.g., 1-5 (least relevant to highly relevant). The corresponding target value y_j is defined as a transformation of s_j, which maps the grade into the interval [0, 1]. (For example, the formula (2^{s_j} − 1)/(2^5 − 1) is used in [6]. Yahoo uses a different transformation based on empirical user surveys.) Another possible choice of y_j is to normalize it by multiplying each y_j by a factor such that the optimal DCG is no more than one.

IV. RISK MINIMIZATION BASED ESTIMATION METHODS

From the previous section, we know that the optimal scoring function is the conditional expectation of the

6 grades y. We investigate some basic estimation methods where the function space F¯ contains a subset of func- for conditional expectation learning. tions {f(S¯): X m → Rm} of the form

A. Relation to multi-category classification

The subset ranking problem is a generalization of multi-category classification. In the latter case, we observe an input $x_0$, and are interested in classifying it into one of $m$ classes. Let the output value be $k \in \{1, \ldots, m\}$. We encode the input $x_0$ into $m$ feature vectors $\{x_1, \ldots, x_m\}$, where $x_i = [0, \ldots, 0, x_0, 0, \ldots, 0]$ with the $i$-th component being $x_0$ and the other components being zeros. We then encode the output $k$ into $m$ values $\{y_j\}$ such that $y_k = 1$ and $y_j = 0$ for $j \neq k$. In this setting, we try to find a scoring function $f$ such that $f(x_k) > f(x_j)$ for $j \neq k$. Consider the DCG criterion with $c_1 = 1$ and $c_j = 0$ when $j > 1$. Then the classification accuracy is given by the corresponding DCG.

Given any multi-category classification algorithm, one may use it to solve the subset ranking problem (with non-negative $y_j$ values) as follows. Consider a sample $S = [x_1, \ldots, x_m]$ as input, and a set of outputs $\{y_j\}$. We randomly draw $k$ from $1$ to $m$ according to the distribution $y_k / \sum_j y_j$. We then form another sample with weight $\sum_j y_j$, which has the vector $\bar{S} = [x_1, \ldots, x_m]$ (where order is important) as input, and label $y_0 = k \in \{1, \ldots, m\}$ as output. This changes the problem formulation into multi-category classification. Since the conditional expectation can be expressed as
$$ \mathbf{E}_{y_k|(x_k,S)}\, y_k = P(y_0 = k \mid \bar{S})\; \mathbf{E}_{\{y_j\}|S} \sum_j y_j, $$
the order induced by the scoring function $\mathbf{E}_{y_k|(x_k,S)}\, y_k$ is the same as that induced by $P(y_0 = k \mid \bar{S})$. Therefore a multi-category classification solver that estimates conditional probabilities can be used to solve the subset ranking problem.

In particular, if we consider a risk minimization based multi-category classification solver for the $m$-class problem [27], [30] of the following form:
$$ \hat{f} = \arg\min_{f \in F} \sum_{i=1}^n \Phi(f(X_i), Y_i), $$
where $(X_i, Y_i)$ are training points with $Y_i \in \{1, \ldots, m\}$, $F$ is a vector function class that takes values in $\mathbb{R}^m$, and $\Phi$ is some risk functional, then for ranking with training points $(\bar{S}_i, \{y_{i,1}, \ldots, y_{i,m}\})$ and $\bar{S}_i = [x_{i,1}, \ldots, x_{i,m}]$, the corresponding learning method becomes
$$ \hat{f} = \arg\min_{f \in \bar{F}} \sum_{i=1}^n \sum_{j=1}^m y_{i,j}\, \Phi(f(\bar{S}_i), j), $$
where $\bar{F}$ consists of vector functions
$$ f(\bar{S}) = [f(x_1, S), \ldots, f(x_m, S)], $$
and $S = \{x_1, \ldots, x_m\}$ is the unordered set. A concrete example is maximum entropy (multi-category logistic regression), which employs the loss
$$ \Phi(f(\bar{S}), j) = -f(x_j, S) + \ln \sum_{k=1}^m e^{f(x_k, S)}. $$

B. Regression based learning

Since in ranking problems $y_{i,j}$ can take values other than $0$ or $1$, we can have more general formulations than multi-category classification. In particular, we may consider variations of the following regression based learning method to train a scoring function in $F \subset \{X \times S \to \mathbb{R}\}$:
$$ \hat{f} = \arg\min_{f \in F} \sum_{i=1}^n \sum_{j=1}^m \phi(f(x_{i,j}, S_i), y_{i,j}), \qquad S_i = \{x_{i,1}, \ldots, x_{i,m}\} \in \mathcal{S}, \qquad (6) $$
where we assume that
$$ \phi(a, b) = \phi_0(a) + \phi_1(a)\, b + \phi_2(b). $$
The estimation formulation is decoupled for each element $x_{i,j}$ in a subset $S_i$, which makes the problem easier to solve. In this method, each training point $((x_{i,j}, S_i), y_{i,j})$ is treated as a single sample (for $i = 1, \ldots, n$ and $j = 1, \ldots, m$). The population version of the risk function is
$$ \mathbf{E}_S \sum_{x \in S} \left[ \phi_0(f(x, S)) + \phi_1(f(x, S))\, \mathbf{E}_{y|(x,S)}\, y + \mathbf{E}_{y|(x,S)}\, \phi_2(y) \right]. $$
This implies that the optimal population solution is a function that minimizes
$$ \phi_0(f(x, S)) + \phi_1(f(x, S))\, \mathbf{E}_{y|(x,S)}\, y, $$
which is a function of $\mathbf{E}_{y|(x,S)}\, y$. Therefore the estimation method in (6) leads to an estimator of the conditional expectation with a reasonable choice of $\phi_0(\cdot)$ and $\phi_1(\cdot)$. A simple example is the least squares method, where we pick $\phi_0(a) = a^2$, $\phi_1(a) = -2a$ and $\phi_2(b) = b^2$. That is, the learning method (6) becomes least squares estimation:
$$ \hat{f} = \arg\min_{f \in F} \sum_{i=1}^n \sum_{j=1}^m (f(x_{i,j}, S_i) - y_{i,j})^2. \qquad (7) $$
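To make the decoupled structure of (7) concrete, here is a minimal sketch, not taken from the paper: with a linear scoring function, the subset structure plays no role in the fit itself, and (7) reduces to ordinary least squares over all $n \cdot m$ item-level points. The synthetic features, weights, and grades are hypothetical.

```python
import numpy as np

# Illustration of the decoupled least squares formulation (7): every pair
# ((x_{i,j}, S_i), y_{i,j}) is an independent training point, so with a
# linear scoring function f(x, S) = w^T x the fit is ordinary least squares
# over all n*m flattened rows. Features, weights, and grades are hypothetical.
rng = np.random.default_rng(0)
n, m, d = 50, 5, 3                               # subsets, items per subset, features
X = rng.normal(size=(n, m, d))                   # x_{i,j}
w_true = np.array([1.0, -0.5, 0.25])
Y = X @ w_true + 0.1 * rng.normal(size=(n, m))   # grades y_{i,j}

# Solve min_w sum_{i,j} (w^T x_{i,j} - y_{i,j})^2 after flattening subsets.
w_hat, *_ = np.linalg.lstsq(X.reshape(-1, d), Y.reshape(-1), rcond=None)

# The fitted scores induce the rank-list of each subset S_i by sorting
# f(x_{i,j}, S_i) in descending order.
ranking_of_S0 = np.argsort(-(X[0] @ w_hat))
```

Only the induced ordering of each subset matters for ranking, so any monotone transformation of the fitted scores would produce the same rank-list.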

This method, and some essential variations which we will introduce later, will be the focus of our analysis. It was shown in [7] that the only loss function with the conditional expectation as the minimizer (for an arbitrary conditional distribution of $y$) is least squares. However, for practical purposes, we only need to estimate a monotone transformation of the conditional expectation. For this purpose, we can use additional loss functions of the form (6). In particular, let $\phi_0(a)$ be an arbitrary convex function such that $\phi_0'(a)$ is a monotone increasing function of $a$; then we may simply take the function $\phi(a, b) = \phi_0(a) - ab$ in (6). The optimal population solution is uniquely determined by $\phi_0'(f(x, S)) = \mathbf{E}_{y|(x,S)}\, y$. A simple example is $\phi_0(a) = a^4/4$, for which the population optimal solution is $f(x, S) = (\mathbf{E}_{y|(x,S)}\, y)^{1/3}$. Clearly such a transformation does not affect the ranking. Moreover, in many ranking problems, the value of $y$ is bounded.

It is known that additional loss functions can be used for computing the conditional expectation. As a simple example, if we assume that $y \in [0, 1]$, then the following modified least squares can be used:
$$ \hat{f} = \arg\min_{f \in F} \sum_{i=1}^n \sum_{j=1}^m \left[ (1 - y_{i,j}) \max(0, f(x_{i,j}, S_i))^2 + y_{i,j} \max(0, 1 - f(x_{i,j}, S_i))^2 \right]. \qquad (8) $$
One may replace this with other loss functions used for binary classification that estimate conditional probability, such as those discussed in [31]. Although such general formulations might be interesting for certain applications, their advantages over the simpler least squares loss of (7) are not completely certain, and they are more complicated to deal with. Therefore we will not consider such general formulations in this paper, but rather focus on adapting the least squares method in (7) to ranking problems. As we shall see, non-trivial modifications of (7) are necessary to optimize system performance near the top of the rank-list.

C. Pair-wise preference learning

A popular idea in the recent machine learning literature is to pose the ranking problem as a pair-wise preference relationship learning problem (see Section II). Using this idea, the scoring function for subset ranking can be trained by the following method:
$$ \hat{f} = \arg\min_{f \in F} \sum_{i=1}^n \sum_{(j,j') \in E_i} \phi(f(x_{i,j}, S_i), f(x_{i,j'}, S_i); y_{i,j}, y_{i,j'}), \qquad (9) $$
where each $E_i = \{(j, j') : y_{i,j} < y_{i,j'}\}$ is a subset of $\{1, \ldots, m\} \times \{1, \ldots, m\}$. For example, we may use a non-increasing monotone function $\phi_0$ and let $\phi(a_1, a_2; b_1, b_2) = \phi_0((a_2 - a_1) - (b_2 - b_1))$ or $\phi(a_1, a_2; b_1, b_2) = (b_2 - b_1)\, \phi_0(a_2 - a_1)$. Example loss functions include the SVM loss $\phi_0(x) = \max(0, 1 - x)$ and the AdaBoost loss $\phi_0(x) = \exp(-x)$ (see [12], [15], [24]). The approach works well if the ranking problem is noise-free (that is, $y_{i,j}$ is deterministic). However, one difficulty with this approach is that if $y_{i,j}$ is stochastic, then the corresponding population estimator from (9) may not be Bayes optimal, unless a more complicated scheme such as [8] is used. It would be interesting to investigate the error of such an approach, but the analysis is beyond the scope of this paper.

One argument used by the advocates of the pair-wise learning formulation is that we do not have to learn an absolute grade judgment (or its expectation), but rather only the relative judgment that one item is better than another. In essence, this means that for each subset $S$, if we shift each judgment by a constant, the ranking is not affected. If invariance with respect to a set-dependent judgment shift is a desirable property, then it can be incorporated into the regression based model [28]. For example, we may introduce an explicit set-dependent shift feature (which is rank-preserving) into (6):
$$ \hat{f} = \arg\min_{f \in F} \sum_{i=1}^n \min_{b_i \in \mathbb{R}} \sum_{j=1}^m \phi(f(x_{i,j}, S_i) + b_i, y_{i,j}). $$
In particular, for least squares, we have the following method:
$$ \hat{f} = \arg\min_{f \in F} \sum_{i=1}^n \min_{b_i \in \mathbb{R}} \sum_{j=1}^m (f(x_{i,j}, S_i) + b_i - y_{i,j})^2. \qquad (10) $$
More generally, we may also introduce more sophisticated set-dependent features (such as set-dependent scaling factors) and even hierarchical set-dependent models into the regression formulation. This general approach can remove many limitations of the standard regression method when compared to the pair-wise approach.

V. CONVEX SURROGATE BOUNDS

The subset ranking problem defined in Section III is combinatorial in nature, and is very difficult to solve directly. Since the optimal Bayes ranking rule is given by the conditional expectation, in Section IV we discussed various formulations to estimate the conditional expectation; in particular, we are interested in least squares regression based methods. In this context, a natural question to ask is: if a scoring function approximately minimizes the regression error, how well can it optimize

ranking metrics such as DCG or T. This section provides some theoretical results that relate the optimization of the ranking metrics defined in Section III to the minimization of regression errors. This allows us to design appropriate convex learning formulations that improve on the simple least squares methods in (7) and (10).

Definition 2: A scoring function $f(x, S)$ maps each $x \in S$ to a real valued score. It induces a ranking function $r_f$, which ranks the elements $\{x_j\}$ of $S$ in descending order of $f(x_j)$.

We are interested in bounding the DCG performance of $r_f$ compared with that of $f_B$. This can be regarded as an extension of Theorem 1 that motivates regression based learning.

Theorem 2: Let $f(x, S)$ be a real-valued scoring function, which induces a ranking function $r_f$. Consider a pair $p, q \in [1, \infty]$ such that $1/p + 1/q = 1$. We have the following relationship for each $S = \{x_1, \ldots, x_m\}$:
$$ \mathrm{DCG}(r_B, S) - \mathrm{DCG}(r_f, S) \le 2 \left( \sum_{i=1}^m c_i^p \right)^{1/p} \left( \sum_{j=1}^m |f(x_j, S) - f_B(x_j, S)|^q \right)^{1/q}. $$

Proof: Let $S = \{x_1, \ldots, x_m\}$, $r_f(S) = J = [j_1, \ldots, j_m]$, and $r_B(S) = J_B = [j_1^*, \ldots, j_m^*]$. We have
$$ \begin{aligned} \mathrm{DCG}(r_f, S) &= \sum_{i=1}^m c_i f_B(x_{j_i}, S) \\ &= \sum_{i=1}^m c_i f(x_{j_i}, S) + \sum_{i=1}^m c_i \left( f_B(x_{j_i}, S) - f(x_{j_i}, S) \right) \\ &\ge \sum_{i=1}^m c_i f(x_{j_i^*}, S) + \sum_{i=1}^m c_i \left( f_B(x_{j_i}, S) - f(x_{j_i}, S) \right) \\ &= \sum_{i=1}^m c_i f_B(x_{j_i^*}, S) + \sum_{i=1}^m c_i \left( f(x_{j_i^*}, S) - f_B(x_{j_i^*}, S) \right) + \sum_{i=1}^m c_i \left( f_B(x_{j_i}, S) - f(x_{j_i}, S) \right) \\ &\ge \mathrm{DCG}(r_B, S) - \sum_{i=1}^m c_i \left( f(x_{j_i}, S) - f_B(x_{j_i}, S) \right)_+ - \sum_{i=1}^m c_i \left( f_B(x_{j_i^*}, S) - f(x_{j_i^*}, S) \right)_+ \\ &\ge \mathrm{DCG}(r_B, S) - 2 \left( \sum_{i=1}^m c_i^p \right)^{1/p} \left( \sum_{j=1}^m |f(x_j, S) - f_B(x_j, S)|^q \right)^{1/q}, \end{aligned} $$
where we used the notation $(z)_+ = \max(0, z)$. The first inequality in the above derivation is a direct consequence of the definition of $J = r_f(S)$, which implies that $J = [j_1, \ldots, j_m]$ achieves the maximum value of $\sum_{i=1}^m c_i f(x_{j_i}, S)$ among all possible permutations $[j_1, \ldots, j_m]$ of $[1, \ldots, m]$ (see the proof of Theorem 1). The last inequality is due to Hölder's inequality.

The above theorem shows that the DCG criterion can be bounded through regression error. Although the theorem applies to any pair of $p$ and $q$ such that $1/p + 1/q = 1$, the most useful case is $p = q = 2$: in this case, the problem of minimizing $\sum_{j=1}^m (f(x_j, S) - f_B(x_j, S))^2$ can be directly attacked using the least squares regression in (7). If the regression error goes to zero, then the resulting ranking converges to the optimal DCG. Similarly, we can show the following result for the T criterion.

Theorem 3: Let $f(x, S)$ be a real-valued scoring function, which induces a ranking function $r_f$. We have the following relationship for each $S = \{x_1, \ldots, x_m\}$:
$$ \mathrm{T}(r_B', S) - \mathrm{T}(r_f, S) \le \frac{4}{\sqrt{m}} \left( \sum_{j=1}^m (f(x_j, S) - f_B(x_j, S))^2 \right)^{1/2}, $$
where $r_B'$ is an optimal Bayes ranking function for the T criterion, as characterized in Theorem 1.

Proof: Let $S = \{x_1, \ldots, x_m\}$, $r_f(S) = J = [j_1, \ldots, j_m]$, and $r_B'(S) = J_B = [j_1^*, \ldots, j_m^*]$. We have
$$ \begin{aligned} \mathrm{T}(r_f, S) &= \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left( f_B(x_{j_i}, S) - f_B(x_{j_{i'}}, S) \right) \\ &= \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left( f(x_{j_i}, S) - f(x_{j_{i'}}, S) \right) - \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left[ \left( f(x_{j_i}, S) - f_B(x_{j_i}, S) \right) - \left( f(x_{j_{i'}}, S) - f_B(x_{j_{i'}}, S) \right) \right] \\ &\ge \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left( f(x_{j_i}, S) - f(x_{j_{i'}}, S) \right) - \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left[ |f(x_{j_i}, S) - f_B(x_{j_i}, S)| + |f(x_{j_{i'}}, S) - f_B(x_{j_{i'}}, S)| \right]. \end{aligned} $$

By simplifying the last term, we have
$$ \begin{aligned} \mathrm{T}(r_f, S) &\ge \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left( f(x_{j_i}, S) - f(x_{j_{i'}}, S) \right) - \frac{2}{m} \sum_{i=1}^m |f(x_i, S) - f_B(x_i, S)| \\ &\ge \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left( f(x_{j_i^*}, S) - f(x_{j_{i'}^*}, S) \right) - \frac{2}{m} \sum_{i=1}^m |f(x_i, S) - f_B(x_i, S)| \\ &= \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left( f_B(x_{j_i^*}, S) - f_B(x_{j_{i'}^*}, S) \right) - \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left[ \left( f_B(x_{j_i^*}, S) - f(x_{j_i^*}, S) \right) - \left( f_B(x_{j_{i'}^*}, S) - f(x_{j_{i'}^*}, S) \right) \right] - \frac{2}{m} \sum_{i=1}^m |f(x_i, S) - f_B(x_i, S)| \\ &\ge \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left( f_B(x_{j_i^*}, S) - f_B(x_{j_{i'}^*}, S) \right) - \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m \left[ |f_B(x_{j_i^*}, S) - f(x_{j_i^*}, S)| + |f_B(x_{j_{i'}^*}, S) - f(x_{j_{i'}^*}, S)| \right] - \frac{2}{m} \sum_{i=1}^m |f(x_i, S) - f_B(x_i, S)| \\ &\ge \mathrm{T}(r_B', S) - \frac{4}{m} \sum_{i=1}^m |f(x_i, S) - f_B(x_i, S)|. \end{aligned} $$
The first inequality is a simplification of the last inequality in the previous chain of derivations. The second inequality in the above derivation is a direct consequence of the definition of $J = r_f(S)$, which implies that $J = [j_1, \ldots, j_m]$ achieves the maximum value of $\mathrm{T}(J, [f(x_i, S)]) = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{i'=i+1}^m (f(x_{j_i}, S) - f(x_{j_{i'}}, S))$ among all possible permutations of $[1, \ldots, m]$ (see the proof of Theorem 1). Now, Jensen's inequality implies that
$$ \frac{4}{m} \sum_{i=1}^m |f(x_i, S) - f_B(x_i, S)| \le \frac{4}{\sqrt{m}} \left( \sum_{i=1}^m (f(x_i, S) - f_B(x_i, S))^2 \right)^{1/2}. $$
Combining this estimate with the previous inequality, we obtain the desired bound.

The above approximation bounds imply that least squares regression can be used to learn the optimal ranking functions: the approximation error converges to zero when $f$ converges to $f_B$ in $L_2$. In general, however, requiring $f$ to converge to $f_B$ in $L_2$ is not necessary. More importantly, in real applications we are often only interested in the top portion of the rank-list, and our bounds should reflect this practical consideration. Assume that the coefficients $c_i$ in the DCG criterion decay fast, so that $\sum_i c_i$ is bounded (independent of $m$). In this case, we may pick $p = 1$ and $q = \infty$ in Theorem 2. If $\sup_j |f(x_j, S) - f_B(x_j, S)|$ is small, then we obtain a better bound than the least squares error bound with $p = q = 2$, which depends on $m$.

However, we cannot ensure that $\sup_j |f(x_j, S) - f_B(x_j, S)|$ is small using the simple least squares estimation in (7). Therefore in the following we develop a more refined bound for the DCG metric, which will then be used to motivate practical learning methods that improve on the simple least squares method.

Theorem 4: Let $f(x, S)$ be a real-valued scoring function, which induces a ranking function $r_f$. Given $S = \{x_1, \ldots, x_m\}$, let the optimal ranking order be $J_B = [j_1^*, \ldots, j_m^*]$, where $f_B(x_{j_i^*})$ is arranged in non-increasing order. Assume that $c_i = 0$ for all $i > k$. Then we have the following relationship for all $\gamma \in (0, 1)$, $p, q \ge 1$ such that $1/p + 1/q = 1$, $u > 0$, and every subset $K \subset \{1, \ldots, m\}$ that contains $j_1^*, \ldots, j_k^*$:
$$ \mathrm{DCG}(r_B, S) - \mathrm{DCG}(r_f, S) \le C_p(\gamma, u) \left( \sum_{j \in K} |f(x_j, S) - f_B(x_j, S)|^q + u \sup_{j \notin K} \left( f(x_j, S) - f_B'(x_j, S) \right)_+^q \right)^{1/q}, $$
where $(z)_+ = \max(z, 0)$, $M = f_B(x_{j_k^*}, S)$,
$$ C_p(\gamma, u) = \frac{2}{1 - \gamma} \left( \sum_{i=1}^k c_i^p + u^{-p/q} \left( \sum_{i=1}^k c_i \right)^p \right)^{1/p}, \qquad f_B'(x_j, S) = f_B(x_j, S) + \gamma \left( M - f_B(x_j, S) \right)_+. $$
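Before turning to the proof, the constant in Theorem 4 is easy to compute directly. The helper below is a sketch (its name is ours, not the paper's); it makes the trade-off in $u$ visible: a larger $u$ shrinks the $u^{-p/q}$ term of $C_p(\gamma, u)$, at the price of a heavier $\sup$ term inside the bound itself.

```python
# Constant C_p(gamma, u) from Theorem 4, assuming c_i = 0 for i > k and
# 1/p + 1/q = 1 with p > 1. The helper name is ours, not the paper's.
def theorem4_constant(c, gamma, u, p):
    """Return 2/(1-gamma) * (sum c_i^p + u^{-p/q} * (sum c_i)^p)^{1/p}."""
    q = p / (p - 1.0)
    body = sum(ci ** p for ci in c) + u ** (-p / q) * sum(c) ** p
    return 2.0 / (1.0 - gamma) * body ** (1.0 / p)
```

With $p = q = 2$ this is the constant $C_2(\gamma, u)$ used in the specialization (11) later in this section.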

Proof: Let $S = \{x_1, \ldots, x_m\}$, $r_f(S) = J = [j_1, \ldots, j_m]$, and $r_B(S) = J_B = [j_1^*, \ldots, j_m^*]$. Since $(f_B(x_{j_i^*}, S) - M)_+$ is in descending order of $i$, $[j_i^*]$ achieves the maximum of $\sum_{i=1}^m c_i (f_B(x_{j_i}, S) - M)_+$ among all possible permutations $[j_i]$ of $[1, \ldots, m]$. Therefore, using $c_i = 0$ ($i > k$), we have
$$ \sum_{i=1}^m c_i \left( (f_B(x_{j_i^*}, S) - M) - (f_B(x_{j_i}, S) - M)_+ \right) = \sum_{i=1}^m c_i \left( (f_B(x_{j_i^*}, S) - M)_+ - (f_B(x_{j_i}, S) - M)_+ \right) \ge 0. $$
This implies that
$$ \begin{aligned} \mathrm{DCG}(r_B, S) - \mathrm{DCG}(r_f, S) &= \sum_{i=1}^m c_i \left( (f_B(x_{j_i^*}, S) - M) - (f_B(x_{j_i}, S) - M) \right) \\ &= \sum_{i=1}^m c_i \left( (f_B(x_{j_i^*}, S) - M) - (f_B(x_{j_i}, S) - M)_+ \right) + \sum_{i=1}^m c_i \left( M - f_B(x_{j_i}, S) \right)_+ \\ &\le \frac{1}{1 - \gamma} \left[ \sum_{i=1}^m c_i \left( (f_B(x_{j_i^*}, S) - M) - (f_B(x_{j_i}, S) - M)_+ \right) + (1 - \gamma) \sum_{i=1}^m c_i \left( M - f_B(x_{j_i}, S) \right)_+ \right] \\ &= \frac{1}{1 - \gamma} \left[ \sum_{i=1}^m c_i \left( (f_B(x_{j_i^*}, S) - M) - (f_B(x_{j_i}, S) - M) \right) - \gamma \sum_{i=1}^m c_i \left( M - f_B(x_{j_i}, S) \right)_+ \right]. \end{aligned} $$
By using the definition of $f_B'$ to simplify the last inequality, we obtain:
$$ \begin{aligned} \mathrm{DCG}(r_B, S) - \mathrm{DCG}(r_f, S) &\le \frac{1}{1 - \gamma} \sum_{i=1}^m c_i \left( (f_B(x_{j_i^*}, S) - M) - (f_B'(x_{j_i}, S) - M) \right) \\ &\le \frac{1}{1 - \gamma} \left[ \sum_{i=1}^m c_i \left( f_B(x_{j_i^*}, S) - f(x_{j_i^*}, S) \right) - \sum_{i=1}^m c_i \left( f_B'(x_{j_i}, S) - f(x_{j_i}, S) \right) \right] \\ &\le \frac{1}{1 - \gamma} \left[ \sum_{i=1}^k c_i \left( f_B(x_{j_i^*}, S) - f(x_{j_i^*}, S) \right)_+ + \sum_{i=1}^k c_i \left( f(x_{j_i}, S) - f_B'(x_{j_i}, S) \right)_+ \right] \\ &\le \frac{1}{1 - \gamma} \left[ \sum_{i=1}^k c_i \left( f_B(x_{j_i^*}, S) - f(x_{j_i^*}, S) \right)_+ + \sum_{i=1}^k c_i \left( f(x_{j_i}, S) - f_B'(x_{j_i}, S) \right)_+ I(j_i \in K) + \left( \sum_{i=1}^k c_i \right) \sup_{j \notin K} \left( f(x_j, S) - f_B'(x_j, S) \right)_+ \right] \\ &\le \frac{1}{1 - \gamma} \left[ \left( \sum_{i=1}^k c_i^p \right)^{1/p} \left( \left( \sum_{j \in K} \left( f_B(x_j, S) - f(x_j, S) \right)_+^q \right)^{1/q} + \left( \sum_{j \in K} \left( f(x_j, S) - f_B'(x_j, S) \right)_+^q \right)^{1/q} \right) + \left( \sum_{i=1}^k c_i \right) \sup_{j \notin K} \left( f(x_j, S) - f_B'(x_j, S) \right)_+ \right] \\ &\le \frac{1}{1 - \gamma} \left[ 2 \left( \sum_{i=1}^k c_i^p \right)^{1/p} \left( \sum_{j \in K} |f_B(x_j, S) - f(x_j, S)|^q \right)^{1/q} + \left( \sum_{i=1}^k c_i \right) \sup_{j \notin K} \left( f(x_j, S) - f_B'(x_j, S) \right)_+ \right]. \end{aligned} $$
In the above derivation, the second inequality uses the fact that $J = [j_i]$ achieves the maximum value of $\sum_{i=1}^m c_i f(x_{j_i}, S)$ among all possible permutations $[j_i]$ of $[1, \ldots, m]$. Hölder's inequality has been applied to obtain the second to last inequality. The last inequality uses the fact that $f_B'(x_j, S) \ge f_B(x_j, S)$, which implies $(f(x_j, S) - f_B'(x_j, S))_+^q \le (f(x_j, S) - f_B(x_j, S))_+^q$. From the last inequality, we can apply Hölder's inequality once more to obtain the desired bound.
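The bounds in this section lend themselves to quick numerical sanity checks. Below is a sketch for the $p = q = 2$ case of Theorem 2 on a tiny example; the helper names, grades, and scores are ours (hypothetical), not from the paper.

```python
import math

# Sanity check of Theorem 2 with p = q = 2:
#   DCG(r_B,S) - DCG(r_f,S)
#     <= 2 (sum_i c_i^2)^{1/2} (sum_j (f(x_j,S) - f_B(x_j,S))^2)^{1/2}.
# The grades and scores below are hypothetical.
def induced_dcg(c, scores, fB):
    """DCG (measured against f_B) of the ranking sorted by descending score."""
    order = sorted(range(len(fB)), key=lambda j: -scores[j])
    return sum(ci * fB[j] for ci, j in zip(c, order))

c = [1.0, 1.0 / math.log2(3.0), 0.5]
fB = [3.0, 1.0, 0.0]     # conditional expectations f_B(x_j, S)
f = [2.0, 2.5, 0.5]      # an imperfect scoring function: swaps the top two

gap = induced_dcg(c, fB, fB) - induced_dcg(c, f, fB)
bound = 2.0 * math.sqrt(sum(ci * ci for ci in c)) * \
        math.sqrt(sum((a - b) ** 2 for a, b in zip(f, fB)))
```

Here the gap is strictly positive (the scoring function inverts the top two items) yet stays below the regression-error bound, as the theorem guarantees.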

If $f_B(x_j, S) \ge 0$ for all $x_j \in S$, then $f_B'(x_j, S) \ge \gamma f_B(x_{j_k^*}, S) = \gamma M$. Therefore, in this case we have (with $p = q = 2$):
$$ \mathrm{DCG}(r_B, S) - \mathrm{DCG}(r_f, S) \le C_2(\gamma, u) \left( \sum_{j \in K} |f(x_j, S) - f_B(x_j, S)|^2 + u \sup_{j \notin K} \left( f(x_j, S) - \gamma M \right)_+^2 \right)^{1/2}. \qquad (11) $$
Intuitively, the bound says the following: if we can find a set $K$ such that we are certain that any item $j \notin K$ is irrelevant, then we only need to estimate the quality scores of the items $j \in K$ reliably (using least squares regression), while making sure that the score $f(x_j, S)$ of any irrelevant item $j \notin K$ is no more than $\gamma M$ (which is a much easier task than estimating the expected quality score). If $|K| \ll m$, then the bound is a significant improvement over the standard least squares bound in Theorem 2, because the right hand side depends only on the sum of least squares errors over the $|K|$ error terms instead of $m$ error terms. Note that in standard least squares, we try to estimate the quality scores uniformly well. Although idealized, this assumption is quite reasonable for web-search, because given a query $q$, most pages can be reliably determined to be irrelevant. The remaining pages, which we are uncertain about, form the set $K$ that is much smaller than the total number of pages $m$. This bound motivates an importance weighted regression formulation, which we consider in Section VI. In practical implementations, we do not have to set $\gamma M$ as in the theorem, but rather regard it as a tuning parameter.

The bound in Theorem 4 can still be refined. However, the resulting inequalities become more complicated, so we will not include such bounds in this paper. Similar to Theorem 4, such refined bounds show that we do not have to estimate the conditional expectation uniformly well. We present a simple example as an illustration.

Proposition 1: Consider $m = 3$ and $S = \{x_1, x_2, x_3\}$. Let $c_1 = 2$, $c_2 = 1$, $c_3 = 0$, and $f_B(x_1, S) = 1$, $f_B(x_2, S) = f_B(x_3, S) = 0$. Let $f(x, S)$ be a real-valued scoring function, which induces a ranking function $r_f$. Then
$$ \mathrm{DCG}(r_B, S) - \mathrm{DCG}(r_f, S) \le 2 |f(x_1, S) - f_B(x_1, S)| + |f(x_2, S) - f_B(x_2, S)| + |f(x_3, S) - f_B(x_3, S)|. $$
The coefficients on the right hand side cannot be improved.

Proof: Note that $f$ is suboptimal only when either $f(x_2, S) \ge f(x_1, S)$ or $f(x_3, S) \ge f(x_1, S)$. This gives the following bound:
$$ \begin{aligned} \mathrm{DCG}(r_B, S) - \mathrm{DCG}(r_f, S) &\le I(f(x_2, S) \ge f(x_1, S)) + I(f(x_3, S) \ge f(x_1, S)) \\ &\le I(|f(x_2, S) - f_B(x_2, S)| + |f(x_1, S) - f_B(x_1, S)| \ge 1) + I(|f(x_3, S) - f_B(x_3, S)| + |f(x_1, S) - f_B(x_1, S)| \ge 1) \\ &\le \left[ |f(x_2, S) - f_B(x_2, S)| + |f(x_1, S) - f_B(x_1, S)| \right] + \left[ |f(x_3, S) - f_B(x_3, S)| + |f(x_1, S) - f_B(x_1, S)| \right] \\ &= 2 |f(x_1, S) - f_B(x_1, S)| + |f(x_2, S) - f_B(x_2, S)| + |f(x_3, S) - f_B(x_3, S)|. \end{aligned} $$
In the above, $I(\cdot)$ is the set indicator function. To see that the coefficients cannot be improved, we simply note that the bound is tight when either $f(x_1, S) = f(x_2, S) = f(x_3, S) = 0$, or $f(x_1, S) = f(x_2, S) = 1$ and $f(x_3, S) = 0$, or $f(x_2, S) = 0$ and $f(x_1, S) = f(x_3, S) = 1$.

The proposition does not make the same assumption as in Theorem 4, where we assume that there are many irrelevant items that can be reliably identified. The proposition implies that even in more general situations, we should not weight all errors equally. In this example, getting $x_1$ right is more important than getting $x_2$ or $x_3$ right. Conceptually, Theorem 4 and Proposition 1 show the following:

• Since we are interested in the top portion of the rank-list, we only need to estimate the top rated items accurately, while preventing the bottom items from being over-estimated (their conditional expectations do not have to be estimated accurately).
• For ranking purposes, some points are more important than others. Therefore we should bias our learning method to produce more accurate conditional expectation estimates at the more important points.

VI. IMPORTANCE WEIGHTED REGRESSION

The key message from the analysis in Section V is that we do not have to estimate the conditional expectations equally well for all items. In particular, since we are interested in the top portion of the rank-list, Theorem 4 implies that we need to estimate the top portion more accurately than the bottom portion. Motivated by this analysis, we consider a regression based training method that solves the DCG optimization problem but weights different points differently according to their importance. We shall not discuss the implementation details of modeling the function $f(x, S)$, which is beyond the scope of this paper.
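Proposition 1 can also be stress-tested directly. The following randomized sketch (the score generator is a hypothetical test harness of ours) verifies that the stated coefficients always dominate the DCG gap:

```python
import random

# Randomized check of Proposition 1: with c = (2, 1, 0) and
# f_B = (1, 0, 0), any scoring f = (f1, f2, f3) satisfies
#   DCG(r_B,S) - DCG(r_f,S) <= 2|f1 - 1| + |f2| + |f3|.
# The random score generator is a hypothetical test harness.
c = [2.0, 1.0, 0.0]
fB = [1.0, 0.0, 0.0]

def induced_dcg(scores):
    order = sorted(range(3), key=lambda j: -scores[j])
    return sum(ci * fB[j] for ci, j in zip(c, order))

optimal = induced_dcg(fB)       # = 2, attained by ranking x_1 first
random.seed(1)
worst_slack = min(
    (2.0 * abs(f[0] - 1.0) + abs(f[1]) + abs(f[2])) - (optimal - induced_dcg(f))
    for f in ([random.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(2000))
)
```

A non-negative `worst_slack` over many random scorings is consistent with the proposition; the tightness cases listed in the proof show the slack can actually reach zero.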

Let $F$ be a function space that contains functions $X \times S \to \mathbb{R}$. We draw $n$ sets $S_1, \ldots, S_n$ randomly, where $S_i = \{x_{i,1}, \ldots, x_{i,m}\}$, with the corresponding grades $\{y_{i,j}\} = \{y_{i,1}, \ldots, y_{i,m}\}$ (in this notation, $i$ is fixed and the set $\{y_{i,j}\}$ ranges over $j = 1, \ldots, m$). Based on Theorem 2, the simple least squares regression (7) can be applied. However, this direct regression method is not adequate for many practical problems, such as web-search, in which there are many items to rank (that is, $m$ is large) but only the top ranked pages are important. This is because the method pays equal attention to relevant and irrelevant pages. In reality, one should pay more attention to the top-ranked (relevant) pages. The grades of lower ranked pages do not need to be estimated accurately, as long as we do not over-estimate them so that these pages appear in the top ranked positions.

The above mentioned intuition can be captured by Theorem 4 and Proposition 1, which motivate the following alternative training method:
$$ \hat{f} = \arg\min_{f \in F} \frac{1}{n} \sum_{i=1}^n L(f, S_i, \{y_{i,j}\}), \qquad (12) $$
where for $S = \{x_1, \ldots, x_m\}$, with the corresponding $\{y_j\}$, we have the following importance weighted regression loss as in equation (11):
$$ L(f, S, \{y_j\}) = \sum_{j=1}^m w(x_j, S) \left( f(x_j, S) - y_j \right)^2 + u \sup_j w'(x_j, S) \left( f(x_j, S) - \delta(S) \right)_+^2, \qquad (13) $$
where $u$ is a non-negative parameter. Throughout this section, an abstract notion of a weight function $w(x_j, S)$ is employed; this may equivalently represent a re-weighting of either the training sample probability distribution or the loss function. The number $\delta(S)$ is a set-dependent threshold that corresponds to $\gamma M$ in (11). However, since $\gamma M$ is not known, in practice $\delta(S)$ may be considered a tuning parameter, and a good choice may be determined with heuristics or cross-validation.

A variation of this method is used to optimize the production system of Yahoo's internet search engine. The detailed implementation and parameter choices cannot be disclosed here (some aspects of the implementation were covered in [10]), and they are also irrelevant for the purpose of this paper. However, in the following we shall briefly explain the intuition behind (13) using Theorem 4, together with some practical considerations.

It is useful to mention that the distribution of web search relevance is heavily skewed, in the sense that relevance is a "rare event". In fact, empirically we observe that the distribution of a regression-based scoring function $f$ approximately fits a probability model with exponential decay of the form $P(f > t) < \exp(-at + b)$. The significance is that the improved rate of convergence to the optimal DCG over naive uniform regression is due to both the relative importance of top ranked documents in the DCG cost function and the exponentially small probability of the relevant documents. In order to boost the presence of relevant documents, which are of maximal importance to DCG, the reader should bear in mind that importance can be used as a component in the weighting scheme $w(x_j, S)$.

The weight function $w(x_j, S)$ in (13) is chosen so that it focuses only on the most important examples (the weight is set to zero for pages that we know are irrelevant). This part of the formulation corresponds to the first part of the bound in Theorem 4 (in that case, we choose $w(x_j, S)$ to be one for the top part of the examples, with index set $K$, and zero otherwise). The usefulness of non-uniform weighting is also demonstrated by Proposition 1. The specific choice of the weight function requires various engineering considerations that are not important for the purpose of this paper. In general, if there are many items with similar grades, then it is beneficial to give each of the similar items a smaller weight.

In the second part of (13), we choose $w'(x_j, S)$ so that it focuses on the examples not covered by $w(x_j, S)$. In particular, it only covers those data points $x_j$ that are low-ranked with high confidence. We choose $\delta(S)$ to be a small threshold that can be regarded as a lower bound on $\gamma M$ in (11). An important observation is that although $m$ is often very large, the number of points for which $w(x_j, S)$ is nonzero is often small. Moreover, $(f(x_j, S) - \delta(S))_+$ is nonzero only when $f(x_j, S) \ge \delta(S)$, and in practice the number of such points is usually small (that is, most irrelevant pages will be predicted as irrelevant). Therefore the formulation completely ignores those low-ranked data points with $f(x_j, S) \le \delta(S)$. This makes the learning procedure computationally efficient even when $m$ is large. The analogy here is support vector machines, where only the support vectors are useful in the learning formulation, and one can completely ignore samples corresponding to non support vectors.

In the practical implementation of (13), we can use an iterative refinement scheme, in which we start with a small number of samples included in the first part of (13), and then put the low-ranked points into the second part of (13) only when their ranking scores exceed $\delta(S)$. In fact, one may also put these points into the first part of (13), so that the second part always has zero value (which makes the implementation simpler). In this sense,

the formulation in (13) suggests a selective sampling scheme, in which we pay special attention to important and highly ranked data points, while completely ignoring most of the low ranked data points. In this regard, with an appropriately chosen $w(x, S)$, the second part of (13) can be completely ignored.

The empirical risk minimization method in (12) approximately minimizes the following criterion:
$$ Q(f) = \mathbf{E}_S\, L(f, S), \qquad (14) $$
where
$$ L(f, S) = \mathbf{E}_{\{y_j\}|S}\, L(f, S, \{y_j\}) = \sum_{j=1}^m w(x_j, S)\, \mathbf{E}_{y_j|(x_j,S)} \left( f(x_j, S) - y_j \right)^2 + u \sup_j w'(x_j, S) \left( f(x_j, S) - \delta(S) \right)_+^2. $$
The following theorem shows that under appropriate assumptions, approximate minimization of (14) leads to approximate optimization of DCG. For clarity, the assumptions are idealized. For example, in practice the condition $\delta(S) \le \gamma f_B(x_{j_k^*}, S)$ may be violated for some $S$; however, as long as it holds for most $S$, the consequence of the theorem remains approximately valid. In this regard, the theorem itself should only be considered a formal justification of (12) under idealized assumptions that specify good parameter choices in (12). The method itself may still yield good performance when some of the assumptions fail.

Theorem 5: Assume that $c_i = 0$ for all $i > k$, and assume that the following conditions hold for each $S = \{x_1, \ldots, x_m\}$:
• Let the optimal ranking order be $J_B = [j_1^*, \ldots, j_m^*]$, where $f_B(x_{j_i^*})$ is arranged in non-increasing order.
• For all $x_j \in S$, $f_B(x_j, S) \ge 0$.
• There exists $\gamma \in [0, 1)$ such that $\delta(S) \le \gamma f_B(x_{j_k^*}, S)$, where $\delta(S)$ is the appropriately chosen set-dependent threshold in (13).
• For all $x_j$ with $f_B(x_j, S) > \delta(S)$, we have $w(x_j, S) \ge 1$.
• Let $w'(x_j, S) = I(w(x_j, S) < 1)$.

Then the following results hold:
• A function $f_*$ minimizes (14) if $f_*(x_j, S) = f_B(x_j, S)$ when $w(x_j, S) > 0$ and $f_*(x_j, S) \le \delta(S)$ otherwise.
• For all $f$, let $r_f$ be the induced ranking function, and let $r_B$ be the optimal Bayes ranking function. Then
$$ \mathrm{DCG}(r_B) - \mathrm{DCG}(r_f) \le C(\gamma, u) \left( Q(f) - Q(f_*) \right)^{1/2}. $$

Proof: Note that if $f_B(x_j, S) > \delta(S)$, then $w(x_j, S) \ge 1$ and $w'(x_j, S) = 0$. Therefore the minimizer $f_*(x_j, S)$ should minimize $\mathbf{E}_{y_j|(x_j,S)} (f(x_j, S) - y_j)^2$, which is achieved at $f_*(x_j, S) = f_B(x_j, S)$. If $f_B(x_j, S) \le \delta(S)$, then there are two cases:
• If $w(x_j, S) > 0$, then $f_*(x_j, S)$ should minimize $\mathbf{E}_{y_j|(x_j,S)} (f(x_j, S) - y_j)^2$, achieved at $f_*(x_j, S) = f_B(x_j, S)$.
• If $w(x_j, S) = 0$, then $f_*(x_j, S)$ should minimize $(f(x_j, S) - \delta(S))_+^2$, achieved at any $f_*(x_j, S) \le \delta(S)$.

This proves the first claim. For each $S$, denote by $K$ the set of $x_j$ such that $w'(x_j, S) = 0$. The second claim follows from the following derivation:
$$ \begin{aligned} Q(f) - Q(f_*) &= \mathbf{E}_S \left( L(f, S) - L(f_*, S) \right) \\ &= \mathbf{E}_S \left[ \sum_{j=1}^m w(x_j, S) \left( f(x_j, S) - f_B(x_j, S) \right)^2 + u \sup_j w'(x_j, S) \left( f(x_j, S) - \delta(S) \right)_+^2 \right] \\ &\ge \mathbf{E}_S \left[ \sum_{j \in K} \left( f_B(x_j, S) - f(x_j, S) \right)^2 + u \sup_{j \notin K} \left( f(x_j, S) - \delta(S) \right)_+^2 \right] \\ &\ge \mathbf{E}_S \left[ \sum_{j \in K} \left( f_B(x_j, S) - f(x_j, S) \right)^2 + u \sup_{j \notin K} \left( f(x_j, S) - f_B'(x_j, S) \right)_+^2 \right] \\ &\ge \mathbf{E}_S \left( \mathrm{DCG}(r_B, S) - \mathrm{DCG}(r_f, S) \right)^2 C(\gamma, u)^{-2} \\ &\ge \left( \mathrm{DCG}(r_B) - \mathrm{DCG}(r_f) \right)^2 C(\gamma, u)^{-2}. \end{aligned} $$
Note that the second inequality follows from $f_B'(x_j, S) \ge \gamma f_B(x_{j_k^*}, S) \ge \delta(S)$, and the third inequality follows from Theorem 4.

VII. ASYMPTOTIC ANALYSIS

In this section, we analyze the asymptotic statistical performance of (12). The analysis depends on the underlying function class $F$. In the literature, one often employs a linear function class with an appropriate regularization condition, such as $L_1$ or $L_2$ regularization of the linear weight coefficients. Yahoo's machine learning ranking system employs the gradient boosting method described in [13], which is closely related to $L_1$ regularization, analyzed in [4], [18], [19]. Although the

consistency of boosting for the standard least squares regression is known (for example, see [5], [32]), such analysis does not deal with the situation in which $m$ is large, and it is thus not suitable for analyzing the ranking problem considered in this paper.

In this section, we consider a linear function class with $L_2$ regularization, which is closely related to kernel methods. We employ a relatively simple stability analysis which is suitable for $L_2$ regularization. Our result does not depend on $m$ explicitly, which is important for large scale ranking problems such as web-search. Although similar results can be obtained for $L_1$ regularization or gradient boosting, the analysis would become much more complicated.

For $L_2$ regularization, we consider a feature map $\psi : X \times S \to H$, where $H$ is a vector space. We denote by $w^T v$ the $L_2$ inner product of $w$ and $v$ in $H$. The function class $F$ considered here is of the following form:
$$ \{ \beta^T \psi(x, S) ;\ \beta \in H,\ \beta^T \beta \le A^2 \} \subset \{ X \times S \to \mathbb{R} \}, \qquad (15) $$
where the complexity is controlled by the $L_2$ regularization $\beta^T \beta \le A^2$ of the weight vector. We use $(S_i = \{x_{i,1}, \ldots, x_{i,m}\}, \{y_{i,j}\})$ to indicate a sample point indexed by $i$. As before, we let $\{y_{i,j}\} = \{y_{i,1}, \ldots, y_{i,m}\}$. Note that for each sample $i$, we do not need to assume that the $y_{i,j}$ are independently generated for different $j$. Using (15), the importance weighted regression in (12) becomes the following regularized empirical risk minimization method:
$$ \begin{aligned} f_{\hat{\beta}}(x, S) &= \hat{\beta}^T \psi(x, S), \\ \hat{\beta} &= \arg\min_{\beta \in H} \left[ \frac{1}{n} \sum_{i=1}^n L(\beta, S_i, \{y_{i,j}\}) + \lambda \beta^T \beta \right], \qquad (16) \\ L(\beta, S, \{y_j\}) &= \sum_{j=1}^m w(x_j, S) \left( \beta^T \psi(x_j, S) - y_j \right)^2 + u \sup_j w'(x_j, S) \left( \beta^T \psi(x_j, S) - \delta(S) \right)_+^2. \end{aligned} $$
In this method, we replace the hard regularization in (15), with tuning parameter $A$, by soft regularization with tuning parameter $\lambda$, which is computationally more convenient.

The following result is an expected generalization bound for the $L_2$-regularized empirical risk minimization method (16), which uses the stability analysis in [29]. The bound compares the performance of the finite sample statistical estimator to that of the optimal function in the underlying class; such a bound is generally referred to as an oracle inequality in the literature. The proof is in Appendix A.

Theorem 6: Let $M = \sup_{x,S} \|\psi(x, S)\|_2$ and $W = \sup_S \left[ \sum_{x_j \in S} w(x_j, S) + u \sup_{x_j \in S} w'(x_j, S) \right]$. Let $f_{\hat{\beta}}$ be the estimator defined in (16). Then we have
$$ \mathbf{E}_{\{S_i, \{y_{i,j}\}\}_{i=1}^n}\, Q(f_{\hat{\beta}}) \le \left( 1 + \frac{W M^2}{\sqrt{2 \lambda n}} \right)^2 \inf_{\beta \in H} \left[ Q(f_\beta) + \lambda \beta^T \beta \right]. $$

We have paid special attention to the properties of (16). In particular, the quantity $W$ is usually much smaller than $m$, which is large for web-search applications. The point we would like to emphasize here is that even though the number $m$ is large, the estimation complexity is only affected by the top portion of the rank-list. If the estimation of the lowest ranked items is relatively easy (as is generally the case), then the learning complexity does not depend on the majority of items near the bottom of the rank-list.

We can combine Theorem 5 and Theorem 6, giving the following bound:

Theorem 7: Suppose the conditions in Theorem 5 and Theorem 6 hold, with $f_*$ minimizing (14). Let $\hat{f} = f_{\hat{\beta}}$. Then we have
$$ \mathbf{E}_{\{S_i, \{y_{i,j}\}\}_{i=1}^n}\, \mathrm{DCG}(r_{\hat{f}}) \ge \mathrm{DCG}(r_B) - C(\gamma, u) \left[ \left( 1 + \frac{W M^2}{\sqrt{2 \lambda n}} \right)^2 \inf_{\beta \in H} \left( Q(f_\beta) + \lambda \beta^T \beta \right) - Q(f_*) \right]^{1/2}. $$

Proof: From Theorem 5, we obtain
$$ \mathrm{DCG}(r_B) - \mathbf{E}_{\{S_i, \{y_{i,j}\}\}_{i=1}^n}\, \mathrm{DCG}(r_{\hat{f}}) \le C(\gamma, u)\, \mathbf{E}_{\{S_i, \{y_{i,j}\}\}_{i=1}^n} \left( Q(\hat{f}) - Q(f_*) \right)^{1/2} \le C(\gamma, u) \left[ \mathbf{E}_{\{S_i, \{y_{i,j}\}\}_{i=1}^n}\, Q(f_{\hat{\beta}}) - Q(f_*) \right]^{1/2}. $$
The second inequality is a consequence of Jensen's inequality. Now, applying Theorem 6 yields the desired bound.

The theorem implies that if $Q(f_*) = \inf_{\beta \in H} Q(f_\beta)$, then as $n \to \infty$ we can let $\lambda \to 0$ and $\lambda n \to \infty$, so that the second term on the right hand side vanishes in the large sample limit. Therefore, asymptotically, we can achieve the optimal DCG score. This implies the consistency of regression based learning methods for the DCG criterion. Moreover, the rate of convergence does not depend on $m$, but rather on the relatively small quantities $W$ and $M$.

VIII. CONCLUSION

Ranking problems have many important real world applications. Although various formulations have been investigated in the literature, most theoretical results are concerned with global ranking using the pair-wise AUC

criterion. Motivated by applications such as web-search, we introduced the subset ranking problem, and focused on the DCG criterion that measures the quality of the top-ranked items.

We derived bounds that relate the optimization of DCG scores to the minimization of convex regression errors. In our analysis, it is essential to weight samples differently according to their importance. These bounds are used to motivate modifications of least squares regression methods that focus on the top portion of the rank-list. In addition to conceptual advantages, these methods have significant computational advantages over standard regression methods, because only a small number of items contribute to the solution; this means that they are computationally efficient to solve. The implementation of these methods can be achieved through appropriate selective sampling procedures. Moreover, we showed that the expected generalization performance of the system does not depend on $m$; instead, it only depends on the estimation quality of the top ranked items. Again, this is important for many practical applications.

The results obtained here are closely related to the theoretical analysis of solving classification methods using convex optimization formulations. Our theoretical results show that the regression approach provides a solid basis for solving the subset ranking problem. The practical value of such methods is also significant: in Yahoo's case, substantial improvement of DCG has been achieved after the deployment of a machine learning based ranking system.

Although the DCG criterion is difficult to optimize directly, it is a natural metric for ranking. The investigation of convex surrogate formulations provides a systematic approach to developing efficient machine learning methods for solving this difficult problem. This paper shows that with appropriate features, importance weighted regression methods can produce the optimal scoring function in the large sample limit. However, the regression methods proposed in this paper may not necessarily be optimal algorithms for learning ranking functions. Other methods, such as the pair-wise preference learning of Section II, can also be effective. It will be interesting to investigate such alternative formulations using a similar analysis.

APPENDIX

We shall introduce the following notation: let $Z_n = \{(S_i, \{y_{i,j}\}) : i = 1, \ldots, n\}$. Let $\hat{\beta}(Z_n)$ be the solution of (16), and let $\hat{\beta}(Z_{n+1})$ be the solution using training data $Z_{n+1}$:
$$ \hat{\beta}(Z_{n+1}) = \arg\min_{\beta \in H} \left[ \frac{1}{n} \sum_{i=1}^{n+1} L(\beta, S_i, \{y_{i,j}\}) + \lambda \beta^T \beta \right]. $$
We have the following stability lemma from [29], which can be stated in our notation as:

Lemma 1: The following inequality holds:
$$ \left\| \hat{\beta}(Z_n) - \hat{\beta}(Z_{n+1}) \right\|_2 \le \frac{1}{2 \lambda n} \left\| \frac{\partial}{\partial \beta} L(\hat{\beta}(Z_{n+1}), S_{n+1}, \{y_{n+1,j}\}) \right\|_2, $$
where $\frac{\partial}{\partial \beta} L(\beta, S, \{y_j\})$ denotes a subgradient of $L$ with respect to $\beta$.

Note that from simple subgradient algebra in [22], we know that a subgradient of $\sup_j L_j(\beta)$ for convex functions $L_j(\beta)$ can be written as $\sum_j \alpha_j\, \partial L_j(\beta)/\partial \beta$, where $\sum_j \alpha_j \le 1$ and $\alpha_j \ge 0$. Therefore we can find $\alpha_j \ge 0$ with $\sum_j \alpha_j \le 1$ such that
$$ \begin{aligned} &\left\| \frac{\partial}{\partial \beta} L(\hat{\beta}(Z_{n+1}), S_{n+1}, \{y_{n+1,j}\}) \right\|_2 \\ &\quad = 2 \left\| \sum_{j=1}^m w(x_{n+1,j}, S_{n+1}) \left( \hat{\beta}(Z_{n+1})^T \psi(x_{n+1,j}, S_{n+1}) - y_{n+1,j} \right) \psi(x_{n+1,j}, S_{n+1}) + u \sum_{j=1}^m \alpha_j\, w'(x_{n+1,j}, S_{n+1}) \left( \hat{\beta}(Z_{n+1})^T \psi(x_{n+1,j}, S_{n+1}) - \delta(S_{n+1}) \right)_+ \psi(x_{n+1,j}, S_{n+1}) \right\|_2 \\ &\quad \le 2\, L(\hat{\beta}(Z_{n+1}), S_{n+1}, \{y_{n+1,j}\})^{1/2} \left( \sum_{j=1}^m \left( w(x_{n+1,j}, S_{n+1}) + u\, \alpha_j\, w'(x_{n+1,j}, S_{n+1}) \right) \left\| \psi(x_{n+1,j}, S_{n+1}) \right\|_2^2 \right)^{1/2} \\ &\quad \le 2\, L(\hat{\beta}(Z_{n+1}), S_{n+1}, \{y_{n+1,j}\})^{1/2} \left( \sum_{j=1}^m w(x_{n+1,j}, S_{n+1}) + u \sup_j w'(x_{n+1,j}, S_{n+1}) \right)^{1/2} M, \end{aligned} $$
where the first inequality in the derivation is a direct application of the Cauchy–Schwartz inequality. Now, applying Lemma 1 with $\delta\beta = \hat{\beta}(Z_n) - \hat{\beta}(Z_{n+1})$, we can derive the following chain of inequalities. The first inequality uses the following form of Jensen's inequality:
$$ \sum_i \rho_i (a_i + b_i)^2 \le \left[ \left( \sum_i \rho_i a_i^2 \right)^{1/2} + \left( \sum_i \rho_i b_i^2 \right)^{1/2} \right]^2; $$
the third inequality is due to $(a + b)^2 \le (1 + s) a^2 + (1 + s^{-1}) b^2$ (where $s > 0$); the fourth inequality uses

16 Lemma 1. loss with respect to the augmented training data Zn+1 ˆ over β ∈ H. Now, in order to obtain the desired bound, L(β(Zn),Sn+1, {yn+1,j}) we simply take expectation with respect to Zn+1 on both ˆ =L(β(Zn+1) + δβ, Sn+1, {yn+1,j}) sides. h ˆ 1/2 ≤ L(β(Zn+1),Sn+1, {yn+1,j}) REFERENCES  m X T 2 [1] S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. +  w(xn+1,j,Sn+1)|δβ ψ(xn+1,j,Sn+1)| Generalization bounds for the area under the ROC curve. Journal j=1 of Machine Learning Research, 6:393–425, 2005. [2] S. Agarwal and D. Roth. Learnability of bipartite ranking 1/2#2 0 T 2 functions. In Proceedings of the 18th Annual Conference on +u sup w (xj,S)|δβ ψ(xn+1,j,Sn+1)| Learning Theory, 2005. j [3] P. Bartlett, M. Jordan, and J. McAuliffe. Convexity, classification, ˆ 1/2 and risk bounds. Journal of the American Statistical Association, ≤[L(β(Zn+1),Sn+1, {yn+1,j}) 101(473):138–156, 2006. + W (S )1/2Mkδβk ]2 [4] G. Blanchard, G. Lugosi, and N. Vayatis. On the rate of con- n+1 2 vergence of regularized boosting classifiers. Journal of Machine ˆ ≤(1 + s)L(β(Zn+1),Sn+1, {yn+1,j}) Learning Research, 4:861–894, 2003. −1 2 2 [5] P. Buhlmann.¨ Boosting for high-dimensional linear models. + (1 + s )W (Sn+1)kδβk2M Annals of Statistics, 34:559–583, 2006. ˆ −1 [6] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, ≤(1 + s)L(β(Zn+1),Sn+1, {yn+1,j}) + (1 + s ) N. Hamilton, and G. Hullender. Learning to rank using gradient M 2 ∂ 2 descent. In ICML’05, 2005. ˆ [7] A. Caponnetto. A note on the role of squared loss in regression. · W (Sn+1) 2 2 L(β(Zn+1),Sn+1, {yn+1,j}) 4λ n ∂β 2 Technical report, CBCL, Massachusetts Institute of Technology, ˆ 2005. ≤L(β(Zn+1),Sn+1, {yn+1,j}) [8] S. Clemencon, G. Lugosi, and N. Vayatis. Ranking and scoring  4  using empirical risk minimization. In COLT’05, 2005. −1 2 M · (1 + s) + (1 + s )W (Sn+1) , [9] W. W. Cohen, R. E. Schapire, and Y. Singer. Learning to order 2λ2n2 things. JAIR, 10:243–270, 1999. [10] D. Cossock. 
Method and apparatus for machine learning a where we define document relevance function. US patent 7197497, 2007. X 0 [11] P. Crescenzi and V. Kann. A compendium of np W (S) = w(xj,S) + u sup w (xj,S). optimization problems. Technical Report SI/RR- xj ∈S xj ∈S 95/02, Dipartimento di Scienze dell’Informazione, Universit di Roma ”La Sapienza”, 1995. updated at By optimizing over s, we obtain http://www.nada.kth.se/∼viggo/wwwcompendium. [12] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient ˆ L(β(Zn),Sn+1, {yn+1,j}) boosting algorithm for combining preferences. JMLR, 4:933–969, 2003.  2 2 W (Sn+1)M ˆ [13] J. Friedman. Greedy function approximation: A gradient boosting ≤ 1 + √ L(β(Zn+1),Sn+1, {yn+1,j}). 2λn machine. The Annals of Statistics, 29:1189–1232, 2001. [14] J. Hanley and B. McNeil. The meaning and use of the Area under (i) a Receiver Operating Characetristic (ROC) curve. Radiology, Now denote by Zn+1 the training data obtained from pages 29–36, 1982. Zn+1 by removing the i-th datum (Si, {yi,j}), and let [15] R. Herbrich, T. Graepel, and K. Obermayer. Large margin ˆ (i) rank boundaries for ordinal regression. In B. S. A. Smola, β(Zn+1) be the solution of (16) with Zn replaced by (i) P. Bartlett and D. Schuurmans, editors, Advances in Large Margin Zn+1, then we have: Classifiers, pages 115–132. MIT Press, 2000. [16] K. Jarvelin and J. Kekalainen. IR evaluation methods for n+1 retrieving highly relevant documents. In SIGIR’00, pages 41– X ˆ (i) L(β(Zn+1),Si, {yi,j}) 48, 2000. i=1 [17] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the ACM Conference on Knowledge Discovery  2 2 n+1 WM X ˆ and Data Mining (KDD), 2002. ≤ 1 + √ L(β(Zn+1),Si, {yi,j}) [18] G. Lugosi and N. Vayatis. On the Bayes-risk consistency of 2λn i=1 regularized boosting methods. The Annals of Statistics, 32:30– 55, 2004. with discussion.  2 2 "n+1 WM X [19] S. Mannor, R. Meir, and T. Zhang. 
Greedy algorithms for ≤ 1 + √ inf L(β, Si, {yi,j}) 2λn β∈H classification - consistency, convergence rates, and adaptivity. i=1 Journal of Machine Learning Research, 4:713–741, 2003. +λ(n + 1)βT β . [20] P. McCullagh and J. A. Nelder. Generalized linear models. Chapman & Hall, London, 1989. The second inequality is due to the definition of [21] F. Radlinski and T. Joachims. Query chains: Learning to rank ˆ from implicit feedback. In Proceedings of the ACM Conference β(Zn+1), which minimizes the regularized empirical on Knowledge Discovery and Data Mining (KDD), 2005.

17 [22] R. T. Rockafellar. Convex analysis. Princeton University Press, Princeton, NJ, 1970. [23] S. Rosset. via the AUC. In ICML’04, 2004. [24] C. Rudin. Ranking with a p-norm push. In COLT 06, 2006. [25] S. Shalev-Shwartz and Y. Singer. Efficient learning of label ranking by soft projections onto polyhedra. Journal of Machine Learning Research, 7:1567–1599, 2006. [26] I. Steinwart. Support vector machines are universally consistent. J. Complexity, 18:768–791, 2002. [27] A. Tewari and P. Bartlett. On the consistency of multiclass classification methods. In COLT, 2005. [28] Z. Zha, Z. Zheng, H. Fu, and G. Sun. Incorporating query difference for learning retrieval functions in information retrieval. In SIGIR, 2006. [29] T. Zhang. Leave-one-out bounds for kernel methods. Neural Computation, 15:1397–1437, 2003. [30] T. Zhang. Statistical analysis of some multi-category large margin classification methods. Journal of Machine Learning Research, 5:1225–1251, 2004. [31] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32:56–85, 2004. with discussion. [32] T. Zhang and B. Yu. Boosting with early stopping: Convergence and consistency. The Annals of Statistics, 33:1538–1579, 2005.

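As a concrete supplement to the appendix, the stability inequality of Lemma 1 can be checked numerically in the smooth special case u = 0, where the importance-weighted loss reduces to weighted ridge regression and both minimizers have closed forms. The sketch below is only an illustration: the sizes, data, and variable names are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n+1 subsets S_i, each with m items and d-dimensional features.
n, m, d = 20, 5, 3
lam = 0.1  # regularization parameter lambda in (16)

psi = rng.normal(size=(n + 1, m, d))        # feature vectors psi(x_{i,j}, S_i)
y = rng.normal(size=(n + 1, m))             # target grades y_{i,j}
w = rng.uniform(0.1, 1.0, size=(n + 1, m))  # importance weights w(x_{i,j}, S_i)

def solve(idx):
    """Closed-form minimizer of (1/n) sum_{i in idx} L(beta, S_i, {y_ij}) + lam * beta^T beta,
    where L(beta, S, {y_j}) = sum_j w_j (beta^T psi_j - y_j)^2 (the u = 0 case)."""
    A = lam * np.eye(d)
    b = np.zeros(d)
    for i in idx:
        A += psi[i].T @ (w[i][:, None] * psi[i]) / n
        b += psi[i].T @ (w[i] * y[i]) / n
    return np.linalg.solve(A, b)

beta_n = solve(range(n))        # estimator trained on Z_n
beta_n1 = solve(range(n + 1))   # estimator trained on Z_{n+1}

# Gradient of L(., S_{n+1}, {y_{n+1,j}}) evaluated at beta_n1.
resid = psi[n] @ beta_n1 - y[n]
grad = 2 * psi[n].T @ (w[n] * resid)

lhs = np.linalg.norm(beta_n - beta_n1)
rhs = np.linalg.norm(grad) / (2 * lam * n)
print(lhs <= rhs + 1e-12)  # Lemma 1 bound holds; prints True
```

The bound follows here from the 2*lam strong convexity contributed by the regularizer alone, so the check passes with substantial slack on random data; adding the non-smooth u-term would require subgradients but the same inequality applies.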