An Attention-Based Deep Net for Learning to Rank

Baiyang Wang, Diego Klabjan

Abstract

In information retrieval, learning to rank constructs a machine-based ranking model which, given a query, sorts the search results by their degree of relevance or importance to the query. Neural networks have been successfully applied to this problem, and in this paper, we propose an attention-based deep neural network which better incorporates different embeddings of the queries and search results with an attention-based mechanism. This model also applies a decoder mechanism to learn the ranks of the search results in a listwise fashion. The embeddings are trained with convolutional neural networks or the word2vec model. We demonstrate the performance of this model with image retrieval and text querying data sets.

1. Introduction

Learning to rank applies supervised or semi-supervised machine learning to construct ranking models for information retrieval problems. In learning to rank, a query is given and a number of search results are to be ranked by their relevant importance given the query. Many problems in information retrieval can be formulated or partially solved by learning to rank. In learning to rank, there are typically three approaches: the pointwise, pairwise, and listwise approaches (Liu, 2011). The pointwise approach assigns an importance score to each pair of query and search result. The pairwise approach discerns which search result is more relevant for a certain query and a pair of search results. The listwise approach outputs the ranks for all search results given a specific query, therefore being the most general.

For learning to rank, neural networks are known to enjoy a success. Generally in such models, neural networks are applied to model the ranking probabilities with the features of queries and search results as the input. For instance, RankNet (Burges et al., 2005) applies a neural network to calculate a probability for any search result being more relevant compared to another. Each pair of query and search result is combined into a feature vector, which is the input of the neural network, and a ranking priority score is the output. Another approach learns the matching mechanism between the query and the search result, which is particularly suitable for such retrieval tasks. Usually the mechanism is represented by a similarity matrix which outputs a bilinear form as the ranking priority score; for instance, such a structure is applied in Severyn & Moschitti (2015).

We postulate that it could be beneficial to apply multiple embeddings of the queries and search results to a learning to rank model. It has already been observed that for training images, applying a committee of convolutional neural nets improves digit and character recognition (Ciresan et al., 2011; Meier et al., 2011). With such an approach, the randomness of the architecture of a single neural network can be effectively reduced. For training text data, combining different techniques such as tf-idf, latent Dirichlet allocation (LDA) (Blei et al., 2003), or word2vec (Mikolov et al., 2013) has also been explored by Das et al. (2015). This is due to the fact that it is relatively hard to judge different models a priori. However, we have seen no literature on designing a mechanism to incorporate different embeddings for ranking. We hypothesize that applying multiple embeddings to a ranking neural network can improve the accuracy not only in terms of "averaging out" the error, but it can also provide a more robust solution compared to applying a single embedding.
For learning to rank, we propose the application of the attention mechanism (Bahdanau et al., 2015; Cho et al., 2015), which is demonstrated to be successful in focusing on different aspects of the input so that it can incorporate distinct features. It incorporates different embeddings with weights changing over time, derived from a recurrent neural network (RNN) structure. Thus, it can help us better summarize information from the query and search results. We also apply a decoder mechanism to rank all the search results, which provides a flexible listwise ranking approach that can be applied to both image retrieval and text querying. Our model has the following contributions: (1) it applies the attention mechanism to listwise learning to rank problems, which we think is novel in the learning to rank literature; (2) it takes different embeddings of queries and search results into account, incorporating them with the attention mechanism; (3) double attention mechanisms are applied to both queries and search results.

Section 2 reviews RankNet, similarity matching, and the attention mechanism in detail. Section 3 constructs the attention-based deep net for ranking, and discusses how to calibrate the model. Section 4 demonstrates the performance of our model on image retrieval and text querying data sets. Section 5 discusses potential future research and concludes the paper.

2. Literature Review

To begin with, for RankNet, each pair of query and search result is turned into a feature vector. For two feature vectors x_0 ∈ R^{d_0} and x_0' ∈ R^{d_0} sharing the same query, we define x_0 ≺ x_0' if the search result associated with x_0 is ranked before that associated with x_0', and vice versa. For x_0,

$$x_1 = f(W_0 x_0 + b_0) \in \mathbb{R}^{d_1}, \qquad x_2 = f(W_1 x_1 + b_1) \in \mathbb{R}^{d_2} = \mathbb{R}, \tag{1}$$

and similarly for x_0'. Here W_l is a d_{l+1} × d_l weight matrix, and b_l ∈ R^{d_{l+1}} is a bias for l = 0, 1. Function f is an element-wise nonlinear activation function; for instance, it can be the sigmoid function σ(u) = e^u/(1 + e^u). Then for RankNet, the ranking probability is defined as follows,

$$P(x_0 \prec x_0') = e^{x_2 - x_2'}/(1 + e^{x_2 - x_2'}). \tag{2}$$

Therefore the ranking priority of two search results can be determined with a two-layer neural network structure, offering a pairwise approach. A deeper application of RankNet can be found in Song et al. (2014), where a five-layer RankNet is proposed, and each data example is weighed differently for each user in order to adapt to personalized search. A global model is first trained with the training data, and then a different regularized model is adapted for each user with a validation data set.

A number of models similar to RankNet have been proposed. For instance, LambdaRank (Burges et al., 2006) speeds up RankNet by altering the cost function according to the change in NDCG caused by swapping search results. LambdaMART (Burges, 2010) applies the boosted tree algorithm to LambdaRank. Ranking SVM (Joachims, 2002) applies the support vector machine to pairs of search results. Additional models such as ListNet (Cao et al., 2007) and FRank (Tsai et al., 2007) can be found in the summary of Liu (2011).

However, we are different from the above models not only because we integrate different embeddings with the attention mechanism, but also because we learn the matching mechanism between a query and search results with a similarity matrix. There are a number of papers applying this structure. For instance, Severyn & Moschitti (2015) applied a text convolutional neural net together with such a structure for text querying. For image querying, Wan et al. (2014) applied deep convolutional neural nets together with the OASIS algorithm (Chechik et al., 2009) for image retrieval. Still, our approach is different from them in that we apply the attention mechanism, and develop an approach allowing both image and text queries.

We explain the idea of similarity matching as follows. We take a triplet (q, r, r') into account, where q denotes an embedding, i.e. a vectorized feature representation of a query, and (r, r') denotes the embeddings of two search results. A similarity function is defined as follows,

$$S_W(q, r) = q^T W r, \tag{3}$$

and apparently r ≺ r' if and only if S_W(q, r) > S_W(q, r').
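For concreteness, the following minimal NumPy sketch (ours, not from the cited papers; layer sizes, random inputs, and initializations are hypothetical) computes the RankNet pairwise probability of equations (1)-(2) and the bilinear similarity of equation (3).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# --- RankNet, equations (1)-(2): a two-layer scorer for one feature vector.
d0, d1 = 16, 8                         # hypothetical layer sizes
W0, b0 = rng.normal(size=(d1, d0)), np.zeros(d1)
W1, b1 = rng.normal(size=(1, d1)), np.zeros(1)

def ranknet_score(x0):
    x1 = sigmoid(W0 @ x0 + b0)         # equation (1), f = sigmoid
    return (W1 @ x1 + b1)[0]           # scalar priority score x_2

x_a, x_b = rng.normal(size=d0), rng.normal(size=d0)
diff = ranknet_score(x_a) - ranknet_score(x_b)
p_a_before_b = sigmoid(diff)           # equation (2): P(x_a precedes x_b)

# --- Similarity matching, equation (3): S_W(q, r) = q^T W r.
d = 10                                 # hypothetical embedding size
W = np.eye(d)                          # identity initialization, as in Section 4.1
q, r1, r2 = (rng.normal(size=d) for _ in range(3))
s1, s2 = q @ W @ r1, q @ W @ r2
r1_ranked_first = s1 > s2              # r precedes r' iff S_W(q,r) > S_W(q,r')
```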
Note that we may create multiple deep convolutional nets so that we obtain multiple embeddings for the queries and search results. Therefore, it is a question how to incorporate them together. The attention mechanism weighs the embeddings with different sets of weights for each state t, which are derived with a recurrent neural network (RNN) from t = 1 to t = T. Therefore, for each state t, the different embeddings can be "attended" differently by the attention mechanism, thus making the model more flexible. This model has been successfully applied to various problems. For instance, Bahdanau et al. (2015) applied it to neural machine translation with a bidirectional recurrent neural network. Cho et al. (2015) further applied it to image caption and video description generation with convolutional neural nets. Vinyals et al. (2015) applied it for solving combinatorial problems with the sequence-to-sequence paradigm.

Note that in our scenario, the ranking process, i.e. sorting the search results from the most related one to the least related one for a query, can be modeled by different "states." Thus, the attention mechanism helps incorporate different embeddings along with the ranking process, therefore providing a listwise approach. Below we explain our model in more detail.

3. Model and Algorithm

3.1. Introduction to the Model

Both queries and search results can be embedded with neural networks. Given an input vector x_0 representing a query or a search result, we denote the l-th layer in a neural net as x_l ∈ R^{d_l}, l = 0, 1, ..., L. We have the following relation,

$$x_{l+1} = f(W_l x_l + b_l), \quad l = 0, 1, \ldots, L-1, \tag{4}$$

where W_l is a d_{l+1} × d_l weight matrix, b_l ∈ R^{d_{l+1}} is the bias, and f is a nonlinear activation function. If the goal is classification with C categories, then

$$(P(y = 1), \ldots, P(y = C)) = \mathrm{softmax}(W_L x_L + b_L), \tag{5}$$

where y is a class indicator, and $\mathrm{softmax}(u) = \left(e^{u_1}/\sum_{i=1}^{d} e^{u_i}, \ldots, e^{u_d}/\sum_{i=1}^{d} e^{u_i}\right)$ for u ∈ R^d.
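The generic feed-forward embedding of equations (4)-(5) can be sketched as follows; the layer sizes and the choice f = tanh are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

def forward(x0, weights, biases, f=np.tanh):
    """Equations (4)-(5): x_{l+1} = f(W_l x_l + b_l), softmax on top."""
    x = x0
    for W, b in zip(weights[:-1], biases[:-1]):
        x = f(W @ x + b)                           # hidden layers, eq. (4)
    return softmax(weights[-1] @ x + biases[-1])   # class probabilities, eq. (5)

# Hypothetical sizes: a 3-layer net mapping 20 features to C = 5 classes.
rng = np.random.default_rng(1)
sizes = [20, 12, 8, 5]
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]
probs = forward(rng.normal(size=20), Ws, bs)       # may serve as one embedding
```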
From training this model, we may take the softmax probabilities as the embedding, and create different embeddings with different neural network structures. For images, convolutional neural nets (CNNs) (LeCun et al., 1998) are more suitable, in which each node only takes information from neighborhoods of the previous layer. Pooling over each neighborhood is also performed for each layer of a convolutional neural net.

With different networks, we can obtain different embeddings c^1, ..., c^M. In the attention mechanism below, we generate the weights α_t with an RNN structure, and summarize c_t in a decoder series z_t,

$$\begin{cases} e_{tm} = f_{ATT}(z_{t-1}, c^m, \alpha_{t-1}), \quad m = 1, \ldots, M, \\ \alpha_t = \mathrm{softmax}(e_t), \\ c_t = \sum_{m=1}^{M} \alpha_{tm} c^m, \\ z_t = \phi_\theta(z_{t-1}, c_t). \end{cases} \tag{6}$$

Here f_ATT and φ_θ are chosen as tanh layers in our experiments. Note that the attention weight α_t at state t depends on the previous attention weight α_{t−1}, the embeddings, and the previous decoder state z_{t−1}, and the decoder series z_t sums up information of c_t up to state t.
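Since the paper specifies f_ATT and φ_θ only as tanh layers, the following sketch fixes one plausible parameterization of equation (6); all weight matrices and dimensions are hypothetical.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

# Hypothetical sizes: M = 3 embeddings of dimension d, decoder of dimension h.
rng = np.random.default_rng(2)
M, d, h, T = 3, 6, 4, 5
C = rng.normal(size=(M, d))              # the embeddings c^1, ..., c^M (fixed here)
Wz, Wc, Wa = (rng.normal(size=(M, h)), rng.normal(size=(M, d)),
              rng.normal(size=(M, M)))
Uz, Uc = rng.normal(size=(h, h)), rng.normal(size=(h, d))

z, alpha = np.zeros(h), np.full(M, 1.0 / M)
for t in range(T):
    # f_ATT(z_{t-1}, c^m, alpha_{t-1}) as a single tanh layer, per eq. (6):
    e = np.tanh(Wz @ z + (Wc * C).sum(axis=1) + Wa @ alpha)
    alpha = softmax(e)                   # attention weights alpha_t
    c = alpha @ C                        # context vector c_t
    z = np.tanh(Uz @ z + Uc @ c)         # decoder step z_t = phi_theta(z_{t-1}, c_t)
```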

As aforementioned, given multiple embeddings, the ranking process can be viewed as applying different attention weights to the embeddings and generating the decoder series z_t, offering a listwise approach. However, since there are features for both queries and search results, we consider them separately, and apply double attention mechanisms to each of them. Our full model is described below.

3.2. Model Construction

For each query Q, we represent M different embeddings of the query as q = (q_1, ..., q_M), where each q_m is multi-dimensional. For the search results R_1, ..., R_T associated with Q, which are to be ranked, we can map each result into N different embeddings with a same structure, and obtain r_t = (r_{t1}, ..., r_{tN}) as the embeddings of R_t, t = 1, ..., T. Each r_{tn} is also multi-dimensional. All these embeddings can either correspond to raw data or they can be the output of a parametric model, i.e. the output of a CNN in the case of images. In the latter case, they are trained jointly with the rest of the model.

Below we represent queries or search results in their embeddings. We assume that given the query q, the results r_1, ..., r_T are retrieved in the order r̃_1, ..., r̃_T in equation (12) below. Note that {r_1, ..., r_T} = {r̃_1, ..., r̃_T}. We observe that both the query q and the results r_1, ..., r_T can be "attended" with different weights. To implement the attention-based mechanism, we assign different weights for the different components of q and r_t for each different t = 1, ..., T. Specifically, we assign the attention weights α_1, ..., α_T for the query, and β_1, ..., β_T for the search results. To achieve this, we need to assign the pre-softmax weights e_t and f_t, t = 1, ..., T. Therefore we first let

$$\begin{cases} e_{tm} = e_{ATT}(z_{t-1}, q_m, g(r_1, \ldots, r_T), \alpha_{t-1}, \beta_{t-1}), \\ f_{tn} = f_{ATT}(z_{t-1}, q, h_n(r_1, \ldots, r_T), \alpha_{t-1}, \beta_{t-1}), \\ m = 1, \ldots, M, \; n = 1, \ldots, N, \; t = 1, \ldots, T. \end{cases} \tag{7}$$

Note that there are two different attention functions. While they both consider the previous weights α_{t−1} and β_{t−1}, the pre-softmax weight e_{tm} considers the m-th component of the query q_m and all information from the search results, and f_{tn} considers the opposite. In our experiments, g takes an average while h_n averages over the n-th embedding of each result. Here z_{t−1} is the value of the decoder at state t − 1. Letting e_t = (e_{t1}, ..., e_{tM}) and f_t = (f_{t1}, ..., f_{tN}), we impose the softmax functions for t = 1, ..., T,

$$\alpha_t = \mathrm{softmax}(e_t), \quad \beta_t = \mathrm{softmax}(f_t). \tag{8}$$
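A minimal sketch of one step of the double attention weights of equations (7)-(8) follows, assuming mean pooling for g and h_n and a simple tanh scoring layer; this is one of several parameterizations consistent with the text, with all shapes hypothetical.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

# Hypothetical sizes: M query embeddings, N result embeddings, T results.
rng = np.random.default_rng(3)
M, N, T, d, h = 3, 3, 5, 6, 4
q = rng.normal(size=(M, d))          # query embeddings q_1, ..., q_M
r = rng.normal(size=(T, N, d))       # r_{tn}: n-th embedding of result R_t
z = np.zeros(h)                      # previous decoder state z_{t-1}
alpha, beta = np.full(M, 1 / M), np.full(N, 1 / N)   # previous weights

g_r = r.mean(axis=(0, 1))            # g: average over all result embeddings
h_r = r.mean(axis=0)                 # h_n: average of the n-th embeddings, (N, d)

# One plausible tanh parameterization of e_ATT and f_ATT in equation (7).
w_e = rng.normal(size=2 * d + h + M + N)
w_f = rng.normal(size=M * d + d + h + M + N)
e = np.array([w_e @ np.tanh(np.concatenate([q[m], g_r, z, alpha, beta]))
              for m in range(M)])
f = np.array([w_f @ np.tanh(np.concatenate([q.ravel(), h_r[n], z, alpha, beta]))
              for n in range(N)])
alpha, beta = softmax(e), softmax(f)   # equation (8)
```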

The attention weights α_t and β_t can then be assigned to create the context vectors c_t for the query and d̄_t for the search results, which are defined as follows,

$$c_t = \sum_{m=1}^{M} \alpha_{tm} q_m, \qquad \bar{d}_t = \sum_{n=1}^{N} \beta_{tn} h_n(r_1, \ldots, r_T). \tag{9}$$

With the context vectors c_t and d̄_t depending on the state t, we can finally output the decoder z_t,

$$z_t = \phi_\theta(z_{t-1}, c_t, \bar{d}_t), \quad t = 1, \ldots, T. \tag{10}$$

Equations (8)-(11) are carried out iteratively from t = 1 to t = T. The hidden states z_1, ..., z_T, together with the context vectors c_1, ..., c_T, are used to rank the results r_1, ..., r_T. For each t, we define a context vector d_{t,t'} which can be directly compared against the context vector c_t for queries,

$$d_{t,t'} = \sum_{n=1}^{N} \beta_{tn} r_{t'n}, \quad t = 1, \ldots, T. \tag{11}$$

This context vector d_{t,t'}, unlike d̄_t, is not only specific for each state t, but also specific for each result r_{t'}, t' = 1, ..., T. Now suppose in terms of ranking that r̃_1, ..., r̃_{t−1} have already been selected. For choosing r̃_t, we apply the softmax function to the similarity scores s_{t,t'} between the query q and each result r_{t'}. Thus we have

$$\begin{cases} P(\tilde{r}_t = r_{t'} \mid \tilde{r}_1, \ldots, \tilde{r}_{t-1}) \propto \mathrm{softmax}(s_{t,r_{t'}}), \\ s_{t,r_{t'}} = d_{t,t'}^T W c_t + d_{t,t'}^T V z_t, \\ r_{t'} \in \{r_1, \ldots, r_T\} \setminus \{\tilde{r}_1, \ldots, \tilde{r}_{t-1}\}. \end{cases} \tag{12}$$

Therefore r̃_1, ..., r̃_T can be retrieved in a listwise fashion. Equations (7)-(12) complete our model, which is shown in Figure 1 in the case of image retrieval.
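Putting equations (9)-(12) together, one greedy decoding pass can be sketched as follows; the attention weights here are random stand-ins for equation (8), mean pooling is assumed for h_n, and all shapes are hypothetical.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

rng = np.random.default_rng(4)
M, N, T, d, h = 3, 3, 5, 6, 4
q = rng.normal(size=(M, d))                    # query embeddings
r = rng.normal(size=(T, N, d))                 # result embeddings r_{tn}
W, V = np.eye(d), rng.normal(size=(d, h))      # similarity matrices in eq. (12)
Uz, Uc, Ud = (rng.normal(size=(h, h)), rng.normal(size=(h, d)),
              rng.normal(size=(h, d)))

z, ranking, remaining = np.zeros(h), [], set(range(T))
for t in range(T):
    # Random stand-ins for the attention weights of equation (8):
    alpha, beta = softmax(rng.normal(size=M)), softmax(rng.normal(size=N))
    c = alpha @ q                              # c_t, eq. (9)
    d_bar = beta @ r.mean(axis=0)              # d-bar_t with h_n = mean, eq. (9)
    z = np.tanh(Uz @ z + Uc @ c + Ud @ d_bar)  # decoder z_t, eq. (10)
    d_ctx = np.einsum('tnd,n->td', r, beta)    # d_{t,t'}, eq. (11)
    s = d_ctx @ (W @ c) + d_ctx @ (V @ z)      # scores s_{t,t'}, eq. (12)
    idx = sorted(remaining)
    pick = idx[int(np.argmax(softmax(s[idx])))]  # greedy choice of r-tilde_t
    ranking.append(pick)
    remaining.remove(pick)
```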

Figure 1. The attention-based model for image retrieval. (Diagram: convolutional layers produce the embeddings of the query and the search results; the attention mechanism yields the weights α and β, the context vectors c and d̄, and the recurrent decoder series z; a softmax over the similarity scores s, which are functions of β and r, outputs the ranking.)

3.3. Model Calibration

To calibrate the parameters of this model, we apply the stochastic gradient descent algorithm, which trains on a mini-batch, a subsample of the training data, at each iteration. To rank the search results for the testing data, we apply the beam search algorithm (Reddy, 1977), which keeps a number of paths, i.e. sequences of r̃_1, ..., r̃_t at state t, of the highest log-likelihood. Other paths are trimmed at each state t, and finally the path with the highest log-likelihood is chosen.
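A minimal sketch of this beam search follows, with a hypothetical step_logprobs function standing in for the per-step log-probabilities derived from equation (12).

```python
import numpy as np

def beam_search(T, step_logprobs, beam_width=3):
    """Keep the `beam_width` highest log-likelihood rankings r_1, ..., r_t."""
    beams = [((), 0.0)]                        # (prefix of chosen results, log-lik)
    for _ in range(T):
        candidates = []
        for prefix, ll in beams:
            logp = step_logprobs(prefix)       # eq. (12) probs of unranked results
            for res, lp in logp.items():
                candidates.append((prefix + (res,), ll + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]        # trim all other paths
    return beams[0][0]                         # highest log-likelihood ranking

# Toy stand-in: fixed per-result scores, softmax over the unranked set.
scores = np.array([2.0, 1.0, 0.5, 0.0])
def step_logprobs(prefix):
    rest = [i for i in range(len(scores)) if i not in prefix]
    s = scores[rest]
    return dict(zip(rest, s - np.log(np.exp(s).sum())))

print(beam_search(T=4, step_logprobs=step_logprobs))   # (0, 1, 2, 3)
```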

Alternatively, we may consider the hinge loss function to replace the softmax function. Comparing a potentially chosen result r̃_t against any other unranked result r_{t'}, with r_{t'} ∈ {r_1, ..., r_T} \ {r̃_1, ..., r̃_{t−1}}, we apply the following hinge loss function, similar to Chechik et al. (2009),

$$L(\tilde{r}_t, r_{t'}) = \max\{0,\, 1 - s_{t,\tilde{r}_t} + s_{t,r_{t'}}\}. \tag{13}$$

When training the model, for a single query q and all related search results, we apply the stochastic gradient descent algorithm to minimize

$$\sum_{t=1}^{T} \sum_{r_{t'}} L(\tilde{r}_t, r_{t'}). \tag{14}$$

4. Data Studies

We name our model an "Attention-based Ranking Network" (AttRN). In Table 2, "AttRN-SM" and "AttRN-HL" denote our model with the softmax or hinge loss ranking functions. "OASIS-1" and "OASIS-5" denote, respectively, OASIS trained with the first set of embeddings, or the average of all five sets of embeddings. For Ranking SVM and LambdaMART, we use all embeddings. "RSVM-100k" means that 10^5 pairs are sampled for the SVM, "λMART-2%" means that 2% of all feature vectors are sampled for each tree in LambdaMART, and so forth.

4.1. The MNIST Data Set

The MNIST data set contains handwritten images of the digits 0-9. There are 50,000 training, 10,000 validation, and 10,000 testing images. We take each image among them as a query. For each query image, we construct the set of retrieved images as follows. We first uniformly at random sample a random integer k from {1, 2, ..., 9}. Next we select k related random images (an image is related if it is of the same digit as the query image). Finally, we randomly select 30 − k unrelated images (an image is unrelated if it corresponds to a different digit than the query image). The order is imposed so that images of the same digit are considered as related and images of different digits not related.

For this data set, we pretrain 5 convolutional neural networks based on different L2 regularization rates for all layers and Dropout (Srivastava et al., 2014) regularization for fully connected (FC) layers, with respect to the values shown in Table 1. We vary the regularization rate because it can be expensive to determine a most suitable regularization rate in practice, and therefore applying the attention mechanism to different regularization rates can reduce the impact of improper regularization rates and find a better model.

Table 1. Different regularization rates applied to MNIST.

Dropout's p for Conv.   1       1       1       1       1
Dropout's p for FC      1       1       0.9     1       0.9
L2's λ for All          0       1e-5    1e-5    1e-4    1e-4

The pretrained models are standard digit classifiers. The softmax layers of the five models are plugged into our full model as the embeddings (where the CNNs are trained together with the rest). We initialize the similarity matrix W with the identity matrix and the remaining parameters with zeros. Our model is trained given such embeddings with a batch size of 100, a learning rate of 0.001, and 20 epochs. We choose the best epoch using the validation data set.

We compare our algorithm against OASIS (Chechik et al., 2009), Ranking SVM, and LambdaMART, in the setting of image retrieval problems (Wan et al., 2014; Wang et al., 2014; Wu et al., 2013). For OASIS, we set the batch size and maximum learning rate to be the same as for our model. For Ranking SVM and LambdaMART, we combine the embeddings of a query and a search result to create features, apply sklearn and pyltr in Python, and use NDCG5 and 100 trees for LambdaMART. To compare the results, we apply MAP and NDCG_p (Liu, 2011) as the criteria, and calculate the standard deviations averaged over five runs for each model. The error rates, which are one minus MAPs and NDCGs, are shown in Table 2. The lowest error rates are in bold. The top row shows the average value while the bottom one (sd.) exhibits the standard deviation over 5 randomized runs.
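A sketch of these criteria for a single query follows; the 2^rel − 1 gain and log2 discount are one common NDCG convention, which the paper does not pin down.

```python
import numpy as np

def average_precision(relevance):
    """Binary relevance in ranked order; assumes at least one related item."""
    rel = np.asarray(relevance, dtype=float)
    hits = np.cumsum(rel)
    ranks = np.arange(1, len(rel) + 1)
    return float((rel * hits / ranks).sum() / rel.sum())

def ndcg(relevance, p):
    """NDCG_p with 2^rel - 1 gains and log2 discounts (one common convention)."""
    rel = np.asarray(relevance, dtype=float)
    disc = 1.0 / np.log2(np.arange(2, p + 2))
    dcg = ((2 ** rel[:p] - 1) * disc).sum()
    ideal = np.sort(rel)[::-1]
    idcg = ((2 ** ideal[:p] - 1) * disc).sum()
    return float(dcg / idcg)

ranked = [1, 1, 0, 1, 0]                 # hypothetical relevance of one ranking
print(1 - average_precision(ranked))     # error rate as reported in Table 2
print(1 - ndcg(ranked, p=5))
```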

Table 2. Errors of different image retrieval methods for MNIST.

error    OASIS-1    OASIS-5    AttRN-SM   AttRN-HL
MAP      0.55%      0.47%      0.46%      0.44%
sd.      (0.02%)    (0.01%)    (0.01%)    (0.01%)
NDCG3    0.81%      0.70%      0.68%      0.65%
sd.      (0.02%)    (0.01%)    (0.01%)    (0.02%)
NDCG5    0.65%      0.57%      0.55%      0.52%
sd.      (0.02%)    (0.01%)    (0.02%)    (0.02%)

error    RSVM-100k  RSVM-500k  λMART-2%   λMART-5%
MAP      0.52%      0.54%      0.87%      0.88%
sd.      (0.02%)    (0.01%)    (0.02%)    (0.03%)
NDCG3    0.74%      0.75%      0.93%      0.96%
sd.      (0.03%)    (0.01%)    (0.01%)    (0.02%)
NDCG5    0.61%      0.62%      0.89%      0.91%
sd.      (0.02%)    (0.01%)    (0.02%)    (0.03%)

From Table 2, we observe that AttRN-HL outperforms the other methods, and AttRN-SM achieves the second best. Moreover, OASIS-5 clearly outperforms OASIS-1, which demonstrates the benefit of applying multiple embeddings. Ranking SVM and LambdaMART do not seem very suitable for the data, and increasing the sample size does not seem to improve their performance.

It needs to be investigated how sensitive AttRN-HL is to the number of embeddings. In Figure 2, we show the error rates of AttRN-HL with different numbers of embeddings. Here we apply the regularization parameters (1, 0.8, 1e-4) and (1, 0.8, 2e-4) to the new embeddings. From Figure 2, the error rates steadily decrease as the number of embeddings increases, which is consistent with the intuition that adding more information to the model should yield better results. We also observe that the error rates tend to stabilize as the number of embeddings increases. However, while applying 7 embeddings slightly lowers the error rates as shown in Figure 2, the training time is roughly 33% longer than for 5 embeddings. Therefore we consider applying 7 embeddings to be unnecessary, and consider the current practice to be representative of AttRN.

Figure 2. The error rates (%) of AttRN-HL in MAP, NDCG3, and NDCG5 given different numbers of embeddings (3, 5, 7), with 95% upper and lower confidence bounds.

Table 3 shows the differences in error rates when changing the pooling functions g and h_n in (7) from "mean" to "max." We observe that the changes are very small, which means that AttRN is quite robust to such changes.

Table 3. Errors of "mean" and "max" pooling for MNIST.

error            MAP      NDCG3    NDCG5
AttRN-HL-mean    0.44%    0.65%    0.52%
sd.              (0.01%)  (0.02%)  (0.02%)
AttRN-HL-max     0.44%    0.66%    0.53%
sd.              (0.01%)  (0.02%)  (0.02%)

Table 4 shows the changes in the weights of the convolutional neural networks; the unregularized CNN is used as an example. The L2 norms sum over the weights of each layer and are averaged over 5 randomized runs. They tend to increase after AttRN is trained, possibly because the weights need to be larger to adjust for the ranking mechanism.

Table 4. L2 norms of each layer in the convolutional neural net before and after training AttRN for MNIST.

L2 norm   1st Conv.   2nd Conv.   FC       Softmax
Before    4.99        9.66        25.74    7.15
sd.       (0.05)      (0.03)      (0.02)   (0.04)
After     5.26        9.91        25.90    7.81
sd.       (0.05)      (0.03)      (0.02)   (0.04)





Next we display three good cases of retrieval together with three poorer ones from AttRN-HL in terms of the actual images, which are shown in Figure 3. We observe that the poorer cases of retrieval in the MNIST data set are basically due to the distortion of the query images.

Figure 3. Actual query images and the 1st to 8th retrieved images in the MNIST data set. (Each row shows a query image followed by the images considered related or unrelated; the top rows are good cases and the bottom rows are poor cases.)

4.2. The CIFAR-10 Data Set

The CIFAR-10 data set contains images of 10 different classes. There are 50,000 training images, from which we take 5,000 images for validation, and 10,000 testing images. Again we take each image as a query, and randomly create 30 search results with the number of related images uniformly distributed from 1 to 9. The order is imposed in the same way as for MNIST except that it is based on the different classes.
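The retrieval sets for both MNIST and CIFAR-10 can be constructed with the same sampling routine; the following sketch is an illustrative reconstruction with hypothetical labels, not the authors' code.

```python
import numpy as np

def build_retrieval_set(query_label, labels, rng, size=30):
    """Sample k related and size-k unrelated items for one query, k ~ U{1..9}."""
    k = int(rng.integers(1, 10))                 # number of related results
    same = np.flatnonzero(labels == query_label)
    diff = np.flatnonzero(labels != query_label)
    related = rng.choice(same, size=k, replace=False)
    unrelated = rng.choice(diff, size=size - k, replace=False)
    results = np.concatenate([related, unrelated])
    relevance = (labels[results] == query_label).astype(int)
    return results, relevance

rng = np.random.default_rng(5)
labels = rng.integers(0, 10, size=10_000)        # hypothetical class labels
results, relevance = build_retrieval_set(3, labels, rng)
```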

We use 5 convolutional neural networks in the same way as for MNIST, based on the regularization rates shown in Table 5.

Table 5. Different regularization rates applied to CIFAR-10.

Dropout's p for Conv.   0.25    0.25    0.25    0.1     0.1
Dropout's p for FC      0.5     0.3     0.1     0.3     0.1
L2's λ                  5e-4    5e-4    5e-4    5e-4    5e-4

We apply a batch size of 50, a learning rate of 0.0005, and 20 epochs to train AttRN. Other computational details are the same as for the MNIST data set. The error rates and the standard deviations are shown in Table 6. AttRN-HL again achieves the best performance, followed by AttRN-SM and OASIS-5 in terms of MAP.

Table 6. Errors of different image retrieval methods for CIFAR-10.

error    OASIS-1    OASIS-5    AttRN-SM   AttRN-HL
MAP      16.46%     13.72%     13.65%     13.41%
sd.      (0.08%)    (0.08%)    (0.06%)    (0.07%)
NDCG3    19.19%     15.95%     15.93%     15.61%
sd.      (0.11%)    (0.08%)    (0.06%)    (0.08%)
NDCG5    17.94%     14.86%     14.80%     14.55%
sd.      (0.06%)    (0.08%)    (0.08%)    (0.07%)

error    RSVM-100k  RSVM-500k  λMART-2%   λMART-5%
MAP      13.74%     13.77%     18.49%     19.27%
sd.      (0.20%)    (0.13%)    (0.06%)    (0.14%)
NDCG3    15.91%     15.87%     19.34%     20.05%
sd.      (0.20%)    (0.15%)    (0.22%)    (0.46%)
NDCG5    14.78%     14.81%     18.71%     19.28%
sd.      (0.20%)    (0.14%)    (0.11%)    (0.26%)

Table 7 shows that replacing the pooling functions from "mean" to "max" again yields very small changes in the error rates, which means that AttRN is quite robust.

Table 7. Errors of "mean" and "max" pooling for CIFAR-10.

error            MAP      NDCG3    NDCG5
AttRN-HL-mean    13.41%   15.61%   14.55%
sd.              (0.07%)  (0.08%)  (0.07%)
AttRN-HL-max     13.40%   15.61%   14.54%
sd.              (0.07%)  (0.08%)  (0.07%)

Again we display three good cases of retrieval and three poorer ones from AttRN-HL in terms of the actual images, shown in Figure 4 below. We observe that the images considered as related tend to be in the same superclass as the query, but may be in a different class due to the variation within each class of images and to blurred images.

Figure 4. Actual query images and the 1st to 7th retrieved images in the CIFAR-10 data set. (Rows are grouped into good and poor retrieval cases, with related and unrelated images marked.)

4.3. The 20 Newsgroups Data Set

The 20 Newsgroups data set contains online news documents with 20 different topics. There are 11,293 training documents, from which 1,293 are held out for validation, and 7,528 testing documents. The topics can be categorized into 7 superclasses: religion, computer, "for sale," cars, sports, science, and politics. We consider each document as a query, and randomly choose 30 retrieved documents as follows: the number of documents of the same topic is uniformly distributed from 3 to 7, the number of documents of the same superclass but a different topic is also uniformly distributed from 3 to 7, and the remaining documents are of different superclasses. We impose the order on the documents in two ways: (1) documents with the same topic are considered to be related, and otherwise not; (2) documents with the same superclass are considered to be related, and otherwise not. We train our model with the first type of order, and calculate error rates for both orders.

We pretrain the documents with three different word2vec-type (Mikolov et al., 2013) models: CBOW, skip-gram, and GLOVE (Pennington et al., 2014). We note that all these techniques apply neighboring words to predict word occurrences with a neural network, based on different assumptions. Therefore, applying the attention mechanism to incorporate the three models can weaken the underlying distributional assumptions and result in a more flexible embedding structure.

We apply text8 as the training corpus for CBOW and skip-gram, and download the pretrained weights for GLOVE from Wikipedia 2014 and Gigaword 5. Each vector representing a word is of dimension 300. For each document, we represent it as a tf-idf vector (Aggarwal & Zhai, 2012) and apply the word2vec weights to transform it into a 300-dimensional vector, which is the same as VecAvg in Socher et al. (2013). For this instance, the embeddings are fixed and not modified during the training process of our model. We also impose a softmax layer corresponding to the 20 classes on top of word2vec and use the resulting final weights as pretrained weights for AttRN.
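A sketch of this VecAvg-style document embedding follows; the vocabulary size, term counts, and idf weights are hypothetical, and the exact tf-idf weighting details follow the conventions in Aggarwal & Zhai (2012).

```python
import numpy as np

def doc_vector(counts, idf, word_vecs):
    """tf-idf weighted average of word vectors (VecAvg-style document embedding)."""
    tfidf = counts * idf                       # term frequencies times idf weights
    tfidf = tfidf / (np.linalg.norm(tfidf) + 1e-12)
    return tfidf @ word_vecs                   # (V,) @ (V, 300) -> (300,)

rng = np.random.default_rng(6)
V = 5_000                                      # hypothetical vocabulary size
word_vecs = rng.normal(size=(V, 300))          # e.g. CBOW, skip-gram, or GLOVE
counts = rng.poisson(0.01, size=V).astype(float)
idf = rng.uniform(1.0, 5.0, size=V)
doc = doc_vector(counts, idf, word_vecs)       # one of the N = 3 document embeddings
```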
We compare our model against OASIS as well as Ranking SVM and LambdaMART. The results of RankNet are not shown here because we have found it to be unsuitable for this data set. For AttRN and OASIS, we apply a batch size of 100, a learning rate of 0.0001, and 50 epochs. Other computational details are the same as for the previous data sets. Table 8 considers any two documents of the same topic to be related, while Table 9 considers any two documents of the same superclass to be related.

Table 8. Errors of different text retrieval methods for 20 Newsgroups with regard to different topics.

error    OASIS-1    OASIS-3    AttRN-SM   AttRN-HL
MAP      15.56%     15.07%     14.78%     14.79%
sd.      (0.01%)    (0.01%)    (0.01%)    (0.004%)
NDCG3    17.83%     17.25%     16.87%     16.93%
sd.      (0.02%)    (0.02%)    (0.02%)    (0.01%)
NDCG5    17.16%     16.59%     16.28%     16.29%
sd.      (0.02%)    (0.02%)    (0.01%)    (0.02%)

error    RSVM-20k   RSVM-100k  λMART-2%   λMART-5%
MAP      23.24%     19.98%     32.90%     31.58%
sd.      (0.60%)    (0.10%)    (1.20%)    (0.87%)
NDCG3    24.35%     21.20%     34.62%     33.55%
sd.      (0.20%)    (0.05%)    (1.26%)    (0.88%)
NDCG5    24.73%     21.37%     35.33%     34.04%
sd.      (0.48%)    (0.08%)    (1.27%)    (0.95%)

Table 9. Errors of different text retrieval methods for 20 Newsgroups with regard to different superclasses.

error    OASIS-1    OASIS-3    AttRN-SM   AttRN-HL
MAP      30.58%     30.15%     30.14%     30.19%
sd.      (0.02%)    (0.01%)    (0.01%)    (0.04%)
NDCG3    13.59%     13.14%     12.93%     12.95%
sd.      (0.02%)    (0.02%)    (0.02%)    (0.01%)
NDCG5    18.36%     17.92%     17.82%     17.79%
sd.      (0.03%)    (0.01%)    (0.01%)    (0.01%)

error    RSVM-20k   RSVM-100k  λMART-2%   λMART-5%
MAP      35.87%     34.03%     41.52%     40.27%
sd.      (0.30%)    (0.06%)    (0.91%)    (0.55%)
NDCG3    19.61%     16.83%     29.35%     28.34%
sd.      (0.25%)    (0.04%)    (1.24%)    (0.63%)
NDCG5    24.92%     22.10%     34.05%     32.65%
sd.      (0.38%)    (0.08%)    (1.14%)    (0.70%)

Tables 8 and 9 show that AttRN-SM is the most suitable for 20 Newsgroups, while AttRN-HL achieves the second best and is relatively close. We also observe that MAP for superclasses is too strict for this data set, because all documents of the same superclass are taken into account. NDCG scores are more realistic because a threshold is given and less related search results are down-weighted.

To conclude, AttRN-HL has the most robust performance across all three data sets, and is thus the recommended choice. Moreover, for the problems considered in this work, OASIS is more suitable than Ranking SVM and LambdaMART, and thus a competitive baseline.

5. Conclusion

In this paper, we proposed a new neural network for learning-to-rank problems which applies the attention mechanism to incorporate different embeddings of queries and search results, and ranks the search results with a listwise approach. Data experiments show that our model yields improvements over state-of-the-art techniques. For the future, it would be of interest to consider improving the RNN structure in the attention mechanism, and tailoring the embedding part of the neural network to this problem.

References

Aggarwal, C. and Zhai, C. (eds.). Mining text data. Springer, New York, USA, 2012.

Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations, San Diego, USA, 2015.

Blei, D., Ng, A., and Jordan, M. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

Burges, C. From ranknet to lambdarank to lambdamart: An overview. Technical report, June 2010.

Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and Hullender, G. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), Bonn, Germany, 2005.
Burges, C., Ragno, R., and Le, Q. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems 19 (NIPS 2006), Vancouver, Canada, 2006.

Cao, Z., Qin, T., Liu, T., Tsai, M., and Li, H. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning (ICML 2007), Corvallis, Oregon, USA, 2007.

Chechik, G., Sharma, V., Shalit, U., and Bengio, S. An online algorithm for large scale image similarity learning. In Advances in Neural Information Processing Systems 22 (NIPS 2009), Vancouver, Canada, 2009.

Cho, K., Courville, A., and Bengio, Y. Describing multimedia content using attention-based encoder-decoder networks. IEEE Transactions on Multimedia, 17(11), 2015.

Ciresan, D., Meier, U., Gambardella, L., and Schmidhuber, J. Convolutional neural network committees for handwritten character classification. In ICDAR, pp. 1250–1254, 2011.

Das, R., Zaheer, M., and Dyer, C. Gaussian lda for topic models with word embeddings. In The 2015 Conference of the Association for Computational Linguistics (ACL 2015), 2015.

Joachims, T. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, pp. 133–142, New York, NY, USA, 2002. ACM.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Liu, T. Learning to rank for information retrieval. Springer Berlin Heidelberg, Germany, 2011.

Meier, U., Ciresan, D., Gambardella, L., and Schmidhuber, J. Better digit recognition with a committee of simple neural nets. In ICDAR, pp. 1135–1139, 2011.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26 (NIPS 2013), Lake Tahoe, Nevada, USA, 2013.

Pennington, J., Socher, R., and Manning, C. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 2014.

Reddy, R. Speech understanding systems: summary of results of the five-year research effort at Carnegie Mellon University. Technical report, 1977.

Severyn, A. and Moschitti, A. Learning to rank short text pairs with convolutional deep neural networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 2015.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., and Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642, Stroudsburg, PA, October 2013. Association for Computational Linguistics.

Song, Y., Wang, H., and He, X. Adapting deep ranknet for personalized search. In The 7th ACM WSDM Conference, New York City, USA, 2014.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.

Tsai, M., Liu, T., Qin, T., Chen, H., and Ma, W. Frank: a ranking method with fidelity loss. In The 30th Annual International ACM SIGIR Conference, Amsterdam, the Netherlands, 2007.

Vinyals, O., Fortunato, M., and Jaitly, N. Pointer networks. In Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, Canada, 2015.

Wan, J., Wang, D., Hoi, S., Wu, P., Zhu, J., Zhang, Y., and Li, J. Deep learning for content-based image retrieval: a comprehensive study. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, Florida, USA, 2014.

Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., and Wu, Y. Learning fine-grained image similarity with deep ranking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, USA, 2014.

Wu, P., Hoi, S., Xia, H., Zhao, P., Wang, D., and Miao, C. Online multimodal deep similarity learning with application to image retrieval. In Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain, 2013.