An Exploration of Approaches to Integrating Neural Reranking Models in Multi-Stage Ranking Architectures
Zhucheng Tu, Matt Crane, Royal Sequiera, Junchen Zhang, and Jimmy Lin
David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada
{michael.tu, ma.crane, rdsequie, j345zhan, [email protected]}

arXiv:1707.08275v1 [cs.IR] 26 Jul 2017
SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 7-11, 2017, Shinjuku, Tokyo, Japan
© 2017 Copyright held by the owner/author(s).

ABSTRACT

We explore different approaches to integrating a simple convolutional neural network (CNN) with the Lucene search engine in a multi-stage ranking architecture. Our models are trained using the PyTorch deep learning toolkit, which is implemented in C/C++ with a Python frontend. One obvious integration strategy is to expose the neural network directly as a service. For this, we use Apache Thrift, a software framework for building scalable cross-language services. In exploring alternative architectures, we observe that once trained, the feedforward evaluation of neural networks is quite straightforward. Therefore, we can extract the parameters of a trained CNN from PyTorch and import the model into Java, taking advantage of the Java Deeplearning4J library for feedforward evaluation. This has the advantage that the entire end-to-end system can be implemented in Java. As a third approach, we can extract the neural network from PyTorch and "compile" it into a C++ program that exposes a Thrift service. We evaluate these alternatives in terms of performance (latency and throughput) as well as ease of integration. Experiments show that feedforward evaluation of the convolutional neural network is significantly slower in Java, while the performance of the compiled C++ network does not consistently beat the PyTorch implementation.

1 INTRODUCTION

As an empirical discipline, information retrieval research requires substantial software infrastructure to index and search large collections. To address this challenge, many academic research groups have built and shared open-source search engines with the broader community—prominent examples include Lemur/Indri [9, 10] and Terrier [7, 13]. These systems, however, are relatively unknown beyond academic circles. With the exception of a small number of companies (e.g., commercial web search engines), the open-source Lucene search engine (and its derivatives such as Solr and Elasticsearch) has become the de facto platform for deploying search applications in industry.

There is a recent push in the information retrieval community to adopt Lucene as the research toolkit of choice. This would lead to a better alignment of information retrieval research with the practice of building search applications, hopefully leading to richer academic–industrial collaborations, more efficient knowledge transfer of research innovations, and greater reproducibility of research results. Recent efforts include the Lucene4IR workshop (https://sites.google.com/site/lucene4ir/home) organized by Azzopardi et al. [3], the Lucene for Information Access and Retrieval Research (LIARR) Workshop [2] at SIGIR 2017, and the Anserini IR toolkit built on top of Lucene [24].

Given the already substantial and growing interest in applying deep learning to information retrieval [11], it would make sense to integrate existing deep learning toolkits with open-source search engines. In this paper, we explore different approaches to integrating a simple convolutional neural network (CNN) with the Lucene search engine, evaluating alternatives in terms of performance (latency and throughput) as well as ease of integration.

Fundamentally, neural network models (and even more broadly, learning-to-rank approaches) for a variety of information retrieval tasks behave as rerankers in a multi-stage ranking architecture [1, 5, 8, 11, 14, 18, 22]. Given a user query, systems typically begin with a document retrieval stage, where a standard ranking function such as BM25 is used to generate a set of candidates. These are then passed to one or more rerankers to generate the final results. In this paper, we focus on the question answering task, although the architecture is exactly the same: we consider a standard pipeline architecture [17] where the natural language question is first used as a query to retrieve a set of candidate documents, which are then segmented into sentences and rescored with a convolutional neural network (CNN). Viewed in isolation, this final stage is commonly referred to as answer selection.
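To make this retrieve-then-rerank pipeline concrete, the following is a minimal Python sketch (our illustration, not code from the paper); searcher, split_sentences, and cnn are hypothetical stand-ins for Lucene retrieval, sentence segmentation, and the trained model:

    # Sketch of the two-stage pipeline: bag-of-words retrieval,
    # then neural reranking of candidate sentences.
    def answer_question(question, searcher, split_sentences, cnn, h=50):
        # Stage 1: use the question as a bag-of-words query
        # (e.g., BM25) to retrieve h candidate documents.
        docs = searcher.search(question, h)
        # Segment retrieved documents into candidate answer sentences.
        candidates = [s for doc in docs for s in split_sentences(doc)]
        # Stage 2: rescore each candidate with the neural reranker.
        scored = [(cnn.score(question, c), c) for c in candidates]
        # Return candidates sorted by model score, best first.
        return sorted(scored, key=lambda x: x[0], reverse=True)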
We explored three different approaches to integrating a CNN for answer selection with Lucene in the context of an end-to-end question answering system. The fundamental challenge we tackle is that Lucene is implemented in Java, whereas many deep learning toolkits (including PyTorch, the one we use) are written in C/C++ with a Python frontend. We desire a seamless yet high-performance way to interface between components in different programming languages; in this respect, our work shares similar goals as Pyndri [19] and Luandri [12].

Overall, our integration efforts proceeded along the following line of reasoning:

• Since we are using the PyTorch deep learning toolkit to train our convolutional neural networks, it makes sense to simply expose the network as a service. For this, we use Apache Thrift (https://thrift.apache.org/), a software framework for building scalable cross-language services.

• The complexities of deep learning toolkits lie mostly in training neural networks. Once trained, the feedforward evaluation of neural networks is quite straightforward and can be completely decoupled from backpropagation training. Therefore, we explored an approach where we extract the parameters of the trained CNN from PyTorch and import the model into Java, using the Deeplearning4J library for feedforward evaluation (see the parameter-export sketch after this list). This has the advantage of language uniformity—the entire end-to-end system can be implemented in Java.

• The ability to decouple neural network training from deployment introduces additional options for integration. We explored an approach where we directly "compile" our convolutional neural network into a C++ program that also exposes a Thrift service.
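One way to realize the parameter hand-off in the second bullet is sketched below. This is our own minimal illustration, not the paper's code: it dumps a trained PyTorch model's weights to a .npz archive, a format we chose for convenience, so they can be reloaded on the Java side (e.g., as ND4J arrays for Deeplearning4J).

    import numpy as np
    import torch

    # Export every tensor in the trained model's state dict to a
    # plain .npz archive that a Java-side loader can read back.
    def export_parameters(model: torch.nn.Module, path: str = "cnn_params.npz"):
        arrays = {name: tensor.detach().cpu().numpy()
                  for name, tensor in model.state_dict().items()}
        np.savez(path, **arrays)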
Experiments show that feedforward evaluation of the convolutional neural network is significantly slower in Java, which suggests that Deeplearning4J is not yet as mature and optimized as other toolkits, at least when used directly "out-of-the-box". Note that PyTorch presents a Python frontend, but the backend takes advantage of highly-optimized C libraries. The relative performance of our C++ implementation and the PyTorch implementation is not consistent across different processors and operating systems, and thus it remains to be seen if our "network compilation" approach can yield performance gains.

2 BACKGROUND

In this paper, we explore a very simple multi-stage architecture for question answering that consists of a document retrieval and an answer selection stage. An input natural language question is used as a bag-of-words query to retrieve h documents from the collection. These documents are segmented into sentences, which are treated as candidates that feed into the answer selection module.

Given a question q and a candidate set of sentences {c1, c2, ..., cn}, the answer selection task is to identify sentences that contain the answer. For this task, we implemented the convolutional neural network (CNN) shown in Figure 1, which is a slightly simplified version of the model proposed by Severyn and Moschitti [16]. We chose to work with this particular CNN for several reasons.

[Figure 1: The overall architecture of our convolutional neural network for answer selection. Question and answer sentence matrices (e.g., "Where was the cat?" and "The cat sat on the mat") pass through convolutional feature maps and pooling to representations xq and xd, which are combined with additional features xfeat at a join layer, followed by a hidden layer and a softmax layer.]

Each input sentence is treated as a sequence of words [w1, w2, ..., w|S|], each of which is translated into its corresponding distributional vector (i.e., from a word embedding), yielding a sentence matrix. Convolutional feature maps are applied to this sentence matrix, followed by ReLU activation and simple max-pooling, to arrive at a representation vector xq for the question and xd for the "document" (i.e., the candidate answer sentence). At the join layer (see Figure 1), all intermediate representations are concatenated into a single vector:

    xjoin = [xq^T; xd^T; xfeat^T]    (1)

The final component of the input vector at the join layer consists of "extra features" xfeat derived from four word overlap measures between the question and the candidate sentence: word overlap and idf-weighted word overlap, computed between all words and between non-stopwords only.

Our model is implemented using the PyTorch deep learning toolkit, based on a reproducibility study of Severyn and Moschitti's model [16] by Rao et al. [15] using the Torch deep learning toolkit (in Lua). Our network configuration uses the best setting, as determined by Rao et al. via ablation analyses. Specifically, they found that the bilinear similarity component actually decreases effectiveness, and therefore it is not included in our model.

For our experiments, the CNN was trained using the popular TrecQA dataset, first introduced by Wang et al. [23] and further elaborated by Yao et al. [25]. Ultimately, the data derive from the Question Answering Tracks from TREC 8-13 [20, 21].

3 METHODOLOGY

3.1 PyTorch Thrift

Given that our convolutional neural network for answer selection (as described in the previous section) is implemented and trained in PyTorch, the most obvious architecture for integrating disparate components is to expose the neural network model as a service. This answer selection service can then be called, for example, from a Java client that integrates directly with Lucene. For building this service, we take advantage of Apache Thrift.

Apache Thrift is a widely-deployed framework in industry for building cross-language services. From an interface definition language (IDL), Thrift can automatically generate server and client stubs. The IDL for our service is shown in Figure 2:

    namespace py qa

    service QuestionAnswering {
      double getScore(1:string question, 2:string answer)
    }

Figure 2: Thrift IDL for a service that reranks question-answer pairs.
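As a concrete illustration of the Python side of such a service, here is a minimal sketch (our own, not from the paper). It assumes that thrift --gen py has placed the generated code for the Figure 2 IDL in a qa package, and that score_pair is a hypothetical wrapper around the trained CNN's feedforward pass:

    from qa import QuestionAnswering
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from thrift.server import TServer

    class QuestionAnsweringHandler:
        def getScore(self, question, answer):
            # Feedforward pass of the trained CNN over the pair;
            # score_pair is a hypothetical helper, not part of Thrift.
            return float(score_pair(question, answer))

    # Serve the answer selection model over a socket, one request
    # at a time (a single-threaded server keeps the sketch simple).
    processor = QuestionAnswering.Processor(QuestionAnsweringHandler())
    transport = TSocket.TServerSocket(port=9090)
    server = TServer.TSimpleServer(
        processor, transport,
        TTransport.TBufferedTransportFactory(),
        TBinaryProtocol.TBinaryProtocolFactory())
    server.serve()

A Java client integrated with Lucene would then call getScore through the corresponding Thrift-generated Java stubs.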