SPARTA: Efficient Open-Domain Question Answering Via Sparse Transformer Matching Retrieval

SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval Tiancheng Zhao1, Xiaopeng Lu2 and Kyusong Lee1 SOCO Inc. 1ftianchez,[email protected] Language Technologies Institute, Carnegie Mellon University [email protected] Abstract able ranker (Chidambaram et al., 2018; Lee et al., 2018; Wang et al., 2018). End-to-end systems We introduce SPARTA, a novel neural retrieval method that shows great promise in have also been proposed to combine passage re- performance, generalization, and interpretabil- trieval and machine reading by directly retrieving ity for open-domain question answering. Un- answer span (Seo et al., 2019; Lee et al., 2019). like many neural ranking methods that use Despite of their differences, the above approaches dense vector nearest neighbor search, SPARTA are all built on top of the dual-encoder architec- learns a sparse representation that can be ef- ture, where query and answer are encoded into ficiently implemented as an Inverted Index. fixed-size dense vectors, and their relevance score The resulting representation enables scalable neural retrieval that does not require expen- is computed via dot products. Approximate nearest sive approximate vector search and leads to neighbor (ANN) search is then used to enable real- better performance than its dense counter- time retrieval for large dataset (Shrivastava and Li, part. We validated our approaches on 4 open- 2014). domain question answering (OpenQA) tasks In this paper, we argue that the dual-encoder and 11 retrieval question answering (ReQA) structure is far from ideal for open-domain QA tasks. SPARTA achieves new state-of-the-art retrieval. Recent research shows its limitations results across a variety of open-domain question answering tasks in both English and Chi- and suggests the importance of modeling complex nese datasets, including open SQuAD, Natural queries to answer interactions for strong QA per- Question, CMRC and etc. Analysis also con- formance. Seo et al.(2019) shows that their best firms that the proposed method creates human performing system underperforms the state-of-the- interpretable representation and allows flexi- art due to query-agnostic answer encoding and its ble control over the trade-off between perfor- over-simplified matching function. Humeau et al. 1 mance and efficiency. (2019) shows the trade-off between performance 1 Introduction and speed when moving from expressive cross- attention in BERT (Devlin et al., 2018) to simple in- Open-domain Question Answering (OpenQA) is ner product interaction for dialog response retrieval. the task of answering a question based on a Therefore, our key research goal is to develop new knowledge source. One promising approach a method that can simultaneously achieve expres- arXiv:2009.13013v1 [cs.CL] 28 Sep 2020 to solve OpenQA is Machine Reading at Scale sive query to answer interaction and fast inference (MRS) (Chen et al., 2017). MRS leverages an infor ranking. formation retrieval (IR) system to narrow down to We introduce SPARTA (Sparse Transformer a list of relevant passages and then uses a machine Matching), a novel neural ranking model. Unlike reading comprehension reader to extract the final existing work that relies on a sequence-level in- answer span. This approach, however, is bounded ner product, SPARTA uses token-level interaction by its pipeline nature since the first stage retriever between every query and answer token pair, lead- is not trainable and may return no passage that ing to superior retrieval performance. Concretely, contains the correct answer. SPARTA learns sparse answer representations that To address this problem, prior work has focused model the potential interaction between every query on replacing the first stage retriever with a train- term with the answer. The learned sparse an- 1Work done during Lu’s internship at SOCO. swer representation can be efficiently saved in an Inverted Index, e.g., Lucene (McCandless et al., machine readers to extract the final answer (Chen 2010), so that one can query a SPARTA index with et al., 2017). It needs two stages because all ex- almost the same speed as a standard search engine isting machine readers, for example, BERT-based and enjoy the more reliable ranking performance models (Devlin et al., 2018), are prohibitively slow without depending on GPU or ANN search. (BERT only processes a few thousands of words Experiments are conducted on two settings: per second with GPU acceleration). Many attempts OpenQA (Chen et al., 2017) that requires phrase- have been made to improve the first-stage retrieval level answers and retrieval QA (ReQA) that re- performance (Chidambaram et al., 2018; Seo et al., quires sentence-level answers (Ahmad et al., 2019). 2019; Henderson et al., 2019; Karpukhin et al., Our proposed SpartaQA system achieves new state- 2020; Chang et al., 2020). Yet, the information of-the-art results across 15 different domains and 2 retrieval (IR) community has shown that simple languages with significant performance gain, in- word embedding matching do not perform well cluding OpenSQuAD, Open Natural Questions, for ad-hoc document search compared to classic OpenCMRC and etc. methods (Guo et al., 2016; Xiong et al., 2017). Moreover, model analysis shows that SPARTA To increase the expressiveness of dual encoders, exhibits several desirable properties. First SPARTA Xiong et al.(2017) develops kernel function to shows strong domain generalization ability and learn soft matching score at token-level instead of achieves the best performance compared to both sequence-level. Humeau et al.(2019) proposes classic IR method and other learning methods in Poly-Encoders to enable more complex interac- low-resources domains. Second, SPARTA is sim- tions between the query and the answer by letting ple and efficient and achieves better performance one encoder output multiple vectors instead of one than many more sophisticated methods. Lastly, it vector. Dhingra et al.(2020) incorporates entity provides a human-readable representation that is vectors and multi-hop reasoning to teach systems easy to interpret. In short, the contributions of this to answer more complex questions. (Lee et al., work include: 2020) augments the dense answer representation with learned n-gram sparse feature from contex- • A novel ranking model SPARTA that offers tualized word embeddings, achieving significant token-level query-to-answer interaction and improvement compared to the dense-only baseline. enables efficient large-scale ranking. Chang et al.(2020) explores various unsupervised pretraining objectives to improve dual-encoders’ • New state-of-the-art experiment results on 11 QA performance in the low-resources setting. ReQA tasks and 4 OpenQA tasks in 2 lan- Unlike most of the existing work based-on dual- guages. encoders, we explore a different path where we • Detailed analyses that reveal insights about focus on learning sparse representation and em- the proposed methods, including generaliza- phasizes token-level interaction models instead of tion and computation efficiency. sequence-level. This paper is perhaps the most related to the sparse representations from (Lee et al., 2 Related Work 2020). However, the proposed approach is cate- gorically different in the following ways. (1) it is The classical approach for OpenQA depends on stand-alone and does not need augmentation with knowledge bases (KB)s that are manually or au- dense vectors while keeping superior performance tomatically curated, e.g., Freebase KB (Bollacker (2) our proposed model is architecturally simpler et al., 2008), NELL (Fader et al., 2014) etc. Seman- and is generative so that it will understand words tic parsing is used to understand the query and com- that not appear in the answer document, whereas putes the final answer (Berant et al., 2013; Berant the one developed at (Lee et al., 2020) only models and Liang, 2014). However, KB-based systems are n-grams appear in the document. often limited due to incompleteness in the KB and inflexibility to changes in schema (Ferrucci et al., 3 Proposed Method 2010). A more recent approach is to use text data di- 3.1 Problem Formulation rectly as a knowledge base. Dr.QA uses a search en- First, we formally define the problem of answer gine to filter to relevant documents and then applies ranking for question answering. Let q be the input Figure 1: SPARTA Neural Ranker computes token-level matching score via dot product. Each query terms’ contribution is first obtained via max-pooling and then pass through ReLU and log. The final score is the summation of each query term contribution. question, and A = f(a; c)g be a set of candidate at indexing time. Since it is an offline opera- answers. Each candidate answer is a tuple (a; c) tion, we can use the most powerful model for where a is the answer text and c is context infor- indexing and simplify the computation needed mation about a. The objective is to find model at inference. parameter θ that rank the correct answer as high as possible, .i.e: As shown in Figure1, a query is repre- sented as a sequence of tokens q = [t1; :::tjqj] ∗ ∗ θ = argmax E[pθ((a ; c )jq)] (1) and each answer is also a sequence of tokens θ2Θ (a; c) = [c1; ::a1; ::ajaj; ca+1; :::cjcj]. We use a non- contextualized embedding to encode the query to- This formulation is general and can cover many kens to e , and a contextualized transformer model tasks. For example, typical passage-level retrieval i to encode the answer and obtain contextualized systems sets the a to be the passage and leaves c token-level embedding s : empty (Chen et al., 2017; Yang et al., 2019a). The j sentence-level retrieval task proposed at sets a to be E(q) = [e ; :::e ] Query Embedding (2) each sentence in a text knowledge base and c to be 1 jqj the surrounding text (Ahmad et al., 2019).

SPARTA: Efficient Open-Domain Question Answering Via Sparse Transformer Matching Retrieval

Cal Poly Maintanence Request

API 101: Modern Technology for Creating Business Value a Guide to Building and Managing the Apis That Empower Today’S Organizations

Learning Graphql Query Costs

Epson Epiqvision™ Mini EF12 Streaming Laser Projector

Student Success Handbook / Table of Contents DR

Google Earth Studio GSA Short Course

UNITEDHEALTHCARE INSURANCE COMPANY Cal Poly San Luis Obispo

Google Earth for Dummies

Blender Modelling

(Iot) Device Network Traffic and Power Consumption

Transaction / Regular Paper Title

K–12 Instructor Brings the World to Mesquite, TX Students Through VR with Google Expeditions