Arxiv:2109.00527V1 [Cs.CL] 1 Sep 2021

Boosting Search Engines with Interactive Agents Leonard Adolphsy∗ Benjamin Boerschingerz Christian Buckz Michelle Chen Huebscherz Massimiliano Ciaramitaz Lasse Espeholtz Thomas Hofmanny Yannic Kilcher†∗ y ETH, Zurich z Google Research Abstract Can machines learn to use a search engine as an interactive tool for finding information? That would have far reaching consequences for making the world’s knowledge more accessible. This paper presents first steps in designing agents that learn meta-strategies for contextual query refinements. Our approach uses machine reading to guide the selection of refinement terms from aggregated search results. Agents are then empowered with simple but effective search operators to exert fine-grained and transparent control over queries and search results. We develop a novel way of generating synthetic search sessions, which leverages the power of transformer-based generative language models through (self-)supervised learning. We also present a reinforcement learning agent with dynamically constrained actions that can learn interactive search strategies completely from scratch. In both cases, we obtain significant improvements over one-shot search with a strong information retrieval baseline. Finally, we provide an in-depth analysis of the learned search policies. 1 Introduction Web search is the portal to a vast ecosystem of general and specialized knowledge, designed to support humans in their effort to seek relevant information and make well-informed decisions. Utilizing search as a tool is intuitive, and users quickly learn interactive, language-based search strategies characterized by sequential reasoning, exploration, and synthesis [Rutter et al., 2015, Russell, 2019]. The success of web search relies on machines learning human notions of relevance, but also on the users’ ability to (re-)formulate appropriate queries, grounded in a tacit understanding of strengths and limitations of search engines. Given recent breakthroughs in language models (LM) [Vaswani arXiv:2109.00527v1 [cs.CL] 1 Sep 2021 et al., 2017, Devlin et al., 2019, Brown et al., 2020] as well as reinforcement learning (RL) [Mnih et al., 2013, Silver et al., 2016, Berner et al., 2019] it seems timely to ask whether, and how, machine learning agents can be trained to interactively use search engines. However, the lack of expert search sessions puts supervised learning out of reach, and RL is often ineffective for solving complex natural language understanding (NLU) tasks. The feasibility of search agents hence remains an open question, which has inspired our research. We pursue a design philosophy in which search agents operate in structured action spaces defined as generative grammars, resulting in policies that are compositional, productive, and semantically transparent. Further domain knowledge is incorporated through the use of well-known models and algorithms from NLU and information retrieval (IR). Most notably, we develop a self-supervised learning scheme for generating high-quality search sessions by exploiting insights from relevance feedback [Rocchio, 1971]. We train a supervised LM search agent based on T5 [Raffel et al., 2020] directly on this data. We also build an RL search agent based on MuZero [Schrittwieser et al., 2020], which performs planning via rule-constrained Monte Carlo tree search and a learned dynamics model. ∗Work carried out in part during internships at Google. Created by joni from the Noun Project Created by joni Created by joni from the Noun Project from the Noun Project Figure 1: Schematic overview of the agent’s interaction with the search environment (BM25). Our evaluation is performed on a passage retrieval task for open-domain question answer- ing [Voorhees, 2000]. The T5 agent achieves a relative improvement of 95% in ranking performance over a Lucene BM25 system, running as the search engine. The improvement of the MuZero agent is somewhat lower at 73%; however, both agents learn diverse policies that lead to deep and effective explorations of the search results. We open-source the code and trained checkpoints for both agents.2 2 Self-Supervised Learning for Interactive Search It has been a powerful vision for more than 20 years to design search engines that are intuitive and simple to use. Despite their remarkable success, search engines are not perfect and may not yield the most relevant result(s) in one shot. This is particularly true for rare and intrinsically difficult queries, which may require interactive exploration by the user to be answered correctly and exhaustively. For instance, Russell [2019] observes that search engines are sensitive to local conditions and phrasing: if one asks Google “How many languages are there in India” the answer is 2, while Bing says 23 and Wolfram Alpha, 400.3 A comprehensive answer can be found by carefully refining the query, borrowing relevant terms (e.g., “dialects”) from the search results. By rephrasing the question as “how many languages or dialects are spoken in India” one gets a single answer clarifying that there are 2 official languages (Hindi and English), 22 major languages and 720 dialects. Contextual query refinement is a familiar technique [Jansen et al., 2009], even among children [Rutter et al., 2015], used to improve search by combining evidence from previous results and background knowledge [Huang and Efthimiadis, 2009]. Such refinements often rely on inspecting result snippets and titles or on skimming the content of top-ranked documents. This process is iterative and may be repeated to produce a sequence of queries q0; q1; : : : ; qT until (optimistically) a satisfactory answer is found. It seems natural to mimic this interactive process by a search agent, which learns the basic step of generating a follow-up query from previous queries and their search results. Furthermore, it is noteworthy that power users apply dedicated search operators and sophisticated investigative strategies to solve deep search puzzles [Russell, 2019]. In particular, unary operators for exact term matches (‘+’) and for term exclusions (‘-’) offer a great deal of fine-grained control and transparency and as such are highly effective in the hands of expert users to overcome limitations in document retrieval and ranking. As we show in this paper, these operators are also pivotal in designing interactive agents that learn to use search engines. 2.1 Result Aggregation for Contextual Refinement Inspired by human experts’ strategies, we suggest using a machine reader (MR, cf. [Rajpurkar et al., 2016]) with a passage scorer (PS) to rank the most relevant text snippets found so far. Specifically, we use a DPR reader [Karpukhin et al., 2020], which builds upon a pre-trained BERT [Devlin et al., 2019] model in order to identify the most promising answer span within each result document d and which also estimates the probability of d containing the (unspecified) answer P(d ANSWER q) [0; 1]. This probability can also be viewed as a PS-score that induces a calibrated ranking3 acrossj all2 result 2https://github.com/google-research/google-research/tree/master/muzero and https:// github.com/google-research/language/tree/master/language/search_agents. 3In Russell’s words, this question “needs research”; cf. searchresearch1.blogspot.com. 2 documents within a session. Moreover, the machine reader can extract a context window around the best answer in every document. The default setting in our experiments is to use the 5 top-ranked documents and extract a 70 token window centered at the answer predicted by the reader. In addition, we include the document titles, truncated to 10 tokens. Finally, the query tokens and refinements describing qt are also included. In total this leads to a segmented observation token sequence ot which is truncated to length 512. We then use pre-trained transformer-based language models (LM), BERT or T5, to produce an≤ embedding st from which the search agent will generate the next query. If we denote result sets for qt by t, then we get diagrammatically (see also Figure 1): D 2 3 q0; : : : ; qt MR/PS LM agent search engine o s q (1) 4 # 5 t t t+1 ;:::; |7−!{z } |7−!{z } |7−!{z } D0 Dt observation encoding generation We will focus on the case, where qt+1 is obtained from qt through augmentation. This may add a keyword w Σidx, the search index vocabulary, with the usual disjunctive search engine semantics or a structured2 search term formed by the use of unary operators (‘+’,‘-’) and fields (see below). 2.2 Rocchio Query Expansions In the absence of training sessions collected from human expert users, we propose to generate synthetic search sessions in a self-supervised manner, making use of a set of question-answer pairs (q; a), which – cf. §5.1 – can also be synthetic. We initialize q0 = q and aim to find a reasonably short sequence of query refinements that make progress towards identifying documents containing the correct answer (i.e. matching the string a). For this purpose, we make use of an NDCG reward function qt t rt [0; 1] (see §4 for details). A query is not further refined, if either t = 4 (maximal length)7! D or7! if no2 score increasing refinement can be found. To create candidate refinements, we make use of the idea of pseudo-relevance feedback as suggested in Rocchio [1971]. An elementary refinement – called a Rocchio expansion – then takes the form q τ α β q := q ∆q ; ∆q := [+ TITLE CONTENT ] w ; w Σ := Σ Σ Σ Σ (2) t+1 t t t j − j t t 2 t t [ t [ t [ t where Σt refers to a set of terms accessible to the agent. By that we mean terms that occur in the top PS-ranked session documents. We use superscripts to refer to the vocabulary of the question (q), titles (τ), answers (α) or bodies (β) of documents in ot. Note that adding terms Σt would make refinements difficult to reproduce for an agent and thus would provide supervision62 of low utility.

Load more