Arxiv:2105.06232V1 [Cs.CL] 13 May 2021

Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters Yan Xu*, Etsuko Ishii*, Samuel Cahyawijaya, Zihan Liu, Genta Indra Winata, Andrea Madotto, Dan Su, Pascale Fung Center for Artificial Intelligence Research (CAiRE) The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong fyxucb,[email protected], [email protected] Abstract I want to visit Tokyo and get a night view from Tokyo Tower. To diversify and enrich generated dialogue responses, knowledge-grounded dialogue has been investigated in recent Oh, isn't it better from Tokyo Skytree, the tallest tower? years. The existing methods tackle the knowledge ground- 1. Tokyo Tower is the second tallest ing challenge by retrieving the relevant sentences over a structure in Japan, behind Tokyo large corpus and augmenting the dialogues with explicit ex- Skytree. 2. Tokyo Tower is painted in white tra information. Despite their success, however, the exist- Existing Natural Language Knowledge and orange to comply with air ing works have drawbacks on the inference efficiency. This Works Corpus safety regulations. Generation paper proposes KnowExpert, an end-to-end framework 3. Tokyo Tower has transmission Retrieval antennae used for radio and to bypass the explicit retrieval process and inject knowl- television broadcasting. edge into the pre-trained language models with lightweight Knowledge Selction adapters and adapt to the knowledge-grounded dialogue task. To the best of our knowledge, this is the first attempt to Natural Language tackle this challenge without retrieval in this task under an Ours open-domain chit-chat scenario. The experimental results Generation show that KnowExpert performs comparably with some retrieval-based baselines while being time-efficient in inference, demonstrating the potential of our proposed direction. Knowledge Expert Weighting Introduction Figure 1: KnowExpert is a retrieval-free dialogue generation framework. Unlike the existing works, The open-domain conversational agents have been rapidly KnowExpert leverages knowledge stored in model progressing in recent years. Although such models are ca- parameters of knowledge experts by weighting them based pable of producing grammatically correct and natural re- on the topic distribution over dialogue history. sponses based on the conversation history, conversations with these systems still present an apparent gap compared to human-to-human conversation. One primary reason is that existing dialogue systems are lacking understanding of cer- able responses. In this scheme, knowledge retrieval and tain knowledge and thus cannot involve in deeper conversa- knowledge selection are considered to make up the knowl- tions with humans when they dive into a specific topic (Li edge conceptualization process. et al. 2020). To bridge this gap, many research works de- Despite having a major progression and a great success velop knowledge-grounded conversational agents (Dinan arXiv:2105.06232v2 [cs.CL] 15 Sep 2021 in achieving promising performances on the correspond- et al. 2019; Chen et al. 2020; Zhao et al. 2020). ing task, this retrieval-based approach has drawbacks on As it is shown in Figure 1, many recent works (Lian et al. its efficiency: First, knowledge retrieval in corpora needs 2019; Kim, Ahn, and Kim 2019; Roller et al. 2020; Chen a model to search over a large amount of data, which re- et al. 2020; Zhao et al. 2020) develop the solutions com- quires considerable memory resources to store the whole prising: (1) Knowledge Retrieval, for retrieving the related knowledge corpus, and additional processing time to retrieve knowledge sentences from a large corpus (e.g., Wikipedia); knowledge and conduct further knowledge selection; Sec- 2) Knowledge Selection, for selecting the most relevant ond, adding knowledge as additional context to the language knowledge sentences for generation; and 3) Knowledge- generation model also cause significant computation over- augmented Generation, for augmenting the retrieved knowl- head, which slows down the language generation process. edge and conversation history to generate more knowledge- Efficiency plays an essential role in the practical use of di- *These authors contributed equally. alogue systems, and it is necessary to generate responses Copyright © 2022, Association for the Advancement of Artificial faster by requiring fewer resources to support more active Intelligence (www.aaai.org). All rights reserved. users. + Output Topic Topic + 1 Modeling Modeling Training + Feed-forward Up Projection Knowledge ReLU 2 Experts GPT-2 Layer Training Feed-forward GPT-2 Layer Down Projection Topic Modeling Layer Norm Task 3 GPT-2 Layer Adaptation Input Adapter Usr Sys Usr Token type Figure 2: High-level architecture of the model. Taking a dialogue history as an Figure 3: Illustration of the training pro- input, the adapters are inserted upon the GPT-2 layers, acting as the knowledge cedure, where the thick lined modules are experts, to enhance the response generation with the help from a topic model trained while the rest (dash lined) kept which assigns weights over the knowledge experts. frozen in each training step. Recently, large pre-trained language models (LMs) have the relevance of the dialogue sample and the clusters. We been shown to have the capability to carry implicit knowl- thus fine-tune LM layers where frozen pre-trained adapters edge (Wang et al. 2020; Lauscher et al. 2020), which can be are inserted for task adaptation. Experimental results show further applied to downstream classification tasks (Shwartz that KnowExpertperforms comparably with some strong et al. 2020). Many existing works have proved that the retrieval-based baselines, while the inference process of “knowledge” could be embedded in the pre-training pro- KnowExpertis much more efficient since extra knowledge cess (Brown et al. 2020). The explorations on closed-book sentences are not required as one of the components of the QA task (Petroni et al. 2019; Roberts, Raffel, and Shazeer inputs. 2020; Wang, Liu, and Zhang 2021) with large pre-train LMs Our contributions are three-fold: (1) to the best of our also indicate the potential of leveraging the knowledge em- knowledge, we are the first to explore injecting prior knowl- bedded inside the LMs. For task-oriented dialogue systems, edge to generative models for knowledge-grounded dialogue Madotto et al. (2020) stores the knowledge bases (KBs) with generation task under open-domain chit-chat scenario; (2) different sizes directly into the model parameters by align- our model conduct knowledge conceptualization without an ing auto-extracted dialogue templates with the correspond- explicit knowledge retrieval process, which has constant in- ing KBs for each data sample. Our scenario is different from ference time regardless of the size of the knowledge corpus; those mentioned above. For both closed-book QA tasks and and (3) our model performs comparably with some strong task-oriented dialogue tasks, given a question or user query, baselines and shows the potential of purely generation meth- relevant knowledge choices are highly constrained by the ods in the task. inputs. However, open-domain chit-chat suffers much from the one-to-many issue between the inputs and possible out- Related Work puts. In other words, given the inputs on a specific topic, the choice of knowledge candidates is varying, which brings Knowledge-Grounded Dialogue new challenges to embed knowledge in this task. The knowledge-grounded dialogue task requires the models Inspired by these explorations, we propose to tackle the to identify relevant knowledge sentences from a large cor- knowledge-grounded dialogue challenge by using the im- pus for grounding the response generation. IR systems, such plicit knowledge in LMs, to serve as the knowledge con- as TF-IDF, can retrieve related knowledge over a large cor- ceptualization process under the open-domain chit-chat sce- pus quickly. However, the effectiveness of the IR systems is nario. Compared to the existing work scheme shown in Fig- the bottleneck. They are only leveraged to retrieve several ure 1, we bypass the retrieval step and propose an end- relevant documents coarsely. However, providing the mod- to-end framework, KnowExpert, to inject the knowledge els with more documents may not improve the system since corpus into the memory of pre-trained LMs and incorpo- it will also bring more noise into the inputs. What is more, rate the learned knowledge, taking advantage of latent top- the length of packed inputs could exceed the length limita- ics, for knowledge-grounded dialogue generation. In the tion of the LMs. Thus, the existing works still conduct fur- model, lightweight adapters (Bapna and Firat 2019) are at- ther fine-grained knowledge selection to improve the accu- tached in the pre-trained GPT-2 (Radford et al. 2019), act- racy of the knowledge retrieval process, which is one of the ing as knowledge experts. The knowledge sentences are critical problems in the knowledge-grounded dialogue task. embedded into different knowledge experts by pseudo- Motivated by this, latent variables have been introduced to conversation style training, while the latent topics measure minimize the gap between prior and the posterior knowl- Wikipedia is a free, multilingual Wikipedia is a free, multilingual Wikipedia online encyclopedia. online encyclopedia. [1] Wikipedia is a free, multilingual It is written and maintained by As of 2021, it was ranked as the online encyclopedia. [2] It is written a community of volunteer 13th most popular site. and maintained by a community of contributions. volunteer contributors. [3] Wikipedia Wikipedia is the largest and most- is the largest and most-read reference Wikipedia is the largest and most- read reference work in history. work in history. [4] It is consistently read reference work in history. Usr Sys Usr Sys one of the 15 most popular web- It is consistently one of the most It is consistently one of the most sites as ranked by Alexa. [5] As of popular websites as ranked by Alexa. popular websites as ranked by Alexa. 2021, it was ranked as the 13th most It is written and maintained by popular site.

Load more