Arxiv:2106.11575V1 [Cs.CL] 22 Jun 2021 Mans in a Dialogue
Total Page:16
File Type:pdf, Size:1020Kb
Learn to Resolve Conversational Dependency: A Consistency Training Framework for Conversational Question Answering Gangwoo Kim Hyunjae Kim Jungsoo Park Jaewoo Kangy Korea University fgangwoo kim,[email protected] fjungsoo park,[email protected] Conversation History Abstract Title : Leonardo da Vinci One of the main challenges in conversational q Who were his pupils? question answering (CQA) is to resolve the 1 conversational dependency, such as anaphora and ellipsis. However, existing approaches a1 with his pupils Salai and Melzi. do not explicitly train QA models on how to resolve the dependency, and thus these mod- q2 Was he close to his pupils? els are limited in understanding human dia- logues. In this paper, we propose a novel a2 Leonardo's most intimate relationships framework, EXCORD(Explicit guidance on how to resolve Conversational Dependency) q Was he close with anyone else? to enhance the abilities of QA models in com- 3 prehending conversational context. EXCORD Question Rewrite first generates self-contained questions that can be understood without the conversation Self-contained Q history, then trains a QA model with the pairs Was Leonardo da Vinci close with anyone else of original and self-contained questions using q3 other than his pupils Salai and Melzi? a consistency-based regularizer. In our exper- iments, we demonstrate that EXCORD signifi- cantly improves the QA models’ performance Figure 1: An example of the QuAC dataset (Choi et al., by up to 1.2 F1 on QuAC (Choi et al., 2018), 2018). Owing to linguistic phenomena in human con- and 5.2 F1 on CANARD (Elgohary et al., versations, such as anaphora and ellipsis, the current 2019), while addressing the limitations of the question q3 should be understood based on the conver- existing approaches.1 sation history: q1, a1, q2, and a2. Question q3 can be reformulated as a self-contained question q~3 via a ques- 1 Introduction tion rewriting (QR) process. Conversational question answering (CQA) involves the current question “Was he close with anyone modeling the information-seeking process of hu- else?,” a model should resolve the conversational arXiv:2106.11575v1 [cs.CL] 22 Jun 2021 mans in a dialogue. Unlike single-turn question dependency, such as anaphora and ellipsis, based answering (QA) tasks (Rajpurkar et al., 2016; on the conversation history. Kwiatkowski et al., 2019), CQA is a multi-turn A line of research in CQA proposes the end-to- QA task, where questions in a dialogue are context- end approach, where a single QA model jointly 2 dependent; hence they need to be understood with encodes the evidence document, the current ques- the conversation history (Choi et al., 2018; Reddy tion, and the whole conversation history (Huang et al., 2019). As illustrated in Figure1, to answer et al., 2018; Yeh and Chen, 2019; Qu et al., 2019a). y Corresponding author In this approach, models are required to automati- 1Our models and code are available at: cally learn to resolve conversational dependencies. https://github.com/dmis-lab/excord However, existing models have limitations to do so 2While the term “context” usually refers to the evidence document from which the answer is extracted, in CQA, it without explicit guidance on how to resolve these refers to conversational context. dependencies. In the example presented in Figure 1, models are trained without explicit signals that on various CQA datasets in future work. We sum- “he” refers to “Leonardo da Vinci,” and “anyone marize the contributions of this work as follows: else” can be more elaborated with “other than his pupils, Salai and Melzi.” • We identify the limitations of previous ap- proaches and propose a unified framework Another line of research proposes a pipeline ap- to address these. Our novel framework im- proach that decomposes the CQA task into question proves QA models by incorporating QR mod- rewriting (QR) and QA, to reduce the complexity els, while reducing the reliance on them. of the task (Vakulenko et al., 2020). Based on the conversation history, QR models first generate self- • Our framework encourages QA models to contained questions by rewriting the original ques- learn how to resolve the conversational de- tions, such that the self-contained questions can be pendency via consistency regularization. To understood without the conversation history. For the best of our knowledge, our work is the first instance, the current question q3 is reformulated as to apply the consistency training framework the self-contained question q~3 by a QR model in to the CQA task. Figure1. After rewriting the question, QA models are asked to answer the self-contained questions • We demonstrate the effectiveness of our rather than the original questions. In this approach, framework on three CQA benchmarks. Our QA models are trained to answer relatively simple framework is model-agnostic and systemati- questions whose dependencies have been resolved cally improves the performance of QA mod- by QR models. Thus, this limits reasoning abilities els. of QA models for the CQA task, and causes QA models to rely on QR models. 2 Background In this paper, we emphasize that QA models 2.1 Task Formulation can be enhanced by using both types of ques- In CQA, a single instance is a dialogue, which tions with explicit guidance on how to resolve the consists of an evidence document d, a list of ques- conversational dependency. Accordingly, we pro- tions q = [q1; :::; qT ], and a list of answers for pose EXCORD (Explicit guidance on how to Re- the questions a = [a1; :::; aT ], where T represents solve Conversational Dependency), a novel train- the number of turns in the dialogue. For the t-th ing framework for the CQA task. In this framework, turn, the question qt and the conversation history we first generate self-contained questions using QR Ht = [(q1; a1); :::; (qt−1; at−1)] are given, and a models. We then pair the self-contained questions model should extract the answer from the evidence with the original questions, and jointly encode them document as: to train QA models with consistency regularization (Laine and Aila, 2016; Xie et al., 2019). Specifi- a^t = arg max P(atjd; qt; Ht) (1) cally, when original questions are given, we encour- at age QA models to yield similar answers to those where P(·) represents a likelihood function over when self-contained questions are given. This train- all the spans in the evidence document, and a^t is ing strategy helps QA models to better understand the predicted answer. Unlike single-turn QA, since the conversational context, while circumventing the current question qt is dependent on the con- the limitations of previous approaches. versation history Ht, it is important to effectively To demonstrate the effectiveness of EXCORD, encode the conversation history and resolve the we conduct extensive experiments on the three conversational dependency in CQA. CQA benchmarks. In the experiments, our frame- work significantly outperforms the existing ap- 2.2 End-to-end Approach proaches by up to 1.2 F1 on QuAC (Choi et al., A naive approach in solving CQA is to train a 2018) and by 5.2 F1 on CANARD (Elgohary et al., model in an end-to-end manner (Figure 2a). Since 2019). In addition, we find that our framework standard QA models generally are ineffective in is also effective on a dataset CoQA (Reddy et al., the CQA task, most studies attempt to develop a 2019) that does not have the self-contained ques- QA model structure or mechanism for encoding the tions generated by human annotators. This indi- conversation history effectively (Huang et al., 2018; cates that the proposed framework can be adopted Yeh and Chen, 2019; Qu et al., 2019a,b). Although Answer Answer Answer Answer QA Loss QA Loss QA Loss Consistency Loss QA Loss Question Question Question Answering Answering Answering Evidence Evidence Evidence Document Document Self-contained Document Question Self-contained Question Question Question Rewriting Rewriting Current Conversation Current Conversation Current Conversation Question History Question History Question History (a) End-to-end approach (b) Pipeline approach (c) Ours Figure 2: Overview of the end-to-end approach, the pipeline approach, and ours. In the end-to-end approach, QA models are asked to answer the original questions based on the conversation history. In the pipeline approach, the self-contained questions are generated by a QR model, and then QA models answer them. Standard QA models are commonly used in this approach; however conversational QA models that encode the history can be adopted (the dotted line in Figure (b)). In ours, the original and self-contained question are jointly encoded to train QA models with the consistency loss. these efforts improved performance on the CQA predicting the answer in the pipeline approach as: benchmarks, existing models remain limited in un- derstanding conversational context. In this paper, P(atjd; qt; Ht) ≈ rewr read (2) we emphasize that QA models can be further im- P (~qtjqt; Ht) · P (atjd; q~t) proved with explicit guidance using self-contained rewr read questions effectively. where P (·) and P (·) are the likelihood func- tions of QR and QA models, respectively. q~t is a 2.3 Pipeline Approach self-contained question rewritten by the QR model. The main limitation of the pipeline approach Recent studies decompose the task into two sub- is that QA models are never trained on the origi- tasks to reduce the complexity of the CQA task. nal questions, which limits their abilities to under- The first sub-task, question rewriting, involves stand the conversational context. Moreover, this generating self-contained questions by reformu- approach makes QA models dependent on QR mod- lating the original questions. Neural-net-based els; hence QA models suffer from the error prop- QR models are commonly used to obtain self- agation from QR models.