Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation

Prakhar Gupta| Yulia Tsvetkov♠ Jeffrey P. Bigham|,~
|Language Technologies Institute, Carnegie Mellon University
♠Paul G. Allen School of Computer Science & Engineering, University of Washington
~Human-Computer Interaction Institute, Carnegie Mellon University
[email protected], [email protected], [email protected]

Abstract

Open-domain neural dialogue models have achieved high performance in response ranking and evaluation tasks. These tasks are formulated as a binary classification of responses given in a dialogue context, and models generally learn to make predictions based on context-response content similarity. However, over-reliance on content similarity makes the models less sensitive to the presence of inconsistencies, incorrect time expressions and other factors important for response appropriateness and coherence. We propose approaches for automatically creating adversarial negative training data to help ranking and evaluation models learn features beyond content similarity. We propose mask-and-fill and keyword-guided approaches that generate negative examples for training more robust dialogue systems. These generated adversarial responses have high content similarity with the contexts but are either incoherent, inappropriate or not fluent. Our approaches are fully data-driven and can be easily incorporated in existing models and datasets. Experiments on classification, ranking and evaluation tasks across multiple datasets demonstrate that our approaches outperform strong baselines in providing informative negative examples for training dialogue systems.¹

1 Introduction

Due to growing availability of dialogue corpora (Li et al., 2017; Zhang et al., 2018; Smith et al., 2020) and the advancement of neural architectures (Radford et al., 2019; Brown et al., 2020; Devlin et al., 2019), dialogue systems have achieved considerable success. As typically formulated, dialogue models generate one or more candidate responses to a provided context, consisting of past dialogue turns. Dialogue ranking (Zhou et al., 2018; Wu et al., 2019) and evaluation models (Tao et al., 2018; Yi et al., 2019; Sato et al., 2020), in turn, are deployed to select and score candidate responses according to coherence and appropriateness.

Ranking and evaluation models are generally trained using true positive responses and randomly selected negative responses, which raises two issues. First, random negative candidates often have low content similarity with the context, and thus models learn to associate response coherence and appropriateness with content similarity (Yuan et al., 2019; Whang et al., 2021; Sai et al., 2020). In real systems, generated response candidates tend to be more similar in terms of content, and so other factors (e.g., time expressions, dialogue acts, inconsistencies) tend to be more important. Second, randomly selecting candidates as negative examples in an open domain context can result in false negatives, leading to misclassification of appropriate responses.

To make dialogue models more robust to the spurious pattern of content similarity, prior work proposed to leverage adversarial and counterfactual examples (Kaushik et al., 2020; Srivastava et al., 2020). A reliable method for creating counterfactual data is to collect human-written adversarial negative responses (Sai et al., 2020), but it is expensive, time-consuming, and difficult to scale. Our goal is to create reliable automatic methods for synthesizing adversarial negative responses.

The most common approach to generating natural language adversarial examples is to paraphrase or insert typos, synonyms, or words relevant to the context in the inputs (Iyyer et al., 2018; Ebrahimi et al., 2018; Alzantot et al., 2018; Zhang et al., 2019). In open domain conversations, however, a context can have a wide range of possible responses with varied forms and semantics. Small lexical variations via substitutions and paraphrasing do not provide adequate coverage over the possible space of adversarial responses, and they can also lead to generation of false negatives due to the open-ended nature of dialogues. Creating adversarial dialogue responses is thus different, and can be more challenging than in other natural language domains.

We propose two approaches for adversarial response creation: 1) a mask-and-fill approach that corrupts gold responses related to the context but retains content similarity, and 2) a keyword-guided generative approach that uses concepts from the context to generate topically relevant but incoherent responses. These approaches do not require additional annotations, are black-box (do not need access to model parameters), and are easily adapted to new datasets and domains.

The main contributions of this paper are: 1) We identify and discuss error patterns present in retrieval and generation model outputs, which are difficult to detect due to high content similarity; 2) To the best of our knowledge, we are the first to propose automatic approaches for creating adversarial responses for dialogue model training in a black-box setting; and, 3) We demonstrate that our proposed approaches achieve better performance compared to strong baselines on two datasets on dialogue classification, ranking and evaluation tasks.

¹Code and data are publicly available https://github.com/prakharguptaz/Adv_gen_dialogue

Table 1: Error categories prevalent in inappropriate responses with high context-response semantic relatedness. We present 7 categories with their descriptions and sample context and response pairs. For each category we also indicate whether it is frequently observed in Retrieval (R) or Generation (G) models. Models which simply learn to associate response coherence with content similarity often ignore these errors. Our approaches create adversarial negative data for training dialogue models by introducing such errors in context relevant utterances.

C-ent — Incorrect entities or actors (R,G): Incorrect subject or object of verbs or presence of one or more incorrect entities or coreference.
    Context: I am so happy that you are doing okay.
    Response: My friend is always happy.

C-time — Incorrect time expressions (R): Use of incorrect time expressions or tense of verbs.
    Context: What are you going to do on Monday?
    Response: Yesterday, I celebrated my daughter's wedding anniversary.

C-cont — Contradictory or extraneous details (R,G): Presence of details which make the response inconsistent within itself or contradict the context.
    Context: A: I don't know why I bothered to come here. B: Did you enjoy your stay?
    Response: I enjoyed the concert a lot.

C-speaker — Incorrect speaker turn (R): The response is relevant to the conversation but from the wrong speaker.
    Context: What starting salary would you expect here?
    Response: If you work overtime, I will pay you extra salary.

C-follow — Does not directly address the context (R,G): The response does not follow immediately from the context.
    Context: What would you like for main course sir?
    Response: I know very well how to make noodles, and I taught one of my friends.

C-strat — Incorrect strategies (R,G): Use of incorrect dialogue act, emotion, persona or style.
    Context: I can't find the paper clips.
    Response: Ok, great work.

C-lang — Poor language (G): Presence of poor grammar, incorrect sentence structures or repetitions.
    Context: Do you have mixed drinks available here?
    Response: Yes. This order is divided by 16 divided tions for main main ones of order.

2 Properties of Adversarial Responses

Models trained using randomly sampled negative examples tend to assign high scores to responses with high content similarity with the context, and often ignore other important factors necessary for response appropriateness and coherence. Therefore, we aim to generate adversarial negative responses which have high content similarity with the context, but which still possess factors rendering the responses inappropriate to the context. We present the categorization of such factors or error types which can make a response inappropriate in Table 1. For each category, we provide its description and sample context-response pairs. To create this categorization, we manually analyzed responses present in outputs of generative models, candidates of retrieval sets, and human written adversarial dialogue responses (Sai et al., 2020). Categories C-ent, C-time and C-cont are errors related to various inconsistencies and logical flaws in the responses and indicate poor response appropriateness. Categories C-speaker, C-follow and C-strat are error types specific to the dialogue setting and indicate poor response coherence. Category C-lang indicates poor response fluency. Our categorization of errors is inspired by the categorization suggested by Pagnoni et al. (2021) for factuality of summarization, and Higashinaka et al. (2019); Ko et al. (2019) and Sato et al. (2020) for dialogue. These categories inform our approaches as well as error analysis.

3 Methodology

For a given dialogue context C and its gold response Rg, our goal is to generate an adversarial response Ra such that while achieving high scores from dialogue ranking or evaluation models, it should not be a valid response to the context C.

Figure 1: Mask-and-fill approach using ILM model. ILM is trained to infill n-grams in place of blanks in a response. Tokens after [infill] replace the [blank]

Dialogue
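To make the mask-and-fill corruption concrete, the following is a minimal, self-contained sketch of the idea. The paper's actual method infills masked spans of the gold response with a trained ILM model; this toy version instead fills the blanks with content words drawn from the context — a deliberate simplification of ours, not the paper's procedure. The stopword list, function names, and example dialogue are illustrative assumptions. The output keeps high lexical overlap with the context while no longer being a coherent reply, which is exactly the property an adversarial negative needs.

```python
import re

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "i", "you", "to", "is", "are", "so", "that",
             "do", "did", "what", "my", "me", "your", "of", "and", "on",
             "for", "it", "in", "am"}

def keywords(text):
    """Extract content words (non-stopword tokens) from an utterance."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]

def mask_and_fill(context, gold_response):
    """Toy mask-and-fill corruption: mask the content words of the gold
    response and fill each blank with a content word taken from the
    context. The result stays topically close to the context but is no
    longer a valid reply, i.e. an adversarial negative example."""
    ctx_words = keywords(context)
    out, ctx_i = [], 0
    for tok in gold_response.split():
        core = re.sub(r"[^a-zA-Z']", "", tok).lower()
        if core not in STOPWORDS and ctx_words:
            out.append(ctx_words[ctx_i % len(ctx_words)])  # fill from context
            ctx_i += 1
        else:
            out.append(tok)  # keep function words as-is
    return " ".join(out)

context = "What are you going to do on Monday?"
gold = "I will visit my parents on Monday."
print(mask_and_fill(context, gold))  # -> I going monday my going on monday
```

A real implementation would replace the fill step with a learned infilling model so the negatives remain fluent; the sketch only shows why such negatives are hard for content-similarity-based rankers: every content word in the output also appears in the context.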