
Generating Informative Conclusions for Argumentative Texts

Shahbaz Syed†, Khalid Al-Khatib†, Milad Alshomary‡, Henning Wachsmuth‡, Martin Potthast†

Abstract

The purpose of an argumentative text is to support a certain conclusion. Yet, conclusions are often omitted, expecting readers to infer them instead. While appropriate when reading an individual text, this rhetorical device limits accessibility when browsing many texts (e.g., on a search engine or on social media). In these scenarios, an explicit conclusion makes for a good candidate summary of an argumentative text. This is especially true if the conclusion is informative, emphasizing specific concepts from the text. With this paper we introduce the task of generating informative conclusions: First, Webis-ConcluGen-21 is compiled, a large-scale corpus of 136,996 samples of argumentative texts and their conclusions. Second, two paradigms for conclusion generation are investigated: one extractive, the other abstractive in nature. The latter exploits argumentative knowledge that augments the data via control codes, finetuning the BART model on several subsets of the corpus. Third, insights are provided into the suitability of our corpus for the task, the differences between the two generation paradigms, the trade-off between informativeness and conciseness, and the impact of encoding argumentative knowledge. The corpus, code, and trained models are publicly available.[1]

[1] https://github.com/webis-de/ACL-21

1 Introduction

A conclusion of an argument is a statement that conveys a stance towards a specific target (Bar-Haim et al., 2017; Alshomary et al., 2020b). Drawing conclusions is an integral part of argumentation, but often various conclusions may be drawn from a set of premises. Consider the following argumentative text on caffeine, adapted from the web:[2]

"Caffeine stimulates the nervous system, signaling fat cells to break down body fat. It also increases epinephrine (adrenaline) levels, a fight-or-flight hormone preparing the body for physical exertion. With free fatty acids as fuel, on average, 12% higher performance is attainable."

[2] https://www.healthline.com/nutrition/top-13-evidence-based-health-benefits-of-coffee

Consider further these alternative conclusions:

1. Caffeine is good.
2. Caffeine improves physical performance.

The first conclusion conveys a pro stance towards the target, caffeine. The second conveys a pro stance towards caffeine, too, but it also emphasizes a specific concept ("physical performance"). The former conclusion is generic, only indicating the stance, while the latter is informative; a distinction also made in text summarization (Section 3).[3]

[3] Other works on argumentation use the term specificity to express a similar idea (Durmus et al., 2019; Ke et al., 2019).

Argumentative texts include short arguments, such as forum posts and reviews, as well as long-form texts, such as essays, blogs, and editorials. Most of these have an intended conclusion of which the authors seek to persuade their readers.[4] While the conclusion may already be implied in a given text, authors often choose not to provide one explicitly, either for rhetorical reasons (Habernal and Gurevych, 2015; Al-Khatib et al., 2016) or to encourage critical thinking (Martin et al., 2003). However, when browsing many argumentative texts (e.g., via a search engine or on a social media timeline), an explicit conclusion helps human readers (and by extension also machines) to quickly process the texts.

[4] An exception is an argumentative text dedicated to deliberation, which merely surveys the argument landscape on a given topic without trying to influence the reader's opinion.

In this paper, we introduce the task of generating informative conclusions for argumentative texts and take the first steps with four key contributions:
(1) Adaptation of the notion of informativeness from text summarization as a desired property of a conclusion, besides stating a target and the stance towards it. (2) Compilation of Webis-ConcluGen-21, a corpus of 136,996 pairs of argumentative texts and associated conclusions, creating the first large-scale ground truth for conclusion generation. (3) Modeling conclusion generation as an end-to-end task by finetuning a pretrained sequence-to-sequence model and augmenting the corpus with three types of argumentative knowledge: topic, target, and aspect. (4) Extensive quantitative and qualitative (crowdsourced) evaluation of both the quality of our dataset and the effectiveness of two paradigms for conclusion generation, namely extractive and abstractive approaches.

We present three key findings: (a) Finetuning pretrained language models on our dataset yields strong in-domain performance compared to the extractive approach. (b) Qualitative evaluation shows that the extractive approach generates more informative conclusions, demonstrating a trade-off between conciseness and informativeness. (c) Encoding argumentative knowledge guides the finetuning towards generating argumentative sentences; however, more sophisticated encoding techniques than the conventional control codes are needed to generate informative conclusions.

2 Related Work

Our work complements and builds on that of Alshomary et al. (2020b), who introduced a conceptual model for conclusion generation, outlining a three-step process: inferring the conclusion's target from the argument's premises, inferring the author's stance towards this target, and generating the conclusion based on these two pieces of information. But Alshomary et al. focused only on the first step of target inference, whereas we model conclusion generation as an end-to-end task.

Conclusion generation can be viewed as a task complementary to summarizing argumentative texts. Previous approaches to the summarization of such texts have been primarily extractive. Egan et al. (2016) proposed summarizing online discussions via "point" extraction, where a point is a verb and its syntactic arguments. Similarly, Bar-Haim et al. (2020) compiled the ArgKP corpus (which we also sample from in Section 4), comprised of arguments for a given topic mapped to key points, composing a summary from a large collection of relevant arguments. Wang and Ling (2016) proposed a data-driven approach using sequence-to-sequence models (Sutskever et al., 2014; Bahdanau et al., 2015) for summarizing movie reviews and debate portal arguments from idebate.org. Several argument mining approaches have also been applied to identify the main claim of arguments (Petasis and Karkaletsis, 2016; Daxenberger et al., 2017). Recently, Alshomary et al. (2020a) proposed a graph-based model using PageRank (Page et al., 1999) that extracts an argument's conclusion and its main supporting reason as an extractive snippet. This model is the core of our extractive summarization approach (Section 5).

A key difference between conclusion generation and general text summarization is the constraint that a conclusion must have a clear stance towards a certain topic. A similar constraint applies to high-quality summaries of long-form argumentative texts such as editorials (Syed et al., 2020), where the persuasiveness of the editorial should be preserved alongside its thesis. Therefore, existing summarization corpora, although large-scale, are unsuitable for studying conclusion generation: the majority of them contain only non-argumentative texts (e.g., news reports), which are better suited to general-purpose summarization (Kryscinski et al., 2019). Moreover, intrinsic evaluation of summarization corpora has revealed lower-quality and/or inconsistent ground truth, rendering them partially unfit for their intended purpose (Bommasani and Cardie, 2020). To fill this gap, we compile Webis-ConcluGen-21, a large-scale corpus of argumentative texts and their conclusions on diverse topics.
Pretrained language models have significantly advanced the state of the art in neural text summarization (Liu and Lapata, 2019; Zhang et al., 2019a; Rothe et al., 2020; Huang et al., 2020). However, they have been applied to the domain of argumentation only recently, specifically for argument generation. Gretz et al. (2020) proposed a pipeline based on GPT-2 (Radford et al., 2019) for generating coherent claims for a given debate topic. A more controlled approach was developed by Schiller et al. (2020), which performs argument generation with fine-grained control of topic, aspect (core reasoning), and stance. Conclusion generation can be viewed as supplementing argument generation: ideally, given a conclusion, an argument can be generated, constrained by the conclusion's target and stance. To the best of our knowledge, studies investigating pretrained language models for end-to-end conclusion generation do not exist. Besides providing a suitable corpus, we analyze the impact of encoding argumentative knowledge in pretrained language models and assess the popular method of control codes (Keskar et al., 2019; Cachola et al., 2020) for encoding the knowledge in our dataset. Furthermore, our qualitative evaluation highlights three key errors (Section 6) arising in the generated outputs that disqualify them as conclusions.

3 On Informative Conclusions

In the literature, the conclusion of an argument is the statement that depicts a particular stance towards a certain concept, the target (Walton et al., 2008; Alshomary et al., 2020b). Such a statement is also referred to as the claim of the argument (Toulmin, 2003; Daxenberger et al., 2017). For a long-form argumentative text with multiple claims, the conclusion is the main claim that conveys the overall stance towards the subject matter under discussion. The main claim is also known as the thesis or central claim in different genres (Van Dijk, 1995; Burstein and Marcu, 2003; Stab and Gurevych, 2014; Peldszus and Stede, 2015).

The quality of the conclusion of an argumentative text can be assessed along several dimensions, including strength, clarity, and specificity (Ke et al., 2019). Here, a strong connection between argumentation and text summarization can be observed, where the dimension corresponding to specificity is called informativeness. Text summarization distinguishes between indicative and informative summaries. An indicative summary only hints at the principal subject matter of a document to help decide whether to read it (Hovy and Lin, 1998; Kan et al., 2001). An informative summary, on the other hand, covers the main information in the source document, ideally serving as its surrogate (Maybury, 1999).

The conceptual connection between argumentation and summarization can be described as follows: the informativeness of a conclusion is closely related to the specificity dimension, in the sense that an informative conclusion must be specific to allow for a better understanding of an argumentative text's gist. Since "specificity" and "informativeness" may be used interchangeably, we opt for the latter and the term "informative conclusion" here, to underline the connection.

In contrast to indicative conclusions, which broadly convey (implicitly or explicitly) the stance towards a topic (e.g., "Caffeine is good."), informative conclusions also discuss specific concepts from (or implied by) the argumentative text (e.g., "Caffeine improves physical performance."). Concepts of the argumentative text exemplified in Section 1 may refer to the topic (e.g., "Is coffee beneficial?"), the target of the conclusion (e.g., "caffeine"), or a specific aspect (e.g., "energy levels").

4 The Webis-ConcluGen-21 Corpus

This section details the construction of the Webis Conclusion Generation Corpus 2021 (Webis-ConcluGen-21), a corpus of 136,996 pairs of argumentative texts and conclusions covering diverse topics. The corpus is derived from two reliable sources where the conclusions of argumentative texts are explicitly identifiable: Reddit's ChangeMyView forum and debate corpora.
4.1 Data Source: Reddit's ChangeMyView

ChangeMyView (CMV) is an online forum for persuasive discussions that start with a user who presents a view and asks others to challenge it. The forum's rules strictly enforce that (1) users' posts must contain sufficient reasoning, (2) posts must take a stance (and not be neutral), and (3) the title of a post must sufficiently sum up an author's view (as a statement, not a question).[5] Given these constraints, the original post of a discussion can be operationalized as an argumentative text, and the corresponding title as its (intended) conclusion. Starting from the Reddit crawls provided by Baumgartner et al. (2020), we compiled 61,695 such pairs by processing all CMV discussions up until August 2019. The included posts are those whose argumentative text is longer than ten words, whose conclusion is longer than two words, and whose title includes the "CMV" tag.[6] An average argumentative text is 312 words long, an average conclusion 15 words.

[5] https://reddit.com/r/changemyview/wiki/rules
[6] These heuristics reflect manual inspections and the fact that we did not wish to compile a representative sample of ChangeMyView's discussions, but a purposeful selection of high-quality pairs of argumentative texts and their conclusions. In light of this, the lower bounds are still quite inclusive with respect to extremely short samples.
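These selection heuristics are simple to state in code. The following is a minimal sketch, assuming each submission from the Pushshift dumps is available as a dict; the field names ("title", "selftext") and the tag check are illustrative assumptions, not the actual implementation.

    # Sketch of the CMV selection heuristics; field names are assumed.
    def keep_cmv_pair(post: dict) -> bool:
        title = post.get("title", "")    # operationalized as the conclusion
        text = post.get("selftext", "")  # operationalized as the argumentative text
        return ("cmv" in title.lower()       # title includes the "CMV" tag
                and len(text.split()) > 10   # argumentative text longer than ten words
                and len(title.split()) > 2)  # conclusion longer than two words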
To better understand the relation of the conclusions to their respective argumentative texts, and the expected difficulty of generating them, we manually analyzed a sample of 200 pairs.[7] Table 1 shows the proportion of extractive, paraphrased, and abstractive conclusions in our sample, where the former only need to be extracted, while the latter demand actual text synthesis. Paraphrases share aspects of both, though arguably, extracting the paraphrased part would suffice. Altogether, CMV provides 94.7% valid pairs of argumentative texts and conclusions at sufficiently low noise (5.3%). The share of non-trivial conclusions (abstractive + paraphrase) is sufficiently challenging, as found in our qualitative evaluation (Section 6).

[7] These examples were taken from the Dec-2019 Reddit submissions to ensure a truly hidden sample, as BART was originally trained on the OpenWebText dataset containing samples from Reddit (Liu et al., 2019; Radford et al., 2019).

Type           Description                                                  %
Extractive     Conclusion is present verbatim in the argumentative text.    12.8
Paraphrase     Conclusion is synonymous to, or a fusion of, a part of       24.1
               the argumentative text.
Abstractive    Conclusion is inferred from the argumentative text.          57.8
No conclusion  Conclusion cannot be derived from the argumentative text.     5.3

Table 1: Different types of conclusions in 200 CMV samples, and their relative proportion.

4.2 Data Source: Debate Corpora

Online debate portals facilitate semi-structured debates on controversial topics, for which pro and con arguments or argumentative texts are collected. Conclusions are clearly stated even for individual arguments. Given their high-quality curation, debate portals constitute the majority of argument corpora. We utilized the following existing corpora:

Kialo is a debate platform that enables "visual reasoning" in complex debates via a tree-based structure (Chaudoin et al., 2017). A key advantage here is the role of moderators in curating accepted arguments, rendering it a rich resource (Durmus et al., 2019). As debates progress, the arguments are reorganized into multiple hierarchies, each with a conclusion at its root.[8] We compiled this corpus from scratch in accordance with the website's terms and conditions. In 1,640 English discussions, at each level of the discussion tree, all pro arguments were matched to the corresponding root conclusion, obtaining a total of 82,728 examples.

[8] For an example, see: https://www.kialo.com/pro-life-vs-pro-choice-should-abortion-be-legal-5637

Args.me is a search engine (Wachsmuth et al., 2017) indexing the Args.me Corpus (Ajjour et al., 2019b), comprised of argumentative texts, their conclusions, and their stance, from four debate portals: debatewise.org, idebate.org, debatepedia.org, and debate.org. We used the "cleaned" version of this corpus containing 387,606 samples and applied further post-processing. On manual inspection, we observed that a number of examples from debate.org contained spam, sarcasm, or ad hominem attacks, or were not self-contained due to references to previous turns. To avoid noise, we excluded all examples from this portal. Next, we removed arguments with a con stance towards a conclusion,[9] since considering these examples for training would first require negating their conclusions to reflect the con stance; we leave such automatic claim negation (Bilu et al., 2015) for future work. Finally, to favor informative conclusions, we excluded arguments whose conclusion was the same as the discussion topic (which is generally indicative). This heavy filtering resulted in a total of 23,448 argument-conclusion pairs.

[9] This does not exclude conclusions that are already negations.

ArgsKP is a corpus of arguments and a set of key points written by domain experts on 28 topics (Bar-Haim et al., 2020). For each topic, the corpus contains multiple arguments which have been mapped via crowdsourcing to their respective key points. From this corpus, we obtained 2,341 pairs; again, only pro arguments and those that have been mapped to a specific key point, the conclusion.
Simi- organized into multiple hierarchies, each with a lar to CMV, the included argumentative texts were conclusion at its root.8 We compiled this corpus those whose length exceeded ten words. Also, argu- from scratch in accordance with the website’s terms mentative texts shorter than their conclusion were and conditions. In 1,640 English discussions, at excluded. This removed many pairs from the Kialo each level of the discussion tree, all pro arguments discussions. Altogether, we retained 75,301 usable were matched to the corresponding root conclusion, examples from all three corpora. obtaining a total of 82,728 examples. Args.me is a search engine (Wachsmuth et al., 4.3 Corpus Statistics 2017) indexing the Args.me Corpus (Ajjour et al., The argumentative texts are on average longer in 2019b), comprised of argumentative texts, their CMV (312 words) compared to those in debates from Reddit (Liu et al., 2019; Radford et al., 2019). (44.5 words). A reason is that, on debate por- 8For an example, see: https://www.kialo.com/pro-life-vs-pro- tals, each argumentative text seems to be a self- choice-should-abortion-be-legal-5637 contained argument. CMV posts, by comparison,

5 Generating Informative Conclusions

Given the mixture of conclusion types shown in Table 1, we approach the generation of informative conclusions according to two paradigms: one extractive approach combined with paraphrasing, and one abstractive approach combined with state-of-the-art argument mining technology.

5.1 Paraphrased Conclusion Generation

Paraphrased conclusions are fundamentally extractive in nature: an extracted sentence is reformulated to improve it. To extract conclusions, we employ the graph-based approach of Alshomary et al. (2020a), originally designed to generate snippets for argument search results. Given an argument, a snippet is generated as follows: (1) related arguments are retrieved as context, (2) all of the argument's sentences and those of the retrieved arguments are embedded, (3) the PageRank of the sentences is computed, and lastly (4) the argument's two top-ranked sentences are returned. Underlying this approach is the hypothesis that an extractive snippet for an argument should comprise its conclusion and its most important supporting premise. Sentences are thus scored regarding their centrality in the context of other arguments and their argumentativeness.

Our goal is to generate a single conclusion statement, so we consider only the top-ranked sentence from the approach of Alshomary et al. (2020a) as the conclusion. This sentence is automatically paraphrased using PEGASUS (Zhang et al., 2020a), finetuned on the Google PAWS dataset (Zhang et al., 2019b).[10] For instance, consider the top-ranked sentence from a post questioning the use of hormone blockers on transgender kids:[11]

"I don't see it as anything different, and I think it is scandalous to permanently change a child's entire life on a whim rather than treating their mental health."

After paraphrasing, it reads as follows:

"I think it's scandalous to change a child's life on a whim, rather than treating their mental health, and I don't see it as anything different."

The paraphraser primarily rearranges the sentence; phrases shared with the original are typical of the paraphrased sentences we reviewed. This approach, called Arg-PageRank, represents an advanced extractive paradigm.

[10] https://huggingface.co/tuner007/pegasus_paraphrase
[11] https://www.reddit.com/r/changemyview/comments/e97sir/cmv_giving_children_puberty_blockers_to_allow/
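The paraphrasing step can be reproduced with the public checkpoint from footnote 10. The sketch below is a minimal illustration; the decoding parameters are assumptions, not the exact configuration used for Arg-PageRank.

    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    MODEL = "tuner007/pegasus_paraphrase"  # checkpoint from footnote 10
    tokenizer = PegasusTokenizer.from_pretrained(MODEL)
    model = PegasusForConditionalGeneration.from_pretrained(MODEL)

    def paraphrase(sentence: str) -> str:
        """Paraphrase the top-ranked sentence extracted by Arg-PageRank."""
        batch = tokenizer([sentence], truncation=True, padding="longest",
                          return_tensors="pt")
        output = model.generate(**batch, max_length=60, num_beams=10)
        return tokenizer.decode(output[0], skip_special_tokens=True)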

5.2 Abstractive Conclusion Generation

Abstractive conclusions can be formulated freely, provided they capture the main pieces of information required for an informative conclusion: topic, targets, stance, and aspects. In this regard, our approach is three-fold (see Figure 1): (1) automatic extraction of the aforementioned pieces of information from a given argumentative text, (2) augmentation of the training examples in Webis-ConcluGen-21 using control codes, and (3) domain transfer of a pretrained abstractive news summarization model via finetuning on the augmented corpus.

Argumentative Knowledge Extraction. This step provides the prerequisite pieces of information to formulate an informative conclusion, namely topic, targets, and aspects. Table 2 shows an example.

Topic: An argumentative text's topic is a description of what it is about. For argumentative texts from debates, we use the associated debate title as the topic. For CMV posts, whose titles are also their conclusions, topic information is considered missing (denoted by an 'NA' token).

Targets: The target of a conclusion is typically a controversial concept or statement (Bar-Haim et al., 2017). For an argumentative text, an overlap with its topic is possible, though different targets can also be found in its premises. Moreover, when not explicitly stated, the targets of a conclusion can be inferred from either the targets of the premises or external knowledge bases. A set of possible targets for every argumentative text in the corpus is automatically identified using the target identification model of Alshomary et al. (2020b).

Aspects: Text spans that contribute to the core reasoning of an argument are called its aspects (Schiller et al., 2020). Aspects can be viewed as subtopics related to the main topic of an argumentative text, encoding a stance. Including aspects in a conclusion can render it more specific and, thus, informative. We identify aspects for all samples in the corpus using the model of Schiller et al., which trains a BERT-based (Devlin et al., 2019) ranker on a corpus of 5,032 high-quality argumentative sentences that are manually labeled with aspects at the token level.

Stance is excluded as an explicit input to our models. For CMV, by design, a post supports its title. For debate portals, only argumentative texts with pro stance towards their conclusion have been considered. Nevertheless, argumentative texts and their conclusions in our corpus may, implicitly or explicitly, express their own stance towards implicit or explicit targets. Implicit stance can be encoded via the aspects.

Figure 1: The three steps of our approach to abstractive conclusion generation: For all examples in the Webis-ConcluGen-21 corpus, (1) different pieces of argument knowledge are extracted, namely the discussion topic, possible conclusion targets, and covered aspects; (2) this knowledge is encoded using control codes; and (3) knowledge-specific variants of the distilled BART model (dbart, dbart-Topic, dbart-Targets, dbart-Aspects) are finetuned to generate informative conclusions.

Argument: Feminism as a 'linguistic term' often misses clarity, universal definition and regularly incorporates opposite goals at the same time in regard to key feminist issues as gender equality, gender-neutrality, non-binary and gender-related rights. The linguistic term thereby clouds public debate and hampers the setting of clear social and political goals in society.

Conclusion: Feminism is an umbrella of ideologies first and foremost, and consequently, it muddies the discussion of gender equality with its ideological baggage.

Topic: Is Feminism a Force For Good?

Aspects: clouds, gender equality, non-binary, opposite goals, public debate, gender-related rights, clarity, gender-neutrality, social and political goals, universal definition

Targets: The linguistic term; Feminism as a 'linguistic term'

Encoded Representation: <|TOPIC|>Is Feminism a Force For Good?<|ARGUMENT|>Feminism as a 'linguistic term' often misses clarity, universal definition and regularly incorporates opposite goals at the same time in regard to key feminist issues as gender equality, gender-neutrality, non-binary and gender-related rights. The linguistic term thereby clouds public debate and hampers the setting of clear social and political goals in society.<|TARGETS|>The linguistic term, Feminism as a 'linguistic term'<|CONCLUSION|>

Table 2: Example argument-conclusion pair along with topic, targets, and aspects. The last row shows the representation for finetuning models on specific types of encoded external knowledge (here, on conclusion targets).

Argumentative Knowledge Encoding. The extracted pieces of knowledge are encoded into a training example with control codes using special tokens (Cachola et al., 2020): <|TOPIC|>, <|ARGUMENT|>, <|ASPECTS|>, <|TARGETS|>, and <|CONCLUSION|>. Table 2 shows a corresponding example input sequence encoding the topic and the conclusion targets. To examine the impact of individual knowledge types, we create three versions of Webis-ConcluGen-21: topic-encoded, aspect-encoded, and target-encoded. Presuming the availability of a topic in nearly all real-world applications, it is also encoded in the latter two versions. Since aspects and targets overlap in 38.3% of the cases in the corpus, they are encoded independently.
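For illustration, the encoding amounts to a string-assembly step. The sketch below reproduces the target-encoded representation of Table 2; the assembly order for the aspect-encoded version is our assumption, mirroring how targets are appended.

    def encode_example(argument: str, topic: str = "NA",
                       targets: str = "", aspects: str = "") -> str:
        """Assemble a control-code-augmented input sequence (cf. Table 2)."""
        sequence = f"<|TOPIC|>{topic}<|ARGUMENT|>{argument}"
        if targets:  # target-encoded version
            sequence += f"<|TARGETS|>{targets}"
        if aspects:  # aspect-encoded version (assumed analogous)
            sequence += f"<|ASPECTS|>{aspects}"
        return sequence + "<|CONCLUSION|>"  # generation starts after this token

Applied to the example in Table 2 with its topic and targets, this yields exactly the encoded representation shown in the table's last row.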

Finetuning. As conclusion generation is closely related to abstractive text summarization, we picked BART (Lewis et al., 2020), a pretrained state-of-the-art summarization model, for finetuning on the three augmented versions of Webis-ConcluGen-21. However, BART has approximately 10% more parameters than BERT, which makes it resource-intensive to finetune. To account for this, we used a distilled checkpoint derived via the "shrink-and-finetune" approach of Shleifer and Rush (2020), in which large sequence-to-sequence models are compressed by extracting "distilled student models" (Sanh et al., 2019) from a teacher model (here, BART). We used distilled BART finetuned on the XSum corpus (Narayan et al., 2018), dbart-XSum, as provided by the Transformers library (Wolf et al., 2020), since the average length of our ground-truth conclusions is similar to that of the summaries in XSum. Additionally, we added our control codes as special tokens to the BART tokenizer during finetuning, in order to avoid splitting them into sub-word tokens while processing the encoded sequences.

5.3 Training Details

We compiled six variations of the corpus (with and without encoded knowledge) for finetuning the dbart-XSum model with 306M parameters.[12] Table 3 shows the training and validation splits for each model variant and the corresponding data subsets, and Table 4 shows the chosen hyperparameters. The standard finetuning regimen of the Transformers library[13] was employed to train each model on a V100 GPU for 6 epochs with batch size 1, dropout rate 0.1, the Adafactor optimizer, a learning rate of 3e-5, and beam search for inference. For dbart, the maximum source sequence length was set to 512 tokens, while for the knowledge-encoded models we increased it to 750 tokens to account for the appended knowledge in the input sequence. On a single V100 GPU, the runtime varies between 3 and 5 days per model, depending on the corresponding training splits.

[12] https://huggingface.co/sshleifer/distilbart-xsum-12-6
[13] https://github.com/huggingface/transformers/tree/master/examples/legacy/seq2seq

Model          Data               #Train    #Valid
dbart-XSum     XSum               204,045   n/a
dbart-CMV      CMV                 55,768    5,577
dbart-Debates  Debates             67,770    6,777
dbart          All                123,538   12,354
dbart-Topic    All+topic          123,538   12,354
dbart-Aspects  All+topic+aspects  122,040   12,192
dbart-Targets  All+topic+targets  110,867   11,068
Arg-PageRank   none (unsupervised model)

Table 3: Corpus splits for all six variants. 'All' refers to the entire Webis-ConcluGen-21 corpus. Models were automatically evaluated on a test set of 1,000 examples, and qualitatively on 300 examples (Section 6).

Parameter                    Value
max_target_length            100
warmup_steps                 500
eval_steps                   500
attention_dropout            0.1
label_smoothing              0.1
sampling                     sortish_sampler
seed                         5153
num_beams                    6
length_penalty               0.5
gradient_accumulation_steps  1
lr_scheduler                 linear

Table 4: Hyperparameters for finetuning BART.

We first applied dbart-XSum to the held-out test set of 200 examples analyzed for Table 1 to evaluate the domain transfer from news reports to argumentative texts. On manual evaluation, 79.1% of the outputs were invalid conclusions, primarily due to being non-argumentative (Section 6). This demonstrates that existing summarization models are ineffective when applied to argumentative texts and must be trained on task-specific data.
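A sketch of the model preparation and inference described above: the distilled checkpoint from footnote 12 is loaded, the control codes are registered as special tokens so the tokenizer keeps them intact, and decoding uses the beam-search settings of Table 4. Resizing the embedding matrix for the added tokens is an assumed standard step; the text above only states that the tokens were added to the tokenizer.

    from transformers import BartForConditionalGeneration, BartTokenizer

    CHECKPOINT = "sshleifer/distilbart-xsum-12-6"  # dbart-XSum, footnote 12
    tokenizer = BartTokenizer.from_pretrained(CHECKPOINT)
    model = BartForConditionalGeneration.from_pretrained(CHECKPOINT)

    # Register the control codes so they are not split into sub-word tokens.
    control_codes = ["<|TOPIC|>", "<|ARGUMENT|>", "<|ASPECTS|>",
                     "<|TARGETS|>", "<|CONCLUSION|>"]
    tokenizer.add_special_tokens({"additional_special_tokens": control_codes})
    model.resize_token_embeddings(len(tokenizer))  # assumed standard step

    def generate_conclusion(encoded_input: str) -> str:
        """Beam-search inference with the settings of Table 4."""
        batch = tokenizer(encoded_input, truncation=True, max_length=750,
                          return_tensors="pt")
        output = model.generate(**batch, max_length=100,
                                num_beams=6, length_penalty=0.5)
        return tokenizer.decode(output[0], skip_special_tokens=True)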
6 Evaluation

Our models are evaluated in two ways: (1) an automatic evaluation on a large test set using standard metrics, and (2) a manual evaluation on a smaller test set via crowdsourcing.

6.1 Automatic Evaluation

On a test set of 1,000 examples with known ground truth (500 each from CMV and from the debate corpora), we computed ROUGE (Lin, 2004)[14] and BERTScore (Zhang et al., 2020b)[15] for all models. Table 5 shows that dbart-XSum performs poorly on argumentative texts. Inspecting the reasons for this shortcoming, we found several outputs of the model to be either neutral sentences (despite having the right target) or hallucinations with artifacts from the XSum corpus (e.g., "In our series of letters from African journalists [...]" or "This week I've been writing about [...]"). Among the finetuned models, dbart, trained on the entire corpus without any encoded knowledge, performs best across all metrics. The knowledge-encoded models exhibit a drop in effectiveness, but still outperform the models trained on the sub-datasets, dbart-CMV and dbart-Debates.

All finetuned models generate concise outputs of similar length (12 words on average), while Arg-PageRank extracts longer spans (25 words). Outputs of the knowledge-encoded models are somewhat similar to each other (average pairwise Jaccard similarity of 0.43), compared to those of dbart (0.27 with any knowledge-encoded model).

[14] https://github.com/pltrdy/rouge
[15] https://github.com/Tiiiger/bert_score

Model          BERTScore (F1)  Rouge-1  Rouge-2  Rouge-L
dbart-XSum     0.21            15.28     3.10    13.31
dbart-CMV      0.32            20.35     7.11    18.80
dbart-Debates  0.23            15.38     4.85    14.22
dbart          0.39            31.73    19.48    30.87
dbart-Topic    0.34            23.74     9.56    22.14
dbart-Aspects  0.33            23.47     9.46    22.01
dbart-Targets  0.34            23.80     9.63    22.25
Arg-PageRank   0.20            15.35     3.20    13.37

Table 5: Automatic evaluation of models on the internal test set consisting of 1,000 pairs (500 each from CMV and Debates). BERTScore is the re-scaled F1 score; in addition, average Rouge-1, -2, and -L are reported.
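The automatic evaluation can be sketched with the metric packages linked in footnotes 14 and 15; the exact invocation below is an assumption.

    from rouge import Rouge       # footnote 14
    from bert_score import score  # footnote 15

    def evaluate(candidates: list, references: list) -> dict:
        """Average Rouge-1/-2/-L and rescaled BERTScore F1 on the test set."""
        rouge_avg = Rouge().get_scores(candidates, references, avg=True)
        _, _, f1 = score(candidates, references, lang="en",
                         rescale_with_baseline=True)
        return {"rouge": rouge_avg, "bertscore_f1": f1.mean().item()}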
6.2 Manual Evaluation

Given the results of the automatic evaluation, only the models trained on the entire corpus were manually evaluated against our baseline approach, Arg-PageRank. A test set of 300 examples was employed: 100 each from debates and CMV posts, plus 100 comments on CMV posts. The latter include only comments with at least 100 words and exclude non-argumentative ones as per automatic claim detection (Chakrabarty et al., 2019). This part of the test set corresponds to an unsupervised evaluation of the conclusions, since no ground truth for the comments is available.

Two expert writers, both native English speakers, were hired via Upwork.com.[16] For every argumentative text in the test set, all candidate conclusions generated by the different models were shown to the annotators in random order, without revealing the respective model's name. Assessment was cast as a series of binary decisions: first, whether a given candidate is a conclusion, and if yes, whether it is fluent, and whether it is informative. To simplify judging informativeness, we only asked if the conclusion was too generic. For each candidate judged not to be a conclusion, we asked whether it has the (1) wrong target (WT), conveys the (2) wrong stance (WS), or is (3) non-argumentative (NA).

[16] An hourly rate of about 30 USD was paid.

Model           Concl.  Inform.  WT    WS    NA
CMV Posts
dbart           36%      4%      56%   22%    22%
dbart-Topic     28%      0%      59%   23%    18%
dbart-Aspects   33%      6%      69%   23%     8%
dbart-Targets   27%      4%      69%   23%     8%
Arg-PageRank    11%      7%       0%    0%   100%
Debates
dbart           14%      6%      65%    9%    26%
dbart-Topic     14%      3%      76%   12%    12%
dbart-Aspects    7%      2%      77%   13%    10%
dbart-Targets   11%      2%      71%   17%    12%
Arg-PageRank    10%      6%       7%    0%    93%
Comments
dbart           12%      2%      52%   18%    30%
dbart-Topic      6%      2%      58%   24%    18%
dbart-Aspects    7%      3%      52%   33%    15%
dbart-Targets    8%      3%      55%   35%    10%
Arg-PageRank    17%      9%       5%    5%    90%

Table 6: Full agreement percentages of two annotators on 300 examples, grouped by example type (posts, debates, comments). The first column is the % of valid conclusions, the second the % of informative conclusions, followed by the % distribution of error types (lower is better) of a model. On average, all models were judged to be fluent for 97% of the conclusions.

Table 6 shows the percentage of cases on which both annotators agreed. For CMV and debates, finetuning outperforms Arg-PageRank at generating conclusions that convince the experts: dbart performs best on CMV (36%), and dbart and dbart-Topic on debates (14%).

Comments appear to be a particularly difficult type of test case. Based on our inspection of the comments, this is because they may not be self-contained but refer back to the original post, they may have a mixed stance (supporting only part of the post while opposing the rest), and they may introduce new targets and aspects (different concepts). In such cases, extracting the conclusion from the comment (and paraphrasing it) using Arg-PageRank performs best (17%).

Encoding knowledge impacts the effectiveness only slightly. Across all example types, the knowledge-encoded models perform about equally well, sometimes worse and sometimes better than dbart. Encoding the topic together with aspects or targets performs better on posts and comments.

As for informativeness, among the finetuned models, dbart-Aspects generates the highest number of informative conclusions for posts, while dbart does best on debates. In all domains, Arg-PageRank performs similar to or better than all other approaches, since it extracts claims that are twice as long on average (24 words) as the outputs of the finetuned models (12 words), hence capturing more information.

Inspecting the error types, encoding argumentative knowledge increases the number of argumentative candidate conclusions, validating its positive impact: all knowledge-encoded models make fewer non-argumentative (NA) errors than dbart. However, this affects target inference; the knowledge-encoded models generate more wrong targets (WT). The mixed stance of comments (supporting part of the original post while opposing the rest) leads to a higher number of stance errors (WS) for dbart-Aspects and dbart-Targets. Finally, for Arg-PageRank, almost all errors were non-argumentative sentences (NA).

6.3 Discussion

Our qualitative evaluation indicates that generating informative conclusions is challenging, and that our data is well-suited for the task, owing to its mix of conclusion types (Table 1) and diverse data sources. Leveraging external knowledge, though a promising means of guiding the finetuning, may benefit from better encoding strategies than the conventional method of control codes in the text.
However, given that the identified knowledge is extractive, and that we encoded multiple aspects and targets per example in contrast to related controlled text generation approaches (Keskar et al., 2019; Schiller et al., 2020; Gretz et al., 2020; Cachola et al., 2020), further investigations with importance sampling of argumentative knowledge are advised. Ideally, such sampling would be tailored to a specific domain or target audience.

Likewise, regarding the informativeness of the generated conclusions, a trade-off between conciseness and specificity must be decided. Our experiments suggest that long extractive conclusions capture more information than the more concise (and fluent) abstractive ones of the finetuned models, rendering them preferable to the annotators when sufficient background is missing. Finally, for comments, modeling the argumentative context, supplemented by explicit stance identification, is necessary to generate valid conclusions.

7 Conclusion

The notion of an informative conclusion is introduced and discussed in the context of computational argumentation as well as text summarization. Informative conclusions are to argumentation what brief summaries are to text: they concisely convey its main points. We lay the foundation for studying the conclusions of argumentative texts by compiling the Webis-ConcluGen-21 corpus, comprising 136,996 pairs of argumentative texts and corresponding conclusions.

Conclusions are diverse and typically depart significantly from the argumentative text they are derived from, paraphrasing it and, more than half the time, abstracting over it. Authors typically tailor their conclusions to the occasion, and in many cases they are not made explicit at all. This is where we contribute by tackling the task of generating an informative conclusion. The two main paradigms we study, paraphrased (incl. extractive) and abstractive conclusion generation, compete closely with each other.

8 Ethics Statement

Our dataset is a collection of opinionated texts obtained from sources that are publicly available and acknowledged appropriately. We respected their terms and conditions.

We did not employ any author-specific features in our approaches and instead processed only the corresponding arguments, although these represent the personal views of anonymous authors.

The proposed technology will be applicable to an English-speaking audience. While failures in generating valid conclusions may mislead a reader's initial interpretation of an argument, we do not aim at applications that prevent readers from reading the complete arguments. Rather, we seek to simplify the consumption of public discussions comprising several arguments by providing explicit, informative conclusions, especially for longer arguments.

Finally, in terms of computational resources, we restricted ourselves to the smaller, distilled checkpoints of a large pretrained model, which can be trained with (comparably) smaller resources and are accessible to the majority of researchers.

Acknowledgments

We thank the reviewers for their valuable feedback. This work was supported by the German Federal Ministry of Education and Research (BMBF, 01IS18026A-F) by funding the competence center for Big Data and AI (ScaDS.AI Dresden/Leipzig). Computations for this work were done (in part) using resources of the Leipzig University Computing Centre, whom we sincerely thank for their support.

References

Yamen Ajjour, Milad Alshomary, Henning Wachsmuth, and Benno Stein. 2019a. Modeling frames in argumentation. In Proceedings of EMNLP-IJCNLP 2019, pages 2922-2932.

Yamen Ajjour, Henning Wachsmuth, Johannes Kiesel, Martin Potthast, Matthias Hagen, and Benno Stein. 2019b. Data acquisition for argument search: The args.me corpus. In KI 2019: Advances in Artificial Intelligence, pages 48-59.

Khalid Al-Khatib, Henning Wachsmuth, Johannes Kiesel, Matthias Hagen, and Benno Stein. 2016. A news editorial corpus for mining argumentation strategies. In Proceedings of COLING 2016, pages 3433-3443.

Milad Alshomary, Nick Düsterhus, and Henning Wachsmuth. 2020a. Extractive snippet generation for arguments. In Proceedings of SIGIR 2020, pages 1969-1972.

Milad Alshomary, Shahbaz Syed, Martin Potthast, and Henning Wachsmuth. 2020b. Target inference in argument conclusion generation. In Proceedings of ACL 2020, pages 4334-4345.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR 2015.

Roy Bar-Haim, Indrajit Bhattacharya, Francesco Dinuzzo, Amrita Saha, and Noam Slonim. 2017. Stance classification of context-dependent claims. In Proceedings of EACL 2017, pages 251-261.
Roy Bar-Haim, Lilach Eden, Roni Friedman, Yoav Kantor, Dan Lahav, and Noam Slonim. 2020. From arguments to key points: Towards automatic argument summarization. In Proceedings of ACL 2020, pages 4029-4039.

Jason Baumgartner, Savvas Zannettou, Brian Keegan, Megan Squire, and Jeremy Blackburn. 2020. The Pushshift Reddit dataset. In Proceedings of ICWSM 2020, pages 830-839.

Yonatan Bilu, Daniel Hershcovich, and Noam Slonim. 2015. Automatic claim negation: Why, how and when. In Proceedings of the 2nd Workshop on Argumentation Mining, pages 84-93.

Rishi Bommasani and Claire Cardie. 2020. Intrinsic evaluation of summarization datasets. In Proceedings of EMNLP 2020, pages 8075-8096.

Jill Burstein and Daniel Marcu. 2003. A machine learning approach for identification of thesis and conclusion statements in student essays. Computers and the Humanities, 37(4):455-467.

Isabel Cachola, Kyle Lo, Arman Cohan, and Daniel S. Weld. 2020. TLDR: Extreme summarization of scientific documents. In Findings of EMNLP 2020, pages 4766-4777.

Tuhin Chakrabarty, Christopher Hidey, and Kathy McKeown. 2019. IMHO fine-tuning improves claim detection. In Proceedings of NAACL-HLT 2019, pages 558-563.

Stephen Chaudoin, J. Shapiro, and Dustin Tingley. 2017. Revolutionizing teaching and research with a structured debate platform. Journal of Political Science, 58:1064-1082.

Johannes Daxenberger, Steffen Eger, Ivan Habernal, Christian Stab, and Iryna Gurevych. 2017. What is the essence of a claim? Cross-domain claim identification. In Proceedings of EMNLP 2017, pages 2055-2066.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT 2019, pages 4171-4186.

Esin Durmus, Faisal Ladhak, and Claire Cardie. 2019. Determining relative argument specificity and stance for complex argumentative structures. In Proceedings of ACL 2019, pages 4630-4641.

Charlie Egan, Advaith Siddharthan, and Adam Wyner. 2016. Summarising the points made in online political debates. In Proceedings of the Third Workshop on Argument Mining (ArgMining 2016), pages 134-143.

Shai Gretz, Yonatan Bilu, Edo Cohen-Karlik, and Noam Slonim. 2020. The workweek is the best time to start a family: A study of GPT-2 based claim generation. In Findings of EMNLP 2020, pages 528-544.

Ivan Habernal and Iryna Gurevych. 2015. Exploiting debate portals for semi-supervised argumentation mining in user-generated web discourse. In Proceedings of EMNLP 2015, pages 2127-2137.
Eduard Hovy and Chin-Yew Lin. 1998. Automated text summarization and the SUMMARIST system. In TIPSTER Text Program Phase III: Proceedings of a Workshop held at Baltimore, Maryland, pages 197-214.

Dandan Huang, Leyang Cui, Sen Yang, Guangsheng Bao, Kun Wang, Jun Xie, and Yue Zhang. 2020. What have we achieved on text summarization? In Proceedings of EMNLP 2020, pages 446-469.

Min-Yen Kan, Kathleen R. McKeown, and Judith L. Klavans. 2001. Applying natural language generation to indicative summarization. In Proceedings of the ACL 2001 Eighth European Workshop on Natural Language Generation (EWNLG).

Zixuan Ke, Hrishikesh Inamdar, Hui Lin, and Vincent Ng. 2019. Give me more feedback II: Annotating thesis strength and related attributes in student essays. In Proceedings of ACL 2019, pages 3994-4004.

Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL: A conditional transformer language model for controllable generation. CoRR, abs/1909.05858.

Wojciech Kryscinski, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, and Richard Socher. 2019. Neural text summarization: A critical evaluation. In Proceedings of EMNLP-IJCNLP 2019, pages 540-551.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of ACL 2020, pages 7871-7880.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74-81.

Yang Liu and Mirella Lapata. 2019. Text summarization with pretrained encoders. In Proceedings of EMNLP-IJCNLP 2019, pages 3730-3740.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.

Brett A. S. Martin, Bodo Lang, and Stephanie Wong. 2003. Conclusion explicitness in advertising: The moderating role of need for cognition (NFC) and argument quality (AQ) on persuasion. Journal of Advertising, 32(4):57-66.

Mani Maybury. 1999. Advances in Automatic Text Summarization. MIT Press.

Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018. Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. In Proceedings of EMNLP 2018, pages 1797-1807.

Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.

Andreas Peldszus and Manfred Stede. 2015. Joint prediction in MST-style discourse parsing for argumentation mining. In Proceedings of EMNLP 2015, pages 938-948.

Georgios Petasis and Vangelis Karkaletsis. 2016. Identifying argument components through TextRank. In Proceedings of the Third Workshop on Argument Mining (ArgMining 2016).

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.

Sascha Rothe, Shashi Narayan, and Aliaksei Severyn. 2020. Leveraging pre-trained checkpoints for sequence generation tasks. Transactions of the Association for Computational Linguistics, 8:264-280.

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

Benjamin Schiller, Johannes Daxenberger, and Iryna Gurevych. 2020. Aspect-controlled neural argument generation. CoRR, abs/2005.00084.
Sam Shleifer and Alexander M. Rush. 2020. Pre-trained summarization distillation. arXiv preprint arXiv:2010.13002.

Christian Stab and Iryna Gurevych. 2014. Identifying argumentative discourse structures in persuasive essays. In Proceedings of EMNLP 2014, pages 46-56.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27, pages 3104-3112.

Shahbaz Syed, Roxanne El Baff, Johannes Kiesel, Khalid Al Khatib, Benno Stein, and Martin Potthast. 2020. News editorials: Towards summarizing long argumentative texts. In Proceedings of COLING 2020, pages 5384-5396.

Stephen E. Toulmin. 2003. The Uses of Argument. Cambridge University Press.

Teun A. Van Dijk. 1995. Opinions and ideologies in editorials. In 4th International Symposium of Critical Discourse Analysis, Language, Social Life and Critical Thought, pages 14-16.

Henning Wachsmuth, Martin Potthast, Khalid Al-Khatib, Yamen Ajjour, Jana Puschmann, Jiani Qu, Jonas Dorsch, Viorel Morari, Janek Bevendorff, and Benno Stein. 2017. Building an argument search engine for the web. In Proceedings of the 4th Workshop on Argument Mining, pages 49-59.

Douglas Walton, Christopher Reed, and Fabrizio Macagno. 2008. Argumentation Schemes. Cambridge University Press.

Lu Wang and Wang Ling. 2016. Neural network-based abstract generation for opinions and arguments. In Proceedings of NAACL-HLT 2016, pages 47-57.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of EMNLP 2020: System Demonstrations, pages 38-45.

Haoyu Zhang, Jingjing Cai, Jianjun Xu, and Ji Wang. 2019a. Pretraining-based natural language generation for text summarization. In Proceedings of CoNLL 2019, pages 789-797.

Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu. 2020a. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of ICML 2020, pages 11328-11339.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020b. BERTScore: Evaluating text generation with BERT. In Proceedings of ICLR 2020.

Yuan Zhang, Jason Baldridge, and Luheng He. 2019b. PAWS: Paraphrase adversaries from word scrambling. In Proceedings of NAACL-HLT 2019, pages 1298-1308.