High Quality Real-Time Structured Debate Generation

Eric Bolton, Alex Calderwood, Niles Christensen, Jerome Kafrouni, and Iddo Drori
Columbia University
[email protected]

Abstract

Automatically generating debates is a challenging task that requires an understanding of arguments and how to negate or support them. In this work we define debate trees and paths for generating debates while enforcing a high-level structure and grammar. We leverage a large corpus of tree-structured debates that have metadata associated with each argument. We develop a framework for generating plausible debates which is agnostic to the sentence embedding model. Our results demonstrate the ability to generate debates in real-time on complex topics at a quality that is close to humans, as evaluated by the style, content, and strategy metrics used for judging competitive human debates. In the spirit of reproducible research we make our data, models, and code publicly available.

Source: Income inequality makes people unhappy. Inequality is bad for everyone.
Response: The economic outcome of an increase in income is not a good factor in determining the effects of starvation and poverty.

Source: Many religious teachings promote violence.
Response: Religious protesters have a strong history of engaging in violence against violence. Many of the people who oppose the violence are often on a case-by-case basis.

Table 1: Sample results: novel responses generated to be contradictory.

1 Introduction

Automatic generation of human-level debates has important applications: providing coherent deconstructions of given topics, assisting in decision-making, and surfacing previously overlooked viewpoints, as demonstrated in Table 1. Debate generation would also enable policymakers and writers to improve their rhetorical abilities by practicing against an automated agent in real-time.

The field of debate is closely related to linguistic negation, which has studied logical rules for forming various types of negation at the low level of words and sentences (Malpas and Davidson, 2012b). In this work we use a large dataset and a pre-trained neural language model to synthesize long debates with a high-level linguistic structure. Our work focuses on informal logic, the study of arguments in human discourse, rather than arguments in rigorously specified formal languages (Malpas and Davidson, 2012a).

Machine debate generation must meet several requirements: the need to respect long-distance dependencies while maintaining semantic coherence with the prompt, the need for consistent themes and tone over long durations, and finally, the need for adherence to grammar rules. These requirements entail compelling, thoughtful responses to a given writing prompt. Models that excel at this task must understand the low-level grammar of the language, but also need to incorporate an understanding of the relationships between sentences. Thus, existing story generation models constitute a natural tool for generating arguments with an understanding of what it means to refute or support an argument's point.

1.1 Kialo

We collected a dataset from Kialo (Kialo, 2019), an online debate platform, and trained a text generation model (Fan et al., 2018) to generate arguments in favor of or against given debate prompts. Users submit responses (pros and cons), which are then approved by a moderator. Other users can then respond to those responses as though they themselves were prompts, forming a debate tree.

We leveraged the tree structure of the Kialo dataset along with its existing pro/con annotations to develop a novel self-supervised corpus creation strategy, using the insight that the metadata constraints we used to define an argument (our generative unit) effectively enforce a high-order grammar on the output. We experiment with this type of tree-traversal strategy to model multi-turn text generation. This strategy is applicable to other seq2seq models of long-form text generation.

1.2 Contributions

In this work, we describe a system which learns to participate in traditional debates by modeling different sequences of argument nodes through tree-structured debates to generate prompts and responses, corresponding to sources and targets in the translation paradigm. Our main contributions are:

• Defining debate trees and paths for generating text: enforcing a high-level structure and grammar and defining complex arguments and multi-turn debates, as described in Sections 2.1 and 2.2.

• Multi-turn debate generation: an extension of the prompt-to-response generation paradigm to enable multi-turn generation, as described in Section 2.3.

• A measure of debate quality: evaluated with the criteria used in debate competitions (content, style, and strategy), repeated over multiple rounds.

Based on these key contributions, our results demonstrate near-human quality debates, as shown in Section 3.

1.3 Related Work

Our goal is to create debates that perform at a high level according to the same evaluation metrics as human debates. Quantitatively assessing the quality of debates and arguments is a challenging task. Recent work evaluates dialogue (Kulikov et al., 2018) and stories (Purdy et al., 2018) by humans using Amazon's Mechanical Turk (AMT). We also use AMT for human evaluation of debates, based on the criteria commonly used in debate competitions (Speech and Competition, 2018).

Our focus is debate generation, which is structured in multiple rounds. The state of the art in structured text generation focuses on stories rather than debates, and is based on generating sequences of predicates and arguments, replacing entities with names and references (Fan et al., 2019). Other structuring strategies include using outlines (Drissi et al., 2018), plans (Yao et al., 2018), or skeletons (Jingjing et al., 2018). All of these works build upon significant advances on the problem of text generation using hierarchical models (Fan et al., 2018). These methods can be improved and extended by using very large language models, which have shown excellent results in language generation and reading comprehension (Devlin et al., 2018; Peters et al., 2018; Radford et al., 2018, 2019).

The performance of these neural language models has relied on curating large datasets for different tasks, such as question answering (Rajpurkar et al., 2016) and reading comprehension (Yu et al., 2018; Clark and Gardner, 2018). In contrast, our work creates a large dataset specifically for the task of generating debates. IBM's Project Debater (IBM, 2019) shares the same focus; however, it is driven by speech and does not take advantage of debate-specific datasets. There are a number of other online debate hosting platforms that contain pro/con metadata (ProCon, 2019; iDebate, 2019). We use Kialo, which uniquely combines this metadata with a tree structure that models a long multi-turn debate, depending on the parsing strategy.

2 Methods

We define a framework for generating debates from a tree-structured corpus, as illustrated in Figure 1: (i) define and collect a debate tree corpus, as described in Section 2.1; (ii) build debate paths, which are content/target pairs, as described in Section 2.2; (iii) train the debate model on all generated debate paths, as described in Section 2.3.

Figure 1: Our debate generation framework. We begin with debate trees on the far left, from which we extract all paths through the tree which correspond to meaningful debates, creating a dataset in the central bubble. On the far right, this new dataset is used to retrain an existing model to debate. In this paper, the retrained model is the one developed by Fan et al. (2018). In our diagram, each green bubble supports the point above it, while each red bubble argues against the point above it.

2.1 Debate Trees

A debate tree is a tree with one extra bit of storage for each node, its stance: either 'Pro' or 'Con', depending on whether the node supports or refutes its parent argument. A debate is a path on this debate tree, for example from the root to a leaf, as shown in Figure 2. An argument is defined as a single node in the tree.
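To make these definitions concrete, the following is a minimal Python sketch of the data structure and of root-to-leaf path enumeration; the Node class, its field names, and the debate_paths helper are illustrative names of ours, not part of the paper's released code.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        """One argument in a debate tree."""
        text: str
        stance: str  # 'Pro' or 'Con', relative to the parent argument
        children: List["Node"] = field(default_factory=list)

    def debate_paths(root: Node) -> List[List[Node]]:
        """Enumerate every root-to-leaf path; each path is one candidate debate."""
        if not root.children:
            return [[root]]
        return [[root] + path
                for child in root.children
                for path in debate_paths(child)]
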

Parsing Strategy          Prompt                           Response
Supportive Arguments      [Pro|Con][Pro]*                  [Pro]+
Contradicting Arguments   [Pro|Con][Pro]*                  [Con][Pro]*
Complex Arguments         [Pro|Con][Pro]*                  [Pro|Con][Pro]*
Multi-Turn Debates        [Pro|Con][Pro]*([Con][Pro]*)*    [Con][Pro]*

Table 2: Description of debate structures, expressed through regex notation.

Figure 2: An example of a path through a Kialo debate tree. Green boxes indicate arguments that support their parents (Pro), while red boxes indicate arguments that contradict their parents (Con).

We extract debate tree structures from a public debate site (Kialo, 2019) which allows crawling, and filter for English debates (Langdetect, 2014). When Kialo users referenced different parts of the debate tree, which rarely happened, we copied the text of the referenced node into the referring node.
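For instance, the English filter can be a one-line check with the langdetect package; the is_english helper below is our own sketch, and treating detection failures as non-English is an assumption rather than documented behavior of our pipeline.

    from langdetect import detect

    def is_english(text: str) -> bool:
        """Keep only debates whose text is detected as English."""
        try:
            return detect(text) == "en"
        except Exception:  # very short or empty texts can fail detection
            return False
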

2.2 Debate Paths

The debate tree can be parsed in different ways depending on the desired properties of the resulting corpus. We present four parsing strategies. For each, we provide a regex-like description corresponding to the prompt and the response of the debate, as described in Table 2. The majority of paths through the tree would correspond to nonsensical debates. In order to model sensible and internally coherent debates in which each modeled participant has a clear viewpoint, we developed the following parsing strategies (a sketch of the corresponding stance-sequence matching follows the list):

Supportive Arguments - a prompt and a response supporting it.

Contradicting Arguments - a prompt and a response refuting it.

Complex Arguments - a prompt and a response that may either support the prompt or refute it. This is the union of all arguments generated by the supportive and contradicting arguments approaches.

Multi-Turn Debates - a prompt containing a back-and-forth debate between two speakers, separated by a specific token. Each participant is constrained to always disagree with the previous statement made by their opponent. This approach allows for the generation of debates of arbitrary length.
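The sketch below applies Table 2's patterns to a path's stance sequence, encoded as a string of 'P' (Pro) and 'C' (Con) characters. The encoding, the STRATEGIES table, and the greedy prompt/response split are our illustrative choices, not the paper's released implementation.

    import re

    # Table 2's prompt/response patterns, rewritten over stance strings
    # in which each argument is one character: 'P' (Pro) or 'C' (Con).
    STRATEGIES = {
        "supportive":    (r"[PC]P*", r"P+"),
        "contradicting": (r"[PC]P*", r"CP*"),
        "complex":       (r"[PC]P*", r"[PC]P*"),
        "multi_turn":    (r"[PC]P*(?:CP*)*", r"CP*"),
    }

    def split_prompt_response(stances: str, strategy: str):
        """Split a path's stance string into (prompt, response) if it fits
        the chosen strategy; return None otherwise. Greedy matching picks
        the longest valid prompt when several splits are possible."""
        prompt_re, response_re = STRATEGIES[strategy]
        m = re.fullmatch(f"({prompt_re})({response_re})", stances)
        return (m.group(1), m.group(2)) if m else None

    # Example: split_prompt_response("PPC", "contradicting") -> ("PP", "C")
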

                          Complex    Supportive   Contradictory   Multi-Turn
# Training examples       432,999    222,852      211,355         459,643
# Test examples           45,471     21,946       17,677          30,607
# Validation examples     38,779     20,810       14,485          35,916
Prompt dictionary size    20,535     14,691       16,690          50,626
Response dictionary size  28,313     21,907       19,534          55,359

Table 3: Description of datasets created from Kialo using each tree-traversal strategy.

2.3 Debate Model

Once the debate paths are extracted, they are used to train a text generation model. In this work we retrained the model developed by Fan et al. (2018), but another model, such as the one developed by Devlin et al. (2018), could be substituted in its place with minimal changes.

To determine what would yield the best generation performance, we trained models on our complex arguments dataset under various conditions. We filtered for only Pro responses and only Con responses, and compared results between Fusion (Sriram et al., 2018) and plain seq2seq models (Gehring et al., 2017).

We limited our vocabulary to tokens appearing 10 or more times in the training set and replaced words that do not meet this criterion with an unknown token. We used named entity recognition (NER) (Honnibal and Montani, 2017) to limit the number of unknown tokens in our complex arguments datasets and improve model coherence. Finally, we added an end-of-argument token to delimit arguments. We used 5% of the debates for our validation set and 5% for our test set. A detailed description of our resulting datasets is shown in Table 3.

We implemented a custom generation approach to use previously generated responses as input prompts for real-time generation of new responses. This enabled generation of multi-turn debates using the seq2seq model trained on our debate data.
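A minimal sketch of that feedback loop follows; generate_response stands in for any trained prompt-to-text model, and the <TURN> separator is an illustrative placeholder for the specific token used in training, which we do not spell out here.

    def debate(opening_statement: str, n_turns: int, generate_response) -> list:
        """Alternate turns by feeding the running debate back in as the prompt.

        generate_response is any prompt-to-text function (e.g., a trained
        seq2seq model); each generated argument is appended to the history
        so the next turn responds to the full debate so far.
        """
        turns = [opening_statement]
        for _ in range(n_turns):
            prompt = " <TURN> ".join(turns)  # separator token between speakers
            turns.append(generate_response(prompt))
        return turns

Because the multi-turn training data constrains each participant to disagree with the previous statement, each new response generated by this loop plays the role of the opposing side.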
3 Results

Our results demonstrate the ability to generate debates in real-time on complex topics, as shown in Table 4. Evaluating complex debates is a difficult task. Perplexity evaluation, shown in Table 5, is limited and does not capture debate quality well. Therefore we use human evaluation of our generated debates (Kulikov et al., 2018). Specifically, we use Amazon Mechanical Turk (Amazon, 2019) for evaluating multi-turn debates, with 119 distinct Turkers providing a total of 526 evaluations, each across 4 criteria on a 1-4 scale.

We compare our generated debates with human debates by comparing debate paths from the test set with debate sequences predicted by our model. We generated lengthy debates of between 5 and 15 back-and-forths and evaluated debates of length 10 against human debates of the same length, preventing any bias from the Turkers for longer or shorter debates. We asked humans to rate 50 human-created and 56 automatically-generated debates, where each debate was rated by 5-10 separate reviewers, for a total of 2,104 answers, as shown in Table 6. Reviewers were asked to evaluate the style, content, strategy, and overall quality of each debate, which are the same categories used to judge competitive debaters in the US National Debate Tournament (Speech and Competition, 2018). Our results demonstrate generated debates which are evaluated to be close to human-level performance, as shown in Table 6.

3.1 Implementation

Our model was trained and run on a virtual machine running Ubuntu 16.04 with 8 vCPUs and 2 NVIDIA Tesla K80 GPUs. The machine had access to 52 GB of memory and 100 GB of storage. Training time is 3 days on this machine. Given a new subject for debate, the average computation time for a new argument is around 3 seconds, which is acceptable in real-time debates. In the spirit of reproducible research we make our data, models, and code available (Bolton et al., 2019).
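The per-criterion numbers reported in Table 6 are simple aggregates of these 1-4 scores; the sketch below shows one way to compute them, where the dictionary layout of the ratings is our own assumption rather than the actual evaluation pipeline.

    from statistics import mean, stdev

    def summarize_ratings(ratings: dict) -> dict:
        """Aggregate 1-4 Turker scores into mean and standard deviation
        per criterion, in the style of Table 6. `ratings` maps each
        criterion ('style', 'content', 'strategy', 'overall') to a list
        of individual reviewer scores."""
        return {criterion: (mean(scores), stdev(scores))
                for criterion, scores in ratings.items()}

    # Example: summarize_ratings({"style": [3, 4, 3, 2, 4]})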

Subject: All religions should be banned on a global scale.

Generated debate:

Alice: Religion is a global problem that cannot be solved in a bad way. Religion is a tool used to challenge human rights. It does not exist to be as a way to be a good thing for humans in order to use the world as a form of language and the existence of its own own religion. For example, religion is not the only way to prevent a world from a different religion

Bob: Religion is a function in which morality involves the existence of something, rather than just the existence of one another. Religion is a personal and philosophical concept

Alice: Religion is a form of tradition. Simply because one group may be sacrificed or not. Everyone can make a person considered to be good at all

Bob: Religion may be a good thing for many people to commit violence, but rather the life of one person

Alice: Religion is not a way to measure a belief that it is inherently acceptable to kill a sandwich if the one needs it gets to live

Bob: Not all people are the ones doing anything of anything. Religion is the gift of that they disagree about. This is the same as the people who value the community may value their lives through little reason not to process it or destroy people or other people to live

Alice: Moral values are a way to value moral and ethical consequences, no one simply wants to do what is right, not worth how you want

Bob: If people have a duty to help those people to value their life they will have to act in the interest of their place rather than forcing them to do so without harming their religious life

Table 4: Multi-turn debate generated from the human prompt "All religions should be banned on a global scale."
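For reference, the perplexities in Table 5 follow the conventional definition, the exponentiated average negative log-likelihood of the generated tokens; the sketch below assumes access to per-token natural-log probabilities from the model, and the paper does not spell out its exact computation.

    import math

    def perplexity(token_logprobs: list) -> float:
        """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    # Example: perplexity([math.log(0.25)] * 4) == 4.0
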

Model type   Strategy                Perplexity
seq2seq      No NER, Complex         73.16
seq2seq      No NER, Supportive      54.68
seq2seq      No NER, Contradicting   60.72
seq2seq      NER, Complex            80.87
seq2seq      NER, Supportive         58.82
seq2seq      NER, Contradicting      77.39
Fusion       No NER, Complex         70.41
Fusion       No NER, Supportive      43.06
Fusion       No NER, Contradicting   60.33
Fusion       NER, Complex            83.77
Fusion       NER, Supportive         77.04
Fusion       NER, Contradicting      79.98

Table 5: Average perplexity over all generated debates.

Criteria   Human           Generated
Style      3.273 ± 0.722   3.154 ± 0.868
Content    3.392 ± 0.626   3.289 ± 0.770
Strategy   3.412 ± 0.596   3.133 ± 0.821
Overall    3.512 ± 0.596   3.193 ± 0.824

Table 6: Mean and standard deviation of Mechanical Turk evaluations. Ratings were on a 1-4 scale.

Source: People have been led to do terrible things in the name of religion.
Response: Many religions have the option of

Table 7: A generated response that suffered from a lack of coherence due to unknown tokens.

3.2 Limitations

Unknown tokens present in the output were replaced with underscores representing an underline, and Turkers were shown a note explaining that certain words would be omitted and replaced with an underline. These missing words may therefore have skewed the Turkers' judgments on responses containing unknown tokens, such as in Table 7.

Our generated debates are extremely varied, ranging from short simple quips to complicated multi-sentence responses. This results in a high standard deviation of perplexities for each debate we generate. Whether or not the debate was complete (addressing important points) was not a criterion that we assessed.

4 Conclusion and Future Work

This work builds a dataset of varied arguments and responses from a tree-structured debate corpus by leveraging information about the nature of the arguments. This approach resulted in a large dataset of varied debate structures.

We present and evaluate a novel structured debate generation approach which brings us closer to realistic simulation of human debate. Our contribution is a multi-turn debate structure following a high-level stance-based grammar.

Our framework is language model agnostic; in the future we would like to improve our work using large embeddings to experiment with this and other text generation models.

References

Amazon. 2019. Mechanical Turk. https://www.mturk.com/.

Eric Bolton, Alex Calderwood, and Niles Christensen. 2019. High Quality Real-Time Structured Debate Generation code. https://bit.ly/2UjqOMI.

Christopher Clark and Matt Gardner. 2018. Simple and effective multi-paragraph reading comprehension. Annual Meeting of the Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Mehdi Drissi, Olivia Watkins, and Jugal Kalita. 2018. Hierarchical text generation using an outline. International Conference on Natural Language Processing.

Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical neural story generation. Annual Meeting of the Association for Computational Linguistics.

Angela Fan, Mike Lewis, and Yann Dauphin. 2019. Strategies for structuring story generation.

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 1243-1252.

Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.

IBM. 2019. Project Debater. https://www.research.ibm.com/artificial-intelligence/project-debater/.

iDebate. 2019. https://www.idebate.org/.

Xu Jingjing, Zhang Yi, Zeng Qi, Ren Xuancheng, Cai Xiaoyan, and Sun Xu. 2018. A skeleton-based model for promoting coherence among sentences in narrative story generation. In Conference on Empirical Methods in Natural Language Processing.

Kialo. 2019. Kialo: A platform for rational debate. https://www.kialo.com/tour.

Ilya Kulikov, Alexander H Miller, Kyunghyun Cho, and Jason Weston. 2018. Importance of a search strategy in neural dialogue modelling. arXiv preprint arXiv:1811.00907.

Langdetect. 2014. Language detection library ported from Google's language-detection. https://pypi.org/project/langdetect/.

Jeff Malpas and Donald Davidson. 2012a. The Stanford Encyclopedia of Philosophy: Informal Logic. https://plato.stanford.edu/entries/logic-informal/.

Jeff Malpas and Donald Davidson. 2012b. The Stanford Encyclopedia of Philosophy: Negation. https://plato.stanford.edu/entries/negation/.

Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1.

ProCon. 2019. https://www.procon.org/.

Christopher Purdy, Xinyu Wang, Larry He, and Mark Riedl. 2018. Predicting generated story quality with quantitative measures. In Artificial Intelligence and Interactive Digital Entertainment Conference.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. Technical report, OpenAI.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1:8.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. Conference on Empirical Methods in Natural Language Processing.

National Speech and Debate Competition. 2018. World Schools Debate Ballot. https://www.speechanddebate.org/wp-content/uploads/World-Schools-Ballot-BW.pdf.

Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, and Adam Coates. 2018. Cold fusion: Training seq2seq models together with language models. Interspeech.

Lili Yao, Nanyun Peng, Weischedel Ralph, Kevin Knight, Dongyan Zhao, and Rui Yan. 2018. Plan-and-write: Towards better automatic storytelling. arXiv preprint arXiv:1811.05701.

Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V Le. 2018. QANet: Combining local convolution with global self-attention for reading comprehension. International Conference on Learning Representations.