arXiv:2104.07644v2 [cs.CL] 17 Apr 2021

EXPLAGRAPHS: An Explanation Graph Generation Task for Structured Commonsense Reasoning

Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal
UNC Chapel Hill
{swarna, prateek, lbauer6, mbansal}@cs.unc.edu

Abstract

Recent commonsense-reasoning tasks are typically discriminative in nature, where a model answers a multiple-choice question for a certain context. Discriminative tasks are limiting because they fail to adequately evaluate the model’s ability to reason and explain predictions with underlying commonsense knowledge. They also allow such models to use reasoning shortcuts and not be “right for the right reasons”. In this work, we present EXPLAGRAPHS, a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction. Specifically, given a belief and an argument, a model has to predict whether the argument supports or counters the belief and also generate a commonsense-augmented graph that serves as non-trivial, complete, and unambiguous explanation for the predicted stance. The explanation graphs for our dataset are collected via crowdsourcing through a novel Collect-Judge-And-Refine graph collection framework that improves the graph quality via multiple rounds of verification and refinement. A significant 83% of our graphs contain external commonsense nodes with diverse structures and reasoning depths. We also propose a multi-level evaluation framework that checks for the structural and semantic correctness of the generated graphs and their plausibility with human-written graphs. We experiment with state-of-the-art text generation models like BART and T5 to generate explanation graphs and observe that there is a large gap with human performance, thereby encouraging useful future work for this new commonsense graph-based explanation generation task.¹

¹ EXPLAGRAPHS will be publicly available at https://github.com/swarnaHub/ExplaGraphs.

1 Introduction

In the past few years, numerous commonsense reasoning benchmarks have been developed that challenge ML models to use various kinds of commonsense knowledge for solving tasks (Davis and Marcus, 2015). Recent state-of-the-art commonsense reasoning models are typically trained and evaluated on discriminative commonsense reasoning datasets and tasks, in which a model answers a multiple-choice question for a certain context (Zellers et al., 2018, 2019; Talmor et al., 2019; Sap et al., 2019b; Bisk et al., 2020). While pre-trained language models perform well on these tasks (Lourie et al., 2021), this setup severely limits the exploration and evaluation of a model’s ability to reason and explain its predictions with relevant commonsense knowledge. In fact, neural models are often right for the wrong reasons (McCoy et al., 2019) and use statistical biases or annotation artifacts to solve tasks via shortcuts (Gururangan et al., 2018).

Thus, we emphasize the importance of generative commonsense reasoning capability, in which a model is challenged to compose and reveal the plausible commonsense knowledge that is required to solve a reasoning task. Moreover, structured (e.g., graph-based) commonsense explanations, unlike unstructured sentence-based explanations, can more explicitly explain and evaluate the reasoning structures of the model by visually laying out the relevant context and commonsense knowledge edges, chains, and subgraphs. We propose EXPLAGRAPHS, a new generative and structured commonsense-reasoning task (in English) of explanation graph generation for stance prediction on popular debate topics. Specifically, our task requires a model to predict whether a certain argument supports or counters a belief about a debate topic, but correspondingly, also generate a commonsense explanation graph that explicitly lays out the reasoning process involved in inferring the predicted stance. For example, consider Fig. 1 which shows two examples from our benchmarking dataset EXPLAGRAPHS collected for this task.

[Figure 1, left example] Belief: Children should be able to consent to cosmetic surgery. Argument: Children do not have the mental capacity to understand the consequences of medical decisions. Stance: Counter.
[Figure 1, right example] Belief: Factory farming should not be banned. Argument: Factory farming feeds millions. Stance: Support.
[Figure 1, graph labels] Nodes: Children, Still Developing, Cosmetic Surgery, Important Decision, Consequences, Factory Farming, Millions, Food, Necessary, Banned. Relations: causes, desires, not desires, has property, has context, capable of, not capable of. Legend: Both Belief and Argument Concept, Only Belief Concept, Only Argument Concept, Commonsense Concept.

Figure 1: Two representative examples from our dataset showing the belief, the argument, the stance label and the corresponding commonsense explanation graph. The graphs are read by following the edge directions to express the reasoning process involved in explaining why the argument supports or counters the belief.

Each example contains a belief, an argument, and a stance label of either “support” or “counter”. Each belief-argument pair requires understanding social, cultural, or taxonomic commonsense knowledge about debate topics in order to infer the correct stance. Specifically, the example on the left requires the knowledge that “children” are “still developing” and that this indicates that they are not capable of making an “important decision” and that “cosmetic surgery” is an “important decision”, and that an “important decision” is capable of “consequences”. Given this knowledge, one can understand that the argument “children do not have the mental capacity to understand the consequences of medical decisions.” is counter to the belief “children should be able to consent to cosmetic surgery”. We represent this knowledge in the form of a commonsense explanation graph which allows for causal relationships, ease of imposing constraints, flexibility, and expressiveness. We discuss this and our explanation graph’s syntax and semantics below.

Our graph-based explanations follow a broad line of work on structured explanations for NLP. These typically include a chain of facts (Khot et al., 2020; Jhamtani and Clark, 2020; Inoue et al., 2020; Geva et al., 2021) or are semi-structured templates (Ye et al., 2020; Mostafazadeh et al., 2020). As an important next step in this useful line of work, we propose explanations that are fully structured, represented in the form of graphs. Graphs are an efficient data structure for representing explanations because of multiple reasons: (1) unlike chain of facts, they can capture complex dependencies between facts, while also avoiding redundancy (e.g., “Factory farming causes food and millions desire food” forms a “V-structure”), (2) unlike natural language or free-form explanations (Camburu et al., 2018; Rajani et al., 2019; Narang et al., 2020; Brahman et al., 2021; Zhang et al., 2020), it’s easier to impose task-specific constraints on graphs (e.g. connectivity, acyclicity), that eventually help in better quality control during data collection (Sec. 4) and designing structural validity metrics for model-evaluation (Sec. 6) and (3) unlike semi-structured templates or extractive rationales (Zaidan et al., 2007; Lei et al., 2016; Yu et al., 2019; DeYoung et al., 2020), they allow for more flexibility and expressiveness (e.g., graphs can encode any reasoning structure and the nodes are not limited to just phrases from the context).

Our explanations specifically take the form of connected directed acyclic graphs (DAG). The nodes in the graph can be concepts (short phrases) from the belief, or the argument, which we refer to as internal nodes. They can also be external commonsense concepts which are neither part of the belief nor the argument but essential in the context for the explanation graph to adhere to the stance. In Fig. 1, these external concept nodes are marked in dashed-red while internal concepts are marked with solid borders. Edges in the graph connect two concepts and are labeled with commonsense relations. The relations are chosen from a pre-defined set and help form simple coherent facts in conjunction with the two concepts. While some of these facts might not necessarily be factual (e.g. “Factory farming; has context; necessary”),² note that such facts are essential in the context for composing an explanation that is indicative of the stance. Semantically, our graphs can be seen as extended structured arguments, augmented with commonsense knowledge.

We construct a benchmarking dataset for our task through a novel framework for collecting graph-structured data via crowdsourcing. Specifically, we propose a Collect-Judge-And-Refine graph collection framework, in which we collect connected directed acyclic graphs that serve as non-trivial, complete and unambiguous explanations for the task (as explained in Sec. 6). The framework also allows for iteratively improving the initial graphs through multiple rounds of verification and refinement. A significant 83% of the graphs in our dataset contain external commonsense nodes, indicating that commonsense knowledge is a critical component of our task.

[...] locally at the level of each fact (edge) by checking for its importance in improving the model confidence and globally for the whole graph, defined by its ability to reveal the target stance label. Furthermore, we also propose plausibility metrics that match generated graphs with human-written graphs by extending standard text-generation metrics like BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004) for graph matching.

Following past work on explanation generation (Rajani et al., 2019; Hase et al., 2020), we propose some initial baseline models for our task: (1) a reasoning model that first generates the graph and uses it as additional context with the belief and argument to then predict the stance, and (2) a rationalizing model that first predicts the stance and then generates the graph as post-hoc explanation. Across these models, we represent graphs as linearized strings ordered topologically and fine-tune state-of-the-art pre-trained generative language models BART (Lewis et al., 2019) and T5 (Raffel et al., [...]
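To make the graph format described in this introduction concrete, the following is a minimal sketch, not the authors' implementation. It stores an explanation graph as (head; relation; tail) facts, checks the two structural constraints named in the text (connectivity and acyclicity), and emits a linearized string whose facts follow a topological order of the concepts. The function names, the exact string layout, and the tie-breaking rule are assumptions made for illustration. The first three facts are quoted in the text; the fourth ("necessary; not desires; banned") is an assumed reading of Fig. 1.

```python
from collections import defaultdict, deque
from typing import List, Optional, Tuple

# One fact (edge): (head concept, relation, tail concept).
Edge = Tuple[str, str, str]


def nodes_of(edges: List[Edge]) -> List[str]:
    """Return all concepts in first-seen order."""
    seen = {}
    for head, _, tail in edges:
        seen.setdefault(head, None)
        seen.setdefault(tail, None)
    return list(seen)


def is_weakly_connected(edges: List[Edge]) -> bool:
    """True if every concept is reachable when edge directions are ignored."""
    nodes = nodes_of(edges)
    if not nodes:
        return False
    neighbors = defaultdict(set)
    for head, _, tail in edges:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    reached = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        for nxt in neighbors[queue.popleft()]:
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return len(reached) == len(nodes)


def topological_order(edges: List[Edge]) -> Optional[List[str]]:
    """Kahn's algorithm; returns None when the graph contains a cycle."""
    nodes = nodes_of(edges)
    indegree = {node: 0 for node in nodes}
    children = defaultdict(list)
    for head, _, tail in edges:
        children[head].append(tail)
        indegree[tail] += 1
    queue = deque(node for node in nodes if indegree[node] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return order if len(order) == len(nodes) else None


def linearize(edges: List[Edge]) -> str:
    """Validate the graph, then emit facts sorted by topological rank."""
    if not is_weakly_connected(edges):
        raise ValueError("explanation graph must be connected")
    order = topological_order(edges)
    if order is None:
        raise ValueError("explanation graph must be acyclic")
    rank = {node: i for i, node in enumerate(order)}
    # Assumed tie-break: sort by rank of head, then rank of tail.
    ordered = sorted(edges, key=lambda e: (rank[e[0]], rank[e[2]]))
    return "".join(f"({h}; {r}; {t})" for h, r, t in ordered)


factory_farming_graph: List[Edge] = [
    ("factory farming", "causes", "food"),
    ("millions", "desires", "food"),
    ("factory farming", "has context", "necessary"),
    ("necessary", "not desires", "banned"),  # assumed edge, read off Fig. 1
]

if __name__ == "__main__":
    print(linearize(factory_farming_graph))
```

The first two facts share the tail "food", which is the "V-structure" mentioned above: a single chain of facts cannot express it without repeating a node, while a graph can.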
