MEng Individual Project

Imperial College London Department of Computing

WinoGrammar
Using Answer Set Grammars to Learn Explanations for Winograd Schemas

Author: Jordan Spooner
Supervisors: Prof. Alessandra Russo, Dr. Krysia Broda
Second Marker: Prof. Lucia Specia

June 15, 2020

Abstract

The Winograd Schema Challenge (Levesque et al., 2012) is a set of 273 expert-crafted pronoun resolution problems which form a benchmark for commonsense reasoning. In recent years, several approaches to the WSC using neural language models have significantly pushed forward the state-of-the-art accuracy on the dataset. However, the performance of these approaches on certain subsets of problems with strong statistical hints, as well as their reliance on extensive fine-tuning procedures, has brought into question whether these models are truly demonstrating commonsense reasoning, or whether they are instead exploiting spurious biases present in the dataset. One frequently proposed method to test whether an approach to the WSC demonstrates true commonsense capabilities is to require it to explain its answers, but there is currently no published approach that does this. In this project, we propose an approach to the WSC that uses inductive programming to learn explanations from similar examples that are sourced automatically. In particular, we make use of answer set grammars to automatically generate structured representations for natural language texts, and propose a fully-automated approach to generating induction tasks which learn commonsense rules directly within these grammars. We are then able to determine the answers to Winograd schemas by applying these learned rules. Our work makes two main contributions. Firstly, we show that answer set grammars can be used to elegantly encode the syntax and semantics of natural language, producing structured representations for texts that are richer than dependency parses, but can be generated without a significant decrease in accuracy compared to a dependency parse alone. Secondly, we demonstrate that our end-to-end approach to the WSC is comparable in accuracy to most previous approaches, but is additionally able to generate natural language explanations to support our answers, which are valid in 85% of cases.

Acknowledgements

I would like to thank Prof. Alessandra Russo and Dr. Krysia Broda for their guidance and support throughout this project, and for all the time they have devoted to our weekly meetings.

Contents

1 Introduction 3
  1.1 Motivation 3
  1.2 Objectives 4
  1.3 Project Overview 5
  1.4 Contributions 7

2 Datasets 8
  2.1 bAbI Tasks 8
  2.2 Winograd Schema Challenge 9

3 Related Work 13
  3.1 Logical Reasoning 13
  3.2 Knowledge Hunting 16
  3.3 Machine Learning 19
  3.4 Language Models 21
  3.5 Hybrid Approaches 25
  3.6 Discussion 26

4 Background 29
  4.1 Natural Language Processing 29
    4.1.1 Preprocessing Methods 29
    4.1.2 Syntactic Parsing 30
    4.1.3 Semantic Parsing 35
  4.2 Answer Set Programming 42
    4.2.1 Answer Set Programs 42
    4.2.2 Answer Set Grammars 44
  4.3 Inductive Logic Programming 47
    4.3.1 Inductive Learning of Answer Set Programs 47
    4.3.2 Answer Set Grammar Induction 49

5 Representing Natural Language 51
  5.1 Knowledge Graphs 52
  5.2 Answer Set Grammars for Natural Language 54
  5.3 Automated Translation from Dependency Structures 57
    5.3.1 Analysis of Some Linguistic Constructions 64
    5.3.2 Background Knowledge Generation 70

6 Learning Commonsense Knowledge 73
  6.1 Knowledge Hunting 73
    6.1.1 Initial Filtering 76
    6.1.2 Relevance Scoring 78
  6.2 Learning Task Generation 80
    6.2.1 Example Translation 81
    6.2.2 Hypothesis Space Generation 84
  6.3 Applying Learned Knowledge to Winograd Schemas 90

7 Implementation 92
  7.1 Base ASG Generation 92
    7.1.1 Design 92
    7.1.2 NLP Dependencies 94
  7.2 Learning 95
    7.2.1 Design 95
    7.2.2 Knowledge Sources 97
    7.2.3 Dependencies 98

8 Evaluation 100
  8.1 Knowledge Graphs and Base ASG Generation 100
    8.1.1 Performance on WSC 100
    8.1.2 Quality of Knowledge Graphs 102
  8.2 Learning 108
    8.2.1 Performance on bAbI Tasks 110
    8.2.2 Performance on the Winograd Schema Challenge 113

9 Conclusions 128
  9.1 Future Work 129

Chapter 1

Introduction

Improving the ability of computational agents to reason with commonsense has been a long-standing challenge in Artificial Intelligence research, with its origins tracing as far back as the 1960s. Such reasoning has far-reaching applications in areas such as natural language processing (NLP), vision and robotics, where many problems present innate ambiguities, capable of being resolved only with a rich understanding of the world. Despite this, progress in the area has been frustratingly slow, and even today there are few commercial systems which make any significant use of automated commonsense reasoning (Davis and Marcus, 2015). In this project, we focus our attention on a subset of particularly difficult coreferencing problems, called Winograd Schemas, which are often presented as a modern-day alternative to the Turing Test (Levesque et al., 2012). We combine state-of-the-art approaches from natural language processing, deep learning and logic-based learning in order to develop a novel method for performing the kind of commonsense reasoning required by these problems.

1.1 Motivation

A Winograd Schema is a small reading comprehension test with a single binary question, intended to assess a system’s ability to perform commonsense reasoning. An example might be:

The fish ate the worm. It was tasty.
What was tasty?
Answer 0: the fish
Answer 1: the worm

While most humans have no difficulty understanding that it was the worm that was tasty, AI systems have traditionally struggled to determine the correct answer, which requires the machine to both obtain and apply the commonsense knowledge that "something you eat can be tasty". In fact, until recently, state-of-the-art approaches have barely performed better than chance on these kinds of problems, and even today there is no approach that is able to achieve human-level performance.

Historically, approaches to commonsense reasoning problems of this kind have been based heavily on formalizations using mathematical logic. Significant work has been done on developing large, structured commonsense knowledge databases, and methods for automated reasoning over them. Generally these resources have been built by hand (Lenat et al., 1990), crowd-sourcing (Speer et al., 2017), or web-mining (Mitchell et al., 2015), and as such, they often suffer from missing rules and/or inconsistencies.

In recent years, advances in deep learning have enabled new state-of-the-art results for many commonsense reasoning problems, including the Winograd Schema Challenge (WSC) (Levesque et al., 2012). Most notably, neural language models, which learn distributions over word sequences, when trained over huge text corpora, are innately able to provide some conjecture as to whether a sentence makes sense or not (Trinh and Le, 2018). However, despite their state-of-the-art performance, these approaches also suffer from several limitations. In particular, their performance on certain classes of Winograd schemas seems to suggest that in many cases language models exploit very simple statistical patterns (e.g. "the fish is tasty" is more likely than "the worm is tasty"), and as such they can struggle to solve problems which require a deeper understanding of the context introduced by the sentence (e.g. the fact that the worm was eaten). In light of these observations, many have questioned the commonsense reasoning capabilities of language models (Da and Kasai, 2019; Petroni et al., 2019), and advocated for an approach to the WSC which is able to explain its answers (Morgenstern and Jr., 2015; Zhang et al., 2020).

In an attempt to develop an explainable approach, we propose combining neural approaches for natural language understanding with a symbolic approach for learning commonsense rules. Our approach is motivated in particular by recent advances in deep learning, which have enabled highly accurate parsers for natural language (Clark et al., 2018; Zhou and Zhao, 2019), and recent research in inductive logic programming, which has demonstrated the capability to learn answer set grammars (ASGs) (Law et al., 2019). In brief, ASG induction allows the semantic constraints of a language, in the form of Answer Set Programming (ASP) annotations, to be learned, given the language's syntax and examples of semantically valid and invalid sentences. In light of this work, we propose a novel approach which learns semantic constraints on top of an automatically generated grammar for natural language. Such a system would benefit from intrinsic structural knowledge of the text, making the semantics easier to infer, and would learn ASP rules, allowing complex and context-sensitive concepts to be learned. These learned rules would encode commonsense knowledge that could be applied to Winograd schemas in order to resolve their coreferencing ambiguities, and additionally could be used directly as explanations.

1.2 Objectives

The goal of this project is to explore the applicability of answer set grammars in developing an approach for solving difficult coreferencing problems which require commonsense knowledge. At a high level, the objectives of this project are to:

1. Investigate the applicability of ASGs to natural language understanding in general. In particular, we ask whether it is possible to automatically generate base ASGs capable of representing the structure of natural English language, and whether there is any benefit in doing so. To be applicable to the WSC in particular, the process for generating base ASGs would need to be domain-independent and the generated grammar would need to be able to accurately parse training and test examples. Furthermore, these grammars would need to be annotated with as much semantic information as can be reliably determined, in order to allow complex semantic rules to be learned.

2. Develop an approach for automated retrieval of relevant examples, which could then be used to learn the commonsense knowledge required to solve a given coreferencing problem, as per the third objective. This involves developing methods for both searching for and selecting related texts, based on their commonality and relevance to the original text.

3. Investigate the use of ASG induction for learning commonsense knowledge. We aim to explore the idea of automatically generating ASG learning tasks capable of extending a base grammar with commonsense rules, given relevant examples. This would require training texts to be automatically translated into positive and negative examples. Also key to this objective is the need to automate the specification of the hypothesis space, which must be large enough to contain the intended rules, but small enough to be efficiently searchable.

4. Evaluate an ASG-based approach to coreference resolution. Finally, we ask whether the natural language understanding capability of ASGs can be combined with inductive learning of commonsense knowledge and applied in order to solve Winograd schemas. In addition to quantitative evaluation (i.e. what accuracy can be achieved?), a key part of this objective is a qualitative evaluation of the rules learned (i.e. can the system offer sufficient explanation for its choices?), and a critical analysis of the approach compared to the many existing approaches to the WSC.

1.3 Project Overview

Figure 1.1 provides a very brief overview of our proposed approach for learning commonsense knowledge to solve coreferencing problems, which we will discuss in detail in Chapters 5 and 6. We first use deep learning models and existing knowledge bases to generate a knowledge graph structure for the problem, which encodes its semantics. This knowledge graph is then used to search for and select semantically similar examples with unambiguous coreferences, in a knowledge hunting process. An answer set grammar is generated to encode the compositional semantics of the examples, and a learning task is generated, in which the learner attempts to extend the ASG with a hypothesis that is able to explain the coreferencing in the selected examples. Finally, the resulting ASG is used to parse the problem text, and resolve its ambiguous coreference. Crucially, the resolution made by the approach can be explained in terms of the learned hypothesis.
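Read as code, this pipeline is a straightforward composition of stages. The skeleton below is purely illustrative: every function name and signature is a hypothetical placeholder, not the interface of the implementation described in Chapter 7.

```python
from typing import List, Tuple

# Hypothetical stage signatures; placeholders only, not the real implementation.
def build_knowledge_graph(text: str) -> dict: ...                      # Section 5.1
def hunt_for_examples(graph: dict) -> List[dict]: ...                  # Section 6.1
def generate_base_asg(graph: dict, examples: List[dict]) -> str: ...   # Section 5.2
def learn_hypothesis(asg: str, examples: List[dict]) -> str: ...       # Section 6.2
def solve_with_asg(asg: str, hypothesis: str, text: str,
                   candidates: List[str]) -> str: ...                  # Section 6.3

def answer_schema(text: str, candidates: List[str]) -> Tuple[str, str]:
    """Run the full pipeline and return (answer, explanation)."""
    graph = build_knowledge_graph(text)
    examples = hunt_for_examples(graph)
    asg = generate_base_asg(graph, examples)
    hypothesis = learn_hypothesis(asg, examples)
    answer = solve_with_asg(asg, hypothesis, text, candidates)
    return answer, hypothesis  # the learned rules double as the explanation
```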


[Figure 1.1 depicts the pipeline: a Winograd schema and example texts pass through Knowledge Graph Generation (Section 5.1), which provides a structured, logical representation of the results of deep-learning based NLP techniques; Knowledge Hunting (Section 6.1), which gathers semantically similar examples, with analogies to the original schema that we can attempt to learn from; ASG Generation (Section 5.2), which encodes semantic composition and acts as a domain-specific semantic parser, allowing the semantics of similar but unseen examples to be determined; ASG Learning (Section 6.2), which produces an ASP hypothesis encoding learned commonsense knowledge, extending the ASG so that the semantic parse also resolves coreferencing ambiguities; and ASG Solving (Section 6.3), which yields an explainable answer.]

Figure 1.1: An overview of our approach for learning commonsense knowledge to solve coreferencing problems.

1.4 Contributions

Our contributions in this project are threefold:

1. We propose a formalism for expressing natural language using answer set grammars, which is linguistically motivated by head-driven phrase structure grammar. It encodes both syntactic and semantic composition in a single structure, and so can both express complex grammatical constraints and be used directly for semantic parsing. We also describe and implement a fully automated and domain-general approach for generating domain-specific ASGs for natural language which are both accurate and semantically-rich. We evaluate our approach on the WSC dataset and show that it encodes significantly richer semantic information than a dependency parse alone, but with an accuracy that is comparable to state-of-the-art dependency parsing approaches.

2. We propose and implement a novel semantics-guided knowledge hunting approach, which selects semantically similar knowledge texts which are highly relevant for learning by analogy.

3. We propose an approach to Winograd-style coreference resolution using the learning from answer set grammars paradigm, which is based on the goal of learning the commonsense knowledge required to solve each problem. We implement a fully-automated system for solving Winograd schemas based on the approach, which to our knowledge is the first such system which supports its answers with explanations. Furthermore, unlike other approaches, our results can be fine-tuned by incorporating existing or hand-engineered background knowledge. We evaluate our approach on the full WSC dataset and show that it is both highly precise and capable of handling schemas that state-of-the-art approaches cannot.

Chapter 2

Datasets

This project focuses mainly on two datasets for natural language understanding: the bAbI tasks dataset and the Winograd Schema Challenge (WSC). The WSC consists of particularly challenging coreferencing problems, and does not provide any training examples. As such, we initially focus on the bAbI tasks dataset, which has a smaller vocabulary and provides a large, noiseless set of training examples. This should make the learning process easier, allowing us to initially attend to the development of a robust and flexible approach to representing natural language understanding problems using answer set grammars.

2.1 bAbI Tasks

The bAbI tasks (Bengio and LeCun, 2016) are "a set of prerequisite toy tasks" for testing text understanding and reasoning. The dataset is made up of 20 different tasks, each testing a single skill. Each task features 1000 training examples and 1000 testing examples, all of which are noiseless. The goal is not only to achieve high accuracy on the test examples, but also to use as few training examples as possible. In particular, we choose to focus on tasks 11 (basic coreference) and 13 (compound coreference). Examples from each task are shown below:

Task 11 (basic coreference)
John went to the bathroom. He then journeyed to the office.
Where is John? A: office

Task 13 (compound coreference)
Sandra and John went to the bedroom. Following that they went to the hallway.
Where is John? A: hallway

2.2 Winograd Schema Challenge

The Winograd Schema Challenge (WSC) (Levesque et al., 2012) is commonly presented as an alternative to the Turing Test, a test proposed by Alan Turing in 1950 for deciding whether or not machines are able to think. The idea of Turing’s test is that if after a long unrestricted conversation with a machine in natural English, a human interrogator is unable to tell whether they are dealing with a machine or a human, then we should consider this as evidence that the machine is capable of thought. Such a test, however, has conceptual and practical drawbacks, most notably, that machines can merely rely on simple means, built around deception and evasiveness, to maintain a realistic conversation (Weizenbaum, 1966). The WSC is instead a test made up of a set of 273 Winograd Schemas. Each WS is a small reading comprehension test with a single binary question. Example 2.1. Levesque et al.(2012) give the following example of such a question:

The town councillors refused to give the angry demonstrators a permit because they feared violence.
Who feared violence?
Answer 0: the town councillors
Answer 1: the angry demonstrators

Each question must satisfy four conditions:

1. Two parties are mentioned in the text by noun phrases (e.g. "the town councillors" and "the angry demonstrators"). We call these the candidate referents.

2. A pronoun or possessive adjective is used to reference one of the parties (e.g. "they/them/their"). Importantly, it must be the right sort for both parties (e.g. both parties must be groups). We call this the target pronoun.

3. The question asks the test-taker to determine the referent of the target pronoun, by choosing between the two candidate referents.

4. There is a special word in the text, which when replaced by another word (the alternate word), forms a valid question, but with the opposite answer.

To explain the fourth condition, consider again Example 2.1. For this example, the special word is feared, and the alternate word is advocated:

The town councillors refused to give the angry demonstrators a permit because they advocated violence.
Who advocated violence?
Answer 0: the town councillors
Answer 1: the angry demonstrators

It is the fourth condition which the authors of the challenge claim requires thought to get a correct answer. The contexts in which the special and alternate words may appear tend to be statistically similar, and thus it is argued that the test-taker needs to have background knowledge not expressed in the sentence in order to correctly resolve the reference.
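For concreteness, a schema and its special/alternate word pair can be written down as a small record. The encoding below is our own illustration (the field names are hypothetical, not a format defined by Levesque et al. (2012)), populated with Example 2.1.

```python
from dataclasses import dataclass

@dataclass
class WinogradSchema:
    text: str             # sentence containing the ambiguous pronoun
    pronoun: str          # the target pronoun
    candidates: tuple     # the two candidate referents
    special_word: str     # word whose replacement flips the answer
    alternate_word: str   # the replacement word
    answer: int           # index (0 or 1) of the correct referent

schema = WinogradSchema(
    text="The town councillors refused to give the angry demonstrators a permit "
         "because they feared violence.",
    pronoun="they",
    candidates=("the town councillors", "the angry demonstrators"),
    special_word="feared",
    alternate_word="advocated",
    answer=0,
)

# Substituting the alternate word yields the twin question with the opposite answer.
flipped_text = schema.text.replace(schema.special_word, schema.alternate_word)
flipped_answer = 1 - schema.answer
```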

9 Non-examples Whilst a Winograd Schema should be easily disambiguated by a human reader, there are some examples that are “too easy”. One commonly cited example is:

The women stopped taking the pills because they were [pregnant/carcinogenic].
Which individuals were [pregnant/carcinogenic]?
Answer 0: the women
Answer 1: the pills

The trick with this example is that only the women can be pregnant and only the pills can be carcinogenic, and so there is no need to fully understand the sentence. These restrictions are called selectional restrictions, and can easily be learned by considering a large-enough text corpus. Similarly, consider the example, again from (Levesque et al., 2012):

The race car zoomed by the school bus because it was going so [fast/slow].
What was going so [fast/slow]?
Answer 0: the race car
Answer 1: the school bus

Although there is no selectional restriction to be applied in this case, clearly a fast race car is much more common than a fast school bus. Such examples are sometimes called associative. It is for this reason that Winograd schemas should be non-associative, i.e. there should be no statistical properties of the special and alternate words that would allow the correct answer to be selected with large text corpora alone. Finally, consider the commonly-cited example:

Frank was pleased when Bill said that he was the winner of the competition.
Who was the winner?
Answer 0: Frank
Answer 1: Bill

The issue with this example is instead that it is genuinely ambiguous — Frank could be pleased about his own or Bill’s success.

Associativity, One-Way Ambiguity and Switchability Despite best attempts, it can be argued that there are several examples present in the WSC dataset that do not meet these conditions. An analysis by Trichelair et al. (2019) finds that 37 of the WSC's 286 examples are actually associative: i.e. the answer is given away by statistical correlation alone. Furthermore, a human baseline conducted by Glass and Kim (2015) identified several particularly difficult questions. One example is:

I couldn't put the pot on the shelf because it was too tall.
What was too tall?
Answer 0: the pot
Answer 1: the shelf

For this question, more than half of the respondents incorrectly answered that it was the shelf that was too tall. We will refer to questions as one-way ambiguous if fewer than 75% of respondents were able to determine the correct answer. The dataset contains 19 such examples. Finally, an example is called switchable (Trichelair et al., 2019) if the positions of the two candidate referents can be swapped and the example still makes sense. Since switching the referents should trivially switch the question's answer, these examples can be used to determine whether an approach reasons consistently. Trichelair et al. (2019) identify 131 such examples.

Similar Datasets The WSC is a small dataset with no training data. As such, many attempts have been made at producing larger datasets of similar examples, mainly for the purpose of training statistical models. We briefly mention some examples:

1. The DPR dataset (Rahman and Ng, 2012) is made up of 1,882 sentence pairs, in the same format as the WSC problem. Users have consistently noted that DPR is easier than the WSC itself, with a much higher proportion of associative examples.

2. The GLUE-WNLI dataset (Wang et al., 2018) transforms the existing WSC problems into an entailment problem. An evaluation set consists of new examples derived from fiction books; however, these examples have been found to be significantly easier than those in the WSC.

3. The SuperGLUE-WSC dataset (Wang et al., 2019) rephrases the GLUE-WNLI prob- lems back into a more-standard coreference resolution task.

4. The KnowRef dataset (Emami et al., 2019) consists of 8,724 Winograd-like examples. These examples were scraped from Wikipedia, OpenSubtitles and Reddit comments, and filtered by hand.

5. The WikiCREM dataset (Kocijan et al., 2019a) consists of 2,438,897 examples sourced by masking the second occurrence of repeated names in sentences from Wikipedia. It is noted that roughly 20% of the examples are deemed to be unsolvable, since there is no human filtering.

6. The WinoGrande dataset (Sakaguchi et al., 2020) consists of 43,972 Winograd-like examples, which are collected and validated via crowdsourcing, and then go through a process of adversarial filtering — essentially reducing bias in the dataset by removing the examples found easiest by current approaches. The final examples appear to be considerably more difficult than WSC problems.

Challenges The most significant challenge of the WSC comes from the diversity of commonsense knowledge and reasoning ability required to answer the questions correctly. Each question requires a different snippet of commonsense knowledge, and human test-takers intuitively use a combination of spatial, temporal, causal, epistemic and other kinds of reasoning to guide themselves to an answer. For this reason the most successful approaches usually avoid direct reasoning, instead favouring a statistical approach. However, the authors of the challenge note that demonstrating this kind of complex reasoning is precisely the goal (Levesque, 2014). For example, consider the schema:

The large ball crashed right through the table because it was made of [styrofoam/steel].
What was made of [styrofoam/steel]?
Answer 0: the large ball
Answer 1: the table

Suppose, as Levesque (2014) does, that the special/alternate word was replaced with "XYZZY", and we were told several facts about this XYZZY material, including that it is mostly air. Richard-Bollans et al. (2017) note that a human should still be able to answer this new question, whilst most statistical approaches will simply learn that "things made out of styrofoam are more likely to be crashed through than to crash through things" — a clearly insufficient explanation which does not generalize to the invented XYZZY material. For this reason, Morgenstern and Jr. (2015) point out that the best approaches should be able to explain their reasoning. We discuss existing (statistical and reasoning-based) approaches to the WSC in more depth in the following chapter.

Chapter 3

Related Work

In this chapter, we provide an in-depth analysis of the existing work on the Winograd Schema Challenge. There is a significant amount of published work on the WSC and other related commonsense reasoning problems, with a diverse range of approaches having been proposed. We classify these approaches into five broad categories: logical reasoning (Section 3.1), knowledge hunting (Section 3.2), traditional machine learning (Section 3.3), machine learning with pre-trained language models (Section 3.4) and hybrid (neural and symbolic) approaches (Section 3.5).

3.1 Logical Reasoning

Many approaches have been proposed to the WSC which involve some form of logical reasoning (Schüller, 2014; Bailey et al., 2015; Sharma, 2019). However, the reasoning required to determine the correct candidate referent in many cases is complex, and the commonsense knowledge is specific and thus difficult to source. For this reason, most approaches explicitly provide the required knowledge, and limit their evaluation set to just a small number of Winograd schemas, which do not require complex social, situational and spatio-temporal reasoning and default assumptions.

Figure 3.1: Example of the graphical representation proposed by Schüller (2014)

Knowledge Graphs Schüller (2014) defines a kind of knowledge graph, over which reasoning is performed. An example from the paper, corresponding to the sentence "Sam's drawing was hung just above Tina's and it did look much better with another one below it.", is shown in Figure 3.1. The constraints that v_t ≠ v_s, v_e ≠ v_o and v_t = v_e ∨ v_s = v_e are also included by the proposed formalism. Similar graphs are constructed, say, to represent commonsense knowledge such as that "if v_1 is above v_2, then v_2 is below v_1". The reasoning process then takes the form of two steps. First, isomorphisms are identified between the knowledge graphs, to determine which knowledge is useful for solving the WS. The identified knowledge graphs are then combined by aligning the matching nodes.

Arguably the most significant drawback of this approach is that both the input and background knowledge graphs must be manually constructed, i.e. only the reasoning process itself is automated. While constructing the input graph for each Winograd schema should be possible using state-of-the-art parsing techniques, automating the generation of highly accurate and relevant background knowledge appears to be a much more daunting task, as often this knowledge is incredibly specific. One challenge faced by the approach is that, in many cases, there may be enough evidence to prove either of the two candidate referents is correct. In these cases, there needs to be a method to determine which answer is more relevant. Schüller (2014) identifies features of more relevant explanations:

1. The background knowledge integrates well with the input (the schema's knowledge graph). I.e. most of the nodes from the background can be combined with input nodes.

2. The background knowledge forms contextual implications, by connecting parts of the input graph that were not originally connected.

This use of relevance theory to decide how to integrate background knowledge shows particular promise. For evaluation, schemas with ambiguities were specifically selected. For example, for the sentence "Sam's drawing was hung just above Tina's and it did look much better with another one below it", one could imagine a case where there are three drawings. It is shown that the features described handle the disambiguation process and yield the correct answer for all evaluated schemas.

It is these kinds of default assumptions that present the most significant challenge to logic-based approaches to the WSC. Whether to accept or reject default assumptions requires world knowledge and pragmatic understanding. For example, Richard-Bollans et al. (2017) point out the difficulty of the Winograd schema:

Tom threw his school bag down to Ray after he reached the [top/bottom] of the stairs.
Who reached the [top/bottom] of the stairs?
Answer 0: Tom
Answer 1: Ray

Here, to derive the correct answer, one must make assumptions regarding the initial position of Tom and Ray (we usually assume that Tom and Ray are initially at the same location),

the movement of the person who reaches the top of the stairs (we assume they have gone up the stairs), and the other person (we assume that they have not moved). Clearly, other assumptions are possible. Furthermore, as noted by Richard-Bollans et al. (2017), it is difficult to formalize the commonsense knowledge required. For example, any knowledge regarding the relation at_the_top_of would need to handle the vagueness of the concept — if one is at the top of a building, are they on the top floor or on the roof?
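The alignment step at the heart of this approach (finding how a background graph maps onto the input graph) is essentially subgraph matching. The sketch below is our own illustration using networkx, not part of Schüller's ASP-based formalism, and the node and edge labels are hypothetical simplifications of Figure 3.1.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Input graph fragment for "Sam's drawing was hung just above Tina's ..." (hypothetical labels).
schema_graph = nx.DiGraph()
schema_graph.add_edge("v_s", "v_t", label="above")   # Sam's drawing is above Tina's
schema_graph.add_edge("v_e", "v_o", label="below")   # "it" has another one below it

# Background pattern: "X is above Y" (the antecedent of "if X is above Y, then Y is below X").
background = nx.DiGraph()
background.add_edge("X", "Y", label="above")

# Enumerate every way the background pattern embeds into the schema graph.
matcher = isomorphism.DiGraphMatcher(
    schema_graph, background,
    edge_match=isomorphism.categorical_edge_match("label", None),
)
for mapping in matcher.subgraph_isomorphisms_iter():
    print(mapping)  # e.g. {'v_s': 'X', 'v_t': 'Y'}
```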

Automated Approaches Sharma (2019) describes a very similar, but more automated approach, which is able to handle 200 WSC problems, by reasoning with additional knowledge. A semantic parser is used to generate a graphical form for each Winograd schema. To do this, a dependency parser is used, and then the parsed text is annotated with additional semantic information. Specifically, a rule-based system is used to map syntactic dependencies into semantic dependencies, and discourse connectives are assigned labels (e.g. "and", "but", comma and period are labelled as next_event; "because" is labelled as causes; and "so" is labelled as caused_by). Finally, WordNet (Miller, 1995) is used to annotate each node with classes (e.g. "man" would be assigned the class person). Consider, for example, the schema:

Example 3.1. The man could not lift his son because he was so weak.

The graphical representation produced by the parser for this schema is shown in Figure 3.2a. The additional knowledge is provided in a similar format, with additional is_same_as labels resolving the reference. For example, for the example above, the additional knowledge would take the form of:

Example 3.2. IF person1 cannot lift person2 because person3 is weak THEN person1 is the same as person3

The graphical representation of this additional knowledge is shown in Figure 3.2b. Again, the reasoning process involves aligning the two graphical representations¹. If the most natural alignment connects the target pronoun with one of the candidate referents with is_same_as edges, the answer may then be trivially returned.

The main limitation of this approach is that the additional knowledge should be of the form shown in Example 3.2. This restriction limits the approach to just 240 of the WSC problems, since 26 problems require multiple pieces of knowledge, and a further 25 require the knowledge that one scenario is more likely than another. Additionally, the parser fails to parse a further 40 problems, due to errors in syntactic dependency parsing and part-of-speech tagging. For these problems, the graphical representation had to be derived by hand.

We also note that the proposed form of knowledge is both rather unintuitive (presumably not many people attempting to resolve the reference in Example 3.1 would provide the knowledge of Example 3.2 as their justification) and specific (and as such especially difficult to generate). For example, it may often be possible to answer commonsense knowledge questions with knowledge in this format, but far more natural to use multiple pieces of knowledge. Consider Levesque's commonly-cited question "Can a crocodile run a steeplechase?". In this case, the specific knowledge that crocodiles cannot run steeplechases is sufficient, but clearly not as natural as combining the more general knowledge that crocodiles have short legs, animals with short legs cannot jump high, and that jumping is a requirement of a steeplechase. Sharma (2019) also briefly explored the idea of knowledge hunting, with the goal of generating the additional knowledge. They find that the knowledge could be generated in 120 cases, and the correct answer was returned in every case where the knowledge could be found. We discuss knowledge hunting approaches in more detail in the following section.

Figure 3.2: Examples of the graphical representation generated by the semantic parser, K-Parser (Sharma, 2019). (a) Representation for the schema shown in Example 3.1. (b) Representation for the additional knowledge shown in Example 3.2.

¹ As noted by Sharma (2019), this is an instance of the NP-complete graph-subgraph isomorphism problem. It can be solved efficiently for small problem sizes using an answer set solver such as clingo.

3.2 Knowledge Hunting

In an attempt to automate the extraction of common sense knowledge, many approaches propose knowledge hunting methods which make web searches, or searches through large text corpora, in an attempt to find evidence that supports one candidate referent over another.

Sharma et al. (2015), for example, describe an approach which combines semantic parsing of the problem text with a knowledge hunting module. In their approach, they focus on two specific categories of schema, which cover 71 of the 286 schemas making up the WSC:

1. Event-event causality. This category applies to examples in which the candidate referent participates in one event, and the target pronoun in another. The required commonsense knowledge is that one event causes the other. E.g. the sentence "Sid explained his theory to Mark but he could not convince him" would be in this category, since it can be solved with the knowledge that "if X explained s to Y but Z could not convince, then Z = X".

2. Causal attributive. This category applies to examples which can be solved with the knowledge that an event is causally related to some attribute. E.g. the sentence “Pete envies Martin because he is successful” can be solved using the knowledge that “X envies Y because Y is successful”.

We now describe the process of knowledge hunting itself. Consider again Example 3.1. First this sentence would be parsed, and then a set of queries is generated. This is done by substituting all nominal entities and certain stop words with wildcards, in this case generating the query ".*could not lift.*because.*weak.*". Additional queries are created by replacing the verbs in the query with their respective synonyms, for example, generating ".*could not raise.*because.*weak.*". These queries are input to the Google Search API, which acts here as a huge text corpus. As an example, Google may return the sentence:

She could not lift it because she is a weak girl.

We will refer to such a sentence as a knowledge text. Importantly, the references in this knowledge text are unambiguous. Both the WS and the knowledge text are then parsed by a semantic parser. An example of the generated parse tree, corresponding to the schema in the example, is shown in Figure 3.2a. Solutions are again found by identifying graph-subgraph isomorphisms between the parsed representations of the WS and the knowledge text.
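The query-generation step just described is easy to sketch. The snippet below is our own minimal illustration, using NLTK's WordNet interface (not necessarily the tooling used by Sharma et al. (2015)), and the decomposition of Example 3.1 into kept tokens is hypothetical.

```python
from itertools import product
from nltk.corpus import wordnet as wn  # requires: pip install nltk; nltk.download("wordnet")

def verb_synonyms(verb, limit=3):
    """The verb itself plus a few WordNet synonyms (underscores become spaces)."""
    lemmas = {l.name().replace("_", " ")
              for s in wn.synsets(verb, pos=wn.VERB) for l in s.lemmas()}
    return [verb] + sorted(lemmas - {verb})[:limit]

def build_queries(kept_tokens):
    """Join kept tokens with wildcards, expanding verb slots with synonyms.

    kept_tokens is a list of (token, is_verb) pairs for the words that survive
    the substitution of nominal entities and stop words with wildcards.
    """
    slots = [verb_synonyms(tok) if is_verb else [tok] for tok, is_verb in kept_tokens]
    return [".*" + ".*".join(combo) + ".*" for combo in product(*slots)]

# Hypothetical decomposition of "The man could not lift his son because he was so weak."
queries = build_queries([("could not", False), ("lift", True),
                         ("because", False), ("weak", False)])
print(queries[:3])  # e.g. ".*could not.*lift.*because.*weak.*", ".*could not.*raise.*..."
```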

Bob paid for Charlie’s college education, he is very grateful.

was answered incorrectly as their system found the knowledge text “I paid the price for my stupidity. How grateful I am.”.

17 Generalized Knowledge Hunting Emami et al.(2018) propose a more general knowledge hunting method. Their approach consists of four steps:

1. The schema is partially parsed. In particular, the following components are extracted based on a very shallow syntactic parse:

   C1, C2: the candidate referents
   VC: the context predicate (e.g. verb before the discourse connective)
   Conn: the discourse connective
   P: the anaphor
   VQ: the query predicate (e.g. verb after the discourse connective)

   For example, for the problem "The trophy doesn't fit into the brown suitcase because it is too large.", we have:

   C1 = the trophy
   C2 = the suitcase
   VC = doesn't fit into
   Conn = because
   P = it
   VQ = is too large

2. Queries are generated of the form

   +TermC +TermQ −"Winograd"

   where TermC and TermQ are derived from VC and VQ. Specifically, WordNet is used to find synonyms for each, and each set is filtered to those elements with a lexical similarity score (also from WordNet) to the original term exceeding some threshold. The threshold is derived from performance on the DPR dataset.

3. The search results are partially parsed, and filtered to those matching one of the following patterns:

   C1′ PredC′ C2′ Conn′ C3′ PredQ′
   C1′ PredC′ C2′ Conn′ PredQ′ C3′
   C1′ PredC′ Conn′ C3′ PredQ′
   C1′ PredC′ Conn′ PredQ′ C3′

   It is hoped that the references in this sentence are unambiguous — either they repeat the noun phrase explicitly, or the reference can be determined using gender/plurality alone. Then, if C3′ refers to C1′, this is treated as evidence that the target pronoun P references C1; otherwise, if C3′ refers to C2′, this is seen as evidence that P references C2. There are two important exceptions:

   (a) If the event occurs in the passive voice (e.g. "was called"), then the reference must be swapped.

   (b) In some cases the verb may be used both transitively and intransitively (e.g. "he opened the door" vs "the door opened"). Such verbs are called causatively alternating. This may also cause the order of the referents to be swapped.

4. Each example is weighted. Specifically, Emami et al. (2018) propose a weight of

   LenScore(e) + OrderScore(e)

   where

   LenScore(e) = 2 if len(TermQ) > 1 or len(TermC) > 1, and 1 otherwise
   OrderScore(e) = 2 if TermC precedes TermQ, and 1 otherwise

   The weights of examples supporting each candidate referent are summed, and the referent with the greater value is returned as the answer.
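The weighting and voting in step 4 translate directly into code. The sketch below is a near-literal transcription of the two scoring functions and the final vote; the Evidence record used to carry the parsed match is our own hypothetical container.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Evidence:
    term_c: List[str]     # tokens of the matched context term
    term_q: List[str]     # tokens of the matched query term
    c_precedes_q: bool    # True if TermC precedes TermQ in the retrieved sentence
    supports: int         # 0 or 1: which candidate referent this text supports

def len_score(e: Evidence) -> int:
    return 2 if len(e.term_q) > 1 or len(e.term_c) > 1 else 1

def order_score(e: Evidence) -> int:
    return 2 if e.c_precedes_q else 1

def predict(evidence: List[Evidence]) -> int:
    """Sum the weights supporting each candidate and return the better-supported one."""
    totals = [0, 0]
    for e in evidence:
        totals[e.supports] += len_score(e) + order_score(e)
    return 0 if totals[0] >= totals[1] else 1
```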

This system was able to find valid knowledge texts, and hence determine an answer for 199 problems. Of these problems, 119 (60%) were answered correctly. To understand the performance of the knowledge hunting process itself, the referents of 876 knowledge texts were manually labelled. Of these 876 texts, 703 (81%) were found to be labelled correctly by the knowledge hunting system. Of the 173 incorrectly-labelled texts, 110 were found to have insufficient evidence for either conclusion. This is one of the major challenges for knowledge-hunting based approaches. Language is used in an efficient way and commonsense knowledge is often left implicit. Richard-Bollans et al. (2017), for example, provide the example "Sam chopped down the tree". Here there is a default assumption that chopping is done with an axe. Often extra information is added only in the unusual cases where the default assumption is incorrect, e.g. "Sam chopped down the tree with a sword".

3.3 Machine Learning

Formalizing the kind of reasoning required to tackle the WSC is notoriously difficult, and so naturally many of the earliest approaches to the WSC instead opt for simple machine learning-based approaches. For example, Rahman and Ng (2012) propose training a ranking support vector machine (SVM), using the DPR dataset. Specifically, each example is transformed into an ordered pair of feature vectors, one for each of the candidate referents, and the ranker is trained with the goal of assigning the vector corresponding to the correct referent in each pair a higher rank. Their approach uses linguistic features of six kinds:

19 1. Narrative chains. Events are extracted from the text and compared against a large set of narrative chains. Narrative chains (Chambers and Jurafsky, 2009) are composed of a sequence of temporally-ordered events that tend to occur together. One example might be: arrested(obj) → charged(obj) → convicted(obj) → sentenced(obj) where “obj” indicates that the object of the verb is common across the sequence.

2. Semantic compatibility. This feature extracts from the sentence the two candidate referents (C1 and C2), the verb applied to the target pronoun (V), and the words following that verb (W), as well as the adjective (J) if V is some conjugation of the verb to be. It then generates up to six Google queries, whose result counts are compared against each other as features (a sketch of this query construction is given after this list). For example, for the sentence "the fish ate the worm because it was hungry", the following queries would be generated:

   (a) C1V: "fish was"
   (b) C2V: "worm was"
   (c) C1VW: "fish was hungry"
   (d) C2VW: "worm was hungry"
   (e) JC1: "hungry fish"
   (f) JC2: "hungry worm"

3. FrameNet. In the case, for example, where C1 and C2 are names, clearly the Google-based features are not particularly helpful. In this case, FrameNet (Ruppenhofer et al., 2006) is used to determine the roles of C1 and C2, and the Google queries are generated instead using these roles. For example, for the sentence "John killed Jim, so he was arrested", C1 (John) has the role "killer" and C2 (Jim) has the role "victim".

4. Polarity. Sentiment analysis is applied to the two candidate referents and the target pronoun, and their values are compared. Consider, for example, the sentence “John was defeated by Jim in the election even though he is more popular”. Here “John” is assigned a negative sentiment (having been defeated), “Jim” is assigned a positive sentiment and “he” is assigned a positive sentiment (being more popular). In this case, the discourse connective “even though” would suggest that the polarity should be reversed, and hence the target pronoun is resolved to “John”.

5. Connective-based relations. A triple of the form (V, Conn, X) is produced, where V is the verb from the first clause, Conn is a discourse connective and X is a verb or adjective in the following clause. For example, "Google bought Motorola because they are rich" generates the triple (buy, because, rich). This triple is then checked against a large set of relations extracted from large text corpora.

6. Lexical features. These features count single word occurrences, word pairs separated by a discourse connective, and word pairs relating to each candidate referent.
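As promised under item 2, the construction of the semantic-compatibility queries can be sketched as follows. This is an illustration only: the helper name and argument layout are hypothetical, and the Google result-count comparison that turns these strings into features is omitted.

```python
def compatibility_queries(c1, c2, verb, following_words="", adjective=None):
    """Build the up-to-six Google queries for the semantic-compatibility feature."""
    queries = {
        "C1V": f"{c1} {verb}",
        "C2V": f"{c2} {verb}",
    }
    if following_words:
        queries["C1VW"] = f"{c1} {verb} {following_words}"
        queries["C2VW"] = f"{c2} {verb} {following_words}"
    if adjective:  # only when V is a conjugation of the verb "to be"
        queries["JC1"] = f"{adjective} {c1}"
        queries["JC2"] = f"{adjective} {c2}"
    return queries

# "The fish ate the worm because it was hungry."
print(compatibility_queries("fish", "worm", "was", "hungry", adjective="hungry"))
```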

The system was evaluated only on the DPR dataset, and not on the harder WSC dataset, but was the first system to achieve significantly greater-than-chance performance (73% accuracy) on these kinds of difficult coreference problems. Feature analysis demonstrated that their Google-based semantic compatibility features were among the most important. Since WSC problems are designed to be "Google-proof", this likely suggests deficiencies in the DPR dataset — namely that many examples are associative, and hence we expect that the system might not perform nearly as well on the WSC itself. Furthermore, the achieved accuracy of 73% on DPR is still significantly below that of human performance and more recent approaches, demonstrating that while the features discussed may be helpful, they alone are not sufficient to fully capture the semantics of the problems. Richard-Bollans et al. (2017), for example, note that the FrameNet-based features do not take into account the discourse connective, and thus would be identical for the sentences "John killed Jim, so he was arrested" and "John killed Jim after he was arrested". Other features suffer from similar deficiencies.

3.4 Language Models

Most recent approaches to the WSC rely on high-capacity language models, trained on a large amount of unlabelled data. They are usually optimised to predict the next token in a sequence, and appear to efficiently store a vast amount of linguistic knowledge, which has been shown to be useful for several downstream tasks.

Recurrent Language Models Trinh and Le(2018) outline a remarkably simple and direct approach to the WSC using recurrent language models. First, the pronoun in the original sentence is swapped out with each of the candidate referents. For example, the schema:

The trophy doesn't fit into the brown suitcase because it is too large.
What is too large?
Answer 0: the trophy
Answer 1: the suitcase

is translated into two candidate sentences:

Example 3.3. The trophy doesn't fit into the brown suitcase because the trophy is too large.

and

Example 3.4. The trophy doesn't fit into the brown suitcase because the suitcase is too large.

A recurrent language model, which predicts a distribution over a large vocabulary of words, conditioned on the preceding words, is trained on large text corpora. Each of the candidate sentences is then scored based on its probability, and the referent occurring in the sentence with the higher probability is selected as the answer. Trinh and Le (2018) also propose several minor enhancements to the system to improve its accuracy:

1. A key limitation of the approach is that measuring the sentence's probability as the joint probability of each word in the sentence occurring at its respective position may result in a bias due to a difference in the commonality of each of the candidate referents. For example, "the trophy" is much rarer in text corpora than "the suitcase", which in this case would result in an incorrect preference for candidate sentence 3.4. In order to resolve this issue, a partial scoring method is proposed, in which the probability of the words following the candidate referent, given the preceding words, is calculated (a sketch of this scoring procedure is given after this list). E.g. the score for candidate sentence 3.3 under this approach would be

   P(is, too, large | the, trophy, doesn't, fit, into, the, brown, suitcase, because, the, trophy)

2. They develop a customized training dataset, based on questions in commonsense reasoning tasks — specifically this corpus is made up of the documents from their selected corpora which contain the most overlapping n-grams with the Winograd schemas.

3. An ensemble of 14 language models, trained on several different text corpora, is used to make a final decision.
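As a concrete illustration of the partial scoring in item 1, the sketch below scores the two candidate substitutions with an off-the-shelf GPT-2 model from the HuggingFace transformers library. This is only a stand-in for the ensemble of recurrent language models actually used by Trinh and Le (2018), so the numbers it produces will differ; it demonstrates the scoring recipe, nothing more.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def partial_score(prefix: str, continuation: str) -> float:
    """Sum of log P(token | preceding tokens) over the continuation tokens only."""
    prefix_len = tokenizer(prefix, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    score = 0.0
    for pos in range(prefix_len, full_ids.shape[1]):
        score += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return score

prefix_template = "The trophy doesn't fit into the brown suitcase because {}"
continuation = " is too large."
scores = {c: partial_score(prefix_template.format(c), continuation)
          for c in ("the trophy", "the suitcase")}
print(max(scores, key=scores.get))  # referent from the higher-probability sentence
```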

This approach, with the adjustments described, achieves a 63.7% accuracy on the WSC. Furthermore, Trinh and Le (2018) note that their language model was able to correctly identify the special word for 64.6% of the correctly-answered questions — in these cases it was the word that made the greatest contribution to the difference in probability between the two candidate sentences. They claim that this "indicates a good grasp of commonsense knowledge".

However, the approach has been noted to have several flaws. Statistical language models, especially the kind used by Trinh and Le (2018), have been shown to have difficulty learning complex concepts, particularly those which are context-sensitive. Saba (2018) notes, for example, that the statistical approach does not easily capture type similarities, and so claims that the approach needs to learn many possible combinations individually, instead of a single generalised concept. For example, the trophy, the ball, and the laptop all co-occur in quite different ways, and so each of these concepts might appear markedly different to a recurrent language model:

The trophy doesn't fit into the brown suitcase because the trophy is too large.
The ball doesn't fit into the brown suitcase because the ball is too large.
The laptop doesn't fit into the brown suitcase because the laptop is too large.

This might explain why such a huge volume of textual data is required by the language model to make good predictions, and why a specially-constructed dataset with overlapping n-grams was found to be necessary.

Transformer-Based Language Models In the last couple of years, transformer-based language models have significantly advanced state-of-the-art results on many natural language tasks. Examples include OpenAI's GPT and subsequently GPT-2 models (Radford et al., 2019), which use deep (up to 48-layer) transformer architectures (Vaswani et al., 2017) and huge amounts of training data (over 8 million documents). These language models have also shown significant promise in solving commonsense reasoning problems. For example, GPT-2 has been applied to the WSC, using the same approach as in (Trinh and Le, 2018), but achieving a significantly improved accuracy of 70.7% (Radford et al., 2019). With such large volumes of training data, an obvious question is whether the corpora include the exact problems in the WSC. Radford et al. (2019) evaluated their training set to find only 10 schemas with significant overlaps and only 1 schema present in its entirety.

Bidirectional Transformers While GPT and GPT-2 are auto-regressive models (i.e. they are trained to predict the next token, given knowledge of only previous tokens, similar to a recurrent language model), Google's BERT (Devlin et al., 2019) takes a different approach. It pre-trains bidirectional transformers, using context from both sides of the token to be predicted. For example, a training example might take the form of "Mary bought [MASK] for the film", where the goal is to predict the original value of the masked token. Klein and Nabi (2019) show that BERT is able to achieve an accuracy of 60.3% on the WSC using the out-of-the-box BERT-base model, without any customized training data or fine-tuning.

Fine-Tuning A common approach to many NLP tasks in recent years has been to apply transfer learning. First, a language model such as BERT or RoBERTa is pre-trained on vast amounts of textual data, using training tasks such as the masked token prediction method discussed earlier. Then, starting from the pre-trained model's weights, a task-specific model is developed through a process of fine-tuning, during which the model is trained on tasks much closer to its intended usage. Ruan et al. (2019) first investigated fine-tuning approaches for the WSC. They use the DPR dataset (Rahman and Ng, 2012), consisting of 1886 Winograd-like examples, to fine-tune the BERT-large model, achieving 71.1% accuracy. They find that fine-tuning is particularly important, providing a 10-20% improvement in accuracy, presumably by combining the knowledge from Winograd annotation with the extremely diverse linguistic knowledge present in the out-of-the-box BERT models. Several approaches have since advanced the state of the art with small tweaks to the fine-tuning process:

• Kocijan et al. (2019b) use their masked Wikipedia dataset, WikiCREM, for fine-tuning, achieving 72.5% accuracy. Their fine-tuning process is slightly optimised: the pronoun to be resolved is masked out of the sentence and the language model is used to predict the correct candidate referent out of C1 and C2. This approach has consistently been used to achieve state-of-the-art results on the similar GLUE WNLI and SuperGLUE WSC tasks.

• Ye et al. (2019) used ConceptNet to generate a commonsense-focused fine-tuning task. Their process finds triples from ConceptNet, aligns the triples with sentences from the English Wikipedia dataset, and masks one token from the ConceptNet relation. The task involves selecting the original value of the masked token. This approach achieves 75.5% accuracy, demonstrating its improved commonsense reasoning capability.

• Sakaguchi et al. (2020) used their large crowd-sourced Winograd-like dataset, WinoGrande, in their fine-tuning process, achieving state-of-the-art accuracy of 90.1% on the WSC. However they note that the results "may need to be taken with a grain of salt", as their work suggests that the WSC may contain dataset-specific biases, causing their models to be "right for the wrong reasons" and hence "running the risk of overestimating the true capabilities of machine intelligence on common sense reasoning". The significant advancement in performance can be attributed to the volume of commonsense knowledge present in their large, Winograd-like dataset. In particular we note that their dataset contains many schemas which are extremely similar to those present in the WSC dataset, often with just different noun-phrases, or a small grammatical twist.

Embedding Sentence Structure Ruan et al. (2019) note that the WSC, with its limited training data and sensitivity to sentence structure, is well-suited to approaches which explicitly incorporate dependency structure. They generate a dependency tree for each sentence and modify BERT so that each word attends only to its parent, its children and itself, rather than all words in the sentence. They find that structural information is important, consistently improving accuracy by ~3%, but especially helps for the trickier, non-associative problems, which language models traditionally do poorly on.

Language Models as Knowledge Bases Petroni et al. (2019) provide an explanation as to why language models perform so well on commonsense reasoning tasks, showing that pre-trained language models such as BERT capture accurate relational knowledge. They explore the idea by building a simple sentence and using a language model to predict a masked token. For example, to generate the relational concept "CapableOf", one might build a sentence such as:

Raven can [MASK].

for which BERT will predict the tokens "fly", "fight", and "kill", in descending order of probability. In fact, the out-of-the-box BERT-large model was found to consistently generate "surprisingly reasonable and syntactically correct" commonsense knowledge, often comparable to crowd-sourced rules present in ConceptNet (Speer et al., 2017). However, since some abstract concepts are uncommon in BERT's training data (largely composed of English Wikipedia articles), and many obvious concepts are often left implicit in text, BERT does struggle on certain examples. One example from (Petroni et al., 2019) might be

A pond is for [MASK].

for which BERT predicts "swimming", whilst ConceptNet proposes "fish". Da and Kasai (2019) also find that fine-tuning BERT with commonsense knowledge embeddings from ConceptNet, WebChild and ATOMIC does improve its performance on commonsense reasoning tasks, which seems to suggest that while language models do encode some commonsense rules, they do not completely cover the examples present in large relational knowledge bases.
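Probes of this kind are straightforward to reproduce. The sketch below queries an off-the-shelf BERT checkpoint through the HuggingFace fill-mask pipeline (our tooling choice, not necessarily that of Petroni et al. (2019)); the exact predictions will vary with the checkpoint used.

```python
from transformers import pipeline

# Masked-token prediction with an out-of-the-box BERT model.
unmasker = pipeline("fill-mask", model="bert-large-uncased")

for prompt in ["Raven can [MASK].", "A pond is for [MASK]."]:
    predictions = unmasker(prompt, top_k=3)
    print(prompt, [p["token_str"] for p in predictions])
# The first prompt tends to yield plausible abilities, while the second
# often misses the ConceptNet answer ("fish"), as discussed above.
```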

Language Models for Commonsense Reasoning Although language models do seem to encode some commonsense knowledge, they exhibit some surprising behaviour on the WSC, which seems to suggest that they are not truly applying commonsense. Firstly, Trichelair et al. (2019) find that while some language models are able to achieve over 90% on associative problems (those with strong statistical hints), they barely perform better than chance on harder, non-associative problems. Secondly, they find that when the two candidate referents are swapped for switchable problems, most language model-based approaches swap their answer accordingly less than half of the time. This poor performance on non-associative examples and generally inconsistent reasoning seems to suggest that, in many cases, the language models learn only basic lexical patterns, and do not apply general commonsense knowledge as well as one might hope.

3.5 Hybrid Approaches

Prakash et al. (2019) describe an approach which combines knowledge hunting and language model-based methods. The knowledge extraction approach is very close to what is described by Sharma et al. (2015). From the results, text snippets with fewer than 30 words are considered, and the 10 snippets most similar to the WSC sentences are kept. Snippets containing 80% or more words from the WSC problem are excluded. This is followed by an entity alignment step, which attempts to align the target pronoun and each of the candidate referents from the WSC problem with an entity in the knowledge text. The entities are said to align if they are arguments of the same verb (or a synonym of that verb) and have the same semantic role (e.g. agent or recipient) with respect to that verb. Consider, for example, the problem

The trophy doesn’t fit into the brown suitcase because it is too large.

and the knowledge text

The CPU fan would not fit in because the fan was too large.

Here, the trophy aligns with the CPU fan, it aligns with the fan, and the brown suitcase does not align with any entity of the knowledge text. Thus the two facts are derived:

aligned_with(trophy, fan)
aligned_with(it, fan)

The reasoning approach then uses probabilistic soft logic (PSL). Logical atoms in PSL have soft truth values in the interval [0, 1]. This makes PSL well-suited to determining the most

probable explanation, given some evidence. In particular, this approach relies on a PSL rule of the form:

w : aligned_with(c, e1) ∧ aligned_with(p, e2) ∧ similar(e1, e2) → coref(p, c)

where w is the weight of the rule, c is a candidate referent, p is the target pronoun, and e1 and e2 are entities (noun phrases) from the knowledge text. The similar predicate represents how similar the two entities in the knowledge text are to each other; its value is determined by BERT. In this example, the truth value of similar(fan, fan) should be 1, since the entities are identical.

Finally, the language model is also used to generate two more grounded coref atoms, corresponding to the two possible answers. The truth value of each grounding is simply the probability of that answer according to the BERT language model. The final conclusion is determined by which grounding of the coref atom has the higher truth value. As such, this approach relies on a weighted combination of the kind of statistical language model-based reasoning introduced by Trinh and Le (2018) and the kind of knowledge-hunting reasoning introduced by Sharma et al. (2015).

Such an approach benefits from the high precision demonstrated by knowledge-hunting approaches, especially for non-associative problems, which are difficult for language models. It also benefits from a recall improvement over pure knowledge-hunting methods, since the language model’s prediction is used to come up with an answer even when no suitable knowledge texts can be found. Overall, it achieves 71% accuracy on the full WSC dataset, representing a significant improvement over knowledge hunting or pre-trained language model-based approaches alone. However, it is important to note that the approach also suffers from problems associated with both of its components: the language model requires a huge volume of textual data and an expensive training process, and the knowledge hunting approach still occasionally invalidates the correct conclusion through its choice of unsuitable knowledge texts. Furthermore, the system’s statistical reasoning approach means that it cannot explain the reasoning behind its answers.
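The effect of the weighted combination can be pictured with a small sketch (our own simplification for illustration, not the authors’ PSL implementation): each candidate receives evidence both from its alignment with the knowledge text and from the language model, and the candidate with the higher combined score is chosen. All weights and scores below are hypothetical.

# Simplified sketch of the weighted evidence combination (not Prakash et al.'s code).
def combined_score(alignment_evidence, lm_probability, w_align=0.6, w_lm=0.4):
    """Soft-logic style score in [0, 1] for one candidate referent (weights assumed)."""
    return w_align * alignment_evidence + w_lm * lm_probability

# Hypothetical evidence for the trophy/suitcase schema: the trophy aligns with an
# entity that is similar to the entity aligned with "it"; the suitcase does not.
candidates = {
    "trophy": combined_score(alignment_evidence=1.0, lm_probability=0.55),
    "suitcase": combined_score(alignment_evidence=0.0, lm_probability=0.45),
}
print(max(candidates, key=candidates.get))  # -> "trophy"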

3.6 Discussion

When evaluating the different approaches to the WSC, we identify three main criteria: how accurate the approach is, what data it relies on, and whether it demonstrates real commonsense understanding. These criteria are discussed in detail below, and a summary is presented in Table 3.1.

Accuracy The most accurate approaches to the WSC tend to be direct reasoning or knowledge hunting approaches, since these approaches tend to focus on a small subset of problems that they are able to answer with near-perfect accuracy. However, clearly a trade-off is being made by these approaches to favour precision over recall, as they often choose not to answer questions for which parsing is difficult or no relevant knowledge can be found.

When considering the full WSC, fine-tuned language models have consistently achieved state-of-the-art accuracies in recent years. Hybrid approaches have also been shown to be capable of addressing the poor recall of pure knowledge hunting approaches. Meanwhile, language models without fine-tuning and other simpler machine learning approaches have significantly trailed behind in terms of accuracy.

Data Approaches that use less training data and are able to generalize quickly should clearly be preferred over those that require tens or hundreds of thousands of examples before they demonstrate any commonsense reasoning capabilities. Of particular interest are entirely unsupervised approaches, or approaches which require particularly few or no labelled training examples. Knowledge hunting approaches have a clear advantage in this respect, often relying on ten or fewer example sentences per problem, which are discovered on-the-fly. Language models without fine-tuning are also attractive since the learning is entirely unsupervised. However, the kinds of language models used in approaches to the WSC generally have been trained on tens of gigabytes of cleaned textual data, in a process that can take several hours on clusters of GPUs. Simpler, feature-based machine learning approaches and fine-tuned language models are even less attractive in this respect, requiring thousands or tens of thousands of labelled, Winograd-like example problems, which are difficult and expensive to source, and mean that these approaches are unlikely to generalize as well to other tasks. Another issue is whether the training data is too similar to the WSC problems, meaning the actual learning required of the models is more limited than one might expect. In particular, fine-tuned language models are likely to see problems very similar to those in the WSC, multiple times, during their training process.

Understanding This leads on naturally to the question of whether the approaches actually demonstrate commonsense reasoning, or are simply learning shallow statistical rules. This can be difficult to determine, especially in the case of language model-based approaches. Although language models have been shown to capture some commonsense knowledge, their poor performance on non-associative problems and often inconsistent answers seems to suggest that their commonsense reasoning capabilities are limited. One way to prove that a system is doing real commonsense reasoning is by asking it to explain its answers. None of the approaches we have seen, except perhaps those using direct logic-based reasoning approaches, would be able to do this.


Approach | Advantages | Limitations
Logical reasoning (Schüller, 2014; Bailey et al., 2015; Sharma, 2019) | + High precision; + Consistent; + Explainable | - Requires manually constructed knowledge; - Poor recall
Knowledge hunting (Sharma et al., 2015; Emami et al., 2018) | + Often high precision; + Uses unlabelled examples, discovered on-the-fly | - Poor recall due to difficulty of finding relevant knowledge; - Inconsistencies from contradictory knowledge texts; - Cannot explain answers
Machine learning (feature-based) (Peng et al., 2015; Liu et al., 2017) | + No expensive pre-training | - Poor accuracy compared to LMs; - Requires labelled training data; - Cannot explain answers
Language models (no fine-tuning) (Trinh and Le, 2018; Radford et al., 2019; Klein and Nabi, 2019) | + Unsupervised approach | - Poor accuracy compared to fine-tuning; - Expensive pre-training; - Answers inconsistently; - Cannot explain answers
Language models (fine-tuning) (Ruan et al., 2019; Kocijan et al., 2019b; Ye et al., 2019; Sakaguchi et al., 2020) | + State-of-the-art accuracy | - Expensive pre-training; - Requires labelled training data; - Answers inconsistently; - Cannot explain answers
Hybrid (knowledge hunting and LM) (Prakash et al., 2019) | + Unsupervised approach | - Expensive pre-training; - Cannot explain answers

Table 3.1: Comparison of selected approaches to the WSC

Chapter 4

Background

4.1 Natural Language Processing

A key requirement of this project is to be able to accurately parse both Winograd schemas and example sentences, in order to construct answer set grammars that are able to represent them. In this section, we provide a brief overview of modern approaches for processing, parsing and representing natural language.

4.1.1 Preprocessing Methods

Before almost any natural language processing task can be performed on a body of text, the raw text must be normalised into a consistent format. Usually this involves at least: tokenisation of individual words and punctuation, normalisation of word formats and segmentation of sentences. More complex tasks, however, might require further pre-processing such as part-of-speech tagging, named entity recognition, and word sense disambiguation.

Tokenisation and Sentence Splitting Tokenisation is the process of splitting text into a sequence of tokens, roughly corresponding to individual words or punctuation. This is mostly a simple process, with the only major hurdle being the need to distinguish punctuation internal to a word from the punctuation determining the sentence structure. For example, abbreviations (“Ph.D”), dates (“01.01.2020”) and numbers (“100,000”) are generally assigned a single token. Meanwhile, clitic contractions, such as “doesn’t”, are often split into two tokens, “does” and “n’t” (Jurafsky and Martin, 2009). A related problem is that of sentence segmentation, which is important to many NLP tasks, such as parsing, which usually operate on single sentences at a time. Traditionally, tokenisation and sentence segmentation have been implemented by deterministic algorithms based on carefully-crafted regular expressions, which are compiled into efficient finite state automata. However, in recent years, neural approaches, often using bidirectional long short-term memory networks (BiLSTMs), have become more common (Qi et al., 2018).

Part-of-Speech Tagging Part-of-speech tags (POS tags) classify words according to their role in a sentence. For example, common POS tags include DT (determiner, e.g. “the”), NN (singular noun, e.g. “worm”) and VBD (verb in the past tense, e.g. “ate”). The role of a POS tagger is to assign each token one of these tags. Modern POS taggers achieve accuracies in the range of 97-98%. State-of-the-art approaches generally take into account context from both sides of the word being tagged, for example by using BiLSTMs (Qi et al., 2018).

Lemmatisation Lemmatisation is the task of identifying the lemma for each word in a text. The lemma is the base or dictionary form for a word. For example, the words “am”, “are” and “is” all share the lemma “be”. The process must take context into account, so for example, the word “saw” may be assigned the lemma “see” or “saw” depending on if it is used in the place of a verb or a noun (Jurafsky and Martin, 2009). Many approaches to lemmatisation use a simple lookup table from word/POS pairs to their lemmatised forms (Qi et al., 2018).

Named Entity Recognition Another related problem is that of named entity recognition (NER), in which the goal is to find the proper names in a text and label each with a type. For example, a named entity might be assigned the type person, place, organisation, etc. NER is often treated as a word-by-word sequence labelling task, in which the tags state the type of each token. As such, modern approaches generally make use of sequence classifiers such as BiLSTMs or transformers (Jurafsky and Martin, 2009).
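A neural pipeline of this kind can be run end-to-end with the Stanza toolkit (the successor to the system of Qi et al., 2018); the sketch below, with an example sentence of our own, performs tokenisation, sentence splitting, POS tagging, lemmatisation and NER in one pass.

# Sketch: a neural preprocessing pipeline using Stanza (requires `pip install stanza`
# and a one-off stanza.download("en")).
import stanza

nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,ner")
doc = nlp("The fish ate the worm. Dr. Jones wasn't surprised.")

for sentence in doc.sentences:
    for word in sentence.words:
        # Surface form, universal POS tag, treebank POS tag and lemma.
        print(word.text, word.upos, word.xpos, word.lemma)

for entity in doc.ents:
    # Named entities with their types, e.g. a person name labelled PERSON.
    print(entity.text, entity.type)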

4.1.2 Syntactic Parsing

4.1.2.1 Constituency Parsing

Constituency parsing is a task that involves assigning a syntactic (phrase) structure to a sentence, by producing a parse tree that groups words together into constituents. Typically, these parse trees serve as an intermediate representation in the process of semantic analysis (Jurafsky and Martin, 2009).

Context Free Grammar Most commonly, constituency parsing builds the kind of structures assigned by a context-free grammar (CFG) (Hopcroft et al., 2007).

Definition 4.1. (Context free grammar) A context free grammar consists of:

• A set of terminal symbols (corresponding to the vocabulary of the grammar),

• A separate set of non-terminal symbols (which express abstractions over the terminal symbols),

• A set of production rules (which express how a given non-terminal symbol can be rewritten as a sequence of terminal and non-terminal symbols), and

(ROOT (S (NP (DT "the") (NN "fish")) (VP (VBD "ate") (NP (DT "the") (NN "worm"))) "."))

Figure 4.1: A constituency parse tree for the sentence “the fish ate the worm.” according to the context free grammar defined in Example 4.1. Terminal nodes are shown by ellipses, non-terminal nodes are shown by rectangles.

• A distinguished non-terminal symbol called the start symbol.

A grammar is said to accept a string, if it is possible, starting from the start symbol, to use the production rules to generate that string; i.e. it is possible to produce a parse tree for the string.

Example 4.1. (Context free grammar) Consider a CFG with the following production rules:

ROOT -> S
S -> NP VP "."
NP -> DT NN
VP -> VBD NP
DT -> "the"
NN -> "fish"
VBD -> "ate"
NN -> "worm"

In this example, the terminal symbols are {the, fish, ate, worm, .}, and the non-terminal symbols are {ROOT, S, NP, VP, DT, VBD, NN}, which represent different kinds of constituents. NP and VP, for example, refer to constituents of the type “noun phrase” and “verb phrase” respectively. The start symbol is ROOT. As an example, this grammar accepts the string “the fish ate the worm”, as demonstrated by the parse tree in Figure 4.1. The process of finding such parse trees given a grammar and a sentence can be done efficiently, for example by using the CKY algorithm (Jurafsky and Martin, 2009).
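The grammar of Example 4.1 can be tried out directly with a chart parser; the sketch below uses NLTK (our choice of toolkit, purely for illustration) and prints the parse tree of Figure 4.1 for the tokenised sentence.

# Sketch: parsing "the fish ate the worm ." with the CFG of Example 4.1 using NLTK.
import nltk

grammar = nltk.CFG.fromstring("""
ROOT -> S
S -> NP VP '.'
NP -> DT NN
VP -> VBD NP
DT -> 'the'
NN -> 'fish' | 'worm'
VBD -> 'ate'
""")

parser = nltk.ChartParser(grammar)
tokens = ["the", "fish", "ate", "the", "worm", "."]
for tree in parser.parse(tokens):
    tree.pretty_print()  # draws the constituency tree of Figure 4.1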

root(ROOT, ate)   nsubj(ate, fish)   dobj(ate, worm)   det(fish, the)   det(worm, the)

Figure 4.2: A dependency parse tree for the sentence “the fish ate the worm.”

4.1.2.2 Dependency Parsing

Dependency grammars are a family of grammar formalisms that instead describe the structure of sentences in terms of just their words and a set of grammatical relations (Jurafsky and Martin, 2009). For example, Figure 4.2 shows the dependency tree for the sentence “the fish ate the worm.” We see that “ate” is the root of the tree, and “the fish” is marked as the verb’s (nominal) subject, while “the worm” is marked as the verb’s (direct) object.

Head Finding Many methods for dependency parsing first perform constituency parsing. The dependency structure can then be determined through a process of head finding (Jurafsky and Martin, 2009). Each syntactic constituent is associated with a lexical head, which is the word in the phrase which is grammatically the most important (Pollard et al., 1994). For example, the head of a verb phrase is the verb, and the head of a noun phrase is the noun. Heads are passed up the tree as shown in Figure 4.3. The remaining words are called dependents. As shown in Figure 4.2, a dependency tree marks these head-dependent pairs (Jurafsky and Martin, 2009). The most common method for identifying the heads in a parse tree involves walking the tree and using a rule-based approach to determine the head at each non-terminal node (Jurafsky and Martin, 2009).
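In practice the head-dependent pairs of Figure 4.2 can also be read directly off a neural dependency parse; the snippet below (again using Stanza, our choice of toolkit) prints each word with its relation and head.

# Sketch: reading head-dependent pairs from a dependency parse with Stanza.
import stanza

nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")
doc = nlp("The fish ate the worm.")

for sentence in doc.sentences:
    for word in sentence.words:
        # word.head is the 1-based index of the head word (0 denotes the root).
        head = sentence.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"{word.text:<5} --{word.deprel}--> {head}")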

4.1.2.3 HPSG Parsing

Context free grammars suffer from several limitations when used to parse phrase structure. Many of these limitations arise from a lack of lexical information (information to do with individual words). Lexical features include, for example, agreement (e.g. number and gender) and subcategorisation (the number and type of arguments a verb may take) (Jurafsky and Martin, 2009).

(ROOT (S(ate) (NP(fish) (DT(the) "the") (NN(fish) "fish")) (VP(ate) (VBD(ate) "ate") (NP(worm) (DT(the) "the") (NN(worm) "worm"))) "."))

Figure 4.3: A constituency parse tree for the sentence “the fish ate the worm.” including the head (shown in brackets) at each node. The heads are passed up the tree, as demonstrated by the bold arrows.

For example, “eats” is a transitive verb which usually has one argument (“The fish eats the worm”), while “gives” is a ditransitive verb which usually takes two arguments (“Jane gives Joan candy”). Lexicalised grammars rely more heavily on this kind of lexical information. A popular class of lexicalised grammars are categorial grammars, such as combinatory categorial grammar (CCG), which assigns tokens functional categories and defines rules for application, composition and type raising (Jurafsky and Martin, 2009). We will instead focus on a separate class of constraint-based grammar: head-driven phrase structure grammar.

Head-Driven Phrase Structure Grammar Head-driven phrase structure grammar (HPSG) (Pollard et al., 1994) is a highly-lexicalised, constraint-based grammar. HPSG has several important properties which strongly influence our own approach to representing natural language using answer set grammars:

1. Based on typed feature structures. HPSG replaces the atomic categories of a CFG with more complex data structures called feature structures. Each feature structure is a simple set of attribute-value pairs, where the value may be either atomic, or another feature structure (Muller and Sag, 2007). For example, a (simplified) feature structure of

type word for the verb “eats” might look like:

[word
  PHON    ⟨eats⟩
  HEAD    [verb
            VFORM fin
            AGR   [NUM sg, PER 3]]
  SUBCAT  ⟨NP[nom], NP[acc]⟩]

Here, the HEAD feature structure encodes the syntactic category of the word: it is a finite verb (VFORM fin) in its third-person (PER 3) singular (NUM sg) form. The subcategorisation (SUBCAT) value tells us that eats is a transitive verb which must be assigned two arguments, which are a nominative (subject) and an accusative (direct object) noun phrase.

2. Highly lexicalised. The grammar consists of a large and rich lexicon: each word is associated with a complex feature structure, such as the entry for “eats” shown above, which encodes phonological and syntactic information (such as the POS, agreement and subcategorisation), and often the word’s semantics. Words have a different representation compared to phrases, which are made up of a HEAD-DTR (the head of the phrase) and NON-HEAD-DTRS (a list of the dependents) (Muller and Sag, 2007).

3. Constraint-based. HPSG replaces the generative rules of CFGs with a set of grammatical constraints which must be satisfied. For example, a basic grammar rule might look like:

[head-complement-structure
  SUBCAT         [1]
  HEAD-DTR       [sign
                   SUBCAT [1] ⊕ [2]]
  NON-HEAD-DTRS  [2]]

where ⊕ represents concatenation, and values marked by the same boxed number must be identical (Muller and Sag, 2007). In short, this schema encodes “the using up of arguments”. A possible instantiation could look like:

VP[SUBCAT ⟨⟩] → NP[nom]  VP[SUBCAT ⟨NP[nom]⟩]

Here, the NP is a non-head daughter and the VP is the head daughter. Since the category of the NP is the same as the last entry in the VP’s subcategorisation value, as per the grammar rule, we can form a phrase from these two constituents with a subcategorisation value that no longer requires the NP argument.

4. Head-driven. The constituent structure of HPSG follows the head feature principle (HFP):

[headed-structure
  HEAD            [1]
  HEAD-DTR | HEAD [1]]

(ROOT⟨HEAD ate⟩ (NP⟨HEAD fish⟩ (DT "the") (NN "fish")) (VP⟨HEAD ate⟩ (VBD "ate") (NP⟨HEAD worm⟩ (DT "the") (NN "worm"))) (. "."))

Figure 4.4: A joint span structure for the sentence “the fish ate the worm.”. It encodes both the constituency (filled lines) and dependency (dashed lines) information.

As per Pollard et al. (1994), this enforces that “the value of any headed phrase is structure-shared with the HEAD of the head daughter. The effect of HFP is to guarantee that headed phrases are really projections of their head daughter.”

Note that the final point means that HPSG encodes both constituency (phrasal) and dependency structure (Zhou and Zhao, 2019). The former is useful for specifying and classifying sub-phrases within a text, and can easily be represented using a context free grammar, while the latter more transparently encodes predicate-argument structure and thus is much closer to representing the semantics of a text. HPSG parsing is difficult in practice, as it is highly sensitive to the accuracy of the supertagger (which assigns lexical entries to each word): just one or two errors will often cause a parse failure. Thus most approaches use a CFG to approximate the HPSG (Matsuzaki et al., 2007), or use a heavily simplified representation (Zhou and Zhao, 2019) which ignores a lot of lexical information and focuses mainly on the core of HPSG: the heads (we take the latter approach). An example of such a simplified structure, which focuses solely on the lexical head of each constituent, is shown in Figure 4.4.
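A heavily simplified version of such head finding can be sketched as a recursive walk over a constituency tree; the head table below is a toy assumption covering only the constituents of Figure 4.3, not a full set of head rules.

# Toy head-finding sketch over NLTK-style trees (the head table is an assumption).
import nltk

# For each phrase label, the preferred head-child labels in priority order.
HEAD_RULES = {"ROOT": ["S"], "S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNS"]}

def find_head(tree):
    """Return the lexical head word of a constituency (sub)tree."""
    if isinstance(tree, str):                    # a token is its own head
        return tree
    for preferred in HEAD_RULES.get(tree.label(), []):
        for child in tree:
            if not isinstance(child, str) and child.label() == preferred:
                return find_head(child)
    return find_head(tree[0])                    # fall back to the leftmost child

tree = nltk.Tree.fromstring(
    "(ROOT (S (NP (DT the) (NN fish)) (VP (VBD ate) (NP (DT the) (NN worm))) (. .)))")
print(find_head(tree))                           # -> "ate", as in Figure 4.3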

4.1.3 Semantic Parsing

Semantic parsing is the process of assigning meaning representations (traditionally in the form of first-order logic formulae) to linguistic inputs (Jurafsky and Martin, 2009).

4.1.3.1 Semantic Representation

The first challenge of semantic parsing is finding a representation that reflects the meanings of texts in a transparent way.

Semantic Content As noted by Abend and Rappoport (2017), good semantic representations must be able to encode a large variety of semantic content. This content includes, but is not necessarily limited to:

1. Events (sometimes called frames, propositions or scenes), which include a predicate (main relation, frame-evoking element), arguments (participants, core elements) and secondary relations (modifiers, non-core elements) (Abend and Rappoport, 2017).

2. Predicates, which are the main determinants of what events are about. Most predicates are verbal, but some schemes also have nominal and adjectival predicates, e.g. by defining relations based on the noun “president”. Several verbs may be assigned to the same predicate; for example, FrameNet assigns the same ingestion predicate to both of the verbs “eat” and “consume” (Ruppenhofer et al., 2006).

3. Arguments, which are usually distinguished between core and non-core. Core arguments are conceptually necessary components of an event, while non-core arguments introduce additional relations such as “time, place, manner, means and degree” (Ruppenhofer et al., 2006).

4. Semantic roles, which are categories of arguments. Some approaches use different semantic roles for different predicates. For example, FrameNet uses the semantic roles ingestor and ingestible for the ingestion predicate (Ruppenhofer et al., 2006). Other approaches define a closed set of abstract semantic roles, such as agent and patient (Kipper-Schuler, 2005).

5. Discourse relations, which mark the relations between events. Some examples include succession (“after”), reason (“since”), juxtaposition (“while”) and conjunction (“and”) (Prasad et al., 2008). Determining discourse relations is not always straightforward; for example, “since” may be used in either a temporal (“I haven’t eaten since yesterday”) or a causal (“I’m hungry since I haven’t eaten”) sense (or both). Proper understanding of these relations is vital for correct temporal, causal and spatial reasoning.

6. Logical structure, which handles issues such as quantification and negation (Abend and Rappoport, 2017). For example, a sentence such as “fish eat” implies universal quantification, ∀f.fish(f) → eats(f), while “the fish eats” implies existential quantification, ∃f.fish(f) ∧ eats(f).

Event-Based Semantics Most modern schemes for semantic representation of text are based on event semantics. These schemes are distinguished by the notion that predicates take an implicit variable over events as an argument (Lasersohn, 2012). For example, “the fish eats the worm” might be represented as

∃e.eats(e, fish, worm)     (4.1)

instead of the more traditional representation

eats(fish, worm)

The formulation presented in (4.1) is usually referred to as a Davidsonian representation, after Davidson (1967). The motivation for representing sentences in this way is that modifiers may be treated simply as additional conjuncts: e.g. “the fish eats the worm at midnight” and “the fish eats the worm off the seabed” may be expressed as

∃e.eats(e, fish, worm) ∧ at(e, midnight)     (4.2)

and

∃e.eats(e, fish, worm) ∧ off(e, seabed)     (4.3)

respectively. This representation has multiple benefits: firstly, it avoids the ambiguity that arises from attempting to express (4.2) and (4.3) as eats(e, fish, worm, midnight) and eats(e, fish, worm, seabed) respectively — when clearly midnight and seabed are very different kinds of participants; secondly, it handles an arbitrary number of adjuncts in a straightforward manner; and finally, it naturally captures that (4.2) and (4.3) both entail (4.1).

The neo-Davidsonian approach (Parsons, 1990) further extends this idea by expressing the core arguments of the verb in a similar way: “the fish eats the worm” becomes

∃e.eats(e) ∧ agent(e, fish) ∧ patient(e, worm)     (4.4)

There are two things to note here. The first is that the sentence “the fish eats” is now represented as ∃e.eats(e) ∧ agent(e, fish) instead of ∃ex.eats(e, fish, x). The latter implies that there is something that is eaten, whilst the former does not. The implication happens to be valid here, but this is not always the case. The second thing to note is that the predicates agent and patient in (4.4) express thematic roles, abstracting away from grammatical roles such as the subject or object of the sentence. Thus the passive construction “the worm is eaten by the fish” should share the representation of (4.4) — as we will see, this is a particularly useful quality of the approach.

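The entailment property can be made concrete by viewing each formula as a set of conjuncts over a shared event variable; the sketch below (our own illustration, which deliberately ignores quantification) then checks the entailments between (4.1), (4.2) and (4.3) as simple set inclusion.

# Sketch: Davidsonian formulas as sets of conjuncts sharing the event variable e.
formula_4_1 = {"eats(e, fish, worm)"}
formula_4_2 = {"eats(e, fish, worm)", "at(e, midnight)"}
formula_4_3 = {"eats(e, fish, worm)", "off(e, seabed)"}

def entails(premise, conclusion):
    # Dropping conjuncts (for a fixed event) preserves truth, so entailment
    # between these conjunctions reduces to set inclusion.
    return conclusion <= premise

print(entails(formula_4_2, formula_4_1))  # True: (4.2) entails (4.1)
print(entails(formula_4_3, formula_4_1))  # True: (4.3) entails (4.1)
print(entails(formula_4_1, formula_4_2))  # False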
Abstract Meaning Representation In recent years, many concrete schemes for semantic representation have been proposed for use in broad-coverage semantic parsers. One of the most widely adopted of these schemes is the Abstract Meaning Representation (AMR) (Banarescu et al., 2013), which implements a neo-Davidsonian semantics. AMRs are directed acyclic rooted graphs, specifying the main events of a sentence using PropBank frames, and their arguments. Core arguments are assigned values 0 through 5, corresponding to agents, patients, instruments, starting points, ending points and modifiers respectively. Additionally, a number of general semantic relations are provided, such as age, beneficiary, cause, compared to, degree, example, etc.

Example 4.2. (Abstract meaning representation) The AMR for “The boy wants to go.” is shown in Figure 4.5.

Notably, AMR does not depend on syntactic structure, and in fact many approaches to generating AMRs do not make use of any syntactic features. The authors note that “he described her as a genius”, “his description of her: genius” and “she was a genius, according to his description” should all be assigned the same AMR.

(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))

Figure 4.5: AMR for “The boy wants to go.”

4.1.3.2 Lexical Semantics

Any compositional approach to determining the semantics of text must first address the issue of lexical semantics: how to represent and determine the semantics of individual words.

WordNet By far the most widely known and used resource for lexical information is WordNet (Miller, 1995). WordNet is a large lexical database of English. Its key contribution is to group English nouns, verbs, adjectives and adverbs into synsets, each of which expresses a distinct concept. It is important to note that the relationship between words and synsets is a many-to-many one: clearly one synset may contain multiple words, but the reverse is also true: one word can have multiple senses — this property is called polysemy. For example, the single synset comfort.v.01 (“give moral or emotional strength to”) consists of the lemmas comfort, soothe, console and solace, but the lemma comfort can also be assigned the sense comfort.v.02 (“lessen pain or discomfort; alleviate”). In addition to assigning synsets, WordNet describes a small number of “conceptual relations” between synsets (a brief illustration of querying these relations follows the list below). These include:

1. Hypernymy and hyponymy. All verbs and nouns in WordNet are arranged in a hierarchy based on generality. A hyponym is a word whose semantic field is included in that of another word (i.e. it is more specific), whilst a hypernym is the reverse (more general). As an example, a hyponym of eat (consume food) might be devour (eat greedily), whilst the verb consume would be an example of a hypernym.

2. Antonymy. Adjectives are organised very differently to verbs and nouns. Instead of a hierarchical relationship, they are organised solely based on antonymy. Similar adjectives can be determined only by finding antonyms of antonyms, but we note that this process can result in many spurious matches. For example, using this method, the adjective tasty is deemed to be similar to acid-tasting!

3. Entailment. For verbs, WordNet also defines an entailment (implication) relationship: for example the verb buy entails pay.

4. Meronymy. For nouns, WordNet also defines a meronymy (is part of) relationship: for example handle is a meronym of suitcase.

5. Cross-POS relations. Finally, WordNet defines several cross-POS relations, mainly consisting of morphosemantic links that hold among semantically similar words sharing a stem with the same meaning (for example between the noun eater and the verb eat), although we find these are rarely useful since the relationship between these words is not usually specified.

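As promised above, the sketch below shows how these relations can be queried through the NLTK interface to WordNet (our choice of interface, shown for illustration only).

# Sketch: querying WordNet through NLTK (requires nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

# Polysemy: one lemma may belong to several synsets.
for synset in wn.synsets("comfort", pos=wn.VERB):
    print(synset.name(), "-", synset.definition())

# Hypernymy: more general synsets of a given verb sense.
print(wn.synset("eat.v.01").hypernyms())

# Antonymy is defined on lemmas rather than on synsets.
print(wn.synset("good.a.01").lemmas()[0].antonyms())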
Word Sense Disambiguation The issue of polysemy (one word being able to take multiple senses) gives rise to the problem of word sense disambiguation (WSD): how to determine a word’s sense (usually a WordNet synset) given its context (Jurafsky and Martin, 2009). There are various approaches to WSD, but current state-of-the-art approaches use machine learning techniques to fine-tune pre-trained language models. WSD is, however, a difficult problem for machine learning approaches, particularly as it is extremely difficult to find enough labelled training data to cover each of WordNet’s 117 000 synsets sufficiently. Indeed, even the current state-of-the-art models, which use tricks such as sense compression to partly alleviate this issue (Vial et al., 2019), achieve an accuracy of 77.8% on the SensEval 3 dataset. Meanwhile, simple heuristics alone, such as choosing the most common sense, achieve 66% accuracy on the same dataset.
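The most-common-sense heuristic is trivial to implement, because WordNet lists the senses of a word in approximate frequency order; the sketch below (our own illustration) simply returns the first-listed sense for a lemma and POS.

# Sketch: the first-sense (most frequent sense) WSD baseline via NLTK's WordNet.
from nltk.corpus import wordnet as wn

def first_sense(lemma, pos):
    """Return the first-listed (roughly most frequent) synset, or None."""
    synsets = wn.synsets(lemma, pos=pos)
    return synsets[0] if synsets else None

print(first_sense("saw", wn.VERB))  # first-listed verb sense for "saw"
print(first_sense("saw", wn.NOUN))  # first-listed noun sense for "saw"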

4.1.3.3 Compositional Semantics

Compositional approaches to semantics are approaches which conform to the principle of compositionality, which states that the meaning of a phrase can be derived through some combination of the meanings of its sub-phrases (and thus eventually its individual words). This implies a close relationship between semantics and syntactic structure.

Semantic Parsing with CCG A common approach to compositional semantic parsing is to use CCGs (Bos, 2008; Chabierski et al., 2017; Reddy, 2017). In CCG, each word is assigned a category, which defines the expected types and positions of its inputs. For example, the determiner “the” might be assigned the category NP/N, meaning that when it precedes a noun, it forms a noun phrase. A set of combinatory rules can then be used to parse a sentence. For example, forward application (>) can be used to combine two constituents of type X/Y and Y to produce a single constituent of type X. An example of a full derivation is provided below:

Fish ⊢ NP        eat ⊢ (S\NP)/NP        worms ⊢ NP
eat worms ⇒ S\NP                        (forward application, >)
Fish [eat worms] ⇒ S                    (backward application, <)

The main benefit of using CCG for semantic parsing is that it provides a transparent and well-studied syntax-semantics interface: it is possible to obtain the semantic properties of a phrase from its syntactic properties. To demonstrate this, let us return to our example:

Fish ⊢ λx.fish(x)        eat ⊢ λfge.∃xy.eat(e, x, y) ∧ f(y) ∧ g(x)        worms ⊢ λx.worms(x)
eat worms ⇒ λge.∃xy.eat(e, x, y) ∧ worms(y) ∧ g(x)                        (>)
Fish [eat worms] ⇒ λe.∃xy.eat(e, x, y) ∧ worms(y) ∧ fish(x)               (<)

We note that lexical semantics, in terms of lambda calculus expressions, may be determined from a word’s category. For example, a word with an atomic category such as Fish (NP) is assigned a function with a single argument, λx.fish(x). Combinatory rules are treated similarly: for example, the forward application rule corresponds to functional application in lambda calculus (Reddy, 2017):

X/Y : f    Y : g    ⇒>    X : f(g)

In our example, eat and worms are combined as shown to produce

(λfge.∃xy.eat(e, x, y) ∧ f(y) ∧ g(x)) (λx.worms(x))

→β λge.∃xy.eat(e, x, y) ∧ worms(y) ∧ g(x)

The main drawback of using CCG for semantic parsing is the difficulty of accurate supertagging (the automated assignment of lexical categories), and its inability to cope with supertagging errors and ungrammaticality, which may make it impossible to find a valid derivation for a sentence (Reddy, 2017).

Semantic Parsing with HPSG An alternative grammar-based approach to semantic parsing is to use HPSG. HPSG has an integrated view of grammar, which expresses phonology, syntax and semantics in the same structure: feature structures are simply extended to encode this additional information (Muller and Sag, 2007). As before, this means that we rely heavily on the lexicon. Returning to our lexical entry for eats, we might encode the semantic content (CONT) as follows:

[word
  CAT  [HEAD    [verb
                  VFORM fin
                  AGR   [NUM sg, PER 3]]
        SUBCAT  ⟨NP[nom] [1], NP[acc] [2]⟩]
  CONT [eat
         AGENT   [1]
         PATIENT [2]]]

expressing that the nominative (subject) argument is the AGENT (eater) and the accusative (direct object) argument is the PATIENT (eaten) of the verb. Compositionality of semantics is encoded by extending the grammatical constraints in a similar fashion. There are two key principles that guide HPSG’s treatment of semantics (Sag and Wasow, 1999), both of which heavily influence our own treatment of semantics. Informally, these are:

1. Semantic inheritance. In headed phrases, the semantics are based on the head daughter’s semantics.

2. Semantic compositionality. The context (entities and situation) described by a phrase corresponds to the sum of all the contexts described by all of its daughters.

Semantic Parsing from Dependency Structures Finally, let us discuss the problem of determining semantics from dependency structures alone. The closeness between dependency structures and semantics is well-known and has motivated the use of dependency parses as a substitute for full semantic parses in a variety of upstream tasks, such as co-reference resolution and question answering. However, it is only recently that there has been any work on developing a semantics interface for dependency syntax. Reddy et al. (2016) propose a semantic parsing approach that uses lambda calculus to derive logical forms from dependency trees. The approach differs from CCG-based semantic parsers in two main ways: firstly, all constituents have the same type; and secondly, there is no distinction between core and non-core arguments. It consists of three main stages (a small illustrative sketch of the composition follows the list below):

1. Binarisation. The dependency structure is mapped to a binary structure which indicates the order of semantic composition. For example, the sentence “the fish ate the worm” would be assigned the structure “(nsubj (dobj ate (det worm the)) (det fish the))”.

2. Substitution. Each node in the binary structure is substituted with a lambda expression encoding its semantics. For example, the word “worm” would be assigned the expression λx.worm(xa) and “ate” would be assigned λx.ate(xe). The subscripts ·a and ·e denote the types Ind and Event respectively, whereas x denotes a paired variable (xa, xe). To simplify the type system, every constituent is assigned a type of Ind × Event → Bool. Expressions for dependency labels tend to take one of four forms:

(a) Copy. λfgx.∃y.f(x) ∧ g(y) ∧ rel(x, y). This kind of expression assigns a relation between two child nodes. For example, the dobj label in “(dobj ate (det worm the))” will result in a copy expression of the form λfgx.∃y.f(x) ∧ g(y) ∧ arg2(xe, ya).

(b) Invert. λfgx.∃y.f(x) ∧ g(y) ∧ reli(y, x). This expression is used when the standard dependency direction is reversed — e.g. for the amod label in “(amod horse running)”, where the event follows the individual.

(c) Merge. λfgx.f(x) ∧ g(x). This expression is used to merge two sub-expressions without introducing any relations — e.g. for the amod label in “(amod horse beautiful)”, where there is no event.

(d) Head. λfgx.f(x). This expression simply returns the semantics of the parent expression. For example, it would be used to replace a punctuation label, “punct”.

3. Composition. Finally, the logical form is computed by beta reduction. For example, “(dobj ate (det worm the))” would result in λx.∃z.ate(xe) ∧ worm(za) ∧ arg2(xe, za). This process continues up the tree, resulting in a final logical form of λx.∃yz.ate(xe) ∧ fish(ya) ∧ worm(za) ∧ arg1(xe, ya) ∧ arg2(xe, za) for the sentence “The fish ate the worm.”.
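The sketch below is a drastically simplified illustration of this substitution-and-composition process (our own code, not that of Reddy et al.): it ignores the type system and explicit quantification, treating each sub-expression as a function from a variable name to a set of atoms, and reproduces the logical form above for “the fish ate the worm”.

# Simplified sketch of substitution and composition over a binarised dependency tree.
import itertools

_counter = itertools.count(1)
def fresh():
    return f"y{next(_counter)}"

def word(pred, typ):            # substitution: a word denotes λx. pred(x_typ)
    return lambda x: {f"{pred}({x}{typ})"}

def copy(rel):                  # copy: λf g x. ∃y. f(x) ∧ g(y) ∧ rel(x_e, y_a)
    def apply_label(f, g):
        def expr(x):
            y = fresh()
            return f(x) | g(y) | {f"{rel}({x}e, {y}a)"}
        return expr
    return apply_label

def head(f, g):                 # head: λf g x. f(x)   (used here for "det")
    return lambda x: f(x)

# "(nsubj (dobj ate (det worm the)) (det fish the))"
ate, worm, fish, the = word("ate", "e"), word("worm", "a"), word("fish", "a"), word("the", "a")
sentence = copy("arg1")(copy("arg2")(ate, head(worm, the)), head(fish, the))

print(sorted(sentence("x")))
# ['arg1(xe, y2a)', 'arg2(xe, y1a)', 'ate(xe)', 'fish(y2a)', 'worm(y1a)']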

Such an approach benefits from the pace of improvement in dependency parsing, as well as the simplicity of having nodes of a single type. In particular, the approach proposed by Reddy et al. (2016) was found to be extremely robust, with an 80% reduction in parser errors on the WebQuestions dataset, compared to a CCG-based approach.

4.2 Answer Set Programming

Answer set programming (ASP) is a form of declarative programming, primarily oriented towards difficult (NP-hard) search problems (Lifschitz, 2008). ASP programs consist of Prolog-like rules, but their semantics are based on the stable model semantics of logic programming (Gelfond and Lifschitz, 1988).

4.2.1 Answer Set Programs

4.2.1.1 Syntax of Answer Set Programs

The syntax of ASP is very similar to that of Prolog, with a few small modifications. Specifically, an ASP program consists of a set of rules:

Definition 4.2. (ASP rule) An ASP rule r has the form (Gebser et al., 2012):

H ← B1,...,Bm, not Bm+1,..., not Bn.

We call head(r) = H the head of the rule. The head may be empty (⊥), an atom a, or an aggregate of the form lb{l1, . . . , lk}ub, where each li is a literal (an atom, or the negation by failure of an atom), and lb and ub are integers.¹ If the head is empty, the rule is called a constraint. If the head is an aggregate, the rule is called a choice rule. We call body(r) = {B1, . . . , Bm, not Bm+1, . . . , not Bn} the body of the rule. Each body component Bi is either an atom or an aggregate. Also note that the set of body components may be empty; in this case r is called a fact.

4.2.1.2 The Stable Model Semantics

We now provide the definition of a stable model (or answer set), which provides a semantics for grounded ASP programs (ASP programs without variables) (Gelfond and Lifschitz, 1988). The first step to determining the stable models or answer sets of a program with variables is to eliminate the variables through a process of relevant grounding. This is a simple (although possibly infinite) process which finds all possibly useful groundings of the rules in a program. In order to define the stable models for a ground program, we first explain the concept of a (minimal) Herbrand model (again, using definitions based on those in (Gebser et al., 2012)):

Definition 4.3. (Herbrand base) The Herbrand base A of a program P is the set of all ground atoms which can be made from predicates, functions and constants appearing in P.

¹ In reality, the head may be a disjunction over atoms, and aggregates may be more complex than what we present here. We omit these other possibilities for brevity.

Definition 4.4. (Herbrand interpretation) A Herbrand interpretation X of a program P assigns a truth value to every ground atom in the Herbrand base of P. Usually it is expressed as a set X ⊆ A of atoms specified to be true.

Definition 4.5. (Satisfaction) A rule r is satisfied by a Herbrand interpretation X (written X ⊨ r) if X ⊨ head(r) or X ⊭ body(r). The ⊨ relation is defined as follows:

1. X ⊨ a if a ∈ X.

2. X ⊨ not B if X ⊭ B (negation as failure).

3. X ⊨ lb {l1, . . . , lk} ub if lb ≤ |{li | 1 ≤ i ≤ k, X ⊨ li}| ≤ ub.

4. X ⊨ body(r) if X ⊨ li for all li ∈ body(r).

Definition 4.6. (Herbrand model) An interpretation X is a Herbrand model of P if X ⊨ r for every rule r ∈ P. A Herbrand model M of a program P is minimal if there is no model M′ of P such that M′ ⊂ M. Note that if P is a definite logic program (contains no negation as failure), then there exists a unique minimal model, called the least Herbrand model, M(P).

A stable model of a program is then defined as a minimal model of its own reduct:

Definition 4.7. (Reduct) The reduct of a ground ASP program P with respect to an interpretation X, written P^X, is constructed from every rule r ∈ P whose body is satisfied by X, by:

1. Removing any negation as failure of body conditions whose atoms are not in X.

2. Removing any remaining rules which still have negation as failure conditions.

3. Replacing the rule’s head, if it is an aggregate, with individual atoms belonging to X.

Note that P^X is a definite logic program, and so has a least Herbrand model.

Definition 4.8. (Stable model) X is a stable model of P if M(P^X) = X (Gelfond and Lifschitz, 1988).

We will use the terms stable model and answer set interchangeably. Answer sets define the semantics of ASP programs. Specifically, an answer set program P is said to be satisfiable if it has at least one stable model, and unsatisfiable otherwise. Since a single answer set program may have multiple answer sets, there are two different kinds of semantics for whether an atom is entailed by a given program:

Definition 4.9. (Brave and cautious entailment) A program P bravely entails an atom a (written P ⊨_b a) if a is true in at least one answer set of P. A program P cautiously entails an atom a (written P ⊨_c a) if a is true in every answer set of P.

Example 4.3. (Answer sets and entailment) Consider the program:

P = {  a ← not b.
       b ← not a.
       c.  }

This program has two answer sets: {a, c} and {b, c}. This can be verified by considering the reduct of P with respect to each answer set. For example,

M(P^{a,c}) = M({ a.  c. }) = {a, c}

Since a is true in at least one answer set, P ⊨_b a, but since it is not true in every answer set, P ⊭_c a. c is true in every answer set and so P ⊨_b c and P ⊨_c c.
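The answer sets of Example 4.3, and the brave and cautious status of each atom, can be checked mechanically; the sketch below does so with the clingo Python API (an implementation choice of ours, purely for illustration).

# Sketch: enumerating the answer sets of Example 4.3 with the clingo Python API.
import clingo

program = """
a :- not b.
b :- not a.
c.
"""

control = clingo.Control(["0"])          # "0" asks for all answer sets
control.add("base", [], program)
control.ground([("base", [])])

answer_sets = []
def on_model(model):
    answer_sets.append({str(symbol) for symbol in model.symbols(shown=True)})
control.solve(on_model=on_model)

print(answer_sets)                       # [{'a', 'c'}, {'b', 'c'}] in some order
for atom in ["a", "b", "c"]:
    brave = any(atom in model for model in answer_sets)
    cautious = all(atom in model for model in answer_sets)
    print(atom, "brave:", brave, "cautious:", cautious)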

4.2.2 Answer Set Grammars

Law et al. (2019) introduce answer set grammars (ASGs), which are a class of context-sensitive grammars. These grammars are formed from context-free production rules with ASP annotations. These ASP annotations are able to express context-sensitive constraints, by means of annotated atoms which refer to atoms in a specific child of a node in the parse tree. Formally, an ASG can be defined as follows:

Definition 4.10. (Answer set grammar) An answer set grammar (Law et al., 2019) G is a tuple (GN, GT, GPR, GS) where:

• GN is a set of non-terminal nodes, corresponding to symbols that are replaced by terminal symbols according to the production rules.

• GT is a set of terminal nodes, disjoint from GN , corresponding to symbols from the grammar’s alphabet.

• GPR is a set of annotated production rules of the form n0 → n1 . . . nk P where n0 is a non-terminal node, n1 through nk are either non-terminal or terminal nodes, and P is an annotated ASP program, with integer annotations in the range [1, k].

• GS is a distinguished non-terminal node called the start node.

To explain the concept of annotated atoms, consider the following example from (Law et al., 2019):

Example 4.4. (Answer set grammar) The context-sensitive language a^n b^n c^n can be defined by the ASG:

start -> as bs cs {
    :- size(X)@1, not size(X)@2.
    :- size(X)@1, not size(X)@3.
}
as -> "a" as { size(X+1) :- size(X)@2. }
as -> { size(0). }
bs -> "b" bs { size(X+1) :- size(X)@2. }
bs -> { size(0). }
cs -> "c" cs { size(X+1) :- size(X)@2. }
cs -> { size(0). }

In this example, each of the non-terminal symbols as, bs and cs can be produced by two rules. One rule produces the non-terminal symbol from an empty string, producing a node with just the fact size(0). The other rule is more complex: size(X)@2 means that the size of the string represented by the second child is X, so this rule adds a terminal symbol “a”, “b” or “c” to the front of the string represented by the second child, and increments its size by 1. The start node’s ASP program encodes two constraints, namely that the number of a’s (the size of the first child) must not be different to the number of b’s (the size of the second child), and that the number of a’s (the size of the first child) must not be different to the number of c’s (the size of the third child). A string is accepted by the grammar as long as it produces a parse tree that satisfies these two constraints. As such, this grammar accepts strings of the form a^n b^n c^n. A graphical representation of the accepted string “abc” is shown in Figure 4.6.

In general, an ASG accepts a string if that string produces a parse tree with respect to the grammar whose annotated ASP programs are satisfiable. Formally:

Definition 4.11. (Acceptance) A string of terminal nodes s is accepted by an answer set grammar (Law et al., 2019) G if there exists a parse tree T of G for s such that G[T] is satisfiable, where

• G[T ] is the program {rule(n)@trace(n)|n ∈ T }.

• rule(n) is the annotated production rule of node n in the tree T .

• trace(n) is the parse trace of node n in the tree T . The trace of the root is the empty list [], the ith child of the root is [i], the jth child of the ith child of the root is [i, j] and so on.

• For a production rule r = n0 → n1 . . . nk P and trace t, r@t is constructed by:

– Replacing all annotated atoms a@i in P with the atom a@(t ++ [i]), and
– Replacing all unannotated atoms a in P with the atom a@t.

Example 4.5. (Acceptance) Let G be the grammar shown in Example 4.4 and T be the parse tree shown in Figure 4.6.

start  [program: :- size(X)@1, not size(X)@2.  :- size(X)@1, not size(X)@3.]  [holds: size(1)@1, size(1)@2, size(1)@3]
  1: as  [program: size(X+1) :- size(X)@2.]  [holds: size(1)]  — produces "a" and a child as
         as  [program: size(0).]  [holds: size(0)]  — produces ""
  2: bs  [program: size(X+1) :- size(X)@2.]  [holds: size(1)]  — produces "b" and a child bs
         bs  [program: size(0).]  [holds: size(0)]  — produces ""
  3: cs  [program: size(X+1) :- size(X)@2.]  [holds: size(1)]  — produces "c" and a child cs
         cs  [program: size(0).]  [holds: size(0)]  — produces ""

Figure 4.6: A graphical representation of the parse tree for the input “abc” using the grammar defined in Example 4.4. Non-terminal nodes are shown as tables; terminal nodes are shown as ellipses. For each non-terminal node, the first row gives the non-terminal symbol produced by the node, the second row gives the annotated ASP program at the node, and the third row gives the atoms which hold at the node.

Then, by translating each non-terminal node in T according to Definition 4.11, we have:

G[T] = {  ← size(X)@[1], not size(X)@[2].
          ← size(X)@[1], not size(X)@[3].
          size(X+1)@[1] ← size(X)@[1, 2].
          size(X+1)@[2] ← size(X)@[2, 2].
          size(X+1)@[3] ← size(X)@[3, 2].
          size(0)@[1, 2].
          size(0)@[2, 2].
          size(0)@[3, 2].  }

Now we note that, by treating each annotated atom as a distinct atom, the program G[T] has the single answer set:

AS(G[T]) = { size(0)@[1, 2], size(0)@[2, 2], size(0)@[3, 2], size(1)@[1], size(1)@[2], size(1)@[3] }

Hence G[T] is satisfiable, and so the parse tree T is accepted by G.

4.3 Inductive Logic Programming

Inductive logic programming (ILP) is a subfield of machine learning which uses first-order logic to represent hypotheses and data. In particular, it aims to find a hypothesis (a set of rules) which covers a set of positive examples and does not cover any negative examples, taking into account background knowledge (Muggleton, 1991).

Formally, given a language of possible hypotheses LH, a background theory B, a covers relation specifying when an example is covered by a hypothesis (often taking into account the background knowledge), and a set of positive and negative examples E = E+ ∪ E−, the goal is to find a hypothesis H ∈ LH such that for all e ∈ E+, covers(B, H, e) = true and for all e ∈ E−, covers(B, H, e) = false (Raedt, 2017). This condition that all examples are correctly classified is sometimes weakened in order to handle noisy examples. The most common setting for ILP is learning from entailment, where examples are ground facts, the covers relationship is defined such that covers(B, H, e) = true if and only if B ∪ H ⊨ e, and typically the hypotheses are definite clauses (Raedt, 2017). However, in this section we will focus on a more modern paradigm, called learning from (partial) answer sets (Law et al., 2014), which is an approach that integrates notions of brave and cautious semantics and finds hypotheses that are ASP programs.

4.3.1 Inductive Learning of Answer Set Programs

In this project, we use ILASP, a system developed by Law et al.(2015) which solves such learning from (partial) answer sets tasks.

4.3.1.1 Learning from Partial Answer Sets

In the learning from answer sets (ILP_LAS) paradigm (Law et al., 2014), examples are no longer ground facts, but instead partial interpretations.

Definition 4.12. (Partial interpretation) A partial interpretation e is a pair of sets of ground atoms (e_inc, e_exc), called the inclusions and exclusions respectively. A set of ground atoms X is said to extend the partial interpretation (e_inc, e_exc) if and only if e_inc ⊆ X and e_exc ∩ X = ∅.

The goal of a LAS task is to find an answer set program that, when combined with some background knowledge, has answer sets that extend all the positive examples, but no answer set which extends any negative example. Formally:

Definition 4.13. (Learning from answer sets) A learning from answer sets task T is a tuple (B, SM, E+, E−) where B is an ASP program called the background knowledge, SM is the hypothesis space defined by some language bias M, and E+ and E− are positive and negative examples respectively. A hypothesis H is said to be an inductive solution of T if and only if:

1. H ⊆ SM .

2. For all e ∈ E+, there exists an answer set A of B ∪ H such that A extends e.

3. For all e ∈ E−, there does not exist any answer set A of B ∪ H such that A extends e.

Note that the ILP_LAS task can express both brave induction tasks (by use of a single positive example) and cautious induction tasks (using negative examples). Also note that, in order to help make the search process tractable, the search space for the inductive hypothesis is constrained to SM. This is achieved by providing a language bias M specified by mode declarations. For example, a modeh declaration declares the kinds of ground atoms that may appear in the head of a learned rule, and a modeb declaration declares the kinds of ground atoms that may occur in the body of a learned rule.

Example 4.6. (Mode declarations) Given the mode declarations:

M = {  modeh(p)
       modeb(q)  }

the hypothesis space would be limited to:

SM = {  ← q.    ← not q.    p.    p ← q.    p ← not q.  }

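To make the coverage conditions of Definition 4.13 concrete, the brute-force sketch below (our own illustration; ILASP's actual search is far more sophisticated) enumerates the hypothesis space of Example 4.6 and keeps the hypotheses that cover one positive example with inclusions {p}, under an assumed background knowledge consisting of the single fact q.

# Brute-force sketch of learning from answer sets, using the clingo Python API
# (illustration only; the background knowledge and example are assumptions).
import clingo

BACKGROUND = "q."
HYPOTHESIS_SPACE = [":- q.", ":- not q.", "p.", "p :- q.", "p :- not q."]
POSITIVE_EXAMPLE = ({"p"}, set())        # (inclusions, exclusions)

def answer_sets(program):
    control = clingo.Control(["0"])
    control.add("base", [], program)
    control.ground([("base", [])])
    models = []
    control.solve(on_model=lambda m: models.append({str(s) for s in m.symbols(shown=True)}))
    return models

def covers(hypothesis, example):
    inclusions, exclusions = example
    return any(inclusions <= model and not (exclusions & model)
               for model in answer_sets(BACKGROUND + "\n" + hypothesis))

print([h for h in HYPOTHESIS_SPACE if covers(h, POSITIVE_EXAMPLE)])
# -> ['p.', 'p :- q.'] for this assumed background and example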
4.3.1.2 Learning from Context-Dependent Examples

In traditional ILP approaches, all examples must be explained by a single hypothesis and shared background knowledge. However, sometimes examples may be context dependent; i.e. they should have slightly different background knowledge. An extension to the ILP_LAS task, called context-dependent learning from answer sets (ILP_LAS^context) (Law et al., 2016), allows the use of contexts to structure the background knowledge in this way. This increases the flexibility and often the performance of the learning task. In context-dependent learning from answer sets, examples become context-dependent partial interpretations (CDPIs):

Definition 4.14. (Context-dependent partial interpretation) A context-dependent partial interpretation (Law et al., 2016) is a pair e = (e_pi, e_ctx) where e_pi is a partial interpretation and e_ctx is an ASP program. An answer set X of P ∪ e_ctx is said to be an accepting answer set of e with respect to P if and only if X extends e_pi.

The ILP_LAS^context learning task itself is defined in the same way as the ILP_LAS task, except that answer sets must now be accepting answer sets of each example e, instead of extending each example e as in Definition 4.13.

4.3.2 Answer Set Grammar Induction

Using the ILASP system, Law et al. (2019) demonstrate an approach for learning answer set grammars. In particular, they propose a framework that takes as an input an ASG G, a hypothesis space SM, and two sets of strings E+ and E− called the positive and negative examples respectively, and learns an ASG G′ (with the same context-free grammar as G) such that G′ accepts every string in E+ and none from E−. Informally, this approach is intended to learn the semantic constraints of a language (in the form of ASP annotations), given the syntactical form of the language (defined by the CFG of the existing grammar G) and a set of semantically valid and invalid examples (E+ and E− respectively). This is done by translating the ASG learning task into an equivalent ILP_LAS^context task, as described in Definition 4.15.

Let us note that the representation of an ASG learning task is slightly different to the ILP frameworks we have seen so far. Firstly, its examples are strings instead of partial interpretations. Another important modification is that the mode declarations which form the hypothesis space SM have an additional parameter called the scope. The scope specifies the production rules to which the mode declaration applies. For example, the mode declaration #modeb(p, [1, 2]) is used to enforce that the atom p may appear in the body of rules learned in the two annotated ASP programs belonging to the first two production rules for an ASG. A limit is also applied to the depth of the parse tree in order to make the task tractable.

Definition 4.15. (ASG learning task) An ASG learning task (Law et al., 2019) ⟨G, SM, E+, E−⟩ is equivalent to the ILP_LAS^context task ⟨B, SM^LAS, E+_LAS, E−_LAS⟩ where:

• Each production rule PR in G takes the form n0 → n1 . . . nk P and has its own unique ID PR_id. The first production rule is assigned the ID 1, the second 2, and so on.

• Let R_X(PR_id) be the rule formed from R by:

– Replacing each annotated atom a@[t1, . . . , tn] with the atom a@X ++[t1, . . . , tn] (and treating each annotated atom as a distinct atom, as before).

– Adding the atom node_rule(PR_id, X) to the body of the rule (this is used to determine which production rule is used at each node in the parse tree).

• B is then defined to be the set of rules R_X(PR_id) for each ASP rule R ∈ P in each production rule PR in G.

• SM^LAS is defined similarly to be the set {R_X(PR_id) | ⟨PR_id, R⟩ ∈ SM}. Note that here each PR_id defines the scope for each rule R in the hypothesis space.

• E+_LAS contains one CDPI ⟨⟨∅, ∅⟩, C⟩ for each string s ∈ E+. Given the set of possible parse trees PT1, . . . , PTm for s, C ensures that at least one of these parse trees is accepted. Specifically, C is formed of the rules:

– 1 {pt1, . . . , ptm} 1 (chooses one parse tree).

– node_rule(rule(n)_id, trace(n)) ← pti for each i ∈ [1..m] and for each node n ∈ PTi (ensures satisfiability of the chosen parse tree).

• E−_LAS is a set of CDPIs ⟨⟨∅, ∅⟩, {node_rule(rule(n)_id, trace(n)) | n ∈ PT}⟩ for each parse tree PT of each string in E−, thus ensuring that no such parse tree is accepted.

Example 4.7. (ASG learning task meta-representation) Let G be the grammar shown in Example 4.4. Then, by translating each production rule according to Definition 4.15, we get:

B = {  ← size(X)@Y ++ [1], not size(X)@Y ++ [2], node_rule(1, Y).
       ← size(X)@Y ++ [1], not size(X)@Y ++ [3], node_rule(1, Y).
       size(X+1)@Y ← size(X)@Y ++ [2], node_rule(2, Y).
       size(0)@Y ← node_rule(3, Y).
       size(X+1)@Y ← size(X)@Y ++ [2], node_rule(4, Y).
       size(0)@Y ← node_rule(5, Y).
       size(X+1)@Y ← size(X)@Y ++ [2], node_rule(6, Y).
       size(0)@Y ← node_rule(7, Y).  }

Now suppose the string “abc” is provided as a positive example. “abc” has a single parse tree, which is the one shown in Figure 4.6. Hence it would produce the context:

C = {  1 {pt1} 1.
       node_rule(1, []) ← pt1.
       node_rule(2, [1]) ← pt1.
       node_rule(3, [1, 2]) ← pt1.
       node_rule(4, [2]) ← pt1.
       node_rule(5, [2, 2]) ← pt1.
       node_rule(6, [3]) ← pt1.
       node_rule(7, [3, 2]) ← pt1.  }

Note how B ∪ C’s single answer set is identical to G[T]’s answer set from Example 4.5.

Chapter 5

Representing Natural Language

In this chapter, we discuss our approach to translating a set of natural language sentences into a base answer set grammar, which is capable of parsing and determining the semantics of any sentence with the same linguistic structures. We firstly describe and motivate the design of a formal graph-based representation for the semantics of natural language. We then present an automated approach to constructing these graphs, using existing deep-learning-based NLP techniques to automatically generate base answer set grammars that are able to produce them.

Qualities of a Good Representation The primary focus of this chapter is to find an expressive formal representation for natural language texts that can be automatically generated. Bearing in mind that we intend to use the learning from answer sets framework to learn commonsense rules related to the texts, a good representation should have the following qualities:

1. Structured. Since our learning method is logic-based, it is necessary to dene some predicate structure.

2. Semantics-based. Since we want our results to be explainable, we must ensure that our representation is adequately close to the text’s semantics. Otherwise, explanations are likely to be both shallow and dicult to interpret.

3. Has canonical forms. Furthermore, it is desirable that sentences that have the same semantics have the same representation, even if their syntax differs. This concept is key to our logic-based learning approach: in order for a logical rule to generalise to similar examples, we must be able to specify the similar part of those examples, and as such they need to have the same logical representation.

4. Unambiguous. Words and phrases can be ambiguous: that is, they have different meanings according to their context. Our representation should not introduce any more ambiguities than those that exist in the original text. Ambiguities make learning much harder, as the learner will be required to resolve them in addition to learning the desired knowledge.

5. Forms small hypothesis spaces. A final challenge is that of making sure that the representation is well-suited to automated approaches for generating hypothesis spaces

for learning. In particular, we want to minimise the number of literals, variables and constants generally necessary to explain a solution. This prescribes that we use a small number of general predicates, minimise the need for variables to be shared between predicates, and minimise the number of constants in our representation.

We note that the direct outputs of the most common existing NLP processing techniques are not well-suited to logic-based learning approaches. Most state-of-the-art approaches, while generally able to capture some of the semantics of a text, are based on vector embeddings and thus fall at the first hurdle: they do not have any clear predicate structure. More traditional approaches, such as parsing, offer the structure we require, but their results tend to be much more distanced from the semantics of the text.

5.1 Knowledge Graphs

We propose a graph-based representation scheme, in which each text is represented by what we will call a knowledge graph. This graphical structure is influenced both by previous work on the WSC (Schüller, 2014; Sharma, 2019) and by popular representations for broad-coverage semantic parsing (Banarescu et al., 2013); however, we make significant changes, both to improve the ease of its automatic construction and its ease of use for inductive logic programming. These changes largely build upon previous work done by Reddy et al. (2016) and Chabierski et al. (2017) respectively.

Predicate Structure We propose a predicate structure which describes an acyclic, directed graph representation for a sentence. Example 5.1. (Predicate structure) The sentence “Jim yelled at Kevin because he was so upset.” would be mapped to the following logical structure

event(e1, yell, n1) ∧ event(e2, upset, n3) ∧ nominal(n1, person) ∧ nominal(n2, person) ∧ nominal(n3, he) ∧ modifier(m1, at, e1, n2) ∧ modifier(m2, because, e1, e2) ∧ modifier(m3, so, e2)

We call the first argument of each literal the identifier of that literal, the second its lemma and the rest its core arguments.

This approach closely resembles the classical Davidsonian approach to event semantics, but, similarly to Chabierski et al. (2017), we significantly reduce the number of predicates in our representation by making the lemma an argument of each literal, rather than the predicate

name, which instead corresponds to one of a small number of semantic categories (i.e. we write nominal(n1, person) rather than person(n1)). Our representation differs from the Chabierski et al. (2017) approach in both the number and kinds of categories. We use just three predicates: events capture the main verbal or adjectival meaning of a phrase, nominals are used for core arguments (i.e. generally corresponding to noun phrases), and modifiers are used both for non-core arguments (such as prepositional modifiers) and for linking events together (i.e. discourse connectives). Each event has up to four core arguments. In order, these are the subject, direct object, indirect object, and what we refer to as the “control argument”. The control argument is used to handle control verbs such as tried in “Paul tried to call George,” which we represent as¹

event(e1, try, paul, e2) ∧ event(e2, call, paul, george)

In our representation, the first three core arguments must be nominals, whilst the last, where present, must be an event. Each modifier has either one or two core arguments, which may be of any type. The first is the event/nominal/modifier being modified, and the second, where present, is the object of the modifier. Nominals make up the most straightforward category as they do not have core arguments, just an identifier and a lemma.
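To make the shape of this predicate structure concrete, the following Python sketch models the three literal types as simple record classes and rebuilds Example 5.1. The class and field names here are purely illustrative and are not taken from the project's implementation.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Nominal:
    id: str
    lemma: str                  # lemma, or an NER class (person, organization, location)

@dataclass
class Event:
    id: str
    lemma: str
    subj: Optional[str] = None  # first core argument: nominal identifier
    dobj: Optional[str] = None  # second core argument: nominal identifier
    iobj: Optional[str] = None  # third core argument: nominal identifier
    ctl: Optional[str] = None   # control argument: event identifier

@dataclass
class Modifier:
    id: str
    lemma: str
    arg0: str                   # the event/nominal/modifier being modified
    arg1: Optional[str] = None  # the object of the modifier, where present

# Example 5.1: "Jim yelled at Kevin because he was so upset."
graph = [
    Event("e1", "yell", subj="n1"),
    Event("e2", "upset", subj="n3"),
    Nominal("n1", "person"), Nominal("n2", "person"), Nominal("n3", "he"),
    Modifier("m1", "at", "e1", "n2"),
    Modifier("m2", "because", "e1", "e2"),
    Modifier("m3", "so", "e2"),
]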

Reasoning behind our Predicate Structure This design is significantly influenced by the need to make the hypotheses generated by a learning approach as simple as possible. The implementation of a Davidsonian semantics rather than a neo-Davidsonian one means that in many cases, we generate rules with fewer literals and variables. For example, the knowledge that “the person who is upset is the person who yells” could be expressed simply as

same(N1, N2) ← event(_, yell, N1), event(_, upset, N2)

rather than

same(N1, N2) ← event(E1, yell), event(E2, upset), agent(E1, N1), agent(E2, N2).

Meanwhile, the advantages of making the lemma an argument rather than a predicate name are twofold. Firstly, this representation is more expressive, since the lemma may be replaced by a variable. This allows us to express, for example, knowledge such as “a person hires somebody to do something for them”

same(N1, N2) ← event(_, hire, N1, _, E1), event(E1, _, _), modifier(_, for, E1, N2).

which is only possible because the anonymous variable in the lemma position of the second literal allows us to refer to any event. Secondly, this change makes the definition of the hypothesis space more straightforward by adding an additional layer of abstraction through the use of constants. If, for example, we wish to allow some adjective to be modified by either “less” or “more”, rather than requiring two mode declarations less(var(m), var(e)) and more(var(m), var(e)), we can instead define a single mode declaration with an appropriate constant: modifier(var(m), const(c), var(e)).

¹Note that in reality, it is necessary to add placeholder arguments in any unused slots so that core arguments of the same kind are always in the same position and can be matched; however, we omit them here for brevity and clarity. This means that the try event in “Paul tried to call George”, for example, would actually be represented as event(e1, try, paul, null, null, e2).


Figure 5.1: Knowledge graph corresponding to Example 5.1. We represent events with ellipses, nominals with rectangles, and modifiers with hexagons.

Knowledge Graphs Knowledge graphs are simply graphical representations of the predicate structure we have described so far. Example 5.2. (Knowledge graph) The predicate structure of Example 5.1 corresponds to the knowledge graph shown in Figure 5.1. Each node in the knowledge graph corresponds to an identifier in the logical representation, and each outgoing edge from a node corresponds to a core argument of the literal with that node’s identifier. We represent events with ellipses, nominals with rectangles, and modifiers with hexagons.

5.2 Answer Set Grammars for Natural Language

In this project, we make use of answer set grammars to represent natural language. We propose that these answer set grammars are used to simultaneously perform a syntactic and semantic parse of a text, generating both a phrase-structure parse tree and a knowledge graph structure respectively. The motivation for this approach is threefold:

1. Syntax-mediated semantics. Firstly, any grammar-based approach to determining the semantics of a text benefits from intrinsic structural knowledge of the text. Previous work (Reddy, 2017) has demonstrated that this structural knowledge is useful for accurately determining the semantics of the text, due to the close relationship between syntax and semantics. Additionally, it forces the semantics to be encoded in a compositional way which can be easily understood by the user.

2. ASP rules and constraints. Secondly, ASG-based approaches additionally benefit from being able to use ASP rules to express not only semantic composition, but also constraints over both the syntax and semantics of texts accepted by the grammar. Importantly, since the ASG itself encodes the compositionality of the semantics of the text, these rules and constraints can be applied not only to the text as a whole, but also at the phrase level. In contrast to most existing constraint-based grammars used in language processing, the computational foundation of ASGs, answer set programming, is widely used and well supported by a wide range of sophisticated tooling. In particular, inductive logic programming systems such as ILASP can be used to learn semantic rules and constraints directly within the grammar, which is an idea that we explore in detail in this report.

3. Reusability. Finally, and most importantly, we see the main benefit of ASGs for natural language processing in their ability to act as domain-specific semantic parsers. That is, since an ASG encodes how to build up the semantics from individual words, we can reuse it to determine the semantics of different, previously unseen sentences, as long as the linguistic structures of those sentences were considered when building the base ASG. This has an additional benefit when considering learning with answer set grammars. If a system such as ILASP is used to extend an ASG with automatically learned knowledge or constraints, these newly-learned rules or constraints can then be applied directly to any parseable sentence, without the need to first run complex machine learning models to generate a semantic parse.

To demonstrate our ASG-based formalism for natural language, and support the claim that it can be used to build domain-specific semantic parsers, let us consider a very simple example motivated by the bAbI tasks dataset. Example 5.3. (ASG for natural language) A very simple answer set grammar capable of performing a semantic parse of the pattern “{person} {went/travelled/moved} to the {location}.” could take the form

% Grammar
start -> np vp "." {
  event(ID, Lemma, Arg0) :- event(ID, Lemma)@2, nominal(Arg0, _)@1.
  nominal(ID, Lemma) :- nominal(ID, Lemma)@1.
  modifier(ID, Lemma, Arg0, Arg1) :- modifier(ID, Lemma, Arg0, Arg1)@2.
  nominal(ID, Lemma) :- nominal(ID, Lemma)@2.
}

np -> nnp {
  nominal(ID, Lemma) :- nominal(ID, Lemma)@1.
}

vp -> vbd pp {
  event(ID, Lemma) :- event(ID, Lemma)@1.
  modifier(ID, Lemma, Arg0, Arg1) :- modifier(ID, Lemma, Arg1)@2, event(Arg0, _)@1.
  nominal(ID, Lemma) :- nominal(ID, Lemma)@2.
}

pp -> to np {
  modifier(ID, Lemma, Arg1) :- modifier(ID, Lemma)@1, nominal(Arg1, _)@2.
  nominal(ID, Lemma) :- nominal(ID, Lemma)@2.
}

np -> dt nn {
  nominal(ID, Lemma) :- nominal(ID, Lemma)@2.
}

% Lexicon
nnp -> "Mary"      { nominal(mary, person). }
nnp -> "John"      { nominal(john, person). }
nnp -> "Daniel"    { nominal(daniel, person). }
vbd -> "went"      { event(went, go). }
vbd -> "travelled" { event(travelled, go). }
vbd -> "moved"     { event(moved, go). }
to  -> "to"        { modifier(to, to). }
dt  -> "the"       { }
nn  -> "bedroom"   { nominal(bedroom, bedroom). }
nn  -> "garden"    { nominal(garden, garden). }
nn  -> "kitchen"   { nominal(kitchen, kitchen). }
nn  -> "office"    { nominal(office, office). }

The grammar rules encode how to compose the semantics of the words in the sentence. To see this, let us consider the first production rule (start -> np vp “.”). Here, the first ASP rule sets the agent of the event from the verb phrase (goes) to be the nominal from the noun phrase which precedes it (the person who goes). The remaining three rules simply pass up the semantics of the person, location and prepositional modifier (goes to the location), which are all finalised at lower nodes in the parse tree by other rules. Let us also consider the correspondences between our approach and the HPSG formalism: in every production rule, the literal at the head of the first ASP rule comes from the head daughter — this literal gives the actual meaning of each constituent, as per the semantic inheritance principle, and is the part which is modified by any dependents; the remaining ASP rules then encode the semantic compositionality principle, by combining all of the existing context from all of the constituent’s children. The lexicon encodes the semantics of each individual word. As we will see, our approach is not really limited by the lexicon, since lexical entries are extremely straightforward to generate automatically, so unseen vocabulary is not a significant issue for the approach. We note that, as originally claimed, this grammar is capable of doing a semantic parse of any sentence of the form “{person} {went/travelled/moved} to the {location}.” For example, “Mary went to the bedroom.” produces the semantic representation

event(went, go, mary) ∧ modifier(to, to, went, bedroom) ∧ nominal(mary, person) ∧ nominal(bedroom, bedroom)

at the root node. As we will demonstrate in this report, this approach can be extended to automatically handle much more difficult patterns with large vocabularies and many complex linguistic structures.

5.3 Automated Translation from Dependency Structures

We now detail an approach for generating knowledge graphs, using automatically-generated answer set grammars to encode semantic composition.

Qualities of a Good Translation Approach We initially identified four main requirements for the translation approach.

1. Grammatical compatibility. Since we intend to learn with ASGs, it is necessary to construct a base CFG with enough semantic annotations to allow relevant semantic constraints to be learned. Thus the ASG production rules need to encode both syntactic and semantic composition, and the latter must therefore be linked cleanly to the former.

2. Domain-generality. Winograd schemas use a large vocabulary and are not constrained to any specific domain, thus our approach must be domain-general.

3. Robustness. The WSC includes a very diverse set of linguistic constructions, some of which are complex. For example, negation, conjunction, relative clauses, passives, raising and control are all constructions that occur frequently in the WSC. Furthermore, many of the examples that we find through automated approaches for knowledge hunting may actually be ungrammatical. It is important that our approach can handle all of these complex and ungrammatical texts gracefully.

4. Accuracy. It is vital that the automatically generated graphs are very often correct. Since we intend to use a logic-based learning approach, even small amounts of noise in the final results can make knowledge contradictory, causing our learning tasks to become unsatisfiable and hence hindering our ability to learn anything.

Dependency-Guided Semantic Composition We settled on an approach that uses both constituency and dependency structures, inspired by the HPSG formalism. The constituency structure is used mostly to determine syntactic composition and generate the base CFG, and so we will largely ignore it here. Meanwhile, the dependency structure is largely used to determine how to compose the semantics, and to generate the knowledge graph structure itself. This part of the approach is influenced by the previous work of Reddy et al. (2016), although our final representation is very different from theirs, and we use the ASG directly to encode the composition of semantics, rather than using intermediate forms based on lambda calculus. The motivation behind using the dependency structure is twofold. Firstly, its close relatedness to the constituency structure (which we highlighted in Subsection 4.1.2.2) is a useful property with regards to the goal of grammatical compatibility, since the constituency structure is very easily expressed with a CFG. Secondly, there are many large dependency treebanks

which have led to the development of highly robust and accurate parsers. State-of-the-art dependency parsers achieve accuracies of over 96% for labelled dependencies, and over 97% for unlabelled ones. Traditionally, the accuracy of alternatives, such as CCG parsing, has somewhat lagged behind, presumably due to training data being less widely available. Previous work (Reddy, 2017) has shown that this translates to dependency-based approaches outperforming their CCG-based counterparts on the task of semantic parsing, particularly for complex and ungrammatical texts. Our dependency-guided approach consists of four stages:

1. Dependency parsing. We use existing machine learning models for tokenisation, lemmatisation, POS-tagging, and constituency and dependency parsing. The exact choice of models is discussed in Section 7.1.

2. Lexicon generation. A lexicon is generated which substitutes each token for a literal which encodes the type and meaning of the token, as well as any other lexical features.

3. Grammar generation. A grammar is generated which substitutes each dependency structure for a production rule with ASP annotations which compose the semantics of its children.

4. Composition. The knowledge graph structure is generated by parsing the original text with the answer set grammar. At each node in the parse tree, the set of atoms that hold expresses the semantics of the phrase rooted at that node. The atoms which hold at the root node express the semantics of the full text.

Lexicon The first stage of our approach generates a lexical entry for each word in the original text. Currently this is a simple process that uses the token’s POS tag and lemma to generate a base literal of the correct type with no core arguments.

Example 5.4. (Lexical entry) The word “yelled” in the text “Jim yelled at Kevin because he was so upset .” produces the lexical entry

vbd -> "yelled_1" { event(yelled_1, yell, null, null, null, null). }

There are several things to note here. The first is that the entry has the event type, which is deduced from its POS tag being VBD (verb, past tense). Each POS tag is associated with a type: verbs and adjectives create events, nouns and pronouns create nominals, and most other tags, including adverbs, prepositions, subordinating conjunctions and coordinating conjunctions, create modifiers. Certain tags, such as most punctuation tags, are ignored entirely. The second is that the event’s identifier is yelled_1. This is formed of the token name, for readability of the generated grammar, as well as the token’s index in the original sentence, to ensure that the same word does not receive the same identifier when used in several positions.

The third is that the second argument, yell, comes directly from the token’s lemma. To handle proper nouns, where the lemma is unlikely to capture any useful semantic information, we instead assign an NER class of person, organization or location. Tense, person, number, mood, aspect etc. are currently all ignored. This allows us to match the meaning of words even when they are present in different forms, which expands the amount of knowledge that might be applicable to a certain problem. If this additional lexical information were to be added to the lexicon, then it would be provided through additional arguments to ensure this property is maintained. However, we found that such information is almost never useful for the problems we choose to focus on. Finally, we note that the event has four null core arguments. There are four arguments here since, as discussed earlier, an event literal may have up to four core arguments. For a modifier there would be two, and for a nominal, none. These arguments contain a placeholder value so that we can tell which ones have been assigned values. This property is useful thanks to a linguistic principle named the theta criterion (Barany, 2017), which states that each thematic role is assigned one and only one argument. Thus if a core argument has been assigned, we know that it is the only such argument and that it can be treated as final. Further subcategorisation information can easily be added by using a different value for core arguments that are never used, and/or additional literals to assert properties of certain arguments. For example, a non-control verb which is monotransitive, and hence never has an indirect object or control argument, may instead have an entry of the form event(ate_2, eat, null, null, unused, unused). However, in practice we find that such information is neither useful nor easy to determine accurately. A nice property of our simple lexicon is that it is very straightforward to generate new entries: minimally, just a POS tag and lemma is required. This enables us to handle unseen vocabulary without too much difficulty.
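As a rough illustration of how such entries could be produced, the Python sketch below maps POS tags to literal types and formats an entry in the style of Example 5.4. The tag groupings and helper names are illustrative simplifications (for instance, the NER classes assigned to proper nouns are omitted); this is not the project's actual implementation.

# Map Penn Treebank POS tags to the three literal types; anything else is ignored.
def literal_type(pos):
    if pos.startswith(("NN", "PRP", "WP")):
        return "nominal"
    if pos.startswith(("VB", "JJ")):
        return "event"
    if pos.startswith(("RB", "IN", "CC")):
        return "modifier"
    return None

# Placeholder core arguments: four for events, two for modifiers, none for nominals.
PLACEHOLDERS = {"event": ", null, null, null, null",
                "modifier": ", null, null",
                "nominal": ""}

def lexical_entry(token, index, pos, lemma):
    """Build one ASG lexical entry, e.g.
    vbd -> "yelled_1" { event(yelled_1, yell, null, null, null, null). }"""
    typ = literal_type(pos)
    ident = f"{token.lower()}_{index}"
    if typ is None:                      # ignored tags produce an empty annotation
        return f'{pos.lower()} -> "{ident}" {{ }}'
    return f'{pos.lower()} -> "{ident}" {{ {typ}({ident}, {lemma}{PLACEHOLDERS[typ]}). }}'

print(lexical_entry("yelled", 1, "VBD", "yell"))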

Grammar Rules In our approach, grammar rules encode both syntactic and semantic composition.

Example 5.5. (Grammar rule) The nsubj (nominal subject) dependency between the event “yelled” and the nominal “Jim” in the sentence “Jim yelled at Kevin because he was so upset .” produces the grammar rule

s -> np vp {
  event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl) :-
      event(ID, Lemma, null, Arg1, Arg2, ArgCtl)@2, nominal_unused(Arg0, _)@1.
  nominal(ID, Lemma) :- nominal_unused(ID, Lemma)@1.
  ...
}

Here, the first rule sets the previously unassigned subject of the verb phrase to be the nominal produced by the noun phrase, while the second rule “uses up” the nominal argument. We mark literals unused until they are either used up as an argument or they reach the root node (which

may happen if the text is ungrammatical). This design enforces a second constraint imposed by the theta criterion (Barany, 2017): namely, that each argument is assigned to one and only one thematic role. The two ASP rules shown here are followed by rules to pass up any events, modifiers and nominals which have already been finalised at nodes lower in the parse tree. This ensures that the atoms which hold at the root node together form a conjunction that fully expresses the semantics of the whole text.

Lexicalised Grammar Rules

Example 5.6. (Lexicalised grammar rule) An alternative formulation for the grammar rule presented in Example 5.5, with a close resemblance to a rule of a lexicalised CFG, is

s -> np vp {
  event(ID, yell, Arg0, Arg1, Arg2, ArgCtl) :-
      event(ID, yell, null, Arg1, Arg2, ArgCtl)@2, nominal_unused(Arg0, person)@1.
  nominal(ID, Lemma) :- nominal_unused(ID, Lemma)@1.
  ...
}

where, in the first rule, we replace the variable Lemma with the constant yell and the anonymous variable with the constant person. The advantage of such an approach is that the grammar is less likely to accept ungrammatical sentences or produce several incorrect parse trees². The second property is particularly useful since learning tasks are made considerably simpler by having only a single parse tree, as each parse tree corresponds to a different possible answer set in the learning task’s meta-representation. Clearly, the main drawback of this lexicalised version is that the produced rule is a lot less general, and so the produced ASG is a lot less likely to be able to parse unseen texts. In particular, adding entries to the lexicon would no longer be sufficient to handle unseen vocabulary: the grammar rules would also need to be updated to encode which event-argument pairs are valid.

Automatic Generation of Grammar Rules The rules shown above are generated automatically from the text’s dependency structure. This is done in three stages: for each constituent we firstly identify the dependencies which are present, then translate each of these dependencies into a logical intermediate representation, and finally combine the dependencies to form the correct grammar rule. At each constituent, we identify the head daughter and each dependent. The translation then depends on the type of the head daughter, the type of the dependent node, and the dependency label. The types of the head and dependent are checked against the label: for example, an nsubj (nominal subject) dependency expects that its head should always be an event and its dependent should always be a nominal. This type check helps us to identify and resolve many POS-tagging and parser errors.

²In particular, note how lexicalised grammar rules automatically capture selectional restrictions — semantic constraints on arguments that account for the implausibility of sentences such as “colourless green ideas sleep furiously”.

                          Dependent
                          Nominal                  Event                   Modifier        Ignored
                          (NN, NNS, NNP, ...)      (VBD, VBZ, JJ, ...)     (RB, IN, ...)   (DT, TO, ...)
Head
  Core Event              nsubj (arg0)             xcomp (argctl)
                          nsubjpass (arg1/arg2)    ccomp (argctl)
                          dobj (arg1)              acomp (argctl)
                          iobj (arg2)

  Non-core Event          pcomp (prep)             advcl (mark)            neg             aux
                          pobj (prep)              parataxis (next)        prep            cop
                          tmod                     conj (cc/and)           mark
                                                                           advmod

  Nominal                 pcomp (prep)             amod (arg-of)           neg             det
                          pobj (prep)              rcmod (arg-of)          prep            predet
                          poss (poss)              infmod (arg-of)                         num
                          conj (cc/and)            partmod (arg-of)                        nn

Table 5.1: The dependencies currently supported by our approach, grouped by the expected types of their arguments, and their respective translations. arg0/1/2/ctl translations assign values to the head’s respective core argument. arg-of translations are similar, however the dependency direction is reversed. Other dependencies create modifiers. Where a value is given in brackets, that denotes the value of the modifier node which connects the head to the dependent. For dependencies without bracketed values, the value of the dependent itself is the value of the modifier.

Many dependencies such as nsubj are simple cases in which the knowledge graph exactly mirrors the dependency structure, just with a different label. However, this is not always the case: the direction of edges may need to be reversed (e.g. for an rcmod (relative clause) dependency), additional nodes and/or edges not present in the dependency structure may need to be added (e.g. for a conj (conjunction) dependency), or the head node may need to be switched in order to satisfy type constraints, which corresponds to shifting the position of a node in the graph structure (e.g. for a mark (marker) dependency). Table 5.1 classifies each supported dependency according to the expected types of its head and dependent, and the translation which is applied. We provide further details of the translations for some of the more complex cases in Section 5.3.1.

One important feature of our approach is that we do not binarise our grammar like most similar approaches³.

One result of this is that it is possible for one constituent to have multiple dependencies. The final step therefore involves combining all dependencies of a constituent. The most important thing to note is that the head daughter is responsible for the semantics of the constituent, so the head daughter’s literal is always passed up directly. Then, we consider all of the modifications that need to be made to the head, according to the identified translations. How the translations are combined depends on whether they make changes to the same literal or to different ones, as demonstrated by the following examples.

Example 5.7. (Combining dependencies: non-core arguments) Consider the verb phrase construction in the sentence “Jim (VP (VBD[head] yelled) (PP[prep] at Kevin) (SBAR[advcl] because he was so upset))”. Here both of the dependents are modifiers, so each creates its own rule, as they do not need to change the head (the yell event).

vp -> vbd pp sbar {
  modifier(ID, Lemma, Arg0, Arg1) :-
      modifier(ID, Lemma, Arg0, null)@2, event(Arg1, _, _, _, _, _)@1.
  modifier(ID, Lemma, Arg0, Arg1) :-
      modifier(ID, Lemma, Arg0, null)@3, event(Arg1, _, _, _, _, _)@1.
  ...
}

Example 5.8. (Combining dependencies: core arguments) Consider the verb phrase construction in the sentence “Jane (VP (VBD[head] gave) (NP[iobj] Joan) (NP[dobj] candy))”. Here each of the noun phrases sets one of the give event’s core arguments. Therefore the dependencies must be combined to form a single rule.

vp -> vbd np np {
  event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl) :-
      event(ID, Lemma, Arg0, null, null, ArgCtl)@1,
      nominal_unused(Arg1, _)@3, nominal_unused(Arg2, _)@2.
  ...
}
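The following Python sketch illustrates this combination step for an event-headed constituent, under several simplifying assumptions: each dependency is given as a (label, child-index) pair, all core-argument children are treated as nominals (control arguments, which are events, would need their own case), and the rules that pass up already-finalised literals (the “...” above) are omitted. The function and variable names are our own and only approximate the real translation procedure.

# Core-argument dependencies fill one of the head event's four slots;
# everything else is treated here as a modifier of the head.
CORE_SLOTS = {"nsubj": 0, "dobj": 1, "iobj": 2, "xcomp": 3}
SLOT_VARS = ["Arg0", "Arg1", "Arg2", "ArgCtl"]

def combine(head_child, deps):
    """head_child: 1-based index of the head daughter in the production rule.
    deps: list of (dependency label, 1-based child index) pairs.
    Returns the ASP annotations (as strings) composing the constituent's semantics."""
    filled = {CORE_SLOTS[lab]: child for lab, child in deps if lab in CORE_SLOTS}
    # All core-argument dependencies are merged into one rule: the head event's
    # still-null slots are filled by the nominals produced by the respective children.
    body_slots = ["null" if i in filled else SLOT_VARS[i] for i in range(4)]
    body = [f"event(ID, Lemma, {', '.join(body_slots)})@{head_child}"]
    body += [f"nominal_unused({SLOT_VARS[i]}, _)@{child}" for i, child in filled.items()]
    rules = [f"event(ID, Lemma, {', '.join(SLOT_VARS)}) :- {', '.join(body)}."]
    # Each modifier dependency produces its own rule, linking the modifier to the head.
    for lab, child in deps:
        if lab not in CORE_SLOTS:
            rules.append("modifier(ID, Lemma, Arg0, Arg1) :- "
                         f"modifier(ID, Lemma, Arg0, null)@{child}, "
                         f"event(Arg1, _, _, _, _, _)@{head_child}.")
    return rules

# Example 5.8: "Jane (VP (VBD[head] gave) (NP[iobj] Joan) (NP[dobj] candy))"
for rule in combine(1, [("dobj", 3), ("iobj", 2)]):
    print(rule)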

Composition Once the lexicon and grammar rules have been produced, a sentence’s knowledge graph structure can be extracted simply by parsing the sentence and checking the atoms which hold at the root. Figure 5.2 shows how the grammar rules compose the semantics for our running example, “Jim yelled at Kevin because he was so upset.” Note how the root node produces the correct predicate structure, matching what we described in Example 5.1 and thus corresponding to the knowledge graph in Example 5.2.

³There are two reasons why such a step is not necessary in our approach. Firstly, although a dependency structure alone imposes no ordering on dependencies such as nsubj and dobj, we use a constituency grammar as our ASG’s base, which encodes exactly such an ordering. Secondly, even when the constituency grammar does not specify the ordering (for example, in the case of dobj and iobj dependencies), annotated atoms in an ASG can specify their child’s index, which gives us the flexibility to handle different possible orderings of constituents, unlike an approach which uses lambda calculus.

62 63 JJ "upset" event (upset_8, upset). ADJP advmod event (upset_8, upset). upset_8) . modi fi er(so_7, so, RB "so" VP cop modi fi er (so_7, so). event (upset_8, upset). VBD "was" ignored modi fi er(so_7, so, upset_8). S nsubj nominal(he_5, he). NP ). he_5 event (upset_8, upset, modi fi er(so_7, so, upset_8). "he" PRP nominal (he_5, he). nominal (he_5, he). mark IN SBAR "because" nominal(he_5, he). . "." ignored event(upset_8, upset, he_5). modi fi er(so_7, so, upset_8). modi fi er (because_4, because). advcl punct upset_8 ). modi fi er (because_4, because, pobj NP NNP "Kevin" PP prep S VP nominal (kevin_3, person). nominal (kevin_3, person). nominal(kevin_3, person). nominal(he_5, he). nominal(he_5, he). kevin_3). modi fi er (at_2, at, event (yelled_1, yell). nominal(jim_0, person). nominal(kevin_3, person). nominal(kevin_3, person). modi fi er(at_2, at, kevin_3). event(upset_8, upset, he_5). event(upset_8, upset, he_5). modi fi er(so_7, so, upset_8). modi fi er(so_7, so, upset_8). jim_0 ). event (yelled_1, yell, yelled_1 , kevin_3). modi fi er(at_2, at, IN "at" nsubj yelled_1 , upset_8). modi fi er(because_4, because, modi fi er(because_4, because, yelled_1, upset_8). VBD modi fi er (at_2, at). "yelled" event (yelled_1, yell). NP NNP "Jim" nominal (jim_0, person). nominal (jim_0, person). Figure 5.2: A graphical representationatoms of which the hold parse at tree each generatedtype. node for Changes are the between shown. sentence a “Jim The literal yelled italicised in at predicate a Kevin at because node each he and node was its corresponds so child to upset.” are the All shown “head the in concept” bold. and Placeholder gives arguments the are node’s omitted for brevity. advcl


Figure 5.3: Comparing our semantic structure (black) to dependency structure (blue).

5.3.1 Analysis of Some Linguistic Constructions

As we have seen, in many cases dependencies can be translated to our semantic representation without too much difficulty: core arguments for events are handled by direct assignment, and non-core arguments are handled by setting arguments of modifiers. We now discuss some linguistic constructions which require more complex approaches.

Prepositional Phrases and Adverbial Clauses Adverbial clauses and prepositional phrases, which may modify events or nominals with another event or nominal, are both handled in a similar way. Let us again consider the simple prepositional phrase modifier “(VP (VBD yelled) (PP[prep] (IN at) (NP[pobj] (NNP Kevin))))”. We note that the PP is correctly assigned the modifier type, since its head has the IN (preposition) tag, which generates a modifier. The pobj dependency therefore just needs to set the modifier’s object:

pp -> in np {
  modifier(ID, Lemma, Arg0, Arg1) :-
      modifier(ID, Lemma, Arg0, null)@1, nominal(Arg1, _)@2.
  ...
}

A pcomp dependency, in which the preposition’s object is an event, is handled in the same way. When we reach the verb phrase, we then set the modifier’s subject to be the yell event (see the first rule of Example 5.7). The difficulty with adverbial clauses is that the dependent type is usually an event, rather than the modifier we want. That is because, as shown in Figure 5.3, the dependency structure is quite different from our proposed structure. For example, in the phrase “(SBAR (IN because) (S he was so upset))”, the upset event is considered the head, but our representation calls for subordinate clauses such as this one to be represented by a modifier. This difference in structure is resolved simply by treating mark dependents as heads in order to satisfy the type requirements.

Negation Negation is handled in a similar way to any other simple modifier. Negation will produce a neg dependency between the negating word and the event, nominal or modifier

being negated, which we translate directly into our own semantic representation. Let us consider as an example the sentence

The trophy does not fit.

Our translation approach gives a logical representation of the form

nominal(n1, trophy) ∧ event(e1, fit, n1) ∧ modifier(m1, neg, e1)

There is one clear issue with this representation, which is that it entails

nominal(n1, trophy) ∧ event(e1, fit, n1)

As such, we make one change compared to standard modifiers, unique to negation, which is that we have a post-processing step in order to change the predicate name for negated literals. This ensures that the representation for, say, negated events and non-negated events is different. The step is implemented by the ASG’s start rule and takes the form:

start -> s {
  neg_event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl) :-
      event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl)@1, modifier(_, neg, ID)@1.
  event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl) :-
      event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl)@1,
      not neg_event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl).
  ...
}

Coordination Coordination is a tricky construction to handle, as it effectively results in several head literals. Let us consider, for example, the following case of NP-conjunction:

Sam and Amy are passionately in love.

We aim to generate a logical form with a distributive reading, i.e. we wish for this sentence to be interpreted as “Sam is passionately in love and Amy is passionately in love”. This requires that the noun phrase Sam and Amy be treated as having two head literals: nominal(sam, person) and nominal(amy, person). Many approaches which rely on lambda calculus for composing semantics require post-processing to fix the semantics of sentences with coordination, due to the difficulty of supporting multiple heads in this way. Our approach, on the other hand, handles such constructions quite naturally:

np -> nnp cc nnp {
  nominal_unused(ID, Lemma) :- nominal_unused(ID, Lemma)@1.
  nominal_unused(ID, Lemma) :- nominal_unused(ID, Lemma)@3.
  modifier(ID, Lemma, Arg0, Arg1) :-
      modifier(ID, Lemma, null, null)@2, nominal(Arg0, _)@1, nominal(Arg1, _)@3.
  ...
}


Figure 5.4: A produced knowledge graph demonstrating the handling of coordination.

Since the head of a constituent is determined simply by a body atom specifying a literal of a certain type at a certain child (which has not yet been finalised), we can simply produce two such literals, and any ASP rules in production rules applied further up the parse tree will automatically apply to both of them, producing the distributive reading we desire. In the production rule shown, the first two ASP rules implement this idea. We note that this approach has the benefit of requiring no further modification to our ASG whatsoever, only to the intermediate representation we use to generate ASG rules (which did not originally support multiple heads). Let us also consider the third ASP rule in the production rule shown previously. We chose to add a modifier that links the two conjuncts by their coordinating conjunction. This is because coordinating conjunctions may in themselves hold important semantic information: for example, so might be used to provide motivation, or but to highlight contrast. The final produced representation is shown in Figure 5.4. Although we have just considered the case of NP-conjunction here, other kinds of coordination are handled in exactly the same way.

Passives Consider the passive construction:

The drain is clogged with hair.

Examples such as this one require special handling since the nominal which takes the place of the syntactic subject of the is clogged construction is really the semantic object of the clogged event, not its semantic subject. In order to differentiate between the two possible cases (whether the subject dependency really acts as a subject, or rather as an object), we need to change the representation

of the clogged event. This is done by changing the predicate name when a passive construction is identified, from event to event_passive. These passive forms can be identified by an auxpass dependency; in our example it holds between clogged and is, and it would generate a production rule of the form:

vp -> vbz vp {
  event_passive(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl) :-
      event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl)@2, event(_, be, _, _, _, _)@1.
  ...
}

In a nominal subject rule, such as the one shown in Example 5.5, we then simply add a different rule for event_passive predicates, which sets the object argument instead of the subject argument. At the root node, a final rule converts passive events back into standard events in order to maintain the usual predicate structure.

Relative Clauses Relative clauses such as “which ran up the tree” in the sentence

The dog chased the cat, which ran up the tree.

are the first of a tricky set of constructions involving non-local dependencies which we will consider. The difficulty arises since the semantic subject of the running event here is the cat, which never appears as a syntactic argument of ran. In CCG-based approaches, these cases often require special handling in the form of co-indexed categories: the lexical entry for which would be responsible for linking the object of chased to the subject of ran. Meanwhile, dependency-based approaches must augment the dependency structure in order to add an additional dependency, which adds an equality constraint between the two nominals. Our approach is different in that we do not treat these as two separate nominals which are forced to become equal, but instead leave the subject of ran unspecified until we have its real value. As such, the lexical entry for the wh-pronoun which is marked as unfinalised, allowing it to be replaced when a nominal with an rcmod (relative clause) dependent is reached. The production rule which handles the rcmod dependency would thus take the form:

np -> np "," sbar { nominal_unused(ID, Lemma)@1 :- nominal_unused(ID, Lemma)@1. event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl) :- event(ID, Lemma, unfinalised, Arg1, Arg2, ArgCtl)@3, nominal_unused(Arg0, _)@1. ... }

Note how this reverses the dependency structure: while the rcmod dependency points from the nominal to the event, we mark the nominal as an argument of the event. Our produced representation is shown in Figure 5.5.


Figure 5.5: A produced knowledge graph demonstrating the handling of relative clauses.

It is important to note that relative pronouns can take the place of an object or a subject in the relative clause. These cases are handled automatically, since the unfinalised argument’s position is decided by an nsubj or dobj dependency within the relative clause. Furthermore, relative pronouns can be omitted, but only when they are objects of the clause. This case is handled by replacing the object when we find an rcmod dependency without any unfinalised arguments.

Control and Raising The constructions of control and raising occur frequently in the WSC: there are 95 instances of these constructions among its 273 problems. Control is a construction in which the semantic subject of a predicate is determined by some expression in context (Bhatt, 2006). For example, consider the sentence:

John_i promised Bill [PRO_i to leave]. (5.1)

Here John is the agent of both promised and leave, even though he is the syntactic subject for only promised. This is called subject control.

John ordered Bill_i [PRO_i to leave].

Here Bill is the patient of ordered and the agent of leave, even though he is the syntactic object for only ordered. This is called object control. Raising, on the other hand, is a construction in which an argument which belongs semantically to a subordinate clause is a syntactic argument of the main clause, i.e. instead of sharing an argument, the argument moves (Bhatt, 2006). Let us consider an example:

Rebecca_i seems [t_i to suspect something]. (5.2)

Here, seems is a raising-to-subject verb. We note that Rebecca is the agent of suspect and not seems, even though she is the syntactic subject of seems and not suspect. A second type of raising verb is a raising-to-object verb such as believed:

Carol believes Rebecca_i [t_i to suspect something].


Figure 5.6: A produced knowledge graph demonstrating the handling of control.

Here we note that Rebecca is the agent of suspect and not the patient of believes, although she is the syntactic object of believes and not the syntactic subject of suspect. We choose to treat both constructs identically. An example of their desired representation is shown in Figure 5.6. Firstly, both cases use the “control argument” to link the control/raising verb to the infinitival. In both cases, we also attempt to determine the subject of the infinitival, instead of leaving it unspecified, as this can be done with a high level of accuracy and significantly improves the quality of the semantic representation. One thing to note is that this means that our approach would, for example, leave Rebecca as the subject of seems (in addition to making her the subject of suspect) for (5.2). Although this treatment is technically incorrect, it is widely used in semantic parsers (Reddy, 2017) and significantly simplifies our approach. In practice, raising occurs much less frequently than control, and we are yet to find any example where the additional argument leads to issues in learning.

However, these constructs still pose a significant challenge for our approach, due to the misalignment between syntactic and semantic arguments, and also a lack of type information to distinguish between the subject and object cases. One method for handling these cases is to rely on the lexicon, in the same way that a CCG-based approach would. It is widely accepted that whether a control construction is an instance of subject or object control depends solely on the control verb (Bhatt, 2006), so the only necessary addition is that the lexical entry for the control verb must specify whether it is controlled by a subject or an object. The unassigned subject of the infinitival can then be assigned at the same time as the specified argument of the control verb. However, we were unable to find an existing resource that offers a complete classification of control verbs into subject and object control verbs, and we found machine learning techniques based on semantic role labelling to be unreliable. We therefore use an approximation based on the minimal distance principle (MDP) (Bhatt, 2006). Since there are many object control predicates but few subject control predicates, and even fewer subject control predicates which also select an object (like promised does in (5.1)), the approach of simply selecting the closest noun to the infinitival (the object of the control verb if there is one, otherwise the subject) to be the infinitival’s subject works surprisingly well. Indeed, of the 95 instances of control and raising in the WSC, there is just one that violates the MDP, which is the example shown in

(5.1). The MDP can also be implemented very elegantly by our ASG. When we see an xcomp dependency (a clausal complement without a subject), we simply mark the subject as unfinalised. nsubj, dobj and iobj (nominal subject/object) dependencies are then made to always replace an unfinalised subject, so that the subject is always assigned to be the closest argument above it in the parse tree. For example, the nominal subject construction from Example 5.5 would be updated with the additional rule:

s -> np vp {
  event(ID, Lemma, Arg0, Arg1, Arg2, ArgCtl) :-
      event(ID, Lemma, unfinalised, Arg1, Arg2, ArgCtl)@2, nominal_unused(Arg0, _)@1.
  ...
}

Ignored Constructions We currently ignore the tense of events and the quantification of nominals, although tense could be handled with a simple extension to our representation and lexicon, and quantification could be handled by adding modifiers for determiners and a post-processing step to introduce universal quantification where necessary. However, there are no examples where quantification is relevant in the WSC, and only a single schema-pair which is affected by the omission of tense information:

Fred is the only man still alive who remembers my great-grandfather. He [is/was] a remarkable man. Who [is/was] a remarkable man? Answer 0: Fred Answer 1: My great-grandfather

Since we cannot differentiate between verbs in different tenses, our representation of both schemas is identical; thus, unfortunately, we are forced to answer one of these schemas incorrectly if the other is answered correctly (assuming the same examples are chosen).

5.3.2 Background Knowledge Generation

Once we have generated an ASG that can act as a domain-specific semantic parser, we then search for relevant background knowledge and augment the ASG in order to be able to capture synonymy, entailment, and similarity of phrases which might have originally produced different representations. Although we want to augment the ASG with as much semantic information as possible, it is vital that the annotations are accurate. Otherwise, there is a chance that background knowledge may spuriously make irrelevant examples relevant, and thus make a learning task significantly more difficult, or even unsatisfiable. In other words, we have a slight preference for precision over recall. We considered a number of lexical and commonsense knowledge bases, including WordNet, VerbNet, FrameNet, PropBank, ConceptNet, WebChild and ATOMIC. In this project, we

choose to focus on WordNet. Its focus on lexical semantics makes it highly structured, and it has been produced by hand by lexicographers, making it significantly more accurate when compared to knowledge bases that are built through crowd-sourcing or automated knowledge extraction methods.

WordNet The background knowledge generation process is applied per token. Let us consider an example.

Bob paid for Charlie’s college education. He was grateful. (5.3)

The token paid has the lemma pay and a verb POS tag. Looking this up in WordNet, we find that there are 11 possible senses for the verb pay. Generating background knowledge for all of these entries is likely to result in a huge volume of mostly irrelevant knowledge, so we first use a word sense disambiguation model to select a single sense. In this case, paid is assigned the most common sense, pay.v.01 (“give money, usually in exchange for goods or services”). The first rule produced is thus

event(paid_1, pay_v_01, Arg) ← event(paid_1, pay, Arg).

which links the lemma pay to its sense. This automatically captures some synonymy, as other lemmas may be assigned to the same sense and thus produce matching event literals. The next step is to add knowledge from hypernymy and entailment. For example, pay is marked as a hyponym of the sense give.v.03, so we additionally produce the rule

event(ID, give_v_03, Arg) ← event(ID, pay_v_01, Arg).
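As a rough sketch of how these rules could be generated, the snippet below uses NLTK's WordNet interface. The project's word sense disambiguation model is not reproduced here, so the most frequent sense stands in for it, and the helper names are illustrative rather than the actual implementation.

from nltk.corpus import wordnet as wn

def sense_id(synset):
    return synset.name().replace(".", "_")      # e.g. pay.v.01 -> pay_v_01

def wordnet_rules(ident, lemma, pos=wn.VERB):
    """Background-knowledge rules for one event token."""
    synsets = wn.synsets(lemma, pos=pos)
    if not synsets:
        return []
    sense = synsets[0]                           # stand-in for the WSD model's choice
    rules = [f"event({ident}, {sense_id(sense)}, Arg) :- event({ident}, {lemma}, Arg)."]
    # Hypernymy and entailment lift the sense to more general or entailed senses.
    for related in sense.hypernyms() + sense.entailments():
        rules.append(f"event(ID, {sense_id(related)}, Arg) :- "
                     f"event(ID, {sense_id(sense)}, Arg).")
    return rules

for rule in wordnet_rules("paid_1", "pay"):
    print(rule)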

To more concretely demonstrate the utility of this background knowledge, let us consider another example:

Katrina bought some catnip for Maria’s brand new kitten, which made Maria feel very grateful. (5.4)

Here bought is assigned the sense buy.v.01, which, in WordNet, is marked to entail both choose.v.01 and pay.v.01. Note how the addition of background knowledge therefore results in the literal event(paid_1, pay_v_01, bob_0) holding for (5.3) and event(bought_1, pay_v_01, katrina_0) holding for (5.4), thus allowing a single rule with a body atom

event(P ayEvent, pay_v_01, P ayer)

to cover both examples.

ConceptNet We also add a small amount of background knowledge from ConceptNet. ConceptNet defines some 40 kinds of relations between words and phrases. Currently we use just one: the HasProperty relation. The reasoning behind this choice is that most other relations are (a) defined by WordNet (e.g. IsA, PartOf) or (b) express the commonsense concepts that we are intending to learn (e.g. CapableOf, UsedFor).

ConceptNet is more challenging to work with than WordNet, since it does not include word senses and thus often returns spurious results. For example, the word ball has the properties round and spherical, but also formal. Furthermore, we find that ConceptNet can often return results that are never useful — for example, steel has the property strong_and_hard, yet no other word in ConceptNet shares this property. To provide an example of how we use ConceptNet’s HasProperty relation, let us consider the schema:

The large ball crashed right through the table because it was made of steel.

In this schema, the nominal steel is associated with the property hard, so the following rule is generated: modifier(conceptnet1, hard, ID) ← nominal(ID, steel). Now consider an example of the form:

A stout tree trunk made of stones and cement — has fallen and crashed through the tomb’s surface.

Here the word stones is also assigned the property hard, producing the rule:

modifier(conceptnet2, hard, ID) ← nominal(ID, stones).

thus allowing us to specify that the thing which is made of a hard material is the thing that crashes through another object.
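The following Python sketch shows how such rules could be generated. It assumes the public ConceptNet 5 REST API at api.conceptnet.io, which may differ from the lookup actually used in the project, and it omits the filtering of spurious or never-shared properties discussed above.

import requests

def has_property_rules(lemma, prefix="conceptnet", limit=20):
    """Generate modifier rules from ConceptNet's HasProperty edges for a nominal lemma."""
    response = requests.get("http://api.conceptnet.io/query",
                            params={"start": f"/c/en/{lemma}",
                                    "rel": "/r/HasProperty",
                                    "limit": limit}).json()
    rules = []
    for i, edge in enumerate(response.get("edges", []), start=1):
        prop = edge["end"]["label"].lower().replace(" ", "_")
        rules.append(f"modifier({prefix}{i}, {prop}, ID) :- nominal(ID, {lemma}).")
    return rules

print(has_property_rules("steel"))   # expected to include the 'hard' rule shown above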

Chapter 6

Learning Commonsense Knowledge

In this chapter, we discuss our approach to learning commonsense knowledge on top of our automatically-generated base ASGs. There are two key stages to our approach: the first is a knowledge hunting stage, in which we seek to find and comprehend examples which are related to the problem being solved; and the second is a learning stage, in which we seek to form a hypothesis that explains our understanding of those examples. We can then take this learned hypothesis and apply it to the problem at hand.

This approach is motivated, in part, by how people are thought to learn commonsense knowledge. A key part of human learning is learning by language (Levesque, 2017). For example, Levesque (2017) notes that our understanding that bears hibernate likely comes from reading and not direct experience. Our knowledge hunting stage emulates recalling knowledge that one might have heard or read before. In some cases, we might simply learn concepts by direct instruction: that is, we might read directly that bears hibernate. In our approach, such cases would be imitated by the presence of existing background knowledge. However, in many other cases, we have never encountered the knowledge directly, but we are able to make analogies to things that we do know about: this is what our learning stage attempts to do. For example, while we may never have been told directly that something that is hibernating is asleep, we may have learned to associate the two concepts by encountering examples where a hibernating animal sleeps, while we know that animals do not always sleep in general.

6.1 Knowledge Hunting

Firstly, we discuss our approach to knowledge hunting. The broad goal of this stage is to find texts that express some commonsense knowledge, in an unambiguous way, that is relevant to the problem at hand. To motivate the process, let us consider again the Winograd schema:

Jim yelled at Kevin because he was so upset. Who was so upset? Answer 0: Jim Answer 1: Kevin

The kind of text that we might hope to find for this schema is:

Example 6.1. An ex yelled at me because she got upset at how much we disagreed on how I should handle a hypothetical situation.

We note that in this case, the pronoun she is unambiguous. There are two candidate referents, an ex or me, but an ex is the only valid referent since she is a third person pronoun. This text is useful because it is an example that demonstrates that the person who yells is the person who is upset, which is the correct hypothesis to be made in this case.

Knowledge Hunting as a Ranking Problem In the existing literature, knowledge hunting is treated as a problem of information retrieval. As we discuss in Section 3.2, the goal of these approaches, given a coreferencing problem, is to generate a query (usually taking the form of a regular expression) that finds texts expressing knowledge that is relevant for solving that problem. We slightly reframe this view, splitting our approach into two stages, and treating knowledge hunting mainly as a ranking problem. The first stage sources a (possibly very large) set of relevant texts (the information retrieval part), while the second determines which of those texts are most relevant (the ranking part). Separating the retrieval of possibly relevant texts from the process used to determine their relevance allows us both to retrieve more texts for consideration (hopefully improving recall) and to rely on far more sophisticated methods for determining relevance than just simple regular expressions (hopefully improving precision). In this section, we assume a set of candidate texts exists and discuss only our approach for determining relevance. We consider the sourcing of the candidate texts to be an implementation detail, and discuss it in Section 7.2.2.

Goals of Knowledge Hunting In previous approaches to knowledge hunting, the goal is purely to find examples that help to determine the correct answer, like Example 6.1. This means you would want to rule out an example of the form:

Example 6.2. Anyway the mum went mad at my son and shouted at him. He was upset - he is only 5 - and he ran upstairs.

In this example, the pronoun he is again unambiguous, this time because of its gender, but this text, unlike Example 6.1, seems to suggest that the person who is shouted at is the person who is upset. Clearly this is not what we want to learn in order to solve the original problem, but we argue that the example is still relevant. In Example 6.1, the connective because signifies that there is some causality — specifically, somebody being upset causes them to yell. Meanwhile, Example 6.2 has no such construction — here, the fact that the son being upset follows the son being shouted at in the text instead signifies causality in the other direction, specifically that somebody being shouted at is followed by them becoming upset. Example 6.2 is still relevant because it tells us that the hypothesis that the person who yells is always the person who is upset is not a good hypothesis (it is too general), and forces us to take into account the causality expressed in Example 6.1. This motivates the first goal of our knowledge hunting approach:

1. Any text that is able to relate either of the candidate referents with the pronoun should be determined to be relevant.

However, our learning approach does not scale well with many examples (especially when those examples are complex and feature many different linguistic constructions), necessitating a limit on the number of examples selected for consideration by the learner. If we have to choose between Examples 6.1 and 6.2, clearly we would prefer Example 6.1, since that is the text which gives us the correct answer. To demonstrate more clearly that our first goal is not sufficient on its own, let us consider a slightly more complex Winograd schema:

Tom gave Ralph a lift to school so he wouldn’t have to drive alone. Answer 0: Tom Answer 1: Ralph

and the example:

Example 6.3. I gave Juli my keys so she could drive.

This example satisfies our first goal, since we could relate the person who drives to the person who is given something. There are many similar, short and simple examples that will only serve to teach the learner nonsense. But the example

Example 6.4. The couple first met in 1977, when Ralph — driving the same Beaumont — gave Cheryl a lift to Canada celebrations at Wascana Centre.

is clearly a lot more relevant, because it talks about giving somebody a lift. This demonstrates two things. Firstly, simple features such as length or syntactic similarity are not, on their own, a good indicator of whether a knowledge text is relevant¹. Secondly, matching as much as possible of the context from the original problem is vital, in order to provide the learner an opportunity to work out which context is important. Thus the second goal of our knowledge hunting approach is that:

2. The text which shares more important context with the problem should be preferred.

1 This example additionally provides motivation as to why using existing machine learning approaches for semantic textual similarity may not be the ideal approach. The first example is syntactically similar to the problem, and the main concept of the sentence is giving (the keys to Juli) — the same as in the problem. Meanwhile, the second sentence is syntactically quite different and its main concept is (the couple) meeting. There is also a lot of irrelevant context in the second example, which might lead a semantic textual similarity model to assign it a very low similarity score. Indeed, when representing the sentences with Google's Universal Sentence Encoder (Cer et al., 2018), and comparing them with the schema's representation, we found that the cosine distance for Example 6.4 was greater than for Example 6.3 (i.e. Example 6.3 was determined to be more similar). Meanwhile, a more specialised approach can focus in on the nominals "matching" the candidates and the pronoun from the problem and ignore the rest of the sentence entirely.

Figure 6.1: An overview of our approach for constructing a query for initial filtering from a Winograd schema's knowledge graph, for the schema "The trophy doesn't fit in the brown suitcase because it's too large." Dotted boxes in green show the nodes used to generate queries for the target referent, red for the other candidate and blue for the pronoun. The queries are simplified in the figure for brevity; the final query takes the form ((cand1|cand2).*pronoun)|(pronoun.*(cand1|cand2)).

6.1.1 Initial Filtering

Our knowledge hunting process is computationally expensive, as it is based on semantics and thus requires a full semantic parse of every example to be considered. Therefore, we first have a stage to retrieve all possibly relevant examples and discard the majority of texts, which are completely unrelated. The main aim here is to achieve perfect recall: we should not discard any texts that could be relevant.

Preprocessing To allow ourselves to easily capture different forms of the same word (e.g. different conjugations of a verb), synonymy and certain grammatical structures without building complex queries, all examples are first preprocessed. Specifically, we tokenise each example and determine the part-of-speech tag, lemma and WordNet sense for each token. This step is performed just once, and the results over the entire set of examples are saved for all future queries to use.
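The sketch below illustrates the kind of per-token record this step produces. It is a minimal sketch only: it uses NLTK with a most-frequent-sense fallback in place of the CoreNLP server and dedicated WSD model used in our implementation, and the Token record and helper names are illustrative rather than our actual API.

from dataclasses import dataclass
from typing import List, Optional

import nltk  # requires the punkt, averaged_perceptron_tagger and wordnet data packages
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

@dataclass
class Token:
    text: str
    pos: str               # Penn Treebank part-of-speech tag
    lemma: str
    sense: Optional[str]   # WordNet synset name, e.g. "bag.n.06"

def fallback_sense(lemma: str, pos: str) -> Optional[str]:
    # Stand-in for a real WSD model: take the most frequent WordNet sense, if any.
    wn_pos = {"N": wn.NOUN, "V": wn.VERB, "J": wn.ADJ, "R": wn.ADV}.get(pos[:1])
    synsets = wn.synsets(lemma, pos=wn_pos) if wn_pos else []
    return synsets[0].name() if synsets else None

def preprocess(sentence: str) -> List[Token]:
    lemmatizer = WordNetLemmatizer()
    tokens = []
    for text, pos in nltk.pos_tag(nltk.word_tokenize(sentence)):
        wn_pos = {"N": "n", "V": "v", "J": "a", "R": "r"}.get(pos[:1], "n")
        lemma = lemmatizer.lemmatize(text.lower(), pos=wn_pos)
        tokens.append(Token(text, pos, lemma, fallback_sense(lemma, pos)))
    return tokens

# e.g. preprocess("The trophy doesn't fit in the brown suitcase because it's too large.")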

Query Generation The filtering is then performed by executing a query over the preprocessed examples. To demonstrate how the query is constructed, let us consider an example. The knowledge graph for "the trophy didn't fit in the suitcase because it was too large" is shown in Figure 6.1. This query is constructed by considering the knowledge graph as follows:

1. We firstly identify the two candidate referents and the pronoun. Our approach to this is rather rudimentary and is described in Subsection 7.2.1. In Figure 6.1, we show the two candidates in red and green and the pronoun in blue.

2. We then consider each of these nominals in turn, generating a set of queries which describe minimally the context of that nominal.

(a) If the candidate is not a proper noun, then a query is generated based on the candidate itself. E.g. for "suitcase" a query is generated which will match either the lemma (suitcase) or its sense (bag.n.06) in the preprocessed text.

(b) For each node in the semantic graph of which the nominal is an argument, an additional query is generated. E.g. for "suitcase" a query is generated to match the lemma or sense of the word brown. There are two special cases here. Firstly, if the nominal is the object of a modifier, then the query is based on both the modifier and the entity being modified (to avoid spurious matches on prepositions alone). For our suitcase example, this means generating a query matching any word with the lemma or sense of fit followed by any word with the lemma or sense of in. Secondly, if the modifier is a possession modifier, then the query is instead based on the word's part-of-speech: specifically, we are looking for a PRP$ (possessive pronoun: my, your, his, her etc.) or a POS (possessive ending: 's) tag. Note how, motivated by our goal to achieve good recall, we do not explore deep into the knowledge graph, considering only the nodes that are directly connected to the nominal being considered.

3. For each nominal, a single query is constructed by allowing any of the individual queries from the previous step to match. For our suitcase example, we therefore would build a query of the form

(lemma = suitcase | sense = bag.n.06) or (lemma = brown | sense = brown.a.00)
or ((lemma = fit | sense = fit.v.02) followed by (lemma = in | sense = in))

The reason to allow any of these queries to be matched is that any of them could be the relevant part — for example, in this schema it is clearly the event fit that is most important, but in the schema

Frank felt vindicated when his longtime rival Bill revealed that he was the winner of the competition.

clearly the modifier rival is important, while in a schema like

Sam tried to paint a picture of shepherds with sheep, but they ended up looking more like golfers.

clearly the identity of the nominals shepherds and sheep themselves is important.

4. Finally, a query is generated to match any sentence which has some matching context for the pronoun and some matching context for either of the candidates, i.e. the query takes the form (pronoun.*(cand1|cand2))|((cand1|cand2).*pronoun). This is motivated by our goal to consider any relevant text — there is still something to be learned from a text which accepts or excludes just one of the candidates. This final query is then compiled and checked against every example. A minimal sketch of this final check appears below.
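The following sketch shows how the final check might be applied directly over the preprocessed Token records from the earlier sketch, rather than as a compiled regular expression; the helper names and the example term sets are illustrative, not our implementation's API.

from typing import List, Set

def positions(tokens: List[Token], terms: Set[str]) -> List[int]:
    # Indices of tokens whose lemma or WordNet sense matches any of the given terms.
    return [i for i, t in enumerate(tokens) if t.lemma in terms or t.sense in terms]

def bigram_positions(tokens: List[Token], first: Set[str], second: Set[str]) -> List[int]:
    # Handles the "fit followed by in" style of query used for modifiers.
    return [i for i in positions(tokens, first)
            if i + 1 < len(tokens) and (i + 1) in positions(tokens, second)]

def passes_filter(tokens: List[Token], pronoun_terms: Set[str],
                  cand1_terms: Set[str], cand2_terms: Set[str]) -> bool:
    # Mirrors (pronoun.*(cand1|cand2))|((cand1|cand2).*pronoun): some context for the
    # pronoun and some context for at least one candidate must both occur in the sentence.
    return bool(positions(tokens, pronoun_terms)) and (
        bool(positions(tokens, cand1_terms)) or bool(positions(tokens, cand2_terms)))

# passes_filter(preprocess("My briefcase was too large to fit in the overhead locker."),
#               pronoun_terms={"large", "large.a.01"},
#               cand1_terms={"trophy", "trophy.n.01"},
#               cand2_terms={"suitcase", "bag.n.06", "case.n.05"})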

Figure 6.2: An overview of how we define the context function for a node, using the schema "The trophy doesn't fit in the brown suitcase because it's too large" and the knowledge text "The sofa that I bought wouldn't fit in the house, because the sofa was too large for it." We preprocess the graph to "collapse" modifiers and then find all paths to the node by following the edges backwards.

6.1.2 Relevance Scoring

Our chosen learning algorithm is limited in the number of examples it may take. Therefore, after reducing the set of candidate knowledge texts to the ones which could be relevant, we then attempt to determine which ones are actually the most relevant.

Fitness Function To determine which examples are most relevant, we propose a fitness function that is motivated by previous work on relevance theory in knowledge graphs (Schüller, 2014). Schüller (2014) finds that the single most helpful heuristic concerns how many nodes from the background knowledge are "combined" with nodes from the Winograd schema. We come to a similar conclusion, although our method for calculating a value for such a heuristic is quite different to his proposal. To determine how much context is shared, we first define a function, context, which returns a set of paths defining, as fully as possible, the context of a nominal candidate. Figure 6.2 shows an example of how this is done. Firstly, we collapse modifiers in the original knowledge graph. Specifically, modifiers which link two concepts are collapsed into a single edge from the modified entity to the entity that modifies it. This gives us a rough argument-structure view of the text. We then start from each nominal node and follow the incoming edges recursively, building up a list of paths to each nominal. As shown in Figure 6.2, for our ongoing example, this gives us the following sets of paths2:

2 For brevity, we ignore senses here and look only at lemmas. In reality, as was the case for initial filtering, these paths need to encode both the lemma and the sense of each node to allow matching of synonyms and entailed concepts.

context(trophy) = {(trophy, arg0), (not fit, arg0)}
context(suitcase) = {(suitcase, arg0), (brown, arg0), (not fit, in)}
context(it) = {(large, arg0), ((too, arg0), (large, arg0)), ((not fit, because), (large, arg0))}

Given an example with a labelled coreference, such as the sofa sentence shown in Figure 6.2, we perform the same process on the example and calculate an intersection-over-union score. For the example in Figure 6.2, this would be

IOU(knowledge, schema) = |context(sofa1) ∩ context(trophy)| / |context(sofa1) ∪ context(trophy)|
                       + |context(house) ∩ context(suitcase)| / |context(house) ∪ context(suitcase)|
                       + |context(sofa2) ∩ context(it)| / |context(sofa2) ∪ context(it)|
                       = 1/4 + 1/4 + 3/4 = 1.25

The motivation behind this formulation is simply that we want to encourage shared context between the candidates in the schema and their matched nodes in the knowledge text, and discourage missing context from either side (which could easily result in the answer being swapped). Using an intersection-over-union score has the additional benefit of normalising the scores to between 0 and 3, since each Winograd schema has one pronoun and two candidates.

Schüller (2014) also proposes several other heuristics, such as adding costs for unmatched background nodes, the radius of graphs and others. We choose not to investigate these heuristics as they have already been shown to be much less useful. We investigated weighting based on path length (to give greater emphasis to "long chains" of matched nodes) but found it had no impact on results.

However, there are several other heuristics that we found to be useful, albeit much less so than our context-based IOU score. These include the source of the knowledge (e.g. an example which has been labelled by hand is preferable to an example which was labelled automatically), whether the knowledge text involves just one of, or both of, the candidates (knowledge texts which form matches to both candidates are preferred), and the depth of the parse tree (shorter, less complex texts are far faster to parse, so favouring these examples speeds up the learning considerably). Our final fitness function for a knowledge text k with respect to a schema s is defined as follows:

fitness(k, s) = (IOU(k, s), −parse tree depth(k))

We prefer whichever knowledge text achieves the highest score: i.e. we first choose the knowledge which is determined to be the most contextually relevant, and if two texts have the same IOU score, we then choose the one with the smallest parse tree depth.

Example Selection Examples are then selected simply by choosing the top N knowledge texts according to the fitness function.
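A minimal sketch of this scoring and selection step is given below, assuming the context of each nominal has already been computed as a set of hashable paths; the container formats and function names are illustrative, not the interface of our implementation.

from typing import FrozenSet, List, Tuple

Context = FrozenSet[tuple]   # e.g. frozenset({("trophy", "arg0"), ("not fit", "arg0")})

def iou(a: Context, b: Context) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def iou_score(matches: List[Tuple[Context, Context]]) -> float:
    # `matches` pairs the context of the pronoun and each candidate in the schema with
    # the context of its matched nominal in the knowledge text (where a match exists).
    return sum(iou(schema_ctx, knowledge_ctx) for schema_ctx, knowledge_ctx in matches)

def select_top_n(scored: List[Tuple[str, float, int]], n: int) -> List[str]:
    # `scored` holds (knowledge_text, iou_score, parse_tree_depth) triples; the fitness
    # fitness(k, s) = (IOU(k, s), -parse_tree_depth(k)) is compared lexicographically.
    ranked = sorted(scored, key=lambda item: (item[1], -item[2]), reverse=True)
    return [text for text, _, _ in ranked[:n]]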

6.2 Learning Task Generation

We now discuss, given a Winograd schema, a set of relevant knowledge texts (found as described in Section 6.1), and a base ASG capable of representing the schema and texts (generated as described in Section 5.3), how to generate a learning task whose goal is to learn the commonsense concepts encoded in the knowledge texts.

The Role of Learning in our Approach It is important to note that our approach is not a simple case of taking an existing reasoning approach (e.g. Sharma, 2019) and attempting to learn a program to encode that reasoning process, in that we do not attempt to learn a generalised program for how to extract commonsense knowledge. Such an approach would firstly require us to have training examples labelled with the relevant commonsense knowledge or at least their answers (a privilege which the WSC does not afford us), would secondly constitute an extremely complex task that would need to be able to deal with hundreds of domain-general examples and generate an excessively large hypothesis, and would thirdly provide no real tangible benefit compared to just using the original reasoning approach.

Instead, we propose an approach for generating learning tasks to learn individual snippets of commonsense knowledge. This, in many ways, reflects existing reasoning-based approaches (Emami et al., 2018; Sharma, 2019), modifying just a small component of these approaches to allow us to learn a generalised rule encoding the snippet of commonsense knowledge (which also functions as an explanation), instead of just answering the question directly.

Since each question in the WSC requires its own unique snippet of commonsense knowledge, we generate a separate learning task per schema. Although in many cases it would be preferable to treat the schemas in pairs (the knowledge encoded in the second schema is usually at least somewhat similar to the first), we choose not to do so. As noted by Levesque et al. (2012), this would be a "cheap trick" and would lead us to overestimate our ability to work out the necessary commonsense knowledge for a single schema.

As we will see, the decision to learn individual snippets of commonsense knowledge considerably decreases the challenge for the learner, and in fact our learning tasks are remarkably simple, often amounting to just concept learning. A lot of the heavy lifting is done in previous steps: the ASG handles most of the challenge of natural language understanding, and the knowledge hunting is responsible for performing quite a bit of reasoning. The latter is necessary simply because the WSC is difficult: it is a domain-general task, and if the learner had to deal with irrelevant examples, it would need to explain an excessive number of examples before finally learning a suitable hypothesis, and furthermore would need to explore an excessively large hypothesis space. Another pre-processing task we take on to aid the learner is the labelling of examples, which we discuss in the next section. Clearly some form of labelling examples is necessary, as the learner needs something to learn from. But the difficulty of sourcing training examples and, in particular, negative examples, necessitates that our labelling approach does a fair amount more work for the learner, as we will see.

Having done all of this pre-processing, a very simple reasoning process to match the candidates in the knowledge texts to the ones in the schema, followed by a simple statistical approach (Emami et al., 2018) or final reasoning step (given a manually selected example)

(Sharma, 2019) could be used to determine the answer straight away. But in our approach, the learner has a simple but critical task: it must learn how to explain the labelled knowledge texts. We therefore never provide the learner with any information about how candidates in the knowledge texts can be matched to the ones in the schema — but instead rely on the learner to learn an explanation that we hope will carry over to the schema.

6.2.1 Example Translation

The first step in generating our learning tasks is to translate the knowledge texts into positive and negative examples for learning. There are three observations which motivate our process for constructing examples:

1. Necessity of negative examples. Firstly, let us note the need for negative examples, in some form. Excluding all answer sets with certain qualities is what forces the learner to make its hypothesis more specific. Otherwise, the learner would always predict the shortest, most general hypothesis resulting in a satisfiable program that covers all the positive examples. Such a hypothesis is likely to be too general to distinguish between the two candidates without a very carefully-structured hypothesis space, and even more likely to correspond to a nonsensical explanation.

2. Difficulty of sourcing negative examples. In the context of the Winograd Schema Challenge, negative examples would need to take the form of syntactically correct English sentences that are semantically invalid, because they contradict some commonsense knowledge that a human reader might have. This represents a significant challenge, as it is not clear how one might source such sentences. As we will see, even sentences which are semantically valid and encode relevant commonsense knowledge can be difficult to find. Furthermore, language is complex, and it is not usually the case that you can take a semantically valid sentence and manipulate it in a mechanical way to make it semantically invalid. This motivates an approach where negative examples are constructed from positive examples, but annotated with context describing some interpretation that we know to be incorrect.

3. Performance implications of negative examples. Finally, although we have been discussing "negative examples" so far, it is important to note that we can force the learner to specialise its hypothesis without negative examples in the traditional sense. Specifically, if an ASG is stratified, then a parse tree produces a unique answer set, and the semantics of a negative example (that no answer set may extend the example) can be expressed simply by an exclusion inside a positive example (to ensure that the single answer set does not extend the example) (Law et al., 2019). This is beneficial since, as noted by Law et al. (2019), "ILASP tasks with no negative examples tend to run faster than equivalent tasks with negative examples, so when it is possible to modify the representation to eliminate negative examples it is often advantageous to do so." This motivates an approach in which the ASG is stratified, thus ruling out constructions of the form

is_same_as(pronoun, cand1) ← not is_same_as(pronoun, cand2).
is_same_as(pronoun, cand2) ← not is_same_as(pronoun, cand1).

which would produce one answer set with is_same_as(pronoun, cand1) and another with is_same_as(pronoun, cand2).

Labelling Examples To demonstrate how we label examples, let us again consider the knowledge text from Figure 6.2, "the sofa1 that I bought wouldn't fit in the house, because the sofa2 was too large for it," with respect to the Winograd schema "the trophy doesn't fit in the suitcase because it's too large." For this example we would aim to generate some context of the form:

valid ← is_same_as(sofa2, sofa1), not is_same_as(sofa2, house).        (6.1)
← not valid.

which serves to rule out any answer set in which either the two sofas have not been matched to the same entity, or in which the second sofa entity (corresponding to the pronoun in the Winograd schema) has been matched to the house.

We note that this is equivalent to a formulation in which {is_same_as(sofa2, sofa1)} and {is_same_as(sofa2, house)} are the inclusions and exclusions of the example respectively (i.e. the constraint in (6.1) is satisfied if and only if there is an answer set A with {is_same_as(sofa2, sofa1)} ⊆ A and A ∩ {is_same_as(sofa2, house)} = ∅). We express this using context as the learning from ASGs formalism does not currently support examples with specific inclusions or exclusions. Furthermore, since our generated ASG is guaranteed to be stratified, a parse tree has a unique answer set before the constraint in (6.1) is applied. Thus a single positive example with the context in (6.1) is equivalent to a positive example with the context {is_same_as(sofa2, sofa1)} and a negative example with the context {is_same_as(sofa2, house)}.

Automated Labelling of Examples In order to automate the generation of the kind of rules in (6.1), we can simply reuse the work done in Subsection 6.1.2 for knowledge hunting. Specifically, let us consider again the context function, which is defined as shown in Figure 6.2. Any nominal in the knowledge text which shares context with the schema's pronoun or one of its candidate referents is considered for labelling. We consider each match for the pronoun alongside each match for one of the candidates. Whether the label assigned is a positive one (i.e. an inclusion) or a negative one (an exclusion) depends on whether the two nominals have been marked as coreferents according to a deterministic process, which we describe in Section 7.2.1. We note that if one of the nominals is a pronoun and we are not sure of its referent, then that nominal will not be used to generate any context. Figure 6.3 illustrates some examples of this process. In particular, we note that if there is no match for a candidate not linked to the pronoun (i.e. we are not able to extract a "negative example" from the knowledge text), then an exclusion is generated by choosing any other entity from the knowledge text (see Figure 6.3c). This is done in an attempt to avoid generating a learning task with no exclusions, which would result in a hypothesis that is too general. On the other hand, in the case where there is no match for a candidate linked to the pronoun (no "positive example", see Figure 6.3d), we do not generate an inclusion. Adding an inclusion based on a nominal that does not share any context with a candidate from the schema makes the learning task much more challenging but is never useful, as it only serves to force the learner to learn an additional concept that cannot be applied back to the schema. However, as we have mentioned, examples without any inclusions can still be useful, as they force the learner to make its hypothesis more specific.
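A minimal sketch of how such a context rule could be assembled from the matched nominals is shown below; the deterministic coreference decisions are assumed to be given as input, and all names are illustrative rather than our implementation's API.

from typing import Dict, List, Tuple

def label_example(pronoun_matches: List[str], candidate_matches: List[str],
                  coreferent: Dict[Tuple[str, str], bool]) -> str:
    # `coreferent[(p, c)]` records whether the deterministic process decided that nominals
    # p and c in the knowledge text corefer; pairs it is unsure about are simply omitted.
    inclusions, exclusions = [], []
    for p in pronoun_matches:
        for c in candidate_matches:
            decision = coreferent.get((p, c))
            if decision is True:
                inclusions.append(f"is_same_as({p}, {c})")
            elif decision is False:
                exclusions.append(f"not is_same_as({p}, {c})")
    body = ", ".join(inclusions + exclusions)
    return f"valid :- {body}.\n:- not valid." if body else ""

# label_example(["sofa2"], ["sofa1", "house"],
#               {("sofa2", "sofa1"): True, ("sofa2", "house"): False})
# reproduces the context shown in (6.1).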

Figure 6.3: An overview of how we generate examples from knowledge texts, relative to Winograd schemas. The subfigures cover: (a) the simple case based on our original example from Figure 6.2, producing the context valid :- is_same_as(sofa2, sofa1), not is_same_as(sofa2, house); (b) an example where one entity in the knowledge text is matched to both the pronoun and one of the candidates in the schema, which can be handled in the same way as the simple case; (c) an example where there is no nominal in the knowledge text which is matched to a candidate and is not matched to the pronoun, in which case an exclusion is generated from some other nominal in the knowledge text to help rule out excessively general explanations; and (d) an example where there is no nominal in the knowledge text which is matched to both a candidate and a pronoun, in which case no inclusion is generated, since it would require the learner to explain an additional concept, making the learning task significantly more challenging.

6.2.2 Hypothesis Space Generation

Given a set of examples with context describing valid/invalid interpretations, and a base ASG able to parse them, we now finally discuss how to automatically generate a hypothesis space for the learner to search through in order to find an explanation. In constructing a hypothesis space, there are four qualities of the final learning task that we must consider:

1. Satisfiability. The hypothesis space must be large enough to include rules capable of explaining the examples.

2. Computational feasibility. The hypothesis space must be small enough to be generated and searched efficiently.

3. Ability to handle noise. Since we do not currently use weighted examples in our approach, optimising the size of the generated hypothesis space is key to ensuring that our approach handles noise well. Restricting the hypothesis space to a small number of simple explanations, in addition to making many tasks unsatisfiable, restricts the approach to shallow explanations. Meanwhile, increasing the hypothesis space not only significantly increases the computational cost of learning, but also allows us to "overfit" to the examples. A large hypothesis space allows us to propose very complex hypotheses which "encode the noise of our examples" by including highly specific conditions to handle these noisy cases.

4. Quality of learned rules. Different hypothesis spaces may result in rules being learned which result in the same final answer to a Winograd schema, but read very differently. For example, let us consider the schema "the fish ate the worm, it was tasty." Depending on the way in which the hypothesis space is constructed, examples of the form "I ate the tasty burger" might produce a rule of the form

modifier(learned_mod, tasty, X) ← event(_, eat, _, X).

which reads as "if something is eaten, then it is tasty". At first, this may seem like an acceptable rule, but clearly it is not the case that everything that is eaten is tasty. Furthermore, let us note that the directionality of the implication in this case is artificial. As long as the examples talk about eating and something being tasty (which they are likely to do, given our knowledge hunting approach), we could have equivalently learned that "if something is tasty, then it is eaten". This is arguably a better rule in this case, but it is still technically incorrect. An alternative approach might be to produce constraints such as:

← event(_, eat, _, X), not modifier(_, tasty, X).

expressing "something is not eaten unless it is known to be tasty". But such a hypothesis clearly suffers from the exact same limitations.

Overall, this kind of straightforward entailment does not occur in WSC problems. The rules learned rely on the context of the sentence. In its most precise form, the knowledge needed to solve this schema is that "something that is eaten may be described as tasty, but something that eats may not", or more formally:

∀X, Y, Z. [eats(X, Y) ∧ tasty(Z) ∧ (X = Z ∨ Y = Z) → X = Z]

You might attempt to approximate these semantics with an ASP rule of the form:

is_same_as(X, Z) ← event(_, eat, X, Y), modifier(_, tasty, Z),        (6.2)
                   candidate(X, Z), candidate(Y, Z).

where the predicate candidate(X, Z) is defined appropriately, i.e. to mean X may equal Z. However, it is important to note that this formulation also suffers from several limitations — most notably, it results in longer rules with more variables (and so requires a larger hypothesis space).

Head Mode Declarations With the above considerations in mind, our hypothesis space aims to generate rules with may_refer_to(P,C) at the head. The goal is to approximate the semantics of (6.2) by learning a rule of the form

may_refer_to(P, C) ← event(_, eat, _, C), modifier(_, tasty, P).

which would then be combined with some background knowledge of the form:

is_same_as(P, C) ← may_refer_to(P, C), pron(P), cand(C).

This formulation has the benefit both of generating more relevant explanations (which do not read with an artificial implication), and of simplifying the construction of the hypothesis space, since only one kind of head needs to be generated.

Automatic Generation of Body Mode Declarations In order to significantly constrain the hypothesis space while allowing for complex explanations, we generate body mode declarations based on the Winograd schema's knowledge graph. Specifically, hypotheses should be able to match on any subgraph of the schema's knowledge graph. Such an approach has the additional benefit that we can ensure that anything learned will be relevant — the learned rules will be applicable back to the schema itself. The process is very straightforward, combining a breadth-first search with a simple translation. To demonstrate how it works, let us consider an example.

modeh(may_refer_to(var(pron), var(cand))) We now traverse the knowledge graph from the pronoun and each of the candidates. We start at a depth of 0, adding just body declarations describing the initial nodes themselves, thus

Figure 6.4: An overview of how we generate a hypothesis space from a knowledge graph, for the schema "The trophy doesn't fit in the brown suitcase because it's too large." The process involves a breadth-first search starting from the candidate and pronoun nodes (depth 0), through the directly connected event and modifier nodes (depth 1), to the nodes describing those (depth 2). Each node is translated into a body mode declaration.

Let M_b(d) and vars(d) denote the body mode declarations produced and variables seen at depth d, respectively. Then we have:

M_b(0) = { modeb(nominal(var(cand), const(trophy))),
           modeb(nominal(var(cand), const(suitcase))) }

vars(0) = {pron, cand}

The possible values for a constant type are generated based on its lemma and its sense, as well as any senses which it entails due to background knowledge. This may generate a very large number of possible values. For example, suitcase generates the following possibilities:

constant(suitcase) = {suitcase, bag_n_06, baggage_n_01, case_n_05, container_n_01}

The process continues by following any edges from the variables seen so far, and adding these edges to the hypothesis space. At a depth of 1 we have:

M_b(1) = M_b(0) ∪ { modeb(event(var(large_13), const(large), var(pron))),
                    modeb(neg_event(var(fit_4), const(fit), var(cand))),
                    modeb(modifier(var(into_5), const(into), var(fit_4), var(cand))),
                    modeb(event(var(brown_7), const(brown), var(cand))) }

vars(1) = vars(0) ∪ {fit_4, into_5, brown_7, large_13}

along with definitions for each of the constants, and at a depth of 2 we get:

M_b(2) = M_b(1) ∪ { modeb(modifier(var(because_9), const(because), var(fit_4), var(large_13))),
                    modeb(event(var(too_12), const(too), var(large_13))) }

vars(2) = vars(1) ∪ {because_9, too_12}

at which point we have translated the entire graph, so we stop. Note that the variable types are based on IDs, to avoid any duplication — we want to constrain the hypothesis space as much as possible so that it can be generated and searched efficiently.
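The sketch below illustrates the breadth-first traversal under the simplifying assumption that each edge of the (collapsed) knowledge graph already carries its translated modeb declaration as a string; the graph datatype and function names are illustrative only, and the depth-0 declarations for the start nominals themselves (M_b(0)) are assumed to be added separately.

from collections import deque
from typing import Dict, List, Set, Tuple

def generate_modeb(neighbours: Dict[str, List[Tuple[str, str]]],
                   start_nodes: List[str], max_depth: int = 3) -> Tuple[List[str], Set[str]]:
    # neighbours[node] lists (next_node, modeb_declaration) pairs for the edges that
    # describe `node`; start_nodes are the pronoun and candidate variables (depth 0).
    declarations: List[str] = []
    seen: Set[str] = set(start_nodes)
    frontier = deque((node, 0) for node in start_nodes)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:        # cf. the depth limit discussed below
            continue
        for next_node, declaration in neighbours.get(node, []):
            if declaration not in declarations:
                declarations.append(declaration)
            if next_node not in seen:
                seen.add(next_node)
                frontier.append((next_node, depth + 1))
    return declarations, seen

# e.g. neighbours["cand"] = [("fit_4", "modeb(neg_event(var(fit_4), const(fit), var(cand)))"),
#                            ("brown_7", "modeb(event(var(brown_7), const(brown), var(cand)))")]
# With the remaining edges of Figure 6.4 filled in, generate_modeb(neighbours, ["pron", "cand"])
# reproduces the edge declarations in M_b(1) and M_b(2) and the variables in vars(2) above.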

Learning Exceptions The main issue with our approach to generating mode declarations is that the learner does not have a lot of flexibility to explain edge cases. For example, consider again the schema "the trophy doesn't fit in the brown suitcase because it's too large." Now suppose we are provided with the example "my bills won't fit in my wallet because it's too small" (our knowledge hunting approach would never select such an example in practice, but we use it for demonstration, as its format is similar to some of the "tricky" examples that might be selected). This example would be labelled in such a way that the learner must make is_same_as(it, wallet) true (since the wallet is the only singular noun in the sentence). The problem is that the learner does not have any way to make a hypothesis about something being too small, as the hypothesis space is generated from the schema alone, which only references the pronoun being too large.

One way to handle this case might be to also consider the examples when generating the hypothesis space. This would involve extending our approach above to consider all input knowledge graphs and merge variable types across the graphs where possible, and is an idea which has been explored in some depth by previous work (Lewis and Steedman, 2013; Chabierski et al., 2017), albeit in a slightly different context. However, the potential challenges of implementing such an approach are fourfold: firstly, it would significantly increase the size of the hypothesis space, especially as the number of examples is increased; secondly, it would allow the learner to easily encode the "noise" of noisy examples, rather than learning a more general rule which works most of the time; thirdly, it would mean the learner could now propose rules which may not be applicable back to the Winograd schema; and finally, it makes the learning task potentially quite challenging: instead of just having to learn the single concept relevant to the Winograd schema, the learner may now have to learn a variety of different concepts in order to fully explain the examples (e.g. the thing that is too large doesn't fit and the thing that is too small is not fit into). An alternative approach might be to allow negation as failure for body atoms, but this approach suffers from many of the same limitations.

Our approach instead allows many of these examples to be explained simply by means of learning strong exceptions3. This is done by adding the following head mode declaration:

modeh(cannot_refer_to(var(pron), var(cand)))

This allows the conclusion that some P may refer to some C to be refuted. Since ILASP does not support strong negation, these semantics are achieved simply by extending the background knowledge:

is_same_as(P, C) ← may_refer_to(P, C), not cannot_refer_to(P, C), pron(P), cand(C).

In particular, we note how our tricky example can now be explained by learning:

may_refer_to(P, C) ← pron(P), neg_event(F, fit, _), modifier(_, into, F, C).
cannot_refer_to(P, C) ← modifier(_, large, P), neg_event(F, fit, _), modifier(_, into, F, C).

3 One other avenue which might be worth exploring in future work is the use of weighted examples, which might be able to handle these exceptional cases as, in our experience, they are relatively rare. The weights could also encode our confidence in the relevance of different examples, allowing the least relevant examples to be mislabelled at a smaller cost than the most relevant ones. However, compared to weighted examples, allowing the learning of exceptions provides the additional benefit that it is sometimes actually desirable to explain a concept in terms of an exception, as it may produce a simpler hypothesis than some direct but complex argument.

I.e. we have learned an exception, that the thing that is large does not refer to the thing which is not fit into.

Optimising the Size of the Hypothesis Space The approach we have discussed thus far generates hypothesis spaces which are far too large to search, or indeed even to generate. The most significant issue arises from the use of constants which can take many values. Each set of constants C causes the hypothesis space to increase in size by roughly |C| times, which is a clear issue for our approach given the number of constants we generate. We propose four extensions to our approach to ensure that the hypothesis space does not grow too large:

1. Pruning based on examples. The first and most significant extension is that we reduce the search space by removing any atoms which do not occur in any of the examples. Most notably, a potential value for a constant is discarded if it never appears in any of the examples. If the set of values for a constant becomes empty, then any body mode declaration using that constant may also be discarded. Additionally, if there is some overlap in the set of senses produced by a word in the schema and a word in the example, then we can retain only their lowest common ancestor. Say, for example, that the only word from an example which generates a sense in constant(suitcase) is the word briefcase. Then all of the values for the suitcase constant, except for case_n_05, which is the lowest ancestor of the two words, could be discarded (a minimal sketch of this sense pruning is given after this list).

2. Limited search depth. The reason for our breadth-first search approach is that it allows us to easily restrict the search to a given depth in order to keep the size of the hypothesis space manageable. This is particularly useful for longer Winograd schemas, which can produce very large and deep knowledge graphs. In particular, we currently limit the depth of the search to 3, as we find that explanations rarely require context that is more than three nodes away from the pronoun or candidates. Even if they do, the necessary explanation is usually then too complex for us to generate in an acceptable amount of time.

3. No negation as failure. We currently enforce that all atoms in learned rules are positive. Although negation as failure does allow us to correctly answer a small number of schemas that cannot be answered without it, we find that explanations which use negation as failure are usually very difficult to interpret (e.g. the person who speaks is the person who breaks their something that isn't concentration). Furthermore, excluding negation as failure from the hypothesis space significantly reduces the time to generate the search space.

4. Mode bias constraints. Finally, we define a set of generic constraints over the hypothesis space to rule out a significant number of hypotheses that we can determine, from their structure alone, to be unsuitable. In total there are eighteen such rules and constraints, taking three forms. Firstly, we use the constraint

← body(holds_at_node(nominal(anon(_), _), _)).

to prevent any hypothesis which includes a nominal without linking that nominal to some event or modifier. A rule which breaks this constraint would imply that the mere presence of some nominal is enough to influence the result, which we deem to be extremely unlikely. The second and third sets of rules are used to similarly exclude modifiers and events which are not linked in any way to the argument structure. For example, consider the rules:

has_context(X) ← body(holds_at_node(modifier(_, _,X, _), _)). has_context(X) ← body(holds_at_node(modifier(_, _, _,X), _)). ... ← body(holds_at_node(event(X, _, anon(_), anon(_)), _)), not has_context(X). ...

The first two rules express that an event has context if there is a body atom that modifies it; the constraint then asserts that an event must either specify at least one of its arguments, or be described by some context.
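The sense pruning mentioned in point 1 can be sketched with NLTK's WordNet interface as below; the function name is illustrative and the synset names follow the suitcase/briefcase example above.

from nltk.corpus import wordnet as wn  # requires the NLTK wordnet data package

def prune_senses(schema_senses, example_senses):
    # Keep only the lowest common ancestors of the schema word's senses and the senses
    # generated by words in the examples; if nothing is shared, the constant is discarded.
    kept = set()
    for s in schema_senses:
        for e in example_senses:
            kept.update(wn.synset(s).lowest_common_hypernyms(wn.synset(e)))
    return {synset.name() for synset in kept}

# prune_senses({"suitcase.n.01"}, {"briefcase.n.01"}) is expected to retain only
# case.n.05, the lowest ancestor shared by "suitcase" and "briefcase".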

Other Constraints Some final constraints on our hypothesis space, which ILASP requires us to set, are chosen as follows:

• Maximum rule length and maximum number of literals. The number of body literals permitted per rule is capped at 5 and the rule length at 6 (increased from their defaults of 3 and 5 respectively). This allows the pronoun and candidate to simultaneously be specified by two or three atoms each, thus fairly complex rules can be learned. There are a small number of examples that may be explained by allowing longer rules; however, we determined the performance cost of increasing the limit to be too great4.

• Maximum number of variables. Initially, an upper bound on the number of variables was established based on the number of variable types in the mode declarations. The number of variables was later capped at 12 in order to improve performance, as we never observed more than 12 variables being used (although there is one schema which uses exactly 12 variables).

• Recall. For body atoms, the recall is always set to 1, as our hypothesis space is structured in such a way that one body mode declaration should not need to be used multiple times. We do not limit the recall for head mode declarations, i.e. multiple rules may be learned.

• Constraints and choice rules. We do not allow constraints or choice rules to be learned.

4 Increasing the maximum rule length to 7 increases the average size of the hypothesis space by over 100 times — to approximately 40 000 rules.

6.3 Applying Learned Knowledge to Winograd Schemas

Each learning task generated by the method we have described in Section 6.2 produces as a result either an ASG extended with commonsense rules that explain the provided examples, or an assertion that the learning task is unsatisfiable (i.e. there is no hypothesis in our hypothesis space capable of explaining the examples). Applying the learned rules back to the original schema is then an extremely straightforward process. In particular, let us consider again our running example of "the trophy doesn't fit in the brown suitcase because it's too large." Given suitably selected examples, the returned ASG will include the learned rule:

start -> s { may_refer_to(P, C) :- event(_, large, P), neg_event(_, fit, C). ... }

Hence if the semantic parse of a sentence introduces an object which is large and another object which doesn't fit at the root node, we determine that the thing that is large may refer to the thing that doesn't fit. This rule is combined with background knowledge in a similar way as in the learning tasks themselves:

#background { is_same_as(P, C) :- pron(P), cand(C), may_refer_to(P, C), not cannot_refer_to(P, C). ... }

The Winograd schemas are passed to the ASG with context to specify their pronoun p (pron(p)) and two candidates c1 and c2 (cand(c1) and cand(c2)). Then the candidate c1 will be returned as the answer if is_same_as(p, c1) holds at the start node and is_same_as(p, c2) does not5. This can be checked by ensuring that when the former literal is provided as context, the generated program is satisfiable, and when the latter is provided, it becomes unsatisfiable. Checking whether c2 is the answer is handled in the same way.
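A minimal sketch of this final check is given below. It assumes a wrapper parses_with_context(asg, text, context) around the ASG solver that returns True when the schema can be parsed with the given context and the resulting program is satisfiable; the wrapper and function names are illustrative, not the solver's actual interface, and here each literal is asserted via a constraint of the form ":- not is_same_as(p, c).", which is one way the check described above could be realised.

from typing import Callable, List, Optional

def resolve_schema(asg, schema_text: str, pron: str, cand1: str, cand2: str,
                   parses_with_context: Callable[[object, str, List[str]], bool]) -> Optional[str]:
    base = [f"pron({pron}).", f"cand({cand1}).", f"cand({cand2})."]

    def is_answer(c: str, other: str) -> bool:
        # c is the answer if asserting is_same_as(pron, c) leaves the program satisfiable
        # while asserting is_same_as(pron, other) does not.
        return (parses_with_context(asg, schema_text, base + [f":- not is_same_as({pron}, {c})."])
                and not parses_with_context(asg, schema_text, base + [f":- not is_same_as({pron}, {other})."]))

    if is_answer(cand1, cand2):
        return cand1
    if is_answer(cand2, cand1):
        return cand2
    return None   # neither candidate (or both) is supported, so no answer is returned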

Generating Natural Language Explanations To generate natural language explanations, we extract the learned rules from the returned ASG, and translate those rules automatically into English by means of a set of simple templates. Different templates exist for the different ways in which the two concepts may be connected in the hypothesis. For example, if the candidate is connected to the pronoun in the hypothesis by means of a modifier with the lemma because,

5Note that if the examples selected provide support for each candidate and do not rule out either of them, then it’s perfectly acceptable for the hypothesis to make both of these literals hold, in which case no result should be returned.

then a template of the form "... causes ..." would be matched. As a concrete example, let us consider the rule:

may_refer_to(P, C) ← event(_, large, P), neg_event(_, fit, C).

This simple form of rule, in which the pronoun and candidate are not connected, matches the pattern:

The {variable} {wh-word} {event1} {be} the {variable} {wh-word} {event2}.

The word for the variable is selected from person, people, thing or things based on the number and animacy of the candidates in the schema. In this case, the trophy and the suitcase are both singular and inanimate, so the word thing would be chosen. The wh-word is selected similarly from who and which, and the verb to be is conjugated appropriately. The strings for events are generated based on (a) which argument the variable is placed in, (b) whether the event corresponds to an adjectival or verbal lemma, (c) whether the verb is negated or not, (d) any other arguments which have been specified and (e) any prepositions or adverbial modifiers to the event. We use existing tools to conjugate the verb accordingly. In this case event1 would generate the string "doesn't fit" and event2 would generate the string "is large", giving the final explanation:

The thing which doesn’t t is the thing which is large.

Chapter 7

Implementation

7.1 Base ASG Generation

The main implementation challenge in this project is in developing a system capable of automatically generating base ASGs. Our system depends on several large machine learning models and libraries, and requires a significant amount of logic to process and combine the results from these resources into a meaningful form, from which a base ASG can be generated.

7.1.1 Design

Figure 7.1 shows an overview of the internal design of our system. To improve code organisation, our implementation is divided into three packages. The first parses an input text, the second annotates a parse tree with semantic information which is used during ASG generation, and the third generates and prints the base ASG itself from a set of given parse trees. We will now briefly describe each of the main components.

Parsing The parsing package (shown in red in Figure 7.1) combines the results from several machine learning models into a HPSGTree structure. Each non-leaf node in the tree stores its constituent type, dependency label, children, and the index of its head node. Leaf nodes additionally store a pointer to their respective HPSGToken. The token structure stores the token ID, original value, part-of-speech, lemma, NER class, WordNet sense, and possible referents. The tokeniser acts mainly as an adaptor to the preprocessing library, allowing it easily to be swapped out for any other library capable of providing the same functionality, such as spaCy, NLTK or Stanza. It also makes a call to a separate WSD model, which has its own lightweight adaptor. The parser similarly acts mainly as an adaptor for constituency and dependency parsers. The final module of the parser finds and replaces certain patterns in the tree. These are sparingly used fixes for known parser bugs. For example, the model we use for dependency parsing appears to be unfamiliar with so as a coordinating conjunction and consistently leaves such coordination unlabelled. We therefore have rules to find such unlabelled dependencies and make a best-effort guess about what the label ought to be.

Figure 7.1: Pipeline for generating base answer set grammars. An input text passes through the parsing package (Tokeniser, Parser and TreeSubstitutor, backed by CoreNLP, a WSD model and the HPSG parser), producing HPSGTree parse trees; the semantics package (Semantics, Predicate and Background annotators, backed by WordNet and ConceptNet) annotates the trees; and the AsgGenerator produces the base ASG.

Semantics The semantics package (shown in blue in Figure 7.1) is where the bulk of processing is done. Each module here is an annotator, implemented using a visitor pattern, which simply takes a HPSGTree, adds some additional information, and returns the updated tree. These annotators are then chained together. The semantics annotator is responsible for determining the head and dependents at each node, checking their types, and noting the relevant transformation. The predicate annotator then takes these annotations and works out the exact logical forms at each node, and how the annotations should be combined (see Subsection 5.3). Finally, the background annotator determines entailed senses from WordNet and properties from ConceptNet (see Subsection 5.3.2).

ASG Generation The ASG package (shown in green in Figure 7.1) takes a set of input trees, and produces an ASG to encode their syntactic and semantic composition. The ASG representation is a map from production rules to sets of ASP rules, each of which is represented by its own structured AspRule datatype. This allows rules to be easily modified, combined and extended. Producing the base ASG involves a final walk over each parse tree, to generate the actual ASP rules (now a mostly straightforward process) and save them according to their production rules. It also involves generating a rule for the start node. Our implementation supports two kinds of ASGs: grounded (which use the "lexicalised" grammar rules discussed in Subsection 5.3) and ungrounded ("non-lexicalised").

7.1.2 NLP Dependencies

We now briefly discuss and motivate our choice of the libraries and models which our approach depends on.

Preprocessing We use CoreNLP (Manning et al., 2014) for sentence splitting, tokenisation, lemmatisation, part-of-speech tagging and named entity recognition. We briefly discuss the relevance of these tasks and how they are implemented in Subsection 4.1.1. The motivation for using CoreNLP is that it allows us to run a single server in the background which is able to do the vast majority of preprocessing tasks that we require. Furthermore, many of the models achieve near state-of-the-art performance, which is important since incorrect POS tags severely handicap the parser, and correct lemmatisation and NER is vital for being able to match entities of the same type. We also use CoreNLP for deterministic coreference resolution. In this project, we are only ever interested in annotating references between entities that we can be sure are the same — it is our job to resolve the ambiguous cases! State-of-the-art approaches to coreference resolution use neural models that are rarely confident in their predictions. As such, we instead rely on a much more traditional sieve-based approach. In this approach, a list of possible referents is generated and passed through several "sieves", each of which discards impossible referents, for example based on gender, number and animacy. In a traditional setting, the remaining referents would then be ranked based on linguistic features and the top result selected; we instead select a referent only if there is just one to choose from out of the final sieve.
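The way we consume the sieve output can be sketched as follows; the sieve functions are illustrative placeholders for the agreement checks (gender, number, animacy and so on) performed by CoreNLP's deterministic system.

from typing import Callable, List, Optional

def select_referent(mention: str, candidates: List[str],
                    sieves: List[Callable[[str, str], bool]]) -> Optional[str]:
    # Run every sieve; each one discards candidates that are impossible for this mention.
    remaining = list(candidates)
    for sieve in sieves:
        remaining = [c for c in remaining if sieve(mention, c)]
    # Unlike the usual ranking step, we only commit to a referent when it is unambiguous.
    return remaining[0] if len(remaining) == 1 else None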

Word Sense Disambiguation WSD is a particularly challenging problem, and it is only recently that models have begun to achieve accuracies which significantly improve on simply choosing the most common sense. Our main method of handling this difficulty is to rely very little on WSD: it is used solely for generating background knowledge, so when it goes wrong it is unlikely to have a significant impact on downstream learning tasks. However, it is of course helpful to determine senses accurately to improve the quality of the background knowledge. As such, we use the open-source disambiguate model (Vial et al., 2019), based on RoBERTa, which uses sense vocabulary compression in order to achieve near state-of-the-art performance.

Parsing For constituency and dependency parsing, we use the HPSG neural parsing approach developed by Zhou and Zhao (2019). It uses an encoder-decoder architecture to predict a joint span structure from a tokenised sentence. In particular, each token's representation consists of character, word and part-of-speech embeddings. The parsing model is inspired by Kitaev and Klein (2018): the encoder consists of twelve self-attention layers and summarises an input sentence into a set of vectors, while the decoder is trained to assign scores to each possible parse tree based on these vector summaries. At test time, the parse tree with the highest score is selected. Our motivation for using this approach is twofold. Firstly, the model is trained with a joint training goal of predicting both constituent and dependency structures, and thus generally

produces realistic structures where the two align, making our task of generating an ASG from the structure far simpler. Secondly, it's a modern approach, based on XLNet embeddings, which achieves near state-of-the-art performance.

7.2 Learning

7.2.1 Design

Figure 7.2 shows an overview of the internal design of our system for learning commonsense knowledge for the WSC using answer set grammars. The components are organised into three packages: one to convert examples from different datasets into a consistent form, a second to handle the knowledge hunting process and a final one to handle the generation and solving of learning tasks. We will now briefly discuss some of the implementation choices for each of these packages.

Datasets We have an internal Schema type which is used for all Winograd schemas and examples, and encodes the source of those examples, the pre-processed text, whether the example has its coreference labelled, and, if so, what the pronoun and candidates are. The dataset package (shown in yellow in Figure 7.2) consists of several simple components to translate examples from different sources into this standardised format.

Knowledge Hunting The knowledge hunting package (shown in orange in Figure 7.2) is the largest and most complex of the three packages. The two labellers are responsible for converting Schemas into a much more useful format. The Example datatype contains the ASG for the sentence, generated using the system described in Section 7.1, as well as an internal representation of the text's knowledge graph. The labellers are also responsible for marking the pronoun and candidates in the knowledge graph and generating the set of patterns that correspond to the context function described in Subsection 6.1.2. Most of the code is shared between the two labellers, the only difference being how the pronoun and candidates are identified:

• In the case of Winograd schemas, we currently use a very crude approach which attempts to match the exact text for the candidates against a nominal in the knowledge graph. This actually turns out to be a significant limitation of our system: for example, the Winograd schema "Joe has sold his house and bought a new one a few miles away; he will be moving out of it on Thursday" describes its two candidates as "the old house" and "the new house", which do not match any nominals in the semantic graph, and hence this schema is left unlabelled and unattempted. In most cases, this problem does not look particularly difficult to solve: even a straightforward approach of comparing the word embeddings of each of the candidates with each of the nominals in the semantic graph and choosing the assignment which maximises the sum of their similarities would likely be sufficient.

• In the case of examples, the labelling is done by matching context with the schema. It is important to note that this is done even for examples from sources with existing labels. For example, consider the example "back when I was obese, I ate too many snacks and ate too few vegetables but my weight reacted best when the vegetables were tasty," which comes from the WinoGrande dataset. The existing labels are not ideal for learning knowledge for the schema "the fish ate the worm; it was tasty", since the negative label, snacks, is used to express a very different concept to the one we want to learn (namely, that somebody loses weight best when healthy food is tasty). However, our context-based labelling approach finds that the knowledge we want to learn (the thing that is eaten is tasty, not the thing that eats) is also expressed unambiguously in this example, and thus chooses the much more useful negative label, I.

Figure 7.2: Pipeline for learning commonsense knowledge for the WSC. Single lines are used to show the path of the Winograd schema through the pipeline, and double lines to show the path of examples.

The query generator component takes a schema and generates a query to run over the full set of examples, based on the approach described in Subsection 6.1.1, while the hunter actually compiles and runs the query. Examples are labelled only after the initial query, as the labelling process is computationally expensive. The final component of the knowledge hunter is the relevance filter, which ranks each example with respect to the schema, according to the metric proposed in Subsection 6.1.2, and returns the top-N examples.

Learning Finally, the learning package (shown in purple in Figure 7.2) implements the approach described in Section 6.2. In particular, the learning task generator combines the ASG rules for the schema and all of the examples, the mode declarations produced from the schema, the example definitions produced from the knowledge texts, and a fixed background knowledge into a complete learning task. This learning task is then passed to the solver, which acts mostly as an adaptor to an existing library for learning and solving ASGs.
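At its core, the learning task generator simply concatenates its inputs into one task. The sketch below is a rough illustration under that assumption; the parameter names are ours, and the real formatting of ASG rules, mode declarations and examples is elided.

def build_learning_task(schema_asg, example_asgs, mode_declarations,
                        example_definitions, background_knowledge):
    # Combine the ASG rules, mode declarations, example definitions and fixed
    # background knowledge into a single learning task string for the ASG learner.
    parts = [schema_asg]
    parts.extend(example_asgs)
    parts.append(background_knowledge)
    parts.extend(example_definitions)
    parts.extend(mode_declarations)
    return "\n".join(parts)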

7.2.2 Knowledge Sources

We provide a brief overview of the knowledge sources considered for our knowledge hunting process, and their qualities, in Table 7.1. The desirable properties of a knowledge source are that it should contain a large number of examples, including many examples from domains similar to the WSC, and that those examples should have unambiguous (preferably hand-labelled) coreferences. Additionally, we strongly prefer sources that allow us to both pre-process these examples and automate our search process over them.

Previous approaches to knowledge hunting rely on web search queries (Sharma et al., 2015; Emami et al., 2018; Prakash et al., 2019; Sharma, 2019), which has the benefit that it allows a huge amount of text to be searched efficiently with relatively little work by the implementer. However, automating web searches introduces legal issues, as it technically breaks the terms of service of almost all major search providers, and furthermore, the search operators offered by these providers are rather limited: there is no way, for example, to search for specific lemmas, senses or parts of speech, and so searching for similar sentences while allowing different person, number, tense etc. is a difficult problem that requires an excessive number of individual search terms.

Table 7.1: Qualities of knowledge sources considered. The sources compared are a web search engine (huge), a web corpus such as CommonCrawl (large), a related corpus such as STORIES (medium), WinoGrande/DPR (small), the results of previous knowledge hunting approaches (tiny) and handwritten examples (tiny); each is compared on its size, its domain similarity to the WSC, whether its coreferences are labelled, whether search over it can be automated, and whether it can be preprocessed before search.

Our approach relies on three sources of knowledge. The first is WinoGrande, which provides some 40 000 labelled WSC-like examples. We find it to be particularly useful as there is a fair amount of overlap between the commonsense knowledge required to answer WinoGrande and that required to answer the WSC. Furthermore, the size of WinoGrande means that it is possible to pre-process the entire dataset in a number of hours, and search it in a number of milliseconds on consumer hardware. The second source is made up from the results of previous knowledge hunting approaches to the WSC, which are sourced from web search queries. These include several hundred sentences which are highly similar to Winograd schemas, but they are not labelled and are not guaranteed to have unambiguous coreferences or express the correct knowledge required to solve the schemas (and of course there is no guarantee that our own knowledge hunting approach will select them as examples). Finally, we allow for a small number of handwritten examples to be included. It is important to note that, unlike some other approaches (Sharma, 2019), we do not use handwritten or hand-selected examples when evaluating the accuracy of our approach, but we do use them to demonstrate the capability of our approach to generate high-quality explanations based on the examples which have been observed.
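For reference, streaming the WinoGrande examples is straightforward; the sketch below assumes the publicly released JSONL format and field names such as sentence, option1, option2 and answer, which may differ between versions of the dataset.

import json

def load_winogrande(path):
    # Yield (sentence, options, answer) tuples from a WinoGrande JSONL file.
    # Field names are assumptions based on the public release.
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            options = (record["option1"], record["option2"])
            yield record["sentence"], options, record.get("answer")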

7.2.3 Dependencies

The learning process has only three significant dependencies.

CoreNLP The learner of course relies heavily on our base ASG generator. In particular, the results of pre-processing using CoreNLP are used in several places. The deterministic (sieve-based) coreference resolver is used to ensure we only label nominals whose identity we are sure of. Furthermore, the number and animacy of candidates is used to generate natural language explanations, as described in Section 6.3.

ASG Solver (and ILASP and clingo) The learner relies heavily on an existing program for learning and solving ASGs (Law et al., 2019). This program takes a learning task and returns whether the task is satisfiable, and the optimal hypothesis if it is. We make only very minor changes to its source code, such as adding an option to print the ASP meta-representation (so that it can be passed to an ASP solver such as clingo, allowing a full knowledge graph to be extracted), disabling some features which we do not use in order to improve performance (e.g. the ability to learn annotated atoms), and adding some checkpoints in order to identify performance issues.
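Our solver component is essentially a thin wrapper around this binary. A simplified sketch is shown below; the binary name and arguments are placeholders rather than the tool's real command-line interface.

import subprocess

def run_asg_learner(task_path, binary="asg_learner", timeout=600):
    # Invoke the external ASG learning/solving binary on a learning task file
    # and return its raw output (the optimal hypothesis, if one exists).
    # The binary name and arguments here are illustrative placeholders.
    result = subprocess.run([binary, task_path], capture_output=True,
                            text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"ASG learner failed: {result.stderr.strip()}")
    return result.stdout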

Pattern We use the web-mining library pattern (Smedt and Daelemans, 2012) to conjugate verbs when generating natural language explanations, as described in Section 6.3.
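Conjugating a lemma into the person, number and tense needed for an explanation looks roughly like the following, using pattern.en's documented conjugate function (an illustration rather than our exact code).

from pattern.en import conjugate, PAST, PRESENT, SG, PL

# e.g. turn the lemma "be" into the surface form needed for an explanation
print(conjugate("be", tense=PRESENT, person=3, number=SG))    # "is"
print(conjugate("be", tense=PAST, person=3, number=PL))       # "were"
print(conjugate("fear", tense=PRESENT, person=3, number=SG))  # "fears"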

Chapter 8

Evaluation

8.1 Knowledge Graphs and Base ASG Generation

In this section, we will evaluate both the quality of our knowledge graph representation and our automated approach for generating these structures using answer set grammars. We will first evaluate the claim that our generated ASGs can act as semantic parsers, by exploring their ability to produce accurate meaning representations for Winograd schemas. We will then critically analyse the quality of our semantic parses, by comparing the semantic richness of our representation against the representations used by several freely-available and widely-used broad-coverage semantic parsers.

8.1.1 Performance on WSC

In our first experiment, we generate a base ASG for each Winograd schema. We then parse each schema with its respective ASG, and extract the knowledge graph structure produced at the root node. This produces a set of atoms of the form:

event(yelled, yell, jim) ∧ event(upset, upset, he)
∧ nominal(jim, person) ∧ nominal(kevin, person) ∧ nominal(he, he)
∧ modifier(at, at, yelled, kevin) ∧ modifier(because, because, yelled, upset)
∧ modifier(so, so, upset)

We choose to focus our evaluation on the ability of the procedure to extract the (core and non-core) argument structure of the text. We therefore apply a post-processing step to separate out each argument, creating a (source, label, argument) triple, similar to a neo-Davidsonian representation. The core arguments of events are assigned the labels agent, patient, recipient and controls respectively. Non-core arguments (modifiers) use their own value as their label. For example, the events and modifiers in the conjunction above generate the triples:

(yelled, agent, jim)
(upset, agent, he)
(yelled, at, kevin)
(yelled, because, upset)
(_, so, upset)
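A minimal sketch of this post-processing step is given below. It assumes the graph atoms are available as Python tuples and that core-argument positions map to agent, patient, recipient and controls in order; the helper names are ours, not the system's.

CORE_LABELS = ["agent", "patient", "recipient", "controls"]

def to_triples(events, modifiers):
    # Flatten event/modifier atoms into (source, label, argument) triples.
    # `events` are tuples (event_id, lemma, arg1, arg2, ...); core arguments
    # are labelled agent, patient, recipient, controls in order.
    # `modifiers` are tuples (mod_id, lemma, source, argument) or, when the
    # modifier has no source, (mod_id, lemma, argument).
    triples = []
    for event in events:
        event_id, _lemma, *args = event
        for label, arg in zip(CORE_LABELS, args):
            triples.append((event_id, label, arg))
    for mod in modifiers:
        if len(mod) == 4:
            _mod_id, lemma, source, arg = mod
            triples.append((source, lemma, arg))
        else:
            _mod_id, lemma, arg = mod
            triples.append(("_", lemma, arg))
    return triples

# e.g. to_triples([("yelled", "yell", "jim")],
#                 [("at", "at", "yelled", "kevin"), ("so", "so", "upset")])
# -> [("yelled", "agent", "jim"), ("yelled", "at", "kevin"), ("_", "so", "upset")]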

Note how we do not evaluate the lemmas/NER categories assigned to each node in the knowledge graph, instead using the token values directly. Evaluating these "node labels" would not be worthwhile since CoreNLP's lemmatisation and basic 3-class NER classifier almost never produce errors. We also choose not to evaluate the senses assigned to each token during the background knowledge generation process: while this process commonly produces errors, these errors can be nuanced and difficult to spot. They also do not affect our base representation and are rarely harmful in practice, since we can rely solely on lemmas if required.

We next defined gold representations for each of the 273 Winograd schemas, by manually identifying the core and non-core arguments in each text and labelling them according to the format described above. We then measure labelled accuracy; that is, we consider each triple as a whole: a predicted edge is only counted as a true positive if the exact edge is present in the gold representation. Using this definition, we find 76 false positives (generated edges not present in the gold representation) and 88 false negatives (gold edges not generated), giving the approach an overall precision of 0.972 and a recall of 0.968. The recall is slightly worse than the precision as there are a couple of cases where missing dependencies cause us to lose an entire sub-phrase in our generated representation.

We note that the precision and recall of our approach are in the region of what is achieved by a dependency parse alone. This substantiates our hypothesis that our approach does not introduce many more inaccuracies beyond those which arise from the machine learning models used for pre-processing and parsing. This is largely a result of a key design choice of our approach, which is that we would only use semantic information which we could determine accurately. In particular, we only have four kinds of core argument, which can largely be determined from the dependency structure, and non-core arguments use lemmas directly from the text. This is in contrast to, say, FrameNet-style argument structure, which has five kinds of core argument and nineteen distinct classes of adjunctive arguments: classifying arguments in such a structure is clearly much more difficult and will produce many more errors.

Investigating more deeply, we find that almost all of the errors in our generated representations result from inaccurate dependency parses. The only exceptions are one case in which the minimal distance policy fails and a handful of examples of raising (see Subsection 5.3.1), as well as a few cases where a passive construction has no auxiliary verb, and as such we fail to detect that the verb is passive. There are even some cases where the POS-tagging or dependency structure is incorrect, but we are still able to generate the correct semantic representation since we detect heads/dependents of the incorrect type and correct them automatically.

In Table 8.1, we briefly compare our approach with several other openly available semantic parsers based on a variety of different formalisms. Unfortunately, directly comparing the accuracy of broad-coverage semantic parsers is a difficult task due to the variety of different semantic representations. Regardless, Table 8.1 makes very clear the trade-off between accuracy and semantic richness. For example, AMR relies on PropBank framesets and significantly abstracts away from syntactic features. This means that it is able to unify structures that our approach cannot. However, it also makes the semantic parsing task significantly more difficult, as demonstrated by the fact that in more than two-thirds of cases, the AMREager parser fails to extract even the general argument structure of the schema. We will investigate the properties shown in this table and the overall quality of our semantic representation more thoroughly in the next subsection.

8.1.2 Quality of Knowledge Graphs

We will now look in more detail at some of the strengths and limitations of the knowledge graph structures we generate and our method for generating them.

8.1.2.1 Strengths

Accuracy The graphs generated by our approach are generally highly accurate: they are based on constituency and dependency structures which can often be determined accurately, and as demonstrated in the previous subsection, we use a translation approach which is typed and linguistically motivated, and as such very rarely introduces additional errors. Indeed, we even occasionally fix existing parsing errors. Figure 8.1 shows an example, where the determiner has incorrectly been marked as the head of a noun phrase. We are able to resolve this and choose the noun since we have a type constraint stating that the direct object of an event must be a nominal. Overall, we parse 219 (80%) of schemas perfectly and generate the correct general argument structure for 267 (98%) schemas. More ambitious approaches, such as those which produce AMR, can fail on even very simple sentences, as shown in Figure 8.2.

Evaluating the accuracy of semantic parsers on the WSC is time consuming, so we only include the results of our own approach and AMREager in Table 8.1. However, Sharma (2019) notes that "40 [WSC] problems [could] not [be] answered because of syntactic dependency parsing errors and part-of-speech errors while generating the representations", suggesting that at most 233 problems are parsed correctly by their approach, and we also found that at least 90 clearly incorrect parses were generated by OpenCCG.

Robustness The approach to generating graphs is robust, and supports both complex linguistic structures and ungrammatical texts. Table 8.2 lists the set of linguistic constructions which we can parse without failure, while Figure 8.3 demonstrates our ability to generate a knowledge graph for the ungrammatical text "PSU too large, doesn't fit". As noted by Reddy (2017), this is a key advantage of dependency-based approaches to semantic parsing compared to CCG parsers, which generally must satisfy the constraint that the text is eventually assigned a sentence tag. This makes our approach suitable for parsing text sourced from the web, where ungrammatical sentences occur frequently, for example in news article titles or forum posts.

Figure 8.1: Demonstration of robustness to parsing errors for the sentence "The city councilmen refused the demonstrators a permit because they feared violence.": (a) a section of the parse tree demonstrating the incorrect dependency structure, in which the determiner is marked as the head of a noun phrase; (b) the generated knowledge graph.

Figure 8.2: Semantic representations for the simple sentence "The fish ate the worm, it was hungry.": (a) the AMREager representation; (b) our representation.

Figure 8.3: Demonstration of robustness to ungrammatical texts: the knowledge graph generated for the text "PSU too large, doesn't fit".

Table 8.1: A comparison of semantic parsers based on their suitability for inductive learning approaches on the WSC. The parsers compared are AMREager (Damonte et al., 2017), KParser (Sharma, 2019), TRIPS (Allen and Teng, 2017), our WinoGrammar ASGs, the approach of Chabierski et al. (2017), OpenCCG (White, 2016) and DepLambda (Reddy, 2017), in terms of whether each is optimised for inductive learning, its base formalism, whether its representation is domain general, its semantic richness, and its accuracy on WSC273 (schemas parsed perfectly, and schemas parsed with the correct argument structure). It is important to note that the accuracy results here are not directly comparable since each parser in this table uses a different semantic representation. However, we note that richer semantic representations are more difficult to generate and thus usually correspond with lower accuracy.

Table 8.2: Overview of the properties represented by our knowledge graphs and the constructions supported by our automated approach, compared to previous work on broad-coverage semantic parsing (AMREager, KParser, TRIPS, WinoGrammar (ours), Chabierski, OpenCCG, DepLambda and plain dependencies). The properties compared are semantic roles, preposition senses, discourse relations, cross-POS relations, semantic mapping to WordNet (or similar), entailment, named entities, compound nouns, tense, entity attributes (including copular form), possession, passives, relative clauses, control and raising, coreferencing, negation, conjunction and quantification. *Unambiguous coreferences only. **Requires post-processing steps. ***Hypernymy only.

Canonical Representations Our approach significantly improves upon dependency parsing alone, by assigning many structures with different dependency structures the same knowledge graph. In particular, as shown in Table 8.2, we handle copular form, passives and relative clauses without the need for specialised predicates, in addition to sharing arguments in constructions such as control, raising and coordination. Some examples of different constructions receiving the same representation are shown in Figure 8.4, and further examples are given in Subsection 5.3.1. As shown in Table 8.2, the only other representations we looked at which treat all of these constructions in a uniform way are the representation used by Chabierski et al. (2017) and AMR.

Figure 8.4: Demonstration of canonical forms: the knowledge graphs generated for copular form ("The young girl is singing." / "The singing girl is young."), for a passive and a non-passive construction ("Pixar claimed the prize won for Frozen." / "Pixar claimed the prize, which they won for Frozen."), and for control and coordination ("Anne understood to ask for help immediately." / "Anne understood and immediately asked for help."). These examples show how we handle structures such as copular form, passives, relative clauses, control and coordination, generating the same structures in our knowledge graphs, even though the dependency structures may differ significantly.

Consistency One benefit of the use of answer set grammars is that knowledge graphs are generated in a systematic way that is easy to follow and verify. Another result of this is that the same linguistic structures consistently generate the same patterns in knowledge graphs. Meanwhile, machine learning-based approaches which predict semantic structure directly from a sequence of tokens may generate entirely different structures for sentences which are identical save a single noun, for example.

Suitability for ILP As discussed in Section 5.1, we deliberately choose to implement a classical Davidsonian semantics instead of the more popular neo-Davidsonian semantics. The reasoning for this is that, despite increasing the complexity of generating representations, it leads to shorter representations that can be described with fewer variables, which is a useful quality for reducing the size of the hypothesis space for ILP techniques. Additionally, and for the same reason, we use a very small number of fixed predicates (just three), in contrast to most other approaches, which have a different predicate for every different concept. As pointed out in Table 8.1, the only other approach we have found which makes these considerations is the one proposed by Chabierski et al. (2017).
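To make the contrast concrete, the clause "Jim yelled at Kevin" can be written in the two styles roughly as follows (an illustrative comparison only; the neo-Davidsonian line shows the style we avoid, not the output of any particular parser):

event(yelled, yell, jim) ∧ modifier(at, at, yelled, kevin)    (Davidsonian, fixed predicates)
yell(e) ∧ agent(e, jim) ∧ at(e, kevin)                        (neo-Davidsonian, one predicate per concept)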

Modularity As noted in Section 7.1, we have dependencies on a preprocessor (tokeniser, lemmatiser and basic NER classifier), a constituency parser and a dependency parser. Each of these components just needs to implement a simple interface, allowing us to easily swap out one model for another. Thus we can improve the performance of our system at very little cost as the state-of-the-art models for these tasks continue to improve.

Automation and Domain-Generality Our approach is fully automated, and although the generated ASGs can usually only handle a specific set of constructions, our process for generating the ASGs is domain general.

8.1.2.2 Limitations

Preposition Senses and Discourse Relations One of the most significant limitations of our approach is that we do not attempt to disambiguate between different senses of prepositions and discourse connectives. As shown in Table 8.1, this kind of disambiguation is attempted in relatively few existing schemes; however, we note that it is vital for good semantic understanding. For example, we would not be able to match the causal sense of since in I haven't left the house since it's dangerous outside with because in I haven't left the house because it's dangerous outside, or the temporal sense of since in I haven't left the house since a week ago with for in I haven't left the house for a week, although we can handle some basic cases (e.g. in, into and inside) through synonymy which is encoded in background knowledge. We chose to use lemmas directly instead of preposition senses as we found that incorrectly assigned prepositions would make learning tasks unsatisfiable, unless we were able to find another example that had been assigned an incorrect preposition in a similar way, and because lemmas in similar contexts often have the same sense anyway.

Cross-POS Representations Another significant limitation of our representation, but one which is shared by all of the representations considered in Table 8.1 save AMR, is its inability to match the same concept when it is expressed using different derivational forms with different parts of speech. For example, consider the three sentences:

She fears heights.
She has a fear of heights.
She is fearful of heights.

Clearly all of these sentences have the same meaning and should have one canonical form, but our semantic parsing method is so closely tied to syntax that we do not pick up on their similarity. Indeed, even the individual words (the verb fears, noun fear, and adjective fearful) get assigned completely different representations, as can be seen in Figure 8.5. Solving this issue is particularly tricky for our approach, given that we must closely couple semantic composition to syntactic composition, and that, to our knowledge, there is no existing knowledge base which links such derivational forms in a meaningful way.

Performance and Domain-Specificity Finally, although the process for generating ASGs is domain general, the generated ASGs themselves can only parse constructions which have been seen during their generation. This makes them suitable and reusable in some cases where texts are expected to take similar forms (such as in many domain-specific semantic parsing problems, and in the bAbI tasks dataset, for example), or suitable for one-off use when constructed based on the texts they are intended to parse. However, generating domain-general ASGs for the English language would be an incredibly difficult task due to the sheer number of constructions they would need to support and the size of the vocabulary. Indeed, even ASGs which capture a relatively small number of sentence forms can perform very poorly, as currently the implementation we use for solving ASGs relies on a greedy approach to parsing which does not scale well when there are many possible production rules to choose from.

8.2 Learning

In this section, we assess the capability of ASG induction to learn useful commonsense rules that extend our base ASGs, and allow us to solve coreferencing problems. We investigate our performance on bAbI tasks 11 and 13, as well as on the Winograd schema challenge.


Figure 8.5: WinoGrammar and AMR representations for phrases using different forms related to the word "fear" (the verbal, nominal and adjectival forms in "She fears heights.", "She has a fear of heights." and "She is fearful of heights."). Note how our approach assigns different representations to each form (even for the individual words: the verb fears, noun fear, and adjective fearful), while AMREager produces a canonical form for all three examples.

1  Sandra went back to the bathroom.
2  Afterwards she travelled to the office.
3  Where is Sandra? office
4  John went to the garden.
5  Following that he went to the bedroom.
6  Where is John? bedroom
7  Mary went to the office.
8  Then she journeyed to the garden.
9  Where is Mary? garden
10 John journeyed to the kitchen.
11 Following that he travelled to the bedroom.
12 Where is Mary? garden
13 Mary went to the kitchen.
14 Then she went to the bedroom.
15 Where is Mary? bedroom

Table 8.3: An example from bAbI task 11.

8.2.1 Performance on bAbI Tasks

Motivation In our first experiment, we attempt to learn the knowledge required for the coreferencing tasks (tasks 11 and 13) from the bAbI dataset. The bAbI tasks are very well-suited to inductive learning approaches due to their fixed vocabulary, lack of noise and generally straightforward sentence structure. Furthermore, inductive logic programming has already been applied to many of the bAbI tasks (Chabierski et al., 2017), including tasks 11 and 13 (Mitra and Baral, 2016).

Methodology Let us now explain how we set up our learning tasks for the bAbI coreferencing tasks. First, recall the example format for bAbI tasks 11 and 13. Each example consists of 10 sentences and 5 questions. An example is shown in Table 8.3. Our learning tasks are generated as follows:

1. Generate base ASG. Firstly, we generate an ASG based on a set of training examples from bAbI task 11. Specifically, for each example, the ten sentences are joined together as a single text and passed to our ASG generation pipeline, while the questions are ignored entirely. All sentences from each example are passed in at once, so that our ASG will encode that each sentence follows from the previous one (by means of a next_event modifier), and can capture which events happen at which time-steps (by means of a happens predicate). We make only a single change to our ASG generation pipeline, which is that proper names are assigned their values directly as their identifiers, so that we can capture that, for example, Mary always refers to the same entity. For the example in Table 8.3, sentences 1 and 2 would generate a representation including the literals:

event(e1, go, sandra) ∧ nominal(sandra, person) ∧ modifier(e1, to, bathroom)
∧ event(e2, travel, she) ∧ nominal(n1, she) ∧ pronoun(n1) ∧ modifier(e2, to, office)
∧ modifier(m1, next_event, e1, e2) ∧ happens(e1, 1) ∧ happens(e2, 2)

2. Translate examples. Each training example is translated into a logical form. It is important to note that we do not parse the question itself; we instead transform each question into two binary choices (a sketch of this step is given after this list). For example, question 3 in Table 8.3 would, in effect, require the learner to come to the conclusion at_location(sandra, office, 2) and reject the conclusion at_location(sandra, bathroom, 2). As such, each individual example is very long and has five inclusions and five exclusions which must be explained by the learner.

3. Add background knowledge. We add hand-written background knowledge. This consists of the facts time(0..10). and the single rule which encodes the successor relation: inc(T, T+1) ← time(T), time(T+1).

4. Add mode declarations and bias constraints. We add a set of hand-written mode declarations. The mode declarations allow the learner to learn rules with at_location(Person, Location, Time) or coref(Pronoun, Person) at their head. They allow body literals to specify events and modifiers, as well as to use the inc predicate from the background knowledge, or the learned at_location and coref predicates. We also add a set of hand-written bias constraints to reduce the size of the hypothesis space. In total there are nine of them. These are simply used to restrict some combinations of head and body literals. For example, we do not allow the at_location or coref predicates to be used in the body of a rule which has a coref predicate as its head.

5. Train. The base ASG, examples, background knowledge, mode declarations and bias constraints are combined into a single learning task and the task is processed using the existing ASG solver binary. A maximum of 5 body literals and 6 variables were allowed per learned rule.

6. Test. To test our process, we generate a set of new examples, from bAbI's test data, in the same way as our training examples are generated. We then check that the hypothesis generated from the training examples generates the correct inclusions and exclusions for the test examples. We tested on 100 examples for each task. To allow us to reuse the same example generation procedure, each example was considered answered correctly only if all five of its questions were correctly answered.
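The sketch below illustrates step 2, turning a bAbI question into the two binary choices; the helper and its string output are hypothetical, and the real learning task uses the ASG learner's own example syntax.

def question_to_choices(person, answer, locations_so_far, timestep):
    # The gold answer becomes an inclusion; every other location mentioned so
    # far becomes an exclusion at the same time-step.
    include = [f"at_location({person}, {answer}, {timestep})"]
    exclude = [f"at_location({person}, {loc}, {timestep})"
               for loc in locations_so_far if loc != answer]
    return include, exclude

inc, exc = question_to_choices("sandra", "office", ["bathroom", "office"], 2)
# inc == ['at_location(sandra, office, 2)']
# exc == ['at_location(sandra, bathroom, 2)']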

Since our goal is to test the limits of learning on top of ASGs for natural language, our approach differs significantly from previous approaches. Most notably, previous approaches (Mitra and Baral, 2016; Chabierski et al., 2017) tend to provide some event calculus axioms in their background knowledge, taking the form:

holdsAt(F, T+1) ← initiatedAt(F, T).
holdsAt(F, T+1) ← holdsAt(F, T), not terminatedAt(F, T).

No. of examples   Training time (s)   Accuracy, task 11   Accuracy, task 13
1                 24                  32%                 87%
2                 168                 100%                100%
5                 690                 100%                100%

Table 8.4: Summary of results for the bAbI tasks dataset.

The first rule states that a fluent holds after it is initiated, while the second states that a fluent continues to hold until it is terminated. These help the learner by handling the recursion needed to express the fact that a fluent will continue to hold. In order to increase the difficulty of our learning task, we do not include such rules.

Results The results of our experiment are summarised in Table 8.4. The main difficulty for our approach is finding a question that forces the learner to learn that somebody stays in the same place unless they move. In Table 8.3, this would be a question like question 12. When running our experiment, the first example did not have any such questions, and so the learner was able to satisfy the training example with a hypothesis which was insufficient for most testing examples.

After a small number of examples (fewer than 2 on average), our approach settles on the hypothesis:

coref(P, C) ← pronoun(P), modifier(_, next_event, E1, E2), event(E1, travel_v_02, C), event(E2, travel_v_02, P).
at_location(P, L, T) ← event(E, travel_v_02, P), modifier(_, to, E, L), happens(E, T).
at_location(P, L, T') ← at_location(P, L, T), inc(T, T'), not event(E, travel_v_02, P), happens(E, T).
at_location(P', L, T) ← at_location(P, L, T), coref(P, P').

The first rule asserts that a pronoun (specifically, a pronoun which is moving) refers to the person who acted (specifically, the person who moved) in the previous event. In the context of the bAbI coreferencing tasks, this is the correct rule to be learned, and in previous symbolic approaches to the task, this was the full extent of what was learned (Mitra and Baral, 2016). We additionally learn three rules which specify a person's location:

1. If somebody moves to a location, then they are at that location.

2. If somebody is at a location, and they are not known to move at some time-step, then they are still at that location at the next time-step.

3. If a pronoun is marked as being at a location, then the person corresponding to that pronoun's referent is at the location.

This hypothesis correctly answers not only bAbI task 11, but also task 13, compound coreference, since our base ASGs automatically capture coordination, producing events for both nominals.

Discussion It is important to note that the motivation of this work is to produce a proof-of-concept for learning complex rules from natural language examples using ASGs, and as such we have manually defined some parts of the learning task, as discussed in our methodology. Our work is therefore not directly comparable, even to existing symbolic approaches. However, we note that both neural approaches and the previous symbolic approach to bAbI tasks 11 and 13 (Mitra and Baral, 2016) also achieve 100% accuracy, although they do not state how many of the 200 example texts they use. Additionally, we note that all of the main limitations of our approach are ones which we share with the existing symbolic approach (Mitra and Baral, 2016): firstly, both approaches specify their hypothesis spaces manually; secondly, we both develop models for individual bAbI tasks, and not a single model capable of handling the full set of 20 tasks; and finally, the learned rules are event dependent.

Importantly, evaluating our performance on bAbI tasks 11 and 13 led us to the following observations, which motivated our approach to the WSC dataset:

1. It is possible to learn complex hypotheses on top of our generated ASGs for natural language. The hypothesis learned for the bAbI coreferencing task is relatively complex, comprising multiple rules, some of which have recursive definitions (at_location), and it demonstrates non-observational predicate learning through its definition of coref. Furthermore, the accuracy of the approach demonstrates that our ASG can parse examples from bAbI tasks 11 and 13 perfectly, and our ability to answer both tasks 11 and 13 by training only on task 11 demonstrates the capability of our semantic parses to recognise similar structures. Overall, this confirms our ability to learn (complex) commonsense rules from natural language examples using our ASGs, and so helps to validate our hypothesis that ASG induction might be useful for the WSC.

2. Optimising the size of the hypothesis space is important. The most significant challenge of bAbI tasks 11 and 13 for our approach was that requiring the learner to explain a person's location, in addition to resolving the coreference, resulted in a very large hypothesis space that would take hours to search. The main lesson was therefore the necessity of choosing variable and constant types to minimise the number of possible combinations, combined with the use of effective bias constraints. In particular, the use of bias constraints in this task led to a reduction in the size of the search space from some 13 982 rules down to 1063.

8.2.2 Performance on the Winograd Schema Challenge

In this subsection, we evaluate the performance of our approach on the WSC273 task. We firstly discuss the accuracy that our approach achieves, before investigating more thoroughly how the selection of examples and our hypothesis space influence performance. We then evaluate the quality of our explanations, and finally outline the main strengths and weaknesses of our approach.

8.2.2.1 Accuracy

We ran all 273 Winograd schemas through our system. In total, 91 schemas (33%) were answered correctly and 5 schemas (2%) were answered incorrectly, while 177 schemas (65%) were left unanswered.

Figure 8.6: Full classification of Winograd schemas according to their result, or the reason why they were left unanswered. Of the 273 schemas, 259 are parsed (14 fail to parse); 237 of those are labelled (22 fail to extract candidates); 112 of those generate a learning task (125 find no relevant knowledge texts); and of those 112, 91 are true positives, 5 are false negatives, 12 produce inconsistent learning tasks and 4 predict both answers.

This gives our approach a precision of 0.95 and a recall of 0.33. It is important to note that we will not make a prediction if (i) we cannot find any relevant knowledge, (ii) we find contradictory examples that cannot be explained by a hypothesis in our hypothesis space, or (iii) we find evidence to support both candidates and no evidence to reject either of them. This explains how we are able to achieve such high precision: any example which manages to overcome all of these hurdles will very likely be answered correctly.

While our accuracy is 33.3%, i.e. less than chance for a binary decision problem, it is important to note that this result cannot be compared fairly against existing machine learning-based approaches which attempt the full dataset. Indeed, just by making a random guess for the 177 unanswered schemas, we should expect on average to get a further 88.5 schemas correct. As such, this extended approach could be expected to achieve 65.8% accuracy on the full WSC, and would therefore be roughly comparable in accuracy to many recent language model-based approaches, such as an ensemble of 14 pre-trained LMs trained by Trinh and Le (2018) (63.7%), or OpenAI's GPT-2 (Radford et al., 2019) (70.7%).
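For clarity, these figures follow directly from the counts above:

precision = 91 / (91 + 5) ≈ 0.95
recall = 91 / 273 ≈ 0.33
expected accuracy with random guesses on the 177 unanswered schemas = (91 + 177/2) / 273 ≈ 65.8%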

Sources of Errors Because of the amount and complexity of pre-processing required in order to generate a learning task for a Winograd schema, and the noisy nature of examples which are sourced from the internet, there are many reasons why a schema might be left unanswered, or answered incorrectly. Figure 8.6 provides an overview of how common each of these reasons is. Let us now investigate more closely the main sources of errors:

1. Parsing failure. Approximately 5% of schemas do not generate a base ASG. This is because we currently have no translation for certain dependencies, and fall back to a default translation. Since this default translation is often incorrect, any schema containing an unsupported dependency is currently ignored. The most significant issue is that we currently do not translate the expl (expletive) dependency, which occurs in fourteen schemas, including, for example:

There is a gap in the wall. You can see the garden through it.

Defining the correct translation is usually a straightforward process, and it would likely not be too difficult to resolve this issue in future work.

2. Labelling failure. For approximately 8% of schemas, we fail to determine the nominals in the schema's knowledge graph which correspond to the candidates. As we note in Subsection 7.2.1, this is because our labelling approach for Winograd schemas is currently very primitive. For example, we do not match the candidates old house and new house to his house and a new one respectively for the schema:

Joe has sold his house and bought a new one a few miles away. He will be moving out of it on Thursday.

since the text does not match. Again, solving this problem is largely an issue of engineering work.

3. Knowledge hunting failure. The most significant bottleneck to our performance is a lack of knowledge texts: almost half of all schemas (46%) fail at the knowledge hunting stage. However, we note that previous knowledge hunting approaches report similar results; for example, Sharma (2019) finds knowledge for exactly 50% of the 240 problems attempted. An example of a schema that fails to match any knowledge text is:

Fred is the only man alive who still remembers my father as an infant. When Fred first saw my father, he was twelve years old.

The knowledge required to handle this example (that twelve years old is too old to be an infant) is both highly specific and not knowledge that we can expect to be written down anywhere. In order to be able to solve schemas like this one, we would need to be able to retrieve and make use of significantly more general and variable texts than we do currently, which is not a straightforward task.

4. No hypothesis. Of schemas which successfully generate a learning task, 11% are not solved because there is no hypothesis in the search space capable of explaining the examples, making this the most common cause of failed learning tasks. As an example, let us consider the schema:

Joe paid the detective after he received the final report on the case.

The knowledge texts which are selected for this schema include:

At dinner Felicia decided to pay for Elena’s meal because Felecia had received a large bonus today.

and

At dinner Felicia decided to pay for Elena’s meal because Elena had received a large bill today.

Since the hypothesis space has no way to distinguish between the bonus and the bill, it is not able to explain the fact that the person who pays is the person who receives in the first example, but not in the second. We believe that these cases could often be solved by weighting examples. In this particular example, there are a total of five examples found by the knowledge hunter, four of which are explained by the hypothesis that the person who pays is the person who receives. Furthermore, the example that generates the inconsistency is determined to be the least relevant of the five examples, so if the weighting were based on relevance, there would not be a significant cost applied for its mislabelling.

5. Unusable hypothesis. There are four learning tasks (4%) which generate a hypothesis that cannot be used, as it predicts both answers. Consider, for example, the schema:

Jane knocked on the door, and Susan answered it. She invited her to come out. (8.1)

The knowledge hunter finds the text:

Jeremiah knocked on the door and invited them to come out for a picnic. (8.2)

and:

When I knocked, Mrs. M answered the door and invited me in. (8.3)

These two examples can be explained by the hypothesis:

may_refer_to(P, C) ← event(_, invite, P, _, _), event(_, answer, C, _).
may_refer_to(P, C) ← event(_, invite, P, _, E), modifier(_, out, E), event(_, knock, C, _).

Note how the first rule states that the person who answers is the person who invites, which correctly answers (8.3) and does not apply to (8.2), but returns the (incorrect) answer of Susan when applied to (8.1). Meanwhile, the second rule states that the person who knocks is the person who invites somebody to _ out, which answers (8.2) correctly and does not apply to (8.3), and returns the (correct) answer Jane when applied to (8.1). There is little we can do to solve these cases except attempt to source more knowledge in the hope of learning something which excludes one of the answers.

6. Incorrect hypothesis. A final 4% of learning tasks fall into the worst error case, by predicting the wrong answer. This generally happens when the learner has not seen any example with enough overlapping context with the schema. An example of a schema which is answered incorrectly is:

The lawyer asked the witness a question, but he was reluctant to repeat it.

This schema cannot be answered correctly as we find no example where the question-asker is reluctant to repeat themselves. For example, the most relevant text found by our knowledge hunter is:

Someone once asked me a strange question. But I am reluctant to repeat it.

which leads us to the incorrect hypothesis that the person who is asked is the person who is reluctant. In this case, the learner perhaps needs an example with the precise context of a lawyer being reluctant to repeat themselves in order to better specify its hypothesis.

Consistency In order to evaluate the consistency of an approach to the WSC, Trichelair et al. (2019) propose testing against a "switched" version of the dataset, in which, where possible, the positions of the two candidates in the schema are swapped. The idea is that, for example, when the schema Jim yelled at Kevin because he was so upset is changed to Kevin yelled at Jim because he was so upset, we should check whether our answer switches as expected. The proportion of predictions that change is called the consistency score.

131 of the Winograd schemas can be treated in this way, of which we answer 54. For these 54 schemas, our answer always switches. This is because we learn the exact same set of rules for the switched dataset, and there is currently no schema for which we learn a rule which specifies the candidate itself. This gives our approach a consistency score of 100%. In fact, even if we were to make a random choice on the remaining 77 schemas that were not answered, we would still expect to get a consistency score of 71% on average.

Consistency is a property which we share with existing knowledge hunting and reasoning approaches (Emami et al., 2018; Sharma, 2019), which also score 100% on this metric. On the other hand, machine learning-based approaches are notoriously inconsistent when answering the WSC, and often swap their answers even when the candidates correspond to proper nouns. For example, the best performing model reported in Kocijan et al. (2019b), which is based on fine-tuning BERT, achieves a consistency score of just 55%, while BERT without fine-tuning achieves 44%.

Performance on Problem Classes Given that our approach answers only one third of Winograd schemas, a key question is whether we are simply answering the easiest third of the dataset (perhaps the best we could possibly expect to do on the remaining two thirds is barely better than chance), or whether we are actually answering schemas that other approaches find difficult, in which case ensembling our approaches may actually yield significantly improved performance. In order to answer this question, we investigate our performance on schemas which are (a) ambiguous, i.e. schemas which even humans have difficulty answering, (b) associative, i.e. schemas with strong statistical hints, and (c) answered by current state-of-the-art approaches.

• Ambiguous schemas. The average human performance on the questions which we answer correctly is 95%, while the average human performance on the questions we answer incorrectly is just 84%, compared to a dataset-wide average of 92% (Bender, 2015). This seems to suggest that our approach mimics human performance to some extent: our approach struggles on the problems on which humans struggle. However, it is important to bear in mind that the number of examples, in particular false negatives, is very small, and so this result should be taken with a grain of salt.

• Associative schemas. The WSC contains 37 associative schemas. These schemas are much easier for language models than non-associative schemas: e.g. fine-tuned BERT achieves 81% accuracy on the associative set of schemas, compared to 70% on the remainder (Kocijan et al., 2019b). Our approach, on the other hand, actually does particularly poorly on associative schemas: we answer only 7 of them, of which we get 6 right and 1 wrong, therefore achieving a precision of 0.86 and a recall of 0.16, both of which are significantly lower than our dataset-wide results. This is because associative schemas rely on knowledge of the identity of the candidates: for example, in the schema I'm sure my map will show this building, it is very famous, we need a knowledge text to express that buildings can be famous (while maps cannot). However, in practice we rarely find knowledge texts with similar candidates, making such problems very difficult.

• Schemas answered correctly by WinoGrande. The current state-of-the-art approach, which fine-tunes RoBERTa using the WinoGrande dataset (Sakaguchi et al., 2020), answers 245 schemas correctly (90%), and 28 incorrectly. Of the 28 schemas which are answered incorrectly by WinoGrande, we answer 7 correctly, and of the 5 schemas we answer incorrectly, WinoGrande gets only 2 correct. This helps to reaffirm our hypothesis that we do not just answer the easiest third of schemas, and that our approach is in fact good at answering a different set of schemas than language model-based approaches. A secondary result of this finding is that, by using our approach for an initial classification, and using the WinoGrande model only if our approach does not provide an answer, 5 additional schemas would be answered correctly, corresponding to an 18% reduction in error compared to WinoGrande alone.

Knowledge Sources                            Precision   Recall
WinoGrande (Sakaguchi et al., 2020) only     0.89        0.11
Web Search Examples (Sharma, 2019) only      0.88        0.22
Both sources                                 0.95        0.33

Table 8.5: A comparison of how our approach performs when restricted to a specific knowledge source.

8.2.2.2 Effects of Example Selection and Hypothesis Space Structure

In this project, there are two fundamental processes which affect how our learning process performs: the selection of examples and the generation of the hypothesis space. We now ask how significant each of these processes is, both in terms of the accuracy we are able to achieve and the computational cost of the learning tasks. In particular, we are interested in how important each of our knowledge sources is, as well as how many examples are necessary to learn commonsense concepts. We also look to investigate whether the ability to learn exceptions is helpful, and whether our optimisations to restrict the hypothesis space are necessary and effective.

Knowledge Sources We discuss our choice of knowledge sources in Subsection 7.2.2. In order to investigate the importance of each of these sources, and the availability of knowledge in general, we reran our experiments after removing all examples from each of them. Our results are shown in Table 8.5. Our two main findings are as follows:

• There are actually only a very small number of schemas (13 in total) which use knowledge from both sources. In particular, note how the recall of our approach using both sources is almost the sum of the recalls of our approach using each of the two knowledge sources individually. This demonstrates the importance of having a large number of examples from a variety of sources. Some "toy" concepts, such as that small things might not fit, appear regularly in the WinoGrande dataset, while other concepts, such as that the thing/person that is less popular cedes to the thing/person that is more popular, are quite specific and are more likely to be found in, say, news articles by means of a web search.

• When knowledge exists from both sources, we are significantly more likely to either answer the example correctly, or produce an inconsistent learning task. This explains why the precision of the approach improves when both knowledge sources are used. This mostly demonstrates the need for multiple examples per schema in order to force the learner to develop a precise hypothesis that takes into account the full context of the example.

Number of Examples Used We note in Subsection 6.1.2 that, after scoring the relevance of each knowledge text, we choose the top-N knowledge texts as examples for the learning task. By default, we have let N = 2 up until this point. We now investigate the effect of changing the limit on the number of examples. Table 8.6 shows the results of this experiment.

Max. Examples   Avg. Parsing Time (s)   Avg. Learning Time (s)   True Positives   False Negatives   Precision   Recall
1               5.0                     49.4                     92               12                0.885       0.337
2               95.8                    55.0                     91               5                 0.948       0.333
3               168.8                   56.1                     88               3                 0.967       0.322

Table 8.6: A comparison of how our approach performs according to the limit on the number of examples.

We point out the following observations in particular:

• As you would expect, increasing the number of examples leads to a decrease in recall. This is because additional examples might create contradictions that cause a learning task to become unsatisfiable. On the other hand, adding extra examples often requires the learner to make its hypothesis more specific, and take additional context into account. These more specific hypotheses are more likely to answer a schema correctly, and so the precision increases. This effect is most notable when increasing from 1 example to 2, since a single example cannot, on its own, force the learner to take any context into account.

• Overall, the effects of adding additional examples are not very significant in terms of accuracy. We suggest that this is because each single example is annotated with quite a lot of semantic information, and so often might be sufficient on its own to learn the correct general concept. This property closely mirrors a property of human learning by language, as noted by Levesque (2017), in that "the data might need to be seen or heard just once to do its job." On the other hand, we note that, anecdotally, the quality of learned rules tends to increase as more examples are added, since the learner is forced to take into account additional context.

• Finally, let us note the performance implications of additional examples. In practice, it is not possible to process the full WSC dataset with 5 or more examples, since the ASG learning tasks quickly become computationally infeasible. But upon deeper investigation, we believe this is largely due to the inefficient greedy parsing method which is used by the ASG solver binary we depend upon. Parsing additional examples, which may take very different syntactic forms, requires a much larger and more complex grammar which causes the parser to explore a huge search space. Meanwhile, we note that the learning time does not increase significantly. This demonstrates that our efforts to reduce the size of the hypothesis space are working, and that we could likely handle many more examples if the parsing was optimised.

Learning of Exceptions In Subsection 6.2.2, we motivated our approach to learning exceptions by sketching an example of a learning task that could only be explained by allowing exceptions. To investigate how common such scenarios are, and by extension the utility of learning exceptions, we ran an experiment in which learning of exceptions was not allowed. We found that the number of true positives fell by 16, to 75 schemas, while the number of false negatives remained the same. This result confirms our hypothesis that the learning of exceptions is necessary for good performance: in our approach, it increases the proportion of learning tasks which are satisfiable by over 14%.

Hypothesis Space Optimisations In Subsection 6.2.2, we describe four optimisations which we make in order to reduce the size of the search space. In order to evaluate their effectiveness, we ran the learning pipeline with each of them disabled, and measured the size of the hypothesis space produced for each learning task. Figure 8.7 shows the results of this experiment. Our findings are as follows:

• We find that by far the most significant optimisation is the pruning of the search space based on the examples. Without this optimisation, we were not able to get the learning tasks to run.

• Imposing a depth limit on the hypothesis space affects only a small number of problems with particularly large hypothesis spaces, but where there is an effect, it is significant. In particular, four schemas which initially produced hypothesis spaces too large to generate became feasible after applying a depth limit of 3.

• Removing rules with negation as failure has a small impact on the size of the hypothesis space, reducing the number of rules on average by 16%, from 702 to 591. It also made one additional task feasible. We found that, presumably due to our ability to learn exceptions using the cannot_refer_to predicate, no previously-solvable schema was made unsolvable by this optimisation.

Figure 8.7: The effect of our optimisations on reducing the size of the hypothesis space (hypothesis space sizes, from 0 to 3000 rules, under four settings: pruning only; with a depth limit added; with negation as failure also removed; and with bias constraints also added). We note that, when using only the pruning optimisation, four tasks generated such large hypothesis spaces that the process generating the search space was killed. This plot uses results only from the tasks which generated hypothesis spaces under all settings.

• Finally, our use of bias constraints has a significant impact, reducing the size of the hypothesis space almost 4-fold, from 590 rules to 148 on average. The use of bias constraints ruled out four previous hypotheses that led to correct predictions and three that led to incorrect predictions.

In addition to their effect of reducing the hypothesis space and making our learning tasks feasible to solve in a reasonable amount of time, we note that removing negation as failure and adding bias constraints to remove unlinked arguments also resulted in higher quality explanations that are easier to understand.

8.2.2.3 Quality of Learned Rules

The most signicant dierence between our approach and previous approaches is our ability to generate an explanation, based on the hypothesis that is learned. Since, as far as we are aware, there is no published existing work on generating explanations for Winograd schemas, and very little work even on determining what the valid explanations are, we investigate the quality of our explanations individually by hand. For each schema, we generate a table containing the premise, question, our answer and automatically generated explanation, and classify the explanation into one of three categories: ideal, acceptable, or incorrect, in addition to adding a short comment on the classication. We provide an example in Table 8.7.1 The example in Table 8.7 has been marked as acceptable. This classication is for explanations which make sense overall, but may be missing some context which is relevant to the answer. For this example, an ideal explanation might be “something may not t because it is large”

¹ The full set of answers, explanations and comments has been made available online at https://www.doc.ic.ac.uk/~js4416/winograd.html.

Premise: The trophy doesn’t fit into the brown suitcase because it is too large.
Question: What is too large?
Answer: Trophy (Correct)
Explanation: The thing which doesn’t fit is the thing which is large.
Comments: Acceptable. This hypothesis could be improved by taking into account causality.

Table 8.7: An example of an explained answer to a Winograd schema, and its classification and comment.

— i.e. it should take into account the causality of the concept. Meanwhile, an explanation is classified as incorrect if it does not make sense — it may be missing context that is essential, or it may use context that isn’t relevant. An example of an incorrect explanation might be “the thing which is large is not the thing which is brown.” While this explanation would give the correct answer for this schema, it is clearly not a valid explanation. Using these criteria, we find that 27 schemas produce ideal explanations, 55 schemas produce acceptable explanations, and the remaining 14 schemas produce incorrect explanations. This means that for 85% of the schemas which we answer, the explanation is acceptable or better. The main observation made during the evaluation of explanations is that our learner often produces quite general explanations — it misses small pieces of context that are relevant, as in the example in Table 8.7. We believe that the solution to this problem involves finding more examples, in particular with more variability, which will force the learner to home in on more specific explanations.
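The headline figure can be recomputed directly from the per-category counts above; the following snippet is purely illustrative and simply performs the arithmetic over the 96 evaluated schemas.

    # Recompute the explanation-quality proportions reported above.
    counts = {"ideal": 27, "acceptable": 55, "incorrect": 14}
    evaluated = sum(counts.values())                   # 96 answered schemas
    valid = counts["ideal"] + counts["acceptable"]     # acceptable or better
    print(f"{valid}/{evaluated} = {100 * valid / evaluated:.0f}%")   # 82/96 = 85%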

8.2.2.4 Comparison with Previous Work

Let us now compare our work with existing approaches to the Winograd Schema Challenge. We provide a brief overview of how our approach compares to both previous symbolic and neural approaches in Table 8.8. In particular, we note how our accuracy is roughly comparable to existing approaches, with the exception of RoBERTa fine-tuned on WinoGrande (Sakaguchi et al., 2020), but we are the first approach to generate explanations. In this subsection, we will look at some of the technical differences between our approach and existing knowledge hunting-based approaches.

Knowledge Hunting With regards to the knowledge hunting method itself, both Sharma et al. (2015) and Emami et al. (2018) use syntax-based approaches, generating queries which are used to filter out irrelevant texts. The approach proposed by Sharma et al. (2015) builds queries simply by swapping out nominals in the texts with wildcards. For example, the text “the trophy would not fit into the brown suitcase because it was too large” would generate the query “.* would not fit into .* because .* was too large”. While this approach is very straightforward to implement and results in very

precise matches, it does not work so well for long sentences with a lot of context, and often returns few or no results, even when there are relevant results available.

Approach             Work                      Coverage      True Positives (Precision)   Produces Explanation?
Logical Reasoning    Schüller (2014)           8 (3%)        8 (100%)                     No*
                     Sharma (2019)             200 (73%)     200 (100%)                   No*
Knowledge Hunting    Sharma et al. (2015)      52 (19%)      49 (94%)                     No
                     Emami et al. (2018)       189 (69%)     106 (56%)                    No
Language Models      Trinh and Le (2018)       273 (100%)    147 (54%)                    No
                     Kocijan et al. (2019b)    273 (100%)    198 (73%)                    No
                     Sakaguchi et al. (2020)   273 (100%)    245 (90%)                    No
Inductive Learning   Ours                      96 (35%)      91 (95%)                     Yes

Table 8.8: A comparison of our approach against existing symbolic (reasoning/knowledge hunting) approaches and the state-of-the-art (language model-based) approaches. Note that the reasoning-based approaches (starred) take the required explanation as an input.

The approach proposed by Emami et al. (2018) splits up schemas based on part-of-speech tags. It first tries to find the candidates and pronoun, and then splits the schema into a context part (containing the two candidates) and a query part (containing the pronoun). Verbs and adjectives within these parts of the schema are then used to generate queries. For example, “the trophy would not fit into the brown suitcase because it was too large” would be split into the context “the trophy would not fit into the brown suitcase” and the query “it was too large”. It then generates a set of context queries such as “would not fit”, “fit”, and “brown”, and a set of query queries such as “was too large” and “large”. This approach is much more complex, and requires schemas to take an exact format, but produces more flexible queries that are more likely to find useful knowledge texts.

Our approach is instead based on semantics: our queries are generated using a schema’s knowledge graph. Unlike the approach proposed by Sharma et al. (2015), this allows us to match on texts which may have a completely different structure than the schema, such as, for example, “The first pick she selected was too large and wouldn’t fit in the small keyhole,” which their approach would never consider. We also use lemmas and WordNet senses to generate our queries, which, unlike either of the previous approaches, allows matching on different conjugations of verbs, for example. Furthermore, unlike Emami et al. (2018), since we work from our knowledge graph instead of attempting to split up the text directly, we do not prescribe an exact format for the schema.

The second significant difference in our approach is our ranking of candidates. Sharma et al. (2015) select their (single) final example manually. Meanwhile, Emami et al. (2018) do not prescribe a limit on the number of examples they use, but they do weight their examples based on the length of the query used to match them, and whether or not the context match and the query match occur in the correct order. On the other hand, our ranking approach is automated and is used to select a limited number of examples, based on the amount of shared context.
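To make the contrast concrete, the sketch below shows, in simplified form, the difference between a surface wildcard query in the style of Sharma et al. (2015) and lemma- and WordNet-sense-based query terms of the kind a semantics-driven approach can use. This is an illustration only, not the implementation used in our pipeline: the sentence, the choice of nominals and the NLTK-based term expansion are all assumptions made for the example (the NLTK WordNet corpus must be downloaded first).

    # Illustrative sketch (not the thesis pipeline): two ways of turning a schema
    # sentence into search queries for knowledge hunting.
    # Requires the NLTK WordNet data: nltk.download('wordnet').
    import re
    from nltk.corpus import wordnet as wn
    from nltk.stem import WordNetLemmatizer

    SCHEMA = "the trophy would not fit into the brown suitcase because it was too large"

    def wildcard_query(text, nominals):
        """Surface query in the style of Sharma et al. (2015): nominals become wildcards."""
        pattern = text
        for nominal in nominals:
            pattern = re.sub(r"\b" + re.escape(nominal) + r"\b", ".*", pattern)
        return pattern

    def semantic_query_terms(content_words):
        """Lemma- and WordNet-sense-based terms, so that different conjugations
        (and close synonyms) of a verb or adjective can still match a knowledge text."""
        lemmatizer = WordNetLemmatizer()
        terms = set()
        for word, pos in content_words:                    # pos is one of 'v', 'a', 'n'
            lemma = lemmatizer.lemmatize(word, pos=pos)
            terms.add(lemma)
            for synset in wn.synsets(lemma, pos=pos)[:1]:  # keep the top sense only
                terms.update(l.name() for l in synset.lemmas())
        return terms

    print(wildcard_query(SCHEMA, ["the trophy", "the brown suitcase", "it"]))
    # -> '.* would not fit into .* because .* was too large'
    print(sorted(semantic_query_terms([("fit", "v"), ("large", "a")])))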

Learning The process of labelling our schemas with a correct and an incorrect interpretation is similar to what has been done in previous approaches. Where our approach differs most significantly from previous knowledge hunting approaches is in what we do with our labelled examples.

The approach proposed by Emami et al. (2018) matches the labelled candidates back to the schema, and then finds, for each example, which candidate from the original schema it supports. They then choose the candidate whose supporting examples’ weights sum to the greater value. The approach proposed by Sharma et al. (2015); Sharma (2019) requires an example text which is fully relevant. They have a reasoning process which finds a graph-subgraph isomorphism between the example and the schema which must fully cover the example. Once this has been done, they can choose an answer, since the correct candidate in the example should be assigned to a candidate in the schema’s knowledge graph.

Our approach replaces this step with inductive learning, in which the learner forms a hypothesis, essentially by specifying which parts of the schema’s knowledge graph are relevant to

resolving the coreferencing ambiguity. Importantly, note how, unlike in the approach proposed by Emami et al. (2018), it is possible for us to choose some candidate A over another candidate B, even if there is only one example which supports choosing A and many which support choosing B — i.e. if the example supporting A can be distinguished by some context that is present in the schema and not in any of the examples supporting B, then this will be taken into account by our learner’s hypothesis. Furthermore, unlike in the approach proposed by Sharma et al. (2015), we do not require the schema to completely cover the example, but instead apply only very limited constraints on our choice of examples. This allows us to find varied examples, forcing us to specify exactly how different subgraphs in the schema’s knowledge graph affect the resolution of the pronoun. We note how it is this process which, unlike any previous knowledge hunting approach, allows us to, in the end, return an explanation with our answer.

Producing Explanations Finally, let us briefly discuss related work on the problem of finding explanations for Winograd schemas. To our knowledge, there is only one piece of work which investigates explanations for the WSC (Zhang et al., 2020). Their approach crowdsources explanations to the WSC and they produce a dataset, WinoWhy, which “requires models to distinguish plausible reasons from very similar but wrong reasons for WSC questions.” They note that WinoWhy is a “challenging task”, and that their best supervised model, which they train by fine-tuning BERT, achieves just 77.8% accuracy on a two-class classification task. Our approach differs significantly from theirs in that we generate an explanation from a Winograd schema alone, rather than taking an existing explanation and a Winograd schema, and determining if that explanation is valid.

8.2.2.5 Strengths

Finally, let us summarise the strengths and limitations of our learning approach to the WSC, and also point out some of the remaining open questions.

Precision Our approach very rarely produces false negatives. Our precision of 0.95 exceeds that of any previous approach except for the reasoning approaches, which take the required explanation as an input.

Explainability Our approach provides explanations for all of its answers, unlike any previous approach. These explanations can be expressed in natural language, and are valid in 85% of cases.

Consistency Our approach answers questions consistently. When the positions of candidates in a schema are swapped, our approach always swaps its answer, thus achieving a consistency score of 100%. In other words, unlike for machine learning approaches, our answer is not affected by people’s names or irrelevant properties of objects.
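For clarity, the consistency score can be thought of as a check over paired schemas; the snippet below is a toy illustration of that check (the pair structure is hypothetical and is not our evaluation code).

    # Toy illustration of the consistency check: for each schema, `original` and
    # `swapped` hold the answer index (0 or 1) given before and after swapping the
    # positions of the two candidates; a consistent system must flip its answer.
    pairs = [
        {"original": 0, "swapped": 1},
        {"original": 1, "swapped": 0},
    ]
    consistent = sum(1 for p in pairs if p["original"] != p["swapped"])
    print(f"consistency: {100 * consistent / len(pairs):.0f}%")   # 100%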

Generality Our approach is not limited to a specific set of schemas, unlike, for example, the approach outlined by Sharma et al. (2015), which only handles two specific forms of example — which they call event-event causality and causal attributive examples. We are able to produce both very simple hypotheses connecting two events, as well as more complex hypotheses which can express causality, contrast, and ordering of events, by referring to a relevant modifier. For example, the knowledge that being large may cause something to not fit can be roughly expressed by learning the rule:

may_refer_to(P, C) ← modifier(L, large, P), modifier(_, because, F, L), neg_event(F, fit, C).

Number and Quality of Examples Our approach can learn the correct commonsense knowledge from very few examples. In fact, for most schemas for which we find knowledge, only a single example is necessary to learn a hypothesis which gets the correct answer. Furthermore, unlike Sharma (2019), we do not manually filter examples, and we can handle examples from a diverse range of sources, including from existing datasets (with labels) or from the web (without labels).

Ability to Incorporate Background Knowledge Unlike machine learning approaches, our approach can easily incorporate prior knowledge, simply by extending the background knowledge. We demonstrate this by automatically enhancing our background knowledge with properties from ConceptNet, as described in Subsection 5.3.2; however, this could also be done manually. This allows us to effectively give the learner hints in a way that is not possible for traditional machine learning approaches. For example, we might tell it that a butterfly wing is fragile:

modifier(bk, fragile, W) ← nominal(W, wing), modifier(_, butterfly, W).

in order to help answer the schema “I put the butterfly wing on the table and it broke.”
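As an illustration of where such hints could come from automatically, the sketch below queries the public ConceptNet web API for HasProperty edges and prints background facts in roughly the style of the rule above. The endpoint and JSON field names are as documented for the public ConceptNet 5 API, and the emitted fact format is only indicative; this is not the generation procedure of Subsection 5.3.2.

    # Rough sketch: pull HasProperty edges for a concept from the public
    # ConceptNet API and turn them into illustrative background facts.
    import requests

    def has_property_facts(concept, limit=10):
        url = "http://api.conceptnet.io/query"
        params = {"start": f"/c/en/{concept}", "rel": "/r/HasProperty", "limit": limit}
        edges = requests.get(url, params=params, timeout=10).json().get("edges", [])
        facts = []
        for edge in edges:
            prop = edge.get("end", {}).get("label", "").replace(" ", "_")
            if prop:
                # Indicative ASP-style fact; a real rule might also condition on
                # context, as the butterfly-wing rule above does.
                facts.append(f"modifier(bk, {prop.lower()}, W) :- nominal(W, {concept}).")
        return facts

    for fact in has_property_facts("wing"):
        print(fact)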

8.2.2.6 Limitations

Recall Clearly the most significant limitation of our approach is its poor recall, since we leave two thirds of schemas unanswered, although our recall may be artificially increased from 0.33 to 0.66 by making a random choice for unanswered schemas. Alternatively, since we have shown that the schemas handled by our approach are complementary to the schemas handled well by traditional machine learning approaches, our approach may be ensembled with a traditional approach to achieve a recall that is superior to either method alone.
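The 0.66 figure follows from answering the remaining schemas uniformly at random; a quick check using the counts reported earlier (273 schemas, 96 answered, 91 correct):

    # Expected recall if the unanswered schemas are answered by a coin flip.
    total, answered, correct = 273, 96, 91
    recall = correct / total                                         # ~0.33
    with_guessing = (correct + 0.5 * (total - answered)) / total     # ~0.66
    print(f"recall: {recall:.2f}, with random guessing: {with_guessing:.2f}")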

Knowledge Hunting The most significant difficulty for our approach is the process of knowledge hunting. Not only does it limit our recall, since for 46% of schemas we find no examples, but it also limits the quality of our explanations, since we often find only a small number of examples, which can be explained by a very general hypothesis.

Performance In order to be able to parse examples with very different linguistic constructions, we require large and complex grammars. Currently our approach appears to be limited by the performance of the ASG parser for large grammars, which restricts us to a very small number of examples and makes it difficult to generate explanations that make use of a lot of context.

8.2.2.7 Open Questions

Transferability to Other Tasks Our approach should be able to solve any Winograd-like coreference resolution problem, but this is clearly quite a niche use-case. We have also demonstrated the utility of ASG induction in a question-answering setting with our work on bAbI tasks 11 and 13, although those experiments did not use knowledge hunting or our automated approach for generating hypothesis spaces. However, we have not explored in depth whether our ability to learn commonsense for the WSC is transferable to other commonsense reasoning tasks. This leads to two main questions: firstly, does our evaluation on the WSC give a realistic estimate of our ability to learn commonsense, and secondly, is our approach applicable to other tasks?

With regards to the first question, it has frequently been noted that there are many threats to the validity of experiments based on the WSC. As noted by Trichelair et al. (2019) (and others), “the main drawback of the Winograd Schema Challenge is its limited size and the absence of training and validation sets for hyperparameter tuning. As a result, achieving above random accuracy on the WSC does not necessarily correspond to capturing common sense; it could be the result of a lucky draw.” However, the benefit of our approach is that we can tell immediately whether we have captured commonsense, not by our accuracy, but by the hypotheses which we produce. Furthermore, our approach has very few parameters compared to machine learning approaches, and we have taken particular care to avoid tuning their values according to performance on the test set.

As to whether we could answer questions from other tasks, it is quite likely that at least some of the work we have done is transferable, as direct knowledge hunting approaches have proven useful in developing approaches to a variety of commonsense reasoning datasets (Emami et al., 2018). However, we note that, at the very least, it would be necessary to modify our labelling procedure for examples in order to be able to cope with other forms of questions.

Chapter 9

Conclusions

In this project, we have investigated the applicability of answer set grammars for natural language understanding, and the applicability of inductive learning (based on those ASGs) for learning commonsense knowledge. Our findings can be summarised as follows:

1. Answer set grammars are capable of capturing syntactic and semantic patterns in natural language. Through the ability to parse WSC schemas and examples sourced from the internet, we demonstrate that it is possible for ASGs to be used for simultaneous syntactic and semantic parsing of natural language texts, when these texts are limited to a small enough domain. We find that annotated atoms often allow much more elegant expression of semantic composition when compared to lambda calculus, allowing us to handle constructs such as coordination and passives without any post-processing. When evaluating our approach on semantic parsing for the WSC, we demonstrate that our implementation achieves an accuracy that is comparable with a dependency parse alone but is far more semantically rich. We determine that the most significant challenge for ASG-based approaches to natural language understanding is the difficulty of handling derivationally-related word forms in a consistent manner; however, we note that this is a challenge for most grammar-based approaches.

2. Answer set grammars can encode, and can be used to learn, commonsense knowledge. In particular, in our bAbI examples, we demonstrate how (a) we can learn complex hypotheses to explain the semantics of natural language examples, and (b) an ASG can encode these solutions entirely, without the need for any additional models, acting as a semantic parser and applying the learned knowledge at the same time.

3. A semantics-based approach to knowledge hunting allows for a diverse set of relevant knowledge texts to be retrieved. We demonstrate that our approach to knowledge hunting, which generates queries from knowledge graphs, and can match on lemmas, WordNet senses and part-of-speech tags, is able to return texts that shallow syntax-based approaches would never consider. We also demonstrate how the approach can be extended to automatically determine the most relevant texts, in contrast to previous approaches, where this was an additional manual step.

4. Knowledge hunting can be combined with inductive logic programming to extract commonsense rules. Unlike any previously published approach (knowledge hunting-based or otherwise), this allows us to support our answers to the WSC with precise explanations. We also show how the approach is able to answer schemas that state-of-the-art neural approaches cannot. Our work confirms the conclusions of previous approaches: namely, that knowledge hunting is a difficult task and is the most significant obstacle to good performance — leading to many schemas being left unanswered and others with explanations that are too general due to too few examples.

9.1 Future Work

We finally discuss some of the most interesting and pressing areas for future work. For our work on semantic parsing using ASGs, we propose some simple extensions which would allow us to attempt question answering tasks, and thus quantitatively evaluate our approach against existing broad-coverage semantic parsers, as well as a more significant extension to handle derivationally-related word forms. Meanwhile, for our learning approach for the WSC, we suggest modifications that could be made to the knowledge hunting process to achieve the goal of generating better explanations, rather than just correct answers.

Extending the Lexicon There are several paths for improving the quality of our base ASGs. In particular, we note several deficiencies of our approach in Subsection 5.3.1, which are in relation to the handling of raising, control, tense and quantification. There are also a handful of rarely used dependencies which we do not currently provide a translation for, but could with a small amount of engineering work.

A more involved extension might look at extending the lexicon with additional features such as person, number, tense, mood and aspect for verbs, and person, number, gender, case and animacy for nouns. This would allow us to implement constraints to enforce linguistic phenomena such as agreement (a small sketch is given below), and also allow for learning hypotheses which might rely on these features.

With more time, it would also be useful to work on question translation. This is mostly already supported, and just requires one final step to identify the question word. This would allow us to evaluate our implementation against a question-answering dataset such as FreebaseQA, and compare our work quantitatively against other approaches to broad-coverage semantic parsing, outside of the domain of WSC problems.
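As a rough illustration of the agreement idea (not the ASG encoding itself), lexical entries could carry small feature bundles, and a unification-style check could rule out non-agreeing combinations; all names below are hypothetical.

    # Illustration only: lexical feature bundles and a unification-style agreement check.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Features:
        person: Optional[int] = None   # 1, 2, 3, or None if unspecified
        number: Optional[str] = None   # "sg", "pl", or None if unspecified

    def agree(a: Features, b: Features) -> bool:
        """Two bundles agree if every feature specified in both has the same value."""
        for field in ("person", "number"):
            x, y = getattr(a, field), getattr(b, field)
            if x is not None and y is not None and x != y:
                return False
        return True

    she = Features(person=3, number="sg")
    they = Features(person=3, number="pl")
    fears = Features(person=3, number="sg")    # "fears" requires a 3sg subject

    print(agree(she, fears))    # True:  "she fears ..." is accepted
    print(agree(they, fears))   # False: "they fears ..." is ruled out by agreement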

Generating Canonical Forms for Derivationally-Related Word Forms We found knowledge hunting to be the major bottleneck in this approach: it is very difficult to find knowledge texts that are semantically close enough to the problem to allow applicable rules to be learned. Thus a clear direction for future work on this project would be to expand the possible knowledge sources in order to pick up more examples, in the hope that we might, as a result, find more relevant knowledge. However, a more interesting path might be to make better use of the knowledge that we already have. This involves extending the semantic parsing capabilities of our ASGs to produce canonical forms in more cases.

Let us again consider the three sentences

She fears heights.
She has a fear of heights.
She is fearful of heights.

which epitomise one of the most significant limitations of our approach, since they all generate different representations and therefore cannot be used effectively as knowledge for one another. We have identified three possible approaches to solve this kind of issue and allow our system to “capture” more knowledge:

1. Greater use of lexical knowledge. Using cross-POS lexical knowledge would allow us to handle these cases; however, to our knowledge there is currently no source which provides relations detailed enough to determine how the different words are related.

2. Use of commonsense knowledge bases. Many previous approaches have made automated use of commonsense knowledge bases such as ConceptNet, WebChild and ATOMIC, which may be used to generate background knowledge that can match sentences of the form above. However, they can contain spurious knowledge and are unlikely to help with highly specific examples.

3. Use of paraphrasing/textual entailment models. A final approach might involve using a textual entailment model to see if one construction entails the other. For example, a RoBERTa-based textual entailment model trained on SNLI and MNLI (Liu et al., 2019) is over 99% confident that each of the sentences above entails each of the others. With this information, one could attempt to generate relevant background knowledge or to modify the original examples to get a closer syntactic match (see the sketch after this list). It is not clear, however, how one might detect candidates for such a process in the first place, and it may be tricky to determine what would make a good confidence threshold.
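As a concrete sketch of option 3, an off-the-shelf MNLI classifier from the Hugging Face hub can be used to score entailment between two paraphrases. Note that the roberta-large-mnli checkpoint used here is trained on MNLI only, a stand-in for the SNLI+MNLI model cited above, and that the 0.9 threshold is an arbitrary placeholder.

    # Sketch of option 3: score textual entailment between paraphrases with an
    # off-the-shelf MNLI model (the roberta-large-mnli checkpoint).
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL = "roberta-large-mnli"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL)

    def entailment_probability(premise: str, hypothesis: str) -> float:
        inputs = tokenizer(premise, hypothesis, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        probs = logits.softmax(dim=-1)[0]
        # Look up the entailment index from the model config instead of hardcoding it.
        entail_idx = {label.lower(): idx for idx, label in model.config.id2label.items()}["entailment"]
        return probs[entail_idx].item()

    p = entailment_probability("She fears heights.", "She is fearful of heights.")
    print(f"entailment probability: {p:.3f}")
    if p > 0.9:   # arbitrary confidence threshold
        print("treat the two sentences as interchangeable knowledge texts")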

While we have seen that our ASG-based formalism is consistent, robust and accurate, this example clearly demonstrates arguably its most significant disadvantage: it is heavily restricted by the fact that semantics must correspond so closely with syntax. Machine learning approaches can ignore such constraints entirely, often treating semantic parsing simply as a sequence-to-sequence prediction task, and using semantically-rich embeddings from pre-trained language models as input features. As the accuracy of these neural semantic parsers continues to improve (Zhang et al., 2019), it may be worth considering modifying our learning approach to make use of a state-of-the-art AMR parser. The problem and examples would first be processed by the AMR parser, and then the AMRs would need to be translated into a suitable logical form for ILASP (which is not necessarily a straightforward task). However, the most significant challenge of such an approach would probably be to develop a neural parser that is both accurate and consistent enough that similar examples are always assigned similar representations. As we have seen, this property is essential for learning, but is considerably less likely to hold for complex texts when there is no requirement for the parser to observe syntactic structure. In particular, current state-of-the-art AMR parsers have particularly poor performance for semantic role labelling (Zhang et al., 2019), which is an issue as our learning approach relies heavily on the consistency of argument structure across similar examples.

Counterexample Guided Knowledge Hunting One current limitation of our knowledge hunting approach is that it draws inspiration from previous work (Sharma et al., 2015; Emami et al., 2018), whose goal was only to select the correct answer. The goal of our approach is not just to select the correct answer, but also to generate a good explanation. This puts the goal of our knowledge hunting approach at odds with the goals of previous knowledge hunting approaches: in particular, we want to find diverse knowledge texts, which force us to contextualise our explanation. To demonstrate this, let us consider an example.

Susan knows all about Ann’s personal problems because she is indiscreet.
Answer 0: Susan
Answer 1: Ann

The kind of knowledge text a knowledge hunter might find would be:

A lot of young conservative men knew about Goodman’s proclivities — he was rather indiscreet.

This example alone teaches the learner that “the person who possesses something is the person who is indiscreet”. This hypothesis is clearly insufficient. To improve the explanation, we need an example of the form:

(1) We knew she would tell us all about his personal life because she was so indiscreet.

This example contradicts our existing hypothesis and thus forces us to take into account the context that “the person whose possession is known about is the person who is indiscreet”. This observation happens to be particularly relevant to our approach, as due to scalability issues, we can use only a small number of examples (say, N). Our current approach chooses the top-N most relevant examples, which often returns the correct answer but can result in learning rules that are too general. This is because for some problems there may be many examples which are all nearly identical to the schema, and thus the knowledge hunter, favouring the most relevant examples, never selects any contradictory examples like (1).

In order to solve this problem, we propose taking inspiration from counterexample guided inductive synthesis (Solar-Lezama et al., 2006) and developing a counterexample guided process for knowledge hunting. In this formulation, the knowledge hunter and the learner are much more closely coupled. The knowledge hunter starts by providing the learner with the single most relevant example, and the learner generates a hypothesis. This hypothesis is then returned to the knowledge hunter, which applies the hypothesis to all of its candidate examples and discards any examples which are correctly answered. From the remaining candidates, if any, the knowledge hunter selects the most relevant example, and the learner is then required to come up with a hypothesis that additionally explains the new example. This process continues until a termination condition is reached. This could be because, say, (a) the knowledge hunter has exhausted its set of examples, (b) the relevance of all of the remaining examples is below a given threshold, (c) the number of examples exceeds a certain threshold (or, equivalently, the learner times out), or (d) the learner cannot explain the set of examples. Such an approach would make the most of the limited number of examples we may use — every example would force the learner to make its hypothesis more specific, thus generating explanations which are more likely to correctly incorporate all of the relevant context.

[Figure 9.1 shows the proposed loop: the knowledge hunter passes examples one at a time to the learner (which acts as an oracle), the learner returns a hypothesis, and the knowledge hunter responds with a counterexample or a termination condition.]

Figure 9.1: A counterexample-guided approach to knowledge hunting. The knowledge hunter would provide examples one-at-a-time to the learner and would ensure that each example is not covered by the existing hypothesis.
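A minimal sketch of the proposed loop is given below. The knowledge hunter and learner sit behind hypothetical interfaces (learn and covers would wrap the ILASP task generation and hypothesis application respectively); the termination conditions correspond to (a) through (d) above.

    # High-level sketch of counterexample-guided knowledge hunting. `candidates`
    # is a list of (example, relevance) pairs sorted most-relevant first; `learn`
    # returns a hypothesis covering all chosen examples (or None if none exists);
    # `covers` applies a hypothesis to a single labelled example.
    def counterexample_guided_hunt(candidates, learn, covers,
                                   max_examples=3, min_relevance=0.5):
        chosen, hypothesis = [], None
        for example, relevance in candidates:
            if relevance < min_relevance:
                break                          # (b) remaining examples not relevant enough
            if len(chosen) >= max_examples:
                break                          # (c) example budget exhausted
            if hypothesis is not None and covers(hypothesis, example):
                continue                       # already explained, so not a counterexample
            chosen.append(example)
            hypothesis = learn(chosen)         # must now also explain the new example
            if hypothesis is None:
                return None, chosen            # (d) the learner cannot explain the set
        return hypothesis, chosen              # (a) candidates exhausted (or budget reached)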

Bibliography

Omri Abend and Ari Rappoport. The state of the art in semantic representation. In Regina Barzilay and Min-Yen Kan, editors, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 77–89. Association for Computational Linguistics, 2017. ISBN 978- 1-945626-75-3. doi: 10.18653/v1/P17-1008. URL https://doi.org/10.18653/v1/P17- 1008.

James F. Allen and Choh Man Teng. Broad coverage, domain-generic deep semantic parsing. In 2017 AAAI Spring Symposia, Stanford University, Palo Alto, California, USA, March 27- 29, 2017. AAAI Press, 2017. URL http://aaai.org/ocs/index.php/SSS/SSS17/paper/ view/15377.

Daniel Bailey, Amelia J. Harrison, Yuliya Lierler, Vladimir Lifschitz, and Julian Michael. The winograd schema challenge and reasoning about correlation. In 2015 AAAI Spring Symposia, Stanford University, Palo Alto, California, USA, March 22-25, 2015. AAAI Press, 2015. URL http://www.aaai.org/ocs/index.php/SSS/SSS15/paper/view/10295.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. Abstract meaning representation for sembanking. In Stefanie Dipper, Maria Liakata, and Antonio Pareja-Lora, editors, Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, LAW-ID@ACL 2013, August 8-9, 2013, Sofia, Bulgaria, pages 178–186. The Association for Computer Linguistics, 2013. URL https://www.aclweb.org/anthology/W13-2322/.

Andras Barany. Argument structure and theta roles, 2017.

David Bender. Establishing a human baseline for the winograd schema challenge. In Glass and Kim(2015), pages 39–45. URL http://ceur-ws.org/Vol-1353/paper_30.pdf.

Yoshua Bengio and Yann LeCun, editors. 4th International Conference on Learning Repre- sentations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016. URL https://iclr.cc/archive/www/doku.php%3Fid=iclr2016:accepted- main.html.

Rajesh Bhatt. Infinitival complementation 1: Control, 2006. URL https://people.umass.edu/bhatt/601-f06/601-f06-l9.pdf.

Blai Bonet and Sven Koenig, editors. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, 2015. AAAI Press. ISBN 978-1-57735-698-1. URL http://www.aaai.org/Library/AAAI/aaai15contents.php.

Johan Bos. Wide-coverage semantic analysis with boxer. In Johan Bos and Rodolfo Del- monte, editors, Semantics in Text Processing. STEP 2008 Conference Proceedings, Venice, Italy, September 22-24, 2008. Association for Computational Linguistics, 2008. URL https: //www.aclweb.org/anthology/W08-2222/.

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. Universal sentence encoder. CoRR, abs/1803.11175, 2018. URL http: //arxiv.org/abs/1803.11175.

Piotr Chabierski, Alessandra Russo, Mark Law, and Krysia Broda. Machine comprehension of text using combinatory categorial grammar and answer set programs. In Andrew S. Gordon, Rob Miller, and György Turán, editors, Proceedings of the Thirteenth International Symposium on Commonsense Reasoning, COMMONSENSE 2017, London, UK, November 6-8, 2017, volume 2052 of CEUR Workshop Proceedings. CEUR-WS.org, 2017. URL http://ceur- ws.org/Vol-2052/paper5.pdf.

Nathanael Chambers and Dan Jurafsky. Unsupervised learning of narrative schemas and their participants. In Keh-Yih Su, Jian Su, and Janyce Wiebe, editors, ACL 2009, Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th Inter- national Joint Conference on Natural Language Processing of the AFNLP, 2-7 August 2009, Singapore, pages 602–610. The Association for Computer Linguistics, 2009. ISBN 978-1- 932432-45-9. URL https://www.aclweb.org/anthology/P09-1068/.

Kevin Clark, Minh-Thang Luong, Christopher D. Manning, and Quoc V. Le. Semi-supervised sequence modeling with cross-view training. In Riloff et al. (2018), pages 1914–1925. ISBN 978-1-948087-84-1. URL https://www.aclweb.org/anthology/D18-1217/.

Jeff Da and Jungo Kasai. Cracking the contextual commonsense code: Understanding commonsense reasoning aptitude of deep contextual representations. CoRR, abs/1910.01157, 2019. URL http://arxiv.org/abs/1910.01157.

Marco Damonte, Shay B. Cohen, and Giorgio Satta. An incremental parser for abstract mean- ing representation. In Proceedings of EACL, 2017.

Donald Davidson. The logical form of action sentences. In Nicholas Rescher, editor, The Logic of Decision and Action, pages 81–120. Univ. of Pittsburgh Press, 1967.

Ernest Davis and Gary Marcus. Commonsense reasoning and commonsense knowledge in artificial intelligence. Commun. ACM, 58(9):92–103, 2015. doi: 10.1145/2701413. URL https://doi.org/10.1145/2701413.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American

Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186. Association for Computational Linguistics, 2019. ISBN 978-1-950737-13-0. URL https://www.aclweb.org/anthology/N19-1423/.

Ali Emami, Noelia De La Cruz, Adam Trischler, Kaheer Suleman, and Jackie Chi Kit Cheung. A knowledge hunting framework for common sense reasoning. In Riloff et al. (2018), pages 1949–1958. ISBN 978-1-948087-84-1. URL https://www.aclweb.org/anthology/D18-1220/.

Ali Emami, Paul Trichelair, Adam Trischler, Kaheer Suleman, Hannes Schulz, and Jackie Chi Kit Cheung. The knowref coreference corpus: Removing gender and number cues for difficult pronominal anaphora resolution. In Korhonen et al. (2019), pages 3952–3961. ISBN 978-1-950737-48-2. URL https://www.aclweb.org/anthology/P19-1386/.

Martin Gebser, Roland Kaminski, Benjamin Kaufmann, and Torsten Schaub. Answer Set Solving in Practice. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2012. doi: 10.2200/S00457ED1V01Y201211AIM019. URL https://doi.org/10.2200/S00457ED1V01Y201211AIM019.

Michael Gelfond and Vladimir Lifschitz. The stable model semantics for logic programming. In Robert A. Kowalski and Kenneth A. Bowen, editors, Logic Programming, Proceedings of the Fifth International Conference and Symposium, Seattle, Washington, USA, August 15-19, 1988 (2 Volumes), pages 1070–1080. MIT Press, 1988. ISBN 0-262-61056-6.

Michael Glass and Jung Hee Kim, editors. Proceedings of the 26th Modern AI and Cognitive Sci- ence Conference 2015, Greensboro, NC, USA, April 25-26, 2015, volume 1353 of CEUR Workshop Proceedings, 2015. CEUR-WS.org. URL http://ceur-ws.org/Vol-1353.

John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to automata theory, languages, and computation, 3rd Edition. Pearson international edition. Addison-Wesley, 2007. ISBN 978-0-321-47617-3.

Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors. Proceedings of the 2019 Con- ference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, Novem- ber 3-7, 2019, 2019. Association for Computational Linguistics. ISBN 978-1-950737-90-1. URL https://aclweb.org/anthology/volumes/D19-1/.

Dan Jurafsky and James H. Martin. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition. Prentice Hall series in artificial intelligence. Prentice Hall, Pearson Education International, 2009. ISBN 9780135041963. URL http://www.worldcat.org/oclc/315913020.

Karin Kipper-Schuler. VerbNet: a broad-coverage, comprehensive verb lexicon. PhD thesis, Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA, 2005. URL http://repository.upenn.edu/dissertations/AAI3179808/.

Nikita Kitaev and Dan Klein. Constituency parsing with a self-attentive encoder. In Iryna Gurevych and Yusuke Miyao, editors, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 2676–2686. Association for Computational Linguistics, 2018. ISBN 978-1-948087-32-2. doi: 10.18653/v1/P18-1249. URL https://www.aclweb.org/anthology/P18-1249/.

Tassilo Klein and Moin Nabi. Attention is (not) all you need for commonsense reasoning. In Korhonen et al.(2019), pages 4831–4836. ISBN 978-1-950737-48-2. URL https://www. aclweb.org/anthology/P19-1477/.

Vid Kocijan, Oana-Maria Camburu, Ana-Maria Cretu, Yordan Yordanov, Phil Blunsom, and Thomas Lukasiewicz. Wikicrem: A large unsupervised corpus for coreference resolution. In Inui et al.(2019), pages 4302–4311. ISBN 978-1-950737-90-1. doi: 10.18653/v1/D19-1439. URL https://doi.org/10.18653/v1/D19-1439.

Vid Kocijan, Ana-Maria Cretu, Oana-Maria Camburu, Yordan Yordanov, and Thomas Lukasiewicz. A surprisingly robust trick for the winograd schema challenge. In Korho- nen et al.(2019), pages 4837–4842. ISBN 978-1-950737-48-2. URL https://www.aclweb. org/anthology/P19-1478/.

Anna Korhonen, David R. Traum, and Lluís Màrquez, editors. Proceedings of the 57th Con- ference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, 2019. Association for Computational Linguistics. ISBN 978-1-950737-48-2. URL https://www.aclweb.org/anthology/volumes/P19-1/.

Peter Lasersohn. Event-based semantics, 2012. URL https://semanticsarchive.net/ Archive/jFhNWM2M/eventbasedsemantics.pdf.

Mark Law, Alessandra Russo, and Krysia Broda. Inductive learning of answer set programs. In Eduardo Fermé and João Leite, editors, Logics in Artificial Intelligence - 14th European Conference, JELIA 2014, Funchal, Madeira, Portugal, September 24-26, 2014. Proceedings, volume 8761 of Lecture Notes in Computer Science, pages 311–325. Springer, 2014. ISBN 978-3-319-11557-3. doi: 10.1007/978-3-319-11558-0_22. URL https://doi.org/10.1007/978-3-319-11558-0_22.

Mark Law, Alessandra Russo, and Krysia Broda. The ILASP system for learning answer set programs. www.ilasp.com, 2015.

Mark Law, Alessandra Russo, and Krysia Broda. Iterative learning of answer set pro- grams from context dependent examples. TPLP, 16(5-6):834–848, 2016. doi: 10.1017/ S1471068416000351. URL https://doi.org/10.1017/S1471068416000351.

Mark Law, Alessandra Russo, Elisa Bertino, Krysia Broda, and Jorge Lobo. Representing and learning grammars in answer set programming. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 2919–2928. AAAI Press, 2019. ISBN 978-1-57735-809-1. doi: 10.1609/aaai.v33i01.33012919. URL https://doi.org/10.1609/aaai.v33i01.33012919.

Douglas B. Lenat, Ramanathan V. Guha, Karen Pittman, Dexter Pratt, and Mary Shepherd. CYC: toward programs with common sense. Commun. ACM, 33(8):30–49, 1990. doi: 10.1145/79173.79176. URL https://doi.org/10.1145/79173.79176.

Hector J. Levesque. On our best behaviour. Artif. Intell., 212:27–35, 2014. doi: 10.1016/j.artint.2014.03.007. URL https://doi.org/10.1016/j.artint.2014.03.007.

Hector J. Levesque. Common Sense, the Turing Test, and the Quest for Real AI. The MIT Press, 1st edition, 2017. ISBN 0262036045.

Hector J. Levesque, Ernest Davis, and Leora Morgenstern. The winograd schema challenge. In Gerhard Brewka, Thomas Eiter, and Sheila A. McIlraith, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Thirteenth International Conference, KR 2012, Rome, Italy, June 10-14, 2012. AAAI Press, 2012. ISBN 978-1-57735-560-1. URL http://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492.

Mike Lewis and Mark Steedman. Combined distributional and logical semantics. TACL, 1:179–192, 2013. URL https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/93.

Vladimir Lifschitz. What is answer set programming? In Dieter Fox and Carla P. Gomes, editors, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, Illinois, USA, July 13-17, 2008, pages 1594–1597. AAAI Press, 2008. ISBN 978-1-57735-368-3. URL http://www.aaai.org/Library/AAAI/2008/aaai08-270.php.

Quan Liu, Hui Jiang, Zhen-Hua Ling, Xiaodan Zhu, Si Wei, and Yu Hu. Combing context and commonsense knowledge through neural networks for solving winograd schema problems. In 2017 AAAI Spring Symposia, Stanford University, Palo Alto, California, USA, March 27-29, 2017. AAAI Press, 2017. URL http://aaai.org/ocs/index.php/SSS/SSS17/paper/view/15392.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692, 2019. URL http://arxiv.org/abs/1907.11692.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. The stanford corenlp natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, System Demonstrations, pages 55–60. The Association for Computer Linguistics, 2014. ISBN 978-1-941643-00-6. URL https://www.aclweb.org/anthology/P14-5010/.

Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. Efficient HPSG parsing with supertagging and cfg-filtering. In Manuela M. Veloso, editor, IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-12, 2007, pages 1671–1676, 2007. URL http://ijcai.org/Proceedings/07/Papers/270.pdf.

George A. Miller. Wordnet: A lexical database for english. Commun. ACM, 38(11):39–41, 1995. doi: 10.1145/219717.219748. URL http://doi.acm.org/10.1145/219717.219748.

Tom M. Mitchell, William W. Cohen, Estevam R. Hruschka Jr., Partha Pratim Talukdar, Justin Betteridge, Andrew Carlson, Bhavana Dalvi Mishra, Matthew Gardner, Bryan Kisiel, Jayant Krishnamurthy, Ni Lao, Kathryn Mazaitis, Thahir Mohamed, Ndapandula Nakashole, Em- manouil A. Platanios, Alan Ritter, Mehdi Samadi, Burr Settles, Richard C. Wang, Derry Wi- jaya, Abhinav Gupta, Xinlei Chen, Abulhair Saparov, Malcolm Greaves, and Joel Welling. Never-ending learning. In Bonet and Koenig(2015), pages 2302–2310. ISBN 978-1-57735- 698-1. URL http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/10049.

Arindam Mitra and Chitta Baral. Addressing a question answering challenge by combining statistical methods with inductive rule learning and reasoning. In Dale Schuurmans and Michael P. Wellman, editors, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, pages 2779–2785. AAAI Press, 2016. URL http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12345.

Leora Morgenstern and Charles L. Ortiz Jr. The winograd schema challenge: Evaluating progress in commonsense reasoning. In Bonet and Koenig(2015), pages 4024–4026. ISBN 978-1-57735-698-1. URL http://www.aaai.org/ocs/index.php/IAAI/IAAI15/paper/ view/9992.

Stephen Muggleton. Inductive logic programming. New Generation Comput., 8(4):295–318, 1991. doi: 10.1007/BF03037089. URL https://doi.org/10.1007/BF03037089.

Stefan Muller and Ivan A. Sag. Introduction to HPSG, 2007. URL http://hpsg.stanford. edu/LSA07/.

Terence Parsons. Events in the Semantics of English: A study in subatomic semantics. MIT Press, Cambridge, MA, 1990.

Haoruo Peng, Daniel Khashabi, and Dan Roth. Solving hard coreference problems. In Rada Mihalcea, Joyce Yue Chai, and Anoop Sarkar, editors, NAACL HLT 2015, The 2015 Confer- ence of the North American Chapter of the Association for Computational Linguistics: Hu- man Language Technologies, Denver, Colorado, USA, May 31 - June 5, 2015, pages 809– 819. The Association for Computational Linguistics, 2015. ISBN 978-1-941643-49-5. URL https://www.aclweb.org/anthology/N15-1082/.

Fabio Petroni, Tim Rocktäschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, and Alexander H. Miller. Language models as knowledge bases? In Inui et al.(2019), pages 2463–2473. ISBN 978-1-950737-90-1. doi: 10.18653/v1/D19-1250. URL https://doi.org/ 10.18653/v1/D19-1250.

C. Pollard, I.A. Sag, Center for the Study of Language, Information (U.S.), Center for the Study of Language, and Information (U.S.). Head-Driven Phrase Structure Grammar. Head-driven Phrase Structure Grammar. University of Chicago Press, 1994. ISBN 9780226674476. URL https://books.google.co.uk/books?id=Ftvg8Vo3QHwC.

Ashok Prakash, Arpit Sharma, Arindam Mitra, and Chitta Baral. Combining knowledge hunting and neural language models to solve the winograd schema challenge. In Korhonen et al. (2019), pages 6110–6119. ISBN 978-1-950737-48-2. URL https://www.aclweb.org/anthology/P19-1614/.

Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind K. Joshi, and Bonnie L. Webber. The penn discourse treebank 2.0. In Proceedings of the Inter- national Conference on Language Resources and Evaluation, LREC 2008, 26 May - 1 June 2008, Marrakech, Morocco. European Language Resources Association, 2008. URL http: //www.lrec-conf.org/proceedings/lrec2008/summaries/754.html.

Peng Qi, Timothy Dozat, Yuhao Zhang, and Christopher D. Manning. Universal dependency parsing from scratch. In Daniel Zeman and Jan Hajic, editors, Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Brussels, Bel- gium, October 31 - November 1, 2018, pages 160–170. Association for Computational Linguis- tics, 2018. ISBN 978-1-948087-82-7. URL https://www.aclweb.org/anthology/K18- 2016/.

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.

Luc De Raedt. Inductive logic programming. In Claude Sammut and Geoffrey I. Webb, editors, Encyclopedia of Machine Learning and Data Mining, pages 648–656. Springer, 2017. ISBN 978-1-4899-7685-7. doi: 10.1007/978-1-4899-7687-1_135. URL https://doi.org/10.1007/978-1-4899-7687-1_135.

Altaf Rahman and Vincent Ng. Resolving complex cases of definite pronouns: The winograd schema challenge. In Jun’ichi Tsujii, James Henderson, and Marius Pasca, editors, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, July 12-14, 2012, Jeju Island, Korea, pages 777–789. ACL, 2012. ISBN 978-1-937284-43-5. URL https://www.aclweb.org/anthology/D12-1071/.

Siva Reddy. Syntax-Mediated Semantic Parsing. PhD thesis, University of Edinburgh, UK, 2017. URL http://sivareddy.in/papers/siva_phd_thesis.pdf.

Siva Reddy, Oscar Täckström, Michael Collins, Tom Kwiatkowski, Dipanjan Das, Mark Steed- man, and Mirella Lapata. Transforming dependency structures to logical forms for seman- tic parsing. TACL, 4:127–140, 2016. URL https://tacl2013.cs.columbia.edu/ojs/ index.php/tacl/article/view/807.

Adam Richard-Bollans, Lucía Gómez Álvarez, and Anthony G. Cohn. The role of pragmat- ics in solving the winograd schema challenge. In Andrew S. Gordon, Rob Miller, and György Turán, editors, Proceedings of the Thirteenth International Symposium on Com- monsense Reasoning, COMMONSENSE 2017, London, UK, November 6-8, 2017, volume 2052 of CEUR Workshop Proceedings. CEUR-WS.org, 2017. URL http://ceur-ws.org/Vol- 2052/paper17.pdf.

Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii, editors. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, 2018. Association for Computational Linguistics. ISBN 978-1-948087-84-1. URL https://www.aclweb.org/anthology/volumes/D18-1/.

Yu-Ping Ruan, Xiaodan Zhu, Zhen-Hua Ling, Zhan Shi, Quan Liu, and Si Wei. Exploring unsupervised pretraining and sentence structure modelling for winograd schema challenge. CoRR, abs/1904.09705, 2019. URL http://arxiv.org/abs/1904.09705.

Josef Ruppenhofer, Michael Ellsworth, Miriam R.L. Petruck, Christopher R. Johnson, and Jan Scheffczyk. FrameNet II: Extended Theory and Practice. International Computer Science Institute, Berkeley, California, 2006. Distributed with the FrameNet data.

Walid S. Saba. A simple machine learning method for commonsense reasoning? A short commentary on trinh & le (2018). CoRR, abs/1810.00521, 2018. URL http://arxiv.org/ abs/1810.00521.

Ivan A. Sag and Thomas Wasow. Syntactic theory: A formal introduction, 1999.

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. Winogrande: An adversarial winograd schema challenge at scale. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pages 8732–8740. AAAI Press, 2020. URL https://aaai.org/ojs/index.php/AAAI/article/view/6399.

Peter Schüller. Tackling winograd schemas by formalizing relevance theory in knowledge graphs. In Chitta Baral, Giuseppe De Giacomo, and Thomas Eiter, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Fourteenth International Confer- ence, KR 2014, Vienna, Austria, July 20-24, 2014. AAAI Press, 2014. ISBN 978-1-57735-657-8. URL http://www.aaai.org/ocs/index.php/KR/KR14/paper/view/7958.

Arpit Sharma. Using answer set programming for commonsense reasoning in the winograd schema challenge. TPLP, 19(5-6):1021–1037, 2019. doi: 10.1017/S1471068419000334. URL https://doi.org/10.1017/S1471068419000334.

Arpit Sharma, Nguyen Ha Vo, Somak Aditya, and Chitta Baral. Towards addressing the winograd schema challenge - building and using a semantic parser and a knowledge hunting module. In Qiang Yang and Michael J. Wooldridge, editors, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 1319–1325. AAAI Press, 2015. ISBN 978-1-57735-738-4. URL http://ijcai.org/Abstract/15/190.

Tom De Smedt and Walter Daelemans. Pattern for python. J. Mach. Learn. Res., 13:2063–2067, 2012. URL http://dl.acm.org/citation.cfm?id=2343710.

Armando Solar-Lezama, Liviu Tancau, Rastislav Bodík, Sanjit A. Seshia, and Vijay A. Saraswat. Combinatorial sketching for finite programs. In John Paul Shen and Margaret Martonosi, editors, Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006, San Jose, CA, USA, October 21-25, 2006, pages 404–415. ACM, 2006. doi: 10.1145/1168857.1168907. URL https://doi.org/10.1145/1168857.1168907.

Robyn Speer, Joshua Chin, and Catherine Havasi. Conceptnet 5.5: An open multilingual graph of general knowledge. In Satinder P. Singh and Shaul Markovitch, editors, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 4444–4451. AAAI Press, 2017. URL http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14972.

Paul Trichelair, Ali Emami, Adam Trischler, Kaheer Suleman, and Jackie Chi Kit Cheung. How reasonable are common-sense reasoning tasks: A case-study on the winograd schema challenge and SWAG. In Inui et al. (2019), pages 3380–3385. ISBN 978-1-950737-90-1. doi: 10.18653/v1/D19-1335. URL https://doi.org/10.18653/v1/D19-1335.

Trieu H. Trinh and Quoc V. Le. A simple method for commonsense reasoning. CoRR, abs/1806.02847, 2018. URL http://arxiv.org/abs/1806.02847.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5998–6008, 2017. URL http://papers.nips.cc/paper/7181-attention-is-all-you-need.

Loïc Vial, Benjamin Lecouteux, and Didier Schwab. Sense vocabulary compression through the semantic knowledge of wordnet for neural word sense disambiguation. CoRR, abs/1905.05677, 2019. URL http://arxiv.org/abs/1905.05677.

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Tal Linzen, Grzegorz Chrupala, and Afra Alishahi, editors, Proceedings of the Workshop: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2018, Brussels, Belgium, November 1, 2018, pages 353–355. Association for Computational Linguistics, 2018. ISBN 978-1-948087-71-1. URL https://www.aclweb.org/anthology/W18-5446/.

Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. Superglue: A stickier benchmark for general-purpose language understanding systems. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Edward A. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 3261–3275, 2019. URL http://papers.nips.cc/paper/8589-superglue-a-stickier-benchmark-for-general-purpose-language-understanding-systems.

Joseph Weizenbaum. ELIZA - a computer program for the study of natural language communication between man and machine. Commun. ACM, 9(1):36–45, 1966. doi: 10.1145/365153.365168. URL https://doi.org/10.1145/365153.365168.

Michael White. OpenCCG: The OpenNLP CCG library, 2016. URL http://openccg.sourceforge.net/.

Zhi-Xiu Ye, Qian Chen, Wen Wang, and Zhen-Hua Ling. Align, mask and select: A simple method for incorporating commonsense knowledge into language representation models. CoRR, abs/1908.06725, 2019. URL http://arxiv.org/abs/1908.06725.

Hongming Zhang, Xinran Zhao, and Yangqiu Song. Winowhy: A deep diagnosis of essential commonsense knowledge for answering winograd schema challenge. CoRR, abs/2005.05763, 2020. URL https://arxiv.org/abs/2005.05763.

Sheng Zhang, Xutai Ma, Kevin Duh, and Benjamin Van Durme. Broad-coverage semantic parsing as transduction. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 3784–3796. Association for Computational Linguistics, 2019. doi: 10.18653/v1/D19-1392. URL https://doi.org/10.18653/v1/D19- 1392.

Junru Zhou and Hai Zhao. Head-driven phrase structure grammar parsing on penn treebank. In Anna Korhonen, David R. Traum, and Lluís Màrquez, editors, Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, pages 2396–2408. Association for Computational Linguistics, 2019. doi: 10.18653/v1/p19-1230. URL https://doi.org/10.18653/v1/p19- 1230.
