LOREN: Logic Enhanced Neural Reasoning for Fact Verification
Jiangjie Chen*†, Qiaoben Bao†, Jiaze Chen‡, Changzhi Sun‡, Hao Zhou‡, Yanghua Xiao†, Lei Li‡
†Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
‡ByteDance AI Lab, Beijing, China
{jjchen19, qbbao19, [email protected]}, {chenjiaze, sunchangzhi, zhouhao.nlp, [email protected]}

arXiv:2012.13577v1 [cs.CL] 25 Dec 2020

Abstract

Given a natural language statement, how can we verify whether it is supported, refuted, or unknown according to a large-scale knowledge source like Wikipedia? Existing neural-network-based methods often regard a sentence as a whole, while we argue that it is beneficial to decompose a statement into multiple verifiable logical points. In this paper, we propose LOREN, a novel approach for fact verification that integrates both LOgic guided REasoning and Neural inference. The key insight of LOREN is that it decomposes a statement into multiple reasoning units around its central phrases. Instead of directly validating a single reasoning unit, LOREN turns it into a question-answering task and calculates the confidence of every single hypothesis using neural networks in the embedding space. These confidences are aggregated into a final prediction by a neural joint reasoner guided by a set of three-valued logic rules. LOREN enjoys the additional merit of interpretability: it is easy to explain how it reaches a certain result from its intermediate outputs, and why it makes mistakes. We evaluate LOREN on FEVER, a public benchmark for fact verification. Experiments show that LOREN outperforms previously published methods and achieves a FEVER score of 73.43%.

[Figure 1 shows the LOREN pipeline for the claim "The Adventures of Pluto Nash was reviewed by Ron Underwood": ⓪ evidence retrieval, Stage 1 (reasoning unit construction: ① central phrase extraction, ② probing question generation, ③ evidential phrase calculation), and Stage 2 (verification: ④ single hypothesis verification, ⑤ joint reasoning).]

Figure 1: An example claim from the FEVER dataset and our proposed fact verification framework LOREN. Highlighted texts denote the data flow for three central phrases extracted from the claim. LOREN includes two stages: reasoning unit construction and verification. The relevant evidence is assumed to be provided by an external evidence retrieval module.

1 Introduction

The rapid evolution of mobile platforms has facilitated the creation and spreading of information, and suspicious statements often emerge on social media platforms. How can we effectively verify textual statements against an external knowledge source? We study the problem of fact verification for a given sentence with respect to a large-scale knowledge source such as Wikipedia. Note that the difficulty arises because the outcome includes not only supported (SUP) and refuted (REF), but also not enough information (NEI). Fig. 1 describes an example claim to be verified from the FEVER dataset (Thorne et al.
2018), "The Adventures of Pluto Nash was reviewed by Ron Underwood". The relevant Wikipedia page states that "The Adventures of Pluto Nash is ... directed by Ron Underwood". Therefore, the ground-truth label for the claim is refuted (REF).

Recent work has achieved noticeable progress on this task; however, existing methods struggle at the logical reasoning that is ostensibly required for fact verification. The majority of them focus on designing specialized neural network architectures, with the hope of exploiting the semantics of sentences (Yoneda et al. 2018; Hanselowski et al. 2018; Nie, Chen, and Bansal 2019; Zhou et al. 2019; Liu et al. 2020; Zhong et al. 2020b). However, it is hard to interpret the internal mechanism when these neural methods make mistakes. Symbolic logic rules, on the other hand, offer a versatile way to convey high-level reasoning and communicate structured knowledge (Bindris, Sudhahar, and Cristianini 2018; Zhong et al. 2020a).

In this paper, we propose LOREN, a method benefiting from both neural networks and symbolic logic rules. We are inspired by the human reasoning process when checking the validity of a statement: intuitively, we identify key information points, search for relevant evidence in the knowledge base, and then compare the key points with the evidence. We aim to develop an approach that mimics those steps. Our intuition is that a well-designed neural network can match key points with evidence, while the decomposition of a textual claim into verifiable elements requires logical reasoning.

Therefore, the overall idea of LOREN is to decompose verification into a series of hypothesis verifications, to which symbolic logic rules can be applied for more logical reasoning. LOREN first constructs and validates multiple individual hypotheses based on the detection of central phrases in the claim (e.g., noun phrases, verb phrases, adjective phrases). Second, LOREN aggregates the individual hypotheses based on a set of logical composition rules and performs joint reasoning to produce a final decision. As a motivating example, Fig. 1 shows the reasoning process for verifying "The Adventures of Pluto Nash was reviewed by Ron Underwood": we need to check the integrity of every central phrase, which in turn makes the final prediction interpretable.

We formulate the verification problem using a set of symbolic rules according to Kleene's three-valued logic (Kleene 1952; Ciucci and Dubois 2012), which models the uncertainty in verification beyond binary truth values. Each rule concerns the verification of a central phrase against its relevant phrases in the evidence, which together form a reasoning unit. As a key perspective, we convert the problem of finding relevant phrases in evidence into a question-answering task, where we generate probing questions for the evidence to answer. For example, one probing question can be "Who reviewed The Adventures of Pluto Nash?". Compromising on the extreme complexity of the real world, we soften the logic and take advantage of the linguistic and world knowledge learned by large-scale pre-trained language models (PLMs), e.g., BERT (Devlin et al. 2019) and RoBERTa (Liu et al. 2019), which serve as the basis of our architecture for reasoning.

To summarize, the contributions of this work include:
• We are the first to formulate the logic required in fact verification with three-valued logic and to enhance deep reasoning in this task with symbolic logic.
• We propose LOREN to decompose the verification of a claim into the verification of every reasoning unit, followed by joint logical reasoning, which enjoys interpretability.
• Our framework, with a simple design for each component, surpasses the previously published SOTA model on FEVER by 1.13 points on the main metric, and gains more with the evolution of the backbone PLM.

*Work was done during an internship at ByteDance. Corresponding authors are Yanghua Xiao and Lei Li.

2 Related Work

Fact Verification. The FEVER shared task (Thorne et al. 2018) addresses fact verification, which verifies claims by collecting evidence from Wikipedia. The majority of existing works adopt a two-step pipeline to verify a textual claim, i.e., evidence retrieval and claim verification (Chen et al. 2017). They often adopt the same or similar approaches for the first sub-task, making claim verification the primary concern. Recent verification systems are built upon large-scale PLMs, owing to the great success in NLP tasks of models represented by BERT (Devlin et al. 2019). Most of these works focus on modeling the interaction between claim and evidence in representation space. One can aggregate information obtained by reasoning over individual claim-evidence pairs (Yoneda et al. 2018; Hanselowski et al. 2018), or regard each sentence as a node and reason with architectures such as self-attention-based graph reasoning (Zhong et al. 2020a) or a kernel graph attention mechanism (Liu et al. 2020). In contrast, Zhong et al. (2020b) build semantic graphs of the claim and evidence via semantic role labeling and employ XLNet (Yang et al. 2019) and GNNs (Kipf and Welling 2016; Veličković et al. 2018) to propagate information and realize learning to reason. Instead, we take a further step toward logical reasoning in this task while taking advantage of neural networks' ability to model the uncertainty of the real world.

Neural Networks with Logic in NLP. Previous efforts toward unifying symbolic logic and neural networks include those of Sourek et al. (2015); Dong et al. (2019a); Manhaeve et al. (2018); Lamb et al. (2020). One class of methods integrating symbolic logic and neural networks is based on the variational EM framework (Qu and Tang 2019; Harsha Vardhan, Jia, and Kok 2020; Zhang et al. 2020): they alternately optimize the logical model (e.g., Markov logic networks) and the neural model to achieve the unification. Another standard method is to soften logic with neural network components (Li et al. 2019; Wang and Pan 2020; Hu et al. 2016), which can be trained in an end-to-end manner. Besides, neural module networks (NMNs) provide a general-purpose framework for learning collections of neural modules that can be dynamically assembled into arbitrary deep networks (Andreas et al. 2016; Zhong et al. 2020a).

3 Proposed Approach

In this section, we present the main framework of the LOREN method for fact verification. We follow the setup of the FEVER 1.0 shared task (Thorne et al. 2018), which consists of two sub-tasks: evidence retrieval and fact verification. In this paper, we focus on the fact verification task, that is, predicting the veracity of a given textual claim and its relevant
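To make the three-valued aggregation in the introduction concrete, the following is a minimal sketch, not the authors' code: it assumes each reasoning unit (one central phrase checked against evidence) has already received a local verdict in {SUP, REF, NEI}, and combines them with a strong-Kleene-style conjunction (any refuted unit refutes the claim; all units supported supports it; otherwise not enough information). The function and label names are illustrative.

```python
# Illustrative sketch of three-valued aggregation over reasoning units.
# Labels follow FEVER; the conjunction rule is a strong-Kleene AND.
SUP, REF, NEI = "SUPPORTS", "REFUTES", "NOT ENOUGH INFO"

def aggregate(unit_verdicts):
    """Combine per-phrase verdicts into a claim-level verdict."""
    if any(v == REF for v in unit_verdicts):
        return REF          # one contradicted phrase refutes the claim
    if all(v == SUP for v in unit_verdicts):
        return SUP          # every phrase is backed by evidence
    return NEI              # some phrase is undetermined

# Fig. 1 example: phrases "The Adventures of Pluto Nash", "reviewed",
# "Ron Underwood"; "reviewed" contradicts the evidence "directed".
print(aggregate([NEI, REF, NEI]))  # prints: REFUTES
```

Note that in LOREN itself the unit verdicts are soft confidences produced by a neural model and the rules guide a neural joint reasoner, rather than being applied as the hard symbolic rules shown here.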