Methods for Using Textual Entailment in Open-Domain Question Answering
Sanda Harabagiu and Andrew Hickl
Language Computer Corporation
1701 North Collins Boulevard
Richardson, Texas 75080 USA
[email protected]

Abstract

Work on the semantics of questions has argued that the relation between a question and its answer(s) can be cast in terms of logical entailment. In this paper, we demonstrate how computational systems designed to recognize textual entailment can be used to enhance the accuracy of current open-domain automatic question answering (Q/A) systems. In our experiments, we show that when textual entailment information is used to either filter or rank answers returned by a Q/A system, accuracy can be increased by as much as 20% overall.

1 Introduction

Open-Domain Question Answering (Q/A) systems return a textual expression, identified from a vast document collection, as a response to a question asked in natural language. In the quest for producing accurate answers, the open-domain Q/A problem has been cast as: (1) a pipeline of linguistic processes pertaining to the processing of questions, relevant passages, and candidate answers, interconnected by several types of lexico-semantic feedback (cf. (Harabagiu et al., 2001; Moldovan et al., 2002)); (2) a combination of language processes that transform questions and candidate answers into logic representations such that reasoning systems can select the correct answer based on their proofs (cf. (Moldovan et al., 2003)); (3) a noisy-channel model which selects the most likely answer to a question (cf. (Echihabi and Marcu, 2003)); or (4) a constraint satisfaction problem, where sets of auxiliary questions are used to provide more information and better constrain the answers to individual questions (cf. (Prager et al., 2004)). While different in their approach, each of these frameworks seeks to approximate the forms of semantic inference that will allow them to identify valid textual answers to natural language questions.

Recently, the task of automatically recognizing one form of semantic inference – textual entailment – has received much attention from groups participating in the 2005 and 2006 PASCAL Recognizing Textual Entailment (RTE) Challenges (Dagan et al., 2005).[1]

[1] http://www.pascal-network.org/Challenges/RTE

As currently defined, the RTE task requires systems to determine whether, given two text fragments, the meaning of one text could be reasonably inferred, or textually entailed, from the meaning of the other text. We believe that systems developed specifically for this task can provide current question-answering systems with valuable semantic information that can be leveraged to identify exact answers from ranked lists of candidate answers. By replacing the pairs of texts evaluated in the RTE Challenge with combinations of questions and candidate answers, we expect that textual entailment could provide yet another mechanism for approximating the types of inference needed in order to answer questions accurately.
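As an illustration of the RTE setting (this sketch is not code from the paper, and all names in it are hypothetical), such a recognizer can be thought of as exposing a minimal interface: a YES/NO judgment plus a confidence score for an ordered pair of texts.

```python
from dataclasses import dataclass

@dataclass
class EntailmentResult:
    entailed: bool     # YES/NO judgment for the (text, hypothesis) pair
    confidence: float  # real-valued confidence in [0, 1]

def recognize_entailment(text: str, hypothesis: str) -> EntailmentResult:
    """Decide whether `hypothesis` can reasonably be inferred from `text`.

    Hypothetical stand-in for a trained RTE system of the kind fielded
    in the PASCAL RTE Challenges; the system actually used by the
    authors is described in Section 3.
    """
    raise NotImplementedError
```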
In this paper, we present three different methods for incorporating systems for textual entailment into the traditional Q/A architecture employed by many current systems. Our experimental results indicate that (even at their current level of performance) textual entailment systems can substantially improve the accuracy of Q/A, even when no other form of semantic inference is employed.

The remainder of the paper is organized as follows. Section 2 describes the three methods of using textual entailment in open-domain question answering that we have identified, while Section 3 presents the textual entailment system we have used. Section 4 details our experimental methods and our evaluation results. Finally, Section 5 provides a discussion of our findings, and Section 6 summarizes our conclusions.

[Figure 1: Integrating Textual Entailment in Q/A. The diagram shows the Question Processing (QP), Passage Retrieval (PR), and Answer Processing (AP) modules of a Q/A system, together with the three textual entailment integration points: Method 1 applies TE to the ranked list of answers (producing Answers-M1), Method 2 applies TE to the retrieved paragraphs (Answers-M2), and Method 3 applies TE to questions generated by the AutoQUAB module (Answers-M3).]

2 Integrating Textual Entailment in Question Answering

In this section, we describe three different methods for integrating a textual entailment (TE) system into the architecture of an open-domain Q/A system.

Work on the semantics of questions (Groenendijk, 1999; Lewis, 1988) has argued that the formal answerhood relation found between a question and a set of (correct) answers can be cast in terms of logical entailment. Under these approaches (referred to as licensing by (Groenendijk, 1999) and aboutness by (Lewis, 1988)), p is considered to be an answer to a question ?q iff ?q logically entails the set of worlds in which p is true (i.e., ?p). While the notion of textual entailment has been defined far less rigorously than logical entailment, we believe that the recognition of textual entailment between a question and a set of candidate answers – or between a question and questions generated from answers – can enable Q/A systems to identify correct answers with greater precision than current keyword- or pattern-based techniques.

As illustrated in Figure 1, most open-domain Q/A systems generally consist of a sequence of three modules: (1) a question processing (QP) module; (2) a passage retrieval (PR) module; and (3) an answer processing (AP) module. Questions are first submitted to a QP module, which extracts a set of relevant keywords from the text of the question and identifies the question's expected answer type (EAT). Keywords – along with the question's EAT – are then used by a PR module to retrieve a ranked list of paragraphs which may contain answers to the question. These paragraphs are then sent to an AP module, which extracts an exact candidate answer from each passage and then ranks each candidate answer according to the likelihood that it is a correct answer to the original question.
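A skeleton of this three-module architecture might look as follows; this is an illustrative sketch under assumed names (process_question, retrieve_passages, extract_and_rank are hypothetical stand-ins for the QP, PR, and AP components described above), not the authors' implementation.

```python
from typing import List, Tuple

def process_question(question: str) -> Tuple[List[str], str]:
    """QP module: extract relevant keywords and the expected answer type (EAT)."""
    raise NotImplementedError

def retrieve_passages(keywords: List[str], eat: str, collection) -> List[str]:
    """PR module: return a ranked list of paragraphs that may contain answers."""
    raise NotImplementedError

def extract_and_rank(question: str, paragraphs: List[str], eat: str) -> List[str]:
    """AP module: extract an exact candidate answer from each paragraph and
    rank the candidates by the likelihood that each answers the question."""
    raise NotImplementedError

def answer_question(question: str, collection) -> List[str]:
    keywords, eat = process_question(question)                 # QP
    paragraphs = retrieve_passages(keywords, eat, collection)  # PR
    return extract_and_rank(question, paragraphs, eat)         # AP
```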
Method 1. In Method 1, answers from a ranked list that do not meet the minimum conditions for TE are removed from consideration; the remaining answers are then re-ranked based on the entailment confidence (a real-valued number ranging from 0 to 1) assigned by the TE system to each remaining example. The system then outputs a new set of ranked answers which does not contain any answers that are not entailed by the user's question.

Table 1 provides an example where Method 1 could be used to make the right prediction for a set of answers. Even though A1 was ranked in sixth position, the identification of a high-confidence positive entailment enabled it to be returned as the top answer. In contrast, the recognition of a negative entailment for A2 caused this answer to be dropped from consideration altogether.

Q1: "What did Peter Minuit buy for the equivalent of $24.00?"

     Rank (before)  TE (conf.)  Rank (after)  Answer Text
A1   6th            YES (0.89)  1st           Everyone knows that, back in 1626, Peter Minuit bought Manhattan from the Indians for $24 worth of trinkets.
A2   1st            NO (0.81)   –             In 1626, an enterprising Peter Minuit flagged down some passing locals, plied them with beads, cloth and trinkets worth an estimated $24, and walked away with the whole island.

Table 1: Re-ranking of answers by Method 1.

Method 2. Since AP is often a resource-intensive process for most Q/A systems, we expect that TE information can be used to limit the number of passages considered during AP. As illustrated in Method 2 in Figure 1, lists of passages retrieved by a PR module can either be ranked (or filtered) using TE information. Once ranking is complete, answer extraction takes place only on the set of entailed passages that the system considers likely to contain a correct answer to the user's question.

Method 3. In previous work (Harabagiu et al., 2005b), we have described techniques that can be used to automatically generate well-formed natural language questions from the text of paragraphs retrieved by a PR module. In our current system, sets of automatically-generated questions (AGQ) are created using a stand-alone AutoQUAB generation module, which assembles question-answer pairs (known as QUABs) from the top-ranked passages returned in response to a question. Entailment between the user's question and these AGQs can then be used to identify a set of answer passages that contain correct answers to the original question. For example, in Table 2, we find that entailment between questions indicates the correctness of a candidate answer: here, establishing that Q2 entails AGQ1 and AGQ2 (but not AGQ3 or AGQ4) enables the system to select A2 as the correct answer.

When at least one of the AGQs generated by the AutoQUAB module is entailed by the original question, all AGQs that do not reach TE are filtered from consideration; the remaining passages are assigned an entailment confidence score and are sent to the AP module in order to provide an exact answer to the question. Following this process, candidate answers extracted by the AP module were then re-associated with their AGQs and resubmitted to the TE system (as in Method 1). Question-answer pairs deemed to be positive instances of entailment were then stored in a database and used as additional training data for the AutoQUAB module. When no AGQs were found to be entailed by the original question, however, passages were ranked according to their entailment confidence and sent to AP for further processing and validation. (A combined sketch of all three methods is given at the end of this excerpt.)

3 The Textual Entailment System

Processing textual entailment, or recognizing whether the information expressed in a text can be
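As a summary of the three methods described in Section 2, the sketch below shows where a TE system enters the pipeline. It reuses the illustrative recognize_entailment interface from Section 1; the entailment directions follow the wording of the text, and the confidence threshold and the generate_questions helper are assumptions, not details taken from the paper.

```python
def method1_rerank(question, ranked_answers, min_confidence=0.5):
    """Method 1: remove answers not entailed by the user's question,
    then re-rank the survivors by the TE system's confidence."""
    judged = [(a, recognize_entailment(question, a)) for a in ranked_answers]
    kept = [(a, r) for (a, r) in judged
            if r.entailed and r.confidence >= min_confidence]
    kept.sort(key=lambda pair: pair[1].confidence, reverse=True)
    return [a for (a, _) in kept]

def method2_filter_passages(question, passages):
    """Method 2: pass only entailed passages, ranked by confidence,
    on to the resource-intensive AP module."""
    judged = [(p, recognize_entailment(question, p)) for p in passages]
    kept = [(p, r) for (p, r) in judged if r.entailed]
    kept.sort(key=lambda pair: pair[1].confidence, reverse=True)
    return [p for (p, _) in kept]

def generate_questions(passage):
    """Hypothetical stand-in for the AutoQUAB generation module, which
    produces automatically-generated questions (AGQs) from a passage."""
    raise NotImplementedError

def method3_select_by_agqs(question, passages):
    """Method 3: keep passages for which at least one AGQ is entailed by
    the user's question; when no AGQ is entailed, fall back to ranking
    all passages by their best entailment confidence."""
    entailed, scored = [], []
    for passage in passages:
        results = [recognize_entailment(question, agq)
                   for agq in generate_questions(passage)]
        best = max((r.confidence for r in results), default=0.0)
        scored.append((passage, best))
        if any(r.entailed for r in results):
            entailed.append((passage, best))
    chosen = entailed if entailed else scored
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [p for (p, _) in chosen]
```

The sketch omits one step the text describes for Method 3: positively entailed question-answer pairs are also stored and reused as additional training data for the AutoQUAB module.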