Knowledge-Based Question Answering As Machine Translation
Junwei Bao†∗, Nan Duan‡, Ming Zhou‡, Tiejun Zhao†
†Harbin Institute of Technology  ‡Microsoft Research
baojunwei001@gmail.com  {nanduan, mingzhou}@microsoft.com  tjzhao@hit.edu.cn
∗ This work was finished while the author was visiting Microsoft Research Asia.

Abstract

A typical knowledge-based question answering (KB-QA) system faces two challenges: one is to transform natural language questions into their meaning representations (MRs); the other is to retrieve answers from knowledge bases (KBs) using the generated MRs. Unlike previous methods, which treat these two tasks in a cascaded manner, we present a translation-based approach that solves them in one unified framework. We translate questions to answers based on CYK parsing. Answers, as translations of the span covered by each CYK cell, are obtained by a question translation method, which first generates formal triple queries as MRs for the span based on question patterns and relation expressions, and then retrieves answers from a given KB based on the generated triple queries. A linear model is defined over derivations, and minimum error rate training is used to tune feature weights based on a set of question-answer pairs. Compared to a KB-QA system using a state-of-the-art semantic parser, our method achieves better results.

1 Introduction

Knowledge-based question answering (KB-QA) computes answers to natural language (NL) questions based on existing knowledge bases (KBs). Most previous systems tackle this task in a cascaded manner: first, the input question is transformed into its meaning representation (MR) by an independent semantic parser (Zettlemoyer and Collins, 2005; Mooney, 2007; Artzi and Zettlemoyer, 2011; Liang et al., 2011; Cai and Yates, 2013; Poon, 2013; Artzi et al., 2013; Kwiatkowski et al., 2013; Berant et al., 2013); then, the answers are retrieved from existing KBs using the generated MRs as queries.

Unlike existing KB-QA systems, which treat semantic parsing and answer retrieval as two cascaded tasks, this paper presents a unified framework that integrates semantic parsing into the question answering procedure directly. Borrowing ideas from machine translation (MT), we treat the QA task as a translation procedure. Like MT, CYK parsing is used to parse each input question, and the answers of the span covered by each CYK cell are considered the translations of that cell; unlike MT, which uses offline-generated translation tables to translate source phrases into target translations, a semantic parsing-based question translation method is used to translate each span into its answers on the fly, based on question patterns and relation expressions. The final answers can be obtained from the root cell. Derivations generated during such a translation procedure are modeled by a linear model, and minimum error rate training (MERT) (Och, 2003) is used to tune feature weights based on a set of question-answer pairs.

Figure 1 shows an example: the question "director of movie starred by Tom Hanks" is translated to one of its answers, Robert Zemeckis, by three main steps: (i) translate "director of" to "director of"; (ii) translate "movie starred by Tom Hanks" to one of its answers, Forrest Gump; (iii) translate "director of Forrest Gump" to a final answer, Robert Zemeckis. Note that the updated question covered by Cell[0, 6] is obtained by combining the answers to the question spans covered by Cell[0, 1] and Cell[2, 6].

[Figure 1: Translation-based KB-QA example. CYK cells Cell[0, 1], Cell[2, 6], and Cell[0, 6] over the question "director of movie starred by Tom Hanks", showing the three translation steps (i) director of ⟹ director of, (ii) movie starred by Tom Hanks ⟹ Forrest Gump, and (iii) director of Forrest Gump ⟹ Robert Zemeckis.]

The contributions of this work are two-fold: (1) We propose a translation-based KB-QA method that integrates semantic parsing and QA in one unified framework. The benefit of our method is that we do not need to explicitly generate complete semantic structures for input questions; besides, answers generated during the translation procedure help significantly with search space pruning. (2) We propose a robust method to transform single-relation questions into formal triple queries as their MRs, which trades off between transformation accuracy and recall using question patterns and relation expressions respectively.

2 Translation-Based KB-QA

2.1 Overview

Formally, given a knowledge base KB and an NL question Q, our KB-QA method generates a set of formal-triple-answer pairs {⟨D, A⟩} as derivations, which are scored and ranked by the distribution P(⟨D, A⟩ | KB, Q) defined as follows:

P(\langle D, A \rangle \mid KB, Q) = \frac{\exp\{\sum_{i=1}^{M} \lambda_i \cdot h_i(\langle D, A \rangle, KB, Q)\}}{\sum_{\langle D', A' \rangle \in H(Q)} \exp\{\sum_{i=1}^{M} \lambda_i \cdot h_i(\langle D', A' \rangle, KB, Q)\}}

• KB denotes a knowledge base that stores a set of assertions. Each assertion t ∈ KB is in the form of {e_sbj^ID, p, e_obj^ID}, where p denotes a predicate, and e_sbj and e_obj denote the subject and object entities of t, with unique IDs. (We use a large-scale knowledge base in this paper, which contains 2.3B entities, 5.5K predicates, and 18B assertions; a 16-machine cluster is used to host and serve the whole data. Each KB entity has a unique ID; for the sake of convenience, we omit the ID information in the rest of the paper.)

• H(Q) denotes the search space {⟨D, A⟩}. D is composed of a set of ordered formal triples {t_1, ..., t_n}. Each triple t = {e_sbj, p, e_obj}_i^j ∈ D denotes an assertion in KB, where i and j denote the beginning and end indexes of the question span from which t is transformed. The order of triples in D denotes the order of translation steps from Q to A. E.g., ⟨director of, Null, director of⟩_0^1, ⟨Tom Hanks, Film.Actor.Film, Forrest Gump⟩_2^6, and ⟨Forrest Gump, Film.Film.Director, Robert Zemeckis⟩_0^6 are the three ordered formal triples corresponding to the three translation steps in Figure 1. We define the task of transforming question spans into formal triples as question translation. A denotes one final answer of Q.

• h_i(·) denotes the ith feature function.

• λ_i denotes the feature weight of h_i(·).
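To make the distribution defined above concrete, the following minimal Python sketch ranks a set of candidate derivations under a log-linear model of this form; the feature names, feature values, and weights are illustrative placeholders of our own, not the features or data used in the paper.

import math

def linear_score(features, weights):
    # sum_i lambda_i * h_i(<D, A>, KB, Q) for one derivation
    return sum(weights[name] * value for name, value in features.items())

def rank_derivations(candidates, weights):
    # candidates: list of (answer, feature_dict) pairs drawn from H(Q).
    # Returns (answer, probability) pairs, where the probability is the
    # normalized P(<D, A> | KB, Q), i.e. a softmax over the linear scores.
    scores = [linear_score(f, weights) for _, f in candidates]
    z = sum(math.exp(s) for s in scores)
    ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
    return [(answer, math.exp(s) / z) for (answer, _f), s in ranked]

# Illustrative usage with made-up feature names and weights.
weights = {"pattern_match": 0.5, "answer_popularity": 0.3}
candidates = [
    ("Robert Zemeckis", {"pattern_match": 1.0, "answer_popularity": 2.0}),
    ("Tom Tykwer", {"pattern_match": 0.2, "answer_popularity": 0.5}),
]
print(rank_derivations(candidates, weights))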
According to the above description, our KB-QA method can be decomposed into four tasks: (1) search space generation for H(Q); (2) question translation for transforming question spans into their corresponding formal triples; (3) feature design for h_i(·); and (4) feature weight tuning for {λ_i}. We present the details of these four tasks in the following subsections one by one.

2.2 Search Space Generation

We first present our translation-based KB-QA method in Algorithm 1, which is used to generate H(Q) for each input NL question Q.

Algorithm 1: Translation-based KB-QA
 1  for l = 1 to |Q| do
 2    for all i, j s.t. j − i = l do
 3      H(Q_i^j) = ∅;
 4      T = QTrans(Q_i^j, KB);
 5      foreach formal triple t ∈ T do
 6        create a new derivation d;
 7        d.A = t.e_obj;
 8        d.D = {t};
 9        update the model score of d;
10        insert d to H(Q_i^j);
11      end
12    end
13  end
14  for l = 1 to |Q| do
15    for all i, j s.t. j − i = l do
16      for all m s.t. i ≤ m < j do
17        for d_l ∈ H(Q_i^m) and d_r ∈ H(Q_{m+1}^j) do
18          Q_update = d_l.A + d_r.A;
19          T = QTrans(Q_update, KB);
20          foreach formal triple t ∈ T do
21            create a new derivation d;
22            d.A = t.e_obj;
23            d.D = d_l.D ∪ d_r.D ∪ {t};
24            update the model score of d;
25            insert d to H(Q_i^j);
26          end
27        end
28      end
29    end
30  end
31  return H(Q).
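As a rough illustration of how Algorithm 1 could be realized, here is a short Python sketch of the CYK-style search space generation; the qtrans and score helpers are hypothetical stand-ins for the question translation method (Section 2.3) and the model score, and the triple and derivation representations are our own assumptions rather than the paper's.

from collections import defaultdict

def generate_search_space(tokens, kb, qtrans, score):
    # H maps a span (i, j) to its list of derivations, mirroring H(Q_i^j).
    # Each derivation carries the answer A, the ordered triples D, and a score.
    n = len(tokens)
    H = defaultdict(list)

    # First half (Lines 1-13): translate every span directly via qtrans.
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            span = " ".join(tokens[i:j + 1])
            for t in qtrans(span, kb):
                d = {"A": t["obj"], "D": [t]}
                d["score"] = score(d)
                H[(i, j)].append(d)

    # Second half (Lines 14-31): combine the answers of two consecutive
    # smaller spans into an updated question and translate it again.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            for m in range(i, j):
                for dl in H[(i, m)]:
                    for dr in H[(m + 1, j)]:
                        updated = dl["A"] + " " + dr["A"]
                        for t in qtrans(updated, kb):
                            d = {"A": t["obj"],
                                 "D": dl["D"] + dr["D"] + [t]}
                            d["score"] = score(d)
                            H[(i, j)].append(d)
    return H

The final answers of Q would then be read off the derivations stored for the root span (0, n − 1).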
The first half (from Line 1 to Line 13) generates a formal triple set T for each unary span Q_i^j ∈ Q, using the question translation method QTrans(Q_i^j, KB) (Line 4), which takes Q_i^j as the input. Each triple t ∈ T returned is in the form of {e_sbj, p, e_obj}, where e_sbj's mention occurs in Q_i^j, p is a predicate that denotes the meaning expressed by the context of e_sbj in Q_i^j, and e_obj is an answer of Q_i^j based on e_sbj, p, and KB. We describe the implementation detail of QTrans(·) in Section 2.3.

The second half (from Line 14 to Line 31) first updates the content of each bigger span Q_i^j by concatenating the answers to any two of its consecutive smaller spans (Line 18).

Algorithm 2: QP-based Question Translation
 1  T = ∅;
 2  foreach entity mention e_Q ∈ Q do
 3    Q_pattern = replace e_Q in Q with [Slot];
 4    foreach question pattern QP do
 5      if Q_pattern == QP_pattern then
 6        E = Disambiguate(e_Q, QP_predicate);
 7        foreach e ∈ E do
 8          create a new triple query q;
 9          q = {e, QP_predicate, ?};
10          {A_i} = AnswerRetrieve(q, KB);
11          foreach A_i ∈ {A_i} do
12            create a new formal triple t;
13            t = {q.e_sbj, q.p, A_i};
14            t.score = 1.0;
15            insert t to T;
16          end
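To illustrate the question-pattern branch concretely, the sketch below mirrors the visible part of Algorithm 2 in Python; question_patterns, disambiguate, and answer_retrieve are hypothetical stand-ins for the pattern store, entity disambiguation, and KB lookup components, and their interfaces are our assumptions rather than the paper's.

def qp_based_question_translation(question, entity_mentions, question_patterns,
                                  disambiguate, answer_retrieve, kb):
    # question_patterns: iterable of (pattern, predicate) pairs, where the
    # pattern contains a [Slot] placeholder, e.g. "movie starred by [Slot]".
    # disambiguate(mention, predicate): candidate KB entities for the mention.
    # answer_retrieve(query, kb): answers of the triple query {e, predicate, ?}.
    triples = []
    for mention in entity_mentions:
        slotted = question.replace(mention, "[Slot]")
        for pattern, predicate in question_patterns:
            if slotted == pattern:
                for entity in disambiguate(mention, predicate):
                    query = {"sbj": entity, "p": predicate, "obj": None}
                    for answer in answer_retrieve(query, kb):
                        # Pattern-based translations receive a fixed score of
                        # 1.0 (Line 14 of Algorithm 2).
                        triples.append({"sbj": query["sbj"], "p": query["p"],
                                        "obj": answer, "score": 1.0})
    return triples

In this form, each returned dictionary plays the role of one formal triple t inserted into T, and the fixed score reflects that exact pattern matches are treated as fully reliable.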