The Value of Semantic Parse Labeling for Knowledge Base Question Answering

Wen-tau Yih    Matthew Richardson    Christopher Meek    Ming-Wei Chang    Jina Suh
Microsoft Research
Redmond, WA 98052, USA
{scottyih,mattri,meek,minchang,jinsuh}@microsoft.com

Abstract

We demonstrate the value of collecting semantic parse labels for knowledge base question answering. In particular, (1) unlike previous studies on small-scale datasets, we show that learning from labeled semantic parses significantly improves overall performance, resulting in an absolute 5-point gain compared to learning from answers, (2) we show that with an appropriate user interface, one can obtain semantic parses with high accuracy and at a cost comparable to or lower than obtaining just answers, and (3) we have created and shared the largest semantic-parse labeled dataset to date in order to advance research in question answering.

1 Introduction

Semantic parsing is the mapping of text to a meaning representation. Early work on learning to build semantic parsers made use of datasets of questions and their associated semantic parses (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Wong and Mooney, 2007). Recent work on semantic parsing for knowledge base question answering (KBQA) has called into question the value of collecting such semantic parse labels, with most recent KBQA semantic parsing systems being trained using only question-answer pairs instead of question-parse pairs. In fact, there is evidence that using only question-answer pairs can yield improved performance compared with approaches based on semantic parse labels (Liang et al., 2013). It is also widely believed that collecting semantic parse labels can be a "difficult, time consuming task" (Clarke et al., 2010) even for domain experts. Furthermore, recent focus has been more on the final task-specific performance of a system (i.e., did it get the right answer for a question) as opposed to agreement on intermediate representations (Berant et al., 2013; Kwiatkowski et al., 2013), which allows KBQA datasets to be built with only the answers to each question.

In this work, we re-examine the value of semantic parse labeling and demonstrate that semantic parse labels can provide substantial value for knowledge base question answering. We focus on the task of question answering on Freebase, using the WEBQUESTIONS dataset (Berant et al., 2013).

Our first contribution is the construction of the largest semantic parse dataset for KB question answering to date. In order to evaluate the costs and benefits of gathering semantic parse labels, we created the WEBQUESTIONSSP dataset1, which contains semantic parses for the questions from WEBQUESTIONS that are answerable using Freebase. In particular, we provide SPARQL queries for 4,737 questions. The remaining 18.5% of the original WEBQUESTIONS questions are labeled as "not answerable". This is due to a number of factors, including the use of a more stringent assessment of "answerable", namely that the question be answerable via SPARQL rather than by returning or extracting information from textual descriptions. Compared to the previous semantic parse dataset on Freebase, Free917 (Cai and Yates, 2013), our WEBQUESTIONSSP is not only substantially larger, but also provides the semantic parses in SPARQL with standard Freebase entity identifiers, which are directly executable on Freebase.

Our second contribution is a demonstration that semantic parses can be collected at low cost. We employ a staged labeling paradigm that enables efficient labeling of semantic parses and improves the accuracy, consistency, and efficiency of obtaining answers.

1 Available at http://aka.ms/WebQSP.

In fact, in a simple comparison with using a web browser to extract answers from freebase.com, we show that we can collect semantic parse labels at a comparable or even faster rate than simply collecting answers.

Our third contribution is an empirical demonstration that we can leverage the semantic parse labels to increase the accuracy of a state-of-the-art question-answering system. We use a system that currently achieves state-of-the-art performance on KBQA and show that augmenting its training with semantic parse labels leads to an absolute 5-point increase in average F1.

Our work demonstrates that semantic parse labels can provide additional value over answer labels while, with the right labeling tools, being comparable in cost to collect. Besides accuracy gains, semantic parses also have further benefits in yielding answers that are more accurate and consistent, as well as being updatable if the knowledge base changes (for example, as facts are added or revised).

2 Collecting Semantic Parses

In order to verify the benefits of having labeled semantic parses, we completely re-annotated the WEBQUESTIONS dataset (Berant et al., 2013) such that it contains both semantic parses and the derived answers. We chose to annotate the questions with full semantic parses in SPARQL, based on the schema and data of the latest and last version of Freebase (2015-08-09).

Labeling interface  Writing SPARQL queries for natural language questions using a text editor is obviously not an efficient way to provide semantic parses, even for experts. Therefore, we designed a staged, dialog-like user interface (UI) to improve labeling efficiency. Our UI breaks the potentially complicated structured-labeling task into separate, but inter-dependent, sub-tasks. Given a question, the UI first presents entities detected in the question by an entity linking system (Yang and Chang, 2015) and asks the user to pick an entity in the question as the topic entity that could lead to the answers. The user can also suggest a new entity if none of the candidates returned by the entity linking system is correct. Once the entity is selected, the system then requests the user to pick the Freebase predicate that represents the relationship between the answers and this topic entity. Finally, additional filters can be added to further constrain the answers. One key advantage of our UI design is that the annotator only needs to focus on one particular sub-task during each stage. All of the choices made by the labeler are used to automatically construct a coherent semantic parse. Note that the user can easily go back and forth among these three stages and change previous choices before pressing the final submit button.

Take the question "who voiced meg on family guy?" for example. The labeler will be presented with two entity choices: Meg Griffin and Family Guy, where the former links "meg" to the character's entity and the latter links to the TV show. Depending on the entity selected, legitimate Freebase predicates of the selected entity will be shown, along with the objects (either properties or entities). Suppose the labeler chooses Meg Griffin as the topic entity. He should then pick actor as the main relationship, meaning the answer should be the persons who have played this role. To accurately describe the question, the labeler should add additional filters in the final stage, such as the TV series being Family Guy and the performance type being voice2.

The design of our UI is inspired by recent work on semantic parsing that has been applied to the WEBQUESTIONS dataset (Bast and Haussmann, 2015; Reddy et al., 2014; Berant and Liang, 2014; Yih et al., 2015), as these approaches use a simpler and yet more restricted semantic representation than first-order logic expressions. Following the notion of query graph in (Yih et al., 2015), the semantic parse is anchored to one of the entities in the question as the topic entity, and the core component represents the relation between the entity and the answer, referred to as the inferential chain. Constraints, such as properties of the answer or additional conditions the relation needs to hold, are captured as well. Figure 1 shows an example of these annotated semantic parse components and the corresponding SPARQL query. While it is clear that our UI does not cover complicated, highly compositional questions, most questions in WEBQUESTIONS can be covered3.

Labeling process  In order to ensure data quality, we recruited five annotators who are familiar with the design of Freebase. Our goal is to provide correct semantic parses for each of the legitimate and unambiguous questions in WEBQUESTIONS.

2 Screenshots are included in the supplementary material.
3 We manually edited the SPARQL queries for about 3.1% of the questions in WEBQUESTIONS that are not expressible by our UI.

(a) who voiced meg on family guy?

(b) Topic Entity:  Meg Griffin (m.035szd)
    Inf. Chain:    in-tv-program – actor
    Constraints:   (1) y0 – series – Family Guy (m.019nnl)
                   (2) y0 – performance-type – Voice (m.02nsjvf)

(c) Query graph:   Meg Griffin –in-tv-program→ y0 –actor→ x
                   with constraints y0 –series→ Family Guy and y0 –performance-type→ Voice

(d) PREFIX ns: <http://rdf.freebase.com/ns/>
    SELECT ?x
    WHERE {
      ns:m.035szd ns:tv.tv_character.appeared_in_tv_program ?y0 .
      ?y0 ns:tv.regular_tv_appearance.actor ?x ;
          ns:tv.regular_tv_appearance.series ns:m.019nnl ;
          ns:tv.regular_tv_appearance.special_performance_type ns:m.02nsjvf .
    }

Figure 1: Example semantic parse of the question (a) "who voiced meg on family guy?" The three components in (b) record the labels collected through our dialog-like user interface, and can be mapped deterministically to either the corresponding query graph (c) or the SPARQL query (d).
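The mapping from the labeled components in (b) to the SPARQL query in (d) is deterministic. As a rough illustration of how such a compilation step can work, the following Python sketch rebuilds the query in (d) from the three components. The class, field, and function names are our own illustration, and the constraint handling is simplified (constraints attach to the intermediate node of a two-hop chain, or to the answer node otherwise), so this is a sketch of the idea rather than the actual implementation behind our labeling tool.

# Illustrative sketch (not the labeling tool's actual code): compile the three
# collected components of Figure 1(b) into the SPARQL query of Figure 1(d).
from dataclasses import dataclass, field
from typing import List, Tuple

NS_PREFIX = "PREFIX ns: <http://rdf.freebase.com/ns/>"

@dataclass
class ParseLabel:
    """The three components collected by the labeling UI (illustrative schema)."""
    topic_entity: str                      # Freebase MID of the topic entity
    inferential_chain: List[str]           # one or two predicates from topic entity to answer
    constraints: List[Tuple[str, str]] = field(default_factory=list)  # (predicate, object MID)

def to_sparql(label: ParseLabel) -> str:
    """Deterministically compile the labeled components into a SPARQL query."""
    lines = [NS_PREFIX, "SELECT ?x", "WHERE {"]
    if len(label.inferential_chain) == 1:
        # One-hop chain: the topic entity connects directly to the answer ?x.
        lines.append(f"  ns:{label.topic_entity} ns:{label.inferential_chain[0]} ?x .")
        constrained_node = "?x"
    else:
        # Two-hop chain through an intermediate (CVT) node ?y0.
        lines.append(f"  ns:{label.topic_entity} ns:{label.inferential_chain[0]} ?y0 .")
        lines.append(f"  ?y0 ns:{label.inferential_chain[1]} ?x .")
        constrained_node = "?y0"
    for predicate, obj in label.constraints:
        lines.append(f"  {constrained_node} ns:{predicate} ns:{obj} .")
    lines.append("}")
    return "\n".join(lines)

# The example from Figure 1: "who voiced meg on family guy?"
meg_example = ParseLabel(
    topic_entity="m.035szd",  # Meg Griffin
    inferential_chain=["tv.tv_character.appeared_in_tv_program",
                       "tv.regular_tv_appearance.actor"],
    constraints=[("tv.regular_tv_appearance.series", "m.019nnl"),  # Family Guy
                 ("tv.regular_tv_appearance.special_performance_type", "m.02nsjvf")],  # Voice
)
print(to_sparql(meg_example))

Running the sketch on the Figure 1 example prints a query equivalent to (d), with each constraint emitted as a separate triple pattern rather than a semicolon-separated list.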

Our labeling instructions (included in the supplementary material) follow several key principles. For instance, the annotators should focus on giving the correct semantic parse of a question, based on the assumption that it will result in correct answers if the KB is complete and correct.

Among all 5,810 questions in WEBQUESTIONS, there are 1,073 questions for which the annotators could not provide complete parses to find the answers, due to issues with the questions or with Freebase. For example, some questions are ambiguous and without clear intent (e.g., "where did romans go?"). Others are questions for which Freebase is not the appropriate information source (e.g., "where to watch tv online for free in canada?").

3 Using Semantic Parses

In order to compare the two training paradigms, learning from question-answer pairs and learning from semantic parses, we adopt the Staged Query Graph Generation (STAGG) algorithm (Yih et al., 2015), which achieves the highest published answer prediction accuracy on the WEBQUESTIONS dataset. STAGG formulates the output semantic parse in a query graph representation that mimics the design of a graph knowledge base. It searches over potential query graphs for a question, iteratively growing the query graph by sequentially adding a main topic entity, then an inferential chain, and finally a set of constraints. During the search, each candidate query graph is judged by a scoring function on how likely the graph is a correct parse, based on features indicating how each individual component matches the original question, as well as some properties of the whole query graph. Example features include the score output by the entity linking system, the match score of the inferential chain against the relation described in the question from a deep neural network model, the number of nodes in the candidate query graph, and the number of matching words in constraints. For additional details see (Yih et al., 2015).

When question-answer pairs are available, we create a set of query graphs connecting entities in the question to the answers in the training set, as in (Yih et al., 2015). We score the quality of a query graph by the F1 score between the answer derived from the query graph and the answer in the training set. These scores are then used in a learning-to-rank approach to predict high-quality query graphs.

In the case that semantic parses are available, we change the score used for evaluating the quality of a query graph. In particular, we assign the query graph a score of zero whenever it is not a subgraph consistent with the semantic parse label, and the F1 score described above otherwise. The hope is that by leveraging the semantic parse, we can significantly reduce the number of incorrect query graphs used during training. For instance, the predicate music.artist.track was incorrectly predicted as the inferential chain for the question "what are the songs that justin bieber write?", where a correct parse should use the relation music.composer.compositions.
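The following minimal sketch summarizes the two training signals in code (our paraphrase, not STAGG's actual implementation). The query-graph execution and the subgraph-consistency test are left as placeholder arguments, since they depend on internal representations not shown here.

# Sketch of the two training signals: answer F1 alone, or answer F1 gated on
# consistency with the labeled semantic parse. Illustrative only.
from typing import Set

def answer_f1(predicted: Set[str], gold: Set[str]) -> float:
    """F1 between the answers derived from a candidate query graph and the gold answers."""
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def training_score(candidate_graph, gold_answers, execute, gold_parse=None, is_consistent=None):
    """Score used to rank candidate query graphs during training.

    Answers only: F1 of the candidate's derived answers against the gold answers.
    With a semantic parse label: the same F1, forced to zero when the candidate
    is not a subgraph consistent with the labeled parse.

    `execute` (query graph -> answer set) and `is_consistent` (candidate, parse
    -> bool) are placeholders for components whose internals are not shown here.
    """
    if gold_parse is not None and not is_consistent(candidate_graph, gold_parse):
        return 0.0
    return answer_f1(execute(candidate_graph), gold_answers)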

4 The Value of Semantic Parses

In this section, we explore the costs of collecting semantic parse labels and the benefits of using them.

4.1 Benefits of Semantic Parses

Leveraging the new dataset, we study whether a semantic parser learned using full parses instead of just question-answer pairs can answer questions more accurately, using the knowledge base. Below, we describe our basic experimental setting and report the main results.

Experimental setting  We followed the same training/testing splits as in the original WEBQUESTIONS dataset, but only used questions with complete parses and answers for training and evaluation in our experiments. In the end, 3,098 questions are used for model training and 1,639 questions are used for evaluation4. Because there can be multiple answers to a question, precision, recall, and F1 are computed for each individual question. The average F1 score is reported as the main evaluation metric. In addition, we also report the true accuracy: a question is considered answered correctly only when the predicted answers exactly match one of the answer sets.
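For concreteness, a small sketch of these metrics follows (our reconstruction of the definitions above, not the official evaluation script). In particular, when a question has more than one acceptable answer set, the sketch scores F1 against the best-matching set, which is one plausible reading of the text.

# Sketch of the evaluation metrics: per-question P/R/F1, average F1, and
# exact-match answer-set accuracy. Illustrative reconstruction only.
from typing import List, Set

def prf1(predicted: Set[str], gold: Set[str]):
    """Per-question precision, recall, and F1 between predicted and gold answer sets."""
    overlap = len(predicted & gold)
    p = overlap / len(predicted) if predicted else 0.0
    r = overlap / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

def evaluate(predictions: List[Set[str]], gold_answer_sets: List[List[Set[str]]]):
    """Average F1 and exact-match accuracy over all evaluation questions.

    Each question may have more than one acceptable gold answer set; F1 is taken
    against the best-matching set, and exact-match accuracy requires the
    prediction to equal one of the sets exactly.
    """
    f1_sum, exact = 0.0, 0
    for predicted, gold_sets in zip(predictions, gold_answer_sets):
        f1_sum += max(prf1(predicted, gold)[2] for gold in gold_sets)
        exact += int(any(predicted == gold for gold in gold_sets))
    n = len(predictions)
    return f1_sum / n, exact / n

# Toy usage: two questions, the second with two acceptable answer sets.
avg_f1, accuracy = evaluate(
    [{"Mila Kunis"}, {"a", "b"}],
    [[{"Mila Kunis", "Lacey Chabert"}], [{"a", "b"}, {"a", "b", "c"}]],
)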

Training Signals    Prec.   Rec.   Avg. F1   Acc.
Answers             67.3    73.1   66.8      58.8
Sem. Parses         70.9    80.3   71.7      63.9

Table 1: The results of two different model training settings: answers only vs. semantic parses.

Labeling Method         Answers    Answers   Sem. Parses
Annotator               MTurkers   Experts   Experts
Avg. time/question      Unknown    82 sec    21 sec
Labeling correctness    66%        92%       94%

Table 2: Comparing labeling methods on 50 sampled questions.

Results  Table 1 shows the results of two different models: learning from question-answer pairs vs. learning from semantic parses. With the labeled parses, the average F1 score is 4.9 points higher (71.7% vs. 66.8%). The stricter metric, complete answer set accuracy, reflects the same trend: the accuracy of training with labeled parses is 5.1% higher than using only the answers (63.9% vs. 58.8%).

While it is expected that training on the annotated parses could result in a better model, it is still interesting to see the size of the performance gap, especially when the evaluation is on the correctness of the answers rather than the parses. We examined the output answers to the questions where the two models differ. Although the setting of using answers only often guesses the correct relations connecting the topic entity and answers, it can be confused by related, but incorrect, relations as well. Similar phenomena also occur on constraints, which suggests that subtle differences in meaning are difficult to catch if the semantic parses are automatically generated using only the answers.

4.2 Costs of Semantic Parses

Our labeling process is very different from that of the original WEBQUESTIONS dataset, where each question is paired with answers found on the Freebase website by Amazon MTurk workers. To compare these two annotation methods, we sampled 50 questions and had one expert label them using two schemes: finding answers using the Freebase website and labeling the semantic parses using our UI. The time needed, as well as the correctness of the answers, are summarized in Table 2.

Interestingly, in this study we found that it actually took less time to label these questions with semantic parses using our UI than to label them with only answers. There could be several possible explanations. First, as many questions in this dataset are actually "simple" and do not need complicated compositional structured semantic parses, our UI can help make the labeling process very efficient. By ranking the possible linked entities and likely relations, the annotators are able to pick the correct component labels fairly easily. In contrast, simple questions may have many legitimate answers. Enumerating all of the correct answers can take significantly longer than authoring a semantic parse that computes them.

When we compare the annotation quality between labeling semantic parses and answers, we find that the correctness5 of the answers is about the same (92% vs. 94%). In the original WEBQUESTIONS dataset, only 66% of the answers are completely correct. This is largely due to the low accuracy (42.9%) of the 14 questions containing multiple answers. This indicates that to ensure data quality, more verification is needed when leveraging crowdsourcing.

4 The average F1 score of the original STAGG's output on these 1,639 questions is 60.3%, evaluated using WEBQUESTIONS. Note that this number is not directly comparable to what we report in Table 1 because many of the labeled answers in WEBQUESTIONS are either incorrect or incomplete.
5 We considered a label to be correct only if the derived/labeled answer set is completely accurate.

5 Discussion

Unlike the work of Liang et al. (2013) and Clarke et al. (2010), we demonstrate that semantic parses can improve over state-of-the-art knowledge base question answering systems. There are a number of potential differences that are likely to contribute to this finding. Unlike previous work, we compare training with answers and training with semantic parses while making only minimal changes to a state-of-the-art training algorithm. This enables a more direct evaluation of the potential benefits of using semantic parses. Second, and perhaps the more significant difference, is that our evaluation is based on Freebase, which is significantly larger than the knowledge bases used in previous work. We suspect that the gains provided by semantic parse labels are due to a significant reduction in the number of paths between candidate entities and answers when the search is limited to semantically valid paths. However, in domains where the number of potential paths between candidate entities and answers is small, the value of collecting semantic parse labels might also be small.

Semantic parse labels provide additional benefits. For example, collecting semantic parse labels relative to a knowledge base can ensure that the answers are more faithful to the knowledge base and better capture which questions are answerable by the knowledge base. Moreover, by creating semantic parses using a labeling system based on the target knowledge base, the correctness and completeness of answers can be improved. This is especially true for questions that have large answer sets. Finally, semantic parse labels are more robust to changes in knowledge base facts because answers can be computed via execution of the semantic representation for the question. For instance, the answer to "Who does Chris Hemsworth have a baby with?" might change if the knowledge base is updated with new facts about children, but the semantic parse would not need to change.

Notice that, besides being used for the full semantic parsing task, our WEBQUESTIONSSP dataset is a good test bed for several other important semantic tasks as well. For instance, the topic entity annotations are beneficial for training and testing entity linking systems. The core inferential chains alone are quality annotations for relation extraction and matching. Specific types of constraints are useful too. For example, the temporal semantic labels are valuable for identifying temporal expressions and their time spans. Because our dataset specifically focuses on questions, it complements existing datasets for these individual tasks, which tend to target normal corpora of regular sentences.

While our labeling interface design was aimed at supporting labeling experts, it would be valuable to enable crowdsourcing workers to provide semantic parse labels. One promising approach is to use a more dialog-driven interface using natural language (similar to (He et al., 2015)). Such a UI design is also crucial for extending our work to handle more complicated questions. For instance, allowing users to traverse longer paths in a sequential manner will increase the expressiveness of the output parses, both in the core relation and in the constraints. Displaying a small knowledge graph centered at the selected entities and relations may help users explore alternative relations more effectively as well.

Acknowledgments

We thank Andrei Aron for the initial design of the labeling interface.

References

Hannah Bast and Elmar Haussmann. 2015. More accurate question answering on Freebase. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 1431–1440. ACM.

Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1415–1425, Baltimore, Maryland, June. Association for Computational Linguistics.

Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544, Seattle, Washington, USA, October. Association for Computational Linguistics.

Qingqing Cai and Alexander Yates. 2013. Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 423–433, Sofia, Bulgaria, August. Association for Computational Linguistics.

James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. Driving semantic parsing from the world's response. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 18–27. Association for Computational Linguistics.

Luheng He, Mike Lewis, and Luke Zettlemoyer. 2015. Question-answer driven semantic role labeling: Using natural language to annotate natural language. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 643–653, Lisbon, Portugal, September. Association for Computational Linguistics.

Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1545–1556, Seattle, Washington, USA, October. Association for Computational Linguistics.

Percy Liang, Michael I. Jordan, and Dan Klein. 2013. Learning dependency-based compositional semantics. Computational Linguistics, 39(2):389–446.

Siva Reddy, Mirella Lapata, and Mark Steedman. 2014. Large-scale semantic parsing without question-answer pairs. Transactions of the Association for Computational Linguistics, 2:377–392.

Yuk Wah Wong and Raymond J. Mooney. 2007. Learning synchronous grammars for semantic parsing with lambda calculus. In Annual Meeting of the Association for Computational Linguistics (ACL).

Yi Yang and Ming-Wei Chang. 2015. S-MART: Novel tree-based structured learning algorithms applied to tweet entity linking. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 504–513, Beijing, China, July. Association for Computational Linguistics.

Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao. 2015. Semantic parsing via staged query graph generation: Question answering with knowledge base. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1321–1331, Beijing, China, July. Association for Computational Linguistics.

John Zelle and Raymond Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the National Conference on Artificial Intelligence, pages 1050–1055.

Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Conference on Uncertainty in Artificial Intelligence (UAI).
