Information Extraction over Structured Data: Question Answering with Freebase

Xuchen Yao¹ and Benjamin Van Durme¹,²
¹Center for Language and Speech Processing
²Human Language Technology Center of Excellence
Johns Hopkins University, Baltimore, MD, USA

Abstract

Answering natural language questions using the Freebase knowledge base has recently been explored as a platform for advancing the state of the art in open domain semantic parsing. Those efforts map questions to sophisticated meaning representations that are then attempted to be matched against viable answer candidates in the knowledge base. Here we show that relatively modest information extraction techniques, when paired with a web-scale corpus, can outperform these sophisticated approaches by roughly 34% relative gain.

1 Introduction

Question answering (QA) from a knowledge base (KB) has a long history within natural language processing, going back to the 1960s and 1970s, with systems such as Baseball (Green Jr et al., 1961) and Lunar (Woods, 1977). These systems were limited to closed domains due to a lack of knowledge resources, computing power, and ability to robustly understand natural language. With the recent growth in KBs such as DBPedia (Auer et al., 2007), Freebase (Bollacker et al., 2008) and Yago2 (Hoffart et al., 2011), it has become more practical to consider answering questions across wider domains, with commercial systems including Google Now, based on Google's Knowledge Graph, and Facebook Graph Search, based on social network connections.

The AI community has tended to approach this problem with a focus on first understanding the intent of the question, via shallow or deep forms of semantic parsing (c.f. §3 for a discussion). Typically questions are converted into some meaning representation (e.g., the lambda calculus), then mapped to database queries. Performance is thus bounded by the accuracy of the original semantic parsing, and the well-formedness of the resultant queries.[1]

[1] As an example, 50% of errors of the CCG-backed (Kwiatkowski et al., 2013) system were contributed by parsing or structural matching failure.

The Information Extraction (IE) community approaches QA differently: first performing relatively coarse information retrieval as a way to triage the set of possible answer candidates, and only then attempting to perform deeper analysis.

Researchers in semantic parsing have recently explored QA over Freebase as a way of moving beyond closed domains such as GeoQuery (Tang and Mooney, 2001). While making semantic parsing more robust is a laudable goal, here we provide a more rigorous IE baseline against which those efforts should be compared: we show that "traditional" IE methodology can significantly outperform prior state of the art as reported in the semantic parsing literature, with a relative gain of 34% F1 as compared to Berant et al. (2013).

2 Approach

We will view a KB as an interlinked collection of "topics". When given a question about one or several topics, we can select a "view" of the KB concerning only the involved topics, then inspect every related node within a few hops of relations to the topic node in order to extract the answer. We call such a view a topic graph and assume answers can be found within the graph. We aim to maximally automate the answer extraction process by massively combining discriminative features for both the question and the topic graph. With a high-performance learner we have found that a system with millions of features can be trained within hours, leading to intuitive, human-interpretable features.

For example, we learn that given a question concerning money, such as: what money is used in ukraine, the expected answer type is likely currency. We formalize this approach in §4.

One challenge for natural language querying against a KB is the relative informality of queries as compared to the grammar of a KB. For example, for the question who cheated on celebrity A, answers can be retrieved via the Freebase relation celebrity.infidelity.participant, but the connection between the phrase cheated on and the formal KB relation is not explicit. To alleviate this problem, the best attempt so far is to map from ReVerb (Fader et al., 2011) predicate-argument triples to Freebase relation triples (Cai and Yates, 2013; Berant et al., 2013). Note that to boost precision, ReVerb has already pruned away less frequent or less credible triples, yielding not as much coverage as its text source, ClueWeb. Here we instead directly mine relation mappings from ClueWeb and show that both direct relation mapping precision and indirect QA F1 improve by a large margin. Details in §5.

Finally, we tested our system, jacana-freebase,[2] on a realistic dataset generously contributed by Berant et al. (2013), who collected thousands of commonly asked questions by crawling the Google Suggest service. Our method achieves state-of-the-art performance with F1 at 42.0%, a 34% relative increase over the previous F1 of 31.4%.

[2] https://code.google.com/p/jacana

3 Background

QA from a KB faces two prominent challenges: model and data. The model challenge involves finding the best meaning representation for the question, converting it into a query, and executing the query on the KB. Most work approaches this via the bridge of various intermediate representations, including combinatory categorial grammar (Zettlemoyer and Collins, 2005, 2007, 2009; Kwiatkowski et al., 2010, 2011, 2013), synchronous context-free grammars (Wong and Mooney, 2007), dependency trees (Liang et al., 2011; Berant et al., 2013), string kernels (Kate and Mooney, 2006; Chen and Mooney, 2011), and tree transducers (Jones et al., 2012). These works successfully showed their effectiveness in QA, despite the fact that most of them require hand-labeled logic annotations. More recent research has started to minimize this direct supervision by using latent meaning representations (Berant et al., 2013; Kwiatkowski et al., 2013) or distant supervision (Krishnamurthy and Mitchell, 2012). We instead attack the problem of QA from a KB from an IE perspective: we learn directly the pattern of QA pairs, represented by the dependency parse of questions and the Freebase structure of answer candidates, without the use of intermediate, general-purpose meaning representations.

The data challenge is more formally framed as ontology or (textual) schema matching (Hobbs, 1985; Rahm and Bernstein, 2001; Euzenat and Shvaiko, 2007): matching the structure of two ontologies/databases or, by extension, mapping between KB relations and NL text. In terms of the latter, Cai and Yates (2013) and Berant et al. (2013) applied pattern matching and relation intersection between Freebase relations and predicate-argument triples from the ReVerb OpenIE system (Fader et al., 2011). Kwiatkowski et al. (2013) expanded their CCG lexicon with Wiktionary word tags towards more domain independence. Fader et al. (2013) learned question paraphrases by aligning multiple questions with the same answers generated by WikiAnswers. The key factor in their success is having a huge text source. Our work pushes the data challenge to the limit by mining directly from ClueWeb, a 5TB collection of web data.

Finally, the KB community has developed other means for QA without semantic parsing (Lopez et al., 2005; Frank et al., 2007; Unger et al., 2012; Yahya et al., 2012; Shekarpour et al., 2013). Most of these works executed SPARQL queries on interlinked data represented by RDF (Resource Description Framework) triples, or simply performed triple matching. Heuristics and manual templates were also commonly used (Chu-Carroll et al., 2012). We propose instead to learn discriminative features from the data with shallow question analysis. The final system captures intuitive patterns of QA pairs automatically.

4 Graph Features

Our model is inspired by an intuition on how everyday people search for answers. If you asked someone: what is the name of justin bieber brother,[3] and gave them access to Freebase, that person might first determine that the question is about Justin Bieber (or his brother), go to Justin Bieber's Freebase page, and search for his brother's name. Unfortunately Freebase does not contain an exact relation called brother, but instead sibling. Thus further inference (i.e., brother ↔ male sibling) has to be made. In the following we describe how we represent this process.

[3] All examples used in this paper come from the training data crawled from Google Suggest. They are lowercased and some contain typos.

4.1 Question Graph

In answering our example query a person might take into consideration multiple constraints. With regard to the question, we know we are looking for the name of a person based on the following:

- the dependency relations nsubj(what, name) and prep_of(name, brother) indicate that the question seeks the information of a name;[4]
- the dependency relation prep_of(name, brother) indicates that the name is about a brother (but we do not know whether it is a person name yet);
- the dependency relation nn(brother, bieber) and the facts that (i) Bieber is a person and (ii) a person's brother should also be a person indicate that the name is about a person.

[4] We use the Stanford collapsed dependency form.

This motivates the design of dependency-based features. We show one example in Figure 1(a), left side. The following linguistic information is of interest:

- question word (qword), such as what/who/how many. We use a list of 9 common qwords.[5]
- question focus (qfocus), a cue of expected answer types, such as name/money/time. We keep our analysis simple and do not use a question classifier, but simply extract the noun dependent of qword as qfocus.
- question verb (qverb), such as is/play/take, extracted from the main verb of the question. Question verbs are also good hints of answer types. For instance, play is likely to be followed by an instrument, a movie or a sports team.
- question topic (qtopic). The topic of the question helps us find relevant Freebase pages. We simply apply a named entity recognizer to find the question topic. Note that there can be more than one topic in the question.

[5] who, when, what, where, how, which, why, whom, whose.

Then we convert the dependency parse into a more generic question graph, in the following steps:

1. if a node was tagged with a question feature, then replace this node with its question feature, e.g., what → qword=what;
2. (special case) if a qtopic node was tagged as a named entity, then replace this node with its named entity form, e.g., bieber → qtopic=person;
3. drop any leaf node that is a determiner, preposition or punctuation.

The converted graph is shown in Figure 1(a), right side. We call this a question feature graph, with every node and relation a potential feature for this question. Then features are extracted in the following form: with s the source and t the target node, for every edge e(s, t) in the graph, extract s, t, s|t and s|e|t as features. For the edge prep_of(qfocus=name, brother), this would mean the following features: qfocus=name, brother, qfocus=name|brother, and qfocus=name|prep_of|brother.

We show with examples why these features make sense later in §6, Table 6. Furthermore, the reason that we have kept some lexical features, such as brother, is that we hope to learn from training a high correlation between brother and some Freebase relations and properties (such as sibling and male) if we do not possess an external resource to help us identify such a correlation.

4.2 Freebase Topic Graph

Given a topic, we selectively roll out the Freebase graph by choosing those nodes within a few hops of relationship to the topic node, and form a topic graph. Besides incoming and/or outgoing relationships, nodes also have properties: a string that describes an attribute of a node, for instance, node type, gender or height (for a person). One major difference between relations and properties is that both arguments of a relation are nodes, while only one argument of a property is a node, the other a string. Arguments of relations are usually interconnected, e.g., London can be the place of birth for Justin Bieber, or the capital of the UK. Arguments of properties are attributes that are only "attached" to certain nodes and have no outgoing edges. Figure 1(b) shows an example.
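To make the edge-to-feature conversion of §4.1 concrete, the following sketch extracts the s, t, s|t and s|e|t features from the converted question feature graph; it is not the actual jacana-freebase code, and the edge list is a hand-built rendering of Figure 1(a).

```python
# Sketch (not the actual jacana-freebase code): extract s, t, s|t and s|e|t
# string features from every edge of a converted question feature graph.

def edge_features(edges):
    """edges: (source, edge_label, target) triples of the question feature graph."""
    feats = set()
    for s, e, t in edges:
        feats.update({s, t, f"{s}|{t}", f"{s}|{e}|{t}"})
    return feats

# Hand-built rendering of Figure 1(a), right side, for
# "what is the name of justin bieber brother".
question_graph = [
    ("qword=what", "cop", "qverb=be"),
    ("qword=what", "nsubj", "qfocus=name"),
    ("qfocus=name", "prep_of", "brother"),
    ("brother", "nn", "qtopic=person"),
]

print(sorted(edge_features(question_graph)))
# includes, e.g., 'qfocus=name|prep_of|brother' and 'qfocus=name|brother'
```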

Figure 1: Dependency parse and excerpted Freebase topic graph for the question what is the name of justin bieber brother. (a) Dependency parse with annotated question features in dashed boxes (left) and the converted feature graph (right), with only relevant and general information about the original question kept; note that the left is a real but incorrect parse. (b) A view of the Freebase graph on the Justin Bieber topic, with nodes in solid boxes and properties in dashed boxes; the hatched node, Jaxon Bieber, is the answer. Freebase uses a dummy parent node for a list of nodes with the same relation.
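For illustration, a topic graph like Figure 1(b) can be held in a very small data structure that keeps the §4.2 distinction between relations (node-to-node edges) and properties (strings attached to a node). The encoding below is a sketch for this discussion only, not the format returned by the Freebase Topic API, and the node and relation names are paraphrased for readability.

```python
# Illustrative sketch of a topic graph (cf. Figure 1(b)): relation edges point to
# other nodes, while properties attach plain strings to a node and have no
# outgoing edges. Freebase's intermediate "dummy" node for the sibling list is
# flattened here for brevity.
topic_graph = {
    "Justin Bieber": {
        "relations": {"person.sibling_s": ["Jazmyn Bieber", "Jaxon Bieber"],
                      "place_of_birth": ["London"]},
        "properties": {"type": "person", "gender": "male"},
    },
    "Jaxon Bieber": {
        "relations": {"sibling": ["Justin Bieber"]},
        "properties": {"type": "person", "gender": "male"},
    },
}

def node_features(graph, node):
    """Relations (here, outgoing) and properties of one node."""
    entry = graph[node]
    return ([f"rel={r}" for r in entry["relations"]]
            + [f"{k}:{v}" for k, v in entry["properties"].items()])

print(node_features(topic_graph, "Jaxon Bieber"))
# ['rel=sibling', 'type:person', 'gender:male']
```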

Both the relationships and the properties of a node are important to identifying the answer. They connect the nodes with the question and describe some unique characteristics. For instance, without the properties type:person and gender:male, we would not have known that the node Jaxon Bieber represents a male person. These properties, along with the sibling relationship to the topic node, are important cues for answering the question. Thus for the Freebase graph, we use relations (with directions) and properties as features for each node.

Additionally, we have analyzed how Freebase relations map back to the question. Some of the mapping can be simply detected as paraphrasing or lexical overlap. For example, the person.parents relationship helps in answering questions about parenthood. However, most Freebase relations are framed in a way that is not commonly addressed in natural language questions. For instance, for common celebrity gossip questions like who cheated on celebrity A, it is hard for a system to find the Freebase relation celebrity.infidelity.participant as the target relation if it had not observed this pattern in training. Thus, assuming there is an alignment model that is able to tell how likely one relation maps to the original question, we add extra alignment-based features for the incoming and outgoing relations of each node. Specifically, for each relation rel in a topic graph, we compute P(rel | question) to rank the relations. Finally, the ranking (e.g., top 1/2/5/10/100 and beyond) of each relation is used as features instead of a pure probability. We describe such an alignment model in §5.

4.3 Feature Production

We combine question features and Freebase features (per node) by doing a pairwise concatenation. In this way we hope to capture the association between question patterns and answer nodes. For instance, in a log-linear model setting, we expect to learn a high feature weight for features like:

  qfocus=money | node type=currency

and a very low weight for:

  qfocus=money | node type=person.

This combination greatly enlarges the total number of features, but owing to progress in large-scale machine learning such feature spaces are less of a concern than they once were (concrete numbers in §6, Model Tuning).

5 Relation Mapping

In this section we describe how a "translation" table between Freebase relations and NL words was built.

5.1 Formula

The objective is to find the most likely relation a question prompts. For instance, for the question who is the father of King George VI, the most likely relation we look for is people.person.parents. To put it more formally, given a question Q with word vector w, we want to find the relation R that maximizes the probability P(R | Q).

More interestingly, for the question who is the father of the Periodic Table, the actual relation that encodes its original meaning is law.invention.inventor, rather than people.person.parents. This simple example points out that every part of the question could eventually change what the question inquires about. Thus we need to account for each word w in Q. Due to the bias and incompleteness of any data source, we approximate the true probability P with P̃ under our specific model. For simplicity of computation, we assume conditional independence between words and apply Naive Bayes:

  P̃(R | Q) ∝ P̃(Q | R) P̃(R) ≈ P̃(w | R) P̃(R) ≈ ∏_w P̃(w | R) · P̃(R)

where P̃(R) is the prior probability of a relation R and P̃(w | R) is the conditional probability of word w given R.

It is possible that we do not observe a certain relation R when computing the above equation. In this case we back off to "sub-relations": a relation R is a concatenation of a series of sub-relations, R = r = r1.r2.r3⋯. For instance, the sub-relations of people.person.parents are people, person, and parents. Again, we assume conditional independence between sub-relations and apply Naive Bayes:

  P̃_backoff(R | Q) ≈ P̃(r | Q) ≈ ∏_r P̃(r | Q) ∝ ∏_r P̃(Q | r) P̃(r) ≈ ∏_r ∏_w P̃(w | r) P̃(r)
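In code, this ranking model amounts to summing log probabilities. The sketch below assumes the probability tables have already been estimated (§5.2 describes how); the probability floor for unseen words and the log-space arithmetic are choices of this sketch rather than details from the paper, and the division by the number of sub-relations follows the normalization mentioned just below.

```python
import math

# Hypothetical pre-learned tables (see Section 5.2 for how they are estimated):
#   p_rel[R]    ~ P~(R)      p_word_rel[R][w]    ~ P~(w | R)
#   p_subrel[r] ~ P~(r)      p_word_subrel[r][w] ~ P~(w | r)

def score_relation(question_words, R, p_rel, p_word_rel, floor=1e-7):
    """log P~(R | Q) up to a constant, via Naive Bayes over the question words."""
    score = math.log(p_rel.get(R, floor))
    for w in question_words:
        score += math.log(p_word_rel.get(R, {}).get(w, floor))
    return score

def score_relation_backoff(question_words, R, p_subrel, p_word_subrel, floor=1e-7):
    """Back-off score over sub-relations, divided by the number of sub-relations
    so that it stays on a scale comparable to the full-relation score."""
    subrels = R.split(".")
    total = 0.0
    for r in subrels:
        total += math.log(p_subrel.get(r, floor))
        for w in question_words:
            total += math.log(p_word_subrel.get(r, {}).get(w, floor))
    return total / len(subrels)

# Ranking all relations of a topic graph by these scores, and keeping only the
# resulting rank (top 1/2/5/10/100 or beyond), yields the alignment features of 4.2.
```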

One other reason that we estimated P̃(w | r) and P̃(r) for sub-relations is that Freebase relations share some common structure among them. For instance, both people.person.parents and fictional_universe.fictional_character.parents indicate the parent relationship, but the latter is much less commonly annotated. We hope that the shared sub-relation, parents, can help better estimate the less annotated one. Note that the backoff model would have a much smaller value than the original, due to the double product ∏_r ∏_w. In practice we normalize it by the number of sub-relations to keep it on the same scale as P̃(R | Q).

Finally, to estimate the prior and conditional probabilities, we need a massive data collection.

5.2 Steps

The ClueWeb09 dataset[6] is a collection of 1 billion webpages (5TB compressed in raw HTML) in 10 languages, collected by Carnegie Mellon University in 2009. FACC1, the Freebase Annotation of the ClueWeb Corpus version 1 (Gabrilovich et al., 2013), contains the index and offset of Freebase entities within the English portion of ClueWeb. Out of all 500 million English documents, 340 million were automatically annotated with at least one entity, with an average of 15 entity mentions per document. The precision and recall of the annotation were estimated at 80-85% and 70-85% (Orr et al., 2013).

[6] http://lemurproject.org/clueweb09/

Given these two resources, for each binary Freebase relation we can find a collection of sentences, each of which contains both of its arguments, then simply learn how words in these sentences are associated with this relation, i.e., P̃(w | R) and P̃(w | r). By counting how many times each relation R was annotated, we can estimate P̃(R) and P̃(r). The learning task can be framed in the following short steps:

1. We split each HTML document into sentences (Kiss and Strunk, 2006) using NLTK (Bird and Loper, 2004) and extracted those with at least two Freebase entities that have at least one direct established relation according to Freebase.
2. The extraction formed two parallel corpora, one with "relation - sentence" pairs (for estimating P̃(w | R) and P̃(R)) and the other with "sub-relations - sentence" pairs (for P̃(w | r) and P̃(r)). Each corpus has 1.2 billion pairs.
3. The tricky part was to align these 1.2 billion pairs. Since the relations on one side of these pairs are not natural sentences, we ran the simplest IBM alignment Model 1 (Brown et al., 1993) to estimate the translation probability with GIZA++ (Och and Ney, 2003). To speed up, the 1.2 billion pairs were split into 100 even chunks. We ran 5 iterations of EM on each one and finally aligned the 1.2 billion pairs from both directions. To symmetrize the alignment, the common MT heuristics INTERSECTION, UNION, GROW-DIAG-FINAL, and GROW-DIAG-FINAL-AND (Koehn, 2010) were separately applied and evaluated later.
4. Treating the aligned pairs as observations, the co-occurrence matrix between aligned relations and words was computed. There were 10,484 relations and sub-relations in all, and we kept the top 20,000 words.
5. From the co-occurrence matrix we computed P̃(w | R), P̃(R), P̃(w | r) and P̃(r).

Hand-checking the learned probabilities shows success, failure, and some bias. For instance, for the film.actor.film relation (mapping from film names to actor names), the top words given by P̃(w | R) are won, star, among, and show. For the film.film.directed_by relation, some important stop words that could indicate this relation, such as by and with, rank directly after director and direct. However, due to significant popular interest in certain news categories, and the resultant catering by websites to those information desires, we also learned, for example, a heavily correlated connection between Jennifer Aniston and celebrity.infidelity.victim, and between some other you-know-who names and celebrity.infidelity.participant.

  sentences:         0      ≤10    ≤10²   ≤10³   ≤10⁴   >10⁴
  answer relations:  7.0%   0.7%   1.2%   0.4%   1.3%   89.5%

Table 1: Percentage of answer relations (the incoming relation connected to the answer node) with respect to how many sentences we learned the relation from in CluewebMapping. For instance, the first column says that for 7% of answer relations we cannot find a mapping (so we had to use the backoff probability estimation); the last column says that for 89.5% of answer relations we were able to learn the mapping between the relation and text from more than 10 thousand relation-sentence pairs. The total number of answer relations is 7886.

We next formally evaluate how the learned mapping helps predict relations from words.
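Steps 4-5 amount to counting. The sketch below assumes the GIZA++ alignments have already been loaded as (relation, word) pairs; the unsmoothed maximum-likelihood estimates and all names are simplifications of this sketch, not details taken from the paper.

```python
from collections import Counter, defaultdict

def estimate_tables(aligned_pairs, relation_counts, top_k_words=20000):
    """aligned_pairs: iterable of (relation, word) co-occurrences from the alignments.
    relation_counts: Counter of how often each relation was annotated in FACC1.
    Returns P~(R) and P~(w | R) as plain maximum-likelihood estimates."""
    cooc = defaultdict(Counter)
    word_totals = Counter()
    for rel, word in aligned_pairs:
        cooc[rel][word] += 1
        word_totals[word] += 1

    # Keep only the most frequent words, as in step 4.
    vocab = {w for w, _ in word_totals.most_common(top_k_words)}

    total_rel = sum(relation_counts.values())
    p_rel = {r: c / total_rel for r, c in relation_counts.items()}

    p_word_rel = {}
    for rel, counts in cooc.items():
        kept = {w: c for w, c in counts.items() if w in vocab}
        z = sum(kept.values()) or 1
        p_word_rel[rel] = {w: c / z for w, c in kept.items()}
    return p_rel, p_word_rel

# The same routine applied to (sub-relation, word) pairs yields P~(r) and P~(w | r).
```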

5.3 Evaluation

Both ClueWeb and its Freebase annotation have biases. Thus we were first interested in the coverage of the mined relation mappings. As a comparison, we used a dataset of relation mappings contributed by Berant et al. (2013) and Lin et al. (2012). The idea is very similar: they intersected Freebase relations with predicates in (arg1, predicate, arg2) triples extracted from ReVerb to learn the mapping between Freebase relations and triple predicates. Note the scale difference: although ReVerb was also extracted from ClueWeb09, there were only 15 million triples to intersect with the relations, while we had 1.2 billion alignment pairs. We call this dataset ReverbMapping and ours CluewebMapping.

The evaluation dataset, WEBQUESTIONS, was also contributed by Berant et al. (2013). It contains 3778 training and 2032 test questions collected from the Google Suggest service. All questions were annotated with answers from Freebase. Some questions have more than one answer, such as what to see near sedona arizona?.

We evaluated on the training set in two aspects: coverage and prediction performance. We define the answer node as the node that is the answer and the answer relation as the relation from the answer node to its direct parent. Then we computed how much and how well the answer relation was triggered by ReverbMapping and CluewebMapping. Thus for the question who is the father of King George VI, we ask two questions: does the mapping, 1. (coverage) contain the answer relation people.person.parents? 2. (precision) predict the answer relation from the question?

Table 1 shows the coverage of CluewebMapping, which covers 93.0% of all answer relations. Among them, for 89.5% of all answer relations we were able to learn the mapping from more than 10 thousand relation-sentence pairs each. In contrast, ReverbMapping covers 89.7% of the answer relations.

Next we evaluated the prediction performance, using the evaluation metrics of information retrieval. For each question, we extracted all relations in its corresponding topic graph and ranked each of them, treating the answer relation as the relevant item. For instance, for the previous example question, we want to rank the relation people.person.parents as number 1. We computed standard MAP (Mean Average Precision) and MRR (Mean Reciprocal Rank), shown in Table 2(a). As a simple baseline, "word overlap" counts the overlap between relations and the question. CluewebMapping ranks each relation by P̃(R | Q). ReverbMapping does the same, except that we took a uniform distribution on P̃(w | R) and P̃(R) since the contributed dataset did not include co-occurrence counts to estimate these probabilities.[7] Note that the median rank from CluewebMapping is only 12, indicating that half of all answer relations are ranked in the top 12.

[7] The way we used ReverbMapping was not how Berant et al. (2013) originally used it: they employed a discriminative log-linear model to judge relations, and that might yield better performance. As a fair comparison, the ranking of CluewebMapping under a uniform distribution is also included in Table 2(a).

Table 2(b) further shows the percentage of answer relations with respect to their ranking. CluewebMapping successfully ranked 19% of answer relations as top 1. A sample of these includes person.place_of_birth, location.containedby, country.currency_used, regular_tv_appearance.actor, etc. These percentages are a good clue for feature design: for instance, we may be confident in a relation if it is ranked top 5 or 10 by CluewebMapping.

To conclude, we found that CluewebMapping provides satisfying coverage on the 3778 training questions: only 7% were missing, despite the biased nature of web data. Also, CluewebMapping gives reasonably good precision in its predictions, despite the noisy nature of web data. We move on to fully evaluate the final QA F1.
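For reference, the ranking evaluation just described can be computed as below. This sketch assumes a single gold answer relation per question, in which case MAP coincides with MRR, so only MRR and the median rank are shown; the data at the end is made up for illustration.

```python
import statistics

def rank_of_gold(ranked_relations, gold_relation):
    """1-based rank of the gold answer relation in a descending, scored list."""
    return ranked_relations.index(gold_relation) + 1

def ranking_metrics(per_question):
    """per_question: list of (ranked_relations, gold_relation) pairs.
    Returns (median rank, MRR)."""
    ranks = [rank_of_gold(rels, gold) for rels, gold in per_question]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return statistics.median(ranks), mrr

# Toy example with made-up rankings:
data = [(["people.person.parents", "people.person.children"], "people.person.parents"),
        (["location.containedby", "location.adjoins"], "location.adjoins")]
print(ranking_metrics(data))  # (1.5, 0.75)
```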

(a) Ranking of answer relations:

                       Median Rank   MAP      MRR
  word overlap         471           0.0380   0.0590
  ReverbMapping        60            0.0691   0.0829
  CluewebMapping       12            0.2074   0.2900
    with uniform dist. 61            0.0544   0.0561

The best result for CluewebMapping was obtained under the GROW-DIAG-FINAL-AND heuristic (row 3) when symmetrizing alignments from both directions. The last row shows the ranking of CluewebMapping under a uniform distribution (assuming counts of words and relations are not known).

(b) Percentage of answer relations w.r.t. ranking number (header); w.o.: word overlap, R.M.: ReverbMapping, C.M.: CluewebMapping:

         1      ≤5     ≤10    ≤50    ≤100   >100
  w.o.   3.5    4.7    2.5    3.9    4.1    81.3
  R.M.   2.6    9.1    8.6    26.0   13.0   40.7
  C.M.   19.0   19.9   8.9    22.3   7.5    22.4

Table 2: Evaluation of answer relation ranking prediction on the 3778 training questions.

6 Experiments

We evaluate the final F1 in this section. The system of comparison is that of Berant et al. (2013).

Data: We re-used WEBQUESTIONS, a dataset collected by Berant et al. (2013). It contains 5810 questions crawled from the Google Suggest service, with answers annotated on Amazon Mechanical Turk. All questions contain at least one answer from Freebase. This dataset has been split 65%/35% into TRAIN-ALL and TEST. We further randomly divided TRAIN-ALL 80%/20% into a smaller TRAIN set and a development set DEV. Note that our DEV set is different from that of Berant et al. (2013), but the final result on TEST is directly comparable. Results are reported in terms of macro F1 with partial credit (following Berant et al. (2013)) if a predicted answer list does not have a perfect match with all gold answers, as a lot of questions in WEBQUESTIONS contain more than one answer.

Search: With an Information Retrieval (IR) front end, we need to locate the exact Freebase topic node a question is about. For this purpose we used the Freebase Search API (Freebase, 2013a). All named entities[8] in a question were sent to this API, which returned a ranked list of relevant topics. We also evaluated how well the Search API served the IR purpose. WEBQUESTIONS not only has answers annotated, but also which Freebase topic nodes the answers come from. Thus we evaluated the ranking of retrieval against the gold standard annotation on TRAIN-ALL, shown in Table 3. The top 2 results of the Search API contain gold standard topics for more than 90% of the questions, and the top 10 results contain more than 95%. We took this as a "good enough" IR front end and used it on TEST.

[8] When no named entities are detected, we fall back to noun phrases.

  top   1      2      3      5      10
  #     3263   3456   3532   3574   3604
  %     86.4   91.5   93.5   94.6   95.4

Table 3: Evaluation of the Freebase Search API: how many questions' top n retrieved results contain the gold standard topic. The total number of questions is 3778 (size of TRAIN-ALL). There were only 5 questions with no retrieved results.

Once a topic is obtained, we query the Freebase Topic API (Freebase, 2013b) to retrieve all relevant information, resulting in a topic graph. The API returns almost identical information as displayed via a web browser to a user viewing this topic. Given that turkers annotated answers based on the topic page via a browser, this supports the assumption that the same answer would be located in the topic graph, which is then passed to the QA engine for feature extraction and classification.

Model Tuning: We treat QA on Freebase as a binary classification task: for each node in the topic graph, we extract features and judge whether it is the answer node. Every question was processed by the Stanford CoreNLP suite with the caseless model. Then the question features (§4.1) and node features (§4.2) were combined (§4.3) for each node. The learning problem is challenging: for the roughly 3000 questions in TRAIN, there are 3 million nodes (1000 nodes per topic graph) and 7 million feature types. We employed a high-performance machine learning tool, Classias (Okazaki, 2009). Training usually took around 4 hours. We experimented with various discriminative learners on DEV, including logistic regression, perceptron and SVM, and found L1-regularized logistic regression to give the best result. The L1 regularization encourages sparse features by driving feature weights towards zero, which was ideal for the over-generated feature space. After training, we had around 30 thousand features with non-zero weights, a 200-fold reduction from the original feature set.

  feature setting     P      R      F1
  basic               57.3   30.1   39.5
  + word overlap      56.0   31.4   40.2
  + CluewebMapping    59.9   35.4   44.5
  + both              59.0   35.4   44.3

Table 4: F1 on DEV with different feature settings.

Also, we ran an ablation test on DEV of how much the additional features on the mapping between Freebase relations and the original questions help, with three feature settings: 1) "basic" features include the feature productions read off from the feature graph (Figure 1); 2) "+ word overlap" adds features on whether sub-relations overlap with the question; and 3) "+ CluewebMapping" adds the ranking of relation predictions for the question according to CluewebMapping.
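As a rough stand-in for the Classias setup described under Model Tuning, the sketch below trains an L1-regularized logistic regression over the produced node features; scikit-learn, feature hashing, and the tiny toy example are assumptions of this sketch, not the configuration used for the reported results.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression

def train_answer_classifier(node_features, labels, C=1.0):
    """node_features: one dict of binary features per candidate node
    (question features x node features, as in Section 4.3).
    labels: 1 if the node is an answer node, else 0."""
    hasher = FeatureHasher(n_features=2**22, input_type="dict")
    X = hasher.transform(node_features)
    # L1 regularization drives most feature weights to zero, which suits the
    # over-generated feature space; the liblinear solver supports the L1 penalty.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(X, labels)
    return hasher, clf

# Toy usage: two candidate nodes for one question.
feats = [{"qfocus=name|rel=sibling": 1, "qtopic=person|type=person": 1},
         {"qfocus=name|rel=place_of_birth": 1}]
hasher, clf = train_answer_classifier(feats, [1, 0])
print(clf.predict(hasher.transform(feats)))
```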

  system                 P      R      F1
  Gold Retrieval         45.4   52.2   48.6
  Freebase Search API    38.8   45.8   42.0
  Berant et al. (2013)   -      -      31.4

Table 5: F1 on TEST with Gold Retrieval and the Freebase Search API as the IR front end. Berant et al. (2013) actually reported accuracy on this dataset; however, since their system predicted answers for almost every question (p.c.), precision = recall = F1 = accuracy roughly holds for them.

  wgt.    feature
  5.56    qfocus=money | type=Currency
  5.35    qverb=die | type=Cause Of Death
  5.11    qword=when | type=datetime
  4.56    qverb=border | rel=location.adjoins
  3.90    qword=why | incoming relation rank=top 3
  2.94    qverb=go | qtopic=location | type=Tourist attraction
  -3.94   qtopic=location | rel=location.imports_exports.date
  -2.93   qtopic=person | rel=education.end_date

Table 6: A sample of the top 50 most positive/negative features. Features are productions between question and node features (c.f. Figure 1).

Table 4 shows that the additional CluewebMapping features improved overall F1 by 5%, a 13% relative improvement: a remarkable gain given that the model had already learned a strong correlation between question types and answer types (explained further in the discussion and Table 6 below).

Finally, the ratio of positive vs. negative examples affects the final F1: the more positive examples, the lower the precision and the higher the recall. Under the original setting, this ratio was about 1:275. This produced precision around 60% and recall around 35% (c.f. Table 4). To optimize for F1, we down-sampled the negative examples to 20%, i.e., a new ratio of 1:55. This boosted the final F1 on DEV to 48%. We report the final TEST result under this down-sampled training. In practice the precision/recall balance can be adjusted by the positive/negative ratio.

Test Results: Table 5 gives the final F1 on TEST. "Gold Retrieval" always ranked the correct topic node top 1, a perfect IR front-end assumption. In a more realistic scenario, we had already evaluated that the Freebase Search API returned the correct topic node 95% of the time in its top 10 results (c.f. Table 3), thus we also tested on the top 10 results returned by the Search API. To keep things simple, we did not perform answer voting, but simply extracted answers from the first (as ranked by the Search API) topic node with predicted answer(s) found. The final F1 of 42.0% improves over the previous best result of 31.4% (Berant et al., 2013) by one third in relative terms.

One question of interest is whether our system, aided by the massive web data, can be fairly compared to the semantic parsing approaches (note that Berant et al. (2013) also used ClueWeb indirectly through ReVerb). Thus we took out the word overlap and CluewebMapping based features, and the new F1 on TEST was 36.9%.

The other question of interest is whether our system has acquired some level of "machine intelligence": how much does it know about what the question inquires? We discuss this below through feature and error analysis.

Discussion: The combination of question and Freebase node features captures some real gist of QA pattern typing, shown in Table 6 with sampled features and weights. Our system learned, for instance, that when the question asks for geographic adjacency information (qverb=border), the correct answer relation to look for is location.adjoins. A detailed comparison with the output of Berant et al. (2013) is a work in progress and will be presented in a follow-up report.

7 Conclusion

We proposed an automatic method for Question Answering from a structured data source (Freebase). Our approach associates question features with answer patterns described by Freebase and has achieved state-of-the-art results on a balanced and realistic QA corpus. To compensate for the problem of domain mismatch or overfitting, we exploited ClueWeb, mined mappings between KB relations and natural language text, and showed that this helped both relation prediction and answer extraction. Our method employs relatively lightweight machinery but has good performance. We hope that this result establishes a new baseline against which semantic parsing researchers can measure their progress towards deeper language understanding and answering of human questions.

Acknowledgments: We thank the Allen Institute for Artificial Intelligence for funding this work. We are also grateful to Jonathan Berant, Tom Kwiatkowski, Qingqing Cai, Adam Lopez, Chris Callison-Burch and Peter Clark for helpful discussion and to the reviewers for insightful comments.

References

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBPedia: A nucleus for a web of open data. In The Semantic Web, pages 722–735. Springer.
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of EMNLP.
Steven Bird and Edward Loper. 2004. NLTK: The Natural Language Toolkit. In Proceedings of the ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics.
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250. ACM.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.
Qingqing Cai and Alexander Yates. 2013. Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of ACL.
David L. Chen and Raymond J. Mooney. 2011. Learning to interpret natural language navigation instructions from observations. In AAAI, volume 2, pages 1–2.
J. Chu-Carroll, J. Fan, B. K. Boguraev, D. Carmel, D. Sheinwald, and C. Welty. 2012. Finding needles in the haystack: Search and candidate generation. IBM Journal of Research and Development.
Jérôme Euzenat and Pavel Shvaiko. 2007. Ontology Matching. Springer.
Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of EMNLP.
Anthony Fader, Luke Zettlemoyer, and Oren Etzioni. 2013. Paraphrase-driven learning for open question answering. In Proceedings of ACL.
Anette Frank, Hans-Ulrich Krieger, Feiyu Xu, Hans Uszkoreit, Berthold Crysmann, Brigitte Jörg, and Ulrich Schäfer. 2007. Question answering from structured knowledge sources. Journal of Applied Logic, 5(1):20–48.
Freebase. 2013a. Freebase Search API. https://developers.google.com/freebase/v1/search-overview.
Freebase. 2013b. Freebase Topic API. https://developers.google.com/freebase/v1/topic-overview.
Evgeniy Gabrilovich, Michael Ringgaard, and Amarnag Subramanya. 2013. FACC1: Freebase annotation of ClueWeb corpora, Version 1 (release date 2013-06-26, format version 1, correction level 0). http://lemurproject.org/clueweb09/FACC1/, June.
Bert F. Green Jr., Alice K. Wolf, Carol Chomsky, and Kenneth Laughery. 1961. Baseball: an automatic question-answerer. In Papers Presented at the May 9-11, 1961, Western Joint IRE-AIEE-ACM Computer Conference, pages 219–224. ACM.
Jerry R. Hobbs. 1985. Ontological promiscuity. In Proceedings of ACL.
Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Edwin Lewis-Kelham, Gerard De Melo, and Gerhard Weikum. 2011. Yago2: exploring and querying world knowledge in time, space, context, and many languages. In Proceedings of the 20th International Conference Companion on World Wide Web, pages 229–232. ACM.
Bevan Keeley Jones, Mark Johnson, and Sharon Goldwater. 2012. Semantic parsing with Bayesian tree transducers. In Proceedings of ACL.
Rohit J. Kate and Raymond J. Mooney. 2006. Using string-kernels for learning semantic parsers. In Proceedings of ACL.
Tibor Kiss and Jan Strunk. 2006. Unsupervised multilingual sentence boundary detection. Computational Linguistics, 32(4):485–525.
Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York, NY, USA.
Jayant Krishnamurthy and Tom M. Mitchell. 2012. Weakly supervised training of semantic parsers. In Proceedings of EMNLP-CoNLL.
Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of EMNLP, pages 1223–1233.
Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical generalization in CCG grammar induction for semantic parsing. In Proceedings of EMNLP.
Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of EMNLP.
Percy Liang, Michael I. Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. In Proceedings of ACL.
Thomas Lin, Oren Etzioni, et al. 2012. Entity linking at web scale. In Proceedings of the Knowledge Extraction Workshop (AKBC-WEKEX), pages 84–88.
Vanessa Lopez, Michele Pasin, and Enrico Motta. 2005. AquaLog: An ontology-portable question answering system for the semantic web. In The Semantic Web: Research and Applications, pages 546–562. Springer.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.
Naoaki Okazaki. 2009. Classias: a collection of machine-learning algorithms for classification.
Dave Orr, Amar Subramanya, Evgeniy Gabrilovich, and Michael Ringgaard. 2013. 11 billion clues in 800 million documents: A web research corpus annotated with Freebase concepts. http://googleresearch.blogspot.com/2013/07/11-billion-clues-in-800-million.html, July.
Erhard Rahm and Philip A. Bernstein. 2001. A survey of approaches to automatic schema matching. The VLDB Journal, 10(4):334–350.
Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, and Sören Auer. 2013. Question answering on interlinked data. In Proceedings of WWW.
Lappoon R. Tang and Raymond J. Mooney. 2001. Using multiple clause constructors in inductive logic programming for semantic parsing. In Machine Learning: ECML 2001, pages 466–477. Springer.
Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. 2012. Template-based question answering over RDF data. In Proceedings of the 21st International Conference on World Wide Web.
Yuk Wah Wong and Raymond J. Mooney. 2007. Learning synchronous grammars for semantic parsing with lambda calculus. In Proceedings of ACL.
William A. Woods. 1977. Lunar rocks in natural English: Explorations in natural language question answering. Linguistic Structures Processing, 5:521–569.
Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Maya Ramanath, Volker Tresp, and Gerhard Weikum. 2012. Natural language questions for the web of data. In Proceedings of EMNLP.
Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Uncertainty in Artificial Intelligence (UAI).
Luke S. Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of EMNLP-CoNLL.
Luke S. Zettlemoyer and Michael Collins. 2009. Learning context-dependent mappings from sentences to logical form. In Proceedings of ACL-CoNLL.
