Edinburgh Research Explorer
Using Rich Inference to Find Novel Answers to Questions
Citation for published version: Nuamah, K, Bundy, A & Lucas, C 2015, Using Rich Inference to Find Novel Answers to Questions. in 3rd International Essence Workshop: Algorithms for Processing Meaning. Evolution of Shared Semantics in Computational Environments (ESSENCE), Barcelona, United Kingdom, 20/05/15.
Link: Link to publication record in Edinburgh Research Explorer
Document Version: Peer reviewed version
Published In: 3rd International Essence Workshop: Algorithms for Processing Meaning
Using Rich Inference to Find Novel Answers to Questions
Kwabena Nuamah, Alan Bundy, Christopher Lucas
School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
[email protected], [email protected], [email protected]
Abstract— The Web is continuously enriched with data and has become a large knowledge repository. However, machines are unable to fully exploit this vast knowledge space when performing reasoning tasks such as question answering. This inability limits the extent of their inference and ultimately the range of questions they can answer. We argue that the quality and range of answers generated by a question-answering system is significantly improved when we use rich reasoning techniques to infer novel knowledge from web data. By finding and aggregating facts from different knowledge bases, an agent can obtain a better representation of a domain and hence infer new facts which did not exist in any of the original knowledge sources. We intend to explore rich semantic representations and rich forms of reasoning. These include the curation of data and the use of a combination of heuristics, logic and probabilistic techniques to infer answers. This approach will minimize noise and uncertainty in the knowledge used for reasoning. Our customized representations will suit the problem to be solved rather than being restricted by the formalisms used in the sources. We plan to implement this in a question-answering system that exploits a vast set of knowledge bases such as ontologies and Linked Data repositories. Our question-answering system will focus on questions which require rich inferences, such as prediction and composition of answers from several pieces of information.

Keywords— inference, question-answering, knowledge representation, heuristics, uncertainty

I. INTRODUCTION

The increasing availability of knowledge bases, such as ontologies on the web, has opened up the possibility of computer agents taking advantage of the massive amounts of information on the web for reasoning and information retrieval tasks that were previously intractable. Logical inference can enable an agent to infer implicit relationships between concepts in the knowledge base, provided appropriate techniques are employed to deal with ambiguous, incomplete and sometimes erroneous data.

When given a question, humans possess the ability to choose, from a gamut of possible strategies, the one that best solves the question. This ability allows us to answer questions even when the answer is not pre-stored in our memory or knowledge base. In contrast, question answering (QA) systems, although originally designed to use inference, tend to assume that the answer is pre-stored in a knowledge base. Consider the question "What will be the UK population in 2021?". A QA system will typically attempt (unsuccessfully) to find the pre-stored fact population(UK, 2021, p). It most likely will not find this, and so will give up and return no answer. Wolfram|Alpha [1] highlights this point in a statement on its website: "Only what is known is known to Wolfram|Alpha". In contrast, humans are able to answer this kind of question by indirectly inferring answers that we do not already have from other readily available information. In the example above, we could look up the population values for past years, and then estimate the population in 2021 using regression. We could also find the population growth rate from Wikipedia and use that to predict the population in 2021. In so doing, we use a combination of heuristics, logic and probabilistic techniques to infer answers. We refer to this as rich inference.

We believe that rich inference, applied to the heterogeneous and ever-growing sources of information on the web, is critical to realizing the promise of automated question-answering. More specifically, we claim that the quality and range of answers generated by a question-answering system is significantly improved when we automatically curate data and use richer forms of inference to infer novel knowledge from Semantic Web data [2]. This improvement can be achieved by finding and aggregating facts from different knowledge bases, obtaining a better representation of the domain, and discovering and caching new facts that are not already stored in any of the original knowledge sources. The rich inference that supports the project includes heuristics for decomposing questions, logical and probabilistic reasoning, and higher-order functions which we use to aggregate data into answers to questions, not limited by the formalisms of the source data.

In practical terms, we intend to build a system that can respond to questions where no suitable answer is contained in any available data source, e.g., as a Resource Description Framework (RDF) triple [3], stored phrase, or table entry. This requires rich inference applied to pre-stored facts, logical relationships, e.g., Web Ontology Language (OWL) [4] and description logics [5], and other formal semantics. Further, any novel facts or relationships that have been inferred can then be propagated back to customized knowledge bases, facilitating future question-answering. Unlike current question answering systems, which focus on the natural language processing (NLP) problems inherent in QA, our core contribution will emphasize mapping machine-readable queries to answers. Although natural language processing is not our main focus, we will use third-party tools to map natural-language questions to representations that our system can use.

II. INFERENCE IN EXISTING QA MODELS

To varying extents, recent QA systems apply different forms of question transformations, decompositions, rules and inference techniques to get answers to questions. We classify these into three models. Fig. 1 shows the three main types of models that current QA systems use.

Model 1 is the simplest type, characterized by avoiding any transformations of the representation of the question. Model 1 systems query knowledge bases directly, in the hope that the data that best answer the questions are immediately available. This model is often restricted to a specific domain using curated knowledge bases and a query language with a restricted vocabulary. It is found in simple QA systems which place a user interface over the knowledge base and then find answers that best match the user query. Most basic information retrieval and database systems, such as those queried with SQL (Structured Query Language), follow this model.

Model 2 adds a question transformation feature. The objective is to transform the question so that it exploits the knowledge representation formalism used in the knowledge base. This allows the QA system to work with knowledge bases whose formalisms are known. These transformation rules are usually fixed and specific to the knowledge bases that the QA system depends on. AskMSR [6] uses this technique to reformulate questions. Because its core strategy is to leverage search engines, the reformulation of questions allows it to rewrite the same query in different ways, and then submit each query to a search engine. AskMSR exploits the redundancy in web data by collecting summaries of the search results, mining and filtering N-grams, and determining the best answers from the remaining data. Initial versions of START [7] similarly approach QA by transforming questions into templates which it uses to search its knowledge base. It uses its rule-based approach (S-rules) and its natural language annotations to find the matches in its knowledge base to the question, and then returns an answer from the best matches.

Model 3 does not just transform the question into some specific representation, but also decomposes it by some criteria. For instance, the IBM Watson system uses parallel decomposition [8] when questions contain mutually independent facts about the answer. An example used by the authors was "[Which] company with origins dating back to 1876 became the first U.S. company to have 1 million stockholders in 1951?". In this question, knowing the company with origins dating back to 1876 is important, but not necessary to determining the first U.S. company to have 1 million stockholders in 1951. So both can be determined independently, and a common answer that both sub-questions find will most likely be the answer to the whole question. IBM Watson also uses nested decomposition for questions containing an independent fact about an entity related to the correct answer and a separate fact that links that entity to the correct answer. An example of this type of question is "Which surgical procedure is required to deal with an aortic condition associated with bicuspid aortic valves?". Here, it is necessary to first determine the aortic condition associated with bicuspid aortic valves before the surgical procedure required to deal with it can be found.

Saquete et al. [9] also devise a temporal decomposition strategy in their QA system, which was designed to answer questions of a temporal nature. The question "Where did Bill Clinton study before going to Oxford University?" is decomposed into the questions "Where did Bill Clinton study?" and "When did Bill Clinton go to Oxford University?". Model 3 approaches also perform some degree of inference from knowledge bases. IBM Watson does this by taxonomic reasoning to check whether the candidate answer type it generates matches the proper lexical answer type. It also uses some form of inference to reason over semantic web and linked data resources (such as DBpedia [10], YAGO [11] and Cyc [12]).
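The temporal decomposition described above can be sketched with a toy pattern-matcher. This is our own illustrative sketch, not Saquete et al.'s implementation (which relies on proper parsing of temporal signals); the function name and the regular expression are invented for this example:

```python
import re

def decompose_temporal(question: str):
    """Toy temporal decomposition in the spirit of Saquete et al.:
    'Where did X study before going to Y?' ->
    ['Where did X study?', 'When did X go to Y?']
    A real system would use a parser, not a single regex."""
    pattern = re.compile(
        r"^(?P<wh>Where|What|Who) did (?P<subj>[\w .]+?) (?P<verb>\w+) "
        r"(?P<sig>before|after) going to (?P<dest>[\w .]+?)\??$"
    )
    m = pattern.match(question)
    if m is None:
        return [question]  # no temporal signal recognized; leave as-is
    # Sub-question 1 keeps the original target of the question;
    # sub-question 2 anchors the temporal reference ('when' question).
    q1 = f"{m['wh']} did {m['subj']} {m['verb']}?"
    q2 = f"When did {m['subj']} go to {m['dest']}?"
    return [q1, q2]
```

The answer to the second sub-question supplies a date constraint that filters the candidate answers to the first, which is exactly the temporal decomposition strategy described above.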
[Figure omitted: a strategy tree for the question q = "What will be the UK population in 2021?". The root applies γ = argmax_a P(a|q) over alternative (δ = OR) strategies. Branches apply decomposition functions such as δ = Temporal(2021), δ = Spatial(UK), δ = PredicateExpansion(), δ = ChangeRate() and δ = Proportion(), paired with combination functions such as γ = LIN-REGRESS, γ = SUM, γ = MEAN_MEDIAN, γ = PropFn and γ = RateFn, over sources including S_DBpedia, S_ONS, S_WorldBank and S_Local_Curated. Leaf answers (e.g., 57.40m, 56.40m, 38.20m) carry probabilities (e.g., P(x) = 0.7, P(x) = 0.9).]
Fig. 3. Question decomposition example
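The regression strategy in Fig. 3 (gather past population facts, then extrapolate) can be sketched as follows. This is a minimal illustration of the idea, not the proposed system; the population figures below are invented placeholders, not real ONS or World Bank data:

```python
def linear_fit(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# (year, population in millions) facts gathered by the search step
# from several sources -- illustrative values only.
facts = [(1990, 57.2), (2000, 58.9), (2010, 62.8), (2013, 64.1)]

a, b = linear_fit(facts)
# The novel fact: an estimate not pre-stored in any single source.
estimate_2021 = a * 2021 + b
```

The key point of rich inference is that `estimate_2021` exists in no knowledge base; it is produced by applying a combination function (here, LIN-REGRESS) to facts aggregated from several sources.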
Our second use of heuristics will determine the question decomposition functions and the corresponding fact combination functions to answer the question. Cues or keywords identified in the question will determine whether, for instance, a temporal and/or geo-spatial decomposition of the question is required. This creates a tree of strategies that can be used to find a solution. Multiple strategies can be pursued, and their final answers compared to estimate accuracy and confidence. We refer to the heuristics used here as strategy heuristics. An example of this is illustrated in Fig. 3.

C. Handling Uncertainty

The variety of sources from which data is published, the variety of representation formats, naming conventions, and other cultural and contextual issues which arise in the creation and publishing of ontologies add an even greater level of uncertainty about the facts. Formal knowledge representations, such as OWL used in the Semantic Web, are based on the Open World Assumption: the fact that a statement is not explicit in the ontology does not necessarily make it false. This carries with it a great deal of uncertainty when such an ontology is used as a data resource for question-answering. Each ontology also models a different aspect of the universe (or its domain). In addition, representations of knowledge model a partial view of the world. Our approach to QA will encounter three main forms of uncertainty:

- understanding of the question;
- heuristics and their inherent simplification assumptions;
- noise and incompleteness of data in the knowledge base.

It is a hard problem to know how well we understand the question to be answered until we get user feedback on accuracy. The use of heuristics also introduces assumptions which may oversimplify the problem and thereby create more uncertainty in the accuracy of the answer generated. Furthermore, we expect data from knowledge bases to be noisy and incomplete in some cases. To capture these uncertainties in our QA process, we will assign uncertainty weights to each space and strategy heuristic, based on the extent to which it is likely to affect the accuracy of our answer. At the start of the QA process we set our confidence value, C, to 1.0. Supposing we apply heuristic hi, which has a weight of wi ∈ [0, 1], our updated confidence value will be:

C = C(1 − wi)

We intend to use a representation which is expressive and rich enough to capture time slices of the world, the dependencies between concepts, and the uncertainties associated with the data available for the QA process. We will explore a representation which combines first-order logic (FOL) and Bayesian Networks (BN). Similar to the first-order probabilistic logic in [27], we are then able to harness the strengths of both FOL and BN in terms of expressivity and uncertainty. For the range of questions we expect our QA system to answer, we need to be able to represent both temporal and geospatial features of data in the knowledge representation. In some cases, we consider the dataset required to answer the question as spanning a time range. This consideration helps to identify the appropriate inference technique to find answers from the representation.

Additionally, this representation will be able to capture some elements of uncertainty. Data from web sources contain a lot of noise and variance. We will define combination functions which combine these data and generate appropriate probabilities that define our belief in the data. These combination functions will also be capable of propagating the uncertainty measurements through the QA process to the final answer that is generated. Other sources of uncertainty, such as unrealizability, where the appropriate combination function does not exist in our function domain, and computational complexity, where the search space is too large or computationally intractable, mean that some of the functions yield only approximations to the expected results.

Probabilistic techniques can be used to determine the prior probability of the required data variables. The large size of the search domains means that we will use approximations such as maximum likelihood estimation to compute the likelihood of each data item being consistent with the required data variables, based on other data items that the search returns. Combined with learning approaches like inductive learning, which find suitable hypotheses that agree with the observed data and generalize well to new data queries, we can incorporate uncertainty directly into the representation of knowledge and the computation of answers in the QA system.

IV. KEY MODULES OF PROPOSED QA SYSTEM

A general overview of the key steps in the proposed solution is shown in Fig. 4. A more detailed view of each module with its inputs and outputs is given in Fig. 5.

[Figure omitted: the pipeline takes a User Question through the modules 1. Parsing using NLP tools; 2. Formal representation of question; 3. Question Analysis: question features, answer type, etc.; 4. Query SPARQL endpoints and semantic search engines; 5. Construct ontology representation specific to the question and its features; 6. Rich Inference; 7. Answer Synthesis.]
Fig. 4. QA pipeline. Shaded modules represent the primary contributions of this project.

1. Parsing: This module will take as input the question text and apply syntactic and semantic parsing using NLP tools such as CoreNLP [28]. The key outputs of this module will be a semantic parse tree and a dependency tree.

2. Feature Extraction: The parse trees will be analyzed to identify important features of the question. Features include the type of 'wh-' question, the expected type of answer (a number, date, time, name, etc.), the concepts inquired about, identifiable predicates, elements of time and location, and units of measurement.

3. Semantics: The parse trees will be used to generate a formal logical representation of the question. We will exploit the structure of the dependency trees and use the part-of-speech tagging to extract entities and predicates. This module will use techniques in [29] to identify relationships between concepts.

4. Search: We will use heuristics which, based on the extracted features of the question and its formal representation, will create candidate strategies to solve the question. This will include spatial and temporal decompositions of concepts and predicates in the questions. The heuristics will also take advantage of indicators which influence concepts of interest, to create strategies which are capable of inferring values of the variable of interest. These decompositions will guide the generation of semantic search queries for SPARQL [30] endpoints.

5. Curation: The system will construct a local, question-specific ontology for the question being processed. Data returned from the searches will be used to update this local ontology. We will use a representation which is capable of capturing time-sensitive facts [31] and question-specific details. We will also incorporate features to manage uncertainty, such that results returned from queries are tagged with confidence measures based on the extent to which they align with the question's features and its representation, as well as correlations or inconsistencies with data from other knowledge bases.

6. Rich Inference: We will apply rich forms of inference to compute answers to questions. We will not limit the system to the RDF/OWL formalisms of the data source during inference, since our local knowledge representation in the curation module is rich and expressive. This gives us room to explore additional data sources such as HTML tables. We define variables over functions and algorithms such that specific features within the question trigger the corresponding combination functions to execute. The combination functions will include numerical functions such as correlation, regression, classification, etc., using third-party mathematical, statistical and machine learning libraries.

7. Answer Synthesis: The final answer that best matches the question from module 6 will be used to construct an answer string. The form of the response will depend on the question features identified in module 3. We will adopt techniques such as [32] and [33] for answer ranking and merging in this module.

Our emphasis in this project will be on modules 5 and 6. We will use third-party components as far as possible for the remaining modules, since we do not intend to reinvent the wheel in these cases.

[Figure reconstructed as a list of (INPUT, MODULE, OUTPUT) entries:]
- Question ("What will be the population of UK in 2020?") | Parsing | Parse tree over "What / is / 2020 / UK / population".
- Question, parse tree | Feature Extraction | Geospatial features: location (United Kingdom); temporal features: year (2020).
- Parse tree | Semantics | Formal representation of the question: ∃x. x ∈ population ∧ location(x, UK) ∧ year(x, 2020).
- Parse tree, question features | Search | 1. Queries formulated based on: spatial decomposition of UK into sub-components; temporal decomposition of the time period (2020); spatial expansion over previous temporal states; predicate expansion over synonyms, inverses, etc. 2. Knowledge bases queried and search results stored. Example of a SPARQL query for a past year's population data:
  SELECT ?population WHERE { ?x ns:location ?location . ?y ns:year ?year . FILTER (?x = "United Kingdom") FILTER (?y = "2010") }
- Search results | Curation | 1. Local knowledge base (KB) created and updated with new data. 2. Checked for inconsistencies between new data and existing data in the KB. 3. Inconsistencies repaired and uncertainty measures incorporated. Example of population facts in the KB (where the pi are population values): (UK, population_1911, p1); (UK, population_1950, p2); (UK, population_1980, p3); (UK, population_1990, p4); (UK, population_2000, p5); (UK, population_2013, p6).
- Curated KB, parse tree, features | Rich Inference | 1. Answer inferred from data in the curated KB. 2. Inference using functions that aggregate existing data in the KB and estimate future values using appropriate functions. For example, with data in the KB on the UK population in past years, the module uses heuristics (based on question features and observed value types) to apply regression to the past population values to estimate the population in 2020.
- Inferred value, parse tree, question features | Answer Synthesis | Answer string composed by substituting the inferred value into the appropriate placeholder in the composed answer text. Example: "Population of UK in 2020 will be approximately 67,000,000".
Fig. 5. Module inputs and outputs. Shaded modules represent the primary contributions of this project.

V. CONCLUSION AND FUTURE WORK

Question-answering that uses existing data to find novel information that is not explicitly stated is possible given the vast amount of information on the web. This work aims at equipping question-answering systems with the ability to decompose questions in order to identify the pieces of information needed to infer new knowledge from existing knowledge. The QA system would be capable of building its own local ontologies and refining its representation incrementally as it gets more information from semantic web endpoints and semantic search engines. We aim not just to retrieve factoids from existing knowledge bases, but to use inference which enables the system to automatically find relevant data from which it can infer novel information using rich inference techniques. These heuristics and inference techniques will allow the QA system to identify sub-questions and the corresponding combination functions which it will ultimately use to construct its final answer. The QA system would be capable of defining its level of confidence in answers that it retrieves, depending on the sources of information. These techniques will improve the range of questions that question answering systems can handle successfully.

To realize this system, we will focus next on researching the best approach for identifying heuristics that will help to narrow the search for useful knowledge. We will also explore question decomposition strategies and their corresponding knowledge combination functions that constrain the QA problem to known machine learning inference methods such as regression. We will also select the appropriate third-party tools and software libraries that we can reuse for the non-core modules of our system.

REFERENCES

[1] "Wolfram|Alpha." [Online]. Available: https://www.wolframalpha.com/. [Accessed: 23-Apr-2015].
[2] A. Bundy, "The interaction of representation and reasoning," Proceedings of the Royal Society A, Jul. 2013.
[3] "RDF/XML Syntax Specification (Revised)." [Online]. Available: http://www.w3.org/TR/REC-rdf-syntax/. [Accessed: 04-Oct-2014].
[4] D. L. McGuinness, F. van Harmelen, et al., "OWL Web Ontology Language overview," W3C Recommendation, vol. 10, no. 10, 2004.
[5] D. Nardi and R. Brachman, "An introduction to description logics," in Description Logic Handbook, 2003.
[6] M. Banko, E. Brill, S. Dumais, and J. Lin, "AskMSR: Question answering using the worldwide web," 2002.
[7] B. Katz, "Annotating the World Wide Web using natural language," in Proceedings of the 5th RIAO Conference on Computer Assisted Information Searching on the Internet (RIAO '97), 1997.
[8] "Fact-based question decomposition in DeepQA," IBM Journal of Research and Development, vol. 56, no. 3, pp. 1–11, 2012.
[9] E. Saquete, P. Martínez-Barco, R. Muñoz, and J. L. Vicedo, "Splitting complex temporal questions for question answering systems," in Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL '04), 2004.
[10] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann, "DBpedia - A crystallization point for the Web of Data," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 7, no. 3, pp. 154–165, Sep. 2009.
[11] F. Suchanek, G. Kasneci, and G. Weikum, "Yago: A core of semantic knowledge," in Proceedings of the 16th International Conference on World Wide Web, 2007.
[12] D. Lenat and R. Guha, "Cyc: A midterm report," AI Magazine, vol. 11, no. 3, 1990.
[13] A. Bundy, G. Sasnauskas, and M. Chan, "Solving guesstimation problems using the Semantic Web: Four lessons from an application," Semantic Web, pp. 1–20, 2013.
[14] V. Lopez, M. Fernández, E. Motta, and N. Stieler, "PowerAqua: Supporting users in querying and exploring the Semantic Web," Semantic Web, vol. 3, pp. 249–265, 2012.
[15] N. Preda and G. Kasneci, "Active knowledge: Dynamically enriching RDF knowledge bases by web services," in SIGMOD, Indianapolis, 2010.
[16] B. Katz, G. Borchardt, and S. Felshin, "Syntactic and semantic decomposition strategies for question answering from multiple resources," in Proceedings of the AAAI 2005 Workshop on Inference for Textual Question Answering, 2005.
[17] C. C. Green and B. Raphael, "The use of theorem-proving techniques in question-answering systems," in Proceedings of the 1968 23rd ACM National Conference, 1968, pp. 169–181.
[18] J. R. Slagle, "Experiments with a deductive question-answering program," Communications of the ACM, vol. 8, no. 12, pp. 792–798, 1965.
[19] R. F. Simmons, "Natural language question-answering systems: 1969," Communications of the ACM, vol. 13, no. 1, pp. 15–30, Jan. 1970.
[20] A. Wilson, E. Gilboa, J. P. Cunningham, and A. Nehorai, "Fast kernel learning for multidimensional pattern extrapolation," in Advances in Neural Information Processing Systems, 2014, pp. 3626–3634.
[21] R. B. Grosse, R. Salakhutdinov, W. T. Freeman, and J. B. Tenenbaum, "Exploiting compositionality to explore a large space of model structures," arXiv preprint arXiv:1210.4856, 2012.
[22] G. Gigerenzer and P. M. Todd, Simple Heuristics That Make Us Smart. Oxford University Press, 1999.
[23] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, 1984.
[24] A. Tversky and D. Kahneman, "Judgment under uncertainty: Heuristics and biases," Science, vol. 185, no. 4157, pp. 1124–1131, 1974.
[25] H. A. Simon, "Invariants of human behavior," Annual Review of Psychology, vol. 41, pp. 1–19, 1990.
[26] H. Liu and P. Singh, "ConceptNet — a practical commonsense reasoning tool-kit," BT Technology Journal, vol. 22, no. 4, pp. 211–226, 2004.
[27] S. Glesner and D. Koller, "Constructing flexible dynamic belief networks from first-order probabilistic knowledge bases," in Symbolic and Quantitative Approaches to Reasoning and Uncertainty, 1995.
[28] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky, "The Stanford CoreNLP natural language processing toolkit," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60.
[29] K. A. Nuamah and L. Fu, "Biased-incremental clustering: A flexible knowledge extraction algorithm," International Journal of Computer and Communication Engineering, vol. 1, no. 1, pp. 8–12, 2012.
[30] E. Prud'hommeaux and A. Seaborne, "SPARQL query language for RDF," W3C Recommendation, vol. 15, 2008.
[31] F. Lécué and J. Pan, "Predicting knowledge in an ontology stream," in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, pp. 2662–2669.
[32] J. Ko, L. Si, and E. Nyberg, "Combining evidence with a probabilistic framework for answer ranking and answer merging in question answering," Information Processing & Management, vol. 46, no. 5, pp. 541–554, Sep. 2010.
[33] M. Surdeanu, M. Ciaramita, and H. Zaragoza, "Learning to rank answers to non-factoid questions from web collections," Computational Linguistics, 2011.