Towards Explainable Question Answering (XQA)

Saeedeh Shekarpour,1 Faisal Alshargi,2 Mohammadjafar Shekarpour
1 University of Dayton, Dayton, United States
2 University of Leipzig, Leipzig, Germany
[email protected], [email protected], [email protected]

AAAI Fall 2020 Symposium on AI for Social Good.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

The increasing rate of information pollution on the Web requires novel solutions to tackle it. Question Answering (QA) interfaces are simplified and user-friendly interfaces for accessing information on the Web. However, similar to other AI applications, they are black boxes which do not manifest the details of the learning or reasoning steps behind an answer. An Explainable Question Answering (XQA) system can alleviate the pain of information pollution by providing transparency into the underlying computational model and exposing an interface that enables the end-user to access and validate the provenance, validity, context, circulation, interpretation, and feedback of information. This position paper sheds light on the core concepts, expectations, and challenges with regard to the following questions: (i) What is an XQA system? (ii) Why do we need XQA? (iii) When do we need XQA? (iv) How should explanations be represented? (v) How should XQA systems be evaluated?

Introduction

The increasing rate of information pollution [1–4] on the Web requires novel solutions. In fact, there are major deficiencies in the areas of computation, information, and Web science, as follows: (i) Information disorder on the Web: content is shared and spread on the Web without any accountability (e.g., bots [6–9] or manipulative politicians [10] post fake news). Misinformation spreads easily on social networks [11]. Although tech companies try to identify misinformation using AI techniques, this is not sufficient [12–14]. In fact, the root of this problem lies in the fact that the Web infrastructure might need newer standards and protocols for sharing, organizing, and managing content. (ii) The incompetence of Information Retrieval (IR) and Question Answering (QA) models and interfaces: IR systems are limited to bag-of-words semantics, and QA systems mostly deal with factoid questions. In fact, they fail to take into account other aspects of the content such as provenance, context, temporal and locative dimensions, and feedback from the crowd during the spread of content. In addition, they fail to 1) provide transparency about their exploitation and ranking mechanisms, 2) discriminate trustworthy content and sources from untrustworthy ones, 3) identify manipulative or misleading context, and 4) reveal provenance.

Question Answering (QA) applications are a subcategory of Artificial Intelligence (AI) applications where, for a given question, an adequate answer(s) is provided to the end-user regardless of concerns related to the structure and semantics of the underlying data. The spectrum of QA implementations varies from statistical approaches (Shekarpour, Ngomo, and Auer 2013; Shekarpour et al. 2015) and deep learning models (Xiong, Merity, and Socher 2016; Shekarpour, Ngomo, and Auer 2013) to simple rule-based (i.e., template-based) approaches (Unger et al. 2012; Shekarpour et al. 2011). Also, the underlying data sets from which the answer is exploited might range from Knowledge Graphs (KGs), which hold solid semantics as well as structure, to unstructured corpora (free text), or a consolidation of both. Apart from the implementation details and the background data, roughly speaking, the research community has introduced the following categories of QA systems:

• Ad-hoc QA: advocates simple and short questions and typically relies on one single KG or corpus.
• Hybrid QA: requires federating knowledge from heterogeneous sources (Bast et al. 2007).
• Complex QA: deals with complex questions which are long and ambiguous. Typically, answering such questions requires exploiting answers from a hybrid of KGs and textual content (Asadifar, Kahani, and Shekarpour 2018).
• Visualized QA: answers textual questions from images (Li et al.).
• Pipeline-based QA: provides automatic integration of state-of-the-art QA implementations (Singh et al. 2018b,a).

A missing point in all types of QA systems is that, in case of either success or failure, they are silent on the question of why. Why has a particular answer been chosen? Why were the rest of the candidates disregarded? Why did the QA system fail to answer — is it the fault of the model, the quality of the data, or the lack of data? The truth is that the existing QA systems, similar to other AI applications, are black boxes (see Figure 1), meaning they do not provide any supporting fact (explanation) about the represented answer with respect to the trustworthiness of the source of information, the confidence/reliability of the chosen answer, or the chain of reasoning or learning steps that led to predicting the final answer. For example, Figure 1 shows that the user sends the question "what is the side effect of antibiotics?" to the QA system. If the answer is represented in a way similar to the interface of Google, then the end-user might have mixed feelings as to whether s/he can rely on this answer, or how and why such an answer was chosen among numerous candidates.

[Figure 1 depicts a QA system and a search engine drawing on corpora, interlinked knowledge graphs, and models such as deep learning, graphical models, Bayesian belief nets, statistical models, ensemble methods, and Markov models, faced with user questions such as "Why not something else?", "Why do you fail?", "Why do you succeed?", "When can I trust you?", and "How do I correct an error?"]
Figure 1: The existing QA systems are black boxes which do not provide any explanation for their inference.

The rising challenges regarding the credibility, reliability, and validity of the state-of-the-art QA systems are of high importance, especially in critical domains such as the life sciences, which involve human life. Explainable Question Answering (XQA) systems are an emerging area which tries to address the shortcomings of the existing QA systems. The recent article (Yang et al. 2018) published a data set containing pairs of question/answer along with the supporting facts from the corpus, where an inference mechanism over them led to the answer. Figure 2 is an example taken from the original article (Yang et al. 2018).

[Figure 2 reproduces a HOTPOTQA example: given two Wikipedia paragraphs about the bands Malfunkshun and Mother Love Bone, the question "What was the former band of the member of Mother Love Bone who died just before the release of 'Apple'?" is answered with "Malfunkshun", supported by sentences 1, 2, 4, 6, and 7.]
Figure 2: An example from (Yang et al. 2018) where the supporting facts necessary to answer the given question Q are listed.

Furthermore, the existing systems might end up with discriminating information which is biased based on race, gender, age, ethnicity, religion, or the social or political rank of the publisher and targeted user (Buranyi 2017). (Gunning 2017) raises six fundamental competency questions regarding XAI as follows:
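The combination the paper calls for — an answer bundled with its supporting facts, provenance, and a confidence score — can be sketched as a simple data structure. This is a minimal illustration only: the class, its field names, and the confidence value are hypothetical and do not come from (Yang et al. 2018) or any cited system; the example values are taken from the Figure 2 example.

```python
from dataclasses import dataclass

@dataclass
class ExplainableAnswer:
    """Hypothetical container for an XQA response: the answer plus
    the evidence an end-user would need in order to validate it."""
    answer: str
    supporting_facts: list  # sentence-level evidence, as in HotpotQA
    sources: list           # provenance: where each fact came from
    confidence: float       # reliability of the chosen answer

# The HotpotQA example from Figure 2, recast in this structure.
example = ExplainableAnswer(
    answer="Malfunkshun",
    supporting_facts=[1, 2, 4, 6, 7],
    sources=["Wikipedia: Return to Olympus", "Wikipedia: Mother Love Bone"],
    confidence=0.87,  # illustrative value, not produced by a real model
)

def explain(a: ExplainableAnswer) -> str:
    """Render the answer together with its explanation."""
    return (f"Answer: {a.answer} (confidence {a.confidence:.2f})\n"
            f"Supported by sentences {a.supporting_facts} "
            f"from {', '.join(a.sources)}")

print(explain(example))
```

An interface of this shape lets the end-user inspect exactly which sentences and sources justified the answer, rather than receiving an unexplained string.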