A Question Answering System Developed As a Project in a Natural Language Processing Course*

Total Pages: 16

File Type: PDF, Size: 1020 KB

A Question Answering System Developed as a Project in a Natural Language Processing Course*

W. Wang, J. Auer, R. Parasuraman, I. Zubarev, D. Brandyberry, and M. P. Harper
Purdue University, West Lafayette, IN 47907
{wang28,jauer,pram,dbrandyb,harper}@ecn.purdue.edu and [email protected]

* We would like to thank the Deep Read group for giving us access to their test bed.

Abstract

This paper describes the Question Answering System constructed during a one-semester graduate-level course on Natural Language Processing (NLP). We hypothesized that by using a combination of syntactic and semantic features and machine learning techniques, we could improve the accuracy of question answering on the test set of the Remedia corpus over the reported levels. The approach, although novel, was not entirely successful in the time frame of the course.

[Figure 1: The architecture for our question answering system. Plain text (story and questions) passes through POS tagging, name identification (proper nouns identified), partial parsing (drawing on a lexicon, WordNet, and grammar rules) with pronoun resolution, and sentence-to-question comparison; candidate answers from a rule-based classifier, a neuro-fuzzy net, a neural network, and a genetic algorithm are combined by voting, and the answer with the highest score is returned.]

1 Introduction

This paper describes a preliminary reading comprehension system constructed as a semester-long project for a natural language processing course. This was the first exposure to this material for all but one student, and so much of the semester was spent learning about and constructing the tools that would be needed to attack this comprehensive problem. The course was structured around the project of building a question answering system following the HumSent evaluation as used by the Deep Read system (Hirschman et al., 1999). The Deep Read reading comprehension prototype system (Hirschman et al., 1999) achieves a level of 36% of the answers correct using a bag-of-words approach together with limited linguistic processing. Since the average number of sentences per passage is 19.41, this performance is much better than chance (i.e., 5%). We hypothesized that by using a combination of syntactic and semantic features and machine learning techniques, we could improve the accuracy of question answering on the test set of the Remedia corpus over these reported levels.
2 System Description

The overall architecture of our system is depicted in Figure 1. The story sentences and the five questions (who, what, where, when, and why) are first preprocessed and tagged by the Brill part-of-speech (POS) tagger distributed with the Deep Read system. This tagged text is then passed to the Name Identification Module, which updates the tags of named entities with semantic information and gender when appropriate. The Partial Parser Module then takes this updated text and breaks it into phrases while attempting to lexically disambiguate the text. The Pronoun Resolution Module is consulted by the parser in order to resolve pronouns before passing partially parsed sentences and questions to the Sentence-to-Question Comparison Module. The Comparison Module determines how strongly the phrases of a sentence are related to those of a question, and this information is passed to several modules which attempt to learn which features of the comparison are most important for identifying whether a sentence is a strong answer candidate. We intended to set up a voting scheme among the various modules; however, this part of the work has not been completed (as indicated by the dashed lines in Figure 1).

Our system, like Deep Read, uses as the development set 28 stories from grade 2 and 27 from grade 5, each with five short-answer questions (who, what, when, where, and why), and 60 stories with questions from grades 3 and 4 for testing.[1] We will refer to the development and testing data as the Remedia corpus. The following example shows the information added to a plain text sentence as it progresses through each module of the system we have created. Each module is described in more detail in the following sections.

[Figure 2: Processing an example sentence for matching with a question in our system. The sentence "They are called Cub Scouts," from story Rm3-5 ("... The club is for boys who are under 12 years old."), is shown after each stage: POS tagging, name identification, initial partial parsing into NP/VP phrases carrying ID, LABEL, BASE, AGR, TENSE, VOICE, GENDER, and SEM_TYPE features, pronoun resolution (PRONREF=boys), feature updating, and bracketing, alongside the bracketed question "Who are Cub Scouts?" (the answer to Question 1).]

[1] There were differences between the Deep Read electronic version of the passages and the Remedia published passages. We used the electronic passages.
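Before turning to the individual modules, the sentence-selection task itself (HumSent: pick the story sentence that answers the question) can be made concrete with a minimal runnable sketch. It implements only the bag-of-words matching attributed to Deep Read above, not the feature-based pipeline described in this paper; the tokenizer and scoring are simplifying assumptions.

```python
# Bag-of-words sentence selection: the baseline the paper aims to improve on.
# Illustrative sketch only, not the authors' system.
import re

def tokens(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_sentence(sentences, question):
    """Return the story sentence sharing the most words with the question."""
    q = tokens(question)
    return max(sentences, key=lambda s: len(tokens(s) & q))

story = ["The club is for boys who are under 12 years old.",
         "They are called Cub Scouts."]
print(best_sentence(story, "Who are Cub Scouts?"))
# -> "They are called Cub Scouts."
```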
2.1 Name Identification Module

The Name Identification Module expects as input a file that has been tagged by the Brill tagger distributed with the Deep Read system. The most important named entities in the Remedia corpus are the names of people and the names of places. To distinguish between these two types, we created dictionaries for names of people and names of places. The first and last name dictionaries were derived from the files at http://www.census.gov/genealogy/names/. First names had an associated gender feature; names that were either male or female included gender frequency. Place names were extracted from atlases and other references, and included names of countries, major cities and capital cities, major attractions and parks, continents, etc. WordNet was also consulted because of its coverage of place names. There are 5,165 first name entries, 88,798 last name entries, and 1,086 place name entries in the dictionaries used by this module.

The module looks up possible names to decide whether a word is a person's name or a location. If it cannot find the word in the dictionaries, it then looks at the POS tags provided in the input file to determine whether or not it is a proper noun. Heuristics (e.g., looking at titles like Mr. or word endings like -ville) are then applied to decide the semantic type of the proper noun, and if the type cannot be determined, the module returns both person and location as its type. The accuracy of the Name Identification Module on the testing set was 79.6%; adjusted to take incorrect tagging into account, the accuracy was 83.6%.

2.2 Partial Parser Module

The Partial Parser Module follows sequentially after the Name Identification Module. The input is the set of story sentences and questions, such that the words in each are tagged with POS tags and the names are marked with type and gender information. Initially pronouns have not been resolved; the partial parser provides segmented text with rich lexical information and role labels directly to the Pronoun Resolution Module. After pronoun resolution, the segmented text with resolved pronouns is returned to the partial parser for the parser to update the feature values corresponding to the pronouns. Finally, the partial parser provides bracketed text to the Comparison Module, which extracts features that will be used to construct modules for answering questions. The Partial Parser Module utilizes information in a lexicon and a grammar to provide the partial parses. The lexicon and the parser are detailed in the next two subsections.

2.2.1 The Lexicon

There were two methods we used to construct the lexicon: an open lexicon, which includes all words from the development set along with all determiners, pronouns, prepositions, particles, and conjunctions ... A script was created to semi-automate the construction of the lexicon from information extracted from previously existing dictionaries and from WordNet.

2.2.2 The Partial Parser

The parser segments each sentence into either a noun phrase (NP), a verb phrase (VP), or a prepositional phrase (PP), each with various feature sets.
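This kind of NP/VP/PP segmentation can be approximated with NLTK's regular-expression chunker, as in the sketch below. The chunk grammar is an assumption made for illustration; the actual parser uses its own lexicon and grammar rules and attaches much richer features (LABEL, BASE, AGR, TENSE, VOICE, GENDER, SEM_TYPE) than a plain chunker produces.

```python
# Chunking a POS-tagged sentence into NP/VP/PP groups with NLTK.
# An illustrative stand-in for the paper's partial parser, not its grammar.
import nltk

grammar = r"""
  NP: {<DT>?<JJ>*<NN.*|PRP>+}     # optional determiner, adjectives, nominals
  VP: {<MD>?<VB.*>+}              # optional modal plus verb group
  PP: {<IN>}                      # preposition (attaches to a following NP)
"""
chunker = nltk.RegexpParser(grammar)

tagged = [("They", "PRP"), ("are", "VBP"), ("called", "VBN"),
          ("Cub", "NNP"), ("Scouts", "NNPS")]
print(chunker.parse(tagged))
# (S (NP They/PRP) (VP are/VBP called/VBN) (NP Cub/NNP Scouts/NNPS))
```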
Recommended publications
  • Handling Ambiguity Problems of Natural Language Interface for Question Answering
    IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 3, No 3, May 2012. ISSN (Online): 1694-0814. www.IJCSI.org

    Handling Ambiguity Problems of Natural Language Interface for Question Answering

    Omar Al-Harbi (Faculty of Science and Technology, Islamic Science University of Malaysia, Malaysia), Shaidah Jusoh (Faculty of Science & Information Technology, Zarqa University, Zarqa, Jordan), and Norita Norwawi (Faculty of Science and Technology, Islamic Science University of Malaysia, Malaysia)

    Abstract: The natural language question (NLQ) processing module is considered a fundamental component in the natural language interface of a Question Answering (QA) system, and its quality impacts the performance of the overall QA system. The most difficult problem in developing a QA system is that it is so hard to find an exact answer to the NLQ. One of the most challenging problems in returning answers is how to resolve lexical semantic ambiguity in NLQs. Lexical semantic ambiguity may occur when a user's NLQ contains words that have more than one meaning. As a result, QA system performance can be negatively affected by these ambiguous words.

    ... document processing, and answer processing [9], [10], and [11]. In QA, an NLQ is the primary source through which a search process is directed for answers. Therefore, an accurate analysis of the NLQ is required. One of the most difficult problems in developing a QA system is that it is so hard to find an answer to an NLQ [22]. The main reason is that most QA systems ignore the semantic issue in NLQ analysis [12], [2], [14], and [15]. To achieve a better performance, the semantic information contained ...
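    The lexical semantic ambiguity discussed in this abstract can be illustrated with a classic word-sense-disambiguation baseline. The sketch below applies the Lesk (gloss-overlap) algorithm from NLTK to an ambiguous word in a question; this is a generic illustration, not the disambiguation method the paper itself proposes.

```python
# Disambiguating an ambiguous question word with the Lesk algorithm.
# Generic WSD baseline, not this paper's method.
# Requires NLTK data: nltk.download('punkt'); nltk.download('wordnet')
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

question = "Which bank is closest to the river?"
sense = lesk(word_tokenize(question), "bank", pos="n")
print(sense.name(), "-", sense.definition())
# Picks one WordNet sense of "bank" by overlap between the question
# context and each sense's dictionary gloss.
```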
  • The Question Answering System Using NLP and AI
    International Journal of Scientific & Engineering Research, Volume 7, Issue 12, December 2016. ISSN 2229-5518

    The Question Answering System Using NLP and AI

    Shivani Singh, Nishtha Das, and Rachel Michael (Students, SCS, Lingaya's University, Faridabad); Dr. Poonam Tanwar (Associate Prof., SCS, Lingaya's University, Faridabad)

    Abstract: The paper aims at an intelligent learning system that will take a text file as an input and gain knowledge from the given text. Using this knowledge, the system will try to answer questions queried to it by the user. The main goal of the Question Answering System (QAS) is to encourage research into systems that return answers, because an ample number of users prefer direct answers, and to bring the benefits of large-scale evaluation to the QA task.

    Keywords: Question Answering System (QAS), Artificial Intelligence (AI), Natural Language Processing (NLP)

    1. INTRODUCTION

    Question Answering (QA) is a research area that combines research from different fields, with a common subject ... (ii) QA responds with a specific answer to a specific question instead of a list of documents.

    1.1 APPROACHES IN QA

    There are three major approaches to Question Answering Systems: the linguistic approach, the statistical approach, and the pattern matching approach.

    A. Linguistic Approach. This approach understands natural language texts; linguistic techniques such as tokenization, POS tagging, and parsing [1] are applied to reconstruct the question into a correct query that extracts the relevant answers from a structured database. The questions handled by this approach are of the factoid type and require a deep semantic understanding.

    B. ...
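    The linguistic-approach steps named above (tokenization, POS tagging, parsing) can be demonstrated with NLTK. This is a generic illustration of those preprocessing stages, not code from the paper.

```python
# Tokenizing and POS-tagging a question: the first linguistic steps applied
# before reconstructing it as a structured query.
# Requires NLTK data: nltk.download('punkt');
#                     nltk.download('averaged_perceptron_tagger')
import nltk

question = "Who wrote the opera Carmen?"
tokens = nltk.word_tokenize(question)
print(nltk.pos_tag(tokens))
# e.g. [('Who', 'WP'), ('wrote', 'VBD'), ('the', 'DT'),
#       ('opera', 'NN'), ('Carmen', 'NNP'), ('?', '.')]
```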
  • Quester: A Speech-Based Question Answering Support System for Oral Presentations
    Quester: A Speech-Based Question Answering Support System for Oral Presentations

    Reza Asadi, Ha Trinh, Harriet J. Fell, Timothy W. Bickmore
    Northeastern University, Boston, USA
    asadi, hatrinh, fell, [email protected]

    ABSTRACT: Current slideware, such as PowerPoint, reinforces the delivery of linear oral presentations. In settings such as question answering sessions or review lectures, more extemporaneous and dynamic presentations are required. An intelligent system that can automatically identify and display the slides most related to the presenter's speech allows for more speaker flexibility in sequencing their presentation. We present Quester, a system that enables fast access to relevant presentation content during a question answering session and supports nonlinear presentations led by the speaker. Given the slides' contents and notes, the system ranks presentation slides based on semantic closeness to spoken utterances, displays the most related slides, and highlights the corresponding content keywords in slide notes. The design of our system was informed by ...

    ... good support for speakers who want to deliver more extemporaneous talks in which they dynamically adapt their presentation to input or questions from the audience, evolving audience needs, or other contextual factors such as varying or indeterminate presentation time, real-time information, or more improvisational or experimental formats. At best, current slideware only provides simple indexing mechanisms to let speakers hunt through their slides for material to support their dynamically evolving speech, and speakers must perform this frantic search while the audience is watching and waiting. Perhaps the most common situations in which speakers must provide such dynamic presentations are in Question and Answer (QA) sessions at the end of their prepared presentations.
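    Ranking slides by semantic closeness to a spoken utterance, as the abstract describes, can be approximated with TF-IDF vectors and cosine similarity. This is a sketch of the general idea under assumed example data, not Quester's actual ranking model.

```python
# Rank slides by textual similarity to a transcribed utterance.
# TF-IDF + cosine similarity as a stand-in for Quester's semantic ranking.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

slides = [  # hypothetical slide texts (title + notes)
    "Usability study results and participant feedback",
    "System architecture: speech recognition and slide ranking",
    "Related work on presentation support tools",
]
utterance = "Can you go back to the slide about the system architecture?"

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(slides + [utterance])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
best = scores.argmax()
print(best, slides[best])  # -> 1 System architecture: ...
```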
  • Ontology-Based Approach to Semantically Enhanced Question Answering for Closed Domain: a Review
    information (Review)

    Ontology-Based Approach to Semantically Enhanced Question Answering for Closed Domain: A Review

    Ammar Arbaaeen (Department of Computer Science, Faculty of Information and Communication Technology, International Islamic University Malaysia, Kuala Lumpur 53100, Malaysia; correspondence: [email protected]) and Asadullah Shah (Faculty of Information and Communication Technology, International Islamic University Malaysia, Kuala Lumpur 53100, Malaysia; [email protected])

    Abstract: For many users of natural language processing (NLP), it can be challenging to obtain concise, accurate and precise answers to a question. Systems such as question answering (QA) enable users to ask questions and receive feedback in the form of quick answers to questions posed in natural language, rather than in the form of lists of documents delivered by search engines. This task is challenging and involves complex semantic annotation and knowledge representation. This study reviews the literature detailing ontology-based methods that semantically enhance QA for a closed domain, by presenting a literature review of the relevant studies published between 2000 and 2020. The review reports that 83 of the 124 papers considered acknowledge the QA approach, and recommend its development and evaluation using different methods. These methods are evaluated according to accuracy, precision, and recall. An ontological approach to semantically enhancing QA is found to be adopted in a limited way, as many of the studies reviewed concentrated instead on NLP and information retrieval (IR) processing. While the majority of the studies reviewed focus on open domains, this study investigates the closed domain.
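    Since the surveyed systems are compared on accuracy, precision, and recall, a small helper makes the three definitions concrete. This is generic evaluation code, not taken from the review.

```python
# Accuracy, precision, and recall over binary answer judgments.
def precision_recall_accuracy(predicted, gold, total):
    """predicted: ids of questions the system answered; gold: ids it
    should have answered; total: number of questions overall."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    tn = total - len(predicted | gold)   # correctly left unanswered
    return precision, recall, (tp + tn) / total

# e.g. the system answers 4 questions, 3 of them correctly, out of 10 total
print(precision_recall_accuracy({1, 2, 3, 4}, {1, 2, 3, 5}, total=10))
# -> (0.75, 0.75, 0.8)
```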
  • A Question-Driven News Chatbot
    What's The Latest? A Question-driven News Chatbot

    Philippe Laban, John Canny, Marti A. Hearst
    UC Berkeley
    [email protected], [email protected], [email protected]

    Abstract: This work describes an automatic news chatbot that draws content from a diverse set of news articles and creates conversations with a user about the news. Key components of the system include the automatic organization of news articles into topical chatrooms, integration of automatically generated questions into the conversation, and a novel method for choosing which questions to present which avoids repetitive suggestions. We describe the algorithmic framework and present the results of a usability study that shows that news readers using the system successfully engage in multi-turn conversations about specific news stories. The system is publicly available at https://newslens.berkeley.edu/ and a demonstration video is available at this link: https://www. ...

    1 Introduction

    ... a set of essential questions and link each question with content that answers it. The motivating idea is: two pieces of content are redundant if they answer the same questions. As the user reads content, the system tracks which questions are answered (directly or indirectly) with the content read so far, and which remain unanswered. We evaluate the system through a usability study.

    The remainder of this paper is structured as follows. Section 2 describes the system and the content sources, Section 3 describes the algorithm for keeping track of the conversation state, Section 4 provides the results of a usability study evaluation, and Section 5 presents relevant prior work.
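    The conversation-state idea sketched in this excerpt (track which essential questions the content shown so far has answered, and skip content that answers nothing new) fits in a few lines. The question/content pairings below are hypothetical examples, not the paper's data or its actual algorithm.

```python
# Track which essential questions are covered as content is shown.
# A piece of content is redundant if it answers no new questions.
answered_by = {  # hypothetical mapping: content id -> questions it answers
    "p1": {"What happened?", "Where did it happen?"},
    "p2": {"Where did it happen?"},          # redundant once p1 is read
    "p3": {"What happens next?"},
}
essential = {"What happened?", "Where did it happen?", "What happens next?"}

answered = set()
for pid in ("p1", "p2", "p3"):
    new = answered_by[pid] - answered
    if not new:
        print(pid, "-> redundant, skipped")
        continue
    answered |= new
    print(pid, "-> answers", sorted(new),
          "| unanswered:", sorted(essential - answered))
```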
  • Natural Language Processing with Deep Learning, Lecture Notes Part VII: Question Answering
    CS224n: Natural Language Processing with Deep Learning
    Lecture Notes: Part VII, Question Answering
    Course Instructors: Christopher Manning, Richard Socher
    Authors: Francois Chaubard, Richard Socher
    Winter 2019

    1 Dynamic Memory Networks for Question Answering over Text and Images

    The idea of a QA system is to extract information (sometimes passages, or spans of words) directly from documents, conversations, online searches, etc., that will meet the user's information needs. Rather than making the user read through an entire document, a QA system prefers to give a short and concise answer. Nowadays, a QA system can combine very easily with other NLP systems like chatbots, and some QA systems even go beyond the search of text documents and can extract information from a collection of pictures.

    There are many types of questions, and the simplest of them is factoid question answering. It contains questions that look like "The symbol for mercuric oxide is?" or "Which NFL team represented the AFC at Super Bowl 50?". There are of course other types, such as mathematical questions ("2+3=?") and logical questions that require extensive reasoning (and no background information). However, we can argue that information-seeking factoid questions are the most common questions in people's daily life.

    In fact, most NLP problems can be considered a question-answering problem; the paradigm is simple: we issue a query, and the machine provides a response. By reading through a document, or a set of instructions, an intelligent system should be able to answer a wide variety of questions. We can ask the POS tags of a sentence, or we can ask the system to respond in a different language.
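    Extractive factoid QA of the kind described here, pulling an answer span out of a passage, is easy to demonstrate with a modern library. The following sketch uses the Hugging Face transformers pipeline as an illustration; it is not part of the lecture notes and postdates the methods they cover.

```python
# Extractive QA: find the answer span inside a context passage.
# Modern-library illustration; downloads a default QA model on first run.
from transformers import pipeline

qa = pipeline("question-answering")

context = ("Super Bowl 50 was played in February 2016. The Denver Broncos "
           "represented the AFC at Super Bowl 50 and won the game.")
result = qa(question="Which NFL team represented the AFC at Super Bowl 50?",
            context=context)
print(result["answer"])  # -> a span from the context, e.g. "Denver Broncos"
```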
  • Structured Retrieval for Question Answering
    SIGIR 2007 Proceedings, Session 14: Question Answering

    Structured Retrieval for Question Answering

    Matthew W. Bilotti, Paul Ogilvie, Jamie Callan and Eric Nyberg
    Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213
    { mbilotti, pto, callan, ehn }@cs.cmu.edu

    ABSTRACT: Bag-of-words retrieval is popular among Question Answering (QA) system developers, but it does not support constraint checking and ranking on the linguistic and semantic information of interest to the QA system. We present an approach to retrieval for QA, applying structured retrieval techniques to the types of text annotations that QA systems use. We demonstrate that the structured approach can retrieve more relevant results, more highly ranked, compared with bag-of-words, on a sentence retrieval task. We also characterize the extent to which structured retrieval effectiveness depends on the quality of the annotations.

    ... keywords in a minimal window (e.g., Ruby killed Oswald is not an answer to the question Who did Oswald kill?). Bag-of-words retrieval does not support checking such relational constraints at retrieval time, so many QA systems simply query using the key terms from the question, on the assumption that any document relevant to the question must necessarily contain them. Slightly more advanced systems also index and provide query operators that match named entities, such as Person and Organization [14]. When the corpus includes many documents containing the keywords, however, the ranked list of results can contain many irrelevant documents. For example, question 1496 from TREC 2002 asks, What country is Berlin in? A typical query for this question might be just the keyword Berlin, or the keyword Berlin in proximity to a Country named entity.

    Categories and Subject Descriptors ...
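    The difference between a pure keyword query and one with an entity-typed constraint, as in the Berlin example above, can be illustrated with an off-the-shelf named-entity recognizer. The sketch below uses spaCy and its GPE (geopolitical entity) label as a rough stand-in for a Country operator; it is not the structured query language used in the paper.

```python
# Keyword-only retrieval vs. keyword + named-entity constraint (spaCy NER).
# Illustrates entity-typed query operators; not the paper's retrieval system.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model with NER

sentences = [
    "Berlin hosted a large film festival last year.",
    "Berlin is the capital of Germany.",
]

# Keyword-only: both sentences match "Berlin".
hits = [s for s in sentences if "Berlin" in s]

# Constrained: also require another GPE entity (approximating "Country").
constrained = [s for s in hits
               if any(ent.label_ == "GPE" and ent.text != "Berlin"
                      for ent in nlp(s).ents)]
print(constrained)  # -> ['Berlin is the capital of Germany.']
```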
  • Multiple Choice Question Answering Using a Large Corpus of Information
    Multiple Choice Question Answering using a Large Corpus of Information

    A dissertation submitted to the faculty of the graduate school of the University of Minnesota by Mitchell Kinney, in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Adviser: Xiaotong Shen. July 2020. © Mitchell Kinney 2020

    Acknowledgements

    I am very thankful for my adviser, Xiaotong Shen. He has consistently provided valuable feedback and direction in my graduate career. His input when I was struggling and feeling dismayed at results was always encouraging and pushed me to continue onward. I also enjoyed our talks about topics unrelated to statistics that helped remind me not to completely consume my life with my studies. I would like to thank my committee members Jie Ding, Wei Pan, Maury Bramson, and Charlie Geyer for participating in my preliminary and final exams and for their feedback on improving my research. I am grateful to Galin Jones for his open door and help on side projects throughout the years. Yuhong Yang was an exceptional mentor and instructor, and I am thankful for his constant encouragement. Thank you also to Hui Zou, Charles Doss, Birgit Grund, Dennis Cook, Glen Meeden, and Adam Rothman for their excellent instruction and their helpful office-hour guidance during my first few years. The office staff at the statistics department (a special thank you to Taryn Verley, who has been a constant) has been amazingly helpful and allowed me to only stress about my studies and research. My peers in the statistics department and at the university have made my time in Minnesota immensely enjoyable.
  • Artificial Intelligence for Virtual Assistant
    Flyer: Information Management and Governance

    Artificial Intelligence for Virtual Assistant

    IDOL for Virtual Assistant is an AI (Artificial Intelligence) powered solution that allows organizations to offer their customers and employees access to information and processes using an automated human-like operator which can engage in natural language dialogues.

    To address an increasingly diverse range of information requests, IDOL uses our unique Natural Language Question Answering (NLQA) technology to allow natural conversations between end users and a virtual assistant, to perform queries against: curated FAQ responses (Answer Bank), data tables and data services (Fact Bank), and unstructured documents (Passage Extraction); and to ...

    ... streamline the retrieval process, which allows information to be obtained in a more convenient and user-friendly fashion. Moreover, it should offer an interface to allow configuration of that process to ensure existing human knowledge is allowed to create and train the system to answer questions optimally. In general, there are three independent steps to accomplish this.

    Answer Bank. Many administrators of support or user-help systems have an existing set of frequently asked questions that human support agents are trained on, or the help pages are populated ...

    ... such databases of information to allow natural language questions to receive direct factual answers. This is achieved by processing and understanding the question and mapping it to an appropriate structured query that will in turn respond with the desired answer. An example of this would be databases of financial prices over time, allowing a query such as, "What was the EPS of HPQ in Q3 2016?" to receive the precise answer. This database of structured information is generally known as the fact bank.

    In addition, the system should have the ability to understand and extract information (for example, names and numbers) from unstructured data such as free-form documents.
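    The Fact Bank behavior described here, mapping a natural language question onto a structured query over a prices-over-time database, can be sketched as below. The table schema, the question pattern, and the SQL are hypothetical illustrations; the flyer does not disclose how IDOL's NLQA mapping actually works.

```python
# Map a factual question to a structured query over a hypothetical table
# financials(ticker, metric, quarter, value). Illustration only.
import re

def to_sql(question):
    m = re.search(r"What was the (\w+) of (\w+) in (Q[1-4]) (\d{4})\?",
                  question)
    if not m:
        return None  # question does not fit this one illustrative pattern
    metric, ticker, quarter, year = m.groups()
    return ("SELECT value FROM financials "
            f"WHERE ticker = '{ticker}' AND metric = '{metric}' "
            f"AND quarter = '{quarter} {year}'")

print(to_sql("What was the EPS of HPQ in Q3 2016?"))
# SELECT value FROM financials WHERE ticker = 'HPQ'
#   AND metric = 'EPS' AND quarter = 'Q3 2016'
```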
  • Natural Language Question Answering: the View from Here
    Natural Language Engineering 7 (4): 275–300. © 2001 Cambridge University Press. DOI: 10.1017/S1351324901002807. Printed in the United Kingdom

    Natural language question answering: the view from here

    L. HIRSCHMAN, The MITRE Corporation, Bedford, MA, USA
    R. GAIZAUSKAS, Department of Computer Science, University of Sheffield, Sheffield, UK
    e-mail: [email protected]

    1 Introduction: What is Question Answering?

    As users struggle to navigate the wealth of on-line information now available, the need for automated question answering systems becomes more urgent. We need systems that allow a user to ask a question in everyday language and receive an answer quickly and succinctly, with sufficient context to validate the answer. Current search engines can return ranked lists of documents, but they do not deliver answers to the user. Question answering systems address this problem. Recent successes have been reported in a series of question-answering evaluations that started in 1999 as part of the Text Retrieval Conference (TREC). The best systems are now able to answer more than two thirds of factual questions in this evaluation.

    The combination of user demand and promising results has stimulated international interest and activity in question answering. This special issue arises from an invitation to the research community to discuss the performance, requirements, uses, and challenges of question answering systems. The papers in this special issue cover a small part of the emerging research in question answering. Our introduction provides an overview of question answering as a research topic. The first article, by Ellen Voorhees, describes the history of the TREC question answering evaluations, the results and the associated evaluation methodology.
  • Web Question Answering: Is More Always Better?
    Web Question Answering: Is More Always Better?

    Susan Dumais, Michele Banko, Eric Brill, Jimmy Lin, Andrew Ng
    Microsoft Research, One Microsoft Way, Redmond, WA 98052 USA
    {sdumais, mbanko, brill}@research.microsoft.com; [email protected]; [email protected]

    ABSTRACT: This paper describes a question answering system that is designed to capitalize on the tremendous amount of data that is now available online. Most question answering systems use a wide variety of linguistic resources. We focus instead on the redundancy available in large corpora as an important resource. We use this redundancy to simplify the query rewrites that we need to use, and to support answer mining from returned snippets. Our system performs quite well given the simplicity of the techniques being utilized. Experimental results show that question answering accuracy can be greatly improved by analyzing more and more matching passages. Simple passage ranking and n-gram extraction techniques work well in our system ...

    ... simply by increasing the amount of data used for learning. Following the same guiding principle, we take advantage of the tremendous data resource that the Web provides as the backbone of our question answering system. Many groups working on question answering have used a variety of linguistic resources – part-of-speech tagging, syntactic parsing, semantic relations, named entity extraction, dictionaries, WordNet, etc. (e.g., [2][8][11][12][13][15][16]). We chose instead to focus on the Web as a gigantic data repository with tremendous redundancy that can be exploited for question answering. The Web, which is home to billions of pages of electronic text, is orders of magnitude larger than the TREC QA document collection, which consists of fewer than 1 million documents.
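    The n-gram answer-mining idea in this abstract, letting the correct answer emerge from redundancy across many retrieved snippets, takes only a few lines to illustrate. The snippets below are invented examples; the real system's query rewriting and filtering machinery is not shown.

```python
# Mine a candidate answer by counting n-grams that recur across snippets.
# Simplified illustration of redundancy-based answer extraction.
from collections import Counter
import re

snippets = [  # hypothetical search-engine snippets
    "Mount Everest is the tallest mountain in the world.",
    "The tallest mountain on Earth is Mount Everest.",
    "At 8,849 m, Mount Everest tops all other peaks.",
]
question_words = {"what", "is", "the", "tallest", "mountain", "in", "world"}

counts = Counter()
for s in snippets:
    words = re.findall(r"\w+", s)
    for n in (1, 2):                                  # unigrams and bigrams
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if not set(gram.lower().split()) & question_words:
                counts[gram] += 1                     # reward recurrence

print(counts.most_common(3))  # "Mount Everest" recurs in every snippet
```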
  • A Speech-Based Question-Answering System on Mobile Devices
    Qme!: A Speech-based Question-Answering system on Mobile Devices

    Taniya Mishra, Srinivas Bangalore
    AT&T Labs-Research, 180 Park Ave, Florham Park, NJ
    [email protected], [email protected]

    Abstract: Mobile devices are becoming the dominant mode of information access despite being cumbersome for inputting text using small keyboards and for browsing web pages on small screens. We present Qme!, a speech-based question-answering system that allows for spoken queries and retrieves answers to the questions instead of web pages. We present bootstrap methods to distinguish dynamic questions from static questions, and we show the benefits of tight coupling of the speech recognition and retrieval components of the system.

    1 Introduction

    Access to information has moved from desktop and ... for the user to pose her query in natural language and presenting the relevant answer(s) to her question, we expect the user's information need to be fulfilled in a shorter period of time. We present a speech-driven question answering system, Qme!, as a solution toward addressing these two issues. The system provides a natural input modality – spoken language input – for users to pose their information need and presents a collection of answers that potentially address the information need directly. For a subclass of questions that we term static questions, the system retrieves the answers from an archive of human-generated answers to questions. This ensures higher accuracy for the answers retrieved (if found in the archive) and also allows us to retrieve related questions on the user's topic of interest.
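    The static/dynamic split introduced in this abstract, questions whose answers stay fixed versus questions whose answers change over time, can be illustrated with a toy classifier. The cue list and rule are assumptions made for illustration; the paper itself uses bootstrap methods, not this heuristic.

```python
# Toy static-vs-dynamic question classifier (keyword heuristic).
# The paper's actual approach is bootstrapped, not this hand-written rule.
DYNAMIC_CUES = {"today", "tonight", "now", "current", "currently",
                "latest", "score", "weather", "price"}

def is_dynamic(question):
    """Dynamic questions have answers that change over time."""
    words = set(question.lower().rstrip("?").split())
    return bool(words & DYNAMIC_CUES)

print(is_dynamic("What is the weather in Boston today?"))  # True
print(is_dynamic("Who invented the telephone?"))           # False
```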