A Speech-Based Question-Answering System on Mobile Devices
Total Page:16
File Type:pdf, Size:1020Kb
Qme! : A Speech-based Question-Answering system on Mobile Devices Taniya Mishra Srinivas Bangalore AT&T Labs-Research AT&T Labs-Research 180 Park Ave 180 Park Ave Florham Park, NJ Florham Park, NJ [email protected] [email protected] Abstract for the user to pose her query in natural language and presenting the relevant answer(s) to her question, we Mobile devices are becoming the dominant expect the user’s information need to be fulfilled in mode of information access despite being a shorter period of time. cumbersome to input text using small key- boards and browsing web pages on small We present a speech-driven question answering screens. We present Qme!, a speech-based system, Qme!, as a solution toward addressing these question-answering system that allows for two issues. The system provides a natural input spoken queries and retrieves answers to the modality – spoken language input – for the users questions instead of web pages. We present to pose their information need and presents a col- bootstrap methods to distinguish dynamic lection of answers that potentially address the infor- questions from static questions and we show mation need directly. For a subclass of questions the benefits of tight coupling of speech recog- that we term static questions, the system retrieves nition and retrieval components of the system. the answers from an archive of human generated an- swers to questions. This ensures higher accuracy 1 Introduction for the answers retrieved (if found in the archive) and also allows us to retrieve related questions on Access to information has moved from desktop and the user’s topic of interest. For a second subclass of laptop computers in office and home environments questions that we term dynamic questions, the sys- to be an any place, any time activity due to mo- tem retrieves the answer from information databases bile devices. Although mobile devices have small accessible over the Internet using web forms. keyboards that make typing text input cumbersome The layout of the paper is as follows. In Section 2, compared to conventional desktop and laptops, the we review the related literature. In Section 3, we ability to access unlimited amount of information, illustrate the system for speech-driven question an- almost everywhere, through the Internet, using these swering. We present the retrieval methods we used devices have made them pervasive. to implement the system in Section 4. In Section 5, Even so, information access using text input on we discuss and evaluate our approach to tight cou- mobile devices with small screens and soft/small pling of speech recognition and search components. keyboards is tedious and unnatural. In addition, by In Section 6, we present bootstrap techniques to dis- the mobile nature of these devices, users often like tinguish dynamic questions from static questions, to use them in hands-busy environments, ruling out and evaluate the efficacy of these techniques on a the possibility of typing text. We address this issue test corpus. We conclude in Section 7. by allowing the user to query an information repos- itory using speech. We expect that spoken language 2 Related Work queries to be a more natural and less cumbersome way of information access using mobile devices. Early question-answering (QA) systems, such as A second issue we address is related to directly Baseball (Green et al., 1961) and Lunar (Woods, and precisely answering the user’s query beyond 1973) were carefully hand-crafted to answer ques- serving web pages. This is in contrast to the current tions in a limited domain, similar to the QA approach where a user types in a query using key- components of ELIZA (Weizenbaum, 1966) and words to a search engine, browses the returned re- SHRDLU (Winograd, 1972). However, there has sults on the small screen to select a potentially rele- been a resurgence of QA systems following the vant document, suitably magnifies the screen to view TREC conferences with an emphasis on answering the document and searches for the answer to her factoid questions. This work on text-based question- question in the document. By providing a method answering which is comprehensively summarized 55 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, pages 55–63, Los Angeles, California, June 2010. c 2010 Association for Computational Linguistics in (Maybury, 2004), range widely in terms of lin- mation repositories that are behind web forms – and guistic sophistication. At one end of the spectrum, provide a unified meta-interface to such informa- There are linguistically motivated systems (Katz, tion sources, for example, web sites related travel, 1997; Waldinger et al., 2004) that analyze the user’s or car dealerships. Dynamic questions can be seen question and attempt to synthesize a coherent an- as providing a natural language interface (NLI) to swer by aggregating the relevant facts. At the other such web forms, similar to early work on NLI to end of the spectrum, there are data intensive sys- databases (Androutsopoulos, 1995). tems (Dumais et al., 2002) that attempt to use the redundancy of the web to arrive at an answer for 3 Speech-driven Question Retrieval factoid style questions. There are also variants of System such QA techniques that involve an interaction and We describe the speech-driven query retrieval appli- use context to resolve ambiguity (Yang et al., 2006). cation in this section. The user of this application In contrast to these approaches, our method matches provides a spoken language query to a mobile device the user’s query against the questions in a large cor- intending to find an answer to the question. Some pus of question-answer pairs and retrieves the asso- example users’ inputs are1 what is the fastest ani- ciated answer. mal in water, how do I fix a leaky dishwasher, why In the information retrieval community, QA sys- is the sky blue. The result of the speech recognizer tems attempt to retrieve precise segments of a doc- is used to search a large corpus of question-answer ument instead of the entire document. In (To- pairs to retrieve the answers pertinent to the user’s muro and Lytinen, 2004), the authors match the static questions. For the dynamic questions, the an- user’s query against a frequently-asked-questions swers are retrieved by querying a web form from (FAQ) database and select the answer whose ques- the appropriate web site (e.g www.fandango.com for tion matches most closely to the user’s question. movie information). The result from the speech rec- An extension of this idea is explored in (Xue et al., ognizer can be a single-best string or a weighted 2008; Jeon et al., 2005), where the authors match the word lattice.2 The retrieved results are ranked using user’s query to a community collected QA archive different metrics discussed in the next section. In such as (Yahoo!, 2009; MSN-QnA, 2009). Our ap- Figure 2, we illustrate the answers that Qme!returns proach is similar to both these lines of work in spirit, for static and dynamic quesitons. although the user’s query for our system originates as a spoken query, in contrast to the text queries in previous work. We also address the issue of noisy Q&A corpus speech recognition and assess the value of tight in- tegration of speech recognition and search in terms Search Speech 1−best ASR of improving the overall performance of the system. Ranked Results Lattice Classify Match Rank A novelty in this paper is our method to address dy- Dynamic Retrieve namic questions as a seamless extension to answer- from Web ing static questions. Also related is the literature on voice-search ap- Figure 1: The architecture of the speech-driven question- plications (Microsoft, 2009; Google, 2009; Yellow- answering system Pages, 2009; vlingo.com, 2009) that provide a spo- ken language interface to business directories and return phone numbers, addresses and web sites of 4 Methods of Retrieval businesses. User input is typically not a free flowing We formulate the problem of answering static natural language query and is limited to expressions questions as follows. Given a question-answer with a business name and a location. In our system, archive QA = {(q1, a1), (q2, a2),..., (qN , aN )} users can avail of the full range of natural language expressions to express their information need. 1The query is not constrained to be of any specific question And finally, our method of retrieving answers to type (for example, what, where, when, how). 2For this paper, the ASR used to recognize these utterances dynamic questions has relevance to the database and incorporates an acoustic model adapted to speech collected meta search community. There is growing interest from mobile devices and a four-gram language model that is in this community to mine the “hidden” web – infor- built from the corpus of questions. 56 aggregated score for a document d using a un- igram model of the query and the document is given as in Equation 1. For a given query, the documents with the highest total term weight are presented as retrieved results. Terms can also be defined as n-gram sequences of a query and a document. In our experiments, we have used up to 4-grams as terms to retrieve and rank documents. X Score(d) = tfw,d × idfw (1) w∈Q Figure 2: Retrieval results for static and dynamic ques- 2. String Comparison Metrics: Since the length tions using Qme! of the user query and the query to be retrieved are similar in length, we use string compar- of N question-answer pairs, and a user’s ques- ison methods such as Levenshtein edit dis- r tance (Levenshtein, 1966) and n-gram overlap tion qu, the task is to retrieve a subset QA = r r r r r r (BLEU-score) (Papineni et al., 2002) as simi- {(q1, a1), (q2, a2),..., (qM , aM )} M << N us- ing a selection function Select and rank the mem- larity metrics.