
SpeechQoogle: An Open-Domain Question Answering System with Speech Interface

Guoping Hu1, Dan Liu2, Qingfeng Liu2, Renhua Wang1
1. iFly Speech Lab, University of Science and Technology of China, Hefei
2. Research of Anhui USTC iFlyTEK Co., Ltd., Hefei
[email protected]

Abstract. In this paper, we propose a new and valuable research task: an open-domain question answering system with a speech interface. A first prototype, SpeechQoogle, is constructed from three separate modules: speech recognition, question answering (QA), and speech synthesis. The speech interface improves the utility of the QA system, but it also brings several new challenges: 1) the distorting effect of speech recognition errors; 2) only one answer can be returned; and 3) the returned answer must be understandable when delivered only as speech. To address these challenges, we first choose an FAQ-based question answering technique for the QA module because of its inherent advantages, and collect 600,000 QA pairs to support it. We then develop a dedicated acoustic model and language model for the speech recognition module, raising character accuracy to 87.17%. Finally, in open-set testing the integrated prototype successfully answers 56.25% of spoken questions, an encouraging performance given that many potential improvements remain unexploited.

Keywords: Question Answering, Speech Interface, Speech Recognition, Speech Synthesis, Information Extraction

1 Introduction

Question answering (QA) is the task of answering humans' natural language questions. The task has received a great deal of attention and great progress has been made in recent years, especially since the launch of the QA track at TREC in 1999. Introducing a speech interface into a QA system promises to improve its utility in two respects. First, users can communicate with the system in the most convenient way: spoken language. Second, a speech interface allows the QA system to be accessed by telephone or mobile device from anywhere at any time, whereas a traditional QA system is typically demonstrated or deployed through a text box on a web page into which users type their queries in natural language.

Motivated by these advantages, in this paper we construct an open-domain question answering system with a speech interface, named SpeechQoogle, from three separate modules: speech recognition, question answering, and speech synthesis. To the best of our knowledge, SpeechQoogle is the first prototype of a QA system with a speech interface.

Along with the gains in convenience and availability, a question answering system with a speech interface faces several new challenges: 1) the distorting effect of speech recognition errors; 2) only one answer candidate can be returned to the user; and 3) the answer must be syntactically and semantically understandable when delivered purely as speech. These three challenges defeat traditional question answering techniques, including question type prediction and matching, syntactic or semantic parsing, and other question understanding techniques.
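To make the overall architecture concrete before turning to how each challenge is handled, the following is a minimal sketch of the three-module pipeline described above. The component interfaces and function names are illustrative assumptions for this paper's description, not the actual implementation.

# Illustrative sketch of SpeechQoogle's three-stage pipeline:
# speech recognition -> question answering -> speech synthesis.
# Recognizer, QAModule and Synthesizer are hypothetical placeholders
# for the concrete engines used in the prototype.
from typing import Protocol


class Recognizer(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class QAModule(Protocol):
    def best_answer(self, question: str) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


def answer_spoken_question(audio: bytes, asr: Recognizer,
                           qa: QAModule, tts: Synthesizer) -> bytes:
    """Run one spoken question through the full pipeline."""
    question_text = asr.transcribe(audio)        # may contain recognition errors
    answer_text = qa.best_answer(question_text)  # exactly one answer is selected
    return tts.synthesize(answer_text)           # answer must be clear as speech alone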
The frequently asked questions (FAQ) based QA technique was proposed in recent years to address such difficulties. In this technique, question answering is recast as a retrieval problem: given a question, the most relevant QA pair is retrieved from a large-scale collection of QA pairs, and the answer part of the retrieved pair is returned to the user as the final answer. This FAQ-based technique can therefore produce answers whose syntactic and semantic integrity is guaranteed, and it can tolerate more speech recognition errors.

The scale and quality of the QA pair collection are the most important factors for the FAQ-based technique. Fortunately, the popularity of question answering websites makes it easy to collect QA pairs. In our experiments, about 600,000 high-quality Chinese QA pairs are collected from a single question answering website: http://zhidao.baidu.com. With this support, our question answering module successfully answers 72.50% of text questions in open-set testing.

Even with a suitable question answering technique, speech recognition performance remains the biggest problem. We therefore develop dedicated acoustic and language models for the speech recognition module: the acoustic model is trained on a collected spontaneous speech corpus, and the language model on the question parts of the QA pair collection. A recognition experiment on 400 spoken questions shows a character accuracy of 87.17%, which is nearly sufficient to support the downstream QA module.

Several mature toolkits are employed to build the final SpeechQoogle prototype: Lucene.NET [1] serves as the search engine in the question answering module, Julius [14] as the LVCSR engine in the speech recognition module, and the Mandarin TTS system InterPhonic 4.0 [22] in the speech synthesis module. Experimental results show that SpeechQoogle successfully answers 56.25% of spoken questions, which is an encouraging performance.

The rest of the paper is organized as follows. We review related work in the next section. Section 3 discusses both the value of and the challenges in a question answering system with a speech interface. Section 4 describes the architecture of SpeechQoogle. Section 5 presents the experimental results. Finally, Section 6 concludes and summarizes future work.

2 Related Work

Much research is related to question answering systems with a speech interface. We group it into three areas.

2.1 Text-based Question Answering

Text-based question answering has received a great deal of attention in recent years, especially since the launch of the QA track at TREC in 1999 [19]. In the early years, the QA task was generally restricted to answering factoid questions [5, 8, 11, 15], while answering questions beyond factoids has been proposed and studied in the last few years [13, 17]. The work in [13] presents a novel and practical approach to answering natural language questions by using the large number of FAQ pages available on the web. Since we also aim to build an open-domain, non-factoid question answering system, we adopt this FAQ-based technique in the question answering module of SpeechQoogle.

Another trend in question answering is the popularity of question answering websites, e.g., http://zhidao.baidu.com and http://iask.sina.com.cn. These websites provide platforms on which millions of web users share their knowledge: someone posts a question and waits for others to answer it. All of the question/answer histories remain available on these websites, and we use these resources to build our QA pair collection.
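Given such a collection, the FAQ-based technique described above amounts to nearest-question retrieval: index the question side of the QA pairs and answer a new question with the answer attached to its closest stored question. The sketch below illustrates this idea with a TF-IDF index from scikit-learn; it is a simplified stand-in, under our own assumptions, for the Lucene.NET-based retrieval used in the actual system, and real Chinese data would additionally require word segmentation.

# Toy FAQ-style retrieval: index the question side of the QA pairs,
# then answer a new question with the answer of its nearest stored question.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy QA pair collection; the real collection holds about 600,000 pairs
# harvested from http://zhidao.baidu.com.
qa_pairs = [
    ("Who wrote Dream of the Red Chamber?", "Cao Xueqin."),
    ("How tall is Mount Everest?", "About 8,848 meters."),
    ("What is the capital of France?", "Paris."),
]

questions = [q for q, _ in qa_pairs]
vectorizer = TfidfVectorizer()
question_index = vectorizer.fit_transform(questions)  # index only the question side

def best_answer(user_question: str) -> str:
    """Return the answer paired with the most similar stored question."""
    query_vec = vectorizer.transform([user_question])
    scores = cosine_similarity(query_vec, question_index)[0]
    return qa_pairs[scores.argmax()][1]

print(best_answer("How high is Mount Everest?"))  # prints "About 8,848 meters."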
2.2 Speech-Driven Web Retrieval and Speech-Driven Question Answering

Speech-driven web retrieval aims to retrieve the needed information from the web via spoken queries. This research is mostly motivated by the goal of providing internet access through telephones or mobile devices. A speech-driven web retrieval system first employs speech recognition to transcribe the spoken query into a keyword sequence, and then sends these keywords to a traditional web retrieval system. How to return the retrieval result to the user is either left for future work or solved by simply displaying a few top-ranked short snippets or titles on the small screen of a mobile device. Several methods have been proposed to improve speech recognition performance, including language model adaptation [10], mobile channel processing [6], and key semantic term extraction [18].

Speech-driven question answering was proposed for Japanese, and a series of studies has been published since 2002 [2, 3, 4]. In that work, speech recognition is integrated into a traditional QA system to accept spoken questions, but the output is still several ranked text snippets, whereas in SpeechQoogle only the single best answer is synthesized into speech and returned to the user. Free of the speech output constraint, their question answering technique remains based on relevant passage retrieval and answer extraction, while the FAQ-based technique is deliberately chosen for SpeechQoogle to satisfy that constraint. Another difference is that their research focuses only on language model tuning, whereas we develop both a dedicated acoustic model and a dedicated language model to improve speech recognition performance, and we investigate and employ more techniques for language model training in SpeechQoogle.

2.3 Spoken Dialog System and Speech-to-Speech Translation System

Spoken dialog systems [21] and speech-to-speech translation systems [16] have been intensively researched in the field of spoken language processing. However, state-of-the-art dialog systems operate only on domain-specific question answering dialogs, and speech-to-speech translation systems likewise work only in very narrow domains, such as booking a room or asking for directions. In contrast, a question answering system with a speech interface, including our SpeechQoogle, is expected to handle open-domain questions. The basic idea underlying a spoken dialog system or a speech-to-speech translation system is to understand domain-specific questions according to rules or syntax