QASR: Spoken Question Answering Using Semantic Role Labeling

Svetlana Stenchikova
State University of New York
Stony Brook, NY, 11794
[email protected]

Dilek Hakkani-Tür, Gokhan Tur
AT&T Labs - Research
Florham Park, NJ, 07932
dtur,[email protected]

* This research has been conducted at AT&T Labs - Research.

In this demo, we will present QASR, a question answering system that uses semantic role labeling. Given a question, QASR semantically parses it, formulates a query, sends it to a search engine (in this case, Google), and semantically parses the candidate answers, matching the semantic argument asked about in the question. QASR supports multiple modalities for question answering: it is possible to type the question using a web or command-prompt interface, or to speak the question via a speech interface.

Question answering (QA) is the task of finding a concise answer to a natural language question. A question answering system uses a search engine, but differs from a web search task in the type of its input and output. The input to a search engine is a query, while a QA system takes a natural language question as input. For example, if one wants to find out "Who first broke the sound barrier?", the user of a search engine would convert the question into a query like first AND broke AND "sound barrier", while the user of a QA system would type or say the question itself. The output of a QA system is a concise answer (Yeager, in the case of the above question), while the search engine produces snippets with links to web pages.

In this work we focus only on factoid questions. The answer to a factoid question is a single word or a short phrase. The above example is a factoid question, while "What is the meaning of life?" is not. Factoid QA has been extensively studied in the framework of the TREC conferences (TREC).

In this work, we apply semantic role labeling to the QA task. Semantic role labeling aims to identify the predicate/argument relations within a sentence. In our approach, we process the question and the candidate sentences extracted using a search engine, identifying their predicate/argument structure with the Assert system (Pradhan et al., 2005), which currently has one of the highest reported performances among semantic role labelers. Assert assigns tags to predicates and argument phrases: the TARGET tag is assigned to the predicate, and the ARG0, ARG1, ARG2, ..., ARGM-TMP, and ARGM-LOC tags are assigned to arguments. The meaning of an argument type depends on the class of the predicate it appears with, but generally ARG0 serves as the agent and ARG1, ARG2, ... as objects. Arguments of type ARGM-TMP, which represents a temporal argument, and ARGM-LOC, a locational argument, are shared over all predicates. For example, the sentence "Nostradamus was born in 1503 in the south of France" gets tagged by the semantic role labeling program as: [ARG1 Nostradamus] was [TARGET born] [ARGM-TMP in 1503] [ARGM-LOC in the south of France]. In this example, born is identified as the target predicate with three arguments: the object "Nostradamus", the temporal argument "in 1503", and the locational argument "in the south of France".

To demonstrate the importance of predicate/argument extraction for the QA task, consider the question "Who created the comic strip Garfield?" and a candidate sentence: "Garfield is a popular comic strip created by Jim Davis featuring the cat Garfield ..." Without deep semantic processing to find predicate/argument relations, we could extract the answer Jim Davis only by creating an example-specific template. It is not feasible to create templates for each anticipated predicate/answer candidate pair, because the number of predicates covered by an open-domain question answering system is unlimited, as is the variation among candidate sentences. With the knowledge of the predicate/argument structure identified by the semantic role labeler: Garfield is [ARG1 a popular comic strip] [TARGET created] [ARG0 by Jim Davis] featuring the cat Garfield ..., ARG0 corresponds to the "agent", and since the answer to a "who" question is expected to be an agent, we can extract Jim Davis as the answer.
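To make the bracketed tag format above concrete, the following is a minimal sketch of how Assert-style output could be read into a predicate/argument structure. The function name parse_srl and the dictionary representation are our own illustration, not part of QASR or Assert.

```python
import re

def parse_srl(tagged):
    """Parse Assert-style bracketed output, e.g.
    "[ARG1 Nostradamus] was [TARGET born] [ARGM-TMP in 1503]",
    into a dict mapping each role label to its phrase."""
    roles = {}
    # Each bracketed chunk is "[LABEL phrase]"; labels are TARGET,
    # ARG0..ARG5, or ARGM-* modifiers such as ARGM-TMP and ARGM-LOC.
    for label, phrase in re.findall(r"\[([A-Z0-9-]+) ([^\]]+)\]", tagged):
        roles[label] = phrase.strip()
    return roles

tagged = ("[ARG1 Nostradamus] was [TARGET born] "
          "[ARGM-TMP in 1503] [ARGM-LOC in the south of France]")
print(parse_srl(tagged))
# {'ARG1': 'Nostradamus', 'TARGET': 'born',
#  'ARGM-TMP': 'in 1503', 'ARGM-LOC': 'in the south of France'}
```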

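In the same spirit, a hedged sketch of the extraction step described in the Garfield example: the question word is mapped to an expected role ("who" expects an agent, ARG0), and a candidate whose TARGET matches the question's predicate yields the phrase filling that role. The WH_TO_ROLE mapping and the extract_answer helper are our own simplification, not QASR's actual implementation.

```python
import re

# "who" questions expect an agent (ARG0); a real system would cover
# more question words. This mapping is our own simplification.
WH_TO_ROLE = {"who": "ARG0"}

def extract_answer(question_word, question_predicate, tagged_candidates):
    """Return the phrase filling the expected role in the first candidate
    whose TARGET predicate matches the question's predicate."""
    expected = WH_TO_ROLE.get(question_word.lower())
    for tagged in tagged_candidates:
        roles = {label: phrase.strip() for label, phrase in
                 re.findall(r"\[([A-Z0-9-]+) ([^\]]+)\]", tagged)}
        if roles.get("TARGET") == question_predicate and expected in roles:
            # Drop the "by" that the passive construction puts in ARG0.
            answer = roles[expected]
            return answer[3:] if answer.startswith("by ") else answer
    return None

candidate = ("Garfield is [ARG1 a popular comic strip] [TARGET created] "
             "[ARG0 by Jim Davis] featuring the cat Garfield")
print(extract_answer("who", "created", [candidate]))  # -> Jim Davis
```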
When the speech interface is used, QASR relies on the AT&T Watson speech recognizer (Goffin et al., 2005) with VoiceXML (VoiceXML). The speech recognizer uses language models based on Variable N-gram Stochastic Automata (Riccardi et al., 1996). The acoustic models are subword-unit based, with triphone context modeling and a variable number of Gaussians (4-24). The output of the ASR engine is then used as the input to the SLU component. We train the language models using the questions produced by the TREC conferences, augmented with questions gathered from trivia websites. Unlike textual question answering, spoken question answering has not been extensively studied in the QA community: mostly the information retrieval, natural language processing, and artificial intelligence communities have participated in the TREC evaluations.

During the search phase, this demo will need internet access to query the search engine. In case there is no internet access, we can show pre-queried questions. Furthermore, to present the speech interface, we will need a phone line. We can always use our cell phones or headphones, but since the acoustic models are tuned to perform optimally on land-line telephones, a land line would be the best choice.

References

V. Goffin, C. Allauzen, E. Bocchieri, D. Hakkani-Tür, A. Ljolje, S. Parthasarathy, M. Rahim, G. Riccardi, and M. Saraclar. 2005. The AT&T Watson speech recognizer. In Proceedings of ICASSP, Philadelphia, PA, March.

Sameer Pradhan, Wayne Ward, Kadri Hacioglu, James H. Martin, and Daniel Jurafsky. 2005. Semantic role labeling using different syntactic views. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005).

G. Riccardi, R. Pieraccini, and E. Bocchieri. 1996. Stochastic automata for language modeling. Computer Speech and Language, 10:265–293.

Text REtrieval Conference (TREC). http://trec.nist.gov.

Voice Extensible Markup Language (VoiceXML) version 2.0. http://www.w3.org/TR/voicexml20/.