Adapting Dependency Parsing to Spontaneous Speech for Open Domain Spoken Language Understanding

INTERSPEECH 2014 Adapting dependency parsing to spontaneous speech for open domain spoken language understanding Frederic Bechet, Alexis Nasr, Benoit Favre Aix Marseille Universite, CNRS-LIF Abstract added to the transcription in order to convey speakers in- Parsing human-human conversations consists in automatically tention (sentiment, behaviour, polarity). enriching text transcription with semantic structure information. Semantic parsing is the process of producing semantic in- We use in this paper a FrameNet-based approach to semantics terpretations from words and other linguistic events that are that, without needing a full semantic parse of a message, goes automatically detected in a text conversation or a speech sig- further than a simple flat translation of a message into basic con- nal. Many semantic models have been proposed, ranging from cepts. FrameNet-based semantic parsing may follow a syntactic formal models encoding “deep” semantic structures to shallow parsing step, however spoken conversations in customer service ones considering only the main topic of a document and its main telephone call centers present very specific characteristics such concepts or entities. We will use in this study a FrameNet- as non-canonical language, noisy messages (disfluencies, rep- based approach to semantics that, without needing a full se- etitions, truncated words or automatic speech transcription er- mantic parse of a message, goes further than a simple flat trans- rors) and the presence of superfluous information. For syntactic lation of a message into basic concepts: FrameNet-based se- parsing the traditional view based on context-free grammars is mantic parsers detect in a sentence the expression of frames and not suitable for processing non-canonical text. New approaches their roles. Because frames and roles abstract away from syn- to parsing based on dependency structures and discriminative tactic and lexical variation, FrameNet semantic analysis gives machine learning techniques are more adapted to process spon- enhanced access to the meaning of texts: (of the kind “who taneous speech for two main reasons: (a) they need less training does what, and how where and when ?”). data and (b) the annotation with syntactic dependencies of con- FrameNet-based semantic parsing is often based on a syn- versation transcripts is simpler than with syntactic constituents. tactic parsing step. However, for processing noncanonical text Another advantage is that partial annotation can be performed. such as automatic speech transcripts, the traditional view of This paper presents the adaptation of a syntactic dependency parsing based on context-free grammars is not suitable: due parser to process very spontaneous speech recorded in a call- to ungrammatical structures in this kind of text, writing a gen- centre environment. This parser is used in order to produce erative grammar and annotating transcripts with that grammar FrameNet candidates for characterizing conversations between is difficult. New approaches to parsing based on dependency an operator and a caller. structures and discriminative machine learning techniques [1] Index Terms: dependency parsing, FrameNet, spoken lan- are more appropriate for two main reasons: (a) they need less guage understanding, spontaneous speech. training data and (b) the annotation with syntactic dependencies of conversation transcripts is simpler than with syntactic 10.21437/Interspeech.2014-39 1. Introduction constituents. Using dependency parsing for speech processing has been Parsing human-human conversations consists in automatically proposed in previous studies ([2, 3]), however the problem of enriching text transcription with semantic structure informa- the adaptation of a dependency parser to the specificities of tion. Such information includes sentence boundaries, syntac- speech transcripts, manual or automatic, of spontaneous real- tic and semantic parse of each sentence, para-semantic traits world speech remains an open problem. related to several paralinguistic dimensions (emotion, polarity, This paper describes the adaptation process of a depen- behavioural patterns). Spoken conversations in customer ser- dency parser to spontaneous speech in order to perform open vice telephone call centers present specific characteristics such domain Spoken Language Understanding thanks to a FrameNet as: approach. We will present why it is crucial to adapt parsers non-canonical language: spontaneous spoken conver- that are originally trained on written text to the specificities of • sations represent different levels of language than the spontaneous speech on manual transcriptions containing dis- “canonical” one used in written text such as newspaper fluencies, and discuss the usefuleness of this approach to per- articles; form open-domain SLU on ASR transcriptions even with a high “noisy messages”: spoken conversation transcriptions WER. All the experiments have been carried on the RATP- • may contain disfluencies, repetitions, truncated words or DECODA corpus containing recordings of conversations in the automatic speech transcription errors; Paris public transport authority call-centre. superfluous information: redundancy and digression • make conversation messages prone to contain superflu- 2. Related work ous information that needs to be discarded; Many methods have been proposed for limited domain SLU, conversation transcripts are not self-sufficient: for spo- following early works on the ATIS corpus (see [4] for a review • ken messages, even with a perfect transcription, supra- of SLU methods and models). Regardless of the paradigm cho- segmental information (prosody, voice quality) has to be sen for performing SLU (parsing, classification, sequence la- Copyright © 2014 ISCA 135 14-18 September 2014, Singapore belling), the domain-ontology concepts and relations are always pipeline. Parsing is traditionally tightly connected to rewrit- directly predicted from the ASR word transcriptions, sometimes ing grammars, usually context free grammars, used together with features coming from a linguistic analysis based on generic with a disambiguation model. Many current state-of-the-art text syntactic or semantic models. For open-domain SLU, it is nec- parsers are built on this model, such as [10]. Shallow syntac- essary to choose an abstract level of representation that can be tic processes, including part-of-speech and syntactic chunk tag- applied to a large range of domains and applications, therefore ging, are usually performed in the first stage. This traditional syntactic and semantic models developed in the Natural Lan- view of parsing based on context-free grammars is not suit- guage Processing community for processing text input are good able for processing non-canonical text such as automatic speech candidates. transcripts: due to ungrammatical structures in this kind of text, As presented in the introduction, we choose a FrameNet ap- writing a generative grammar and annotating transcripts with proach to semantic in this paper. FrameNet parsing is tradition- that grammar is difficult. naly decomposed into the following subtasks (whether applied New approaches to parsing based on dependency structures sequentially or not): and discriminative machine learning techniques [11] are much easier to adapt to non-canonical text for two main reasons: they trigger identification: find the words that express frames. need less training data and the annotation with syntactic depen- • For instance in ”she declared to her friend that she was dencies of spoken transcripts is simpler than with syntactic con- going out”. The target word ”declared” is identified. stituents. Other advantages are the fact that partial annotation trigger classification: assign the relevant frame in con- can be performed [2] and the parses generated are much closer • text (assign the frame STATEMENT to the trigger ”de- to meaning than constituent trees, which eases semantic inter- clared”) pretation. role filler identification: find/segment the expressions For the second issue of ASR errors and syntactic parsing, • that may fill a frame role (”she”, ”to her friend” and ”that most of the work have addressed this problem from a different she was going out” should be identified as potential role point of view: using syntactic features during ASR to help re- fillers ducing Word Error Rate (WER). This can be done by directly integrating parsing and ASR language models [12] or keeping role filler classification: assign the roles to the role fillers • them as separate processes through a reranking approach using candidates (”she”, ”to her friend” and ”that she was go- both ASR and parsing features [3, 13]. The improvement in ing out” play respectively the Speaker, Addressee and ASR transcriptions obtained by adding syntactic features to the Message roles, defined for the frame STATEMENT models is often rather small, however the structure and the re- The last two subtasks are generally referred to as ” seman- lations between words obtained through parsing can be of great tic role labeling ” (SRL), though this term is more general and interest for the SLU processes, even without a significant de- includes SRL with other roles than that of FrameNet, in partic- crease of WER. ular PropBank roles. [5] presented the first study on role filler classification: they proposed a probabilistic classifier that, given 3. A corpus of call-centre conversations an English sentence, a lexical trigger within that sentence and 1 the (gold) corresponding frame, assigns FrameNet roles to syn-

Load more