Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
Parsing Natural Language Conversations using Contextual Cues
Shashank Srivastava, Carnegie Mellon University, [email protected]
Amos Azaria, Ariel University, [email protected]
Tom Mitchell, Carnegie Mellon University, [email protected]
Abstract

In this work, we focus on semantic parsing of natural language conversations. Most existing methods for semantic parsing are based on understanding the semantics of a single sentence at a time. However, understanding conversations also requires an understanding of conversational context and discourse structure across sentences. We formulate semantic parsing of conversations as a structured prediction task, incorporating structural features that model the ‘flow of discourse’ across sequences of utterances. We create a dataset for semantic parsing of conversations, consisting of 113 real-life sequences of interactions of human users with an automated email assistant. The data contains 4759 natural language statements paired with annotated logical forms. Our approach yields significant gains in performance over traditional semantic parsing.

[U1] User: Compose an email
[A1] Agent: Composing new email. Its fields are: recipient, subject and body
[U2] User: Use mom’s email
[A2] Agent: I set the recipient field to ‘[email protected]’
[U3] User: Set the subject to hello, please
[A3] Agent: I set the subject field to ‘hello, please’
[U4] User: Set the subject to hello, please
[A4] Agent: I set the subject field to ‘hello’
[U5] User: Set the body to I’ll be late
[A5] Agent: I set the body field to ‘I’ll be late’
[U6] User: Send the email
[A6] Agent: I sent the email
...

Figure 1: Example of a real-world interaction between a human (User) and an automated email assistant (Agent).

1 Introduction

The ability to interact with computers using natural language is a fundamental challenge towards building intelligent cognitive agents. The problem of language understanding is usually approached using semantic parsing, a growing area within the field of natural language processing (NLP). Semantic parsing is the conversion of natural language utterances to formal semantic representations, called logical forms, that machines can execute. For example, a natural language sentence like ‘Set the subject of the mail as hello’ may be mapped to a logical form such as setFieldValue(subject, stringVal(‘hello’)), which can then be executed to yield the desired behavior or output. Similarly, in the domain of mathematics, ‘What is the product of three and five?’ may be mapped to a logical form such as multiply(3, 5).¹ Semantic parsing has previously been explored in an eclectic variety of settings such as querying databases [Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Berant et al., 2013], robot navigation [Kate et al., 2005] and spreadsheet manipulation [Gulwani and Marron, 2014].

¹Predicates used in logical forms (such as stringVal and multiply) come from domain-specific meaning representations.
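For concreteness, the snippet below sketches what executing such logical forms could look like in code. This is purely illustrative and not the paper's system: the Email class and its field dictionary are hypothetical, while setFieldValue, stringVal, and multiply mirror the predicate names used in the examples above.

```python
# Illustrative sketch of executable logical forms (not the paper's system).
# The Email class is hypothetical; stringVal, multiply, and setFieldValue
# mirror the predicate names in the examples above.

def stringVal(s):
    """Wrap a raw string as a string-valued term."""
    return s

def multiply(x, y):
    return x * y

class Email:
    def __init__(self):
        # Fields of the email being composed (named by strings for simplicity).
        self.fields = {"recipient": None, "subject": None, "body": None}

    def setFieldValue(self, field, value):
        self.fields[field] = value

# 'Set the subject of the mail as hello'
email = Email()
email.setFieldValue("subject", stringVal("hello"))

# 'What is the product of three and five?'
print(multiply(3, 5))  # 15
```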
However, most methods for semantic parsing have focused on parsing single natural language sentences in isolation. While this is useful for environments consisting of one-shot interactions of users with a system (e.g., running QA queries on a database), many settings of human-computer interaction require extended interactions between a user and an automated assistant (e.g., making a flight booking). This makes the one-shot parsing model inadequate for many scenarios.

Figure 1 shows a snippet of a conversation between a human user and a digital email assistant. We note that pragmatics and conversational context offer essential cues for understanding several individual utterances from the user. In particular, observe that utterance U2 (‘Use mom’s email’) cannot be correctly parsed based on its content alone, but requires an understanding of the discourse. Based on the previous statement (composing a new email), setting the recipient field is the user’s likely intent. Similarly, utterances U3 and U4 show an example of a repetition, where the agent first misinterprets a statement (U3), and then correctly parses it (U4). While parsing U4, the agent needs to implicitly understand that it should interpret the current utterance to a different logical form than before (even though the textual content is identical). This would not be possible in the one-shot parsing setting, which cannot incorporate such implicit feedback. Instead, correctly interpreting the sentence requires modeling the discourse structure of the conversation.
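To illustrate the kind of implicit feedback at play in U3/U4: a context-aware parser could treat an immediately repeated utterance as a signal that its previous interpretation was rejected, and demote that logical form when re-parsing. The sketch below is our own illustration of this intuition, not the paper's model; candidate_parses stands in for an assumed base semantic parser.

```python
# Illustrative sketch of the 'implicit feedback' intuition behind U3/U4
# (not the paper's algorithm). candidate_parses stands in for an assumed
# base semantic parser returning scored logical forms.

def parse_in_context(utterance, history, candidate_parses):
    # candidate_parses(utterance) -> list of (logical_form, score) pairs
    candidates = candidate_parses(utterance)
    if history and history[-1]["utterance"] == utterance:
        # The user repeated themselves verbatim, so the interpretation the
        # agent just produced was likely wrong; demote it on the re-parse.
        rejected = history[-1]["logical_form"]
        filtered = [(lf, s) for (lf, s) in candidates if lf != rejected]
        candidates = filtered or candidates  # keep something parseable
    return max(candidates, key=lambda c: c[1])[0]
```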
We address the problem of semantic parsing of natural language conversations. The underlying thesis is that modeling discourse structure and conversational context should assist interpretation of language [Van Dijk, 1980]. Here, we focus on incorporating structural cues for modeling these discourse structures and context. However, we do not address issues such as anaphora and discourse referents [Kamp and Reyle, 1993] that are prevalent in conversations, but lie beyond the scope of the current work. Our main contributions are:

• We address semantic parsing in the context of conversations, and provide an annotated dataset of conversations comprising 4759 natural language statements with their associated logical forms.
• We formulate the problem as a structured prediction task, and introduce a latent variable model that incorporates both text-based cues within sentences and structural inferences across sentences. Using this model, we empirically demonstrate significant improvements in parsing conversations over the state of the art.
• We also present effective heuristic strategies for expanding the search space of allowable meaning representations to improve semantic parsing of conversations.
• We show that latent categories learned by the model are semantically meaningful, and can be interpreted in terms of discourse states in the conversations.

2 Related Work

Supervised semantic parsing has been studied in a wide range of settings [Zettlemoyer and Collins, 2005; Wong and Mooney, 2007; Kwiatkowski et al., 2010]. Recent approaches have focused on various strategies for using weaker forms of supervision [Clarke et al., 2010; Krishnamurthy and Mitchell, 2012; Berant et al., 2013] and rapid prototyping of semantic parsers for new domains [Wang et al., 2015; Pasupat and Liang, 2015]. Other works have explored semantic parsing in grounded contexts, using perceptual context to assist semantic parsing [Matuszek et al., 2012; Krishnamurthy and Kollar, 2013]. However, none of these approaches incorporates conversational context to jointly interpret a conversational sequence. For instance, poor interpretations made while parsing at one point in a conversation cannot be re-evaluated in light of more incoming information. A notable work that incorporates conversational data in relation to semantic parsing is [Artzi and Zettlemoyer, 2011]. However, rather than incorporating contextual cues, their goal is very different: using rephrasings in conversation logs as weak supervision for inducing a semantic parser. Closer to our work, [Zettlemoyer and Collins, 2009] also learn context-sensitive interpretations of sentences using a two-step model. However, their formulation is specific to CCG grammars and focuses on modeling discourse referents.

The role of context in assigning meaning to language has been emphasized from abstract perspectives in computational semantics [Bates, 1976; Van Dijk, 1980], as well as in systems for task-specific applications [Larsson and Traum, 2000]. Examples of the former include analyzing language from the perspectives of speech acts [Searle, 1969] and semantic scripts [Schank and Abelson, 1977; Chambers and Jurafsky, 2008; Pichotta and Mooney, 2015]. These works induce typical trajectories of event sequences from unlabeled text to infer what might happen next. On the other hand, a notable application area that has explored conversational context within highly specific settings is state tracking in dialog systems. Here, the focus is on inferring the state of a conversation given all previous dialog history [Higashinaka et al., 2003; Williams et al., 2013] in the context of specific task trajectories, rather than on interpreting the semantic meanings of individual utterances.

In terms of approach, while our formulation is largely agnostic to the choice of semantic parsing framework, our method in this work is based on CCG semantic parsing, a popular semantic parsing approach [Zettlemoyer and Collins, 2007; Kwiatkowski et al., 2013; Artzi et al., 2015]. The CCG grammar formalism [Steedman and Baldridge, 2011] has been widely used for its explicit pairing of syntax with semantics, and allows the expression of long-range dependencies extending beyond context-free grammars. We also allow our semantic parser to output logical forms that may not be entailed from an utterance by a strict grammar formalism, expanding on similar ideas in [Wang et al., 2015; Goldwasser and Roth, 2014]. Finally, some of the heuristics proposed in this paper for improving parsing performance are motivated by previous work on paraphrasing for semantic parsing [Berant and Liang, 2014].

3 Semantic Parsing with Conversational Context

We present an approach for semantic parsing of conversations by posing conversations as sequences of utterances, to model the ‘flow of discourse’.
We consider the problem as a structured prediction task, where we jointly learn preferences for collective assignments of logical forms for sentences in a sequence.

Figure 2: Model diagram for semantic parsing of conversational sequences. Traditional semantic parsing features depend on utterances $s_t$ and associated logical forms $l_t$ only. Our model additionally allows structured features that can depend on previous logical forms $l_{t-1}$, latent variables $z_t$ representing the discourse state of the conversation at any step, and the previous utterances $s_1 \ldots s_t$.

Let $s$ denote a conversation sequence of $T$ utterances by a user, with individual utterances denoted as $\{s_1 \ldots s_T\}$. Let $l := \{l_1 \ldots l_T\}$ be the intended logical forms for the corresponding utterances. We assume a supervised learning setting where we have labeled training sequences $\mathcal{T} := \ldots$
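To make the formulation concrete, inference in such a model searches over joint assignments of logical forms $l_{1:T}$ and latent discourse states $z_{1:T}$ that maximize a learned linear score over per-step features. The following beam-search sketch assumes hypothetical interfaces (a feature function phi, a per-utterance candidate generator candidates, a set of discourse states, and a weight vector w) and is not the paper's actual implementation:

```python
import itertools

# Sketch of joint decoding over logical forms l_t and latent discourse
# states z_t for a conversation s_1..s_T.  phi(), candidates(), states,
# and the weight dict w are assumed interfaces, not the paper's system.

def decode(utterances, candidates, phi, w, states, beam_size=10):
    # Each hypothesis: (score, [(l_1, z_1), ..., (l_t, z_t)])
    beam = [(0.0, [])]
    for t, s_t in enumerate(utterances):
        expanded = []
        for score, hist in beam:
            l_prev = hist[-1][0] if hist else None
            for l_t, z_t in itertools.product(candidates(s_t), states):
                # Features may look at the utterances so far, the current
                # logical form, the previous logical form, and the latent
                # discourse state, mirroring the dependencies in Figure 2.
                feats = phi(utterances[: t + 1], l_t, l_prev, z_t)
                step = sum(w.get(f, 0.0) * v for f, v in feats.items())
                expanded.append((score + step, hist + [(l_t, z_t)]))
        beam = sorted(expanded, key=lambda h: -h[0])[:beam_size]
    return beam[0][1]  # highest-scoring joint assignment
```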