Topic Aware Neural Response Generation

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Topic Aware Neural Response Generation Chen Xing,1,2∗ Wei Wu,4 Yu Wu, 3 Jie Liu,1,2 † Yalou Huang,1,2 Ming Zhou,4 Wei-Ying Ma4 1College of Computer and Control Engineering, Nankai University, Tianjin, China 2College of Software, Nankai University, Tianjin, China 3State Key Lab of Software Development Environment, Beihang University, Beijing, China 4Microsoft Research, Beijing, China {v-chxing, wuwei, v-wuyu, mingzhou, wyma}@microsoft.com {jliu,huangyl}@nankai.edu.cn Abstract and Li 2015; Sordoni et al. 2015a) from large scale social conversation data. Recently, neural network based meth- We consider incorporating topic information into a sequence- ods have become mainstream because of their capability to to-sequence framework to generate informative and inter- capture semantic and syntactic relations between messages esting responses for chatbots. To this end, we propose a topic aware sequence-to-sequence (TA-Seq2Seq) model. The and responses in a scalable and end-to-end way. Sequence- model utilizes topics to simulate prior human knowledge that to-sequence (Seq2Seq) with attention (Bahdanau, Cho, and guides them to form informative and interesting responses in Bengio 2014; Cho, Courville, and Bengio 2015) represents conversation, and leverages topic information in generation a state-of-the-art neural network model for response genera- by a joint attention mechanism and a biased generation prob- tion. To engage people in conversation, the response gener- ability. The joint attention mechanism summarizes the hidden ation algorithm in a chatbot should generate responses that vectors of an input message as context vectors by message at- are not only natural and fluent, but also informative and in- tention and synthesizes topic vectors by topic attention from teresting. MT models such as Seq2Seq with attention, how- the topic words of the message obtained from a pre-trained ever, tend to generate trivial responses like “me too”, “I see”, LDA model, with these vectors jointly affecting the genera- or “I don’t know” (Li et al. 2015) due to the high frequency tion of words in decoding. To increase the possibility of topic words appearing in responses, the model modifies the gen- of these patterns in data. Although these responses are safe eration probability of topic words by adding an extra proba- for replying to many messages, they are boring and carry bility item to bias the overall distribution. Empirical studies little information. Such responses may quickly lead the con- on both automatic evaluation metrics and human annotations versation between human and machine to an end, severely show that TA-Seq2Seq can generate more informative and in- hurting the user experience of a chatbot. teresting responses, significantly outperforming state-of-the- art response generation models. In this paper, we study the problem of response generation for chatbots. Particularly, we target the generation of informative and interesting responses that can help chatbots Introduction engage their users. Unlike Li et al. (Li et al. 2015) who try to passively avoid generating trivial responses by penaliz- Human-computer conversation is a challenging task in AI ing their generation probabilities, we consider solving the and NLP. Existing conversation systems include task ori- problem by actively bringing content into responses by top- ented dialog systems (Young et al. 2013) and non task ori- ics. Given an input message, we predict possible topics that ented chatbots. Dialog systems aim to help people complete can be talked about in responses, and generate responses for specific tasks such as ordering and tutoring, while chatbots the topics. The idea is inspired by our observation on con- are designed for realizing natural and human-like conversa- versations between humans. In human-human conversation, tion with people regarding a wide range of issues in open do- people often associate an input message with topically re- mains (Perez-Marin 2011). Although previous research fo- lated concepts in their mind. Based on the concepts, they cused on dialog systems, recently, with the large amount of organize content and select words for their responses. For conversation data available on the Internet, chatbots are be- example, to reply to “my skin is so dry”, people may think coming a major focus of both academia and industry. it is a “skin” problem and can be alleviated by “hydrating” A common approach to building the conversation engine and “moisturizing”. Based on this knowledge, they may give in a chatbot is learning a response generation model within more informative responses like “then hydrate and moistur- a machine translation (MT) framework (Ritter, Cherry, and ize your skin” rather than trivial responses like “me too”. Dolan 2011; Sutskever, Vinyals, and Le 2014; Shang, Lu, The informative responses could let other people follow the ∗The work was done when the first author was an intern in Mi- topics and continue talking about skin care. “Skin”, “hy- crosoft Research Asia. drate”, and “moisturize” are topical concepts related to the †He is the corresponding author. message. They represent people’s prior knowledge in con- Copyright c 2017, Association for the Advancement of Artificial versation. In responding, people will bring content that is Intelligence (www.aaai.org). All rights reserved. relevant to the concepts to their responses and even directly 3351 use the concepts as building blocks to form their responses. and then the decoder estimates the generation probability of We consider simulating the way people respond to mes- Y with c as input. The objective function of Seq2Seq can be sages with topics, and propose a topic aware sequence-to- written as sequence (TA-Seq2Seq) model in order to leverage topic in- T formation as prior knowledge in response generation. TA- p(y1, ..., yT |x1, ..., xT )=p(y1|c) p(yt|c,y1, ..., yt−1). Seq2Seq is built on the sequence-to-sequence framework. t=2 In encoding, the model represents an input message as hid- c den vectors by a message encoder, and acquires embeddings The encoder RNN calculates the context vector by of the topic words of the message from a pre-trained Twit- ht = f(xt, ht−1); c = hT , ter LDA model. The topic words are used as a simulation h t f of topical concepts in people’s minds, and obtained from a where t is the hidden state at time and is a non-linear Twitter LDA model which is pre-trained using large scale transformation which can be either a long-short term mem- social media data outside the conversation data. In decod- ory unit (LSTM) (Hochreiter and Schmidhuber 1997) or a gated recurrent unit (GRU) (Cho et al. 2014). In this work, ing, each word is generated according to both the message f and the topics through a joint attention mechanism. In joint we implement using GRU which is parameterized as z z attention, hidden vectors of the message are summarized as z = σ(W xt + U ht−1) r r context vectors by message attention which follows the ex- r = σ(W xt + U ht−1) isting attention techniques, and embeddings of topic words s tanh Wsx Us h ◦ r (1) are synthesized as topic vectors by topic attention. Differ- = ( t + ( t−1 )) ent from existing attention, in topic attention, the weights of ht =(1− z) ◦ s + z ◦ ht−1 the topic words are calculated by taking the final state of the The decoder is a standard RNN language model except when message as an extra input in order to strengthen the effect of conditioned on the context vector c. The probability distri- the topic words relevant to the message. The joint attention bution pt of candidate words at every time t is calculated lets the context vectors and the topic vectors jointly affect as response generation, and makes words in responses not only relevant to the input message, but also relevant to the corre- st = f(yt−1, st−1, c); pt = softmax(st,yt−1) lated topic information of the message. To model the behav- where st is the hidden state of the decoder RNN at time t ior of people using topical concepts as “building blocks” of and yt−1 is the word at time t − 1 in the response sequence. their responses, we modify the generation probability of a topic word by adding another probability item which biases Attention mechanism the overall distribution and further increases the possibility The traditional Seq2Seq model assumes that every word is of the topic word appearing in the response. generated from the same context vector. In practice, how- We conduct an empirical study on large scale data crawled ever, different words in Y could be semantically related to from Baidu Tieba, and compare different methods with both different parts of X. To tackle this issue, attention mecha- automatic evaluation and human judgment. The results on nism (Bahdanau, Cho, and Bengio 2014) is introduced into both automatic evaluation metrics and human annotations Seq2Seq. In Seq2Seq with attention, each yi in Y corre- show that TA-Seq2Seq can generate more informative, di- sponds to a context vector ci, and ci is a weighted average verse, and topic relevant responses and significantly outper- T of all hidden states {ht}t=1 of the encoder. Formally, ci is forms state-of-the-art methods for response generation. defined as The contributions of this paper include 1) a proposal for T ci =Σj=1αijhj, (2) using topics as prior knowledge for response generation; 2) α a proposal for a TA-Seq2Seq model that naturally incorpo- where ij is given by rates topic information into the encoder-decoder structure; exp eij) α ( e η s , h 3) empirical verification of the effectiveness of TA-Seq2Seq.

Topic Aware Neural Response Generation

WELDA: Enhancing Topic Models by Incorporating Local Word Context

Generative Topic Embedding: a Continuous Representation of Documents

Linked Data Triples Enhance Document Relevance Classification

Document and Topic Models: Plsa

Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model

Topic Modeling: Beyond Bag-Of-Words

Semantic N-Gram Topic Modeling

Modeling Topical Relevance for Multi-Turn Dialogue Generation

Text Analysis of QC/Issue Tracker by Natural Language Processing (NLP)

Topic Modeling Approaches for Understanding COVID-19

Distributional Approaches to Semantic Analysis

Topic Modeling for Natural Language Understanding