Clustering-based Summarization of Transactional Logs

Luxun Xu, Vagelis Hristidis, Nhat X.T. Le
Computer Science & Engineering, University of California, Riverside
Email: [email protected], [email protected], [email protected]

Abstract—Transactional chatbots have become popular today, as they can automate repetitive transactions such as making an appointment or buying a ticket. As users interact with a chatbot, rich chat logs are generated that can be used to evaluate and improve the effectiveness of the chatbot, measured as the ratio of chats that lead to a successful state such as buying a ticket. A fundamental operation to achieve such analyses is the clustering of the chats in the chat log, which requires effective distance functions between a pair of chat sessions. In this paper, we propose and compare various distance measures for individual messages as well as whole sessions. We evaluate these measures using user studies on Mechanical Turk, where we first ask users to use our chatbots, and then ask them to judge the similarity of messages and sessions. Finally, we provide anecdotal results showing that our distance functions are effective in clustering messages and sessions.

I. INTRODUCTION

In recent years, chatbots (or conversational bots) have gained popularity thanks to advances in deep learning and natural language understanding. There has been much work – especially recently, using deep learning techniques – on how to train chatbots using a training set of conversations (see Section VI). Most of these systems' goal is to generate a bot message that is meaningful given the chat history. These systems are effective and appropriate for several applications such as providing basic information (e.g., a store's opening hours), or never-ending chitchat conversations.

Recently, there is also an increased interest in transactional chatbots, which aim to complete a specific transaction instead of just keeping a reasonable conversation going. Typical transactions include making a medical transportation reservation, scheduling a meeting, or connecting a patient to a suitable care provider. Several real-world examples of such chatbots are Kayak, Dom (from Domino's) and Babylon Health (symptoms checker).

Note that open-ended informational bots, such as IBM Watson, are outside the scope of this paper. Such bots typically input a large knowledge base or even text documents, and offer question-answering capabilities, e.g., "what disease is associated with finger redness?" In contrast, we focus on multi-step transactional scenarios, where a user is trying to achieve a goal, such as booking a visit to a doctor office or making a food delivery order.

A limitation of current transactional chatbots is that there are no effective methods to utilize the chat logs to continuously improve future users' experience. Moreover, such bots generally do not support easy branching capabilities.

Fig. 1: Workflow of a transactional chatbot

We consider a comprehensive framework of transactional chatbots, as illustrated in Figure 1. In this framework, an administrator designs and deploys the chatbot. Once a fair amount of users have used the chatbot, a log analysis engine will analyze their chat logs to provide feedback to the administrator. Based on the feedback, the administrator can refine the chatbot logic to serve users better. In this paper, we focus on the log analysis engine, which summarizes the chat logs into clusters of similar chat sessions, in order to help the administrator improve the chatbots. For instance, a cluster of similar erroneous user responses to the bot question about the delivery address helps the administrator re-formulate the bot question into a clearer form.

The log analysis engine clusters chat logs at two levels: user responses and whole chat sessions. To achieve good clustering performance we propose novel distance functions, customized to chat log data. Our distance functions for user responses utilize a combination of textual, Information Extraction, sentiment, and conversational features, well-tuned by the Information Theoretic Metric Learning technique. The chat session distance functions use the aforementioned response distance function as a module and employ sequence alignment techniques to consider the conversation order.

We conduct user studies on Amazon Mechanical Turk (MTurk), where we first ask workers to use our chatbots to collect chat logs, and then ask them to compare pairs of responses or sessions with respect to their distance. We compare our distance functions against several baselines. For responses, our distance function achieves 0.82 overall accuracy over the baseline's 0.73. For sessions, our similarity function also achieves an overall accuracy of 0.85, higher than other measures.

Fig. 2: Information Extraction engine

This paper has the following contributions:
• We propose a novel distance function for chat responses to a bot question using textual, Information Extraction (IE), sentiment, and user interaction history (response time and attempts) features.
• We extend the response distance function to work for whole chat sessions.
• We present experiments on the analysis of chat logs we collected through MTurk, which show that our distance functions outperform baselines.
• We publish the chat logs that we collected through MTurk, as well as the pairwise labels of responses and chat sessions.¹

The rest of the paper is structured as follows. In Section II, we discuss the architecture of the chatbot platform and the problem definition. We propose our distance functions for user responses and whole sessions in Sections III and IV respectively. In Section V, we present our experimental results, followed by related work in Section VI and conclusions in Section VII.

Fig. 3: Food ordering bot flowchart

II. CHATBOT PLATFORM AND PROBLEM DEFINITION

A. Transactional Chatbot Abstraction

A transactional chatbot consists of (a) a logical flow description that defines the flow of the chatbot, and (b) an Information Extraction (IE) engine that extracts entities from the user input at each step. The extracted entities and entity values (as shown in Figure 2) are collected and subsequently determine the flow of the bot. For example, in Figure 3, the IE engine tries to extract the entity "location" from the user's response to the first (root) chatbot message. If such an entity value is successfully extracted, the bot moves on to the next message. Otherwise, it repeats the same message until the expected entities and values are extracted. Moreover, the extracted entity values can also be used as branching conditions. For example, in Figure 5, depending on the user's response to Q8, the bot can develop through different branches based on the entity and value extracted.

Fig. 4: Hobby bot flowchart

Note that slot-filling bots, like Amazon Lex, where the bot consists of a list of questions to fill slots – e.g., "what type of flowers?" or "when will you pick them up?" – can also be viewed as an abstraction of transactional chatbots, where a sequence of questions (called a "prompt") defines the bot logic, similar to the one in Figure 3, and the "slot type" defines the entity.

B. Chat Log Clustering Problem

To effectively cluster chat logs, we need an effective distance function, which is the focus of this paper. We divide this into two steps: first we compute the distance between two individual responses, and then use this function to compute the distance between two chat sessions.

Compute the distance between two user responses: A user response is a user message in response to a bot message. We view two responses as similar if they can be treated in the same way by the administrator for the purpose of improving the chatbot. Take a bot message "What is your date of birth?" as an example. User responses can be considered similar in the following cases (not an exhaustive list):

• If they are both "valid", that is, if the IE engine correctly processes one (such as successfully extracting an entity or entity values), it will likely process the other correctly. For example, "10/12/1983" and "1/12/2001" are both "valid" responses (so that the admin does not need to do anything about them).
• If they are both "invalid" in the same fashion. For example, (i) both responses are completely irrelevant to the bot message, e.g., "I have a headache". (ii) Both responses may be incorrectly parsed by the IE engine due to its own limitation. For instance, the response "05231990" resembles some characteristics of a birth date but may be extracted as a phone number by the IE engine. In this case, the admin will have to train a more robust IE engine. (iii) Both responses reveal similar confusion by the user due to a possibly imperfect design of the bot. For example, "Jan 02" and "March 04" both miss the "year". In this case, the admin can split the bot message into simpler questions, or modify it to be more specific.

Compute the distance between two chat sessions: We define a session as a sequence of bot messages and user messages. Similar to user response distance, we define two sessions as similar if they can be treated together by the admin. Intuitively, two sessions are similar if they have similar sequences of bot messages and user messages.

¹Data available at https://github.com/LuxunXu/ChatbotDataset

III. USER RESPONSE DISTANCE FUNCTION

Our distance function is based on a combination of carefully chosen features, and the Information Theoretic Metric Learning technique [1] to automatically learn appropriate weights and dependencies among these features.

A. Response Features

We model user responses by vectors of features (Table I) and use their Euclidean distance as the response distance function. Statistical features capture basic structural information of the response. Information extraction (IE) features capture how confidently or successfully the IE engine extracts the entity. Conversational features embed the user's effectiveness and sentiment in responding to the bot; we later show that they are useful in session clustering. For example, the "attempt" feature indicates the number of attempts of the current response to the bot message due to repetition (when the bot is unable to understand the response). A high attempt number indicates poor response quality and implies that the bot message is poorly formulated. Lastly, we include a feature to capture semantic information of a response. We use fastText's pre-trained English vectors [2] to construct a 300-dimensional vector for each response.

Fig. 5: Branching bot flowchart

Feature Type            Feature                Value
Statistical Feature     No. of Words/Tokens    3
                        No. of Digits          6
                        No. of Letters         3
                        Total Length           12
IE Feature (Confidence) Food                   0
                        Location               0
                        Datetime               0.965
                        Restaurant             0
Conversational Feature  Time to Respond (sec)  10
                        Attempt                1
                        Sentiment level        0.0
Word Embedding Feature  fastText Vector        [1.1321, -0.2313, ..., 0.9494]

TABLE I: Features and raw values for the sample response "May 29, 1992" in the food ordering bot shown in Figure 3

B. Mahalanobis Distance and Metric Learning

Euclidean distance is often not a good metric in high dimensions because it does not consider correlations between variables. This issue can be addressed by using the Mahalanobis distance. Given a set of n points x_1, ..., x_n in R^d, the (squared) Mahalanobis distance is given in Equation 1 [1]:

d_A(x_i, x_j) = (x_i − x_j)^T A (x_i − x_j)    (1)

Practically, the matrix A re-weights the features. By choosing a particular matrix A, we can boost or reduce certain features' importance in the clustering.

We use the Information Theoretic Metric Learning (ITML) algorithm [1] to learn a matrix A that reflects the similarity opinions of experts (we use pairwise similarity preferences in our experiments). It is fast and scalable, thus applicable to huge datasets. ITML assumes prior (expert) knowledge regarding inter-point distances, which serves as constraints in learning the matrix A; such constraints correspond to the aforementioned similarity opinions of experts. In practice, we input a d×n matrix (where d is the dimension of the response vector and n is the number of responses to a particular bot message) and similarity constraints; ITML learns the matrix A efficiently, and A is used in Equation 1 to calculate the Mahalanobis distance between two responses. Moreover, the input of ITML also requires a slack variable γ and an initial matrix A_0. We use γ = 1 and the identity matrix for A_0. We denote the distance between two responses as Dist(R_i, R_j).

IV. SESSION DISTANCE FUNCTION

We define a chat session as a sequence of conversational pairs consisting of a bot message Q and a user response R, as shown in Table II. Formally, we define Session A: A_1, A_2, ..., A_n, where A_i = (Q_i^A, R_i^A) and len(A) = n.

Pair  Speaker  Message
–     Bot      Hi, I can help you order food from popular restaurant chains.
A1    Bot      Please enter your address. (Q1)
      User     5 Buckingham ct, Rockledge, FL, 32955 (R1)
A2    Bot      Please enter the restaurant name you prefer (within 10 miles). (Q2)
      User     Chili's (R2)
A3    Bot      Chili's is set as your restaurant. Now please let me know what you want to order. (Q3)
      User     Honey Chipotle Chicken Crispers (R3)
A4    Bot      Honey Chipotle Chicken Crispers will be ordered. When do you want it to be delivered? (Q4)
      User     1:00 pm (R4)
–     Bot      Thanks! Your order will be delivered on 2019-01-11 13:00:00!

TABLE II: A session example from the food ordering bot

We will consider matchings between the conversational pairs of two sessions. A matching where crossing is not allowed (we call it ordered) is illustrated in Figure 6a, whereas a matching with crossing is illustrated in Figure 6b. When crossing is allowed (we call it unordered), the matching may produce a higher similarity between the two sessions (because more matchings between conversational pairs are possible), but the structure of the sessions (the flow of the chat) is not strictly considered. When crossing is allowed, we can use the Hungarian algorithm to find the matching that maximizes the similarity between two sessions.
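As an illustration of Equation 1, the following is a minimal sketch of the Mahalanobis response distance. In practice the matrix A would be learned by ITML from pairwise constraints; here a hand-picked diagonal A is used purely for illustration, and the toy feature vectors are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x_i, x_j, A):
    """Squared Mahalanobis distance d_A(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j)."""
    d = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(d @ A @ d)

# Two toy 4-dimensional response feature vectors
# (e.g., word count, digit count, IE confidence, attempts).
r1 = [3, 6, 0.965, 1]
r2 = [2, 8, 0.100, 3]

# With A = I, the Mahalanobis distance reduces to the squared Euclidean distance.
I = np.eye(4)
assert abs(mahalanobis_sq(r1, r2, I)
           - np.sum((np.array(r1) - np.array(r2)) ** 2)) < 1e-9

# A diagonal A re-weights features: here the IE-confidence feature dominates.
A = np.diag([0.1, 0.1, 10.0, 1.0])
print(mahalanobis_sq(r1, r2, A))
```

A full metric also depends on normalization of the feature ranges, which the learned A absorbs in the paper's setting.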

We first extend the distance between responses to the same bot message to a similarity between any two user responses, in Equation 2:

SimR(A_i, B_j) = (D_Max − Dist(R_i^A, R_j^B)) / D_Max   if Q_i^A = Q_j^B,
SimR(A_i, B_j) = 0                                       otherwise.        (2)

where D_Max is the maximum distance between two responses to the same bot message. Thus the similarity between two responses will be between 0 and 1. Moreover, the similarity between responses to different bot messages will be 0.

We define a matching M between session A and session B as a way to match every conversational pair in A to a correspondent in B. A conversational pair in session A can be matched to either a single pair (with the same bot message) in session B or null, and vice versa. Formally, we define M as follows: M(A_i) = B_j or null.

We define the similarity between two sessions A and B as the maximum similarity over all matchings between them:

Sim(A, B) = max_M Σ_{i=1..n} SimR(A_i, M(A_i))    (3)

where n is the length of A. The matching that maximizes the session similarity is called the optimal matching. Note that there may be pairs in B without matched pairs in A, but they do not change the above similarity formula (without loss of generality). Typically, sequence alignment penalizes unmatched pairs. Instead, we indirectly account for this by normalizing by the greater length of Session A and Session B, as shown in Equation 4:

SimNorm(A, B) = Sim(A, B) / max(len(A), len(B))    (4)

We consider two variants of matching, one where crossing is allowed and one where it is not. A matching is non-crossing if, given two matched pairs M(A_{i1}) = B_{j1} and M(A_{i2}) = B_{j2}, without loss of generality, if A_{i1} precedes A_{i2}, i.e., i1 < i2, then B_{j1} must precede B_{j2}, i.e., j1 < j2.

A. Sequence Alignment for Ordered Matching

We compute the similarity between two sessions using a sequence alignment technique [3] for ordered matching, because the alignment produced naturally corresponds to a matching without crossing. Dynamic programming is used with the recurrence relation defined in Equation 5:

SimP(A_i, B_j) = max { SimP(A_{i−1}, B_j),
                       SimP(A_i, B_{j−1}),
                       SimP(A_{i−1}, B_{j−1}) + SimR(A_i, B_j) }    (5)

where SimP(A_i, B_j) is the similarity between sessions A_1, ..., A_i and B_1, ..., B_j, that is, the prefixes of the sessions that end in A_i and B_j. Then Sim(A, B) = SimP(A_n, B_m), where n and m are the lengths of A and B respectively.

Figure 7 illustrates a typical process of finding the optimal alignment by dynamic programming. In this example, the normalized similarity is SimNorm(A, B) = 1.3/3 = 0.433. Moreover, the optimal matching can be obtained by performing standard back-tracing. The time complexity for finding the maximum similarity is O(mn), where m and n are the lengths of the two sessions respectively.

B. Other Similarity Functions

In this section, we propose two similarity functions that each consider only one of the two main properties of a session, the structure or the content. We include them in our experimental evaluation to study which of the two properties is more important in defining a distance function, and whether combining them is a good idea.

Structure-only: The structure-only similarity function only considers the structure of chat sessions, which is defined as the sequence of bot questions (on the bot flow diagram) triggered through the user's responses. In particular, we use Boolean values (shown in Equation 6) to indicate whether there is a match between two conversational pairs.
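The ordered (Equations 4-5) and unordered matchings can be sketched as follows. Here `toy_sim_r` is a hypothetical stand-in for the response similarity SimR of Equation 2, and the unordered variant brute-forces small sessions (the Hungarian algorithm computes the same optimum in polynomial time):

```python
import itertools

def sim_ordered(A, B, sim_r):
    """Equation 5: dynamic-programming sequence alignment (no crossing),
    normalized by the longer session as in Equation 4."""
    n, m = len(A), len(B)
    P = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            P[i][j] = max(P[i - 1][j], P[i][j - 1],
                          P[i - 1][j - 1] + sim_r(A[i - 1], B[j - 1]))
    return P[n][m] / max(n, m)

def sim_unordered(A, B, sim_r):
    """Crossing allowed: brute force over injective matchings.
    Assumes sim_r is symmetric; fine for small toy sessions."""
    if len(A) > len(B):
        A, B = B, A
    best = max(sum(sim_r(a, b) for a, b in zip(A, perm))
               for perm in itertools.permutations(B, len(A)))
    return best / max(len(A), len(B))

def toy_sim_r(a, b):
    """Stand-in for SimR: 0 across different bot questions (Equation 2),
    1 for identical responses, 0.5 otherwise."""
    if a[0] != b[0]:
        return 0.0
    return 1.0 if a[1] == b[1] else 0.5

# Each conversational pair is (bot_question_id, response_key).
A = [("Q1", "addr"), ("Q2", "chilis")]
B = [("Q1", "addr"), ("Q2", "subway"), ("Q3", "pizza")]
print(sim_ordered(A, B, toy_sim_r))    # (1.0 + 0.5) / 3 = 0.5

# With a swapped flow, the ordered matching can use only one of the two
# matches (they would cross), while the unordered matching uses both.
C = [("Q1", "x"), ("Q2", "y")]
D = [("Q2", "y"), ("Q1", "x")]
print(sim_ordered(C, D, toy_sim_r))    # 0.5
print(sim_unordered(C, D, toy_sim_r))  # 1.0
```

The second example shows why the unordered variant can score higher: it ignores the flow of the chat and keeps every cross-matched pair.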

SimR(A_i, B_j) = 1 if Q_i^A = Q_j^B, and 0 otherwise.    (6)

Document-based: We can also consider chat sessions as documents, where each document is a collection of all user responses from the chat session. We use a vector space model (with term frequency) to represent each document and use cosine similarity to measure the similarity between two sessions.

Fig. 6: (a) Left - Ordered matching, (b) Right - Unordered matching

Fig. 7: Dynamic programming calculation of optimal matching for the example of Figure 6a. Arrows illustrate the back-tracing process, where "-" indicates a matching to null, and "M" indicates a matching between A_i and B_j.

V. EXPERIMENTS

We conduct experiments to evaluate our response distance and session similarity functions. We first asked Mechanical Turk users to use our chatbots to collect chat logs, and then performed another experiment asking users to evaluate the similarity between pairs of responses or sessions.

We have built a chatbot platform, SmartBot360, which allows designing and deploying chatbots on Web pages, Facebook Messenger and SMS. It follows the chatbot abstraction described in Section II-A, where the logic of the chatbot is defined by a flow diagram and an IE engine (we use wit.ai [4] in our experiments) is used to extract entities and entity values. The flow diagram is executed by our Chatbot Execution Engine (Figure 1). Figure 8 shows the web interface of SmartBot360 for the food ordering bot in action.

Fig. 8: SmartBot360 web interface

A. Response Similarity

To evaluate the effectiveness of the response distance function, we collect human labels judging the similarity of two pairs of responses. If one pair of responses is more similar than the other, the distance between them should be smaller. We measure the accuracy with which the distance function agrees with the labels. Our baseline uses the fastText vector only. Other competitors include using non-fastText features and combining them. We also include comparisons for features with or without metric learning: without metric learning, Euclidean distance is used, whereas with metric learning, Mahalanobis distance is used. Furthermore, we find that non-fastText features tend to have wide ranges. Hence, we perform min-max normalization on non-fastText features as an extra pre-processing step.

We start by deploying the food-ordering bot (Figure 3) on the SmartBot360 platform and use Amazon Mechanical Turk (MTurk) to get people to use our chatbot. Next, we give MTurk workers a simple labeling task as illustrated in Figure 9. Specifically, each labeling task has two pairs of responses (to the same bot message), where the first response of each pair is fixed. MTurk workers need to label which pair of responses is more similar than the other, both syntactically and semantically. For example, "Pair ONE" is the desired label in Figure 9. We also give a third "neutral" option in case the two pairs are really close.

Fig. 9: Mechanical Turk task example

For each bot message, we randomly generated 300 pairs under the condition that the two pairs generated must differ by at least 20% according to our baseline. This condition minimizes pairs that are too close to label, because we want workers to choose a positive label instead of a neutral one. All 300 labeling tasks were each assigned to three different workers. Only the labeling tasks that received unanimous labels (either Pair ONE or TWO) are selected as valid labels, as shown in Table III.

We also collect similarity labels for pairs of responses for ITML using MTurk. We randomly generate 200 pairs of responses to each bot message and ask MTurk workers to label them as similar or dissimilar. Each labeling task is assigned to three workers and the majority vote becomes the label. In practice, this labeling can be done by experts such as the administrator.

As shown in Table III, our proposed distance function combining all features with metric learning yields the highest accuracy for all four bot messages. It is the only one that achieves over 80% overall accuracy. We also theorize that IE features play an important role in the distance function; that is, a higher quality IE engine tends to produce better results. This can be backed by the fact that the accuracy for Q1 and Q4 is higher than that of Q2 and Q3, because Q1 and Q4 use Wit.ai to extract entities and fine-grained values for IE features, whereas Q2 and Q3 use the nutritionix API [5], which is only able to extract Boolean values for IE features.

Features used                             Q1 acc.    Q2 acc.    Q3 acc.    Q4 acc.    Overall
                                          (n = 185)  (n = 136)  (n = 139)  (n = 156)  accuracy
fastText only (baseline)                  0.84       0.51       0.76       0.78       0.73
Non-fastText features only (normalized)   0.86       0.46       0.65       0.69       0.68
Non-fastText features only (normalized
  & metric learning)                      0.81       0.54       0.61       0.73       0.68
All features (with non-fastText
  features normalized)                    0.90       0.49       0.75       0.78       0.74
All features (with non-fastText features
  normalized & metric learning)           0.92       0.72       0.81       0.82       0.82

TABLE III: Accuracies of the agreement of various distance functions with the human opinion for the responses to four bot messages. n denotes the number of response pairs with valid labels.

B. Session Similarity

We use human evaluation to measure the accuracy with which human labels of session pairs agree with the calculated session similarity scores. We design two new chatbots to study session similarity, aiming to achieve high inter-coder agreement in the similarity-labeling tasks. We differentiate them in terms of structure and name them the "hobby" bot (Figure 4) and the "branching" bot (Figure 5). Note that in Figure 5, nodes B1, B2 and B3 use external APIs to generate the corresponding bot messages, and thus are not of interest in calculating session similarity.

Again we use MTurk to collect chat logs and similarity-labeling data. We show two pairs of sessions using a difference viewer, shown in Figures 10a and 10b, and only give two options (either Pair ONE is more similar or Pair TWO). Unfortunately, we were unable to get reliable labels from MTurk workers. We believe that this is due to the difficulty of the session labeling; moreover, MTurk users vary greatly in terms of understanding the set criteria. Instead, we asked 10 students from UC Riverside to label session similarities. In the instructions, we state that by similar we mean both syntactically and semantically, so that the worker must take both the flow of the chat and the content of the responses into account. Each similarity labeling task is assigned to three different students. Since we only allow two options for each label, the one with the majority vote becomes the final label. Finally, we compute the accuracy with which the labels of pairs of sessions agree with the calculated session similarity scores using different similarity functions. We break ties in similarity score comparisons arbitrarily.

The experimental results on the two bots are shown in Tables IV and V. For the hobby bot, our ordered-matching and unordered-matching functions perform the best. This is expected because the hobby bot is designed to be linear, where the flow of the bot is somewhat "fixed", so there is no crossing in any matching. Moreover, the structure-only function performs badly because it is unable to differentiate between sessions with the same flow but different contents. The document-based function is the worst since it omits the session flow completely.

For the branching bot, the document-based function performs better than it does on the hobby bot. We believe that this is due to the increased size of the sessions: since each session of the branching bot is larger, the structure of each session is less important compared to the content. This can be backed by the fact that the structure-only function performs the worst for the branching bot. Still, our ordered-matching approach outperforms the rest. This shows that our method is effective in capturing both structure and content characteristics in complex chat sessions.

We now show a case study of session similarity. In Figures 10a and 10b, the two sessions of each pair have the same structure, i.e., the sequence of the bot messages and user messages is the same, but Pair ONE is more similar because of the content of the responses. Therefore, the structure-only method is unable to tell the difference between the two session pairs, returning an equal similarity score of 1.0 for both. However, the two matching methods as well as the document-based method, which consider content, are able to capture the difference, and therefore agree with the label, as shown in Table VI.

C. Anecdotal Results from Clustering Responses and Sessions

We further show that our distance functions can be used to generate meaningful clusters for both responses and sessions. These clusters provide high-level ideas of how the bot performs.

For response clustering, we use the K-means clustering algorithm. The number of clusters is determined by the elbow method. In Table VII, we see that the majority of the users are able to provide valid addresses. Invalid responses are clustered in Cluster 1. Cluster 5 shows that many users only enter a state instead of an actual, detailed address. This information suggests to the administrator to fine-tune the question or to add certain checks so that the bot will not progress if users only enter a state.

For session clustering, we use the K-medoids clustering algorithm, specifically PAM [6], because we do not have a vector representation of sessions. Similarly to responses, we
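The agreement accuracies of Table III count how often a distance function ranks two labeled pairs of responses the same way a human did; a sketch of that check (the task tuples and the toy distance here are hypothetical, not the paper's learned distance):

```python
def agreement_accuracy(tasks, dist):
    """Each task: ((r1, r2), (r3, r4), label), where label is 1 if the human
    judged the first pair more similar (i.e., smaller distance), else 2."""
    correct = 0
    for (a, b), (c, d), label in tasks:
        predicted = 1 if dist(a, b) < dist(c, d) else 2  # ties broken arbitrarily
        correct += (predicted == label)
    return correct / len(tasks)

# Toy distance: absolute difference in response length.
toy_dist = lambda x, y: abs(len(x) - len(y))

tasks = [
    (("10/12/1983", "1/12/2001"), ("10/12/1983", "hi"), 1),  # first pair judged closer
    (("Jan 02", "March 04"), ("Jan 02", "Jan 03"), 2),       # second pair judged closer
]
print(agreement_accuracy(tasks, toy_dist))  # → 1.0
```

The same check applies unchanged to session pairs by swapping in a session similarity function (with the comparison direction flipped for similarities).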

Fig. 10a: Pair ONE: Session 3713 and Session 3179

Fig. 10b: Pair TWO: Session 3713 and Session 3563

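The document-based scores for such session pairs come from treating each session as one bag of user-response terms (Section IV-B); a minimal stdlib-only sketch with hypothetical toy sessions:

```python
from collections import Counter
from math import sqrt

def cosine_tf(responses_a, responses_b):
    """Cosine similarity of term-frequency vectors built from all user
    responses of each session (the document-based view)."""
    ta = Counter(w for r in responses_a for w in r.lower().split())
    tb = Counter(w for r in responses_b for w in r.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    na = sqrt(sum(v * v for v in ta.values()))
    nb = sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

s1 = ["5 buckingham ct", "chili's", "1:00 pm"]
s2 = ["5 buckingham ct", "subway", "1:00 pm"]
print(round(cosine_tf(s1, s2), 3))  # → 0.833
```

Because the bag-of-words view discards the order of the conversation, two sessions with identical responses in a different order score 1.0, which is exactly the weakness the matching-based functions address.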
                    Correct  Wrong  Accuracy
Ordered-matching      94       6      0.94
Structure-only        70      30      0.70
Unordered-matching    94       6      0.94
Document-based        56      44      0.56

TABLE IV: Hobby Bot Experiment Result

                    Pair ONE  Pair TWO  Agrees w/ label
Ordered-matching      0.470     0.735        Yes
Structure-only        1.0       1.0          No
Unordered-matching    0.470     0.735        Yes
Document-based        0.073     0.304        Yes

TABLE VI: Session similarity case study
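Section V-C's response clustering uses K-means with the elbow method; a self-contained sketch (hand-rolled Lloyd's algorithm on toy 2-D "feature vectors", with the elbow picked as the k after which inertia stops dropping sharply — one simple heuristic among several):

```python
import numpy as np

def kmeans_inertia(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the inertia (sum of squared
    distances of points to their closest centroid)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return float(((X - centers[labels]) ** 2).sum())

# Toy data: two tight groups, so the elbow should be at k = 2.
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5, 0.1, (20, 2))])
inertias = {k: kmeans_inertia(X, k) for k in range(1, 6)}

# Elbow heuristic: the k with the largest one-step drop in inertia.
elbow = max(range(2, 5), key=lambda k: inertias[k - 1] - inertias[k])
print(elbow)  # → 2
```

In the paper's setting X would hold the (metric-learned) response feature vectors for one bot message rather than this synthetic data.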

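Session clustering, in contrast, works directly on a precomputed distance matrix, since sessions have no vector representation; a minimal PAM-style K-medoids sketch (hand-rolled for illustration, with a hypothetical toy distance matrix):

```python
import numpy as np

def k_medoids(D, k, iters=20):
    """PAM-style K-medoids on a precomputed distance matrix D (n x n):
    alternately assign points to the nearest medoid, then pick within each
    cluster the member minimizing total distance to the other members."""
    medoids = list(range(k))  # deterministic init for illustration
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = []
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                new_medoids.append(medoids[c])
                continue
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(costs)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Toy distance matrix over 6 "sessions": items 0-2 close, items 3-5 close.
D = np.array([[0, 1, 1, 9, 9, 9],
              [1, 0, 1, 9, 9, 9],
              [1, 1, 0, 9, 9, 9],
              [9, 9, 9, 0, 1, 1],
              [9, 9, 9, 1, 0, 1],
              [9, 9, 9, 1, 1, 0]], dtype=float)
medoids, labels = k_medoids(D, 2)
print(sorted(medoids), labels)
```

Each returned medoid is an actual session, which is what makes the cluster representatives of Table VIII directly readable by the administrator.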
use the elbow method to get the optimal number of clusters. In Table VIII, Log 3163 and Log 3461 are two cluster medoids. An intuitive interpretation of this clustering is that Log 3163 consists of only one round of user response, while Log 3461 consists of two rounds.

                    Correct  Wrong  Accuracy
Ordered-matching      75      25      0.75
Structure-only        58      42      0.58
Unordered-matching    66      34      0.66
Document-based        71      29      0.71

TABLE V: Branching Bot Experiment Result

Cluster  Sample Responses                                          Explanation
1        [hi, 02546, 13060632, what time is now?, ...]             Invalid
2        [1250 rodeo drive, los angeles, 3653 losh lane, ...]      Valid address 1
3        [104 market st, 1200 e 35th erie pa, ...]                 Valid address 2
4        [south avenue florida, amherst ma, ...]                   Valid address 3
5        [florida, new york, california, dallas, ...]              State/city only
6        [2121 wisconsin ave, nw, washington dc 20007,
          1 bridge street, chicago, ...]                           Valid address 4

TABLE VII: Response Cluster for Q1

Log 3163:
Bot: What would you like to know?
Human: Weather
Bot: Please enter the location.
Human: Miami
Bot: The weather in miami is 84.02 degrees fahrenheit or 28.90 degrees celsius with broken clouds.
Bot: Would you like to try again?
Human: No
Bot: Thanks!

Log 3461:
Bot: What would you like to know?
Human: Weather
Bot: Please enter the location.
Human: Westlake village, ca
Bot: The weather in westlake village is 79.03 degrees fahrenheit or 26.13 degrees celsius with haze.
Bot: Would you like to try again?
Human: Yes
Bot: What would you like to know?
Human: Fact
Bot: 166,875,000,000 pieces of mail are delivered each year in the us
Bot: Would you like to try again?
Human: No
Bot: Thanks!

TABLE VIII: Medoids of two clusters of the branching bot.

VI. RELATED WORK

Many studies on chatbots focus on response generation. However, studies on chat logs are scarce, not to mention using such studies to improve current chatbots. Therefore, we extend our review to related work on other forms of documents and study their clustering and summarization.

Forum threads similarity: Online forums resemble chat logs because discussions in forums between users resemble chat sessions in our context. There are many works on

clustering forum topics or threads. Singh et al. [7] devise a similarity measure between forum threads by decomposing threads into weighted overlapping components. Specifically, they divide threads into individual posts and post-reply pairs and reduce the problem to finding a maximum weight independent set. Said et al. [8] propose using asymmetric pairwise distances between posts to cluster posts in online forums. A key difference of our work from thread similarity works is that the flow diagram in transactional chatbots imposes a more structured sequence of message pairs, which is not available in free-form threads. We leverage this structure in our distance measures.

XML document similarity: The structural similarity is related to the problem of XML document similarity. This is because XML schemas inherently support tree structures, into which we can fit chat sessions (sequences of conversations). Tekli et al. [9] present comprehensive studies of XML similarity, including tag similarity, edge similarity, path similarity, etc. Path similarity can be a good metric for chat sessions because chats successfully completing transactions tend to follow similar paths. Nierman et al. [10] propose a tree edit distance based similarity measure for XML documents. However, XML document similarity measures do not naturally distinguish the two roles (bot and user) in a chat session, nor do they adequately model loops.

Document summarization: Ultimately we aim to provide chat log summaries, which naturally relates to the multi-document summarization problem. Mani et al. [11] provide a method for summarizing similarities and differences of pairs of related documents using a graph representation of text. Radev et al. [12] propose centroid-based summarization of multiple documents. Goldstein et al. [13] propose multi-document summarization using sentence extraction. Summaries can be in the form of extraction or abstraction [14]. Unlike these techniques, which generally focus on the document's textual content only, our work also leverages the structure, which is important in chat log data. Moreover, many of these papers conduct experiments on DUC datasets, thus their effectiveness on chat session data remains unknown.

VII. CONCLUSION AND FUTURE WORK

In this paper we proposed distance functions for chatbot responses and sessions that outperform baselines. We also showed that these functions can be used to generate meaningful clusters of responses and sessions. These clusters can then be used to help administrators improve the bots. In the future, we will focus on automatically generating high-level cluster summaries as well as automatic improvement suggestions. Ideally, such suggestions should be incorporated into the system with little or no human intervention.

ACKNOWLEDGMENT

This work was partially supported by NSF grants IIS-1838222, IIS-1619463, IIS-1901379 and IIS-1447826.

REFERENCES

[1] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, "Information-theoretic metric learning," in ICML, 2007.
[2] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin, "Advances in pre-training distributed word representations," in LREC, 2018.
[3] W. Wang and O. R. Zaïane, "Clustering web sessions by sequence alignment," in Proc. International Workshop on DEXA'02, 2002.
[4] "wit.ai," http://wit.ai, 2019.
[5] "nutritionix api," https://www.nutritionix.com/business/api, 2019.
[6] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 2009, vol. 344.
[7] A. Singh, D. P, and D. Raghu, "Retrieving similar discussion forum threads: A structure based approach," in Proc. SIGIR'12, 2012.
[8] D. Said and N. Wanas, "Clustering posts in online discussion forum threads," IJCSIT, 2011.
[9] J. Tekli, R. Chbeir, and K. Yetongnon, "An overview on XML similarity: Background, current trends and future directions," Computer Science Review, 2009.
[10] A. Nierman and H. Jagadish, "Evaluating structural similarity in XML documents," in WebDB, 2002.
[11] I. Mani and E. Bloedorn, "Multi-document summarization by graph search and matching," arXiv preprint cmp-lg/9712004, 1997.
[12] D. R. Radev, H. Jing, and M. Budzikowska, "Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies," in NAACL-ANLP, 2000.
[13] J. Goldstein, V. Mittal, J. Carbonell, and M. Kantrowitz, "Multi-document summarization by sentence extraction," in NAACL-ANLP, 2000.
[14] H. Lin and J. Bilmes, "Multi-document summarization via budgeted maximization of submodular functions," in NAACL HLT'10, 2010.