Bot2Vec: Learning Representations of Chatbots

Jonathan Herzig1  Tommy Sandbank∗2  Michal Shmueli-Scheuer1  David Konopnicki1  John Richards1

1IBM Research  2Similari
{hjon,shmueli,davidko}@il.ibm.com, [email protected], [email protected]

Abstract

Chatbots (i.e., bots) are becoming widely used in multiple domains, along with supporting bot programming platforms. These platforms are equipped with novel testing tools aimed at improving the quality of individual chatbots. Doing so requires an understanding of what sort of bots are being built (captured by their underlying conversation graphs) and how well they perform (derived through analysis of conversation logs). In this paper, we propose a new model, BOT2VEC, that embeds bots into a compact representation based on their structure and usage logs. Then, we utilize BOT2VEC representations to improve the quality of two bot analysis tasks. Using conversation data and graphs of more than 90 bots, we show that BOT2VEC representations improve detection performance by more than 16% for both tasks.

1 Introduction

As conversational systems (i.e., chatbots) become more pervasive, careful analysis of their capabilities becomes important. Conversational systems are being used for a variety of support, service, and sales applications that were formerly handled by human agents. Thus, organizations deploying such systems must be able to understand bot behavior to improve their performance. In many cases, such an analysis can be viewed as a classification task whose goal is to check whether a bot or a particular instance of a conversation satisfies some property (e.g., is the conversation successful?). Models for these downstream classification tasks should benefit from conditioning on representations that capture global bot behavior.

For a conversation itself, there exists a natural way to represent it as the concatenation of the human and bot utterances. As for a bot, the question of its representation is more complicated: bots are complex objects that execute logic in order to drive conversations with users. How should they best be represented?

Many commercial companies provide bot programming platforms. These platforms provide tools and services to develop bots, and to monitor and improve their quality. Due to the increasing popularity of bots, thousands or tens of thousands of bots could be deployed by different companies on each platform1. Although bots might have different purposes and different underlying structures, the ability to understand bot behavior at a high level could inspire new tools and services benefitting all bots on the platform. In this work, we explore a commercial platform, and study different bot representations.

Inspired by the success of recently proposed learned embeddings for objects such as graphs (Narayanan et al., 2017), nodes (Grover and Leskovec, 2016), documents (Le and Mikolov, 2014) and words (Mikolov et al., 2013a), we propose a new model, BOT2VEC, that learns bot embeddings, and propose both content- and graph-based representations. While previous graph embedding representations consider static local structures in the graph (Narayanan et al., 2017; Grover and Leskovec, 2016), our graph representation is based on dynamic conversation paths. As bots are usually represented on bot platforms as some form of directed graph, with conversations represented as traversals on the graph, this approach seems reasonable. It captures the way the bot is actually used, in addition to how it is structured.

In this paper, our goal is to consider various bot embeddings and two different but realistic classification tasks, and test whether some bot representations are more appropriate for these tasks. The first task, at the level of entire bots, aims to detect

∗ Work was done while working at IBM.
1 https://www.techemergence.com/chatbot-comparison-microsoft-amazon-google/

Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM), pages 75–84, Minneapolis, June 6–7, 2019. © 2019 Association for Computational Linguistics

[Figure 1 appears here: a customer support bot graph. Levels run left to right; nodes include Welcome, Make_a_payment, Payment_type, Account_operation, Store_information, Technical_problem, Headset_problem, Device_information and Anything_Else, with coordinates such as (4), (4,1) and (4,1,2); the legend marks jump nodes, negative edges and positive edges.]

Figure 1: Example of a customer support bot graph.

whether the bot is in production (i.e., in use with real human users) or not. The second task, at the conversation level within each bot, aims to detect problematic conversations with a deployed bot in support of focusing improvement efforts.

The main contributions of this paper are threefold: (1) this is the first research that leverages information from multiple bots in order to improve bot quality, (2) this is the first research to propose an embedding approach for bots based on their structure and how this structure is exploited during conversations, and (3) we empirically evaluate these representations for two classification tasks using data from more than 90 conversational bots. We find that our proposed representations lead to more than 16% improvement in classification for our two tasks, with our structure-based representation performing better than our content-based representation. This suggests that the representations explored in this paper are valuable across a range of possible tasks.

2 Bot Overview

Although different programming models can be used to create bots, in practice, most commercial conversational system platforms represent the conversation control flow for bots as graphs. In this paper, we use a bot paradigm based on one of the publicly available commercial platforms, but it is quite general and can be adapted to fit other bot programming platforms. Our example in Figure 1 shows a part of a customer support bot graph, and we use it to explain how such a graph is used in the context of a conversation.

At every step (i.e., every turn) of a conversation with a bot, the human user expresses an utterance and the bot analyzes it, determines how to respond and updates its internal state. This determination is executed by traversing the graph, starting from a special node called the root node, and moving along the nodes of the graph according to a given set of rules as described below. Note that this description aims to present and explain key abstractions rather than the implementation details of an actual bot platform.

2.1 Graph Components

Every node in the graph has two internal parts: a user intent, and an optional reply of the bot. Given a user utterance, an intent classifier is used to determine whether the user utterance matches the intent associated with the node. For example, the Technical problem node has been defined to capture cases where users encounter a technical problem with a product, and this is what is being expressed in the utterance at hand (e.g., “I’m having some issues with my headset”). In this case, the classifier should be able to classify this utterance as relevant to this intent. In practice, the intent classifier is trained from examples of utterances and their corresponding intents, written by bot programmers.

Every node has two optional outgoing edges: a positive edge and a negative edge. If a user utterance has been classified positively, the optional node answer is presented to the user, and the execution moves to the node which is the target of the positive edge. When the execution moves along a positive edge, it is possible to obtain additional input from the user, and continue the evaluation using this input. If a user utterance has been classified negatively, the execution moves to the node which is the target of the negative edge. As shown in Figure 1, we represent nodes connected by negative edges along the vertical axis, and nodes connected by positive edges along the horizontal axis.

When there is no positive edge and the classification is positive, the execution stops, and the system waits for the next user utterance. When there is no negative edge and the classification is negative, execution jumps back to the root node to resume evaluation from the start.

In addition to the nodes that we just described, there are special sink nodes (in our example, the

Agent: Hi, I’m Eva, how may I help you?
Customer: I’m having some issues with my headset
Agent: Which model are you using?
Customer: A wireless one.
Agent: Please make sure that they are charged.
Customer: Yes they are, and they work well with other devices...
Agent: Got you. In this case go and try our echo test.
Customer: Well, I’ll stop by one of your stores and see a technician.
Agent: I’m not trained on that yet, but I’m still learning. You may want to rephrase your question and try again.
Customer: What is your closest location?
Agent: Check here to see our locations in your city.

Figure 2: Example of a possible conversation.

Anything Else node), which are not the source of any edges, and which typically trigger a special default message like “I’m still in a learning phase, is there anything else I can help you with?”. As bots support only a limited set of intents, this mechanism is used to let the user know that some intent is beyond the knowledge of the bot, and to initiate a recovery process.

2.2 Graph Execution

A conversation starts by traversing the graph from the root node. The root node is special in that it does not expect a user utterance, and it only has a positive edge. Its optional response, which can be a greeting message for example, is only output once at the beginning of the conversation.

Consequently, a user utterance defines a path in the graph, and each conversation between a human and the bot can be represented as a sequence of paths in the graph. Figure 2 shows an example of such a conversation.

The nodes that are evaluated for the first user utterance in Figure 2 (“I’m having some issues with my headset”) are marked in bold in Figure 1. Thus, the path that is created by the analysis of this utterance starts with the root node Welcome, then moves to the Make a payment node, checking whether this utterance expresses the user intention to make a payment. Since it does not, control moves to the Account operation node, and then, in turn, to the Store information node, along the negative edges, until it reaches the Technical problem node. Here, the internal classifier determines that the utterance indeed expresses that the user encountered a technical problem. As a result, the control moves along the positive edge to the Headset problem node. Once the node’s reply is presented to the user (“Which model are you using?”), the system then waits for the next user utterance. The next user utterance (“A wireless one.”) leads to the Wireless model node, hence the resulting path for this utterance is a continuation of the previous path.

Note that nodes connected vertically by negative edges represent alternative understandings of an utterance. That is, in our example, an utterance can be identified as Account operation, Store information or Technical problem, etc. Nodes connected by horizontal positive edges represent specializations of the analysis. That is, after the utterance is classified as Technical problem, moving along the positive edge will check whether the utterance expresses a Headset problem, or (moving again vertically along negative edges) an HD problem or, alternatively, a Battery problem.

In addition, special jump nodes are nodes that allow the conversation to jump to a designated node. In our example the node below Charge headset, which refers to the Echo test node, is a jump. Such jump nodes are not essential, but simplify the graph by preventing duplication of subgraphs.

2.3 Notations

We define the depth of the bot graph as the maximum number of nodes from left to right (ignoring the root node), i.e., nodes connected by positive edges. The depth of a node v is defined as the number of positive edges used to traverse the graph from the root node to v. In our example from Figure 1 the depth of the graph is 5, while the depth of Headset problem is 2. We define the level l as the set of all the nodes whose depth is l. We define the width of the graph at level l as the maximum number of nodes connected by negative edges at this level. In our example, the width of level 1 is 6, while the width of level 2 is 3.

To further simplify notations, we consider a grid layout that defines coordinates for the nodes from left to right and from top to bottom. For example, node Technical problem is mapped to (4), which means that it is the 4th node from top to bottom at level 1. The node Headset problem is mapped to (4,1), meaning that it is the 1st node at level 2 of the 4th node at level 1. Similarly, the node HD problem is mapped to coordinate (4,2) and Wireless model is (4,1,2). Note that nodes located deeper in the graph are mapped to a longer list of

coordinates. The maximal possible length of a coordinate for a node is the depth of the graph.

2.4 Bot Behavior

The graph of a bot determines its behavior, and thus, the structure of the graph captures interesting properties of the bot. For example, there are bots designed to handle simple Q&A conversations, as opposed to bots that handle filling in the details of complex transactions. For Q&A bots, the graph is likely to be of depth 1, with many nodes at this level, representing various alternative questions and answers. For bots handling complex conversations, the graphs are likely to be deeper in order to handle more complicated cases. In general, bots handling narrow use-cases, and which are very specific in their dialog capabilities, are likely to have fewer nodes and more jumps to sink nodes. Thus, in order to capture the bot behavior we should consider the different characteristics of its graph, and this is what we would like to capture in our representation.

3 Bot2Vec Framework

3.1 Representation Learning

In this work, we employ a neural network model to learn the BOT2VEC representation. The training input to this model is either a content-based representation or a structure-based representation of conversations between a human and a bot. Both are described in the following sections. The result of the training is a vector representation for each bot in the dataset.

To learn this BOT2VEC representation, a fully connected network with N hidden layers is used. During training, the input to this network is the representation of a conversation (either content-based or structure-based), and the ground truth is a one-hot vector of the bot that handled this conversation. In other words, given a conversation c, the network predicts which bot handled c using softmax, that is, a distribution over the bots. Thus, the output layer vector of the model has the size of the number of bots in the dataset. Once the model is trained, the representation of a bot b is the weights vector V_b, where V is the output embedding matrix (the weights connecting the last hidden layer to the output layer).

The motivation for choosing this representation is that the training procedure (using cross-entropy loss) should drive similar bots to similar representations, given that they handle similar conversations. In the context of learning word representations using the skip-gram model (Mikolov et al., 2013b), the output embeddings were found to be of good quality (Press and Wolf, 2017). Hereafter, we denote the content-based model as BOT2VEC-C, and the structure-based model as BOT2VEC-S.

We now describe how a conversation is represented using its textual content and using its bot graph structure, which is used as input for the model.

3.2 Content-based Representation

Conversations between users and bots occur in natural language, and as shown in Figure 2, are composed of user utterances and bot responses. The first step for creating our textual representation of a single conversation is to build a vocabulary, which is the union of all terms across all conversations of all the bots in the dataset. We first mask tokens that might reveal the bot’s identity, such as bot names, URLs, HTML tags, etc. To neglect additional bot-specific words (which are probably infrequent), we take the k most popular terms to be the vocabulary.

Now, for a given conversation, we create two vectors with term frequency (TF) entries according to the vocabulary just defined, one for the user utterances and one for the bot responses. The conversation is then represented as their concatenation.

3.3 Structure-based Representation

Our goal in this representation is to characterize bot behavior by analyzing its conversations with respect to the structure of the bot graph. This learned representation should capture the characteristics of how bots are being utilized. We would like to capture, for example, which nodes are being visited during a conversation with a user, at which nodes the conversation turns end, etc.

Recall that each conversation can be represented as a sequence of paths on the bot graph (a path for each turn). We now describe how to represent a path as a vector (bin vector below), and how to aggregate paths of a single conversation.

Bin vector To be able to compare bots with different bot graph structures, we define a common fixed-size bin vector to represent paths of different bots.
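As a concrete sketch, the content-based representation of Section 3.2 (a vocabulary of the k most popular terms, then a user-side and a bot-side term-frequency vector per conversation) might look as follows. The whitespace tokenizer and the conversation layout are simplifying assumptions, not the paper's exact pipeline, and identity-revealing tokens are assumed to be masked beforehand:

```python
from collections import Counter

def build_vocabulary(conversations, k):
    """Top-k most frequent terms across all conversations of all bots."""
    counts = Counter(tok for conv in conversations
                     for side in ("user", "bot")
                     for utt in conv[side]
                     for tok in utt.lower().split())
    return [term for term, _ in counts.most_common(k)]

def tf_vector(utterances, vocab):
    """Term-frequency entries over the fixed vocabulary."""
    counts = Counter(tok for utt in utterances for tok in utt.lower().split())
    return [counts[term] for term in vocab]

def content_representation(conv, vocab):
    """A conversation is the concatenation of its user-side and
    bot-side TF vectors."""
    return tf_vector(conv["user"], vocab) + tf_vector(conv["bot"], vocab)
```

For instance, with the (assumed) vocabulary ["headset", "model"], a conversation whose user said “my headset is broken” and “wireless headset” and whose bot replied “which headset model” is represented as [2, 0, 1, 1].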

[Figure 3 appears here: the nodes of the Figure 1 graph, listed by coordinate, mapped into 3 sections holding 3, 2 and 1 bins respectively.]

Figure 3: A mapping of a bot graph to the bin vector.

[Figure 4 appears here: the bin vectors for Turn 1 and Turn 5 of the conversation in Figure 2, and the aggregated conversation vector; each bin holds four counters labeled s, f, r and u.]

Figure 4: Bin vector representation of turns in a conversation.

We create a bin vector such that each node in the bot graph is mapped to a single bin based on its coordinates in the graph. Each bin vector is divided into S sections, and each section s is divided into b_s bins (Figure 3). Since the idea is to represent a conversation path in a standardized and compact way across different bots, each level in the bot graph is mapped to a section in the bin vector, and each node in the bot graph is mapped to a bin in the appropriate section (see Algorithm 1). Several levels might be mapped to the same section, and several nodes can be mapped to the same bin (sections do not necessarily have the same number of bins). The number of sections and bins in the bin vector are set based on the depths and widths of all the bot graphs (e.g., the average depth of the graphs and the average width of each level in the graphs). For example, the bin vector in Figure 3 represents the mapping of all nodes for the bot graph in Figure 1. This bin vector has 3 sections. There are 3 bins in section 1, 2 bins in section 2, and 1 bin in section 3.

Algorithm 1: Mapping a node to a section and a bin.
Input: n_l: node depth; n_w: node width; D: graph depth; w = w_1, w_2, ..., w_D: graph width at each level; S: number of sections; b = b_1, b_2, ..., b_S: number of bins per section.
Output: n_s: section index; n_b: bin index.
1: n_s = ⌊(n_l − 1)/D × S + 1⌋
2: n_b = ⌊(n_w − 1)/w_{n_l} × b_{n_s} + 1⌋
3: return n_s, n_b

Utterance modeling We now explain how each utterance is represented using the bin vector. As mentioned above, each user utterance in a conversation is represented by a path in the bot graph, whose nodes can be mapped to sections and bins in the bin vector. In order to capture how every utterance is being analyzed by the bot, we distinguish between different types of nodes in the path:

1. A success (s) node is the last node of the path, if it is not a sink node.
2. A failure (f) node is the last node of the path, if it is a sink node.
3. All the other nodes that belong to the path are regular (r) nodes.
4. Nodes that do not belong to the path are uninvolved (u) nodes.

When representing a path, we consider the type of the node it is mapped to, in the corresponding bins in the bin vector: each bin maintains 4 counters, one counter for each type of node mentioned above (success, failure, regular and uninvolved). That is, the mapping of the first user utterance “I’m having some issues with my headset” to the bin vector is as follows:

• The first node in the bot graph that is visited is Make a payment (1). This node is mapped to the first bin in section 1 of the bin vector. Thus the regular counter is set to 1 for this bin.

• The second node traversed in the bot graph is Account operation (2), which is mapped to the same first bin of section 1 of the bin vector. Hence, the regular counter of this bin is set to 2 in the bin vector.

• Similarly, nodes Store information (3) and Technical problem (4) are visited, and that sets the regular counter of bin 2 in section 1 to 2.

• Finally, the Headset problem (4,1) node is visited, and that sets the success counter of bin 1 in section 1 to 1, as this is the last node that is being visited for this utterance. Now we can update the uninvolved counters of the bins according to the nodes that were not visited during the traversal.
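Algorithm 1 is short enough to transcribe directly. A sketch in code, using the example graph's dimensions (depth 5; widths 6 and 3 at the first two levels, with the widths of deeper levels being illustrative assumptions; sections of 3, 2 and 1 bins):

```python
from math import floor

def map_node(n_l, n_w, D, w, S, b):
    """Algorithm 1: map a node at depth n_l and width position n_w
    to a (section, bin) pair. `w` holds the graph width per level and
    `b` the number of bins per section (1-indexed in the paper,
    0-indexed Python lists here)."""
    n_s = floor((n_l - 1) / D * S + 1)
    n_b = floor((n_w - 1) / w[n_l - 1] * b[n_s - 1] + 1)
    return n_s, n_b

# Dimensions for the Figure 1 graph; only w[0]=6 and w[1]=3 are stated
# in the text, the remaining widths are assumed for illustration.
D, S = 5, 3
w, b = [6, 3, 2, 2, 1], [3, 2, 1]

map_node(1, 4, D, w, S, b)  # Technical problem (4)  -> (1, 2)
map_node(2, 1, D, w, S, b)  # Headset problem (4,1)  -> (1, 1)
```

These calls reproduce the worked example above: Make a payment (1) and Account operation (2) fall in bin 1 of section 1, Store information (3) and Technical problem (4) in bin 2 of section 1, and Headset problem (4,1) in bin 1 of section 1.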

When this path is mapped to the bin vector, we obtain the vector as shown in “Turn 1” of Figure 4.

Conversation modeling As the input to the model is a conversation, we now describe how it is represented. We aggregate the bin vectors of the user utterance paths by summing each counter based on the node types (s, f, r or u) across all the bins in the matching sections. This aggregation captures different patterns of the conversation, such as how many times nodes which are mapped to a bin are visited, how many turns ended successfully in the mapped nodes vs. how many turns failed in these nodes, etc. Figure 4 depicts the detailed vectors for the first and the fifth customer utterance from Figure 2, as well as the aggregated vector obtained for the whole conversation.

4 Classification Tasks

BOT2VEC representations could be used for a variety of bot analytics tasks. In this research, we have examined two such tasks.

Detecting production bots Several companies provide bot development platforms that are used to create and manage conversational bots. Based on analyses of the logs of one commercial platform, we have found that a large percentage of bots are not being used with real users. From the platform provider perspective, understanding bot interaction with actual users could inspire the development of new tools and services that could assist all of the bots that use the platform. Thus, it is important to first determine which bots are used in production, rather than in debugging or testing. This is made difficult by the fact that bot testing often involves somewhat realistic simulations of conversations. Thus, in this binary classification task, a bot should be classified as either a production bot or not, given all of its conversations.

Detecting egregious conversations Once in production, bot log analysis forms the basis for continuous improvement. Finding the areas most in need of improvement is complicated by the fact that bots may have thousands of conversations per day, making it hard to find conversations failing from causes such as faulty classification of user intent, bugs in dialog descriptions, and inadequate use of conversational context. Recently, a new analysis (Sandbank et al., 2018) aims at detecting egregious conversations, those in which the bot behaves so badly that a human agent, if available, would be needed to salvage them. Finding these egregious conversations can help identify where improvement efforts should be focused. In this task, a conversation c should be classified as egregious or not, with the BOT2VEC representation potentially improving performance.

5 Experiments

5.1 Data

We collected two months of data from 92 bots, including their graphs and conversation logs. The bot domains included health, finance, banking, travel, HR, IT support, and more. Figure 5 summarizes the information about the number of conversations, number of nodes and graph depth for the bots.

Figure 5: Bots summary: #conversations, #nodes and depth.

In total, we collected 1.3 million conversations, with a minimum of 110 conversations and a maximum of 161,000 conversations per bot. For 62% of the bots, the number of conversations varied between 1,000 and 10,000. Bot graph depth ranged from 2 to 52 levels with an average depth of 7; the total number of nodes ranged from 11 to 1,088 with an average of 160 nodes per bot.

5.2 Experimental Setting

Common bin vector As explained, to capture comparable behavior across bots, we create one common bin vector using the average depth and average width for each level of the bots’ graphs. Specifically, first, based on the average depth, we define the number of sections to be 7. Then we set the number of bins for each section (based on the average width per level per bot) to 108, 10, 6, 17, 8, 4, and 1, respectively.

Bot2Vec implementation details
Content-based: The content-based model input is comprised of two vectors of size k = 5000 each,

one vector representing the user utterances and the second vector representing the bot responses. The two vectors are concatenated and passed through a fully connected layer with 5000 units. We also calculate the squared difference of the two vectors and an element-wise multiplication of the vectors to capture the interaction between the user and the bot. The three vectors are then concatenated and passed through another two fully connected layers with 1000 and 100 units.
Structure-based: The input is a single vector with the size of 616 (the total number of bins (154) times the 4 counters per bin). This input vector is passed through two fully connected layers with 100 and 20 units.

Model      | F1-score | % improvement
BOT-STAT   | 0.519    | -
BOT2VEC-C  | 0.545    | 5.0
BOT2VEC-S  | 0.616    | 18.6

Table 1: Production bots detection - classification results on the test set.

Model      | F1-score | % improvement
EGR        | 0.537    | -
BOT-STAT   | 0.597    | 11.0
BOT2VEC-C  | 0.617    | 14.8
BOT2VEC-S  | 0.626    | 16.4

Table 2: Egregious conversations - classification results on the test set.

For both models, the last hidden layer is connected to the output layer (with a size equal to the total number of bots).
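The shapes of the structure-based network, and how a bot's representation is read off the output embedding matrix V, can be sketched in NumPy. The weights below are random stand-ins; training with the cross-entropy loss and optimizer settings described here would produce the real parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bots, n_bins, hidden1, hidden2 = 92, 154, 100, 20
input_dim = n_bins * 4                      # 616: four counters per bin

# Random stand-ins for the trained weights (biases omitted for brevity).
W1 = rng.normal(size=(input_dim, hidden1))
W2 = rng.normal(size=(hidden1, hidden2))
V = rng.normal(size=(n_bots, hidden2))      # output embedding matrix

def predict_bot(x):
    """Softmax distribution over the bots for one conversation vector."""
    h1 = np.maximum(x @ W1, 0)              # ReLU hidden layers
    h2 = np.maximum(h1 @ W2, 0)             # (dropout is applied in training only)
    logits = V @ h2
    e = np.exp(logits - logits.max())       # numerically stable softmax
    return e / e.sum()

def bot2vec(b):
    """BOT2VEC-S representation of bot b: row b of the output matrix V."""
    return V[b]
```

After training, `predict_bot` is no longer needed; only the rows of V are kept as the per-bot representations fed to the downstream classifiers.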
All hidden layers consist of ReLU activation units, and are regularized using a dropout rate of 0.5. The models were optimized using an Adam optimizer with a 0.001 learning rate.

5.3 Task 1 - Production Bots Detection

Ground truth To annotate bots, we randomly sampled 100 conversations from our dataset. The sampled conversations were tagged by two different expert judges. Given a full conversation, each judge tagged the conversation as production or test/debugging. If more than 50% of the conversations were tagged as production, then the bot was tagged as production. In addition, if the bot was annotated as not-production, the experts had to provide a list of reasons for their choice (e.g., repeating user ids, repeating bot responses, etc.). We generated true binary labels by considering a bot to be a production bot if both judges agreed. Judges had a Cohen’s Kappa coefficient of 0.95 which indicates a high level of agreement. This process generated a production bot class size of 40 (44% of the 92 bots).

Baseline model Inspired by (McIntire et al., 2010; Zhang et al., 2018) we implemented a baseline model denoted BOT-STAT as follows: for each bot we calculated features, such as the number of unique customer sentences, number of conversations, number of unique agent responses, and statistical measures (mean, median, percentile) of the following metrics: number of turns of a conversation, number of tokens in each turn in a conversation, and the time of a turn in a conversation. In total we implemented 17 features.

In our implementation we used an SVM classifier (as we only have 92 samples), measured the F1-score of the production bot class, and evaluated the models using 10-fold cross-validation.

Results Table 1 depicts the classification results for the three models we explored. The BOT2VEC-S model outperformed the other models with a relative improvement of 18.6% over the baseline. The performance of BOT2VEC-C is slightly better than the baseline, which indicates that the information that was captured by the content of the conversations was helpful to detect the usage of the bot. The structure-based representation, however, seems to capture bot variability more effectively, i.e., the coverage of visited nodes, different conversation patterns, etc.

5.4 Task 2 - Egregious Conversations Detection

Ground truth To collect ground truth data, we randomly sampled 12 bots (from the production bots), and for each bot 100 conversations were annotated following the methodology in (Sandbank et al., 2018), namely, given the full conversation, each judge tagged whether the conversation was egregious or not. Judges had a Cohen’s Kappa coefficient of 0.93 which indicates a high level of agreement. The size of the egregious class varied between the bots, ranging from 8% to 48% of the conversations. All the conversations were aggregated to one dataset.

Baseline model We implemented the state-of-the-art EGR model, presented in (Sandbank et al., 2018). In addition, our models are an extension of the EGR model, such that for each conversation,

81 its bot representation vector was concatenated to log act detection (Kumar et al., 2018), and improv- the EGR model’s original feature vector. ing fluency and coherency (Gangadharaiah et al., We measured the F1-score of the egregious 2018). Yuwono et al.(2018) learn the quality of class, and evaluated the models using 10-fold chatbot responses by combining word representa- cross-validation. tions of human and chatbot responses using neural approaches. The main difference between these Results Table2 summarizes the classification works and ours is that we analyze multiple bots results for all models. Specifically, the BOT2VEC- within a service to generate representations use- S outperforms all other models with a relative ful to each whereas others analyze a single bot at improvement of 16.4%. This suggests that the a time. In addition, we show that our representa- structure-based representation of the bot encapsu- tions are beneficial across different tasks. Finally, lates information which helps the model to distin- none of these works consider the structure of the guish between egregious and non-egregious con- bot as part of the representation. versations. Moreover, although the EGR model Learning embeddings for different objects is receives the text of the conversation as input, the one of the most explored tasks2. As mentioned content-based representation of the bot also helped above, graphs and nodes representations were pro- to improve the performance of the task. posed in (Narayanan et al., 2017; Grover and 5.5 Structure-based Analysis Leskovec, 2016). Both works considered static lo- cal structures in graphs, whereas our graph repre- In practice, bots belong to various application do- sentation is based on dynamic conversation paths. mains like banking, IT and HR. 
Motivated by The work in (Mikolov et al., 2013b) suggested a the semantic similarities between word embed- Word2Vec skip-gram model such that a word is dings (Mikolov et al., 2013b), we further analyzed predicted given its context. In our work, we take a the structure representation BOT2VEC-S w.r.t bots similar approach fitted to the more complex struc- that belong to the same domain. For the set of ture of bots, and predict a bot id given a represen- production bots, IT, HR, and the banking domains tation of a conversation. were prominent with 10, 7, and 6 bots respectively, Recently, Guo et al.(2018); Pereira and D ´ıaz while the other bots belonged to a long tail of do- (2018) presented a chatbot usage analysis over mains like travel, medical, etc. For the prominent several bots. Guo et al.(2018) compared the per- domains (that had more than 5 bots), we calculated formance of various chatbots that participated in the average cosine distance between vector repre- the Alexa prize challenge and implemented the sentations for pairs of bots that belong to the do- same scenario. To do so, the authors used dif- main vs. pairs of bots from different domains. We ferent measures such as conversational depth and find that the average distance between bots within breadth, users engagement, coherency, and more. their domain is 0.614, while the distance between Pereira and D´ıaz(2018) suggested a list of quality bots from different domains is 0.694. Thus, the attributes, and analyzed 100 popular chatbots in representations of bots that belong to the same do- . Our work focuses on learn- main appear to have, as one would expect, a higher ing bot representations targeted towards classifica- level of similarity. tion tasks on the level of bots and their conversa- 6 Related Work tions. Despite the popularity of chatbots, research on 7 Conclusions bot representations and usage analysis is still un- In this paper, we suggest two BOT2VEC models der explored. 
Works on chatbot representations that capture a bot representation based either on are mostly concentrated on neural response gen- the structure of the bot or the content of its con- eration (Xu et al., 2017; Li et al., 2016; Herzig versations. We showed that utilizing these repre- et al., 2017) and slot filling (Ma and Hovy, 2016; sentations improves two platform analysis tasks, Kurata et al., 2016). In these works, conversa- both for bot level and conversation level tasks. Fu- tion history is used to generate the next bot re- ture work includes extension of the model to en- sponse. Other works use conversation represen- capsulate both the content and the structure based tations for improving specific tasks useful in di- alog, like intent detection (Kato et al., 2017), dia- 2https://github.com/MaxwellRebo/awesome-2vec
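The evaluation protocol above can be sketched in a few lines: concatenate each conversation's feature vector with its bot's representation vector, then estimate the egregious-class F1 with 10-fold cross-validation. The feature values, bot embeddings, and the nearest-centroid classifier below are synthetic stand-ins, not the EGR model or our actual features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: per-conversation features, bot ids, and bot vectors.
n, n_feat, bot_dim, n_bots = 200, 20, 8, 5
conv_features = rng.normal(size=(n, n_feat))
bot_ids = rng.integers(0, n_bots, size=n)
bot_vecs = rng.normal(size=(n_bots, bot_dim))
labels = rng.integers(0, 2, size=n)            # 1 = egregious

# Concatenate each conversation's features with its bot's representation.
X = np.hstack([conv_features, bot_vecs[bot_ids]])

def f1_positive(y_true, y_pred):
    # F1 of the positive (egregious) class only.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

# 10-fold cross-validation with a toy nearest-centroid classifier.
folds = np.array_split(rng.permutation(n), 10)
scores = []
for fold in folds:
    test = np.zeros(n, dtype=bool)
    test[fold] = True
    centroids = np.stack(
        [X[~test][labels[~test] == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X[test][:, None, :] - centroids[None], axis=2)
    scores.append(f1_positive(labels[test], dists.argmin(axis=1)))

print(f"mean egregious-class F1 over 10 folds: {np.mean(scores):.3f}")
```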
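The domain analysis of Section 5.5 reduces to a simple computation: average cosine distance over same-domain bot pairs vs. different-domain pairs. The sketch below uses random vectors in place of learned BOT2VEC-S representations (with 10 IT, 7 HR, and 6 banking bots, matching the counts above), so it illustrates the computation only, not the reported 0.614 vs. 0.694 result.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bot vectors tagged with a domain (random stand-ins for
# the learned BOT2VEC-S representations).
bots = {
    f"bot{i}": ("it" if i < 10 else "hr" if i < 17 else "banking",
                rng.normal(size=100))
    for i in range(23)
}

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Partition all bot pairs into within-domain and cross-domain pairs.
within, across = [], []
for (dom_a, vec_a), (dom_b, vec_b) in combinations(bots.values(), 2):
    (within if dom_a == dom_b else across).append(cosine_distance(vec_a, vec_b))

print(f"avg within-domain distance: {np.mean(within):.3f}")
print(f"avg cross-domain distance:  {np.mean(across):.3f}")
```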
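The training signal described in the related work discussion, predicting a bot id given a conversation representation, can be viewed as a softmax classifier over bot ids, by rough analogy with the skip-gram objective. The sketch below is an illustrative analogue on synthetic conversation vectors, not the actual BOT2VEC training procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bots, conv_dim, n_convs = 5, 16, 400

# Synthetic conversation representations: noisy copies of a per-bot prototype.
bot_of_conv = rng.integers(0, n_bots, size=n_convs)
prototypes = rng.normal(size=(n_bots, conv_dim))
convs = prototypes[bot_of_conv] + 0.1 * rng.normal(size=(n_convs, conv_dim))

# Train softmax regression to predict the bot id from the conversation
# vector; the columns of W act as learned per-bot output vectors.
W = np.zeros((conv_dim, n_bots))
for _ in range(300):                               # plain batch gradient descent
    logits = convs @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grad = convs.T @ (probs - np.eye(n_bots)[bot_of_conv]) / n_convs
    W -= 0.1 * grad

acc = (convs @ W).argmax(axis=1) == bot_of_conv
print(f"bot-id prediction accuracy: {acc.mean():.2f}")
```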

References

Rashmi Gangadharaiah, Balakrishnan Narayanaswamy, and Charles Elkan. 2018. What we need to learn if we want to do and not just talk. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), pages 25–32. Association for Computational Linguistics.

Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864. ACM.

Fenfei Guo, Angeliki Metallinou, Chandra Khatri, Anirudh Raju, Anu Venkatesh, and Ashwin Ram. 2018. Topic-based evaluation for conversational bots. CoRR, abs/1801.03622.

Jonathan Herzig, Michal Shmueli-Scheuer, Tommy Sandbank, and David Konopnicki. 2017. Neural response generation for customer service based on personality traits. In Proceedings of the 10th International Conference on Natural Language Generation, pages 252–256. Association for Computational Linguistics.

Tsuneo Kato, Atsushi Nagai, Naoki Noda, Ryosuke Sumitomo, Jianming Wu, and Seiichi Yamamoto. 2017. Utterance intent classification of a spoken dialogue system with efficiently untied recursive autoencoders. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 60–64. Association for Computational Linguistics.

Harshit Kumar, Arvind Agarwal, Riddhiman Dasgupta, and Sachindra Joshi. 2018. Dialogue act sequence labeling using hierarchical encoder with CRF. In AAAI. AAAI Press.

Gakuto Kurata, Bing Xiang, Bowen Zhou, and Mo Yu. 2016. Leveraging sentence-level information with encoder LSTM for semantic slot filling. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2077–2083. Association for Computational Linguistics.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning - Volume 32, ICML'14, pages II-1188–II-1196. JMLR.org.

Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. 2016. A persona-based neural conversation model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 994–1003. Association for Computational Linguistics.

Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1064–1074. Association for Computational Linguistics.

J. P. McIntire, L. K. McIntire, and P. R. Havig. 2010. Methods for chatbot detection in distributed text-based communications. In 2010 International Symposium on Collaborative Technologies and Systems.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. 2017. graph2vec: Learning distributed representations of graphs. CoRR, abs/1707.05005.

Juanan Pereira and Oscar Díaz. 2018. A quality analysis of Facebook Messenger's most popular chatbots. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC '18.

Ofir Press and Lior Wolf. 2017. Using the output embedding to improve language models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 157–163. Association for Computational Linguistics.

Tommy Sandbank, Michal Shmueli-Scheuer, Jonathan Herzig, David Konopnicki, John Richards, and David Piorkowski. 2018. Detecting egregious conversations between customers and virtual agents. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1802–1811. Association for Computational Linguistics.

Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, Xiaolong Wang, Zhuoran Wang, and Chao Qi. 2017. Neural response generation via GAN with an approximate embedding layer. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 617–626. Association for Computational Linguistics.

Steven Kester Yuwono, Wu Biao, and Luis Fernando D'Haro. 2018. Automated scoring of chatbot responses in conversational dialogue. In Proceedings of the International Workshop on Spoken Dialog System Technology.

Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018. Personalizing dialogue agents: I have a dog, do you have pets too? CoRR, abs/1801.07243.
