Arthur: a New ECA That Uses Memory to Improve Communication
Total Page:16
File Type:pdf, Size:1020Kb
Arthur: a new ECA that uses Memory to improve Communication Paulo Knob, Willian S. Dias, Natanael Kuniechick, Joao Moraes, Soraia Raupp Musse Graduate Programa in Computer Science, School of Technology, Pontifical Catholic University of Rio Grande do Sul Porto Alegre, Brazil Email: [email protected] Abstract—This article proposes an embodied conversational researches developed in the context of chatbots, while Sec- agent named Arthur. In addition to being able to talk to a person tion II-B presents papers that discuss the behaviour of human (using text and voice), he is also able to recognize the person he memory and its use for virtual agents. Finally, Section II-C is talking to and detect his/her expressed emotion through facial expressions. Arthur uses these skills to improve communication discusses the importance of appearance and expressiveness for with the user, also using his artificial memory, which stores and virtual agents, also presenting work related with Embodied retrieves data about events and facts, based on a human memory Conversational Agents (ECAs). model. We conducted some experiments to collect quantitative and qualitative information, which show that our model provides A. Chatbots a consistent impact on users. Zhou et al. [12] focus their work on the proposal of the I. INTRODUCTION Emotional Chat Machine (ECM), a chatbot that can gener- ate relevant responses to content that are also emotionally Human beings are the only known species that use spoken consistent. As discussed by the authors, there are three main language to communicate, having a skill developed in com- challenges: obtaining high-quality data marked by emotions munication that uses factors other than speech, such as, for in a large-scale corpus, considering emotions in a natural and example, body expressions and gazing [1]. In this context, coherent way and incorporating that information of emotions Embodied Conversational Agents (ECAs) are virtual agents into a neural model. The results achieved by their work show which are able to interact and talk with humans in a natural that ECM can generate appropriate responses, if the emotion way. In the last years, many research have been conducted category of the response and the emotion of the post both to improve the quality of the communication abilities of such belong to one of the frequent Emotion Interaction Patterns ECAs, both verbal and non-verbal [2], [3], [4]. A fair amount (EIP). EIP is defined as the pair: categories of emotion of the of effort has been focused on ECAs that help people to have publication and its response. a healthier life [5], [6], [7], in clinical interviews [8], [9], Zhang et al. [13] propose to solve the problem of con- training specific skills [10], [11], among others. sistency on chatbot responses, concerning both context and The present work aims to propose an Embodied Conver- personas (casual speaker). In their work, they present a self- sational Agent (ECA) with general purpose, called Arthur, supervised approach that uses the natural structure of con- endowed with a memory module and many other abilities. versational data to learn and leverage both topic and persona Besides a conversational module, using text and voice, this features. For the tests, two data sets are used: Twitter FireHose, ECA is able to recognize the person he is talking to, as collected from 2012 until 2016; and Maluuba dataset. The well to assess the user emotional state through his/her facial results achieved indicate that the proposed model is able to expressions. Also, Arthur is able to demonstrate different capture meaningful topics and personas’ features. Also, the arXiv:2104.13327v1 [cs.HC] 27 Apr 2021 levels of emotion through his facial expressions. Lastly, Arthur incorporation of the learned features helps to significantly is equipped with a memory module, which tries to replicate improve the quality of generated responses on both data sets, the behavior of human memory and, thus, allows for Arthur even when comparing with models which explicit persona to learn information with and from the user while interacting. information. Moreover, it allows Arthur to remember such information later In our method, we mix a pre-defined dialog with an existent in the conversation or, even, in a different interaction. technology to provide chat and voice, in order to build the Section II presents several work related with the model pro- communication module of Arthur. More details are given in posed in this work. Section III details our methodology, while Section III-B. Section IV shows some results achieved when users know Arthur. Finally, Section V presents our final considerations. B. Human and Agent Memory One of the most accepted models concerning human mem- II. RELATED WORK ory cited in literature is known as Autobiographical Memory. This Section presents some work related with methodolo- As defined by Bluck et al. [14], autobiographical memory gies used to build Arthur. Section II-A presents two interesting is ”a system that encodes, stores and guides retrieval of all episodic information related to our person experiences”. Also, following the algorithm proposed by Lee et al. [20]. More according to Conway et al. [15], autobiographical memory can details are presented in Section III-F. be grouped in three levels: lifetime periods, general events and III. PROPOSED MODEL event-specific knowledge. So, such memories can be directly accessed if the cues are specific and relevant to the person. This work aims to build an interactive empathetic ECA Otherwise, if the cues are too general, a generative retrieval which can store and recover memories about previous inter- process must be used to produce more specific cues for the actions: Arthur. One of his features is being able to recognize retrieval of relevant memories. the person he is talking to. Also, Arthur has a memory to store Following this definition of autobiographical memory, Wang knowledge about people he interacts, such as their face, emo- et al. [16] build a model to mimic such behavior. Their tion and interactions. Finally, in order to keep the interactions model, known as Autobiographical Memory-Adaptive Res- as natural as possible, the agent is able to properly react to onance Theory (AM-ART), is a three-layer neural network interactions with people, delivering meaningful responses and that encode lifetime periods, general events and event-specific facial cues. knowledge, respectively, being, therefore, consistent with the A. Overview model presented by Conway et al. [15]. Also, it encodes the 5W1H schema, which represents when an event occurred, The overview of Arthur is illustrated in Figure 1. This work where it happened, who was involved, what happened, which was mostly developed using Unity3D [21]. Some modules pictorial memory was associated with the event and how was were developed using Python, as, for example, the Face the person feeling during the event. The results achieved by Recognition module. their work show that AM-ART was able to perform better that the keyword-based query method, since the last can not deal with noisy cues in many existing photo or memory repositories. The main difference between the work presented in this Section and ours is that we do not train a neural network. We modeled our memory in a procedural way, similarly to [17]. Also, our memory retrieval can be guided by the emotional state and, even, change the mood of the agent. More details are presented in Section III-E. C. Appearance and Expressiveness When developing an interactive agent, there is a concern Fig. 1. Overview of the proposed model. Behavior Control is responsible to define the appropriate behavior of the virtual agent, while the Memory about how it presents itself. It is important that the interaction Control is responsible of store and retrieve memories. The dolls inside the occurs so that the subject in contact with this agent does not boxes represent the modules where an user input exist. feel uncomfortable and/or embarrassed. As shown in Dill et al. [18], when trying to present a character who pretends to Behavior Control is responsible for defining Arthur’s appro- be human, there is a certain strange feeling (Uncanny Valley) priate behavior. This module uses all the information available when it seems human, but not as close as expected. It should to Arthur, that is, the person recognized by Arthur and with be noted that, considering the face of a character, the elements whom he is interacting, the user’s facial expressions, facts that can cause more strangeness are the eyes and the mouth. and information stored in Arthur’s memory, the user’s emotion Real human eyes tend to not stay static for too much time, detected and the dialogue through text/voice. having some involuntary or voluntary movement at some point. The Chat and Voice Detection modules are responsible by Such phenomena is known as saccade behavior, being defined the interaction between the user and the agent, in the scope of by Leigh and Zee [19] as rapid movements of both eyes verbal and textual behavior, where the Chat Module allows for from one gaze position to another. Lee et al. [20] propose the user to type some sentence and the Voice Detector Module an algorithm to simulate such behavior. The authors obtained is able to transform the voice of the user into words, which a statistical model through the analysis of eye-tracking im- are used as input to the agent. The Face Recognition Module ages from a subject during a conversation with a software. allows Arthur to recognize the person he is talking to, while In combination with literature data, they achieved a model the Emotion Detection Module allows him to identify the capable of generate movements considering various aspects, person’s perceived emotion.