arXiv:2108.02513v1 [cs.RO] 5 Aug 2021

User modeling for social robots

Cristina Gena                      Marco Botta
Computer Science Department        Computer Science Department
University of Turin, Italy         University of Turin, Italy
[email protected]               [email protected]

Federica Cena                      Claudio Mattutino
Computer Science Department        Computer Science Department
University of Turin, Italy         University of Turin, Italy
[email protected]               [email protected]

Abstract—This paper presents our first attempt to integrate user modeling features into social and affective cloud-based robots. We propose a cloud-based architecture for modeling the user-robot interaction, so that the approach can be re-used with different kinds of social robots.

Index Terms—social robot, user modeling, affective computing

I. INTRODUCTION

A social robot is an autonomous robot that interacts with people by engaging in social-emotive behaviors, skills, capacities, and rules related to its collaborative role [1]. Interactive robots therefore need to perceive and understand the richness and complexity of the user's natural social behavior, in order to interact with people in a human-like manner [1]. Detecting and recognizing human action and communication provides a good starting point, but more important is the ability to interpret and react to human behavior; a key mechanism for carrying out these actions is user modeling. User models serve a variety of purposes for the robot [2]. First, user models can help the robot understand an individual's behavior and dialogue. Second, they are useful for adapting the robot's behavior to the different abilities, experiences and knowledge of the user. Finally, they determine the form of control and feedback given to the user (for example, stimulating the interaction).
The aim of our work is to develop a general-purpose cloud-based application, called iRobot, for managing social, affective and adaptive components offering services for social robots (e.g., Sanbot Elf, Softbank Pepper, etc.). The first version of the application has been developed and tested for the Sanbot Elf robot (henceforth simply Sanbot), which acts as the client; this client-side application has been called iSanbot. Thanks to the cloud-based service, the robot is able to hold a conversation with users, to recognize them and handle a basic set of social relationship rules, to detect the user's emotions and to modify its behavior according to them. This last task is focused on real-time emotion detection and on how the detected emotions could change the human-robot interaction.

II. THE ROBOT APPLICATION

As a case study, we started from a scenario wherein the Sanbot robot welcomes people entering the hall of the Computer Science Department of the University of Turin. The robot must be able to recognize the people it has already interacted with, and to remember past interactions. The application starts a conversation with the user in natural language and, knowing the user's face and name as well as elaborated information (e.g., sex, age, predominant mood, etc.), has the goal of recovering the user's interests and acquiring all the user modeling components for the following interactions, so that the robot will be able to adapt its behavior.
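The welcome scenario just described can be summarized as a recognize-or-enroll loop. The following is a minimal illustrative sketch (the actual client is written in Java, and all names, greetings and data structures here are our own assumptions, not part of the implementation):

```python
def interact(face_image, recognizer, profiles):
    """One welcome interaction: recognize the user, then greet or enroll."""
    user_id = recognizer(face_image)  # hypothetical recognizer; None if unknown
    if user_id in profiles:
        profile = profiles[user_id]
        profile["interactions"] += 1  # remember past interactions
        return f"Hello {profile['name']}! Do you feel better today?"
    # Unknown user: start the registration dialogue
    return "Hello! I don't know you yet. What is your name?"

# Tiny usage example with a stubbed recognizer
profiles = {42: {"name": "Cristina", "interactions": 3}}
print(interact("img-a", lambda img: 42, profiles))    # known user greeting
print(interact("img-b", lambda img: None, profiles))  # enrollment prompt
```

The point of the sketch is only the control flow: recognition gates whether the dialogue recovers a stored profile or builds a new one.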
The architecture used for developing the iRobot application is a client-server one. At a high level, there are two main components: a client, written in Java, and a server, also written in Java and located on a dedicated machine. The client's goal is to interact with the user and to customize the user-robot interaction, while the server represents the robot's "brain" in the cloud. More in detail, the architecture has been implemented as follows. The client side, represented in the current scenario by the Sanbot robot hosting the iSanbot application, is compatible with any other robot equipped with a network connection and able to acquire audio and video streams. The server side, written in Java, acts as glue between the various software components, each of which provides a different service. In particular, the main services are: the face recognition, the emotion recognition and the user modeling service, and finally the speech-to-text service. The latter has been created using Wit (a free speech-to-text service); communication with it takes place through a simple HTTP request, sending the audio to be converted into text and waiting for the response. The face recognition service has been developed in Python, while the emotion recognition service has been developed in C# and implements the Affectiva libraries.

A. Face recognition service

First, we implemented the face recognition service. As soon as the user approaches, Sanbot welcomes her, takes a picture of her, and sends it to the Java server, which delegates the user's recognition to the Python component.
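The server's "glue" role described above amounts to routing each client request to the component implementing the requested service. A minimal sketch follows (the real server is in Java and the components run in Python and C#; the service names and stub handlers below are purely illustrative assumptions):

```python
# Stub handlers standing in for the real components (hypothetical names).
def recognize_face(payload):
    return {"user_id": None}            # stub: unknown face

def detect_emotion(payload):
    return {"emotion": "joy"}           # stub: would call the C# component

def speech_to_text(payload):
    return {"text": payload.upper()}    # stub: would POST the audio to Wit

SERVICES = {
    "face": recognize_face,
    "emotion": detect_emotion,
    "stt": speech_to_text,
}

def dispatch(service, payload):
    """Route a client request to the proper software component."""
    handler = SERVICES.get(service)
    if handler is None:
        return {"error": f"unknown service: {service}"}
    return handler(payload)

print(dispatch("stt", "hello"))  # {'text': 'HELLO'}
```

Keeping the routing table explicit is what lets new services (or new robots on the client side) be added without touching the other components.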
If the user has not been recognized, a new user is inserted into the user database. The user is associated with a set of attributes, such as name, surname, age, sex, profession, interests, moods, etc., to be filled in subsequently. Some of the attributes may be associated with an explicit question, while others may be implicitly inferred. Thus the client, during the user registration phase, asks the server what the next question is and then reads it to the user. Once the user has answered, the audio is sent to the Wit server and converted into text; once returned, the text is saved into the attribute associated with the question. For example, if Sanbot does not recognize the user, it asks the user her name; then, before taking a picture of her, it asks for permission to do so ("Do you want me to remember you the next time I see you?"); then it asks for her profession and her favorite color and sport, and stores all this information in the user database.
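The registration loop above (ask the server for the next question, read it, store the transcribed answer as a feature-value pair) can be sketched as follows. The attribute names and question wording are illustrative assumptions, not the application's actual dialogue:

```python
# Hypothetical ordered question list: (attribute, question to read aloud).
QUESTIONS = [
    ("name", "What is your name?"),
    ("profession", "What is your profession?"),
    ("favorite_color", "What is your favorite color?"),
    ("favorite_sport", "What is your favorite sport?"),
]

def next_question(profile):
    """Return the first question whose attribute is still unanswered."""
    for attribute, question in QUESTIONS:
        if attribute not in profile:
            return attribute, question
    return None  # registration finished

# Usage example: the answers would normally come from Wit's speech-to-text.
profile = {}
answers = iter(["Cristina", "researcher", "blue", "swimming"])
while (step := next_question(profile)) is not None:
    attribute, _question = step
    profile[attribute] = next(answers)

print(profile)
```

Driving the dialogue from the set of still-empty attributes is what makes the conversation resumable across interactions: a returning user is simply read the next unanswered question.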
The detected predominant emotions will also be inferred and stored, together with other biometric data, as described in the following. If, on the other hand, the face recognition component recognizes the user, all her stored information is recovered and the user is welcomed back (e.g., "Hello Cristina! Do you feel better today?"), together with the next question to read. When there are no more questions, the conversation ends.

As far as the robot's adaptive behavior is concerned, it is currently based on simple rules, which are used for demonstration purposes and which will be modified and extended in our future work. For example, in the current demo the robot carries out a more informal conversation with young people and a more formal one with older people: it is more playful with the young and more serious with the old.

For accessibility reasons, and for better understanding, the application replicates the whole conversation transcript on the Sanbot tablet screen, both the phrases spoken by Sanbot and those acquired from the user.
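The demonstration rules just described can be captured in a few lines. This is a sketch of the idea only; the age threshold and the register/tone labels are our own illustrative assumptions:

```python
def conversation_style(age_upper_bound):
    """Pick a conversational register from the estimated age range."""
    if age_upper_bound < 30:  # hypothetical cut-off between "young" and "older"
        return {"register": "informal", "tone": "playful"}
    return {"register": "formal", "tone": "serious"}

print(conversation_style(25))  # {'register': 'informal', 'tone': 'playful'}
print(conversation_style(55))  # {'register': 'formal', 'tone': 'serious'}
```

Encoding the adaptation as a pure function of user-model features keeps the rules easy to replace when more sophisticated inference is added later.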
B. Emotion recognition service

During the conversation with the user, the emotion recognition component is running. This component has been realized through a continuous streaming of frames directed towards the C# component, which analyzes each frame and extracts the predominant emotions. After that, the server sends a response as a JSON object to the client, which can therefore adapt its behavior and its conversation according to the rule associated with the dominant emotion. For instance, in our application, for demonstration purposes, Sanbot also changes its facial expression to the one associated with the user's mood.

Currently Sanbot is able to recognize six plus one emotions, also known as Ekman's universal emotions [3]: sadness, anger, disgust, joy, fear, surprise and contempt. The predominant emotion of a given interaction will be the one with the highest associated certainty factor. On the basis of the predominant emotion, returned after the greetings, Sanbot will adapt its behavior accordingly.

The Affectiva SDK and API also provide estimations of gender, age range, ethnicity and a number of other metrics related to the analyzed user. In particular, we use the information on gender and age range to implicitly acquire these data about the user and store them in the database. Since this information too is returned with a perceived intensity level, we use a certainty factor, as described in [4].

C. User modeling

Currently, we have created a very simple user model, organized as a set of feature-value pairs and stored in a database. The currently stored features are: user name, age range, gender, favorite sport, favorite color, profession, number of interactions, and predominant emotions. Some of the acquired user information is explicitly asked of the user (name, profession, sport, color), while other information (age, sex, emotions) is inferred through the emotion recognition component described above. Please note that explicit information such as favorite color and sport is used for mere demonstration purposes.

In the future we would like to extend the user features with other inferred features, such as kind of personality (e.g., Extraversion, Agreeableness, Neuroticism, etc.), kind of dominant user emotion, and a stereotypical user classification depending on socio-demographic information such as sex, age and profession. Concerning this last point, we will base our stereotypes on shared social studies that categorize the user's interests and preferences on the basis of socio-demographic information. We would also like to import part of the user profile by scanning the social web and social media [5], with the user's consent.

III. CONCLUSION

Our cloud-based service to model the user and her emotions is just at the beginning. We are now working with the Pepper robot to replicate the same scenario implemented on Sanbot. In the future we would like to replace Affectiva with a similar service we are developing, extend the user model, integrate more sophisticated rules of inference and adaptation, and improve the natural language processing component.

REFERENCES

[1] C. L. Breazeal, Designing sociable robots. MIT Press, 2004.
[2] T. Fong, I. Nourbakhsh, and K. Dautenhahn, "A survey of socially interactive robots," Robotics and Autonomous Systems, vol. 42, no. 3-4, pp. 143–166, 2003.
[3] P. Ekman, W. V. Friesen, and S. Ancoli, "Facial signs of emotional experience," Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1125–1134, 1980.
[4] C. Gena, F. Cena, C. Mattutino, and M. Botta, "Cloud-based user modeling for social robots: A first attempt (short paper)," in Proceedings of the Workshop on Adapted intEraction with SociAl Robots, cAESAR 2020, Cagliari, Italy, March 17, 2020, ser. CEUR Workshop Proceedings, B. N. De Carolis, C. Gena, A. Lieto, S. Rossi, and A. Sciutti, Eds., vol. 2724, 2020, pp. 1–6.
[5] G. Sansonetti, F. Gasparetti, A. Micarelli, F. Cena, and C. Gena, "Enhancing cultural recommendations through social and linked open data," User Modeling and User-Adapted Interaction, vol. 29, no. 1, pp. 121–159, 2019.