Is Two Better than One? Effects of Multiple Agents on User Persuasion

Reshmashree B. Kantharaju, ISIR, Sorbonne Université, Paris, France
Dominic De Franco, School of Science and Engineering, University of Dundee, Dundee, UK
Alison Pease, School of Science and Engineering, University of Dundee, Dundee, UK
Catherine Pelachaud, CNRS - ISIR, Sorbonne Université, Paris, France

arXiv:1904.05248v1 [cs.HC] 10 Apr 2019

ABSTRACT

Virtual humans need to be persuasive in order to promote behaviour change in human users. While several studies have focused on understanding the numerous aspects that influence the degree of persuasion, most of them are limited to dyadic interactions. In this paper, we present an evaluation study focused on understanding the effects of multiple agents on user persuasion. Along with gender and status (authoritative and peer), we also look at the type of focus employed by the agent, i.e., user-directed, where the agent aims to persuade by addressing the user directly, and vicarious, where the agent aims to persuade the user, who is an observer, indirectly by engaging another agent in the discussion. Participants were randomly assigned to one of the 12 conditions and presented with a persuasive message by one or several virtual agents. A questionnaire was used to measure perceived interpersonal attitude, credibility and persuasion. Results indicate that credibility positively affects persuasion. In general, the multiple agent setting, irrespective of the focus, was more persuasive than the single agent setting. Although participants favored the user-directed setting, reported it to be persuasive and had an increased level of trust in the agents, the actual change in persuasion score shows that the vicarious setting was the most effective in inducing behaviour change. In addition, the study revealed that authoritative agents were the most persuasive.

CCS CONCEPTS

• Human-centered computing → Empirical studies in HCI;

KEYWORDS

Persuasion, Role, Evaluation, Multi-agent, User

ACM Reference Format:
Reshmashree B. Kantharaju, Dominic De Franco, Alison Pease, and Catherine Pelachaud. 2018. Is Two Better than One?, Effects of Multiple Agents on User Persuasion. In IVA '18: International Conference on Intelligent Virtual Agents (IVA '18), November 5–8, 2018, Sydney, NSW, Australia. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3267851.3267890

1 INTRODUCTION

Intelligent virtual agents (IVAs) have been incorporated into socio-technical systems which perform socially valuable functions, such as teaching [46], social coaching [1], and healthcare decision support [16]. As advances are made in their communicative abilities and social behaviour, the potential application and use of IVAs will shift away from a model of AI-as-tool to that of AI-as-assistant, AI-as-collaborator and AI-as-coach. Developing systems with an agenda, or persuasive systems, is now an active area of research [10], with one such class known as Behaviour Change Support Systems (BCSS). These are developed for the purpose of openly helping individuals or groups to change their behaviour: a BCSS is "a socio-technical information system with psychological and behavioural outcomes designed to form, alter or reinforce attitudes, behaviours or an act of complying without using coercion or deception" [36]. Persuasion, in which attempts are made to create, modify, reinforce, or extinguish a user's beliefs, attitudes, intentions, motivations, or behaviour [20], is often integral to such systems. To develop persuasive BCSSs and measure their effectiveness, it is useful to draw on studies in argumentation and rhetorical strategies to understand the roles that group dynamics, types and characteristics of both persuader and persuadee play. In this paper, we aim to understand the effects of having multiple agents on user persuasion through an evaluation study that varies the agent's gender, status and focus in both single and multiple agent settings. Results from this study will help us develop IVAs that can handle conversations successfully and are effective in changing user behaviour.

2 RELATED WORK

2.1 Agent Characteristics and Persuasion

Appearance plays an important role in influencing the user. Individuals are more influenced by agents who are similar to themselves with respect to appearance-related characteristics [3]; and in the context of a learning environment, an anthropomorphic agent with a human voice led to greater perceptions of agent credibility. For self-regulation and efficacy, gender played an important role. High self-regulation and self-efficacy were observed in students who worked with a mentor or motivator.

Real-world gender stereotypes have been shown to be projected onto computing environments [12] and have been shown to apply to animated agents [35]. Studies have shown that observers tend to be more influenced by an agent of the same gender [6, 21]. A male instruction agent was preferred to work with over a female instruction agent [29]. Undergraduate women tended to choose to 'learn about engineering' from agents who were male and attractive, but uncool, conforming to the stereotypes. Students who worked with female agents showed higher self-efficacy beliefs than students who worked with the male agents because they perceived female agents as less intelligent [6]. Studies have shown that female agents are subjected to more negative descriptions [22, 43]. In [30], a visually androgynous teachable agent was used to study the influence of ascribed gender on perceived personality characteristics of the agent. Results revealed that the agent that was perceived as a girl received fewer positive words and more negative words than the same agent when it was perceived as a boy. In [31], virtual agents were used to enhance participants' performance, effort and motivation in mathematics. Results revealed that performance and effort were significantly enhanced when interacting with an agent of the opposite gender.

The effect of gender on persuasiveness (trust, credibility and engagement) was studied in [42] using a robot and manipulating only the voice. Cross-gender interaction was observed for trust and credibility, i.e., male participants trusted the female robot more and vice versa. Participants donated more often to a female robot than to a male robot. Further analysis revealed that male participants had a higher tendency to donate to a female robot while female participants had no preference. Men rated female robots as more credible and women rated male robots as more credible. The effects of gender and realism on persuasion were reported in [49]. Participants were presented with a persuasive message regarding four topics, delivered by a male or female human, virtual human (i.e., an anthropomorphic agent), or virtual character (i.e., a 3D agent with ogre- or cat-like appearance). This study showed that participants found the virtual characters used in the study as persuasive as real humans. Visual realism of the speakers did not have an effect on the degree of persuasion. However, male participants were more persuaded by the female speakers than the male speakers, and female participants were more persuaded by the male speakers than the female speakers.

Based on the literature, we can say that agent gender and status are significant attributes in interactions. Most of the existing studies focus on the effects of agent appearance and personality in learning environments and on persuasion in dyads.

2.2 Argumentation and Rhetorical Strategies

While much of the development in BCSSs has focused on subconscious methods of persuasion, recent theories such as the Transforming Sociotech Design model [45] have shifted the emphasis towards more permanent belief and behaviour change. As such, the dialogue and argumentation within it have an increasingly important role in facilitating these transformations. To assist with incorporating argumentation concepts into BCSSs, a matrix has been developed [14]. One of the most crucial of these concepts was introduced by Aristotle [5]: the three fundamental modes of persuasion. Ethos is primarily concerned with the character of the person making the argument and persuades by emphasizing the authority or credibility of the persuader. Pathos relies on eliciting an emotional response from the persuadee and persuades with this appeal to emotion. Logos is an appeal to reason and utilises evidence and sound argument to persuade.

The effect that the number of sources presenting a persuasive message has on attitude change has been studied from an information-processing view [23]. Harkins and Petty found that increasing the number of sources of a message increases thinking about the message content. In one experiment, subjects exposed to 3 compelling arguments presented by 3 different people were more persuaded than subjects exposed to the same number of arguments presented by just one person (and conversely, subjects exposed to 3 weak arguments presented by 3 sources were less persuaded than subjects exposed to the same number of arguments presented by just one person). The authority, credibility, importance or popularity of a speaker is well recognised in argumentation studies as affecting people's willingness to accept an argument. This is reflected in Aristotle's Ethos, as well as in common patterns of argumentation such as argument from authority, from [...], and from expert opinion [47].

An important part of rhetoric is who the audience is. In many models of argument, the persuadee is considered to be an active participant in the argument, or a member of an audience in the case of a single orator. However, this is not always the case. The US televised Presidential Debates feature an argument-as-performance model, in which the goal of the debate is to persuade the audience rather than the opponent (in a recent study, 29% of people surveyed said the presidential debates were more helpful in helping them decide how to vote than anything else [24]). The legal system, in which lawyers argue apparently to persuade each other but in reality to persuade a passive jury, is another example of this model. These show the power of vicarious persuasion: the process where the aim is to persuade the audience rather than the person with whom a proponent is directly engaged in discussion. While understudied in argumentation, this power has been documented elsewhere, for instance in the domain of teaching, where vicarious experiences (in which an individual observes another individual teach) are one of the main sources that influence a preservice teacher's perceived self-efficacy [4].

2.3 Measuring the Effectiveness of a BCSS

The most influential approach for measuring the effectiveness of a BCSS in changing behaviour is the Persuasive Systems Design (PSD) Framework [37]. Based on the work of Fogg [19], it suggests that the development of persuasive systems consists of understanding key issues, analysing the persuasion context, and designing and analysing the system. To analyse a system, experts examine the BCSS against twenty-eight design principles. These principles are split into four categories: primary task (personalising the system to the user, reducing effort on them and allowing them to self-monitor progress), dialogue (implementing computer-to-human dialogue to help users move towards their goal, including praise, rewards, reminders and suggestions), system credibility (aimed at making a system more persuasive by increasing its credibility through trustworthiness, expertise and authority) and social support (motivating users through social factors such as facilitation, cooperation and competition).

Another method of measuring the persuasiveness of a BCSS after a period of user testing is to issue the user with a questionnaire. An example of this is the Perceived Persuasiveness Questionnaire (PPQ) [33]. The PPQ was composed of twenty-one questions relating to demographics, primary task support, dialogue support, perceived credibility, perceived persuasiveness, design aesthetics, unobtrusiveness and intention to continue the program. The three questions from the primary task section were utilised and adapted, as were the three questions from the perceived persuasiveness section. A further three questions were taken from the perceived credibility section, although questions regarding professionalism and confidence were not relevant in the current experiment's scenario. A combination of these two approaches has generally been followed since De Jong et al. showed consistent results between an expert PSD analysis, PPQ results and an analysis of user-test transcripts. This also matched user log data, which showed consistent use of the BCSS studied [9].

To measure how persuasive a system is, we must also take into account how persuadable the individual participants in our experiments are, since this factor alone could drastically vary the results of any research. Persuadability has been defined as an individual's susceptibility to persuasive strategies and principles [27]. Based on Cialdini's [13] six principles of influence (reciprocity, commitment and consistency, social proof, authority, liking and scarcity), Kaptein et al. [26] developed a 7-item persuadability instrument to determine participants' persuadability scores. Each item on the questionnaire is measured using a 7-point Likert scale, and the average score is used to group participants into persuasion profiles ranging from low to high.

One must also consider the beliefs a user holds prior to a persuasive encounter and how the BCSS changed these beliefs. A useful way to measure belief change was suggested by Andrews et al. [2], in which users rank their preferences before and after a persuasive interaction. A measure of persuasion is constructed and normalised from the difference between the two rankings.

3 CURRENT STUDY

The main objective of our experimental study is to understand the effects of using multiple agents in persuading users. We considered the agent's characteristics (gender, and status as displayed through different verbal and non-verbal behaviours) and focus (vicarious vs. user-directed). We analysed the way these factors are perceived along the dimensions of credibility and interpersonal attitude. Through this study, we evaluate the effect of the following:

(1) the gender of the agents delivering the persuasive message (male / female)
(2) the number of agents delivering the persuasive message (1 speaker delivering 6 arguments / 2 speakers delivering 3 arguments each)
(3) the status of the agents delivering the persuasive message (authoritative / peer)
(4) the focus of the agent delivering the persuasive message (user-directed / vicarious)

Through these we aim to answer the following research questions:

(1) How significant is the effect of the agent's gender and status on persuasion?
(2) Can the use of multiple agents have a better positive outcome in persuading the users than having a single agent?
(3) Can vicarious persuasion be more effective than user-directed persuasion?

4 STIMULUS AND QUESTIONNAIRES

In this section, we describe our experimental setup: the topic of persuasion, the agents used to present the persuasive dialogue and our design of the content.

4.1 Topic

In order to avoid personal biases, the topic of discussion had to be as neutral as possible, while still being popular and broad. With these criteria in mind, we selected films as our discussion topic. To avoid any biases that may occur if our participants had previously watched any of the films, we created our own film descriptions. To ensure that the film descriptions themselves would not bias the outcome of our study, we selected three different genres. A recent study on gender stereotypes in film [48] revealed the most popular and gender-neutral genres. Once we had filtered for country and cultural biases, we were left with our 3 film genres: Comedy, Crime and History. The description of each of the 9 films followed a similar structure, and we kept the film titles and the language of the film descriptions as neutral as possible.

4.2 Agent Appearance and Status

Appearance, animations, affect and voice play an important role in defining the personality of the virtual agent [6]. For this study, we designed four characters that differed in gender and status. We manipulated the visual appearance, non-verbal behaviours and linguistic style to fit the roles of an authoritative and a peer agent. The design of the appearance of the agents was largely based on literature that studied the effects of gender and status of virtual agents in motivating and learning environments [6]. The authoritative agent was designed to fit the role of a film critic and to be perceived as an expert in films. Research shows that expertise in humans requires several years of deliberate practice in a domain [17]. Hence, we modeled the agent to appear to be in their late forties and dressed formally in a professional manner. The peer agent was designed to fit the role of a film enthusiast who enjoys watching films, and appeared as a casually dressed student in their early twenties. The appearances of the agents were designed using the Autodesk Character Generator software (cf. Figure 1).

4.3 Behaviours

Body behaviour plays an important role in persuasive discourse and has been studied extensively [28, 39, 40]. Although there are no particular gestures that could be categorised as persuasive, some gestures have a persuasive effect as they convey some of the information required in the persuasive structure of discourse [40]. This information includes importance, certainty, evaluation, sender's benevolence, sender's competence and emotion. Hence, in this study, along with verbal content, we also incorporated non-verbal behaviours that were expressed by the virtual agents. We chose

to characterize agents' behaviours depending on their status only, either authoritative or peer. We did not differentiate the behaviour computation for the age or gender variables. As the agents do not differ much in age, we can assume their behaviour is not affected by this variable. We did not want to fall into stereotypes and assign typically gendered behaviours, which is why we did not consider the gender variable either.

Figure 1: Agent appearance

We define the peer agent as warm and friendly, and the authoritative agent as competent and dominant. The chosen behaviours for the agents were based on literature on persuasion in humans [28, 40]. In particular, authoritative agents display Kendon's ring [28] and the deictic finger point (towards the interlocutor) [40]. On the other hand, peer agents made use of more vague gestures such as the beat open hand. We also relied on studies that looked at the effects of behaviour on warmth and competence as well as attitude in virtual agents [8, 15]. These studies informed us on the rest pose, facial expressions [15], and preferences to use either beat gestures (rhythmic gestures not related to the semantic content of the speech) or ideational gestures (more complex gestures related to the semantic content of the speech) [8]. Table 1 provides an overview of the variables we manipulated. We made use of the virtual agent platform Greta [38] with Unity3D for generating the animations of the virtual agent, and the CereProc TTS voice synthesizer to generate the audio for the virtual agents.

Table 1: Overview of the distinctive characteristics for authoritative and peer agents.

| Parameters | Authoritative | Peer |
|---|---|---|
| Appearance | Aged; Formal clothing | Young; Casual clothing |
| Facial Expressions | Frown; Corner lip down | Eyebrow raise; Open |
| Frequency of Smile | Low | High |
| Rest Position | Akimbo | Arms crossed |
| Gestures | Ring; Deictic finger pointing | Beat open hand |

4.4 Dialogues

Our dialogues utilised the three modes of persuasion (see Sec. 2.2). Each dialogue began with an introduction, before six arguments were made in front of the user. Two arguments were based on ethos, two on logos and two on pathos. Each dialogue then ended with a wrap-up section to conclude it. These dialogues were adapted for our variables, giving us a dialogue for a single peer agent, a single authority agent, multiple agents using direct persuasion and multiple agents using vicarious persuasion. In the multiple agent scenarios using direct persuasion, the agents were both directing their arguments at the user, with each using one ethos, one pathos and one logos argument. This scenario was split into three cases where we had two peer agents, two authority agents and one of each. To distinguish between peer and authority approaches, we incorporated ethos into the authority agent, used more formal language and vocabulary, and utilised more credible sources.

For the multi-agent dialogues using vicarious persuasion, one agent tried to persuade the other. This scenario was further split into four cases. In each case the persuading agent was the only one making an argument, while the other agent was listening and appearing as if they were being persuaded.

Table 2: Examples of the dialogues used.

| Scenario | Dialogue |
|---|---|
| Ethos, Peer, Comedy | I was listening to the radio the other night when the movie segment came on. The movie critic Alex Garner said that [film name] was the funniest movie he's ever seen. |
| Ethos, Authority, Comedy | I noticed in the film section of this morning's paper that their film critic Alex Garner gave [film name] the highest ever rating for a comedy. |
| Pathos, Peer, History | I was a wreck by the end of this film, [film name] had me high, then low, it was like a roller-coaster! |
| Logos, Peer, Crime | If I were you I'd see this movie because it's full of suspense and drama as well as being a good old fashioned crime movie. So you get a lot of bang for your buck. |

Argument structures both within and between the three genres were identical. Dialogues for each specific film differed only by film name, an incident, and a realisation, which were specific to each film. These were introduced systematically, with two references to the film name, one realisation and one incident for each genre and each argument type, in order to personalise the argument and avoid repetition for the participants. An example of the dialogues used can be found in Table 2.

4.5 Evaluating Persuadability and Persuasiveness

We used Kaptein's questionnaire on persuadability; however, because it was written in the context of buying patterns and products, we adapted it slightly to involve characters discussing films. The first three questions were omitted from the persuadability measure in this experiment as they were not considered relevant to film choices and we wanted to avoid potentially overloading the participants with questions. We added a question concerning advice from trusted websites, as the ethos of the characters and the sources they recommended was a crucial part of this experiment. We evaluated both perceived persuadedness, as measured by a questionnaire based on the PPQ (described in Sec. 2.3), and actual persuadedness, as measured by rating beliefs before and after interaction with the system. We adapted the ranking study described above (Sec. 2.3) to use ratings as a number between -4 and 4, to give a more fine-tuned measure than the rankings used in [2]. (The actual ratings were between 1 and 5. We measured the change in this rating before and after the persuasive message, and this change ranges from -4 to +4.)

5 EXPERIMENT

5.1 Design

The experiment is based on a 2 x 2 x 3 design. The variables include agent gender (male vs. female), status (authoritative vs. peer) and focus (multiple agent user-directed vs. multiple agent vicarious vs. single agent). Since we are also studying the effects of gender, in the multiple agent condition a male agent and a female agent were present and only the status is altered. In vicarious persuasion, the status of both speaker and addressee agent is always the same and only the gender is altered. Table 3 provides the overview of the twelve conditions used in the study. (The verbal and non-verbal content remains the same across gender and conditions and varies only according to status.)

Table 3: Overview of the twelve randomly allocated experimental conditions; F: Female, M: Male, A: Authoritative, P: Peer.

| Focus | Composition | Condition | Participants |
|---|---|---|---|
| Single Agent | FA | C1 | 18 |
| Single Agent | MA | C2 | 18 |
| Single Agent | FP | C3 | 18 |
| Single Agent | MP | C4 | 15 |
| Multiple Agent (vicarious) | FA persuades MA | C5 | 18 |
| Multiple Agent (vicarious) | MA persuades FA | C6 | 17 |
| Multiple Agent (vicarious) | FP persuades MP | C7 | 18 |
| Multiple Agent (vicarious) | MP persuades FP | C8 | 16 |
| Multiple Agent (user-directed) | FA + MA | C9 | 18 |
| Multiple Agent (user-directed) | FP + MA | C10 | 18 |
| Multiple Agent (user-directed) | FP + MP | C11 | 15 |
| Multiple Agent (user-directed) | FA + MP | C12 | 16 |

5.2 Pre-Study Evaluation

To assess the dialogues and in particular their mode of persuasion, we recruited two experts in argumentation and discourse analysis. The first expert (E1) had over 20 years of experience in the communicative processes of argumentation, dialogue and persuasion, and the second one (E2) had 3 years of experience as a post-graduate in the same area. We showed them our two mini-arguments for each mode and each of three film genres, i.e., 18 mini-arguments. These were presented in written form, in random order within the genre categories. We asked our experts to independently categorise the persuasion mode of each mini-argument as either ethos, pathos or logos, and then gave them our model answers and asked them to check their categorisations against these and to let us know of any discrepancies and comments. Since we focused on persuasion mode, we used only one status type: in this case all arguments were from our peer example.

E1 warned us about attributing a single mode (ethos, pathos, logos) to dialogues, and highlighted the impact that the order in which different argument types are presented can have. E2 thought that some mini-arguments contained more than one mode of persuasion, and sent back a fully annotated set of the 18 arguments, with different parts of a mini-argument labelled ethos, pathos or logos. Based on the advice from these two experts, we simplified our mini-arguments to ensure that each argument was solely focused on one mode of persuasion. To avoid the impact that E1 warned us of, in the main experiment we presented these to participants in a randomised order.

To assess the agents' appearance and behaviours, we created a pre-study questionnaire and ran a between-subject study with a group (n = 12) consisting of experts and naïve participants. We used static images of the four agents designed for the study and used a set of attributes to collect the first impression. The attributes included: above/below 30, student/professional, competent, friendly. For the authoritative agents, the attributes selected included above 30, professional and competent, and for peer agents below 30, student and friendly.

To assess the agents' behaviour, we created two separate video clips using the same neutral female agent (its appearance is not modeled to fit any status) displaying the non-verbal behaviours corresponding to both the authoritative agent and the peer agent (ref. Sec. 4.3). A 5-point Likert scale was used to measure friendliness and competence. For the non-verbal behaviours, results showed a significant difference in agent friendliness (p < 0.05), and even though the authoritative agent was measured to be competent, the difference was not significant (p > 0.05). With this study we ensured that the modeled agents fit their status in both appearance and non-verbal behaviours.

5.3 Method

5.3.1 Dependent Variables. The dependent variables measured include interpersonal attitude, credibility and persuasion (ref. Sec. 2.3). We use a five-point Likert scale ranging from 1 (Disagree) to 5 (Agree). To measure the interpersonal attitude, we make use of the interpersonal circumplex proposed by Leary [32]. The interpersonal circumplex is 2-dimensional, where affiliation (friendliness vs. hostility) is represented on one axis and status (dominance vs. submission) on the other axis. In total, the circumplex is divided into eight segments and we chose one adjective from each, namely Assertive, Helpful, Warm, Un-authoritative, Timid, Distant, Arrogant and Forceful. Credibility was measured using 3 items developed by Kaptein et al. [26] on a 5-point Likert scale measuring perceived trustworthiness, reliability and expertise of the agent, and perceived persuasiveness was measured using a 3-item questionnaire.

5.3.2 Sample. For this study we collected responses in two stages. Initially, 282 participants were recruited from Crowdflower. 156 responses were removed from the collected data due to inconsistencies and non-naïvety, as several participants did not adhere to the instructions and responded multiple times, and we considered these responses not to be genuine. We also collected 79 responses by contacting respondents through mailing lists. In total, we had 205 participants, of whom 55% were male (n = 113) and 45% were female (n = 92). 46% of the participants were aged 21-30 years, 22% were 31-40, 15% were 41-50 and 14% were above 50 years old. The participants came from different cultural backgrounds, with the three most prominent groups from North America (37%), Europe (27%) and Asia (20%).

5.3.3 Procedure. The participant began the study by filling in the demographics data, i.e., age, gender and education level, followed by accepting the consent form. The study is divided into three main steps: (1) pre-questionnaire, (2) answering a questionnaire after watching a video clip with persuasive dialogue (collected 3 times, i.e., once per film genre), (3) post-questionnaire. The pre-questionnaire is designed to measure the extent to which the participant is persuadable. Along with this, the participant also provides information about overall openness and comfort towards technology and interest in films. A short introductory clip was designed using a virtual agent who presented the study. The agent appeared to be in its 30s and its appearance was smart casual. This was done in order to familiarize the participants with the animations of the virtual agents and to avoid collecting responses based on the first impression generated.

The users are first presented with a short textual description of three films of a given genre and asked to rate the likeliness of watching each film. Once the ratings are provided, the user is assigned randomly to one of the 12 conditions specified above and presented with a persuasive video clip about the film. Since we want to measure the persuasion of the user, we opted to show the clip corresponding to the film that received the lowest rating from the user. The clip is generally 60 to 90 seconds long and consists of virtual characters presenting opinion and information about the film. The participants were then asked to rate the likeliness of watching the film again, followed by a questionnaire to measure attitude, perceived credibility and perceived persuasiveness. This step is repeated for the other two film genres. The condition does not differ between film genres and remains the same throughout the experiment. Finally, a post-questionnaire is used to measure persuasiveness, trust in the agents, overall satisfaction and intention to continue using the system.

5.4 Results

In this section we present the results of the study and report only the significant results from ANOVA (1-way and n-way).

5.4.1 Perceived Attitude. Attitude was measured using eight adjectives from Leary's interpersonal attitude circumplex (ref. Sec. 5.3.1). For each participant, we collected three responses, one after each film genre. Since the difference between the three responses was not statistically significant, we averaged the three responses to simplify the analysis. In terms of the user's gender, male users reported agents to be more "distant" in comparison to female users (p = 0.029), and also perceived the agents to be more arrogant (p = 0.030) and forceful (p = 0.008). There was no significant effect of the user's gender on the rest of the attributes. In terms of the agent's status, authoritative agents were considered to be more forceful (p = 0.024).

5.4.2 Credibility. There was a statistically significant difference in the credibility of the agents: authoritative agents were considered more credible (p = 0.0006) than peer agents, i.e., more trustworthy (p = 0.003), more reliable (p = 0.0007) and showing more expertise (p = 0.001). Male users reported agents to be more credible than female users did (p = 0.025). There was no significant effect of the agent's gender on the perceived credibility. There was a statistically significant difference in credibility (p = 0.0003) over the twelve conditions. We further performed a post-hoc test with Bonferroni correction. There was a statistically significant difference between the following pairs: (C6 - C11: p = 0.029), (C9 - C11: p = 0.002), (C12 - C11: p = 0.0007). Figure 2 shows the mean perceived credibility value for each condition.

5.4.3 Persuasion. Participants reported higher persuasion with the multiple agent setting (C5 to C12) in comparison with the single agent setting (p = 0.05). Male users reported higher persuasion (p = 0.015) compared to their female counterparts. However, the agent's gender did not have any significant effect on persuading the user. Authoritative agents were reported to be more persuasive than peer agents (p = 0.0004). There was a statistically significant difference in persuasion (p = 0.00002) over the twelve conditions. We further performed a post-hoc test with Bonferroni correction. There was a statistically significant difference between the following pairs: (C4 - C9: p = 0.048), (C4 - C12: p = 0.016), (C6 - C11: p = 0.015), (C9 - C11: p = 0.001), (C12 - C11: p = 0.0004). Figure 2 shows the mean perceived persuasion value for each condition. Further, participants who scored as 'easily persuadable' reported being more persuaded (p = 0.0175) than those who scored as 'not easily persuadable'.

6 DISCUSSION

The main focus of this research work was to understand whether having multiple agents was more effective than having a single agent in a persuasion task. We also wanted to understand the effects of gender and status on user persuasion. In order to study this, we had three settings: single agent user-directed, multiple agents user-directed and multiple agents vicarious.

The likeliness score of watching a film before and after the persuasive clip is an indication that agents were successful in persuading the users to reconsider their decision about wanting to watch a film. 153 participants reconsidered their rating at least once and increased it. Participants who were grouped under 'easily persuadable' (n = 150) reported significantly higher persuasion from both authoritative and peer agents, and the change in scores indicates the same. The agent's status did not have any effect on participants grouped under 'not easily persuadable' (n = 55); however, the vicarious setting was more effective in persuading them.

Authoritative agents were reported to be more credible regardless of the gender of the agent, and participants reported a higher level of trust in the information provided by them, indicating that the authoritative agents were perceived as competent [18]. This is in line with [7], where expert-like agents were perceived to be more credible.
Additionally, they were also reported to be more compared to peer agents but they were also considered to be more persuasive than peer agents. In [25], the expertise of the agent in- helpful(p = 0.0034). There was no significant difference with regard fluenced the perceptions of credibility, and credibility mediated the to agent’s gender on the perceived attitude. Agents were considered influence of the agent’s expertise on persuasion. to be warmer (p = 0.041) and helpful (p = 0.030) in multiple agent While status played an important role, agent’s gender did not setting over single agent setting. have any significant effect. In previous studies6 [ , 21], gender had Is two better than one? IVA ’18, November 5–8, 2018, Sydney, NSW, Australia

a role in persuasion. However, in our study, gender was simply differentiated by the appearance of the agent and there was no other difference at the behaviour level or at the interaction level, which can explain why there was no significant effect of gender.

Figure 2: Mean rating of perceived credibility (p = 0.0003; C1-C4: Single agent, m = 2.894; C5-C8: Vicarious, m = 2.936; C9-C12: Multiple agent, m = 2.966) and perceived persuasion (p = 0.00002; C1-C4: Single agent, m = 2.882; C5-C8: Vicarious, m = 2.969; C9-C12: Multiple agent, m = 3.028) over 12 conditions.

The main finding of this evaluation study is that a multiple agent setting was more effective than a single agent setting. The persuasiveness questionnaire revealed that participants reported being more influenced by the user-directed multiple agent setting. However, we measured the mean change in rating for each condition, and this revealed that the vicarious setting was more effective in persuading the user to change their score than the user-directed setting (cf. Table 4). In particular, authoritative agents were more effective in the vicarious setting than in the single agent setting (cf. Figure 3). Since the difference between the three settings was not statistically significant, we suggest that there is a strong tendency in the result that needs to be further verified with more participants.

Table 4: Mean value of change in likeliness score of watching a film (before & after the persuasive clip) and the self-reported persuasiveness for the three conditions.

Focus                           Change in score   Self-report
Single Agent                    0.285             2.882
Multiple Agent (Vicarious)      0.604             2.969
Multiple Agent (User Directed)  0.413             3.028

Additionally, the agents in the multiple setting (user-directed) were considered to be more credible than a single agent, and users also reported that they would consult the agents again and would recommend them to friends. This setting was also more helpful and users reported high satisfaction. 39% of the users in the single agent setting preferred to have multiple agents with different perspectives, while only 16% preferred to have the one agent condition. From the above results it is quite evident that the multiple agent condition is indeed more effective, in particular when vicarious persuasion is used.

Our results on effects of settings are in line with human studies from social cognitive science. In [34] it is argued that verbal persuasion by a single person is less efficient than vicarious experience on self-efficacy and behavioural change. Studies on interactive narrative systems also report that users are more influenced and engaged when experiencing vicarious social relationships and emotional responses than when experiencing events from their own direct environment [44]. Moreover, these studies underline how the effect of persuasion depends on the level of identification of the users with the interaction content. In our study, participants who reported being ‘easily persuadable’ did report high persuasion.

Further, the association between perceived credibility and perceived persuasion was observed using Spearman’s rank correlation coefficient (cf. Figure 2). We observe that perceived credibility positively affects the perceived persuasion (r_s = 0.92, p < 0.01). This tendency has been studied in detail in [11, 33, 41], where credibility is linked with persuasion. We can conclude that a credible agent can be effective for promoting behaviour change in a multiple agent setting.

Figure 3: Interaction of agent status∗focus on change in rating (p = 0.035). Authoritative agents were more persuasive in multiple-agent setting (vicarious) than in single-agent setting. [Axes: Focus (Single, Vicarious) against Change in score; legend: Agent's Role (Authoritative, Peer).]

7 LIMITATIONS AND FUTURE WORK
Although the sample size for this study was relatively small, results indicate a significant relation between the type of persuasion used and the agent attribute, i.e., status, on user’s persuasion. Future work will include collecting a larger sample of data and performing a detailed analysis focusing on the effects of status and persuasion type with multiple agents. Also, we aim to make the study more interactive, where the participant will be able to communicate with the agents, and to focus on studying the effectiveness of agents in various other domains, e.g., healthcare, education.

ACKNOWLEDGMENTS
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant Agreement Number 769553. This result only reflects the authors’ views and the EU is not responsible for any use that may be made of the information it contains. We are grateful to members of the Centre for Argument Technology at the University of Dundee for discussion of a previous draft, and in particular to Rory Duthie and Katarzyna Budzynska for detailed comments. We would also like to thank our anonymous reviewers for their thoughtful reviews.

REFERENCES
[1] K. Anderson, E. André, T. Baur, S. Bernardini, M. Chollet, E. Chryssafidou, I. Damian, C. Ennis, A. Egges, P. Gebhard, et al. 2013. The TARDIS framework: Intelligent Virtual Agents for social coaching in job interviews. In Advances in computer entertainment. Springer, 476–491.
[2] P Andrews and S Manandhar. 2009. Measure of belief change as an evaluation of persuasion. In Proc. Persuasive technology and digital behaviour intervention symposium. 1.
[3] J N Bailenson, J Blascovich, and R E Guadagno. 2008. Self-representations in immersive virtual environments. Journal of Applied Social Psychology 38, 11 (2008), 2673–2690.
[4] A. Bandura. 1977. Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84 (1977), 191–215.
[5] J Barnes et al. 2014. Complete Works of Aristotle, Volume 1: The Revised Oxford Translation. Vol. 1. Princeton University Press.
[6] A L Baylor and Y Kim. 2004. Pedagogical agent design: The impact of agent realism, gender, ethnicity, and instructional role. In Proc. International Conference on Intelligent Tutoring Systems. Springer, 592–603.
[7] A L Baylor and Y Kim. 2005. Simulating instructional roles through pedagogical agents. International Journal of in Education 15, 2 (2005), 95–115.
[8] B Beatrice, A Cafaro, and C Pelachaud. 2017. Investigating the Role of Gestures, Arms Rest Poses and Smiling in First Impressions of Competence. In Proc. 3rd WS on Virtual Social Interaction.
[9] N Beerlage-de Jong, MJ Wentzel, S M Kelders, H Oinas-Kukkonen, and J EWC van Gemert-Pijnen. 2014. Evaluation of perceived persuasiveness constructs by combining user tests and expert assessments. In Proc. Second International Workshop on Behavior Change Support Systems, BCSS 2014. CEUR-WS.org.
[10] I. Benbasat. 2010. HCI Research: Future Challenges and Directions. In AIS Transactions on Human Computer Interaction, Vol. 2. 16–21.
[11] J. Burgoon, T. Birk, and M. Pfau. 1990. Nonverbal behaviors, persuasion, and credibility. Human communication research 17, 1 (1990), 140–169.
[12] L L Carli. 2001. Gender and social influence. Journal of Social Issues 57, 4 (2001), 725–741.
[13] R B Cialdini and R B Cialdini. 2007. Influence: The psychology of persuasion. Collins New York.
[14] D De Franco, A Pease, and M Snaith. 2018. Measuring Persuasiveness in Behaviour Change Support Systems. In Proc. Sixth International Workshop on Behavior Change Support Systems (BCSS 2018). BCSS.
[15] S Dermouche and C Pelachaud. 2016. Sequence-based multimodal behavior modeling for social agents. In Proc. of the 18th ACM International Conference on Multimodal Interaction. ACM, 29–36.
[16] D. DeVault, R. Artstein, G. Benn, T. Dey, E. Fast, A. Gainer, K. Georgila, J. Gratch, A. Hartholt, M. Lhommet, et al. 2014. SimSensei Kiosk: A virtual human interviewer for healthcare decision support. In Proc. International conference on Autonomous agents and multi-agent systems. 1061–1068.
[17] K A Ericsson, R T Krampe, and C Tesch-Römer. 1993. The role of deliberate practice in the acquisition of expert performance. Psychological review 100, 3 (1993), 363.
[18] S T Fiske, A JC Cuddy, and P Glick. 2007. Universal dimensions of social cognition: Warmth and competence. Trends in cognitive sciences 11, 2 (2007), 77–83.
[19] B J Fogg. 2002. Persuasive technology: using computers to change what we think and do. Ubiquity 2002, December (2002), 5.
[20] R H Gass and J S Seiter. 2015. Persuasion: Social influence and compliance gaining. Routledge.
[21] R E Guadagno, J Blascovich, J N Bailenson, and C Mccall. 2007. Virtual humans and persuasion: The effects of agency and behavioral realism. Media Psychology 10, 1 (2007), 1–22.
[22] A Gulz and M Haake. 2010. Challenging gender stereotypes using virtual pedagogical characters. Gender Issues in Learning and Working with IT: Social Constructs and Cultural Contexts (2010), 113–132.
[23] S G Harkins and R E Petty. 1981. Effects of source magnification of cognitive effort on attitudes: An information-processing view. Journal of Personality and Social Psychology 40, 3 (1981), 401–413.
[24] J. Holz, H. Akin, and K. Jamieson. 2016. Presidential Debates: What’s Behind the Numbers? White paper of the Annenberg Public Policy School at the University of Pennsylvania (2016).
[25] M. Holzwarth, C. Janiszewski, and M. Neumann. 2006. The influence of avatars on online consumer shopping behavior. Journal of marketing 70, 4 (2006), 19–36.
[26] M Kaptein, J Lacroix, and P Saini. 2010. Individual differences in persuadability in the health promotion domain. In Proc. International Conference on Persuasive Technology. Springer, 94–105.
[27] M Kaptein, P Markopoulos, B de Ruyter, and E Aarts. 2009. Can you be persuaded? individual differences in susceptibility to persuasion. In Proc. IFIP Conference on Human-Computer Interaction. Springer, 115–118.
[28] A Kendon. 2004. Gesture: Visible action as utterance. Cambridge University Press.
[29] Y Kim, A L Baylor, and E Shen. 2007. Pedagogical agents as learning companions: the impact of agent emotion and gender. Journal of Computer Assisted Learning 23, 3 (2007), 220–234.
[30] C Kirkegaard, B Tärning, M Haake, A Gulz, and A Silvervarg. 2014. Ascribed gender and characteristics of a visually androgynous Teachable Agent. In Proc. International Conference on Intelligent Virtual Agents. Springer, 232–235.
[31] N C Krämer, B Karacora, G Lucas, M Dehghani, G Rüther, and J Gratch. 2016. Closing the gender gap in STEM with friendly male instructors? On the effects of rapport behavior and gender of a virtual agent in an instructional interaction. & Education 99 (2016), 1–13.
[32] T Leary. 2004. Interpersonal diagnosis of personality: A functional theory and methodology for personality evaluation. Wipf and Stock Publishers.
[33] T Lehto, H Oinas-Kukkonen, and F Drozd. 2012. Factors affecting perceived persuasiveness of a behavior change support system. (2012).
[34] S T Meier. 2012. Language and narratives in counseling and psychotherapy. Springer Publishing Company.
[35] K N Moreno, N K Person, A B Adcock, RNV Eck, G T Jackson, and J C Marineau. 2002. Etiquette and efficacy in animated pedagogical agents: The role of stereotypes. In Proc. AAAI symposium on personalized agents, Cape Cod, MA.
[36] H. Oinas-Kukkonen. 2012. A Foundation for the Study of Behavior Change Support Systems. In Personal and Ubiquitous Computing.
[37] H Oinas-Kukkonen and M Harjumaa. 2009. Persuasive systems design: Key issues, process model, and system features. Communications of the Association for Information Systems 24, 1 (2009), 28.
[38] F. Pecune, A. Cafaro, M. Chollet, P. Philippe, and C. Pelachaud. 2014. Suggestions for extending SAIBA with the VIB Platform. In Proc. of the Workshop on Architectures and Standards for Intelligent Virtual Agents at IVA. 16–20.
[39] I Poggi and C Pelachaud. 2008. Persuasion and the expressivity of gestures in humans and machines. Embodied communication in humans and machines. Oxford University Press, Oxford (2008), 391–424.
[40] I Poggi and L Vincze. 2009. Gesture, gaze and persuasive strategies in political discourse. In Multimodal corpora. Springer, 73–92.
[41] C. Pornpitakpan. 2004. The persuasiveness of source credibility: A critical review of five decades’ evidence. Journal of applied social psychology 34, 2 (2004), 243–281.
[42] M Siegel, C Breazeal, and M I Norton. 2009. Persuasive robotics: The influence of robot gender on human behavior. In IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2563–2568.
[43] A Silvervarg, K Raukola, M Haake, and A Gulz. 2012. The effect of visual gender on abuse in conversation with ECAs. In Proc. International Conference on Intelligent Virtual Agents. Springer, 153–160.
[44] M D Slater. 2002. Entertainment education and the persuasive impact of narratives. (2002).
[45] A Stibe, A-K Christensen, and T Nyström. 2018. Transforming Sociotech Design (TSD). In Proc. of the 13th international Conference on Persuasive Technology.
[46] W. Swartout, D. Traum, R. Artstein, D. Noren, P. Debevec, K. Bronnenkant, J. Williams, A. Leuski, S. Narayanan, D. Piepol, et al. 2010. Ada and Grace: Toward realistic and engaging virtual museum guides. In Proc. International Conference on Intelligent Virtual Agents. Springer, 286–300.
[47] D Walton, C Reed, and F Macagno. 2008. Argumentation Schemes. Cambridge University Press.
[48] P Wühr, B P Lange, and S Schwarz. 2017. Tears or fears? Comparing gender stereotypes about movie preferences to actual preferences. Frontiers in psychology 8 (2017), 428.
[49] C Zanbaka, P Goolkasian, and L Hodges. 2006. Can a virtual cat persuade you?: the role of gender and realism in speaker persuasiveness. In Proc. of the SIGCHI conference on Human Factors in computing systems. ACM, 1153–1162.
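As a supplementary illustration of the change-in-score measure used in the study (a 1 to 5 likeliness rating taken before and after the persuasive clip, giving a change between -4 and +4, then averaged per focus setting), the following is a minimal sketch. It assumes each response is stored as a (focus, before, after) tuple; the function names `rating_change` and `mean_change_by_focus` and the example ratings are hypothetical and are not the study's code or data.

```python
from collections import defaultdict
from statistics import mean

RATING_MIN, RATING_MAX = 1, 5  # likeliness ratings are on a 1-5 scale


def rating_change(before: int, after: int) -> int:
    """Return after - before; the result lies between -4 and +4."""
    for value in (before, after):
        if not RATING_MIN <= value <= RATING_MAX:
            raise ValueError(f"rating {value} is outside {RATING_MIN}-{RATING_MAX}")
    return after - before


def mean_change_by_focus(responses):
    """Average the rating change per focus setting.

    `responses` is an iterable of (focus, before, after) tuples.
    """
    changes = defaultdict(list)
    for focus, before, after in responses:
        changes[focus].append(rating_change(before, after))
    return {focus: mean(values) for focus, values in changes.items()}


# Hypothetical responses for illustration only, NOT the study data.
example_responses = [
    ("single", 2, 2),
    ("single", 1, 2),
    ("vicarious", 2, 3),
    ("vicarious", 1, 2),
    ("user-directed", 3, 3),
    ("user-directed", 2, 3),
]

if __name__ == "__main__":
    for focus, change in mean_change_by_focus(example_responses).items():
        print(f"{focus}: mean change = {change:.3f}")
```

Keeping the change score separate from the self-reported persuasiveness rating mirrors the distinction the paper draws between what participants say influenced them and how much their rating actually moved.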