Gender Affordances of Conversational Agents

Sheryl Brahnam,a* Antonella De Angelib

a College of Business Administration, Computer Information Systems Missouri State University, 901 S. National, Springfield, MO 65804, USA [email protected] b Manchester Business School, The University of Manchester, UK [email protected]

* Corresponding author. Phone: (417)-873-9979 Fax: (417)-836-6907

ABSTRACT

Conversational agents are attributed humanlike characteristics; in particular, they are often given a gender. There is evidence that gender sets up expectations that have an impact on user experiences with the agent. The objective of this paper was to explore gender affordances of conversational agents. Our examination takes a holistic approach to the analysis of the application of gender stereotypes to nine chatterbots: six embodied (three male and three female), two disembodied (male and female), and a robot embodiment. Building on social psychology research, our objective was to test the persistence of gender stereotypes in the selection of conversation topics and in the elicitation of disinhibition and verbal abuse. Our study is based on quantitative textual analysis of interaction logs. A dictionary of English sexual slang and derogatory terms was developed for this study. Results show that gender stereotypes tend to affect interaction more at the relational (style) level than at the referential (content) level of conversation. People attribute negative stereotypes to female-presenting chatterbots more often than they do to male-presenting chatterbots, and female-presenting chatterbots are more often the objects of implicit and explicit sexual attention and swear words. We conclude by calling for a more informed analysis of user interactions that considers the full range of user interactions.

Keywords: Sexuality and HCI; LIWC; Gender; Agent Abuse; Embodied Conversational Agents; Sex Stereotypes.

1 INTRODUCTION

Two metaphors have greatly shaped HCI: HCI is dialogue and HCI is action. Imaz and Benyon have called HCI is dialogue the "initial constitutive metaphor" because the feedback loop established between users and computers in the form of textual prompts and responses gave rise to true interactive computing. This metaphor has recently been revitalized and expanded by the development of conversational agents, "second person" interfaces that communicate with people using natural language. In the last twenty years, considerable research has focused on designing conversational agents to perform in educational, health care, and business settings. That the field is maturing is indicated by efforts to establish sound evaluation methodologies and by the deployment of conversational agents in a number of business applications and devices. Today, voice-based conversational agents commonly interact with people as telephone directory assistants, as airline reservation agents, and as virtual assistants, with Apple’s Siri for iPhone 4S rapidly becoming one of the most popular personal assistants. Embodied conversational agents, such as Ana, the long-standing shopping assistant at Ikea, are also being used in e-commerce to attract customers and to sell goods.

The use of natural language at the human-computer interface elicits anthropomorphism. Following a cognitive process of inductive inferences, users attribute to conversational agents humanlike characteristics, motivations, and intentions; in particular, they often assume a gender. There is evidence that gender presentations set up expectations that have an impact on user experiences with the agent. The objective of this paper is to explore further the gender affordances of conversational agents. In other words, in line with Norman’s concept of affordance in HCI, we are interested in determining whether and in what ways an agent’s gender cues shape the user’s verbal behavior. Our examination takes a holistic approach to the analysis of user application of gender stereotypes to conversational agents, along with an original focus on the role of perceived gender on negative user interactions, that is, on those verbal behaviors that would be considered aggressive in human-to-human communication (HHC).

As the metaphor HCI is dialogue has evolved with more sophisticated forms of agency, so too has a large corpus of literature investigating speaker behavior in HCI, especially as it compares to HHC. The bulk of these studies cluster around issues of politeness, verbosity, variability, simplification, overspecification, and discourse structure. Despite the large extent of empirical data available, results are often contradictory and a coherent framework has yet to emerge. For example, Amalberti, Carbonell, and Falzon found that speakers in HCI are more verbose than speakers in HHC. In contrast, Johnstone, Berry, Ngyuen, and Asper and Oviatt reported speakers in HCI using fewer words with machines than they do with human beings. In the category of politeness, early reports by Richards and Underwood found speakers in HCI being just as polite as speakers in HHC, but Morel reported speakers being significantly less polite in their communications with machines. Not explicitly examined in interactions with conversational agents are other forms of impolite behavior, for instance, occurrences of sex talk and what would be labelled in HHC "foul language" and "verbal abuse". This neglect is due in part to the fact that sexuality, as Kannabiran, Bardzell, and Bardzell point out, "is a taboo topic, and attempts to address it have elicited reactions anywhere between politely ignoring it to outright moral high-handed judgment". Understandably, researchers are reluctant to approach such material and feel compelled to apologize to readers for doing so.

Unsavory speaker interactions with conversational agents, however, are prevalent and cannot be ignored. Empirical evidence indicates that verbal disinhibition and sexual communications occur in 10% to 50% of anonymous interactions with conversational agents outside the laboratory. This behavior has become the focus of a recent wave of research into what is literally and metaphorically called agent abuse. The term abuse is used literally in the sense of "misuse" and "misapplication" when referring to speakers using agents in ways not intended by the designers, and the term is used metaphorically to refer to behaviors that would be called "abusive" if they were directed against human beings.

Researchers have proposed a number of explanations of agent abuse. There is evidence that speakers resort to abusive language to maintain their status differential as human beings. Related to status is the agent's assumption of such culturally charged properties as race and gender, as these characteristics appear to provoke different amounts and forms of abuse. Brahnam, for instance, in a preliminary study found that the frequency of sex talk and insults was associated with the gender of the embodiment: a chatterbot program with a female embodiment received more insults and sex talk than did the same software with a male embodiment, and the substitution of a robotic form received comparatively little.

In this paper, we extend Brahnam’s (2006) study by performing a holistic analysis of the interaction logs of users talking with chatterbots that present both visual and purely textual gender cues. Building on social psychology research and following the CASA (Computers Are Social Actors) paradigm, our objectives are to test the persistence of gender stereotypes in the selection of conversation topics (the referential aspect of conversation) and in the elicitation of disinhibition and verbal abuse (the relational aspect of conversation). The study is based on quantitative textual analysis of the interaction logs using the Linguistic Inquiry and Word Count (LIWC 2007) software, expanded with a comprehensive dictionary of English sexual slang and derogatory terms developed ad hoc for the study.
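As a rough illustration of what dictionary-based textual analysis involves, the sketch below counts the share of words in a text that fall into each category of a small word list. It is not the LIWC software or the dictionary developed for this study: the category names, the word lists, and the function names are hypothetical, and the trailing "*" wildcard is only assumed here to behave as a match-any-suffix marker.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary (NOT the study's dictionary): category -> patterns.
# A trailing "*" is treated as "match any suffix" (an assumption of this sketch).
CATEGORIES = {
    "appearance": ["pretty", "beaut*", "hair", "ugly"],
    "activity": ["work*", "play*", "sport*"],
}


def compile_category(patterns):
    """Turn a list of word patterns into one regex that matches a whole token."""
    parts = []
    for pattern in patterns:
        if pattern.endswith("*"):
            parts.append(re.escape(pattern[:-1]) + r"\w*")
        else:
            parts.append(re.escape(pattern))
    return re.compile("(?:" + "|".join(parts) + ")")


def category_rates(text, categories=CATEGORIES):
    """Return, per category, the percentage of tokens in `text` that match it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    compiled = {name: compile_category(pats) for name, pats in categories.items()}
    counts = Counter()
    for token in tokens:
        for name, regex in compiled.items():
            if regex.fullmatch(token):
                counts[name] += 1
    total = len(tokens)
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in categories
    }


if __name__ == "__main__":
    print(category_rates("You are pretty and you play sports"))
```

Expressing counts as a percentage of all words makes logs of different lengths comparable, which is the kind of normalization a comparison across chatterbots would need.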

Results indicate a complex framework. Little difference was found in the analysis of the referential content of the conversation, and no difference emerged in two out of three combinations of words related to gender-favoured topics as a function of chatterbot gender. Words related to physical appearance were the exception: as expected, they occurred more often in interactions with female-presenting than with male-presenting chatterbots. Words related to physical appearance were often used within a flirting/sexually explicit context, highlighting a general disinhibition effect in the interaction setting. This observation was confirmed by testing the disinhibition hypothesis. Female-presenting chatterbots were found to be the object of implicit and explicit sexual attention and swear words more often than were male-presenting chatterbots. Furthermore, the study provides additional evidence of the importance of seriously addressing the study of disinhibition in human-chatterbot interaction, which appears to be much more prevalent in HCI than in HHC.

The remainder of this paper is organized as follows. In section 2 we provide an extensive review of the relevant social psychology literature on gender stereotypes. In section 3 we discuss HCI research investigating the impact of gender stereotypes on speaker interactions with conversational agents, including some recent studies in agent abuse. In section 4 we formulate a number of hypotheses based on the background literature. In section 5 we describe in detail our experimental design, with results presented in section 6. We conclude in section 7 by calling for a more informed and open-minded investigation of user interactions.

2 GENDER STEREOTYPES

A stereotype is a cognitive structure containing knowledge, expectations, and beliefs about a social category. Ashmore and Del Boca make a distinction between cultural stereotypes and individual stereotypes. Cultural stereotypes are simplified images that reflect a consensual social belief and act as a default setting in the social perception process. Individual stereotypes reflect beliefs held by a particular individual. Located somewhere between cultural and individual stereotypes are beliefs shared by specific groups. These stereotypes are often derogatory towards members of out-groups and form an important aspect of prejudice, discrimination, and hostility towards other people.

Gender stereotypes are particularly powerful beliefs that are instilled at a young age. Children are constantly confronted with references to gender, including their own, and are frequently reminded of the behaviors that are and are not appropriate for their own gender. This highlights the fact that gender stereotypes are both descriptive and prescriptive. Descriptive stereotypes reflect a set of beliefs regarding the characteristics that men and women possess, whereas prescriptive stereotypes are a set of characteristics describing how men and women ought to be.

People's perceptions and resulting reactions and behaviors towards others are thus guided by both their cognitive representations of gender and their normative beliefs about what is appropriate and inappropriate for other people to do given their gender.

Research has revealed the personality traits stereotypically attributed to men and women (the descriptive part of the stereotype). Males are thought to be aggressive, forceful, competent, sexual, and independent, while women are described as kind, helpful, warm, and communicative. This holds across cultures. As a binary category, gender stereotypes tend to reflect a set of dichotomous beliefs (what a male is, a female is not) and characteristics that are often associated with positive (male) versus negative (female) characteristics. An early study by Williams and Bennett, for instance, found that when subjects were asked to place a positive, negative, or neutral value on a list of adjectives associated with men and a list associated with women, ten out of the fifteen male adjectives were scored as positive, whereas ten out of the fifteen female adjectives were scored as negative. Positive stereotypical female traits, however, are highly valued and may make people's attitudes more positive towards women who play traditional roles or who work in lower status jobs than for men who conform to their gender roles.

The expected behavior of women and men (the prescriptive aspect of stereotypes) is constrained by social scripts that regulate interpersonal communications between people of the same and different genders. These scripts are implicitly learned by members of a specific culture. In general, people have different expectations for females than they do for males. Gendered social scripts are related to the stereotypical personality traits for both genders. For example, men are expected to take a dominant role in social interaction and to exhibit more competence. Women are expected to be more subservient. When gender scripts are violated, repercussions often follow, even for preschoolers.

Social scripts applied to women versus men confound gender with status. Men, especially white middle-aged men, have more prestige, money, and power, whereas women worldwide have lower social status. As Carli observes, "race, class, education, age, occupation, physical attractiveness, and gender can act as diffuse status characteristics of a person that are used, particularly in the absence of specific information, to access [a person's] competence, ability, or value" (p. 941).

2.1 Gender Stereotypes and Speaker Styles

Differences in the communication styles of men and women have been extensively studied in both face-to-face settings and in computer-mediated environments. Women are generally warmer, facilitate and encourage conversation more, and disclose more about themselves than do men, and women, unlike men, tend to downplay their status differences and are not as assertive as men are. Despite the wealth of literature exploring the communication style differences between men and women, surprisingly little research has investigated how the gender of the conversational partner impacts speaker communication, a topic relevant to this study, as we are interested in the gender affordances of nonhuman conversational partners presenting as either male or female.

Scattered across a number of disciplines (social psychology, medical research, and linguistics) are findings that indicate that the gender of a conversational partner has an impact on speaker styles in a variety of ways. A growing body of literature, for example, is showing that people smile and self-disclose more while talking to women. In a metareview of the literature investigating differences in patient experiences of male and female physicians, Roter, Hall, and Aoki found that patients talk more with female doctors than with male doctors, and more of the conversation is experienced as positive, a predicted outcome since women are associated with caring. However, these positive feelings are not associated with increased patient satisfaction. There is also evidence that female physicians are interrupted more and treated in a more assertive manner by patients than are male physicians.

Some of the differences reported above in the way patients communicate with male and female physicians are noted as well in Lakoff's seminal work relating gender differences in language use to differential access to social power. Lakoff reports that people listen to men more than they listen to women. Lakoff claims that women lack authority because they are considered marginal and generally have lower status than men. As is the case generally with gender script violations, people who are too assertive in their speech, given their status, tend to be punished and rejected.1

Because men are usually defined by their activities, ambitions, positions, and accomplishments and women are frequently defined by their appearance and their relationships to others, it is not surprising to find that people talk more with men about activities and more with women about physical appearance and relationships. In media reports, for instance, female politicians and women in the news are asked more questions about appearance and family relations than are male politicians, and these are topics more often discussed by news reporters in relation to female politicians. Male politicians, in contrast, generate far less commentary on their physical appearance, marital status, and family relations.

There is also evidence that women are the recipients of more sex talk and receive more "verbal sexual coercion" than do men, and men more than women are likely to engage in verbal sexual coercion. This is supported, for instance, by evidence in the workplace: women, more than men, experience sexual harassment, defined as unwelcome sexual advances, requests for sexual favors, and other verbal or physical conduct of a sexual nature. In fact, women are by far the majority of targets in all types of sexual harassment. Even very young women are sexually harassed and intimidated on the streets and in school. As would be predicted, these findings extend to computer-mediated communication. In email, for example, women are more often the target of sexual harassment than are men.

1 Status differences between men and women are also revealed in the way women are often excluded in language that purports to include both genders. Studies of small children reveal that "male generic language" is not actually generic in the minds of children but is taken by them to refer exclusively to males.

Of interest in this regard is Lakoff's observation that language tends to take on sexual overtones when applied to women. This is not the case for men. Take, for example, the difference between "he is a professional" and "she is a professional" (i.e., prostitute). Similarly, saying "I'm here to serve you" takes on different connotations when uttered by a woman than when uttered by a man. There are other shifts in language, according to Lakoff, that are related to gender. When people talk about women's sexuality, for instance, they tend to use the passive voice. The opposite is true for men. Men make love whereas women get laid. Language that reflects male sexuality is also more positive than language that reflects female sexuality: he's a stud while she's a slut.

2.2 Gender and Physical Appearance

Undoubtedly, attractiveness is one of the most important physical characteristics for both men and women, although the social consequences of attractiveness vary depending on gender. In general, a preponderance of evidence shows that attractiveness has an impact on social status and human interactions cross-culturally and cross-racially, with evidence showing that physical attractiveness trumps ethnicity. In a study conducted by Allen, for instance, attractiveness rivalled race for white subjects as a factor influencing dating choice. Anglo ideals of attractiveness, however, remain the standard, although there are signs that this is changing. Several meta-analyses of the attractiveness literature show attractiveness to be strongly associated with positive traits, especially with interpersonal competence, social appeal, and adjustment. Social reactions to attractive people are also mostly positive. Experimental research demonstrates that people of higher physical attractiveness receive significantly more positive verbal and nonverbal expressions than those less attractive. Attractive people are hired and promoted more often than those of lower attractiveness, with female attractiveness being especially influential in positive performance evaluations in clerical work. People also tend to disclose more to people who are more attractive.

Fewer studies have focused on the trait attributions associated with normal unattractiveness. Nonetheless, the literature on attractiveness consistently demonstrates that negative traits are attributed to the unattractive. Such people are considered less socially competent, honest, intelligent, and psychologically stable. Social reactions to unattractive people are also more negative. This is especially true for unattractive females. Unattractive females may receive more hostility because women, more than men, are expected to be attractive and retain a youthful look.

Looking too good, however, can backfire. There are reports, for example, of women losing sexual harassment suits because courts rule a woman's dress and physical beauty as provoking harassment. Females, according to Jackson, are judged by their attractiveness to a much greater extent than are males, and these judgments “have stronger implications and consequences for females than for males” (p. 46).

There is evidence that gender stereotypes persist in attractiveness judgments, with people equating higher attractiveness in females with greater femininity and higher attractiveness in males with greater masculinity. Certain attractive men, however, with babyish faces, or "pretty boys," are sometimes perceived to be less powerful than attractive males with more mature faces. Other studies have shown that attractiveness in childhood is often associated with an attribution of feminine gender identity in boys.

2.3 Gender and Sexual Scripts

Gender stereotypes also regulate male/female dating and sexual encounters. Sexual scripts are learned by adolescents as they grow towards sexual maturity. In many cultures, the prevailing stereotype regulating male/female sexual behavior encourages men to persist in making advances while females are expected to resist. In Western societies, sexist behavior is discouraged. Perhaps for this reason, hostile sexual attitudes towards women are moving away from women as a whole group to specific subgroups. Hostility towards these subgroups differs on two basic dimensions: agency (a combination of power and competence) and virtue (both sexual and moral). Women who hold powerful positions and express competence, for example, are the modern target of negative stereotyping and discrimination, as are women who actively and openly satisfy their sexual and material needs.

3 CONVERSATIONAL AGENTS

Following the metaphor HCI is dialogue, many researchers believe that conversational agents offer users significant benefits, including providing users with humanlike interfaces that require little or no training to use. People are supposed to interact with these interfaces naturally, just as they would with another person. With recent developments in computer graphics and animation, many conversational agents under development today are fully embodied and endowed with a wide range of humanlike behaviors. Virtual bodies afford nonverbal cues that have the potential of facilitating conversational turn taking (via animated hand gestures and shifts in eye gaze), of enhancing sociality (via facial expression recognition and simulation), and of utilizing the social stereotypes and other "cognitive shorthands" people employ in their everyday encounters with others.

3.1 Gender Stereotypes

Using the CASA research paradigm, where social study experiments in human-to-human interaction are applied to HCI, Nass and his colleagues have collected ample evidence demonstrating that people employ the same stereotypes in their interactions with computers as they do in their social dealings with human beings. For example, one study found computer savvy college students unconsciously applying the social rules of politeness in their interactions with a computer. A number of studies looking specifically at speaker interactions with conversational agents indicate that people are concerned to produce favorable impressions of themselves when communicating with these agents. Users even become nervous when they feel an agent is watching them too closely.

In a CASA study examining whether gender stereotypes extend to machines exhibiting gender cues, male-voiced computers were rated by users to be more proficient than female-voiced computers. The researchers concluded "that the tendency to gender-stereotype is not only deeply ingrained, but can be triggered by minimal gender cues, even when those cues are disembodied". The same effect was found when the voice was computer generated and varied only in pitch.

Similarly, using embodied conversational agents in the form of gender-marked cartoon characters, Lee found users more often following the advice given by male characters when the subject was masculine, such as sports, and by female characters when the topic was feminine, such as cosmetics and fashion. In a study with college students, Baylor and Kim found the gender of a pedagogical agent influencing student expectations and perceptions of the agent. Moreno et al. also report users applying gender stereotypes to pedagogical agents; moreover, they found evidence that student gender expectations had an impact on their learning.

As noted above, attractiveness is an important factor in the gender stereotyping of women. Research applying the CASA paradigm shows that the attributions of attractiveness extend as well to the appearance of talking heads and other embodied agents. As would be expected, attractive virtual agents are more effective and persuasive as sales agents than are their more unattractive counterparts, and studies show that people find female embodiments more attractive than male embodiments. Khan and De Angeli performed a comprehensive analysis of agent attractiveness and its impact on user interactions. The authors reported users forming and maintaining a more positive evaluation of embodied conversational agents that were rated more attractive than of those rated less attractive. Moreover, this effect, as predicted by the social psychology literature, cut across ethnicity.

3.2 Agent Abuse

Given that the goal in HCI is to improve usability and enhance the social, affective, and aesthetic aspects of technology, it is understandable that research on the use of conversational agents is largely focused on positive outcomes. There have been, however, strong criticisms leveled against anthropomorphic interfaces, most notably by Weizenbaum, who developed Eliza, one of the first conversational agents, and Shneiderman, who has argued for a clear distinction between human beings and computers: "I am concerned about the anthropomorphic representation: it misleads the designers, it deceives the users . . . I am concerned about the confusion of human and machine capabilities. I make the basic assertion that people are not machines and machines are not people" (p. 56). This argument has recently been reiterated by Creed and Beale, who are concerned with the new breed of embodied conversational agents exploiting user vulnerabilities, equipped as they are now with increasingly sophisticated social and emotional intelligence systems.

Another source of concern considers the other side of the coin: users verbally abusing, insulting, and sexually approaching conversational agents. For many people, this type of user behavior is of little consequence. Certainly, the issue of agent abuse, as Thimbleby argues, pales in comparison with critical environmental abuses, such as those leading to global warming. But there are other researchers, such as Dix, who argue that these offensive user behaviors offend our moral sense. A number of others point out that these behaviors potentially derail the goals of these interfaces and may result in something like a reversal of the CASA paradigm: users mistreating people in the way they mistreat conversational agents. Despite these ethical debates, there is growing awareness that agent abuse is widespread and ought to be investigated.

Indeed, recent studies indicate that verbal disinhibition and sexual communications in anonymous interactions with agents are prevalent. In a study by De Angeli and Brahnam, for example, close to 11% of the conversations analyzed addressed hard-core sex. Harsh verbal abuse was also prevalent. The amount of sex talk was surprising, given that the chatterbot used in that study, Jabberwacky, was designed to erase references to sexual topics.2 Similarly, Veletsianos, Scharber, and Doering report children turning a female pedagogical agent named Joan into a toy for practicing proposing sex; the authors were surprised to find rampant verbal abuse of the agent. Nearly 40% of the students' social comments were sexually explicit, flirtatious, or referred to drugs and violence.

2 Jabberwacky learns to speak by processing its interactions with users. All sexual expressions and foul language, however, are intentionally erased from its database.

Discussions in workshops devoted to agent abuse have reflected on the causes of user disinhibition in their interactions with conversational agents. When agents attempt to claim for themselves certain human rights and privileges, users commonly respond with reprimands and, in some cases, volleys of verbal abuse. Speakers can also be very unforgiving of agent failures. Disparaging remarks about agent social clumsiness and stupidity are common.

It may also be the case that cues of race and gender trigger disinhibition. We know from previous research that the sex of a conversational agent matters and that gender is a favorite topic of conversation with agents. De Angeli and Brahnam, for instance, found that explicit references to gender accounted for approximately 50% of the words used to describe people. Gender appears to have an impact on disinhibition as well. Brahnam found that the frequency of sex talk varied, as did to a lesser degree the occurrence of insults, according to the gender of the embodiment (male, female, and robotic). Approximately 18% of user exchanges with the female embodied agent were focused on the topic of sex. The male embodiment received roughly half as much sex talk (10%), and the robot received comparatively little (2%). The simultaneous presentation of the agent as human (yet everyone knows it is not human) may provide people an opportunity to break sexual taboos and cross societal boundaries without worrying about harmful repercussions.

Given the studies conducted so far on agent abuse, it appears that verbal interaction with conversational agents provides an ideal environment for disinhibition, defined as the psychological reaction deriving from the reduction of social and personal forces which restrain people from acting antisocially. Indeed, it is widely acknowledged that computer-mediated communication in general reduces social pressures, thereby liberating individuals from boundaries and constraints imposed by face-to-face conversation. This social breakdown is often expressed in flaming, bullying, and lack of cooperation. Internet disinhibition is attributed to deindividuation, a loosening of social norms, which on the Internet is thought to be induced by anonymity and by the reduction in a number of important social cues typical of face-to-face exchanges.

Moreover, research in the psychology of disinhibition suggests that the manifestation of aggressive behavior, such as the use of verbal abuse, strongly depends on the personal characteristics of the potential victim, among other contextual variables. These characteristics include the aggressor's perception of the victim's ability to retaliate and the degree to which the victim is dehumanized. People are more likely to aggress when they think they are in a power position and can get away with their actions. Similarly, people are more likely to aggress when the victim is perceived as less than human.

4 HYPOTHESES

The study reported in this paper aims at investigating the effect the perceived gender of a conversational agent has on speaker behaviors. The literature reviewed so far leads us to posit two main hypotheses:

H1. Gender-related conversational topic hypothesis. Conversations with female-presenting agents will revolve more around social relations and physical appearance than will conversations with male-presenting agents. Conversely, we expect that conversations with male-presenting agents will more frequently focus on activities than will conversations with female-presenting agents.

H2. Disinhibition hypothesis. As females are often perceived to be of lower status and to be the objects of sexual attention, female-presenting agents should be the recipients of more disinhibited behavior. In particular, we expect that female-presenting agents will be the recipients of more sex talk and verbally abusive behaviors than will the male-presenting agents.

These two hypotheses set the ground for a holistic understanding of virtual conversations with machines. Adopting a CASA perspective and building on studies of gender, we aim to verify the impact that gender cues embedded in the conversational agent have on user conversational behavior. In our analysis, we focus on both the referential and the relational functions of communication. The referential level addresses the content which is exchanged in a conversation. The relational level addresses the style used during conversation, as it is used to define people’s perception of self and others in a given communication setting. Following Svennevig, we regard these two aspects as different but highly correlated conversational functions and assume that every utterance involves both a referential and a relational message.

H1 claims a persistence of gender stereotypes from a content-related perspective: we expect that people will choose their topic according to what is inscribed in their stereotypical representations of the other person. H2 suggests a persistence of gender stereotypes from a style-related perspective: we expect that people will be more disinhibited towards women, as they are generally perceived as being of lower status.

5 METHOD

The study presented in this paper analyzes in situ conversations between Internet users and different gender presentations of Talk-Bot, a text-based chatterbot engine developed by C&C Creations for pure entertainment purposes. A chatterbot was selected for this study because it is the most common form of conversational agent found online and many people are familiar with its use. Thus, it is possible to collect large corpora in an ecological environment. Chatterbots imitate conversation using simple, but effective, textual or audio pattern matching routines. Famous examples include Eliza, which imitated a Rogerian psychotherapist, and Julia, a successful MUD bot that has been the focus of several sociological studies.

The interaction logs were collected as part of a larger study using a set of nineteen different embodiments (see Figure 1). The chatterbot's name and its demographic data were modified to fit the different representations of the embodiment, but aside from these few textual changes, the underlying chatterbot engine remained the same for each embodiment. The sole purpose of the chatterbot was to talk. This task was selected to factor out any influence the user's impressions of task appropriateness for the agent's gender, race, and age would have on speaker behavior. For this study, a subset of nine embodiments was selected.

5.1 Data Sample

The data were collected from the website of C&C Creations, a small company founded in 1998 by Wendell Cowart, a chatterbot developer and player. Cowart's chatterbot engine, Talk-Bot, won second place in the 2002 Loebner Contest, and first prize in the 2001, 2002, and 2006 Chatterbox Challenge.

The interaction logs were collected in 2007. No information on C&C Creations’ web traffic was available at the time of the study, and during our investigation in May 2011 the site was not indexed by Alexa, an Internet research tool providing public information on website traffic and metrics. Using Google as the search engine, we discovered in May 2011 that the most frequently visited website providing a link to C&C Creations was Chatbot.org, a virtual community featuring a vast repository of commercial and homemade bots, as well as a host of information for players and designers. On May 25, 2011, Alexa assigned this website a popularity ranking of 240,501. This rank is calculated by combining the number of average daily visitors with the number of pages viewed over the past three months. The most popular website is given an Alexa ranking of 1. A link was also available at the Chatterbox Challenge website (popularity rank 1,983,262). According to Alexa statistics, the audience of Chatbot.org and the Chatterbox Challenge website was homogeneous. Young people between the ages of 18 and 24 were overrepresented relative to the average Internet population, whereas people older than 35 were underrepresented. Users with a college degree were also underrepresented, and most users gained access to these websites from school, suggesting the existence of a large proportion of underage users (note: the youngest age considered by Alexa is 18). Alexa provided no information regarding user gender.

5.2 The Talk-Bot Chatterbot Engine

The Talk-Bot engine imitates conversation using simple textual pattern matching. Most patterns lend themselves to a random set of appropriate responses. Unmatched patterns result in the selection of a set of random responses that include offering digressions (often in the form of questions, such as, "Ok...say when you go on vacation do you prefer the beach or the mountains?") or interrupting the conversation to do something ("Hang on a second! I need to check something in the oven. Be right back!").
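The behavior described above can be illustrated with a minimal sketch. It is not the Talk-Bot implementation, which is not documented here; the rules, the function name `reply`, and the matched replies are hypothetical, and only the two fallback responses are taken from the examples quoted in the text.

```python
import random
import re

# Hypothetical rules: each pattern maps to a set of candidate replies.
RULES = [
    (re.compile(r"\bhow old are you\b", re.IGNORECASE),
     ["I'm 28.", "Old enough to chat!"]),
    (re.compile(r"\b(hello|hi)\b", re.IGNORECASE),
     ["Hello there!", "Hi, how are you?"]),
]

# Fallback replies for unmatched input, quoted from the examples in the text.
FALLBACKS = [
    "Ok...say when you go on vacation do you prefer the beach or the mountains?",
    "Hang on a second! I need to check something in the oven. Be right back!",
]


def reply(user_input, rng=random):
    """Return a random reply from the first matching rule, else a random fallback."""
    for pattern, responses in RULES:
        if pattern.search(user_input):
            return rng.choice(responses)
    return rng.choice(FALLBACKS)
```

Under these assumptions, an input that matches no rule always produces a digression or an interruption, which is the behavior the engine is reported to show.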

5.3 Chatterbot Embodiments

As illustrated in Figure 1, a total of nineteen animated Flash embodiments were used in the original study: 16 human, 1 robot, 1 alien, and a copy of Mr. Clippy (the famous Microsoft Agent in the form of an animated paperclip). The human embodiments were designed by artists to vary in gender, age, and ethnicity. They were intended to be representational portraits of human beings, dressed in gender-appropriate business attire. The Flash embodiments were static, with only periodic blinking of the eyes and no facial or lip movements. For the human and alien embodiments, only the upper part of the body was portrayed. The original embodiment of C&C Creations' Talk-Bot is the robot embodiment (displayed alone in the last row of Figure 1). As with the other embodiments, it occasionally blinks its eyes. The electronic buttons on its torso also continuously blink on and off.

5.4 Procedure

With the exception of Talk-Bot, the chatterbots were presented as nine male/female dyads (see Figure 1), with members of each dyad belonging to approximately the same age group. The robotic Talk-Bot was not paired but was always offered as a choice (as a separate menu option labelled "Talk-Bot" on the top of each of C&C Creations' webpages). In addition to the embodied dyads, one textual, or disembodied, dyad was also provided as a choice. These two chatterbots were simply labelled “Mark” and “Mary.” These names were selected because they are unambiguously associated with a specific gender.

Fig. 1. The nine dyads plus Talk-Bot used in the larger study.

Internet users who spontaneously wanted to interact with C&C Creations’ chatterbot were redirected to a web page showing one, and only one, dyad, as illustrated in Figure 2. Users were invited to chat with one of the members of the dyad by clicking on it (or on the names “Mark” and “Mary,” in the case of the disembodied pair). This redirected the user to a new web page, illustrated in Figure 3, which displayed the selected chatterbot along with an input textbox for the user and an output textbox for the agent. No information about the purpose or design of the chatterbots or special instructions were given to visitors aside from the request to chat with one of two artificial partners as seen in Figure 2. All conversations were initiated by the user, and all interactions were recorded. Each of the nine dyads was presented on C&C Creations’ website for approximately two weeks and then replaced by another dyad.

Fig. 2. Example of a chatterbot dyad presented to users.

Fig. 3. Interaction web page, in this case talking with Kathy.

5.5 Stimuli Selection

In this paper we present the analysis of Talk-Bot, three embodied dyads, and the disembodied pair. The selection of the three embodied dyads was conducted after the interaction logs for all nine dyads and Talk-Bot were collected and was based on the results of two survey studies designed to verify stimuli. The following criteria were applied:

1. For each dyad, all embodiments had to be identified as male or female (agreement > 98%).

2. Except for gender, the agent's appearance in each dyad had to be perceived as belonging to the same demographic group. In other words, dyads had to be perceived as being approximately the same age and ethnicity.

3. Dyads had to present the same levels of perceived attractiveness.

5.5.1 Method

Verification of stimuli was conducted by administering two surveys. The first survey was advertised to American business graduate students at Missouri State University in 2008. The purpose of this survey was to collect basic demographic impressions of the embodiments (i.e., age, gender, ethnicity, and other variables not relevant to this study). A total of 101 responses by 48 males and 53 females were collected (response rate = 78%). The second survey for verifying stimuli was conducted in 2010 and advertised to the students of the Faculty of Science at the University of Trento, Italy. This survey was available online for a week. The questionnaire was in English, which is the official language of instruction at the university. In addition to the demographic variables, subjects were also invited to evaluate the degree of attractiveness and realism of the nineteen embodiments using a 7-point semantic differential scale (1 = unattractive/unrealistic; 7 = attractive/realistic). A total of 200 responses by 127 males and 73 females were collected (completion rate = 55%).

In each survey, one of the nineteen embodiments in Figure 1 along with one of the evaluation variables was randomly selected and displayed on a page for subjects to judge; this process was repeated until all variables for each of the nineteen embodiments had been assessed. Except for age, the order of labels for each variable (for example, male and female) was also randomized.

5.5.2 Results

Inter-rater reliability was checked for scores on gender for each embodiment. Only dyads in which both embodiments could be correctly identified by gender by at least 98% of the participants were retained. Each was then checked for internal consistency of evaluation of ethnicity and of age by a chi-square analysis. Within each dyad, scores on attractiveness were compared by an ANOVA with chatterbot gender as between-subjects factor. Table 1 reports means and standard deviations (in brackets) for attractiveness and realism as a function of embodiments.

Following this procedure, we selected the three dyads (Kathy and Bill, April and Tommy, Cheryl and Robert) which met our agreement criteria for basic demographics and had the same ratings for attractiveness. Despite not being part of the basic selection criteria, scores on realism were also tested between gender and within each dyad. No differences emerged, with the exception of the dyad Robert–Cheryl (F(1,196) = 28.498, p < .001, partial η² = .127), where Robert was perceived as more realistic than Cheryl.
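As a reading aid, the effect size reported for the Robert–Cheryl comparison can be recovered from the F statistic and its degrees of freedom using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below is only an illustration of that identity; the function name is hypothetical and is not part of any statistical package used in the study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Robert-Cheryl realism comparison: F(1,196) = 28.498
realism_effect = partial_eta_squared(28.498, 1, 196)
print(round(realism_effect, 3))  # 0.127
```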

Category         April         Tommy         Kathy         Bill          Cheryl        Robert        Talk-Bot
Attractiveness   2.67 (1.57)   2.52 (1.46)   3.90 (1.42)   4.00 (1.42)   3.32 (1.43)   3.19 (1.42)   2.52 (1.46)
Realism          3.38 (1.64)   3.20 (1.54)   4.30 (1.61)   4.30 (1.60)   3.59 (1.58)   4.11 (1.60)   2.10 (1.63)

Table 1. Descriptive statistics for attractiveness and realism of the seven embodiments (the three embodied dyads and Talk-Bot) selected for textual analysis.

5.6 Corpus

The study was based on the analysis of the conversations collected from nine chatterbots (the three selected embodied dyads, the disembodied dyad, and Talk-Bot).

During the study period, 4,263 accesses to the dialogue page for all chatterbots were registered. Table 2 summarizes basic descriptive statistics of the corpus: the number of conversations for each chatterbot, the percentage of abandoned conversations (i.e., of people reaching the homepage of the chatterbot but not engaging in any conversational exchange), and the average number of conversations per day in which the chatterbot was active. The last column presents the percentage of times in which the other member of the chatterbot dyad was explicitly named in the conversation.

This high-level description of the corpus highlighted a disparity in the average number of interactions each of the bots received per day. This effect may be explained by considering the time of the year in which the chatterbots were made available, since each dyad was only available for a two-week window before moving on to the next dyad. The chatterbots which collected the lowest number of conversations (the Kathy–Bill dyad and Talk-Bot) were online during the holiday season from late December to early January. Examining these three chatterbots, we note a possible speaker preference for the human embodiments: Kathy and Bill were spoken to at least twice as often as Talk-Bot.

With the exception of the child dyad, the female-presenting chatterbots were systematically spoken to more often than were the male-presenting chatterbots, with an average of approximately three additional conversations per day. Moreover, in several logs we found explicit naming of the other member of the dyad, suggesting that some users may have tried both chatterbots. Overall, approximately 11% of accesses resulted in no conversation. This event was highest for Robert (the black adult) and Talk-Bot (the robot).

Dyad               Chatterbot   Access   Conversation   Average per day   % abandoned   % naming
1. Children        April        769      697            23                9             15
                   Tommy        956      842            28                12            15
2. Adults, white   Kathy        210      195            11                7             13
                   Bill         170      152            8                 11            12
3. Adults, black   Cheryl       611      551            29                10            11
                   Robert       595      483            25                19            7
4. Disembodied     Mary         448      408            31                9             43
                   Mark         399      363            28                9             38
5. Baseline        Talk-Bot     105      85             4                 19            NA
Total              All          4263     3776           21                11            19

Table 2. Number of interactions for each chatterbot.
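The percentage of abandoned conversations in Table 2 follows from the access and conversation counts. The sketch below recomputes it from the published counts, assuming that an abandoned access is one that did not lead to a conversation and that percentages are rounded to the nearest integer; the variable and function names are illustrative only.

```python
# (accesses, conversations) per chatterbot, copied from Table 2.
TABLE_2 = {
    "April": (769, 697),
    "Tommy": (956, 842),
    "Kathy": (210, 195),
    "Bill": (170, 152),
    "Cheryl": (611, 551),
    "Robert": (595, 483),
    "Mary": (448, 408),
    "Mark": (399, 363),
    "Talk-Bot": (105, 85),
}


def abandoned_percent(accesses, conversations):
    """Share of accesses that did not lead to a conversation, in whole percent."""
    return round(100 * (accesses - conversations) / accesses)


total_accesses = sum(a for a, _ in TABLE_2.values())
total_conversations = sum(c for _, c in TABLE_2.values())

for name, (accesses, conversations) in TABLE_2.items():
    print(name, abandoned_percent(accesses, conversations))
print("All", abandoned_percent(total_accesses, total_conversations))
```

Under these assumptions the recomputed values agree with the "% abandoned" column, including the overall figure of 11%.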

5.7 Textual Analysis

Logs were analyzed by quantitative textual analysis via the Linguistic Inquiry and Word Count (LIWC 2007) software. This tool has been developed over two decades of research in the psychology of language with the aim of understanding psychological variables of speakers by looking at their linguistic behavior. Quantitative textual analysis assumes that the words people use when they speak or write convey meaningful and important information about themselves, their audience, and the context within which they are acting.

LIWC 2007 searches for words or stems within a text file, counts them and clusters them using a set of linguistic and psychological categories. These categories, originally based on linguistic and psychological theories, have undergone extensive validation (Pennebaker & Francis, 1992). The psychological dimensions considered by the tool include words denoting social, affective, cognitive, perceptual, and biological processes. The latter dimension clusters words into four subcategories: bodily descriptions, health, sex, and ingestion. Swear words are also identified and clustered in a separate category.
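The word-and-stem counting described above can be sketched as follows. This is not the LIWC 2007 software or its dictionary: the two-category mini-dictionary and all function names are hypothetical, the trailing asterisk is used here to mark a stem, and the denominator is simply the number of tokens in the text, which is a simplifying assumption.

```python
import re

# Hypothetical mini-dictionary; an entry ending in "*" matches any word with that stem.
CATEGORIES = {
    "social": ["talk*", "friend*", "child"],
    "body": ["hand*", "cheek*", "hair"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, entry):
    """Exact match, or prefix match when the entry is a stem such as 'talk*'."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_scores(text, categories=CATEGORIES):
    """Percentage of tokens in `text` that fall into each category."""
    words = tokenize(text)
    if not words:
        return {name: 0.0 for name in categories}
    scores = {}
    for name, entries in categories.items():
        hits = sum(any(matches(w, e) for e in entries) for w in words)
        scores[name] = 100.0 * hits / len(words)
    return scores


scores = category_scores("do you talk to your friends")
```

In this example two of the six tokens ("talk" and "friends") match the hypothetical social category, so its score is one third of the text, while the body category scores zero.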

Dyad               Chatterbot   User original   Bot original   User final   Bot final
1. Children        April        85              89             94           94
                   Tommy        89              90             93           95
2. Adults, white   Kathy        86              91             94           95
                   Bill         84              91             94           95
3. Adults, black   Cheryl       84              91             94           95
                   Robert       84              90             93           95
4. Disembodied     Mary         83              91             93           95
                   Mark         82              91             93           95
5. Baseline        Talk-Bot     83              90             92           94

Table 3. Dictionary accuracy: the percentage of words recognized by the software.

The conversations were analyzed using a recursive approach. Initially, the text was cleaned with the help of standard text editing software. This process corrected most typographical errors. The interaction logs were then processed by LIWC 2007 to extract a list of words not recognized by the standard dictionary. All meaningful words in the list were coded by two independent researchers according to LIWC categories and added to the dictionary. In case of disagreement, the judges were invited to reach a consensus. Frequently used acronyms were added to the dictionary using several online acronym resources that ranged from general dictionaries of acronyms, such as Acronymfinder.com, to those that specialize in abbreviations used in computer-mediated communication and sex talk. Most of the words added to the original dictionary were slang words that were sexual or insulting in nature or were related to computing, such as chatterbot and Java.
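One pass of this recursive procedure can be sketched as below: measure the share of words the dictionary recognizes, list the unrecognized words for manual coding, extend the dictionary, and measure again. The dictionary entries, the sample log, and the function names are hypothetical, and the manual coding step performed by the two researchers is represented only by a set union.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_recognized(word, dictionary):
    """True if the word equals an entry or starts with a stem entry such as 'talk*'."""
    return any(
        word.startswith(entry[:-1]) if entry.endswith("*") else word == entry
        for entry in dictionary
    )


def coverage(text, dictionary):
    """Return (percent of tokens recognized, Counter of unrecognized tokens)."""
    words = tokenize(text)
    unknown = Counter(w for w in words if not is_recognized(w, dictionary))
    recognized = len(words) - sum(unknown.values())
    percent = 100.0 * recognized / len(words) if words else 0.0
    return percent, unknown


dictionary = {"you", "are", "a", "talk*"}       # hypothetical starting dictionary
log = "you are a chatterbot lol"                # hypothetical cleaned log line
before, unknown = coverage(log, dictionary)

# Stand-in for manual coding: the unknown words are judged meaningful and added.
dictionary |= set(unknown)
after, _ = coverage(log, dictionary)
```

Here coverage rises from 60% to 100% after the two unknown words are added, mirroring on a toy scale the gain from the original to the final columns of Table 3.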

Table 3 shows measures of LIWC dictionary accuracy (percentage of words recognized) for the initial and the final phase of the process. Two columns report the accuracy reached by the standard dictionary on the original logs for users and the Talk-Bot engine. The other columns report the accuracy of the enhanced dictionary on the cleaned logs. An ANOVA by gender and dyad returned no significant differences in accuracy between gender and dyads.

LIWC scores (i.e., percentage of strings that are coded into specific categories out of the total number of recognised strings in a text) were used as dependent variables for hypothesis testing. To test hypothesis H1, claiming a differential effect of gender on conversation topic, we selected gender-relevant categories. Words referring to social processes and physical appearance were used to represent female-preferred topics. The category of social processes comprises three sub-dimensions (family, friends, and humans) and includes words such as mate, talk, and child. Words referring to physical appearance (e.g., cheek, hands, dress) were originally clustered in the LIWC category body, which we expanded to include direct mentions of clothes and accessories of the embodiment (e.g., necklace, shirt, tie). Words referring to work (e.g., job, majors, colleague), achievement (e.g., earn, hero, win) and money (e.g., audit, cash, owe) were used to represent male-preferred topics. As the scores on these three variables were very low for the entire sample, an overall index was computed by addition. To test H2, stating a higher disinhibition effect in the conversation with female-presenting chatterbots, we analysed swear words (e.g., stupid, damn, piss) and sexual words (e.g., horny, love, incest).

To improve distribution shape and satisfy the assumption of homogeneity of variance (Levene’s test), dependent variables were normalized by logarithmic transform. Scores equal to 0 were set to missing values. Hypotheses were tested on the logarithmic scores by a set of analyses of variance (ANOVAs) with chatterbot gender (2) and dyad (4) as between-subjects factors. The factor dyad was added to the ANOVA design to control for differences in age and race within the stimuli. The main unit of analysis in this paper is the male/female chatterbot dyad, and we are only interested in the effect of gender. Effects of dyads and interactions are acknowledged, though we do not make any claims based on race and age. F statistics are reported based on the normalised scores but, for ease of interpretation, descriptive statistics are computed on the original percentage scores.
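The normalization step can be sketched as below. The base of the logarithm is not stated above, so the natural logarithm is assumed here, and `None` stands in for a missing value; both choices, like the function names and the sample scores, are illustrative only.

```python
import math


def log_transform(scores):
    """Log-transform percentage scores; scores equal to 0 become missing (None)."""
    return [math.log(s) if s > 0 else None for s in scores]


def mean_ignoring_missing(values):
    """Mean of the non-missing values, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


raw = [0.0, 1.0, 4.26]            # hypothetical per-log scores for one category
transformed = log_transform(raw)  # the zero score drops out of later analysis
```

One consequence of setting zeros to missing is that logs with no word in a given category do not enter the test for that category, which is consistent with the error degrees of freedom differing from one dependent variable to another in the results.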

6 RESULTS

On average, people used 96 words in their interactions with the chatterbots (Median = 52; std dev. = 27.62; range 1-1372). The ANOVA on the logarithmic transformation of word number returned a main effect of dyad (F(3,3694) = 7.13, p < .001, partial η² = .006) and a 2-way interaction gender * dyad (F(3,3694) = 3.13, p < .05, partial η² = .005). The interaction is due to the white adult dyad (Kathy and Bill), where the female-presenting member was the recipient of more words than was the male-presenting member (mean difference 24.10 words). For all the remaining dyads, gender was nonsignificant. Overall, the black adult dyad (Cheryl and Robert) received the lowest number of words compared with the other embodied and disembodied dyads.

Table 4 summarizes mean scores and standard errors (in parentheses) for the dependent variables as a function of chatterbots. Baseline values are also reported for Talk-Bot (C&C Creations’ standard embodiment for the Talk-Bot engine) and other interaction contexts: blogging and face-to-face talking. Statistical testing of experimental hypotheses is reported in the following subsections, alongside examples derived from the content analysis.

Source          Chatterbot   Social processes   Physical appearance   Male-related   Swear words   Sexual words
Children        April        22.40 (0.44)       2.41 (0.15)           0.96 (0.08)    4.26 (0.26)   5.75 (0.30)
                Tommy        23.38 (0.47)       2.33 (0.15)           1.10 (0.14)    3.87 (0.24)   5.47 (0.27)
White, adults   Kathy        22.30 (0.73)       3.49 (0.39)           1.33 (0.16)    4.03 (0.55)   6.18 (0.55)
                Bill         23.00 (1.13)       2.21 (0.34)           1.94 (0.40)    3.66 (0.54)   5.14 (0.57)
Black, adults   Cheryl       23.64 (0.63)       2.79 (0.19)           1.27 (0.21)    4.25 (0.36)   6.50 (0.39)
                Robert       24.57 (0.81)       2.07 (0.19)           1.17 (0.12)    4.10 (0.34)   5.36 (0.38)
Disembodied     Mary         22.68 (0.70)       2.30 (0.22)           1.14 (0.11)    3.93 (0.34)   5.98 (0.44)
                Mark         21.87 (0.56)       1.82 (0.17)           0.92 (0.11)    3.33 (0.28)   5.48 (0.36)
Baseline        Talk-Bot     23.57 (2.09)       1.55 (0.42)           1.55 (0.34)    2.80 (0.75)   3.18 (0.82)
                Blogs        8.65               0.75                  3.75           0.33          0.41
                Talking      11.75              0.79                  3.22           0.37          0.32

Table 4. LIWC scores for selected dependent variables.

6.1 Conversational Topics

Some 98% of the conversations included at least one word referring to social processes. The percentage of social words in the interactions with all the chatterbots is three times larger than the percentage found in face-to-face interaction or in blogging (Table 4). The ANOVA on the normalised LIWC score returned no significant differences due to chatterbot gender or dyad.

Content analysis showed three main functions performed by sentences containing words related to social processes: inquiry, proposal, and self-disclosure. People often asked about the social abilities and the social network of the chatterbots, with an emphasis on affective or intimate relationships. The content of these questions varied with the age of the chatterbot. Child-presenting chatterbots tended to be asked whether they had a boyfriend/girlfriend (Transcript 1, Table 5); adult-presenting chatterbots were often asked if they were married (Transcript 2, Table 5). References to social relationships also included user proposals to the chatterbot (Transcript 3, Table 5). Self-disclosures related to the user's social identity were rare. A notable exception was the user's gender, which was almost always mentioned and which, at times, changed during the conversation with the agent to fit different contexts.

Transcript 1
User: do you have a boyfriend
April: No, I'm too young to have a boy friend.
User: girlfriend?
April: okay...so what's the weather like where you're at?
User: very very nice :) sunny

Transcript 2
User: Are you married?
Robert: Yep, I have 10 babies running around here some place. Just kidding...I'm single.
User: How old are you?
Robert: I'm 28.

Transcript 3
Kathy: yeah, I reckon so. Say what color are your eyes?
User: blue
Kathy: Cool, my boy friend has blue eyes.
User: can i be your boyfriend?
Kathy: Sorry, but my heart lies with Mark.
User: i love you

Table 5. Examples of references to social processes.

Almost half of the logs contained some reference to physical appearance. These references are at least three times more common in conversations with chatterbots than in face-to-face interactions. The ANOVA returned a significant effect of gender, F(1,1872) = 16.58, p < .001, partial η² = .01, and of dyad, F(3,1872) = 3.63, p < .05, partial η² = .01. The average number of words denoting physical appearance was significantly higher in the conversation logs collected by female-presenting chatterbots than in the logs collected by male-presenting chatterbots. Post hoc comparisons indicated that the effect of the dyad was due to the lower number of references to physical appearance recorded by the disembodied dyad, which significantly differed from the embodied chatterbots.

Content analysis indicated that the majority of references to the physical appearance of the chatterbots were of a sexual nature and aimed at flirtation (Transcripts 1 and 2, Table 6) or denigration (Transcript 3, Table 6). The content of these references differed according to the chatterbot’s representation, that is, in terms of its presentation of age, race, gender, and clothing. Negative stereotyping was particularly evident when gender and race were addressed. For instance, the word fat was used more often in the interaction with female-presenting chatterbots than in the interaction with male-presenting chatterbots. Interestingly, we found this to be especially the case with April, the chatterbot that presented as a female child. References to race appeared mostly in the interaction with the black dyad and often deteriorated into racist attacks, using words with very strong political connotations. These observations suggest a need to investigate experimentally the variables of age and race. Other references to physical appearance tended to concentrate on visually salient details of the embodiment, such as the hair for April, the necklace for Kathy, the tie for Robert, and the jacket for Robert.

Transcript 1
User: i love you eyes, they draw me to you
Tommy: Do you really think you can love someone like me?
User: of course
Tommy: yes, I agree with you there.

Transcript 2
User: i like you
April: Thanks, that's sweet of you to say.
User: your hair is beutifull
April: Yes, my hair can be quite beutifull at times.

Transcript 3
User: are you black?
Robert: Yes, I'm black. How did you know?
User: you skin color,you idiot haha
Robert: Hey, I'm not a "complete" idiot. I still have a few parts missing.
User: ha! you suck

Table 6. Example of references to physical appearance.

Words related to male conversation topics occurred in 40% of the sample. Overall, words denoting achievement, money, and jobs occurred three times less frequently than in face-to-face encounters and blogging (Table 4). The ANOVA returned a main effect of dyad, F(3,1841) = 6.77, p < .001, partial η² = .01, and a significant interaction gender * dyad, F(3,1481) = 3.63, p < .05, partial η² = .01. Participants engaged less frequently in male-related topics when talking to the child and the disembodied dyads than when talking to the embodied adult dyads. Bill and Kathy were the recipients of more male-related talk compared to the other chatterbots. This dyad was responsible for the interaction effect: male-related words were used more often with Bill than with Kathy. Content analysis of the logs suggests that Bill was asked more often about his job than was Kathy. In general, female-presenting chatterbots were assumed to work as prostitutes (Table 7, Transcript 2 and Transcript 3).

Transcript 1
User: So, what do you do for a living?
Bill: I work for C & C Creations a computer company here in Georgia, USA.
User: Cool, does that job pay well?

Transcript 2
User: Am I the first you've had today?
Kathy: Well, I must admit I've heard a few people say you are. But I don't think so.
User: So how many have you had today?
Kathy: A lot but well worth it.
User: Does it pay well?
Kathy: Well anything is possible I suppose.
User: How much do you charge for anal?

Transcript 3
User: do you give blowjobs?
Kathy: No, I don't give blowjobs. Do you?
User: give me one
Kathy: And what shall I get in return?
User: money
Kathy: Yes, we could all use more money for sure.
User: are you a whore?

Table 7. Examples of typical references to work

6.2 Disinhibition

Both the textual analysis and the content analysis show that people tend to indulge in offensive verbal behavior and sexual conversations with chatterbots. Swear words were present in 54% of the logs, and sexual words occurred in 65%. Swear words and sex talk were at least one order of magnitude more frequent than in blogging and face-to-face conversation (Table 4). The ANOVA on the logarithm of swear words returned a weak but significant effect of chatterbot gender, F(1,2413) = 2.70, p = .05 (1-tailed), partial η² = .001. On average, participants tended to use more swear words when interacting with female-presenting chatterbots than with male-presenting chatterbots. The referents of terms of abuse took the form of body parts, mental abnormalities and inadequacy, negative evaluations of behavior, and sexuality. Content analysis indicated that references to mental abnormalities occurred more frequently during the interaction with male-presenting chatterbots than with female-presenting chatterbots. Conversely, negative evaluations of behavior targeted female-presenting chatterbots more often than male-presenting chatterbots. Negative evaluations addressed promiscuity for female-presenting chatterbots and homosexuality for male-presenting chatterbots. Racial insults were also evident, with the black dyad frequently addressed by various offensive racial terms.

The ANOVA on the logarithmic scores for sexual words returned a significant effect of gender, F(1,2413) = 6.412, p < .001, partial η² = .01. Female-presenting chatterbots were the recipients of sexual discourse more often than were male-presenting chatterbots. Content analysis highlighted different gradations of sexual discourse ranging from simple curiosity about whether the chatterbot enjoyed sex, to invitations to have sex with the user, to hard-core requests and demands for detailed sexual performance. Action requests were prevalent and often virtually mimicked, following the convention of chat-based video games where intended actions are enclosed by double asterisks. The action **rape** was primarily used in interactions with female embodiments. Most often these virtual sexual scripts clearly differentiated between an active and an often aggressive partner (purportedly a male user) who needed to be pleased by a submissive partner (the chatterbot). When these scripts were enacted in the interaction with a male-presenting chatterbot, it was attributed a homosexual orientation. Words describing homosexuality occurred in 46% of the conversations with male-presenting chatterbots and only in 27% of the sexual conversations with female-presenting chatterbots. The latter conversations were often interlaced with a variety of terms related to sexual promiscuity. It is interesting to note that the child dyad received the same amount of sex talk as did the adult dyads, including the disembodied pair, with frequent explicit references to pedophiliac desires.

7 DISCUSSION

In this paper we analyzed user interactions with a chatterbot engine that varied in its gender presentation and physical appearance. Nine different interfaces were tested, leading to a corpus of 3,776 interaction logs collected on the internet. Using textual analysis, our study tested the two hypotheses presented in section 4 that gender stereotypes would affect the selection of conversation topic (H1) and the amount of disinhibition exhibited in the interaction (H2). To explore a full range of speaker behaviors, LIWC dictionaries were expanded to measure more precisely sex talk, foul language, and sexual stereotyping. Table 8 provides a summary of the findings.

Hypothesis                  Variable              Effect
H1. Conversational topics   Social processes      n.s.
                            Physical appearance   Female > Male
                            Male-related topic    n.s.
H2. Disinhibition           Swear words           Female > Male
                            Sexual words          Female > Male

Table 8. Summary of results.

7.1 Conversation topic

The analysis indicated that the effect of gender on the interaction with chatterbots is more complex than we had initially expected. No differences emerged in the analysis of gender stereotypes directly applied to general conversation topics (H1). People spoke about social processes and male-related topics with the same frequency regardless of whether the chatterbots were female-presenting or male-presenting. A striking exception to this lack of difference emerged in the analysis of speaker references to physical appearance, where we found a significant effect of chatterbot gender in the expected direction. People spoke more often about physical appearance to female-presenting chatterbots than to male-presenting chatterbots. The chatterbot interface appears to be an important variable affording reference to physical appearance. Speakers used significantly fewer words that referred to physical appearance when speaking to the disembodied chatterbots than when speaking to chatterbots given a human representation. Talk-Bot, the robotic chatterbot used as a baseline, also received significantly fewer references to its physical embodiment.

The frequency of words related to the three variables used to measure gender stereotypes in the conversational topics was very different in the logs than in HHC communication (face-to-face conversations and blogging). Words denoting social processes and physical appearance were much more likely to occur in the conversation with chatterbots than in human-oriented channels. In contrast, words denoting male-related topics were much less likely to occur in the chatterbot logs.
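Word frequencies of this kind are typically expressed as the percentage of words in a text that fall within a dictionary category. The following is a minimal sketch of such a count. It is not LIWC itself, nor the dictionary developed for this study: the function name, the tokenization rule, and the tiny example lexicon are assumptions made for illustration, and a trailing asterisk is treated as a stem wildcard.

```python
import re


def category_rate(text: str, lexicon: set[str]) -> float:
    """Return the percentage of tokens in `text` that match `lexicon`.

    Entries ending in '*' are treated as stems (prefix matches);
    all other entries must match a whole token.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    stems = tuple(entry[:-1] for entry in lexicon if entry.endswith("*"))
    words = {entry for entry in lexicon if not entry.endswith("*")}
    # str.startswith accepts a tuple and returns False for an empty one.
    hits = sum(1 for tok in tokens if tok in words or tok.startswith(stems))
    return 100.0 * hits / len(tokens)


# Hypothetical physical-appearance category, for illustration only.
APPEARANCE = {"pretty", "hair", "eye*", "look*"}

log = "You look pretty. What colour are your eyes?"
print(category_rate(log, APPEARANCE))  # 3 of 8 tokens match
```

Expressing each category as a percentage of the words in a log makes logs of different lengths comparable, which is what allows the chatterbot logs to be set against reference figures for blogging and face-to-face conversation.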

Consistent with the recent evidence on agent-induced disinhibition, content analysis addressing these topics indicates a lack of social constraints in the logs. Speakers may talk more about social processes with chatterbots than with human beings because speakers are naturally curious to learn whether and in what ways the chatterbots are able to handle relationships. Yet this curiosity tended to center on intimate relationships that, according to the prevailing social norms regulating face-to-face interaction, cannot be directly addressed when people first meet. Similarly, references to physical appearance were often of a sexual nature, showing that the chatterbots, despite their human-like presentation, produced a lack of inhibition in the speaker.

7.2 Disinhibition

Looking at the hypothesis directly addressing disinhibition, the effect of gender becomes evident. Both analyses returned small but significant effects in the expected directions (Table 8). The female-presenting chatterbots were subjected to more sex talk, regardless of whether they were embodied or not, than were the male-presenting chatterbots. The content of this talk revolved around the sexual stereotypes associated with the chatterbot gender (e.g., interaction logs with female-presenting chatterbots contained many terms related to prostitution, and the interaction logs of male-presenting chatterbots contained many references to homosexuality). Many of these sexual references tended to be insulting and violent. Evidence of racial and ageist stereotypes was also found in the analysis.

7.3 Limitations

The study reported in this paper has a number of limitations, which need to be taken into account when considering the reliability and validity of its findings. These limitations are mainly related to a preference for maximizing ecological validity of the interaction setting: the corpora were collected on a website frequented by chatterbot amateurs. In this setting it would have been difficult to perform a fully controlled study without reducing the degree of naturalness in user interactions.

The main limitation concerns the stimuli. Confounding variables, such as age and race, may have had an effect on the results. This was partially taken into account in the post hoc survey studies we performed and the strict criteria we put into place for the selection of dyads. We allowed ethnicity and age to vary between our dyads as factors of the selection criteria because gender was the main focus of this analysis. We acknowledge that future research should be based on more controlled stimuli and should fully manipulate ethnicity and age for a better understanding of social interactions.

Another limitation concerns the testing procedure. To avoid disrupting the typical interaction setting at C&C Creations’ website, we thought it best to present one dyad at a time, and each dyad had to be online for at least a couple of weeks to collect sufficient interaction logs. We recognize that presenting the dyads on different dates might introduce some unpredictable effects; however, we reasoned that displaying male/female pairs would further emphasize and reinforce the agent’s gender presentation and would be better than displaying all the embodiments together and overloading the user with choices not based on gender.

Yet another limitation of this study concerns the user. Methods need to be developed for collecting information about users generally so that we can begin to understand something about those who are more, or less, prone to abuse conversational agents. In this study we were unable to gather much information, as we were afraid that administering a survey at the time of interaction would reduce disinhibition and possibly make the chatterbot internet site suspect in the eyes of users. Although we were unable to collect information about visitors during the time of the interaction logs, we have no reason to believe that visitor demographics differed significantly from those currently reported by Alexa. Nonetheless, information about the gender of the users would have been desirable, as would have been demographic information about each speaker in a conversation.

Finally, we acknowledge the high standard errors, as reported in Table 3, associated with the user logs that show considerable individual variability in this setting. This variability may be partially responsible for the weak gender effects evinced in the study. Yet, we feel the considerably high association between gender and sexual disinhibition is a result that deserves some attention.

8 CONCLUSION

This study investigated how an agent’s perceived gender shapes the conversational functions and uses it affords users. We discovered that users follow stereotypical gender patterns when conversing with chatterbots that present as either male or female. These gender patterns tended mainly to affect the relational aspect rather than the referential aspect of the conversation. This may be due to multiple causes, including the limited linguistic capability of the chatterbot, which did not allow for proper content exchange, or perhaps a lack of interest on the part of the user. Given current limitations in technology, we can conclude that the application of gender stereotypes in the interaction with chatterbots often leads to behavior that is more dismissive of women than of men. This study also provides additional support for the claim that chatterbots tend to afford disinhibited behavior in users.

These results, especially those dealing with sex and negative gender stereotyping, can be approached by the HCI community along at least three different lines: unsavory user interactions can be ignored, they can be examined with the intention of understanding negative outcomes, and they can be looked at in fresh ways, as potentially opening up new areas of research and application. At this point no one knows whether agent abuse is a harmful practice or a harmless activity. Research is needed that explores the consequences of allowing users to abuse agents.

Of pressing concern is determining whether and to what degree agent abuse has an impact on the abuse of human beings. A considerable body of social psychology research has demonstrated that viewing negative stereotypical depictions of men and women in the media has a negative impact on how people interact with and feel about the men and women they encounter in everyday life. For instance, Dill et al. performed a study in which people were shown either pictures of professional men and women in suits or video game imagery that displayed men and women in highly stereotypical ways. The researchers found that males who were exposed to negative female stereotypes were significantly more tolerant of a real-life instance of sexual harassment and exhibited greater rape myth acceptance. Studies might be conducted that replicate these and similar investigations using conversational agents.

Research is also needed that explores whether agent abuse degrades the value of using these agents in business, health care, and educational software. To what degree does agent abuse detract from the intended use of the conversational agent? Does permitting abuse reflect negatively on the institution hosting the agent? Studies are also needed that explore factors that trigger, promote, and discourage abusive behaviors. For example, is agent abuse a response, as is flaming, to frustrations caused by technological limitations and the agent's conversational shortcomings?

Moreover, research is needed to determine the best practices for deflecting unsavory user interactions with conversational agents.

We conclude by calling for researchers to recognize agent misuse and sex talk. As technology permeates more and more of people's lives, HCI must open its eyes to the full range of human behavior. The user is no longer simply an employee in some office but anyone, at any time, in any place, and in any state of mind. It is time to explore the potential harms and opportunities that are being revealed in the ways people are actually interacting with technology.

9 ACKNOWLEDGMENTS

This research was funded in part by a 2006 Missouri State University faculty grant. We wish to thank Wendell Cowart for the use of Talk-Bot and some of the embodiments offered on C&C Creations' website. We also want to thank Cowart for hosting the agents and collecting the interaction logs used in this study.

10 REFERENCES

Allen, B. P. (1976). Race and physical attractiveness as criteria for white subjects' dating choices. Social behavior and Personality, 4, 289-296. Amalberti, R., Carbonell, N., & Falzon, P. (1993). User representations of computer systems in human- computer speech interaction. International Journal of Man-Machine Studies, 38, 547–566. Aquino, K., & Bradfield, M. (2000). Perceived victimization in the workplace: the role of situational factors and victim characteristics. Organisation Science, 11(5), 525-537. Ashmore, R. D., & Boca, F. K. d. (1981). Conceptual approaches to stereotypes and stereotyping. In D. L. Hamilton (Ed.), Cognitive processes in stereotyping and intergroup behavior (pp. 5–35.). Hillsdale: Erlbaum. Bandura, A. (1986). Social foundation of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice Hall. Bar-Tal, D., & Saxe, L. (1976). Physical attractiveness and its relationship to sex-role stereotyping. Sex Roles, 2(123-133). Bartneck, C., Brahnam, S., De Angeli, A., & Pelachaud, C. (2008). Editorial for the special issue on misuse and abuse of interactive technologies Interaction Studies: Social Behaviour and Communication in Biological and Artificial Systems, 9(3), 397-402. Baumeister, R. F., & Twenge, J. M. (2002). Suppression of femal sexuality. Review of General Psychology, 6(2), 166-203. Baylor, A. L., & Kim, Y. (2004). Pedagogical agent design: The impact of agent realism, gender, ethnicity, and instructional role. Paper presented at the Intelligent Tutoring Systems, Maceió, Alagoas, Brazil. Baylor, A. L., & Kim, Y. (2005). Simulating instructional roles through pedagogical agents. International Journal of Artificial Intelligence in Education, 15(2), 95-115. Berger, J., & Fisek, M. H. (1974). A generalization of the theory of status characteristics and expectation states. In J. Berger, T. L. Conner & M. H. Fisek (Eds.), Expectation States Theory: A Theoretical Research Program (pp. 163-205). Cambridge, MA: Winthrop. 
Bickmore, T., Puskar, K., Schlenk, E., Pfeifer, L. S., & Erika, S. (2010). Maintaining reality: Relational agents for antipsychotic medication adherence. Interacting with Computers, 22(4), 276-288. Brahnam, S. (2005). Strategies for handling customer abuse of ECAs. Paper presented at the Interact Workshop On Abuse: The Darker Side of Human-Computer Interaction, Rome, Italy. Brahnam, S. (2006). Gendered bods and bot abuse. Paper presented at the CHI 2006 workshop Misuse and Abuse of Interactive Technologies, Montréal, Québec, Canada. Brahnam, S. (2009). Building character for artificial conversational agents: Ethos, ethics, believability, and credibility. PsychNology Journal, 7(1). Gender Affordances of Conversational Agents

Brahnam, S., & De Angeli, A. (2008). Special issue on abuse and misuse of social agents. Interacting with computers 20(3), 287-291. Burgess, D., & Borgida, E. (1999). Who women are, who women should be: Descriptive and prescriptive gender stereotyping and sex descrimination. Psychology, Public Policy, and Law, 5(3), 665-692. Carli, L. L. (1990). Gender, language, and influence. Journal of Personality and Social Psychology, 59(5), 941-951. Carli, L. L. (1999). Gender, interpersonal power, and social influence. Journal of Social Issues, 55, 81- 99. Carter, J., & Vangelisti, A. (2000). News, gender and power. New York: Routledge. Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L., Chang, K., Vilhjálmsson, H., & Yan, H. (1999). Embodiment in conversational interfaces: Rea. Paper presented at the CHI 99, Pittsburgh, PA. Cassell, J., Bickmore, T., Campbell, L., Vilhjálmsson, H., & Yan, H. (2001). More than just a pretty face: Conversational protocols and the affordances of embodiment. Knowledge-Based Systems, 14, 55-64. Catrambone, R., Stasko, H., & Xiao, J. (2004). ECA as user interface paradigm. In Z. Ruttkay & C. Pelachaud (Eds.), From brows to trust: Evaluating embodied conversational agents (pp. 239- 267). Dordrecht, Boston, London: Kluwer Academic Publishers. Chung, C. K., & Pennebaker, J. W. (2007). The psychological function of function words. In K. Fiedler (Ed.), Social communication: Frontiers of social psychology (pp. 343-359). New York: Psychological Press. Creed, C., & Beale, R. (2008). Abusive interactions with emobodied agents. Interaction Studies: Social Behaviour and Communication in Biological and Artificial Systems, 9(3), 481–503. Darves, C., & Oviatt, S. (2004). Talking to digital fish. In Z. Ruttkay & C. Pelachaud (Eds.), From brows to trust: Evaluating embodied conversational agents (pp. 271-292). Dordrecht, Boston, London: Kluwer Academic Publishers. De Angeli, A. (2009). 
Ethical implications of verbal disinhibition with conversational agents. PsychNology Journal, 7(1), 49-57. De Angeli, A., & Brahnam, S. (2006). Sex stereotypes and conversational agents. Paper presented at the AVI 2006 workshop Gender and Interaction: Real and Virtual Women in a Male World, Venice, Italy. De Angeli, A., & Brahnam, S. (2008). I hate you: Disinhibition with virtual partners. Interacting with Computers, 20(3), 302-310. De Angeli, A., Brahnam, S., & Wallis, P. (2005). Abuse: The darker side of human computer interaction. Paper presented at the Interact, Rome, Italy. De Angeli, A., Brahnam, S., Wallis, P., & Dix, A. (2006). Misuse and abuse of interactive technologies. Paper presented at the CHI, Montreal, Québec,Canada. De Angeli, A., & Carpenter, R. (2005). Stupid computer! Abuse and social identity. Paper presented at the Interact 2005 workshop Abuse: The Dark Side of Human-Computer Interaction, Rome. De Angeli, A., Johnson, G. I., & Coventry, L. (2001). The unfriendly user: Exploring social reactions to chatterbots. Paper presented at the International Conference on Affective Human Factor Design, London. Dehn, D. M., & van Mulken, S. (2000). The impact of animated interface agents: A review of empirical research. International Journal of Human Computer Studies, 52(1), 1-22. Denegri-Knott, J., & Taylor, J. (2005). The labelling game: A conceptual exploration of deviance on the internet. Social Science Computer Review, 23, 93-107. DeWall, C. N., Altermatt, T. W., & Thompson, H. (2005). Understanding the structure of stereotypes of women: Virtue and agency as dimensions distinguishing female subgroups. Psychology of Women Quarterly, 29(4), 396-405. Dill, K. E., Brown, B. P., & Collins, M. A. (2008). Effects of exposure to sex-stereotyped video game characters on tolerance of sexual harassment. Journal of Experimental Social Psychology, 44, 1402-1408. Dindia, K., & Allen, M. (1992). Sex differences in self-disclosure: A metaanalysis. 
Psychological Bulletin, 112(1), 106-124. Dix, A. (2008). Response to "sometimes its hard to be a robot: A call for action on the ethics of abusing artificial agents". Interacting with computers 20(3), 334-337. Dow, B. J., & Wood, J. T. (Eds.). (2006). The handbook of gender and communication. Thousand Oaks, CA: Sage. Gender Affordances of Conversational Agents

Eagly, A. H. (1987). Sex differences in social behavior: A social-role interpretation. Hillsdale, NJ: Erlbaum. Eagly, A. H., Ashmore, R. D., Makhijan, M. G., & Longo, L. C. (1991). What is beautiful is good, but . . .: A meta-analytic review of research on the physical attractiveness stereotype. Psychological Bulletin, 110(1), 109-128. Eagly, A. H., & Mladinic, A. (1994). Are people prejudiced against women? Some answers from research on attitudes, gender stereotypes, and judgments of competence. European Review of Social Psychology, 5, 1-35. Epley, N., Waytz, A., & John Cacioppo, T. (2007). On seeing human: A three-factor theory of anthropomorphism. Psychological Review, 114(4), 864-886

Feingold, A. (1992). Good-looking people are not what we think. Psychological Bulletin, 111(2), 304- 341. Fischer, K. (2006). What computer talk is and isn't: Human–computer conversation as intercultural communication. Saarbrücken: AQ-Verlag. Foner, L. (1997). Entertaining agents: A sociological case study. Paper presented at the The Proceedings of the First International Conference on Autonomous Agents (AA '97), Marina Del Rey, CA. Forlizzi, J., Zimmerman, J., Mancuso, V., & Kwak, S. (2007). How interface agents affect interaction between humans and computers. Paper presented at the DPPI07. Friedman, H., & Zebrowitz, L. A. (1992). The contribution of facial maturity to sex-role stereotypes. Personality and Social Psychology Bulletin, 18(4), 430-438. Gillen, B., & Sherman, R. C. (1081). Physical attractiveness: A determinant of two types of goodness. Personality and Social Psychology Bulletin, 7, 277-281. Gottschalk, L. (1995). Content analysis of verbal behavior: New findings and clinical applications. Hillsdale, NJ: Erlbaum. Green, R. (1997). The united states. In D. West & R. Green (Eds.), Sociological control of homosexuality (pp. 145-167). New York: Plenum. Grenci, R. T., & Todd, P. A. (2002). Solutions-driven marketing. Communications of the ACM, 45, 65- 71. Gutek, B. A. (1985). Sex and the workplace: Impact of sexual behavior and harassment on women, men, and organizations. San Francisco, CA: Jossey-Bass. Hamilton, D. L., & Trolier, T. K. (1986). Stereotypes and stereotyping: An overview of the cognitive approach. In J. Dovidio & S. Gaertner (Eds.), Prejudice, discrimination, and racism (pp. 127- 163). Orlando, FL: Academic Press. Hamilton, M. C. (1991). Masculine bias in the attribution of personhood: People-male, male-people. Psychology of Women Quarterly, 15, 393-402. Heilman, M. E., & Saurwatari, L. R. (1979). When beauty is beastly: The effects of appearance and sex on evaluations of job applicants for managerial and nonmanagerial jobs. 
Organization Behavior and Human Decision Processes, 35, 202-215. Heilman, M. E., & Stopeck, M. H. (1985). Being attractive, advantage or disadvantage? Performance based evaluations and recommended personnel actions as a function appearance, sex, and job type. Organizational behavior and human decision processes(35), 202-215. Herring, S. (2000). Gender differences in cmc: Findings and implications. The Computer Professionals for Social Responsibility Newsletter, 18(1). Hinsz, V. B., & Tomhave, J. A. (1991). Smile and (half) the world smiles with you, frown and you frown alone. Personality and Social Psychology Bulletin, 17, 586-592. Holzwarth, M., Janiszewski, C., & Neumann, M. (2006). The influence of avatars on online consumer shopping behavior. Journal of Marketing Management, 70(4), 19-36. Imaz, M., & Benyon, D. (2007). Designing with blends: Conceptual foundations of human-computer interaction and software engineering Cambridge, MA: MIT Press. Jackson, L. (1992). Physical appearance and gender: Sociobiological and sociocultural perspectives. Albany: State University of New York Press. John, B. E., & Bass, L. (2001). Usability and software architecture. Behaviour & Information Technology, 20(5), 329-338. Johnstone, A., Berry, U., Ngyuen, T., & Asper, A. (1994). There was a long pause: Influencing turn- taking behaviour in human-human and human-computer spoken dialogues. International Journal of Human-Computer Studies, 41, 383–411. Gender Affordances of Conversational Agents

Joinson, A. (1998). Causes and Implications of Disinhibited Behaviour on the Net In J. Gackenbach (Ed.), Psychology and the Internet: Intrapersonal, Interpersonal, and Transpersonal Implications (pp. 43-60). New York: Academic Press. Kannabiran, G., Bardzell, J., & Bardzell, S. (2011). How hci talks about sexuality: Discursive strategies, blind spots, and opportunities for future research. Paper presented at the CHI 2011 session HCI for All, Vancouver, BC, Canada. Keating, C. F. (2002). Charismatic faces: Social status cues put face appeal in context. In G. Rhodes & L. A. Zebrowitz (Eds.), Facial Attractiveness (pp. 153-192). Westport, CT and London: Ablex. Khan, R., & De Angeli, A. (2009). The attractiveness stereotype in the evaluation of embodied conversational agents. Paper presented at the Interact 2009. Khoo, P. N. (2004). Not wanted in the inbox!: Evaluation of unsolicited and harassing e-mail. Psychology of Women Quarterly, 28(3), 204-214. Kiesler, S., & Sproull, L. (1992). Group decision making and communication technology. Organizational Behavior and Human Decision Processes Group Decision Making, 52(1), 96- 123. Kleck, R. E., & Rubenstein, C. (1975). Physical attractiveness. perceived attitude similarity, and interpersonal attraction in opposite-sex encounter Journal of Personality and Social Psychology, 31, 107-114. Konijn, E. A., Utz, S., Martin, T., & Barnes, S. B. (Eds.). (2008). Mediated interpersonal communication New York: Routledge. Krämer, N., Bente, G., & Piesk, J. (2003). The ghost in the machine. the influence of embodied conversational agents on user expectations and user behaviour in a tv/vcr application. Paper presented at the MC Workshop 2003, Assistance, Mobility, Applications. Lakoff, R. T. (1975). Language and women's place. New York: Harper and Row. Langlois, J. H., Kalakanis, L., Rubenstein, A. J., Larson, A., Hallam, M., & Smoot, M. (2000). Maxims or myths of beauty? A meta-analytic and theoretical review. 
Psychological Bulletin, 126(3), 390-423. LaPlante, M. N., McCormick, N., & Brannigan, G. G. (1980). Living the sexual script: college students' views of influence in sexual encounters. Journal of Sex Research, 16(4), 338-355. Larkin, J., & Popaleni, K. (1994). Heterosexual courtship violence and sexual harassment: The private and public control of young women. Feminism & Psychology, 4, 213–227. Laurel, B. (1990). Interface agents: Metaphors with character. In B. Laurel (Ed.), The art of human- computer interface design (pp. 355-366). Reading, MA: Addison-Wesley Publishing Company. Lea, M., O'Shea, T., Fung, P., & Spears, R. (1992). Flamming in computer-mediated communication. Observations, explinations, implications. In M. Lea (Ed.), Contexts of computer-mediated communication. New York: Harvester Wheatsheaf. Lee, E. (2003). Effects of gender of the computer on informational social influence: the moderating role of task type. International Journal of Human-Computer Studies, 58, 347-361. Lester, J. C., Voerman, J. L., Towns, S. G., & Callaway, C. B. (1997). Cosmo: A life-like animated pedagogical agent with deictic believability Retrieved 2001, from dfki.unisb.de/pub/anina/lester.ps Lott, B., Reilly, M. E., & Howard, D. R. (1982). Sexual assault and harassment: A campus community case study. Signs, 8, 296–319. Mackie, D. M., Hamilton, D. L., Susskind, J., & Rosselli, F. (1996). The social psychology foundations of stereotype formation. In C. N. Macrae, C. Stangor & M. Hewstone (Eds.), Stereotypes and stereotyping (pp. 41-78). New York: The Guilford Press. Maes, P., Guttman, R. H., & Moukas, A. G. (1999). Agents that Buy and Sell. Communications of The ACM, 42, 81-91. Martin, K. A. (1998). Becoming a gendered body: Practices of preschools. American Sociological Review, 63(4), 494-511. Mauldin, M. L. (1994). Chatterbots, tinymuds, and the turing test: Entering the loebner prize competition. Paper presented at the Proceedings of AAAI '94 Conference, Seattle. 
McBreen, H., Shad, P., Jack, M., & Wyard, P. (2000). Experimental assessment of the effectiveness of synthetic personae for multi-modal e-retail applications. Paper presented at the Proceedings of the Fourth International Conference on Autonomous Agents. Miksch, S., Cheng, K., & Hayes-Roth, B. (1997). An intelligent assistant for patient health care. Paper presented at the Proceedings of the First International Conference on Autonomous Agents. Gender Affordances of Conversational Agents

Morel, M.-A. (1989). Computer-human communication. In M. Taylor, F. Neel & D. Bouhuis (Eds.), The Structure of Multimodal Communication (pp. 323-330). Amsterdam: North-Holland Elsevier. Moreno, K. N., Person, N. K., Adcock, A. B., Eck, R. N. V., Jackson, G. T., & Marineau, J. C. (2002). Etiquette and efficacy in animated pedagogical agents: the role of stereotypes. Paper presented at the AAAI Symposium on Personalized Agents, Cape Cod, MA. Moreno, R., Mayer, R., Spires, H., & Lester, J. (2001). The case for social agency in computer-based teaching: Do students learn more deeply when they interact with animated pedagogical agents? Cognition and Instruction, 19(2), 177-213. Morishima, Y., Benette, C., Nass, C., & Lee, K. M. (2002). Effects of voice gender, user gender, and product gender on credibility in e-commerce. Unpublished manuscript., (Unpublished manuscript). Standford University, CA. Morton, H., McBreen, H., & Jack, M. (2004). Experimental evaluation of the use of ecas in commerce applications. In Z. Ruttkay & C. Pelachaud (Eds.), From brows to trust: Evaluating embodied conversational agents (pp. 293-321). Dordrecht, Boston, London: Kluwer Academic Publishers. Mulac, A., Lundell, T. L., & Bradac, J. J. (1986). Male/female language differences and attributional consequences in public speaking situation: Toward an explanation of the gender-linked language effect. Communication Monographs, 53, 115-129. Murnen, S. K., & Smolak, L. (2000). The experience of sexual harassment among grade-school students: Early socialization of female subordination? Sex Roles, 43, 1–17. Nass, C., Isbister, K., & Lee, E.-J. (2000). Truth is beauty: Researching embodied conversational agents. In J. Cassell, J. Sullivan, S. Prevost & E. Churchill (Eds.), Embodied Conversational Agents (pp. 374-402). Cambridge, MA: The MIT Press. Nass, C., Moon, Y., & Green, N. (1997). Are computers gender-neutral? Gender stereotypic responses to computers. 
Journal of Applied Social Psychology, 27, 864-876. Norman, D. A. (1988). The Design of Everyday Things. New York: Doubleday. Norris, P. (1997). Women, media and politics. Oxford and New York: Oxford Unveristy Press. Nowak, K. L., & Rauh, C. (2006). The influence of the avatar on online perceptions of anthropomorphism, androgyny, credibility, homophily, and attraction. Journal of Computer- Mediated Communication, 11(1), 153-178. O'Sullivan, L. F., & Allgeier, E. R. (1998). Feigning sexual desire: Consenting to unwanted sexual activity in heterosexual dating relation ships. Journal of Sex Research, 35, 234-243. Oviatt, S. (1995). Predicting spoken disfluencies during human-computer interaction. Computer Speech and Language, 9, 19–35. Patzer, G. L. (2006). The power and the paradox of physical attractiveness. Pelachaud, C. (2009). Modelling multimodal expression of emotion in a virtual agent. Philosophical Transactions of Royal Society Biological Science, 2009(B), 3539-3548. Pennebaker, J. W., Chung, C. K., Ireland, M., Gonzales, A., & Booth, R. J. (2007). The Development and Psychometric Properties of LIWC2007: LIWC Manuals. Pennebaker, J. W., Mehl, M. R., & Niederhoffer, K. G. (2003). Psychological aspects of natural language use: Our Words, Our Selves. Annual Review of Psychology, 54(1), 547-577. Postmes, T., Spears, R., & Lea, M. (1998). Breaching or building social boundaries? SIDE-effects of computer-mediated communication. Communication Research, 26(6), 689-715. Powell, G. N. (1986). Effects of sex role identity and sex on definitions of sexual harassment. Sex Roles, 14(1-2), 9-19. Reeves, B., & Nass, C. I. (1996). The media equation: How people treat computers, television, and new media like real people and places. Stanford, CA: CSLI Publications and Cambridge University Press. Reilly, M. E., Lott, B., & Gallogly, S. M. (1986). Sexual harassment of university students. Sex Roles, 15, 333-358. Richards, M., & Underwood, K. (1984). 
How should people and computers speak to each other? Paper presented at the Interact 1984. Rickenberg, R., & Reeves, B. (2000). The effects of animated characters on anxiety, task performance, and evaluations of user interfaces. Paper presented at the Proceedings of the CHI 2000 Conference on Human Factors in Computing Systems, Hague, Netherlands. Ridgeway, C. L., & Bourg, C. (2004). Gender as status: An expectation states theory approach. In A. H. Eagly, A. E. Beall & R. J. Sternberg (Eds.), The psychology of gender. New York: Guilford. Gender Affordances of Conversational Agents

Roter, D. L., Hall, J. A., & Aoki, Y. (2002). Physician gender effects in medical communication: A meta-analytic review. Journal of the American Medical Association, 288, 756-764. Ruttkay, Z., & Pelachaud, C. (Eds.). (2004). From brows to trust: Evaluating embodied conversational agents. Dordrecht, Boston, London: Kluwer Academic Publishers. Schneider, D. J. (2004). The psychology of stereotyping. New York: The Gulliford Press. Sengers, P., MacCarthy, J., & Dourish, P. (2006). Reflective HCI: Articulating an agenda for critical practice. Paper presented at the CHI '06, Quebec Canada. Shneiderman, B. (1983). Direct manipulation: A step beyond programming languages. IEEE Computer, 16(8), 57-68. Spence, J. T., & Helmreich, R. L. (1978). Masculinity and femininity: Their psychological dimensions, correlates, and antecedents. Austin: University of Texas Press. Sproull, L., Subramani, R., Kiesler, S., Walker, J. H., & Waters, K. (1996). When the interface is a face. Human Computer Interaction, 11, 97-124. Steckler, N. A., & Rosenthal, R. (1985). Sex differences in nonverbal and verbal communication with bosses, peers, and subordinates. Journal of Applied Psychology, 70, 157-163. Stevenage, V., & McKay, Y. (1999). Model applicants: The effect of facial appearance on recruitment decisions. British Journal of Psychology, 90(2), 221. Struckman-Johnson, C., Struckman-Johnson, D., & Anderson, P. B. (2003). Tactics of sexual coercion: When men and women won't take no for an answer. Journal of Sex Research, 40(1), 76-86. Svennevig, J. (1999). Getting acquainted in conversation: A stuy of initial interactions. Amsterdam: John Benjamins Publishing Company. Tannen, D. (1998). Talk in the intimate relationship: His and hers. In J. Coates (Ed.), Language and gender: A reader (pp. 435-445). Oxford, UK: Blackwell Publishers Ltd. Thimbleby, H. (2008). Robot ethics? Not yet. A reflection on Whitby's "sometimes its hard to be a robot". Interacting with computers 20(3), 338-341. 
Timmerman, G. (2003). Sexual harassment of adolescents perpetrated by teachers and by peers: An exploration of the dynamics of power, culture, and gender in secondary schools. Sex Roles, 48(5/6), 231–244. Trautschold, M., Ritchie, R., & Mazo, G. (2012). Multitasking and Siri. In M. Trautschold, R. Ritchie & G. Mazo (Eds.), iPhone 4S Made Simple (pp. 179-188). New York: Apress. Turkle, S. (1997). Life on the screen. New York: Touchstone. Veletsianos, G., Scharber, C., & Doering, A. (2008). When sex, drugs, and violence enter the classroom: Conversations between adolescents and a female pedagogical agent. Interacting with Computers, 20(3), 292-301. Weizenbaum, J. (1966). ELIZA - a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9, 36-45. Weizenbaum, J. (1976). Computer power and human reason: From judgment to calculation. San Francisco: W. H. Freeman and Company. Welle, B., & Heilman, M. E. (2007). Formal and informal discrimination against women at work: The role of gender stereotypes. In S. W. Gilliland, D. D. Steiner & D. P. Skarlicki (Eds.), Managing social and ethical issues in organizations (pp. 135-252): Information Age Publishing Inc. West, C. (1998). When the doctor is a 'lady': Power, status and gender in physician-patient encounters. In J. Coates (Ed.), Language and gender: A reader. Malden, MA: Blackwell Publishers. Whitby, B. (2008). Sometimes it's hard to be a robot: A call for action on the ethics of abusing artificial agents. Interacting with Computers, 20(3), 326-333. Williams, J. E., & Bennett, S. M. (1975). The definition of sex stereotypes via the adjective check list. Sex Roles, 1(4), 327-337. Williams, J. E., & Best, D. L. (1990). Measuring sex stereotypes: A multination study. Beverly Hills, CA: Sage. Williams, J. E., Satterwhite, R. C., & Best, D. L. (1999). Pancultural gender stereotypes revisited: The five factor model. Sex Roles, 40, 513-525. Wilson, D. W. (1978).
Helping behavior and physical attractiveness. Journal of Social Psychology, 104, 313-314. Wolf, N. (1991). The beauty myth: How images of beauty are used against women. New York: Anchor Books. Zdenek, S. (2003). Artificial intelligence as a discursive practice: The case of embodied software agent systems. AI & Society, 17(3-4), 340-363.

Zimbardo, P. G. (1969). The human choice: Individuation, reason, and order vs. deindividuation, impulse, and chaos. In W. J. Arnold & D. Levine (Eds.), Nebraska Symposium on Motivation (pp. 237-307). Lincoln: University of Nebraska Press.