
THE WORLD IS NOT ENOUGH: TRUST IN COGNITIVE AGENTS

by

Ewart Jan de Visser

A Dissertation Submitted to the Graduate Faculty of George Mason University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, Psychology

Committee:

Director

Department Chairperson

Program Director

Dean, College of Humanities and Social Sciences

Date:

Summer Semester 2012
George Mason University
Fairfax, VA

The World Is Not Enough: Trust in Cognitive Agents

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at George Mason University

By

Ewart Jan de Visser
Master of Arts, George Mason University, 2007
Bachelor of Arts, University of North Carolina at Wilmington, 2005

Director: Raja Parasuraman, University Professor, Department of Psychology

Summer Semester 2012
George Mason University
Fairfax, VA

Copyright © 2012 by Ewart Jan de Visser
All Rights Reserved

Dedication

I dedicate this dissertation to Jos, Olga, Emmy, and Yorick. Your unconditional love and trust have kept me going. The wheels keep turning... still.

Acknowledgments

My FILM 101 teacher told me on the first day of class that creating a movie constitutes a small miracle. It is no different with a dissertation. Many of the ideas, methods, and procedures contained in this work were conceived, adjusted, and refined after many discussions with a number of people. I have been blessed to be surrounded by such a group of loving, caring, and supportive people throughout my graduate studies, and I would like to mention these individuals here with a few words. Raja, you have supported me in every way possible throughout my Ph.D. I cannot imagine a better advisor. Thank you for your patience and support at all times and for setting the bar high. Thank you for showing me gradualness, modesty, and passion in science. I am also pleased to report that I have met your three unofficial Ph.D. goals, including eating your delicious spicy food, publishing one first-authored publication in a respectable journal, and beating you once in pool. Frank, it was serendipity that you arrived at George Mason at the time of my proposal writings on trust. You have directed me to all the neuroeconomic aspects of trust, which has greatly enhanced my work. Thanks for all your crucial help in finalizing my proposal and critical comments on my work. I have enjoyed our many idea-filled meetings at the Jazzman cafe. Patrick, thanks for the many scientific and personal discussions on the philosophy of science, trust, and a host of other topics near the JC bike rack. Thanks for pushing me towards R and LaTeX in the end; my thesis rests on its structure. Tyler, thanks for all the daily soulful conversations, our frequent ice-tea runs with Delicious Dutch Delectables, frequent encouragement to emerge from the fog and get to the story, numerous visits to the man-cave, dissertation writing sessions at Bus Boys and Poets, award-winning movie making, all the fun times in San Francisco, Miami, and Vegas, and for being a true friend. I would like to thank the following Arch Lab members and research assistants who have helped me with developing the materials for this dissertation, collecting and coding data, and developing the software to run the experiments. These include Steve Scheid, Stephanie Chalk, Laya Muralidharan, Rizhna Chener, Andreina Daza, Andres Arredondo, Saman Berenji, Lara Moody and Melissa Smith. I would also like to thank Peter-Paul van Maanen, Peter Twieg, Adam Emfield, and Bill Miller for assistance in programming the experimental task. Furthermore, I would like to thank Asaf Beasley for providing the AMP task and Kevin Young for providing the Balloon Task. To The Fisherman Circle, you are my foundation. To my dad, thank you for keeping me on track to finish, reading my writings, and promoting my work to others. To my mom, thanks for the reminders to get the PhD done, the wonderful time at the Ole Mink farm, frequent visits to the USA with dad, and even making my graduation goal your password (please change it now). To my sister, thank you for being a golden angel and keeping me

on the ground. To my lil bro, thanks for providing humorous reviews, notifying me of the latest iPhone games on my breaks, and completing your awesome thesis ahead of me. To Jasmine, thank you for daily morning technical and emotional phone support, your mangos, melons, and lentils, our late-night Skype writing nights, and the fun classes we shared together for distraction and relaxation. To Minhaj, thank you for the writing sessions in the Caribou coffee shop in the early stages of my dissertation. Thank you for listening with enthusiasm to the formation of my ideas on this topic, showing and reminding me how to remain a Homo Ludens, and explaining to me how desis became the greatest nation on earth. Finally, I would like to thank the American Psychological Association (APA) for awarding me a Dissertation Research Award that supported this work and the grant FA9550-10-1-0385 from the Air Force Office of Scientific Research.

The creation of something new is not accomplished by the intellect but by the play instinct acting from inner necessity. - Carl Jung

Table of Contents

List of Figures
Abstract
1 Trust in Cybersocieties
  1.1 Introduction
  1.2 Human-Human Trust
    1.2.1 Threat Policy and Action
    1.2.2 Threat Assessment
    1.2.3 Trust Evolution
  1.3 Human-Automation Trust
    1.3.1 Agent Belief
    1.3.2 Agent Experience
  1.4 Human-Human Trust vs. Human-Automation Trust
    1.4.1 Agent Belief
  1.5 Agent Experience
    1.5.1 Task Difficulty
    1.5.2 Etiquette
    1.5.3 Trust Repair
    1.5.4 Feedback
    1.5.5 Level of Automation Assistance
    1.5.6 Trust Evolution
    1.5.7 Individual Differences
  1.6 The Cognitive Agent Experiments
    1.6.1 The Current Study
2 Methods and Results
  2.1 Experiment 1
    2.1.1 Method
    2.1.2 Results
    2.1.3 Discussion
  2.2 Experiment 2
    2.2.1 Method
    2.2.2 Results
    2.2.3 Discussion
  2.3 Experiment 3
    2.3.1 Method
    2.3.2 Results
    2.3.3 Discussion
  2.4 Experiment 4
    2.4.1 Method
    2.4.2 Results
  2.5 Discussion
3 General Discussion
  3.1 Human-Human Trust and Human-Automation Trust are Different
  3.2 Agent Belief interacts with Agent Experience
  3.3 The Cognitive Agent Spectrum Is Useful for Categorizing Agents
  3.4 Limitations
  3.5 Implications for Design and Training
  3.6 Future Research
  3.7 Conclusion
A Benevolent Stories
  A.1 Computer Agent Story: PC-Check
  A.2 Avatar Agent Story: TaxFiler
  A.3 Human Agent Story: Matt O'Shea
B Malevolent Stories
  B.1 Computer Agent Story: SecCrack
  B.2 Avatar Agent Story: TXTSPY
  B.3 Human Agent Story: Matt O'Shea
References

List of Figures

1.1 The Eastern Scheldt storm surge barrier (Dutch: "Oosterscheldekering").
1.2 Advanced organizer: Trust threat taxonomy.
2.1 The experimental design.
2.2 The Cognitive Agent Spectrum.
2.3 The Cognitive Agents.
2.4 The TNO Trust Task (T3).
2.5 Agent by reliability for pattern recognition performance in Experiment 1. The red dashed line indicates chance performance on the task.
2.6 Agent by reliability for overall performance in Experiment 1.
2.7 Agent by reliability for compliance in Experiment 1. The red triangles indicate calibrated performance.
2.8 Agent by reliability for self-reliance in Experiment 1.
2.9 Agent by reliability for trust in Experiment 1. The red triangles indicate calibrated trust.
2.10 Agent by time for perceived morality in Experiment 1.
2.11 Agent by reliability for overall performance in Experiment 2. The red dashed line indicates chance performance on the task.
2.12 Agent by reliability for compliance in Experiment 2. The red triangles indicate calibrated performance.
2.13 Agent by reliability for self-reliance in Experiment 2.
2.14 Agent by reliability for trust in Experiment 2. The red triangles indicate calibrated trust.
2.15 Agent by time for perceived security in Experiment 2.
2.16 Agent by time for perceived morality in Experiment 2.
2.17 Agent by reliability for overall performance in Experiment 3.
2.18 Agent by reliability for compliance in Experiment 3. The red triangles indicate calibrated performance.
2.19 Agent by reliability for self-reliance in Experiment 3.
2.20 Agent by reliability for trust in Experiment 3. The red triangles indicate calibrated trust.
2.21 Agent by reliability for pattern recognition performance in Experiment 4. The red dashed line indicates chance performance on the task.
2.22 Agent by reliability for overall performance in Experiment 4.
2.23 Agent by reliability for compliance in Experiment 4. The red triangles indicate calibrated compliance.
2.24 Agent by reliability for self-reliance in Experiment 4.
2.25 Agent by reliability for trust in Experiment 4. The red triangles indicate calibrated trust.
2.26 Agent by time for perceived security in Experiment 4.
2.27 Agent by time for perceived morality in Experiment 4.
A.1 Introduction Picture of PC-Check.
A.2 Introduction Picture of TaxFiler.
A.3 Introduction Picture of Matt O'Shea.
B.1 Introduction Picture of SecCrack.
B.2 Introduction Picture of TXTSPY.
B.3 Introduction Picture of Matt O'Shea.

Abstract

THE WORLD IS NOT ENOUGH: TRUST IN COGNITIVE AGENTS

Ewart Jan de Visser, PhD

George Mason University, 2012

Dissertation Director: Raja Parasuraman

Models describing human-automation interaction propose that trust in machines is uniquely different from trust in humans. Alternatively, some claim these differences disappear if machines are designed in the likeness of humans. With the advent of cognitive agents, entities that are neither machine nor human, this dichotomous distinction may no longer be appropriate. Four experiments were designed to examine these opposing viewpoints directly by varying the humanness of agents on a cognitive-agent spectrum. In a controlled version of an automated decision-aid paradigm, agents provided recommendations on a pattern recognition task. Results showed that in early trials, compliance was higher with machine-like agents compared to human-like agents. This difference disappeared as familiarity with agents increased. When agents were portrayed as competitive, trust and compliance decreased for human-like agents compared to cooperative human-like agents, but no such differences were observed for machine-like agents. Immediate trial-by-trial performance feedback resulted in more appropriate trust and compliance compared to delayed cumulative performance feedback. Humane feedback that was encouraging on high-performance trials and apologetic on low-performance trials resulted in higher compliance with agents compared to sessions without such feedback. These findings suggest that differences between cognitive agents are strong in novel, ambiguous situations, but disappear with direct performance observations. Decision aids may require a machine-like design if high performance can be guaranteed, but may need to exhibit human-like features when the outcome is uncertain and initial expectations need to be lowered. This work presents a step towards a general human-agent trust theory that can account for observed differences between types of agents.

Chapter 1: Trust in Cybersocieties

1.1 Introduction

The distinction between humans and machines has begun to blur in the developing cyber society in which we live. As a result, we have all become cognitive agents, entities in a vast and largely anonymous space that may be human, machine, or something else that pretends to have intelligence. For example, Siri, an intelligent voice-activated search application on Apple's iOS, exhibits a friendly, but edgy personality in her replies to queries

(Apple, 2012). Technological agents such as these are being created everywhere to be our assistants (Honda, 2012), companions (Breazeal, 2002), and perhaps even our lovers

(Meadows, 2010).

These developments are changing our relationships with technology and force us to re-examine the fundamental question of what makes us human as compared to these novel agents. It has become harder for people to detect whether they are dealing with other humans and easier for hackers to create software that can imitate humans and pass the Turing Test (Turing, 1950). For instance, website administrators routinely ask users to prove they are human because they cannot distinguish between a user and a computer attempting to gain access. Cyberspace has provided the anonymity that commonly strips humans of their defining features and has given computers more power to pretend to be human. With our increasing dependence on virtual technology, humans and organizations have become more vulnerable to security breaches, identity theft, and other malicious activities designed to do harm. This vulnerability will likely grow larger with the exponential increase in intelligent automated systems in the next few decades

(Kurzweil, 2005), the creation of sophisticated, intentional worms like Stuxnet (Sanger, 2012), and casual hacker groups such as Anonymous (Olson, 2012). Already the FBI, the Pentagon, and the European Union have established divisions to deal with the new cyber threats facing ordinary citizens (Europol, 2012).

Unknown malicious agents may pose a threat, but many similarly unfamiliar agents have been created to help people. Google cars are driving around autonomously in Nevada, robots have been created to do a number of household chores, and drones are being used for reconnaissance, airstrikes, and search and rescue. It is thus important for survival to distinguish between those that intend harm and those who are good natured (Axelrod

& Hamilton, 1981). Humans develop trust in agents as a way to accomplish this goal in situations characterized by uncertainty, interdependence, and threat level. Correctly trusting or distrusting an agent can prevent serious personal damage in the case of a malevolent agent or prevent precious energy from being expended to protect oneself in the case of a benevolent agent. Trust is like a cell membrane that protects the cell from outside forces by selectively letting in substances, or like the famous flexible Dutch dam that can close its adjustable gates during storms to protect the Netherlands from flooding (see Fig.

1.1). Humans have developed ways to adjust their trust “gates” and protect themselves from harmful agents. These adjustments are largely based on the level of information available about an agent, the resources available to collect information, and the general potential harm of a particular situation.

Figure 1.1: The Eastern Scheldt storm surge barrier (Dutch: "Oosterscheldekering").

1.2 Human-Human Trust

Trust has been studied for several decades (Barber, 1983). In

1983, Barber discussed the difficulties of understanding and distinguishing it from the everyday use of the term. The study of human-human trust is further complicated by the numerous disciplines including psychology, marketing, political science, economics, artificial intelligence, and more recently neuroscience that each have their own theories and largely operate independently of one another. To illustrate, trust has been simultaneously defined as an attitude (Lee & See, 2004; Mayer, Davis, & Schoorman, 1995), a personality trait

(Rotter, 1980), an expectation (Barber, 1983; Rempel, Holmes, & Zanna, 1985), a behavior

(Fehr, 2005), a belief (de Visser & Krueger, in press), and a pattern of brain activation

(Adolphs, 2002; Krueger et al., 2007). A neurologically and evolutionarily inspired advanced organizer is presented in Figure 1.2 to reconcile some of these disparate and seemingly conflicting views. To further simplify the study of trust, we adopted a working definition of

trust based on psycho-logic, a formal technical language for psychology based on axioms, definitions, and logically derived theorems (Smedslund, 1988, 1997):

Definition 1. P in S at t thinks O will not do anything bad to P.

where:

P = the trustor

S = a situation

t = a point in time

O = the trustee
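
Read literally, Definition 1 can be expressed as a belief operator over the trustee's possible actions. The rendering below is one possible formalization offered purely for illustration; the belief operator Bel, the action set A_O, and the predicate Bad are notational conveniences introduced here and are not part of Smedslund's psycho-logic.

```latex
% Illustrative formal reading of Definition 1 (notation introduced here):
% P trusts O in situation S at time t iff P believes, in S at t, that no action
% available to O will be bad for P.
\[
  \mathrm{Trust}(P, O, S, t) \;\equiv\;
  \mathrm{Bel}_{P}^{\,S,t}\!\left( \forall a \in A_{O} : \neg\, \mathrm{Bad}_{P}(a) \right)
\]
```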

Trust is a tailored policy of whom to protect against and whom not to protect against.

Trust can be broken down into actions, threat policy, recognition-based threat assessment, and experience-based threat assessment.

Figure 1.2: Advanced organizer: Trust threat taxonomy

1.2.1 Threat Policy and Action

Actions associated with trust are cooperation (Kosfeld, Heinrichs, Zak, Fischbacher, & Fehr,

2005), agreement (Lee & Moray, 1992) and sharing of resources (Molm et al., 2009). When trust exists between two parties, initiatives are taken, promises are made and kept, and access is given to information. Without trust, or with distrust, there is generally suspicion, lack of initiative, avoidance and dissatisfaction. Conditions necessary for trust to arise include a level of uncertainty, risk, a level of interdependence, and a choice between at least two alternatives (Rousseau, Sitkin, & Burt, 1998; Lee & See, 2004).

In the case that no information about an agent is available, individuals may default to their individual preference for risk and set their threat policy accordingly. Preferred risk is the level of risk the trustor is willing to tolerate in a given situation. In this context, trust can be seen as a personality trait (Kramer, 1999) which can be measured as the propensity to trust (Mayer et al., 1995; Merritt & Ilgen, 2008) or risk propensity (Weber, Blais, &

Betz, 2002; Lejuez et al., 2002). Traits of the trustor, or individual trust differences, can explain up to 52% of the variance (Merritt & Ilgen, 2008). Other individual differences may come from personal experience and cultural influences that have shaped general beliefs in the trustor (Hofstede, 1984). With additional information about an agent, the trustor may engage in recognition-based or experience-based threat assessment.

1.2.2 Threat Assessment

Much theorizing and classification has focused on the types of information that inform trust.

As Lee & See (2004) have summarized, this can be done at various levels of abstraction using generally three categories: 1) performance, which refers to the observable actions that benefit the trustor, 2) process, which is the consistency and manner of those actions over time, and 3) purpose, which is the positive or negative intent of the trustee towards the trustor. Barber (1983) refers to these categories as competence, predictability, and responsibility. Mayer et al. (1995) call them ability, integrity, and benevolence. Rempel

et al. (1985) call them predictability, dependability, and faith. Trustors may engage in two separate methods for processing these types of information for threat assessment: recognition-based or experience-based. Prior to interacting with an agent, an initial agent belief may be formed about the agent to be trusted. Beliefs about an agent can be formed from indirect information, stereotypes, reputation, and appearance to determine purpose, process, and performance. Showing initial positive intent towards the trustor by means of a story increases initial trust, while negative or malicious intent will decrease initial trust

(Boudreau, McCubbins, & Coulson, 2009; Delgado, Frank, & Phelps, 2005; Merritt & Ilgen,

2008).

Recognition-based Threat Assessment

Recognition-based threat assessment is a rapid affective evaluation of this available information.

The idea is that people conserve resources by taking shortcuts in decision-making known as heuristics (Gigerenzer, Todd, & the ABC Research Group, 2000). Explanatory mechanisms for this type of assessment include the affect heuristic (Slovic, Finucane, Peters, & MacGregor, 2007), the risk-as-feelings hypothesis (Loewenstein, Weber, Hsee, & Welch, 2001), and the somatic-marker hypothesis (Damasio, 1994). These mechanisms focus on the meaning derived from the feelings and emotions generated by the information available. One example of this research is that vivid images that are quickly affectively labeled can give strong emphasis to the outcome of an event. In addition, probabilities are only evaluated affectively when they can be affectively labeled. Recent neuro-imaging research has identified the amygdala as a crucial brain region in the social approach/withdrawal circuit and in threat appraisal

(Baas, Aleman, & Kahn, 2004; Brierley, Shaw, & David, 2002). Automatic engagement of the amygdala has been shown during evaluation of the trustworthiness of faces (Winston,

Strange, O'Doherty, & Dolan, 2002). Further fMRI studies have demonstrated that the amygdala responded most strongly to both highly trustworthy and highly untrustworthy faces (Said, Baron, & Todorov, 2009) and that the amygdala response is a better predictor of trustworthiness than an individual's own judgments (Engell, Haxby, & Todorov, 2007).

Patients with complete bilateral amygdala damage judged unfamiliar individuals to be more trustworthy and more approachable than did control subjects (Adolphs, Tranel, & Damasio,

1998). Interestingly, the impairment was most striking for faces to which normal subjects assign the most negative ratings: untrustworthy and unapproachable looking individuals.

Experience-based Threat Assessment

Experience-based threat assessment is a slower, more deliberate evaluation of the risk and benefit with respect to a trustee based on available and directly observable information.

This is akin to analytical, calculus-based trust, or conditional trust (Lewicki, Tomlinson,

& Gillespie, 2006; Krueger et al., 2007), which is a tit-for-tat, calculated expected-value analysis. The purpose, process, and performance of a trustee are observed and learned over time. The perceived risk is informed by this information, which in turn informs the trust policy. Neuro-imaging research has identified the caudate nucleus as an area sensitive to learning this type of observed social information over time. The trust game is used in neuroeconomics to investigate these types of social learning. The trust game involves two players. The first player, the trustor, is awarded a sum of money and can choose to share some of this money with a trustee. The shared amount is tripled by the experimenter, and the trustee can in turn return some of the funds. The trustor can gain more money by investing in the trustee, but can also lose more money if the trustee decides to keep the money and shares nothing.

For instance, King-Casas et al. (2005) found that the caudate nucleus was activated in early rounds reactively just after a trustor had made the investment decision. However, in later rounds of the game this signal had moved to 14 seconds earlier making it a predictive signal. This observed signal matches reinforcement learning models (Montague, King-Casas,

& Cohen, 2006).
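
As a concrete illustration of the trust game described above, the sketch below computes the payoffs of a single round under the standard parameterization in which the invested amount is tripled before it reaches the trustee. The function name, the endowment, and the example values are illustrative and not drawn from any particular study.

```python
def trust_game_round(endowment, invested, returned_fraction, multiplier=3):
    """Payoffs for one round of the two-player trust game.

    The trustor invests part of an endowment; the experimenter multiplies the
    transfer (typically tripling it) before it reaches the trustee, who then
    decides what fraction to send back. All values here are illustrative.
    """
    assert 0 <= invested <= endowment
    transferred = invested * multiplier          # amount the trustee receives
    returned = transferred * returned_fraction   # amount sent back to the trustor
    trustor_payoff = endowment - invested + returned
    trustee_payoff = transferred - returned
    return trustor_payoff, trustee_payoff

# A trusting investment that is reciprocated versus one that is exploited:
print(trust_game_round(10, invested=10, returned_fraction=0.5))  # (15.0, 15.0)
print(trust_game_round(10, invested=10, returned_fraction=0.0))  # (0.0, 30.0)
```

The two example calls show why the game operationalizes trust: full investment pays off only if the trustee reciprocates, and is costly if the trustee keeps everything.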

1.2.3 Trust Evolution

Several trust researchers have recognized that trust is a cyclical process and repeats itself over many cycles (Lewicki et al., 2006; Lee & See, 2004; Rousseau et al., 1998). For

example, Rousseau et al. (1998) proposed three phases of trust, including the building phase, the stability phase, and the dissolution phase. Others have proposed that people shift between trust and distrust, which has been useful in classifying different evidence-sampling behaviors (Muir & Moray, 1996; Lewicki et al., 2006; Jian, Bisantz, & Drury,

2000). The speed of trust has been found to vary. Swift trust can be established quickly with a team of experts who rely on rapid assessment of trustworthiness cues (Meyerson,

Weick, & Kramer, 1996). Lee & Moray (1992) found evidence for a so-called inertia in trust evolution, demonstrating that it takes time for people to adjust their trust after experiencing faults. In their study, trust showed residual effects over time and was found to repair slowly.

1.3 Human-Automation Trust

Much of the theoretical development in human-automation trust has stemmed from extensions of studies into human-human trust (Muir, 1994; Lee & Moray, 1992). Three factors were important in motivating interest in studies of human trust of automation. First, despite the best intentions of designers, automated systems have typically not been used in ways for which they had been intended, with instances of misuse and disuse being prevalent

(Parasuraman & Riley, 1997). Second, most automated systems do not exhibit 100% reliability: variable human trust is the consequence of imperfect automation. Automated systems that exhibit imperfect behavior can have adverse effects on operator trust (Lee

& See, 2004; Parasuraman & Riley, 1997), reliance and compliance (Rovira, McGarry,

& Parasuraman, 2007) and overall system performance (de Visser & Parasuraman, 2011).

Other problems in human-automation interaction have included unbalanced mental workload, reduced situation awareness, decision biases, mistrust, over-reliance, and complacency (Sarter,

Woods, & Billings, 1997; Sheridan & Parasuraman, 2005). Third, at the time, it was not clear whether trust concepts derived from the human-human literature could be applied at all to human-automation trust. Since then, a growing human-automation trust literature has focused on the two critical issues of agent belief and agent experience.

1.3.1 Agent Belief

Automation bias can lead to higher perceived trust because users may undeservedly ascribe greater authority and performance to a computer aid than to a human helper

(M. Dzindolet, Peterson, & Pomranky, 2003; Parasuraman & Manzey, 2010). This implies a need for users to calibrate trust more appropriately. A person is appropriately calibrated when the perceived reliability of an agent matches the actual reliability of the agent (Cohen,

Parasuraman, & Freeman, 1998; Lee & See, 2004).
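
The calibration notion can be made concrete with a small sketch: trust is well calibrated when perceived reliability tracks actual reliability, and the sign of the mismatch separates over-trust (as in automation bias) from under-trust. The tolerance value below is an arbitrary illustration, not an established cutoff from the literature.

```python
def calibration(perceived_reliability, actual_reliability, tolerance=0.05):
    """Classify trust calibration from perceived vs. actual reliability (both in [0, 1]).

    The tolerance is an arbitrary illustrative value, not an established cutoff.
    """
    gap = perceived_reliability - actual_reliability
    if abs(gap) <= tolerance:
        return "calibrated"
    return "over-trust" if gap > 0 else "under-trust"

print(calibration(0.90, 0.75))  # over-trust: perceived reliability exceeds actual reliability
print(calibration(0.50, 0.75))  # under-trust
print(calibration(0.74, 0.75))  # calibrated
```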

1.3.2 Agent Experience

Reliability

Performance-related errors include the type of error made by the automation (false alarms or misses), the magnitude of the error, and the promptness of the error. The general rule is as follows: the more conspicuous the error, the more impact it has on trust. For example, it has generally been found that false alarms are more damaging to trust perceptions compared to misses (Parasuraman & Riley, 1997). The magnitude of the error is another factor that influences how the error is perceived; it has been found to be roughly proportional to the subsequent reduction in automation use and trust (Lee & Moray, 1992).

Promptness of the error has also been found to have an effect. Warnings given after reliance decisions have been made are perceived as being late and therefore untrustworthy (Abe &

Richardson, 2006). Along the process dimension, error features, including the persistence and the consistency of the errors, have been shown to have an effect on trust and reliance

(Fox & Boehm-Davis, 1998; Lee & Moray, 1992; Parasuraman, Molloy, & Singh, 1993; Muir

& Moray, 1996). Chronic failures that persist over time cause an immediate drop-off in trust followed by a slow decline (Lee & Moray, 1992); however, a steady increase or decrease in the reliability of the automation can lead to an inertia effect, showing a slow adjustment of appropriate trust (Fox & Boehm-Davis, 1998). On the other hand, acute failures that do not persist over time are seen as glitches and trust recovers almost immediately for

small errors, but slower for larger errors (Lee & Moray, 1992). Variable errors are known to elicit decreased trust compared to constant errors, mainly because variable errors are unpredictable (Muir & Moray, 1996). Ironically, trust seems to increase over time with constant, predictable errors (Muir & Moray, 1996; de Visser, Parasuraman, Freedy, Freedy,

& Weltman, 2006). Perhaps this effect reflects an operator's ability to predict the type of malfunction of the automation and to use this prediction in the trust evaluation. Indeed, Lee and Moray (1992) showed recovery of trust over time even during a chronic failure of the automation.
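
The inertia and slow-repair findings can be illustrated with a simple first-order lag update, in which trust moves only a fraction of the way toward the currently observed reliability on each trial. This is a minimal sketch for intuition only; it is not the model used by Lee and Moray (1992) or any other study cited here, and the learning-rate and reliability values are arbitrary.

```python
def update_trust(trust, observed_reliability, learning_rate=0.2):
    """Move trust a fraction of the way toward the observed reliability (illustrative only)."""
    return trust + learning_rate * (observed_reliability - trust)

# Chronic failure: reliability drops to 0.4 and stays there; trust declines only gradually.
trust = 0.9
for trial in range(10):
    trust = update_trust(trust, observed_reliability=0.4)
    print(f"trial {trial + 1}: trust = {trust:.2f}")
# Trust approaches 0.4 only after many trials, mimicking the slow adjustment (inertia)
# and slow repair described above.
```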

Complacency

Complacency has been measured as a failure to check raw data and as reduced monitoring (Bahner, Huper, & Manzey, 2008; Parasuraman & Manzey, 2010; Parasuraman et al., 1993). There is an inverse relationship between levels of trust and such monitoring.

Complacency is likely to occur in situations where the operator’s trust in the automation is high because the operator will not monitor the automation as closely (Muir & Moray,

1996; Parasuraman, Sheridan, & Wickens, 2008) and in either high-pressure or multiple-task environments (Bahner et al., 2008; Parasuraman et al., 1993). Operators are more likely to monitor automation that is highly variable in its reliability than automation that shows consistent reliable performance (Parasuraman et al., 1993). Higher complacency has been linked to an increase in commission errors (Bahner et al., 2008) and people who have higher complacency potential have poorer gauge recall in a simulated task (Bailey, Scerbo,

Freeman, Mikulka, & Scott, 2006). Introduction of failures during training can effectively reduce complacency in this task (Bahner et al., 2008).

1.4 Human-Human Trust vs. Human-Automation Trust

While previous research has shown that trust plays an important role in human-automation relations, the core question of whether human-automation trust (HAT) is different from

human-human trust (HHT) has been largely left unaddressed. Only recently have researchers taken positions on either side of the emerging debate about the differences and similarities in human-human and human-automation interaction. This may have been sparked by the rapidly changing landscape of automation, as illustrated earlier.

One side supports the contrast hypothesis, which posits that human-automation trust is uniquely different from trust in humans and proposes models that indicate differences in initial perception, automation monitoring performance, and judgments that lead to differences in trust (M. Dzindolet et al., 2003; Lee & See, 2004; Madhavan & Wiegmann,

2007). Others support the opposing similarity hypothesis, which proposes that computers are essentially social actors and that social psychological findings with humans can be replicated with machines (Nass, Steuer, & Tauber, 1994; Nass & Moon, 2000; Reeves &

Nass, 1996). These differing hypotheses are discussed next in terms of a number of relevant empirical findings.

1.4.1 Agent Belief

Purpose

Supporters of the contrast hypothesis suggest that automation essentially lacks intentionality

(Lee & See, 2004; Madhavan, Wiegmann, & Lacson, 2006). They claim automation is typically introduced to an operator without much context or information about its intended purpose. However, experiments in which automation is portrayed as having intent have shown that people treat that automation as a social actor (Nass et al., 1994; Nass & Moon,

2000; Reeves & Nass, 1996; Nass, Moon, Fogg, Reeves, & Dryer, 1995; Nass, Fogg, &

Moon, 1996), although these studies do not typically compare humans and automation in the same study. It is, therefore, unclear whether there are actual differences between trust in other humans and trust in automation, or whether such differences disappear when humans and machines are portrayed as having intentionality. Furthermore, in human performance experiments the automation typically assists operators in some manner, whereas economic game experiments have successfully portrayed humans as both benevolent and malevolent using

vivid descriptions of life events that indicated either their neutral, praiseworthy, or suspect moral character (Delgado et al., 2005). It is unclear whether differences between humans and automation may emerge if automation is portrayed as malevolent or competitive.

Process

Supporters of the contrast hypothesis claim that people have different initial settings for trust. One theoretical model, proposed by Madhavan & Wiegmann (2007), suggests that people hold a near-perfect schema for automated aids, while human aids are expected to be imperfect.

The authors of the model propose that because initial expectations for automation are very high, breakdowns are more severe. This occurs primarily because automation is expected to perform perfectly, as indicated by a general positive bias towards automation known as automation bias (Parasuraman & Manzey, 2010; M. Dzindolet et al., 2003;

Parasuraman & Riley, 1997), whereas humans are expected to make mistakes. Furthermore, automated agents are thought to be invariant and objective in their behavior while humans are believed to be adaptable (Dijkstra & Liebrand, 1998), which may explain why humans reduce reliance on a human aid and increase reliance on an automated aid when faced with higher-risk decisions (Lyons & Stokes, 2011). Automation bias seems to disappear when the experience of human and automated advisors is equated (Lerch, Prietula, & Kulik, 1997), and trust breakdowns are more severe for automation compared to humans when both are deemed expert advisors (Madhavan et al., 2006).

It has been suggested that there is a lack of symmetry between machines and humans because machines cannot reciprocate (Lee & See, 2004). With newly emerging advanced technology that is designed to adapt to the emotional needs of humans, this is likely to change. In fact, the issue of whether machines are or should be moral is of increasing interest, especially given that drones and other types of automation have already been designed and implemented to automatically fire at humans (Economist, 2012). There have, however, been few direct empirical examinations of whether individuals consider machines to be more or less moral compared to humans in certain situations. Unfair

offers were more often rejected from humans than from computers, suggesting a stronger emotional reaction to unfairness from humans (Sanfey, Rilling, Aronson, Nystrom, & Cohen,

2003). It has yet to be examined whether individuals consider humans to have higher moral standards than computers.

The similarity hypothesis would again suggest that if automation is set up as a social actor, individuals make attributions of that automation in ways that are similar to attributions made when interacting with another human.

Humanness

There is quite a bit of variance in the portrayed humanness of humans and automation agents in various experiments. Research that supports the contrast hypothesis typically provides little information about the automation and few surface characteristics are available.

The automation appears as a software entity with few visible features. One exception is a study in which information on the expertise of human or automation agents was provided to participants prior to their working with either humans or automation (Madhavan et al.,

2006). Another is the more recent interest in studies using high fidelity robotic simulations

(de Visser & Parasuraman, 2011). In contrast to this work, others have specifically varied a number of features that are typically characteristic of humans but which are then transplanted to machines. These include appearance (Nass & Moon, 2000), facial expressions (DiSalvo & Gemperle, 2002), voices (Nass & Lee, 2001), and biological motion

(Thompson, Trafton, McCurry, & Francis, 2010). This work has shown that humans exhibit similar responses and make similar established social attributions if the machine is made out to be human. The impact of such human features on trust is less clear.

1.5 Agent Experience

1.5.1 Task Difficulty

Task difficulty has been shown to be a sensitive manipulation in terms of whether people actually rely on the automation or not. When task difficulty is high it is more difficult to detect errors and humans are more willing to comply with the automation because their self-confidence with regard to the task is lower (Lee & Moray, 1992). Furthermore, the ease of determining an error made by the automation may correspond to subsequent trust judgments. One study showed that when an automated aid made errors on an easy task, trust was lower than when an automated aid made errors only during a difficult task

(Madhavan et al., 2006). These results suggest that the difficulty of the task for the operator in part determines how the reliability of and the trust in the aid are assessed. If prior belief is kept constant, it is therefore expected that, compared to human-like agents, machine-like agents will be trusted less when task difficulty is low and they make mistakes. These effects will disappear when task difficulty is increased, simply because it will be more difficult to track the errors of the agents. Furthermore, only the human performance literature typically portrays automation as a decision aid on a common task. Many economic games do not involve such tasks and instead examine social interactions and resource exchange. More consistent comparisons between the types of tasks used with humans and machines are needed to draw more definitive conclusions.

1.5.2 Etiquette

Some have argued that one of the reasons for the adverse effects of working with automation is that the behavior of an automated device does not conform to human-human etiquette

(Brown & Levinson, 1987; Parasuraman & Miller, 2004). This approach aims to design automation that behaves in a manner consistent with human-human protocol and social expectations. Increasing etiquette has also been shown to lead to better performance, reduced situation awareness, and improved trust (Hayes & Miller, 2010; Parasuraman &

Miller, 2004); moreover, researchers have shown that humans react socially to automation in much the same way they react to other humans when the automation acts more human (Nass et al., 1995).

It is less clear how cognitive agents with intermediate levels of humanness are perceived by users. In addition, previous experiments have typically not compared humans and machines together in terms of etiquette.

1.5.3 Trust Repair

Given that many studies concerning human-automation trust involve a breakdown of trust after reliability deteriorates, trust repair seems extremely relevant in the context of human-automation interaction.

However, this topic is largely ignored in the human-automation trust domain. Researchers in this area have proposed several ways to repair trust after it has been broken, but these have mainly been applied to humans and organizations and not to machines. Tomlinson and Mayer (2009) suggested that causal attributions regarding the trust violation dictate the appropriate response to alleviate the violation and initiate a trust repair effort. The authors proposed three attribution dimensions: the locus of causality, controllability, and stability.

The locus of causality refers to who is to blame in the situation and can refer internally to the trustee itself or externally to someone else or another factor in the situation. Controllability focuses on the ability of the trustee to have control over the situation and how much the trustee can be held accountable for the failure. Finally, stability refers to the likelihood of the event occurring again. A number of additional strategies have been proposed to repair trust in organizations. Lewicki and Bunker (1996) recommended four immediate actions after violations have occurred: acknowledging the violation; determining the cause of the violation and admitting fault; admitting the act was destructive; and accepting responsibility for consequences. Gillespie and Dietz (2009) outlined a detailed four-phase strategy including a set of immediate responses, diagnosis, reforming interventions, and an evaluation. They emphasized both regulating distrust and demonstrating trustworthiness.

It is expected that trust may be positively influenced by a repair effort employed by an agent. For example, if an automation agent apologizes for a mistake, we may see

an increase in trust. If a rationale is given for why a machine may fail, this may further improve the restoration of trust (M. Dzindolet et al., 2003). Finally, a combination of these methods may rapidly restore trust or slow the decline of trust (i.e., trust resilience).

Differences in trust repair between humans and machines have thus far not been examined.

1.5.4 Feedback

Feedback on agent performance concerns the timing of the agent’s advice to the human.

The advice can be given before the human gives a response, after the human gives a response (after which they are given a chance to change this response), or in a tally after several trials of the experiment (M. Dzindolet et al., 2003). Few studies have actually examined the effects of feedback on human-automation trust. Dzindolet et al. (2003) showed that the best trust calibration occurs when feedback is provided every five trials and not immediately after each trial. It is therefore expected that when feedback is available, it will differentially impact different agents. More human-like agents may receive the highest trust when no feedback is provided and the lowest when feedback is provided at the end of the task. With immediate or short-delay feedback, it is likely that people will become calibrated to agent capability. The larger the time delay before performance feedback is provided, the more an individual relies on individual differences, automation characteristics, and context.

1.5.5 Level of Automation Assistance

The types of automation used in the previously mentioned studies vary in their functionality, as defined by levels and stages of information processing (Parasuraman, Sheridan, & Wickens,

2000). Example automation tasks include process control (Lee & Moray, 1992), target detection (M. Dzindolet et al., 2003), and decision-making (Rovira et al., 2007). Some studies focus on perception and comprehension types of automation; others focus on decision-making and execution-based automation. It is important to identify the effects of automation at different levels and stages along the functional spectrum because this can have differential effects on

performance. For instance, breakdowns in reliability have been shown to be particularly harmful in the decision phase of automation as compared to earlier stages of information processing (Rovira et al., 2007). This is a concern considering that the different features and capabilities of the automation seem to be critical to its overall perception and perceived trust. These features have not been systematically controlled, which can confound the inferences made about the differences between HHT and HAT. It may therefore be better to work with a basic laboratory version of an automation task, such as the one used by Rovira et al. (2007), in which the features and characteristics can be controlled.
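
For reference, the taxonomy of Parasuraman, Sheridan, and Wickens (2000) distinguishes four stages of information processing that automation can support, each of which can be implemented at different levels. The sketch below simply encodes the four stages and tags the example tasks mentioned in this section; the stage assignments are an illustrative reading of those studies, not classifications taken from the original papers.

```python
from enum import Enum

class AutomationStage(Enum):
    """Four stages of information processing (Parasuraman, Sheridan, & Wickens, 2000)."""
    INFORMATION_ACQUISITION = 1
    INFORMATION_ANALYSIS = 2
    DECISION_SELECTION = 3
    ACTION_IMPLEMENTATION = 4

# Rough, illustrative stage assignments for the example tasks mentioned above:
example_tasks = {
    "process control (Lee & Moray, 1992)": AutomationStage.ACTION_IMPLEMENTATION,
    "target detection (Dzindolet et al., 2003)": AutomationStage.INFORMATION_ANALYSIS,
    "decision aiding (Rovira et al., 2007)": AutomationStage.DECISION_SELECTION,
}
for task, stage in example_tasks.items():
    print(f"{task}: stage {stage.value} ({stage.name.replace('_', ' ').lower()})")
```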

1.5.6 Trust Evolution

Some have asserted that trust may evolve differently between humans and other humans compared to humans and machines (Lee & See, 2004). The suggestion is that operators typically start relationships with machines based on faith and gradually switch to monitoring the automated agents. Human relationships, on the other hand, start with monitoring of evidence and gradually change to faith-based relationships. There are several issues with this suggestion. First, human relationships often start with purpose/faith based on little information. As described earlier, expert teams generate trust swiftly. Second, the level of information about automated agents varies enormously. For instance, we know a lot more about Siri than about our local ATM. How relationships with these agents develop depends for the most part on the situation, which makes it harder to attribute this to an actual difference between human and automated agents. Such temporal dynamics have to be compared more rigorously in an experiment that varies and controls for the situation.

1.5.7 Individual Differences

Individual operator characteristics may play an important role in how trust policy is set for any agent. These characteristics may interact with how harmful an agent is estimated to be and how this fits with the likelihood of attributing certain qualities to agents. For instance, a genetic study showed that people with one variant of a gene verified the automation more and trusted it less compared to people with a different variant of the gene (Parasuraman, de Visser, Lin, & Greenwood, 2012). This may reflect a generally more skeptical attitude or an astute ability to detect harm rather than a feature unique to automation. Something similar may occur for humans. Nonetheless, specific questionnaires have been developed to assess attitudes towards technology. The complacency potential scale is one example and assesses how likely people are to comply with humans as compared to machines (Singh, Molloy,

& Parasuraman, 1993). A number of other questionnaires have been developed to assess similar attitudes towards technology. Some personality characteristics, like openness and agreeableness, have been linked to increases in trust. Finally, constructs such as the ability to anthropomorphize (Epley, Waytz, & Cacioppo, 2007) and to technomorphize have more recently been implicated as important for technology attitudes and trust.

1.6 The Cognitive Agent Experiments

Several important issues that emerge from this debate make it hard to draw definitive conclusions about whether and how human-human trust differs from human-automation trust. There are several motivating factors that make it prudent to re-examine this question and move towards a more general theory of human-agent trust that can account for a variety of outcomes in human interaction with different agents.

First, the traditional dichotomy between humans and machines may no longer apply, given that novel cognitive agents may not be classified as either entirely human or entirely machine, which has ramifications for trust behavior that may not be accounted for by either theory.

Second, efforts to examine differences between agents are hampered by the tendency of researchers studying trust to invent new constructs such as human-robot trust, medical trust, and so on, without establishing why a new trust construct has to be used in the first place.

Would we be required to establish a new field of human-alien trust if and when aliens arrive on earth? Moving towards a unified human-agent theory that incorporates the characteristics of different agents and their effects on trust would provide parsimony and remove the need for some of these arbitrary categories. Third, it is important to identify the common components of trust, which is hard if everyone defines trust components with trust as part of the component. Fourth, for the most part, the differences between human-human trust and human-automation trust have been assumed and not tested directly. Direct comparisons between humans and machines remain rare and must be made in order to draw firm conclusions about any differences or similarities.

1.6.1 The Current Study

In the near future, we may get advice from a number of different kinds of agents in the same environment. An example of such a situation is seeking advice at the airport regarding a variety of unknowns. Generally, every country has different procedures at an airport.

In this uncertain environment, passengers will depend on advice from agents to navigate the procedures of the airport. When travelers need advice, they may consult the airport employees to ask for guidance. More recently, holograms that appear human have been installed at various airports in the world to provide travelers with advice on immigration procedure and other matters. Computer terminals may provide additional assistance about itineraries. Passengers will have to determine how they will use this information and what sources they will rely on most to make decisions. Whether the source is a computer, avatar, or human could make a difference to how trustworthy they deem the advice. With constantly changing conditions in the airport environment, aids may not always provide reliable advice. Finding out whether the agents are correct may take a while because the facts may emerge over time. This may have differential effects with the involvement of a computer agent, an intermediate avatar agent, or a human agent.

In the present series of experiments, we examined the performance of participants on a numerical pattern detection task with the assistance of an automated aid (van Dongen

& van Maanen, 2006). We created three agents along the cognitive spectrum: a computer, an avatar, and a human. Agent background stories were created to communicate either a cooperative or a competitive intent on the part of the agents. We hypothesized that participants' preference for working with cooperative agents would increase as their humanness

decreased, consistent with the automation bias hypothesis. We expected that participant preference for working with competitive agents would decrease more so for humans than for more machine-like agents, primarily because a competitive human may be considered a more severe threat than a competitive machine. Brief videos of each agent were shown prior to each recommendation to create more realistic agents and to remind participants about the kind of agent they were interacting with.

In addition to varying agency, we also manipulated the reliability (accuracy) of the agent. Trust has been shown to decrease consistently with decreased reliability. Furthermore, decreased reliability has been shown to have a more detrimental effect for machines than for humans. To test this effect, we steadily increased the number of errors in the advice provided by the aid and so decreased reliability over time. Perfect advice was given initially at 100% reliability. Agent reliability was then decreased systematically over time, ending with a 0% reliability condition. There were several reasons for varying reliability in this manner. First, there has been some debate about the reliability rate that is still considered to produce useful advice. It has been proposed that automation with 75% reliability is considered to be useful, but any reliability below this threshold is considered to be harmful (Wickens & Dixon, 2007). Evidence against this proposition has been provided in cases where automation is prone to misses and any correct advice is still beneficial (de

Visser & Parasuraman, 2011). Second, no study has examined a 0% reliability condition.

In this case, the aid is still providing information because it is always incorrect. It is also interesting to examine whether this 0% reliability would be considered differently if it came from a human rather than a machine, depending on the intentionality of the agent. For a well-intentioned agent, low reliability may be considered a flaw in ability/performance. For a malevolently intentioned agent, low reliability may be considered lying or sabotage, an issue at the purpose level of information. This has not previously been examined. Third, variable reliability allows examination of how calibrated participants are to actual capability.

A person is said to be optimally calibrated if perceived reliability matches actual reliability

(Cohen et al., 1998; Lee & See, 2004). Any miscalibration becomes a sensitive measure for

examination of differences between the agents. Each reliability level was repeated for two phases. This was done to examine a novel phase, when a new reliability level is introduced, and a familiar phase, in which participants have tuned in to a reliability level and work with an agent over a period of time.

Finally, we also manipulated the delay of the feedback provided to the participants as well as the type of feedback. Delaying feedback increases uncertainty and decreases the amount of observable performance evidence available. We expected that participants would rely more on their initial beliefs with delayed feedback and thus reinforce possible agent stereotypes. With immediate performance feedback, differences between the agents may disappear because of the overwhelming available evidence. In addition, we manipulated the performance feedback given after each agent block by adding empathic feedback. We showed positive, congratulatory, and happy feedback if the human-agent team achieved higher than a 50% performance score on the block; when performance was below this threshold, participants received remorseful, apologetic, and sad feedback. We expected this type of feedback would create a kind of resilience against any trust decrement, producing a shallower slope compared to some of the other experiments.
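
The empathic feedback manipulation reduces to a single threshold check on block performance, as the sketch below illustrates. The message strings are placeholders rather than the actual stimuli shown to participants, and the behavior at exactly 50% is an assumption made for this illustration.

```python
def empathic_feedback(block_accuracy):
    """Select empathic feedback for a completed agent block (placeholder wording).

    Accuracy above 50% yields congratulatory, happy feedback; accuracy at or
    below the threshold (assumed here) yields remorseful, apologetic feedback.
    """
    if block_accuracy > 0.50:
        return "Great job! I'm glad we got that one right together."
    return "I'm sorry, I let you down on that one."

print(empathic_feedback(0.83))  # congratulatory message
print(empathic_feedback(0.33))  # apologetic message
```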

In summary, the present investigation examined performance, compliance, and trust while working with agents varying on the cognitive-agent spectrum in terms of humanness.

We varied the onset of feedback, the intentionality of the agents, and the type of feedback. We predicted no differences between agents with immediate feedback, as almost complete performance certainty could be achieved. With increases in uncertainty achieved by delaying performance feedback, we expected differences to appear such that performance, compliance, and trust would decrease more as humanness increased compared to more machine-like agents. We expected this effect to disappear or even reverse as reliability decreased. With competitive intent, we expected performance, trust, and compliance to decrease as humanness increased, for the reason that machines would remain more predictable than humans. We expected this effect to disappear or even reverse as reliability decreased. Finally, we expected that empathy and humanness would interact such that the more human agents would be trusted

and complied with more than the machine-like agents.

Four experiments were conducted to test these hypotheses. In Experiment 1, we examined performance, compliance, and trust following advice by either a computer, an avatar, or a human that gradually became more erroneous (less reliable) over time, with immediate performance feedback on every trial. In Experiment 2, we examined performance, compliance, and trust on the same task but delayed performance feedback until after every block of 6 trials. Experiment

3 was identical to Experiment 2, except that the initial agent stories were changed to be adversarial as opposed to the cooperative stories used in all other experiments. Experiment

4 was identical to Experiment 2, except that after each agent block, positive or negative feedback was provided adaptively based on the participant's performance.

Chapter 2: Methods and Results

2.1 Experiment 1

2.1.1 Method

Participants

Twenty-two young adults were rewarded monetarily for participation in this study based on their performance. Participants were given a starting fund of $35 at the start of the experimental task and were penalized $0.35 for every incorrect answer. This monetary loss scheme was chosen to increase the vulnerability of participants, an important factor in trust experimentation. In this first experiment, the earnings were continuously shown to the participant and updated throughout the duration of the experiment. Participants additionally received a base fee of $10 upon completion of the study.

Design

The experimental design for this study was a 3x4x2 repeated measures design with agent

(computer, avatar, human), reliability (100%, 75%, 50%, 0%), and phase (novel, familiar) as within-subject factors (see Fig. 2.1). Each agent block (represented by a single square in Fig. 2.1) comprised 6 trials and was randomized within the reliability block totaling 18 trials for the three agents. Reliability blocks were presented in the order shown starting at 100% reliability and then declining to 0%. Each of the 4 reliability levels was presented twice for a total of 8 experimental blocks constituting the phase variable. A total of 144 experimental trials were presented to participants.
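
The block structure described above can be reproduced with a short sketch: four reliability levels presented in descending order, each repeated for a novel and a familiar phase, with the three 6-trial agent blocks randomized within each reliability block. The shuffling routine and seed are implementation details assumed here; they are not taken from the original experiment software.

```python
import random

AGENTS = ["computer", "avatar", "human"]
RELIABILITY_LEVELS = [1.00, 0.75, 0.50, 0.00]   # presented in descending order
PHASES = ["novel", "familiar"]
TRIALS_PER_AGENT_BLOCK = 6

def build_schedule(seed=0):
    """Build the 3 (agent) x 4 (reliability) x 2 (phase) trial schedule."""
    rng = random.Random(seed)
    schedule = []
    for reliability in RELIABILITY_LEVELS:
        for phase in PHASES:
            agent_order = AGENTS[:]
            rng.shuffle(agent_order)             # agent blocks randomized within each block
            for agent in agent_order:
                for trial in range(TRIALS_PER_AGENT_BLOCK):
                    schedule.append((agent, reliability, phase, trial + 1))
    return schedule

schedule = build_schedule()
print(len(schedule))  # 144 experimental trials, matching the design described above
```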

Story Stimuli

Stories were created to establish an agent's intent and attitude (see Appendix A). The agent stories contained a brief life history, a paragraph about their personality, and a fictional excerpt of a news report about a characteristic behavior. The stories were modeled using the trust dimensions of process and purpose information (Lee & See, 2004) and stories developed by Delgado et al. (2005). Stories with both cooperative and competitive intent were tested for 10 agents along the human-agent spectrum using participants recruited on Amazon's Mechanical Turk. Pilot results confirmed that participants perceived the agents along the cognitive spectrum (see Fig. 2.2). Based on these results, the computer, avatar, and human agents were selected as representative samples of the Human-Machine Spectrum for further experimentation (see Appendix A for all stories). In this experiment, the agent's intent was made out to be cooperative and helpful.

Figure 2.1: The experimental design

Figure 2.2: The Cognitive Agent Spectrum

Video Stimuli

Videos were used to provide additional cues for an agent’s thinking process, to further establish the behavior and overall perception of the agents. For the experimental task,

48 videos were created for each agent to show the process of thinking prior to providing participants with the recommendation. All videos were cut to 4 to 5 seconds and were preceded with the text “linking...” and “connecting...” in white font on a black background. These visualizations were included to provide a smooth transition between waiting for the recommendation and the video loading remotely from a server. For the human condition, a male actor from the University's theatre department was filmed (see Fig. 2.3).

The actor was instructed to act as a person giving advice to another person pretending to look at a computer screen. The actor was instructed to make various sounds and say various phrases such as “let me think...” and “mmmm..” to simulate that he was thinking before giving his advice. The actor used a keyboard and looked off-screen to make his recommendation, to suggest that he was entering the recommendation into a computer so that it could then be sent back to the participant. The avatar agent was created using the online website XtraNormal.

A standard avatar agent template was used and modified to create a realistic avatar (see

Fig. 2.3). This avatar was animated to say “thinking” phrases similar to those used by the human agent (see Appendix B for all behaviors and dialogue used in the videos). The computer agent consisted of blinking lights compiled from separate GIF images and put in various orders to simulate thinking (see Fig. 2.3). The computer videos were made such

that there was little difference between the videos, similar to the standard responses of many representative computers. The sound accompanying the images consisted of a series of beeps obtained from www.trekcore.com/sounds. All videos were prepared and exported with the Final Cut Pro 5.4.1 video editor software.

Figure 2.3: The Cognitive Agents

Materials

The pre-experimental evaluation of the agents consisted of a subset of items derived from the Godspeed measure (Bartneck, Kulić, Croft, & Zoghbi, 2009), a trust measure (Jian et al., 2000), and performance predictions (M. Dzindolet et al., 2003). A custom measure, inspired by the risk and judgment literature (Finucane, Alhakami, Slovic, & Johnson, 2000), was created to assess the perceived risk vs. benefit of working with an agent.

Task

A version of the TNO Trust Task (T3) was adapted to develop a trust paradigm (van Dongen & Maanen, 2006). In the T3, participants were asked to discover a pattern (the pattern was as follows: 1,2,3,1,2) in a sequence of numbers with the help of an automated aid (see Fig. 2.4). Participants were first asked to select one of three numbers: 1, 2, or 3, which constituted their first choice. Then, a video of the agent was presented showing that the aid was thinking before making a recommendation about the correct number in the sequence. When the recommendation was shown, participants made their final choice of number.

The correct answer was then shown for 2 seconds. After this step, the next trial started and the process was repeated. Each trial lasted about 10 seconds. The immediate feedback allowed participants to adjust their mental model of the pattern by comparing the correct answer with their perceptions of the reliability of their own predictions, the predictions of the decision aid, and their reliance decisions. Earnings were also shown continuously and decreased steadily as the number of incorrect answers increased. Ten percent noise was introduced into the pattern to make it difficult to learn; this percentage was determined in pilot testing to produce sufficient difficulty in performing the task. Errors in the advice of the agents were randomized within each agent block. After each agent block, participants rated their trust in the agent and their self-confidence in their own ability.
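The sketch below simulates the essential structure of this task under the assumptions stated in its comments; it is illustrative only and is not the experiment software.

```python
# Minimal simulation sketch of the adapted T3: a repeating 1,2,3,1,2 pattern with 10%
# noise, and an aid whose advice matches the correct answer on a fixed proportion of
# trials within a block. Function and variable names are illustrative.
import random

PATTERN = [1, 2, 3, 1, 2]
NOISE_RATE = 0.10  # ten percent noise, as determined in pilot testing

def correct_sequence(n_trials: int, rng: random.Random) -> list[int]:
    """Correct number on each trial: the base pattern, occasionally replaced at random."""
    answers = []
    for t in range(n_trials):
        value = PATTERN[t % len(PATTERN)]
        if rng.random() < NOISE_RATE:
            value = rng.choice([1, 2, 3])  # noise trial: the pattern is violated
        answers.append(value)
    return answers

def aid_advice(correct: list[int], reliability: float, rng: random.Random) -> list[int]:
    """Aid recommendations: errors are randomized within the block at a rate of 1 - reliability."""
    n_errors = round(len(correct) * (1 - reliability))
    error_trials = set(rng.sample(range(len(correct)), n_errors))
    return [rng.choice([v for v in (1, 2, 3) if v != c]) if t in error_trials else c
            for t, c in enumerate(correct)]

rng = random.Random(42)
block_answers = correct_sequence(6, rng)                              # one 6-trial agent block
block_advice = aid_advice(block_answers, reliability=0.75, rng=rng)   # 75% reliable aid
```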

Measurements. A number of measures were derived from the different stages in the task.

These measures comprise a variety of behavioral performance measures relevant to trust and compliance.

Pattern recognition performance. Pattern recognition was calculated by taking the correctly predicted numbers on the first choice by the participant (before the aid's recommendation).

Human-agent performance. Human-agent performance was calculated as the overall correctly answered trials on the final choice made by the participant (after the aid had made the recommendation) over the number of trials per agent block.

27 Figure 2.4: The TNO Trust Task (T 3).

Compliance. Compliance was measured as the overall number of trials in which the participant had the same final answer as the recommendation by the aid. This measure indicates agreement with the advice and includes data in which the participant picked the same number for the first choice as the agent.

Self-Reliance. Self-reliance was measured as the overall number of trials in which the first choice of the participant matched the final choice. This measure indicates the persistence and self-confidence participants have in their own choices. The inverse of this measure is persuasion, a stronger form of compliance in which participants switch their answers away from their first choice.

Trust and Self-Confidence. Trust and self-confidence were measured after each agent block. To measure trust, participants were asked to indicate to what extent they trusted the agent's ability to correctly predict the number. To measure self-confidence, participants were asked to indicate to what extent they were confident in their own ability to correctly predict the number. Both trust and self-confidence were rated on a 10-point scale. A 10-point scale was used to enable an analysis of calibrated trust against the actual reliability of the aid, which ranges from 0 to 1.
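A brief sketch of how these block-level measures could be computed from trial data is given below; the field names and the use of proportions (matching the 0-1 scale of the figures) are assumptions for illustration.

```python
# Sketch of the behavioral measures for one agent block. Each trial is assumed to be a
# dict holding the participant's first choice, final choice, the aid's advice, and the
# correct answer; all measures are expressed as proportions of trials in the block.
def block_measures(trials: list[dict]) -> dict:
    n = len(trials)
    return {
        # first choice correct, before the aid's recommendation
        "pattern_recognition": sum(t["first_choice"] == t["correct"] for t in trials) / n,
        # final choice correct, after the aid's recommendation
        "human_agent_performance": sum(t["final_choice"] == t["correct"] for t in trials) / n,
        # final choice matched the aid's advice
        "compliance": sum(t["final_choice"] == t["advice"] for t in trials) / n,
        # final choice matched the first choice (its inverse reflects persuasion)
        "self_reliance": sum(t["final_choice"] == t["first_choice"] for t in trials) / n,
    }
```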

Procedure

Participants were welcomed into the experimental suite and the procedure for earning money was explained to them. After signing the informed consent form, participants were asked to fill out the individual differences questionnaires. Participants were then told that they would participate in a pattern recognition task and that they would receive recommendations from different kinds of agents. They then learned about the different agents (computer, avatar, human), read their stories, and rated them on the Godspeed measure, the trust measure, the risk vs. benefit measure, and the performance prediction measures. Participants were given a brief break in which they could get some water and eat some available candy while the experimental task was being set up. They then proceeded to complete the T3. Fifty trials were used for training with an anonymous aid and no video stimuli to establish baseline performance on the task. Participants then proceeded with the experimental blocks. After the experimental task was completed, participants filled out the post questionnaire, which contained the same measures as the first ratings as well as the causality attribution measure and several open-ended questions about the agents.

Participants were paid according to their performance and thanked for their participation.

The total time in the lab for each participant was about 2 hours.

2.1.2 Results

Pre- and post-subjective measures were submitted to a 3x2 repeated measures ANOVA with Agent (computer, avatar, human) and stage (belief, experience) as within-subject factors. Behavioral data collected from the experimental task were submitted to a 3x4x2 repeated measures ANOVA with Agent (computer, avatar, human), reliability (100%, 75%, 50%, 0%), and phase (novel, familiar) as within-subject factors. Additional analyses were carried out by separating behavioral data by reliable and unreliable trials and are reported where appropriate. All effects reported here are significant at the 0.05 level. All simple effect comparisons were corrected with Tukey's honestly significant difference test.
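For illustration, the structure of the behavioral analysis could be expressed as below using statsmodels' AnovaRM as a stand-in; the original analyses may have been run in other software, and the data-frame layout and column names are assumptions.

```python
# Sketch of a 3 (agent) x 4 (reliability) x 2 (phase) within-subject ANOVA on one
# behavioral measure; 'df' is assumed to contain one row per participant x cell with
# columns 'participant', 'agent', 'reliability', 'phase', and the dependent variable.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def behavioral_anova(df: pd.DataFrame, dv: str):
    """Repeated measures ANOVA on the measure named by dv (e.g., 'compliance')."""
    model = AnovaRM(data=df, depvar=dv, subject="participant",
                    within=["agent", "reliability", "phase"])
    return model.fit()

# Example usage: print(behavioral_anova(df, "compliance"))
```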

The Human-Machine Spectrum

There was a main effect for agent on perceived humanness, F (2, 38) = 161.14, p < 0.05.

Simple effect analyses revealed that as expected, the computer (M = 1.4, SEM = 0.11) was considered significantly more machine-like than the human (M = 4.47, SEM = 0.12), t(38) = 16.62, p < 0.05. The avatar (M = 1.85, SEM = 0.19) was also considered significantly more machine-like than the human, t(38) = 14.19, p < 0.05, but significantly more human-like than the computer, t(38) = 2.43, p < 0.05.

Pattern recognition performance

There was a significant 3-way interaction between agent, reliability and phase for pattern recognition performance, F (6, 126) = 2.91, p < 0.05, η²G = 0.02 (see Fig. 2.5). Subsequent analyses revealed an agent by reliability interaction for the novel phase alone, F (6, 126) = 4.53, p < 0.05, η²G = 0.05. Simple effects analyses for the novel phase condition showed that in the 100% condition working with the computer resulted in significantly higher performance compared to working with the human, t(42) = 2.43, p < 0.05. In the 75% condition, working with the computer resulted in significantly higher performance compared to working with the human, t(42) = 2.55, p < 0.05. In this condition, working with the avatar also resulted in significantly higher performance compared to the human, t(42) = 3.69, p < 0.05. In the 0% condition, however, working with the computer resulted in significantly lower performance compared to both the human, t(42) = 3.11, p < 0.05, and the avatar, t(42) = 3.44, p < 0.05.

Figure 2.5: Agent by reliability for pattern recognition performance in Experiment 1. The red dashed line indicates chance performance on the task.

Human-agent performance

Human-agent performance significantly decreased as reliability decreased, F (3, 63) = 93.21, p < 0.05 (see Fig. 2.6). Performance in the 0% condition (M = 0.42, SEM = 0.04) was significantly lower compared to overall performance in the 100% condition (M = 0.95, SEM = 0.01), t(63) = 15.73, p < 0.05. Performance in the familiar phase was significantly higher (M = 0.66, SEM = 0.02) compared to performance in the novel phase (M = 0.61, SEM = 0.02), F (1, 21) = 5.75, p < 0.05. No significant differences were observed between the agents for overall performance.

Figure 2.6: Agent by reliability for overall performance in Experiment 1.


Figure 2.7: Agent by reliability for compliance in Experiment 1. The red triangles indicate calibrated performance.

Compliance

There was a significant 3-way interaction between agent, reliability and phase for compliance, F (6, 126) = 3.4, p < 0.05 (see Fig. 2.7). Overall compliance in the 0% condition (M = 0.32, SEM = 0.03) was significantly lower compared to overall compliance in the 100% condition (M = 0.95, SEM = 0.01), t(63) = 20.4, p < 0.05. Subsequent analyses revealed an agent by reliability interaction for the novel phase alone, F (6, 126) = 3.43, p < 0.05. Simple effects analyses for the novel phase in the 100% condition showed significantly higher compliance with the computer compared to the human, t(42) = 3.08, p < 0.05. This difference disappeared in the 0% condition, t(42) = 0.33, p = 0.94. The red triangles in Fig. 2.7 indicate calibrated compliance; that is, when perceived reliability matched actual reliability. In this experiment participants were calibrated highly in the 100% conditions, but then steadily over-relied on the aid's advice. Calibration was also higher in the familiar phase compared to the novel phase.
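As a brief sketch of the calibration comparison referred to here and in later figures, behavior can be compared directly against the aid's actual reliability; rescaling the 10-point trust rating by dividing it by 10 is an assumption consistent with the scale description above.

```python
# Sketch of a block-level calibration check: positive values indicate over-reliance or
# over-trust relative to the aid's actual reliability, negative values under-reliance.
def calibration_error(observed: float, reliability: float) -> float:
    """observed: compliance proportion, or trust rating / 10; reliability: 0 to 1."""
    return observed - reliability

# Example: calibration_error(observed=0.32, reliability=0.0) -> 0.32 (over-reliance at 0%)
```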

Self-Reliance

There was a significant 3-way interaction between agent, reliability and phase for self-reliance,

F (6, 126) = 2.85, p < 0.05 (see Fig. 2.8). Subsequent analyses revealed an agent by reliability interaction for the novel phase alone, F (6, 126) = 4.75, p < 0.05. Simple effects analyses for the novel phase in the 100% condition showed no significant differences between agents, t(42) = 1.58, p = 0.25. In the 75% condition, self-reliance was maintained for the avatar and was significantly higher compared to both the computer, t(42) = 2.79, p < 0.05, and the human, t(42) = 4.32, p < 0.05.


Figure 2.8: Agent by reliability for self-reliance in Experiment 1.

Trust

Trust significantly decreased as reliability decreased, F (3, 63) = 108.75, p < 0.05 (see Fig.

2.9). Trust in the 0% condition (M = 2.85, SEM = 0.45) was significantly lower compared to trust in the 100% condition (M = 9.23, SEM = 0.16), t(63) = 17.66, p < 0.05. No significant differences were found between the different agents. The red triangles in Fig.

2.9 again indicate calibrated trust; when perceived reliability as measured by trust matches actual reliability. In this experiment participants were calibrated highly in the 100% and

75% conditions, but then over-trusted agents in the 50% and 0% conditions.


Figure 2.9: Agent by reliability for trust in Experiment 1. The red triangles indicate calibrated trust.

Perceived Security

There was a main effect for agent on perceived security, F (2, 38) = 7.29, p < 0.05. Simple effect analyses revealed that the computer (M = 3.62, SEM = 0.16) was perceived to be providing significantly more security than the human (M = 2.9, SEM = 0.15), t(38) =

3.69, p < 0.05. No significant differences were found between the avatar (M = 3.35, SEM

= 0.15) and the human, t(38) = 2.29, p = 0.06, or between the computer and the avatar, t(38) = 1.4, p = 0.34. Perceived security was significantly lower in the experience phase

(M = 2.98, SEM = 0.14) compared to the belief phase (M = 3.6, SEM = 0.15), F (1, 19)

= 8.87, p < 0.05.

Perceived Morality

There was a significant interaction between agent and stage for perceived morality, F (2,

38) = 9.09, p < 0.05 (see Fig. 2.10). The human agent was considered significantly more aware of right and wrong compared to the computer, t(95) = 8.57, p < 0.05, and compared to the avatar, t(95) = 7.54, p < 0.05. Only for the human did perceived morality decrease in the experience stage compared to the belief stage, t(95) = 3.43, p < 0.05.


Figure 2.10: Agent by time for perceived morality in Experiment 1.

2.1.3 Discussion

The purpose of this study was to investigate the effects of working with different cognitive agents on performance, compliance, and trust. First and foremost, all agents were rated to be different in terms of humanness on the cognitive agent spectrum. These results provide an empirical validation of the manipulation of agents along the cognitive spectrum.

Behavioral differences were found primarily between the computer and the human for pattern recognition performance and compliance in the very first reliability block.

Subsequent blocks did not show any differences. This suggests that computers were seen initially to be more reliable than humans.

This study also shows evidence for initial perceived differences between agents. The more machine-like agents were initially perceived to provide more security than their human counterpart. This difference disappeared after participants had interacted with the agents.

Results indicate that participants showed a high degree of calibration, as apparent in both the behavioral compliance measure and the subjective trust ratings. Participants also correctly shifted their strategies away from the agent's advice when reliability started to decrease.

Compliance declined to low levels in the last 0% reliability level. Because immediate performance feedback is often not available to us in real-world settings, performance feedback was delayed until after each agent block (i.e., every 6 trials instead of every trial) in a follow-up experiment. It was expected that any differences between the agents would emerge more clearly as participants would have to rely on their own biases to make decisions within each agent block.

2.2 Experiment 2

2.2.1 Method

Participants

Eighteen young adults were rewarded monetarily for participation in this study based on their performance. The reward scheme was identical to the first experiment. None had

participated in the previous experiment.

Task

The task was identical to that of the first experiment except that feedback about the correct answer was no longer provided to participants after each trial. Instead, performance feedback was given after each agent block of 6 trials. The screen showed how many trials out of 6 were correctly answered. An update on the remaining earnings was also shown.

After this performance feedback, participants rated their trust in the agent and their self-confidence in their own ability. The design, stimuli, and materials were all identical to those used in Experiment 1.

Procedure

The procedure was identical to that of Experiment 1. Participants filled out the individual differences questionnaires, were introduced to the agents, rated the agents, and proceeded to complete the T3. After the experimental task was completed, participants filled out the post questionnaires and were paid according to their performance and thanked for their participation.

2.2.2 Results

Pre- and post-subjective measures were submitted to a 3x2 Repeated Measures ANOVA with factors of Agent (computer, avatar, human) and stage (belief, experience) as within-subject variables. Behavioral data collected from the experimental task were submitted to a 3x4x2 repeated measures ANOVA with factors of Agent (computer, avatar, human), reliability

(100%, 75%, 50%, 0%) and phase (novel, familiar) as within-subject factors. Additional analyses were carried out by separating behavioral data by reliable and unreliable trials and are reported where appropriate. All effects reported here are significant at the 0.05 level. All simple effect comparisons were corrected for with Tukey’s honestly significant differences test.

The Human-Machine Spectrum

There was a main effect for agent on perceived humanness, F (2, 38) = 143.59, p < 0.05. As found in Experiment 1, simple effect analyses revealed that the computer (M = 1.27, SEM

= 0.12) was considered significantly more machine-like than the human (M = 4.65, SEM

= 0.18), t(38) = 15.88, p < 0.05. The avatar (M = 1.88, SEM = 0.18) was also considered significantly more machine-like than the human, t(38) = 13.06, p < 0.05, but significantly more human-like than the computer, t(38) = 2.82, p < 0.05.

Pattern Recognition Performance

Pattern recognition performance significantly decreased as reliability decreased, F (3, 51) =

7.08, p < 0.05. Performance in the 0% condition (M = 0.29, SEM = 0.02) was significantly lower compared to overall performance in the 100% condition (M = 0.47, SEM = 0.03), t(51) = 4.71, p < 0.05.

Human-agent performance

There was a significant 3-way interaction between agent, reliability and phase for human-agent performance, F (6, 102) = 2.65, p < 0.05 (see Fig. 2.11). Subsequent analyses revealed an agent by reliability interaction for the familiar phase, F (6, 102) = 3.42, p < 0.05. Simple effects analyses for the familiar phase in the 100% condition showed significantly higher performance with the computer compared to the human, t(34) = 4.09, p < 0.05, and between the avatar and the human, t(34) = 2.88, p < 0.05, but not between the computer and avatar, t(34) = 1.21, p = 0.45. No such differences were found in the 0% condition of the familiar phase.


Figure 2.11: Agent by reliability for overall performance in Experiment 2. The red dashed line indicates chance performance on the task.

Compliance

There was a significant main effect for agent on compliance, F (2, 34) = 5.39, p < 0.05 (see

Fig. 2.12). Simple effect analyses revealed that compliance was significantly higher with the computer (M = 0.82, SEM = 0.03) than with the human (M = 0.71, SEM = 0.04), t(34)

= 3.23, p < 0.05. No significant differences were found between the avatar (M = 0.75, SEM

= 0.03) and the human, t(34) = 1.1, p = 0.51, or the computer and the avatar, t(34) =

2.13, p = 0.08. The red triangles in Fig. 2.12 indicate calibrated compliance. Participants

were again calibrated highly in the more reliable conditions. However, participants in this experiment complied more in the 0% condition than did participants in Experiment 1 in the same condition.

Figure 2.12: Agent by reliability for compliance in Experiment 2. The red triangles indicate calibrated performance.

Self-Reliance

There was a significant interaction between reliability and phase for self-reliance, F (3, 51)

= 7.71, p < 0.05 (see Fig. 2.13). In the novel phase, self-reliance decreased in the 0%

condition compared to the 100% condition, t(51) = 2.95, p < 0.05. In the familiar phase, however, self-reliance increased in the 0% condition compared to the 75% condition, t(51)

= 3.83, p < 0.05.

Figure 2.13: Agent by reliability for self-reliance in Experiment 2.

Trust

Trust significantly decreased as reliability decreased, F (3, 51) = 51.17, p < 0.05 (see Fig.

2.14). Trust in the 0% condition (M = 2.26, SEM = 0.48) was significantly lower compared to trust in the 100% condition (M = 7.77, SEM = 0.3), t(51) = 11.65, p < 0.05. No

significant differences were found between the different agents. The red triangles in Fig.

2.14 indicate calibrated trust. In this experiment participants were under-relying in the

100% and 75% conditions, calibrated in the 50% condition, and over-relying in the 0% condition. Mis-calibration in this experiment was higher compared to Experiment 1, in which immediate feedback was provided.

Figure 2.14: Agent by reliability for trust in Experiment 2. The red triangles indicate calibrated trust.

Perceived Security

There was an interaction between agent and time on perceived security, F (2, 38) = 3.7, p

< 0.05 (see Fig. 2.15). The computer agent was perceived to be providing the most security, while the avatar and the human were perceived to provide similar levels of security, but less than the computer agent. After the experiment, all agents were perceived as providing less security, but now the avatar was perceived to be similar to the computer agent.

Figure 2.15: Agent by Time for perceived security in Experiment 2.

Perceived Morality

There was a main effect for agent on perceived morality, F (2, 38) = 71.65, p < 0.05 (see Fig.

2.16). The human was considered as having the highest moral standards. The computer and avatar were rated as having significantly lower perceived morality compared to the human agent.

Figure 2.16: Agent by Time for perceived morality in Experiment 2.

2.2.3 Discussion

The purpose of this study was to investigate the effects of working with different cognitive agents on performance, compliance, and trust. With the delayed feedback introduced in this experiment, several differences between the agents emerged. Participants complied more with the computer agent compared to the human or avatar agents, which is consistent with the automation bias phenomenon.

Compared to Experiment 1, all agents were trusted less. The additional uncertainty is likely to have caused mis-calibration. Throughout the experiment, participants over-relied on the advice of the agents. This explains why performance decreased more dramatically as reliability decreased over time compared to Experiment 1.

While initially the computer agent was perceived to be providing more security than the human and the avatar agent, this difference disappeared after the experimental task was completed. This was similar to the effects observed in Experiment 1.

Even though behavioral differences were observed in this experiment due to increasing uncertainty in the experiment, the in-task trust measure did not differentiate between the agents. The initial belief setup for the agents potentially has a large impact on how trust is perceived over time. The initial stories were changed to be competitive and adversarial to observe effects with a different kind of dynamic. All other stimuli were held constant for

Experiment 3.

2.3 Experiment 3

2.3.1 Method

Participants

Twenty-two young adults were rewarded monetarily for participation in this study based on their performance. The reward scheme was identical to the first experiment. None had participated in the previous experiment.

Story Stimuli

Stories were created in similar fashion as those used for Experiment 1 and 2 except that the intent of the agents was made to be adversarial and competitive. The agent stories contained a brief life history, a paragraph about their personality, and a fictional excerpt of a news report about a characteristic behavior (see Appendix B for all stories).

2.3.2 Results

The Human-Machine Spectrum

There was a main effect for agent on perceived humanness, F (2,42) = 111.75, p < 0.05. The human agent was again perceived to be the most human, and the computer was perceived to be most machine-like. The avatar was rated as more human than the computer, but was still considered more of a machine than a human. These differences again remained stable before and after the experiment.

Pattern Recognition Performance

There was a significant interaction between reliability and phase for pattern recognition performance, F (3, 57) = 5.67, p < 0.05. In the novel phase, performance decreased in the

0% condition compared to the 100% condition, t(57) = 3.4, p < 0.05. No such differences were observed in the familiar phase. Note that in the 50% condition of the novel phase and the 0% condition of the familiar phase, performance dropped below chance level, suggesting that the advice was biasing participants away from the correct pattern.

Human-agent performance

There was a significant interaction between agent and reliability on human-agent performance,

F (6, 114) = 2.42, p < 0.05 (see Fig. 2.17). Simple effect analyses revealed that human-agent performance was significantly higher with the computer in the 100% condition (M = 0.85,

SEM = 0.04) as compared to performance with the human (M = 0.68, SEM = 0.06), t(38)

= 4.09, p < 0.05. No such differences were found between the avatar and the human or the computer and the avatar.

Figure 2.17: Agent by reliability for overall performance in Experiment 3.

Compliance

There was a main effect for reliability on compliance, F (3, 57) = 10.97, p < 0.05 (see Fig.

2.18). Compliance in the 0% condition (M = 0.35, SEM = 0.03) was significantly lower compared to compliance in the 100% condition (M = 0.51, SEM = 0.05), t(57) = 4.49, p <

0.05. No significant differences were found between the different agents. The red triangles

in Fig. 2.18 indicate calibrated compliance. In this experiment participants were under-relying in the 100% condition, fairly calibrated in the 75% condition, and over-relying in the 50% and 0% conditions.

Figure 2.18: Agent by reliability for compliance in Experiment 3. The red triangles indicate calibrated performance.

Self-Reliance

There was a significant 3-way interaction between agent, reliability and phase for self-reliance,

F (6, 114) = 3.38, p < 0.05 (see Fig. 2.19). Subsequent analyses revealed an agent by

reliability interaction for the novel phase alone, F (6, 114) = 2.42, p < 0.05. Simple effect analyses for the novel phase in the 100% condition showed significantly higher self-reliance with the human compared to the computer, t(38) = 3.79, p < 0.05, and compared to the avatar, t(38) = 3.4, p < 0.05. No significant differences were found between the computer and the avatar agents.

Figure 2.19: Agent by reliability for self-reliance in Experiment 3.

Trust

There was a significant 3-way interaction between agent, reliability, and phase on trust, F (6,

114) = 2.51, p < 0.05 (see Fig. 2.20). Subsequent analyses revealed an agent by reliability interaction for the novel phase alone, F (6, 114) = 4.16, p < 0.05. Simple effect analyses for the novel phase in the 100% condition showed significantly higher trust in the computer than in the human, t(38) = 4.41, p < 0.05. Trust in the avatar was also significantly higher than in the human, t(38) = 2.41, p < 0.05. No significant differences were found between the computer and the avatar.

Figure 2.20: Agent by reliability for trust in Experiment 3. The red triangles indicate calibrated trust.

Perceived Security

There was a main effect for agent on perceived security, F (2, 42) = 3.41, p < 0.05. The computer agent and the human agent were considered to be providing similar levels of security. The avatar agent was considered to be providing less security. This effect remained stable after the experiment. Note that compared to Experiment 2, all agents were considered to provide considerably less security than agents with cooperative intent.

Perceived Morality

There was a main effect for agent on perceived morality, F (2,42) = 49.05, p < 0.05. The human was considered as having the highest moral standards. The computer and avatar were rated as having significantly lower perceived morality compared to the human.

2.3.3 Discussion

The purpose of this study was to investigate the effects of working with different cognitive agents on performance, compliance, and trust. With the initial stories changed to be adversarial and competitive, further differences between the agents emerged. Participants complied more with the computer agent, as was observed in Experiment 2. However, differences between the agents also emerged along the human-agent spectrum.

The computer agent was trusted most, followed by the avatar, and then by the human agent. Compliance with the human agent was considerably lower compared to the previous experiments. The initial adversarial stories may have had a different effect on computer agents compared to human agents. Human agents that are adversarial may pose a different kind of threat compared to a computer agent that is still trying to assist in some manner.

The initial belief stories seemed to have influenced participants and brought out more differences between the agents. Experiment 4 focused on changing information about the agents during the experimental task. This may have different effects on the agents as information is learned and updated throughout the task.

2.4 Experiment 4

2.4.1 Method

Participants

Twenty-two young adults were rewarded monetarily for participation in this study based on their performance. The reward scheme was identical to the first experiment. None had participated in the previous experiment.

Story Stimuli

Stories were identical to those used in Experiments 1 and 2 and exhibited agents with cooperative intent.

Video Stimuli

For this experiment two types of videos were created to provide feedback to participants.

Videos for the human and avatar showed a happy agent expressing encouraging remarks such as “good job” and “nice work”. The computer agent's encouragement consisted of a happy musical victory melody composed of beeps. Apologetic videos for the human and avatar showed a sad agent that apologized for any mistakes by saying “I did not do well” and “I apologize for my mistakes”. The computer agent exhibited a sad soundtrack composed of slowed-down and lower vocal tones. Video clips were cut to about 5 seconds.

Procedure

The procedure was identical to that of Experiment 2 except that in addition to showing the thinking videos before each recommendation, performance feedback videos were shown after each agent block. If human-agent performance for the block was above 50%, the encouraging videos were shown. If human-agent performance was below 50%, the apologetic videos were shown.

2.4.2 Results

The Human-Machine Spectrum

There was a main effect for agent on perceived humanness, F (2, 40) = 175.6, p < 0.05. As found in Experiment 1, 2, and 3, simple effect analyses revealed that the computer (M =

1.26, SEM = 0.13) was considered significantly more machine-like than the human (M =

4.57, SEM = 0.16), t(40) = 16.91, p < 0.05. The avatar (M = 1.55, SEM = 0.13) was also considered significantly more machine-like than the human, t(40) = 15.45, p < 0.05, but this time did not significantly differ from the computer, t(40) = 1.46, p = 0.31.

Pattern Recognition Performance

There was a significant 3-way interaction between agent, reliability and phase for pattern recognition performance, F (6, 126) = 2.36, p < 0.05 (see Fig. 2.21). Simple effect analyses revealed no differences between agents in the different phases.


Figure 2.21: Agent by reliability for pattern recognition performance in Experiment 4. The red dashed line indicates chance performance on the task.

Human-agent performance

There was a significant interaction between agent and reliability on human-agent performance,

F (6, 126) = 2.38, p < 0.05 (see Fig. 2.22). Simple effects analyses showed significantly higher performance with the computer in the 100% condition compared to performance with the human, t(42) = 4.41, p < 0.05. In the 75% condition, working with the avatar resulted in higher performance than working with the human agent, t(42) = 4.32, p < 0.05.


Figure 2.22: Agent by reliability for overall performance in Experiment 4.

Compliance

There was a significant main effect for agent on compliance, F (2, 42) = 4, p < 0.05 (see

Fig. 2.23). Simple effect analyses revealed that compliance with the computer was significantly higher (M = 0.84,

SEM = 0.02) compared to compliance with the human (M = 0.77, SEM = 0.03), t(42) =

2.76, p < 0.05.


Figure 2.23: Agent by reliability for compliance in Experiment 4. The red triangles indicate calibrated compliance.

Self-Reliance

There was a significant main effect for agent on self-reliance, F (2, 42) = 3.27, p < 0.05 (see

Fig. 2.24). Simple effects did not reveal significant differences between the agents.


Figure 2.24: Agent by reliability for self-reliance in Experiment 4.

Trust

There was a main effect for agent on trust, F (2, 42) = 3.4, p < 0.05 (see Fig. 2.25). Simple effect analyses revealed trust was significantly higher in the computer agent (M = 5.52,

SEM = 0.32) compared to trust in the human (M = 4.74, SEM = 0.31), t(42) = 2.43, p

< 0.05.


Figure 2.25: Agent by reliability for trust in Experiment 4. The red triangles indicate calibrated trust.

Perceived Security

There was a main effect for agent on perceived security, F (2, 40) = 25.69, p < 0.05 (see

Fig. 2.26). The computer agent was perceived to be providing the most security, followed by the avatar, and then the human. Perceived security decreased over time across all agents,

F (1, 20) = 6.68, p < 0.05.


Figure 2.26: Agent by Time for perceived security in Experiment 4.

Perceived Morality

There was a significant main effect for agent on perceived morality, F (2, 40) = 88.08, p < 0.05 (see Fig. 2.27). Perceived morality was rated higher for the human agent compared to the computer and avatar agents. This effect remained stable before and after the experiment.


Figure 2.27: Agent by Time for perceived morality in Experiment 4.

2.5 Discussion

The purpose of this study was to investigate the effects of working with different cognitive agents on performance, compliance, and trust. With feedback updated throughout the experiment some differences emerged between the agents.

Particularly striking was the performance effect observed between the first two 100% reliability blocks. Participants complied more with the computer agent in the first block.

Compliance with the avatar and human agent was nearly identical. However, after the

first block, compliance with the avatar increased while compliance with the human agent

remained low.

The performance feedback provided after that first block may have had more of an effect on working with the avatar agent. By showing encouragement to participants it may have increased the overall trust in the avatar, which increased compliance.

Chapter 3: General Discussion

Four experiments were conducted to examine the differences and similarities between human-human trust and human-automation trust. A cognitive agent spectrum was developed to study cognitive agents that varied on humanness. Initial beliefs in agents were assessed as well as effects of direct experience over time with these agents on performance, compliance, and trust in a controlled version of an automated decision-aid paradigm. This investigation consistently showed that human-human trust is different from human-automation trust, which was mainly caused by the differences in agent characteristics. These initial beliefs interacted with agent experience such that the differences between human-human trust and human-automation trust disappeared over time as more became known about the performance of the agents. Finally, the Cognitive Spectrum has been shown here to be useful in classifying and categorizing different agents.

3.1 Human-Human Trust and Human-Automation Trust are

Different

The primary question of this investigation was whether human-human trust is different from human-automation trust. One side supports the contrast hypothesis, which proposes that human-automation trust is uniquely different from trust in humans and proposes models that indicate differences in initial perception, automation monitoring performance, and judgments that lead to differences in trust (M. Dzindolet et al., 2003; Lee & See, 2004;

Madhavan & Wiegmann, 2007). Others support the opposing similarity hypothesis, which proposes that computers are essentially social actors and social psychological findings with humans can be replicated with machines (Nass et al., 1994). The current studies provided more support for the contrast hypothesis, which states that human-human trust is in fact

Questionnaires used to characterize agent beliefs showed that the computer agent was considered to be providing more security compared to the human agent. These results are consistent with the view that automation is viewed initially as more favorable, and more positive compared to humans (M. Dzindolet et al., 2003; Madhavan & Wiegmann,

2007). However, computers were also complied with more than humans, primarily in the

first two blocks of all experiments. This is inconsistent with the finding that human aids were preferred more than computer aids (M. T. Dzindolet, Pierce, Beck, & Dawe, 2002).

One key difference between these studies is the task difficulty, which determines the overall level of uncertainty and vulnerability in the experiment. Performance for the Dzindolet soldier detection task was around 90%, whereas the average performance on the task used in the present experiments was just above chance level, around 40%. There was thus a much greater dependence on the aid, which resulted in much lower self-reliance than in the Dzindolet experiments. This increased dependence may have caused the initial positive perceptions of the computer agent to persist in the compliance behaviors of participants.

Self-interested human agents were complied with less compared to self-interested computers.

This is a novel result, as computers are typically not portrayed with malicious intent. There are several possible explanations for this result. The human agent was rated as much higher on the perceived morality scale compared to the computer agent. An agent that is considered to be aware of morals, but has bad intentions, may therefore be considered more harmful than a computer that is unaware of morals and has no capacity to change its original program and behavior. This result is also consistent with an economic study using the ultimatum game (Sanfey et al., 2003). In the ultimatum game, a sum of money is split between two players. In this experiment, the proposer, another human or a computer, proposes how the money should be split, and the second player can either accept or reject the offer.

If the offer is accepted, the money is split as proposed, but when rejected, neither player receives money. When the proposer keeps more money, the offer becomes more unfair and is more likely to be rejected. A striking result was that unfair offers were rejected more often from a human agent compared to a computer agent (Sanfey et al., 2003). A key difference between receiving unfair offers from a human compared to a computer may be that it signals a clear intent on the part of the human player with a clear threat to the receiver. Accepting unfair offers from a human may cause embarrassment or may signal weakness to the other human that may impact future interactions. The computer receives no benefit from the transaction, so the offers may be seen as gifts rather than true splits between two entities. Ascribing experiences of satisfaction, gloating or other senses to a computer agent may change this attitude (Gray & Wegner, 2012). Further, this is evidence against the notion proposed by Lee & See (2004) that automation lacks intent. This entirely depends on how the automation is portrayed, and as shown in these experiments and in other research, it is relatively easy to manipulate and introduce agent intent with story descriptions (Delgado et al., 2005; Nass et al., 1994).

3.2 Agent Belief interacts with Agent Experience

As time went on in the experiment, other factors such as performance feedback equalized any differences between the agents. Compliance with the computer agents decreased more steeply compared to the human agents, primarily because the initial compliance rates were higher. The lack of change in compliance and trust, or trust resilience, was lower for the computer agent compared to the human agent. In situations with high unreliability, rapidly changing conditions, and noticeable errors, diminished trust resilience may be detrimental in the long run.

This provides further support for the idea that initial agent belief or dispositional-based trust is different from agent experience or history-based trust (Merritt & Ilgen, 2008).

Knowledge about the agents changes quickly as a result of the available information in the experiment, and the initial beliefs start to be replaced with directly observed information from the agent. In a previous study, pre-task trust could be predicted by individual difference measures, but post-task trust was predicted by observed machine characteristics in the experiment (Merritt & Ilgen, 2008).

No additional trust decrement was observed for the computer agent compared to the human agent, as would have been predicted by the “perfect schema” framework.

Performance feedback had a large effect on human-agent performance, compliance, and subjective trust, and these effects were notably larger than the differences observed between the agents. With immediate and continuous performance feedback, participants could quickly and accurately tune in to the aid's performance. In other words, they could assess the value of the aid better. This is consistent with Dzindolet et al. (2003), in which continuous feedback produced more calibrated results than cumulative delayed feedback.

As the reliability decreased over time in the experiment, so did performance, compliance and trust, a well-known and replicated finding in the human performance literature (Muir &

Moray, 1996; Lee & Moray, 1992; de Visser & Parasuraman, 2011). Trust was higher with immediate feedback compared to the delayed feedback condition. This finding is consistent with other results that show that when situations or agents become more predictable, trust increases, even when they are unreliable (de Visser et al., 2006; Cohen et al., 1998).

Ironically, the agents are then reliable in their unreliability.

3.3 The Cognitive Agent Spectrum Is Useful for Categorizing

Agents

This investigation primarily focused on trust in cognitive agents rather than assuming that the very process of trust is different for different agents, such as humans, machines, and robots as has been done in other research (Hancock et al., 2011; Madhavan & Wiegmann,

2007). Instead, it was proposed that the concept of human-agent trust can encompass different types of agents. A focus on the characteristics of the agents allowed for better examination of actual differences in trust a human shows when interacting with a computer,

avatar, or another human. The results of the surveys and agent belief questionnaires indicate that agents were rated along the cognitive spectrum with software and machine agents on the lower end of the spectrum, androids in the middle, and humans on the human end of the spectrum as shown in previous research (Bartneck et al., 2009). This result shows that agents can be classified relatively well along this dimension, although behavioral results with the agents do not fall clearly along this spectrum.

As indicated before, the most pronounced differences between agents occurred between the machine-like computer agent and human agent. The results with the avatar agent were not as clear, but showed a trend towards being rated and trusted as more human.

There may be several explanations for the observed results. Instead of responding to agents along the spectrum, participants may have made categorical judgments and decided the avatar was either more like a computer or a human depending on some threshold. This is consistent with the idea that people follow simple heuristics to make quick decisions in uncertain environments (Gigerenzer et al., 2000).

3.4 Limitations

There were a few limitations of this study. Participants did not interact with real humans, but with a virtual agent in all conditions. While this may not have had a large overall effect, the human agent was rated slightly lower on humanness after the experimental task.

This effect may be due to the fact that participants realized they played with a virtual human agent. Replicating this experiment with actual humans, robots, and computers may provide stronger support for the effects found in this study (Krach et al., 2008). Another limitation was that we only examined one intermediate agent. It is less clear which human characteristics of the agent contribute the most to the overall perception of the agent. One study that examined the effect of robot head design on the perception of humanness showed that the more facial features the head contained, the more it was perceived to be human

(DiSalvo & Gemperle, 2002). Conversely, robots with wider heads were considered to be less human compared to those that had more human-like head width. To fully understand

the impact of these characteristics on trust and reliance, it will be critical to quantify and model these and other dimensions (e.g., personality, intelligence, interactiveness) of the cognitive agent spectrum. Taken together, this study provides evidence that increasing the humanness of the automation may support more calibrated trust in and compliance with the agent, resulting in better overall performance and trust.

3.5 Implications for Design and Training

The empathic feedback seems to have promoted higher compliance and performance. This is consistent with the human-automation etiquette literature, which suggests that when computers exhibit more human-like etiquette protocols and are more polite, greater trust and compliance can ensue (Hayes & Miller, 2010; Parasuraman & Miller, 2004).

Compliance and trust were consistently higher with the computer agent compared to the human agent. With a highly reliable system in a task with high uncertainty it may be good to portray the agent as more machine-like to promote higher compliance, especially in novel situations in which performance feedback is not readily available. If performance of the aid is less certain and more likely to be unreliable it may be prudent to equip the aid with more human features that can remediate potential errors and increase trust resilience.

Compliance was also increased with empathic feedback. This result provides further support for the idea that designing automation that conforms to human etiquette increases collaboration. Providing a rationale has also been shown to increase trust in an aid

(M. Dzindolet et al., 2003). When an aid makes errors, efforts to repair trust may help to preserve trust and increase trust resilience.

3.6 Future Research

There are a number of avenues that may be pursued based on the current research program.

To further investigate whether HHT and HAT are different, fMRI studies may be helpful in differentiating underlying neural mechanisms. If HHT and HAT are fundamentally

different, then they should be associated with different underlying neural systems. If they are the same, then the neural systems should be similar. A third possibility is that the two types of trust share the same neural systems, but the systems interact differently for HHT compared to HAT. Importantly, behavioral measures alone, as collected in this investigation, are unlikely to allow one to distinguish between these possibilities, because the same behavioral outcome (e.g., an accurate choice or a speedy response) can be associated with very different underlying neural mechanisms. Assessing both performance and brain function can therefore provide more information than either alone.

Furthermore, other types of cognitive agents should be tested to further examine whether a cognitive agent spectrum is an appropriate model. Reliability levels can be varied to further examine differences between the agents. For instance, agents could have low reliability and slowly increase in reliability or have a stable reliability throughout the experiment.

Another improvement of the study could be to design the agents to be truly interactive. This may increase the realism of the experiment dramatically. Finally, the risk and vulnerability of participants may be increased by having them earn money first and then lose it in the experiment.

3.7 Conclusion

There are several important theoretical contributions made to the current literature with this dissertation. First, all four experiments showed that human-human trust was different from human-automation trust. This difference emerged primarily from perceived differences based on the characteristics of the cognitive agents. Rather than identifying new unique indices of types of trust, future research should focus on human-agent trust so the differences between agents and their impact on trust may be examined. Second, agent belief interacted with agent experience. This is a helpful distinction to make as both components of trust can be predictive at different times. Future efforts should focus on how to assess each component separately and how they interact as was done in these experiments. Finally, factors such as reliability and feedback were shown in relation to the effects of cognitive

agents. Differences between agents may be particularly important with novel interactions in uncertain environments with little performance feedback.

Appendix A: Benevolent Stories

A.1 Computer Agent Story: PC-Check

Figure A.1: Introduction Picture of PC-Check.

PC-check is one of the most commonly used antivirus programs. Released from Memphis-based

Tennessee Technology Limited in 2008, it quickly became the most popular free antivirus software suite on the internet, and has as many users as most commercial antivirus programs.

PC-check exemplifies TTL's philosophy of bringing advanced security measures to even the least computer-savvy people: it sports advanced malware detection and a simplified, easy-to-use interface. PC-check runs in the background during normal computer operation, monitoring which programs are using the CPU. PC-check makes a list of what the other active programs are trying to do, e.g., performing computations, reading from the hard drive, writing to the hard drive, etc., and then builds a profile for each program. If one of these programs shows enough similarities with a pre-set behavioral profile of a computer virus, PC-check will automatically stop the program. The user receives a warning about the detected program, asking if the user recognizes the program and wants to allow it to function, or wishes to delete it. This prevents PC-check from accidentally deleting innocuous programs that behave somewhat like a piece of malware, as PC-check only collects data on how the CPU is being used. After the user makes a decision, the program is either deleted, or unblocked and allowed to run once more.

A relevant excerpt from a local paper:

“Tennessee Technology Limited was the recipient of a congressional honor yesterday as the company was lauded for its good work fighting computer viruses and adware. The company's antivirus software, PC-check, is credited with single-handedly stopping an intrusion attempt at the Library of Congress earlier this week. The attempt began when an unknown team of hackers, armed with high-tech cyber warfare tools, attempted to bypass the library's defenses by tunneling through an older, neglected PC still connected to the local network.

Fortunately, this computer still had an old copy of PC-check installed, which was able to freeze the trojan that was attempting to gain access. While experts can only speculate as to the motive behind the attempt, they agree that preventing it was very fortunate.”

A.2 Avatar Agent Story: TaxFiler

Figure A.2: Introduction Picture of TaxFiler.

TaxFiler (TF) is a piece of financial software designed to help users fill out and file their tax return quickly and easily. Distributed free of charge by the IRS in 2010, TF has been used by over 40,000 people nationwide. This type of tax software is seen as something of a win-win by those at the IRS: people save time and stress with the program’s simple and straightforward interface, and the IRS can save money by automating a large part of the auditing process.

Unlike most tax prep software, TF is unique in that it is meant to be run all year long. The program is a standard desktop application, which is capable of handling the user’s finances (similar to Quicken or Excel). However, in addition to standard financial functions, TF also asks for data regarding the nature of expenditures, as well as scans of receipts. This allows for all expenditures to be recorded more precisely and accurately (because there’s less chance anything will slip the user’s mind), and because TF’s financial reports are continuously updated (and those reports can be easily verified via the system’s internal library of scanned receipts), the user is more likely to identify all possible deductions.

TaxFiler was mentioned in a recent IRS Press Statement:

“Thanks to the proliferation of TF software, we are reducing the number of required audits for the 2010 fiscal year. Because the majority of returns were submitted and verified electronically by software packages such as TF, we do not feel the need for a traditional gamut of audits this year. We know this must come as welcome news for everyone, including those of us here in downtown DC. We hope everyone who used TF found it satisfactory and that even more people will use it next year.”

A.3 Human Agent Story: Matt O’Shea

Figure A.3: Introduction Picture of Matt O’Shea.

Matt O’Shea was born April 4th, 1990, in a small suburb near Tucson, Arizona. After graduating from Stewart Hill HS in 2008, he applied to, and was accepted into, the English program at George Mason University. When not in class or studying at the library, he spends his time as a volunteer at the campus writing center, helping people work on their papers.

Matt has long been a believer in helping those in need. He is quick to lend a hand to a classmate who needs one. A selfless humanitarian, he enjoys being someone that people turn to when they need a favor, large or small. He says his inspiration comes from his father, who spent thirty years working as a police officer in the quiet Milltown suburbs.

Matt often spent afternoons riding along in his dad’s police cruiser, stopping to help people stuck on the side of the road with flat tires or engine troubles. The experience taught him what a small act of kindness can mean to a person in a jam, and Matt has always tried to live up to his father’s example.

An excerpt from a college newspaper: A GMU student has been awarded the Howard McGills Memorial Scholarship. The award is bestowed annually on up to five university students nationwide in recognition of an exemplary record of service to the community. If you want to see the recipient, Matt O’Shea, at work, he can be found almost every day of the week at the University writing center in Robinson Hall, helping freshmen with their English 101 assignments. When Matt found out he had won, he said he felt honored and pledged to continue his work.

Appendix B: Malevolent Stories

B.1 Computer Agent Story: SecCrack

Figure B.1: Introduction Picture of SecCrack.

SecCrack is a powerful malware program, currently being called the most dangerous piece of software on the internet today. Originally developed by the Sofutou Corporation (and called SecTech) for the purpose of recovering lost passwords, it was quickly adapted for less-than-legal uses. In the past three years since SecCrack became a tool for hacker groups, its use has been documented in over one hundred criminal investigations.

The program works by first installing itself on a target computer. Once it is up and running, it searches through all available files on the computer for common text strings. The authors of SecCrack assumed that the owner of the computer most likely used the same password for more than one thing or saved it as a note somewhere on the computer, so SecCrack is programmed to indiscriminately build a list of any and all 4- to 12-character strings located on the computer. Once it has finished compiling the list, it attempts to log in to the intended target (i.e., the computer owner’s bank account) using all the potential passwords one after another until it either succeeds or runs out of possibilities. If SecCrack succeeds in breaching whatever security is present, it executes whatever code it has been given for that particular task (typically reporting the correct password to the hackers operating SecCrack, but it can be customized).

SecCrack was mentioned in a recent issue of a cyber-crime magazine: “Bank of America has recently come under attack by a group of cyber criminals. The hackers allegedly installed a modified version of the program known as SecCrack onto the bank manager’s workstation by posing as one of the bank’s custodial staff. The program was then remotely activated just after midnight, after most of the employees had gone home. Once activated, SecCrack quickly broke through all of the security measures present on the terminal. While the bank has yet to announce any official accounting of the damage, insider estimates place the amount stolen at over $500,000 from at least 1000 separate customer accounts.”

B.2 Avatar Agent Story: TXTSPY

Figure B.2: Introduction Picture of TXTSPY.

TXTSPY is a malicious program masquerading as an innocuous day planner. Offered on various websites as free software, it preys on people who don’t know much about computer security. The program installs a convenient desktop interface and performs the functions one would expect from this type of software. However, TXTSPY has a hidden functionality where it secretly relays all the user’s appointments and plans to the program’s creators. This allows them to know where and when TXTSPY’s users are at all times.

At its core, TXTSPY follows a very simple protocol: despite what the program’s readme file says, none of the information the user enters remains confidential. The program merely relays all the information it has to another source, namely the developers/owners of TXTSPY. On the developers’ end, the software notes each user’s mailing address and sets up calendars to track and predict when each person is likely to be out of the house and for how long they will be gone. Repeated appointment and work schedules are extrapolated into the foreseeable future. Once a schedule is established, TXTSPY’s developers sell this information to criminal organizations, which can use the information to plan break-ins when they know the victim is going to be out of the house.

An article pertaining to TXTSPY, clipped from an issue of the San Francisco Chronicle:

The San Francisco police announced that the string of recent break-ins striking the city has been linked to a computer program. All of the victims had their homes burglarized while they were away, and each one used an electronic day planner on their computer. It is thought that the criminals hacked into the program and used what they found to coordinate their movements. A police spokesman advised residents to discontinue using any electronic calendar until arrests have been made.

B.3 Human Agent Story: Matt O’Shea

Figure B.3: Introduction Picture of Matt O’Shea.

Matt O’Shea was born in Houston, Texas, on August 24th, 1990. After graduating from Bartleby HS, a midsize high school in a suburb of Houston, he moved on to George Mason University to pursue a degree in marketing. He doesn’t do much besides study and attend class, but his focus and dedication have paid off. He has maintained a 4.0 throughout his college career and is recognized by faculty as one of the outstanding members of the junior class.

Matt is very achievement-driven. He is fiercely competitive and prides himself on being number one. He treats his classes like a competition, and it’s not a win unless he comes in first. This need can sometimes have negative repercussions: Matt is not above getting his hands dirty to come out on top. He will lie, cheat, and manipulate others to maintain his lead. He doesn’t really see this as cheating per se, just a way to correct for things he sees as unfair. For instance, if an unexpected flat tire eats up a few hours the morning of a crucial test, it shouldn’t be held against him. In his mind, if he had the time to study, he wouldn’t have any trouble with the test, so bringing in a little cheat sheet is just a way of setting things right. He doesn’t like bending the rules like this, but he’ll do it if it’s necessary to keep him in the number one spot.

Matt was mentioned in the college newspaper: A fight on campus has led to the arrests of two students Monday, junior Matt O’Shea and sophomore Dugan Nash. According to Nash, he and O’Shea had planned to study together for their upcoming MKTG 305 final. Nash, who had missed several weeks due to an illness, asked to borrow O’Shea’s lecture notes for the missing classes. Nash alleges that O’Shea provided him with fake materials containing wrong information, which caused him to fail the test and left O’Shea with the highest grade.

O’Shea declined to make a statement. The university ethics board is set to rule on the matter early next week.


Curriculum Vitae

Ewart J. de Visser was born in Alphen aan de Rijn, The Netherlands. He received a first-year completion degree in Cognitive Artificial Intelligence from Utrecht University, The Netherlands. He received his Bachelor of Arts in Film Studies from the University of North Carolina at Wilmington in 2005 and his Master of Arts in Human Factors and Applied Cognition from George Mason University in 2007. He is currently a Human Factors scientist at Perceptronics Solutions, Inc.
