A Whole that may be Greater than the Sum of its Parts:

A Balanced Mind from Biased Processes

Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Arts

in the Graduate School of The Ohio State University

By

Adam Edward Hasinski, B.A.

Graduate Program in Psychology

The Ohio State University 2010

Thesis Committee:

William A. Cunningham, Advisor

Gary G. Berntson

Russell H. Fazio

Copyright by

Adam Edward Hasinski

2010

Abstract

We propose that evaluative biases, often taken as evidence for the fallibility of human judgment, serve as valuable heuristics as long as uncertainty is high, by allowing organisms to safely learn about their environment. As uncertainty diminishes via experience, these biases lose their utility and should also wane. Further, we argue that multiple evaluative biases may work in dynamic opposition to produce individuals with balanced and flexible decision-making abilities. These hypotheses were tested using reinforcement learning models to predict subjects' evaluative representations in a newly designed learning task. These computational models indicate that multiple biased decision-making processes influence evaluation under uncertainty. Further, our models suggest that these biases act antagonistically toward one another. Finally, our last model indicates that these biases decrease with increasing exposure to the learning environment.

The implications of these findings are discussed.

Dedication

Dedicated to my parents, Ed and Linda Hasinski

Acknowledgements

I would like to thank all the former and current members of the Social, Cognitive & Affective Neuroscience Lab—Nathan Arbuckle, Andrew Jahn, Ingrid Johnsen, Tabitha Kirkland, Samantha Mowrer, Dominic Packer, and Jay Van Bavel—for their endless supply of support, patience, and baked goods.

I would also like to thank my research assistants—Geetha Athanki, Anna Preston, and Jessica Stum—for their help in collecting data.

I am grateful for the insightful comments and words of encouragement from various students and faculty within OSU’s psychology department.

I am also grateful to my family for their never-ending support, understanding and confidence in me.

Finally, I would like to thank my advisor, Wil Cunningham, for his encouragement, his contagious enthusiasm, and his ability to always see the bright side.

Vita

2008...... B.A. Psychology & History, Ohio University

2008-2009...... Graduate Fellow, The Ohio State University

2009-present...... Graduate Teaching Associate, The Ohio State University

Publications

1. Ratcliff, J. J., Lassiter, G. D., Jager, V. M., Lindberg, M. J., Elek, J. K., & Hasinski, A. E. (2010). The hidden consequences of racial salience in videotaped interrogations and confessions. Psychology, Public Policy & Law, 16(2), 200-218.

Fields of Study

Major Field: Psychology

Table of Contents

Abstract
Dedication
Acknowledgements
Vita
Table of Contents
List of Tables
List of Figures

Chapters:

1. Introduction

2. Method

3. Behavioral Analysis

4. Model 1: Results & Discussion

5. Model 2: Results & Discussion

6. Model 3: Results & Discussion

7. General Discussion

List of References

Appendix A: Tables and Figures

List of Tables

1. Reinforcement learning equations

List of Figures

1. Conceptual overview of the computational modeling process

2. Example trial from the value estimation task

3. Single subject value output from Model 1

Chapter 1: Introduction

Attitudes and evaluations help individuals select appropriate actions in a complicated, ever-changing, and often dangerous world. Attitudes are stores of information, derived from experience, used for the purpose of making evaluative judgments about a given object (e.g. Ajzen, 1991; Cunningham, Zelazo, Packer & Van Bavel, 2007; Fazio & Olson, 2003). To remain accurate, attitude representations must be updatable as new information from the environment becomes available. The utility of evaluations depends on their ability to correctly influence behaviors. Therefore, attitudes and the evaluations they generate form a circuit with behavior and the environment: attitudes form the basis of the evaluations that guide behavioral interactions with the environment, and the consequences of these interactions then update attitudes.

In attempting to delineate the boundaries of human cognitive performance, considerable attention has been focused on biases. Broadly speaking, biases are systematic errors in the output of some mental process. The study of biases not only highlights the limitations of human reasoning but also has the potential to elucidate reasoning's component mechanisms. Much as vision researchers use instances of vision failure—via optical illusions—to piece together the visual process, decision researchers can use evaluative biases to better understand the processes underlying human decision-making ability (e.g. Kahneman & Tversky, 1979). Of course, in the real world, determining what information is relevant can be an exceedingly difficult matter, for subject and researcher alike. In response to the murky nature of reality, experimental psychologists have constructed a variety of artificial contexts that clearly, in the eyes of the experimenter, distinguish the relevant from the irrelevant. After decades of research, it is no longer contentious to claim that the human mind contains, or is laden with, bias.

Various research camps have made great strides in understanding what biases exist, in what contexts they operate, and what function—if any—they serve (e.g. Tversky & Kahneman, 1992). Though research into evaluative biases is far from exhaustive, the general wealth of research findings supporting the existence of biases has, in the minds of some, painted a bleak picture of human evaluative abilities.

What are we to make of human cognitive abilities in the face of such evidence?

Does the extant research suggest that the human mind at its core is—much to the chagrin of economists—a hopelessly irrational machine? Though various camps have argued against the existence of such irrationality (e.g. Gigerenzer & Goldstein, 1996), the argument often becomes one of semantics, hinging on one's precise definition of rationality. At best, the evidence from these various researchers tends to further elucidate the conditions under which a given bias will or will not manifest itself, thereby attenuating the claims of the bias-supporters without repudiating their essence (Gigerenzer, 2006; Todd & Gigerenzer, 2000). How then does this body of literature fit with the well-known truism that humans have survived a hostile and changing world for hundreds of millennia? The answer: biased processes are not necessarily useless or even defective; on the contrary, their biased nature may prove to be useful. This has been the often-articulated explanation for the well-documented negativity bias (e.g. Baumeister et al., 2001).

While intuitively appealing, the claim is incredibly difficult to verify or falsify.

Furthermore, the claim appears less solid under rigorous logical scrutiny.

The existence of biases is often explained by gesturing to evolution and suggesting, seemingly logically, that such a bias would increase the fitness of organisms that possess it. Ironically, this argument could be spun to fit a valence asymmetry in either direction. That is, one may argue, in line with the majority of the literature, that "it is better to be safe than sorry"; therefore, negativity biases serve to guide organisms away from potentially fatal dangers. Conversely, one could argue that "nothing ventured, nothing gained"; therefore, a positivity bias encourages exploration of the environment, which has been shown to be crucial for learning and optimal decision making (Fazio, Eiser & Shook, 2004). As a result, positivity biases are often thought of as exploration biases. Such biases would be useful for finding both physical resources and knowledge about the world. Perhaps the type of bias manifested depends on regulatory focus (Higgins, 1997), with a prevention focus leading to a negativity bias and a promotion focus leading to a positivity bias. Given the ambivalence of the logical argument, empirical evidence is often sought to settle the matter. As we will see, however, the empirical literature does not appear to support one bias to the exclusion of the other.

In light of this, different researchers have proposed three general solution types.

First, some have looked for a consistent trend across all domains and declared the bias with the most support to be the winner, as some have done with the negativity bias (e.g. Baumeister et al., 2001). Second, rather than searching for a single bias, one could suggest that positivity and negativity biases might each have merit within specific domains. Finally, one could propose, as some have (e.g. Cacioppo & Berntson, 1994; Peeters & Czapinski, 1990; Taylor, 1991), that human behavior is the product of multiple, opposing biases.

We argue that, via multiple antagonistic evaluative biases, humans achieve balanced, effective interaction with their environment, approaching the rationality dreamed of by philosophers and economists. We propose that these biases serve two functions. The first is to guide behavior through uncertain contexts: when the organism lacks the knowledge needed to make an informed decision, these biases encourage exploratory or self-protective behavior. As knowledge about the environment is accrued, these biases may lose their utility; as a result, we expect them to diminish as experience leads to greater certainty. The second function of each bias is to dynamically oppose the other so as to prevent either bias from running rampant and leading to harmful behavioral decisions. This further suggests that, in response to increasing certainty, the biases should wane in proportion to each other, thus preventing a skewed evaluative system. In Part I, we will discuss some of the more prominent multi-bias theories, which explain a range of phenomena. In Part II, we will define components of the evaluative process, suggest how each component might be biased, and discuss the implications of these different potential biases. Ultimately, we will focus on the possibility of two distinctly cognitive mechanisms for evaluative biases and show that the extant literature typically fails to distinguish between them. Finally, to better decompose the cognitive anatomy underlying evaluative decision-making and elucidate the nature of its corresponding biases, Part III will introduce a computational methodology. We will use this methodology, in conjunction with a new behavioral paradigm, to answer the following questions:

1. If evaluative biases exist, where in the evaluative decision-making process do they occur?

2. Do opposing biases co-occur?

3. Do these biases diminish as uncertainty wanes?

Part I: Multi-bias Models

Psychological models that combine multiple biases offer unique insight into cognition and behavior. Often, psychologists work to identify and isolate cognitive biases. While this compartmentalization is a necessary step for increasing our understanding of cognition, it is equally necessary to recombine those biased processes. Doing so not only provides a more complete picture of the mind but also helps us understand the functional value each individual bias provides. It is in the interaction among components that one can see the role each has to play. Early theories laid the groundwork by positing that positivity and negativity biases could coexist. Later models built on this foundation, with some developing more detailed descriptive accounts of the biases while others suggested the adaptive function of having multiple biases working in opposition.

In the 1930s, theorists began to postulate the differential impact of positive and negative events on behavior. The goal-gradient hypothesis (GGH; Hull, 1932) was one of the first attempts to provide a theory that combined positive and negative biases. Influential in the field of motivation, the theory conceptualizes basic drives in one of two basic ways: the motivation to approach, in the case of appetitive stimuli, and the motivation to avoid, in the case of aversive stimuli. Motivation increases as one approaches, spatially or temporally, the goal-related stimulus. For complex stimuli—those that have both appetitive and aversive elements—the organism may experience conflicting drives, that is, motivations to both approach and avoid.

This approach-avoidance conflict demonstrates the motivational asymmetries of conflict theory (Miller, 1944). These asymmetries are driven by the fact that the avoidance gradient is steeper than the approach gradient. For instance, rats demonstrate a greater increase in effort to avoid electric shock than they do to attain food as they get closer to the goal object (Miller & Murray, 1952). As a result of this slope difference, in an approach-avoidance conflict the two gradients are often assumed to intersect. At this theoretical intersection, the equilibrium point, the two drives are equivalent in magnitude.

If the organism is closer to the goal object than the equilibrium point, the avoidance gradient will be stronger and the organism will tend to avoid the stimulus. However, if the organism is farther from the goal object than the equilibrium point, the approach gradient will dominate and the organism will tend to approach the stimulus. These effects carry over from animal research into the study of human subjects. For instance, when asked to choose a project that will be completed in the distant future, individuals are more influenced by the positive characteristics of their choices; however, when the task must be completed soon, people are more influenced by the negative characteristics (Liberman & Trope, 1998)—an effect that is consistent with GGH. Therefore, GGH incorporates both positivity and negativity biases, with distance from the goal object determining which bias will be expressed in a given context.
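The gradient logic above can be sketched with two hypothetical linear gradients. The intercepts and slopes below are illustrative assumptions, not Miller's measured values; the only property borrowed from the theory is that the avoidance slope is steeper than the approach slope.

```python
# Toy linear goal gradients: drive strength as a function of distance d
# from the goal object. All numbers are hypothetical.

def approach_strength(d, intercept=10.0, slope=1.0):
    """Approach drive: shallower gradient."""
    return max(0.0, intercept - slope * d)

def avoidance_strength(d, intercept=14.0, slope=2.0):
    """Avoidance drive: steeper gradient, per conflict theory."""
    return max(0.0, intercept - slope * d)

def equilibrium_distance(a_int=10.0, a_slope=1.0, v_int=14.0, v_slope=2.0):
    """Distance where the gradients intersect: a_int - a_slope*d = v_int - v_slope*d."""
    return (v_int - a_int) / (v_slope - a_slope)

d_eq = equilibrium_distance()                          # 4.0 with these toy numbers
assert avoidance_strength(2) > approach_strength(2)    # closer than d_eq: avoidance wins
assert approach_strength(6) > avoidance_strength(6)    # farther than d_eq: approach wins
```

With these numbers the equilibrium point sits at a distance of 4: approach dominates farther out, avoidance dominates closer in, which is why the organism tends to hover near the intersection.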

Multiple evaluative biases are a logical extension of these motivational asymmetries. Humans often give greater weight to negative information (e.g. loss aversion; Kahneman & Tversky, 1979). Nevertheless, sometimes positive information has greater influence (e.g. the person-positivity bias; Sears, 1983). As an example of a two-bias theory, the evaluative space model (ESM; Cacioppo & Berntson, 1994) posits the coexistence of separable positive and negative evaluative processes. ESM's negativity bias operates in a manner similar to biases in other theories by giving increased weight to negative information. Thus, negative information has more subjective impact on evaluations than positive information when the two are objectively equivalent in magnitude. The positive asymmetry, however, takes the form of a constant that increases the subjective value of all objects equally, regardless of their objective magnitude. This type of bias helps foster approach behavior towards novel stimuli; that is, it encourages exploration of the environment. For this reason, Cacioppo and Berntson refer to the positive asymmetry as an "offset," denoting its conceptualization as a constant, while the negative asymmetry, a coefficient, is referred to as a "bias." Using IAPS photos, evidence for these biases has been found using both bivariate evaluative ratings (Ito, Cacioppo, & Lang, 1998) and event-related potentials (Smith, Cacioppo, Larsen & Chartrand, 2003). Further support comes from an impression-formation task evidencing both a positivity offset and a negativity bias (Gardner, 1996). After receiving initially neutral, even uninformative, information, subjects showed elevated positive evaluations relative to negative evaluations, suggesting a positivity offset. However, when subjects were then given further information about the target, negative information led to a greater change in both positive and negative evaluative ratings than positive information, suggesting a negativity bias. This interpretation fits with the functional roles ESM assigns to the positivity offset and negativity bias: the former acts to encourage exploration in the absence of evaluative information and to counteract neophobia (Cacioppo & Berntson, 1997); the latter serves to protect the organism from aversive environments.
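The offset-versus-bias distinction can be rendered as two toy functions. The functional forms and constants here are assumptions chosen for illustration, not the published ESM equations: the offset is modeled as an additive constant, the bias as a multiplicative gain.

```python
# Hypothetical rendering of ESM's two asymmetries.

def subjective_positivity(pos_input, offset=0.5, gain=1.0):
    """Positivity offset: a constant added regardless of input magnitude."""
    return offset + gain * pos_input

def subjective_negativity(neg_input, gain=1.5):
    """Negativity bias: negative input amplified by a larger gain (a coefficient)."""
    return gain * neg_input

# With no evaluative information, positivity dominates, favoring exploration.
assert subjective_positivity(0.0) > subjective_negativity(0.0)
# With objectively equal, sufficiently strong inputs, negativity dominates.
assert subjective_negativity(2.0) > subjective_positivity(2.0)
```

The crossover between the two regimes is what lets a single system be exploratory toward novel stimuli yet protective once strong negative evidence arrives.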

We propose that the negativity bias serves an additional, related function: it may act correctively, working against the positivity offset to counteract any damage the approach bias might otherwise cause (cf. Peeters & Czapinski, 1990).

Although it is possible that dual biases are merely co-occurring glitches in our cognitive architecture, another possibility is that these two biases evolved together to optimize behavior through dynamic opposition. While this is a plausible extension of ESM, other theories make this prediction explicit. For instance, the mobilization-minimization hypothesis (MMH; Taylor, 1991) suggests that both negativity and positivity biases operate in tandem and work to offset one another. MMH suggests that an organism mobilizes its resources to deal with events and then engages a separate set of minimization processes to keep the mobilization in check. According to MMH, negative events—due to their increased relevance—elevate a variety of psychological processes to a greater extent than positive events of comparable magnitude. For example, negative mood leads to greater focusing of attention (Eysenck, 1976) and more elaborative processing of information (Bless, Bohner, Schwarz & Strack, 1990) than positive mood. However, MMH goes on to suggest that the organism also attempts to negate responses to negative stimuli to a greater extent than it negates changes occurring in response to positive stimuli. For instance, positive events are recalled more easily than negative events (Matlin & Strang, 1978), the so-called Pollyanna principle. As another example, positive emotions, such as relief, often follow on the heels of negative emotions once the aversive context has abated (Solomon & Corbit, 1974); yet negative emotions rarely follow in the wake of positive events. Relatively greater mobilization for negative events is argued to have obvious survival-related advantages. The enhanced minimization of negative events is hypothesized to offset, constrain, or otherwise temper the asymmetric mobilization. In this way, MMH suggests that two biases, working against each other, create an appropriate and efficacious psychology.

A biased process, appearing clumsy and perhaps even harmful if observed in isolation, may be revealed to have high utility when its role is examined in the context of a larger system, which may include other biases. As can be seen, different researchers, studying different topics, have developed models involving both positive and negative valence asymmetries. The evidence supporting these theories suggests that biases are functional, facilitating survival and well-being. Furthermore, at least some of these models explicitly describe the two biases as working against one another towards the goal of achieving a balanced psychology; one bias, working in the absence of the other, would likely lead to maladaptive cognitive and behavioral patterns. This may explain a paradox in social psychology: the literature is rife with evidence supporting irrational cognitive processes, yet, as is evidenced by our continued survival, humans appear to be highly functional and capable of flexibly interacting with a changing environment. The existence of multiple biases does not, however, suggest where in the decision-making process they occur.

Part II: How and Where Biases Impact Decision Making

To understand where evaluative biases occur, we must first settle on a framework for evaluations. We begin with mental representations of both the environment and behavior, along with a link between the two. This simple model allows sensory stimulation to activate the representation of a given context, which in turn triggers specific behaviors. Such a model can capture the inflexible behavioral repertoire of simple organisms along with the reflexive acts of more evolved creatures.

The stimulus-behavior link may be strengthened or attenuated, as in sensitization and habituation respectively; nevertheless, more advantageous patterns of behavior require the formation of associations as a product of experience. These associations connect the stimulus to various informational representations. Ultimately, these three representational sets lead to the emergence of evaluative representations and the attachment of valence to environmental objects. These evaluations can then influence the actions that are initiated in response to stimuli. Brain phylogeny is correlated with this associative learning ability, with more neurologically developed animals exhibiting more powerful and flexible evaluation-based control over their object-behavior dispositions. The iterative reprocessing (IR) model of attitudes (Cunningham et al., 2007) provides a more detailed framework for understanding how attitudinal representations emerge from lower-order representations. Though the details of this process are not critical for the purposes of this paper, it is important to understand the associative nature of evaluations if we are to understand how they influence behavior. Having established the utility of evaluations, we must now briefly discuss how they are formed and modified.

Evaluations depend on two cognitive learning mechanisms: associative learning and backpropagation. The former mechanism, also called "Hebbian" learning after Donald Hebb, who proposed it, simply involves the buildup of associative strength between two representations as a function of their coactivity (Hebb, 1949). While a wealth of neurobiological evidence supports the existence of Hebbian learning, decades of computational neuroscience research suggest that Hebbian learning alone is insufficient to account for more complicated cognitive phenomena (O'Reilly & Munakata, 2000).
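The Hebbian rule admits a one-line sketch: the weight change is proportional to the product of the two units' activities. This is a minimal illustration of the principle, not a full model.

```python
# Minimal Hebbian update: associative strength grows with coactivity.

def hebbian_update(w, pre, post, lr=0.1):
    """delta_w = lr * pre * post; note that no error signal is involved."""
    return w + lr * pre * post

w = 0.0
for _ in range(5):                     # five co-activations of the two units
    w = hebbian_update(w, pre=1.0, post=1.0)
# w has grown to roughly 0.5: repeated coactivity strengthened the link.
```

Because the update depends only on coactivity, the weight can only track correlation; it has no way to know whether the association was useful, which is the gap error-driven learning fills.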

The latter mechanism, also known as error-driven (ED; Bryson & Ho, 1969) learning, proceeds in two stages. First, in the presence of a stimulus, relevant evaluative representations are activated. Then, in response to the consequence of encountering said stimulus, other evaluative representations are activated. The disparity between the two representational sets is an error signal, which is used to modify the evaluative representations connected to the object of evaluation. This mechanism has proved more successful at mimicking cognitive phenomena (O'Reilly & Munakata, 2000). As a result, many cognitive scientists have turned their focus towards ED learning, despite continuing uncertainty regarding how it is biologically instantiated. Based on its utility, we will focus on ED learning for the remainder of the paper. Nevertheless, research suggests that Hebbian and ED learning generally work together (O'Reilly & Munakata, 2000), so it is important to acknowledge that Hebbian learning plays an important role in evaluative function. With a basic framework for understanding how evaluations are formed and how they influence action selection, we can now discuss how and where bias fits into the framework.
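Error-driven learning as described above can be sketched with the classic delta rule, one common formalization; the thesis's own models are reinforcement-learning variants introduced later, so this is only an illustration of the two-stage idea.

```python
# Delta-rule sketch: expectation, outcome, and an error signal that
# drives the weight change.

def error_driven_update(w, x, target, lr=0.5):
    prediction = w * x              # stage 1: expectation activated by the stimulus
    error = target - prediction     # stage 2: disparity with the observed outcome
    return w + lr * error * x       # the change is proportional to the error

w = 0.0
for _ in range(20):
    w = error_driven_update(w, x=1.0, target=1.0)
# w converges toward 1.0; once expectation matches outcome, learning stops.
```

Unlike the Hebbian rule, this update is self-limiting: it corrects the representation only to the extent that expectation and outcome disagree, which is exactly the property the later "shadow of bias" argument relies on.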

Though evaluative bias is often treated as a single construct, we will decompose it into two fundamentally different possibilities, each with different psychological causes and effects. Essentially, bias may occur as the organism learns about the environment or as the organism acts upon the environment. While this list is intended to be neither exhaustive nor the only way of parsing bias, we find it has significant utility, stemming from both explanatory power and parsimony. These biases are also conceptually similar to the two negativity effects proposed by behavior-adaptive theory (BAT; Peeters & Czapinski, 1990). What these biases have in common is that they lead to a valence-asymmetric cognitive and/or behavioral response to the environment. They differ in their location in the perception-evaluation-action process. If differential learning about positive and negative information leads to differential updating of representations, then one would be exhibiting an updating bias. In other words, negative representations may be updated more or less quickly than comparable positive representations. This bias is similar to BAT's informational negativity effect, which posits that negative information increases cognitive effort (Peeters & Czapinski, 1990). On the other hand, differential use of positive and negative representations, for the purpose of action selection, would be an action bias. That is, negative representations might be weighed more or less heavily than positive representations that are equivalent in magnitude. BAT's affective negativity bias is similar in that it emphasizes valence-asymmetric behavior without regard to knowledge acquisition (Peeters & Czapinski, 1990). Importantly, both of these mechanisms could lead to the same behavioral tendencies: the latter bias does so by directly skewing action selection, whereas the former does so indirectly by skewing representations. Although the behavioral output could be identical for the two mechanisms, the cognitive difference has important implications.

The type of bias implemented in human psychology determines what type of corrective measures could be taken. If, as mentioned above, the utility of these biases diminishes with increasing certainty regarding one's environment, then the biases should fade along with uncertainty. If the biases modify action selection directly, then simply reducing or removing the bias parameters will immediately dampen the bias; biased past decisions need not have any bearing on future decisions. However, if the bias parameters are tied into the formation of informational representations, then a shadow of bias remains even after the parameters themselves have been eliminated. That is, it would be difficult, if not impossible, to instantly correct the biased representations. However, the feedback process would continue to operate and could, over time, correct the bias-induced error, just as it corrects for random error in the representations. Given our hypothesis that evaluative biases serve as guides through uncertainty, we feel the action-selection bias is more likely. However, either mechanism—or the coexistence of both mechanisms, as suggested by BAT—is plausible based on the extant literature. The empirical literature generally does not support one mechanism over another, with many models and paradigms failing to distinguish between these two possibilities.
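The asymmetry between the two corrective scenarios can be illustrated with a toy simulation (all parameters hypothetical). A learner with an updating bias overweights negative prediction errors; when that bias parameter is later removed, the distorted value is not instantly repaired but drifts back only through continued feedback.

```python
# "Shadow of bias" sketch: an updating bias distorts the stored value
# itself, so removing the bias parameter leaves a residue that only
# further error-driven feedback can correct.

def update(value, outcome, lr=0.2, neg_weight=1.0):
    error = outcome - value
    if error < 0:                       # updating bias: amplify bad news
        error *= neg_weight
    return value + lr * error

value = 0.0                             # outcomes alternate around a true value of 0
for outcome in [1.0, -1.0] * 50:        # biased learning phase
    value = update(value, outcome, neg_weight=2.0)
biased_value = value                    # dragged well below zero

for outcome in [1.0, -1.0] * 50:        # bias parameter removed
    value = update(value, outcome, neg_weight=1.0)
assert biased_value < 0.0               # the bias skewed the representation
assert abs(value) < abs(biased_value)   # feedback shrinks the residue only gradually
```

An action-selection bias, by contrast, would vanish the moment its weighting parameter was removed, since the stored values themselves were never distorted.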

Conditioning studies, concerning either human or non-human animals, frequently demonstrate a negative valence asymmetry. For instance, conditioned taste aversion (CTA), also known as sauce-béarnaise syndrome, is a special case that violates multiple "rules" of classical conditioning (Garcia, Kimeldorf & Koelling, 1955). With CTA, an organism develops an aversive response to a food that was previously perceived as appetitive. This reversal occurs after the food has been paired with nausea. CTA is striking for two reasons. First, it can occur after a single pairing of the CS and US, as opposed to the repeated pairings required for conventional conditioning. Second, this effect is found only in aversive conditioning. A similar pattern of behavior emerges in avoidance learning. When a response is paired with a sufficiently negative stimulus—such as an intense electric shock—dogs will learn to avoid that response after a single aversive trial (Solomon & Wynne, 1954). This conditioning also appears to be highly resistant to extinction. As with CTA, no positive stimulus has been found to produce equivalent learning speed. A major limitation of these studies, involving either animals or humans, is that they do not provide evidence of where this negativity bias occurs: whether the subject actually learned the negative information better or just relies on it more when selecting a behavioral response is unclear.

Valence asymmetries are prevalent in the field of impression formation, where one of the most well-replicated findings is that negative traits are given more weight than positive traits (see Kanouse & Hanson, 1972, for a review). In other words, when evaluating a hypothetical individual, negative information about the individual influences the evaluation to a greater extent than positive information. However, the direction of the valence asymmetry may be context dependent, stemming from the perceived utility of the information. Evidence that the negativity bias is due to the greater diagnosticity of negative information comes from the category diagnosticity model (CDM; Skowronski & Carlston, 1987). According to CDM, people recognize that negative information is more informative than positive information in certain domains, such as morality, and less informative in other domains, such as ability. Thus, although the impression formation literature shows a definite trend of weighting negative information over positive, this effect appears to be due to the greater utility of negative data. When the utility asymmetry disappears, the bias may as well.

A variety of cognitive and behavioral paradigms suggest that different types of environmental information, equivalent in magnitude but opposite in valence, lead to asymmetrical observable and self-reported responses. Unfortunately, it is unclear where in the evaluative process this asymmetry originates. Does an updating bias lead to negative experiences being incorporated into the evaluative representation more quickly, or does an action bias simply skew the expression of the evaluation? While the self-report findings may seem to support an updating bias, this conclusion is premature. Self-reported attitudinal expressions are meta-cognitive in nature and are likely derived both from access to informational representations and from object-action dispositions (e.g. self-perception theory, Bem, 1967; cognitive dissonance theory, Festinger, 1957). To properly test which mechanism is at work, we need a method that can dissociate the updating and action-selection processes.

Part III: Computational Modeling of Evaluative Bias

A computational methodology has the potential to disentangle, or at least begin to disentangle, the murky cognitive processes involved in decision-making. A computational approach has several distinct advantages over other approaches. The first is precision: the heavy reliance on numerical quantities and formal mathematical operations necessitates the setting of precise model parameters and offers more precise output values. A related benefit is explicitness: because phenomena and effects are translated into systems of equations, researchers must explicitly state the assumptions they are making. Finally, the use of modeling often allows, or requires, the concurrent testing of multiple assumptions, which in a non-computational paradigm might have to be tested individually in different experiments. Despite these advantages, computational models are by no means bona fide pipelines to scientific fact. This approach is merely another way of stating a theoretical model, and all models are inevitably wrong. Furthermore, computational models of cognition cannot tell us how the mind does work, only how it might work and how it does not work. Nevertheless, these models have proven useful and influential, especially in cognitive psychology, but also increasingly in social psychology. ESM (Cacioppo & Berntson, 1994; Cacioppo, Gardner & Berntson, 1997) lays out explicit formulae for how attitudes might be constructed from bivariate evaluative processes. Furthermore, connectionist models have been used to explain various social psychological phenomena, including cognitive dissonance (Shultz & Lepper, 1996), attitudes (Overwalle & Siebler, 2005), and the self (Nowak, Vallacher, Tesser, & Borkowski, 2000). A connectionist model also provided additional support for the feedback-related learning asymmetries discovered in the BeanFest paradigm (Eiser, Fazio, Stafford & Prescott, 2003). In light of these successful investigations, we used a computational approach to investigate our questions concerning the nature of evaluative biases.

To compare different potential biases, we adopted a reinforcement learning (RL) approach to modeling (Sutton & Barto, 1998). RL is a branch of artificial intelligence that developed out of advances in computer science and psychology. RL uses reward-based algorithms to allow machines and computer programs to learn the optimal response to a given problem. Importantly, RL theory takes its cue from operant conditioning, and many of the principles are the same, albeit in a more rigorous, mathematical form. As in operant conditioning, reinforcers and punishers play a key role in determining the long-term behavior of an RL program. These outcome parameters modify computational representations of value (CRVs), which are used to choose behaviors. The goal of modeling is to find a set of rules that can explain aspects of human cognition. To do this, we first provide the subject and the model with identical test environments. Next, we fit the model's behavior to match subject behavior. Finally, we compare the model's CRVs with the subject's self-reported estimations of value. The rules governing our model can be said to explain human cognition to the extent that the model's values match the subject's values (see figure 1). In RL algorithms, at least those computed serially, the evaluative process occurs in two stages. First, in a particular environmental state, an action is selected based on value representations of all possible actions for all possible states. For example, consider an environment with five possible states and an entity that, at any one time, can emit one of four behaviors. A 5 by 4 matrix could represent the value of each action at each possible state. Thus, when the program finds itself in a particular state, it must choose a response from among the possible actions. Second, after feedback has been given to the program, the value representation of the selected action is updated. Each stage in RL is governed by a mathematical formula, which we will now discuss in greater depth.
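The two-stage cycle described above can be sketched in a few lines. The thesis' models were written in MATLAB; the Python sketch below uses the 5-state, 4-action example from the text, and the fixed learning rate and function names are illustrative assumptions, not the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 4           # the 5-state, 4-action example
Q = np.zeros((n_states, n_actions))  # value of each action in each state
alpha = 0.5                          # illustrative learning rate

def softmax(values, beta=1.0):
    """Turn action values into selection probabilities."""
    exp_v = np.exp(beta * (values - values.max()))  # subtract max for stability
    return exp_v / exp_v.sum()

def step(state, reward_fn):
    """One two-stage cycle: select an action, then update its value."""
    probs = softmax(Q[state])                 # stage 1: action selection
    action = rng.choice(n_actions, p=probs)
    reward = reward_fn(state, action)
    error = reward - Q[state, action]         # prediction error
    Q[state, action] += alpha * error         # stage 2: value update
    return action, reward
```

Note that only the value of the selected action is updated after feedback, which mirrors the thesis' rule that only the CRVs of symbols appearing on a trial change.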

The two-stage nature of RL allows us to dissociate whether evaluative bias stems from skewed informational representations, skewed action dispositions, or quite possibly both. We will begin by discussing the updating phase. The degree of change a CRV undergoes is directly related to the disparity between the CRV, which is a prediction of future reward, and the actual outcome. This disparity is an error signal (see equation 1).

That error signal is used to update the CRV. However, using the entire error to update the CRV would result in the CRV always representing the last reward outcome. This is problematic if the goal of the algorithm is to learn across time, aggregating experiences to develop a more accurate representation of reality. To “slow down” updating, RL update functions multiply the error term by a learning rate, typically a value between 0 and 1. In effect, the learning rate is the proportion of the error that will be added to the CRV in order to create an updated CRV (CRV’; see equation 2).

If the learning rate for a positive stimulus-action CRV were different from the learning rate for a negative stimulus-action CRV, then this would indicate an updating bias.
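Equations 1 and 2 are not reproduced in this chunk, but the standard delta-rule form they describe can be sketched as follows; the two learning-rate values shown are placeholders, not fitted estimates.

```python
def update_crv(crv, outcome, alpha):
    """Delta-rule update: CRV' = CRV + alpha * (outcome - CRV).

    With alpha = 1 the CRV would simply track the last outcome;
    with 0 < alpha < 1 it aggregates experience across trials."""
    error = outcome - crv          # equation 1: prediction error
    return crv + alpha * error     # equation 2: updated CRV (CRV')

# An updating bias is simply a valence-dependent learning rate:
alpha_pos, alpha_neg = 0.45, 0.55            # illustrative values only
crv_pos = update_crv(0.0, 10.0, alpha_pos)   # moves 45% of the way to +10
crv_neg = update_crv(0.0, -10.0, alpha_neg)  # moves 55% of the way to -10
```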

Action selection uses a sigmoidal function, known as a softmax function, to calculate the probability of the algorithm selecting a certain action (see equation 3). The action that has the highest CRV has the highest probability of being selected. Action biases could be introduced to the softmax equation in two ways. First, a constant could be added to the equation for a given action. This would alter its probability of being selected, regardless of that action’s value (see equation 4). This is conceptually similar to the positivity offset of the ESM (Cacioppo & Berntson, 1994). Another way an action bias could be created is by introducing a biasing coefficient to CRVs of one valence (see equation 5). This would alter that action’s probability of being selected as a function of said action’s CRV. This is conceptually similar to ESM’s negativity bias. Of course, both biases could be added simultaneously.
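Equations 3-5 themselves are not shown in this chunk, but one plausible reconstruction of a softmax over an approach/avoid pair with both bias mechanisms looks like this; the parameter names are ours, not the thesis' notation.

```python
import numpy as np

def biased_softmax(values, approach_offset=0.0, neg_weight=1.0, beta=1.0):
    """Softmax over action values (cf. equation 3), with a constant added
    to the first (approach) action only (cf. equation 4) and a coefficient
    applied to negative-valued CRVs (cf. equation 5)."""
    v = np.asarray(values, dtype=float)
    v = np.where(v < 0, neg_weight * v, v)  # weighting bias on negative CRVs
    v[0] += approach_offset                 # offset bias on the approach action
    exp_v = np.exp(beta * v)
    return exp_v / exp_v.sum()
```

Note that the offset must attach to a specific action: adding the same constant to every action's value would leave the softmax probabilities unchanged.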

As we have shown, RL computations have the potential to distinguish between different psychological biases that might appear quite similar at the level of the phenomenon. We will use these equations to model the updating and use of information in subjects completing a game-like computer task. Using different models, we can determine whether evaluative bias stems from valence-dependent learning rates, from valence effects in the decision process, or from both. We can also look for changes in bias as a function of length of exposure to the experimental environment.

The nature of RL methods lends itself to the study of evaluative biases. Because RL methods separate updating and action-selection processes, RL may be used to break down the phenomenon of evaluation into component parts. As shown above, minor modifications to basic RL equations can be used to instantiate various evaluative biases in the model. By using these modified equations to analyze subject behavior, we hope to gain insight into where and how processes bias evaluations.


Chapter 2: Method

Participants

Our sample consisted of 79 undergraduates from The Ohio State University. Six subjects were dropped because they either failed to follow task directions or their data were lost, leaving a usable sample of 73 participants. All subjects were at least eighteen years old and received partial credit for an introductory psychology course as compensation.

Behavioral Task

Subjects completed a value estimation task (VET), which was presented via E-Prime on a desktop computer. The VET consisted of 120 trials that asked subjects to take or reject an uncertain offer. Each offer consisted of two novel symbols, presented side by side (see figure 2). The task used six symbols total, and each trial consisted of a random pairing of any two non-identical symbols. Subjects were not given any information regarding the specific values of the symbols. Instead, subjects were told that their task would be to learn the value of each symbol over time. However, participants were informed that each symbol was worth some number of points on average but that on any particular trial the symbol might be worth a little more or less. Subjects were also informed that the overall, mean value of a symbol could change during the task. Finally, subjects were told that some symbols would be positive while others would be negative. In reality, three symbols were always positive and three were always negative.

Each trial began with a presentation phase, during which two symbols were presented for four seconds. During this time, subjects had the option of either accepting the two symbols together or rejecting the two symbols together. If subjects accepted, then they earned the combined point value of both symbols. If they rejected, then they would gain or lose nothing. Subjects were told to try to earn as many points as possible during the task. If subjects did not respond during the four-second presentation, then the program passed for them and moved on. Following the presentation phase, a feedback phase displayed the combined point value of the two symbols.

Importantly, this feedback came whether the subject accepted or rejected the offer. As a result, subjects incurred no knowledge penalty for not exploring the task environment.

Following the feedback phase, the next trial would begin. After every twenty trials, the task would pause and ask subjects to complete a page in a paper packet. The packet simply asked subjects to give their best estimate of each symbol’s value at that time. Thus, we collected self-report ratings of value (SRRVs) for each symbol at five equidistant points during the task and one final time after the task had been completed, for a total of 36 SRRVs. Following the final SRRVs, the computer provided subjects with their total score. This was the only time subjects were told their total score. Although subjects may have mentally kept track of their total points, they were instructed not to use pencil and paper as a computational or memory aid. Subjects were debriefed via computer after their final score was given.
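For concreteness, the trial structure can be simulated as below. The actual symbol values and noise distribution are not given in this chunk, so the numbers here are invented placeholders (three positive symbols, three negative).

```python
import random

random.seed(1)

# Placeholder mean values: the real point values are not specified here.
SYMBOL_MEANS = {"A": 8, "B": 5, "C": 2, "D": -2, "E": -5, "F": -8}

def run_trial(policy):
    """One VET trial: pair two distinct symbols, let `policy` accept or
    reject the pair, and return (points earned, feedback). The combined
    value is shown as feedback whether or not the offer was accepted."""
    s1, s2 = random.sample(sorted(SYMBOL_MEANS), 2)
    # each symbol is worth its mean plus trial-to-trial noise
    value = sum(SYMBOL_MEANS[s] + random.randint(-2, 2) for s in (s1, s2))
    accepted = policy(s1, s2)
    earned = value if accepted else 0
    return earned, value   # full feedback: value revealed regardless
```

The key design feature is visible in the last line: feedback is returned regardless of the choice, so a learner incurs no knowledge penalty for rejecting.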


Computational Modeling

RL models of behavior were generated in MATLAB. These models generated computational representations of value (CRVs) for the VET symbols based solely on trial outcomes—the combined value of symbol pairs—and subjects’ behavior—taking or passing on offers. These CRVs are model analogs for subjects’ mental representations of value (MRVs). For all models, CRVs for all symbols were set to zero before trial 1. That is, all our models are based on the assumption that, since subjects initially knew nothing about the task symbols, their initial MRVs would be 0. The updating of CRVs was done via the value function depicted in equation 2 of table 1. A separate value function was used for each symbol. Only CRVs corresponding to symbols used in the previous trial were updated between trials. That is, two CRVs changed as a result of any particular trial, via two value functions; all other CRVs remained unchanged.

These CRVs were then fed into a softmax function (see equation 6) to compute the probability of accepting a particular offer (Pr(accept)). For computational parsimony, the slope of this softmax function was fixed in all models.

The inputs to the softmax function were the CRVs relevant to the current offer.

Learning rates and certain other terms were treated as free parameters, which had to be estimated. We used a bounded maximum likelihood estimator (MLE) available in the MATLAB package to optimize our free parameters such that the disparities between the model’s probabilities of acceptance and the subject’s actual behavior were minimized.

Acceptances were recorded as 1s while rejections were recorded as 0s and stored in a behavior vector. Similarly, the probability of accepting on each trial was stored in a separate vector. We used a linear optimization function to compute the overall fit between model and behavior, with smaller values indicating greater fit.

The MLE maximized fit by determining which values for the free parameters lead to the lowest optimization function output. All free parameters were bounded for both mathematical and psychological reasons. Mathematically, bounded parameters facilitated the optimization function (see equation 7). Psychologically, in most cases, only a small range of values would be psychologically plausible for a given parameter. As an example, consider learning rates, which—across all models—were bounded between 0 and 1. It is not psychologically plausible to assume a learning rate could be negative, which would mean a subject is learning value in the opposite direction of what environmental consequences would suggest. Similarly, learning rates greater than 1 would involve a rate of learning completely unwarranted by the environment. The bounded nature of other free parameters will be discussed in the explanation of the specific models that included them.
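As an illustration of this fitting logic, the sketch below fits a single bounded learning rate to accept/reject data by minimizing the negative log-likelihood. The thesis used MATLAB's bounded MLE routines over several parameters at once; the grid search here is only a stand-in for that optimizer, and the one-symbol model is a simplification.

```python
import numpy as np

def neg_log_likelihood(p_accept, behavior):
    """NLL of observed choices (1 = accept, 0 = reject) given the model's
    per-trial acceptance probabilities; smaller values mean better fit."""
    p = np.clip(np.asarray(p_accept), 1e-6, 1 - 1e-6)  # guard against log(0)
    b = np.asarray(behavior, dtype=float)
    return -np.sum(b * np.log(p) + (1 - b) * np.log(1 - p))

def fit_learning_rate(outcomes, behavior):
    """Search the bounded range [0, 1] for the learning rate that best
    explains behavior in a one-symbol accept/reject model."""
    best_alpha, best_nll = 0.0, np.inf
    for alpha in np.linspace(0.0, 1.0, 101):
        crv, probs = 0.0, []
        for outcome in outcomes:
            probs.append(1.0 / (1.0 + np.exp(-crv)))  # fixed-slope softmax
            crv += alpha * (outcome - crv)            # delta-rule update
        nll = neg_log_likelihood(probs, behavior)
        if nll < best_nll:
            best_alpha, best_nll = alpha, nll
    return best_alpha, best_nll
```

The bounds enter simply by restricting the search range, which is the conceptual role they play in the bounded MLE as well.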


Chapter 3: Behavioral Analysis

The behavioral and self-report data were able to provide initial evidence for how well the subjects learned and how they made decisions during the game, even in the absence of computational models. The primary purpose of these analyses is to validate the VET and the performance of our subjects. Because these first analyses are designed to roughly gauge subjects’ learning ability, the findings are critical to the justification for using RL methods in subsequent analyses.

Results & Discussion

Because the data would be meaningless if subjects did not develop reasonable evaluations, our first priority was to determine whether subjects seemed to learn during the VET. To do this, we used multi-level modeling (MLM; proc mixed in SAS 9.2) to predict subject SRRVs from the actual values (AVs) of the symbols. The results of this test, F(1,71) = 870.89, p < .0001, indicate that SRRVs were significantly related to the AVs. This finding demonstrates that subjects learned during the experiment and validates the VET design as a learning paradigm.

We also used the SRRVs to look for initial evidence of a learning asymmetry. We subtracted AVs from SRRVs to get error terms, then took their absolute values. We used MLM to predict these errors from the valence of the AVs. The results were not significant, F(1,71) = 0.32, p = .5702. Thus subjects, on average, were no more or less accurate for positive symbols than for negative ones. This is tentative evidence against an updating bias; however, it does not rule out the possibility of biased components within the evaluative process. The RL-based tests to follow will shed more definitive light on the question.


Chapter 4: Model 1

Our first model was designed to see if RL methods could be used to predict subjects’ MRVs (see figure 3). Model 1 was also designed to test whether separate learning rates were needed for positive and negative stimuli. If so, then this would be evidence of an updating bias—subjects would be learning one valence more quickly than the other. If the learning rates were not significantly different from one another, then this would provide strong computational support for the tentative behavioral findings: namely, that evaluative biases are not generated from updating representations. As a result, model 1 features six free parameters: an estimated learning rate for each symbol.

Results and Discussion

We used MLM to predict each subject’s thirty-six SRRVs from the model’s CRVs of the six symbols at those same time points. The F-test demonstrated that the CRVs could, in fact, predict SRRVs, F(1,71) = 841.10, p < .0001. While promising, this finding does not necessarily mean that our RL equations have captured any important components of human cognition. RL outputs are dependent on the stimuli, and previous analyses have shown that our subjects acquired accurate knowledge about the stimuli. Thus the relationship between CRV and SRRV could be driven entirely by AV.


To determine if the RL algorithm contributes to the prediction of SRRVs, we next predicted SRRVs based on AVs, CRVs, and their interaction term. Both main effects were significant: SRRVs were predicted both by AVs, F(1,71) = 72.51, p < .0001, and by CRVs, F(1,71) = 49.94, p < .0001. Importantly, this demonstrates that the CRVs were useful predictors of SRRVs even in the presence of the AVs. Finally, the interaction term (CRV*AV) was marginally significant, F(1,71) = 3.70, p = .0547, indicating that the relationship between subjects’ SRRVs and the model’s CRVs varied slightly as a function of AV. These analyses provided initial evidence that a simple RL model, with six estimated learning rates, could predict subjects’ subjective sense of value.

Beginning our computational investigation of bias, the learning rates of positive stimuli were compared to those of negative stimuli. A significant difference would provide computational support for the notion that the rate of representational updating is partially dependent on valence. To determine if multiple learning rates were required, we again used MLM. Positive learning rates (M = .4686, SD = .3613) did significantly differ from negative learning rates (M = .5625, SD = .3656), F(1,71) = 7.32, p = .0072. Thus model 1 suggests that valence asymmetries may stem, at least in part, from an updating bias.

Model 1’s findings suggest that an RL approach can be a valuable tool for understanding human decision-making. Model 1 also demonstrated that behavioral performance in the VET could be predicted with an updating negativity bias. However, this model’s simplicity is a limitation. Importantly, it lacked the capability to develop action biases. Thus, two questions remain. Firstly, can an RL model find evidence of an action bias? Secondly, will the updating bias survive once the possibility of action biases is incorporated into a new model?


Chapter 5: Model 2

Our second model was designed to test whether valence had any effect on subjects’ decisions to act. For this model, we retained the six free learning-rate parameters. To look for a direct effect of valence on behavior, we added two additional free parameters, incorporating them into the softmax function. These parameters are based on conceptually similar terms from ESM (Cacioppo & Berntson, 1994). First, we added an offset bias—a constant that would modify the probability of accepting an offer, regardless of any prediction regarding the specific stimuli. If positive, this value would increase the likelihood of approach behavior regardless of environmental context, thereby mimicking ESM’s positivity offset. However, to allow for the possibility of a negativity offset as well, our MLE function solved for the optimal offset-bias value within the range [-10, 10]. As a result, our computations allowed us to search for the existence of a positivity offset as suggested by ESM, but also allowed us to investigate the possibility of a negative offset.

The second free parameter we added was a weighting bias that served as a coefficient to CRVs. The presence of this weighting bias was conditional on the valence of the prediction (see equation 8). That is, if a prediction was negative, then it was multiplied by the bias term in the softmax function in order to produce a behavioral probability. However, if a prediction was non-negative, then the bias term was dropped from the equation. The range for potential bias terms was bounded between 0 and 2. If the MLE selected a bias term of 1, then effectively no weighting bias would exist in the decision-making process. However, if the optimized bias term were greater than 1, then a negativity bias would exist in the selection of behavior. That is, negative information would weigh more heavily than positive information on the decision. Alternatively, if the optimized term were less than 1, then negative information would have relatively less weight on the decision than positive information, leading to a positivity bias.
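A minimal reconstruction of model 2's acceptance rule (cf. equation 8) might look like the following; the functional form and parameter names are our assumptions, built only from the description above.

```python
import math

def p_accept(crv1, crv2, offset=0.0, weight=1.0, beta=1.0):
    """Probability of accepting a two-symbol offer. The weighting bias
    multiplies only negative CRVs (and is dropped for non-negative ones);
    the offset bias shifts approach probability regardless of value."""
    def weighted(v):
        return weight * v if v < 0 else v   # conditional weighting bias
    total = beta * (weighted(crv1) + weighted(crv2)) + offset
    return 1.0 / (1.0 + math.exp(-total))
```

With weight > 1, a negative CRV drags the acceptance probability down more strongly, mimicking a negativity bias; with offset > 0, acceptance is more likely across the board, mimicking a positivity offset.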

Results & Discussion

To determine whether the optimized simulations contained significant action biases, we conducted one-sample t-tests. For the weighting biases, values were compared with 1, since, as stated above, weighting biases of 1 have no effect on the action-selection process. We found a significant weighting bias (M = 1.2513, SD = .7052), t(72) = 15.16, p < .0001. This positive value indicates that negative information tended to have more of an impact on the decision process. For the offset bias, values were compared with 0, since offset biases of 0 would have no effect on decision-making. We found a significant offset bias (M = 2.2592, SD = 5.9154), t(72) = 3.26, p = .0017. This positive value indicates that people have a bias to engage in approach behavior, regardless of the environmental information available to them.

To test the prediction that these two biases work against one another, we used the offset bias to predict the weighting bias in a simple regression equation. If a positive relationship exists between the two, then this suggests the two components have counteractive effects on the decision-making process. The results indicate that a positive correlation (r = .2838) does exist between the two biases, F(1,71) = 6.22, p = .0150. Subjects with greater offset biases also tended to have greater weighting of negative information.

Critically, to determine if an updating bias exists in the presence of these decision biases, we again used MLM to compare learning rates for positive (M = .4686, SD = .3613) versus negative (M = .5363, SD = .3604) information. In the presence of significant action biases, model 2 also produced a marginally significant negative updating bias, F(1,71) = 3.01, p = .0834. This finding provides tentative support for the hypothesis that the manifested negativity bias may arise from biased cognitive processes both at the time of action selection and at the time of representational updating.

The findings from model 2 support our hypothesis that evaluative valence asymmetries are due, at least in part, to biases in the action-selection process. This architecture makes sense from a representational point of view. The action biases can serve as useful heuristics under uncertainty without disturbing the integrity of the MRVs. Namely, the offset bias encourages approach behavior, allowing for the exploration of the environment and the acquisition of information. The negative weighting and updating biases serve to protect the individual from aversive environments, in part by counteracting the offset bias, when that becomes necessary. With preliminary evidence for opposing action biases, we set out to test our final hypothesis: Will these biases diminish as certainty increases?


Chapter 6: Model 3

Model 3 was designed to test our hypothesis that biases would diminish as uncertainty wanes. Since the environment subjects were in was relatively stable, we made the assumption that uncertainty would diminish with time. We then doubled the number of free parameters, using one set for the first half of the VET and the other for the second half. In other words, we used the MLE to estimate eight free parameters that best fit subject behavior during the first half of the task and then separately estimated eight new parameters that best fit subject behavior during the second half. If evaluative biases are reduced as certainty increases, then the bias parameters should, on average, be smaller for the second half of the VET.
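This estimation scheme amounts to fitting each half of the task independently. A trivial sketch, with `fit_fn` standing in for a hypothetical fitting routine:

```python
def fit_split_half(trials, fit_fn):
    """Model 3's scheme: estimate one parameter set on the first half of
    the task and a fresh set on the second half, so that bias parameters
    can be compared across halves. `fit_fn` stands in for any fitting
    routine (e.g., a bounded MLE) that returns fitted parameters."""
    mid = len(trials) // 2
    return fit_fn(trials[:mid]), fit_fn(trials[mid:])
```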

Results & Discussion

Before model 3 could be used to explore the effects of increasing certainty, we first had to be sure that the new model’s outputs for the first half of the VET were similar to the outputs from model 2. To do this, we repeated the analyses done in model 2, namely tests for differences between positive and negative learning rates, the existence of significant offset and weighting biases, and the correlation between offset and weighting biases. When testing the parameters generated from the first half of the task, all results from model 3 replicated those of model 2. However, when testing the parameters generated from the second half, a different pattern emerged. A significant weighting bias (M = 1.1466, SD = .6742) was again found, t(72) = 14.53, p < .0001, as was the correlation between weighting bias and offset bias (r = .2735), F(1,71) = 5.74, p = .0192. However, no significant offset bias (M = 1.000, SD = 5.8211) was found, t(72) = 1.47, p = .1463. In the second half, there was also no difference between learning rates for positive (M = .4979, SD = .3905) and negative stimuli (M = .5086, SD = .3932), F(1,71) = .08, p = .7763. These results suggest that a reduction in evaluative biases occurred from the first to the second half of the VET.

To confirm whether the bias parameters were reduced from the first to the second half, we computed difference scores for the action-bias parameters, then conducted separate t-tests for significant changes in the weighting bias and the offset bias, respectively. We found significant reductions in both weighting bias, t(72) = 2.13, p = .0365, and offset bias, t(72) = 2.52, p = .0140. Thus the model found that lower bias terms during the second half led to better fit with actual subject behavior. Furthermore, decreases in weighting bias were positively correlated with decreases in offset bias (r = .3016), F(1,71) = 7.10, p = .0095. In other words, the more one bias diminished over the course of the task, the more the other bias tended to diminish as well. This is consistent with our hypothesis that these two biases act in opposition to one another in order to produce appropriate behavior. A reduction in one bias without a reduction in the other would ultimately lead to biased behavior.

The findings from model 3 support our conclusion that evaluative biases develop to help individuals handle uncertainty. Action biases were shown to diminish after subjects gained more experience with the VET. Additionally, the reduction in offset bias was correlated with the reduction in weighting bias. Finally, the valence asymmetry in representational updating that was present while subjects were less familiar with the task was absent once subjects were more familiar with the experimental stimuli. These findings suggest that evaluative biases are more prominent when uncertainty is high and may, in fact, exist to help organisms navigate an uncertain world.


Chapter 7: General Discussion

Attitudes are complex mental representations and serve as the basis for evaluative judgments that are critical for guiding an organism’s behavior. Through the use of RL modeling, we have examined human decision making and identified multiple biases at various points within the evaluative process. We designed a learning paradigm to elucidate evaluative biases and the cognitive processes from which they emerge. Our use of RL models allowed us to simultaneously test multiple hypotheses under a single set of assumptions. While bias is commonly thought of as an atomistic construct, our findings suggest that the phenomenon of biased behavior emerges from multiple cognitive processing asymmetries. Our findings suggest the existence of two antagonistic action biases—a positive offset bias and a negative weighting bias—as would be predicted by ESM (Cacioppo & Berntson, 1994). Additionally, we found a negative updating bias. Taken together, these three biases fit nicely with the three components of positive-negative asymmetry posited by BAT (Peeters & Czapinski, 1990). We also found that the magnitudes of the offset bias and weighting bias are positively related, suggesting that the two biases work against one another for the purpose of optimizing behavioral selection. Similar opposing biases are posited by BAT (Peeters & Czapinski, 1990) and MMH (Taylor, 1991). Finally, we also found preliminary evidence that these various evaluative biases decline or disappear entirely as individuals acquire more knowledge about their environment. This suggests that the biases serve an adaptive role, guiding behavior under uncertainty. Still, many questions remain about these biases and certain aspects of our methodology.

Positivity Offset Bias

The positive offset bias suggests subjects were willing to explore their environment; however, the motivation behind this exploratory bias remains unclear. One possibility is that subjects were more inclined to approach stimuli simply because they were more concerned with gaining points than losing them. While this goes against the well-documented risk aversion that people tend to demonstrate (Kahneman & Tversky, 1979), it may be an inadvertent consequence of the VET and its instructions. Telling subjects to try to earn as many points as possible may have made them promotion-focused, and thus more concerned with acquiring gains than with avoiding losses (Higgins, 1997). However, this does not easily explain why the positivity offset decreased as subjects became more certain of their environment, suggesting that another mechanism is at work.

An alternative explanation for a positivity offset is that subjects explored in an attempt to learn more about the environment. This may seem illogical given that subjects were given full feedback during the task. That is, subjects did not actually acquire more knowledge from approach behavior, so they had no reason to exhibit an exploratory bias. This argument, however, assumes that the offset bias is a high-level cognitive process under volitional control. Yet, given that lower-order animals would benefit from positivity offsets as well, it stands to reason that a positivity offset is instantiated at lower levels of the neuraxis. Therefore, subjects may be prone to approach stimuli as a result of a low-level disposition for exploratory behavior. However, we do not mean to suggest that humans are unable to regulate their exploratory behavior. On the contrary, if an exploratory disposition evolved in primitive brains, then it would likely be re-represented as increasingly advanced neural layers developed across species (Jackson, 1958). This re-representation of function would allow for more flexible behavior in more cortically evolved animals, via more complicated value representations. Greater cortical development would allow for richer and more influential MRVs. As these value representations become stronger and more accurate, due to increasing experience, they can exert an ever-increasing top-down influence on representations of approach behavior, thereby reducing the offset bias. Thus, the positivity offset may originate in evolutionarily older areas of the brain; however, its functionality may be attenuated by top-down projections from higher levels of the neuraxis, where MRVs form and increase one’s certainty about the world.

Negativity Biases

The negativity biases displayed by our models offer strong support for the existence of psychological biases in judgment. This is significant because many attempts to uncover psychological valence asymmetries are vulnerable to an alternative explanation: positive and negative stimuli may be objectively unequal. Thus, what appears to be an overweighting of negative stimuli may in fact be equivalent weighting of unequal events. This explanation can account for a significant portion of the psychological literature on biases. For instance, lottery winners have been shown to be less elated by their fortune than paralysis victims are saddened by their misfortune (Brickman, Coates, & Janoff-Bulman, 1978). Similarly, research into relationships consistently shows that negative behaviors or interactions have a greater impact on marital satisfaction than positive ones (e.g., Gottman, 1979; Gottman & Krokoff, 1989).

Animal learning studies, which often use shocks as punishers and food as rewards (e.g., Miller, 1952), suffer from the same drawback. Since equating the value of different events is impossible in these cases, determining evaluative biases is similarly impossible. The use of points in our study allows us to control for magnitude and equate positive and negative stimuli. Therefore, much like the asymmetries found in economic and decision-making games (e.g., Kahneman & Tversky, 1979), evaluative asymmetries found during the VET can be assumed to originate in the mind, rather than in its environment.

The full-feedback design of the study ensures that the reported evaluative asymmetries are the result of cognitive processes rather than due to the nature of learning.

In the real world, learning often coincides only with approach behavior. As a result, negative MRVs, which promote avoidance behavior, often persist without updating. For instance, exposure to negative information leads to negative beliefs about novel stimuli.

As a result, those stimuli are avoided and no new information is acquired (Eiser, Shook &

Fazio, 2007). As a result of full-feedback, we can conclude that updating and action biases do not result from the structure of learning. A contingent-feedback version of the

VET would likely produce a blatant learning asymmetry; that is, SRRVs would likely be less accurate for negative stimuli. If high-level processes can moderate offset bias, then we might expect a greater positivity offset when subjects understand that information requires exploration. If the negative action-bias exists to counter a positive offset bias,

then we might also expect an elevated negative action-bias in a contingent-feedback condition. Based on these hypotheses, the contingent-feedback VET could be very useful in validating, or invalidating, our current theories about evaluative biases.
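As a rough illustration of why contingent feedback should produce a learning asymmetry, the following sketch simulates delta-rule learning (Eq. 2, Table 1) of two positive and two negative stimuli under both feedback regimes. The stimulus names, point values, learning rate, and the simplified two-option softmax are our own illustrative assumptions, not the actual VET implementation.

```python
import math
import random

def simulate(contingent, n_trials=200, lr=0.3, seed=0):
    """Return the mean absolute error of the final value estimates for the
    two negative stimuli after delta-rule learning (hypothetical parameters)."""
    rng = random.Random(seed)
    true = {"A": 3.0, "B": 1.5, "C": -1.5, "D": -3.0}  # true point values
    crv = {s: 0.0 for s in true}                        # current value estimates
    for _ in range(n_trials):
        s = rng.choice(sorted(true))
        # Two-option softmax: approach vs. avoid (avoid pinned at 0).
        p_approach = math.exp(crv[s]) / (math.exp(crv[s]) + 1)
        approached = rng.random() < p_approach
        # Full feedback: outcome shown on every trial.
        # Contingent feedback: outcome shown only after approach.
        if approached or not contingent:
            crv[s] += lr * (true[s] - crv[s])           # delta-rule update (Eq. 2)
    return sum(abs(true[s] - crv[s]) for s in ("C", "D")) / 2

# Averaged over runs, negative stimuli end up represented less accurately
# when feedback is contingent on approach behavior.
err_full = sum(simulate(contingent=False, seed=s) for s in range(20)) / 20
err_contingent = sum(simulate(contingent=True, seed=s) for s in range(20)) / 20
```

Because avoidance cuts off the very feedback that would correct a negative estimate, the contingent condition reproduces the learning asymmetry described by Eiser, Shook, and Fazio (2007) without any built-in bias parameter.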

The presence of two negativity biases in models 2 and 3 provides general support for both the ESM (Cacioppo & Berntson, 1994) and BAT (Peeters & Czapinski, 1990). The correlation between the offset bias and the weighting bias provides tentative support for our hypothesis that a negative weighting bias could serve to rapidly correct for the positivity offset in cases where environments become pernicious. Model 3 produced a negative weighting bias for the second half of the task, albeit one of significantly lower magnitude than in the first half. This is interesting because the model generated no offset bias for the second half. Perhaps the weighting bias simply takes longer to disappear.

Alternatively, this result could suggest that the weighting bias persists, even in the absence of an offset bias. If this is the case, then it suggests that the weighting bias’s function may extend beyond correcting for the offset bias. This makes sense, as overweighting negative information could help individuals avoid harm, even if an offset bias isn’t present.
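The opposition between the two biases can be seen directly in the Model 2/3 choice rule (Eq. 8, Table 1). The sketch below, with illustrative parameter values of our own choosing (not fitted estimates), shows how a positivity offset and a negative weighting bias pull the acceptance probability in opposite directions.

```python
import math

def p_accept(crv1, crv2, obias=0.8, wbias=2.0):
    """Eq. 8: softmax over accept/reject with offset and weighting biases.
    obias is added unconditionally; wbias multiplies a CRV only when that
    CRV is negative.  Parameter values here are illustrative."""
    w1 = wbias * crv1 if crv1 < 0 else crv1
    w2 = wbias * crv2 if crv2 < 0 else crv2
    z = w1 + w2 + obias
    return math.exp(z) / (math.exp(z) + 1)

# With no evidence either way, the offset alone favors approach;
# a modestly negative CRV is overweighted enough to cancel it exactly.
neutral = p_accept(0.0, 0.0)     # > 0.5 because of obias
balanced = p_accept(-0.4, 0.0)   # wbias * (-0.4) + obias = 0, so 0.5
```

With these values, a CRV of only −0.4 suffices to neutralize an offset of 0.8, which is the balancing dynamic the correlation suggests.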

The function of a negative updating bias is less clear. Although large learning rates for negative stimuli would certainly be adaptive (so long as they are not excessively large), it is not obvious why the learning rates for positive stimuli should be relatively lower. If well-being is contingent upon accurate representations of the environment, then larger learning rates might be better for stimuli of both valences. In other words, if the mind has the capacity to update information at a given rate, why is that rate used only for negative stimuli, while positive stimuli are updated at a lesser rate?

One possibility is that updating capacities are unequal because negative and positive information are processed by separable evaluative systems. As a result, the negative system may have evolved a greater updating capacity. This would mean that subjects, on average, simply lack the ability to acquire positive information at the same rate as negative information. Importantly, this hypothesis is not contingent on the existence of fully separable evaluative systems. Rather, the systems need be only partially separable, a far more reasonable requirement. For instance, some evidence suggests that the medial orbitofrontal cortex (OFC) is associated with the processing of positive information whereas lateral portions of the OFC are involved in the processing of negative information (Kringelbach & Rolls, 2004). A less contentious claim is that different networks within the basal ganglia selectively learn approach and avoidance responses to stimuli (Aubert, Ghorayeb, Normand, & Bloch, 2000). However, this explanation does not account for the loss of the updating asymmetry as certainty increases.

As an alternative explanation, the motivations underlying evaluation could lead to different rates of learning without requiring differential updating abilities or dissociable evaluative systems. According to the IR model (Cunningham et al., 2007), the extent and depth of evaluative processing depends on two competing motivations: a motivation to minimize error and a motivation to conserve processing resources. This latter motivation is crucial because it keeps processing resources available for other important computations. Assuming equivalent updating potential for both valences, the conservation motivation may work to reduce the actual updating rates for all stimuli. The drive for accuracy would then elevate the actual updating rates back towards, but not necessarily to, their maximal potential. Negative stimuli, with their potential to cause irreversible harm, might induce a greater accuracy motivation, resulting in an actual updating rate that is closer to its maximal limit. This mechanism could account for the disappearance of the updating asymmetry as uncertainty declines. Increasing knowledge, garnered from experience, may attenuate the threat of negative stimuli. As a result, the motivation for accuracy in the evaluation of negative events may lessen, thereby freeing processing resources for other matters. Currently the motivational account, with its greater flexibility, seems more plausible. However, both accounts are purely speculative at this stage, and additional research will be needed to determine whether either proposed mechanism can account for the updating bias.

Additional models may be able to adjudicate between the motivational and abilities hypotheses by using more elaborate versions of the updating function. With increasingly advanced updating functions, effects may emerge that point to either motivational or systems differences as the source of updating biases. For instance, subjects' response times could serve as an indicator of the motivation for accuracy and be used as a moderator of learning rate. Additionally, learning rates were assigned to stimuli in all current models; however, assigning learning rates to valence may have greater psychological realism. By that, we mean the learning rate used might depend on the valence of the outcome information rather than the identity of the stimulus. Alternatively, or perhaps additionally, representations of one valence may be more amenable to updating than the other; that is, negative representations might be more or less updatable than positive ones. Such modifications might lead to more accurate models of cognition.
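The two schemes can be contrasted in a short sketch. The first function mirrors the current models, where the learning rate belongs to the stimulus; the second implements the proposed valence-based variant, where the rate depends on the sign of the outcome. Both functions and the specific rate values are hypothetical illustrations of Eq. 2, not fitted parameters.

```python
def update_by_stimulus(crv, outcome, stimulus, lr_by_stimulus):
    """Current models: each stimulus carries its own learning rate (Eq. 2)."""
    return crv + lr_by_stimulus[stimulus] * (outcome - crv)

def update_by_valence(crv, outcome, lr_pos=0.2, lr_neg=0.6):
    """Proposed variant: the rate depends on the outcome's valence.
    lr_neg > lr_pos encodes a negative updating bias."""
    lr = lr_neg if outcome < 0 else lr_pos
    return crv + lr * (outcome - crv)

# A loss moves the estimate three times as far as an equal-sized gain.
after_loss = update_by_valence(0.0, -1.0)   # -0.6
after_gain = update_by_valence(0.0, 1.0)    # 0.2
```

The valence-based version needs no per-stimulus bookkeeping, which is one sense in which it may be the more psychologically realistic parameterization.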

The Modeling Approach: Merits, Limitations, and Improvements

Given the initial support for our hypotheses, we consider the present models to be valid starting points from which more in-depth investigations can be launched. Even a basic RL algorithm was capable of predicting self-reported values. This model was amenable to modifications allowing the testing of increasingly complicated psychological theories. The modeling approach allowed us to simultaneously test all of our predictions within a single experimental design. This reduces the number of alternative explanations for our findings. Importantly, the versatility of the RL models suggests more sophisticated modifications will be feasible, allowing us to investigate the questions that have emerged from this first set of analyses.

An important limitation of the current work is that uncertainty level was neither measured nor estimated. Instead, we assumed that uncertainty would diminish as a function of time. Although this reasoning is straightforward and plausible, future studies should provide a more empirical basis for quantifying uncertainty. Feelings of uncertainty could be induced through an experimental manipulation: uncertainty could be objectively increased by altering the stimuli partway through the task, or a subjective sense of uncertainty could be induced by instructing subjects to shake their heads from side to side as they complete the task, as this has been shown to reduce confidence in one's thoughts (Briñol & Petty, 2003). Alternatively, subjective uncertainty could be assessed through a self-report measure. Finally, it may be possible to develop a model that estimates uncertainty directly. Given the influence uncertainty is assumed to have on the bias parameters of model 3, more precise quantification is a top priority.
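As one hypothetical direction for such a model, uncertainty could be estimated online as a leaky average of the absolute prediction error (Eq. 1), and the bias parameters could then be yoked to that estimate. The sketch below is a toy version of this idea; the variable names, the decay rate, and the linear scaling of the offset bias are all our own assumptions.

```python
def effective_offsets(outcomes, lr=0.3, decay=0.1, obias_max=1.0):
    """Track a running uncertainty estimate (leaky mean of |error|) and
    return the offset bias it would license on each trial."""
    crv, uncertainty = 0.0, 1.0
    offsets = []
    for outcome in outcomes:
        error = outcome - crv                    # prediction error (Eq. 1)
        uncertainty += decay * (abs(error) - uncertainty)
        crv += lr * error                        # delta-rule update (Eq. 2)
        offsets.append(obias_max * uncertainty)  # bias scaled by uncertainty
    return offsets

offsets = effective_offsets([1.0] * 40)
# In a stable environment the licensed bias shrinks towards zero.
```

A scheme like this would replace the assumption that uncertainty declines with time by an estimate that responds to the actual predictability of the environment.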

One criticism that can be leveled against computational models of cognition, and indeed psychological models in general, is that the problem space allows for multiple equally plausible solutions. That is, multiple models can explain the behavioral phenomena equally well, and it is difficult to ascertain which models best reflect mental processes. To improve model selection, non-behavioral constraints can be applied to future models, thereby reducing the number of feasible solutions. Given that the human mind emerges from neural activity, a logical next step for our research is to incorporate biological constraints into our model building and model testing. What we know about the brain can help us develop computational models with greater plausibility. These constraints will come in the form of connectionist architectures and neuroimaging methods.

Connectionism is a framework for building mental models from networks of simple, neuron-like units; psychological phenomena emerge from the transmission of activation between these units. Connectionist models have already been used to demonstrate a negative generalization bias and a learning asymmetry that arises from the avoidance of negative stimuli (Eiser et al., 2003). While early versions of connectionism lacked biological plausibility (e.g., the perceptron; Rosenblatt, 1958), more recent frameworks mimic biological mechanisms to a far greater extent (e.g., LEABRA;

O’Reilly & Munakata, 2000). This trend is likely to continue as increasing computing power allows for more detailed connectionist frameworks. Future RL models based on connectionist architectures will be an important step towards satisfying biological constraints, because their use in model development will restrict possible solutions to those that conform to connectionism’s increasingly biologically based principles.

Neurobiology can also be used to test models for biological plausibility.

Neuroimaging methods provide an opportunity to test multiple RL models against brain data. If multiple models explain behavioral data equally well, then those models can also be tested to see which one best predicts neural activity. For instance, the OFC has been shown to represent value (Kringelbach & Rolls, 2004). A correlation between CRVs and

OFC activity would suggest that the model is biologically supported. However, computational models can also inform neuroscience. Because psychological constructs rarely map neatly onto brain activity, what brain signals mean is often unclear. Models may be useful in decoding these signals, thus helping to bridge the behavioral and neurological levels of analysis.

Conclusion

We propose that human judgments and evaluations are influenced by multiple cognitive biases, some influencing response selection whereas others impact the revision of mental representations. We suggest that these biases are not merely defects in our cognitive architecture but rather are designed to optimize behavior when knowledge about the environment is limited. These dissociable biases may oppose one another, leading to a relatively balanced evaluative system. As a result, while decision-making abilities are not perfect, they are flexible enough to operate in a wide variety of contexts, typically with sufficient accuracy to ensure well-being.


References

Aubert, I., Ghorayeb, I., Normand, E., & Bloch, B. (2000). Phenotypical characterization of the neurons expressing the D1 and D2 dopamine receptors in the monkey striatum. Journal of Comparative Neurology, 418, 22–32.
Baumeister, R. F., Bratslavsky, E., Finkenauer, C., & Vohs, K. D. (2001). Bad is stronger than good. Review of General Psychology, 5, 4, 323–370.
Bem, D. J. (1967). Self-perception: An alternative interpretation of cognitive dissonance phenomena. Psychological Review, 74, 183–200.
Bless, H., Bohner, G., Schwarz, N., & Strack, F. (1990). Mood and persuasion: A cognitive response analysis. Personality and Social Psychology Bulletin, 16, 331–345.
Brickman, P., Coates, D., & Janoff-Bulman, R. (1978). Lottery winners and accident victims: Is happiness relative? Journal of Personality and Social Psychology, 36, 917–927.
Briñol, P. & Petty, R. E. (2003). Overt head movements and persuasion: A self-validation analysis. Journal of Personality and Social Psychology, 84, 6, 1123–1139.
Bryson, A. E. & Ho, Y.-C. (1969). Applied optimal control: Optimization, estimation, and control. Blaisdell Publishing Company / Xerox College Publishing.
Cacioppo, J. T. & Berntson, G. G. (1994). The relationship between attitudes and evaluative space: A critical review, with emphasis on the separability of positive and negative substrates. Psychological Bulletin, 115, 3, 401–423.
Cacioppo, J. T., Gardner, W. L., & Berntson, G. G. (1997). Beyond bipolar conceptualizations and measures: The case of attitudes and evaluative space. Personality and Social Psychology Review, 1, 1, 3–25.
Cunningham, W. A., Zelazo, P. D., Packer, D. J., & Van Bavel, J. J. (2007). The iterative reprocessing model: A multilevel framework for attitudes and evaluation. Social Cognition, 25, 5, 736–760.
Eiser, J. R., Fazio, R. H., Stafford, T., & Prescott, T. J. (2003). Connectionist simulation of attitude learning: Asymmetries in the acquisition of positive and negative evaluations. Personality and Social Psychology Bulletin, 20, 10, 1221–1235.
Eiser, J. R., Shook, N. J., & Fazio, R. H. (2007). Attitude learning through exploration: Advice and strategy appraisals. European Journal of Social Psychology, 37, 1046–1056.
Eysenck, M. W. (1976). Arousal, learning, and memory. Psychological Bulletin, 83, 389–404.
Fazio, R. H. (2007). Attitudes as object-evaluation associations of varying strengths. Social Cognition, 25, 5, 603–637.
Fazio, R. H., Eiser, J. R., & Shook, N. J. (2004). Attitude formation through exploration: Valence asymmetries. Journal of Personality and Social Psychology, 87, 3, 293–311.
Fazio, R. H., & Olson, M. A. (2003). Attitudes: Foundations, functions, and consequences. In M. A. Hogg & J. Cooper (Eds.), The handbook of social psychology (pp. 139–160). London: Sage.
Festinger, L. (1957). A theory of cognitive dissonance. Stanford, CA: Stanford University Press.
Garcia, J., Kimeldorf, D. J., & Koelling, R. A. (1955). Conditioned aversion to saccharin resulting from exposure to gamma radiation. Science, 122, 157–158.
Gardner, W. I. (1996). Biases in impression formation: A demonstration of a bivariate model of evaluation. Unpublished doctoral dissertation, Ohio State University, Columbus.
Gigerenzer, G. (2006). Bounded and rational. In R. J. Stainton (Ed.), Contemporary debates in cognitive science. Oxford, UK: Blackwell.
Gigerenzer, G. & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 104, 4, 650–669.
Gottman, J. M. (1979). Marital interaction. New York: Academic Press.
Gottman, J. M. & Krokoff, L. J. (1989). Marital interaction and satisfaction: A longitudinal view. Journal of Consulting and Clinical Psychology, 57, 47–52.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley.
Higgins, E. T. (1997). Beyond pleasure and pain. American Psychologist, 52, 12, 1280–1300.
Hull, C. L. (1932). The goal-gradient hypothesis and maze learning. Psychological Review, 39, 1, 25–43.
Ito, T. A., Cacioppo, J. T., & Lang, P. J. (1998). Personality and Social Psychology Bulletin, 24, 8, 855–879.
Jackson, J. H. (1958). Evolution and dissolution of the nervous system. In J. Taylor (Ed.), Selected writings of John Hughlings Jackson (Vol. 2, pp. 3–92). New York: Basic Books.
Kahneman, D. & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 2, 263–291.
Kanouse, D. E., & Hanson, L. R., Jr. (1972). Negativity in evaluations. In E. E. Jones, D. E. Kanouse, H. H. Kelly, R. E. Nisbett, S. Valins, & B. Weiner (Eds.), Attribution: Perceiving the causes of behavior (pp. 47–62). Morristown, NJ: General Learning Press.
Kringelbach, M. L. & Rolls, E. T. (2004). The functional neuroanatomy of the human orbitofrontal cortex: Evidence from neuroimaging and neuropsychology. Progress in Neurobiology, 72, 341–372.
Liberman, N., & Trope, Y. (1998). The role of feasibility and desirability considerations in near and distant future decisions: A test of temporal construal theory. Journal of Personality and Social Psychology, 75, 5–18.
Matlin, M. W., & Strang, D. J. (1978). The Pollyanna principle: Selectivity in language, memory, and thought. Cambridge, MA: Schenkman.
Miller, N. E. (1944). Experimental studies of conflict. In J. McV. Hunt (Ed.), Personality and the behavior disorders (Vol. 1, pp. 431–465). New York: Ronald Press.
Miller, N. E. & Murray, E. J. (1952). Displacement and conflict: Learnable drive as a basis for the steeper gradient of avoidance than of approach. Journal of Experimental Psychology, 43, 3, 221–231.
Nowak, A., Vallacher, R. R., Tesser, A., & Borkowski, W. (2000). The emergence of collective properties in self-structure. Psychological Review, 107, 1, 39–61.
O’Reilly, R. C. & Munakata, Y. (2000). Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. Cambridge, MA: MIT Press.
Overwalle, F. V., & Siebler, F. (2005). A connectionist model of attitude formation and change. Personality and Social Psychology Review, 9, 3, 231–274.
Peeters, G. & Czapinski, J. (1990). Positive-negative asymmetry in evaluations: The distinction between affective and informational negativity effects. European Review of Social Psychology, 1, 33–60.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 6, 386–408.
Sears, D. O. (1983). The person-positivity bias. Journal of Personality and Social Psychology, 44, 233–250.
Shultz, T. R., & Lepper, M. R. (1996). Cognitive dissonance reduction as constraint satisfaction. Psychological Review, 103, 2, 219–240.
Skowronski, J. J. & Carlston, D. E. (1987). Social judgment and social memory: The role of cue diagnosticity in negativity, positivity, and extremity biases. Journal of Personality and Social Psychology, 52, 689–699.
Smith, N. K., Cacioppo, J. T., Larsen, J. T., & Chartrand, T. L. (2003). May I have your attention please: Electrocortical responses to positive and negative stimuli. Neuropsychologia, 41, 171–183.
Solomon, R. L. & Corbit, J. D. (1974). An opponent-process theory of motivation: I. Temporal dynamics of affect. Psychological Review, 81, 119–145.
Solomon, R. L., & Wynne, L. C. (1954). Traumatic avoidance learning: The principles of conservation and partial irreversibility. Psychological Review, 61, 6, 353–385.
Sutton, R. S. & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: The MIT Press.
Taylor, S. E. (1991). Asymmetrical effects of positive and negative events: The mobilization-minimization hypothesis. Psychological Bulletin, 110, 1, 67–85.
Todd, P. M. & Gigerenzer, G. (2000). Précis of Simple heuristics that make us smart. Behavioral and Brain Sciences, 23, 727–780.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
Willis, J., & Todorov, A. (2006). First impressions: Making up your mind after a 100-ms exposure to a face. Psychological Science, 17, 7, 592–598.


Appendix A: Tables and Figures


#  |  Equation  |  Function type  |  Used for
1  |  error = outcome − CRV  |  Error function  |  Illustration only
2  |  CRV′ = CRV + LR × (outcome − CRV)  |  Update function  |  Models 1, 2 & 3
3  |  Pr(CRV1) = e^CRV1 / (e^CRV1 + … + e^CRVn)  |  Softmax function  |  Illustration only
4  |  Pr(CRV1) = e^(CRV1 + obias) / (e^(CRV1 + obias) + … + e^CRVn)  |  Softmax function  |  Illustration only
5  |  Pr(CRV1−) = e^(wbias × CRV1−) / (e^(wbias × CRV1−) + e^CRV2+ + e^CRV3− + e^CRV4+)  |  Softmax function  |  Illustration only
6  |  Pr(accept) = e^(CRV1 + CRV2) / (e^(CRV1 + CRV2) + 1)  |  Softmax function  |  Model 1
7  |  optimizer = −1 × ([behavior] × [Pr(accept)] + (1 − [behavior]) × (1 − [Pr(accept)]))  |  Optimization function  |  Models 1, 2 & 3
8  |  Pr(accept) = e^(wbias × CRV1 + wbias × CRV2 + obias) / (e^(wbias × CRV1 + wbias × CRV2 + obias) + 1); note: wbias disappears if the corresponding CRV is non-negative  |  Softmax function  |  Models 2 & 3

Table 1. Reinforcement learning equations.
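For concreteness, the Model 2/3 equations above can be composed into a single evaluation routine. The trial format, the choice to update both CRVs toward the shared outcome, and the parameter values below are simplifying assumptions of ours, not the actual fitting code; the routine merely shows how Eqs. 2, 7, and 8 fit together.

```python
import math

def objective(trials, lr, obias, wbias):
    """Sum Eq. 7 over trials; a fitting routine would minimize this total.
    Each trial: (stim1, stim2, outcome, behavior), behavior 1 = accept."""
    crv = {}
    total = 0.0
    for s1, s2, outcome, behavior in trials:
        v1, v2 = crv.get(s1, 0.0), crv.get(s2, 0.0)
        w1 = wbias * v1 if v1 < 0 else v1            # weighting bias (Eq. 8)
        w2 = wbias * v2 if v2 < 0 else v2
        z = w1 + w2 + obias
        p = math.exp(z) / (math.exp(z) + 1)          # softmax (Eq. 8)
        total += -(behavior * p + (1 - behavior) * (1 - p))  # optimizer (Eq. 7)
        # Simplifying assumption: full feedback updates both CRVs (Eq. 2).
        crv[s1] = v1 + lr * (outcome - v1)
        crv[s2] = v2 + lr * (outcome - v2)
    return total

# A subject who always accepts is fit better (lower score) by a
# parameterization with a positive offset bias than a negative one.
trials = [("A", "B", 1.0, 1)] * 10
good = objective(trials, lr=0.3, obias=1.0, wbias=2.0)
bad = objective(trials, lr=0.3, obias=-1.0, wbias=2.0)
```

In an actual fit, `objective` would be passed to a numerical optimizer that searches over lr, obias, and wbias for each subject.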


Figure 1. Conceptual overview of the computational modeling process.


Figure 2. Sample trial from the value estimation task.



Figure 3. Single subject value output from model 1.
