
Signal Detection Theories of Recognition Memory

Caren M. Rotello, Department of Psychological & Brain Sciences, University of Massachusetts

June 21, 2016 – to appear as: Rotello, C. M. (in press). Signal detection theories of recognition memory. To appear in J. T. Wixted (Ed.), Learning and Memory: A Comprehensive Reference, 2nd edition (Vol. 4: Cognitive Psychology of Memory). Elsevier. Please read and cite the published version.

Correspondence:

Caren M. Rotello Department of Psychological & Brain Sciences University of Massachusetts 135 Hicks Way Amherst, MA 01003-9271 (413) 545-1543 [email protected]


Abstract

Signal detection theory has guided thinking about recognition memory since it was

first applied by Egan in 1958. Essentially a tool for measuring decision accuracy in

the context of uncertainty, detection theory offers an integrated account of simple

old-new recognition judgments, decision confidence, and the relationship of those

responses to more complex memory judgments such as recognition of the context in

which an event was previously experienced. In this chapter, several commonly used

signal detection models of recognition memory, and their threshold-based

competition, are reviewed and compared against data from a wide range of tasks.

Overall, the simpler signal detection models are the most successful.

Keywords:

Associative memory; d'; Discrimination accuracy; Dual-process recognition memory; Item memory; Familiarity; Mixture models; Modeling; Plurality discrimination; Recollection; Recognition memory; Signal detection theory; Source memory; Threshold models


1 Introduction

Recognition memory errors are unavoidable. Sometimes we fail to recognize

an individual we’ve had a casual conversation with; other times we initiate a

conversation with a stranger, erroneously thinking we met him at a campus event.

These errors are not limited to weak memories: Lachman and Field (1965) asked

subjects to study a single list of 50 common words as many as 128 times and found

that the percentage of studied words that are called “old” reached a maximum of

88%, while the false recognition of an unstudied word hovered around 2%.

Although this performance level is excellent, free recall of those studied words

under the same conditions was 98% correct with no intrusion errors. The difference

in memory accuracy between recall and recognition suggests that the decision

process itself plays an important role in recognition memory.

Signal detection theory (SDT: Green & Swets, 1966; Macmillan & Creelman,

2005) provides a theoretical framework for quantifying memory accuracy as well as

the role of decision processes. It makes explicit the balance between possible

memory errors (missed studied items and false alarms to lures), as well as the

inevitability of those errors. In some applications, such as eyewitness

identifications, the different error types come with variable implicit costs: failing to

identify the guilty suspect in a lineup allows a criminal to go free; falsely accusing

the wrong individual leaves the true criminal unpunished and may send an innocent

person to jail. In experimental settings, there may be performance penalties (cash


or point deductions, delayed onset of the subsequent trial) associated with missed

opportunities to recognize a studied item or with false identifications of unstudied

memory probes. These considerations may suggest that one type of error is more

desirable – or, at least, less undesirable – than the other, and decision makers can

shift their strategy to reduce the probability of the costlier error. Modeling

recognition memory using signal detection allows independent assessment of the

decision process and the ability of the individual to discriminate categories of items.

Competing models of recognition memory make different assumptions about

the nature of memory errors. Discrete state, or threshold, models (e.g., Krantz,

1969) assume that probing memory with a studied item can either result in its

detection as a previously experienced item, or in no information at all being

available about its status. For this reason, these models are often described as

having complete information loss below a recollection threshold. In the most

common version of these models, errors occur either because of a random guessing

process, or, sometimes, because a response is offered that directly contradicts the

evidence available from memory. The relationship between misses and false alarms

is not specified in advance by threshold models; as we'll see, different parameter

choices allow a good deal of flexibility. In particular, certain parameter settings

allow either misses or false alarms to be avoided entirely.

A second type of competing model assumes that more than one process

contributes to recognition memory decisions, in agreement with Mandler’s (1980)

well-known butcher on the bus example. The man on the bus may be recognized

because he seems familiar; both models in this class assume that familiarity


operates as a signal detection process. The man may also be recognized because we

remember how we know him; that he is the butcher. This recollection process has

been described as operating either as a second signal detection process (Wixted &

Mickes, 2010) or as a threshold process (Yonelinas, 1994). The recognition memory

errors that are predicted by these models vary with their assumptions, as will be

described in section 2.

Finally, a third type of competing model assumes that recognition decisions

for studied items are based on a mixture of two types of trials, those on which the

study item was attended, and those for which it was not. As we will see, these

mixture models inherit most of the properties of the signal detection models,

including the inability to avoid a trade-off between the two types of recognition

errors.

This chapter is divided into three main sections. I begin by describing the

competing models in detail. Next, I review the empirical evidence that discriminates

the models, concluding that the traditional signal detection approach provides the

best overall description of the literature. Finally, I consider some challenges for

signal detection models.

2 The Models

2.1 Equal- (EVSDT) and Unequal-Variance (UVSDT) Signal

Detection Models


The earliest signal detection models of recognition memory assume

recognition decisions are made based on a single underlying evidence dimension

(see Figure 1). Both studied items (targets) and unstudied items (lures) are assumed

to resonate with memory to varying degrees, resulting in a distribution of observed

memory strengths; the strength of targets is increased by their recent study

exposure, shifting that distribution's mean to a greater value than that of the lures.

Recognition decisions are based on a criterion level of evidence, with positive

("old") decisions given to all memory probes whose strengths exceed that criterion,

otherwise negative ("new") responses are made. The proportion of the target

distribution that exceeds the criterion equals the theoretical proportion of studied

items that are identified as "old", which is the hit rate (H). The miss rate is the

proportion of targets that are erroneously called "new," so H + miss rate = 1.

Similarly, the proportion of the lure distribution that exceeds that same criterion

provides the false alarm rate (F), which is the proportion of lures that falsely elicit

an "old" response. Finally, the proportion of lures that are called "new" is the correct

rejection rate. By varying the location of the criterion, the hit and false alarm rates

can be increased or decreased.

< Insert Figure 1 near here >

There are two important points to make about the SDT model. First, changes

in the criterion location always change hit and false alarm rates in the same

direction (both increasing, for more liberally-placed criteria, or both decreasing, for

more conservative criteria), though the observed differences may not be statistically


significant. Second, errors are impossible to avoid. A criterion that is liberal enough

to yield a low rate of missed targets will necessarily result in a very high false alarm

rate, and one that is conservative enough to result in a low false alarm rate will

necessarily produce a high miss rate.

We can measure participants' discrimination accuracy – that is, their ability

to distinguish the targets from the lures – in terms of the distance between the

means of the distributions, in standardized (z-score) units. When the two

distributions have the same variance (as in the upper row of Figure 1), this distance

is called d' and it is independent of the criterion location, k. In that case, the model

can be defined with only those two parameters. Setting the mean of the lure

distribution to 0 and its standard deviation to 1 (without loss of generality), the

false alarm rate is defined by

F = Φ(−k) (1)

where Φ is the cumulative normal distribution function. Similarly, the hit rate is

defined by the same criterion relative to the mean, d’, of the target distribution:

H = Φ(d' − k). (2)

We can combine equations 1 and 2 to see that

d' = zH − zF (3)

where z is the inverse of the normal CDF. Because the criterion location, k, drops

out of calculations in Equation 3, its value does not affect our estimate of discrimination accuracy: response bias (k) and memory sensitivity (d') are

independent in this model. Effectively, this means that there is a set of (F, H) points


that all yield the same value of d'; each point simply reflects a different willingness

of the participant to say “old.” Connecting all of these possible (F, H) pairs yields a

theoretical curve called a receiver operating characteristic (ROC); transforming both

F and H to their z-score equivalents yields a zROC.
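For concreteness, Equations 1-3 can be implemented in a few lines of Python (a minimal sketch assuming the scipy library; the hit and false alarm rates below are hypothetical):

from scipy.stats import norm

# Hypothetical hit and false alarm rates from an old-new recognition test
H, F = 0.75, 0.20

zH, zF = norm.ppf(H), norm.ppf(F)   # the z-transform (inverse normal CDF)
d_prime = zH - zF                   # Equation 3: discrimination accuracy
k = -zF                             # Equation 1 solved for the criterion location

print(round(d_prime, 3), round(k, 3))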

Several example ROCs and corresponding zROCs are shown in the upper row

of Figure 1, for different values of d'. The ROCs and zROCs associated with higher

decision accuracy fall above those with lower accuracy: for any given false alarm

rate, the ROC yielding higher accuracy has a greater predicted hit rate. The points

on each ROC and zROC vary only in criterion location, k, with more conservative

response bias (larger k) yielding points that fall on the lower left end of the ROC

because larger values of k result in lower F and H. In probability space, the ROCs are

curved and symmetric about the minor diagonal. In normal-normal space, we can

use Equation 3 to see that the zROC is a line, zH = d' + zF, for which the y-intercept

equals d' and the slope is 1. The model in the upper row of Figure 1, and thus

equations 1-3, applies only when the variability of the target distribution equals that

of the lure distribution. For this reason, this model is called the equal-variance SDT

(EVSDT) model. As we will soon see, the equal variance property of this basic model

causes the symmetry in the theoretical ROC.

Egan (1958) was the first to apply this model to recognition memory data.

Participants in two experiments studied 100 words either once or twice, and then

made old-new decisions on those studied words mixed with 100 lures. Their

responses were made on a confidence rating scale ranging from “positive” that the

test probe was studied to “positive” that it was new. The response probabilities in


each of these confidence bins can be modeled with the same overall discrimination

value (d' ) but a different criterion (k1-km, see Figure 1). Egan's data immediately

suggested a problem with the model: the ROCs were not symmetric, either for

individual data or for the average across participants, and therefore were not

consistent with the equal-variance SDT model. (Figure 6 shows the ROC produced

by one of his participants.) Fortunately, it is straightforward to modify the model to

allow for unequal-variance distributions.

Again assuming (without loss of generality) that the lure distribution is

normal with a mean of 0 and a standard deviation of 1, we can set the mean of the

target distribution to be d and its standard deviation to be s. The lower row of

Figure 1 shows what these distributions might look like. In this unequal-variance

version of the SDT model, the UVSDT, Equation 1 still holds because the false

alarm rate is defined by the mean and standard deviation of the lure distribution,

and those haven’t changed. The hit rate calculation in the unequal-variance model

must take account of the standard deviation of the target distribution, s, yielding

H = Φ((d − k) / s). (4)

Notice that discrimination accuracy, which is the distance between the means of the

target and lure distributions, now can be defined in several ways. The two most

obvious approaches are to measure the distance in units of the standard deviation of

the lure distribution (d'1),


d'1 = s⋅zH − zF, (5)

or in units of the standard deviation of the target distribution (d'2),

d'2 = zH − (1/s)⋅zF. (6)

Several example ROCs are shown in the lower row of Figure 1, for different values of

d'1 (and, equivalently, d'2). The corresponding zROCs are also shown. The y-

intercept is the value of zH when zF = 0; Equation 6 tells us that zH = d'2 at that point. It also shows that the zROC, zH = d'2 + (1/s) zF, is a line with slope equal to the ratio of the lure and target distributions' standard deviations. This connection of the

slope of the zROC and the ratio of standard deviations is a handy property that has

theoretical significance. The linear form of the zROC, and its slope, are heavily

studied aspects of recognition ROCs.

A third strategy for measuring the distance between the target and lure

distribution means is to use units that reflect a compromise between the two

standard deviations. The best strategy for doing so involves the root mean square

standard deviation, yielding da,

da = √(2 / (1 + s²)) ⋅ (s⋅zH − zF) (7)

because of its relationship to the area under the ROC, Az:

Az = Φ(da / √2). (8)


Az equals the proportion correct that an unbiased participant would achieve in a task involving selection of the target in a set of target-lure pairs. Notice that da = d' when the variances of the target and lure distributions are equal (s = 1).
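A minimal Python sketch of Equations 4-8 (again assuming scipy; the operating point and the value of s are hypothetical):

from math import sqrt
from scipy.stats import norm

# Hypothetical operating point; s is the target-distribution standard deviation
H, F, s = 0.75, 0.20, 1.25

zH, zF = norm.ppf(H), norm.ppf(F)
d1 = s * zH - zF                             # Equation 5: lure-SD units
d2 = zH - zF / s                             # Equation 6: target-SD units
da = sqrt(2 / (1 + s ** 2)) * (s * zH - zF)  # Equation 7: RMS-SD units
Az = norm.cdf(da / sqrt(2))                  # Equation 8: area under the ROC

print(round(d1, 3), round(d2, 3), round(da, 3), round(Az, 3))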

2.2 Dual-process and Mixture Signal Detection Models

2.2.1 The high-threshold signal detection model (HTSDT)

Yonelinas (1994) proposed a very different explanation of the asymmetry in

the recognition ROC, namely that participants sometimes recollect studied items.

Because recollection can't occur for lures and because it's likely to result in high

confidence responses, only the highest-confidence hit rate is increased by the

contribution of recollection. This causes the left end of the ROC to be shifted

upwards, resulting in an asymmetric function. Yonelinas assumed that, in the

absence of recollection, responses are based on an equal-variance signal detection

process that assesses the familiarity of the memory probe. In this dual-process

model, recollection operates as a high-threshold process (lures can't be recollected),

so we'll call it the high threshold signal detection (HTSDT) model. According to

HTSDT, the hit rate is defined by

H = R + (1 − R)⋅Φ(d' − k), (9)

where R is the probability of recollection, and the false alarm rate is given by

Equation 1; the model can be described by the parameters R and d'. Some example


ROCs and zROCs are shown in Figure 2. Notice that the zROC for the HTSDT model is curved. In this model, familiarity-based recognition errors trade off against one another, exactly as in the EVSDT and UVSDT models. However, the recollection

process cannot result in false alarms, and if all responses to targets are based on

recollection, there can be no misses.1

1 The Sources of Activation Confusion (SAC) Model (Reder et al., 2000) is a process model similar to the HTSDT model. This chapter will not discuss process models.

< Insert Figure 2 near here >
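A minimal sketch of the HTSDT predictions in Python (assuming numpy and scipy; the values of R and d' are hypothetical) illustrates how Equation 9 produces the ROC asymmetry:

import numpy as np
from scipy.stats import norm

# Hypothetical HTSDT parameters: recollection probability R and familiarity d'
R, d_prime = 0.3, 1.0
criteria = np.linspace(-1.5, 2.5, 9)             # a sweep of response criteria

F = norm.cdf(-criteria)                          # Equation 1
H = R + (1 - R) * norm.cdf(d_prime - criteria)   # Equation 9

# Even at the most conservative criteria the hit rate stays above R while the
# false alarm rate approaches 0, so the left end of the ROC is pushed upward
print(np.round(np.column_stack([F, H]), 3))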

2.2.2 The continuous dual process signal detection model (CDP)

The idea that some items on a recognition test may be recollected has a long

history (e.g., Mandler, 1980). However, nothing about recollection demands that it

is a high-threshold process. Wixted and Mickes (2010) proposed a dual-process

model in which both recollection and familiarity are based on underlying signal

detection processes, the results of which are summed to yield an old-new response.

For this reason, the CDP model is identical to the UVSDT model for item recognition.

However, the CDP model also allows the two processes to be queried separately.

2.2.3 The mixture signal detection model (MSDT)

DeCarlo (2002, 2003, 2007) proposed an extension of the standard EVSDT



model in which study items are not always fully encoded. On some trials, the item is

fully encoded (resulting in a relatively large increment in memory strength, dFull),

whereas on other trials the study item is only partially encoded (leading to a small

increment in strength, dPartial). The probability of full encoding is given by the

parameter λ, which can be interpreted as a measure of attention. At test, the target

distribution reflects a mixture of responses from these two distributions. This

mixture distribution for the targets is not Gaussian, and its precise form depends on

both λ and the distance between the means of the full- and partially-encoded

distributions. The hit rate is defined as follows:

H = λ ⋅Φ(dFull − k) +(1− λ)⋅Φ(dPartial − k) (10)

The false alarm rate is defined as in Equation 1. Notice the close relationship

between the MSDT and HTSDT models: when dFull is very large (as is often the case

when fit to empirical data), then the first component of the hit rate is essentially just

λ, analogous to the R parameter of the HTSDT model.
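A short Python sketch of Equation 10 (hypothetical parameter values) also illustrates the parallel between the MSDT and HTSDT models when dFull is large:

import numpy as np
from scipy.stats import norm

# Hypothetical MSDT parameters: attention probability lambda and the strength
# increments for fully and partially encoded study items
lam, d_full, d_partial = 0.6, 3.0, 0.5
criteria = np.linspace(-1.0, 3.0, 9)

F = norm.cdf(-criteria)                                # Equation 1
H = (lam * norm.cdf(d_full - criteria)
     + (1 - lam) * norm.cdf(d_partial - criteria))     # Equation 10

# With d_full this large, the first term stays near lam over much of the
# criterion range, so lam behaves much like R in the HTSDT model
print(np.round(np.column_stack([F, H]), 3))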

The decision space assumed by the MSDT model is shown in Figure 3, as well

as several example ROCs and zROCs. Note the unusual form of the zROCs, which will

become important in the discussion of associative and source recognition.

< Insert Figure 3 near here >

2.3 The Double High-Threshold (2HT) Model


The remaining model of recognition memory that has been popular in recent

years is a discrete state model. The double high-threshold (2HT: Snodgrass &

Corwin, 1988) model assumes that memory probes result in different internal states

(see Figure 4). Targets can either be recollected, in which case they are always

judged to be “old,” or they result in a state of uncertainty from which the participant

must guess "old" or "new." Lures can be detected as new, resulting in a "new"

decision, or they result in the same state of uncertainty as un-recollected targets.

The model is called a double high-threshold model because there are two

thresholds: the lures can't cross the recollection threshold and be called "old" from

that state, and the targets can't be detected as new. Memory errors always result

from the uncertain state. In the 2HT model, the hit rate depends on the probability

that targets are recollected (po) and the probability that, if not recollected, they

result in a guess "old" decision (g):

H = po + (1− po )g. (11)

An "old" response to a lure can only occur because of guessing, so the false alarm

rate is determined by the probability that lures fail to be detected as new and the

rate of "old" guessing:

F = (1− pn )g. (12)

In this form, the 2HT model predicts that the ROC is a line with y-intercept equal to po and slope of (1-po)/(1-pn). Different response biases can occur in simple old-new

decision tasks when the guessing rate, g, is varied. Sensible changes in g occur when

different base-rates of targets and lures appear on a memory test, or when


participants are offered different penalties and rewards for specific responses.

However, it’s possible within this model for the g parameter to be set to 0, so that

false alarms are entirely avoided, at the price of an increase in the miss rate.

Similarly, there’s nothing about the model itself that prevents the g parameter from

being set to 1, so that misses are eliminated at the price of an increase in false

alarms.2

< Insert Figure 4 near here >
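Equations 11 and 12 can be sketched the same way (hypothetical detection probabilities; the guessing rate is swept to trace out the predicted binary ROC):

import numpy as np

# Hypothetical 2HT parameters and a range of guessing rates, as might be
# induced by different target/lure base rates or payoffs
p_o, p_n = 0.5, 0.4
g = np.linspace(0.0, 1.0, 6)

H = p_o + (1 - p_o) * g       # Equation 11
F = (1 - p_n) * g             # Equation 12

# All points fall on a line with intercept p_o and slope (1 - p_o)/(1 - p_n);
# g = 0 eliminates false alarms entirely, and g = 1 eliminates misses
print(np.round(np.column_stack([F, H]), 3))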

The 2HT model does not naturally predict confidence ratings, though it can

be extended to do so by adding parameters that map the internal states (recollect,

uncertain, detect-new) to the possible confidence judgments, as in Figure 5. If

recollected targets are always given highest-confidence "old" responses, then the

confidence-based ROC predicted by this modified 2HT model is still linear. However,

the model can accommodate the curved confidence-based ROCs that are observed

empirically, if recollected targets (which logically should receive the highest-

confidence response) are allowed to yield lower-confidence "old" responses (e.g.,

Malmberg, 2002; Bröder & Schütz, 2009). Similarly, the detected lures are allowed

to result in lower-confidence "new" decisions. (Some researchers even allow responses to

detected items to fall in the "opposite" response category, so that recollected targets

may still yield a "new" decision.) This modified model requires more parameters to

describe the mapping from the internal states, but the increase in free parameters

results in much better fits to data (and greater model flexibility, of course).

2 More complex versions of the basic 2HT model have also been proposed (e.g., Brainerd, Reyna, & Mojardin, 1999). Testing these variants requires experimental conditions beyond those considered in this chapter.


< Insert Figure 5 near here >

3 The Evidence

In 1970, Banks expressed the optimistic view that signal detection modeling

would ultimately allow us to identify the number and nature of processes involved

in recognition memory decisions. At the time, the models under consideration

included the SDT and threshold models (including the 2HT model and several other

variants). Those models do make quite distinct predictions about the form of the

ROC (see Pazzaglia, Dube, and Rotello, 2013, for details). Specifically, the 2HT model

can accommodate a curved ROC based on confidence ratings, but it must predict a

linear ROC if participants make binary old-new decisions and the responses that

define the different operating points are collected independently. In contrast, the

SDT model always predicts a curved ROC will result, regardless of whether

confidence ratings or binary old-new decisions are collected. As we’ll see, Banks’s

optimism was reasonably well placed for these models.

With the addition of the hybrid HTSDT and mixture MSDT models, however,

the landscape became more challenging. These models do make predictions that, in

principle, allow them to be distinguished from the others. For example, the zROC is

predicted to be linear for SDT and to have an upward curvature for the HTSDT

model, and the HTSDT model predicts that both the slope of the zROC and its curvature are systematically influenced by the probability of recollection. Despite


these apparently contrasting predictions, the models turn out to fit basic item

recognition data quite well, requiring that the experimental paradigms be expanded.

In section 3.1, I review the relevant data from item recognition experiments before

turning, in section 3.2, to new paradigms designed to influence recollection

probabilities and, in section 3.3, to experiments that rely on differential model

predictions in paradigms that do not yield ROC data. To preview the results of this

literature survey, the UVSDT model comes out ahead on nearly every measure.

However, section 4 will consider some potential challenges to the success of the

UVSDT model.

3.1 Traditional Item Recognition Tasks

Standard item recognition experiments, like Egan's (1958), provide a wealth

of data that can be reasonably well described by all of the models considered here.

In these experiments, participants study a set of items (typically words) one at a

time. At test, they are asked to identify the studied words from a test list that

includes both targets and lures. Subjects' responses may be simple old-new

decisions for each memory probe, but more commonly they are ratings of

confidence that the probe was studied. These ratings are then used to generate

ROCs, a strategy that has revealed a number of "regularities" in the data. For

example, the zROC is typically linear with a slope of about 0.8 as long as memory

accuracy is well above chance (e.g., Ratcliff, Sheu, & Gronlund, 1992; Glanzer, Kim,

Hilford, & Adams, 1999; Yonelinas & Parks, 2007).


3.1.1 Confidence based ROCs

3.1.1.1 Fits of the models to data

Confidence based ROCs are generated by asking participants to rate their

confidence that each memory probe was studied (e.g., 6 = “sure old”, 5 = “probably

old”, 4 = “maybe old”, 3 = “maybe new”, 2 = “probably new”, 1 = “sure new”). The

probability of a hit and a false alarm in each confidence bin is then calculated; these

values are incrementally summed to yield the operating points on the ROC. For

example, the most conservative point, yielding the lowest hit and false alarm rates,

is based on responses in the “sure old” category; the second point on the ROC

depends on the sum of the response probabilities in the “sure old” and “probably

old” bins, etc. The final point on the confidence ROC is always (1,1), when all

responses are included.
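This cumulative construction is easy to express in a short Python sketch (the response counts below are hypothetical):

import numpy as np

# Hypothetical counts in six confidence bins, ordered from "sure old" (6)
# down to "sure new" (1), for 150 targets and 150 lures
target_counts = np.array([60, 30, 20, 15, 10, 15])
lure_counts = np.array([10, 15, 20, 25, 30, 50])

# Cumulative proportions give the operating points, from the most conservative
# ("sure old" only) to the most liberal (all bins, which yields the point (1, 1))
H = np.cumsum(target_counts) / target_counts.sum()
F = np.cumsum(lure_counts) / lure_counts.sum()

print(np.round(np.column_stack([F, H]), 3))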

Pazzaglia et al. (2013) suggested that discriminating the UVSDT and HTSDT

models with ROCs would be extremely difficult because they make very similar

predictions. Indeed, both models have a parameter to summarize old-new

discrimination accuracy (d', d) and a parameter to capture the asymmetry of the

ROC (R, s). Similarly, the MSDT model can be viewed as a version of the HTSDT

model if the full attention trials yield (essentially) perfect encoding. When applied

to data, all of these models fit well. Figure 6 shows the fit of each of these models, as

well as the 2HT model, to the data of a single subject in Egan’s (1958, Exp. 1) study.


As is obvious in the figure, all of the models provide excellent descriptions of these

data; only the EVSDT model can be rejected based on a goodness of fit statistic.

< Insert Figure 6 near here >

High-fidelity fits like those in Figure 6 are routinely observed. For example,

DeCarlo (2002) compared the MSDT and UVSDT models’ ability to fit data, finding

them to be essentially tied. Yonelinas (1999b, p. 514) noted that the HTSDT and

UVSDT “models provided an accurate account of the ROCs, capturing more than

99.9% of the variance of the average ROCs.” For the very same studies, however,

Glanzer et al. (1999) found that the UVSDT model provided a better fit in all 10 data

sets. Glanzer et al. (1999) also tested specific HTSDT predictions about the

relationship between the probability of recollection, the slope of the zROC, and the

curvature of the zROC, finding no support for those predictions. Similarly,

Heathcote (2003) compared the fits of the UVSDT and HTSDT models for a set of

experiments in which the targets and lures were similar to one another. The similarity

manipulation was intended to increase the probability that recollection would play

a role in the recognition of targets (Westerman, 2001), yet the zROCs showed no

evidence of the curvature that is predicted by the HTSDT model. Other comparisons

of the UVSDT and HTSDT models have also favored the signal detection view (e.g.,

Kelley & Wixted, 2001; Healey, Light, & Chung, 2005; Rotello, Macmillan, Hicks, &

Hautus, 2006; Dougal & Rotello, 2007; Kapucu, Rotello, Ready, & Seidl, 2008; Jang,

Wixted, & Huber, 2011).

3.1.1.2 Assessments of model flexibility


The UVSDT, HTSDT, 2HT, and MSDT models all provide reasonable fits to

data, but they do have different assumptions and therefore can’t all provide an

accurate account of the underlying recognition memory processes. As Roberts and

Pashler (2000) pointed out, a good fit to data does not imply that the model itself is

a good one; it might have enough flexibility to fit a wide range of possible data, for

example, including random noise. The flexibility of a model comes from its number

of free parameters and its functional form (Pitt, Myung, & Zhang, 2002). Increasing a

model’s parameters generally increases its ability to fit data, but two models with

the same number of parameters may nonetheless differ in flexibility. For example, a

sinusoidal model y = a sin (bx) can exactly fit any data generated by the linear model

y = cx + d, as well as fitting data that are non-linear; the sinusoidal model has greater

flexibility because of its functional form. For this reason, we must determine the

relative flexibility of the competing models of recognition memory before we can

conclude that the UVSDT model provides the best description of the data.

The number of parameters for each model is shown in Table 1 for a

confidence-rating old-new task. The UVSDT and HTSDT models have the same

number of parameters, whereas the EVSDT model has one fewer, and the MSDT

model one more. The 2HT model has a high degree of flexibility because the state-

response mapping parameters are selected by the experimenter. Wixted (2007)

reported a small-scale model-recovery simulation of the UVSDT and HTSDT models

that concluded that the two models were about equally flexible, a conclusion that

has been modified only slightly in subsequent work.


< Insert Table 1 near here >

Jang et al. (2011) used a parametric bootstrap cross-fitting method (PBCM:

Wagenmakers et al., 2004) to compare the flexibility of the UVSDT and HTSDT

models. The PBCM has several steps. Essentially, model parameters are sampled

from the range that would be estimated from fits to real data, and those parameters

are used to generate simulated data from the models. Then, both models are fit to

the simulated data and the difference in the resulting goodness of fit measures

(DGOF) is computed. This process is repeated many times to generate a distribution

of observed DGOF values when each model generated the data. The degree of

overlap of these distributions is a measure of how well the models mimic each other

(see Wagenmakers et al., 2004; Cohen, Rotello, & Macmillan, 2008, for details). For

group-level data, Jang et al. concluded that models could be readily distinguished,

and that the UVSDT model was very slightly more flexible than the HTSDT model. In

contrast, a similar analysis by Cohen et al. (2008) concluded that the UVSDT model

was slightly less flexible. A related but smaller-scale comparison of a version of the

2HT and UVSDT models concluded that the 2HT model has greater flexibility (Dube,

Rotello, & Heit, 2011).
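The logic of the PBCM can be illustrated with a simplified sketch. For brevity, the sketch below compares the EVSDT and 2HT models on three-point binary ROCs using least-squares fits, rather than the maximum-likelihood fits of the UVSDT and HTSDT models used in the studies just described; the parameter ranges, trial counts, and number of simulated data sets are all illustrative:

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(0)
CRITERIA = np.array([-0.5, 0.3, 1.0])   # three bias conditions (EVSDT generator)
GUESS = np.array([0.2, 0.5, 0.8])       # three guessing rates (2HT generator)
N = 120                                 # trials per condition and item type

def evsdt_sse(d, F, H):
    # squared error in the hit rates, with each criterion matched to the observed F
    return np.sum((H - norm.cdf(d + norm.ppf(F))) ** 2)

def tht_sse(params, F, H):
    po, pn = params
    return np.sum((H - (po + (1 - po) / (1 - pn) * F)) ** 2)

def simulate(generator):
    # sample generating parameters, then add binomial response noise
    if generator == "EVSDT":
        d = rng.uniform(0.5, 2.0)
        F_true, H_true = norm.cdf(-CRITERIA), norm.cdf(d - CRITERIA)
    else:  # "2HT"
        po, pn = rng.uniform(0.3, 0.8, size=2)
        F_true, H_true = (1 - pn) * GUESS, po + (1 - po) * GUESS
    F = np.clip(rng.binomial(N, F_true) / N, 0.5 / N, 1 - 0.5 / N)
    H = np.clip(rng.binomial(N, H_true) / N, 0.5 / N, 1 - 0.5 / N)
    return F, H

def delta_gof(F, H):
    sse_ev = minimize_scalar(evsdt_sse, bounds=(0, 5), args=(F, H), method="bounded").fun
    sse_2ht = minimize(tht_sse, [0.5, 0.5], args=(F, H), bounds=[(0, 0.99), (0, 0.99)]).fun
    return sse_2ht - sse_ev   # positive values mean the EVSDT model fit better

for generator in ("EVSDT", "2HT"):
    dgof = [delta_gof(*simulate(generator)) for _ in range(200)]
    print(generator, "generated: mean DGOF =", round(float(np.mean(dgof)), 4))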

When an individual subject provides the initial data that are used to estimate

the model parameters from which simulated data are generated, the distributions of

DGOF values also provide quantitative information about how diagnostic those data

are. Diagnostic data are those for which the distributions have little to no overlap;

for non-diagnostic data, the overlap may be substantial. The extent of overlap can

be used as a measure of the probability that the wrong model is identified by the


GOF measures. Jang et al. (2011) used this approach to assess the diagnosticity of

97 individual participants’ ROCs, finding that many provided non-diagnostic data.

For those individuals, there were two common findings: the slope of the zROC was

likely to be near 1, and the HTSDT model was more likely to be selected. For

individuals who provided more diagnostic data (the DGOF distributions overlapped

less), the slope of the zROC tended to be shallower, and the UVSDT model was

usually selected. The clear implication of Jang et al.’s work is that the UVSDT model

is the better-fitting model when the data actually allow a strong conclusion to be

drawn.

Overall, the analyses of model flexibility suggest that the success of the

UVSDT model cannot be attributed to greater flexibility. For group-level data, the

UVSDT and HTSDT models have similar degrees of flexibility, and for individual

subjects’ data, the UVSDT model is selected only when the data are actually

informative.

3.1.1.3 Explanations of the zROC slope

There are qualitative reasons to prefer the UVSDT model over the HTSDT

model, as well as the quantitative reasons in section 3.1.1.2. For one, the slope of

the zROC has a natural explanation in terms of variability in the increment to an

item's strength that occurs during study (Wixted, 2007); this idea is reflected in the

assumptions of recent process models of memory (e.g., Shiffrin & Steyvers, 1997).

In contrast, the HTSDT model assumes that the slope of the zROC is directly related


to the probability of recollection, a hypothesis that has not survived inspection (e.g.,

Glanzer et al., 1999).

The variable encoding hypothesis is that during study the strength of the

item is incremented by an amount sampled from a baseline distribution appropriate

for the encoding task, plus an amount sampled at random from a noise distribution

with a mean of zero. Koen and Yonelinas (2010) attempted to discredit this

hypothesis by asking one group of subjects to study items for either a shorter (1 sec)

or longer (4 sec) amount of time, and another group to study the same items for 2.5

seconds each. The slope of the zROC did not differ across groups. However, their

experiment assessed the impact of mixing two distributions on the slope of the

zROC, rather than testing the variable encoding hypothesis (Jang, Mickes, & Wixted,

2012; Starns, Rotello, & Ratcliff, 2012). A mixture of two Gaussian distributions

(one for strong items and another for weak) isn’t Gaussian and thus the slope of the

resulting zROC doesn’t provide a valid estimate of the ratio of lure and target

standard deviations, 1/s. Thus, there is still no definitive test of encoding

variability. Although the Koen and Yonelinas (2010) experiment appears to provide

a good test of the MSDT model, Starns et al. (2012) showed that their manipulation

of encoding strength lacked power.
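The underlying point can be illustrated with a short simulation (Python; the strength values are arbitrary). A mixture of two equal-variance target distributions produces a zROC slope below 1, even though the slope for either component alone would be 1:

import numpy as np
from scipy.stats import norm

# Two equal-variance target components (e.g., weak and strong encoding), mixed
# in equal proportion; lures are N(0, 1). All values are illustrative.
lam, d_weak, d_strong = 0.5, 0.8, 2.0
criteria = np.linspace(-1.0, 3.0, 15)

F = norm.cdf(-criteria)
H = lam * norm.cdf(d_weak - criteria) + (1 - lam) * norm.cdf(d_strong - criteria)

# slope of the best-fitting line through the (slightly curved) zROC
slope = np.polyfit(norm.ppf(F), norm.ppf(H), 1)[0]
print(round(slope, 3))   # below 1, even though each component has unit variance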

One interesting test of the HTSDT, 2HT, and UVSDT accounts of zROC slope is

this: Does the relative variability in the observed confidence ratings for targets and

lures correspond to the slope of the zROC? Mickes, Wixted, & Wais (2007) asked

exactly this question. Participants in a standard old-new recognition experiment

made their responses on either a 20- or 99-point confidence scale. These confidence


ratings were used to generate empirical ROCs, which were well-described by the

UVSDT model. The estimated slope parameter, which equals the ratio of standard

deviations of the lure and target distributions, was about 0.8, as usual. Mickes et al.

also calculated the ratio of the standard deviations of each subject's confidence

ratings to the lures and to the targets, independently of the ROC analysis. The

HTSDT does not predict any relationship between these two ratios, because it

assumes that the zROC slope is determined by the probability of recollection (which

doesn't depend on the confidence ratings). Similarly, the 2HT model does not

predict a relationship between the two measures of variability because the

confidence ratings are not based on memory strength. Even the UVSDT model does

not constrain the two ratios to yield the same value: If the criteria are tightly-spaced

for higher memory strengths and spread out for weaker strengths, then the ratings-

based ratio may underestimate the ROC-based value. On the other hand, the

ratings-based ratio may overestimate the ROC-based value if the criteria are widely-

spaced for high evidence values and compressed for lower evidence values. Overall,

however, the average value of the two estimates of the standard deviation ratios

was identical in one experiment and highly similar in the other. Across participants,

the correlations of the two estimates were .61 and .83 in the two experiments,

providing strong support for the UVSDT model over the competitors.
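The logic of this test can be sketched with a small simulation under the UVSDT model (Python; the parameter value, the 20-point scale, and the roughly even criterion spacing are all illustrative assumptions, and even spacing is one of the cases in which the two estimates should agree):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# UVSDT strengths: lures N(0, 1) and targets N(1, s); a 20-point confidence
# scale defined by 19 roughly evenly spaced criteria (all values illustrative)
s = 1.25
lures, targets = rng.normal(0, 1, 20000), rng.normal(1, s, 20000)
scale = np.linspace(-2.5, 3.5, 19)

def rating(x):
    return np.searchsorted(scale, x) + 1          # confidence ratings 1..20

sd_ratio = np.std(rating(lures)) / np.std(rating(targets))

# zROC slope estimated from the same simulated strengths
crit = np.linspace(-1.5, 2.5, 9)
zF = norm.ppf([np.mean(lures > c) for c in crit])
zH = norm.ppf([np.mean(targets > c) for c in crit])
slope = np.polyfit(zF, zH, 1)[0]

print(round(sd_ratio, 3), round(slope, 3), round(1 / s, 3))   # all close to 0.8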

3.1.1.4 A test of the 2HT model: conditional independence

Like the other models, the 2HT model can fit the confidence-based ROCs (see

Figure 6), at the price of additional parameters to map from internal states to


response ratings (compare Figures 4 and 5). Moreover, to fit the curvature that is

ubiquitous in these confidence ROCs, the 2HT model must assume that participants

give at least some lower-confidence responses to studied items that they have

detected as old (e.g., Erdfelder & Buchner, 1998; Malmberg, 2002; Bröder & Schütz,

2009). Said differently, to fit the confidence-based ROCs, the 2HT model must

assume that participants sometimes give low-confidence responses even in the face

of infallible evidence that the item was studied.

The 2HT model allows for the possibility that items that are encoded with

greater strength may have a higher probability of being detected than more weakly-

encoded items (i.e., the value of po may differ for strong and weak items; see Fig. 5),

but responses from the detect-old state depend only on the arbitrary state-response

mapping parameters (e.g., Klauer & Kellen, 2010; Bröder, Kellen, Schütz, &

Rohrmeier, 2013). The distribution of confidence ratings must be the same for all

detected targets, regardless of their encoding strength, because there is only one

detect-old state and only one set of response probabilities that lead from that state

to the confidence ratings (Fig. 5).3 This aspect of the 2HT model is known as the conditional independence assumption (Province & Rouder, 2012).

3 The overall distribution of confidence ratings for strong and weak targets may vary because they reflect different mixtures of responses based on detection and guessing.

Province and Rouder (2012) tested the conditional independence

assumption of the 2HT model by presenting participants with about 240 unique

study items a variable number of times (1, 2, or 4 times each) on a single list. The

recognition test was a two-alternative forced choice task: a target and a lure were



tested together, and participants were asked to select the target from each pair.

Responses were made on a continuous scale to indicate confidence in the decision

("sure target on left" to "sure target on right"). Province and Rouder also included

some test pairs that contained two lures, forcing participants to guess; responses to

these trials provide important data about how confidence ratings were distributed

from the uncertain state. Across three experiments, they reported that the

distribution of confidence ratings (conditional on a detection-based response) was

independent of encoding strength: the ROCs were curved in all strength conditions

except for the lure-lure trials.

While Province and Rouder’s (2012) data support the 2HT model, Chen,

Starns, and Rotello (2015) also tested the conditional independence assumption,

reaching a different conclusion. Two primary changes were made to Province and

Rouder's approach. First, the memory test used a simple old-new recognition

procedure with confidence ratings. Second, and more importantly, multiple short

study lists were used (14 lists with 42 unique items each) that included a small

number of "super strong" study items. The super strong stimuli were shown four

times each with a different encoding task each time (e.g., rate how easy it is to form

a mental image of this item; rate it for survival relevance). These super strong items

were expected to have a high probability of being detected, and they did for most

participants. For these participants, the super strong items were almost invariably

given the highest-confidence "old" response. For more weakly encoded stimuli

(those studied 1, 2, or 4 times without a specific encoding task), the probability of a

highest-confidence old rating was much lower, even for detected items. Most


participants’ data were better fit by the UVSDT model, because the conditional

independence assumption of the 2HT model was violated: the distribution of

confidence ratings varied with the encoding strength of the targets.

3.1.2 Binary response ROCs

The ROCs described so far were generated from confidence ratings. It is also

possible to generate ROCs from response bias manipulations, such as presenting

different proportions of targets and lures across tests, or by offering different

incentives for “old” or “new” responses; confidence ratings are not collected. In this

second “binary response” type of ROC, the operating points are generated

independently of one another, either in different tests or even from different groups

of participants.

Binary response ROCs are important data that can discriminate signal

detection models from the threshold models (Banks, 1970). As Figures 4 and 5

show, confidence-based ROCs can generally be fit with threshold models by

assuming that there are state-response parameters to generate the probabilities of

the ratings responses (Erdfelder & Buchner, 1998; Malmberg, 2002; Bröder &

Schütz, 2009). In effect, those extra parameters give the 2HT model the flexibility it

needs to fit the curved confidence ROCs that are consistently observed. The story is

different when the ROCs are generated from binary old-new decisions in different

bias conditions, because in that case there are only two response bins (“old,” “new”)

and additional state-response parameters can't be added to redistribute responses


from one category to another. The ROC predicted by the 2HT model in this case is always a line with slope equal to (1-po)/(1-pn), as shown in Figure 4. In contrast, the SDT models make the same prediction of curved ROCs regardless of whether they are

confidence-based or generated from independent bias conditions.

3.1.2.1 Fits of the models to data

Bröder & Schütz (2009) fit all binary response ROCs in the recognition

memory literature, concluding that the 2HT model fit as well as the UVSDT model.

However, their analyses included a large number of ROCs that contained only two

points. As Dube and Rotello (2012) noted, two-point ROCs cannot discriminate the

2HT from UVSDT models because two points can be fit by either a line or a curve.

After excluding those two-point ROCs and running two new experiments, Dube and

Rotello (2012) fit the 2HT and UVSDT models to all available data on binary-

response ROCs reported for individual subjects in the domains of perception and

recognition memory. The resulting binary ROCs were curved, not linear, and

strongly supported the UVSDT model for the vast majority of participants.4 Dube,

Starns, Rotello, and Ratcliff (2012) also reported ROCs based on binary responses.

They included a within-list manipulation of encoding strength (words were studied

once or 5 times each); because a single set of lures was used, the 2HT model is

constrained to a single value of pn. Dube and colleagues tested the UVSDT and 2HT



models, again finding that the ROCs were curved and inconsistent with the 2HT

model. Thus, the model-fitting evidence is strongly in favor of the UVSDT model

(Dube & Rotello, 2012; Dube et al., 2012).

4 Dube and Rotello (2012) based their conclusion on the AIC and BIC fit statistics. Kellen et al. (2013) used normalized maximum likelihood (NML) for model selection, and on that basis concluded in favor of the 2HT model. In the next section, we will evaluate the plausibility of the NML conclusion.

One additional study provides strong evidence in favor of the UVSDT model

and against both the 2HT and HTSDT models. Starns, Ratcliff, and McKoon (2012)

collected both old-new recognition decisions and reaction times in an experiment

that manipulated response bias by varying the proportion of targets on the test. In

addition, participants were asked to respond quickly on some tests, and to

emphasize accuracy on other tests. Fits of the HTSDT model revealed that the

probability of recollection increased with encoding strength (which was

manipulated within-study list), but was not affected by the speed and accuracy

instructions. The absence of a reduction in recollection under speed instructions is

problematic for the HTSDT model because most of those responses were made in

less time than recollection appears to require (e.g., McElree, Jacoby & Dolan, 1999).

The zROCs were also linear in all conditions, with slopes less than 1, contradicting

the predictions of the HTSDT and 2HT models.

Perhaps the most interesting aspect of the Starns et al. (2012) study,

however, is that the diffusion model (Ratcliff, 1978) was fit to the response

probabilities and reaction time distributions simultaneously. The diffusion model is

a sequential sampling model that assumes information accumulates over time

according to an average drift rate that reflects the quality of the evidence for targets

and lures. Across trials, a distribution of drift rates is assumed; the variance of this

distribution can be interpreted as measuring the variability in evidence values for


targets and lures. The ratio of lure to target drift rate standard deviations was less

than one for 9 of the 12 conditions, providing converging evidence for the UVSDT

model.

3.1.2.2 Assessments of model flexibility

As for the confidence rating paradigm, the flexibility of the models to fit the

binary response ROCs should be considered. Dube, Rotello, and Heit (2011)

reported a comparison of the 2HT and UVSDT models as fit to binary ROCs with

three operating points that were either closely spaced or more spread out. They

concluded that the models were difficult to discriminate, especially when the

operating points were close together, but that the 2HT and UVSDT models were

approximately equally flexible. A large-scale evaluation of the flexibility of the

HTSDT, UVSDT, 2HT, and MSDT models was reported by Kellen, Klauer, and Bröder

(2013). They used normalized maximum likelihood (NML; see Myung, Navarro, &

Pitt, 2006) for model selection, and on that basis concluded in favor of the 2HT

model. One challenge for the conclusions based on NML is that they show a strong

preference for the equal-variance (or po = pn) versions of the models that predict

symmetric ROCs. As we've seen, symmetric recognition ROCs are not observed

empirically.

3.1.3 Summary of the item recognition data

Overall, the evidence reviewed so far supports the UVSDT model over the

others. Critical tests of the most popular threshold model, the 2HT model, reveal


deep problems: empirical data violate both the conditional independence

assumption for the state-response mapping parameters (Chen et al., 2015) and the

prediction of linear binary ROCs (Dube & Rotello, 2012; Dube et al., 2012). The

HTSDT model fares somewhat better with these item recognition ROC, but it

predicts a decrease in zROC slopes with increasing probability of recollection that

has not been observed (e.g., Glanzer et al., 1999). The HTSDT model also predicts

that curved zROCs should be observed when recollection is needed to distinguish

targets from lures, as when they are similar to one another. However, that zROC

curvature is typically not observed (e.g., Glanzer et al., 1999; Heathcote, 2003).

Stronger tests of the HTSDT model come in the form of assessments of the

contribution of recollection, which will be considered in the next section.

3.2 Expanding the Data: The Contribution of Recollection

The basic predictions of the HTSDT and UVSDT models have been tested in

numerous item recognition memory experiments that did not specifically

manipulate recollection (see Wixted, 2007; Yonelinas & Parks, 2007, for reviews).

Instead, recollection estimates were based solely on the parameters of the HTSDT

model's best fit to the data. One problem with this approach is that both the UVSDT

and HTSDT models fit the data well, qualitatively and quantitatively (see Figure 6).

Two strategies have been used to distinguish these models. The first approach is to

obtain measures of recollection that are independent of the HTSDT's parameter

estimates (e.g., Yonelinas, 2001), to assess their convergence. The primary such


measure of recollection has come from asking participants to provide remember-

know judgments to supplement their old-new decisions. The second strategy for

expanding the data is to take advantage of empirical tasks that appeared to require

recollection for accurate responses. Three popular tasks are associative recognition

decisions, which require the participant to decide whether two studied items

appeared together on the list, plurality-discrimination, and source memory

judgments that ask the participant to recognize not only that an item was studied

but also to report something specific about that presentation. We'll consider each of

these approaches in turn. Where appropriate, we'll also consider threshold

modeling approaches to these tasks.

3.2.1 Remember-know judgments

Tulving (1985) proposed that we ask participants to report the basis of their

"old" recognition decisions: do they remember something specific about the

encoding experience, or does their memory lack particular details yet they know

that the memory probe was studied? Rajaram (1993) developed extensive

instructions on the remember-know distinction, which have since been used in

hundreds of experiments. For about the first fifteen years of remember-know

research, most experiments focused on identifying variables that dissociated the

remember and know judgments, influencing one type of response without affecting

the other or moving the response probabilities in opposite directions. This goal was

readily achieved, and all possible combinations of influence on remember and know


responses have been obtained (see Gardiner & Richardson-Klavehn, 2000, for a

summary). The conclusion in the literature was that these experiments identified

the variables that selectively influence either recollection or familiarity.

From the perspective of the HTSDT model, these remember-know

dissociation experiments provide a wealth of evidence in favor of the recollection

process (Yonelinas, 2002). Remember hits tend to increase with variables that

increase memory strength (e.g., full rather than divided attention: Yonelinas, 2001),

whereas know responses tend to increase more as a function of superficial

manipulations such as perceptual similarity (e.g., fluency manipulations: Rajaram &

Geraci, 2000). One particular aspect of the data that is convincing to HTSDT

proponents is that false alarms occur primarily with “know” justifications rather

than “remember” responses (Dunn, 2004). This finding is important because a high-

threshold process cannot produce any false alarms: if “remember” responses reflect

recollection, then they must only occur after correct “old” decisions (i.e., hits). In

addition, remember responses given after hits should be high confidence decisions

because the recollection process “trumps” the familiarity process; this assumption

of highest-confidence remembering is often built into the modeling (e.g., Yonelinas

& Jacoby, 1995; Yonelinas, 2001). On the other hand, direct comparisons of

estimates of recollection based on remember responses and on ROC parameters

have not provided a convincing level of agreement (e.g., Rotello, Macmillan, Reeder,

& Wong, 2005).

Other problems for the HTSDT model were soon identified. The dissociation

evidence seems to strongly suggest that there are distinct underlying processes of


recollection and familiarity, but dissociations are weak and frequently inconclusive

evidence for multiple processes (Dunn & Kirsner, 1988). Stronger evidence for the

presence of multiple processes comes from state-trace analysis (Bamber, 1979;

Dunn & Kirsner, 1988). State-trace plots show how performance on one task relates

to performance on another task, as a function of manipulations intended to selectively

influence one of the presumed underlying processes. Monotonic state-trace plots

are consistent with a single underlying process that may have a non-linear

relationship with the levels of the experimental factors. In contrast, non-monotonic

state trace plots imply that more than one underlying process determines

performance. Dunn (2008) applied the logic of state-trace analysis to remember-

know data, finding only monotonic functions; this analysis is consistent with the

UVSDT model and inconsistent with the HTSDT view. A stronger test reached the

same conclusion: Pratte and Rouder (2012) showed that even when a hierarchical

model is applied, eliminating potential confounds that might occur from averaging

data over subjects or trials, the state-trace analysis offers no support for the dual-

process view.

The remember-know data do not demand a dual-process interpretation, and

in fact can be readily accounted for by the UVSDT model. Donaldson (1996) was the

first to suggest this interpretation of remember-know data: he proposed that the

data could be accounted for by a signal detection model with two decision criteria, a

conservative criterion that divides old-remember from old-know responses, and a

more liberal criterion that provides an old-new boundary. Indeed, Dunn (2004)

showed that all of the existing patterns of remember-know dissociations were well-


described by the UVSDT model. Dunn also tackled four other common arguments

against the SDT model of remember-know judgments, showing them all to be faulty.

One of these demonstrations is particularly challenging for the HTSDT model: old-

new discrimination accuracy measured with the remember hit and false alarm rates

is equal to accuracy measured from the overall hit and false alarm rates (see also

Macmillan, Rotello, & Verde, 2005). The UVSDT model predicts this result because

response bias and accuracy are independent, but it is contrary to the assumption

that recollection is a high-accuracy (or high-threshold) process.
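Donaldson's two-criterion account is easy to state in terms of the UVSDT equations. In the minimal Python sketch below (parameter values are illustrative), accuracy computed from the remember-only operating point matches accuracy computed from the overall old-new operating point, because both points lie on the same ROC:

from math import sqrt
from scipy.stats import norm

# UVSDT with two criteria (Donaldson, 1996): strengths above c_old are called
# "old"; strengths above the stricter c_rem also receive a "remember" judgment.
# Parameter values are illustrative.
d, s = 1.5, 1.25
c_old, c_rem = 0.5, 1.5

def da(H, F):
    zH, zF = norm.ppf(H), norm.ppf(F)
    return sqrt(2 / (1 + s ** 2)) * (s * zH - zF)   # Equation 7

H_all, F_all = norm.cdf((d - c_old) / s), norm.cdf(-c_old)
H_rem, F_rem = norm.cdf((d - c_rem) / s), norm.cdf(-c_rem)

# accuracy from the remember-only operating point equals overall old-new accuracy
print(round(da(H_all, F_all), 3), round(da(H_rem, F_rem), 3))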

The competing interpretations of remember-know judgments have been

extended to account for the inclusion of confidence ratings in several different

experimental paradigms (Rotello & Macmillan, 2006; Rotello et al., 2006). For

example, participants might be asked to first decide if they remember a memory

probe, and if not, to rate their confidence that they know they studied it or that it's a

lure (Yonelinas & Jacoby, 1995). Alternatively, subjects might be asked to decide

among three response alternatives (remember, know, new) and then to rate their

confidence in that decision (Rotello & Macmillan, 2006), or they might first make an

old-new decision and then rate their confidence along a remember-know dimension

for items judged to be old. Across a range of tasks and corresponding model

versions, the one-dimensional UVSDT model consistently provides the best

quantitative fit to data (e.g., Rotello & Macmillan, 2006; Rotello et al., 2006). This

general finding accords well with the observation that remember responses are

easily influenced by manipulations intended to affect only old-new response bias

(Rotello et al., 2006; Dougal & Rotello, 2007; Kapucu et al., 2008), and with the


observation that ROCs based only on remember responses are strongly curved

(Slotnick, Jeye, & Dodson, 2016). Importantly, Cohen et al. (2008) showed that these

conclusions do not reflect differences in model complexity: the UVSDT model of

remember-know judgments is somewhat less flexible than the HTSDT model.

Finally, one other type of evidence argues in favor of the UVSDT

interpretation of remember-know judgments. Reaction times for remember

responses have long been known to be shorter than those for know decisions

(Dewhurst & Conway, 1994; Wixted & Stretch, 2004; Dewhurst, Holmes, Brandt, &

Dean, 2006); on the surface, this effect suggests that remember and know responses

reflect distinct processes. However, higher confidence decisions are also made

more quickly than lower confidence responses (Petrusic & Baranski, 2003), and the

probability of a remember response is correlated with response confidence (Rotello,

Macmillan, & Reeder, 2004). When confidence is controlled, Rotello and Zeng (2008)

found that the reaction time distributions for remember and know responses do not

differ significantly. In addition, Wixted and Mickes (2010) found that remember

false alarms are made more quickly than either know hits or know false alarms,

consistent with the UVSDT model.

In summary, all of the observed remember-know data, from basic

dissociation effects (Dunn, 2004) to reaction times (Rotello & Zeng, 2008; Wixted &

Mickes, 2010) and confidence ratings (e.g., Dougal & Rotello, 2007; Slotnick et al.,

2016) can be accounted for with the UVSDT model. The success of the UVSDT model

occurs in spite of its slightly lower flexibility than the HTSDT model (Cohen et al.,

2008). Finally, state-trace analyses of remember-know data conclude that a single


process is sufficient (Dunn, 2008; Pratt & Rouder, 2012). There is little reason to

believe that remember responses reflect threshold-based recollection; they are

easily influenced by manipulations of old-new response bias (Rotello et al., 2005).

3.2.2 Associative recognition and plurality discrimination tasks

A better way of assessing the role of recollection in recognition decisions is to

design tasks in which an accurate response requires recollection. One candidate

task is associative recognition. Participants study pairs of items (A-B, C-D), usually

words, and are asked to remember them together. At test, they must select the

intact pairs that appear exactly as studied (A-B) while rejecting those that are

completely new (X-Y). The interesting challenge presented to participants is that

some of the lures are rearranged pairs (C-B) in which both words were studied, but

with different partners. The assumption is that rejection of the rearranged pairs

requires more than an assessment of familiarity. Because both of the words were

studied, correct decisions about rearranged pairs require recollection of the

specific studied combinations. A closely related argument has been made about

plurality discrimination (e.g., Hintzman & Curran, 1994), a task in which participants

study nouns in their singular or plural form (trucks, frog), and then must recognize

targets presented in their studied form (trucks) amid plurality-changed (frogs) and

completely new lures.

Early evidence consistent with the dual-process view of associative


recognition comes from response signal experiments in which participants are

asked to make their recognition judgments immediately when a response signal occurs, after an unknown, variable amount of time. On some trials, the response signal occurs very soon after the

memory probe is presented (i.e., within 50-100 ms of probe onset), allowing only a

small amount of processing time prior to decision, whereas on other trials

processing may complete because the signal occurs after a long lag (2000 or more

ms after probe onset). The signal lag varies randomly, so that participants cannot

anticipate the amount of decision time that will be available on any given test trial.

Response signal data from associative recognition paradigms are suggestive of

multiple processes with different time courses: for about the first 600 or 700 ms of

processing time, "old" responses to both intact and rearranged pairs increase, as if

familiarity for those memory probes develops over time. After that point, however,

additional processing time yields a decrease in "old" responses to rearranged pairs,

as if the results of a recollection process ("recall-to-reject") begin to contribute to

the decision (e.g., Gronlund & Ratcliff, 1989; Rotello & Heit, 2000).

Plurality discrimination response-signal experiments have revealed the same

type of non-monotonic responses to plurality-changed lures as a function of

processing time (Hintzman & Curran, 1994). However, Rotello and Heit (1999)

suggested that dynamic response bias changes may be sufficient to explain those

data. A recollection process is not required by the data because the false alarm rate

to the completely new lures decreases in the same way as the false alarm rate to the

plurality-changed lures, consistent with an increasingly conservative response bias

as processing time elapses. A useful recollection process must do more than mimic a


familiarity process; it should dominate the decision outcome.

ROCs have also been used to assess whether a high-threshold recollection

process contributes to associative recognition or plurality discrimination, with

somewhat mixed conclusions. If recollection is required to discriminate intact from

rearranged test probes, and if recollection is a high-threshold process as the HTSDT

model assumes, then a linear ROC should result if “old” responses to intact pairs are

plotted against “old” responses to rearranged pairs. Because recollection should be

more likely when items are strongly encoded, a clear prediction of the HTSDT model

is that associative ROCs should be increasingly linear with greater memory strength.

The first reported associative recognition ROCs were linear (Yonelinas, 1997;

Yonelinas, Kroll, Dobbins, & Soltani, 1999, upside-down faces; Rotello, Macmillan, &

Van Tassel, 2000) but virtually all subsequently reported ROCs have been curved

(Yonelinas et al., 1999, right-side up faces; Kelley & Wixted, 2001; Verde & Rotello,

2004; Healy et al., 2005; Quamme, Yonelinas, & Norman, 2007; Voskuilen & Ratcliff,

2016; for an exception, see Bastin et al., 2013). In addition, the degree of curvature

of the associative recognition ROC increases with memory strength (Kelley &

Wixted, 2001; Mickes, Johnson, & Wixted, 2010; see also Quamme et al., 2007), in

contrast to the most natural prediction of the HTSDT model. Macho (2004) showed

that the HTSDT model could fit these associative ROCs, but only by adopting

implausible parameter values such as greater recollection for the more weakly

encoded items.

A simpler and briefer, but otherwise similar, history exists for plurality-

discrimination ROCs. The HTSDT predictions about the form of the plurality-change


ROC are analogous to its predictions for associative recognition: target-similar

ROCs based on responses to targets and plurality-changed lures should be linear,

especially as memory strength increases. Rotello (2000), Rotello et al. (2000), and

Arndt and Reder (2002) reported that target-similar ROCs in the plurality-

discrimination task were linear, but those reported by Heathcote, Raymond, and

Dunn (2006) are curved. The ROCs from these experiments are actually quite

similar looking, despite the different conclusions that were reached (see Kapucu,

Macmillan, & Rotello, 2010, Figure 1). Recently, Slotnick et al. (2016) reported

strongly curved plurality-discrimination ROCs, even when they were based only on

trials for which a "remember" response was given.

There is overwhelming evidence for curved ROCs in both associative

recognition and plurality-discrimination tasks when the study items are well-

learned, which is clearly a challenge for the HTSDT model. On the other hand, for

more weakly encoded items, detailed quantitative fits of the data also reveal

systematic deviations from the predictions of both the UVSDT and HTSDT models.

The ROCs for these more poorly learned items are both more linear than the UVSDT

model predicts and more curved than HTSDT expects, leading to curvilinear zROCs

(e.g., DeCarlo, 2007).

A number of explanations have been offered for these systematic distortions

in the ROCs. One account assumes a change in the decision criteria (Hautus,

Macmillan, & Rotello, 2008; Starns, Pazzaglia, Rotello, Hautus, & Macmillan, 2013),

as will be described in detail in the section on source recognition. The other

accounts all rely on changes to the effective target distribution as a consequence of


mixing trials sampled from a fully-encoded distribution (e.g., item and associative

information available) and trials sampled from a distribution that assumes the study

items were only partially encoded (DeCarlo, 2002, 2003, 2007). According to the

HTSDT model, of course, we could call the fully-encoded distribution "recollection"

and the item-information distributions "familiarity" (Yonelinas, 1994, 1997, 1999a;

see section 2.2.3). Other accounts assume that the associative information is

continuously-valued (Kelley & Wixted, 2001; DeCarlo, 2002, 2003; Greve,

Donaldson, & van Rossum, 2010; Mickes, Johnson, & Wixted, 2010). Mixing

responses from multiple distributions (those with and without associative

information) changes the distribution of evidence associated with a target from its

presumed Gaussian form to something that is non-Gaussian, thus changing the

predicted form of the ROC (see Figure 3). All of these mixture models can generate

ROCs that have a characteristic "flattened" shape for weaker items. For stronger

items, the probability that associative information is unavailable is greatly reduced,

so the mixture distribution predominantly reflects the fully-encoded distribution.

Thus, strong items yield ROCs that are well described by the standard UVSDT model.
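The flattening these mixture accounts produce is easy to see in a short sketch. The code below uses arbitrary illustrative parameter values (not fits to any experiment) to compute ROC points for a DeCarlo-style mixture in which a proportion λ of targets comes from the fully-encoded distribution and the remainder is distributed like lures; with a small λ the ROC flattens, and as λ approaches 1 the ordinary UVSDT ROC is recovered.

```python
# Sketch of a mixture-SDT (MSDT) ROC: targets are fully encoded with probability lam,
# otherwise they are indistinguishable from lures. Parameter values are illustrative.
import numpy as np
from scipy.stats import norm

d_prime, sigma_t = 2.0, 1.25            # fully-encoded target distribution
criteria = np.linspace(-1.0, 3.0, 9)    # a range of confidence criteria

def roc_points(lam):
    false_alarms = norm.sf(criteria)                       # lures ~ N(0, 1)
    hits = (lam * norm.sf((criteria - d_prime) / sigma_t)  # encoded targets
            + (1 - lam) * norm.sf(criteria))               # unencoded targets ~ lures
    return false_alarms, hits

for lam in (0.5, 0.9):   # weakly vs. strongly encoded study conditions
    fa, hit = roc_points(lam)
    print(f"lambda = {lam}:")
    for f, h in zip(fa, hit):
        print(f"  FA = {f:.3f}  HR = {h:.3f}")
```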

Unusual ROC shapes can also be observed if participants make random

responses on some proportion of trials, effectively shifting probability mass from one

part of the target distribution to another (Ratcliff, McKoon, & Tindall, 1994). The

impact of this random guessing on the exact shape of the ROC varies as a function of

how those guesses are distributed over confidence ratings (Malmberg & Xu, 2006).

Harlow and Donaldson (2012) provided a recent empirical demonstration of this

consequence of guessing. In a clever associative recognition experiment,


participants viewed a series of study words that were paired with a specific location

indicated on a circular surround. (Words and locations did not appear on the screen

at the same time.) In a subsequent memory test, participants were shown a studied

word and asked to click the corresponding location on the test circle, then to rate their

confidence in their response. The distribution of memory errors was best described

as a mixture of pure guesses (randomly spread around the circle, 30-40% of trials,

depending on test lag) and correct responses (with a certain spread about the true

location, about 10°, due to memory-based loss of precision). The corresponding

ROCs were flatter than the UVSDT model predicts, consistent with the presence of

those random guesses.
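A small simulation conveys what such a guessing mixture looks like. The sketch below draws location errors from a mixture of uniform guesses and normally distributed memory-based responses, using values roughly in the range Harlow and Donaldson reported (about a 35% guess rate and 10° of memory noise); the exact numbers are assumptions chosen for illustration.

```python
# Sketch: location-memory errors as a mixture of uniform guesses and
# precision-limited memory responses (values approximate, for illustration only).
import numpy as np

rng = np.random.default_rng(1)
n_trials = 10_000
p_guess = 0.35         # proportion of trials with no usable memory
memory_sd = 10.0       # degrees of error when memory is available

guessing = rng.random(n_trials) < p_guess
errors = np.where(guessing,
                  rng.uniform(-180.0, 180.0, n_trials),   # pure guesses
                  rng.normal(0.0, memory_sd, n_trials))   # memory-based responses

# The guesses put substantial mass far from the studied location,
# which is what flattens the corresponding ROC relative to UVSDT.
print(round(np.mean(np.abs(errors) > 45), 3))   # roughly a quarter miss by > 45 degrees
```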

In summary, the evidence from the associative recognition and plurality-

discrimination tasks is somewhat mixed. The ROCs are increasingly curved and

consistent with UVSDT as memory strength increases (Mickes et al., 2010), and

ROCs based only on remember responses are also curved and inconsistent with the

HTSDT interpretation (Slotnick et al., 2016). However, the flattened ROCs that are

consistently observed for more weakly encoded items suggest that the responses

may reflect a mixture of trials for which the associative detail is and is not available

for report (e.g., DeCarlo, 2002; Harlow & Donaldson, 2012). The idea that some

responses are pure guesses is more consistent with a threshold view than a signal

detection process (see Figure 4). We will revisit this issue after reviewing the data

from another task designed to require recollection, namely source recognition tasks.

3.2.3 Source recognition


Another common task that appears to require recollection is a source

memory task. In this paradigm, participants are asked to judge the context (or

source) in which a memory probe was presented. For example, was it shown in

green or in red? Heard in a woman’s voice or a man’s? The simplest SDT model of

source recognition replaces the target distribution in Figure 1 with a target

source (say, male voice) and the lure distribution with the alternative source

(female voice). The simplest HTSDT model assumes that correct source

identification requires recollection; thus the source ROC, which plots correct source

identifications against errors, is assumed to be linear. If the two sources are equally

strong, then the HTSDT model further assumes that the ROC will be symmetric

because p_o = p_n; otherwise it will be asymmetric (with p_o > p_n on the assumption that

the target source is the stronger of the two: Yonelinas, 1999a). The simple SDT

model predicts a curved ROC, as usual.

3.2.3.1 Source recognition ROCs

As in the associative recognition and plurality-discrimination literatures, the

earliest reported source ROCs supported the HTSDT model. Yonelinas (1999a)

reported three experiments in which linear source ROCs were observed. In two

experiments, study items were presented in the two sources in a random order on

the same study list, and the resulting source ROCs were symmetric and linear. In

the remaining experiments, study items appeared on two lists, which served as the


sources. When the lists were presented during the same experimental session, the

source ROC was linear and slightly asymmetric, but when the study lists were

separated by five days, the source ROC was curved and more strongly asymmetric.

Also echoing the associative recognition and plurality-discrimination literatures, all

subsequently published source ROCs have been curved and inconsistent with the

HTSDT model (e.g., Slotnick et al., 2000; Qin et al., 2001; Hilford, Glanzer, Kim &

DeCarlo, 2002; Dodson, Bawa, & Slotnick, 2007; Onyper, Zhang, & Howard, 2010;

Slotnick, 2010; Parks, Murray, Elfman, & Yonelinas, 2011; Schütz & Bröder, 2011;

Starns et al., 2013; Starns & Ksander, 2016).

Important insight on the discrepancy between the linear and curved source

ROCs comes from Slotnick et al. (2000). Like Yonelinas (1999a, Exps. 2 & 3), they

asked their participants to make both old-new confidence ratings and source

confidence ratings (“sure source A” to “sure source B”) for every memory probe.

Whereas Yonelinas ignored the old-new ratings when plotting his source ROCs,

Slotnick et al. took advantage of them. They assessed the form of the overall source

ROC (exactly as in Yonelinas, 1999a) and the "refined" source ROC that results from

including only items for which participants had made the highest-confidence old

decisions. Slotnick et al. reasoned that if a high-threshold process contributes to

responses, then that process should be reflected in both the old-new decisions and

in the subsequent source judgments on the very same items. According to the

HTSDT model, as well as the 2HT model of source memory (Bayen, Murnane, &

Erdfelder, 1996), the refined source ROC should be linear. However, as Slotnick et

al. (2000, see Mickes, Wais, & Wixted, 2009, for a related argument) showed, the


refined source ROC is actually strongly curved and reasonably consistent with the

EVSDT model.5 A re-analysis of the Yonelinas (1999a) data shows that the source

ROC becomes more linear - not more curved - as trials are included for which lower-

confidence-old (or even "new") decisions are made (Slotnick & Dodson, 2005). This

result is perfectly sensible: as responses to weaker memory items are added to the

ROC, discrimination is reduced towards chance levels, and the chance-level ROC is a

line (see Figure 1).

In a novel defense of the HTSDT model, Parks and Yonelinas (2007) argued

that the curved source ROCs were the result of decisions that were based on

"unitized" familiarity rather than recollection. In effect, this argument assumes that

item-source pairs (or item-item pairs in an associative recognition task) are so well-

encoded that they become a single highly-familiar unit. If that were true, then those

refined source judgments should reflect know responses in a remember-know task;

source ROCs based on remember decisions should be linear because remember

responses are a hallmark of recollection (Yonelinas, 2002). Parks and Yonelinas’s

(2007) claim was tested by Slotnick (2010), who reported source ROCs conditional

on a "remember" response. These conditional ROCs are strongly curved and well

described by the UVSDT model. The same conclusion was reached by Mickes et al.

(2010) in a similar analysis of associative recognition ROCs.

5Slotnick et al. (2000) claimed that the 2HT model could not fit their confidence-based source ROCs. As described earlier, only binary-response ROCs must be linear according to threshold models, so the Slotnick et al. data are not decisive against the 2HT model. As it turns out, neither are the binary-response source ROCs: Schütz and Bröder (2011) presented binary-response source ROCs from five experiments. Although they claimed the ROCs were linear, comparative model fitting of the 2HT and SDT models was inconclusive (Pazzaglia et al., 2013).


Consideration of the refined ROCs led to efforts to model the full set of old-

new and source memory confidence ratings simultaneously. This effort expands on

Banks (2000), who was the first to show that old-new recognition and binary source

judgments could be modeled within a single two-dimensional decision space. In this

space, one dimension defines the information that distinguishes targets from lures,

and the other dimension defines the information that distinguishes the two sources.

As one might expect, there is now a complete model of recognition and source

memory in each model flavor: a bivariate signal detection model (Banks, 2000;

DeCarlo, 2003; Glanzer, Hilford, & Kim, 2004; Hautus et al. 2008), a discrete-state

model that assumes both recognition and source judgments are 2HT processes

(Klauer & Kellen, 2010), and a hybrid model that assumes a high-threshold

recollection process and continuous familiarity (Onyper, Zhang, & Howard, 2010).

Representations of the SDT and hybrid models are shown schematically in Figures 7

and 8.

< Insert Figures 7 and 8 near here >

A careful comparison of these models on the same data sets (e.g., Yonelinas,

1999a; Slotnick et al., 2000) concluded that the existing data were not powerful

enough to allow selection of the best model (Klauer & Kellen, 2010). Despite being

unable to identify a "winner," some conclusions about the plausibility of the models

may be drawn (see also Pazzaglia et al., 2013). First, the threshold model of Klauer

and Kellen (2010) faces the same challenges as the simpler threshold models

discussed earlier, including the observed curvature of binary recognition ROCs

(Dube & Rotello, 2012; Dube et al., 2012) and the violation of the conditional


independence assumption that is central to this class of models (Chen et al., 2015).

A broader criticism of the threshold approach is its tremendous flexibility: different

parameter settings can yield ROC-forms that have never been observed empirically.

Similarly, the hybrid source model of Onyper et al. (2010) inherits the challenges of

the HTSDT model.

3.2.3.2 Source decisions to missed targets

There are a few additional arguments that discriminate the SDT approach

from the HTSDT model for source memory. The first point is quite simple: because

both old-new and source judgments are based on continuous information according

to the SDT view (e.g., Banks, 2000; DeCarlo, 2003; Hautus et al., 2008), participants

who set a conservative criterion and respond "new" to a studied item should

nonetheless be able to make a source judgment for that item with accuracy that is

above chance. One easy way to understand this prediction is to consider Slotnick

and Dodson's (2005) refined source ROCs. A conservative old-new criterion could

be placed similarly to the highest-confidence old criterion. As Slotnick and Dodson

showed, source ROCs conditioned on somewhat lower-confidence responses still

discriminated the two sources. In contrast, threshold models of source judgment

predict that "new" decisions are based on guessing, and thus memory for those

items will contain no source details.

Starns, Hicks, Brown, & Martin (2008) tested this prediction in three

experiments. To induce a conservative or liberal old-new bias, they manipulated

participants’ expectations about the proportion of targets and lures on the


recognition test. Participants were then asked for both old-new judgments and, for

studied words, for source decisions. Participants in the conservative conditions (but

not those in the liberal conditions) were able to discriminate the source of studied

words they had missed on the test, exactly as the SDT model predicts. This result

was challenged by Malejka and Bröder (2016), who argued that asking for source

judgments only for studied items provided some feedback to participants about the

accuracy of their old-new response, which may have caused them to re-evaluate

memory on those trials. They re-ran Starns et al.'s (2008) experiments, asking for

source judgments for all test items and finding no difference in source accuracy as a

function of bias. However, their bias manipulation was less effective than that of

Starns et al. (2008); in particular, their conservative condition was not nearly as

conservative, which may account for the reduced source accuracy.

3.2.3.3 zROC slopes and curvature are affected by decision processes

A third argument in favor of the SDT models over the HTSDT model stems

from a prediction about how the slope of the source zROC is influenced by the

relative strengths of the two sources (S1 and S2). If items appearing in each source

are studied the same number of times, then the source information should be

equally easy (or hard) to recollect. In other words, R_S1 = R_S2 and the slope of the

zROC is 1. On the other hand, suppose that the items studied in one source (say, S2)

are strengthened relative to those in the other source (S1), perhaps by presenting

the items in S2 twice each and the items in S1 only once each. In that case, the


HTSDT model predicts that recollection should be increased for the stronger source,

so R_S2 > R_S1. The slope of the zROC will then depend on which source is selected to

be the “target” source that defines the y-axis. If S2 is the target source, then the

slope of the zROC will be less than 1, and if S1 is the target source, the slope of the

zROC will be greater than 1. Starns et al. (2013) called this experimental design

“unbalanced” because the source strengths are not equal; they confirmed that the

source slope depends on which source is the target.

An interesting test of the HTSDT model comes from experiments with

balanced but variable source strengths. In this design, both S1 and S2 items are

studied an equal number of times, but for half of the items in each source the

encoding is weak (one study exposure) and for the remaining items the encoding is

stronger (two study exposures). Because the source strengths are equal overall, the

HTSDT model predicts the source zROC slope will be 1. The predictions of the SDT

models are different. According to SDT, the optimal decision bounds for the source

decision are likelihood-based (these are shown in Figure 7 for a specific data set).

These curved decision bounds naturally capture the intuition that participants

should be unwilling to make high-confidence source decisions for weakly-encoded

items that they have little confidence were studied at all. For stronger items,

confidence in the old decision is higher, and the probability of a higher-confidence

source decision increases; because item and source strengths are correlated, these

higher-confidence source responses are also likely to be accurate. Starns et al.

(2013) showed that the SDT model predicts the source zROC slopes in the balanced

design will differ as a function of whether the stronger or weaker item source serves


as the target source, exactly like the HTSDT model’s predictions for the unbalanced

design. In three experiments, the balanced design produced “crossed” source zROC

slopes as predicted by the SDT model but not by the HTSDT model.

Starns and Ksander (2016) showed that the source zROC slope effect

depends on item strength, not source strength. Thus, the slope effect must be due to

the decision process rather than the underlying evidence distributions. Starns and

Ksander (2016) had participants study words paired with a male or female face, or

with a picture of a bird or a fish. In the no-repetition condition, each item-face

combination was studied once. In the same-source repetition condition, item-face

pairs were studied three times each. And in the different-source repetition

condition, each word was studied twice with either a bird or a fish image, and then

once with a male or female face. Although male-female source accuracy was lower

in the different-source repetition condition than in the no-repetition condition, high-

confidence male-female source judgments were more frequent. In addition, the

zROC slopes crossed exactly as in Starns et al. (2013). Both of these effects are

consistent with the predictions of the bivariate SDT model with likelihood-type

decision bounds (e.g., Hautus et al., 2008).6

In the associative recognition literature, the appearance of relatively

flattened ROCs and curved zROCs has been interpreted in terms of mixtures of trials

drawn from distributions that do and don’t contain associative information. The

6 When fit to data, the threshold (Klauer & Kellen, 2010) and hybrid (Onyper et al., 2010) bivariate models of item and source memory also yield parameters consistent with the idea that participants are reluctant to give high-confidence source judgments to items they do not remember well (see Figure 8). For these models, however, the parameters are arbitrary; unlike the bivariate SDT model (e.g., Hautus et al., 2008), nothing about the structure of the threshold and hybrid models dictates the likelihood-type decision bounds.


bivariate SDT models of source memory suggest an alternative explanation that

rests in the nature of the decision bounds rather than the evidence distributions.

Specifically, the convergence of the decision bounds in the bivariate SDT model (Fig.

7) that predicts the observed source zROC slopes (Starns et al., 2013; Starns &

Ksander, 2016) also accounts for the relatively flattened source ROC shape and the

presence of curvature in the zROC (Hautus et al., 2008). As Starns, Rotello, and

Hautus (2014) explained, that curvature occurs because strongly encoded items

tend to receive both high-confidence old and high-confidence (and correct) source

decisions, whereas more weakly encoded items tend to be assigned lower-

confidence responses on both scales. This means that the end points of the source

ROC tend to be based on responses to more well-learned items (with higher source

accuracy), and the mid-points of the ROC tend to be based on responses to more

poorly-learned stimuli (with source accuracy closer to chance); a flattened ROC and

curved zROC are the result.

3.2.3.4 Source memory provides some evidence for (continuous) recollection

The data discussed so far have consistently supported the UVSDT model over

its competitors. Particularly problematic for the HTSDT model has been its

assumption of a threshold recollection process. As Wixted and Mickes (2010)

pointed out, however, there is no reason to assume that a recollection process has

threshold characteristics. They suggested the alternative view that recollection

operates as a continuous, signal detection process, the result of which is usually


summed with the result of the familiarity process. Together, these two processes

are usually completely consistent with the UVSDT model; Wixted and Mickes

termed this model the continuous dual process (CDP) model (see section 2.2.2).

There are some limited circumstances in which the UVSDT and CDP models

may be distinguished. Wixted and Mickes (2010) partitioned the highest-confidence

“old” decisions that were associated with remember and know responses, and then

separately calculated recognition and source memory accuracy for those two types

of subjective report. Source accuracy was higher after remember than know

responses, even though overall recognition accuracy was equated. These data

provide some evidence that remember decisions may reflect recollection after all,

albeit in a continuous form. That basic effect was replicated and strengthened by

Ingram, Mickes, and Wixted (2012), who reported that source accuracy was higher

after lower-confidence remember judgments than after high confidence know

decisions. Even more convincing are the state-trace analyses for all of these

experiments, which were non-monotonic and consistent with the contribution of

more than one process to these recognition and source decisions (Dunn & Kirsner,

1988).

3.2.3.5 Summary of the source recognition data

Overall, the source recognition data are more consistent with the UVSDT

approach than with either the HTSDT or 2HT models. The slopes of the source

zROCs are systematically affected by item strength and the decision bounds in the


bivariate UVSDT model (Starns et al., 2013; Hautus et al., 2008). The flattened ROC

shapes (and slightly curved zROCs) are also explained by the decision bounds in the

bivariate model, or by a mixture process as assumed by the MSDT model (DeCarlo,

2003).

3.3 Expanding the Data: Beyond ROCs

The data discussed so far have largely come from ROC experiments, and have

provided evidence in favor of the UVSDT model over the others. A powerful

advantage of signal detection models is that they not only separate response bias

from decision accuracy within a task and show how different types of errors trade

off against one another, they also make specific predictions about how accuracy

should compare across tasks. Comparing model parameters across different

empirical paradigms provides converging evidence on the quality of a model, in the

form of a test of its generalization ability. In this section, data from several different

paradigms will be considered, including the commonly-used two-alternative forced

choice (2AFC) task. We’ll also consider two less familiar paradigms: the oddity task,

in which participants see three test items (2 lures and a target or 2 targets and a

lure) and must choose the “odd one out,” and the so-called second choice paradigm

in which participants are given four memory probes (3 of which are lures) and they

get two chances to identify the target. Finally, data from some recent experimental

tests of the models under “minimal assumptions” will be discussed.


3.3.1 Two-alternative forced choice recognition

3.3.1.1 Accuracy comparison with old-new recognition

In a 2AFC recognition task, participants are presented with a study list and

then face a test on which the memory probes are presented in pairs. Typically, only

one of the test items was studied, and the participants’ task is to select that item, the

target. This task can be accomplished by comparing the memory strengths of the

two stimuli, selecting the one with greater strength as the “old” member of the pair.

The one-dimensional model in Figure 1 serves as our theoretical starting point. The

target comes from a Normal distribution that has a mean of d'_YesNo and a standard deviation of 1; the lure comes from a Normal distribution with a mean of 0 and a standard deviation of 1. The difference between the two strengths, target-lure or lure-target, is also normally distributed with a mean of d'_YesNo (or -d'_YesNo) and a standard deviation of √2. The participants' task is to discriminate the target-lure pairs from the lure-target pairs; Figure 9 shows that their overall accuracy is predicted to be

d'_2AFC = 2d'_YesNo / √2 = √2 · d'_YesNo     (13)

In other words, SDT tells us that performance in the 2AFC task should be easier than in an old-new recognition task, by a factor of √2: a d' score of 1.5 in a 2AFC task is equivalent to a d'_YesNo of 1.06 = 1.5/√2.
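A brief worked version of Equation 13, using the numbers from the preceding sentence (and assuming an unbiased 2AFC chooser under the equal-variance form of the model), is shown below.

```python
# Sketch: converting accuracy between yes-no and 2AFC recognition via Equation 13.
from math import sqrt
from scipy.stats import norm

d_yesno = 1.06
d_2afc = sqrt(2) * d_yesno                  # Equation 13: ~1.5
pc_2afc = norm.cdf(d_2afc / 2)              # percent correct for an unbiased 2AFC chooser
print(round(d_2afc, 2), round(pc_2afc, 3))  # 1.5, ~0.77
```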

< Insert Figure 9 near here >


The theoretical relationship between old-new recognition and 2AFC

performance is clearly established by SDT. Early tests of this prediction, using

auditory and visual detection tasks, were successful (see Green & Swets, 1966, for a

summary). In the domain of recognition memory, the assessment of the SDT

prediction has run in parallel with an assessment of competing models such as the

HTSDT and MSDT models.

Kroll, Yonelinas, Dobbins, and Frederick (2002) used old-new recognition

responses to estimate the parameters of the HTSDT and EVSDT models. Those

parameters were then used to predict the percentage of correct responses on a

2AFC task with the same materials. Kroll et al. (2002) concluded that the HTSDT

model more accurately predicted 2AFC performance than the EVSDT model.

However, as Smith and Duncan (2004) noted, there are two major problems with

that analysis. First, the UVSDT model is more appropriate than its equal-variance

cousin for recognition memory tasks (see section 3.1), and second, a stronger test of

the models focuses on whether their parameters are consistent across tasks. Many

different combinations of parameter values in the old-new recognition model can

result in the same value of percent correct on the 2AFC test, making percent correct

a weak target for model assessment.

Jang, Wixted, and Huber (2009) had participants complete both a confidence-

rating item recognition task and a 2AFC recognition task with confidence ratings;

the two types of trials were randomly intermixed on the test. The resulting old-new

recognition ROC was curved with a zROC slope of 0.7, whereas the 2AFC ROC was

curved and symmetric. Three different models (UVSDT, HTSDT, and MSDT) were fit


to both ROCs simultaneously, so that the same parameters were required to fit both

tasks. For example, the UVSDT model was constrained to have a single discrimination

parameter that was related across tasks as described by Equation 13. For the

HTSDT model, the recollection parameter was set to be constant across tasks;

because the familiarity process operates as a signal detection model, the d’

parameter in each task was defined by Equation 13. Finally, the signal detection

component of the MSDT model was also assumed to comply with Equation 13, and

the attention parameter was fixed to be equal in both old-new and 2AFC

recognition. In both a new experiment and a reanalysis of Smith and Duncan’s

(2004) data, the UVSDT model clearly provided the best fit of individual

participants’ data.

3.3.1.2 The relationship of accuracy and confidence in 2AFC tasks

Forced choice tasks have played another interesting role in the recognition

literature. Rather than focusing on the predicted accuracy relationship across tasks,

these 2AFC studies have investigated the relationship between participants’

decision accuracy and their confidence. All of the signal detection models we’ve

considered make the assumption that old-new memory decisions and confidence

ratings are based on the same underlying evidence axis; confidence criteria and the

old-new criterion are simply different locations on that axis. For this reason, the

SDT models predict that empirical factors that influence accuracy should also


influence confidence in the same manner, with higher levels of confidence

corresponding to higher levels of accuracy.

Tulving (1981) was the first to report that accuracy and confidence in a 2AFC

recognition memory task had an “inverted” relationship: confidence was higher in

the condition with lower accuracy. In Tulving’s experiment, participants studied a

series of photographs and then were tested in a 2AFC task in which the lure item

was either highly-similar to the target in that pair (A/A' pairs, where A was studied)

or was highly-similar to a different studied item (A/B' pairs, where both A and B

were studied). Participants were more confident in their responses to the A/B'

pairs, but were more accurate for the A/A' pairs. This basic effect has been

replicated several times (Chandler, 1989, 1994; Dobbins, Kroll, & Liu, 1998;

Heathcote, Freeman, Etherington, Tonkin, & Bora, 2009; Heathcote, Bora, &

Freeman, 2010), and appears to offer a true puzzle for SDT models.

As it turns out, though, SDT can easily and simultaneously account for both

the confidence and accuracy effects. Signal detection’s description of the 2AFC task

is that participants select the target by calculating the difference in strengths of the

two items at test (see section 3.3.1.1). As Clark (1997) pointed out, however, the A/A'

pairs are not independent random variables: they share variance by virtue of their

similarity to one another, and therefore the variance of the A-A' strength difference

is s²_A + s²_A' - 2cov(A, A'). In contrast, the A-B' difference has a larger variance: s²_A + s²_B'. There are two consequences of this reduced variance for the A/A' test pairs.

First, selection of the studied photo from the A/A' pairs is easier than for A/B' pairs.

That’s because the mean strength difference is the same for both tests (A' and B' are


both equally similar to their corresponding studied item), but the variability is

lower, which increases discrimination. That accounts for the accuracy effect, as

shown in Figure 10. Second, assuming confidence is roughly determined by

distance from the decision criterion, which could plausibly be set at the zero point

where there’s no strength difference between the test items, then the average

confidence level for the A/A' pairs will be lower than for the A/B' pairs. So, the

same covariance difference accounts for both the increased accuracy and the

decreased confidence. Indeed, Heathcote et al. (2010) successfully modeled the

results of their experiments and Dobbins et al.’s (1998), all of which involved

remember-know judgments, using Clark’s model plus a remember-know criterion

with a location that varied randomly from trial to trial. That SDT model fit the data

better than an HTSDT variant proposed by Dobbins et al. (1998).
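Clark's shared-variance explanation is easy to verify by simulation. In the sketch below, the means and the correlation between A and A' are arbitrary illustrative choices; accuracy comes out higher, and mean absolute strength difference (a proxy for confidence) lower, for the A/A' pairs, reproducing the inversion.

```python
# Sketch of Clark's (1997) shared-variance account of the confidence-accuracy
# inversion in 2AFC recognition. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
mu_target, mu_lure = 1.5, 0.5   # similar lures assumed more familiar than new items

def simulate(rho):
    """Return accuracy and mean |difference| when target and lure correlate rho."""
    shared = rng.normal(size=n)
    target = mu_target + np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=n)
    lure = mu_lure + np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=n)
    diff = target - lure
    return np.mean(diff > 0), np.mean(np.abs(diff))

acc_AA, conf_AA = simulate(rho=0.5)   # A/A' pairs: strengths share variance
acc_AB, conf_AB = simulate(rho=0.0)   # A/B' pairs: independent strengths

print(round(acc_AA, 3), round(conf_AA, 3))  # ~0.84, ~1.17 (more accurate, less confident)
print(round(acc_AB, 3), round(conf_AB, 3))  # ~0.76, ~1.40
```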

< Insert Figure 10 near here >

3.3.2 Oddity Task

O'Connor, Guhl, Cox, and Dobbins (2011) tested participants on an unusual

task, using the oddity paradigm. In an oddity task, participants are shown three

memory probes simultaneously. There are either two targets and a lure, or two

lures and a target; the participant's task is to select the "odd" item that is of a

different class than the others. This task is not in common use outside of

the food science literature, but has been argued to be appropriate for eliciting


discriminations that are difficult to describe. For example, Macmillan and Creelman

(2005) suggested that an oddity task might be appropriate for testing the ability of

novices to discriminate two different types of red wine.

O'Connor et al. generated predictions of the HTSDT model for the oddity task,

concluding that the model expects higher accuracy when a lure is the odd item.

Essentially, this prediction arises because only targets can be recollected: on

average, the HTSDT has more information about targets than about lures. According

to the UVSDT model, on the other hand, decisions may be made based on a

differencing strategy like that used for 2AFC paradigm, except that two differences

are required (item 1 – item 2; item 2 – item 3). Each possible combination of trial

types (i.e., lure-target-target, target-lure-target, etc) produces a unique combination

of expected difference scores, allowing identification of the odd item (see Macmillan

& Creelman, 2005, for details). Alternatively, participants may simply order the

memory strengths of the three items and then compare the middle strength to an

unbiased old-new decision bound. If the middle item falls above (below) that

criterion, then the weakest (strongest) item should be selected as the odd item.

Because the target distribution is known to be more variable than the lure

distribution in recognition memory tasks, both SDT decision rules lead to the

prediction that accuracy will be higher when the target is the odd item: in that case

the two lures will likely be closer together in strength than two targets would be.
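The logic of these decision rules can be checked with a short Monte Carlo sketch. The version below operationalizes the differencing strategy as choosing the item farthest from the mean of the other two, with arbitrary illustrative parameter values; with a target distribution more variable than the lure distribution, target-odd trials should come out somewhat easier, as described above.

```python
# Sketch: oddity decisions under UVSDT using a differencing rule (choose the item
# farthest from the mean of the other two). Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
d_prime, sigma_t = 1.5, 1.25

def oddity_accuracy(n_targets):
    """Proportion correct when the trial contains n_targets targets (1 or 2)."""
    targets = rng.normal(d_prime, sigma_t, size=(n, n_targets))
    lures = rng.normal(0.0, 1.0, size=(n, 3 - n_targets))
    strengths = np.concatenate([targets, lures], axis=1)
    odd_col = 0 if n_targets == 1 else 2     # position of the odd item in each row
    total = strengths.sum(axis=1, keepdims=True)
    # distance of each item from the mean of the other two
    distance = np.abs(strengths - (total - strengths) / 2.0)
    return np.mean(distance.argmax(axis=1) == odd_col)

print(round(oddity_accuracy(n_targets=1), 3))  # target is the odd item
print(round(oddity_accuracy(n_targets=2), 3))  # lure is the odd item
# With sigma_t > 1, the first (target-odd) value typically comes out higher,
# in line with the UVSDT prediction described in the text.
```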

In a series of experiments and simulations, O'Connor et al. found that the

lures were easier to identify as odd, consistent with a model that assumes

recollection contributes to the decision. Given the evidence against threshold


recollection, however, Wixted and Mickes' (2010) continuous dual-process model is

likely to be more successful overall than the HTSDT model. And as O'Connor et al.

(2011) realized, there is also a version of the UVSDT model that can account for the

higher accuracy on lure-odd trials: if the old-new decision criterion is set liberally,

then most targets will fall above that criterion, allowing the lure to be readily

identified.

3.3.3 Second choice tasks

Parks and Yonelinas (2009) brought a different task from the perception

literature (Swets, Tanner, & Birdsall, 1961) to the recognition memory literature,

and applied it to both item and associative recognition. They gave participants four

memory probes to choose from (one target and 3 lures), and two tries to select the

target. Threshold and SDT models make different predictions about how the second

choice responses should be related to the first choice. For the threshold model in

which only targets can be detected, the first selection should be the target, if it is

detected (Kellen & Klauer, 2011). Lures can never be detected as old in this model,

so that response strategy would always lead to a correct decision on the initial

response. Because only one of the response options is a target, failure to select it

first means that both the first and second choices must be a consequence of a

random guessing process from a state of uncertainty. Thus, first and second choices

will be unrelated to one another.


In the SDT model, the second choice will be systematically related to the first.

The participant is assumed to select the option that has the greatest strength, which

will usually be a target (DeCarlo, 2013). If, due to distributional overlap, the

strongest item happens to be a lure, then the second choice is likely to be the target,

because the probability that two lures will fall in the upper tail of the distribution is

low.
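The diverging second-choice predictions are also easy to simulate. In the sketch below (illustrative parameter values only), a UVSDT observer ranks the four probes by strength; when the strongest item turns out to be a lure, the second-ranked item is still the target well above the one-in-three rate that guessing-based threshold accounts predict.

```python
# Sketch: second choices in a 4-alternative forced-choice test under UVSDT.
# Threshold guessing predicts 1/3 correct second choices after a first-choice error;
# ranking continuous strengths predicts better than 1/3. Illustrative values only.
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
d_prime, sigma_t = 1.0, 1.25

target = rng.normal(d_prime, sigma_t, n)
lures = rng.normal(0.0, 1.0, size=(n, 3))
strengths = np.column_stack([target, lures])   # column 0 is the target

order = np.argsort(-strengths, axis=1)         # item indices, strongest first
first_correct = order[:, 0] == 0
second_correct = order[:, 1] == 0

print(round(first_correct.mean(), 3))                     # well above the 0.25 chance rate
print(round(second_correct[~first_correct].mean(), 3))    # well above 1/3, unlike guessing
```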

In item recognition, Parks and Yonelinas (2009, see also Kellen & Klauer,

2014) found that the second choice responses were related to first choice,

consistent with the UVSDT model and inconsistent with the HTSDT model. In

contrast, associative recognition second-choices were unrelated to first choice

responses, a result that seems to suggest a threshold interpretation consistent with

recollection contributing to associative responses. The problem with that

conclusion, of course, is that the HTSDT account of the associative recognition ROCs

(see section 3.2.2) assumes that "unitization" leads participants to respond based

on familiarity, a UVSDT process (see Equation 9).

Another challenge for the interpretation of these second choice responses is

that the analyses fail to account for model complexity. Consistent with earlier

analyses for confidence-based recognition ROCs (Jang et al., 2011), but not

remember-know data (Cohen et al., 2008), Kellen and Klauer (2011) argued that the

UVSDT model is more flexible than the other models when applied to data from this

second-choice task. For this reason, they conclude that either the HTSDT or MSDT

model provides the best description of the data. However, their conclusions are

based on normalized maximum likelihood. This criterion may have too strong a


preference for simple models, because when applied to old-new recognition data it

concludes in favor of models that generate symmetric ROCs (Kellen et al., 2013).

3.3.4 Minimal assumption tests

Many of the predictions for particular model outcomes depend on

assumptions about the model that may not be essential properties of that model.

For example, the assumption that the SDT models involve Gaussian evidence

distributions is a convenience rather than a necessity of the model. So-called

minimal assumption tests of these models attempt to sidestep these ancillary

assumptions, thus making predictions that reflect the core properties of the model,

rather than the details of how it happens to be implemented.

One minimal assumption test is that the SDT models predict that fewer

extreme errors (high-confidence misses or false alarms) should be made as memory

strength increases. This prediction follows directly from the decreasing overlap of

the target and lure distributions (see Figure 1). In contrast, the 2HT model assumes

that extreme errors result from guessing; the conditional independence assumption

requires that responses based on guessing are independent of memory strength

(see section 3.1.2). Kellen and Klauer (2015) tested these competing predictions by

focusing on high-confidence misses ("sure new" decisions for targets). Their data

were consistent with the 2HT model's predictions, leading them to conclude that

there is no direct mapping of memory evidence to response confidence. Of course,


this conclusion contradicts the results of a great many other experiments on

recognition memory.

4 Challenges

We've seen that the evidence from a variety of memory tasks supports the

UVSDT model over its competitors. Despite the near unanimity of that success,

there are some challenges that must be faced. Several of these challenges stem from

a failure of virtually all recognition memory studies to consider all variables in the

experiment, including both those that are part of the design (e.g., item and subject

effects) and all possible dependent measures (i.e., reaction times).

4.1 Aggregation Effects

One issue faced by all of the models is the problem of data aggregation (Pratt,

Rouder, & Morey, 2010; DeCarlo, 2011; Pratt & Rouder, 2011). Our modeling

efforts usually involve collapsing responses over trials, subjects, or both. We know

that there are individual differences in participants' response strategies in these

tasks (e.g., Kapucu et al., 2010; Jang et al., 2011; Kantner & Lindsay, 2012), yet we

typically ignore those differences and consider only group-level behavior. The

consequences for the model fits are not necessarily bad (Cohen, Sanborn, & Shiffrin,

2008; Cohen et al., 2008), but certainly should be evaluated. Likewise, we almost


invariably ignore item effects that also occur (e.g., Freeman, Heathcote, Chalmers, &

Hockley, 2010; Isola, Xiao, Parikh, Torralba, & Oliva, 2014). Pratt et al. (2010; see

also DeCarlo, 2011) demonstrated that aggregation of data over items and subjects

can systematically distort our conclusions: In item recognition tasks, overestimation

of accuracy effects and underestimation of zROC slopes can result. Aggregation

disguises variability, meaning that we also have too much confidence in the stability

of our parameter estimates. Hierarchical modeling approaches have been

developed to address these challenges (e.g., Klauer, 2006, 2010; Pratt et al., 2010;

Pratt & Rouder, 2011), but have yet to be widely adopted.

4.2 Reaction Times

We saw earlier that reaction times have successfully discriminated different

interpretations of remember-know responses, concluding in favor of the UVSDT

model (Wixted & Stretch, 2004; Rotello & Zeng, 2008; Wixted & Mickes, 2010). In

addition, diffusion model fits to binary-response recognition memory data provide

converging evidence for the UVSDT model (Starns et al., 2012). Despite these

positives, newer models developed to simultaneously fit both confidence ratings

and reaction time distributions present an interpretive challenge to those studies

(Ratcliff & Starns, 2009; Voskuilen & Ratcliff, 2016; see also Van Zandt, 2000;

Pleskac & Busemeyer, 2010). Ratcliff's RTCON model includes two sets of criteria

that together predict the ROC and RT distributions for the task. The confidence


criteria are like those in Figure 1; they partially determine the predicted points on

the ROC. The areas under the curve between those confidence criteria also yield the

mean drift rates for a set of diffusion processes, one for each confidence level.

Responses are made when the first diffusion process hits its decision criterion,

resulting in both a response time and a confidence rating. Thus, the decision criteria

determine the reaction time distributions, but because the model is fit to all of the

data simultaneously, the confidence parameters are constrained by decision

parameters, and vice versa, and both sets of criteria constrain the estimated

evidence distributions. A consequence is that the slope of the zROC does not

correspond to the ratio of standard deviations of the lure and target distributions, as

the UVSDT model assumes. For this reason, Ratcliff and colleagues (Ratcliff &

Starns, 2009; Voskuilen & Ratcliff, 2016) caution against relying on zROC slopes as a

basis for theoretical conclusions.

4.3. Criterion Variability

A different criticism of using zROC slopes to draw theoretical conclusions

comes from experiments in which confidence ratings are collected across test lists

that vary in their base rate of targets and lures (Schulman & Greenberg, 1970; Van

Zandt, 2000). These studies yield data that appear to violate the core assumptions

of signal detection theory. As described in sections 3.1.1 and 3.1.2, zROCs can be

constructed across lists, using the different base rates to generate the operating


points, and they can also be constructed within lists using the confidence ratings.

According to SDT models, both zROCs are based on the same underlying evidence

distributions, so the same slope should be estimated from both methods. In

contrast to this prediction, steeper confidence-based zROC slopes were observed on

tests that included a higher proportion of targets. For this reason, Van Zandt

(2000) concluded that the UVSDT model could not account for the data. However,

this conclusion fails to consider the influence of criterion variability. As Rotello and

Macmillan (2008) argued, Treisman and Williams' (1984) model of criterion

variability predicts the observed pattern of slopes, without any modification to the

underlying signal detection model.

Essentially, Treisman and Williams (1984) argued that criterion location is

determined by three factors. Task demands and the first few test trials set the

criterion initially, then on each trial the criterion is shifted toward the average of the

recently observed evidence values so that finer discriminations may be made. The

shift toward the recent-mean is offset by a tendency to adjust the criterion so that

the same response is more likely on the next trial, allowing sequential dependencies

to occur (Malmberg & Annis, 2012). When confidence criteria are required, the

more extreme criteria tend to have greater variability because the probability of

observing a test item from the upper tail of the target distribution (or the lower tail

of the lures) is low. The overall effect of Treisman and Williams’s three factors is

that the presence of more targets than lures increases "old" confidence criterion

variability to a greater extent than "new" confidence criterion variability, and the opposite is true for tests that include relatively more lures.


Across trials, this means that there is noise in the location of the criterion as

well as the evidence (Wickelgren & Norman, 1966; Norman & Wickelgren, 1969).

As a result, the slope of the zROC is not just the ratio of the standard deviation of the lures (1) to that of the targets (s); it is instead

slope = √[(1 + σ_c²) / (s² + σ_c²)]     (14)

where σ_c² is the variance of the criterion location. These evidence and criterion

components of variance cannot be separated empirically using a standard

experimental design (but see Benjamin, Diaz, & Wee, 2009 and Mueller &

Weidemann, 2008, for two attempts, and Kellen, Klauer, & Singmann, 2012, for

criticism). In fact, we don't need to measure both evidence and criterion noise

separately to know that both are present: the data reported by Van Zandt (2000)

and by Schulman and Greenberg (1970) confirm earlier simulation work by

Treisman and Faulkner (1984) and demonstrate that the criterion noise is both

present and systematic.
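As a concrete illustration of Equation 14 (with arbitrary illustrative values of s and σ_c), the short snippet below shows how criterion noise pushes the observed zROC slope from 1/s toward 1.

```python
# Sketch: observed zROC slope under criterion noise (Equation 14). Illustrative values.
from math import sqrt

s = 1.25   # target standard deviation (lure SD fixed at 1)
for sigma_c in (0.0, 0.5, 1.0):
    slope = sqrt((1 + sigma_c ** 2) / (s ** 2 + sigma_c ** 2))
    print(f"sigma_c = {sigma_c}: slope = {slope:.3f}")
# sigma_c = 0 recovers the noise-free slope of 1/s = 0.8;
# larger criterion noise moves the slope toward 1.
```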

4.4 Residual Analyses Reveal Potential Problems for All Models

Recently, Dede, Squire, and Wixted (2014) proposed a new strategy for

evaluating the relative fit of models of recognition memory. Specifically, they

suggested looking at the pattern of residuals between observed data and the

models’ best fitting predictions. If the residuals for a particular model are


systematic across data sets, then that indicates a problem inherent to the model. On

the other hand, if the residuals are not systematic then they reflect random noise in

any given experimental outcome, lending credibility to that model. Dede et al.

(2014) applied this strategy to four recognition memory data sets, with the goal of

deciding whether the HTSDT or UVSDT model offers the better explanation. The

HTSDT fits yielded the same systematic pattern of residuals across all four data sets,

whereas the residuals for the UVSDT model appear to reflect only statistical noise,

suggesting that the UVSDT model provides the better overall account of these data.

Dede et al.’s (2014) results are promising. On the other hand, Kellen and

Singmann (2016) adopted the same basic strategy of analyzing residuals and

reached a different conclusion. Kellen and Singmann fit a larger number of data

sets, included the MSDT model in the evaluation, and used a different criterion for

defining systematic residuals. They found that all of the models under consideration

displayed at least some systematic deviations from the data. It remains an open

question whether these residuals reflect core assumptions of each of the models, or

whether they reflect ancillary details such as the assumed form of the evidence

distributions.

5 Conclusion

Across a wide range of recognition memory tasks, including those thought to

rely heavily on a recollection process, the unequal-variance signal detection model


provides a consistently – almost unanimously – better fit to data than that offered by

competing models. Assessments of model flexibility indicate that the success of the

UVSDT model is not due to intrinsically greater flexibility. Instead, this simple

model appears to provide an excellent description of recognition memory

performance. As such, it should serve as the foundation for research on recognition

memory in domains as varied as "real-world" applications of memory (i.e.,

eyewitness identification decisions: Mickes, Flowe, & Wixted, 2012,) and the Comment [CR1]: Cross reference chapter 02025. Eyewitness Identification by Laura Mickes neurological basis of recognition (e.g., Squire, Wixted, & Clark, 2007).


6 References

Arndt, J., & Reder, L. M. (2002). Word frequency and receiver operating

characteristic curves in recognition memory: Evidence for a dual-process

interpretation. Journal of Experimental Psychology: Learning, Memory, &

Cognition, 28, 830-842. DOI: 10.1037//0278-7393.28.5.830

Banks, W. P. (1970). Signal detection theory and human memory. Psychological

Bulletin, 74, 81-99.

Banks, W. P. (2000). Recognition and source memory as multivariate decision

processes. Psychological Science, 11, 267–273.

Bamber, D. (1979). State-trace analysis: A method of testing simple theories of

causation. Journal of Mathematical Psychology, 19, 137-181.

Bastin, C., Diana, R.A., Simon, J., Collette, F., Yonelinas, A.P., & Salmon, E. (2013).

Associative memory in aging: The effect of unitization on source memory.

Psychology and Aging, 28, 275-283. doi: 10.1037/a0031566

Bayen, U. J., Murnane, K., & Erdfelder, E. (1996). Source discrimination, item

detection, and multinomial models of source monitoring. Journal of

Experimental Psychology: Learning, Memory, & Cognition, 22, 197-215.

Benjamin, A. S., Diaz, M., & Wee, S. (2009). Signal detection with criterion noise:

Applications to recognition memory. Psychological Review, 116, 84-115. Doi:

10.1037/a0014351

Brainerd, C. J., Reyna, V. F., & Mojardin, A. H. (1999). Conjoint recognition.

Psychological Review, 106, 160-179.


Bröder, A., Kellen, D., Schütz, J., & Rohrmeier, C. (2013). Validating a two-high-

threshold measurement model for confidence rating data in recognition.

Memory, 21, 916-944. Doi: 10.1080/09658211.2013.767348

Bröder, A. & Schütz, J. (2009). Recognition ROCs are curvilinear - or are they? On

premature arguments against the two-high-threshold model of recognition.

Journal of Experimental Psychology: Learning, Memory & Cognition, 35, 587-

606. [see also correction to Bröder and Schütz (2009). JEP: LMC, 47(5), 1301]

Buchner, A., & Erdfelder, E. (1996). On the assumptions of, relations between, and

evaluations of some process dissociation measurement models.

Consciousness and Cognition, 5, 581-594.

Chandler, C. C. (1989). Specific retroactive interference in modified recognition

tests: Evidence for an unknown cause of interference. Journal of Experimental

Psychology: Learning, Memory, and Cognition, 15, 256-265.

Chandler, C. C. (1994). Studying related pictures can reduce accuracy, but increase

confidence, in a modified recognition test. Memory & Cognition, 22, 273-280.

Chen, T., Starns, J. J., & Rotello, C. M. (2015). A violation of the conditional

independence assumption of discrete state models of recognition memory.

Journal of Experimental Psychology: Learning, Memory, & Cognition, 41, 1215-

1222. doi: 10.1037/xlm0000077

Clark, S. E. (1997). A Familiarity-Based Account of Confidence-Accuracy Inversions

in Recognition Memory. Journal of Experimental Psychology: Learning,

Memory, & Cognition, 23, 232-238.


Cohen, A. L., Rotello, C. M., & Macmillan, N. A. (2008). Evaluating models of

remember-know judgments: Complexity, mimicry, and discriminability.

Psychonomic Bulletin & Review, 15, 906-926.

Cohen, A. L., Sanborn, A. N., & Shiffrin, R. M. (2008). Model evaluation using grouped

or individual data. Psychonomic Bulletin & Review, 15, 692-712. doi:

10.3758/PBR.15.4.692

DeCarlo, L. T. (2002). Signal detection theory with finite mixture distributions:

Theoretical developments with applications to

recognition memory. Psychological Review, 109, 710-721.

DeCarlo, L. T. (2003). An application of signal detection theory with finite mixture

distributions to source discrimination. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 29, 767-778.

DeCarlo, L. T. (2007). The mirror effect and mixture signal detection theory. Journal

of Experimental Psychology: Learning, Memory, and Cognition, 33, 18-33.

DeCarlo, L. T. (2011). Signal detection theory with item effects. Journal of

Mathematical Psychology, 55, 229-239.

DeCarlo, L. T. (2012). On a signal detection approach to m-alternative forced choice

with bias, with maximum likelihood and Bayesian approaches to estimation.

Journal of Mathematical Psychology, 56, 196-207.

Dede, A. J. O., Squire, L. R., & Wixted, J. T. (2014). A novel approach to an old

problem: Analysis of systematic errors in two models of recognition memory.

Neuropsychologia, 52, 51–56. doi: 10.1016/j.neuropsychologia.2013.10.012


Dewhurst, S. A., & Conway, M. A. (1994). Pictures, images, and recollective

experience. Journal of Experimental Psychology: Learning, Memory, &

Cognition, 20, 1088-1098.

Dewhurst, S. A., Holmes, S. J., Brandt, K. R., & Dean, G. M. (2006). Measuring the

speed of the conscious components of recognition memory: Remembering is

faster than knowing. Consciousness & Cognition, 15, 147-162.

Dobbins, I. G., Kroll, N. E. A., & Liu, Q. (1998). Confidence–accuracy inversions in

scene recognition: A remember–know analysis. Journal of Experimental

Psychology: Learning, Memory, and Cognition, 24, 1306–1315.

Dodson, C. S., Bawa, S., & Slotnick, S. D. (2007). Aging, source memory, and

misrecollections. Journal of Experimental Psychology: Learning, Memory, &

Cognition, 33, 169-181.

Donaldson, W. (1996). The role of decision processes in remembering and knowing.

Memory & Cognition, 24, 523-533.

Dougal, S., & Rotello, C. M. (2007). “Remembering” emotional words is based on

response bias, not recollection. Psychonomic Bulletin & Review, 14, 423-429.

Dube, C., & Rotello, C. M. (2012). Binary ROCs in perception and recognition

memory are curved. Journal of Experimental Psychology: Learning, Memory,

and Cognition, 38, 130-151.

Dube, C., Rotello, C. M., & Heit, E. (2011). The belief bias effect is aptly named: A

reply to Klauer and Kellen (2011). Psychological Review, 118, 155-163.


Dube, C., Starns, J. J., Rotello, C. M., & Ratcliff, R. (2012). Beyond ROC curvature:

Strength effects and response time data support continuous-evidence models

of recognition memory. Journal of Memory & Language, 67, 389-406.

Dunn, J. C. (2004). Remember-know: A matter of confidence. Psychological Review,

111, 524-542.

Dunn, J. C. (2008). The dimensionality of the Remember-Know task: A state-trace

analysis. Psychological Review, 115, 426-446.

Dunn, J. C. & Kirsner, K. (1988). Discovering functionally independent mental

processes: The principle of reversed association. Psychological Review, 95,

91- 101.

Egan, J. P. (1958). Recognition memory and the operating characteristic (Technical

Note AFCRC-TN-58-51). Bloomington, IN: Indiana University and

Communication Laboratory.

Erdfelder, E., & Buchner, A. (1998). Process-dissociation measurement models:

Threshold theory or detection theory? Journal of Experimental Psychology:

General, 127, 83–97. doi:10.1037/0096-3445.127.1.83

Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in

recognition memory for words. Journal of Memory and Language, 62, 1–18.

Doi: 10.1016/j.jml.2009.09.004

Gardiner, J. M., & Richardson-Klavehn, A. (2000). Remembering and knowing. In E.

Tulving & F. I. M. Craik (Eds.), The Oxford handbook of memory (pp. 229–244).

Oxford, England: Oxford University Press.


Glanzer, M., Hilford, A., & Kim, K. (2004). Six regularities of source recognition.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 30,

1176–1195.

Glanzer, M., Kim, K., Hilford, A., & Adams, J. K. (1999). Slope of the receiver-operating

characteristic in recognition memory. Journal of Experimental Psychology:

Learning, Memory, & Cognition, 25, 500–513.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New

York: Wiley. Reprinted 1974 by Krieger, Huntington, NY.

Greve, A., Donaldson, D. I., & van Rossum, M. C. W. (2010). A single-trace dual-

process model of episodic memory: A novel computational account of

familiarity and recollection. Hippocampus, 20, 235-251. Doi:

10.1002/hipo.20606

Gronlund, S. D., & Ratcliff, R. (1989). Time course of item and associative information:

Implications for global memory models. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 15, 846-858.

Harlow, I. M., & Donaldson, D. I. (2013). Source accuracy data reveal the thresholded

nature of human episodic memory. Psychonomic Bulletin & Review, 20, 318–

325. doi:10.3758/s13423-012-0340-9

Hautus, M., Macmillan, N. A., & Rotello, C. M. (2008). Toward a complete decision

model of item and source recognition. Psychonomic Bulletin & Review, 15,

889-905.

Healy, M. R., Light, L. L., & Chung, C. (2005). Dual-process models of associative

recognition in young and older adults: Evidence from receiver operating


characteristics. Journal of Experimental Psychology: Learning, Memory, and

Cognition, 31, 768-788. DOI: 10.1037/0278-7393.31.4.768

Heathcote, A. (2003). Item Recognition Memory and the Receiver Operating

Characteristic. Journal of Experimental Psychology: Learning, Memory, &

Cognition, 29, 1210-1230. DOI: 10.1037/0278-7393.29.6.1210

Heathcote, A., Bora, B., & Freeman, E. (2010). Recollection and confidence in

two-alternative forced choice episodic recognition. Journal of Memory and

Language, 62, 183-203. doi:10.1016/j.jml.2009.11.003

Heathcote, A., Raymond, F., & Dunn, J. (2006). Recollection and familiarity in

recognition memory: Evidence from ROC curves. Journal of Memory and

Language, 55, 495-514.

Hilford, A., Glanzer, M., Kim, K., & DeCarlo, L. T. (2002). Regularities of source

recognition: ROC analysis. Journal of Experimental Psychology: General, 131,

494-510.

Hintzman, D. L., & Curran, T. (1994). Retrieval dynamics of recognition and

frequency judgments: Evidence for separate processes of familiarity and

recall. Journal of Memory & Language, 33, 1-18.

Ingram, K. M., Mickes, L., & Wixted, J. T. (2012). Recollection can be weak and

familiarity can be strong. Journal of Experimental Psychology: Learning,

Memory, and Cognition, 38, 325-339. Doi: 10.1037/a0025483

Isola, P., Xiao, J., Parikh, D., Torralba, A. & Oliva, A. (2014). What makes a photograph

memorable? Pattern Analysis and Machine Intelligence, 1-14.


Jang, Y., Mickes, L. & Wixted, J. T. (2012). Three tests and three corrections:

Comment on Koen and Yonelinas (2010). Journal of Experimental Psychology:

Learning, Memory, and Cognition, 38, 513–523.

Jang, Y., Wixted, J. T., & Huber, D. E. (2009). Testing signal- detection models of

yes/no and two-alternative forced-choice recognition memory. Journal of

Experimental Psychology: General, 138, 291–306.

Jang, Y., Wixted, J. T., & Huber, D. E. (2011). The diagnosticity of individual data for

model selection: Comparing signal-detection models of recognition memory.

Psychonomic Bulletin & Review, 18, 751–757. DOI 10.3758/s13423-011-

0096-7

Kantner, J., & Lindsay, D. S. (2012). Response bias in recognition memory as a

cognitive trait. Memory & Cognition, 40, 1163-1177.

Kapucu, A., Macmillan, N. A., & Rotello, C. M. (2010). Positive and negative

remember judgments and ROCs in the plurals paradigm: Evidence for

alternative decision strategies. Memory & Cognition, 38, 541-554.


Kapucu, A., Rotello, C. M., Ready, R. E., & Seidl, K. N. (2008). Response bias in

‘remembering’ emotional stimuli: A new perspective on age differences.

Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 703-

711.

Kellen, D., & Klauer, K. C. (2011). Evaluating models of recognition memory using

first- and second-choice responses. Journal of Mathematical Psychology, 55,

251–266. http://dx.doi.org/10.1016/j.jmp.2010.11.004


Kellen, D., & Klauer, K. C. (2014). Discrete-state and continuous models of

recognition memory: Testing core properties under minimal assumptions.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 40,

1795–1804. http://dx.doi.org/10.1037/xlm0000016

Kellen, D., & Klauer, K. C. (2015). Signal detection and threshold modeling of

confidence-rating ROCs: A critical test with minimal assumptions.

Psychological Review, 122, 542–557. Doi: 10.1037/a0039251

Kellen, D., Klauer, K. C., & Bröder, A. (2013). Recognition memory models and

binary-response ROCs: A comparison by minimum description length.

Psychonomic Bulletin & Review, 20, 693-719. Doi: 10.3758/s13423-013-0407-

2

Kellen, D., Klauer, K. C., & Singmann, H. (2012). On the measurement of criterion

noise in Signal Detection Theory: The case of recognition

memory. Psychological Review, 119, 457-479.

Kellen, D., & Singmann, H. (2016). ROC residuals in signal-detection models of

recognition memory. Psychonomic Bulletin & Review, 23, 253-264. Doi:

10.3758/s13423-015-0888-2

Kelley, R., & Wixted, J. T. (2001). On the nature of associative information in

recognition memory. Journal of Experimental Psychology: Learning, Memory,

and Cognition, 27, 701–722. Doi: 10.1037//0278-7393.27.3.701

Klauer, K.C. (2006). Hierarchical multinomial processing tree models: a latent-class

approach. Psychometrika, 71, 1–31.


Klauer, K. C. (2010). Hierarchical multinomial processing tree models: a latent-trait

approach. Psychometrika, 75, 70-98. DOI: 10.1007/S11336-009-9141-0

Klauer, K. C. & Kellen, D. (2010). Toward a complete decision model of item and

source recognition: A discrete-state approach. Psychonomic Bulletin & Review,

17, 465-478.

Koen, J. D., & Yonelinas, A. P. (2010). Memory variability is due to the contribution of

recollection and familiarity, not encoding variability. Journal of Experimental

Psychology: Learning, Memory, and Cognition, 36, 1536–1542.

doi:10.1037/a0020448

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review,

76, 308–324. doi:10.1037/h0027238

Kroll, N. E. A., Yonelinas, A. P., Dobbins, I. G., & Frederick, C. M. (2002). Separating

sensitivity from response bias: Implications of comparisons of yes-no and

forced-choice tests for models and measures of recognition memory. Journal

of Experimental Psychology: General, 131, 241-254. Doi: 10.1037/0096-

3445.131.2.241

Lachman, R., & Field, W. H. (1965). Recognition and recall of verbal materials as a

function of degree of training. Psychonomic Science, 2, 225-226.

Macho, S. (2004). Modeling associative recognition: A comparison of two-high-

threshold, two-high-threshold signal detection, and mixture distribution

models. Journal of Experimental Psychology: Learning, Memory, and Cognition,

30, 83–97. DOI: 10.1037/0278-7393.30.1.83


Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide (2nd ed.).

Mahwah, NJ: Erlbaum.

Macmillan, N. A., Rotello, C. M., & Verde, M. F. (2005). On the importance of models

in interpreting remember-know experiments: Comments on Gardiner et al.’s

(2002) meta-analysis. Memory, 13, 607-621.

Malejka, S., & Bröder, A. (2016). No source memory for unrecognized items when

implicit feedback is avoided. Memory & Cognition, 44, 63-72. Doi:

10.3758/s13421-015-0549-8

Malmberg, K. J. (2002). On the form of ROCs constructed from confidence ratings.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 28,

380-387.

Malmberg, K. J., & Annis, J. (2012). On the Relationship between Memory and

Perception: Sequential dependencies in recognition testing. Journal of

Experimental Psychology: General, 141, 233-259.

Malmberg, K.J. & Xu, J. (2006). The Influence of Averaging and Noisy Decision

Strategies on the Recognition Memory ROC, Psychonomic Bulletin & Review,

13, 99-105.

Mandler, G. (1980). Recognizing: The judgment of previous occurrence.

Psychological Review, 87, 252–271.

McElree, B., Dolan, P. O., & Jacoby, L. L. (1999). Isolating the contributions of

familiarity and source information to item recognition: A time course

analysis. Journal of Experimental Psychology: Learning, Memory, and

Cognition, 25, 563–582.


Mickes, L., Flowe, H. D., & Wixted, J. T. (2012). Receiver operating characteristic

analysis of eyewitness memory: Comparing the diagnostic accuracy of

simultaneous versus sequential lineups. Journal of Experimental Psychology:

Applied, 18, 361-376. Doi: 10.1037/a0030609

Mickes, L., Johnson, E. M., & Wixted, J. T. (2010). Continuous recollection versus

unitized familiarity in associative recognition. Journal of Experimental

Psychology: Learning, Memory, and Cognition, 36, 843–863. doi:

10.1037/a001975

Mickes, L., Wais, P. E., & Wixted, J. (2009). Recollection is a continuous process:

Implications for dual-process theories of recognition memory. Psychological

Science, 20, 509-515.

Mickes, L., Wixted, J. T. & Wais, P. E. (2007). A Direct Test of the Unequal-Variance

Signal-Detection Model of Recognition Memory. Psychonomic Bulletin &

Review, 14, 858-865.

Mueller, S. T., & Weidemann, C. T. (2008). Decision noise: An explanation for

observed violations of signal detection theory. Psychonomic Bulletin &

Review, 15, 465-494. doi: 10.3758/PBR.15.3.465

Myung, J. I., Navarro, D. J., & Pitt, M. A. (2006). Model selection by normalized

maximum likelihood. Journal of Mathematical Psychology, 50, 167-179.

Norman, D. A., & Wickelgren, W. A. (1969). Strength theory of decision rules and

latency in retrieval from short-term memory. Journal of Mathematical

Psychology, 6, 192-208.


O’Connor, A. R., Guhl, E. N., Cox, J. C., & Dobbins, I. G. (2011). Some memories are

odder than others: Judgments of episodic oddity violate known decision

rules. Journal of Memory and Language, 64, 299–315. Doi:

10.1016/j.jml.2011.02.001

Onyper, S. V., Zhang, Y. X., & Howard, M. W. (2010). Some-or-none recollection:

Evidence from item and source memory. Journal of Experimental Psychology:

General, 139, 341–364. doi:10.1037/a0018926

Parks, C. M., & Yonelinas, A. P. (2007). Moving beyond pure signal detection models:

Comment on Wixted (2007). Psychological Review, 114, 188–202.

doi:10.1037/0033-295X.114.1.188

Parks, C. M., & Yonelinas, A. P. (2009). Evidence for a memory threshold in second-

choice recognition memory responses. Proceedings of the National Academy

of Sciences, 106, 11515-11519.

Parks, C. M., Murray, L. J., Elfman, K., & Yonelinas, A. P. (2011). Variations in

recollection: The effects of complexity on source recognition. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 37, 861-873.

Doi:10.1037/a0022798

Pazzaglia, A. M., Dube, C., & Rotello, C. M. (2013). A critical comparison of discrete-

state and continuous models of recognition memory: Implications for

recognition and beyond. Psychological Bulletin, 139, 1173-1203.

Petrusic, W. M., & Baranski, J. V. (2003). Judging confidence influences decision

processing in comparative judgments. Psychonomic Bulletin & Review, 10,

177–183. doi:10.3758/BF03196482


Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among

computational models of cognition. Psychological Review, 109, 472-491.

Pleskac, T. J., & Busemeyer, J. R. (2010). Two-stage dynamic signal detection: A

theory of choice, decision time, and confidence. Psychological Review, 117,

864-901. DOI: 10.1037/a0019737

Pratte, M. S., & Rouder J. N. (2011). Hierarchical single- and dual-process models of

recognition memory. Journal of Mathematical Psychology, 55, 36-46.

Pratte, M. S., & Rouder, J. N. (2012). Assessing the dissociability of recollection and

familiarity in recognition memory. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 38, 1591-1607. doi: 10.1037/a0028144

Pratte, M. S., Rouder J. N., & Morey R. D. (2010). Separating mnemonic process from

participant and item effects in the assessment of ROC asymmetries. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 36, 224--232.

Province, J. M., & Rouder, J. N. (2012). Evidence for discrete-state processing in

recognition memory. Proceedings of the National Academy of Sciences, 109,

14357–14362. doi: 10.1073/pnas.1103880109

Qin, J., Raye, C. L., Johnson, M. K., & Mitchell, K. J. (2001). Source ROCs are (typically)

curvilinear: Comment on Yonelinas (1999). Journal of Experimental

Psychology: Learning, Memory, & Cognition, 27, 1110-1115.

Quamme, J. R., Yonelinas, A. P., & Norman, K. A. (2007). Effect of unitization on

associative recognition in amnesia. Hippocampus, 17, 192-200. Doi:

10.1002/hipo.20257


Rajaram, S. (1993). Remembering and knowing: Two means of access to the

personal past. Memory & Cognition, 21, 89–102.

Rajaram, S., & Geraci, L. (2000). Conceptual fluency selectively influences knowing.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 26,

1070-1074.

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.

Ratcliff, R., McKoon, G., & Tindall, M. (1994). The empirical generality of data from

recognition memory receiver-operating characteristic functions and

implications for global memory models. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 20, 763–785.

Ratcliff, R., Sheu, C.-F., & Gronlund, S. D. (1992). Testing global matching memory

models using ROC curves. Psychological Review, 99, 518-535.

Ratcliff, R., & Starns, J. J. (2009). Modeling confidence and response time in

recognition memory. Psychological Review, 116, 59–83.

Reder, L. M., Nhouyvanisvong, A., Schunn, C. D., Ayers, M. S., Angstadt, P., & Hiraki, K.

(2000). A mechanistic account of the mirror effect for word frequency: A

computational model of remember–know judgments in a continuous

recognition paradigm. Journal of Experimental Psychology: Learning, Memory,

& Cognition, 26, 294-320.

Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory

testing. Psychological Review, 107, 358–367. DOI: 10.1037//0033-

295X.107.2.358


Rotello, C. M. (2000). Recall processes in recognition memory. In D. L. Medin (Ed.),

The Psychology of Learning and Motivation, Vol. 40 (pp. 183-221). San Diego,

CA: Academic Press.

Rotello, C. M., & Heit, E. (1999). Two-process models of recognition memory:

Evidence for Recall-to-reject? Journal of Memory and Language, 40, 432-453.

Rotello, C. M., & Heit, E. (2000). Associative recognition: A case of recall-to-reject

processing. Memory and Cognition, 28, 907-922.

Rotello, C. M., & Macmillan, N. A. (2006). Remember-know models as decision

strategies in two experimental paradigms. Journal of Memory & Language, 55,

479-494.

Rotello, C. M., & Macmillan, N. A. (2008). Response bias in recognition memory. In A.

S. Benjamin & B. H. Ross (Eds.), The Psychology of Learning and Motivation:

Skill and Strategy in Memory Use, Vol. 48 (pp. 61-94). London: Academic

Press.

Rotello, C. M., & Zeng, M. (2008). Analysis of RT distributions in the remember-

know paradigm. Psychonomic Bulletin & Review, 15, 825-832.

Rotello, C. M., Macmillan, N. A., Hicks, J. L., & Hautus, M. (2006). Interpreting the

effects of response bias on remember-know judgments using signal-

detection and threshold models. Memory & Cognition, 34, 1598-1614.

Rotello, C. M., Macmillan, N. A., Reeder, J. A., & Wong, M. (2005). The remember

response: Subject to bias, graded, and not a process-pure indicator of

recollection. Psychonomic Bulletin & Review, 12, 865-873.


Rotello, C. M., Macmillan, N. A., & Van Tassel, G. (2000). Recall-to-reject in

recognition: Evidence from ROC curves. Journal of Memory and Language, 43,

67-88.

Schulman, A. I., & Greenberg, G. Z. (1970). Operating characteristics and a priori

probability of the signal. Perception & Psychophysics, 8, 317-320.

Schütz, J. & Bröder, A. (2011). Signal detection and threshold models of source

memory. Experimental Psychology, 58(4), 293-311.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM –

Retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-

166.

Slotnick, S. D. (2010). "Remember" source memory ROCs indicate recollection is a

continuous process. Memory, 18, 27−39. doi: 10.1080/09658210903390061

Slotnick, S. D., & Dodson, C. S. (2005). Support for a continuous (single process)

model of recognition memory and source memory. Memory & Cognition, 33,

151–170.

Slotnick, S. D., Jeye, B. M., & Dodson, C. S. (2016). Recollection is a continuous

process: Evidence from plurality memory receiver operating characteristics.

Memory, 24, 2-11. Doi: 10.1080/09658211.2014.971033

Slotnick, S. D., Klein, S. A., Dodson, C. S., & Shimamura, A. P. (2000). An analysis of

signal detection and threshold models of source memory. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 26, 1499–1517.

Smith, D. G., & Duncan, M. J. J. (2004). Testing theories of recognition memory by

predicting performance across paradigms. Journal of Experimental


Psychology: Learning, Memory, & Cognition, 30, 615-625. Doi: 10.1037/0278-

7393.30.3.615

Snodgrass, J. G., & Corwin, J. (1988). Pragmatics of measuring recognition memory:

Applications to dementia and amnesia. Journal of Experimental Psychology:

General, 117, 34–50. doi:10.1037/0096-3445.117.1.34

Squire, L. R., Wixted, J. T., & Clark, R. E. (2007). Recognition memory and the medial

temporal lobe: A new perspective. Nature Reviews Neuroscience, 8, 872-883. doi:

10.1038/nrn2154

Starns, J. J., & Ksander, J. C. (2016). Item strength influences source confidence and

alters source memory z ROC slopes. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 42, 351-365. Doi: 10.1037/xlm0000177

Starns, J. J., Hicks, J. L., Brown, N. L., & Martin, B. A. (2008). Source memory for

unrecognized items: Predictions from multivariate signal detection theory.

Memory & Cognition, 36, 1-8.

Starns, J. J., Pazzaglia, A. M., Rotello, C. M., Hautus, M. J., & Macmillan, N. A. (2013).

Unequal-strength source z ROC slopes reflect criteria placement and not

(necessarily) memory processes. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 39, 1377-1392.

Starns, J. J., Ratcliff, R., & McKoon, G. (2012). Evaluating the unequal-variance and

dual-process explanations of z ROC slopes with response time data and the

diffusion model. Cognitive Psychology, 64, 1–34. doi:

10.1016/j.cogpsych.2011.10.002


Starns, J. J., Rotello, C. M., & Hautus, M. J. (2014). Recognition memory z ROC slopes

for items with correct versus incorrect source decisions discriminate the

dual process and unequal variance signal detection models. Journal of

Experimental Psychology: Learning, Memory, & Cognition, 40, 1205-1225.

Starns, J. J., Rotello, C. M., Ratcliff, R. (2012). Mixing strong and weak targets

provides no evidence against the unequal-variance explanation of z ROC

slopes: A comment on Koen & Yonelinas (2010). Journal of Experimental

Psychology: Learning, Memory, and Cognition, 38, 793-801.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in

perception. Psychological Review, 68, 301–340. doi: 10.1037/h0040547

Treisman, M., & Faulkner, A. (1984). The effects of signal probability on the slope of

the receiver operating characteristic given by the rating procedure. British

Journal of Mathematical and Statistical Psychology, 37, 199-215.

Treisman, M., & Williams, T. C. (1984). A theory of criterion setting with an

application to sequential dependencies. Psychological Review, 91, 68-111.

Tulving, E. (1981). Similarity relations in recognition. Journal of Verbal Learning and

Verbal Behavior, 20, 479-496.

Tulving, E. (1985). Memory and consciousness. Canadian Psychology, 26, 1-12.

Van Zandt, T. (2000). ROC curves and confidence judgments in recognition memory.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 26,

582-600.

Verde, M. F., & Rotello, C. M. (2004). ROC curves show that the revelation effect is

not a single phenomenon. Psychonomic Bulletin & Review, 11, 560-566.


Voskuilen, C., & Ratcliff, R. (2016). Modeling confidence and response time in

associative recognition. Journal of Memory and Language, 86, 60-96. doi:

10.1016/j.jml.2015.09.006

Wagenmakers, E.-J., Ratcliff, R., Gomez, P., & Iverson, G. J. (2004). Assessing model

mimicry using the parametric bootstrap. Journal of Mathematical Psychology,

48, 28-50.

Westerman, D. L. (2001). The role of familiarity in item recognition, associative

recognition, and plurality recognition on self-paced and speeded tests.

Journal of Experimental Psychology: Learning, Memory, and Cognition, 27,

723-732. DOI: 10.1037//0278-7393.27.3.723

Wickelgren, W. A., & Norman, D. A. (1966). Strength models and serial position in

short-term recognition memory. Journal of Mathematical Psychology, 3, 316-

347.

Wixted, J. T. (2007). Dual-process theory and signal-detection theory of recognition

memory. Psychological Review, 114, 152-176.

Wixted, J. T. & Mickes, L. (2010). Useful scientific theories are useful: A reply to

Rouder, Pratte and Morey (2010). Psychonomic Bulletin & Review, 17, 436-

442.

Wixted, J. T. & Mickes, L. (2010). A continuous dual-process model of

remember/know judgments. Psychological Review, 117, 1025-1054.

Wixted, J. T. & Stretch, V. (2004). In defense of the signal-detection interpretation of

Remember/Know judgments. Psychonomic Bulletin & Review, 11, 616-641.


Yonelinas, A. P. (1994). Receiver-operating characteristics in recognition memory:

Evidence for a dual-process model. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 20, 1341-1354.

Yonelinas, A. P. (1997). Recognition memory ROC’s for item and associative

information: The contribution of recollection and familiarity. Memory &

Cognition, 25, 747–763.

Yonelinas, A. P. (1999a). The contribution of recollection and familiarity to

recognition and source-memory judgments: A formal dual-process model

and analysis of receiver operating characteristics. Journal of Experimental

Psychology: Learning, Memory, and Cognition, 25, 1415–1434.

Yonelinas, A. P. (1999b). Recognition memory ROCs and the dual-process signal

detection model: Comment on Glanzer, Kim, Hilford, and Adams. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 25, 514–521.

Yonelinas, A. P. (2001). Consciousness, control and confidence: The three Cs of

recognition memory. Journal of Experimental Psychology: General, 130, 361-

379.

Yonelinas, A. P. (2002). The nature of recollection and familiarity: A review of 30

years of research. Journal of Memory and Language, 46(3), 441-517.

Yonelinas, A. P., & Jacoby, L. L. (1995). The relation between remembering and

knowing as bases for recognition: Effects of size congruency. Journal of

Memory and Language, 34, 622-643. doi:10.1006/jmla.1995.1028


Yonelinas, A. P., Kroll, N. E. A., Dobbins, I. G., & Soltani, M. (1999). Recognition

Memory for Faces: When Familiarity Supports Associative Recognition

Judgments. Psychonomic Bulletin & Review, 6, 654-661.

Yonelinas, A. P., & Parks, C. M. (2007). Receiver operating characteristics (ROCs) in

recognition memory: A review. Psychological Bulletin, 133, 800–832.

doi:10.1037/0033-2909.133.5.800


Table 1. Parameters of the models when fit to a confidence-rating ROC with m confidence bins.

Model | Sensitivity Parameter(s) | Variance or Mixture Parameter | Criterion Locations | State-response mapping parameters
EVSDT | d' | -- | m-1 | --
UVSDT | d | s | m-1 | --
HTSDT | R, d' | -- | m-1 | --
MSDT | dFull, dPartial | λ | m-1 | --
2HT | po, pn | -- | -- | Varies. Maximum = 3(m-1).
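For concreteness, the free-parameter counts implied by Table 1 can be tallied for a hypothetical 6-point confidence scale (m = 6); the snippet below simply restates the table, taking the 2HT mapping parameters at their maximum.

```python
# Free-parameter counts implied by Table 1 for a hypothetical m = 6 rating scale.
m = 6
free_parameters = {
    "EVSDT": 1 + (m - 1),      # d' plus m-1 criteria
    "UVSDT": 2 + (m - 1),      # d and s plus m-1 criteria
    "HTSDT": 2 + (m - 1),      # R and d' plus m-1 criteria
    "MSDT": 3 + (m - 1),       # dFull, dPartial, and lambda plus m-1 criteria
    "2HT": 2 + 3 * (m - 1),    # po and pn plus, at most, 3(m-1) mapping parameters
}
for model, k in free_parameters.items():
    print(f"{model}: {k} free parameters")
```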


Figure Captions

Figure 1. Top row: Equal-variance signal detection (EVSDT) model decision space

(left panel), example ROCs (middle panel), and corresponding zROCs (right panel).

Bottom row: Unequal-variance signal detection (UVSDT) model decision space (left

panel), example ROCs (middle panel), and corresponding zROCs (right panel).
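The ROCs and zROCs in Figure 1 can be traced by sweeping a decision criterion across the lure and target distributions. The following sketch does so for illustrative parameter values (d = 1.5; s = 1.0 for EVSDT, s = 1.25 for UVSDT); it is a minimal illustration, not the code used to produce the figure.

```python
import numpy as np
from scipy.stats import norm

def predicted_roc(d, s, criteria=np.linspace(-3, 5, 81)):
    # Sweep the old-new criterion: lures ~ N(0, 1), targets ~ N(d, s^2).
    false_alarms = 1 - norm.cdf(criteria)               # P("old" | lure)
    hits = 1 - norm.cdf(criteria, loc=d, scale=s)       # P("old" | target)
    return false_alarms, hits

# EVSDT (s = 1) yields a symmetric ROC and a unit-slope zROC;
# UVSDT (s > 1) yields an asymmetric ROC and a zROC slope of 1/s.
for label, s in [("EVSDT", 1.0), ("UVSDT", 1.25)]:
    f, h = predicted_roc(d=1.5, s=s)
    slope = np.polyfit(norm.ppf(f), norm.ppf(h), 1)[0]
    print(label, "zROC slope:", round(slope, 3))
```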

Figure 2. High-threshold signal detection (HTSDT) model. Left panel: decision

space for targets. Lure decision space is identical to EVSDT. Middle panel: example

ROCs for three recollection probabilities (.2, .4, .6) and a constant d’ (1.5). Right

panel: corresponding zROCs.
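A comparable sketch of the HTSDT ROCs in Figure 2, assuming that recollection always produces an "old" response and that familiarity otherwise follows the EVSDT assumptions; the y-intercept of each predicted ROC approximates the recollection probability R.

```python
import numpy as np
from scipy.stats import norm

def htsdt_roc(R, d_prime, criteria=np.linspace(-3, 5, 81)):
    # Recollected targets (probability R) are always called "old"; otherwise the
    # response is based on familiarity, with targets ~ N(d', 1) and lures ~ N(0, 1).
    false_alarms = 1 - norm.cdf(criteria)
    hits = R + (1 - R) * (1 - norm.cdf(criteria - d_prime))
    return false_alarms, hits

for R in (0.2, 0.4, 0.6):                     # the recollection values in Figure 2
    f, h = htsdt_roc(R, d_prime=1.5)
    print(f"R = {R}: hit rate as the false-alarm rate approaches 0 is {h[-1]:.2f}")
```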

Figure 3. Mixture signal detection (MSDT) model. Left panel: decision space. Lure

distribution on the left, distribution for unattended targets (dashed distribution),

and attended target distribution on the right. Middle panel: Example ROCs for three

values of λ (.2, .4, .6). Right panel: corresponding zROCs.
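A parallel sketch of the MSDT ROCs in Figure 3, assuming the unattended-target distribution coincides with the lure distribution (dPartial = 0); the parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def msdt_roc(lam, d_full, d_partial=0.0, criteria=np.linspace(-3, 5, 81)):
    # A proportion lam of targets comes from the attended distribution (mean d_full);
    # the remainder comes from the weaker, unattended distribution (mean d_partial).
    false_alarms = 1 - norm.cdf(criteria)
    hits = (lam * (1 - norm.cdf(criteria - d_full))
            + (1 - lam) * (1 - norm.cdf(criteria - d_partial)))
    return false_alarms, hits

for lam in (0.2, 0.4, 0.6):                   # the mixing values in Figure 3
    f, h = msdt_roc(lam, d_full=2.0)
    hit_at_10 = np.interp(0.10, f[::-1], h[::-1])   # hit rate at a false-alarm rate of .10
    print(f"lambda = {lam}: hit rate at FA = .10 is {hit_at_10:.2f}")
```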

Figure 4. Double high-threshold (2HT) model for binary (old-new) decision task.

Left panel: decision space; T = Target; L = Lure; ? = uncertain state. Middle panel:

Example ROCs for three values of po (.2, .4, .6) and three values of pn (.05, .25, .45).

Right panel: corresponding zROCs.
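The linear ROCs in Figure 4 follow directly from the 2HT response probabilities: targets are detected with probability po, lures with probability pn, and undetected items are called "old" at a guessing rate g. A minimal sketch with the largest parameter values plotted in the figure:

```python
import numpy as np

def two_ht_roc(po, pn, guess_rates=np.linspace(0, 1, 11)):
    # Detected targets (probability po) are called "old"; detected lures (probability pn)
    # are called "new"; undetected items are called "old" at the guessing rate g.
    hits = po + (1 - po) * guess_rates
    false_alarms = (1 - pn) * guess_rates
    return false_alarms, hits

f, h = two_ht_roc(po=0.6, pn=0.45)
print(np.round(np.column_stack([f, h]), 2))   # a straight line from (0, .60) to (.55, 1.0)
```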

Figure 5. Double high-threshold (2HT) model for confidence rating task. Left

panel: decision space, T = Target; L = Lure; ? = uncertain state. Middle panel:

Example ROCs for po = .6, pn = .45, and three different sets of detect-state-to-response

mapping parameters. Right panel: corresponding zROCs.


Figure 6. Example fits of the competing models to an item recognition task. Data

(circles) are from a single subject in Egan (1958, Exp. 1), reported in his Table 1.

Best-fitting model predictions are shown. The MSDT and HTSDT models make

identical predictions for these data, though the MSDT requires an extra parameter

to do so. All models except the EVSDT provide acceptable fits (they cannot be

rejected by a G² goodness-of-fit measure).
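The G² statistic mentioned in the caption compares observed response counts with the counts expected under each fitted model. A minimal sketch, using hypothetical counts and a purely illustrative degrees-of-freedom value:

```python
import numpy as np
from scipy.stats import chi2

def g_squared(observed, expected):
    # Likelihood-ratio goodness of fit; cells with zero observations contribute nothing.
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    nonzero = observed > 0
    return 2.0 * np.sum(observed[nonzero] * np.log(observed[nonzero] / expected[nonzero]))

# Hypothetical observed counts and model-expected counts for six rating categories.
observed = [40, 60, 80, 120, 180, 320]
expected = [45, 55, 85, 115, 185, 315]
G2 = g_squared(observed, expected)
print(round(G2, 2), "p =", round(chi2.sf(G2, df=3), 3))   # df here is purely illustrative
```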

Figure 7. Decision space for Hautus et al.’s (2008) bivariate model of item and

source recognition. Horizontal lines reflect the old-new confidence criteria, which

depend only on the old-new evidence dimension. Curved boundaries indicate the

optimal (likelihood-based) source confidence criteria (“1” = Sure Source B; “6” =

Sure Source A). From: Hautus, M., Macmillan, N. A., & Rotello, C. M. (2008). Toward

a complete decision model of item and source recognition. Psychonomic Bulletin &

Review, 15, 889-905. Figure 8. Reprinted with permission of Springer.

Figure 8. Decision space for Onyper et al.’s (2010) model of source and item

recognition. Like the HTSDT model, this model assumes source information is

recollected with some probability (upper distributions). In the absence of

recollection, responses are based on familiarity (lower distributions). Confidence

ratings (‘1’ through ‘9’) depend only on the item or source dimension. From: Onyper,

Zhang, & Howard (2010). Some-or-none recollection: Evidence from item and

source memory, Journal of Experimental Psychology: General, 139, 341–364, Figure

6A. Reprinted with permission of the American Psychological Association.

Figure 9. Two-alternative forced choice (2AFC) decision space.
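Under the EVSDT, the 2AFC decision depicted in Figure 9 can be framed as a comparison of the two strengths: the response is correct whenever the target's strength exceeds the lure's. A minimal simulation with an illustrative d' value recovers the standard prediction that proportion correct equals Φ(d'/√2).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2016)
d_prime, n_trials = 1.5, 200_000

# On each 2AFC trial the stronger alternative is chosen, so the response is correct
# whenever the target's strength exceeds the lure's (a positive difference).
target_strength = rng.normal(d_prime, 1.0, n_trials)
lure_strength = rng.normal(0.0, 1.0, n_trials)
simulated_accuracy = np.mean(target_strength > lure_strength)

# The difference is distributed N(d', 2), so P(correct) = Phi(d' / sqrt(2)).
print(round(simulated_accuracy, 3), round(norm.cdf(d_prime / np.sqrt(2)), 3))
```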


Figure 10. Heathcote et al.’s (2010) model of the 2AFC confidence-accuracy

inversion data. Positive differences result in correct decisions; negative differences

in errors. Confidence is defined by distance to the criterion (0-point). Remember

responses are made for differences larger than the R/K criterion, and know

responses for small differences. The arrow indicates trial-to-trial variability in the

location of the R/K criterion. Adapted from: Clark, S. E. (1997). A Familiarity-Based

Account of Confidence-Accuracy Inversions in Recognition Memory. Journal of

Experimental Psychology: Learning, Memory, & Cognition, 23, 232-238. Adapted with

permission of the American Psychological Association.

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.

Figure 9.

Figure 10.