Signal Detection Theories of Recognition Memory

Caren M. Rotello
Department of Psychological & Brain Sciences, University of Massachusetts
June 21, 2016 – to appear as: Rotello, C. M. (in press). Signal detection theories of recognition memory. To appear in J. T. Wixted (Ed.), Learning and Memory: A Comprehensive Reference, 2nd edition (Vol. 4: Cognitive Psychology of Memory). Elsevier. Please read and cite the published version.
Correspondence:
Caren M. Rotello Department of Psychological & Brain Sciences University of Massachusetts 135 Hicks Way Amherst, MA 01003-9271 (413) 545-1543 [email protected]
Abstract
Signal detection theory has guided thinking about recognition memory since it was
first applied by Egan in 1958. Essentially a tool for measuring decision accuracy in
the context of uncertainty, detection theory offers an integrated account of simple
old-new recognition judgments, decision confidence, and the relationship of those
responses to more complex memory judgments such as recognition of the context in
which an event was previously experienced. In this chapter, several commonly used
signal detection models of recognition memory, and their threshold-based
competition, are reviewed and compared against data from a wide range of tasks.
Overall, the simpler signal detection models are the most successful.
Keywords:
Associative memory; d'; Discrimination accuracy; Dual-process recognition memory; Item memory; Familiarity; Mixture models; Modeling; Plurality discrimination; Recollection; Recognition memory; Signal detection theory; Source memory; Threshold models
1 Introduction
Recognition memory errors are unavoidable. Sometimes we fail to recognize
an individual we’ve had a casual conversation with; other times we initiate a
conversation with a stranger, erroneously thinking we met him at a campus event.
These errors are not limited to weak memories: Lachman and Field (1965) asked
subjects to study a single list of 50 common words as many as 128 times and found
that the percentage of studied words that are called “old” reached a maximum of
88%, while the false recognition of an unstudied word hovered around 2%.
Although this performance level is excellent, free recall of those studied words
under the same conditions was 98% correct with no intrusion errors. The difference
in memory accuracy between recall and recognition suggests that the decision
process itself plays an important role in recognition memory.
Signal detection theory (SDT: Green & Swets, 1966; Macmillan & Creelman,
2005) provides a theoretical framework for quantifying memory accuracy as well as
the role of decision processes. It makes explicit the balance between possible
memory errors (missed studied items and false alarms to lures), as well as the
inevitability of those errors. In some applications, such as eyewitness
identifications, the different error types come with variable implicit costs: failing to
identify the guilty suspect in a lineup allows a criminal to go free; falsely accusing
the wrong individual leaves the true criminal unpunished and may send an innocent
person to jail. In experimental settings, there may be performance penalties (cash
or point deductions, delayed onset of the subsequent trial) associated with missed
opportunities to recognize a studied item or with false identifications of unstudied
memory probes. These considerations may suggest that one type of error is more
desirable – or, at least, less undesirable – than the other, and decision makers can
shift their strategy to reduce the probability of the costlier error. Modeling
recognition memory using signal detection allows independent assessment of the
decision process and the ability of the individual to discriminate categories of items.
Competing models of recognition memory make different assumptions about
the nature of memory errors. Discrete state, or threshold, models (e.g., Krantz,
1969) assume that probing memory with a studied item can either result in its
detection as a previously experienced item, or in no information at all being
available about its status. For this reason, these models are often described as
having complete information loss below a recollection threshold. In the most
common version of these models, errors occur either because of a random guessing
process, or, sometimes, because a response is offered that directly contradicts the
evidence available from memory. The relationship between misses and false alarms
is not specified in advance by threshold models; as we'll see, different parameter
choices allow a good deal of flexibility. In particular, certain parameter settings
allow either misses or false alarms to be avoided entirely.
A second type of competing model assumes that more than one process
contributes to recognition memory decisions, in agreement with Mandler’s (1980)
well-known butcher on the bus example. The man on the bus may be recognized
because he seems familiar; both models in this class assume that familiarity
operates as a signal detection process. The man may also be recognized because we
remember how we know him: that he is the butcher. This recollection process has
been described as operating either as a second signal detection process (Wixted &
Mickes, 2010) or as a threshold process (Yonelinas, 1994). The recognition memory
errors that are predicted by these models vary with their assumptions, as will be
described in section 2.
Finally, a third type of competing model assumes that recognition decisions
for studied items are based on a mixture of two types of trials, those on which the
study item was attended, and those for which it was not. As we will see, these
mixture models inherit most of the properties of the signal detection models,
including the inability to avoid a trade-off between the two types of recognition
errors.
This chapter is divided into three main sections. I begin by describing the
competing models in detail. Next, I review the empirical evidence that discriminates
the models, concluding that the traditional signal detection approach provides the
best overall description of the literature. Finally, I consider some challenges for
signal detection models.
2 The Models
2.1 Equal- (EVSDT) and Unequal-Variance (UVSDT) Signal
Detection Models
The earliest signal detection models of recognition memory assume
recognition decisions are made based on a single underlying evidence dimension
(see Figure 1). Both studied items (targets) and unstudied items (lures) are assumed
to resonate with memory to varying degrees, resulting in a distribution of observed
memory strengths; the strength of targets is increased by their recent study
exposure, shifting that distribution's mean to a greater value than that of the lures.
Recognition decisions are based on a criterion level of evidence, with positive
("old") decisions given to all memory probes whose strengths exceed that criterion,
and negative ("new") responses made otherwise. The proportion of the target
distribution that exceeds the criterion equals the theoretical proportion of studied
items that are identified as "old", which is the hit rate (H). The miss rate is the
proportion of targets that are erroneously called "new," so H + miss rate = 1.
Similarly, the proportion of the lure distribution that exceeds that same criterion
provides the false alarm rate (F), which is the proportion of lures that falsely elicit
an "old" response. Finally, the proportion of lures that are called "new" is the correct
rejection rate. By varying the location of the criterion, the hit and false alarm rates
can be increased or decreased.
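This decision rule is easy to state in code. The following sketch is illustrative rather than part of the chapter; it assumes a lure distribution of N(0, 1), an equal-variance target distribution of N(d', 1), and uses Python's standard-library normal CDF:

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def response_rates(d_prime, k):
    """Hit, miss, false-alarm, and correct-rejection rates when items
    whose memory strength exceeds criterion k are called "old"."""
    H = Phi(d_prime - k)   # targets ~ N(d', 1) exceeding k
    F = Phi(-k)            # lures ~ N(0, 1) exceeding k
    return {"H": H, "miss": 1 - H, "F": F, "CR": 1 - F}

# Moving the criterion shifts both H and F in the same direction:
liberal = response_rates(1.5, 0.0)       # higher H, but higher F too
conservative = response_rates(1.5, 1.5)  # lower F, but more misses
```

Sweeping k across a range of values traces out the full set of (F, H) pairs, i.e., the model's ROC.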
< Insert Figure 1 near here >
There are two important points to make about the SDT model. First, changes
in the criterion location always change hit and false alarm rates in the same
direction (both increasing, for more liberally-placed criteria, or both decreasing, for
more conservative criteria), though the observed differences may not be statistically
significant. Second, errors are impossible to avoid. A criterion that is liberal enough
to yield a low rate of missed targets will necessarily result in a very high false alarm
rate, and one that is conservative enough to result in a low false alarm rate will
necessarily produce a high miss rate.
We can measure participants' discrimination accuracy – that is, their ability
to distinguish the targets from the lures – in terms of the distance between the
means of the distributions, in standardized (z-score) units. When the two
distributions have the same variance (as in the upper row of Figure 1), this distance
is called d' and it is independent of the criterion location, k. In that case, the model
can be defined with only those two parameters. Setting the mean of the lure
distribution to 0 and its standard deviation to 1 (without loss of generality), the
false alarm rate is defined by
F = Φ(−k) (1)
where Φ is the cumulative normal distribution function. Similarly, the hit rate is
defined by the same criterion relative to the mean, d’, of the target distribution:
H = Φ(d' − k). (2)
We can combine equations 1 and 2 to see that
d' = zH − zF (3)
where z is the inverse of the normal CDF. Because the criterion location, k, drops
out of calculations in Equation 3, its value does not affect our estimate of
discrimination accuracy: response bias (k) and memory sensitivity (d') are
independent in this model. Effectively, this means that there is a set of (F, H) points
that all yield the same value of d'; each point simply reflects a different willingness
of the participant to say “old.” Connecting all of these possible (F, H) pairs yields a
theoretical curve called a receiver operating characteristic (ROC); transforming both
F and H to their z-score equivalents yields a zROC.
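The independence of d' and k can be made concrete numerically. This sketch (illustrative values, standard library only) sweeps the criterion and recovers the same d' from every resulting (F, H) pair via Equation 3:

```python
from statistics import NormalDist

Phi, z = NormalDist().cdf, NormalDist().inv_cdf  # normal CDF and its inverse

def operating_point(d_prime, k):
    """(F, H) for an equal-variance model at criterion k (Equations 1-2)."""
    return Phi(-k), Phi(d_prime - k)

criteria = [-0.5, 0.0, 0.5, 1.0]                 # liberal to conservative
points = [operating_point(1.0, k) for k in criteria]
d_estimates = [z(H) - z(F) for F, H in points]   # Equation 3 at each point
# Every estimate equals the generating d' of 1.0; only bias differs.
```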
Several example ROCs and corresponding zROCs are shown in the upper row
of Figure 1, for different values of d'. The ROCs and zROCs associated with higher
decision accuracy fall above those with lower accuracy: for any given false alarm
rate, the ROC yielding higher accuracy has a greater predicted hit rate. The points
on each ROC and zROC vary only in criterion location, k, with more conservative
response biases (larger k) yielding points that fall on the lower left end of the ROC
because larger values of k result in lower F and H. In probability space, the ROCs are
curved and symmetric about the minor diagonal. In normal-normal space, we can
use Equation 3 to see that the zROC is a line, zH = d' + zF, for which the y-intercept
equals d' and the slope is 1. The model in the upper row of Figure 1, and thus
equations 1-3, applies only when the variability of the target distribution equals that
of the lure distribution. For this reason, this model is called the equal-variance SDT
(EVSDT) model. As we will soon see, the equal variance property of this basic model
causes the symmetry in the theoretical ROC.
Egan (1958) was the first to apply this model to recognition memory data.
Participants in two experiments studied 100 words either once or twice, and then
made old-new decisions on those studied words mixed with 100 lures. Their
responses were made on a confidence rating scale ranging from “positive” that the
test probe was studied to “positive” that it was new. The response probabilities in
each of these confidence bins can be modeled with the same overall discrimination
value (d' ) but a different criterion (k1-km, see Figure 1). Egan's data immediately
suggested a problem with the model: the ROCs were not symmetric, either for
individual data or for the average across participants, and therefore were not
consistent with the equal-variance SDT model. (Figure 6 shows the ROC produced
by one of his participants.) Fortunately, it is straightforward to modify the model to
allow for unequal-variance distributions.
Again assuming (without loss of generality) that the lure distribution is
normal with a mean of 0 and a standard deviation of 1, we can set the mean of the
target distribution to be d and its standard deviation to be s. The lower row of
Figure 1 shows what these distributions might look like. In this unequal-variance
version of the model, the UVSDT, Equation 1 still holds because the false
alarm rate is defined by the mean and standard deviation of the lure distribution,
and those haven’t changed. The hit rate calculation in the unequal-variance model
must take account of the standard deviation of the target distribution, s, yielding
H = Φ((d − k) / s). (4)
Notice that discrimination accuracy, which is the distance between the means of the
target and lure distributions, now can be defined in several ways. The two most
obvious approaches are to measure the distance in units of the standard deviation of
the lure distribution (d'1),
d'1 = s⋅zH − zF, (5)
or in units of the standard deviation of the target distribution (d'2),
d'2 = zH − (1/s)⋅zF. (6)
Several example ROCs are shown in the lower row of Figure 1, for different values of
d'1 (and, equivalently, d'2). The corresponding zROCs are also shown. The
y-intercept is the value of zH when zF = 0; Equation 6 tells us that zH = d'2 at that
point. It also shows that the zROC, zH = d'2 + (1/s)⋅zF, is a line with slope equal to the
ratio of the lure and target distributions' standard deviations. This connection of the
slope of the zROC and the ratio of standard deviations is a handy property that has
theoretical significance. The linear form of the zROC, and its slope, are heavily
studied aspects of recognition ROCs.
A third strategy for measuring the distance between the target and lure
distribution means is to use units that reflect a compromise between the two
standard deviations. The best strategy for doing so involves the root mean square
standard deviation, yielding da,
da = √(2 / (1 + s²)) ⋅ (s⋅zH − zF) (7)
because of its relationship to the area under the ROC, Az:
Az = Φ(da / √2). (8)
Az equals the proportion correct an unbiased participant would achieve in a task
involving selection of the target in a set of target-lure pairs. Notice that da = d' when
the variances of the target and lure distributions are equal (s = 1).
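These accuracy measures can also be checked numerically. The sketch below (parameter values are hypothetical, chosen only for illustration) computes d'1, d'2, da, and Az for an unequal-variance model, and confirms that Az matches the area under the full ROC obtained by sweeping the criterion:

```python
from math import sqrt
from statistics import NormalDist

Phi, z = NormalDist().cdf, NormalDist().inv_cdf

d, s = 1.5, 1.25   # target mean and SD; the lure distribution is N(0, 1)

def operating_point(k):
    return Phi(-k), Phi((d - k) / s)   # (F, H): Equations 1 and 4

F, H = operating_point(0.8)
d1 = s * z(H) - z(F)                             # Equation 5 (lure-SD units)
d2 = z(H) - z(F) / s                             # Equation 6 (target-SD units)
da = sqrt(2 / (1 + s**2)) * (s * z(H) - z(F))    # Equation 7
Az = Phi(da / sqrt(2))                           # Equation 8

# Az should equal the area under the whole ROC (trapezoidal sweep of k):
pts = [operating_point(8 - i * 0.01) for i in range(1601)]
area = sum((F2 - F1) * (H1 + H2) / 2
           for (F1, H1), (F2, H2) in zip(pts, pts[1:]))
```

Note that da reduces to d' when s = 1, and the numeric area agrees with Az to within the integration error.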
2.2 Dual-process and Mixture Signal Detection Models
2.2.1 The high-threshold signal detection model (HTSDT)
Yonelinas (1994) proposed a very different explanation of the asymmetry in
the recognition ROC, namely that participants sometimes recollect studied items.
Because recollection can't occur for lures and because it's likely to result in high
confidence responses, only the highest-confidence hit rate is increased by the
contribution of recollection. This causes the left end of the ROC to be shifted
upwards, resulting in an asymmetric function. Yonelinas assumed that, in the
absence of recollection, responses are based on an equal-variance signal detection
process that assesses the familiarity of the memory probe. In this dual-process
model, recollection operates as a high-threshold process (lures can't be recollected),
so we'll call it the high threshold signal detection (HTSDT) model. According to
HTSDT, the hit rate is defined by
H = R + (1 − R)⋅Φ(d' − k), (9)
where R is the probability of recollection, and the false alarm rate is given by
Equation 1; the model can be described by the parameters R and d'. Some example
ROCs and zROCs are shown in Figure 2. Notice that the zROC for the HTSDT model is
curved. In this model, familiarity-based recognition errors trade off against
one another, exactly as in the EVSDT and UVSDT models. However, the recollection
process cannot result in false alarms, and if all responses to targets are based on
recollection, there can be no misses. 1
< Insert Figure 2 near here >
2.2.2 The continuous dual process signal detection model (CDP)
The idea that some items on a recognition test may be recollected has a long
history (e.g., Mandler, 1980). However, nothing about recollection demands that it
is a high-threshold process. Wixted and Mickes (2010) proposed a dual-process
model in which both recollection and familiarity are based on underlying signal
detection processes, the results of which are summed to yield an old-new response.
For this reason, the CDP model is identical to the UVSDT model for item recognition.
However, the CDP model also allows the two processes to be queried separately.
2.2.3 The mixture signal detection model (MSDT)
DeCarlo (2002, 2003, 2007) proposed an extension of the standard EVSDT
1 The Sources of Activation Confusion (SAC) Model (Reder et al., 2000) is a process model similar to the HTSDT model. This chapter will not discuss process models.
model in which study items are not always fully encoded. On some trials, the item is
fully encoded (resulting in a relatively large increment in memory strength, dFull),
whereas on other trials the study item is only partially encoded (leading to a small
increment in strength, dPartial). The probability of full encoding is given by the
parameter λ, which can be interpreted as a measure of attention. At test, the target
distribution reflects a mixture of responses from these two distributions. This
mixture distribution for the targets is not Gaussian, and its precise form depends on
both λ and the distance between the means of the full- and partially-encoded
distributions. The hit rate is defined as follows:
H = λ⋅Φ(dFull − k) + (1 − λ)⋅Φ(dPartial − k) (10)
The false alarm rate is defined as in Equation 1. Notice the close relationship
between the MSDT and HTSDT models: when dFull is very large (as is often the case
when fit to empirical data), then the first component of the hit rate is essentially just
λ, analogous to the R parameter of the HTSDT model.
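The correspondence between Equations 9 and 10 is easy to verify numerically; the parameter values below are hypothetical:

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def msdt_hit(lam, d_full, d_partial, k):
    """Equation 10: hit rate as a lambda-weighted mixture of fully and
    partially encoded targets."""
    return lam * Phi(d_full - k) + (1 - lam) * Phi(d_partial - k)

# When d_full is very large, the first term is effectively lambda itself,
# so the model mimics HTSDT with R = lambda:
H_msdt = msdt_hit(0.3, 8.0, 1.0, 1.5)
H_htsdt = 0.3 + (1 - 0.3) * Phi(1.0 - 1.5)   # Equation 9 analogue
```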
The decision space assumed by the MSDT model is shown in Figure 3, as well
as several example ROCs and zROCs. Note the unusual form of the zROCs, which will
become important in the discussion of associative and source recognition.
< Insert Figure 3 near here >
2.3 The Double High-Threshold (2HT) Model
The remaining model of recognition memory that has been popular in recent
years is a discrete state model. The double high-threshold (2HT: Snodgrass &
Corwin, 1988) model assumes that memory probes result in different internal states
(see Figure 4). Targets can either be recollected, in which case they are always
judged to be “old,” or they result in a state of uncertainty from which the participant
must guess "old" or "new." Lures can be detected as new, resulting in a "new"
decision, or they result in the same state of uncertainty as un-recollected targets.
The model is called a double high-threshold model because there are two
thresholds: the lures can't cross the recollection threshold and be called "old" from
that state, and the targets can't be detected as new. Memory errors always result
from the uncertain state. In the 2HT model, the hit rate depends on the probability
that targets are recollected (po) and the probability that, if not recollected, they
result in a guess "old" decision (g):
H = po + (1 − po)⋅g. (11)
An "old" response to a lure can only occur because of guessing, so the false alarm
rate is determined by the probability that lures fail to be detected as new and the
rate of "old" guessing:
F = (1 − pn)⋅g. (12)
In this form, the 2HT model predicts that the ROC is a line with y-intercept equal to po and slope of (1 − po)/(1 − pn). Different response biases can occur in simple old-new
decision tasks when the guessing rate, g, is varied. Sensible changes in g occur when
different base-rates of targets and lures appear on a memory test, or when
participants are offered different penalties and rewards for specific responses.
However, it’s possible within this model for the g parameter to be set to 0, so that
false alarms are entirely avoided, at the price of an increase in the miss rate.
Similarly, there’s nothing about the model itself that prevents the g parameter from
being set to 1, so that misses are eliminated at the price of an increase in false
alarms.2
< Insert Figure 4 near here >
The 2HT model does not naturally predict confidence ratings, though it can
be extended to do so by adding parameters that map the internal states (recollect,
uncertain, detect-new) onto the possible confidence judgments, as in Figure 5. If
recollected targets are always given highest-confidence "old" responses, then the
confidence-based ROC predicted by this modified 2HT model is still linear. However,
the model can accommodate the curved confidence-based ROCs that are observed
empirically, if recollected targets (which logically should receive the highest-
confidence response) are allowed to yield lower-confidence "old" responses (e.g.,
Malmberg, 2002; Bröder & Schütz, 2009). Similarly, the detected lures are allowed
to result in lower-confidence "new" decisions. (Some researchers even allow responses to
detected items to fall in the "opposite" response category, so that recollected targets
may still yield a "new" decision.) This modified model requires more parameters to
describe the mapping from the internal states, but the increase in free parameters
results in much better fits to data (and greater model flexibility, of course).
2 More complex versions of the basic 2HT model have also been proposed (e.g., Brainerd, Reyna, & Mojardin, 1999). Testing these variants requires experimental conditions beyond those considered in this chapter.
< Insert Figure 5 near here >
3 The Evidence
In 1970, Banks expressed the optimistic view that signal detection modeling
would ultimately allow us to identify the number and nature of processes involved
in recognition memory decisions. At the time, the models under consideration
included the SDT and threshold models (including the 2HT model and several other
variants). Those models do make quite distinct predictions about the form of the
ROC (see Pazzaglia, Dube, and Rotello, 2013, for details). Specifically, the 2HT model
can accommodate a curved ROC based on confidence ratings, but it must predict a
linear ROC if participants make binary old-new decisions and the responses that
define the different operating points are collected independently. In contrast, the
SDT model always predicts a curved ROC will result, regardless of whether
confidence ratings or binary old-new decisions are collected. As we’ll see, Banks’s
optimism was reasonably well placed for these models.
With the addition of the hybrid HTSDT and mixture MSDT models, however,
the landscape became more challenging. These models do make predictions that, in
principle, allow them to be distinguished from the others. For example, the zROC is
predicted to be linear for SDT and to have an upward curvature for the HTSDT
model, and the HTSDT model predicts that both the slope of the zROC and its
curvature are systematically influenced by the probability of recollection. Despite
these apparently contrasting predictions, the models turn out to fit basic item
recognition data quite well, requiring that the experimental paradigms be expanded.
In section 3.1, I review the relevant data from item recognition experiments before
turning, in section 3.2, to new paradigms designed to influence recollection
probabilities and, in section 3.3, to experiments that rely on differential model
predictions in paradigms that do not yield ROC data. To preview the results of this
literature survey, the UVSDT model comes out ahead on nearly every measure.
However, section 4 will consider some potential challenges to the success of the
UVSDT model.
3.1 Traditional Item Recognition Tasks
Standard item recognition experiments, like Egan's (1958), provide a wealth
of data that can be reasonably well described by all of the models considered here.
In these experiments, participants study a set of items (typically words) one at a
time. At test, they are asked to identify the studied words from a test list that
includes both targets and lures. Subjects' responses may be simple old-new
decisions for each memory probe, but more commonly they are ratings of
confidence that the probe was studied. These ratings are then used to generate
ROCs, a strategy that has revealed a number of "regularities" in the data. For
example, the zROC is typically linear with a slope of about 0.8 as long as memory
accuracy is well above chance (e.g., Ratcliff, Sheu, & Gronlund, 1992; Glanzer, Kim,
Hilford, & Adams, 1999; Yonelinas & Parks, 2007).
3.1.1 Confidence-based ROCs
3.1.1.1 Fits of the models to data
Confidence-based ROCs are generated by asking participants to rate their
confidence that each memory probe was studied (e.g., 6 = “sure old”, 5 = “probably
old”, 4 = “maybe old”, 3 = “maybe new”, 2 = “probably new”, 1 = “sure new”). The
probability of a hit and a false alarm in each confidence bin is then calculated; these
values are incrementally summed to yield the operating points on the ROC. For
example, the most conservative point, yielding the lowest hit and false alarm rates,
is based on responses in the “sure old” category; the second point on the ROC
depends on the sum of the response probabilities in the “sure old” and “probably
old” bins, etc. The final point on the confidence ROC is always (1,1), when all
responses are included.
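This cumulation procedure is simple to implement; the rating counts below are hypothetical, chosen only to illustrate the bookkeeping:

```python
def confidence_roc(target_counts, lure_counts):
    """ROC operating points from rating counts ordered from "sure old"
    (first bin) to "sure new" (last bin)."""
    n_targets, n_lures = sum(target_counts), sum(lure_counts)
    points, hits, fas = [], 0, 0
    for t, l in zip(target_counts, lure_counts):
        hits += t    # cumulative "old" responses to targets at this cut
        fas += l     # cumulative "old" responses to lures at this cut
        points.append((fas / n_lures, hits / n_targets))  # (F, H)
    return points

# Hypothetical counts for ratings 6 = "sure old" ... 1 = "sure new":
roc = confidence_roc([61, 15, 10, 6, 5, 3], [5, 9, 11, 15, 25, 35])
# roc[0] is the most conservative point; roc[-1] is always (1, 1).
```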
Pazzaglia et al. (2013) suggested that discriminating the UVSDT and HTSDT
models with ROCs would be extremely difficult because they make very similar
predictions. Indeed, both models have a parameter to summarize old-new
discrimination accuracy (d', d) and a parameter to capture the asymmetry of the
ROC (R, s). Similarly, the MSDT model can be viewed as a version of the HTSDT
model if the full attention trials yield (essentially) perfect encoding. When applied
to data, all of these models fit well. Figure 6 shows the fit of each of these models, as
well as the 2HT model, to the data of a single subject in Egan’s (1958, Exp. 1) study.
As is obvious in the figure, all of the models provide excellent descriptions of these
data; only the EVSDT model can be rejected based on a goodness of fit statistic.
< Insert Figure 6 near here >
High-fidelity fits like those in Figure 6 are routinely observed. For example,
DeCarlo (2002) compared the MSDT and UVSDT models’ ability to fit data, finding
them to be essentially tied. Yonelinas (1999b, p. 514) noted that the HTSDT and
UVSDT “models provided an accurate account of the ROCs, capturing more than
99.9% of the variance of the average ROCs.” For the very same studies, however,
Glanzer et al. (1999) found that the UVSDT model provided a better fit in all 10 data
sets. Glanzer et al. (1999) also tested specific HTSDT predictions about the
relationship between the probability of recollection, the slope of the zROC, and the
curvature of the zROC, finding no support for those predictions. Similarly,
Heathcote (2003) compared the fits of the UVSDT and HTSDT models for a set of
experiments in which the targets and lures were similar to one another. The similarity
manipulation was intended to increase the probability that recollection would play
a role in the recognition of targets (Westerman, 2001), yet the zROCs showed no
evidence of the curvature that is predicted by the HTSDT model. Other comparisons
of the UVSDT and HTSDT models have also favored the signal detection view (e.g.,
Kelley & Wixted, 2001; Healey, Light, & Chung, 2005; Rotello, Macmillan, Hicks, &
Hautus, 2006; Dougal & Rotello, 2007; Kapucu, Rotello, Ready, & Seidl, 2008; Jang,
Wixted, & Huber, 2011).
3.1.1.2 Assessments of model flexibility
The UVSDT, HTSDT, 2HT, and MSDT models all provide reasonable fits to
data, but they do have different assumptions and therefore can’t all provide an
accurate account of the underlying recognition memory processes. As Roberts and
Pashler (2000) pointed out, a good fit to data does not imply that the model itself is
a good one; it might have enough flexibility to fit a wide range of possible data, for
example, including random noise. The flexibility of a model comes from its number
of free parameters and its functional form (Pitt, Myung, & Zhang, 2002). Increasing a
model’s parameters generally increases its ability to fit data, but two models with
the same number of parameters may nonetheless differ in flexibility. For example, a
sinusoidal model y = a sin (bx) can exactly fit any data generated by the linear model
y = cx + d, as well as fitting data that are non-linear; the sinusoidal model has greater
flexibility because of its functional form. For this reason, we must determine the
relative flexibility of the competing models of recognition memory before we can
conclude that the UVSDT model provides the best description of the data.
The number of parameters for each model is shown in Table 1 for a
confidence-rating old-new task. The UVSDT and HTSDT models have the same
number of parameters, whereas the EVSDT model has one fewer, and the MSDT
model one more. The 2HT model has a high degree of flexibility because the state-
response mapping parameters are selected by the experimenter. Wixted (2007)
reported a small-scale model-recovery simulation of the UVSDT and HTSDT models
that concluded that the two models were about equally flexible, a conclusion that
has been modified only slightly in subsequent work.
< Insert Table 1 near here >
Jang et al. (2011) used a parametric bootstrap cross-fitting method (PBCM:
Wagenmakers et al., 2004) to compare the flexibility of the UVSDT and HTSDT
models. The PBCM has several steps. Essentially, model parameters are sampled
from the range that would be estimated from fits to real data, and those parameters
are used to generate simulated data from the models. Then, both models are fit to
the simulated data and the difference in the resulting goodness of fit measures
(ΔGOF) is computed. This process is repeated many times to generate a distribution
of observed ΔGOF values when each model generated the data. The degree of
overlap of these distributions is a measure of how well the models mimic each other
(see Wagenmakers et al., 2004; Cohen, Rotello, & Macmillan, 2008, for details). For
group-level data, Jang et al. concluded that models could be readily distinguished,
and that the UVSDT model was very slightly more flexible than the HTSDT model. In
contrast, a similar analysis by Cohen et al. (2008) concluded that the UVSDT model
was slightly less flexible. A related but smaller-scale comparison of a version of the
2HT and UVSDT models concluded that the 2HT model has greater flexibility (Dube,
Rotello, & Heit, 2011).
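The bootstrap cross-fitting logic can be sketched compactly. The toy version below is not Jang et al.'s implementation: it assumes the rating criteria are fixed and known, fits only d (and s) by coarse grid search on a multinomial −2 log likelihood, and uses small simulated samples, but it preserves the generate-from-each-model, fit-both, compare-the-difference structure:

```python
import random
from math import log
from statistics import NormalDist

Phi = NormalDist().cdf
random.seed(1)
CRITERIA = [-1.0, -0.5, 0.0, 0.5, 1.0]   # fixed, known rating criteria
N = 200                                   # trials per item type

def bin_probs(mean, sd):
    """Probability of each of the 6 rating bins for a N(mean, sd) item."""
    cuts = [0.0] + [Phi((k - mean) / sd) for k in CRITERIA] + [1.0]
    return [hi - lo for lo, hi in zip(cuts, cuts[1:])]

def simulate(d, s):
    """Rating counts for N lures (N(0,1)) and N targets (N(d, s))."""
    lures = random.choices(range(6), weights=bin_probs(0.0, 1.0), k=N)
    targs = random.choices(range(6), weights=bin_probs(d, s), k=N)
    return ([lures.count(i) for i in range(6)],
            [targs.count(i) for i in range(6)])

def neg2loglik(counts, mean, sd):
    return -2 * sum(c * log(max(p, 1e-12))
                    for c, p in zip(counts, bin_probs(mean, sd)))

def fit(data, variable_s):
    """Coarse grid-search ML fit; returns the minimized -2 log L."""
    lure_counts, targ_counts = data
    grid_d = [i * 0.1 for i in range(31)]
    grid_s = [1 + i * 0.1 for i in range(11)] if variable_s else [1.0]
    return min(neg2loglik(lure_counts, 0.0, 1.0)
               + neg2loglik(targ_counts, d, s)
               for d in grid_d for s in grid_s)

def delta_gof(data):
    """GOF(EVSDT) - GOF(UVSDT): larger values favor UVSDT."""
    return fit(data, False) - fit(data, True)

# Core loop: generate from each model, fit both, collect the difference.
dgof_from_ev = [delta_gof(simulate(1.0, 1.0)) for _ in range(20)]
dgof_from_uv = [delta_gof(simulate(1.5, 1.8)) for _ in range(20)]
# Overlap of these two distributions indexes how well the models mimic
# each other; little overlap means the data are diagnostic.
```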
When an individual subject provides the initial data that are used to estimate
the model parameters from which simulated data are generated, the distributions of
ΔGOF values also provide quantitative information about how diagnostic those data
are. Diagnostic data are those for which the distributions have little to no overlap;
for non-diagnostic data, the overlap may be substantial. The extent of overlap can
be used as a measure of the probability that the wrong model is identified by the
GOF measures. Jang et al. (2011) used this approach to assess the diagnosticity of
97 individual participants’ ROCs, finding that many provided non-diagnostic data.
For those individuals, there were two common findings: the slope of the zROC was
likely to be near 1, and the HTSDT model was more likely to be selected. For
individuals who provided more diagnostic data (the DGOF distributions overlapped
less), the slope of the zROC tended to be shallower, and the UVSDT model was
usually selected. The clear implication of Jang et al.’s work is that the UVSDT model
is the better-fitting model when the data actually allow a strong conclusion to be
drawn.
Overall, the analyses of model flexibility suggest that the success of the
UVSDT model cannot be attributed to greater flexibility. For group-level data, the
UVSDT and HTSDT models have similar degrees of flexibility, and for individual
subjects’ data, the UVSDT model is selected only when the data are actually
informative.
3.1.1.3 Explanations of the zROC slope
There are qualitative reasons to prefer the UVSDT model over the HTSDT
model, as well as the quantitative reasons in section 3.1.1.2. For one, the slope of
the zROC has a natural explanation in terms of variability in the increment to an
item's strength that occurs during study (Wixted, 2007); this idea is reflected in the
assumptions of recent process models of memory (e.g., Shiffrin & Steyvers, 1997).
In contrast, the HTSDT model assumes that the slope of the zROC is directly related
to the probability of recollection, a hypothesis that has not survived inspection (e.g.,
Glanzer et al., 1999).
The variable encoding hypothesis is that during study the strength of the
item is incremented by an amount sampled from a baseline distribution appropriate
for the encoding task, plus an amount sampled at random from a noise distribution
with a mean of zero. Koen and Yonelinas (2010) attempted to discredit this
hypothesis by asking one group of subjects to study items for either a shorter (1 sec)
or longer (4 sec) amount of time, and another group to study the same items for 2.5
seconds each. The slope of the zROC did not differ across groups. However, their
experiment assessed the impact of mixing two distributions on the slope of the
zROC, rather than testing the variable encoding hypothesis (Jang, Mickes, & Wixted,
2012; Starns, Rotello, & Ratcliff, 2012). A mixture of two Gaussian distributions (one for strong items and another for weak) isn't Gaussian, so the slope of the resulting zROC doesn't provide a valid estimate of the ratio of lure and target standard deviations, 1/s. Thus, there is still no definitive test of encoding variability. Although the Koen and Yonelinas (2010) experiment appears to provide a good test of the MSDT model, Starns et al. (2012) showed that their manipulation of encoding strength lacked power.
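The mixture point can be checked with a short simulation. In the sketch below (Python; the means, mixing proportion, and criterion placements are arbitrary illustrative choices), every encoding condition has unit variance, so the "true" slope 1/s is 1 within each condition, yet the zROC computed from the strong-weak mixture has a slope below 1:

```python
import numpy as np
from scipy.stats import norm

# Lures ~ N(0,1); targets at test are a 50/50 mixture of weakly encoded
# (mean 1) and strongly encoded (mean 2) items, each with unit variance,
# so within every encoding condition the true slope 1/s equals 1.
criteria = np.linspace(-1.0, 3.0, 9)
far = 1 - norm.cdf(criteria, loc=0, scale=1)
hit = 0.5 * (1 - norm.cdf(criteria, loc=1, scale=1)) \
    + 0.5 * (1 - norm.cdf(criteria, loc=2, scale=1))

# zROC: z-transform both coordinates and fit a straight line
zf, zh = norm.ppf(far), norm.ppf(hit)
slope = np.polyfit(zf, zh, 1)[0]
print(f"zROC slope for the mixed targets: {slope:.2f}")
```

The fitted slope comes out below 1 (and the zROC is slightly curved) purely because the mixture of two unit-variance Gaussians is not itself Gaussian, which is why a strength-mixing manipulation cannot diagnose encoding variability.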
One interesting test of the HTSDT, 2HT, and UVSDT accounts of zROC slope is
this: Does the relative variability in the observed confidence ratings for targets and
lures correspond to the slope of the zROC? Mickes, Wixted, and Wais (2007) asked
exactly this question. Participants in a standard old-new recognition experiment
made their responses on either a 20- or 99-point confidence scale. These confidence
ratings were used to generate empirical ROCs, which were well-described by the
UVSDT model. The estimated slope parameter, which equals the ratio of standard
deviations of the lure and target distributions, was about 0.8, as usual. Mickes et al.
also calculated the ratio of the standard deviations of each subject's confidence
ratings to the lures and to the targets, independently of the ROC analysis. The
HTSDT does not predict any relationship between these two ratios, because it
assumes that the zROC slope is determined by the probability of recollection (which
doesn't depend on the confidence ratings). Similarly, the 2HT model does not
predict a relationship between the two measures of variability because the
confidence ratings are not based on memory strength. Even the UVSDT model does
not constrain the two ratios to yield the same value: If the criteria are tightly spaced for higher memory strengths and spread out for weaker strengths, then the ratings-based ratio may underestimate the ROC-based value. On the other hand, the ratings-based ratio may overestimate the ROC-based value if the criteria are widely spaced for high evidence values and compressed for lower evidence values. Overall,
however, the average value of the two estimates of the standard deviation ratios
was identical in one experiment and highly similar in the other. Across participants,
the correlations of the two estimates were .61 and .83 in the two experiments,
providing strong support for the UVSDT model over the competitors.
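The logic of the Mickes et al. comparison can be sketched with a minimal simulation (Python; the sensitivity, target standard deviation, scale granularity, and evenly spaced criteria are all illustrative assumptions, not values from their experiments). When evenly spaced criteria map strength onto the rating scale, the ratings-based SD ratio and the zROC slope agree, as the UVSDT model allows:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N = 200_000
lures = rng.normal(0.0, 1.0, N)
targets = rng.normal(1.0, 1.25, N)  # UVSDT: target SD exceeds lure SD

# Map strength onto a 20-point confidence scale with evenly spaced criteria
edges = np.linspace(-2.5, 3.5, 19)
r_lure = np.digitize(lures, edges) + 1    # ratings 1..20
r_targ = np.digitize(targets, edges) + 1

# (1) Direct ratio of the rating standard deviations (lures / targets)
sd_ratio = r_lure.std() / r_targ.std()

# (2) Slope of the zROC built from the same ratings
zf, zh = [], []
for k in range(2, 21):  # cumulative "old" rate at each rating cutoff
    f, h = np.mean(r_lure >= k), np.mean(r_targ >= k)
    if 0 < f < 1 and 0 < h < 1:
        zf.append(norm.ppf(f))
        zh.append(norm.ppf(h))
slope = np.polyfit(zf, zh, 1)[0]

print(f"SD ratio: {sd_ratio:.2f}, zROC slope: {slope:.2f}")
```

Both estimates land near the generating ratio of 0.8; unevenly spaced criteria (per the caveat above) would pull the ratings-based ratio away from the ROC-based value.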
3.1.1.4 A test of the 2HT model: conditional independence
Like the other models, the 2HT model can fit the confidence-based ROCs (see
Figure 6), at the price of additional parameters to map from internal states to
response ratings (compare Figures 4 and 5). Moreover, to fit the curvature that is
ubiquitous in these confidence ROCs, the 2HT model must assume that participants
give at least some lower-confidence responses to studied items that they have
detected as old (e.g., Erdfelder & Buchner, 1998; Malmberg, 2002; Bröder & Schütz,
2009). Said differently, to fit the confidence-based ROCs, the 2HT model must
assume that participants sometimes give low-confidence responses even in the face
of infallible evidence that the item was studied.
The 2HT model allows for the possibility that items that are encoded with
greater strength may have a higher probability of being detected than more weakly-
encoded items (i.e., the value of po may differ for strong and weak items; see Fig. 5),
but responses from the detect-old state depend only on the arbitrary state-response
mapping parameters (e.g., Klauer & Kellen, 2010; Bröder, Kellen, Schütz, &
Rohrmeier, 2013). The distribution of confidence ratings must be the same for all
detected targets, regardless of their encoding strength, because there is only one
detect-old state and only one set of response probabilities that lead from that state
to the confidence ratings (Fig. 5).3 This aspect of the 2HT model is known as the
conditional independence assumption (Province & Rouder, 2012).
Province and Rouder (2012) tested the conditional independence
assumption of the 2HT model by presenting participants with about 240 unique
study items a variable number of times (1, 2, or 4 times each) on a single list. The
recognition test was a two-alternative forced choice task: a target and a lure were
Footnote 3: The overall distribution of confidence ratings for strong and weak targets may vary because they reflect different mixtures of responses based on detection and guessing.
tested together, and participants were asked to select the target from each pair.
Responses were made on a continuous scale to indicate confidence in the decision
("sure target on left" to "sure target on right"). Province and Rouder also included
some test pairs that contained two lures, forcing participants to guess; responses to these trials provide important data about how confidence ratings are distributed in the uncertain state. Across three experiments, they reported that the
distribution of confidence ratings (conditional on a detection-based response) was
independent of encoding strength: the ROCs were curved in all strength conditions
except for the lure-lure trials.
While Province and Rouder’s (2012) data support the 2HT model, Chen,
Starns, and Rotello (2015) also tested the conditional independence assumption,
reaching a different conclusion. Two primary changes were made to Province and
Rouder's approach. First, the memory test used a simple old-new recognition
procedure with confidence ratings. Second, and more importantly, multiple short
study lists were used (14 lists with 42 unique items each) that included a small
number of "super strong" study items. The super strong stimuli were shown four
times each with a different encoding task each time (e.g., rate how easy it is to form
a mental image of this item; rate it for survival relevance). These super strong items
were expected to have a high probability of being detected, and they did for most
participants. For these participants, the super strong items were almost invariably
given the highest-confidence "old" response. For more weakly encoded stimuli
(those studied 1, 2, or 4 times without a specific encoding task), the probability of a highest-confidence "old" rating was much lower, even for detected items. Most
participants’ data were better fit by the UVSDT model, because the conditional
independence assumption of the 2HT model was violated: the distribution of
confidence ratings varied with the encoding strength of the targets.
3.1.2 Binary response ROCs
The ROCs described so far were generated from confidence ratings. It is also
possible to generate ROCs from response bias manipulations, such as presenting different proportions of targets and lures across tests or offering different incentives for “old” or “new” responses; confidence ratings are not collected. In this
second “binary response” type of ROC, the operating points are generated
independently of one another, either in different tests or even from different groups
of participants.
Binary response ROCs are important data that can discriminate signal
detection models from the threshold models (Banks, 1970). As Figures 4 and 5
show, confidence-based ROCs can generally be fit with threshold models by
assuming that there are state-response parameters to generate the probabilities of
the ratings responses (Erdfelder & Buchner, 1998; Malmberg, 2002; Bröder &
Schütz, 2009). In effect, those extra parameters give the 2HT model the flexibility it
needs to fit the curved confidence ROCs that are consistently observed. The story is
different when the ROCs are generated from binary old-new decisions in different
bias conditions, because in that case there are only two response bins (“old,” “new”)
and additional state-response parameters can't be added to redistribute responses
from one category to another. The ROC predicted by the 2HT model in this case is always a line with slope equal to (1-po)/(1-pn), as shown in Figure 4. In contrast, the SDT models predict curved ROCs regardless of whether the ROCs are confidence-based or generated from independent bias conditions.
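The contrast can be made concrete in a few lines of code (Python; the po, pn, and SDT parameter values are arbitrary illustrative choices). Sweeping the guessing rate under the 2HT model traces a straight line with slope (1-po)/(1-pn), whereas sweeping a criterion under an SDT model traces a curve:

```python
import numpy as np
from scipy.stats import norm

po, pn = 0.5, 0.3         # detect-old and detect-new probabilities (illustrative)
g = np.linspace(0, 1, 5)  # probability of guessing "old" from the uncertain state

# 2HT: hit and false alarm rates are both linear in g, so the ROC is a line
far_2ht = (1 - pn) * g
hit_2ht = po + (1 - po) * g
slope = np.polyfit(far_2ht, hit_2ht, 1)[0]
print(f"2HT binary ROC slope: {slope:.3f}")  # equals (1 - po)/(1 - pn)

# UVSDT: sweeping the old-new criterion traces a curved ROC instead
c = np.linspace(-1.5, 3.0, 5)
far_sdt = 1 - norm.cdf(c, loc=0, scale=1)
hit_sdt = 1 - norm.cdf(c, loc=1.5, scale=1.25)
```

Because the binary paradigm offers only the two response bins, no extra state-response parameters can bend the 2HT line toward the curved ROCs that are observed.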
3.1.2.1 Fits of the models to data
Bröder and Schütz (2009) fit all binary response ROCs in the recognition
memory literature, concluding that the 2HT model fit as well as the UVSDT model.
However, their analyses included a large number of ROCs that contained only two
points. As Dube and Rotello (2012) noted, two-point ROCs cannot discriminate the
2HT from UVSDT models because two points can be fit by either a line or a curve.
After excluding those two-point ROCs and running two new experiments, Dube and
Rotello (2012) fit the 2HT and UVSDT models to all available data on binary-
response ROCs reported for individual subjects in the domains of perception and
recognition memory. The resulting binary ROCs were curved, not linear, and
strongly supported the UVSDT model for the vast majority of participants.4 Dube,
Starns, Rotello, and Ratcliff (2012) also reported ROCs based on binary responses.
They included a within-list manipulation of encoding strength (words were studied
once or 5 times each); because a single set of lures was used, the 2HT model is
constrained to a single value of pn. Dube and colleagues tested the UVSDT and 2HT
Footnote 4: Dube and Rotello (2012) based their conclusion on the AIC and BIC fit statistics. Kellen et al. (2013) used normalized maximum likelihood (NML) for model selection, and on that basis concluded in favor of the 2HT model. In the next section, we will evaluate the plausibility of the NML conclusion.
models, again finding that the ROCs were curved and inconsistent with the 2HT
model. Thus, the model-fitting evidence is strongly in favor of the UVSDT model
(Dube & Rotello, 2012; Dube et al., 2012).
One additional study provides strong evidence in favor of the UVSDT model
and against both the 2HT and HTSDT models. Starns, Ratcliff, and McKoon (2012)
collected both old-new recognition decisions and reaction times in an experiment
that manipulated response bias by varying the proportion of targets on the test. In
addition, participants were asked to respond quickly on some tests, and to
emphasize accuracy on other tests. Fits of the HTSDT model revealed that the
probability of recollection increased with encoding strength (which was manipulated within the study list), but was not affected by the speed and accuracy
instructions. The absence of a reduction in recollection under speed instructions is
problematic for the HTSDT model because most of those responses were made in
less time than recollection appears to require (e.g., McElree, Jacoby & Dolan, 1999).
The zROCs were also linear in all conditions, with slopes less than 1, contradicting
the predictions of the HTSDT and 2HT models.
Perhaps the most interesting aspect of the Starns et al. (2012) study,
however, is that the diffusion model (Ratcliff, 1978) was fit to the response
probabilities and reaction time distributions simultaneously. The diffusion model is
a sequential sampling model that assumes information accumulates over time
according to an average drift rate that reflects the quality of the evidence for targets
and lures. Across trials, a distribution of drift rates is assumed; the variance of this
distribution can be interpreted as measuring the variability in evidence values for
targets and lures. The ratio of lure to target drift rate standard deviations was less
than one for 9 of the 12 conditions, providing converging evidence for the UVSDT
model.
3.1.2.2 Assessments of model flexibility
As for the confidence rating paradigm, the flexibility of the models to fit the
binary response ROCs should be considered. Dube, Rotello, and Heit (2011)
reported a comparison of the 2HT and UVSDT models as fit to binary ROCs with
three operating points that were either closely spaced or more spread out. They
concluded that the models were difficult to discriminate, especially when the
operating points were close together, but that the 2HT and UVSDT models were
approximately equally flexible. A large-scale evaluation of the flexibility of the
HTSDT, UVSDT, 2HT, and MSDT models was reported by Kellen, Klauer, and Bröder
(2013). They used normalized maximum likelihood (NML; see Myung, Navarro, &
Pitt, 2006) for model selection, and on that basis concluded in favor of the 2HT
model. One challenge for the conclusions based on NML is that they show a strong preference for the equal-variance (or po = pn) versions of the models, which predict symmetric ROCs. As we've seen, symmetric recognition ROCs are not observed empirically.
3.1.3 Summary of the item recognition data
Overall, the evidence reviewed so far supports the UVSDT model over the
others. Critical tests of the most popular threshold model, the 2HT model, reveal
deep problems: empirical data violate both the conditional independence
assumption for the state-response mapping parameters (Chen et al., 2015) and the
prediction of linear binary ROCs (Dube & Rotello, 2012; Dube et al., 2012). The
HTSDT model fares somewhat better with these item recognition ROCs, but it
predicts a decrease in zROC slopes with increasing probability of recollection that
has not been observed (e.g., Glanzer et al., 1999). The HTSDT model also predicts
that curved zROCs should be observed when recollection is needed to distinguish
targets from lures, as when they are similar to one another. However, that zROC
curvature is typically not observed (e.g., Glanzer et al., 1999; Heathcote, 2003).
Stronger tests of the HTSDT model come in the form of assessments of the
contribution of recollection, which will be considered in the next section.
3.2 Expanding the Data: The Contribution of Recollection
The basic predictions of the HTSDT and UVSDT models have been tested in
numerous item recognition memory experiments that did not specifically
manipulate recollection (see Wixted, 2007; Yonelinas & Parks, 2007, for reviews).
Instead, recollection estimates were based solely on the parameters of the HTSDT
model's best fit to the data. One problem with this approach is that both the UVSDT
and HTSDT models fit the data well, qualitatively and quantitatively (see Figure 6).
Two strategies have been used to distinguish these models. The first approach is to
obtain measures of recollection that are independent of the HTSDT's parameter
estimates (e.g., Yonelinas, 2001), to assess their convergence. The primary such
measure of recollection has come from asking participants to provide remember-
know judgments to supplement their old-new decisions. The second strategy for
expanding the data is to take advantage of empirical tasks that appear to require recollection for accurate responses. Three popular tasks are associative recognition, which requires the participant to decide whether two studied items appeared together on the list; plurality discrimination; and source memory judgments, which ask the participant not only to recognize that an item was studied but also to report something specific about that presentation. We'll consider each of these approaches in turn. Where appropriate, we'll also consider threshold modeling approaches to these tasks.
3.2.1 Remember-know judgments
Tulving (1985) proposed that we ask participants to report the basis of their
"old" recognition decisions: do they remember something specific about the
encoding experience, or does their memory lack particular details yet they know
that the memory probe was studied? Rajaram (1993) developed extensive
instructions on the remember-know distinction, which have since been used in
hundreds of experiments. For about the first fifteen years of remember-know
research, most experiments focused on identifying variables that dissociated the
remember and know judgments, influencing one type of response without affecting
the other or moving the response probabilities in opposite directions. This goal was
readily achieved, and all possible combinations of influence on remember and know
responses have been obtained (see Gardiner & Richardson-Klavehn, 2000, for a
summary). The conclusion in the literature was that these experiments identified
the variables that selectively influence either recollection or familiarity.
From the perspective of the HTSDT model, these remember-know
dissociation experiments provide a wealth of evidence in favor of the recollection
process (Yonelinas, 2002). Remember hits tend to increase with variables that
increase memory strength (e.g., full rather than divided attention: Yonelinas, 2001),
whereas know responses tend to increase more as a function of superficial
manipulations such as perceptual similarity (e.g., fluency manipulations: Rajaram &
Geraci, 2000). One particular aspect of the data that is convincing to HTSDT
proponents is that false alarms occur primarily with “know” justifications rather
than “remember” responses (Dunn, 2004). This finding is important because a high-
threshold process cannot produce any false alarms: if “remember” responses reflect
recollection, then they must only occur after correct “old” decisions (i.e., hits). In
addition, remember responses given after hits should be high confidence decisions
because the recollection process “trumps” the familiarity process; this assumption
of highest-confidence remembering is often built into the modeling (e.g., Yonelinas
& Jacoby, 1995; Yonelinas, 2001). On the other hand, direct comparisons of
estimates of recollection based on remember responses and on ROC parameters
have not provided a convincing level of agreement (e.g., Rotello, Macmillan, Reeder,
& Wong, 2005).
Other problems for the HTSDT model were soon identified. The dissociation
evidence seems to strongly suggest that there are distinct underlying processes of
recollection and familiarity, but dissociations are weak and frequently inconclusive
evidence for multiple processes (Dunn & Kirsner, 1988). Stronger evidence for the
presence of multiple processes comes from state-trace analysis (Bamber, 1979;
Dunn & Kirsner, 1988). State-trace plots show how performance on one task relates
to performance on another task, as a function of manipulations intended to selectively
influence one of the presumed underlying processes. Monotonic state-trace plots
are consistent with a single underlying process that may have a non-linear
relationship with the levels of the experimental factors. In contrast, non-monotonic
state trace plots imply that more than one underlying process determines
performance. Dunn (2008) applied the logic of state-trace analysis to remember-
know data, finding only monotonic functions; this analysis is consistent with the
UVSDT model and inconsistent with the HTSDT view. A stronger test reached the
same conclusion: Pratte and Rouder (2012) showed that even when a hierarchical
model is applied, eliminating potential confounds that might occur from averaging
data over subjects or trials, the state-trace analysis offers no support for the dual-
process view.
The remember-know data do not demand a dual-process interpretation, and
in fact can be readily accounted for by the UVSDT model. Donaldson (1996) was the
first to suggest this interpretation of remember-know data: he proposed that the
data could be accounted for by a signal detection model with two decision criteria, a
conservative criterion that divides old-remember from old-know responses, and a
more liberal criterion that provides an old-new boundary. Indeed, Dunn (2004)
showed that all of the existing patterns of remember-know dissociations were well-
described by the UVSDT model. Dunn also tackled four other common arguments
against the SDT model of remember-know judgments, showing them all to be faulty.
One of these demonstrations is particularly challenging for the HTSDT model: old-
new discrimination accuracy measured with the remember hit and false alarm rates
is equal to accuracy measured from the overall hit and false alarm rates (see also
Macmillan, Rotello, & Verde, 2005). The UVSDT model predicts this result because
response bias and accuracy are independent, but it is contrary to the assumption
that recollection is a high-accuracy (or high-threshold) process.
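Under a signal detection account this equality falls out directly, because the remember criterion and the old-new criterion are just two cut points on the same ROC. A minimal equal-variance sketch (Python; the d' and criterion values are arbitrary illustrative choices, and the equal-variance simplification is adopted so the z-difference is exactly criterion-free):

```python
from scipy.stats import norm

d_prime = 1.5                 # illustrative sensitivity
c_old, c_remember = 0.5, 1.5  # lax old-new criterion; strict remember criterion

def z_accuracy(criterion):
    """z(hit) - z(false alarm) computed at a single criterion."""
    hit = 1 - norm.cdf(criterion - d_prime)  # targets ~ N(d', 1)
    fa = 1 - norm.cdf(criterion)             # lures ~ N(0, 1)
    return norm.ppf(hit) - norm.ppf(fa)

# Accuracy is identical whether computed from overall "old" rates
# (lax criterion) or from "remember" rates (strict criterion):
print(z_accuracy(c_old), z_accuracy(c_remember))
```

Because accuracy and response bias are independent in the model, sliding the criterion changes the hit and false alarm rates but not the z-difference between them, which is exactly the pattern Dunn (2004) documented.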
The competing interpretations of remember-know judgments have been
extended to account for the inclusion of confidence ratings in several different
experimental paradigms (Rotello & Macmillan, 2006; Rotello et al., 2006). For
example, participants might be asked to first decide if they remember a memory
probe, and if not, to rate their confidence that they know they studied it or that it's a
lure (Yonelinas & Jacoby, 1995). Alternatively, subjects might be asked to decide
among three response alternatives (remember, know, new) and then to rate their
confidence in that decision (Rotello & Macmillan, 2006), or they might first make an
old-new decision and then rate their confidence along a remember-know dimension
for items judged to be old. Across a range of tasks and corresponding model
versions, the one-dimensional UVSDT model consistently provides the best
quantitative fit to data (e.g., Rotello & Macmillan, 2006; Rotello et al., 2006). This
general finding accords well with the observation that remember responses are
easily influenced by manipulations intended to affect only old-new response bias
(Rotello et al., 2006; Dougal & Rotello, 2007; Kapucu et al., 2008), and with the
observation that ROCs based only on remember responses are strongly curved
(Slotnick, Jeye, & Dodson, 2016). Importantly, Cohen et al. (2008) showed that these
conclusions do not reflect differences in model complexity: the UVSDT model of
remember-know judgments is somewhat less flexible than the HTSDT model.
Finally, one other type of evidence argues in favor of the UVSDT
interpretation of remember-know judgments. Reaction times for remember
responses have long been known to be shorter than those for know decisions
(Dewhurst & Conway, 1994; Wixted & Stretch, 2004; Dewhurst, Holmes, Brandt, &
Dean, 2006); on the surface, this effect suggests that remember and know responses
reflect distinct processes. However, higher confidence decisions are also made
more quickly than lower confidence responses (Petrusic & Baranski, 2003), and the
probability of a remember response is correlated with response confidence (Rotello,
Macmillan, & Reeder, 2004). When confidence was controlled, Rotello and Zeng (2008) found that the reaction time distributions for remember and know responses did not differ significantly. In addition, Wixted and Mickes (2010) found that remember
false alarms are made more quickly than either know hits or know false alarms,
consistent with the UVSDT model.
In summary, all of the observed remember-know data, from basic
dissociation effects (Dunn, 2004) to reaction times (Rotello & Zeng, 2008; Wixted &
Mickes, 2010) and confidence ratings (e.g., Dougal & Rotello, 2007; Slotnick et al.,
2016) can be accounted for with the UVSDT model. The success of the UVSDT model
occurs in spite of its slightly lower flexibility than the HTSDT model (Cohen et al.,
2008). Finally, state-trace analyses of remember-know data conclude that a single
process is sufficient (Dunn, 2008; Pratte & Rouder, 2012). There is little reason to
believe that remember responses reflect threshold-based recollection; they are
easily influenced by manipulations of old-new response bias (Rotello et al., 2005).
3.2.2 Associative recognition and plurality discrimination tasks
A better way of assessing the role of recollection in recognition decisions is to
design tasks in which an accurate response requires recollection. One candidate
task is associative recognition. Participants study pairs of items (A-B, C-D), usually
words, and are asked to remember them together. At test, they must select the
intact pairs that appear exactly as studied (A-B) while rejecting those that are
completely new (X-Y). The interesting challenge presented to participants is that
some of the lures are rearranged pairs (C-B) in which both words were studied, but
with different partners. The assumption is that rejection of the rearranged pairs
requires more than an assessment of familiarity. Because both of the words were
studied, correct decisions about rearranged pairs requires recollection of the
specific studied combinations. A closely related argument has been made about
plurality discrimination (e.g., Hintzman & Curran, 1994), a task in which participants
study nouns in their singular or plural form (trucks, frog), and then must recognize
targets presented in their studied form (trucks) amid plurality-changed (frogs) and
completely new lures.
Early evidence consistent with the dual-process view of associative
recognition comes from response signal experiments in which participants are asked to make their recognition judgments immediately after an unknown, variable amount of time. On some trials, the response signal occurs very soon after the memory probe is presented (i.e., within 50-100 ms of probe onset), allowing only a small amount of processing time prior to the decision, whereas on other trials processing may run to completion because the signal occurs after a long lag (2000 or more ms after probe onset). The signal lag varies randomly, so that participants cannot anticipate the amount of decision time that will be available on any given test trial.
Response signal data from associative recognition paradigms are suggestive of
multiple processes with different time courses: for about the first 600 or 700 ms of
processing time, "old" responses to both intact and rearranged pairs increase, as if
familiarity for those memory probes develops over time. After that point, however,
additional processing time yields a decrease in "old" responses to rearranged pairs,
as if the results of a recollection process ("recall-to-reject") begin to contribute to
the decision (e.g., Gronlund & Ratcliff, 1989; Rotello & Heit, 2000).
Plurality discrimination response-signal experiments have revealed the same type of non-monotonic responses to plurality-changed lures as a function of processing time (Hintzman & Curran, 1994). However, Rotello and Heit (1999)
suggested that dynamic response bias changes may be sufficient to explain those
data. A recollection process is not required by the data because the false alarm rate
to the completely new lures decreases in the same way as the false alarm rate to the
plurality-changed lures, consistent with an increasingly conservative response bias
as processing time elapses. A useful recollection process must do more than mimic a
familiarity process; it should dominate the decision outcome.
ROCs have also been used to assess whether a high-threshold recollection
process contributes to associative recognition or plurality discrimination, with
somewhat mixed conclusions. If recollection is required to discriminate intact from
rearranged test probes, and if recollection is a high-threshold process as the HTSDT
model assumes, then a linear ROC should result if “old” responses to intact pairs are
plotted against “old” responses to rearranged pairs. Because recollection should be
more likely when items are strongly encoded, a clear prediction of the HTSDT model
is that associative ROCs should be increasingly linear with greater memory strength.
The first reported associative recognition ROCs were linear (Yonelinas, 1997;
Yonelinas, Kroll, Dobbins, & Soltani, 1999, upside-down faces; Rotello, Macmillan, &
Van Tassel, 2000) but virtually all subsequently reported ROCs have been curved
(Yonelinas et al., 1999, right-side up faces; Kelley & Wixted, 2001; Verde & Rotello,
2004; Healy et al., 2005; Quamme, Yonelinas, & Norman, 2007; Voskuilen & Ratcliff,
2016; for an exception, see Bastin et al., 2013). In addition, the degree of curvature
of the associative recognition ROC increases with memory strength (Kelley &
Wixted, 2001; Mickes, Johnson, & Wixted, 2010; see also Quamme et al., 2007), in
contrast to the most natural prediction of the HTSDT model. Macho (2004) showed
that the HTSDT model could fit these associative ROCs, but only by adopting
implausible parameter values such as greater recollection for the more weakly
encoded items.
A simpler and briefer, but otherwise similar, history exists for plurality-
discrimination ROCs. The HTSDT predictions about the form of the plurality-change
ROC are analogous to its predictions for associative recognition: target-similar
ROCs based on responses to targets and plurality-changed lures should be linear,
especially as memory strength increases. Rotello (2000), Rotello et al. (2000), and
Arndt and Reder (2002) reported that target-similar ROCs in the plurality-
discrimination task were linear, but those reported by Heathcote, Raymond, and
Dunn (2006) are curved. The ROCs from these experiments are actually quite
similar looking, despite the different conclusions that were reached (see Kapucu,
Macmillan, & Rotello, 2010, Figure 1). Recently, Slotnick et al. (2016) reported
strongly curved plurality-discrimination ROCs, even when they were based only on
trials for which a "remember" response was given.
There is overwhelming evidence for curved ROCs in both associative
recognition and plurality-discrimination tasks when the study items are well-
learned, which is clearly a challenge for the HTSDT model. On the other hand, for more weakly encoded items, detailed quantitative fits of the data also reveal
systematic deviations from the predictions of both the UVSDT and HTSDT models.
The ROCs for these more poorly learned items are both more linear than the UVSDT
model predicts and more curved than HTSDT expects, leading to curvilinear zROCs
(e.g., DeCarlo, 2007).
A number of explanations have been offered for these systematic distortions
in the ROCs. One account assumes a change in the decision criteria (Hautus,
Macmillan, & Rotello, 2008; Starns, Pazzaglia, Rotello, Hautus, & Macmillan, 2013),
as will be described in detail in the section on source recognition. The other
accounts all rely on changes to the effective target distribution as a consequence of
mixing trials sampled from a fully-encoded distribution (e.g., item and associative
information available) and trials sampled from a distribution that assumes the study
items were only partially encoded (DeCarlo, 2002, 2003, 2007). According to the
HTSDT model, of course, we could call the fully-encoded distribution "recollection"
and the item-information distributions "familiarity" (Yonelinas, 1994, 1997, 1999a;
see section 2.2.3). Other accounts assume that the associative information is
continuously-valued (Kelley & Wixted, 2001; DeCarlo, 2002, 2003; Greve,
Donaldson, & van Rossum, 2010; Mickes, Johnson, & Wixted, 2010). Mixing
responses from multiple distributions (those with and without associative
information) changes the distribution of evidence associated with a target from its
presumed Gaussian form to something that is non-Gaussian, thus changing the
predicted form of the ROC (see Figure 3). All of these mixture models can generate
ROCs that have a characteristic "flattened" shape for weaker items. For stronger
items, the probability that associative information is unavailable is greatly reduced,
so the mixture distribution predominantly reflects the fully-encoded distribution.
Thus, strong items yield ROCs that are well described by the standard UVSDT model.
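The mixture account can be sketched numerically. The following illustration uses hypothetical parameter values (not fits to any of the studies cited here) in a DeCarlo-style mixture: a proportion of targets is fully encoded, N(d, σ), and the remainder is drawn from the lure distribution. The resulting zROC has a locally varying slope (i.e., it is curvilinear), whereas a pure Gaussian target yields an exactly linear zROC.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal, used for Phi and its inverse

def roc_point(criterion, lam, d, sigma):
    """Hit and false-alarm rates when a proportion lam of targets is
    fully encoded, N(d, sigma), and the rest come from the lure
    distribution N(0, 1)."""
    hit = (lam * (1 - NormalDist(d, sigma).cdf(criterion))
           + (1 - lam) * (1 - std.cdf(criterion)))
    fa = 1 - std.cdf(criterion)
    return hit, fa

criteria = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]

def local_zroc_slopes(lam, d, sigma):
    """Slopes between successive points of the zROC."""
    z = [(std.inv_cdf(h), std.inv_cdf(f))
         for h, f in (roc_point(c, lam, d, sigma) for c in criteria)]
    return [(z[i][0] - z[i + 1][0]) / (z[i][1] - z[i + 1][1])
            for i in range(len(z) - 1)]

# Pure Gaussian target (lam = 1): linear zROC, constant slope 1/sigma.
pure = local_zroc_slopes(1.0, 1.5, 1.25)
# Mixture (lam = 0.6): the local slope varies, so the zROC is curved.
mixed = local_zroc_slopes(0.6, 1.5, 1.25)
print(max(pure) - min(pure) < 1e-6)    # True: linear
print(max(mixed) - min(mixed) > 0.05)  # True: curvilinear
```

As the mixing proportion approaches 1 (strong encoding), the mixture collapses onto the fully-encoded Gaussian and the zROC straightens, matching the observation that strong items are well described by the standard UVSDT model.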
Unusual ROC shapes can also be observed if participants make random
responses on some proportion of trials, effectively shifting probability mass from one
part of the target distribution to another (Ratcliff, McKoon, & Tindall, 1994). The
impact of this random guessing on the exact shape of the ROC varies as a function of
how those guesses are distributed over confidence ratings (Malmberg & Xu, 2006).
Harlow and Donaldson (2012) provided a recent empirical demonstration of this
consequence of guessing. In a clever associative recognition experiment,
participants viewed a series of study words that were paired with a specific location
indicated on a circular surround. (Words and locations did not appear on the screen
at the same time.) In a subsequent memory test, participants were shown a studied
word and asked to click the corresponding location on the test circle, then to rate their
confidence in their response. The distribution of memory errors was best described
as a mixture of pure guesses (randomly spread around the circle, 30-40% of trials,
depending on test lag) and correct responses (with a certain spread about the true
location, about 10°, due to memory-based loss of precision). The corresponding
ROCs were flatter than the UVSDT model predicts, consistent with the presence of
those random guesses.
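The guessing-plus-precision account lends itself to a simple simulation sketch. The guess rate and precision below are illustrative values loosely based on the 30-40% guessing and roughly 10° spread described above; the resulting error distribution is a narrow memory-based peak sitting on a flat "pedestal" of guesses.

```python
import random

random.seed(1)
GUESS_RATE = 0.35    # illustrative: ~30-40% pure guesses
PRECISION_SD = 10.0  # illustrative: ~10 degrees of memory-based spread

def response_error():
    """Signed angular error (degrees) on one location-memory trial."""
    if random.random() < GUESS_RATE:
        return random.uniform(-180.0, 180.0)  # pure guess on the circle
    return random.gauss(0.0, PRECISION_SD)    # memory-based response

errors = [response_error() for _ in range(200_000)]

# Close to the target, memory-based responses dominate; far from it,
# only the flat pedestal of guesses remains.
near = sum(abs(e) < 30 for e in errors) / len(errors)
far = sum(abs(e) > 150 for e in errors) / len(errors)
print(round(near, 2))  # roughly 0.65 + 0.35 * (60 / 360), about 0.71
print(round(far, 2))   # roughly 0.35 * (60 / 360), about 0.06
```

Fitting such a two-component model to the empirical error distribution is what allows the guess rate and memory precision to be estimated separately.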
In summary, the evidence from the associative recognition and plurality-
discrimination tasks is somewhat mixed. The ROCs are increasingly curved and
consistent with UVSDT as memory strength increases (Mickes et al., 2010), and
ROCs based only on remember responses are also curved and inconsistent with the
HTSDT interpretation (Slotnick et al., 2016). However, the flattened ROCs that are
consistently observed for more weakly encoded items suggest that the responses
may reflect a mixture of trials for which the associative detail is and is not available
for report (e.g., DeCarlo, 2002; Harlow & Donaldson, 2012). The idea that some
responses are pure guesses is more consistent with a threshold view than a signal
detection process (see Figure 4). We will revisit this issue after reviewing the data
from another task designed to require recollection, namely source recognition tasks.
3.2.3 Source recognition
Another common task that appears to require recollection is a source
memory task. In this paradigm, participants are asked to judge the context (or
source) in which a memory probe was presented. For example, was it shown in
green or in red? Heard in a woman’s voice or a man’s? The simplest SDT model of
source recognition simply replaces the target distribution in Figure 1 with a target
source (say, male voice) and the lure distribution with the alternative source
(female voice). The simplest HTSDT model assumes that correct source
identification requires recollection; thus the source ROC, which plots correct source
identifications against errors, is assumed to be linear. If the two sources are equally
strong, then the HTSDT model further assumes that the ROC will be symmetric
because po = pn; otherwise it will be asymmetric (with po > pn on the assumption that
the target source is the stronger of the two: Yonelinas, 1999a). The simple SDT
model predicts a curved ROC, as usual.
3.2.3.1 Source recognition ROCs
As in the associative recognition and plurality-discrimination literatures, the
earliest reported source ROCs supported the HTSDT model. Yonelinas (1999a)
reported three experiments in which linear source ROCs were observed. In two
experiments, study items were presented in the two sources in a random order on
the same study list, and the resulting source ROCs were symmetric and linear. In
the remaining experiments, study items appeared on two lists, which served as the
sources. When the lists were presented during the same experimental session, the
source ROC was linear and slightly asymmetric, but when the study lists were
separated by five days, the source ROC was curved and more strongly asymmetric.
Also echoing the associative recognition and plurality-discrimination literatures, all
subsequently published source ROCs have been curved and inconsistent with the
HTSDT model (e.g., Slotnick et al., 2000; Qin et al., 2001; Hilford, Glanzer, Kim &
DeCarlo, 2002; Dodson, Bawa, & Slotnick, 2007; Onyper, Zhang, & Howard, 2010;
Slotnick, 2010; Parks, Murray, Elfman, & Yonelinas, 2011; Schütz & Bröder, 2011;
Starns et al., 2013; Starns & Ksander, 2016).
Important insight on the discrepancy between the linear and curved source
ROCs comes from Slotnick et al. (2000). Like Yonelinas (1999a, Exps. 2 & 3), they
asked their participants to make both old-new confidence ratings and source
confidence ratings (“sure source A” to “sure source B”) for every memory probe.
Whereas Yonelinas ignored the old-new ratings when plotting his source ROCs,
Slotnick et al. took advantage of them. They assessed the form of the overall source
ROC (exactly as in Yonelinas, 1999a) and the "refined" source ROC that results from
including only items for which participants had made the highest-confidence old
decisions. Slotnick et al. reasoned that if a high-threshold process contributes to
responses, then that process should be reflected in both the old-new decisions and
in the subsequent source judgments on the very same items. According to the
HTSDT model, as well as the 2HT model of source memory (Bayen, Murnane, &
Erdfelder, 1996), the refined source ROC should be linear. However, as Slotnick et
al. (2000, see Mickes, Wais, & Wixted, 2009, for a related argument) showed, the
refined source ROC is actually strongly curved and reasonably consistent with the
EVSDT model.5 A re-analysis of the Yonelinas (1999a) data shows that the source
ROC becomes more linear - not more curved - as trials are included for which lower-
confidence-old (or even "new") decisions are made (Slotnick & Dodson, 2005). This
result is perfectly sensible: as responses to weaker memory items are added to the
ROC, discrimination is reduced towards chance levels, and the chance-level ROC is a
line (see Figure 1).
In a novel defense of the HTSDT model, Parks and Yonelinas (2007) argued
that the curved source ROCs were the result of decisions that were based on
"unitized" familiarity rather than recollection. In effect, this argument assumes that
item-source pairs (or item-item pairs in an associative recognition task) are so well-
encoded that they become a single highly-familiar unit. If that were true, then those
refined source judgments should reflect know responses in a remember-know task;
source ROCs based on remember decisions should be linear because remember
responses are a hallmark of recollection (Yonelinas, 2002). Parks and Yonelinas’s
(2007) claim was tested by Slotnick (2010), who reported source ROCs conditional
on a "remember" response. These conditional ROCS are strongly curved and well
described by the UVSDT model. The same conclusion was reached by Mickes et al.
(2010) in a similar analysis of associative recognition ROCs.
5 Slotnick et al. (2000) claimed that the 2HT model could not fit their confidence-based source ROCs. As described earlier, only binary-response ROCs must be linear according to threshold models, so the Slotnick et al. data are not convincing evidence against the 2HT model. As it turns out, neither are the binary-response source ROCs: Schütz and Bröder (2011) presented binary-response source ROCs from five experiments. Although they claimed the ROCs were linear, comparative model fitting of the 2HT and SDT models was inconclusive (Pazzaglia et al., 2013).
Consideration of the refined ROCs led to efforts to model the full set of old-
new and source memory confidence ratings simultaneously. This effort expands on
Banks (2000), who was the first to show that old-new recognition and binary source
judgments could be modeled within a single two-dimensional decision space. In this
space, one dimension defines the information that distinguishes targets from lures,
and the other dimension defines the information that distinguishes the two sources.
As one might expect, there is now a complete model of recognition and source
memory in each model flavor: a bivariate signal detection model (Banks, 2000;
DeCarlo, 2003; Glanzer, Hilford, & Kim, 2004; Hautus et al. 2008), a discrete-state
model that assumes both recognition and source judgments are 2HT processes
(Klauer & Kellen, 2010), and a hybrid model that assumes a high-threshold
recollection process and continuous familiarity (Onyper, Zhang, & Howard, 2010).
Representations of the SDT and hybrid models are shown schematically in Figures 7
and 8.
< Insert Figures 7 and 8 near here >
A careful comparison of these models on the same data sets (e.g., Yonelinas,
1999a; Slotnick et al., 2000) concluded that the existing data were not powerful
enough to allow selection of the best model (Klauer & Kellen, 2010). Despite being
unable to identify a "winner," some conclusions about the plausibility of the models
may be drawn (see also Pazzaglia et al., 2013). First, the threshold model of Klauer
and Kellen (2010) faces the same challenges as the simpler threshold models
discussed earlier, including the observed curvature of binary recognition ROCs
(Dube & Rotello, 2012; Dube et al., 2012) and the violation of the conditional
independence assumption that is central to this class of models (Chen et al., 2015).
A broader criticism of the threshold approach is its tremendous flexibility: different
parameter settings can yield ROC-forms that have never been observed empirically.
Similarly, the hybrid source model of Onyper et al. (2010) inherits the challenges of
the HTSDT model.
3.2.3.2 Source decisions to missed targets
There are a few additional arguments that discriminate the SDT approach
from the HTSDT model for source memory. The first point is quite simple: because
both old-new and source judgments are based on continuous information according
to the SDT view (e.g., Banks, 2000; DeCarlo, 2003; Hautus et al., 2008), participants
who set a conservative criterion and respond "new" to a studied item should
nonetheless be able to make a source judgment for that item with accuracy that is
above chance. One easy way to understand this prediction is to consider Slotnick
and Dodson's (2005) refined source ROCs. A conservative old-new criterion could
be placed similarly to the highest-confidence old criterion. As Slotnick and Dodson
showed, source ROCs conditioned on somewhat lower-confidence responses still
discriminated the two sources. In contrast, threshold models of source judgment
predict that "new" decisions are based on guessing, and thus memory for those
items will contain no source details.
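This prediction can be demonstrated with a minimal bivariate simulation. The parameter values below, including the correlation between item and source evidence, are illustrative assumptions rather than estimates from any data set; the point is only that conditioning on "new" responses made under a very conservative criterion leaves source accuracy well above chance.

```python
import random

random.seed(11)
D_ITEM, D_SOURCE, RHO, N = 1.0, 1.0, 0.5, 200_000
CRITERION = 2.0  # a conservative old-new criterion, well above D_ITEM

misses = source_correct = 0
for _ in range(N):
    u = random.gauss(0, 1)          # noise shared with the source axis
    item = D_ITEM + u
    # Source evidence is correlated (RHO) with item evidence; positive
    # values favor the correct source.
    source = D_SOURCE + RHO * u + (1 - RHO ** 2) ** 0.5 * random.gauss(0, 1)
    if item < CRITERION:            # "new" response to a studied item
        misses += 1
        source_correct += source > 0
source_acc_given_miss = source_correct / misses
print(source_acc_given_miss > 0.5)  # True: above-chance source memory
```

A threshold model, by contrast, treats those "new" responses as uninformed guesses, so its conditional source accuracy is exactly 0.5.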
Starns, Hicks, Brown, & Martin (2008) tested this prediction in three
experiments. To induce a conservative or liberal old-new bias, they manipulated
participants’ expectations about the proportion of targets and lures on the
recognition test. Participants were then asked for both old-new judgments and, for
studied words, for source decisions. Participants in the conservative conditions (but
not those in the liberal conditions) were able to discriminate the source of studied
words they had missed on the test, exactly as the SDT model predicts. This result
was challenged by Malejka and Bröder (2016), who argued that asking for source
judgments only for studied items provided some feedback to participants about the
accuracy of their old-new response, which may have caused them to re-evaluate
memory on those trials. They re-ran Starns et al.'s (2008) experiments, asking for
source judgments for all test items and finding no difference in source accuracy as a
function of bias. However, their bias manipulation was less effective than that of
Starns et al. (2008); in particular, their conservative condition was not nearly as
conservative, which may account for the reduced source accuracy.
3.2.3.3 zROC slopes and curvature are affected by decision processes
A third argument in favor of the SDT models over the HTSDT model stems
from a prediction about how the slope of the source zROC is influenced by the
relative strengths of the two sources (S1 and S2). If items appearing in each source
are studied the same number of times, then the source information should be
equally easy (or hard) to recollect. In other words, RS1 = RS2 and the slope of the
zROC is 1. On the other hand, suppose that the items studied in one source (say, S2)
are strengthened relative to those in the other source (S1), perhaps by presenting
the items in S2 twice each and the items in S1 only once each. In that case, the
HTSDT model predicts that recollection should be increased for the stronger source,
so RS2 > RS1. The slope of the zROC will then depend on which source is selected to
be the “target” source that defines the y-axis. If S2 is the target source, then the
slope of the zROC will be less than 1, and if S1 is the target source, the slope of the
zROC will be greater than 1. Starns et al. (2013) called this experimental design
“unbalanced” because the source strengths are not equal; they confirmed that the
source slope depends on which source is the target.
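A minimal UVSDT sketch of the unbalanced design shows the same dependence on the choice of target source. Here the strengthened source is assumed, illustratively, to have both a higher mean and a larger standard deviation on the source axis, so the source zROC slope is 1/σ when the strong source defines the hits and σ when the weak source does.

```python
from statistics import NormalDist

std = NormalDist()
S1 = NormalDist(0.0, 1.00)  # once-studied source
S2 = NormalDist(1.0, 1.25)  # twice-studied: stronger and more variable
criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]

def zroc_slope(target, other):
    """Least-squares slope of z(hits) on z(false alarms)."""
    z = [(std.inv_cdf(1 - target.cdf(c)), std.inv_cdf(1 - other.cdf(c)))
         for c in criteria]
    mh = sum(h for h, _ in z) / len(z)
    mf = sum(f for _, f in z) / len(z)
    return (sum((h - mh) * (f - mf) for h, f in z)
            / sum((f - mf) ** 2 for _, f in z))

slope_s2_target = zroc_slope(S2, S1)  # 1 / 1.25 = 0.8
slope_s1_target = zroc_slope(S1, S2)  # 1.25
print(round(slope_s2_target, 2), round(slope_s1_target, 2))
```

The two slopes are reciprocal, which is exactly the signature of swapping the "target" source in this design.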
An interesting test of the HTSDT model comes from experiments with
balanced but variable source strengths. In this design, both S1 and S2 items are
studied an equal number of times, but for half of the items in each source the
encoding is weak (one study exposure) and for the remaining items the encoding is
stronger (two study exposures). Because the source strengths are equal overall, the
HTSDT model predicts the source zROC slope will be 1. The predictions of the SDT
models are different. According to SDT, the optimal decision bounds for the source
decision are likelihood-based (these are shown in Figure 7 for a specific data set).
These curved decision bounds naturally capture the intuition that participants
should be unwilling to make high-confidence source decisions for weakly-encoded
items that they have low confidence they’ve even studied. For stronger items,
confidence in the old decision is higher, and the probability of a higher-confidence
source decision increases; because item and source strengths are correlated, these
higher-confidence source responses are also likely to be accurate. Starns et al.
(2013) showed that the SDT model predicts the source zROC slopes in the balanced
design will differ as a function of whether the stronger or weaker item source serves
as the target source, exactly like the HTSDT model’s predictions for the unbalanced
design. In three experiments, the balanced design produced “crossed” source zROC
slopes as predicted by the SDT model but not by the HTSDT model.
Starns and Ksander (2016) showed that the source zROC slope effect
depends on item strength, not source strength. Thus, the slope effect must be due to
the decision process rather than the underlying evidence distributions. Starns and
Ksander (2016) had participants study words paired with a male or female face, or
with a picture of a bird or a fish. In the no-repetition condition, each item-face
combination was studied once. In the same-source repetition condition, item-face
pairs were studied three times each. And in the different-source repetition
condition, each word was studied twice with either a bird or a fish image, and then
once with a male or female face. Although male-female source accuracy was lower
in the different-source repetition condition than in the no-repetition condition, high-
confidence male-female source judgments were more frequent. In addition, the
zROC slopes crossed exactly as in Starns et al. (2013). Both of these effects are
consistent with the predictions of the bivariate SDT model with likelihood-type
decision bounds (e.g., Hautus et al., 2008).6
In the associative recognition literature, the appearance of relatively
flattened ROCs and curved zROCs has been interpreted in terms of mixtures of trials
drawn from distributions that do and don’t contain associative information. The
6 When fit to data, the threshold (Klauer & Kellen, 2010) and hybrid (Onyper et al., 2010) bivariate models of item and source memory also yield parameters consistent with the idea that participants are reluctant to give high-confidence source judgments to items they do not remember well (see Figure 8). For these models, however, the parameters are arbitrary; unlike the bivariate SDT model (e.g., Hautus et al., 2008), nothing about the structure of the threshold and hybrid models dictates the likelihood-type decision bounds.
bivariate SDT models of source memory suggest an alternative explanation that
rests in the nature of the decision bounds rather than the evidence distributions.
Specifically, the convergence of the decision bounds in the bivariate SDT model (Fig.
7) that predicts the observed source zROC slopes (Starns et al., 2013; Starns &
Ksander, 2016) also accounts for the relatively flattened source ROC shape and the
presence of curvature in the zROC (Hautus et al., 2008). As Starns, Rotello, and
Hautus (2014) explained, that curvature occurs because strongly encoded items
tend to receive both high-confidence old and high-confidence (and correct) source
decisions, whereas more weakly encoded items tend to be assigned lower-
confidence responses on both scales. This means that the end points of the source
ROC tend to be based on responses to more well-learned items (with higher source
accuracy), and the mid-points of the ROC tend to be based on responses to more
poorly-learned stimuli (with source accuracy closer to chance); a flattened ROC and
curved zROC are the result.
3.2.3.4 Source memory provides some evidence for (continuous) recollection
The data discussed so far have consistently supported the UVSDT model over
its competitors. Particularly problematic for the HTSDT model has been its
assumption of a threshold recollection process. As Wixted and Mickes (2010)
pointed out, however, there is no reason to assume that a recollection process has
threshold characteristics. They suggested the alternative view that recollection
operates as a continuous, signal detection process, the result of which is usually
summed with the result of the familiarity process. Together, these two processes
are usually completely consistent with the UVSDT model; Wixted and Mickes
termed this model the continuous dual process (CDP) model (see section 2.2.2).
There are some limited circumstances in which the UVSDT and CDP models
may be distinguished. Wixted and Mickes (2010) partitioned the highest-confidence
“old” decisions that were associated with remember and know responses, and then
separately calculated recognition and source memory accuracy for those two types
of subjective report. Source accuracy was higher after remember than know
responses, even though overall recognition accuracy was equated. These data
provide some evidence that remember decisions may reflect recollection after all,
albeit in a continuous form. That basic effect was replicated and strengthened by
Ingram, Mickes, and Wixted (2012), who reported that source accuracy was higher
after lower-confidence remember judgments than after high confidence know
decisions. Even more convincing are the state-trace analyses for all of these
experiments, which were non-monotonic and consistent with the contribution of
more than one process to these recognition and source decisions (Dunn & Kirsner,
1988).
3.2.3.5 Summary of the source recognition data
Overall, the source recognition data are more consistent with the UVSDT
approach than with either the HTSDT or 2HT models. The slopes of the source
zROCs are systematically affected by item strength and the decision bounds in the
bivariate UVSDT model (Starns et al., 2013; Hautus et al., 2008). The flattened ROC
shapes (and slightly curved zROCs) are also explained by the decision bounds in the
bivariate model, or by a mixture process as assumed by the MSDT model (DeCarlo,
2003).
3.3 Expanding the Data: Beyond ROCs
The data discussed so far have largely come from ROC experiments, and have
provided evidence in favor of the UVSDT model over the others. A powerful
advantage of signal detection models is that they not only separate response bias
from decision accuracy within a task and show how different types of errors trade
off against one another, they also make specific predictions about how accuracy
should compare across tasks. Comparing model parameters across different
empirical paradigms provides converging evidence on the quality of a model, in the
form of a test of its generalization ability. In this section, data from several different
paradigms will be considered, including the commonly-used two-alternative forced
choice (2AFC) task. We’ll also consider two less familiar paradigms: the oddity task,
in which participants see three test items (2 lures and a target or 2 targets and a
lure) and must choose the “odd one out,” and the so-called second choice paradigm
in which participants are given four memory probes (3 of which are lures) and they
get two chances to identify the target. Finally, data from some recent experimental
tests of the models under “minimal assumptions” will be discussed.
3.3.1 Two-alternative forced choice recognition
3.3.1.1 Accuracy comparison with old-new recognition
In a 2AFC recognition task, participants are presented with a study list and
then face a test on which the memory probes are presented in pairs. Typically, only
one of the test items was studied, and the participants’ task is to select that item, the
target. This task can be accomplished by comparing the memory strengths of the
two stimuli, selecting the one with greater strength as the “old” member of the pair.
The one-dimensional model in Figure 1 serves as our theoretical starting point. The
target comes from a Normal distribution that has a mean of d'YesNo and a standard
deviation of 1; the lure comes from a Normal distribution with a mean of 0 and
standard deviation of 1. The difference between the two strengths, target-lure or
lure-target, is also normally distributed with a mean of d'YesNo (or -d'YesNo) and a
standard deviation of √2. The participants’ task is to discriminate the target-lure
pairs from the lure-target pairs; Figure 9 shows that their overall accuracy is
predicted to be
d'2AFC = 2d'YesNo/√2 = √2 d'YesNo (13)
In other words, SDT tells us that the 2AFC task should be easier than an old-new
recognition task, by a factor of √2: a d' score of 1.5 in a 2AFC task is
equivalent to a d'YesNo of 1.06 = 1.5/√2.
< Insert Figure 9 near here >
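Equation 13 can be checked numerically. The sketch below uses the equal-variance case and the 1.5 / 1.06 pairing from the worked example in the text: starting from d'YesNo, it computes 2AFC percent correct from the difference distribution and then recovers d'2AFC from that accuracy.

```python
import math
from statistics import NormalDist

std = NormalDist()

def two_afc_pc(d_yesno):
    """2AFC percent correct: P(target-minus-lure difference > 0),
    where the difference is Normal(d_yesno, sqrt(2))."""
    return 1 - NormalDist(d_yesno, math.sqrt(2)).cdf(0.0)

def d_2afc_from_pc(pc):
    """Recover d'_2AFC from unbiased 2AFC percent correct."""
    return 2 * std.inv_cdf(pc)

d_yesno = 1.5 / math.sqrt(2)  # = 1.06, the worked example in the text
d_2afc = d_2afc_from_pc(two_afc_pc(d_yesno))
print(round(d_2afc, 2))       # 1.5, i.e., sqrt(2) * d'_YesNo
```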
The theoretical relationship between old-new recognition and 2AFC
performance is clearly established by SDT. Early tests of this prediction, using
auditory and visual detection tasks, were successful (see Green & Swets, 1966, for a
summary). In the domain of recognition memory, the assessment of the SDT
prediction has run in parallel with an assessment of competing models such as the
HTSDT and MSDT models.
Kroll, Yonelinas, Dobbins, and Frederick (2002) used old-new recognition
responses to estimate the parameters of the HTSDT and EVSDT models. Those
parameters were then used to predict the percentage of correct responses on a
2AFC task with the same materials. Kroll et al. (2002) concluded that the HTSDT
model more accurately predicted 2AFC performance than the EVSDT model.
However, as Smith and Duncan (2004) noted, there are two major problems with
that analysis. First, the UVSDT model is more appropriate than its equal-variance
cousin for recognition memory tasks (see section 3.1), and second, a stronger test of
the models focuses on whether their parameters are consistent across tasks. Many
different combinations of parameter values in the old-new recognition model can
result in the same value of percent correct on the 2AFC test, making percent correct
a weak target for model assessment.
Jang, Wixted, and Huber (2009) had participants complete both a confidence-
rating item recognition task and a 2AFC recognition task with confidence ratings;
the two types of trials were randomly intermixed on the test. The resulting old-new
recognition ROC was curved with a zROC slope of 0.7, whereas the 2AFC ROC was
curved and symmetric. Three different models (UVSDT, HTSDT, and MSDT) were fit
to both ROCs simultaneously, so that the same parameters were required to fit both
tasks. For example, the UVSDT was constrained to have a single discrimination
parameter that was related across tasks as described by Equation 13. For the
HTSDT model, the recollection parameter was set to be constant across tasks;
because the familiarity process operates as a signal detection model, the d’
parameter in each task was defined by Equation 13. Finally, the signal detection
component of the MSDT model was also assumed to comply with Equation 13, and
the attention parameter was fixed to be equal in both old-new and 2AFC
recognition. In both a new experiment and a reanalysis of Smith and Duncan’s
(2004) data, the UVSDT model clearly provided the best fit of individual
participants’ data.
3.3.1.2 The relationship of accuracy and confidence in 2AFC tasks
Forced choice tasks have played another interesting role in the recognition
literature. Rather than focusing on the predicted accuracy relationship across tasks,
these 2AFC studies have investigated the relationship between participants’
decision accuracy and their confidence. All of the signal detection models we’ve
considered make the assumption that old-new memory decisions and confidence
ratings are based on the same underlying evidence axis; confidence criteria and the
old-new criterion are simply different locations on that axis. For this reason, the
SDT models predict that empirical factors that influence accuracy should also
influence confidence in the same manner, with higher levels of confidence
corresponding to higher levels of accuracy.
Tulving (1981) was the first to report that accuracy and confidence in a 2AFC
recognition memory task had an “inverted” relationship: confidence was higher in
the condition with lower accuracy. In Tulving’s experiment, participants studied a
series of photographs and then were tested in a 2AFC task in which the lure item
was either highly-similar to the target in that pair (A/A' pairs, where A was studied)
or was highly-similar to a different studied item (A/B' pairs, where both A and B
were studied). Participants were more confident in their responses to the A/B'
pairs, but were more accurate for the A/A' pairs. This basic effect has been
replicated several times (Chandler, 1989, 1994; Dobbins, Kroll, & Liu, 1998;
Heathcote, Freeman, Etherington, Tonkin, & Bora, 2009; Heathcote, Bora, &
Freeman, 2010), and appears to offer a true puzzle for SDT models.
As it turns out, though, SDT can easily and simultaneously account for both
the confidence and accuracy effects. Signal detection’s description of the 2AFC task
is that participants select the target by calculating the difference in strengths of the
two items at test (see section 3.3.1.1). As Clark (1997) pointed out, however, the A/A'
pairs are not independent random variables: they share variance by virtue of their
similarity to one another, and therefore the variance of the A-A' strength difference
is s²A + s²A' - 2cov(A, A'). In contrast, the A-B' difference has a larger variance: s²A +
s²B'. There are two consequences of this reduced variance for the A/A' test pairs.
First, selection of the studied photo from the A/A' pairs is easier than for A/B' pairs.
That’s because the mean strength difference is the same for both tests (A' and B' are
both equally similar to their corresponding studied item), but the variability is
lower, which increases discrimination. That accounts for the accuracy effect, as
shown in Figure 10. Second, assuming confidence is roughly determined by
distance from the decision criterion, which could plausibly be set at the zero point
where there’s no strength difference between the test items, then the average
confidence level for the A/A' pairs will be higher than for the A/B' pairs. So, the
same covariance difference accounts for both the increased accuracy and the
decreased confidence. Indeed, Heathcote et al. (2010) successfully modeled the
results of their experiments and Dobbins et al.’s (1998), all of which involved
remember-know judgments, using Clark’s model plus a remember-know criterion
with a location that varied randomly from trial to trial. That SDT model fit the data
better than an HTSDT variant proposed by Dobbins et al. (1998).
< Insert Figure 10 near here >
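Clark's covariance account is straightforward to simulate. The means, lure similarity, and correlation below are illustrative choices rather than estimates from Tulving's materials; the point is that shared variance in the A/A' pairs raises choice accuracy while shrinking the strength differences that plausibly drive confidence.

```python
import math
import random

random.seed(7)
D_TARGET, M_LURE, RHO, N = 1.5, 1.0, 0.7, 200_000

def trial(correlated):
    """One 2AFC trial; returns (correct?, |strength difference|)."""
    if correlated:  # A/A' pair: the lure shares variance with the target
        common = random.gauss(0, math.sqrt(RHO))
        sd = math.sqrt(1 - RHO)
        target = D_TARGET + common + random.gauss(0, sd)
        lure = M_LURE + common + random.gauss(0, sd)
    else:           # A/B' pair: the lure is independent of the target
        target = D_TARGET + random.gauss(0, 1)
        lure = M_LURE + random.gauss(0, 1)
    diff = target - lure
    return diff > 0, abs(diff)

aa = [trial(True) for _ in range(N)]
ab = [trial(False) for _ in range(N)]
acc_aa = sum(c for c, _ in aa) / N    # accuracy, A/A' pairs
acc_ab = sum(c for c, _ in ab) / N    # accuracy, A/B' pairs
conf_aa = sum(d for _, d in aa) / N   # mean |difference| ~ confidence
conf_ab = sum(d for _, d in ab) / N
print(acc_aa > acc_ab, conf_aa < conf_ab)  # True True
```

A single mechanism, the 2cov(A, A') term, thus produces higher accuracy and lower confidence for the same pairs.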
3.3.2 Oddity Task
O'Connor, Guhl, Cox, and Dobbins (2011) tested participants on an unusual
task, using the oddity paradigm. In an oddity task, participants are shown three
memory probes simultaneously. There are either two targets and a lure, or two
lures and a target; the participant's task is to select the "odd" item that is of a
different stimulus class than the others. This task is not in common use outside of
the food science literature, but has been argued to be appropriate for eliciting
discriminations that are difficult to describe. For example, Macmillan and Creelman
(2005) suggested that an oddity task might be appropriate for testing the ability of
novices to discriminate two different types of red wine.
O'Connor et al. generated predictions of the HTSDT model for the oddity task,
concluding that the model expects higher accuracy when a lure is the odd item.
Essentially, this prediction arises because only targets can be recollected: on
average, the HTSDT has more information about targets than about lures. According
to the UVSDT model, on the other hand, decisions may be made based on a
differencing strategy like that used for 2AFC paradigm, except that two differences
are required (item 1 – item 2; item 2 – item 3). Each possible combination of trial
types (i.e., lure-target-target, target-lure-target, etc.) produces a unique combination
of expected difference scores, allowing identification of the odd item (see Macmillan
& Creelman, 2005, for details). Alternatively, participants may simply order the
memory strengths of the three items and then compare the middle strength to an
unbiased old-new decision bound. If the middle item falls above (below) that
criterion, then the weakest (strongest) item should be selected as the odd item.
Because the target distribution is known to be more variable than the lure
distribution in recognition memory tasks, both SDT decision rules lead to the
prediction that accuracy will be higher when the target is the odd item: in that case
the two lures will likely be closer together in strength than two targets would be.
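The ordering-based UVSDT prediction can be sketched with a small simulation (illustrative parameters; the isolate is chosen here as the item farthest in strength from the closer pair, a simple stand-in for the differencing rules described above): because the target distribution is more variable than the lure distribution, two lures cluster more tightly than two targets, and accuracy is higher on target-odd trials.

```python
import random

random.seed(3)
D_PRIME, SIGMA_T, N = 1.5, 1.5, 200_000  # illustrative UVSDT parameters

def pick_odd(strengths):
    """Choose the item farthest in strength from the closer pair."""
    order = sorted(range(3), key=lambda i: strengths[i])
    lo, mid, hi = (strengths[i] for i in order)
    return order[2] if (hi - mid) > (mid - lo) else order[0]

def accuracy(odd_is_target):
    correct = 0
    for _ in range(N):
        if odd_is_target:   # one target (index 0) among two lures
            items = [random.gauss(D_PRIME, SIGMA_T),
                     random.gauss(0, 1), random.gauss(0, 1)]
        else:               # one lure (index 0) among two targets
            items = [random.gauss(0, 1),
                     random.gauss(D_PRIME, SIGMA_T),
                     random.gauss(D_PRIME, SIGMA_T)]
        correct += pick_odd(items) == 0
    return correct / N

acc_target_odd = accuracy(True)
acc_lure_odd = accuracy(False)
print(acc_target_odd > acc_lure_odd)  # True under this UVSDT sketch
```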
In a series of experiments and simulations, O'Connor et al. found that the
lures were easier to identify as odd, consistent with a model that assumes
recollection contributes to the decision. Given the evidence against threshold
recollection, however, Wixted and Mickes' (2010) continuous dual-process model is
likely to be more successful overall than the HTSDT model. And as O'Connor et al.
(2011) realized, there is also a version of the UVSDT model that can account for the
higher accuracy on lure-odd trials: if the old-new decision criterion is set liberally,
then most targets will fall above that criterion, allowing the lure to be readily
identified.
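That liberal-criterion account can also be sketched. One way to formalize it (an assumption here, not a rule specified by O'Connor et al.) is to classify each probe as old or new against a liberal criterion and call the item classified differently from the other two "odd," guessing when all three are classified alike. Parameter values are hypothetical:

```python
import random

random.seed(2)

# Hypothetical parameters: lures ~ N(0, 1), targets ~ N(d, s^2);
# c is set liberally, well below the midpoint d/2.
d, s = 1.5, 1.5
c = 0.25

def trial(target_odd):
    if target_odd:
        strengths = [random.gauss(0, 1), random.gauss(0, 1), random.gauss(d, s)]
    else:
        strengths = [random.gauss(d, s), random.gauss(d, s), random.gauss(0, 1)]
    odd_index = 2                       # the odd item is always listed last
    old = [x > c for x in strengths]    # classify each probe old vs. new
    if sum(old) == 1:                   # one "old" among two "new": it is odd
        choice = old.index(True)
    elif sum(old) == 2:                 # one "new" among two "old": it is odd
        choice = old.index(False)
    else:                               # all classified alike: guess
        choice = random.randrange(3)
    return choice == odd_index

n = 200_000
acc_target_odd = sum(trial(True) for _ in range(n)) / n
acc_lure_odd = sum(trial(False) for _ in range(n)) / n
print(acc_lure_odd, acc_target_odd)
```

With the liberal criterion, most targets are classified "old," so the lure stands out on lure-odd trials and the simulated lure-odd accuracy exceeds target-odd accuracy, reversing the pattern produced by an unbiased criterion.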
3.3.3 Second choice tasks
Parks and Yonelinas (2009) brought a different task from the perception
literature (Swets, Tanner, & Birdsall, 1961) to the recognition memory literature,
and applied it to both item and associative recognition. They gave participants four
memory probes to choose from (one target and three lures), and two tries to select the
target. Threshold and SDT models make different predictions about how the second
choice responses should be related to the first choice. For the threshold model in
which only targets can be detected, the first selection should be the target, if it is
detected (Kellen & Klauer, 2011). Lures can never be detected as old in this model,
so that response strategy would always lead to a correct decision on the initial
response. Because only one of the response options is a target, failure to select it
first means that both the first and second choices must be a consequence of a
random guessing process from a state of uncertainty. Thus, first and second choices
will be unrelated to one another.
In the SDT model, the second choice will be systematically related to the first.
The participant is assumed to select the option that has the greatest strength, which
will usually be a target (DeCarlo, 2013). If, due to distributional overlap, the
strongest item happens to be a lure, then the second choice is likely to be the target,
because the probability that two lures will fall in the upper tail of the distribution is
low.
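A quick simulation illustrates this dependency under hypothetical UVSDT parameters (one target among three lures, choices made in strength order); under a threshold model, the conditional probability computed below would instead equal the guessing rate of 1/3:

```python
import random

random.seed(3)

# Hypothetical UVSDT parameters: target ~ N(d, s^2), three lures ~ N(0, 1).
d, s = 1.5, 1.25
n = 200_000

second_correct = first_wrong = 0
for _ in range(n):
    items = [("target", random.gauss(d, s))] + \
            [("lure", random.gauss(0, 1)) for _ in range(3)]
    items.sort(key=lambda it: it[1], reverse=True)   # strongest first
    if items[0][0] != "target":          # first choice missed the target
        first_wrong += 1
        if items[1][0] == "target":      # second choice finds it
            second_correct += 1

p = second_correct / first_wrong
print(round(p, 3))   # well above the 1/3 expected from pure guessing
```

Because a first-choice error usually means only one lure happened to exceed the target, the target is typically the second-strongest item, so second choices are far more accurate than chance.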
In item recognition, Parks and Yonelinas (2009, see also Kellen & Klauer,
2014) found that the second choice responses were related to the first choice,
consistent with the UVSDT model and inconsistent with the HTSDT model. In
contrast, associative recognition second-choices were unrelated to first choice
responses, a result that seems to suggest a threshold interpretation consistent with
recollection contributing to associative responses. The problem with that
conclusion, of course, is that the HTSDT account of the associative recognition ROCs
(see section 3.2.2) assumes that "unitization" leads participants to respond based
on familiarity, a UVSDT process (see Equation 9).
Another challenge for the interpretation of these second choice responses is
that the analyses fail to account for model complexity. Consistent with earlier
analyses for confidence-based recognition ROCs (Jang et al., 2011), but not
remember-know data (Cohen et al., 2008), Kellen and Klauer (2011) argued that the
UVSDT model is more flexible than the other models when applied to data from this
second-choice task. For this reason, they conclude that either the HTSDT or MSDT
model provides the best description of the data. However, their conclusions are
based on normalized maximum likelihood. This criterion may have too strong a
preference for simple models, because when applied to old-new recognition data it
concludes in favor of models that generate symmetric ROCs (Kellen et al., 2013).
3.3.4 Minimal assumption tests
Many of the predictions for particular model outcomes depend on
assumptions about the model that may not be essential properties of that model.
For example, the assumption that the SDT models involve Gaussian evidence
distributions is a convenience rather than a necessity of the model. So-called
minimal assumption tests of these models attempt to sidestep these ancillary
assumptions, thus making predictions that reflect the core properties of the model,
rather than the details of how it happens to be implemented.
One minimal assumption test exploits the SDT models' prediction that fewer
extreme errors (high-confidence misses or false alarms) should be made as memory
strength increases. This prediction follows directly from the decreasing overlap of
the target and lure distributions (see Figure 1). In contrast, the 2HT model assumes
that extreme errors result from guessing; the conditional independence assumption
requires that responses based on guessing are independent of memory strength
(see section 3.1.2). Kellen and Klauer (2015) tested these competing predictions by
focusing on high-confidence misses ("sure new" decisions for targets). Their data
were consistent with the 2HT model's predictions, leading them to conclude that
there is no direct mapping of memory evidence to response confidence. Of course,
this conclusion contradicts the results of a great many other experiments on
recognition memory.
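The SDT side of this prediction is straightforward to compute. The sketch below uses hypothetical parameter values and a fixed "sure new" criterion; under the 2HT conditional independence assumption, by contrast, the high-confidence miss rate would not change with strength:

```python
from statistics import NormalDist

# UVSDT sketch with hypothetical parameters: targets ~ N(d, s^2) and a
# fixed, hypothetical "sure new" criterion.  A high-confidence miss is a
# target whose strength falls below that criterion.
s = 1.25
c_sure_new = -1.0

def p_high_conf_miss(d):
    """Probability a target receives a 'sure new' response."""
    return NormalDist(d, s).cdf(c_sure_new)

weak, strong = p_high_conf_miss(1.0), p_high_conf_miss(2.0)
print(weak, strong)   # extreme errors drop as memory strength grows
```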
4 Challenges
We've seen that the evidence from a variety of memory tasks supported the
UVSDT model over its competitors. Despite the near unanimity of that success,
there are some challenges that must be faced. Several of these challenges stem from
a failure of virtually all recognition memory studies to consider all variables in the
experiment, including both those that are part of the design (e.g., item and subject
effects) and all possible dependent measures (e.g., reaction times).
4.1 Aggregation Effects
One issue faced by all of the models is the problem of data aggregation (Pratte,
Rouder, & Morey, 2010; DeCarlo, 2011; Pratte & Rouder, 2011). Our modeling
efforts usually involve collapsing responses over trials, subjects, or both. We know
that there are individual differences in participants' response strategies in these
tasks (e.g., Kapucu et al., 2010; Jang et al., 2011; Kantner & Lindsay, 2012), yet we
typically ignore those differences and consider only group-level behavior. The
consequences for the model fits are not necessarily bad (Cohen, Sanborn, & Shiffrin,
2008; Cohen et al., 2008), but certainly should be evaluated. Likewise, we almost
invariably ignore item effects that also occur (e.g., Freeman, Heathcote, Chalmers, &
Hockley, 2010; Isola, Xiao, Parikh, Torralba, & Oliva, 2014). Pratte et al. (2010; see
also DeCarlo, 2011) demonstrated that aggregation of data over items and subjects
can systematically distort our conclusions: In item recognition tasks, overestimation
of accuracy effects and underestimation of zROC slopes can result. Aggregation
disguises variability, meaning that we also have too much confidence in the stability
of our parameter estimates. Hierarchical modeling approaches have been
developed to address these challenges (e.g., Klauer, 2006, 2010; Pratte et al., 2010;
Pratte & Rouder, 2011), but have yet to be widely adopted.
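A minimal sketch of the slope distortion: every simulated subject below is an equal-variance detector (individual zROC slope of 1), but pooling across subjects whose sensitivity varies inflates the spread of the pooled target strengths relative to the lures. All numbers are hypothetical:

```python
import random
from statistics import stdev

random.seed(5)

# Each simulated subject has equal-variance Gaussian distributions
# (individual zROC slope = 1), but sensitivity d_i varies across subjects.
lures, targets = [], []
for _ in range(500):                        # subjects
    d_i = random.gauss(1.5, 0.5)            # subject-specific sensitivity
    lures += [random.gauss(0, 1) for _ in range(200)]
    targets += [random.gauss(d_i, 1) for _ in range(200)]

# Under Gaussian SDT the zROC slope equals sd(lures) / sd(targets).
pooled_slope = stdev(lures) / stdev(targets)
print(round(pooled_slope, 3))   # below the true individual slope of 1.0
```

Between-subject variability in sensitivity adds only to the pooled target variance here, so the group-level slope estimate falls below 1 even though no individual subject has an unequal-variance representation.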
4.2 Reaction Times
We saw earlier that reaction times have successfully discriminated different
interpretations of remember-know responses, with results favoring the UVSDT
model (Wixted & Stretch, 2004; Rotello & Zeng, 2008; Wixted & Mickes, 2010). In
addition, diffusion model fits to binary-response recognition memory data provide
converging evidence for the UVSDT model (Starns et al., 2012). Despite these
positives, newer models developed to simultaneously fit both confidence ratings
and reaction time distributions present an interpretive challenge to those studies
(Ratcliff & Starns, 2009; Voskuilen & Ratcliff, 2016; see also Van Zandt, 2000;
Pleskac & Busemeyer, 2010). Ratcliff's RTCON model includes two sets of criteria
that together predict the ROC and RT distributions for the task. The confidence
criteria are like those in Figure 1; they partially determine the predicted points on
the ROC. The areas under the curve between those confidence criteria also yield the
mean drift rates for a set of diffusion processes, one for each confidence level.
Responses are made when the first diffusion process hits its decision criterion,
resulting in both a response time and a confidence rating. Thus, the decision criteria
determine the reaction time distributions, but because the model is fit to all of the
data simultaneously, the confidence parameters are constrained by decision
parameters, and vice versa, and both sets of criteria constrain the estimated
evidence distributions. A consequence is that the slope of the zROC does not
correspond to the ratio of standard deviations of the lure and target distributions, as
the UVSDT model assumes. For this reason, Ratcliff and colleagues (Ratcliff &
Starns, 2009; Voskuilen & Ratcliff, 2016) caution against relying on zROC slopes as a
basis for theoretical conclusions.
4.3. Criterion Variability
A different criticism of using zROC slopes to draw theoretical conclusions
comes from experiments in which confidence ratings are collected across test lists
that vary in their base rate of targets and lures (Schulman & Greenberg, 1970; Van
Zandt, 2000). These studies yield data that appear to violate the core assumptions
of signal detection theory. As described in sections 3.1.1 and 3.1.2, zROCs can be
constructed across lists, using the different base rates to generate the operating
points, and they can also be constructed within lists using the confidence ratings.
According to SDT models, both zROCs are based on the same underlying evidence
distributions, so the same slope should be estimated from both methods. In
contrast to this prediction, steeper confidence-based zROC slopes were observed on
tests that included a higher proportion of targets. For this reason, Van Zandt
(2000) concluded that the UVSDT model could not account for the data. However,
this conclusion fails to consider the influence of criterion variability. As Rotello and
Macmillan (2008) argued, Treisman and Williams' (1984) model of criterion
variability predicts the observed pattern of slopes, without any modification to the
underlying signal detection model.
Essentially, Treisman and Williams (1984) argued that criterion location is
determined by three factors. Task demands and the first few test trials set the
criterion initially, then on each trial the criterion is shifted toward the average of the
recently observed evidence values so that finer discriminations may be made. The
shift toward the recent-mean is offset by a tendency to adjust the criterion so that
the same response is more likely on the next trial, allowing sequential dependencies
to occur (Malmberg & Annis, 2012). When confidence criteria are required, the
more extreme criteria tend to have greater variability because the probability of
observing a test item from the upper tail of the target distribution (or the lower tail
of the lures) is low. The overall effect of Treisman and Williams’s three factors is
that the presence of more targets than lures increases "old" confidence criterion
variability to a greater extent than "new" confidence criterion variability, and the
opposite is true for tests that include relatively more lures.
Across trials, this means that there is noise in the location of the criterion as
well as the evidence (Wickelgren & Norman, 1966; Norman & Wickelgren, 1969).
This means that the slope of the zROC is not just the ratio of the standard deviation
of the lures (1) to the targets (s), it’s actually

slope = √[(1 + σc²) / (s² + σc²)]          (14)

where σc² is the variance of the criterion location. These evidence and criterion
components of variance cannot be separated empirically using a standard
experimental design (but see Benjamin, Diaz, & Wee, 2009 and Mueller &
Weidemann, 2008, for two attempts, and Kellen, Klauer, & Singmann, 2012, for
criticism). In fact, we don't need to measure both evidence and criterion noise
separately to know that both are present: the data reported by Van Zandt (2000)
and by Schulman and Greenberg (1970) confirm earlier simulation work by
Treisman and Faulkner (1984) and demonstrate that the criterion noise is both
present and systematic.
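Equation 14 can be checked numerically: trial-to-trial criterion noise is equivalent to adding the same noise to the evidence itself, so the simulated standard-deviation ratio should match the formula. The parameter values are hypothetical:

```python
import random
from math import sqrt
from statistics import stdev

random.seed(6)

# Hypothetical parameters: lures ~ N(0, 1), targets ~ N(d, s^2), and
# criterion noise with standard deviation sigma_c on every trial.
d, s, sigma_c = 1.5, 1.3, 0.8
n = 400_000

# Evidence compared against a jittering criterion is equivalent to
# jittered evidence compared against a fixed criterion.
lures = [random.gauss(0, 1) - random.gauss(0, sigma_c) for _ in range(n)]
targets = [random.gauss(d, s) - random.gauss(0, sigma_c) for _ in range(n)]

observed = stdev(lures) / stdev(targets)             # empirical zROC slope
predicted = sqrt((1 + sigma_c**2) / (s**2 + sigma_c**2))   # Equation 14
print(round(observed, 3), round(predicted, 3))
```

The two values agree closely, confirming that criterion noise pushes the slope away from the pure evidence-based ratio 1/s.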
4.4 Residual Analyses Reveal Potential Problems for All Models
Recently, Dede, Squire, and Wixted (2014) proposed a new strategy for
evaluating the relative fit of models of recognition memory. Specifically, they
suggested looking at the pattern of residuals between observed data and the
models’ best fitting predictions. If the residuals for a particular model are
systematic across data sets, then that indicates a problem inherent to the model. On
the other hand, if the residuals are not systematic then they reflect random noise in
any given experimental outcome, lending credibility to that model. Dede et al.
(2014) applied this strategy to four recognition memory data sets, with the goal of
deciding whether the HTSDT or UVSDT model offers the better explanation. The
HTSDT fits yielded the same systematic pattern of residuals across all four data sets,
whereas the residuals for the UVSDT model appear to reflect only statistical noise,
suggesting that the UVSDT model provides the better overall account of these data.
Dede et al.’s (2014) results are promising. On the other hand, Kellen and
Singmann (2016) adopted the same basic strategy of analyzing residuals and
reached a different conclusion. Kellen and Singmann fit a larger number of data
sets, included the MSDT model in the evaluation, and used a different criterion for
defining systematic residuals. They found that all of the models under consideration
displayed at least some systematic deviations from the data. It remains an open
question whether these residuals reflect core assumptions of each of the models, or
whether they reflect ancillary details such as the assumed form of the evidence
distributions.
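The residual strategy itself can be sketched on synthetic data: "observed" ROC points generated from a UVSDT model, when described by an equal-variance model, leave residuals with a systematic sign pattern rather than noise. Anchoring the equal-variance model at the middle operating point is a simplification standing in for a full maximum-likelihood fit:

```python
from statistics import NormalDist

z = NormalDist()                     # standard normal for z-transforms
d, s = 1.5, 1.3                      # hypothetical UVSDT generating model
criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]

far = [1 - z.cdf(c) for c in criteria]                  # false-alarm rates
hit = [1 - NormalDist(d, s).cdf(c) for c in criteria]   # "observed" hit rates

# Equal-variance prediction (slope 1 in z-space), anchored at the middle point.
d_ev = z.inv_cdf(hit[2]) - z.inv_cdf(far[2])
hit_ev = [z.cdf(z.inv_cdf(f) + d_ev) for f in far]

residuals = [h - p for h, p in zip(hit, hit_ev)]
print([round(r, 3) for r in residuals])
```

The residuals change sign systematically along the curve (negative at one end, positive at the other), the signature of a misspecified model that Dede et al. and Kellen and Singmann exploited.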
5 Conclusion
Across a wide range of recognition memory tasks, including those thought to
rely heavily on a recollection process, the unequal-variance signal detection model
provides a consistently – almost unanimously – better fit to data than do
competing models. Assessments of model flexibility indicate that the success of the
UVSDT model is not due to intrinsically greater flexibility. Instead, this simple
model appears to provide an excellent description of recognition memory
performance. As such, it should serve as the foundation for research on recognition
memory in domains as varied as "real-world" applications of memory (e.g.,
eyewitness identification decisions: Mickes, Flowe, & Wixted, 2012) and the
neurological basis of recognition (e.g., Squire, Wixted, & Clark, 2007).
6 References
Arndt, J., & Reder, L. M. (2002). Word frequency and receiver operating
characteristic curves in recognition memory: Evidence for a dual-process
interpretation. Journal of Experimental Psychology: Learning, Memory, &
Cognition, 28, 830-842. doi: 10.1037//0278-7393.28.5.830
Banks, W. P. (1970). Signal detection theory and human memory. Psychological
Bulletin, 74, 81-99.
Banks, W. P. (2000). Recognition and source memory as multivariate decision
processes. Psychological Science, 11, 267–273.
Bamber, D. (1979). State-trace analysis: A method of testing simple theories of
causation. Journal of Mathematical Psychology, 19, 137-181.
Bastin, C., Diana, R.A., Simon, J., Collette, F., Yonelinas, A.P., & Salmon, E. (2013).
Associative memory in aging: The effect of unitization on source memory.
Psychology and Aging, 28, 275-283. doi: 10.1037/a0031566
Bayen, U. J., Murnane, K., & Erdfelder, E. (1996). Source discrimination, item
detection, and multinomial models of source monitoring. Journal of
Experimental Psychology: Learning, Memory, & Cognition, 22, 197-215.
Benjamin, A. S., Diaz, M., & Wee, S. (2009). Signal detection with criterion noise:
Applications to recognition memory. Psychological Review, 116, 84-115. doi:
10.1037/a0014351
Brainerd, C. J., Reyna, V. F., & Mojardin, A. H. (1999). Conjoint recognition.
Psychological Review, 106, 160-179.
Bröder, A., Kellen, D., Schütz, J., & Rohrmeier, C. (2013). Validating a two-high-
threshold measurement model for confidence rating data in recognition.
Memory, 21, 916-944. doi: 10.1080/09658211.2013.767348
Bröder, A. & Schütz, J. (2009). Recognition ROCs are curvilinear - or are they? On
premature arguments against the two-high-threshold model of recognition.
Journal of Experimental Psychology: Learning, Memory & Cognition, 35, 587-
606. [see also correction to Bröder and Schütz (2009). JEP: LMC, 47(5), 1301]
Buchner, A., & Erdfelder, E. (1996). On the assumptions of, relations between, and
evaluations of some process dissociation measurement models.
Consciousness and Cognition, 5, 581-594.
Chandler, C. C. (1989). Specific retroactive interference in modified recognition
tests: Evidence for an unknown cause of interference. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 15, 256-265.
Chandler, C. C. (1994). Studying related pictures can reduce accuracy, but increase
confidence, in a modified recognition test. Memory & Cognition, 22, 273-280.
Chen, T., Starns, J. J., & Rotello, C. M. (2015). A violation of the conditional
independence assumption of discrete state models of recognition memory.
Journal of Experimental Psychology: Learning, Memory, & Cognition, 41, 1215-
1222. doi: 10.1037/xlm0000077
Clark, S. E. (1997). A Familiarity-Based Account of Confidence-Accuracy Inversions
in Recognition Memory. Journal of Experimental Psychology: Learning,
Memory, & Cognition, 23, 232-238.
Cohen, A. L., Rotello, C. M., & Macmillan, N. A. (2008). Evaluating models of
remember-know judgments: Complexity, mimicry, and discriminability.
Psychonomic Bulletin & Review, 15, 906-926.
Cohen, A. L., Sanborn, A. N., & Shiffrin, R. M. (2008). Model evaluation using grouped
or individual data. Psychonomic Bulletin & Review, 15, 692-712. doi:
10.3758/PBR.15.4.692
DeCarlo, L. T. (2002). Signal detection theory with finite mixture distributions:
Theoretical developments with applications to
recognition memory. Psychological Review, 109, 710-721.
DeCarlo, L. T. (2003). An application of signal detection theory with finite mixture
distributions to source discrimination. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 29, 767-778.
DeCarlo, L. T. (2007). The mirror effect and mixture signal detection theory. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 33, 18-33.
DeCarlo, L. T. (2011). Signal detection theory with item effects. Journal of
Mathematical Psychology, 55, 229-239.
DeCarlo, L. T. (2012). On a signal detection approach to m-alternative forced choice
with bias, with maximum likelihood and Bayesian approaches to estimation.
Journal of Mathematical Psychology, 56, 196-207.
Dede, A. J. O., Squire, L. R., & Wixted, J. T. (2014). A novel approach to an old
problem: Analysis of systematic errors in two models of recognition memory.
Neuropsychologia, 52, 51–56. doi: 10.1016/j.neuropsychologia.2013.10.012
Dewhurst, S. A., & Conway, M. A. (1994). Pictures, images, and recollective
experience. Journal of Experimental Psychology: Learning, Memory, &
Cognition, 20, 1088-1098.
Dewhurst, S. A., Holmes, S. J., Brandt, K. R., & Dean, G. M. (2006). Measuring the
speed of the conscious components of recognition memory: Remembering is
faster than knowing. Consciousness & Cognition, 15, 147-162.
Dobbins, I. G., Kroll, N. E. A., & Liu, Q. (1998). Confidence–accuracy inversions in
scene recognition: A remember–know analysis. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 24, 1306–1315.
Dodson, C. S., Bawa, S., & Slotnick, S. D. (2007). Aging, source memory, and
misrecollections. Journal of Experimental Psychology: Learning, Memory, &
Cognition, 33, 169-181.
Donaldson, W. (1996). The role of decision processes in remembering and knowing.
Memory & Cognition, 24, 523-533.
Dougal, S., & Rotello, C. M. (2007). “Remembering” emotional words is based on
response bias, not recollection. Psychonomic Bulletin & Review, 14, 423-429.
Dube, C., & Rotello, C. M. (2012). Binary ROCs in perception and recognition
memory are curved. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 38, 130-151.
Dube, C., Rotello, C. M., & Heit, E. (2011). The belief bias effect is aptly named: A
reply to Klauer and Kellen (2011). Psychological Review, 118, 155-163.
Dube, C., Starns, J. J., Rotello, C. M., & Ratcliff, R. (2012). Beyond ROC curvature:
Strength effects and response time data support continuous-evidence models
of recognition memory. Journal of Memory & Language, 67, 389-406.
Dunn, J. C. (2004). Remember-know: A matter of confidence. Psychological Review,
111, 524-542.
Dunn, J. C. (2008). The dimensionality of the Remember-Know task: A state-trace
analysis. Psychological Review, 115, 426-446.
Dunn, J. C. & Kirsner, K. (1988). Discovering functionally independent mental
processes: The principle of reversed association. Psychological Review, 95,
91- 101.
Egan, J. P. (1958). Recognition memory and the operating characteristic (Technical
Note AFCRC-TN-58-51). Bloomington, IN: Indiana University Hearing and
Communication Laboratory.
Erdfelder, E., & Buchner, A. (1998). Process-dissociation measurement models:
Threshold theory or detection theory? Journal of Experimental Psychology:
General, 127, 83–97. doi:10.1037/0096-3445.127.1.83
Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in
recognition memory for words. Journal of Memory and Language, 62, 1–18.
doi: 10.1016/j.jml.2009.09.004
Gardiner, J. M., & Richardson-Klavehn, A. (2000). Remembering and knowing. In E.
Tulving & F. I. M. Craik (Eds.), The Oxford handbook of memory (pp. 229–244).
Oxford, England: Oxford University Press.
Glanzer, M., Hilford, A., & Kim, K. (2004). Six regularities of source recognition.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 30,
1176–1195.
Glanzer, M., Kim, K., Hilford, A., & Adams, J. K. (1999). Slope of the receiver-operating
characteristic in recognition memory. Journal of Experimental Psychology:
Learning, Memory, & Cognition, 25, 500–513.
Green, D. M. & Swets, J. A. (1966). Signal detection theory and psychophysics. New
York: Wiley. Reprinted 1974 by Krieger, Huntington, NY.
Greve, A., Donaldson, D. I., & van Rossum, M. C. W. (2010). A single-trace dual-
process model of episodic memory: A novel computational account of
familiarity and recollection. Hippocampus, 20, 235-251. doi:
10.1002/hipo.20606
Gronlund, S. D., & Ratcliff, R. (1989). Time course of item and associative information:
Implications for global memory models. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 15, 846-858.
Harlow, I. M., & Donaldson, D. I. (2013). Source accuracy data reveal the thresholded
nature of human episodic memory. Psychonomic Bulletin & Review, 20, 318–
325. doi:10.3758/s13423-012-0340-9
Hautus, M., Macmillan, N. A., & Rotello, C. M. (2008). Toward a complete decision
model of item and source recognition. Psychonomic Bulletin & Review, 15,
889-905.
Healy, M. R., Light, L. L., & Chung, C. (2005). Dual-process models of associative
recognition in young and older adults: Evidence from receiver operating
characteristics. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 31, 768-788. doi: 10.1037/0278-7393.31.4.768
Heathcote, A. (2003). Item Recognition Memory and the Receiver Operating
Characteristic. Journal of Experimental Psychology: Learning, Memory, &
Cognition, 29, 1210-1230. doi: 10.1037/0278-7393.29.6.1210
Heathcote, A., Bora, B., & Freeman, E. (2010). Recollection and confidence in
two-alternative forced choice episodic recognition. Journal of Memory and
Language, 62, 183-203. doi:10.1016/j.jml.2009.11.003
Heathcote, A., Raymond, F., & Dunn, J. (2006). Recollection and familiarity in
recognition memory: Evidence from ROC curves. Journal of Memory and
Language, 55, 495-514.
Hilford, A., Glanzer, M., Kim, K., & DeCarlo, L. T. (2002). Regularities of source
recognition: ROC analysis. Journal of Experimental Psychology: General, 131,
494-510.
Hintzman, D. L., & Curran, T. (1994). Retrieval dynamics of recognition and
frequency judgments: Evidence for separate processes of familiarity and
recall. Journal of Memory & Language, 33, 1-18.
Ingram, K. M., Mickes, L., & Wixted, J. T. (2012). Recollection can be weak and
familiarity can be strong. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 38, 325-339. doi: 10.1037/a0025483
Isola, P., Xiao, J., Parikh, D., Torralba, A., & Oliva, A. (2014). What makes a photograph
memorable? IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-14.
Jang, Y., Mickes, L., & Wixted, J. T. (2012). Three tests and three corrections:
Comment on Koen and Yonelinas (2010). Journal of Experimental Psychology:
Learning, Memory, and Cognition, 38, 513–523.
Jang, Y., Wixted, J. T., & Huber, D. E. (2009). Testing signal- detection models of
yes/no and two-alternative forced-choice recognition memory. Journal of
Experimental Psychology: General, 138, 291–306.
Jang, Y., Wixted, J. T., & Huber, D. E. (2011). The diagnosticity of individual data for
model selection: Comparing signal-detection models of recognition memory.
Psychonomic Bulletin & Review, 18, 751–757. doi: 10.3758/s13423-011-
0096-7
Kantner, J., & Lindsay, D. S. (2012). Response bias in recognition memory as a
cognitive trait. Memory & Cognition, 40, 1163-1177.
Kapucu, A., Macmillan, N. A., & Rotello, C. M. (2010). Positive and negative
remember judgments and ROCs in the plurals paradigm: Evidence for
alternative decision strategies. Memory & Cognition, 38, 541-554.
PMCID: PMC2887610
Kapucu, A., Rotello, C. M., Ready, R. E., & Seidl, K. N. (2008). Response bias in
‘remembering’ emotional stimuli: A new perspective on age differences.
Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 703-
711.
Kellen, D., & Klauer, K. C. (2011). Evaluating models of recognition memory using
first- and second-choice responses. Journal of Mathematical Psychology, 55,
251–266. http://dx.doi.org/10.1016/j.jmp.2010.11.004
Kellen, D., & Klauer, K. C. (2014). Discrete-state and continuous models of
recognition memory: Testing core properties under minimal assumptions.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 40,
1795–1804. http://dx.doi.org/10.1037/xlm0000016
Kellen, D., & Klauer, K. C. (2015). Signal detection and threshold modeling of
confidence-rating ROCs: A critical test with minimal assumptions.
Psychological Review, 122, 542–557. doi: 10.1037/a0039251
Kellen, D., Klauer, K. C., & Bröder, A. (2013). Recognition memory models and
binary-response ROCs: A comparison by minimum description length.
Psychonomic Bulletin & Review, 20, 693-719. doi: 10.3758/s13423-013-0407-
2
Kellen, D., Klauer, K. C., & Singmann, H. (2012). On the measurement of criterion
noise in Signal Detection Theory: The case of recognition
memory. Psychological Review, 119, 457-479.
Kellen, D., & Singmann, H. (2016). ROC residuals in signal-detection models of
recognition memory. Psychonomic Bulletin & Review, 23, 253-264. doi:
10.3758/s13423-015-0888-2
Kelley, R., & Wixted, J. T. (2001). On the nature of associative information in
recognition memory. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 27, 701–722. doi: 10.1037//0278-7393.27.3.701
Klauer, K.C. (2006). Hierarchical multinomial processing tree models: a latent-class
approach. Psychometrika, 71, 1–31.
Klauer, K. C. (2010). Hierarchical multinomial processing tree models: a latent-trait
approach. Psychometrika, 75, 70-98. doi: 10.1007/s11336-009-9141-0
Klauer, K. C. & Kellen, D. (2010). Toward a complete decision model of item and
source recognition: A discrete-state approach. Psychonomic Bulletin & Review,
17, 465-478.
Koen, J. D., & Yonelinas, A. P. (2010). Memory variability is due to the contribution of
recollection and familiarity, not encoding variability. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 36, 1536–1542.
doi:10.1037/a0020448
Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review,
76, 308–324. doi:10.1037/h0027238
Kroll, N. E. A., Yonelinas, A. P., Dobbins, I. G., & Frederick, C. M. (2002). Separating
sensitivity from response bias: Implications of comparisons of yes-no and
forced-choice tests for models and measures of recognition memory. Journal
of Experimental Psychology: General, 131, 241-254. doi: 10.1037/0096-
3445.131.2.241
Lachman, R., & Field, W. H. (1965). Recognition and recall of verbal materials as a
function of degree of training. Psychonomic Science, 2, 225-226.
Macho, S. (2004). Modeling associative recognition: A comparison of two-high-
threshold, two-high-threshold signal detection, and mixture distribution
models. Journal of Experimental Psychology: Learning, Memory, and Cognition,
30, 83–97. doi: 10.1037/0278-7393.30.1.83
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide (2nd ed.).
Mahwah, NJ: Erlbaum.
Macmillan, N. A., Rotello, C. M., & Verde, M. F. (2005). On the importance of models
in interpreting remember-know experiments: Comments on Gardiner et al.’s
(2002) meta-analysis. Memory, 13, 607-621.
Malejka, S., & Bröder, A. (2016). No source memory for unrecognized items when
implicit feedback is avoided. Memory & Cognition, 44, 63-72. doi:
10.3758/s13421-015-0549-8
Malmberg, K. J. (2002). On the form of ROCs constructed from confidence ratings.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 28,
380-387.
Malmberg, K. J., & Annis, J. (2012). On the Relationship between Memory and
Perception: Sequential dependencies in recognition testing. Journal of
Experimental Psychology: General, 141, 233-259.
Malmberg, K. J., & Xu, J. (2006). The Influence of Averaging and Noisy Decision
Strategies on the Recognition Memory ROC. Psychonomic Bulletin & Review,
13, 99-105.
Mandler, G. (1980). Recognizing: The judgment of previous occurrence.
Psychological Review, 87, 252–271.
McElree, B., Dolan, P. O., & Jacoby, L. L. (1999). Isolating the contributions of
familiarity and source information to item recognition: A time course
analysis. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 25, 563–582.
Mickes, L., Flowe, H. D., & Wixted, J. T. (2012). Receiver operating characteristic
analysis of eyewitness memory: Comparing the diagnostic accuracy of
simultaneous versus sequential lineups. Journal of Experimental Psychology:
Applied, 18, 361-376. doi: 10.1037/a0030609
Mickes, L., Johnson, E. M., & Wixted, J. T. (2010). Continuous recollection versus
unitized familiarity in associative recognition. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 36, 843–863. doi:
10.1037/a001975
Mickes, L., Wais, P. E., & Wixted, J. (2009). Recollection is a continuous process:
Implications for dual-process theories of recognition memory. Psychological
Science, 20, 509-515.
Mickes, L., Wixted, J. T. & Wais, P. E. (2007). A Direct Test of the Unequal-Variance
Signal-Detection Model of Recognition Memory. Psychonomic Bulletin &
Review, 14, 858-865.
Mueller, S. T., & Weidemann, C. T. (2008). Decision noise: An explanation for
observed violations of signal detection theory. Psychonomic Bulletin &
Review, 15, 465-494. doi: 10.3758/PBR.15.3.465
Myung, J. I., Navarro, D. J., & Pitt, M. A. (2006). Model selection by normalized
maximum likelihood. Journal of Mathematical Psychology, 50, 167-179.
Norman, D. A., & Wickelgren, W. A. (1969). Strength theory of decision rules and
latency in retrieval from short-term memory. Journal of Mathematical
Psychology, 6, 192-208.
O’Connor, A. R., Guhl, E. N., Cox, J. C., & Dobbins, I. G. (2011). Some memories are
odder than others: Judgments of episodic oddity violate known decision
rules. Journal of Memory and Language, 64, 299-315. doi:
10.1016/j.jml.2011.02.001
Onyper, S. V., Zhang, Y. X., & Howard, M. W. (2010). Some-or-none recollection:
Evidence from item and source memory. Journal of Experimental Psychology:
General, 139, 341–364. doi:10.1037/a0018926
Parks, C. M., & Yonelinas, A. P. (2007). Moving beyond pure signal detection models:
Comment on Wixted (2007). Psychological Review, 114, 188–202.
doi:10.1037/0033-295X.114.1.188
Parks, C. M., & Yonelinas, A. P. (2009). Evidence for a memory threshold in second-
choice recognition memory responses. Proceedings of the National Academy
of Sciences, 106, 11515-11519.
Parks, C. M., Murray, L. J., Elfman, K., & Yonelinas, A. P. (2011). Variations in
recollection: The effects of complexity on source recognition. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 37, 861-873.
doi: 10.1037/a0022798
Pazzaglia, A. M., Dube, C., & Rotello, C. M. (2013). A critical comparison of discrete-
state and continuous models of recognition memory: Implications for
recognition and beyond. Psychological Bulletin, 139, 1173-1203.
Petrusic, W. M., & Baranski, J. V. (2003). Judging confidence influences decision
processing in comparative judgments. Psychonomic Bulletin & Review, 10,
177–183. doi:10.3758/BF03196482
Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among
computational models of cognition. Psychological Review, 109, 472-491.
Pleskac, T. J., & Busemeyer, J. R. (2010). Two-stage dynamic signal detection: A
theory of choice, decision time, and confidence. Psychological Review, 117,
864-901. doi: 10.1037/a0019737
Pratte, M. S., & Rouder J. N. (2011). Hierarchical single- and dual-process models of
recognition memory. Journal of Mathematical Psychology, 55, 36-46.
Pratte, M. S., & Rouder, J. N. (2012). Assessing the dissociability of recollection and
familiarity in recognition memory. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 38, 1591-1607. doi: 10.1037/a0028144
Pratte, M. S., Rouder J. N., & Morey R. D. (2010). Separating mnemonic process from
participant and item effects in the assessment of ROC asymmetries. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 36, 224-232.
Province, J. M., & Rouder, J. N. (2012). Evidence for discrete-state processing in
recognition memory. Proceedings of the National Academy of Sciences, 109,
14357-14362. doi: 10.1073/pnas.1103880109
Qin, J., Raye, C. L., Johnson, M. K., & Mitchell, K. J. (2001). Source ROCs are (typically)
curvilinear: Comment on Yonelinas (1999). Journal of Experimental
Psychology: Learning, Memory, & Cognition, 27, 1110-1115.
Quamme, J. R., Yonelinas, A. P., & Norman, K. A. (2007). Effect of unitization on
associative recognition in amnesia. Hippocampus, 17, 192-200. doi:
10.1002/hipo.20257
Rajaram, S. (1993). Remembering and knowing: Two means of access to the
personal past. Memory & Cognition, 21, 89–102.
Rajaram, S., & Geraci, L. (2000). Conceptual fluency selectively influences knowing.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 26,
1070-1074.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.
Ratcliff, R., McKoon, G., & Tindall, M. (1994). The empirical generality of data from
recognition memory receiver-operating characteristic functions and
implications for global memory models. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 20, 763–785.
Ratcliff, R., Sheu, C.-F., & Gronlund, S. D. (1992). Testing global matching memory
models using ROC curves. Psychological Review, 99, 518-535.
Ratcliff, R., & Starns, J. J. (2009). Modeling confidence and response time in
recognition memory. Psychological Review, 116, 59–83.
Reder, L. M., Nhouyvanisvong, A., Schunn, C. D., Ayers, M. S., Angstadt, P., & Hiraki, K.
(2000). A mechanistic account of the mirror effect for word frequency: A
computational model of remember–know judgments in a continuous
recognition paradigm. Journal of Experimental Psychology: Learning, Memory,
& Cognition, 26, 294-320.
Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory
testing. Psychological Review, 107, 358-367. doi: 10.1037/0033-295X.107.2.358
Rotello, C. M. (2000). Recall processes in recognition memory. In D. L. Medin (Ed.),
The Psychology of Learning and Motivation, Vol. 40 (pp. 183-221). San Diego,
CA: Academic Press.
Rotello, C. M., & Heit, E. (1999). Two-process models of recognition memory:
Evidence for Recall-to-reject? Journal of Memory and Language, 40, 432-453.
Rotello, C. M., & Heit, E. (2000). Associative recognition: A case of recall-to-reject
processing. Memory & Cognition, 28, 907-922.
Rotello, C. M., & Macmillan, N. A. (2006). Remember-know models as decision
strategies in two experimental paradigms. Journal of Memory and Language, 55,
479-494.
Rotello, C. M., & Macmillan, N. A. (2008). Response bias in recognition memory. In A.
S. Benjamin & B. H. Ross (Eds.), The Psychology of Learning and Motivation:
Skill and Strategy in Memory Use, Vol. 48 (pp. 61-94). London: Academic
Press.
Rotello, C. M., & Zeng, M. (2008). Analysis of RT distributions in the remember-
know paradigm. Psychonomic Bulletin & Review, 15, 825-832.
Rotello, C. M., Macmillan, N. A., Hicks, J. L., & Hautus, M. (2006). Interpreting the
effects of response bias on remember-know judgments using signal-
detection and threshold models. Memory & Cognition, 34, 1598-1614.
Rotello, C. M., Macmillan, N. A., Reeder, J. A., & Wong, M. (2005). The remember
response: Subject to bias, graded, and not a process-pure indicator of
recollection. Psychonomic Bulletin & Review, 12, 865-873.
Rotello, C. M., Macmillan, N. A., & Van Tassel, G. (2000). Recall-to-reject in
recognition: Evidence from ROC curves. Journal of Memory and Language, 43,
67-88.
Schulman, A. I., & Greenberg, G. Z. (1970). Operating characteristics and a priori
probability of the signal. Perception & Psychophysics, 8, 317-320.
Schütz, J. & Bröder, A. (2011). Signal detection and threshold models of source
memory. Experimental Psychology, 58(4), 293-311.
Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM –
Retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-
166.
Slotnick, S. D. (2010). "Remember" source memory ROCs indicate recollection is a
continuous process. Memory, 18, 27-39. doi: 10.1080/09658210903390061
Slotnick, S. D., & Dodson, C. S. (2005). Support for a continuous (single process)
model of recognition memory and source memory. Memory & Cognition, 33,
151–170.
Slotnick, S. D., Jeye, B. M., & Dodson, C. S. (2016). Recollection is a continuous
process: Evidence from plurality memory receiver operating characteristics.
Memory, 24, 2-11. doi: 10.1080/09658211.2014.971033
Slotnick, S. D., Klein, S. A., Dodson, C. S., & Shimamura, A. P. (2000). An analysis of
signal detection and threshold models of source memory. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 26, 1499–1517.
Smith, D. G., & Duncan, M. J. J. (2004). Testing theories of recognition memory by
predicting performance across paradigms. Journal of Experimental
Psychology: Learning, Memory, & Cognition, 30, 615-625. doi: 10.1037/0278-7393.30.3.615
Snodgrass, J. G., & Corwin, J. (1988). Pragmatics of measuring recognition memory:
Applications to dementia and amnesia. Journal of Experimental Psychology:
General, 117, 34–50. doi:10.1037/0096-3445.117.1.34
Squire, L. R., Wixted, J. T., & Clark, R. E. (2007). Recognition memory and the medial
temporal lobe: A new perspective. Nature Reviews Neuroscience, 8, 872-883.
doi: 10.1038/nrn2154
Starns, J. J., & Ksander, J. C. (2016). Item strength influences source confidence and
alters source memory zROC slopes. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 42, 351-365. doi: 10.1037/xlm0000177
Starns, J. J., Hicks, J. L., Brown, N. L., & Martin, B. A. (2008). Source memory for
unrecognized items: Predictions from multivariate signal detection theory.
Memory & Cognition, 36, 1-8.
Starns, J. J., Pazzaglia, A. M., Rotello, C. M., Hautus, M. J., & Macmillan, N. A. (2013).
Unequal-strength source zROC slopes reflect criteria placement and not
(necessarily) memory processes. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 39, 1377-1392.
Starns, J. J., Ratcliff, R., & McKoon, G. (2012). Evaluating the unequal-variance and
dual-process explanations of zROC slopes with response time data and the
diffusion model. Cognitive Psychology, 64, 1–34. doi:
10.1016/j.cogpsych.2011.10.002
Starns, J. J., Rotello, C. M., & Hautus, M. J. (2014). Recognition memory zROC slopes
for items with correct versus incorrect source decisions discriminate the
dual process and unequal variance signal detection models. Journal of
Experimental Psychology: Learning, Memory, & Cognition, 40, 1205-1225.
Starns, J. J., Rotello, C. M., & Ratcliff, R. (2012). Mixing strong and weak targets
provides no evidence against the unequal-variance explanation of zROC
slopes: A comment on Koen & Yonelinas (2010). Journal of Experimental
Psychology: Learning, Memory, and Cognition, 38, 793-801.
Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in
perception. Psychological Review, 68, 301–340. doi: 10.1037/h0040547
Treisman, M., & Faulkner, A. (1984). The effects of signal probability on the slope of
the receiver operating characteristic given by the rating procedure. British
Journal of Mathematical and Statistical Psychology, 37, 199-215.
Treisman, M., & Williams, T. C. (1984). A theory of criterion setting with an
application to sequential dependencies. Psychological Review, 91, 68-111.
Tulving, E. (1981). Similarity relations in recognition. Journal of Verbal Learning and
Verbal Behavior, 20, 479-496.
Tulving, E. (1985). Memory and consciousness. Canadian Psychology, 26, 1-12.
Van Zandt, T. (2000). ROC curves and confidence judgments in recognition memory.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 26,
582-600.
Verde, M. F., & Rotello, C. M. (2004). ROC curves show that the revelation effect is
not a single phenomenon. Psychonomic Bulletin & Review, 11, 560-566.
Voskuilen, C., & Ratcliff, R. (2016). Modeling confidence and response time in
associative recognition. Journal of Memory and Language, 86, 60-96. doi:
10.1016/j.jml.2015.09.006
Wagenmakers, E.-J., Ratcliff, R., Gomez, P., & Iverson, G. J. (2004). Assessing model
mimicry using the parametric bootstrap. Journal of Mathematical Psychology,
48, 28-50.
Westerman, D. L. (2001). The role of familiarity in item recognition, associative
recognition, and plurality recognition on self-paced and speeded tests.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 27,
723-732. doi: 10.1037/0278-7393.27.3.723
Wickelgren, W. A., & Norman, D. A. (1966). Strength models and serial position in
short-term recognition memory. Journal of Mathematical Psychology, 3, 316-
347.
Wixted, J. T. (2007). Dual-process theory and signal-detection theory of recognition
memory. Psychological Review, 114, 152-176.
Wixted, J. T. & Mickes, L. (2010). Useful scientific theories are useful: A reply to
Rouder, Pratte and Morey (2010). Psychonomic Bulletin & Review, 17, 436-
442.
Wixted, J. T. & Mickes, L. (2010). A continuous dual-process model of
remember/know judgments. Psychological Review, 117, 1025-1054.
Wixted, J. T. & Stretch, V. (2004). In defense of the signal-detection interpretation of
Remember/Know judgments. Psychonomic Bulletin & Review, 11, 616-641.
Yonelinas, A. P. (1994). Receiver-operating characteristics in recognition memory:
Evidence for a dual-process model. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 20, 1341-1354.
Yonelinas, A. P. (1997). Recognition memory ROCs for item and associative
information: The contribution of recollection and familiarity. Memory &
Cognition, 25, 747–763.
Yonelinas, A. P. (1999a). The contribution of recollection and familiarity to
recognition and source-memory judgments: A formal dual-process model
and analysis of receiver operating characteristics. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 25, 1415–1434.
Yonelinas, A. P. (1999b). Recognition memory ROCs and the dual-process signal
detection model: Comment on Glanzer, Kim, Hilford, and Adams. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 25, 514–521.
Yonelinas, A. P. (2001). Consciousness, control and confidence: The three Cs of
recognition memory. Journal of Experimental Psychology: General, 130, 361-
379.
Yonelinas, A. P. (2002). The nature of recollection and familiarity: A review of 30
years of research. Journal of Memory and Language, 46 (3), 441-517.
Yonelinas, A. P., & Jacoby, L. L. (1995). The relation between remembering and
knowing as bases for recognition: Effects of size congruency. Journal of
Memory and Language, 34, 622-643. doi:10.1006/jmla.1995.1028
Yonelinas, A. P., Kroll, N. E. A., Dobbins, I. G., & Soltani, M. (1999). Recognition
memory for faces: When familiarity supports associative recognition
judgments. Psychonomic Bulletin & Review, 6, 654-661.
Yonelinas, A. P., & Parks, C. M. (2007). Receiver operating characteristics (ROCs) in
recognition memory: A review. Psychological Bulletin, 133, 800–832.
doi:10.1037/0033-2909.133.5.800
Table 1. Parameters of the models when fit to a confidence-rating ROC with m confidence bins.

Model   Sensitivity        Variance or Mixture   Criterion    State-Response Mapping
        Parameter(s)       Parameter             Locations    Parameters
-----   ----------------   -------------------   ----------   -------------------------
EVSDT   d'                 --                    m-1          --
UVSDT   d                  s                     m-1          --
HTSDT   R, d'              --                    m-1          --
MSDT    dFull, dPartial    λ                     m-1          --
2HT     po, pn             --                    --           Varies. Maximum = 3(m-1).
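The mapping from these parameters to an ROC can be made concrete. The sketch below (in Python; the d, s, and criterion values are hypothetical, chosen only for illustration) generates one (false-alarm, hit) pair per criterion under the UVSDT model, with the EVSDT as the special case s = 1:

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def uvsdt_roc(d, s, criteria):
    """(F, H) ROC points for UVSDT: lures ~ N(0, 1), targets ~ N(d, s).
    Each of the m - 1 criteria yields one point; EVSDT sets s = 1."""
    points = []
    for c in sorted(criteria, reverse=True):  # strictest criterion first
        F = 1 - Phi(c)             # P("old" | lure)
        H = 1 - Phi((c - d) / s)   # P("old" | target)
        points.append((F, H))
    return points

# Hypothetical fit: d = 1.5, s = 1.25, m = 6 confidence bins -> 5 criteria
for F, H in uvsdt_roc(1.5, 1.25, [-1.0, -0.25, 0.5, 1.25, 2.0]):
    print(round(F, 3), round(H, 3))
```

Plotting z(H) against z(F) for these points gives a line with slope 1/s, which is why the UVSDT zROC slope departs from 1 exactly when s ≠ 1.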
Figure Captions
Figure 1. Top row: Equal-variance signal detection (EVSDT) model decision space
(left panel), example ROCs (middle panel), and corresponding zROCs (right panel).
Bottom row: Unequal-variance signal detection (UVSDT) model decision space (left
panel), example ROCs (middle panel), and corresponding zROCs (right panel).
Figure 2. High-threshold signal detection (HTSDT) model. Left panel: decision
space for targets. Lure decision space is identical to EVSDT. Middle panel: example
ROCs for three recollection probabilities (.2, .4, .6) and a constant d’ (1.5). Right
panel: corresponding zROCs.
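The Figure 2 ROCs follow from the standard dual-process response probabilities, in which recollection produces an "old" response to targets regardless of the familiarity criterion. A minimal sketch (in Python; the criterion value is hypothetical):

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def htsdt_point(R, d_prime, c):
    """One (F, H) ROC point for HTSDT: targets are recollected with
    probability R; otherwise target familiarity ~ N(d', 1) is compared
    with criterion c, as is lure familiarity ~ N(0, 1)."""
    F = 1 - Phi(c)
    H = R + (1 - R) * (1 - Phi(c - d_prime))
    return F, H

# Caption values: d' = 1.5, R in {.2, .4, .6}, one illustrative criterion
for R in (0.2, 0.4, 0.6):
    print(htsdt_point(R, 1.5, 0.75))
```

As c grows, F approaches 0 while H approaches R, so the predicted ROC meets the y-axis above the origin; that asymmetry is the model's signature in the middle panel.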
Figure 3. Mixture signal detection (MSDT) model. Left panel: decision space. Lure
distribution on the left, distribution for unattended targets (dashed distribution),
and attended target distribution on the right. Middle panel: Example ROCs for three
values of λ (.2, .4, .6). Right panel: corresponding zROCs.
Figure 4. Double high-threshold (2HT) model for binary (old-new) decision task.
Left panel: decision space; T = Target; L = Lure; ? = uncertain state. Middle panel:
Example ROCs for three values of po (.2, .4, .6) and three values of pn (.05, .25, .45).
Right panel: corresponding zROCs.
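The linear ROCs in Figure 4 come from the usual 2HT response probabilities, in which undetected items are guessed "old" with some probability g. A minimal sketch (in Python; the guessing rates are hypothetical):

```python
def tht_point(po, pn, g):
    """One (F, H) point for the 2HT model: targets are detected with
    probability po, lures with probability pn; undetected items fall in
    the uncertain state and are guessed "old" with probability g."""
    H = po + (1 - po) * g   # detected targets plus lucky guesses
    F = (1 - pn) * g        # only undetected lures can be called "old"
    return F, H

# Caption values: po = .6, pn = .45; sweeping g traces out one ROC
for g in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(tht_point(0.6, 0.45, g))
```

Every point falls on the line H = po + F(1 - po)/(1 - pn), so the ROC is linear in probability coordinates and curved in z coordinates, the opposite of the signal detection models.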
Figure 5. Double high-threshold (2HT) model for confidence rating task. Left
panel: decision space, T = Target; L = Lure; ? = uncertain state. Middle panel:
Example ROCs for po = .6, pn = .45 and three different sets of detect state to response
mapping parameters. Right panel: corresponding zROCs.
Figure 6. Example fits of the competing models to an item recognition task. Data
(circles) are from a single subject in Egan (1958, Exp. 1), reported in his Table 1.
Best-fitting model predictions are shown. The MSDT and HTSDT models make
identical predictions for these data, though the MSDT requires an extra parameter
to do so. All models except the EVSDT provide acceptable fits (they cannot be
rejected by a G² goodness-of-fit measure).
Figure 7. Decision space for Hautus et al.’s (2008) bivariate model of item and
source recognition. Horizontal lines reflect the old-new confidence criteria, which
depend only on the old-new evidence dimension. Curved boundaries indicate the
optimal (likelihood-based) source confidence criteria (“1” = Sure Source B; “6” =
Sure Source A). From: Hautus, M., Macmillan, N. A., & Rotello, C. M. (2008). Toward
a complete decision model of item and source recognition. Psychonomic Bulletin &
Review, 15, 889-905, Figure 8. Reprinted with permission of Springer.
Figure 8. Decision space for Onyper et al.’s (2010) model of source and item
recognition. Like the HTSDT model, this model assumes source information is
recollected with some probability (upper distributions). In the absence of
recollection, responses are based on familiarity (lower distributions). Confidence
ratings (‘1’ through ‘9’) depend only on the item or source dimension. From: Onyper,
Zhang, & Howard (2010). Some-or-none recollection: Evidence from item and
source memory, Journal of Experimental Psychology: General, 139, 341–364, Figure
6A. Reprinted with permission of the American Psychological Association.
Figure 9. Two-alternative forced choice (2AFC) decision space.
Figure 10. Heathcote et al.’s (2010) model of the 2AFC confidence-accuracy
inversion data. Positive differences result in correct decisions; negative differences
in errors. Confidence is defined by distance to the criterion (0-point). Remember
responses are made for differences larger than the R/K criterion, and know
responses for small differences. The arrow indicates trial-to-trial variability in the
location of the R/K criterion. Adapted from: Clark, S. E. (1997). A familiarity-based
account of confidence-accuracy inversions in recognition memory. Journal of
Experimental Psychology: Learning, Memory, & Cognition, 23, 232-238. Adapted with
permission of the American Psychological Association.
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
Figure 6.
Figure 7.
Figure 8.
Figure 9.
Figure 10.