Andrew M Mccullough

FALSE MEMORY INCREASED BY SELF-GENERATION IN COMPARISONS OF RECOGNITION AND RECALL

Andrew M. McCullough

A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment of the Requirements for the Degree of Master of Arts

Department of Psychology

University of North Carolina Wilmington

2010

Approved by

Advisory Committee

Julian R. Keith Dale J. Cohen

Jeffrey P. Toth Chair

Accepted by

Dean, Graduate School

TABLE OF CONTENTS

ABSTRACT

LIST OF TABLES

LIST OF FIGURES

INTRODUCTION

COMPARING RECOGNITION AND RECALL

THEORIES OF RECOGNITION AND RECALL

Theories of Recognition

Theories of Recall

Metacognitive Processes

Directly Comparing Recognition and Recall

False Memory in Recognition and Recall

OVERVIEW OF EXPERIMENTS

EXPERIMENT 1

Memory and Detection Theory

Current Predictions

Methods

Participants and design

Materials

Procedure

Results and Discussion

Confidence ratings

Sensitivity and bias

ROC-based measures

EXPERIMENT 2

Current Predictions

Methods

Participants, design, and procedure

Results and Discussion

Type 1 analyses

Type 2 confidence

Metamemory parameters

ROC-based measures

GENERAL DISCUSSION

TABLES

FIGURES

REFERENCES


ABSTRACT

The overarching goal of this research was to explore the relation between recognition and recall. A more proximate goal was to investigate a recent report of elevated false memories in cued recall relative to recognition. Two experiments are presented that compared performance on matched tests of recognition and cued recall, and replicated this increase in false memory in confidence ratings (Experiment 1) and binary judgments (Experiment 2). Type-1 signal detection analyses of memory performance (Experiment 1) revealed changes in both sensitivity and response bias. In Experiment 2, the effect on type-1 sensitivity was observed with no effect on response bias. Type-2 analyses (Experiment 2) revealed no effects of test condition on metamemory for studied-item trials, but significant effects on metamemory confidence bias for unstudied-item trials. These results clarify prior studies, showing that self-generation in cued recall impairs memory sensitivity, and imply that previous, inconclusive results were obscured by metacognitive processes.

LIST OF TABLES

Table

1. Proportions of Recollect, Familiar, and New Judgments for Recognition and Cued Recall

2. Type-1 Signal Detection Measures and Parameters for Recognition and Cued Recall

3. Type-1 Signal Detection Measures and Parameters for Recognition and Cued Recall in Experiment 2

4. Mean Type-2 Confidence

5. Type-2 Signal Detection Measures and Parameters

LIST OF FIGURES

Figure

1. Classification model for Type-1 responses and theoretical signal detection distributions

2. General interpretations of memory data according to signal detection theory

3. Receiver operating characteristic curves for recognition and cued recall in Experiment 1

4. Classification model for Type-2 responses and theoretical signal detection distributions

5. Type-2 receiver operating characteristic curves for recognition and cued recall in Experiment 2

FALSE MEMORY INCREASED BY SELF-GENERATION

IN COMPARISONS OF RECOGNITION AND RECALL

INTRODUCTION

Two methods for measuring memory performance are recognition and recall tasks, and both are analogous to everyday uses of memory. While recognition refers to an awareness of having previously experienced some target information, recall requires the (re-)production of previously experienced information. In recognition tasks, people are presented with an item (e.g., a word, a face, a melody) and are asked to judge whether they have experienced that item in the past, usually in some specific context (i.e., a study list). For recall tasks, people are reminded of a specific past event and asked to produce (i.e., self-generate) some items that occurred within it.

Everyday life provides abundant examples of this contrast between recall and recognition. After attending a weekend party, for example, one might be asked by a friend who was unable to attend, "who was at the party?" (a recall question), followed by, "was John there?" (a recognition question). The distinction between recall and recognition is also critical for the scientific study of memory. However, although both are generally viewed as conscious, "explicit" forms of retrieval (Cabeza et al., 1997), they are often studied separately, and have engendered different methods, phenomena, and theories (Haist, Shimamura, & Squire, 1992).

The ubiquity of the distinction between recall and recognition raises two important issues. The first concerns the extent to which recognition and recall engage similar underlying processes. As described more fully below, attempts to understand the relation between these two ways of expressing memory have a long history in psychology; however, it is difficult to find clear statements in the literature about exactly how these two tests are related. The second issue concerns false memories. That is, to what extent are these two forms of memory susceptible to errors and distortion? The study of false memory has become a central issue in both cognitive psychology and cognitive neuroscience (e.g., Schacter & Tulving, 1994), not only because of its relevance in applied contexts such as eye-witness testimony (see Loftus, 1975), but also because it is viewed as providing clues to the more general operation of memory, both accurate and false (see Jacoby & Rhodes, 2006; Johnson, Hashtroudi, & Lindsay, 1993; Schacter, Norman, & Koutstaal, 1998).

The purpose of the present research was to address the two issues raised above. Two experiments are presented that directly compared recall and recognition in an attempt to better understand the relation between the two, and to investigate whether the generation process underlying recall makes this form of retrieval more susceptible to false memory. The research is based on unpublished studies by Toth (2005) and McCullough & Toth (2008), both of which found evidence for a high degree of similarity in performance on matched tests of recall and recognition, as well as evidence that, relative to recognition, recall increases false memory.

Importantly, those studies employed a "remember/know" methodology (Tulving, 1985) that, while used extensively in the memory literature (see Gardiner, Ramponi, & Richardson-Klavehn, 2002; Gardiner & Richardson-Klavehn, 2000), has been criticized on theoretical grounds (Dunn, 2004; Rotello, Macmillan, Reeder, & Wong, 2005; Wixted & Stretch, 2004). Thus, one specific goal of the present research was to replicate those prior findings using a more standard memory-testing technique, confidence ratings.

A second goal of the present research was to replicate and extend our prior findings by analyzing memory performance using Receiver Operating Characteristic (ROC) curves and other measures from both type-1 and type-2 Signal Detection Theory (SDT). Type-1 SDT is commonly used in the analysis of recognition memory performance (for a review, see Yonelinas & Parks, 2007), where it provides measures of both sensitivity (i.e., the ability to discriminate between studied and unstudied items) and response bias (i.e., the overall tendency of a person to respond “old”). The application of type-1 SDT to recall is less common, given the participants’ role in producing both studied and unstudied test items (Green & Swets, 1966), but there are exceptions (see Higham, Perfect, & Bruno, 2009). The present research examined recall in a paradigm that encouraged production of unstudied items, and therefore allowed SDT-based measures to be computed.
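As an illustration of the two type-1 measures just described, the following minimal sketch computes sensitivity (d') and response bias (c) from old/new response counts. It assumes an equal-variance Gaussian model and a log-linear correction for extreme rates; the function name, the correction, and the counts in the usage note are illustrative assumptions, not the analysis procedure reported in this thesis.

```python
from statistics import NormalDist


def type1_sdt(hits, misses, false_alarms, correct_rejections):
    """Return (hit_rate, fa_rate, d_prime, criterion) for old/new judgments.

    Assumes an equal-variance Gaussian signal detection model (an assumption
    of this sketch, not necessarily the model fitted in the thesis).
    """
    z = NormalDist().inv_cdf
    # Log-linear correction: add 0.5 to each count so that rates of exactly
    # 0 or 1 (where the z-transform is undefined) cannot occur.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # bias; negative = liberal
    return hit_rate, fa_rate, d_prime, criterion
```

For example, with hypothetical counts, `type1_sdt(40, 10, 10, 40)` and `type1_sdt(40, 10, 20, 30)` share the same hit rate, but the second has more false alarms and therefore yields a lower d' and a more liberal (more negative) criterion.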

Although type-1 SDT provides a solid theoretical basis for analyzing memory performance, it does not incorporate metacognitive processes that have recently been argued to be critical for regulating mnemonic accuracy (Koriat & Goldsmith, 1996; Koriat, Goldsmith, & Pansky, 2000; Perfect & Schwartz, 2002). However, Higham has recently shown how type-2 SDT can be used to measure metacognitive processes, both in recall (Higham & Tam, 2005) and recognition (Higham et al., 2009). In the context of memory, type-2 SDT provides two metacognitive measures of performance that are useful for describing the relationship between response accuracy and confidence (for a review of type-2 SDT, see Galvin, Podd, Drga, & Whitmore, 2003). Monitoring (analogous to sensitivity) reflects a person's ability to discriminate between correct and incorrect responses, while confidence bias (analogous to response bias) reflects a person's tendency to accept his or her answers as correct. The present research also employed type-2 SDT methods in order to compare the metacognitive processes associated with recall and recognition, and to investigate their role in the production of false memories.
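The two type-2 measures described above can be sketched in the same way. In this minimal example, a type-2 "hit" is a confident response that was correct and a type-2 "false alarm" is a confident response that was incorrect. The Gaussian z-transform indices, the log-linear correction, and the function name are assumptions made for illustration; type-2 distributions need not be Gaussian, and the parameters reported in Experiment 2 may be computed differently.

```python
from statistics import NormalDist


def type2_sdt(trials):
    """trials: iterable of (was_correct, was_confident) boolean pairs.

    Returns (type2_hit_rate, type2_fa_rate, monitoring, confidence_bias).
    """
    trials = list(trials)
    z = NormalDist().inv_cdf
    confident_when_correct = [conf for ok, conf in trials if ok]
    confident_when_wrong = [conf for ok, conf in trials if not ok]
    # Log-linear correction keeps both rates strictly between 0 and 1.
    hit2 = (sum(confident_when_correct) + 0.5) / (len(confident_when_correct) + 1)
    fa2 = (sum(confident_when_wrong) + 0.5) / (len(confident_when_wrong) + 1)
    monitoring = z(hit2) - z(fa2)              # analogous to sensitivity
    confidence_bias = -0.5 * (z(hit2) + z(fa2))  # negative = accepts readily
    return hit2, fa2, monitoring, confidence_bias
```

A participant who is confident on every trial, correct or not, would show a monitoring index of zero together with a strongly liberal confidence bias, which is the kind of dissociation these two measures are meant to capture.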

The remainder of this introduction is designed to more fully explain and motivate the current research. I start by describing the research by Toth (2005) and McCullough and Toth (2008) in more detail. I then provide a theoretical background for the two main issues addressed in the current research, that is, the relation between recognition and recall, and findings of false memories on these tests. One specific false memory phenomenon – the revelation effect – is described in particular detail because it bears a high similarity to the effects explored in the present research, and thus motivated the use of ROC curves and SDT for these investigations. I conclude this introduction with an overview of the experiments. Details about the application of type-1 SDT to recognition and recall are provided in the introduction to Experiment 1; details about type-2 SDT are provided in the introduction to Experiment 2.

COMPARING RECOGNITION AND RECALL IN THE REMEMBER/KNOW TASK

Toth (2005) compared recognition and recall in six experiments using variants of the remember/know procedure initially introduced by Tulving (1985; see also Gardiner & Richardson-Klavehn, 2000). Experiment 1 by Toth provides an example of this procedure, along with the mixed-test design that was adapted for the current research. In this experiment, participants studied a list of common, 5-letter English words (e.g., “truck”, “smart”), and were then given a memory test that included both recognition and recall trials. On recognition trials, participants were shown studied or nonstudied words which they were asked to classify as Recollected (R), Familiar (F), or New (N). Participants were told to classify a word as R when they could clearly remember details surrounding their initial study of the word, F when the word was familiar but they could not remember any specific study details, and N when the word neither elicited details nor was familiar.¹ On recall trials, participants were presented with word stems (e.g., “tru--”, “sma--”) which they were told to use as cues for recalling words studied earlier, guessing if necessary. After completing the stem, they then provided an R/F/N judgment, just as they had for recognition trials.

¹ Rather than “Familiar”, many researchers use the term “Know” to denote items that are familiar but lack episodic detail.

The data from this experiment, mean proportions of R, F, and N judgments for recognition and recall trials, are shown in Table 1a. Note that, for recognition trials, participants provided a response to every studied item, and an equal number of unstudied items, and thus these proportions sum to 1.0. For the recall trials, in contrast, participants did not have the opportunity to respond to every item in the stimulus set because they sometimes failed to generate these words. The recall data are thus presented in two ways, as unconditionalized and conditionalized proportions. The unconditionalized proportions (bottom row) represent the proportions of R, F, and N responses out of all possible trials (i.e., out of all possible studied and unstudied items cued in recall); thus, these values sum to the mean proportions of studied and unstudied items that were output in response to all of the recall cues. For the conditionalized proportions, the number of R, F, and N responses is divided by the number of studied or unstudied words output.² Note that this correction is analogous to that used by Koriat and colleagues to compute memory accuracy (see Koriat & Goldsmith, 1994, 1996). More relevant for present purposes, it puts the recall data on the same scale as recognition (i.e., the conditionalized recall proportions sum to 1.0), thereby allowing a direct comparison of performance in the two test types.

² As in the priming literature, restricting analyses to the critical set of target words allows the effects of studying an item to be compared to a baseline of outputting the same item when it had not been studied.

Two interesting results emerged from this experiment. First, looking at studied items (left panel of Table 1a), one can see that performance was nearly identical between recognition and conditionalized recall. These data suggest the processes underlying recognition and recall are highly similar, differing primarily in initial access to (i.e., the source of generation of) relevant target information. Once this difference is removed, by conditionalizing recall proportions on output of test items, recognition and recall performance are nearly identical.
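The difference between unconditionalized and conditionalized recall proportions described above can be made concrete with a short sketch. The function name and the counts used in the usage note are hypothetical and are not taken from Table 1a.

```python
def recall_proportions(counts, n_cued):
    """Return (unconditionalized, conditionalized) R/F/N proportions.

    counts: dict of R, F, and N response counts for target words that the
            participant actually output on recall trials.
    n_cued: number of target words cued in recall (output or not).
    """
    n_output = sum(counts.values())
    # Unconditionalized: out of all cued targets, so the values sum to the
    # proportion of targets that were output.
    unconditionalized = {k: v / n_cued for k, v in counts.items()}
    # Conditionalized: out of targets actually output, so the values sum to
    # 1.0 and share a scale with the recognition proportions.
    conditionalized = (
        {k: v / n_output for k, v in counts.items()} if n_output else {}
    )
    return unconditionalized, conditionalized
```

With hypothetical counts of 6 R, 3 F, and 3 N responses for 20 cued studied words, the unconditionalized proportions sum to .60 (the proportion of targets output), whereas the conditionalized proportions (.50, .25, .25) sum to 1.0.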

Despite this evidence of similarity between processes underlying recognition and recall (see also Hamilton & Rajaram, 2003), the data revealed an important difference between test formats. In particular, the proportion of F responses to unstudied words (right panel of Table 1a) was much higher in conditionalized recall than in recognition. This elevation in false alarms for cued recall, replicated in five additional experiments by Toth (2005), suggests that the process of generating items increased their subjective familiarity, compared to when those items were presented externally (i.e., by the computer) on the recognition memory trials.

McCullough and Toth (2008) replicated the above findings using a novel sample (UNCW undergraduates), a novel set of stimuli, and a slightly different test procedure – asking participants to first make an old/new (i.e., studied/unstudied) judgment, followed by a recollect/familiar classification only for the items judged “old”. As shown in Table 1b, both patterns described above were replicated. That is, for studied items, there was a close correspondence of response proportions across test type (all differences n.s.) while, for unstudied items, there was a significant increase in F responses in recall as compared to recognition.³ This consistent finding is particularly interesting when one considers that the set of (generated) items in which the false alarms increase are all words that are counterbalanced to recognition trials. Consequently, the procedure allows one to conclude that something inherent to self-generation is increasing false memory in cued recall.

³ Note that conditionalized recall proportions are necessarily interdependent (i.e., constrained to sum to 1.0); thus, the increased proportion of F responses is accompanied by a reduced proportion of N responses.

What explains the increased false alarm rate in cued recall observed by Toth (2005) and McCullough and Toth (2008)? One possibility is that, relative to recognition, self-generation of items in recall increases the processing fluency, and thus the perceived familiarity, of these items. Fluency has been identified as an important factor affecting recognition memory and other judgments (Jacoby & Whitehouse, 1989; see also Oppenheimer, 2008; Whittlesea, 1993; Whittlesea & Williams, 2001). However, research shows that processing fluency affects response criterion placement (Whittlesea; see also Jacoby & Whitehouse), suggesting that effects would be similar for studied and unstudied items. The consistent finding that hit rates do not differ between recognition and cued recall (Toth; McCullough & Toth) refutes a simple fluency-based explanation of the increased false alarm rate.

The finding that an independent variable affects unstudied, but not studied, items is unusual and relevant to theories of memory (see Jacoby, Shimizu, Daniels, & Rhodes, 2005). One potential way of explaining this pattern is to assume that self-generation in recall produced a shift in the strength or familiarity distributions underlying old and/or new items, perhaps in addition to a change in response bias. Indeed, an explanation of this sort has been put forth to explain a similar pattern of data (i.e., a memory effect selective to unstudied items) in the revelation effect literature (Verde & Rotello, 2004). One of the goals of the present research is to evaluate whether a similar “distribution-shift” account can explain the difference in false alarm rates for recall vs. recognition. I return to this issue in the False Memory section below. First, however, I provide some background on the nature of the processes thought to underlie recognition and recall.
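The contrast between a criterion shift and a "distribution-shift" account can be illustrated numerically. The sketch below assumes an equal-variance Gaussian model with unit standard deviation; the means, criterion values, and function name are hypothetical and chosen only to show the qualitative pattern.

```python
from statistics import NormalDist


def hit_and_fa_rates(old_mean, new_mean, criterion):
    """Probability of an 'old' response to studied and unstudied items.

    Items are called 'old' when their strength exceeds the criterion.
    Unit-variance Gaussian strength distributions are an assumption here.
    """
    hit_rate = 1 - NormalDist(old_mean, 1).cdf(criterion)
    fa_rate = 1 - NormalDist(new_mean, 1).cdf(criterion)
    return hit_rate, fa_rate


baseline = hit_and_fa_rates(old_mean=1.5, new_mean=0.0, criterion=0.75)
# A more liberal criterion raises both hits and false alarms.
liberal = hit_and_fa_rates(old_mean=1.5, new_mean=0.0, criterion=0.50)
# Shifting only the new-item distribution raises false alarms alone.
shifted = hit_and_fa_rates(old_mean=1.5, new_mean=0.4, criterion=0.75)
```

Under these assumptions, only the shift in the unstudied-item distribution reproduces the pattern of equal hit rates with elevated false alarms; a pure criterion change moves both rates together.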

THEORIES OF RECOGNITION AND RECALL

What are the processes that underlie recognition and recall performance? An early idea was that the two forms of retrieval reflected the same memory process – activation of an underlying memory trace – and that performance differences between the two simply reflected the superiority of recognition in activating the trace (Hunt & Ellis, 2004). This idea was consistent with the fact that recognition is generally easier than recall, and often results in higher memory scores. It is also consistent with Tulving's (1983) claim that recognition is similar to recall, but involves a highly specific “copy cue” for retrieval. However, subsequent research cast doubt on at least a simple form of this single-process hypothesis, which predicts that (a) recognition will always be higher than recall, and (b) recognition and recall will be affected similarly by manipulations that affect memory strength. Both of these predictions have been refuted. Specifically, recall can exceed recognition, as in the case of recognition failure of recallable words (Tulving & Thomson, 1973; see also, Watkins, 1974), and some experimental variables, such as word frequency (see Gregg, 1976), have opposite effects on recognition and recall performance.

Given these findings, research on recognition and recall over the last two decades has largely progressed in independent domains, and there are few clear statements about the relationship between the two. However, from those decades of research, a variety of theories have been proposed to explain each of these types of retrieval separately. In the following subsections, I provide brief overviews of these theories. I then describe recent research and theory on the role of metacognitive processes in memory tasks. These developments provide motivation for the current Experiment 2, which used type-2 SDT to explore metacognitive processes in recognition and recall. I conclude this section by describing research that has directly compared recognition and recall, noting the advantages and disadvantages of the mixed-test approach used in the present research.

Theories of Recognition

Early theories proposed a single, evaluative process to underlie recognition (see Bernbach, 1967), such that test items are evaluated along a psychological scale of evidence often characterized as memory strength or familiarity. This "single-process" view of recognition is embodied in Global-Matching Models, such as SAM (Gillund & Shiffrin, 1984) and MINERVA 2 (Hintzman, 1988), and in the application of detection theory (Green & Swets, 1966) to recognition (e.g., Donaldson, 1992; Donaldson & Good, 1996; Macmillan & Creelman, 2005; Mickes, Wixted, & Wais, 2007). However, numerous studies suggest that, in addition to a sense of memory strength or familiarity, recognition can also invoke a process of controlled retrieval that results in lucid memory of contextual details from the encoding event (e.g., Tulving & Thomson, 1973; see also Jacoby et al., 2005; Mandler, 1980). Stated differently, there is evidence that recognition cannot be fully described as involving only a process of evaluation, because encoding details or “source” information may also be retrieved or generated from memory (Gardiner, Ramponi, & Richardson-Klavehn, 1998; Gardiner, Richardson-Klavehn, & Ramponi, 1998; Johnson et al., 1993; Vilberg & Rugg, 2007).

The above findings have led to the development of “dual-process” models of recognition (e.g., Jacoby, 1991; Jacoby & Dallas, 1981; Mandler, 1980) which propose a controlled-retrieval or recollection process that works in parallel with a more strength-based process of familiarity evaluation. In fact, hybrid theories such as the “dual-process/signal-detection” model have been supported by ROC data from recognition tasks in normal (Yonelinas, 1994) and amnesiac populations (Yonelinas, Kroll, Dobbins, Lazzara, & Knight, 1998), as well as by a wealth of neuro-imaging data (Eichenbaum, Yonelinas, & Ranganath, 2007; Elfman, Parks, & Yonelinas, 2008; Vilberg & Rugg, 2007).

The current project will not settle the ongoing debate between “single-process” and “dual-process” theories of recognition, as a variety of research has revealed compelling evidence on both sides (cf. Parks & Yonelinas, 2007; Wixted, 2007). In later sections, I describe the motivation for using signal detection theory (SDT) to analyze memory performance. Given the application of SDT, the experiments are most easily viewed in terms of a single, strength-based recognition process. At the same time, the use of type-2 SDT in Experiment 2 tacitly assumes the involvement of metacognitive processes in task performance, thus going beyond a simple “single-process” view (see also Bellezza, 2003). Moreover, in the General Discussion, I consider interpretations of the present data if a controlled retrieval process (e.g., Recollection) was assumed to affect performance, as is proposed by dual-process theories.
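Because ROC data figure in both the single-process and dual-process accounts above, it may help to show how ROC points are typically derived from confidence ratings. The sketch below cumulates response counts from the most confident "old" rating to the least; the six-point scale and the counts in the usage note are hypothetical, and the function name is illustrative.

```python
def roc_points(old_counts, new_counts):
    """Return cumulative (false_alarm_rate, hit_rate) pairs.

    old_counts / new_counts: response counts for studied and unstudied
    items, ordered from the highest-confidence 'old' rating to the
    highest-confidence 'new' rating. The final category is omitted because
    its cumulative point is always (1.0, 1.0).
    """
    n_old, n_new = sum(old_counts), sum(new_counts)
    points, cum_old, cum_new = [], 0, 0
    for old, new in zip(old_counts[:-1], new_counts[:-1]):
        cum_old += old
        cum_new += new
        points.append((cum_new / n_new, cum_old / n_old))
    return points
```

For instance, `roc_points([40, 20, 15, 10, 10, 5], [5, 10, 10, 15, 20, 40])` yields five points, each corresponding to a progressively more liberal criterion along the confidence scale.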

Theories of Recall

As with recognition, a variety of theories have been developed from studies of recall. One of the earliest accounts suggested that two main processes underlie recall performance: generation and recognition. First, potential memory information is covertly generated, and then generated candidates undergo a recognition-like process of evaluation (e.g., Bodner, Masson, & Caldwell, 2000). These Generate-Recognize (GR) theories can be traced to Kintsch (1970; see also Anderson & Bower, 1972; Bahrick, 1970), and in simplest form, predict that a recognition test probe will elicit better performance than any recall cue. However, as Jacoby and Hollingshead (1990) discuss, Tulving and colleagues (e.g., Tulving & Thomson, 1973) refuted early GR theories based on experimental findings that, in some situations, recall can exceed recognition.

Such limitations of early GR theories led to proposals that recall may be accomplished by the Direct Retrieval (DR) of information from memory (e.g., Guynn & McDaniel, 1999; Jacoby, 1998). Theories of DR posit that the cues available for recall can constrain processing such that only relevant information is retrieved, irrespective of whether retrieval is controlled or automatic.

Admittedly, the basic theories of GR and DR are just starting points for exploring recall performance, and both have evolved into many forms. Though the models may seem to be in opposition, there is a general consensus that both strategies are used in different situations (Curran & Hintzman, 1995; Jacoby, 1998). Variations of GR theories have been proposed that do not succumb to the refutations by Tulving and colleagues described above. Moreover, recent models propose that DR could be described as a contextually-constrained version of GR (e.g., Higham & Tam, 2005; Jacoby & Hollingshead, 1990). That is, if recall cues are sufficient to specify a particular prior experience (e.g., a study list), those cues could limit the generation process to information from only that specific experience. Such a theory implies that the varied strategies employed across the wide spectrum of recall tasks are not easily divided into two distinct categories.

Most relevant to the current research, Toth (2005, Experiment 2) compared GR and DR strategies in cued recall by directly manipulating test instructions, but observed little difference between the two conditions. This result, coupled with the high degree of correspondence across test format between R/F/N judgments for studied items, led Toth to conclude that, at least for the stimuli and retrieval cues he employed, recall performance was most consistent with a GR theory (see also Jacoby & Hollingshead, 1990). The present experiments will also bear on the issue of how best to characterize cued recall performance.

Metacognitive Processes in Recognition and Recall

Metacognition refers to thoughts, feelings, and beliefs about one’s own cognitive processing; metamemory describes metacognitive phenomena related to remembering. Metamemory is typically investigated by collecting some type of performance judgment during the study phase (e.g., judgments of learning) or test procedure (e.g., feelings of knowing, ratings of confidence-in-accuracy) (Hunt & Ellis, 2004). To paraphrase Dunlosky & Bjork (2008), any attempt to investigate memory or metamemory alone will likely fall short, as they are intricately linked. Moreover, the complex interplay of memory and metamemory is likely related to the ideas of consciousness, and particularly subjective awareness. Tulving (1985) reminded memory researchers that conscious remembering is an intrinsically subjective phenomenon, and encouraged the development of methods for exploring the subjective awareness of memory and other cognitive phenomena. This general view permeates contemporary theories of memory and metamemory, as evidenced by Koriat, Goldsmith, and Pansky’s (2000) description of the human participant as “an active agent that has at his/her disposal an arsenal of cognitive strategies and devices that can be flexibly applied in order to reach certain goals. The choice of such strategies as well as their online regulation is based on the subjective monitoring of these processes.” As theorists have begun to explore how subjective awareness affects task performance, a great deal of effort has been directed at developing methods to better assess the separate roles of memory and metamemory processes in overall performance (e.g., Higham & Tam, 2005; Koriat & Goldsmith, 1994, 1996).

Koriat and colleagues have designed a number of studies to elucidate mnemonic processes at the metamemory level. Noting that remembering in the real world is often more concerned with accuracy than with quantity, Koriat and Goldsmith (1994) manipulated response option (i.e., free/forced) and test format (i.e., recognition/recall) to examine monitoring and control of memory accuracy. The authors developed a framework that distinguishes between measures of memory that are quantity-based and accuracy-based, noting that effects shown between recognition and recall are often confounded with response option. That is, forced recognition (i.e., a quantity-based, or input-bound measure) is most frequently compared with free recall (i.e., an accuracy-based, or output-bound measure). The authors suggest that many comparisons of recognition and recall have focused on memory quantity, while overlooking memory accuracy. Subsequent research has shown that forcing output in recall tasks reduces memory accuracy without affecting memory quantity (Koriat & Goldsmith, 1996). Moreover, the increased accuracy in free recall is a function of a reduction in errors of commission, compared to forced report (Koriat & Goldsmith, 1996; Higham & Tam, 2005). In other words, when given an opportunity to withhold responses in recall tasks, participants use metacognitive processes to monitor generated items and control output of those items in an adaptive manner.
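The quantity/accuracy distinction described above amounts to a difference in denominators, which the following sketch makes explicit. The function name and all numbers are hypothetical and serve only to illustrate the pattern reported by Koriat and Goldsmith.

```python
def quantity_and_accuracy(n_studied, n_output, n_correct):
    """Return (quantity, accuracy) for a recall test.

    quantity: input-bound measure, correct answers out of all studied items.
    accuracy: output-bound measure, correct answers out of answers given.
    """
    quantity = n_correct / n_studied
    accuracy = n_correct / n_output if n_output else float("nan")
    return quantity, accuracy


# Hypothetical participant who studied 20 items and knows 10 of them.
forced = quantity_and_accuracy(n_studied=20, n_output=20, n_correct=10)
# Under free report the same participant withholds 8 likely-wrong answers.
free = quantity_and_accuracy(n_studied=20, n_output=12, n_correct=10)
```

In this hypothetical case quantity is identical across report options, while accuracy is higher under free report because errors of commission are withheld.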

While the current research differs considerably from studies comparing free and forced report procedures, the research described above does inform the current investigation of recognition and cued recall. In particular, the current research explored a reported increase in false alarms, or errors of commission, in cued recall. The current paradigm did not include a free-report phase, but Experiment 2 did explore metacognitive processes in recognition and cued recall using a paradigm that I describe below. So despite differences between the paradigms – which preclude direct comparisons between the current results and studies comparing free- and forced-report procedures – work by Koriat and colleagues (Koriat & Goldsmith, 1994, 1996; see also Higham & Tam, 2005) provides the foundation for basic predictions in the current Experiment 2.

Higham and colleagues (2009) have examined metamemory in recognition, using a variant of signal detection theory (SDT). While type-1 SDT is commonly used to analyze recognition performance, Higham et al. suggest that type-2 SDT is well suited to examine the relation between memory accuracy and confidence. This relationship is necessarily metacognitive, linking memory performance with processes that could allow for the monitoring and control of responding. I describe type-2 SDT in the overview to Experiment 2; for now, note that the current investigations are related to both of these methods for exploring metamemory. Specifically, the current Experiment 1 compares memory performance within subjects on tests of recognition and cued recall with a measure of memory accuracy analogous to that used by Koriat and Goldsmith (1996). Moreover, the current method uses confidence ratings, which allows for the application of Higham et al.’s method of exploring metamemory with type-2 SDT (see Experiment 2). I describe the benefits of the current research more fully in the next section.

Directly Comparing Recognition and Recall

Given the extensive literatures on recall and recognition, it is surprising that there is not more research directly comparing these two forms of retrieval. A series of "successive testing" studies by Tulving and others - primarily directed at the phenomenon of recognition failure of recallable words - probably constitutes the most concerted effort to understand relations between recognition and recall. This research is broadly summarized by the empirically-based Tulving-Wiseman function (Tulving & Wiseman, 1975; see also Flexser & Tulving, 1978; Wiseman & Tulving, 1976) which predicts a moderate but consistent association between recognition and recall. However, it is important to note that the Tulving-Wiseman function does not directly relate performance on the tests, but rather, shows the predictive relation between recognition and the proportion of recallable words that will also be recognized. That is, the Tulving-Wiseman function was primarily designed to describe the phenomenon of recognition failure of recallable words. Moreover, that paradigm, in which the same items are used in successive tests of recognition and recall, has been strongly criticized on statistical grounds (e.g., Simpson’s paradox, see Hintzman, 1980, 1992, 1993); it also has the drawback that responses to a memory probe in one test format (e.g., recall) may influence responses to that same item in a subsequent test format (e.g., recognition).
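For readers unfamiliar with it, the Tulving-Wiseman function is commonly stated as P(Rn | Rc) = P(Rn) + c[P(Rn) - P(Rn)^2], where P(Rn) is the overall probability of recognition, P(Rn | Rc) is the probability of recognizing a word given that it was recalled, and c is an empirical constant usually given as 0.5. The sketch below implements that commonly cited form; the function name is illustrative, and the form of the equation is stated here from the general literature rather than from this thesis.

```python
def tulving_wiseman(p_recognition, c=0.5):
    """Predicted P(recognized | recalled) from overall P(recognized).

    Uses the commonly cited form of the Tulving-Wiseman function with the
    empirical constant c = 0.5 (an assumption of this sketch).
    """
    return p_recognition + c * (p_recognition - p_recognition ** 2)


def predicted_recognition_failure(p_recognition, c=0.5):
    """Predicted proportion of recallable words that are not recognized."""
    return 1 - tulving_wiseman(p_recognition, c)
```

Because the predicted conditional probability only modestly exceeds the overall recognition probability, the function describes a weak dependence between recognizing and recalling the same item, which is the "moderate but consistent association" noted above.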

Aside from the recognition failure studies by Tulving and colleagues, the majority of studies directly comparing performance across test formats find that recognition is typically better than recall (Aggleton & Shaw, 1996). Studies have also revealed dissociations between recognition and recall performance, for example, as a function of word frequency (see Gregg, 1976; Reder, Anderson, & Bjork, 1974). A number of studies of neurological patients have shown that amnesiacs’ recall performance is often impaired to a much greater degree than is their recognition performance (e.g., Haist et al., 1992; Warrington & Weiskrantz, 1968), but again, these results may have been influenced by the quantity/accuracy distinction described above (see Koriat & Goldsmith, 1994, 1996). Craik (1979) has used these patterns to suggest that, while recognition performance is typically better, the critical difference between recognition and recall tasks is the amount of effortful, self-initiated processing required.

As described above, the present study is an extension of Toth’s (2005) direct comparisons of recognition and cued recall, which used a remember/know (R/K) procedure (Tulving, 1985). Tulving’s first use of the R/K procedure compared four memory tasks – recognition, free recall, and two forms of cued recall – and showed that states of subjective awareness vary across tasks. Hamilton & Rajaram (2003) replicated his work, using a between-groups design to obviate the criticisms of successive testing, and specifically examined recollective experience across tests. In four experiments, the authors observed no difference between recognition and cued recall in the proportion of retrieved items that were also given “remember” responses (cf. Tables 1 and 2). Note that Hamilton and Rajaram compared the dependent measure Remember/total Retrieved, which is analogous to the transformation of cued recall data presented by Toth.

The current paradigm alleviates a number of concerns that have plagued previous comparisons of recognition and recall. Many prior studies employed a successive-testing procedure, in which participants take recognition and recall tests for the same material in succession. By randomizing recognition and recall trials within a single test list, the current mixed-test paradigm avoids confounds that limit successive-testing procedures, including effects of test order, response strategy changes between tests, and memory interference from having already been tested on an item (in the other format). Additionally, because all stimuli are counterbalanced across test format and study condition, the paradigm allows for within-subject comparisons of recognition and recall performance. The final advantage of the current paradigm is the simplicity with which recognition and recall are scaled for comparison.

The current research uses a confidence rating method to explore effects of memory that were previously examined with an R/K methodology. Yonelinas (2001) empirically compared the R/K and confidence rating methods (as well as the process-dissociation procedure, Jacoby, 1991), and found strong evidence that measures derived from each methodology converged upon similar results about underlying mnemonic processes. His study suggests that experimental results will generalize across test methodologies, and thus motivated the current research.

Nevertheless, the current paradigm does not come without limitations. Most notably, there are items tested in recognition that are not tested in recall (i.e., not generated to a cue), thus raising questions about the assumptions underlying the application of SDT. One of the goals of the present research is to address this issue by directly comparing SDT parameters derived from recall to those derived from recognition, in which all items are tested. A major goal of this comparison is to better understand the elevated false memory rate observed by Toth (2005) and McCullough & Toth (2008). As described in the next section, the current project will inform false memory research, as the presently-explored effect may reflect the operation of a novel mechanism underlying false memory (namely, self-generation).

False Memory in Recognition and Recall

False memory has become a central issue in memory research, both for applied reasons and because false memory effects are thought to reveal fundamental characteristics of the processes underlying normal, accurate memory performance (for a review, see Schacter et al., 1998). There are a handful of mechanisms thought to underlie the occurrence of false memory (Schacter, 1999), including spreading activation (as shown in the Deese-Roediger-McDermott paradigm; Roediger & McDermott, 1995), fluency misattributions (as shown in the false fame effect; Jacoby, Woloshyn, & Kelley, 1989), and source confusion (as shown in the misinformation effect; Loftus, 1975).

The revelation effect, an increase in false recognition of “revealed” items (Watkins & Peynirciouglu, 1990), has recently been used to constrain theories of memory (e.g., Verde & Rotello, 2004). In this line of research, performance on standard recognition trials is compared to trials in which test words are revealed from some degraded form (e.g., presenting test words letter-by-letter, or as an anagram along with a solution code). Using a variety of “revelation tasks”, many researchers have shown increases in false memory compared to standard recognition tasks, but failed to elucidate the underlying nature of this effect (see Verde & Rotello, 2003). Verde & Rotello (2004) divided two decades of revelation studies into two categories, astutely noting a critical difference. In some studies the revealed item was different from the actual recognition test item (cf. Westerman & Greene, 1996), while in other studies the revelation manipulation was applied to the test items themselves (cf. Peynirciouglu & Tekcan, 1993). The authors thus explored the effects of unrelated-item and same-item revelation tasks, using Receiver Operating Characteristic (ROC) curves to compare each condition to standard recognition.4

Verde & Rotello (2004) found that, compared to standard recognition, the unrelated-item revelation condition produced an increase in “old” responses to both studied and unstudied items. In contrast, same-item revelation was found to increase “old” responses only for unstudied items. Under a signal detection framework, these results suggest that revealing an unrelated item prior to presenting a recognition (test) probe causes a liberal change in response bias, but that revealing the test item itself (prior to recognition judgment) causes a true decrement to memory sensitivity (Verde & Rotello, 2004). In the following section, I describe these theoretical explanations in relation to the current research. Importantly, the authors used confidence-based ROCs to elucidate memory phenomena that could not be understood via simpler applications of SDT. Perhaps most important, the effect of the same-item revelation condition is similar to the effect shown by Toth (2005), as both effects are shown only with unstudied items, and neither effect is well described by any of the mechanisms of false memory noted above. Thus, the methods and analyses presented by Verde & Rotello (2004) guided the current project for two specific reasons: (a) the aforementioned similarities in methods and findings, and (b) the opportunity to explore the effects of self-generation using confidence-based ROC curves. The application of confidence-based ROC curves (i.e., type-1 SDT) is common in memory research, and is described in the following section.

4 The authors had previously shown that more common measures of performance (e.g., d’) are unable to account for the revelation effect, and argued that ROC-based da is a better measure of memory sensitivity (Verde & Rotello, 2003).

OVERVIEW OF EXPERIMENTS

The goals of the current research are to replicate and further examine the effect of false familiarity due to self-generation. While the increase in false memory in cued recall has only been shown with the remember/know method, the confidence rating method may confer measurement advantages (see Donaldson & Good, 1996), and has probably shaped theories of memory more so than any other experimental methodology (Parks & Yonelinas, 2007). Moreover, Yonelinas (2001) has shown that the remember/know method captures similar mnemonic phenomena as the confidence rating method, and the latter has recently helped illuminate a similar effect of false memory (Verde & Rotello, 2004). Thus, the current research employed a confidence rating method and two analytic approaches based in signal detection theory (SDT) to investigate the relation between recognition and cued recall, and the increase in false memory shown in cued recall (Toth, 2005). Experiment 1 employed analyses based in type-1 SDT to compare recognition and cued recall, collecting confidence ratings rather than R/F/N judgments, as done previously by Toth. By subjecting the effect of false familiarity due to self-generation to rigorous signal detection analyses, the present research seeks to bridge the gap between this effect and other mechanisms of false memory. As discussed above, metamemory parameters have permeated theories of memory performance. Thus, Experiment 2 employed analyses based in type-2 SDT to investigate metamemory processes in the current paradigm. Prior to each experiment, I describe each method and relevant predictions in detail.

EXPERIMENT 1

The goals of Experiment 1 are twofold: (a) to replicate the two patterns observed by Toth (2005) and McCullough & Toth (2008), that is, comparable performance on studied items and increased false alarms in cued recall, and (b) to further elucidate the nature of the false memory effect by employing type-1 signal detection theory to compare ROC curves. The first important question this experiment will answer is whether the increase in false memory will extend to confidence ratings. The second theoretical issue concerns the relation between present and past investigations of the false memory effect (i.e., exactly how will the cued recall-based increase in false alarms manifest in confidence ratings?). A third issue addressed by this experiment is how the present effect relates to other false memory phenomena, such as the revelation effect. Before presenting specific predictions, I first describe the application of signal detection theory to memory research and briefly explore theoretical interpretations of memory phenomena. I then present the methods and results of Experiment 1.

Memory and Detection Theory

Signal Detection Theory (SDT) has proven a useful statistical framework for collecting and analyzing recognition data. Many factors led quite naturally to the use of SDT in characterizing recognition memory; most notably, (a) the recognition test situation is analogous to a signal detection task, with the signal being “oldness” or “familiarity”; and (b) recognition performance is well captured by the 2x2 combination of test-probe status (studied, unstudied) and the participant’s response (“old”, “new”). These combinations yield the standard SDT measures of hits, misses, false alarms, and correct rejections (see Figure 1a). Theoretically, as shown in Figure 1b, two distributions of items (studied, unstudied) lie on an axis of increasing evidence, trace strength, or familiarity. The overlapping distributions are presumably separated by the effects of the study experience. Participants are thought to place a response criterion (C) along the strength axis, dividing the distributions into four areas that correspond to the dependent measures. That is, the proportions of responses (“old”, “new”) to studied and unstudied items correspond to the four areas under the theoretical distributions. Useful measures of sensitivity (e.g., d’) and response bias (e.g., B”D), whose formulas are derived from the underlying theory, can then be calculated (see Macmillan & Creelman, 2005). Sensitivity (i.e., the ability to discriminate studied and unstudied items) is indexed by the distance between the means of the distributions, while response bias (i.e., the tendency to respond “old”) is commonly measured relative to the point where the curves intersect. Recall data are generally not analyzed with SDT because, in most recall paradigms, misses and correct rejections are not reported (i.e., participants withhold them). As such, proportional hit and false alarm rates cannot be computed (Green & Swets, 1966). Critically, in the present paradigm proportional hit and false alarm rates can be calculated, because participants do provide misses and correct rejections (i.e., “new” responses to relevant items generated in cued recall).
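As a concrete illustration of the single-point measures named above, the following minimal sketch computes d’ and B”D from one hit rate and one false alarm rate. It assumes the equal-variance Gaussian model for d’ and Donaldson’s (1992) formula for B”D; the function names are hypothetical and are not taken from the analysis software used in this thesis.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """Equal-variance sensitivity: z(HR) - z(FAR).

    Rates of exactly 0 or 1 have no finite z-score and raise an error here;
    how such cases are corrected is left to the analyst.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def b_double_prime_d(hit_rate, fa_rate):
    """Donaldson's (1992) bias index B''D.

    Negative values indicate liberal responding, positive values conservative
    responding. Undefined when HR = 1 and FAR = 0 (or the reverse).
    """
    miss_times_cr = (1 - hit_rate) * (1 - fa_rate)
    hit_times_fa = hit_rate * fa_rate
    return (miss_times_cr - hit_times_fa) / (miss_times_cr + hit_times_fa)
```

With a hit rate of .80 and a false alarm rate of .20, for example, d’ is about 1.68 and B”D is zero (no bias); raising the false alarm rate while holding the hit rate constant lowers d’ and pushes B”D below zero.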

In recent literature, SDT has proven particularly useful in exploring a false memory phenomenon. Verde and Rotello (2004) used a confidence rating procedure to explore the nature of the revelation effect. Figure 2b depicts Verde and Rotello’s theoretical interpretation of the same-item revelation condition. As compared to standard recognition (Figure 2a) – in which the study experience has separated the distributions by two (arbitrarily drawn) units – in the revelation condition, the act of revealing the test item is thought to shift the distribution of unstudied items to the right along the axis of evidence. Thus, sensitivity (as indexed by the distance between the means of the distributions) is reduced, and response bias (as measured relative to the intersection of the two curves) can become more liberal. In comparing Figures 2a and 2b, note that the area representing false alarms differs, while the area for hits remains constant. Thus, a shift in the unstudied-item distribution could also explain the findings of increased false alarms in cued recall, compared to recognition (Toth, 2005).

Figure 2c presents a theoretical depiction of an alternative data pattern, for comparison to the standard recognition depiction (Figure 2a) and the most plausible interpretation of the effects of self-generation (Figure 2b). Figure 2c represents a fluency-based hypothesis, which suggests that the process of generating items increases the processing fluency of those items. Note in this depiction how an increase in fluency would shift both distributions and affect the hit and false alarm rates to similar degrees, thus manifesting as a change in response bias. Figure 2c also depicts a criterion-shift hypothesis, which suggests that the act of generating test items in cued recall causes participants to adopt a more conservative response criterion. Much like the fluency hypothesis, a simple criterion shift would affect the hit and false alarm rates similarly; thus, neither a simple increase in processing fluency nor a simple change in response bias fits the observed effects of self-generation. Keep in mind that these are only basic interpretations. For example, self-generation might increase the familiarity of studied and unstudied items to different degrees, or it could affect response criterion placement in addition to moving one or both of the distributions.
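The contrast among these interpretations can be made concrete with a small numerical sketch. The means, the criterion, and the size of each shift below are arbitrary illustrative values (the two-unit separation mirrors the arbitrarily drawn units of Figure 2a), and the equal-variance Gaussian form is an assumption of the sketch rather than a claim about the data.

```python
from statistics import NormalDist


def rates(studied_mean, unstudied_mean, criterion, sd=1.0):
    """Return (hit rate, false alarm rate) for equal-variance Gaussian evidence."""
    hit_rate = 1 - NormalDist(studied_mean, sd).cdf(criterion)
    fa_rate = 1 - NormalDist(unstudied_mean, sd).cdf(criterion)
    return hit_rate, fa_rate


# Standard recognition: distributions two units apart, criterion midway.
baseline = rates(studied_mean=2.0, unstudied_mean=0.0, criterion=1.0)

# Only the unstudied distribution moves right: hits unchanged, false alarms rise.
unstudied_shift = rates(studied_mean=2.0, unstudied_mean=0.5, criterion=1.0)

# Fluency account: both distributions move right by the same amount.
both_shift = rates(studied_mean=2.5, unstudied_mean=0.5, criterion=1.0)

# Criterion account: distributions fixed, criterion moved by the same amount
# (shown here as a move toward "old"; a conservative move lowers both rates).
criterion_shift = rates(studied_mean=2.0, unstudied_mean=0.0, criterion=0.5)
```

In this sketch the fluency and criterion accounts yield identical hit and false alarm rates, which is why both manifest as a change in response bias, whereas only the selective shift of the unstudied distribution raises false alarms while leaving hits untouched.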

While signal detection theory describes recognition performance well, the application of SDT to memory can be intricate, particularly in regard to the assumptions underlying various parameters. Though the debate is beyond the scope of this thesis, many theorists support empirical methods based in SDT, but caution against interpretations based solely on SDT parameters, particularly those that don’t reflect performance across a range of response criteria (e.g., Donaldson & Good, 1996; Verde & Rotello, 2004; Yonelinas, 1994, 2001; Yonelinas & Parks, 2007). A confidence-based receiver operating characteristic (ROC) curve is a plot of proportions of responses to studied items against proportions of responses to unstudied items across a range of response confidence. ROCs are typically created by having participants make memory decisions on a scale of confidence (e.g., 1 = sure new to 6 = sure studied) and plotting cumulative response proportions for studied vs. unstudied items at each level of confidence. That is, the proportion of “6” responses to studied items is plotted against the proportion of “6” responses to unstudied items. Next, the cumulative proportion of “5” and “6” responses to studied items is plotted against the same cumulative proportion for unstudied items, and so on through the proportions for confidence level “1”.5 Because the resultant curve reflects memory performance across a range of subjective confidence, measures derived from ROCs have been supported over simpler measures (see Verde & Rotello, 2003). However, despite the utility of ROC analyses, ROC-derived parameters also require theoretical assumptions, and so do not necessarily obviate the criticisms of applying detection theory to memory research. Nonetheless, a major goal of this research project was to investigate the false memory effect via SDT. Because debate continues over which is most appropriate, I calculated a number of measures of sensitivity.
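The cumulative plotting procedure described above can be sketched as follows. The rating counts are invented for illustration, and the function name is hypothetical; the only assumption carried over from the text is a six-point scale on which 6 is the most confident “old” response.

```python
def roc_points(studied_counts, unstudied_counts):
    """Build cumulative ROC points from rating counts ordered 1..6.

    Accumulates from the highest confidence level downward and returns a list
    of (false alarm rate, hit rate) pairs, one per confidence level.
    """
    n_studied = sum(studied_counts)
    n_unstudied = sum(unstudied_counts)
    points = []
    cum_studied = 0
    cum_unstudied = 0
    for s, u in zip(reversed(studied_counts), reversed(unstudied_counts)):
        cum_studied += s
        cum_unstudied += u
        points.append((cum_unstudied / n_unstudied, cum_studied / n_studied))
    return points


# Hypothetical counts of responses at ratings 1, 2, 3, 4, 5, 6.
studied = [2, 3, 5, 8, 12, 15]
unstudied = [14, 12, 9, 6, 3, 1]
points = roc_points(studied, unstudied)
```

The first point reflects only “6” responses, and the last point is necessarily (1, 1) because it includes every rated trial; that final point therefore carries no information about sensitivity.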

Current Predictions

In previous research, Toth (2005) observed increased false memory in cued recall, relative to recognition. The current Experiment 1 seeks to replicate the effect in a confidence rating procedure, which allows for a number of predictions at varying levels of analysis. Toward the primary goal of replication, the current empirical design tests two predictions. The first prediction, Hypothesis 1a, is that mean confidence for studied items will not differ significantly across test formats, whereas mean confidence for unstudied items will be significantly higher in cued recall relative to recognition. In prior studies, the effect of false memory was revealed by comparing proportions of responses that are analogous to the SDT-based measures of hit rate (HR) and false alarm rate (FAR). Thus, for Hypothesis 1b, I predict that proportions of high confidence responses to studied items (i.e., simple HR) will be equal across test formats, but that proportions of high confidence responses to unstudied items (i.e., simple FAR) will be significantly increased in cued recall.6

5 Note that the cumulative proportion of confidence judgments at the lowest level of confidence includes all trials, assuming participants must provide a confidence judgment for every test trial.

The two predictions above are proposed to most directly replicate the results of Toth (2005); however, a secondary goal of this project was to vigorously explore the false memory effect using SDT. While Hypotheses 1a and 1b will satisfy the goal of replication, the former is not a signal detection analysis, and the latter is based upon measures that have been criticized in studies of memory (e.g., Yonelinas, 2001). Drawing on the signal detection interpretations described above, increased confidence selective to unstudied items (in cued recall) is indicative of a decrease in memory sensitivity (compare Figures 2a and 2b). Thus, for Hypothesis 2, I predict that sensitivity will be significantly reduced in cued recall, compared to recognition. There are a number of ways to test this prediction, which I discuss below. In addition to a change in sensitivity, Toth observed a change in response bias in five of eight conditions, such that responding was significantly more liberal in cued recall relative to recognition. The effect was highly variable in size, and, in some cases, was found despite observing a significant effect for only unstudied items. Thus, for Hypothesis 3, I predict that responding will be more liberal in cued recall than in recognition. As the primary effect of self-generation is not thought to be on response bias, I predict only a trend toward more liberal responding in cued recall, as measured by B”D (Donaldson, 1992).

In prior studies, Toth (2005) observed a significant effect of test format on sensitivity in seven of eight conditions. However, because those studies employed a remember/know methodology, only the simplest measures of sensitivity could be computed. Moreover, the SDT-based measures of HR and FAR (i.e., those used by Toth, and in Hypothesis 1b) only sample memory performance at a single response criterion. A number of methods for quantifying memory sensitivity have been proposed from the basis of SDT, only some of which are calculated based on HR and FAR (e.g., d’, A’). Because of such limitations, Verde & Rotello (2003) supported the sensitivity measure da, suggesting that it better accounts for the asymmetry of observed ROCs. Still others suggest that, when using the confidence rating method, Az is the best measure of sensitivity (Stanislaw & Todorov, 1999). Taking a different approach, Yonelinas (1994) suggested the application of non-linear regression to observed ROC data, to establish the best-fitting quadratic function. The method provides an estimate of the recollection process thought to alter ROCs (see Yonelinas, 1994), and simple calculations can then provide a highly sensitive measure of the area under the curve, A’r (see Donaldson & Good, 1996). Because each measure has unique limitations, I report each of them. So, despite the fact that recognition and cued recall performance only differ for unstudied items, prior studies have revealed changes in both sensitivity and response bias. However, those studies were not designed for SDT-based analyses, and only the simplest measures could be computed. A primary goal of the current project was to clarify those results by employing an experimental procedure directly based in signal detection theory.

6 To compute simple hit and false alarm rates from confidence ratings, a common method was used in which the response scale (1-6) is bifurcated to represent responses of “new” (1-3) and “old” (4-6).

Methods

Participants and design

Thirty-two students at the University of North Carolina Wilmington (3 male, mean age = 18.3) volunteered to participate via an online recruitment system. Students were given course credit for their participation. The experiment employed a 2x2, within-subject, factorial design, in which test format (recognition, cued recall) and item status (studied, unstudied) were manipulated.

Materials

Stimuli consisted of 188 five-letter words (180 test items, plus 8 buffer items) all of moderate frequency (e.g., “glass”, “clown”). The words were selected such that each had a three-letter word stem that was unique within the experimental list but had at least two common English completions. Test words were distributed into four 45-word sublists that were equated for mean frequency of usage (Kucera & Francis, 1967) and number of words beginning with the same letter. For each participant, two of the sublists were presented at study, while the other two sublists served as unstudied words on the subsequent memory test. The four sublists were rotated over participants such that each sublist, and thus each word, was presented equally often in the four conditions created by crossing item status (studied, unstudied) with test format (recognition, cued recall). Word presentation order was separately randomized for each participant in both the study and test phases. All testing was conducted on Windows-based personal computers equipped with color monitors and with E-Prime v1.2.
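The rotation of sublists over participants can be sketched as a simple cyclic assignment. This is only one scheme that satisfies the description above; the exact rotation used in the experiment is not specified in the text, and the names below are hypothetical.

```python
CONDITIONS = [
    ("studied", "recognition"),
    ("studied", "cued recall"),
    ("unstudied", "recognition"),
    ("unstudied", "cued recall"),
]


def assign_sublists(participant_index, n_sublists=4):
    """Map each sublist index to a condition, rotating by participant.

    Assumed cyclic scheme: over any four consecutive participants, every
    sublist appears once in each of the four conditions.
    """
    return {
        sublist: CONDITIONS[(sublist + participant_index) % n_sublists]
        for sublist in range(n_sublists)
    }
```

Under this scheme every participant studies exactly two sublists, and each sublist cycles through all four cells of the design across every block of four participants.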

Procedure

All participants were tested individually in a closed room by a single experimenter. Participants began by completing a paper consent form and brief demographics and medical questionnaires. The experimenter then initiated the experimental program, displaying instructions for the study phase on the monitor directly in front of the participant. All instructions, critical items, and cues were centrally presented in white against a black background in Arial font (size 16). Participants read all instructions at their own pace, and verbally described each task (i.e., study and test) to the experimenter prior to beginning. The experimenter, present to ensure task comprehension and attentiveness, was seated next to the participant.

Instructions informed participants that they would be seeing a list of words presented one at a time and that their task was to say each word aloud, and remember as many of the words as possible for a later memory test. Target words were presented for 2 s each, and were separated by a 500 ms interval during which the screen was blank. Four primacy and four recency buffers (i.e., untested items) were presented at the beginning and end of the study list.

Immediately following the study list, participants were told that the memory test would consist of two kinds of trials, recognition and recall. For recognition trials, participants were told that complete words would be presented in the center of the screen and that they were to make a 6-point confidence rating about whether they thought each word was on the study list. The response scale (1 = very sure new, 2 = somewhat sure new, 3 = guess new, 4 = guess old, 5 = somewhat sure old, 6 = very sure old) was presented at the bottom of the screen in yellow, Arial font (size 16), and was displayed 500 ms after the word appeared. Participants entered their responses using the keyboard.

For cued recall trials, participants were told that a word stem would be presented as a retrieval cue for a previously studied word. They were told to type in their completion word, guessing if necessary, and to avoid using plurals and proper nouns (neither of which was on the study list). Once entered, the participant's completion word was displayed below the word stem, at which point they were to make a 6-point confidence rating for their completion word, just as in recognition. Following a response and a 500 ms intertrial interval (ITI), the next trial began. If a participant was unable to provide a completion to the cue, no other responses were collected. Once displayed, test items, the response scale, recall cues, and completions remained on screen until the ITI.

Results and discussion

Initial analyses focused on observed confidence ratings, as well as simple Hit Rates (HR) and False Alarm Rates (FAR) derived from these confidence ratings. I then report simple measures of sensitivity and bias that are calculated from HRs and FARs, and conclude with analyses of measures derived from ROC curves. For all analyses in Experiment 1, the alpha level was set at .05.

Confidence ratings and simple measures

Initial analyses compared mean confidence ratings across test format for both studied and unstudied items. Following the analyses by Toth (2005; see also Hamilton & Rajaram, 2003), analyses of cued recall data included only trials in which a critical item was output. For studied items, a paired-samples comparison revealed no significant difference in mean confidence ratings between cued recall (M = 4.90, SD = 0.42) and recognition (M = 4.86, SD = 0.41), t(27) = 0.516, p = .610. However, for unstudied items, confidence ratings were significantly higher in cued recall (M = 2.90, SD = 0.71), compared to recognition (M = 2.69, SD = 0.63), t(27) = 2.165. These results support Hypothesis 1a, showing that subjective confidence for studied items is similar for recognition and cued recall, and that the false memory increase in cued recall extends to confidence ratings.
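For readers less familiar with the paired-samples comparison used throughout these analyses, the statistic can be sketched as follows. The function name is hypothetical and the sketch returns only the t value and its degrees of freedom; obtaining a p value would additionally require the t distribution, which is omitted here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic for two equal-length lists of scores.

    Each position holds one participant's score in each condition.
    Returns (t, degrees of freedom), where df = number of pairs - 1.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

Because each participant contributes a score in both test formats, the test operates on within-participant difference scores rather than on the two sets of scores separately.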

For the next analyses, I collapsed confidence ratings 1-3 into a “new” response and ratings 4-6 into an “old” response for each type of test trial (recognition, cued recall). For both recognition and cued recall, I calculated a simple HR (i.e., proportion of studied items given high confidence) and FAR (i.e., proportion of unstudied items given high confidence) for each participant. Mean HRs and FARs, displayed in Table 2a, were compared across test format with paired-samples t-tests. Consistent with the prior analyses, no significant difference was revealed in HRs between recognition and cued recall [t(27) = 1.27, p = .215], but there was a significant increase in FARs in cued recall compared to recognition [t(27) = 2.26]. Hypothesis 1b was thus supported, in that mean response proportions for target trials (i.e., HRs) did not differ between test formats, while mean response proportions to non-target trials (i.e., FARs) were selectively increased in the cued recall format. These data replicate the two critical patterns observed by Toth (2005), first in mean observed confidence, and again in simple signal detection measures.
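The collapsing step can be sketched in a few lines. The ratings below are invented for illustration and the function name is hypothetical; for cued recall, the list passed in would contain only trials on which a critical item was generated, following the inclusion rule stated above.

```python
def simple_rate(ratings, old_threshold=4):
    """Proportion of ratings treated as "old" (4-6 on the 1-6 scale).

    Applied to studied items this is the simple hit rate; applied to
    unstudied items it is the simple false alarm rate.
    """
    if not ratings:
        raise ValueError("no rated trials to summarize")
    return sum(r >= old_threshold for r in ratings) / len(ratings)


# Hypothetical ratings from one participant in one test format.
studied_ratings = [6, 5, 4, 3, 6, 2, 5, 6]
unstudied_ratings = [1, 2, 4, 3, 1, 5, 2, 1]
hit_rate = simple_rate(studied_ratings)
fa_rate = simple_rate(unstudied_ratings)
```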

Sensitivity and bias

The collapsed HRs and FARs were used to calculate, for each participant, a measure of response bias (B”D) and two simple measures of sensitivity (d’, A’). Table 2a also lists group means and standard deviations of these simple parameters. Paired-samples comparisons across test type revealed a significant difference only for B”D, such that responding was significantly more liberal in the cued recall condition, t(27) = 2.108. The numerical decreases in d’ and A’ in cued recall failed to reach statistical significance [t(27) = 1.111, p = .276, and t(27) = 1.014, p = .320, respectively]. Thus, analyses of simple measures suggest a change in response bias is the main difference between recognition and cued recall performance. However, as previously mentioned, there is general agreement that conclusions drawn solely upon these measures can be misleading (Yonelinas, 2001; see also Macmillan & Creelman, 2005; Stanislaw & Todorov, 1999; Verde & Rotello, 2004).
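Of the simple measures reported here, A’ is the nonparametric one. A minimal sketch of its usual computational form (as given, for example, by Stanislaw & Todorov, 1999) follows; the function name is hypothetical, and whether the analyses in this thesis used exactly this form is an assumption.

```python
def a_prime(hit_rate, fa_rate):
    """Nonparametric sensitivity index A' from a single (HR, FAR) pair.

    Ranges from 0 to 1, with .5 indicating chance discrimination.
    """
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance performance mirrors the above-chance formula.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
```

Like d’, this index is computed at a single response criterion, which is the limitation that motivates the ROC-based measures reported next.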

ROC-based measures

To create observed ROC curves for each participant, cumulative response proportions were calculated for studied and unstudied items for both recognition and cued recall, starting at the highest confidence level (cf. Verde & Rotello, 2004). As in previous analyses, the proportions for recall trials were calculated according to the number of trials in which a relevant item was generated. The cumulative proportions for studied and unstudied items were then plotted against each other in standard fashion. The observed ROCs lend themselves to a number of statistical analyses. First, as suggested by Yonelinas (2001) and others (e.g., Macmillan & Creelman, 2005), observed ROCs created for individual participants were visually inspected to ensure that responding met specific criteria necessary for ROC analyses. In particular, ROC analyses are only effective if responses are spread across the entire response scale. To avoid problems of truncation (i.e., a participant not using the entire response scale), the data from four participants were removed from the following analyses. For each participant, I calculated two additional measures of sensitivity, da and Az, for each test format (see Table 2b). Paired-samples comparisons across test format failed to reach significance for da [t(23) = 0.367, p = .717] or Az [t(23) = 1.489, p = .150]. Despite the initial findings of an effect on only unstudied items (which confirmed Hypotheses 1a and 1b), these non-significant trends parallel the analyses of simple sensitivity measures.

For the final analyses, individual ROCs were first aggregated across participants, following the method presented by Yonelinas (2001). Mean ROCs for recognition and cued recall were created by averaging (across participants) the observed ROC points at each level of confidence. Curves were fit to the mean ROC points for both recognition and cued recall using a form of non-linear regression that allows the y-intercept parameter to vary (see Yonelinas, 1998).7 Figure 3 shows the resultant ROC curves, which reflect memory performance in the two test conditions as confidence changes. The points along the mean ROC curves for each test condition were then used in a least squares regression analysis to compute the area measure A’r (Donaldson & Good, 1996), which is given in Table 2b. This final measure was compared across test type with a paired-samples t-test, revealing a significant difference in sensitivity between test conditions, t(23) = 2.199, p = .038.

7 The estimate of the y-intercept is sometimes interpreted as an index of the process of Recollection.

Experiment 1 replicated and extended Toth’s (2005) findings of elevated false memory in cued recall, compared to recognition. Initial analyses revealed data patterns highly consistent with prior studies, with performance being equal between recognition and cued recall for studied items, but selectively impaired in cued recall for new items. Note that mean confidence is the simplest observed data analyzed, and that comparisons of HRs and FARs (from the collapsed response scale) most closely parallel the analyses presented by Toth.

The second goal of Experiment 1 was to examine how the effect of false familiarity due to self-generation manifests in SDT-based parameters. While prior observations of this effect and the logic of detection theory suggest a difference in sensitivity between recognition and cued recall, statistical comparisons revealed a significant difference for only one of the five measures computed. However, the predicted trend of decreased sensitivity in cued recall was shown in all five measures. The finding that a significant difference was revealed only for A’r might imply that A’r is the best (i.e., most sensitive) measure of memory sensitivity. However, the computation of A’r required data from four participants to be removed from the analysis. Re-analyses of the simpler single-point parameters, excluding those four participants, revealed a

different result only for response bias [i.e., the difference in B”D between test formats became non-significant, t(23) = 1.834, p = .080]. That is, the significant result of A’r is a function both of the precision of the measure, and of the fact that the computation of A’r requires the removal of data that exhibit extreme response bias. The failure to find a clear difference in sensitivity between recognition and cued recall may have been due to a lack of power. After computing

Cohen’s d for all five sensitivity measures, the average effect size (d = .28) was used in a power analysis (Faul, Erdfelder, Lang, & Buchner, 2007), and the results indeed suggest that

Experiment 1 may have had insufficient power to detect the relatively small effect [t(47) =

1.677]. Overall, the initial analyses clearly demonstrated the increase in false familiarity due to self-generation in cued recall. In addition, responding was significantly more liberal in cued recall, and a clear trend of decreased sensitivity in cued recall may have been non-significant due to a lack of statistical power. The implications of these results are discussed further after

Experiment 2.
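The logic of the power argument can be sketched with a normal approximation to the paired-samples t-test. This is only an illustration: it is not the calculation performed by the power-analysis software cited above, so its output will not match the reported values exactly, and the function names are hypothetical. The inputs would be the average effect size reported above (d = .28) and, as an assumption made here for illustration, n = 24, corresponding to the df = 23 of the paired comparisons.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def approx_power_paired(d, n, alpha=0.05):
    """Approximate two-tailed power of a paired test for effect size d.

    Uses the normal approximation: the test statistic is treated as
    normal with mean d * sqrt(n) and unit variance.
    """
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n)
    return _norm.cdf(shift - z_crit) + _norm.cdf(-shift - z_crit)


def approx_n_for_power(d, power=0.80, alpha=0.05):
    """Approximate number of participants needed to reach the given power."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    z_power = _norm.inv_cdf(power)
    return ceil(((z_crit + z_power) / d) ** 2)


# Illustrative calls with the effect size from the text and an assumed n.
power_at_24 = approx_power_paired(0.28, 24)
n_needed = approx_n_for_power(0.28)
```

Under this approximation, an effect of this size requires a considerably larger sample than the one analyzed, which is consistent with the interpretation that the non-significant sensitivity comparisons reflect limited power.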

This consistent increase in false memory, now shown in subjective confidence, may be a result of the generation process in cued recall (Toth, 2005). Based on the finding that, when a studied item is generated, recall performance is similar to recognition, Toth concluded that this paradigm most likely leads to the use of a generate-recognize strategy for cued recall trials. In the cases when an unstudied item is generated (but matched to unstudied items presented to others in the recognition format), the generation process alters the outcome of the subsequent recognition process such that confidence (or the tendency to say “Familiar”) is significantly increased. The finding that the test-induced effect, shown again for only unstudied items, resulted in non-significant differences in sensitivity is likely related to the lack of power.

Moreover, the effect shown on response bias may be a function of the 6-point rating scale used after recall output. Thus, Experiment 2 used a more traditional judgment procedure (i.e., yes/no ratings), in addition to a metacognitive judgment that allowed participants to express the operation of metacognitive processes thought to be involved in memory tasks (e.g., Koriat &

Goldsmith, 1994, 1996; Higham & Tam, 2005; Higham et al., 2009).

EXPERIMENT 2

Experiment 1 replicated the increase of false memory in cued recall compared to recognition, but analyses from type-1 signal detection theory (SDT) did not fully elucidate the underlying nature of the memory effect. As discussed above, many contemporary theories posit that metacognitive processes have considerable effects on memory performance. Approaches for investigating metacognitive aspects of memory have been applied to tests of recall (e.g., Higham

& Tam, 2005; Koriat & Goldsmith, 1996) and recognition (e.g., Higham et al., 2009; Hicks &

Marsh, 2005). The rationale behind Experiment 2 results from the integration of two ideas. First, the primary impetus is derived from the recent incorporation of metacognition in theories of memory performance, particularly in situations in which accuracy is critical and false memory is extremely consequential (e.g., academic settings, eye-witness testimony). Second, experimental findings suggest that recall tasks, in particular, are highly dependent on metacognitive processes (e.g.,

Koriat & Goldsmith, 1994; Higham & Tam, 2005). As the current research explores false memory in recall, relative to recognition, Experiment 2 examined how metacognitive processes may play a role in the expression of false memory.

In order to examine metamemory in the current paradigm, I adapted a procedure used by

Higham and colleagues (2009, see also Galvin et al., 2003) that is based in type-2 SDT. While type-1 SDT is a useful tool for analyzing performance on “stimulus-contingent discrimination” tasks, type-2 SDT is used to analyze performance on “response-contingent discrimination” tasks

(Higham et al., 2009). In other words, while a type-1 judgment is a discrimination between states in the world (i.e., studied/unstudied), a type-2 judgment is a discrimination regarding the accuracy of performance (i.e., correct/incorrect). In the case of lab-based memory experiments, for a type-1 task, the experimenter defines which set of items are studied and which are not.

However, for a type-2 task, the to-be-discriminated states are determined by the participant (as a type-1 judgment) (see Galvin et al., 2003).

Similar to the analyses of type-1 SDT, type-2 SDT is used to calculate areas under distribution curves, but the distributions represent correct and incorrect responses. A potentially misleading property of type-2 analyses is that all four basic SDT measures can be calculated for both studied and unstudied items; this is because the critical property defining a type-2 response is the accuracy of the preceding type-1 response (i.e., correct/incorrect; see Figure 4a). In type-2 analyses, the Hit Rates (HR) and miss rates are derived from correct type-1 responses, while incorrect type-1 responses are used to compute the type-2 False Alarm Rates (FAR) and correct rejection rates (see Figure 4b). These basic measures are used to compute the metacognitive parameters of monitoring (analogous to type-1 sensitivity) and confidence bias (analogous to type-1 response bias). Because type-2 discriminations are given for the participant’s own responses, the monitoring parameter provides an index of the relation between memory accuracy and confidence, while the confidence bias parameter reflects participants’ general willingness to accept their own answers as correct. Type-2 ratings can also be used to create confidence-based receiver operating characteristic (ROC) curves, by plotting cumulative response proportions for correct against proportions for incorrect responses (Higham et al., 2009). Thus, type-2 analyses require separate ROC curves for studied and unstudied items (see Figure 4b). Akin to the measures and parameters derived from type-1 discrimination tasks and type-1 SDT, parameters derived from type-2 ROCs are more robust because they incorporate responding across response criteria (Galvin et al., 2003).
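The distinction can be made concrete with a minimal sketch of how type-2 hit and false alarm rates might be tabulated for one item type. The trial format (dictionaries with the keys "studied", "said_old", and "confidence") and the function name are hypothetical, and the default cutoff assumes that ratings of 4 to 6 on a 6-point scale count as "high" confidence.

```python
def type2_rates(trials, studied, high_cutoff=4):
    """Return (type-2 HR, type-2 FAR) for studied or unstudied items.

    trials: list of dicts with hypothetical keys
        "studied"    (bool) experimenter-defined item status
        "said_old"   (bool) the participant's type-1 response
        "confidence" (int)  the type-2 rating, 1-6
    """
    correct, incorrect = [], []
    for t in trials:
        if t["studied"] != studied:
            continue
        # A type-1 response is correct when "old" is given to a studied
        # item or "new" is given to an unstudied item.
        is_correct = t["said_old"] == t["studied"]
        (correct if is_correct else incorrect).append(t["confidence"])
    if not correct or not incorrect:
        raise ValueError("need both correct and incorrect type-1 responses")
    # Type-2 HR: high confidence given a correct type-1 response.
    hr = sum(c >= high_cutoff for c in correct) / len(correct)
    # Type-2 FAR: high confidence given an incorrect type-1 response.
    far = sum(c >= high_cutoff for c in incorrect) / len(incorrect)
    return hr, far


# Made-up trials for illustration only.
example = [
    {"studied": True, "said_old": True, "confidence": 6},    # hit
    {"studied": True, "said_old": True, "confidence": 5},    # hit
    {"studied": True, "said_old": True, "confidence": 2},    # hit
    {"studied": True, "said_old": False, "confidence": 4},   # miss
    {"studied": True, "said_old": False, "confidence": 1},   # miss
    {"studied": False, "said_old": False, "confidence": 5},  # correct rejection
    {"studied": False, "said_old": False, "confidence": 3},  # correct rejection
    {"studied": False, "said_old": True, "confidence": 6},   # false alarm
    {"studied": False, "said_old": True, "confidence": 2},   # false alarm
]
```

For studied items, the rates contrast type-1 hits with misses; for unstudied items, they contrast correct rejections with false alarms, which is why the two item types must be analyzed separately.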

Current Predictions

Experiment 2 employed a modified version of the procedure from Experiment 1 such that, for every trial, participants provided a type-1 old/new judgment, followed by a type-2 confidence rating. The method thus allowed metacognitive processes to be explored in the current paradigm (see also Higham et al., 2009). Note that this method also allows the original effect of increased false memory to be replicated. Therefore, the first two hypotheses of

Experiment 2 predicted a replication, in type-1 old/new judgments, of Toth’s (2005) finding of equal performance for studied items and increased false memory in cued recall. That is,

Hypothesis 1a was that the proportions of “old” responses to studied items would not differ significantly between recognition and cued recall. Hypothesis 1b was that proportions of “old” responses to unstudied items would be significantly increased in cued recall, compared to recognition. From those predictions, it follows that the trend of decreased sensitivity in cued recall (see Experiment 1) would be replicated.

Given the exploratory nature of Experiment 2, no firm predictions were made about the effects of test format on monitoring. Nevertheless, some basic predictions can be founded upon the false memory and metacognition literatures previously discussed. First, based on the assumption that participants have some ability to distinguish accurate from inaccurate responses,

Hypothesis 2 was that, for both test formats, mean confidence would be significantly higher for correct, compared to incorrect, type-1 responses. Secondly, concerning confidence bias, research by Koriat and colleagues (e.g., Koriat & Goldsmith, 1996) suggests that in free recall tasks, participants use metacognitive processes to adjust their report criterion (i.e., withhold more or fewer self-generated responses). As the current paradigm strongly encouraged output in recall (i.e., participants were not given full control of output), the cued recall procedure more closely resembles forced report than free report. However, in Experiment 2, the addition of a

metacognitive rating stage more clearly allowed participants the opportunity to express the operation of metacognitive processes. In other words, participants were “forced” to output test items, some of which they would have withheld if given the opportunity. As the reduction in memory accuracy observed in forced recall, compared to free recall, is primarily a function of

“errors of commission” (see Koriat & Goldsmith, 1996), it stands to reason that participants would want to withhold more unstudied items than studied items. Thus, Hypothesis 3 predicted that confidence bias would be significantly more conservative in cued recall, compared to recognition. Experiment 2 was designed to explore metacognition in this false memory paradigm, and thus no strong predictions were made regarding metacognitive monitoring.

Assuming the predicted difference in type-1 performance was observed (i.e., an increase in false alarms in cued recall), there were two possible outcomes of comparisons of monitoring across test formats: Monitoring could be equal between test formats (i.e., participants could be equally able to distinguish between incorrect and correct memories in cued recall as in recognition), or monitoring could be decreased in cued recall (i.e., participants could be less sensitive to their own accuracy in cued recall than in recognition).

Methods

Participants, design, and procedure

Twenty-eight UNCW undergraduate students (5 male, mean age = 19.0), recruited from the same source as in Experiment 1, participated in return for course credit. The stimuli and experimental design were identical to Experiment 1. After a study phase in which participants read words aloud, recognition and cued recall trials were randomly presented. The only change to the procedure involved the response phase of the test. For each test trial, participants provided a binary type-1 discrimination (i.e., “old”/“new”) using the o or n key, for either a recognition

probe or a word generated to a recall cue. Participants then rated their confidence in the accuracy of their decision, using a 6-point scale (1 = not at all confident, 2 = low confidence, 3 = somewhat confident, 4 = moderately confident, 5 = highly confident, 6 = extremely confident).

The instructions clearly conveyed the metacognitive nature of the confidence rating task by specifically stating that “you could be extremely confident, or not at all confident, in responses of ‘old’ or ‘new’,” and participants were encouraged to use the entire scale.

Results & Discussion

I first present analyses of type-1 data (i.e., memory performance), again progressing from simple observed response proportions to SDT parameters calculated from these measures. I then present simple analyses of the type-2 confidence rating data, followed by analyses of type-2 SDT parameters derived from these data. For all analyses, the alpha level was set at .05.

Type-1 analyses

For the type-1 recognition and cued recall data, I calculated an overall Hit Rate (HR) and

False Alarm Rate (FAR), displayed in Table 3. As in Experiment 1, the cued recall response proportions were conditionalized on the number of list items output by each participant, then compared to recognition with two-tailed, paired-samples t-tests. Hypotheses 1a and 1b were confirmed, as the analyses revealed a significant increase in FAR in cued recall, relative to recognition [t(23) = 2.173], but no difference in HRs between test formats [t(23) = 0.836, p =

.412]. Table 3 also shows, for recognition and cued recall, the measures of sensitivity and response bias that can be computed from type-1 HRs and FARs (i.e., d’, A’, B”D). Two-tailed, paired-samples comparisons of each measure across test format revealed significant differences in d’ [t(23) = 3.313] and A’ [t(23) = 2.940], but no difference in B”D [t(23) = 1.031, p = .313].
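The single-point measures named above can be sketched as follows. The formulas are the standard textbook definitions of d’, A’, and B”D; the thesis does not print its formulas, so treating these as the exact computations used is an assumption, and the function names are hypothetical.

```python
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def d_prime(h, f):
    """d' = z(H) - z(F); both rates must lie strictly between 0 and 1."""
    return _norm.inv_cdf(h) - _norm.inv_cdf(f)


def a_prime(h, f):
    """Nonparametric sensitivity A'; 0.5 is chance, 1.0 is perfect."""
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))
    return 0.5 - (f - h) * (1 + f - h) / (4 * f * (1 - h))


def b_double_prime_d(h, f):
    """Response bias B''D; negative is liberal, positive is conservative.

    Undefined when the denominator is zero (for example H = 1 and F = 0).
    """
    misses_times_crs = (1 - h) * (1 - f)
    hits_times_fas = h * f
    return (misses_times_crs - hits_times_fas) / (misses_times_crs + hits_times_fas)
```

In these terms, an increase in FAR with no change in HR lowers d’ and A’ and also moves B”D in the liberal direction, so a selective effect on unstudied items can appear in both sensitivity and bias measures.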

Thus, in contrast to Experiment 1, analyses of simple SDT parameters revealed a significant

reduction in sensitivity in cued recall, but no change in response bias between test formats. In the

General Discussion, I describe how this change in data patterns may be related to the change in testing procedure. Overall, the false memory effect was replicated, and seemingly magnified, in

Experiment 2.

Type-2 confidence

For recognition and cued recall, confidence-in-accuracy ratings were initially grouped according to the accuracy of the preceding type-1 response (see Table 4a). Paired-samples comparisons confirmed Hypothesis 2, revealing that confidence was significantly higher for correct than for incorrect responses, in both recognition [t(23) = 12.635] and cued recall [t(23) =

13.433]. The observed means were also compared across test format, revealing no difference in confidence for correct [t(23) = 0.693, p = .495] or incorrect [t(23) = 1.344, p = .192] type-1 responses. The former results suggest that participants were aware, to some degree, of when they were correct in their memory decisions. However, the latter finding that mean type-2 confidence for incorrect responses did not differ between tests, despite a difference in memory performance, suggests that participants’ confidence-in-accuracy did not reflect the increase in false memory in cued recall.

Metamemory parameters

To compute simple type-2 HRs (i.e., high confidence in correct responses) and FARs

(i.e., high confidence in incorrect responses) for studied and unstudied items, confidence ratings were collapsed into two categories (1-3 = “low”; 4-6 = “high”). Table 5 displays the group means and standard deviations of these measures, and the simple type-2 parameters derived from them, for studied and unstudied items. Each measure was compared across test format with a two-tailed, paired-samples t-test. For studied items, the analysis revealed a marginally significant

decrease in type-2 HR in cued recall, compared to recognition [t(23) = 2.120, p = .045]. The difference in type-2 FAR for studied items was not statistically significant, t(23) = .888, p = .384.

Analyses of simple type-2 parameters for studied items revealed no significant differences in d’,

A’, or B”D [t(23) = 0.159, p = .875, t(23) = 0.285, p = .778, t(23) = 1.646, p = .113, respectively].

However, the analyses of unstudied items revealed strikingly different results, such that significant decreases were revealed both in type-2 HR [t(23) = 2.857] and FAR [t(23) = 2.153] in cued recall, relative to recognition. These results were corroborated by analysis of type-2 B”D for unstudied items, which revealed a highly significant change in confidence bias, t(23) = 3.418.

Comparisons of simple measures of monitoring for unstudied items revealed no effect of test format on d’ or A’ [t(23) = 0.413, p = .684, t(23) = 1.055, p = .302, respectively]. Thus, analyses of simple type-2 parameters suggest that (a) for studied items, metamemory does not differ between recognition and cued recall, and (b) for unstudied items, confidence bias is significantly more conservative in cued recall.

ROC-based measures

Type-2 ROC curves were created for recognition and cued recall in similar fashion to the type-1 curves created in Experiment 1 (cf. Higham et al., 2009). In the case of type-2 data, however, the response categories are defined by the accuracy of the preceding type-1 judgment

(see Figure 4). Thus, for each test format, separate ROCs were created for studied and unstudied items by plotting cumulative response proportions of type-2 ratings given for correct responses against ratings given for incorrect responses. These response proportions represent type-2 hit

(i.e., confidence in a correct type-1 response) and false alarm (i.e., confidence in an incorrect type-1 response) rates at each level of confidence. In other words, to create a type-2 ROC for studied items, the proportion of “6” responses given to type-1 hits was plotted against the

proportion of “6” responses given to type-1 misses; then the proportion of responses of “5” and “6” given to hits was plotted against the proportion of “5” and “6” responses given to misses, and so on through ratings of “1”. The type-2 ROC for unstudied items was similarly created by plotting cumulative proportions of ratings given to type-1 correct rejections against ratings given to type-1 false alarms. Figure 5 displays, for recognition and cued recall, the mean type-2 ROCs for studied and unstudied items.
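The cumulative construction just described can be sketched in a few lines. The function name and the example ratings are hypothetical; the input is one list of type-2 ratings given to correct type-1 responses and one list given to incorrect type-1 responses, for a single item type and test format.

```python
def type2_roc_points(correct_ratings, incorrect_ratings, n_levels=6):
    """Return cumulative (type-2 FAR, type-2 HR) points, strictest first.

    The first point uses only the highest rating; each later point adds
    the next lower rating, so the final point is always (1.0, 1.0).
    """
    points = []
    for criterion in range(n_levels, 0, -1):
        hr = sum(r >= criterion for r in correct_ratings) / len(correct_ratings)
        far = sum(r >= criterion for r in incorrect_ratings) / len(incorrect_ratings)
        points.append((far, hr))
    return points


# Made-up ratings for studied items: given to hits and to misses.
ratings_for_hits = [6, 6, 5, 4, 2]
ratings_for_misses = [6, 3, 2, 1]
roc = type2_roc_points(ratings_for_hits, ratings_for_misses)
```

Applying the same function to ratings given to correct rejections and false alarms yields the type-2 ROC for unstudied items.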

Individual type-2 ROCs (i.e., each participant’s ROC for studied and unstudied items in each test format) were used to compute two additional measures of monitoring, da and Az (Table

5). Paired-samples comparisons of these measures across test format corroborated the patterns reported above. That is, no difference was revealed in da or Az for studied or unstudied items (all t’s < 1). Overall, analyses of type-2 parameters suggest that participants did not appropriately adjust their confidence-in-accuracy to reflect the difference in performance between recognition and cued recall.

Perhaps the most interesting finding in the type-2 data is the partial confirmation of

Hypothesis 3. Based on experimental evidence that recall output is highly controlled (e.g., Koriat

& Goldsmith, 1994), type-2 responding was predicted to be more conservative in cued recall, for both studied and unstudied items. However, only for unstudied items was the difference significant, despite large numerical differences for both item types. Considering that the type-1 effect occurs only for unstudied items, this may indeed suggest that metacognitive processes (in this case, confidence bias) are specifically involved in situations of false memory (cf. Koriat &

Goldsmith, 1996).

A final set of analyses attempted to determine if there was a relationship between (a) the lack of an effect of test format on type-2 confidence for incorrect responses (i.e., equal

confidence despite more incorrect responses in cued recall) and (b) the dissociation of type-2 parameters for studied and unstudied items (i.e., significant effects only for unstudied items).

First, to determine if participants altered their confidence-in-accuracy as a function of their type-1 response, the type-2 confidence ratings were grouped according to type-1 response (i.e., “old”,

“new”) irrespective of type-1 accuracy. Paired-samples comparisons across test format revealed no difference in mean confidence for “old” responses between cued recall (M = 4.60, SD = 0.57) and recognition (M = 4.71, SD = 0.52), [t(23) = 1.237, p = .239], and a marginally significant decrease in confidence for “new” responses in cued recall (M = 3.50, SD = 0.83) relative to recognition (M = 3.85, SD = 0.71) [t(23) = 2.091, p = .048]. This latter result may reflect an unwillingness of participants to discount (i.e., call “new”) items generated to recall cues. The type-2 confidence ratings were then re-grouped according to the outcome of the preceding type-1 response. Table 4b provides mean confidence-in-accuracy according to type-1 response class.

Two-tailed, paired-samples comparisons across test format revealed a significant difference only for type-1 correct rejections [t(23) = 3.016], such that mean confidence for correct rejections was significantly lower in cued recall. Although the same numerical pattern was observed for type-1 hits, false alarms, and misses (i.e., lower type-2 confidence in recall than recognition), none of the differences reached significance [t(23) = 1.889, p = .072, t(23) = 1.886, p = .075, t(23) =

1.081, p = .291, respectively].

In summary, Experiment 2 explored how metacognitive processes are involved in the effect of self-generation, which can be described as an increased frequency of incorrect “old” responses to unstudied items. Type-2 confidence for incorrect responses did not differ between test formats, nor did type-2 confidence for “old” responses. Thus, the increase in false memory was not ameliorated by a reduction in confidence selective to false alarms. Despite the

implications of these results, it is not the case that metacognitive processes were the same in the two tests. Rather, subsequent analyses revealed that the modest effect on confidence-in-accuracy of

“new” responses was actually a significant effect selective to correct rejections. Thus, compared to recognition, when an unstudied item was self-generated, and then correctly classified as “new”, confidence-in-accuracy of the response was reduced. This may provide further evidence that self-generation increases the subjective familiarity of test items.

Generally speaking, the results support the use of type-2 SDT in metamemory research.

That is, these data suggest that participants can discriminate between correct and incorrect responses, and that participants are insensitive to the performance difference between cued recall and recognition (i.e., no difference was observed for any measure of monitoring). Moreover, the finding of opposing effects of test format on type-1 response bias (i.e., cued recall is more liberal) and type-2 confidence bias (i.e., cued recall is more conservative) provides clear evidence that type-1 and type-2 discrimination tasks are performed differently, and thus supports the claim that type-1 and type-2 analyses explore different cognitive phenomena.

GENERAL DISCUSSION

Two experiments are presented to replicate and elucidate an effect of increased false memory in cued recall, compared to recognition (Toth, 2005). Toth (and McCullough & Toth,

2008) have shown, using variations of the Remember/Know procedure (Tulving, 1985), that self-generation of potential memory items increases the proportion of “Familiar” responses to unstudied items, but not to studied items. A specific impetus of the current research was Toth’s findings of significant effects on both sensitivity and response bias across experiments, despite observing a significant increase in Familiar responses only for unstudied items. Thus,

Experiment 1 used a standard confidence rating procedure that allowed the computation of more precise signal detection parameters.

In Experiment 1, the general patterns observed by Toth (2005) – equal performance for studied items and increased false memory in cued recall – were replicated in mean confidence.

However, analyses of simple measures (d’, A’) and ROC-based parameters (da, Az), revealed no difference in sensitivity between recognition and cued recall, but a marginal effect on response bias (B”D). For one relatively obscure measure of sensitivity (A’r), a significant decrease was revealed (in cued recall), but the analysis required the removal of data from participants with extreme response bias. Moreover, the marginally significant trend of more liberal responding in cued recall (which was highly variable across Toth’s experiments), was greatly reduced when those same data were excluded. Thus, the significant effect in observed measures (i.e., increased confidence, or “Familiar” responses, selective to unstudied items) may have been obscured – by the extreme responding of a few participants – and thus did not manifest clearly in signal detection parameters. Finally, the results of a post-hoc statistical power analysis suggest there may have been insufficient power to detect the change in sensitivity. In general, Experiment 1 revealed that false familiarity due to self-generation manifests as increased confidence in cued recall relative to recognition for unstudied items, but not studied items; yet sensitivity did not differ significantly between test formats, and response bias did (marginally). I argue that these results are better understood in conjunction with the results of Experiment 2.

The results of Experiment 2 clearly replicate the increase in false memory in cued recall, evidenced by an increase in “old” responses to unstudied items, as well as by a significant decrease in sensitivity (d’, A’) and no difference in response bias (B”D). Note that this is the first time this effect of self-generation has been reported in a yes/no judgment procedure, and the

effects shown in simple signal detection parameters appeared to follow directly from predictions based in detection theory (i.e., the significant effect selective to new items was revealed as a difference in sensitivity and not response bias). Analyses of type-2 parameters revealed no difference in monitoring of performance between recognition and cued recall (for studied or unstudied items), suggesting participants are equally able to discriminate their correct and incorrect responses in the two tests; thus, confidence-in-accuracy somehow reflects the performance difference between recognition and cued recall. Moreover, analysis of type-2 confidence bias suggested that a metacognitive strategy was used to generally reduce confidence in (type-1) responses given to recalled items, compared to confidence in recognition responses.

Despite large numerical differences in type-2 B”D for both studied and unstudied items, only for unstudied items was type-2 confidence bias significantly more conservative in cued recall.

Further analyses localized the significant decrease in type-2 confidence-in-accuracy to type-1 correct rejections.

As no significant difference was shown in participants’ confidence-in-accuracy for false alarms, the false memory effect was not ameliorated. In contrast, confidence in cued recall responses was reduced the most for unstudied items that were (generated and then) correctly classified as “new”. Overall, participants were generally less confident in their responses to items that were self-generated, compared to items that were presented in recognition. This general reduction is most likely the result of participants exerting metacognitive control over quasi-forced responses (as recall output was strongly encouraged; see Koriat & Goldsmith, 1996), and in a more general sense, due to the overt difference in effort required for recognition and recall tasks (Craik, 1979). Most critically, the general reduction in confidence was largely specific to correct rejections, suggesting that participants did not want to reject (i.e., call “new”) items they

had generated to recall cues. Thus, the finding that the greatest reduction in cued recall type-2 confidence was for correctly rejected items supports the idea that the act of generating an item increases its subjective familiarity.

I argue that the current research clarifies prior explorations of the effects of self-generation (e.g., Toth, 2005; McCullough & Toth, 2008). Given that recall tasks generally do not require a memory judgment after output, participants in Experiment 1 (and prior studies) may have implemented a metacognitive strategy for cued recall (despite the stimulus-based, type-1 rating procedure). That is, after generating an item, some participants may have viewed the rating stage (or the R/F/N classification) as a metacognitive judgment (i.e., a type-2 decision). In such a case, the opposing effects of response bias and confidence bias (type-1 and type-2, respectively) would vary unpredictably across participants. However, the adapted procedure used in the current Experiment 2 allowed participants to rely on their initial mnemonic response for the type-1 judgment, and to use subjective awareness and metacognitive feedback to primarily influence the successive type-2 rating. I argue that the procedural change clarifies the results shown by Toth (2005), and replicated by the initial analyses in the current Experiments 1 and 2

(i.e., mean confidence and yes/no judgments, respectively). These results suggest that the occurrence of false memory is increased in cued recall, compared to recognition, and also that metacognitive confidence bias is far more conservative in cued recall. Toth established that the effect of false familiarity due to self-generation represents an increase in the non-specific familiarity of an item that is self-generated, rather than externally presented (as opposed to an increase in false recollective detail). The consistency with which this pattern is found (i.e., it has now been observed in binary judgments, subjective judgments, and confidence ratings) suggests the effect of self-generation is quite robust, and the type-2 metamemory data reported here

suggest that participants are somewhat unsuccessful in attempts to mediate the increase in false memory in cued recall.

In the introduction to Experiment 1, I drew parallels between the effects of self-generation and another false memory phenomenon, the revelation effect. The primary difference between revelation tasks (in same-item conditions; see Verde & Rotello, 2004) and the current cued recall task is the participants’ role in generating test items. Nonetheless, the effects of revelation and self-generation can be interpreted similarly according to type-1 signal detection theory. That is, revealing or self-generating test items causes a shift of the distribution of unstudied items, thus reducing sensitivity in those conditions relative to recognition.

Consequently, the results of Experiment 2 suggest that the act of revealing a test item might also have interesting effects on metacognitive processes. Specifically, the current results suggest that metamemory confidence-in-accuracy would be reduced for responses to revealed items, but most significantly reduced for responses to unstudied items that are (first revealed and then) correctly rejected.

The current results also bear on the issue of how metacognitive processes are involved in the expression of false memory (Koriat & Goldsmith, 1994, 1996; see also Higham & Tam,

2005). Koriat and Goldsmith (1996) demonstrated that metacognitive processes are engaged to reduce errors of commission in free recall tasks, compared to forced report tasks. The current study showed that, in situations where participants are encouraged to output generated items, metacognitive processes are engaged to alter confidence-in-accuracy, specifically for unstudied items. Thus, a critical implication of the current research is that one could train these metacognitive processes such that the increase in false memory caused by self-generation could be ameliorated via reduced confidence-in-accuracy.

Underlying the current research project is the fundamental distinction between recognition and recall, which, at its root, is a function of how memory abilities are tested and analyzed (both in the lab and in the world at large). Direct empirical comparisons, such as those presented here, are uncommon in the memory literature. This research sought to extend investigations of the effect of false familiarity in cued recall, previously shown only in “remember”/“know” judgments (Toth, 2005), to a confidence rating methodology. In general, the results provide evidence that the technique used by Toth to compare recognition and cued recall performance can be applied to confidence rating methodologies based in type-1 and type-2 signal detection theories. That is, the effect of false familiarity due to self-generation was clearly replicated in the current research. Experiment 1, which employed analyses based in type-1 signal detection theory, revealed that the effect manifests as increased mean confidence for unstudied items in cued recall, with no difference in mean confidence for studied items between recognition and cued recall. Experiment 2 revealed the (type-1) effect in a binary procedure that allowed for memory performance and metamemory judgments to be observed. The type-2 results, in general, support the use of type-2 signal detection theory in metamemory research.

That is, in comparison to the patterns observed exploring false memory in Experiment 1 (i.e., type-1 ratings), Experiment 2 results suggest that type-2 ratings of confidence-in-accuracy reflect different psychological processes than type-1 ratings of confidence-in-oldness. The type-2 results suggest that participants have equal monitoring abilities in recognition and cued recall, and that participants are significantly less liberal in their confidence-in-accuracy (i.e., conservative in the metamemory judgments) for memory decisions about items they have generated, compared to items that were presented to them. However, the change in confidence bias was not appropriately targeted at falsely remembered items, but rather at correctly rejected items, and thus did not reduce the false memory effect.

Table 1

Proportions of Recollect, Familiar, and New Judgments for Recognition and Cued Recall

a)
                                      Studied Items                Unstudied Items
                                  Total  Rec.  Fam.  New       Total  Rec.  Fam.  New
Recognition                       1.00   .71   .24   .04       1.00   .05   .23   .73
Conditionalized Cued Recall       1.00   .72   .24   .04       1.00   .05   .30   .65
Unconditionalized Cued Recall      .68   .49   .16   .03        .36   .01   .11   .24

b)
                                      Studied Items                Unstudied Items
                                  Total  Rec.  Fam.  New       Total  Rec.  Fam.  New
Recognition                       1.00   .38   .35   .26       1.00   .04   .16   .80
Conditionalized Cued Recall       1.00   .37   .39   .24       1.00   .03   .26   .71
Unconditionalized Cued Recall      .56   .20   .21   .13        .32   .01   .08   .23

Note. Analyses of response proportions for studied items (recognition vs. conditionalized cued recall) revealed no significant differences (all t’s < 1). In contrast, analyses of unstudied items showed that Familiar responses were significantly greater in conditionalized cued recall compared to recognition [(.26 vs. .16), t(27) = 2.70, p = .006]; the difference for Recollect responses was non-significant (p > .05).
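The relation between the unconditionalized and conditionalized cued recall rows in Table 1 can be illustrated with a short sketch. It assumes that conditionalizing simply means dividing each response proportion by the total proportion of items generated; the function name `conditionalize` is hypothetical and is not taken from the original analysis. Because tabled values are presumably means across participants, this group-level division need not reproduce every cell exactly.

```python
def conditionalize(proportions, total):
    """Rescale response proportions so they sum to 1 among generated items.

    Assumption: the conditionalized proportion is the raw proportion
    divided by the proportion of items that were generated (total).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return {label: p / total for label, p in proportions.items()}


# Panel (a), studied items, unconditionalized cued recall row of Table 1.
studied = conditionalize({"Rec.": 0.49, "Fam.": 0.16, "New": 0.03}, total=0.68)
print({label: round(value, 2) for label, value in studied.items()})
```

For this row, the rescaled values round to the conditionalized entries reported for studied items in panel (a).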

Table 2

Type-1 Signal Detection Measures and Parameters for Recognition and Cued Recall in Experiment 1

a)
                 Observed Proportions     Parameters of Response Bias and Sensitivity
                 HR        FAR            B”D        d'         A’
Recognition      .80       .30            -.153      .499       .835
                 (.09)     (.16)          (.45)      (.14)      (.06)
Cued Recall      .82       .35            -.325      .468       .821
                 (.10)     (.19)          (.47)      (.18)      (.08)

b)
                 ROC-based Sensitivity Parameters
                 da        Az        A’r
Recognition      1.45      .477      .837
                 (.34)     (.23)     (.06)
Cued Recall      1.38      .400      .789
                 (.52)     (.16)     (.09)

Note. The observed proportions of hit rate (HR) and false alarm rate (FAR) are used to calculate simple parameters (panel a). Only ROC-based parameters (panel b) reflect performance across varying levels of confidence.
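As an illustration of how simple parameters can be derived from a hit rate and a false alarm rate, the sketch below uses common textbook definitions: d' as the difference of z-transformed rates, A’ in the form summarized by Stanislaw and Todorov (1999), and B”D following Donaldson (1992). These formulas and the function names are assumptions made for illustration. The tabled values are participant means and may rest on a different computation, so the output is not expected to reproduce Table 2. The input rates are hypothetical.

```python
from statistics import NormalDist


def d_prime(hr, far):
    """Equal-variance sensitivity: z(HR) - z(FAR). Rates must lie in (0, 1)."""
    z = NormalDist().inv_cdf
    return z(hr) - z(far)


def a_prime(hr, far):
    """Nonparametric sensitivity estimate (0.5 = chance, 1.0 = perfect)."""
    if hr >= far:
        return 0.5 + ((hr - far) * (1 + hr - far)) / (4 * hr * (1 - far))
    return 0.5 - ((far - hr) * (1 + far - hr)) / (4 * far * (1 - hr))


def b_double_prime_d(hr, far):
    """Nonparametric bias: negative = liberal, positive = conservative."""
    miss_cr = (1 - hr) * (1 - far)
    hit_fa = hr * far
    return (miss_cr - hit_fa) / (miss_cr + hit_fa)


# Hypothetical rates, chosen only to exercise the functions.
hr, far = 0.75, 0.25
print(round(d_prime(hr, far), 3), round(a_prime(hr, far), 3), b_double_prime_d(hr, far))
```

With symmetric hypothetical rates such as these, the bias measure is zero, which makes it easy to see how raising the false alarm rate alone would push B”D in the liberal (negative) direction.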

Table 3

Type-1 Signal Detection Measures and Parameters for Recognition and Cued Recall in Experiment 2

                 Observed Proportions     Response Bias and Sensitivity Parameters
                 HR        FAR            B”D        d'         A’
Recognition      .80       .30            -.188      .477       .837
                 (.09)     (.15)          (.48)      (.23)      (.06)
Cued Recall      .78       .38            -.300      .400       .789
                 (.13)     (.18)          (.48)      (.16)      (.09)

Note. Analyses of type-1 measures of sensitivity (recognition vs. cued recall) revealed significant differences for d’ [t(23) = 3.313, p = .003] and A’ [t(23) = 2.940, p = .007]. Analysis of B”D revealed no significant difference in response bias between recognition and cued recall [t(23) = 1.031, p = .313].

Table 4

a) Mean Type-2 Confidence for Correct and Incorrect Responses in Recognition and Cued Recall

                 Correct     Incorrect
Recognition      4.55        3.59
                 (.58)       (.62)
Cued Recall      4.50        3.40
                 (.60)       (.67)

b) Mean Type-2 Confidence for each Type-1 Response Class in Recognition and Cued Recall

                 Hits      False Alarms    Misses    Correct Rejections
Recognition      5.07      3.54            3.47      3.92
                 (.51)     (.75)           (.86)     (.75)
Cued Recall      4.91      3.29            3.28      3.50
                 (.57)     (.69)           (.91)     (.86)

Note. These values in panel (b) represent the mean type-2 confidence ratings for type-1 responses, categorized by the signal-detection classification. That is, the Hits and False Alarms columns represent confidence in “old” responses (correct and incorrect, respectively), and Misses and Correct Rejections columns represent confidence in “new” responses (incorrect and correct, respectively).

Table 5

Type-2 Signal Detection Measures and Parameters for Studied (a) and Unstudied (b) Items in Recognition and Cued Recall in Experiment 2

a)
                 Observed Proportions   Parameters of Confidence Bias and Monitoring   ROC Parameters
                 HR       FAR           B”D       d'        A’                          da       Az
Recognition      .84      .45           -.428     .385      .784                        1.32     .871
                 (.12)    (.28)         (.68)     (.25)     (.12)                       (.62)    (.11)
Cued Recall      .79      .39           -.166     .397      .772                        1.22     .870
                 (.15)    (.33)         (.80)     (.29)     (.16)                       (.42)    (.08)

b)
                 Observed Proportions   Parameters of Confidence Bias and Monitoring   ROC Parameters
                 HR       FAR           B”D       d'        A’                          da       Az
Recognition      .61      .49           -.136     .120      .533                        1.04     .010
                 (.23)    (.27)         (.66)     (.28)     (.29)                       (.31)    (.78)
Cued Recall      .46      .37           .277      .087      .434                        .890     .010
                 (.29)    (.23)         (.66)     (.32)     (.39)                       (.35)    (.52)

Note. The type-2 hit rate (HR) and false alarm rate (FAR) reflect high confidence-in-accuracy (for correct and incorrect responses, respectively). For studied items (panel a), statistical analyses revealed no differences in monitoring or confidence bias between recognition and cued recall (all t’s < 1). By contrast, analyses of unstudied items (panel b) revealed a significant difference in confidence bias (B”D) between tests [t(23) = 3.418], and no difference in monitoring (all t’s < 1). For unstudied items, monitoring of accuracy did not differ significantly from chance (all t’s < 1).

Figure 1. Classification model for Type-1 responses and theoretical signal detection distributions.

a)
Test Item            Unstudied                           Studied
Type 1 Response      “new”                “old”          “new”     “old”
Type 1 Outcome       Correct Rejection    False Alarm    Miss      Hit

b) [Figure: unstudied-item and studied-item distributions plotted along a strength-of-evidence axis. The criterion (C) separates “new” from “old” responses, dividing the distributions into Correct Rejections (CR), Misses, False Alarms (FA), and Hits.]

Note: The top panel (a) defines the classification of Type-1 responses for a binary memory decision according to signal detection theory. The four types of observed responses [Correct Rejections (CR), Misses, Hits, False Alarms (FA)] correspond to the four areas under theoretical distributions (panel b).

Figure 2. General Interpretations of Memory Data According to Signal Detection Theory

[Figure: three panels (a, b, c), each showing unstudied-item and studied-item distributions along a strength-of-evidence axis, with a criterion (C) separating “new” from “old” responses and regions labeled by response outcome.]

Note: The top panel (a) provides a generic depiction of recognition performance, in which two distributions are divided into hits (H), misses (M), false alarms (FA), and correct rejections (CR). Panel b depicts a basic interpretation of the effect of self-generation. The bottom panel (c) depicts an effect of fluency (which would manifest similarly to a liberal shift in response bias).
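The two interpretations contrasted in Figure 2 can be made concrete with a minimal sketch under an equal-variance Gaussian model. The means, criterion values, and function names below are hypothetical, chosen only to contrast a shift of the unstudied-item distribution (as in panel b) with a liberal shift of the criterion (as in panel c); they are not estimates from the present data.

```python
from statistics import NormalDist


def outcome_rates(studied_mean, unstudied_mean, criterion):
    """Areas under two unit-variance normal distributions split by a criterion."""
    hit = 1.0 - NormalDist(studied_mean, 1.0).cdf(criterion)
    fa = 1.0 - NormalDist(unstudied_mean, 1.0).cdf(criterion)
    return {"Hit": hit, "Miss": 1.0 - hit, "FA": fa, "CR": 1.0 - fa}


def sensitivity(rates):
    """d' recovered from the hit and false alarm areas."""
    z = NormalDist().inv_cdf
    return z(rates["Hit"]) - z(rates["FA"])


# Hypothetical parameter values for the three cases.
baseline = outcome_rates(studied_mean=1.5, unstudied_mean=0.0, criterion=0.75)
shifted = outcome_rates(studied_mean=1.5, unstudied_mean=0.5, criterion=0.75)
liberal = outcome_rates(studied_mean=1.5, unstudied_mean=0.0, criterion=0.25)

for name, rates in [("baseline", baseline), ("shifted", shifted), ("liberal", liberal)]:
    print(name, round(rates["Hit"], 3), round(rates["FA"], 3), round(sensitivity(rates), 2))
```

In this sketch, both manipulations raise the false alarm rate, but only the shift of the unstudied-item distribution lowers sensitivity; the criterion shift raises hits and false alarms together and leaves sensitivity unchanged.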

Figure 3. Receiver Operating Characteristic (ROC) Curves for Recognition and Cued Recall in Experiment 1.

[Figure: a) Mean ROC Curves for Recognition and Cued Recall; b) Mean zROC Curves for Recognition and Cued Recall. The x-axis of each panel is False Alarm Rate; separate curves are plotted for Recognition and Cued Recall.]

Note: The overlapping curves suggest that sensitivity did not differ between recognition and cued recall; however, the shift of points along the curve represents a change in response bias between test formats that became increasingly large as confidence decreased.
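For readers unfamiliar with how ROC and zROC points are obtained from confidence ratings, the following sketch accumulates response counts from the most confident “old” rating downward. The six-point scale and the counts are hypothetical, and the trapezoidal area is a generic summary used here for illustration; it is not the da, Az, or A’r estimate reported in the tables.

```python
from statistics import NormalDist


def roc_points(studied_counts, unstudied_counts):
    """Cumulative (FAR, HR) pairs from rating counts ordered from the most
    confident 'old' rating to the most confident 'new' rating."""
    n_s, n_u = sum(studied_counts), sum(unstudied_counts)
    points, cum_s, cum_u = [], 0, 0
    # The final rating category is skipped because it always yields (1, 1).
    for s, u in zip(studied_counts[:-1], unstudied_counts[:-1]):
        cum_s += s
        cum_u += u
        points.append((cum_u / n_u, cum_s / n_s))
    return points


def z_transform(points):
    """zROC coordinates; every rate must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf
    return [(z(far), z(hr)) for far, hr in points]


def trapezoid_area(points):
    """Area under the ROC, anchored at (0, 0) and (1, 1)."""
    pts = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical rating counts for one participant (six-point scale assumed).
studied = [40, 20, 10, 10, 10, 10]
unstudied = [5, 10, 15, 20, 20, 30]
points = roc_points(studied, unstudied)
area = trapezoid_area(points)
print(points)
print(z_transform(points))
print(round(area, 4))
```

Each successive point corresponds to a more lenient confidence criterion, which is why a change in response bias appears as movement of points along a curve rather than as a change in the curve itself.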

Figure 4. Classification model for type 2 responses and theoretical signal detection distributions.

a)
Test Item            Unstudied                                  Studied
Type 1 Response      “old”             “new”                    “new”             “old”
Type 1 Outcome       False Alarm       Correct Rejection        Miss              Hit
Accuracy             Incorrect         Correct                  Incorrect         Correct
Type 2 Confidence    Low      High     Low      High            Low      High     Low      High
Type 2 Outcome       CR       FA       Miss     Hit             CR       FA       Miss     Hit

(CR = Correct Rejection; FA = False Alarm)

b) [Figure: two sets of distributions, each divided by a confidence criterion (CC) into “low” and “high” regions labeled CR, Miss, FA, and Hit. For Unstudied Items, the incorrect response distribution (Type 1 FA) and correct response distribution (Type 1 CR) are plotted along an axis of Confidence in Accuracy; for Studied Items, the incorrect response distribution (Type 1 Miss) and correct response distribution (Type 1 Hit) are plotted along the same axis.]

Note: The top panel (a) defines the classification of type 2 responses, according to the preceding type 1 response. In type-2 analyses, the four observed measures [Hits, False Alarms (FA), Misses, Correct Rejections] can be derived from both Unstudied and Studied items, thus yielding two sets of theoretical distributions (b).
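The classification scheme in panel (a) can be expressed as a short sketch. The function names, the trial format, and the dichotomization of confidence into low and high are assumptions made for illustration; the trials listed are hypothetical.

```python
from collections import Counter


def type1_outcome(studied, said_old):
    """Classify a binary memory decision for a studied or unstudied item."""
    if studied:
        return "Hit" if said_old else "Miss"
    return "FA" if said_old else "CR"


def type2_outcome(t1_outcome, high_confidence):
    """Classify confidence relative to the accuracy of the type-1 response."""
    correct = t1_outcome in ("Hit", "CR")
    if correct:
        return "Hit2" if high_confidence else "Miss2"
    return "FA2" if high_confidence else "CR2"


def type2_rates(trials):
    """trials: (studied, said_old, high_confidence) tuples for one item type.
    Returns (type-2 hit rate, type-2 false alarm rate)."""
    counts = Counter(type2_outcome(type1_outcome(s, o), h) for s, o, h in trials)
    n_correct = counts["Hit2"] + counts["Miss2"]
    n_incorrect = counts["FA2"] + counts["CR2"]
    hr2 = counts["Hit2"] / n_correct if n_correct else float("nan")
    far2 = counts["FA2"] / n_incorrect if n_incorrect else float("nan")
    return hr2, far2


# Hypothetical unstudied-item trials: three correct rejections, three false alarms.
unstudied_trials = [
    (False, False, True), (False, False, True), (False, False, False),
    (False, True, True), (False, True, False), (False, True, False),
]
hr2, far2 = type2_rates(unstudied_trials)
print(round(hr2, 2), round(far2, 2))
```

Running the same function separately on studied-item trials yields the second set of type-2 rates, in which type-1 hits serve as the correct responses and type-1 misses as the incorrect ones.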

Figure 5. Type 2 Receiver Operating Characteristic (ROC) Curves for Recognition and Cued Recall in Experiment 2.

[Figure: a) Mean Type-2 ROC Curves and Mean Type-2 zROC Curves for Studied Items, with x-axis FAR2 = p(confidence | Miss_1); b) Mean Type-2 ROC Curves and Mean Type-2 zROC Curves for Unstudied Items, with x-axis FAR2 = p(confidence | FA_1). Separate curves are plotted for Recognition and Cued Recall.]

Note: For studied items (panel a), participants showed successful monitoring of correct and incorrect responses, and recognition and recall performance was only differentiated at high levels of confidence. For unstudied items (panel b), monitoring was significantly reduced for both recognition and cued recall, and performance differed between the tests across a wider range of confidence.

References

Aggleton, J. P., & Shaw, C. (1996). Amnesia and recognition memory: A re-analysis of

psychometric data. Neuropsychologia, 34 (1), 51-62.

Anderson, J., & Bower, G. (1972). Recognition and retrieval processes in free recall.

Psychological Review, 79 (2), 97-123.

Bahrick, H. P. (1970). A two-phase model for prompted recall. Psychological Review, 77, 215-

222.

Bellezza, F. S. (2003). Evaluation of six multinomial models of conscious and unconscious

processes with the recall-recognition paradigm. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 29 (5), 779-796.

Bernbach, H. (1967). Decision Processes in Memory. Psychological Review, 74 (6), 462-480.

Bodner, G. E., Masson, M. E. J., & Caldwell, J. I. (2000). Evidence for a generate-recognize model

of episodic influences on word-stem completion. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 26 (2), 267-293.

Cabeza, R., Kapur, S., Craik, F.I.M., McIntosh, A., Houle, S., & Tulving, E. (1997). Functional

neuroanatomy of recall and recognition: A PET study of episodic memory. Journal of

Cognitive Neuroscience, 9 (2), 254-269.

Craik, F. (1979). Human Memory. Annual Review of Psychology, 30, 63-102.

Curran, T. & Hintzman, D.L. (1995). Violations of the independence assumption in process

dissociation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21

(3), 531-537.

Donaldson, W. (1992). Measuring recognition memory. Journal of Experimental Psychology:

General, 121 (3), 275-277.

Donaldson, W. & Good, C. (1996). A’r: An estimate of area under isosensitivity curves.

Behavior Research Methods, Instruments, & Computers, 28 (4), 590-597.

Dunlosky, J. & Bjork, R. A. (2008). Handbook of Metamemory and Memory. New York, NY.

CRC Press.

Dunn, J. C. (2004). Remember-know: A matter of confidence. Psychological Review, 111 (2),

524-542.

Eichenbaum, H., Yonelinas, A. P., & Ranganath, C. (2007). The medial temporal lobe and

recognition memory. Annual Review of Neuroscience, 30, 123-152.

Elfman, K. W., Parks, C. M., & Yonelinas, A. P. (2008). Testing a neurocomputational model of

recollection, familiarity, and source recognition. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 34 (4), 752-768.

Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power

analysis program for the social, behavioral, and biomedical sciences. Behavior Research

Methods, 39, 175-191.

Flexser, A. J., & Tulving, E. (1978). Retrieval independence in recognition and recall.

Psychological Review, 85 (3), 153-171.

Galvin, S., Podd, J., Drga, V., & Whitmore, J. (2003). Type-2 tasks in the theory of signal

detectability: Discrimination between correct and incorrect decisions. Psychonomic

Bulletin & Review, 10 (4), 843-876.

Gardiner, J., Ramponi, C., & Richardson-Klavehn, A. (1998). Experiences of remembering,

knowing, and guessing. Consciousness & Cognition, 7 (1), 1-26.

Gardiner, J., Ramponi, C., & Richardson-Klavehn, A. (2002). Recognition memory and decision

processes: A meta-analysis of remember, know, and guess responses. Memory, 10 (2),

83-98.

Gardiner, J. & Richardson-Klavehn, A. (2000). Remembering and Knowing. The Oxford

Handbook of Memory 229-244.

Gardiner, J., Richardson-Klavehn, A, & Ramponi, C. (1998). Limitations of the signal detection

model of the remember-know paradigm: A reply to Hirshman. Consciousness &

Cognition, 7 (2), 285-288.

Gillund, G. & Shiffrin, R. M. (1984). A retrieval model for both recognition and recall.

Psychological Review, 91 (1), 1-67.

Gregg, V. (1976). Word frequency, recognition and recall. In J. Brown (Ed.), Recall and

recognition. Oxford, England: John Wiley and Sons, Inc.

Green, D. M., & Swets, J. A. (1966). Recognition memory. In A. Swets (Ed.), Signal detection

theory and psychophysics (pp. 337-338). New York, NY: John Wiley and Sons, Inc.

Guynn, M. G., & McDaniel, M. A. (1999). Generate - sometimes recognize, sometimes not.

Journal of Memory and Language, 41 (3), 398-415.

Haist, F., Shimamura, A. P., & Squire, L. R. (1992). On the relationship between recall and

recognition memory. Journal of Experimental Psychology: Learning, Memory, and

Cognition, 18 (4), 691-702.

Hamilton, M. & Rajaram, S. (2003). States of awareness across multiple memory tasks:

Obtaining a ‘pure’ measure of conscious recollection. Acta Psychologica, 112 (1), 43-69.

Hicks, J. L. & Marsh, R. L. (2002). On predicting the future states of awareness for recognition

of unrecallable items. Memory & Cognition, 30 (1), 60-66.

Hintzman, D. L. (1980). Simpson’s paradox and the analysis of memory retrieval. Psychological

Review, 87 (4), 398-410.

Hintzman, D. L. (1988). Recognition and recall in MINERVA 2: Analysis of the ‘recognition-

failure’ paradigm.

Hintzman, D. L. (1992). Mathematical constraints and the Tulving-Wiseman law. Psychological

Review, 99 (3), 536-542.

Hintzman, D. L. (1993). On variability, Simpson’s paradox, and the relation between recognition

and recall: Reply to Tulving and Flexser. Psychological Review, 100 (1), 143-148.

Higham, P. A., & Tam, H. (2005). Generation failure: Estimating metacognition in cued recall.

Journal of Memory and Language, 52, 595-617.

Higham, P. A., Perfect, T. J., & Bruno, D. (2009). Investigating strength and frequency effects in

recognition memory using type-2 signal detection theory. Journal of Experimental

Psychology: Learning, Memory, and Cognition, 35 (1), 57-80.

Hunt, R. R., & Ellis, H. C. (2004). Fundamentals of Cognitive Psychology, 7th ed., New York,

NY. McGraw-Hill Companies, Inc.

Jacoby, L. L. (1991). A process-dissociation framework: Separating automatic from intentional

uses of memory. Journal of Memory and Language, 30 (5), 513-541.

Jacoby, L. L. (1998). Invariance in automatic influences of memory: Toward a user’s guide for

the process dissociation procedure. Journal of Experimental Psychology: Learning,

Memory and Cognition, 24 (1), 3-26.

Jacoby, L. L., & Dallas, M. (1981). On the relationship between autobiographical memory and

perceptual learning. Journal of Experimental Psychology: General, 110 (3), 306-340.

Jacoby, L. L., & Hollingshead, A. (1990). Toward a generate/recognize model of performance on

direct and indirect tests of memory. Journal of Memory and Language, 29, 433-454.

Jacoby, L. L. & Rhodes, M. (2006). False remembering in the aged. Current Directions in

Psychological Science, 15 (2), 49-53.

Jacoby, L., Shimizu, Y., Daniels, K., & Rhodes, M. (2005). Modes of cognitive control in

recognition and source memory: Depth of retrieval. Psychonomic Bulletin & Review, 12

(5), 852-857.

Jacoby, L. L., & Whitehouse, K. (1989). An illusion of memory: False recognition influenced by

unconscious perception. Journal of Experimental Psychology: General, 118 (2), 126-135.

Jacoby, L. L., Woloshyn, V., & Kelley, C. (1989). Becoming famous without being recognized:

Unconscious influences of memory produced by dividing attention. Journal of

Experimental Psychology: General, 118 (2), 115-125.

Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source monitoring. Psychological

Bulletin, 114 (1), 3-28.

Kintsch, W. (1970). Models for free recall and recognition. In D. A. Norman (Ed.), Models of

human memory. New York, NY: Academic Press.

Koriat, A. & Goldsmith, M. (1994). Memory in naturalistic and laboratory contexts:

Distinguishing the accuracy-oriented and quantity-oriented approaches to memory

assessment. Journal of Experimental Psychology: General, 123 (3), 297-315.

Koriat, A. & Goldsmith, M. (1996). Monitoring and Control Processes in the Strategic

Regulation of Memory Accuracy. Psychological Review, 103 (3), 490-517.

Koriat, A., Goldsmith, M., & Pansky, A. (2000). Toward a psychology of memory accuracy.

Annual Review of Psychology, 51, 481-537.

Kucera, H. & Francis, W. N. (1967). Computational analysis of present-day American English.

Providence, RI: Brown University Press.

Loftus, E. F. (1975). Leading questions and the eyewitness report. Cognitive Psychology, 7 (4),

560-572.

MacMillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide. Mahwah, NJ:

Lawrence Erlbaum Associates Publishers.

Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review,

87 (3), 252-271.

McCullough, A. M., & Toth, J. P. (2008). Remember/Know: A direct comparison of methods.

Unpublished data, University of North Carolina Wilmington.

Mickes, L., Wixted, J., & Wais, P. (2007). A direct test of the unequal variance signal detection

model of recognition memory. Psychonomic Bulletin & Review, 14 (5), 858-865.

Oppenheimer, D. (2008). The secret life of fluency. Trends in Cognitive Sciences, 12 (6), 237-

241.

Parks, C. M. & Yonelinas, A. (2007). Moving beyond pure signal detection models: Comment

on Wixted (2007). Psychological Review, 114 (1), 188-202.

Perfect, T. J. & Schwartz, B. L. (2002). Applied Metacognition. Cambridge, UK: Cambridge

University Press.

Peynircioglu, Z. F. & Tekcan, A. I. (1993). Revelation effect: Effort or priming does not create

the sense of familiarity. Journal of Experimental Psychology: Learning, Memory, and

Cognition, 19 (2), 382-388.

Reder, L. M., Anderson, J. R., & Bjork, R. A. (1974). A semantic interpretation of encoding

specificity. Journal of Experimental Psychology, 102 (4), 648-656.

Rotello, C. M., MacMillan, N. A., Reeder, J. A., & Wong, M. (2005). The remember response:

Subject to bias, graded, and not a process-pure indicator of recollection. Psychonomic

Bulletin & Review, 12 (5), 865-873.

Roediger, H. L. & McDermott, K. B. (1995). Creating false memories: Remembering words not

presented in lists. Journal of Experimental Psychology: Learning, Memory, and

Cognition, 21 (4), 803-814.

Schacter, D. L. (1999). The seven sins of memory: Insights from psychology and cognitive

neuroscience. American Psychologist, 54 (3), 182-203.

Schacter, D. L., Norman, K. A., & Koutstaal, W. (1998). The cognitive neuroscience of

constructive memory. Annual Review of Psychology, 49, 289-318.

Schacter, D. L. & Tulving, E. (Eds.) (1994). Memory systems: 1994. Cambridge, MA. MIT

Press.

Stanislaw, H. & Todorov, N. (1999). Calculations of signal detection theory measures.

Behavior Research Methods, Instruments, & Computers, 31 (1), 137-149.

Toth, J. P. (2005). States of awareness in memory: A remarkable correspondence between recognition

and cued recall. Unpublished manuscript.

Tulving, E. & Thomson, D. (1973). Encoding specificity and retrieval processes in episodic

memory. Psychological Review, 80 (5), 352-373.

Tulving, E. & Wiseman, S. (1975). Relation between recognition and recognition failure of

recallable words. Bulletin of the Psychonomic Society, 6, 79-82.

Tulving, E. (1983). Elements of Episodic Memory. Cambridge, MA. Oxford University Press.

Tulving, E. (1985). Memory and consciousness. Canadian Psychology, 26 (1), 1-12.

Verde, M. F. & Rotello, C. M. (2003). Does familiarity change in the revelation effect? Journal

of Experimental Psychology: Learning, Memory, and Cognition, 29 (5), 739-746.

Verde, M. F. & Rotello, C. M. (2004). ROC curves show that the revelation effect is not a single

phenomenon. Psychonomic Bulletin & Review, 11 (3), 560-566.

Vilberg, K. L. & Rugg, M. D. (2007). Dissociation of the neural correlates of recognition

memory according to familiarity, recollection, and amount of recollected information.

Neuropsychologia, 45 (10), 2216-2225.

Warrington, E. K. & Weiskrantz, L. (1968). A study of learning and retention in amnesic

patients. Neuropsychologia, 6 (3), 283-291.

Watkins, M. J. (1974). When is recall spectacularly higher than recognition? Journal of

Experimental Psychology, 102 (1), 161-163.

Watkins, M. J. & Peynircioglu, Z. F. (1990). The revelation effect: When disguising a test item

induces recognition. Journal of Experimental Psychology: Learning, Memory, and

Cognition, 16 (6), 1012-1020.

Westerman, D. L. & Greene, R. L. (1996). On the generality of the revelation effect. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 22 (5), 1147-1153.

Whittlesea, B. (1993). Illusions of familiarity. Journal of Experimental Psychology: Learning,

Memory, and Cognition, 19 (6), 1235-1253.

Whittlesea, B. & Williams, L. (2001). The discrepancy-attribution hypothesis: I. The heuristic

basis of feelings of familiarity. Journal of Experimental Psychology: Learning,

Memory, and Cognition, 27 (1), 3-13.

Wiseman, S. T., & Tulving, E. (1976). Encoding specificity: Relation between recall superiority

and recognition failure. Journal of Experimental Psychology: Human Learning and

Memory, 2 (4), 349-361.

Wixted, J. T. (2007). Dual-process theory and signal-detection theory of recognition memory.

Psychological Review, 114 (1), 152-176.

Wixted, J. T. & Stretch, V. (2004). In defense of the signal-detection interpretation of

remember/know judgments. Psychonomic Bulletin & Review, 11 (4), 616-641.

Yonelinas, A. P. (1994). Receiver-operating characteristics in recognition memory: Evidence for

a dual-process model. Journal of Experimental Psychology: Learning, Memory, and

Cognition, 20 (6), 1341-1354.

Yonelinas, A. P. (2001). Consciousness, control, and confidence: The 3 C’s of recognition

memory. Journal of Experimental Psychology: General, 130 (3), 361-379.

Yonelinas, A. P., Kroll, N. E., Dobbins, I., Lazzara, M., & Knight, R. T. (1998). Recollection

and familiarity deficits in amnesia: Convergence of remember-know, process

dissociation, and receiver operating characteristic data. Neuropsychology, 12 (3), 323-

339.

Yonelinas, A. P. & Parks, C. M. (2007). Receiver operating characteristics (ROCs) in

recognition memory: A review. Psychological Bulletin, 133 (5), 800-832.