, 2009, 17 (5), 493Á501

Learners’ choices and beliefs about self-testing

Nate Kornell University of California, Los Angeles, CA, USA

Lisa K. Son Barnard College, New York, NY, USA

Students have to make scores of practical decisions when they study. We investigated the effectiveness of, and beliefs underlying, one such practical decision: the decision to test oneself while studying. Using a -like procedure, participants studied lists of word pairs. On the second of two study trials, participants either saw the entire pair again (pair mode) or saw the cue and attempted to generate the target (test mode). Participants were asked either to rate the effectiveness of each study mode (Experiment 1) or to choose between the two modes (Experiment 2). The results demonstrated a mismatch between metacognitive beliefs and study choices: Participants (incorrectly) judged that the pair mode resulted in the most learning, but chose the test mode most frequently. A post-experimental questionnaire suggested that self-testing was motivated by a desire to diagnose learning rather than a desire to improve learning.

Keywords: Self-testing; Testing effect; Flashcards; Judgements of learning.

Self-regulated learning (e.g., homework) requires A large body of data suggests that self-testing students to make scores of decisions about how to enhances memory. This so-called testing effect study. Laboratory research has shown that these refers to the finding that taking a test improves decisions are affected by many factors, including learning more than passively reading the same the difficulty of the materials (Metcalfe, 2002); the information (Glover, 1989; McDaniel & Fisher, student’s goals (Thiede & Dunlosky, 1999); the 1991; Metcalfe, Kornell, & Son, 2007; Roediger & Downloaded By: [Williams College] At: 16:29 16 July 2009 amount of time pressure the student is under Karpicke, 2006b). The testing effect has been (Metcalfe & Kornell, 2003); a student’s age shown to occur when the test is followed by (Kuhn, 2002); and, of course, the type of decision corrective feedback (e.g., Carrier & Pashler, to be made, whether it be strategic (e.g., should I 1992; Cull, 2000) and when it is not (Carpenter & make flashcards?), item-specific (e.g., do I need to DeLosh, 2005; Landauer & Bjork, 1978). More- review this chapter again?), or global (e.g., should over, testing appears to be especially effective for I study tonight?). However, many of the practical promoting long-term learning (Hogan & Kintsch, decisions that students face have not been 1971; Roediger & Karpicke, 2006a). addressed by experimental research (see, e.g., In addition to enhancing memory directly, Kornell & Bjork, 2007). One such practical ques- testing has a second benefit: It allows learners tion, which is the focus of the present research, is to accurately diagnose what they do and do not whether or not to test oneself while studying. know. Making such diagnoses accurately can play

Address correspondence to: Nate Kornell, UCLA Department of Psychology, 1285 Franz Hall, Los Angeles, CA 90095-1563, USA. E-mail: [email protected] We thank Robert A. Bjork for his guidance and support, and Bridgid Finn for her suggestions and help in conducting the experiments. Grant 29192G from the McDonnell Foundation to Robert A. Bjork supported this research.

# 2009 Psychology Press, an imprint of the Taylor & Francis Group, an Informa business http://www.psypress.com/memory DOI:10.1080/09658210902832915 494 KORNELL AND SON

an important role in guiding study decisions. to decide, for each pair, whether to re-read the Learners’ diagnoses of what they know are often pair or test themselves. The first-graders chose to assessed via judgements of learning (JOLs)Áthat test themselves on items that they felt they knew is, ratings of how probable it is that one will well, and preferred re-presentation on less well- remember an item on a future test. The correlation known items*although they may have chosen between the JOLs and actual memory increases tests on items that they thought they could when learners can test themselves as they study answer correctly not as a study strategy, but (e.g., Dunlosky & Nelson, 1992). To the degree simply to impress their teachers. that JOLs guide study decisions, accurate JOLs promote effective study decisions, including deci- sions about which items to study and how long to CHOOSING SELF-TESTING: A PILOT spend studying (Kornell & Metcalfe, 2006; Nelson, EXPERIMENT Dunlosky, Graf, & Narens, 1994), and even when to schedule study (Son, 2004). We wanted to follow up on the question of self-testing using a procedure that seemed to more strongly resemble real-world learning. BELIEFS CONCERNING SELF-TESTING Thus we used a -like procedure. When learners use flashcards, which are a ubiquitous Two recent studies suggest that people do not study tool (Kornell, in press; Kornell & Bjork, recognise the benefits of testing*instead they 2007), they generally cycle through a set of cards seem convinced that presentations result in more repeatedly, looking at the front of each card, learning than does testing (Agarwal, Karpicke, testing themselves, and then looking at the back. Kang, Roediger, & McDermott, 2008; Roediger & Flashcard-like programmes are becoming increas- Karpicke, 2006a). In both studies participants who ingly popular online. Asking participants to study read a text passage multiple times gave higher computer-based flashcards seemed to be an JOLs than participants who read the passage and appropriate experimental context in which to then took one or more tests (Roediger & Kar- investigate self-testing. picke, 2006a, used free tests; Agarwal et al., In the pilot study the learning materials were 2008, used cued recall tests). This evidence sug- EnglishÁIndonesian vocabulary (e.g., LeftÁKiri, gests that, at least in the context of reading text HotÁPanas, LateÁTerlambat), which were pre- passages, participants thought re-reading was sented for study on a computer. Participants more effective than testing. were allowed to decide how they studied: They There is evidence that in some situations people could choose to read the pairs intact (pair mode), recognise the benefits of tests. In experiments on

Downloaded By: [Williams College] At: 16:29 16 July 2009 or to have the cue presented alone before the the generation effect*the finding that generating an answer, for example by unscrambling an target appeared (test mode). If a participant anagram or filing in missing letters in a word, changed modes, which they could do at any time leads to better later recall than seeing the word using buttons on the computer screen, the remain- presented whole (Hirshman & Bjork, 1988; ing items in the list would be presented in Jacoby, 1978; Slamecka & Graf, 1978)*partici- whichever mode was selected, unless the mode pants have been shown to give higher JOLs for was changed again. There were three between- words that they generated than for words that participants conditions: In addition to the condi- were presented (Begg, Vinski, Frankovich & tion in which participants could choose the study Holgate, 1991; Mazzoni & Nelson, 1995), or to mode, there was an all-pair mode in which there give equal ratings on both types of trials (Maki, were no test trials, and an all-test mode in Foley, Kajer, Thompson, & Willert, 1990). which there were no pair trials. In the same way Thus there is evidence to suggest that, in some that students doing homework are under time situations, people believe they can learn more pressure, our participants were given a time limit via re-reading than being tested. But do students of 10 minutes to study, and they controlled the choose to test themselves? To investigate this timing of presentations as they studied. The fixed question, Son (2005) asked first-grade students time limit gave participants an incentive to man- to study a set of cueÁtarget synonym pairs (e.g., age their time wisely, because studying quickly occupation Á job). The participants were asked translated into more study trials later (unlike to judge how well they knew each pair, and then most previous self-regulated study research; see SELF-TESTING 495

Son & Metcalfe, 2000). There was a final test at the vary widely depending on the subject matter and end of the experiment. expected type of test*to take an obvious exam- The results from the pilot study showed that, ple, literature students read more and solve when given the choice, participants began by problems less than physics students*and how studying in presentation mode but quickly people rate testing in one context cannot necessa- switched to test mode. There was also a testing rily be used to predict how they will rate, or utilise, effect: On the final test the recall accuracy of testing in another context. participants in the choice condition*who only In the present experiments we investigated tested themselves on an average of 55% of the choices and beliefs regarding self-testing. Two trials*matched that of participants in the all-test experiments, which shared a single set of materi- mode (mean performance63% for both condi- als, differed in only one aspect of their proce- tions), and surpassed that of participants in the dures. In Experiment 1 participants were asked to all-pair mode (mean performance37%). Thus it judge the relative effectiveness of testing versus appears that a mix of presentation trials and test presentation. In Experiment 2 participants were trials was no less effective than constant test trials. asked to choose between testing and presenta- Participants in the pilot experiment chose to tion. Based on previous research, we predicted test themselves. The question addressed in the that participants would choose self-testing rather present research was why did they choose to do than presentation. In the same situation, however, so? Tests enhance learning, as the pilot experiment we also predicted that participants would rate demonstrated, but another reason to test oneself is presentation as benefiting their learning more as a way to diagnose one’s memory. In a survey of than testing*which would demonstrate a mis- 472 UCLA undergraduates (Kornell & Bjork, match between study choices and metacognitive 2007), for example, 91% reported testing them- judgements (JOLs). selves regularly while studying. Out of all respon- Feedback was also manipulated in the present dents, 68% reported that they test themselves experiments. Feedback plays an important role in primarily to make a diagnosis of what they do and the benefits of tests (e.g. Metcalfe & Kornell, do not know. Thus students seem to be aware of 2007; Pashler Cepeda, Wixted, & Rohrer, 2005): the metacognitive benefits of testing (also see Without feedback, if one is unable to answer a Dunlosky, Serra, Matvey, & Rawson, 2005). How- question initially there is little hope of recovering ever, they seem less aware of the direct memory it later, whereas when feedback is provided errors benefits of testing: only 18% reported testing can be corrected, and tests are endowed with the themselves to enhance their learning directly. benefits of a presentation while also maintaining Thus the majority of students seem to view tests the additional benefits of testing. Thus we pre- the way their teachers might: primarily as assess- dicted that participants would choose testing

Downloaded By: [Williams College] At: 16:29 16 July 2009 ments, not learning tools. more often, and rate testing more favourably, when feedback was provided than when it was absent. THE PRESENT EXPERIMENTS

In sum, previous research provides a puzzling EXPERIMENT 1 picture of learners’ attitudes towards testing: People believe testing is less effective than In Experiment 1 participants studied a list of re-reading (e.g., Roediger & Karpicke, 2006a) word pairs twice. The first study trial was always and yet, as our pilot results demonstrate, they in pair mode (i.e., the cue and target were choose self-testing preferentially over re-presen- presented together). In the pair condition the tation. One explanation of these seemingly con- second study trial was also in pair mode; in the tradictory findings is that people choose testing to test condition the second study trial was in test diagnose their learning, not to enhance it (Kornell mode (i.e., the cue was presented and participants & Bjork, 2007). Another explanation, however, is were asked to type in the target). After the a difference in the learning materials and test type: second trial, participants were asked to predict People rated re-reading as effective when studying how many of the pairs they would remember on a passages (Roediger & Karpicke) but chose self- final test that would occur a short time later. This testing while studying translations on digital aggregate JOL allowed us to examine partici- flashcards (in our pilot study). Study strategies pants’ metacognitive beliefs about the relative 496 KORNELL AND SON

effectiveness of testing versus presentation. We and then, after the distractor task, took a test on also manipulated whether feedback was given all four lists. Two of the lists were assigned to the during the test. all-pair mode (either the first two lists or the last The hypothesis in Experiment 1 was that parti- two lists) and two were assigned to the all-test cipants would rate the pair condition as more mode. Within each list, half of the items were easy effective than the test condition. Such a finding and half were difficult. would replicate previous findings (e.g., Roedger & The study phase for a given list comprised Karpicke, 2006a), but it would also differ from three phases: initial presentation, restudy, and previous work in important ways. First, Roediger JOL. During initial presentation the 12 word and Karpicke’s task, free-recalling text passages pairs were presented for study, in random order, without feedback*which students rarely do when for 5 seconds each. During restudy, for lists studying*is probably less attractive than testing presented in the pair condition, the full word oneself in a flashcards-like paradigm, which pairs were presented again for 5 second each; for students naturally do (Kornell & Bjork, 2008). lists presented in the test condition the cue word Moreover, in Roedger and Karpicke’s study the was presented alone, and participants were asked participants’ predictions were actually accurate to type in the target word. Feedback was provided with respect to the immediate test*that is, (i.e., the correct answer was presented for 1 consistent with the participants’ predictions, on second after the participant responded) to parti- an immediate test study trials resulted in better cipants in the feedback condition. The JOL phase performance than test trials. The predictions were followed restudy. Participants were asked to inaccurate with respect to Roediger and Kar- make an aggregate JOL (i.e., a judgement about picke’s delayed test condition, but people often an entire list, rather than a single item) by fail to predict that forgetting will occur at all completing this sentence: ‘‘When I am tested on (Koriat, Bjork, Sheffer, & Bar, 2004), much less that word list in about 15 minutes, I think I will differential forgetting rates following test trials get about __ out of 12 correct.’’ compared to presentation trials. Thus, in addition After the fourth list was presented there was a to providing a point of comparison for Experiment 5-minute distractor task, during which partici- 2, Experiment 1 differed in important ways from pants were asked to identify well-known people previous results regarding people beliefs about the based on photographs presented upside-down. benefits of testing. The test followed the distractor task. Each cue was presented, one at a time, and participants Method were asked to type in the answer and press return. The items from list one were tested first, in random order, followed by the items from list Participants. A total of 35 college students Downloaded By: [Williams College] At: 16:29 16 July 2009 two, and so on. participated for course credit: 19 in the feedback After completing the test participants were condition and 16 in the no-feedback condition. asked ‘‘Which best describes the reason you quiz Materials. The materials were 48 English word yourself when you study?’’ There were four pairs. Half were relatively easy, related pairs response options: (a) I quiz myself because I (e.g., whaleÁmammal), which had forward associa- learn more that way than I would through tion strengths of between .05 and .054 based on presentation, (b) I quiz myself to figure out how free association norms (Nelson, McEvoy, & well I have learned the information I’m studying, Schreiber, 1998). The other half were more (c) I quiz myself because I find quizzing more difficult, unrelated pairs (e.g., inanityÁcapacity), enjoyable than presentation, (d) None of the with zero forward association strength. above. Design. We used a 22 mixed design. Mode (pair or test) was manipulated within participants; Results and discussion feedback (feedback vs no feedback) was manipu- lated between-participants. Final test accuracy was higher for items studied in Procedure. The procedure comprised three the test mode than the pair mode, F(1, 33)7.60, 2 phases: study, distractor, and test. There were pB.01, hp.19 (Figure 1a). The effect of feedback four lists of 12 word pairs; participants studied was not significant, F(1, 33).60, p.44, nor was and made judgements about each list individually, the modefeedback interaction, F(1, 33).40, SELF-TESTING 497

p.53. In opposition to actual accuracy, JOL ratings were higher for items in the pair mode 2 than the test mode, F(1, 33)7.74, pB.01, hp.19 (Figure 1b). Again, the effect of feedback was not significant, F(1, 33)1.13, p.30, nor was the interaction, F(1, 33).34, p.57. Thus testing enhanced learning, but participants rated extra presentations as more effective than tests. In addition, the survey showed that the main reason participants chose to self-test was to diagnose their level of learning, not to improve it (see Table 1). When interpreting the post-experimental question, it is important to remember that participants were allowed to select only one response; thus Table 1 presents participants’ main, but not necessarily only, reason for testing themselves.

EXPERIMENT 2

Method

Experiment 2 was identical to Experiment 1, with one exception. In Experiment 1 lists were randomly assigned to either the pair mode or test mode. In Experiment 2 the participants were allowed to choose between the two modes (as they had done in the pilot experiment). During the study phase, after studying a list for the first time, participants were asked: ‘‘Now it’s time to study that list again. What would you like to do: see the pairs presented again, or take a practice quiz?’’ They selected one of two buttons, corre- sponding to the pair condition and the test Downloaded By: [Williams College] At: 16:29 16 July 2009 condition respectively, labelled ‘‘Present again’’ and ‘‘Practice quiz’’. Participants. A total of 35 college-aged students participated for course credit: 16 in the feedback

TABLE 1 Responses to the post-experimental question

% of Multiple-choice response respondents

I learn more that way than I would through 20% presentation To figure out how well I have learned the 66% information I’m studying Figure 1. Results of Experiments 1 and 2. (A) Proportion I find quizzing more enjoyable than presentation 4% correct in Experiment 1. (B) Judgements of learning in None of the above 10% Experiment 1. (C) Proportion of lists on which participants chose to be tested, as function of list, in Experiment 2 (the Responses to the post-experimental question ‘‘Which best dashed line represents indifference to testing vs presentation). describes the reason you quiz yourself when you study?’’ Error bars represent standard errors. combined across Experiments 1 and 2. 498 KORNELL AND SON

condition and 19 in the no-feedback condition. It Nevertheless, the results of Experiment 2 were should be noted that the two conditions took consistent with the results of Experiment 1: place during the summer and fall semesters Participants gave higher JOLs following the pair respectively, and thus participants could not be mode than the test mode, F(1, 17)16.40, pB 2 assigned to conditions randomly. .001, hp.49, despite performing in the opposite manner. Final test accuracy was numerically higher for items in the test mode than in the pair Results mode, but the difference was not significant, 2 F(1, 17)1.34, p.26, hp.07. Feedback did As Figure 1c illustrates, on the first list partici- not have significant effects on JOLs or final test pants chose the pair mode and test mode equally accuracy. often, each at a rate of 50% (and there was no A final analysis combined Experiments 1 and 2 difference between the feedback and no-feedback in examining participants’ responses to the post- groups). With experience, however, participants experimental survey. As Table 1 shows, the developed a preference for testing. There was a majority of participants reported testing them- significant effect of list, F(1, 33)4.43, pB.01, 2 selves to diagnose their , not to enhance hp.12, but no significant effect of feedback them directly, consistent with the findings of overall, F(1, 33).64, p.43, and no feedback list interaction, F(1, 33).85, p.47. Kornell and Bjork (2007). To examine participants’ preferences for the test mode versus the pair mode statistically, we chose to examine study choices on list 4 because Discussion that is the list on which participants had the most experience with the experimental procedure, and Participants in Experiment 2 chose to test them- thus could make the most informed choices. A selves rather than to receive re-presentations as planned comparison showed that participants in they studied. This finding demonstrates a mis- the feedback condition chose testing at a rate match between metacognitive beliefs and study significantly above 50%, t(15)4.39, pB.001, choices: Participants in Experiment 1 believed, d1.10. The effect was not significant in the no- wrongly, that testing was less effective than feedback condition, t(18)1.16, p.26. Choosing presentation, but in the same situation partici- testing at an especially high rate when tests were pants in Experiment 2 chose testing rather than followed by feedback seems adaptive given that, in presentation. Survey results suggest that the the absence of feedback, participants had little reason people chose to test themselves, despite chance to learn items that they could not recall. believing that doing so impaired their learning, Downloaded By: [Williams College] At: 16:29 16 July 2009 However, comparing the feedback condition and was as a way to diagnose their own learning. the no-feedback condition, there was not a sig- nificant difference in the rate at which participants chose testing, t(33)1.66, p.11. When the data GENERAL DISCUSSION from the two feedback conditions were collapsed, the data demonstrated a significant preference for The current experiments examined learners’ use testing, t(34)3.24, pB.01, d.55. of self-testing as a study strategy. Taken together, Final test accuracy and JOLs were not the focus of Experiment 2, although we report the data for Experiments 1 and 2 showed that, in a paradigm completeness. However, the findings should be that resembled studying computer-based flash- interpreted with caution for two reasons: first cards, participants preferred testing themselves because the participants assigned themselves to rather than re-presentation*and benefited from conditions, and second because 16 of the 35 doing so*but judged re-presentation as more participants were excluded from the analyses, 11 effective than self-testing. Thus there was a because they chose the test mode exclusively and 5 mismatch between study choices, which favoured because they chose the pair mode exclusively. testing, and metacognitive beliefs, which favoured These concerns do not apply to Experiment 1, presentation. The data also suggest that people which is why Experiment 1 was conducted and why test themselves in order to diagnose what they do it serves as the basis for the claim that participants and do not know, often without realising that rated presentation as more effective than testing. doing so enhances their memory. SELF-TESTING 499

The benefits of tests possible that participants were astutely attempting to optimise their learning. It is more likely, based There has been a recent resurgence in research on on the post-experimental questions in Experi- the testing effect (e.g., Roediger & Karpicke, ments 1 and 2, that participants chose testing to 2006b). When different degrees of testing have discern what they did and did not know. A third been compared, the maximum amount of testing possibility is that participants simply wanted to has generally resulted in the maximum benefit answer correctly, and to serve that goal they (e.g., Hogan & Kintsch, 1971; Roediger & Kar- waited to test themselves until they felt that they picke, 2006a; but see also Izawa 1970). However, would be able to do well on the tests. the current results indicate that more testing may not always be better: The pilot data showed that being tested 55% of the time resulted in perfor- A metacognitive dissociation mance that was as good as being tested 100% of the time. This finding implies that when a test Research on memory monitoring is often moti- occurs may be as crucial as the mere fact that it vated by the idea that memory monitoring guides occurs. study decisions (Benjamin, Bjork, & Schwartz, Why did high levels of learning result when l998; Dunlosky & Hertzog, 1997; Dunlosky & participants chose to test themselves on only 55% Thiede, 1998; Kornell & Metcalfe, 2006; Metcalfe of the trials, primarily at the end of the study & Kornell, 2003, 2005; Son, 2004; Son & Metcalfe, 2000). A number of studies have established that period? One advantage of beginning in presenta- study decisions are based on JOLs (e.g., Son & tion mode may have been that presentations Metcalfe, 2000; Thiede & Dunlosky, 1999; communicate new information more quickly, although Koriat, Ma’ayan, & Nussinson, 2006, and efficiently, than a test, because tests require suggest that JOLs are based on study decisions). two steps, a question followed by feedback, Our participants, by contrast, chose a study whereas a presentation requires only one step technique (testing) that they thought was less (Izawa, 1992). Moreover, early in the study phase effective than the alternative (presentation). This unknown items were predominant, and for such finding suggests that memory monitoring was not items presentations and tests may be similarly the primary basis for the study decisions that were valuable*although retrieval attempts seem to made in the current experiments. enhance learning, relative to presentations, even In conjunction with prior research, the current for unknown items (see Kornell, Hays, & Bjork, data suggest that there is no single answer to the in press; Richland, Kornell, & Kao, 2009)* question of what guides study decisions. A satis- because tests can take more time to complete factory theory of study-time allocation should

Downloaded By: [Williams College] At: 16:29 16 July 2009 than presentations. However, as the study phase reflect the fact that students’ goals include more went on, the participants learned an increasing than just finding the study strategy that will have number of the items. When an item can already the largest direct benefit for learning (Flavell, be recalled, additional presentations, at least in 1979; Thiede & Dunlosky, 1999). Our participants some circumstances, appear to confer little or no tested themselves to diagnose their memories*a benefit, whereas additional tests can have large worthy goal, especially because doing so can guide effects (Karpicke & Roediger, 2007, 2008). Thus, future study decisions. Ironically, though, our as time went on, and the participants became able participants’ decisions were at odds with the to recall more items, choosing the presentation standard assumption*that is, that people study mode would have become increasingly unwise. In with the goal of maximising their learning directly. short, participants may have benefited from The difference between the current findings and presentations because of efficiency initially, and previous experiments may be related to the fact then benefited from the benefits of that we asked people to make a strategic decision tests, relative to the ineffectiveness of re-studying, about how to study (i.e., whether or not to test later in the study phase when they knew many of themselves), as opposed to item-based decisions the items. about whether and for how long to study a specific Although pilot participants’ choice to self-test word pair. may have enhanced their learning, that is not There is precedent for the mismatch between necessarily why they chose to test themselves, as memory monitoring and study choices. For exam- Experiments 1 and 2 demonstrate. It is certainly ple, Moulin, Perfect, and Jones (2000) found a 500 KORNELL AND SON

dissociation between judgements of learning and Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). study time allocation in participants with Alzhei- The mismeasure of memory: When retrieval fluency mer’s disease. Lee (2005) also showed such a is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127, 55Á68. dissociation; participants reported that they Carpenter, S. K., & DeLosh, E. L. (2005). Application went to lectures before reading their textbooks, of the testing and spacing effects to name learning. despite thinking that reading the textbook and Applied Cognitive Psychology, 19, 619Á636. then going to class was more effective*probably Carrier, M., & Pashler, H. (1992). The influence of because they also rated reading the textbook first retrieval on retention. Memory & Cognition, 20, as more difficult than going to the lecture first. 633Á642. Cull, W. L. (2000). Untangling the benefits of multiple study opportunities and repeated testing for cued recall. Applied Cognitive Psychology, 14, 215Á235. Conclusion Dunlosky, J., & Hertzog, C. (1997). Older and younger adults use a functionally identical algorithm to select Researchers often assume that people try to study items for restudy during multitrial learning. Journal in ways that maximise learning. In most situations of Gerontology: Series B: Psychological Sciences & Social Sciences, 52B, 178Á186. that assumption is surely valid. However, partici- Dunlosky, J., & Nelson, T. O. (1992). Importance of the pants in the present experiments chose a sub- kind of cue for judgements of learning (JOL) and optimal study strategy*they chose to test them- the delayed-JOL effect. Memory & Cognition, 20, selves, despite believing that re-presentation 374Á380. would do more to improve their memories. Dunlosky, J., Serra, M. J., Matvey, G., & Rawson, K. A. Learners study in sub-optimal ways in a variety (2005). Second-order judgements about judgements of situations*for example, by prematurely ceas- of learning. Journal of General Psychology, 132, 335Á346. ing to study information that they do not yet know Dunlosky, J., & Thiede, K. W. (1998). What makes (Kornell & Bjork, 2007)*but in most cases the people study more? An evaluation of factors that learners are trying to study optimally, but they do affect people’s self-paced study and yield ‘‘labor- not understand what strategy would be most and-gain’’ effects. Acta Psychologica, 98, 37Á56. effective. Such a misunderstanding occurred in Flavell, J. H. (1979). Metacognition and cognitive the present results; the learners thought presenta- monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34, 906Á911. tion was more effective than testing. The unique Glover, J. A. (1989). The ‘‘testing’’ phenomenon: Not aspect of the current findings is that the strategy gone but nearly forgotten. Journal of Educational learners chose was the one that they believed was Psychology, 81, 392Á399. least effective. They did not try to maximise Hirshman, E. L., & Bjork, R. A. (1988). The generation learning; instead they tried to maximise the effect: Support for a two-factor theory. Journal of accuracy of their metacognitive monitoring. By Experimental Psychology: Learning, Memory, &

Downloaded By: [Williams College] At: 16:29 16 July 2009 Cognition, 14, 484Á494. demonstrating that learners sometimes prioritise Hogan, R. M., & Kintsch, W. (1971). Differential enhancing their ability to monitor their learning effects of study and test trials on long-term recogni- over enhancing their learning itself, the current tion and recall. Journal of Verbal Learning and results underscore the importance of goals in Verbal Behavior, 10, 562Á567. determining how people study. Izawa, C. (1970). Optimal potentiating effects and forgetting-prevention effects of tests in paired- Manuscript received 17 March 2008 associate learning. Journal of Experimental Psychol- Manuscript accepted 29 March 2009 ogy, 83, 340Á344. First published online 26 May 2009 Izawa, C. (1992). Test trials contributions to optimiza- tion of learning processes: Study/test trials interac- tions. In A. F. Healy & S. M. Kosslyn (Eds.), Essays in honor of William K. Estes: Vol. 1. From learning REFERENCES theory to connectionist theory (pp. 1Á33). Hillsdale, NJ: Lawrence Erlbaum Associates Inc. Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Jacoby, L. L. (1978). On interpreting the effects of Roediger, H. L., & McDermott, K. B. (2008). repetition: Solving a problem versus remembering a Examining the testing effect with open- and solution. Journal of Verbal Learning and Verbal closed-book tests. Applied Cognitive Psychology, Behavior, 17, 649Á667. 22, 861Á876. Karpicke, J. D., & Roediger, H. L. III. (2007). Repeated Begg, I., Vinski, E., Frankovich, L., & Holgate, B. retrieval during learning is the key to long-term (1991). Generating makes words memorable, but so retention. Journal of Memory and Language, 57, does effective reading. Memory & Cognition, 19, 487Á497. 151Á162. SELF-TESTING 501

Karpicke, J. D., & Roediger, H. L. (2008). The critical Metcalfe, J., & Kornell, N. (2005). A region of proximal importance of retrieval for learning. Science, 319, learning model of study time allocation. Journal of 966Á968. Memory and Language, 52, 463Á477. Koriat, A., Bjork, R. A., Sheffer, L., & Bar, S. K. Metcalfe, J., & Kornell, N. (2007). Principles of (2004). Predicting one’s own forgetting: The role of cognitive science in education: The effects of gen- experience-based and theory-based processes. Jour- eration, errors and feedback. Psychonomic Bulletin nal of Experimental Psychology: General, 133, & Review, 14, 225Á229. 643Á656. Metcalfe, J., Kornell, N., & Son, L. K. (2007). A Koriat, A., Ma’ayan, H., & Nussinson, R. (2006). The cognitive-science based program to enhance study intricate relationships between monitoring and con- efficacy in a high and low-risk setting. European trol in metacognition: Lessons for the cause-and- Journal of Cognitive Psychology, 19, 743Á768. effect relation between subjective experience and Moulin, C. J. A., Perfect, T. J., & Jones, R. W. (2000). behavior. Journal of Experimental Psychology: Gen- The effects of repetition on allocation of study time eral, 135, 36Á69. and judgements of learning in Alzheimer’s disease. Kornell, N. (in press). Optimizing learning using Neuropsychologia, 38, 748Á756. flashcards: Spacing is more effective than cramming. Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (1998). Applied Cognitive Psychology. The University of South Florida word association, Kornell, N., & Bjork, R. A. (2007). The promise and perils of self-regulated study. Psychonomic Bulletin rhyme, and word fragment norms. Retrieved August & Review, 14, 219Á224. 28, 2003, from: http://www.usf.edu/FreeAssociation/ Kornell, N., & Bjork, R. A. (2008). Optimizing self- Nelson, T. O., Dunlosky, J., Graf, A., & Narens, L. regulated study: The benefits-and costs-of dropping (1994). Utilisation of metacognitive judgements in flashcards. Memory, 16, 125Á136. the allocation of study during multitrial learning. Kornell, N., Hays, M. J., & Bjork, R. A. (in press). Psychological Science, 5, 207Á213. Unsuccessful retrieval attempts enhance subsequent Pashler, H., Cepeda, N. J., Wixted, J. T., & Rohrer, D. learning. Journal of Experimental Psychology: (2005). When does feedback facilitate learning of Learning, Memory, & Cognition. words? Journal of Experimental Psychology: Learn- Kornell, N., & Metcalfe, J. (2006). Study efficacy and ing, Memory and Cognition, 31,3Á8. the region of proximal learning framework. Journal Richland, L. E., Kornell, N., & Kao, L. S. (2009). The of Experimental Psychology: Learning, Memory, & pretesting effect: Do unsuccessful retrieval attempts Cognition, 32, 609Á622. enhance learning? Manuscript submitted for pub- Kuhn, D. (2002). Metacognitive development. Current lication. Directions in Psychological Science, 9, 178Á181. Roediger, H. L., & Karpicke, J. D. (2006a). Test- Landauer, T. K., & Bjork, R. A. (1978). Optimum enhanced learning: Taking memory tests improves rehearsal patterns and name learning. In M. M. long-term retention. Psychological Science, 17, Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), 249Á255. Practical aspects of memory (pp. 625Á632). London: Roediger, H. L., & Karpicke, J. D. (2006b). The power Academic Press. of testing memory: Basic research and implications Lee, B. G. (2005). Lecture first or text first? Optimizing for educational practice. Perspectives on Psycholo- undergraduate instruction. Dissertation Abstracts gical Science, 1, 181Á210. International, 66(09), 5117B. (UMI No. 3190464.) Slamecka, N. J., & Graf, P. (1978). The generation Downloaded By: [Williams College] At: 16:29 16 July 2009 Maki, R. H., Foley, J. M., Kajer, W. J., Thompson, R. C, effect: Delineation of a phenomenon. Journal of & Willert, M. G. (1990). Increased processing Experimental Psychology: Human Learning & enhances calibration of comprehension. Journal of Memory, 4, 592Á604. Experimental Psychology: Learning, Memory and Son, L. K. (2004). Spacing one’s study: Evidence for a Cognition, 16, 609Á616. metacognitive control strategy. Journal of Experi- Mazzoni, G., & Nelson, T. O. (1995). Judgements of mental Psychology: Learning, Memory and Cogni- learning are affected by the kind of encoding in ways that cannot be attributed to the level of recall. tion, 30, 601Á604. Journal of Experimental Psychology: Learning, Son, L. K. (2005). Metacognitive control: Children’s Memory Cognition, 21, 1263Á1274. short-term versus long-term study strategies. Journal McDaniel, M. A., & Fisher, R. P. (1991). Tests and test of General Psychology, 132, 347Á363. feedback as learning sources. Contemporary Educa- Son, L. K., & Metcalfe, J. (2000). Metacognitive and tional Psychology, 16, 192Á201. control strategies in study-time allocation. Journal Metcalfe, J. (2002). Is study time allocated selectively to of Experimental Psychology: Learning, Memory and a region of proximal learning? Journal of Experi- Cognition, 26, 204Á221. mental Psychology: General, 131, 349Á363. Thiede, K. W., & Dunlosky, J. (1999). Toward a general Metcalfe, J., & Kornell, N. (2003). The dynamics of model of self-regulated study: An analysis of selec- learning and allocation of study time to a region of tion of items for study and self-paced study time. proximal learning. Journal of Experimental Psychol- Journal of Experimental Psychology: Learning, ogy: General, 132, 530Á542. Memory and Cognition, 25, 1024Á1037.