
Assessing Challenge as a Motivator to Use a Retrieval Practice Study Strategy

A thesis presented to

the faculty of

the College of Arts and Sciences of Ohio University

In partial fulfillment

of the requirements for the degree

Master of Science

Kyle A. Bayes

August 2017

© 2017 Kyle A. Bayes. All Rights Reserved.

This thesis titled

Assessing Challenge as a Motivator to Use a Retrieval Practice Study Strategy

by

KYLE A. BAYES

has been approved for

the Department of Psychology

and the College of Arts and Sciences by

Jeffrey B. Vancouver

Professor of Psychology

Robert Frank

Dean, College of Arts and Sciences


Abstract

BAYES, KYLE A., M.S., August 2017, Psychology

Assessing Challenge as a Motivator to Use a Retrieval Practice Study Strategy

Director of Thesis: Jeffrey B. Vancouver

Individuals often do not use effective strategies when studying or learning material on their own. For example, retrieval practice, which is the effortful recall of information previously reviewed, is an effective but underutilized strategy. One potential explanation for the low usage of retrieval practice is a lack of motivation to use it. The current research explores whether the incorporation of challenge, which can be a strong motivator, will increase individuals’ use of retrieval practice. Specifically, self-regulated practice was assessed over one week as individuals prepared for graduate school exams (e.g., GMAT or GRE). During the week, individuals were provided math questions of various difficulties as a method for studying and preparing for the exams. Participants were either presented the items (a) randomly, (b) ordered from least to most difficult and allowed to choose the question, or (c) with increasingly challenging questions reflective of a participant’s ability level. In general, the results provided some support for the hypothesis that challenge is an effective method for motivating the use of retrieval practice. Implications for the design of training protocols, motivating individuals to learn effectively, and the generalizability of the challenge approach are discussed.

Table of Contents

Page

Abstract ...... 3
List of Tables ...... 6
List of Figures ...... 7
Assessing Challenge as a Motivator to Use Retrieval Practice Study Strategy ...... 8
The Benefits of Successful Learning Strategies ...... 11
Effective Learning Strategies Performance Effects ...... 12
Generalizability of Effective Learning Strategies ...... 14
Retrieval Practice ...... 15
Ways in Which Challenge Motivates ...... 19
Flow Theory ...... 20
Achievement Motivation ...... 21
Self-Determination Theory ...... 22
Adaptive Testing and Retrieval Practice ...... 24
The Present Study ...... 26
Methods ...... 27
Participants ...... 27
Measures/Manipulations ...... 28
Challenge manipulation ...... 28
Performance ...... 30
Retrieval practice ...... 30
Time on questions ...... 30
Motivation ...... 31
Felt challenge manipulation check ...... 31
Demographics ...... 32
Procedure ...... 32
Results ...... 34
Descriptives ...... 34
Manipulation Check ...... 40
Hypothesis Tests ...... 41
Supplementary Analyses ...... 45
Discussion ...... 49
Theoretical Implications ...... 49
Practical Implications ...... 50
Limitations and Future Directions ...... 52
Conclusion ...... 57
References ...... 58
Appendix A: First Contact Email and Forum Post ...... 68
Appendix B: Participant’s Study View ...... 70
Appendix C: Measures ...... 83
Appendix D: Correlation Matrix for Felt Challenge ...... 85
Appendix E: Additional Felt Challenge Measures ...... 86
Appendix F: Analyses Changes after Deletion of Suspicious Retrieval Practice Instances ...... 87


List of Tables

Page

Table 1: Correlation table of main variables...... 36


List of Figures

Page

Figure 1: Screenshot of the level up message individuals received every time they leveled up...... 30

Figure 2: Frequency distribution of the number of questions completed per group in bins of five ...... 38

Figure 3: Frequency distribution of retrieval practice by group for questions 1-10 ...... 39

Figure 4: Frequency distribution for the average amount of time spent on questions ....39

Figure 5: Frequency distribution for average amount of time spent on questions less than or equal to a minute...... 40

Figure 6: Median number of questions answered by condition ...... 41

Figure 7: Mean self-reported motivation by group. Standard error bars are +/- 2...... 42

Figure 8: Pre and posttest score by condition. Standard error bars are +/- 2 ...... 44

Figure 9: Interaction between condition and pretest score on felt challenge ...... 46


Assessing Challenge as a Motivator to Use Retrieval Practice Study Strategy

With the ever-increasing demands of a competitive global economy, a knowledgeable, productive workforce that is continually learning and developing is crucial (Salas, Tannenbaum, Kraiger, & Smith-Jentsch, 2012). U.S. organizations spent an estimated $70 billion on formal training programs in 2013 in an effort to mold productive employees (Bersin, 2014). Given this large investment in training and organizational learning, it behooves I-O psychology to incorporate efficient and effective learning strategies into training. Toward this end, I-O psychologists seek to incorporate advancements in human learning, pioneered by educational and cognitive psychology, into training (Koppes, 2008). Indeed, due to advancements in educational and cognitive psychology, a vast store of knowledge on how individuals acquire, retain, and transfer knowledge is readily available (Healy, Kole, & Bourne, 2014). For example, I-O psychology knows that (a) spaced practice is better than massed practice (Cepeda et al., 2006), (b) conditions in which learning takes place should vary (Smith, Glenberg, & Bjork, 1978; Smith & Rothkopf, 1984), and (c) it is better to use retrieval practice (i.e., the effortful recall of material) than to re-read material as a means of studying (Roediger & Butler, 2011).

However, individuals often do not engage in effective learning strategies and practice methods (Karpicke, Butler, & Roediger, 2009; Metcalfe & Kornell, 2005; Bjork, Dunlosky, & Kornell, 2013). For example, retrieval practice, the recall of information by testing oneself on material that has previously been reviewed (e.g., self-testing), is a highly effective learning strategy that individuals often do not use (Karpicke & Roediger, 2009; Karpicke, Butler, & Roediger, 2009). The common explanation for the low use of retrieval practice is a lack of awareness of its benefits (Karpicke & Roediger, 2009). However, individuals shun retrieval even when aware of its high utility. For example, Roediger (2015) noted that “students have taken our learning and [memory] courses and learning about retrieval practice doesn’t seem to change their studying strategies” (personal communication, September 19, 2015). Furthermore, recent research implies that even when individuals use self-testing (i.e., retrieval practice) they do not persist with it long (i.e., on average three correct retrievals; Dunlosky & Rawson, 2015). The lack of persistence with retrieval practice and the apathy towards changing study strategies may be due to waning motivation (Dunlosky & Rawson, 2015). Consequently, the present study explores potential mechanisms that may motivate individuals to engage and persist in retrieval practice given its known effectiveness on individuals’ learning.

Fortunately, psychology is replete with theories of motivation that provide different ways to encourage the use of effective learning strategies (Colquitt, LePine, & Noe, 2000; Kanfer, 1990). One particular motivational strategy may be especially useful in the case of retrieval practice. Specifically, numerous motivational theories suggest that challenging an individual results in increased motivation (Atkinson, 1957; Csikszentmihalyi, 1990; Deci & Ryan, 1985; Locke, 1968), and given that retrieval practice is testing oneself, it would be relatively simple to implement challenge in this “testing.” Consequently, I examined whether challenge motivates use of retrieval practice by exposing individuals to questions calibrated to their ability level. Matching difficulty and ability level may set the conditions for the type of challenge that humans appear to inherently seek (Atkinson, 1957; Csikszentmihalyi, 1990; Deci & Ryan, 1985).

The following paragraphs explain why successful learning strategies should be incorporated into training and why challenge can increase the usage of retrieval practice. First, I review effective learning strategies by presenting evidence on their overall effectiveness and generalizability. Based on this review, I focus on retrieval practice as an example of a highly effective learning strategy that could be encouraged in training and briefly discuss why individuals do not tend to use it. Next, I discuss challenge as an aspect of motivation that is found in several motivational theories that speak to content (i.e., what humans find motivating). Finally, I describe a study protocol that uses adaptive testing principles to introduce challenge into a retrieval practice task.


The Benefits of Successful Learning Strategies

Previous research indicates that incorporating knowledge on human learning into training is a worthwhile endeavor. For example, both error management training and behavioral modeling training are successful training strategies based on advancements in the understanding of human learning (Taylor, Russ-Eft, & Chan, 2005; Keith & Frese, 2008). Error management training is based on the knowledge that making errors during training is not necessarily counterproductive and can actually aid the learning process (Keith & Frese, 2008). Furthermore, behavioral modeling training involves trainees observing a model performing the desired behavior and then practicing the behavior themselves (Taylor et al., 2005). Meta-analyses of error management training and behavioral modeling training report average effect sizes (d) of .44 and .92, respectively, when compared to traditional training methods (Burke & Day, 1986; Keith & Frese, 2008).

Thus, error management training and behavioral modeling training provide two examples of how using empirically validated training protocols can increase the effectiveness of training. However, additional advances can be made by incorporating various effective learning strategies into training. Specifically, the following section focuses on the five learning strategies deemed to be of high or moderate utility (Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013). The utility ratings are based on the strategies’ performance effects (i.e., performance on a variety of criterion tasks such as memory and comprehension tasks) and their generalizability across individuals (e.g., age, prior knowledge), materials (e.g., types of knowledge/contents), and criterion tasks (e.g., different types of learning). Below I cover the effective learning strategies and their respective performance effects and generalizability.

Effective Learning Strategies Performance Effects

Of the learning strategies examined by Dunlosky et al. (2013), five were deemed to be of moderate to high utility due to their significant performance effects (i.e., increasing the proficiency of the task being tested) and wide generalizability. These strategies are interleaved practice, elaborative interrogation, self-explanation, distributed (spaced) practice, and retrieval practice. Interleaved practice is the notion that an individual should intersperse training for one skill (e.g., Microsoft Word) with training on another skill (e.g., Excel). Elaborative interrogation involves asking an individual to explain why an explicitly stated fact is true. For example, “Why is this true?” or “Why would this fact be true of [X] and not of [Y]?” Self-explanation is simply detailing one’s problem-solving process. For example, asking a delivery driver why they chose to drive route X instead of route Y. Distributed (spaced) practice is spreading practice out over time. For example, distributed practice would entail studying for a test on three consecutive nights as opposed to only the night before the exam. Lastly, retrieval practice is the act of recalling information rather than rereading it or hearing it. Self-testing with practice problems or flashcards are two common examples of retrieval practice. All five learning strategies provide avenues to enhance the effectiveness of training in part due to their powerful performance effects.

For example, in one study on interleaved practice, students were tasked with determining the volume of four different geometric solids and were split into either an interleaved practice or a blocked practice group (Rohrer & Taylor, 2007). Blocked practice is the mirror of interleaved practice in that all of the study material on one subject is grouped into one section rather than mixed with study material from different subjects. Both groups had two practice sessions spaced one week apart followed by a final test a week after the final practice session. Accuracy in the interleaved practice condition was 43% higher than in the blocked condition (p < .05). Similar performance differences were also found for elaborative interrogation (i.e., d = .85), self-explanation (i.e., d = .64), and distributed practice (i.e., d = .298 to .543; Bahrick, 1979; Berry, 1983; Janiszewski, Noel, & Sawyer, 2003; Pressley, McDaniel, Turnure, Wood, & Ahmad, 1987; Seifert, 1993). In the case of retrieval practice, Roediger and Marsh (2005a) found that a retrieval practice group performed 23% better on a test of nonfiction material than those who just re-read the material. Given the performance boost gained through retrieval practice, as well as the other learning strategies, it is clear that organizations or individuals might benefit from integrating these learning techniques into employee training programs. That being said, a learning technique may provide additional utility to organizations if it creates generalizable skills (Salas et al., 2012). That is, learning techniques that provide skills that generalize over a variety of learning contexts (i.e., educational and workplace settings), individual characteristics (i.e., old vs. young or low ability vs. high ability), and different criterion tasks (i.e., memory-based tasks vs. problem-solving tasks) may prove more effective.

Generalizability of Effective Learning Strategies

According to Dunlosky et al. (2013), retrieval practice had the widest generalizability of the various learning strategies. Specifically, retrieval practice was the only learning strategy of the ten reviewed by Dunlosky et al. (2013) that received positive marks for generalizing well across criterion tasks. Elaborative interrogation, self-explanation, interleaved practice, and distributed practice all had insufficient or conflicting evidence supporting their generalizability across criterion tasks (Dornisch & Sperling, 2006; Schneider et al., 1998). For example, distributed practice research mainly uses recall tasks (i.e., tasks that require exact repetition of already learned material), preventing conclusions on its generalizability outside lower-order thinking skills (e.g., recognition and recall; Anderson, Krathwohl, & Bloom, 2001). On the other hand, retrieval practice effects are present for both lower-order and higher-order thinking skills (e.g., recall, understand, apply, and analyze), such as fill-in-the-blank and inference-based short answer questions (e.g., Fazio, Agarwal, Marsh, & Roediger, 2010). In addition, retrieval practice showed good generalizability across different content, such as vocabulary (Pyc & Rawson, 2009), math (Rickard, Lau, & Pashler, 2008; Atkinson & Paulson, 1972), trivia (Butler, Roediger, & Karpicke, 2008), and numerous other materials (Dunlosky et al., 2013).

Furthermore, all five learning strategies generalized over certain aspects of individuals. For example, all five learning strategies demonstrated benefits across a variety of ages (Bahrick, 1979; Balota, Duchek, & Paullin, 1989; Balota, Duchek, Sergent-Marshall, & Roediger, 2006; Roediger & Marsh, 2005; Dunlosky et al., 2013; Toppino, 1991; Fritz, Morris, Nolan, & Singleton, 2007). In addition, self-explanation, elaborative interrogation, and retrieval practice showed some generalizability over different levels of knowledge (Balota et al., 2006; Carroll, Campbell-Ratcliffe, Murnane, & Perfect, 2007; Chi, de Leeuw, Chiu, & LaVancher, 1994; Woloshyn, Pressley, & Schneider, 1992). That is, individuals with high or low knowledge on a subject can benefit from self-explanation, elaborative interrogation, and retrieval practice.

Based on both performance effects and generalizability, many of these learning strategies present useful ways to increase training effectiveness. However, retrieval practice might present the most practical way to incorporate learning strategies into training given that it is the most generalizable of the learning strategies covered (Dunlosky et al., 2013). Furthermore, it does not take many resources to incorporate retrieval practice into formal or self-regulated training. For example, retrieval practice could be a part of mandatory safety training or it could be something encouraged in the workplace (e.g., a firm encourages lawyers to spend an hour a day reviewing case law).

Retrieval Practice

Retrieval practice is the act of recalling information from one’s mind rather than rereading it or hearing it, which produces a meta-cognitive effort that allows for better retention (Roediger & Butler, 2011). Thus, not only does retrieval practice demonstrate good generalizability and improvements in performance, it also enhances retention. These enhancements are discussed below.

In a recent study, Karpicke and Blunt (2011) demonstrated that retrieval practice affects near transfer (i.e., transfer to a similar context). For example, individuals were assigned to either a study condition (i.e., studied text once), a repeated study condition (i.e., studied text multiple times), a concept mapping condition (i.e., studied once and then created a model of the constructs and their relationships), or a retrieval practice condition. Across conditions, the participants studied information from a science text and took a test to assess their knowledge. No significant differences were found on the knowledge test. One week after the initial test, participants were assessed again with both verbatim questions (i.e., questions that are exactly the same as questions in the practice test) and inferential questions (i.e., questions that required amalgamation of multiple concepts from the text). Participants in the retrieval practice condition performed at least 25% and 12% better on the verbatim and inference-based questions, respectively. Further, when collapsed across question type, retrieval practice increased retention by around 50% (d = 1.5). These effects were all significant at the .05 level.

Retrieval practice has also been shown to positively affect far transfer (i.e., generalizability to dissimilar contexts), as demonstrated by Butler (2010). In Butler’s study, students studied six prose passages, each containing an important concept. The study consisted of three groups: participants either repeatedly restudied the critical concepts from two of the passages, repeatedly studied isolated sentences that contained the critical concepts from a different two passages, or practiced retrieval of the critical concepts from the remaining two passages. After one week, trainees took a final test that required far transfer to succeed. The group that engaged in retrieval practice answered about 20% more of the questions correctly (d = .99) than those in the other two groups. The ability to transfer to both similar and dissimilar contexts speaks to retrieval practice’s potential usefulness as a training mechanism. Further, when high transferability is combined with the significant increases in both retention and performance, a practical training technique emerges.

Given the promising findings regarding retrieval practice, researchers have begun to explore ways to best maximize its benefits. For example, when retrieval practice is paired with feedback, performance can increase 20% beyond the retrieval-practice effect (Butler, Karpicke, & Roediger, 2008). Furthermore, not only does feedback increase retention of incorrect answers, but it also increases retention of correct answers (Butler, Karpicke, & Roediger, 2008). That is, individuals who answer correctly receive the metacognitive benefits of more accurate confidence judgments when presented with feedback. Thus, a reminder of their success (e.g., presentation of the correct answer) helps individuals retain information.

Given retrieval practice’s substantial performance and generalizability benefits, one would assume that retrieval practice is a common method employed to increase learning. However, recent studies found that individuals do not typically use retrieval practice and that most training programs in organizations are demonstrations (i.e., lectures, video instructions) that do not require active recall (i.e., stimulating memory to remember information; Patel, 2010). For example, a study of student study habits revealed that few used retrieval practice (11%) or indicated that they would use it to study (18%), and only 8% of the students indicated that they would choose to use retrieval practice because they believed it would help them do well on the upcoming exam (Karpicke & Roediger, 2009). This evidence is used to support the most common explanation for the lack of retrieval practice use: students do not use retrieval practice because they lack awareness of its benefits.

However, the lack of awareness hypothesis has been challenged. For example, a study found that 68% of the participants understood the metacognitive benefits of self-testing (Kornell & Bjork, 2007). Furthermore, an experiment in which individuals chose either to engage in retrieval practice by testing themselves, to study, or to judge the quality of previously recalled responses found that individuals often chose to self-test (Dunlosky & Rawson, 2015). These results provide evidence that individuals do in fact use retrieval practice. In addition, the same study found that individuals engaged in retrieval practice, but not long enough to reap the maximum benefits (i.e., in this case, recalling definitions correctly multiple times). To explain this, the authors proposed that the individuals were not motivated enough and suggested that future research explore effective ways of fostering motivation. This conclusion is supported by Roediger, who suggested that motivation may be the reason individuals do not change their study habits when told about the benefits of retrieval practice (personal communication, September 19, 2015). Thus, it is possible that finding a way to motivate the correct use of retrieval practice is critical to accessing its benefits. To explore this notion, the following section reviews the motivational nature of challenge as a way to enhance the usage of retrieval practice.


Ways in Which Challenge Motivates

Individuals are motivated by a wide array of internal and external factors generally referred to as intrinsic and extrinsic motivators (Cerasoli, Nicklin, & Ford, 2014; Deci & Ryan, 1985). Indeed, many different motivational theories have emerged to help explain the vast array of potential motivators. Several of these theories, such as flow (Csikszentmihalyi, 1990), achievement motivation (Atkinson, 1957), and self-determination theory (SDT; Deci & Ryan, 1985), contain the notion that challenge intrinsically motivates individuals to direct and maintain resources toward challenging activities. For instance, previous research on gamification (i.e., enhancing an aspect of a service or task by using various game-based mechanisms such as levels) demonstrates that challenge motivates effort in practical contexts such as education, organizations, and health care (Hamari, Koivisto, & Sarsa, 2014).

Two of the primary methods for gamifying an experience are the inclusion of challenge and the implementation of levels (Hamari et al., 2014). For example, one study used progressive levels of energy saving activities (i.e., energy saving became harder every level) and feedback on these activities (e.g., how much energy was saved) to motivate families to reduce energy consumption in their homes (Gustafsson, Katzeff, & Bang, 2009). Energy consumption dropped as much as 31% in some cases. Likewise, another study used challenge to increase the quality of crowdsourcing (Eickhoff, Harris, de Vries, & Srinivasan, 2012), which is the use of large samples of individuals to help generate more accurate data (Le, Edmonds, Hester, & Biewald, 2010). Specifically, the authors created a categorization game consisting of ten rounds in which individuals had to relate a target concept to one of four categories. Between each round, the time allotted to connect a target concept to a category was decreased, which increased the challenge. After the experiment, the authors measured the quality of the data by examining the percentage of agreement between individual responses and the gold standard answer. This gamified version of crowdsourcing produced higher quality responses than non-gamified crowdsourcing, providing some evidence that challenge affects motivation: those who produced higher quality responses may have done so because the challenge motivated them to produce better responses. However, the authors did not directly measure motivation, making this interpretation of the results suspect. Despite not directly addressing the issue, research on gamification provides evidence for the potential practical value of implementing challenge. Below I provide theoretical support for the motivating nature of challenge from research stemming from three theories incorporating challenge: flow, achievement motivation, and self-determination theories.

Flow Theory

The notion that challenge motivates is readily apparent in Csikszentmihalyi’s (1990) theory of flow. The flow process is found in contexts such as art (Csikszentmihalyi & Robinson, 1990), sports (Jackson, 1995), the work world (Debus, Sonnentag, Deutsch, & Nussbeck, 2014), and everyday life (Csikszentmihalyi & LeFevre, 1989). Flow is defined as a subjective feeling of total immersion within an activity during which individuals do not notice the passing of time or the rise of fatigue. In addition, flow leads to feelings of happiness and helps individuals avoid feelings of boredom and anxiety (Csikszentmihalyi & LeFevre, 1989; Csikszentmihalyi, 2014; Csikszentmihalyi & Nakamura, 1989). Thus, individuals are motivated to achieve and maintain flow because it helps them stay in a state of happiness. To achieve flow, an individual must have (a) a close match between their ability and the difficulty of the task they are completing, (b) clear goals, and (c) feedback.

In one study designed to assess the effects of flow, participants were surveyed 7-8 times per day for a week. The surveys asked for a report on their happiness level, the challenge level of the current activity, and their skill level on the current activity (Csikszentmihalyi, 2014). Their responses fell into one of eight categories, including a flow category defined by a match between challenges and skills (i.e., high skills and high challenge). All respondents reported the highest levels of happiness in the flow category. In sum, the theory of flow suggests that when individuals are challenged appropriately (i.e., a match between skill level and task difficulty) they are in a flow state, which engenders happiness.

Achievement Motivation

Another theory that supports the notion that challenge motivates is achievement motivation theory (Atkinson, 1957). Atkinson’s (1957) achievement motivation research views motivation somewhat differently than flow theory. Atkinson believed that an individual’s propensity to approach a task was determined by (a) the individual’s motive to achieve success, (b) the individual’s motive to avoid failure, (c) the perceived probability of success, and (d) the value of success on the task. In addition, the value of success is positively correlated with task difficulty. Thus, two motivational predispositions exist that help determine an individual’s motivation: the motive to achieve success and the motive to avoid failure. If an individual has a higher motive for success than motive to avoid failure, then the individual will be more motivated by tasks of intermediate difficulty (i.e., tasks that are challenging but not too challenging). Conversely, individuals who fall higher on the motive to avoid failure choose the hardest or easiest task as a way to avoid feelings of failure (on high-difficulty tasks they are not expected to succeed). However, when choice of task is constrained, both groups are expected to be the most motivated to perform well under conditions of moderate challenge. For example, in a study on female undergraduates, participants were presented with two 20-minute tasks, were told that a small monetary incentive would be provided for good performance, and were told the probability of success. The probability of success varied based on the number of monetary prizes given and the number of individuals the participant was competing against. Regardless of whether individuals scored higher on the motive for success or the motive to avoid failure, they performed best when the probability of success was 50% (moderate challenge; Atkinson, 1958). Thus, Atkinson’s conceptualization of need achievement states that individuals are motivated by a moderate level of challenge if their choice of task is constrained.

Self-Determination Theory

SDT is yet another theory that contains the notion that challenge is a motivator. SDT is built on the assumption that humans are naturally active, development-oriented organisms that seek mastery over their environment (Deci & Ryan, 1985; Ryan & Deci, 2002). That is, individuals are intrinsically motivated to satisfy their basic psychological needs to more fully develop themselves (Deci & Ryan, 1985; Deci & Ryan, 2002). According to SDT, the three basic psychological needs are relatedness, autonomy, and competence. These needs are viewed as necessary nutrients for survival, and humans must constantly seek these nutrients to preserve and enhance their functioning (Deci & Ryan, 2002). Consequently, SDT assumes that the need for competence motivates individuals to seek out challenges that are on par with their abilities (Deci & Ryan, 1975). By meeting these challenges, individuals receive evidence that their competence is increasing, whereas meeting trivial challenges provides little feedback about one’s competence (Eggen & Kauchak, 2004). Thus, it is not only that challenge is present, but that the level of challenge is properly matched with one’s abilities, that engenders the feeling of competence. Support for this inherent need to find optimally challenging tasks has been found in studies with children. For example, when allowed to choose activities on which to work, children chose the activities that were slightly above their current level of competence (Danner & Lonky, 1981). This need for competence not only drives individuals to seek out optimally challenging tasks, but also motivates individuals to constantly enhance skills through continued practice (Deci & Ryan, 2002). Thus, SDT, like flow theory and achievement motivation, asserts that challenge, and specifically optimal challenge, motivates.

Thus, flow theory, achievement motivation theory, and SDT all contain the notion that a match between challenge and ability can potentially increase motivation. One practical method for providing the correct amount of challenge is the use of adaptive testing principles (Wise, 2005). Below, I detail what adaptive testing is and how it can be used to challenge individuals during retrieval practice.


Adaptive Testing and Retrieval Practice

Adaptive tests are designed to precisely determine the capability of the test-taker by matching questions to the individual’s skill level (Weiss, 1985). To do this, adaptive tests continually estimate an individual’s ability (denoted theta) and present questions that closely match that estimated level to further confirm the estimate. If the test-taker answers the question correctly (incorrectly), the adaptive test upwardly (downwardly) revises the theta estimate and then presents a new question that closely matches the revised estimate.
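To make this selection loop concrete, below is a minimal sketch assuming a three-parameter logistic (3PL) response model and a simplified fixed-step theta update (operational adaptive tests typically use maximum-likelihood or Bayesian updates); the item bank values are purely illustrative.

```python
import math

def p_correct(theta, a, b, c):
    """3PL model: probability of a correct response at ability theta, given
    discrimination a, difficulty b, and guessing parameter c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def adaptive_step(theta, bank, answered_correctly, step=0.5):
    """Revise theta up (down) after a correct (incorrect) answer, then pick
    the item whose difficulty is closest to the revised estimate."""
    theta += step if answered_correctly else -step
    next_item = min(bank, key=lambda item: abs(item["b"] - theta))
    return theta, next_item

# Illustrative item bank spanning easy (b = -2) to hard (b = 2) items.
bank = [{"a": 1.2, "b": b / 2.0, "c": 0.2} for b in range(-4, 5)]
theta, item = adaptive_step(0.0, bank, answered_correctly=True)
print(round(theta, 2), item["b"], round(p_correct(theta, **item), 2))
```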

The unintended consequence of this process is that the items provide a constant challenge for test-takers because they are often answering questions near their actual ability level, and this constant challenge has been shown to be motivating. For example, one early study on adaptive testing found that low-ability college students reported significantly higher motivation for adaptive tests compared to fixed-item tests (Betz & Weiss, 1976a). In a similar study, a group of high school students took both an adaptive test and a conventional test, with a questionnaire on motivation following each test (Pine, Church, Gialluca, & Weiss, 1979). Participants self-reported higher levels of motivation when they were taking the adaptive test than when they were taking the conventional test (e.g., did you care about how well you did on the test). Another positive result comes from an adaptive test version of the Armed Services Vocational Aptitude Battery (ASVAB). Military recruits who took a computerized adaptive test version of the ASVAB reported significantly higher motivation than those who took the fixed-item version (Arvey, Strickland, Drauden, & Martin, 1990). An exception to the positive effect for adaptive testing came from an experiment that found test performance for high-anxiety students was negatively affected by taking the adaptive version of the test (Ortner & Caspers, 2011). However, students were not told how adaptive testing works, which might have been the source of these adverse effects (Wise, 2014).


The Present Study

This study sought to address multiple issues. Foremost, the present study sought to explore whether challenge would motivate the use of retrieval practice. To do this, I created a challenge condition (i.e., the manipulation condition) in which individuals were provided with a level of challenge appropriate for their ability, and two control conditions, and compared the amount of retrieval practice between groups. The first control condition was a random difficulty condition in which participants were presented questions randomly. Given that individuals have control over the difficulty of their practice in real-world self-regulated learning contexts, it seemed prudent to create a control to model this situation. Thus, a self-chosen difficulty condition was formed to act as a more realistic control. In the self-chosen difficulty condition, individuals could pick the difficulty of the questions they pursued. However, due to a programming error, the amount of retrieval practice and the amount of time spent on the questions could not be explored for the self-chosen difficulty group. I also measured participants’ self-reported motivation, and whether improvement in ability would vary between groups.

Given this experimental design, I hypothesized the following:

Hypothesis 1: Participants will engage in more retrieval practice (i.e., attempt to answer more questions) in the challenge condition than in the two control conditions.

Hypothesis 2: Participants will report higher motivation in the challenge condition than in the two control conditions.

Hypothesis 3: Participants’ performance will improve more in the challenge condition than in the two control conditions.

Methods

Participants

To recruit participants, I used two methods. First, for each state, a search for “Universities in [state]” (e.g., “Universities in Idaho”) was entered into an internet search engine. The first twenty colleges returned for each state were searched for various clubs (e.g., honors psychology club, honors math club, or clubs identified as academic in nature) that might have individuals interested in taking the GRE or GMAT. The clubs were sent an email with general study instructions that contained a link to the consent form and pretest (see Appendix A for the email). Second, the same email was posted on forums that might have interested individuals (e.g., gmatclub.com). In addition to the general instructions, the email provided instructions regarding the compensation. That is, the email instructed individuals that if they completed the posttest they would receive a $10 Amazon gift card and prep material from VERITAS PREP (an academic testing company). Two hundred and forty-eight individuals responded to recruitment efforts and clicked on the link that took them to the pretest. Two hundred and sixteen of those 248 individuals completed the pretest. One hundred and seventy individuals completed the pretest, posttest, self-reported motivation, and felt challenge instruments across all conditions. Eighty-four individuals were in the random difficulty condition, 86 individuals were in the self-chosen difficulty condition, and 78 were in the challenge condition. One hundred percent of the individuals in the random difficulty and challenge conditions completed the pretest and questions during the practice week. However, only 62.80% of individuals in the self-chosen difficulty condition completed the pretest, and the number of individuals who completed questions during the practice week is unknown due to a programming error. In regard to the posttest, 70.23%, 62.80%, and 73.10% of individuals in the random difficulty, self-chosen difficulty, and challenge conditions completed the posttest, respectively. An a priori ANCOVA power analysis determined that 159 and 128 participants were needed to detect a medium effect when power equals .80 and alpha equals .05 (Cohen, 1992) for three and two groups, respectively. Thus, all tests were sufficiently powered.
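For reference, a power analysis of this kind can be approximated with standard software. The sketch below uses statsmodels and assumes Cohen's medium effect size for ANOVA-family designs (f = .25); exact Ns can differ slightly across programs.

```python
from statsmodels.stats.power import FTestAnovaPower

# Total sample size needed to detect a medium effect (f = .25) at
# power = .80 and alpha = .05, for three- and two-group comparisons.
power_analysis = FTestAnovaPower()
for k in (3, 2):
    n_total = power_analysis.solve_power(effect_size=0.25, alpha=0.05,
                                         power=0.80, k_groups=k)
    print(k, "groups: total N =", round(n_total))
```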

Measures/Manipulations

Challenge manipulation. Participants were randomly assigned to one of three conditions and provided instructions for what to expect in each condition. Participants in all three conditions were provided with up to 177 math questions that would help them prepare for graduate school exams (e.g., GMAT or GRE). The items were discontinued practice items provided by VERITAS PREP, a graduate testing preparation firm. Because the items had been used extensively by VERITAS PREP, the firm was able to provide difficulty level estimates for each item based on fitting a three-parameter logistic (3PL) IRT model to each item. These difficulty levels were used in operationalizing the conditions.

Everyone in the challenge condition received questions that became calibrated to their ability level over time. This was accomplished by splitting the items into four sets based on their difficulty levels. All individuals in the challenge condition were initially given items in the lowest difficulty set. Thus, all individuals started with the exact same questions and answered them in the same order. However, after correctly answering the equivalent of four items of the average difficulty level for that set, individuals could move up in difficulty, and thus not all participants answered questions in the same order throughout. The average difficulty level for each set was determined by transforming the item difficulties within a set to z-scores and then shifting them up by the magnitude of the lowest z-score in the set. For example, the easiest question in set two had a z-score of -1.77, so I added 1.77 to each question’s z-score in the set. This shifted the average value of the set from zero to 1.77 and set the threshold, based on the sum of four correct items, to 7.08 (i.e., 1.77 × 4 = 7.08). Once the summed difficulty of the items a participant answered correctly within a set exceeded the set’s threshold, the participant moved to the next, more difficult set. Via this procedure, more capable participants could move to more difficult items relatively quickly. Of note, participants who completed every question in a level were moved up regardless of whether they got the questions right or not. However, this was deemed unlikely to affect the results because there was no meaningful incentive to click through the questions; the money was not tied to the practice week questions. Individuals in the challenge condition were told what difficulty level they were in each time they achieved a new level (see Figure 1).
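A minimal sketch of this leveling rule follows, using illustrative difficulty values rather than the actual VERITAS PREP estimates: difficulties within a set are z-scored, shifted so the easiest item sits at zero, and a participant is promoted once the summed shifted difficulty of correctly answered items exceeds four times the set's shifted mean.

```python
import statistics

def shifted_difficulties(difficulties):
    """Z-score the item difficulties within a set and shift them up by the
    magnitude of the lowest z-score, so the easiest item scores zero."""
    mean = statistics.mean(difficulties)
    sd = statistics.stdev(difficulties)
    z = [(d - mean) / sd for d in difficulties]
    return [zi - min(z) for zi in z]

def level_threshold(shifted):
    """Four correct answers of average (shifted) difficulty trigger
    promotion to the next set."""
    return 4 * statistics.mean(shifted)

# Illustrative set of five item difficulties (not the actual estimates).
shifted = shifted_difficulties([-1.3, -0.6, 0.1, 0.7, 1.4])
threshold = level_threshold(shifted)

# Sum the shifted difficulties of correctly answered items; promote once
# the running total exceeds the threshold.
correct_so_far = [shifted[1], shifted[2], shifted[3], shifted[4]]
print(sum(correct_so_far) > threshold)
```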

Participants in the random difficulty condition received questions in a random order via a random number generator (see Appendix B).

Individuals in the third condition, the self-chosen difficulty condition, chose the difficulty of the items they worked on by selecting the question they wanted from a list of questions ordered from easiest to most difficult (see Appendix B). Participants were aware that the list was ordered by difficulty, but were not aware of the exact difficulty of any question.


Figure 1. Screenshot of the level up message individuals received every time they leveled up.

Performance. Two 10-item tests provided the pre- and posttest performance scores (i.e., percentage of correct answers) for each individual. Each test was composed of 10 VERITAS PREP math items of varying difficulty reflective of the overall sample of questions. That is, two or three items were taken from each difficulty level used in the challenge condition for each test. The pretest and posttest had reliabilities of α = .71 and α = .63, respectively.

Retrieval practice. I used the number of questions answered during the practice week as a measure of retrieval practice.

Time on questions. Total time on questions was measured by summing the amount of time spent on each question and its solution for each participant. I also assessed the average time spent per question per participant. In addition, percent of solutions viewed was measured as the percentage of times participants indicated yes to the “would you like to see the solution” question. Thus, if an individual answered three questions and chose to see the solution for two, the percentage would be 67%. One issue to note: there were no time data available for the self-chosen difficulty condition due to the programming error. Further, there was one difference between the random difficulty condition and the challenge condition in terms of what was included in the time on questions. That is, the question in which participants were asked “would you like to see the solution” (see Appendix B) was not timed in the challenge condition. To estimate the impact of this error, the question was timed over 60 trials; the average time it took to answer this question was 3 seconds, implying that missing this measure created a small error. Moreover, given the hypothesis, it also created a conservative error.

Motivation. Given that I needed a measure of motivation for a specific task, I created one by consulting Dr. Vancouver, an expert on motivation. The measure asked participants about their motivation at the end of the study. Items assessed motivation on a 7-point agree-disagree format (i.e., 1 = strongly disagree to 7 = strongly agree). Example items include “I was motivated to complete practice problems” and “my motivation decreased the further the practice week progressed” (reverse coded). The full measure can be found in Appendix C. A reliability analysis on the self-reported motivation measure (SRM) indicated that the internal reliability was good (α = .80). Based on the reliability results, I created a composite of the six items and used this composite for all analyses involving SRM.

Felt challenge manipulation check. A researcher-developed scale measured felt challenge by asking participants about the amount of challenge they experienced during the practice week. The felt challenge scale was developed in consultation with the same expert on motivation mentioned above. Per standard scale development practices, more items than might ultimately be used were developed because the psychometric quality of the items was unknown. Items assessed felt challenge on a 7-point Likert-type scale (i.e., 1 = strongly disagree to 7 = strongly agree). Example items include “I felt challenged during the practice week” and “The challenge seemed consistent throughout the week.” An inter-item correlation matrix of the felt challenge items can be seen in Appendix D. A review of the corrected item-total correlations necessitated dropping one item (i.e., item #5) given that the item did not correlate well with the other items in the scale. Thus, a 5-item felt challenge scale was created without item #5. After dropping this item, the reliability was α = .57, which is relatively low.
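For reference, the screening described here (corrected item-total correlations plus Cronbach's alpha) can be computed as sketched below; the data frame, sample size, and item names are hypothetical.

```python
import numpy as np
import pandas as pd

def scale_diagnostics(items: pd.DataFrame):
    """Cronbach's alpha and each item's corrected item-total correlation
    (the item against the sum of the remaining items)."""
    k = items.shape[1]
    alpha = (k / (k - 1)) * (1 - items.var(ddof=1).sum()
                             / items.sum(axis=1).var(ddof=1))
    corrected = {c: items[c].corr(items.drop(columns=c).sum(axis=1))
                 for c in items.columns}
    return alpha, corrected

# Hypothetical responses to six 7-point felt challenge items.
rng = np.random.default_rng(42)
responses = pd.DataFrame(rng.integers(1, 8, size=(60, 6)),
                         columns=[f"item{i}" for i in range(1, 7)])
alpha, corrected = scale_diagnostics(responses)
print(round(alpha, 2), {k: round(v, 2) for k, v in corrected.items()})
```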

Demographics. A short questionnaire measured age, gender, year in college, racial and ethnic background, most recent math class, and interest in taking a graduate school exam.

Procedure

The entire study was conducted online using Qualtrics survey software, which could be reached via a link in the emails and forum posts used to recruit the participants. The link took participants directly to an online consent form that they were asked to read and indicate their consent to participate in the study. The consent form detailed general study instructions and an inducement to complete the posttest (i.e., a $10 Amazon gift card). After consenting, participants took the pretest and provided their email address. Participants were then randomly assigned to one of the three conditions and emailed instructions relevant to the assigned condition (see Appendix B for instructions by condition). During the practice week, participants were presented one question at a time in multiple-choice format. After each question they were asked if they would like to see the solution. If the participant said “yes” they were presented with the solution, and if they said “no” they moved on to the next question. When the week concluded, participants were emailed a link that included the posttest measure, self-reported motivation measure, felt challenge measure, and demographics. Participants received a $10 Amazon gift card via email upon completion of the posttest.


Results

Descriptives

The means, standard deviations, and intercorrelations of the major variables are provided in Table 1. Furthermore, Figure 2 provides frequency distributions of the amount of retrieval practice. To further break down the first bin in Figure 2, Figure 3 provides the frequency distributions for this first bin at the item level. Figure 4 provides a frequency distribution for the average amount of time on questions by condition. In addition, Figure 5 breaks down the first bin of Figure 4 to capture the frequency distribution of the average time spent per question for those participants who spent a minute or less on average per question. In addition, among those in the challenge condition, 39.5%, 12.35%, and 7.41% reached the second, third, and fourth levels of difficulty, respectively. Preliminary analysis indicated that the number of questions answered (i.e., retrieval practice), total time on questions, and the average time spent per question were positively skewed (skew = 2.73, SE = .19; skew = 6.02, SE = .19; and skew = 10.53, SE = .19, respectively). Thus, these variables were log transformed for statistical analysis. The log transformation changed the skew for the number of questions answered, total time on the questions, and average time on questions to .38 (SE = .19), .79 (SE = .19), and .98 (SE = .19), respectively. Thus, the variables in question were no longer skewed, and no outliers, defined as plus or minus three standard deviations, existed in any of these variables. All further tests with these variables were conducted using the log-transformed values.
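A sketch of this transformation step is shown below with hypothetical data; the thesis does not state the log base or the exact procedure, so the natural log over positive counts is assumed.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical positively skewed retrieval practice counts (>= 1, since
# analyses included only participants who answered at least one question).
rng = np.random.default_rng(7)
questions_answered = np.ceil(rng.lognormal(mean=2.0, sigma=1.0, size=150))

print("skew before:", round(skew(questions_answered), 2))
log_questions = np.log(questions_answered)   # natural log assumed
print("skew after:", round(skew(log_questions), 2))
```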

With the exception of the test of Hypothesis 3, all analyses were conducted with individuals who completed at least one question during the practice week. The test of Hypothesis 3 was conducted with all individuals who completed the pretest and posttest, regardless of the number of questions completed, to allow for the equal comparison of all groups.

Table 1

Correlation Table of Main Variables

Variable                              1      2      3      4      5      6      7      8      9     10
1.  Pretest Score                    --
2.  Posttest Score                  .46**   --
3.  Self-Reported Motivation        .12    .36**   --
4.  Felt Challenge                  .03    .18*   .49**   --
5.  FC1                             .15*   .05    .19*   .36**   --
6.  Felt-Ease                      -.12    .13    .36**  .76**  -.19*    --
7.  Retrieval Practice             -.21** -.25**  .14   -.14     0     -.17     --
8.  Total Time on Questions (m)    -.16*  -.21*  -.18   -.23*  -.12    -.21*   .62**    --
9.  Average Time on Questions (m)  -.05   -.09   -.15   -.18   -.17    -.12    .01     .78*     --
10. Percent of Solutions Viewed    -.19*  -.16    .05   -.13   -.21*   -.03    .04     .41**   .46**   --
M                                  62.2   64.9   5.1    4.8    4.4     4.9    19.6   166.74   6.88    .70
SD                                 24     24.4   1.2     .9    1.0     1.7    30.7   678.74  38.42    .39

Note. The correlations for retrieval practice, total time on questions, and average time on questions are based on their log-transformed values; their means and standard deviations are based on the non-transformed values, as these are more informative. FC1 is the first item in the felt challenge measure. m = minutes; M = mean; SD = standard deviation. * indicates p < .05 and ** indicates p < .01.

[Figure 2 is a bar chart: x-axis = questions completed (binned), y-axis = frequency, with separate bars for the random group and the challenge group.]

Figure 2. Frequency distribution of the number of questions completed per group in bins of five.


[Figure 3 is a bar chart: x-axis = questions completed (1-10), y-axis = number of participants, with separate bars for the random group and the challenge group.]

Figure 3. Frequency distribution of the number of questions completed per group for questions 1-10.

[Figure 4 is a bar chart: x-axis = average time per question in minutes (bins 0-1 through >10), y-axis = frequency, with separate bars for the random group and the challenge group.]

Figure 4. Frequency distribution for the average amount of time spent on questions per participant for both groups.

[Figure 5 is a bar chart: x-axis = average time per question in minutes (bins 0-.2 through .8-1), y-axis = frequency, with separate bars for the random group and the challenge group.]

Figure 5. Frequency distribution for the average amount of time spent on the questions, broken down by seconds, for each group.

Manipulation Check

To examine whether individuals in the challenge condition experienced more challenge than individuals in the other conditions, I conducted independent samples t-tests. The t-test revealed that felt challenge was significantly higher, t(109) = -1.10, p < .05, d = .51, on average, for participants in the challenge condition (M = 4.90, SD = .93) as compared to those in the self-chosen difficulty condition (M = 4.45, SD = .84). However, there was not a significant difference, t(113) = .12, p = .90, between the average amount of felt challenge in the challenge and random difficulty conditions (M = 4.90, SD = .90). Two alternate measures of felt challenge were also considered; the measures and the results using them are discussed in Appendix E.

Hypothesis Tests

Hypothesis 1 predicted that individuals in the challenge condition would attempt more practice problems than individuals in the random difficulty condition. Figure 2 shows the distributions, and Figure 6 shows the median number of questions answered for each group. An independent samples t-test revealed a statistically significant difference, t(152) = -2.30, p < .05, d = .39, between the mean log-transformed number of questions answered by those in the random difficulty condition (M = 2.05, SD = 1.31) as compared to those in the challenge condition (M = 2.50, SD = 1.0). These results supported the first hypothesis.

[Figure 6 is a bar chart titled "Amount of Retrieval Practice": x-axis = condition (challenge group, random difficulty), y-axis = median number of questions answered.]

Figure 6. Median number of questions answered by condition.


The second hypothesis stated that individuals would report higher motivation in the challenge condition than in the random difficulty and the self-chosen difficulty conditions. Figure 7 provides the means of the motivation measure by condition. As can be seen, the mean for motivation in the challenge condition (M = 5.30; SD = 1.11) was in between the random difficulty condition (M = 5.43; SD = 1) and self-chosen difficulty condition (M = 4.70; SD = 1.23). An independent samples t-test indicated that the difference between challenge and self-chosen difficulty conditions was statistically significant, t (109) = -2.80, p < .05; d = .51, and in the expected direction. The difference between the challenge and random difficulty condition was not significant, t (113) = .78, p = .44. Thus, the second hypothesis was partially supported.

Figure 7. Mean self-reported motivation by group. Standard error bars are +/- 2.

The third hypothesis predicted that participants’ math scores would improve more in the challenge condition than in the random difficulty condition. That is, individuals in the challenge condition would learn more than those in the random difficulty condition. The mean pre- and posttest scores were 61.30 (SD = 24.93) and 66.31 (SD = 23.11) for the challenge condition, 64.20 (SD = 25) and 67.80 (SD = 24.30) for the random difficulty condition, and 60.6 (SD = 21.40) and 60.20 (SD = 25.51) for the self-chosen difficulty condition, respectively (see Figure 8). An omnibus ANOVA indicated there was no significant difference between the three groups on the pretest or the posttest scores, F(2, 213) = .46, p = .63 and F(2, 167) = 1.53, p = .21, respectively. To test Hypothesis 3, an ANCOVA with pretest as the covariate, posttest as the dependent variable, and group as the fixed factor revealed no significant difference in performance between the three groups, F(2, 164) = .085, p = .92. Thus, Hypothesis 3 was not supported.


Figure 8. Pre- and posttest scores by condition. Standard error bars are +/- 2.
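The ANCOVA reported above corresponds to a covariate-adjusted linear model. Below is a sketch using statsmodels with hypothetical data; the column names (pretest, posttest, condition) and simulated values are assumptions, not the study data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical scores for the three conditions.
rng = np.random.default_rng(11)
df = pd.DataFrame({
    "condition": np.repeat(["random", "self_chosen", "challenge"], 50),
    "pretest": rng.normal(62, 24, 150),
})
df["posttest"] = 0.5 * df["pretest"] + rng.normal(35, 20, 150)

# Posttest as the dependent variable, pretest as the covariate, and
# condition as the fixed factor; the F-test on condition is the ANCOVA.
model = smf.ols("posttest ~ pretest + C(condition)", data=df).fit()
print(anova_lm(model, typ=2))
```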

Although the log transformation addressed the outlier issue via scale adjustments, the log-transformed values still contained individuals who worked through questions (i.e., retrieval practice) either very quickly or for a very long time. This might have presented a problem, as instances of retrieval practice that are very quick may not represent retrieval practice, and instances of very long times may indicate times when an individual’s attention is actually directed elsewhere for much of the time, making the time measure invalid as an index of time on task. Unfortunately, it is difficult to know how long an individual needed to look at a question before it should be counted as retrieval practice, given that each question was different and each person had different abilities. Likewise, determining a value that represents a distracted individual, as opposed to one struggling to understand the concepts represented in the problem, is somewhat problematic. That said, I made two decision rules. Specifically, retrieval practice was not counted if an individual spent less than a minute or more than eight hours looking at a specific question/solution. The analyses were re-run under these conditions and are presented in Appendix F.

Supplementary Analyses

To provide further insight into the processes and issues involved in this study, several supplementary analyses were run. For example, one concern regarding the challenge manipulation was that, due to the lag in ability-difficulty calibration, many individuals in that condition did not feel much challenge. If this were the case, felt challenge might be negatively correlated with pretest score (i.e., initial ability) in the challenge condition due to high-skill individuals potentially not feeling challenged. To test this, I regressed felt challenge on condition, pretest score (i.e., initial ability level), and their interaction. The coefficient for the interaction was significant, F(1, 111) = .01, p < .05, ∆R2 = .05, suggesting that initial ability moderates the relationship between condition and felt challenge such that in the random difficulty group, the higher the initial ability, the less felt challenge, with the opposite pattern in the challenge group (see Figure 9). To test the simple main effects, I correlated pretest score and felt challenge by condition. The correlation between pretest score and felt challenge was significantly negative for the random difficulty group, r(52) = -.275, p < .05, and positive but not significant for the challenge group, r(52) = .224, p = .10.

Figure 9. Interaction between condition and pretest score on felt challenge (x-axis: pretest score; y-axis: felt challenge; separate lines for the random difficulty and challenge groups).
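A minimal sketch of this kind of moderated regression appears below, again assuming hypothetical data and variable names rather than the study's actual files.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-participant data for the two conditions being compared.
df = pd.DataFrame({
    "felt_challenge": [4.2, 3.1, 5.0, 2.8, 4.6, 3.9],
    "pretest":        [50, 80, 45, 90, 60, 70],
    "condition":      ["challenge", "challenge", "random",
                       "random", "challenge", "random"],
})

# Moderated regression: the interaction term tests whether the
# pretest-felt challenge slope differs across conditions.
model = smf.ols("felt_challenge ~ C(condition) * pretest", data=df).fit()
print(model.summary())

# Simple effects: correlate pretest and felt challenge within each condition.
for name, group in df.groupby("condition"):
    print(name, group["pretest"].corr(group["felt_challenge"]))
```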

Given that individuals in the challenge group started with relatively easy questions (i.e., it takes a few questions for the challenge to adapt to an individual's ability level), a concern was that individuals in the challenge condition with high ability dropped out before they were challenged. If high ability individuals did not continue until they were challenged, they would likely report little felt challenge given their lack of exposure to the manipulation, which could account for the weak manipulation check effect. To assess this possibility, I ran a regression on felt challenge for only those individuals in the challenge condition. I entered ability (i.e., pretest score) and retrieval practice (i.e., number of items answered) into the first block and the product of these variables into the second block. The product term was not significant, F(1, 55) = 1.43, p = .87, ΔR² = .001. In addition, a negative relationship between pretest score (i.e., initial ability) and amount of retrieval practice would indicate that those with high ability might not have experienced the manipulation. Yet, the correlation between ability and retrieval practice for those in the challenge condition was not significant, r(77) = -.20, p = .09, indicating that the concern of high ability individuals not being challenged was unwarranted.
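The block-entry (hierarchical) regression used here can be illustrated as in the sketch below, where the nested-model comparison yields the ΔR² and its F test. The data and variable names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data for challenge-condition participants only.
df = pd.DataFrame({
    "felt_challenge":     [4, 3, 5, 2, 4, 3, 5, 2],
    "pretest":            [50, 80, 45, 90, 60, 70, 55, 85],
    "retrieval_practice": [12, 3, 20, 1, 8, 5, 15, 2],
})

# Block 1: main effects only.
block1 = smf.ols("felt_challenge ~ pretest + retrieval_practice",
                 data=df).fit()
# Block 2: main effects plus the product (interaction) term.
block2 = smf.ols("felt_challenge ~ pretest * retrieval_practice",
                 data=df).fit()

# Delta R-squared for the product term, as reported in the text.
print(block2.rsquared - block1.rsquared)
# F test for the increment: compare the nested models.
print(block2.compare_f_test(block1))
```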

Further, to provide additional insight into whether participants were more motivated in the challenge group than in the random difficulty group, I examined differences in the total time spent on questions and the average amount of time spent per question. A difference between groups on either measure might indicate that individuals were differentially motivated. However, individuals neither spent more total time on questions nor more time per question on average in the challenge condition versus the random condition, t(153) = -.33, p = .74, and t(153) = 3.0, p = .10, respectively.

Yet another potential way to examine whether individuals might have been differentially motivated between groups is to look at the difference in the percentage of solutions examined. Participants who looked at more solutions could be said to be engaging in more retrieval practice. To test this proposition, I conducted an independent samples t-test: did the groups differ in the percentage of solutions they viewed relative to the number of questions they viewed? The random difficulty group had a significantly higher percentage of solutions viewed, t(151) = 3.4, p < .05, d = .56.
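The sketch below illustrates this style of independent samples t-test, along with a Cohen's d computed from the pooled standard deviation. The group vectors are hypothetical, not the study's data.

```python
import numpy as np
from scipy import stats

# Hypothetical percentages of solutions viewed (solutions opened divided
# by questions viewed) for each group.
random_pct    = np.array([0.80, 0.65, 0.90, 0.70, 0.85])
challenge_pct = np.array([0.50, 0.40, 0.60, 0.55, 0.45])

# Independent samples t-test, as in the analysis described above.
t, p = stats.ttest_ind(random_pct, challenge_pct)

# Cohen's d from the pooled standard deviation.
n1, n2 = len(random_pct), len(challenge_pct)
pooled_sd = np.sqrt(((n1 - 1) * random_pct.var(ddof=1) +
                     (n2 - 1) * challenge_pct.var(ddof=1)) / (n1 + n2 - 2))
d = (random_pct.mean() - challenge_pct.mean()) / pooled_sd
print(t, p, d)
```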

Lastly, given that individuals did not differ in learning between groups, it was important to examine why. It was possible that there was not enough time for individuals to increase their ability. Thus, I looked at the partial correlation between amount of retrieval practice and posttest score, controlling for pretest score. If this correlation were significant and positive, it would show that those who did more questions got higher scores on the posttest, meaning that the material was helping them learn and that individuals likely did not have enough time with the material to change their ability. However, if the correlation were negative and significant, it would show that individuals who did more questions actually did worse on the posttest, meaning the items were not helping the individuals learn. The correlation was significantly negative, r(105) = -.2, p < .05.
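A partial correlation of this form can be computed by correlating the residuals of each variable after regressing out the covariate, as the sketch below shows. The data frame and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: retrieval practice counts plus pre/post scores.
df = pd.DataFrame({
    "retrieval_practice": [2, 10, 4, 15, 1, 8],
    "pretest":            [50, 70, 55, 80, 45, 65],
    "posttest":           [55, 65, 60, 70, 50, 60],
})

# Partial correlation of retrieval practice and posttest, controlling for
# pretest: regress each variable on the covariate, then correlate residuals.
res_x = smf.ols("retrieval_practice ~ pretest", data=df).fit().resid
res_y = smf.ols("posttest ~ pretest", data=df).fit().resid
print(res_x.corr(res_y))
```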


Discussion

The present study investigated whether the motivational nature of challenge would encourage the use of retrieval practice, which is a known, effective, but underused study strategy. I tested this by providing a group of individuals with practice math questions that provided a level of challenge that adapted to their ability level. As controls, I provided another group of individuals questions in no specific order of difficulty, and for a third group I allowed participants to choose any practice items they desired from a list that was ordered by difficulty. In support of the main hypothesis, I found that the challenge group engaged in a significantly greater amount of retrieval practice than the random difficulty group. Furthermore, those in the challenge condition felt more motivation than those in the self-chosen difficulty condition. Unfortunately, no differences were detected on the distal outcome variable (i.e., improvement in algebra performance) or the timing variables (with the exception of the percentage of solutions viewed). The discussion focuses on the theoretical, methodological, and practical implications of the study, along with limitations and future directions.

Theoretical Implications

Various theories such as achievement motivation (Atkinson, 1957), flow theory (Csikszentmihalyi, 1990), and SDT (Deci & Ryan, 1985; Locke, 1968) suggest that appropriate challenge acts as a motivator. However, surprisingly little research evaluates this claim directly. For example, in the past, the notion that challenge improves motivation has only been evaluated indirectly (i.e., motivation was inferred from increased task performance; Porter, Van Maanen, Yeager, & Crampton, 1975; Taylor, 1981). The current study provides some support for the idea that challenge is motivating by assessing self-reported motivation after challenging study questions. Specifically, the challenge group reported more motivation than the self-chosen difficulty group.

However, the self-chosen condition's layout in Qualtrics may not have been user-friendly, and this could have affected participants' motivation. Indeed, the lack of a difference in self-reported motivation between the challenge and random difficulty groups implies that evidence in favor of the motivating value of challenge remains elusive.

Practical Implications

The more immediate objective of this study was to find an effective way to motivate the use of retrieval practice. One common motivational technique is to increase individuals' beliefs in the value of a behavior. In the case of retrieval practice, this simple approach was shown to be ineffective (Dunlosky & Rawson, 2015). Yet, alternative methods for motivating retrieval practice are available. In this case, I found that providing challenge is potentially one such method, as the challenge group engaged in significantly more retrieval practice than the random difficulty group.

Furthermore, though the training program requires obtaining item difficulty metrics, I used a relatively simple delivery mechanism to operationalize challenge, which is less onerous than a fully implemented computer adaptive testing algorithm (Walter, Becker, Bjorner, Fliege, Klapp, & Rose, 2007; Young, Shermis, Brutten, & Perkins, 1996). That is, the study design sought to produce challenge by presenting items matched to an individual's skill level. Specifically, individuals were given more difficult items after answering a sufficient number of questions at a lower level of difficulty. This simple delivery mechanism may allow organizations to reap the benefits of increased motivation without investing a great deal of resources.

One finding that might be of particular interest to practitioners is that ability moderated the relationship between condition and felt challenge, such that the relationship between ability (i.e., pretest score) and felt challenge was negative for the random difficulty group and positive for the challenge group (see Figure 9 above). This finding suggests that individuals with high initial ability felt more challenge in the challenge group. The implication is that the lag in presenting items of the appropriate level of challenge may not have adversely affected those high in ability in the challenge condition. This might be due to higher base-level motivation among those high in ability, which could have compelled them to get through the first set of questions even though those questions would be relatively easy for them. Another possible conclusion drawn from the moderated relationship is that individuals more facile with the material might benefit more from an adaptive challenge strategy for item presentation than individuals less facile with the material, perhaps because high ability individuals are more apt to want to challenge themselves. Thus, top employees of an organization might benefit most from a challenge-based practice item delivery system if increasing employee motivation to practice is an organizational goal. Of course, despite some promise, practical implications should not be drawn from the current study until research is conducted to address its limitations.

Limitations and Future Directions

As with all research, the current study has some limitations. First, the conservative manner in which the ability-challenge match was calculated (i.e., the program did not revise and adapt to the skill level of the participants after every question practiced) limits the claim that challenge might lead to increased retrieval practice. That is, in the current study, individuals who possessed ability above the first level of questions at the start of the study (i.e., the level at which every participant starts) were not exposed to questions at their ability level until they answered several questions correctly. Thus, not every participant with high ability experienced the manipulation, and the point at which they experienced challenge likely differed depending on ability. In addition, as shown in Figures 2 and 4, many did not answer enough questions to experience challenge, leading to questions about the robustness of the data. In particular, when more conservative criteria were applied to determining which data were legitimate, no significant effects were found (see Appendix F). Further, there was also the possibility that individuals could continually get questions wrong and still move up through all the levels in the challenge condition. The above issues may have undermined the ability-item difficulty match that those in the challenge condition were meant to experience.

Nonetheless, several results imply that the aforementioned limitation may not have been a substantial problem. For example, high ability individuals were more likely to report being challenged and were no less likely to practice items than lower ability individuals.

Even so, future research should seek to replicate these findings with a more advanced way to match ability and challenge (e.g., computer adaptive testing) that quickly places individuals into the most appropriate ability-challenge match. For example, a pretest might be used to place individuals accurately into levels, though such a test would likely need more than the ten items used in this study. Of course, a concern is that, in a real-world context, individuals would be unlikely to want to go through an extensive pretest before training even began; thus, this method might decrease the number of people who use the training program, which ultimately makes it unattractive in a practical sense.

Another possible limitation is that weak measures likely undermined the findings with regard to learning and felt challenge. For example, I did not find an effect of retrieval practice on learning (i.e., change in performance). However, substantial research under more controlled conditions has robustly demonstrated the value of retrieval practice for learning (Butler, 2010; Karpicke & Blunt, 2011; Karpicke & Roediger, 2008). The absence of a learning effect indicates the learning measure was likely not sensitive enough to pick up on whatever ability changes might have occurred after only a week of possibly practicing some math problems. Although the practice period could have been extended, such a procedure might have reduced the number of participants who completed the study (i.e., took the posttest). Indeed, because of this concern, I provided a financial incentive and intentionally kept the tests and practice period short to avoid the threats differential attrition might introduce (Shadish, Cook, & Campbell, 2002). Yet, even with the financial incentive and short time frame, about 50% of individuals who started the study did not take the posttest at the end. Moreover, the pre and posttests were only somewhat reliable (α = .71 and α = .63, respectively); thus, making the pre and posttest measures more reliable by adding items might also increase the construct validity of the learning measure. Further, the felt challenge measure also did not show strong psychometric properties. Thus, a better measure that more accurately taps the felt challenge construct should be developed.

Including focus groups in the measure development process might help clarify any conceptual misunderstanding of the items (Gehlbach & Brinkworth, 2011). In addition, asking about felt challenge each time an individual stopped practicing might reduce some of the noise caused by the lag between an individual's practice and when they completed the felt challenge measure.

In addition to addressing the limitations mentioned, future research could explore alternative methods to engender the use of retrieval practice. For example, challenge is likely only one way to motivate the use of retrieval practice and similar learning strategies. Another method might be to provide an extrinsic incentive, such as money. However, this has been tried, and the results were not encouraging. In a prior study, Dunlosky and Rawson (2015) found that a $25 gift card did not motivate students to use retrieval practice correctly. Yet, in that study the extrinsic reward was not tied directly to the completion of retrieval practice; rather, it was tied to performance on a test (Dunlosky & Rawson, 2015). Thus, future research could look into tying extrinsic rewards directly to the use of efficient learning or study strategies. In addition, retrieval practice goals could be set for employees, given that goals are seen as effective ways to increase motivation (Locke & Latham, 1990). For example, an employer might institute a job-specific retrieval practice goal, such as reviewing safety procedures once a week (e.g., for construction workers).

Despite the present findings' inconclusiveness regarding challenge as a motivator, intrinsic motivators are thought to be more motivating than extrinsic motivators for self-regulation processes (Ryan & Deci, 2000). Thus, it may be better to increase intrinsic motivation through means other than challenge. However, merely instructing individuals on the benefits of retrieval practice does not work. A potential explanation for individuals not using retrieval practice after being told of its benefits is that they do not feel it makes a difference. That is, the technique lacks a sense of impact (i.e., how much individuals feel they make a difference), which has been identified as one way to increase intrinsic motivation (Thomas & Velthouse, 1990). It might be possible to increase this sense of impact by exposing individuals not only to the knowledge that retrieval practice works, but also to the results of retrieval practice.

That is, have individuals take a test with and without retrieval practice in order to show them the benefits. This process might make the benefits very salient and thus increase motivation to use the technique.

The theory of planned behavior suggests another potential way to increase the use of retrieval practice. According to this theory, behavior is guided by behavioral beliefs, beliefs about the expectations of other people, and beliefs about how much control one has over the situation (Ajzen, 2002). The combination of these three beliefs leads to behavioral intention and, ultimately, behavior.

Thus, if one could modify all three beliefs, one might be able to increase the probability of the desired behavior. For instance, reminding individuals that they have control over their learning should lead to positive perceived behavioral control. Reinforcing the belief that retrieval practice increases the chances of obtaining valued outcomes would also be a part of this approach. Finally, creating social pressure to engage in retrieval practice, perhaps by making it a common practice in educational settings, might fulfill the subjective norm category. Indeed, if retrieval practice were a habit formed in childhood, using it to study would become the norm. Modifying all three beliefs in this way would satisfy the theory of planned behavior, and it would be reasonable to expect the use of retrieval practice to increase. Future research should explore these potential mechanisms for motivating the use of efficient learning strategies.


Conclusion

Finding ways to motivate effective learning is important for nearly all individuals, especially in the workplace. This study provides some evidence that challenge motivates increased use of retrieval practice in a self-regulated learning context. For example, the challenge group completed more retrieval practice than the random difficulty group and reported more motivation than the self-chosen difficulty group, which provides some promise. Yet, despite the challenge of testing these relationships, future research is needed to determine whether the relationships in question are robust and whether motivation techniques can help increase the use of effective learning strategies.


References

Ajzen, I. (2002). Perceived behavioral control, self-efficacy, locus of control, and the theory of planned behavior. Journal of Applied Social Psychology, 32, 665-683.

Anderson, L. W., Krathwohl, D. R., & Bloom, B. S. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. New York, NY: Longman.

Arthur, W., Jr., Bennett, W., Jr., Edens, P. S., & Bell, S. T. (2003). Effectiveness of training in organizations: A meta-analysis of design and evaluation features. Journal of Applied Psychology, 88, 234-245.

Arvey, R. D., Strickland, W., Drauden, G., & Martin, C. (1990). Motivational components of test taking. Personnel Psychology, 43, 695-716.

Atkinson, J. W. (1957). Motivational determinants of risk-taking behavior. Psychological Review, 64, 359-372.

Atkinson, J. W. (1958). Towards experimental analysis of human motivation in terms of motives, expectancies, and incentives. In Motives in fantasy, action and society (pp. 288-305). Princeton, NJ: Van Nostrand.

Atkinson, R. C., & Paulson, J. A. (1972). An approach to the psychology of instruction. Psychological Bulletin, 78, 49-61.

Bahrick, H. P. (1979). Maintenance of knowledge: Questions about memory we forgot to ask. Journal of Experimental Psychology: General, 108, 296-308.

Balota, D. A., Duchek, J. M., & Paullin, R. (1989). Age-related differences in the impact of spacing, lag, and retention interval. Psychology and Aging, 4, 3-9.

Balota, D. A., Duchek, J. M., Sergent-Marshall, S. D., & Roediger, H. L., III. (2006). Does expanded retrieval produce benefits over equal-interval spacing? Explorations of spacing effects in healthy aging and early stage Alzheimer's disease. Psychology and Aging, 21, 19-31.

Bangert-Drowns, R., Kulik, C., Kulik, J., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61, 213-238.

Berry, D. C. (1983). Metacognitive experience and transfer of logical reasoning. The Quarterly Journal of Experimental Psychology, 35, 39-49.

Bersin, J. (2014). Spending on corporate training soars: Employee capabilities now a priority. Forbes Magazine. http://www.forbes.com/sites/joshbersin/2014/02/04/the-recovery-arrives-corporate-training-spend-skyrockets/#5553837d4ab7

Betz, N. E., & Weiss, D. J. (1976). Psychological effects of immediate knowledge of results and adaptive ability testing (Research Report 76-4).

Burke, M. J., & Day, R. R. (1986). A cumulative study of the effectiveness of managerial training. Journal of Applied Psychology, 71, 232-245.

Butler, A. C. (2010). Repeated testing produces superior transfer of learning relative to repeated studying. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1118-1133.

Butler, A. C., Karpicke, J. D., & Roediger, H. L., III. (2008). Correcting a metacognitive error: Feedback increases retention of low-confidence correct responses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 918-928.

Carroll, M., Campbell-Ratcliffe, J., Murnane, H., & Perfect, T. (2007). Retrieval-induced forgetting in educational contexts: Monitoring, expertise, text integration, and test format. European Journal of Cognitive Psychology, 19, 580-606.

Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354-380.

Cerasoli, C. P., Nicklin, J. M., & Ford, M. T. (2014). Intrinsic motivation and extrinsic incentives jointly predict performance: A 40-year meta-analysis. Psychological Bulletin, 140, 980-1008.

Chi, M. T., De Leeuw, N., Chiu, M. H., & LaVancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18, 439-477.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.

Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative theory of training motivation: A meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85, 678-707.

Csikszentmihalyi, M. (1990). Flow: The psychology of optimal experience. New York, NY: Harper & Row.

Csikszentmihalyi, M. (2014). Flow and the foundations of positive psychology: The collected works of Mihaly Csikszentmihalyi. Berlin, Germany: Springer.

Csikszentmihalyi, M., & LeFevre, J. (1989). Optimal experience in work and leisure. Journal of Personality and Social Psychology, 56, 815-822.

Csikszentmihalyi, M., & Nakamura, J. (1989). The dynamics of intrinsic motivation: A study of adolescents. Research on Motivation in Education, 3, 45-71.

Csikszentmihalyi, M., & Robinson, R. E. (1990). The art of seeing: An interpretation of the aesthetic encounter. Los Angeles, CA: Getty Publications.

Danner, F. W., & Lonky, E. (1981). A cognitive-developmental approach to the effects of rewards on intrinsic motivation. Child Development, 52, 1043-1052.

Debus, M. E., Sonnentag, S., Deutsch, W., & Nussbeck, F. W. (2014). Making flow happen: The effects of being recovered on work-related flow between and within days. Journal of Applied Psychology, 99, 713-722.

Deci, E. L. (1975). Intrinsic motivation. New York, NY: Plenum Press.

Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York, NY: Plenum Press.

Deci, E. L., & Ryan, R. M. (2002). Overview of self-determination theory: An organismic dialectical perspective. In Handbook of self-determination research (pp. 3-33). Rochester, NY: University of Rochester Press.

Dornisch, M. M., & Sperling, R. A. (2006). Facilitating learning from technology-enhanced text: Effects of prompted elaborative interrogation. The Journal of Educational Research, 99, 156-166.

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students' learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14, 4-58.

Dunlosky, J., & Rawson, K. A. (2015). Do students use testing and feedback while learning? A focus on key concept definitions and learning to criterion. Learning and Instruction, 39, 32-44.

Eggen, P., & Kauchak, D. (2005). Introduction to teaching: Becoming a professional. Upper Saddle River, NJ: Prentice Hall.

Eickhoff, C., Harris, C. G., de Vries, A. P., & Srinivasan, P. (2012). Quality through flow and immersion: Gamifying crowdsourced relevance assessments. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 871-880). New York, NY: ACM.

Fazio, L. K., Agarwal, P. K., Marsh, E. J., & Roediger, H. L. (2010). Memorial consequences of multiple-choice testing on immediate and delayed tests. Memory & Cognition, 38, 407-418.

Gehlbach, H., & Brinkworth, M. E. (2011). Measure twice, cut down error: A process for enhancing the validity of survey scales. Review of General Psychology, 15, 380-387.

Glover, J. A. (1989). The "testing" phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81, 392-399.

Gustafsson, A., Katzeff, C., & Bang, M. (2009). Evaluation of a pervasive game for domestic energy engagement among teenagers. Computers in Entertainment, 7, 1-19.

Hamari, J., Koivisto, J., & Sarsa, H. (2014). Does gamification work? A literature review of empirical studies on gamification. In Proceedings of the 47th Hawaii International Conference on System Sciences (pp. 3025-3034). IEEE.

Healy, A. F., Kole, J. A., & Bourne, L. E. (2014). Training principles to advance expertise. Frontiers in Psychology, 5, 166-169.

Jackson, S. A. (1995). Factors influencing the occurrence of flow state in elite athletes. Journal of Applied Sport Psychology, 7, 138-166.

Kanfer, R. (1990). Motivation theory and industrial and organizational psychology. In Handbook of industrial and organizational psychology (Vol. 1, pp. 75-130).

Karpicke, J. D., & Blunt, J. R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331, 772-775.

Karpicke, J. D., Butler, A. C., & Roediger, H. L., III. (2009). Metacognitive strategies in student learning: Do students practise retrieval when they study on their own? Memory, 17, 471-479.

Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319, 966-968.

Keith, N., & Frese, M. (2008). Effectiveness of error management training: A meta-analysis. Journal of Applied Psychology, 93, 59-69.

Koppes, L. L. (2014). Historical perspectives in industrial and organizational psychology. Psychology Press.

Kornell, N., & Bjork, R. A. (2007). The promise and perils of self-regulated study. Psychonomic Bulletin & Review, 14, 219-224.

Le, J., Edmonds, A., Hester, V., & Biewald, L. (2010). Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution. In SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation (pp. 21-26).

Locke, E. A. (1968). Toward a theory of task motivation and incentives. Organizational Behavior and Human Performance, 3, 157-189.

Locke, E. A., & Latham, G. P. (1990). A theory of goal setting & task performance. Prentice-Hall.

McGeoch, J. A. (1942). The psychology of human learning: An introduction. New York, NY: Longmans, Green.

Metcalfe, J., & Kornell, N. (2005). A region of proximal learning model of study time allocation. Journal of Memory and Language, 52, 463-477.

Ortner, T. M., & Caspers, J. (2011). Consequences of test anxiety on adaptive versus fixed item testing. European Journal of Psychological Assessment.

Pashler, H., Cepeda, N. J., Wixted, J. T., & Rohrer, D. (2005). When does feedback facilitate learning of words? Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 3-8.

Patel, L. (2010). 2010 state of the industry: Continued dedication. T+D, 64, 48-53.

Pine, S. M., Church, A. T., Gialluca, K. A., & Weiss, D. J. (1979). Effects of computerized adaptive testing on Black and White students. DTIC Document.

Porter, L. W., Van Maanen, J., Yeager, F., & Crampton, W. J. (1971). Continuous monitoring of employees' motivational attitudes during the initial employment period.

Porter, L., Lawler, E., & Hackman, R. (Eds.). (1975). Behavior in organizations. New York, NY: McGraw-Hill.

Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? Journal of Memory and Language, 60, 437-447.

Rickard, T. C., Lau, J. S.-H., & Pashler, H. (2008). Spacing and the transition from calculation to retrieval. Psychonomic Bulletin & Review, 15, 656-661.

Roediger, H. L. (personal communication, September 18, 2015).

Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15, 20-27.

Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249-255.

Roediger, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155-1159.

Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. Instructional Science, 35, 481-498.

Ryan, R. M., & Deci, E. L. (2000). Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary Educational Psychology, 25, 54-67.

Salas, E., Tannenbaum, S. I., Kraiger, K., & Smith-Jentsch, K. A. (2012). The science of training and development in organizations: What matters in practice? Psychological Science in the Public Interest, 13, 74-101.

Schneider, V. I., Healy, A. F., & Bourne, L. E., Jr. (1998). Contextual interference effects in foreign language vocabulary acquisition and retention. In Foreign language learning: Psycholinguistic studies on training and retention (pp. 77-90). Mahwah, NJ: Lawrence Erlbaum Associates.

Seifert, T. L. (1993). Effects of elaborative interrogation with prose passages. Journal of Educational Psychology, 85, 642-651.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Wadsworth Cengage Learning.

Smith, S. M., Glenberg, A., & Bjork, R. A. (1978). Environmental context and human memory. Memory & Cognition, 6, 342-353.

Smith, S. M., & Rothkopf, E. Z. (1984). Contextual enrichment and distribution of practice in the classroom. Cognition and Instruction, 1, 341-358.

Taylor, M. S. (1981). The motivational effects of task challenge: A laboratory investigation. Organizational Behavior and Human Performance, 27, 255-278.

Taylor, P. J., Russ-Eft, D. F., & Chan, D. W. (2005). A meta-analytic review of behavior modeling training. Journal of Applied Psychology, 90, 692-709.

Thomas, K. W., & Velthouse, B. A. (1990). Cognitive elements of empowerment: An "interpretive" model of intrinsic task motivation. Academy of Management Review, 15, 666-681.

Toppino, T. C., Kasserman, J. E., & Mracek, W. A. (1991). The effect of spacing repetitions on the recognition memory of young children and adults. Journal of Experimental Child Psychology, 51, 123-138.

Walter, O. B., Becker, J., Bjorner, J. B., Fliege, H., Klapp, B. F., & Rose, M. (2007). Development and evaluation of a computer adaptive test for "Anxiety" (Anxiety-CAT). Quality of Life Research, 16, 143-155.

Weiss, D. J. (1985). Adaptive testing by computer. Journal of Consulting and Clinical Psychology, 53, 774-789.

Wise, S. L. (2014). The utility of adaptive testing in addressing the problem of unmotivated examinees. Journal of Computerized Adaptive Testing, 2, 1-17.

Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10, 1-17.

Woloshyn, V. E., Pressley, M., & Schneider, W. (1992). Elaborative-interrogation and prior-knowledge effects on learning of facts. Journal of Educational Psychology, 84, 115-124.

Young, R., Shermis, M. D., Brutten, S. R., & Perkins, K. (1996). From conventional to computer-adaptive testing of ESL reading comprehension. System, 24, 23-40.


Appendix A: First Contact Email and Forum Post

Subject of Email / Forum Title: Get Paid to Prep for the GMAT/GRE and Help with Research

Hi, my name is Kyle Bayes and I'm a researcher from Ohio University. I am emailing various clubs/organizations that might have members (or friends) interested in taking the GMAT/GRE.

I am conducting a study that provides free GRE/GMAT math prep material. In addition, the first 200 individuals to complete the study will receive a $10 Amazon gift card upon completion. If you know of any individuals who might be interested, please forward this message (this is our main method of recruiting, so it is essential to the study). I appreciate anyone you can send this information to!

The study will be discontinued after 200 participants have completed the study.

**Below I detail the study**

HELP WITH RESEARCH AND PREPARE FOR THE GMAT or GRE!

Hello, I am conducting a study to explore factors that might influence study habits. During this week-long study, individuals will have access to real test-prep questions from Veritas Prep, a company that creates material for those interested in training for exams used by graduate programs. These questions are pulled directly from the database used to test individuals who sign up for their tutoring. In addition, Veritas has provided participants of this study with portions of their test prep books free of charge. Thus, participating in this study not only helps advance research, but may also help you prepare for the math portion of a graduate school exam (the questions are specifically from the GMAT prep).

Let me briefly describe the general layout of the study.

1. Individuals are presented a consent form and then given a link to a 10-question pretest to gauge current ability.

2. After the pretest, individuals will be provided a link to a set of practice questions with solutions and shortcuts for each question available upon request.

3. These practice questions will be available for a week.

4. Following this week, individuals will take another 10-question posttest and answer a few brief questions about themselves. At this point the Veritas Prep material can be downloaded for your indefinite use. Furthermore, individuals will be sent a $10 Amazon gift card on the Friday following their completion of the posttest.

Click the link below to get started (or copy and paste it into your browser's address bar): https://ohio.qualtrics.com/SE/?SID=SV_eM3ApM7kIBFJtad


Appendix B: Participant’s Study View

Challenge Condition

Step 1: After logging in via email, participants are presented with an instructions slide that also details the condition they are in. Upon first login in the challenge condition, they are told that they are currently in Level 1 of the math prep and that we will notify them when they reach Level 2.

Step 2: Participants must answer the question presented to them. That is, they cannot click the submit button and skip a question.

Step 3: Participants are then taken to a different page on which they are asked "Would you like to see the solution?" (as seen in the screenshot below) and must pick an answer and hit page submit.

Step 4: Participants are taken to a different page on which they are shown the solution, as seen in the screenshot below. On this page they click submit and are then redirected to the next question, and the process repeats. If participants click no and hit page submit, they are directed to another question and the process repeats.

Random Difficulty Condition

Step 1: Participants are given instructions and must press the submit button to continue.

Step 2: Participants are randomly presented with a question that they can choose not to answer if they wish.

Step 2.1: If they choose to pick an answer, they are presented with the following.

Step 2.2: If they clicked yes (i.e., they would like to see the solution), they are presented with the screenshot below. If no is entered, they must then click the page submit button and the above process repeats.

Self-Chosen Difficulty Condition

Step 1: After logging in, participants are presented with instructions and must then click the page submit button to continue.

Step 2: Participants are then presented with the following screen, on which they are able to pick any question they like. They must also click the page submit button to continue to the question. Participants are able to press the submit button without selecting a question, but doing so just redirects them to the same page.

Step 3: Participants are presented with the question they picked and can either answer it or decline to answer by clicking submit.

Step 3.1: If the participant chooses to answer, they are presented with the following option to either see or not see the solution.

Step 3.2: If they click no, they are not presented with the solution and must press the page submit button to continue. They are then presented with the list of questions and the entire process repeats. However, if they click yes, they are presented with the solution and must hit the page submit button before the process repeats.


Appendix C: Measures

Self-reported motivation

Instructions: Please indicate to what degree you agree or disagree with the following items.

1. I was motivated to complete practice problems.

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

2. My motivation to complete practice problems was steady throughout the week.

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

3. I focused when completing the practice problems.

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

4. I put little effort into the practice questions (Reverse scored)

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

5. I was motivated when completing tough questions.

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

6. I routinely skipped questions without trying to answer them correctly (Reverse scored).

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

Felt Challenge

1. I felt challenged during the practice week.

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

2. The practice was often too easy. (Reverse scored)

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

3. The practice was often too difficult (Reverse scored).

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

4. I often felt anxiety when completing practice problems.

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

5. The challenge seemed consistent throughout the week.

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

6. I often felt bored when completing the practice problems (Reverse Scored).

(Strongly Disagree) 1 2 3 4 5 6 7(Strongly Agree)

Demographics

Please answer the following questions.

What is your gender? Male Female

What is your age? _____

What is your grade? ____

What is your ethnicity? ____

What is your major? ______

Do you plan on taking a graduate school test such as the GRE or GMAT? Yes No

If yes, have you started studying? Yes No

Have you taken a graduate school test or practice test before? Yes No

Have you purchased any graduate school test prep materials? Yes No

What is the highest level of math you have taken (e.g., high school geometry, college algebra, college calculus)?


Appendix D: Correlation Matrix for Felt Challenge

Inter-Item Correlation Matrix for Felt Challenge

      FC1   FC2   FC3   FC4   FC5   FC6
FC1  1.00
FC2   .19  1.00
FC3  -.29   .00  1.00
FC4   .21  -.25   .08  1.00
FC5  -.02   .19   .20   .02  1.00
FC6   .00   .63   .23  -.25   .27  1.00


Appendix E: Additional Felt Challenge Measures

Given the low reliability of the felt challenge measure, I considered two alternate measures to explore whether they better tapped the challenge construct. First, I conducted an ANOVA to determine whether there were differences on just the first item of the felt challenge scale (i.e., "I felt challenged during the practice week"), as that item seemed the most content valid of all the items and might tap challenge directly. However, no significant differences across conditions were found, F(2, 167) = .342, p = .711. Second, I combined the second item (i.e., "I often felt the items were too easy") and the sixth item ("I often felt bored when completing the practice problems") into a measure of felt ease, given that they both tapped the opposite of felt challenge and had a reasonably high correlation with one another (r = .63). This measure might have provided a reliable way to discriminate between the groups on a construct opposite to challenge. The felt-ease measure had an alpha of .77, but the groups did not significantly differ on it, F(2, 167) = 2.643, p = .074.


Appendix F: Analyses Changes after Deletion of Suspicious Retrieval Practice Instances

The frequency distributions of the times individuals spent on items made it clear that many individuals may not have been engaging in retrieval practice. To address this, instances in which individuals spent an inappropriate amount of time on an item were excluded and the analyses were rerun. Specifically, items were dropped from counting as retrieval practice, and from the timing data, if the time spent was less than a minute or greater than eight hours. This resulted in 1,946 items being dropped of the 3,175 included in the original analyses. In addition, the deletion of these items eliminated some individuals from all analyses, due to the decision to exclude anyone who did not complete at least one instance of retrieval practice. Thus, the sample size for all independent samples t-tests dropped from 154 to 127, which decreases power substantially, and some results changed. Specifically, Hypothesis 1 was no longer supported, meaning that individuals in the challenge condition (M = 1.34) did not complete significantly more questions than those in the random difficulty condition (M = 1.43), t(125) = -.42, p = .667, d = .07. The results for the second hypothesis, third hypothesis, and the felt challenge manipulation check were unchanged. In regards to the timing data, the challenge group spent more total time on the questions than the random group using a liberal alpha level, t(125) = -.188, p = .06, d = .3, but the groups did not differ in the average amount of time spent per question, t(125) = 1.51, p = .13, d = .27.
