Unbounded Human Learning: Optimal Scheduling for Spaced Repetition
Siddharth Reddy
Department of Computer Science
Cornell University
[email protected]

Igor Labutov
Electrical and Computer Engineering
Cornell University
[email protected]

Siddhartha Banerjee
Operations Research and Information Engineering
Cornell University
[email protected]

Thorsten Joachims
Department of Computer Science
Cornell University
[email protected]

ABSTRACT

In the study of human learning, there is broad evidence that our ability to retain information improves with repeated exposure and decays with delay since last exposure. This plays a crucial role in the design of educational software, leading to a trade-off between teaching new material and reviewing what has already been taught. A common way to balance this trade-off is spaced repetition, which uses periodic review of content to improve long-term retention. Though spaced repetition is widely used in practice, e.g., in electronic flashcard software, there is little formal understanding of the design of these systems. Our paper addresses this gap in three ways. First, we mine log data from spaced repetition software to establish the functional dependence of retention on reinforcement and delay. Second, we use this memory model to develop a stochastic model for spaced repetition systems. We propose a queueing network model of the Leitner system for reviewing flashcards, along with a heuristic approximation that admits a tractable optimization problem for review scheduling. Finally, we empirically evaluate our queueing model through a Mechanical Turk experiment, verifying a key qualitative prediction of our model: the existence of a sharp phase transition in learning outcomes upon increasing the rate of new item introductions.

CCS Concepts

•Applied computing → Computer-assisted instruction; •Mathematics of computing → Queueing theory;

Keywords

Spaced Repetition; Queueing Models; Human Memory

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD ’16, August 13 - 17, 2016, San Francisco, CA, USA
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4232-2/16/08. $15.00
DOI: http://dx.doi.org/10.1145/2939672.2939850

1. INTRODUCTION

The ability to learn and retain a large number of new pieces of information is an essential component of human learning. Scientific theories of human memory, going all the way back to 1885 and the pioneering work of Ebbinghaus [9], identify two critical variables that determine the probability of recalling an item: reinforcement, i.e., repeated exposure to the item, and delay, i.e., time since the item was last reviewed. Accordingly, scientists have long been proponents of the spacing effect for learning: the phenomenon in which periodic, spaced review of content improves long-term retention.

A significant development in recent years has been a growing body of work that attempts to 'engineer' the process of human learning, creating tools that enhance the learning process by building on the scientific understanding of human memory. These educational devices usually take the form of 'flashcards': small pieces of information content which are repeatedly presented to the learner on a schedule determined by a spaced repetition algorithm [4]. Though flashcards have existed for a while in physical form, a new generation of spaced repetition software such as SuperMemo [20], Anki [10], Mnemosyne [2], Pimsleur [18], and Duolingo [3] allows a much greater degree of control and monitoring of the review process. These software applications are growing in popularity [4], but there is a lack of formal mathematical models for reasoning about and optimizing such systems. In this work, we combine memory models from psychology with ideas from queueing theory to develop such a mathematical model for these systems. In particular, we focus on one of the simplest and oldest spaced repetition methods: the Leitner system [13].

The Leitner system, first introduced in 1970, is a heuristic for prioritizing items for review. It is based on a series of decks of flashcards. After the user sees a new item for the first time, it enters the system at deck 1. The items at each deck form a first-in-first-out (FIFO) queue, and when the user requests an item to review, the system chooses a deck i according to some schedule, and presents the top item. If the user does not recall the item, the item is added to the bottom of deck i−1; else, it is added to the bottom of deck i+1. The aim of the scheduler is to ensure that items from lower decks are reviewed more often than those from higher decks, so the user spends more time working on forgotten items and less time on recalled items. Existing schemes for assigning review frequencies to different decks are based on heuristics that are not founded on any formal reasoning, and hence, have no optimality guarantees. One of our main contributions is a principled method for determining appropriate deck review frequencies.

The problem of deciding how frequently to review different decks in the Leitner system is a specific instance of the more general problem of review scheduling for spaced repetition software. The main challenge in all settings is that schedules must balance competing priorities of introducing new items and reviewing old items in order to maximize the rate of learning. While most existing systems use heuristics to make this trade-off, our work presents a principled understanding of the tension between novelty and reinforcement.

1.1 Related Work

The scientific literature on modeling human memory is highly active and dates back more than a century. One of the simplest memory models, the exponential forgetting curve, was first studied by Ebbinghaus in 1885 [9]: it models the probability of recalling an item as an exponentially-decaying function of the time elapsed since previous review and the memory 'strength'. The exact nature of how strength evolves as a function of the number of reviews, length of review intervals, and other factors is a topic of debate, though there is some consensus on the existence of a spacing effect, in which spaced reviews lead to greater strength than massed reviews (i.e., cramming) [8, 5]. Recent studies have proposed more sophisticated probabilistic models of learning and forgetting [17, 15], and there is a large body of related work on item response theory and knowledge tracing [14, 7]. Our work both contributes to this literature (via observational studies on log data from the Mnemosyne software) and uses it as the basis for our queueing model and scheduling algorithm.

Though spaced repetition software is used extensively in practice (see [4] for an excellent overview), there is very limited literature on its design. One notable work in this regard is that of Novikoff et al. [16], who propose a theoretical framework for spaced repetition based on a set of deterministic operations on an infinite string of content pieces. They assume identical items and design schedules to implement deterministic spacing constraints, which are based on an intuitive understanding of the effect of mem-

1.2 Our Contributions

The key contributions of this paper fall into two categories. First, the paper introduces a principled methodology for designing review scheduling systems with various learning objectives. Second, the models we develop provide qualitative insights and general principles for spaced repetition. The overall argument in this paper consists of the following three steps:

1. Mining large-scale log data to validate human memory models: First, we perform observational studies on data from Mnemosyne [2], a popular flashcard software tool, to compare different models of retention probability as a function of reinforcement and delay. Our results, presented in Section 2, add to the existing literature on memory models and provide the empirical foundation upon which we base our mathematical model of spaced repetition.

2. Mathematical modeling of spaced repetition systems: Our main contribution lies in embedding the above memory model into a stochastic model for spaced repetition systems, and using this model to optimize the review schedule. Our framework, which we refer to as the Leitner Queue Network, is based on ideas from queueing theory and job scheduling. Though conceptually simple and easy to simulate, the Leitner Queue Network does not provide a tractable way to optimize the review schedule. To this end, we propose a (heuristic) approximate model, which in simulations is close to our original model for low arrival rates, and which leverages the theory of product-form networks [11, 6] to greatly simplify the scheduling problem. This allows us to study several relevant questions: the maximum rate of learning, the effect of item difficulties, and the effect of a learner's review frequency on their overall rate of learning. We present our model, theory, and simulations in Section 3.

3. Verifying the mathematical model in controlled experiments: Finally, we use Amazon Mechanical Turk [1] to perform large-scale experiments to test our mathematical models. In particular, we verify a critical qualitative prediction of our mathematical model: the existence of a phase transition in learning outcomes upon increasing the rate of introduction of new content beyond a maximum threshold. Our experimental results agree well with our model's predictions, reaffirming the utility of our framework.
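
As a rough illustration of the Leitner mechanics described in the introduction (FIFO decks, demotion to deck i−1 on a failed recall, promotion to deck i+1 on a successful one), the following Python sketch may help. It is not the authors' implementation. The class name `LeitnerSystem`, the rule that a failed item in deck 1 stays in deck 1, and the rule that an item recalled at the top deck leaves the system are assumptions made here for the sake of a runnable example. The function `recall_probability` uses one possible parametrization of an exponential forgetting curve, exp(−delay / strength), which is likewise an assumption and not necessarily the form used in the paper.

```python
import math
from collections import deque


class LeitnerSystem:
    """Toy Leitner system (hypothetical sketch): decks 1..num_decks, each a FIFO queue."""

    def __init__(self, num_decks=5):
        # Index 0 is unused so that deck numbers match the text (deck 1 is the lowest).
        self.num_decks = num_decks
        self.decks = [deque() for _ in range(num_decks + 1)]
        # Assumption: items recalled at the top deck leave the system.
        self.mastered = []

    def add_new_item(self, item):
        """A new item enters the system at the bottom of deck 1."""
        self.decks[1].append(item)

    def review(self, deck, recalled):
        """Present the top item of `deck` and move it according to the outcome."""
        item = self.decks[deck].popleft()
        if not recalled:
            # Assumption: there is no deck 0, so failures in deck 1 stay in deck 1.
            self.decks[max(deck - 1, 1)].append(item)
        elif deck == self.num_decks:
            self.mastered.append(item)
        else:
            self.decks[deck + 1].append(item)
        return item


def recall_probability(delay, strength):
    """Exponential forgetting curve under the assumed form exp(-delay / strength)."""
    return math.exp(-delay / strength)


# Usage: two new items, one recalled and one forgotten.
system = LeitnerSystem(num_decks=3)
system.add_new_item("gato")
system.add_new_item("perro")
system.review(1, recalled=True)   # "gato" moves to the bottom of deck 2
system.review(1, recalled=False)  # "perro" returns to the bottom of deck 1
```

The sketch deliberately leaves out the scheduler, i.e., the rule for choosing which deck to review next, since determining those review frequencies is the question the paper sets out to answer.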