Probabilistic Models of Student Learning and Forgetting
Probabilistic Models of Student Learning and Forgetting

by

Robert Lindsey

B.S., Rensselaer Polytechnic Institute, 2008

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Computer Science
2014

This thesis entitled:
Probabilistic Models of Student Learning and Forgetting
written by Robert Lindsey
has been approved for the Department of Computer Science

Michael Mozer
Aaron Clauset
Vanja Dukic
Matt Jones
Sriram Sankaranarayanan

Date

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

IRB protocol #0110.9, 11-0596, 12-0661

Lindsey, Robert (Ph.D., Computer Science)
Probabilistic Models of Student Learning and Forgetting
Thesis directed by Prof. Michael Mozer

This thesis uses statistical machine learning techniques to construct predictive models of human learning and to improve human learning by discovering optimal teaching methodologies. In Chapters 2 and 3, I present and evaluate models for predicting the changing memory strength of material being studied over time. The models combine a psychological theory of memory with Bayesian methods for inferring individual differences. In Chapter 4, I develop methods for delivering efficient, systematic, personalized review using the statistical models. Results are presented from three large semester-long experiments with middle school students which demonstrate how this "big data" approach to education yields substantial gains in the long-term retention of course material. In Chapter 5, I focus on optimizing various aspects of instruction for populations of students. This involves a novel experimental paradigm which combines Bayesian nonparametric modeling techniques and probabilistic generative models of student performance.
In Chapters 6 and 7, I present supporting laboratory behavioral studies and theoretical analyses. These include an examination of the relationship between study format and the testing effect, and a parsimonious theoretical account of long-term recency effects.

Acknowledgements

I would like to thank Mike Mozer, Jeff Shroyer, and Cathie Knutson for their help. I am also indebted to Hal Pashler and Sean Kang for their advice, and to my parents for their support.

Contents

Chapter

1 Extended Summary
2 Modeling background
  2.1 Knowledge states and forgetting
  2.2 Theory-based approaches
    2.2.1 Kording, Tenenbaum, & Shadmehr (2007)
    2.2.2 Multiscale context model
    2.2.3 ACT-R
    2.2.4 Discussion
  2.3 Data-driven approaches
    2.3.1 Item response theory
    2.3.2 Bayesian knowledge tracing
    2.3.3 Clustering and factorial models
3 Modeling students' knowledge states
  3.1 Preliminary investigation 1
    3.1.1 Study Schedule Optimization
    3.1.2 Models to Evaluate
    3.1.3 Comparing Model Predictions
    3.1.4 Randomized Parameterizations
    3.1.5 Discussion
  3.2 Preliminary investigation 2
    3.2.1 Approaches to consider
    3.2.2 Results
    3.2.3 Discussion
  3.3 Individualized modeling of forgetting following one study session
    3.3.1 Models for predicting student performance
    3.3.2 Simulation results
    3.3.3 Discussion
  3.4 Individualized modeling of forgetting following multiple study sessions
    3.4.1 Other models that consider time
    3.4.2 Hierarchical distributional assumptions
    3.4.3 Gibbs-EM inference algorithm
    3.4.4 Simulation results
4 Improving students' long-term knowledge retention through personalized review
  4.1 Introduction
  4.2 Main Experiment
    4.2.1 Results
    4.2.2 Discussion
    4.2.3 Additional information
  4.3 Followup Experiment 1
    4.3.1 Results and Discussion
  4.4 Followup Experiment 2
    4.4.1 Results
    4.4.2 Additional information
5 Optimizing instruction for populations of students
  5.1 Introduction
  5.2 Optimization of instructional policies
    5.2.1 Surrogate-based optimization using Gaussian process regression
    5.2.2 Generative model of student performance
    5.2.3 Active selection
    5.2.4 Experiment 1: Presentation rate optimization
    5.2.5 Experiment 2: Training sequence optimization
    5.2.6 Discussion
  5.3 Other human optimization tasks
    5.3.1 Experiment 3: Donation optimization
    5.3.2 Vision
6 Effectiveness of different study formats
  6.1 Experiment 1: Constant time per trial
    6.1.1 Participants
    6.1.2 Materials
    6.1.3 Procedure
    6.1.4 Results and discussion
  6.2 Experiment 2: Self-paced trials
    6.2.1 Subjects
    6.2.2 Procedure
    6.2.3 Results
  6.3 Experiment 3: Self-paced trials, long retention intervals
    6.3.1 Participants
    6.3.2 Procedure
    6.3.3 Results
  6.4 Discussion
7 Long term recency is nothing more than ordinary forgetting
  7.1 Introduction
  7.2 Formalization of the decay hypothesis
  7.3 Empirical phenomena associated with LTR
    7.3.1 Absence of LTR in recognition tasks
    7.3.2 Effect of list length
    7.3.3 Ratio rule
    7.3.4 Systematic deviations from the ratio rule
  7.4 Conclusion
8 Major Contributions

References
Bibliography

Tables

Table

3.1 Distributional assumptions of the generative Bayesian response models. The hybrid both model shares the same distributional assumptions as the hybrid decay and hybrid scale models.
3.2 Experimental data used for simulations
4.1 Presentation statistics of individual student-items over entire experiment
4.2 Calendar of events throughout the Main Experiment.
4.3 Calendar of events throughout Followup Experiment 1.
4.4 Calendar of events throughout Followup Experiment 2.

Figures

Figure

2.1 (left) Histogram of proportion of items reported correctly on a cued recall task for a population of 60 students learning 32 Japanese-English vocabulary pairs (S. H. K. Kang, Lindsey, Mozer, & Pashler, 2014); (right) Histogram of proportion of subjects correctly reporting an item on a cued recall task for a population of 120 Lithuanian-English vocabulary pairs being learned by roughly 80 students (Grimaldi, Pyc, & Rawson, 2010)
2.2 Typical spacing experiments have one or more study sessions separated by interstudy intervals (ISIs) with a final test administered after a fixed retention interval. Student performance on the test is sensitive to the ISIs and RI.
2.3 (upper left) Illustration of a forgetting curve. Test performance for a population decreases as a power-law function of time (Wixted & Carpenter, 2007). (lower left) Illustration of a spacing curve. For a fixed RI, increased spacing initially improves test performance but then decreases it. The ISI corresponding to the maximum of the spacing curve is termed the optimal ISI. (right) The relationship between the RI and optimal ISI from a meta-analysis conducted by Cepeda, Pashler, Vul, Wixted, and Rohrer (2006). Each point represents a spacing effect study. The optimal ISI systematically increases with the RI.
2.4 The KTS graphical model. An item's importance is assumed to vary over time as a set of independent random walks, each representing a different timescale. A rational learner must attribute an observed need, the noise-corrupted total importance, to the appropriate timescale.
2.5 (left) We performed a least-squares fit of KTS to the spacing curves from a longitudinal spacing effect study in which subjects underwent two study sessions spaced in time and then a later test (Cepeda, Vul, Rohrer, Wixted, & Pashler, 2008)..