MASARYKOVA UNIVERZITA F}w¡¢£¤¥¦§¨ AKULTA INFORMATIKY !"#$%&'()+,-./012345 Computer-assisted language learning using spaced repetition DIPLOMATHESIS Bc. Daniel Keder Brno, 2009 Declaration I declare that this diploma thesis is my original work and I have worked it out on my own. Any sources of information and literature used in creating this document are properly cited in the bibliography part of this document. Advisor: doc. Ing. Michal Brandejs, CSc. ii Acknowledgement I would like to thank my advisor, doc. Ing. Michal Brandejs, CSc. and the whole IS MU development team, especially Mgr. Jan Kasprzak and RNDr. Miroslava Misakov´ a.´ Without their support and advice this work would never have come into being. iii Keywords spaced repetition, spacing effect, language learning, Dril, Information System MU iv Abstract This thesis attends to the technique of learning called spaced repetition. Spaced repetition helps the student to memorize a great number of small and relatively independent pieces of knowledge, such as a foreign language vocabulary. It is utilizing the psychological spacing effect. v Contents 1 Introduction ...... 3 2 What is spaced repetition ...... 4 2.1 Spacing effect ...... 4 2.2 Learning process ...... 5 2.3 Forgetting curve ...... 6 2.4 Specifics of learning foreign languages ...... 7 3 Spaced repetition software ...... 8 3.1 SuperMemo ...... 8 3.2 Anki ...... 9 3.3 Mnemosyne ...... 10 3.4 FullRecall ...... 11 4 Dril features ...... 12 4.1 Dril and other e-learning applications in IS MU ...... 12 4.2 Features ...... 12 4.3 Algorithm ...... 13 4.4 Overview of used technologies ...... 15 4.5 Future ...... 17 5 Dril guide ...... 18 5.1 Subscribe to a coursebook ...... 18 5.2 Learning ...... 18 5.3 Editing cards ...... 20 5.4 Notifying the user he has to learn ...... 20 5.5 Erasing user data ...... 21 5.6 Dril Management ...... 22 5.7 Creating and editing coursebooks ...... 22 5.8 Importing new cards ...... 22 5.9 Format of the data file ...... 24 5.10 Format of the template file ...... 25 5.11 Viewing existing lessons and cards ...... 27 5.12 How to use Dril effectively ...... 27 6 Dril statistics ...... 28 6.1 Number of users ...... 28 6.2 Average intervals ...... 29 6.3 Interval, factor and level distributions ...... 30 1 6.4 Forgetting curve ...... 30 6.5 Amount of learning material ...... 31 7 Conclusion ...... 35 A Code samples ...... 36 2 Chapter 1 Introduction This diploma thesis attends to the technique of learning called spaced repetition. Spaced repetition helps the student to memorize a great number of small and relatively indepen- dent pieces of knowledge, such as the foreign language vocabulary with as small effort as possible. Spaced repetition is using the psychological spacing effect. The spacing effect is known since the 19th century, but it could be conveniently utilized only with the advent of informa- tion technology. The reason is that the “administrative” tasks of the learning process could be done by a computer, thus liberating the student of this burden. It also enabled using of more complex algorithms, because they are no longer computed manually. The work consists of five chapters. The first chapter gives a short overview of what the spaced repetition is and how does it work. The second chapter contains a description of existing spaced repetition software. The third and fourth chapter are both devoted to Dril — a new application in the MU Information System that helps its users in learning. The third chapter contains overview of Dril features and used technology, whereas the fourth chapter provides exhaustive description of Dril. In the last chapter there are statistics of the application usage and summarized achievements of its users. 3 Chapter 2 What is spaced repetition Spaced repetition is a learning technique based on reviewing of the knowledge in increasing time intervals. It is designed to help the student memorize large amount of small indepen- dent pieces of knowledge with the following goals in mind: • Maximize amount of remembered information. • Minimize learning time. One can argue that there is no need to memorize anything — after all we have Internet and everybody can look up needed information in a few seconds. That is true. But human brain is very good at making associations and if we want it to work effectively, we have to “load the data into memory” first. Spaced repetition is intended for long-term learning as the results can appear after sev- eral weeks or even months of work. On the other hand, learning of such a great amount of material in a traditional way would be considerably more demanding and would take even more time and energy. However, spaced repetition is not suitable for learning complex material that can not be effectively broken down to smaller pieces. For example, the student could memorize the steps of a complex mathematical proof, but the “big picture” would still evade him and he will soon forget what he learned in the flood of other things. 2.1 Spacing effect Spaced repetition exploits the psychological spacing effect. What is the spacing effect and how the human memory works is described in the following citation: According to Wozniak, when we learn something for the first time we experience a slight increase in the stability and retrievability in synapses involved in cod- ing the particular stimulus-response association. In time, retrievability declines rapidly; the phenomenon equivalent to forgetting. At the same time, the stabil- ity of memory remains at the approximately same level. However, if we repeat the association before retrievability drops to zero, retrievability regains its initial value, while stability increases to a new level, substantially higher than at pri- mary learning. Before the next repetition takes place, due to increased stability, 4 2. WHAT IS SPACED REPETITION retrievability decreases at a slower pace, and the inter-repetition interval might be much longer before forgetting takes place. Two other important properties of memory should also be noted: (1) repetitions have no power to increase the stability at times when retrievability is high (spacing effect), (2) upon forgetting, stability declines rapidly. [3] In other words, too much repetitions in too short time intervals have no positive effect on better memorization. On the other hand, if the repetitions are spread too much, retrievability drops too low and the information is eventually forgotten. 2.2 Learning process There were implemented several systems utilizing the spaced repetition, but the general outline of the learning process is the same in all of them. The material that the student wants to learn is first divided into small pieces usually called items, flashcards or just cards. Each card contains a question and an answer. In one session the student is usually given cards that are scheduled for review and sev- eral new cards. The learning of one card usually looks like this: 1. The student is shown a question. 2. He has to recollect the correct answer. 3. Next, he has to evaluate whether he did or didn’t know the answer. 4. Finally, he provides a feedback of how well he knew the answer. The system then decides when the user will get the card next time. If the student knew the answer well, the card will be reviewed later than in case he didn’t know the answer. The length of intervals can span from a few days to months or even years. The computation of the review interval is the most crucial part of the algorithm. The review interval is usually calculated from the feedback the student provides and from the previously computed interval. It can involve other things too, such as the number of trials user needed to correctly answer the question or a difficulty rating of the card. Ideally, the card should be scheduled for review when the probability of recollecting the information is 90% (let’s call it retention index). This is more difficult to achieve than it sounds — the moment when the student is about to forget what he learned depends on many factors, while many of them can not be influenced or predicted. The factors include fatigue, stress, amount and type of material already learned etc. Choosing the retention index too high or too low (over 90% or below 50%) is counter- productive. If it is chosen too high, review intervals will be too short and the learning would be frustrating because of slow progress. Choosing too low retention index is useless as well because the amount of remembered information would be too low. 5 2. WHAT IS SPACED REPETITION 2.3 Forgetting curve Experiments conducted by Hermann Ebbinghaus [1] showed that the process of forgetting is exponential. The forgetting curve can be described with the following formula: R = e−a.t (2.1) Where R is the probability of remembering information, t is time and a is the relative strength of memory. Figure 2.1 is visualizing this formula. Curves show the probability of remembering in- formation after a certain time. The red curve depicts forgetting of a newly acquired informa- tion. It does not matter how many repetitions were needed in order to learn this information, only the fact that it has been learned is important. As the time progresses, the probability of remembering the information is decreasing rapidly — after five days the probability of remembering is only about 50%! But if the information is reviewed in the following days before it is forgotten (i.e. when the probability of recollecting is 90%), the memory is revived and the time interval till the next review can be increased (the subsequent forgetting curves are less steep). New information learned 1st review 2nd review 100% 90% Retention 0% 5 10 15 Time remembered (days) Figure 2.1: Illustration of a forgetting curve 6 2. WHAT IS SPACED REPETITION 2.4 Specifics of learning foreign languages Spaced repetition can be used for learning of virtually any kind of information, but most people use it for learning of foreign languages. However, it has also some drawbacks. In order for the student to learn a language, he has to master many skills. Memorization of vocabulary is only a small part of them. Spaced repetition can hardly (if at all) help the student with writing and listening skills or with correct pronunciation. Therefore students should not rely on spaced repetition as an exclusive way of learning languages. Teachers should provide students with extra material that puts the learned information into context, such as example sentences or explanatory notes when the reading of a card might not be clear. 7 Chapter 3 Spaced repetition software This chapter contains an overview of already existing spaced repetition programs. Most of them are available for multiple operating systems, some of them even provide online versions. Many are based on or at least inspired by SuperMemo. Please note that all the screenshots in this chapter are taken from their respective official websites. 3.1 SuperMemo Website: http://supermemo.com/ SuperMemo is a computer program designed by Dr. Piotr Wozniak. He has been devel- oping it for over 20 years, so an enormous amount of work has been put into it. It is available for Windows, Palm OS and Pocket PC operating systems. There is also an online version of SuperMemo at http://supermemo.net that is free to use, but most of the learning material has to be purchased. However there is some free learning material available. Figure 3.1: Screenshot of SuperMemo Piotr Wozniak developed the first version of algorithm (called SM-0) in 1985. It did not require a computer, it was a pen-and-paper method. He has been continuously improving 8 3. SPACED REPETITION SOFTWARE the algorithm, the most recent version available today is called SM-11. The descriptions of all SM algorithms are publicly available at the SuperMemo website [4]. The most recent version of SuperMemo was released in 2006 and it uses the algorithm SM-11. 3.2 Anki Website: http://ichi2.net/anki/ Anki is open-source spaced repetition program made by Damien Elmes. It uses a modi- fied SuperMemo SM-2 algorithm with changes allowing priorities on cards. It is available for Linux, Mac OS X and Windows. Anki doesn’t have a separate online version, however it supports synchronization with a free online server. Users can upload their learning material to the server and keep synchronized copies on multiple computers. Figure 3.2: Screenshot of Anki Anki has a unique way of storing data. It keeps a database for facts and another database for cards. The first one holds facts consisting of several fields, such as “Spelling of a word in foreign language”, “Pronunciation” and “Spelling of a word in native language”. The second database specifies, how the cards will be created from the facts. It is usually used to create cards that ask a translation of a word in both direction or cards that ask pronunciation of a certain word. Because the facts are all stored in one place, it is easy to make changes that affect a whole group of cards. Anki uses four levels of feedback labeled “Again”, “Hard”, “Good” and “Easy”. Users can see in advance, what will be the next interval if they would choose a particular button. 9 3. SPACED REPETITION SOFTWARE 3.3 Mnemosyne Website: http://fullrecall.com/ Mnemosyne is open-source program as well. Because it is written in Python, it is avail- able for Windows, Linux or Mac OS X. It is based on SuperMemo SM-2 algorithm too. There are some modifications regarding handling of early and late repetitions and slight randomization of computed intervals. It also allows to organise cards into categories. The categories can be activated and deactivated at will so users are able to choose what they learn. Figure 3.3: Screenshot of Mnemosyne Users can upload their learning statistics to the project’s server and participate in a long- term memory research project. Authors of Mnemosyne claim they will use acquired statisti- cal data to get insight to how the human memory works and improve the algorithm. 10 3. SPACED REPETITION SOFTWARE 3.4 FullRecall Website: http://fullrecall.com/ FullRecall is a proprietary program available for Linux, Mac OS X, FreeBSD, Windows, Pocket PC and Maemo (Nokia Internet Tablets). There seems to be an online version of FullRecall, but at the time of writing this work it was not functional. Figure 3.4: Screenshot of FullRecall FullRecall uses unique algorithm for scheduling cards that is based on artificial neural networks. 11 Chapter 4 Dril features The practical part of this thesis is Dril — a new application in the MU Information Sys- tem that enables its users to learn using spaced repetition. Dril is available in the Personal Administration of MU Information System under the link Dril in the menu to the left, or directly at https://is.muni.cz/auth/dril/ . 4.1 Dril and other e-learning applications in IS MU The MU Information System already contains many applications that are related to e-learning, but most of them are “teacher-driven”. That means the teacher controls what material stu- dents get and ensures that students use it properly (or at all). One example of this approach is the Revision, Opinion Poll and Testing Agenda. The teacher is the main motivator here: he creates a test that students have to (or should) pass in order to pass the course. It assumes that the student is more or less a passive participant of the learning process and has to be motivated by the teacher to do something. Dril is more “student-driven” than the Testing agenda. It presumes that the student wants to learn something, understands the costs in time and effort and is willing to un- dertake them. In order to reap the fruits of learning, one has to learn for several weeks or months. This obviously goes beyond “one term — one course” paradigm that most students are used to. 4.2 Features Dril has many features that are unique or unusual and alternative software presented in Chapter 2 is missing them. They include: • User created lecture books — Users can create their own lecture books and can share them with others. • Card customization — Users can create their own version of cards that contain e.g. example sentences. • Support for related cards — Cards that contain the same information are grouped and presented to the user on different days. The groups can contain arbitrary number of cards. 12 4. DRIL FEATURES • Possibility to learn in advance — User can choose to learn cards scheduled for the next day or cards that are about to be scheduled in coming days. • Handling of late reviews — If the user doesn’t answer card on time, Dril will still behave correctly. • Reminder — IS will notify the user that he or she has outstanding cards to review. • Banning of cards — Users can ban cards they do not want to learn. • Card contents — Contents of cards are not limited to simple plaintext. Cards can be composed of: – HTML — including text formatting, images, sounds and videos. – LATEX code — cards can contain LATEX formatted text. – Randomly permuted list of items, e.g. enumeration of possible answers. 4.3 Algorithm The learning process consists of three phases: • Review phase — reviewing cards that are after deadline. • New cards phase — the user is given a set of new cards. • Drilling phase — only cards that the user didn’t know well in the previous two phases are given. The cards are being given repeatedly until the user answers them correctly. Apart from that, if there are any cards scheduled for drilling phase older than 10 minutes, they are presented immediately regardless of what phase the user is currently in. This is good especially in case there are many cards in the current phase and the user would forget the card he didn’t know by the time he gets to the drilling phase. Selecting cards for review Basically, the procedure that selects cards for review is quite simple. If the card’s previous review date plus its interval is more recent than the present time, it will be included in the selection. Also only the cards that are not marked as banned by the user are selected and there must not exist a related card that has been answered in the last 12 hours. 13 4. DRIL FEATURES Calculating the time of next review When the user sends feedback, the application has to schedule the card for next review. But there is a problem: How to calculate the length of review interval? The next review should be scheduled when there is about 90% chance of remembering the card or, in other words, the user can remember at least 90% of cards learned. Dril is based on SM-2 algorithm [5]. The procedure for calculating the new interval takes into consideration these parameters: • user feedback, it tells how well was the answer recollected, • number of wrong answers before the user knew the correct answer, • card factor (represents the card difficulty, it is associated with both the card and user), • previous interval of the card, • the real interval of the review. The main and most influential parameter is the feedback value. It is without doubt that if the user knew the correct answer, the application should calculate longer interval than in case he didn’t know the answer well. The real interval of the review is used to determine whether the card was reviewed pre- maturely, on time or too late. If the card was reviewed prematurely and the user didn’t know the answer, we can assume that the previous interval was too long and it is shortened (be- cause if the user didn’t know the answer now, he certainly wouldn’t know it later). If he knew the answer, it still wouldn’t tell us whether he would know it at the time the review was scheduled. Therefore the calculated interval will be the same as before. If the card was reviewed too late after deadline, the situation would be similar. In case the card was reviewed on time, we can tell whether the previous interval was too long or too short. So the procedure can recalculate the interval according to the feedback provided. The card factor is another variable used in calculation of the new interval. It represents the difficulty of the card, or rather “easiness”, because the lesser the value is, the harder the card is. The new cards of course do not have any interval or factor values that can be used in calculation of new ones. Therefore these cards receive a pre-defined values. Related cards Related cards are cards that contain the same information but, generally speaking, present it from different angles. For example, cards that ask the user to translate a word into for- eign language and back (bi-directional translation) are related cards. Groups can contain arbitrary number of cards even from different lessons. 14 4. DRIL FEATURES There is a reason for special handling of related cards. The algorithm that selects cards for review ensures that the student will see at most one card from the same group of related cards in one day. If it were otherwise, learning would be less effective because of the spacing effect: The student would have the information already in his short-term memory. 4.4 Overview of used technologies The fact that IS is a web portal narrowed the selection of technologies that could be used in Dril. Dril uses an Oracle database for storing data, Perl for server-side application logic and HTML with CSS styles for presenting user interface. However, the simple web application using only HTML would not suffice — Dril has to provide seamless user interface and fast response times, and if possible, without page reloading after every action. The reason is obvious: the users wouldn’t want to wait for the page to load after every answered card, especially if they are using mobile devices with low bandwidth.. This is the place where AJAX1 comes in. It allows creating dynamic web applications that immediately react to users’ actions and can request additional information off the server, without reloading the page every time. AJAX Most modern web browsers provide an interface for making asynchronous HTTP requests in their DOM2 API, called XMLHttpRequest object. Specification of this object and its prop- erties were published as a Working draft by the World Wide Web Consortium on 15 April 2008 [6]. The figure 4.1 shows the usage of the XMLHttpRequest object. First we have to create an instance of XMLHttpRequest object and then assign the onreadychange event han- dler. This handler is called after every change in the request state which is reflected by the readyState property. Comments in the code example explain the states of the request. After the request completed, the response is available in the responseText property. If the response is a XML document, the responseXml property contains a proper Document object containing parsed XML. It is usually a good idea to check the status property con- taining the HTTP status code. Dril doesn’t use XML in communication between client and server. It uses JSON [2] in- stead. JSON (JavaScript Object Notation) is a simple plain-text data format that can be used to encode arbitrary data structures. Its syntax is compatible with that of Javascript, so pars- ing is simple (a Javascript eval suffices). There are no special methods needed for working with parsed data, the resulting value is a classic Javascript object. JSON can express arbitrary data structures, including lists, associative lists, simple data types (number, string, boolean) or any combination of thereof. Associative list is expressed 1. Asynchronous Javascript and XML 2. Document Object Model 15 4. DRIL FEATURES http_req = new XMLHttpRequest(); http_req.onreadystatechange = function() { if (http_req.readyState == 0) { // The request is not initialized } else if (http_req.readyState == 1) { // The request has been set up } else if (http_req.readyState == 2) { // The request has been sent } else if (http_req.readyState == 3) { // The request is in progress } else if (http_req.readyState == 4) { // The request is complete if (http_req.status == 200) { // Handle response alert (http_req.responseText); } } }; http_req.open("GET", url, true); http_req.send(); Figure 4.1: Working with the XMLHttpRequest object using braces with key-value pairs delimited with a colon, while list is denoted as a comma separated list of values enclosed in square brackets. Complete syntax reference is available at [2]. The figure 4.2 contains an example of a simple data structure encoded with JSON along with a demonstration of how to parse it and work with it. var json_text = ’{ "NUMERIC_VALUE" : 1234, "STRING_VALUE" : "Winston Smith", "LIST" : [ 3.14, "Hello world" ] }’; var json_obj = eval("(" + json_text + ")"); var numeric_value = json_obj.NUMERIC_VALUE; Figure 4.2: Working with a JSON encoded data structure Database Dril is using the Oracle database for storing data. The figure 4.3 shows the database schema used in Dril. 16 4. DRIL FEATURES Figure 4.3: Schema of database tables Cards are organized into a three tier hierarchy. Every card belongs to a lesson, while each lesson is part of a lecture book. Lecture books are assigned to an area. Examples of the areas can be languages — English, Spanish, German etc. This kind of hierarchy is necces- sary, because otherwise users would soon lost track in a flood of coursebooks and lessons. Aditionally, cards are grouped to form groups of related cards. 4.5 Future There are many planned improvements to the application, for example: • Statistics — More statistics directly in the application. For example, users will be able to see their own forgetting curve. • User collaboration — For example users would be able to see each others’ changes to the cards and cherry-pick changes they like. • Algorithm improvement — Further improvement of the algorithm that calculates op- timal review intervals. • Simpler card management — Revise card management and make it easier to use. 17 Chapter 5 Dril guide This chapter describes features of Dril in more detail and documents most of the operations currently present in Dril. 5.1 Subscribe to a coursebook If the user wants to start using Dril, he has to subscribe to a coursebook first. This is accom- plished by clicking on the link Aktivovat ucebniceˇ (Subscribe coursebook) on the title page and choosing the area and coursebook of interest. Then the user chooses which lessons does he want to subscribe. There are three tabs on this page, labeled Aktivace lekc´ı, Reaktivace lekc´ı and Deaktivace lekc´ı (Subscribe lessons, Resubscribe lessons and Unsubscribe lessons, respectively). On the Resubscribe tab the user can resubscribe lessons whose contents had changed. The recognized changes are card addition and deletion. The number of added and deleted cards is shown in the table next to the lesson name. The Unsubscribe tab lists subscribed lessons, the user can unsubscribe them if he doesn’t want to learn them anymore. When unsubscribing, user’s data concerning the unsubscribed lesson is not erased from the database. It is merely flagged as inactive, so if the user sub- scribes the same lesson again, his data will be restored. The list of coursebooks that contain subscribed lessons is shown on the title page in the section Aktivovane´ ucebniceˇ (Subscribed coursebooks) along with additional details such as user’s progress in the coursebook, average card interval or count of people that are learning from the same coursebook. 5.2 Learning The most important part of Dril is the learning application. It is accessible from the title page in the section Ucitˇ se (Start learning). Clicking on the area title will take the user directly to the learning application, whereas clicking on the link detaily (details) will take him to the page where he can choose what to learn in more detail. The learning application UI is divided into two parts: to the left are boxes for the question and the answer and a row of feedback buttons, to the right is a panel with boxes containing various information about learning phases, current and previous card and available actions. 18 5. DRILGUIDE Figure 5.1: Subscribe lessons page The left part contains initially only the question and the Show answer button. After the user recalls the answer in his mind, he can click on the button and the correct answer is re- vealed. Next he has to evaluate his answer and click on one of the feedback buttons labeled 1 to 6. The meaning of the buttons is explained in the table below the buttons. The applica- tion then processes the feedback and displays next question. If there are no more cards in the current phase, the user is taken to the What to learn page. More experienced users can use keyboard to control the application: keys 1-6 or the top- row keys QWERT[YZ] select the respective feedback button, Space or Enter confirm the selection. The keys 1 and 2 work even while the answer is hidden so easy cards wouldn’t waste user’s time. The topmost box in the right part of the screen contains information about the current learning phase and a number of remaining cards. There are three learning phases in Dril: Review, New cards and Drilling phase. The first two are quite self-explanatory: in the Re- view phase Dril presents outstanding cards, in the New cards phase there are given cards the user had not seen yet. The Drilling phase is initially empty, but it eventually contains failed cards from the previous two phases. The cards in the Drilling phase are repeatedly 19 5. DRILGUIDE Figure 5.2: Screenshot of a “What to learn” page presented to the user in random order until he answers them correctly. Additionally, if there are cards in the Drilling phase older than 10 minutes, they are presented to the user imme- diately regardless of the phase he is currently in. One of the next boxes in the right panel contains available operations. Currently users can edit the card or ban it if they don’t want to see it anymore. 5.3 Editing cards All users can edit the card, if they think it is unclear or incorrect. All the changes users make are private — they are visible only to the user who made the change. The author of the coursebook can decide to edit the card globally. In this case the change will be visible to everybody. Links to edit the card are available either while learning in the right-hand box or in the Dril management. 5.4 Notifying the user he has to learn Because users should attend to Dril every day, there is a need to notify them they have scheduled cards for review. The notification should be bold enough so users will notice it, but at the same time it should not be annoying. 20 5. DRILGUIDE Figure 5.3: Screenshot of learning in progress The notification has been implemented as a highlighted link to the application in the left hand menu (see figure 5.4). 5.5 Erasing user data Users can erase all their data and statistics tied to a particular area if they decide so. The option is available on the title page, under the link Smazat me´ udaje´ o oblasti (Erase my data). 21 5. DRILGUIDE Figure 5.4: Screenshot of active and inactive notification 5.6 Dril Management The title page of Dril Management contains a list of existing coursebooks which can be fil- tered by area and coursebook author. Users can see only their own or public coursebooks. This page provides links to other parts of application, such as creating and editing course- books, importing cards to lessons etc. 5.7 Creating and editing coursebooks All users can create their own coursebooks and share them with other users. The coursebook creation form is displayed after clicking on the link Vytvoritˇ ucebniciˇ (Create coursebook) on the Dril Management title page. The required fields are only the first two — the name of the coursebook and the area it will belong to. The rest is optional, however often useful. The field Popis ucebniceˇ (Description) should contain a short description of a course- book. This description is shown in various places across the application. In the following field authors can also specify HTML code that will be displayed in the learning application when the user encounters the card from this coursebook. The last two fields contain Javascript code that will be evaluated before the question or the answer is displayed when learning — that is useful in implementing various “ex- tra” features, such as including a multimedia files in question or answer. There are already coursebooks that use these fields to provide automatic pronunciation of foreign words. The form for editing a coursebook contains also a checkbox for publishing the course- book. The coursebook is private by default, if the user wants others to see and learn from it, he has to publish it first. 5.8 Importing new cards One of the most complex parts of Dril is the card import application. It is used for creating lessons and importing new cards to coursebooks. The import application uses two types of input: data files that contain data from which the cards are constructed and template files that specify how will be the cards created. One can compare it to the calling of a function in imperative programming languages — the 22 5. DRILGUIDE Figure 5.5: Dril can play audio files as well template is the function and data are the parameters passed to the function. The application had to be as easy to use as possible so it could be used by ordinary users, but at the same time, it had to take into account power users who decide to use some or all of the more advanced features. This is achieved by making the template optional. Every user has to write data files, but only advanced users have to write template files. Looking at generated cards or dry-run The dry-run option is an extremely useful feature because it allows users to see the cards that will be imported to the database in advance, without changing anything in the database. Users can experiment with templates or data files without being afraid that they will break something. This feature is available as a checkbox in the card import form. Overwriting existing cards If the user wants re-import cards because he changed the template, he must check the re- spective checkbox in the card import form. However the overwrite doesn’t work well when the data file has been changed as well. In this case there will be created redundant cards with similar values and the user is responsible for deleting obsolete ones. Error handling While importing cards, there are many situations where something can go wrong, especially concerning the user input. Writing template and data file can be error-prone even for an experienced user and the application should not dissuade him with bad error handling. If an error occurs, it is shown as a message containing as much information as possible, so the user is able to fix the error easily. Dril makes these checks while processing the input files: • Template file validation — the application checks whether the template file contains 23 5. DRILGUIDE valid data (using a RelaxNG schema). • Data file validation — checks whether the records in the data file reference existing templates, whether they contain correct number of parameters etc. • Sanity of generated cards — whether the cards are not being imported into existing lessons when the user requested that a new lesson should be created, whether ques- tion/answer is not an empty string etc. • Conflicts — ensures that the generated cards are not already in the database. • Duplicates — checks if the generated cards contain duplicates. 5.9 Format of the data file Simple data file In the simplest case the import application requires only the data file. The data file in this case looks like this: apple apple jablko Question: jablko apple
Question:
orange
Answer:pomeranc
ˇQuestion:
pomeranc
ˇ Answer:orange
Additionaly, the cards in a pair are marked as related. More on this subject will be given later, but in a shortcut: when the user is learning related cards, he gets only one of them in one learning session.Data file for use with template
In case the user wants to use the template, the format of data file is a bit different. The first value on the line has to be the name of the template, then follow arguments of the template.
24 5. DRILGUIDE
The previous example rewritten to use the template named template1 would look like this: template1
Line wrapping
Lines of the data file can be wrapped using the backslash \ character. This is useful in case the user wants to preserve text formatting (for use in code snippets, for example) or he just doesn’t like too long lines for the sake of clearness. For example: template1
Note that if the backslash character is not the last character on the line, you will get unexpected results.
5.10 Format of the template file
If the user wants to use more advanced features of Dril, such as custom HTML formatting of questions and answers, he has to write a template. Template is a XML file that specifies how will be the cards created.
Element
The root element of the template is
Element
This element specifies, what will be done with one line of the data file. Element has two required attributes: name Name of the template. params Number of parameters the function takes.
Element
The elements contain one or more elements
25 5. DRILGUIDE
lesson Name of the lesson the card will be imported to. There can be cards that are related but don’t belong to the same lesson. uniq This value is used in detection of similar cards. Basically it should have the same value as the question of the card, but without HTML format- ting and constant strings. Users usually don’t need to specify this at- tribute, there is a reasonable default value created automatically.
Elements
The element Cards containing scientific notation Some coursebook authors might want to create cards containing scientific notation (such as mathematical formulas). Scientific notation can be included using the special element Cards containing the randomly permuted list of items Cards can contain a randomly permuted list of items, for example a list of possible answers. It is included using the element Including sounds and videos Authors of coursebooks can include sounds or videos in the cards. The exact way of how it will be accomplished depends on the particular coursebook authors, but generally the play- ers will use be implemented using HTML elements 26 5. DRILGUIDE special HTML element identified by id attribute that serves as a placeholder for the player. This element will be substituted by the player just before the respective question or answer is shown. It also enables to separate the method of reproduction from the URL of the sound file. Moreover, if the player is hardcoded to the cards and user lists these cards on the same page, sounds from all the cards would start playing at the same moment, which is unacceptable behaviour. An example describing how to implement a sound player with 5.11 Viewing existing lessons and cards The existing lessons and cards can be viewed in Dril Management using the link Zobrazit lekce (Show lessons). Users can view or edit the cards here, but the changes are visible only to them. Authors of the coursebooks have more options — they can also delete, undelete or edit cards. 5.12 How to use Dril effectively If the student wants to maximize the effectiveness of learning, he should use Dril on a daily basis. If he attends to it once a month, he is just wasting time. A typical daily routine should look like this: 1. Pass the Review phase if there are any cards scheduled for review. If the user has too much cards in this phase because he didn’t learned for a couple of days, he can leave some to the next day. In this case he shouldn’t pass the New cards phase. 2. Pass the New cards phase at least once (user can choose how many new cards does he want). Optimal number of new cards is 20-40 per day. 3. Finish by passing the Drilling phase. This phase should be passed only once per day, if possible. However, it should not dissuade the user to pass the New cards phase again even after he has pased the Drilling phase if he has time later on that day. 27 Chapter 6 Dril statistics Because the learning progress of each user is logged, various interesting statistics can be created. All the statistics and graphs are current to the date of compilation of this work. 6.1 Number of users At the time of writing of this work Dril had a total of 4561 users, while 3869 users have answered at least one card and 1942 users have answered at least 100 cards. The graph on figure 6.1 shows the number of answered cards per month since Dril had been launched. Because Dril was launched on Nov 9th 2008, the count in this month is substantially lower than in the rest of the months. The count in April 2009 is almost two times higher then the counts in previous months because the link to Dril has been added to the title page of IS MU and supposedly many people noticed the application’s existence. 1000000 800000 600000 400000 200000 Number of cards answered 0 1. 2009 2. 2009 3. 2009 4. 2009 5. 2009 11. 2008 12. 2008 Figure 6.1: Number of cards answered monthly since Dril was launched Figure 6.2 shows number of users by areas and faculties (including MU institutes). As you can see, more than 50% of all users are learning English language and most users come 28 6. DRIL STATISTICS from the Faculty of Informatics and Faculty of Arts. 3500 3000 Users who have answered at least one card Total users 2500 2000 1500 1000 Number of users 500 0 £ ¢ ¢ ¢ ¢ ¢¢ ina ¢ ¢ tina tina Jin ¢tina tina tina tina tina tina tina tina ¡ ¡m Latina l Ru Fin N Nor Angli ¤pan Japon Bulhar Hebrej Rumun Francouz 1000 Users who have answered at least one card 800 Total users 600 400 Number of users 200 0 MU ¡ RMU F MU s MU FI MU FF MU LF MU ¢ PrF MU VT MU ESF MU P PdF MU CJV MU IBA MU FSS MU CUS MU CZS MU SKM MU FSpS MU SUKB MU Teiresi Figure 6.2: Number of users by areas, faculties and institutes 6.2 Average intervals Average interval tells us indirectly if the users learn regularly and if the learning is effective. The longer average interval is sign of a better average retention among users. Figure 6.3 shows average interval values by areas and by faculties and institutes. 29 6. DRIL STATISTICS 6.3 Interval, factor and level distributions Three graphs on figure 6.4 show distributions of interval, factor and level (Level of a card is the number of times the user has seen the card). The interval distribution graph contain peaks at the values 2.5, 4 and 5. These are the most common intervals computed for the new cards. The gradient of values around the peaks is caused by fuzziness function applied to the computed interval values. The function ensures that the computed values are varied slightly. Similar peaks and gradients are found in the factor distribution graph on values 1.3, 2, 3 and 3.2. The level distribution graph shows (unsurprisingly) that most cards have level 1, then follow cards with level 2 and so on. Most of the cards will never reach levels higher than 10, because the review intervals are already too large (several months or years). 6.4 Forgetting curve Chapter 1 presented the idea of a forgetting curve. In this section I will try to construct the forgetting curve from the data acquired by Dril. The constructed forgetting curve should reveal the quality of the algorithm computing optimum review intervals. Because Dril is logging each received feedback into the database, the log already contains thousands of feedbacks. The logs (concerning one person) are in format: Card ID Date and time Feedback 1 20.11.2008 20:11:35 6 1 20.11.2008 20:14:35 4 1 20.11.2008 20:14:58 1 2 22.11.2008 10:34:25 3 2 22.11.2008 10:36:47 2 For the sake of constructing the forgetting curve the following format is sufficient. The log records can be easily converted to it. Card ID Interval since the last review (days) State 1 2 + 2 3 + 3 6 - 4 8 + 5 3 - 6 7 + The converted log records are saying whether the student did or didn’t know certain card after a certain time since the last review (or since the initial learning). In order to plot a forgetting curve, it is neccessary to pick a set of cards that share a common property, such as number of repetitions needed to initially learn the card. 30 6. DRIL STATISTICS As stated before, the forgetting curve can be described using the formula: R = e−a.t (6.1) So the problem of plotting the forgetting curve is reduced to finding a suitable a that will correspond to the recorded data. What is one recorded feedback saying about the student’s knowledge about the card? If the student’s feedback is 1 or 2 (he did know the answer), it can be assumed that he knew the answer before as well, all the way to the previous review. But it doesn’t say whether he knew it afterwards — he might remember it for ages or he could forget it in a matter of minutes. If the feedback is more than 2 (the student didn’t know the answer), the situation is opposite. This seems like a good algorithm that will plot a correct forgetting curve. But it is false. It uses only a part of available information, it is ignoring the parts where the level of student’s knowledge cannot be evaluated clearly. Ignoring this information is rapidly deforming the curve. To calculate the curve properly, more advanced means have to be involved. Figure 6.5 contains the listing of a script in R programming language that is computing the desired value of a. The script is reducing the problem of finding a to optimization problem, i.e. maximizing the function tomax. The function tomax tries to fit the recorded data to the approximated forgetting curve. By maximizing the function result best fitting value is com- puted. Finally, the forgetting curve can be plotted. Figure 6.6 contains three forgetting curves. The chosen set of cards satisfies these conditions: • Initially 3 repetitions were needed to learn the card. • The first review feedback was 3. • The second review feedback was 1 or 2. 6.5 Amount of learning material Most of the coursebooks existing in Dril are created by students themselves. Totally there are 229 coursebooks, including 56 public coursebooks. All the coursebooks contain totally 724 lessons. There are 81928 cards grouped in 47137 groups. The latter is also the number of unique pieces of information that the people can learn using Dril. 31 6. DRIL STATISTICS 45 40 35 30 25 20 15 Average interval 10 5 0 ¤ ¡¡¡ ¡ ¡ ¡ ¡ ina ¡ tina tina ¡tina Jin tina tina tina tina tina tina tina £ l Latina £m Fin Ru Nor N Angli ¢pan Japon Bulhar Hebrej Rumun Francouz 120 100 80 60 40 Average interval 20 0 MU ¢RMU F MU s MU FI MU FF MU LF MU¡ PrF MU VT MU P IBA MU PdF MU CJV MU ESF MU CUS MU FSS MU SKM MU CZS MU FSpS MU Teiresi Figure 6.3: Average interval values by areas, MU faculties and institutes 32 6. DRIL STATISTICS 25000 20000 15000 Count 10000 5000 0 2 3 4 5 6 7 8 9 10 Intervals 60000 50000 40000 30000 Count 20000 10000 0 1 2 3 4 5 6 Factors 350000 300000 250000 200000 150000 Count 100000 50000 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 33 34 37 38 Levels 33 Figure 6.4: Distribution of intervals, factors and levels 6. DRIL STATISTICS tomax <- function(a, data) { res <- function(xy) { if (as.character(xy[2]) == "+"){ punif(exp(-a*as.numeric(xy[1])), 0, 1) } else{ 1 - punif(exp(-a*as.numeric(xy[1])), 0, 1) } } resf <- apply(data, 1, res) resf <- sum(log(resf)) return(as.numeric(resf)) } optimexp <- function(file, bounds=c(0, 10), tolerance=0.001) { data <- read.csv(file, header=FALSE, quote="’") res <- optimize(tomax, interval=bounds, tol=tolerance, maximum=1, data) return(res) } Figure 6.5: Script for computation of a 1.0 0.8 0.6 0.4 0.2 % of cards remembered 0.0 0.00 3.00 6.00 9.00 12.00 15.00 18.00 21.00 24.00 27.00 30.00 Figure 6.6: Forgetting curves 34 Chapter 7 Conclusion Although spaced repetition is known for almost 30 years, there are not many schools or universities using it. This work has shown, what the spaced repetition is and that it can be integrated into the university environment. The practical part of this work is Dril, an application in the university Information sys- tem. Several months of operation showed that Dril is quite successful. There are many users who use it on a regular basis and many users who create their own coursebooks and share them with others. Dril is still being improved so users can expect new features in the future. Further research efforts could be directed towards analysis of acquired feedback data. Among other things, results of such analysis could be used to improve the card scheduling algorithm or to better comprehend, how the human memory works. 35 Appendix A Code samples $1 $2 Figure A.1: Example of a template file 36 A.CODESAMPLES Question: Translate into Czech: apple jablko Question: Translate into English: jablko apple Question: Translate into Czech: orange pomeranc Question: Translate into English: pomeranc orange Figure A.2: Example of cards generated using a template Data file: What is the formula for mass-energy relationship? Figure A.3: Example of a data file and template defining a LATEX formatted card 37 A.CODESAMPLES Data file: How do you list contents of directory in UNIX? Figure A.4: Example of a data file and template defining a card with a randomly permuted list Question or answer has to contain: Figure A.5: How to implement a sound player in the card 38 Bibliography [1] Hermann Ebbinghaus. Memory: A Contribution to Experimental Psychology. Teachers College, Columbia University, 1885. [2] Json website. Available from: http://www.json.org/. [3] J. Kowalski. Forget about forgetting. 1994. Available from: http://supermemo.com/ articles/kowal.htm#Retrievability. [4] SuperMemo website. Available from: http://supermemo.com/english/ contents.htm#Articles. [5] SuperMemo website. Available from: http://supermemo.com/english/ol/sm2. htm. [6] The XMLHttpRequest Object, W3C Working Draft 15 April 2008. Available from: http: //www.w3.org/TR/XMLHttpRequest/. 39