MASARYKOVA UNIVERZITA FAKULTA INFORMATIKY

Computer-assisted language learning using spaced repetition

DIPLOMA THESIS

Bc. Daniel Keder

Brno, 2009

Declaration

I declare that this diploma thesis is my original work and I have worked it out on my own. Any sources of information and literature used in creating this document are properly cited in the bibliography part of this document.

Advisor: doc. Ing. Michal Brandejs, CSc.

Acknowledgement

I would like to thank my advisor, doc. Ing. Michal Brandejs, CSc., and the whole IS MU development team, especially Mgr. Jan Kasprzak and RNDr. Miroslava Misáková. Without their support and advice this work would never have come into being.

Keywords

spaced repetition, language learning, Dril, Information System MU

Abstract

This thesis deals with the learning technique called spaced repetition. Spaced repetition helps the student to memorize a great number of small and relatively independent pieces of knowledge, such as foreign language vocabulary. It utilizes the psychological spacing effect.

Contents

1 Introduction
2 What is spaced repetition
  2.1 Spacing effect
  2.2 Learning process
  2.3 Forgetting curve
  2.4 Specifics of learning foreign languages
3 Spaced repetition software
  3.1 SuperMemo
  3.2 Anki
  3.3 Mnemosyne
  3.4 FullRecall
4 Dril features
  4.1 Dril and other e-learning applications in IS MU
  4.2 Features
  4.3 Algorithm
  4.4 Overview of used technologies
  4.5 Future
5 Dril guide
  5.1 Subscribe to a coursebook
  5.2 Learning
  5.3 Editing cards
  5.4 Notifying the user he has to learn
  5.5 Erasing user data
  5.6 Dril Management
  5.7 Creating and editing coursebooks
  5.8 Importing new cards
  5.9 Format of the data file
  5.10 Format of the template file
  5.11 Viewing existing lessons and cards
  5.12 How to use Dril effectively
6 Dril statistics
  6.1 Number of users
  6.2 Average intervals
  6.3 Interval, factor and level distributions
  6.4 Forgetting curve
  6.5 Amount of learning material
7 Conclusion
A Code samples

Chapter 1 Introduction

This diploma thesis deals with the learning technique called spaced repetition. Spaced repetition helps the student to memorize a great number of small and relatively independent pieces of knowledge, such as foreign language vocabulary, with as little effort as possible. Spaced repetition uses the psychological spacing effect.

The spacing effect has been known since the 19th century, but it could be utilized conveniently only with the advent of information technology. The reason is that the “administrative” tasks of the learning process can be done by a computer, which frees the student from this burden. It also made it possible to use more complex algorithms, because they no longer have to be computed by hand.

The main body of the work consists of five chapters. Chapter 2 gives a short overview of what spaced repetition is and how it works. Chapter 3 describes existing spaced repetition software. Chapters 4 and 5 are both devoted to Dril, a new application in the MU Information System that helps its users in learning: Chapter 4 contains an overview of Dril's features and the technology used, whereas Chapter 5 provides an exhaustive description of Dril. Chapter 6 presents statistics of the application's usage and summarizes the achievements of its users.

Chapter 2 What is spaced repetition

Spaced repetition is a learning technique based on reviewing knowledge at increasing time intervals. It is designed to help the student memorize a large amount of small independent pieces of knowledge with the following goals in mind:

• Maximize the amount of remembered information.

• Minimize learning time.

One can argue that there is no need to memorize anything: after all, we have the Internet and anybody can look up the needed information in a few seconds. That is true. But the human brain is very good at making associations, and if we want it to work effectively, we have to “load the data into memory” first.

Spaced repetition is intended for long-term learning, as the results may appear only after several weeks or even months of work. On the other hand, learning such a great amount of material in a traditional way would be considerably more demanding and would take even more time and energy.

However, spaced repetition is not suitable for learning complex material that cannot be effectively broken down into smaller pieces. For example, the student could memorize the steps of a complex mathematical proof, but the “big picture” would still evade him and he would soon forget what he had learned in the flood of other things.

2.1 Spacing effect

Spaced repetition exploits the psychological spacing effect. The following citation describes what the spacing effect is and how human memory works:

According to Wozniak, when we learn something for the first time we experience a slight increase in the stability and retrievability in synapses involved in coding the particular stimulus-response association. In time, retrievability declines rapidly; the phenomenon equivalent to forgetting. At the same time, the stability of memory remains at the approximately same level. However, if we repeat the association before retrievability drops to zero, retrievability regains its initial value, while stability increases to a new level, substantially higher than at primary learning. Before the next repetition takes place, due to increased stability, retrievability decreases at a slower pace, and the inter-repetition interval might be much longer before forgetting takes place. Two other important properties of memory should also be noted: (1) repetitions have no power to increase the stability at times when retrievability is high (spacing effect), (2) upon forgetting, stability declines rapidly. [3]

In other words, too many repetitions at too short intervals have no positive effect on how well the information is retained. On the other hand, if the repetitions are spread too far apart, retrievability drops too low and the information is eventually forgotten.

2.2 Learning process

Several systems utilizing spaced repetition have been implemented, but the general outline of the learning process is the same in all of them.

The material that the student wants to learn is first divided into small pieces, usually called items, flashcards or just cards. Each card contains a question and an answer.

In one session the student is usually given the cards that are scheduled for review and several new cards. The learning of one card usually looks like this:

1. The student is shown a question.

2. He has to recollect the correct answer.

3. Next, he has to evaluate whether he did or didn’t know the answer.

4. Finally, he provides feedback on how well he knew the answer. The system then decides when the user will get the card next time.

If the student knew the answer well, the card will be reviewed later than if he didn’t know the answer. The length of the intervals can range from a few days to months or even years.

The computation of the review interval is the most crucial part of the algorithm. The review interval is usually calculated from the feedback the student provides and from the previously computed interval. It can involve other things too, such as the number of trials the user needed to answer the question correctly or a difficulty rating of the card.

Ideally, the card should be scheduled for review when the probability of recollecting the information is 90% (let’s call it the retention index). This is more difficult to achieve than it sounds: the moment when the student is about to forget what he learned depends on many factors, many of which cannot be influenced or predicted. The factors include fatigue, stress, the amount and type of material already learned etc.

Choosing a retention index that is too high or too low (over 90% or below 50%) is counterproductive. If it is too high, review intervals will be too short and learning will be frustrating because of slow progress. Choosing too low a retention index is useless as well, because the amount of remembered information would be too low.


2.3 Forgetting curve

Experiments conducted by [1] showed that the process of forgetting is exponential. The forgetting curve can be described with the following formula:

$$R = e^{-a \cdot t} \tag{2.1}$$

where R is the probability of remembering the information, t is time and a is the relative strength of memory.

Figure 2.1 visualizes this formula. The curves show the probability of remembering a piece of information after a certain time. The red curve depicts the forgetting of newly acquired information. It does not matter how many repetitions were needed in order to learn this information; only the fact that it has been learned is important. As time progresses, the probability of remembering the information decreases rapidly: after five days it is only about 50%! But if the information is reviewed in the following days before it is forgotten (i.e. when the probability of recollecting it is 90%), the memory is revived and the time interval until the next review can be increased (the subsequent forgetting curves are less steep).

Figure 2.1: Illustration of a forgetting curve. [Plot of retention (0% to 100%, with the 90% level marked) against time remembered (0 to 15 days), with curves for newly learned information, the 1st review and the 2nd review.]
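The relationship in formula (2.1) can be checked numerically. The following is a minimal sketch, not part of Dril: the function names are made up for illustration, and the value of a is an assumption calibrated from the statement that retention is about 50% after five days. Note that in this form a larger a means faster forgetting.

```javascript
// Forgetting curve from formula (2.1): R = e^(-a * t).
// `a` controls how fast retention decays; `t` is time in days.
function retention(a, t) {
  return Math.exp(-a * t);
}

// Solve R = e^(-a * t) for t: the time at which retention falls to `target`.
function timeToRetention(a, target) {
  return -Math.log(target) / a;
}

// Assumed calibration (read off Figure 2.1): retention is about 50% after 5 days.
const a = Math.log(2) / 5;

console.log(retention(a, 5).toFixed(2));         // 0.50
console.log(timeToRetention(a, 0.9).toFixed(2)); // 0.76
```

Under this assumed calibration, newly learned information would reach the 90% retention index after roughly three quarters of a day, which illustrates why the first review has to come soon.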


2.4 Specifics of learning foreign languages

Spaced repetition can be used for learning virtually any kind of information, but most people use it for learning foreign languages. In that setting, however, it also has some drawbacks. In order to learn a language, the student has to master many skills, and memorizing vocabulary is only a small part of them. Spaced repetition can hardly (if at all) help the student with writing and listening skills or with correct pronunciation. Therefore students should not rely on spaced repetition as the only way of learning a language. Teachers should provide students with extra material that puts the learned information into context, such as example sentences, or explanatory notes where the reading of a card might not be clear.

Chapter 3 Spaced repetition software

This chapter contains an overview of already existing spaced repetition programs. Most of them are available for multiple operating systems, some of them even provide online versions. Many are based on or at least inspired by SuperMemo. Please note that all the screenshots in this chapter are taken from their respective official websites.

3.1 SuperMemo

Website: http://supermemo.com/

SuperMemo is a computer program designed by Dr. Piotr Wozniak. He has been developing it for over 20 years, so an enormous amount of work has been put into it. It is available for the Windows, Palm OS and Pocket PC operating systems. There is also an online version of SuperMemo at http://supermemo.net that is free to use, but most of the learning material has to be purchased, although some free learning material is available.

Figure 3.1: Screenshot of SuperMemo

Piotr Wozniak developed the first version of the algorithm (called SM-0) in 1985. It did not require a computer; it was a pen-and-paper method. He has been continuously improving the algorithm, and the most recent version available today is called SM-11. The descriptions of all SM algorithms are publicly available at the SuperMemo website [4]. The most recent version of SuperMemo was released in 2006 and it uses the SM-11 algorithm.

3.2 Anki

Website: http://ichi2.net/anki/

Anki is an open-source spaced repetition program made by Damien Elmes. It uses a modified SuperMemo SM-2 algorithm with changes allowing priorities on cards. It is available for Linux, Mac OS X and Windows. Anki doesn’t have a separate online version, but it supports synchronization with a free online server. Users can upload their learning material to the server and keep synchronized copies on multiple computers.

Figure 3.2: Screenshot of Anki

Anki has a unique way of storing data. It keeps one database for facts and another database for cards. The first one holds facts consisting of several fields, such as “Spelling of a word in foreign language”, “Pronunciation” and “Spelling of a word in native language”. The second database specifies how the cards will be created from the facts. It is usually used to create cards that ask for a translation of a word in both directions or cards that ask for the pronunciation of a certain word. Because the facts are all stored in one place, it is easy to make changes that affect a whole group of cards.

Anki uses four levels of feedback labeled “Again”, “Hard”, “Good” and “Easy”. Users can see in advance what the next interval will be if they choose a particular button.
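The separation of facts from cards can be illustrated with a small sketch. It is only an illustration of the idea described above: the field names, the model names and the `buildCards` function are hypothetical and do not reflect Anki's actual database schema or API.

```javascript
// Hypothetical fact with several fields (field names are illustrative only).
const facts = [
  { foreign: "der Hund", pronunciation: "[hunt]", native: "the dog" },
];

// Hypothetical card models: each one says which field is asked and which is the answer.
const cardModels = [
  { name: "forward", question: "foreign", answer: "native" },
  { name: "reverse", question: "native", answer: "foreign" },
  { name: "pronunciation", question: "foreign", answer: "pronunciation" },
];

// Generate one card per (fact, model) pair.
function buildCards(factList, models) {
  const cards = [];
  for (const fact of factList) {
    for (const model of models) {
      cards.push({
        model: model.name,
        question: fact[model.question],
        answer: fact[model.answer],
      });
    }
  }
  return cards;
}

console.log(buildCards(facts, cardModels).length); // 3
```

Correcting a typo in the single fact changes all three generated cards at once, which is the benefit the text attributes to storing facts in one place.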


3.3 Mnemosyne

Website: http://www.mnemosyne-proj.org/

Mnemosyne is an open-source program as well. Because it is written in Python, it is available for Windows, Linux and Mac OS X. It is also based on the SuperMemo SM-2 algorithm, with some modifications regarding the handling of early and late repetitions and a slight randomization of computed intervals. It also allows cards to be organised into categories. The categories can be activated and deactivated at will, so users are able to choose what they learn.

Figure 3.3: Screenshot of Mnemosyne

Users can upload their learning statistics to the project’s server and participate in a long-term memory research project. The authors of Mnemosyne claim they will use the acquired statistical data to gain insight into how human memory works and to improve the algorithm.


3.4 FullRecall

Website: http://fullrecall.com/

FullRecall is a proprietary program available for Linux, Mac OS X, FreeBSD, Windows, Pocket PC and Maemo (Nokia Internet Tablets). There seems to be an online version of FullRecall, but at the time of writing this work it was not functional.

Figure 3.4: Screenshot of FullRecall

FullRecall uses a unique algorithm for scheduling cards that is based on artificial neural networks.

Chapter 4 Dril features

The practical part of this thesis is Dril, a new application in the MU Information System that enables its users to learn using spaced repetition. Dril is available in the Personal Administration of the MU Information System under the link Dril in the menu on the left, or directly at https://is.muni.cz/auth/dril/ .

4.1 Dril and other e-learning applications in IS MU

The MU Information System already contains many applications that are related to e-learning, but most of them are “teacher-driven”. That means the teacher controls what material students get and ensures that students use it properly (or at all). One example of this approach is the Revision, Opinion Poll and Testing Agenda. The teacher is the main motivator here: he creates a test that students have to (or should) pass in order to pass the course. This approach assumes that the student is a more or less passive participant in the learning process and has to be motivated by the teacher to do something.

Dril is more “student-driven” than the Testing Agenda. It presumes that the student wants to learn something, understands the costs in time and effort and is willing to bear them. In order to reap the fruits of learning, one has to learn for several weeks or months. This obviously goes beyond the “one term, one course” paradigm that most students are used to.

4.2 Features

Dril has many features that are unique or unusual and that the alternative software presented in Chapter 3 is missing. They include:

• User-created lecture books: users can create their own lecture books and share them with others.

• Card customization: users can create their own versions of cards that contain e.g. example sentences.

• Support for related cards: cards that contain the same information are grouped and presented to the user on different days. The groups can contain an arbitrary number of cards.

• Possibility to learn in advance: the user can choose to learn cards scheduled for the next day or cards that are about to be scheduled in the coming days.

• Handling of late reviews: if the user doesn’t answer a card on time, Dril will still behave correctly.

• Reminder: IS will notify the user that he or she has outstanding cards to review.

• Banning of cards: users can ban cards they do not want to learn.

• Card contents: the contents of cards are not limited to simple plain text. Cards can be composed of:

– HTML, including text formatting, images, sounds and videos.

– LaTeX code: cards can contain LaTeX-formatted text.

– Randomly permuted lists of items, e.g. an enumeration of possible answers.

4.3 Algorithm

The learning process consists of three phases:

• Review phase: reviewing cards that are past their deadline.

• New cards phase: the user is given a set of new cards.

• Drilling phase: only cards that the user didn’t know well in the previous two phases are given. The cards are presented repeatedly until the user answers them correctly.

Apart from that, if any cards scheduled for the drilling phase are older than 10 minutes, they are presented immediately, regardless of the phase the user is currently in. This helps especially when there are many cards in the current phase and the user would otherwise forget the card he didn’t know by the time he gets to the drilling phase.
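The interplay of the three phases and the 10-minute rule can be sketched as follows. This is an illustrative reconstruction, not Dril's actual code: the `queues` structure, the `failedAt` timestamp and the function name are assumptions, and the phases are assumed to run in the order in which they are listed above.

```javascript
const TEN_MINUTES_MS = 10 * 60 * 1000;

// Pick the next card. `queues` holds three arrays: review, newCards and drilling.
// Each drilling entry is assumed to carry `failedAt`, a timestamp in milliseconds.
function nextCard(queues, now) {
  // Failed cards waiting longer than 10 minutes are shown first, in any phase.
  const overdue = queues.drilling.find((c) => now - c.failedAt > TEN_MINUTES_MS);
  if (overdue) return { phase: "drilling", card: overdue };

  if (queues.review.length > 0) return { phase: "review", card: queues.review[0] };
  if (queues.newCards.length > 0) return { phase: "new", card: queues.newCards[0] };
  if (queues.drilling.length > 0) return { phase: "drilling", card: queues.drilling[0] };
  return null; // nothing left to learn in this session
}
```

A caller would remove the returned card from its queue once the user has answered it and, if the answer was poor, append it to `drilling` with the current time as `failedAt`.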

Selecting cards for review

The procedure that selects cards for review is quite simple. A card is included in the selection if its previous review date plus its interval is earlier than the present time, that is, if the card is past its deadline. In addition, only cards that the user has not marked as banned are selected, and no related card may have been answered in the last 12 hours.
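The selection rule can be written down as a filter. The sketch below is a guess at the logic only: the card record shape (`groupId`, `banned`, `lastReview`, `intervalDays`) is hypothetical, and in Dril the selection is presumably done on the server against the database rather than in JavaScript.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Hypothetical card record: { id, groupId, banned, lastReview (ms), intervalDays }.
// `groupId` is null for cards that have no related cards.
function selectForReview(cards, now) {
  return cards.filter((card) => {
    if (card.banned) return false;
    // Due when the previous review date plus the interval lies in the past.
    const dueAt = card.lastReview + card.intervalDays * DAY_MS;
    if (dueAt > now) return false;
    // Skip the card if a related card (same group) was answered in the last 12 hours.
    const relatedSeenRecently = cards.some(
      (other) =>
        other.id !== card.id &&
        card.groupId !== null &&
        other.groupId === card.groupId &&
        now - other.lastReview < 12 * HOUR_MS
    );
    return !relatedSeenRecently;
  });
}
```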


Calculating the time of next review

When the user sends feedback, the application has to schedule the card for the next review. But there is a problem: how to calculate the length of the review interval? The next review should be scheduled when there is about a 90% chance of remembering the card or, in other words, when the user can remember at least 90% of the cards learned. Dril is based on the SM-2 algorithm [5]. The procedure for calculating the new interval takes these parameters into consideration:

• user feedback, which tells how well the answer was recollected,

• number of wrong answers before the user knew the correct answer,

• card factor (represents the card difficulty, it is associated with both the card and user),

• previous interval of the card,

• the real interval of the review.

The main and most influential parameter is the feedback value. There is no doubt that if the user knew the correct answer, the application should calculate a longer interval than if he didn’t know the answer well.

The real interval of the review is used to determine whether the card was reviewed prematurely, on time or too late. If the card was reviewed prematurely and the user didn’t know the answer, we can assume that the previous interval was too long, and it is shortened (because if the user didn’t know the answer now, he certainly wouldn’t know it later). If he knew the answer, it still doesn’t tell us whether he would have known it at the time the review was scheduled. Therefore the calculated interval will be the same as before. If the card was reviewed too late after the deadline, the situation is similar. If the card was reviewed on time, we can tell whether the previous interval was too long or too short, so the procedure can recalculate the interval according to the feedback provided.

The card factor is another variable used in the calculation of the new interval. It represents the difficulty of the card, or rather its “easiness”, because the lower the value is, the harder the card is.

New cards of course do not have any interval or factor values that could be used in the calculation of new ones. Therefore these cards receive pre-defined values.
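For orientation, the interval update of the published SM-2 algorithm, on which Dril is based, can be sketched as below. This is the textbook SM-2 rule, not Dril's procedure: Dril's handling of premature and late reviews, the number of wrong answers and its own feedback scale are not reproduced. The grade scale 0 to 5 and the starting factor 2.5 come from SM-2; the record shape and the use of the updated factor for the new interval are choices made for this sketch.

```javascript
// Textbook SM-2 update. `grade` is the SM-2 quality of response, 0 (blackout) to 5 (perfect).
// `card` is { repetitions, intervalDays, factor }; new cards start with factor 2.5.
function sm2Update(card, grade) {
  if (grade < 3) {
    // Failed recall: start the repetitions over and leave the factor unchanged.
    return { repetitions: 0, intervalDays: 1, factor: card.factor };
  }
  // Adjust the easiness factor; it never drops below 1.3.
  const delta = 0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02);
  const factor = Math.max(1.3, card.factor + delta);

  const repetitions = card.repetitions + 1;
  let intervalDays;
  if (repetitions === 1) intervalDays = 1;
  else if (repetitions === 2) intervalDays = 6;
  else intervalDays = Math.round(card.intervalDays * factor);

  return { repetitions, intervalDays, factor };
}

// Three perfect answers in a row: the intervals grow from 1 to 6 to 16 days.
let state = { repetitions: 0, intervalDays: 0, factor: 2.5 };
for (let i = 0; i < 3; i++) {
  state = sm2Update(state, 5);
  console.log(state.intervalDays);
}
```

A lower factor makes the intervals grow more slowly, which matches the description of the factor as the “easiness” of the card.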

Related cards

Related cards are cards that contain the same information but, generally speaking, present it from different angles. For example, cards that ask the user to translate a word into a foreign language and back (bi-directional translation) are related cards. Groups can contain an arbitrary number of cards, even from different lessons.


There is a reason for special handling of related cards. The algorithm that selects cards for review ensures that the student will see at most one card from the same group of related cards in one day. If it were otherwise, learning would be less effective because of the spacing effect: The student would have the information already in his short-term memory.

4.4 Overview of used technologies

The fact that IS is a web portal narrowed the selection of technologies that could be used in Dril. Dril uses an Oracle database for storing data, Perl for server-side application logic and HTML with CSS for presenting the user interface. However, a simple web application using only HTML would not suffice: Dril has to provide a seamless user interface and fast response times, if possible without reloading the page after every action. The reason is obvious: users wouldn’t want to wait for the page to load after every answered card, especially if they are using mobile devices with low bandwidth. This is where AJAX¹ comes in. It allows creating dynamic web applications that react immediately to users’ actions and can request additional information from the server without reloading the page every time.

AJAX

Most modern web browsers provide an interface for making asynchronous HTTP requests in their DOM² API, called the XMLHttpRequest object. The specification of this object and its properties was published as a Working Draft by the World Wide Web Consortium on 15 April 2008 [6].

Figure 4.1 shows the usage of the XMLHttpRequest object. First we have to create an instance of the XMLHttpRequest object and then assign the onreadystatechange event handler. This handler is called after every change in the request state, which is reflected by the readyState property. Comments in the code example explain the states of the request. After the request has completed, the response is available in the responseText property. If the response is an XML document, the responseXML property contains a proper Document object with the parsed XML. It is usually a good idea to check the status property, which contains the HTTP status code.

    // url is the address of the requested resource
    var http_req = new XMLHttpRequest();
    http_req.onreadystatechange = function() {
        if (http_req.readyState == 0) {
            // The request is not initialized
        } else if (http_req.readyState == 1) {
            // The request has been set up
        } else if (http_req.readyState == 2) {
            // The request has been sent
        } else if (http_req.readyState == 3) {
            // The request is in progress
        } else if (http_req.readyState == 4) {
            // The request is complete
            if (http_req.status == 200) {
                // Handle response
                alert(http_req.responseText);
            }
        }
    };
    http_req.open("GET", url, true);
    http_req.send();

Figure 4.1: Working with the XMLHttpRequest object

Dril doesn’t use XML in communication between client and server. It uses JSON [2] instead. JSON (JavaScript Object Notation) is a simple plain-text data format that can be used to encode arbitrary data structures. Its syntax is compatible with that of JavaScript, so parsing is simple (a JavaScript eval suffices). No special methods are needed for working with the parsed data; the resulting value is an ordinary JavaScript object.

JSON can express arbitrary data structures, including lists, associative lists, simple data types (number, string, boolean) or any combination thereof. An associative list is expressed using braces with key-value pairs delimited with a colon, while a list is denoted as a comma-separated list of values enclosed in square brackets. A complete syntax reference is available at [2]. Figure 4.2 contains an example of a simple data structure encoded with JSON, along with a demonstration of how to parse it and work with it.

    var json_text = '{ "NUMERIC_VALUE" : 1234, ' +
                    '"STRING_VALUE" : "Winston Smith", ' +
                    '"LIST" : [ 3.14, "Hello world" ] }';
    // eval executes its argument, so the text must come from a trusted source
    var json_obj = eval("(" + json_text + ")");
    var numeric_value = json_obj.NUMERIC_VALUE;

Figure 4.2: Working with a JSON encoded data structure

¹ Asynchronous JavaScript and XML
² Document Object Model
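As an aside to Figure 4.2, current JavaScript engines provide the built-in `JSON.parse`, which accepts only valid JSON and never executes code. The sketch below shows how the same structure could be parsed that way. It is an alternative shown for comparison, not what Dril does (the text states that Dril relies on eval), and the `parseResponse` helper is a made-up name.

```javascript
// Same structure as in Figure 4.2, written as a single-line string.
const jsonText =
  '{"NUMERIC_VALUE": 1234, "STRING_VALUE": "Winston Smith", "LIST": [3.14, "Hello world"]}';

// JSON.parse throws a SyntaxError on anything that is not valid JSON.
function parseResponse(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null; // malformed response; the caller decides how to report it
  }
}

const data = parseResponse(jsonText);
console.log(data.NUMERIC_VALUE);        // 1234
console.log(data.LIST[1]);              // Hello world
console.log(parseResponse("alert(1)")); // null
```

The last call shows the difference from eval: a string containing code is rejected instead of being run.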

Database

Dril uses an Oracle database for storing data. Figure 4.3 shows the database schema used in Dril.


Figure 4.3: Schema of database tables

Cards are organized into a three-tier hierarchy. Every card belongs to a lesson, and each lesson is part of a lecture book. Lecture books are assigned to an area. Examples of areas are languages: English, Spanish, German etc. This kind of hierarchy is necessary, because otherwise users would soon lose track in a flood of coursebooks and lessons. Additionally, cards are grouped to form groups of related cards.
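The hierarchy can be pictured as nested objects. The following sketch is only an in-memory illustration with invented names and sample data; the real structure is the set of Oracle tables in Figure 4.3, whose columns are not reproduced here.

```javascript
// Hypothetical in-memory mirror of the hierarchy: area > lecture book > lesson > card.
const area = {
  name: "English",
  lectureBooks: [
    {
      title: "Basic vocabulary",
      lessons: [
        {
          title: "Lesson 1",
          cards: [
            { id: 1, question: "dog", answer: "pes", groupId: 10 },
            { id: 2, question: "pes", answer: "dog", groupId: 10 },
            { id: 3, question: "cat", answer: "kočka", groupId: null },
          ],
        },
      ],
    },
  ],
};

// Walk the hierarchy and index related cards by their group.
function groupRelatedCards(someArea) {
  const groups = new Map();
  for (const book of someArea.lectureBooks) {
    for (const lesson of book.lessons) {
      for (const card of lesson.cards) {
        if (card.groupId === null) continue;
        if (!groups.has(card.groupId)) groups.set(card.groupId, []);
        groups.get(card.groupId).push(card.id);
      }
    }
  }
  return groups;
}

console.log(groupRelatedCards(area).get(10)); // [ 1, 2 ]
```

Because the group is a property of the card rather than of the lesson, a group can span several lessons, as the text on related cards requires.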

4.5 Future

There are many planned improvements to the application, for example:

• Statistics: more statistics directly in the application. For example, users will be able to see their own forgetting curve.

• User collaboration: for example, users would be able to see each other’s changes to the cards and cherry-pick the changes they like.

• Algorithm improvement: further improvement of the algorithm that calculates optimal review intervals.

• Simpler card management: revise card management and make it easier to use.

Chapter 5 Dril guide

This chapter describes features of Dril in more detail and documents most of the operations currently present in Dril.

5.1 Subscribe to a coursebook

If the user wants to start using Dril, he has to subscribe to a coursebook first. This is accomplished by clicking on the link Aktivovat učebnice (Subscribe coursebook) on the title page and choosing the area and coursebook of interest. Then the user chooses which lessons he wants to subscribe to.

There are three tabs on this page, labeled Aktivace lekcí, Reaktivace lekcí and Deaktivace lekcí (Subscribe lessons, Resubscribe lessons and Unsubscribe lessons, respectively). On the Resubscribe tab the user can resubscribe to lessons whose contents have changed. The recognized changes are card addition and deletion. The number of added and deleted cards is shown in the table next to the lesson name.

The Unsubscribe tab lists subscribed lessons; the user can unsubscribe from them if he doesn’t want to learn them anymore. When unsubscribing, the user’s data concerning the unsubscribed lesson is not erased from the database. It is merely flagged as inactive, so if the user subscribes to the same lesson again, his data will be restored.

The list of coursebooks that contain subscribed lessons is shown on the title page in the section Aktivované učebnice (Subscribed coursebooks), along with additional details such as the user’s progress in the coursebook, the average card interval or the number of people who are learning from the same coursebook.
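The behaviour of unsubscribing, where data is flagged as inactive rather than erased, can be sketched as follows. The store, the key format and the function names are hypothetical; in Dril this state lives in the database.

```javascript
// Hypothetical subscription store keyed by "userId:lessonId".
const subscriptions = new Map();

function subscribe(userId, lessonId) {
  const key = `${userId}:${lessonId}`;
  const existing = subscriptions.get(key);
  if (existing) {
    existing.active = true; // earlier progress becomes visible again
    return existing;
  }
  const created = { active: true, progress: {} };
  subscriptions.set(key, created);
  return created;
}

function unsubscribe(userId, lessonId) {
  const record = subscriptions.get(`${userId}:${lessonId}`);
  if (record) record.active = false; // flag only, nothing is deleted
}
```

Subscribing, recording some progress, unsubscribing and subscribing again returns the same record with the progress intact.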

5.2 Learning

The most important part of Dril is the learning application. It is accessible from the title page in the section Učit se (Start learning). Clicking on the area title takes the user directly to the learning application, whereas clicking on the link detaily (details) takes him to a page where he can choose what to learn in more detail.

The learning application UI is divided into two parts: on the left are boxes for the question and the answer and a row of feedback buttons; on the right is a panel with boxes containing various information about the learning phases, the current and previous card and the available actions.


Figure 5.1: Subscribe lessons page

The left part initially contains only the question and the Show answer button. After the user recalls the answer in his mind, he can click on the button and the correct answer is revealed. Next he has to evaluate his answer and click on one of the feedback buttons labeled 1 to 6. The meaning of the buttons is explained in the table below the buttons. The application then processes the feedback and displays the next question. If there are no more cards in the current phase, the user is taken to the What to learn page.

More experienced users can use the keyboard to control the application: keys 1-6 or the top-row keys QWERT[YZ] select the respective feedback button, and Space or Enter confirms the selection. The keys 1 and 2 work even while the answer is hidden, so that easy cards do not waste the user’s time.

The topmost box in the right part of the screen contains information about the current learning phase and the number of remaining cards. There are three learning phases in Dril: Review, New cards and Drilling. The first two are quite self-explanatory: in the Review phase Dril presents outstanding cards, and in the New cards phase it presents cards the user has not seen yet. The Drilling phase is initially empty, but it eventually contains failed cards from the previous two phases. The cards in the Drilling phase are repeatedly presented to the user in random order until he answers them correctly. Additionally, if there are cards in the Drilling phase older than 10 minutes, they are presented to the user immediately, regardless of the phase he is currently in.

Figure 5.2: Screenshot of a “What to learn” page

One of the next boxes in the right panel contains the available operations. Currently users can edit the card or ban it if they don’t want to see it anymore.
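The keyboard control described in this section amounts to a small lookup from keys to feedback buttons. The sketch below is an assumption about how such a mapping could look, not Dril's code: the function name is invented, the confirmation step with Space or Enter is left out, and it is assumed that the letter keys Q and W are accepted while the answer is hidden in the same way as the keys 1 and 2.

```javascript
// Top-row letters mapped to feedback buttons 1-6.
// Y and Z both map to button 6 so that QWERTY and QWERTZ layouts behave the same.
const LETTER_KEYS = new Map([
  ["q", 1], ["w", 2], ["e", 3], ["r", 4], ["t", 5], ["y", 6], ["z", 6],
]);

// Return the feedback button (1-6) selected by `key`, or null if the key is not bound.
function feedbackForKey(key, answerVisible) {
  const k = key.toLowerCase();
  let button = null;
  if (/^[1-6]$/.test(k)) button = Number(k);
  else if (LETTER_KEYS.has(k)) button = LETTER_KEYS.get(k);
  if (button === null) return null;
  // While the answer is hidden, only buttons 1 and 2 are accepted.
  if (!answerVisible && button > 2) return null;
  return button;
}
```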

5.3 Editing cards

All users can edit a card if they think it is unclear or incorrect. All the changes users make are private: they are visible only to the user who made them. The author of the coursebook can decide to edit the card globally, in which case the change will be visible to everybody. Links for editing the card are available either in the right-hand box while learning or in the Dril management.

5.4 Notifying the user he has to learn

Because users should attend to Dril every day, there is a need to notify them they have scheduled cards for review. The notification should be bold enough so users will notice it, but at the same time it should not be annoying.


Figure 5.3: Screenshot of learning in progress

The notification has been implemented as a highlighted link to the application in the left hand menu (see figure 5.4).

5.5 Erasing user data

Users can erase all their data and statistics tied to a particular area if they decide to do so. The option is available on the title page, under the link Smazat mé údaje o oblasti (Erase my data).


Figure 5.4: Screenshot of active and inactive notification

5.6 Dril Management

The title page of Dril Management contains a list of existing coursebooks, which can be filtered by area and coursebook author. Users can see only their own or public coursebooks. This page provides links to other parts of the application, such as creating and editing coursebooks, importing cards into lessons etc.

5.7 Creating and editing coursebooks

All users can create their own coursebooks and share them with other users. The coursebook creation form is displayed after clicking on the link Vytvořit učebnici (Create coursebook) on the Dril Management title page. Only the first two fields are required: the name of the coursebook and the area it will belong to. The rest is optional, though often useful.

The field Popis učebnice (Description) should contain a short description of the coursebook. This description is shown in various places across the application. In the following field authors can also specify HTML code that will be displayed in the learning application when the user encounters a card from this coursebook.

The last two fields contain JavaScript code that will be evaluated before the question or the answer is displayed during learning. That is useful for implementing various “extra” features, such as including multimedia files in the question or answer. There are already coursebooks that use these fields to provide automatic pronunciation of foreign words.

The form for editing a coursebook also contains a checkbox for publishing the coursebook. The coursebook is private by default; if the user wants others to see and learn from it, he has to publish it first.

5.8 Importing new cards

One of the most complex parts of Dril is the card import application. It is used for creating lessons and importing new cards into coursebooks. The import application uses two types of input: data files, which contain the data from which the cards are constructed, and template files, which specify how the cards will be created. One can compare this to calling a function in an imperative programming language: the template is the function and the data are the parameters passed to it.

The application had to be as easy to use as possible so that ordinary users could use it, but at the same time it had to accommodate power users who decide to use some or all of the more advanced features. This is achieved by making the template optional. Every user has to write data files, but only advanced users have to write template files.

Figure 5.5: Dril can play audio files as well

Looking at generated cards or dry-run

The dry-run option is an extremely useful feature because it allows users to preview the cards that would be imported, without changing anything in the database. Users can experiment with templates or data files without fear of breaking something. This feature is available as a checkbox in the card import form.
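The idea of a dry run can be sketched in a few lines of Python. This is only an illustration and not Dril's actual code: the function `import_cards`, the `Card` tuple and the plain list standing in for the database are all hypothetical.

```python
from typing import List, NamedTuple


class Card(NamedTuple):
    question: str
    answer: str


def import_cards(cards: List[Card], database: List[Card],
                 dry_run: bool = False) -> List[Card]:
    """Return the cards that would be imported; store them unless dry_run is set."""
    if not dry_run:
        # Only a real import touches the (hypothetical) database.
        database.extend(cards)
    return list(cards)


database: List[Card] = []
preview = import_cards([Card("apple", "jablko")], database, dry_run=True)
print(preview)   # the generated cards, shown to the user for inspection
print(database)  # the database is unchanged after a dry run
```

The same function serves both modes, so the preview shows exactly the cards a real import would store.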

Overwriting existing cards

If the user wants to re-import cards because he changed the template, he must check the respective checkbox in the card import form. However, overwriting does not work well when the data file has been changed as well. In that case redundant cards with similar values are created, and the user is responsible for deleting the obsolete ones.

Error handling

While importing cards, there are many situations where something can go wrong, especially with user input. Writing template and data files can be error-prone even for an experienced user, and the application should not discourage him with poor error handling. If an error occurs, it is shown as a message containing as much information as possible, so that the user can fix the error easily. Dril performs these checks while processing the input files:

• Template file validation: the application checks whether the template file contains valid data (using a RELAX NG schema).

• Data file validation: checks whether the records in the data file reference existing templates, whether they contain the correct number of parameters etc.

• Sanity of generated cards: checks that cards are not being imported into an existing lesson when the user requested that a new lesson be created, that the question or answer is not an empty string etc.

• Conflicts: ensures that the generated cards are not already in the database.

• Duplicates: checks whether the generated cards contain duplicates.
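The last three checks in the list can be illustrated with a short Python sketch. It is not taken from Dril: the function `check_cards`, the representation of a card as a `(question, answer)` tuple and the wording of the messages are assumptions made for the example. It collects all problems rather than stopping at the first one, in the spirit of giving the user as much information as possible.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Card = Tuple[str, str]  # (question, answer); a simplification of a real card


def check_cards(cards: List[Card], existing: Iterable[Card]) -> List[str]:
    """Collect human-readable error messages for a batch of generated cards."""
    errors: List[str] = []
    existing_set = set(existing)

    # Sanity: neither the question nor the answer may be empty.
    for number, (question, answer) in enumerate(cards, start=1):
        if not question.strip() or not answer.strip():
            errors.append(f"card {number}: question or answer is empty")

    # Conflicts: the card is already stored (each distinct card reported once).
    for card in dict.fromkeys(cards):
        if card in existing_set:
            errors.append(f"conflict: {card!r} is already in the database")

    # Duplicates: the same card was generated more than once in this import.
    for card, count in Counter(cards).items():
        if count > 1:
            errors.append(f"duplicate: {card!r} was generated {count} times")

    return errors


for message in check_cards(
    [("apple", "jablko"), ("apple", "jablko"), ("pear", "")],
    existing=[("orange", "pomeranč")],
):
    print(message)
```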

5.9 Format of the data file

Simple data file

In the simplest case the import application requires only the data file. The data file in this case looks like this:

    apple	jablko
    orange	pomeranč

As you can see, it is just a simple CSV file with each record on a single line and with fields separated by a Tab character. What might not be obvious is that a single line defines two cards. This is because the most common use case is creating pairs of cards (for learning foreign language vocabulary). Hence the two-line example above will create four cards:

    Question: apple
    Answer: jablko

    Question: jablko
    Answer: apple

    Question: orange
    Answer: pomeranč

    Question: pomeranč
    Answer: orange

Additionally, the cards in a pair are marked as related. More on this subject will be given later, but in short: when the user is learning related cards, he gets only one of them in a single learning session.
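The expansion of a simple data file into pairs of related cards can be sketched in Python as follows. The names `Card`, `pair_id` and `cards_from_simple_data` are hypothetical, and marking related cards with a shared `pair_id` is only one possible representation; how Dril stores the relation is not described here.

```python
from typing import List, NamedTuple


class Card(NamedTuple):
    question: str
    answer: str
    pair_id: int  # cards sharing a pair_id are treated as related (assumption)


def cards_from_simple_data(text: str) -> List[Card]:
    """Turn Tab-separated records into two cards per line, one per direction."""
    cards: List[Card] = []
    for pair_id, line in enumerate(text.splitlines()):
        if not line.strip():
            continue  # skipping blank lines is an assumption of this sketch
        # Exactly two fields are expected; anything else raises ValueError.
        first, second = line.split("\t")
        cards.append(Card(first, second, pair_id))
        cards.append(Card(second, first, pair_id))
    return cards


data = "apple\tjablko\norange\tpomeranč\n"
for card in cards_from_simple_data(data):
    print("Question:", card.question, "Answer:", card.answer)
```

For the two-line example above, the sketch yields the same four cards, with the two cards generated from one line sharing a `pair_id`.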

Data file for use with template

If the user wants to use a template, the format of the data file is slightly different. The first value on each line has to be the name of the template, followed by the arguments of the template.

The previous example rewritten to use the template named template1 would look like this:

    template1	apple	jablko
    template1	orange	pomeranč
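The function-call analogy can be made concrete with a small Python sketch. It treats each record as a call of the named template with the remaining fields as arguments. Everything here is hypothetical: real Dril templates are XML files, and the sketch merely assumes that `template1` produces the same pair of cards as the simple format.

```python
from typing import Callable, Dict, List, Tuple

Card = Tuple[str, str]  # (question, answer)


def template1(word: str, translation: str) -> List[Card]:
    """Hypothetical stand-in for a template: builds a pair of cards."""
    return [(word, translation), (translation, word)]


# Registry of known templates, looked up by the first field of each record.
TEMPLATES: Dict[str, Callable[..., List[Card]]] = {"template1": template1}


def cards_from_data(text: str) -> List[Card]:
    cards: List[Card] = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skipping blank lines is an assumption of this sketch
        name, *arguments = line.split("\t")
        if name not in TEMPLATES:
            raise ValueError(f"line {number}: unknown template {name!r}")
        # "Call" the template with the remaining fields as its arguments.
        cards.extend(TEMPLATES[name](*arguments))
    return cards


print(cards_from_data("template1\tapple\tjablko\n"))
```

A record with the wrong number of arguments fails here with Python's own `TypeError`; a real importer would report this as a data file validation error instead.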

Line wrapping

Lines of the data file can be wrapped using the backslash \ character. This is useful when the user wants to preserve text formatting (in code snippets, for example) or simply wants to avoid overly long lines for the sake of clarity. For example:

    template1	very very long line \
    that continues on the next line

Note that if the backslash character is not the last character on the line, you will get unexpected results.
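A minimal Python sketch of such line unwrapping is shown below. The function `unwrap_lines` is hypothetical. The text does not state whether the line break itself is kept in the resulting value, so the sketch leaves that choice to the `joiner` parameter, whose default (keeping the line break) is an assumption.

```python
from typing import List


def unwrap_lines(text: str, joiner: str = "\n") -> List[str]:
    """Merge physical lines that end with a backslash into logical records."""
    records: List[str] = []
    pending = ""
    for line in text.splitlines():
        if line.endswith("\\"):
            # Drop the backslash and continue the record on the next line.
            pending += line[:-1] + joiner
        else:
            records.append(pending + line)
            pending = ""
    if pending:
        records.append(pending)  # the file ended right after a backslash
    return records


wrapped = "template1\tvery very long line \\\nthat continues on the next line\n"
print(unwrap_lines(wrapped, joiner=""))
```

The sketch also shows why the backslash has to be the very last character: a line ending in a backslash followed by a space does not match `endswith("\\")`, so it is treated as a complete record and the next line starts a new one.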

5.10 Format of the template file

If the user wants to use more advanced features of Dril, such as custom HTML formatting of questions and answers, he has to write a template. A template is an XML file that specifies how the cards will be created.

Element

The root element of the template is . It contains one or more elements