System for Adaptive Learning of Japanese Based on Language Data
Masaryk University
Faculty of Informatics

System for adaptive learning of Japanese based on language data

Bachelor’s Thesis

Alexander Macinský

Brno, Spring 2020

This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author is located in the printed version of the document.

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Alexander Macinský

Advisor: doc. RNDr. Aleš Horák, Ph.D.

Acknowledgements

This way I would like to thank my advisor for directing me in the process of creation of this thesis. Also, my thanks go to all the respondents who were willing to take part in the testing process and helped me evaluate the project.

Abstract

Japanese language learners need to deal with a substantial amount of repetitive mental work. One of the solutions is to create a web browser application to simplify the process. The inspiration comes from other systems with similar functionality, a summary of these is provided. The created application tries to take the best from the existing solutions, as well as implement some original ideas. The result is a dictionary viewer, Japanese text reading aiding tool, flashcards editor and a tool for learning with flashcards all in one application. The thesis text details the analysis, implementation and evaluation of the developed application.

Keywords

adaptive learning, natural language processing, pop-up dictionary, web application, Japanese

Contents

Introduction
1 Overview of similar and related projects
  1.1 Learning and memorisation
    1.1.1 Duolingo
    1.1.2 Anki
    1.1.3 MEMRISE
    1.1.4 Kanji Study
  1.2 Reading Japanese texts with a dictionary
    1.2.1 Japanese IO
    1.2.2 Tango.Risto
  1.3 Japanese dictionary viewers
    1.3.1 Jisho
    1.3.2 Tangorin
    1.3.3 Kanji Study
  1.4 Example sentence search tools
    1.4.1 Tatoeba
    1.4.2 Tangorin
2 Data sources
  2.1 Dictionary files
    2.1.1 JMdict
    2.1.2 KANJIDIC2
    2.1.3 ENAMDICT/JMnedict
  2.2 Language corpora
    2.2.1 Tatoeba corpus
  2.3 JLPT levels
  2.4 Kanji radicals
  2.5 Grammar
  2.6 Literary works
    2.6.1 Aozora Bunko
  2.7 Overview of data used in japread
3 Web application analysis
  3.1 Dictionary
    3.1.1 Words
    3.1.2 Kanji
    3.1.3 Search
  3.2 Flashcards
    3.2.1 Adaptive learning
  3.3 Text reader
    3.3.1 Entering the reading mode
    3.3.2 Tokenization
    3.3.3 Collections
    3.3.4 Known issues
4 Implementation
  4.1 Front end
    4.1.1 Technologies, libraries and other resources
    4.1.2 Support of web browsers
    4.1.3 User interface
  4.2 Back end
    4.2.1 Interaction with the database
  4.3 Client-server communication
  4.4 Data extraction and collection
    4.4.1 Other libraries and resources
5 Evaluation
  5.1 The testing system
    5.1.1 The structure of the testing system
    5.1.2 The testing set
    5.1.3 Selection of the respondents
    5.1.4 Collected data
  5.2 Results
    5.2.1 Respondents
    5.2.2 Results of the vocabulary tests
    5.2.3 Results of the questionnaire
6 Conclusion and the future of japread
Glossary
Acronyms
Bibliography
A The source code of japread
B Results of the testing

List of Tables

2.2 Entries of the JMdict dictionary file with glosses in languages other than English [22]
2.4 Overview of data used in japread
5.2 Count of respondents per last finished action
5.3 Results of the questionnaire

List of Figures

5.1 Respondents per JLPT level of the vocabulary sets
5.2 Respondents per number of points
5.3 Respondents per number of points (with and without flashcards)
A.1 A view of a dictionary entry page in the application
A.2 A view of the text reader application. Note that this image contains an excerpt from a literary work published on Aozora Bunko, which was described in section 2.6.1 Aozora Bunko
A.3 A view of the flashcards editor
A.4 A view of the flashcards study application
B.1 A view of the description of the Preparation step from the testing application.
B.2 A view of one of the cards that the respondents could see during the Preparation step of the testing process
B.3 A view of the vocabulary list in the Memorizing without the application step of the testing process
B.4 A view of a multiple choice quiz question in the flashcards application and the Memorizing with the application step of the testing process
B.5 A view of the results page, which the respondents could see after filling in the test in the Testing the progress step of the testing process.
B.6 A view of the questionnaire page in the A little questionnaire step of the testing process.
B.7 A view of the thank-you page, which the respondents could see after finishing the questionnaire in the testing process. (And a thank-you page for the readers who got all the way to here.)

Introduction

Learning a language is a simple process, which, however, includes a load of repetitive mental work.
In addition, it is nontrivial to find a way of learning a language that is effective and convenient enough not to discourage the learner from persisting. It can be especially hard not to get overwhelmed by the amount of vocabulary any language has, and Japanese offers its learners no relief. Whereas in most languages the learner faces the problem of learning a few thousand words, Japanese adds the challenge of mastering approximately 2000 characters on top of that [1].

This thesis aims to implement a system capable of aiding learners of Japanese by simplifying the maintenance of their vocabulary lists and study materials. For the purposes of this thesis, the main feature of the application is an interactive flashcard tool implementing a spaced repetition system¹ [2]. Further, the system should be capable of adapting to the individual user, helping them to learn a large amount of vocabulary effectively.

Nonetheless, the application is not meant to be just a simple memorization tool. The intention is to make an interactive and adaptable personal dictionary with example sentences, word meanings, and entries on kanji², all of which should be linked. The links should help by introducing the users to the context of the vocabulary and kanji entries.

The thesis project, referred to in the thesis as japread, is primarily a web browser application. Both its front end and back end are implemented in JavaScript. The project is available for use at japread.com.

The thesis starts with an overview of related applications. It gives descriptions, advantages, and disadvantages of applications that let their users fulfill tasks similar to the ones japread aims to help with. Then comes an overview of data sources: dictionary files, language corpora, and other related data. Some of them are used in japread; others are just mentioned. Further, there is an analysis and a description of the system implementation. This part offers more details on japread as a web application and explains the algorithms and mechanisms implemented in the system. Finally, the thesis closes with an evaluation of the system based on data collected from the application’s users.

Overall, this thesis outlines the creation of a language-learning application and, more generally, the implementation of a web application. Resources and inspiration for such an application are also summarized. Furthermore, the thesis offers an insight into the basics of natural language processing, and the processing of the Japanese language in particular. No less importantly, the application itself is ready for use and available online to the public at japread.com.

Nonetheless, the system is not yet finished, and further development of the project is expected to bring even more benefits to its users. The thesis is essentially a basis for further improvements. Planned functionality and features include advanced dictionary search options, more options for sharing flashcards, and various learning modes. Other improvements need to be made in text tokenization and text analysis.

¹ A system that shows more difficult and newer flashcards more often, while older and less frequent flashcards are shown less often.
² Originally Chinese characters that are used in Japanese. While they are adopted from Chinese, to some extent they differ from the characters used in Taiwan or Mainland China.
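To make the spaced repetition idea from the introduction more concrete, the following is a minimal sketch of a Leitner-style scheduler in JavaScript. It is only an illustration under stated assumptions and is not the algorithm japread implements (the thesis covers that in section 3.2.1 Adaptive learning). The names `createCard`, `review`, and `dueCards`, the number of boxes, and the intervals are all hypothetical.

```javascript
// Hypothetical Leitner-style scheduler. The box count and the intervals
// are assumptions chosen for illustration, not values taken from japread.
const INTERVALS_IN_DAYS = [0, 1, 3, 7, 14]; // review delay per box; box 0 holds new or difficult cards
const DAY_MS = 24 * 60 * 60 * 1000;

// Create a new card. New cards start in box 0 and are due immediately.
function createCard(front, back, now = Date.now()) {
  return { front, back, box: 0, dueAt: now };
}

// Return an updated copy of the card after a review:
// a correct answer promotes it one box (longer delay),
// a wrong answer sends it back to box 0 (shown again right away).
function review(card, wasCorrect, now = Date.now()) {
  const box = wasCorrect
    ? Math.min(card.box + 1, INTERVALS_IN_DAYS.length - 1)
    : 0;
  return { ...card, box, dueAt: now + INTERVALS_IN_DAYS[box] * DAY_MS };
}

// Return the cards that are due at the given time, most overdue first.
function dueCards(cards, now = Date.now()) {
  return cards
    .filter((card) => card.dueAt <= now)
    .sort((a, b) => a.dueAt - b.dueAt);
}
```

In this sketch a card answered correctly several times in a row drifts into boxes with longer delays, while a single wrong answer brings it back to the front of the queue, which matches the behaviour described in footnote 1 only in broad terms.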