
Second Language Acquisition Modeling

Burr Settles∗  Chris Brust∗  Erin Gustafson∗  Masato Hagiwara∗  Nitin Madnani†
∗Duolingo, Pittsburgh, PA, USA  †ETS, Princeton, NJ, USA
{burr,chrisb,erin,masato}@duolingo.com  [email protected]

Abstract

We present the task of second language acquisition (SLA) modeling. Given a history of errors made by learners of a second language, the task is to predict errors that they are likely to make at arbitrary points in the future. We describe a large corpus of more than 7M words produced by more than 6k learners of English, Spanish, and French using Duolingo, a popular online language-learning app. Then we report on the results of a shared task challenge aimed at studying the SLA task via this corpus, which attracted 15 teams and synthesized work from various fields including cognitive science, linguistics, and machine learning.

1 Introduction

As computer-based educational apps increase in popularity, they generate vast amounts of student learning data which can be harnessed to drive personalized instruction. While there have been some recent advances for educational software in domains like mathematics, learning a language is more nuanced, involving the interaction of lexical knowledge, morpho-syntactic processing, and several other skills. Furthermore, most work that has applied natural language processing to language learner data has focused on intermediate-to-advanced students of English, particularly in assessment settings. Much less work has been devoted to beginners, learners of languages other than English, or ongoing study over time.

We propose second language acquisition (SLA) modeling as a new computational task to help broaden our understanding in this area. First, we describe a new corpus of language learner data, containing more than 7.1M words, annotated for production errors that were made by more than 6.4k learners of English, Spanish, and French, during their first 30 days of learning with Duolingo (a popular online language-learning app). Then we report on the results of a "shared task" challenge organized by the authors using this SLA modeling corpus, which brought together 15 research teams. Our goal for this work is three-fold: (1) to synthesize years of research in cognitive science, linguistics, and machine learning, (2) to facilitate cross-dialog among these disciplines through a common large-scale empirical task, and in so doing (3) to shed light on the most effective approaches to SLA modeling.

2 Shared Task Description

Our learner trace data comes from Duolingo: a free, award-winning, online language-learning platform. Since launching in 2012, more than 200 million learners worldwide have enrolled in Duolingo's game-like courses, either via the website¹ or mobile apps.

Figure 1(a) is a screen-shot of the home screen, which specifies the game-like curriculum. Each icon represents a skill, aimed at teaching thematically or grammatically grouped words or concepts. Learners can tap an icon to access lessons of new material, or to review material once all lessons are completed. Learners can also choose to get a personalized practice session that reviews previously-learned material from anywhere in the course by tapping the "practice weak skills" button.

2.1 Corpus Collection

To create the SLA modeling corpus, we sampled from Duolingo users who registered for a course and reached at least the tenth row of skill icons within the month of November 2015. By limiting the data to new users who reach this level of the course, we hope to better capture beginners' broader language-learning process, including repeated interaction with vocabulary and grammar over time. Note that we excluded all learners who took a placement test to skip ahead in the course, since these learners are likely more advanced.

¹ https://www.duolingo.com


[Figure 1 panels: (a) home screen, (b) reverse_translate, (c) reverse_tap, (d) listen]

Figure 1: Duolingo screen-shots for an English-speaking student learning French (iPhone app, 2017). (a) The home screen, where learners can choose to do a "skill" lesson to learn new material, or get a personalized practice session by tapping the "practice weak skills" button. (b–d) Examples of the three exercise types included in our shared task experiments, which require the student to construct responses in the language they are learning.

[Figure 2:
  learner:    wen   can         help
  reference:  when  can   I     help   ?
  label:      1     0     1     0      (punctuation ignored)]

Figure 2: An illustration of how data labels are generated. Learner responses are aligned with the most similar reference answer, and tokens from the reference that do not match are labeled errors.

2.2 Three Language Tracks

An important question for SLA modeling is: to what extent does an approach generalize across languages? While the majority of Duolingo users learn English—which can significantly improve job prospects and quality of life (Pinon and Haydon, 2010)—Spanish and French are the second and third most popular courses. To encourage researchers to explore language-agnostic features, or unified cross-lingual modeling approaches, we created three tracks: English learners (who speak Spanish), Spanish learners (who speak English), and French learners (who speak English).

2.3 Label Prediction Task

The goal of the task is as follows: given a history of token-level errors made by the learner in the learning language (L2), accurately predict the errors they will make in the future. In particular, we focus on three Duolingo exercise formats that require the learners to engage in active recall, that is, they must construct answers in the L2 through translation or transcription.

Figure 1(b) illustrates a reverse translate item, where learners are given a prompt in the language they know (e.g., their L1 or native language), and translate it into the L2. Figure 1(c) illustrates a reverse tap item, which is a simpler version of the same format: learners construct an answer using a bank of words and distractors. Figure 1(d) is a listen item, where learners hear an utterance in the L2 they are learning, and must transcribe it. Duolingo does include many other exercise formats, but we focus on these three in the current work, since constructing L2 responses through translation or transcription is associated with deeper levels of processing, which in turn is more strongly associated with learning (Craik and Tulving, 1975).

Since each exercise can have multiple correct answers (due to synonyms, homophones, or ambiguities in tense, number, formality, etc.), Duolingo uses a finite-state machine to align the learner's response to the most similar reference answer from a large set of acceptable responses, based on token string edit distance (Levenshtein, 1966). For example, Figure 1(b) shows an example of corrective feedback based on such an alignment.

Figure 2 shows how we use these alignments to generate labels for the SLA modeling task. In this case, an English (from Spanish) learner was asked to translate, "¿Cuándo puedo ayudar?" and wrote "wen can help" instead of "When can I help?" This produces two errors (a typo and a missing pronoun). We ignore capitalization, punctuation, and accents when matching tokens.

Table 1: Summary of the SLA modeling data set.

Track     Users   TRAIN Tokens (Err)   DEV Tokens (Err)   TEST Tokens (Err)
English   2.6k    2.6M (13%)           387k (14%)         387k (15%)
Spanish   2.6k    2.0M (14%)           289k (16%)         282k (16%)
French    1.2k    927k (16%)           138k (18%)         136k (18%)
Overall   6.4k    5.5M (14%)           814k (15%)         804k (16%)
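To make the labeling procedure concrete, here is a minimal sketch of edit-distance alignment and label generation. It is illustrative only: Duolingo's production system aligns against a large set of reference answers with a finite-state machine, while this sketch aligns one learner response to one reference.

```python
def align_labels(learner, reference):
    """Label each reference token 0 (matched) or 1 (error) using a
    Levenshtein alignment with the learner's tokens (case-insensitive)."""
    n, m = len(learner), len(reference)
    # dp[i][j] = edit distance between learner[:i] and reference[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if learner[i - 1].lower() == reference[j - 1].lower() else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # extra learner token
                           dp[i][j - 1] + 1,         # missing reference token
                           dp[i - 1][j - 1] + cost)  # match or substitution
    labels = [1] * m  # every reference token is an error until matched
    i, j = n, m
    while i > 0 and j > 0:
        if (dp[i][j] == dp[i - 1][j - 1]
                and learner[i - 1].lower() == reference[j - 1].lower()):
            labels[j - 1] = 0          # exact match
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1        # substitution (e.g., a typo)
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1                     # extra learner token
        else:
            j -= 1                     # missing reference token
    return labels

# Figure 2 example (punctuation removed, since it is ignored):
print(align_labels(["wen", "can", "help"], ["When", "can", "I", "help"]))
# -> [1, 0, 1, 0]: a typo ("wen") and a missing pronoun ("I")
```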

2.4 Data Set Format

Sample data from the resulting corpus can be found in Figure 3. Each token from the reference answer is labeled according to the alignment with the learner's response (the final column: 0 for correct and 1 for incorrect). Tokens are grouped together by exercise, including user-, exercise-, and session-level meta-data in the previous line (marked by the # character). We included all exercises done by the users sampled from the 30-day data collection window.

The overall format is inspired by the Universal Dependencies (UD) format². Column 1 is a unique B64-encoded token ID, column 2 is a token (word), and columns 3–6 are morpho-syntactic features from the UD tag set (part of speech, morphology features, and dependency parse labels and edges). These were generated by processing the aligned reference answers with Google SyntaxNet (Andor et al., 2016). Because UD tags are meant to be language-agnostic, it was our goal to help make cross-lingual SLA modeling more straightforward by providing these features.

Exercise meta-data includes the following:
• user: 8-character unique anonymous user ID for each learner (B64-encoded)
• countries: 2-character ISO country codes from which this learner has done exercises
• days: number of days since the learner started learning this language on Duolingo
• client: session device platform
• session: session type (e.g., lesson or practice)
• format: exercise format (see Figure 1)
• time: the time (in seconds) it took the learner to submit a response for this exercise

Lesson sessions (about 77% of the data set) are where new words or concepts are introduced, although lessons also include previously-learned material (e.g., each exercise attempts to introduce only one new word or inflection, so all other tokens should have been seen by the student before). Practice sessions (22%) should contain only previously-seen words and concepts. Test sessions (1%) are mini-quizzes that allow a student to skip out of a single skill in the curriculum (i.e., the student may have never seen this content before in the Duolingo app, but may well have had prior knowledge before starting the course).

It is worth mentioning that for the shared task, we did not provide actual learner responses, only the closest reference answers. Releasing such data (at least in the TEST set) would by definition give away the labels and might undermine the task. However, we plan to release a future version of the corpus that is enhanced with additional meta-data, including the actual learner responses.

2.5 Challenge Timeline

The data were released in two phases. In phase 1 (8 weeks), TRAIN and DEV partitions were released with labels, along with a baseline system and evaluation script, for system development. In phase 2 (10 days), the TEST partition was released without labels, and teams submitted predictions to CodaLab³ for blind evaluation. To allow teams to compare different system parameters or features, they were allowed to submit up to 10 predictions total (up to 2 per day) during this phase.

Table 1 reports summary statistics for each of the data partitions for all three tracks. We created TRAIN, DEV, and TEST partitions as follows. For each user, the first 80% of their exercises were placed in the TRAIN set, the subsequent 10% in DEV, and the final 10% in TEST. Hence the three data partitions are sequential, and contain ordered observations for all users. Note that because the three data partitions are sequential, and the DEV set contains observations that are potentially valuable for making TEST set predictions, most teams opted to combine the TRAIN and DEV sets to train their systems in final phase 2 evaluations.
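For concreteness, this per-user sequential split can be sketched as follows with pandas. The code is hypothetical, with illustrative column names ("user", "days"), and is not the script used to produce the official release:

```python
import pandas as pd

def split_user_history(df, frac_train=0.8, frac_dev=0.1):
    """Assign each user's chronologically ordered exercises to TRAIN
    (first 80%), DEV (next 10%), and TEST (final 10%)."""
    df = df.sort_values(["user", "days"]).copy()
    # Position of each exercise within its user's history, scaled to [0, 1).
    rank = df.groupby("user").cumcount()
    size = df.groupby("user")["user"].transform("size")
    frac = rank / size
    df["partition"] = "test"
    df.loc[frac < frac_train + frac_dev, "partition"] = "dev"
    df.loc[frac < frac_train, "partition"] = "train"
    return df
```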

² http://universaldependencies.org
³ http://codalab.org

# user:XEinXf5+ countries:CO days:2.678 client:web session:practice format:reverse_translate time:6
oMGsnnH/0101 When ADV PronType=Int|fPOS=ADV++WRB advmod 4 1
oMGsnnH/0102 can AUX VerbForm=Fin|fPOS=AUX++MD aux 4 0
oMGsnnH/0103 I PRON Case=Nom|Number=Sing|Person=1|PronType=Prs|fPOS=PRON++PRP nsubj 4 1
oMGsnnH/0104 help VERB VerbForm=Inf|fPOS=VERB++VB ROOT 0 0

# user:XEinXf5+ countries:CO days:5.707 client:android session:practice format:reverse_translate time:22
W+QU2fm70301 He PRON Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs|fPOS=PRON++PRP nsubj 3 0
W+QU2fm70302 's AUX Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|fPOS=AUX++VBZ aux 3 1
W+QU2fm70303 wearing VERB Tense=Pres|VerbForm=Part|fPOS=VERB++VBG ROOT 0 0
W+QU2fm70304 two NUM NumType=Card|fPOS=NUM++CD nummod 5 0
W+QU2fm70305 shirts NOUN Number=Plur|fPOS=NOUN++NNS dobj 3 0

# user:XEinXf5+ countries:CO days:10.302 client:web session:lesson format:reverse_translate time:28
vOeGrMgP0101 We PRON Case=Nom|Number=Plur|Person=1|PronType=Prs|fPOS=PRON++PRP nsubj 2 0
vOeGrMgP0102 eat VERB Mood=Ind|Tense=Pres|VerbForm=Fin|fPOS=VERB++VBP ROOT 0 1
vOeGrMgP0103 cheese NOUN Degree=Pos|fPOS=ADJ++JJ dobj 2 1
vOeGrMgP0104 and CONJ fPOS=CONJ++CC cc 2 0
vOeGrMgP0105 they PRON Case=Nom|Number=Plur|Person=3|PronType=Prs|fPOS=PRON++PRP nsubj 6 0
vOeGrMgP0106 eat VERB Mood=Ind|Tense=Pres|VerbForm=Fin|fPOS=VERB++VBP conj 2 1
vOeGrMgP0107 fish NOUN fPOS=X++FW dobj 6 0

Figure 3: Sample exercise data from an English learner over time: roughly two, five, and ten days into the course.
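To illustrate how this format can be consumed, here is a minimal parser sketch for the exercise/token layout shown in Figure 3. It is a hypothetical reader, not the official data-loading script distributed with the task:

```python
def parse_slam_file(path):
    """Yield (metadata, tokens) pairs from a SLAM-format file: each
    exercise is a '#' meta-data line followed by one line per token."""
    with open(path, encoding="utf-8") as f:
        meta, tokens = None, []
        for line in f:
            line = line.strip()
            if line.startswith("#"):
                if meta is not None:
                    yield meta, tokens
                # e.g. "# user:XEinXf5+ countries:CO days:2.678 ..."
                meta = dict(kv.split(":", 1) for kv in line[1:].split())
                tokens = []
            elif line:
                fields = line.split()
                tokens.append({
                    "id": fields[0], "word": fields[1], "pos": fields[2],
                    "morph": fields[3], "dep": fields[4],
                    "head": int(fields[5]),
                    # Trailing 0/1 label is absent in unlabeled TEST files.
                    "label": int(fields[6]) if len(fields) > 6 else None,
                })
        if meta is not None:
            yield meta, tokens
```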

2.6 Evaluation

We use area under the ROC curve (AUC) as the primary evaluation metric for SLA modeling (Fawcett, 2006). AUC is a common measure of ranking quality in classification tasks, and can be interpreted as the probability that the system will rank a randomly-chosen error above a randomly-chosen non-error. We argue that this notion of ranking quality is particularly useful for evaluating systems that might be used for personalized learning, e.g., if we wish to prioritize words or exercises for an individual learner's review based on how likely they are to have forgotten or make errors at a given point in time.

We also report F1 score—the harmonic mean of precision and recall—as a secondary metric, since it is more common in similar skewed-class labeling tasks (e.g., Ng et al., 2013). Note, however, that F1 can be significantly improved simply by tuning the classification threshold (fixed at 0.5 for our evaluations) without affecting AUC.

3 Results

A total of 15 teams participated in the task, of which 13 responded to a brief survey about their approach, and 11 submitted system description papers. All but two of these teams submitted predictions for all three language tracks.

Official shared task results are reported in Table 2. System ranks are determined by sorting teams according to AUC, and using DeLong's test (DeLong et al., 1988) to identify statistical ties. For the remainder of this section, we provide a summary of each team's approach, ordered by the team's average rank across all three tracks. Certain teams are marked with modeling choice indicators (♢, ♣, ‡), which we discuss further in §5.

SanaLabs (Nilsson et al., 2018) used a combination of recurrent neural network (RNN) predictions with those of a gradient boosted decision tree (GBDT) ensemble, trained independently for each track. This was motivated by the observation that RNNs work well for sequence data, while GBDTs are often the best-performing non-neural model for shared tasks using tabular data. They also engineered several token context features, and learner/token history features such as number of times seen, time since last practice, etc.

singsound (Xu et al., 2018) used an RNN architecture with four types of encoders, representing different types of features: token context, linguistic information, user data, and exercise format. The RNN decoder integrated information from all four encoders. Ablation experiments revealed the context encoder (representing the token) contributed the most to model performance, while the linguistic encoder (representing grammatical information) contributed the least.

NYU (Rich et al., 2018) used an ensemble of GBDTs with features engineered based on psychological theories of cognition. Predictions for each track were averaged between a track-specific model and a unified model (trained on data from all three tracks). In addition to the word, user, and exercise features provided, the authors included word lemmas, corpus frequency, L1-L2 cognates, features indicating user motivation and diligence (derived from usage patterns), and others. Ablation studies indicated that most of the performance was due to the user and token features.
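For reference, both metrics from §2.6 are available in scikit-learn; a minimal scoring sketch (the toy arrays below are illustrative, not task data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

y_true = np.array([0, 1, 0, 0, 1, 1, 0])           # gold token labels
y_prob = np.array([.1, .8, .7, .2, .6, .4, .05])   # predicted P(error)

auc = roc_auc_score(y_true, y_prob)
f1 = f1_score(y_true, y_prob >= 0.5)  # threshold fixed at 0.5, as in the task
print(f"AUC = {auc:.3f}, F1 = {f1:.3f}")
```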

English Track
 ↑  Team             AUC   F1
 1  SanaLabs ♢♣      .861  .561
 1  singsound ♢      .861  .559
 3  NYU ♣‡           .859  .468
 4  TMU ♢‡           .848  .476
 5  CECL ‡           .846  .414
 6  Cambridge ♢      .841  .479
 7  UCSD ♣           .829  .424
 8  nihalnayak       .821  .376
 8  LambdaLab ♣      .821  .389
10  Grotoco          .817  .462
11  jilljenn         .815  .329
12  ymatusevych      .813  .381
13  renhk            .797  .448
14  zlb241           .787  .003
15  SLAM_baseline    .774  .190

Spanish Track
 ↑  Team             AUC   F1
 1  SanaLabs ♢♣      .838  .530
 2  NYU ♣‡           .835  .420
 2  singsound ♢      .835  .524
 4  TMU ♢‡           .824  .439
 5  CECL ‡           .818  .390
 6  Cambridge ♢      .807  .435
 7  UCSD ♣           .803  .375
 7  LambdaLab ♣      .801  .344
 9  Grotoco          .791  .452
 9  nihalnayak       .790  .338
11  ymatusevych      .789  .347
11  jilljenn         .788  .306
13  renhk            .773  .432
14  SLAM_baseline    .746  .175
15  zlb241           .682  .389

French Track
 ↑  Team             AUC   F1
 1  SanaLabs ♢♣      .857  .573
 2  singsound ♢      .854  .569
 2  NYU ♣‡           .854  .493
 4  CECL ‡           .843  .487
 5  TMU ♢‡           .839  .502
 6  Cambridge ♢      .835  .508
 7  UCSD ♣           .823  .442
 8  LambdaLab ♣      .815  .415
 8  Grotoco          .813  .502
10  nihalnayak       .811  .431
10  jilljenn         .809  .406
10  ymatusevych      .808  .441
13  simplelinear     .807  .394
14  renhk            .796  .481
15  SLAM_baseline    .771  .281

Table 2: Final results. Ranks (↑) are determined by statistical ties (see text). Markers indicate which systems include recurrent neural architectures (♢), decision tree ensembles (♣), or a multitask model across all tracks (‡).

TMU (Kaneko et al., 2018) used a combination of two bidirectional RNNs—the first to predict potential user errors at a given token, and a second to track the history of previous answers by each user. These networks were jointly trained through a unified objective function. The authors did not engineer any additional features, but did train a single model for all three tracks (using a track ID feature to distinguish among them).

CECL (Bestgen, 2018) used a logistic regression approach. The base feature set was expanded to include many feature conjunctions, including word n-grams crossed with the token, user, format, and session features provided with the data set.

Cambridge (Yuan, 2018) trained two RNNs—a sequence labeler, and a sequence-to-sequence model taking into account previous answers—and found that averaging their predictions yielded the best results. They focused on the English track, experimenting with additional features derived from other English learner corpora. Hyper-parameters were tuned for English and used as-is for the other tracks, with comparable results.

UCSD (Tomoschuk and Lovelett, 2018) used a random forest classifier with a set of engineered features motivated by previous research in memory and linguistic effects in SLA, including "word neighborhoods," corpus frequency, cognates, and repetition/experience with a given word. The system also included features specific to each user, such as mean and variance of error rates.

LambdaLab (Chen et al., 2018) used GBDT models independently for each track, deriving their features from confirmatory analysis of psychologically-motivated hypotheses on the TRAIN set. These include proxies for student engagement, spacing effect, response time, etc.

nihalnayak (Nayak and Rao, 2018) used a logistic regression model similar to the baseline, but added features inspired by research in code-mixed language-learning where context plays an important role. In particular, they included word, part of speech, and metaphone features for previous:current and current:next token pairs.

Grotoco (Klerke et al., 2018) also used logistic regression, including word lemmas, frequency, cognates, and user-specific features such as word error rate. Interestingly, the authors found that ignoring each user's first day of exercise data improved their predictions, suggesting that learners first needed to familiarize themselves with the app before their data were reliable for modeling.

jilljenn (Vie, 2018) used a deep factorization machine (DeepFM), a neural architecture developed for click-through rate prediction in recommender systems. This model allows learning from both lower-order and higher-order induced features and their interactions. The DeepFM outperformed a simple logistic regression baseline without much additional feature engineering.

Other teams did not submit system description papers. However, according to a task organizer survey, ymatusevych used a linear model with multilingual word embeddings, corpus frequency, and several L1-L2 features such as cognates. Additionally, simplelinear used an ensemble of some sort (for the French track only). renhk and zlb241 provided no details about their systems.
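To illustrate the feature-conjunction idea behind systems like CECL, here is a schematic sketch. The specific crossings and the use of scikit-learn's FeatureHasher are our own illustrative choices, not Bestgen's (2018) actual implementation:

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression

def conjunction_features(token, user, fmt, session):
    """Cross the token with user/format/session fields so a linear model
    can capture interactions like 'this user errs on this word in
    practice sessions'."""
    return {
        f"token={token}": 1,
        f"user={user}": 1,
        f"token={token}&user={user}": 1,
        f"token={token}&format={fmt}": 1,
        f"token={token}&session={session}": 1,
    }

# Hash the sparse conjunctions into a fixed-width feature space,
# then fit an ordinary logistic regression on (X, labels).
hasher = FeatureHasher(n_features=2 ** 20)
X = hasher.transform([conjunction_features("help", "XEinXf5+",
                                           "reverse_translate", "practice")])
model = LogisticRegression()
```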

SLAM_baseline is the baseline system provided by the task organizers. It is a simple logistic regression using data set features, trained separately for each track using stochastic gradient descent on the TRAIN set only.

4 Related Work

SLA modeling is a rich problem, and presents an opportunity to synthesize work from various subfields in cognitive science, linguistics, and machine learning. This section highlights a few key concepts from these fields, and how they relate to the approaches taken by shared task participants.

Item response theory (IRT) is a common psychometric modeling approach used in educational software (e.g., Chen et al., 2005). In its simplest form (Rasch, 1980), an IRT model is a logistic regression with two weights: one representing the learner's ability (i.e., user ID), and the other representing the difficulty of the exercise or test item (i.e., token ID). An extension of this idea is the additive factor model (Cen et al., 2008), which adds additional "knowledge components" (e.g., lexical, morphological, or syntactic features). Teams that employed linear models (including our baseline) are essentially all additive factor IRT models.
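A minimal sketch of this Rasch-style model, written as a logistic regression over one-hot user and token indicators (illustrative code, not any team's actual system):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each observation records which learner attempted which token
# (the user IDs here are made up); label 1 means an error was made.
observations = [
    {"user": "XEinXf5+", "token": "help"},
    {"user": "XEinXf5+", "token": "when"},
    {"user": "aB3dE9fG", "token": "help"},
]
labels = [0, 1, 1]

# Rasch-style IRT: logit P(error) = ability(user) + difficulty(token).
model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(observations, labels)
print(model.predict_proba([{"user": "XEinXf5+", "token": "help"}])[:, 1])
```

Adding further keys to each observation dict (e.g., part-of-speech or morphology indicators) turns this into an additive factor model.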
For decades, tutoring systems have also employed sequence models like HMMs to perform knowledge tracing (Corbett and Anderson, 1995), a way of estimating a learner's mastery of knowledge over time. RNN-based approaches that encode user performance over time (i.e., that span across exercises) are therefore variants of deep knowledge tracing (Piech et al., 2015).

Relatedly, the spacing effect (Dempster, 1989) is the observation that people will not only learn but also forget over time, and they remember more effectively through scheduled practices that are spaced out. Settles and Meeder (2016) and Ridgeway et al. (2017) recently proposed non-linear regressions that explicitly encode the rate of forgetting as part of a decision surface; however, none of the current teams chose to do this. Instead, forgetting was either modeled through engineered features (e.g., user/token histories), or opaquely handled by sequential RNN architectures.

SLA modeling also bears some similarity to research in grammatical error detection (Leacock et al., 2010) and correction (Ng et al., 2013). For these tasks, a model is given a (possibly ill-formed) sequence of words produced by a learner, and the task is to identify which are mistakes. SLA modeling is in some sense the opposite: given a well-formed sequence of words that a learner should be able to produce, identify where they are likely to make mistakes. Given these similarities, a few teams adapted state-of-the-art GEC/GED approaches to create their SLA modeling systems.

Finally, multitask learning (e.g., Caruana, 1997) is the idea that machine learning systems can do better at multiple related tasks by trying to solve them simultaneously. For example, recent work in machine translation has demonstrated gains through learning to translate multiple languages with a unified model (Dong et al., 2015). Similarly, the three language tracks in this work presented an opportunity to explore a unified multitask framework, which a few teams did with positive results.

5 Meta-Analyses

In this section, we analyze the various modeling choices explored by the different teams in order to shed light on what kinds of algorithmic and feature engineering decisions appear to be useful for the SLA modeling task.

5.1 Learning Algorithms

Here we attempt to answer the question of whether particular machine learning algorithms have a significant impact on task performance. For example, the results in Table 2 suggest that the algorithmic choices indicated by (♢, ♣, ‡) are particularly effective. Is this actually the case?

To answer this question, we partitioned the TEST set into 6.4k subsets (one for each learner), and computed per-user AUC scores for each team's predictions (83.9k observations total). We also coded each team with indicator variables to describe their algorithmic approach, and used a regression analysis to determine if these algorithmic variations had any significant effects on learner-specific AUC scores.

To analyze this properly, however, we need to determine whether the differences among modeling choices are actually meaningful, or can simply be explained by sampling error due to random variations among users, teams, or tracks. To do this, we use a linear mixed-effects model (cf. Baayen, 2008, Ch. 7). In addition to modeling the fixed effects of the various learning algorithms, we can also model the random effects represented by the user ID (learners may vary by ability), the team ID (teams may differ in other aspects not captured by our schema, e.g., the hardware used), and the track ID (tracks may vary inherently in difficulty).
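A simplified sketch of this kind of analysis using statsmodels, with a single random intercept per user (the full analysis also includes team and track random effects, i.e., crossed random effects as fit by R's lme4; the column and file names below are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per (team, user): that team's AUC on the user's tokens,
# plus 0/1 indicators for the team's algorithmic choices.
df = pd.read_csv("per_user_auc.csv")  # hypothetical file name

# Fixed effects: algorithm indicators; random intercept: user ID.
model = smf.mixedlm("auc ~ rnn + gbdt + linear + multitask",
                    data=df, groups=df["user_id"])
print(model.fit().summary())
```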

Table 3: Mixed-effects analysis of learning algorithms.

Fixed effects (algorithm choices)   Effect   p-value
Intercept                            .786    <.001 ***
Recurrent neural network (♢)        +.028     .012 *
Decision tree ensemble (♣)          +.018     .055 .
Linear model (e.g., IRT)            −.006     .541
Multitask model (‡)                 +.023     .017 *

Random effects   St. Dev.
User ID          ±.086
Team ID          ±.013
Track ID         ±.011

Table 3 presents a mixed-effects analysis for the algorithm variations used by at least 3 teams. The intercept can be interpreted as the "average" AUC of .786. Controlling for the random effects of user (which exhibits a wide standard deviation of ±.086 AUC), team (±.013), and track (±.011), three of the algorithmic choices are at least marginally significant (p < .1). For example, we might expect a system that uses RNNs to model learner mastery over time would add +.028 to learner-specific AUC (all else being equal). Note that most teams' systems that were not based on RNNs or tree ensembles used logistic regression, hence the "linear model" effect is negligible (effectively treated as a control condition in the analysis).

These results suggest two key insights for SLA modeling. First, non-linear algorithms are particularly desirable⁴, and second, multitask learning approaches that share information across tracks (i.e., languages) are also effective.

5.2 Feature Sets

We would also like to get a sense of which features, if any, significantly affect system performance. Table 4 lists features provided with the SLA modeling data set, as well as several newly-engineered feature types that were employed by at least three teams (note that the precise details may vary from team to team, but in our view aim to capture the same phenomena). We also include each feature's popularity and an effect estimate⁵.

Table 4: Summary of system features—both provided (top) and team-engineered (bottom)—with team popularity and univariate mixed-effects estimates. [Popularity values were graphical in the original table and are not reproduced here.]

Features used (provided)      Effect
Word (surface form)           +.005
User ID                       +.014
Part of speech                −.008
Dependency labels             −.011
Morphology features           −.021
Dependency edges              −.000
Response time                 +.028 *
Days in course                +.023 .
Client                        +.005
Countries                     +.012
Session                       +.014

Features used (engineered)    Effect
Word corpus frequency         +.008
Spaced repetition features    +.013
L1-L2 cognates                +.001
Word embeddings               +.020
Word stem/root/lemma          +.007

Broadly speaking, results suggest that feature engineering had a much smaller impact on system performance than the choice of learning algorithm. Only "response time" and "days in course" showed even marginally significant trends.

Of particular interest is the observation that morpho-syntactic features (described in §2.4) actually seem to have weakly negative effects. This echoes singsound's finding that their linguistic encoder contributed the least to system performance, and Cambridge determined through ablation studies that these features in fact hurt their system. One reasonable explanation is that these automatically-generated features contain too many systematic parsing errors to provide value. (Note that NYU artificially introduced punctuation to the exercises and re-parsed the data in their work.)

As for newly-engineered features, word information such as frequency, semantic embeddings, and stemming were popular. It may be that these features showed such little return because our corpus was too biased toward beginners—thus representing a very narrow sample of language—for these features to be meaningful. Cognate features were an interesting idea used by a few teams, and may have been more useful if the data included users from a wider variety of different L1 language backgrounds.

⁴ Interestingly, the only linear model to rank among the top 5 (CECL) relied on combinatorial feature conjunctions—which effectively alter the decision surface to be non-linear with respect to the original features. The RNN hidden nodes and GBDT constituent trees from other top systems may in fact be learning to represent these same feature conjunctions.

⁵ This is similar to the analysis in §5.1, except that we regress on each feature separately. That is, a feature is the only fixed effect in the model (alongside the intercept), while still controlling for user, team, and track random effects.

Spaced repetition features also exhibited marginal (but statistically insignificant) gains. We posit that the 30-day window we used for data collection was simply not long enough for these features to capture more long-term learning (and forgetting) trends.

Table 5: AUC results for the ensemble analysis.

System            English   Spanish   French
Oracle            .995      .996      .993
Stacking          .867      .844      .863
Average (top 3)   .867      .843      .863
1st team          .861      .838      .857
2nd team          .861      .835      .854
3rd team          .859      .835      .854
Average (all)     .857      .832      .852
4th team          .848      .824      .843

5.3 Ensemble Analysis

Another interesting research question is: what is the upper-bound for this task? This can be estimated by treating each team's best submission as an independent system, and combining the results using ensemble methods in a variety of ways. Such analyses have been previously applied to other shared task challenges and meta-analyses (e.g., Malmasi et al., 2017).

The oracle system is meant to be an upper-bound: for each token in the TEST set, the oracle outputs the team prediction with the lowest error for that particular token. We also experiment with stacking (Wolpert, 1992) by training a logistic regression classifier using each team's prediction as an input feature⁶. Finally, we also pool system predictions together by taking their average (mean).

Table 5 reports AUC for various ensemble methods as well as some of the top performing team systems for all three tracks. Interestingly, the oracle is exceptionally accurate (>.993 AUC and >.884 F1, not shown). This indicates that the potential upper limit of performance on this task is quite high, since there exists a near-perfect ranking of tokens in the TEST set based only on predictions from these 15 diverse participating teams.

The stacking classifier produces significantly better rankings than any of the constituent systems alone, while the average (over all teams) ranked between the 3rd and 4th best system in all three tracks. Inspection of stacking model weights revealed that it largely learned to trust the top-performing systems, so we also tried simply averaging the top 3 systems for each track, and this method was statistically tied with stacking for the English and French tracks (p = 0.002 for Spanish).

Interestingly, the highest-weighted team in each track's stacking model was singsound (+2.417 on average across the three models), followed by NYU (+1.632), whereas the top-performing team SanaLabs had a surprisingly lower weight (+0.841). This could be due to the fact that their system was itself an ensemble of RNN and GBDT models, which were used (in isolation) by each of the other two teams. This seems to add further support for the effectiveness of combining these algorithms for the task.

⁶ Note that we only have TEST set predictions for each team. While we averaged stacking classifier weights across 10 folds using cross-validation, the reported AUC is still likely an over-estimate, since the models were in some sense trained on the TEST set.
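To make the three ensembles concrete, a schematic sketch operating on a matrix of team predictions (illustrative code, not the organizers' analysis scripts):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def oracle(preds, y_true):
    """Upper bound: for each token, output the team prediction with the
    lowest absolute error. preds has shape (n_teams, n_tokens)."""
    best = np.argmin(np.abs(preds - y_true), axis=0)
    return preds[best, np.arange(preds.shape[1])]

def mean_ensemble(preds):
    """Pool systems by averaging their predicted probabilities."""
    return preds.mean(axis=0)

def stacking(preds, y_true):
    """Wolpert-style stacking: logistic regression over team predictions.
    (Fitting on TEST predictions over-estimates AUC; see footnote 6.)"""
    clf = LogisticRegression().fit(preds.T, y_true)
    return clf.predict_proba(preds.T)[:, 1]
```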
6 Conclusion and Future Work

In this work, we presented the task of second language acquisition (SLA) modeling, described a large data set for studying this task, and reported on the results of a shared task challenge that explored this new domain. The task attracted strong participation from 15 teams, who represented a wide variety of fields including cognitive science, linguistics, and machine learning.

Among our key findings is the observation that, for this particular formulation of the task, the choice of learning algorithm appears to be more important than clever feature engineering. In particular, the most effective teams employed sequence models (e.g., RNNs) that can capture user performance over time, and tree ensembles (e.g., GBDTs) that can capture non-linear relationships among features. Furthermore, using a multitask framework—in this case, a unified model that leverages data from all three language tracks—can provide further improvements.

Still, many teams opted for a simpler algorithm (e.g., logistic regression) and concentrated instead on more psychologically-motivated features. While these teams did not always perform as well, several demonstrated through ablation studies that these features can be useful within the limitations of the algorithm. It is possible that the constraints of the SLA modeling data set (beginner language, homogeneous L1 language background, short 30-day time frame, etc.) prevented these features from being more useful across different teams and learning algorithms.

It would be interesting to revisit these ideas using a more diverse and longitudinal data set in the future.

To support ongoing research in SLA modeling, current and future releases of our data set will be publicly maintained online at: https://doi.org/10.7910/DVN/8SWHNO

Acknowledgments

The authors would like to acknowledge Bożena Pająk, Joseph Rollinson, and Hideki Shima for their help planning and co-organizing the shared task. Eleanor Avrunin and Natalie Glance made significant contributions to early versions of the SLA modeling data set, and Anastassia Loukina and Kristen K. Reyher provided helpful advice regarding mixed-effects modeling. Finally, we would like to thank the organizers of the NAACL-HLT 2018 Workshop on Innovative Use of NLP for Building Educational Applications (BEA) for providing a forum for this work.

References

D. Andor, C. Alberti, D. Weiss, A. Severyn, A. Presta, K. Ganchev, S. Petrov, and M. Collins. 2016. Globally normalized transition-based neural networks. CoRR, abs/1603.06042.

R.H. Baayen. 2008. Analyzing Linguistic Data: A Practical Introduction to Statistics using R. Cambridge University Press.

Y. Bestgen. 2018. Predicting second language learner successes and mistakes by means of conjunctive features. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

R. Caruana. 1997. Multitask learning. Machine Learning, 28:41–75.

H. Cen, K. Koedinger, and B. Junker. 2008. Comparing two IRT models for conjunctive skills. In Proceedings of the Conference on Intelligent Tutoring Systems (ITS), pages 796–798. Springer.

C.M. Chen, H.M. Lee, and Y.H. Chen. 2005. Personalized e-learning system using item response theory. Computers & Education, 44(3):237–255.

G. Chen, C. Hauff, and G.J. Houben. 2018. Feature engineering for second language acquisition modeling. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

A.T. Corbett and J.R. Anderson. 1995. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4):253–278.

F.I.M. Craik and E. Tulving. 1975. Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology, 104:268–294.

E.R. DeLong, D.M. DeLong, and D.L. Clarke-Pearson. 1988. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44:837–845.

F.N. Dempster. 1989. Spacing effects and their implications for theory and practice. Educational Psychology Review, 1(4):309–330.

D. Dong, H. Wu, W. He, D. Yu, and H. Wang. 2015. Multi-task learning for multiple language translation. In Proceedings of the Association for Computational Linguistics (ACL), pages 1723–1732. ACL.

T. Fawcett. 2006. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874.

M. Kaneko, T. Kajiwara, and M. Komachi. 2018. TMU system for SLAM-2018. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

S. Klerke, H.M. Alonso, and B. Plank. 2018. Grotoco@SLAM: Second language acquisition modeling with simple features, learners and task-wise models. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

C. Leacock, M. Chodorow, M. Gamon, and J. Tetreault. 2010. Automated grammatical error detection for language learners. Synthesis Lectures on Human Language Technologies, 3(1):1–134.

V.I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710.

S. Malmasi, K. Evanini, A. Cahill, J. Tetreault, R. Pugh, C. Hamill, D. Napolitano, and Y. Qian. 2017. A report on the 2017 Native Language Identification shared task. In Proceedings of the EMNLP Workshop on Innovative Use of NLP for Building Educational Applications (BEA), pages 62–75, Copenhagen, Denmark. ACL.

N.V. Nayak and A.R. Rao. 2018. Context based approach for second language acquisition. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

H.T. Ng, S.M. Wu, Y. Wu, C. Hadiwinoto, and J. Tetreault. 2013. The CoNLL-2013 shared task on grammatical error correction. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL), pages 1–12. ACL.

S. Nilsson, A. Osika, A. Sydorchuk, F. Sahin, and A. Huss. 2018. Second language acquisition modeling: An ensemble approach. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L.J. Guibas, and J. Sohl-Dickstein. 2015. Deep knowledge tracing. In Advances in Neural Information Processing Systems (NIPS), pages 505–513.

R. Pinon and J. Haydon. 2010. The benefits of the English language for individuals and societies: Quantitative indicators from Cameroon, Nigeria, Rwanda, Bangladesh and Pakistan. Technical report, Euromonitor International for the British Council.

G. Rasch. 1980. Probabilistic models for some intelligence and attainment tests. The University of Chicago Press.

A. Rich, P.O. Popp, D. Halpern, A. Rothe, and T. Gureckis. 2018. Modeling second-language learning from a psychological perspective. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

K. Ridgeway, M.C. Mozer, and A.R. Bowles. 2017. Forgetting of foreign-language skills: A corpus-based analysis of online tutoring software. Cognitive Science, 41(4):924–949.

B. Settles and B. Meeder. 2016. A trainable spaced repetition model for language learning. In Proceedings of the Association for Computational Linguistics (ACL), pages 1848–1858. ACL.

B. Tomoschuk and J. Lovelett. 2018. A memory-sensitive classification model of errors in early second language learning. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

J.J. Vie. 2018. Deep factorization machines for knowledge tracing. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

D.H. Wolpert. 1992. Stacked generalization. Neural Networks, 5(2):241–259.

S. Xu, J. Chen, and L. Qin. 2018. CLUF: A neural model for second language acquisition modeling. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.

Z. Yuan. 2018. Neural sequence modelling for learner error prediction. In Proceedings of the NAACL-HLT Workshop on Innovative Use of NLP for Building Educational Applications (BEA). ACL.
