Multiple Admissibility in Language Learning: Judging Grammaticality Using Unlabeled Data

Anisia Katinskaia, Sardana Ivanova, Roman Yangarber
University of Helsinki, Department of Computer Science, Finland
[email protected]

Abstract

We present our work on the problem of detecting Multiple Admissibility (MA) in language learning. Multiple Admissibility occurs when more than one grammatical form of a word fits syntactically and semantically in a given context. In second-language education—in particular, in intelligent tutoring systems and computer-aided language learning (ITS/CALL)—systems generate exercises automatically. MA implies that multiple alternative answers are possible. We treat the problem as a grammaticality judgement task. We train a neural network with an objective to label sentences as grammatical or ungrammatical, using a "simulated learner corpus": a dataset with correct text and with artificial errors, generated automatically. While MA occurs commonly in many languages, this paper focuses on learning Russian. We present a detailed classification of the types of constructions in Russian in which MA is possible, and evaluate the model using a test set built from answers provided by users of the Revita language learning system.

1 Introduction

The problem of Multiple Admissibility (MA) occurs in the context of language learning. In "cloze" exercises (fill-in-the-blank), the learner receives a text with some word removed, and a base form [1] of the removed word as a hint. The task is to produce the correct grammatical form of the missing word, given the context. The answer given by the user is checked automatically by the language learning system. Therefore, the system should be able to accept more than one answer, if there are grammatically and semantically valid alternatives in the given context. Otherwise, the language learning system returns negative (actually incorrect) feedback to the learner. This is a problem, because negative feedback for an acceptable answer misleads and discourages the learner.

We examine MA in the context of our language learning system, Revita (Katinskaia et al., 2018). Revita is available online [2] for second language (L2) learning beyond the beginner level. It is in use in official university-level curricula at several major universities. It covers several languages, many of which are highly inflectional, with rich morphology. Revita creates a variety of exercises based on input text materials, which are selected by the users. It generates exercises and assesses the users' answers automatically.

For example, consider the sentence in Finnish:

"Ilmoitus vaaleista tulee kotiin postissa."
("Notice about elections comes to the house in the mail.")

In practice mode, Revita presents the text to the learner in small pieces—"snippets" (about one paragraph each)—with all generated exercises. This is important, because grammatical forms in exercises usually depend on a wider context.

For the given example, Revita can generate cloze exercises hiding several tokens (surface forms) and providing their lemmas as hints:

"Ilmoitus vaali tulla koti posti."
("Notice election come house mail.")

For the verb "tulla" ("come") and the nouns "vaali" ("election"), "koti" ("home") and "posti" ("mail"), the learner should insert the correct grammatical forms. Revita expects them to be the same as the forms in the original text, and will return negative feedback otherwise. However, in this example, "postitse" ("by mail") is also acceptable, although it is not in the original text.

The MA problem is also relevant in the context of exercises that require a free-form answer, such as an answer to a question or an essay. The learner can produce "unexpected" but nevertheless valid grammatical forms.

[1] The base or "dictionary" form will be referred to as lemma in this paper.
[2] https://revita.cs.helsinki.fi/
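The acceptance logic described above can be sketched as a small check against a set of admissible forms per exercise slot. This is a hypothetical illustration, not Revita's actual implementation: the slot names and the `ADMISSIBLE` table are invented for the example.

```python
# Minimal sketch of MA-aware answer checking.
# Hypothetical data and slot names; not Revita's actual implementation.

# For each exercise slot, the system needs the form from the original
# text plus any alternative forms admissible in the given context.
ADMISSIBLE = {
    ("tulla", "slot-2"): {"tulee"},                 # only one valid form here
    ("posti", "slot-4"): {"postissa", "postitse"},  # MA: two valid forms
}

def check_answer(lemma: str, slot: str, answer: str) -> bool:
    """Accept the answer if it is any admissible form for the slot,
    not only the form that occurred in the original text."""
    return answer in ADMISSIBLE.get((lemma, slot), set())

print(check_answer("posti", "slot-4", "postitse"))  # True: MA answer accepted
print(check_answer("tulla", "slot-2", "tuli"))      # False: wrong form in context
```

The hard part, of course, is producing the set of admissible forms in the first place, which is what the rest of the paper addresses.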


Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 12–22, Florence, Italy, 2 August 2019. © 2019 Association for Computational Linguistics

In the current work, we restrict our focus to MA of multiple surface forms of the same lemma, given the context; we do not consider synonyms, which can also fit the same context. The latter topic is left for future work.

The structure of the paper is as follows: In Section 2 we formulate the problem and provide a detailed classification of the types of MA found in the learner data. In Section 3 we describe our approach, in particular, the procedure for generating artificial grammatical errors and creating test sets. Section 4 presents previous work on artificial error generation. Section 5 describes our model and experimental setup. In Section 6 we discuss the results and analysis, and we conclude in Section 7.

2 Multiple Admissibility: Problem Overview

We use data from several hundred registered students learning Russian (and other languages), practicing with exercises based on texts, or answering questions in test sessions. Currently, Revita does not provide a mode for written essays, because it does not check free-form answers.

Besides cloze exercises, Revita also generates exercises where the users are asked to select among various surface forms based on a given lemma—a correct surface form and a set of automatically generated distractors—or to type the word they hear. This allows us to collect a wide range of errors, though not all kinds of possible errors; e.g., currently we do not track punctuation errors, insertion errors, errors resulting from choosing an incorrect lemma, etc.

We annotated 2884 answers from the Revita database, which were automatically marked as "not matching the original text." This work was done by two annotators (90% agreement), both with native-level competency in Russian and a background in linguistics and teaching Russian. Among all annotated answers, 7.5% were actually correct, but differed from the original text. We also checked answers given by three students with C1 CEFR proficiency level in Russian (established independently by their teachers); 15.8% of these answers were grammatically and semantically valid in the context. Thus, for advanced users, the problem of MA is twice as relevant as on average; we plan to investigate these results with a larger base of Revita users.

2.1 Types of Multiple Admissibility

In analyzing the answers given by our users, we discovered several types of the most frequent contexts where MA appears.

Present/Past tense: The clearest case of MA in Russian (as in many other languages) is the case of interchangeable forms of the present and past tense of verbs. Russian has three tenses (present, past, future), of which the past and future tenses can have perfective or imperfective aspect.

In the next example, if both verbs are chosen as exercises [3] and the learner sees two lemmas ("задержаться" and "вернуться"), she has the possibility to produce verb forms in any tense, depending on the context beyond the given sentence. [4] It may be narrative past tense or present tense, or the text may be a message where the communicative goal of the speaker is to inform the reader about future events, so the future tense is expected.

"Мы задержались (PST) у друзей и вернулись (PST) домой поздно."
"We were delayed with friends and returned home late."

The following option may be acceptable:

"Мы задержались (PST) у друзей и вернемся (FUT) домой поздно."
"We were delayed with friends and will return home late."

The future cannot precede the past in this sentence, so the next variant answer is grammatically incorrect:

*"Мы задержимся (FUT) у друзей и вернулись (PST) домой поздно." [5]
"We will be delayed with friends and returned home late."

Some cases are more difficult, because the choice of tense can depend on knowledge beyond the text:

"Ученым удалось установить, что у минойцев существовало (PST) несколько видов письма."

[3] If one of the verbs is chosen as an exercise, the user may get a hint from the surface form of the other verb, that they should be coordinated.
[4] We use standard abbreviations from the Leipzig Glossing Rules.
[5] The star is used to mark an incorrect sentence.

"Scientists were able to establish that the Minoans had several types of writing systems."

In a non-fictional narrative, only the past tense can be used, since the Minoans do not exist in the present. These examples show that deciding which tenses are acceptable in a context is difficult.

Singular/Plural: Singular and plural nouns can be grammatically valid in the same context, if there is no dependency on words beyond the sentence boundaries. For instance:

"Из последних разработок—скафандр для работы (SG) в открытом космосе."
"Из последних разработок—скафандр для работ (PL) в открытом космосе."
"From the latest developments—the spacesuit for work in open space."

Short/Full adjective: [6] In many constructions, short and full forms of adjectives can be used as part of a compound predicate (Vinogradov, 1972). The difference between them is that the short form typically expresses temporal meaning and is a phenomenon of literary language, whereas its full alternative sounds more colloquial.

"Вы ужасно болтливы (short) и непоседливы (short)." (from I. Bunin)
"You are being terribly talkative (PRED) and restless (PRED)."

"Вы ужасно болтливые и непоседливые."
"You are terribly talkative (DESCR) and restless (DESCR)."

We treat these examples as MA, because even for native Russian speakers the choice can be unclear in many cases, so it would be too strict to treat one of the variants as incorrect.

Nominative/Instrumental case: Nouns as part of compound nominal predicates can be in the nominative or the instrumental case. The difference in meaning is similar to that of short/full adjectives (Rosenthal et al., 1994): the nominative indicates a constant feature of the subject, whereas the instrumental indicates a temporary feature of the subject. We consider the following examples as MA, because of the subtle difference in meaning:

"Она была загадочная (NOM) и непонятная (NOM) для меня." (from I. Bunin)
"Она была загадочной (INS) и непонятной (INS) для меня."
"She was mysterious and incomprehensible to me."

Genitive/Accusative case: The usage of the genitive vs. the accusative is a complex topic, beyond the scope of this paper. We briefly mention a few examples where MA can appear—denotation of a part of the whole, and negation. Usually the genitive is used to denote a part of the whole. In the following example, usage of the accusative is incorrect:

"Пожалуйста, отрежь хлеба." (GEN)
*"Пожалуйста, отрежь хлеб." (ACC)
"Please, cut some bread."

In some contexts both meanings are possible—of a part and of the whole—resulting in MA:

"Нам оставили хлеба и вина." (GEN)
"We were left with some bread and wine."

"Нам оставили хлеб и вино." (ACC)
"We were left with bread and wine."

If both words appear in exercises, Revita should accept the genitive or the accusative (if the context specifies the expected meaning nowhere else).

The next case is negation constructions. The genitive is usually used where the negation is stressed, whereas the accusative weakens the negation (Rosenthal et al., 1994). However, it is worth noticing that the difference can be difficult to understand even for native Russian speakers.

"До вас никто еще этого браслета (GEN) не надевал." (from Kuprin)
"No one has worn this bracelet before you."

Compare with a similar example in the accusative:

"Он не отвергнул тогда с презрением эти сто (ACC) рублей." (from Dostoevsky)
"He did not reject those one hundred rubles with contempt."

It is not always possible for the learner to know which case is expected in the sentence, because this implies that she should know which type of negation was meant; both options fit semantically.

Perfective/Imperfective aspect: Errors in aspect are very common (Rozovskaya and Roth, 2019).

[6] So-called full/short forms of adjectives correspond to descriptive/predicative adjectives in other languages: compare "the hungry (DESCR) dog" vs. "the dog is hungry (PRED)".

Without going into detail, we show examples from (Rakhilina et al., 2014)—a heritage learner corpus. [7] Both sentences can be interpreted as correct, with a subtle difference in meaning:

"Он пишет ей, чтобы она не перестала (PFV) любить его."
"He writes to her that she should not stop loving him."

"Он пишет ей, чтобы она не переставала (IPFV) любить его."
"He writes to her that she continue to love him."

Gerund/Other verb forms: MA occurs in some contexts where gerunds are used. In the following example, both sentences can express the meaning of two actions happening at the same time. The only argument against using a past-tense verb form is that it sounds somewhat unnatural without the conjunction "and" between the verbs ("was saying and thinking"):

"... говорил я себе, думая (GERUND) об Охотном ряде" (from Bunin)
"... I was saying to myself, thinking about Okhotny Ryad"

"... говорил я себе, думал (PST) об Охотном ряде"
"... I was saying to myself, was thinking about Okhotny Ryad"

Prepositions with multiple cases: Some prepositions can govern two different cases of the following noun, with no change in meaning:

"Она спряталась под одеялом." (INS)
"Она спряталась под одеяло." (ACC)
"She hid under a blanket."

Second Genitive (Partitive): This and other forms common in spoken language can be valid alternatives, often unfamiliar to L2 learners.

"Я привозил ей коробки шоколаду (2GEN), новые книги..."
"Я привозил ей коробки шоколада (GEN), новые книги..."
"I was bringing her boxes of chocolate, new books..."

Other cases: Some examples of MA contexts are exceptionally interesting and rare.

"На этой почве хорошо росла трава, что обеспечивало (NEUT) пастбищами овец."
"На этой почве хорошо росла трава, что обеспечивала (FEM) пастбищами овец."
"The grass grew well on this soil, which provided sheep with pasture."

These sentences express a similar meaning, although the verb "обеспечивать" ("to provide") appears in the neuter gender in the first and in the feminine in the second. This happens because in the first sentence the subordinative pronoun "что" ("which") refers to the entire preceding clause, whereas in the second it refers only to the word "трава" ("grass"), which is feminine.

We observe many other types of constructions with MA. This is not an exhaustive list, but it covers some of the types actually found among the answers given by the learners of Russian in Revita. The list should give some intuition about the problem of MA, and how difficult it is to identify automatically.

3 Overview of the Approach

How can we identify instances of Multiple Admissibility? One approach to this problem is to train a model with a language modeling objective—referred to as "LM-trained" in the literature. In such a scenario, the task of the model is to predict the next word at every point in the sentence; e.g., for the sentence "The keys to the cabinet [is/are] here" [8], the task of the model is to predict that P(are|C) > P(is|C), where C is the context. Linzen (2016) experimented with this kind of language modeling in three setups: without any grammatically relevant supervision, with supervision on grammaticality (predicting which of two sentences [9] is grammatical/ungrammatical), and with number prediction—predicting between two classes, "singular" or "plural". The last two setups are strongly supervised. The poorest results were obtained using an LM-trained model, despite using a large-scale language model (Jozefowicz et al., 2016).

Later, Gulordava (2018) reevaluated these results for the task of predicting long-distance agreement: for several languages, including Russian,

[7] Heritage learners are persons with a cultural connection to or proficiency in the language through family, community, or country of origin. (Definition from http://www.cal.org/heritage/research/)
[8] The example is from (Linzen et al., 2016).
[9] "The keys to the cabinet are here" vs. "The keys to the cabinet is here".

an LM-trained RNN approached the accuracy of the supervised models described in (Linzen et al., 2016). Marvin (2018) performed a targeted evaluation of several LMs—N-gram, LM-trained RNN, and Multitask RNN [10]—on the task of detecting errors in several grammatical constructions for English. [11] The results of even the strongest LM varied considerably depending on the syntactic complexity of the construction: 100% accuracy in the case of simple subject-verb agreement, and 0.52 accuracy for subject-verb agreement across an object relative clause (without "that").

In light of these results from prior research, we decided to approach the problem as a supervised grammaticality judgement task—a two-class classification task (Linzen et al., 2016).

Since MA answers are correct answers, sentences with alternative grammatical forms of the same lemma would be grammatically correct. One of the problems with this approach is the lack of annotated training data. The Revita database had only 7156 answers labeled "incorrect" for Russian at the time when these experiments began. Therefore we generated a training dataset by simulating grammatical errors. We describe the simulation procedure in the following subsection, and briefly review prior approaches to generating artificial errors for grammatical error detection and correction (GED/GEC) tasks. Every instance in the simulated dataset is labeled as correct or incorrect. The network reads the entire sequence in a bidirectional fashion and receives a supervised signal at the end. We describe the model and the experiments in the following sections.

3.1 Generating Artificial Errors

First, we describe the process of generating the training data: the source of the data, the preprocessing steps, and a brief analysis of what types of errors we obtain in the simulated data. In the following subsection, we proceed to describe the test sets, which were built from real users' data containing "natural" errors.

Generating training datasets with artificial errors is a common approach, because obtaining large error-annotated learner corpora is extremely difficult and costly (Granger, 2003): the difficulties relate to collecting data from language learners and very expensive annotation procedures.

Revita at present creates exercises only for words which do not exhibit lemma ambiguity (homography). [12] Lemma ambiguity occurs when a surface form has more than one lemma. An example of this type of ambiguity is the token "стекло", which has two morphological analyses: стечь ("to flow down") Verb+Past+Sing+Neut, and стекло ("glass") Noun+Sing+Nom/Acc. [13] In this setting, we do not need to generate errors for surface forms with lemma ambiguity.

Training instances are generated by sliding a window of radius r over the list of input tokens, with the target token in the middle. The target is every n-th token (n is the stride). If the target token is unambiguous, is above a frequency threshold, [14] and has a morphological analysis, it is replaced by a random grammatical form from the paradigm to which the token belongs. [15]

We use r = 10, which results in a window wider than an average sentence in Russian; we are interested in including wide context in the training instances. All generated windows are labeled as negative/ungrammatical. The training dataset consists of a balanced number of grammatical and ungrammatical instances. Part of the generated data was removed from the training dataset and used as a validation set for training the model.

Of the automatically generated errors that we checked, some appear very natural, while others may be less likely to be made by real students. As Linzen (2016) notes, some of the generated instances will not in fact be ungrammatical. We analysed 500 randomly chosen generated windows; 3% of them happened to be grammatical (in Table 1 we refer to them as Multi-admissible). We provide the interpretation of all labels in Table 1 in the following subsection.

3.2 Test Data Analysis

To create the test sets, we took 2884 answers from Revita's database, which were automatically marked as "incorrect," and manually annotated them using the labels below:

[10] Language modeling objective, combined with a supervised task of sequence tagging with CCG supertags.
[11] The authors expected that an LM would assign a higher probability to a grammatical sentence than to an ungrammatical one.
[12] Because it currently does not attempt to perform disambiguation, and only one lemma can be shown as the hint.
[13] This is an example not only of lemma ambiguity, but also of word sense and morphological ambiguity.
[14] We count frequencies over the entire corpus used for building the training set, to exclude words appearing once.
[15] We generate paradigms of inflected words using the pymorphy2 morphological analyzer (Korobov, 2015).
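The windowing procedure of Section 3.1 can be sketched as follows. This is a simplified illustration: the `PARADIGMS` table is a toy stand-in for the paradigms produced by a morphological analyzer such as pymorphy2, and the ambiguity and frequency checks are reduced to a single dictionary lookup.

```python
import random

# Toy paradigm table; in the paper, paradigms of inflected words are
# generated with the pymorphy2 morphological analyzer.
PARADIGMS = {
    "кошка": ["кошка", "кошки", "кошке", "кошку", "кошкой"],
}

def make_negative_windows(tokens, r=10, stride=1, rng=None):
    """Slide a window of radius r over the token list, taking every
    `stride`-th token as the target; if the target has a known paradigm,
    replace it with a different random form from that paradigm and emit
    the window with the negative (ungrammatical) label 0."""
    rng = rng or random.Random(0)
    windows = []
    for i in range(0, len(tokens), stride):
        forms = PARADIGMS.get(tokens[i])
        if not forms:
            continue  # skip targets that are ambiguous, rare, or unanalyzable
        corrupted = list(tokens)
        corrupted[i] = rng.choice([f for f in forms if f != tokens[i]])
        windows.append((corrupted[max(0, i - r): i + r + 1], 0))
    return windows

sample = "кошка спит на крыше".split()
for window, label in make_negative_windows(sample, r=2):
    print(label, window)  # label 0 and the window with a corrupted target
```

Positive (grammatical) windows would be taken from the same corpus without corruption, to keep the two classes balanced.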

Label                     | (1) Training set | (2) Real student data | (3) Advanced students
Grammatical error         | 83.0%            | 21.4%                 | 39.4%
Non-word error            | —                | 20.8%                 | 12.2%
Multiple-choice           | —                | 12.0%                 | 17.0%
Multi-admissible          | 3.0%             | 7.5%                  | 15.8%
Pragmatic error           | 2.0%             | 1.6%                  | 2.9%
Broken                    | 12.0%            | 36.7%                 | 12.7%
Total instances annotated | 500              | 2884                  | 170

Table 1: Data we annotated for verification and testing: (1) subset of the set of errors automatically generated for training (randomly sampled and manually annotated); (2) learners' answers (randomly sampled), marked by the System as incorrect; (3) subset of learners' incorrect answers—for advanced learners only (CEFR level C1/C2). "Broken": discarded instances (technical problems, too many unknown words, numbers, punctuation marks, etc.).
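Distributions like those in Table 1 are straightforward to derive from raw annotations. A minimal sketch with hypothetical labels (the real study annotated 2884 answers with these categories):

```python
from collections import Counter

# Hypothetical annotation labels for six answers, using the
# categories of Table 1.
labels = ["Grammatical error", "Non-word error", "Multi-admissible",
          "Grammatical error", "Broken", "Grammatical error"]

def label_percentages(labels):
    """Return the percentage of each label, rounded to one decimal,
    as in the columns of Table 1."""
    total = len(labels)
    return {lab: round(100 * n / total, 1) for lab, n in Counter(labels).items()}

print(label_percentages(labels))
```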

• Grammatical error: the answer was a valid grammatical form of the word (it exists in the paradigm), but incorrect in the given context. This group includes only errors made in cloze exercises.

• Non-word error: a spelling error—the word was rejected by the morphological analyzer.

• Multiple-choice: an error in the choice of a word from a list of options.

• Multi-admissible: as mentioned above, we consider these to be correct answers.

• Pragmatic error: a separate type of error, where the given answer can fit grammatically, but is semantically/pragmatically unnatural in the context. We provide one example of this kind of error; it requires further investigation:

"У меня машина сломалась, и мне пришлось звонить в автосервис (ACC)."
"My car broke down and I had to call (to) the auto repair."

*"У меня машина сломалась, и мне пришлось звонить в автосервисе (LOC)."
*"My car broke down and I had to call (while being) in a car-service station."

The preposition "в" ("in") governs two cases—Accusative and Locative—but the second sentence does not make sense pragmatically. We have begun a more detailed annotation of all learner answers (i.e., of the types of grammatical errors); this topic is beyond the scope of this paper.

• Broken: discarded instances (technical problems, words not in our training vocabulary, too many numbers, punctuation marks, answers given in languages different from the expected one, etc.).

Table 1 presents the counts of all mentioned data types in the real learners' answers (the second column) and in the subset of these answers which were given only by advanced learners (the third column).

We separate the real, manually annotated data into four test sets (see Table 2).

A. The first test set contains only sentences exhibiting MA.

B. The second test set consists of randomly chosen correct sentences (500 instances in total) from a separate corpus, which was not used for generating the training data.

C. The third test set is made to test the ability of our model to distinguish between grammatical and ungrammatical sentences (as it was trained to do)—thus it contains:
C1. sentences with grammatical errors made by Revita users;
C2. correct sentences from Revita's database.

D. The fourth test set contains only sentences with pragmatic errors.

In the next section, we briefly review prior work related to artificial error generation (AEG) for the grammaticality judgement task.

4 Related Work

Felice (2016) divides methods of AEG into deterministic vs. probabilistic. The deterministic approach consists of methods that generate errors in systematic ways, which do not make use of learner error distributions. Izumi et al. (2003) introduced a system for the correction of article errors made by English learners who are native speakers of Japanese. The system was trained on artificial data where a, an, the, or the zero article were replaced with a different option chosen randomly.

Sjöbergh and Knutsson (2005) created an artificial corpus consisting of two of the most frequent types of errors among non-native Swedish speakers:

split compounds and word order errors.

Brockett et al. (2006) describe a statistical machine translation (SMT) system for correcting a set of 14 countable and uncountable nouns which are often confused by learners of English. They used rules to change quantifiers (e.g., much–many), to generate plural forms, and to insert unnecessary determiners. Lee and Seneff (2008) created an artificial corpus of verb form errors: they changed verbs in the original text to different forms, such as the to-infinitive, 3rd person singular present, past, or -ing participle. Ehsan and Faili (2013) used SMT for AEG to correct grammatical errors and context-sensitive spelling mistakes in English and Farsi. The training corpora were obtained by injecting artificial errors into well-formed treebank sentences using predefined error templates.

Probabilistic approach: Rozovskaya and Roth describe several methods for AEG, which include the creation of article errors (Rozovskaya and Roth, 2010b) and preposition errors (Rozovskaya and Roth, 2010a, 2011) based on statistics from English as a Second Language (ESL) corpora. They inject errors into Wikipedia sentences using different strategies (e.g., the error distribution before and after correction, L1-specific error distributions).

Rozovskaya et al. (2012) proposed an inflation method, which preserves the ability of the model to take learner error patterns into account. While increasing the model's recall, this method also reduced the confidence that the system has in the source word. Improvements in F-scores were achieved by this method when correcting determiners and prepositions. This method was further used by other researchers (Felice and Yuan, 2014; Putra and Szabó, 2013; Rozovskaya et al., 2013, 2014, 2017).

Dickinson (2010) introduces an approach to generating artificial syntactic and morphological errors for Russian. Imamura et al. (2012) adapt the method of Rozovskaya and Roth (2010b) for particle correction in Japanese. Cahill et al. (2013) examine automatically-compiled sentences from Wikipedia revisions for correcting errors in prepositions. Kasewa et al. (2018) use an off-the-shelf attentive sequence-to-sequence NN (Bahdanau et al., 2014) to learn to introduce errors.

5 Model and Experiment

Data: For generating the training/validation datasets, we use the open-source "Taiga" Russian corpus, [16] which is arranged by genre into several segments. We used all news segments, and part of the literary text segment, for a total of 809M words. We exclude social media, film subtitles, and poems, because their language deviates more from the literary standard. All documents were lowercased, tokenized, and morphologically analyzed using Crosslator (Klyshinsky et al., 2011). [17] We replace all punctuation marks with a special token, to preserve information about sentence/clause boundaries. The size of the training vocabulary was around 1.2M words (after removing words with a frequency of less than 2). For validation, we randomly chose 5% of all generated data.

Model architecture: Our baseline neural network (NN) is implemented in TensorFlow. Its architecture is a one-layer bidirectional LSTM with dropout (0.2), which has 512 hidden units. The hidden state of the BiLSTM is fed to a multi-layer perceptron (MLP). The MLP uses one hidden layer with 1024 neurons, and Leaky ReLU as the activation function. The size of the output layer is 1, since we have only two classes to predict. The output of the MLP is fed to a sigmoid activation function to obtain a prediction for the entire input sequence. To encode words, we use the FastText 300-dimensional pre-trained embeddings. [18]

The network and the word embeddings were trained in an end-to-end fashion. Optimization was done using Adam, dropout, and early stopping based on the loss on the validation set. We trained the network over only half of an epoch, since it was showing signs of overfitting—because we use a sliding window, the number of training instances was over 90M. The averaged accuracy on the validation set was 95%. Table 2 reports the accuracy on the test sets, averaged across 5 runs.

6 Results

Table 2 shows the results of our experiments in terms of accuracy. An accuracy of 85.9% was achieved across all types of MA. However, we should stress that in the test set marked Multi-admissible (MA), the majority of the instances belong to the MA types Present/Past tense and Singular/Plural.

Since the test set has a small number of instances of MA contexts with gerund/other verb

[16] https://tatianashavrina.github.io
[17] This analyzer was chosen because it is a part of Revita's text processing pipeline.
[18] https://fasttext.cc/docs/en/crawl-vectors.html

Test set                    | #    | Acc
A. Multi-admissible         | 178  | 85.9
B. Random correct           | 500  | 92.3
C. Correct & incorrect      | 1290 | 81.0
C1. Grammatically correct   | 650  | 73.3
C2. Grammatically incorrect | 640  | 88.6
D. Pragmatic errors         | 46   | 54.3

Table 2: Percent accuracy of our NN model. Random correct: test set built from sentences which were not included in the training and validation sets and did not appear in Revita's database—randomly selected sentences from normal texts. Grammatically incorrect: test set with real grammatical errors from students' data. Pragmatic errors: test set with real pragmatic errors from students' data.

MA types                      | #  | Acc
1. Perf/Imperf + Gerund/Other | 14 | 92.8
2. Case                       | 24 | 91.7
3. Present/Past               | 53 | 88.7
4. Singular/Plural            | 78 | 82.0
5. Short/Full adj             | 9  | 77.7

Table 3: Percent accuracy of our NN model for different MA contexts. Case combines all types of MA contexts listed in Subsection 2.1 which differ by case (Nominative/Instrumental, Genitive/Accusative, and others).

forms and MA contexts which differ by perfective/imperfective aspect, we grouped them together for testing the model. On these two types of MA, the model achieved the highest accuracy, 92.8% (see Table 3). For the same reasons, we grouped together the MA contexts which differ by case (Nominative/Instrumental, Genitive/Accusative, Second Genitive, and others); the overall accuracy for these contexts is 91.7%. We plan to test all combined MA types separately as soon as we have more annotated data.

The accuracy for Present/Past tense is 88.7%. The accuracy for Number agreement (including subject-verb agreement on number) is 81.0%. The lowest accuracy was achieved for Short/Full adjectives: 77.7%. Some discussion of the errors follows in the next subsection.

We use additional test sets to assess other aspects of the trained NN model. The "Random correct" test set (B) contains 500 randomly sampled sentences without errors, to compare with the MA test set. These sentences were sampled from a corpus which was not used for generating the training/validation data, and they are not present in the Revita database. On random correct sentences, the model achieved substantially better results than for MA instances (92.3%). It is interesting that the model has more difficulty with the syntactic structure of contexts with known MA than with some random correct contexts.

Another test set (C) is made up of correct sentences from the Revita database, and sentences with grammatical errors made by the learners. We evaluate the model on this test set to gain insight into how well it can differentiate between various incorrect vs. correct sentences. The discussion of the results for different grammatical error types is beyond the scope of this paper and is left for future work.

The pragmatic error test set (D) was used to find out how difficult it is to predict labels for sentences which are grammatically correct but semantically/pragmatically incorrect. Clearly these instances pose the greatest challenge to the current model (it was not explicitly trained to detect them).

Of the nearly 3000 manually annotated instances, the number of instances found to be pragmatic errors and MA was not large.

6.1 Error Analysis

We analysed some of the errors the model made on the MA test dataset. For all types of MA, we found some similar patterns: the model assigns very low scores to short sequences (which are padded), to contexts with too many punctuation marks or names, and to contexts with non-Russian words which are unknown to the model. For example, for constructions with Present/Past tense, the network made wrong predictions if the subject was a name or a number, which in most cases corresponds to the token "UNK" in our model's vocabulary. The same happens if the subject is outside the window. Sometimes the model confuses certain nouns or pronouns next to the verb with its subject, for example:

"Поезда (SUBJ) метро задерживаются (PRED)."
"Trains of the metro are delayed."

In this case, the model might suppose that the (genitive) singular noun "метро" ("metro") is the subject of the plural verb "задерживаются" ("are delayed"), which is incorrect—the actual

subject is the plural noun "поезда" ("trains"); the genitive is simply closer to the predicate. As a result, the model identifies the sentence as grammatically incorrect, believing that the subject and the predicate conflict in number.

We should also note that some instances marked by the model as incorrect are actually incorrect, although marked as MA by the annotators. This means that MA instances need to be double-checked, and that the model is able to identify ungrammatical contexts.

It is difficult to compare our results directly with prior work, because we have not found in previous work a problem similar to Multiple Admissibility for Russian. A similar problem, grammatical acceptability judgement, is presented in (Warstadt et al., 2018), for English only. The best result they achieved in terms of percent accuracy is 77.2%; the average human accuracy is 85%.

For the task of grammatical error detection, the results obtained for Russian are much lower than for English. For example, the highest precision in (Rozovskaya and Roth, 2019) for errors in number agreement is 56.7.

Concerning the grammaticality judgement task, Marvin and Linzen (2018) reported accuracies for subject-verb agreement from 50% to 99%, depending on the syntactic complexity of the sentence (e.g., relations across a relative clause). This is similar to the Present/Past tense constructions in our setup.

Linzen et al. (2016) also conclude that the grammaticality judgement objective is more difficult than, for example, the number prediction objective: their LSTM model makes up to 23% errors on this task as sentence complexity grows. That work studied only number agreement and subject-verb dependencies in English.

7 Conclusions and Future Work

We address the problem of Multiple Admissibility in automatically generated exercises by approaching it as a grammaticality judgement task. We offer a detailed study of examples where our language learning system mistakenly assesses admissible answers as incorrect. We classify these contexts into 10 types, only some of which have been in the focus of prior research, especially for Russian. We train a NN model with the grammaticality objective, independent of the type of test set used for evaluation. The problem of lacking labeled training data was approached by generating a dataset with artificial errors. We also observed that the MA problem is more relevant for advanced language learners. Another observation is that, for a trained model, it is more difficult to make predictions about MA contexts than about random correct sentences.

We plan to extend and improve our training data by marking numbers with special tokens, or by mapping them into words. We also plan to mark names with a name tag, by using some of the existing NER models, and to mark rare words with their part of speech. We also believe that providing the model with syntactic information (parsing) can help, so that we can train the model in a multitask fashion: predicting tags of words as well as their correctness. It is also worth trying the new large-scale language models, which have proved to be effective on a variety of tasks.

Additional annotation of the student data collected in Revita's database is needed; in the current work, the annotation was done by two experts, and all disagreements were resolved. We plan to extend our experiments to other languages available in Revita; each has its own language-specific types of MA. Generating all paradigms of a word could be problematic for some highly inflected languages (e.g., Finnish).

The goal of this paper is to introduce the problem of Multiple Admissibility and to attract more attention to experimenting with morphologically rich languages, and with languages other than English, in this context.
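The generation of artificial errors for the training data can be illustrated with a minimal sketch: a correct sentence is corrupted by replacing one word with a different form of the same word, and correct/corrupted pairs receive grammatical/ungrammatical labels. The function name `make_artificial_error` and the tiny hand-written paradigm table are our own illustrative assumptions; in practice the alternative forms would come from a morphological generator covering full Russian paradigms.

```python
import random

# Toy paradigm table: surface form -> other forms of the same lexeme.
# In the real setting these forms would be produced by a morphological
# generator; this hand-written table is only an illustration.
PARADIGMS = {
    "поезд": ["поезда", "поездов"],
    "задерживается": ["задерживаются", "задерживался"],
}

def make_artificial_error(tokens, rng=None):
    """Return a corrupted copy of `tokens`: one word that has a
    paradigm is replaced by a different form of the same word."""
    rng = rng or random.Random(0)
    candidates = [i for i, t in enumerate(tokens) if t in PARADIGMS]
    if not candidates:
        return None                      # nothing to corrupt
    i = rng.choice(candidates)
    corrupted = list(tokens)
    corrupted[i] = rng.choice(PARADIGMS[tokens[i]])
    return corrupted

# Correct sentences are labeled 1, corrupted copies 0, giving
# (sentence, label) pairs for a grammaticality classifier.
correct = ["поезд", "метро", "задерживается", "."]
wrong = make_artificial_error(correct)
dataset = [(correct, 1), (wrong, 0)]
```

A sketch under these assumptions, not the system's actual implementation; the real pipeline would also control which error types are introduced, to mimic the distribution of learner errors.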

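The vocabulary normalization proposed above, replacing numbers and names with special tokens instead of letting them fall to "UNK", might look roughly like the following sketch. The tags `<NUM>` and `<NAME>`, and the naive capitalization heuristic standing in for a real NER model, are our assumptions for illustration only.

```python
import re

def normalize(tokens):
    """Map tokens the model would otherwise see as 'UNK' to coarse
    placeholder tokens, so that number or name subjects still carry
    agreement-relevant information."""
    out = []
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"\d+([.,]\d+)?", tok):
            out.append("<NUM>")          # e.g. "1941" -> <NUM>
        elif i > 0 and tok[:1].isupper():
            # Crude stand-in for a real NER model: a capitalized
            # token in non-initial position is treated as a name.
            out.append("<NAME>")
        else:
            out.append(tok)
    return out

print(normalize(["Вчера", "Анна", "купила", "3", "книги", "."]))
```

With a real NER model and a morphological analyzer, the placeholders could additionally carry number and gender features, which is what the agreement-sensitive errors discussed above would require.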
References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Chris Brockett, William B. Dolan, and Michael Gamon. 2006. Correcting ESL errors using phrasal SMT techniques. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 249–256. Association for Computational Linguistics.

Aoife Cahill, Nitin Madnani, Joel Tetreault, and Diane Napolitano. 2013. Robust systems for preposition error correction using Wikipedia revisions. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 507–517.

Markus Dickinson. 2010. Generating learner-like morphological errors in Russian. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 259–267.

Nava Ehsan and Heshaam Faili. 2013. Grammatical and context-sensitive error correction using a statistical machine translation framework. Software: Practice and Experience, 43(2):187–206.

Mariano Felice. 2016. Artificial error generation for translation-based grammatical error correction. Technical report, University of Cambridge, Computer Laboratory.

Mariano Felice and Zheng Yuan. 2014. Generating artificial errors for grammatical error correction. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 116–126.

Sylviane Granger. 2003. Error-tagged learner corpora and CALL: A promising synergy. CALICO Journal, 20(3):465–480.

Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni. 2018. Colorless green recurrent networks dream hierarchically. arXiv preprint arXiv:1803.11138.

Kenji Imamura, Kuniko Saito, Kugatsu Sadamitsu, and Hitoshi Nishikawa. 2012. Grammar error correction using pseudo-error sentences and domain adaptation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, pages 388–392. Association for Computational Linguistics.

Emi Izumi, Kiyotaka Uchimoto, Toyomi Saiga, Thepchai Supnithi, and Hitoshi Isahara. 2003. Automatic error detection in the Japanese learners' English spoken data. In The Companion Volume to the Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics.

Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. 2016. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410.

Sudhanshu Kasewa, Pontus Stenetorp, and Sebastian Riedel. 2018. Wronging a right: Generating better errors to improve grammatical error detection. arXiv preprint arXiv:1810.00668.

Anisia Katinskaia, Javad Nouri, and Roman Yangarber. 2018. Revita: a language-learning platform at the intersection of ITS and CALL. In Proceedings of LREC: 11th International Conference on Language Resources and Evaluation, Miyazaki, Japan.

E.S. Klyshinsky, N.A. Kochetkova, M.I. Litvinov, and V.Yu. Maximov. 2011. Method of POS-disambiguation using information about words co-occurrence (for Russian). Proceedings of GSCL, pages 191–195.

Mikhail Korobov. 2015. Morphological analyzer and generator for Russian and Ukrainian languages. In International Conference on Analysis of Images, Social Networks and Texts, pages 320–332. Springer.

John Lee and Stephanie Seneff. 2008. Correcting misuse of verb forms. In Proceedings of ACL-08: HLT, pages 174–182.

Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.

Rebecca Marvin and Tal Linzen. 2018. Targeted syntactic evaluation of language models. arXiv preprint arXiv:1808.09031.

Desmond Darma Putra and Lili Szabó. 2013. UdS at CoNLL 2013 Shared Task. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, pages 88–95.

E.V. Rakhilina, A.S. Vyrenkova, and M.S. Polinskaya. 2014. Grammar of errors and grammar of constructions: "Heritage" ("inherited") Russian language. Questions of Linguistics, 3(2014):3–19.

Ditmar Elyashevich Rosenthal, E.V. Dzhandzhakova, and N.P. Kabanova. 1994. Reference on Spelling, Pronunciation, and Literary Editing. Moscow International School of Translators.

Alla Rozovskaya, Kai-Wei Chang, Mark Sammons, and Dan Roth. 2013. The University of Illinois system in the CoNLL-2013 shared task. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, pages 13–19.

Alla Rozovskaya, Kai-Wei Chang, Mark Sammons, Dan Roth, and Nizar Habash. 2014. The Illinois-Columbia system in the CoNLL-2014 shared task. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 34–42.

Alla Rozovskaya and Dan Roth. 2010a. Generating confusion sets for context-sensitive error correction. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 961–970. Association for Computational Linguistics.

Alla Rozovskaya and Dan Roth. 2010b. Training paradigms for correcting errors in grammar and usage. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 154–162. Association for Computational Linguistics.

Alla Rozovskaya and Dan Roth. 2011. Algorithm selection and model adaptation for ESL correction tasks. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 924–933. Association for Computational Linguistics.

Alla Rozovskaya and Dan Roth. 2019. Grammar error correction in morphologically rich languages: The case of Russian. Transactions of the Association for Computational Linguistics, 7:1–17.

Alla Rozovskaya, Dan Roth, and Mark Sammons. 2017. Adapting to learner errors with minimal supervision. Computational Linguistics, 43(4):723–760.

Alla Rozovskaya, Mark Sammons, and Dan Roth. 2012. The UI system in the HOO 2012 shared task on error correction. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pages 272–280. Association for Computational Linguistics.

Jonas Sjöbergh and Ola Knutsson. 2005. Faking errors to avoid making errors: Very weakly supervised learning for error detection in writing. In Proc. RANLP, volume 2005.

Viktor Vladimirovich Vinogradov. 1972. Russian Language: The Grammatical Doctrine of the Word, volume 21. High School.

Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman. 2018. Neural network acceptability judgments. arXiv preprint arXiv:1805.12471.