Generating Code-Switched Text for Lexical Learning
Igor Labutov, Cornell University
Hod Lipson, Cornell University

Abstract

The vast majority of L1 vocabulary acquisition occurs through incidental learning during reading (Nation, 2001; Schmitt et al., 2001). We propose a probabilistic approach to generating code-mixed text as an L2 technique for increasing retention in adult lexical learning through reading. Our model takes as input a bilingual dictionary and an English text, and generates a code-switched text that optimizes a defined “learnability” metric by constructing a factor graph over lexical mentions. Using an artificial language vocabulary, we evaluate a set of algorithms for generating code-switched text automatically by presenting it to Mechanical Turk subjects and measuring recall in a sentence completion task.

1 Introduction

Today, an adult trying to learn a new language is likely to embrace the age-old and widely accepted practice of learning vocabulary through curated word lists and rote memorization. Yet it is not uncommon to find yourself surrounded by speakers of a foreign language and to instinctively pick up words and phrases without ever seeing their definitions in your native tongue. Hearing “pass le sel please” at the dinner table from your in-laws visiting from abroad is unlikely to make you think twice about passing the salt. Humans are extraordinarily good at inferring meaning from context, whether that context is your physical surroundings or the text surrounding a word you don’t yet understand.

Recently, a novel method of L2 teaching has been shown to be effective in improving adult lexical acquisition rate and retention¹. This technique relies on a phenomenon that elicits a natural simulation of L1-like vocabulary learning in adults, significantly closer to L1 learning for L2 learners than any model studied previously. By infusing foreign words into low-surprisal contexts within text in the learner’s native tongue, the lexical acquisition process is facilitated naturally and unobtrusively. Incidentally, this phenomenon occurs “in the wild”, where it is termed code-switching or code-mixing: the linguistic pattern of bilingual speakers swapping words and phrases between two languages during speech. While this phenomenon has received significant attention from both sociolinguistic (Milroy and Muysken, 1995) and theoretical linguistic (Belazi et al., 1994; Bhatt, 1997) perspectives, including some computational studies, only recently has it been hypothesized that code-switching is a marker of bilingual proficiency rather than deficiency (Genesee, 2001).

Until recently, it was widely believed that incidental lexical acquisition through reading can only occur for words that appear at sufficient density in a single text, so as to elicit the “noticing” effect needed for acquisition (Cobb, 2007). Recent neurophysiological findings, however, indicate that even a single incidental exposure to a novel word in a sufficiently constrained context is enough to trigger early integration of the word into the brain’s semantic network (Borovsky et al., 2012).

The approach explored in this paper, motivated by the above findings, exploits “constraining” contexts in text to introduce novel words. A state-of-the-art approach for generating such text relies on an expert annotator who decides which words to “switch out” for novel foreign words (from here on, we refer to the “switched out” word as the source word and to the “switched in” word as the target word).
Consequently, the process is labor-intensive and leads to a “one size fits all” solution that is insensitive to the learner’s skill level or vocabulary proficiency. This limitation is also cited in the literature as a significant roadblock to the widespread adoption of graded reading series (Hill, 2008). A reading-based tool that follows the same principle, i.e., systematic exposure of a learner to incrementally more challenging text, will result in more effective learning (Lantolf and Appel, 1994).

To address the above limitation, we develop an approach for automatically generating such “code-switched” text with the explicit goal of maximizing the lexical acquisition rate in adults. Our method is based on a global optimization approach that combines a “knowledge model” of the user with the content of the text to generate a sequence of lexical “switches”. To facilitate the selection of “switch points”, we learn a discriminative model for predicting switch-point locations from a corpus that we collect for this purpose (and release to the community). Below is a high-level outline of this paper.

• We formalize our approach within a probabilistic graphical model framework; inference in this model yields “code-switched” text that maximizes a surrogate to the acquisition-rate objective.
• We compare this global method to several baseline techniques, including the strong “high-frequency” baseline.
• We analyze the operating range in which our model is effective and motivate the near-future extension of this approach with the proposed improvements.

¹ Authors’ unpublished work.

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 562–571, Baltimore, Maryland, USA, June 23-25 2014. © 2014 Association for Computational Linguistics

2 Related Work

Our proposed approach to the computational generation of code-switched text for the purpose of L2 pedagogy is influenced by a number of fields that have studied aspects of this phenomenon from distinct perspectives. In this section, we briefly describe motivating evidence from socio- and psycholinguistics and from language pedagogy research that indicates the promise of this approach.

2.1 Code-switching as a natural phenomenon

Code-switching (or code-mixing) is a widely studied phenomenon that has received significant attention over the last three decades across sociolinguistics, theoretical linguistics, psycholinguistics, and even literary and cultural studies, predominantly in the domain of Spanish-English code-switching (Lipski, 2005).

Code-switching that occurs naturally in bilingual populations, especially in children, has long been considered a marker of incompetence in the second language. A more recent view, however, suggests that because of its underlying syntactic complexity, code-switching is actually a marker of bilingual fluency (Genesee, 2001). More recently still, the idea of employing code-switching in the classroom, in the form of conversation-based exercises, has attracted the attention of multiple researchers and educators (Moodley, 2010; Macaro, 2005), yielding promising results in an elementary school study in South Africa.

2.2 Computational Approaches to Code-switching

There has also been a limited number of computational studies of code-switching, and of code-switched text generation in particular. Solorio and Liu (2008) record and transcribe a corpus of Spanish-English code-mixed conversation to train a generative model (Naive Bayes) for predicting code-switch points in conversation. They additionally test the trained model’s ability to generate code-switched text, with convincing results. Building on their work, Adel et al. (2012) employ additional features and a recurrent neural network language model to model code-switching in conversational speech. Adel and colleagues (2011) propose a statistical machine translation-based approach for generating code-switched text. We note, however, that the primary goal of these methods is to faithfully model the natural phenomenon of code-switching in bilingual populations, not to serve as a tool for language teaching. While useful for generating coherent, syntactically constrained code-switched text in their own right, none of these methods explicitly considers code-switching as a vehicle for teaching language, and thus none takes an optimization-based view with the objective of improving lexical acquisition through reading the generated text. More recently, and concurrently with our work, Google’s Language Immersion app employs the principle of code-switching for language pedagogy by generating code-switched web content and allowing its users to tune it to their skill level. It does not, however, appear to model the user explicitly, nor is it clear whether it performs any optimization in generating the text, as no studies of it have been published to date.

2.3 Computational Approaches to Sentence Simplification

Although not aimed explicitly at language teaching, computational approaches that make texts accessible to readers who might otherwise find them too difficult, whether due to physical or learning disabilities or to language barriers, are also relevant. For example, Kauchak (2013) demonstrates an approach to increasing the readability of texts by learning from unsimplified texts. Approaches in this area span methods for simplifying lexis (Yatskar et al., 2010; Biran et al., 2011), syntax (Siddharthan, 2006; Siddharthan et al., 2004), discourse properties (Hutchinson, 2005),

“noticing” effect, a prerequisite to lexical acquisition (Schmidt and Schmidt, 1995; Cobb, 2007).

3 Model

3.1 Overview

The formulation of our model is primarily motivated by two hypotheses that have been validated experimentally in the cognitive science literature. We restate these hypotheses in the language of “surprisal”:

1. Inserting a target word into a low-surprisal context increases the rate of that word’s integration into a learner’s lexicon.
2. Multiple exposures to the word in low-surprisal contexts increase the rate of that word’s integration.

Hypothesis 1 is supported by evidence from Borovsky et al. (2012) and Frank et al. (2013), and hypothesis 2 by evidence from Schmidt and Schmidt (1995).
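Both hypotheses are phrased in terms of surprisal, the negative log-probability of a word given its context, so it may help to see that quantity computed. The following is a minimal sketch under stated assumptions, not the model used in this paper: a bigram language model with add-one smoothing (our assumption) scores each mention of a source word, ranks the mentions from most to least constraining context (in the spirit of Hypothesis 1), and collects the mentions that fall under a surprisal threshold (in the spirit of Hypothesis 2). The function names, the toy corpus, and the 1.9-bit threshold are all hypothetical.

```python
import math
from collections import Counter

# Minimal sketch (assumption): a bigram language model with add-one smoothing
# stands in for the context model that scores "surprisal"; it is NOT the
# model described in this paper. All names here are hypothetical.

def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + tokens              # <s> marks the sentence start
        contexts.update(padded[:-1])           # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
        vocab.update(tokens)
    return contexts, bigrams, vocab

def surprisal(word, prev, contexts, bigrams, vocab):
    """Surprisal in bits, -log2 P(word | prev), with add-one smoothing."""
    prob = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return -math.log2(prob)

def rank_switch_points(tokens, source_word, contexts, bigrams, vocab):
    """Score each mention of source_word; most constraining (lowest surprisal) first."""
    padded = ["<s>"] + tokens
    scored = [
        (i - 1, surprisal(padded[i], padded[i - 1], contexts, bigrams, vocab))
        for i in range(1, len(padded))
        if padded[i] == source_word
    ]
    return sorted(scored, key=lambda pair: pair[1])

def low_surprisal_exposures(ranked, threshold_bits):
    """Token positions whose surprisal is at or below a hypothetical threshold."""
    return [pos for pos, bits in ranked if bits <= threshold_bits]

if __name__ == "__main__":
    corpus = [s.split() for s in [
        "please pass the salt", "please pass the bread",
        "pass the salt please", "i like salt",
    ]]
    model = train_bigram(corpus)
    ranked = rank_switch_points("pass the salt i like salt".split(), "salt", *model)
    print(ranked)                                # mention after "the" ranks first
    print(low_surprisal_exposures(ranked, 1.9))
```

A bigram model sees only one word of left context, so it is a crude proxy for how constraining a context really is; a stronger language model could be dropped into `surprisal` without changing the ranking and thresholding logic.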