Weak Semantic Context Helps Phonetic Learning in a Model of Infant Language Acquisition
Stella Frank ([email protected])
ILCC, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK

Naomi H. Feldman ([email protected])
Department of Linguistics, University of Maryland, College Park, MD 20742, USA

Sharon Goldwater ([email protected])
ILCC, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
Abstract

Learning phonetic categories is one of the first steps to learning a language, yet is hard to do using only distributional phonetic information. Semantics could potentially be useful, since words with different meanings have distinct phonetics, but it is unclear how many word meanings are known to infants learning phonetic categories. We show that attending to a weaker source of semantics, in the form of a distribution over topics in the current context, can lead to improvements in phonetic category learning. In our model, an extension of a previous model of joint word-form and phonetic category inference, the probability of word-forms is topic-dependent, enabling the model to find significantly better phonetic vowel categories and word-forms than a model with no semantic knowledge.

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1073-1083, Baltimore, Maryland, USA, June 23-25 2014. (c) 2014 Association for Computational Linguistics

1 Introduction

Infants begin learning the phonetic categories of their native language in their first year (Kuhl et al., 1992; Polka and Werker, 1994; Werker and Tees, 1984). In theory, semantic information could offer a valuable cue for phoneme induction[1] by helping infants distinguish between minimal pairs, as linguists do (Trubetzkoy, 1939). However, due to a widespread assumption that infants do not know the meanings of many words at the age when they are learning phonetic categories (see Swingley, 2009 for a review), most recent models of early phonetic category acquisition have explored the phonetic learning problem in the absence of semantic information (de Boer and Kuhl, 2003; Dillon et al., 2013; Feldman et al., 2013a; McMurray et al., 2009; Vallabha et al., 2007).

[1] The models in this paper do not distinguish between phonetic and phonemic categories, since they do not capture phonological processes (and there are also none present in our synthetic data). We thus use the terms interchangeably.

Models without any semantic information are likely to underestimate infants' ability to learn phonetic categories. Infants learn language in the wild, and quickly attune to the fact that words have (possibly unknown) meanings. The extent of infants' semantic knowledge is not yet known, but existing evidence shows that six-month-olds can associate some words with their referents (Bergelson and Swingley, 2012; Tincoff and Jusczyk, 1999, 2012), leverage non-acoustic contexts such as objects or articulations to distinguish similar sounds (Teinonen et al., 2008; Yeung and Werker, 2009), and map meaning (in the form of objects or images) to new word-forms in some laboratory settings (Friedrich and Friederici, 2011; Gogate and Bahrick, 2001; Shukla et al., 2011). These findings indicate that young infants are sensitive to co-occurrences between linguistic stimuli and at least some aspects of the world.

In this paper we explore the potential contribution of semantic information to phonetic learning by formalizing a model in which learners attend to the word-level context in which phones appear (as in the lexical-phonetic learning model of Feldman et al., 2013a) and also to the situations in which word-forms are used. The modeled situations consist of combinations of categories of salient activities or objects, similar to the activity contexts explored by Roy et al. (2012), e.g., 'getting dressed' or 'eating breakfast'. We assume that child learners are able to infer a representation of the situational context from their non-linguistic environment. However, in our simulations we approximate the environmental information by running a topic model (Blei et al., 2003) over a corpus of child-directed speech to infer a topic distribution for each situation. These topic distributions are then used as input to our model to represent situational contexts.

The situational information in our model is similar to that assumed by theories of cross-situational word learning (Frank et al., 2009; Smith and Yu, 2008; Yu and Smith, 2007), but our model does not require learners to map individual words to their referents. Even in the absence of word-meaning mappings, situational information is potentially useful because similar-sounding words uttered in similar situations are more likely to be tokens of the same lexeme (containing the same phones) than similar-sounding words uttered in different situations.

In simulations of vowel learning, inspired by Vallabha et al. (2007) and Feldman et al. (2013a), we show a clear improvement over previous models in both phonetic and lexical (word-form) categorization when situational context is used as an additional source of information. This improvement is especially noticeable when the word-level context is providing less information, arguably the more realistic setting. These results demonstrate that relying on situational co-occurrence can improve phonetic learning, even if learners do not yet know the meanings of individual words.

2 Background and overview of models

Infants attend to distributional characteristics of their input (Maye et al., 2002, 2008), leading to the hypothesis that phonetic categories could be acquired on the basis of bottom-up distributional learning alone (de Boer and Kuhl, 2003; Vallabha et al., 2007; McMurray et al., 2009). However, this would require sound categories to be well separated, which often is not the case—for example, see Figure 1, which shows the English vowel space that is the focus of this paper.

Figure 1: The English vowel space (generated from Hillenbrand et al. (1995), see Section 6.2), plotted using the first two formants.

Recent work has investigated whether infants could overcome such distributional ambiguity by incorporating top-down information, in particular, the fact that phones appear within words. At six months, infants begin to recognize word-forms such as their name and other frequently occurring words (Mandel et al., 1995; Jusczyk and Hohne, 1997), without necessarily linking a meaning to these forms. This "protolexicon" can help differentiate phonetic categories by adding word contexts in which certain sound categories appear (Swingley, 2009; Feldman et al., 2013b). To explore this idea further, Feldman et al. (2013a) implemented the Lexical-Distributional (LD) model, which jointly learns a set of phonetic vowel categories and a set of word-forms containing those categories. Simulations showed that the use of lexical context greatly improved phonetic learning.

Our own Topic-Lexical-Distributional (TLD) model extends the LD model to include an additional type of context: the situations in which words appear. To motivate this extension and clarify the differences between the models, we now provide a high-level overview of both models; details are given in Sections 3 and 4.

2.1 Overview of LD model

Both the LD and TLD models are computational-level models of phonetic (specifically, vowel) categorization where phones (vowels) are presented to the model in the context of words.[2] The task is to infer a set of phonetic categories and a set of lexical items on the basis of the data observed for each word token x_i. In the original LD model, the observations for token x_i are its frame f_i, which consists of a list of consonants and slots for vowels, and the list of vowel tokens w_i. (The TLD model includes additional observations, described below.)

[2] For a related model that also tackles the word segmentation problem, see Elsner et al. (2013). In a model of phonological learning, Fourtassi and Dupoux (submitted) show that semantic context information similar to that used here remains useful despite segmentation errors.

A single vowel token, w_ij, is a two-dimensional vector representing the first two formants (peaks in the frequency spectrum, ordered from lowest to highest). For example, a token of the word kitty would have the frame f_i = k_t_, containing two consonant phones, /k/ and /t/, with two vowel phone slots in between, and two vowel formant vectors,
w_i0 = [464, 2294] and w_i1 = [412, 2760].[3]

[3] In simulations we also experiment with frames in which consonants are not represented perfectly.

Given the data, the model must assign each vowel token to a vowel category, w_ij = c. Both the LD and the TLD models do this using intermediate lexemes, l, which contain vowel category assignments, v_lj = c, as well as a frame f_l. If a word token is assigned to a lexeme, x_i = l, the vowels within the word are assigned to that lexeme's vowel categories, w_ij = v_lj = c.[4] The word and lexeme frames must match, f_i = f_l.

[4] The notation is overloaded: w_ij refers both to the vowel formants and the vowel category assignments, and x_i refers to both the token identity and its assignment to a lexeme.

Lexical information helps with phonetic categorization because it can disambiguate highly overlapping categories, such as the ae and eh categories in Figure 1. A purely distributional learner who observes a cluster of data points in the ae-eh region is likely to assume all these points belong to a single category because the distributions of the categories are so similar. However, a learner who attends to lexical context will notice a difference: contexts that only occur with ae will be observed in one part of the ae-eh region, while contexts that only occur with eh will be observed in a different (though partially overlapping) space. The learner then has evidence of two different categories occurring in different sets of lexemes.

Simulations with the LD model show that using lexical information to constrain phonetic learning can greatly improve categorization accuracy (Feldman et al., 2013a), but it can also introduce errors. When two word tokens contain the same consonant frame but different vowels (i.e., minimal pairs), the model is more likely to categorize those two vowels together. Thus, the model has trouble distinguishing minimal pairs. Although young children also have trouble with minimal pairs (Stager and Werker, 1997; Thiessen, 2007), the LD model may overestimate the degree of the problem. We hypothesize that if a learner is able to associate words with the contexts of their use (as children likely are), this could provide a weak source of information for disambiguating minimal pairs even without knowing their exact meanings. That is, if the learner hears kV1t and kV2t in different situational contexts, they are likely to be different lexical items (and V1 and V2 different phones), despite the lexical similarity between them.

2.2 Overview of TLD model

To demonstrate the benefit of situational information, we develop the Topic-Lexical-Distributional (TLD) model, which extends the LD model by assuming that words appear in situations analogous to documents in a topic model. Each situation h is associated with a mixture of topics theta_h, which is assumed to be observed. Thus, for the ith token in situation h, denoted x_hi, the observed data will be its frame f_hi, vowels w_hi, and topic vector theta_h.

From an acquisition perspective, the observed topic distribution represents the child's knowledge of the context of the interaction: she can distinguish bathtime from dinnertime, and is able to recognize that some topics appear in certain contexts (e.g. animals on walks, vegetables at dinnertime) and not in others (few vegetables appear at bathtime). We assume that the child would learn these topics from observing the world around her and the co-occurrences of entities and activities in the world. Within any given situation, there might be a mixture of different (actual or possible) topics that are salient to the child. We assume further that as the child learns the language, she will begin to associate specific words with each topic as well.

Thus, in the TLD model, the words used in a situation are topic-dependent, implying meaning, but without pinpointing specific referents. Although the model observes the distribution of topics in each situation (corresponding to the child observing her non-linguistic environment), it must learn to associate each (phonetically and lexically ambiguous) word token with a particular topic from that distribution. The occurrence of similar-sounding words in different situations with mostly non-overlapping topics will provide evidence that those words belong to different topics and that they are therefore different lexemes. Conversely, potential minimal pairs that occur in situations with similar topic distributions are more likely to belong to the same topic and thus the same lexeme.

Although we assume that children infer topic distributions from the non-linguistic environment, we will use transcripts from CHILDES to create the word/phone learning input for our model. These transcripts are not annotated with environmental context, but Roy et al. (2012) found that topics learned from similar transcript data using a topic model were strongly correlated with immediate activities and contexts. We therefore obtain the topic distributions used as input to the TLD model by
training an LDA topic model (Blei et al., 2003) on a superset of the child-directed transcript data we use for lexical-phonetic learning, dividing the transcripts into small sections (the 'documents' in LDA) that serve as our distinct situations h. As noted above, the learned document-topic distributions theta are treated as observed variables in the TLD model to represent the situational context. The topic-word distributions learned by LDA are discarded, since these are based on the (correct and unambiguous) words in the transcript, whereas the TLD model is presented with phonetically ambiguous versions of these word tokens and must learn to disambiguate them and associate them with topics.

3 Lexical-Distributional Model

In this section we describe more formally the generative process for the LD model (Feldman et al., 2013a), a joint Bayesian model over phonetic categories and a lexicon, before describing the TLD extension in the following section.

The set of phonetic categories and the lexicon are both modeled using non-parametric Dirichlet Process priors, which return a potentially infinite number of categories or lexemes. A DP is parametrized as DP(alpha, H), where alpha is a real-valued hyperparameter and H is a base distribution. H may be continuous, as when it generates phonetic categories in formant space, or discrete, as when it generates lexemes as a list of phonetic categories. A draw from a DP, G ~ DP(alpha, H), returns a distribution over a set of draws from H, i.e., a discrete distribution over a set of categories or lexemes generated by H. In the mixture model setting, the category assignments are then generated from G, with the datapoints themselves generated by the corresponding components from H. If H is infinite, the support of the DP is likewise infinite. During inference, we marginalize over G.

3.1 Phonetic Categories: IGMM

Following previous models of vowel learning (de Boer and Kuhl, 2003; Vallabha et al., 2007; McMurray et al., 2009; Dillon et al., 2013) we assume that vowel tokens are drawn from a Gaussian mixture model. The Infinite Gaussian Mixture Model (IGMM) (Rasmussen, 2000) includes a DP prior, as described above, in which the base distribution H_C generates multivariate Gaussians drawn from a Normal Inverse-Wishart prior.[5] Each observation, a formant vector w_ij, is drawn from the Gaussian corresponding to its category assignment c_ij:

  mu_c, Sigma_c ~ H_C = NIW(mu_0, Sigma_0, nu_0)   (1)
  G_C ~ DP(alpha_c, H_C)                           (2)
  c_ij ~ G_C                                       (3)
  w_ij | c_ij = c ~ N(mu_c, Sigma_c)               (4)

[5] This compound distribution is equivalent to Sigma_c ~ IW(Sigma_0, nu_0), mu_c | Sigma_c ~ N(mu_0, Sigma_c / nu_0).

The above model generates a category assignment c_ij for each vowel token w_ij. This is the baseline IGMM model, which clusters vowel tokens using bottom-up distributional information only; the LD model adds top-down information by assigning categories in the lexicon, rather than on the token level.

3.2 Lexicon

In the LD model, vowel phones appear within words drawn from the lexicon. Each such lexeme is represented as a frame plus a list of vowel categories v_l. Lexeme assignments for each token are drawn from a DP with a lexicon-generating base distribution H_L. The category for each vowel token in the word is determined by the lexeme; the formant values are drawn from the corresponding Gaussian as in the IGMM:

  G_L ~ DP(alpha_l, H_L)                (5)
  x_i = l ~ G_L                         (6)
  w_ij | v_lj = c ~ N(mu_c, Sigma_c)    (7)

H_L generates lexemes by first drawing the number of phones from a geometric distribution and the number of consonant phones from a binomial distribution. The consonants are then generated from a DP with a uniform base distribution (but note they are fixed at inference time, i.e., are observed categorically), while the vowel phones v_l are generated by the IGMM DP above, v_lj ~ G_C. Note that two draws from H_L may result in identical lexemes; these are nonetheless considered to be separate (homophone) lexemes.

4 Topic-Lexical-Distributional Model

The TLD model retains the IGMM vowel phone component, but extends the lexicon of the LD model by adding topic-specific lexicons, which capture the notion that lexeme probabilities are topic-dependent. Specifically, the TLD model replaces
the Dirichlet Process lexicon with a Hierarchical Dirichlet Process (HDP; Teh (2006)). In the HDP lexicon, a top-level global lexicon is generated as in the LD model. Topic-specific lexicons are then drawn from the global lexicon, containing a subset of the global lexicon (but since the size of the global lexicon is unbounded, so are the topic-specific lexicons). These topic-specific lexicons are used to generate the tokens in a similar manner to the LD model. There are a fixed number of lower-level topic-lexicons; these are matched to the number of topics in the LDA model used to infer the topic distributions (see Section 6.4).

More formally, the global lexicon is generated as a top-level DP: G_L ~ DP(alpha_l, H_L) (see Section 3.2; remember H_L includes draws from the IGMM over vowel categories). G_L is in turn used as the base distribution in the topic-level DPs, G_k ~ DP(alpha_k, G_L). In the Chinese Restaurant Franchise metaphor often used to describe HDPs, G_L is a global menu of dishes (lexemes). The topic-specific lexicons are restaurants, each with its own distribution over dishes; this distribution is defined by seating customers (word tokens) at tables, each of which serves a single dish from the menu: all tokens x at the same table t are assigned to the same lexeme l_t. Inference (Section 5) is defined in terms of tables rather than lexemes; if multiple tables draw the same dish from G_L, tokens at these tables share a lexeme.

In the TLD model, tokens appear within situations, each of which has a distribution over topics theta_h. Each token x_hi has a co-indexed topic assignment variable, z_hi, drawn from theta_h, designating the topic-lexicon from which the table for x_hi is to be drawn. The formant values for w_hij are drawn in the same way as in the LD model, given the lexeme assignment at x_hi. This results in the following model, shown in Figure 2:

  G_L ~ DP(alpha_l, H_L)                             (8)
  G_k ~ DP(alpha_k, G_L)                             (9)
  z_hi ~ Mult(theta_h)                               (10)
  x_hi = t | z_hi = k ~ G_k                          (11)
  w_hij | x_hi = t, v_ltj = c ~ N(mu_c, Sigma_c)     (12)

Figure 2: TLD model, depicting, from left to right, the IGMM component, the LD lexicon component, the topic-specific lexicons, and finally the token x_hi, appearing in document h, with observed vowel formants w_hij and frame f_hi. The lexeme assignment x_hi and the topic assignment z_hi are inferred, the latter using the observed document-topic distribution theta_h. Note that f_hi is deterministic given the lexeme assignment. Squared nodes depict hyperparameters. lambda is the set of hyperparameters used by H_L when generating lexical items (see Section 3.2).

5 Inference: Gibbs Sampling

We use Gibbs sampling to infer three sets of variables in the TLD model: assignments to vowel categories in the lexemes, assignments of tokens to topics, and assignments of tokens to tables (from which the assignment to lexemes can be read off).

5.1 Sampling lexeme vowel categories

Each vowel in the lexicon must be assigned to a category in the IGMM. The posterior probability of a category assignment is composed of the DP prior over categories and the likelihood of the observed vowels belonging to that category. We use w_lj to denote the set of vowel formants at position j in words that have been assigned to lexeme l. Then,

  P(v_lj = c | w, x, l^\lj) propto P(v_lj = c | v^\lj) p(w_lj | v_lj = c, w^\lj)   (13)

The first (DP prior) factor is defined as:

  P(v_lj = c | v^\lj) = n_c / (sum_c' n_c' + alpha_c)       if c exists
                        alpha_c / (sum_c' n_c' + alpha_c)   if c is new       (14)

where n_c is the number of other vowels in the lexicon, v^\lj, assigned to category c. Note that there is always positive probability of creating a new category.

The likelihood of the vowels is calculated by marginalizing over all possible means and variances of the Gaussian category parameters, given
the NIW prior. For a single point (if |w_lj| = 1), this predictive posterior is in the form of a Student-t distribution; for the more general case see Feldman et al. (2013a), Eq. B3.

5.2 Sampling table & topic assignments

We jointly sample x and z, the variables assigning tokens to tables and topics. Resampling the table assignment includes the possibility of changing to a table with a different lexeme or drawing a new table with a previously seen or novel lexeme. The joint conditional probability of a table and topic assignment, given all other current token assignments, is:

  P(x_hi = t, z_hi = k | w_hi, theta_h, t^\hi, l, w^\hi)
    = P(k | theta_h) P(t | k, l_t, t^\hi) prod_{c in C} p(w_hi | v_lt = c, w_c^\hi)   (15)

The first factor, the prior probability of topic k in document h, is given by theta_hk obtained from the LDA. The second factor is the prior probability of assigning word x_hi to table t with lexeme l given topic k. It is given by the HDP, and depends on whether the table t exists in the HDP topic-lexicon for k and, likewise, whether any table in the topic-lexicon has the lexeme l:

  P(t | k, l, t^\hi) propto
    n_kt / (n_k + alpha_k)                                        if t in k
    alpha_k / (n_k + alpha_k) * m_l / (m + alpha_l)               if t new, l known
    alpha_k / (n_k + alpha_k) * alpha_l / (m + alpha_l)           if t and l new     (16)

Here n_kt is the number of other tokens at table t, n_k is the total number of tokens in topic k, m_l is the number of tables across all topics with the lexeme l, and m is the total number of tables.

The third factor, the likelihood of the vowel formants w_hi in the categories given by the lexeme v_l, is of the same form as the likelihood of vowel categories when resampling lexeme vowel assignments. However, here it is calculated over the set of vowels in the token assigned to each vowel category (i.e., the vowels at indices where v_lt = c). For a new lexeme, we approximate the likelihood using 100 samples drawn from the prior, each weighted by alpha/100 (Neal, 2000).

5.3 Hyperparameters

The three hyperparameters governing the HDP over the lexicon, alpha_l and alpha_k, and the DP over vowel categories, alpha_c, are estimated using a slice sampler. The remaining hyperparameters for the vowel category and lexeme priors are set to the same values used by Feldman et al. (2013a).

6 Experiments

6.1 Corpus

We test our model on situated child-directed speech, taken from the C1 section of the Brent corpus in CHILDES (Brent and Siskind, 2001; MacWhinney, 2000). This corpus consists of transcripts of speech directed at infants between the ages of 9 and 15 months, captured in a naturalistic setting as parent and child went about their day. This ensures variability of situations.

Utterances with unintelligible words or quotes are removed. We restrict the corpus to content words by retaining only words tagged as adj, n, part and v (adjectives, nouns, particles, and verbs). This is in line with evidence that infants distinguish content and function words on the basis of acoustic signals (Shi and Werker, 2003). Vowel categorization improves when attending only to more prosodically and phonologically salient tokens (Adriaans and Swingley, 2012), which generally appear within content, not function words. The final corpus consists of 13138 tokens and 1497 word types.

6.2 Hillenbrand Vowels

The transcripts do not include phonetic information, so, following Feldman et al. (2013a), we synthesize the formant values using data from Hillenbrand et al. (1995). This dataset consists of a set of 1669 manually gathered formant values from 139 American English speakers (men, women and children) for 12 vowels. For each vowel category, we construct a Gaussian from the mean and covariance of the datapoints belonging to that category, using the first and second formant values measured at steady state. We also construct a second dataset using only datapoints from adult female speakers.

Each word in the dataset is converted to a phonemic representation using the CMU pronunciation dictionary, which returns a sequence of Arpabet phoneme symbols. If there are multiple possible pronunciations, the first one is used. Each vowel phoneme in the word is then replaced by formant values drawn from the corresponding Hillenbrand Gaussian for that vowel.
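The synthesis procedure above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the (F1, F2) measurements below are toy stand-ins for the Hillenbrand data, and the function names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the Hillenbrand steady-state (F1, F2) measurements;
# the real dataset has 1669 tokens from 139 speakers for 12 vowels.
MEASUREMENTS = {
    "IY": np.array([[342.0, 2322.0], [320.0, 2400.0], [360.0, 2250.0]]),
    "AE": np.array([[588.0, 1952.0], [610.0, 2000.0], [570.0, 1900.0]]),
}

def fit_vowel_gaussians(measurements):
    """One Gaussian per vowel category, built from the mean and
    covariance of that category's datapoints."""
    return {v: (pts.mean(axis=0), np.cov(pts.T))
            for v, pts in measurements.items()}

def synthesize(pronunciation, gaussians):
    """Replace each vowel phoneme in an Arpabet pronunciation with
    formant values sampled from the corresponding Gaussian; consonants
    are kept as the frame, with '_' marking vowel slots."""
    frame, vowel_tokens = [], []
    for phone in pronunciation:
        if phone in gaussians:
            frame.append("_")
            mu, cov = gaussians[phone]
            vowel_tokens.append(rng.multivariate_normal(mu, cov))
        else:
            frame.append(phone)
    return frame, vowel_tokens

gaussians = fit_vowel_gaussians(MEASUREMENTS)
frame, vowels = synthesize(["K", "IY", "T", "IY"], gaussians)  # 'kitty'
```

Here the token for kitty comes out as the frame k_t_ plus two sampled formant vectors, mirroring the w_i0, w_i1 example of Section 2.1.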
6.3 Merging Consonant Categories

The Arpabet encoding used in the phonemic representation includes 24 consonants. We construct datasets both using the full set of consonants—the 'C24' dataset—and with less fine-grained consonant categories. Distinguishing all consonant categories assumes perfect learning of consonants prior to vowel categorization and is thus somewhat unrealistic (Polka and Werker, 1994), but provides an upper limit on the information that word-contexts can give.

In the 'C15' dataset, the voicing distinction is collapsed, leaving 15 consonant categories. The collapsed categories are B/P, G/K, D/T, CH/JH, V/F, TH/DH, S/Z, SH/ZH, R/L, while HH, M, NG, N, W, Y remain separate phonemes. This dataset mirrors the finding in Mani and Plunkett (2010) that 12 month old infants are not sensitive to voicing mispronunciations.

The 'C6' dataset distinguishes between only six coarse consonant phonemes, corresponding to stops (B, P, G, K, D, T), affricates (CH, JH), fricatives (V, F, TH, DH, S, Z, SH, ZH, HH), nasals (M, NG, N), liquids (R, L), and semivowels/glides (W, Y). This dataset makes minimal assumptions about the consonant categories that infants could use in this learning setting.

Decreasing the number of consonants increases the ambiguity in the corpus: bat not only shares a frame (b_t) with boat and bite, but also, in the C15 dataset, with put, pad and bad (b/p_d/t), and in the C6 dataset, with dog and kite, among many others (STOP_STOP). Table 1 shows the percentage of types and tokens that are ambiguous in each dataset, that is, words in frames that match multiple wordtypes. Note that we always evaluate against the gold word identities, even when these are not distinguished in the model's input. These datasets are intended to evaluate the degree of reliance on consonant information in the LD and TLD models, and to what extent the topics in the TLD model can replace this information.

  Dataset          C24    C15    C6
  Input Types      1487   1426   1203
  Frames           1259   1078   702
  Ambig Types %    27.2   42.0   80.4
  Ambig Tokens %   41.3   56.9   77.2

Table 1: Corpus statistics showing the increasing amount of ambiguity as consonant categories are merged. Input types are the number of word types with distinct input representations (as opposed to gold orthographic word types, of which there are 1497). Ambiguous types and tokens are those with frames that match multiple (orthographic) word types.

6.4 Topics

The input to the TLD model includes a distribution over topics for each situation, which we infer in advance from the full Brent corpus (not only the C1 subset) using LDA. Each transcript in the Brent corpus captures about 75 minutes of parent-child interaction, and thus multiple situations will be included in each file. The transcripts do not delimit situations, so we do this somewhat arbitrarily by splitting each transcript after 50 CDS utterances, resulting in 203 situations for the Brent C1 dataset. As well as function words, we also remove the five most frequent content words (be, go, get, want, come). On average, situations are only 59 words long, reflecting the relative lack of content words in CDS utterances.

We infer 50 topics for this set of situations using the mallet toolkit (McCallum, 2002). Hyperparameters are inferred, which leads to a dominant topic that includes mainly light verbs (have, let, see, do). The other topics are less frequent but capture stronger semantic meaning (e.g. yummy, peach, cookie, daddy, bib in one topic, shoe, let, put, hat, pants in another). The word-topic assignments are used to calculate unsmoothed situation-topic distributions theta used by the TLD model.

6.5 Evaluation

We evaluate against adult categories, i.e., the 'gold-standard', since all learners of a language eventually converge on similar categories. (Since our model is not a model of the learning process, we do not compare the infant learning process to the learning algorithm.) We evaluate both the inferred phonetic categories and words using the clustering evaluation measure V-Measure (VM; Rosenberg and Hirschberg, 2007).[6] VM is the harmonic mean of two components, similar to F-score, where the components (VC and VH) are measures of cross entropy between the gold and model categorization.

[6] Other clustering measures, such as 1-1 matching and pairwise precision and recall (accuracy and completeness) showed the same trends, but VM has been demonstrated to be the most stable measure when comparing solutions with varying numbers of clusters (Christodoulopoulos et al., 2010).
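As a reference point, V-Measure can be computed directly from label entropies. The sketch below follows Rosenberg and Hirschberg's definitions, with VH as homogeneity and VC as completeness (scikit-learn's `v_measure_score` computes the same quantity); it is an illustration, not the evaluation code used in the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a labelling."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(X | Y) for paired labellings xs, ys."""
    n = len(xs)
    h = 0.0
    for y in set(ys):
        sub = [x for x, yy in zip(xs, ys) if yy == y]
        h += len(sub) / n * entropy(sub)
    return h

def v_measure(gold, pred):
    """Harmonic mean of homogeneity (VH) and completeness (VC)."""
    vh = 1.0 if entropy(gold) == 0 else 1 - conditional_entropy(gold, pred) / entropy(gold)
    vc = 1.0 if entropy(pred) == 0 else 1 - conditional_entropy(pred, gold) / entropy(pred)
    return 0.0 if vh + vc == 0 else 2 * vh * vc / (vh + vc)
```

A relabelled but otherwise perfect clustering scores 1.0, while merging everything into a single cluster has zero homogeneity and therefore scores 0, which is why the supervowel categories discussed below are penalized by VH.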
For vowels, VM measures how well the inferred phonetic categorizations match the gold categories; for lexemes, it measures whether tokens have been assigned to the same lexemes both by the model and the gold standard. Words are evaluated against gold orthography, so homophones, e.g. hole and whole, are distinct gold words.

6.6 Results

We compare all three models—TLD, LD, and IGMM—on the vowel categorization task, and TLD and LD on the lexical categorization task (since IGMM does not infer a lexicon). The datasets correspond to two sets of conditions: firstly, either using vowel categories synthesized from all speakers or only adult female speakers, and secondly, varying the coarseness of the observed consonant categories. Each condition (model, vowel speakers, consonant set) is run five times, using 1500 iterations of Gibbs sampling with hyperparameter sampling. Overall, we find that TLD outperforms the other models in both tasks, across all conditions.

Vowel categorization results are shown in Figure 3. IGMM performs substantially worse than both TLD and LD, with scores more than 30 points lower than the best results for these models, clearly showing the value of the protolexicon and replicating the results found by Feldman et al. (2013a) on this dataset. Furthermore, TLD consistently outperforms the LD model, finding better phonetic categories, both for vowels generated from the combined categories of all speakers ('all') and vowels generated from adult female speakers only ('w'), although the latter are clearly much easier for both models to learn. Both models perform less well when the consonant frames provide less information, but the TLD model performance degrades less than the LD performance.

Figure 3: Vowel evaluation. 'all' refers to datasets with vowels synthesized from all speakers, 'w' to datasets with vowels synthesized from adult female speakers' vowels. The bars show a 95% Confidence Interval based on 5 runs. IGMM-all results in a VM score of 53.9 (CI=0.5); IGMM-w has a VM score of 65.0 (CI=0.2), not shown.

Both the TLD and the LD models find 'supervowel' categories, which cover multiple vowel categories and are used to merge minimal pairs into a single lexical item. Figure 4 shows example vowel categories inferred by the TLD model, including two supervowels. The TLD supervowels are used much less frequently than the supervowels found by the LD model, containing, on average, only two-thirds as many tokens.

Figure 4: Vowels found by the TLD model; supervowels are indicated in red. The gold-standard vowels are shown in gold in the background but are mostly overlapped by the inferred categories.

Figure 5 shows that TLD also outperforms LD on the lexeme/word categorization task. Again performance decreases as the consonant categories become coarser, but the additional semantic information in the TLD model compensates for the lack of consonant information. In the individual components of VM, TLD and LD have similar VC ("recall"), but TLD has higher VH ("precision"), demonstrating that the semantic information given by the topics can separate potentially ambiguous words, as hypothesized.

Overall, the contextual semantic information
added in the TLD model leads to both better phonetic categorization and to a better protolexicon, especially when the input is noisier, using degraded consonants. Since infants are not likely to have perfect knowledge of phonetic categories at this stage, semantic information is a potentially rich source of information that could be drawn upon to offset noise from other domains. The form of the semantic information added in the TLD model is itself quite weak, so the improvements shown here are in line with what infant learners could achieve.

Figure 5: Lexeme evaluation. 'all' refers to datasets with vowels synthesized from all speakers, 'w' to datasets with vowels synthesized from adult female speakers' vowels.

The contextual semantic information that the TLD model tracks is similar to that potentially used in other linguistic learning tasks. Theories of cross-situational word learning (Smith and Yu, 2008; Yu and Smith, 2007) assume that sensitivity to situational co-occurrences between words and non-linguistic contexts is a precursor to learning the meanings of individual words. Under this view, contextual semantics is available to infants well before they have acquired large numbers of semantic minimal pairs. However, recent experimental evidence indicates that learners do not always retain detailed information about the referents that are present in a scene when they hear a word (Medina et al., 2011; Trueswell et al., 2013). This evidence poses a direct challenge to theories of cross-situational word learning. Our account does not necessarily require learners to track co-occurrences between words and individual objects, but instead focuses on more abstract information about salient events and topics in the environment; it will be important to investigate to what extent infants encode this information and use it in phonetic learning.

Regardless of the specific way in which infants encode semantic information, our method of adding this information by using LDA topics from transcript data was shown to be effective. This method is practical because it can approximate semantic information without relying on extensive manual annotation.

The LD model extended the phonetic categorization task by adding word contexts; the TLD model presented here goes even further, adding larger situational contexts. Both forms of top-down information help the low-level task of classifying acoustic signals into phonetic categories, furthering a holistic view of language learning with interaction across multiple levels.

7 Conclusion

Language acquisition is a complex task, in which many heterogeneous sources of information may be useful. In this paper, we investigated whether contextual semantic information could be of help when learning phonetic categories. We found that this contextual information can improve phonetic learning performance considerably, especially in situations where there is a high degree of phonetic ambiguity in the word-forms that learners hear. This suggests that previous models that have ignored semantic information may have underestimated the information that is available to infants. Our model illustrates one way in which language learners might harness the rich information that is present in the world without first needing to acquire a full inventory of word meanings.

Acknowledgments

This work was supported by EPSRC grant EP/H050442/1 and a James S. McDonnell Foundation Scholar Award to the final author.

References

Frans Adriaans and Daniel Swingley. Distributional learning of vowel categories is supported by prosody in infant-directed speech. In Proceedings of the 34th Annual Conference of the Cognitive Science Society (CogSci), 2012.

E. Bergelson and D. Swingley. At 6-9 months, human infants know the meanings of many
common nouns. Proceedings of the National Academy of Sciences, 109(9):3253–3258, Feb 2012.

David M. Blei, Thomas L. Griffiths, Michael I. Jordan, and Joshua B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In Advances in Neural Information Processing Systems 16, 2003.

Michael R. Brent and Jeffrey M. Siskind. The role of exposure to isolated words in early vocabulary development. Cognition, 81(2):B33–B44, 2001.

Christos Christodoulopoulos, Sharon Goldwater, and Mark Steedman. Two decades of unsupervised POS induction: How far have we come? In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 575–584, Cambridge, MA, October 2010.

Bart de Boer and Patricia K. Kuhl. Investigating the role of infant-directed speech with a computer model. Acoustics Research Letters Online, 4(4):129, 2003.

Brian Dillon, Ewan Dunbar, and William Idsardi. A single-stage approach to learning phonological categories: Insights from Inuktitut. Cognitive Science, 37(2):344–377, Mar 2013.

Micha Elsner, Sharon Goldwater, Naomi Feldman, and Frank Wood. A cognitive model of early lexical acquisition with phonetic variability. In Proceedings of the 18th Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013.

Naomi H. Feldman, Thomas L. Griffiths, Sharon Goldwater, and James L. Morgan. A role for the developing lexicon in phonetic category acquisition. Psychological Review, 2013a.

Naomi H. Feldman, Emily B. Myers, Katherine S. White, Thomas L. Griffiths, and James L. Morgan. Word-level information influences phonetic learning in adults and infants. Cognition, 127(3):427–438, 2013b.

Abdellah Fourtassi and Emmanuel Dupoux. A rudimentary lexicon and semantics help bootstrap phoneme acquisition. Submitted.

Michael C. Frank, Noah D. Goodman, and Joshua B. Tenenbaum. Using speakers' referential intentions to model early cross-situational word learning. Psychological Science, 20(5):578–585, 2009.

Manuela Friedrich and Angela D. Friederici. Word learning in 6-month-olds: Fast encoding—weak retention. Journal of Cognitive Neuroscience, 23(11):3228–3240, Nov 2011.

Lakshmi J. Gogate and Lorraine E. Bahrick. Intersensory redundancy and 7-month-old infants' memory for arbitrary syllable-object relations. Infancy, 2(2):219–231, Apr 2001.

J. Hillenbrand, L. A. Getty, M. J. Clark, and K. Wheeler. Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97(5 Pt 1):3099–3111, May 1995.

P. W. Jusczyk and Elizabeth A. Hohne. Infants' memory for spoken words. Science, 277(5334):1984–1986, Sep 1997.

Patricia K. Kuhl, Karen A. Williams, Francisco Lacerda, Kenneth N. Stevens, and Björn Lindblom. Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255(5044):606–608, 1992.

Brian MacWhinney. The CHILDES Project: Tools for Analyzing Talk. Lawrence Erlbaum Associates, 2000.

D. R. Mandel, P. W. Jusczyk, and D. B. Pisoni. Infants' recognition of the sound patterns of their own names. Psychological Science, 6(5):314–317, Sep 1995.

Nivedita Mani and Kim Plunkett. Twelve-month-olds know their cups from their keps and tups. Infancy, 15(5):445–470, Sep 2010.

Jessica Maye, Daniel J. Weiss, and Richard N. Aslin. Statistical phonetic learning in infants: facilitation and feature generalization. Developmental Science, 11(1):122–134, Jan 2008.

Jessica Maye, Janet F. Werker, and LouAnn Gerken. Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82(3):B101–B111, Jan 2002.

Andrew McCallum. MALLET: A machine learning for language toolkit, 2002.

Bob McMurray, Richard N. Aslin, and Joseph C. Toscano. Statistical learning of phonetic categories: insights from a computational approach. Developmental Science, 12(3):369–378, May 2009.

Tamara Nicol Medina, Jesse Snedeker, John C. Trueswell, and Lila R. Gleitman. How words can and cannot be learned by observation. Proceedings of the National Academy of Sciences, 108(22):9014–9019, 2011.

Radford Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9:249–265, 2000.

Linda Polka and Janet F. Werker. Developmental changes in perception of nonnative vowel contrasts. Journal of Experimental Psychology: Human Perception and Performance, 20(2):421–435, 1994.

Carl Rasmussen. The infinite Gaussian mixture model. In Advances in Neural Information Processing Systems 13, 2000.

Andrew Rosenberg and Julia Hirschberg. V-measure: A conditional entropy-based external cluster evaluation measure. In Proceedings of the 12th Conference on Empirical Methods in Natural Language Processing (EMNLP), 2007.

Brandon C. Roy, Michael C. Frank, and Deb Roy. Relating activity contexts to early word learning in dense longitudinal data. In Proceedings of the 34th Annual Conference of the Cognitive Science Society (CogSci), 2012.

Rushen Shi and Janet F. Werker. The basis of preference for lexical words in 6-month-old infants. Developmental Science, 6(5):484–488, 2003.

M. Shukla, K. S. White, and R. N. Aslin. Prosody guides the rapid mapping of auditory word forms onto visual objects in 6-mo-old infants. Proceedings of the National Academy of Sciences, 108(15):6038–6043, Apr 2011.

Linda B. Smith and Chen Yu. Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition, 106(3):1558–1568, 2008.

Christine L. Stager and Janet F. Werker. Infants listen for more phonetic detail in speech perception than in word-learning tasks. Nature, 388:381–382, 1997.

D. Swingley. Contributions of infant word learning to language development. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1536):3617–3632, Nov 2009.

Yee Whye Teh. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (ACL), pages 985–992, Sydney, 2006.

Tuomas Teinonen, Richard N. Aslin, Paavo Alku, and Gergely Csibra. Visual speech contributes to phonetic learning in 6-month-old infants. Cognition, 108:850–855, 2008.

Erik D. Thiessen. The effect of distributional information on children's use of phonemic contrasts. Journal of Memory and Language, 56(1):16–34, Jan 2007.

R. Tincoff and P. W. Jusczyk. Some beginnings of word comprehension in 6-month-olds. Psychological Science, 10(2):172–175, Mar 1999.

Ruth Tincoff and Peter W. Jusczyk. Six-month-olds comprehend words that refer to parts of the body. Infancy, 17(4):432–444, Jul 2012.

N. S. Trubetzkoy. Grundzüge der Phonologie. Vandenhoeck und Ruprecht, Göttingen, 1939.

John C. Trueswell, Tamara Nicol Medina, Alon Hafri, and Lila R. Gleitman. Propose but verify: Fast mapping meets cross-situational word learning. Cognitive Psychology, 66:126–156, 2013.

G. K. Vallabha, J. L. McClelland, F. Pons, J. F. Werker, and S. Amano. Unsupervised learning of vowel categories from infant-directed speech. Proceedings of the National Academy of Sciences, 104(33):13273–13278, Aug 2007.

Janet F. Werker and Richard C. Tees. Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7:49–63, 1984.

H. Henny Yeung and Janet F. Werker. Learning words' sounds before learning how words sound: 9-month-olds use distinct objects as cues to categorize speech information. Cognition, 113(2):234–243, Nov 2009.

Chen Yu and Linda B. Smith. Rapid word learning under uncertainty via cross-situational statistics. Psychological Science, 18(5):414–420, 2007.