Exploiting Social Information in Grounded Learning via Grammatical Reductions

Mark Johnson
Department of Computing, Macquarie University, Sydney, Australia
[email protected]

Katherine Demuth
Department of Linguistics, Macquarie University, Sydney, Australia
[email protected]

Michael Frank
Department of Psychology, Stanford University, Stanford, California
[email protected]

Abstract

This paper uses an unsupervised model of grounded language acquisition to study the role that social cues play in language acquisition. The input to the model consists of (orthographically transcribed) child-directed utterances accompanied by the set of objects present in the non-linguistic context. Each object is annotated by social cues, indicating e.g., whether the caregiver is looking at or touching the object. We show how to model the task of inferring which objects are being talked about (and which words refer to which objects) as standard grammatical inference, and describe PCFG-based unigram models and adaptor grammar-based collocation models for the task. Exploiting social cues improves the performance of all models. Our models learn the relative importance of each social cue jointly with word-object mappings and collocation structure, consistent with the idea that children could discover the importance of particular social information sources during word learning.

1 Introduction

From learning sounds to learning the meanings of words, social interactions are extremely important for children's early language acquisition (Baldwin, 1993; Kuhl et al., 2003). For example, children who engage in more joint attention (e.g. looking at particular objects together) with caregivers tend to learn words faster (Carpenter et al., 1998). Yet computational or formal models of social interaction are rare, and those that exist have rarely gone beyond the stage of cue-weighting models. In order to study the role that social cues play in language acquisition, this paper presents a structured statistical model of grounded learning that learns a mapping between words and objects from a corpus of child-directed utterances in a completely unsupervised fashion. It exploits five different social cues, which indicate which object (if any) the child is looking at, which object the child is touching, etc. Our models learn the salience of each social cue in establishing reference, relative to their co-occurrence with objects that are not being referred to. Thus, this work is consistent with a view of language acquisition in which children learn to learn, discovering organizing principles for how language is organized and used socially (Baldwin, 1993; Hollich et al., 2000; Smith et al., 2002).

We reduce the grounded learning task to a grammatical inference problem (Johnson et al., 2010; Börschinger et al., 2011). The strings presented to our grammatical learner contain a prefix which encodes the objects and their social cues for each utterance, and the rules of the grammar encode relationships between these objects and specific words. These rules permit every object to map to every word (including function words; i.e., there is no "stop word" list), and the learning process decides which of these rules will have a non-trivial probability (these encode the object-word mappings the system has learned).

This reduction of grounded learning to grammatical inference allows us to use standard grammatical inference procedures to learn our models. Here we use the adaptor grammar package described in Johnson et al. (2007) and Johnson and Goldwater (2009) with "out of the box" default settings; no parameter tuning whatsoever was done. Adaptor grammars are a framework for specifying hierarchical non-parametric models that has been previously used to model language acquisition (Johnson, 2008).


A semanticist might argue that our view of referential mapping is flawed: full noun phrases (e.g., the dog), rather than nouns, refer to specific objects, and nouns denote properties (e.g., dog denotes the property of being a dog). Learning that a noun, e.g., dog, is part of a phrase used to refer to a specific dog (say, Fido) does not suffice to determine the noun's meaning: the noun could denote a specific breed of dog, or animals in general. But learning word-object relationships is a plausible first step for any learner: it is often only the contrast between learned relationships and novel relationships that allows children to induce super- or sub-ordinate mappings (Clark, 1987). Nevertheless, in deference to such objections, we call the object that a phrase containing a given noun refers to the topic of that noun. (This is also appropriate, given that our models are specialisations of topic models).

Our models are intended as an "ideal learner" approach to early social language learning, attempting to weight the importance of social and structural factors in the acquisition of word-object correspondences. From this perspective, the primary goal is to investigate the relationships between acquisition tasks (Johnson, 2008; Johnson et al., 2010), looking for synergies (areas of acquisition where attempting two learning tasks jointly can provide gains in both) as well as areas where information overlaps.

1.1 A training corpus for social cues

Our work here uses a corpus of child-directed speech annotated with social cues, described in Frank et al. (to appear). The corpus consists of 4,763 orthographically-transcribed utterances of caregivers to their pre-linguistic children (ages 6, 12, and 18 months) during home visits where children played with a consistent set of toys. The sessions were video-taped, and each utterance was annotated with the five social cues described in Figure 1.

    Social cue     Value
    child.eyes     objects child is looking at
    child.hands    objects child is touching
    mom.eyes       objects care-giver is looking at
    mom.hands      objects care-giver is touching
    mom.point      objects care-giver is pointing to

Figure 1: The 5 social cues in the Frank et al. (to appear) corpus. The value of a social cue for an utterance is a subset of the available topics (i.e., the objects in the non-linguistic context) of that utterance.

Each utterance in the corpus contains the following information:

• the sequence of orthographic words uttered by the care-giver,
• a set of available topics (i.e., objects in the non-linguistic context),
• the values of the social cues, and
• a set of intended topics, which the care-giver refers to.

Figure 2 presents this information for an example utterance. All of these but the intended topics are provided to our learning algorithms; the intended topics are used to evaluate the output produced by our learners.

Generally the intended topics consist of zero or one elements from the available topics, but not always: it is possible for the caregiver to refer to two objects in a single utterance, or to refer to an object not in the current non-linguistic context (e.g., to a toy that has been put away). There is a considerable amount of anaphora in this corpus, which our models currently ignore.

Frank et al. (to appear) give extensive details on the corpus, including inter-annotator reliability information for all annotations, and provide detailed statistical analyses of the relationships between the various social cues, the available topics and the intended topics. That paper also gives instructions on obtaining the corpus.

1.2 Previous work

There is a growing body of work on the role of social cues in language acquisition. The language acquisition research community has long recognized the importance of social cues for child language acquisition (Baldwin, 1991; Carpenter et al., 1998; Kuhl et al., 2003).

Siskind (1996) describes one of the first examples of a model that learns the relationship between words and topics, albeit in a non-statistical framework. Yu and Ballard (2007) describe an associative learner that associates words with topics and that exploits prosodic as well as social cues. The relative importance of the various social cues is specified a priori in their model (rather than learned, as it is here), and unfortunately their training corpus is not available. Frank et al. (2008) describe a Bayesian model that learns the relationship between words and topics, but the version of their model that included social cues presented a number of challenges for inference. The unigram model we describe below corresponds most closely to the Frank et al. model.

.dog # .pig child.eyes mom.eyes mom.hands # ## wheres the piggie

Figure 2: The photograph (omitted here) shows the non-linguistic context, containing a (toy) pig and dog, for the utterance Where's the piggie?. The line above shows the representation of this utterance that serves as the input to our models. The prefix (the portion of the string before the "##") lists the available topics (i.e., the objects in the non-linguistic context) and their associated social cues (the cues for the pig are child.eyes, mom.eyes and mom.hands, while the dog is not associated with any social cues). The intended topic is the pig. The learner's goals are to identify the utterance's intended topic, and which words in the utterance are associated with which topic.
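To make the encoding concrete, the following minimal Python sketch (our own illustration, not code from the paper; the function name is hypothetical) builds this string representation from the available topics, their social cues, and the words of the utterance:

    # Sketch of the input encoding shown in Figure 2 (our own code).
    # Each available topic is written as "." + its name, followed by the
    # social cues that apply to it; "#" closes each object's cue list and
    # "##" separates the prefix from the words of the utterance.

    def encode_utterance(objects, words):
        """objects: list of (name, cues) pairs; words: list of strings."""
        parts = []
        for name, cues in objects:
            parts.append("." + name)   # available topic, e.g. ".pig"
            parts.extend(cues)         # e.g. "child.eyes mom.eyes mom.hands"
            parts.append("#")          # end of this object's social cues
        return " ".join(parts + ["##"] + words)

    print(encode_utterance(
        [("dog", []), ("pig", ["child.eyes", "mom.eyes", "mom.hands"])],
        ["wheres", "the", "piggie"]))
    # -> .dog # .pig child.eyes mom.eyes mom.hands # ## wheres the piggie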

    Sentence
      Topic.pig
        T.None
          .dog
          NotTopical.child.eyes
            NotTopical.child.hands
              NotTopical.mom.eyes
                NotTopical.mom.hands
                  NotTopical.mom.point
                    #
        Topic.pig
          T.pig
            .pig
            Topical.child.eyes
              child.eyes
              Topical.child.hands
                Topical.mom.eyes
                  Topical.mom.hands
                    mom.hands
                    Topical.mom.point
                      #
          Topic.None
            ##
      Words.pig
        Word.None
          wheres
        Words.pig
          Word.None
            the
          Words.pig
            Word.pig
              piggie

Figure 3: Sample parse generated by the unigram PCFG. The topic-bearing nodes (coloured red in the original figure) show how the "pig" topic is propagated from the prefix (before the "##" separator) into the utterance. The social cues associated with each object are generated either from a "Topical" or a "NotTopical" nonterminal, depending on whether the corresponding object is topical or not.

Johnson et al. (2010) reduces grounded learning to grammatical inference for adaptor grammars and shows how it can be used to perform word segmentation as well as learning word-topic relationships, but their model does not take social cues into account.

2 Reducing grounded learning with social cues to grammatical inference

This section explains how we reduce grounded learning problems with social cues to grammatical inference problems, which lets us apply a wide variety of grammatical inference algorithms to grounded learning problems. An advantage of reducing grounded learning to grammatical inference is that it suggests new ways to generalise grounded learning models; we explore three such generalisations here. The main challenge in this reduction is finding a way of expressing the non-linguistic information as part of the strings that serve as the grammatical inference procedure's input. Here we encode the non-linguistic information in a "prefix" to each utterance as shown in Figure 2, and devise a grammar such that inference for the grammar corresponds to learning the word-topic relationships and the salience of the social cues for grounded learning.

All our models associate each utterance with zero or one topics (this means we cannot correctly analyse utterances with more than one intended topic). We analyse an utterance associated with zero topics as having the special topic None, so we can assume that every utterance has exactly one topic. All our grammars generate strings of the form shown in Figure 2, and they do so by parsing the prefix and the words of the utterance separately; the top-level rules of the grammar force the same topic to be associated with both the prefix and the words of the utterance (see Figure 3).

2.1 Topic models and the unigram PCFG

As Johnson et al. (2010) observe, this kind of grounded learning can be viewed as a specialised kind of topic inference in a topic model, where the utterance topic is constrained by the available objects (possible topics). We exploit this observation here using a reduction based on the reduction of LDA topic models to PCFGs proposed by Johnson (2010). This leads to our first model, the unigram grammar, which is a PCFG.¹

¹ In fact, the unigram grammar is equivalent to a HMM, but the PCFG parameterisation makes clear the relationship between grounded learning and estimation of grammar rule weights.

    Sentence → Topic_t Words_t                ∀t ∈ T′
    Topic_None → ##
    Topic_t → T_t Topic_None                  ∀t ∈ T′
    Topic_t → T_None Topic_t                  ∀t ∈ T
    T_t → t Topical_c1                        ∀t ∈ T
    Topical_ci → (c_i) Topical_ci+1           i = 1, …, ℓ−1
    Topical_cℓ → (c_ℓ) #
    T_None → t NotTopical_c1                  ∀t ∈ T
    NotTopical_ci → (c_i) NotTopical_ci+1     i = 1, …, ℓ−1
    NotTopical_cℓ → (c_ℓ) #
    Words_t → Word_None (Words_t)             ∀t ∈ T′
    Words_t → Word_t (Words_t)                ∀t ∈ T
    Word_t → w                                ∀t ∈ T′, w ∈ W

Figure 4: The rule schema that generate the unigram PCFG. Here (c_1, …, c_ℓ) is an ordered list of the social cues, T is the set of all non-None available topics, T′ = T ∪ {None}, and W is the set of words appearing in the utterances. Parentheses indicate optionality.

Figure 4 presents the rules of the unigram grammar. This grammar has two major parts. The rules expanding the Topic_t nonterminals ensure that the social cues for the available topic t are parsed under the Topical nonterminals. All other available topics are parsed under T_None nonterminals, so their social cues are parsed under NotTopical nonterminals. The rules expanding these non-terminals are specifically designed so that the generation of the social cues corresponds to a series of binary decisions about each social cue. For example, the probability of the rule

    Topical_child.eyes → child.eyes Topical_child.hands

is the probability of an object that is an utterance topic occurring with the child.eyes social cue. By estimating the probabilities of these rules, the model effectively learns the probability of each social cue being associated with a Topical or a NotTopical available topic, respectively.

The nonterminals Words_t expand to a sequence of Word_t and Word_None nonterminals, each of which can expand to any word whatsoever. In practice Word_t will expand to those words most strongly associated with topic t, while Word_None will expand to those words not associated with any topic.
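Because the schema in Figure 4 is finite once the topic set, cue list, and vocabulary are fixed, the grammar can be generated mechanically. The following Python sketch (our own reconstruction, not the paper's tooling; nonterminal names follow the Figure 3 node labels) instantiates the schema:

    # Instantiate the Figure 4 rule schema for concrete topics and words.
    # Nonterminal names follow Figure 3 (e.g. "Topic.pig", "Word.None").

    CUES = ["child.eyes", "child.hands", "mom.eyes", "mom.hands", "mom.point"]

    def unigram_rules(topics, words, cues=CUES):
        t_prime = topics + ["None"]
        rules = [("Topic.None", ["##"])]
        for t in t_prime:
            rules.append(("Sentence", ["Topic." + t, "Words." + t]))
            rules.append(("Topic." + t, ["T." + t, "Topic.None"]))
        for t in topics:
            # non-topical objects may precede the topical one
            rules.append(("Topic." + t, ["T.None", "Topic." + t]))
            rules.append(("T." + t, ["." + t, "Topical." + cues[0]]))
            rules.append(("T.None", ["." + t, "NotTopical." + cues[0]]))
        # each cue is a binary decision: present or absent
        for kind in ("Topical", "NotTopical"):
            for i, c in enumerate(cues):
                rest = [kind + "." + cues[i + 1]] if i + 1 < len(cues) else ["#"]
                rules.append((kind + "." + c, rest))        # cue absent
                rules.append((kind + "." + c, [c] + rest))  # cue present
        for t in t_prime:
            rules.append(("Words." + t, ["Word.None"]))
            rules.append(("Words." + t, ["Word.None", "Words." + t]))
            if t != "None":
                rules.append(("Words." + t, ["Word." + t]))
                rules.append(("Words." + t, ["Word." + t, "Words." + t]))
            rules.extend(("Word." + t, [w]) for w in words)
        return rules

    for lhs, rhs in unigram_rules(["pig", "dog"], ["wheres", "the", "piggie"])[:6]:
        print(lhs, "->", " ".join(rhs))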

2.2 Adaptor grammars

Our other grounded learning models are based on reductions of grounded learning to adaptor grammar inference problems. Adaptor grammars are a framework for stating a variety of Bayesian non-parametric models defined in terms of a hierarchy of Pitman-Yor Processes: see Johnson et al. (2007) for a formal description. Informally, an adaptor grammar is specified by a set of rules just as in a PCFG, plus a set of adapted nonterminals. The set of trees generated by an adaptor grammar is the same as the set of trees generated by a PCFG with the same rules, but the generative process differs. Non-adapted nonterminals in an adaptor grammar expand just as they do in a PCFG: the probability of choosing a rule is specified by its probability. However, the expansion of an adapted nonterminal depends on how it expanded in previous derivations. An adapted nonterminal can directly expand to a subtree with probability proportional to the number of times that subtree has been previously generated; it can also "back off" to expand using a grammar rule, just as in a PCFG, with probability proportional to a constant.²

² This is a description of Chinese Restaurant Processes, which are the predictive distributions for Dirichlet Processes. Our adaptor grammars are actually based on the more general Pitman-Yor Processes, as described in Johnson and Goldwater (2009).

Thus an adaptor grammar can be viewed as caching each tree generated by each adapted nonterminal, and regenerating it with probability proportional to the number of times it was previously generated (with some probability mass reserved to generate "new" trees). This enables adaptor grammars to generalise over subtrees of arbitrary size. Generic software is available for adaptor grammar inference, based either on Variational Bayes (Cohen et al., 2010) or Markov Chain Monte Carlo (Johnson and Goldwater, 2009). We used the latter software because it is capable of performing hyper-parameter inference for the PCFG rule probabilities and the Pitman-Yor Process parameters. We used the "out-of-the-box" settings for this software, i.e., uniform priors on all PCFG rule parameters, a Beta(2, 1) prior on the Pitman-Yor a parameters and a "vague" Gamma(100, 0.01) prior on the Pitman-Yor b parameters. (Presumably performance could be improved if the priors were tuned, but we did not explore this here).

Here we explore a simple "collocation" extension to the unigram PCFG which associates multiword collocations, rather than individual words, with topics. Hardisty et al. (2010) showed that this significantly improved performance in a sentiment analysis task.

    Sentence → Topic_t Collocs_t          ∀t ∈ T′
    Collocs_t → Colloc_t (Collocs_t)      ∀t ∈ T′
    Collocs_t → Colloc_None (Collocs_t)   ∀t ∈ T
    Colloc_t → Words_t                    ∀t ∈ T′
    Words_t → Word_t (Words_t)            ∀t ∈ T′
    Words_t → Word_None (Words_t)         ∀t ∈ T
    Word_t → Word                         ∀t ∈ T′
    Word → w                              ∀w ∈ W

Figure 5: The rule schema that generate the collocation adaptor grammar. The adapted nonterminals (underlined in the original figure) are Colloc_t and Word_t. Here T is the set of all non-None available topics, T′ = T ∪ {None}, and W is the set of words appearing in the utterances. The rules expanding the Topic_t nonterminals are exactly as in the unigram PCFG.

The collocation adaptor grammar in Figure 5 generates the words of the utterance as a sequence of collocations, each of which is a sequence of words. Each collocation is either associated with the sentence topic or with the None topic, just like words in the unigram model. Figure 6 shows a sample parse generated by the collocation adaptor grammar.

    Sentence
      Topic.pig
        … (prefix, parsed exactly as in the unigram PCFG)
      Collocs.pig
        Colloc.None
          Words.None
            Word.None
              Word → wheres
        Collocs.pig
          Colloc.pig
            Words.pig
              Word.None
                Word → the
              Words.pig
                Word.pig
                  Word → piggie

Figure 6: Sample parse generated by the collocation adaptor grammar. The adapted nonterminals Colloc_t and Word_t are shown underlined in the original figure; the subtrees they dominate are "cached" by the adaptor grammar. The prefix (not shown here) is parsed exactly as in the unigram PCFG.
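The "rich get richer" caching behaviour described in this section is easy to state in code. The sketch below is our own illustration, using a simplified Chinese Restaurant Process rather than the Pitman-Yor Process the models actually use:

    import random

    def expand_adapted(cache, alpha, expand_with_pcfg):
        """cache maps previously generated subtrees to counts; alpha is
        the concentration parameter controlling how often new subtrees
        are generated from the base PCFG."""
        n = sum(cache.values())
        r = random.uniform(0, n + alpha)
        for tree, count in cache.items():
            r -= count
            if r < 0:
                cache[tree] += 1     # reuse: probability count / (n + alpha)
                return tree
        tree = expand_with_pcfg()    # back off: probability alpha / (n + alpha)
        cache[tree] = cache.get(tree, 0) + 1
        return tree

Because frequently regenerated subtrees become ever more probable, an adapted nonterminal such as Colloc_t effectively learns a lexicon of reusable multiword units.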

    Model     Social cues | Utterance topic              | Word topic            | Lexicon
                          | acc.   f-score prec.  rec.   | f-score prec.  rec.   | f-score prec.   rec.
    unigram   none        | 0.3395 0.4044 0.3249 0.5353  | 0.2007 0.1207 0.5956  | 0.1037  0.05682 0.5952
    unigram   all         | 0.4907 0.6064 0.4867 0.8043  | 0.295  0.1763 0.9031  | 0.1483  0.08096 0.881
    colloc    none        | 0.4331 0.3513 0.3272 0.3792  | 0.2431 0.1603 0.5028  | 0.08808 0.04942 0.4048
    colloc    all         | 0.5837 0.598  0.5623 0.6384  | 0.4098 0.2702 0.8475  | 0.1671  0.09422 0.7381
    unigram′  none        | 0.3261 0.3767 0.3054 0.4914  | 0.1893 0.1131 0.5811  | 0.1167  0.06583 0.5122
    unigram′  all         | 0.5117 0.6106 0.4986 0.7875  | 0.2846 0.1693 0.891   | 0.1684  0.09402 0.8049
    colloc′   none        | 0.5238 0.3419 0.3844 0.3078  | 0.2551 0.1732 0.4843  | 0.2162  0.1495  0.3902
    colloc′   all         | 0.6492 0.6034 0.6664 0.5514  | 0.3981 0.2613 0.8354  | 0.3375  0.2269  0.6585

Figure 7: Utterance topic, word topic and lexicon results for all models, on data with and without social cues. The results for the variant models, in which Word_t nonterminals expand via Word_None, are shown under unigram′ and colloc′. Utterance topic shows how well the model discovered the intended topics at the utterance level, word topic shows how well the model associates word tokens with topics, and lexicon shows how well the topic most frequently associated with a word type matches an external word-topic dictionary. In this figure and below, "colloc" abbreviates "collocation", "acc." abbreviates "accuracy", "prec." abbreviates "precision" and "rec." abbreviates "recall".

We also experimented with a variant of the unigram and collocation grammars in which the topic-specific word distributions Word_t for each t ∈ T (the set of non-None available topics) expand via Word_None non-terminals. That is, in the variant grammars topical words are generated with the following rule schema:

    Word_t → Word_None   ∀t ∈ T
    Word_None → Word
    Word → w             ∀w ∈ W

In these variant grammars, the Word_None nonterminal generates all the words of the language, so it defines a generic "background" distribution over all the words, rather than just the nontopical words. An effect of this is that the variant grammars tend to identify fewer words as topical.

3 Experimental evaluation

We performed grammatical inference using the adaptor grammar software described in Johnson and Goldwater (2009).³ All experiments involved 4 runs of 5,000 samples each, of which the first 2,500 were discarded for "burn-in".⁴ From these samples we extracted the modal (i.e., most frequent) analysis, which we evaluated as described below. The results of evaluating each model on the corpus with social cues, and on another corpus identical except that the social cues have been removed, are presented in Figure 7.

³ Because adaptor grammars are a generalisation of PCFGs, we could use the adaptor grammar software to estimate the unigram model.

⁴ We made no effort to optimise the computation, but it seems the samplers actually stabilised after around a hundred iterations, so it was probably not necessary to sample so extensively. We estimated the error in our results by running our most complex model (the colloc′ model with all social cues) 20 times (i.e., 20 × 8 chains for 5,000 iterations) so we could compute the variance of each of the evaluation scores (it is reasonable to assume that the simpler models will have smaller variance). The standard deviation of all utterance topic and word topic measures is between 0.005 and 0.01; the standard deviation for lexicon f-score is 0.02, lexicon precision is 0.01 and lexicon recall is 0.03. The adaptor grammar software uses a sentence-wise blocked sampler, so it requires fewer iterations than a point-wise sampler. We used 5,000 iterations because this is the software's default setting; evaluating the trace output suggests it only takes several hundred iterations to "burn in". However, we ran 8 chains for 25,000 iterations of the colloc′ model; as expected the results of this run are within two standard deviations of the results reported above.

Each model was evaluated on each corpus as follows. First, we extracted the utterance's topic from the modal parse (this can be read off the Topic_t nodes), and compared this to the intended topics annotated in the corpus. The frequency with which the models' predicted topics exactly match the intended topics is given under "utterance topic accuracy"; the f-score, precision and recall of each model's topic predictions are also given in the table.

Because our models all associate word tokens with topics, we can also evaluate the accuracy with which word tokens are associated with topics. We constructed a small dictionary which identifies the words that can be used as the head of a phrase to refer to the topical objects (e.g., the dictionary indicates that dog, doggie and puppy name the topical object DOG). Our dictionary is relatively conservative; between one and eight words are associated with each topic. We scored the topic label on each word token in our corpus as follows. A topic label is scored as correct if it is given in our dictionary and the topic is one of the intended topics for the utterance. The "word topic" entries in Figure 7 give the results of this evaluation.
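As an illustration of the utterance-topic scoring just described, the following sketch (our own reconstruction; the exact scoring conventions are assumptions, not the paper's evaluation code) computes accuracy, precision, recall and f-score from predicted and intended topic sets:

    def utterance_topic_scores(predicted, intended):
        """predicted, intended: lists of sets of topic names per utterance
        (the None topic is represented as the empty set)."""
        n = len(predicted)
        acc = sum(p == g for p, g in zip(predicted, intended)) / n
        true_pos = sum(len(p & g) for p, g in zip(predicted, intended))
        n_predicted = sum(len(p) for p in predicted)
        n_intended = sum(len(g) for g in intended)
        prec = true_pos / n_predicted if n_predicted else 0.0
        rec = true_pos / n_intended if n_intended else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return {"acc.": acc, "f-score": f, "prec.": prec, "rec.": rec}

    # e.g. a model predicting {pig} when the intended topics are {pig, dog}
    print(utterance_topic_scores([{"pig"}], [{"pig", "dog"}]))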

    Model    Social cues  | Utterance topic              | Word topic            | Lexicon
                          | acc.   f-score prec.  rec.   | f-score prec.  rec.   | f-score prec.   rec.
    unigram  none         | 0.3395 0.4044 0.3249 0.5353  | 0.2007 0.1207 0.5956  | 0.1037  0.05682 0.5952
    unigram  +child.eyes  | 0.4573 0.5725 0.4559 0.7694  | 0.2891 0.1724 0.8951  | 0.1362  0.07415 0.8333
    unigram  +child.hands | 0.3399 0.4011 0.3246 0.5247  | 0.2008 0.121  0.5892  | 0.09705 0.05324 0.5476
    unigram  +mom.eyes    | 0.338  0.4023 0.3234 0.5322  | 0.1992 0.1198 0.5908  | 0.09664 0.053   0.5476
    unigram  +mom.hands   | 0.3563 0.4279 0.3437 0.5667  | 0.1984 0.1191 0.5948  | 0.09959 0.05455 0.5714
    unigram  +mom.point   | 0.3063 0.3548 0.285  0.4698  | 0.1806 0.1086 0.5359  | 0.09224 0.05057 0.5238
    colloc   none         | 0.4331 0.3513 0.3272 0.3792  | 0.2431 0.1603 0.5028  | 0.08808 0.04942 0.4048
    colloc   +child.eyes  | 0.5159 0.5006 0.4652 0.542   | 0.351  0.2309 0.7312  | 0.1432  0.07989 0.6905
    colloc   +child.hands | 0.4827 0.4275 0.3999 0.4592  | 0.2897 0.1913 0.5964  | 0.1192  0.06686 0.5476
    colloc   +mom.eyes    | 0.4697 0.4171 0.3869 0.4525  | 0.2708 0.1781 0.5642  | 0.1013  0.05666 0.4762
    colloc   +mom.hands   | 0.4747 0.4251 0.3942 0.4612  | 0.274  0.1806 0.5666  | 0.09548 0.05337 0.4524
    colloc   +mom.point   | 0.4228 0.3378 0.3151 0.3639  | 0.2575 0.1716 0.5157  | 0.09278 0.05202 0.4286

Figure 8: Effect of using just one social cue on the experimental results for the unigram and collocation models. The “importance” of a social cue can be quantified by the degree to which the model’s evaluation score improves when using a corpus containing that social cue relative to its evaluation score when using a corpus without any social cues. The most important social cue is the one which causes performance to improve the most.

Finally, we extracted a lexicon from the parsed corpus produced by each model. We counted how often each word type was associated with each topic in our sampler's output (including the None topic), and assigned the word to its most frequent topic. The "lexicon" entries in Figure 7 show how well the entries in these lexicons match the entries in the manually-constructed dictionary discussed above.

There are 10 different evaluation scores, and no model dominates in all of them. However, the top-scoring result in every evaluation is always for a model trained using social cues, demonstrating the importance of these social cues. The variant collocation model (trained on data with social cues) was the top-scoring model on four evaluation scores, which is more than any other model.

One striking thing about this evaluation is that the recall scores are all much higher than the precision scores, for each evaluation. This indicates that all of the models, especially the unigram model, are labelling too many words as topical. This is perhaps not too surprising: because our models completely lack any notion of syntactic structure and simply model the association between words and topics, they label many non-nouns with topics (e.g., woof is typically labelled with the topic DOG).

3.1 Evaluating the importance of social cues

It is scientifically interesting to be able to evaluate the importance of each of the social cues to grounded learning. One way to do this is to study the effect of adding or removing social cues from the corpus on the ability of our models to perform grounded learning. An important social cue should have a large impact on our models' performance; an unimportant cue should have little or no impact.

Figure 8 compares the performance of the unigram and collocation models on corpora containing a single social cue to their performance on the corpus without any social cues, while Figure 9 compares the performance of these models on corpora containing all but one social cue to the corpus containing all of the social cues. In both of these evaluations, with respect to all 10 evaluation measures, the child.eyes social cue had the most impact on model performance.

Why would the child's own gaze be more important than the caregiver's? Perhaps caregivers are "following in", i.e., talking about objects that their children are interested in (Baldwin, 1991). However, another possible explanation is that this result is due to the general continuity of conversational topics over time. Frank et al. (to appear) show that for the current corpus, the topic of the preceding utterance is very likely to be the topic of the current one also. Thus, the child's eyes might be a good predictor because they reflect the fact that the child's attention has been drawn to an object by previous utterances.

Notice that these two possible explanations of the importance of the child.eyes cue are diametrically opposed; the first explanation claims that the cue is important because the child is driving the discourse, while the second explanation claims that the cue is important because the child's gaze follows the topic of the caregiver's previous utterance. This sort of question about causal relationships in conversations may be very difficult to answer using standard descriptive techniques, but it may be an interesting avenue for future investigation using more structured models such as those proposed here.⁵

⁵ A reviewer suggested that we can test whether child.eyes effectively provides the same information as the previous topic by adding the previous topic as a (pseudo-) social cue. We tried this, and child.eyes and previous.topic do in fact seem to convey very similar information: e.g., the model with previous.topic and without child.eyes scores essentially the same as the model with all social cues.
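Concretely, both methodologies reduce to score differences over the tables in Figures 8 and 9. The snippet below (our own code, not from the paper) recomputes the "add one cue" importance of each cue from the utterance-topic accuracies in Figure 8:

    # "Add one cue": importance = score(corpus with just this cue)
    #                           - score(corpus with no social cues).
    # "Subtract one cue" is analogous: score(all cues) - score(all but one).

    no_cue_acc = {"unigram": 0.3395, "colloc": 0.4331}
    one_cue_acc = {
        ("unigram", "child.eyes"): 0.4573, ("unigram", "child.hands"): 0.3399,
        ("unigram", "mom.eyes"): 0.338,    ("unigram", "mom.hands"): 0.3563,
        ("unigram", "mom.point"): 0.3063,
        ("colloc", "child.eyes"): 0.5159,  ("colloc", "child.hands"): 0.4827,
        ("colloc", "mom.eyes"): 0.4697,    ("colloc", "mom.hands"): 0.4747,
        ("colloc", "mom.point"): 0.4228,
    }

    for (model, cue), acc in sorted(one_cue_acc.items()):
        print(f"{model:8s} +{cue:11s} importance = {acc - no_cue_acc[model]:+.4f}")
    # child.eyes gives the largest gain for both models
    # (+0.1178 for unigram, +0.0828 for colloc).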

    Model    Social cues  | Utterance topic              | Word topic            | Lexicon
                          | acc.   f-score prec.  rec.   | f-score prec.  rec.   | f-score prec.   rec.
    unigram  all          | 0.4907 0.6064 0.4867 0.8043  | 0.295  0.1763 0.9031  | 0.1483  0.08096 0.881
    unigram  −child.eyes  | 0.3836 0.4659 0.3738 0.6184  | 0.2149 0.1286 0.6546  | 0.1111  0.06089 0.6341
    unigram  −child.hands | 0.4907 0.6063 0.4863 0.8051  | 0.296  0.1769 0.9056  | 0.1525  0.08353 0.878
    unigram  −mom.eyes    | 0.4799 0.5974 0.4768 0.7996  | 0.2898 0.1727 0.9007  | 0.1551  0.08486 0.9024
    unigram  −mom.hands   | 0.4871 0.5996 0.4815 0.7945  | 0.2925 0.1746 0.8991  | 0.1561  0.08545 0.9024
    unigram  −mom.point   | 0.4875 0.6033 0.4841 0.8004  | 0.2934 0.1752 0.9007  | 0.1558  0.08525 0.9024
    colloc   all          | 0.5837 0.598  0.5623 0.6384  | 0.4098 0.2702 0.8475  | 0.1671  0.09422 0.738
    colloc   −child.eyes  | 0.5604 0.5746 0.529  0.6286  | 0.39   0.2561 0.8176  | 0.1534  0.08642 0.6829
    colloc   −child.hands | 0.5849 0.6    0.5609 0.6451  | 0.4145 0.273  0.8612  | 0.1662  0.09375 0.7317
    colloc   −mom.eyes    | 0.5709 0.5829 0.5457 0.6255  | 0.4036 0.2655 0.8418  | 0.1662  0.09375 0.7317
    colloc   −mom.hands   | 0.5795 0.5935 0.5571 0.6349  | 0.4038 0.2653 0.8442  | 0.1788  0.1009  0.7805
    colloc   −mom.point   | 0.5851 0.6006 0.5607 0.6467  | 0.4097 0.2685 0.8644  | 0.1742  0.09841 0.7561

Figure 9: Effect of using all but one social cue on the experimental results for the unigram and collocation models. The "importance" of a social cue can be quantified by the degree to which the model's evaluation score degrades when just that social cue is removed from the corpus, relative to its evaluation score when using the corpus with all social cues. The most important social cue is the one which causes performance to degrade the most.

4 Conclusion and future work

This paper presented four different grounded learning models that exploit social cues. These models are all expressed via reductions to grammatical inference problems, so standard "off the shelf" grammatical inference tools can be used to learn them. Here we used the same adaptor grammar software tools to learn all these models, so we can be relatively certain that any differences we observe are due to differences in the models, rather than quirks in the software.

Because the adaptor grammar software performs full Bayesian inference, including for model parameters, an unusual feature of our models is that we did not need to perform any parameter tuning whatsoever. This feature is particularly interesting with respect to the parameters on social cues. Psychological proposals have suggested that children may discover that particular social cues help in establishing reference (Baldwin, 1993; Hollich et al., 2000), but prior modeling work has often assumed that cues, cue weights, or both are prespecified. In contrast, the models described here could in principle discover a wide range of different social conventions.

Our work instantiates the strategy of investigating the structure of children's learning environment using "ideal learner" models. We used our models to investigate scientific questions about the role of social cues in grounded language learning. Because the performance of all four models studied in this paper improves dramatically on all ten evaluation metrics when provided with social cues, this paper provides strong support for the view that social cues are a crucial information source for grounded language learning.

We also showed that the importance of the different social cues in grounded language learning can be evaluated using "add one cue" and "subtract one cue" methodologies. According to both of these, the child.eyes cue is the most important of the five social cues studied here. There are at least two possible reasons for this: the caregiver's topic could be determined by the child's gaze, or the child.eyes cue could be providing our models with information about the topic of the previous utterance.

Incorporating topic continuity and anaphoric dependencies into our models would be likely to improve performance. This improvement might also help us distinguish the two hypotheses about the child.eyes cue. If the child.eyes cue is just providing indirect information about topic continuity, then the importance of the child.eyes cue should decrease when we incorporate topic continuity into our models. But if the child's gaze is in fact determining the care-giver's topic, then child.eyes should remain a strong cue even when anaphoric dependencies and topic continuity are incorporated into our models.

Acknowledgements

This research was supported under the Australian Research Council's Discovery Projects funding scheme (project number DP110102506).

References

Dare A. Baldwin. 1991. Infants' contribution to the achievement of joint reference. Child Development, 62(5):874–890.

Dare A. Baldwin. 1993. Infants' ability to consult the speaker for clues to word reference. Journal of Child Language, 20:395–418.

Benjamin Börschinger, Bevan K. Jones, and Mark Johnson. 2011. Reducing grounded learning tasks to grammatical inference. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1416–1425, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.

M. Carpenter, K. Nagell, M. Tomasello, G. Butterworth, and C. Moore. 1998. Social cognition, joint attention, and communicative competence from 9 to 15 months of age. Monographs of the Society for Research in Child Development.

E.V. Clark. 1987. The principle of contrast: A constraint on language acquisition. Mechanisms of Language Acquisition, 1:33.

Shay B. Cohen, David M. Blei, and Noah A. Smith. 2010. Variational inference for adaptor grammars. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 564–572, Los Angeles, California, June. Association for Computational Linguistics.

Michael Frank, Noah Goodman, and Joshua Tenenbaum. 2008. A Bayesian framework for cross-situational word-learning. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 457–464, Cambridge, MA. MIT Press.

Michael C. Frank, Joshua Tenenbaum, and Anne Fernald. to appear. Social and discourse contributions to the determination of reference in cross-situational word learning. Language, Learning, and Development.

Eric A. Hardisty, Jordan Boyd-Graber, and Philip Resnik. 2010. Modeling perspective using adaptor grammars. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 284–292, Stroudsburg, PA, USA. Association for Computational Linguistics.

G.J. Hollich, K. Hirsh-Pasek, and R. Golinkoff. 2000. Breaking the language barrier: An emergentist coalition model for the origins of word learning. Monographs of the Society for Research in Child Development.

Mark Johnson and Sharon Goldwater. 2009. Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 317–325, Boulder, Colorado, June. Association for Computational Linguistics.

Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007. Adaptor Grammars: A framework for specifying compositional nonparametric Bayesian models. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 641–648. MIT Press, Cambridge, MA.

Mark Johnson, Katherine Demuth, Michael Frank, and Bevan Jones. 2010. Synergies in learning words and their referents. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1018–1026.

Mark Johnson. 2008. Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure. In Proceedings of the 46th Annual Meeting of the Association of Computational Linguistics, pages 398–406, Columbus, Ohio. Association for Computational Linguistics.

Mark Johnson. 2010. PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1148–1157, Uppsala, Sweden, July. Association for Computational Linguistics.

Patricia K. Kuhl, Feng-Ming Tsao, and Huei-Mei Liu. 2003. Foreign-language experience in infancy: Effects of short-term exposure and social interaction on phonetic learning. Proceedings of the National Academy of Sciences USA, 100(15):9096–9101.

Jeffrey Siskind. 1996. A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition, 61(1-2):39–91.

L.B. Smith, S.S. Jones, B. Landau, L. Gershkoff-Stowe, and L. Samuelson. 2002. Object name learning provides on-the-job training for attention. Psychological Science, 13(1):13.

Chen Yu and Dana H. Ballard. 2007. A unified model of early word learning: Integrating statistical and social cues. Neurocomputing, 70(13-15):2149–2165.
