
Ad-hoc scalar implicature in adults and children

Alex Stiller ([email protected]), Symbolic Systems Program, Stanford University
Noah D. Goodman ([email protected]), Department of Psychology, Stanford University
Michael C. Frank ([email protected]), Department of Psychology, Stanford University

Abstract

Linguistic communication relies on pragmatic implicatures such as the inference that if “some students passed the test,” not all did. Yet young children perform poorly on tests of implicature, especially scalar implicatures using “some” and “all,” until quite late in development. We investigate the origins of scalar implicature using tasks in which the scale arises from real-world context rather than conventional contrasts between lexical items. Experiment 1 shows that these ad-hoc implicatures are easy for preschool children, suggesting that children have an early competence at pragmatic inference, and that failures in standard scalar implicature tasks are due instead to problems contrasting lexical items. Experiments 2 and 3 compare a Gricean, counterfactual account of implicature with a linguistic alternatives account and find that neither predicts effects of contextual informativeness. We conclude that an account of pragmatic implicature must integrate world knowledge, linguistic structure, and social reasoning.

Keywords: Scalar implicature; language acquisition.

Figure 1: Example stimuli from our ad-hoc scalar implicature task. The utterance “My friend has glasses” receives different interpretations when the context given to the listener is Row 1 versus Row 2. Each has a similar logical structure to the conventional some-not-all implicature (top).

Introduction

Sometimes the absence of a description says just as much as its presence. A professor who says “some students passed the test” implies that some students failed: if all had passed, a cooperative speaker would have made the stronger statement “all students passed.” Scalar implicature refers to the conversational shorthand of using weak terms to imply the negation of stronger ones that lie along the same “scale.” In this paper we investigate the origins of scalar implicature, and the nature of scales, by examining a spectrum of tasks that are logically equivalent to conventional scalar implicature but in which the scale arises (or fails to arise) from the real-world context rather than the lexical items. We call these ad-hoc implicatures.

Implicatures surface in a variety of contexts beyond the case of quantifiers, including modal operators such as “might” and “must” (Noveck, 2001), inclusive and exclusive disjunction (Braine & Rumain, 1981), and numerals (Barner & Bachrach, 2010). A wide variety of theoretical frameworks have been proposed to explain implicature, with the two most influential being (1) Gricean approaches that we will collectively call the counterfactual theories (Grice, 1975; Levinson, 2000) and (2) views based on grammatically computed linguistic alternatives (Fox, 2007; Chierchia, Fox, & Spector, 2008).

Grice (1975) offers two maxims from which scalar implicatures are meant to follow: make your contribution as informative as is required, and do not make your contribution more informative than is required. From these it follows that any alternative statement which is more informative than the spoken statement must be false, because the speaker could have said that statement had it been true. Under this analysis, the relevant scale arises from the logical structure of the possible statements that could have, counterfactually, been uttered. Although neo-Gricean accounts have modified some parts of this basic inferential mechanism, the general predictions remain (Levinson, 2000). In response to apparent over-prediction of implicatures by the counterfactual theory, the linguistic-alternatives theory claims that implicatures arise by a process in which a statement is strengthened by negating the alternative statements, where the alternatives, and hence scales, derive from the lexical and grammatical structure of language; importantly, more complex statements are not taken to be alternatives (Fox, 2007; Chierchia et al., 2008).

Consider the three situations shown schematically in Figure 1. At the top, the word “all” is logically stronger than the word “some”, though “some” applies whenever “all” does. There is thus a natural scale of informativeness set up by the conventional semantic content of the words. In contrast, the feature words “glasses” and “top hat” have no conventional ordering, but in the context of the three faces in the middle row (“scales” condition), “top hat” is similarly stronger than “glasses”, though “glasses” applies to any object that “top hat” does. If a speaker says “the one with glasses” we may draw the implicature that she means the middle face (an intuition which we test in Experiment 1): the situation itself seems to set up the scale from which we can draw an implicature. However, this intuition is significantly weaker for the bottom row (“no scales” condition), despite an identical logical structure (an intuition we test in Experiment 2).

The acquisition of scalar implicature is late: children fail overwhelmingly at scalar pragmatic tasks where adults succeed (Papafragou & Musolino, 2003). Smith (1980), in one of the earliest acquisitional investigations of scalar pragmatic abilities, found that children with syntactic mastery of quantifiers such as “some” and “none” still failed to make the some/not-all implicature. Noveck (2001) found that 87% of children accepted statements such as “Some elephants have trunks” whereas only 41% of adults did. Huang and Snedeker (2009) replicated these findings, observing that children between ages five and nine construe weak scalar statements logically, while adults interpret such statements pragmatically. We use the ad-hoc implicature setting to ask whether developmental delays in success at scalar implicature tasks reflect an inability to draw pragmatic inferences at all, or an inability to access the scales inherent in the conventional semantics of some words.

Thus, the goal of the current experiments was to explore two related questions. The first was whether children younger than those tested in standard linguistic scalar implicature tasks would be able to succeed in ad-hoc implicatures (Experiment 1). The second was to explore the roots of “scales” in pragmatic inference, and the ability of either simple counterfactual or simple linguistic-alternatives theories to explain scalar implicature in general (Experiments 2 and 3).

Taken together, the results of our studies suggest that the inferential mechanisms underlying scalar implicature may be present earlier in development than previously assumed, and that the scales involved in implicature derive from world-knowledge, such as the base-rates of different properties, rather than from the logical structure of the context or conventional linguistic knowledge alone. Our data rule out the simplest version of both the Gricean counterfactual theory and the linguistic alternatives theory. Instead, they point the way towards an account in which linguistic and social factors are integrated probabilistically with world knowledge.

Experiment 1

Experiment 1 compares the performance of adults and children at an ad-hoc implicature task in which a pragmatic inference derives from an explicit context. We constructed a paradigm in which participants heard a sentence whose literal meaning was ambiguous between two referents. Though no conventional scale existed among the possible descriptions of these objects (e.g. “has glasses”, “has top hat”), they varied along a contextually salient scale (Figure 1, middle). If participants are competent at pragmatic inference, we expect them to succeed on this task, even though they may fail at an equivalent task in which the scale depends on lexical knowledge.

Methods

Participants. Data were collected from 12 3–4 year-olds (M = 3;6) and 12 4–5 year-olds (M = 4;5) at Stanford’s Bing Nursery School. Twenty-four adult participants were recruited via Amazon’s Mechanical Turk web-based crowd-sourcing platform.

Stimuli. We constructed pictorial stimuli for four trials: faces, houses, pasta, and beds. For each inference trial, two properties varied (e.g., presence of glasses and top hat for the “face” trial). In each trial, three objects were presented: one with neither feature (Distractor), one with exactly one feature (One-feature), and one with both features (Two-feature). Positions of the three objects were counterbalanced over six orders. Example stimuli are shown in Figure 1. Two unambiguous filler trials were constructed: in these, participants were asked to pick a car of a particular color and a fruit of a particular type.

Procedures. In each trial (inference and filler), preschoolers were presented with three alternatives shown on laminated cards; adults performed a parallel task, picking alternatives by clicking on corresponding radio buttons in a webpage. The cover story involved a puppet, Furble, who asked the participants to help identify various people and objects. For example, in the “house” trial Furble said “My house has a flower outside. Can you show me my house?” The adult version involved the same script, with a picture of Furble substituting for the puppet. On each experimental trial Furble used a description that could apply to either the One-feature or the Two-feature object (e.g. “My friend has glasses” in Figure 1, middle). Adults were informed that the task was designed for children.

Results and Discussion

All three groups performed above chance, selecting the One-feature item for which a more informative description was not available (rather than the Two-feature item, with an alternative, unique description, or the Distractor). Means and confidence intervals are shown in Figure 2. We analyzed these results using a logistic mixed-effects model (Gelman & Hill, 2007), predicting correct performance as a function of age group, with crossed random effects of participant and item. Coefficient estimates from this model are shown in Table 1. Adults were reliably more accurate than the children, and there was no difference between three- and four-year-olds. Note that children never chose the logically incorrect answer (the Distractor), so we treat chance as 50% rather than 33%.

Previous work has suggested that scalar implicature is difficult for children until quite late in development (Noveck, 2001). Nevertheless three- and four-year-olds performed reliably better than chance in our ad-hoc implicature task, which has a similar logical structure to conventional scalar implicatures (see Figure 1). This result suggests that children’s difficulties in scalar implicature tasks may not be caused by an inability to draw pragmatic inferences, but instead by issues with the particular lexical items used in such tasks. This is consistent with findings that children strengthen their understanding of scalar quantifiers when alternatives are made explicitly available (Barner & Bachrach, 2010; Barner, Bale, & Brooks, 2010).

Though children succeeded at our task, adults still outperformed children, possibly reflecting growing pragmatic competence over time. However, adults’ performance may also reflect explicit strategies: seven (29%) of our responses to the debriefing question “What did you think this study was about?” mentioned scales or the logical/pragmatic dichotomy. Two typical responses were “[the task is to] logically separate items that have more information than given” and “the experiment is to see how literal instructions.”

What leads to a robust implicature among young children who would fail to draw a conventional implicature? In Experiment 2 (and later Experiment 3) we ask whether the scales which lead to implicature in Experiment 1 are given by the immediate context of objects, or involve additional linguistic or world-knowledge.

Figure 2: Mean percent correct performance on inference trials for all three age groups (3 years, 4 years, adult) in Experiment 1. Error bars show 95% confidence intervals created by a subject-wise non-parametric bootstrap. Dashed line represents chance (50%) even though there were three items, because no child ever chose the logically false distractor. All groups are significantly above chance.
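The error bars in Figure 2 are described as 95% confidence intervals from a subject-wise non-parametric bootstrap. The sketch below shows one common way such an interval can be computed (a percentile bootstrap that resamples whole participants); it is not the authors' analysis code. The function name `bootstrap_ci`, the number of resamples, the seed, and the per-child scores are all hypothetical and chosen only for illustration.

```python
import random


def bootstrap_ci(subject_scores, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, resampling whole subjects."""
    rng = random.Random(seed)
    n = len(subject_scores)
    boot_means = []
    for _ in range(n_boot):
        # Draw n subjects with replacement and record the resampled mean.
        resample = [subject_scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(resample) / n)
    boot_means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = boot_means[int(n_boot * alpha / 2)]
    upper = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical data: proportion correct over four inference trials for
# 12 children. These values are invented and are NOT the study's data.
scores = [1.0, 0.75, 0.75, 0.5, 1.0, 0.75, 0.5, 1.0, 0.75, 0.75, 1.0, 0.5]
mean = sum(scores) / len(scores)
lower, upper = bootstrap_ci(scores)
print(f"mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Resampling participants rather than individual trials keeps each participant's trials together, which is what "subject-wise" suggests; comparing the lower bound with 0.5 then gives an informal check against the chance level used in the text.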

Table 1: Coefficient estimates from a mixed logistic regression model predicting performance by age group. In the model, 3–4 year-olds are coded as the intercept; thus the coefficients for 4–5 year-olds and adults can be interpreted as tests of whether there is a significant contrast between groups.

                      Coef.   Std. Error   z      p(>|z|)
Intercept (3–4 yrs)   0.87    0.35         2.50   0.01
4–5 years             0.43    0.52         0.82   0.41
Adults                2.41    0.65         3.73   <0.001

Experiment 2

In Experiment 2 we set up a context which is logically equivalent to the context of Experiment 1, but in which the absence of a feature is replaced with an alternate feature (Figure 1, bottom row).

If the inferential effect found in Experiment 1 is derived entirely by logical alternatives, then we should find the same effect in Experiment 2 as we did in Experiment 1. This would be the prediction from a pure counterfactual theory of pragmatic inference: if a speaker wanted to talk about the face with the top hat and glasses, they would have said “top hat”; that they said “glasses” instead indicates that they must have been talking about the other face. We do not expect to see performance above chance, however, if the effect is at least partially driven by linguistic knowledge (e.g. if one doesn’t consider alternatives which are linguistically more complex, and therefore “does not have X” isn’t an alternative to “has X”).

Methods

Participants. We posted 28 HITs on Amazon’s Mechanical Turk and included 24 responses from participants that gave correct answers on the two filler trials. Participants were paid $0.20 each for completing a HIT.

Stimuli. For each of the four trials (faces, houses, pasta, and beds), four separate stimuli were constructed parallel to those in Experiment 1, with the addition of two distinct positive features to replace the absence of features; see Figure 1, bottom row. The stimulus item that previously had no features thus had two novel features, while the stimulus item that previously had one feature now had the old feature as well as one new one. The target stimulus and the position of the answer choices were counterbalanced over six orders.

Procedures. Question prompts were identical to those in Experiment 1.

Results and Discussion

In this “no scales” condition, adult participants did not draw an implicature, performing at chance. Results are compared with Experiment 1 in Figure 3. We again used mixed logistic regression to compare adults’ performance in Experiment 1 with the performance of the new group in Experiment 2. Coefficient estimates are given in Table 2. The intercept reflects performance in the “no scales” condition (Experiment 2) and did not differ significantly from zero (indicating chance level responding). In contrast, there was a highly significant coefficient for the “scales” condition (Experiment 1).

Figure 3: Mean percent correct performance for Experiment 1 (adults, “scales” condition) and Experiment 2 (“no scales” condition). Error bars show 95% confidence intervals created by a subject-wise non-parametric bootstrap. Dashed line represents chance (50%).

Table 2: Coefficient estimates from a mixed logistic regression predicting judgment performance by adults in Experiments 1 and 2. Experiment 2 (“no scales”) is coded as the intercept, while a coefficient is fit for Experiment 1 (“scales”).

                   Coef.   Std. Error   z       p(>|z|)
Intercept (E2)     -0.38   0.23         -1.69   0.09
Scales (E1)        3.53    0.55         6.37    <0.001

In contrast with Experiment 1, participants in Experiment 2 seemed unaware that the task had any pragmatic content. Answers to the debriefing question ranged from “selecting between two objects of significant similarity” to “It was some kind of personality study.” Many participants simply stated that they had no idea what the study was about.

The dramatic difference in performance between these two experiments suggests that the logical structure of the immediate context is not enough to lead adults to make a pragmatic inference. Thus, these data cast doubt on a pure counterfactual account of this ad-hoc implicature and suggest that additional knowledge either about the world or about language must be brought to bear.

The linguistic alternatives theory fares better in explaining the disparity between Experiments 1 and 2. For the “scales” condition, the statement “my friend has glasses” is strengthened by negating the alternative “my friend has a top hat”; however, the negation of the alternative, “my friend does not have a top hat”, is not included because this is linguistically more complex. For the “no scales” condition both “my friend has a top hat” and “my friend has a baseball cap” are negated, leading to chance performance. In Experiment 3 we test this theory further by manipulating participants’ world knowledge through changes in the distribution of features like top hat and glasses.

If the linguistic alternatives theory is a complete explanation of the difference between Experiments 1 and 2, then it should be impossible to make participants succeed in the “no scales” condition without changing the available descriptions. In contrast, if a scale derives from the informativeness of the possible descriptions, then it should be possible to improve performance by providing additional world-knowledge but not altering descriptions.

The simple model described in Frank, Goodman, Lai, and Tenenbaum (2009) defined informativeness via information theory: an informative expression literally conveys more bits of information about which object is being talked about within a context. Although we do not expect that this simple model will capture all the details of Experiments 2 and 3, it does suggest a manipulation to test the pure linguistic alternatives theory: the rarer a feature is, the more informative it is to note that an object has this feature. This principle explains why we would never pick out a person by saying “my friend has legs” (because everyone has legs, so they don’t bear mention) but might say “my friend has a mohawk” (because mohawks are rare and hence informative).

This informativeness account can also explain the success of participants in reasoning about the ad-hoc scale in Experiment 1. In an influential analysis of the Wason selection task, Oaksford and Chater (1996) argued that people assume features are rare, and that the presence of a feature is more informative than its absence (which explained pervasive reasoning “errors”). The strength of the “scales” inference in Experiment 1 may be due to this rarity assumption setting up a natural informativeness scale.

Experiment 3

Experiment 3 tests the hypothesis that manipulating the informativeness of features will allow participants to succeed in the “no scales” condition. We use a pre-exposure to the distribution of features like top hat and glasses to parametrically vary their rarity. We predict that if informativeness sets up the orderings (and orderings allow implicature), then as rarity increases, implicatures should increase correspondingly. Put another way: if top hats are rare, speakers who want to talk about people with top hats should mention their hats. Conversely, if only the structure of alternative descriptions matters (as in the linguistic alternatives account described above for Experiment 2), then no effect of rarity should be found.

Methods

Participants. We posted 344 total HITs to Amazon’s Mechanical Turk and received 216 responses (24 per condition) in which participants successfully answered the manipulation check trials (described below) and both filler trials. Participants were compensated $0.20 for completing a HIT.

Stimuli. The test stimuli were identical to those in Experiment 2. Familiarization sets of 10 images were constructed, systematically varying the relative frequency of the features. The frequency distribution of objects in the nine conditions is shown at the bottom of Figure 4.
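The rarity manipulation rests on the information-theoretic idea that a rarer feature is more informative to mention. As a rough illustration only, the sketch below computes the standard surprisal, -log2(p), of the “top hat” feature at each familiarization frequency. It assumes the nine conditions correspond to top-hat frequencies of 0.1 through 0.9 (as on the horizontal axis of Figure 4), and the name `surprisal_bits` is hypothetical. This is neither the model of Frank et al. (2009) nor the analysis reported here.

```python
import math


def surprisal_bits(probability):
    """Information (in bits) conveyed by observing an event of this probability."""
    return -math.log2(probability)


# Assumption: nine between-subjects conditions with "top hat" frequencies
# of 0.1 through 0.9, read off the horizontal axis of Figure 4.
top_hat_frequencies = [k / 10 for k in range(1, 10)]

for p in top_hat_frequencies:
    print(f"P(top hat) = {p:.1f} -> {surprisal_bits(p):.2f} bits")
```

Surprisal falls monotonically as the feature becomes more common, which matches the direction of the prediction: the rarer top hats are, the more a speaker gains by mentioning one, and so the more telling it is when they do not.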

Procedures. Experiment 3 was identical to Experiment 2, except that participants viewed a context set that included 10 images immediately before completing each question, which was presented as in Experiments 1 and 2. They were told, for example, “In Furble’s world there are lots of houses. Here’s a picture of the houses.”

Participants were grouped into distinct between-subjects conditions such that each group saw a different distribution of objects in the context set. Immediately below each context set, participants were asked to select the most frequent object in the set. This question served as a manipulation check to ensure that participants attended to this phase.

Results and Discussion

There was a strong relationship between the frequency of the non-target feature (top hat) and the proportion of participants making implicatures. Results are plotted in Figure 4. We analyzed the data with a mixed linear model, reported in Table 3. The results show that the rarer the omitted portion of an implicature is, the more informative is its omission. This finding supports the hypothesis outlined above: the rarer a feature is, the more informative it is and hence the more likely a speaker would be to mention it to pick out a referent. Conversely, when a certain feature is seen as normal, it no longer needs to be included in an informative description. Thus, a failure to mention a feature like “top hat” in a world where top hats are rare strongly implies that the speaker does not want to refer to the face with the top hat. Crucially, this result rules out a simple linguistic alternatives account (as described above): in this experiment, neither the available linguistic descriptions nor their complexity changed as the feature base-rates varied, yet pragmatic inferences varied strikingly.

Participants seemed somewhat aware of the relevant factors underlying Experiment 3, with common debriefing responses taking the form of “you wanted to see if people naturally gravitate to what is most common” and “the object of this study is to determine whether or not the participant understands the significance of statistics, and whether or not their understanding of statistics affects their answers to the questions.” Nevertheless, it did not seem to be the case that participants were applying simple heuristics related to the presence or absence of features or their conjunctions. For example, the condition with no face with both a top hat and glasses together was no different than the one that included one example of this pair (at .1 and .2 on the horizontal axis in Figure 4, respectively), so the absence of the conjunction did not seem to be critical in participants’ responses.

Figure 4: Mean percent correct performance on inference trials for Experiment 3; each point represents a separate condition. The horizontal axis shows the relative frequency of “top hat” (the non-named property) in the context trials; the diagram below shows the number of each object in the familiarization phase. Error bars show 95% confidence intervals by bootstrap; dashed line represents chance.

Times seen in training phase by stimulus (one row per stimulus; the stimulus images are not recoverable), by frequency of “top hat” during training:

Frequency   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
Row 1       8    7    5    4    3    2    1    1    0
Row 2       1    1    2    2    2    2    2    1    1
Row 3       1    1    2    2    2    2    2    1    1
Row 4       0    1    1    2    3    4    5    7    8

Table 3: Mixed linear model fitting observations in Experiment 3 to frequency of properties during training.

                    Coef.   Std. Error   z       p(>|z|)
Intercept           -2.46   0.29         -8.73   <0.001
“Top hat” freq.     -4.42   0.49         -9.07   <0.001

General Discussion

In three experiments we have investigated the nature of scalar implicatures using what we have called ad-hoc scales: scales constructed from contextual, rather than conventional linguistic, factors. In contrast to standard scalar implicatures (which are difficult for children below the age of 5), even three-year-olds were able to use ad-hoc scales to disambiguate the referent of a logically ambiguous expression (Experiment 1). This result suggests that the inferential mechanisms underlying implicature are present in young children, but, congruent with previous work, children have difficulty construing quantifiers as alternatives in a scale (Barner et al., 2010).

Experiments 2 and 3 contrasted two different theories of the nature of implicature: counterfactual and linguistic alternatives theories. The Gricean, counterfactual theory predicted that ad-hoc scalar implicatures of the sort described in Experiment 1 would be possible across a range of contexts, even when a featural contrast replaced a contrast between a feature and nothing. The linguistic alternatives theory predicted that the negation of a feature (“no top hat”) would be more complex than an alternative feature (“baseball cap”) and hence the implicature would be possible in the “scales” condition, but not in the equivalent “no scales” condition. Consistent with the linguistic alternatives theory, participants were at chance in the “no scales” condition (Experiment 2).

Experiment 3 then tested the linguistic alternatives theory more stringently. When participants were trained that certain properties were commonplace and others were informative, they drew strong pragmatic inferences, just as in the “scales” condition. Conversely, when exposed to a distribution in which feature frequencies were flipped, participants made the reverse inferences, as if they had established a scale in the opposite direction. Statistical information about properties closed the gap between the “scales” and “no scales” conditions. These data are inconsistent with a simple linguistic alternatives theory: changing the contextual base rates, without changing the complexity of linguistic alternatives, was enough to invert participants’ judgments.

Taken together, these findings support a statistical linguistic account of scalar pragmatics in our ad-hoc task. Adults and children succeeded at the “scales” condition by relying, we suspect, on the real-world knowledge that possessing a feature (e.g. a top hat) is less common than not possessing that feature. This statistical information about rarity and informativeness exemplifies what Sperber and Wilson (1986) call “shared knowledge” and Clark (1996) calls “common ground.” A theory of implicature must integrate fine-grained statistical information about such shared context to capture the effects reported here.

Pragmatic computations operate over our knowledge about the world, our knowledge of language, and our knowledge of other people. The research reported here takes a first step towards understanding and manipulating the complex dependencies that go into the computation of pragmatic implicatures.

Acknowledgments

Special thanks to David Barner, Irene Heim, Danny Fox, and Allison Kraus for valuable discussion. Thanks to the staff, teachers, and children at Bing Nursery School for help with Experiment 1.

References

Barner, D., & Bachrach, A. (2010). Inference and exact numerical representation in early language development. Cognitive Psychology, 60(1), 40–62.

Barner, D., Bale, A., & Brooks, N. (2010). Quantity implicature and access to scalar alternatives in language acquisition.

Braine, D., & Rumain, B. (1981). Development of comprehension of “or”: Evidence for a sequence of competencies. Journal of Experimental Child Psychology, 31(1), 46–70.

Chierchia, G., Fox, D., & Spector, B. (2008). The grammatical view of scalar implicatures and the relationship between semantics and pragmatics. Handbook of Semantics. Mouton de Gruyter, New York, NY.

Clark, H. (1996). Using language. Computational Linguistics, 23(4).

Fox, D. (2007). Free choice and the theory of scalar implicatures. Presupposition and implicature in compositional semantics, 71–120.

Frank, M., Goodman, N., Lai, P., & Tenenbaum, J. (2009). Informative communication in word production and word learning.

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models (Vol. 625). Cambridge: Cambridge University Press.

Grice, H. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics (Vol. 3). New York: Academic Press.

Huang, Y., & Snedeker, J. (2009). Semantic meaning and pragmatic interpretation in 5-year-olds: Evidence from real-time spoken language comprehension. Developmental Psychology, 45(6), 1723–1729.

Levinson, S. (2000). Presumptive meanings: The theory of generalized conversational implicature. Boston: MIT Press.

Noveck, I. A. (2001). When children are more logical than adults: experimental investigations of scalar implicature. Cognition, 78(2), 165–188.

Oaksford, M., & Chater, N. (1996). Rational explanation of the selection task.

Papafragou, A., & Musolino, J. (2003). Scalar implicatures: experiments at the semantics-pragmatics interface. Cognition, 86(3), 253–282.

Smith, C. L. (1980). Quantifiers and question answering in young children. Journal of Experimental Child Psychology, 30(2), 191–205.

Sperber, D., & Wilson, D. (1986). Relevance: Communication and cognition. Cambridge, Mass.: Harvard University Press.