The -to- mapping in real-time language production: A view from psych verbs
Monica L. Do ([email protected])
Elsi Kaiser ([email protected])
LSA 2019, January 6, 2019

Thanks to: Ana Besserman (USC); USC Russell Endowed Fellowship

How does production work?

• Production is Incremental: Only some parts of our sentences are planned before speaking. The rest is planned on the fly!! [1]

[1] Levelt, 1989

2 How does production work?

Production is Multi-Stage:[1]
• Message Formulation: Pre-linguistic conceptual representations about events are formed
• Linguistic Encoding: Concepts/events become structured linguistic representations
• Phonological Encoding: Representations are outfitted with sound representations
• Articulation: Sound representations are translated for motor systems
(Slide example: "The mailman is chasing the dog.")
[1] Levelt, 1989

3 How does linguistic encoding work?

• Encoding is hierarchical: We do not encode our messages simply following the linear word order of the sentence[1]
• What kind of hierarchical structure do we use to linguistically encode our sentences?
  • Syntactic Structure: Start with the SUBJECT of the sentence
  • Thematic Structure: Start with the AGENT of the sentence
  • BUT: In active sentences, these accounts make the same predictions

[1] Do & Kaiser, 2018, JML; Griffin and Bock, 2000; Lee et al., 2014

4 One Potential Solution: Passives

[Slide shows an excerpt of Griffin & Bock (2000), Psychological Science, Vol. 11, No. 4, p. 275: Method (Participants, Apparatus, Materials, Procedure) and Fig. 1, example active and passive-active picture sets.]

• Griffin and Bock, 2000: Passives separate the syntactic from the thematic hierarchy
  • See-and-describe task
  • Visual world eye-tracking
  • Active: The mailman is chasing the dog. (the mailman = Subject and Agent)
  • Passive: The mailman is being chased by the dog. (the mailman = Subject; the dog = Agent)
• Subjecthood is privileged: Participants look to the subject first, even if it's not the agent.
• BUT, other factors may 'boost' the subjecthood effect
  • Subjects were always human[1]
  • Unclear when the agent of optional by-phrases is planned[2]

[1] Clark & Begun, 1971 [2] Thompson and Lee, 2009

5 The Question

What kind of hierarchical structure do we use to linguistically encode our sentences?

Does encoding begin with the most syntactically prominent element (Subject) or the most thematically prominent element (Agent) of a sentence?

6 Our Solution: Psych(ological State) Verbs

Experiencer-Stimulus: loves, hates, fears, adores, …
Stimulus-Experiencer: amazes, scares, frustrates, confuses, …

7 Our Solution: Psych(ological State) Verbs

• Syntactically, the same surface form:
  • Exp-Stim: Leslie loves Ann.
  • Stim-Exp: Leslie scares Ann.

• Thematically, Experiencers are more prominent than Stimuli in standard thematic hierarchies[1]

[Slide diagram: Thematic Hierarchy, listing Agent, Experiencer, Sources, Goals, Locatives, Instruments, Stimulus, Patient, alongside the Stim-Exp example "Leslie scares Ann." analyzed under both Syntactic Structure and Thematic Structure.]

[1] Grimshaw, 1990; Jackendoff, 1987; Belletti & Rizzi, 1988

8 Why Psych(ological State) Verbs?

1. These verbs are rarely investigated experimentally: We want to extend prior psycholinguistic work beyond the Agt-Pat structure

2. They provide a different way to tap into how linguistic encoding unfolds: We want a minimal contrast that teases apart the most syntactically prominent element (Subject) from the most thematically prominent element (Experiencer) of a sentence.

9 Psych Verbs: Methods & Design

• ‘See-and-Describe’:
  1. Trained on names of characters
  2. See a verb prompt
  3. See a critical image
  4. Participants (n=34) produce a sentence about the image using the verb

• 3 Verb Types × 8 trials each:
  • Experiencer-Stimulus: e.g. loves
  • Stimulus-Experiencer: e.g. scares
  • Agent-Patient: e.g. confronts

• We analyzed (i) speech onset times and (ii) eye-movements to the subject during encoding (400–1000 ms after image onset); a sketch of both measures follows after this list. Example target: “Leslie scares Ann.”

• 3 post-experiment questionnaires: Image clarity, Visual salience, Autism Spectrum Quotient
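
As a minimal sketch of the two measures above (not the analysis code used in this study; the file name, column names, and data layout are hypothetical assumptions for illustration only):

```python
# Minimal sketch of the two dependent measures (hypothetical data layout, not the
# authors' pipeline): (i) speech onset time per verb type and (ii) proportion of
# fixations to the eventual sentence subject in the 400-1000 ms encoding window.
import pandas as pd

# Assumed columns: trial, verb_type, time_ms (relative to image onset),
# region ("subject" / "object" / "other"), speech_onset_ms (one value per trial).
fix = pd.read_csv("fixations.csv")

# (i) Mean speech onset time by verb type (Agt-Pat, Exp-Stim, Stim-Exp).
onsets = fix.groupby(["verb_type", "trial"])["speech_onset_ms"].first()
print(onsets.groupby(level="verb_type").mean())

# (ii) Proportion of looks to the subject character during the encoding window.
window = fix[(fix["time_ms"] >= 400) & (fix["time_ms"] < 1000)].copy()
window["on_subject"] = window["region"].eq("subject")
prop_subj = window.groupby(["verb_type", "trial"])["on_subject"].mean()
print(prop_subj.groupby(level="verb_type").mean())
```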

10 Hypotheses & Predictions

Does encoding begin with the most syntactically prominent or the most thematically prominent element?

Verb types: Agt-Pat (Leslie confronts Ann.), Exp-Stim (Leslie fears Ann.), Stim-Exp (Leslie scares Ann.)

Syntactic account (start with the Subject): Agt-Pat → Subject (Leslie); Exp-Stim → Subject (Leslie); Stim-Exp → Subject (Leslie)
Thematic account (start with the Agt/Exp): Agt-Pat → Subject (Leslie); Exp-Stim → Subject (Leslie); Stim-Exp → Object (Ann)
Multi-Factorial account: Agt-Pat → Subject (Leslie); Exp-Stim → Subject (Leslie); Stim-Exp → Both things important
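
The logic of these predictions can be spelled out in a minimal sketch; the role labels and the prominence ranking below (Agents and Experiencers outranking Stimuli and Patients) are simplified assumptions for illustration, not a full thematic hierarchy.

```python
# Minimal sketch of what each account predicts about where linguistic encoding
# should begin for the three verb types (illustration, not the authors' code).

ROLES = {                        # (subject role, object role)
    "Agt-Pat":  ("Agent",        "Patient"),      # Leslie confronts Ann.
    "Exp-Stim": ("Experiencer",  "Stimulus"),     # Leslie fears Ann.
    "Stim-Exp": ("Stimulus",     "Experiencer"),  # Leslie scares Ann.
}

# Assumed thematic prominence: Agents and Experiencers outrank Stimuli and Patients.
PROMINENCE = {"Agent": 2, "Experiencer": 2, "Stimulus": 1, "Patient": 1}

def predicted_start(verb_type, account):
    subj, obj = ROLES[verb_type]
    if account == "syntactic":            # always begin with the subject
        return "subject"
    subj_wins = PROMINENCE[subj] >= PROMINENCE[obj]
    if account == "thematic":             # begin with the most prominent role
        return "subject" if subj_wins else "object"
    # multifactorial: both hierarchies matter; misalignment predicts competition
    return "subject" if subj_wins else "competition (both important)"

for vt in ROLES:
    print(vt, {acc: predicted_start(vt, acc)
               for acc in ("syntactic", "thematic", "multifactorial")})
```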

11 Psych Verbs: Speech Onset Times

[Bar chart of speech onset times by verb type; significance markers: n.s., *]

People slower to start speaking in Stim-Exp sentences

12 Psych Verbs: Eye-movements 1. Patterns before and after SPEECH ONSET • Not theoretically relevant to hypotheses about linguistic encoding, which happens well before speaking • Just checking: Do eye-movements make sense based on prior work?

2. Patterns immediately after IMAGE APPEARS • This tells us about how linguistic encoding unfolds

13 Psych Verbs: The “sanity check”

[Eye-movement plot time-locked to Speech Onset; conditions: Agt-Pat (Leslie confronts Ann.), Exp-Stim (Leslie loves Ann.), Stim-Exp (Leslie scares Ann.)]

14 Psych Verbs: Eye-movements 1. Patterns before and after SPEECH ONSET • Not theoretically relevant to hypotheses about linguistic encoding, which happens well before speaking • Just checking: Do eye-movements make sense based on prior work?

2. Patterns immediately after IMAGE APPEARS • This tells us about how linguistic encoding unfolds

15 Psych Verbs: Eye-Movements @ Encoding

[Eye-movement plot from image onset, with the Linguistic Encoding window marked; conditions: Agt-Pat (Leslie confronts Ann.), Exp-Stim (Leslie loves Ann.), Stim-Exp (Leslie scares Ann.)]

(1) Competition between the syntactically prominent subject and the thematically prominent experiencer in Stim-Exp conditions. (2) Linguistic encoding is independent of prior stages of planning.


16 Hypotheses & Predictions

Does encoding begin with the most syntactically prominent or the most thematically prominent element?

Verb types: Agt-Pat (Leslie confronts Ann.), Exp-Stim (Leslie fears Ann.), Stim-Exp (Leslie scares Ann.)

Syntactic account (start with the Subject): Agt-Pat → Subject (Leslie); Exp-Stim → Subject (Leslie); Stim-Exp → Subject (Leslie)
Thematic account (start with the Agt/Exp): Agt-Pat → Subject (Leslie); Exp-Stim → Subject (Leslie); Stim-Exp → Object (Ann)
Multi-Factorial account: Agt-Pat → Subject (Leslie); Exp-Stim → Subject (Leslie); Stim-Exp → Both things important (both things have to align!)

17 Psych Verbs: What did we find?

1. Psych verbs, as a class, are not categorically more difficult to plan for production than Agent-Patient verbs. • Exp-Stim and Stim-Exp do not show the same data patterns

2. Linguistic encoding is driven by alignment of syntactic to thematic prominence – not by syntax alone • Slower speech onsets & prolonged competition between looks to subject & object in Stim-Exp conditions

3. Message Conceptualization and Linguistic Encoding are separate processes in production
  • Eye-movements before/during message conceptualization did not predict eye-movements during encoding

But how much of our effects were visually driven?

18 Experiment 2: Are the results of Experiment 1 visually driven (rather than linguistically driven)?

• Goal: Make sure that the results for Stim-Exp verbs in Exp 1 were due to misalignment between syntactic and thematic prominence, not to:
  • Interpretability (codeability) of the images, and/or
  • Wonkiness of some facial expressions

• Prediction: If Exp1 results were visually driven, early eye-movements when there is no linguistic task should be the same as when preparing to produce a sentence

19 Exp2. Non-Linguistic Task: Methods & Design

• ‘Picture Inspection’:
  1. Fixation cross
  2. Participants (n=18) inspected images for a sense of ‘quality’ and ‘content’
  3. No sentence production
• Same images as Exp1
• To ensure participants attended to the images, a rating task was randomly interspersed
• Across verb types, we measured the proportion of looks to the subject character (sketched in code below)
• 2 post-experiment questionnaires: (1) Visual Salience, (2) Autism Spectrum Quotient
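
The comparison behind the visual-account prediction can be sketched as follows; the data layout, file name, and the simple t-test are hypothetical placeholders for illustration, not the analysis actually reported.

```python
# Minimal sketch of the visual-account test (hypothetical data, not the authors'
# analysis): if Exp1's effects were purely visual, early looks to the "subject"
# character should not differ between production (Exp1) and inspection (Exp2).
import pandas as pd
from scipy import stats

# Assumed columns: task ("production" / "inspection"), verb_type,
# prop_subject_400_1000 (proportion of 400-1000 ms fixations on the subject character).
df = pd.read_csv("early_looks_by_task.csv")

for vt, sub in df.groupby("verb_type"):
    prod = sub.loc[sub["task"] == "production", "prop_subject_400_1000"]
    insp = sub.loc[sub["task"] == "inspection", "prop_subject_400_1000"]
    t, p = stats.ttest_ind(prod, insp)      # illustration only; the real analysis
    print(vt, round(t, 2), round(p, 3))     # would use mixed-effects models
```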

20 Picture Inspection: Eye-Movements in Exp2

Evidence against Visual Account: Eye-movements not the same in Linguistic vs Non-Linguistic Task

21 How does linguistic encoding work?

• What kind of hierarchical structure do we use to linguistically encode our sentences?

Syntactic Structure: Start with the SUBJECT of the sentence
Thematic Structure: Start with the AGENT of the sentence
BUT: In active sentences, these accounts make the same predictions

[1] Do & Kaiser, 2018, JML; Griffin and Bock, 2000; Lee et al., 2014

22 How does linguistic encoding work?

• What kind of hierarchical structure do we use to linguistically encode our sentences?

• Exp1: Alignment of syntactic-to-thematic structures (not just one structure) matters: When these hierarchies are not aligned, linguistic encoding is delayed.
  • Only Stim-Exp (mismatched) verbs showed slower speech onsets
  • Only Stim-Exp (mismatched) verbs showed slower preferential fixations to subj/obj

• Exp2: Results are linguistically, not visually, driven
  • Different pattern of eye-movements when people are planning for speech vs. when they are just looking at images

23 Current/Future Directions

1. Thematic Hierarchies: What does it mean to be the most prominent element of an ‘event’?
  • Experiencer-Stimulus Relationship: In the right contexts, can the Stimulus be more prominent than the Experiencer?
  • Source-Goal Asymmetries: Other work has shown a massive goal bias in language.[1] (The butterfly flew from the chair to the lamppost.) In the right contexts, can sources be more prominent than goals?
2. What are the psychological underpinnings of the Thematic Hierarchy?
  • Are linguistic asymmetries homologous to those in non-linguistic cognition?
  • Perception[2], Attention[3]

[1] Lakusta & Landau, 2005, 2012; Papafragou, 2010 [2] Hafri et al., 2013, 2018 [3] Do, Papafragou, Trueswell, & Robinson, in prep

24 References

Bock, J. K., and Warren, R. K. (1985). Conceptual accessibility and syntactic structure in sentence formulation. Cognition 21, 47–67.
Clark, H. H., and Begun, J. S. (1971). The semantics of sentence subjects. Language and Speech 14(1), 34–46.
Do, M. L., and Kaiser, E. M. (2018). Subjecthood and linear order in linguistic encoding: Evidence from the real-time production of wh-questions in English and Mandarin Chinese. Journal of Memory and Language 105, 60–75.
Do, M. L., Papafragou, A., Trueswell, J. C., and Robinson, N. (In prep). Discourse effects on the Source-Goal asymmetry.
Griffin, Z. M., and Bock, K. (2000). What the eyes say about speaking. Psychological Science 11, 274–279.
Grimshaw, J. (1990). Argument structure. Cambridge, MA: MIT Press.
Hafri, A., Papafragou, A., and Trueswell, J. C. (2013). Getting the gist of events: Recognition of two-participant actions from brief displays. Journal of Experimental Psychology: General 142(3), 880–905.
Hafri, A., Trueswell, J. C., and Strickland, B. (2018). Encoding of event roles from visual scenes is rapid, spontaneous, and interacts with higher-level visual processing. Cognition 175, 36–52.
Jackendoff, R. S. (1972). Semantic interpretation in generative grammar. Cambridge, MA: MIT Press.
Jackendoff, R. S. (1987). The status of thematic relations in linguistic theory. Linguistic Inquiry 18(3), 369–411.
Lakusta, L., and Landau, B. (2005). Starting at the end: The importance of goals in spatial language. Cognition 96, 1–33.
Lakusta, L., and Landau, B. (2012). Language and memory for motion events: Origins of the asymmetry between source and goal paths. Cognitive Science 36, 517–544.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Papafragou, A. (2010). Source-goal asymmetries in motion representation: Implications for language production and comprehension. Cognitive Science 34, 1064–1092.
Thompson, C. K., and Lee, M. (2009). Psych verb production and comprehension in agrammatic Broca's aphasia. Journal of Neurolinguistics 22, 354–369.

25 Thank you!! Monica L. Do ([email protected]) & Elsi Kaiser ([email protected])