BEHAVIORAL AND BRAIN SCIENCES (2013), Page 1 of 64 doi:10.1017/S0140525X12001495

An integrated production and comprehension

Martin J. Pickering Department of Psychology, University of Edinburgh, Edinburgh EH8 9JZ, United Kingdom [email protected]

Simon Garrod University of Glasgow, Institute of Neuroscience and Psychology, Glasgow G12 8QT, United Kingdom [email protected]∼simon/

Abstract: Currently, production and comprehension are regarded as quite distinct in accounts of language processing. In rejecting this dichotomy, we instead assert that producing and understanding are interwoven, and that this interweaving is what enables people to predict themselves and each other. We start by noting that production and comprehension are forms of action and action perception. We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. Specifically, we assume that actors construct forward models of their actions before they execute those actions, and that perceivers of others’ actions covertly imitate those actions, then construct forward models of those actions. We use these accounts of action, action perception, and joint action to develop accounts of production, comprehension, and interactive language. Importantly, they incorporate well-defined levels of linguistic representation (such as , , and ). We show (a) how speakers and comprehenders use covert imitation and forward modeling to make predictions at these levels of representation, (b) how they interweave production and comprehension processes, and (c) how they use these predictions to monitor the upcoming . We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal.

Keywords: comprehension; covert imitation; dialogue; forward model; language; prediction; production

1. Introduction This model includes “thick” arrows between message and (linguistic) form, corresponding to production and Current accounts of language processing treat production and comprehension. The production arrows represent the fact comprehension as quite distinct from each other. The split is that production may involve converting one message into clearly reflected in the structure of recent handbooks and form (serial account) or the processor may convert multiple textbooks concerned with the psychology of language (e.g., messages at once, then select one (parallel account). Within Gaskell 2007;Harley2008). This structure does not merely production, the “internal” arrows signify feedback (e.g., reflect organizational convenience but instead treats compre- from phonology to syntax), which occurs in interactive hension and production as two different questions to investi- accounts but not purely feedforward accounts. Note that gate. For example, researchers assume that the processes these arrows are consistent with any type of information involved in comprehending a spoken or written sentence, (linguistic or nonlinguistic) being used during production. such as resolving ambiguity, may be quite distinct from the The arrows play an analogous role within comprehension processes involved in producing a description of a scene. In (e.g., the internal arrows could signify feedback from , the “classic” Lichtheim–Broca–Wernicke semantics to syntax). In contrast, the arrows corresponding model assumes distinct anatomical pathways associated with to sound are “thin” because a single sequence of sounds is production and comprehension, primarily on the basis of sent forward between the speakers. If communication is fi – ’ ’ de cit lesion correlations in aphasia (see Ben Shalom & fully successful, then A s message1=B s message1. Similarly, Poeppel 2008). This target article rejects such a dichotomy. there is a “thin” arrow for thinking because such accounts In its place, we propose that producing and understanding assume that each individual converts a single message are tightly interwoven, and this interweaving underlies (e.g., an understanding of a question, message ) into ’ 1 people s ability to predict themselves and each other. another (e.g., an answer, message2), and the answer does not affect the understanding of the question. The model is split vertically between the processes in 1.1. The traditional independence of production and different individuals, who of course have independent comprehension minds. But it is also split horizontally, because the pro- To see the effects of the split, we need to think about cesses underlying production and comprehension within language use both within and between individuals, in each individual are separated. The traditional model terms of a model of communication (Fig. 1). assumes discrete stages: one in which A is producing and

© Cambridge University Press 2013 0140-525X/13 $40.00 1 Pickering & Garrod: An integrated theory of language production and comprehension B is comprehending an , and one in which B is The horizontal split is also challenged by findings from producing and A is comprehending an utterance. Each isolated instances of comprehension or production. Take speaker constructs a message that is translated into sound picture-word interference, in which participants are told before the addressee responds with a new message. to name a picture (e.g., of a dog) while ignoring a spoken Hence, dialogue is “serial monologue,” in which interlocu- or written distractor word (e.g., Schriefers et al. 1990). At tors alternate between production and comprehension. certain timings, they are faster naming the picture if the In conversation, however, interlocutors’ contributions word is phonologically related to it (dot) than if it is not. often overlap, with the addressee providing verbal or non- The effect cannot be caused by the speaker’s interpreting verbal feedback to the speaker, and the speaker altering dot before producing dog – the meaning of dot is not the her contribution on the basis of this feedback. In fact, such cause of the facilitation. Rather, the participant accesses feedback can dramatically affect both the quality of the phonology during the comprehension of dot, and this speaker’s contribution (e.g., Bavelas et al. 2000)andthe affects the construction of phonology during the pro- addressee’s understanding (Schober & Clark 1989). This of duction of dog. So experiments such as these suggest that course means that both interlocutors must simultaneously production and comprehension are tightly interwoven. produce their own contributions and comprehend the Quite ironically, most psycholinguistic theories attempt to other’s contribution. Clearly, an approach to language pro- explain either production or comprehension, but a great cessing that assumes a temporal separation between pro- many experiments appear to involve both. Single word duction and comprehension cannot explain such behavior. naming is typically used to explain comprehension but Interlocutors are not static, as the traditional model involves production (see Bock 1996). Sentence completion assumes, but are “moving targets” performing a joint is often used to explain production but involves compre- activity (Garrod & Pickering 2009). They do not simply hension (e.g., Bock & Miller 1991). Similarly, the finding transmit messages to each other in turn but rather nego- that word identification can be affected by externally con- tiate the form and meaning of expressions they use by inter- trolled cheek movement (Ito et al. 2009) suggests that pro- weaving their contributions (Clark 1996), as illustrated in duction influences comprehension. (1a–1c), below (from Gregoromichelaki et al. 2011). In In addition, production and comprehension appear to (1b), B begins to ask a question, but A’s interruption (1c) recruit strongly overlapping neural circuits (Scott & Johns- completes the question and answers it. B, therefore, does rude 2003; Wilson et al. 2004). For example, Paus et al. not discretely encode a complete message into sound but, (1996) found activation (dependent on the rate of rather, B and A jointly encode the message across (1b–c). speech) of regions associated with speech perception —– ’ when people whispered but could not hear their own 1a A: I m afraid I burnt the kitchen ceiling speech. Listeners also activate appropriate muscles in the 1b—–B: But have you 1c—–A: burned myself? Fortunately not. tongue and lips while listening to speech but not nonspeech (Fadiga et al. 2002; Watkins et al. 2003). Additionally, increased muscle activity in the lips is associated with increased activity (i.e., blood flow) in Broca’s area, suggesting that this area mediates between the comprehen- MARTIN J. PICKERING is Professor of the Psychology of sion and production systems during speech perception Language and Communication at the University of (Watkins & Paus 2004). There is also activation of brain Edinburgh. He is the author of more than 100 journal areas associated with production during aspects of articles and numerous other publications in language comprehension from phonology (Heim et al. 2003) to nar- production, comprehension, dialogue, reading, and bilingualism. His articles have appeared in Psychological rative structure (Mar 2004); see Scott et al. (2009) and Bulletin; Trends in Cognitive Sciences; Psychological Pulvermüller and Fadiga (2010). Finally, Menenti et al. Science; Cognitive Psychology; Journal of Memory and (2011) found massive overlap between speaking and listen- Language; Cognition; Journal of Experimental Psychol- ing for regions showing functional magnetic resonance ogy: Learning, Memory, and Cognition; and many other imaging (fMRI) adaptation effects associated with repeat- journals. In particular, he published “Toward a mechan- ing language at different linguistic levels (see also Segaert istic psychology of dialogue” (Behavioral and Brain et al. 2012). These results are inconsistent with separation Sciences, 2004) with Simon Garrod. He is a Fellow of of neural pathways for production and comprehension in the Royal Society of Edinburgh and the editor of the classical Lichtheim–Broca–Wernicke neurolinguistic Journal of Memory and Language. model. SIMON GARROD is Professor of Cognitive Psychology at In conclusion, the evidence from dialogue, psycholin- the University of Glasgow. Between 1989 and 1999 he guistics, and cognitive neuroscience all casts doubt on the was also Deputy Director of the ESRC Human Com- independence of production and comprehension, and munication Research Centre. He has published two therefore on the horizontal split assumed in Figure 1. Let books, one with Anthony J. Sanford, Understanding us now address two theoretical issues relating to the aban- written language, and one with Kenny R. Coventry, donment of this split, and then ask what kind of model is Seeing, saying and acting: The psychological semantics compatible with the interweaving of production and of spatial prepositions. Additionally, he has published comprehension. more than 100 research papers on various aspects of the psychology of language. His special interests include discourse processing, language processing in 1.2. Modularity and the cognitive sandwich dialogue, psychological semantics, and graphical com- munication. He is a Fellow of the Royal Society of Much of has sought to test the claim that Edinburgh. language processing is modular (Fodor 1983). Such accounts investigate the way in which information travels

2 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Pickering & Garrod: An integrated theory of language production and comprehension

Figure 1. A traditional model of communication between A and B. (comp: comprehension; prod: production) between the boxes in a model such as in Figure 1. In par- interwoven as well, and so accounts of language processing ticular, the arrows labeled thinking correspond to “central should also reject the cognitive sandwich. processes” and contain representations in some kind of language of thought. Researchers are particularly con- 1.3. Production and comprehension processes cerned with the extent to which thinking arrows are separ- ated from the production and comprehension arrows. How can production and comprehension both be involved Modular theories assume that some aspects of production in isolated speaking or listening? Within the individual, we or comprehension do not make reference to “central pro- mean that production and comprehension processes are cesses” (e.g., Frazier 1987; Levelt et al. 1999). In contrast, interwoven. Production processes must of course be interactionist theories allow “central processes” to directly used when individuals produce language, and comprehen- affect production or comprehension (e.g., Dell 1986; Mac- sion processes must be used when they comprehend Donald et al. 1994; Trueswell et al. 1994). But both types of language. However, production processes must also be theory maintain that production and comprehension are used during, for example, silent naming, when no utterance separated from each other. In this sense, both types of is produced. Silent naming therefore involves some pro- theory are modular and are compatible with Figure 1. duction processes (e.g., those associated with aspects of for- In fact, Hurley (2008a) argued that traditional cognitive mulation such as name retrieval) but not others (e.g., those psychology assumes this type of modularity in order to keep associated with articulation; see Levelt 1989). Likewise, action and perception separate. She referred to this comprehension processes must occur when a participant assumption as the cognitive sandwich. Individuals perceive retrieves the phonology of a masked prime word but not the world, reason about their perceptions using thinking its semantics (e.g., Van den Bussche et al. 2009). And so it (i.e., cognition), and act on the basis of those thoughts. is also possible that production processes are used during Researchers assume that action and perception involve sep- comprehension and comprehension processes used during arate representations and processes and study one or the production. other but not both (and they are kept separate in textbooks How can we distinguish production processes from com- and the like). In Hurley’s terms, the cognitive “meat” keeps prehension processes? For this, we assume that (1) people the motor “bread” separate from the perceptual “bread.”1 represent linguistic information at different levels; (2) these She argued that perception and action are interwoven levels are semantics, syntax, and phonology2; (3) they are and, therefore, rejected the cognitive sandwich. ordered “higher” to “lower,” so that a speaker’s message Importantly, language production is a form of action and is linked to semantics, semantics to syntax, syntax to pho- language comprehension is a form of perception. Therefore, nology, and phonology to speech. We then assume that a traditional psycholinguistics also assumes the cognitive sand- producer goes from message to sound via each of these wich, with the thinking “meat” keepingaparttheproduction levels (message → semantics → syntax → phonology → and comprehension “bread.” But if action and perception sound), and a comprehender goes from sound to message are interwoven, then production and comprehension are in the opposite direction. Given this framework, we

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 3 Pickering & Garrod: An integrated theory of language production and comprehension define a production process as a process that maps from a comprehension are integrated. We assume that instances “higher” to a “lower” linguistic level (e.g., syntax to phonol- of both production and comprehension involve extensive ogy) and a comprehension process as a process that maps use of prediction – determining what you yourself or your from a “lower” to a “higher” level.3 This means that produ- interlocutor is likely to say next. Predicting your own utter- cing utterances must involve production processes, but can ance involves comprehension processes as well as pro- also involve comprehension processes; similarly, compre- duction processes, and predicting another person’s hending utterances must involve comprehension processes, utterance involves production processes as well as compre- but can also involve production processes. hension processes. One possibility is that people have separate production As we have noted, production is a form of action, and and comprehension systems. On this account, producing comprehension is a form of perception. More specifically, utterances may make use of feedback mechanisms that comprehension is a form of action perception – perception are similar in some respects to the mechanisms of compre- of other people performing actions. We first consider the hension, and comprehending utterances may make use of evidence for interweaving in action and action perception, feedback mechanisms that are similar in some respects to and we explain such evidence in terms of prediction. We the mechanisms of production. This is the position assume that actors construct forward models of their assumed by traditional interactive models of production actions before they execute those actions, and that percei- (e.g., Dell 1986) and comprehension (e.g., MacDonald vers of others’ actions construct forward models of others’ et al. 1994). In such accounts, production and comprehen- actions that are based on their own potential actions. sion are internally nonmodular, but are modular with Finally, we apply these accounts to joint action. respect to each other. They do not take advantage of the We then develop these accounts of action, action percep- comprehension system in production or the production tion, and joint action into accounts of production, compre- system in comprehension (even though the other system hension, and dialogue. Unlike many other forms of action is often lying dormant). and perception, language processing is clearly structured, Very little work in comprehension makes reference to incorporating well-defined levels of linguistic represen- production processes, with classic theories of lexical proces- tation such as semantics, syntax, and phonology. Thus, sing (from, e.g., Marslen-Wilson & Welsh 1978 or Swinney our accounts also include such structure. We show how 1979 onward) and sentence processing (e.g., Frazier 1987; speakers and comprehenders predict the content of levels MacDonald et al. 1994) making no reference to production of representation by interweaving production and compre- processes (see Bock 1996 for discussion, and Federmeier hension processes. We then explain a range of behavioral 2007 for an exception). In contrast, some theories of pro- and neuroscientific data on language processing, and duction do incorporate comprehension processes. Most discuss some of the implications of the account. notably, Levelt (1989) assumed that speakers monitor their own speech using comprehension processes. They can hear their own speech (external self-monitoring), in 2. Interweaving in action and action perception which case the speaker comprehends his own utterance just like another person’s utterance; but they can also For perception and action to be interwoven, there must be monitor a sound-based representation (internal self-moni- a direct link between them. If so, there should be much evi- toring), in which comprehension processes are used to dence for effects of perception on action, and there is. In convert sound to message (see sect. 3.1). one study, participants’ arm movements showed more var- In addition, some computationally sophisticated models iance when they observed another person making a differ- can use production and comprehension processes together ent versus the same arm movement (Kilner et al. 2003; see (e.g., Chang et al. 2006), use comprehension to assist in the also Stanley et al. 2007). Conversely, there is good evidence process of learning to speak (Plaut & Kello 1999), or for effects of action on perception. For example, producing assume that comprehension and production use the same hand movements can facilitate the concurrent visual dis- network of nodes and connections so that feedback pro- crimination of deviant hand postures (Miall et al. 2006), cesses during production are the same as feedforward pro- and turning a knob can affect the perceived motion of a cesses during comprehension (MacKay 1982). In addition, perceptually bistable object (Wohlschläger 2000). Such evi- Dell has proposed accounts in which feedback during pro- dence immediately casts doubt on the “sandwich” architec- duction is a component of comprehension (e.g., Dell 1988), ture for perception and action. although he has also queried this claim on the basis of neu- What purpose might such a link serve? First, it could ropsychological evidence (Dell et al. 1997, p. 830); see also facilitate overt imitation, but overt imitation is not the debate between Rapp and Goldrick (2000; Rapp & common in many species (see Prinz 2006). Second, it Goldrick 2004) and Roelofs (2004). could be used postdictively, with action representations But none of these theories incorporate mechanisms of helping perceivers develop a stable memory for a percept sentence comprehension (e.g., parsing or lexical ambiguity or a detailed understanding of it (e.g., via rehearsal), and resolution) into theories of production. We believe that this perceptual representations doing the same for actors. But is a consequence of the traditional separation of production we propose a third alternative: people compute action rep- and comprehension (as represented in Fig. 1). In contrast, resentations during perception and perception represen- we propose that comprehension processes are routinely tations during action to aid prediction of what they are accessed at different stages in production, and that pro- about to perceive or to do, in a way that allows them to duction processes are routinely accessed at different “get ahead of the game” (see Wilson & Knoblich 2005).4 stages in comprehension. To explain this, we turn to the theory of forward modeling, The rest of this target article develops an account of which was first applied to action but has more recently been language processing in which processes of production and applied to action perception. We interpret the theory in a

4 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Pickering & Garrod: An integrated theory of language production and comprehension way that then allows us to extend it to account for language the command). The predicted act then causes the forward processing. perceptual model to construct a predicted percept of the experience of moving the hand. (This percept would not form part of a traditional action plan.) Note that this pre- 2.1. Forward modeling in action dicted percept is compatible with the theory of event To explain forward modeling, we draw on Wolpert’s propo- coding (Hommel et al. 2001), in which actions are rep- sals from computational neuroscience (e.g., Davidson & resented in terms of their predicted perceptual Wolpert 2005; Wolpert 1997), but reframed using psycho- consequences. logical terminology couched in the language of perception Importantly, the efference copy is (in general) processed and action (see Fig. 2). We use the simple example of more quickly than the action command itself (see Davidson moving a hand to a target. The actor formulates the & Wolpert 2005). For example, the command to move the action (motor) command to move the hand. This hand causes the action implementer to activate muscles, command initiates two processes in parallel. First, it which is a comparatively slow process. In contrast, the causes the action implementer to generate the act, which forward action model and the forward perceptual model in turn leads the perceptual implementer to construct a make use of representations of the position of the hand, percept of the experience of moving the hand. In Wolpert’s state of the muscles, and so on (and may involve simplifica- terms, this percept is used as sensory feedback (reafference) tions and approximations). These representations may be in and is partly proprioceptive, but may also be partly visual (if terms of equations (e.g., hand coordinates), and such the agent watches her hand move). equations can (typically) be solved rapidly (e.g., using a Second, it sends an efference copy of the action network that represents relevant aspects of mathematics). command to cause the forward action model to generate So the predicted percept (the predicted sensations of the the predicted act of moving the hand.5 Just as the act hand’s movement and position) is usually “ready” before depends on the application of the action command to the the actual percept. The action then occurs and the pre- current state of the action implementer (e.g., where the dicted percept is compared to the actual percept (the sen- hand is positioned before the command), so the predicted sations of the hand’s actual movement and position). act depends on the application of the efference copy of the Any discrepancy between these two sensations (as deter- action command to the current state of the forward action mined by the comparator) is fed back so that it can modify model (e.g., a model of where the hand is positioned before the next action command accordingly. If the hand is to the

Figure 2. A model of the action system, using a snapshot of executing an act at time t. Boxes refer to processes, and terms not in boxes refer to representations. The action command u(t) (e.g., to move the hand) initiates two processes. First, u(t) feeds into the action (motor) implementer, which outputs an act a(t) (the event of moving the hand). In turn, this act feeds into the perceptual (sensory) implementer, which outputs a percept s(t) (the perception of moving the hand). Second, an efference copy of u(t) feeds into the forward action model, a computational device (distinct from the action implementer) which outputs a predicted act a^(t) (the predicted event of moving the hand); the carat indicates an approximation. In turn, a^(t) feeds into the forward perceptual model, a computational device (distinct from the perceptual implementer) which outputs a predicted percept s^(t) (the predicted perception of moving the hand). The comparator can be used to compare the percept and the predicted percept.

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 5 Pickering & Garrod: An integrated theory of language production and comprehension left of its predicted position, the next action command can (2001) proposed that actors run sets of model pairs in par- move it more to the right. In this way, perceptual processes allel, with each forward model making different predictions have an online effect on action, so that the act can be about how the action might unfold in different contexts. By repeatedly affected by perceptual processes as well as matching actual movements against these different predic- action processes. (Alternatively, the actor can correct the tions, the system can shift responsibility for controlling the forward model rather than the action command, depending action toward the model pair whose forward model predic- on her confidence about the relative accuracy of the action tion best fits that movement. For example, a person starts command and the efference copy.) Such prediction is to pick up a small (and apparently light) object using a necessary because determining the discrepancy on the weak grip but subsequently finds the grip insufficient to basis of reafferent feedback would be far too slow to lift the object. According to MOSAIC, the person would allow corrective movements (see Grush [2004], who then shift the responsibility for controlling the action to a referred to forward models as emulators). new forward-inverse model pairing, which produces a We assume that the central role of forward modeling is stronger grip. perceptual prediction (i.e., predicting the perceptual out- The same principles apply to structured activities that are comes of an action). However, it has other functions. more complex, such as the process of drinking a cup of tea. First, it can be used to help estimate the current state, Here the forward model provides information ahead of given that perception is not entirely accurate. The best esti- time about the sequence and overlap between the different mate of the current position of the hand combines the esti- stages in the process (moving the hand to the cup, picking it mate that comes from the percept and the estimate that up, moving it to the mouth, opening the mouth, etc.) and comes from the predicted percept. For example, a person represents the predicted sensory feedback at each stage can estimate the position of her hand in a dark room by (i.e., the predicted percept). Controlling such complex remembering the action command that underlay her sequences of actions has been implemented by Haruno hand movement to its current location. Second, forward et al. (2003) in their Hierarchical MOSAIC (HMOSAIC) models can cancel the sensory effects of self-motion (reaf- model. HMOSAIC extends MOSAIC by having hierarchi- ference cancellation) when these sensory effects match the cally organized forward-inverse model pairings that link predicted movement. This enables people to differentiate “high level” intentions to “low level” motor operations – in between perceptual effects of their own actions and those our terms, from high-level to low-level action commands. reflecting changes in the world, for example, explaining In conclusion, forward modeling in action allows the why self-applied tickling is not effective (Blakemore et al. actor to predict her upcoming action, in a way that allows 1999). her to modify the unfolding action if it fails to match the A helpful analogy is that of an old-fashioned sailor navi- prediction. In addition, it can be used to facilitate esti- gating across the ocean (cf. Grush 2004). He starts at a mation of the current state, to cancel reafference, and to known position, which he marks on his chart (i.e., model support short- and long-term learning. In doing so, of the ocean), and determines a compass course and forward modeling closely interweaves representations speed. He lays out the corresponding course on the chart associated with action and representations associated with and traces out where he should be at noon (his predicted perception, and can therefore explain effects of perception act, at^ðÞ), and determines what his sextant should read at on action. this time and place (his predicted percept, st^ðÞ). He then sets off through the water until noon (his act, a(t)). At 2.2. Covert imitation and forward modeling in action noon, he uses his sextant to estimate his position from perception the sun (his percept, s(t)), and compares the predicted and observed sextant readings (using the comparator). He When you perceive inanimate objects, you draw on your can then use this in various ways. If he is not confident of perceptual experience of objects. For example, if an his course keeping, he pays more attention to the actual object’s movement is unclear, you can think about how reading; if he is not confident of his sextant reading (e.g., similar objects have appeared to move in the past (e.g., it is misty), he pays more attention to the predicted obeying gravity). When you perceive other people (i.e., reading. If the predicted and actual readings match, he action perception), you can also draw on your perceptual assumes no other force (this is equivalent to reafference experience of other people. We refer to this as the associ- cancellation). But if they do not match and he is confident ation route in action perception. For example, you about both course keeping and sextant reading, he assumes assume someone’s ambiguous arm movement is compati- the existence of another force, in this case the current. ble with your experience of perceiving other people’s arm Forward modeling also plays an important role in motor movements. People can clearly predict each other’s learning (Wolpert 1997). To be able to pick up an object actions using the association route, just as they can you need a model that maps the object’s location onto an predict the movement of physical objects on the basis of action (motor) command to move the hand to that location. past experience (e.g., Freyd & Finke 1984). This is called an inverse model because it represents the However, you can also draw on your experience of your inverse of the forward model. Learning a motor skill own body – you assume that someone’s arm movement is requires learning both an appropriate forward model and compatible with your experience of your own arm move- an appropriate inverse model. ments. We refer to this as the simulation route in action Motor control theories that are more sophisticated use perception. The simplest possibility is that the perceiver linked forward-inverse model pairs to explain how actors determines what she would do under the circumstances. can adapt dynamically to changes in the context of an In the case of hand movement, the perceiver would see unfolding action. In their Modular Selection and Identifi- the start of the actor’s hand movement and would then cation for Control (MOSAIC) account, Haruno et al. determine how she would move if it were her hand,

6 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Pickering & Garrod: An integrated theory of language production and comprehension thereby determining the actor’s intention. Informally, she perceptual model to derive a prediction of her percept of would see the hand and the way it was moving, and then that act). To do this, she would identify the actor’s intention think of it as her own hand and use the mechanisms that from her perception of the previous and current states of she would use to move her own hand to predict her part- his arm (or from background information such as knowl- ner’s hand movement. In other words, she would covertly edge of his state of mind) and use this to generate an effer- imitate her partner’s movements, treating his arm positions ence copy of the intended act. If she determined that the as though they were her own arm positions. However, the actor was about to punch her face, she would have time perceiver cannot simply use the same mechanisms the to move. She can also compare this predicted percept actor would use but must “accommodate” to the differ- with her actual percept of his act when it happens. We illus- ences in their bodies (the context, in motor control trate this account in Figure 3. theory) – for example, applying a smaller force if her body This simulation account uses the mechanisms involved in is lighter weight than her partner’s.6 In any case, her repro- the prediction of action (as illustrated in Fig. 2), but it adds duction is unlikely to be perfect – she is in the position of a a mechanism for covert imitation. This mechanism also character actor attempting to reproduce another person’s allows for overt imitation of the action itself or a continu- mannerisms. ation of that action (the overt imitation and continuation In theory, the perceiver could simulate by using her own are what we call overt responses). In fact, the strong link action implementer (and inhibiting its output). However, between actions and predictions of those actions means this would be too slow – much of the time, she would deter- that perceivers tend to activate their action implementers mine her partner’s action after he had performed that as well as forward action models. Note that Figure 3 action. Instead, she can use her forward action model to ignores the association route to action prediction, which ’ derive a prediction of her partner s act (and the forward uses the percept sB(t) and knowledge about percepts that

Figure 3. A model of the simulation route to prediction in action perception in Person A. Everything above the solid line refers to the ’ ðÞ ’ unfolding action of Person B (who is being observed by A), and we underline B s representations. For instance, aB t can refer to B s initial ðÞþ ’ fi ’ ðÞþ ’ ðÞ hand movement (at time t) and aB t 1 to B s nal hand movement (at time t+1). A predicts B s act aB t 1 given B s act aB t .Todo fi ’ ’ ðÞ this, A rst covertly imitates B s act. This involves perceiving B s act aB t to derive the percept sB (t), and from this using the inverse model and context (e.g., information about differences between A’s body and B’s body) to derive the action command (i.e., the ’ intention) uB(t) that A would use if A were to perform B s act (without context, the inverse model would derive the command that B would use to perform B’s act – but this command is useless to A) and from this the action command that A would use if A were to ’ perform the subsequent part of B s act uB(t+1). A now uses the same forward modeling that she uses when producing an act (see ’ ^ ðÞþ ’ ^ ðÞþ Fig. 2) to produce her prediction of B s act aB t 1 , and her prediction of her perception of B s act sB t 1 . This prediction is ’ ðÞþ ^ ðÞþ ðÞþ generally ready before her perception of B s act sB t 1 . She can then compare sB t 1 and sB t 1 using the comparator. ’ Notice that A can also use the derived action command uB(t) to overtly imitate B s act and the derived action command uB(t+1) to overtly produce the subsequent part of B’s act (see “Overt responses”).

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 7 Pickering & Garrod: An integrated theory of language production and comprehension tend to follow sB(t) to predict the percept of the act. (The 2.3. Joint action perceiver may of course be able to combine the action- People are highly adept at joint activities, such as ballroom based and perceptual predictions into a single prediction.) dancing, playing a duet, or carrying a large object together Additionally, we have glossed over the computationally – (Sebanz et al. 2006a). Clearly, such activities require two complex part of this proposal the mapping from the (or more) agents to coordinate their actions, which in percept sB(t) to the action commands uB(t)anduB(t+1). ’ ’ turn means that they are able to perceive each other s How can the perceiver determine the actor s intention? acts and perform their own acts together. In many of In fact, Wolpert et al. (2003) showed how to do this using these activities, precise timing is crucial, with success HMOSAIC, which can make predictions about how differ- occurring only if each partner applies the right force at ent intentional acts unfold over time. In their account, the the right time in relation to the other. Such success there- perceiver runs parallel, linked forward-inverse model pair- fore requires tight interweaving of perception and action. ings at multiple levels from “low-level” movements to ’ “ ” Moreover, people must predict each other s actions, high-level intentions. By matching actual movements because responding after they perceive actions would against these different predictions, HMOSAIC determines simply be too slow. Clearly, it may also be useful to the likelihood of different possible intentions (and dynami- ’ fi predict one s own actions, and to integrate these predic- cally modi es the space of possible intentions). This in turn tions with predictions of others’ actions. modifies the perceiver’s predictions of the actor’s likely be- fi We therefore propose that people perform joint actions havior. For example, a rst level might determine that a by combining the models of prediction in action and action movement of the shoulder is likely to lead to a movement perception in Figures 2 and 3. Figure 4 shows how A and B of the arm (and would draw on information about the ’ ’ can both predict B s upcoming action (using prediction-by- actor s body shape); a second level might determine simulation). A perceives B’s current act and then uses covert whether such an arm movement is the prelude to a prof- imitation and forward modeling; B formulates his forthcom- fered handshake or a punch (and would draw on infor- ’ ing act and uses forward modeling based on that intention. If mation about the actor s state of mind). At the second successful, A and B should make similar predictions about level, the perceiver runs forward models based on those ’ ’ B s upcoming act, and they can use those predictions to coor- alternative intentions to determine what the actor s hand dinate. Note that they can both compare their predictions is likely to do next. If, for example, I predict you are ’ fi with B s forthcoming act when it takes place. more likely to initiate a handshake but then your st Joint action can involve overt imitation, continuation starts clenching, I modify my interpretation of your inten- of other’s behavior, or complementary behavior. Overt tion and now predict that you will likely throw a punch. fi imitation and continuation follow straightforwardly from At this point, I have determined your intention and con - Figure 3 (see the large arrow leading from “Covert imita- dently predict the upcoming position of your hand, just as tion” to “Overt responses”). There is much evidence that I would do if I were predicting my own hand movements. people overtly imitate each other without intending to or Good evidence that covert imitation plays a role in pre- being aware that they are doing so, from studies involving diction comes from studies showing that appropriate motor-related brain areas can be activated before a per- ceived event occurs (Haueisen & Knösche 2001). Similarly, mirror neurons in monkeys can be activated by perceptual predictions as well as by perceived actions (Umiltá et al. 2001); note there is recent direct evidence for mirror neurons in people (Mukamel et al. 2010).7 Additionally, people are better at predicting a movement trajectory (e.g., in dart-throwing or handwriting) when viewing a video of themselves versus others (Knoblich & Flach 2001; Knoblich et al. 2002). Presumably, prediction-by- simulation is more accurate when the object of the predic- tion is one’s own actions than when it is someone else’s actions. This yoking of action-based and perceptual pro- cesses can therefore explain the experimental evidence for interweaving (e.g., Kilner et al. 2003). Notice that such covert imitation can also drive overt imi- tation. However, the perceiver does not simply copy the actor’s movements, but rather bases her actions on her Figure 4. A and B predicting B’s forthcoming action (with B’s determination of the actor’s intentions. This is apparent processes and representations underlined). B’s action command ðÞ ’ ’ in infants’ imitation of caregivers’ actions (Gergely et al. uB t feeds into B s action implementer and leads to B s act ðÞ ’ ’ 2002) and in the behavior of mirror neurons, which code aB t . A covertly imitates B s act and uses A s forward action ’ for intentional actions (Umiltá et al. 2001). Importantly, model to predict B s forthcoming act (at time t+1). B mirror neurons do not exist merely to facilitate imitation simultaneously generates the next action command (the dotted (because imitation is largely or entirely absent in line indicates that this command is causally linked to the previous action command for B but not A) and uses B’s forward monkeys), and so one of their functions may be to drive action model to predict B’s forthcoming act. If A and B are action prediction via covert imitation (Csibra & Gergely coordinated, then A’s prediction of B’s act and B’s prediction of 2007; Prinz 2006). In conclusion, we propose that action B’s act (in the dotted box) should match. Moreover, they may perception interweaves action-based and perceptual pro- both match B’s forthcoming act at time t+1 (not shown). A and cesses in a way that supports prediction. B also predict A’s forthcoming action (see text).

8 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Pickering & Garrod: An integrated theory of language production and comprehension the imitation of specific movements (e.g., Chartrand & performing a related action than when you are performing Bargh, 1999; Lakin & Chartrand, 2003) or involving the an unrelated action. (Notice also that A and B are likely to synchronization of body posture (e.g., Shockley et al., overtly imitate each other and that such overt imitation will 2003). For example, pairs of participants tend to start make their actions more similar, hence the predictions rocking chairs at the same frequency, even though the easier to integrate.) In conclusion, joint action can be suc- chairs have different natural frequencies (Richardson cessful because the participants are able to integrate their et al., 2007), and crowds come to clap in unison (Neda own action with their perception of their partner’s action. et al., 2000). Such imitation appears to be on a percep- tion-behavior expressway (Dijksterhuis & Bargh, 2001), not mediated by inference or intention. Many of these find- 3. A unified framework for language production ings demonstrate close temporal coordination and appear and comprehension to require prediction (see Sebanz & Knoblich, 2009). For instance, in a joint go/no-go task, Sebanz et al. (2006b) We have noted that language production is a form of action found enhanced N170 event-related potentials (ERPs), and comprehension is a form of action perception; accord- reflecting response inhibition, for the nonresponding ingly, we now apply the above framework to language. This player when it was the partner’s turn to respond. They is of course consistent with the evidence for interweaving fl interpreted this as suggesting that a person suppresses his that we brie y considered in section 1: the tight coupling or her own actions at the point when a partner is about between interlocutors in dialogue, the evidence for to act. In addition, people continue each other’s behavior effects of comprehension processes on acts of production by overtly imitating their predicted behavior (in contrast and vice versa in behavioral experiments, and the overlap to overt imitation of actual behavior). For example, early of brain circuits involved in acts of production and compre- studies showed that some mirror neurons fired when the hension. We now argue that such interweaving occurs pri- monkey observed a matching action (i.e., one that would marily to facilitate prediction, which in turn facilitates cause that neuron to fire if the monkey performed that production and comprehension. fi action) and others fired when it observed a nonmatching We rst propose that speakers use forward production action that could precede the matching action (di Pelle- models of their utterances in the same way that actors grino et al., 1992). use forward action models, by constructing efference Complementary behavior occurs when the co-actors use copies of their predicted utterance and comparing those their same predictions to derive different (but coordinated) copies with the output of the production implementer. ’ behaviors. For example, in ballroom dancing, both A and B We then propose that listeners predict speakers upcoming predict that B will move his foot forward; B will then move utterances by covertly imitating what they have uttered so his foot, and A will plan her complementary action of far, deriving their underlying message, generating effer- moving her foot backward. Graf et al. (2010) reviewed ence copies, and comparing those copies with the actual much evidence for complementary motor involvement in utterances when they occur, just as in our account of action perception (see Häberle et al. 2008; Newman- action perception. Dialogue involves the integration of Norlund et al. 2007; van Schie et al. 2008). the models of the speaker and the listener. These proposals So far, we have described how A and B predict B’s are directly analogous to our proposals for action, action action. To explain joint activity, we first note that A and B perception, and joint action, except that we assume struc- predict A’s action as well (in the same way). They then inte- tured representations of language involving (at least) grate these predictions with their predictions of B’s action. semantics, syntax, and phonology. To do this, they must simultaneously predict their own action and their partner’s action. They can determine 3.1. Forward modeling in language production whether these acts are compatible (essentially asking them- selves, “does my upcoming act fit with your upcoming In acting, the action command drives the action implemen- act?”). If not, they can modify their own upcoming ter to produce an act, which the perceptual implementer actions accordingly (so that such modifications can occur uses to produce a percept of that act (see Fig. 2). But typi- on the basis of comparing predictions alone, without cally, before this process is complete, the efference copy of having to wait for the action). (If I find out that I am the action command drives the forward action model to likely to collide with you, I can move out of the way.) produce a predicted act, which the forward perceptual This account can therefore explain tight coupling of joint model uses to produce a predicted percept. The actor activity, as well as the experience of “shared reality” that can then compare these outputs and adjust the action occurs when A and B realize that they are experiencing command (or the forward model) if they do not match. the world in similar ways (Echterhoff et al. 2009). In language production (see Fig. 5), the action command Importantly, the participants in a joint action perform is specified as a production command. The action implemen- actions that are related to each other. It is of course ter is specified as the production implementer, and the per- easier for A to predict both A and B’s actions if their ceptual implementer is specified as the comprehension actions are closely related (as is the case in tightly implementer. Similarly, the forward action model is speci- coupled activities such as ballroom dancing). If A’s predic- fied as the forward production model, and the forward per- ^ ðÞþ ’ fi tions of her own action (aA t 1 ) and her prediction of B s ceptual model is speci ed as the forward comprehension ^ ðÞþ fi action (aB t 1 ) were unrelated, she would nd both pre- model. The comparison of the utterance percept and the dictions hard; if the predictions are closely related, A is able predicted utterance percept constitutes self-monitoring. to use many of the computations involved in one prediction In Figure 5, the production command constitutes the to support the other prediction. In other words, it is easier message that the speaker wishes to convey (see Levelt to predict another person’s actions when you are 1989) and includes information about communicative

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 9 Pickering & Garrod: An integrated theory of language production and comprehension

Figure 5. A model of production, using a snapshot of speaking at time t. The production command i(t) is used to initiate two processes. First, i(t) feeds into the production implementer, which outputs an utterance p[sem,syn,phon](t), a sequence of sounds that encodes semantics, syntax, and phonology. Notice that t refers to the time of the production command, not the time at which the representations are computed. In turn, the speaker processes this utterance to create an utterance percept, the perception of a sequence of sounds that encodes semantics, syntax, and phonology. Second, an efference copy of i(t) feeds into the forward production model, a computational device which outputs a predicted utterance. This feeds into the forward comprehension model, which outputs a predicted utterance percept (i.e., of the predicted semantics, syntax, and phonology). The monitor can then compare the utterance percept and the predicted utterance percept at one or more linguistic levels (and therefore performs self-monitoring). force (e.g., interrogative), pragmatic context, and a nonlin- Finally, speakers use the comprehension implementer to guistic situation model (e.g., Sanford & Garrod 1981). In construct the utterance percept. Again, we assume that this addition, Figure 5 does not merely differ from Figure 2 system acts on each production representation individually, in terminology, but it also assumes structured linguistic rep- so that p[sem](t) is mapped to c[sem](t), p[syn](t)to resentations, such as p [sem,syn,phon](t) rather than a(t). c[syn](t), and p[phon](t)toc[phon](t); therefore, Figure 5 As we have noted, language processing appears to involve is a simplification in this respect as well. Importantly, the a series of intermediate representations between message speaker constructs her utterance percept for semantics and articulation. So Figure 5 is a simplification: We before syntax before phonology. Unlike Levelt (1989), we assume that speakers construct representations associated therefore assume that the speaker maps between represen- with the semantics, syntax, and phonology of the actual tations associated with production and comprehension at utterance, with the semantics being constructed before all linguistic levels. the syntax, and the syntax before the phonology (in ÂÃThe forwardÂÃ production model constructs p^½ sem ðÞt , accord with all theories of language production, even if p^ syn ðÞt , and p^ phon ðÞt , andÂÃ the forwardÂÃ comprehension they assume some feedback between representations).8 model constructs c^½ sem ðÞt , c^ syn ðÞt , and c^ phon ðÞt . Most We can therefore refer to these individual representations important, these representations are typically ready before as p[sem](t), p[syn](t), and p[phon](t). Note that the map- the representations constructed by the production imple- pings from p[sem](t)top[syn](t) and p[syn](t)top[phon](t) menter and the comprehension implementer. The involve aspects of the production implementer, but speaker can then use the monitor to compare the predicted Figure 5 places the production implementer before a utterance percept with the (actual) utterance percept at single representation p[sem,syn,phon](t) for ease of presen- each level (see Fig. 5) when those actual percepts are tation. Assuming Indefrey and Levelt’s(2004) estimates ready. Thus, the monitor can compare predicted with (based on single-word production), semantics (including actual semantics first, then predicted with actual syntax, message preparation) takes about 175 ms, syntax (lemma then predicted with actual phonology. The production access) takes about 75 ms, and phonology (including sylla- implementer makes occasional errors, and the monitor bification) takes around 205 ms. Phonetic encoding and detects such errors by noting mismatches between articulation takes an additional 145 ms (see Sahin et al. outputs of the production implementer and outputs of 2009, for slightly longer estimates of syntactic and phonolo- the forward model. It may then trigger a correction (but gical processing). does not need to do so). To do this, the monitor must of

10 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Pickering & Garrod: An integrated theory of language production and comprehension course be fairly accurate and use predictions made inde- the implemented semantics and the implemented syntax, pendently of the production implementer itself. and between the implemented syntax and the implemented Let us now consider the content of these predictions and phonology, but we do not assume intervention in the the organization of the forward models in more detail using forward production model. examples. In doing so, we address the obvious criticism that For example, a speaker might decide to describe a transi- if the speaker is computing a forward model, why not just tive event. At this point, she constructs a forward model of use that model in production itself? The answer is that syntax, say [NP [VNP]VP]s, where NP refers to a noun the predictions are not the same as the implemented pro- phrase, V a verb, VP a verb phrase, and S asentence.This duction representations, but are easier-to-compute “impo- forward model appears appropriate if the speaker knows verished” representations. They leave out (or simplify) that transitive events are usually described by transitive con- many components of the implemented representations, structions, a piece of information assumed in construction just as a forward model of predicted hand movements grammar (Goldberg 1995), which associates constructions might encode coordinates but not distance between index with “general” meanings. The speaker can therefore make finger and thumb, or a forward model for navigation this prediction before having decided on other aspects of might include information about the ship’s position and the semantics of the utterance, thus allowing the syntactic perhaps fuel level but not its response to the heavy swell. prediction to be ready before the implemented semantics. Similarly, the forward model does not form part of the At a more abstract level, consider when the speaker production command. The production command incorpor- wishes to refer to something in common ground (but not ates a conceptual representation that describes a situation highly focused). On the basis of extensive experience, she model and communicative force. It cannot represent infor- can predict that the utterance will have the semantics defi- fi – mation such as the rst phoneme of the word the speaker is nite nominal, the syntax [Det N]NP where Det refers to a to use, because such information is phonological, not con- determiner, N a noun, and NP a noun phrase – and the pho- ceptual. In addition, the production command does not nology starting with /ðe/; she may also predict that she will involve perceptual representations (what it “feels like” to start uttering the noun in 200 ms. perform an act), unlike the forward comprehension model. This approach might underlie choice of syntactic struc- Additionally, the forward model represents rather than ture during production. For example, speakers of English instantiates time. For example, a speaker utters The boy favor producing short constituents before long ones (e.g., went outside to fly… and has decided to produce a word Hawkins 1994). To do this, they might start constructing corresponding to a conceptual representation of a kite. At short and long constituents at the same time but tend to this point, she has predicted that the next word will be a produce short ones first because they are ready first (see definite determiner with phonology /ðe/, and that its articu- Ferreira 1996). However, this appears to be inefficient lation should start in 100 ms. (She does not wait 100 ms to because it would lead to sharp increases in processing dif- make this prediction.) She may also have predicted some ficulty at specific points (here, when producing the short aspects of the following word (kite) and that it should phrase), and would therefore work against a preference start in 300 ms. for uniform information density during production But apart from the timing, in what sense is this forward (Jaeger 2010, p. 25). It would mean that the long phrase modelÂÃ impoverished? The phonological prediction would often be ready much too early, and would incorrectly (p^ phon ðÞt ) might indicate (for example) the identities of predict that blend errors should be very common. the phonemes (/k/, /a/, /I/, /t/) and their order, but not Alternatively, the speaker could decide to describe a how they are produced. So when the speaker decides to complex event and a simple event. She uses forward mod- utter kite, she might simply look up the phonemes in a eling to predict that the complex event will require a heavy table and associate them with the numbers 1, 2, 3, and 4. phrase and the simple event a light phrase. She then evokes Importantly, she does not necessarily have the prediction the “short before long” principle, and uses it to convert the of /k/ ready before the prediction of /t/. Alternatively, she simple event into a light phrase (using the production might look up the first phoneme, in which case the implementer). She can then wait till quite near the end forward model would include informationÂÃ about /k/ only. of the phrase before beginning to produce the heavy Similarly, the syntactic prediction (p^ syn ðÞt ) might phrase (again, using the implementer). In this way, she include the grammatical category of noun, but not keeps information density fairly constant, prevents blend- whether the noun is singular or plural (or its gender, in a ing errors, and reduces memory load. gender-marking language). The speaker might simply Just as in action, the speaker “tunes” the forward model look up the information that a flyable object is likely to based on experience speaking. If she has repeatedly formu- be a noun. This information then suggests that the word lated the intention to refer to a kite concept and then should occur at particular positions: for instance, following uttered the phoneme /k/, sheÂÃ will start to construct an a determiner. In addition, it is not necessary that the pre- accurate forward model (p^ phon ðÞ¼t =k=) when she dicted representations are computed sequentially. next decides to refer to such a concept. If she then con- Although the implemented syntax (p[syn](t)) must be structsÂÃ an incorrect phonological representation (e.g., ready before the implemented phonology (p[phon](t)), p phon ðÞ¼t =g=), the monitor will likely immediately the syntactic prediction need not be ready before the pho- notice the mismatch between these two representations. nological prediction. For example, the speaker might If she believes the forward model is accurate, she will predict that the kite concept should have the first detect a , perhaps reformulate, and modify phoneme /k/ and predict that it should be a noun at the her production implementer for subsequent utterances; same time, or indeed predict the first phoneme without if she believes that it may not be accurate, she will not making any syntactic prediction at all. In summary, we reformulate but will alter her forward model accordingly assume that the production system “intervenes” between (cf. Wolpert et al. 2001).

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 11 Pickering & Garrod: An integrated theory of language production and comprehension Evidence from . There is good evi- is in fact due to feedback in the production system; Dell dence for use of forward perceptual models during 1986). The accounts do not, however, deny the existence speech production. In a magnetoencephalography (MEG) of a comprehension-based monitor. study, Heinks-Maldonado et al. (2006) found that the However, alternative accounts have assumed that at least M100 was reduced when people spoke and concurrently some monitoring can be “internal” to language production listened to their own unaltered speech versus a pitch- (e.g., Laver 1980;Schlencketal.1987;VanWijk& shifted distortion of the speech. We assumeÂÃ that they con- Kempen 1987;seePostma2000). Such monitoring could struct a predicted phonological percept, c^ phon ðÞt .This involve the comparison of different aspects of implemented typically matches their phonological percept (c[phon](t)) production – for example, if the process is redundantly orga- and thus suppresses the M100 (i.e., via reafference cancel- nized and a problem is noted if the outputs do not match lation). But when the actual speech is distorted, the percept (see Schlenck et al. 1987). Alternatively, it could register a and the predicted percept do not match, and thus the problem if there is high conflict between potential words M100 is enhanced. (The M100 could not reflect distorted or phonemes (Nozari et al. 2011). Our account makes the speech itself as it was not enhanced when distorted rather different claim that the monitor compares the speech was replayed to the speakers.) The rapidity of the output of implemented production (the utterance percept) effect suggests that speakers could not be comprehending with the output of the forward model (the predicted utter- what they heard and comparing this to their memory of ance percept).9 their planned utterance. Additionally, Tian and Poeppel Of course, speakers clearly can perform comprehension- (2010) had participants produce or imagine producing a syl- based monitoring using the external loop and indeed may lable, and found the same rapid MEG response in auditory be able to perform it using the internal loop as well. But a cortex in both conditions. This suggests that speakers con- purely comprehension-based account cannot explain the struct a forward model incorporating phonological infor- data from Heinks-Maldonado et al. (2006) and Tourville mation under conditions when they do not speak (i.e., do et al. (2008). In addition, such an account has difficulty not use the production implementer). explaining the timing of error detection. To correct to Tourville et al. (2008) had participants read aloud mono- the ye– to the orange node, the speaker prepares for syllabic words while recording fMRI. On a small proportion p[phon](t)foryellow,convertsitintoc[phon](t), uses com- of trials, participants’ auditory feedback was distorted by prehension to construct c[sem](t), judges that c[sem](t)is shifting the first formant either up or down. Participants not appropriate (i.e., it is incompatible with p[sem](t)orit compensated by shifting their speech in the opposite direc- does not make sense in the context), and manages to stop tion within 100 ms. Such rapid compensation is a hallmark speaking, before she articulates more than ye–. Given Inde- of feedforward (predictive) monitoring (as correction fol- frey and Levelt’s(2004) estimates, the speaker has about lowing feedback would be too slow). Moreover, the fMRI 145 ms plus the time to utter ye–, which is arguably less results identified a network of neurons coding mismatches than the time it takes to comprehend a word (e.g., Levelt between expected and actual auditory signals. These three 1989). Speakers might therefore make use of a “buffer” to studies therefore provide clear evidence for forward store intermediate representations and delay phonetic encod- models in speech production. In fact, Tourville and ing and articulation (e.g., Blackmer & Mitton 1991), but this Guenther (2011) described a specific implementation of is unlikely given that they speed up the process of monitoring such forward-model-based monitoring in the context of andrepairwhenspeakingfaster(seePostma2000). their Directions into Velocities of Articulators (DIVA) Such findings appear incompatible with a purely com- and Gradient Order DIVA (GODIVA) models of speech prehension-based approach to monitoring.10 In addition, production. However, these data and implementations do Nozari et al. (2011) argued that nonspeakers may be able not relate to the full set of stages involved in language to use the internal loop (as in Wheeldon & Levelt 1995), production. but that speakers would face the extreme complexity of Language production and self-monitoring. In psycholin- simultaneously comprehending different parts of an utter- guistics, well-established accounts of language production ance with the internal and the external loops (see also Vig- (e.g., Bock & Levelt 1994; Dell 1986; Garrett 1980; Hart- liocco & Hartsuiker 2002). They also noted that there is suiker & Kolk, 2001; Levelt 1989; Levelt et al. 1999) make much evidence for a dissociation between comprehension no reference to forward modeling, and instead debate the and self-monitoring in aphasic patients. operations of the production implementer (see top line in Huettig and Hartsuiker (2010) monitored speakers’ eye Fig. 4). They tend to assume that self-monitoring uses movements while speakers referred to one of four objects the comprehension system. Levelt (1989) proposed that in an array. The array contained an object whose name was people can monitor what they utter (using an external phonologically related to the name of the target object. In loop) and thus repair errors. But he noted that they also comprehension experiments, people tend to look at such make many repairs before completing the word, as in to phonological competitors more than unrelated objects (Allo- the ye– to the orange node, where it is clear that they pena et al. 1998). Huettig and Hartsuiker found that their were going to utter yellow (Levelt 1983), and show speakers also tended to look at competitors after they had arousal when they are about to utter a taboo word but do produced the target word. This suggests that they monitored not do so (Motley et al. 1975). Levelt therefore proposed their speech using the comprehension system. They did not, that speakers construct a sound-based representation (orig- however, look at competitors while producing the target inally phonetic, but phonological in Wheeldon & Levelt word, which suggests that they did not use a comprehen- 1995) and input that representation directly into the com- sion-based monitor of a phonological representation. prehension system (using an internal loop). Note that Huettig and Hartsuiker’s findings therefore imply that speak- other accounts have assumed monitoring that is more ers first monitor using a forward model (as we propose) but limited (e.g., suggesting that some evidence for monitoring can later perform comprehension-based monitoring.

12 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Pickering & Garrod: An integrated theory of language production and comprehension

Accounts using an internal loop imply that phonological utterance at time t is the semantics, syntax, and phonology errors should be detected before semantic errors (assuming of I want to go out and fly the. To predict the situation at that both forms of detection are equally difficult). In con- time t+1, A covertly imitates B’s production of I want to trast, our account claims that speakers construct the pre- go out and fly the, and derives the production command dicted semantic, syntactic, and phonological percepts that A would use to produce this utterance. A then early. Speakers then construct the semantic percept and derives the production command that A would use to compare it with the predicted semantic percept; then produce the word that B would likely say (kite) and runs they construct the syntactic percept and compare it with his forward models to derive his predicted utterance the predicted syntactic percept; finally, speakers construct percept. If A feels sufficiently certain of what B is likely the phonological percept and compare it with the predicted to say, A can act on this prediction – for example, looking phonological percept. Thus, they should detect semantic for a kite before B actually says kite. In addition, A can errors before syntactic errors, and detect syntactic errors compare his prediction of what B will say with what B actu- before phonological errors.11,12 ally says using the monitor. In this case, A has no access to B’s representations during production, and therefore derives the utterance percept from B’s actual utterance. 3.2. Covert imitation and forward modeling in language This means that A will access B’s phonology before B’s comprehension semantics. In this respect, other-monitoring is different We now propose an account of prediction during language from self-monitoring. comprehension that incorporates the account of prediction Importantly, A derives the production command of what during language production (see Fig. 5) in the same way A assumes B is likely to say (i.e., kite), rather than what A that the account of prediction during action perception himself would be likely to say (i.e., airplane). This is the (see Fig. 3) incorporates the account of prediction during effect of using context together with the inverse model. It action (see Fig. 2). This account of prediction during is consistent with the finding that comprehenders often language comprehension assumes that people make use pay attention to the speaker’s state of knowledge (e.g., of their ability to predict aspects of their own utterances Hanna et al. 2003; Metzing & Brennan 2003). However, to predict other people’s utterances. Of course, language comprehenders also show some “egocentric biases” (e.g., comprehension involves structured linguistic represen- Keysar et al. 2000), a finding which is expected given that tations (semantics, syntax, and phonology), and different the comprehender’s use of context cannot be perfect. predictions can be made at different levels. Hence predic- Note also that predictions are driven by the forward pro- tion is very powerful, because it is often the case that duction model, not by the production system itself. The language is highly predictable at one linguistic level at production system would normally be too slow, given that least. An upcoming content word is sometimes predictable. the speaker should be at least as aware of what she is Often, a syntactic category can be predicted when the word trying to say as the listener is. Use of the forward model itself cannot. On other occasions, the upcoming phoneme also tends to cause some co-activation of the production is predictable. We propose that comprehenders make system (as is typically the case when forward models are whatever linguistic predictions they can. constructed). Such activation is not central to prediction- We assume that people can predict language using the by-simulation, but can lead to interference between pro- association route and the simulation route. The association duction and comprehension, and serves as the basis for route is based on experience in comprehending others’ overt imitation (see “Overt responses” in Fig. 6). utterances. A comparable mechanism could be used to Note that Glenberg and Gallese (2012) recently pro- predict upcoming natural sounds (e.g., of a wave crashing posed an Action Based Language (ABL) model of acqui- against rocks). The simulation route is based on experience sition and comprehension that also uses paired inverse producing utterances. As in action perception, the simplest and forward models as in MOSAIC. The primary goal of possibility is that the comprehender works out what he ABL is to account for the content (rather than form) of would say under the circumstances more quickly than the language understanding, with language comprehension producer speaks, using a forward model. But just as with leading to the activation of action-based (embodied) rep- action perception, he needs to be able to represent what resentations. To do this, they specifically draw on evidence the speaker would say, not what he himself would say, from mirror-neuron systems (see sect. 4). and to do this, he needs to take into account the context. To assess our account, we discuss the evidence that com- We illustrate the model in Figure 6, in which the compre- prehenders make predictions, that they covertly imitate hender A covertly imitates B’s unfolding utterance (at what they hear, and that covert imitation leads to prediction time t) and uses forward modeling to derive the predicted that facilitates comprehension. utterance percept, which can then be compared with A’s percept of B’s actual utterance (at time t+1). Note that this account differs from Pickering and Garrod 3.2.1. Evidence for prediction. A great deal of evidence (2007), in which the comprehender simply predicts what shows that people predict other people’s language (see he would say (and where these representations are not Kutas et al. 2011 and Pickering & Garrod 2007, for impoverished). Other-monitoring can take place at differ- reviews). This evidence is compatible with probabilistic ent linguistic levels, just like self-monitoring. models of language comprehension (e.g., Hale 2006; We now illustrate this account using a situation in which Levy 2008), models of complexity that incorporate predic- A (a boy) and B (a girl) have been given presents of an air- tion (Gibson 1998), and accounts based on simple recur- plane and a kite respectively. B utters I want to go out and rent networks (Elman 1990; see also Altmann & Mirkovic fly the. It is of course highly likely that B will say kite, which 2009). But much of the evidence also provides support has p[sem,syn,phon]B (t+1) = [KITE, noun, /kaIt/]. The for aspects of the account in Figure 6.

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 13 Pickering & Garrod: An integrated theory of language production and comprehension

Figure 6. A model of the simulation route to prediction during comprehension in Person A. Everything above the solid line refers to B’s ’ unfolding utterance (and is underlined). A predicts B s utterance p[sem,syn,phon]B (t+1) (i.e., its upcoming semantics, syntax, and phonology) given B’s utterance (i.e., at the present time t). To do this, A first covertly imitates B’s utterance. This involves deriving a representation of the utterance percept, and then using the inverse model and context (e.g., information about differences between ’ ’ ’ A s speech system and B s speech system) to derive the production command iB(t) that A would use if A were to produce B s ’ utterance and from this the production command iB(t+1) associated with the next part of B s utterance (e.g., phoneme or word). A now uses the same forward modeling as she does when producing an utterance (see Fig. 4) to produce her predictions of B’s utterance and of B’s utterance percept (at different linguistic levels). These predictions are typically ready before her comprehension of B’s utterance (the utterance percept). She can then compare the utterance percept and the predicted utterance percept at different linguistic levels (and therefore performs other-monitoring). Notice that A can also use the derived production command ’ iB(t) to overtly imitate B s utterance and the derived production command iB(t+1) to overtly produce the subsequent part of B’s utterance (see “Overt responses”).

First, prediction occurs at different linguistic levels. In fact, either makes the sentence more predictable by Some research shows prediction of phonology (or associ- ruling out an analysis in which or starts a new clause. Simi- ated visual or orthographic information). Delong et al. larly, early syntactic anomaly effects in the ERP record are (2005) recorded ERPs while participants read sentences affected by whether the linguistic context predicts a par- such as The day was breezy so the boy went outside to ticular syntactic category for the upcoming word or fly… They showed an N400 effect when the sentence whether the linguistic context is compatible with different ended with the less predictable an airplane than the syntactic categories (Lau et al. 2006), and reading times more predictable a kite. The striking finding was that this are affected by predicted syntactic structure associated effect occurred at a or an. It could not relate to ease of inte- with ellipsis (Yoshida et al. 2013). gration but must have involved prediction of the word and Clear evidence for semantic prediction comes from eye- its phonological form (i.e., that it began with a consonant). tracking studies in which participants listened to sentences Vissers et al. (2006) found evidence of disruption when a while viewing arrays of objects or depictions of events. They highly predictable word was misspelled, presumably started looking at edible objects more than at inedible because it clashed with the predicted orthographic rep- objects while hearing the man ate the (but not when ate resentation of the correct word. was replaced with moved; Altmann & Kamide 1999). Other experiments show prediction of syntax. Van These predictive eye movements do not just depend on Berkum et al. (2005) found disruption when Dutch the meaning (or lexical associates) of the verb, but are readers and listeners encountered an adjective that did affected by properties of the prior context (Kaiser & Trues- not agree in with an upcoming, well 2004; Kamide et al. 2003) or other linguistic infor- highly predictable noun. Staub and Clifton (2006) found mation such as (Weber et al. 2006). People also that people read or the subway faster after The team took predict the upcoming event as well as the upcoming refer- either the train …than after The team took the train … ent (Knoeferle et al. 2005).

14 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Pickering & Garrod: An integrated theory of language production and comprehension

Some of these studies do not clearly demonstrate that contact. Furthermore, this effect only occurred with incon- the predictions are used more rapidly than would be poss- gruent distracters and not with distinct but congruent dis- ible with the production implementer. The eye-tracking tracters (e.g., /g/-initial distracters when producing /k/). studies reveal faster predictions, but they may show predic- These results suggest that perceiving speech results in tion of semantics (e.g., edible things) rather than a word selective, covert, and automatic activation of the speech (e.g., cake). However, recent MEG evidence shows sensi- articulators. Note that these findings show activation of tivity to syntactic manipulations in little over 100 ms, in the production implementer (not a forward model). visual cortex (Dikker et al. 2009; 2010). For example, the There is also much evidence for both overt imitation and M100 was affected by predictability when the upcoming overt completion. Speakers tend to imitate the speech of word looked like a typical noun (e.g., soda) but not when other people after they have comprehended it (see Picker- it did not (e.g., infant). Presumably, these results cannot ing & Garrod 2004), and to repeat each other’s choice of be due to integration, because activation of the grammati- words and semantics (Garrod & Anderson 1987), syntax cal category of this word (as part of the process of lexical (Branigan et al. 2000), and sound (Pardo 2006). Such imi- access) could not occur so rapidly or in an area associated tation can be rapid and apparently automatic; for instance, with visual form. Instead, the comprehender must predict speakers are almost as quick imitating a phoneme as they both syntactic categories and the form most likely associ- are making a simple response to it (Fowler et al. 2003). ated with those categories, then match those predictions Speakers also tend to complete others’ utterances. For against the upcoming word. Given that syntactic processing example, Wright and Garrett (1984; see also Peterson does not take place in the visual cortex (or indeed so et al. 2001) found that participants were faster at naming quickly), these results reflect the visual correlates of syntac- a word that was syntactically congruent with prior context tic predictions. They suggest that the comprehender con- than a word that was incongruent (even though neither structs a forward model of visual properties (presumably word was semantically appropriate). Moreover, people reg- closely linked to phonological properties) on the basis of ularly complete each other’s utterances during dialogue sentence context and can compare these predicted visual (e.g., 1a-c presented in sect. 1.1); see, for example, Clark properties with the input within around 100 ms. and Wilkes-Gibbs (1986). Rapid overt imitation and overt Dikker and Pylkkänen (2011) found evidence for form completion are of course compatible with prior covert imi- prediction on the basis of semantics. Participants saw a tation (see “Overt responses” in Fig. 6). picture followed by a noun phrase that matched (or mis- matched) the specific item in the picture (e.g., an apple) 3.2.3. Evidence that covert imitation facilitates compre- or the semantic field (e.g., a collection of food). They hension via prediction. The previous sections presented found an M100 effect in visual cortex associated with evidence that comprehenders make rapid predictions and matching the specific item but not the semantic field, that they covertly imitate what they hear. But are imitating suggesting that participants predicted the form of the and predicting causally linked in the way suggested in specific word. Figure 6? The evidence for prediction could involve the Kim and Lai (2012) conducted a similar study to Vissers association route. In addition, covert imitation of language et al. (2006) and found a P130 effect for contextually sup- could be used postdictively, to facilitate memory (as a com- ported pseudowords (e.g., … bake a ceke) but not for non- ponent of rehearsal) or to assist when comprehension leads supported pseudowords (e.g., bake a tont). In contrast, an to incomplete analyses or fails to resolve an ambiguity (see N170 effect occurred for nonsupported pseudowords Garrett 2000). (and nonwords). The N170 may relate to lexical access, Recent evidence, however, suggests that covert imitation but the P130 occurs before lexical access can have occurred drives predictions that facilitate comprehension. Adank and and again appears to reflect a forward model, in which the Devlin (2010) used fMRI to show that during adaptation to comprehender predicts the form of the word (cake) and time-compressed speech there was increased activation in matches the input to that form.13 In conclusion, these the left ventral premotor cortex, an area concerned with four studies support forward modeling, but they do not dis- planning articulation. This suggests that participants cov- criminate between prediction-by-simulation and predic- ertly imitated the compressed speech as part of the adap- tion-by-association. tation process that facilitates comprehension. Adank et al. (2010) found that overt imitation of sentences in an unfami- 3.2.2. Evidence for covert imitation. Much evidence liar accent facilitated comprehension of subsequent sen- suggests that comprehenders activate mechanisms associ- tences in that accent, in the context of noise. This ated with aspects of language production. As we have suggests that overt imitation adapts the production system noted, there appear to be integrated circuits associated to an unfamiliar accent and therefore that the production with production and comprehension (Pulvermüller & system plays an immediate causal role in comprehension. Fadiga 2010). For example, the lateral part of the precen- Ito et al. (2009) manipulated listeners’ cheeks as they tral cortex is active when listening to /p/ and producing /p/, heard words on a continuum between had and head. whereas the inferior precentral area is active when listening When the skin of the cheek was stretched upward, listeners to /t/ and producing /t/ (Pulvermüller et al. 2006; see also reported hearing head in preference to had; when the skin Vigneau et al. 2006; Wilson et al. 2004). We have also was stretched downward, they reported hearing had in pre- noted that tongue and lip muscles are activated during lis- ference to head. Because production of had requires an tening to speech but not other sounds (Fadiga et al. 2002; upward stretch of cheek skin and production of head adown- Watkins et al. 2003). More specifically, Yuen et al. (2010) ward stretch, the results suggest that proprioceptive feed- found that listening to incongruent /t/-initial distracters back from the articulators causally affected comprehension leaves articulatory traces on simultaneous production of (see also Sams et al. 2005). These results could conceivably /k/or/s/ phonemes, in the form of increased alveolar be postdictive, perhaps relating to reconstruction occurring

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 15 Pickering & Garrod: An integrated theory of language production and comprehension during self-report. Clearer evidence comes from Möttö- Huettig et al. 2011). For example, Huettig and McQueen nen and Watkins (2009), who used repetitive transcranial (2007) had participants listen to a sentence and found that magnetic stimulation (rTMS) to temporarily disrupt they looked at a picture whose name was phonologically specific articulator representations during speech percep- related to a target word (cf. Allopena et al. 1998)when tion. Disrupting lip representations in left primary motor they viewed the pictures for 2–3 s before hearing the target cortex impaired categorical perception of speech sounds word but not when they viewed the pictures for 200 ms. In involving the lips (e.g., /ba/-/da/), but not the perception the former case, they presumably had enough time to of sounds involving other articulators (e.g., /ka/-/ga/). Fur- access the phonological form of the name of the picture. thermore, D’Ausilio et al. (2009) found that double-pulse These studies therefore show that the results of covert imi- TMS administered to the part of the motor cortex control- tation have immediate effects on comprehension as a result ling lip movements speeded up and increased accuracy of of prediction. Moreover, we have shown that covert imitation responses to lip-articulated phonemes, whereas TMS and prediction take place at many linguistic levels. Together, administered to the part of the motor cortex controlling all of these findings provide support for the model of predic- tongue movements speeded up and increased accuracy tion-by-simulation in Figure 6. Of course, comprehenders of responses to tongue-articulated phonemes. More may also perform prediction-by-association, just as they can recently, D’Ausilio et al. (2011) had participants repeat- for predicting nonlinguistic events. edly hear a pseudoword (e.g., birro)andusedTMSto reveal immediate appropriate articulatory activation 3.3. Interactive language (associated with rr)iftheyheardthefirst part of the same word (bi, when co-articulated with rro)thanif Interactive conversation is a highly successful form of joint they heard the firstpartofadifferentword(bi,when activity. It appears to be very complex, with interlocutors co-articulated with ffo). Thus, covert imitation facilitates having to switch between production and comprehension, speech recognition as it occurs and before it occurs. perform both acts at once, and develop their plans on the A different type of evidence comes from Stephens et al. fly (Garrod & Pickering 2004). Just as we explained joint (2010), who correlated cortical blood-oxygen-level-depen- actions by combining the accounts of action and action dent (BOLD) signal changes between speakers and listen- perception (see Fig. 4), so we explain conversation by com- ers during the course of a narrative. There was aligned bining the accounts of language production and compre- neural activation in many cortical areas at different lags. hension (as in Figs. 5 and 6). Sometimes the speaker’s neural activity preceded that of Figure 7 shows how both A and B can predict B’s the listener, but sometimes the listener’s activity preceded upcoming utterance (using prediction-by-simulation). A that of the speaker. Importantly, listeners whose activity comprehends B’s current utterance and then uses covert preceded that of the speaker showed better comprehen- imitation and forward modeling; B formulates his forth- sion, suggesting that covert imitation led to prediction coming production command and uses forward modeling and that this prediction facilitated comprehension. based on that command. If A and B are successful, they Finally, speakers may use the production system to should make similar predictions about B’s upcoming utter- predict upcoming words (and events) in relation to ance, and they can use those predictions to coordinate (i.e., scenes. In “visual world” experiments, participants activate have a well-organized conversation). Note that they can the phonology associated with the names of the objects (see both compare their predictions with B’s forthcoming

Figure 7. A and B predicting B’s forthcoming utterance (with B’s processes and representations underlined). B’s production command ’ ’ ’ ’ iB(t) feeds into B s production implementer and leads to B s utterance p[sem,syn,phon]B (t). A covertly imitates B s utterance and uses A s forward production model to predict B’s forthcoming utterance (at time t+1). B simultaneously constructs the next production command (the dotted line indicates that this command is causally linked to the previous action command for B but not A) and uses B’s forward production model to predict B’s forthcoming utterance. If A and B are coordinated, then A’s prediction of B’s utterance and B’s prediction of B’s utterance (in the dotted box) should match. Moreover, they may both match B’s forthcoming (actual) utterance at time t+1 (not shown).

16 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Pickering & Garrod: An integrated theory of language production and comprehension utterance when produced, with A using other-monitoring B would be likely to say if B spoke at that point. In other and B using self-monitoring. In addition, A and B can words, B’s prediction of B’s completion becomes a good also predict A’s forthcoming utterance (so both A and B proxy for B’s prediction of A’s completion, and so there is predict both A and B). Of course, these predictions will less likelihood of an egocentric bias.14 In fact, linguistic be related to A’s and B’s predictions of B’s utterance joint action is more likely to be successful and well-coordi- (e.g., both of them might predict both A’s upcoming nated than many other forms of joint action, precisely word and B’s response following that word), in a way that because the interlocutors communicate with each other will reduce the difficulty of making two predictions. and share the goal of mutual understanding. Our account can explain how interlocutors can be so well coordinated – for example, why intervals between turns are so close to 0 ms (Sacks et al. 1974; Wilson & Wilson 2005) 4. General Discussion and why interlocutors are so good at using the content of utterances to predict when they are likely to end (de Our accounts of comprehension and dialogue assign a Ruiter et al. 2006). Moreover, it accords with the treatment central role to simulation. We discuss three aspects of of dialogue as coordinated joint activity, in which partners simulation: the relationship between “online” and “offline” are able to take different roles as appropriate (Clark simulation, between prediction-by-simulation and predic- 1996). It can also explain the existence and speed of com- tion-by-association, and between simulation and embodi- pletions, overt imitation (e.g., Branigan et al. 2000; ment. We conclude by explaining how our account provides Fowler et al. 2003; Garrod & Anderson 1987), and (assum- an integrated theory of production and comprehension. ing links between intentions) rapid complementary We have focused on online simulation, when the com- responses (as in answers to questions). prehender wishes to predict the speaker in real time. We illustrate with the following extract (adapted from However, our notion of simulation is compatible with the Howes et al. 2011): simulation theory of mind-reading (Goldman 2006; see Gordon 1986), which is primarily used to explain offline – … 2a A: and then we looked along one deck, we were high up, understanding of others. In our account, the comprehen- and down below there were rows of, rows of lifeboats in case, der “enters” the simulation during covert imitation, and you see, “ ” 2b – B: –there was an accident exits after constructing the predicted utterance percept 2c – A: –of an accident (see Fig. 6). As in our account, Goldman assumed that people covertly imitate as though they were character In (2b–c), B speaks at the same time as A and has a similar acting – attempting to resemble their target as much as understanding to A. B interrupts A, and it is clear that B possible, and then running things forward as well as they must be as ready to contribute as A. Because B completes can. This means that the derived action command is “sup- A’s utterance without delay, it would not be possible for B posed” to be the action command of the target, but it incor- to produce (2b) by comprehending (2a) and then preparing porates any changes that are required because of bodily a response “from scratch,” as traditional “serial monologue” differences. (I can walk like Napoleon by putting my hand accounts assume (see Fig. 1). Instead, we assume that B cov- inside my jacket and seeing how this affects my gait, but I ertly imitates A’s utterance, determines A’s current pro- cannot shrink.) In addition, the perceiver may fail to duction command, determines A’s forthcoming production derive the actor’s action command correctly, in which case command, and produces an overt completion (see “Overt her covert imitation is biased toward her own proclivities. responses” in Fig. 6). Thus B’s response is time-locked to The important difference between such accounts and ours A’s contribution. In fact, (2b) is different from A’sowncon- is that they do not assume forward models and therefore tinuation (2c). The two continuations are syntactically differ- assume that covert imitation uses the action implementer ent (though both grammatical) but semantically equivalent, (but inhibiting overt responses). This may be appropriate thereby indicating that prediction can occur differently at for offline reasoning but is too slow for prediction (see different linguistic levels. Note that prediction-by-associ- Goldman 2006, pp. 213–17; Hurley 2008b). Goldman’s ation might allow B to predict A’s continuation, but would account uses simulation as an alternative to constructing a not explain the rapidity of B’s response, as B would also theory of the other person’s mind. In contrast, our account have to produce the continuation “from scratch.” uses simulation to facilitate processing, which is particularly During conversation, interlocutors tend to become important when behavior is rapid (as in Grush 2004;Prinz aligned with each other at different linguistic levels, and 2006). Clearly, this is the case for language processing. such alignment appears to underlie mutual understanding However, prediction-by-simulation can also be applied (Pickering & Garrod 2004). Our account can help explain offline as part of the process of thinking and planning (as this process, because the close link between production indeed can prediction-by-association). For example, a and comprehension leads to tightly yoked representations speaker might think about the likely consequences of pro- for comprehension and production, and allows those rep- ducing a particular utterance, both for her own subsequent resentations to be used extremely rapidly (see Garrod & utterances and perhaps more important for the responses Pickering 2009). Note, however, that the relationship also that addressees are likely to produce. She might do this works the other way: Prediction during comprehension is by constructing a predicted utterance percept, using facilitated when the interlocutors are well-aligned, forward modeling. She could also construct an utterance because the comprehender is more likely to predict the percept (without articulating), using the production imple- speaker accurately (and the speaker is more likely to menter and comprehension implementer (see top right of predict the comprehender’s response, as in question- Fig. 5 and discussion in sect. 3.1), as she would typically answering). One effect of this is that B’s prediction of have enough time to do so. Assuming co-activation, what A is going to say is more likely to accord with what offline predictions may often involve both the production

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 17 Pickering & Garrod: An integrated theory of language production and comprehension implementer and forward modeling. See Pezzulo (2011a) Both our account and embodied accounts seek to for a related discussion. abandon the “cognitive sandwich” (Hurley 2008a). Our Our account assigns a central role to prediction-by-simu- account assumes that producers use comprehension pro- lation, but it assumes that language comprehension and cesses and comprehenders use production processes, dialogue also involve prediction-by-association. We whereas embodied accounts assume that producers and propose that comprehenders will emphasize simulation comprehenders use perceptual and motor representations when they are (or appear to be) similar to the speaker associated with the meaning of what they are communicat- because simulation will tend to be accurate. These simi- ing. Our account does not require such embodiment but is larities might relate to cultural or educational background compatible with it. or dialect, or, alternatively, to speed or style of language pro- cessing. In addition, simulation will be emphasized during dialogue because the interlocutors will tend to become 5. Conclusion aligned (Pickering & Garrod 2004), and simulation will Traditional accounts of language assume separate proces- tend to persist among those in close relationships (who con- sing “streams” for production and comprehension. They tinue to be aligned). In addition, simulation may also be adopt the “cognitive sandwich,” a perspective that is incom- primed during dialogue, because the fact that the compre- patible both with the demands of communication and with hender also has to speak may activate mechanisms associated extensive data indicating that production and comprehen- with production. In contrast, prediction-by-association will sion are tightly interwoven. We therefore propose an be emphasized when the comprehender is less similar to account of language processing that abandons the cognitive the producer, as for example when the comprehender is a sandwich. This account assumes a central role to prediction native adult speaker of the language and the producer is a in language production, comprehension, and dialogue. By nonnative speaker or a child, or when the comprehender building on research in action and action perception, we does not have the opportunity to speak (as in reading). propose that speakers use forward models to predict We therefore assume that comprehenders emphasize aspects of their upcoming utterances and listeners covertly whichever route is likely to be more accurate (given that imitate speakers and then use forward models based on they should both be fast enough). It may also be that pre- their own potential utterances to predict what the speakers diction-by-association is more accurate for simple, “one- are likely to say. The account helps explain the rapidity of step” associations between a current and a subsequent production and comprehension and the remarkable state. For example, people can straightforwardly predict fluency of dialogue. It thereby provides the basis for a that a person who looks confused is likely to respond psychological account of human communication. slowly. In contrast, prediction-by-simulation is likely to be more complex, because it makes use of the structure inherent in the speaker’s own production mechanisms. ACKNOWLEDGMENTS Of course, comprehenders may combine prediction-by- We thank Dale Barr, Martin Corley, Chiara Gambi, and simulation and prediction-by-association. They make use Laura Menenti for their comments, and acknowledge of the same representational vocabulary and hence the support of ESRC Grants RES-062-23-0376 and RES- mental states are the same; the association route simply 060-25-0010. involves a different (and more straightforward) set of map- pings than the simulation route. Informally, for example, if NOTES 1. The meat is “amodal” in the sense that its representations I see that you are about to speak, I can predict your utter- are couched in terms of abstract symbols rather than in terms of ances by combining my experiences of how people like you bodily movements (see section 4). have spoken and my experiences of how I have spoken 2. Nothing hinges on this particular “traditional” set of levels. under similar circumstances. For example, it may be correct to distinguish logical form from There is a lot of current interest in the extent to which semantics, or from phonology. language is embodied (see Barsalou 1999; Fischer & 3. Note that a mapping from semantics to phonology would be Zwaan 2008). Such literature focuses on embodiment of a production process, and a mapping from phonology to semantics would be a comprehension process. Some researchers argue that content, in which the conceptual content of language is “ ” represented in “modal” (i.e., action-based or perceptual) levels can be skipped in comprehension (e.g., Ferreira 2003). terms (e.g., kick is represented in terms of the movements But mappings between phonology and semantics also occur for other reasons: for example, to express the relationship between associated with kicking). It is supported by strong evidence emphasis (represented in the message level) and phonological from behavioral experiments (e.g., Glenberg & Kaschak stress, or between meaning and sound in sound symbolism. 2002) and cognitive neuroscience (e.g., Desai et al. 2010). 4. We assume that prediction is separate from action or per- In contrast, our account is concerned with embodiment ception – that the processes involved in predicting action or per- of form, which Gallese (2008) called the vehicle level. It ception can at least in principle be distinguished from action or assumes that comprehension involves aspects of pro- perception itself. In this respect, our account differs from some duction, which is a form of action; by definition, production theories such as that by Elman (1990). ’ is embodied at the form level. Interestingly, Glenberg and 5. Our forward action model corresponds to Wolpert s forward Gallese (2012) used covert imitation and prediction in an dynamic model, and our forward perception model corresponds to account primarily concerned with content embodiment. his forward output model. 6. The perceiver also has to accommodate differences in per- They explained why representational gesture tends to co- spective (e.g., when the actor is facing the perceiver). This type occur with speech by arguing that speaking activates the of accommodation is less relevant to (spoken) language, so we corresponding action and that the need to perform the do not refer to it again. action of articulation prevents the inhibition of related ges- 7. Mirror neurons fire during both action and perceiving an tural actions (see Hostetter & Alibali 2008). action (Di Pellegrino et al. 1992), and they are of course

18 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension compatible with covert imitation during perception. Most evi- and joint action as inextricably linked and continuously interacting dence for mirror neurons is indirect in humans (e.g., activation components is a useful extension to this model. It is an advance on of action areas during perception), but Mukamel et al. (2010) the Markov-type linear analyses to which transcribed language used intercranial electrodes to demonstrate widespread mirror lends itself. Their proposal only begins to touch on the complex- activity in Broca’s area of an epileptic patient. ities inherent in human communication and its evolution. 8. We assume that speakers implement a level of semantics P&G have previously proposed that humans are “designed” for during production that is distinct from the production command. dialogue, not for monologue (Garrod & Pickering 2009). Their The production command includes a situation model that incorpor- suggested model moves communicative analysis in this direction ates nonlinguistic information, whereas semantics is more akin to an but provides a rather simplistic approach that they have con- “LF” level of representation (e.g., incorporating quantifier scope). trasted, somewhat quixotically, with an even simpler one. 9. In fact, Wijnen and Kolk (2005) briefly speculated about the Recent work on the functional neuroanatomy of language possible use of forward and inverse models in monitoring, making suggests at the least that the expressive and receptive phonologi- reference to Wolpert’s proposals. cal, syntactic, and semantic systems, although closely interlinked, 10. Note that Levelt (1989) assumed that there is appropriate- can be disambiguated (see, e.g., Ben Shalom & Poeppel 2008; ness monitoring that takes place over semantic representations, D’Ausilio et al. 2009; Sidtis & Sidtis 2003). Communicative inter- and that there is no loop based on syntactic representations. action may also involve many nonlinguistic processes, including 11. The predicted utterance percept must be represented proprioception (Sams et al. 2005) and interpersonal timing similarly to the utterance percept, in order that they can be com- (Richardson et al. 2007), not inherently linked to communicative pared. Thus, we might expect speakers to have some awareness of intent or content. the predicted utterance percept as well as the utterance percept. Some of the richness and complexity in the systems of human One possibility is that tip-of-the-tongue states constitute aware- communication have been highlighted through neuropsychological ness of the forward model (in cases when the production imple- analyses of what have been called disconnection syndromes (see menter fails) rather than incompletely implemented production. Catani & ffytche 2005). The approach has illuminated the impor- For example, the speaker may compute the forward model for tance of many nonlinguistic features as in the analysis of “emotional the first phoneme (e.g., Brown & McNeill 1966) or grammatical dysprosodias” (Ross 1981; Van Lancker & Breitenstein 2000). gender (Vigliocco et al. 1997). Functional neuroimaging is demonstrating the roles of distrib- 12. Some evidence suggests that inner speech may be impo- uted neural networks in human communication (Vigneau et al. verished (Oppenheim & Dell 2008; 2010; though cf. Corley 2006). The utility of this approach is being extended through et al. 2011). An intriguing possibility is that such impoverishment the development of explanatory models for conditions such as reflects forward modeling rather than an abstract phonological the autistic spectrum disorders (ASDs; Geschwind & Levitt 2007). representation constructed by the production implementer. Our capacity to understand the complex neural systems in 13. Note that Kim and Lai interpreted their results as involving human communication at both the individual and the dyadic interaction during early stages of lexical access, but this is not level was limited until the development of functional magnetic necessary. resonance imaging. Progress has been a function of the develop- 14. In fact, our account can explain why completions can be ment of technologies sufficient to the task rather than through compatible with the perspective of either of the interlocutors. advances in understanding per se. Methods to enable such inves- In (1), B said But have you … and A completed with burned tigation of the brain in interaction are being actively developed myself?,A’s completion takes A’s perspective (myself). However (Schilbach et al. 2012; Schippers et al. 2010). A could have alternatively said burned yourself?, thus taking B’s The need for a developmental perspective. The human infant perspective (see sect. 1.1). communicates with caregivers in part because of a range of evol- utionary adaptations that are successful in engaging with those around them to ensure their survival and in part because of being reared in an environment that has co-evolved to nurture their use of these adaptations. Language is a relatively recent addition to this process that subsequently enables the rapid trans- Open Peer Commentary mission of knowledge and culture. This rapid transmission of learning to the infant through acculturation by the caregiver is a process seldom observed in other primates (Tomasello 2008). The human infant is born largely neotenous but with an altricial It ain’t what you do (it’s the way that you do it) capacity to engage with caregivers in nonlinguistic forms of reci- procal interaction. Examples are “interactional synchrony” doi:10.1017/S0140525X12002488 (Condon & Sander 1974); the bidirectional influence seen in the patterning of interaction (Cohn & Tronick 1988); selective Kenneth John Aitken preference for maternal voice (e.g., Hepper et al. 1993); imitation Psychology Department, Hillside School, Aberdour, Fife KY3 0RH, Scotland. of facial expressions (Meltzoff & Moore 1977), and the infant’s [email protected] ability to track the objects of others’ eye movements (Beier & Spelke 2012; Navab et al. 2011). Many sensory systems are well Abstract: Knowledge of the complexity of human communication comes developed and involved in communication from tactile (Stack from three main sources – (i) studies of the and 2007) to the olfactory (Doucet et al. 2009). neuropsychology of dysfunction after brain injury; (ii) studies of the It is becoming clear that these processes are neurobiological in development of social communication in infancy, and its dysfunction in origin with individual variations in function arising in part through developmental psychopathologies; and (iii) the evolutionary history of human communicative interaction. Together, these suggest the need for transgenerational differences in early experience (see Barrett & a broad, integrated theory of communication of which language forms a Fleming 2011). small but critical component. Timing and prosody are essential features of communicative interaction, and concepts such as vitality contours of interaction Pickering & Garrod (P&G) are correct in pointing out problems and proto-narrative envelopes are helpful in better describing with treating expressive and receptive language as the only separ- both verbal and nonverbal interaction (see Stern 2010). Prosody able, distinct, and sufficient components to human communi- in speech is both language and culture dependent, as is neonatal cation. Their own discussion of communication as a more crying (Mampe et al. 2009), presumably through in utero vocal elaborated system involving personal action, action perception, exposure to maternal inflection patterns.

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 19 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension

In describing any contingent linguistic system of communi- A central aspect of Pickering & Garrod’s(P&G’s) target article is that cation, we need to characterise the dyadic nonverbal mechanisms language production relies on forward modeling processes. These that are its prerequisite base. are explicitly described as involving semantic, syntactic, and phono- The evolution of human communication. The development of logical representations. Here we will attempt to challenge this aspect more-complex communication in early Homo sapiens is likely to of their proposal on two grounds: the evidence available for such a have paralleled the selection pressure for the earlier birth of claim, and the predictions that may follow from it. less-mature infants that enabled more brain growth to occur P&G state that there is good evidence for the use of forward after birth and reduced the risks of mortality and morbidity to models during speech production. This statement must be quali- mother and infant through childbirth (see Falk 2004). In conse- fied or clarified. The experimental evidence put forth relies on quence, mothers were required to spend more time in caregiving quite specific language production situations (e.g., vowel articula- and became more dependent on other adults to support this tion or repeated production of very few simple and similar-sound- process. This was also made easier by the division of labour in ing words). Such linguistic specificity in the stimuli means that the the production of food, clothing, and shelter, and by the use of resulting data may not apply to evidence processes other than fire for warmth and cooking (see Wrangham 2009). articulatory motor control. It is dubious that the evidence provides Human communication cannot be simply reduced to a linguistic much information about semantic, syntactic, and possibly phono- means for epistemic exchange. Its ontological (Bråten 1998;Ger- logical processes, either in the speech production implementer or hardt 2004) and phylogenetic (Arbib 2012;Denton2005; Panksepp in the feedforward model. 2004) origins are in the communication of basic affect such as Neurophysiological evidence supporting forward modeling in hunger, thirst, discomfort, threat, and affection (Feldman 2007). speech production comes from intracranial-EEG studies The breadth of differing forms of human communication showing auditory cortex suppression during vocalization (Flinker suggests that it is the functional significance of being able to com- et al. 2010; Towle et al. 2008). In the visual system, well-identified municate that is critical, and the specific form that this takes is of motor-visual pathways subserve a similar suppression of sensory secondary significance. In terrestrial species, human language may activity during eye movements (Sommer & Wurtz 2008). Conse- be unique in its complexity and its ability to convey certain types of quently, auditory suppression during speech is attributed to an information, but there is tremendous variation in what can be con- efference copy which exerts its influence from Broca’s area to veyed. Similar discussions are also found regarding human music the auditory cortex via the inferior parietal lobe (Rauschecker & (see, e.g., Fitch 2006), but for the same reasons evolutionary argu- Scott 2009; Tourville & Guenther 2011). Although there is ana- ments remain speculative. tomical evidence for such a pathway (Frey et al. 2008), attempts To deconstruct communication into expressive and receptive to test the functionality of this connection during speech have language and their associated actions is to deal only with a small proved inconclusive (Flinker et al. 2010; Towle et al. 2008). The aspect of this far more complex but fundamental process largely unclear matter of which motor-auditory pathway could (Aitken 2008; Aitken & Trevarthen 1997; Trevarthen & Aitken carry such an efference copy is not considered in the target 2001). Greater understanding of the processes involved in decon- article. More generally, it is difficult to foresee which pathways struction is likely to require more detailed analysis with, at the may underlie the transmission of an efference copy, should linguis- least, the triadic modelling of communication’s functional com- tic information be involved (semantics, phonology, and syntax). ponents (Fivaz-Depeursinge & Favez 2006; McHale et al. P&G also refer to previous theoretical work to support their 2008), and the development of a robust methodology for what is generalization of feed-forward models beyond sensorimotor pro- being called “second-person neuroscience” (Przyrembel et al. cesses into linguistic levels. They use as an example the previous 2012; Schilbach et al. 2012). Incorporating these broader generalization of a motor control theory (MOSAIC) to a hierarch- aspects to communication will also require a broader appreciation ical version (HMOSAIC) that controls complex sequences of of human communication’s various components including the actions. This parallel has not helped to clarify matters. “The contributions of the tactile, the olfactory, and the pansensory. HMOSAIC model suggests that there are multiple levels of rep- Many different assessment and assimilation approaches will be resentation within the sensorimotor system” (Haruno et al. needed to facilitate such analyses. Some have been developed and 2003, p. 11). Hence, the HMOSAIC model seems specificto can form the basis for a more comprehensive framework for the the sensorimotor system, and not necessarily applicable to more understanding of human communication (see Anders et al. abstract levels of representation and processing, which is a core 2011; Delaherche et al. 2012). assumption of P&G’s proposal. Scalp-EEG evidence consistent with the hypothesis that language production is monitored by a general-purpose mechan- ism can be found in Riès et al. (2011). Using a grammatical gender decision task (a proxy for lexical access) and a standard picture Evidence for, and predictions from, forward naming task, those authors reported postresponse EEG waves modeling in language production very similar to those linked to response monitoring in nonlinguis- tic tasks (e.g., error-related negativity). In the speech task, the doi:10.1017/S0140525X1200249X onset of these waves preceded the onset of overt response. Although this timing feature has sometimes been taken as a signa- F.-Xavier Alario and Carlos M. Hamamé ture of efference copy (Gehring et al. 1993), it could also reflect Laboratoire de Psychologie Cognitive, Aix-Marseille Université & CNRS, the engagement of internal loop monitoring (Riès et al. 2011). 13003 Marseille, France. In the absence of strong evidence for some of P&G’s claims on [email protected] forward modeling in language production, it is appropriate to [email protected] examine the predictions that follow from them, and to gauge how they might guide future empirical tests. The only explicitly stated prediction regarding forward modeling in language production is worded in broad terms: “[speakers] should Abstract: Pickering & Garrod (P&G) put forward the interesting idea that detect semantic errors before syntactic errors, and should syntactic language production relies on forward modeling operating at multiple ” processing levels. The evidence currently available to substantiate this errors before phonological errors (target article, sect. 3.1, para. 23). idea mostly concerns sensorimotor processes and not more abstract Although this statement is clear, there are two requirements for linguistic levels (e.g., syntax, semantics, phonology). The predictions that testing the relative ordering of error occurrence: that the errors follow from the claim seem too general, in their current form, to guide are unambiguously classified,andthatamomentoferrordetection specific empirical tests. can be defined and measured. These requirements seem difficult to

20 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension meet in the absence of (some form of) overt response. Yet the pro- According to Pickering & Garrod (P&G), a key feature of forward duction of such an overt response would complicate the attribution models is that they are fast; they allow a system to correct itself of the detection to forward modeling versus external loop processes. much more quickly than is possible on the basis of reafferent feed- For timing, a common reference time point is required that is avail- back. I understand how a forward model allows fast feedback able across utterances. A reasonable proxy in experimental setups is when the feedback is based on the output of the forward model stimulus onset, but for narrative speech or dialogue such an event is itself (as opposed to conditions in which the output of the not easily defined. Testing this prediction is further complicated forward model is compared to real feedback). But to better under- because “it is not necessary that the predicted representations are stand how this approach works, it needs to be made clearer what computed sequentially […] the syntactic prediction need not be additional information is included in the forward model that is not ready before the phonological prediction” (sect. 3.1, para. 10). in the production or comprehension systems themselves. Other- This clearly opens the possibility of reordering the sequence in wise, there is no point. Furthermore, if indeed forward models which the parallel outputs of the implemented production and have extra information, how is this reconciled with the claim the forward model are compared. that forward models are “impoverished” compared to the pro- A different tentative prediction can be constructed from P&G’s duction and comprehension implementers. proposal. The feedforward model is hypothesized to involve “impo- I also find it unclear how a forward model speeds things up verished representations [that] leave out (or simplify) many com- when the output of a forward model and actual feedback (e.g., ponents of the implemented representations” (sect. 3.1, para. 6). proprioceptive, visual, auditory, etc.) are compared. The speed P&G provide various examples of components that “might” (sect. with which the forward model can compute seems useless in 3.1, para. 9 onwards) be left out. It is not stated whether such these conditions, as the model has to wait for the actual feedback opt-out is circumstantial (i.e., whether a component is left out or before the comparison process can begin. Furthermore, when- not depends on the speech act) or systematic (i.e., a component ever feedback is involved in correcting motor control for future is always omitted from the feedforward model). In the latter case, actions (as opposed to the current action), it is not immediately omitted components would be susceptible only to external error clear why a slow feedback loop is not supposed to work. detection and monitoring, whereas included components should At the same time, it is possible to imagine feedback between be internally detectable and correctable. Checking whether these levels of a production (or a comprehension) system that would general statements are amenable to a specific testable hypothesis, be fast enough to correct errors before overt errors are produced contrasting feedforward with inner and overt speech-monitoring (or before comprehension is compromised). Consider the “predic- performance (e.g., Oppenheim & Dell 2010), would require tive coding” model introduced by Grossberg (1980). In his Adap- more space than this commentary can accommodate. tive Resonance Theory (ART) model, a single unit in layer 2 of the In short, we submit that the evidence presented by P&G for network learns to code for a pattern of activation in layer 1. Criti- forward models in language production concerns only limited cally, the learning between the two layers takes place in both aspects of this behavior, these being primarily sensorimotor pro- bottom-up connections (such that a pattern of activation in layer cesses (i.e., articulatory processes for speech). No currently avail- 1 learns to activate a given unit in layer 2 – what Grossberg calls able evidence calls for a generalization to more abstract levels. “instar learning”) – and in top-down connections (such that an On the other hand, the predictions that may follow from this activated layer 2 units learns to activate a pattern in layer 1; aspect of P&G’s proposal are, in their current form, too general what Grossberg calls “outstar learning”). These top-down connec- or unconstrained to guide specific empirical tests. These specific tions in fact support “prediction,” that is, the layer 2 unit learns to points notwithstanding, P&G’s proposal provides a stimulating activate those layer 1 units that activated it in the past. impetus for combining psychological and neurophysiological evi- The process of identifying an input involves comparing the dence more closely. bottom-up (input) signal with the top-down (predicted) signal: If the two patterns match (in layer 1) the model, the model goes ACKNOWLEDGMENT into a state of resonance (with bottom-up and top-down signals Funding from the European Research Council under the Euro- reinforcing one another), and this is taken as evidence that the pean Community’s Seventh Framework Program (FP7/2007– correct node in layer 2 was indeed activated. If not, there is a 2013 Grant agreement 263575). Institutional support from mistake that needs to be corrected. The identification of a “Fédération de Recherche 3C” and the “Brain and Language mistake happens quickly, before any learning takes place (in Research Institute,” both at Aix-Marseille Université. We thank order to solve the stability-plasticity dilemma, otherwise known as Marieke Longcamp for comments and Dashiel Munding for com- “catastrophic interference”). The important point for present pur- ments and native proofreading poses is that ART includes fast prediction within a single system, with no need to posit a separate, parallel forward model. (In fact, the ART system does have a separate parallel system that is engaged in cases of bottom-up/top-down mismatch, but this system does not carry any information about a prediction – it just How do forward models work? And why would turns off the layer 2 unit and tells the network to “try again.”) you want them? Could something similar work in the case of speech pro- duction? Perhaps a semantic input could activate a lemma unit, doi:10.1017/S0140525X12002506 and the lemma could feed back to the semantic system, and the model could be confident that the correct lemma was selected if Jeffrey Bowers its top-down and bottom-up signals matched. Similar top-down/ School of Experimental Psychology, University of Bristol, BS8 4JS Bristol, bottom-up interactions across levels in the speech production United Kingdom. system could lead to quick corrections at each stage. And, ulti- [email protected] mately, feedback from the actual output (a spoken word in the case speech production) could play a role in correcting errors (after the fact). Given that predictions are made between levels Abstract: The project of coordinating perception, comprehension, and of the speech production system (and possibly between pro- motor control is an exciting one, but I found it hard to follow some of Pickering & Garrod’s (P&G’s) arguments as presented. Consequently, duction and comprehension systems), corrections could presum- my comment is not so much a disagreement with P&G but a query ably be made quickly, and at different stages of the process. about the logic of forward models: It is not clear how they are supposed Of course, this sketch of an outline of an idea does not even to work, nor why they are needed in this (or many other) contexts, and attempt to address the complexities that are addressed by P&G, toward that end I present an alternative idea. and any proposal without a forward model may well be

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 21 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension inadequate. But I’m struggling to see why separate fast forward (e.g., eat-cake), where both children and adults have been found models are needed (as opposed to feedback between levels to predict a verb’s object before the object is encountered in within and between production and comprehension systems), speech (Altmann & Kamide 1999). These results are particularly and to understand how forward models are thought to be fast interesting, because Mani and Huettig (2012) found a significant whenever they rely on actual feedback. relationship between expressive vocabulary and prediction behav- iour in children, which supports both P&G’s and the Dual-path model’s hypothesis that production representations support prediction. P&G’s model places prediction within a forward model that Prediction in processing is a by-product of implements constraints on word order (prediction-as-processing). language learning The model explains verb-object affordances in the following manner: a production command EAT(CAKE) can be derived doi:10.1017/S0140525X12002518 from hearing the word eat and seeing the cake in the visual scene. The command passes through the production forward Franklin Chang,a Evan Kidd,b and Caroline F. Rowlanda system to activate the triplet p[cake, NP, /cake/]. Crucially, the aUniversity of Liverpool, Institute of Psychology, Health and Society, Liverpool model predicts that only syntactically and semantically appropri- L69 7ZA, United Kingdom. bThe Australian National University, Research ate predictions will be generated. In the eat-cake example, the School of Psychology, The Australian National University, Canberra 0200, prediction is correct, but is not borne out in every instance. For Australia. instance, Kamide et al. (2003) reported that participants made [email protected] [email protected] predictive looks to a cabbage after the processing the fragment [email protected] The hare will be eaten …, where the passivized verb should have restricted looks to an animate agent (e.g., a fox). In contrast to P&G’s model, the Dual-path model acquires syn- tactic representations within a network that contains separate rowland/ meaning and sequencing pathways (Elman 1990). Its learning algorithm compares the predicted next word with the actual com- Abstract: Both children and adults predict the content of upcoming prehended next word, and the mismatch or error is used to adjust language, suggesting that prediction is useful for learning as well as the model’s internal representations (error-based learning, processing. We present an alternative model which can explain prediction behaviour as a by-product of language learning. We suggest Rumelhart et al. 1986). The Dual-path model is able to explain that a consideration of places important constraints the phenomena of structural priming as error-based learning on Pickering & Garrod’s (P&G’s) theory. (Chang et al. 2006), and this ability requires that prediction-for- learning is constantly taking place during language comprehen- Pickering & Garrod (P&G) have done the field a huge favour by sion. For the eat-cake prediction, the input word eat activates conceptualising language processing in terms of an integrated the concept CAKE because the model’s meaning pathway action-perception system, which highlights the centrality of pre- learns associations between words in utterances and elements in diction. It seems to us that taking a similar approach to the field messages. Critically, the same word-concept mechanism learns of language acquisition would be equally productive. After all, to associate eaten with cabbage. This associative word-concept the task of learning a language requires a close integration of prediction mechanism is different from the structure-sensitive oral motor action and sound perception. P&G’s model adopts a prediction mechanism in the sequencing system, which can theory of acquisition from motor learning, in which forward explain why eaten increases predictive looks to likely agents like models learn to bridge between action and perception. In this the fox in Kamide et al. (2003). Thus, the pathways in the commentary, we explore the implications of P&G’s forward model can explain the different types of prediction. Crucially, model for our understanding of psycholinguistic processes, with the similarity in prediction in children and adults can be explained a particular focus on language acquisition. by the idea that humans are constantly doing prediction-for-learn- There is now substantial evidence that children use prediction ing to adjust their language representations to the input (Kidd in comprehension in much the same way as adults. For 2012; Rowland et al. 2012). example, Lew-Williams and Fernald (2007) reported that Learning processes can explain prediction in processing, but Spanish-learning 3-year-old children can use the grammatical language acquisition constraints are also critical for learning the gender of the article (la/el) to predict the referent of the next syntactic and semantic representations that support prediction word (la pelota vs. el zapato) in a looking-while-listening task. in P&G’s model. P&G’s theory is based on Wolpert et al.’s Mani and Huettig (2012) have shown that 2-year-olds can use a (2011) theory of motor planning and perception, which uses verb’s semantic affordances in a similar manner to adults (e.g., error-based learning for motor and forward model learning eat predicts cake), and Borovsky et al. (2012) have reported (Jordan & Rumelhart 1992; Plaut & Kello 1999). According to similar findings in older children and adults. Importantly, the Wolpert et al.’s theory, the forward model is learned by acquisition studies have demonstrated that children’s predictive mapping from muscle commands to the perception of one’s arm abilities correlate with their knowledge of language. In all of in three-dimensional space. These algorithms work because these studies, children (and adults) with bigger productive (and/ humans can directly perceive their arm’s position. In P&G’s or receptive) vocabularies tend to be faster and more accurate theory, the forward model maps from a message-like production at prediction during online sentence comprehension. Faster pre- command to a triplet including syntax and semantics. This is pro- diction correlates with positive language outcomes longitudinally: blematic: We cannot directly perceive syntax and semantics, and Marchman and Fernald (2008) reported that the speed at which hence the learning mechanism in the motor theory cannot 25-month-olds process lexical information correlates with linguis- explain how P&G’s forward model learns to make these language tic knowledge 6 years later. These findings suggest that prediction predictions. When error-based learning is used to map from pro- is not only part of the language processing system, but is tightly duction commands to sentences, Chang (2002) demonstrated that linked to language acquisition mechanisms. abstract syntax was not always learned unless the model had Scrutinizing the mechanism underlying prediction therefore language acquisition constraints like those in the Dual-path archi- appears an important priority. Here we compare P&G’s model tecture. Therefore, P&G’s forward model may need a similar to an alternative language production model, Chang’s(2009) architecture to yield the appropriate predictions. Dual-path model, concentrating on how each model explains pre- Language processing theories like P&G’s account treat diction. By way of example, we consider verb-object affordances language learning as a peripheral process. We argue that

22 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension prediction in processing is actually a by-product of learning. Pre- are limited to the domain they have been trained on (for an diction is a critical component of error-based learning, which is alternative Bayesian approach to attacking this problem that one of the most successful accounts of both motor and language does not involve offline training procedures, see De Ruiter & learning. Cummins 2012 ). We would urge P&G to prioritize this aspect, as we believe that the success of the proposed approach will be to a large degree dependent on its ability to model intention rec- ognition in dialogue. The second comment we have involves the use of “impover- Forward modelling requires intention ” recognition and non-impoverished predictions ished representations for the efferent copies, and especially the nature of this impoverishment. In P&G’s exposition of the theory, the stated reason that the system does not simply use doi:10.1017/S0140525X1200252X the efferent copies as motor programs is their impoverished Jan P. de Ruiter and Chris Cummins nature (target article, sect. 3.1, para. 6). However, as the efferent copy represents the perceptual consequences of the motor Department of Psycholinguistics, Bielefeld University, 33501 Bielefeld, Germany. program (and not the motor program itself), not using them [email protected] [email protected] directly as motor programs, in our view, does not need to be motivated at all. It is simply a different type of representation, not suited as a motor program. Abstract: We encourage Pickering & Garrod (P&G) to implement this A potentially more serious problem with the proposed impover- promising theory in a computational model. The proposed theory ished nature of the efferent copies is that they do not adequately crucially relies on having an efficient and reliable mechanism for early explain the phenomena they are supposed to. This holds for both intention recognition. Furthermore, the generation of impoverished comprehension and production. In production, for instance, the predictions is incompatible with a number of key phenomena that cited findings by Heinks-Maldonado et al. (2006), and especially motivated P&G’s theory. Explaining these phenomena requires fully fi those by Tourville et al. (2008), can only be explained if the effer- speci ed perceptual predictions in both comprehension and production. ent copy is fully specified, not merely phonologically but also pho- We heartily congratulate Pickering & Garrod (P&G) on their netically. If a speaker knows only which phonemes he is going to produce in what order but not how (in terms of phonetic detail), outline of a new cognitive architecture that integrates language fi production and comprehension. What is particularly impressive then the proposed theory would predict that changing the rst in their sketch is that it is a comprehensive approach that formant in the auditory feedback (as Tourville et al. did) would addresses entire classes of interrelated psycholinguistic phenom- have no effect at all. ena (as opposed to a selected subset of empirical findings) and In language comprehension, the proposed theory assumes that that it provides natural explanations for especially the time-critical listeners predict what their interlocutor is going to say. Indeed, phenomena which have been difficult to explain plausibly and this appears to be essential for explaining the phenomenon of elegantly with our “standard models” (e.g., Dell 1986; Levelt close shadowing (Marslen-Wilson 1973), with delays as short as 1989). 250 ms. Also, predictions of utterance content probably underlie the listener’s highly accurate anticipation of the end of the speak- Therefore, it is, in our view, all the more urgent that this sketch, ’ or at least its central parts, be further developed into a real com- er s turn as found, for instance, by De Ruiter et al. (2006) and putational implementation, one that actually generates the Stivers et al. (2009). But here too, the accuracy obtained from complex behavior that we can now only simulate in our minds having access to an early but impoverished prediction would not on the basis of verbal accounts. Only with such an implementation be able to explain the levels of accuracy observed in end-of-turn will we be able to assess the adequacy and accuracy of the pro- anticipation in experiments and natural data. Magyari and De vided account, and be able to generate nontrivial and testable pre- Ruiter (2012) found evidence that people are able to predict when a turn ends by predicting how it ends – that is, with which dictions that can subsequently be tested using the large arsenal of fi behavioural and neurocognitive methods now available. We are speci c words the turn will end. This suggests that the forward aware that implementing models is not an easy task, and that model cannot be lexically impoverished, as suggested by P&G in implementing this particular architecture will prove to be a chal- section 3.1 (para. 9). lenging exercise. But it is possible to use a piecemeal approach: An This is why we would strongly urge P&G to adopt the assump- fi fi tion that the representations of the predictions, both in pro- obvious simpli cation is to rst develop the model on a miniature fi language, in a restricted context, using simulated time. Also, it is duction and comprehension are fully speci ed (perceptual) possible and probably advantageous to employ division of labour representations, as Pickering and Garrod (2007) suggested for by delegating parts of the implementation to different research comprehension. groups that have complementary expertise. Finally, we again want to express our support for the exciting There are two central aspects of the proposed theory that we approach that P&G have taken with their highly original and would like to comment on, and suggest improvements to. thought-provoking outline, and look forward to discussing these The first concerns the role of intentions. As P&G note, when issues further. predicting the utterance of an interlocutor (i.e., in comprehen- sion), it is essential to have an (early) estimate of the underlying intention. In the HMOSAIC model, this is done by running par- allel inverse models. But in modelling verbal interaction, one of Cascading and feedback in interactive models the most intractable problems is the complex, seemingly arbitrary, of production: A reflection of forward many-to-many mapping of utterances and intentions (see, e.g., modeling? Levinson 1983; 1995). We suspect, therefore, that in a model of language production and comprehension (i.e., dialogue proces- doi:10.1017/S0140525X12002531 sing) this problem is much harder than in the recognition of inten- tions underlying functional motor behaviour. There are Gary S. Dell computational models that use Bayesian machine learning pro- Beckman Institute, University of Illinois, Urbana–Champaign, Urbana IL 61820. cedures to capture the utterance-intention mapping from multi- [email protected] modal interaction corpora (see, e.g., DeVault et al. 2011), but this approach involves computationally expensive and time- Abstract: Interactive theories of lexical retrieval in language production consuming offline learning procedures, and the resulting models assume that activation cascades from earlier to later processing levels,

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 23 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension and feeds back in the reverse direction. This commentary invites Pickering bound to structural frames, at least in activation-based models & Garrod (P&G) to consider whether cascading and feedback can be seen that use such frames (e.g., Dell 1986). So, although units activated as a form of forwarding modeling within a hierarchical production system. through cascading may soon be fully part of an utterance’s rep- Over the past 20 years, one of the most contentious issues in resentation at a particular level, they are not there yet. language production has concerned the degree to which lexical P&G have outlined a compelling integrated theory of pro- access occurs in discrete stages. Theorists agree that words are duction and comprehension, showing how each contributes to retrieved first as semantic-syntactic entities, and then later the other. With this commentary, I invite them to consider spelled out in terms of their phonological forms (e.g., Garrett whether the notions of cascading and feedback in production fi are part of the picture, and particularly whether they can be con- 1975; Kempen & Huijbers 1983). But is the rst of these fl steps – lemma or word-access – entirely separate from the sidered to be re ections of the forward modeling system operating second one – phonological access? Three possibilities are between processing levels. debated. The discrete-stage or modular view (Levelt et al. 1999) ACKNOWLEDGMENT holds that the first step must be completed before any activation Preparation of this commentary was supported by NIH DC- of phonological forms takes place. During the first step of retrieval 000191 of the word “cat,” the lemmas for CAT and DOG may both be active, but the phonological forms of these will not be. Not until the first step has completed its selection of CAT can /k/, /æ/, and /t/ gain activation. The cascade hypothesis blurs the distinction between the steps by allowing for activation of phonological forms The neurobiology of receptive-expressive of potential lemmas (e.g., the forms of both “cat” and “dog”) language interdependence before the first step has been completed. The interactive hypoth- esis permits cascading, but also bottom-up feedback (e.g., Dell doi:10.1017/S0140525X12002543 1986). Activated phonological units send activation upwards to lexical units. This loop of cascading and feedback between units Anthony Steven Dicka and Michael Andricb at adjacent levels of the system is assumed to operate regardless aDepartment of Psychology, Florida International University, Miami, FL 33199; of whether the lexical access process is engaged in word or phono- bCenter for Mind/Brain Sciences (CIMeC), The University of Trento, Trento logical access. Currently, there is a considerable amount of evi- 38122, Italy. dence for cascading (e.g., Cutting & Ferreira 1999), but little [email protected] consensus on the degree to which the system is interactive (see∼adick Harley 2008, for review). I suggest that Pickering & Garrod’s (P&G’s) proposed use of Abstract: With a focus on receptive language, we examine the forward modeling and the predicted perceptions that result neurobiological evidence for the interdependence of receptive and expressive language processes. While we agree that there is compelling from it map onto the notions of cascading and feedback in evidence for such interdependence, we suggest that Pickering & interactive models of production. Cascading consists of a pre- Garrod’s (P&G’s) account would be enhanced by considering more- diction by processing level i of what needs to be active on the specific situations in which their model does, and does not, apply. next lower level i+1, and feedback from that level delivers the anticipated “sensory” consequences of that prediction back to The classical Lichtheim–Broca–Wernicke neurobiological model level i. of language proposed distinct neuroanatomical pathways for In P&G’s view, the advance prediction of representational com- language comprehension and production. Recent evidence ponents of an utterance and their sensory consequences allow for suggests abandoning this model’s classical form, and although each production decision to be coordinated with other decisions. there is not yet an established replacement (Dick & Tremblay They illustrate by showing how heavy noun phrase (NP) shift and 2012; Price 2010; 2012 for review), we think much of the data phonological error monitoring could result from this system. Gen- support P&G’s proposal. However, we also think P&G could be erally speaking, forward modeling during production helps make clearer about whether there are situations in which their model the many parts of an utterance mesh for accurate fluent speech. does not apply. For example, they state that “comprehenders That is also the function of cascading and feedback in hierarchical make whatever linguistic predictions they can” (target article, production models, except that representational levels rather than sect. 3.2, para. 1), but this is so broad as to be unfalsifiable. utterance parts are what are being meshed. Cascading of acti- Neurobiological evidence suggests production and perception vation to lower levels prepares the way for the construction of rep- system interdependence occurs in specific situations. By high- resentations at those levels. The resulting feedback allows for lighting emerging models and findings in the neurobiology of decisions at the higher representational level to be sensitive to receptive language, we suggest that P&G’s proposal could be information at the lower level. For example, feedback from pho- fine-tuned to make more-specific, testable predictions. nological forms to the word/lemma level allows for word selection Neurobiological evidence for the interdependence of receptive- at the higher level to reflect the retrievability of the form (Dell expressive language in speech perception. The most widely et al. 1997). A phonological form that is more easily available adopted model of language neurobiology is a dual-stream model will feed back more activation to its lemma than a form that is dif- analogous to the visual system (Ungerleider & Haxby 1994). ficult to retrieve will. Hence, the system will be biased to select Within this model, during receptive language, auditory speech lemmas whose forms will be available. Feedback also enables rep- sounds map to articulatory (motor) representations in a dorsal resentations at a lower level to mesh with higher-level infor- stream and to meaning in a ventral stream (Hickok 2009b; mation, as seen when feedback from phonological to lexical Hickok & Poeppel 2000; 2004; 2007; Rauschecker 2011; levels biases the phonological level activations toward lexical Rauschecker & Scott 2009; Rauschecker & Tian 2000; Rogalsky outcomes, functioning as a lexical editor (e.g., Nozari & Dell & Hickok 2011). If this is correct, models like P&G’s must 2009). account for the way these processing streams interact with the P&G emphasize that forward predictions are not actual pro- motor system involved in language production. duction representations. They are “easier-to-compute ‘impover- This problem is easier to solve within the dorsal stream, as many ished’ representations” (target article, sect. 3.1, para. 6). This is of the same brain regions are active during speech planning and also true of cascading in interactive models. Units that are active execution, and during speech perception (Callan et al. 2004; through cascading are less active than they would be if they Eickhoff et al. 2009; Hickok & Poeppel 2007; Pulvermüller were committed parts of a representation. Furthermore, unlike et al. 2006; Vigneau et al. 2006; Wilson et al. 2004). In fact, a committed representational elements, they have yet to be primary contention is not whether the motor system is recruited

24 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension during speech perception but in what situations it occurs. Some Intermediate representations exclude argue the motor system is essential (D’Ausilio et al. 2009; Iacoboni embodiment 2008; Meister et al. 2007), whereas others argue that it is only fi involved when auditory-only speech is dif cult to parse (e.g., doi:10.1017/S0140525X12002555 during noisy situations, or when discriminating between similar phonemic units; Hickok 2009a; Hickok et al. 2011; Sato et al. Guy Dove 2009; Tremblay & Small 2011). Department of Philosophy, University of Louisville, Louisville, KY 40292. The latter situation appears to be the case for audiovisual [email protected] speech perception, when visual information from the lips and mouth is present. Moreover, a forward-modeling architecture consistent with P&G’s proposal has been suggested to explain Abstract: Given that Pickering & Garrod’s (P&G’s) account integrates the neurobiology of audiovisual speech perception (Callan et al. language production and comprehension, it is reasonable to ask whether 2004; Skipper et al. 2005; Skipper et al. 2007b; van Wassenhove it is compatible with embodied cognition. I argue that its dependence et al. 2005; Wilson & Iacoboni 2006). Here, visual information, on rich intermediate representations of linguistic structure excludes temporally preceding the auditory signal by several hundred embodiment. Two options are available to supporters of embodied milliseconds (Chandrasekaran et al. 2009), provides a “forward cognition: They can adopt a more liberal notion of embodiment or they ” ’ can attempt to replace these intermediate representations with robustly model of the speech sound. These models draw on the listener s embodied ones. Both of these options face challenges. articulatory representations to provide possible phonetic targets of the talker’s speech (Callan et al. 2004; Skipper et al. 2007b; Pickering & Garrod (P&G) maintain that their integrated van Wassenhove et al. 2005). Findings that visual speech influ- approach to language production and comprehension is compati- ences the auditory neural response’s latency and amplitude (van ble with, but does not require, embodiment. I argue that it is Wassenhove et al. 2005), and recruits motor-speech regions incompatible. The fundamental role played by intermediate rep- (Callan et al. 2004; Dick et al. 2010; Hasson et al. 2007; Sato resentations that capture phonological, syntactic, and semantic et al. 2010; Skipper et al. 2005; 2007b; Watkins et al. 2003), structure rules out embodiment and preserves important support predictive coding via forward models of the kind P&G aspects of classical cognitive science. propose. Integration and embodiment. There is a clear affinity between Neurobiological evidence for the interdependence of receptive- P&G’s project and that of embodied cognition. The basic idea expressive language in language and gesture comprehension. behind embodied cognition is that cognitive processes are partly Although the neurobiological evidence for receptive-expressive constituted by wider bodily structures and processes. Glenberg language interdependence is compelling in speech perception, it (2010, p. 586) characterized it as the claim that “all psychological is mixed for higher-level language comprehension, which involves processes are influenced by body , sensory systems, brain regions along a ventral language pathway (Binder et al. 2009; motor systems, and emotions.” Such influence clearly requires an Hickok & Poeppel 2007; Vigneau et al. 2006). There is evidence – intimate relationship between action, perception, and cognition. for example, in processing verbs – that the motor system contrib- Traditionally, researchers have assumed that language proces- utes to understanding, and this is cited to support “motor sing is inherently modular; they have been committed to what simulation” theories (Cappa & Pulvermüller 2012; Fischer & Hurley (2008a) calls the classical sandwich. On this view, Zwaan 2008; Glenberg 2011; Glenberg & Gallese 2012). central cognition (the meat) intercedes between action and per- Notably, some authors interpret these findings without adhering ception (the slices of bread). P&G argue that language production to motor simulation theories (Bedny & Caramazza 2011; Mahon and comprehension are forms of action and action perception & Caramazza, 2009). Indeed, motor (production) system contri- respectively. As they see it, receivers of linguistic messages bution to language comprehension is a contentious issue (e.g., actively compute action representations during perception to this was a topic of an organized debate at the 2011 Neurobiology help them predict what they are about to perceive. Similarly, pro- of Language Conference). ducers of a linguistic message actively compute perception rep- Additional evidence suggests that involvement of the motor resentations to help them predict sensory feedback from their system is specific to the task. For example, Tremblay et al. ongoing action. In violation of the classical sandwich, comprehen- (2012) applied repetitive transcranial magnetic stimulation sion often involves production processes and production often (rTMS) to the ventral premotor cortex during a sentence compre- involves comprehension processes. hension task. The rTMS interfered with sentences describing One of P&G’s primary theoretical innovations is their use of manual actions, but not with other types of sentences, suggesting forward and inverse modeling to account for the dynamic that predictive motor encoding is not always called upon. Another nature of language processing. This clearly fits with the central example is gesture comprehension. Some studies have shown that role played by perceptual and motor simulation in many accounts the act of viewing gestures recruits areas associated with a putative of embodied cognition (for reviews, see Barsalou 2008; Kem- “mirror neuron” system thought to covertly simulate others’ merer 2010; Martin & Zwaan 2008). It also fits with the more actions (Green et al. 2009; Holle et al. 2008; Skipper et al. recent suggestion that prediction is important to guiding action 2007a; 2009; Willems et al. 2007; Xu et al. 2009), but others and perception (Gallese 2009). show no evidence that this correlates with comprehension The problem posed by intermediate representations. A core (Andric & Small 2012; Dick et al. 2009; 2012; Straube et al. aspect of P&G’s theory does not fit with embodied cognition: its 2011; Willems et al. 2009). reliance on disembodied representations. The problem begins In closing, we note that within P&G’s model it may not be with their acknowledgement that language is special: “Unlike necessary to elicit motor activation. For example, P&G state many other forms of action and perception, language processing that “embodied accounts assume that producers and com- is clearly structured, incorporating well-defined levels of linguistic prehenders use perceptual and motor representations associated representation such as semantics, syntax, and phonology” (target with the meaning of what they are communicating. Our article, sect. 1.3, para. 9).To handle the linguistic structure at account does not require such embodiment but is compatible these three levels, they posit “a series of intermediate represen- with it” (sect. 4, para. 9). Hence, the model seems able to tations between message and articulation” (sect. 3.1, para. 3). account for motor activity, or lack of it, during receptive language. These intermediate representations are central to their account. If this is the case, P&G should clarify what neurobiological Indeed, P&G define production processes as those that map findings could help decide between competing accounts that “higher” linguistic representations to “lower” ones and compre- call upon interdependent receptive and expressive language hension processes as those that map “lower” linguistic represen- systems. tations to “higher” ones.

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 25 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension

One of the difficulties facing any attempt to assess embodied cog- Abstract: In examining the utility of the action view advanced in the nition is that it has been associated with several distinct theses Pickering & Garrod (P&G) target article, I first consider its contribution (Anderson 2003; Shapiro 2011; Wilson 2002). There is, however, to the analysis of language vis-à-vis earlier language-as-action good reason to think that P&G’s intermediate representations are approaches. Second, I assess the relation between coordinated joint incompatible with most versions of embodiment. Obviously, any action, which serves as a blueprint for dialogue coordination, and the experience of shared reality, a key concomitant and product of appeal to representations excludes radical anti-representational interpersonal communication. forms of embodied cognitive science (Chemero 2009). Less radical forms of embodied cognitive science generally assume that The theory of language production and comprehension (TLPC), embodiment requires, at a minimum, grounding in modality-specific laid out in impressive keenness in the Pickering & Garrod input/output systems. Pezzulo et al. (2011,p.3)outlineacore (P&G) target article, rests on an analogy between language and feature of this grounding: “Perhaps the first and foremost attribute action. It musters a throng of cutting-edge research on forward of a grounded computational model is the implementation of cogni- modeling and covert simulation to flesh out the analogy. In the fol- tive processes… as depending on modal representations and associ- lowing, I examine the utility of the present action view for the ated mechanisms for their processing… rather than on amodal understanding of verbal communication and interpersonal align- representations, transductions, and abstract rule systems.” P&G’s ment. I will first consider the distinct contribution to the analysis intermediate representations clearly fail to meet this criterion. of language vis-à-vis earlier language-as-action approaches. Then I Traditional cognitive science posits amodal representations for will turn to the relation between coordinated joint action, which a reason: They provide a means of integrating information associ- serves as a blueprint for dialogue coordination, and a key conco- ated with distinct perceptual and motor modalities. Psycholin- mitant and product of interpersonal communication, that is, the guists often argue that amodal representations are needed for experience of a shared reality between interlocutors. language processing because linguistic structure transcends the One merit of the TLPC lies in its extension of the action- particulars of the various modalities associated with production language analogies that have been championed by a prominent and comprehension (e.g., Jackendoff 2002; 2007; Pinker 2007). lineage of approaches epitomized by Austin (1962) and Grice On the syntactic front, the well-known structural similarity of (1975). These approaches characterized language use as purpose- signed and spoken languages is taken to provide further support ful contextualized action (Holtgraves 2002). However, much of for this claim (Goldin-Meadow 2005; Poizner et al. 1987). the research inspired by the language-as-action perspective did Although the need for amodal representations has typically not study issues that are now addressed by the TLPC, primarily been formulated against the assumption that production and com- the online processes that permit the seamless and instantaneous prehension are separate processes, this background assumption is mutual attunement to the current topic (Holtgraves 2002, pp. not necessary. Indeed, as P&G show, amodal representations can 180–82). In this respect, the integration of action simulation by serve as a bridge for the ongoing interaction between production the TLPC marks a novel and promising contribution. and comprehension. To mangle a cliché, P&G provide a way to However, the focus of the TLPC on co-present interweaving of avoid throwing the meat out with the sandwich by identifying an action, based on prediction processes, outshines key insights from important role for the sort of amodal representations posited by language-as-action work and hence comes at a cost. First of all, the traditional cognitive science within a non-modular account of role of context and communicative intentions, essential to any language processing. view of language use as action, is mentioned rather parenthetically Conclusion. P&G’s appeal to intermediate representations and relegated to subsidiary information contained in one of the leaves supporters of embodied cognition with something of a processing modules (viz., the production command). More dilemma. On the one hand, they could try to liberalize the notion specifically, it seems that the emphasis on prediction processes of embodiment in order to encompass such representations. This underappreciates postdictive processes, that is, the search for an move is not without precedent. Meteyard et al. (2012), for adequate interpretation after utterances have been perceived. example, argue that researchers need to consider the possibility Recipients often enough straggle and struggle to infer what is that cognition is weakly embodied because it involvessupramodal meant by, for instance, nonliteral utterances, figures of speech, representations that capture associations between distinct sensori- or complex (scientific or literary) formulations. According to motor systems (Barsalou et al. 2003; Damasio & Damasio 1994; language-as-action approaches, these efforts are driven by prag- Gallese & Lakoff 2005). The obvious danger of this strategy is matic assumptions of cooperativeness and mutual adherence to that it could erode the force and novelty of the thesis that cognition communication rules (Higgins 1981). is embodied. On the other hand, they could try to offer more Furthermore, the view of action execution and perception, robustly embodied accounts of phonological, syntactic, and seman- which serves as the blueprint for the meticulous modeling of tic knowledge. This strategy faces a general challenge: In order to language production and comprehension, restricts the action- eliminate the need for amodal representations at a given level, it language analogy to co-present, oral dialogue. Given the is not enough to show that some phenomena at that level can be primacy of face-to-face conversation (e.g., Clark & Wilkes-Gibbs handled in an embodied fashion. Instead, what needs to be 1986), such a focus of the theory design is reasonable. However, shown is that all of the phenomena at that level can be handled the role of action in other forms of verbal communication in this way (Toni et al. 2008). This sets the bar very high, and we remains an open issue. For instance, writers want to accomplish can reasonably doubt that it is achievable. As matters stand, purposes with what they write; readers attempt to identify these neither of these options seems particularly promising. purposes. (Processes underlying reading, specifically prediction- by-association, are addressed only once at the end of P&G’s article.) To what extent can the action-view embraced by the TLPC account for the processes that operate, for instance, in The role of action in verbal communication and typing a tweet or blog commentary, and in reading and interpreting shared reality it? In reading, there are no immediate sensory perceptions of an interlocutor’s movements or the current interaction context that doi:10.1017/S0140525X12002567 can help a recipient to infer the intended action or purpose under- lying a piece of text; but there are other resources, such as assump- Gerald Echterhoff tions of conversational rules or background knowledge, that allow Social Psychology Group, Department of Psychology, University of Münster, recipients to make such inferences. It seems that the role of D-48149 Münster, Germany. action outside of co-present conversation can be more readily [email protected] accounted for by other approaches from the “language-as-action” family (for the reception of literary texts, see, e.g., Ricœur 1973).

26 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension

A second issue I want to examine concerns the relation between language users. Compared to earlier language-processing coordinated joint action and the experience of a shared reality models, it is more complex – in particular, by integrating pro- between the interlocutors. Shared reality is of potentially high rel- duction and comprehension – with regard to predicting actions evance to verbal communication because interpersonal communi- and the embodiment of language. However, it fails to account cation is a key arena of social sharing. In section 2.3, the models of for the important aspects of increased processing demands and prediction and simulation are applied to action coordination in processing costs in bilingual language interaction. These joint activities such as ballroom dancing or carrying a bulky additional demands and costs are observed even for bilinguals object. P&G claim that the joint-action model can explain “the with native-like knowledge of the language (so-called highly experience of ‘shared reality’ that occurs when A and B realize proficient bilinguals) and cannot be attributed to a low level of that they are experiencing the world in similar ways” (target proficiency. I consider these aspects most relevant for a truly inte- article, sect. 2.3, para. 5). grated, up-to-date, and generalizable theory of language pro- From the perspective of shared-reality theory, however, the duction and comprehension. commonality or alignment involved in the coordination of action When comparing the performance of monolinguals and bilin- does not necessarily involve the experience of shared reality. guals on the same task, we can trace increased processing According to a current definition (Echterhoff et al. 2009), demands and processing costs with different measures, two of shared reality is the product of the motivated process of experien- which I will describe. Such studies investigate the language pro- cing an interpersonal commonality of inner states about the world. duction in only one language at a time, not during switching The creation of a shared reality allows us, for example, to form between languages and do not make claims about switch costs. political or moral convictions, or to evaluate other people or First, increased processing demands for bilinguals can be groups. For instance, when people meet a new employee at observed in functional magnetic resonance imaging (fMRI) their workplace, they tend to form shared impressions of the new- studies in terms of modulation of the blood-oxygen-level-depen- comer with their colleagues. As such, and consistent with extant dent (BOLD) signal. A recent study (Parker Jones et al. 2011) applications of the theory (e.g., Echterhoff et al. 2008), shared compared performance of monolinguals and highly proficient reality essentially reflects an evaluative alignment between com- bilinguals on naming pictures and reading words aloud. They municator and audience regarding a target entity or state of demonstrated that the same five left hemisphere areas sensitive affairs. to increasing demands on speech production in monolinguals Given this conceptualization, the experience of shared reality showed higher activation in bilinguals. More specifically, during and action coordination can be dissociated. Having common rep- word retrieval and articulation, higher activation was found in resentations of an activity and each other’s actions does not mean dorsal precentral gyrus, pars triangularis, pars opercularis, that the actors have a shared reality regarding how to evaluate the superior temporal gyrus, and planum temporale. Word retrieval activity. For example, two high-school graduates may perform a was more demanding for bilinguals than for monolinguals. seamlessly coordinated ballroom dance at a prom, but the two Second, processing costs for highly proficient bilinguals are also may feel about and evaluate the dance differently: Whereas one reported from simple word retrieval tasks. Reaction time latencies of them may experience it as the exciting beginning of a romantic are longer for bilinguals than for monolinguals when they were affair, the other could view it as the mere fulfillment of an obli- naming pictures or producing a list of tokens from a common gation, or could even fear subsequent harassment. semantic category (e.g., Gollan et al. 2002; 2005). This distinction may not come as a surprise because the TLPC Increased language-processing demands are attributed to the is designed to address the key explanandum of psycholinguistic speaker’s necessity of dealing with two languages. Parallel acti- research, that is, the success of conversational interaction, with vation of both languages was observed at all stages of bilingual an emphasis on the rapidity and smoothness of coordination. It speech production (e.g., Guo & Peng 2006). Consequently, task is not designed to address higher-level commonalities, such as performance in one of two available languages involves compe- the sharing of attitudes, evaluations, and judgments. Still, for tition of words of the unwanted language with those of the the sake of further integration and synergy, it may be worthwhile intended language. Therefore, lexical selection is a more demand- to consider the possible interplay of conversation and shared ing process for bilinguals than for monolinguals. As a result, retrie- reality from the perspective of the TLPC. val of words is slower even in highly proficient bilinguals of all ages than in monolinguals (Sorace 2011). Unbalanced bilinguals are better in one language compared with the other. Hence, they have a stronger and a weaker language. Usually their mother tongue is the stronger language The complexity-cost factor in bilingualism whereas a second, later learned language is the weaker language, characterized by incomplete acquisition of lexicon and grammar. doi:10.1017/S0140525X12002579 If the first language undergoes attrition (that is, language loss due to infrequent use of that language as a consequence of Julia Festman migration), this language becomes the weaker language and the University of Potsdam, Potsdam Research Institute for , 14476 second language develops to be the stronger language. In both Potsdam, Germany. cases, if language proficiency is rather low, it is even more [email protected] demanding to produce that weak language. It requires more resources to inhibit the stronger, unwanted language (Meuter & Allport 1999). This means that using a weak language involves Abstract: Language processing changes with the knowledge and use of much language control to avoid unintentional switching to the two languages. The advantage of being bilingual comes at the expense stronger language. Several processing models for bilingualism of increased processing demands and processing costs. I suggest (e.g., Green 1986) capitalize on the concept of language control considering bilingual complexity including these demands and costs. The proposed model claims effortless monolingual processing. By integrating and emphasize the problem of limited resource and processing individual and situational variability, the model would lose its idealistic capacities. Applying the proposed theory to a communicative touch, even for monolinguals. setting with unbalanced bilinguals, I seriously question, in particu- lar, the suggested obligatory action predictions and the effortless Because most people today are bilingual, that is, they are language involvement in interactive language. producers and comprehenders of at least two languages, the pro- P&G acknowledge one condition under which predictions of posed model by Pickering & Garrod (P&G) cannot be generalized one’s own and other’s actions can be difficult, namely, if they in its current version; it accounts only for a small minority of are “unrelated” (an example for a related action would be

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 27 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension ballroom dancing; see target article, sect. 2.3, Joint Action). In my species of action and perception that, accordingly, require expla- view, in the domain of language, the ease of prediction depends nations consistent with those of nonlinguistic acting and perceiv- not only on relatedness, but also, and much more, on familiarity. ing. A second is the recognition that, in contrast to typical Not only are predictions easier and more likely to be correct when theoretical treatments, language production and comprehension communicative participants are familiar with each other and their are thoroughly interleaved. To reflect that, Pickering & Garrod communicative habits, but also the interlocutors must be familiar (P&G) weaken the “horizontal split” in their Figure 1 that separ- with the language they are using for the interaction, and with the ates processes of comprehension and production within an culture of that language. It is certainly rather easy to predict individual. actions of communication in a long-standing relationship (e.g., However, in my view, the particular integration of production the best friend), but for an exchange student – age 16, foreign to and comprehension processes that the authors propose is unrealis- a country, new to a host family, and with little knowledge either tic in consisting of a complex cognitive tiling of predictive model- of the language spoken or of the family’s speech habits – the ing processes (that is, predictions at multiple levels that occur success rate of other’s action prediction is most probably low. repeatedly over time). The proposal falls into the category of Similarly, in foreign-language reading, cultural familiarity has theoretical accounts that Bentley (1941) identified as “sad been found to be crucial for comprehension. Studies on non- responses,” that are sad in unrealistically ascribing responsibility native reading revealed that if readers lack the relevant cultural for behavioral systematicities in the world almost exclusively to knowledge, reading activities could not fully compensate for the processes “in the head” or “in the brain” (p. 13). discrepancy or help readers comprehend a text (Erten & Razı I recommend a different route to understanding language use 2009). that involves weakening separations not only along the horizontal With all this in mind, it becomes clear that the language produ- dimension of P&G’s Figure 1, but also along the vertical dimen- cer in the proposed model is ideal, even as a monolingual. There sion, the one that separates entities bounded, as Bentley (1941) are moments when we suffer a snub from another person’s action: put it, by the skin. Warren (2006) offered an integrated theory we are so surprised that words fail us. Some seem to have the gift of perception and action along these lines, and Marsh et al. of gab, and are more quick-witted than others. For such situa- (e.g., Marsh et al. 2006) provided a compatible ecological tional or inter-individual variability known from everyday life, approach that encompasses social interactions. the current model does not account. It holds the view that inter- Warren (2006) explicitly rejected the model-based approaches locutors are perfectly coordinated and that there is no time gap adopted by P&G on grounds that, (a) in them, implausibly, between interlocutors’ turns. Nonetheless, P&G have been actor/perceivers are supposed to interact directly with their aware of differences between native and non-native speakers or internal models, but indirectly with the world itself, and (b) inter- of the difference between speaking to an adult or a child (see acting with the models means using representations whose origins sect. 4, General Discussion) to some degree, but they did not inte- must appeal “in circular fashion to the very perception and action grate these differences into their model. The authors might want abilities they purport to explain” (p. 361). He offered instead an to elaborate on situations and conditions when the construction of approach in which perceiver/actors and their environments forward models, covert imitations, joint actions, and continuous are modeled as dynamical systems that are coupled both mechani- prediction generation are not ideal, not even for the monolingual. cally and informationally. Adaptive behavior emerges from con- straints arising from the structure of the environment, the biomechanics of the body, perceptual information about the coupled system, and task demands. In the approach of Marsh “ ” et al. to social perception and action (e.g., Marsh et al. 2006; An ecological alternative to a sad response : 2009), the coupled dynamical systems relevant to understanding Public language use transcends the social activity include more than one perceiver/actor. In the boundaries of the skin context of this account, the kinds of alignments among inter- locutors that Garrod and Pickering (2004) identified as fostering doi:10.1017/S0140525X12002580 successful conversation are reflections of the interpersonal coordi- nations that occur when humans come together to engage in joint Carol A. Fowler activities. Department of Psychology, University of Connecticut, Storrs, CT 06269. This approach has promise for understanding interpersonal [email protected] language use in two ways that are relevant to themes in the target article. One is that the approach promises to obviate postulating the Abstract: Embedding theories of language production and comprehension multiple levels of repeated predictive modeling that characterizes in theories of action-perception is realistic and highlights that production P&G’s account. Some perceptual information is prospective in and comprehension processes are interleaved. However, layers of internal nature in signaling what will happen. Moreover, use of prospective models that repeatedly predict future linguistic actions and perceptions are implausible. I sketch an ecological alternative whereby perceiver/ stimulus information (e.g., about nutrients at some distance) actors are modeled as dynamical systems coupled to one another and to occurs among organisms that lack a nervous system and hence the environment. lack the means to construct forward and inverse models (see, e.g., Reed 1996). Turning to humans, an outfielder for example, In investigating between-person language use, Pickering & does not need to predict where and when a fly ball will become Garrod (P&G) have taken a road less traveled. Psychology of catchable; structure in reflected light over time provides prospec- language has been dominated by studies of private language pro- tive information about the ball’s future trajectory (e.g., Michaels & cessing within individuals, respecting the view of Noam Chomsky Oudejans 1992). Likewise, information in reflected light can signal that public language, “however construed, appears to have no sig- whether it is safe to cross a street in traffic without pedestrians nificance” (1986, p. 31). I particularly admire Garrod and Picker- having to predict whether or when a vehicle will cross their path ing’s (2004) paper in which they invoked between-person (e.g., Oudejans et al. 1996). In short, prospective information alignments at multiple linguistic levels to explain why, for most can constrain action without intervening predictions being of us, conversation is easier than monologue even though conver- made. In language, the prospective information can be pragmatic, sation requires coordination with others. syntactic, lexical, semantic, phonological, and so on. The approach presented in the target article is important and A second domain in which the ecological approach may have valuable in other ways. One is the recognition that language pro- promise for understanding language production and comprehen- duction and comprehension at its various descriptive levels are sion concerns the pervasive findings to which P&G allude of

28 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension imitation or, less often, complementation in language use and aphasia reported by Marshall et al. (1985). Although this patient’s elsewhere. I suggest that imitation occurs pervasively in laboratory auditory system was intact, she was unable to comprehend familiar research, because perception of the actions of someone else sounds, words, or sentences and could not report number of syl- essentially serves as instructions for imitation (e.g., Fowler et al. lables or stress contrasts. Speech production was seriously 2003). Therefore, it is often easy to imitate, and imitation can impaired, with speech often containing neologistic jargon. But be a default reflection of the disposition of humans to coordinate despite the patient’s severe comprehension problems, she pro- with one another (e.g., Richardson et al. 2007). In general, duced a great many (often unsuccessful) attempts at self-correc- however, imitation is not always an adaptive response to the tions of errors – in particular, her phonological errors (but not actions of someone else. Nor is complementing what is perceived. her semantic errors). These findings suggest that error detection Missing from much research on “embodied cognition” are task can take place without perception. constraints that encourage anything other than default mirroring. Another patient that challenges perception-based monitoring However, when imitation is discouraged, embodied responses to is G., a 71-year-old Dutch Broca’s aphasic (Oomen et al. 2005). language input may occur that are not imitative (cf. Olmstead In a speech production task, G. produced many phonological et al. 2009). In the context of tasks that make other kinds of inter- errors, of which he repaired very few, whereas he produced personal coordinations relevant, imitative responses may not be as very few semantic errors, which he usually repaired. G.’s pro- pervasive as they are in typical laboratory research. duction difficulties hence mirror his monitoring difficulties. Indeed, an important issue for understanding public language Importantly, G.’s difficulty to repair phonological errors cannot concerns how in general utterances constrain users’ behavior. be attributed to a perception deficit: In a perception task, he When someone shouts Duck! to a bicyclist obliviously approach- detected as many phonological errors as a group of controls. It ing a low-hanging branch, an adaptive response is, in fact, to therefore seems that G.’s monitoring deficit is related to this pro- duck (not to imitate the utterance). Research using the visual duction deficit and not to any perception deficit. These data argue world procedure (e.g., Ferreira & Tanenhaus 2007), albeit against a forward model account, because the forward model is a designed with other purposes in mind, suggests that language separate and qualitatively different system from the production can affect nonimitative adaptive action (in this case, by the eyes) implementer. There is no reason why the monitoring deficit very quickly. Investigation of language use embedded in meaning- should mirror the production deficit. ful contexts may help to reveal whether or not imitation is Our recent visual world eye-tracking data (Huettig & Hartsuiker fundamental. 2010) also speak against both perceptual loop and forward modeling accounts. When our subjects named a picture of a heart, they gazed at the phonologically related written word harp more often than at unrelated words, and this “competitor effect” had the same time course as the analogous effect when lis- Are forward models enough to explain tening to someone else (Huettig & McQueen 2007). These data self-monitoring? Insights from patients are not consistent with perceptual loop accounts, which predict and eye movements earlier competitor effects in production than in comprehension, because the phonological representation inspected by the inner doi:10.1017/S0140525X12002749 loop precedes external speech by a considerable amount of time (namely, the time articulation takes). Similarly, the represen- Robert J. Hartsuiker tations created by forward models also precede overt speech in Department of Experimental Psychology, Ghent University, B-9000 Ghent, time, so a monitoring with forward models account also predicts Belgium. an early competitor effect. After all, forward models are predic- [email protected] tions of what one will say, and fixation patterns in the visual∼rhartsui/ world are strongly affected by prediction. One might object that the predicted phonological percept is too impoverished to ’ ’ Abstract: At the core of Pickering & Garrod s (P&G s) theory is a monitor create a phonological competitor effect. But this seems to contrast that uses forward models. I argue that this account is challenged by with Heinks-Maldonado et al.’s(2006) magnetoencephalography neuropsychological findings and visual world eye-tracking data and that it has two conceptual problems. I propose that conflict monitoring data showing reafferance cancellation with frequency-shifted avoids these issues and should be considered a promising alternative to feedback, which implies a highly detailed phonological percept perceptual loop and forward modeling theories. that even includes pitch. There are also conceptual issues with monitoring via forward At the core of Pickering & Garrod’s (P&G’s) theory of language models. One is the reduplication of processing systems (Levelt production and comprehension is the monitor, a cognitive mech- 1989). Specifically, the production implementer creates semantic, anism that allows speakers to detect problems in speech. The pro- syntactic, and phonological representations whereas a forward posal is that the monitor compares perceptual representations model creates corresponding representations. If the forward from two channels, namely (a) a perceptual representation of model creates highly accurate representations at each level, we the semantics, syntax, and phonology that the “production imple- have two separate systems doing almost the same thing, which menter” is producing; and (b) forward perceptual representations is not parsimonious. But if the forward model creates highly impo- of forward production representations of each linguistic level. If verished representations (e.g., only one phoneme), such represen- there is a mismatch, the monitor has detected the problem and tations are not a good standard for judging correctness. It then so can begin a correction. Monitoring via forward models is part becomes difficult to see how speakers detect so many errors at of an elegant account that nicely integrates the action and so many linguistic levels. language literatures. But is this account compatible with findings Additionally, if we assume the output of forward models, on speech monitoring? although impoverished, is still good enough to be useful for the The model shares with perceptual loop theories (Hartsuiker & monitor, then the monitor will have to “trust” the forward Kolk 2001; Levelt 1989) the assumption that monitoring needs model, just like P&G’s metaphorical sailor having to trust his speech comprehension (i.e., to create a perceptual representation charted route. But there is no a priori reason for assuming that of produced speech). It therefore shares the problems of other the forward model is less error prone than the production imple- models with a perceptual component. One problem is that menter; in fact, if forward models are “quick and dirty” they will neuropsychological studies found dissociations between compre- be more error prone. Trust in the forward model will then be mis- hension and monitoring in brain-damaged patients. A striking placed. But such misplaced trust has the undesirable effect of example is a 62-year-old woman with auditory agnosia and creating “false alarms,” so that a correct item is replaced by an

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 29 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension error. “Corrections” that make speech worse do not seem to occur predictions (all is well, carry on). Hence, if motor prediction frequently, although some repetitions may in fact be misplaced were used in the context of perception, it would tend to suppress corrections (e.g., Hartsuiker & Notebaert 2010). sensitivity to that which is predicted, whereas an efficient mechan- P&G briefly mention an alternative to both the perceptual loop ism should enhance perception. Behavioral evidence bears this account and the forward model account, namely, a conflict moni- out. The system is less sensitive to the perceptual effects of self- toring account (Botvinick et al. 2001; Mattson & Baars 1992; generated actions (unless there is a deviation) than to externally Nozari et al. 2011). According to such accounts, monitoring generated perceptual events. Some of the “reafference can- does not use comprehension, but measures the amount of “con- cellation” effects noted by P&G are good examples: inability to flict” in each layer of production representations, assuming that self-tickle, saccadic suppression of motion percepts, and the conflict is a sign of error. Conflict monitoring has the advantages motor-induced suppression effect measured electrophysiologi- of allowing error detection without perception and that a pro- cally. This contrasts with nonmotor forms of prediction, what duction deficit at a given level is straightforwardly related to an P&G referred to as the association route, which might include error detection deficit at that level. Such an account is consistent context effects and priming and which tend to facilitate perceptual with Huettig and Hartsuiker’s (2010) eye-tracking data and avoids recognition. Put simply, motor prediction decreases perceptual the reduplication of processing components. Finally, in a conflict sensitivity to the predicted sensory event, nonmotor prediction monitoring account, there is also no issue of which representation increases perceptual sensitivity to the predicted sensory event. to “trust.” It is worth therefore considering conflict monitoring as Why, then, is there so much attention on motor-based prediction? a viable alternative to the perceptual loop and forward modeling P&G argue that there is evidence to support a role for motor accounts. prediction in language-related perceptual processes – a good reason to focus attention on a motor-based prediction process. There are problems with the evidence they cite, however. One cannot infer causation from motor activation during perception Predictive coding? Yes, but from what source? (it could be pure associative priming [Heyes, 2010; Hickok, 2009a]), the transcranial magnetic stimulation (TMS) evidence doi:10.1017/S0140525X12002750 in speech perception tasks is likely a response bias effect (Venezia et al. 2012), and the studies showing effects of imitation Gregory Hickok training on perception do not necessarily imply that imitation is Department of Cognitive Sciences, University of California, Irvine, CA 92697. carried out during perception, which is the claim that P&G wish [email protected] to make. There is a better way to conceptualize the architecture of the system, one that flows naturally out of fairly well-established Abstract: There is little doubt that predictive coding is an important models of cortical organization (Hickok & Poeppel 2007; Milner mechanism in language processing – indeed, in information processing & Goodale 1995). A dorsal stream subserves sensory-motor inte- generally. However, it is less clear whether the action system is the gration for motor control; it is a highly adaptable system (Catmur source of such predictions during perception. Here I summarize the et al. 2007) that links sensory targets (objects in space, sequences computational problem with motor prediction for perceptual processes of phonemes) with motor systems tuned to hit those targets under and argue instead for a dual-stream model of predictive coding. varying conditions. A ventral stream subserves the linkage Predictive coding is in vogue in cognitive neuroscience, probably between sensory inputs and conceptual memory systems; it is a for good reason, and we are no strangers to the idea in the domain more stable system designed to abstract over irrelevant sensory of speech (Hickok et al. 2011; van Wassenhove et al. 2005). The details. Both systems enlist predictive coding as a fundamental current trendsetters in predictive coding are the motor control computational strategy (Friston et al. 2010), but both in the crowd who have developed, empirically validated, and promoted service of what the systems are designed for computationally. Motor prediction facilitates motor behavioral (but suppresses per- the notion of internal forward models as a neural mechanism “ ” “ ” necessary for smooth, efficient motor control (Kawato, 1999; ception) and sensory or ventral stream prediction facilitates Shadmehr et al. 2010; Wolpert et al. 1995). But the basic idea perception (Hickok 2012b). has been pervasive in cognitive science for decades in the form P&G underline that their approach blurs the line between com- prehension and production and thus rejects the “cognitive sand- of theoretical proposals like analysis-by-synthesis (Stevens & ” Halle 1967) and in the form of empirical observations like wich view, whereas the alternative perspective just outlined priming, context and top-down effects, and the like. So Pickering might be interpreted as preserving the comprehension- & Garrod’s (P&G’s) claim that language comprehension involves production distinction. In this context, it is worth pointing out prediction is nothing new. Nor is it a particularly novel claim, that P&G do not actually blur the distinction between the two right or wrong, that the motor system might be involved in recep- slices of bread all that much. They are quite distinct compu- tive language; it has gotten much attention in the domain of tational and representational components as their c and p notation speech perception/phonemic processing, for example (Hickok attests, and they have even added some slices, an action et al. 2011; Rauschecker & Scott 2009; Sams et al. 2005; van Was- implementation system (p), a forward production model (p- senhove et al. 2005; Wilson & Iacoboni 2006) and has been a com- hat), a forward comprehension model (c-hat) and a perceptual system (c), each of which generates phonological, syntactic, and ponent of at least some aspects of sentence processing models for – decades (Crain & Fodor 1985; Frazier & Flores d’Arcais 1989; semantic representations nearly a loaf of bread. They do Gibson & Hickok 1993). What appears to be new here is the argue, correctly in my view, and consistent with many speech scientists and motor control researchers as well as the classical idea that prediction at the syntactic and semantic levels can ’ come out of the action system rather than being part of a purely aphasiologists (despite P&G s claims to the contrary), that com- perceptual mechanism. prehension and production systems must interact. We make the This is an interesting idea worth investigation, but it is impor- same claims of our dorsal stream (Hickok 2012a; Hickok & Poeppel 2007). But where P&G and others – including myself tant to note that there are computational reasons why motor pre- – diction generally is an inefficient, or even maladaptive, source for (Hickok et al. 2011) have gone wrong, in my view, is that they predictive coding during receptive functions. Here’s the heart of are trying to shoehorn a motor-control-based mechanism into a the problem. The computational goal of a motor prediction in perceptual system that it was not designed to serve. the context of action control is to increase perceptual sensitivity to deviations from prediction (because something is wrong and ACKNOWLEDGMENT correction is needed) and to decrease sensitivity to accurate Supported by NIH grant DC009659.

30 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension

“Well, that’s one way”: Interactivity in parsing Hjalmarsson’s(2010), which compares string-based plans and and production computes the difference between the input speech plan and the current state of realisation. In our model, instead of having to doi:10.1017/S0140525X12002592 regenerate a new speech plan from scratch, we can repair the necessary increments, reusing representations already built up Christine Howes, Patrick G. T. Healey, Arash Eshghi, and in context, which are accessible to both speaker and hearer. Cur- Julian Hough rently, it is difficult to distinguish empirically between a dual-path Queen Mary University of London, Cognitive Science Research Group, model with predictions and a single-path incremental model School of Electronic Engineering and Computer Science, London E1 4NS, because both combine production and comprehension. United Kingdom. As the paper highlights, the “vertical” issue of interleaving pro- [email protected] [email protected] duction and comprehension is independent from the “horizontal” [email protected] [email protected] problem of accounting for how language use is coordinated in dia-∼chrizba/ logue. Nonetheless, this article extends previous Pickering and Garrod work (2004; 2007) in claiming that the model of intra- Abstract: We present empirical evidence from dialogue that challenges individual processing can be extended to inter-individual language some of the key assumptions in the Pickering & Garrod (P&G) model of processing (conversation). Unlike previous work, the new model speaker-hearer coordination in dialogue. The P&G model also invokes operates in different ways for speakers and hearers, and the an unnecessarily complex set of mechanisms. We show that a potential for differences between people’s dialogue contexts is computational implementation, currently in development and based on acknowledged (although not directly modelled). a simpler model, can account for more of this type of dialogue data. The problem with this generalisation is that in dialogue we do Pickering & Garrod’s(P&G’s) programmatic aim is to develop an not just predict what people are going to say, we also respond. integrated model of production and comprehension that can Even if I could predict what question you are about to ask, this explain intra-individual and inter-individual language processing does not determine my answer (although it might allow me to (Pickering & Garrod 2004; 2007). The mechanism they propose, respond more quickly). In terms of turn structure, all a prediction built on an analogy to neuro-computational theories of hand move- can do is make it easier for me to repeat you. Repetition does ments, involves producing and comparing two representations of occur in dialogue but is rare and limited to special contexts. each utterance; a full one containing all the structure necessary to Corpus studies (Healey et al. 2010) indicate that we repeat few produce the utterance and an “impoverished” efference copy that words (less than 4%) and little more syntactic structure (less can predict the approximate shape the utterance should have. than 1%) than would be expected by chance. Crudely, a cross- Although not our central concern, there is a tension between person prediction model of production-comprehension cannot endowing the efference copy with enough structure to be able explain 96% of what is actually said in ordinary conversation. to predict semantic, syntactic, and phonetic features of an utter- One conversational context that seems to depend on the ability ance and nonetheless making it reduced enough that it can be pro- to make online predictions about what someone is about to say is duced ahead of the utterance itself. To avoid a situation in which compound contributions, in which one dialogue contribution con- the “impoverishment” proposed for the efference copy is just tinues another, as in this excerpt from Lerner (1991): fi those things not required to t the data, we need independently Daughter: Oh here Dad, one way to get those corners out motivated constraints on its structure. Father: is to stick your fingers inside Neuro-computational considerations might provide such con- Daughter: Well, that’s one way. straints, but there are dis-analogies with the models of motor control P&G use as motivation. Efferent copies were originally Although it is unclear whether a predictive model better proposed to enable rapid cancellation of self-produced sensory accounts for the father’s continuation than one in which he is feedback, for example, to maintain a stable retinal image by can- building a response based on his partial parse of the linguistic celling out changes due to eye-movements. However, the claim input, the daughter’s response seems to be based on the mismatch that we use an analogous mechanism to predict, and correct, lin- between what was said and what she had planned to say. Although guistic structure before an utterance is produced involves some- possible she was predicting he would say what she herself had thing conceptually different. The awkwardness of phrases such planned to, there is no need for this additional assumption. as “semantic percept” highlight this difference; until the utterance Many cases of other-repair (Schegloff 1992) such as clarification is actually produced there is nothing to generate the appropriate requests asking what was meant by what was said (e.g., “what?”) sensory percept. Conversely, if the “percept” is internal we are also seem to require that any predictability used is impoverished still in the cognitive sandwich. at precisely the level it might be useful. These points aside, the target article provides a valuable overview In a study on responses to incomplete utterances in dialogue of the evidence that language production and comprehension are (Howes et al. 2012), increased syntactic predictability led to tightly interwoven. P&G’s main target, the “traditional model”, more clarification requests. Although participants made use of treats whole sentences, “messages” or utterances as the basic unit different types of predictability in producing continuations, pre- of production and comprehension. However, there is evidence dictability was neither necessary nor sufficient to prompt com- from cognitive psycholinguistics and neuroscience to show that pletion, and, in extremely predictable cases, participants did not language processing is tightly interleaved around smaller units. complete the utterance, responding as if the predictable elements The close interconnections between production and comprehen- had been produced. Our assumption is that it is the things we sion are especially clear in dialogue where fragmentary utterances cannot predict that are the most important parts of conversation. are commonplace and people often actively collaborate with each Otherwise, it is hard to see why we should speak at all. other in the production of each turn (Goodwin 1979). It is unclear if the interleaving of production and comprehen- sion requires internally structured predictive models. Recent pro- gress on incremental models of dialogue suggest a more Seeking predictions from a predictive framework parsimonious approach. In our computational implementation based on Dynamic Syntax (Purver et al. 2006; 2011; Hough doi:10.1017/S0140525X12002762 2011), the burden of predicting full utterances does not need to be employed in parsing, as speakers and hearers have incremental T. Florian Jaegera,b and Victor Ferreirac access to representations of utterances as these emerge. Contra- aDepartment of Brain and Cognitive Sciences, University of Rochester, rily, P&G’s approach to self-repairs is analogous to Skantze and Rochester, NY 14627-0268; bDepartment of Computer Science, University of

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 31 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension

Rochester, Rochester, NY 14627; cDepartment of Psychology 0109, University words with low confusability tend to enter the lexicon (e.g., of California, San Diego, La Jolla, CA 92093-0109. “strodny,” rather than “extrary,” for “extraordinary”). If, [email protected] [email protected] however, the function of the forward model is to predict linguistic units, as P&G propose, no such generalization is expected. Rather, any deviation from the target phonology will cause a prediction Abstract: We welcome the proposal to use forward models to understand error, regardless of whether it affects the likelihood of being predictive processes in language processing. However, Pickering & Garrod (P&G) miss the opportunity to provide a strong framework for future understood. Similar reasoning applies to the reduction of morpho- work. Forward models need to be pursued in the context of learning. syntactic units, which often is blocked when it would cause This naturally leads to questions about what prediction error these systemic ambiguity (e.g., differential or optional case-marking, models aim to minimize. Fedzechkina et al. 2012; see also Ferreira 2008). Research on motor control finds that not all prediction errors Pickering & Garrod (P&G) are not the first to propose that com- are created equal: Stronger adaptation effects are found after prehension is a predictive process (e.g., Hale 2001; Levy 2008; task-relevant errors (Wei & Körding 2009). Indeed, in a recent Ramscar et al. 2010). Similarly, recent work has found that perturbation study on production, Frank (2011) found that speak- language production is sensitive to prediction in ways closely ers exhibit stronger error correction if the perceived deviation resembling comprehension (e.g., Aylett & Turk 2004; Jaeger from the intended acoustics makes the actual production more 2010). We believe that forward models (1) offer an elegant similar to an existing word (see also Perkell et al. 2004). account of prediction effects and (2) provide a framework that This view also addresses another shortcoming of P&G’s propo- could generate novel predictions and guide future work. sal. At several points, P&G state that the forward models make However, in our view, the proposal by P&G fails to advance impoverished predictions. Perhaps predictions are impoverished either goal because it does not take into account two important only in that they map the efference copy directly onto the pre- properties of forward models. The first is learning; the second is dicted meaning (rather than the intermediate linguistic units). the nature of the prediction error that the forward model is Of course, the goal of reducing the prediction error for efficient minimizing. information transfer is achieved by reducing the prediction error Learning. Forward models have been a successful framework at the levels assumed by P&G. In this case, the architecture for motor control in large part because they provide a unifying fra- assumed by P&G would follow from the more general principle mework, not only for prediction, but also for learning. Since their described here. However, in a truly predictive learning framework inception, forward models have been used to study learning – (Clark 2013), there is no guarantee that the levels of represen- both acquisition and adaptation throughout life. However, tation that such models would learn in order to minimize predic- except for a brief mention of “tuning” (target article, sect. 3.1, tion errors would neatly map onto those traditionally assumed (cf. para. 15), P&G do not discuss what predictions their framework Baayen et al. 2011). makes for implicit learning during language production, despite Finally, we note that, in the architecture proposed by P&G, the the fact that construing language processing as prediction in the production forward model seems to serve no purpose but to be context of learning readily explains otherwise puzzling findings the input of the comprehension forward model (sect. 3.1, from production (e.g., Roche et al. 2013; Warker & Dell 2006), Fig. 5; sect. 3.2, Fig. 6). Presumably, the output of, for example, comprehension (e.g., Clayards et al. 2008; Farmer et al. 2013; the syntactic production forward model will be a syntactic plan. Kleinschmidt et al. 2012) and acquisition (Ramscar et al. 2010). Hence, the syntactic comprehension forward model takes syntac- If connected to learning, forward models can explain how we tic plans as input. The output of that comprehension forward learn to align our predictions during dialogue (i.e., learning in model must be akin to a parse, as it is compared to the output order to reduce future prediction errors, Fine et al., submitted; of the actual comprehension model. Neither of these components Jaeger & Snider 2013; for related ideas, see also Chang et al. seems to fulfill any independent purpose. Why not map straight 2006; Fine & Jaeger 2013; Kleinschmidt & Jaeger 2011; Sonder- from the syntactic efference copy to the predicted “syntactic egger & Yu 2010). percept”? If forward models are used as a computational frame- Prediction errors. Deriving testable predictions from forward work, rather than as metaphor, one of their strengths is that models is integrally tied to the nature of the prediction error they can map efference copies directly onto the reference that the system is meant to minimize during self- and other-moni- frame that is required for effective learning and minimization of toring (i.e., the function of the model, cf. Guenther et al. 1998). the relevant prediction error (cf. Guenther et al. 1998). P&G do not explicitly address this. They do, however, propose separate forward models at all levels of linguistic representations. These forward models seem to have just one function, to predict the perceived linguistic unit at each level. For example, the syn- tactic forward model predicts the “syntactic percept,” which is used to decide whether the production plan needs to be adjusted Prediction plays a key role in language (how this comparison proceeds and what determines its outcome development as well as processing is left unspecified). Minimizing communication error: A proposal. If one of the doi:10.1017/S0140525X12002609 – goals of language production is to be understood or even to com- a a fi Matt A. Johnson, Nicholas B. Turk-Browne, and municate the intended message both robustly and ef ciently b (Jaeger 2010; Lindblom 1990) – correctly predicting the intended Adele E. Goldberg aDepartment of Psychology, Princeton University, Princeton, NJ 08544; linguistic units should only be relevant to the extent that not doing b so impedes being understood. Therefore, the prediction error that Program in Linguistics, Princeton University, Princeton, NJ 08544. forward models in production should aim to minimize is not the [email protected] [email protected] perception of linguistic units, but the outcome of the entire infer- [email protected] ∼ ence process that constitutes comprehension. Support for this adele alternative view comes from work on motor control, work on Abstract: Although the target article emphasizes the important role of articulation, and cross-linguistic properties of language. fi prediction in language use, prediction may well also play a key role in For example, if the speaker produces an underspeci ed refer- the initial formation of linguistic representations, that is, in language ential expression but is understood, there is no need to self- development. We outline the role of prediction in three relevant language- correct (as observed in research on conceptual pacts, Brennan & learning domains: transitional probabilities, statistical preemption, and Clark 1996). This view would explain why only reductions of construction learning.

32 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension

Pickering & Garrod (P&G) argue forcefully that language pro- that have been heard. Rather, learners must generalize over utter- duction and language comprehension are richly interwoven, ances in order to understand and produce new formulations. The allowing for fluid, highly interactive discourse to unfold. They learning of novel phrasal constructions involves learning to associ- note that a key feature of language that makes such fluidity poss- ate form with meaning, such as the double object pattern with ible is the pervasive use of prediction. Speakers predict and “intended transfer.” Note, for example, that She mooped him monitor their own language as they speak, allowing them to something implies that she intends to give him something, and plan ahead and self-correct, and listeners predict upcoming utter- this meaning cannot be attributed to the nonsense word, moop. ances as they listen. The authors in fact provide evidence for pre- In the domain of phrasal construction learning, phrasal construc- dictive strategies at every level of language use: from phonology, tions appear to be at least as good predictors of overall sentence to lexical semantics, syntax, and . meaning as individual verbs (Bencini & Goldberg 2000; Goldberg Given the ubiquity of prediction in language use, an interesting et al. 2005). consideration that P&G touch on only briefly is how prediction We have recently investigated the brain systems involved in may be involved in the initial formation of linguistic represen- learning novel constructions. While undergoing functional mag- tations, that is, in language development. Indeed, the authors netic resonance imaging (fMRI), participants were shown short draw heavily from forward modeling, invoking the Wolpert audiovisual clips that provided the opportunity to learn novel con- models as a possible schematic for their dynamic, prediction- structions. For example, a novel “appearance” construction con- based system. And although their inclusion is surely appropriate sisted of various characters appearing on or in another object, for discourse and language use, these models are fundamentally with the word order Verb-NPtheme-NPlocative, (where NP is noun models of learning (e.g., Wolpert 1997; Wolpert et al. 2001). phrase). For each construction, there was a patterned condition Hence, the degree to which our predictions are fulfilled (or vio- and a random condition. In the patterned condition, videos lated) might have enormous consequences for linguistic represen- were consistently narrated by the V-NPtheme-NPlocative pattern, tations and, ultimately, for the predictions we make in the future. enabling participants to associate the abstract form and More generally, prediction has long been viewed as essential to meaning. In the random condition, the exact same videos were learning (e.g., Rescorla & Wagner 1972). shown in the same order, but the words were randomly reordered; Prediction might play an important role in language develop- this inconsistency prevented successful construction learning. ment in several ways, such as when using transitional probabilities, Most relevant to present purposes, we found an inverse relation- when avoiding overgeneralizations, and when mapping form and ship between ventral striatal (VS) activity and learning for pat- meaning in novel phrasal constructions. Each of these three terned presentations only: Greater test accuracy on new case studies is described, as follows. instances (requiring generalization) was correlated with less Transitional probabilities. Extracting the probability of Q given ventral striatal activity during learning. In other tasks, VS gauges P can be useful in initial word segmentation (Graf Estes et al. the discrepancy between predictions and outcomes, signaling 2007; Saffran et al. 1996), word learning (Hay et al. 2011; that something new can be learned (Niv & Montague 2008; Mirman et al. 2008), and grammar learning (Gomez & Gerken O’Doherty et al. 2004; Pagnoni et al. 2002). This activity may 1999; Saffran 2002). A compelling way to interpret the contri- therefore suggest a role for prediction in construction learning: bution of transitional probabilities to learning is that P allows lear- Better learning results in more accurate predictions of how the ners to form an expectation of Q (Turk-Browne et al. 2010). In scene will unfold. fact, sensitivity to transitional probabilities correlates positively Such prediction-based learning may therefore be a natural con- with the ability to use word predictability to facilitate comprehen- sequence of making implicit predictions during language pro- sion under noisy input conditions (Conway et al. 2010). Moreover, duction and comprehension. Future research is needed to sensitivity to sequential expectations also correlates positively with elucidate the scope of this prediction-based learning mechanism, the ability to successfully process complex, long-distance depen- and to understand its role in language. Such investigations would dencies in natural language (Misyak et al. 2010). Simple recurrent strengthen and ground P&G’s proposal, and would suggest that networks (SRNs) rely on prediction error to correct connection predictions are central to both language use and language weights, and appear to learn certain aspects of language in development. much the same way as children do (Elman 1991; 1993; Lewis & Elman 2001; French et al. 2011). Statistical preemption. Children routinely make overgeneraliza- tion errors, producing foots instead of feet,orShe disappeared the quarter instead of She made the quarter disappear. A number of Communicative intentions can modulate the theorists have suggested that learners implicitly predict upcoming linguistic perception-action link formulations and compare witnessed formulations to their predic- tions, resulting in error-driven learning. That is, in contexts in doi:10.1017/S0140525X12002610 which A is expected or predicted, but B is repeatedly used Yoshihisa Kashima,a Harold Bekkering,b and instead: children learn that B, not A, is the appropriate formu- c lation – B statistically preempts A. Preemption is well accepted Emiko S. Kashima in morphology (e.g., went preempts goed; Aronoff 1976; Kiparsky aMelbourne School of Psychological Sciences, The University of Melbourne, b 1982). Parkville, Victoria 3010, Australia; Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen, 6500 HE Nijmegen, Unlike went and goed, distinct phrasal constructions are vir- c tually never semantically and pragmatically identical. Nonethe- The Netherlands; School of Psychological Science, La Trobe University, Bundoora, Victoria 3086, Australia. less, if learners consistently witness one construction in contexts [email protected] [email protected] where they might have expected to hear another, the former [email protected] can statistically preempt the latter (Goldberg 1995; 2006; 2011; Marcotte 2005). For example, if learners expect to hear disappear used transitively in relevant contexts (e.g., She disappeared it), but instead consistently hear it used periphrastically (e.g., She made it uname=ekashima disappear), they appear to read just future predictions so that they ultimately prefer the periphrastic causative (Boyd & Goldberg Abstract: Although applauding Pickering & Garrod’s (P&G’s) attempt to 2011; Brooks & Tomasello 1999; Suttle & Goldberg forthcoming). ground language use in the ideomotor perception-action link, which Construction learning. Because possible sentences form an provides an “infrastructure” of embodied social interaction, we suggest open-ended set, it is not sufficient to simply memorize utterances that it needs to be complemented by an additional control mechanism

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 33 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension that modulates its operation in the service of the language users’ had no effect. These results are consistent with Carpenter’s communicative intentions. Implications for intergroup relationships and (1852) original proposal of the ideomotor principle, which goes intercultural communication are discussed. beyond perception-guided movement and stresses the impor- tance of conceptual action intention and expectation in the gui- Pickering & Garrod (P&G) collapse the oft-made separation dance of voluntary behaviour (see Ondobaka & Bekkering 2012, between comprehension and production, and propose to concep- for a recent review). tualise language processing in terms analogous to the ideomotor Analogously, communicative intentions may modulate the lin- account of action perception and motor action. In a nutshell, guistic perception-action link. For instance, conversants’ accents the ideomotor principle (1) rejects the modular view of percep- often converge (e.g., Giles et al. 1991), and this can be interpreted tion, cognition, and action, dubbed by Hurley (2008a) as the cog- within P&G’s framework. However, there is evidence to suggest nitive sandwich, and (2) suggests that perception and action are that accent convergence depends on the conversant’s communica- closely linked together, and cognition acts as a fallible control tive intentions. Bourhis and Giles (1977) examined bilingual mechanism that modulates and is modulated by the dynamic per- English-Welsh speakers’ accent change in a conversation with ception-action link. P&G highlight the perception-action link and an English interviewer. When their Welsh identity was threa- reconceptualise language comprehension (≈perception) and pro- tened, their accent depended on their personal goals. Those duction (≈action) as highly dynamic and closely linked processes who were learning Welsh to extend their careers showed a conver- that cannot be separated. In so doing, their proposal sheds an gence to the interviewer’s accent, whereas those who were learn- intriguing light on the mystery of language-based social inter- ing the language because of their Welsh identity showed a action – how humans can communicate with each other so effi- divergence, strengthening their Welsh accent. Babel’s(2010) ciently and smoothly to carry out their joint activity despite recent study partly replicated this finding. Native speakers of some obvious issues of occasional miscommunication and com- New Zealand English listened to an Australian English speaker munication breakdown. and pronounced the same words (i.e., shadowed productions). We applaud their attempt to ground language use in the mech- Despite the similarities between New Zealand and Australian anisms of motor perception and action, and we endorse their the- English dialects, there are detectable and systematic differences. orizing that the perception-action substrate provides an The participants who had more-negative implicit attitudes “infrastructure” of language-based social interaction. However, towards Australia (vs. New Zealand) showed less convergence to we also believe that this infrastructure needs to be complemented the Australian accent. by an additional control mechanism that modulates its operation, These findings in language use as well as nonverbal mimicry or cognition, the “ideo” part of ideomotor theory. We suggest that (e.g., Castelli et al. 2009) suggest that communicative and action one such mechanism is language users’ communicative intentions. intentions can modulate the verbal and nonverbal perception- What are communicative intentions? Language is typically action link in joint activities. This raises at least two classes of criti- used within the social context of joint activities (e.g., Clark cal questions. First, how do communicative and action intentions 1996; Kashima & Lan, in press). As Bratman (1992) noted, com- regulate the embodied substrate of social interaction? What are mitment to a joint activity and readiness to be responsive to the the mechanisms for the dynamic and mutual influences between intentions and actions of one’s partner are integral to an inten- the conceptual strata of intentions and the embodied strata of per- tional joint activity. The participants in a joint activity hold joint ception-action link? Second, what social and cultural processes intentions, or intentions to carry out their individual parts of the contribute to the shaping of the two strata, which in turn can joint activity to attain their joint goal while coordinating each have a long-term impact on the evolution of macro structures other’s actions (e.g., Bratman 1999; Pacherie 2012; Tuomela such as culture and society (e.g., Holtgraves & Kashima 2008)? 2007). By communicative intentions we mean intentions to In particular, the interaction of intentions and embodiment may carry out largely the linguistic part of the joint activity by perform- play a critical role in intergroup differentiation (e.g., Welsh vs. ing illocutionary and locutionary acts to attain its goal. English; Australian vs. New Zealander), the maintenance and dis- Indeed, P&G briefly alluded to the importance of joint inten- solution of the intergroup boundary, as well as intergroup and tions in language use: “linguistic joint action is more likely to be intercultural communication and understanding. successful and well-coordinated than many other forms of joint action, precisely because the interlocutors communicate with each other and share the goal of mutual understanding” (target article, sect. 3.3, para. 8). Echoing their sentiment and building on it more explicitly, we argue that intentional communicative Preparing to be punched: Prediction may not mechanisms can partly regulate the comprehension-production always require inference of intentions processes, and this can have significant sociocultural implications. Intentions and perception-action link. The modulation of doi:10.1017/S0140525X12002622 perception-action link by shared intentions was suggested by Ondobaka et al.’s (2011) work. In this experiment involving a con- Helene Kreysa federate and a naïve participant, the confederate first performed Department for General Psychology and Cognitive Neuroscience, Friedrich an action (e.g., selecting the higher of two numbers) on each Schiller University Jena, 07743 Jena, Germany. trial, and the participant performed an action congruent or incon- [email protected] gruent with the confederate’s action intention, that is, doing the same thing (selecting a higher number) or the opposite (selecting ’ ’ a lower number). However, the correct action for the participant Abstract: Pickering & Garrod s (P&G s) framework assumes an efference ’ copy based on the interlocutor’s intentions. Yet, elaborate attribution of was motorically congruous with the confederate s action in some intentions may not always be necessary for online prediction. Instead, cases (selecting the number displayed on the same side), but contextual cues such as speaker gaze can provide similar information incongruous in others (selecting the number on the opposite with a lower demand on processing resources. side). If the perception-action link is fixed and not intentionally modulated, the participant’s perception of an action should At several points in their target article, Pickering & Garrod (P&G) facilitate a motorically congruent action, but interfere with a suggest that prediction by simulation is based on determining a motorically incongruent action regardless of action intentions. conversational partner’s intention through perceiving the unfold- On the contrary, this occurred only when action intentions were ing action or speech, potentially combined with background congruous (i.e., the co-actors shared the same goal). When knowledge. On this basis, an efference copy of the intended act action intentions were incongruous, congruity of motor actions is generated, enabling the prediction of upcoming behavior

34 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension and the production of behavior or speech that complements recent reviews see Altmann 2011 and Huettig et al. 2011). It is it. But how explicit do the attributed intentions need to be to interesting to consider P&G’s classification and to speculate on allow such prediction, and how might comprehenders derive whether this might be prediction by association (e.g., “people them? who look at objects often mention them shortly thereafter”)or Unfortunately, P&G do not define clearly what they mean by even prediction by simulation (using one’s own gaze behavior as “intention”: Whereas on the part of the actor or speaker, an inten- a proxy, e.g., “if I had just looked at the kite, I’d refer to it tion seems to represent nothing more than an “action command” next”). Alternatively, it might be sufficient that speaker gaze (e.g., in the legend to Figure 3; target article, sect. 2.2), recogniz- attracts the comprehender’s attention to a location that is relevant ing intentions on the part of the comprehender is more compli- for understanding the unfolding speech utterance. In all three cated. According to P&G, it can involve considerations of past cases, the end result is a coordination of the interlocutors’ and present behavior and of the speaker’s perceived state of attention on the same objects in the visual world. This is known mind, as well as ongoing modifications of this interpretation. In to benefit problem solving (Grant & Spivey 2003; Knoblich the literature, identifying others’ intentions is generally taken to et al. 2005) and conversation in general (Richardson & Dale imply an additional step beyond action recognition, that of identi- 2005; Richardson et al. 2007). Such benefits may well be due to fying the goal of this action (see e.g., Levinson 2006; Tomasello a shared perspective and aligned representations of the situation, et al. 2005). A similar view underlies the HMOSAIC architecture but they need not imply awareness of the interlocutor’s intentions. (Wolpert et al. 2003), which contains symbolic representations of the task in the form of goals or intentions. Elaborate attributions of intention based on the interlocutor’s potential motivations in the current situation and on general world knowledge are certainly useful for understanding speech. A developmental perspective on the They help to generate expectations about how the conversation integration of language production and is likely to develop and can be used for what P&G call “offline pre- comprehension diction.” Yet although such expectations have, in turn, been shown to influence moment-by-moment language comprehension (e.g., doi:10.1017/S0140525X12002774 Kamide et al. 2003; Van Berkum et al. 2008), they are presumably not computed on a moment-by-moment basis and remain rela- Saloni Krishnan tively constant across extended periods of time. Centre for Brain & Cognitive Development, Birkbeck, University of London, The time-critical online simulations in P&G’s account of real- London WC1E 7HX, United Kingdom. time conversation must require intentions of a more basic [email protected] kind – some form of heuristic for anticipating others’ upcoming actions. To use their example: It is unquestionably valuable if I am quick to predict that someone is preparing to punch me Abstract: The integration of language production and comprehension fi rather than to shake my hand, so I can prepare to move appropri- processes may be more speci c in terms of developmental timing than ately. But if I start considering why they might wish to hurt me, I Pickering & Garrod (P&G) discuss in their target article. Developmental studies do reveal links between production and comprehension, but also will probably be too late in responding. demonstrate that the integration of these skills changes over time. For real-time online prediction of upcoming words and sen- Production-comprehension links occur within specific language skills tences, I would like to suggest that it may often be possible to and specific time windows. rely on contextual clues to a speaker’s upcoming actions that are directly perceivable: A particular tone of voice, a facial expression, Pickering & Garrod (P&G) have set out an argument for a possible a hand gesture, or a shift in the speaker’s gaze direction can all be integration of language production and comprehension processes informative about how the speaker plans to continue a sentence. in adults. However, charting how these two processes come to be In this sense, such cues are closely connected to the speaker’s set up during development is critical in attempts to understand intentions. At the same time, they are often produced uninten- their subsequent integration. An obvious problem when consider- tionally on the part of the speaker, and can be readily perceived ing production/comprehension processes within a developmental on the part of the comprehender. This is exactly what makes framework is that language production lags behind language com- them so efficient: They can help to disambiguate the linguistic prehension. So, to reiterate P&G’s question, can silent naming use signal without requiring deliberate consideration of intentions the production system, if no word production has yet emerged? (cf. Shintel & Keysar 2009, who refer to such processes as “non- Furthermore, the dynamic and nonmonotonic changes in the strategic generic-listener adaptations”; p. 269). development of children’s motor systems, as well as the specificity A prime example of this type of contextual cue is the direction of motor-language links during these stages, need to be integrated of other people’s gaze. Gaze is a salient attentional cue that into any such model. The need to examine developmental evi- reliably causes viewers to shift their own attention in the same dence in building this theory is therefore critical. direction (Emery 2000). This occurs even in the visual periphery Numerous links between production and comprehension occur and without requiring conscious awareness (Langton et al. 2000; during language learning in childhood. These links have been Xu et al. 2011). Additionally, because speakers tend to look at reported in developmental studies assessing gesture production objects they are preparing to mention (e.g., Griffin & Bock and language comprehension (Bates & Dick 2002; Iverson & 2000; Meyer et al. 1998), gaze will often directly reflect the speak- Thelen 1999). From an atypical development standpoint, there er’s action plan. Such referential gaze is therefore both easy to is also a greater incidence of co-morbid motor coordination and detect and informative about upcoming sentence content. In planning difficulties in children with language impairment fact, comprehenders can and do make rapid use of the speaker’s (Iverson & Braddock 2011). In support of P&G’s model, there gaze direction to anticipate upcoming referents (Hanna & are studies specifically demonstrating how auditory perception/ Brennan 2007; Staudte & Crocker, 2011) and even to assign language comprehension can also affect speech motor perform- thematic role relations (Nappa et al. 2009; Knoeferle & Kreysa ance in childhood. For instance, perceptual ability can influence 2012). the learning of motor gestures. Seemingly due to the complexity These benefits of gaze-following in comprehension can be con- of articulation, affricates are produced later in development for ceived of as a form of prediction about what will be mentioned English-speaking children. By contrast, for Putonghua-speaking next, similar to anticipatory fixations of objects in the visual children, affricates are acquired very early, probably due to world in eye tracking studies of spoken sentence processing their salience within the language (Dodd & McIntosh 2010). (e.g., Altmann & Kamide 1999; Knoeferle & Crocker 2006; for Furthermore, for higher cognitive-linguistic demands as

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 35 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension compared to lower ones, speech motor variability also increases Integrate, yes, but what and how?A (reviewed in Goffman 2010), indicating that the production pro- fl computational approach of sensorimotor cesses are in uenced by comprehension/perceptual processes. fusion in speech Indeed, a catalyst to changes in motor control may be vocabulary increases (Green & Nip 2010). Speech motor variability can also doi:10.1017/S0140525X12002634 act as an index of learning; kinematic analyses of motor move- ments reveal that children receiving training for articulatory dis- Raphaël Laurent,a,b Clément Moulin-Frier,a,c orders produce different motor gestures associated with Pierre Bessière,b,d Jean-Luc Schwartz,a and Julien Diarde phonetic categories, which were imperceptible at the acoustic aGIPSA-Lab – CNRS UMR 5216, Grenoble University, 38402 Saint Martin level (Gibbon 1999). D’Hères Cedex, France; be-Motion team - INRIA Rhône-Alpes, 38334 Saint However, it is important to note that these links do not extend Ismier Cedex, France; cFLOWERS team - INRIA Bordeaux Sud-Ouest, 33405 to all motor skills, and in particular, not to gross motor ability as Talence Cedex, France; dLaboratoire de Physiologie de la Perception et de assessed by locomotion or play (Bates et al. 1979). In longitudinal l’Action – CNRS UMR 7152, Collège de France,75005 Paris, France; studies of infants, using parental reports, two orthogonal factors of eLaboratoire de Psychologie et NeuroCognition – CNRS UMR 5105, Grenoble language comprehension and production seemed to exist (Bates University, 38040 Grenoble Cedex 9, France. et al. 1988). Alcock and Krawczyk (2010) specifically investigated [email protected] the link between oral motor control, and other motor and [email protected] [email protected] language abilities in 21-month-olds, and concluded that oral [email protected] motor control was significantly associated with the grammatical [email protected] complexity of utterances and with language production ability overall. They found no relationship between overall motor∼clement.moulin-frier/cv_en.html control and language comprehension ability at this age. In our study with school age children, oral motor control was linked to∼jean-luc.schwartz/ the production of novel words, but did not predict individual differences in the comprehension of syntactically complex sen- tences (Krishnan et al., in press). This evidence may appear con- Abstract: We consider a computational model comparing the possible roles of “association” and “simulation” in phonetic decoding, demonstrating that tradictory, but taking a developmental perspective may provide “ ” some explanation. First, there may be specific motor behaviours these two routes can contain similar information in some perfect communication situations and highlighting situations where their decoding that provide an opportunity to acquire and practise skills necessary performance differs. We conclude that optimal decoding should involve for language. For example, rhythmic hand banging peaks around some sort of fusion of association and simulation in the human brain. 28 weeks of age, and this is also when children start to produce reduplicated babbling (Iverson 2010). And, when rhythmic In their target article, Pickering & Garrod (P&G) propose an banging is delayed, as is the case in the neurodevelopmental dis- ambitious model of language perception and production. It is cen- order Williams syndrome, babbling and subsequent comprehen- tered on three main ingredients. First, it considers the complete sion and production are also delayed (Masataka 2001). hierarchy of layers of language processing, from message to Understanding the specific skills that are likely to cause changes semantics to syntax to phonology and finally, to speech. Second, in behaviour during a particular time-window may therefore be it features predictive forward models, so that temporally extended necessary. P&G’s model fails to provide an account of how com- sequences, such as whole sentences and dialogues, can be pro- prehension/production processes for learning language might be cessed. Third, it features dual processing routes, the “association” integrated within specific skills over developmental time. route and “simulation” route, so that auditory and motor knowl- The second factor that must be considered in a model integrat- edge can be involved simultaneously, rejecting the classic dichot- ing comprehension/production processes is the nonmonotonicity omy between perception and action processes. of these developmental trajectories, which are consistent across In this commentary, we set aside the temporal and hierarchical children. For example, rhythmic banging is low in pre-babbling aspects, and focus on the domain of speech perception and pro- children, increases sharply as infants start to babble, and then duction, where sequences are typically short (e.g., syllable percep- declines as infants become experienced at babbling (Iverson tion and production), and processing limited to phonological 2010). Similar nonmonotonic trajectories are seen across other decoding. Even in this more restricted field, the age-old debate speech motor skills, for instance, in the variability of lip and jaw between purely motor-based accounts and purely sensory-based movements (Smith & Zelaznik 2004) or for the coordination of accounts of perception and production now appears to be a upper and lower lip movements (Green et al. 2000). The combi- false dilemma (Schwartz et al. 2012). Indeed, neurophysiological nation of gestures and language during development may have a and behavioral evidence strongly suggests a dual route account similar nonmonotonicity, as event-related potential (ERP) evi- of information processing in the central nervous system, with dence suggests children infants younger than 20 months interpret both a direct, associative route and an indirect, simulation route. symbolic gestures and words similarly, but that gestures and words The target article amply documents the evidence, we do not take on divergent communicative roles when infants are 26 repeat examples here. months old (Sheehan et al. 2007). Although P&G suggest that In our view, the debate is now shifting toward the issue of the experience may be important for learning inverse-forward functional role of each route and their integration. That is to say, a model pairing, their model lacks explanations of how these central question of the debate asks what is integrated and how kinds of trajectories might arise, how trajectories change with integration proceeds in the human brain. time, and what kind of input may be necessary for change. For We would argue that conceptual models such as proposed in example, these trajectories may arise due to some combination the target article would unfortunately have a difficult time bring- of the changes in contextual support that are needed while a ing light to these questions. To support this argument, we con- skill is learnt, or the neural changes that occur during sider the question of perceptual decoding of phonetic units, for development. which we have developed a computational framework (Moulin- Therefore, in the model that P&G outline, I agree that integrat- Frier et al. 2012) based on Bayesian programming (Bessière ing knowledge about the production processes may help us under- et al. 2008; Colas et al. 2010; Lebeltel et al. 2004). With this fra- stand more about language comprehension, but this would be mework, various models of speech perception can be simulated possible only if the specificity of production-comprehension and quantitatively compared. One model is purely auditory, links and the developmental timing of their occurrence are exploiting what P&G call “association.” A second model is taken into account. purely motor, exploiting what they call “simulation.” A third one

36 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension is sensory-motor, integrating the association and simulation continuous flows of speech and consider semantic, syntactic and processes. phonology layers of processing. All of these models can then be implemented and compared in However, in our view, the main challenge for future studies is various experimental configurations. Three major results emerge first to assess what kind of information is present in “association” from such comparisons. and “simulation” routes, and second, to better understand how fi computational fusion models, describing the integration of these 1. Under some hypotheses, with perfectly identi ed communi- two routes, can account for experimental neurocognitive data. cation noise and no difference between motor repertoires of the speaker and the listener (i.e., when conditions for speech com- munication are “perfect”), motor and auditory theories are indis- tinguishable. Therefore, the “association” and “simulation” routes provide exactly the same information in these perfect communi- Towards a complete multiple-mechanism cation conditions. The reason is that, in our learning scenario, account of predictive language processing the auditory classifier is learned by association from data obtained through a motor production process, and possesses enough math- doi:10.1017/S0140525X12002646 ematical power of expression. This casts an interesting light on the question of what information Nivedita Mania and Falk Huettigb,c is encoded in the association and simulation routes: Labeling a box aLanguage Acquisition Junior Research Group, Georg-August-Universität as an “association” route, in a conceptual model, is not enough to Göttingen, 37073 Göttingen, Germany; bMax Planck Institute for be certain that it is different, from an information processing point Psycholinguistics, 6500 AH Nijmegen, The Netherlands; cDonders Institute for of view, from another box of the model. Computational descrip- Brain, Cognition, and Behaviour, Radboud University Nijmegen, Nijmegen, tions however, by virtue of rigorous mathematical notation, have The Netherlands. to be precisely defined, and their content can be systematically [email protected] [email protected] assessed. This also explains why behavioral evidence has histori- cally not been able to discriminate between motor and auditory Abstract: Although we agree with Pickering & Garrod (P&G) that prediction-by-simulation and prediction-by-association are important theories of perception and production: They are sometimes fi mechanisms of anticipatory language processing, this commentary simply indistinguishable. Unfortunately, we believe this dif culty suggests that they: (1) overlook other potential mechanisms that might was not avoided in the target article, in particular when P&G underlie prediction in language processing, (2) overestimate the detail experimental evidence for their model (e.g., target article, importance of prediction-by-association in early childhood, and (3) sect. 3.2.1, para. 7, “these four studies support forward modeling, underestimate the complexity and significance of several factors that but they do not discriminate between prediction-by-simulation might mediate prediction during language processing. and prediction-by-association”; and sect. 3.2.3, para. 6, “all of these findings provide support for the model of prediction-by- Pickering & Garrod (P&G) propose a model of language proces- simulation […]. Of course, comprehenders may also perform pre- sing that blends production and comprehension mechanisms in diction-by-association […].”). such a way as to allow language users to covertly predict upcoming 2. In the general case where “perfect conditions” for communi- linguistic input. They ascribe a central role to our production cation are not met, mathematical comparison of the models system covertly anticipating what the other person (or oneself) emphasizes the respective roles of motor and auditory knowledge might be likely to say in a particular situation (prediction-by-simu- in various conditions of speech perception in adverse conditions. lation). A second prediction mechanism, they argue, is based on “ ” the probability of a word being uttered (given other input) in Therefore, the information provided by the association and ’ “simulation” routes is more or less distinct and prominent depend- our experience of others speech (prediction-by-association). We ing on the communication conditions. In other words, this demon- agree with P&G that prediction-by-simulation and prediction- strates that adverse conditions provide leverage for discriminating by-association are important mechanisms of anticipatory language hypotheses about the perceptual and motor processes involved. processing but believe that they overlook other (additional) poten- This is convergent with recent findings from neuroimaging and tial mechanisms, overestimate the importance of prediction-by- ’ association in early childhood, and underestimate the complexity transcranial magnetic stimulation (TMS) studies (D Ausilio et al. fi 2012b; Meister et al. 2007; Zekveld et al. 2006), as well as compu- and signi cance of several mediating factors. tational studies (Castellini et al. 2011). Multiple mechanisms. If we consider predictive language pro- 3. In any case, sensory-motor fusion provides better perceptual cessing to be any pre-activation of the representational content of upcoming words, then only multiple-mechanism accounts of performance than pure auditory or motor processes. Therefore, ’ complementarities of information provided by the “association” prediction that are even more comprehensive than P&G s can “ ” fi account for the multifarious phenomena documented in the and simulation routes could be ef ciently exploited in the fra- fl mework of integrative theories such as those hinted at in the dis- language processing literature to date. Schwanen ugel and cussion of the target article. It is now obvious in the field of Shoben (1985), for example, have argued that more featural audiovisual perception that auditory and visual cues are comp- restrictions of upcoming words are generated in advance of the lementary, with a great deal of work already done on sensor input in high as opposed to low predictive contexts. These featural fusion. In our opinion, comparable work can now be done on (e.g., semantic, syntactic) restrictions may then constrain what how to integrate auditory and motor processes in speech percep- words are likely to come up. In other words, pre-activation of rep- tion. In this view, the proposal by P&G that “comprehenders resentational content is constrained by the featural restrictions emphasize whichever route is likely to be more accurate” (sect. 4, generated in particular contexts (e.g., via spreading activation in para. 6) can be regarded as a first candidate model, which a semantic network). Whereas such online generation of featural would have to be made mathematically precise and compared restrictions may be compatible with prediction-by-simulation (as with alternative explanations, possibly driven by neuroanatomical P&G appear to point out), it is conceivable that such pre-acti- findings (e.g., both auditory and motor processes are performed vation can also happen independently of prediction-by simulation automatically in parallel and compete, or they both bring infor- and thus constitutes a third anticipatory mechanism. mation in an ongoing fusion process, etc.). Predictive language processing may also make use of certain heuristics and biases (cf. Tversky & Kahneman 1973). Borrowing An obvious challenge, of course, is to bridge the gap between the availability heuristic from decision-making research, upcom- computational approaches such as ours, which are usually ing words may be pre-activated simply because of an availability restricted to isolated syllable production and perception, and con- bias (certain words/representations may be very frequent or ceptual models as proposed in the target article, that tackle have occurred very recently). The analogy to decision-making

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 37 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension research is not as far-fetched as one may think: Tversky and differences are bound to have substantial impacts on the specifics Kahneman introduced a simulation heuristic in the 1970s accord- (and degree) of anticipatory processing a particular language affords. ing to which people predict the likelihood of an upcoming event Future work could usefully explore the cognitive reality and by how easy it is to simulate it. Other heuristics may well be relative importance of the potential mechanisms and mediating worth exploring (in line with the affect heuristic, emotionally factors mentioned here. Even though Occam’s razor may favour charged words, e.g., stupid, boring, may be predicted more single-mechanism accounts, we conjecture that multiple-mechan- rapidly). Our point is simply that predictive language processing ism accounts are required to provide a complete picture of antici- is likely to be complex and may make use of a set of rather patory language processing. diverse mechanisms (with many yet unexplored). Importance of prediction-by-simulation in development. P&G suggest that analysis of children’s prediction abilities might throw light on the distinction between prediction-by-association and prediction-by-simulation and place stronger emphasis on pre- Toward a unified account of comprehension diction-by-association in young children: Prediction-by-associ- and production in language development ation might play a more important role when listeners and speakers have little in common with each other, such as the doi:10.1017/S0140525X12002658 case of children listening to adults’ talking. In a recent experiment examining 2-year-olds’ prediction abil- Stewart M. McCauley and Morten H. Christiansen ities, however, we found that, consistent with prediction-by-simu- Department of Psychology, Cornell University, Ithaca, NY 14853. lation, only toddlers in possession of a large production vocabulary [email protected] [email protected] are able to predict upcoming linguistic input in another speaker’s utterance (Mani & Huettig 2012; see Melzer et al. 2012, for similar results in action perception). Furthermore, if, as P&G Abstract: Although Pickering & Garrod (P&G) argue convincingly for a fi suggest, covert imitation is the driving force of prediction-by- uni ed system for language comprehension and production, they fail to simulation, then 18-month-olds are equipped with the cognitive explain how such a system might develop. Using a recent computational model of language acquisition as an example, we sketch a developmental pre-requisites for covert imitation: Covert imitation can modulate perspective on the integration of comprehension and production. We infants’ eye gaze behaviour around a (linguistically relevant) visual ’ conclude that only through development can we fully understand the scene (Mani & Plunkett 2010; Mani et al. 2012) similar to adults intertwined nature of comprehension and production in adult processing. behaviour (Huettig & McQueen 2007). Prediction-by-simulation may also be an important developmental mechanism to train Much like current approaches to language processing, contem- the production system (Chang et al. 2006). In sum, prediction- porary accounts of language acquisition typically assume a sharp by-simulation appears to be crucial even early in development, distinction between comprehension and production. This assump- and hence prediction-by-association is not necessarily the simple tion is driven, in large part, by evidence for a number of asymme- prediction mechanism which dominates early childhood. tries between comprehension and production in development. Mediating factors. Finally, there are many mediating factors Comprehension is usually taken to precede production (e.g., (e.g., literacy, working memory capacity, cross-linguistic differ- Fraser et al. 1963), although there are certain instances in ences) involved in predictive language processing whose inter- which children exhibit adult-like production of sentence types action with anticipatory mechanisms have been little explored that they do not appear to comprehend correctly (cf. Grimm and whose importance, we believe, has been vastly underesti- et al. 2011). Evidence for such asymmetries strongly constrains mated. Mishra et al. (2012), for instance, observed that Indian theories of language acquisition, challenging integrated accounts high literates, but not low literates, showed language-mediated of development and, by extension, integrated accounts of adult anticipatory eye movements to concurrent target objects in a processing. Hence, it is key to determine the plausibility of a visual scene. Why literacy modulates anticipatory eye gaze unified framework for acquisition that is compatible with evidence remains to be resolved, though literacy-related differences in for comprehension/production asymmetries. associations (including low-level word-to-word contingency stat- Although Pickering & Garrod’s (P&G’s) target article may be istics, McDonald & Shillcock, 2003), online generation of featural construed as a useful point of departure in this respect, P&G restrictions, and general processing speed are likely to be pay scant attention to how such a unified system for comprehen- involved. Similarly, Federmeier et al. (2002) found that older sion and production might develop. As a result, they implicitly adults are less likely to show prediction-related benefits during subscribe to a different, questionable distinction often made in sentence processing with a strong suggestion that differences in the language literature: the separation of acquisition from adult working memory capacity underlie differences in predictive pro- processing. In light of this, and given the tendency of develop- cessing. The influence of such mediating factors may greatly mental psycholinguists to view comprehension and production depend on the situation language users find themselves in: Antici- as separate systems, we briefly sketch a unified developmental fra- patory eye gaze in the visual world, for instance, requires the mework for understanding comprehension and production as a building of online models allowing for visual objects to be linked single system, instantiated by a recent usage-based computational to unfolding linguistic information, places, times, and each other. model of acquisition (McCauley & Christiansen 2011; submitted). Working memory capacity may be particularly important for Importantly, our approach is consistent with evidence for compre- anticipatory processing during such language-vision interactions hension/production asymmetries in development, even while (Huettig & Janse 2012). uniting comprehension and production within a single framework. More work is also required with regard to the specific represen- Our computational model, like that of Chang et al. (2006), tations which are pre-activated in particular situations. Event- simulates both comprehension and production, but it goes related potential studies have shown that even the grammatical beyond this and previous usage-based models (e.g., Borensztajn gender (van Berkum et al. 2005), phonological form (DeLong et al. 2009; Freudenthal et al. 2007) in that (a) it learns to do so et al. 2005), and visual form of the referents (Rommers et al. incrementally using simple distributional information; (b) it 2013) of upcoming words can be anticipated. Most of these offers broad, cross-linguistic coverage; and (c) it accommodates studies, however, have used highly predictive “lead-in” sentences. a range of developmental findings. The model learns from It also remains to be seen to which extent these specific represen- corpora of child and child-directed speech, acquiring item-based tations are activated in weakly and moderately predictive contexts. knowledge in a purely incremental fashion, through online learn- Last but not least, languages differ dramatically in all levels of lin- ing using backward transitional probabilities (which infants track; guistic organisation (Evans & Levinson 2009). These cross-linguistic cf. Pelucchi et al. 2009). The model uses peaks and dips in

38 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension transitional probabilities to chunk words together as they are complete understanding of the intertwined nature of language encountered, incrementally building an item-based “shallow comprehension and production. parse” as each incoming utterance unfolds. The model stores the word sequences it groups together, gradually building up an inventory of multiword chunks – a “chunkatory”–which underlies both comprehension and production. When the model encoun- ters a multiword utterance produced by the target child of a What does it mean to predict one’sown corpus, it attempts to generate an identical utterance using only utterances? chunks and transitional probabilities learned up to that point. Cru- cially, the very same chunks and distributional information used doi:10.1017/S0140525X12002786 during production are used to make predictions about upcoming material during comprehension. This type of prediction-by-associ- Antje S. Meyera,b and Peter Hagoorta,b ation facilitates the model’s shallow processing of the input. The aMax Planck Institute for Psycholinguistics, 6500 AH Nijmegen, model’s comprehension abilities are scored against a state-of- The Netherlands; bRadboud University Nijmegen, 6525 HP Nijmegen, the-art shallow parser, and its production abilities are scored The Netherlands. against the target child’s original utterances (the model’s utter- [email protected] [email protected] ances must match the child’s). The model makes close contact with P&G’s approach in that it uses information employed during production to make predictions Abstract: Many authors have recently highlighted the importance of about upcoming linguistic material during comprehension (consist- prediction for language comprehension. Pickering & Garrod (P&G) are ’ the first to propose a central role for prediction in language production. ent with recent evidence that children s linguistic predictions are This is an intriguing idea, but it is not clear what it means for speakers tied to production; cf. Mani & Huettig 2012). However, our ’ to predict their own utterances, and how prediction during production approach extends P&G s account from prediction to the acquisition can be empirically distinguished from production proper. and use of linguistic knowledge itself; comprehension and pro- duction rely upon a single set of statistics and representations, Pickering & Garrod (P&G) offer an integrated framework of which are reinforced in an identical manner during both processes. speech production and comprehension, highlighting the impor- Moreover, our model’s design reflects recent psycholinguistic tance of predicting upcoming utterances. Given the growing evi- findings that have hitherto remained largely unconnected, but dence for commonalities between production and comprehension which, when viewed as complementary to one another, strongly processes and for the importance of prediction in comprehension, support a unified framework for comprehension and production. we find their proposal timely and interesting. First, the model is motivated by children’s use of multiword Our comment focuses mainly on the role of prediction in units in production (Bannard & Matthews 2008), which cautions language production. P&G propose that speakers predict against models of production in which words are selected inde- aspects of their utterance plans and compare these predictions pendently of one another. The model’s primary reliance on the against the actual utterance plans. This monitoring process discovery and storage of useful multiword sequences follows this happens at each processing level, that is, minimally at the seman- line of evidence. Second, the model is motivated by evidence tic, syntactic, and phonological level. that children, like adults, can rely on shallow processing and Given the important role of prediction in comprehension and underspecified representations during comprehension (e.g., the well-attested similarities between production and comprehen- Gertner & Fisher 2012; Sanford & Sturt 2002). Shallow proces- sion, the idea that prediction should play a role in speech pro- sing, supplemented by contextual information (e.g., tied to seman- duction follows quite naturally. Nevertheless, to us the proposal tic and pragmatic knowledge) may often give children the that speakers predict their utterance plans does not have immedi- appearance of comprehending grammatical constructions they ate appeal. This is because, in everyday parlance, prediction and have not yet mastered (and therefore cannot use effectively in the predicted event have some degree of independence. It is production). The model exhibits this in its better comprehension because of this independence that predictions may or may not performance; through chunking, the model can arrive at an item- be borne out. It makes sense to say a person predicts the out- based “shallow parse” of an utterance, which can then be used in comes of their hand or jaw movements, as these outcomes are conjunction with semantic and pragmatic information to arrive at not fully determined by the cognitive processes underlying the a “good enough” interpretation of the utterance (Ferreira et al. predictions, but depend, among other things, on properties of 2002). On the production side, however, the model – like a child the physical environment that may not be known to the person learning to speak – is faced with the task of retrieving and sequen- planning the movement. Similarly, it makes sense to say that a lis- cing words and chunks in a particular order. Hence, asymmetries tener predicts what a speaker will say because the speaker’s utter- arise from differing task demands, despite the use of the very ances are not caused by the same cognitive processes as those that same statistics and linguistic units during both comprehension lead to the listener’s prediction. Speaker and listener each have and production. their own, private cognition and therefore the listener’s expec- Such an abandonment of the “cognitive sandwich” approach to tations about the speaker’s utterance may or may not be met. acquisition clearly has implications for adult processing. If, as we We can predict our own utterances. For instance, based on suggest and make explicit in our model, children learn to compre- memory of past experience, I can predict how I will greet my hend and produce speech by using the same distributional infor- family. However, such predictions concern overt behavior rather mation and chunk-based linguistic units for both tasks, we would than plans for behavior, and they occur offline rather than in par- expect adults to continue to rely on a unified set of represen- allel with the predicted behavior. Just like predictions about other tations. This is corroborated by studies showing that, like children, persons, my predictions of my own utterances may or may not be adults not only rely on multiword units in production (Janssen & borne out, depending on circumstances not known at the moment Barber 2012), but also use multiword sequences during compre- of prediction. I may, for instance, deviate from my predicted hension (e.g., Arnon & Snider 2010; Reali & Christiansen greeting if I find my family standing on their heads. 2007). This evidence further suggests that prediction-by-associ- Such offline predictions of overt behavior differ from the pre- ation may be more important for language processing than dictions proposed by P&G. In their framework, speakers predict assumed by P&G, not just for children as indicated by our their utterance plans as they plan them, with prediction at each model, but also for adults. It is only by considering how the planning level running somewhat ahead of the actual planning. adult system emerges from the child’s attempts to comprehend Importantly, the predictions are based on the same information and produce linguistic utterances that we can hope to reach a as the predicted behavior, namely, the speaker’s intention

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 39 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension

(target article, sect. 3.1, “production command” in Fig. 5) and Abstract: The neurocognitive evidence that Pickering & Garrod (P&G) involve closely related cognitive processes, although the plans cite in favor of positing forward models in speech production is not are more detailed than the predictions and can therefore be compelling. The data to which they appeal either cannot be explained created faster. With respect to the information encoded in both by forward models, or can be explained by a more parsimonious model. representations, plan and prediction will be identical. Discrepan- Pickering & Garrod (P&G) take production commands to be cies can only arise when the plan and/or prediction include a fault. conceptual representations that encode high-level features – This is different from predictions a listener might generate about a “ ’ information about communicative force (e.g., interrogative), speaker s upcoming utterance; no matter how well aligned pragmatic context, and a nonlinguistic situation model” (target speaker and listener are in a dialogue, their intentions are not article, sect. 3.1, para. 3). On their model, a production identical, and their cognitive processes are not shared. command is input directly to the production implementer, P&G invoke prediction during production to support self-moni- which outputs an utterance. But somewhere in between this toring. It is not entirely clear how the monitoring processes would fi input and output, there must be, in addition, an intermediate rep- work and how bene cial they would be. Key properties of the pre- resentation that specifies the low-level features of the utterance, dicted representations are that they are more abstract and that for example, its phonological and phonetic features. In what they are created faster than the speech plans and not necessarily follows, we will call this low-level production command the utter- in the same order. This raises the issues of how it is decided ance plan. In the analogous motor control case, upon which P&G which information to include in the prediction and what to omit base their model, an utterance plan corresponds to a motor and, given that planning and predicting need not follow the command, which specifies the low-level features of the bodily same time course, whether and how the cognitive processes movement, and is output by the inverse model (Wolpert 1997). leading to plans and predictions differ. It is also not clear why pre- We argue here that the evidence that P&G cite in favor of posit- dictions would be more likely to be correct than plans, and how a ing forward models in speech production is not compelling. More speaker detects errors concerning features of the utterance that specifically, the data to which they appeal either cannot be are not included in the predictions. Finally, as P&G point out, explained by forward models, or can be explained by a more par- there is strong evidence for the involvement of forward modeling simonious model, on which the utterance plan and the sensory in motor planning, but there is as yet no empirical evidence feedback are directly compared. On this alternative picture, demonstrating that this approach scales up in the way they envi- there is no need to posit forward models. sion. Finding this evidence is likely to be extremely challenging, P&G appeal to Heinks-Maldonado et al. (2006) to support their as it will, for instance, involve separating the time course of plan- claim that forward modeling is used in speech production. They ning and predicting and the properties of the planned and pre- argue that the suppressed M100 signal in the condition where par- dicted representations. ticipants spoke and heard their own unaltered speech – compared The question of what and when to predict is also relevant for with conditions in which their speech was distorted or they heard prediction during comprehension. P&G assume, correctly in our fi – fi an arti cial voice is the result of the forward model prediction view, that in dialogue there is often not suf cient information in “canceling out” the matching auditory feedback from the utter- the mind of the comprehender to generate reliable predictions ance. They urge that “the rapidity of the effect suggests that at all conceptual and linguistic levels. This results in two possibili- speakers could not be comprehending what they heard and com- ties. One is that predictions are always made, but will often be paring this to their memory of their planned utterance” (sect. 3.1, highly unreliable, creating a need for correction that would not para. 16). While this is indeed implausible, there is an alternative exist in the absence of prediction. The other option is that predic- fi hypothesis that is not ruled out by the data: The attenuation effect tion occurs only if there is suf cient information for making a valid results from a match between the utterance plan and auditory prediction. P&G seem to favor the latter option (“comprehenders ” feedback. Such a comparison would take no more time than the make whatever linguistic predictions they can ; sect. 3.2, para. 1). purported comparison between the forward model prediction However, their model should then specify a mechanism or pro- and auditory feedback. The same point applies to P&G’s discus- cedure that determines when to predict and when not to do so. ’ sion of the datum reported in Levelt (1983) concerning mid- In passing, we note another gap in P&G s proposal: According word self-correction. to P&G, predictions can be generated via an association route Some theorists (e.g., Prinz 2012, p. 238) have insisted that what- and a simulation route. But how are the contributions of the ever states enter into the comparison with sensory feedback must two prediction routes weighted and integrated? What happens if have the same representational format as the feedback. Because these predictions do not fully match? an utterance plan must encode the low-level features of the utter- In sum, we are not convinced that prediction plays a similar ance that it specifies, it arguably meets this criterion. crucial role in speech production as it does in comprehension. P&G also appeal to the results reported in Tourville et al. Whereas my listener can at best guess (predict) what I might (2008), highlighting two features of that study. First, the compen- say next, as a speaker I know perfectly well where I am heading, sation that participants make in response to distorted auditory and plans and predictions cannot be separated. Moreover, the feedback is rapid –“a hallmark of feedforward (predictive) moni- account of prediction in both production and comprehension ” fi toring (as correction following feedback would be too slow) (sect. needs further speci cation of what triggers the decision to 3.1, para. 17). But rapid compensation can only be attributed to predict and of how predictions are derived. forward modeling when the forward model prediction is used in place of the auditory feedback during online control of behavior. The idea is that, by using the putative forward model prediction of Is there any evidence for forward modeling the sensory feedback, the system need not wait for the auditory in language production? feedback. However, this cannot be the case in the experiment conducted by Tourville et al. (2008), because the distorted audi- doi:10.1017/S0140525X1200266X tory feedback is externally induced at random, and therefore unpredictable. Participants must base their compensations on Myrto I. Mylopoulosa and David Pereplyotchikb the distorted auditory feedback itself, since no prediction would aDoctoral Program in Philosophy, City University of New York (CUNY), be available in this type of case. Hence, however rapid their com- Graduate Center, New York, NY 10016; bDepartment of Philosophy, Hamilton pensation, it cannot reflect the operation of forward modeling. College, Clinton, NY 13323. The second feature of the Tourville et al. (2008) study to which [email protected] [email protected] P&G appeal is that “the fMRI [functional magnetic resonance imaging] results identified a network of neurons coding

40 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension mismatches between expected and actual auditory signals” (sect. by-product of speakers’ need to control their overt verbal behav- 3.1, para. 17). But while the fMRI results did identify a network ior. Second, why does inner speech develop so long after overt of neurons that has been shown to be activated when auditory speech (e.g., Vygotsky 1962)? Inner speech develops as the feedback from an utterance is distorted (Fu et al. 2006; Hashi- speaker learns to simulate their verbal behavior, which may lag moto & Sakai 2003; Hirano et al. 1997; McGuire et al. 1996), behind the ability to produce that behavior. And third, how are the further claim that these neurons code mismatches between people able to produce inner speech without actually speaking forward model predictions (“expected” auditory signals) and aloud? If inner speech is simply the offline use of forward actual auditory signals is unwarranted by the available neuroima- models (p^ ! c^), then speakers never need to engage the pro- ging data. All such data are equally consistent with the more duction and comprehension implementers (p ! c) that are the parsimonious hypothesis that these neurons code mismatches traditional generators and perceivers of inner speech. between the utterance plan and the auditory feedback. P&G’s framework would specifically address two more recently Finally, we are skeptical of P&G’s interpretation of the data in demonstrated qualities of inner speech. First, inner speech Tian and Poeppel (2010). The Tian and Poeppel study found acti- involves attenuated access to subphonemic representations. vation in the auditory cortex in two conditions: after participants When people say tongue-twisters in their heads, their reported actually produced a syllable and after they merely imagined pro- errors are less influenced by subphonemic similarities than their ducing that same syllable. Following Tian and Poeppel, P&G reported errors when saying them aloud (Oppenheim & Dell interpret such activation as evidence of forward modeling. 2008; 2010; also Corley et al. 2011, as noted by Oppenheim However, this activation may simply encode a general expectation 2012). For instance, /g/ shares more features with /k/ than with that a sound will be heard, rather than specifically encoding the /v/, so someone trying to say GOAT aloud would more likely anticipated auditory feedback. Moreover, even if this activation slip to COAT than VOTE, but this tendency is less pronounced were shown to be a representation of specific auditory feedback, for inner slips. As P&G note, this finding is predicted if the it does not follow that the activation should be construed as a forward models underlying inner speech produce phonologically forward model prediction. It could instead subserve a mere simu- impoverished predictions (and thus might not reflect the pro- lation of the auditory feedback. In order to determine that this duction implementer). Second, inner speech is flexible enough activation subserves a forward model prediction, as against a to incorporate additional detail. Although inner slips show less mere simulation, we would need evidence that it plays the rel- pronounced similarity effects than overt speech, adding silent evant functional role – that is, it is both based on an efference articulation is sufficient to boost their similarity effect, apparently copy and that it goes on to be compared to auditory feedback. coercing inner speech to include more subphonemic detail Tian and Poeppel provide no evidence for the latter condition. (Oppenheim & Dell 2010). Such flexibility could be problematic Indeed, the relevant kind of forward model should not be operat- for models that assign inner speech to a specific level of the pro- ive in the passive listening condition of Tian and Poeppel’s study. duction process (e.g., Levelt et al. 1999), but P&G’s account So the fact that auditory cortex activations were found to be specifically suggests that forward models simulate multiple similar for listening passively to a sound and imagining producing levels of representation, so it might accommodate the subphone- that sound does not support the claim that the latter reflects mic flexibility of inner speech by adding motoric predictions forward modeling in particular. Finally, Tian and Poeppel’s (p[sem,syn,phon,art]^ ; forward models’ more traditional jurisdic- framing of the issue is itself suspect. They cast forward model pre- tion) that are tied to motor planning. dictions as conscious personal-level states – mental images of a But forward model simulations cannot provide a complete certain kind. One might reasonably doubt that such subpersonal account of inner speech. One would still need to use what P&G states are ever present in the phenomenology that accompanies would call “the production implementer” (target article, sect. 3, speech production. para. 2). First, inner rehearsal facilitates overt speech production (MacKay 1981; Rauschecker et al. 2008; but cf. Dell & Repka 1992), suggesting that some aspects of the production implemen- ter are also employed in inner speech. Second, there is abundant evidence that people easily detect their inner speech errors Inner speech as a forward model? (Corley et al. 2011; Dell 1978; Dell & Repka 1992; Hockett 1967; Meringer & Meyer 1895, cited in MacKay 1992; Oppen- doi:10.1017/S0140525X12002798 heim & Dell 2008; 2010; Postma & Noordanus 1996). But since monitoring is described as the resolution of predicted and actual Gary M. Oppenheim percepts (from forward models and implementers, respectively), Center for Research in Language, University of California San Diego, it is unclear how one could detect and identify inner slips La Jolla, CA 92093-0526. without having engaged the production implementer. (Conflict [email protected] monitoring, e.g., Nozari et al. 2011, within forward models∼goppenheim/ might at least allow error detection, but its use there seems to lack independent motivation, and still leaves the problem of Abstract: Pickering & Garrod (P&G) consider the possibility that inner how a speaker could identify the content of an inner slip.) speech might be a product of forward production models. Here I Third, analogues of overt speech effects are often reported for consider the idea of inner speech as a forward model in light of empirical work from the past few decades, concluding that, while experiments substituting inner-speech-based tasks. For instance, forward models could contribute to it, inner speech nonetheless inner slips tend to create words, just like their overt counterparts requires activity from the implementers. (Corley et al. 2011; Oppenheim & Dell 2008; 2010), and their dis- tributions resemble overt slips in other ways (Dell 1978; Postma & Pickering & Garrod (P&G) argue that coarse predictions from Noordanus 1996). And though inner and overt speech can forward models can help detect errors of overt speech production diverge, they tend to elicit similar behavioral and neurophysiolo- before they occur. This error-detecting function is often assigned gical effects in other domains (e.g., Kan & Thompson-Schill to inner speech (e.g., Levelt 1983; Levelt et al. 1999; Nooteboom 2004), and their impairments are highly correlated (e.g., Geva 1969): the little voice in one’s head, better known for its role in et al. 2011). Though more ink is spilled cautioning differences conscious thought. It is therefore tempting to identify inner between inner and overt speech, similarities between the two speech as a product of these forward models, with p^ ! c^ provid- are the rule rather than the exception (at least for pre-articulatory ing what we know as the internal loop. In fact, conceiving of inner aspects). speech as a forward model could elegantly address three key ques- Given the impoverished character of P&G’s forward models, it tions. First, why do we have inner speech at all? Inner speech is a seems difficult to account for such parallels without assuming a

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 41 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension role for production implementers in the creation of inner speech. intention of an action from a perceptual-motor code should imply Therefore, we could posit that inner speech works much like overt accurate, one-to-one perceptual motor mapping between the goal speech production, recalling P&G’s acknowledgment that offline and its respective kinematics (Kilner 2011). This is evident, for simulations could engage the implementers, actively truncating example, with a specific set of “action-constrained” single-cell the process before articulation; forward models would supply a recordings in monkeys, which fired during grasping for eating necessary monitoring component. This more-explicit account of but not during grasping for placing (Fogassi et al. 2005). Upon inner speech allows us to question P&G’s suggestion that the sub- exploring the various ways in which humans can reach and phonemic attenuation of inner speech might reflect impoverish- grasp, it appears that the kinematics precisely differ with ment of the forward model instead of the generation of an respect to compatibility, or incompatibility, with the goal (e.g., abstract phonological code by the production implementer. drinking vs. passing) (Tretriluxana et al. 2008). Crucially, activity Having clarified the role of forward models as error detection, in the inferior frontal cortex of onlooking human individuals is their suggestion now boils down to the idea that inner slips modulated differentially when a model exhibits different inten- might be hard to “hear.” Empirical work suggests that is not the tions associated with the grasping action (Iacoboni et al. 2005). case. Experiments using noise-masked overt speech (Corley In order to achieve this overall intention, an individual selects the et al. 2011) and silently mouthed speech (Oppenheim & Dell most appropriate movement that is compatible with the purpose 2010) showed that each acts much like normal overt speech in of the action. Within this framework, it is clear that the motor rep- terms of similarity effects (see also Oppenheim 2012). And, by resentations are comparatively stable and can be arranged in a explicitly modeling biased error detections, Oppenheim and limited, pre-wired motor chain that is functionally interpreted in Dell (2010) formally ruled out the suggestion that their evidence terms of motor intention (Rizzolatti & Sinigaglia 2007). for abstraction merely reflected such biases. Thus, better specify- Speech sound representations, however, are highly variable; the ing the role of forward models in inner speech allows the con- linguistic message can be achieved with many speech sounds and, clusion that the subphonemic attenuation of inner speech does more questionably, the same speech units vary with their position have its basis in the production implementer. More generally, within a word (Mottonen & Watkins 2009). In language proces- conceiving of forward models as components of inner speech sing, prediction by a simulation mechanism is plausible for articu- can wed strengths of the forward model account with the fidelity latory representations consisting of a limited set of uniform of implementer-based simulations. elements that mainly differ in their serial positions and require precise selection and timing, and are at least pre-wired in two units (noun and verb) to form a complete sentence. This type of mechanism was reported for the hearing or reading of motor- related words/sentences, for which a growing number of studies Does what you hear predict what you will do have proposed a constant matching of input–output processes, and say? however action-system mediated (Buccino et al. 2005; Pulvermül- ler & Fadiga 2010; Tettamanti et al. 2005). Moreover, several doi:10.1017/S0140525X12002804 studies have documented how a rapid simulation process supports motor-related speech/language in the frontal motor cortex, Mariella Pazzaglia ranging from the spontaneous imagery mechanism of tracking Dipartimento di Psicologia Sapienza University of Rome, 00185 Rome, Italy; articulatory gestures to the complex motor aspects of action IRCCS Fondazione Santa Lucia, 00179 Rome, Italy. verbs or tool words that grant them their meaning (Pulvermüller [email protected] 2005). Hence, the motor counterpart enables the matching of production and comprehension, extending motor-related sound identification to language (the “what” of speech recognition), which in turn leads to predictive coding. Given the role of Abstract: I evaluate the bottlenecks involved in the simulation mechanism motor system, it remains unclear whether language perception underpinning superior predictive abilities for upcoming actions. This occurs in a more general cognitive-motor domain or is an inde- perceptual-motor state is characterized by a complex interrelationship designed to make predictions using a highly fine-tuned and constrained pendent representation interacting with the action system motor operation. The extension of such mechanisms to language may (Fadiga & Craighero 2006). In addition, an intense debate exists occur only in sensorimotor circuits devoted to the action domain. that intimately links language and action at the ontogentic (Bates & Dick 2002) and phylogentic (Toni et al. 2008; Zlatev One of the most influential current theories suggests that higher- 2008) levels. From an evolutionary perspective, studies have pos- order socio-cognitive processes, such as mind/intention reading tulated that language initially evolved from manual gestures in the and the compound aspects of language, may be primarily form of a system of manual skills, pantomime, and protosigns “grounded” in sensorimotor brain (Barsalou 2008). (Arbib 2005; Leroi-Gourhan 1964; 1965). The subsequent con- Inspired by multiple-duty cells originally discovered in the ventionalization of signs and the shift to vocal emblems has monkey (di Pellegrino et al. 1992), neuroimaging studies have enabled the transition to more symbolic, alternative, and open proposed that the human brain is equipped with specific, rapid, systems of communication (Corballis 2009). and automatic mechanisms that share action execution and per- The second bottleneck refers to the controversial neural and ception in a common representational domain (Aglioti & Pazzaglia functional evidence reported by neuropsychological analyses. In 2010; 2011; Van Overwalle & Baetens 2009). According to Picker- brain-damaged patients with significant deficits of execution, the ing & Garrod (P&G), this inherent bidirectional, functional, and ability to predict and understand an observed/heard action may anatomical link seems to monitor the perception of other be spared. Although recent studies indicated a positive correlation agents’ actions through predictive mechanisms. The authors between deficits in perceived and performed actions (Buxbaum suggest that a simulative process might also run an internal gen- et al. 2005; Nelissen et al. 2010), many studies fail to provide erative representation that serves predictions in response to lin- straightforward evidence for direct matching between observed guistic input. Despite nearly two decades of intensive research and executed actions (Hickok 2009a). Precise perceptual-motor on the inextricable link between the perception and execution coding, on which predictions must be planned, would explain of action, there are two key problems with action prediction the input–output associations of the impairment, but not through simulation and, consequently, with its application to the range of dissociations reported at both the group (Cubelli language processing. et al. 2000; Halsband et al. 2001; Negri et al. 2007; Pazzaglia The first bottleneck concerns neurophysiological and cognitive et al. 2008a) or single-case (Pazzaglia et al. 2008b; Rumiati et al. constraints, as revealed by action predictive coding. Inferring the 2001) level. Moreover, the published studies do not clarify

42 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension whether or not neurologic patients are still able to infer the inten- action perspective, rather than in isolation from action, percep- tion of an observed action (Fontana et al. 2012), despite the dis- tion, and interaction dynamics, as most linguistic theories do. ruption in the ability to mentally simulate movements (Pazzaglia P&G’s theory thereby points toward the naturalization of linguis- et al. 2008a). tic communication and might help our understanding of how it Although the dissociations between the neural and functional develops on top of the nonlinguistic “interaction engine” of our aspects of matching among input–output still need to be clarified earlier evolutionary ancestors (Levinson 2006; Pezzulo 2011b). in aphasic and apraxic patients (Pulvermüller et al. 2005; Pazzaglia Second, to explain how interlocutors predict one another and 2013), the plausible implications of anatomical and clinical diver- monitor the ongoing interaction, P&G use the notions of forward gences cannot be ignored. Dissociation, rather than the associ- modeling and prediction-by-simulation (Dindo et al. 2011; Grush ation of neuropsychological deficits in brain-damaged patients, 2004;Wolpertetal.2003). These notions are increasingly adopted continues to be a highly sensitive verification technique that is in cognitive and social neuroscience; language studies could necessary to exclude vitiates and define the reliability boundaries greatly benefit from linking to the same mechanistic framework. of empirically viable theories. Note that P&G’s theory does not overlook the specificities of Therefore, the range of possible dissociations between pro- language processing, and it assumes that such a processing is struc- duction and comprehension, which can occur in both action and tured along multiple levels: semantic, syntactic, and phonological. language, is rather multifarious. In this conception, such dis- Third, P&G provide a theoretically sound motivation for the use sociations are reliant upon higher-order sensorimotor experiences of production processes in (language) comprehension and com- manifesting in the computational brain, namely: the intention to prehension processes in (language) production, dissolving a act; stable memory traces for different types of percepts; and strict production-comprehension dichotomy. P&G’s analysis the ecological and cultural conditions in which gestural, linguistic, links well to a large body of evidence documenting the inter- and affective communication are implemented. Such processes actions between perception and production processes outside could probably also interact with unique, more basic, low-level linguistic communication (Bargh & Chartrand 1999; Frith & motor-resonance mechanisms (Mahon & Caramazza 2008). In Frith 2008; Sebanz et al. 2006a), making it an excellent entry particular, this can include the automatic selection of symboliza- point to study dialogue within an action-based framework. tion on which judgments regarding communication and predic- Fourth, P&G’s theory explains how prediction and covert imita- tions of appropriateness are based. tion increase communication success and produce the automatic P&G discuss and emphasize studies that interweave the pro- alignment of linguistic representations, which they have exten- duction and recognition of actions. However, they too quickly sively documented empirically (Garrod & Pickering 2004; Picker- exclude the limits of prediction via the simulation of action. By ing & Garrod 2004). not looking closely at the crucial roles of the physiological These four reasons notwithstanding, however, we propose not process (whereby predictions emerge through extremely fine- only that producers and comprehenders predict and covertly grained cognitive-motor operations) and the neurologic popu- imitate their interlocutors, but also that they adopt intentional lation (behavioral and anatomical disease is fully dissociable), strategies that make their actions and intentions more predictable the extension of such mechanisms to language may become and comprehensible (D’Ausilio et al. 2012a; Pezzulo 2011c; unwarranted in situations where language does not call on cogni- Sartori et al. 2009; Vesper et al. 2011). In other words, to increase tive-motor representations. A fruitful direction for tracking a com- communication success, we propose that they facilitate another’s plete theory of language processing must not only recognize the predictive (but also perceptual, inferential, attention, and degree to which the processes underlying language and action memory) processes. For example, they can adopt signaling strat- are similar but should also discuss the intertwined and integrated egies to remain predictable and form common ground. aspects of this relationship, at least in a conceptual sense. There are various demonstrations in which producers modulate their behavior (e.g., loudness, choice of words, speech rate) – ACKNOWLEDGMENT depending on contextual factors (e.g., amount of noise, prior Mariella Pazzaglia is supported by the International Foundation knowledge or uncertainty of the comprehender) – to help the com- for Research in Paraplegie (IRP, P133). prehender’s predicting and understanding. A well-known case is the exaggeration of the vowels in child-directed speech (“motherese,” Kuhl et al. 1997). Not only are these modulations used during teach- ing, but also when comprehension is difficult, as is evident to those Intentional strategies that make co-actors finding themselves speaking more loudly and over-articulating in more predictable: The case of signaling noisy environments (the so-called “Lombard effect”). Signaling strategies can be characterized in terms of efficient doi:10.1017/S0140525X12002816 management of resources (e.g., articulatory effort, time) within an optimization framework that optimizes the joint goal of com- Giovanni Pezzuloa,b and Haris Dindoc munication success (Pezzulo 2011c; Pezzulo & Dindo 2011); see aIstituto di Linguistica Computazionale “Antonio Zampolli,” CNR, 56124 Pisa, also Moore (2007). Signaling consists of the intentional modu- Italy; bIstituto di Scienze e Tecnologie della Cognizione, CNR, 00185 Roma, lation of one’s own behavior (e.g., over-articulation) so that, in Italy; cComputer Science Engineering, University of Palermo, 90128 addition to its usual pragmatic or communicative goals (e.g., Palermo, Italy. informing the interlocutor), the performed action fulfills the [email protected] [email protected] additional goal of facilitating the interlocutor’s prediction and monitoring processes (e.g., lowers the uncertainty or cognitive load). Compared to an optimal action, this modulation comes at a “cost” (e.g., a motor cost associated with over-articulation). Abstract: Pickering & Garrod (P&G) explain dialogue dynamics in terms of However, as signaling ultimately helps to maximize the joint forward modeling and prediction-by-simulation mechanisms. Their theory goal of communicative success, it is part of a joint action optimiz- dissolves a strict segregation between production and comprehension ation process and is not (necessarily) altruistic. processes, and it links dialogue to action-based theories of joint action. fi We propose that the theory can also incorporate intentional strategies that Within the optimization framework, a cost-bene t analysis increase communicative success: for example, signaling strategies that determines the decision to signal or not. This implies that signal- help remaining predictable and forming common ground. ing should be more frequent in uncertain contexts, when predic- tion is harder. To assess the theory, we designed a (nonlinguistic) We highly appreciate Pickering & Garrod’s (P&G’s) theory for joint task in which signaling determined a motor cost. We four main reasons. First, P&G address dialogue from a joint reported that the producers’ signaling probability depended on

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 43 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension the comprehenders’ uncertainty; producers stopped signaling The fluency and rapidity with which we make ourselves under- when it was low (Pezzulo & Dindo 2011). As this was the case stood, especially within the context of a dialogue, demands expla- even when producers received no online feedback from the com- nation: It is astounding that speakers alternate with essentially a prehender, we hypothesized that they maintained an internal 0-ms gap between turns (Sacks et al. 1974). In their target model of the comprehender’s uncertainty. Computational and article, Pickering & Garrod (P&G) rise to this challenge and put empirical arguments suggest that the choice of signaling was stra- forward an interesting and cogent framework that addresses this tegic and intentional (although not necessarily conscious) rather pace, built upon an intertwining of production and comprehen- than a by-product of interaction dynamics. sion processes in the service of language as an action system. In repeated interactions, signaling and other forms of sensori- This intertwining is the headline of their proposal, but the real motor communication help in sharing representations and main- explanatory meat lies in how these processes are jointly used: taining a reliable common ground, too (Clark 1996; Pezzulo & the creation and checking of forward models, also known as pre- Dindo 2011; Sebanz et al. 2006a). For example, by emphasizing dictions, about upcoming linguistic events. These predictions the change of topic during a dialogue, a producer can reduce speed comprehension, speed production, and thereby contribute the comprehender’s uncertainty at the level of task represen- to “the remarkable fluency of dialogue” (target article, sect. 5, tations (say, dialogue topics) rather than only relative to the para.1). current utterance and so form a common ground (i.e., “align” We agree that many aspects of language use (especially within the task representations of the interlocutors). Considerations of dialogue) rely heavily on prediction, and in particular rely heavily parsimony apply, also: Although costly to maintain, the common on predictions about observable aspects of language, for example, ground facilitates the continuation of the interaction, as both a speaker’s stops and starts. We therefore understand why P&G interlocutors can use it to predict what comes next. Furthermore, might conclude that language is a form of action and action per- alignment entails parsimony: The shared part need not be main- ception, and why they then afford a central position to forward tained in two distinct forward models (one for each interlocutor), models and their ability to predictively monitor and control but the same forward model can be used as a “production model” actions. But although we certainly concur that prediction plays for one interlocutor and “recognition model” for the other. We an important explanatory role for theories of language, we modeled the interplay between shared task representations and cannot help feeling that the emphasis given to action-based pre- online action predictions using a hierarchical generative (Baye- diction in this model – and prediction in general throughout sian) architecture in which the former provide priors to the much of recent psycholinguistics (Altmann & Mirković2009; latter, and signaling strategies can be used to share task represen- DeLong et al. 2005; Dikker et al. 2009; Hale 2001; Levy tations intentionally (Pezzulo 2011c; Pezzulo & Dindo 2011). 2008) – is overstated. The truly unique and indispensible power Our proposals on signaling and joint action optimization can be of language does not lie in its ability to quickly communicate expressed in the neurocomputational architecture of P&G’s the foreseeable, but rather the unforeseeable; to rapidly transfer theory. For example, producers can modulate their behavior information that is novel, surprising, and unpredictable. In online by using prior knowledge and a forward model of the com- P&G’s example, The day was breezy so the boy went outside to prehender’s comprehension processes (e.g., they can over-articu- fly a kite, the critical phenomenon to explain is not why the last late if the environment is noisy or if they foresee prediction word kite is processed faster and more efficiently than the first errors). In turn, comprehenders can use the feedback channel word day, but rather how the initial phrase, The day was strategically to expose their mental states and uncertainty by breezy, is understood at all, given its completely unpredictable using a forward model of the producer’s prediction and monitor- location halfway through a paper on psycholinguistic theory. ing process. Furthermore, producers can use offline predictions Unfortunately, this phenomenon is left unexplained by the frame- (briefly mentioned in P&G’s theory) to maintain a model of the work of P&G, as it is not directly related to prediction or action comprehender’s prior knowledge and uncertainty, and to perception. No amount of forward modeling can produce the foresee the long-term communicative effects of their intended meaning of this initial phrase, as this meaning is not predictable messages (a form of recipient design). from the preceding context in any substantive way. By incorporating these (and similar) mechanisms, P&G’s theory By ignoring this crucial, and to our minds primary, function of can explain intentional strategies such as signaling that – we language – extracting meaning from novel expressions – P&G do argue – act in concert with automatic processes of alignment not allow their framework to get off the ground. As their examples and mutual imitation to facilitate prediction, align representations, testify, their model works well when predictions about time t+1 and form common ground. are generated during the last stages of a sentence. We see little evidence, however, that their model can explain what happens when t = 0, at the beginning of a sentence: Prediction relies on context, and within P&G’s prediction-centric framework, there is no provision for the initial creation of a context. Prediction is no panacea: The key to language Ultimately, we think that solving this problem requires P&G to is in the unexpected drop, or at least substantially soften, their characterization of language comprehension as a form of action perception. Under- doi:10.1017/S0140525X12002671 standing linguistic expressions goes far beyond perceiving the

a b actions by which they are delivered, and often, as in the case of Hugh Rabagliati and Douglas K. Bemis reading, there are no actions to be perceived at all. Neurologically, aDepartment of Psychology, University of Edinburgh, Edinburgh EH8 9JZ, this dissociation between perception and understanding is clearly b Scotland, United Kingdom; INSERM-CEA Cognitive Neuroimaging Unit, demonstrated by transcortical sensory aphasia (Boatman et al. CEA/SAC/DSV/DRM/Neurospin Center, F-91191 Gif-sur-Yvette Cedex, 2000; Lichtheim 1885), where patients can repeat words (i.e., France. use perception and production) without understanding them. [email protected] [email protected] Language, then, cannot simply be an action system but rather a system capable of productively transforming incoming perceptual elements into complex internal mental representations that Abstract: For action systems, the critical task is to predict what will convey meaning. To their credit, P&G recognize this problem to a certain degree happen next. In language, however, the critical task is not to predict the “ fi next auditory event but to extract meaning. Reducing language to an and include well-de ned levels of linguistic representation, such action system, and putting prediction at center, mistakenly marginalizes as semantics, syntax, and phonology” (target article, sect. 1.3, our core capacity to communicate the novel and unpredictable. para. 9) in their proposed cognitive architecture. However, it is

44 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension unclear how these levels operate within an action/action percep- Although memory processes are not explicit in P&G’s frame- tion system, as P&G do not specify whether their attempt to work, the model invokes the maintenance and evaluation of mul- “reject the cognitive sandwich” (sect. 1.2, para. 3) entails collap- tiple predictions and percepts, and relies on the retrieval of sing action, perception, and cognition into one system (as contextual information to create forward, anticipatory models of Hurley [2008a] proposed), or just action and perception. Either individuals’ linguistic and nonlinguistic actions. Memory and way, linguistic representations are too marginalized within the other cognitive functions are presumably an important part of model and require considerable elaboration to capture the rich these processes. A large body of work has investigated how communicative possibilities of human language. The insistence language processing interfaces with other cognitive abilities but, that language is only an action system leaves P&G with a model like most psycholinguistic research, this has progressed mainly that, although possibly eliminating the “cognitive sandwich,” independently in studies of language comprehension and pro- limits any explanation of the core function of language. duction. Despite this divide, recent work is converging on We believe that accounts of language must first and foremost similar conclusions about the types of nonlinguistic cognitive explain the understanding of novel expressions. In other words, systems that are critically involved in language production and it is not the primary function of language to align turns in a dialo- comprehension. This suggests that the role of these cognitive gue by facilitating the comprehension of predictable words, but systems might fruitfully be included in the forward modeling pro- rather to enable a listener to understand the meaning of a cesses advocated in P&G’s framework. We highlight how a few speaker. Any model of language must conform to this prioritiza- aspects of this framework might draw on other cognitive systems. tion and place understanding at the center, flanked by supporting Generating a prediction (of one’s own or another’s speech) processes such as prediction. relies heavily on memory processes. Indeed, anticipating how an To be sure, the type of forward models proposed by P&G may utterance or a discourse will unfold necessarily depends on the still play an important role within such a framework as control rapid coordination of considerable linguistic and contextual evi- systems. In the same way that forward models can help explain dence (Altmann & Kamide 1999; Tanenhaus 2007). To predict how a dancer completes a complex fouetté en tournant without effectively (and hence avoid confusion or misinterpretation), tumbling over, they can help explain the surprisingly error-free current input must be linked to representations in working execution of complex, rapid, interlaced dialogues. But just as we memory and in a longer-term store of prior experience. Moreover, would not expect theories of motor control to explain acts of individuals must be able to update and override these represen- motor creativity (such as how a dancer improvises), we should tations as new input is encountered moment by moment. not expect an analogous theory to explain the core creative In the case of prediction-by-association, language users must aspects of language: the algorithms by which an entirely unex- retrieve situation-relevant information and schemas from pected sentence can be integrated and understood, or by which memory as well as encode relevant information for use in future a complex novel thought becomes articulated as a sentence. associative predictions. It is, therefore, unsurprising that the In sum, we do not doubt that people make predictions during ease with which interlocutors can successfully encode and retrieve language use, quite possibly through the construction and evalu- relevant associations in memory relates to how successfully they ation of forward models. We just do not believe that these predic- can align their discourse models, both in terms of the utterance tions comprise the foundation stones of a psychological theory of choices that speakers make (Horton & Gerrig 2005) and the communication. Rather, we believe psycholinguists should focus interpretations that listeners reach (Brown-Schmidt 2009). on the representations these forward models are computed Prediction-by-simulation, too, likely relies on memory pro- over, the representations that allow creative linguistic thought. cesses. For example, the accessibility of information in memory influences how and when information is produced (Slevc 2011), and because prediction-by-simulation relies on internal pro- duction mechanisms, memory-based accessibility must also influ- ence the prediction of others’ speech. This is indeed the case. For example, anaphor resolution is sensitive to the cognitive promi- Memory and cognitive control in an integrated nence of antecedents (Cowles et al. 2007), and more accessible theory of language processing syntactic structures are easier to parse (Branigan et al. 2005). Additionally, irrelevant information active in memory can inter- doi:10.1017/S0140525X12002683 fere with both production (Slevc 2011) and parsing (Fedorenko et al. 2006), which in some cases could be construed as interfer- L. Robert Slevca and Jared M. Novickb ence with one’s successful prediction of upcoming material in aDepartment of Psychology and bCenter for Advanced Study of Language, real time. University of Maryland, College Park, MD 20742. In a sense, memory underlies the generation of predictions – [email protected] [email protected] linguistic and otherwise – and, conversely, it is when predictions are not met that linguistic information is better learned or encoded into memory (e.g., Chang et al. 2006). There is, there- Abstract: Pickering & Garrod’s (P&G’s) integrated model of production fore, a tight linkage of memory and language processes; in fact, and comprehension includes no explicit role for nonlinguistic cognitive the processes of forward modeling involved in language proces- processes. Yet, how domain-general cognitive functions contribute to sing may even be the foundation for much of our verbal language processing has become clearer with well-specified theories and supporting data. We therefore believe that their account can benefitby memory ability (cf. Acheson & MacDonald 2009). incorporating functions like working memory and cognitive control into But it is not just the act of generating predictions that relies on a unified model of language processing. nonlinguistic cognitive processes. Another crucial component of P&G’s model is monitoring, that is, comparing predicted to Pickering & Garrod (P&G) offer an integrated model of language observed utterance precepts. This comparison presumably processing that subsumes production and comprehension into a involves a process of detecting mismatch (or conflict) and resol- single cognitive framework, treating language as a form of ving any discovered incompatibility. Mounting evidence suggests action and action perception (cf. Clark 1996). This model draws that conflict detection and its resolution via cognitive control on work linking prediction to language comprehension (e.g., plays an important role in both language comprehension and pro- Rhode et al. 2011) and production (e.g., Dell et al. 1997), and duction (Novick et al. 2009). During comprehension, conflict is a fits with the more general idea that we interpret our world not natural by-product of incremental parsing: When late-arriving evi- only by analyzing incoming information, but also by initiating dence is inconsistent with a reader’s or listener’s current represen- proactive processes of prediction and expectation (Bar 2009). tation of sentence meaning, conflict resolution and cognitive

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 45 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension control functions deploy to revise earlier processing commitments In general terms, we question the rationale behind the proposal (Novick et al. 2005). Presumably, this applies to the monitoring that incomplete representations constitute the basis of speech function as well: Conflict resolution processes must adjudicate monitoring. A crucial aspect of P&G’s model refers to timing. when an utterance precept is inconsistent with a speaker’s or lis- Because forward representations are computed faster than the tener’s expectation. actual representations that will be used to produce speech, the Linguistic conflict resolution functions depend on the involve- former serve to correct potential deviations in the latter represen- ment of the left inferior frontal gyrus (IFG), an area recruited tations. To ensure that the forward representations are available when conflict must be resolved during nonlinguistic memory earlier than the implemented representations, P&G propose tasks (Jonides & Nee 2006). If conflict resolution underlies pro- that the percepts constructed by the forward model are impover- cessing in a shared production/comprehension system, then defi- ished, containing only part of the information necessary to cits in these conflict resolution functions (e.g., in patients with produce speech. But how can speech monitoring be efficient if circumscribed lesions to the left IFG) should yield both expressive it relies on “poor” representations to monitor the “rich” represen- and receptive language deficits when linguistic representations tations? For instance, a predicted syntactic percept could include conflict. This is indeed the case: Such patients are known to grammatical category, but lack number and gender information. have selective memory impairments when conflict/interference In this example, it is evident that if the slower production imple- demands are high (Hamilton & Martin 2007; Thompson-Schill menter is erroneously preparing a verb instead of a noun, the pre- et al. 2002), and they also suffer concomitant production and com- dicted representation coming from the forward model might prehension impairments under similar conditions (Novick et al. indeed serve to detect and correct the error prior to speech 2009). proper. However, if the representation prepared by the pro- In sum, we believe that an important extension of P&G’s model duction implementer contains a number or gender error, given is to consider how language processing interfaces with other cog- that this information is not specified in the predicted percept (in nitive systems such as working memory and cognitive control. This this example), then how do we avoid these errors when speaking? raises a number of questions; for example, how general or specific If the predicted language percepts are assumed to always be are the cognitive systems involved in prediction and monitoring? incomplete in order to be available early in the process, it is If domain-general, which domain-general mechanisms are truly remarkable that an average speaker produces only about 1 involved – for example, what are the roles of implicit and explicit error every 1,000 words (e.g., Levelt et al. 1999). Therefore, memory, and do other executive functions contribute? Consider- although prediction likely plays an important role in facilitating ation of these types of issues is likely to lead toward a more fully the retrieval of relevant language representations (e.g. Federme- integrated theory of language processing and of cognitive function ier 2007; Strijkers et al. 2011) and hence could also serve to aid the more generally. speech monitor, a proposal that identifies predictive processes with the inner speech monitor seems problematic or at least underspecified for now. Besides the above theoretical concern regarding the use of incomplete representations as the basis of speech monitoring, The poor helping the rich: How can incomplete also the strength of the evidence cited to support the use of representations monitor complete ones? forward modeling in speech production seems insufficient at present. The three studies discussed by P&G to illustrate the doi:10.1017/S0140525X12002695 usage of efference copies during speech production (i.e., Heinks-Maldonado et al. 2006; Tian & Poeppel 2010; Tourville Kristof Strijkers,a Elin Runnqvist,b Albert Costa,c and et al. 2008), demonstrate that shifting acoustic properties of lin- Phillip Holcombd guistic elements in the auditory feedback given to a speaker pro- aLaboratoire de Psychologie Cognitive, CNRS and Université d’Aix-Marseille, duces early auditory response enhancements. Although these data 13331 Marseille, France; bDepartamento de Psicología Básica, Universitat de are suggestive and merit further investigation, showing that reaf- Barcelona, Barcelona 08035, Spain; cUniversitat Pompeu Fabra, Center for ference cancellation generalizes to self-induced sounds does not Brain and Cognition, ICREA, Barcelona 08018, Spain; dDepartment of prove that forward modeling is used for language production Psychology, Tufts University, Medford, MA 02155. per se. It merely highlights that a mismatch between predicted [email protected] [email protected] and actual self-induced sounds (linguistic or not) produces an [email protected] [email protected] enhanced sensorial response just as in other domains of self- induced action (e.g., tickling). As for now, no study has explored Abstract: Pickering & Garrod (P&G) propose that inner speech monitoring whether auditory suppression related to self-induced sounds is is subserved by predictions stemming from fast forward modeling. In this commentary, we question this alignment of language prediction with the also sensitive to purely linguistic phenomena (e.g., lexical fre- inner speech monitor. We wonder how the speech monitor can function quency) or to variables known to affect speech monitoring (e.g., so efficiently if it is based on incomplete representations. lexicality). This leaves open the possibility that the evidence cited only relates to general sensorimotor properties of speech Pickering & Garrod’s (P&G’s) integrated account of language pro- (acoustics and articulation) rather than the monitoring of language duction and comprehension brings forward novel cognitive mech- proper. anisms for a range of language processing functions. Here we In addition, the temporal arguments put forward by P&G to would like to focus on the theoretical development of the conclude that these data cannot be explained by comprehen- speech monitor in P&G’s theory and the evidence cited in sion-based accounts and instead support the notion of speech support of it. The authors propose that we construct forward monitoring through prediction are premature. For instance, models of predicted percepts during language production and P&G take the speed with which self-induced sound auditory sup- that these predictions form the basis to internally monitor and if pression occurs (around 100 ms after speech onset) as an indi- necessary correct our speech. This view of a speech monitor cation that speakers could not be comprehending what they grounded in domain-general action control is refreshing in many heard and argue that this favors a role of forward modeling in ways. Nevertheless, in our opinion it raises a general theoretical speech production. But, the speed with which lexical represen- concern, at least in the form in which it is implemented in tations are activated in speech perception is still a debated issue P&G’s model. Furthermore, we believe that it is important to and some studies provide evidence for access to words within emphasize that the evidence cited in support of the use of 100 ms (e.g., MacGregor et al. 2012; Pulvermüller & Shtyrov forward modeling in speech monitoring is suggestive, but far 2006). In a similar vein, P&G rely on Indefrey and Levelt’s from directly supporting of the theory. (2004) temporal estimates of word production to argue in favor

46 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension of speech monitoring through prediction. However, this temporal would vary as a consequence of talker identity. Consider, for map is still hypothetical, especially in terms of the latencies example, an eye-tracking experiment from our laboratory between the different representational formats (see Strijkers & showing rapid comprehension of a regional accent (Trude & Costa 2011). More generally, one may question why P&G Brown-Schmidt 2012). In this study, participants heard two choose to link the proposed model, intended to be highly dynami- talkers: a male with American English dialect in which /æ/ raises cal (rejecting the “cognitive sandwich”), with temporal data to /eɪ/ only before /g/ (e.g., tag [teɪg], but tack [tæk]), and a embedded in fully serial models. Indeed, if one abandons the female without this shift. On critical trials, participants viewed strictly sequential time course of such models and instead allows four images: a /k/-final target (e.g., tack), a /g/-final cohort compe- for fast, cascading activation of the different linguistic represen- titor (e.g., tag), and two unrelated pictures. Participants clicked on tations, not only do the arguments of P&G become problematic, the image named by one of the talkers. The results indicated that but also the notion of a slow production/comprehension imple- when listening to the male talker, participants fixated tag less upon menter being monitored by a fast and incomplete forward hearing tack. Participants ruled the competitor out more quickly model loses a critical aspect of its theoretical motivation. because the way that the male talker would have pronounced its To sum up, we believe that theoretical development of the vowel was not consistent with the input. Hence, we observed speech monitor in P&G’s integrated account of language pro- that listeners mentally represented an unfamiliar regional accent duction and comprehension faces a major challenge since it and used their knowledge rapidly enough to influence processing needs to explain how representations that lack certain dimensions of a single word (see also Dahan et al. 2008). of information can serve to detect and correct errors to such a P&G argue that listeners rely more on simulation when the high – almost errorless – degree. Furthermore, it is important to talker is similar to them; however, the theory does not specify acknowledge that as it stands, the evidence used in support of what degree of similarity is necessary for listeners to be able to this proposal could just as easily be reinterpreted in other use prediction-by-simulation during comprehension and when terms, highlighting the need of direct empirical exploration of prediction-by-association is necessary. Additionally, for this P&G’s proposal. model to generate testable hypotheses, it must specify how listen- ers assess the input in order to determine whether to use simu- lation or association, and the basis and frequency on which they update their assessments. According to P&G, context is deter- mined using “information about differences between A’s speech When to simulate and when to associate? system and B’s speech system” (target article, sect. 3.2, para. 3; Accounting for inter-talker variability in the Fig. 6 caption), suggesting that listeners engage in a comparison speech signal process. However, the details of that process, and how it aligns with current theories of speech perception, are unclear. doi:10.1017/S0140525X12002701 At a phonological level, it is possible that our participants would have been able to use the simulation route to predict the male’s Alison M. Trude vowel shift since, as native English speakers, the vowel /eɪ/is Department of Psychology, University of Illinois at Urbana–Champaign, part of their own phonological system. It could also be the case, Champaign, IL 61820. however, that our participants used the association route since [email protected] their own phonological representation of tag includes an /æ/, rather than an /eɪ/. At the same time, because our talkers and Abstract: Pickering & Garrod’s(P&G’s) theory could be modified to participants were all American English speakers, their speech describe how listeners rapidly incorporate context to generate predictions was quite phonologically similar. Would this similarity have about speech despite inter-talker variability. However, in order to do so, allowed our participants to use the simulation route most of the the content of “impoverished” predicted percepts must be expanded to include phonetic information. Further, the way listeners identify and time, perhaps switching to association only for the critical vowel represent inter-talker differences and subsequently determine which shift? Considering that the two talkers alternated randomly in prediction method to use would require further specification. our study, and that certain features of their speech may have been more or less like those of a given participant at different A hallmark of speech perception is that despite inter-talker variabil- points in a single word, it seems as if it would have been necessary ity on dimensions including rate, pitch, and phonetic variation due to for the participant to constantly re-evaluate which prediction accents, comprehension usually proceeds quickly and with little con- route was more suitable from moment to moment. This process scious effort. P&G’s theory provides a potential means of accommo- would likely be too slow to implement and still produce dating this variability by including context in the inverse model the rapid online adaptation effects that we, and others, have during comprehension; however, although the theory is potentially observed. compatible with findings on talker adaptation, in order to generate A further question is whether listeners’ derived production testable hypotheses in this domain, it must more precisely specify commands are actually the same representations governing what information is included in listeners’ predictions and how listen- overt imitation in cases in which the talker’s and listener’s ers assess talkers’ speech to determine which prediction route is speech vary. In the model, the listener’s derived production more appropriate for the current input. command is generated after the percept has passed through the According to P&G’s model, listener-generated predictions are inverse model (which includes context), and therefore should impoverished, which suggests that they do not include fine- include information about the talker’s voice; however, it appears grained phonetic detail. However, a large body of research that this command is also used to generate overt imitation. If so, shows that not only do listeners use fine-grained acoustic-phonetic it seems that listeners should be able to imitate whatever features details online while processing speech (McMurray et al. 2009; of the talker’s speech they are able to predict (except when phys- Salverda et al. 2003), but that this phonetic detail can also affect iological differences prevent them from doing so). However, there listeners’ subsequent productions (Nielsen 2011). If P&G wish are many cases in which a listener may understand a talker’s to capture how listeners become phonetically aligned, allowing speech without readily producing certain features of it (Mitterer improved perception of an individual’s speech over time, the defi- & Ernestus 2008). The use of impoverished representations nition of “context” must be expanded to include phonetic details, during comprehension could explain listeners’ failure to overtly as well as listeners’ previous experiences with a particular talker or imitate these features; however, the fact that listeners can use group of talkers. fine-grained acoustic detail during comprehension seems at odds Another question raised by the model is how listeners’ use of with this explanation. Furthermore, it has been shown that listeners the prediction-by-simulation and prediction-by-association routes can accommodate sub-phonemic features during imitation (Nielsen

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 47 Commentary/Pickering & Garrod: An integrated theory of language production and comprehension

2011), though they may do so to varying degrees (Babel 2012). different from their partner, as in adult-child conversations, and Therefore, it would seem the theory needs a mechanism account- in reading, where the addressee does not typically speak. In ing for the dissociation between listeners’ use of phonetic detail what follows, we argue that with both familiar and unfamiliar part- during comprehension and production. ners, and in talking and reading alike, language users make soph- In conclusion, P&G’s theory can potentially explain talker- isticated predictions about future language use, based on available specific adaptation during comprehension because it allows a contextual information. role for context while making predictions about a talker’s P&G argue that an addressee using the simulation mechanism speech. Furthermore, the rapid generation and implementation will predict an utterance using an inverse model and contextual of representations is consistent with work using online methods information – a prediction about what the speaker would say. that show talker-specific adaptation over the course of a single P&G do not specify the details of this contextual information. word. However, there are many open questions that remain However, to predict what the speaker would say requires assessing about how listeners represent and predict the acoustic features the speaker’s context, where context must be broadly defined to of individuals’ speech that must be addressed to make this a include both local perceptual information as well as historical useful model of talker adaptation. information about the person’s dialect and past experience. After all, depending on a person’s age and regional dialect, a ACKNOWLEDGMENT given semantic meaning might be expressed as great, rad,or The writing of this commentary was supported by a National wicked. Similarly, in cases where a potential alternative referent Science Foundation Graduate Research Fellowship, no. is occluded from the speaker’s but not the addressee’s view 2010084258. (e.g., the speaker sees one cup whereas the addressee sees two), the context would predict different referential expressions depending on which perspective was used. We read P&G as saying that in each of these cases, on the simulation mechanism, What is the context of prediction? the addressee would predict what the speaker will say from the speaker’s perspective, predicting “wicked” or “the cup,” even doi:10.1017/S0140525X12002713 though “wicked” might connote a negative valance to the addres- see (rather than the intended positive valance), and even though Si On Yoona and Sarah Brown-Schmidtb “the cup” would be ambiguous from the addressee’s perspective. aDepartment of Psychology, University of Illinois, Champaign, IL 61820; Therefore, even though the prediction is executed using the bDepartment of Psychology, Beckman Institute for Advanced Science and addressee’s production system, the entire prediction process Technology, University of Illinois, Champaign, IL 61820. would have to be tailored to the addressee’s beliefs about the [email protected] [email protected] speaker’s context. In our view, the process would be no different on an association ’ ’ Abstract: We agree with Pickering & Garrod s (P&G s) claim that theories view in which addressees predict based on previous perceptual of language processing must address the interconnection of language experience. P&G propose that association occurs when interlocu- production and comprehension. However, we have two concerns: First, the central notion of context when predicting what another person will tors are dissimilar or the production system is not engaged (e.g., in say is underspecified. Second, it is not clear that P&G’s dual-mechanism reading). However, even when unfamiliar interlocutors have model captures the data better than a single-mechanism model would. different perspectives, overwhelming evidence now suggests that listeners do not progress egocentrically, but instead take We agree with Pickering & Garrod’s (P&G’s) claim that models of into account information about their partner’s context (e.g., language use must take into account the fact that production and Hanna et al. 2003; Heller et al. 2008). Similarly, as in live conver- comprehension processes are interwoven in time and intercon- sation, readers make rapid predictions when reading (Federmeier nected as in the case of split turns. Indeed, the most basic form & Kutas 1999) and tailor referential interpretation based on rep- of language use is arguably conversation, in which interlocutors resentations of the number of entities in the discourse context act both as producers and addressees, and each type of act (Greene et al. 1992; Nieuwland et al. 2007). Hence, it would involves elements of the other. seem that prediction-by-association would have to be tailored to The importance of examining dialogic processes has become the particular context of language use, just as in simulation. apparent following a surge of interest in studying language in What advantage, then, is gained by positing a second mechanism natural settings (Pickering & Garrod 2004; Trueswell & Tanen- to prediction? P&G suggest association might not afford rapid haus 2005). This interest has generated new, significant findings turn-taking, however this seems less of an argument to posit this in conversation including development of techniques for indepen- mechanism than an argument against it, given that turn-taking dently quantifying and predicting the degree of coordination in is, in fact, rapid. P&G also suggest that individuals might choose conversation (Richardson et al. 2007), as well as evidence that to use either association, simulation, or a combination of the coordinated contextual representations facilitate use of potentially two; however, it is unclear how these decisions would be made, ambiguous referential expressions (Brown-Schmidt & Tanenhaus and how the outputs of these mechanisms would be integrated 2008). Similarly, experiments in noninteractive settings increas- during real-time processing. ingly focus on dialogue-relevant questions such as how represen- In conclusion, we applaud P&G’s emphasis on the way pro- tations of others’ mental states guide processing (Ferguson et al. duction and comprehension are interwoven in natural communi- 2010). cation. However, in emphasizing the remarkable skill needed to A central feature of language use is that it is produced and produce, for example, a split turn, the authors overlook potential understood with respect to a particular context that constrains redundancy in the dual-mechanism proposal. Indeed, the evi- both what we say and how we say it. In conversation, the basic dence suggests that in a large variety of circumstances, interlocu- context is often assumed to be the interlocutors’ common tors integrate context and common ground into processing ground (Clark 1996) with conversational efficiency increasing as predictions. The accuracy, speed, and type of prediction seem common ground grows (Wilkes-Gibbs & Clark 1992). According to be determined largely by factors such as the quality of the lis- to P&G’s proposal, context plays a key role in circumscribing pre- tener’s estimation of the speaker’s context, and whether attending diction and, as a result, conversational efficiency. to one’s partner’s context is relevant to the communicative goals According to P&G, listeners predict upcoming utterances using (Yoon et al. 2012). Hence, the determining factor in the quality one of two mechanisms, simulation and association. Listeners of prediction should be seen as context-modeling, rather than a use the simulation mechanism with familiar partners, and the decision to use one mechanism in a hypothesized processing association mechanism when they perceive themselves as being architecture. Understanding the mechanisms that determine the

48 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Response/Pickering & Garrod: An integrated theory of language production and comprehension context of conversation, and the degree to which the contexts of R1. Production, comprehension, and the the speaker and listener are coordinated, then, would seem to “cognitive sandwich” be a central goal for understanding dialogic processes. Our account has the overall goal of integrating production ACKNOWLEDGMENTS and comprehension. Our specific models in Figures 5–7 Preparation of this article was partially supported by NSF grant (target article) are incompatible with traditional separation no. BCS 10-19161 to S. Brown-Schmidt. Thank you to Jennifer of production and comprehension as shown in Figure R1 Roche and Kara Federmeier for helpful discussions. (repeated here from the target article). Some commenta- tors addressed the question of whether our proposal leads to a radical rethinking of the relationship between the two. In fact, there are two issues concerning Figure R1.One is the extent to which production and comprehension are Authors’ Response separate. In terms of the cognitive sandwich, are there two pieces of bread (rather than a wrap)? Our proposal is that instances of production involve comprehension pro- cesses (Fig. 5) and that instances of comprehension Forward models and their implications for involve production processes (Fig. 6). But production and production, comprehension, and dialogue comprehension processes are nevertheless distinct – pro- duction involves mapping from intention to sound, and doi:10.1017/S0140525X12003238 comprehension involves mapping from sound to intention. a b The second issue is whether the bread is separate from the Martin J. Pickering and Simon Garrod fi aDepartment of Psychology, University of Edinburgh, Edinburgh EH8 9JZ, lling. In other words, what is the relationship between United Kingdom; bUniversity of Glasgow, Institute of Neuroscience and production/comprehension and nonlinguistic mechanisms Psychology, Glasgow G12 8QT, United Kingdom. (thinking, general knowledge)? We propose that at least [email protected] some aspects of general knowledge can be accessed [email protected] during production and during comprehension, and more-]PickeringMartin∼simon/ over that interpreting the intention involves general knowl- edge. These aspects of general knowledge of course draw Abstract: Our target article proposed that language production on a variety of cognitive functions such as memory and and comprehension are interwoven, with speakers making conflict resolution (see Slevc & Novick). predictions of their own utterances and comprehenders making By interweaving production and comprehension, we ’ predictions of other people s utterances at different linguistic have proposed that our account is incompatible with the levels. Here, we respond to comments about such issues as traditional “cognitive sandwich” (Hurley 2008a) – an archi- cognitive architecture and its neural basis, learning and tecture in which production and comprehension are iso- development, monitoring, the nature of forward models, communicative intentions, and dialogue. lated from each other. Because production is a form of action, the use of production processes during comprehen- Our target article proposed a novel architecture for langu- sion means that comprehension involves a form of embodi- age processing. Rather than isolating production and com- ment (i.e., uses action to aid perception). We also suggested prehension from each other, we argued that they are that our account may be compatible with embodied closely linked. We first claimed that people predict their accounts of meaning (e.g., Barsalou 1999; Glenberg & own and other people’s actions. In a similar way, we Gallese 2012). In contrast, Dove argues that we assume argued that speakers predict their own utterances and com- different intermediate and disembodied levels of represen- prehenders predict other people’s utterances at a range of tation (e.g., syntax, phonology) that are not grounded in different linguistic levels. modality-specific input/output systems. We agree that our The commentators made a wide range of perceptive account is not compatible with this form of embodiment, points about our account, and we thank them for their and we accept his conclusion that we retain some amodal input. We have divided their arguments into seven sections. representations but abandon a form of modularity. We first respond to comments about the relation between Our proposal for the relationship between production production, comprehension, and other cognitive processes; and comprehension runs counter to traditional interactive in the second section, we turn to questions about the neural accounts which assume that cascading and feedback basis for our account. We said very little about learning and occur during production. For example, Dell (1986) development in the target article, and our third section assumed that a speaker activates semantic features and responds to those commentators who considered its impli- that activation cascades to words associated with those fea- cations for these issues. In the fourth section, we address tures and sounds associated with those words, and that acti- the more technical issue of the nature of the represen- vation then feeds back from the phonemes to the activated tations created by forward modelling and how they are words and to other words involving those phonemes. compared with implemented representations during moni- Because this process involves several cycles before acti- toring. We respond, in the fifth section, to the commenta- vation settles on a particular word and set of phonemes, tors who remarked on the nature of prediction-by- Dell regards early stages of this process as involving predic- simulation and its relationship to prediction-by-association. tion. In his very interesting proposal, cascading and feed- Finally, in sections six and seven we look at broader ques- back are internal to the implementer and are causal in tions relating to the scope of the account: the nature of bringing about the (implemented) linguistic represen- communicative intentions, and the implications of the tations. This also seems to be the position adopted by account for dialogue. Mylopoulos & Pereplyotchik, who replace our forward

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 49 Response/Pickering & Garrod: An integrated theory of language production and comprehension

Figure R1. A traditional model of communication between A and B (repeated from target article). model with an utterance plan internal to the implementer, feedback. Hartsuiker also discusses a patient who detects and by Mani & Huettig, who regard it as a third route to phonemic errors in others but who does not repair his prediction. In contrast, our predicted representations are own (frequent) phonemic errors. It is possible that proprio- the result of the efference copy of the production ceptive monitoring is disturbed (and that such monitoring command, and they are therefore separate from the imple- is particularly important for repairing his phonemic menter. Our account of monitoring thus involves compar- errors), but outer-loop monitoring is preserved. ing two separate sets of representations and so is very Other commentators point out that the classical different from both Dell and Mylopoulos & Pereplyotchik. Lichtheim–Broca–Wernicke neurolinguistic model is Bowers’ discussion of Grossberg (1980) also appears inconsistent with much recent neurophysiological data. In similar to Dell’s proposal and usefully demonstrates imple- fact, we believe that their proposals are likely to be quite menter-internal prediction. He also queries why forward close to ours. Hickok assumes a dorsal stream which sub- modelling should speed up processing when its output is serves sensorimotor integration for motor control (i.e., pro- subsequently compared with the output of a slower imple- duction) and a ventral stream which links sensory inputs to menter. The reason is that the comparison can be made as conceptual memory (i.e., comprehension). Both systems soon as the implementer’s output is available (at any level). make use of prediction, with motor prediction facilitating Otherwise, it is necessary to analyse the implementer’s production and sensory prediction facilitating comprehen- output (as in comprehension-based monitoring; Levelt sion. Hickok’s position may not be so distinct from ours if 1989). we equate motor prediction with prediction-by-simulation and sensory prediction with prediction-by-association. However, unlike Hickok, we suggest that both streams R2. The neuroscience of production– may be called upon during comprehension. Dick & comprehension relations Andric also argue against the classical neurolinguistic model in favour of a dual-stream account. They suggest A number of commentators consider our account in that the motor involvement in speech perception may relation to neuroscientific evidence. Some of this evidence only be apparent in perception under adverse conditions concerns monitoring deficits associated with particular (see sect. R4 for a fuller discussion on this point). Finally, aphasias. Hartsuiker discusses a patient who cannot com- Alario & Hamamé point out additional evidence for prehend familiar sounds, words, or sentences, but who is forward modelling in production (e.g., Flinker et al. nevertheless able to correct some of her own phonemic 2010). We accept that current evidence does not allow us speech errors. He argues that such a patient would be to determine the neural basis of the efference copy, and incapable of monitoring her own speech by comparing we hope that this will be a target for future research. the output of the implementer (utterance percept) with a forward model prediction (predicted utterance percept). However, it is difficult to draw clear conclusions from R3. Learning and development such cases without knowing the precise nature of the defi- cits. For example, this patient might monitor propriocep- Our target article did not explicitly discuss the role of tively to correct errors. In relation to this, Tremblay et al. forward modelling in language learning and development. (2003) found that people can adapt their articulation to However, we certainly recognise that our account is rel- external perturbations of jaw movements during silently evant to the acquisition of language production and com- mouthed speech. In other words, they monitored and cor- prehension, particularly in relation to their fluency. In rected their planned utterances in the absence of auditory fact, both forward and inverse models were first introduced

50 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Response/Pickering & Garrod: An integrated theory of language production and comprehension and tested as neurocomputational models of early skill information about pitch. More generally, Strijkers, acquisition (e.g., Jordan & Rumelhart 1992; Kawato et al. Runnqvist, Costa, & Holcomb (Strijkers et al.) question 1987; 1990), and we therefore argue that they can be how “poor” (predicted) representations can be used to suc- applied to language. Accordingly, we are grateful to a cessfully monitor “rich” (implemented) representations, number of commentators for fleshing out the importance and Hartsuiker similarly claims that impoverished rep- of such models in the development of language. We also resentations are not a good standard for judging correctness strongly agree that the use of forward and inverse models (i.e., they will be particularly error-prone). in learning does not disappear following childhood. A key property of forward models is their flexibility. Instead, it leads to adaptation and learning in adults, as Their primary purpose (in adults) is to promote fluency, well as to prediction of their own and others’ utterances. and therefore speakers are able to “tune into” whatever Hence, Johnson, Turk-Browne, & Goldberg aspect of a stimulus is most relevant to this goal. So long (Johnson et al.) point out the role of prediction in learning as speakers know that pitch is relevant to a particular task to segment utterances, and in learning words and gramma- (or is obviously being manipulated in an experiment), tical constructions. Krishnan notes various interrelations they predict the pitch that they will produce, and are dis- between production and comprehension abilities during rupted if their predicted percept does not match the development. We endorse her goal of explaining the mech- actual percept. People are able to determine what aspects anisms underlying such developmental changes. Aitken of a percept to predict on the basis of their situation, emphasizes the importance of the developmental perspec- such as the current experimental task (see Howes, tive in accounting for communication processes in general, Healey, Eshghi, & Hough [Howes et al.]) Such flexi- and we agree. bility clearly makes the forward models more useful for Mani & Huettig point out that two-year-olds’ pro- aiding fluency, but it also means that we cannot determine duction vocabulary (rather than comprehension vocabu- which aspects of an utterance will necessarily be rep- lary) correlates with their ability to make predictions in a resented in a forward model. In Alario & Hamamé’s visual world situation (Mani & Huettig 2012). This provides terms, we assume that the “opt-out” is circumstantial important new evidence that prediction during compre- rather than systematic. Hence, predictions may contain hension makes use of production processes. In fact, it “fine-grained phonetic detail,” contra Trude. suggests that two-year-olds are already using prediction- In fact, the question of what information is represented by-simulation. Note that we speculated that adults may in forward models of motor control (and learning) has emphasize prediction-by-association when comprehending received some attention. For example, Kawato et al. children; we did not suggest that children emphasize pre- (1990) suggested that movement trajectories can be pro- diction-by-association during comprehension. jected using critical via-points through which the trajectory From a computational perspective, Chang, Kidd, & has to pass at a certain time, rather than in terms of the Rowland (Chang et al.) argue that linguistic prediction moment-by-moment dynamics of the implemented move- is a by-product of language learning. We accept that it ment trajectory. Optimal trajectories can then be learnt originates in learning but note that it is critical for fluent by applying local optimising principles for getting from performance in its own right. Their account of comprehen- one via-point to the next. In this way, impoverished predic- sion has some similarities to ours – in particular, that it uses tions can be used to monitor rich implementations. In the a form of production-based prediction. However, it same way, language users might predict particularly crucial assumes separate meaning and sequencing pathways, aspects of an utterance, but the aspects that they predict whereas we adopt a more traditional multi-level account will depend on the circumstances. (semantics, syntax, phonology); future research could More generally, Meyer & Hagoort question the value directly compare these accounts. We do not see why it is of predicting one’s own utterance. They argue that predic- problematic to use syntax and semantics in supervised tion is useful when it is likely to differ from the actual event. learning (any more than phonology, which is of course This is the case when predicting another person’s behav- also abstract). iour or when the result of one’s action is uncertain (e.g., McCauley & Christiansen discuss a model of language moving in a strong wind). But they argue that people are acquisition in which prediction-by-simulation facilitates the confident about their own speech. Meyer & Hagoort model’s shallow processing of the input during learning. admit that they will tend to be less confident in dialogue, They show how this model can account for a range of and we agree. But more important, we argue that the be- recent psycholinguistic findings about language acquisition. haviour of the production implementer is not fully deter- The integration of our account with the evidence for the mined by the production command, because the complex representation of multi-word chunks is potentially informa- processes involved in production are subject to internal tive. We also agree that shallow processing during compre- (“neural”) noise or priming (i.e., influences that may not hension may help explain apparent asymmetries between be a result of the production command). Assuming that production and comprehension. these sources of noise do not necessarily affect forward modelling as well, predicted speech may differ from actual speech. In addition, prediction is useful even if the R4. Impoverished representations and production behaviour is fully predictable, because it allows the actor monitoring to plan future behaviour on the basis of the prediction. In fact, we made such a proposal in relation to the order of Many commentators discuss our claim that predictions are heavy and light phrases (see sect. 3.1, target article). impoverished. For example, de Ruiter & Cummins point Hartsuiker claims that our account incorrectly predicts out that Heinks-Maldonado et al.’s (2006) findings are com- an early competitor effect in Huettig and Hartsuiker patible only with a forward model that incorporates (2010). His claim is based on the assumption that

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 51 Response/Pickering & Garrod: An integrated theory of language production and comprehension comparing the predicted utterance percept with the actual access to general knowledge when there is greater utterance percept should invoke phonological competitors discrepancy. in the same way that comprehending another’s speech invokes such competitors. But the predicted utterance percept of heart does not also represent phonological com- R5. Prediction-by-simulation versus petitors,1 and the utterance percept is directly related to prediction-by-association the predicted utterance percept (i.e., it is not analysed). Hartsuiker favours a conflict-monitoring account (e.g., We propose that comprehenders make use of both predic- Nozari et al. 2011), but such an account merely detects tion-by-simulation and prediction-by-association. In most some difficulty during production, and it is unclear how it situations, both routes provide some predictive value, and can determine the source of difficulty or the means of cor- so we assume that comprehenders integrate the predictions rection. We accept that a forward modelling account that they make. Both routes also use domain-general cogni- involves some duplication of information; a goal of our tive mechanisms such as memory (see Slevc & Novick). account and motor-control accounts is to provide reasons We made some suggestions about when comprehenders why complex biological systems are not necessarily parsi- are likely to weight one route more strongly than the monious in this respect. other. For example, simulation will be weighted more Oppenheim makes the interesting suggestion that inner strongly when the comprehender appears to be more speech might be the product of forward production similar to the speaker than otherwise. Laurent, Moulin- models. This is an alternative to the possibility that inner Frier, Bessière, Schwartz, & Diard (Laurent et al.) speech involves an incomplete use of the implementer, in describe how they have modelled the contributions of which the speaker inhibits production after computing a association and simulation (in their terms, auditory and phonological or phonetic representation. Clearly, findings motor knowledge) to speech perception under both ideal that inner speech is impoverished would provide some and adverse conditions (both in the context of external support for the forward-model account. But as Oppenheim noise and when the comprehender and speaker are very himself notes, it is hard to see how inner slips could be different). Under ideal conditions, association and simu- identified without using the production implementer to lation perform identically, but under adverse conditions, generate an utterance percept at the appropriate level of their performance falls off in different ways. However, representation (which is then compared to the predicted they note that integration of the two routes (i.e., sensory- utterance percept). motor fusion) yields better performance, and we agree. Jaeger & Ferreira argue that the output of the forward We strongly support their programme of modelling these production model (the predicted utterance) serves merely contributions (see also de Ruiter & Cummins) and as input to the forward comprehension model, and agree that experiments conducted under adverse con- suggest that the efference copy could directly generate ditions may help discriminate the contributions of the the predicted utterance percept. In fact, the motivation two forms of prediction. We agree that their modelling for constructing the forward production model is to aid results are consistent with findings from transcranial mag- learning an inverse model that maps backward from the netic stimulation (TMS) studies which point to the contri- predicted utterance percept via the predicted utterance bution of motor systems to speech perception, but only in to the production command, just as in motor control noisy conditions (e.g., D’Ausilio et al. 2011). theory (Wolpert et al. 2001). It is possible that sufficiently Yoon & Brown-Schmidt question the need for having fluent speakers might be able to directly map from the pro- both prediction-by-simulation and prediction-by-associ- duction command to the predicted utterance percept ation in comprehension (i.e., dual-route prediction). (though there may be a separate mapping to the predicted These commentators claim that comprehenders using utterance). But this would prevent speakers from remain- simulation would predict what the speaker would say ing sufficiently flexible to learn new words or utterances, (i.e., allocentrically). In fact, we propose that comprehen- just as in the early stages of acquisition. In this context, ders use context to aid allocentric prediction, but that Adank et al. (2010) found that adult comprehension of they are also subject to egocentric biases (i.e., comprehen- unfamiliar accents is facilitated by previous imitation of ders only partly take into account information about their those accents to a similar extent whether speakers can or partner’s context). Yoon & Brown-Schmidt claim predic- cannot hear their own speech. This suggests that adaptation tion-by-association would also be allocentric, and therefore is mediated by the forward production model (though there question why we need two prediction mechanisms. We first could also be an effect of proprioception). Note that the note that prediction-by-association need not be allocentric, inverse model is not merely used for long-term learning, as it might be biased by prior perception of oneself. but is also used to modify an action as it takes place (e.g., But more important, the two routes to prediction are dis- to speak more clearly if background noise increases). tinct for reasons unrelated to the role of context. Perhaps Finally, Slevc & Novick suggest that nonlinguistic most important, prediction-by-simulation takes into memory tasks and linguistic conflict resolution involve account the inferred intention of the speaker in a way common brain structures (the left inferior temporal that prediction-by-association cannot (as it makes no refer- gyrus). Patients with lesions to this area show difficulty ence to mental states). Hence, prediction-by-simulation with both memory tasks and with language production should offer a richer and more situation-specific kind of and comprehension. We propose that both self- and prediction than prediction-by-association, and a combi- other-monitoring rely on memory-based predictions. One nation of these predictions is likely to be more accurate possibility is that monitoring can involve automatic correc- than one form of prediction by itself. tion when the difference between the prediction and the To what extent can prediction-by-simulation account for implementation is low, but monitoring requires extensive speech adaptation effects? Trude and Brown-Schmidt

52 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 Response/Pickering & Garrod: An integrated theory of language production and comprehension

(2012) showed that listeners could use their knowledge of a important, we do not assume that covert imitation simply speaker’s regional pronunciation to rapidly rule out compe- involves copying linguistic representations (or that overt titors in a visual world task (see also Dahan et al. 2008). imitation involves “blind” repetition). Instead, our proposal Trude asks whether such rapid incorporation of context (see Fig. 6, target article) is that comprehenders use the to aid prediction can be explained by prediction-by-simu- inverse model and context to derive the production lation and to what extent it suggests that listeners make command that comprehenders would use if they were to detailed phonetic predictions of what they will hear. As produce the speaker’s utterance, and use this to drive the we proposed here (see sect. R4), forward models must be forward model (or to make overt responses). In other flexible, and in experimental situations that highlight words, the forward model and overt imitation (or com- detailed phonetic differences we would expect people to pletion) are affected by the production command. Hence, predict such details. Of course there is still the question our account is compatible with findings such as those of of how rapidly a listener could incorporate the context Ondobaka et al. (2011) because it proposes that imitation (i.e., relating to speaker identity) into their forward model. can be affected by aspects of intentions such as the compat- However, it is interesting to note that Trude and Brown- ibility between interlocutors’ goals. It can also explain how Schmidt found that competitors were ruled out earlier fol- accent convergence can depend on communicative inten- lowing increased exposure to previous words (e.g., hearing tions (e.g., relating to identity). point to the bag as opposed to just hearing bag). Trude We agree with Echterhoff that our account aims to inte- also suggests that listeners’ use of impoverished represen- grate a “language-as-action” approach to intention (rep- tations could explain difficulties in imitating those features. resented as the production command) with the evidence But, in fact, we claim that listeners typically compute fully for mechanistic time-locked processing. We limited our specified representations using the implementer, so the discussion of intentions for reasons of space, but agree reasons for difficulties in imitation presumably lie elsewhere. that their relationship to other aspects of mental life is More generally, we claim that comprehenders use some pro- central to a more fully developed theory. He specifically duction processes but not necessarily all (e.g., they may be highlights the importance of postdictive processing in unable to produce a particular accent). determining nonliteral and other complex intentions, and Festman addresses issues relating to bilingualism. One we agree. We see his proposal as closely related to the reason for the difficulty of conversations between a native use of offline prediction-by-simulation (see Pezzulo 2011a). and a nonnative speaker is that their processing systems The HMOSAIC architecture is used to determine the are likely to be very different (in terms of both speed and relationship between actions and higher-level intentions. content) and so prediction-by-simulation is likely to be Commentators de Ruiter & Cummins query whether adversely affected. (In addition, prediction-by-simulation this is possible for language because of the particularly will be hindered by limited experience on which to learn complex relationship between intention and utterance; forward models.) Prediction-by-association does not Pazzaglia also claims that the mappings between intention suffer from this problem. For example, most L1 (first- and speech sounds are too complex for prediction-by-simu- language) speakers have experience with L2 (second- lation. We are not convinced that this relationship is more language) speakers, and hence can predict L2 speakers complex than other aspects of human action (and inter- even when they would behave differently from them. L2 action) and believe that this is simply an issue for future speakers should also be able to predict L1 speakers (who research. they tend to encounter regularly), but they may of course Kreysa points out that comprehenders may use gaze to not be able to make good predictions (e.g., if they do not help predict utterances without deliberate consideration of know words that the L1 speakers would use). intention. As she notes, such cues may constitute a form of Rabagliati & Bemis argue that much of language is not prediction-by-association, though it is also possible that predictable and that its power is its ability to communicate comprehenders perform prediction-by-simulation but the unpredictable. We do not claim that prediction with gaze constituting part of the context that is used to underlies all of language comprehension. Rather, people compute the intention (see Fig. 6, target article). If this is use prediction whenever they can to assist comprehension the case, gaze would help reduce the complexity of the (at different linguistic levels). But when an utterance is intention-utterance relationship. She also questions unpredictable, they simply rely on the implementer. In whether anticipatory fixation in the visual-world paradigm fact, failure to predict successfully serves to highlight the involves prediction-by-association or simulation. In this unexpected, and therefore allows the comprehender to context, we note that Lesage et al. (2012) showed that cer- concentrate resources. ebellar repetitive TMS (rTMS) prevented such anticipatory fixations in predictive contexts (e.g., The man will sail the boat), but not in control sentences (or in vertex rTMS). R6. Communicative intentions and the production As the cerebellum appears to be used for prediction in command motor control (Miall & Wolpert 1996), this suggests that such fixations involve prediction-by-simulation. Many of the commentators raise the issue of how our Jaeger & Ferreira ask about the precise nature of the account is affected by communicative intentions. In our prediction errors that the system is trying to minimize. In terms, this is the question of how the production particular, do these relate to evaluations of how well command is determined and used. We agree with formed the output is or do they relate to evaluations of Kashima, Bekkering, & Kashima (Kashima et al.) its communicative effectiveness in the context in which it that communicative intentions do not simply underlie the is uttered? One possibility is that when speakers utter a pre- construction of semantics, syntax, and phonology, but dictable word, it means that they have forward modelled incorporate information such as illocutionary force. Most that word to a considerable extent before uttering it.

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 53 References/Pickering & Garrod: An integrated theory of language production and comprehension Hence, they weight the importance of the forward model Echterhoff first asks whether our action-based account more than the output of the implementer and therefore can generalize to noninteractive situations. Our models of attenuate the form of the word. But alternatively, the production and comprehension are explained noninterac- speakers realize that the error tends to be less for predict- tively and then applied to dialogue. In particular, we able than unpredictable words, and know that addressees propose that dialogue allows interlocutors to make use of are less likely to comprehend words when the error is overt responses; in monologue, such overt responses are great. They thus use a strategy of clearer articulation not relevant, therefore language users focus more comple- when they realize that the error is likely to be great tely on internal processes. Echterhoff also argues that (perhaps based on their view of the addressee’s ability to shared reality (see sect. 2.3, target article) involves more comprehend). than seamless coordinated activity, but also has an evalua- Pezzulo & Dindo argue that producers use intentional tive component. We propose that successful mutual predic- (signalling) strategies to aid their comprehenders’ predic- tion (both A and B correctly predict both A and B) tions. In other words, they make themselves more predict- underlies shared reality and is in turn likely to support able to the comprehender. To do this, they maintain an the alignment of evaluations of that activity. Mutual predic- internal model of the comprehenders’ uncertainty. tions occur as a result of aligned action commands, and Hence, the authors propose a type of allocentric account. action commands reflect intentions which of course We suspect that speakers make their utterances predictable involve the motivations that, according to Echterhoff, for both allocentric reasons (e.g., lengthening vowels for underlie shared reality. children) and egocentric reasons. For example, an effect of interactive alignment (Pickering & Garrod 2004)isto ACKNOWLEDGMENTS make interlocutors more similar (see sect. 3.3, target We thank Martin Corley and Chiara Gambi for their comments and article) so that comprehenders are more likely to predict acknowledge support of ESRC Grants no. RES-062-23-0736 and no. speakers accurately. Such alignment might itself follow RES-060-25-0010. from an intentional strategy to align but need not. In their penultimate paragraph, Pezzulo & Dindo make NOTE 1. This constitutes a sense in which the predicted utterance percept is several very interesting additional suggestions about ways impoverished with respect to the utterance percept, but does not lose any in which producers and comprehenders may make use of relevant information about the utterance. prediction beyond those we addressed in the target article.

R7. Interleaving production and comprehension in dialogue References “ ” “ ” ’ Howes et al. criticise traditional models of production and [The letters a and r before author s initials stand for target article and comprehension based on large units (such as whole sen- response references, respectively] tences), and we agree. Such models are clearly unable to Acheson, D. J. & MacDonald, M. C. (2009) Verbal working memory and language deal with the fragmentary nature of many contributions production: Common approaches to the serial ordering of verbal information. to dialogue. Howes et al. propose an incremental model Psychological Bulletin 135(1):50–68. [LRS] that combines production and comprehension. This Adank, P. & Devlin, J. T. (2010) On-line plasticity in spoken sentence comprehen- sion: Adapting to time-compressed speech. NeuroImage 49:1124–32. [aMJP] model may be able to deal with incrementality and joint Adank, P., Hagoort, P. & Bekkering, H. (2010) Imitation improves language utterances in dialogue but does not provide an account of comprehension. Psychological Science 21:1903–909. [arMJP] prediction, either of upcoming words in monologue (e.g., Aglioti, S. M. & Pazzaglia, M. (2010) Representing actions through their Delong et al. 2005) or ends of turns in dialogue (de sound. Experimental Brain Research 206(2):141–51. DOI: 10.1007/s00221- 010-2344-x. [MP] Ruiter et al. 2006). Moreover, Howes et al. argue that pre- Aglioti, S. M. & Pazzaglia, M. (2011) Sounds and scents in (social) action. Trends in diction can only help speakers repeat their interlocutors, Cognitive Sciences 15(2):47–55. DOI: 10.1016/j.tics.2010.12.003. [MP] but this is not the case. Overt responses based on the Aitken, K. J. (2008) Intersubjectivity, affective neuroscience, and the neurobiology of derived production command for the current utterance autistic spectrum disorders: A systematic review. Keio Journal of Medicine – (i (t)) lead to repetition, but overt responses based on 57:15 36. [KJA] B Aitken, K. J. & Trevarthen, C. (1997) Self/other organization in human psychological the derived production command for the upcoming utter- development. Development and Psychopathology 9:653–77. [KJA] ance (iB (t+1)) lead to continuations (see Fig. 6, target Alcock, K. J. & Krawczyk, K. (2010) Individual differences in language development: article). In other words, we believe that our account pro- Relationship with motor skill at 21months. Developmental Science vides mechanisms that underlie dialogue (Pickering & 13(5):677–91. [SK] Allopena, P. D., Magnuson, J. S. & Tanenhaus, M. K. (1998) Tracking the time Garrod 2004). course of spoken word recognition using eye movements: Evidence for Fowler argues that we wrongly emphasize internal predic- continuous mapping models. Journal of Memory and Language 38:419–39. tive models when the same benefits can be accrued by [aMJP] directly interacting with the environment, most importantly Altmann, G. T. M. (2011) The mediation of eye movements by spoken language. In our interlocutors. We accept that the information in the The Oxford handbook of eye movements, ed. S. P. Liversedge, I. D. Gilchrist & ’ S. Everling, pp. 979–1003. Oxford University Press. [HK] environment helps determine people s actions, but argue Altmann, G. T. M. & Kamide, Y. (1999) Incremental interpretation at verbs: that predictions driven by internal models that have been Restricting the domain of subsequent reference. Cognition 73(3):247–64. shaped by past experience allow people to perform better [FC, HK, aMJP, LRS] (just as is the case for complex engineering). In the inter- Altmann, G. T. M. & Mirkovic, J. (2009) Incrementality and prediction in human sentence processing. Cognitive Science 33(4):583–609. [aMJP, HR] action process, overt imitation, completions, and comp- Anders, S., Heinzle, J., Weiskopf, N., Ethofer, T. & Haynes, J.-D. (2011) Flow of lementary responses all appear to occur regularly, and all affective information between communicating brains. NeuroImage are compatible with our account (see sect. 3.3, target article). 54:439–46. [KJA]

54 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 References/Pickering & Garrod: An integrated theory of language production and comprehension

Anderson, M. L. (2003) Embodied cognition: A field guide. Artificial Intelligence Bock, J. K. (1996) Language production: Methods and methodologies. Psychonomic 149, 151–56. [GD] Bulletin & Review 3:395–421. [aMJP] Andric, M. & Small, S. L. (2012) Gesture’s neural language. Frontiers in Psychology Bock, J. K. & Levelt, W. J. M. (1994) Language production: Grammatical encoding. 3:99. DOI: 10.3389/fpsyg.2012.00099. [ASD] In: Handbook of psycholinguistics, ed. M. A. Gernsbacher, Academic Press. Arbib, M. A. (2005) From monkey-like action recognition to human language: An [aMJP] evolutionary framework for neurolinguistics. Behavioral and Brain Sciences Bock, K. & Miller, C. A (1991) Broken agreement. Cognitive Psychology 28(2):105–24. [MP] 23:45–93. [aMJP] Arbib, M. A. (2012) How the brain got language: The mirror system hypothesis. Borensztajn, G., Zuidema, J. & Bod, R. (2009) Children’s grammars grow more Oxford University Press. [KJA] abstract with age. Topics in Cognitive Science 1:175–88. [SMM] Arnon, I. & Snider, N. (2010) More than words: Frequency effects for multi-word Borovsky, A., Elman, J. L. & Fernald, A. (2012) Knowing a lot for one’s age: phrases. Journal of Memory and Language 62:67–82. [SMM] Vocabulary skill and not age is associated with anticipatory incremental sentence Aronoff, M. (1976) Word formation in . Linguistic Inquiry interpretation in children and adults. Journal of Experimental Child Psychology Monograph 1. MIT Press. [MAJ] 112(4):417–36. [FC] Austin, J. L. (1962) How to do things with words. Clarendon Press. [GE] Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S. & Cohen, J. D. (2001) Aylett, M., & Turk, A. (2004) The smooth signal redundancy hypothesis: A functional Conflict monitoring and cognitive control. Psychological Review 108:624–52. explanation for relationships between redundancy, prosodic prominence, [RJH] and duration in spontaneous speech. Language and Speech 47(1):31–56. Bourhis, R. Y. & Giles, H. (1977) The language of intergroup distinctiveness. In: [TFJ] Language, ethnicity, and intergroup relations, ed. H. Giles, pp. 119–36. Baayen, R. H., Milin, P., FilipovićDurdevic,́ D., Hendrix, P. & Marelli, M. (2011) Academic Press. [YK] An amorphous model for morphological processing in visual comprehension Boyd, J. K. & Goldberg, A. E. (2011) Learning what not to say: Categorization based on naive discriminative learning. Psychological Review 118:438–82. and statistical preemption in “a-adjective” production. Language 87:1–29. [TFJ] [MAJ] Babel, M. (2010) Dialect divergence and convergence in New Zealand English. Branigan, H. P., Pickering, M. J. & Cleland, A. A. (2000) Syntactic coordination in Language in Society 39:437–56. [YK] dialogue. Cognition 75:B13–25. [aMJP] Babel, M. (2012) Evidence for phonetic and social selectivity in spontaneous pho- Branigan, H. P., Pickering, M. J. & McLean, J. F. (2005) Priming prepositional- netic imitation. Journal of Phonetics 40:177–89. [AMT] phrase attachment during language comprehension. Journal of Experimental Bannard, C. & Matthews, D. (2008) Stored word sequences in language learning: Psychology: Learning, Memory, and Cognition 31:468–81. [LRS] The effect of familiarity on children’s repetition of four-word combinations. Bråten, S. (1998) Intersubjective communication and emotion in early ontogeny. Psychological Science 19:241–48. [SMM] Cambridge University Press. [KJA] Bar, M. (2009) The proactive brain: Memory for predictions. Philosophical Trans- Bratman, M. E. (1992) Shared cooperative activity. The Philosophical Review actions of the Royal Society B 364:1235–43. [LRS] 101:327–41. [YK] Bargh, J. A. & Chartrand, T. L. (1999) The unbearable automaticity of being. Bratman, M. E. (1999) Faces of intention: Selected essays on intention and agency. American Psychologist 54:462–79. [GP] Cambridge University Press. [YK] Barrett, J. & Fleming, A. S. (2011) Annual research review: All mothers are not Brennan, S. E. & Clark, H. H. (1996) Conceptual pacts and lexical choice in con- created equal: Neural and psychobiological perspectives on mothering and the versation. Learning, Memory 22(6):1482–93. [TFJ] importance of individual differences. Journal of Child Psychology and Psy- Brooks, P. J. & Tomasello, M. (1999) How children constrain their argument chiatry 52:368–97. [KJA] structure constructions. Language 75(4):720–38. [MAJ] Barsalou, L. (1999) Perceptual symbol systems. Behavioral and Brain Sciences Brown, R. & McNeill, D. (1966) The “tip of the tongue” phenomenon. Journal of 22:577–600. [arMJP] Verbal Learning and Verbal Behavior 5:325–37. [aMJP] Barsalou, L. W. (2008) Grounded cognition. Annual Review of Psychology 59:617– Brown-Schmidt, S. (2009) The role of executive function in perspective taking 45. DOI 10.1146/annurev.psych.59.103006.093639. [GD, MP] during online language comprehension. Psychonomic Bulletin & Review Barsalou, L. W., Simmons, W. K., Barbey, A. K. & Wilson, C. D. (2003) Grounding 16(5):893–900. [LRS] conceptual knowledge in modality-specific systems. Trends in Cognitive Science Brown-Schmidt, S. & Tanenhaus, M. K. (2008) Real-time investigation of referential 7:84–91. [GD] domains in unscripted conversation: A targeted language game approach. Bates, E., Benigni, L., Bretherton, I., Camaioni, L. & Volterra, V. (1979) The Cognitive Science 32:643–84. [SOY] emergence of symbols: Cognition and communication in infancy. Academic Buccino, G., Riggio, L., Melli, G., Binkofski, F., Gallese, V. & Rizzolatti, G. (2005) Press. [SK] Listening to action-related sentences modulates the activity of the motor Bates, E., Bretherton, I. & Snyder, L. (1988) From first words to grammar: Indi- system: A combined TMS and behavioral study. Brain Research and Cognition vidual differences and dissociable mechanisms. Cambridge University Press. Brain Research 24(3):355–63. DOI: 10.1016/j.cogbrainres.2005.02.020. [SK] [MP] Bates, E. & Dick, F. (2002) Language, gesture, and the developing brain. Devel- Buxbaum, L. J., Kyle, K. M. & Menon, R. (2005) On beyond mirror neurons: opmental Psychobiology 40(3):293–310. [SK, MP] Internal representations subserving imitation and recognition of skilled Bavelas, J. B., Coates, L. & Johnson, T. (2000) Listeners as co-narrators. Journal of object-related actions in humans. Brain Research and Cognition Personality and Social Psychology 79:941–52. [aMJP] Brain Research 25(1):226–39. DOI: 10.1016/j.cogbrainres.2005.05.014. Bedny, M. & Caramazza, A. (2011) Perception, action, and word meanings in the [MP] human brain: The case from action verbs. Annals of the New York Academy of Callan, D. E., Jones, J. A., Callan, A. M. & Akahane-Yamada, R. (2004) Phonetic Sciences 1224:81–95. DOI:10.1111/j.1749-6632.2011.06013.x. [ASD] perceptual identification by native- and second-language speakers differentially Beier, J. S. & Spelke, E. S. (2012) Infants’ developing understanding of social gaze. activates brain regions involved with acoustic phonetic processing and those Child Development 83:486–96. [KJA] involved with articulatory-auditory/orosensory internal models. NeuroImage Ben Shalom, D. & Poeppel, D. (2008) Functional anatomic models of language: 22:1182–94. [ASD] assembling the pieces. The Neuroscientist 14:119–27. [KJA, aMJP] Cappa, S. F. & Pulvermüller, F. (2012) Cortex special issue: Language and the motor Bencini, G. M. & Goldberg, A. E. (2000) The contribution of argument structure system. Cortex 48(7):785. [ASD] constructions to sentence meaning. Journal of Memory & Language Carpenter, W. B. (1852) On the influence of suggestion in modifying and directing 43:640–51. [MAJ] muscular movement independently of volition. Proceedings of the Royal Insti- Bessière, P., Laugier, C. & Siegwart, R. ed. (2008) Probabilistic reasoning and tution 147–54. [YK] decision making in sensory-motor systems, volume 46 of Springer tracts in Castelli, L., Pavan, G., Ferrari, E. & Kashima, Y. (2009) The stereotyper and the advanced robotics. Springer-Verlag. [RL] chameleon: The effects of stereotype use on perceiver’s mimicry. Journal of Binder, J. R., Desai, R. H., Graves, W. W. & Conant, L. L. (2009) Where is the Experimental Social Psychology 4:835–39. [YK] semantic system? A critical review and meta-analysis of 120 functional Castellini, C., Badino, L., Metta, G., Sandini, G., Tavella, M., Grimaldi, M. & neuroimaging studies. Cerebral Cortex 19:2767–96. [ASD] Fadiga, L. (2011) The use of phonetic motor invariants can improve automatic Blackmer, E. R. & Mitton, J. L. (1991) Theories of monitoring and the timing of phoneme discrimination. PLoS ONE 6(9):e24055. [RL] repairs in spontaneous speech. Cognition 39:173–94. [aMJP] Catani, M. & ffytche, D. H. (2005) The rises and falls of disconnection syndromes. Blakemore, S.-J., Frith, C. D. & Wolpert, D. M. (1999) Spatio-temporal prediction Brain 128:2224–39. [KJA] modulates the perception of self-produced stimuli. Journal of Cognitive Catmur, C., Walsh, V. & Heyes, C. (2007) Sensorimotor learning configures the Neuroscience 11:551–59. [aMJP] human mirror system. Current Biology 17(17):1527–31. [GH] Boatman, D., Gordon, B., Hart, J., Selnes, O., Miglioretti, D. & Lenz, F. (2000) Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A. & Ghazanfar, A. A. Transcortical sensory aphasia: revisited and revised. Brain 123(8):1634–42. (2009) The natural statistics of audiovisual speech. PLoS Computational Biology [HR] 5(7):e1000436. DOI:10.1371/journal.pcbi.1000436. [ASD]

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 55 References/Pickering & Garrod: An integrated theory of language production and comprehension

Chang, F. (2002) Symbolically speaking: A connectionist model of sentence pro- Signals). In: Proceedings of SemDial 2012, ed. S. Brown-Schmidt, J. Ginzburg & duction. Cognitive Science 26(5):609–51. [FC] S. Larsson, pp. 149–50. [JPdR] Chang, F. (2009) Learning to order words: A connectionist model of heavy NP shift De Ruiter, J. P., Mitterer, H. & Enfield, N. J. (2006) Predicting the end of a speaker’s and accessibility effects in Japanese and English. Journal of Memory and turn; a cognitive cornerstone of conversation. Language 82(3):515–35. [JPdR, Language 61(3):374–97. [FC] arMJP] Chang, F., Dell, G. S. & Bock, K. (2006) Becoming syntactic. Psychological Review Delaherche, E., Chetouani, M., Mahdhaoui, A., Saint-Georges, C., Viaux, S. & 113(2):234–272. [FC, TFJ, NM, SMM, aMJP, LRS] Cohen, D. (2012) Interpersonal synchrony: A survey of evaluation methods Chartrand, T. L. & Bargh, J. A. (1999) The chameleon effect: The perception- across disciplines. IEEE Transactions on Affective Computing 3(3): 349–65. behavior link and social interaction. Journal of Personality and Social Psychol- [KJA] ogy 76:893–910. [aMJP] Dell, G. S. (1978) Slips of the mind. In: The fourth Lacus forum, ed. M. Paradis, Chemero, A. (2009) Radical embodied cognitive science. MIT Press. [GD] pp. 69–75. Hornbeam Press. [GMO] Clark, A. (2013) Whatever next? Predictive brains, situated agents, and the Dell, G. S. (1986) A spreading-activation theory of retrieval in sentence production. future of cognitive science. Brain and Behavioral Sciences 36(3):181–253. Psychological Review 93:283–321. [GSD, JPdR, arMJP] [TFJ] Dell, G. S. (1988) The retrieval of phonological forms in production: Tests of Clark, H. H. (1996) Using language. Cambridge University Press. [YK, GP, aMJP, predictions from a connectionist model. Journal of Memory and Language LRS, SOY] 27:124–42. [aMJP] Clark, H. H. & Wilkes-Gibbs, D. (1986) Referring as a collaborative process. Dell, G. S., Burger, L. K. & Svec, W. R. (1997) Language production and serial Cognition 22:1–39. [GE, aMJP] order: A functional analysis and a model. Psychological Review Clayards, M., Tanenhaus, M. K., Aslin, R. N. & Jacobs, R. A. (2008) Perception 104(1):123–47. [LRS] of speech reflects optimal use of probabilistic speech cues. Cognition 108 Dell, G. S. & Repka, R. J. (1992) Errors in inner speech. In: Experimental slips and (3):804–809. [TFJ] human error: Exploring the architecture of volition, ed. B. J. Baars, pp. 237–62. Cohn, J. F. & Tronick E. Z. (1988) Mother-infant face-to-face interaction: Influence Plenum. [GMO] is bidirectional and unrelated to periodic cycles in either partner’s behavior. Dell, G. S., Schwartz, M. F., Martin, N., Saffran, E. M. & Gagnon, D. A. (1997) Developmental Psychology 24:386–92. [KJA] Lexical access in aphasic and nonaphasic speakers. Psychological Review Colas, F., Diard, J. & Bessière, P. (2010) Common Bayesian models for common 104:801–38. [GSD, aMJP] cognitive issues. Acta Biotheoretica 58(2–3):191–216. [RL] DeLong, K. A., Urbach, T. P. & Kutas, M. (2005) Probabilistic word pre-activation Condon, W. S. & Sander, L. W. (1974) Neonate movement is synchronized with during language comprehension inferred from electrical brain activity. Nature adult speech: Interactional participation and language acquisition. Science Neuroscience 8(8):1117–21. [NM, arMJP, HR] 183:99–101. [KJA] Denton, D. (2005) The primordial emotions: The dawning of consciousness. Oxford Conway, C. M., Bauernschmidt, A., Huang, S. S. & Pisoni, D. B. (2010) Implicit University Press. [KJA] statistical learning in language processing: Word predictability is the key. Desai, R. H., Binder, J. R., Conant, L. L. & Seidenberg, M. S. (2010) Activation of Cognition 114:356–71. [MAJ] sensory-motor areas in sentence comprehension. Cerebral Cortex 20:468–78. Corballis, M. C. (2009) The evolution of language. Annals of the New York Academy [aMJP] of Sciences 1156:19–43. DOI: 10.1111/j.1749-6632.2009.04423.x. [MP] DeVault, D., Sagae, K. & Traum, D. (2011) Incremental interpretation and Corley, M., Brocklehurst, P. H. & Moat, H. S. (2011) Error biases in inner and overt prediction of utterance meaning for interactive dialogue. Dialogue and speech: Evidence from tongue twisters. Journal of Experimental Psychology: Discourse 2(1):143–70. [JPdR] Learning, Memory, and Cognition 37(1):162–75. DOI:10.1037/a0021321. Di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V. & Rizzolatti, G. (1992) [GMO, aMJP] Understanding motor events: A neurophysiological study. Experimental Brain Cowles, H. W., Walenski, M. & Kluender, R. (2007) Linguistic and cognitive pro- Research 91(1):176–80. [MP, aMJP] minence in anaphor resolution: Topic, contrastive focus and pronouns. Topoi Dick, A. S., Goldin-Meadow, S., Hasson, U., Skipper, J. I. & Small, S. L. (2009) 26:3–18. [LRS] Co-speech gestures influence neural activity in brain regions associated with Crain, S. & Fodor, J. D. (1985) How can grammars help parsers? In: Natural processing semantic information. Human Brain Mapping 30:3509–26. [ASD] language parsing: Psychological, computational, and theoretical perspectives, Dick, A. S., Goldin-Meadow, S., Solodkin, A. & Small, S. L. (2012) Gesture in the ed. D. R. Dowty, L. Karttunen & A. M. Zwicky, pp. 94–128. Cambridge developing brain. Developmental Science 15:165–80. [ASD] University Press. [GH] Dick, A. S., Solodkin, A. & Small, S. L. (2010) Neural development of networks for Csibra, G. & Gergely, G. (2007) ‘Obsessed with goals’: Functions and mechanisms of audiovisual speech comprehension. Brain and Language 114:101–14. [ASD] teleological interpretation of actions in humans. Acta Psychologica Dick, A. S. & Tremblay, P. (2012) Beyond the arcuate fasciculus: Consensus and 124:60–78. [aMJP] controversy in the connectional anatomy of language. Brain: A Journal of Cubelli, R., Marchetti, C., Boscolo, G. & Della Sala, S. (2000) Cognition in action: Neurology 135:3529–50. doi: 10.1093/brain/aws222. [ASD] Testing a model of limb apraxia. Brain and Cognition 44(2):144–65. DOI: Dijksterhuis, A. & Bargh, J. A. (2001) The perception-behavior expressway: Auto- 10.1006/brcg.2000.1226. [MP] matic effects of social perception on social behavior. In: Advances in exper- Cutting, J. C. & Ferreira, V. S. (1999) Semantic and phonological information flow in imental social psychology, vol. 33, ed. M. P. Zanna, pp. 1–40. Academic Press. the production lexicon. Journal of Experimental Psychology: Learning, [aMJP] Memory, and Cognition 25:318–44. [GSD] Dikker, S. & Pylkkänen, L. (2011) Before the N400: Effects of lexical-semantic D’Ausilio, A., Badino, L., Li, Y., Tokay, S., Craighero, L., Canto, R., Aloimonos, Y. & violations in visual cortex. Brain and Language 118:23–28. [aMJP] Fadiga, L. (2012a) Leadership in orchestra emerges from the causal relation- Dikker, S., Rabagliati, H., Farmer, T. A. & Pylkkänen, L. (2010) Early occipital ships of movement kinematics. PLoS ONE 7(5): e35757. DOI:10.1371/journal. sensitivity to syntactic category is based on form typicality. Psychological Science pone.0035757. [GP] 21:629–34. [aMJP] D’Ausilio, A., Bufalari, I., Salmas, P. & Fadiga L. (2012b) The role of the motor system Dikker, S., Rabagliati, H. & Pylkkänen, L. (2009) Sensitivity to syntax in visual cortex. in discriminating degraded speech sounds. Cortex 48:882–87. [RL] Cognition 110(3):293–321. [aMJP, HR] D’Ausilio, A., Jarmolowska, J., Busan, P., Bufalari, I. & Craighero, L. (2011) Tongue Dindo, H., Zambuto, D. & Pezzulo, G. (2011) Motor simulation via coupled internal corticospinal modulation during attended verbal stimuli: Priming and coarti- models using sequential Monte Carlo. Proceedings of IJCAI 2011:2113–19. culation effects. Neuropsychologia 49:3670–76. [arMJP] [GP] D’Ausilio, A., Pulvermüller, F., Salmas, P., Bufalari, I., Begliomini, C. & Fadiga, L. Dodd, B. & McIntosh, B. (2010) Two-year-old phonology: Impact of input, motor (2009) The motor somatotopy of speech perception. Current Biology and cognitive abilities on development. Journal of Child Language 19:381–85. [KJA, ASD, aMJP] 37(5):1027–46. [SK] Dahan, D., Drucker, S. J. & Scarborough, R. A. (2008) Talker adaptation in speech Doucet, S., Soussignan, R., Sagot, P. & Schaal, B. (2009) The secretion of areolar perception: Adjusting the signal or the representations? Cognition (montgomery’s) glands from lactating women elicits selective, unconditional 108:710–18. [rMJP, AMT] responses in neonates. PLoS ONE, 4(10):e7579. DOI:10.1371/journal. Damasio, A. R. & Damasio, H. (1994) Cortical systems for retrieval of concrete pone.0007579. [KJA] knowledge: The convergence zone framework. In: Large-scale neuronal Echterhoff, G., Higgins, E. T., Kopietz, R. & Groll, S. (2008) How communication theories of the brain. Computational neuroscience, ed. C. Koch & J. L. Davis, goals determine when audience tuning biases memory. Journal of Experimental pp. 61–74. MIT Press. [GD] Psychology: General 137:3–21. [GE] Davidson, P. R. & Wolpert, D. M. (2005) Widespread access to predictive models in Echterhoff, G., Higgins, E. T. & Levine, J. M. (2009) Shared reality: Experiencing the motor system: A short review. Journal of Neural Engineering 2:S313–19. commonality with others’ inner states about the world. Perspectives on [aMJP] Psychological Science 4:496–521. [GE, aMJP] De Ruiter, J. P. & Cummins, C. (2012) A model of intentional communication: Eickhoff, S. B., Heim, S., Zilles, K. & Amunts, K. (2009) A systems perspective on AIRBUS (Asymmetric Intention Recognition with Bayesian Updating of the effective connectivity of overt speech production. Philosophical

56 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 References/Pickering & Garrod: An integrated theory of language production and comprehension

Transactions of the Royal Society A: Mathematical, Physical, and Engineering actions. NeuroImage 59(1):556–64. DOI: 10.1016/j.neuroimage.2011.07.046. Sciences 367(1896):2399–421. DOI:10.1098/rsta.2008.0287. [ASD] [MP] Elman, J. L. (1990) Finding structure in time. Cognitive Science, 14(2), 179–211. Fowler, C. A., Brown, J., Sabadini, L. & Weihing, J. (2003) Rapid access to speech [FC, aMJP] gestures in perception: Evidence from choice and simple response time tasks. Elman, J. L. (1991) Distributed representations, simple recurrent networks, and Journal of Memory and Language 49:296–314. [aMJP] grammatical structure. Machine Learning 7:195–225. [MAJ] Frank, A. (2011) Integrating linguistic, motor, and perceptual information in Elman, J. L. (1993) Learning and development in neural networks: The importance language production. University of Rochester. [TFJ] of starting small. Cognition 48:71–99. [MAJ] Fraser, C., Bellugi, U. & Brown, R. (1963) Control of grammar in imitation, com- Emery, N. J. (2000) The eyes have it: The neuroethology, function and evolution of prehension, and production. Journal of Verbal Learning and Verbal Behavior social gaze. Neuroscience and Biobehavioral Reviews 24:581–604. [HK] 2:121–35. [SMM] Erten, I. H. & Razı, S. (2009) The effects of cultural familiarity on reading com- Frazier, L. (1987) Sentence processing: A tutorial review. In: Attention and per- prehension. Reading in a Foreign Language 21:60–77. [JF] formance XII: The psychology of reading, ed. M. Coltheart, pp. 559–86. Evans, N. & Levinson, S. C. (2009) The myth of language universals: Language Erlbaum. [aMJP] diversity and its importance for cognitive science. Behavioral and Brain Frazier, L. & Flores d’Arcais, G. (1989) Filler driven parsing: A study of gap filling in Sciences 32(5):429–92. [NM] Dutch. Journal of Memory and Language 28:331–44. [GH] Fadiga, L. & Craighero, L. (2006) Hand actions and speech representation in Broca’s French, R. M., Addyman, C. & Mareschal, D. (2011) TRACX: A recognition-based area. Cortex 42(4):486–490. [MP] connectionist framework for sequence segmentation and chunk extraction. Fadiga, L., Craighero, L., Buccino, G. & Rizzolati, G. (2002) Speech listening Psychological Review 118:614–636. [MAJ] specifically modultes the excitability of tongue muscles: A TMS study. European Freudenthal, D., Pine, J. M., Aguado-Orea, J. & Gobet, F. (2007) Modelling the Journal of Neuroscience 15:399–402. [aMJP] developmental pattern of finiteness marking in English, Dutch, German, and Falk, D. (2004) Prelinguistic evolution in early hominins: Whence motherese? Spanish using MOSAIC. Cognitive Science 31:311–41. [SMM] Behavioral and Brain Sciences 27:491–503, commentaries 503–41. [KJA] Frey S., Campbell J. S., Pike G. B. & Petrides M. (2008) Dissociating the human Farmer, T. A., Brown, M. & Tanenhaus, M. K. (2013) Prediction, explanation, and language pathways with high angular resolution diffusion fiber tractography. the role of generative models in language processing. Brain and Behavioral Journal of Neuroscience 28:11435–44. [F-XA] Sciences 36(3):211–12. [TFJ] Freyd, J. J. & Finke, R. A. (1984) Representational momentum. Journal of Federmeier, K. D. (2007) Thinking ahead: The role and roots of prediction in Experimental Psychology: Learning, Memory, and Cognition 10:126–32. language comprehension. Psychophysiology 44:491–505. [aMJP, KS] [aMJP] Federmeier, K. D. & Kutas, M. (1999) A rose by any other name: Long-term Friston, K. J., Daunizeau, J., Kilner, J. & Kiebel, S. J. (2010) Action and behavior: memory structure and sentence processing. Journal of Memory and Language A free-energy formulation. Biological Cybernetics 102(3):227–60. [GH] 41:469–95. [SOY] Frith, C. D. & Frith, U. (2008) Implicit and explicit processes in social cognition. Federmeier, K., McLennan, D. B. & De Ochoa, E. & Kutas, M. (2002) The impact Neuron 60(3):503–10. DOI:10.1016/j.neuron.2008.10.032. [GP] of semantic memory organization and sentence context information on spoken Fu, C. H., Vythelingum, G. N., Brammer, M. J., Williams, S. C., Amaro Jr., E., language processing by younger and older adults: An ERP study. Psychophy- Andrew, C. M., Yaguez, L., van Haren, N. E., Matsumoto, K. & McGuire, P. K. siology, 39:133–46. [NM] (2006) An fMRI study of verbal self-monitoring: Neural correlates of auditory Fedorenko, E., Gibson, E. & Rohde, D. (2006) The nature of working memory verbal feedback. Cerebral Cortex 16:969–77. [MIM] capacity in sentence comprehension: Evidence against domain-specific working Gallese, V. (2008) Mirror neurons and the social nature of language: The neural memory resources. Journal of Memory & Language 54:541–53. [LRS] exploitation hypothesis. Social Neuroscience 3:317–33. [aMJP] Fedzechkina, M., Jaeger, T. F. & Newport, E. (2012) Language learners restructure Gallese, V. (2009) Motor abstraction: A neuroscientific account of how action goals their input to facilitate efficient communication. Proceedings of the National and intentions are mapped and understood. Psychological Research Academy of Sciences of the United States of America 109(44):17897–902. 73:486–98. [GD] [TFJ] Gallese, V. & Lakoff, G. (2005) The brain’s concepts: The role of the sensory-motor Feldman, R. (2007) On the origins of background emotions: From affect synchrony system in reason and language. Cognitive Neuropsychology 22:455–79. [GD] to symbolic expression. Emotion 7:601–11. [KJA] Garrett, M. (1980) Levels of processing in speech production. In: Language pro- Ferguson, H. J., Scheepers, C. & Sanford, A. J. (2010) Expectations in counterfactual duction, vol. 1. Speech and talk, ed. B. Butterworth, pp. 177–220. Academic and theory of mind reasoning. Language and Cognitive Processes Press. [aMJP] 25:297–346. [SOY] Garrett, M. (2000) Remarks on the architecture of language production systems. In: Ferreira, F. (2003) The misinterpretation of noncanonical sentences. Cognitive Language and the brain: Representation and processing. ed. Y. Grodzinsky & Psychology 47:164–203. [aMJP] L. P. Shapiro, pp. 31–69. Academic Press. [aMJP] Ferreira, F., Ferraro, V. & Bailey, K. G. D. (2002) Good enough representations Garrett, M. F. (1975) The analysis of sentence production. In: The psychology of in language comprehension. Current Directions in Psychological Science learning and motivation, ed. G. H. Bower, pp. 133–75. Academic Press. [GSD] 11:11–15. [SMM] Garrod, S. & Anderson, A. (1987) Saying what you mean in dialogue: A study in Ferreira, F. & Tanenhaus, M. (2007) Introduction to the special issue on language- conceptual and semantic co-ordination. Cognition 27:181–218. [aMJP] vision interactions. Journal of Memory and Language 57:455–59. [CAF] Garrod, S. & Pickering, M. J. (2004) Why is conversation so easy? Trends in Ferreira, V. S. (1996) Is it better to give than to donate? Syntactic flexibility in Cognitive Sciences 8(1):8–11. [GP, aMJP] language production. Journal of Memory and Language 35:724–55. [aMJP] Garrod, S. & Pickering, M. J. (2009) Joint action, interactive alignment and dialogue. Ferreira, V. S. (2008) Ambiguity, accessibility, and a division of labor for commu- Topics in Cognitive Science 1:292–304. [KJA, aMJP] nicative success. Psychology of Learning and Motivation 49:209–46. [TFJ] Gaskell, G. (2007) Oxford handbook of psycholinguistics. Oxford University Press. Fine, A. B. & Jaeger, T. F. (2013) Evidence for implicit learning in syntactic [aMJP] comprehension. Cognitive Science 37(3):578–91. [TFJ] Gehring, W. J., Goss, B., Coles, M. G. H., Meyer, D. E. & Donchin, E. (1993) Fine, A. B., Jaeger, T. F., Farmer, T. & Qian, T. (submitted) Rapid linguistic A neural system for error detection and compensation. Psychological Science adaptation during syntactic comprehension. [TFJ] 4:385–90. [F-XA] Fischer, M. H. & Zwaan, R. A. (2008) Embodied language: A review of the role of Gergely, G., Bekkering, H. & Király, I. (2002) Developmental psychology: Rational the motor system in language comprehension. Quarterly Journal of Exper- imitation in preverbal infants. Nature 415:755. [aMJP] imental Psychology (2006) 61(6):825–50. DOI:10.1080/17470210701623605. Gerhardt, S. (2004) Why love matters: How affection shapes a baby’s brain. Brunner- [ASD, aMJP] Routledge. [KJA] Fitch, W. T. (2006) The biology and evolution of music: A comparative perspective. Gertner, Y. & Fisher, C. (2012) Predicted errors in children’s early sentence com- Cognition 100:173–215. [KJA] prehension. Cognition 124:85–94. [SMM] Fivaz-Depeursinge, E. & Favez, N. (2006) Exploring triangulation in infancy: Two Geschwind, D. H. & Levitt, P. (2007) Autism spectrum disorders: Developmental contrasted cases. Family Process 45:3–18. [KJA] disconnection syndromes. Current Opinion in Neurobiology 17:103–11. [KJA] Flinker A., Chang E. F., Kirsch H. E., Barbaro N. M., Crone, N. E. & Knight R.T. Geva, S., Bennett, S., Warburton, E. a. & Patterson, K. (2011) Discrepancy between (2010) Single-trial speech suppression of auditory cortex activity in humans. inner and overt speech: Implications for post-stroke aphasia and normal Journal of Neuroscience 30:16643–50. [F-XA, rMJP] language processing. Aphasiology 25(3):323–43. DOI:10.1080/ Fodor, J. A. (1983) The modularity of mind. MIT Press. [aMJP] 02687038.2010.511236. [GMO] Fogassi, L., Ferrari, P. F., Gesierich, B., Rozzi, S., Chersi, F. & Rizzolatti, G. (2005) Gibbon, F. E. (1999) Undifferentiated lingual gestures in children with articulation/ Parietal lobe: From action organization to intention understanding. Science phonological disorders. Journal of Speech, Language and Hearing Research 308(5722):662–67. DOI: 10.1126/science.1106138. [MP] 42(2):382. [SK] Fontana, A. P., Kilner, J. M., Rodrigues, E. C., Joffily, M., Nighoghossian, N., Vargas, Gibson, E. (1998) Linguistic complexity: Locality of syntactic dependencies. C. D. & Sirigu, A. (2012) Role of the parietal cortex in predicting incoming Cognition 68:1–76. [aMJP]

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 57 References/Pickering & Garrod: An integrated theory of language production and comprehension

Gibson, E. & Hickok, G. (1993) Sentence processing with empty categories. Grush, R. (2004) The emulation theory of representation: Motor control, imagery, Language and Cognitive Processes 8:147–61. [GH] and perception. Behavioral and Brain Sciences 27(3):377–96. [GP, aMJP] Giles, H., Coupland, N. & Coupland, J. (1991) Accommodation theory: Communi- Guenther, F. H., Hampson, M. & Johnson, D. (1998) A theoretical investigation of cation, context, and consequence. In Contexts of accommodation, ed. H. Giles, reference frames for the planning of speech movements. Psychological Review J. Coupland & N. Coupland, pp. 1–68. Cambridge University Press. [YK] 105(4):611. [TFJ] Glenberg, A. M. (2010) Embodiment as a unifying perspective for psychology. Wiley Guo T. & Peng D. (2006) ERP evidence for parallel activation of two languages in Interdisciplinary Reviews: Cognitive Science 1:586–96. [GD] bilingual speech production. NeuroReport 17:1757–60. [JF] Glenberg, A. M. (2011) How reading comprehension is embodied and why that Häberle, A., Schütz-Bosbach, S., Laboissière, R. & Prinz, W. (2008) Ideomotor matters. International Electronic Journal of Elementary Education 4:5–18. action in cooperative and competitive settings. Social Neuroscience 3:26–36. [ASD] [aMJP] Glenberg, A. M. & Gallese, V. (2012) Action-based language: A theory of language Hale, J. (2001) A probabilistic early parser as a psycholinguistic model. In: North acquisition, comprehension, and production. Cortex 48(7):905–22. American Chapter of the Association for Computational Linguistics (NAACL), DOI:10.1016/j.cortex.2011.04.010. [ASD, arMJP] Vol. 2, pp. 1–8. Association for Computational Linguistics. [TFJ, HR] Glenberg, A. M. & Kaschak, M. P. (2002) Grounding language in action. Hale, J. (2006) Uncertainty about the rest of the sentence. Cognitive Science Psychonomic Bulletin & Review 9:558–65. [aMJP] 30:609–42. [aMJP] Goffman, L. (2010) Dynamic interaction of motor and language factors in normal Halsband, U., Schmitt, J., Weyers, M., Binkofski, F., Grutzner, G. & Freund, H. J. and disordered development. In: Speech motor control: New developments in (2001) Recognition and imitation of pantomimed motor acts after unilateral basic and applied research, ed. B. Maassen & P. van Lieshout, pp. 137–52. parietal and premotor lesions: A perspective on apraxia. Neuropsychologia Oxford University Press. [SK] 39(2):200–16. [MP] Goldberg, A. E. (1995) Constructions: A construction grammar approach to argu- Hamilton, A. C. & Martin, R. C. (2007) Proactive interference in a semantic short- ment structure. University of Chicago Press. [MAJ, aMJP] term memory deficit: Role of semantic and phonological relatedness. Cortex Goldberg, A. E. (2006) Constructions at work: The nature of generalization in 43:112–23. [LRS] language. Oxford University Press. [MAJ] Hanna, J. E. & Brennan, S. E. (2007) Speakers’ eye gaze disambiguates referring Goldberg, A. E. (2011) Corpus evidence of the viability of statistical preemption. expressions early during face-to-face conversation. Journal of Memory and Cognitive Linguistics 22:131–54. [MAJ] Language 57:596–615. [HK] Goldberg, A. E., Casenhiser, D. M. & Sethuraman, N. (2005) The role of prediction Hanna, J. E., Tanenhaus, M. K. & Trueswell, J. C. (2003) The effects of common in construction-learning. Journal of Child Language 32:407–26. [MAJ] ground and perspective on domains of referential interpretation. Journal of Goldin-Meadow, S. (2005) What language creation in the manual modality tells us Memory and Language 49:43–61. [aMJP, SOY] about the foundations of language. The Linguistic Review 22:199–226. [GD] Harley, T. (2008) The psychology of language: From data to theory, 3rd Edn., Goldman, A. I. (2006) Simulating minds. The philosophy, psychology, and neuro- Psychology Press. [GSD, aMJP] science of mindreading. Oxford University Press. [aMJP] Hartsuiker, R. J. & Kolk, H. H. J. (2001) Error monitoring in speech production: A Gollan, T. H., Montoya, R. I., Fennema-Notestine, C. & Morris, S.K. (2005) computational test of the Perceptual Loop Theory. Cognitive Psychology Bilingualism affects picture naming but not picture classification. Memory & 42:11357. [RJH, aMJP] Cognition 33:1220–34. [JF] Hartsuiker, R. J. & Notebaert, L. (2010) Lexical access problems lead to disfluencies Gollan, T. H., Montoya, R. I. & Werner, G. A. (2002) Semantic and letter fluency in in speech. Experimental Psychology 57:169–77. [RJH] Spanish-English bilinguals. Neuropsychology 16:562–76. [JF] Haruno, M., Wolpert, D. M. & Kawato, M. (2001) MOSAIC Model for sensorimotor Gomez, R. L. & Gerken, L. (1999) Artificial grammar learning by 1-year-olds leads to learning and control. Neural Computation 13:2201–20. [aMJP] specific and abstract knowledge. Cognition 70:109–35. [MAJ] Haruno, M., Wolpert, D. M. & Kawato, M. (2003) Hierarchical MOSAIC for Goodwin, C. (1979) The interactive construction of a sentence in natural conversa- movement generation. International Congress Series 1250:575–90. [F-XA, tion. In: Everyday language: Studies in ethnomethodology, ed. G. Psathas, aMJP] 97–121. Irvington Publishers. [CH] Hashimoto, Y. & Sakai, K. L. (2003) Brain activations during conscious self-moni- Gordon, R. (1986) Folk psychology as simulation. Mind and Language 1:158–71. toring of speech production with delayed auditory feedback: An fMRI study. [aMJP] Human Brain Mapping 20:22–28. [MIM] Graf Estes, K., Evans, J. L., Alibali, M. W. & Saffran, J. R. (2007) Can infants Hasson, U., Skipper, J. I., Nusbaum, H. C. & Small, S. L. (2007) Abstract coding of map meaning to newly segmented words? Psychological Science 18:254–60. audiovisual speech: Beyond sensory representation. Neuron 56:1116–26. [ASD] [MAJ] Haueisen, J. & Knösche, T. R. (2001) Involuntary motor activity in pianists evoked by Graf, M., Schütz-Bosbach, S. & Prinz, W. (2010) Motor involvement in object per- music perception. Journal of Cognitive Neuroscience 13:786–92. [aMJP] ception: Similarity and complementarity. In: Grounding sociality: Neurons, Hawkins, J. A. (1994) A performance theory of order and constituency. Cambridge minds, and culture, ed. G. Semin & G. Echterhoff, pp. 27–52. Psychology University Press. [aMJP] Press. [aMJP] Hay, J. F., Pelucchi, B., Graf Estes, K. & Saffran, J. R. (2011) Linking sounds to Grant, E. R. & Spivey, M. J. (2003) Eye movements and problem solving: Guiding meaning: Infant statistical learning in a natural language. Cognitive Psychology attention guides thought. Psychological Science 14:462–66. [HK] 63:93–106. [MAJ] Green, A., Straube, B., Weis, S., Jansen, A., Willmes, K., Konrad, K. & Kircher, T. Healey, P. G. T., Purver, M. & Howes, C. (2010) Structural divergence in (2009) Neural integration of iconic and unrelated coverbal gestures: dialogue. Proceedings of 20th Annual Meeting of the Society for Text & A functional MRI study. Human Brain Mapping 30:3309–24. [ASD] Discourse. [CH] Green, D. W. (1986) Control, activation and resource: A framework and a model Heim, S., Opitz, B., Müller, K. & Friederici, A. D. (2003) Phonological processing for the control of speech in bilinguals Brain and Language 27:210–23. [JF] during language production: fMRI evidence for a shared production-compre- Green, J. R., Moore, C. A., Higashikawa, M. & Steeve, R. W. (2000) The physiologic hension network. Cognitive Brain Research 12:285–29. [aMJP] development of speech motor control: Lip and jaw coordination. Journal of Heinks-Maldonado, T. H., Nagarajan, S. S. & Houde, J. F. (2006) Magnetoence- Speech, Language, and Hearing Research 43(1):239. [SK] phalographic evidence for a precise forward model in speech production. Green, J. R. & Nip, I. B. (2010) Some organizational principles in early speech NeuroReport 17(13):1375–79. [JPdR, RJH, MIM, arMJP, KS] development. In: Speech Motor Control: New developments in basic and Heller, D., Grodner, D. & Tanenhaus, M. K. (2008) The role of perspective in applied research, ed. B. Maassen & P. van Lieshout, pp. 172–88. Oxford identifying domains of references. Cognition 108:831–36. [SOY] University Press. [SK] Hepper, P. G., Scott, D. & Shahidullah, S. (1993) Newborn and fetal response to Greene, S. B., McKoon, G. & Ratcliff, R. (1992) Pronoun resolution and discourse maternal voice. Journal of Reproductive and Infant Psychology 11:147–53. models. Journal of Experimental Psychology: Learning, Memory, and Cognition [KJA] 18:266–83. [SOY] Heyes, C. (2010) Where do mirror neurons come from? Neuroscience & Biobeha- Gregoromichelaki, E., Kempson, R., Purver, M., Mills, J. G., Cann, R., Meyer-Viol, vioral Reviews 34(4):575–83. [GH] W. & Healey, P. G. T. (2011) Incrementality and intention–recognition in Hickok, G. (2009a) Eight problems for the mirror neuron theory of action utterance processing. Dialogue and Discourse 2:199–233. [aMJP] understanding in monkeys and humans. Journal of Cognitive Neuroscience Grice, H. P. (1975) Logic and conversation. In: Syntax and semantics 3: Speech acts, 21(7):1229–43. DOI: 10.1162/jocn.2009.21189. [ASD, GH, MP] ed. P. Cole & J. L. Morgan, pp. 41–58. Academic Press. [GE] Hickok, G. (2009b) The functional neuroanatomy of language. Physics of Life Griffin, Z. M. & Bock, K. (2000) What the eyes say about speaking. Psychological Reviews 6(3):121–43. [ASD] Science 11:274–79. [HK] Hickok, G. (2012a) Computational neuroanatomy of speech production. Nature Grimm, A., Muller, A., Hamann, C. & Ruigendijk, E. (2011) Production-compre- Reviews Neuroscience 13(2):135–45. [GH] hension asymmetries in child language. De Gruyter. [SMM] Hickok, G. (2012b) The cortical organization of speech processing: Feedback control Grossberg, S. (1980) How does a brain build a cognitive code? Psychological Review and predictive coding the context of a dual-stream model. Journal of 87(1):1–51. [JB, rMJP] Communication Disorders 45:393–402. [ GH]

58 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 References/Pickering & Garrod: An integrated theory of language production and comprehension

Hickok, G., Houde, J. & Rong, F. (2011) Sensorimotor integration in speech pro- Iverson, J. M. & Thelen, E. (1999) Hand, mouth and brain. The dynamic emergence cessing: Computational basis and neural organization. Neuron 69(3):407–22. of speech and gesture. Journal of Consciousness Studies 6:11(12):19–40. [SK] DOI:10.1016/j.neuron.2011.01.019. [ASD, GH] Jackendoff, R. (2002) Foundations of language: Brain, meaning, grammar, evolution. Hickok, G. & Poeppel, D. (2000) Towards a functional neuroanatomy of speech Oxford University Press. [GD] perception. Trends in Cognitive Sciences 4(4):131–38. [ASD] Jackendoff, R. (2007) Linguistics in cognitive science: The state of the art. The Hickok, G. & Poeppel, D. (2004) Dorsal and ventral streams: A framework for Linguistic Review 24:347–401. [GD] understanding aspects of the functional anatomy of language. Cognition 92 Jaeger, T. F. (2010) Redundancy and reduction: Speakers manage syntactic infor- (1–2):67–99. [ASD] mation density. Cognitive Psychology 61:23–62. [TFJ, aMJP] Hickok, G. & Poeppel, D. (2007) The cortical organization of speech processing. Jaeger, T. F. & Snider, N. (2013) Alignment as a consequence of expectation Nature Reviews Neuroscience 8(5):393–402. [ASD, GH] adaptation: Syntactic priming is affected by the prime’s prediction error given Higgins, E. T. (1981) The “communication game”: Implications for social cognition both prior and recent experience. Cognition 127(1):57–83. [TFJ] and persuasion. In: Social cognition: The Ontario Symposium, Vol. 1, ed. E. T. Janssen N. & Barber, H. A. (2012) Phrase frequency effects in language production. Higgins, C. P. Herman & M. P. Zanna, pp. 343–92. Erlbaum. [GE] PLoS ONE7:e33202. [SMM] Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., Okazawa, H., Ishizu, K., Jonides, J. & Nee, D. E. (2006) Brain mechanisms of proactive interference in Yonekura, Y., Nagahama, Y., Fukuyama, H. & Konishi, J. (1997) Cortical pro- working memory. Neuroscience 139(1):181–93. [LRS] cessing mechanism for vocalization with auditory verbal feedback. NeuroReport Jordan, M. I. & Rumelhart, D. E. (1992) Forward models: Supervised learning with a 8(9–10):2379–82. [MIM] distal teacher. Cognitive Science, 16: 307–54. [FC, rMJP] Hockett, C. F. (1967) Where the tongue slips, there slip I. In: To honor Roman Kaiser, E. & Trueswell, J. C. (2004) The role of discourse context in the processing of Jakobson, pp. 910–36. Mouton. [GMO] a flexible word-order language. Cognition 94:113–47. [aMJP] Holle, H., Gunter, T. C., Rüschemeyer, S. A., Hennenlotter, A. & Iacoboni, M. Kamide, Y., Altmann, G. T. M. & Haywood, S. L. (2003) Prediction and (2008) Neural correlates of the processing of co-speech gestures. NeuroImage thematic information in incremental sentence processing: Evidence from 39(4):2010–24. [ASD] anticipatory eye movements. Journal of Memory and Language 49:133–56. Holtgraves, T. & Kashima, Y. (2008) Language, meaning and social cognition. Per- [HK, aMJP] sonality and Social Psychology Review 12:73–94. [YK] Kamide, Y., Scheepers, C. & Altmann, G. T. M. (2003) Integration of syntactic and Holtgraves, T. M. (2002) Language as social action: Social psychology and language semantic information in predictive processing: Cross-linguistic evidence from use. Erlbaum. [GE] German and English. Journal of Psycholinguistic Research 32(1):37–55. [FC] Hommel, B., Müsseler, J., Aschersleben, G. & Prinz, W. (2001) The theory of event Kan, I. P. & Thompson-Schill, S. L. (2004) Effect of name agreement on prefrontal coding (TEC): A framework for perception and action planning. Behavioral and activity during overt and covert picture naming. Cognitive, Affective & Behav- Brain Sciences 24:849–78. [aMJP] ioral Neuroscience 4(1):43–57. [GMO] Horton, W. S. & Gerrig, R. J. (2005) The impact of memory demands on audience Kashima, Y. & Lan, Y. (in press) Communication and language use in social cogni- design during language production. Cognition 96:127–42. [LRS] tion. In: The Oxford handbook of social cognition, ed. D. Carlson. Oxford Hostetter, A. B. & Alibali, M. W. (2008) Visual embodiment: Gesture as simulated University Press. [YK] action. Psychonomic Bulletin & Review 15:495–514. [aMJP] Kawato, M. (1999) Internal models for motor control and trajectory planning. Hough, J. (2011) Incremental semantics driven natural language generation with Current Opinion of Neurobiology 9(6):718–27. [GH] self-repairing capability. Proceedings of RANLP 2011 Student Conference, Kawato, M., Furawaka, K. & Suzuki, R. (1987) A hierarchical neural network model September 2011, Hissar, Bulgaria, 79–84. [CH] for the control and learning of voluntary movements. Biological Cybernetics Howes, C., Healey, P. G. T., Purver, M. & Eshghi, A. (2012) Finishing each 56: 1–17. [rMJP] other’s… Responding to incomplete contributions in dialogue. In: Proceedings Kawato, M., Maeda, Y., Uno, Y. & Suzuki, R. (1990) Trajectory formation of arm of the 34th Annual Conference of the Cognitive Science Society. August 2012, movement by cascade neural network model based on minimum torque-change Sapporo, Japan, 479–85. [CH] criterion. Biological Cybernetics, 62: 275–88. [rMJP] Howes, C., Purver, M., Healey, P. G. T., Mills, G. J. & Gregoromichelaki, E. (2011) Kemmerer, D. (2010) How words capture visual experience: The perspective from Incrementality in dialogue: Evidence from compound contributions. Dialogue cognitive neuroscience. In: Words and the world: How words capture human and Discourse 2:279–311. [aMJP] experience, ed. B. Malt & P. Wolff, pp. 289–329. Oxford University Press. Huettig, F. & Hartsuiker, R. J. (2010) Listening to yourself is like listening to others: [GD] External, but not internal, verbal self-monitoring is based on speech perception. Kempen, G. & Huijbers, P. (1983) The lexicalization process in sentence production Language and Cognitive Processes 25:347–74. [RJH, arMJP] and naming: Indirect election of words. Cognition 14:185–209. [GSD] Huettig, F. & Janse, E. (2012) Anticipatory eye movements are modulated by Keysar, B., Barr, D. J., Balin, J. A. & Brauner, J. S. (2000) Taking perspective in working memory capacity: Evidence from older adults. Paper presented at the conversation: The role of mutual knowledge in comprehension. Psychological 18th Architectures and Mechanisms for Language Processing Conference, Riva Science 11:32–38. [aMJP] del Garda, Italy. [NM] Kidd, E. (2012) Implicit statistical learning is directly associated with the acquisition Huettig, F. & McQueen, J. M. (2007) The tug of war between phonological, of syntax. Developmental Psychology 48(1):171–84. [FC] semantic and shape information in language-mediated visual search. Journal of Kilner, J. M. (2011) More than one pathway to action understanding. Trends in Memory and Language 57:460–82. [RJH, NM, aMJP] Cognitive Sciences 15(8):352–57. DOI: 10.1016/j.tics.2011.06.005. [MP] Huettig, F., Rommers, J. & Meyer, A. S. (2011) Using the visual world paradigm to Kilner, J. M., Paulignan, Y. & Blakemore, S.-J. (2003) An interference effect study language processing: A review and critical evaluation. Acta Psychologica of observed biological movement on action. Current Biology 13:522–25. 137:151–71. [HK, aMJP] [aMJP] Hurley, S. (2008a) The shared circuits model (SCM): How control, mirroring, and Kim, A. & Lai, V. (2012) Rapid interactions between lexical semantic and word form simulation can enable imitation, deliberation, and mindreading. Behavioral and analysis during word recognition in context: Evidence from ERPs. Journal of Brain Sciences 31(01):1–22. [GD, YK, arMJP, HR] Cognitive Neuroscience 24: 1104–12. [aMJP] Hurley, S. (2008b) Understanding simulation. Philosophy and Phenomenological Kiparsky, P. (1982) Lexical morphology and phonology. In: Linguistics in the Research 77:755–74. [aMJP] Morning Calm, ed. I.-S. Yang, pp. 3–91. Hanshin. [MAJ] Iacoboni, M. (2008) The role of premotor cortex in speech perception: Evidence Kleinschmidt, D., Fine, A. B. & Jaeger, T. F. (2012) A belief-updating model of from fmri and rtms. Journal of Physiology (Paris), 102:31–34. [ASD] adaptation and cue combination in syntactic comprehension. Proceedings of the Iacoboni, M., Molnar-Szakacs, I., Gallese, V., Buccino, G., Mazziotta, J. C. & 34rd Annual Meeting of the Cognitive Science Society (CogSci12), 605–10. Rizzolatti, G. (2005) Grasping the intentions of others with one’s own mirror [TFJ] neuron system. PLoS Biology 3(3):e79. DOI: 10.1371/journal.pbio.0030079. Kleinschmidt, D. & Jaeger, T. F. (2011) A Bayesian belief updating model of [MP] phonetic recalibration and selective adaptation. Proceedings of the Cognitive Indefrey P. & Levelt, W. J. M. (2004) The spatial and temporal signatures of word Modeling and Computational Linguistics Workshop at ACL, Portland, OR, June production components. Cognition 92:101–44. [aMJP, KS] 23rd,10–19. [TFJ] Ito, T., Tiede, M. & Ostry, D. J. (2009) Somatosensory function in speech Knoblich, G. & Flach, R. (2001) Predicting action effects: Interaction between perception. Proceedings of the National Academy of Sciences 106:1245–48. perception and action. Psychological Science 12:467–72. [aMJP] [aMJP] Knoblich, G., Öllinger, M. & Spivey, M. J. (2005) Tracking the eyes to obtain insight Iverson, J. (2010) Developing language in a developing body: The relationship into insight problem solving. In: Cognitive processes in eye guidance, ed. between motor development and language development. Journal of Child G. Underwood, pp. 355–75. Oxford University Press. [HK] Language 37(2):229–61. [SK] Knoblich, G., Seigerschmidt, E., Flach, R. & Prinz, W. (2002) Authorship effects in Iverson, J. & Braddock, B. A. (2011) Gesture and motor skill in relation to language the prediction of handwriting strokes: Evidence for action simulation during in children with language impairment. Journal of Speech, Language, and action perception. Quarterly Journal of Experimental Psychology Hearing Research 54(1):72–86. [SK] 55A:1027–46. [aMJP]

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 59 References/Pickering & Garrod: An integrated theory of language production and comprehension

Knoeferle, P. & Crocker, M. W. (2006) The coordinated interplay of scene, utter- Magyari, L. & De Ruiter, J. P. (2012) Prediction of turn-ends based on anticipation ance, and world knowledge: Evidence from eye tracking. Cognitive Science of upcoming words. Frontiers in Psychology 3:376. [JPdR] 30:481–529. [HK] Mahon, B. Z. & Caramazza, A. (2008) A critical look at the embodied cognition Knoeferle, P., Crocker, M. W., Scheepers, C. & Pickering, M. J. (2005) The influ- hypothesis and a new proposal for grounding conceptual content. Journal of ence of the immediate visual context on incremental thematic role-assignment: Physiology-Paris 102(1–3):59–70. DOI: 10.1016/j.jphysparis.2008.03.004. Evidence from eye-movements in depicted events. Cognition 95:95–127. [MP] [aMJP] Mahon, B. Z. & Caramazza, A. (2009) Concepts and categories: A cognitive neu- Knoeferle, P. & Kreysa, H. (2012) Can speaker gaze modulate syntactic structuring ropsychological perspective. Annual Review of Psychology 60:27–51. and thematic role assignment during spoken sentence comprehension? Fron- DOI:10.1146/annurev.psych.60.110707.163532. [ASD] tiers in Psychology 3:538. [HK] Mampe, B., Friederici, A. D., Christophe, A. & Wermke, K. (2009) Newborns’ cry Krishnan, S., Alcock, K. J., Mercure, E., Leech, R., Barker, E., Karmiloff-Smith, A. melody is shaped by their native language. Current Biology 19:1994–97. & Dick, F. (in press) Articulating novel words: Oromotor contributions to [KJA] individual differences in nonword repetition. Journal of Speech, Language and Mani, N., Durrant, S. & Floccia, C. (2012) Activation of phonological and semantic Hearing Research. [SK] codes in toddlers. Journal of Memory and Language 66:612–22. [NM] Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Mani, N. & Huettig, F. (2012) Prediction during language processing is a piece of Ryskina, V. L., Stolyarova, E. I., Sundberg, U. & Lacerda, F. (1997) Cross- cake – but only for skilled producers. Journal of Experimental Psychology: language analysis of phonetic units in language addressed to infants. Science Human Perception and Performance 38: 843–47. [FC, NM, SMM, rMJP] 277(5326):684–86. [GP] Mani, N. & Plunkett, K. (2010) In the infant’s mind’s ear: Evidence for implicit Kutas, M., DeLong, K. A. & Smith, N. J. (2011). A look around at what lies ahead: naming in infancy. Psychological Science 21:908–13. [NM] Prediction and predictability in language processing. In: Predictions in the Mar, R. A. (2004) The neuropsychology of narrative: Story comprehension, brain: Using our past to generate a future, ed. M. Bar, pp. 190–207. Oxford story production and their interrelation. Neuropsychologia 42:1414–34. University Press. [aMJP] [aMJP] Lakin, J. & Chartrand, T. L. (2003) Using nonconscious behavioral mimicry to create Marchman, V. A. & Fernald, A. (2008) Speed of word recognition and vocabulary affiliation and rapport. Psychological Science 14:334–39. [aMJP] knowledge in infancy predict cognitive and language outcomes in later child- Langton, S. R. H., Watt, R. J. & Bruce, V. (2000) Do the eyes have it? Cues to the hood. Developmental Science 11(3):F9–16. [FC] direction of social attention. Trends in Cognitive Sciences 4:50–59. [HK] Marcotte, J. (2005) Causative alternation errors as event-driven construction para- Lau, E., Stroud, C., Plesch, S. & Phillips, C. (2006) The role of structural prediction digm completions. Stanford, Ph.D. dissertation. [MAJ] in rapid syntactic analysis. Brain and Language 98:74–88. [aMJP] Marshall, R. C., Rappaport, B. Z. & Garcia-Bunuel, L. (1985) Self-monitoring be- Laver, J. D. M. (1980) Monitoring systems in the neurolinguistic control of speech havior in a case of severe auditory agnosia with aphasia. Brain and Language production. In: Errors in : Slips of the tongue, ear, pen, 24:297–313. [RJH] and hand, ed. V. A. Fromkin, Academic Press. [aMJP] Marslen-Wilson, W. D. (1973) Linguistic structure and speech shadowing at very Lebeltel, O., Bessière, P., Diard, J. & Mazer, E. (2004) Bayesian robot programming. short latencies. Nature 244:522–23. [JPdR] Autonomous Robots 16(1):49–79. [RL] Marslen-Wilson, W. D. & Welsh, A. (1978) Processing interactions and lexical access Lerner, G. (1991) On the syntax of sentences-in-progress. Language in Society during word recognition in continuous speech. Cognitive Psychology 20(3):441–58. [CH] 10:29–63. [aMJP] Leroi-Gourhan, A. (1964) Le Geste et la Parole I Technique et Langage, Parigi, Martin, H. F. & Zwaan, R. A. (2008) Embodied language: A review of the role of the Albin Michel (coll. Sciences d’aujourd’hui), p. 323. [MP] motor system in language comprehension. The Quarterly Journal of Exper- Leroi-Gourhan, A. (1965) Le Geste et la Parole II La Memoire et les rythmes, Parigi, imental Psychology 61:825–50. [GD] Albin Michel (coll. Sciences d’aujourd’hui), p 285. [MP] Masataka, N. (2001) Why early linguistic milestones are delayed in children with Lesage, E., Morgan, B. E., Olson, A. C., Meyer, A. S. & Miall, R. C. (2012) Williams syndrome: Late onset of hand banging as a possible rate-limiting Cerebellar rTMS disrupts predictive language processing. Current Biology constraint on the emergence of canonical babbling. Developmental Science 22, R794–95. [rMJP] 4(2):158–164. [SK] Levelt, W. J. M. (1983) Monitoring and self-repair in speech. Cognition 14:41– Mattson, M. & Baars, B. J. (1992) Error-minimizing mechanisms: Boosting or 104. [MIM, GMO, aMJP] editing? In Experimental slips and human error: Exploring the architecture of Levelt, W. J. M. (1989) Speaking: From intention to articulation. MIT Press. volition, ed. B. J. Baars, pp. 263–87. Plenum. [RJH] [JPdR, RJH, arMJP] McCauley, S. M. & Christiansen, M. H. (2011) Learning simple statistics for Levelt, W. J. M., Roelofs, A. & Meyer, A. S. (1999) A theory of lexical access in language comprehension and production: The CAPPUCCINO model. In: speech production. Behavioral and Brain Sciences 22(1):1–75. [GSD, GMO, Proceedings of the 33rd Annual Conference of the Cognitive Science Society, ed. aMJP, KS] L. Carlson, C. Hölscher & T. Shipley, pp. 1619–24. Cognitive Science Levinson, S. C. (1983) Pragmatics. Cambridge University Press. [JPdR] Society. [SMM] Levinson, S. C. (1995) Interaction biases in human thinking. In: Social intelligence McCauley, S. M. & Christiansen, M. H. (submitted) Language learning as language and interaction, ed. E. N. Goody, pp. 221–60. Cambridge University Press. use: A computational model of children’s comprehension and production of [JPdR] language. Manuscript in preparation, Cornell University. [SMM] Levinson, S. C. (2006) On the human “interaction engine.” In: Roots of human McDonald, S. A. & Shillcock, R. C. (2003) Eye movements reveal the on-line sociality: Culture, cognition and interaction, ed. N. J. Enfield & S. C. Levinson computation of lexical probabilities. Psychological Science 14:648–52. [NM] (Cur.), pp. 39–69. Berg. [HK, GP] McGuire, P. K., Silbersweig, D. A. & Frith, C. D. (1996) Functional neuroanatomy Levy, R. (2008) Expectation-based syntactic comprehension. Cognition of verbal self-monitoring. Brain 119(Pt. 3):907–17. [MIM] 106(3):1126–77. [TFJ, aMJP, HR] McHale, J., Fivaz-Depeursinge, E., Dickstein S., Robertson, J. & Daley, M. (2008) Lew-Williams, C. & Fernald, A. (2007) Young children learning Spanish make rapid New evidence for the social embeddedness of infants’ early triangular use of grammatical gender in spoken word recognition. Psychological Science capacities. Family Process 47:445–63. [KJA] 18(3):193–98. [FC] McMurray, B., Tanenhaus, M. K. & Aslin, R. N. (2009) Within-category VOT affects Lewis, J. D. & Elman, J. L. (2001) Learnability and the statistical structure of recovery from “lexical” garden paths: Evidence against phoneme-level inhi- language: Poverty of stimulus arguments revisited. Proceedings of the 26th bition. Journal of Memory and Language 60:65–91. [AMT] Annual Conference on Language Development. [MAJ] Meister, I. G., Wilson, S. M., Deblieck, C., Wu, A. D. & Iacoboni, M. (2007) The Lichtheim, L. (1885) On aphasia. Brain 7(4):433–84. [HR] essential role of premotor cortex in speech perception. Current Biology Lindblom, B. (1990) Explaining phonetic variation: A sketch of the H&H theory. 17:1692–96. [ASD, RL] Speech production and speech modelling 55:40339. [TFJ] Meltzoff, A. N. & Moore, M. K. (1977) Imitation of facial and manual gestures by MacDonald, M. C., Pearlmutter, N. J. & Seidenberg, M. S. (1994) The lexical human neonates. Science 198:75–8. [KJA] nature of syntactic ambiguity resolution. Psychological Review 101:676–703. Melzer, A., Prinz, W. & Daum, M. M. (2012) Production and perception of con- [aMJP] tralateral reaching: A close link by 12 months of age. Infant Behavior and MacGregor, L. J., Pulvermuller, F., van Casteren, M. & Shtyrov, Y. (2012) Ultra- Development 35:570–79. [NM] rapid access to words in the brain. Nature Communications 3:711. [KS] Menenti, L., Gierhan, S. M. E., Segaert, K. & Hagoort, P. (2011) Shared language: MacKay, D. G. (1981) The problem of rehearsal or mental practice. Journal of Motor Overlap and segregation of the neuronal infrastructure for speaking and Behavior 13(4):274–85. [GMO] listening revealed by fMRI. Psychological Science 22:1173–82. [aMJP] MacKay, D. G. (1982) The problems of flexibility, fluency, and speed-accuracy trade- Meringer, R. & Meyer, K. (1895) Versprechen und verlesen. Behrs Verlag. [GMO] off in skilled behaviors. Psychological Review 89:483–506. [aMJP] Meteyard, L., Cuadrado, S. R., Bahrami, B. & Vigliocco, G. (2012) Coming of age: MacKay, D. G. (1992) Constraints on theories of inner speech. In: Auditory imagery, A review of embodiment and the neuroscience of semantics. Cortex ed. D. Reisberg, pp. 121–49. Erlbaum. [GMO] 48:788–804. [GD]

60 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 References/Pickering & Garrod: An integrated theory of language production and comprehension

Metzing, C. & Brennan, S. E. (2003) When conceptual pacts are broken: Partner- damage to ventrolateral prefrontal cortex. Cognitive Neuropsychology specific effects in the comprehension of referring expressions. Journal of 26(6):527–67. [LRS] Memory and Language 49:201–13. [aMJP] Novick, J. M., Trueswell, J. C. & Thompson-Schill, S. L. (2005) Cognitive control Meuter, R. & Allport, A. (1999) Bilingual language switching in naming: and parsing: Reexamining the role of Broca’s area in sentence comprehension. Asymmetric costs of language selection. Journal of Memory and Language Cognitive, Affective & Behavioral Neuroscience 5(3):263–81. [LRS] 40:25–40. [JF] Nozari, N. & Dell, G. S. (2009) More on lexical bias: How efficient can a “lexical Meyer, A. S., Sleiderink, A. M. & Levelt, W. J. M. (1998) Viewing and naming editor” be? Journal of Memory and Language 60:291–307. [GSD] objects: Eye movements during noun phrase production. Cognition 66: Nozari, N., Dell, G. S. & Schwartz, M. F. (2011) Is comprehension necessary for B25–33. [HK] error detection? A conflict-based account of monitoring in speech production. Miall, R. C., Stanley, J., Todhunter, S., Levick, C., Lindo, S. & Miall, J. D. (2006) Cognitive Psychology 63(1):1–33. DOI:10.1016/j.cogpsych.2011.05.001. Performing hand actions assists the visual discrimination of similar hand pos- [RJH, GMO, arMJP] tures. Neuropsychologia 44:966–76. [aMJP] O’Doherty, J. P., Dayan, P., Schultz, J., Deichmann, R., Friston, K. & Dolan, R. Miall, R. C. & Wolpert, D. M. (1996) Forward models for physiological motor (2004) Dissociable roles of ventral and dorsal striatum in instrumental con- control. Neural Networks 9: 1265–79. [rMJP] ditioning. Science 304:452–54. [MAJ] Milner, A. D. & Goodale, M. A. (1995) The visual brain in action. Oxford University Ondobaka, S. & Bekkering, H. (2012) Hierarchy of idea-guided action and perception- Press. [GH] guided movement. Frontiers in Psychology. doi: 10.3389/fpsyg.2012.00579 [YK] Mirman, D., Magnuson, J., Graf Estes, K. & Dixon, J. A. (2008) The link between Ondobaka, S., de Lange, F. P., Newman-Norlund, R. D., Wiemers, M. & Bekkering, statistical segmentation and word learning in adults. Cognition 108:271–80. H. (2011) Interplay between action and movement intentions during social [MAJ] interaction. Psychological Science 23: 30–35. [YK, rMJP] Mishra, R. K., Singh, N., Pandey, A. & Huettig, F. (2012) Spoken language-mediated Oomen, C. C. E., Postma, A. & Kolk, H. H. J. (2005) Speech monitoring in aphasia: anticipatory eye movements are modulated by reading ability: Evidence from Error detection and repair behaviour in a patient with Broca’s aphasia. In: Indian low and high literates. Journal of Eye Movement Research 5(1):1–10. Phonological encoding and monitoring in normal and pathological speech, ed. [NM] R. Hartsuiker, R. Bastiaanse, A. Postma & F. Wijnen, pp. 209–25. Psychology Misyak, J. B., Christiansen, M. H. & Tomblin, J. B. (2010) Sequential expectations: Press. [RJH] The role of prediction-based learning in language. Topics in Cognitive Science Oppenheim, G. M. (2012) The case for subphonemic attenuation in inner speech: 2:138–53. [MAJ] Comment on Corley, Brocklehurst, and Moat (2011). Journal of Experimental Mitterer, H. & Ernestus, M. (2008) The link between speech perception and Psychology: Learning, Memory, and Cognition 38(3):502–12. DOI:10.1037/ production is phonological and abstract: Evidence from the shadowing task. a0025257. [GMO] Cognition 109:168–73. [AMT] Oppenheim, G. M. & Dell, G. S. (2008) Inner speech slips exhibit lexical bias, but not Moore, R. K. (2007) PRESENCE: A human-inspired architecture for speech-based the phonemic similarity effect. Cognition 106(1):528–37. DOI:10.1016/j.cog- human–machine interaction. IEEE Transactions on Computers nition.2007.02.006. [GMO, aMJP] 56(9):1176–88. [GP] Oppenheim, G. M. & Dell, G. S. (2010) Motor movement matters: The flexible Motley, M. T., Camden, C. T. & Baars, B. J. (1982) Covert formulation and editing of abstractness of inner speech. Memory & cognition 38(8):1147–60. anomalies in speech production: Evidence from experimentally elicited slips of DOI:10.1016/j.cognition.2007.02.006. [F-XA, GMO, aMJP] the tongue. Journal of Verbal Learning and Verbal Behavior 21:578–94. Pacherie, E. (2012) The phenomenology of joint action: Self-agency vs. joint-agency. [aMJP] In: Joint attention: New developments, ed. A. Seemann, pp. 343–89. MIT Mottonen, R. & Watkins, K. E. (2009) Motor representations of articulators con- Press. [YK] tribute to categorical perception of speech sounds. Journal of Neuroscience 29 Pagnoni, G., Zink, C. F., Montague, P. R. & Berns, G. S. (2002) Activity in human (31):9819–25. DOI: 10.1523/JNEUROSCI.6018-08.2009. [MP, aMJP] ventral striatum locked to errors of reward prediction. Nature Neuroscience Moulin-Frier, C., Laurent, R., Bessière, P., Schwartz, J.-L. & Diard, J. (2012) 5:97–98. [MAJ] Adverse conditions improve distinguishability of auditory, motor and percep- Panksepp, J. (2004) Affective neuroscience: The foundations of human and animal tuo-motor theories of speech perception: An exploratory Bayesian modeling emotions. Oxford University Press. [KJA] study. Language and Cognitive Processes 27(7–8):1240–63. [RL] Pardo, J. S. (2006) On phonetic convergence during conversational interaction. Mukamel, R., Ekstrom, A. D., Kaplan, J., Iacoboni, M. & Fried, I. (2010) Single- Journal of the Acoustical Society of America 119:2382–93. [aMJP] neuron responses in humans during execution and observation of actions. Parker Jones, ’O., Green, D. W., Grogan, A., Pliatsikas, C., Filippopolitis, K., Ali, N., Current Biology 20:750–56. [aMJP] Lee, H. L., Ramsden, S., Gazarian, K., Prejawa, S., Seghier, M. L. & Price, C. J. Nappa, R., Wessel, A., McEldoon, K. L., Gleitman, L. R. & Trueswell, J. C. (2009) (2012) Where, when and why brain activation differs for bilinguals and Use of speaker’s gaze and syntax in verb learning. Language Learning and monolinguals during picture naming and reading aloud. Cerebral Cortex Development 5:203–34. [HK] 22:892–902. [JF] Navab, A., Gillespie-Lynch, K., Johnson, S. P., Sigman, M. & Hutman, T. (2011) Paus, T., Perry, D. W., Zatorre, R. J., Worsley, K. J. & Evans, A. C. (1996) Eye-tracking as a measure of responsiveness to joint attention in infants at risk Modulation of cerebral blood flow in the human auditory cortex during speech: for autism. Infancy 17:416–31. [KJA] Role of motor-to-sensory discharges. European Journal of Neuroscience. Neda, Z., Ravasz, Y., Brechet, T. Vicsek, T. & Barabasi, A. L. (2000) The sound of 8:2236–46. [aMJP] many hands clapping. Nature 403:849. [aMJP] Pazzaglia, M., Pizzamiglio, L., Pes, E. & Aglioti, S. M. (2008b) The sound of actions Negri, G. A., Rumiati, R. I., Zadini, A., Ukmar, M., Mahon, B. Z. & Caramazza, A. in apraxia. Current Biology 18(22):1766–72. DOI: 10.1016/j. (2007) What is the role of motor simulation in action and object recognition? cub.2008.09.061. [MP] Evidence from apraxia. Cognitive Neuropsychology 24(8):795–816. DOI: Pazzaglia, M., Smania, N., Corato, E. & Aglioti, S. M. (2008a) Neural underpinnings 10.1080/02643290701707412. [MP] of gesture discrimination in patients with limb apraxia. Journal of Neuroscience Nelissen, N., Pazzaglia, M., Vandenbulcke, M., Sunaert, S., Fannes, K., Dupont, P., 28(12):3030–41. DOI: 10.1523/JNEUROSCI.5748-07.2008. [MP] Aglioti, S. M. & Vandenberghe, R. (2010) Gesture discrimination in primary Pazzaglia, M. (2013) Action discrimination: Impact of apraxia. Journal of Neurology, progressive aphasia: The intersection between gesture and language processing Neurosurgery & Psychiatry 84(5):477–78. doi: 10.1136/jnnp-2012-304817 [MP] pathways. Journal of Neuroscience 30(18):6334–41. DOI: 10.1523/JNEUR- Pelucchi, B., Hay, J. F. & Saffran, J. R. (2009) Learning in reverse: Eight-month-old OSCI.0321-10.2010. [MP] infants track backward transitional probabilities. Cognition 113:244–47. [SMM] Newman-Norlund, R. D., van Schie, H. T., van Zuijlen, A. M. J. & Bekkering, H. Perkell, J. S., Guenther, F. H., Lane, H., Matthies, M. L., Stockmann, E., Tiede, M., (2007) The mirror neuron system is more active during complementary and Zandipour, M. (2004) The distinctness of speakers’ productions of vowel compared with imitative action. Nature Neuroscience 10:817–18. [aMJP] contrasts is related to their discrimination of the contrasts. The Journal of the Nielsen, K. (2011) Specificity and abstractness of VOT imitation. Journal of Phonetics Acoustical Society of America 116:2338. [TFJ] 39:132–42. [AMT] Peterson, R. R., Burgess, C., Dell, G. S. & Eberhard, K. A. (2001) Dissociation Nieuwland, M. S., Otten, M. & Van Berkum, J. J. A. (2007) Who are you talking between syntactic and semantic processing during idiom comprehension. about? Tracking discourse-level referential processes with ERPs. The Journal of Journal of Experimental Psychology: Learning, Memory, and Cognition Cognitive Neuroscience 19:1–9. [SOY] 27:1223–37. [aMJP] Niv, Y. & Montague, P. R. (2008) Theoretical and empirical studies of learning. Pezzulo, G. (2011a) Grounding procedural and declarative knowledge in sensori- Neuroeconomics: Decision making and the brain, pp. 329–50. Elsevier. [MAJ] motor anticipation. Mind and Language 26:78–114. [arMJP] Nooteboom, S. G. (1969) The tongue slips into patterns. In: Leyden studies in Pezzulo, G. (2011b) The “interaction engine”: A common pragmatic competence linguistics and phonetics, ed. A. G. Sciarone, A. J. van Essen & A. A. van Raad, across linguistic and non-linguistic interactions. IEEE Transactions on Auton- pp. 114–32. Mouton. [GMO] omous Mental Development. 4(2):105–23. [GP] Novick, J. M., Kan, I. P., Trueswell, J. C. & Thompson-Schill, S. L. (2009) A case for Pezzulo, G. (2011c) Shared representations as coordination tools for interactions. conflict across multiple domains: memory and language impairments following Review of Philosophy and Psychology. 2(2):303–33. [GP]

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 61 References/Pickering & Garrod: An integrated theory of language production and comprehension

Pezzulo, G., Barsalou, L. W., Cangelosi, A., Fischer, M. H., McRae, K. & Spivey, conditioning II, A. H. Black & F. Prokasy, pp. 64–99. Appleton-Century- M. J. (2011) The mechanics of embodiment: A dialog on embodiment and Crofts. [MAJ] computational modeling. Frontiers in Psychology 2(5):1–21. [GD] Rhode, H., Levy, R. & Kehler, A. (2011) Anticipating explanations in relative clause Pezzulo, G. & Dindo, H. (2011) What should I do next? Using shared processing. Cognition 118:339–58. [LRS] representations to solve interaction problems. Experimental Brain Research Richardson, D. C. & Dale, R. (2005) Looking to understand: The coupling between 211(3):613–630. [GP] speakers’ and listeners’ eye movements and its relationship to discourse com- Pickering, M. J. & Garrod, S. (2004) Toward a mechanistic psychology of dialogue. prehension. Cognitive Science 29:1045–60. [HK] Behavioral and Brain Sciences 27(2):169–226. [CH, arMJP, GP, SOY] Richardson, D. C., Dale, R. & Kirkham, N. Z. (2007) The art of conversation is Pickering, M. J. & Garrod, S. (2007) Do people use language production to make coordination. Psychological Science 18:407–13. [HK, SOY] predictions during comprehension? Trends in Cognitive Sciences 11(3): Richardson, M. J., Marsh, K. L., Isenhower, R. W., Goodman, J. R. L. & Schmidt, 105–10. [CH, aMJP, JPdR] R. C. (2007) Rocking together: Dynamics of intentional and unintentional Pinker, S. (2007) The stuff of thought: Language as a window into human nature. interpersonal coordination. Human Movement Science 26:867–91. Penguin. [GD] [KJA, aMJP] Plaut, D. C. & Kello, C. T. (1999) The emergence of phonology from the interplay of Ricœur, P. (1973) The model of the text: Meaningful action considered as a text. speech comprehension and production: A distributed connectionist approach. New Literary History 5:91–17. [GE] In: The emergence of language, ed. B. MacWhinney, pp. 381–415. Erlbaum. Riès, S., Janssen, N., Dufau, S., Alario, F.-X. & Burle, B. (2011) General-purpose [FC, aMJP] monitoring during speech production. Journal of Cognitive Neuroscience Poizner, H., Klima, E. S. & Bellugi, U. (1987) What the hands reveal about the brain. 23:1419–36. [F-XA] MIT Press. [GD] Rizzolatti, G. & Sinigaglia, C. (2007) Mirror neurons and motor intentionality. Postma, A. (2000) Detection of errors during speech production. A review of speech Functional Neurology 22(4):205–10. [MP] monitoring models. Cognition 77: 97–131. [aMJP] Roche, J., Dale, R., Kreuz, R. J., & Jaeger, T. F. (2013) Learning to avoid syntactic Postma, A. & Noordanus, C. (1996) Production and detection of speech errors in ambiguity. Ms., University of Rochester. [TFJ] silent, mouthed, noise-masked, and normal auditory feedback speech. Roelofs, A. (2004). Error biases in spoken word planning and monitoring by aphasic Language and Speech 39(4):375–92. [GMO] and nonaphasic speakers: Comment on Rapp and Goldrick (2000). Psycho- Price, C. J. (2010) The anatomy of language: A review of 100 fMRI studies published logical Review 111:561–72. [aMJP] in 2009. Annals of the New York Academy of Sciences 1191(1):62–88. [ASD] Rogalsky, C. & Hickok, G. (2011) The role of Broca’s area in sentence compre- Price, C. J. (2012) A review and synthesis of the first 20 years of PET and fMRI hension. Journal of Cognitive Neuroscience 23(7):1664–80. [ASD] studies of heard speech, spoken language and reading. NeuroImage Rommers, J., Meyer, A. S., Praamstra, P., & Huettig, F. (2013). The contents of 62(2):816–47. DOI: 10.1016/j.neuroimage.2012.04.062. [ASD] predictions in sentence comprehension: Activation of the shape of objects Prinz, J. (2012) Waiting for the self. In: Consciousness and the self: New essays, ed. before they are referred to. Neuropsychologia 51(3):437–47. [NM] Liu, J. L. and Perry, J, pp. 213–40, Cambridge University Press. [MIM] Ross, E. D. (1981) The aprosodias: Functional-anatomical organization of the Prinz, W. (2006) What renactment earns us. Cortex 42:515–18. [aMJP] affective components of language in the right hemisphere. Archives of Przyrembel, M., Smallwood, J., Pauen, M. & Singer, T. (2012) Illuminating the dark Neurology 38:561–68. [KJA] matter of social neuroscience: Considering the problem of social interaction Rowland, C., Chang, F., Ambridge, B., Pine, J. M. & Lieven, E. V. (2012) The from philosophical, psychological and neuroscientific perspectives. Frontiers in development of abstract syntax: Evidence from structural priming and the Human Neuroscience 6:190. DOI:10.3389/fnhum.2012.00190. [KJA] lexical boost. Cognition 125(1); 49–63. [FC] Pulvermüller, F. (2005) Brain mechanisms linking language and action. Nature Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986) Learning representations Reviews Neuroscience. 6(7):576–82. Review. [MP] by back-propagating errors. Nature 323(6088):533–36. [FC] Pulvermüller, F. & Fadiga, L. (2010) Active perception: Sensorimotor circuits as a Rumiati, R. I., Zanini, S., Vorano, L. & Shallice, T. (2001) A form of ideational cortical basis for language. Nature Reviews Neuroscience 11(5):351–60. DOI: apraxia as a delective deficit of contention scheduling. Cognitive Neuropsy- 10.1038/nrn2811. [MP, aMJP] chology 18(7):617–42. DOI: 10.1080/02643290126375. [MP] Pulvermüller, F., Huss, M., Kherif, F., Moscoso del Prado Martin, F., Hauk, O. & Sacks, H., Schegloff, E. A. & Jefferson, G. (1974) A simplest systematics for the Shtyrov, Y. (2006) Motor cortex maps articulatory features of speech sounds. organization of turn-taking for conversation. Language 50:696–735. Proceedings of the National Academy of Sciences 103(20):7865–70. [ASD, [aMJP, HR] aMJP] Saffran, J. R. (2002) Constraints on statistical language learning. Journal of Memory Pulvermüller, F. & Shtyrov, Y. (2006) Language outside the focus of attention: the and Language 47:172–96. [MAJ] mismatch negativity as a tool for studying higher cognitive processes. Progress in Saffran, J. R., Aslin, R. N. & Newport, E. L. (1996) Statistical learning by 8-month- Neurobiology 79:49–71. [KS] old infants. Science 274:1926–28. [MAJ] Purver, M., Cann, R. & Kempson, R. (2006) Grammars as parsers: The dialogue Sahin, N. T., Pinker, S., Cash, S. S., Schomer, D. & Halgren, E. (2009) Sequential challenge. Research in Language and Computation 4:289–326. [CH] processing of lexical, grammatical, and articulatory information within Broca’s Purver, M., Eshghi, A. & Hough, J. (2011) Incremental semantic construction in a area. Science 326:445–49. [aMJP] dialogue system. Proceedings of the 9th International Conference on Compu- Salverda, A. P., Dahan, D. & McQueen, J. (2003) The role of prosodic boundaries tational Semantics (IWCS). January 2011, Oxford, UK, 365–69. [CH] in the resolution of lexical embedding in speech comprehension. Cognition Ramscar, M., Yarlett, D., Dye, M., Denny, K. & Thorpe, K. (2010) The effects of 90:51–89. [AMT] feature-label-order and their implications for symbolic learning. Cognitive Sams, M., Möttönen, R. & Sihvonen, T. (2005) Seeing and hearing others and Science 34(6):909–57. [TFJ] oneself talk. Brain Research: Cognitive Brain Research 23(2–3):429–35. Rapp, B. & Goldrick, M. (2000) Discreteness and interactivity in spoken word pro- [KJA, GH, aMJP] duction. Psychological Review 107:460–99. [aMJP] Sanford, A. J. & Garrod, S. C. (1981) Understanding written language. Wiley. Rapp, B. & Goldrick, M. (2004). Feedback by any other name is still interactivity: A [aMJP] reply to Roelofs (2004). Psychological Review 111:573–78. [aMJP] Sanford, A. J. & Sturt, P. (2002) Depth of processing in language comprehension: Rauschecker, A. M., Pringle, A. & Watkins, K. E. (2008) Changes in neural activity Not noticing the evidence. Trends in Cognitive Sciences 6:382–86. [SMM] associated with learning to articulate novel auditory pseudowords by covert Sartori, L., Becchio, C., Bara, B. G. & Castiello, U. (2009) Does the intention repetition. Human brain mapping 29(11):1231–42. DOI:10.1002/ to communicate affect action kinematics? Consciousness and Cognition hbm.20460. [GMO] 18(3):766–72. DOI: 10.1016/j.concog.2009.06.004. [GP] Rauschecker, J. P. (2011) An expanded role for the dorsal auditory pathway in sen- Sato, M., Buccino, G., Gentilucci, M. & Cattaneo, L. (2010) On the tip of the tongue: sorimotor control and integration. Hearing Research 271(1–2):16–25. Modulation of the primary motor cortex during audiovisual speech perception. DOI:10.1016/j.heares.2010.09.001. [ASD] Speech Communication 52(6):533–41. [ASD] Rauschecker, J. P. & Scott, S. K. (2009) Maps and streams in the auditory cortex: Sato, M., Tremblay, P. & Gracco, V. L. (2009) A mediating role of the premotor Nonhuman primates illuminate human speech processing. Nature Neuroscience cortex in phoneme segmentation. Brain and Language 111(1):1–7. 12(6):718–24. DOI:10.1038/nn.2331. [F-XA, ASD, GH] DOI:10.1016/j.bandl.2009.03.002. [ASD] Rauschecker, J. P. & Tian, B. (2000) Mechanisms and streams for processing of Schegloff, E. A. (1992) Repair after next turn: The last structurally provided defense “what” and “where” in auditory cortex. Proceedings of the National Academy of of intersubjectivity in conversation. American Journal of Sociology 97(5): 1295– Sciences 97(22):11800. [ASD] 345. [CH] Reali, F. & Christiansen, M. H. (2007) Processing of relative clauses is made easier Schilbach, L., Timmermans, B., Reddy, V., Costall, A., Bente, G., Schlicht, T. & by frequency of occurrence. Journal of Memory and Language, 57:1–23. Vogeley, K. (2013) Toward a second-person neuroscience. Behavioral and [SMM] Brain Sciences. [KJA] Rescorla, R. A. & Wagner, A. R. (1972) A theory of Pavlovian conditioning: Vari- Schippers, M. B., Roebroeck, A., Renken, R., Nanetti, L. & Keysers, C. (2010) ations in the effectiveness of reinforcement and nonreinforcement. In: Classical Mapping the information flow from one brain to another during gestural

62 BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 References/Pickering & Garrod: An integrated theory of language production and comprehension

communication. Proceedings of the National Academy of Science USA Staub, A. & Clifton, C., Jr. (2006) Syntactic prediction in language comprehension: 107:9388–93. [KJA] Evidence from either…or. Journal of Experimental Psychology: Learning, Schlenck, K.-J., Huber, W. & Willmes, K. (1987) “Prepairs” and repairs: Different Memory, and Cognition 32:425–36. [aMJP] monitoring functions in aphasic language production. Brain and Language Staudte, M. & Crocker, M. W. (2011) Investigating joint attention mechanisms 30:226–44. [aMJP] through spoken human-robot interaction. Cognition 120:268–91. [HK] Schober, M. F. & Clark, H. H. (1989) Understanding by addressees and overhearers. Stephens, G. J., Silbert, L. J. & Hasson, U. (2010) Speaker-listener neural coupling Cognitive Psychology 21:211–32. [aMJP] underlies successful communication. Proceedings of the National Academy of Schriefers, H., Meyer, A. S. & Levelt, W. J. M. (1990) Exploring the time course of Sciences 107:14425–30. [aMJP] lexical access in language production: Picture-word interference studies. Stern, D. N. (2010) Forms of vitality: Exploring dynamic experience in psychology Journal of Memory and Language 29:86–102. [aMJP] and the arts. Oxford University Press. [KJA] Schwanenflugel, P. J. & Shoben, E. J. (1985) The influence of sentence constraint on Stevens, K. N. & Halle, M. (1967) Remarks on the analysis by synthesis and dis- the scope of facilitation for upcoming words. Journal of Memory and Language tinctive features. In: Models for the perception of speech and visual form, ed. 24:232–52. [NM] W. Walthen-Dunn, pp. 88–102. MIT Press. [GH] Schwartz, J.-L., Basirat, A., Ménard, L. & Sato, M. (2012) The perception-for-action- Stivers, T., Enfield, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., control theory (PACT): A perceptuo-motor theory of speech perception. Hoymann, G., Rossano, F., De Ruiter, J. P., Yoon, K. E., & Levinson, S. C. Journal of Neurolinguistics 25(5):336–54. [RL] (2009) Universals and cultural variation in turn-taking in conversation. Pro- Scott, S. & Johnsrude, I. S. (2003) The neuroanatomical and functional organisation ceedings of the National Academy of Sciences of the United States of America of speech perception. Trends in Neurosciences 26: 100–107. [aMJP] 106(26):10587–92. [JPdR] Scott, S., McGettigan, C. & Eisner, F. (2009) A little more conversation, a little less Straube, B., Green, A., Bromberger, B. & Kircher, T. (2011) The differentiation of action – candidate roles for the motor cortex in speech perception. Nature iconic and metaphoric gestures: Common and unique integration processes. Reviews Neuroscience 10:295–302. [aMJP] Human Brain Mapping 32(4):520–33. DOI: 10.1002/hbm.21041. [ASD] Sebanz, N., Bekkering, H. & Knoblich, G. (2006a) Joint action: Bodies and minds Strijkers, K. & Costa, A. (2011) Riding the lexical speedway: A critical review on the moving together. Trends in Cognitive Sciences 10(2):70–76. [GP, aMJP] time course of lexical selection in speech production. Frontiers in Psychology Sebanz, N. & Knoblich, G. (2009) Prediction in joint action: What, when, and where. 2:356. [KS] Topics in Cognitive Science 1:353–67. [aMJP] Strijkers, K., Holcomb, P. & Costa, A. (2011) Conscious intention to speak facilitates Sebanz, N., Knoblich, G., Prinz, W. & Wascher, E. (2006b) Twin peaks: An ERP lexical access during overt object naming. Journal of Memory and Language study of action planning and control in coacting individuals. Journal of Cognitive 65:345–62. [KS] Neuroscience 18:859–70. [aMJP] Suttle, L. & Goldberg, A. E. (forthcoming) Learning what not to say: Comparing the Segaert, K., Menenti, L., Weber, K., Petersson, K. M. & Hagoort, P. (2012) Shared role of preemption and entrenchment. Princeton University. [MAJ] syntax in language production and language comprehension – an fMRI study. Swinney, D. (1979) Lexical access during sentence comprehension: (Re) consider- Cerebral Cortex 22:1662–70. [aMJP] ation of context effects. Journal of Verbal Learning and Verbal Behavior Shadmehr, R., Smith, M. A., & Krakauer, J. W. (2010) Error correction, sensory 18:645–59. [aMJP] prediction, and adaptation in motor control. Annual Review of Neuroscience Tanenhaus, M. K. (2007) Eye movements and spoken language processing. In: Eye 33:89–108. [GH] movements: A window on mind and brain, ed. R. P. G. van Gompel, M. H. Shapiro, L. (2011) Embodied cognition. Routledge. [GD] Fischer, W. S. Murray & R. L. Hill, pp. 309–26. Elsevier. [LRS] Sheehan, E. A., Namy, L. L. & Mills, D. L. (2007) Developmental changes in neural Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., activity to familiar words and gestures. Brain and Language 101(3):246–59. [SK] Fazio, F., Rizzolatti, G., Cappa, S. F. & Perani, D. (2005) Listening to action- Shintel, H. & Keysar, B. (2009) Less is more: A minimalist account of joint action in related sentences activates fronto-parietal motor circuits. Journal of Cognitive communication. Topics in Cognitive Science 1:260–73. [HK] Neuroscience 17(2):273–81. DOI: 10.1162/0898929053124965. [MP] Shockley, K., Santana, M. V. & Fowler, C. A. (2003) Mutual interpersonal postural Thompson-Schill, S. L., Jonides, J., Marshuetz, C., Smith, E. E., D’Esposito, M., constraints are involved in cooperative conversation. Journal of Experimental Kan, I. P., Knight, R. T. & Swick, D. (2002) Effects of frontal lobe damage on Psychology: Human Perception and Performance 29:326–32. [aMJP] interference effects in working memory. Cognitive, Affective & Behavioral Sidtis, J. J. & Sidtis, D. v. L. (2003) A neurobehavioural approach to dysprosody. Neuroscience 2(2):109–20. [LRS] Seminars in Speech and Language 24:93–105. [KJA] Tian, X. & Poeppel, D. (2010) Mental imagery of speech and movement implicates Skantze, G. & Hjalmarsson, A. (2010) Towards incremental speech generation in the dynamics of internal forward models. Frontiers in Psychology 1:166. dialogue systems. Proceedings of the 11th Annual Meeting of the Special Interest [MIM, aMJP, KS] Group on Discourse and Dialogue,1–8. [CH] Tomasello, M. (2008) Origins of human communication. Bradford Books/MIT Skipper, J. I., Goldin-Meadow, S., Nusbaum, H. C. & Small, S. L. (2007a) Speech- Press. [KJA] associated gestures, Broca’s area, and the human mirror system. Brain and Tomasello, M., Carpenter, M., Call, J., Behne, T. & Moll, H. (2005) Understanding Language 101(3):260–77. [ASD] and sharing intentions: The origins of cultural cognition. Behavioral and Brain Skipper, J. I., Goldin-Meadow, S., Nusbaum, H. C. & Small, S. L. (2009) Gestures Sciences 28:675–91. [HK] orchestrate brain networks for language understanding. Current Biology Toni, I., de Lange, F. P., Noordzij, M. L. & Hagoort, P. (2008) Language beyond 19:1–7. [ASD] action. Journal of Physiology – Paris 102(1–3):71–79. DOI: 10.1016/j.jphy- Skipper, J. I., Nusbaum, H. C. & Small, S. L. (2005) Listening to talking faces: Motor sparis.2008.03.005. [GD, MP] cortical activation during speech perception. NeuroImage 25(1):76–89. [ASD] Tourville, J. A. & Guenther, F. H. (2011) The DIVA model: A neural theory of Skipper, J. I., van Wassenhove, V., Nusbaum, H. C. & Small, S. L. (2007b) Hearing speech acquisition and production. Language and Cognitive Processes lips and seeing voices: How cortical areas supporting speech production 26:952–81. [F-XA, aMJP] mediate audiovisual speech perception. Cerebral Cortex 17:2387–99. [ASD] Tourville, J. A., Reily, K. J. & Guenther, F. K. (2008) Neural mechanisms underlying Slevc, L. R. (2011) Saying what’s on your mind: Working Memory effects on sen- auditory feedback control of speech. NeuroImage 39:1429–43. [JPdR, MIM, tence production. Journal of Experimental Psychology: Learning, Memory, and aMJP, KS] Cognition 37(6):1503–14. [LRS] Towle, V. L., Yoon, H. A., Castelle, M., Edgar, J. C., Biassou, N. M., Frim, D. M., Smith, A. & Zelaznik, H. N. (2004) Development of functional synergies for speech Spire, J. P. & Kohrman, M. H. (2008) ECoG gamma activity during a language motor coordination in childhood and adolescence. Developmental Psychobiol- task: Differentiating expressive and receptive speech areas. Brain ogy 45(1):22–33. DOI: 10.1002/dev.20009. [SK] 131:2013–27. [F-XA] Sommer, M. A. & Wurtz, R. H. (2008) Brain circuits for the internal monitoring of Tremblay, P., Sato, M. & Small, S. L. (2012) TMS-induced modulation of action movements. Annual Review of Neuroscience 31:317–38. [F-XA] sentence priming in the ventral premotor cortex. Neuropsychologia Sonderegger, M. & Yu, A. (2010) A rational account of perceptual compensation for 50(2):319–26. DOI:10.1016/j.neuropsychologia.2011.12.00. [ASD] coarticulation. Paper presented at the Proceedings of the 32nd Annual Meeting Tremblay P. & Small, S. L. (2011) On the context-dependent nature of the contri- of the Cognitive Science Society (CogSci10). [TFJ] bution of the ventral premotor cortex to speech perception, NeuroImage Sorace, A. (2011) Pinning down the concept of “interface” in bilingualism. Linguistic 57(4):1561–71. [ASD] Approaches to Bilingualism 1:1–33. [JF] Tremblay, S., Shiller, D. M. & Ostry, D. J. (2003) Somatosensory basis of speech Stack, D. M. (2007) The salience of touch and physical contact during infancy: production. Nature 423: 866–69. [rMJP] Unravelling some of the mysteries of the somesthetic sense. In: Blackwell Tretriluxana, J., Gordon, J. & Winstein, C. J. (2008) Manual asymmetries in grasp handbook of infant development, ed. G. Bremner & A. Fogel, ch. 13. pre-shaping and transport-grasp coordination. Experimental Brain Research Blackwell. [KJA] 188(2):305–15. DOI: 10.1007/s00221-008-1364-2. [MP] Stanley, J., Gowen, E. & Miall, R. C. (2007) Effects of agency on movement inter- Trevarthen, C. & Aitken, K. J. (2001) Infant intersubjectivity: Research, theory, and ference during observation of a moving dot stimulus. Journal of Experimental clinical applications. Annual Research Review. Journal of Child Psychology and Psychology: Human Perception and Performance 33:915–26. [aMJP] Psychiatry 42:3–48. [KJA]

BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 63 References/Pickering & Garrod: An integrated theory of language production and comprehension

Trude, A. M. & Brown-Schmidt, S. (2012) Talker-specific perceptual adaptation Watkins, K., Strafella, A. P. & Paus, T. (2003) Seeing and hearing speech excites the during on-line speech perception. Language and Cognitive Processes motor system involved in speech production. Neuropsychologia 41:989–94. 27: 979–1001. [rMJP, AMT] [ASD, aMJP] Trueswell, J. C. & Tanenhaus, M. K. (ed.) (2005) Processing world-situated Weber, A., Grice, M. & Crocker, M. W. (2006) The role of prosody in the language: Bridging the language-as-action and language-as-product traditions. interpretation of structural ambiguities: A study of anticipatory eye movements. MIT Press. [SOY] Cognition 99:B63–72. [aMJP] Trueswell, J. C., Tanenhaus, M. K. & Garnsey, S. M. (1994) Semantic influences on Wei, K. & Körding, K. (2009) Relevance of error: What drives motor adaptation? parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Neurophysiology 101(2):655–64. [TFJ] Journal of Memory and Language 33:285–318. [aMJP] Wheeldon, L. R. & Levelt, W. J. M. (1995) Monitoring the time course of phono- Tuomela, R. (2007) The philosophy of sociality. Oxford University Press. [YK] logical encoding. Journal of Memory and Language 34:311–34. [aMJP] Turk-Browne, N. B., Scholl, B. J., Johnson, M. K. & Chun, M. M. (2010) Implicit Wijnen, F. & Kolk, H. H. J. (2005) Phonological encoding, monitoring, and language perceptual anticipation triggered by statistical learning. Journal of Neuroscience pathology: Conclusions and prospects. In: Phonological encoding in normal and 30:11177–87. [MAJ] pathological speech, ed. R.J. Hartsuiker, R. Bastiaanse, A. Postma & F. Wijnen, Tversky, A. & Kahneman, D. (1973) Availability: A heuristic for judging frequency pp. 283–304. Psychology Press. [aMJP] and probability. Cognitive Psychology 5(2):677–95. [NM] Wilkes-Gibbs, D. & Clark, H. H. (1992) Coordinating beliefs in conversation. Umiltà, M. A., Kohler, E., Gallese, V., Fogassi, L., Fadiga, L., Keysers, C. & Journal of Memory and Language, 31:183–94. [SOY] Rizzolatti, G. (2001) I know what you are doing: A neurophysiological study. Willems, R. M., Özyürek, A. & Hagoort, P. (2007) When language meets action: Neuron, 32:91–101. [aMJP] The neural integration of gesture and speech. Cerebral Cortex 17(10):2322. Ungerleider, L. G. & Haxby, J. V. (1994) “What” and “where” in the human brain. [ASD] Current Opinion in Neurobiology 4(2):157–65. [ASD] Willems, R. M., Özyürek, A. & Hagoort, P. (2009) Differential roles for left inferior Van Berkum, J. J. A., Brown, M. C., Zwitserlood, P., Kooijman, V. & Hagoort, P. frontal and superior temporal cortex in multimodal integration of action and (2005) Anticipating upcoming words in discourse: Evidence from ERPs and language. NeuroImage 47:1992–2004. [ASD] reading times. Journal of Experimental Psychology: Learning, Memory, and Wilson, M. (2002) Six views of embodied cognition. Psychonomic Bulletin and Cognition 31:443–67. [NM, aMJP] Review 9:625–36. [GD] Van Berkum, J. J. A., van den Brink, D., Tesink, C. M. J. Y., Kos, M. & Hagoort, P. Wilson, M. & Knoblich, G. (2005) The case for motor involvement in perceiving (2008) The neural integration of speaker and message. Journal of Cognitive conspecifics. Psychological Bulletin 131:460–73. [aMJP] Neuroscience 20:580–91. [HK] Wilson, M. & Wilson, T. P. (2005) An oscillator model of the timing of turn-taking. Van den Bussche, E., Van den Noortgate, W. & Reynvoet, B. (2009) Mechanisms Psychonomic Bulletin & Review 12:957–68. [aMJP] of masked priming: A meta-analysis. Psychological Bulletin 135:452–77. Wilson, S. M. & Iacoboni, M. (2006) Neural responses to non-native phonemes [aMJP] varying in producibility: Evidence for the sensorimotor nature of speech per- Van Lancker, D. & Breitenstein, C. (2000) Emotional dysprosody and similar ception. Neuroimage 33(1):316–25. [ASD, GH] dysfunctions. In: Behavior and mood disorders in focal brain lesions, ed. Wilson, S. M., Saygin, A. P., Sereno, M. I. & Iacoboni, M. (2004) Listening to speech J. Bogousslavsky & J. L. Cummings, ch. 12. Cambridge University Press. activates motor areas involved in speech production. Nature Neuroscience [KJA] 7(7):701–702. [ASD, aMJP] Van Overwalle, F. & Baetens, K. (2009) Understanding others’ actions and goals by Wohlschläger, A. (2000) Visual motion priming by invisible actions. Vision Research mirror and mentalizing systems: A meta-analysis. NeuroImage 48(3):564–84. 40:925–30. [aMJP] DOI: 10.1016/j.neuroimage.2009.06.009. [MP] Wolpert, D. M. (1997) Computational approaches to motor control. Trends in Van Schie, H. T., van Waterschoot, B. M. & Bekkering, H. (2008) Understanding Cognitive Sciences 1:209–16. [MAJ, MIM, aMJP] action beyond imitation: Reversed compatibility effects of action observation in Wolpert, D. M., Diedrichsen, J. & Flanagan, J. R. (2011) Principles of sensorimotor imitation and joint action. Journal of Experimental Psychology: Human Per- learning. Nature Reviews Neuroscience 12:739–51. [FC] ception and Performance 34:1493–500. [aMJP] Wolpert, D. M., Doya, K. & Kawato, M. (2003) A unifying computational framework van Wassenhove, V., Grant, K. W. & Poeppel, D. (2005) Visual speech speeds up the for motor control and social interaction. Philosophical Transactions of the Royal neural processing of auditory speech. Procedings of the National Academy of Society B: Biological Sciences 358(1431):593–602. DOI:10.1098/ Sciences 102(4):1181–86. [ASD, GH] rstb.2002.1238. [HK, GP, aMJP] Van Wijk, C. & Kempen, G. (1987) A dual system for producing self-repairs in Wolpert, D. M., Ghahramani, Z. & Flanagan, J. R. (2001) Perspectives and problems in spontaneous speech: Evidence from experimentally elicited corrections. motor learning. Trends in Cognitive Sciences 5:487–94. [MAJ, arMJP] Cognitive Psychology 19:403–40. [aMJP] Wolpert, D. M., Ghahramani, Z. & Jordan, M. I. (1995) An internal model for Venezia, J. H., Saberi, K., Chubb, C. & Hickok, G. (2012) Response bias modulates sensorimotor integration. Science 269(5232):1880–82. [GH] the speech motor system during syllable discrimination. Frontiers in Wrangham, R. (2009) Catching fire: How cooking made us human. Basic Books. [KJA] Psychology 3. article 157, doi: 10.3389/fpsyg.2012.00157 [GH] Wright, B. & Garrett, M. F. (1984) Lexical decision in sentences: Effects of syntactic Vesper, C., van der Wel, R. P. R. D., Knoblich, G. & Sebanz, N. (2011) Making structure. Memory & Cognition 12:31–45. [aMJP] oneself predictable: Reduced temporal variability facilitates joint action Xu, J., Gannon, P. J., Emmorey, K., Smith, J. F. & Braun, A. R. (2009) coordination. Experimental Brain Research 211(3–4):517–30. DOI:10.1007/ Symbolic gestures and spoken language are processed by a common neural s00221-011-2706-z. [GP] system. Proceedings of the National Academy of Sciences 106(49):20664–69. Vigliocco, G., Antonini, T. & Garrett, M. F. (1997) Grammatical gender is on the tip [ASD] of Italian tongues. Psychological Science 8:314–17. [aMJP] Xu, S., Zhang, S. & Geng, H. (2011) Gaze-induced joint attention persists under Vigliocco, G. & Hartsuiker, R. J. (2002) The interplay of meaning, sound, and syntax high perceptual load and does not depend on awareness. Vision Research in sentence production. Psychological Bulletin 128:442–72. [aMJP] 51:2048–56. [HK] Vigneau, M., Beaucousin, V., Hervé, P. Y., Duffau, H., Crivello, F., Houdé, O., Yoon, S. O., Koh, S. & Brown-Schmidt, S. (2012) Influence of perspective and goals Mazoyer, B. & Tzourio-Mazoyer, N. (2006) Meta-analyzing left hemisphere on reference production in conversation. Psychonomic Bulletin & Review language areas: Phonology, semantics, and sentence processing. NeuroImage 30 19:699–707. [SOY] (4):1414–32. DOI: 10.1016/j.neuroimage.2005.11.002. [KJA, ASD, aMJP] Yoshida, M., Dickey, M. W. & Sturt, P. (2013) Predictive processing of syntactic Vissers, C. T., Chwilla, D. J. & Kolk, H. H. (2006) Monitoring in language percep- structure: Sluicing and ellipsis in real-time sentence processing. Language and tion: The effect of misspellings of words in highly constrained sentences. Brain Cognitive Processes 28:272–302. [aMJP] Research 1106:150–63. [aMJP] Yuen, I., Davis, M. H., Brysbaert, M. & Rastle, K. (2010) Activation of articulatory Vygotsky, L. S. (1962) Thought and language (E. Hanfmann & G. Vakar, trans.). MIT information in speech perception. Proceedings of the National Academy of Press. [GMO] Sciences 107:592–97. [aMJP] Warker, J. A. & Dell, G. S. (2006) Speech errors reflect newly learned phonotactic Zekveld, A. A., Heslenfeld, D. J., Festen, J. M. & Schoonhoven, R. (2006) Top-down constraints. Journal of Experimental Psychology: Learning, Memory, and Cog- and bottom-up processes in speech comprehension. NeuroImage nition 32(2):387. [TFJ] 32:1826–36. [RL] Watkins, K. & Paus, T. (2004) Modulation of motor excitability during speech percep- Zlatev, J. (2008) From proto-mimesis to language: Evidence from primatology and tion: The role of Broca’sarea.Journal of Cognitive Neuroscience 16:978–87. social neuroscience. Journal of Physiology-Paris 102(1–3):137–51. DOI: [aMJP] 10.1016/j.jphysparis.2008.03.016. [MP]