
Language, Cognition and Neuroscience

ISSN: 2327-3798 (Print) 2327-3801 (Online) Journal homepage: https://www.tandfonline.com/loi/plcp21

Phonetic encoding in utterance production: a review of open issues from 1989 to 2018

Marina Laganaro

To cite this article: Marina Laganaro (2019) Phonetic encoding in utterance production: a review of open issues from 1989 to 2018, Language, Cognition and Neuroscience, 34:9, 1193-1201, DOI: 10.1080/23273798.2019.1599128

To link to this article: https://doi.org/10.1080/23273798.2019.1599128

Published online: 01 Apr 2019.


Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=plcp21

LANGUAGE, COGNITION AND NEUROSCIENCE 2019, VOL. 34, NO. 9, 1193–1201 https://doi.org/10.1080/23273798.2019.1599128

REVIEW ARTICLE

Phonetic encoding in utterance production: a review of open issues from 1989 to 2018

Marina Laganaro

Faculty of Psychology and Educational Science, University of Geneva

ABSTRACT

Phonetic encoding refers to the mapping of an abstract linguistic code of the utterance into motor programmes which guide speech articulation. The encoding of speech gestures involves complex cognitive-motor planning, which has received limited attention in the psycholinguistic literature on language production relative to linguistic encoding processes. Here we will review some issues on phonetic encoding and the related empirical results by integrating evidence from psycholinguistic, phonetic, neuropsychological and neuroimaging studies from the last 30 years. In particular, we will focus on (i) the distinction between phonological and phonetic encoding, (ii) the nature and size of phonetic representations and (iii) the dynamics of phonetic encoding in the time-course of utterance planning. We will end up showing that the transformation of a linguistic code into motor programmes likely involves a larger proportion of the overall utterance planning time than acknowledged in current models and with many open questions for future research.

ARTICLE HISTORY: Received 15 October 2018; Accepted 14 March 2019

KEYWORDS: Phonetic encoding; phonological; syllable; motor speech; time-course

Introduction

In order to produce an utterance, a speaker has to select the appropriate words, morphemes and syntactic structure and to plan and execute the combination of articulatory gestures giving rise to an auditory signal, which will be perceived and interpreted by the listener. Planning at the lexical and morpho-syntactic levels corresponds to the linguistic preparation of the message, while programming the articulatory gestures is a motor task. The transformation of an abstract linguistic code into motor programmes hence refers to the cognitive-motor interface processor in utterance planning, which is called phonetic encoding¹ or encoding of “phonetic plans” (Levelt, 1989) in the psycholinguistic literature, but has been referred to in multiple other ways in the motor speech literature (e.g. “speech motor planning”, Van der Merwe, 2008; “higher order speech motor control”, Brendel et al., 2010; “planning frame with speech sound maps”, Guenther, 2002; Guenther, Hampson, & Johnson, 2006). Hereafter we will use “phonetic encoding” to stick to the psycholinguistic literature, or more broadly “motor speech planning”.

The psycholinguistic literature and models of language production describe multiple processes underpinning utterance planning at different linguistic levels, while phonetic encoding usually refers to a single processing stage, giving an overall impression that motor speech planning involves fewer computations than the planning of an abstract linguistic code. In the 1999 model (Levelt, Roelofs, & Meyer, 1999, hereafter “LRM1999”), two distinct systems are clearly identified: a linguistic encoding system, involving several core processing stages (conceptual preparation, lexical selection, morphological and phonological encoding) and a second system involving a processor named phonetic encoding, which “encodes the selected word in its context as a motor program” (LRM1999). The cognitive-motor processor transforming an abstract linguistic code into articulatory movements represents however a complex task involving the planning of (invariant) speech gestures, but also their adaptation to the actual speech context and the control of the motor commands.

In the following we will review some key issues on motor speech planning as they have been addressed in the psycholinguistic literature since Levelt’s 1989 book. We will first address the arguments for or against the distinction made in some models between the encoding of an abstract (phonological) utterance form and the processing of phonetic plans. We will then turn to the issue of the representation and size of phonetic plans and finally to the dynamics of phonetic planning in relationship to the whole utterance planning process.

The phonological-phonetic distinction

In Levelt’s (1989) model of language production as well as in further models (LRM1999), two distinct processes

CONTACT Marina Laganaro [email protected]

© 2019 Informa UK Limited, trading as Taylor & Francis Group

Figure 1. Illustration of a possible phonetic origin (A) and of the phonological origin (B) of phonetic errors (here “seal” /si:l/ produced [s̬i:l]).

underlie the preparation of the utterance form: planning an abstract linguistic (phonological) form and encoding a motor (phonetic) plan. Such a phonological-phonetic distinction is not acknowledged in all models. Hickok (2014) for instance proposes that language production does not involve the encoding of abstract (phonological) utterance forms: in his proposal, the lexical-semantic system (the lemmas) is directly connected to motor programmes. Other authors have also claimed that words are stored in memory along with their phonetic representation rather than with abstract phonological codes (Browman & Goldstein, 1989; Pierrehumbert, 2002).

The main argument in favour of abstract linguistic units involved in the planning of the utterance form stems from the observation of speech errors, as they usually involve infra-lexical substitutions and exchanges (phonological slips of the tongue, Fromkin, 1973). Phoneme errors and in particular metatheses, i.e. errors in which two phonemes exchange their position within a word (“perfumes” produced “ferpumes”, Wilshire, 2002) or between words (mait a winute), have been interpreted as the mis-ordering of correctly encoded segments (Dell, 1986; Shattuck-Hufnagel, 1979, 1992) and therefore argue in favour of a decomposed representation of words relying on segmental units. Some further observations related to speech errors suggest that the infra-lexical units involved are bound to be abstract. First, it has long been argued that whole (abstract, phonological) segments are involved in phonological slips of the tongue produced by (healthy, unimpaired) adult speakers, while sub-phonemic errors (ill-formed phonemes) or phonotactic violations are claimed to be extremely rare, at least in listeners’ perception (Meyer, 1992; see Alderete & Tupper, 2018 for slightly larger proportions of phonotactic violations). Second, the phonetic specifications of the segments involved in metatheses adapt to their new phonological context, as for instance the aspiration of initial plosives when exchanged with non-initial plosives (e.g. the metathesis of /t/ and /p/ in tail spin – [tʰejl][spIn] – is pronounced [pʰejl][stIn], not [pejl][stʰIn], Kenstowicz & Kisseberth, 1979). Both observations, the involvement of (well-formed) phonemes in speech errors and their contextual phonetic adaptation, favour the idea that segmental exchanges occur between abstract, phonetically underspecified, segmental units.

Experimental investigation of word form planning also brought evidence in favour of abstract phonological units. For instance, Roelofs (1999) demonstrated using a form-preparation paradigm that phoneme overlap between words speeds up the production of the target word, but that this is not the case for feature overlap, hence showing that phonemes seem to represent planning units.

Finally, a further argument favouring the phonological-phonetic distinction can be found in research with left-hemisphere damaged speakers, where dissociations have been reported between patients producing mainly phonological errors, i.e. phoneme substitutions and exchanges (in particular in conduction and Wernicke aphasia, Blumstein, 1990; Kohn & Smith, 1990; Laganaro & Zimmermann, 2010) and patients with apraxia of speech producing mainly phonetic errors (ill-formed, distorted phonemes, Blumstein, Cooper, Goodglass, Statlender, & Gottlieb, 1980; Darley, Aronson, & Brown, 1975). In particular, a double dissociation between phonological and phonetic errors has been reported by Buchwald and Miozzo (2011), based on the acoustic analyses of speech errors produced by two left-hemisphere damaged English-speaking patients. The authors analyzed the stop consonant following a word-initial /s/ deletion (e.g. [p] in [pot] from /(s)pot/) in comparison to a word onset stop ([p] from /pot/). In one patient the stop consonants following a deleted onset /s/ had the same acoustic properties as word onset aspirated stops, whereas in the second patient stop-onsets due to deleted /s/ onsets were produced without the aspiration. Hence, the acoustic analyses highlighted two different patterns for the same perceived kind of errors, which have been interpreted as evidence in favour of a phonological origin of /s/ omission in the first patient and a phonetic origin in the second patient.

Nevertheless, when acoustic or articulatory analyses were applied to speech errors in slips of the tongue that have usually been considered as segmental errors, a different picture also emerges, which questions one of the previous arguments in favour of the phonological-phonetic distinction. Frisch and Wright (2000) analyzed acoustically the errors produced on /s/ and /z/ in a tongue twister paradigm (e.g. “sit zap zoo sip”) and showed that (categorical) phoneme substitutions coexist in the same speakers along with graded (phonetic) errors, in which acoustic traces of the intended phonological target are visible. The co-existence of categorical (phonemic) and gradient (phonetic) errors has been reported in several studies analyzing errors elicited with tongue twister-like or SLIP (spoonerism of laboratory induced predisposition) tasks with acoustical or articulatory measures (Goldrick & Blumstein, 2006; Goldrick, Keshet, Gustafson, Heller, & Needle, 2016; Goldstein, Pouplier, Chen, Saltzman, & Byrd, 2007; McMillan & Corley, 2010; Pouplier & Hardcastle, 2005). A similar observation has also been reported by Kurowski and Blumstein (2016) in the analysis of /s/-/z/ substitution errors produced by brain-damaged speakers, where acoustic traces of the original target (either /s/ or /z/) were found in all speakers with aphasia.

The observation of such a continuum in speech errors may be taken in favour of models that exclude the retrieval/encoding of an abstract phonological code. It can however also be interpreted in the framework of models holding that abstract phonological utterance forms are encoded before speech gestures are planned by assuming cascading processes from phonological to phonetic representations (see Baese-Berk & Goldrick, 2009; Goldrick & Blumstein, 2006; Goldrick & Chu, 2014). In such an interpretation, the (abstract) target segment (e.g. /s/) and a competing erroneous segment (e.g. /z/) are both activated in some circumstances and spread activation to their corresponding phonetic plans, resulting in intermediate speech gestures (e.g. partially voiced /s/: [s̬]). Finally, the observation that graded errors are found along with more categorical errors, and that their distribution may vary according to the linguistic material, has been taken by Goldrick et al. (2016) to suggest that gradient errors may be due either to cascading from phonological to phonetic representation or more specifically to speech programming.

There are hence a series of arguments favouring the phonological-phonetic distinction, but the evidence reviewed above that graded errors co-exist with segmental errors, both in healthy and in brain-damaged speakers, raises the issue of the way phonological and phonetic encoding may interact. There are in particular two possible origins of phonetic errors (as for “seal” – [si:l] – produced [s̬i:l]). The first source of error may be phonetic itself (Figure 1A): a (correctly activated) phonological form (e.g. /si:l/) spreads activation to the related phonetic plans ([si:l]; [zi:l], [si.m], as suggested in LRM1999; see also Roelofs, 1997), which may lead to a phonetic error if two motor plans happen to be activated and selected simultaneously. On the other hand, the origin of a phonetic error may be phonological: due to cascading activation from two activated lexical-phonological forms (e.g. “seal” and “zeal”) to the corresponding phonetic plans, as illustrated in Figure 1B.

The syllable as a core phonetic plan

In the framework of models arguing that phonetic plans are processed based on abstract phonological codes, there are at least two options for the phonological-to-phonetic coding. For instance, phonetic plans may be computed from the sequence of (phonetically underspecified) sub-lexical units or stored phonetic plans may be retrieved from memory. A related question concerns the size of the phonetic plans, i.e. whether motor programmes are built/retrieved for each segment or if they are based on larger units such as sub-syllabic components (onsets, rhymes, Ziegler, 2009), syllables, words, or even larger units.

Given the frequency of the use of articulated speech (a speaker produces about 16,000 words per day, Mehl, Vazire, Ramirez-Esparza, Slatcher, & Pennebaker, 2007), it is very likely that speech motor plans are not built on-line each time the speaker plans an utterance, but that stored gestural scores are retrieved from memory. Also, based on the observation that the syllable is the main domain of co-articulation (Browman & Goldstein, 1988), i.e. the main size-pattern of articulatory organisation (Krakow, 1999 for a review), it has been suggested that speech gestural scores are likely syllable-sized (Crompton, 1981). Stored syllable-sized phonetic plans have thus been integrated already in Levelt’s (1989) model, where the “syllabary” refers to chunk representations for each syllable of the language, specifying the phonetic plans. Syllabification of the planned utterance form is applied in Levelt’s model to the abstract phonological form, whose output activates the corresponding phonetic plans. The main argument for a syllabification rule applied on the phonological code comes from re-syllabification at word boundaries in utterance production (see for example the re-syllabification between the French words “cher” and “ami” in the utterance “cher ami” – dear friend –, where syllable boundaries in “cher ami” /ʃɛ.Ra.mi/ do not correspond to the syllables in the isolated words /ʃɛR/ and /a.mi/).

Empirical evidence in favour of stored syllable-sized gestural scores was sought some years after the 1989 book, based on the syllable frequency effect. The underlying idea is that if syllable-sized phonetic plans are stored in memory, accessing such representations should be dependent on their frequency of use/practice. A facilitatory effect of high frequency syllables over low frequency syllables was first reported with word-symbol (Levelt & Wheeldon, 1994) or word-position association tasks (Cholin, Dell, & Levelt, 2011; Cholin, Levelt, & Schiller, 2006), in which the participants had to produce pseudo-words or words composed of high frequency or low frequency syllables. The results supported the syllable frequency effect, with items composed of high frequency syllables being initiated faster than items composed of low frequency syllables, and therefore gave support to the idea of stored syllables. They did however not provide direct evidence in favour of a phonetic locus of stored syllables, as the results may as well be compatible with phonological syllables. In particular, the fact that lexical frequency effects have also been reported using the exact same paradigms raises the question of how to experimentally target phonetic encoding, i.e. how to separate effects due to motor speech planning from those due to linguistic encoding processes. More direct empirical evidence in favour of a phonetic locus of the syllable frequency effect has been reported by Laganaro and Alario (2006) using a delayed production paradigm. In this study, syllable frequency affected the production latencies in an immediate production task and in the delayed production when the delay was filled by an articulatory suppression task, but not in a standard delayed production task (but see Croot et al., 2017 for different results in English). As the articulatory suppression task is thought to interfere with phonetic encoding processing while leaving phonological encoding relatively intact (see detailed rationale in Laganaro & Alario, 2006), these results point to a phonetic locus of the effect. Converging evidence in favour of a syllable frequency effect at the level of phonetic planning comes from studies on errors produced by brain-damaged speakers with apraxia of speech. It has been shown that patients suffering from apraxia of speech produce higher rates of speech errors on words or pseudo-words composed of low (vs. high) frequency syllables (Aichert & Ziegler, 2004; Laganaro, 2005; Laganaro, Croisier, Bagou, & Assal, 2012; Staiger & Ziegler, 2008). Given that the underlying impairment in apraxia of speech is ascribed to phonetic encoding (Code, 1998; Darley et al., 1975; Ziegler, 2008), the observation of a syllable frequency effect in such a population points to phonetic plans for syllables.

Further evidence in favour of syllable-sized gestural scores comes from a study with bilingual speakers (Alario, Goslin, Michel, & Laganaro, 2010) in which syllable frequency was manipulated simultaneously in two languages (French and Spanish) using pseudo-words composed of phonological syllables which were common to both languages (e.g. /ku/, /pal/) but with different frequency of use across languages. The participants were French-Spanish early and late bilinguals who were tested in each language. The frequency of syllables in the non-spoken language affected performance in the spoken language in late bilinguals but not in early bilinguals. Besides confirming the syllable frequency effect, this pattern of results also supports the intuitive idea (based on the observation of the foreign accent when speaking their second language) that late bilinguals use first language (syllable-sized) gestural scores when speaking both languages, whereas early bilinguals have separate phonetic plans for the two languages.

Different phonetic encoding processes for frequent and less frequent speech units?

Although the idea of stored syllable-sized gestural scores is now largely acknowledged and is integrated in other psycholinguistic and neurocomputational models (Bohland, Bullock, & Guenther, 2010; Hickok, 2012), whether phonetic plans are stored for all syllables of a language is still an open question. The response to this question is bound to depend on the amount of practice. In other motor domains it has been shown that several thousand hours of practice are necessary to reach “expert” movements (Ericsson, Krampe, & Tesch-Römer, 1993). Applying a similar rationale to speech, it has been suggested that only the most used syllables in a language are stored in memory and recalled when required, while the less frequent syllables have to be computed on-line (Whiteside & Varley, 1998). However, speech is one of the most practiced motor behaviours and even less frequent syllables are highly practiced: at a rate of 16,000 words per day (Mehl et al., 2007), speakers articulate over 8 million syllables per year, meaning that the roughly 500 most frequent syllables (representing for instance 85% of the syllables in Dutch, Schiller, Meyer, & Baayen, 1996) are articulated several thousand times, but also the less frequent syllables reach several hundred productions per year. To the extent that lower frequency syllables are also produced thousands of times over several years, and that practice leads to storage of motor programmes, it is likely that phonetic plans are stored for syllables of lower frequency as well. The syllable frequency effect may thus reflect three different phenomena. First, it may reflect the fact that low frequency syllables may be more difficult/slower to retrieve than more frequent ones. Second, it may be due to the retrieval of high frequency syllables and on-line computation of low frequency syllables. Finally, the frequency effect may as well be the result of easier computation/assembling of speech gestures for frequent syllables. Only in the second case (retrieval versus computation) do the underlying processes differ for frequent and infrequent syllables. This issue has been addressed in a study by Bürki, Pellet, and Laganaro (2015) in which electrical event-related potentials (ERPs) were analyzed preceding the acoustic onset of the production of disyllabic pseudo-words composed of a high frequency, low frequency or non-existent (novel) initial syllable. Behavioural results revealed slower production latencies for novel than for high frequency syllables in the immediate production but not in a standard delayed (not filled by articulatory suppression) production task. Different ERP patterns were observed between high frequency and low frequency/novel syllables around 170 ms before the onset of articulation in the immediate production. These different patterns involved different distributions of the electric fields at the scalp (different topographies) for high versus low frequency and novel syllables, indicating the recruitment of different brain networks. The observation of different brain processes is compatible with the hypothesis that speakers retrieve stored syllable-sized motor programmes for frequent syllables and assemble motor plans on-line for very low frequency and novel syllables. Whether only speech plans corresponding to articulatory gestures that have not been practiced by the speaker (novel and very low frequency syllables) are assembled or if also less practiced (low frequency) syllables are computed on-line remains an open question that requires further investigation and that may vary depending on the number of syllables in a given language and on the inter-individual variability in the amount of speech practice.

Time-course of phonetic encoding

As stated in the introduction, in psycholinguistic models phonetic encoding represents the very last process leading to articulation of an utterance and is preceded by several linguistic processing stages. This is also reflected in the estimated time-course of word planning by Indefrey and Levelt (2004, see also Indefrey, 2011, but see Miozzo, Pulvermüller, & Hauk, 2015; Munding, Dubarry, & Alario, 2016; Strijkers, Costa, & Pulvermüller, 2017 for different accounts claiming early parallel activation of lexical and phonological information and criticising the serial spatio-temporal decomposition), where the largest proportion of the time taken to articulate a word in referential production paradigms is attributed to linguistic encoding (see Figure 2A), with phonetic encoding being engaged in the last 150 ms preceding articulation. Nevertheless, whereas direct empirical evidence supports the time-estimates for the main linguistic encoding processes (lexical selection, phonological code retrieval), the timing of phonetic encoding devised by Indefrey (2011) was based essentially on an MEG study showing neural activations in Broca’s area around 400–500 ms after stimulus presentation in picture naming tasks (Salmelin, Hari, Lounasmaa, & Sams, 1994).

Given the rationale developed above about speech errors, the production of phonemic-phonetic errors can represent a good index for the time-course of phonological-phonetic encoding. A study by Möller, Jansma, Rodriguez-Fornells, and Münte (2007) analysed ERPs in speech errors elicited with a SLIP (Spoonerisms of Laboratory Induced Predisposition) task. They reported diverging ERPs from 350 ms after the onset of the written word pairs eliciting speech errors relative to error-free trials and interpreted these results as an indication of the time-window of conflict during phonetic encoding. In a referential production task eliciting errors with pictures associated with tongue-twister utterances, Monaco, Pellet, and Laganaro (2017) also reported ERP divergences between the tongue-twister and the control utterances in about the same time-window (after 370 ms following picture onset). Also, ERP divergences between left-hemisphere damaged patients producing phonemic and phonetic errors and controls in a picture naming task (Laganaro, Python, & Toepel, 2013) have been reported in a similar time-window (after 380–400 ms following picture presentation). Hence, the results from these three studies converged on a time window associated with phonological-phonetic encoding after 350–370 ms following stimuli presentation. At first sight, these results also seem to fit with the time-window associated to syllabification and phonetic encoding in the estimates by Indefrey and Levelt (2004), i.e. after 350 ms and covering the last 150 ms preceding articulation (see Figure 2A). However, in the two studies mentioned above and using referential tasks, the production latencies were much longer (740 ms in Monaco et al., 2017; 820 ms in Laganaro et al., 2013), meaning that far more than 150 ms seem to separate the moment phonetic encoding is engaged from the onset of articulation. A similar conclusion on early involvement of motor speech planning was reached in a magnetoencephalography (MEG) study comparing overt and silent production in picture naming (Liljeström,

Figure 2. (A) Schematic representation of encoding processes involved from the concept to the articulation of an utterance adapted from Levelt et al. (1999) with the estimated time course from Indefrey (2011). (B) Adaptations to the model in red: longer processing time for motor speech planning (see text for details and rationale).

Kujala, Stevenson, & Salmelin, 2015) (before 300 ms). It seems therefore that only in studies using written stimuli to elicit production (Möller et al., 2007, see also the results by Bürki et al., 2015 mentioned in the previous section), the time window associated to phonetic encoding encompasses the last 170–150 ms preceding the vocal onset, whereas in studies eliciting production with referential tasks (picture naming), the time-period associated with phonetic encoding seems to be stretched (see also Valente, Pinet, Alario, & Laganaro, 2016 for longer word form encoding in picture naming relative to reading). Although self-monitoring processes are likely engaged in parallel with motor speech planning and might be more demanding especially in tongue-twister paradigms, the converging evidence from different studies rather suggests that motor speech planning often lasts longer than 150 ms (see Figure 2B).

Several non-exclusive reasons possibly underlie longer time-periods for phonetic encoding as reported in several studies. A first interpretation of extended phonetic planning may be related to differences in the span of phonetic plans, i.e. whether articulation starts once the motor programme for the first syllable is ready (Cholin et al., 2006) or if two (Cholin et al., 2011) or more syllables are encoded before articulation is initiated, thus lengthening the process. A second reason is related to the fact that motor speech planning does not involve only retrieving or assembling consecutive syllable-sized phonetic plans, and the planning unit may be larger than the syllable (Ziegler, 2009). Finally, it is assumed that phonetic plans encode the abstract, invariant sequence of gestures for the different articulators involved. However, at this point there are still many possible degrees of freedom for the exact trajectories of each articulator according to the context due to the many-to-one nature of the relationship between articulation and the acoustic result. Further encoding processes are therefore necessary to translate the gestures into precise neuromuscular commands as a function of the articulation context (position of articulators and preceding and following gestures), of speech rate, and loudness.
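The latency argument made earlier in this section can be checked with simple arithmetic. The sketch below is illustrative only: it uses the ERP divergence onsets and production latencies quoted in the text, treats the divergence onset as the moment phonological-phonetic encoding is engaged (an interpretive assumption), and takes the lower bound of the 380–400 ms range for Laganaro et al. (2013). The function and variable names are hypothetical.

```python
# Timings quoted in the text, in ms after picture onset.
# Assumption: the ERP divergence onset marks the point at which
# phonological-phonetic encoding is engaged.
ASSUMED_WINDOW_MS = 150  # phonetic encoding window in the Indefrey (2011) estimates

STUDIES = {
    # study: (ERP divergence onset, production latency)
    "Monaco et al., 2017": (370, 740),
    "Laganaro et al., 2013": (380, 820),  # lower bound of the 380-400 ms range
}


def time_before_articulation(divergence_onset_ms, latency_ms):
    """Interval between the ERP divergence onset and the onset of articulation."""
    return latency_ms - divergence_onset_ms


def summarise(studies):
    """Return {study: (interval in ms, interval as a share of the latency)}."""
    summary = {}
    for study, (onset, latency) in studies.items():
        interval = time_before_articulation(onset, latency)
        summary[study] = (interval, interval / latency)
    return summary


if __name__ == "__main__":
    for study, (interval, share) in summarise(STUDIES).items():
        print(f"{study}: {interval} ms ({share:.0%} of the latency), "
              f"against an assumed {ASSUMED_WINDOW_MS} ms")
```

Under these assumptions roughly half of the naming latency follows the divergence onset in both studies. The interval should be read as an upper bound on phonetic encoding proper, since self-monitoring is likely engaged in parallel.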

Although the contextualisation and adaptation of speech gestures are usually not detailed in psycholinguistic models, more detailed accounts from the motor speech literature posit several processes underlying motor speech planning (see for instance Guenther, Hampson, & Johnson, 1998; Guenther, Ghosh, & Tourville, 2006; Kröger, Kannampuzha, & Neuschaefer-Rube, 2009; Van der Merwe, 2008 for models proposing at least two processing stages between the linguistic phonological code and motor execution) and the corresponding time-course is likely more extended than described in current (psycholinguistic) models.

Conclusion

A series of mental processes underpinning planning from concept to articulation have been described and investigated in the framework of psycholinguistic models of language production, with phonetic encoding referring to the very last processing stage interfacing the abstract linguistic (phonological) code and motor programmes. Whereas the distinction between the phonological and the phonetic nature of utterance form representation is still debated despite 30 years of investigation in the field, cascading activation between these encoding levels seems to be at play in the generation of phonetic errors. Independently of the phonological vs. phonetic debate, models adhere to stored syllable-sized gestural scores guiding articulation, at least for the most frequent syllables in a language. It is still unclear however whether “phonetic plans” for less frequent syllables can only be encoded via the on-line computation of motor plans from sub-syllabic units and if the syllable is the largest size unit for stored gestural scores or if larger units are also represented (for instance for very frequently pronounced utterances). In other words, what still needs to be investigated is if any highly practiced utterance leads to a mental trace of the corresponding speech plans. Finally, while reviewing the estimated time-course of phonetic encoding relative to other utterance encoding processes, we ended up suggesting that processing phonetic plans and adapting them to the speech contexts may require a larger proportion of time and of processes than postulated in current models.

Note

1. Notice that in Levelt (1989) the generation of phonetic plans is embedded in the “phonological encoding” processor, but in later models (Levelt, Roelofs and Meyer, 1999), an explicit distinction is made between phonological encoding and phonetic encoding.

Acknowledgments

I acknowledge support of the Swiss National Science Foundation grant no CRSII5_173711.

Disclosure statement

No potential conflict of interest was reported by the author.

Funding

This work was supported by Swiss National Science Foundation [SNSF grant number CRSII5_173711].

References

Aichert, I., & Ziegler, W. (2004). Syllable frequency and syllable structure in apraxia of speech. Brain and Language, 88, 148–159.

Alario, F. X., Goslin, J., Michel, V., & Laganaro, M. (2010). The functional origin of foreign accent: Evidence from the syllable frequency effect in bilingual speakers. Psychological Science, 21, 15–20.

Alderete, J., & Tupper, P. (2018). Phonological regularity, perceptual biases, and the role of phonotactics in speech error analysis. Wiley Interdiscip Rev Cogn Sci, 9(e1466). doi: 10.1002/wcs.1466

Baese-Berk, M., & Goldrick, M. (2009). Mechanisms of interaction in speech production. Language and Cognitive Processes, 24, 527–554.

Blumstein, S. (1990). Phonological deficits in aphasia: Theoretical perspectives. In A. Caramazza (Ed.), Cognitive neuropsychology and neurolinguistics (pp. 33–53). Hillsdale: Lawrence Erlbaum.

Blumstein, S. E., Cooper, W. E., Goodglass, H., Statlender, S., & Gottlieb, J. (1980). Production deficits in aphasia: A voice-onset time analysis. Brain and Language, 9, 153–170.

Bohland, J. W., Bullock, D., & Guenther, F. H. (2010). Neural representations and mechanisms for the performance of simple speech sequences. Journal of Cognitive Neuroscience, 22(7), 1504–1529.

Brendel, B., Hertrich, I., Erb, M., Lindner, A., Riecker, A., Grodd, W., & Ackermann, H. (2010). The contribution of mesiofrontal cortex to the preparation and execution of repetitive syllable productions: An fMRI study. Neuroimage, 50, 1219–1230.

Browman, C. P., & Goldstein, L. (1988). Some notes on syllable structure in articulatory phonology. Phonetica, 45, 140–155.

Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6, 201–251.

Buchwald, A., & Miozzo, M. (2011). Finding levels of abstraction in speech production: Evidence from sound-production impairment. Psychological Science, 22(9), 1113–1119.

Bürki, A., Pellet, P., & Laganaro, M. (2015). Do speakers have access to a mental syllabary? ERP comparison of high frequency and novel syllable production. Brain and Language, 150, 90–102.

Cholin, J., Dell, G. S., & Levelt, W. J. M. (2011). Planning and articulation in incremental word production: Syllable-frequency effects in English. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 109–122.

Cholin, J., Levelt, W. J. M., & Schiller, N. O. (2006). Effects of syllable frequency in speech production. Cognition, 99, 205–235.
Code, C. (1998). Major review: Models, theories and heuristics in apraxia of speech. Clinical Linguistics and Phonetics, 12, 47–65.
Crompton, A. (1981). Syllables and segments in speech production. Linguistics, 19, 663–716.
Croot, K., Lalas, G., Biedermann, B., Rastle, K., Jones, K., & Cholin, J. (2017). Syllable frequency effects in immediate but not delayed syllable naming in English. Language, Cognition and Neuroscience, 32(9), 1119–1132.
Darley, F., Aronson, A., & Brown, J. (1975). Motor speech disorders. Philadelphia: W.B. Saunders.
Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93, 283–321.
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100, 363–406.
Frisch, S. A., & Wright, R. (2000). The phonetics of phonological speech errors: An acoustic analysis of slips of the tongue. Journal of Phonetics, 30, 139–162.
Fromkin, V. (1973). Speech errors as linguistic evidence. The Hague: de Gruyter Mouton.
Goldrick, M., & Blumstein, S. E. (2006). Cascading activation from phonological planning to articulatory processes: Evidence from tongue twisters. Language and Cognitive Processes, 21, 649–683.
Goldrick, M., & Chu, K. (2014). Gradient co-activation and speech error articulation: Comment on Pouplier and Goldstein 2010. Language, Cognition and Neuroscience, 29, 452–458.
Goldrick, M., Keshet, J., Gustafson, E., Heller, J., & Needle, J. (2016). Automatic analysis of slips of the tongue: Insights into the cognitive architecture of speech production. Cognition, 149, 31–39.
Goldstein, L., Pouplier, M., Chen, L., Saltzman, E., & Byrd, D. (2007). Dynamic action units slip in speech production errors. Cognition, 103, 386–412.
Guenther, F. H. (2002). Neural control of speech movements. In A. Meyer, & N. Schiller (Eds.), Phonetics and phonology in language comprehension and production: Differences and similarities (pp. 209–240). Berlin: Mouton de Gruyter.
Guenther, F. H., Ghosh, S. S., & Tourville, J. A. (2006). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language, 96, 280–301.
Guenther, F. H., Hampson, M., & Johnson, D. (1998). A theoretical investigation of reference frames for the planning of speech movements. Psychological Review, 105, 611–633.
Hickok, G. (2012). Computational neuroanatomy of speech production. Nature Reviews Neuroscience, 13(2), 135–145.
Hickok, G. (2014). The architecture of speech production and the role of the phoneme in speech processing. Language and Cognitive Processes, 29, 2–20.
Indefrey, P. (2011). The spatial and temporal signatures of word production components: A critical update. Language Sciences, 2, 255. doi:10.3389/fpsyg.2011.00255
Indefrey, P., & Levelt, W. J. M. (2004). The spatial and temporal signatures of word production components. Cognition, 92, 101–144. doi:10.1016/j.cognition.2002.06.001
Kenstowicz, M., & Kisseberth, C. (1979). Generative phonology: Description and theory. New York: Academic Press.
Kohn, S. E., & Smith, K. L. (1990). Between-word speech errors in conduction aphasia. Cognitive Neuropsychology, 7, 133–156.
Krakow, R. A. (1999). Physiological organization of syllables: A review. Journal of Phonetics, 27, 23–54.
Kröger, B. J., Kannampuzha, J., & Neuschaefer-Rube, C. (2009). Towards a neurocomputational model of speech production and perception. Speech Communication, 51, 793–809.
Kurowski, K., & Blumstein, S. E. (2016). Phonetic basis of phonemic paraphasias in aphasia: Evidence for cascading activation. Cortex, 75, 193–203.
Laganaro, M. (2005). Syllable frequency effect in speech production: Evidence from aphasia. Journal of Neurolinguistics, 18, 221–235.
Laganaro, M., & Alario, F. X. (2006). On the locus of syllable frequency effect. Journal of Memory and Language, 55, 178–196.
Laganaro, M., Croisier, M., Bagou, O., & Assal, F. (2012). Progressive apraxia of speech as a window into the study of speech planning processes. Cortex, 48, 963–971.
Laganaro, M., Python, G., & Toepel, U. (2013). Dynamics of phonological-phonetic encoding in word production: Evidence from diverging ERPs between stroke patients and controls. Brain and Language, 126, 123–132.
Laganaro, M., & Zimmermann, C. (2010). Origin of phoneme substitution and phoneme movement errors in aphasia. Language and Cognitive Processes, 25, 1–37.
Levelt, W. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1–75.
Levelt, W. J., & Wheeldon, L. (1994). Do speakers have access to a mental syllabary? Cognition, 50, 239–269.
Liljeström, M., Kujala, J., Stevenson, C., & Salmelin, R. (2015). Dynamic reconfiguration of the language network preceding onset of speech in picture naming. Human Brain Mapping, 36(3), 1202–1216.
McMillan, C. T., & Corley, M. (2010). Cascading influences on the production of speech: Evidence from articulation. Cognition, 117, 243–260.
Mehl, M. R., Vazire, S., Ramirez-Esparza, N., Slatcher, R. B., & Pennebaker, J. W. (2007). Are women really more talkative than men? Science, 317, 82.
Meyer, A. (1992). Investigation of phonological encoding through speech error analyses: Achievements, limitations, and alternatives. Cognition, 42, 181–211.
Miozzo, M., Pulvermüller, F., & Hauk, O. (2015). Early parallel activation of semantics and phonology in picture naming: Evidence from a multiple linear regression MEG study. Cerebral Cortex, 25, 3343–3355.
Möller, J., Jansma, B. M., Rodriguez-Fornells, A., & Münte, T. F. (2007). What the brain does before the tongue slips. Cerebral Cortex, 17(5), 1173–1178.
Monaco, E., Pellet, P., & Laganaro, M. (2017). Facilitation and interference of phoneme repetition and phoneme similarity in speech production. Language, Cognition and Neuroscience, 32(5), 650–660.
Munding, D., Dubarry, A.-S., & Alario, F.-X. (2016). On the cortical dynamics of word production: A review of the MEG evidence. Language, Cognition and Neuroscience, 31, 441–462.
Pierrehumbert, J. (2002). Word-specific phonetics. Laboratory phonology VII. Berlin: Mouton de Gruyter, 101–139.

Pouplier, M., & Hardcastle, W. (2005). A re-evaluation of the nature of speech errors in normal and disordered speakers. Phonetica, 62, 227–243.
Roelofs, A. (1997). The WEAVER model of word-form encoding in speech production. Cognition, 64, 249–284.
Roelofs, A. (1999). Phonological segments and features as planning units in speech production. Language and Cognitive Processes, 14, 173–200.
Salmelin, R., Hari, R., Lounasmaa, O. V., & Sams, M. (1994). Dynamics of brain activation during picture naming. Nature, 368, 463–465.
Schiller, N. O., Meyer, A. S., & Baayen, R. H. (1996). A comparison of lexeme and speech syllables in Dutch. Journal of Quantitative Linguistics, 3, 8–28.
Shattuck-Hufnagel, S. (1979). Speech errors as evidence for a serial order mechanism in sentence production. In W. E. Cooper, & E. C. Walker (Eds.), Sentence processing (pp. 295–342). Hillsdale: LEA.
Shattuck-Hufnagel, S. (1992). The role of word structure in segmental serial ordering. Cognition, 42, 213–259.
Staiger, A., & Ziegler, W. (2008). Syllable frequency and syllable structure in the spontaneous speech production of patients with apraxia of speech. Aphasiology, 22, 1201–1215.
Strijkers, K., Costa, A., & Pulvermüller, F. (2017). The cortical dynamics of speaking: Lexical and phonological knowledge simultaneously recruit the frontal and temporal cortex within 200 ms. NeuroImage, 163, 206–219.
Valente, A., Pinet, S., Alario, F.-X., & Laganaro, M. (2016). “When” does picture naming take longer than word reading? Frontiers in Psychology, 7, 31. doi:10.3389/fpsyg.2016.00031
Van der Merwe, A. (2008). A theoretical framework for the characterization of pathological speech sensorimotor control. In M. R. McNeil (Ed.), Clinical management of sensorimotor speech disorders (2nd ed., pp. 407–409). New York: Thieme Medical Publishers.
Whiteside, S. O., & Varley, R. A. (1998). A reconceptualisation of apraxia of speech: A synthesis of evidence. Cortex, 34, 221–231.
Wilshire, C. (2002). Where do aphasic phonological errors come from? Evidence from phoneme movement errors in picture naming. Aphasiology, 16, 169–197.
Ziegler, W. (2008). Apraxia of speech. In G. Goldenberg, & B. L. Miller (Eds.), Neuropsychology and Behavioral Neurology (pp. 269–286). Edinburgh: Elsevier.
Ziegler, W. (2009). Modelling the architecture of phonetic plans: Evidence from apraxia of speech. Language and Cognitive Processes, 24, 631–661.