
The Effect of Articulatory Constraints and Auditory Information on Patterns of Intrusions and Reductions

By

Anneke W. Slis

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Department of Speech-Language Pathology

Oral Dynamics Lab

University of Toronto

© Copyright by Anneke Slis 2014

The Effect of Articulatory Constraints and Auditory Information on Patterns of Intrusions and Reductions

Anneke W. Slis

Doctor of Philosophy

Department of Speech-Language Pathology

University of Toronto

2014

Abstract

This dissertation seeks to answer the question of whether articulatory constraints and auditory information affect intrusions and reductions. These intrusions and reductions of articulatory movement result from a general tendency to stabilize movement coordination. Stabilization of speech movement coordination is an autonomous, self-organizing process. This process, however, can be affected by factors related to articulatory properties and auditory information.

To assess how these factors affect movement coordination, three studies were performed. The first study examined differences in articulatory variability in the onsets of word pairs such as cop top and top top. To this end, different phonetic contexts and speaking rates were manipulated. As word pairs like top top are frequently used as control stimuli and word pairs like cop top as experimental stimuli, this study investigated how these two types of word pairs differ in movement control. The second study examined how constraints on individual articulators, manipulated by phonetic context, and speaking rate affected the number of intrusions and reductions. The third study investigated how these intrusions and reductions were influenced by the presence or absence of auditory information. Movements of the tongue tip, tongue dorsum and lower lip were recorded with electromagnetic articulography. The first study revealed that word pairs with alternating and identical onset consonants differ to such an extent that using identical onset word pairs as control stimuli is not recommended for future studies. The second study revealed that articulatory constraints resulted in asymmetrical patterns of intrusions: compared to a high back vowel context, a low vowel context resulted in more intrusions in general. In addition, in a front vowel context, the tongue dorsum intruded more frequently than the tongue tip and lower lip. The third study showed that speakers made fewer intrusions without auditory information available than with auditory information available. The results, which are explained within the framework of Articulatory Phonology and Task Dynamics, support the notion that articulatory constraints and auditory information influence coupling strength and movement coordination as reflected in intrusion and reduction patterns.


Acknowledgments

I would like to extend my deepest gratitude to my graduate advisor, Pascal van Lieshout, and committee members Keren Rice and Jeffrey Steele. Only with their extensive knowledge and feedback was I able to develop my understanding of what is, in my opinion, an extremely complicated area of speech science. Pascal van Lieshout has been an unbelievable source of knowledge and an extremely patient mentor throughout the whole process leading up to this final version of my dissertation. Keren Rice and Jeffrey Steele have been tremendously supportive in their theoretical contributions and during the many revisions of the manuscript.

The technical assistance of Aravind Namasivayam, Konstantin Alexandrovych and James Le made it possible to collect the data I needed for my thesis, which Radu Craioveanu, without complaining, helped analyze. I'm very grateful for Mark Noseworthy's voice recordings and to all the patient participants, who dedicated several hours of their lives to repeating word pairs.

Celine Miller was a great support in finding and scheduling new participants. Toni Rietveld and Barbara Reid helped with the statistical analysis and the editing process for Chapter 2. I also wish to thank the University of Toronto and the Department of Speech-Language Pathology for providing me with ample financial support, in the form of fellowships and awards, which enabled me to complete my PhD.

A special thank you goes out to Sieb Nooteboom, who, before I entered the University of Toronto to pursue my PhD degree, introduced me to the fascinating world of speech errors. On several occasions, I had the opportunity to reflect on my designs with Louis Goldstein and Marianne Pouplier, which has been extremely helpful. Of course, their work on intrusions and reductions laid the groundwork for this dissertation. I'm also thankful to my friends inside and outside the world of academia who directed me back to real life when necessary. I wish to thank my family, in particular my two children, Lotte and Tim, and my partner, Tedde, who kept me with both feet on the ground and reminded me what life was really about. Thank you!


Table of Contents

LIST OF TABLES ...... X

LIST OF FIGURES ...... XIII

1 CHAPTER 1 ...... 1

1.1 INTRODUCTION ...... 1

1.2 PERCEPTION STUDIES: SEGMENTAL ERRORS ...... 5

1.3 ACOUSTIC AND KINEMATIC STUDIES: SUB-SEGMENTAL ERRORS ...... 10

1.4 THE TASK DYNAMIC MODEL AND ARTICULATORY PHONOLOGY ...... 16

1.4.1 General characteristics ...... 16

1.4.2 The inter-gestural coordination level ...... 18

1.4.3 The inter-articulator coordination level ...... 20

1.5 INTRUSIONS AND REDUCTIONS IN TASK DYNAMICS ...... 24

1.6 STUDIES AND HYPOTHESES...... 32

1.6.1 Study 1 ...... 32

1.6.2 Study 2 ...... 34

1.6.3 Study 3 ...... 35

1.6.4 Methods...... 35

1.7 REFERENCES ...... 38

2 CHAPTER 2 ...... 52

2.1 ABSTRACT ...... 54

2.2 INTRODUCTION ...... 55

2.2.1 Background ...... 55


2.2.2 Current Study ...... 58

2.3 METHODS ...... 60

2.3.1 Participants...... 60

2.3.2 Stimuli ...... 61

2.3.3 Procedures ...... 62

2.3.4 Instrumentation ...... 64

2.3.5 Data Processing...... 65

2.3.6 Analysis ...... 67

2.4 RESULTS ...... 70

2.4.1 Median Movement Range ...... 70

2.4.2 Variability ...... 72

2.4.3 Correlation Values...... 74

2.5 DISCUSSION ...... 75

2.6 ACKNOWLEDGEMENTS ...... 80

2.7 APPENDIX ...... 80

2.8 REFERENCES ...... 81

3 CHAPTER 3 ...... 96

3.1 ABSTRACT ...... 98

3.2 INTRODUCTION ...... 99

3.2.1 Current definitions of speech errors and historical background ...... 100

3.2.2 Theoretical framework...... 103

3.2.3 Articulatory Phonology (AP) ...... 103

3.2.4 Current study ...... 108

3.3 METHOD ...... 110


3.3.1 Participants...... 110

3.3.2 Stimuli ...... 111

3.3.3 Procedures ...... 112

3.3.4 Analysis ...... 114

3.4 RESULTS ...... 121

3.4.1 Part 1 Effects of rate, part of trial and word position ...... 122

3.4.2 Part 2 Effects of vowel, coda#onset, and type of articulator ...... 123

3.4.3 Part 3 relative difference ...... 125

3.5 SUMMARY AND DISCUSSION ...... 130

3.6 CONCLUSIONS ...... 136

3.7 ACKNOWLEDGEMENTS ...... 137

3.8 APPENDIX ...... 138

3.9 REFERENCES ...... 138

4 CHAPTER 4 ...... 148

4.1 ABSTRACT ...... 150

4.2 INTRODUCTION ...... 151

4.2.1 Background ...... 151

4.2.2 Articulatory Phonology (AP) and Task Dynamics ...... 153

4.2.3 Current study ...... 156

4.3 METHODS ...... 158

4.3.1 Participants...... 158

4.3.2 Stimuli ...... 158

4.3.3 Procedure ...... 159

4.3.4 Instrumentation ...... 161


4.3.5 Data Processing...... 162

4.3.6 Analysis ...... 164

4.4 RESULTS ...... 165

4.4.1 Lombard effect ...... 165

4.4.2 Reductions and intrusions ...... 166

4.4.3 Regression analysis ...... 168

4.4.4 Summary ...... 169

4.5 DISCUSSION ...... 169

4.6 ACKNOWLEDGEMENTS ...... 175

4.7 REFERENCES ...... 175

5 CHAPTER 5 ...... 184

5.1 INTRODUCTION ...... 185

5.2 SUMMARY OF THE FINDINGS ...... 185

5.2.1 Study 1 ...... 186

5.2.2 Study 2 ...... 190

5.2.3 Study 3 ...... 193

5.3 CONTRIBUTIONS ...... 196

5.3.1 Empirical contributions ...... 196

5.3.2 Theoretical contributions: Articulatory phonology and Task Dynamics ...... 197

5.3.3 Methodological contributions ...... 203

5.3.4 Future directions and limitations ...... 204

5.4 REFERENCES ...... 206


List of Tables

Table 2-1 Individual normal (N) and fast rates (F) in metronome Beats per Minute (bpm) and number of syllables per second for each participant...... 86

Table 2-2 Results for the series of ANOVAs: dependent variables “COEFFVAR” and “median movement range values” (median) for target and non-target articulators. The within-subject variables are listed in the first column with all the possible interactions. Significant results (p < 0.003) are indicated with an asterisk...... 87

Table 2-3 Mean correlation values between a target articulator and a simultaneous non-target articulator for word pairs with alternating and identical onsets, collapsed across articulator, context and word position at normal and fast speaking rates. Standard deviations are in parentheses...... 89

Table 2-4 Stimulus list. The columns represent the non-target articulators tongue dorsum (TD), tongue tip (TT) and lower lip (LL). The cells contain the word pairs in which the non-target articulator appears in the second word. The coda#onset is bold and upper case. Thus, for example, the tongue dorsum appears as a non-target in the two words coP Top and coT Pot. The target articulator in the second word forms the onset of that particular word, i.e. the tongue tip and the lower lip respectively in this particular example...... 90

Table 3-1 Results for all four RM-ANOVAs: degrees of freedom, F-values and p-values for reductions (left columns) and intrusions (right columns). Asterisks indicate significance at a level of *p < 0.0125, **p < 0.001, ***p < 0.0001...... 121

Table 3-2 Mean ratio of reductions (M) and standard deviations (SD) for target articulators during word 1 and word 2 at a fast and normal speaking rate. Separate values are listed for the start, middle and end of a trial. The values are collapsed across words and participants...... 122


Table 3-3 Mean ratio of intrusions (M) and standard deviations (SD) for non-target articulators during word 1 and word 2 at a fast and normal speaking rate. Separate values are listed for the start, middle and end of a trial. The values are collapsed across words and participants...... 123

Table 3-4 Mean ratio of reductions (M) and standard deviations (SD) for the target lower lip (LL), tongue tip (TT), and tongue dorsum (TD) in the context of the four different vowels and two different coda consonants. The values are collapsed across words and participants, rate, word position and part of the trial...... 124

Table 3-5 Mean ratio of reductions (M) and standard deviations (SD) for the non-target lower lip (LL), tongue tip (TT), and tongue dorsum (TD) in the context of the four different vowels and two different coda consonants. The values are collapsed across words and participants, rate, word position and part of the trial...... 125

Table 4-1. Mean duration values (ms) and movement range differences (%) for masked and unmasked speech at fast and normal speaking rates. Standard deviations are reported in parentheses...... 166

Table 4-2 Repeated measures GLM ANOVA analysis. Values of ratio of reductions and intrusions (dependent variable) for 14 Canadian English-speaking participants. Independent variables were masking, rate, and trial position. Main effects, two- and three-way interactions are reported. Data are collapsed across words. The effect is considered significant at p < 0.05 (marked with an asterisk: *p < 0.05, **p < 0.01, ***p < 0.001)...... 167

Table 4-3 Ratio of reduction means (M) and standard deviations (SD) averaged across individual participants (n = 14), collapsed across words. Independent variables are masking (unmasked and masked), rate (normal, fast) and trial part (start, middle and end)...... 167


Table 4-4 Intrusion means (M) and standard deviations (SD) averaged across individual participants (n = 14), collapsed across words. Independent variables are masking (unmasked and masked), rate (normal, fast) and trial part (start, middle and end)...... 168

Table 4-5 Regression values (R2) and their significance for masked (N+) or unmasked (N-) in normal (N) and fast (F) speech...... 169


List of Figures

Figure 1-1: “The organization of the task-dynamic model of (Saltzman and Munhall, 1989; Browman & Goldstein, 1992; Nam & Saltzman, 2003).” From: Goldstein et al. (p. 5, 2006)...... 19

Figure 1-2 “(a) A gestural score for the word “bad” showing the activation intervals for the three gestures composing the word and driving input to the interarticulator level. (b) The coupling graph for the word “bad” in which the lines indicate coupling relationships between pairs of gestures. Solid lines represent in-phase coupling, dashed lines represent anti-phase coupling.” From: Goldstein et al. (p. 6, 2006)...... 20

Figure 1-3 The figure shows the position of 8 of the 12 coils: the tongue dorsum, tongue tip and lower lip are of interest for the two studies. Four more coils are located behind the two ears and at both sides of the mouth...... 37

Figure 2-1 Example of the word pat. The upper graph shows the audio signal. “maximum lower lip” indicates the maximum for the target articulator “lower lip” in the first word, “maximum tongue dorsum” indicates the maximum for the non-target articulator and “maximum tongue tip” is the maximum for the tongue tip articulator (coda consonant). The minimum values are indicated with minimum tongue tip, tongue dorsum and lower lip. The vertical axis represents the movement range (rescaled to percentages for the purpose of a different study) and the horizontal axis represents time (seconds). The vertical arrow pointing downwards from the maximum value for the lower lip indicates the measured value for the non-target articulator (tongue dorsum)...... 91

Figure 2-2 On the vertical axis: median movement ranges (mm) for word pairs with alternating (alt) and identical onsets (ident). Bars on the horizontal axis: word positions 1 and 2 and rates “fast” and “normal”. The upper part presents the pooled results for the target articulators, the lower part the pooled results for the non-target articulators. Error bars represent standard error...... 92

Figure 2-3 On the vertical axis: median movement range values (mm) for words with alternating onsets (alt) and identical onsets (ident). Bars on the horizontal axis: coda#onset combinations (e.g., /k#p/ or /t#p/ for the words tock pock and cot pot when the lower lip is the target in the second word, and pock tock and pot cot when the lower lip is the non-target in the second word), presented for the different target articulators (upper graph) and non-target articulators (lower graph) tongue tip (TT), tongue dorsum (TD) and lower lip (LL) in 4 different vowel contexts. Error bars represent standard error...... 93

Figure 2-4 On the vertical axis: COEFFVAR values (mm) for alternating onset (alt) and identical onset (ident) word pairs. Bars on the horizontal axis: word positions 1 and 2 and rates “fast” and “normal”. The upper part presents the pooled results for the target articulators and the lower part presents the pooled results for the non-target articulators. Error bars represent standard error...... 94

Figure 2-5 On the vertical axis: COEFFVAR values (mm) for alternating onset and identical onset word pairs. Bars on the horizontal axis: coda#onset, presented for the different target (upper part) and non-target (lower part) articulators tongue tip (TT), tongue dorsum (TD) and lower lip (LL) in 4 different vowel contexts. Error bars represent standard error...... 95

Figure 3-1 Waveform and two words of one word pair “pack tack”. The vertical axis displays relative movement range (%) of the individual articulators lower lip (LL, solid line), tongue dorsum (TD, dashed line), and tongue tip (TT, dotted line). The maximum lower lip (indicated with a downwards pointed arrow) and the maximum tongue tip (indicated with a downwards pointed arrow) form the two maximum points for target constrictions. The non-target constrictions are located at the black squares (See text for more details)...... 115


Figure 3-2 15 overlaid repetitions of the word pair “pat cat”. The vertical axis represents the normalized displacements of the lower lip (LL), tongue dorsum (TD), and tongue tip (TT). The horizontal axis represents time, normalized for presentation purposes only. The squares indicate the location of the target articulator constriction maxima, the circles the location of the non-target articulator. The two arrows indicate two examples of intrusions based on the median value of the distribution plus 2 MADs. For the second word, there were no outliers as defined by this approach...... 118

Figure 3-3 Ratio of reductions by difference measures at the individual participant level. The black circles represent the tongue dorsum, the grey squares the lower lip, and the clear triangles the tongue tip. Numbers refer to individual participants...... 127

Figure 3-4 Ratio of reductions by difference measures at the individual word level. The black circles represent the tongue dorsum, the grey squares the lower lip, and the triangles the tongue tip. Vowels correspond to the vowels of the individual words (e.g., the tongue dorsum and the vowel /ɪ/ are involved in the word kip or kit)...... 127

Figure 3-5 Ratio of intrusions by difference measures at the individual participant level. The black circles represent the tongue dorsum, the grey squares the lower lip, and the triangles the tongue tip. Numbers represent the participants...... 128

Figure 3-6 Ratio of intrusions by difference measures at the individual word level. The black circles represent the tongue dorsum, the grey squares the lower lip and the triangles the tongue tip. Vowels represent the vowels of the individual words (e.g., the tongue dorsum and the vowel /ɪ/ can be involved in the word tip or pit)...... 129


1 Chapter 1

1.1 Introduction

For more than a century, speech errors have been a fruitful phenomenon for scientists to study and from which to infer how speech is planned and produced. These inferences have made it possible to develop speech production models, based mainly on traditional perceptual methods used to investigate these errors. These methods have led to the conclusion that errors mainly originate at the phonological level of planning. At this phonological level, articulatory constraints, introduced by phonetic context, do not play a role. Consequently, if and how co-articulatory constraints affect error patterns is not clear. The current dissertation seeks to answer the question of whether kinematic error patterns are influenced by phonetic context and by auditory information.

Recent kinematic studies suggest that certain speech errors arise as a consequence of autonomous mechanisms generally observed in movement coordination (Goldstein, Pouplier, Chen, Saltzman & Byrd, 2007; Pouplier, 2003). These errors manifest themselves as intrusions and reductions of articulatory movements. Intrusions consist of activation of articulators that are not expected to be activated, given the intended speech segment (i.e., non-target articulators), whereas reductions involve reduced activation of (a set of) intended articulators (i.e., target articulators). The non-target articulators are target articulators in the adjacent word. These errors differ substantially from what has been found in traditional perceptual studies. What is particularly interesting is that these intrusions and reductions are not incorrect from a dynamical standpoint: they can be considered a logical consequence of underlying mechanisms intended to stabilize movement coordination. The general autonomous mechanisms of movement coordination, which cause intrusions and reductions, arise independently from biomechanical properties of individual components and from auditory and visual information (cf. Kelso, 1995; Peper & Beek, 1998; Peper & Beek, 1999). However, coordination is affected by its individual components. Individual components frequently differ in their characteristic properties, which can interact with coordination dynamics (Kelso, 1995). Because articulators also differ in their characteristic movement properties, the question arises whether constraints at the level of articulation interact with movement coordination in such a way that certain errors occur more frequently than others. In this dissertation, I explore articulatory movement coordination in different phonetic contexts, resulting in different co-articulatory constraints, to investigate how context affects coordination in such a way that asymmetries in error patterns emerge. In addition, the fact that the majority of intrusions and reductions are not noticed by an external listener (Pouplier & Goldstein, 2005) raises an additional question, namely whether the speaker can still detect these intrusions and reductions based on the auditory output and thus prevent them from turning into errors that distort the message. These questions lead to the main topics of this dissertation, namely to investigate to what extent co-articulatory constraints as well as auditory information interact with general autonomous principles of coordination in speech.

Exploring co-articulatory constraints and their consequences for resulting patterns of intrusions and reductions makes it possible to separate these articulatory constraints from the ones originating from different aspects of processing, such as word frequency or other lexical aspects often observed in error patterns (e.g., Corley, Brocklehurst & Moat, 2011; Goldrick, Baker, Murphy & Baese-Berk, 2011; Nooteboom & Quene, 2007). This division is important, for example, when speech disorders are classified based on the presence of specific features in patterns of errors. Based on these patterns, claims are frequently made concerning the phonological versus articulatory basis of a specific pathology (e.g., Buchwald & Miozzo, 2012; Buckingham & Yule, 1987; Pouplier & Hardcastle, 2005; Tuller, 1984; Wood, Hardcastle & Gibbon, 2011). Because little is known, however, about how underlying co-articulatory constraints and auditory information affect intrusions and reductions in the normal population in general, the dissertation focuses on a non-pathological population, and speech disorders are not further discussed in this thesis. Instead, the thesis centers on the question of how co-articulatory constraints contribute to intrusions and reductions in order to inform speech production models.

As mentioned above, many of the intruding or reducing movements of non-target and target articulators go unnoticed by the listener. Although not always perceived as incorrect in the case of small movement irregularities, articulatory coordination can affect speech production to such an extent that patterns eventually may surface in the form of, for example, a systematic sound change (e.g., Browman & Goldstein, 1990; Parrell, 2012) or, when large enough, as audible errors in natural language. In most research to date, these audible errors have frequently served as the basis for the components and parameters of speech production models (Dell & Reich, 1980; Fromkin, 1971; Shattuck-Hufnagel, 1979). However, the consequence of relying on perception instead of measuring speech production processes more objectively is that information on underlying mechanisms is ignored (see section 1.2). Investigating how co-articulatory constraints and auditory information contribute to error patterns and movement coordination using kinematic measures of articulation is thus important for developing and extending models of speech and language production. The interaction between co-articulatory constraints and auditory information on one side and coordination on the other side is expected to surface as asymmetries in the number of intrusions and reductions, related to the type of articulators involved and to whether or not auditory information is available to the speaker.

The underlying mechanisms that cause these added and reduced movements have been described in the task dynamic model within the theory of Articulatory Phonology (AP) (Goldstein, Byrd & Saltzman, 2006; Saltzman & Munhall, 1989), based on related models of dynamical systems (Haken, Kelso & Bunz, 1985; Kelso, 1995; Peper & Beek, 1998; Peper & Beek, 1999; Van Lieshout, 2004). The current dissertation adopts the dynamical approach within Articulatory Phonology as its framework to investigate to what extent co-articulatory constraints influence movement variability in general, the occurrence of gestural reductions and intrusions specifically, and to what extent auditory information plays a role in these processes.

Before presenting the three core studies, a short outline of the relevant research is given in this first chapter. Section 1.2 discusses speech error studies based on perceptual transcription, and the limitations of these studies with regard to a possible role for (co-)articulation in error patterns1. Section 1.3 highlights findings from acoustic and kinematic studies which pave the way for the current dissertation. Section 1.4 introduces the theory of AP and the model of Task Dynamics and how recent findings on intrusions and reductions fit within their concepts. Finally, in section 1.6, the aims and research questions of the following three experimental chapters (2, 3 and 4) are outlined. The final chapter presents a general summary of the findings and their relevance in the context of the current literature and speech production models.

1 Please note that in the discussion of perceptual studies, the term “speech error” will be used, in keeping with the terminology of the literature in that area. Beginning with section 1.3, which reviews the literature on the gradual nature of errors and some of the underlying mechanisms causing certain types of errors, the more neutral terms “intrusions” and “reductions” will be introduced. As will become clear, these intrusions and reductions are not errors from a dynamical point of view, but are a logical consequence of underlying mechanisms of coordination dynamics.


1.2 Perception studies: segmental errors

This section highlights some important findings and consequences of speech error studies based on perceptual transcription, emphasizing how these traditional studies have influenced the way speech errors are interpreted.

A speech error can be defined as an utterance produced by a speaker that differs from the intended utterance. The behavior of errors has frequently been used to discover processes that underlie error-free speech, which are subsequently integrated into various speech and language production models. Based on an abundance of error studies using perceptual transcription as a tool, models distinguish separate stages at which errors typically occur (Dell, 1988; Levelt, Roelofs & Meyer, 1999). First, lexical items are retrieved from the lexicon and assembled into a syntactic structure. Errors originating at this stage surface as switched, omitted or replaced words or morphemes (Dell, 1988; Dell & Sullivan, 2004). At the next stage, the lexical items are assigned phonological units, such as phonemes, which are subsequently put into serial order (Fromkin, 1971; Levelt et al., 1999; Meyer, 1992; Shattuck-Hufnagel, 1979). At this phase, a speech error occurs when a given phonological unit is wrongly substituted for another, a unit is transposed to a different location in the speech string, or units are omitted or added (Dell, 1988; Dell & Sullivan, 2004; Fromkin, 1971; Meyer, 1992). Two examples of common phonological errors are "teep a cape" for target "keep a tape", and "mang the mail" in place of "bang the nail" (Fromkin, 1971). The first example involves the exchange of the phonemes /t/ and /k/. In the second example, a nasal feature is added to the voiced bilabial stop /b/ and the place of articulation changes such that the alveolar /n/ becomes the bilabial /m/. The behavior observed in the two examples, which is typical of many phonological errors, has been taken as evidence that a discrete, static unit such as the phoneme or feature is exchanged or added (Fromkin, 1971; Guest, 2002; Meyer, 1992; Wickelgren, 1965). The notion that the phoneme or feature is the smallest phonological unit involved in speech errors is supported by the observation that a speech error almost always results in a permissible string of phonemes in the language, that is, the resulting sequence is phonotactically well-formed (Baars & Motley, 1976; Fromkin, 1971; Nooteboom, 1969; Wickelgren, 1965).

The idea that errors originate mainly at the phonological rather than at the articulatory level is further corroborated by error studies employing a silent speech paradigm (Dell, 1980; Dell & Repka, 1992; Postma & Noordanus, 1996). In a silent speech task, the speaker is instructed to form a mental image of the speech string, without actually articulating the utterance. Using silent speech in speech error detection tasks eliminates overt articulatory influences. Dell (1980) compared self-reported errors in silent and overt speech and observed that silent speech resulted in speech error patterns similar to those in overt speech. Similar results for mouthed (involving silent articulation) and silent speech were found by Postma and Noordanus (1996). In addition, Dell (1980) showed that differences in the velocity of articulatory movements when producing speech segments (speech segments were categorized as having either slow or fast velocity profiles) did not influence the number of errors. These studies support theories favoring the effect of phonological processes in the occurrence of errors rather than articulatory influences.

The way in which traditional speech error data have been collected and interpreted has several limitations. First of all, silent speech is characterized by activity in the cortical motor areas (Moller, Jansma, Rodriguez-Fornells & Munte, 2007), and issues that appear to arise at the articulatory level could actually be situated at the level of the motor commands emanating from these areas. Moreover, Livesay, Liebke, Samaras and Stanley (1996) showed that reciting language silently increases EMG activity of the lips; no such activity was measured when the participant visualized an image. Thus, not finding a difference between silent and overt speech error patterns does not mean that an error originates at the phonological level and that (co-)articulatory factors do not play a role in these tasks.

Secondly, most speech error studies suffer from biases introduced by perceptually transcribing errors. For example, listeners have been shown to ignore phonetic detail and disturbances

(Buckingham & Yule, 1987), to automatically restore speech sounds that are incorrect or even missing (Cohen, 1980; Samuel, 1981; Samuel, 1996; Warren, 1970) and to categorize the perceived speech stream into pre-existing categories that are familiar to the listener (Boucher,

1994; Buckingham & Yule, 1987). In addition, perceiving all acoustic changes caused by articulatory movements inside the vocal tract and subsequently defining the actual underlying nature of speech errors is nearly impossible (Kent, 1996; Koenig, 2004), and certain aspects of articulatory movements that are not perceived as incorrect may in fact be anomalous when observing the actual movements (Meyer, 1992; Pouplier & Goldstein, 2005). These incorrect movements are consequently not classified. Pouplier and Goldstein (2005) show, for example, that the bias towards /k/ replacing /t/ often found in error databases (Stemberger, 1991) could well be due to the fact that identifying a produced /t/ is more difficult when the tongue dorsum is

(slightly) activated than identifying a /k/ with an activated tongue tip (Pouplier & Goldstein,

2005). In addition, Marin, Pouplier and Harrington (2010) demonstrate that, if two simultaneous full constrictions are present in the vocal tract, the more posterior constriction influences the resulting spectral shape to a greater extent. Accordingly, an additional activated tongue dorsum influences the spectral shape more than an additional activated tongue tip. Given that a kinematic error study by Goldstein et al. (2007) did not find evidence for an articulatory asymmetry underlying tongue dorsum and tongue tip errors in the production of /k/ and /t/, they suggested that the asymmetry previously observed by Stemberger (1991) was likely due to a perceptual bias.

Even though several perception studies have acknowledged that errors do exist that deviate from correctly produced speech segments (Butterworth & Whittaker, 1980; Fromkin, 1971; Meyer,

1992; Hockett, 1967), these errors have been characterized as rare instances of errors originating at the level of phonetic encoding or motor planning (see e.g., Levelt, et al., 1999, p.21), and consequently have been omitted frequently from systematic analyses (Pouplier & Hardcastle,

Omitting ill-formed speech segments simply because they are rare, or failing to detect these errors at all because of limitations of the perceptual system, created a bias towards errors consisting of correctly produced speech segments and an artificial division between higher phonological and lower phonetic processes (Laver, 1980; McMillan, Corley & Lickley, 2009;

Pouplier & Hardcastle, 2005).

Findings from perceptual studies which show asymmetries in error patterns are nonetheless interesting for theories claiming an articulatory basis for some of the errors. Several researchers have investigated these asymmetries (e.g., Shattuck-Hufnagel & Klatt, 1979; Stemberger, 1991).

Although a certain number of the asymmetries in perceptual error studies likely arise as a consequence of listener bias (Pouplier & Goldstein, 2005), some observed asymmetries can be caused by other, possibly articulatory, constraints as well. The so-called palatalization bias, for example, in which an /s/ is replaced by /ʃ/ more frequently than /ʃ/ by /s/, does not result from perceptual confusion. Kinematic (Pouplier, 2008) and acoustic (Kochetov & Radisic, 2009) studies confirm that /ʃ/ indeed replaces /s/ more frequently than vice versa, although a slight bias exists for /ʃ/ to be perceived as /s/ (Pouplier & Goldstein, 2005). These findings confirm that what is articulated is not always perceived as such and, as will be shown in the next chapter,

these acoustic and kinematic studies reveal underlying patterns which do not surface in the traditional perception-based error studies.


1.3 Acoustic and kinematic studies: sub-segmental errors

The limitations of perceptual transcription in investigating errors and their possible underlying co-articulatory constraints have been established in a series of studies that employed objective instrumental methods. Findings from studies examining acoustic data (Frisch & Wright, 2002;

Goldrick & Blumstein, 2006; Goldrick, et al., 2011; Laver, 1980), electro-muscular activity

(Mowrey & MacKay, 1990), electropalatography (Wood, et al., 2011), and movement data from individual articulators (Boucher, 1994; Frisch, 2007; Goldstein, et al., 2007; McMillan &

Corley, 2010; McMillan, et al., 2009; Pouplier, 2003; Pouplier, 2007; Pouplier, 2008; Stearns,

2006) reveal that speech does not always result in physical events that generate correct speech segments, and that these ill-formed speech segments are much more common than previously assumed. These studies frequently measured errors in which a non-target articulator was activated only gradually. Interestingly, many instances of these measured errors were not perceived as incorrect by the listener (Mowrey & MacKay, 1990; Pouplier & Goldstein, 2005).

This made it necessary to redefine the concept of “speech error” for researchers who approach errors using acoustic, physiological and/or kinematic methods. This aspect will be discussed in more detail when addressing kinematic studies.

Several findings from the acoustic and kinematic studies cited above suggest that (co-) articulatory factors affect error outcomes. For example, a study on vowel errors (Laver, 1980) revealed that producing words with vowels which required activating similar muscles, such as peep pip or pup pop, never resulted in errors, whereas words with vowels whose articulation triggered the use of different muscles, such as pep poop, resulted in blending of the two vowels.

Furthermore, Frisch and Wright (2000) observed an interesting effect of the coda consonant on voicing errors of the following onset consonant. Speech errors were elicited by having participants produce tongue twisters like sung zone Zeus seem. These kinds of twisters induced

errors involving acoustic realizations of features intermediate between typical /s/ and /z/ productions. The researchers found that, in the case of the twister sung zone Zeus seem, the word seem was never produced as voiced and thus never contained an error. Based on a study by

Pirello, Blumstein and Kurowski (1997) which revealed that the voicing of fricatives in CV syllables was influenced by the voicing characteristics of a preceding segment, Frisch and

Wright suggested that the preceding /s/ in Zeus constrained the production of the following /s/ in seem. Similar types of constraints could explain the earlier mentioned palatalization bias.

Pouplier (2008) observed that with errors involving /s/, which consists of a tongue tip gesture, and /ʃ/, which involves a tongue tip and tongue body gesture, a change in the tongue body constriction for /ʃ/ always resulted in a change in the tongue tip position as well, due to the physical coupling between tongue tip and tongue body. Similarly, the palatalization bias, in which /s/ results more frequently in a /ʃ/ production than vice versa, can be explained as the result of the physical coupling between tongue tip and tongue dorsum, which could have triggered an intrusion of the tongue body. Finally, a recent kinematic study revealed that more errors were produced during onset consonants in the context of the high vowel /ɪ/ than in the context of the low back vowel /ɑ/ (Goldstein et al., 2007). All these studies suggest that factors at the level of articulation affect the occurrence of errors. Given the relevance of the latter study to the work presented in this thesis, it is described in more detail here.

Employing a repetitive speech task, Goldstein et al. (2007) sampled movement data of the tongue tip, tongue dorsum and the lower lip with electromagnetic mid-sagittal articulography

(EMMA). The experimental stimuli consisted of word pairs in which the onset consonants alternated, such as cop top. In the remainder of the dissertation, these types of word pairs will be referred to as alternating onset consonant word pairs. In contrast, so-called non-alternating onset consonant word pairs formed the control stimuli for which the onset consonants did not differ,

for example top top or cop cop. The movement amplitude levels from the non-target tongue tip

(i.e., a movement of the tongue tip during the onset of cop instead of top) during a tongue dorsum target constriction (i.e., a movement of the tongue dorsum during the onset of cop) as well as from the non-target tongue dorsum during a tongue tip target constriction (i.e., the onset of top) were measured. Errors were defined based on the mean activation levels and standard deviations of non-target articulators produced in the non-alternating condition. When the non-target activation of an articulator in an alternating trial was greater than two standard deviations from the mean value of the identical non-target articulator in the non-alternating trial, the activation was considered anomalous. Based on this statistical criterion, the results indicated that a non-target articulator could be activated simultaneously with the target articulator2. These instances of elevated articulatory movement differed substantially from movement patterns observed in cases of normal variability; the movements appeared to intrude when a target articulator constricted the vocal tract. For this reason, these were called intrusion errors.

Likewise, when the movement of a target constriction was reduced too much, a so-called reduction error occurred. Speaking rate influenced the error patterns in the study by Goldstein et al. (2007) such that speech produced at a fast rate induced more errors than speech produced at a normal rate. Interestingly, especially at a fast rate, the intrusions and reductions built up over the course of a trial with more errors being found towards the end than at the start of a trial.
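As a rough sketch, the two-standard-deviation criterion described above could be implemented as follows. The amplitude values are invented for illustration, and the actual analysis pipeline of Goldstein et al. (2007) involves kinematic preprocessing not shown here:

```python
from statistics import mean, stdev

def flag_intrusions(baseline_amps, trial_amps, k=2.0):
    """Flag non-target articulator movements in alternating trials whose
    amplitude exceeds the non-alternating baseline mean by more than
    k standard deviations (k = 2 follows Goldstein et al., 2007)."""
    mu, sd = mean(baseline_amps), stdev(baseline_amps)
    return [amp > mu + k * sd for amp in trial_amps]

def flag_reductions(baseline_amps, trial_amps, k=2.0):
    """Flag target articulator movements whose amplitude falls more than
    k standard deviations below the baseline mean (a reduction error)."""
    mu, sd = mean(baseline_amps), stdev(baseline_amps)
    return [amp < mu - k * sd for amp in trial_amps]

# Hypothetical non-target tongue dorsum amplitudes (arbitrary units)
# during /t/ onsets: "top top" trials as baseline, "cop top" as test.
baseline = [0.8, 1.0, 1.2, 0.9, 1.1]
alternating = [1.0, 1.3, 3.5, 1.1, 4.0]
print(flag_intrusions(baseline, alternating))
# -> [False, False, True, False, True]
```

Only the third and fifth hypothetical tokens exceed the baseline mean by more than two standard deviations and would be classified as intrusions under this criterion.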

As mentioned above, an interesting asymmetry was found related to the type of vowel following the word onset: more errors were detected in the /ɪ/ condition than in the /ɑ/ condition. The

2 Although constrictions are characterized by a combination of different articulators, the term "articulator" is used throughout this dissertation to refer to individual tongue tip, tongue dorsum or lower lip constrictions.

authors hypothesized that, compared to /ɑ/, the tongue shape for /ɪ/ was more compatible with those for /k/ and /t/ articulations (Goldstein, et al., 2007). As the tongue is already elevated during the vowel /ɪ/, the speaker is more likely to produce intrusions in the vicinity of a consonantal tongue tip or tongue dorsum constriction. This finding suggests that co-articulatory properties influence the degree to which non-target articulators are activated. However, the authors noted that another possible source of the asymmetry could have been their method of defining intrusions and reductions. They observed that, for the non-alternating onset sequences, the standard deviations for non-target tongue tip and tongue body movements in the context of /ɪ/ had a tendency to be smaller than in the context of /ɑ/. They did not mention whether this difference in standard deviation was also observed in the alternating onset sequences. If not, then the number of errors in the /ɪ/ condition could have been inflated because of the smaller standard deviation in the non-alternating sequences that formed the baseline.

Several additional error studies suggest that movement patterns from alternating and non-alternating onset word pairs are characterized by different co-articulatory properties. For example, in a single case study on velum movement errors, Goldstein et al. (2007) observed that the velum showed lower movement amplitude in alternating onset than in non-alternating onset word pairs (bang bad and kim kid versus bang bang, bad bad etc.). In contrast, Stearns (2006) observed more tongue dorsum activation variability in alternating onset word pairs (top cap cop tab) compared to baseline activations in non-alternating onset nonsense syllables (/tɑ tæ tæ tɑ/ and /kɑ kæ kæ kɑ/). Moreover, she detected that, for some participants, the mean tongue dorsum movements during tongue tip constrictions in alternating onset sequences were higher than the mean amplitudes of the non-target tongue dorsum in non-alternating onset sequences. One possible explanation provided was that the speakers had hyper-articulated the alternating onset sequences to enhance contrast (Stearns, 2006). This explanation is supported by findings of


Goldrick and Blumstein (2006), who observe that voiced onset consonants in alternating word pairs, such as keff geff geff keff, tended to show less variance in VOT than the same voiced onset consonants in the control condition, as for example in geff geff3. In summary, these studies reveal that it is indeed possible that co-articulatory differences in experimental and control stimuli could underlie the asymmetric vowel results in the Goldstein et al. study. These studies also reveal that the way in which errors are defined can affect the actual outcome of patterns substantially.

Notwithstanding the controversial results related to vowel influences, the actual intrusions and reductions, which are frequently gradual in nature, need to be explained. The findings of the above-mentioned studies reveal that (cognitive) models that describe speech planning as a two-step process, in which qualitative, discrete speech units are translated into quantitative, often gradient, articulatory movements (Dell, 1988; Fromkin, 1971; cf. Browman & Goldstein, 1990), seem too limited, in that they do not recognize the existence of sub-phonemic errors. Several recent models have provided a different account for these sub-phonemic errors. The Cascading model (Goldrick & Blumstein, 2006; McMillan & Corley, 2010; Pouplier & Goldstein, 2010) explains gradual errors as resulting from several partially activated phonological representations that cascade down to the level at which they are executed, leaving so-called phonetic "traces" when a resulting speech segment is produced. However, whereas this model can explain intrusions as a result of competing phonological representations, it fails to account for the finding that speaking rate affected the number of errors. One possibility is that speaking rate

3 Conversely, Pouplier (2007) suggested that the low variability in VOT in alternating onset sequences in Goldrick and

Blumstein's study might have been due to the fact that tokens that were perceived as errors were excluded from measurement, thus affecting the overall variability of tokens in alternating onset sequences.

affects the final articulatory movements because of time constraints on retrieving and verifying a correct speech plan at the phonological stage (Dell & Reich, 1980; Levelt, et al., 1999) or because of supposed articulatory difficulties. However, articulatory difficulties or monitoring issues are not likely to increase the number of errors at the higher speaking rates used in these studies, because, as McMillan and Corley (2010) point out, the rate at which typical English is spoken is considerably higher (240 syllables/min) than the rate often employed in error elicitation tasks (180-210 syllables/min). Typical spoken English, of course, is characterized by many function words and other short items, but the task in the Goldstein et al. (2007) study consisted of similar short items like cop top produced at speaking rates of 240 syllables/min, making it quite similar to short words in running speech. Moreover, producing cop top many times does not require much planning once the production of the sequence has started, excluding the possibility of time constraints on verifying the speech plan. Finally, the cascading model cannot account for the finding that errors built up over the course of a trial (see for a discussion Goldrick & Chu, 2013; Pouplier & Goldstein, 2013). These findings render this model of speech planning less viable as a potential candidate to explain the intrusions and reductions.

The speaking rate effect, the increase in number of errors towards the end of the trial and the gradual nature of errors are consistent with principles outlined in the theory of AP and Task

Dynamics. This theory describes the phonological and phonetic realizations as microscopic and macroscopic properties within the same system in which the gesture is a quantitative as well as a discrete unit (Browman & Goldstein, 1989; Browman & Goldstein, 1990; Gafos & Benus,

2006). The concepts are discussed in more detail in the next section.


1.4 The Task Dynamic Model and Articulatory Phonology

1.4.1 General characteristics

Articulatory Phonology is a linguistic theory that implements phonological units as vocal tract activity (Fowler, 1995). The theory incorporates concepts of Task Dynamics, which are used to model the different stages (Saltzman & Munhall, 1989). In the Task Dynamics model, the kinematics of the individual articulators are expressed in terms of linear second-order equations

(Saltzman & Byrd, 2000). These mathematical laws result in continuous trajectories of articulatory movements (Gafos & Benus, 2006; Goldstein & Fowler, 2003). My dissertation will limit itself to the generic principles of AP and Task Dynamics, without describing the mathematical formulas.

In AP, the smallest unit of speech planning and production is the gesture (Browman &

Goldstein, 1989; Tilsen & Goldstein, 2012). A gesture is a dynamically defined action unit of speech which can be described cognitively as an abstract neural representation of a constriction action. At the same time, it can be described articulatorily as the actual action of gestures to modify the shape of the vocal tract (Fowler & Saltzman, 1993; Goldstein et al., 2007; Tilsen &

Goldstein, 2012). The gestural action is characterized by movements of a specific set of articulators which are temporally and spatially controlled in a coordinated way so as to constrict the vocal tract in order to reach a linguistically significant goal (Goldstein, et al., 2006).

Evidence that the gesture is an abstract unit of speech that is actively controlled is provided by studies which show that articulators, forming a particular gesture, compensate immediately when perturbed in order to realize the originally planned constriction goal (e.g., Kelso, Tuller,

Vatikiotis-Bateson & Fowler, 1984). These compensations typically occur within 20-30 ms after the perturbation. Articulators that are not considered part of a gesture are not actively controlled.


If movement is observed in these articulators, it is deemed a passive change due to biomechanical linkages with actively controlled articulators (Browman & Goldstein, 1986;

Fowler & Saltzman, 1993). For example, producing a /p/ requires the movements of the lower lip and jaw to be coordinated in such a way that they result in a bilabial gesture. The tongue tip will only show passive movement as a result of the rising jaw; although this is an articulatory movement within the vocal tract which can be measured, this movement is not part of a gesture

(see Fowler & Saltzman, 1993).

In describing how gestures behave and interact, the model distinguishes two distinct but interrelated levels: a level of inter-gestural coordination, at which the strength of individual gestures and temporal relations between gestures are defined, and a level of inter-articulator coordination, at which the tract variables (i.e., the constriction degree and location of a gesture) and the trajectories of individual articulators are defined (Saltzman & Byrd, 2000). Spatial characteristics are expressed in terms of constriction degree and location within the vocal tract.

Timing refers to a temporal constraint related to the activation of gestures and their associated articulators. The relative timing within and between speech gestures is typically expressed with phase measures (Saltzman & Byrd, 2000). Thus, gestures that overlap in time are in phase with each other. These gestural relative phases are not characterized by a fixed value but can vary depending on linguistic and para-linguistic factors such as stress, speaking rate, and type (vowel versus consonant) of the intended segment (Saltzman & Byrd, 2000). The phasing serves as the adhesive with which the individual gestures are "glued" together (Goldstein, et al., 2006). By determining specific phase relations or couplings between sets of gestures as their patterns unfold over time, larger coordinated structures, including segments, syllables and words, are realized. Many observed context-dependent phenomena in speech can be explained by this so-called coproduction of gestures (Goldstein & Fowler, 2003). Also the previously discussed

intrusions can be explained as gestures that overlap spatially and temporally. The reductions of target articulator movements can be seen as reduced gestures.

An important characteristic of the gesture is that it can be modeled within Task Dynamics as a critically damped oscillator that acts as a point attractor4 (Goldstein, et al., 2006; Saltzman &

Byrd, 2000; Saltzman & Munhall, 1989). Point attractors have been successful in describing skilled human movements in general (Turvey, 1990) and according to AP the behavior of articulatory gestures is no different (Goldstein, et al., 2006). This means that, according to dynamical systems theory, the behavior of speech gestures and interactions between them are shaped by rules of motion characteristic of movement in general (Saltzman, Löfqvist, Kay,

Kinsella-Shaw & Rubin, 1998; Van Lieshout & Neufeld, 2014), as has been emphasized earlier.
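To make the point-attractor idea concrete, the following sketch integrates a critically damped second-order system toward a constriction target; the parameter values are illustrative and not fitted to speech data:

```python
def gesture_trajectory(x0, target, omega=10.0, dt=0.001, steps=1000):
    """Euler integration of a critically damped mass-spring system,
    x'' = -omega**2 * (x - target) - 2 * omega * x',
    the point-attractor dynamics assumed for a single tract variable."""
    x, v = x0, 0.0
    traj = [x]
    for _ in range(steps):
        a = -omega ** 2 * (x - target) - 2.0 * omega * v
        v += a * dt
        x += v * dt
        traj.append(x)
    return traj

traj = gesture_trajectory(x0=0.0, target=1.0)
# Critical damping: the tract variable approaches the constriction
# target without overshoot or oscillation.
print(max(traj) <= 1.0 and abs(traj[-1] - 1.0) < 0.01)
```

This is the behavior described in footnote 4: the motion decays over time to an equilibrium position, here the constriction target.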

1.4.2 The inter-gestural coordination level

A general outline of the Task Dynamics model is shown in figure 1.1. At the inter-gestural coordination level, several gestural parameters and their timing relations are determined. Two different components are distinguished: the gestural planning oscillator and the gestural score.

The inter-gestural coordination level gets its input from the coupling graph. In this coupling graph, retrieved from the lexicon, the coupling between the individual oscillators (i.e., gestures) is specified. Oscillators can either be coupled in- or out-of-phase. Onset consonants followed by a vowel are always in phase, while vowels followed by a coda consonant are always out of phase (Goldstein et al., 2006; Nam, Goldstein & Saltzman, 2009). This coupling information is

4 A gesture is modelled as a point attractor, in terms of a damped, second-order linear differential equation. This resembles the behavior of a damped mass-spring system whose motion decays over time to an equilibrium position (Saltzman & Munhall,

1989).

input for the gestural planning oscillator (Figure 1.1), which forms part of the inter-gestural coordination level.

Figure 1-1: “The organization of the task-dynamic model of speech production (Saltzman and Munhall, 1989; Browman & Goldstein, 1992; Nam & Saltzman, 2003).” From: Goldstein, et al. (p.5, 2006).

In the gestural planning oscillator the individual oscillators settle into stable patterns, based on the input from the coupling graph (Goldstein, et al., 2006; Nam, et al., 2009). The output of the gestural planning oscillator is a coupled set of limit cycle oscillations with stabilized relative phases.

When the final relative phase values between the oscillators have been determined and are stabilized, they serve as input for the gestural score. In the gestural score, an example of which is shown in figure 1.2, individual gestures are specified in terms of their dynamic parameters (stiffness, duration, target position, and damping coefficients) that are characteristic of the specific gesture's point-attractor system (Goldstein, et al., 2006; Saltzman & Munhall, 1989).

The parameters determine the strength with which the gesture shapes the vocal tract. The following constriction gestures are distinguished in English: bilabial, tongue tip, tongue body,

tongue dorsum, velum, and glottal (Browman & Goldstein, 1989).

Figure 1.2 gives an example of a gestural score for the word bad. The labial gesture for the onset consonant /b/ is in phase with the tongue body gesture for the vowel /æ/. In contrast, the tongue body and tongue tip gestures (i.e., the vowel and coda consonant) are out of phase.

Figure 1-2: “(a) A gestural score for the word “bad” showing the activation intervals for the three gestures composing the word and driving input to the interarticulator level. (b) The coupling graph for the word “bad” in which the lines indicate coupling relationships between pairs of gestures. Solid lines represent in-phase coupling, dashed lines represent anti-phase coupling.” From: Goldstein et al. (p. 6, 2006).
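The coupling graph for bad can also be written down as a small data structure. This is a hypothetical representation chosen purely for illustration; AP itself does not prescribe one:

```python
# Gestures composing "bad" and their pairwise coupling modes,
# mirroring the coupling graph in figure 1.2.
coupling = {
    ("labial /b/", "tongue body /ae/"): "in-phase",        # onset C + V
    ("tongue body /ae/", "tongue tip /d/"): "anti-phase",  # V + coda C
}

# Target relative phases associated with each coupling mode (degrees).
TARGET_PHASE = {"in-phase": 0.0, "anti-phase": 180.0}

def target_relative_phase(g1, g2):
    """Look up the target relative phase for a pair of coupled gestures."""
    mode = coupling.get((g1, g2)) or coupling.get((g2, g1))
    return TARGET_PHASE[mode]

print(target_relative_phase("labial /b/", "tongue body /ae/"))     # -> 0.0
print(target_relative_phase("tongue tip /d/", "tongue body /ae/")) # -> 180.0
```

The graph encodes only target phase relations; the planning oscillators then settle the actual phases around these targets.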

1.4.3 The inter-articulator coordination level

The output of the gestural score forms the input for the inter-articulator coordination level that accounts for coordination between individual articulators which are controlled by a specific gesture (Saltzman & Byrd, 2000). At this level, tract variables and articulator trajectories are specified. The context-independent tract variables specify the location and degree of constriction for a certain gesture. Stiffness, damping, and target coordinates, specified at the gestural level, are linked with these tract variables (Saltzman & Byrd, 2000). For example, the predefined strength of the tongue tip gesture during /t/ is higher than during /s/, and the tract variable is thus assigned a different constriction degree depending on the gestural requirements.


The context-dependent model-articulator specifies the articulatory trajectories, that is, which articulators shape a gesture and how these articulators are coordinated with each other

(Goldstein, et al., 2006; Saltzman, Nam, Goldstein & Byrd, 2006). The degree to which each articulator contributes to a constriction is influenced by which gestures are simultaneously produced, resulting in articulators that are either activated in succession or are overlapping.

Because of their in-phase timing requirements, the articulators needed to produce a following vowel, for example, are activated simultaneously with the articulators that form the initial consonant constriction (Fowler, 1980; Goldstein, et al., 2006; Nam, et al., 2009). When two overlapping gestures share an articulator but impose contradictory requirements on this articulator, its activation is averaged or blended (Fowler, 1995; Saltzman & Munhall, 1989).

Each gesture has an individual degree of blending strength that is related to the demands placed on the vocal tract. When overlap occurs between gestures with the same blending strength, the outcome will be an average of the two competing influences. The total contribution of individual gestures to the movements of the vocal tract results from how well individual gestures blend and the relative activation levels of gestures (Fowler & Saltzman, 1993; Saltzman

& Munhall, 1989). Therefore, while the underlying linguistic requirements of a gesture are context-independent, context-dependency is automatically introduced by temporally and spatially overlapping gestures for which resulting movements are subject to principles of blending (Fowler & Saltzman, 1993). The combination of activity that is most efficient for a specific task depends on the ongoing conditions (Fowler, 1995; Saltzman & Munhall,

1989). Take for example an onset /b/ followed by /ɪ/. These segments will be characterized by a bilabial gesture, comprising the upper lip, lower lip and jaw for /b/, and an almost simultaneously starting vowel gesture for /ɪ/, containing the tongue body and jaw. The two gestures can overlap without many difficulties, because they share only the jaw (Keating, Lindblom, Lubker &


Kreiman, 1994). However, in the case of a following high vowel such as /ɪ/, the jaw will be in a higher position than with the lower vowel /ɑ/ (Recasens, 1999). When a /k/ and /ɪ/ overlap, the gestures again share the jaw, but they also share the tongue body. The /k/ will be fronted by the following /ɪ/; the location of the tongue body gesture is thus affected. The degree of articulatory variability depends, among other things, on the extent to which overlapping gestures share the same articulators (Fowler & Saltzman, 1993) and on the mechano-inertial properties of the articulators (Iskarous, Fowler & Whalen, 2010; Recasens & Espinosa, 2009).
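In its simplest equal-strength case, the blending principle described in this section amounts to a weighted average of the competing targets. The formula below is a deliberate simplification for illustration; in the full model, blending is handled inside the gestural dynamics rather than as a one-line average:

```python
def blend(targets, strengths):
    """Strength-weighted average of competing articulator targets."""
    total = sum(strengths)
    return sum(t * s for t, s in zip(targets, strengths)) / total

# Two overlapping gestures impose different jaw-height targets
# (hypothetical units). With equal blending strengths the jaw
# settles on the plain average of the two demands.
print(blend([0.2, 0.6], [1.0, 1.0]))  # approximately 0.4
# A stronger gesture pulls the outcome toward its own target:
print(blend([0.2, 0.6], [3.0, 1.0]))  # approximately 0.3
```

The second call illustrates why gestures with a higher blending strength dominate the resulting vocal tract shape.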

When bilabial consonants are produced, for instance, the tongue is free to move and will be influenced more by the surrounding vowels than when a velar or alveolar consonant is produced. This ability to resist disrupting influences from context is referred to as co-articulatory resistance (Fowler & Saltzman, 1993; Iskarous et al., 2010; Recasens & Espinosa,

2009). Some gestures are more prone to co-articulatory influences from surrounding speech segments than others. Gestures which are more co-articulatorily resistant are also the ones that are most aggressive in exerting their influence on other speech gestures (Fowler &

Saltzman, 1993; Recasens & Espinosa, 2009).

A very important characteristic of the inter-articulator coordination level is that explicit planning is not required: “in all cases, the articulatory movement patterns emerge as implicit consequences of the gesture-specific dynamical parameters” (Saltzman & Munhall, 1989, p.

341). Because of this characteristic, speech phenomena such as motor equivalence and compensatory articulation following unexpected mechanical perturbations have been successfully explained in the context of this model (Fowler & Saltzman, 1993; Kelso, et al., 1984; Saltzman et al., 1998). Coarticulation can be explained as being a natural perturbation of articulatory movement, similar to the effects of artificial perturbations applied in an experimental setting

(see Fowler & Saltzman, 1993). The model also provides a framework that can explain the error

patterns found in the recent kinematic and acoustic studies summarized earlier, as will be outlined in the following section.


1.5 Intrusions and reductions in Task Dynamics

The results on articulatory errors mentioned in section 1.2 can be explained within the framework of AP. For example, the gestural blending principle is nicely illustrated by the findings of the vowel error study by Laver (1980), which revealed that producing words with different vowels that require activating similar muscles never resulted in errors, whereas words with two vowels whose articulation triggered the use of different muscles resulted in blending of the two vowel gestures. Furthermore, as already mentioned, intrusions can be seen as resulting from overlapping simultaneously activated gestures, produced in-phase; reductions can be considered the result of reduced gestural activations.

The principles of coordination described in the model of Task Dynamics also provide a possible framework for explaining the increased number of gestural intrusions observed towards the end of a trial in terms of synchronizing oscillators (Goldstein et al, 2007). As gestures can be described as limit cycle oscillators, they adhere to the general behavior of movement described and modeled in dynamical systems theory (DST; see Van Lieshout, 2004 for a review of DST principles as applied to speech). The principle of coupled dynamical systems goes back to the

17th century, when Christiaan Huygens, a Dutch physicist, observed a set of pendulum clocks mounted on a common wall. Although the pendulum clocks started out with different relative frequencies and phases, they spontaneously synchronized after a while (see Goldstein et al., 2007). This synchronizing principle is common in many systems, including the system of movement organization observed in humans (Turvey, 1990), and is often referred to as entrainment or frequency locking. The main principle behind synchronizing movements is that certain modes of coordination are more stable than others and that a system will always converge or move towards the most stable mode (Kelso, 1995; Peper & Beek, 1999; Turvey, 1990). In-phase (0-degree) movements are more stable than out-of-phase (180-degree) movements, as are simpler

frequency ratios, such as 1:2 or 1:1 compared to 2:3 (Peper, Beek & Van Wieringen, 1995a;

Peper, Beek & Van Wieringen, 1995b).

How does this mechanism of entrainment translate to articulatory coordination in speech? In the word pair cop top, the consonants are realized with a tongue dorsum constriction for /k/, a lip closure for /p/, a tongue tip constriction for /t/, and a final lip closure for /p/. When gestures are seen as oscillators, one oscillator describes the dynamics of the tongue tip for /t/ and has a frequency of 1 occurrence per word pair, another one relates to the tongue dorsum for /k/, and also has a frequency of 1; in contrast, the oscillator for the lower lip in /p/ has a frequency that is twice the frequency for tongue tip and tongue dorsum. The temporal relationship between each of the tongue gestures (tongue tip and tongue dorsum) and lip closure gestures is thus characterized by a ratio of 1:2, which in terms of the oscillator model is less stable than a ratio of 1:1 and thus more difficult to acquire and maintain from a coordinating point of view.
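The convergence toward a stable in-phase mode can be illustrated with a pair of sine-coupled phase oscillators. This is a minimal Kuramoto-style sketch with arbitrary parameters, not the actual Task Dynamics planning model:

```python
import math

def final_relative_phase(w1, w2, coupling=1.0, dt=0.01, steps=5000):
    """Integrate two phase oscillators with symmetric sine coupling and
    return their final relative phase, wrapped to [-pi, pi]."""
    th1, th2 = 0.0, 2.0  # start well out of phase
    for _ in range(steps):
        d = th2 - th1
        th1 += (w1 + coupling * math.sin(d)) * dt
        th2 += (w2 - coupling * math.sin(d)) * dt
    d = th2 - th1
    return math.atan2(math.sin(d), math.cos(d))

# Identical frequencies (a 1:1 mode): the oscillators entrain in phase,
# so the relative phase decays to (almost) zero, like Huygens' clocks.
print(abs(final_relative_phase(2 * math.pi, 2 * math.pi)) < 0.01)
# -> True
```

With identical frequencies the coupling term pulls the relative phase to zero, which is the stable in-phase mode; more complex frequency ratios would correspond to weaker, less stable modes of coordination.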

Because a higher frequency reduces coupling strength (Kelso, 1995), this temporal relationship is less stable at a higher speaking rate. Driven by the two identical coda consonants, speakers add an extra tongue tip gesture during the tongue dorsum gesture in the onset of cop and/or an extra tongue dorsum gesture during the tongue tip gesture in the word top to stabilize their coordination.

Therefore, by adding a tongue tip gesture during the activation of the tongue dorsum for the velar onset consonant of the word cop in the word pair cop top, the coupling relationship between the tongue and lip gestures is reduced to a simpler and more stable 1:1 ratio. This triggering role of the coda consonant in adding an extra onset gesture has been substantiated by a kinematic study in which the syllable was either open or closed (Pouplier, 2008). Closed syllables triggered more intrusions than open syllables, confirming the triggering role of the coda consonant, which together with the alternating onset consonants forms a 1:2 frequency mode. From a phonological point of view, however, having a simultaneous non-target tongue tip gesture and a target tongue dorsum gesture for /k/ is not appropriate, because it conflicts with English phonotactics. The same applies to the /t/ realization for top, but with the roles reversed. The specific patterns of gestural coordination required in a specific language, in which more unstable couplings can occur, are learned during the course of speech development (Goldstein et al., 2007). In the Goldstein et al. (2007) study, an intrusion was often accompanied by a reduction of the target gesture, suggesting that the intruding gesture triggered a simultaneous reduction to correct for the conflicting simultaneous tongue tip and tongue dorsum gestures. Interestingly, the resulting patterns of intrusions resemble error patterns that are frequently observed in databases of naturally occurring speech errors: onsets of words typically switch with onsets of other words (Shattuck-Hufnagel, 1992), and words that share their rhyme but differ in their onsets participate more often in errors, a phenomenon known as the repeated phoneme effect (e.g., Dell & Reich, 1980; Dell, 1984).

The fact that the system shifts its mode for the tongue gestures towards the higher frequency of the bilabial gestures is consistent with data from bimanual finger tapping experiments. In a study in which skilled drummers performed polyrhythmic patterns, Peper et al. (1995b) showed that the hand tapping the higher frequency influenced the slow hand more than vice versa, especially at a higher tapping rate. When the system operates under greater speed demands, the overall coupling strength is reduced. Under fast conditions, modes with more complex frequency ratios, such as 1:2, transform more quickly into modes with the simpler oscillator frequency ratio of 1:1 (Peper & Beek, 1999; Peper et al., 1995). This effect of rate on coupling strength was also found by Goldstein et al. (2007): intruding gestures were more frequent when speakers produced the trials at a higher speaking rate. At the start of the trial it was still possible to maintain the appropriate 1:2 ratio; towards the end of the trial, however, more errors were found, indicating a shift towards the 1:1 mode. An intrusion in these types of tasks can thus be seen as resulting from the interplay between intrinsically stable modes of frequency locking and the learned patterns of more complex gestural coordination modes that conform to phonological requirements. Learning and attention can stabilize movement coordination to a certain extent, and can thus prevent the system from shifting to simpler, that is, more basic coordination modes (Kelso, 1995; Kelso & Zanone, 2002).
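The breakdown of a coordination mode under speed pressure can be sketched with the standard reduced phase equation for two frequency-locked oscillators, dφ/dt = Δ − K·sin φ, where Δ is the frequency detuning and K the coupling strength. This is an illustrative textbook model, not the dissertation's own; the idea it captures is that when rate drives the detuning past the (fixed or weakened) coupling strength, the locked mode can no longer be held and the relative phase slips.

```python
import math

def relative_phase_drift(detuning, K=1.0, dt=0.01, steps=5000, phi0=0.0):
    """Euler-integrate dphi/dt = detuning - K*sin(phi), the reduced
    equation for the relative phase of two coupled oscillators.
    If |detuning| <= K a stable fixed point exists and phi settles;
    otherwise phi drifts without bound (repeated phase slips)."""
    phi = phi0
    for _ in range(steps):
        phi += dt * (detuning - K * math.sin(phi))
    return phi

# Slow rate: detuning well below coupling strength -- the mode holds
# and the relative phase settles on a constant value.
phi_slow = relative_phase_drift(detuning=0.5)

# Fast rate: detuning now exceeds coupling strength -- the phase slips
# repeatedly, the analogue of the 1:2 pattern collapsing towards 1:1.
phi_fast = relative_phase_drift(detuning=1.5)
```

In the slow case the phase converges to the fixed point sin φ* = Δ/K; in the fast case no fixed point exists and φ grows past 2π within the simulated interval.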

Entrainment is considered a result of autonomous oscillatory dynamics, which work irrespective of the properties of the components and occur at every level (Kelso, 1995; Simko & Cummins, 2010). The hypothesis that intrusions and reductions result from a shift to a more stable pattern suggests that they originate at the level of inter-gestural planning oscillators (see p. 19): the 1:1 coupling mode is a result of stabilizing the relative phases of oscillators (Nam et al., 2009). Goldstein et al. (2007) also speculate in their discussion “that these errors and their oscillatory dynamics that underlie them are occurring at the speech planning level (Goldstein et al. 2006; Saltzman, et al., 2006) rather than purely at the level of low-level articulatory execution” (p. 408). The participants in their study reported that the intrusions and reductions also occurred in their head during silent speech, without actual movements of the articulators. The authors took this observation as support for the role of dynamics in speech planning, just like the phonemes or features in the studies by Dell and Repka (1992) and Postma and Noordanus (1996).

The question thus arises as to how the vowel effect found in the Goldstein et al. (2007) study is accounted for. It seems that co-articulatory constraints influence the degree to which the system entrains. Studies show that, besides frequency and rate, the degree to which a system is stable also depends on neuromuscular and biomechanical properties (Carson, 1996; Lagarde & Kelso, 2006; Peper & Beek, 1999; Temprado, Chardenon & Laurent, 2001). By manipulating positions of the forearm (supine, neutral, and prone) and pacing frequency in a finger-tapping paradigm, Carson (1996) showed that lower level neuromuscular factors such as muscle length affect the stability of finger coordination. Lagarde and Kelso (2006) found that flexion and extension of finger movement interacted with auditory and tactile information: “flexion” combined with “sound” and “extension” combined with “touch” were both more stable than other combinations. Temprado et al. (2001) confirmed that biomechanical constraints at the peripheral level can affect the bimanual coordination dynamics of limbs. In line with these findings, it is hypothesized that the degree of intrusions and reductions is influenced by co-articulatory constraints as well. In this regard, it is less important at what exact level oscillatory dynamics occur; the focus of this dissertation is rather whether dynamical principles and co-articulatory constraints, both measured at the articulatory level, interact. Oscillatory dynamics occur at all levels, from the neural processes in the brain to the peripheral level of muscle activation and limb movement (Kelso, 1995). As such, even when intrusions and reductions originate at the speech planning level, it is hypothesized that lower level co-articulatory constraints at the inter-articulator level, such as blending strength, can influence the resulting coordination pattern of articulatory movements. The finding that /s/ resulted in a /ʃ/ production more frequently than vice versa, which has been explained as the result of the physical coupling between tongue tip and tongue dorsum possibly triggering an intrusion of the tongue body, points in that direction.

The vowel asymmetry found by Goldstein et al. (2007) also suggests that co-articulatory factors trigger intrusions.

Besides biomechanical factors influencing coordination dynamics, studies have revealed that entrainment can also arise between a subject and the environment (Beek, Peper & Stegeman, 1995; Van Lieshout, 2004). An often observed effect is so-called "anchoring" with the environment, in which movements entrain with, for example, a metronome (Fink, Foo, Jirsa & Kelso, 2000; Repp, 2005). Both visual (Bogaerts, Buekers, Zaal & Swinnen, 2003; Roerdink, Peper & Beek, 2005) and auditory information (Lagarde & Kelso, 2006; Namasivayam, Van Lieshout, McIlroy & de Nil, 2009) have been shown to stabilize coordination dynamics. For example, hands start oscillating in-phase at increased rates under the pressure of an external stimulus such as an oscillating target signal (see Peper & Beek, 1998). Roerdink et al. (2005) found significant effects of visual feedback on the stability of coordination between unimanual hand motion and an external target stimulus. Lagarde and Kelso (2006) showed that both auditory and tactile stimuli stabilized flexion and extension movements of the fingers.

Additional evidence that auditory information stabilizes speech production is the observation that speakers deprived of auditory information adjust their speech by speaking more loudly and slowly and, consequently, employ larger speech movement amplitudes (e.g., Huber & Chandrasekaran, 2006; Namasivayam & Van Lieshout, 2011). These adjustments suggest that the system is trying to stabilize the movements by enabling stronger kinesthetic feedback entrainment with neural oscillators when the auditory source is eliminated (Namasivayam & Van Lieshout, 2011; Namasivayam et al., 2009; Ridderikhoff, Peper & Beek, 2007; Van Lieshout, Hulstijn & Peters, 2004). Kinesthetic signals are coupled with neural oscillators so that the output of a given system is stabilized (Van Lieshout et al., 2004; Williamson, 1998).

The effect of these compensatory mechanisms, based in particular on proprioceptive loops, is hypothesized to be reduced under certain speaking conditions, such as speaking faster and using smaller movement ranges, which lead to decreased stability of movement output and coordination. In line with these assumptions, Ridderikhoff et al. (2007) showed that stronger kinesthetic feedback signals stabilized movement coordination. These researchers instructed participants to coordinate movements of the right hand with a motor-driven left hand, either in-phase or out-of-phase. Participants had to either activate the muscles of the left hand or keep these muscles as relaxed as possible. In the active condition, timing error corrections were more pronounced, suggesting that expected sensory sensation provides an extra reference for error correction (Ridderikhoff et al., 2007). Smaller movements are expected to result in a decreased feedback gain, which at some point may impact coupling strength and increase coordination variability (Van Lieshout et al., 2004).

Additional evidence for bi-directionality between different levels of speech production has been found in studies in which articulators were perturbed (Saltzman, 1992; Saltzman et al., 1998). Perturbing the lower lip while speakers produced sequences such as /pæpæpæ/ and /pə'sæpæpl/ permanently changed the relative phasing between the lower lip and the laryngeal gestures, suggesting that inter-gestural timing and inter-articulatory dynamics are coupled in a bidirectional manner. A change in articulatory state thus adjusts the gestural activation pattern, providing evidence for a feedback loop from the inter-articulatory level back to the gestural level.

Although this evidence suggests a possible role for feedback from the inter-articulatory level to the gestural level, no feedback loops, proprioceptive or auditory, have been specified in the current Task Dynamics model of AP that could account for their possible effects on speech coordination. In AP, the goals of speech production are gestural targets (Fowler, 2007; Goldstein & Fowler, 2003), which structure the acoustic speech signal. Evidence for this is the fact that infants, for example, can imitate vowel productions from an adult although the acoustic result is not the same. The hypothesis is that these infants produce and perceive “distal events” in the form of gestures (Goldstein & Fowler, 2003). Listeners, in turn, retrieve the gestural information from the acoustic signal (e.g., Goldstein & Fowler, 2003). Interestingly, studies have shown that speakers adjust their articulations immediately after an articulator has been disturbed, a mechanism known as motor equivalence (Kelso et al., 1984; Saltzman et al., 1998). Given that the adjustments observed in the perturbation studies are fast, i.e., within 20-30 ms after the perturbation, it is unlikely that auditory information is involved in correcting for it.

This raises the question of what role auditory information plays in speech coordination and whether auditory feedback should be specified at all within the model of Task Dynamics and AP.

Error studies show that auditory feedback plays a significant role in error detection and correction (Postma, 2000; Postma & Noordanus, 1996) and that auditory information is used to maintain speech coordination (Lackner & Tuller, 1979). However, little is known about how the presence of auditory information controls coordination dynamics to prevent speech errors. Postma and Kolk (1992) and Postma, Kolk and Povel (1991) revealed that noise-masked and unmasked speech resulted in a similar number of phonemic speech errors, which suggests that auditory information does not affect the number of speech errors produced. However, because the results of these studies are, again, based on perceptual transcription data (see 1.2 for the limitations of analyses based on perceptual judgments), more data are needed to investigate how auditory information affects the more fine-grained kinematic patterns observed with intrusions and reductions.


1.6 Studies and Hypotheses

The dissertation presents data from three studies which investigate whether co-articulatory constraints and available auditory information influence the coupling strength between gestures and consequently affect the number of intrusions and reductions.

1.6.1 Study 1

Before intrusions and reductions can be interpreted and linked to certain lower level articulatory mechanisms, more quantitative data are needed to identify general co-articulatory properties of speech that may affect the distribution of normal articulatory variability in typical speech error tasks (see also Frisch, 2007). Therefore, the main goal of the first study, described in chapter 2, is to investigate factors that contribute to normal variability in movement patterns resulting from word pairs produced in a repetitive speech task. The study quantifies how different types of word pairs (i.e., cop top versus top top) and phonetic contexts (i.e., different consonant-vowel-consonant combinations such as cop top versus kip tip) affect movement range and normal variability of speech movements in repetitive speech. The stimuli consist of CVC-CVC word pairs with alternating onset consonants (i.e., cop top) and CVC-CVC word pairs in which the onsets are identical (i.e., top top). Four different vowels are selected based on their different locations within the vowel space: a high back vowel /u/, a low back vowel /ɑ/, a high front vowel /ɪ/ and a lower front vowel /æ/⁵. In addition to the two tongue articulators “tongue dorsum” and “tongue tip”, the articulator “lower lip” is selected, as this articulator involves a different organ. Speakers are instructed to produce these word pairs in a repetitive speech task.

⁵ These 4 vowels can be distinguished along the front-back and high-low dimensions of the vocal tract. However, vowels in English are also frequently categorized as being tense and lax. This vowel feature, however, is controversial and describing it

It is predicted that:

1. Movement range and variability values of articulator movements in word pairs with alternating onset consonants will differ from values measured during word onsets with identical onsets, based on the observations described in section 1.2 (Goldrick & Blumstein, 2006; Goldstein et al., 2007; Stearns, 2006). More specifically, based on the observations of Frisch (2007) and Stearns (2006), it is predicted that movement ranges of articulators will be larger in word pairs with different onsets than in word pairs with identical onsets.

2. Furthermore, based on evidence that shows that the type of consonant cluster restricts articulator movements to different degrees (e.g., Recasens, Pallares & Solanas, 1993), the combination of a coda consonant and the following onset (e.g., t#p in the word pair cot pot or p#t in the word pair cop top) is expected to influence the movements of the target and non-target articulators during the onset as well.

with exact physiological or acoustic correlates is problematic (Durand, 2005; Stevens, 2000). A few characteristics, however, seem to be consistent: only lax vowels can occur before /ŋ/, and lax vowels can never occur in open syllables (see Durand, 2005 for a review). Based on these characteristics, the vowels in words like kip, cap, and cop can be categorized as “lax” and the vowel in toop as tense. When the results suggest that the lax-tense distinction might have been a factor, this will be discussed. The vowels in cop top and tip kip were selected for a direct comparison with the same stimuli used in Goldstein et al. (2007). The other vowels were added to create a high-low and front-back distinction.


1.6.2 Study 2

The second study, described in chapter 3, examines whether the relative position of an articulator influences the occurrence of intrusions and reductions. The word pairs with alternating onsets collected during the first study form the stimuli for this second study as well.

The different CVC-CVC combinations are hypothesized to result in different degrees of blending strength and resistance between the gesture forming the onset consonant and the gesture forming the vowel, and between the gesture forming the onset and the non-target articulator; these differences are predicted to affect the patterns of intrusions and reductions.

1. Based on the data in Goldstein et al. (2007), it is hypothesized that a high vowel context results in more intrusions than a low vowel context, because the tongue articulator is already in a high, constrained position during the onset consonants.

2. In addition, as the tongue articulators are free to move during a bilabial closure but are anatomically attached to each other, it is hypothesized that the number of intrusions of the tongue articulators (tongue tip as well as tongue dorsum) during a bilabial closure differs from the number of intrusions of the tongue dorsum during a tongue tip constriction and vice versa. By a similar line of reasoning, the number of reductions of bilabial closures is predicted to differ from that of the tongue articulators. How exactly coupling strength is affected cannot be predicted at this point: it is possible that, when an articulator is not constrained, it entrains more easily with another articulator. However, it can also be predicted that, because of mechanical limitations, articulators entrain faster when they are constrained by linkage.


1.6.3 Study 3

The final study, described in chapter 4, investigates whether (a lack of) auditory information contributes to patterns of intrusions and reductions. The effect of auditory information on the number of intrusions and reductions is investigated by masking speakers' auditory feedback with white noise. A different set of data is collected for this study with a new group of participants.

The hypothesis tested is that auditory information serves as an external stabilizing cue in speech movement coordination (see e.g., Lagarde & Kelso, 2006; Namasivayam et al., 2009; Repp, 2005). Consequently, it is predicted that the 1:2 mode of coordination is more easily maintained when auditory information is available.

1.6.4 Methods

Movement data for both stimulus sets were collected with Electromagnetic Articulography (EMA), which allows for 3D recordings of articulatory movements inside and outside the vocal tract. The measurements of the EMA system are based on electromagnetic fields generated by 6 transmitters, driven at different frequencies and attached to a cube in a specific configuration. The head of the participant is located inside this cube. On each of the participant's articulators of specific interest (see Figure 1-3), small sensor coils are attached with surgical glue (either PeriAcryl or Cyanoveneer). When the sensor coils are placed in the magnetic field, a current is induced whose strength depends on the distance of the coil to each of the transmitters. For the purpose of both studies, sensor coils are placed on the mid-sagittal vermilion border of the upper and lower lip, the tongue tip (1 cm behind the apex), the tongue body (3 cm behind the tongue tip coil), and the tongue dorsum. The tongue dorsum coil is placed as far back as the participant tolerates the sensor. Chapter 2 describes the data processing in more detail.
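The general principle behind recovering a coil's 3D position from its distances to several transmitters can be sketched as a small least-squares trilateration. This is an illustrative reconstruction under invented assumptions, not the actual calibration procedure of the EMA system: the transmitter coordinates below are hypothetical, and a real system works from induced voltages rather than clean distances.

```python
import numpy as np

# Hypothetical transmitter positions (cm) arranged around the head;
# the real EMA geometry differs -- these values are for illustration.
TRANSMITTERS = np.array([
    [ 10.0,  10.0,  10.0],
    [ 10.0, -10.0,  10.0],
    [-10.0,  10.0,  10.0],
    [ 10.0,  10.0, -10.0],
    [-10.0,  10.0, -10.0],
    [-10.0, -10.0, -10.0],
])

def distances(p):
    """Euclidean distance from point p to each transmitter."""
    return np.linalg.norm(TRANSMITTERS - p, axis=1)

def locate(measured, guess=(0.0, 0.0, 0.0), iters=25):
    """Gauss-Newton least squares: find the coil position whose
    transmitter distances best match the measured distances."""
    p = np.asarray(guess, dtype=float).copy()
    for _ in range(iters):
        d = distances(p)
        residual = measured - d
        # Jacobian of d_i with respect to p is (p - t_i) / d_i.
        J = (p - TRANSMITTERS) / d[:, None]
        step, *_ = np.linalg.lstsq(J, residual, rcond=None)
        p += step
    return p

# Recover a known position from its (noise-free) distances.
true_pos = np.array([1.0, 2.0, 3.0])
estimate = locate(distances(true_pos))
```

With six well-spread transmitters the position is heavily over-determined, which is also what makes the measurement robust to noise in practice.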

The current dissertation takes a statistical approach to defining intrusions and reductions, which differs in some respects from the earlier studies described in section 1.3. This is not a trivial matter: since the rise of kinematic and acoustic studies, one of the major challenges has been to define what part of movement variability can be quantified as an error and what part belongs to normal variability in a data set. This issue applies to all types of acoustic or kinematic data, whether it involves voicing or articulatory movements of the tongue and the lips. Consequently, researchers have defined errors in these studies in a variety of ways, mainly approaching errors statistically. The most important difference is whether or not a set of control stimuli is used. Whereas Mowrey and MacKay (1990) and Frisch and Wright (2002) did not specify a quantitative independent measure, Goldrick and Blumstein (2006) and the series of studies by Pouplier and colleagues (2003; 2007; 2008) used a set of control stimuli to quantify an error threshold. However, the use of control stimuli is avoided in the studies in this dissertation because of their different co-articulatory properties compared to stimuli with alternating onsets, as shown by the findings in chapter 2. Intrusions and reductions in chapters 3 and 4 are therefore defined based on the normally distributed variability of articulatory movements within a trial, where each articulator has both target and non-target activations: the outliers form the intrusions and reductions. Both studies calculate intrusions and reductions based on medians and median absolute deviations (MAD) of the movement ranges of individual articulators within a trial.

A movement range value two or more MADs higher than the median non-target movement range is labeled an intrusion, and a value two or more MADs lower than the target median is labeled a reduction. How the movement ranges of target and non-target articulators are determined, and how intrusions and reductions are defined, is described in detail in chapters 2 and 3.
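The median/MAD criterion just described can be sketched as follows. The function and variable names, the threshold factor k = 2, and the toy movement ranges are illustrative inventions; the actual per-trial, per-articulator analysis is detailed in chapters 2 and 3.

```python
from statistics import median

def mad(values):
    """Median absolute deviation around the median."""
    m = median(values)
    return median([abs(v - m) for v in values])

def label_movements(target_ranges, nontarget_ranges, k=2.0):
    """Label outliers per the criterion in the text: a non-target
    movement range k or more MADs above the non-target median is an
    intrusion; a target range k or more MADs below the target median
    is a reduction.  Returns (intrusion_indices, reduction_indices)."""
    nt_med, nt_mad = median(nontarget_ranges), mad(nontarget_ranges)
    t_med, t_mad = median(target_ranges), mad(target_ranges)
    intrusions = [i for i, r in enumerate(nontarget_ranges)
                  if r >= nt_med + k * nt_mad]
    reductions = [i for i, r in enumerate(target_ranges)
                  if r <= t_med - k * t_mad]
    return intrusions, reductions

# Toy movement ranges (mm) for one articulator within one trial:
# the target list contains one clearly reduced closure, the
# non-target list one clearly intruding movement.
target = [8.0, 8.2, 7.9, 8.1, 2.0, 8.0, 7.9]
nontarget = [1.0, 1.02, 0.98, 1.01, 0.99, 4.0, 1.03]
intr, red = label_movements(target, nontarget)
```

The median and MAD are used instead of the mean and standard deviation precisely because they are robust: a single large intrusion barely shifts the threshold it is tested against.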

Figure 1-3. The position of 8 of the 12 coils: the tongue dorsum, tongue tip and lower lip are of interest for the two studies. Four more coils are located behind the two ears and at both sides of the mouth (part of the figure is from: http://www.asel.udel.edu/speech/tutorials/production/struct.htm).


1.7 References

Baars, B., & Motley, M. (1976). Spoonerisms as sequencer conflicts: Evidence from artificially

elicited errors. The American Journal of Psychology, 89(3), 467-484.

Beek, P. J., Peper, C. E., & Stegeman, D. (1995). Dynamical models of movement coordination.

Human Movement Science, 14, 573-608.

Bogaerts, H., Buekers, M. J., Zaal, F. T., & Swinnen, S. P. (2003). When visuo-motor

incongruence aids motor performance: the effect of perceiving motion structures during

transformed visual feedback on bimanual coordination. Behavioural Brain Research, 138,

45-57.

Boucher, V. (1994). Alphabet-related biases in psycholinguistic enquiries: Considerations for

direct theories of speech production and perception. Journal of Phonetics, 22(1), 1-18.

Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. In Colin Ewen &

John Anderson (Eds.), Phonology Yearbook (pp. 219-252). Cambridge University press.

Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units.

Phonology, 6, 201-251.

Browman, C. P., & Goldstein, L. (1990). Representation and Reality: Physical Systems and

Phonological Structure. Haskins Laboratories Status Report on Speech Research,

105/106, 83-92.

Browman, C. P., and Goldstein, L. (1992), Articulatory phonology: An overview. Phonetica,

49(3-4), 155-180.


Buchwald, A., & Miozzo, M. (2012). Phonological and motor errors in individuals with

acquired sound production impairment. Journal of Speech Language and Hearing

Research, 55(5), 1573-1586.

Buckingham, H., & Yule, G. (1987). Phonemic false evaluation: theoretical and clinical aspects.

Clinical Linguistics and Phonetics, 1(2), 113-125.

Butterworth, B., & Whittaker, S. (1980). Peggy Babcock's relatives. In George E. Stelmach &

Jean Requin (Eds.), Tutorials in Motor Behavior, Vol. 1 (pp. 647-656). Amsterdam: North

Holland.

Carson, R. (1996). Neuromuscular-skeletal constraints upon the dynamics of perception-action

coupling. Experimental Brain Research, 110, 99-110.

Cohen, A. (1980). Correcting of Speech Errors in a Shadowing Task. In V. A. Fromkin (Ed.),

Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen and Hand (pp. 157-163).

N. Y., London: Academic Press.

Corley, M., Brocklehurst, P., & Moat, H. (2011). Error biases in inner and overt speech:

Evidence from tongue twisters. Journal of Experimental Psychology: Learning, Memory

and Cognition, 37(1), 162-175.

Dell, G. S. (1980). Phonological and lexical encoding in speech production: an analysis of

naturally occurring and experimentally elicited speech errors (Unpublished doctoral

dissertation), University of Toronto, Toronto.


Dell, G. S. (1984). Representation of serial order in speech: Evidence from the repeated

phoneme effect in speech errors. Journal of Experimental Psychology: Learning, Memory,

and Cognition, 10(2), 222-233.

Dell, G. S. (1988). The retrieval of phonological forms in production: Tests of predictions from

a connectionist model. Journal of Memory and Language, 27, 124-142.

Dell, G. S., & Reich, P. A. (1980). Towards a Unified Model of Slips of the Tongue. In V. A.

Fromkin (ed), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen and Hand

(pp. 273-286). N. Y., London: Academic Press.

Dell, G. S., & Repka, R. R. (1992). Errors in inner speech. In Bernard J. Baars (ed.)

Experimental Slips and Human Error: Exploring the Architecture of Volition (pp. 237-

262), Plenum Press, New York.

Dell, G. S., & Sullivan, J. M. (2004). Speech errors and language production:

neuropsychological and connectionist perspectives. The Psychology of Learning and

Motivation, 44, 63-108.

Durand, J. (2005). Tense/lax, the vowel system of English and phonological theory. In P. Carr,

J. Durand, & C. Ewen (Eds.), Headhood, Elements, Specification and Contrastivity (pp. 77-

98), Amsterdam: John Benjamins.

Fink, P., Foo, P., Jirsa, V., & Kelso, J. A. S. (2000). Local and global stabilization of

coordination by sensory information. Experimental Brain Research, 134, 9-20.

Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of Phonetics,

8(113), 133.


Fowler, C. A. (1995). Speech production. In Joanne L. Miller & Peter D. Eimas (Eds.), Speech,

Language, and Communication (pp. 29-61). Academic Press.

Fowler, C. A. (2007). Speech production. In M. G. Gaskell (Eds.), The Oxford Handbook of

Psycholinguistics (pp. 489-501). Oxford University Press.

Fowler, C. A., & Saltzman, E. L. (1993). Coordination and coarticulation in speech production.

Language and Speech, 36, 171-195.

Frisch, S. A. (2007). Walking the Tightrope between Cognition and Articulation: The State of

the Art in the Phonetics of Speech Errors. In C. T. Schutze and V. S. Ferreira (Eds), MIT

Working Papers in Linguistics, Vol. 53. The State of the Art in Speech Error Research

(pp.155-171). Cambridge, MA.

Frisch, S. A., & Wright, R. (2002). The phonetics of phonological speech errors: An acoustic

analysis of slips of the tongue. Journal of Phonetics, 30(2), 139-162.

Fromkin, V. (1971). The non-anomalous nature of anomalous utterances. Language: Journal of

the Linguistic Society of America, 47(1), 27-52.

Gafos, A. I., & Benus, S. (2006). Dynamics of Phonological Cognition. Cognitive Science, 30,

905-943.

Goldrick, M., & Blumstein, S. E. (2006). Cascading activation from phonological planning to

articulatory processes: Evidence from tongue twisters. Language & Cognitive Processes,

21(6), 649-683.


Goldrick, M., & Chu, K. (2013). Gradient co-activation and speech error articulation: comment on Pouplier and Goldstein (2010). Language and Cognitive Processes, Advance online publication. doi:10.1080/01690965.2013.807347

Goldrick, M., Ross Baker, H., Murphy, A., & Baese-Berk, M. (2011). Interaction and

representational integration: Evidence from speech errors. Cognition, 121, 58-72

Goldstein, L., Byrd, D., & Saltzman, E. L. (2006). The Role of Vocal tract gestural action units

in understanding the evolution of phonology. In Michael Arbib (Eds.), From Action to

Language: The Mirror Neuron System (pp. 215-249). Cambridge university press.

Goldstein, L., & Fowler, C. A. (2003). Articulatory Phonology: A Phonology for Public

Language Use. In N. Schiller & A. Meyer (Eds.), Phonology and Phonetics, Vol. 6 (pp.

159-207). Berlin: Mouton de Gruyter.

Goldstein, L., Pouplier, M., Chen, L., Saltzman, E. L., & Byrd, D. (2007). Dynamic action units

slip in speech production errors. Cognition, 103(3), 386-412.

Guest, D. (2002). Phonetic features in language production: an experimental examination of

phonetic feature errors (Unpublished doctoral dissertation), University of Illinois, Urbana-

Champaign.

Haken, H., Kelso, J. A. S., & Bunz, H. (1985). A theoretical model of phase transitions in

human hand movements. Biological Cybernetics, 51, 347-356.

Hockett, C. (1967). Where the tongue slips, there slip I. To Honor Roman Jakobson, Vol. II.

Janua Linguarum, 32, 910-936.


Huber, J., & Chandrasekaran, B. (2006). Effects of increasing sound pressure level on lip and

jaw movement parameters and consistency in young adults. Journal of Speech Language

and Hearing Research, 49, 1368-1379.

Iskarous, K., Fowler, C., & Whalen, D. (2010). Locus equations are an acoustic expression of

articulator synergy. Journal of the Acoustical Society of America. 128(4), 2021-2032.

Keating, P. A., Lindblom, B., Lubker, J., & Kreiman, J. (1994). Variability in jaw height for

segments in English and Swedish VCVs. Journal of Phonetics, 22, 407-422.

Kelso, J. A. S. (1995). Dynamic Patterns. MIT Press.

Kelso, J. A. S., Tuller, B., Vatikiotis-Bateson, E., & Fowler, C. A. (1984). Functionally specific

articulatory cooperation following jaw perturbations during speech: Evidence for

coordinative structures. Journal of Experimental Psychology: Human Perception and

Performance, 10, 812-832.

Kelso, J. A. S., & Zanone, P. G. (2002). Coordination dynamics of learning and transfer across different effector systems. Journal of Experimental Psychology: Human Perception and

Performance, 28(4), 776-797.

Kent, R. (1996). Hearing and believing: Some limits to the auditory-perceptual assessment of

speech and voice disorders. American Journal of Speech Language Pathology, 5(3), 7-23.

Kochetov, A., & Radisic, M., (2009). Latent consonant harmony in Russian: Experimental

evidence for Agreement by Correspondence. In Maria Babyonyshev, Darya Kavitskaya,

& Jodi Reich (Eds.), Proceedings of the Seventeenth Formal Approaches to Slavic


Linguistics (FASL) (pp. 111-130). Ann Arbor, MI: Jindrich Toman’s Michigan Slavic

Publications.

Koenig, L. (2004). Towards a physical definition of the vowel systems of . In Victor

H. Yngve & Zdzislaw Wasik (Eds.), Hard-Science Linguistics (pp. 49-66). Continuum.

Lackner, J., & Tuller, B. (1979). Role of efference monitoring in the detection of self-produced

speech errors. In M. Cortese & C. Edward (Eds.), Sentence Processing: Psycholinguistic

Studies Presented to Merrill Garrett (pp. 281-294). Hillsdale, NJ: Lawrence Erlbaum

Associates.

Lagarde, J., & Kelso, J. A. S. (2006). Binding movement, sound and touch: multimodal

coordination dynamics. Experimental Brain Research, 173, 673-688.

Laver, J. D. M. (1980). Slips of the tongue as neuromuscular evidence for a model of speech

production. In Hans W. Dechert & Manfred Raupach (Eds.), Temporal Variables in

Speech. Studies in Honour of Frieda Goldman-Eisler (pp. 21-26). The Hague: Mouton.

Levelt, W., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production.

Behavioral and Brain Sciences, 22, 1-75.

Livesay, J., Ashley, L., & Samaras, M. (1996). Covert speech behavior during a silent language

recitation task. Perceptual & Motor Skills, 83, 1355-1362.

Marin, S., Pouplier, M., & Harrington, J. (2010). Acoustic consequences of articulatory

variability during productions of /t/ and /k/ and its implications for speech error research.

Journal of the Acoustical Society of America, 127(1), 445-461.


McMillan, C. T., & Corley, M. (2010). Cascading influences on the production of speech:

Evidence from articulation. Cognition, 117, 243-260.

McMillan, C. T., Corley, M., & Lickley, R. (2009). Articulatory evidence for feedback and

competition in speech production. Language & Cognitive Processes, 24(1), 44-66.

Meyer, A. S. (1992). Investigation of phonological encoding through speech error analyses:

Achievements, limitations, and alternatives. Cognition, 42, 181-211.

Moller, J., Jansma, B., Rodriguez-Fornells, A., & Munte, T. (2007). What the brain does before

the tongue slips. Cerebral Cortex, 17, 1173-1178.

Mowrey, R., & MacKay, I. (1990). Phonological primitives: Electromyographic speech error

evidence. Journal of the Acoustical Society of America, 88(3), 1299-1312.

Nam, H., & Saltzman, E. L. (2003). A competitive, coupled oscillator model of syllable structure. In M.

J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the XIIth International

Congress of Phonetic Sciences, Vol. 3 (pp. 2253-2256), Barcelona.

Namasivayam, A. K., & Van Lieshout, P. (2011). Speech motor skill and stuttering. Journal of

Motor Behavior, 43(6), 477-489.

Namasivayam, A. K., Van Lieshout, P., McIlroy, W., & de Nil, L. (2009). Sensory feedback

dependence hypothesis in persons who stutter. Human Movement Science, 28, 688-707.

Nooteboom, S. (1969). The tongue slips into patterns. In NOMEN: Leyden Studies in Linguistics

and Phonetics (pp. 114-132). The Hague: Mouton.


Nooteboom, S., & Quene, H. (2007). Strategies for Editing out Speech Errors in Inner Speech.

Proceedings of the XVIth International Congress of Phonetic Sciences (pp.

1945-1948), Saarbrücken, Germany.

Parrell, B. (2012). The role of gestural phasing in Western Andalusian Spanish aspiration.

Journal of Phonetics, 40(1), 37-45.

Peper, C. E., & Beek, P. J. (1998). Distinguishing between the effects of frequency and

amplitude on interlimb coupling in tapping a 2:3 polyrhythm. Experimental Brain

Research, 118, 78-92.

Peper, C. E., & Beek, P. J. (1999). Modeling rhythmic interlimb coordination: The roles of

movement amplitude and time delays. Human Movement Science, 18, 263-280.

Peper, C. E., Beek, P. J., & Van Wieringen, P. C. W. (1995a). Coupling strength in tapping a 2:3

polyrhythm. Human Movement Science, 14, 217-245.

Peper, C. E., Beek, P. J., & Van Wieringen, P. C. W. (1995b). Multifrequency coordination in

bimanual tapping: Asymmetrical coupling and signs of supercriticality. Journal of

Experimental Psychology: Human Perception and Performance, 21(5), 1117-1138.

Pirello, K., Blumstein, S. E., & Kurowski, K. (1997). The characteristics of voicing in syllable-

initial fricatives. Journal of the Acoustical Society of America, 101(6), 3754-3765.

Postma, A. (2000). Detection of errors during speech production: A review of speech

monitoring models. Cognition, 77, 97-131.


Postma, A., & Kolk, H. (1992). The effects of noise masking and required accuracy on speech

errors, disfluencies, and self-repairs. Journal of Speech and Hearing Research, 35(3), 537-

544.

Postma, A., Kolk, H., & Povel, D. (1991). Disfluencies as Resulting from Covert Self-Repairs

applied to Internal Speech Errors. In H. F. M. Peters, W. Hulstijn & C. W. Starkweather

(Eds.), Speech Motor Control and Stuttering (pp. 141-147). Elsevier Science Publishers

B.V.

Postma, A., & Noordanus, C. (1996). Production and detection of speech errors in silent,

mouthed, noise-masked, and normal auditory feedback speech. Language and Speech,

39(4), 375-392.

Pouplier, M. (2003). Units of Phonological Encoding: Empirical evidence. (Unpublished

doctoral dissertation), Yale University, New Haven.

Pouplier, M. (2007). Tongue kinematics during utterances elicited with the SLIP technique.

Language and Speech, 50(3), 311-341.

Pouplier, M. (2008). The role of a coda consonant as error trigger in repetition tasks. Journal of

Phonetics, 36, 114-140.

Pouplier, M., & Goldstein, L. (2005). Asymmetries in the perception of speech production

errors. Journal of Phonetics, 33(1), 47-75.

Pouplier, M., & Goldstein, L. (2010). Intention in articulation: Articulatory timing in alternating

consonant sequences and its implications for models of speech production. Language and

Cognitive Processes, 25(5), 616-649.


Pouplier, M., & Goldstein, L. (2013). The relationship between planning and execution is more

than duration: response to Goldrick & Chu. Language and Cognitive Processes, Advance

online publication. DOI:10.1080/01690965.2013.834063

Pouplier, M., & Hardcastle, W. (2005). A re-evaluation of the nature of speech errors in normal

and disordered speakers. Phonetica, 62(2-4), 227-243.

Recasens, D. (1999). Lingual coarticulation. In W. Hardcastle & N. Hewlett (Eds.),

Coarticulation (pp. 80-104). Cambridge: Cambridge University Press.

Recasens, D., & Espinosa, A. (2009). An articulatory investigation of lingual coarticulatory

resistance and aggressiveness for consonants and vowels in Catalan. Journal of the

Acoustical Society of America, 125(4), 2288-2298.

Recasens, D., Pallares, M., & Solanas, A. (1993). An electropalatographic study of stop

consonants. Speech Communication, 12, 335-355.

Repp, B. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic

Bulletin and Review, 12(6), 969-992.

Ridderikhoff, A., Peper, C. E., & Beek, P. J. (2007). Error correction in bimanual coordination

benefits from bilateral muscle activity: evidence from kinesthetic tracking. Experimental

Brain Research, 181, 31-48.

Roerdink, M., Peper, C. E., & Beek, P. J. (2005). Effects of correct and transformed visual

feedback on rhythmic visuo-motor tracking: Tracking performance and visual search

behavior. Human Movement Science, 24, 379-402.


Saltzman, E. L. (1992). Biomechanical and haptic factors in the temporal patterning of limb and

speech activity. Human Movement Science, 11, 239-251.

Saltzman, E. L., & Byrd, D. (2000). Task-dynamics of gestural timing: Phase windows and

multifrequency rhythms. Human Movement Science, 19, 499-526.

Saltzman, E. L., Löfqvist, A., Kay, B., Kinsella-Shaw, J., & Rubin, P. (1998). Dynamics of

intergestural timing: A perturbation study of lip-larynx coordination. Experimental Brain

Research, 123(4), 412-424.

Saltzman, E. L., & Munhall, K. G. (1989). A Dynamical Approach to Gestural Patterning in

Speech Production. Ecological Psychology, 1(4), 333-

382.

Saltzman, E. L., Nam, H., Goldstein, L., & Byrd, D. (2006). The distinctions between state,

parameter and graph dynamics in sensorimotor control and coordination. In A .G. Feldman

(Ed.), Progress in Motor Control: Motor Control and Learning over the Lifespan (pp. 63-73).

New York: Springer.

Samuel, A. (1981). The role of bottom-up confirmation in the phonemic restoration illusion.

Journal of Experimental Psychology: Human Perception and Performance, 7(5), 1124-

1131.

Samuel, A. (1996). Phoneme Restoration. Language and Cognitive Processes, 11(6), 647-654.

Shattuck-Hufnagel, S. (1979). Speech Errors as Evidence for a Serial-Ordering Mechanism in

Sentence Production. In William E. Cooper & Edward C. T. Walker (Eds.), Sentence

Processing: Psycholinguistic Studies Presented to Merrill Garrett (pp. 295-342). Hillsdale, NJ:

Lawrence Erlbaum.

Shattuck-Hufnagel, S. (1992). The role of word structure in segmental serial ordering.

Cognition, 42, 213-259.

Shattuck-Hufnagel, S., & Klatt, D. (1979). The limited use of distinctive features and

markedness in speech production: evidence from speech error data. Journal of Verbal Learning and Verbal Behavior, 18(1), 41-55.

Simko, J., & Cummins, F. (2010). Embodied task dynamics. Psychological Review, 117(4),

1229-1246.

Stearns, A. (2006). Production and Perception of Place of Articulation Errors (Unpublished

master’s thesis), College of Arts and Sciences, University of South Florida.

Stemberger, J. (1991). Apparent anti-frequency effects in language production: The addition

bias and phonological underspecification. Journal of Memory and Language, 30, 161-185.

Stevens, K. N. (2000). Acoustic Phonetics. MIT Press.

Temprado, J., Chardenon, A., & Laurent, M. (2001). Interplay of biomechanical and

neuromuscular constraints on pattern stability and attentional demands in a bimanual

coordination task in human subjects. Neuroscience Letters, 303, 127-131.

Tilsen, S., & Goldstein, L. (2012). Articulatory gestures are individually selected in production.

Journal of Phonetics, 40, 764-779.

Tuller, B. (1984). On categorizing aphasic speech errors. Neuropsychologia, 22(5), 547-557.

Turvey, M. (1990). Coordination. American Psychologist, 45(8), 938-953.


Van Lieshout, P. (2004). Dynamical systems theory and its application in speech. In B.

Maassen, R. Kent, H. Peters, P. Van Lieshout & W. Hulstijn (Eds.), Speech Motor Control

in Normal and Disordered Speech (pp. 51-82). Oxford: Oxford University Press.

Van Lieshout, P., Hulstijn, W., & Peters, H. (2004). Searching for the weak link in the speech

production chain of people who stutter: A motor skill approach. In B. Maassen, R. Kent,

H. F. M. Peters & W. Hulstijn (Eds.), Speech Motor Control in Normal and Disordered

Speech (pp. 313-356). Oxford: Oxford University Press.

Van Lieshout, P., & Neufeld, C. (2014). Coupling dynamics interlip coordination in lower lip

load compensation. Journal of Speech Language and Hearing Research, 57, 597-615.

Warren, R. (1970). Perceptual restoration of missing speech sounds. Science, 167(3917), 392-

393.

Wickelgren, W. (1965). Distinctive features and errors in short-term memory for English

vowels. Journal of the Acoustical Society of America, 38(4), 583-589.

Williamson, M. M. (1998). Neural control of rhythmic arm movements. Neural Networks, 11(7-

8), 1379-1394.

Wood, S., Hardcastle, W., & Gibbon, F. (2011). EPG patterns in a patient with phonemic

paraphasic errors. Journal of Neurolinguistics, 24, 213-221.

2 Chapter 2

The following study has been published in the Journal of the Acoustical Society of America

(Slis, A. W., & Van Lieshout, P. H. H. M. (2013). The effect of phonetic context on speech

movements in repetitive speech. Journal of the Acoustical Society of America, 134(6), 4496-

4507). The layout differs from the remainder of the dissertation. The heading numbers and the figure and table captions have been adjusted to fit the dissertation. All figures and tables appear at the end of the manuscript.

Reproduced with permission from JOURNAL OF THE ACOUSTICAL SOCIETY OF

AMERICA. Copyright 2013, Acoustical Society of America.

A link to the published paper can be found at http://dx.doi.org/10.1121/1.4828834



The Effect of Phonetic Context on Speech Movements in Repetitive Speech

Anneke W. Slis6 and Pascal van Lieshout

Department of Speech Language Pathology, Oral Dynamics Lab, 160-500 University Avenue,

University of Toronto, Toronto, Ontario, M5G 1V7, Canada

October 16, 2013

Phonetic Context Effects on Speech Movements

6 Author to whom correspondence should be addressed: Electronic mail: [email protected]


2.1 ABSTRACT

This study examined how, in repetitive speech, articulatory movements differ in degree of variability and movement range depending on co-articulatory constraints manipulated by phonetic context and type of CVC-CVC word pair. These pairs consisted of words that either differed in onset consonants but shared rhymes, or were identical. Co-articulatory constraints were manipulated by employing different combinations of vowels and consonants. The word pairs were produced in a repetitive speech task at a normal and fast speaking rate. Articulatory movements were measured with 3D electro-magnetic articulography. As measures of variability, median movement ranges and the coefficient of variation of target and non-target articulators were determined. To assess possible biomechanical constraints, correlation values between target and simultaneous non-target articulators were calculated as well. The results revealed that word pairs with different onsets had larger movement ranges than word pairs with identical onsets. In identical word pairs, the coefficient of variation showed higher values in the second than in the first word. This difference was not present in the alternating onset word pairs. For both types of word pairs, higher speaking rates showed higher correlations between target and non-target articulators than lower speaking rates, suggesting stronger biomechanical constraints for the former condition.

PACS number: 43.70 Jt


2.2 INTRODUCTION

2.2.1 Background

A long-standing challenge in speech production research involves assessing variability in speech movements (e.g., Rudy & Yunusova, 2013; Van Lieshout & Namasivayam, 2010) and inferring at what stage in the speech production process this variability has been introduced.

Given the growing number of studies that investigate kinematic and acoustic aspects of repetitive speech, such as research on speech errors (e.g., Goldrick & Blumstein, 2006; Goldstein,

Pouplier, Chen, Saltzman, & Byrd, 2007; McMillan & Corley, 2010), the current study examined a specific source of variability in articulator movement for this type of task, namely a difference in the phonetic characteristics of word pairs: word pairs with different onsets and identical rhymes, such as cop top, versus word pairs with identical onsets as well as rhymes, like top top. In addition, it examined whether possible differences between these word pairs depend on the phonetic characteristics of individual word pairs. These differences can be studied for articulators that execute specific target movements to produce speech segments, as well as for articulators that do not. In this paper, target articulators are those that are active in an onset constriction; non-target articulators are those that are not active during that onset constriction but, in the case of alternating onsets, are actively involved in the onset of the other word of the pair.

Repetitive speech makes it possible to collect a large amount of data within a relatively short time while allowing researchers to control for confounding effects introduced by natural speech (Hertrich & Ackermann, 2000; Van Lieshout & Moussa, 2000). Repetitive speech is typically employed in studies that investigate speech errors (Goldrick & Blumstein,

2006; Goldstein et al., 2007) or factors that impact speech motor control such as speaking rate


(e.g., Hertrich & Ackermann, 2000; Perkell & Zandipour, 2002; Rochet-Capellan & Schwartz,

2007; Van Lieshout & Moussa, 2000).

How repetitive speech is structured can influence how individual speech segments are realized. Several studies have revealed that the number of syllables and differences in onsets of consecutive syllables produced in repetitive speech affect kinematic properties. For example, kinematic data indicated that 3-syllable repetitions, such as "papiter" or "pipapter", resulted in longer acceleration phases and higher variability in lip coupling than 2- or 1-syllable repetitions, such as "papa" or "pipa" (Van Lieshout & Moussa, 2000). Regarding differences in onsets of consecutive syllables, Rochet-Capellan and Schwartz (2007) observed that higher speaking rates could be achieved in sequences in which the onsets alternated, such as "tapa", compared to sequences with identical onsets, like "tata", suggesting a constraint on producing these syllables.

In line with this, Sevald, Dell, and Cole (1995) found that syllables with repeated onset consonants were produced slightly more slowly than syllables that did not repeat speech segments.

Finally, recent speech error studies have indicated that, within repetitive speech, phonetic differences in the onsets of word pairs affected variability of VOT (Goldrick &

Blumstein, 2006) and kinematic measures of articulatory movement (Frisch, 2007; McMillan &

Corley, 2010). In these studies, word pairs that consisted of words with identical speech segments were typically compared with word pairs characterized by different onset consonants and identical rhymes to establish thresholds for defining error patterns (Frisch, 2007; Goldrick

& Blumstein, 2006; Goldstein et al., 2007). Stearns (as cited in Frisch, 2007), for example, detected that for some participants the mean tongue dorsum movements during tongue tip constrictions in word pairs with alternating onsets were larger than the baseline mean amplitudes in word pairs with identical onsets. In addition, Goldstein et al. (2007) observed that for sequences in which the onset consonants were the same, variability of non-target tongue tip

and tongue body movements in the context of /ɪ/ was smaller, e.g., kip kip and tip tip, than in the context of /ɑ/, e.g., cop cop and top top.

McMillan and Corley (2010) argued that the type of variability observed in word pairs with different onset consonants is caused by competing phonological speech units that are simultaneously selected. Both are ultimately activated and information from the two units cascades down to the articulatory level, introducing variability at the level of production.

Identical onsets do not compete, resulting in less articulatory variability. Their view suggests that competition at the planning stage is a source of variability measured at the articulatory level. If this variability were introduced solely at the phonological level, causing the differences between alternating and identical onsets, all articulators should behave the same, regardless of the type of articulator or the context in which the articulator is activated. However, other studies have shown that constraints at the level of production are an additional source of variability (e.g., Goldstein et al., 2007; Hardcastle & Hewlett, 1999; Recasens & Espinosa,

2009; Rochet-Capellan & Schwartz, 2007). The question addressed in the current study focuses on whether additional constraints contribute to the variability of word pairs with alternating and identical onsets. The role of co-articulatory constraints on differences in variability between word pairs with alternating and identical onsets has mostly been inferred from observation rather than from quantitative kinematic or acoustic data. Thus, given the important role of repetitive speech in investigating the contribution of non-target and target articulators in speech motor studies, the concept and potential sources of variability in different types of word pairs in repetitive speech have to be defined more clearly in a quantitative way.

While movement patterns in repetitive speech have been addressed to some extent, results are based on the behavior of only one articulator and on a small number of participants


(Van Lieshout & Moussa, 2000), or on repetitive speech consisting of open syllables (Hertrich

& Ackermann, 2000; Perkell & Zandipour, 2002; Rochet-Capellan & Schwartz, 2007).

Preliminary evidence suggests that CV and CVC syllables in repetitive speech differ in the way they are realized (Van Lieshout, Hijl, & Hulstijn, 1999). Consequently, the movements of different types of articulators in closed syllables (e.g., Goldrick & Blumstein, 2006; Goldstein et al., 2007) need to be investigated further. Finally, whereas previous studies have demonstrated that certain factors affect the movements of the jaw and the target articulators in repetitive speech (Hertrich & Ackermann, 2000; Rochet-Capellan & Schwartz, 2007), to the best of the authors' knowledge, no evidence is available on how repetitive speech affects the variability of movements of non-target articulators. Non-target articulator behavior is important, for example, in studies investigating error patterns, as errors arise from articulators that should not be activated, or coproduction, in which non-target articulators are activated during an onset consonant because of a following vowel (Fowler & Saltzman, 1993). Moreover, target and non-target articulatory influences can stretch over longer distances than neighboring speech segments

(Grosvald, 2010). Little is known, in particular, about how non-target activations are controlled over longer distances and whether a target constriction in one word influences the non-target position in the following word. The present study investigates whether phonetic context across words

(identical versus different onsets) contributes to variability in speech and whether this variability differs depending on the phonetic context within words (different consonant and vowel combinations).

2.2.2 Current Study

A primary objective of the current study was to investigate potential factors that may contribute to variability of target and non-target movement patterns in repetitive speech. More

specifically, phonetic context across words (i.e., different types of word pairs) and within words was investigated as a source of variability. "Phonetic context" in the present study refers to different combinations of consonants and vowels that result in different (bio-)mechanical requirements for articulators. It is well established that phonetic context within a word or syllable is an important factor that contributes to patterns of articulatory movement variability. Context affects articulators that actively constrict the vocal tract (e.g., Hardcastle & Hewlett, 1999; Rudy &

Yunusova, 2013; Zharkova & Hewlett, 2009) as well as articulators that are not actively involved in constricting the vocal tract (Fowler & Saltzman, 1993; Hardcastle & Hewlett, 1999;

Recasens & Espinosa, 2009; Recasens, Pallares, & Solanas, 1993). In the latter case, articulators move as a consequence of another activated articulator. The degree to which context influences articulator movement depends on the mechano-inertial properties of articulators, and to what extent these articulators are involved in forming constrictions (Iskarous, Fowler, & Whalen,

2010; Recasens & Espinosa, 2009).

The current study particularly focused on how word pairs with different and identical onsets, e.g., tip tip versus kip tip, induce differences in movement range and variability of target and non-target articulators. In kip tip, the target articulators consist of the tongue dorsum in the first word and tongue tip in the second word. The non-target articulators entail the tongue tip during the onset of the first and tongue dorsum during the onset of the second word. In addition, the study investigated how different consonants and vowels within a word, e.g. cop top versus kip tip or cot pot versus tock pock (the coda preceding the onset /p/ in the second word differs), affect movement ranges and variability of target and non-target articulator movements during onset consonants. Especially the interaction between phonetic context across and within word pairs is of interest for the current study. Based on the observations mentioned in Frisch (2007), it was expected that movement range would be larger in word pairs with different onsets than in

word pairs with identical onsets. Furthermore, based on evidence that the type of consonant cluster restricts the tongue to different degrees (e.g., Recasens et al., 1993), the combination of a coda consonant and the following onset (e.g., t#p in the word pair cot pot) was expected to influence the movements of the target and non-target articulators during the onset as well (in this example, the non-target tongue dorsum and the lower lip during the production of /p/ in pot). Finally, based on the observation that syllables with identical onsets are more difficult to produce at higher speaking rates than syllables that have different onsets (Rochet-Capellan & Schwartz,

2007), it is argued that in repetitive speech the factor "speaking rate" affects movement range and variability measures of articulators as a function of word pair type, i.e., word pairs with different versus identical onset consonants. To investigate whether possible biomechanical constraints related to type of word pair and speaking rate contribute to possible differences in variability in movement patterns, correlations between movement range values of target and time-synchronous non-target articulators in word pairs with alternating and identical onsets were calculated. Correlations have been used in other studies to investigate biomechanical constraints between articulators (e.g., Green & Wang, 2003; Stone, Epstein, & Iskarous, 2004).

Specifically, it is argued that if the correlation between movement range values of simultaneous target and non-target articulations is high, the articulators are not moving independently from one other due to biomechanical constraints.

2.3 METHODS

2.3.1 Participants

Originally, 21 individuals participated in the study. For this paper, data from fourteen monolingual speakers of Canadian English, 7 males and 7 females, ranging between 19 and 45 years of age, were included. A participant's dataset was excluded if one of the two sessions (see

below) was not completed (n = 4), if too many data points were lost because of a broken coil (n

= 1), or if the participant was unable to perform the task (n = 1). The dataset of one participant was excluded because the inclusion criteria (outlined below) were not met, which only became apparent after the study was completed.

A speaker was considered monolingual if only Canadian English was spoken at home and in school during childhood. However, speakers who learned other languages in school, where English was the main language of instruction, were included. A speaker also had to report normal vision (after correction), since part of the task involved reading the stimuli from a monitor, and had to have no history of speech, hearing, or language difficulties. All participants gave written informed consent to participate in the study and were compensated. The study was approved by the Health Sciences Research Ethics Board at the University of Toronto.

2.3.2 Stimuli

The stimulus material consisted of CVC#CVC word pairs with identical vowel and coda consonants, but with alternating onset consonants, such as in cop top, and word pairs in which the words were identical, like cop cop and top top. To manipulate phonetic context within a word, four different vowels and three different consonants were employed. The consonants were

/p/, /t/ and /k/ and the vowels were /æ/ as in cat, /ɪ/ as in kit, /u/ as in coot, and /ɑ/ as in cot. The coda#onset variable resulted in two different combinations for one particular articulator. For example, in the word top#cop, the non-target tongue tip occurs during the onset /k/ that is preceded by a coda /p/; during tock#pock, the non-target tongue tip occurs during the onset /p/, preceded by the coda /k/. In this example, the underlined consonants are the coda and onset combination when the tongue tip is the non-target articulator.


Word position was manipulated such that the same word appeared either in the first or second position of a pair (e.g., cop top and top cop). Primary stress always fell on the first word of the pair. Stress was not balanced in this study because its design was based on a different study that did not require rigid control of stress. Moreover, stress placed on the first word facilitated synchronizing with the metronome (see 2.3.3 Procedures). For the complete stimulus list, see Table 2.4 in the Appendix.

2.3.3 Procedures

Participants were instructed to repeat a word pair as frequently as possible for a maximum of 17 repetitions. Two different speaking rates were employed, normal and fast.

These were determined separately for each participant (see Table 2.1) in order to account for individual differences in speech motor skills (Namasivayam & Van Lieshout, 2011). To estimate "normal speaking rate" for an individual speaker, the participant was instructed to repeat the word pair mik nik at a comfortable rate. Subsequently, the experimenter calculated the number of repetitions of a word pair per minute. Next, the participant was asked to repeat the same word pair as fast as possible and was challenged to produce these at such a rate that it became almost impossible to produce the words without starting to stumble after several repetitions. Ninety percent of this speaking rate value was defined as the "fast rate" in order to keep the task manageable yet challenging for each participant. During the experiment, a combined visual and auditory metronome controlled the rates (see e.g., Repp & Penel, 2002).
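The per-participant rate computation described above can be sketched as follows; the function and parameter names, and the example numbers, are illustrative and not taken from the study:

```python
def speaking_rates(comfortable_reps_per_min, max_reps_per_min):
    """Return (normal, fast) metronome rates in word-pair repetitions
    per minute. The normal rate is the participant's comfortable
    repetition rate; the fast rate is set to 90% of the maximum rate
    the participant can sustain, keeping the task challenging yet
    manageable."""
    return comfortable_reps_per_min, 0.9 * max_reps_per_min

# Hypothetical participant: comfortable at 80 pairs/min, maximum at 140.
normal, fast = speaking_rates(80, 140)
print(normal, fast)  # -> 80 126.0
```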

INSERT TABLE 2.1

The word pairs, presented with a size of approximately 3.35 cm per letter, appeared in the center of a 19-inch monitor, located at a one-meter distance. The first word of a pair was always shown in upper-case letters to indicate it had to be produced with primary stress,

whereas the second one was shown in lower-case letters. At the same time, the word pair was

Canadian English speaker and recorded prior to the study. Next, four auditory (1000 Hz sine wave tones) and visual metronome beats were presented to prepare for the trial. Because of the way the computer files that contained the stimulus information for each participant were constructed, the duration of the beats depended on individual speaking rate (see above). For example, if two metronome beats per second were offered, each beat sounded for 250 ms, followed by a 250-ms silence. Visual metronome beats appeared as blinking red dots on the monitor just above the word pair. The moment these dots turned green, the actual trial started and the participant began repeating the word pairs aloud. Both words needed to be pronounced during one beat of the metronome. Both the word pair and the visual and auditory metronome signals were present for the entire trial. The participant was encouraged to finish the trial on a single breath.
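The beat-duration arithmetic can be illustrated with a small sketch, assuming (as the 250-ms example suggests) that each metronome cycle is split evenly between tone and silence; the function name is the author's own:

```python
def beat_timing_ms(beats_per_second):
    """Return (tone_ms, silence_ms) for one metronome cycle, splitting
    the cycle evenly between the tone and the following silence (an
    assumption based on the 250 ms / 250 ms example in the text)."""
    period_ms = 1000.0 / beats_per_second
    return period_ms / 2, period_ms / 2

print(beat_timing_ms(2))  # two beats per second -> (250.0, 250.0)
```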

Two separate lists were constructed. When a list contained a certain combination of onset consonants, i.e. /k/ and /t/ in cop top, the reversed word pair, top cop, would appear in the other list. Identical onset word pairs with the same rhyme but different onset consonants (i.e., top top and cop cop) never occurred in the same list. The order of stimuli in each list was randomized.

The study consisted of two sessions, separated by at least one week, except for one participant who came in for the second session 4 days later. Both days, participants completed

24 trials in which the onsets of word pairs alternated and 24 trials in which the onsets were identical. Each session consisted of two blocks, one in which the speaker had to produce the word pairs at a normal speaking rate, and one in which the speaker had to produce the word

pairs at a fast rate. When one list was used in the normal speaking rate condition, the other list was offered in the fast rate condition. For the second session, the lists were reversed, counterbalancing between the two sessions. Speaking rate was blocked, as pilot studies had shown that it was too difficult for participants to switch rates randomly. Although a metronome was used to control rate as much as possible, the normal rate condition was always offered first.

Prior to the actual experimental trials in both normal and fast rate conditions, the participants performed three extra practice trials.

2.3.4 Instrumentation

Articulatory movement data from the tongue dorsum, tongue body, tongue tip, jaw, and upper and lower lips were collected with an AG500 EMA system (Zierdt, Hoole, & Tillman, 1999). A number of studies have shown that articulography in general is a reliable and accurate tool for collecting speech movement data (e.g., Goldstein et al., 2007; Kroos, 2012; Van Lieshout, Merrick, & Goldstein, 2008; Van Lieshout & Moussa, 2000; Yunusova, Green, & Mefferd, 2009). Small sensor coils are attached to the participant's articulators of interest with surgical glue (either PeriAcryl or Cyanoveneer). For the purpose of this study, sensor coils were placed on the mid-sagittal vermilion border of the upper and lower lip, the tongue tip (1 cm behind the apex), the tongue body (3 cm behind the tongue tip coil), and the tongue dorsum. The tongue dorsum coil placement depended on how far back the participant tolerated the sensor.

This resulted in coil placements that varied between 1 and 2 centimeters (cm) posterior to the tongue body coil. The average location (standard deviation) of the tongue dorsum coil, measured from the apex, was 5.57 (0.55) cm on the first day and 5.38 (0.55) cm on the second day. To measure the movements of the jaw, a thermoplastic mold was constructed that fitted over the lower incisor teeth. Onto this mold, a coil was glued in a mid-sagittal position.


Additional coils were placed on the participant's forehead, nose, and behind both ears for reference purposes (Van Lieshout et al., 2008; Van Lieshout & Moussa, 2000). Only movement data from the tongue dorsum, tongue tip and lower lip were of interest for the present study.

After the coils were attached to the articulators, the participant read "The Rainbow Passage", a short paragraph that contains all the phonemes of American English. This way, the speaker could adapt to the coils attached to the articulators. Before the actual session started, the participant held a plastic device in his/her mouth, onto which a 3D bubble level was attached.

The device had to be placed exactly parallel to the horizontal axis of the EMA system, and positional information was gathered to create a standard reference frame. This reference frame was used to remap raw position data of individual articulators to allow comparison of data across participants (Westbury, 1994). The raw movement signals were sampled at 200 Hz, and 3D positions over time were calculated for each coil (Yunusova et al., 2009). This information was stored on the computer together with a simultaneously recorded acoustic signal, sampled at 16 kHz. At the same time, a Marantz digital recorder (type PMD670) recorded the speech signal on one channel and the metronome signal on the other channel at 48 kHz for future acoustic analysis.

2.3.5 Data Processing

The raw movement data were processed according to a standardized protocol (Van Lieshout et al., 2008; Van Lieshout & Moussa, 2000). Individual articulator movement signals were band-pass filtered with a 7th-order Hamming-windowed Butterworth filter, with 6.0 Hz and 0.5 Hz as the high and low cut-off points, to remove DC drift and high-frequency noise while preserving the frequency components relevant to the movement data.
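As an illustrative sketch only (not the lab's actual processing code), a comparable band-pass step at the stated 0.5-6.0 Hz cut-offs and 200 Hz sampling rate might look as follows; the use of SciPy, zero-phase filtering, and the second-order-sections form are assumptions of this sketch:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 200.0  # EMA sampling rate (Hz)

def bandpass_movement(signal, low_hz=0.5, high_hz=6.0, order=7):
    """Band-pass filter a movement signal to remove DC drift (< 0.5 Hz)
    and high-frequency noise (> 6 Hz); zero-phase to avoid time shifts."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=FS, output="sos")
    return sosfiltfilt(sos, signal)

# Example: a 2 Hz movement component riding on DC drift plus 50 Hz noise
t = np.arange(0, 10, 1 / FS)
raw = 1.0 + np.sin(2 * np.pi * 2 * t) + 0.3 * np.sin(2 * np.pi * 50 * t)
clean = bandpass_movement(raw)  # the 2 Hz component survives; DC and 50 Hz do not
```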

To select the onsets of the individual words forming the pairs, trial data were segmented algorithmically using spatial and temporal criteria to determine minima and maxima in the relevant movement signals. Automatically placed landmarks for these positions were visually checked and, if necessary, corrected manually. The boundaries of word onsets were defined by two minima, namely the start and end of the movement cycle for the coda consonant. For example, for the word pair pat cat, the first word onset (i.e., /p/ in pat) was defined by the two minima of the movement cycle associated with the preceding tongue tip constriction for /t/ in cat. The second tongue tip constriction minimum also indicated the start of the next segment (Figure 2.1). Each segment contained trajectory information about the relevant articulators when specific onset and coda consonants were produced.

For the purpose of this study, the maximum vertical movement ranges of the target movements for onset consonants in the first and second word were retrieved. The target /t/, /k/, and /p/ positions were represented by the maximum movement range of the tongue tip, tongue dorsum, and lower lip movements, respectively. These movement ranges were based on the distance traveled by an articulator from its minimum position within a segment to its maximum position for the closing movement in that same segment. The maximum value for the movement range of a non-target articulator was measured at the time when the target articulator was maximally constricted (Figure 2.1).
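The range measures just described can be sketched as follows; the array-based representation and the reading of the non-target's excursion at the target's peak are simplifying assumptions, not the lab's actual segmentation code:

```python
import numpy as np

def movement_measures(target: np.ndarray, non_target: np.ndarray):
    """Given vertical position trajectories of a target and a non-target
    articulator within one segment, return (a) the target's movement range
    (minimum to maximum position within the segment) and (b) the non-target's
    excursion from its minimum, measured at the moment of maximal target
    constriction."""
    target_range = float(target.max() - target.min())
    peak_idx = int(np.argmax(target))  # moment of maximal target constriction
    non_target_excursion = float(non_target[peak_idx] - non_target.min())
    return target_range, non_target_excursion

# Toy segment: target closes and reopens; non-target moves passively
target = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
non_target = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
print(movement_measures(target, non_target))  # -> (3.0, 1.0)
```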

INSERT FIGURE 2.1

As is common in this type of study, the first and last repetitions of a trial were always disregarded, as these productions may behave somewhat differently from the rest of the trial. The remaining repetitions of a trial were included in the analysis when they were produced according to the following criteria: 1) When an audible error occurred but the trial was still fluent and could be segmented correctly into first and second syllables, for example coptop coptop toptop, this error was included in the analysis. 2) When the error disturbed the flow of the first and second words, for example coptop coptop top coptop, the extra top was disregarded because it interfered with the intended sequence of target versus non-target activation. 3) When a speaker took a short breath that did not influence the flow of the trial, the trial was included. In the case of severe stumbling during the whole trial, the trial was disregarded, as it was impossible to segment it accurately. 4) When the participant managed to repeat several word pairs correctly, data were analyzed for the correctly produced part. The median number of repetitions for a given trial was 15. In total, 40 trials out of 1344 (2.98%) were discarded.

2.3.6 Analysis

Variability introduced by type of word pair and context was expressed using two measurements: median vertical movement ranges and the variability in movement ranges, expressed as a coefficient of variation (COEFFVAR), of target and non-target articulators. The analysis addressed how the movement ranges of target and non-target articulators and the COEFFVAR of these ranges were affected by "type of word pair" (alternating versus identical onsets), "context" (i.e., the vowel and the two different combinations of coda#onset, the interval during which the target or non-target articulator was measured), "type of non-target articulator" or "type of target articulator" (tongue tip, tongue dorsum, and lower lip), "rate" (normal and fast), and "word position" (first or second word).

2.3.6.1 Median Movement Ranges

The movement ranges of target and non-target articulators were expressed as median values, because this measure is less sensitive to outliers and provides a more robust estimate of the average movement range (Chau, Young, & Redekop, 2005). This statistical property is important because the whole trial was analyzed, including possible outliers. A median value was calculated from all the measurement points in one trial for a specific target articulator during the onset of the first word. Another median value was calculated for the non-target articulator in the first word. Two additional medians, i.e., for the target and non-target articulator, were calculated from the movement ranges during the onset of the second word. Each trial thus yielded four separate median movement range values.

A repeated-measures ANOVA was performed with the median movement ranges of target articulators as the dependent measure and "type of word pair", "rate", and "word position" as within-subject variables. In addition, three separate repeated-measures ANOVAs were performed for the tongue tip, tongue dorsum, and lower lip, to assess how "type of word pair", "coda#onset", and "vowel" affected the median values for each articulator separately. For post-hoc comparisons, a Tukey-Kramer test was used. Four additional ANOVAs with the same independent variables were run, with the median movement ranges of non-target articulators as the dependent variables.

2.3.6.2 Variability: coefficient of variation (COEFFVAR)

To quantify the variability in movement ranges for the different articulators, the Median Absolute Deviation (MAD), expressed in millimeters, was calculated (Chau et al., 2005):

MAD(X) = med(|X − med(X)|)

where med(X) is the median of the sample. Again, MAD values were derived separately for the first and second word for the target and non-target articulator. Next, the COEFFVAR was calculated to correct for the fact that larger movement range values may cause larger variability. The COEFFVAR is typically calculated by dividing the standard deviation by the mean (Howell, 2013, p. 44). Because the median and the MAD are the robust counterparts of the mean and standard deviation, the COEFFVAR was calculated here by dividing the MAD by the median. A second set of repeated-measures analyses was performed, similar to those for the median movement ranges, this time with the COEFFVAR values of the target and non-target articulators as the dependent variable. Because of the multiple ANOVAs (16), a Bonferroni correction was applied, and the alpha level at which an effect was considered significant in all tests was set at 0.003 (0.05/16).
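The MAD and robust COEFFVAR computations described above can be sketched in a few lines (illustrative only; the actual analysis used the lab's own scripts):

```python
import numpy as np

def mad(x: np.ndarray) -> float:
    """Median Absolute Deviation: med(|X - med(X)|) (Chau et al., 2005)."""
    return float(np.median(np.abs(x - np.median(x))))

def coeffvar(x: np.ndarray) -> float:
    """Robust coefficient of variation: MAD divided by the median,
    correcting variability for overall movement range size."""
    return mad(x) / float(np.median(x))

# One trial's movement ranges (mm), including an outlier the median resists
ranges = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
# median = 3.0, MAD = 1.0, so COEFFVAR = 1/3
```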

2.3.6.3 Correlations

Based on the observation that syllables with identical onsets were more difficult to produce at higher speaking rates than syllables with different onsets (Rochet-Capellan & Schwartz, 2007), "rate" and "type of word pair" were hypothesized to affect co-articulatory constraints. Co-articulatory constraints in the current study were assessed using Pearson product-moment correlations between the median movement ranges of the target articulator and those of the co-occurring non-target articulator. This means that, for example in the word cop, correlation values were derived between the movement ranges of the tongue dorsum as the target and the tongue tip as the non-target.

Because the sampling distribution of correlation values becomes highly skewed when the mean "r" value is high, distributions of values cannot automatically be compared. To allow statistical tests between individual distributions of correlation values, the correlation values were transformed to z-scores using the Fisher transformation, in which "r" is the correlation value (Ferguson & Takane, 1989, p. 206): zr = ½ loge(1 + r) − ½ loge(1 − r).
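The Fisher transformation given above can be expressed directly in code (the function name is illustrative; the expression is mathematically identical to the inverse hyperbolic tangent):

```python
import math

def fisher_z(r: float) -> float:
    """Fisher transformation of a correlation value r:
    z_r = 1/2 * ln(1 + r) - 1/2 * ln(1 - r), i.e., artanh(r).
    Maps r in (-1, 1) to an approximately normal scale."""
    return 0.5 * math.log(1 + r) - 0.5 * math.log(1 - r)

# r = 0.5 maps to about 0.549; r = 0 stays at 0
z = fisher_z(0.5)
```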

Next, a repeated-measures ANOVA was performed with the transformed correlation values as the dependent variable and "rate" and "type of word pair" as within-subjects variables. An alpha level of 0.05 was used.

2.4 RESULTS

Inspection of the distributions of median and COEFFVAR values of target and non-target articulators revealed that the data were not normally distributed. A log10 transformation corrected these distributions successfully. Consequently, all the following repeated-measures ANOVAs were performed on log-transformed data. The following section comments only on the significant effects (i.e., p < 0.003); Table 2.2 summarizes all the statistical results.

INSERT TABLE 2.2

2.4.1 Median Movement Range

2.4.1.1 Type of Word Pair, Rate, and Word Position

Figure 2.2 shows that word pairs with different onsets resulted in larger movement ranges of target articulators than word pairs with identical onsets. As can be observed from Figure 2.2, similar results were found for the articulators in non-target position.

INSERT FIGURE 2.2

2.4.1.2 Tongue Dorsum: Type of Word Pair, Vowel and Coda#onset

Movement ranges of the tongue dorsum as a target showed that word pairs with alternating onsets, such as the first word in cop top, resulted in larger movement ranges than word pairs with identical onsets, such as cop cop. In addition, /u/ resulted in the smallest ranges and differed from all the other vowels. /ɪ/ showed smaller ranges than the two low vowels but larger ranges than /u/ (Figure 2.3).

INSERT FIGURE 2.3

Movement ranges of non-target tongue dorsum articulations in alternating onset trials, such as the first word in top cop, resulted in larger ranges than the ranges in identical onset trials, such as top top. With respect to vowel influences, the tongue dorsum produced significantly larger movement ranges in the context of /ɑ/ than in the context of the other vowels.

2.4.1.3 Tongue Tip: Type of Word Pair, Vowel and Coda#onset

Target tongue tip movement ranges were significantly affected by "type of word pair" and "vowel" and the interaction between these two. Figure 2.3 reveals that word pairs with different onsets (e.g., top cop) resulted in larger tongue tip movement ranges than word pairs with identical onsets (i.e. top top). It can also be observed from Figure 2.3 that /ɪ/ and /u/ both showed smaller ranges than the two low vowels. The high vowels /ɪ/ and /u/ differed from each other in the alternating trials, with /ɪ/ showing smaller movement range values than /u/. This difference disappeared in the identical onset trials. Finally, the interaction between vowel and coda#onset was significant. In the context of the two back vowels, larger movement ranges were revealed for coda /k/ preceding the target tongue tip than for the coda /p/. For the tongue tip as a non-target articulator smaller movement ranges were found for the high vowels /u/ and /ɪ/ than for the low vowels /ɑ/ and /æ/. Again, the vowel /ɪ/ showed even smaller values than /u/.


2.4.1.4 Lower Lip: Type of Word Pair, Vowel and Coda#onset

The type of vowel affected movement range for both the target and non-target lower lip movements, in the progression /u/ < /ɪ/ < /ɑ/ < /æ/.

2.4.1.5 Summary of Median Movement Ranges

For the target and non-target tongue dorsum and target tongue tip, word pairs with alternating onset consonants resulted in larger median movement ranges than word pairs with identical onset consonants. In addition, “vowel” affected movement ranges for all the target and non-target articulators. An interaction between “type of vowel” and “type of word pair” was found for the target tongue tip: /u/ showed larger movement range values than /ɪ/ in the alternating trials, and this difference disappeared in the identical onset trials. Only the tongue tip showed an effect of coda#onset in the context of the two back vowels: coda#onset /k#p/ showed larger ranges than coda#onset /p#k/.

2.4.2 Variability

2.4.2.1 Type of Word Pair, Rate, and Word Position

None of the variables affected the COEFFVAR values of the target articulators.

However, for the non-target articulators, "word position" affected the COEFFVAR values significantly: the non-target articulator in the second word showed more variability than in the first word. As can be observed in Figure 2.4, this difference in word position is related to a significant interaction between "type of word pair" and "word position": word pairs with identical onsets showed less variability in the first word compared to the second word, and the variability in the first word was even less than the variability in both word positions for the alternating sequences. Word pairs with identical onsets showed a significantly higher COEFFVAR in the second word, compared to the second word in the alternating onset condition. The word-position effect did not occur in alternating onset word pairs. It has to be kept in mind that the first word of a word pair always received primary stress.

INSERT FIGURE 2.4

2.4.2.2 Tongue Dorsum: Type of Word Pair, Vowel, and Coda#onset

The type of vowel affected COEFFVAR only for the tongue dorsum as a target articulator: high vowels showed higher COEFFVAR values than did low vowels. None of the other variables involving the tongue dorsum as a target and non-target articulator were significant.

INSERT FIGURE 2.5

2.4.2.3 Tongue Tip: Type of Word Pair, Vowel, and Coda#onset

Again, for the tongue tip as a target articulator, the only significant variable that affected COEFFVAR was "vowel". The high vowels resulted in larger COEFFVAR values than the two low vowels. For the tongue tip as non-target, type of vowel revealed a main effect as well. The highest variability was shown for /ɪ/, which differed from all the other vowels. In addition, the tongue tip in the context of the low vowels resulted in smaller variability values than it did in the context of the high vowels.

2.4.2.4 Lower Lip: Type of Word Pair, Vowel and Coda#onset

Again, similar to the tongue tip and tongue dorsum, only the type of vowel affected COEFFVAR values for the lower lip as a target articulator. The vowel /u/ resulted in the highest variability and differed from all the other vowels. In addition, /ɪ/ showed a higher COEFFVAR value than /æ/. The COEFFVAR values of the non-target articulator were affected only by "vowel". As with the tongue tip, significantly more variability occurred in the context of /ɪ/ and /u/ compared to /æ/ and /ɑ/.

2.4.2.5 Summary Variability

No main effects for type of word pair were found. However, the non-target in the second word showed more variability than in the first word. This main effect was caused by a significant interaction between "type of word pair" and "word position": word pairs with identical onsets showed less variability in the first word compared to the second word, which also differed from the second word in alternating onset condition. The variability in the first word was even less than the variability in both word positions for the alternating sequences.

Type of vowel affected the COEFFVAR values for target tongue dorsum, tongue tip and lower lip and non-target tongue tip and lower lip. The non-target tongue dorsum failed to show an effect of vowel. In general, high vowels showed higher COEFFVAR values than low vowels.

2.4.3 Correlation Values

The ANOVA on the distributions of z-transformed correlation values revealed that word pairs with identical onsets resulted in overall higher values than those with alternating sequences, as shown in Table 2.3 (F(1, 13) = 41.31, p < 0.001). Moreover, the normal speaking rate showed lower correlation values than the fast rate (F(1, 13) = 6.09, p = 0.03). The interaction between "rate" and "type of word pair" was not significant (F(1, 13) = 0.55, p = 0.47).


INSERT TABLE 2.3

2.5 DISCUSSION

To summarize the findings, the target and non-target movement ranges were larger in word pairs that consisted of alternating onsets than in word pairs with identical onsets. Especially the target and non-target tongue dorsum and the target tongue tip contributed to this difference. Movement ranges of the lower lip were not affected by type of word pair. Whereas "vowel" influenced the values for all target and non-target articulators, these context effects were mostly comparable for both types of word pairs. The exception involved the tongue tip as a target, where the high vowels /ɪ/ and /u/ differed from each other in the alternating trials; this difference disappeared in the identical onset trials. Only for the tongue tip as a target articulator was a coda#onset effect found: coda /k/ showed larger movement ranges than coda /p/ in the context of back vowels.

Regarding the COEFFVAR measures, only the non-target articulators in identical onset word pairs showed less variability in the first word than in the second word. In fact, the non-target articulator variability in the first word was even less than the variability in both word positions for the alternating onset word pairs. Moreover, the variability of the non-target articulator in the second word for the identical onset word pairs was higher than the variability in the second word for the alternating onset pair. For the words with alternating onset consonants, no difference in variability of the non-target articulator in the first or second word was found. With respect to the behavior of individual articulators, it was revealed that movement of individual articulators did not result in different COEFFVAR values as a function of word pair. Again, for each individual articulator, only type of vowel influenced the variability of the movement ranges, with the exception of the non-target tongue dorsum.


Finally, the correlation values revealed that words with alternating onsets showed lower values than words with identical onsets. Moreover, the correlation values were the only measures affected by speaking rate: normal rate showed lower values than fast rate.

The main effect of "type of word pair" on target and non-target movement ranges confirms the hypothesis that the average movement range of articulators is larger in word pairs with alternating onsets than in word pairs with identical onsets. This finding substantiates what had been observed by Stearns (as cited in Frisch, 2007) concerning differences between the two types of word pairs related to positions of the tongue dorsum. The finding that only target and non-target tongue dorsum and target tongue tip movement ranges differed depending on type of word pair, and that the movement ranges of the tongue tip in different types of word pairs depended on the following vowel, is at odds with McMillan and Corley's (2010) findings. If competition at the phonological level is the only cause for differences between the two types of word pairs, measured at the articulatory level, it is expected that this competition occurs irrespective of the speech segments that are to be produced. One reason for the different results could be the fact that the stimuli in the McMillan and Corley study only included velar and alveolar consonants, always followed by the same vowel; in the current study, the lower lip caused the differences in articulatory behavior and the tongue tip interacted with the following vowel. Thus the current study finds evidence for an additional source of variability that affects movement ranges of target articulators during alternating and identical onsets.

One possible explanation for the larger movement ranges of the tongue articulators is that the onset consonants and the surrounding vowels in word pairs with alternating onsets are hyper-articulated because speakers need to enhance the distinction between these successive words (De Jong, Beckman, & Edwards, 1993; Lindblom, 1990). Because the tongue articulators are important for consonant and vowel production and are biomechanically linked to the jaw (Van Lieshout & Neufeld, 2014), which is characterized by larger movements in hyper-articulated speech (De Jong et al., 1993), these articulators are likely affected the most. Van Lieshout and Neufeld (2014) showed that the lower lip moves relatively independently from the jaw in a bilabial closure and is thus not as strongly affected by hyper-articulated speech. In addition, the height of the jaw affects the tongue tip to a larger extent than the tongue dorsum (Mooshammer, Hoole, & Geumann, 2007). This can explain the finding that for the tongue tip as a target articulator, the differences in movement ranges between the /ɪ/ and /u/ contexts are less pronounced in word pairs with identical onsets: the jaw can stay in a high position during the complete trial, and the tongue tip needs to move only slightly to produce a /t/ and then move back to a position for the high vowels /ɪ/ and /u/. Goldrick and Blumstein (2006) observed a similar difference between alternating and identical onset consonants in their study for measures of VOT. They observed that the VOT for voiced onset consonants in word pairs with alternating onset consonants was significantly shorter than the VOT in word pairs with identical onsets, possibly to enhance the contrast in alternating sequences.

Whereas the current study showed no main effect of "type of word pair" on COEFFVAR measures of non-target movements, an interesting interaction was found between the variables "word position" and "type of word pair" for non-target articulators: the non-target in the first word of the identical onset word pairs showed less variability than the non-target in either word position of the word pairs with alternating onsets. Additionally, the non-target articulator in identical onset word pairs varied less in the first word than in the second, which also differed from the second word in the alternating condition. This word-position asymmetry did not occur with alternating onset consonants. It is speculated that these differences occurred because speakers are likely to control the non-target articulator in alternating sequences much more stringently because this articulator becomes a target in the following word. Put differently, the position of the articulator in both target and non-target positions may require more precise control in word pairs with alternating onsets compared to the identical onset word pairs, where such a contrast does not exist, resulting in similar degrees of variability in the first and second word in alternating onset sequences. In the identical onset sequences, the non-target articulator may not be activated at all, so this articulator is more likely to vary as a passive consequence of the simultaneous movements of the target articulators and only becomes active to adopt its position for the vowel. These assumptions are supported by the finding that word pairs with identical onsets were characterized by higher correlation values between target and non-target articulations, compared to the alternating onset condition.

Nonetheless, the fact that the first word in word pairs with identical onsets shows less variability than the first word in the alternating onset word pairs needs to be explained further.

The observed pattern might arise from dissimilar degrees of initial strengthening at different prosodic boundaries¹ (e.g., Fougeron & Keating, 1997). Fougeron and Keating distinguish four prosodic domains: the phonological word, the phonological phrase, the intonational phrase, and the utterance. Word boundaries are considered weaker than phrase boundaries and, as a consequence, initial segments are realized with less strength at word boundaries (e.g., Fougeron & Keating, 1997). Considering that Goffman, Gerken and Lusschesi (2007) found more movement variability for the lower lip and jaw in weak compared to strong syllables, it is assumed that strength at boundaries can be reflected in degree of variability. It could be argued that the two identical words in identical onset sequences form a phonological phrase, resulting in stronger articulation, and thus less variability, of the onset consonant in the first word, and weaker articulation in the second word. This stronger articulation of the first word is likely reinforced by the fact that, in the current study, the first word received primary stress. The onsets of the two dissimilar words in the alternating onset sequences might be strengthened at the word level, resulting in equal variability at the onsets of the two words within an alternating sequence, but in less strength (involving a word boundary), and thus more variability, when compared to the first word of a word pair with identical onsets. If true, this would suggest that speakers employ a form of supra-segmental control that influences the variability of the segment (see e.g., Fougeron & Keating, 1997).

It was hypothesized that in repetitive speech the factor "rate" would affect the movement ranges and variability measures of articulators for the two types of word pairs, but this was not confirmed. The correlation values, however, were strongly influenced by rate: higher rates caused higher correlation values. Assuming that correlations between articulators reflect biomechanical influences, this difference provides support for changes in mechanical constraints when performing sequences at higher rates. However, the correlation values in the current study are still rather low in an absolute sense, so clearly this is not the whole story. Also, the interaction between rate and type of word pair did not affect correlation values, although this was expected considering that higher rates could be achieved in "tapa" sequences than in "tata" sequences (Rochet-Capellan & Schwartz, 2007; Sevald et al., 1995). There remains a great deal of relative independence between productions of one versus the other articulator, even when they are mechanically linked, such as the tongue tip and tongue body (see e.g., Green & Wang, 2003).

Finally, vowels and consonant contexts differ in how they affect movement ranges and variability. As expected for high vowels (Recasens & Espinosa, 2009), most articulators in target and non-target position showed smaller movement in high vowel context. In addition, the high vowel context was also characterized by more variability for all the articulators, except for the tongue dorsum in non-target position.


In conclusion, the movement range data indicated that the movements of the tongue were affected by phonetic context across words, whereas the lower lip was unaffected. Whether word pairs are characterized by alternating or identical onsets thus contributes to variability in speech. Biomechanical constraints related to contributions of the jaw, as well as speaker strategies such as hyper- and hypo-articulated speech, are possible causes for this variability.

With the exception of the tongue tip in high vowel context, the type of vowel or coda#onset did not contribute to differences in variability between types of word pairs. Additional evidence for biomechanical constraints was found in the higher correlation between articulators in fast speech. Finally, the COEFFVAR data revealed that alternating and identical onset word pairs may be controlled differently depending on prosodic requirements, so this should be taken into account when designing future studies.

2.6 ACKNOWLEDGEMENTS

The study was supported by the Social Sciences and Humanities Research Council (SSHRC) and in part by funding from the Canada Research Chairs program, both awarded to the second author. The authors would like to thank Aravind Namasivayam for his technical support during the EMA sessions, Radu Craioveanu for his help analyzing the data, Toni Rietveld for his statistical advice, and Jeffrey Steele and Keren Rice for their valuable comments on earlier versions.

¹ We would like to thank an anonymous reviewer for pointing out this possible explanation.

2.7 APPENDIX

INSERT TABLE 2.4


2.8 REFERENCES

Chau, T., Young, S., & Redekop, S. (2005). Managing variability in the summary and comparison of gait data. Journal of NeuroEngineering and Rehabilitation, 2(1), 22.

De Jong, K., Beckman, M. E., & Edwards, J. (1993). The interplay between prosodic structure and coarticulation. Language and Speech, 36(2-3), 197-212.

Ferguson, G. A., & Takane, Y. (1989). Statistical Analysis in Psychology and Education (6th ed.). New York: McGraw-Hill.

Fougeron, C., & Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America, 101(6), 3728-3740.

Fowler, C. A., & Saltzman, E. L. (1993). Coordination and coarticulation in speech production. Language and Speech, 36, 171-195.

Frisch, S. A. (2007). Walking the tightrope between cognition and articulation: The state of the art in the phonetics of speech errors. In C. T. Schütze & V. S. Ferreira (Eds.), MIT Working Papers in Linguistics, Vol. 53: The State of the Art in Speech Error Research (pp. 155-171). Cambridge, MA.

Goffman, L., Gerken, L., & Lucchesi, J. (2007). Relations between segmental and motor variability in prosodically complex nonword sequences. Journal of Speech, Language, and Hearing Research, 50(2), 444-458.

Goldrick, M., & Blumstein, S. E. (2006). Cascading activation from phonological planning to articulatory processes: Evidence from tongue twisters. Language and Cognitive Processes, 21, 649-683.

Goldstein, L., Pouplier, M., Chen, L., Saltzman, E. L., & Byrd, D. (2007). Dynamic action units slip in speech production errors. Cognition, 103, 386-412.

Green, J. R., & Wang, Y. (2003). Tongue-surface movement patterns during speech and swallowing. Journal of the Acoustical Society of America, 113(5), 2820-2833.

Grosvald, M. (2010). Long-distance coarticulation in spoken and signed language: An overview. Language and Linguistics Compass, 4(6), 348-362.

Hardcastle, W., & Hewlett, N. (1999). Coarticulation: Theory, Data and Techniques. Cambridge: Cambridge University Press.

Hertrich, I., & Ackermann, H. (2000). Lip-jaw and tongue-jaw coordination during rate-controlled syllable repetitions. Journal of the Acoustical Society of America, 107(4), 2236-2247.

Howell, D. C. (2013). Statistical Methods for Psychology (8th ed.). Belmont, CA: Wadsworth.

Iskarous, K., Fowler, C. A., & Whalen, D. H. (2010). Locus equations are an acoustic expression of articulator synergy. Journal of the Acoustical Society of America, 128(4), 2021-2032.

Kroos, C. (2012). Evaluation of the measurement precision in three-dimensional Electromagnetic Articulography (Carstens AG500). Journal of Phonetics, 40, 453-465.

Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech Production and Speech Modelling (pp. 403-439). Dordrecht, The Netherlands: Kluwer Academic Publishers.

McMillan, C. T., & Corley, M. (2010). Cascading influences on the production of speech: Evidence from articulation. Cognition, 117, 243-260.

Mooshammer, C., Hoole, P., & Geumann, A. (2007). Jaw and order. Language and Speech, 50, 145-176.

Namasivayam, A. K., & Van Lieshout, P. H. H. M. (2011). Speech motor skill and stuttering. Journal of Motor Behavior, 43, 477-489.

Perkell, J. S., & Zandipour, M. (2002). Economy of effort in different speaking conditions. II. Kinematic performance spaces for cyclical and speech movements. Journal of the Acoustical Society of America, 112(4), 1642-1651.

Recasens, D., & Espinosa, A. (2009). An articulatory investigation of lingual coarticulatory resistance and aggressiveness for consonants and vowels in Catalan. Journal of the Acoustical Society of America, 125(4), 2288-2298.

Recasens, D., Pallares, M. D., & Solanas, A. (1993). An electropalatographic study of stop consonants. Speech Communication, 12, 335-355.

Repp, B. H., & Penel, A. (2002). Auditory dominance in temporal processing: New evidence from synchronization with simultaneous visual and auditory sequences. Journal of Experimental Psychology: Human Perception and Performance, 28, 1085-1099.

Rochet-Capellan, A., & Schwartz, J. L. (2007). An articulatory basis for the labial-to-coronal effect: /pata/ seems a more stable articulatory pattern than /tapa/. Journal of the Acoustical Society of America, 121(6), 3740-3754.

Rudy, K., & Yunusova, Y. (2013). The effect of anatomic factors on tongue position variability during consonants. Journal of Speech, Language, and Hearing Research, 56, 137-149.

Sevald, C. A., Dell, G. S., & Cole, J. S. (1995). Syllable structure in speech production: Are syllables chunks or schemas? Journal of Memory and Language, 34, 807-820.

Stone, M., Epstein, M. A., & Iskarous, K. (2004). Functional segments in tongue movement. Clinical Linguistics & Phonetics, 18, 507-521.

Van Lieshout, P. H. H. M., Hijl, M., & Hulstijn, W. (1999). Flexibility and stability in bilabial gestures: 2) Evidence from continuous syllable production. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings of the XIVth International Congress of Phonetic Sciences, Vol. 1 (pp. 45-48). Berkeley: University of California.

Van Lieshout, P. H. H. M., Merrick, G., & Goldstein, L. (2008). An articulatory phonology perspective on rhotic articulation problems: A descriptive case study. Asia Pacific Journal of Speech, Language, and Hearing, 11, 283-303.

Van Lieshout, P. H. H. M., & Moussa, M. (2000). The assessment of speech motor behavior using electromagnetic articulography. The Phonetician, 81, 9-22. Retrieved from http://www.isphs.org/Phonetician/phonetician81.pdf (date last viewed 10/02/13).

Van Lieshout, P. H. H. M., & Namasivayam, A. K. (2010). Speech motor variability in people who stutter. In B. Maassen & P. H. H. M. Van Lieshout (Eds.), Speech Motor Control: New Developments in Basic and Applied Research (pp. 191-214). Oxford: Oxford University Press.

Van Lieshout, P. H. H. M., & Neufeld, C. (2014). Coupling dynamics interlip coordination in lower lip load compensation. Journal of Speech, Language, and Hearing Research, 57, 597-615.

Westbury, J. R. (1994). On coordinate systems and the representation of articulatory movements. Journal of the Acoustical Society of America, 95(4), 2271-2273.

Yunusova, Y., Green, J. R., & Mefferd, A. (2009). Accuracy assessment for AG500, Electromagnetic Articulograph. Journal of Speech, Language, and Hearing Research, 52, 547-555.

Zharkova, N., & Hewlett, N. (2009). Measuring lingual coarticulation from midsagittal tongue contours: Description and example calculations using English /t/ and /a/. Journal of Phonetics, 37, 248-256.

Zierdt, A., Hoole, P., & Tillmann, H. G. (1999). Development of a system for three-dimensional fleshpoint measurements of speech movements. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings of the XIVth International Congress of Phonetic Sciences, Vol. 1 (pp. 73-76). Berkeley: University of California.


Table 2-1 Individual normal (N) and fast (F) rates in metronome beats per minute (bpm) and number of syllables per second (# syll/s) for each participant.

Participant    1    2    3    4    5    6    7    8    9   10   11   12   13   14   Mean    sd
N (bpm)       80   80   92   90  113   90  100  110   90   88   96   76   92   88   91.79  10.50
N (# syll/s)   3    3    3    3    4    3    3    4    3    3    3    3    3    3    3.06   0.35
F (bpm)      110  110  117  120  144  130  120  142  126  120  126  120  118  119  123.00  10.08
F (# syll/s)   4    4    4    4    5    4    4    5    4    4    4    4    4    4    4.10   0.34


Table 2-2 Results for the series of ANOVAs: dependent variables "COEFFVAR" and "median movement range values" (median) for target and non-target articulators. The within-subject variables are listed in the first column with all possible interactions. Significant results (p < 0.003) are indicated with an asterisk.

Target Non-target

Median COEFFVAR Median COEFFVAR

df F p F p F p F p

Word pair 1,13 104.94 < 0.00001* 0.07 0.8 42.62 < 0.0001 * 0.39 0.54

Wordposition 1,13 7.63 0.02 5.73 0.03 1.02 0.33 20.82 <0.001*

Rate 1,13 0.16 0.7 11.2 0.005 0.29 0.6 11.52 0.005

word pair*wordposition 1,13 4.29 0.06 2.04 0.18 0.57 0.46 71.13 <0.0001*

rate*word pair 1,13 1.33 0.27 7.13 0.02 2.97 0.1 1.15 0.3

Wordposition*rate 1,13 2.52 0.14 0.93 0.35 0.47 0.5 0.51 0.49

wordposition*rate*wordpair 1,13 0 0.96 0.96 0.34 0.09 0.77 0.12 0.73

TD word pair 1,13 75.14 < 0.00001* 1.25 0.28 14.41 0.002* 1.22 0.29

TD vowel 3,39 44.89 < 0.00001* 21.95 <0.00001* 8.42 < 0.001* 3.83 0.017

TD Coda#onset 1,13 0.13 0.72 3.26 0.09 9.17 0.01 0.77 0.39

TD word pair* vowel 3,39 1.44 0.24 1.07 0.37 0.2 0.89 1.41 0.25

TD Coda#onset*word pair 1,13 7.17 0.02 1.16 0.3 0.03 0.87 3.18 0.1

TD vowel* coda#onset 3,39 0.43 0.74 1.02 0.39 5.24 0.004 4.37 0.01

TD vowel*wordpair*coda#onset 3,39 0.34 0.8 1.12 0.35 0.61 0.61 0.72 0.55

TT word pair 1,13 136.28 <0.00001* 0.6 0.45 0 0.99 0.01 0.94

TT vowel 3,39 109.59 <0.00001* 27.92 <0.00001* 83.11 <0.00001* 47.31 <0.00001*

TT Coda#onset 1,13 9.99 0.007 3.35 0.09 0.24 0.63 0.09 0.77

TT word pair* vowel 3,39 12.25 < 0.00001* 3.1 0.04 0.86 0.47 1.6 0.2

TT Coda#onset*word pair 1,13 8.27 0.01 9.42 0.009 0.75 0.4 0.57 0.46

TT vowel* coda#onset 3,39 7.83 <0.001* 0.92 0.44 2.03 0.13 0.78 0.51


TT vowel*wordpair*coda#onset 3,39 2.01 0.13 1.32 0.28 0.22 0.88 0.97 0.42

LL word pair 1,13 0.02 0.89 0 0.99 7.76 0.015 1.43 0.25

LL vowel 3,39 30.96 <0.00001* 5.7 0.002* 126.78 <0.00001* 26.13 <0.00001*

LL Coda#onset 1,13 0.02 0.88 0.17 0.68 12.18 0.004 10.75 0.006

LL word pair* vowel 3,39 0.97 0.41 0.65 0.59 0.82 0.49 3.58 0.02

LL Coda#onset*word pair 1,13 3.17 0.1 0.02 0.89 5.65 0.03 2.16 0.13

LL vowel* coda#onset 3,39 0.33 0.8 0.02 0.1 0.82 0.49 0.66 0.58

LL vowel*wordpair*coda#onset 3,39 0.54 0.65 0.99 0.5 1.48 0.23 3.06 0.04


Table 2-3 Mean correlation values between a target articulator and a simultaneous non-target articulator for word pairs with alternating and identical onsets collapsed across articulator, context and word position in normal and fast speaking rate. Standard deviations are between brackets.

Alternating Identical Mean

Normal 0.15 (0.49) 0.27 (0.44) 0.21 (0.47)

Fast 0.18 (0.45) 0.33 (0.45) 0.26 (0.45)

Mean 0.17 (0.47) 0.30 (0.45)


Table 2-4 Stimulus list. The columns represent the non-target articulators tongue dorsum (TD), tongue tip (TT) and lower lip (LL). The cells contain the word pairs in which the non-target articulator appears in the second word. The coda#onset is bold and upper case. Thus, for example, the tongue dorsum appears as a non-target in the two words coP Top and coT Pot. The target articulator in the second word forms the onset of that particular word, i.e. the tongue tip and the lower lip respectively in this particular example.

non-target articulator

              alternating onset                  identical onset
Vowel   TD         LL         TT         TD         LL         TT
/ɑ/     coP Top    poCK Tock  toP Cop    toP Top    toCK Tock  coP Cop
        coT Pot    poT Cot    toCK Pock  poT Pot    coT Cot    poCK Pock
/ɪ/     kiP Tip    piCK Tick  tiP Kip    tiP Tip    tiCK Tick  kiP Kip
        kiT Pit    piT Kit    tiCK Pick  piT Pit    kiT Kit    piCK Pick
/u/     cooP Toop  pooK Took  tooP Coop  tooP Toop  tooK Took  cooP Coop
        cooT Poot  pooT Coot  tooK Pook  pooT Poot  cooT Coot  pooK Pook
/æ/     caP Tap    paCK Tack  taP Cap    taP Tap    taCK Tack  caP Cap
        caT Pat    paT Cat    taCK Pack  paT Pat    caT Cat    paCK Pack


Figure 2-1 Example of the word pat. The upper graph shows the audio signal. "Maximum lower lip" indicates the maximum for the target articulator "lower lip" in the first word, "maximum tongue dorsum" indicates the maximum for the non-target articulator, and "maximum tongue tip" is the maximum for the tongue tip articulator (coda consonant). The minimum values are indicated with minimum tongue tip, tongue dorsum and lower lip. The vertical axis represents the movement range (rescaled to percentages for the purpose of a different study) and the horizontal axis represents time (seconds). The vertical arrow pointing downwards from the maximum value for the lower lip indicates the measured value for the non-target articulator (tongue dorsum).


Figure 2-2 On the vertical axis: median movement ranges (mm) for word pairs with alternating (alt) and identical onsets (ident). Bars on the horizontal axis: word positions 1 and 2 and rates "fast" and "normal". The upper part presents the pooled results for the target articulators, the lower part the pooled results for the non-target articulators. Error bars represent standard error.


Figure 2-3 On the vertical axis: median movement range values (mm) for words with alternating onsets (alt) and identical onsets (ident). Bars on the horizontal axis: coda#onset combinations (e.g., /k#p/ or /t#p/ for the words tock pock and cot pot when the lower lip is the target in the second word, and pock tock and pot cot when the lower lip is the non-target in the second word), presented for the different target articulators (upper graph) and non-target articulators (lower graph) tongue tip (TT), tongue dorsum (TD) and lower lip (LL) in 4 different vowel contexts. Error bars represent standard error.


Figure 2-4 On the vertical axis: COEFFVAR values (mm) for alternating onset (alt) and identical onset (ident) word pairs. Bars on the horizontal axis: word positions 1 and 2 and rates "fast" and "normal". The upper part presents the pooled results for the target articulators and the lower part presents the pooled results for the non-target articulators. Error bars represent standard error.


Figure 2-5 On the vertical axis: COEFFVAR values (mm) for alternating onset and identical onset word pairs. Bars on the horizontal axis: coda#onset, presented for the different target (upper part) and non-target (lower part) articulators tongue tip (TT), tongue dorsum (TD) and lower lip (LL) in 4 different vowel contexts. Error bars represent standard error.

3 Chapter 3


The Effect of Phonetic Context on the Dynamics of Intrusions and Reductions

Anneke W. Slis⁷ and Pascal H. H. M. van Lieshout

Department of Speech Language Pathology, Oral Dynamics Lab, 160-500 University Avenue, University of Toronto, Toronto, Ontario, M5G 1V7, Canada

⁷ Author to whom correspondence should be addressed. Electronic mail: [email protected]


3.1 Abstract

Recent kinematic studies have described speech errors as intruding articulatory movements during the production of target constrictions intended to create a particular sound, as well as reduced movements of these target constrictions. Because little is known about how co-articulatory constraints affect these errors, the current study examined whether these intrusions and reductions are affected by phonetic context. A repetitive speech task was employed with CVC-CVC word pairs, which differed in their onsets and shared their rhymes. "Phonetic context" was manipulated by changing the rhyme across different word pairs using specific vowel and consonant combinations. Vertical movements of the tongue tip, tongue dorsum and lower lip were recorded with Electro-Magnetic Articulography (EMA). A measure of the relative difference in position of an articulator between productions of target and non-target constrictions for a given word pair was correlated with the number of intrusions. The results revealed that in the vicinity of /ɑ/ and /æ/, articulatory movements resulted in more intrusions than in the vicinity of /u/. Furthermore, the tongue dorsum showed more intrusions in the context of /æ/ and /ɪ/ than the tongue tip and the lower lip in these contexts. These intrusions were correlated with the relative difference in position of the tongue dorsum for target versus non-target constrictions. The results suggest that articulatory factors related to vowel context and type of articulator affect the number of intrusions, and that speakers adjust their vocal tract space to accommodate these factors. Such influence was not seen for reductions. The results are explained within a task-dynamical framework.

Keywords: intrusion, reduction, speech error, Task Dynamics, phonetic context, variability, gesture


3.2 Introduction

For more than a century, speech errors have been a fruitful phenomenon for scientists to study and to use to infer how speech is planned and produced. These inferences have made it possible to develop speech production models, based mainly on traditional perceptual methods used to investigate those errors. These studies have suggested that many speech errors follow certain predictable patterns, thought to originate predominantly at the phonological level of planning, prior to execution of the actual articulatory movements, and unaffected by preceding or following speech segments (Dell & Sullivan, 2004; Fromkin, 1971; Goldrick & Blumstein, 2006; Goldstein, Pouplier, Chen, Saltzman, & Byrd, 2007; Kent, 1996; Levelt, Roelofs, & Meyer, 1999). Recent kinematic studies question some of the assumptions on the discrete phonological nature of speech errors by showing that certain speech errors arise as a consequence of autonomous mechanisms generally observed in movement coordination (Goldstein et al., 2007). The nature of these errors differed substantially from what had been claimed based on traditional perceptual studies. Instead, they manifested themselves in the form of so-called intrusions and reductions of articulatory movements. Intrusions were defined as significantly enhanced movements of articulators that were not expected to be activated, given the intended speech segment (e.g., movement of the tongue tip during the onset of the word cop), whereas reductions were defined as significantly reduced movements of (a set of) intended articulators (e.g., movement of the tongue tip during the word top). The term "significant" in this context refers to the fact that such events were identified on the basis of distributional characteristics of movements for these articulators (see current definitions of speech errors and historical background for more detail). The general autonomous mechanisms of movement coordination, which are claimed to cause these intrusions and reductions, arise from the interaction between the individual components as they are coupled together in performing a specific task (cf. Kelso, 1995; Peper & Beek, 1998/1999). However, physical differences between structural components, such as those found between individual (sets of) articulators, can influence coordination dynamics (Kelso, 1995). This raises the question whether different combinations of articulatory movements influence movement coordination such that certain intrusions and reductions occur more frequently than others, depending on phonetic context (vowel and consonant combination) and type of articulator. Examining the effect of context and type of articulator on patterns of intrusions and reductions makes it possible to separate these influences from those originating from different aspects of processing, such as word frequency or other lexical aspects often observed in error patterns (e.g., Corley, Brocklehurst & Moat, 2011; Goldrick, Baker, Murphy & Baese-Berk, 2011; Nooteboom & Quené, 2007). Knowledge about how context and different articulators affect intrusion and reduction patterns is thus important for developing and extending these models of speech and language production.

3.2.1 Current definitions of speech errors and historical background

In general, a speech error can be defined as an utterance produced by the speaker that differs from the originally intended one. Many studies have revealed that a common type of speech error manifests itself perceptually as a wrongfully selected or misplaced speech unit, such as a phoneme or feature (Dell, 1986; Fromkin, 1971; Shattuck-Hufnagel, 1979). An example of such an error is "teep a cape" for intended "keep a tape" (Fromkin, 1971). Apparently, two phonemes have been switched, namely /t/ and /k/. Such errors have most commonly been explained as the consequence of abstract, discrete units, such as phonemes and features, that were wrongfully selected at the phonological level and that were subsequently executed in an apparently correct way (Dell & Sullivan, 2004; Fromkin, 1971; Levelt, 1989; Meyer, 1992; Shattuck-Hufnagel, 1979).


The dilemma with the claim that speech errors are phonemic in nature is that the majority of studies investigating these phenomena have transcribed errors perceptually. It is known that transcribing perceived speech introduces listener biases and that such biases may lead to possible sub-phonemic or articulatory errors going unnoticed (Kent, 1996). This results in labeling perceived errors as consisting of correctly produced but misplaced phonemes or features (cf. Cutler, 1981; Laver, 1980; McMillan, Corley, & Lickley, 2009; Pouplier & Hardcastle, 2005). In contrast, findings from studies which examined acoustic data (Frisch & Wright, 2002; Goldrick & Blumstein, 2006; Laver, 1980; Goldrick, Baker, Murphy & Baese-Berk, 2011), electro-muscular activity (Mowrey & MacKay, 1990), and movement data from individual articulators (Boucher, 1994; Frisch, 2007; Goldstein et al., 2007; McMillan & Corley, 2010; McMillan et al., 2009; Pouplier, 2008) indicated that speech errors frequently resulted in physical events that did not correspond to discrete speech segments. Using X-ray data, Boucher (1994), for example, observed that a speaker produced an extra labio-dental motion for /v/ in the case of a correctly perceived /r/. This extra insertion was also observed in a tongue twister study by Mowrey and MacKay (1990) when measuring muscle activity. Moreover, errors were gradual in nature; they could range from almost no activation of articulators or muscles to full activation.

In addition to the evidence that errors did not always result in correct speech segments, several studies have shown that the type of constriction can affect error patterns. Laver (1980), for example, revealed that producing words with different vowels that required activating similar muscle groups never resulted in errors, whereas words with two vowels whose articulation triggered the use of different muscle groups resulted in an averaging of muscle activations for the two vowels. Furthermore, Frisch and Wright (2002) observed that with the twister sung zone Zeus seem, the word seem was never produced as voiced and thus never contained a (gradual) voicing error. A more systematic series of kinematic studies (e.g., Goldstein et al., 2007; Pouplier, 2008) revealed several interesting patterns that necessitated a completely different approach to explaining the origin of speech errors.

The series of studies by Pouplier and colleagues (2007; 2008) employed a speech task in which speakers had to repeat word pairs like cop top. Speakers frequently activated a particular articulator, which constricted the vocal tract in a way appropriate for a specific speech segment, simultaneously with an articulator that was not supposed to be activated at that time (these behaviors were labeled intrusions). In the word cop this meant that the tongue tip was activated simultaneously with the tongue dorsum during /k/. Likewise, although less frequently, articulations were observed in which the movements of target constrictions were reduced (these behaviors were labeled reductions). For example, in the word cop the movement of the tongue dorsum was reduced during the production of the onset consonant /k/. These instances of intruding non-target and reduced target constrictions differed substantially from movement patterns that were considered normal, based on distributional characteristics (using means and standard deviations of target and non-target constrictions). Similar to phenomena observed in acoustic error studies on voicing (Frisch & Wright, 2002) and voice onset time (Goldrick & Blumstein, 2006), the intruding and reduced movements varied between gradual and full manifestations. Speaking rate (Goldstein et al., 2007) and word structure (Pouplier, 2008) affected the number of intrusions and reductions significantly: more of them appeared at fast speaking rates (Goldstein et al., 2007), and CVC syllables with different onsets but similar vowels and coda consonants showed more occurrences of these events than CV syllables, which only shared vowels (Pouplier, 2008). In addition, these types of errors increased over the course of a trial, with more errors found at the end, especially in the fast rate condition. Most importantly for the current study, the type of vowel also significantly affected the number of these errors: more errors were detected in the /ɪ/ condition (e.g., kip tip) than in the /ɑ/ condition (e.g., cop top).
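The distributional criterion described above can be made concrete with a small sketch. The following Python fragment is purely illustrative: the 2-standard-deviation cut-off, the synthetic amplitude values, and the function name are assumptions for demonstration, not the exact procedure used in the studies cited here.

```python
from statistics import mean, stdev

def find_events(target_amps, nontarget_amps, k=2.0):
    """Flag intrusions/reductions from per-condition amplitude distributions.

    Intrusion: a non-target articulator moves much more than usual
    (amplitude above its distribution mean + k*SD). Reduction: a target
    articulator moves much less than usual (below its mean - k*SD).
    Published studies apply more careful criteria; this is only a sketch.
    """
    mt, st = mean(target_amps), stdev(target_amps)
    mn, sn = mean(nontarget_amps), stdev(nontarget_amps)
    intrusions = [i for i, a in enumerate(nontarget_amps) if a > mn + k * sn]
    reductions = [i for i, a in enumerate(target_amps) if a < mt - k * st]
    return intrusions, reductions

# Invented example: tongue tip amplitudes (mm) across repetitions of "cop",
# where the tip is a non-target, and of "top", where it is the target.
tip_in_cop = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 3.0]    # last token intrudes
tip_in_top = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 4.0]  # last token is reduced
print(find_events(tip_in_top, tip_in_cop))  # → ([6], [6])
```

This also captures the gradual character of the events: a token can sit anywhere between the "normal" range and a full intrusion, and only crosses the threshold when it deviates far enough from its own distribution.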

3.2.2 Theoretical framework

The gradual nature of the above-described intrusion and reduction errors is problematic for models in which discrete speech units are considered to be wrongfully selected at a phonological level and subsequently executed as correct speech segments (see also Goldrick & Blumstein, 2006; Goldstein et al., 2007; Mowrey & MacKay, 1990). However, there are other models that can potentially account for their gradual nature. The cascading model posits that several symbolic phonological representations are activated simultaneously and that both activations cascade to the next level (Goldrick & Blumstein, 2006; McMillan et al., 2009; for an overview see Pouplier & Goldstein, 2010). The resulting variability in speech productions reflects the level of activation of the individual phonological representations. This model can explain the gradual nature of errors; however, it cannot account for the fact that errors increase over the course of a trial (Pouplier & Goldstein, 2013). The other model that could potentially address the presence and nature of intrusions and reductions is Articulatory Phonology (Browman & Goldstein, 1989; Goldstein, Byrd, & Saltzman, 2006), which will be discussed next in more detail as it forms the main theoretical background for the current study.

3.2.3 Articulatory Phonology (AP)

Most of the findings are consistent with principles outlined in the theory of AP (Browman & Goldstein, 1989; Fowler & Saltzman, 1993; Goldstein et al., 2006; Goldstein et al., 2007), in which the basic units of speech production are articulatory gestures. Combined with principles from Task Dynamics (Goldstein et al., 2006; Haken, Kelso, & Bunz, 1985; Peper & Beek, 1999; Peper, Beek, & van Wieringen, 1995a, b; Saltzman, 1995), this theoretical framework will be used in the current study to generate predictions and discuss the findings. An important aspect of AP is that the cognitive components of speech planning and their physical implementations form the macroscopic and microscopic dimensions of the speech production system (Browman & Goldstein, 1995). This property makes it possible to explain speech production in a manner such that linguistic factors and physical aspects of (articulatory) movements share certain dynamic aspects that can be mapped onto each other in a transparent, reciprocal manner (Browman & Goldstein, 1995).

A gesture is a dynamically defined unit of speech that embodies speech events at the phonological as well as at the articulatory level (Goldstein et al., 2006; Tilsen & Goldstein, 2012). For the latter to reach a linguistically specified goal, a gesture constrains the movements of a set of articulators in the vocal tract. For example, to produce a /t/, the jaw, tongue tip, and tongue body are coordinated in such a way that the correct constriction degree (for a stop) and location (alveolar) for the consonant is reached within the appropriate time frame. Gestures are thus spatially as well as temporally specified and are able to overlap in both dimensions. Consequently, an intrusion can be explained in the context of AP as the consequence of two simultaneously activated, overlapping target and non-target constrictions by different articulators. Likewise, reductions can be explained as gestural constriction actions that fail to reach their linguistic goal.

The finding that the rate of errors increases over time can be accounted for in the task-dynamical part of AP, where each gesture is associated with a limit cycle oscillator. This means that, similar to a damped mass-spring system, the model claims that gestures act as point attractor systems in creating and releasing constrictions (Saltzman & Byrd, 2000; Saltzman & Munhall, 1989). Learned over time, these oscillators are functionally coupled in a lawful manner to construct larger units including syllables and words (Goldstein et al., 2006; Goldstein et al., 2007). How coupled gestures behave follows general principles of coupled oscillators described in models of dynamical systems (Haken et al., 1985; Peper & Beek, 1999; Peper, Beek, & Stegeman, 1995; Saltzman, 1995). Coupled oscillators are most stable in a 1:1 frequency mode. In the case of, for example, the sequence top top, the tongue tip and lower lip are in a 1:1 frequency mode: two tongue tip constrictions in onset position versus two lower lip constrictions in coda position. However, in a word pair such as cop top, there is a 1:2 relationship between the single tongue tip and tongue dorsum constrictions in onset position and the two lower lip constrictions in coda position. This 1:2 ratio is deemed less stable than the 1:1 ratio (Peper & Beek, 1998; Peper & Beek, 1999; Peper et al., 1995a). As a consequence, Goldstein et al. (2007) theorized that extra constricting gestures were added to the target onset constrictions to make the articulatory movement patterns more stable (see also Peper et al., 1995b; Van Lieshout, Hijl, & Hulstijn, 1999). Normally, the 1:2 ratio can be maintained as part of the required phonological specifications, but under certain conditions, for example at higher speaking rates, the coupling between the gestures weakens (for similar claims regarding limb control see, e.g., Peper & Beek, 1998/1999). Over time, this may cause the system to switch to a more stable mode by adding non-target gestures. The increase in intrusions towards the end of a trial is thus explained by the fact that language-specific constraints, namely the learned gesture and its coupling relations with other gestures, compete with extra-linguistic, dynamic principles of entrainment typical of coupled articulators in general (Goldstein et al., 2007). In this context, the term "speech error" is misleading. The actual behaviors (intrusions/reductions) can be considered appropriate from a dynamical perspective (they enhance stability), but from a phonological perspective they are inappropriate, as they serve no meaningful linguistic goal. Since the traditional terminology uses the term "error", we will adhere to that label, but remind readers to keep this important distinction in mind.
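The relative stability of 1:1 versus 1:2 frequency coupling can be illustrated with a minimal numerical sketch. The Python fragment below simulates two sinusoidally coupled phase oscillators, a textbook coupled-oscillator model rather than the full task-dynamic implementation of AP; the frequencies, coupling strength, and integration settings are illustrative assumptions.

```python
import math

def phase_locking(omega1, omega2, coupling=1.0, dt=0.001, steps=20000):
    """Integrate two coupled phase oscillators (Euler method) and return
    the mean resultant length R of their relative phase over the second
    half of the run: R near 1 = stable phase locking, R near 0 = drift."""
    p1, p2 = 0.0, 0.5  # small initial phase offset
    rel = []
    for step in range(steps):
        dp1 = omega1 + coupling * math.sin(p2 - p1)
        dp2 = omega2 + coupling * math.sin(p1 - p2)
        p1 += dp1 * dt
        p2 += dp2 * dt
        if step >= steps // 2:
            rel.append(p2 - p1)
    c = sum(math.cos(r) for r in rel) / len(rel)
    s = sum(math.sin(r) for r in rel) / len(rel)
    return math.hypot(c, s)

# 1:1 mode (e.g., top top): both oscillators at 2 Hz -> phases lock.
r_one_to_one = phase_locking(2 * math.pi * 2, 2 * math.pi * 2)
# 1:2 mode (e.g., cop top): 2 Hz vs. 4 Hz -> the weak coupling term cannot
# absorb the frequency difference, and the relative phase drifts.
r_one_to_two = phase_locking(2 * math.pi * 2, 2 * math.pi * 4)
print(r_one_to_one > 0.9, r_one_to_two < 0.5)  # → True True
```

In this toy model the 1:1 pair settles into a fixed relative phase, while the 1:2 pair never does — the same qualitative contrast invoked above to explain why adding a non-target gesture (restoring a 1:1 relationship) stabilizes the pattern.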

An important aspect of the mechanism that generates intrusions and reductions is that it arises independently of the types of muscles or articulators involved, without the need for further specification: it is a purely autonomous process based on the interaction between the coupled structures (Saltzman, 1995; Peper, Beek, & Stegeman, 1995). However, the specific implementation of the coupling relationship (which for ease of interpretation we will refer to as coordination) is affected to some degree by the physical properties of the individual components (Kelso, 1995). For example, differences in inertia of coupled oscillators will affect the stability of the coupling, especially when speeding up (Van Lieshout & Neufeld, 2014). From this, one can infer that phonetic context (as defined by combinations of articulators with often different inertial properties, such as the tongue and lips) affects the stability of coordination and thus the occurrence of intrusions and reductions. This was shown in an ultrasound study by Pouplier (2008), in which the tongue dorsum was observed to intrude more frequently than the tongue tip. In addition, Goldstein et al. (2007) showed context sensitivity in the occurrence of intrusions and reductions with regard to the articulatory properties of the following vowel; in the context of the high vowel /ɪ/ more errors were measured than in a low vowel /ɑ/ context. For the latter finding, it was suggested that the tongue shape of the vowel /ɪ/ was more compatible with /k/ and /t/ productions than the tongue shape of /ɑ/. During the production of /ɪ/, the tongue constricts the vocal tract to a greater degree, and the relatively high tongue position induced by the high vowel context may have resulted in more errors. Alternatively, an anonymous reviewer (Goldstein et al., 2007) suggested that the larger spatial distance between the constriction locations of a low vowel and either consonant might enable a resetting effect, i.e., re-establishing tongue position, such that a low vowel context is less error prone.


The question arises how to explain these phonetic context effects on intrusions and reductions in the framework of AP. In AP, the underlying linguistic requirements of a gesture are context-independent; context-dependency is automatically introduced by temporally and spatially overlapping gestures, which either share articulators or differ in types of articulators (Fowler & Saltzman, 1993; Goldstein et al., 2006; Saltzman, Nam, Goldstein & Byrd, 2006). When gestures impose contradictory requirements on a shared articulator, the activation is blended or averaged (Fowler & Saltzman, 1993; Saltzman & Munhall, 1989). The extent to which overlapping gestures share the same articulators, and the mechano-inertial properties of these articulators, affect the degree of articulatory variability (Fowler & Saltzman, 1993; Iskarous, Fowler, & Whalen, 2010; Recasens & Espinosa, 2009). Regarding the kip tip and cop top stimuli from Goldstein et al. (2007), the tongue tip gesture that forms the onset /t/ of the second word involves the articulators tongue tip and lower jaw. However, almost simultaneously with this onset, the vowel gesture starts, consisting of jaw and tongue body contributions, to produce the vowel /ɪ/. Thus, the tongue tip and vowel gestures share jaw contributions. In the case of a high vowel such as /ɪ/, the jaw will be in a higher position than with a following lower vowel /ɑ/ (Recasens, 1999). This position of the jaw passively affects the tongue as well (Mooshammer, Hoole, & Geumann, 2007). When a bilabial consonant is produced, for instance during the onset of the first word in pat cat, the tongue is free to move and will be influenced more by the surrounding vowels than when a velar or alveolar consonant is produced. This phenomenon is referred to as co-articulatory resistance (Iskarous et al., 2010; Recasens & Espinosa, 2009) and can be seen as the ability to resist disrupting influences from context (Fowler & Saltzman, 1993). The more a gesture is able to resist disrupting influences, the more aggressive it is in disrupting other gestures (Fowler & Saltzman, 1993; Recasens & Espinosa, 2009).


3.2.4 Current study

The current study has three distinct objectives:

1. Because of the relatively small number of studies that confirm principles of coupling dynamics for speech movements, the first objective is to validate the results of the study by Goldstein et al. (2007) with more participants and a different approach to define intrusions and reductions. We used a different approach because previous work has shown that the use of so-called control stimuli is problematic for a variety of reasons and that intrusions and reductions need to be defined based on clear task specifications for all articulators involved in a given word pair (Slis & Van Lieshout, 2013).

2. The present study explores in more detail how phonetic context impacts intrusions and reductions. Given the observation that more intrusions/reductions were found for stimuli with /ɪ/ compared to /ɑ/ (Goldstein et al., 2007), it is investigated whether this effect is specific to /ɪ/ or whether other vowel/consonant combinations have a systematic influence on the occurrence of intrusions and/or reductions. These different combinations are hypothesized to result in divergent movement patterns of target and non-target constrictions. Based on the degree to which active articulators are able to resist disruptions, some of these combinations might trigger more intrusions and/or reductions than others. In the study by Goldstein et al. (2007), intrusions and reductions were collapsed into a single category. Thus, it is not clear whether the following vowel affects intrusions, reductions, or both. For this reason, in the current study the impact of phonetic context is studied separately for both categories. This is relevant because vowels differ from each other in more than one way. For example, the vowels /ɪ/ and /ɑ/ are distinguished by height, and they are also distinguished by their position along the front (/ɪ/) and back (/ɑ/) dimension. Therefore, in order to explore different dimensions of vowel production that might cause changes in intrusion/reduction ratios, the stimulus material consists of four different vowels: high-back /u/, low-back /ɑ/, low-front /æ/, and high-front /ɪ/8. Furthermore, unlike previous studies, we created more variation in the consonants used. In addition to alveolar /t/ and velar /k/, a bilabial stop /p/ was included in the material. Whereas Goldstein et al. (2007) did not find a difference in the number of intrusions involving tongue tip (during onset /k/) or tongue dorsum (during onset /t/) gestures, an ultrasound study revealed a clear trend for the tongue dorsum to be more often involved as the intruding gesture compared to the tongue tip (Pouplier, 2008). Given that a larger set of onset and coda consonants is employed in the current study, it is possible to explore phonetic context in a broader sense, as different consonant combinations influence the movements of target and non-target constrictions (Recasens, Pallares, & Solanas, 1993). For example, the two different coda#onset combinations /p#k/ (top#cop) and /k#p/ (tock#pock) share the tongue tip as the intruding articulator in the second word. Thus, it can be determined whether the nature of this combination changes the occurrence of tongue tip intrusions. Together, vowel and consonant variations were used in the current study to explore the impact of phonetic context on intrusions and reductions. Changes can be predicted based on the principles of coupling dynamics, in which the presence of different articulator combinations may change the stability of the coupling between gestures with and without target constrictions for a given word position. The exact nature of these changes is harder to predict, as little is still known about the physical properties of articulators and their impact on coupling dynamics (Van Lieshout & Neufeld, 2014).

8 Rounding and the tense/lax distinction are other characteristics that distinguish these vowels, but these factors are not explored in the current study.

3. The third objective is to determine to what extent specific differences in articulatory movement patterns can predict the occurrence of intrusions and reductions. To this end, the relative difference in position of a particular articulator during the production of target and non-target constrictions at the onset of a word within a given word pair is considered as a predictor for the number of intrusions and reductions. It is hypothesized that a smaller difference indicates a higher position of a non-target articulator. This factor is deemed relevant given the claim that the narrow lingual constriction of the vocal tract, i.e., a higher position of the tongue, caused more intrusions and/or reductions in the /ɪ/ condition (Goldstein et al., 2007).

3.3 Method

The methods of this study are for the most part identical to those described in Slis & Van Lieshout (2013). The main aspects shared between the two studies are highlighted, and we provide more detail on differences in design.

3.3.1 Participants

Data from fourteen monolingual speakers of Canadian English, 7 male and 7 female, ranging between 19 and 45 years of age, were included in the study. Originally, 21 participants were recruited, but 7 were excluded for the following reasons: (1) loss of too many data points, for example, due to a broken coil (n = 1); (2) participants were unable to perform the task (n = 1) or failed to complete both sessions when run on separate days (n = 4); (3) one participant did not meet the inclusion criteria outlined immediately below, which only became apparent after the study was completed.

A speaker was considered monolingual when Canadian English was the only language spoken at home and the main instructional language in school during childhood. To participate in this study, a speaker had to report normal vision (after correction), since part of the test involved reading the stimuli, and no history of speech, hearing or language difficulties. All participants gave written informed consent and received monetary compensation for their participation. The study was approved by the Health Science Research Ethics Board at the University of Toronto.

3.3.2 Stimuli

The stimulus material consisted of word pairs in which the rhymes of the two words were identical but in which the onset consonants alternated, such as in cop top. For the purpose of a different study (Slis & Van Lieshout, 2013), an extra set of word pairs was included in which the onset consonants were identical, such as in top top. These will not be discussed further in this paper. The word pairs differed in their vowel and consonant combinations. The combinations of onset consonants included /k/-/t/, /p/-/t/, and /p/-/k/. The coda consonant consisted of the third remaining consonant, /p/, /k/ or /t/. The vowels were /æ/ as in cat, /ɪ/ as in kit, /u/ as in coot, and /ɑ/ as in cot.

Different vowel and coda#onset consonant combinations form the context. An intrusion by a specific non-target articulator can be measured during the onset of two different coda#onset consonant combinations. For example, in the word pair top#cop, the tongue tip is the non-target articulator during the onset /k/ of the second word cop and is preceded by a coda /p/; in the production of the second word in tock#pock, the tongue tip forms the non-target during the onset /p/ of the second word, preceded by the coda /k/. In these two examples, the coda and onset combinations are /p#k/ and /k#p/, with the tongue tip as the non-target articulator in the second word. A reduction involves a target constriction during an onset consonant, for example, the articulation of the tongue dorsum in the onset of the second word in top#cop and pot#cot, produced in the context of two different codas, /p/ and /t/ respectively.

All words appeared in both first and second position of a word pair (e.g., cop top and top cop).

Stress always fell on the first word of the pair. In addition, two different speaking rates, normal and fast, were employed (see 3.3.3 for a detailed description). The total number of stimuli used in the current study was 48 (3 onset combinations * 4 vowels * 2 orders * 2 speaking rates). For the complete stimulus list, see the appendix9.
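The stimulus count can be cross-checked by enumerating the design factors. The sketch below is a hypothetical reconstruction of the factorial design (the vowel and word labels are illustrative placeholders; the actual word forms appear in the appendix):

```python
from itertools import product

# Hypothetical reconstruction of the stimulus design: for each onset pair,
# the third remaining stop serves as the coda consonant.
onset_pairs = {("k", "t"): "p", ("p", "t"): "k", ("p", "k"): "t"}
vowels = ["ae", "I", "u", "A"]      # stand-ins for /ae/, /I/, /u/, /A/
orders = ["AB", "BA"]               # each word in first and second position
rates = ["normal", "fast"]

stimuli = [(onsets, coda, vowel, order, rate)
           for (onsets, coda), vowel, order, rate
           in product(onset_pairs.items(), vowels, orders, rates)]
print(len(stimuli))  # 3 onset pairs * 4 vowels * 2 orders * 2 rates = 48
```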

3.3.3 Procedures

Similar to Goldstein et al. (2007), a repetitive speech task paradigm was used in which participants repeated a word pair as frequently as possible for up to 17 repetitions. Two different speaking rates were employed, normal and fast, determined individually for each participant as follows. The participant was instructed to repeat the word pair mik nik as often as possible on a single breath at a comfortable speaking rate. This served as an estimate of his/her normal rate. Next, the speaker had to perform the same task as fast as possible, until it became impossible to keep repeating the words. Ninety percent of that rate was taken as the high speaking rate condition for the actual study, because the goal of the study was to investigate whether a higher speaking rate introduced more intrusions and reductions, and these cannot be measured when the speaker is not able to perform the task in an adequate way. During the experiment, a combination of a synchronized visual and auditory metronome provided feedback to the participants to help them control speaking rate (e.g., Repp & Penel, 2002). A complete word pair needed to be produced during a single beat of the metronome. Both the word pair and the visual and auditory metronome signals (1000 Hz) were present for the entire trial.

9 The resulting word pairs consisted of a combination of low and high frequency words. Although the results of Goldstein et al. (2007) do not suggest that this will be an issue (kip has a low frequency in the language compared to tip, but no difference was found in intrusion or reduction rate), an extra analysis will be performed to check for possible frequency effects.

The word pairs, presented at a font size of 100 points (approximately 3.35 cm per letter), appeared in the center of a 19-inch monitor, located approximately one meter in front of the participant. The first word of each pair was shown in capital letters to indicate it had to be produced with primary stress, whereas the second word was presented in lowercase. The visual metronome consisted of a green round dot that appeared above the word. Before the actual trial started, four red dots blinked simultaneously with the auditory metronome signals so the participant had time to prepare.

Articulatory movement data were collected with the AG500 EMA system (Carstens Medizinelektronik GmbH, Germany). Sensor coils were placed on the mid-sagittal vermilion border of the upper and lower lips, the jaw, the tongue tip (1 cm behind the apex), the tongue body (3 cm behind the tongue tip coil), and the tongue dorsum, as far back on the tongue as possible, based on how well the participant could tolerate a coil at that location (for a detailed description see Slis & Van Lieshout, 2013). For the purpose of this study, only movement data from the tongue tip, tongue dorsum and lower lip were analyzed.


Original amplitude signals were sampled at 200 Hz. Next, using algorithms provided by the manufacturer of the device, 3D positions over time were calculated for each coil (Yunusova, Green, & Mefferd, 2009). The positional data were stored on the computer together with a simultaneously recorded acoustic signal, sampled at 16 kHz. At the same time, a Marantz digital recorder (type PMD670) recorded the speech signal together with the metronome signal at 48 kHz for future acoustic analysis.

3.3.4 Analysis

Movement data were band-pass filtered between 0.5 and 6 Hz according to standard procedures in the lab (Van Lieshout, Bose, Square & Steele, 2007). Next, articulatory movements were normalized to eliminate possible inter-speaker variations related to differences in vocal tract size, so that data could be compared across trials and participants (Goffman, Gerken, & Lucchesci, 2007; Mooshammer & Geng, 2008; Namasivayam, Van Lieshout, McIlroy, & de Nil, 2009; Ostry, Cooke, & Munhall, 1987; Smith, Goffman, Zelaznik, Ying, & MacGillem, 1995). To this end, movement data were scaled relative to the values of the highest and lowest movement positions on a trial-by-trial basis, separately for each articulator. This method preserves the relative differences between target and non-target constrictions within a trial.
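As an illustration, this trial-by-trial min-max scaling can be sketched as follows (a minimal example assuming each articulator's positional signal is stored as a NumPy array per trial; the function and variable names are ours, not from the original analysis scripts):

```python
import numpy as np

def normalize_trial(positions):
    """Scale one articulator's positional signal within a single trial to a
    0-100% relative movement range: the lowest position within the trial
    maps to 0% and the highest to 100%."""
    positions = np.asarray(positions, dtype=float)
    lo, hi = positions.min(), positions.max()
    return 100.0 * (positions - lo) / (hi - lo)
```

Because scaling is done separately per trial and per articulator, differences between target and non-target constriction values within a trial are preserved in relative terms.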

To select the onsets of individual words forming the word pairs, trial data were segmented. The minima of the movement cycle for the coda consonant formed the boundaries for the word onset. Within these boundaries, maximum and minimum relative movement ranges of the individual articulators, “tongue tip”, “tongue dorsum” and “lower lip”, were located (figure 3.1).


Figure 3-1 Waveform and movement traces for the two words of one word pair, "pack tack". The vertical axis displays the relative movement range (%) of the individual articulators lower lip (LL, solid line), tongue dorsum (TD, dashed line), and tongue tip (TT, dotted line). The maximum lower lip and maximum tongue tip positions (each indicated with a downward-pointing arrow) form the two maximum points for target constrictions. The non-target constrictions are located at the black squares (see text for more details).

The movement ranges of target articulators were calculated based on the travelled distance of an articulator from the minimum position to the maximum position within a segment. The movement range of a non-target articulator was measured at the location where the target articulator was at its maximum position (Slis & Van Lieshout, 2013).

The first and last repetitions of each trial were always disregarded, as these productions may behave somewhat differently compared to the rest of the trial (Slis & Van Lieshout, 2013).

Given a possible maximum of 17 repetitions within a trial, this resulted in a maximum of (17 - 2 =) 15 target and non-target constriction values for a specific articulator. The median number of repetitions for a given trial (minus first and last repetition) across all participants was 15. In total, 40 trials out of 1344 (2.98%) were discarded, leaving a total of 1304 trials for analysis.

Discarded trials included trials in which the participant severely stumbled over the words for the entire trial or did not start speaking at all. Repetitions were retained when the following criteria were met (see also Slis & Van Lieshout, 2013): 1) when a trial was performed fluently and could be segmented into first and second words but audible errors occurred, as in coptop coptop toptop, these errors were included; 2) when an error disturbed the flow of the word pairs, for example coptop coptop top coptop, the extra inserted top was disregarded; 3) when the participant managed to repeat several word pairs correctly but not all, data were still analyzed for the correctly produced part. Of the total of 1304 trials, 42 trials (3.2%) consisted of 6 to 10 fluent repetitions and 8 trials (0.5%) consisted of 5 or fewer fluent repetitions.

3.3.4.1 Outlier criteria for defining intrusions and reductions

Based on the way errors were interpreted in the series of studies by Pouplier and colleagues, the question arises how to define intrusions and reductions. McMillan and Corley (2010) explained non-canonical errors as arising from competing phonological representations activated simultaneously, cascading down to the level at which these representations are translated into articulatory movements. This results in speech segments that include properties of more than one phonological representation, which increases variability at the articulatory level. Rather than defining this increased variability as incorrect, the authors described the resulting variability in terms of articulation fluctuating along a continuum, without setting a specific threshold that can be used to define an outlier as an intrusion or reduction. Contrary to McMillan and Corley (2010), however, our study uses a statistically defined threshold to define intrusions and reductions in order to reveal asymmetries, similar but not identical to the approach used by Goldstein et al. (2007). The intrusions and reductions are defined in terms of outliers from a movement range distribution of articulators in a specific context on a trial-by-trial basis (see also Goldrick, Baker, Murphy & Baese-Berk, 2011). Stimuli in a given trial with alternating onsets provide specific target constriction specifications for the relevant articulator in one word and a non-target position for the same articulator in the other word. For example, in a sequence like cop top, the tongue dorsum reaches the target position during the onset of cop versus a non-target position during the onset of top. Over the course of several repetitions within a trial, the produced movements form separate distributions for target and non-target constrictions. As shown in Slis and Van Lieshout (2013), trials with non-alternating onsets (e.g., top top) differed significantly from trials with alternating onsets, to such an extent that comparing non-target constrictions for a specific articulator in those trials to non-target constrictions in alternating trials was deemed inappropriate. Using trials that contain both target and non-target constrictions for the same articulator can be assumed to provide a cleaner estimate of true variation in such positions, as such trials force the speaker to enhance the underlying linguistic contrast. This also speaks to the fact that to date there is no evidence that intrusions and reductions should be considered categorical events. Instead, at best one can define statistical criteria to identify those events that likely reflect the proverbial "tip of the iceberg". That is, choosing a robust and even somewhat conservative statistical technique to identify these events, treating them as outliers within a continuum of normal variation in constriction degree, seems the more prudent approach. Therefore, the current study defines an intrusion or reduction as a statistical outlier in a distribution of normalized movement ranges for target constrictions (i.e., a value below a certain normalized movement range threshold counts as a reduction) and non-target constrictions (i.e., a value above a certain normalized movement range threshold counts as an intrusion). To this end, median estimates of successive target articulator maxima, as well as of the simultaneously occurring non-target articulator values within individual trials, were calculated for the normalized movement ranges.
The median instead of the mean was used, as the former measure has been shown to be less sensitive to outliers in distributions and thus provides a more robust estimate of the central moment of the true distribution (Chau, Young, & Redekop, 2005). This way, actual outliers can be detected more clearly, as they would be further removed from the median value of such a distribution. Next, the Median Absolute Deviation (MAD), expressed in percentages, was calculated as follows (Chau et al., 2005), where med(X) is the median of the sample:

MAD(X) = med(|X - med(X)|)
Values two MADs above the non-target median were considered outliers that could be labeled intrusions; values below two MADs from the target median were considered outliers that could be labeled reductions (see figure 3.2).
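A minimal sketch of this outlier criterion, assuming the per-repetition movement ranges of a trial are available as NumPy arrays (the function names are illustrative, not from the original analysis scripts):

```python
import numpy as np

def mad(x):
    """Median absolute deviation: med(|x - med(x)|)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def count_outliers(target_ranges, nontarget_ranges, k=2.0):
    """Within one trial, count reductions (target constriction ranges below
    the target median minus k MADs) and intrusions (non-target ranges above
    the non-target median plus k MADs)."""
    target_ranges = np.asarray(target_ranges, dtype=float)
    nontarget_ranges = np.asarray(nontarget_ranges, dtype=float)
    reductions = int(np.sum(
        target_ranges < np.median(target_ranges) - k * mad(target_ranges)))
    intrusions = int(np.sum(
        nontarget_ranges > np.median(nontarget_ranges) + k * mad(nontarget_ranges)))
    return reductions, intrusions
```

Because both the centre (median) and the spread (MAD) are robust statistics, a single deviant repetition stands out clearly instead of inflating the threshold it is tested against.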

Figure 3-2 Fifteen overlaid repetitions of the word pair "pat cat". The vertical axis represents the normalized displacements of the lower lip (LL), tongue dorsum (TD), and tongue tip (TT). The horizontal axis represents time, normalized for presentation purposes only. The squares indicate the locations of the target articulator constriction maxima, the circles the locations of the non-target articulator values. The two arrows indicate two examples of intrusions based on the median value of the distribution plus 2 MADs. For the second word, there were no outliers as defined by this approach.

Since one of the objectives was to replicate the data available in the literature regarding the change in the relative frequency of outliers over the course of a trial (Goldstein et al., 2007), the trials were split into three separate fragments after removing the first and last repetition. The first fragment consisted of the first 5 repetitions of a trial; the second fragment, the middle of the trial, consisted of the next 5 repetitions; the last fragment, the end of the trial, included the remaining 5 repetitions. As speakers were not always able to finish all trials, the number of repetitions per trial differed. To correct for this difference, the number of outliers in each fragment of the trial was divided by the number of repetitions in that particular fragment, resulting in a ratio of outliers per trial fragment. When there were 10 or fewer repetitions in the whole trial, the last fragment was non-existent and was disregarded in the analysis.
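The fragment-based outlier ratio can be illustrated as follows (a sketch assuming a list of per-repetition outlier flags with the first and last repetition already removed; an empty end fragment, as in trials with 10 or fewer repetitions, simply yields no ratio):

```python
def fragment_ratios(outlier_flags):
    """Split a trial's per-repetition outlier flags (first and last
    repetition already removed) into start/middle/end fragments of up to
    5 repetitions each, and return the outlier ratio per fragment.
    An empty end fragment returns None and is skipped in the analysis."""
    fragments = [outlier_flags[0:5], outlier_flags[5:10], outlier_flags[10:15]]
    return [sum(f) / len(f) if f else None for f in fragments]
```

Dividing by the actual fragment length corrects for trials that were not completed in full.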

To assess to what extent differences in phonetic context predict the occurrence of intrusions and reductions, a measure was used to estimate the difference in movement range between target and non-target constrictions for a given articulator within the domain of a word pair. This factor is deemed relevant given the claim that the narrow lingual constriction of the vocal tract caused more outliers in the /ɪ/ condition (Goldstein et al., 2007), as outlined in the introduction.

It was hypothesized that a narrower constriction resulted in a smaller difference. To this end, the relative difference, expressed as a percentage, between the median value of a target constriction movement range (e.g., for the tongue dorsum in the onset /k/ in cop in the word pair cop top) and the median value of the same articulator for its movement range in non-target position in the next word of the pair (e.g., the tongue dorsum in the /t/ of top) was calculated. Difference measures have been used previously (see e.g., McMillan & Corley, 2010; Zharkova & Hewlett, 2009). Zharkova and Hewlett (2009) used distance calculations to compare tongue surface outlines of ultrasound data of two identical phonemes in different contexts to measure co-articulatory effects. McMillan and Corley (2010) expressed variability in how speech segments were produced as a measure of deviance. Deviance scores are a measure similar to Euclidean distances between two different means. In their study, the deviance scores between the mean VOT of onsets in tongue twister stimuli and the mean VOT of onsets in control stimuli were calculated; accordingly, for their ultrasound recordings, the sum of the Euclidean distances was determined between each corresponding pair of frames.
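Since the data are already normalized to percent movement range, one plausible reading of this difference measure is a simple subtraction of medians, sketched below (the exact formula is our assumption for illustration, not a quotation of the original analysis):

```python
import numpy as np

def range_difference(target_ranges, nontarget_ranges):
    """Difference, in percentage points of normalized movement range,
    between the median target-constriction range and the median
    non-target range of the same articulator within a word pair.
    A smaller value implies a relatively high non-target position."""
    return float(np.median(target_ranges) - np.median(nontarget_ranges))
```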

Statistical analyses were performed with the NCSS statistical package (version 8.0.13), the specifics of which are described in the results section. For all tests we used an alpha value of 0.05 as our statistical threshold for significance (adjusted by Bonferroni corrections where appropriate).

Before the results are explained in light of articulatory mechanisms in the next section, the possibility that frequency effects affected the results needs to be addressed. Several studies have shown that bigram frequency affects aspects of speech planning and production (Mooshammer, Goldstein, Nam, McClure, Saltzman, & Tiede, 2012; Bose, Van Lieshout & Square, 2007). This raises the possibility that such frequency effects played a role in creating some of the patterns hypothesized to arise in our study10. For this reason, a canonical correlation was run between mean bigram frequencies11 of the non-target word (e.g., the mean bigram frequency of the word top when cop was produced) and the intrusion ratios of the non-target articulator (in this example, the intrusions of the tongue tip during cop). This correlation turned out not to be significant (R (22) = 0.08, p = 0.71379, Wilks' Lambda = 0.99), indicating no systematic relationship between bigram frequency and outlier ratios.

10 "Bigram frequency" was used instead of "word frequency" because some of the words were nonwords. All the CV and VC combinations exist in the language and are thus more indicative of how skilled speakers are in performing these CVC combinations.

11 Retrieved from the English Lexicon Project Website (http://elexicon.wustl.edu/)


3.4 Results

Four separate Repeated Measures Analyses of Variance (RM-ANOVAs) were performed for the ratios of intrusions and reductions (i.e., statistical outliers based on the distributions of movement ranges for target and non-target constrictions), using in one set of analyses "rate", "word position" and "part of trial" as within-subject factors, and in the other set "coda#onset", "vowel" and "type of articulator" as within-subject factors. To reduce the type I error, the alpha level at which a main or interaction effect was considered significant was adjusted using a Bonferroni correction, resulting in an adjusted alpha level of (.05/4 =) 0.0125. Tukey-Kramer post hoc tests were performed for significant main effects at an alpha level of 0.05. All the results are reported in table 3.1.

Table 3-1 Results for all 4 RM-ANOVAs: degrees of freedom, F-values and p-values for reductions (left columns) and intrusions (right columns). Asterisks indicate significance at a level of *p < 0.0125, **p < 0.001, ***p < 0.0001.

Effect                                     df        Reductions F   p            Intrusions F   p
Rate                                       1, 13     6.63           0.02         1.16           0.3
Trial position                             2, 26     27.1           < 0.001**    11.91          < 0.001**
Word position                              1, 13     0.29           0.6          1              0.33
Rate * trial position                      2, 26     5.567          < 0.01*      6.16           < 0.01*
Rate * word position                       1, 13     0.63           0.44         0.33           0.58
Trial position * word position             2, 26     1.08           0.35         2.42           0.11
Rate * trial pos. * word pos.              2, 26     4.92           0.015        0.73           0.49
Type of articulator                        2, 26     1.18           0.32         18.33          < 0.0001***
Vowel                                      3, 39     0.07           0.97         4.77           < 0.01*
Coda#onset                                 2, 26     1.62           0.22         3.01           0.07
Type of articulator * vowel                6, 78     0.61           0.72         3.58           < 0.01*
Type of articulator * coda#onset           4, 52     0.01           0.99         0.74           0.57
Vowel * coda#onset                         6, 78     0.82           0.55         1.06           0.39
Type of articulator * vowel * coda#onset   12, 156   1.77           0.06         0.48           0.93


3.4.1 Part 1 Effects of rate, part of trial and word position

Part 1 of the analysis addressed the first objective, namely, replicating previous findings on the occurrence of intrusions and reductions related to speaking rate, word position, and the observed change in the relative frequency of outliers over the course of a trial. For these analyses, data were collapsed across “type of articulator”, “vowel” and “type of coda#onset”.

3.4.1.1 Reductions

As shown in table 3.2, the start and middle part of the trials showed fewer reductions than the final part. In addition, "rate" and "part of trial" interacted significantly: at the end of trials, more reductions were produced at a fast rate than at a normal rate. There was no main or interaction effect for word position.

Table 3-2 Mean ratio of reductions (M) and standard deviations (SD) for target articulators during word 1 and word 2 at a fast and normal speaking rate. Separate values are listed for the start, middle and end of a trial. The values are collapsed across words and participants.

Reductions          Word 1                             Word 2
Part of trial       Fast (M, SD)    Normal (M, SD)     Fast (M, SD)    Normal (M, SD)
Start               0.09, 0.02      0.07, 0.03         0.05, 0.03      0.09, 0.03
Middle              0.09, 0.04      0.08, 0.02         0.09, 0.04      0.07, 0.02
End                 0.16, 0.07      0.13, 0.04         0.18, 0.05      0.14, 0.06

3.4.1.2 Intrusions

For intrusions, the main effect of "part of trial" was significant as well. In general, the start and middle of the trial resulted in fewer intrusions than the end of the trial. The variables "rate" and "part of trial" interacted significantly. For the fast rate, the start and middle of the trial differed from the end; for the normal rate condition, only the middle and end differed from each other. Moreover, at the start of a trial, the fast rate resulted in significantly fewer intrusions than the normal rate. The middle and end of trials did not show a difference between fast and normal rates. There was no main or interaction effect for word position. The results for the intrusions are summarized in table 3.3.

Table 3-3 Mean ratio of intrusions (M) and standard deviations (SD) for non-target articulators during word 1 and word 2 at a fast and normal speaking rate. Separate values are listed for the start, middle and end of a trial. The values are collapsed across words and participants.

Intrusions          Word 1                             Word 2
Part of trial       Fast (M, SD)    Normal (M, SD)     Fast (M, SD)    Normal (M, SD)
Start               0.08, 0.03      0.12, 0.04         0.08, 0.03      0.11, 0.04
Middle              0.11, 0.03      0.10, 0.03         0.11, 0.03      0.10, 0.04
End                 0.15, 0.04      0.12, 0.04         0.15, 0.05      0.15, 0.04

3.4.2 Part 2 Effects of vowel, coda#onset, and type of articulator

Part two addressed the second objective, which predicted that asymmetries in the ratio of outliers arise as a consequence of phonetic context. Data were collapsed across “rate”, “word position” and “part of trial”.

3.4.2.1 Reductions

There were no main effects or interaction effects for reductions for any of the factors (see table 3.4).


Table 3-4 Mean ratio of reductions (M) and standard deviations (SD) for the target lower lip (LL), tongue tip (TT), and tongue dorsum (TD) in the context of the four different vowels and two different coda consonants. The values are collapsed across words and participants, rate, word position and part of the trial.

Reductions     LL                             TT                             TD
Vowel          /k/ (M, SD)     /t/ (M, SD)    /k/ (M, SD)     /p/ (M, SD)    /p/ (M, SD)     /t/ (M, SD)
/æ/            0.09, 0.03      0.09, 0.02     0.12, 0.05      0.10, 0.05     0.12, 0.05      0.10, 0.06
/ɪ/            0.11, 0.04      0.08, 0.04     0.10, 0.04      0.11, 0.04     0.12, 0.07      0.09, 0.03
/ɑ/            0.10, 0.05      0.08, 0.04     0.10, 0.03      0.12, 0.05     0.13, 0.07      0.11, 0.04
/u/            0.10, 0.04      0.09, 0.04     0.08, 0.04      0.11, 0.06     0.10, 0.05      0.11, 0.06

3.4.2.2 Intrusions

As can be observed from table 3.5, "vowel" affected the ratio of intrusions significantly. The post hoc test showed that the back vowel /u/ resulted in fewer intrusions than /æ/ and /ɑ/. In addition, a main effect for type of articulator was found: the tongue dorsum intruded more frequently than the lower lip and the tongue tip. The variables "vowel" and "type of articulator" also interacted significantly. The tongue dorsum in the /æ/ and /ɪ/ contexts showed significantly more intrusions than in the /u/ context, and more intrusions than the lower lip and tongue tip in the /æ/ and /ɪ/ contexts. No other main or interaction effects were found.


Table 3-5 Mean ratio of intrusions (M) and standard deviations (SD) for the non-target lower lip (LL), tongue tip (TT), and tongue dorsum (TD) in the context of the four different vowels and two different coda consonants. The values are collapsed across words and participants, rate, word position and part of the trial.

Intrusions   TD                        TT                        LL
             /p/          /t/          /k/          /p/          /k/          /t/
Vowel        M     SD     M     SD     M     SD     M     SD     M     SD     M     SD
/æ/          0.15  0.06   0.14  0.04   0.12  0.03   0.09  0.03   0.11  0.05   0.11  0.04
/ɪ/          0.13  0.04   0.15  0.05   0.13  0.05   0.09  0.05   0.10  0.05   0.09  0.02
/ɑ/          0.13  0.04   0.14  0.05   0.13  0.05   0.12  0.05   0.11  0.04   0.11  0.04
/u/          0.11  0.03   0.09  0.04   0.11  0.05   0.10  0.05   0.10  0.03   0.10  0.04

3.4.3 Part 3 Relative difference

The third objective addressed whether the difference in movement range between target and non-target constrictions for a given articulator within a word pair impacted the ratio of reductions and/or intrusions. Two linear regression analyses were performed between the ratio of outliers and the difference values between target and non-target articulators. The regression analyses were performed separately for reductions and intrusions. The alpha level at which a regression was considered significant was set at 0.0125 (0.05/4, Bonferroni-corrected for the four regressions).

In the first regression analysis, difference values and outlier ratios were calculated for each participant and each articulator separately, resulting in three mean values for each participant – one for the tongue tip, one for the tongue dorsum and one for the lower lip – for a total of 42 mean values for the relative range difference measure and for the ratios of reductions and intrusions (14 participants * 3 articulators). The values were averaged across “rate”, “part of trial”, “word position”, “vowel” and “type of coda#onset”. This way, it was possible to assess whether certain speakers showed larger or smaller difference values in general and whether this variable predicted outlier patterns across speakers (i.e., do speakers who show larger difference values show more intrusions or reductions?).

A second regression analysis was performed, this time on a word-by-word basis. This regression analysis is more likely to reveal the general behavior of articulators when producing the target stimuli (for a critical review on the use of averages in correlations, see Monin & Oppenheimer, 2005). To this end, regression lines between the mean outlier ratio during the onset of a word and the relative positional difference were calculated. Data were collapsed across participants, “rate”, “part of trial”, and “word position”, resulting in 24 values for relative difference values and outlier ratios. These 24 values represent the number of different stimuli used in this study (3 articulators * 4 vowels * 2 coda#onset combinations).
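The regression procedure described above can be sketched as follows. This is a minimal illustration only: the data values are made-up placeholders, not the study's measurements, and the helper name linreg_stats is ours; it simply reproduces the kind of R², F(1, n-2) statistics reported below, together with the Bonferroni-corrected alpha of 0.05/4.

```python
import math

def linreg_stats(x, y):
    """Least-squares regression of y on x.

    Returns slope, intercept, R^2, and the F statistic with
    (1, n - 2) degrees of freedom, mirroring the R^2/F values
    reported for the participant- and word-level analyses.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # F(1, n-2) for a simple regression follows directly from R^2.
    f_stat = math.inf if ss_res == 0 else (r2 / (1.0 - r2)) * (n - 2)
    return slope, intercept, r2, f_stat

# Four regressions were run (reductions/intrusions x participant/word level),
# so the significance threshold is Bonferroni-corrected:
ALPHA = 0.05 / 4  # = 0.0125

# Illustrative (invented) relative difference values and intrusion ratios:
diff_values = [0.10, 0.15, 0.22, 0.30, 0.41]
intrusion_ratios = [0.08, 0.10, 0.11, 0.13, 0.16]
slope, intercept, r2, f_stat = linreg_stats(diff_values, intrusion_ratios)
```

In the actual analyses, the x values would be the 42 participant-level or 24 word-level relative range differences, with the corresponding outlier ratios as y.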

3.4.3.1 Reductions

The difference in movement ranges between target and non-target constrictions for a specific articulator within a word pair did not predict reductions at the participant level (R2 = 0.0699, F(1, 41) = 3.0081, p = 0.09). Similarly, at the word level (see figures 3.3 and 3.4), no relation was found between reductions and the relative range difference (R2 = 0.0006, F(1, 23) = 0.0142, p = 0.91).


Figure 3-3 ratio of reductions by difference measures at the individual participant level. The black circles represent the tongue dorsum, the grey squares the lower lip, and the clear triangles the tongue tip. Numbers refer to individual participants.

Figure 3-4 ratio of reductions by difference measures at the individual word level. The black circles represent the tongue dorsum, the grey squares the lower lip, and the triangles the tongue tip. Vowels correspond to the vowels of the individual words (e.g., the tongue dorsum and the vowel /ɪ/ are involved in the word kip or kit).


3.4.3.2 Relative difference and intrusions

The relative difference in movement range between target and non-target constrictions significantly predicted 30% of the variability of the ratio of intrusions at the participant level (R2 = 0.2970, F(1, 41) = 16.9007, p < 0.001). As can be observed in Figure 3.5, the largest relative difference values and the highest intrusion rates occurred with the tongue dorsum, especially for participants 25 and 9.

Figure 3-5 ratio of intrusions by difference measures at the individual participant level. The black circles represent the tongue dorsum, the grey squares the lower lip, and the triangles the tongue tip. Numbers represent the participants.


Figure 3-6 ratio of intrusions by difference measures at the individual word level. The black circles represent the tongue dorsum, the grey squares the lower lip and the triangles the tongue tip. Vowels correspond to the vowels of the individual words (e.g., the tongue dorsum and the vowel /ɪ/ are involved in the word tip or pit).

At the word level, the analysis revealed that the relative difference significantly predicted 30% of the variability in the ratio of intrusions as well (R2 = 0.3018, F(1, 23) = 9.5074, p < 0.01). Figure 3.6 shows that intrusions of the tongue dorsum were highest for words in vowel context /ɪ/ and /æ/ (black circles), in which cases the difference values were highest as well. To illustrate the differences between articulators, regression lines are plotted for each articulator separately in Figure 3.6. This shows that the tongue articulators behaved differently from the lower lip in that a negative correlation was observed for the lower lip, whereas both tongue articulators showed a positive relationship between relative difference and intrusion ratio. A post hoc analysis revealed that the tongue dorsum showed a significant positive relation between the relative difference and intrusion ratios (R2 = 0.7483, F(1, 7) = 17.8374, p < 0.01) and that 75% of the variability was explained by this relation. The tongue tip showed a positive trend (R2 = 0.5433, F(1, 7) = 7.1384, p = 0.04). In contrast, the lower lip displayed a negative trend (R2 = 0.5289, F(1, 7) = 6.7368, p = 0.04).


3.5 Summary and discussion

The findings from the current study reveal several important patterns. Before discussing the individual findings and relating them to the theoretical framework, the main points are summarized as follows:

1. As predicted, more intrusions and reductions are found at the end of a trial, especially at a fast rate. The ratios of intrusions and reductions are higher at the end than at the start and middle of the trial.

2. Speaking rate affects intrusions slightly differently than expected: the normal and fast speaking rates did not differ at first sight. However, an interaction between rate and trial part was revealed for the intrusions: at the start of a trial, the fast rate resulted in fewer outliers than the normal rate. At the end of the trial this difference disappeared.

3. Word order does not affect outlier ratios, confirming the previous findings from Goldstein et al. (2007). This effect is very clear and does not warrant further discussion.

4. Different combinations of vowels and consonants influence the occurrence of intrusions and reductions. Strong evidence is found for a vowel-related bias:

 In front vowel context the non-target tongue dorsum intruded more than the non-target lower lip and tongue tip.

 Low front and back vowels resulted in more intrusions than the high back vowel.

5. Intrusions were to some extent predicted by the difference between target and non-target constriction movement ranges, especially for the tongue dorsum.

1) The finding that the ratios of intrusions and reductions are higher at the end of the trial suggests that the coupled articulator movements for a given word pair entrain towards a more stable 1:1 mode over the course of a trial. This replicates the findings from the series of studies by Pouplier and colleagues (2007; 2008) and also demonstrates that the approach to defining outliers as employed in our study is effective in revealing similar underlying mechanisms.

2) Speaking rate affected the ratios of reductions, which were slightly higher at the fast rate than at the normal rate, but this did not happen for intrusions. At first sight, these findings contradict results from the Goldstein et al. (2007) study, which found that speakers in general made more reductions and intrusions at a fast rate. These results are also not in accordance with results from an earlier speech error study by MacKay (1971), who also found more errors at a fast speaking rate. However, this contradiction only applies partially. The main issue seems to be that for the intrusions, the fast rate actually resulted in fewer outliers than the normal rate at the start of the trial. At the end of the trial this difference disappeared, indicating that the intrusions were building up more strongly during the fast rate than during the normal rate, which is in line with the impact of rate on outliers described by the previous studies.

Several factors may have contributed to the divergent results regarding speaking rate. First of all, in contrast to the earlier studies, in which the rate condition was counterbalanced (Goldstein et al., 2007; MacKay, 1971), the current study always offered the normal rate first, followed by the fast rate condition. This may have introduced a practice effect (see e.g., Dell, Burger, & Svec, 1997; Kelso & Zanone, 2002; Namasivayam & Van Lieshout, 2008), transferring skills acquired in the normal speaking rate condition to the fast rate condition. Secondly, the current study determined the rate of speech for each speaker individually. As such, the task demands were more aligned with each individual's speech coordination skills. However, whereas the fast rate involved significantly fewer intrusions at the start of a trial than trials produced at a normal rate, the intrusions built up relatively faster towards the end of the trial. These findings seem to confirm the claims from a dynamical systems perspective that, especially at fast speaking rates, it is more difficult to maintain a higher frequency ratio (Goldstein et al., 2007; Haken et al., 1985; Peper & Beek, 1998; Peper et al., 1995a). Thus it can still be concluded that speaking rate influences intrusion and reduction ratios. Taking everything into account, the current study succeeded in replicating the basic findings of the earlier reported studies on intrusions and reductions (Goldstein et al., 2007; Pouplier, 2008).

4) Whereas the type of articulator in coda position did not influence outlier ratios, strong evidence was found for a vowel-related bias. However, this bias was shown only for intrusions, not reductions. In the context of the high back vowel /u/, fewer intrusions were measured than in the context of /æ/ and /ɑ/. In addition, in the vicinity of the two front vowels /ɪ/ and /æ/, the tongue dorsum intruded significantly more than in the vicinity of the vowel /u/. Moreover, this articulator intruded more than the tongue tip and the lower lip in these two front vowel contexts.

The more dominant role for the tongue dorsum in intrusion patterns compared to the tongue tip found by Pouplier (2008) was confirmed for the two front vowels in our study. These findings, however, differ from the vowel effect reported by Goldstein et al. (2007), in which the high front vowel /ɪ/ triggered more outliers. Moreover, whereas in the study by Goldstein et al. (2007) both the tongue dorsum and tongue tip intruded more often in the context of the high vowel /ɪ/, the current study showed this effect only for the tongue dorsum in the context of the front vowels /ɪ/ and /æ/. In a corpus of naturally occurring errors, Stemberger (1991) observed a bias for /k/ to replace /t/, and in a subsequent study found a bias for the voiceless bilabial /p/ to be replaced by the voiceless velar /k/ more often than vice versa. Later studies have shown that the bias for /k/ to replace /t/ was likely perceptual (Pouplier & Goldstein, 2005) and resulted from the fact that an intruding tongue dorsum affects the acoustic spectrum more than an intruding tongue tip (Marin, Pouplier, & Harrington, 2010). The current study revealed, however, that the bias for the tongue dorsum to intrude more frequently than the tongue tip in certain contexts may have an underlying articulatory basis as well.

5) The larger the relative difference between target and non-target constrictions within a word pair, the more intrusions were found. This relationship was especially strong for the tongue dorsum (R2 = 0.75). These findings have to be interpreted very cautiously, as they are based on very few observations. However, they contradict the assumption that a high position of the tongue results in more intrusions (Goldstein et al., 2007). For the alternating onset task used in the current study there is a strong tendency for lingual articulations to clearly differentiate between a linguistic constriction target and suppressed activation for a given articulator when it is not forming part of a gestural constriction task. This may suggest that, despite the required high absolute position of the tongue dorsum for high vowels (Koenig, 2004), speakers may optimize the remaining constriction space to create a clear functional distinction between target and non-target tongue dorsum constrictions. This could be interpreted as a specific speech strategy tailored towards the dual use of the tongue body in both consonant and vowel articulations within a given short time frame. In line with this idea, it is suggested that when such an optimal use of constriction space is available, speakers can allow for more intrusions of the tongue dorsum.

An alternative explanation has to be considered for the relation between larger constriction range differences and the increased ratio of intrusions of the tongue dorsum. It can be argued that speakers compensate for more intrusions of the tongue dorsum in a front vowel context because of the possibility of affecting not only consonant but also vowel productions when the tongue dorsum position is changed. A speaker may try to adjust his/her starting position during non-target constrictions to prevent intrusions from becoming audible and from affecting the quality of the following vowel. The use of such perceptually driven strategies might prevent intrusions from being perceived as errors. As mentioned earlier, tongue dorsum intrusions affect the identification of speech segments to a greater extent than tongue tip intrusions (Pouplier & Goldstein, 2005). This parallels recent findings showing that tongue dorsum intrusions also affect the acoustic spectrum to a greater extent than tongue tip intrusions (Marin, Pouplier & Harrington, 2010). Although the suggested explanations regarding the relation between larger range differences between target and non-target constrictions and the ratio of intrusions differ, they all point towards a specific speaker strategy in using vocal tract space to serve different linguistic goals. Whether such strategies indeed exist has to be examined more closely in the future.

The findings can largely be explained within the framework of AP. The increasing number of intrusions and reductions has been addressed in previous publications as resulting from the tendency to stabilize gestural coupling, especially at fast rates (Pouplier & Goldstein, 2013; Goldstein et al., 2007). The finding that more intrusions were found in a low vowel context than in a high vowel context suggests a possible role for the jaw. The jaw is an articulator that controls vowel height (Keating, Lindblom, Lubker, & Kreiman, 1994). In a low vowel context, the jaw is thus in a lower position. It is possible that when the jaw is entraining, this articulator moves to a higher position, especially in a low vowel context. This will affect the tongue tip and tongue dorsum as well as the lower lip articulators. The bias for the tongue dorsum to intrude more than the tongue tip and lower lip can be explained within this framework as well. As the tongue dorsum has a dual role in being the primary articulator for vowel gestures as well as for dorsal consonant gestures (e.g., Recasens & Espinosa, 2009), and as the onset consonant and the vowel are produced in phase with each other in onset position (Goldstein et al., 2006), it seems likely that intrusions by this articulator are affected the most by surrounding vowels. During a target labial or tongue tip constriction (and consequently during a higher jaw position), the tongue dorsum should not be activated. However, the following vowel requires activation of the tongue dorsum, resulting in contradictory requirements for the tongue dorsum. Studies in limb control have shown that activating muscles of a (motor-driven) limb resulted in more stable movements of this limb than keeping this limb relaxed (Ridderikhoff et al., 2007). Thus, when an articulator is supposed to be activated for a following vowel, this articulator gets an extra impulse to be activated for an intrusion as well when it is exposed to the coupling forces that stabilize the coordination pattern. When efferent information coincides with afferent information from the activated tongue dorsum12, these two sources match and are not corrected.

Why the tongue dorsum intruded more frequently in front vowel context than the other articulators is harder to explain from an AP perspective. Physical properties of the individual articulators, such as the jaw, have to be considered in this case. Tongue dorsum intrusions occur in p#t and t#p contexts in words such as cop#top and cot#pot. In general, studies have shown that coronals are characterized by a high jaw position, and labial constrictions are realized with a lower jaw position (see Fletcher & Harrington, 1999; Keating et al., 1994; Mooshammer, Hoole, & Geumann, 2007). Jaw height, however, is affected by many factors, including context and supra-segmental influences (Fletcher & Harrington, 1999). Within the cluster t#p, the final /t/ frequently assimilates to /p/ (Browman & Goldstein, 1991), keeping the jaw in a high position during /p/. Thus both p#t and t#p are most likely realized with a relatively high jaw position. To prevent intrusions, the tongue dorsum has to be able to assume a low position.

Requiring a high jaw for the consonant constrictions /t/ and /p/ limits the space for an advanced tongue dorsum in a front vowel context. In addition, the active constriction of the tongue tip further limits movements of the tongue dorsum. Especially in front vowel context the motion of the tongue dorsum is limited (Iskarous et al., 2010). The tongue dorsum further actively supports the tongue tip in forming an alveolar constriction, making it more co-articulatorily resistant and thus more aggressive (Iskarous et al., 2010). Adding this constraining factor to the positive reinforcement in activating the following vowel may thus result in more intrusions of this articulator in front vowel context.

12 Efferent information refers to information sent to the muscles; afferent information is information from the muscles back to the central nervous system.

As indicated in the introduction, other models have little to say about the phenomenon of intrusions or reductions. The only exception (apart from AP) is the cascading model (McMillan & Corley, 2010; Pouplier & Goldstein, 2010). This model can predict the gradual nature of speech articulations as originating from competing activated phonological representations. However, the vowel-related asymmetries cannot be explained within this framework, because it does not predict how context affects these translated phonological representations at the level of articulatory dynamics (see also Pouplier & Goldstein, 2010). Thus, at this point the AP model seems the most fruitful approach in explaining the origin and function of intrusions and reductions in the context of reiterated speech samples.

3.6 Conclusions

The present data not only confirm previous findings from studies on intrusions and reductions but, most importantly, reveal that the ratio of intrusions in particular may be influenced by phonetic context. This supports the concept that the coupling strength between effectors is influenced by properties of articulators, as found in studies on limb control and speech (Van Lieshout & Neufeld, 2014). Speakers may employ strategies to compensate for these constraints by enlarging the contrast between target and non-target constrictions. The lack of context sensitivity for reductions means that target constrictions for stops are less affected by these constraints. A follow-up study needs to determine whether perceptually driven strategies may influence the occurrence of intrusions in particular; such a study is currently underway in our lab (Slis & Van Lieshout, in preparation).

3.7 Acknowledgements

The study was supported by a Social Sciences and Humanities Research Council (SSHRC) grant and, partly, by funding from the Canada Research Chairs program, both awarded to the second author. The authors would like to thank Mark Noseworthy for the voice recordings, Aravind Namasivayam for his technical support during the EMA sessions, Radu Craioveanu for his help analyzing the data, and Jeffrey Steele and Keren Rice for their valuable comments on earlier versions.


3.8 Appendix

         Non-target articulator
Vowel    TD                     LL                       TT
/ɑ/      coP Top, coT Pot       poCK Tock, poT Cot       toP Cop, toCK Pock
/ɪ/      kiP Tip, kiT Pit       piCK Tick, piT Kit       tiP Kip, tiCK Pick
/u/      cooP Toop, cooT Poot   pooK Took, pooT Coot     tooP Coop, tooK Pook
/æ/      caP Tap, caT Pat       paCK Tack, paT Cat       taP Cap, taCK Pack

Stimuli used for the intrusion and reduction study. The coda#onset condition is indicated in bold uppercase letters. Each column lists the word pairs in which the non-target articulator can form an intrusion during the second word.

3.9 References

Bose, A., Van Lieshout, P. H. H. M., & Square, P. A. (2007). Word frequency and bigram frequency effects on linguistic processing and speech motor performance in individuals with aphasia and normal speakers. Journal of Neurolinguistics, 20, 65-88.

Boucher, V. J. (1994). Alphabet-related biases in psycholinguistic enquiries: Considerations for direct theories of speech production and perception. Journal of Phonetics, 22, 1-18.

Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6, 201-251.

Browman, C. P., & Goldstein, L. (1991). Tiers in articulatory phonology, with some implications for casual speech. In J. Kingston & M. E. Beckman (Eds.), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech (pp. 341-376). Cambridge, U.K.: Cambridge University Press.


Browman, C. P., & Goldstein, L. (1995). Dynamics and articulatory phonology. In R. F. Port & T. van Gelder (Eds.), Mind as Motion (pp. 176-193). Cambridge, MA: MIT Press.

Chau, T., Young, S., & Redekop, S. (2005). Managing variability in the summary and comparison of gait data. Journal of Neuroengineering and Rehabilitation, 2.

Corley, M., Brocklehurst, P., & Moat, H. (2011). Error biases in inner and overt speech: Evidence from tongue twisters. Journal of Experimental Psychology: Learning, Memory and Cognition, 37(1), 162-175.

Cutler, A. (1981). The reliability of speech error data. Linguistics, 19, 561-582.

De Jong, K. J. (1996). Labiovelar compensation in back vowels. Journal of the Acoustical Society of America, 101(4), 2221-2233.

Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283-321.

Dell, G. S., Burger, L. K., & Svec, W. R. (1997). Language production and serial order: A functional analysis and a model. Psychological Review, 104(1), 123-147.

Dell, G. S., & Sullivan, J. M. (2004). Speech errors and language production: Neuropsychological and connectionist perspectives. The Psychology of Learning and Motivation, 44, 63-108.

Fletcher, J., & Harrington, J. (1999). Lip and jaw coarticulation. In W. J. Hardcastle & N. Hewlett (Eds.), Coarticulation (pp. 164-178). Cambridge: Cambridge University Press.

Fowler, C. A., & Saltzman, E. L. (1993). Coordination and coarticulation in speech production. Language and Speech, 36, 171-195.

Frisch, S. A. (2007). Walking the tightrope between cognition and articulation: The state of the art in the phonetics of speech errors. In C. T. Schütze & V. S. Ferreira (Eds.), MIT Working Papers in Linguistics, Vol. 53: The State of the Art in Speech Error Research (pp. 155-171). Cambridge, MA.

Frisch, S. A., & Wright, R. (2002). The phonetics of phonological speech errors: An acoustic analysis of slips of the tongue. Journal of Phonetics, 30, 139-162.

Fromkin, V. A. (1971). The non-anomalous nature of anomalous utterances. Language: Journal of the Linguistic Society of America, 47, 27-52.

Goffman, L., Gerken, L., & Lucchesci, J. (2007). Relations between segmental and motor variability in prosodically complex nonword sequences. Journal of Speech, Language, and Hearing Research, 50, 444-458.

Goldrick, M., & Blumstein, S. E. (2006). Cascading activation from phonological planning to articulatory processes: Evidence from tongue twisters. Language & Cognitive Processes, 21, 649-683.

Goldrick, M., Ross Baker, H., Murphy, A., & Baese-Berk, M. (2011). Interaction and representational integration: Evidence from speech errors. Cognition, 121, 58-72.

Goldstein, L., Byrd, D., & Saltzman, E. L. (2006). The role of vocal tract gestural action units in understanding the evolution of phonology. In M. Arbib (Ed.), From Action to Language: The Mirror Neuron System (pp. 215-249). Cambridge University Press.

Goldstein, L., Pouplier, M., Chen, L., Saltzman, E. L., & Byrd, D. (2007). Dynamic action units slip in speech production errors. Cognition, 103, 386-412.

Haken, H., Kelso, J. A. S., & Bunz, H. (1985). A theoretical model of phase transitions in human hand movements. Biological Cybernetics, 51, 347-356.

Hoole, P., & Kühnert, B. (1995). Patterns of lingual variability in German vowel production. Proceedings of the XIIIth International Congress of Phonetic Sciences, Vol. 2 (pp. 442-445), Stockholm.

Iskarous, K., Fowler, C. A., & Whalen, D. H. (2010). Locus equations are an acoustic expression of articulator synergy. Journal of the Acoustical Society of America, 128(4), 2021-2032.

Keating, P. A., Lindblom, B., Lubker, J., & Kreiman, J. (1994). Variability in jaw height for segments in English and Swedish VCVs. Journal of Phonetics, 22, 407-422.

Kelso, J. A. S. (1995). Dynamic Patterns. Cambridge, MA: MIT Press.

Kelso, J. A. S., & Zanone, P. G. (2002). Coordination dynamics of learning and transfer across different effector systems. Journal of Experimental Psychology: Human Perception and Performance, 28(4), 776-797.

Kent, R. D. (1996). Hearing and believing: Some limits to the auditory-perceptual assessment of speech and voice disorders. American Journal of Speech-Language Pathology, 5, 7-23.

Koenig, L. (2004). Towards a physical definition of the vowel systems of languages. In V. H. Yngve & Z. Wasik (Eds.), Hard-Science Linguistics (pp. 49-66). Continuum.

Laver, J. D. M. (1980). Slips of the tongue as neuromuscular evidence for a model of speech production. In H. W. Dechert & M. Raupach (Eds.), Temporal Variables in Speech: Studies in Honour of Frieda Goldman-Eisler (pp. 21-26). The Hague: Mouton.

Levelt, W. J. M. (1989). Speaking. Cambridge, MA: MIT Press.

Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1-75.

MacKay, D. G. (1971). Stress pre-entry in motor systems. The American Journal of Psychology, 84(1), 35-51.

Marin, S., Pouplier, M., & Harrington, J. (2010). Acoustic consequences of articulatory variability during productions of /t/ and /k/ and its implications for speech error research. Journal of the Acoustical Society of America, 127(1), 445-461.

McMillan, C. T., & Corley, M. (2010). Cascading influences on the production of speech: Evidence from articulation. Cognition, 117, 243-260.

McMillan, C. T., Corley, M., & Lickley, R. J. (2009). Articulatory evidence for feedback and competition in speech production. Language & Cognitive Processes, 24, 44-66.

Meyer, A. S. (1992). Investigation of phonological encoding through speech error analyses: Achievements, limitations, and alternatives. Cognition, 42, 181-211.

Monin, B., & Oppenheimer, D. M. (2005). Correlated averages vs. averaged correlations: Demonstrating the warm glow heuristic beyond aggregation. Social Cognition, 23, 257-278.

Mooshammer, C., & Geng, C. (2008). Acoustic and articulatory manifestations of vowel reduction in German. Journal of the International Phonetic Association, 38(2), 117-136.

Mooshammer, C., Goldstein, L., Nam, H., McClure, S., Saltzman, E. L., & Tiede, M. (2012). Bridging planning and execution: Temporal planning of syllables. Journal of Phonetics, 40, 374-389.

Mooshammer, C., Hoole, P., & Geumann, A. (2007). Jaw and order. Language and Speech, 50, 145-176.

Mowrey, R. A., & MacKay, I. R. A. (1990). Phonological primitives: Electromyographic speech error evidence. Journal of the Acoustical Society of America, 88, 1299-1312.

Namasivayam, A. K., & Van Lieshout, P. H. H. M. (2008). Investigating speech motor practice and learning in people who stutter. Journal of Fluency Disorders, 33, 32-51.

Namasivayam, A. K., Van Lieshout, P. H. H. M., McIlroy, W. E., & De Nil, L. (2009). Sensory feedback dependence hypothesis in persons who stutter. Human Movement Science, 28, 688-707.

Nooteboom, S., & Quene, H. (2007). Strategies for editing out speech errors in inner speech. Proceedings of the XVI International Conference on Spoken Language Processing (pp. 1945-1948), Saarbrücken, Germany.

Ostry, D. J., Cooke, J. D., & Munhall, K. G. (1987). Velocity curves of human arm and speech movements. Experimental Brain Research, 68, 37-46.

Peper, C. E., & Beek, P. J. (1998). Distinguishing between the effects of frequency and amplitude on interlimb coupling in tapping a 2:3 polyrhythm. Experimental Brain Research, 118, 78-92.

Peper, C. E., & Beek, P. J. (1999). Modeling rhythmic interlimb coordination: The roles of movement amplitude and time delays. Human Movement Science, 18, 263-280.

Peper, C. E., Beek, P. J., & Stegeman, D. F. (1995). Dynamical models of movement coordination. Human Movement Science, 14, 573-608.

Peper, C. E., Beek, P. J., & van Wieringen, P. C. W. (1995a). Multifrequency coordination in bimanual tapping: Asymmetrical coupling and signs of supercriticality. Journal of Experimental Psychology, 21, 1117-1138.

Peper, C. E., Beek, P. J., & van Wieringen, P. C. W. (1995b). Coupling strength in tapping a 2:3 polyrhythm. Human Movement Science, 14, 217-245.

Perkell, J., Matthies, M., Svirsky, M. A., & Jordan, M. I. (1993). Trading relations between tongue-body raising and lip rounding in production of the vowel /u/: A pilot "motor equivalence" study. Journal of the Acoustical Society of America, 93(5), 2948-2961.

Pouplier, M. (2008). The role of a coda consonant as error trigger in repetition tasks. Journal of Phonetics, 36, 114-140.

Pouplier, M., & Goldstein, L. (2005). Asymmetries in the perception of speech production errors. Journal of Phonetics, 33, 47-75.

Pouplier, M., & Goldstein, L. (2010). Intention in articulation: Articulatory timing in alternating consonant sequences and its implications for models of speech production. Language and Cognitive Processes, 25(5), 616-649.

Pouplier, M., & Goldstein, L. (2013). The relationship between planning and execution is more than duration: Response to Goldrick & Chu. Language and Cognitive Processes. Advance online publication. DOI:10.1080/01690965.2013.834063

Pouplier, M., & Hardcastle, W. (2005). A re-evaluation of the nature of speech errors in normal

and disordered speakers. Phonetica, 62, 227-243.

Recasens, D. (1999). Lingual coarticulation. In W. Hardcastle & N. Hewlett (Eds.),

Coarticulation (pp. 80-104). Cambridge: Cambridge University Press.

Recasens, D., & Espinosa, A. (2009). An articulatory investigation of lingual coarticulatory

resistance and aggressiveness for consonants and vowels in Catalan. Journal of the

Acoustical Society of America, 125(4), 2288-2298.

Recasens, D., Pallares, M. D., & Solanas, A. (1993). An electropalatographic study of stop

consonants. Speech Communication, 12, 335-355.

Repp, B. H., & Penel, A. (2002). Auditory dominance in temporal processing: New evidence

from synchronization with simultaneous visual and auditory sequences. Journal of

Experimental Psychology: Human Perception and Performance, 28, 1085-1099.

Ridderikhoff, A., Peper, C. E., & Beek, P. J. (2007). Error correction in bimanual coordination

benefits from bilateral muscle activity: evidence from kinesthetic tracking. Experimental

Brain Research, 181, 31-48.

Saltzman, E. L. (1995). Dynamics and coordinate systems in skilled sensorimotor activity. In R.

Port & T. van Gelder (Eds.), Mind as motion: Dynamics, behavior, and cognition

(pp. 149-173). Cambridge, MA: MIT Press.

Saltzman, E. L., & Byrd, D. (2000). Task-dynamics of gestural timing: Phase windows and

multifrequency rhythms. Human Movement Science, 19, 499-526.


Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in

speech production. Ecological Psychology, 1(4), 333-382.

Saltzman, E. L., Nam, H., Goldstein, L., & Byrd, D. (2006). The distinctions between state,

parameter and graph dynamics in sensorimotor control and coordination. In A.

G. Feldman (Ed.), Progress in Motor Control: Motor Control and Learning over the

Lifespan (pp. 63-73). New York: Springer.

Shattuck-Hufnagel, S. (1979). Speech errors as evidence for a serial-order mechanism in

sentence production. In W. E. Cooper & E. C. T. Walker (Eds), Sentence processing:

Psycholinguistic studies presented to Merrill Garrett (pp. 295-342). Hillsdale, NJ:

Erlbaum.

Slis, A. W., & Van Lieshout, P. H. H. M. (2013). The effect of phonetic context on speech

movements in repetitive speech. Journal of the Acoustical Society of America, 134(6),

4496-4507.

Smith, A., Goffman, L., Zelaznik, H. N., Ying, G., & MacGillem, C. (1995). Spatiotemporal

stability and patterning of speech movement sequences. Experimental Brain Research,

104, 493-501.

Stemberger, J. P. (1991). Apparent anti-frequency effects in language production: The addition

bias and phonological underspecification. Journal of Memory and Language, 30, 161-

185.

Tilsen, S., & Goldstein, L. (2012). Articulatory gestures are individually selected in production.

Journal of Phonetics, 40, 764-779.


Van Lieshout, P. H. H. M., Hijl, M., & Hulstijn, W. (1999). Flexibility and stability in bilabial

gestures: 2) Evidence from continuous syllable production. In J. J. Ohala, Y. Hasegawa,

M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings XIVth International

Congress of Phonetic Sciences: Vol. 1 (pp. 45-48). American Institute of Physics.

Van Lieshout, P. H. H. M., Bose, A., Square, P. A., & Steele, C.M. (2007). Speech motor

control in fluent and dysfluent speech production of an individual with apraxia of speech

and Broca’s aphasia. Clinical Linguistics & Phonetics, 21(3), 159-188.

Yunusova, Y., Green, J., & Mefferd, A. (2009). Accuracy Assessment for AG500,

Electromagnetic Articulograph. Journal of Speech, Language, and Hearing Research,

52, 547-555.

Zharkova, N., & Hewlett, N. (2009). Measuring lingual coarticulation from midsagittal tongue

contours: description and example calculations using English /t/ and /a/. Journal of

Phonetics, 37, 248-256.

4 Chapter 4


The Role of Auditory Information in Gestural Intrusions and Reductions

Anneke W. Slis, and Pascal H.H.M. van Lieshout

Department of Speech Language Pathology, Oral Dynamics Lab, 160-500 University Avenue,

University of Toronto, Toronto, Ontario, M5G 1V7, Canada


4.1 Abstract

This study investigates the role of auditory information in the stability of articulatory coordination. Gestural intrusions and reductions arise from a general autonomous, self-regulating mechanism of coordination dynamics (e.g., Goldstein, Pouplier, Chen, Saltzman &

Byrd, 2007; Pouplier, 2008; Slis & Van Lieshout, under revision). The question addressed in the current study is whether coordination dynamics in speech production is affected by auditory information in such a way that this auditory information influences the occurrence of intrusion and reduction errors. Fifteen monolingual speakers of Canadian English between 19 and 35 years of age produced CVC-CVC word pairs with alternating onset consonants and identical rhymes in a repetitive speech task. The stimuli consisted of the word pairs cop top, kip tip, pick tick, pock tock, pot cot, and pit kit. Two different speaking rates (normal and fast) and two masking conditions (masked and unmasked) were employed. For the purpose of this study, the maximum vertical displacements of the target movements of the tongue tip, tongue dorsum, and lower lip during the respective onset consonants /t/, /k/ and /p/ in the first and second word were retrieved from data collected with the EMA AG500 system. The position of a non-target articulator, i.e., the intruding articulator, was measured at the time when the target articulator was maximally constricted. Intrusions and reductions were defined as outliers from movement range distributions of individual articulators. The results showed that intrusions and reductions built up over the course of a trial, that a fast speaking rate resulted in more intrusions, and that speakers made fewer intrusions in masked than in unmasked speech. It is suggested that, when no auditory information was available, speakers paid closer attention to their articulatory movements, with proprioceptive information playing a stabilizing role.


4.2 Introduction

To understand other people's speech, listeners rely for a substantial part on auditory information.

In addition, speakers depend to a large degree on auditory information from their own speech when (re-)learning to produce new speech sounds (Borden, 1979; Jones & Munhall, 2003; Lane,

Denny, Guenther, Matthies, Menard, Perkell, et al., 2005). Speakers also use auditory information when maintaining the kinematic aspects of speech (Forrest, Abbas & Zimmermann,

1986), validating whether their speech is produced accurately, and correcting audible errors

(Corley, Brocklehurst & Moat, 2011; Dell, 1980; Postma, 2000; Postma & Kolk, 1992; Postma

& Noordanus, 1996). Despite its apparent importance for monitoring and correcting speech production, little is known about whether and how auditory information influences the occurrence of speech errors. The current study builds on a series of recent kinematic studies that have defined errors as gestural intrusions and reductions that arise from a general autonomous, self-regulating mechanism of coordination dynamics (e.g., Goldstein, Pouplier,

Chen, Saltzman & Byrd, 2007; Pouplier, 2008; Slis & Van Lieshout, under revision). The question addressed in the current study is whether coordination dynamics in speech production is affected by auditory information in such a way that this auditory information influences the occurrence of intrusion and reduction errors. Having a better insight into the role of auditory information on coordination dynamics will inform current models of speech production and perception.

4.2.1 Background

Traditionally, errors have been described at the abstract phonological level as transpositions of discrete units of speech, such as phonemes or features. An example of such an error is “a

Tanadian in....” instead of “a Canadian in Toronto” (Fromkin, 1971). Several of these

transcription-based studies have revealed that the number of phonemic speech errors is similar in noise-masked and unmasked speech (Postma & Kolk, 1992; Postma, Kolk & Povel, 1991).

Moreover, the type of errors and their location within tongue twister sentences are similar for masked and unmasked speech (Dell, 1980; Postma & Noordanus, 1996). This suggests that speakers do not employ auditory information to prevent errors. An interesting finding from a masking study exploring vowel errors by Lackner & Tuller (1979) was that speakers made fewer errors when the masked session came second than when it was offered first. In this study, participants were instructed to produce 4 syllables per second for 30 seconds. The syllables consisted of a combination of consonants and vowels (CV), only vowels, or CV combined with an extra vowel. During one session the speakers could hear themselves; during the other session their speech was masked by noise. The authors suggested that, once speakers have familiarized themselves with both the material and possible errors, they attend more closely to the finer details of speech when no auditory feedback is available than when it is, resulting in more accurate speech.

A shortcoming of the error studies cited above is that perceptual transcription was employed as a tool to describe the errors. How auditory information affects the fine-grained details of speech movement coordination, which can result in errors, is not exactly known (Kent,

1996). Studies that have assessed errors using acoustic and kinematic measures have shown that the actual pattern of errors differs from what is reported in perception studies (Frisch & Wright,

2002; Goldrick & Blumstein, 2006; Goldstein, et al., 2007; McMillan & Corley, 2010; Mowrey

& MacKay, 1990). In these acoustic and kinematic studies, errors were frequently gradual in nature: activations of articulators that were not supposed to be active varied on a scale from 0% (no error) to 100% (full error) (Goldstein, et al., 2007; Pouplier, 2008). For example, in the word pair cop top, during the onset /k/ of the first word cop, an extra tongue tip

activation was frequently measured. This type of error was labeled an "intrusion", as a non-target articulator activation seemed to intrude and occur simultaneously with the correct dorsal target activation. In a similar manner, a reduced target activation was called a reduction. These intrusions and reductions have been explained from a gestural perspective, in which gestures constitute the smallest unit of speech (Goldstein, Byrd & Saltzman, 2006; Goldstein, et al.,

2007; Slis & Van Lieshout, under revision).

4.2.2 Articulatory Phonology (AP) and Task Dynamics

A gesture is a dynamically defined action unit of speech, conceptualized at the cognitive level as an abstract representation of a constriction task, as well as an actual action in the vocal tract at the production level. These action units are characterized by movements of specific sets of articulators controlled in a coordinated way in order to reach a linguistically specified goal

(Fowler & Saltzman, 1993; Goldstein, et al., 2006). Because gestures are specified as a set of articulators that constrict the vocal tract, they can overlap spatially and temporally. Intrusion errors can thus be explained as (partly) overlapping gestures, one intended and one unintended

(for a detailed explanation see Goldstein et al., 2007).

An interesting finding revealed in the kinematic studies (Goldstein, et al., 2007; Pouplier, 2008), and relevant for the current study, involves the way in which intrusions and reductions behave over time. In the studies that have employed repetitive speech, intrusions and reductions built up over the course of a single trial, especially at a higher speaking rate (Goldstein et al., 2007; Slis

& Van Lieshout, under revision). This phenomenon was explained as originating from the tendency of movements to entrain (Goldstein et al., 2007; Slis & Van Lieshout, under revision).

Entrainment is the process by which two independently moving oscillators tend to synchronize over time. Entrainment of movement is found in many oscillatory systems, including

movements of the limbs and articulators (Goldstein, et al., 2007; Kelso, 1995; Peper & Beek,

1998; Peper & Beek, 1999; Peper, Beek & Van Wieringen, 1995a, 1995b; Turvey, 1990), and serves a stabilizing function. In these systems, lower modes such as 1:1 are more stable than higher frequency modes such as 1:2 and other higher ratios. In the former (lower) mode, a stronger coupling exists between the two oscillators (Haken, Kelso, & Bunz, 1985;

Peper & Beek, 1998; Saltzman, 1993; Turvey, 1990). When coupling strength is reduced, for example by higher movement speed, the system is inclined to converge from a higher ratio (in this case, 1:2) to the intrinsically more stable 1:1 mode of coordination (Peper, et al., 1995a, 1995b).
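The loss of a less stable coordination mode at reduced coupling strength can be illustrated numerically with the Haken, Kelso & Bunz (1985) relative-phase equation cited above. Note that HKB describes 1:1 in-phase versus anti-phase coordination rather than 1:2 frequency locking, but it shows the same principle: when the coupling parameter b drops below a/4, the less stable (anti-phase) mode loses stability and the system settles into the intrinsically more stable (in-phase) mode. This is a toy sketch; parameter values and function names are ours, not fitted to speech data:

```python
import math

def relax(phi0, a, b, dt=0.001, t_end=50.0):
    """Euler-integrate the HKB relative-phase equation
    d(phi)/dt = -a*sin(phi) - 2*b*sin(2*phi)
    and return the final relative phase between the two oscillators."""
    phi = phi0
    for _ in range(int(t_end / dt)):
        phi += dt * (-a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi))
    return phi

# Strong coupling (b/a > 0.25): the less stable anti-phase mode (phi = pi)
# remains an attractor, so a small perturbation dies out.
strong = relax(math.pi - 0.3, a=1.0, b=0.5)

# Weak coupling (b/a < 0.25): anti-phase loses stability and the system
# relaxes into the intrinsically more stable in-phase mode (phi = 0).
weak = relax(math.pi - 0.3, a=1.0, b=0.1)
```

With these settings, the strong-coupling run returns to the anti-phase attractor while the weak-coupling run drifts to in-phase, mirroring the described switch from a less stable to a more stable coordination mode.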

In the model of Task Dynamics, gestures are seen as limit cycle oscillators and the behavior of these gestures can thus be described with the principles described above (Goldstein, et al.,

2006). In the case of intrusion errors during the onsets of cop top, a 1:2 relation exists between the gestures involved in the onsets /t/ of top and /k/ of cop versus those involved in the coda /p/.

Adding a tongue tip gesture to the tongue dorsum gesture during the onset consonant /k/ of the word cop stabilizes articulation by creating a 1:1 mode (two tongue tip and two lower lip constrictions).

The mechanisms observed in coordination dynamics are autonomous in the sense that they occur irrespective of the properties of the components, such as differences in articulators or muscles and available sensory information. Numerous studies have shown, however, that external factors can affect coupling strength (Beek, Peper & Stegeman, 1995; Fink, Foo, Jirsa &

Kelso, 2000; Van Lieshout, 2004). Potential factors that may interact with coordination dynamics include the presence of visual (Bogaerts, Buekers, Zaal, & Swinnen, 2003; Roerdink,

Peper & Beek, 2005) and auditory information (Lagarde & Kelso, 2006; Namasivayam, Van

Lieshout, McIlroy & de Nil, 2009; Repp, 2005). The studies on feedback have found that both visual and auditory information play a stabilizing role in coordination dynamics.


Besides having a stabilizing role in producing speech, some findings from a study by Slis and

Van Lieshout (under revision) point to a possible fine-tuning role for auditory information. This study set out to investigate the role of co-articulatory constraints on the stability of movements in word pairs with alternating onset consonants, such as cop top. Compared to the lower lip and tongue tip, the tongue dorsum articulator intruded more frequently in the context of a front vowel, i.e., during the onset of tip or tap compared to top or toop (Slis & Van Lieshout, under revision). Of specific interest for the current study was the finding that the number of intrusions could be predicted by the difference between the relative movement ranges of the target articulator during an onset consonant of one word and the relative movement ranges of the identical articulator in non-target position during the onset of the other word. For example, the larger the difference between the tongue dorsum movement range during the /k/ of cap and its value during the /t/ of the following word tap, the more tongue dorsum intrusions during the /t/.

This was particularly true for the tongue dorsum in front vowel context. The presence of larger movement range differences suggests that speakers attempt to exaggerate the difference between target and non-target articulations in those contexts/situations where intrusions are more likely to occur, perhaps in order to prevent the intrusions from surfacing as audible errors. In support of this idea, there is some evidence that intrusions affect the acoustic spectrum differently depending on the articulator involved. Pouplier & Goldstein (2005) showed that an intruding tongue dorsum during an intended coronal constriction affects the perceived speech to a greater extent than an intruding tongue tip during an intended dorsal constriction. An alternative explanation that takes the perceptual outcome into account is that speakers might have allowed for more intrusions in these environments based on the possible larger difference between target and non-target activations; it is suggested that this results in a smaller likelihood of these intrusions reaching a threshold at which the error is perceived. Both of these interpretations

suggest that speakers actively consider possible auditory consequences when fine-tuning their own speech output. This proposal is in line with studies that assume that auditory information acts as an external source to tune and reset on-going speech and to correct inconsistencies where necessary (Borden, 1979; Lackner & Tuller, 1979; Postma, 2000; Neilson & Neilson, 1987).

No auditory feedback loop has been specified in the models of Task Dynamics and AP that could account for possible effects on speech coordination. In AP, the speaker’s goal is to achieve gestural constrictions within the vocal tract which will structure the acoustic speech signal.

Listeners, in turn, retrieve the gestural information from the acoustic signal (e.g., Goldstein &

Fowler, 2003). Studies have shown that speakers adjust their articulations immediately after an articulator has been disturbed, a mechanism known as motor equivalence (Kelso, Tuller,

Vatikiotis-Bateson & Fowler, 1984; Saltzman, Löfqvist, Kay, Kinsella-Shaw & Rubin, 1998).

These corrections happen more quickly than would be thought possible if they solely relied on auditory information, and suggest a feedback loop from the articulator level to the gestural level

(Saltzman et al., 1998). Accordingly, it seems logical to assume that a feedback mechanism exists between the peripheral, articulatory events and intergestural dynamics (Saltzman et al.,

1998).

4.2.3 Current study

Despite evidence that supports an important role for auditory information as an external source of information in speech production (Lackner & Tuller, 1979; Namasivayam et al., 2009), to date it has not been tested whether speakers use their acoustic output to stabilize coordinative structures in speech to prevent intrusions or reductions. Additionally, research has not explored whether speakers adjust their speech production in the presence of audible intrusions. Thus, the possible interaction between auditory information and coordination dynamics needs to be

investigated further to better understand the interplay between these factors and the findings in

Slis & Van Lieshout (under revision).

Two hypotheses are formulated:

1. Based on the findings from studies that show a stabilizing role for visual and

auditory information (Bogaerts, et al., 2003; Lagarde & Kelso, 2006; Namasivayam, et

al., 2009; Repp, 2005), our first hypothesis is that the presence of auditory information

strengthens the coupling in speech movement coordination. Consequently, it is predicted

that the 1:2 mode of coordination is more easily maintained when auditory information

is available than when not, especially at higher speaking rates. In line with this, lack of

auditory information should lead to reduced coupling strength and, consequently, more

intrusions and reductions will arise to stabilize the system by forcing it to switch to a

simpler coupling ratio (1:1).

2. Based on the study by Slis and Van Lieshout (under revision), our second

hypothesis is that auditory information can serve a correcting or fine-tuning function,

causing the speaker to adjust the movements within the vocal tract in such a way that the

perceptual outcome of the intended speech string is not affected to a noticeable degree. It

is predicted that, without auditory information, the observed relation between intrusions

and adjusted vocal tract size, i.e., the difference between target and non-target articulator

positions, disappears.


4.3 Methods

4.3.1 Participants

Fifteen monolingual speakers of Canadian English between 19 and 35 years of age participated in the study. A speaker was considered monolingual if no other language than Canadian English was spoken at home during childhood and the main language of schooling was Canadian

English. Two speakers did not meet these strict criteria. One participant had previously lived in the and had a slight British accent; the other had lived in the .

However, both had lived in Toronto for several years, and it was decided to include these speakers in the analysis and to monitor their data carefully for divergent patterns; none were found. Data from one participant were disregarded because the tongue dorsum coil fell off during the second part of the session. As such, data from fourteen participants were included in the analysis.

To be included in this study, participants had to have no reported history of speech or language difficulties, normal (corrected) vision, and normal hearing. Hearing was assessed by a questionnaire addressing hearing-related problems, completed before the study. All speakers gave written consent and were compensated for their participation. The study was approved by the Health Science Research Ethics Board of the University of Toronto.

4.3.2 Stimuli

In order to elicit intrusions and reductions, participants were asked to repeat CVC-CVC word pairs with alternating onset consonants and identical rhymes, such as cop top, 15 times consecutively. Two different vowels – the high front vowel /ɪ/, as in kip, and the low back vowel


/ɑ/, as in cop – and 3 different consonants were used. Only two vowels were used to limit the number of trials. The combinations of onset consonants included /k/-/t/, /p/-/t/, and /k/-/p/, resulting in the word pairs cop top, kip tip, pick tick, pock tock, pot cot, and pit kit. As previous studies have shown that word order does not affect intrusion and reduction patterns (Goldstein et al., 2007; Slis & Van Lieshout, under revision), only the above mentioned order of word pairs was included.

4.3.3 Procedure

Two different speaking rates were employed: normal and fast. Both rates were determined individually for each participant. To establish the normal rate, speakers were instructed to repeat the word pair mik nik at a comfortable speaking rate (see also Slis & Van Lieshout, 2013); this preferred rate served as the normal rate.

The fast rate was based on 90% of the highest possible speaking rate the participant was able to produce. In order to have control over rate variation, speaking rate was guided by a visual metronome.

To investigate the role of auditory information on the production of intrusions and reductions during repetitive speech, two conditions were employed: a first condition in which participants could perceive their speech (the unmasked condition) and a second condition in which noise masked the speech to such an extent that the speakers did not perceive their own speech (the masked condition). To this end, depending on the individual’s threshold for speech masking, calibrated white noise with a sound pressure level (SPL) level between 90 and 100 DB was presented to the participants via specially designed acoustic tubing with earplugs

(Namasivayam, et al., 2009).


In addition, for a different study a switching paradigm was employed, in which the participants were instructed to switch the order of the two words at a certain point during the trial (top cop

→ cop top) or alternatively, replace the vowel with another vowel (i.e., cop top → kip tip) (see also Namasivayam, et al., 2009). The switching paradigm always resulted in one of the above listed target word pairs and was applied to all six pairs.

The complete session, including the switching trials, consisted of 72 trials (18 trials * 2 auditory feedback conditions * 2 rate conditions). Seven practice trials were presented prior to the normal and fast conditions, adding 14 extra trials and resulting in a total of 86 trials for each participant

(approximately 30 minutes in total). Two lists were constructed consisting of 9 trials each, and the trials within each list were randomized. The auditory feedback conditions were blocked and always alternated, so that participants completed a list without noise followed by a list with noise, or vice versa. The masking conditions were counterbalanced.

The normal speaking rate condition was always run first, followed by the fast condition.

The word pairs appeared in the center of a 19-inch monitor, located approximately one meter in front of the participant. The first word of a pair was always shown in upper-case letters to indicate that it had to be produced with primary stress, whereas the second word was shown in

Lieshout, 2013). Next, eight visual metronome beats were presented to prepare the participant for the trial. Visual metronome beats appeared as blinking red dots on the monitor just above the word pair. At the same time, numbers counting down from 8 to 1 were presented visually, after which the red dots turned green. At that moment, the actual trial started and the participant began repeating the word pairs aloud so that a single beat of the metronome covered both words

in the pair. At the moment the actual trial started, the consonants in the words disappeared from the screen so that only the vowel remained visible. This was done so that the orthography would not reinforce the production of the correct target words.

In the masked condition, noise was added after three repetitions of a word pair. To control for the fact that speakers tend to speak louder when deprived of auditory information (Huber &

Chandrasekaran, 2006), the speakers controlled the loudness of their voice by monitoring an LED display connected to an SPL meter attached to the top of the monitor. The speakers were instructed to stay within a specific range, indicated by a green colored array of LEDs.

Participants were encouraged to finish the trial on a single breath.

4.3.4 Instrumentation

Kinematic and acoustic data were collected with the EMA AG500 system, which allows for 3D recordings of articulatory movements inside and outside the vocal tract (Goldstein et al., 2007;

Kroos, 2012; Slis & Van Lieshout, 2013; Yunusova, Green & Mefferd, 2009; Zierdt, Hoole &

Tillman, 1999). In order to record articulatory movements, coils were attached on specific articulators using surgical glue (Periacryl Blue; Gluestitch). For the purpose of this study, transducer coils were placed on the mid-sagittal vermilion border of the upper and lower lip, the tongue tip (1 cm from the apex), the tongue body (3 cm from the tongue tip), and the tongue dorsum, varying between 1 and 2 cm posterior to the tongue body coil depending on how well the participant tolerated this. To record the movements of the jaw, a thermo-plastic mold was constructed that was placed over the lower incisor teeth. A coil was glued to this mold to track lower jaw motion. Furthermore, coils were placed on the participant’s head and nose for reference purposes (Van Lieshout, Merrick & Goldstein, 2008; Van Lieshout & Moussa, 2000).

Only data from the tongue tip, tongue dorsum, and lower lip were analyzed for the current

study. In order to adjust to the coils, participants read “The Rainbow Passage”, a short paragraph that contains all the phonemes of American English.

Before the actual session started, positional information was retrieved in order to create a standard reference frame, which was used to remap the raw position data of individual articulators so that data could be compared across participants (Westbury, 1994). This was done with a 3D bubble level that had to be placed exactly parallel to the horizontal axis of the EMA system. The bubble level was attached to a device, which the participant held in his/her mouth for a couple of seconds. The raw movement signals (sampled at 200 Hz) and 3D positions over time were calculated from the amplitude recordings (Yunusova, et al., 2009). This information was stored on the computer together with the simultaneously recorded acoustic signal, sampled at 16 kHz.

Additionally, a Marantz digital recorder (type PMD670) recorded the speech signal at 48 kHz for future acoustic analysis.

4.3.5 Data Processing

The raw movement data were processed according to a standardized protocol (Henriques & Van

Lieshout, 2013; Van Lieshout, et al., 2008; Van Lieshout & Moussa, 2000). Using a 7th-order

Hamming-windowed Butterworth filter (0.5 Hz and 6.0 Hz as the low and high cut-off points, respectively), individual movement signals of articulators were band-pass filtered to remove DC drift and high-frequency noise while preserving the relevant frequency components of the movement data (Namasivayam, et al., 2009; Slis & Van Lieshout, 2013). Next, measures of movement range were normalized to remove inter-speaker variations related to differences in vocal tract size. To this end, movement data were scaled relative to the values of the highest and lowest articulator positions on a trial-by-trial basis, separately for each articulator. Movement range values were expressed as percentages of the largest absolute movement range within a

trial. This way, data could be compared across trials and participants (Goffman, Gerken &

Lucchesci, 2007; Namasivayam, et al., 2009; Ostry, Cooke & Munhall, 1987; Slis & Van

Lieshout, under revision; Smith, Goffman, Zelaznik, Ying & MacGillem, 1995).
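As a rough illustration of this preprocessing pipeline, the sketch below band-pass filters one articulator channel and rescales it trial by trial. It assumes NumPy and SciPy; the zero-phase second-order-sections Butterworth implementation is our choice (the Hamming-window detail of the original protocol is not reproduced), and all function names are hypothetical:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 200  # EMA sampling rate (Hz), as reported in the text

def band_pass(signal, low=0.5, high=6.0, order=7):
    """Zero-phase Butterworth band-pass: removes DC drift below `low` Hz
    and high-frequency noise above `high` Hz."""
    sos = butter(order, [low, high], btype="bandpass", fs=FS, output="sos")
    return sosfiltfilt(sos, signal)

def normalize_trial(signal):
    """Rescale one articulator channel so positions are expressed as a
    percentage of the largest absolute movement range within the trial."""
    lo, hi = signal.min(), signal.max()
    return 100.0 * (signal - lo) / (hi - lo)

# Synthetic 10-s channel: offset + slow drift + 2 Hz movement + fast noise
t = np.arange(0, 10, 1.0 / FS)
raw = 5.0 + 0.3 * t + np.sin(2 * np.pi * 2 * t) + 0.2 * np.sin(2 * np.pi * 30 * t)
clean = normalize_trial(band_pass(raw))
```

After filtering, the offset and drift are removed while the 2 Hz movement component survives, and normalization maps each trial onto a 0-100% scale so trials and participants can be compared.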

In order to select the onsets of the individual words forming the pairs, procedures similar to those described in previous studies were applied (Slis & Van Lieshout, 2013). The boundaries of word onsets were defined by two minima, namely the start and end of the movement cycle for the coda consonant. For example, for the word pair cop top, the second word onset (i.e., /t/ in top) was defined by the two minima of the movement cycle associated with the preceding lower lip constriction for /p/ in cop. The second lower lip constriction minimum also indicated the start of the next segment. Each segment contained trajectory information about the relevant articulators when specific onset and coda consonants were produced. For the purpose of this study, the maxima and minima of the target movements of the tongue tip, tongue dorsum, and lower lip during the respective onset consonants /t/, /k/ and /p/ in the first and second word were retrieved.

Based on the minimum and maximum relative amplitude values of a segment, the relative movement range of a specific articulator was calculated. The value of the non-target articulator maximum was measured at the point in time when the target articulator was maximally constricted; thus, the movement range of the non-target articulator was determined by this value and the minimum of the non-target articulator within the segment.
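The range measurements described above can be sketched as follows. This is a simplified illustration with a hypothetical helper; the real protocol operates on segmented, normalized EMA trajectories:

```python
import numpy as np

def movement_ranges(target, non_target):
    """For one segmented onset: the target range spans the segment's
    minimum and maximum; the non-target ("intruding") articulator is read
    off at the sample where the target is maximally constricted, and its
    range is that value minus the non-target minimum within the segment."""
    i_max = int(np.argmax(target))  # moment of maximal target constriction
    target_range = float(target.max() - target.min())
    non_target_range = float(non_target[i_max] - non_target.min())
    return target_range, non_target_range

# Toy normalized trajectories (percent of trial range) for one segment
t_rng, nt_rng = movement_ranges(np.array([0.0, 40.0, 90.0, 40.0, 0.0]),
                                np.array([0.0, 10.0, 25.0, 10.0, 0.0]))
```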

Given that the first and last repetitions of a trial may behave somewhat differently compared to the rest of the trial, these were always disregarded (Slis & Van Lieshout, 2013). The remaining repetitions of a trial were included in the analysis when they were produced according to criteria, as outlined in Slis & Van Lieshout (2013). If a participant stumbled during most of the trial, this trial was disregarded. However, if the participant managed to repeat several repetitions

that could be segmented into two words, the trial was included. When the participant added a word, e.g., cop top cop top top cop top, this extra word was disregarded.

4.3.6 Analysis

Intrusions and reductions were calculated employing the same procedures used in the previous study on intrusions and reductions (Slis & Van Lieshout, under revision). An intrusion was defined as a statistical outlier from a distribution of relative movement ranges for non-target articulator movements. Likewise, a reduction was defined as a statistical outlier from a distribution of relative movement ranges of target articulator movements. In order to determine these outliers, two median values were calculated: one based on the movement ranges of successive target maxima of a specific articulator within a trial, the other based on the non-target movement range values. Values two or more Median Absolute Deviations (MADs) below the median target value were considered reductions; values two or more MADs above the median non-target value were considered intrusions. MAD was calculated as follows (Chau, Young & Redekop, 2005): MAD(X) = med(|X − med(X)|), where med(X) is the median of the sample.
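The MAD-based outlier rule can be sketched in a few lines of Python. This is an illustration under the definitions above, not the study's actual scripts, and the movement-range values are invented:

```python
from statistics import median

def mad(values):
    # Median Absolute Deviation: MAD(X) = med(|X - med(X)|)
    m = median(values)
    return median(abs(v - m) for v in values)

def flag_outliers(ranges, direction, n_mads=2.0):
    """Flag values lying n_mads or more MADs from the median.

    direction='below' flags reductions (target articulator ranges);
    direction='above' flags intrusions (non-target articulator ranges).
    """
    m, d = median(ranges), mad(ranges)
    if direction == "below":
        return [v <= m - n_mads * d for v in ranges]
    return [v >= m + n_mads * d for v in ranges]

# Hypothetical relative movement ranges (%) of a non-target articulator
non_target = [5.0, 4.8, 5.2, 5.1, 4.9, 12.0, 5.0, 5.3]
print(flag_outliers(non_target, "above"))
# → [False, False, False, False, False, True, False, False]
```

Only the sixth repetition, whose non-target range clearly exceeds the median by more than two MADs, would count as an intrusion here.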

To evaluate the prediction that intrusions and reductions build up more quickly in the absence of auditory information, the trials were divided into three parts: start, middle, and end (Slis & Van Lieshout, under revision). When counting the number of intrusions and reductions, the second and third repetitions of a trial were disregarded because these were always performed without noise (the first repetition was already removed before the medians and MADs were calculated).

The start consisted of the 4 repetitions following the disregarded repetitions, the middle part contained the next 4 repetitions, and the final part contained the last set of repetitions. Because speakers were not always able to complete the entire trial, the number of intrusions and reductions within a part was divided by the number of repetitions within that part, resulting in a ratio of intrusions or reductions.
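Under these assumptions (the function name and the per-repetition flags below are invented for illustration), the division of a trial into parts and the ratio calculation might look like:

```python
def part_ratios(outlier_flags):
    """Split per-repetition outlier flags into start/middle/end parts
    and return the ratio of flagged repetitions per part.

    `outlier_flags` covers the repetitions remaining after the first
    three are disregarded (see text); start and middle hold 4
    repetitions each, and the end part takes whatever is left.
    """
    parts = {
        "start": outlier_flags[:4],
        "middle": outlier_flags[4:8],
        "end": outlier_flags[8:],
    }
    # Ratio = flagged repetitions / repetitions actually produced in that part
    return {name: sum(flags) / len(flags)
            for name, flags in parts.items() if flags}

# Hypothetical trial: 12 usable repetitions, intrusions marked True
flags = [False, False, True, False,
         False, True, False, False,
         True, True, False, True]
print(part_ratios(flags))  # → {'start': 0.25, 'middle': 0.25, 'end': 0.75}
```

Normalizing by the number of repetitions in each part, rather than using raw counts, is what makes incomplete trials comparable to complete ones.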

In addition to these ratios, a difference measure was calculated, similar to the one used in Slis and Van Lieshout (under revision), to assess the prediction that speakers enlarged their vocal tract size to adjust for the number of intrusions. To this end, the difference between the relative target movement range of an articulator and the relative non-target movement range of this articulator in the alternating word was calculated.

The statistical analyses are described in detail in the results section. An alpha level of 0.05 was used.

4.4 Results

4.4.1 Lombard effect

When speakers are deprived of auditory information, they adjust their speech by speaking louder and slower and, consequently, employ larger amplitudes of speech movements (e.g., Dromey & Ramig, 1998; Huber & Chandrasekaran, 2006; Namasivayam & Van Lieshout, 2011), a mechanism known as the Lombard effect. To check whether, despite the visual metronome, speakers slowed down their speech due to this Lombard effect, an initial repeated measures ANOVA was performed with average segment duration values collapsed across words as the dependent variable, and rate and masking condition as the independent variables. In addition, an initial ANOVA assessed whether speakers enlarged the difference measure in the masked condition due to the Lombard effect. Here, the dependent variable was the relative difference collapsed across words, and the independent variables were rate and masking condition.


Speech was successfully masked without inducing a Lombard effect (louder speech and larger movements). Speech masking did not affect the duration of the words (see Table 4.1): speakers did not slow down when they could not hear themselves (F(1, 13) = 0.06, p = 0.80). As expected, at a normal speaking rate, word segments were realized with longer durations than at the fast rate (F(1, 13) = 105.85, p < 0.0001). No interaction was observed between masking and rate (F(1, 13) = 2.21, p = 0.16).

The analysis of rate and masking condition with the relative movement range difference as dependent variable (see Table 4.1) did not reveal an effect of either masking or rate (F(1, 13) = 0.51, p = 0.49 and F(1, 13) = 1.65, p = 0.22, respectively). No interaction was found (F(1, 13) = 0.00, p = 0.99).

Table 4-1. Mean duration values (ms) and movement range differences (%) for masked and unmasked speech at fast and normal speaking rates. Standard deviations are reported in parentheses.

                        Masked                           Unmasked
Speaking rate   Duration (ms)    Difference (%)   Duration (ms)    Difference (%)
Normal          428.63 (56.56)   45.06 (6.25)     424.85 (60.18)   45.11 (6.75)
Fast            305.72 (27.67)   43.27 (7.15)     307.69 (33.08)   43.42 (7.39)

4.4.2 Reductions and intrusions

Next, a repeated measures ANOVA determined whether (lack of) auditory information affected the ratio of intrusions and reductions. The dependent variables were the ratios of intrusions and reductions. The independent variables consisted of "speaking rate" (normal, fast), "part of trial" (start, middle, and end), and "auditory information" (masked or not masked), collapsed across words. A Tukey-Kramer post hoc analysis was performed for main effects and two-way interactions. The p-value was set at 0.05. The statistical findings for reduction and intrusion ratios are reported in Table 4.2. Only the significant results are commented on here.


As shown in Table 4.2, the only variable having a significant effect on reduction ratios was trial part. Specifically, fewer reductions occurred at the start of a trial than at the end (Table 4.3), regardless of speaking rate and masking condition.

Table 4-2 Repeated measures GLM ANOVA. Values of ratio of reductions and intrusions (dependent variables) for 14 Canadian English-speaking participants. Independent variables were masking, rate, and trial position. Main effects and two- and three-way interactions are reported. Data are collapsed across words. An effect is considered significant at p < 0.05 (marked with an asterisk: * p < 0.05, ** p < 0.01, *** p < 0.001).

                                  Reductions              Intrusions
Variable                  df      F      p                F       p
Masking                   1, 13   0.00   0.97             6.67    0.02*
Rate                      1, 13   1.03   0.38            34.56    <0.0001***
Part trial                2, 26   7.81   <0.01**         11.05    <0.001***
Masking*rate              1, 13   0.88   0.37             2.73    0.12
Masking*part trial        2, 26   1.54   0.23             0.80    0.46
Rate*part trial           2, 26   1.00   0.38             1.85    0.18
Rate*part trial*masking   2, 26   0.65   0.53             0.99    0.39

Table 4-3 Ratio of reduction means (M) and standard deviations (SD) averaged across individual participants (n = 14), collapsed across words. Independent variables are masking (unmasked and masked), rate (normal, fast), and trial part (start, middle, end).

                       Unmasked         Masked
Rate     Trial part    M      SD        M      SD
Fast     Start         0.10   0.05      0.07   0.05
         Middle        0.11   0.05      0.10   0.06
         End           0.14   0.09      0.15   0.13
Normal   Start         0.09   0.04      0.08   0.05
         Middle        0.09   0.05      0.11   0.06
         End           0.12   0.05      0.13   0.05

When auditory information was available, more intrusions were found than when speech was masked (see Tables 4.2 and 4.4). In addition, the fast speaking rate resulted in more intrusions than the normal rate. Finally, the start and middle of trials contained fewer intrusions than the final part.

Table 4-4 Intrusion means (M) and standard deviations (SD) averaged across individual participants (n = 14), collapsed across words. Independent variables are masking (unmasked and masked), rate (normal, fast), and trial part (start, middle, end).

                       Unmasked         Masked
Rate     Trial part    M      SD        M      SD
Fast     Start         0.10   0.05      0.09   0.04
         Middle        0.15   0.04      0.12   0.04
         End           0.20   0.08      0.16   0.08
Normal   Start         0.10   0.04      0.08   0.04
         Middle        0.09   0.04      0.10   0.04
         End           0.14   0.06      0.13   0.07

4.4.3 Regression analysis

Linear regression analyses were performed to explore whether the previously observed positive relation between intrusions or reductions and vocal tract size (Slis & Van Lieshout, under revision) changed when no auditory information was available. The difference between the median movement range of an articulator in target position and the median non-target movement range of the same articulator in the alternating word formed the predictor (see Slis & Van Lieshout, under revision). The ratio of intrusions and reductions, the criterion variable, was calculated over the complete trial, again without the first three repetitions. The analysis examined mean difference values and intrusion or reduction ratios for each articulator separately for each participant, with rate and masking condition assessed separately, for a total of 168 mean values (14 participants × 3 articulators × 2 rates × 2 masking conditions). The alpha level was set at 0.05.
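A simple regression of this kind can be sketched with ordinary least squares; the difference measures and ratios below are invented values, not data from the study:

```python
from statistics import mean

def simple_regression(x, y):
    """Ordinary least squares with one predictor; returns slope, intercept, R^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2
                 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2

# Invented predictor (mean difference measures, %) and criterion
# (intrusion ratios) for six hypothetical participant/articulator cells
diff = [40.0, 42.5, 45.0, 47.5, 50.0, 52.5]
ratio = [0.08, 0.10, 0.11, 0.14, 0.15, 0.18]
slope, intercept, r2 = simple_regression(diff, ratio)
```

A positive slope with a non-trivial R² in this sketch would correspond to the reported positive relation between difference measures and intrusion ratios.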


As can be observed in Table 4.5, for masked speech produced at the fast speaking rate, a small relationship existed between the ratio of intrusions and movement range difference values. In contrast to what was previously found by Slis & Van Lieshout (under revision), unmasked speech did not show this effect.

Table 4-5 Regression values (R2) and their significance for masked (N+) and unmasked (N-) speech at normal (N) and fast (F) speaking rates.

            Reductions                                Intrusions
            Normal               Fast                 Normal               Fast
      df    F     p     R2       F     p      R2      F     p     R2      F     p      R2
N+    1,41  0.67  0.41  0.02     3.55  0.07   0.08    0.88  0.35  0.02    7.96  <0.01  0.17
N-    1,41  0.07  0.78  0.02     3.12  0.08   0.07    0.00  0.97  0.00    4.35  0.04   0.10

4.4.4 Summary

When auditory information was available, more intrusions were found than when auditory information was not available. In addition, the fast speaking rate resulted in more intrusions than the normal rate. Compared to the start, more reductions were found at the end of a trial. The end of a trial also contained more intrusions than the start and middle. In the fast speaking rate condition with masking, the intrusions showed a small relation with the difference measures.

4.5 Discussion

The primary objective of the experimental study presented here was to explore how auditory information affected articulatory movement coordination. More specifically, we sought to answer the question whether auditory information influenced coordination dynamics in such a way that it affected the relative number of reductions and intrusions found in repetitive speech.

It was hypothesized that the presence of auditory information should strengthen the coupling in speech movement coordination, resulting in fewer intrusions and reductions. This hypothesis was not confirmed. On the contrary, although the ratio of reductions was unaffected by the availability of auditory information, speakers made more intrusions when auditory information was present. The previously found effect of trial position on the ratio of intrusions and reductions (Goldstein et al., 2007; Slis & Van Lieshout, under revision) was again confirmed in the current study, as was the finding that the fast speaking rate resulted in more intrusions than the normal rate.

A second hypothesis tested was that auditory information serves a corrective function. In an earlier study, Slis & Van Lieshout (under revision) suggested that speakers adjusted their movements within the vocal tract when intrusions occurred in such a way that the perceptual outcome of the intended speech string was not affected significantly. This earlier study revealed that the relative difference between the movement ranges of articulators in target position and the movement ranges of the same articulators in a non-target position in the alternating word of a pair predicted the ratio of intrusions: for example, the larger the difference between the tongue dorsum movement range during the /k/ of cap and its value during the /t/ of the following word tap, the more likely the tongue dorsum intruded during the /t/. The current study predicted that, in the absence of auditory information, the observed relationship between intrusions and adjusted relative vocal tract size would disappear. Contrary to what was predicted, in the absence of auditory information a small relationship existed between intrusions and difference measures at a fast speaking rate, which was not present when auditory information was available.

In general, the findings reveal that the presence or lack of auditory information does not stabilize or destabilize speech coordination in the expected way. Articulatory movement coordination seems to be more stable without auditory information, in the sense that fewer intrusions occurred in this condition than in the condition where auditory information was present. In addition, the ratio of reductions was not affected by auditory information at all. One possible explanation for this is that speakers monitor their actual speech articulations more closely when they do not hear themselves, because they are unable to screen their speech perceptually for irregularities; this would be consistent with studies that have shown a stabilizing effect of attention on movement dynamics (Temprado, Chardenon & Laurent, 2001; Zanone, Monno, Temprado & Laurent, 2001). This is also in keeping with several speech error studies which have shown that fewer errors occurred when speakers were instructed to attend to possible errors compared to when they were instructed not to attend to the accuracy of their speech (Postma & Kolk, 1990; Postma, Kolk & Povel, 1990). Reductions involve reduced activations of target articulators, which receive tactile information about the actual constriction.

The findings support the idea that, when no auditory information is available, speakers can rely more directly on other sensory modalities, such as proprioceptive and tactile information, to monitor changes in movement patterns (Alais, Newell & Mamassian, 2010; Lackner & Tuller, 1979; Namasivayam et al., 2009; Williamson, 1998). This may then result in more accurate speech movements, and can explain the somewhat larger differences when speakers make intrusions at a fast speaking rate. The role of proprioceptive and tactile information in monitoring errors is supported by Lackner & Tuller (1979), who found that speakers were less accurate in detecting vowel errors than consonantal errors in both masked and unmasked speech. This difference was explained by the fact that less proprioceptive information is available for vowels than for consonants, owing to the larger positional changes of the articulators in the latter (Lackner & Tuller, 1979).

In addition, these authors found fewer errors in the noise-masked condition when this condition followed the unmasked condition: when speakers were familiar with the speech sequences and possible errors, the greater reliance on movement-related feedback, such as proprioceptive information, when auditory information is unavailable allows for stronger coupling between neural oscillators and the articulators that they control (Namasivayam & Van Lieshout, 2011; Van Lieshout, 2004). This stronger coupling likely results in more precise articulations. The importance of auditory information in familiarizing speakers with speech sequences has also been observed by Corley et al. (2011). These researchers found that, although there was no main effect of masking, more phonemic errors were reported during masked speech in subsequent repetitions of the same tongue twister after an error occurred. In unmasked sessions, the speaker can employ auditory information to notice such an error and prevent it from occurring in subsequent repetitions. Such findings support the notion that auditory information is employed in detecting and correcting audible errors as well as in preventing them in subsequent repetitions. Thus, speakers might rely on the acoustic signal to correct audible intrusions and reductions only, to ensure that the acoustic outcome is still acceptable.

No evidence was found that speakers produce more intrusions when a larger difference between target and non-target articulations is available, or that they correct for such intrusions by exaggerating this difference to prevent the intrusions from becoming audible. The findings of the earlier study by Slis and Van Lieshout (under revision) are thus not confirmed. The earlier study used stimuli containing four different vowels. In that case, speakers may have been careful to preserve vowel quality when they had to differentiate tongue movements for all four vowel positions. This possibly resulted in larger differences between target and non-target movement ranges in case of intrusions. The current study only employed stimuli with two different vowels, one high front and one low back vowel, which provided a strong natural contrast in vowel identity. The chance that these two vowels overlap kinematically and perceptually is small, and their production does not need to be controlled as rigorously. This is consistent with Mooshammer & Geng (2008), who showed that articulatory movements of lax vowels were shifted upwards when reduced. It was suggested that speakers made extra effort to minimize coarticulation between neighboring segments to enhance syntagmatic contrasts and to avoid vowel reduction in stressed position (Mooshammer & Geng, 2008).

The findings support the theory that task-specific target constrictions in the form of gestures are the immediate goals of speech production (e.g., Fowler, 2007). It is suggested that speakers use auditory information to scan for audible intrusions and reductions in speech production but that, for control at the articulatory level, the relevant feedback resides at the gestural level, in the form of proprioceptive and tactile information (Saltzman et al., 1998). This suggestion is based, first of all, on the fact that reductions were not affected by auditory information: tactile feedback provided the speaker with the necessary information about whether a consonant constriction, i.e., the target articulator, was correct in both masked and unmasked speech. Secondly, intrusions affect the acoustic spectrum to some extent (Marin, Pouplier & Harrington, 2010; Pouplier & Goldstein, 2005). However, the presence of auditory information resulted in more intrusions than the lack of auditory information. One could speculate that, as long as the correct gesture can be retrieved from the acoustic output, the speaker allows intrusions to occur without considering the acoustic output (see also Pouplier & Goldstein, 2010). This is in line with Fowler and Saltzman (1993, p. 179), who suggested that "intergestural interference must either be eliminated or kept within tolerable (articulatory or perceptual) limits".

Lackner and Tuller (1979) suggested that so-called corollary-discharge information, which contains efferent information about the to-be-produced gesture, is compared with available sensory information such as proprioceptive or auditory information. If these two sources match, the produced gesture is considered correct. The current data show that coupling between gestures can be strengthened by a coupling between proprioceptive or tactile information and corollary-discharge information. In masked speech, this proprioceptive information thus stabilizes speech movement coordination, resulting in fewer intrusions. According to Alais et al. (2010), different sensory modalities work simultaneously and supplement each other when processing sensory information. In addition, certain sensory modalities are stronger than others, or are given more attention (Alais et al., 2010). One can argue that auditory information in speech production is important for externally monitoring and correcting speech and is thus given priority over other modalities. If many intrusions are not perceived as incorrect speech segments, as has been shown by Pouplier and Goldstein (2005), and tactile information from the target articulations is correct as well, auditory and tactile information match, and proprioceptive information might be ignored to a certain extent, resulting in more intrusions. It would be interesting in this regard to examine how much the speaker allows the kinematic signal to be distorted before the intrusions are corrected or inhibited, and at what level of intrusions and reductions these kinematic patterns become audible. The current study, however, did not differentiate smaller intrusions from larger ones, so it is not clear whether the larger, and potentially audible, intrusions would be suppressed in this case.

When auditory information is not available, the speaker automatically switches to the remaining modalities, such as proprioceptive and tactile information. In this case, tactile information confirms that the constriction for a target is correct. Proprioceptive information from the non-target tongue positions, which is now given higher priority because of the lack of auditory information, is entrained with the neural control system (or corollary-discharge information) that dictates that the non-target articulator should not be activated. This information thus suppresses intrusions more effectively, which may have led to the lower rate of intrusions in the masked condition. Thus, the task dynamic model can be improved by including several feedback loops, from the interarticulator and acoustic output levels back to the gestural level, accounting for the finding that speakers integrate multiple sensory modalities to stabilize speech coordination, as has been suggested by Saltzman et al. (1998).

It can be concluded that speakers may not strictly rely on auditory information to stabilize speech coordination but rather monitor externally perceptually salient events, including audible intrusions and reductions. However, when this information is not available, speakers may switch to a more gesture-based mode of control by increasing the gain of the feedback related to the actual articulatory movements, thus stabilizing existing modes of coordination and allowing fewer intrusions. Future work will need to investigate whether auditory information controls the degree of intrusions and reductions. The question is whether the larger, and potentially audible, intrusions would be suppressed when the speaker has auditory information available.

4.6 Acknowledgements

The study was supported by a Social Sciences and Humanities Research Council (SSHRC) grant and by funding from the Canada Research Chairs program, both awarded to the second author.

The authors would like to thank Mark Noseworthy for the voice recordings and James Le for his technical support during the EMA sessions, Radu Craioveanu for his help analyzing the data, and Jeffrey Steele and Keren Rice for their valuable comments on earlier versions.

4.7 References

Alais, D., Newell, F. N., & Mamassian, P. (2010). Multisensory processing in review: from

physiology to behaviour. Seeing and perceiving, 23(1), 3-38.

Beek, P. J., Peper, C. E., & Stegeman, D. (1995). Dynamcal models of movement coordination.

Human Movement Science, 14, 573-608.

176

Bogaerts, H., Buekers, M. J., Zaal, F. T., & Swinnen, S. P. (2003). When visuo-motor

incongruence aids motor performance: the effect of perceiving motion structures during

transformed visual feedback on bimanual coordination. Behavioural Brain Research, 138,

45-57.

Borden, G. (1979). An interpretation of research on feedback interruption in speech. Brain and

Language, 7, 307-319.

Chau, T., Young, S., & Redekop, S. (2005). Managing variability in the summary and

comparison of gait data. Journal of Neuroengineering and Rehabilitation, 2(1), 22.

Corley, M., Brocklehurst, P., & Moat, H. (2011). Error biases in inner and overt speech:

Evidence from tongue twisters. Journal of Experimental Psychology: Learning, Memory

and Cognition, 37(1), 162-175.

Dell, G. S. (1980). Phonological and lexical encoding in speech production: an analysis of

naturally occurring and experimentally elicited speech errors (Unpublished doctoral

dissertation), University of Toronto, Toronto.

Dromey, C., & Ramig, L. O. (1998). Intentional changes in sound pressure level and rate: their

impact on measures of respiration, phonation, and articulation. Journal of Speech,

Language, and Hearing Research, 41(5), 1003-1018.

Fink, P., Foo, P., Jirsa, V., & Kelso, J. A. S. (2000). Local and global stabilization of

coordination by sensory information. Experimental Brain Research, 134, 9-20.

Forrest, K., Abbas, P., & Zimmermann, G. (1986). Effects of white noise masking and low pass

filtering on speech kinematics. Journal of Speech and Hearing Research, 29(4), 549-562.

177

Fowler, C. A. (2007). Speech production. In M. G. Gaskell (Eds.), The Oxford Handbook of

Psycholinguistics (pp. 489-501). Oxford University Press.

Fowler, C. A., & Saltzman, E. L. (1993). Coordination and coarticulation in speech production.

Language and Speech, 36(2,3), 171-195.

Frisch, S. A., & Wright, R. (2002). The phonetics of phonological speech errors: An acoustic

analysis of slips of the tongue. Science, 30(2), 139-162.

Fromkin, V. (1971). The non-anomalous nature of anomalous utterances. Language: Journal of

the Linguistic Society of America, 47(1), 27-52.

Goffman, L., Gerken, L., & Lucchesci, J. (2007). Relations between segmental and motor

variability in prosodically complex nonword sequences. Journal of Speech, Language and

Hearing Research, 50(2), 444-458.

Goldrick, M., & Blumstein, S. E. (2006). Cascading activation from phonological planning to

articulatory processes: Evidence from tongue twisters. Language & Cognitive Processes,

21(6), 649-683.

Goldstein, L., Byrd, D., & Saltzman, E. L. (2006). The Role of Vocal tract gestural action units

in understanding the evolution of phonology. In Michael Arbib (Eds.), From Action to

Language: The Mirror Neuron System (pp. 215-249). Cambridge university press.

Goldstein, L., & Fowler, C. A. (2003). Articulatory Phonology: A Phonology for Public

Language Use. In N. Schiller & A. Meyer (Eds.), Phonology and Phonetics, Vol. 6 (pp.

159-207). Berlin, Germany: Mouton de Gruyter.

178

Goldstein, L., Pouplier, M., Chen, L., Saltzman, E. L., & Byrd, D. (2007). Dynamic action units

slip in speech production errors. Cognition, 103(3), 386-412.

Haken, H., Kelso, J. A. S., & Bunz, H. (1985). A theoretical model of phase transitions in

human hand movements. Biological Cybernetics, 51, 347-356.

Henriques, R. N., & Van Lieshout, P. H. H. M. (2013). A comparison of methods for decoupling tongue and lower lip from jaw movements in 3D Articulography. Journal of Speech,

Language and Hearing Research, 56(5), 1503-1516.

Huber, J., & Chandrasekaran, B. (2006). Effects of increasing sound pressure level on lip and

jaw movement parameters and consistency in young adults. Journal of Speech, Language

and Hearing Research, 49, 1368-1379.

Jones, J., & Munhall, K. (2003). Learning to produce speech with an altered vocal tract: The

role of auditory feedback. Journal of the Acoustical Society of America, 113(1), 532-543.

Kelso, J.A.S. (1995). Dynamic Patterns. MIT Press.

Kelso, J. A. S., Tuller, B., Vatikiotis-Bateson, E., & Fowler, C. A. (1984). Functionally specific

articulatory cooperation following jaw perturbations during speech: Evidence for

coordinative structures. Journal of Experimental Psychology: Human Perception and

Performance, 10, 812-832.

Kent, R. (1996). Hearing and believing: Some limits to the auditory-perceptual assessment of

speech and voice disorders. American Journal of Speech Language Pathology, 5(3), 7-23.

Kroos, C. (2012). Evaluation of the measurement precision in three-dimensional

Electromagnetic Articulography (Carstens AG500). Journal of Phonetics, 40, 453-465

179

Lackner, J., & Tuller, B. (1979). Role of efference monitorng in the detection of self-produced

speech errors. In M. Cortese & C. Edward (Eds.), Sentence Processing: Psycholinguistic

Studies Presented to Merrill Garrett. (pp. 281-294). Hillsdale, NJ: Lawrence Erlbaum

Associates.

Lagarde, J., & Kelso, J. A. S. (2006). Binding movement, sound and touch: multimodal

coordination dynamics. Experimental Brain Research, 173, 673-688.

Lane, H., Denny, M., Guenther, F., Matthies, M., Menard, L., & Perkell, J., et al. (2005). Effects

of biteblocks and hearing status on vowel production. Journal of the Acoustical Society of

America, 118(3), 1636-1646.

Marin, S., Pouplier, M., & Harrington, J. (2010). Acoustic consequences of articulatory

variability during productions of /t/ and /k/ and its implications for speech error research.

Journal of the Acoustical Society of America, 127(1), 445-461.

McMillan, C. T., & Corley, M. (2010). Cascading influences on the production of speech:

Evidence from articulation. Cognition, 117, 243-260.

Mooshammer, C., & Geng, C. (2008). Acoustic and articulatory manifestations of vowel

reduction in German. Journal of the International Phonetic Association, 38(2), 117-136.

Mowrey, R., & MacKay, I. (1990). Phonological primitives: Electromyographic speech error

evidence. Journal of the Acoustical Society of America, 88(3), 1299-1312.

Namasivayam, A. K., & Van Lieshout, P. (2011). Speech motor skill and stuttering. Journal of

Motor Behavior, 43(6), 477-489

180

Namasivayam, A. K., Van Lieshout, P., McIlroy, W., & de Nil, L. (2009). Sensory feedback

dependence hypothesis in persons who stutter. Human Movement Science, 28, 688-707.

Neilson, M., & Neilson, P. (1987). Speech motor control and stuttering: a computational model

of adaptive sensory-motor processing. Speech Communications, 6, 325-333.

Ostry, D., Cooke, J., & Munhall, K. (1987). Velocity curves of human arm and speech

movements. Experimental Brain research, 68, 37-46.

Peper, C. E., & Beek, P. J. (1998). Distinguishing between the effects of frequnecy and amplitue

on interlimb coupling in tapping a 2:3 polyrhythm. Experimental Brain research, 118, 78-

92.

Peper, C. E., & Beek, P. J. (1999). Modeling rhythmic interlimb coordination: The roles of

movement amplitude and time delays. Human Movement Science, 18, 263-280.

Peper, C. E., Beek, P. J., & Van Wieringen, P. C. W. (1995a). Coupling strength in tapping a 2:3

polyrhythm. Human Movement Science, 14, 217-245.

Peper, C. E., Beek, P. J., & Van Wieringen, P. C. W. (1995b). Multifrequency coordination in

bimanual tapping: Asymmetrical coupling and signs of supercriticality. Journal of

Experimental Psychology, 21(5), 1117-1138.

Postma, A. (2000). Detection of errors during speech production: A review of speech

monitoring models. Cognition, 77, 97-131.

Postma, A., & Kolk, H. (1990). Speech errors, disfluencies, and self-repairs of stutterers in two

accuracy conditions. Journal of Fluency Disorders, 15(5-6), 291-303.

181

Postma, A., & Kolk, H. (1992). The effects of noise masking and required accuracy on speech

errors, disfluencies, and self-repairs. Journal of Speech and Hearing Research, 35(3), 537-

544.

Postma, A., Kolk, H., & Povel, D. J. (1990). On The Relation among Speech Errors,

Disfluencies, and Self-Repairs. Language and Speech, 33(1), 19-29.

Postma, A., Kolk, H., & Povel, D. J. (1991). Disfluencies as Resulting from Covert Self-Repairs

applied to Internal Speech Errors. In H. F. M. Peters, W. Hulstijn & C. W. Starkweather

(Eds.), Speech Motor Control and Stuttering (pp. 141-147). Elsevier Science Publishers

B.V.

Postma, A., & Noordanus, C. (1996). Production and detection of speech errors in silent,

mouthed, noise-masked, and normal auditory feedback speech. Language and Speech, 39,

375-392.

Pouplier, M. (2008). The role of a coda consonant as error trigger in repetition tasks. Journal of

Phonetics, 36, 114-140.

Pouplier, M., & Goldstein, L. (2005). Asymmetries in the perception of speech production

errors. Science, 33(1), 47-75.

Pouplier, M., & Goldstein, L. (2010). Intention in articulation: Articulatory timing in alternating

consonant sequences and its implications for models of speech production. Language and

Cognitive Processes, 25(5), 616-649.

Repp, B. (2005). Sensory synchronization: A review of the tapping literature. Psychonomic

Bulletin and Review, 12(6), 969-992.


Roerdink, M., Peper, C. E., & Beek, P. J. (2005). Effects of correct and transformed visual

feedback on rhythmic visuo-motor tracking: Tracking performance and visual search

behavior. Human Movement Science, 24, 379-402.

Saltzman, E. L. (1993). Dynamics and coordinate systems in skilled sensorimotor activity.

Haskins Laboratories Status Report on Speech Research (pp. 1-15).

Saltzman, E. L., Löfqvist, A., Kay, B., Kinsella-Shaw, J., & Rubin, P. (1998). Dynamics of

intergestural timing: A perturbation study of lip-larynx coordination. Experimental Brain

Research, 123(4), 412-424.

Slis, A., & Van Lieshout, P. (under revision). The effect of phonetic context on the dynamics of

intrusions and reductions.

Slis, A., & Van Lieshout, P. (2013). The effect of phonetic context on speech movements in

repetitive speech. Journal of the Acoustical Society of America, 134(6), 4496-4507.

Smith, A., Goffman, L., Zelaznik, H., Ying, G., & McGillem, C. (1995). Spatiotemporal

stability and patterning of speech movement sequences. Experimental Brain Research,

104, 493-501.

Temprado, J., Chardenon, A., & Laurent, M. (2001). Interplay of biomechanical and

neuromuscular constraints on pattern stability and attentional demands in a bimanual

coordination task in human subjects. Neuroscience Letters, 303, 127-131.

Turvey, M. (1990). Coordination. American Psychologist, 45(8), 938-953.


Van Lieshout, P. (2004). Dynamical systems theory and its application in speech. In B. Maassen,

R. Kent, H. Peters, P. Van Lieshout & W. Hulstijn (Eds.), Speech Motor Control in

Normal and Disordered Speech (pp. 51-82). Oxford University Press.

Van Lieshout, P., Merrick, G., & Goldstein, L. (2008). An articulatory phonology perspective

on rhotic articulation problems: a descriptive case study. Asia Pacific Journal of Speech,

Language, and Hearing, 11(4), 283-303.

Van Lieshout, P., & Moussa, M. (2000). The assessment of speech motor behavior using

electromagnetic articulography. The Phonetician, 81, 9-22.

Westbury, J. (1994). On coordinate systems and the representation of articulatory movements.

Journal of the Acoustical Society of America, 95, 2271-2273.

Williamson, M. M. (1998). Neural control of rhythmic arm movements. Neural Networks, 11(7-

8), 1379-1394.

Yunusova, Y., Green, J., & Mefferd, A. (2009). Accuracy Assessment for AG500,

Electromagnetic Articulograph. Journal of Speech, Language and Hearing Research,

52(2), 547-555.

Zanone, P., Monno, A., Temprado, J., & Laurent, M. (2001). Shared dynamics of attentional

cost and pattern stability. Human Movement Science, 20(6), 765-789.

Zierdt, A., Hoole, P., & Tillman, H. G. (1999). Development of a system for three-dimensional

fleshpoint measurements of speech movements. In J. J. Ohala, Y. Hasegawa, M. Ohala,

D. Granville, & A. C. Bailey (Eds.), Proceedings of the XIVth International Congress of

Phonetic Sciences, Vol. 1 (pp. 73-76). Berkeley: University of California.

Chapter 5


5.1 Introduction

The series of studies that form this dissertation built on a new understanding regarding the nature of speech errors resulting from general principles of movement coordination (Goldstein,

Pouplier, Chen, Saltzman & Byrd, 2007; Pouplier, 2003; 2007; 2008). The data collected in the current series of experiments confirm and extend findings from recent studies which reveal that certain speech errors arise because of general constraints that govern movement coordination.

These constraints stem from underlying autonomous mechanisms that stabilize movement coordination (Goldstein et al., 2007; Peper & Beek, 1998; Peper & Beek, 1999; Pouplier, 2003;

Pouplier, 2008). In the present studies, errors took the form of intrusions and reductions of articulatory movements during the repetition of word pairs like cop top. Intrusions involved the activation of articulators that were expected to remain inactive during onset consonants (i.e., non-target articulators, which form the onset consonant of the alternating word). In contrast, reductions involved the reduced activation of the intended movement of target articulators. This dissertation has investigated how co-articulatory phenomena, introduced by phonetic context and speaking rate, as well as auditory information, influence the occurrence of intrusions and reductions.

5.2 Summary of the findings

The three main questions addressed in this dissertation were:

1. How do different types of word pairs (i.e., cop top versus top top) and phonetic

contexts (i.e., different consonant-vowel combinations such as /ɑ-ɑ/ in cop top versus

/ɪ-ɪ/ in kip tip) affect articulatory movements in repetitive speech (Study 1)?


2. How do articulatory movement characteristics and phonetic context determine

the occurrence of intrusions and reductions in word pairs with alternating onsets (Study

2)?

3. Does the presence of auditory information influence the occurrence of intrusions

and reductions in word pairs with alternating onsets (Study 3)?

The data revealed that word pairs with identical onset consonants (e.g., cop cop) and alternating onset consonants (e.g., cop top) are characterized by different co-articulatory properties (Study

1). Phonetic context influenced coordination dynamics and the occurrence of intrusions: the tongue dorsum intruded more in a front vowel context (Study 2). Finally, contrary to what was predicted, lack of auditory information resulted in fewer intrusions than presence of auditory information (Study 3). The findings of these three studies will be reviewed in the next paragraphs.

5.2.1 Study 1

The challenge for kinematic studies on intrusions and reductions such as those reported in this dissertation is to determine what part of articulatory variability in word pairs with alternating onset consonants can actually be classified as intrusions and reductions. Compared to word pairs with identical onset consonants and rhyme (e.g., top top), word pairs with alternating onset consonants and identical rhyme (e.g., top cop) are prone to intrusions and reductions due to the more unstable 1:2 frequency ratio (one tongue tip/ tongue dorsum movement to two lower lip movements). However, besides the tendency to stabilize movement, other factors might contribute to articulatory variability resulting in differences between these word pairs. The first study of this dissertation investigated two factors that were hypothesized to affect the variability

of the articulatory movements of target and non-target articulators (e.g., tongue dorsum and tip respectively during onset cop). The first factor was “type of word pair”: word pairs were characterized either by identical or alternating onset consonants followed by an identical rhyme. The second factor was “phonetic context”, which involved different vowels (/ɑ/, /æ/, /ɪ/, and /u/) and coda#onset combinations (/p#t/, /t#p/, /k#p/, /p#k/, /k#t/, and /t#k/). Position within a word pair (first, second) and speaking rate (normal, fast) were additional independent variables. Three dependent variables were evaluated:

1. Median movement range (mm) of articulators for all repetitions within a trial;

2. A coefficient of variance (Median Absolute Deviation/ median movement range

of all repetitions within a trial), which expressed movement variability of an

articulator;

3. Mean correlation values between a target articulator and a simultaneous non-

target articulator.

The first two dependent variables are measures of variability; the third was included to investigate whether biomechanical constraints related to type of word pair and speaking rate contribute to possible differences in the variability of movement patterns.
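For concreteness, the three trial-level measures can be sketched in code. This is a minimal illustration with invented movement-range values; the function and variable names are hypothetical and do not come from the original analysis scripts.

```python
from statistics import median
from math import sqrt

def mad(values):
    """Raw (unscaled) median absolute deviation."""
    m = median(values)
    return median(abs(v - m) for v in values)

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def trial_measures(target_ranges, nontarget_ranges):
    """Sketch of the three per-trial dependent variables:
    1. median movement range of the target articulator (mm);
    2. a MAD-based coefficient of variance (MAD / median movement range);
    3. the correlation between target and simultaneous non-target movements."""
    med = median(target_ranges)
    return med, mad(target_ranges) / med, pearson(target_ranges, nontarget_ranges)

# Hypothetical movement ranges (mm) for the repetitions within one trial.
m, cv, r = trial_measures([10, 11, 9, 10, 12], [3, 4, 2, 3, 5])
```

Because the second (non-target) series here is just the first shifted down by 7 mm, the correlation measure is at its ceiling; real articulator traces would of course yield intermediate values.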

1. Manipulating the independent variable “Type of word pair” did not result in differences in variability measures of individual articulators, nor did speaking rate. However, compared to movements of non-target articulators in the first word in identical onset word pairs, those during the onset of the second word showed higher variability; the alternating onset word pairs did not show this difference in word position (i.e., first or second word). It was speculated that the interactions between “type of word pair” and “word position” occurred because the position of the articulator in both target and non-target positions required more precise control in word pairs

with alternating onsets compared to the identical onset word pairs where such a contrast does not exist. This resulted in similar degrees of variability in the first and second word in alternating onset sequences. In the identical onset sequences, the non-target articulator likely varied as an unintended consequence of the simultaneous movements of the target articulators and only became active when adopting its position for the vowel. These speculations were supported by the finding that word pairs with identical onsets were characterized by higher correlation values between target and non-target articulations, compared to the alternating onset condition. The first word in word pairs with identical onsets was characterized by less variability than the first word in alternating onset word pairs. This finding was proposed to arise from dissimilar degrees of initial strengthening at different prosodic boundaries (see Study 2, discussion).

2. An additional part of variability in speech could be attributed to differences in articulatory movement ranges: the target and non-target median movement ranges of the lingual articulators were larger in alternating onset word pairs than in identical onset word pairs. These findings confirmed what had been observed by Stearns (2006) related to positions of the tongue dorsum. For some participants in her study, the mean tongue dorsum movements during tongue tip constrictions in alternating onset sequences were higher than the mean amplitudes of the non-target tongue dorsum in non-alternating onset sequences.

In our study, only the tongue tip showed an effect of vowel type in the different word pairs: the high vowels /ɪ/ and /u/ differed from each other in the alternating trials but this difference disappeared in the identical onset trials. Furthermore, a coda#onset effect was found only for the tongue tip: coda /k/ showed larger movement ranges than coda /p/ in the context of back vowels.

It was suggested that these results revealed speaker specific strategies, such as hyper-articulation

in order to enhance differences between word pairs (De Jong, Beckman & Edwards, 1993;

Lindblom, 1990). However, the larger movement ranges in alternating onset trials may also reflect a tendency to stabilize articulatory movements: by exaggerating articulatory movement patterns, movement coordination is stabilized through stronger kinesthetic feedback entrainment with the neural control system (Namasivayam, Van Lieshout, McIlroy & de Nil,

2009; Namasivayam & Van Lieshout, 2011; Ridderikhoff, Peper & Beek, 2007; Van Lieshout,

Hulstijn & Peters, 2004; Williamson, 1998). I will elaborate more on this mechanism in 5.3.2.2.

In contrast, in the first study the lower lip behaved differently with respect to the movement range measures in the two different types of word pairs: no differences were found in these measures between word pairs with alternating and identical onsets. The first study addressed this by suggesting that the lower lip behaved more independently from the jaw than the tongue articulators (Van Lieshout & Neufeld, 2014), and is thus not affected by target overshoot to the same extent. If it is assumed that the enlarged difference measures resulted from stabilizing coordination, one can argue, based on the findings of the first study, that the lower lip behaves more stably than the lingual articulators. Finally, the factor "rate" did not affect movement ranges.

3. The correlation between target articulator and simultaneous non-target articulator, however, was strongly influenced by rate: higher rates resulted in higher correlation values. In addition, word pairs with identical onsets were characterized by higher correlation values between target and non-target articulations, compared to the alternating onset condition.

This first study clearly shows that variability in word pairs with alternating and non-alternating onset consonants differs to such an extent that confounds might be introduced in studies that

take non-alternating onset word pairs as a control. The second study took these findings into account when defining intrusions and reductions.

5.2.2 Study 2

The second study focused on the occurrence of intrusions and reductions in word pairs with alternating onset consonants and identical codas. The study tested the following predictions:

1) Based on the results from the studies by Pouplier and colleagues (Goldstein et al.,

2007; Pouplier, 2003; Pouplier, 2008), more intrusions and reductions were expected at

the end of a trial, especially at a fast speaking rate.

2) Different combinations of vowels and consonants (e.g., pack tack or pit kit)

influence patterns of intrusions and reductions.

3) The intrusions and reduction patterns are related to the relative difference in

position of a given articulator in non-target or target position.

The independent variables were “word position” (first, second word), “part of trial” (first 5 repetitions, second 5 repetitions and third 5 repetitions), “speaking rate” (normal, fast), “type of articulator” (tongue tip, tongue dorsum, lower lip), “type of vowel” (/ɑ/, /ɪ/, /æ/, /u/) and “type of coda#onset” (e.g., p#t and t#p in the word pairs cop top and cot pot). The dependent variable was the ratio of intrusions or the ratio of reductions. These intrusions and reductions were calculated as follows:

• Intrusion: a movement range value of at least 2 median absolute deviations

(MAD) above the median value of a non-target articulator (based on all repetitions


within a trial). The ratio of intrusions was calculated by dividing the number of

intrusions during a trial part by the number of repetitions during that particular trial part.

• Reduction: a movement range value of at least 2 MADs below the median value

of a target articulator (based on all repetitions within a trial). The ratio of reductions was

calculated by dividing the number of reductions during a trial part by the number of

repetitions during that particular trial part.
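The two definitions above can be sketched as follows. This is a simplified illustration with invented numbers; it assumes equal-sized trial parts (e.g., three parts of five repetitions each) and hypothetical function names, not the original analysis code.

```python
from statistics import median

def mad(values):
    """Raw (unscaled) median absolute deviation."""
    m = median(values)
    return median(abs(v - m) for v in values)

def intrusion_reduction_ratios(nontarget, target, n_parts=3):
    """Sketch of the classification: an intrusion is a non-target movement
    range more than 2 MADs above the trial median; a reduction is a target
    movement range more than 2 MADs below the trial median. Each ratio is
    the number of classified repetitions divided by the repetitions in
    that trial part."""
    hi = median(nontarget) + 2 * mad(nontarget)   # intrusion threshold
    lo = median(target) - 2 * mad(target)         # reduction threshold
    intrusions = [v > hi for v in nontarget]
    reductions = [v < lo for v in target]
    size = len(nontarget) // n_parts              # assume equal-sized parts
    parts = [slice(i * size, (i + 1) * size) for i in range(n_parts)]
    intr = [sum(intrusions[p]) / size for p in parts]
    red = [sum(reductions[p]) / size for p in parts]
    return intr, red

# Hypothetical per-repetition movement ranges for one 15-repetition trial.
nontarget = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 9, 2]   # one clear outlier (9)
target    = [4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 0, 4, 5]   # one clear outlier (0)
intr, red = intrusion_reduction_ratios(nontarget, target)
```

In this toy trial, both outliers fall in the final trial part, so only the third ratio is non-zero, mimicking the build-up of errors toward the end of a trial.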

1) The findings corroborated the results from Pouplier and colleagues (2003; 2007; 2008), who considered intrusions and reductions to result from stabilizing movement coordination.

Whereas reductions involved slightly higher ratios at the fast compared to the normal speaking rate, the expected positive relation between faster speech and number of intrusions (Goldstein et al., 2007) was not observed. Speaking rate did affect intrusions in the sense that, in the first part of a trial, fewer intrusions were found in the fast rate than in the normal rate. At the end of the trial, this difference disappeared, indicating a steeper increase in number of intrusions. Several explanations were suggested for the divergent results related to speaking rate. First of all, in contrast to earlier studies (Goldstein et al., 2007; MacKay, 1971), the current study always used the normal rate first, followed by the fast rate. This could have introduced a practice effect (see e.g., Dell, Burger & Svec, 1997; Kelso & Zanone, 2002; Namasivayam & Van Lieshout, 2008) transferring skills acquired in the normal speaking rate condition to the fast rate condition. A second factor that may have affected outcomes is that, in the current study, the rate of speech was determined individually for each speaker. As such, the task demands were more aligned with individuals’ speech motor skills. Finally, there existed differences in the analysis techniques used across the studies. Instead of comparing means and standard deviations derived from alternating and non-alternating sequences (Goldstein et al., 2007), the current study took

the normalized median and MAD values for each separate trial of alternating onset word pairs to determine intrusion and reduction patterns without referring to a control, such as word pairs with identical onsets. The latter procedure was chosen based on the results of Study 1, which revealed that word pairs with alternating and identical onsets differ with respect to articulatory variability due to underlying constraints.

2) In addition, the second study showed that co-articulatory constraints influence the number of intrusions, confirming our hypothesis: Compared to the tongue tip and lower lip, the non-target tongue dorsum intruded more frequently in a front vowel context. This result paralleled findings in Pouplier (2008), who observed a bias for the tongue dorsum to intrude more frequently than the tongue tip, and in Goldstein et al. (2007), who observed that a following high front vowel resulted in more intrusions of the lingual articulators. However, instead of the hypothesized low-high distinction, the current study revealed a front-back distinction.

3) While difference measures (i.e., the difference between a relative target movement range and a non-target movement range of the same articulator in the alternating word) did not predict the number of reductions, these measures were moderately positively related to the number of intrusions (R2 = 0.3): the larger the relative difference between target and non-target positions within a word pair (e.g., the movement range difference of the tongue dorsum in cop versus top), the more intrusions were found. However, observing the values for each individual articulator resulted in high values for the tongue dorsum (R2 = 0.75); the tongue tip showed a positive trend of R2 = 0.54. Collapsing the articulators thus cancelled out effects that were large when observed independently. I concluded that there is a tendency for the lingual articulators to clearly differentiate between reaching a phonetic constriction goal versus suppressing activation

when not forming part of a gestural constriction task. In line with this idea, I suggested that, when such an optimal use of constriction space was available, speakers allowed for more intrusions of the tongue dorsum in those cases where situational demands, such as the need for coordination stability, required adding extra gestures. Another possibility put forward was that speakers compensated for more intrusions of the tongue dorsum in a front vowel context for acoustic reasons: a speaker may have tried to adjust his/her starting position during non-target productions in order to prevent intrusions from becoming audible. The use of such auditory criteria might have prevented intrusions from being perceived as errors. Both these assumptions, however, were invalidated in the third study, which we discuss next. The third study looked at the role of available auditory information on the stability of movement coordination and on maintaining a correct acoustic output.
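The observation that collapsing across articulators cancelled out effects that were large individually can be illustrated with a toy example: if each articulator shows a near-perfect linear relation between difference measure and intrusion count but at a different baseline, the pooled R2 can be far lower than either per-articulator value. The numbers below are invented for illustration only and are not the thesis data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation (coefficient of determination
    for a simple linear fit)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

# Toy data: identical difference measures, but "articulator B" produces
# systematically more intrusions than "articulator A".
x_a, y_a = [1, 2, 3, 4], [1, 2, 3, 4]
x_b, y_b = [1, 2, 3, 4], [11, 12, 13, 14]

r2_a = r_squared(x_a, y_a)                    # perfect within articulator A
r2_b = r_squared(x_b, y_b)                    # perfect within articulator B
r2_pooled = r_squared(x_a + x_b, y_a + y_b)   # much weaker when pooled
```

Here both within-articulator fits are perfect, yet the pooled value drops to roughly 0.05 because the two groups sit on different intercepts, which is the same pooling effect described above in exaggerated form.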

5.2.3 Study 3

The objectives of the third study were

1. To explore how the presence of auditory information affects the relative number of

intrusions and reductions.

2. To explore whether movement range difference measures predict intrusions and

reductions in a similar way as in the second study, and whether speakers use

auditory information to adjust their movement ranges.

The independent variables “speaking rate”, “part of trial”, and “masking” (masked by noise or not) were evaluated. The dependent variable was the ratio of intrusions or reductions, as defined in the second study. The effect of trial part on the ratio of intrusions and reductions, previously revealed in Goldstein et al. (2007) and the study described in chapter 2, was

confirmed in the current study, as was the finding that the fast speaking rate resulted in more intrusions than the normal rate; reductions were not affected by rate.

1. The hypothesis that the presence of auditory information strengthens the coupling in speech movement coordination resulting in fewer intrusions and reductions was not confirmed.

Although the ratio of reductions was not affected by the availability of auditory information, speakers made more intrusions when auditory information was available. To explain this unexpected result, I argued that speakers were able to monitor their speech articulations more closely when no auditory information was available, supporting the notion of a stabilizing role for attention in movement dynamics (Temprado, Chardenon & Laurent, 2001; Treffner & Peter,

2002; Zanone, Monno, Temprado & Laurent, 2001). In addition, the fact that, in the absence of auditory information, fewer intrusions were made also supports the idea that other sensory modalities, such as proprioceptive and tactile, are relevant for stabilizing articulatory coupling

(see also Alais, Newell & Mamassian, 2010; Namasivayam, et al., 2009; Williamson, 1998). I will elaborate on this proposal when discussing the theoretical contributions of my research

(5.3.2).

2. The findings also did not confirm the perceptually based explanation that speakers enlarged their vocal tract space to “mask” these intrusions acoustically: the analysis in the masking study showed that, when no auditory information was available, the relationship between movement range difference measures and intrusions was only somewhat positive at a fast speaking rate. This invalidated the suggestion that compensating for intrusions by enlarging vocal tract space has an underlying perceptual explanation, or that speakers allow more intrusions because of the larger vocal tract space. I speculate that the difference between the second and third studies in results regarding the link between difference values and intrusions

stems from the fact that the second study used stimuli containing four different vowels, whereas the stimuli in the third study involved only two different vowels. As the onset of the vowel starts during the production of the onset consonant, intrusions could have distorted the quality of the following vowel. For example, an intrusion of the tongue dorsum results in a higher position of this articulator, and a vowel like /æ/ might be produced closer to /ɪ/. Speakers in the second study may have been more careful in preserving vowel quality when they had to differentiate lingual movements for four vowel positions, thereby exaggerating their articulations and consequently making the vowel space larger. This would result in the larger differences between target and non-target movement ranges reported in the findings for the second study. The stimuli for the third study involved only one high front and one low back vowel, which provided a strong natural contrast in vowel identity. That these two vowels would overlap kinematically and perceptually is very unlikely and, as such, their production does not need to be controlled as rigorously. This is consistent with Mooshammer and Geng (2008), who suggested that speakers make extra effort to minimize coarticulation between neighboring segments in order to enhance syntagmatic contrasts. In their study, articulatory movements of unstressed lax vowels were shifted upwards. It was argued that, compared to articulating unstressed lax vowels, more articulatory effort was used when producing stressed vowels, resulting in more contrast and avoiding vowel reduction. The small relation at a fast speaking rate between difference measures and intrusions in the masked condition can be explained as resulting from more accurate speech and the tendency to stabilize movement coordination.

How the three studies contribute to the field of speech production research is discussed next in three separate parts. First, empirical contributions are listed. This part is followed by a section on theoretical contributions, in which the results are interpreted in light of the model of Articulatory

Phonology and Task Dynamics. Finally, several methodological contributions will be highlighted.

5.3 Contributions

5.3.1 Empirical contributions

The three studies all contribute to the existing literature on variability in repetitive speech. The first study convincingly showed that word pairs which shared their onsets and word pairs with different onsets differed in how they were produced. This finding indicates one should be careful with interpretations of results obtained from experimental stimuli which are compared with a different set of control stimuli. The second study confirmed already existing observations that intrusions and reductions result from dynamic principles that stabilize speech coordination

(Goldstein et al., 2007; Pouplier, 2007; 2008). The present study added new evidence to the speech error literature, namely that phonetic context affects the number of intrusions. In contrast to the previously observed bias for the tongue articulators to intrude in the context of a high vowel (Goldstein et al., 2007), the second study showed a vowel asymmetry involving a front-back bias as opposed to a high-low bias for the non-target tongue dorsum. This tongue dorsum bias has been observed in an ultrasound study by Pouplier (2008). However, this latter study was not designed to examine the specific effect of articulator and context type on the occurrence of intrusions. Finally, the third study showed that masking speech does not always have a detrimental effect on movement coordination. Rather, auditory information seems to influence movement coordination less clearly than other information, such as proprioceptive or tactile information.


As was indicated in the introduction of this dissertation, articulatory movement coordination can affect speech production to such an extent that patterns eventually may surface in the form of audible errors in natural language. The asymmetries related to specific articulators found in the current study resemble similar asymmetries found in traditional error studies. For example,

Stemberger (1991) mentioned a tendency in natural databases for a dorsal consonant to replace an alveolar consonant more frequently than vice versa. The bias for a velar to replace a labial was confirmed in a subsequent experiment (Stemberger, 1991). These two findings resemble the findings in my study. It can thus be inferred that the asymmetries found in perceptual and acoustic studies (Marin, Pouplier & Harrington, 2010; Pouplier & Goldstein, 2005) have an articulatory basis as well. Marin, Pouplier and Harrington (2010) looked at how gradual and full intrusion of tongue dorsum and tongue tip affected the acoustic output of a target segment. The authors found that, compared to an intruding tongue tip, an intruding tongue dorsum influenced the resulting spectral shape the most. Pouplier and Goldstein (2005) confirmed that an intended

/t/ with an intruding tongue dorsum was identified less consistently than an intended /k/ with an intruding tongue tip.

5.3.2 Theoretical contributions: Articulatory phonology and Task

Dynamics

5.3.2.1 Co-articulatory constraints

Articulatory Phonology (Browman & Goldstein, 1989; Browman & Goldstein, 1990) and Task

Dynamics (Saltzman & Byrd, 2000; Saltzman & Munhall, 1989) provide a framework for interpreting the results from the series of studies reported in this thesis. The increasing numbers of intrusions and reductions at the end of trials, especially at a higher speaking rate, have been

explained within this framework as reflecting a tendency to strengthen gestural coupling

(Goldstein et al., 2007; Pouplier & Goldstein, 2013). The hypothesis formulated in this dissertation was that gestural coupling strength was affected by coarticulatory constraints in such a way that divergent patterns of intrusions and reductions would result. Gestures differ in their degree of co-articulatory resistance and aggressiveness, affecting entrainment differently.

In this line of reasoning, it was predicted that the number of intrusions of the tongue articulators

(tongue tip as well as tongue dorsum) during a bilabial closure would differ from the number of intrusions of the tongue dorsum during a tongue tip constriction and vice versa. This prediction was based on the fact that tongue gestures are more resistant to coarticulation than bilabial gestures due to (bio-) mechanical constraints. The findings revealed that the tongue dorsum intruded more frequently during a bilabial closure than vice versa. However, the tongue tip did not intrude more frequently during a bilabial closure.

I suggest an alternative explanation for the bias for the tongue dorsum to intrude, which fits well within AP. The tongue dorsum is the main articulator engaged in producing velar constrictions as well as in producing vowels. In addition, onset consonants and following vowels are coupled in phase with each other (Goldstein, Byrd & Saltzman, 2006; Nam, Goldstein & Saltzman,

2009). In the case of a non-target tongue dorsum position during a labial or tongue tip constriction (and consequently a higher jaw position), contradictory requirements are imposed on the tongue dorsum. Whereas this articulator should not be activated for the ongoing alveolar/labial consonant production, it is exposed to coupling forces that stabilize the coordination pattern and, at the same time, it is required to be activated to produce the following vowel. The latter requirement in particular will make it more likely for the tongue dorsum to be co-activated during consonant production. Studies have shown that activating a limb results in more stable coordination between limb movements (Ridderikhoff, Peper & Beek, 2007). In the

context of speech, the tongue dorsum as a vowel articulator will become activated at the same time as the target consonant articulator, and this activation, which surfaces in the form of an intrusion, is reinforced by the fact that it also supports coordination stability.

The finding that a low vowel context resulted in more intrusions overall can possibly be explained by an entraining jaw: the jaw is the main articulator controlling the height of a vowel (Keating, Lindblom, Lubker, & Kreiman, 1994) and is thus in a lower position for low vowels. If it is especially the jaw that entrains, resulting in intrusions, the jaw will move to a higher position. Consequently, it is suggested that the tongue dorsum position for a following lower vowel is affected more than that for a following higher vowel, resulting in more measured intrusions in a low vowel context.

Why the tongue dorsum intrudes more frequently in a front vowel context than the other articulators is harder to explain. Physical properties of the individual articulators have to be considered here. Tongue dorsum intrusions occur in p#t and t#p contexts in words such as cop#top and cot#pot. Both p#t and t#p are likely realized with a relatively high jaw position. In order to prevent intrusions, the tongue dorsum has to be able to assume a low position. During an alveolar or bilabial closure, it is likely easier to accomplish a low position of the tongue dorsum in a back vowel context, because the point closest to the condyle is affected the least by an elevated jaw (Mooshammer, Hoole & Geumann, 2007). A fronted tongue dorsum, however, has more limited space to move and is more constrained biomechanically (Farnetani &

Recasens, 1999) resulting in a less flexible articulatory state to correct for intrusions. In this regard, Iskarous, Fowler, and Whalen (2010) showed that back vowels exhibited more motion than front vowels during velar and alveolar constrictions. In addition, they showed that a constriction of the tongue tip involves a frontward and upward movement of the tongue back to

assist the tongue tip in constricting the vocal tract. They suggested that this synergy of the tongue body and tongue tip in forming a tongue tip constriction makes the tongue body more resistant to coarticulatory influences. The more forward and higher the tongue body is, the more resistant and the more aggressive the tongue dorsum is. This can explain the bias for tongue dorsum intrusions in a front vowel context: the articulatory constraint on the tongue body, in addition to the limited motion of the tongue dorsum in a front vowel context, makes it harder to prevent intrusions. Thus, coarticulatory resistance and aggressiveness of the tongue dorsum in a front vowel context result in more intrusions of this articulator. Adding this constraining factor to the positive reinforcement of efferent information for tongue dorsum activations in general, which are planned for the vowel, results in more intrusions of this articulator in a front vowel context. The results thus show that coarticulatory resistance and aggressiveness, as defined in

Articulatory Phonology, can explain the asymmetries in intrusions regarding the tongue dorsum in front vowel context.

The only model besides AP and Task Dynamics that can currently explain the gradual nature of errors is the cascading model, mentioned in the introduction and throughout the following chapters (Goldrick & Blumstein, 2006; McMillan & Corley, 2010; Pouplier & Goldstein, 2010). This model explains gradual errors as resulting from several partially activated phonological representations that cascade down to the level at which they are executed, leaving so-called phonetic "traces" when the resulting speech segment is produced. However, whereas this model can explain intrusions as a result of competing phonological representations, it fails to account for the findings that speaking rate affected the number of errors and that errors built up over the course of a trial (for a discussion, see Goldrick & Chu, 2013; Pouplier & Goldstein, 2013). Additionally, the vowel-related asymmetries, which originate from co-articulatory constraints, cannot be explained within this framework: the model does not predict how context affects these translated phonological representations at the level of articulatory dynamics (see also Pouplier & Goldstein, 2010). Thus, as was concluded in chapter 3, the AP model seems the most fruitful approach to explaining the origin and function of intrusions and reductions in the context of reiterated speech.

5.3.2.2 Auditory information

The finding from Study 3 that the presence of auditory information alone does not stabilize speech movements is in accordance with the assumption in Articulatory Phonology that task-specific target constrictions, in the form of gestures, are the immediate goals of speech production (e.g., Fowler, 2007). These gestures directly structure the acoustic speech output. As mentioned earlier, intrusions affect the acoustic output to varying degrees, depending on the nature of the intruding gesture (Marin et al., 2010). Marin et al. (2010) found that an intruding tongue dorsum affects the acoustic spectrum in particular: when the tongue dorsum intruded during a /t/, the higher frequency range was lowered, resulting in a flatter spectrum comparable with that of /k/ realisations. Pouplier and Goldstein (2005) showed that speakers indeed identified a /t/ with an intruding tongue dorsum less consistently than a /k/ with an intruding tongue tip. Yet it is exactly the tongue dorsum in a front vowel context that intrudes the most, as shown in the findings of the second study. If speakers had used auditory targets, they likely would have controlled these tongue dorsum intrusions more stringently. In addition, speakers would have been able to control articulation more strictly in the presence than in the absence of auditory information. However, the masking study showed the opposite: speakers allowed more intrusions in the unmasked condition. One could speculate that, as long as the correct gesture can be retrieved from the acoustic output, speakers ignore intrusions. It might be more efficient to allow a certain number of intrusions; one could argue that it takes more energy to suppress the tendency of articulatory movements to stabilize. In order to evaluate whether the produced acoustic output reflects the underlying gestures, however, speakers have to use auditory information to some degree (Fowler, 2007). With simple repetitions of word pairs, evaluating the degree to which underlying gestures are faithfully realized is not a difficult task.

Tactile feedback provides the speaker with the necessary information about whether a planned consonant constriction by a set of target articulators is correct. In masked speech, this information combined with proprioceptive information stabilizes speech movement coordination.

The current data show that coupling between gestures can be strengthened via a feedback loop fed by proprioceptive or tactile information. Saltzman, Löfqvist, Kay, Kinsella-Shaw and Rubin (1998) showed evidence for the influence of a feedback loop from the inter-articulatory level back to the gestural level. According to Alais et al. (2010), different sensory modalities work simultaneously and supplement each other when processing sensory information. This means that, when sensory information from different sources matches, the neuronal response will be strong; when contradictory information is available, the response will be weaker. In addition, certain sensory modalities are stronger than others, or are given more attention (Alais et al., 2010). One can argue that auditory information in speech production is important for monitoring and correcting speech and is thus given priority over other modalities. If many intrusions are not perceived as incorrect speech segments, as has been shown by Pouplier and Goldstein (2005), and tactile information from the target articulations is correct as well, auditory and tactile information match, and proprioceptive information might be ignored to a certain extent, resulting in more intrusions. When auditory information is not available, speakers automatically switch to monitoring via the remaining modalities, such as the proprioceptive and tactile ones. In such cases, the tactile information confirms that the constriction for a target is correct.

Proprioceptive information from the non-target tongue positions, which is now given higher priority because of the lack of auditory information, is entrained with efferent information dictating that the non-target articulator should not be activated; this articulator is activated only once during the two words, namely in the other word. This information thus suppresses intrusions more effectively, which may have led to the lower rate of intrusions in the masked condition. Thus, the task dynamic model can be improved by including several feedback loops, from the inter-articulator and acoustic output levels possibly back to the gestural level, accounting for the finding that speakers integrate multiple sensory modalities to stabilize speech coordination.

5.3.3 Methodological contributions

In this thesis, we have provided an alternative to existing approaches for describing the variability of articulatory movements that surfaces as intrusions and reductions. The findings from the first study strengthened the validity of the approach taken in our studies. Instead of using non-alternating trials as a baseline, the findings of the first study indicated that it is better to use the experimental trials themselves to define intrusions and reductions. These alternating stimuli contain well-defined target and non-target articulations for a given articulator, without the biomechanical and co-articulatory confounds found in the production of non-alternating stimuli. The second study confirmed many of the effects previously found by Pouplier and colleagues, indicating that our approach is valid. These findings add an alternative way of analysis to the existing literature (e.g., McMillan & Corley, 2010; Pouplier, 2003; Pouplier, 2008) on the definition of intrusions and reductions and the way in which to capture co-articulatory influences. In addition, this alternative way of defining outliers is successful in revealing that speakers switch between different sensory modalities, such as the auditory, tactile and proprioceptive modalities, when stabilizing movement coordination.

5.3.4 Future directions and limitations

As with all research studies, this study generated more questions than have been answered. I will offer some suggestions for future studies that can build on the current findings.

A first factor that could be examined in more detail in future studies is the magnitude of intrusions and reductions. The analyses in the second and third study in this dissertation took a categorical approach, in the sense that we used statistical criteria to distinguish “outliers” from “normal” variation; neither study, however, examined the size of intrusions in the different conditions more directly. In this sense, it would be interesting to examine whether speakers control the magnitude of intrusions and reductions to prevent them from becoming large enough to distort the acoustic output of the intended gesture, and whether speakers control articulation differently within a trial after an audible intrusion has occurred. The latter notion would support the results of Lackner and Tuller (1979), who found fewer vowel errors in a noise-masked condition when it followed an unmasked condition than when the masked condition was offered first. In addition, the question whether intrusions become more pronounced towards the end of a trial, possibly suggesting a “failure” of the speaker to control stabilizing forces, could be addressed this way. The current data set contains information about the magnitude of intrusions and reductions, which makes a comparison of the individual sizes of intrusions and reductions in different conditions possible.
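The distribution-based distinction between outliers and normal variation described above can be sketched as follows. This is an illustrative sketch only: the 2-standard-deviation threshold (and the k=1.5 value in the example) are hypothetical choices, not the dissertation's exact criterion.

```python
import numpy as np

def flag_outliers(ranges, k=2.0):
    """Return boolean masks (upper, lower) marking movement ranges more
    than k standard deviations above or below the distribution mean.
    Applied to a non-target articulator, upper outliers would count as
    intrusions; applied to a target articulator, lower outliers would
    count as reductions. The threshold k is an illustrative choice."""
    r = np.asarray(ranges, dtype=float)
    mu, sd = r.mean(), r.std(ddof=1)
    return r > mu + k * sd, r < mu - k * sd

# Toy data: mostly small non-target movements with two larger excursions.
nontarget = np.array([0.4, 0.5, 0.45, 0.5, 0.42, 0.48, 2.1, 0.44, 1.9, 0.46])
upper, lower = flag_outliers(nontarget, k=1.5)
print(int(upper.sum()))  # the two large excursions are flagged as intrusions
```

Because large excursions inflate the standard deviation of the very distribution they are judged against, the choice of threshold interacts with the intrusion rate itself, which is one reason a continuous measure (see below in the text) may be preferable.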

The analyses in the present two studies used an approach close to earlier research on intrusions and reductions (see e.g., Goldstein et al., 2007): a statistical criterion was employed to capture outliers from a distribution of movement ranges of articulators. However, studies that examine mechanisms of coordination dynamics typically use other measures, such as the “relative phase” between articulatory movements and “cross-spectral mean coherence”. Relative phase measures can describe movement coordination continuously over time (Namasivayam & Van Lieshout, 2008; Van Lieshout et al., 2007). Accordingly, an additional line of research is to examine intrusions and reductions in a continuous way in order to capture their gradual nature and possible underlying patterns more efficiently. “Cross-spectral coherence” describes the association between pairs of frequency components across articulators or gestures (Namasivayam & Van Lieshout, 2008; Van Lieshout et al., 2007), indicating how strongly the articulators are entrained.
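The two continuous measures named above could be computed along the following lines; the Hilbert-transform phase estimate, the sampling rate, and the 15 Hz averaging band are illustrative assumptions, not the procedures of the cited studies.

```python
import numpy as np
from scipy.signal import hilbert, coherence

def continuous_relative_phase(x, y):
    """Continuous relative phase between two articulator signals,
    estimated from the instantaneous phase of their analytic signals."""
    phase_x = np.unwrap(np.angle(hilbert(x - np.mean(x))))
    phase_y = np.unwrap(np.angle(hilbert(y - np.mean(y))))
    return phase_x - phase_y  # radians; ~0 = in-phase, ~pi = anti-phase

def mean_coherence(x, y, fs, fmax=15.0):
    """Cross-spectral coherence averaged over the low-frequency band in
    which speech movements occur (here a hypothetical 0-15 Hz band)."""
    f, cxy = coherence(x, y, fs=fs, nperseg=min(len(x), 256))
    band = f <= fmax
    return float(np.mean(cxy[band]))

# Toy example: two 5 Hz movement signals sampled at 200 Hz with a small,
# constant phase lag between them.
fs = 200.0
t = np.arange(0, 2, 1 / fs)
tongue_tip = np.sin(2 * np.pi * 5 * t)
lower_lip = np.sin(2 * np.pi * 5 * t + 0.2)
phi = continuous_relative_phase(tongue_tip, lower_lip)
print(np.mean(phi[50:-50]))  # close to -0.2 rad (edges trimmed)
print(mean_coherence(tongue_tip, lower_lip, fs))  # high for these matched signals
```

Unlike a per-trial outlier count, the relative-phase trace gives a value at every sample, so a drift toward intrusion over the course of a trial would show up as a gradual change in phase rather than a discrete event.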

A third factor that should be addressed in future studies involves horizontal movements of the articulators. The present study determined intrusions and reductions solely on the basis of variability distributions of vertical movement ranges. As our results revealed, the front vowel context caused a bias towards tongue dorsum intrusions. Data on the horizontal movements of the tongue articulators could reveal additional influences, which might even cause an outlier in the vertical direction, or vice versa.

Finally, a frequently heard argument against the use of repetitive speech in an experimental setting is that it does not resemble natural speech. The question thus arises whether intrusions and reductions, typical of repetitive speech, show up in everyday language and speech. Shattuck-Hufnagel (1992) addressed this issue in a series of traditional speech error studies and found similar error patterns in sentences, phrases and words, revealing that errors elicited in a constrained experimental setting can reflect patterns in natural language. It would be interesting to study how auditory information, for example, affects intrusions and reductions with more complicated stimuli, such as experimental word pairs embedded in sentences. This way, more light can be shed on the notion that auditory information plays a major role in maintaining correct acoustic output over a longer time period (e.g., Borden, 1979; Lackner & Tuller, 1979; Neilson & Neilson, 1987; Postma, 2000).

To summarize, the studies in this dissertation revealed that co-articulatory constraints influence intrusions and reductions in a systematic way. In addition, more light has been shed on the role of sensory modalities in controlling articulatory movement coordination. It can be concluded that the act of speaking is similar to other types of movement control in more than one way: the work in this dissertation has shown that individual characteristics of articulatory movements and the availability of sensory modalities can influence the autonomous mechanism of entrainment.

5.4 References

Alais, D., Newell, F. N., & Mamassian, P. (2010). Multisensory processing in review: From physiology to behaviour. Seeing and Perceiving, 23(1), 3-38.

Borden, G. (1979). An interpretation of research on feedback interruption in speech. Brain and Language, 7, 307-319.

Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6, 201-251.

Browman, C. P., & Goldstein, L. (1990). Representation and reality: Physical systems and phonological structure. Haskins Laboratories Status Report on Speech Research, 105/106, 83-92.

De Jong, K., Beckman, M., & Edwards, J. (1993). The interplay between prosodic structure and coarticulation. Language and Speech, 36(2-3), 197-212.

Dell, G. S., Burger, L., & Svec, W. (1997). Language production and serial order: A functional analysis and a model. Psychological Review, 104(1), 123-147.

Fougeron, C., & Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America, 101(6), 3728-3740.

Fowler, C. A. (2007). Speech production. In M. G. Gaskell (Ed.), The Oxford Handbook of Psycholinguistics (pp. 489-501). Oxford University Press.

Goldrick, M., & Blumstein, S. E. (2006). Cascading activation from phonological planning to articulatory processes: Evidence from tongue twisters. Language & Cognitive Processes, 21(6), 649-683.

Goldrick, M., & Chu, K. (2013). Gradient co-activation and speech error articulation: Comment on Pouplier and Goldstein (2010). Language and Cognitive Processes. Advance online publication. doi:10.1080/01690965.2013.807347

Goldstein, L., Byrd, D., & Saltzman, E. L. (2006). The role of vocal tract gestural action units in understanding the evolution of phonology. In M. Arbib (Ed.), From Action to Language: The Mirror Neuron System (pp. 215-249). Cambridge University Press.

Goldstein, L., Pouplier, M., Chen, L., Saltzman, E. L., & Byrd, D. (2007). Dynamic action units slip in speech production errors. Cognition, 103(3), 386-412.

Iskarous, K., Fowler, C. A., & Whalen, D. (2010). Locus equations are an acoustic expression of articulator synergy. Journal of the Acoustical Society of America, 128(4), 2021-2032.

Keating, P. A., Lindblom, B., Lubker, J., & Kreiman, J. (1994). Variability in jaw height for segments in English and Swedish VCVs. Journal of Phonetics, 22, 407-422.

Kelso, J. A. S., & Zanone, P. G. (2002). Coordination dynamics of learning and transfer across different effector systems. Journal of Experimental Psychology: Human Perception and Performance, 28(4), 776-797.

Lackner, J., & Tuller, B. (1979). Role of efference monitoring in the detection of self-produced speech errors. In M. Cortese & C. Edward (Eds.), Sentence Processing: Psycholinguistic Studies Presented to Merrill Garrett (pp. 281-294). Hillsdale, NJ: Lawrence Erlbaum Associates.

Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech Production and Speech Modelling (pp. 403-439). The Netherlands: Kluwer Academic Publishers.

MacKay, D. (1971). Stress pre-entry in motor systems. The American Journal of Psychology, 84(1), 35-51.

Marin, S., Pouplier, M., & Harrington, J. (2010). Acoustic consequences of articulatory variability during productions of /t/ and /k/ and its implications for speech error research. Journal of the Acoustical Society of America, 127(1), 445-461.

McMillan, C. T., & Corley, M. (2010). Cascading influences on the production of speech: Evidence from articulation. Cognition, 117, 243-260.

Mooshammer, C., & Geng, C. (2008). Acoustic and articulatory manifestations of vowel reduction in German. Journal of the International Phonetic Association, 38(2), 117-136.

Mooshammer, C., Hoole, P., & Geumann, A. (2007). Jaw and order. Language and Speech, 50(2), 145-176.

Namasivayam, A. K., & Van Lieshout, P. (2008). Investigating speech motor practice and learning in people who stutter. Journal of Fluency Disorders, 33, 32-51.

Namasivayam, A. K., & Van Lieshout, P. (2011). Speech motor skill and stuttering. Journal of Motor Behavior, 43(6), 477-489.

Namasivayam, A. K., Van Lieshout, P., McIlroy, W., & De Nil, L. (2009). Sensory feedback dependence hypothesis in persons who stutter. Human Movement Science, 28, 688-707.

Neilson, M., & Neilson, P. (1987). Speech motor control and stuttering: A computational model of adaptive sensory-motor processing. Speech Communication, 6, 325-333.

Parrell, B. (2012). The role of gestural phasing in Western Andalusian Spanish aspiration. Journal of Phonetics, 40(1), 37-45.

Peper, C. E., & Beek, P. J. (1998). Distinguishing between the effects of frequency and amplitude on interlimb coupling in tapping a 2:3 polyrhythm. Experimental Brain Research, 118, 78-92.

Peper, C. E., & Beek, P. J. (1999). Modeling rhythmic interlimb coordination: The roles of movement amplitude and time delays. Human Movement Science, 18, 263-280.

Postma, A. (2000). Detection of errors during speech production: A review of speech monitoring models. Cognition, 77, 97-131.

Pouplier, M. (2003). Units of phonological encoding: Empirical evidence (Unpublished doctoral dissertation). Yale University, New Haven.

Pouplier, M. (2007). Tongue kinematics during utterances elicited with the SLIP technique. Language and Speech, 50(3), 311-341.

Pouplier, M. (2008). The role of a coda consonant as error trigger in repetition tasks. Journal of Phonetics, 36, 114-140.

Pouplier, M., & Goldstein, L. (2005). Asymmetries in the perception of speech production errors. Journal of Phonetics, 33(1), 47-75.

Pouplier, M., & Goldstein, L. (2010). Intention in articulation: Articulatory timing in alternating consonant sequences and its implications for models of speech production. Language and Cognitive Processes, 25(5), 616-649.

Pouplier, M., & Goldstein, L. (2013). The relationship between planning and execution is more than duration: Response to Goldrick & Chu. Language and Cognitive Processes. Advance online publication. doi:10.1080/01690965.2013.834063

Ridderikhoff, A., Peper, C. E., & Beek, P. J. (2007). Error correction in bimanual coordination benefits from bilateral muscle activity: Evidence from kinesthetic tracking. Experimental Brain Research, 181, 31-48.

Saltzman, E. L., & Byrd, D. (2000). Task-dynamics of gestural timing: Phase windows and multifrequency rhythms. Human Movement Science, 19, 499-526.

Saltzman, E. L., Löfqvist, A., Kay, B., Kinsella-Shaw, J., & Rubin, P. (1998). Dynamics of intergestural timing: A perturbation study of lip-larynx coordination. Experimental Brain Research, 123(4), 412-424.

Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Haskins Laboratories Status Report on Speech Research, 1(4), 333-382.

Shattuck-Hufnagel, S. (1992). The role of word structure in segmental serial ordering. Cognition, 42, 213-259.

Stearns, A. (2006). Production and perception of place of articulation errors (Unpublished master's thesis). College of Arts and Sciences, University of South Florida.

Stemberger, J. (1991). Apparent anti-frequency effects in language production: The addition bias and phonological underspecification. Journal of Memory and Language, 30, 161-185.

Temprado, J., Chardenon, A., & Laurent, M. (2001). Interplay of biomechanical and neuromuscular constraints on pattern stability and attentional demands in a bimanual coordination task in human subjects. Neuroscience Letters, 303, 127-131.

Treffner, P., & Peter, M. (2002). Intentional and attentional dynamics of speech-hand coordination. Human Movement Science, 21, 641-697.

Van Lieshout, P., Hulstijn, W., & Peters, H. (2004). Searching for the weak link in the speech production chain of people who stutter: A motor skill approach. In B. Maassen, R. Kent, H. F. M. Peters & W. Hulstijn (Eds.), Speech Motor Control in Normal and Disordered Speech (pp. 313-356). Oxford: Oxford University Press.

Van Lieshout, P., & Neufeld, C. (2014). Coupling dynamics interlip coordination in lower lip load compensation. Journal of Speech, Language, and Hearing Research, 57, 597-615.

Williamson, M. M. (1998). Neural control of rhythmic arm movements. Neural Networks, 11(7-8), 1379-1394.

Zanone, P., Monno, A., Temprado, J., & Laurent, M. (2001). Shared dynamics of attentional cost and pattern stability. Human Movement Science, 20(6), 765-789.