
Variable Reduction in Mexico City Spanish

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Meghan Frances Dabkowski

Graduate Program in Spanish & Portuguese

The Ohio State University

2018

Dissertation Committee

Dr. Rebeka Campos-Astorkiza, Advisor

Dr. Terrell Morgan

Dr. Fernando Martínez-Gil

Copyrighted by

Meghan Frances Dabkowski

2018

ABSTRACT

This dissertation focuses on variable vowel reduction in Mexico City Spanish, a salient feature of the pronunciation of this dialect in which a word like tomates “tomatoes” may be variably realized as something closer to [to.ma.ts] or [to.ma.te̥s], with a shortened, voiceless, or weakened final vowel. My research builds on studies of vowel reduction in other languages and varieties, and places Mexico City Spanish within the typology of languages and varieties that variably reduce in this way. My investigation of the phenomenon is the first to examine acoustic data to (i) understand the acoustic properties of these reduced vowels, (ii) describe and categorize them, and (iii) analyze their patterning with regard to linguistic and social factors.

To investigate this issue, I conducted fieldwork onsite in Mexico City in 2015 and 2016, and recorded speech samples from 73 native speakers, women and men from diverse socioeconomic backgrounds, between the ages of 21 and 81. The recordings include a sociolinguistic interview designed to elicit spontaneous informal conversational speech. Approximately 160 vowel tokens were acoustically analyzed for each of 40 of those participants using Praat (Boersma & Weenink 2016). For all vowels not adjacent to another vowel or glide, I measured the total duration as well as the duration of full modal voicing within the segment, for a total of 6,504 tokens. Along with the results from the acoustic analysis, each token was coded for target vowel, surrounding segmental context, stress, position relative to lexical stress, syllable type, word position, speaker age, gender, and socioeconomic status, in order to fit statistical models that test the relationships between linguistic and social factors and vowel reduction.

My findings from the acoustic analysis indicate that various types of reduction in the articulation of vowels occur: a range of voice weakening, including devoicing and weakened/breathy voicing, as well as extreme shortening. The findings from the inferential statistical analyses indicate that stress, position relative to stress, preceding and following contexts, and target vowel all contribute to the likelihood of a vowel’s reduction, while the social factors examined here do not. Voice weakening affects all vowels at relatively similar rates, is conditioned by preceding voiceless consonants, following voiceless consonants, and following pauses, and is most frequent in post-tonic position. Shortening affects all vowels except /a/, is conditioned by preceding and following voiceless consonants and following pauses, and is most frequent in pre-tonic position and in unstressed monosyllabic words.

I argue that these results for linguistic factors support an articulatory model of speech in which the relative timing of articulatory gestures results in their overlapping, and voice weakening or extreme shortening are the consequence. The findings regarding social factors suggest that the vowel reduction observed in this variety is a case of stable variation, rather than a change in progress. A major contribution of this research is the understanding of voice weakening and shortening as two complementary strategies that both contribute to vowel reduction in this variety by targeting different prosodic positions.


DEDICATION

For my mother, who always believed in me.


ACKNOWLEDGMENTS

I would like to thank everyone who supported me throughout my graduate studies, and especially through the research and writing of this dissertation. Above all others, I offer my unending gratitude to my advisor, Rebeka Campos-Astorkiza, who consistently goes above and beyond to help students. She is an exemplary scholar, teacher, and mentor, and I simply cannot imagine having completed this dissertation without her guidance, insightful feedback, and encouragement at every step along the way.

My other committee members have also been influential in my development. Terrell Morgan’s enthusiasm for dialectal variation is contagious, and he has reminded me to always remain curious. I am grateful to Fernando Martínez-Gil for his helpful insights during the early development of this project, and also for his support throughout the academic job search process. Other faculty members have been influential in my development as well: Sarah Gallo, Mary Beckman, Anna Babel, Glenn Martinez, Frederick Aldama, Lauren Squires, Scott Schwenter, Scott Kiesling, and Helen Stickney each contributed in no small way to my experience as a graduate student, and to my development as an academic and active member of my various communities.

The data collection would not have been possible without the help of Ariadna Martínez González and her entire family, as well as Miriham Miranda and family, all of whom helped to make my time in Mexico not only productive, but immensely enjoyable, and I will always appreciate that. Additionally, everyone who made time to participate in interviews and recordings cannot be thanked enough: they truly made this research possible, and I am extremely grateful to them all.

My colleagues and friends made life as a graduate student downright tolerable at times. I would never have been able to figure out statistics or file my taxes properly without Hannah Washington’s guidance. Christy Garcia’s dissertation methodology was especially helpful, but I appreciate her friendship and encouragement even more. Mary Beaton selflessly offered motivation, brainstorming and editing support, as well as excellent yoga instruction. Becca Mason’s sense of humor, moral support, and generous dog-sitting offers were always appreciated, but became indispensable during the last big push of finishing this document. Miguel Valerio always let me know when it was time to take a break and go bowling. Many others have supported me in different ways throughout this process as well, especially Elena Costello, Nausica Marcos, Elena Foulis, Michael Brown, Jenny Barajas, Mary Johnson, and Olivia Cosentino.

My family has always cheered me on, and their support has undoubtedly carried me to otherwise unreachable places. My parents, my brothers Colin and Brendan, my aunts Di and Maela, my uncles, and my in-laws Dave, Lucy, Pete, and Erin have always offered their support, encouragement, and understanding.

However, none of this would have been possible without the unwavering love and support of my partner Robin, who not only tolerated my absence during data collection trips, but also quite generously made my coffee nearly every day over the last few years.

Finally, I am grateful for the constant companionship of my best friend Loki, who slept in my lap for nearly all of the work I’ve done in the last 6 years, and who never fails to remind me to take necessary breaks to get up and play. Thank you all.


VITA

2002...... B.A. Spanish and Film Studies, University of Pittsburgh

2012...... M.A. Linguistics, University of Pittsburgh

Publications

Gallo, Sarah, and Meghan Dabkowski. 2018. The permanence of departure: Young Mexican immigrant students’ discursive negotiations of imagined childhoods allá. Linguistics and Education 45. 92-100.

Fields of Study

Major Field: Spanish & Portuguese


TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
Table of Contents
List of Tables
List of Figures

Chapter 1: Introduction
1.1 Description of the phenomenon
1.2 Focus of study and rationale
1.3 Research questions
1.4 Organization of this dissertation

Chapter 2: Background Literature
2.1 Introduction
2.2 Vowel devoicing cross-linguistically
2.2.1 Modern Greek
2.2.2 Japanese
2.2.3 Korean
2.2.4 Montreal French
2.2.5 Turkish
2.2.6 Portuguese
2.2.7 Summary of cross-linguistic findings
2.3 Vowel reduction in Spanish
2.3.1 Vowel reduction in Andean Spanish
2.3.2 Vowel devoicing in Mexican Spanish
2.4 Region of study

Chapter 3: Methodology
3.1 Introduction
3.2 Participant selection
3.3 Tasks and audio recording procedures
3.3.1 The sociolinguistic interview
3.3.2 Reading task
3.3.3 Recording procedures and audio processing
3.4 Data analysis
3.4.1 Envelope of variation
3.4.2 Acoustic analysis of the data
3.4.3 Defining the dependent variables
3.4.3.1 Voice weakening
3.4.4 Defining the independent variables
3.5 Statistical analysis

Chapter 4: Results
4.1 Introduction
4.2 Distribution of data
4.2.1 Voice weakening
4.2.2 Shortening
4.3 Inferential statistical models
4.3.1 Categorical modeling of voice weakening
4.3.2 Continuous modeling of voice weakening
4.3.3 Categorical modeling of shortening
4.3.4 Continuous modeling of vowel duration
4.4 Vowel quality
4.5 Summary of results

Chapter 5: Discussion
5.1 Shortening and voice weakening as complementary strategies
5.2 Conditioning factors for voice weakening
5.2.1 Preceding/following context
5.2.2 Stress and position relative to stress
5.2.3 Target vowel and word position
5.2.4 Social factors
5.3 Conditioning factors for shortening
5.3.1 Preceding/following context
5.3.2 Stress and position relative to stress
5.3.3 Target vowel
5.3.4 Social factors
5.4 Vowel quality
5.5 Implications
5.5.1 Nature of vowel reduction in MCS
5.5.2 Previous formal analyses
5.5.3 Proposed analysis

Chapter 6: Conclusion
6.1 Conclusions
6.2 Contributions
6.3 Methodological considerations
6.4 Limitations of this study
6.5 Further research

References
Appendix A: Participant Information
Appendix B: Sample Questions for Interview, Spanish/English
Appendix C: Sentence Stimuli for Reading Task
Appendix D: Average Duration Measurements


LIST OF TABLES

Table 2.1. The Spanish vowel inventory
Table 2.2. Duration of each vowel in milliseconds (adapted from Morrison & Escudero 2007, Figure 1)
Table 2.3. Delforge’s categories of reduced vowels
Table 4.1. Overall rates of voice weakening
Table 4.2. Rates of voice weakening by stress
Table 4.3. Frequency of occurrence and voice weakening rates of each target vowel
Table 4.4. Factors and levels that were collapsed
Table 4.5. Factors predicting categorical voice weakening
Table 4.6. Results of main effects of beta-inflated regression model: percent full voicing
Table 4.7. Results for the nu parameter of beta-inflated regression model: percent full voicing
Table 4.8. Results for the tau parameter of beta-inflated regression model: percent full voicing
Table 4.9. Results from the mixed effects logistic regression model for shortening
Table 4.10. Results from the mixed effects linear regression model for shortening
Table 4.11. Mean F1 values in Hz (SD in parentheses) for comparisons between weakened and non-weakened, shortened and non-shortened vowels, with significant differences
Table 4.12. Mean F2 values in Hz (SD in parentheses) for comparisons between weakened and non-weakened vowels with significant differences
Table 4.13. Significant predictors of vowel reduction in MCS
Table 5.1. Weakening rates for surrounding contexts
Table 5.2. Voice weakening rates by preceding consonant
Table 5.3. Voice weakening rates by following consonant
Table 5.4. Voice weakening by vowel, word-medial position MCS (unstressed vowels only)
Table 5.5. Voice weakening rates by vowel, word-final position MCS
Table 5.6. Voice weakening rates by vowel, word-medial position in Delforge (2009)
Table 5.7. Voice weakening by vowel, sandhi (voiceless contexts only) MCS
Table 5.8. Voice weakening by vowel, sandhi (voiceless contexts only) in Delforge (2009)
Table 5.9. Word-final vowels closed by /s/, unstressed only MCS
Table 5.10. Word-final vowels closed by /s/, Delforge Cuzco
Table 5.11. Voice weakening rates by vowel, word-final open pre-pausal position MCS (unstressed vowels only)
Table 5.12. Voice weakening rates by vowel, word-final open pre-pausal position in Delforge (2009)
Table 5.13. Shortening rates for surrounding contexts
Table 5.14. Shortening rates by preceding consonant
Table 5.15. Shortening rates by following consonant
Table 5.16. Average duration values (in ms) for stressed and unstressed vowels
Table 5.17. Percentage of voice weakening and voice weakening rates for /o/ in velar contexts
Table 5.18. Percentage of voice weakening and voice weakening rates for /o/ in dental/alveolar contexts
Table 5.19. Percentage of voice weakening and voice weakening rates for /e/ in dental/alveolar contexts


LIST OF FIGURES

Figure 1.1. Spectrogram and waveform showing complete devoicing in /a/ of hermosa, “beautiful”
Figure 1.2. Spectrogram and waveform showing weakened voicing in the final /a/ of casa, “house”
Figure 3.1. Praat TextGrid example of the measuring and labeling of vowels
Figure 3.2. Spectrogram and waveform for a partially devoiced token in the final /a/ of persona, “person”
Figure 3.3. Spectrogram and waveform for a partially weakened token in the final /o/ of dos años, “two years”
Figure 3.4. Full modal voicing in each of three vowels of temblores, “earthquakes”
Figure 3.5. Partial weakening/breathiness in final /o/ of dos años, “two years”
Figure 3.6. Partial devoicing in final /a/ of personas, “people”
Figure 3.7. Complete weakening/breathiness in final /a/ of pesaditas, “bothersome”
Figure 3.8. Complete devoicing in final /o/ of actos, “acts”
Figure 3.9. Apparent deletion in initial /o/ of ochenta, “eighty”
Figure 3.10. A shortened vowel token, in ese tipo de, “this type of”
Figure 4.1. Individual rates of voice weakening
Figure 4.2. Individual rates of voice weakening for unstressed vowels only
Figure 4.3. Voice weakening type by stress
Figure 4.4. Voice weakening by position relative to primary word stress
Figure 4.5. Voice weakening rates by target vowel
Figure 4.6. Voice weakening rates by following (left panel) and preceding (right panel) context
Figure 4.7. Voice weakening rates by syllable type
Figure 4.8. Voice weakening rates by word position
Figure 4.9. Voice weakening rates by age group of the speaker
Figure 4.10. Voice weakening rates by speaker’s education level
Figure 4.11. Voice weakening rates by gender of speaker
Figure 4.12. Individual rates of shortening overall
Figure 4.13. Individual rates of shortening for unstressed vowels only
Figure 4.14. Vowel duration in ms, shown for each target vowel, separated by stress
Figure 4.15. Vowel shortening rates by target vowel
Figure 4.16. Vowel shortening rates by position relative to lexical stress
Figure 4.17. Vowel shortening rates by following (left panel) and preceding (right panel) context
Figure 4.18. Vowel shortening rates by syllable type
Figure 4.19. Vowel shortening rates by word position
Figure 4.20. Vowel shortening rates by age group of speaker
Figure 4.21. Vowel shortening rates by speaker’s highest level of formal educational attainment
Figure 4.22. Vowel shortening rates by gender of speaker
Figure 4.23. Random forest showing variable importance of predictor variables for voice weakening
Figure 4.24. Conditional inference tree showing interactions between factors that predict voice weakening
Figure 4.25. Histogram of percent full voicing
Figure 4.26. Random forest showing variable importance of predictor variables for shortening
Figure 4.27. Conditional inference tree showing interactions between factors that predict shortening
Figure 4.28. Female and male F1 and F2 charts for unstressed /a/ based on voice weakening
Figure 4.29. Female and male F1 and F2 charts for unstressed /o/ based on voice weakening
Figure 4.30. Female F1 and F2 chart for /e/ based on shortening
Figure 4.31. Female F1 and F2 chart for /o/ based on shortening
Figure 5.1. Vowel duration by following context and target vowel
Figure 5.2. Vowel duration by education level and stress
Figure 5.3. Vowel duration by education level and target vowel
Figure 5.4. Illustration of CV COORDA and VC COORDA, from Delforge (2008b: 150)


CHAPTER 1: INTRODUCTION

1.1 Description of the phenomenon

This dissertation considers the variable production of vowels in the Spanish spoken in Mexico City. Spanish vowels have been described in the literature as showing remarkable stability and, when compared to consonants, as less subject to changes in their quality, duration, or voicing based on regional or social variation (Hualde 2005, Quilis 1981, Quilis & Esgueva 1983). However, there is a growing body of evidence showing that they do present more dialectal differences than initially assumed, varying in quality (Barajas 2014, Barnes 2013, Boyd-Bowman 1960, Cárdenas 1967, Flórez 1951, Garrido 2007, Hernández 2009, Holmquist 1985, 1998, 2005, Hualde 1989, Hualde & Sanders 1995, Lope Blanch 1979, Luria 1930, Navarro Tomás 1948, Oliver Rajan 2008, Sanders 1998), duration (Morrison & Escudero 2007, García 2016) and voicing (Boyd-Bowman 1952, Canellada de Zamora & Zamora Vicente 1960, Delforge 2008a, 2008b, 2009, 2012, Lipski 1990, Lope Blanch 1963, Serrano 2006). Indeed, a notable exception to Spanish vowel stability is the evidence of shortened, deleted, or devoiced vowels, a phenomenon that occurs in other languages, and which has been documented in Mexico City Spanish (henceforth MCS) (Boyd-Bowman 1952, Canellada de Zamora & Zamora Vicente 1960, Lope Blanch 1963, Serrano 2006), as well as in several varieties of Andean Spanish (Delforge 2008a, 2008b, 2009, 2012, Lipski 1990, Sessarego 2012a, 2012b).

Until now, the nature of vowel reduction in Mexican Spanish has not been well understood; that is, impressionistic descriptions from previous research characterize the phenomenon as primarily involving devoicing or deletion, but also mention possible shortening and centralization, or quality changes in the formant frequencies of F1 and F2 toward the center of the vowel space. Throughout this dissertation, I use the broad term vowel reduction to refer to the processes under investigation, namely voice weakening, shortening, and changes in vowel quality. Voice weakening subsumes weakened voicing, devoicing, and apparent deletion, while shortening refers to a reduction in duration. Reduction may refer to both voice weakening and shortening, in addition to changes in vowel quality, including raising and centralization.

Thus far, vowel reduction in MCS has been described as a highly variable process that primarily affects unstressed vowels. Some examples taken from early research on this variety (Canellada de Zamora & Zamora Vicente 1960, Matluck 1952) are provided below in (1). In these orthographic representations, a superscript vowel represents a devoiced variant, and an apostrophe represents apparent deletion of the target vowel.


(1) a. dientes ‘teeth’          dientᵉs, dient’s
    b. manos ‘hands’            manᵒs
    c. casa ‘house’             casᵃ
    d. amistad ‘friendship’     amⁱstad

All previous accounts of weakening in MCS mention that it is most frequent when the target vowel is in contact with a voiceless consonant, especially /s/, as shown above in (1a-d).

As mentioned earlier, vowel weakening has also been documented in Andean Spanish. In that variety, vowel weakening consists primarily of devoicing and apparent deletion of unstressed vowels (Delforge 2008a, 2008b, 2009, 2012) rather than centralization, which leads Delforge to refer to it as unstressed vowel devoicing (UVD) rather than unstressed vowel reduction (UVR). Generally, across languages with vowel devoicing, a devoiced vowel is produced with minimal or no vibration of the vocal folds; that is, a glottal abduction gesture predominates throughout the duration of the vowel, and air passing through the open glottis is not modified by vocal fold vibration. This is similar to the laryngeal articulation of a voiceless consonant, except that there is no obstruction of airflow in the oral cavity. Acoustically, devoiced vowels are characterized by an absence of energy at the fundamental frequency (F0), the frequency at which the vocal folds vibrate, which appears in a spectrogram as the voice bar. A partially devoiced vowel shows the articulatory characteristics described above for only a portion of the vowel. In the acoustic signal, there may be evidence of one or more glottal pulses which represent the vibration of the vocal folds during a portion of the vowel, typically the portion closest to either the beginning or the end of the vowel, and thus adjacent to another segment. In weakly voiced vowels, there is a faint voice bar and a lack of clear formant structure. The images in Figures 1.1 and 1.2, taken from the MCS data analyzed here, illustrate the waveforms and spectrograms associated with complete devoicing of /a/ in hermosa, “beautiful”, and weakened voicing of /a/ in casa, “house”.

Figure 1.1. Spectrogram and waveform showing complete devoicing in /a/ of hermosa, “beautiful”

Figure 1.2. Spectrogram and waveform showing weakened voicing in the final /a/ of casa, “house”

Cross-linguistically, unstressed vowel devoicing is described as a variable and gradient process that primarily affects high vowels (Gordon 1998) adjacent to voiceless consonants. The process is thought to affect mainly high vowels for two reasons: their short duration, and the high tongue position involved in their production. The inherently shorter duration of high vowels as compared to mid or low vowels (Lehiste 1970) makes them more prone to overlap with glottal abduction associated with adjacent voiceless consonants. The high tongue position needed to produce a high vowel results in a close oral constriction, which increases air pressure in the oral cavity and inhibits transglottal airflow, which is needed to maintain voicing (Jaeger 1978). While high vowels are the most common targets for devoicing, there are also exceptions to this pattern, in particular for Spanish, as will be shown below.

This topic is important because understanding the details about vowel reduction in Mexico City Spanish can help to refine our knowledge and theoretical models of weakening phenomena cross-linguistically, as well as provide insight into the diversity of Spanish varieties. Vowel reduction in this variety is particularly worthy of additional research because it has been understudied: the few previous studies that describe vowel reduction in this region of Mexico rely on researchers’ auditory impressions, and there has not yet been a systematic, instrumental acoustic analysis of the phenomenon.

Moreover, only two of the previous studies (Lope Blanch 1963, Serrano 2006) attempt to understand social correlates of Mexico City Spanish vowel weakening, and both fall short. Neither study provides a clear definition or a systematic investigation of socioeconomic status, and Serrano (2006) examines data from only a small number of speakers (twelve) and thus characterizes his results as preliminary.

My exploration of Mexico City vowel reduction rests on a systematic description of the phenomenon grounded in instrumental acoustic analysis. I gathered speech data by conducting interviews and a reading task with speakers who vary in age, sex, and socioeconomic status, which allows me to understand not only the linguistic correlates of vowel weakening, but also possible social correlates.


1.2 Focus of study and rationale

Because there has been no systematic acoustic analysis of vowel reduction in Mexico City Spanish, this dissertation fills a gap in the descriptive literature of Spanish dialectology, and thus adds to our understanding of variation in Spanish. Additionally, because the literature on variation in Spanish pronunciation has shown such a strong tendency in the language to lenite consonants and preserve vowels, weakening processes affecting vowels are particularly interesting and deserving of in-depth study. The data collected for this investigation help to inform our knowledge about Spanish and , and about the phonetics-phonology interface more generally. Furthermore, while previous studies of this process in Mexican Spanish have reported no strong correlations with sex, age, or social class, they were mostly based on subjective impressions rather than systematic and rigorous statistical analysis of the type carried out here. The results of this study shed light on the sociolinguistic stability of the process, and confirm previous suggestions that linguistic factors are more important than social factors in conditioning voice weakening and shortening.

To obtain the speech samples needed for analysis, I recruited participants using the “friend-of-a-friend” method, which has several advantages, such as allowing the researcher to easily obtain a participant sample that varies according to certain characteristics (e.g. age, sex), and providing an a priori connection between the researcher and the participant, thereby lessening the researcher’s outsider status (Barajas 2014, Milroy & Gordon 2003). I administered and digitally recorded a reading task and interview with each participant, designed to elicit speech styles that vary in terms of formality, i.e. attention paid to speech. I recorded a total of 73 speakers, in order to ensure coverage across levels for sex, age, and socioeconomic status, but data from only 40 of those speakers is included here. After completing the interviews, I used speech analysis software to segment the vowels and measure the duration of each, as well as the duration of full voicing, weak voicing, or devoicing, for each stressed and unstressed vowel token. I also measured formant values for a subset of the tokens in order to analyze possible changes in vowel quality such as centralization or raising.
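One common way to operationalize centralization and raising from formant measurements can be sketched as follows. This is an illustrative sketch only, not the procedure used in this study: the function names are hypothetical, and the use of raw Hz values with Euclidean distance is an assumption (normalized or auditory scales are also used in the literature). All formant values would come from the analyst's own measurements.

```python
import math


def centroid(formants):
    """Mean (F1, F2) in Hz over a collection of (F1, F2) pairs.

    Assumption: the center of the vowel space is approximated by the
    mean of a speaker's reference vowel measurements.
    """
    f1 = sum(pair[0] for pair in formants) / len(formants)
    f2 = sum(pair[1] for pair in formants) / len(formants)
    return (f1, f2)


def distance_from_center(token, center):
    """Euclidean distance in Hz between one (F1, F2) token and the center."""
    return math.hypot(token[0] - center[0], token[1] - center[1])


def is_more_central(reduced, full, center):
    """True when the reduced token lies closer to the center than the full one."""
    return distance_from_center(reduced, center) < distance_from_center(full, center)


def is_raised(reduced, full):
    """True when the reduced token has a lower F1, i.e. a higher tongue position."""
    return reduced[0] < full[0]
```

Comparing a reduced token against a fully voiced reference token of the same vowel and speaker keeps the two diagnostics separate: a token can be raised without being centralized, and vice versa.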

Each token was coded for linguistic and social factors in order to perform the statistical analysis. Linguistic factors include target vowel, preceding and following contexts, stress and position relative to word stress, syllable type, and word position. Social factors include speaker gender, age, and socioeconomic status (determined by using the speaker’s level of formal educational attainment as a proxy). These factors were selected based on the literature, in particular Delforge (2008a, 2012) and Lope Blanch (1963). For the statistical analysis, several inferential statistical tests and regression models were employed. For voice weakening, the factors mentioned above were considered as independent variables, with amount of full modal voicing of the vowel as the dependent variable. In one mixed effects regression model, percent full modal voicing was considered as a continuous dependent variable, and in another, I coded different categories of voicing (full voicing, weak voicing, partial voicing, complete devoicing, deletion) and considered voicing as a categorical variable. The statistical analysis shows how these external factors account for the voicing of vowel segments. For shortening, two mixed effects regression models were also employed, one using vowel duration measured in milliseconds as a continuous dependent variable, and the other with a binomial categorical measure of “shortened” or “not shortened”.
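The continuous and binary dependent variables described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the coding scheme used in the study: the class and field names are hypothetical, and the 30 ms cut-off for “shortened” is a placeholder rather than the study's criterion.

```python
from dataclasses import dataclass

# Placeholder cut-off for the binary "shortened" measure (an assumption
# made for illustration; it is not the criterion used in the study).
SHORTENED_THRESHOLD_MS = 30.0


@dataclass
class VowelToken:
    """One coded vowel token (field names are illustrative)."""
    speaker: str
    target_vowel: str        # one of "i", "e", "a", "o", "u"
    stressed: bool
    duration_ms: float       # total duration of the vowel segment
    modal_voicing_ms: float  # duration of full modal voicing within it

    def percent_full_voicing(self) -> float:
        """Continuous measure: share of the vowel with full modal voicing."""
        if self.duration_ms <= 0:
            # No measurable vowel segment, so no voicing to report.
            return 0.0
        return 100.0 * self.modal_voicing_ms / self.duration_ms

    def is_shortened(self, threshold_ms: float = SHORTENED_THRESHOLD_MS) -> bool:
        """Binary measure: True when the vowel is shorter than the cut-off."""
        return self.duration_ms < threshold_ms
```

For instance, `VowelToken("S01", "a", False, 50.0, 20.0).percent_full_voicing()` returns `40.0`: 20 ms of modal voicing within a 50 ms vowel. Keeping both measures on the same record makes it straightforward to fit a continuous and a categorical model to the same tokens.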

I build on insights from previous impressionistic studies and observations of the phenomenon in Central Mexico (Boyd-Bowman 1952, Canellada de Zamora & Zamora Vicente 1960, Lope Blanch 1963, Matluck 1952, Serrano 2006) by:

1) conducting a detailed acoustic analysis of the full range of vowel reduction, obtaining fine phonetic details about the process(es), information which is not only valuable in itself, but may also be used to inform phonological and sociolinguistic theory;

2) carrying out a systematic analysis of both linguistic and social patterning, in order to offer a more nuanced view of the factors that contribute to this variation.

In order to accomplish the above, I follow Delforge’s methodology, discussed in Chapter 2, in a number of ways (especially in the factors investigated), and also build on it by:

1) using more sophisticated statistical methods, including models that can handle continuous and multivalent categorical variables, non-parametric tests, and those that take into account both fixed and random effects;

2) investigating weakening from both a categorical and continuous perspective, since the data show Mexico City vowel reduction as a gradient process.


This project has several important outcomes. First, the results provide a better understanding of how linguistic factors interact with social factors in a variable phenomenon that seems to carry neither stigma nor prestige. Second, this research places Mexican Spanish within the typology of languages that undergo this type of vowel reduction and improves theories and models of reduction based on instrumental data. A third outcome is the development of a better understanding of the social factors relevant in Mexico City and their possible relation with the linguistic performance of speakers.

1.3 Research questions

This dissertation addresses the following research questions:

1) How is vowel reduction in MCS characterized acoustically? What are the acoustic correlates of different types of weakening? How do these compare to Andean Spanish and to other languages that present vowel weakening?

2) What linguistic factors condition this phenomenon in MCS?

3) What social characteristics of speakers (e.g. age, sex, socioeconomic status) condition this phenomenon in MCS?

4) What model is most appropriate to account for this phenomenon?

1.4 Organization of this dissertation

Chapter 2 provides an overview of previous literature documenting the general stability of vowels in Spanish, as well as exceptions to that stability, i.e. vowel weakening processes documented in Spanish varieties. Section 2.2 reviews the literature on vowel devoicing phenomena cross-linguistically, while 2.3 offers a critical review of the existing work on this phenomenon in Mexican and Andean Spanish. Section 2.4 provides background on the region of study. Chapter 3 focuses on the methodology chosen for this study, including the selection of participants in section 3.2, and the tasks and audio recording procedures used in 3.3. Section 3.4 describes the data analysis, including the acoustic analysis, as well as definitions for and discussion of the linguistic and social variables under consideration, and 3.5 describes the statistical procedures, tests, and models employed in order to answer the research questions. Chapter 4 provides a detailed exploration of the distribution of data across the various linguistic and social independent variables in 4.2, along with statistical tests modeling the contributions of predictor variables to voice weakening and shortening in 4.3. Vowel quality is explored in 4.4, and a summary concludes the chapter in 4.5. A comparison of my results to previous research and discussion of the implications of these results is the focus of Chapter 5, along with a phonological account of the data. I review overall patterns in the data in 5.1, and discuss the factors found to influence voice weakening in 5.2, and those found to influence shortening in 5.3. I discuss the results for vowel quality in 5.4, and in section 5.5, I present the implications of these data and this analysis. Chapter 6 concludes and considers areas for further research on this topic.

CHAPTER 2: BACKGROUND LITERATURE

2.1 Introduction

This chapter provides a survey of the literature on vowel reduction cross-linguistically, and a critical review of studies addressing vowel reduction in Andean and Mexican Spanish varieties. In this section, I describe characteristics of vowels, and make reference to some of the literature that documents the stability of Spanish vowels, as well as some of the literature that documents exceptions to this stability. In 2.2, I review cross-linguistic research on vowel devoicing, including research on Modern Greek, Japanese, Korean, Montreal French, Turkish, and Brazilian and European Portuguese. Section 2.3 focuses on previous research documenting and accounting for vowel devoicing/reduction in Spanish, for several Andean varieties, as well as Central Mexican Spanish. In 2.4, I briefly describe the region of study, its historical importance, and demographics.

Modal vowels are characterized by regular vocal fold vibration and a lack of oral constriction. The Spanish vowel system consists of five modal vowel phonemes, /i/, /e/, /a/, /o/, /u/, which are differentiated based on the tongue’s position in the oral cavity. There is a three-way distinction for both the tongue’s height (high, mid, and low) and its horizontal position (front, central, back), resulting in a triangle-shaped vowel space, as shown in Table 2.1. These articulatory dimensions of height and horizontal position correspond to acoustic measures of F1 and F2, the first two formants. Formants are the concentrations of acoustic energy at particular frequency ranges, and can be identified in a spectrogram by the appearance of dark bands in those regions.

Table 2.1. The Spanish vowel inventory

          Front    Central    Back
High      i                   u
Mid       e                   o
Low                a

Compared to consonants, the Spanish vowel system has been described as remarkably stable (Hualde 2005, Navarro Tomás 1977, Quilis 1981, among others), and Spanish vowels as less subject to changes in their quality based on regional and social variation. The general tendency in Spanish phonetic and phonological variation is that consonants are weakened while vowels are preserved.

Durational differences, however, have been documented, based on region (Morrison & Escudero 2007) and whether the vowel appears in a stressed or unstressed syllable (Quilis & Esgueva 1983). Morrison and Escudero (2007) compare vowel productions by Peruvian speakers to those of Peninsular speakers, including an equal number of males and females for each region. The authors find significant durational differences across the two dialects, with Peninsular vowels being significantly shorter. Table 2.2 shows the approximate mean duration for each vowel.

Table 2.2. Duration of each vowel in milliseconds (adapted from Morrison & Escudero 2007, Figure 1)

              i      e      a      o      u
Peruvian      107    95     95     92     98
Peninsular    70     69     73     69     73
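As a rough illustration of the size of the dialectal difference in Table 2.2, the following sketch computes an unweighted mean across the five vowels for each dialect, along with the Peninsular-to-Peruvian duration ratio for a given vowel. The function names are hypothetical, and the unweighted averaging is an assumption made here for illustration only; it is not a calculation reported by Morrison and Escudero (2007).

```python
# Approximate mean vowel durations in milliseconds, copied from Table 2.2.
DURATIONS_MS = {
    "Peruvian": {"i": 107, "e": 95, "a": 95, "o": 92, "u": 98},
    "Peninsular": {"i": 70, "e": 69, "a": 73, "o": 69, "u": 73},
}


def mean_duration(dialect):
    """Unweighted mean over the five vowels (an illustrative assumption)."""
    values = DURATIONS_MS[dialect].values()
    return sum(values) / len(values)


def peninsular_to_peruvian_ratio(vowel):
    """Ratio of the Peninsular duration to the Peruvian duration for one vowel."""
    return DURATIONS_MS["Peninsular"][vowel] / DURATIONS_MS["Peruvian"][vowel]


if __name__ == "__main__":
    for dialect in DURATIONS_MS:
        print(dialect, round(mean_duration(dialect), 1))
    for vowel in "ieaou":
        print(vowel, round(peninsular_to_peruvian_ratio(vowel), 2))
```

Under these assumptions, every Peninsular vowel in the table comes out at roughly two-thirds to four-fifths of the duration of its Peruvian counterpart.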

Morrison and Escudero also find that Peninsular speakers’ vowels were produced with significantly lower fundamental frequencies, but no significant differences are found in the higher formants. The crucial conclusion from this study is that there are indeed dialectal differences among vowel durations, suggesting that in fact more differences might be present than traditionally assumed. Similarly, García (2016) finds significant differences between Amazonian and Limeño Spanish vowels, in that Amazonian vowels are significantly longer.

Even while affirming the generally stable nature of Spanish vowels, Navarro Tomás (1977) admits that there may be slight differences in stressed versus unstressed vowels, but claims that any differences that may exist are likely below the perceptual level, and therefore different from, for example, the vowel reduction that takes place in stress-timed languages like English, where unstressed vowels are frequently reduced in duration and centralized to [ə].

However, there is a growing body of research that describes and accounts for dialectal variation affecting the Spanish vowel system. In fact, stressed and unstressed vowels do vary across Spanish varieties in terms of quality (i.e. formant values), with unstressed vowel raising, or a lower F1 value, being documented for a number of varieties of Spanish, including Judeo-Spanish (Luria 1930), northwestern Spain (Holmquist 1985, Barnes 2013), (Flórez 1951), Puerto Rico (Navarro Tomás 1948, Holmquist 1998, 2005, Oliver Rajan 2007, 2008) and Mexico (Barajas 2014, Boyd-Bowman 1960, Cárdenas 1967, Lope Blanch 1979). Vowel raising or centralization due to vowel harmony is also documented in northern Spain (Hualde 1989) and raising or lowering is documented in eastern Andalusia (Hualde & Sanders 1995, Sanders 1998).

Differences in strategies for hiatus resolution are described by Garrido (2007), and by Hernández (2009) for Mexican Spanish. The above studies, along with the work on Andean and Mexican Spanish weakening and devoicing, show that there are indeed exceptions to the general tendency of Spanish to preserve vowels.

2.2 Vowel devoicing cross-linguistically

This section presents a summary of the most influential research on vowel devoicing across Modern Greek, Japanese, Korean, Turkish, Montreal French, and Brazilian Portuguese, outlining the various empirical approaches and methodologies employed, providing major findings, and including phonetic and/or phonological explanations, according to the focus of each study’s author(s). Several of these studies couch their findings within the Articulatory Phonology (henceforth AP) (Browman & Goldstein 1989) framework, so a brief summary of the main theoretical components of AP is included here.

In the AP framework, the articulatory gesture, rather than the segment or phonological feature, is the most basic unit of speech, and the coordination of gestures across space and time makes up “constellations” that correspond to phonological structures. A gesture is defined as “an abstract characterisation of coordinated task-directed movements of articulators within the vocal tract” (p. 206), and the coordination of gestures is referred to as a gestural score. In this theory, vocal tract variables that are associated with one or more articulators and that correspond to independent task spatiotemporal dimensions make up each gesture. Different languages or varieties may employ the same gestures, but gestures may vary in their magnitude and relative timing, giving rise to different contrasts or phonological patterns.

The gestural overlap account, which is used to explain vowel devoicing in many of the studies referenced here, is a precursor to the fully elaborated Articulatory Phonology model described above (Browman & Goldstein 1989, 1992). The main idea is that articulatory gestures for adjacent segments may overlap or blend together, especially in fast speech. Blending occurs when gestures specified for the same articulator overlap, whereas hiding takes place when gestures specified for different articulators overlap. In the case of devoicing/voice weakening, there is blending of the glottal gestures associated with adjacent consonants and vowels. For example, a vowel between two voiceless obstruents, such as the vowel in the final syllable of the word actos, may devoice due to overlap of the glottal gestures associated with either consonant. For modal voicing in vowel production, a glottal adduction gesture with a constriction degree of “critical” is proposed, that is, a degree of constriction critical to achieve voicing. For a voiceless segment, however, a glottal abduction gesture with a wide degree of constriction would be necessary. Variations in the timing and magnitude of the gestures associated with constriction degrees could result in a fully voiced vowel, a devoiced vowel, or a variant that lies somewhere in between. This theory and how it relates to the patterns found here for the MCS data will be discussed further in chapter 5.

2.2.1 Modern Greek

In standard Modern Greek, high vowels /i/ and /u/ in unstressed syllables are often severely reduced in duration, devoiced, or elided, especially when in contact with voiceless consonants and immediately following a stressed syllable. The variation in realizations of these vowels has been described phonologically as the result of optional fast speech rules (Theophanopoulou-Kontou 1973) and phonetically as a series of related stages in the same phonetic process (Dauer 1980), influenced by context.

Dauer’s focus is to show that high vowel reduction in standard Modern Greek is not purely optional, but instead that it depends on a number of linguistic factors, most notably the phonetic environment of the target vowel and its position relative to the stressed syllable. The author identifies five related stages in the process of unstressed vowel reduction and describes them acoustically: 1) a full voiced vowel, with formant structure similar to a stressed vowel; 2) a short vowel (under 30 ms) with complete formant structure appearing after a liquid or a nasal and often having the auditory effect of a syllabic nasal or lateral; 3) a vowel with the presence of brief (duration unspecified) voicing but no other formant structure; 4) no voicing but energy in the F2 and F3 areas making identification of the vowel as /i/ or /u/ possible; 5) preceding consonant is released directly into following consonant, with no acoustic evidence of the vowel. An example of each is provided below in (2), where the target vowel is indicated in italics in the phonological transcription as well as in the phonetic transcription, if still present.

(2) 1. /'jinume/ ‘we become’ ['jinume]
    2. /afto'kinito/ ‘automobile’ [aftɔ'ciɲ̩tɔ] or [afto'ciɲ:tɔ]
    3. /'akusa/ ‘I heard’ ['akusa]
    4. /panepi'stimio/ ‘university’ [panepi̥'stimio]
    5. /metisfo'nes/ ‘with shouts’ [mɛtsfɔ'nes]

Having identified the five stages, Dauer calculates “reduction scores” for all unstressed /i/ (but not /u/ because of low frequency of occurrence), with lower scores corresponding to more extreme reduction. Stages 2 and 3, which represent short but fully voiced vowels and vowels with brief voicing but no formant structure, respectively, are collapsed into one category for this purpose, and Dauer is left with four possible scores, from 0 to 3, with 3 representing a full vowel and 0 representing complete elision.
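The scoring scheme just described can be sketched as a simple lookup from stage to score. This is a minimal illustration and not Dauer's own procedure. The text only states that 3 is a full vowel, that 0 is the most extreme reduction, and that stages 2 and 3 are collapsed. Assigning the score 1 to stage 4 is an inference from the four-point scale, and the function names are hypothetical.

```python
# Stage-to-score lookup for Dauer's (1980) five stages of reduction.
# Stated in the text: stage 1 (full vowel) scores 3, the most extreme
# reduction scores 0, and stages 2 and 3 share a single score.
# Assumption: stage 4 (no voicing, F2/F3 energy only) scores 1.
STAGE_TO_SCORE = {1: 3, 2: 2, 3: 2, 4: 1, 5: 0}


def reduction_score(stage):
    """Return the reduction score (0 to 3) for a stage (1 to 5)."""
    if stage not in STAGE_TO_SCORE:
        raise ValueError(f"unknown stage: {stage!r}")
    return STAGE_TO_SCORE[stage]


def mean_reduction_score(stages):
    """Average score over a set of tokens; lower means more reduction."""
    stages = list(stages)
    if not stages:
        raise ValueError("at least one token is required")
    return sum(reduction_score(s) for s in stages) / len(stages)


if __name__ == "__main__":
    # Hypothetical tokens from one phonetic context, coded by stage.
    print(mean_reduction_score([1, 2, 4, 5, 5]))
```

A mean score computed per context in this way would allow contexts to be ranked by degree of reduction, which is how the scores are used in the comparisons reported below.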

She finds that an unstressed vowel is most likely to be reduced when both preceded and followed by a voiceless consonant, and that it exhibits even lower scores when one of the flanking consonants is part of a cluster with another voiceless consonant. As for position with regard to the stressed syllable, an unstressed /i/ shows the lowest scores in post-tonic position and the highest scores in pre-tonic position. She does not find an effect for speech rate overall, although when she compares intra-speaker rate variation she finds that when a speaker is speaking at a faster rate than normal (subjective, and relative to that speaker’s normal speaking rate), they show more reduction. Dauer’s results suggest a possible role for frequency: she mentions that reduction occurs most frequently in common verb endings and clitics, as well as in common, everyday words, but this is not further evaluated. When it comes to phrase-final position, the author reports that reduction appears to be blocked by phrase-final lengthening, although she does note that at the end of a sentence with a falling pattern, devoicing is frequent.

Dauer motivates Greek unstressed vowel reduction with a phonetic account, noting that unstressed vowels in Greek are on average two-thirds shorter than stressed vowels, and vowel duration is also affected by phonetic context, with vowels shortening somewhat when flanked by voiceless consonants. The combination of these factors with the inherent shorter duration of high vowels (Lehiste 1970) leaves an extremely short (30 ms or less) period of time in which to begin the glottal adduction gesture necessary for voicing. Dauer invokes Lindblom’s statement that “the talker does not adjust the control of his vocal tract as [sic] fast rates to compensate for its response delay” (1963: 1775), and modifies that author’s undershoot hypothesis to account for Greek devoicing by hypothesizing that “as the vowel becomes shorter, there is less and less time for the vocal folds to complete their ‘on’ and ‘off’ gestures between successive syllables. If the action of the vocal folds lags behind that of the articulators and at the same time there is an attempt to anticipate the following sound, with a very short interval in between, such as for /i/ or /u/, the voicing ‘lag’ and voicing ‘lead’ overlap” (p. 27).

Dauer’s account constitutes an important contribution to the literature on vowel weakening processes, as one of the early gestural accounts of the process, and the first consideration of factors contributing to the “optionality” of the variants for Greek. Although her account precedes the full elaboration of the Articulatory Phonology (AP) theory (Browman & Goldstein 1989, 1992) by several years, her modification of Lindblom’s undershoot hypothesis (1963) accords well with the predictions that the theory makes about gestural overlap. There are some differences between the facts she describes for Greek and observations in the literature about Spanish vowel weakening, but several aspects of her research are important for the current study. First, her division of the process into five related stages with distinct phonetic correlates, and her organization of these based on degree of reduction has informed my own methodology in making the divisions for my own categorical analysis of voice weakening.

2.2.2 Japanese

Vowel devoicing is perhaps most well-studied for Japanese. As in Greek, Japanese devoicing targets high vowels /i/ and /u/ when preceded and followed by voiceless consonants. It exhibits regional variation, being frequent in dialects of eastern Japan, and infrequent in western dialects. It was traditionally described as a phonological rule that either categorically deletes the unstressed target vowel or changes the vowel’s phonological feature specification from [+voice] to [-voice]. For example, McCawley (1968: 127) provides the following SPE-type rule in (3) to account for the devoicing of high vowels /i/ and /u/ between voiceless consonants or word finally after voiceless consonants.

(3) [-cons +voc +dif] → [-voice] in environment [-voice] __ {[-voice] #}

Another categorical rule-based account was offered by Ohso (1973) which posits that the lack of acoustic evidence for the devoiced vowel is an indication that the process is more accurately described as deletion rather than devoicing. Ohso’s proposed rule is shown in (4).

(4) V [+high] → ∅ / [-voice] __ {[-voice] #}

However, the categorical rule-based account in (4) does not explain the finding in Beckman and Shoji (1984) that the minimal pair /sikaN/ and /sukaN/ is not completely neutralized and that the devoiced vowel is recoverable by listeners. Moreover, neither (3) nor (4) accounts for the gradience and variability of the process.

Beckman and Shoji (1984) examine perceptual and acoustic characteristics of syllables with high vowel devoicing (or apparent deletion, as they find no acoustic trace of voicing and no durational timeslot for the target vowel) in Japanese, specifically /si/ and /su/. They find that the apparently deleted vowel left traces in the spectrum of the fricative, and thus designed two perception experiments to determine the recoverability of the apparently deleted vowel. The results of their perception tests show that speakers could use the spectral coloring in the fricative as a cue to identification of the vowel as /i/ or /u/ at better than chance rates. These findings are used to support a model of speech production in which phonological units are inherently dynamic, stating that the recoverability of the vowel “suggest[s] that a supposedly lower-level coarticulation between the fricative and the vowel can occur before a higher-level process deletes the vowel, contradicting the order implied by traditional accounts of speech as a translation of discrete phonological units” (61). This important study was foundational in advancing the view that allophonic variation may simply be a more consistent form of temporal overlapping in gestures known as coarticulation.

Fujimoto, Murano, Niimi, and Kiritani (2002) investigate dialectal variation in glottal opening patterns for voiceless consonants in order to better understand the patterns for Tokyo Japanese, which exhibits high rates of devoicing, and Osaka Japanese, with low rates of devoicing. They use a photoelectric glottograph to measure glottal opening and closing gestures in the pronunciation of eight nonce words that varied the target vowel and voicing of surrounding consonants. Their participants are 3 adult male Tokyo speakers and 2 adult male Osaka speakers. They find that Tokyo speakers exhibited a significantly longer duration of glottal opening in /k/ when followed by voiced vowels in CV syllables than Osaka speakers, indicating dialectal differences in glottal opening durations for the preceding consonant when the vowel is voiced. They also report both a longer duration and larger glottal opening for the /k/ in devoiced /kite/ versus that of the voiced variant, /kide/. They relate these articulatory features to dialect differences in devoicing.

These two studies provide important empirical evidence of coarticulation in the acoustics and perception (Beckman & Shoji 1984) and the articulation (Fujimoto et al. 2002) of devoiced vowels in Japanese.

2.2.3 Korean

Data from Korean has been particularly influential in advancing the gestural overlap account of vowel devoicing, because Korean contrasts three categories of voiceless consonants produced with very different glottal opening gestures hypothesized to affect the amount of coarticulation with surrounding vocalic gestures (Jun & Beckman 1993, 1994). There is an aspirated stop, with a large glottal opening and peak opening timed around the oral release; an unaspirated or lenis stop, with a medium glottal opening and peak opening timed to end shortly after the oral release; and a stiffened or fortis stop, with a small glottal opening which ends in a tight glottal seal, or stiff adduction, right before the oral release. There is also a contrast between a fortis fricative (produced with a stiff glottis) and a plain fricative, which has a wide glottal opening whose peak is maintained longer than in the other sounds. Jun and Beckman (1993) report on the results of an acoustic experiment testing the effects of preceding stop type (aspirated, fortis, lenis) and accentual phrase position (initial or medial) on unstressed high vowel devoicing. The authors found that a preceding aspirated stop resulted in the most vowel devoicing on the following vowel, and fortis stops resulted in the least. The lenis stop was examined in both phrase positions and found to trigger devoicing in the following vowel when occurring in accentual phrase initial position (i.e. when the word was in focus). They take this line of research further in the following study (1994), where they investigate effects of the following consonant, in addition to the preceding one.

Jun and Beckman (1994) carry out an experiment that uses 125 short constructed dialogues containing 25 target words balanced for preceding and following consonant of the target high vowel, either /i/, /u/, or /ɨ/. They find that, as hypothesized, the most devoicing occurred following the non-fortis fricative and the aspirated stop, which are the consonants with the largest glottal openings. The authors also found that devoicing is conditioned not only by the degree of glottal opening of the preceding consonant, but also by the gesture accompanying the following consonant. They had hypothesized that a larger glottal opening in either or both should result in devoicing the intervening vowel, but the results indicated that a following fortis consonant was consistently associated with more devoicing than the corresponding lenis stop or plain fricative. This finding suggests that there must be another explanation for the devoicing, given that glottal aperture alone cannot explain the patterns found.

In order to confirm and further probe the acoustic results of Jun and Beckman (1994) with articulatory data, Jun, Beckman, Niimi, and Tiede (1998) examine the phenomenon using electromyographic activity level. Electrodes placed on muscles responsible for abduction and adduction of the vocal cords (the posterior cricoarytenoid, or the abductor muscle, and the thyroarytenoid, one of the muscles involved in adduction) measured the activity level of these muscles, as the speaker produced the same constructed dialogues referenced above for Jun and Beckman (1994). They found a similar distribution of devoiced vowels across consonantal contexts and prosodic positions as Jun and Beckman. They found that the degree of vowel devoicing is correlated with the degree of glottal opening of the adjacent consonant, that is, vowels are more likely to be devoiced when the preceding consonant has a large glottal opening, though the prosodic position is also important. When the preceding consonant type is held constant, vowels are more likely to devoice when they appear in accentual phrase initial position as compared to phrase medial position. There were differences found, however, based on whether the vowel was followed by a fricative or a stop, which the authors attribute to faster articulatory movements into stops. They conclude that devoicing depends not just on timing, but also on the magnitude of glottal opening gestures and their interaction with concurrent gestures associated with oral stricture and tonal pattern. Crucially, this study, along with the previous work of Jun and Beckman (1993, 1994), provides evidence that advances the gestural overlap account and that, most importantly, cannot be explained using an account that understands devoicing as a categorical phonological rule.

2.2.4 Montreal French

Cedergren (1985, 1986) investigates unstressed vowel reduction or what she calls “perceived high vowel deletion” in Montreal French, which actually corresponds to four distinct but related acoustic phenomena: 1) vowel reduction, where vowel duration was under 30 milliseconds; 2) vowel devoicing, where the vowel segment reveals no periodic voicing activity; 3) vowel assimilation, where formant-like energy appeared within the spectral envelope of the consonant therefore indicating probable cases of vowel-consonant temporal coarticulation; and 4) vowel syncope, where there were no acoustic traces that could be interpreted as a manifestation of an underlying vowel. The author investigates only the cases of vowel syncope, examining effects of various social and linguistic factors. Her data come from interviews with 60 speakers, ages 15 to 85, from different “levels of insertion in the linguistic marketplace”. She transcribed 15 minutes and impressionistically coded vowels as “syncopated” or “present”; the coding was then “calibrated by instrumental means” (1986: 294), though Cedergren does not elaborate on how calibration was done.

Among social factors, she does not find any effects for speaker sex or social level, but does find that younger speakers favor syncope significantly more than older groups, while there may be a weak effect for level of education, although the effect is not linear, indicating the highest probability coefficient for speakers with only a primary education (0.56), followed by those with a university education (0.48), followed by those with secondary education (0.43). As for linguistic factors, Cedergren divides constraining factors into three levels:

1) categorical preconditions determined by syllabic structure: The internal structure of both onset and rime limit syncope, i.e. neither branching onsets nor branching rimes allow syncope, so the only possible syllable type for syncope is a CV syllable.

2) variable output prosodic constraints on syllabic structure: Syncope is disfavored if the onset of the following syllable is complex. The author notes that although she does not thoroughly investigate the implications of this finding, it may reflect the operation of conditions on the resyllabification of stranded onsets following vowel syncope. She notes that this result, along with the constraint described in 1) related to complex onsets, are similar to the conditions previously described for schwa deletion in this and other varieties of French, which she interprets as evidence that high vowel syncope and schwa deletion may be manifestations of the same process, though she does not investigate that further here.

3) variable segmental constraints: Finally, Cedergren notes the contribution made by the voicing value of segmental context: preceding fricatives, non-coronal stops, and /l/ favor syncope, while following voiceless stops, nasals, voiced fricatives, /s/, and /l/ favor syncope.

The above results clarify the role played by segmental and prosodic structure in Montreal French high vowel syncope. Cedergren uses these findings to motivate her metrical grid account of the phenomenon. She proposes that first, a universal Demibeat Alignment rule applies, assigning a beat to each syllable, then a Basic Beat Rule of Montreal French, which assigns a basic beat to each complex rime projection whose nucleus composition corresponds to a vowel higher in sonority than /ɪ, y, u, ə/, then a Second Basic Beat Rule which assigns a beat to the right-most syllable. Finally, a Main Word Stress rule assigns a beat to the right end of the lexical domain. The application of the rules at each level of the metrical grid results in a general rule that stipulates the context for deletion as any unstressed syllable, that is any syllable unaligned to a basic beat.

Cedergren’s metrical grid account seems to work relatively well for her data, although it accounts better for the prosodic constraints than the segmental constraints she finds. Additionally, it does not necessarily account for the variable nature of the process, that is, once a context for deletion has been stipulated, i.e. an unstressed CV syllable in a position other than final, there is no further explanation for why a vowel may or may not be deleted. However, the methodology for the study is sound, and the author’s division of reduction into four distinct but related acoustic phenomena informs the categorical division that I ultimately choose for my analysis.

2.2.5 Turkish

Jannedy (1995) investigates high vowel devoicing in Turkish, which affects both stressed and unstressed /i, y, ɨ, u/. The segmental and prosodic factors analyzed were speech rate, stress, preceding environment, following environment, vowel type, and syllable type. Because Jannedy’s research follows from the findings for Korean (Jun & Beckman 1994), which suggest a strong role for the glottal gesture of the preceding consonant, she needed to establish patterns for the duration of aspiration following the release of voiceless stops. To that end, she also carries out two duration experiments: one for Voice Onset Time (VOT) duration in which she measures VOT for /p t k/ before non-high vowels, and one for vowel duration in order to establish the basic durational facts for Turkish. The VOT duration experiment consists of measurements of word-initial /p, t, k/ and word-medial /t, k/ from a speech sample consisting of 28 minimal pairs or near minimal pairs read in citation form and in carrier phrases by five educated male speakers of Istanbul Turkish. The results, compared with cross-linguistic VOT trends, show that Turkish VOT is considerably longer on average than VOTs reported for both Korean and Japanese stops, which could explain the slightly different devoicing pattern Jannedy finds for Turkish stops versus fricatives. The vowel duration experiment indicates that Turkish vowels are significantly longer in closed versus open syllables.

Having established the basic facts about Turkish VOT and vowel duration, Jannedy is able to more accurately interpret the results for devoicing. She uses recordings made of 9 speakers reading 135 words positioned utterance-initially in carrier phrases, repeated three times at speaker-determined slow, medium and fast speech rates. She classifies each token according to one of three categories based on her analysis of the acoustic information in the spectrograms, similarly to the way Jun and Beckman (1994) did for Korean. These are: “voiced (clear voicebar with several glottal pulses), partially voiced (one or two faint glottal pulses) or completely devoiced vowel (no glottal pulses visible on spectrogram)” (63). In her Varbrul analysis, however, the partially voiced tokens are grouped with the voiced tokens because of the necessity of comparing a binary opposition for the logistic regression. Jannedy finds that faster speech rates and voiceless preceding environments favor devoicing the most, but also that stress, syllable type, and following environment were significant predictors of devoicing. She discusses each of these in turn and provides illustrations of hypothetical changes in gestural phasing, showing different degrees of overlap for each. As expected, faster speech rates correspond to more devoicing, as do vowels in unstressed syllables, being shorter than stressed vowels. In contrast to findings for other languages, she finds more devoicing in open syllables than closed ones, which she attributes to the differences in duration: vowels in Turkish open syllables tend to be shorter than those in closed ones as well. Also departing from cross-linguistic trends is her finding that a preceding stop is more likely to condition devoicing than a preceding fricative in Turkish, which is true when the consonant follows the target vowel as well. Jannedy concludes that her findings accord well with a gestural overlap account, in which laryngeal gestures pertaining to consonants “overlap or blend with the glottal adduction gesture of the preceding or following vowel” (80). She also suggests that some of her findings point to language-specific timing relations, which is a point that Delforge addresses (2008b, 2009) and is discussed below for Andean Spanish.
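The three-way classification and its collapse into a binary variable can be sketched as follows. This is an illustration under stated assumptions and not Jannedy's procedure: she classified tokens by inspecting spectrograms, whereas this sketch assumes a hypothetical count of visible glottal pulses per token and treats three or more pulses as “several”. The function and parameter names are likewise hypothetical.

```python
def classify_voicing(glottal_pulses, several=3):
    """Assign one of three voicing categories from a pulse count.

    Assumption: 'several' glottal pulses means at least `several`
    (3 by default); one or two pulses count as partially voiced.
    """
    if glottal_pulses < 0:
        raise ValueError("pulse count cannot be negative")
    if glottal_pulses == 0:
        return "devoiced"
    if glottal_pulses < several:
        return "partially voiced"
    return "voiced"


def is_devoiced_binary(category):
    """Collapse to the binary opposition used for logistic regression:
    partially voiced tokens are grouped with voiced tokens."""
    return category == "devoiced"


if __name__ == "__main__":
    # Hypothetical pulse counts for five tokens.
    for pulses in [0, 1, 2, 3, 8]:
        category = classify_voicing(pulses)
        print(pulses, category, is_devoiced_binary(category))
```

The sketch makes the cost of the binary grouping visible: a token with one faint pulse and a token with a clear voice bar receive the same value, so any factor that favors partial devoicing cannot be detected in the regression.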

Jannedy’s study has been influential in the cross-linguistic devoicing literature, providing important evidence for the gestural overlap account. Although her spectrographic analysis indicates that the process is gradient, i.e. she finds partial devoicing, her statistical methods could only account for a category with two levels, so partially devoiced tokens had to be considered as voiced, a decision which risks understating or overlooking possible effects. The present study avoids this by including a continuous analysis of voicing in addition to a categorical analysis. Her method of gathering and measuring speech rate, which, like that of Dauer (1980), accounts for intraspeaker variation in rates rather than using an outside measure, is adopted in Dabkowski (2017), where I investigate the effects of speech rate on vowel reduction in a reading task, discussed further in 4.2.

2.2.6 Portuguese

Vowel devoicing and deletion is found to varying degrees in European and Brazilian varieties of Portuguese, in particular in word-final post-tonic position, and often co-occurring with vowel raising.

2.2.6.1 Brazilian Portuguese

Lipski (1990: 15) suggests that vowel reduction in Brazilian Portuguese (henceforth BP) acts similarly to (discussed below). While he makes mention of Major’s (1985) demonstration that BP post-tonic vowels are weaker than pre-tonic vowels, it is unclear exactly what other evidence this observation is based on, though he may be referring to the observation made by Nobre and Ingemann (1987) in their investigation of vowel reduction in BP. They measured duration and formant values for stressed, pre-tonic, and unstressed final vowels for a total of four speakers of three regional varieties of BP (Rio de Janeiro, Rio Grande do Sul, and Minas Gerais), but did not report on any regional differences, likely due to the small size of the sample. The stimuli measured were all words where the vowel of interest would occur in /s_k/ or /ʃ_k/ contexts, but after experiencing difficulty in measuring formant values for the unstressed final vowels, since so many were devoiced, they added another set of stimuli where the unstressed final vowel was between two voiced consonants in order to compare formant frequency results to previous findings. As for duration, the authors report that the average duration of post-tonic (final) vowels in two-syllable words is 38% of the average stressed vowel duration, while the duration of pre-tonic vowels in three-syllable words is 46% of that of stressed vowels.

Meneses and Albano (2015) have recently reported new results from a series of experiments investigating the acoustics, aerodynamics and perception of final post-tonic devoicing following /s/ in the variety of Brazilian Portuguese spoken in Bahia. Their most noteworthy finding is that none of the results support previous claims of vowel deletion in this position. They find that these final vowels are most often partially devoiced, showing high degrees of centralization, which is inversely proportional to duration: more centralized vowels are shorter than less centralized ones, indicating that temporal reduction and weakening are related. For completely devoiced vowels, they find evidence of the vowel quality in the fricative: the /s/ segment is lengthened, and has a final vowel-like portion which is too short to be voiced, and the /s/ centroid is lowered.

This /s/ lengthening also favors recovery of the vowel in the perception experiment.

Meneses and Albano very briefly attempt to interpret their findings in line with the predictions of Articulatory Phonology, noting that their results highlight the importance of the temporal dimension of articulation.

Meneses and Albano (2015) provide crucial evidence that much of what has been traditionally described as post-tonic vowel deletion is, in fact, a gradient process ranging from reduction involving centralization and partial devoicing to total devoicing.

However, their ability to conclude that deletion does not occur is limited to one phonetic context: open syllables following /s/. My own impression is that the context that results in the most vowel weakening in BP, be it raising, devoicing or deletion, is that in which the post-tonic vowel follows an affricate or a stop, as in the examples given by Mendes and Walker (2012): junto “together”, and bastante “a lot”. Further research on BP should address phonetic context, especially since Mendes and Walker claim that more extreme devoicing and deletion are on the rise in São Paulo Portuguese.

2.2.6.2 European Portuguese

Silva investigated vowel devoicing and deletion in the varieties of European Portuguese spoken in the Azores, on the islands of Faial (1997) and São Miguel (1998). For Faialense, he recorded one speaker and focused on explaining the phonological contexts for deletion, whereas for São Miguel, he was additionally interested in the social patterning of devoicing and deletion. I focus here on his findings for the linguistic patterning of the phenomenon. He finds that unstressed [u] and [ə], the neutralized unstressed realizations of /o, ɔ, u/, and /e, ɛ/, respectively, were the most likely vowels to be deleted, and he highlights word-final position at the end of an utterance as the prosodic context most favoring deletion, and word-initial position as the context most discouraging reduction. He characterizes deletion as a process controlled by both prosodic and feature-based effects, while calling devoicing a “‘domain-edge’ effect” (1998: 172) that is more widespread, irrespective of segmental context. In the 1997 paper, Silva only refers to deletion (though he mentions one “odd case” of devoicing), and in the 1998 paper he makes a distinction between devoicing and deletion. However, since Silva’s analyses were based on impressionistic auditory categorizations, that distinction may not be meaningful. In the absence of articulatory data, or in this case even acoustic data, it is not possible to definitively distinguish the two. That limitation notwithstanding, the author’s findings, especially those related to prosodic contexts for reduction, are important for considering similar processes in Spanish.

2.2.7 Summary of cross-linguistic findings

The studies reviewed above represent a sample of the most foundational research on vowel reduction that involves devoicing, voice weakening, and shortening across languages. In all of the languages reviewed above, the main targets of voice weakening are high vowels in unstressed positions. As we have seen, segmental context plays an important role in this process: voiceless flanking consonants (both preceding and following) result in more voice weakening for vowels across languages. Prosodic context also has an important role to play: several of the above studies also find greater rates of voice weakening in post-tonic, word-final, and phrase-final positions. With these findings in mind, we now turn to how vowel reduction manifests in Spanish.

2.3 Vowel reduction in Spanish

For Spanish, unstressed vowel reduction and devoicing have been documented and described most extensively for Andean Spanish (Delforge 2008a, 2008b, 2009, 2012, Gordon 1980, Hundley 1983, Lipski 1990, Sessarego 2012a, 2012b), and Central Mexican Spanish (Boyd-Bowman 1952, Canellada & Zamora Vicente 1960, Lope Blanch 1963, Matluck 1952, Serrano 2006). In this section, I review the findings and offer critical summaries of those studies.

While this study may uncover differences in the behavior of vowel weakening in Mexico and in the Andes, Delforge (2008a, 2009) notes that an examination of the research that has been done to date shows that Spanish unstressed vowel devoicing appears to exhibit the same basic characteristics in both regions: a high degree of inter- and intra-speaker variability, mid-vowel targets /e/ and /o/ (as opposed to /i, u, a/) in similar consonantal and prosodic contexts, and prevalence in rapid speech (although Delforge’s results indicate that the phenomenon is not limited to fast speech).

2.3.1 Vowel reduction in Andean Spanish

Prior to Delforge’s in-depth analysis, only three studies had dealt with Andean Spanish vowel reduction. Gordon (1980) is the first study to mention the phenomenon; however, because his goal was a general description of pronunciation, his discussion of vowel reduction is very brief. In 1968, the author tape-recorded interviews with 118 participants from all parts of the country and representing all levels of education and social classes. He observes that stressed vowels are quite stable, but that unstressed vowels are weakened or disappear completely in the speech of more than half of his participants from the highlands. He describes it as a variable process, observing that it does not uniformly appear in the speech of these individuals, and that while it tends to be used in more informal speech, it was not absent from formal speech, but he does not provide a statistical analysis. He reports several frequencies for segmental contexts of occurrence, following Lope Blanch’s investigation for Mexico: 80.5% of weakened or lost vowels appear between a consonant and /s/, and 71% between a voiceless consonant and /s/, nearly equal to Lope Blanch’s findings (83% and 71%, respectively). Unlike Lope Blanch, Gordon finds that the consonantal context most favoring weakening was tVs, accounting for 28.3% of the weakened tokens. Lope Blanch found that sVs accounted for 22.9% of weakened tokens, whereas for Gordon’s data, this context appeared in only 13.2% of weakened tokens. He attributes this to Lope Blanch’s inclusion of the filler word entonces “then”, which he notes is not as common in Bolivian speech as in Mexican speech. Interestingly, he also reports a few tokens of weakening in the speech of participants from the lowlands, but those occur exclusively after the affricate /tʃ/. Finally, Gordon reports that /e/ is the most frequently weakened vowel, followed by /o/, then /a/ and /i/. He leaves out /u/, implying that he does not find any weakened tokens of it.

Gordon’s data provide a starting point for a serious investigation of the process but are limited by a lack of inferential statistical analysis, and the lack of investigation of any linguistic factors that may influence the process, other than segmental context. Additionally, he does not attempt to characterize the process itself, referring to it simply as “debilitamiento o hasta la desaparición total” ‘weakening or even complete disappearance’. He refers to devoicing only when referencing the few tokens following /tʃ/ in the speech of lowland participants, so it is unclear whether weakened vowels are primarily devoiced or weakened in other ways.

Hundley (1983) performs a variationist analysis investigating the behavior of unstressed vowels adjacent to /s/, based on tape-recorded interviews with nine male residents of Cuzco between the ages of 50 and 65, three each from lower, middle, and upper-middle socioeconomic groups. The author groups vowel tokens into three categories: full, weak, and absent. His weak category includes vowels that are short, voiceless, or reduced, or some combination of the three. In his data, he finds that 37% of mid vowels are weakened as compared to 27% of high vowels, and 20% of low /a/. Because Hundley only investigates unstressed vowels in contact with /s/, the only contexts he reports are Vs and sV, finding no significant difference in weakening based on the position of the vowel relative to the /s/, nor any significant difference between closed and open syllables. He finds that a weak or absent token is much more likely when the other consonant is voiceless, even more so when that voiceless consonant is coronal. He also mentions that his informants were more likely to produce a weakened or deleted variant after an affricate. Word-final position as well as utterance-final position are found to significantly predict weakening, though not deletion. For a subset of data, Hundley analyzes speech rate quantitatively, defining fast and slow speech as that which contains 6.0 to 6.4 syllables per second and 4.0 to 4.4 syllables per second, respectively. He finds that devoicing is more frequent in fast speech. The only speaker characteristic Hundley examines is social class, which he operationalizes by using the speaker’s occupation. He finds that weakening and devoicing are more frequent among lower and middle-class speakers, while only lower-class speakers show evidence of deletion.
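The speech rate bands reported for Hundley can be sketched as a simple classification of a stretch of speech by its syllables per second. This is only an illustration under stated assumptions: the function name and the "unclassified" label for rates outside the two bands are hypothetical, since the source does not say how such stretches were handled.

```python
from typing import Tuple

# Bands as reported above, in syllables per second
FAST_BAND: Tuple[float, float] = (6.0, 6.4)
SLOW_BAND: Tuple[float, float] = (4.0, 4.4)


def rate_category(n_syllables: int, seconds: float) -> str:
    """Label a stretch of speech as 'fast', 'slow', or 'unclassified'.

    'unclassified' is an assumed label for rates outside both bands.
    """
    if seconds <= 0:
        raise ValueError("duration must be positive")
    rate = n_syllables / seconds
    if FAST_BAND[0] <= rate <= FAST_BAND[1]:
        return "fast"
    if SLOW_BAND[0] <= rate <= SLOW_BAND[1]:
        return "slow"
    return "unclassified"


fast_example = rate_category(31, 5.0)   # 6.2 syllables per second
slow_example = rate_category(21, 5.0)   # 4.2 syllables per second
other_example = rate_category(26, 5.0)  # 5.2 syllables per second
```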

Hundley’s study makes several important contributions: it is the first variationist analysis of the phenomenon in Spanish, and the first to provide a systematic analysis of speech rate. The major limitation of the study is the sample itself: the small number of speakers interviewed and their restriction to older males mean that he cannot analyze the roles of speaker sex or age, so it is unknown whether the process represents a change in progress or stable variation. Also, as Delforge (2009) notes, the fact that six of the nine speakers were either from rural areas outside Cuzco or from Lima is concerning. Their varied provenances may limit the extent to which their speech is representative of the Cuzco urban area. Finally, Hundley’s decision to treat weakening separately from deletion is questionable, because without performing acoustic or articulatory analysis, he cannot provide evidence that these realizations are in fact separate phenomena.

Lipski (1990) investigates Ecuadorian Spanish unstressed vowel reduction (UVR), which he defines as a series of related intermediate stages on the way to deletion, including devoicing, shortening, and centralization. His data are from recorded interviews (presumably designed to elicit casual speech, but he does not specify) with speakers from several cities in Ecuador (Ibarra, Quito, Riobamba, Cotopaxi, Cuenca). His sample includes twenty male and female middle-class speakers ranging in age from 18 to 80. His spectrographic analysis leads him to interpret devoicing as an initial stage of UVR, representing the mildest form of reduction, followed by shortening and centralization, and finally deletion. He finds that UVR is “usual only in contact with /s/” (1990: 3), and that the unstressed vowel’s location in the word is a factor contributing to the patterns. With regard to the target vowels for reduction, Lipski observes that vowels targeted by UVR differ according to word position: word-finally /e/ was most affected, followed by /o/ and /a/, but word-medially, only /e/ and the high vowels /i/ and /u/ were affected. As for the role of /s/, Lipski finds that in word-initial and word-medial syllables, UVR occurs at approximately the same rates either preceding or following /s/, provided that /s/ is syllable-initial, which appears to indicate that higher rates of UVR occurred in open syllables. Lipski also notes that morphemes ending in /s/, such as the plural morpheme –es and the first person plural verb ending –mos, are frequently predictable from sentence context, and therefore can be more readily reduced.

Lipski analyzes the above results using an articulator-based model of feature geometry, which organizes features in a hierarchy according to articulator nodes. Under the traditional feature geometry model, /i/ and /e/ engage the dorsal node, while /s/ engages the coronal node, but there have been proposals such as Clements (1976) that /i/ and /e/ are actually [+coronal], while others (Keating 1988: 276) analyze /s/ as [+high]. Lipski takes this as evidence of the similarity of the segments, and proposes that UVR can be modeled by assuming that the target vowel loses its [-consonant] specification and then, because of the features shared between the vowel and consonant, [+coronal] and [+continuant], the vowel’s place node delinks from its root (which is the node containing the laryngeal features) and reattaches to the root node of the consonant.

Lipski’s study is an important contribution to the understanding of the patterning of UVR in Ecuador, as well as to the phonology of variable vowel reduction in Spanish.

His is the first study to report on the difference in vowels targeted based on word position. His phonological analysis accounts well for cases of UVR involving /e/ or /i/ in contact with /s/, but it does not address other target vowels, or consonantal contexts not involving /s/.

The most comprehensive sociolinguistic analysis of the phenomenon of vowel devoicing in Andean Spanish is Delforge (2009, 2012). In addition to performing a detailed acoustic analysis of unstressed vowel devoicing in Cuzco, Delforge investigates the sociolinguistic patterning of unstressed vowel devoicing in Cuzco Spanish, focusing on its changing rates of occurrence over time, and its social evaluation. To analyze the production of vowels in this variety, Delforge uses 2-minute segments of recorded spontaneous speech data from sociolinguistic interviews carried out with 150 residents of Cuzco, including 75 men and 75 women evenly divided into four age groups: those born before 1950, a cutoff that is relevant because the earthquake of that year brought many social changes to the area; those born between 1950 and 1964, who grew up during the time that the most social change to the area took place; those born between 1965 and 1979; and those born between 1980 and 1987. Speakers were also nearly evenly divided into four socioeconomic groups (lower, middle, upper middle and upper class). The sample yielded nearly 50,000 tokens, 9.5% of which were classified as reduced.

Observing the range of phonetic variation in the reduction of unstressed vowels, Delforge divides reduced vowels into four categories, contrasting these with a completely voiced vowel and noting that the categories are not mutually exclusive but are divided as such to serve an analytic purpose. Each category and its acoustic characteristics are shown below in Table 2.3. Of the reduced tokens, 70% were completely devoiced, 16% were weakly voiced, 5% were partially devoiced/shortened, and 9% were apparently elided. It is also important to note that Delforge compared F1 and F2 values in 1600 stressed and 1600 unstressed fully voiced tokens, as well as 361 unstressed reduced/devoiced tokens (because these were the only reduced tokens with sufficient formant structure to allow for measurement) and did not find significant differences. This comparison allowed her to provisionally conclude that Andean unstressed vowel reduction is primarily devoicing and does not entail any changes in vowel quality.

Table 2.3. Delforge’s categories of reduced vowels

Category of reduction            Acoustic characteristics of vowel
Partially devoiced / shortened   Voice bar of 30 ms or less
Weakly voiced                    Slightly longer but faint voice bar and lacking clear formant structure
Completely devoiced              No glottal pulses present but some energy observed in first and second formants, syllable not temporally reduced
Apparently elided                Absence of formants and voice bar, syllable does appear to be shortened

In her analysis of the distribution of devoicing according to linguistic factors, Delforge determined that the best approach is to evaluate only tokens from speakers she identifies as frequent devoicers, as opposed to non-devoicers, since overall reduction rates were so low. To identify vocalic targets for reduction, Delforge takes frequency rates of occurrence for Spanish vowels into account, and finds that /e/ and the high vowels are the most frequent targets, though targets vary according to word position. Word-medial high vowels and /e/ are affected at much higher rates than /o/ and /a/, but in word-final position, /e/ is not significantly more likely to be affected than other vowels.

As for the surrounding segmental environment, Delforge finds that 70% of her devoiced tokens were adjacent to /s/, and the remainder occurred in contact with other voiceless consonants: stops /p t k/, affricates /tʃ/ and /tř̥/, other fricatives /x/ and /f/ and voiceless assibilated /ř̥/. Delforge omitted those tokens occurring in between voiced consonants from her analysis on the basis of lack of any devoiced tokens in that context. She concludes that devoicing cannot be entirely attributed to /s/, but that voiceless consonants more generally contribute to the process.

As Delforge (2009) mentions, there are multiple factors that may contribute to this finding that /s/ accounts for so much of the devoicing in her corpus: primarily that 63% of Spanish words are paroxytones (Nuñez Cedeño and Morales Front 1999), and the fact that /s/ often functions as a plural marker, which indicates a possible role for type frequency. Lipski (2009) also mentions a possible role for information status, highlighting the predictability of the plural morpheme –es and the first person plural verb ending –mos based on sentence context. He hypothesizes that this predictability facilitates reduction in these contexts.

Delforge finds that 60% of the devoicing in her sample occurs word-finally, and devoicing in this word position is often more severe, falling into either the “completely devoiced” or “apparently elided” categories, which represent the most extreme cases of reduction. Delforge further examines word-final position, observing different rates for target vowels in three different devoicing patterns. Sandhi devoicing, in which the consonant following the devoiced vowel forms the onset of the following word, as in (5a) below, shows devoicing rates similar to word-medial position, with /e/ and the high vowels showing the highest rates. However, devoicing in syllables closed by /s/ (5b) and in open pre-pausal syllables (5c) exhibits high rates for all vowels, except for /u/ in word-final pre-pausal syllables, of which there were only 7 tokens. An example of each type, adapted from Delforge (2008a), is shown below in (5).

(5) a. Sandhi: [tɾaxe̥ típiko] traje típico “typical costume”

b. Closed by /s/: [ésto̥s] estos “these”

c. Open pre-pausal: [kusko̥]# Cuzco “Cuzco”

With regard to phrase position, Delforge distinguishes between intonational phrase-final and utterance-final position, and reports different behavior based on interaction with the three word-final patterns mentioned above in (5). She finds that devoicing in open, pre-pausal, word-final syllables was more frequent in utterance-final than in intonational phrase-final position (21% vs. 13%), while devoicing rates in word-final syllables closed by /s/ were 22% phrase-medially, 17% in intonational phrase-final position, and 15% in utterance-final position. Thus, her results indicate that prosodic boundaries may interact differently with open versus closed syllables, with a word-final vowel next to a higher prosodic boundary more likely to devoice if the syllable is open, and less likely to do so if the syllable is closed.

Delforge is cautious about making strong claims regarding the effects of speech rate, because of the lack of consensus regarding conversational speech rates (Laver 1993), but she does find at least weak support for the lack of dependence on speech rate for Cuzco Spanish unstressed vowel devoicing (UVD), noting that it was “commonly observed in very slow speech during these interviews, even at rates between 2 and 3 syllables per second” (Delforge 2009). Because of this finding, it is important that future research investigate speech rate as a possible factor in UVD in a systematic way.

She identifies the 1950 earthquake in Cuzco as a landmark event in the changing demographics of the city, and hence, a catalyst for shifting linguistic attitudes. With the rebuilding of the city, Cusqueñans were suddenly confronted with two major groups with whom they had not had extensive contact: rural migrant workers (who tended to be of indigenous ancestry and often spoke Quechua) and tourists, many of whom spoke dialects other than Andean Spanish. Whereas devoicing was frequent and widespread among Cusqueñans prior to the earthquake, it began a steady decline in use from that point forward. To understand that decline, Delforge makes use of the construct of apparent time by surveying speakers from four different age groups, divided as they were to reflect the differential access they would have had to non-native Cusqueñan speech.

Her conception of social class, like Hundley’s, is based on occupation, following previous sociological research indicating that except for the elite, who are relatively easily identified, the best indicator of prestige in Cuzco is occupation (Van den Berghe and Primov 1977). By combining that research with Larson and Bergman’s (1969) description of job categories and social stratification in Peru, Delforge was able to establish correspondences between occupations and social class such that “occupations such as street vendor, cook, waiter, and construction workers were assigned to the lower-class category... teachers, secretaries, tour guides, small business owners or in similar occupations were identified as middle class” (321), while doctors, dentists and lawyers were determined to be members of the upper-middle class.

All speakers, both frequent devoicers and non-devoicers, were included in the statistical analysis of speaker characteristics. Results indicated age, class, and gender as significant individual predictors of devoicing, as well as the interaction between class and age, and the interaction between all three factors: gender, class and age. The highest devoicing percentage (16.67%) is found among the oldest age group, and it declines steadily as age decreases, to a minimal 5.74% in the youngest group of speakers. This difference reflects the fact that devoicing is receding from Cuzco speech. Social class was significant as well, with upper class speakers devoicing at very low rates, below 4%, for all except the oldest group of speakers. The interaction between class and age reflects the spread of the decline over time: the oldest speakers devoice at similar rates across class, while a very large difference is seen in the upper class group starting with the second oldest group of speakers (those born between 1950 and 1964, the fifteen years following the earthquake), and spreading to upper middle and middle class speakers in the next youngest group, born between 1965 and 1979, and finally to the lower class speakers in the youngest age group, who still devoice at a rate more than double that of the upper class speakers.

Gender was independently significant, with women devoicing at lower rates than men, but it also presented a significant three-way interaction between age, class, and gender, which Delforge explains “...captures the fact that upper-middle-class and middle-class women born between 1965 and 1979 adopt the rejection of devoicing initiated by both male and female upper-class speakers born from 1950 to 1964 to a much greater extent than do upper-middle and middle-class men in the same age group” (325), indicating that women more rapidly eliminate devoicing from their speech, thus moving toward the prestige norm, i.e. fully voiced vowels.

In order to understand the language attitudes and ideologies present in Cuzco that may account for the recession of vowel devoicing, Delforge also carries out a perception experiment using speech samples from the production data she had collected, in which participants listened to a speaker who devoiced many vowel segments, and were asked open-ended questions to discover: “what they thought of this way of speaking, if they associated this pronunciation with any particular type of person and, if so, what kind of occupation, level of education, social class and other characteristics would best describe them.” (2012: 327). Findings from this part of the study indicate that the age groups differ in their awareness of devoicing, and their impressions of its connection to migrant speech: the oldest speakers were unaware of the feature and thus had no associations with it, while 60% of the middle age group was aware of the feature, but only half of those associated it with provincianos, “Quechua-accented rural migrants” (311), a stigmatized social type in Cuzco. However, 80% of the youngest age group was aware of the feature, and all but one of them associated it with provincianos rather than with their regional variety.

As Delforge (2008a, 2008b, 2009, 2012) and others have shown, variable unstressed vowel devoicing is an interesting phenomenon not only for sociolinguistic study, but also for phonetic and phonological theory. In Cuzco, rates of devoicing are receding, and its patterning is characteristic of a change in progress, specifically a change from above. Indeed, Delforge’s analysis of attitudes toward devoicing indicates that, for Cusqueñans, there is a certain amount of social stigma associated with it.

Focusing on its phonological implications, Delforge’s study (2008a, 2008b, 2009) gives the most thoroughly elaborated model of unstressed vowel devoicing by using Gafos’ (2002) gestural alignment schema, which integrates the Articulatory Phonology (Browman & Goldstein 1989) and the Optimality Theory (Prince & Smolensky 1993/2004) frameworks. This schema proposes that each gesture contains a set of dynamic reference points, or landmarks, which are: Onset, Target, Center, Release, and Offset. These landmarks are then aligned in speech along the temporal dimension via a series of gestural alignment constraints that specify how landmarks of one gesture line up with those of a preceding or following gesture. The relevant constraints for vowel devoicing are CV COORD and VC COORD, which specify the alignment of gestures corresponding to each segment in a CV sequence, and those corresponding to a VC sequence. In this view, devoicing arises from differences in gestural alignment.

In order to account for not only the characteristics of Andean Spanish UVD that accord with cross-linguistic trends, but also those that differ from cross-linguistic trends, i.e. its occurrence independent of speech rate, and the fact that it occurs in non-high vowels, Delforge proposes that CV COORD and VC COORD show cross-linguistic variation in that Andean Spanish allows more overlap between gestures pertaining to adjacent vowels and consonants than other languages, which explains why devoicing in this variety is not dependent on speech rate. In order to account for /e/ being affected more than /o/, Delforge appeals to the homorganicity of adjacent segments, proposing a constraint called “*OVERLAP V//CHET” which states that “The plateau of a consonant may not overlap the plateau of an adjacent heterorganic vowel” (153). Using these types of constraints, Delforge is able to account for patterns not explained by the interaction of decreases in the temporal distance between gestures based on speech rate. Finally, high devoicing rates for unstressed vowels preceding coda /s/ are accounted for in this model by constraints governing each consonant’s intrasegmental organization.

Delforge’s study is extremely comprehensive and addresses both linguistic and social factors in a detailed and intentional way. Her methodology and findings serve as motivation for the present study, which seeks to replicate certain aspects of her investigation while improving upon others.

Finally, Sessarego (2012a, 2012b) also explores Andean Spanish vowel devoicing, carrying out two studies in Bolivia, one among Afro-Bolivian communities in the Yungas region of the department of La Paz (2012a), and the other in Cochabamba (2012b). For each study, Sessarego interviewed twelve speakers, selected on the basis of age and gender, with social class being held constant. The speech recorded and analyzed for the Afro-Yungueño (AY) study (2012a) was informal interview speech, and in Cochabamba, a reading task was administered and recorded. The classification of weakening was the same for both studies: a vowel is considered “completely devoiced”1 when there is no visible voice bar throughout its total duration, “partially devoiced” when the duration of its voice bar is not longer than half of the total duration of the vowel, “completely voiced” when the length of its voice bar is more than half of its total duration, and apparently elided when there is no temporal evidence for a vowel in the spectrogram.
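The four-way classification summarized above can be sketched as a small decision procedure. The sketch assumes that each token has a measured vowel duration and voice bar duration, treats a vowel duration of zero as the absence of temporal evidence for a vowel, and uses the reading of the “completely devoiced” criterion surmised in the footnote. The function name is hypothetical and the code is not taken from Sessarego's studies.

```python
def classify_voicing(vowel_ms: float, voice_bar_ms: float) -> str:
    """Assign a token to one of the four categories described above.

    vowel_ms:     total vowel duration (0 = no temporal evidence of a vowel)
    voice_bar_ms: duration of the visible voice bar within the vowel
    """
    if vowel_ms < 0 or voice_bar_ms < 0:
        raise ValueError("durations cannot be negative")
    if vowel_ms == 0:
        return "apparently elided"
    if voice_bar_ms == 0:
        return "completely devoiced"   # no visible voice bar at all
    if voice_bar_ms <= vowel_ms / 2:
        return "partially devoiced"    # voice bar not longer than half
    return "completely voiced"         # voice bar longer than half


labels = [
    classify_voicing(0, 0),
    classify_voicing(50, 0),
    classify_voicing(50, 25),
    classify_voicing(50, 26),
]
```

A token whose voice bar is exactly half of the vowel falls into the “partially devoiced” category, following the “not longer than half” wording.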

Sessarego’s results indicate very low rates of devoicing in both communities: 3.2% of unstressed vowels were reduced for AY, and 5% for Cochabambino. In AY, mid vowels /e/ and /o/ are targeted most frequently, and more so in word-final syllables, while /a/ is nearly never reduced. In Cochabambino, the high vowels /i/ and /u/ are the most frequent targets, though they never appear word-finally in this data set, so that in word-final syllables, /e/ and /o/ are the most frequent targets. The majority of devoicing takes place in vowels adjacent to /s/, followed by other voiceless consonants, for both varieties.

1 This characterization must be an error on Sessarego’s part, although it is repeated in both studies (2012a, b). I believe the description he gives “vowels were classified as ‘completely devoiced’ when the length of their voice bar was more than half of their total duration” (2012a: 283, 2012b: 217, emphasis mine) is in fact meant to characterize the criteria used in classifying a vowel as “completely voiced”. Because of this error, there is no description given of what he considered to be a completely devoiced vowel, although the figures provided to illustrate this category (2012a: 284, 2012b: 216) show spectrograms with no discernible voice bar at any point throughout the vowel’s duration. Based on the figures and his descriptions of other categories, I surmise that he intended the description to be “vowels were classified as ‘completely devoiced’ when there was no visible voice bar throughout its total duration”.

With regard to social factors, in AY Sessarego finds that the oldest and youngest speakers have the lowest rates of devoicing, and that, while gender on its own is not significant, there is a significant interaction between age and gender, whereby women from the older and middle age groups exhibit significantly more reduction than men. For Cochabamba Spanish, Sessarego found no effect for gender, but an effect for age, in that older speakers reduced at rates more than double that of younger speakers (7% vs. 3%).

Sessarego’s studies provide some evidence that devoicing is less common in these Bolivian communities than in other Andean communities studied. However, his findings are based on only 66 weakened tokens for Afro-Yungueño and 147 for Cochabambino. With such low rates of weakening, the sample is too small for a correlational analysis of linguistic patterning, since some categories contain only 1 or 2 tokens. The results of these studies should therefore be taken as preliminary evidence, to be corroborated with further investigation.

2.3.2 Vowel devoicing in Mexican Spanish

The investigation of variable unstressed vowel reduction in Mexico encompasses only a handful of studies: four early studies all completed prior to the advent of variationist methods, and one small-scale study from 2006. It is surprising that Central Mexican UVR has not received a more thorough and systematic treatment, especially because, as Perissinotto notes (1975:26), the weakening or loss of unstressed vowels is one of the most salient characteristics of pronunciation in Mexico City.

The earliest descriptions of vowel devoicing in Central Mexico (Boyd-Bowman 1952, Matluck 1952) were done without the benefit of recording devices, and therefore based on auditory impressions made in the moment. Matluck used the cuestionario lingüístico hispanoamericano (“Latin American linguistic questionnaire”) developed by Navarro Tomás to collect speech data for 51 speakers from the areas immediately surrounding the Distrito Federal, and while he was interested in other aspects of pronunciation, I only discuss his description of unstressed vowel pronunciations here.

Matluck observes that initial unstressed vowels are reduced and “obscured”2, resulting in such pronunciations as oficio “office”, italiano “Italian”, amigo “friend”; or they are elided, lengthening the following consonant in the process. He characterizes the word-internal unstressed vowel as “reducida y relajada” (“reduced and relaxed”), giving policia “police” and viejecito “little old man” as examples, but states that very rarely does elision occur in this context. He describes the greatest degree of weakening in the post-tonic vowel, characterizing it as “sumamente relajada, y muchas veces llega a perderse...Tras consonante sorda, la vocal final absoluta es siempre relajada y más o menos ensordecida.” (“extremely relaxed, and many times even lost... Following a voiceless consonant, the vowel in absolute final position is always relaxed and to some degree devoiced”) (113). Matluck also describes vowel raising in both word-internal and word-final positions, and fronting in word-internal position. He notes that a stressed diphthong can result in either the raising or fronting of a pretonic /e/ to [i], such as the pronunciation of pescuezo “neck” as [piskweso], or [pe̘skweso]. In word-final position, he mentions that /e/ after a palatal may raise and devoice simultaneously, resulting in pronunciations of noche “night” and calle “street” as [nochi] and [caji]. Matluck does not attempt a systematic investigation or explanation of these patterns, so the value of this study is limited, with its primary contribution being the examples provided.

2 “Entre personas semicultas, y aún más en el habla popular, la vocal inicial se reduce y oscurece...” (1952:112).

Boyd-Bowman’s (1952) data comes from speakers from Mexico City as well as the nearby states of Hidalgo and Guanajuato. While it is not clear how he collected his data, he seems to have done so through casual conversations and observations of public speech. Boyd-Bowman observes that vowels in word-final position are the most frequent targets of what he calls the “abreviación o pérdida completa”, “abbreviation or complete loss”, of unstressed vowels. He claims that /e i o u/ may all be subject to devoicing or elision, but that /a/ resists this process. He also states that weakening and elision occur almost exclusively when the vowel is in contact with /s/, especially when the target vowel is between /s/ and another voiceless consonant. Notably, he also attributes the reduction observed to rapid speech, reporting that complete loss of unstressed vowels is a characteristic of rapid speech rather than a general process, and that when speech is slowed down, “lost” vowels are recovered. With regard to possible social patterning of vowel weakening, Boyd-Bowman mentions that it frequently occurs in the speech of “indios, campesinos and mineros” (“Indians, peasants, and miners”)3, all categories associated with lower socioeconomic status. Like Matluck (1952), Boyd-Bowman reports observations but does not systematically investigate them. However, his observations provide a useful point of departure for studying the phenomenon of vowel weakening in Central Mexican Spanish.

Canellada de Zamora and Zamora Vicente (1960) provide the first instrumental analysis as part of their larger study of vowel weakening in and around Mexico City.

Their data are from spontaneous conversational speech observed in public places around the city, and kymographic representations of eight male and two female speakers from different ages and social classes. They claim that the phenomenon operates independent of social class, with unstressed vowels being devoiced in the speech of all social classes, and in particular note that there does not seem to be any stigma associated with these pronunciations. The authors claim that the most frequent position for unstressed vowel weakening or loss is absolute syllable initial position, i.e. onsetless syllables, followed by an /s/, such as [sp'erate] for /es'perate/ ‘wait’ (226), but they also note that this type of reduction takes place in all positions, word-internal and word-final, pre- and post-tonic.

Like Boyd-Bowman, they consider /s/ as playing a major role in conditioning weakening and devoicing, and they also mention that stressed vowels can be devoiced or deleted, such as in “mamas'ta” for “mamacita”, ‘attractive female’. Also, like Boyd-Bowman, they attribute the phenomenon to rapid speech (although they do not measure speech rate), remarking that slow and careful pronunciation restores these vowels (233). This investigation is valuable for the copious examples provided, and the fact that it provides the very first instrumental evidence for vowel devoicing and deletion in Mexican Spanish. However, as noted below, some of the examples provided by the authors constitute rare and extreme productions presented alongside frequent contexts, thereby not accurately reflecting the reality of the phenomenon.

3 The author does not specify which participants are from Mexico City and which are from nearby more rural areas of Hidalgo and Guanajuato.

Motivated by what he refers to as “la relativa insuficiencia” (“the relative insufficiency”) of Boyd-Bowman’s (1952) study, and “la deformación apreciable” (“the striking distortion”) of Canellada & Zamora Vicente’s (1960) study, Lope Blanch (1963) returns to the topic with the goal of providing data that is sufficiently broad while representing the true scope of the phenomenon. His major criticism of Canellada and Zamora Vicente is that many of the examples they provide are cases that he characterizes as extreme, occurring only sporadically, and therefore not representative of the linguistic reality. As for Boyd-Bowman’s study, Lope Blanch acknowledges the contribution and notes that while many of his conclusions are likely correct, they must be considered provisional and impressionistic, as the limited data do not allow for strong conclusions. He does not address Matluck’s observations.

Lope Blanch was the first to conduct a systematic investigation of the phenomenon, interviewing 100 speakers and obtaining recordings of 52 of them. He was also the first to provide quantitative analysis of his results. Although he did not have the benefit of spectrographic evidence, his recordings permitted careful listening, and he notes that he often listened to ambiguous cases up to 20 times in order to arrive at a decision about their nature. He identifies four degrees of weakening and divides them into discrete categories: 1) intense relaxation of the vowel4, 2) an intermediate stage of reduction in which a weakened vowel is perceptible, 3) an intermediate stage in which a “light vocalic element” is difficult (but possible) to perceive, and 4) apparent elision. Devoicing is possible in 1, 2, and 3.

Lope Blanch does not specify the overall rate of weakening in his sample, but states that there were 2,284 vowels that were either weakened or lost. Of those, 17% were apparently elided, 38% were maximally reduced and devoiced, but not completely lost, and the remaining 45% were weakened, but still perceptible, which represents a collapsing of his categories 1 and 2. Fewer than 2% of the reduced tokens were tonic vowels. The primary target vowels were the mid vowels /e/ and /o/, accounting for 42% and 24% of the reduced vowels in the sample. Nearly 90% were atonic vowels adjacent to /s/. Of these, he identifies the ‘Vs’ context, in which /s/ follows the target vowel, as the context producing the most extreme weakening and elision, as well as the context responsible for the most /s/-adjacent weakened tokens, 67%. The ‘sV’ context, where the target vowel is preceded by /s/, was only responsible for 7% of the /s/-adjacent weakened tokens, while the remaining 26% occurred in an ‘sVs’ context, where the weakened vowel is both preceded and followed by /s/. Although Lope Blanch claims that for ‘Vs’ reduction, the preceding consonant is irrelevant, he finds 71% of ‘Vs’ reductions are preceded by a voiceless consonant. However, as Delforge points out, Lope Blanch likely greatly overestimates the role of /s/ in conditioning devoicing, due to his inclusion of the very high frequency fillers pues (“well”) and entonces (“so”), which typically reduce to [ps] and [tons(:)], respectively.

4 This description as well as the notation used to represent vowels in this category, e.g. [ɐ] and [ə], leads me to believe that Lope Blanch believed that he heard evidence of centralization in his auditory analysis. Serrano (2006) also mentions that he believes he hears centralization in his data.

As for social patterning, Lope Blanch finds none, confirming Canellada and Zamora Vicente’s (1960) finding that the phenomenon is not a characteristic of rustic or lower class speech. Although he was careful to collect data from male and female participants ranging in age from 15 to 60, and from all social classes (established using occupation and education), he does not provide the distribution across social categories. He does provide percentages for individual speakers’ overall devoicing behavior, observing that 42% of his participants showed “relatively pronounced” behavior, while 57.8% only devoice occasionally or rarely. Because he did not conduct any inferential statistical tests to compare the amount of weakening among populations, we must rely on his impressions and interpretations of the distributional data only. He states that if he were forced to give his impression about which speakers devoice at the highest rates and with the most intensity5, his data vaguely point to young speakers of middle or superior culture (4)6.

5 The mention of intensity as well as rate here points to the possibility that Lope Blanch believed that the phenomenon could potentially be socially stratified based on the degree of reduction, e.g. male and female speakers reduce at similar rates overall, but where female speakers tend toward milder reduction, male speakers favor complete elision. Unfortunately, this type of analysis was not undertaken.

6 The terms “nivel sociocultural” and “cultura” are often used in the Mexican Spanish literature to refer to level of education (Barajas 2014), although when unspecified, they may refer to some combination of social class and education.

More recently, Serrano (2006) carried out a small-scale study in order to assess the continued validity of Lope Blanch’s results. He interviewed 12 speakers from Mexico City, including 6 women and 6 men, distributed equally across two age groups and three socioeconomic groups, which he operationalizes by using level of education. He made recordings of the interviews, but his results are based on auditory analysis, using the same categories identified by Lope Blanch. Overall rates of reduction are not provided, but Serrano does mention that the participant with the most reduction in his speech has a reduction rate of 13%, followed by another two speakers with 10%, and another with 9%, so it can be assumed that the overall rate is a bit lower than 9%. His overall elision rates are similar to those found by Lope Blanch: 22% apparent elision, but reduction rates are different: 68% devoiced, and 10% shortened and weakened but voiced. He does not offer an interpretation for this, but is careful to caution that any conclusions must be taken as preliminary because of the small sample size.

Linguistic factors that Serrano finds to predict devoicing are word frequency, position of the target vowel relative to the stressed vowel, and consonantal context. He finds that the context most favoring devoicing and deletion is that in which the target vowel appears in the post-tonic syllable, preceded by a voiceless consonant onset and followed by a voiceless consonant coda (2006: 21)7, such as antes (“before”), trescientos (“three hundred”), and pesos (“pesos”). Serrano also finds a possible role for utterance position in predicting reduction, as 37% of his reduced tokens occurred in pre-pausal position. However, as Delforge (2009: 25) notes, it is unclear whether he means to indicate the position that precedes a pause ending an intonation phrase or a phonological utterance.

7 Serrano states that the context most favoring devoicing and deletion is consonante sorda en sílaba acentuada + vocal + consonante sorda en sílaba inacentuada, that in which the target vowel is preceded by a voiceless consonant in the coda of the stressed syllable and followed by a voiceless consonant in the post-tonic syllable (2006: 21). However, this is unlikely because voiceless consonants other than /s/ in syllable codas are quite rare in Spanish (Hualde 2005:76). I have altered his description of the finding based on the three examples he gives, all of which contain a tautosyllabic voiceless consonant preceding the unstressed target vowel forming the onset of the post-tonic syllable.

Serrano reports that frequent words, especially fillers like entonces “then”, este “um”, and pues “well” undergo more extreme weakening, accounting for many cases of apparent elision. He also acknowledges that complete elision cannot be determined conclusively without instrumental articulatory analysis, but reports his impression that a completely elided vowel leaves behind its time slot, which is filled by a syllabic /s/.

Serrano classified a number of fillers and words as highly frequent, and found that these words accounted for 27% of all reduction found in his data. However, this was not done in a systematic way: it is unclear from his account, but it appears that Serrano simply used his own native speaker intuition to determine frequent words. While I do not necessarily dispute that the words he chose are in fact frequent words, an objective measure of frequency would be preferable.

Like Lope Blanch (1963), Serrano reports that overall, reduction does not seem to be conditioned by age or social class, although he does report a possible role for gender, stating that male speakers provide the most extreme cases of reduction and elision in casual speech and limit reduction in careful speech, which he did not find to be the case for women. He also notes a possible interaction between gender, social class, and frequency in the degree of reduction, observing that lower class men are more likely to completely delete unstressed vowels in high frequency lexical items. This patterning leads Serrano to suggest that Mexican unstressed vowel reduction is a case of stable variation rather than a change in progress, an assessment that I am inclined to agree with based on my analysis. In his sample, men devoice at a higher frequency and to a more extreme degree, although they avoid doing so in reading tasks, which Serrano interprets as an indication that there may in fact be some stigma associated with the feature.

However, speech rate was not considered, and because it would be expected that participants would speak more slowly in a reading task than in casual speech style typically associated with the interview, this finding may be a reflection of the effect of speech rate rather than an indication of social stigma.

Serrano cautions that his study is very preliminary as there is not enough data to draw strong conclusions, which is true. Additionally, the lack of systematicity in determining the grouping of speakers, as well as in determining frequent words is concerning, along with the aforementioned confounding of speech rate with speech style.

The studies discussed here lay the groundwork for my investigation. In this dissertation, I build on the previous research by offering a comprehensive acoustic characterization of Mexico City Spanish vowel reduction, along with a rigorous analysis of the linguistic and social patterning of the phenomenon.


2.4 Region of study

I conducted my research in Mexico City, one of the largest urban metropolises in the world, with a population greater than 20 million in the Distrito Federal (‘Federal District’) and surrounding areas. It is located in the central valley of Mexico, in an area that was once the Aztec capital known as Tenochtitlán, built among a large system of lakes. Most of the lakes have dried up, and the city today is built over the soft land of the dried-up lakebeds. The Spanish began to occupy the area with Cortez’s arrival in 1521, and the Spanish language began to spread throughout the region and, over time, to overtake and in many cases replace the indigenous languages spoken there, although several are still spoken today, most notably Nahuatl. The city’s population grew steadily up through the end of the 19th century, and in the 20th century, exploded, growing from 200,000 to many millions. This growth is largely attributed to in-migration from rural areas of Mexico, but global migration is also a factor. Today, Mexico City is the center of political, economic, and cultural life in Mexico, and is a particularly interesting venue for sociolinguistic research, in part due to the many diversifying and homogenizing forces at work (Terborg and Velazquez, 2018).

Mexico City as a whole is socially and economically diverse, but distribution of wealth is uneven, so there are neighborhoods within the city that are extremely well-off, and others where residents lack basic resources. Most neighborhoods, however, are a mix of wealthy and poor households (Martín Butragueño 2009). Nutini (2000) distinguishes five social classes in Central Mexico: upper-middle (2%), solid-middle (15%), lower-middle (20-25%), working (30-35%), and marginal (25-30%), though he notes that they are often difficult to distinguish, and can be considered to be nominal, rather than real, classes. Nutini highlights the “improvement efforts” of the solid-middle class, relating them to “precisely the kind of insecurity expressed through emulative consumerism and other attempts at status enhancement” (p. 232), which is reminiscent of Labov’s assessment of linguistic insecurity as a major factor influencing his findings for the social stratification of (r) in New York City department stores (1966).

Nutini (2000) briefly discusses the relationship between education and social class, noting that occupational prospects for university-educated Mexicans are much expanded when compared to those who did not finish high school. In many sociolinguistic studies of Mexico, level of education is used as a proxy for socioeconomic status. Among other studies not discussed here, it was used as such in Lope Blanch (1963) and Serrano (2006), and it seems to be a reliable heuristic in many analyses, including research conducted using the Corpus sociolingüístico de la ciudad de México (Lastra & Martín Butragueño 2009).

Having explored previous research on vowel reduction in Spanish and across languages, as well as the context of Mexico City, I now turn to the methodology for my own research on vowel reduction in Mexico City Spanish, in Chapter 3.


CHAPTER 3: METHODOLOGY

3.1 Introduction

In this chapter, I describe the methodology used in selecting the region of study and participants in section 3.2, and the ways in which I obtained and digitally recorded spoken data in section 3.3. Section 3.4 details the processes involved in carrying out the acoustic analysis of the data, as well as the methods used to determine what constitutes a token for analysis, and how I define and measure the dependent and independent variables used in the statistical analysis. In section 3.5, I briefly explain the statistical models used to understand the patterning of this variation.

3.2 Participant selection

Participants were recruited using the “snowball”, or “friend of a friend” technique (Milroy & Gordon 2003), which has several advantages, such as allowing the researcher to easily obtain a participant sample that varies according to certain characteristics (e.g. age, sex, etc.), and providing an a priori connection between the researcher and the participant, thereby lessening the researcher’s outsider status. This method was used successfully by Delforge (2009) in Cuzco in combination with a large-scale survey-type study. Following this recruiting technique, I gathered speech samples from 73 participants, with a relatively even distribution across genders, age (from 21 to 81 years), and levels of education. 40 of these speakers were included in the final analysis presented here, chosen to ensure a balanced distribution across social categories. More details about the speakers can be found in Appendix A, including their level of formal education, occupation, and neighborhood of residence. Participants were included in the analysis on the basis of having grown up in Mexico City, either within the limits of the Distrito Federal or the municipalities belonging to the State of Mexico that directly border it. In order to minimize any effects of other languages, only participants who were raised in a household where only Spanish was spoken were included, although many who learned an additional language later in life were included. Those who had extensive experience abroad or who had lived for a period longer than one year outside of Mexico City before the age of 18 were not included.
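The inclusion criteria above amount to a conjunction of four conditions. As a minimal sketch, assuming a hypothetical participant record (the class and field names below are illustrative and do not correspond to any coding scheme used in the study), they could be checked as follows:

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical, chosen only for this illustration.
    raised_in_mexico_city: bool      # Distrito Federal or a bordering State of Mexico municipality
    spanish_only_household: bool     # raised in a household where only Spanish was spoken
    extensive_time_abroad: bool      # extensive experience abroad
    years_outside_before_18: float   # time lived outside Mexico City before age 18


def is_eligible(p: Participant) -> bool:
    """Return True if the participant meets all four inclusion criteria."""
    return (
        p.raised_in_mexico_city
        and p.spanish_only_household
        and not p.extensive_time_abroad
        # Only periods longer than one year lead to exclusion.
        and p.years_outside_before_18 <= 1.0
    )
```

Note that learning an additional language later in life is deliberately absent from the record, since it did not affect inclusion.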

3.3 Tasks and audio recording procedures

3.3.1 The sociolinguistic interview

The data for this study were primarily elicited via a sociolinguistic interview, originally developed by Labov for his Lower East Side study (1966), some version of which is the primary data collection technique for many studies of sociolinguistic phonological variation. Labov (1984) conceives of the interview as a means of capturing the differences in formal, careful, and casual styles, with the ultimate goal of capturing the “vernacular”, or a speaker’s most natural way of speaking. This design is based on the notion that one’s speech style varies depending on the amount of attention being paid to one’s speech. Thus, Labov’s sociolinguistic interview encompasses not only a casual conversation with the interviewer (and/or any others present), but also includes a number of formal elicitation tasks, the specifics of which are dependent on the researcher’s interest. He also notes that the interview is an opportunity to obtain information about subjective reactions and perceptions of linguistic forms.

Labov describes a number of goals of the sociolinguistic interview, three of which have been reproduced here:

“(1) to elicit narratives of personal experience, where community norms and styles of personal interaction are most plainly revealed, and where style is regularly shifted towards the vernacular.

(2) to stimulate group interaction among the people present, and so record conversation not addressed to the interviewer8.

(3) to isolate from a range of topics those of greatest interest to the speaker and allow him or her to lead in defining the topic of conversation” (1984: 32).

8 Although my participants sometimes directed speech to others during the interview, for consistency across all participants, I only analyzed speech directed to me as the interviewer.


The above stated goals are those that pertain to the elicitation of natural speech, and are the ones I used in the construction of my interview questions, which range from questions about the speaker’s daily routines, their family and friends, and their life growing up, to more general questions about local customs, such as typical Mexican food, holiday traditions, sports, and opinion questions, such as those relating to local, national, and international politics. See Appendix B for a list of sample questions. Interviews averaged approximately 30 minutes, with the shortest being 15 minutes and the longest being approximately 120 minutes.

3.3.2 Reading task

The only formal elicitation task conducted as part of this study was a reading task that typically took about 10 minutes to complete. Participants read a series of 11 sentences, presented on a laptop screen using a slideshow. Criteria used to construct sentences were that each vowel type (/a, e, i, o, u/) was represented in each of pre-tonic, post-tonic and stressed positions, and in both open and closed syllables across the corpus. Words with different stress patterns (proparoxytone, paroxytone, and oxytone) were also included.

Participants were instructed to read each sentence three times, at different speech rates.

The first time the participant read the sentence at what they perceived to be a normal speech rate, the second time a fast rate, and the third time a slow rate. See Appendix C for a list of sentences used in this task.
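The structure of the reading task (11 sentences, each read at a normal, then a fast, then a slow rate) can be sketched as a simple schedule. The function and variable names are assumptions made for illustration; the actual sentences are those listed in Appendix C and are not reproduced here.

```python
# Order of self-selected speech rates for each sentence, as described in the text.
RATES = ("normal", "fast", "slow")


def reading_schedule(n_sentences: int = 11) -> list[tuple[int, str]]:
    """Return (sentence number, rate) pairs in the order they are read."""
    return [
        (sentence, rate)
        for sentence in range(1, n_sentences + 1)
        for rate in RATES
    ]
```

Under these assumptions, each participant produces 33 readings in total.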


3.3.3 Recording procedures and audio processing

Interviews and reading tasks were digitally recorded either directly into the Audacity® program (Audacity Team, 2014) on a MacBook laptop or into a TASCAM® DR-100 hand-held digital recorder, at a sampling rate of 44,100 Hz and in 32-bit format, using a Plantronics® head-mounted USB microphone. Interviews took place in participants’ homes, or in quiet cafes or parks close to their homes. Each interview session included the researcher and the participant, although occasionally others who were not part of the conversation were present in the room.

3.4 Data analysis

This section describes the methods used in carrying out the acoustic analysis of the phonetic data, as well as definitions and descriptions for the linguistic and social factors under investigation.

3.4.1 Envelope of variation

Although the primary target of investigation in previous studies has been reduction or devoicing in vowels that are both post-tonic and word-final, there is some evidence in the literature of pre-tonic, word-initial and word-medial reduction as well. Additionally, since some earlier studies on MCS (Canellada de Zamora & Zamora Vicente 1960, Lope Blanch 1963) have found reduction even in stressed vowels, albeit at very low rates, I included in my analysis vowels in all prosodic positions, including those that carry primary word stress, so all vowels in each word were included, except those that fit the criteria for exclusion. The only exclusions based on segmental context were diphthongs and monophthongs immediately preceded or followed by another vowel, because of the difficulties in segmentation. Hence, tokens like the adjacent /u/ and /i/ in su hija (“his/her daughter”) were excluded, as were vowel sequences in hiatus, like the adjacent /e/ and /a/ in te ayudo (“I help you”). Very frequent filler words like pues “well” and este “um” were excluded from the analysis so that results were not skewed. Pues, in addition to containing a diphthong, is often severely reduced to [ps]. Este, on the other hand, is often lengthened, filling in empty space while the speaker is considering what to say next. It should be noted that uses of este as a demonstrative determiner “this” were not excluded.
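The envelope of variation can be summarized as two exclusion rules: one segmental (adjacency to another vowel or glide) and one lexical (filler uses of pues and este). The sketch below is one possible encoding under assumed names; the token record and its fields are hypothetical and are not the coding scheme used in the TextGrids.

```python
from dataclasses import dataclass

# Filler words excluded from the analysis, per the text.
FILLERS = {"pues", "este"}


@dataclass
class VowelToken:
    # Hypothetical fields, for illustration only.
    word: str
    vowel: str
    prev_is_vocoid: bool          # immediately preceded by a vowel or glide
    next_is_vocoid: bool          # immediately followed by a vowel or glide
    is_filler_use: bool = False   # True when the word functions as a filler


def in_envelope(t: VowelToken) -> bool:
    """Return True if the token falls inside the envelope of variation."""
    # Segmental exclusion: diphthongs, hiatus, and vowels adjacent across words.
    if t.prev_is_vocoid or t.next_is_vocoid:
        return False
    # Lexical exclusion: only filler uses; demonstrative 'este' is kept.
    if t.word.lower() in FILLERS and t.is_filler_use:
        return False
    return True
```

For example, the /u/ of su in su hija is excluded on segmental grounds, while the vowels of demonstrative este remain in the analysis.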

3.4.2 Acoustic analysis of the data

Approximately two minutes of each speaker’s interview was transcribed orthographically, beginning 10 minutes into the interview after the participants had become comfortable with the interview setting and head-mounted microphone.

Subsequently, each vowel was segmented and analyzed spectrographically for duration, voicing and formant values, as explained below, and annotated using TextGrids in Praat (Boersma & Weenink 2016), for an average of 163 tokens per speaker, resulting in a total of 6,504 tokens.


For the acoustic analysis of tokens, I took measurements of vowel duration and duration of full modal voicing throughout the vowel, for each vowel produced by all 40 speakers, in order to calculate the percent full voicing. Apparently deleted vowels were marked as well, with vowel duration and voicing measurements of zero (0 ms). In addition to devoicing, two other voicing types were found: weak/breathy voicing and creaky voicing. The 222 creaky voiced tokens found were removed from further analysis, as they did not seem to pattern with other types of voice weakening. While I do not further address creaky voice here, the issue deserves further investigation, as it seems to be representative of a different type of weakening process. After removing the creaky voiced tokens, 6,282 tokens remained for analysis. Standard criteria (outlined below) were used to determine landmarks associated with the beginning and end of each vowel in order to measure vowel duration.
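As a worked illustration of the measure, percent full voicing is the duration of full modal voicing divided by the total vowel duration. The sketch below uses hypothetical names, and its handling of apparently deleted vowels (returning 0.0 when the vowel duration is zero) is an assumption of this sketch, since the ratio is otherwise undefined for those tokens. The token counts are those reported in the text.

```python
def percent_full_voicing(vowel_ms: float, modal_ms: float) -> float:
    """Percent of the vowel produced with full modal voicing.

    Both arguments are durations in milliseconds (hypothetical inputs).
    """
    if modal_ms < 0 or vowel_ms < 0 or modal_ms > vowel_ms:
        raise ValueError("voicing duration must lie between 0 and vowel duration")
    if vowel_ms == 0:
        # Assumption: an apparently deleted vowel (0 ms) is scored as 0% voiced.
        return 0.0
    return 100.0 * modal_ms / vowel_ms


# Token counts reported in the text.
TOTAL_TOKENS = 6504
CREAKY_TOKENS = 222
SPEAKERS = 40

remaining_tokens = TOTAL_TOKENS - CREAKY_TOKENS    # tokens left after removing creaky voice
mean_tokens_per_speaker = TOTAL_TOKENS / SPEAKERS  # average before rounding
```

The arithmetic agrees with the figures given: 6,504 minus 222 leaves 6,282 tokens, and 6,504 tokens over 40 speakers is 162.6, or approximately 163 per speaker.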

1. The beginning of a vowel preceded by a pause was marked at the point in the

waveform at which periodicity begins. The end of a vowel followed by a pause

was marked at the point at which the waveform intensity approaches zero, and is

no longer regular.

2. The beginning of a vowel preceded by a stop was marked at the cycle in the

waveform where periodicity begins after the stop release burst. The end of a

vowel followed by a stop was marked at the point at which the stop closure

begins, evidenced by a lack of energy in both the waveform and the spectrogram.

69

3. The beginning of a vowel preceded by a fricative was marked at the point of

offset of turbulent noise, represented by an irregular, aperiodic waveform and the

presence of energy in the high frequencies of the spectrogram. The end of a vowel

followed by a fricative was likewise marked at the point of onset of turbulent

noise.

4. The beginning of a vowel preceded by a nasal or lateral was marked at the point corresponding to the change in intensity and shape of the waveform, and the change in intensity of formant frequencies in the spectrographic pattern. The end of a vowel followed by a nasal or lateral was marked at the point at which waveform intensity decreases along with the intensity of formant frequencies in the spectrogram.

5. The beginning of a vowel preceded by a rhotic was marked differently depending on the realization of the consonant. The typical realization of /ɾ/ is as a voiced alveolar tap, and a vowel preceded or followed by this sound was segmented using the criteria mentioned above for stops. The /r/, however, shows quite a bit of variation in highland Mexican Spanish. Many realizations match the standard voiced alveolar trill, or a series of fast taps, and vowels adjacent to these realizations were segmented in the same way as those adjacent to /ɾ/. However, Rissel (1989) identified several other variants, including a voiced alveolar flap, a voiced alveolar fricative, a voiced unreleased alveolar flap, a voiced assibilated trill, a voiced assibilated fricative, and a voiceless assibilated fricative. I have also identified some of these variants in my data, and thus each phonological /r/ was treated according to its phonetic realization, with adjacent vowels being segmented according to the criteria mentioned above for either stops or fricatives.

6. The beginning of a vowel preceded by an approximant was marked at the point corresponding to an increase in intensity in the waveform and the spectrogram. The end of a vowel followed by an approximant was marked at the point corresponding to a decrease in intensity in the waveform and spectrogram.

When analyzing devoiced and deleted vowel tokens, special attention was paid to landmarks associated with flanking consonants, because standard criteria for identifying vowels (periodicity in the waveform and presence of a voice bar and/or formant structure) were not reliable criteria. Figure 3.1 shows an example of how vowels were measured and labeled. The first vowel /a/ has full modal voicing and the second vowel /o/ has no voicing (hence the “0” in the label).

Figure 3.1. Praat TextGrid example of the measuring and labeling of vowels

Modal voicing duration was determined by the presence in the spectrogram of a voice bar, which is a dark band of energy representing the fundamental frequency (F0), or the frequency at which the vocal folds vibrate, and the presence of periodicity in the waveform. Preliminary acoustic analysis showed that, instead of simple presence or absence of voicing, many tokens show weak or breathy voicing that can be partial in some cases. Weak or breathy voicing is characterized by a lower intensity in the waveform, a lighter voice bar, and lighter bands indicating formant structure in the spectrogram. Additionally, the presence of frication, or energy in the higher frequencies, has emerged as a way to distinguish devoiced segments from weakly voiced segments when other aspects of the acoustic signal are not clear indicators of the presence or absence of voicing. In the majority of tokens analyzed, full modal voicing is found from the beginning of the vowel to somewhere between the midpoint and the end. The offset of voicing is characterized acoustically by aperiodicity in the waveform, and a sharp decrease in intensity.

Figure 3.2 shows an example of a partially devoiced /a/ ending the word persona “person”. The dark black lines extending beyond the spectrogram indicate the devoiced portion of the vowel. The reader will note the lack of voice bar, the aperiodic wave, and some very light energy around the second formant. In Figure 3.3, on the other hand, we can see an example of weak voicing in the final /o/. In the waveform, intensity has decreased substantially, but there is still periodicity, while in the spectrogram, formant structure is difficult to perceive, but there is a light voice bar.

Figure 3.2. Spectrogram and waveform for a partially devoiced token in the final /a/ of persona, “person” (segment labels: p e ɾ s o n a)

Figure 3.3. Spectrogram and waveform for a partially weakened token in the final /o/ of dos años, “two years” (segment labels: d o s a ɲ o s)

As a complement to the major analyses of voice weakening and shortening, I also analyzed vowel quality. Barajas (2014) found raising of /e/ and /o/ in another variety of Mexican Spanish, that spoken in Colongo, Michoacán, and Lope Blanch (1963) mentions possible centralization of vowel targets as well. My own auditory impressions led me to believe changes in vowel quality are not a major contributor to vowel reduction in MCS, but acoustic data were needed in order to confirm or reject this hypothesis. To that end, I took F1 and F2 measurements from a subset of the data that included the 16 speakers from my sample who exhibited rates of voice weakening higher than 5%. F1 and F2 measurements were taken for all vowels produced by those speakers, except those tokens that were coded as “apparently deleted”. I used a modified Praat script to extract F1 and F2 values from the midpoint of each vowel, in order to minimize any effects of flanking consonants. These values were then submitted to several statistical comparisons, as described in section 4.5.

3.4.3 Defining the dependent variables

Two dependent variables, voice weakening and shortening, were derived from the acoustic measurements in order to quantify their contributions to vowel reduction and understand their linguistic and social patterning.

3.4.3.1 Voice weakening

The dependent variables that represent voice weakening in this study are both continuous and categorical. Vowel devoicing in Andean Spanish and in other languages has been shown to be a gradient process, and results from a preliminary analysis indicate that the same is true of the Mexican data. In order to capture the possible contributions of phonetic gradience to variation, I used the acoustic measurements of vowel duration and voicing duration to derive the percentage of full modal voicing, which is used as a continuous dependent variable. In addition, in order to be able to compare my results with previous vowel reduction research, as well as to understand possible relevance of distinct weakening categories, I created a categorical variable based on the degree of voice weakening. This allowed me to capture the full range of possible manifestations of weakened and full vowels, using a combination of voicing and duration to create categories. Preliminary analysis of voicing led to the proposal of the following categories, each of which is explained in greater detail and exemplified below: fully voiced, weakly voiced, partially weakly voiced, partially devoiced, completely devoiced, and apparently elided. By analyzing voicing as both a categorical and a continuous variable, I ensure that I do not overlook any important details about the process.
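The derivation of the continuous measure can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name percent_full_voicing is hypothetical, and treating apparently deleted vowels (0 ms for both measures) as 0% voiced is an assumption made here to avoid division by zero.

```python
def percent_full_voicing(vowel_ms: float, voiced_ms: float) -> float:
    """Return the share of a vowel produced with full modal voicing, in percent.

    vowel_ms  -- total vowel duration in milliseconds
    voiced_ms -- duration of full modal voicing within the vowel, in milliseconds
    """
    if vowel_ms < 0 or voiced_ms < 0:
        raise ValueError("durations must be non-negative")
    if voiced_ms > vowel_ms:
        raise ValueError("voiced portion cannot exceed vowel duration")
    if vowel_ms == 0:
        # Assumption of this sketch: apparently deleted vowels are recorded
        # with 0 ms for both measures and are counted as 0% voiced.
        return 0.0
    return 100.0 * voiced_ms / vowel_ms


if __name__ == "__main__":
    # A hypothetical 80 ms vowel with 40 ms of full modal voicing.
    print(percent_full_voicing(80, 40))
```

Under this sketch, fully voiced tokens sit at 100 and completely devoiced or apparently deleted tokens at 0, which is the kind of two-peaked distribution that motivates the choice of model for the continuous analysis.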

While Delforge and other researchers (Cedergren 1986, Dauer 1980, Jun et al. 1997) used an arbitrary cutoff of 30 ms to categorize a vowel as “partially voiced”, I instead calculated the percentage of full voicing in order to be able to analyze voicing as a continuous variable. Then, for the categorical analysis, I constructed categories based on the distribution of percent full voicing found in my data. Each token analyzed falls into one of the following voicing categories:

1) fully voiced: these tokens were characterized by a strong, dark voice bar throughout the duration of the vowel, and a periodic waveform, as shown in all three vowels in Figure 3.4.

Figure 3.4. Full modal voicing in each of three vowels of temblores “earthquakes”

2) partially weakened/breathy: these tokens were characterized by a faint voice bar and lower intensity in the waveform for some portion of vowel duration, as shown below in the final /o/ in dos años “two years”.

Figure 3.5. Partial weakening/breathiness in final /o/ of dos años, “two years”

3) partially devoiced: these tokens showed no voice bar and an aperiodic waveform for a portion of vowel duration. Energy around F2/F3 was sometimes visible in the spectrogram, as shown in Figure 3.6.

Figure 3.6. Partial devoicing in final /a/ of personas, “people”

4) completely weakened/breathy: these tokens were characterized by a faint voice bar and lower intensity in the waveform throughout vowel duration, as shown in the final vowel in Figure 3.7.

Figure 3.7. Complete weakening/breathiness in final /a/ of pesaditas, “bothersome”

5) completely devoiced: these tokens showed no voice bar and an aperiodic waveform throughout vowel duration. Energy around F2/F3 was sometimes visible in the spectrogram, as shown below in Figure 3.8.

Figure 3.8. Complete devoicing in final /o/ of actos, “acts”

6) apparently deleted: these were tokens that showed no acoustic evidence of the vowel. Figure 3.9 below shows no temporal dimension for the apparently deleted vowel between the voiceless consonants that flank the word-initial /o/ of ochenta in novecientos ochenta, “nineteen eighty”.

Figure 3.9. Apparent deletion in initial /o/ of ochenta, “eighty”

I have described above the several different types of voicing found in the data: full modal voicing, weak/breathy voicing, partial weak/breathy voicing, partial devoicing, complete devoicing, and apparent elision. However, in the inferential statistical models, the following categories were collapsed into one, called “weakened”, which represented all types of weakened voicing: weak/breathy voicing, partial weak voicing, partial devoicing, complete devoicing, and apparent elision. Then, tokens with weakened voicing were compared to those with full modal voicing. Although it would have been ideal to analyze the different levels as such, the extremely low rates of voice weakening in some contexts made that impossible, so a binary opposition was used instead.
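The collapse into a binary opposition can be illustrated with a short sketch. The label strings and the function name collapse_voicing are assumptions chosen for this illustration; they mirror the category names used in the list above rather than the actual coding scheme in the dataset.

```python
# Category labels assumed for illustration; they follow the six categories
# described in this section.
WEAKENED_CATEGORIES = {
    "partially weakened",
    "completely weakened",
    "partially devoiced",
    "completely devoiced",
    "apparently deleted",
}


def collapse_voicing(category: str) -> str:
    """Map a six-way voicing category onto the binary opposition
    'fully voiced' vs. 'weakened'."""
    if category == "fully voiced":
        return "fully voiced"
    if category in WEAKENED_CATEGORIES:
        return "weakened"
    raise ValueError(f"unknown voicing category: {category!r}")


if __name__ == "__main__":
    for label in ("fully voiced", "partially devoiced", "apparently deleted"):
        print(label, "->", collapse_voicing(label))
```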

3.4.3.2 Shortening

Shortening as a dependent variable is also represented both as a continuous variable and as a categorical variable. For the continuous analysis, the vowel duration in milliseconds is used as the dependent variable. For the categorical analysis, a vowel was determined to be “shortened” if it was at least 50% shorter than the average duration for that target vowel, tonicity, and speaker. In order to quantify shortening as a categorical variable, i.e. to establish whether a token had been shortened or not, I first removed 260 tokens with extremely long duration measurements (≥200 ms) from the dataset, resulting in 6,022 total tokens remaining for analysis. I then calculated average vowel duration values in milliseconds for each speaker, by target vowel and stress,⁹ which can be seen in Appendix D. Each token was then compared to the corresponding average; that is, each unstressed /a/ was compared to the average value for unstressed /a/ for that particular speaker, and tokens with a duration 50% or more shorter than the average were coded as “shortened”. These averages were computed and used as the metric for comparison as a way of normalizing the duration data, rather than setting an arbitrary cutoff for a shortened vowel. For example, if we imagine an arbitrary cutoff of 30 milliseconds for a shortened vowel, there is a risk of overestimating as well as underestimating the number of shortened tokens. For a speaker with an average value of 53 ms for unstressed /a/, 30 ms represents a cutoff value that is only 43% shorter than the average. By the same token, for a speaker with an average value of 98 ms for unstressed /a/, setting an arbitrary cutoff of 30 ms would not classify an observation of unstressed /a/ with a duration of 40 ms as shortened, even though it is 59% shorter than the average. An example of a shortened token is shown below in Figure 3.10, and the reader may notice that only a few wave cycles are visible in the figure. Thus, the categorical variable shortening had two levels: “shortened” and “not shortened”.

9 For example, speaker AS had 10 averages computed, one for each of the 5 vowels (/a/, /e/, /i/, /o/, /u/), stressed and unstressed.
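The speaker-normalized coding described above can be sketched as follows. This is an illustration under stated assumptions rather than the script actually used: the dictionary keys (speaker, vowel, stressed, dur_ms) and the function names are hypothetical, and the sketch assumes that every remaining token, including any with 0 ms duration, enters the per-speaker averages.

```python
from collections import defaultdict
from statistics import mean


def percent_shorter(average_ms: float, duration_ms: float) -> float:
    """How much shorter a token is than its reference average, in percent."""
    return 100.0 * (average_ms - duration_ms) / average_ms


def code_shortening(tokens, max_ms=200.0, threshold=0.5):
    """Code each token as shortened or not, relative to the mean duration
    for the same speaker, target vowel, and stress value.

    tokens -- list of dicts with the (hypothetical) keys
              'speaker', 'vowel', 'stressed', 'dur_ms'
    """
    # Step 1: drop extremely long tokens (>= 200 ms in the text).
    kept = [t for t in tokens if t["dur_ms"] < max_ms]

    # Step 2: mean duration per speaker x vowel x stress cell.
    cells = defaultdict(list)
    for t in kept:
        cells[(t["speaker"], t["vowel"], t["stressed"])].append(t["dur_ms"])
    means = {cell: mean(durations) for cell, durations in cells.items()}

    # Step 3: a token is "shortened" if it is 50% or more shorter than
    # the mean of its own cell.
    coded = []
    for t in kept:
        avg = means[(t["speaker"], t["vowel"], t["stressed"])]
        coded.append({**t, "mean_ms": avg,
                      "shortened": t["dur_ms"] <= threshold * avg})
    return coded


if __name__ == "__main__":
    # The two worked cases from the text: a fixed 30 ms cutoff against a
    # 53 ms average, and a 40 ms token against a 98 ms average.
    print(round(percent_shorter(53, 30)))
    print(round(percent_shorter(98, 40)))
```

Because the threshold is relative to each cell's own mean, the same absolute duration can count as shortened for one speaker and not for another, which is the point of the normalization.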

Figure 3.10. A shortened vowel token, in ese tipo de, “that type of” (segment labels: e s e t i p o ð e)

3.4.4 Defining the independent variables

The independent variables under consideration for this study pertain to both the speaker and the linguistic context of the token. Variables pertaining to the linguistic context are discussed in 3.4.4.1, and those pertaining to the speaker are discussed in 3.4.4.2.

3.4.4.1 Linguistic variables

I investigate the contribution of each of the linguistic variables listed below to the variable weakening process in MCS. These variables have been found to influence vowel reduction in the other studies mentioned in Chapter 2. Variants of each are included.¹⁰

1) Target vowel: Each token was coded according to what the target vowel was, from a set of 5 monophthongs, /i e a o u/. In the phrase los mexicanos, “the Mexicans”, the vowels would be coded, in order, as /o/, /e/, /i/, /a/, /o/.

2) Preceding context: Each token was coded for preceding consonant phoneme, including voicing, manner, and place of articulation.¹¹ Coding the preceding consonant for the vowel token in the first word, los, from the example phrase, los mexicanos, would give a preceding consonant /l/, voicing = “voiced”, manner of articulation = “lateral approximant”, place of articulation = “alveolar”. Besides being preceded by a consonant, a vowel token could also be preceded by a pause, as in the utterance: Entre los mexicanos, nos entendemos, “Among Mexicans, we understand each other.” The preceding context for the initial /e/ in entre in this utterance would be coded as a pause.

10 Speech rate was also investigated as a factor in the reading task. The measure was subjective in that participants determined a speed that for them was fast, medium, or slow. Results from this reading task are reported in Dabkowski (2017).

11 Manner and place of articulation were explored in the distributional analysis of the patterning of voice weakening and shortening, but only voicing was included in the inferential statistical analyses.

3) Following context: Each token was coded for following consonant phoneme, including voicing, manner, and place of articulation. To use the same example as above, the following consonant for the vowel in los would be coded as /s/, voicing = “voiceless”, manner of articulation = “fricative”, place of articulation = “alveolar”. Besides being followed by a consonant, a vowel token could also be followed by a pause. The following context for the final vowel /o/ in the utterance Me gusta mucho, “I like (it) a lot”, would be coded as a pause.

4) Relationship to primary word stress: each token was coded according to the position of the syllable containing it with regard to primary stress. Each token was coded as stressed or unstressed, and all unstressed vowels were coded as pre-tonic, post-tonic, or unstressed monosyllabic word. For example, in the phrase los mexicanos, “the Mexicans”, the /o/ in los would be coded as occurring in an unstressed monosyllabic word; the /e/ and /i/ of mexicanos would be coded as occurring in pre-tonic syllables, the /a/ of mexicanos would be coded as occurring in the stressed syllable, and the final /o/ would be coded as occurring in a post-tonic syllable.

5) Syllable type: each token was coded as occurring in an open or closed syllable, based on whether the syllable contained a coda. The /a/ in the final syllable in habla “he/she speaks” would be coded as occurring in an open syllable, while the final syllable of hablan “they speak”, would be coded as occurring in a closed syllable.

6) Word position: each token was coded according to whether it occurred word-finally or not. No distinction was made between word-initial and word-medial positions; both were coded as word-internal. The /o/ in the initial syllable of obispo, “bishop”, would be coded as word-internal, while the /o/ in the final syllable would be coded as word-final.
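Two of the coding decisions in the list above, position relative to primary stress and syllable type, can be expressed as a small sketch. The function names are hypothetical, syllabification and stress placement are assumed to be supplied by the coder, and the orthographic vowel check is a simplification that ignores glides.

```python
VOWEL_LETTERS = set("aeiouáéíóú")


def code_stress_position(n_syllables, stressed_index):
    """Return one stress label per syllable of a word.

    stressed_index -- 0-based index of the stressed syllable, or None for
                      an unstressed monosyllabic word such as 'los'
    """
    if stressed_index is None:
        if n_syllables != 1:
            raise ValueError("only monosyllables are coded as fully unstressed here")
        return ["unstressed monosyllabic word"]
    labels = []
    for i in range(n_syllables):
        if i < stressed_index:
            labels.append("pre-tonic")
        elif i == stressed_index:
            labels.append("stressed")
        else:
            labels.append("post-tonic")
    return labels


def code_syllable_type(syllable):
    """Code a syllable as open (no coda) or closed (has a coda),
    using its final letter as a simplified orthographic test."""
    return "open" if syllable[-1].lower() in VOWEL_LETTERS else "closed"


if __name__ == "__main__":
    # los me.xi.ca.nos, with primary stress on 'ca'
    print(code_stress_position(1, None))
    print(code_stress_position(4, 2))
    # final syllables of ha.bla and ha.blan
    print(code_syllable_type("bla"), code_syllable_type("blan"))
```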

3.4.4.2 Social Variables

The external independent variables under investigation are age, gender, and socioeconomic status (operationalized by using the speaker’s highest level of formal educational attainment).

3.4.4.2.1 Gender

Labov (1990) identifies some consistent patterns with regard to gender and linguistic variation and change that are present across a wide range of languages and regional settings, including Spanish varieties in Latin America (Cedergren 1973, Fontanella de Weinberg 1974, Alba 1990) and Spain (Holmquist 1985). He formulates these as the following principles of linguistic change (1990: 210, 213, 215):

Principle I. For stable sociolinguistic stratification, men use a higher frequency of nonstandard forms than women.

Principle Ia. In change from above,¹² women favor the incoming prestige form more than men.

Principle II. In change from below,¹³ women are most often the innovators.

However, there are exceptions to these principles, typically related to women’s roles in the particular society under study. Eckert (1989), and later, Eckert and McConnell-Ginet (1992) advance an explanation for linguistic differences between men and women based on the idea of the linguistic marketplace (Sankoff & Laberge 1978): women’s greater need for symbolic capital, and language as a “symbolic resource” (Eckert and McConnell-Ginet 1992: 483).

In Mexico, gender has been found to influence linguistic variation. Rissel (1989), in her investigation of assibilation of /r/ in young speakers from San Luis Potosí, Mexico, finds that women exhibit higher rates of assibilation, in particular women who hold traditional attitudes about gender roles, while men holding traditional views assibilate the least. She finds interactions with class as well, and her findings lead her to propose that when “a variable that carries local prestige is introduced by women of the middle and upper classes, it will spread to women of the next lowest sociocultural group, who, being particularly sensitive to the prestige pattern (Labov, 1972:243), will adopt that trait to such an extent that younger female members of that group will surpass the originators of the change” (282).

12,13 Change from “above” or “below” refers to the level of consciousness or social awareness. In stable variation or change from above, linguistic variants may be prestige forms or stigmatized forms, and in change from below, according to Labov (1966), there is no important distinction with regard to prestige or stigmatized forms.

Aaron (2004) explores the gendered use of a morphosyntactic variable across time in Mexico, the pronominal form of salir(se) “to leave”, to understand the ways in which it reflects and is constitutive of societal gender norms. She not only finds that the form is used more by women than by men, but also that diachronically the form has tended to be applied to women’s behavior, that is, used with female referents. She finds the differences in men’s and women’s usage of the form to be partially influenced by their choice of topic and semantic content, in line with broader theories of language and gender, and interprets her results as an indication of “both the relative expressive freedom of women’s speech and the socially constrained nature of expectations for female behavior in colonial and contemporary Mexican society” (585). Looking at gender variation in the present study will add to our understanding of women's role in the linguistic marketplace in Mexico.

3.4.4.2.2 Age

Eckert (1997) provides a perspective on age and aging in sociolinguistics as reflective of one’s linguistic life course. In doing so, she also offers a concise summary of age as a sociolinguistic construct, and the implications of age stratification of linguistic variables. She defines age as “a person's place at a given time in relation to the social order: a stage, a condition, a place in history” (1997: 151). Historical change, or the change over time of the speech of a community, is understood by viewing age as a “place in history” for a particular speaker. Age grading, or change in an individual’s speech over time, is based on the view of age as a stage or condition. Trend studies can show change in progress, while panel studies show change in the individual lifetime.

As Eckert notes, age can be “incorporated into social structure and invested with value in culturally specific ways” (155), so age cohorts for community studies of linguistic variation may be defined etically or emically. Etic definitions of age are determined arbitrarily, and based on equal age spans such as decades. Emic definitions are based on some shared experience of time, which might be based on life stage, or the experience of shared external events, as we have seen above for the Cuzco earthquake and resulting increased access to other Spanish varieties (Delforge 2009, 2012). Possibilities for emic definitions of age for Mexico City might take into account the shared experience of such events as the 1968 Tlatelolco student massacre and the Olympic Games that followed, and the 1985 earthquake. Although an argument could be made for an emic treatment of age in Mexico City, for simplicity’s sake, I used three groups divided based on my sample. The younger group included participants between the ages of 21 and 33, the middle group 34 to 50, and the older group 51 to 81.
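The three age groups can be written as a simple binning function. This is only an illustrative sketch: the function name and group labels are hypothetical, and ages outside the sampled range of 21 to 81 are rejected rather than assigned to a group.

```python
def age_group(age: int) -> str:
    """Assign a participant to one of the three age groups used in this study."""
    if 21 <= age <= 33:
        return "younger"
    if 34 <= age <= 50:
        return "middle"
    if 51 <= age <= 81:
        return "older"
    # The sample only covers ages 21 to 81; anything else is treated as an error.
    raise ValueError(f"age {age} is outside the sampled range (21 to 81)")


if __name__ == "__main__":
    for a in (21, 33, 34, 50, 51, 81):
        print(a, age_group(a))
```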

3.4.4.2.3 Socioeconomic status

“The linguistic data will help illuminate the structure of our society and identify social divisions and points of conflict and convergence. They will illustrate the class-based nature of standard varieties of language and the subjective nature of linguistic prejudice. And they will help reveal the sources of social innovation and the motivations of the innovators” (Guy, 1988: 37).

The above quote indicates the importance of social class in linguistic variation. But however central the theoretical construct of social class is to the sociolinguistic enterprise, it is not without its problems. While the examination of variation according to social class has yielded robust results, the construct itself is complex and not well-defined. Ash (2002) notes that social class has been one of the most productive dimensions in sociolinguistic research, but she characterizes its use by many researchers as mechanical and naïve. She laments the fact that although the field of sociology has made great strides in understanding social class and its centrality to social structure, sociolinguists do not often engage with the sociological research, and that, within the field of sociolinguistics, class has not been systematically studied. Chambers also observes the definitional problems with class, calling it a “fuzzy categor[y]... better defined for prototypical members than for peripheral ones” (2003: 43). This observation captures the recognition that class is not merely based on objective, economic measures such as income and property ownership, but also takes into account subjective measures of power, prestige, control, reputation, and status (Ash 2002). Nonetheless, it is worth continuing to explore social class as a variable because it clearly represents one of the most important ways in which we understand and categorize our social worlds, and because of this, it is likely to affect linguistic variation.

Ash (2002) reports that many sociolinguistic studies, in order to operationalize class in a way that takes into account the multiple markers of class, create composite scores which typically include some combination of income, education, and occupation, assigned numerical scores and ranked. Local measures of prestige may also be included, such as Cedergren’s (1973) neighborhood measure of percentage of homes with refrigerators, or Labov’s (2001) measure of house upkeep. However, Chambers signals a move away from composite scores toward the use of occupation alone to determine social status, calling the composite scores “much more elaborate than is necessary in sociolinguistics” (2003: 52). While Ash criticizes this move as overly simplistic, she also notes that occupation typically makes the greatest contribution to composite indices, and Delforge (2009) observes that this may be due to its strong correlation with other measures like education and income.

Delforge (2009) also recognizes that applying a concept of social class conceived with western industrialized societies in mind to third world societies may be problematic. However, she reviews several studies that made use of class divisions based on either education or occupation or a combination of the two, and finds class to be a significant predictor of variation. The studies she reviews were from across Latin America: Ecuador (Gomez 2003), Puerto Rico (Lopez Morales 1983, Prosper Sanchez 1995), Argentina (Fontanella de Weinberg 1972), (Silva Corvalan 1979), the Yucatán (Michnowicz 2006, Solomon 1999), and Costa Rica (Adams 2002). Other studies like Cedergren (1973) for Panama, Berk Seligman and Seligman (1978) for Costa Rica, and De los Heros (1997) for Cuzco, found relevant distinctions in variation by using composite scores that include locally relevant criteria to approximate social class. Delforge concludes that despite the controversy, conceptions of class in Latin America are similar enough to allow the same factors used in U.S. and European studies; that is, either education or occupation, or a combination of the two.

Based on the above discussion, socioeconomic status (or social class) for this study is based on the speaker’s level of education. The scale for education is based on the main schooling divisions in Mexico: primaria, secundaria, preparatoria, licenciatura, especialización, maestría, doctorado. Children attend primaria from ages 6 to 12, secundaria from 12 to 16, and preparatoria or bachillerato from 16 to 18. Primaria is compulsory, as is secundaria, unless the student chooses some form of vocational training instead. Preparatoria is designed to prepare students for university study, and is not compulsory. There are vocational alternatives at this level as well, usually referred to as carrera técnica. Licenciatura refers to a 4-year university degree, and especialización is a one-year post-licenciatura program, with a special focus in one area. Maestría is typically a two-year Master's degree program, and doctorado is the equivalent of a doctorate, which could last many years, depending on the field. For each participant, I coded the highest level of formal education they had attained, but because of the way data were distributed across the different variables, these were collapsed into three main categories: “finished college”, “prep or tech”, “primary/secondary”. The “finished college” level includes any participants who had completed at least a licenciatura, but also several who had completed especialización or maestría after their licenciatura. The “prep or tech” level includes those participants who had attended and completed preparatoria or vocational school, some of whom may have attended some college but did not complete a licenciatura. Finally, the “primary/secondary” level includes those participants who had completed only primaria as well as those who had completed secundaria.
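The collapse of schooling levels into three categories can be sketched as a lookup table. The mapping below is an illustration: the dictionary and function names are hypothetical, and placing doctorado under “finished college” is an assumption that follows from the “at least a licenciatura” criterion rather than something stated explicitly.

```python
# Highest level completed -> collapsed category (labels as used in the text).
EDUCATION_GROUPS = {
    "primaria": "primary/secondary",
    "secundaria": "primary/secondary",
    "preparatoria": "prep or tech",
    "carrera técnica": "prep or tech",
    "licenciatura": "finished college",
    "especialización": "finished college",
    "maestría": "finished college",
    # Assumption: a doctorate implies a completed licenciatura.
    "doctorado": "finished college",
}


def education_group(highest_completed: str) -> str:
    """Return the collapsed education category for a participant."""
    return EDUCATION_GROUPS[highest_completed.strip().lower()]


if __name__ == "__main__":
    for level in ("secundaria", "preparatoria", "Maestría"):
        print(level, "->", education_group(level))
```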

3.5 Statistical analysis

Descriptive statistics help researchers understand the general patterns of variation in their dataset, i.e. how many or what percentage of tokens of a particular variant were uttered by a particular type of speaker, or in a particular context; however, these results cannot be generalized to a wider population. Inferential statistics are needed in order to draw conclusions from the data about a wider population.

Because many factors influence sociolinguistic variation, multivariate analyses are needed in order to understand the contribution of each independent social and linguistic variable to a particular dependent variable. The sociolinguistic standard has been the use of the statistical analysis software GoldVarb (Sankoff, et al. 2005), which can only handle fixed effects regression models. However, for the Mexican Spanish vowel reduction data, these proved to be inadequate. Independence is one of the assumptions of regression analysis, and because my data contain multiple tokens from each speaker, the data points cannot be said to be independent. Because of this, a mixed effects model including speaker as a random effect was used. In order to analyze continuous variables, such as percent voicing, a linear regression model can be run; however, that model assumes normally distributed data. Since the distribution of my data is bimodal, with peaks at 0% and 100%, I used regression with an inflated beta distribution, which is designed to work with percentages. For the categorical analysis, I used ordinal logistic regression, which allows for more than two categories.
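For readers unfamiliar with the inflated beta distribution, one common way of writing a zero-and-one inflated beta density is given below. This is a generic parameterization assumed here for illustration: the symbols $p_0$, $p_1$, $\mu$, and $\phi$ are not taken from the analysis itself, and the software used may parameterize the model differently. Here $y$ is percent full voicing rescaled to the interval $[0, 1]$, $p_0$ and $p_1$ are the probabilities of exactly 0% and exactly 100% voicing, and $\mu$ and $\phi$ are the mean and precision of the beta component.

```latex
f(y) =
\begin{cases}
p_0, & y = 0,\\[4pt]
p_1, & y = 1,\\[4pt]
(1 - p_0 - p_1)\,
\dfrac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\bigl((1-\mu)\phi\bigr)}\,
y^{\mu\phi - 1}\,(1 - y)^{(1-\mu)\phi - 1}, & 0 < y < 1.
\end{cases}
```

The point masses at 0 and 1 absorb the two peaks of the bimodal distribution, while the beta component describes the partially voiced tokens in between.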

In addition to the regression models, two non-parametric tests were also used to assess the data: random forests and conditional inference trees. According to Tagliamonte and Baayen, “[r]andom forests provide information about the importance of predictors, whether factorial or continuous, and do so also for unbalanced designs with high multicollinearity, cases for which the family of linear models is less appropriate” (2012: 135). Because my data are not distributed evenly and do contain variables that are collinear, random forests are useful for my analysis. Conditional inference trees provide information about interactions between independent variables, and “straightforwardly visualize how multiple predictors operate in tandem” (Tagliamonte and Baayen 2012: 135). According to Tagliamonte and Baayen, conditional inference trees and regression analyses provide slightly different information, so they are particularly informative when used together.

CHAPTER 4: RESULTS

4.1 Introduction

This chapter presents an exploration of the distribution of the data, as well as the results of the various descriptive and inferential statistical tests performed on these data. The statistical analyses described in this section include several different dependent and independent variables. The acoustic analysis revealed both shortening and voice weakening as contributors to reduction in this variety, and subsequent examination of patterning revealed that the two strategies seem to be independent of each other; that is, many tokens exhibited either voice weakening or shortening, but typically not both. Because of this, it was necessary to perform mixed effects linear and logistic regression analyses of each dependent variable separately, in order to understand which of the linguistic and social factors contribute most to the variation in duration and in voice weakening. Both voice weakening and shortening were analyzed as both categorical and continuous variables. As discussed in Chapter 3, the independent variables considered were the target vowel; preceding and following contexts, including pauses, and voicing of adjacent consonants;¹⁴ position relative to primary word stress; syllable type; word position; gender; age; and speaker’s level of education (as a proxy for socioeconomic status). I also explored the average F1 and F2 values in order to assess changes in vowel quality as a factor contributing to vowel weakening in this variety.

14 Manner and place of articulation were also explored, but due to the low rates of both voice weakening and shortening, could not be included in any inferential statistical models.

4.2 Distribution of data

This section explores the patterning of both voice weakening and shortening overall and as they each relate to the independent variables explored, which are: target vowel, stress, position relative to stress, preceding and following contexts, syllable type, word position, speaker gender, speaker age and speaker level of education.

While speech rate was not considered in this analysis, it was investigated in Dabkowski (2017), where I found that voice weakening is not dependent on speech rate. A reading task was administered together with the sociolinguistic interviews from which the data reported on in this dissertation were extracted. Very little shortening was found in the reading task, presumably because participants were consciously trying to control their tempo, so the results speak only to voice weakening, and not to shortening. A flaw in the design of the task administration complicates the results; however, there is at least one important conclusion that should be highlighted before continuing the present discussion: vowel reduction in MCS occurs independently of speech rate. The task was administered in such a way that participants were asked to read sentences at self-determined normal, fast, and slow speech rates. The flaw was that they read each sentence three times in succession, always in the same order: normal, fast, slow. Counter to expectations, the most voice weakening was found in the slow condition, which I interpret as an effect of the order of task presentation: since participants had already read each sentence twice before reading it slowly, they were aware that no new information was being communicated and therefore relaxed their articulation accordingly. Therefore, we can preliminarily conclude that MCS voice weakening cannot be entirely accounted for by speed-induced overlap, and thus we look to the other factors explored here.

4.2.1 Voice weakening

Voice weakening was explored both as a categorical variable and as a continuous variable. Categories of voice weakening were determined and assigned to each token based on the acoustic analysis, as described in section 3.4.3.1. These include: fully voiced, partially weakened, completely weakened, partially devoiced, completely devoiced, and apparently deleted. For the continuous analysis, a measure of percent full voicing was used, also described in section 3.4.3.1. In this section, I focus on the distribution of data as it pertains to the categorical dependent variable of voice weakening, and do not report further on tendencies for percent of full modal voicing.

Overall rates of voice weakening in the data explored here are quite low. Table 4.1 shows the rates of the different categories of voicing found in the data. As the table shows, only 6.6% of vowels measured were found to have weakened voicing of any kind, including all of the categories that are not fully voiced: partially weakened, completely weakened, partially devoiced, completely devoiced, and apparently deleted. Although the overall rates are quite low, there is quite a bit of individual variation among speakers as well. Figure 4.1 shows that the speakers analyzed here have a wide variety of individual voice weakening rates, between 1.74% and 15.68%.

Table 4.1. Overall rates of voice weakening

Voicing                   Counts   Percent
Fully voiced                5624    93.39%
Weakened voicing             398     6.61%
  Partially weakened         139     2.31%
  Completely devoiced         93     1.54%
  Completely weakened         81     1.35%
  Partially devoiced          43     0.73%
  Apparently deleted          42     0.70%
Total                       6022   100%

Previous studies that looked at vowel weakening in Spanish only considered unstressed vowels and for this reason, I examine voice weakening according to stress. When we consider the unstressed vowel tokens separately from stressed vowels, in Table 4.2, we can see that although still relatively low, the rate of voice weakening increases to 8.36%. The rate for stressed vowels is quite a bit lower, at only 3.42%. When we examine the rates for individual speakers based on stress, we see even greater individual rates for voice weakening in unstressed positions, up to 22.12%, as shown in figure 4.2.

Table 4.2. Rates of voice weakening by stress

                          Stressed                Unstressed
Voicing                   Frequency   Percent     Frequency   Percent
Fully voiced                   2060    96.56%          3564    91.64%
Weakened voicing                 73     3.42%           325     8.36%
  Partially weakened             31     1.45%           108     2.78%
  Completely devoiced            13     0.61%            80     2.06%
  Completely weakened            15     0.70%            66     1.70%
  Partially devoiced              8     0.38%            35     0.90%
  Apparently deleted              6     0.28%            36     0.93%
Total                          2133   100%             3889   100%

Figure 4.1. Individual rates of voice weakening

[Figure: bar chart titled “Voice weakening rates by Speaker”. x-axis: Speaker; y-axis: Percentage of voice weakening.]

Figure 4.2. Individual rates of voice weakening for unstressed vowels only

[Figure: bar chart titled “Voice weakening rates by speaker in unstressed vowels”. x-axis: Speaker; y-axis: Percentage of voice weakening.]

Because of the low incidence overall of tokens with voice weakening, and the even lower incidence of voice weakening among stressed vowels, it was necessary to collapse several of the categories together. The categories “completely devoiced” and “apparently deleted” were collapsed into the category “completely devoiced/deleted”. The categories “completely weakened”, “partially weakened/breathy”, and “partially devoiced” were collapsed into the category “intermediate”. Figure 4.3 shows another view of the information contained in Table 4.2, with the aforementioned collapsing of categories present. The figure shows raw counts of stressed and unstressed tokens, and it is easy to see the lower incidence of any type of weakened voicing in stressed vowels, as compared to unstressed vowels.

Figure 4.3. Voice weakening type by stress

[Figure: bar chart of token counts. x-axis: Stress (stressed, unstressed); y-axis: count; legend (Voice weakening): fully voiced, intermediate, completely devoiced/deleted.]

Ultimately, because of the extremely low rates of any individual category of weakening, the intermediate weakening category needed to be further collapsed together with the completely devoiced and deleted category to create one category containing all types of weakening. All further data visualization charts compare only two categories: weakened vowels and fully voiced vowels.

To further explore the effect of lexical stress, I considered the position of each vowel relative to main lexical stress. Each token was coded for this based on occurrence of the vowel in a stressed syllable, a post-tonic syllable, a pre-tonic syllable, or a monosyllabic unstressed word. Figure 4.4 shows the rates of weakening for each of these categories. As the chart makes clear, the rate of voice weakening for vowels in post-tonic syllables is much greater than for any of the other three positions. Vowels in pre-tonic and stressed syllables weaken at similar rates: 3.25% and 3.42%, respectively. Unstressed monosyllabic words have a slightly higher rate of voice weakening at 4.63%, but none of these compare to the rate of voice weakening for vowels in post-tonic position, at 15.72%.

Figure 4.4. Voice weakening by position relative to primary word stress

[Figure: bar chart titled “Voice Weakening by Position Relative to Lexical Stress”. x-axis: Position relative to stressed syllable (unstressed monosyllabic word, pretonic, post-tonic, stressed); y-axis: Percentage of voice weakening; legend: Fully voiced, Weakened. Data labels as extracted: 15.72%, 4.63%, 3.25%, 3.42%.]

Voice weakening rates were also explored across each target vowel, in order to understand whether any vowels were more likely to weaken when compared to others. First, it is important to consider the relative frequency of each target vowel in the corpus. Table 4.3 shows the frequency of occurrence of each vowel, as well as the number of fully voiced and weakened occurrences. The high vowels /i, u/ have much lower rates of occurrence which correspond to their overall lower frequency in Spanish. Figure 4.5 shows the rates of voice weakening within each vowel type: /a, e, i, o, u/. For /a/, /e/, and /i/, the rate of voice weakening hovers around 6.00%. For /o/, it is quite a bit higher at 9.00%, and for /u/, it is much lower at only 4.32%. Because of the much lower rates of occurrence for the high vowels, and especially due to their uneven distribution across the other linguistic and social factors, they were collapsed together into one “high vowel” category for the inferential statistical analyses.

Table 4.3. Frequency of occurrence and voice weakening rates of each target vowel

Target vowel        /a/     /e/     /i/     /o/     /u/
Fully voiced       1585    1609     670    1406     354
Weakened            103     100      40     139      16
Totals             1688    1709     710    1545     370
Weakening rates   6.10%   5.85%   5.63%   9.00%   4.32%
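The weakening rates in Table 4.3 follow directly from the counts, and the same arithmetic gives the pooled rate for the collapsed “high vowel” category. The following is a minimal Python sketch for checking those figures; it is not the procedure used in the dissertation, the variable and function names are illustrative, and the pooled /i, u/ rate is derived here from the table counts rather than reported in the text.

```python
# Counts copied from Table 4.3: (fully voiced, weakened) per target vowel.
counts = {
    "a": (1585, 103),
    "e": (1609, 100),
    "i": (670, 40),
    "o": (1406, 139),
    "u": (354, 16),
}


def weakening_rate(fully_voiced, weakened):
    """Return the share of weakened tokens as a percentage."""
    return 100 * weakened / (fully_voiced + weakened)


# Per-vowel rates, rounded to two decimals as in the table.
rates = {vowel: round(weakening_rate(*c), 2) for vowel, c in counts.items()}

# Pool /i/ and /u/ into a single "high vowel" level (derived, not reported).
high_voiced = counts["i"][0] + counts["u"][0]
high_weakened = counts["i"][1] + counts["u"][1]
high_rate = round(weakening_rate(high_voiced, high_weakened), 2)

print(rates)
print("high vowels:", high_rate)
```

Pooling the counts before dividing, rather than averaging the two percentages, weights each vowel by its number of tokens.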

Figure 4.5. Voice weakening rates by target vowel

[Figure: bar chart titled “Voice Weakening by Target Vowel”. x-axis: Target Vowel (a, e, i, o, u); y-axis: Percentage of voice weakening; legend: Fully Voiced, Weakened. Data labels as extracted: 9.00%, 6.10%, 5.85%, 5.63%, 4.32%.]

The preceding and following contexts of the vowel of interest were also considered as a potential factor influencing voice weakening. The left panel of figure 4.6 shows rates for voice weakening across three following contexts: voiceless consonant, voiced consonant, and pause. As the graph clearly shows, vowels followed by a pause were much more likely to be weakened than those followed by a consonant: 30.03% of these vowels are weakened. For those vowels that are followed by a consonant, only 1.39% are weakened when that consonant is voiced, compared to 9.75% when the following consonant is voiceless. However, due to the overall low rates of voice weakening, assessing these three levels separately was not possible, because of empty cells in the cross-tabulation matrix when additional factors were added. Because of this, the following voiceless consonant and pause contexts were collapsed into one category that included both contexts for the purposes of conducting further statistical analyses of voice weakening. For example, the final /o/ vowel in actos, “acts”, would be considered in the final statistical analysis as part of the same following context as the final vowel in an open pre-pausal syllable in acto, “act”. The category containing both of those contexts is compared to the category containing tokens with a following voiced consonantal context, for example the initial /a/ vowel in mamá, “mom”.

While we see well-defined differences between weakening rates according to the following context, the distinction between weakening rates according to preceding contexts is less obvious. The right panel in Figure 4.6 shows that the weakening rates across preceding contexts are all fairly similar. Preceding voiceless contexts do result in slightly higher rates of weakening than preceding pauses, at 7.15% and 5.19%, respectively, with preceding voiced consonants falling in the middle. However, due to empty cells in the cross-tabulation matrix, the preceding voiced consonant and pause contexts were collapsed into one category that included both contexts for the purposes of conducting further statistical analyses of voice weakening. Since the voice weakening rates for these two contexts differed by less than one percentage point, their patterning was similar enough to justify collapsing categories in this way.

Figure 4.6. Voice weakening rates by following (left panel) and preceding (right panel) context

[Figure: two bar charts titled “Voice Weakening by Following Context” (left panel) and “Voice Weakening by Preceding Context” (right panel). x-axis in each panel: voiceless, voiced, pause; y-axis: Percentage of voice weakening; legend: Fully voiced, Weakened. Data labels as extracted: 30.03%, 9.75%, 7.36%, 6.09%, 5.19%, 1.39%.]

Another factor explored as a possible predictor was syllable type. Open syllables were compared to closed syllables and found to weaken at similar rates. Open syllables weakened at a slightly lower rate, 6.36%, which is 0.89 percentage points lower than the 7.25% weakening found for closed syllables. Figure 4.7 shows the comparison.

Figure 4.7. Voice weakening rates by syllable type

[Figure: bar chart titled “Voice Weakening by Syllable Type”. x-axis: Syllable type (open, closed); y-axis: Percentage of voice weakening; legend: Fully voiced, Weakened. Data labels as extracted: 6.36%, 7.25%.]

The position of the vowel within the word was also considered, and a sizeable difference was found between word-final vowels and non-final, or word-internal, vowels. Figure 4.8 shows that 9.77% of word-final vowels were weakened, whereas only 3.56% of word-internal vowels were weakened. Word-final position also corresponds in many (but certainly not all) cases with post-tonic position, so this pattern is not surprising.

Figure 4.8. Voice weakening rates by word position

[Figure: bar chart titled “Voice Weakening by Word Position”. x-axis: Word position (word final, word internal); y-axis: Percentage of voice weakening; legend: Fully voiced, Weakened. Data labels as extracted: 9.77%, 3.56%.]

The age of the speaker was also considered, and the younger age group, that is, speakers between the ages of 21 and 33, showed lower rates of voice weakening than either the middle or older groups. The difference between middle and older speakers was negligible (7.13% and 6.82%, respectively), as can be seen in figure 4.9.


Figure 4.9. Voice weakening rates by age group of the speaker

[Figure: bar chart titled “Voice Weakening by Age of Speaker”. x-axis: Age of speaker (21-33, 34-50, 51-81); y-axis: Percentage of voice weakening; legend: Fully voiced, Weakened. Data labels as extracted: 7.13%, 5.99%, 6.82%.]

Voice weakening was also analyzed with regard to the speakers’ highest level of educational attainment. As figure 4.10 shows, while the group of speakers who had graduated from a university has a lower voice weakening rate (4.93%) than the other groups, the other two groups, those who completed preparatory school or a technical vocational school (8.00%) and those who did not advance beyond secondary school (7.94%), show nearly the same rates of voice weakening.

Figure 4.10. Voice weakening rates by speaker’s education level

[Figure: bar chart titled “Voice Weakening by Formal Educational Attainment of Speaker”. x-axis: Highest level of formal education of speaker (finished college; prep or tech; primary/secondary); y-axis: Percentage of voice weakening; legend: Fully voiced, Weakened. Data labels as extracted: 8.00%, 7.94%, 4.93%.]

Figure 4.11 shows the distribution of weakening by gender of the speaker. As the figure shows, there was only a negligible difference in these rates: on average, the speech of males contained slightly more voice weakening than that of females, but the difference was slight, only 0.53%.


Figure 4.11. Voice weakening rates by gender of speaker

[Figure: bar chart titled “Voice Weakening by Speaker Gender”. x-axis: Speaker gender (female, male); y-axis: Percentage of voice weakening; legend: Fully voiced, Weakened. Data labels as extracted: 6.36%, 6.89%.]

4.2.2 Shortening

Overall, I found that 7.41% of vowels in the corpus were shortened: 7.66% of unstressed vowels, and 6.94% of stressed vowels. Like voice weakening, shortening also showed a high degree of individual variation. Individual speakers varied in their rates of shortening from 1.91% to 16.90% overall, as shown in figure 4.12 and from 1.94% to 22.12% for unstressed vowels, as shown in figure 4.13.


Figure 4.12. Individual rates of shortening overall

[Figure: bar chart titled “Shortening rates by speaker”. x-axis: Speaker; y-axis: Percentage of shortening.]

Figure 4.13. Individual rates of shortening for unstressed vowels only

[Figure: bar chart titled “Shortening rates by speaker in unstressed vowels”. x-axis: Speaker; y-axis: Percentage of shortening.]

Vowels differ in their inherent duration based on quality: high vowels tend to be shorter, while low vowels tend to be longer (Lehiste, 1970). In Spanish, duration is also a phonetic cue of lexical stress (Ortega-Llebaria, 2006), so in this study, stressed and unstressed vowels were considered separately for each vowel (/a, e, i, o, u/). Figure 4.14 demonstrates the average vowel duration measurements, in milliseconds (ms), across the entire set of interview data, separated out by target vowel (/a, e, i, o, u/) and by stress. The pane on the left shows unstressed vowels, and as expected, these are shorter on average than their stressed counterparts, and also less variable (hence, the smaller boxes, representing less distance between the first and third quartiles). As expected, /a/ is the longest and high vowels /i/ and /u/ are the shortest.

Figure 4.14. Vowel duration in ms, shown for each target vowel, separated by stress

[Figure: box plots in two panels, Unstressed (left) and Stressed (right). x-axis: Target vowel (a, e, i, o, u); y-axis: Vowel duration (ms).]

Vowel shortening was examined from two different perspectives: as a categorical variable and as a continuous variable. In the categorical analysis, vowels were classified as either “shortened” or “not shortened” using the criteria described in Chapter 3, and that classification was used as the dependent variable. In the continuous analysis, a measurement of each vowel’s duration was taken, in milliseconds, and that measure was used as the dependent variable. As above for voice weakening, I focus on the distribution of data as it pertains to the categorical dependent variable of shortening, and do not report further on tendencies for duration.


Figure 4.15 shows the rates of shortening for the categorical analysis based on the target vowel. As the figure shows, the highest rate of shortening occurred for /i/, at 10.14%, followed closely by /o/ at 9.39%, /e/ at 8.72%, and /u/ at 7.03%. Productions of /a/ were infrequently shortened: only 3.20% of occurrences were shortened.

Figure 4.15. Vowel shortening rates by target vowel

[Figure: bar chart titled “Shortening by Target Vowel”. x-axis: Target Vowel (a, e, i, o, u); y-axis: Percentage of shortened vowels; legend: Not shortened, Shortened. Data labels as extracted: 10.14%, 8.72%, 9.39%, 7.03%, 3.20%.]

Shortening rates for different positions relative to lexical stress are shown in figure 4.16. Vowels in pre-tonic and stressed syllables shorten at similar rates, 7.53% and 6.94%, respectively, neither very much higher than the rate of post-tonic shortening, at 6.37%. Vowels in monosyllabic unstressed words are shortened at 9.63%, a good deal higher than the other positions.

Figure 4.16. Vowel shortening rates by position relative to lexical stress

[Figure: bar chart titled “Shortening by Position Relative to Lexical Stress”. x-axis: Position relative to stressed syllable (unstressed monosyllabic word, pretonic, post-tonic, stressed); y-axis: Percentage of shortened vowels; legend: Not shortened, Shortened. Data labels as extracted: 9.63%, 7.53%, 6.37%, 6.94%.]

Figure 4.17 shows the shortening rates by following context on the left, and by preceding context on the right. Both following and preceding contexts show the highest rates of shortening in the voiceless condition, at 10.09% and 11.66%, respectively. For following context, the next highest rate of shortening is for voiced consonants, at 6.04%, and only 2.51% for vowels followed by pauses. Preceding pauses, on the other hand, result in a higher rate of vowel shortening, 9.91%, while preceding voiced consonants result in a much lower rate, 3.74%.

Figure 4.17. Vowel shortening rates by following (left panel) and preceding (right panel) context

[Figure: two bar charts titled “Shortening by Following Context” (left panel) and “Shortening by Preceding Context” (right panel). x-axis in each panel: voiceless, voiced, pause; y-axis: Percentage of shortened vowels; legend: Not shortened, Shortened. Data labels as extracted: 11.66%, 10.09%, 9.91%, 6.04%, 3.74%, 2.51%.]

The shortening rates for open and closed syllables are represented in figure 4.18, but as the figure shows, there is only a negligible difference, with vowels in open syllables being produced with a 0.40% higher rate of shortening.

Figure 4.18. Vowel shortening rates by syllable type

[Figure: bar chart titled “Shortening by Syllable Type”. x-axis: Syllable type (open, closed); y-axis: Percentage of shortened vowels; legend: Not shortened, Shortened. Data labels as extracted: 7.49%, 7.19%.]

Word position shows a similarly small difference in shortening rates. Word-internal vowels shorten at only a very slightly higher rate, 7.63%, than word-final vowels, at 7.18%, a difference of only 0.45%, as shown in figure 4.19. It is useful to think about word position in terms of position relative to stress as well. Often (but not always), word-final position represents post-tonic position, and word-internal position includes pre-tonic and stressed positions.

Figure 4.19. Vowel shortening rates by word position

[Figure: bar chart titled “Shortening by Word Position”. x-axis: Word position (word final, word internal); y-axis: Percentage of shortened vowels; legend: Not shortened, Shortened. Data labels as extracted: 7.18%, 7.63%.]

The social variables show less variation for shortening than they do for voice weakening. Unlike the rates of voice weakening across different age groups, younger speakers do not exhibit lower rates of shortening. Figure 4.20 below shows the vowel shortening rates by the age group of the speaker, and it is clear that there are only very small differences between the groups, less than 1.00% between the highest and lowest rates.

Figure 4.20. Vowel shortening rates by age group of speaker

[Figure: bar chart titled “Shortening by Age of Speaker”. x-axis: Age of speaker (21-33, 34-50, 51-81); y-axis: Percentage of shortened vowels; legend: Not shortened, Shortened. Data labels as extracted: 7.75%, 7.55%, 6.95%.]

Shortening rates also did not vary greatly by the speaker’s highest level of formal educational attainment. The lowest rate of shortening, 6.51%, was found for those who had finished primary or secondary school, and only differed from the highest rate, 8.45%, by 1.94%, as shown in figure 4.21.

Figure 4.21. Vowel shortening rates by speaker’s highest level of formal educational attainment

[Figure: bar chart titled “Shortening by Formal Educational Attainment of Speaker”. x-axis: Highest level of formal education of speaker (finished college; prep or tech; primary/secondary); y-axis: Percentage of shortened vowels; legend: Not shortened, Shortened. Data labels as extracted: 7.57%, 8.45%, 6.51%.]

Finally, the difference in shortening rates between male and female speakers, while greater than the small difference found for voice weakening, was still fairly small, with females shortening vowels just 1.25% more often than males. This difference is shown in figure 4.22.

Figure 4.22. Vowel shortening rates by gender of speaker

[Figure: bar chart titled “Shortening by Speaker Gender”. x-axis: Speaker gender (female, male); y-axis: Percentage of shortened vowels; legend: Not shortened, Shortened. Data labels as extracted: 8.00%, 6.75%.]

4.3 Inferential statistical models

After exploring the distribution of both voice weakening and shortening, a number of inferential statistical models were created, in order to better understand the contributions of each variable to the patterns observed above in section 4.2. This section details the statistical tests performed to evaluate the contribution of the independent variables to the likelihood of voice weakening and shortening, as well as to understand the ways in which independent variables interact with each other. Statistical tests and models reported here include parametric and non-parametric regression models, including mixed effects logistic regression, mixed effects linear regression, univariate distributional regression, random forests, conditional inference trees, and Wilcoxon signed rank tests.

Due to the low overall rates of voice weakening and shortening, several of the levels for each independent factor needed to be collapsed together in order to avoid empty cells in the cross-tabulation of results. Table 4.4 shows each factor with its original levels and collapsed levels. Speaker gender, syllable type, and word position do not appear in the table because each of these variables originally contained only two levels.


Table 4.4. Factors and levels that were collapsed

Factor                        Original levels                      Collapsed levels
Target vowel                  5 (a, e, i, o, u)                    4 (a, e, o, i/u)
Following context             3 (pause, voiced consonant,          For voice weakening analysis:
                              voiceless consonant)                 2 (pause/voiceless consonant, voiced consonant)
                                                                   For shortening analysis:
                                                                   2 (pause/voiced consonant, voiceless consonant)
Preceding context             3 (pause, voiced consonant,          2 (pause/voiced consonant, voiceless consonant)
                              voiceless consonant)
Position relative to stress   4 (unstressed monosyllabic word,     For voice weakening analysis:
                              pre-tonic, post-tonic, stressed)     3 (unstressed monosyllabic word/pre-tonic, post-tonic, stressed)
                                                                   For shortening analysis:
                                                                   2 (unstressed monosyllabic word/pre-tonic, post-tonic/stressed)
Age group                     Originally numeric (21-81)           3 (younger (21-32), middle (33-50), older (51-81))
Education                     6 (primary, secondary,               3 (primary/secondary, preparatory/vocational,
                              preparatory, vocational school,      college/post-graduate)
                              college, post-graduate)
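The collapsing in Table 4.4 can be thought of as a recoding of factor levels, followed by a check that the cross-tabulation no longer has empty cells. The following Python sketch illustrates that idea for the voice weakening analysis only. It is not the author's code: the dictionary, key names, and helper functions are assumptions made for illustration, and the four toy tokens are invented solely to exercise the functions.

```python
from collections import Counter
from itertools import product

# Collapsed levels for the voice weakening analysis, following Table 4.4.
# Key and level labels are illustrative, not the author's coding scheme.
COLLAPSE = {
    "target_vowel": {"a": "a", "e": "e", "o": "o", "i": "i/u", "u": "i/u"},
    "following": {
        "pause": "pause/voiceless C",
        "voiceless C": "pause/voiceless C",
        "voiced C": "voiced C",
    },
    "preceding": {
        "pause": "pause/voiced C",
        "voiced C": "pause/voiced C",
        "voiceless C": "voiceless C",
    },
    "position": {
        "monosyllabic": "monosyllabic/pre-tonic",
        "pre-tonic": "monosyllabic/pre-tonic",
        "post-tonic": "post-tonic",
        "stressed": "stressed",
    },
}


def collapse(token):
    """Return a copy of a token with every mapped factor recoded."""
    return {k: COLLAPSE.get(k, {}).get(v, v) for k, v in token.items()}


def empty_cells(tokens, factors):
    """List level combinations of `factors` that never occur in `tokens`."""
    seen = Counter(tuple(t[f] for f in factors) for t in tokens)
    levels = [sorted({t[f] for t in tokens}) for f in factors]
    return [combo for combo in product(*levels) if seen[combo] == 0]


# Hypothetical toy tokens, invented only to exercise the functions.
tokens = [
    {"following": "pause", "position": "post-tonic"},
    {"following": "voiceless C", "position": "stressed"},
    {"following": "voiced C", "position": "stressed"},
    {"following": "voiced C", "position": "post-tonic"},
]

before = empty_cells(tokens, ["following", "position"])
after = empty_cells([collapse(t) for t in tokens], ["following", "position"])
print(len(before), "empty cells before collapsing;", len(after), "after")
```

In the toy data, the pause and voiceless contexts each lack one stress position, and merging them fills both gaps, which is the motivation given above for collapsing levels.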

4.3.1 Categorical modeling of voice weakening

Now that the distribution of voice weakening in terms of the aggregate of the various categories included in Table 4.1 has been explored vis-à-vis the various independent factors hypothesized to contribute to its occurrence, a random forest analysis was carried out in order to assess the relative importance of the predictor variables. These include the target vowel, preceding and following contexts, position relative to stress, syllable type, word position, age, gender, and education. A random forest is the cumulative result of many iterations of conditional inference trees run on the data set to assess whether each factor is a useful predictor of the variant choice for the dependent variable. Both random forests and conditional inference trees are functions contained in the party package in R (Hothorn et al. 2006). Plotting the results of the random forest in a dot chart sorted by variable importance shows which factors are most important relative to others, although it does not indicate what the best model is, or give any indication of statistical significance.

The dependent variable here is voice weakening with two levels: weakened and fully voiced. The weakened category includes all the different types of weakening, described in section 3.4.3.1. In Figure 4.23, we can see that for predicting voice weakening, the speaker’s level of formal educational attainment is by far the most important factor, followed by the speaker’s age group, the position of the syllable relative to lexical stress, the preceding context, the following context, the speaker’s gender, syllable type, stress, the target vowel and finally word position. Understanding the variable importance of factors is helpful in determining the order in which to add variables when building the logistic regression models. One thing random forests are helpful for is deciding which of two collinear variables to include in a regression model. Among my variables, stress and position relative to stress are collinear, as are position relative to stress and word position. Because both stress and word position appear below position relative to stress in relative importance, I can omit these from the model. It is also important to note that random effects, such as that of speaker, cannot be included in random forests, so it is likely that the placement of education and age group is truly a speaker effect. It may also be an indication of the interactions present for these variables.

Figure 4.23. Random forest showing variable importance of predictor variables for voice weakening

[Figure: dot chart titled “Variable Importance of Predictor Variables on Voice Weakening”. y-axis labels in extraction order: Following context, Education, Age group, Target vowel, Position relative to stress, Syllable type, Preceding context, Stress, Gender, Word position; x-axis scale: 0.000 to 0.003.]

Models were built using a combination of the results of the random forest and an iterative step process comparing AIC (Akaike Information Criterion) scores, which test goodness of fit, to decide the order in which to add variables to the model. Then nested models were built and an ANOVA was performed in order to determine the most parsimonious model with the most explanatory power. It is important to note that in model selection, not all combinations of factors could be considered, due to the distribution of data across variables. All combinations that did not include empty cells in the cross-tabulation were considered, however, and the best model was chosen using the method described above.

The model selected included speaker as a random variable, and only linguistic factors as fixed variables: following context, preceding context, and position relative to stress. Neither education nor age group was selected as part of the best fit model, which might seem surprising given the random forest results, but their position in the random forest is likely an indicator that these factors are collinear with the random variable of speaker.
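The two selection steps described above, comparing AIC scores and then testing nested models, can be sketched numerically. The Python example below is only an illustration under stated assumptions: the log-likelihoods and parameter counts are hypothetical values invented for the example, not results from this study, and treating the nested-model ANOVA as a likelihood ratio test is an assumption about the procedure rather than something the text specifies. The function names are likewise illustrative.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def lrt_p_value_df1(loglik_small, loglik_big):
    """Likelihood ratio test p-value for nested models differing by one parameter.

    Relies on the identity that the chi-square survival function with one
    degree of freedom equals erfc(sqrt(x / 2)).
    """
    statistic = 2 * (loglik_big - loglik_small)
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2))


# Hypothetical (log-likelihood, parameter count) pairs, invented for illustration.
candidates = {
    "following": (-1300.0, 3),
    "following + preceding": (-1290.0, 4),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
p = lrt_p_value_df1(-1300.0, -1290.0)

print(scores)
print("lowest AIC:", best)
print("LRT p-value:", p)
```

AIC penalizes each added parameter, so a richer model is preferred only when its gain in log-likelihood outweighs that penalty; the nested-model test then asks whether the gain is larger than chance would predict.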

Table 4.5. Factors predicting categorical voice weakening

                                                       Estimate   SE     z-value   p-value
(Intercept)                                            -1.66      0.14   -12.14    <0.001
Following context (reference level is voiceless consonant or pause)
  Voiced consonant                                     -2.12      0.16   -12.91    <0.001
Preceding context (reference level is voiced consonant or pause)
  Voiceless consonant                                   0.48      0.11     4.33    <0.001
Position relative to stress (reference level is post-tonic syllable)
  Monosyllabic unstressed word and pre-tonic syllable  -1.15      0.13    -8.63    <0.001
  Stressed syllable                                    -1.22      0.15    -8.43    <0.001

Table 4.5 shows the results of the best fit model, which selected the following context as the most important predictor. A following voiced consonant is 8.33 times15 less likely to result in voice weakening than a following voiceless consonant or pause, or put another way, a following voiceless consonant or pause is significantly more likely to result in a weakened observation. The next most important predictor of voice weakening is the preceding context. Observations with preceding voiceless consonants are significantly more likely than those with preceding pauses or voiced consonants to result in weakened voicing, by a factor of 1.61. Finally, the position of the vowel observation relative to lexical stress is also a significant contributor to the model. Observations occurring in post-tonic syllables are 3.16 times more likely than those occurring in monosyllabic unstressed words or pre-tonic syllables to result in weakened voicing, and 3.39 times more likely than observations occurring in a stressed syllable to result in weakened voicing. Effects for both, as shown in table 4.5, are significant. Releveling factors reveals there is no significant difference between observations occurring in stressed syllables and those occurring in the category containing monosyllabic unstressed words and pre-tonic syllables. Neither the target vowel, syllable type, nor any of the social factors (age, gender, level of education) were selected in the best fit model, and stress and word position were left out due to collinearity with position relative to stress.

15 To obtain this number, I exponentiated the absolute value of the coefficient given in the estimate, 2.12.
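The factors quoted above can be recomputed from the estimates in Table 4.5 by exponentiating the absolute value of each coefficient. The Python sketch below does this and, as an additional illustration not reported in the text, converts two sums of fixed effects into predicted probabilities with the inverse logit, ignoring the speaker random effect. The function and variable names are assumptions. Strictly speaking, exponentiated logistic coefficients are odds ratios, and because the table reports rounded coefficients, a recomputed value can differ from the text in the last digit (for example, the preceding context factor).

```python
import math

# Fixed-effect estimates (log-odds) copied from Table 4.5.
coefficients = {
    "intercept": -1.66,
    "following voiced consonant": -2.12,
    "preceding voiceless consonant": 0.48,
    "monosyllabic unstressed word or pre-tonic": -1.15,
    "stressed": -1.22,
}


def odds_ratio(estimate):
    """Exponentiate the absolute value of a log-odds coefficient."""
    return math.exp(abs(estimate))


def inverse_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))


ratios = {
    name: round(odds_ratio(b), 2)
    for name, b in coefficients.items()
    if name != "intercept"
}

# Predicted probability of weakening at the reference levels (following
# voiceless consonant or pause, preceding voiced consonant or pause,
# post-tonic syllable), with the speaker random effect set to zero.
p_reference = inverse_logit(coefficients["intercept"])

# Same cell, but with a following voiced consonant.
p_voiced_following = inverse_logit(
    coefficients["intercept"] + coefficients["following voiced consonant"]
)

print(ratios)
print(round(p_reference, 3), round(p_voiced_following, 3))
```

Working on the probability scale makes the size of the following-context effect easier to compare with the raw rates reported in section 4.2.1, although the model-based values are conditional on the other predictors.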

Another powerful statistical tool that demonstrates interactions between predictor variables is the conditional inference tree, a non-parametric tool that estimates the likelihood of the dependent variable’s value “based on a series of binary questions about the values of predictor variables” (Tagliamonte & Baayen, 2012). Figure 4.24 shows interactions between factors that predict voice weakening. The most important predictor of voice weakening is the following context, for which the levels were not collapsed as they were for the regression models, so a following pause and a following voiceless consonant are separate. When a token is followed by a pause, there is an interaction with the position relative to stress. A vowel in a post-tonic syllable in pre-pausal position

(node 4) is significantly more likely to be weakened than a pre-pausal vowel in a stressed syllable or in a monosyllabic unstressed word (node 3). When a vowel is followed by voiced consonant (node 5), it is less likely to be weakened than when followed by a voiceless consonant. However, when a token is followed by a voiceless consonant, there are a several additional interactions. The first is with syllable type: for vowels in closed syllables, there is an additional interaction with position relative to stress: a vowel token

134

in a post-tonic syllable is more likely to be weakened (node 10), than a token in a monosyllabic unstressed word, pre-tonic syllable, or stressed syllable (node 9). On the other hand, a vowel in an open syllable is affected by the preceding context: a preceding voiceless consonant (node 12) conditions more weakening than a preceding voiced consonant or pause (node 13).


Figure 4.24. Conditional inference tree showing interactions between factors that predict voice weakening

[Figure 4.24 image: conditional inference tree titled “Interactions Between Predictor Variables for Voice Weakening”. Node 1 splits on following context (pause vs. voiceless or voiced consonant). After a pause, node 2 splits on position relative to stress into node 3 (monosyllabic unstressed word/pre-tonic syllable or stressed syllable, n = 91) and node 4 (post-tonic syllable, n = 308). Otherwise, node 5 splits on following context into node 6 (voiced consonant, n = 3244) and node 7 (voiceless consonant), which splits on syllable type. Closed syllables split at node 8 on position relative to stress into node 9 (monosyllabic unstressed word/pre-tonic syllable or stressed syllable, n = 390) and node 10 (post-tonic syllable, n = 376). Open syllables split at node 11 on preceding context into node 12 (voiceless consonant, n = 611) and node 13 (pause or voiced consonant, n = 1002). Each terminal node shows the proportion of weakened vs. fully voiced tokens on a scale from 0 to 1.]


4.3.2 Continuous modeling of voice weakening

Because I had hypothesized that voice weakening is a gradient phenomenon, I also ran analyses of the data in which voice weakening was measured as a continuous variable, that is, the duration of full voicing throughout the vowel divided by the total duration of the vowel, resulting in a percentage. Because of the way the data is distributed, shown in figure 4.25, with the majority of tokens at 100%, a small number of tokens at 0%, and an even smaller number of tokens falling in between 0% and 100%, I could not use a basic linear regression model. Instead, following García (2015), I used regression with inflated beta distribution, which is specifically designed to work with percentages. To run these models, I used the gamlss (Generalized Additive Models for Location Scale and Shape) package in R (Stasinopoulos 2017). These models use a distributional regression approach to model all the parameters of the conditional distribution of the dependent variable using the predictor variables. The parameters modeled are mu, sigma, nu, and tau. Mu shows the main effects of the model. Sigma is based on the standard deviation, and thus assesses which of the independent variables demonstrate the most variation. Nu reveals the conditions under which an observation’s percent voicing is more likely to equal 0% than to fall somewhere between 0% and 100%, and tau likewise reveals the conditions under which an observation’s percent voicing is more likely to equal 100% than to fall somewhere between 0% and 100%. For my data, the number of observations below 100% fully voiced (398 of 6022 tokens) was so low that a model could not be fit to the entire data set. Instead, only those speakers whose speech samples contained 5% or greater of voice weakened observations, as determined by the percent full voicing and voice weakening categories, were included, resulting in a subset of 2,425 tokens from 16 speakers of the original 40. Delforge (2008a, 2008b, 2009, 2012) also took this approach to understanding the linguistic predictors of devoicing in her data, because of similarly low rates of reduction in her larger sample. The best fit model was chosen using an automated iterative step process, which tests all combinations of predictor variables in each parameter, including speaker as a random effect, and selects the model with the lowest AIC score. In this case, the model chosen excludes the sigma parameter, because that parameter could not be fit to the data due to instabilities in the covariance matrix.
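The measurement and subsetting steps described above can be illustrated with a minimal Python sketch. The function names, the tuple layout, and the toy tokens are hypothetical and serve only as an illustration; the 5% threshold is the one stated in the text, and the analysis itself was carried out in R.

```python
from collections import defaultdict

def percent_full_voicing(voiced_ms, total_ms):
    """Share of the vowel's duration produced with full modal voicing (0.0 to 1.0)."""
    return voiced_ms / total_ms

def speakers_above_threshold(tokens, threshold=0.05):
    """Return the speakers whose share of weakened tokens (< 100% full voicing)
    is at least `threshold`. `tokens` holds (speaker, voiced_ms, total_ms) tuples."""
    counts = defaultdict(lambda: [0, 0])  # speaker -> [weakened tokens, all tokens]
    for speaker, voiced_ms, total_ms in tokens:
        counts[speaker][1] += 1
        if percent_full_voicing(voiced_ms, total_ms) < 1.0:
            counts[speaker][0] += 1
    return sorted(s for s, (weak, total) in counts.items()
                  if weak / total >= threshold)

# Hypothetical toy data, not taken from the corpus:
# S01 has 1 weakened token out of 20 (5%), S02 has none out of 50.
toy_tokens = [("S01", 60, 60)] * 19 + [("S01", 30, 60)] + [("S02", 55, 55)] * 50
selected = speakers_above_threshold(toy_tokens)
```

In this toy example only the first speaker meets the 5% criterion, which mirrors how the 40 analyzed speakers were narrowed down to a subset for the continuous model.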


Figure 4.25. Histogram of percent full voicing16

[Figure 4.25 image: histogram titled “Distribution of Percent Full Voicing”. x-axis: Percent Full Voicing (0.0 to 1.0); y-axis: Counts (0 to 350).]

The only factors selected to be included in the mu parameter, representing the main effects, were the position relative to stress and the following context. Only the position relative to stress, however, was determined to be a significant predictor of percent full voicing, as represented in Table 4.6. As the table shows, percent voicing is significantly higher when the vowel observation is located in a stressed syllable, or an unstressed monosyllabic word/pre-tonic syllable, when compared to post-tonic position. Releveling showed that there was no significant difference between the percent full voicing of tokens in stressed syllables and that of tokens in unstressed monosyllabic words/pre-tonic syllables.

16 Note that the y-axis is limited to 350 for ease of viewing only. In reality, more than 5000 tokens were included in the last bin representing 90-100% full voicing.

Table 4.6. Results of main effects of beta-inflated regression model: percent full voicing

                                                    Estimate   SE     t value   p-value
(Intercept)                                         2.98       0.06   54.93     < 0.001
Position relative to stress (reference level is post-tonic syllable)
  Stressed syllable                                 0.16       0.07   2.18      <0.05
  Unstressed monosyllabic word/pre-tonic syllable   0.19       0.08   2.51      <0.05
Following context (reference level is voiceless consonant or pause)
  Voiced consonant                                  0.12       0.07   1.87      0.06

For nu, the parameter that gives information about conditions under which an observation’s percent voicing is more likely to equal 0, the results are shown in Table 4.7.

We can see that the position of the syllable relative to lexical stress is the most important predictor of 0% full voicing. A vowel token in a stressed syllable is significantly less likely than one in a post-tonic syllable to result in 0% full voicing. Likewise, a vowel token in a monosyllabic unstressed word/pre-tonic syllable is significantly less likely than one in a post-tonic syllable to result in 0% full voicing. No significant difference was found when releveling to compare monosyllabic unstressed words/pre-tonic syllables to stressed syllables. In addition to the position relative to stress, the preceding context also contributes to the model, with a preceding voiceless consonant being significantly more likely to result in 0% full voicing than in a percentage of full voicing between 0% and 100%. Finally, the target vowel was found to be a significant predictor of 0% full voicing, in that /e/, /o/, and /i, u/ were also significantly more likely to result in 0% full voicing than /a/. Releveling showed that there were no significant differences between /e/ and /o/, /e/ and /i, u/, or /o/ and /i, u/.

Table 4.7. Results for the nu parameter of beta-inflated regression model: percent full voicing

                                                    Estimate   SE     t value   p-value
(Intercept)                                         -2.20      0.26   -8.40     < 0.001
Position relative to stress (reference level is post-tonic syllable)
  Monosyllabic unstressed word/pre-tonic syllable   -0.56      0.24   -2.38     <0.05
  Stressed syllable                                 -1.26      0.29   -4.39     <0.001
Preceding context (reference level is voiced consonant or pause)
  Voiceless consonant                               0.93       0.18   5.27      <0.001
Target vowel (reference level is /a/)
  /e/                                               1.05       0.30   3.52      <0.001
  /i, u/                                            1.55       0.34   4.56      <0.001
  /o/                                               1.32       0.28   4.73      <0.001

The results of the model for the tau parameter are shown in Table 4.8. These are the conditions under which an observation’s voicing is more likely to equal 100% full voicing than to fall somewhere between 0% and 100%. The table shows that the following context is the most important predictor of 100% full voicing. A following voiced consonant is significantly more likely than a following voiceless consonant or pause to result in 100% full voicing. The next most important predictor is the vowel’s position relative to lexical stress. A vowel in a monosyllabic unstressed word or a pre-tonic syllable is significantly more likely to be 100% fully voiced than a vowel in a post-tonic syllable. Likewise, a vowel in a stressed syllable is significantly more likely to be 100% fully voiced than a vowel in a post-tonic syllable. Releveling showed that there are no significant differences between tokens in a monosyllabic unstressed word or a pre-tonic syllable and those in stressed syllables. Finally, the results for target vowel show that /e/ and /o/ are significantly more likely than /a/ to be 100% fully voiced. No significant differences were found when releveling to compare /e/ and /i, u/, /e/ and /o/, or /i, u/ and /o/.


Table 4.8. Results for the tau parameter of beta-inflated regression model: percent full voicing

                                                    Estimate   SE     t value   p-value
(Intercept)                                         -0.90      0.16   -5.48     < 0.001
Following context (reference level is voiceless consonant or pause)
  Voiced consonant                                  2.00       0.20   10.08     <0.001
Position relative to stress (reference level is post-tonic syllable)
  Monosyllabic unstressed word/pre-tonic syllable   1.13       0.20   5.61      <0.001
  Stressed syllable                                 0.85       0.22   3.86      <0.001
Target vowel (reference level is /a/)
  /e/                                               0.44       0.22   2.01      <0.05
  /i, u/                                            0.47       0.26   1.78      0.08
  /o/                                               0.69       0.21   3.29      <0.001
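To make the roles of mu, nu, and tau more concrete, the following Python sketch converts the estimates in Tables 4.6 to 4.8 into probabilities. It rests on assumptions that are not stated in the text: that the model uses the zero-and-one-inflated beta family as parameterized in gamlss (BEINF), with a logit link for mu and log links for nu and tau, and that the speaker random effect is set to zero. Under that parameterization, the probability of exactly 0% is nu / (1 + nu + tau) and the probability of exactly 100% is tau / (1 + nu + tau). The function names are hypothetical.

```python
import math

def inv_logit(x):
    """Inverse of the logit link (assumed link for mu)."""
    return 1.0 / (1.0 + math.exp(-x))

def inflation_probabilities(nu_linear, tau_linear):
    """Map the linear predictors for nu and tau (assumed log links) to
    P(percent voicing = 0) and P(percent voicing = 1)."""
    nu, tau = math.exp(nu_linear), math.exp(tau_linear)
    denom = 1.0 + nu + tau
    return nu / denom, tau / denom

# Reference levels only (intercepts from Tables 4.6, 4.7, and 4.8):
# post-tonic syllable, /a/, reference preceding and following contexts.
mu_ref = inv_logit(2.98)
p0_ref, p1_ref = inflation_probabilities(-2.20, -0.90)

# Same token type, but in a stressed syllable: add the stressed-syllable
# estimates for nu (-1.26) and tau (0.85).
p0_stressed, p1_stressed = inflation_probabilities(-2.20 - 1.26, -0.90 + 0.85)
```

If these assumptions hold, moving from post-tonic to stressed position lowers the estimated probability of a fully weakened vowel and raises the probability of a fully voiced one, which is the direction of the effects reported above.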

4.3.3 Categorical modeling of shortening

As described in section 3.4.3.2, vowel duration was quantified, and each token was subsequently categorized as “shortened” or “not shortened”, based on its duration relative to the average for that target vowel, tonicity, and speaker. As with the categorical analysis of voice weakening, I inspected random forests run with shortening as the dependent variable. In Figure 4.26, we can see that for predicting shortening, the preceding context is by far the most important factor, followed closely by the following context, then by syllable type, target vowel, the speaker’s age group, word position, stress, the position of the syllable relative to lexical stress, the speaker’s gender, and finally, the speaker’s level of educational attainment. As mentioned in 4.3.1, the random forest ranking of variable importance is helpful for deciding which of two collinear variables to include in a regression model, and among my variables, stress and word position are both collinear with position relative to stress, and both were ranked above position relative to stress. However, I also compared AIC scores for models that included each of these variables, and found that the best fit model selected position relative to stress over either of the other factors. Because of this result, and because position relative to stress is more informative than either of the other factors, I chose to proceed with position relative to stress.

Figure 4.26. Random forest showing variable importance of predictor variables for shortening

[Figure 4.26 image: variable importance plot titled “Variable Importance of Predictor Variables on Shortening”. Predictors in descending order of importance: Preceding context, Following context, Syllable type, Target vowel, Age, Word position, Stress, Position relative to stress, Gender, Education. x-axis: variable importance (0.0000 to 0.0015).]


Models for vowel shortening were built using a combination of information from the random forest and an iterative step process comparing AIC (Akaike Information Criterion) scores, which test goodness of fit, to decide the order in which to add variables to the model. Then nested models were built and an ANOVA was performed in order to determine the best model. Speaker was included as a random effect in all models compared. The model selected included only linguistic factors: preceding context, target vowel, following context, and position relative to stress.
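As a brief illustration of how AIC-based comparison works, the sketch below computes AIC = 2k - 2 ln(L) for a set of candidate models and picks the lowest score. The log-likelihoods and parameter counts are invented for illustration and are not values from this study; the helper names are likewise hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off
    between goodness of fit and model complexity."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(candidates):
    """Return the name of the candidate with the lowest AIC.
    `candidates` maps a model name to (log-likelihood, number of parameters)."""
    return min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical values only: three models that differ in which of the
# collinear predictors they include.
candidates = {
    "position relative to stress": (-1500.0, 8),
    "stress": (-1510.0, 7),
    "word position": (-1512.0, 7),
}
best = best_by_aic(candidates)
```

With these made-up numbers the model containing position relative to stress has the lowest AIC even though it uses one more parameter, because its gain in log-likelihood outweighs the complexity penalty.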

As Table 4.9 shows, the most important predictor of shortening is the preceding context. An observation with a preceding voiceless consonant is 3.53 times more likely to be shortened than an observation preceded by a pause or voiced consonant. The next most important predictor is the target vowel. Targets /e/, /i, u/, and /o/ are all significantly more likely than /a/ to result in a shortened realization: /e/ is 2.56 times more likely, /i, u/ are 2.77 times more likely, and /o/ is 3.13 times more likely to be shortened than is /a/. Releveling this factor shows that neither /i, u/ nor /o/ is significantly different from /e/, nor are they significantly different from each other. Next, an observation with a following voiceless consonant is 2.27 times more likely to result in a shortened realization than one followed by a voiced consonant or pause. Finally, the analysis shows that the position relative to lexical stress also significantly predicts shortening. A vowel in a stressed or post-tonic syllable is 1.34 times less likely to be shortened than one in a monosyllabic unstressed word or pre-tonic syllable.


Table 4.9. Results from the mixed effects logistic regression model for shortening

                                                    Estimate   SE     z-value   p-value
(Intercept)                                         -4.29      0.18   -23.45    <0.001
Preceding context (reference level is voiced consonant or pause)
  Voiceless consonant                               1.26       0.11   11.58     <0.001
Target vowel
  /e/                                               0.94       0.17   5.68      <0.001
  /i,u/                                             1.02       0.18   5.77      <0.001
  /o/                                               1.14       0.17   6.86      <0.001
Following context (reference level is voiced consonant or pause)
  Voiceless consonant                               0.82       0.10   7.94      <0.001
Position relative to stress (reference level is monosyllabic unstressed word and pre-tonic)
  Post-tonic and stressed                           -0.29      0.10   -2.81     <0.05
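The “times more likely” figures reported for this model correspond to exponentiated estimates from Table 4.9, since logistic regression coefficients are expressed in log-odds. The short Python sketch below reproduces that conversion; the dictionary keys are descriptive labels chosen here, not variable names from the original analysis. Strictly speaking, the resulting values are ratios of odds rather than of probabilities.

```python
import math

# Log-odds estimates copied from Table 4.9 (labels are illustrative).
estimates = {
    "preceding voiceless consonant": 1.26,
    "target /e/": 0.94,
    "target /i, u/": 1.02,
    "target /o/": 1.14,
    "following voiceless consonant": 0.82,
    "post-tonic and stressed": -0.29,
}

# Exponentiating a log-odds estimate gives the odds ratio relative to
# the reference level of that factor.
odds_ratios = {name: math.exp(value) for name, value in estimates.items()}

# A negative estimate is easier to read as "times less likely":
# take the reciprocal of the odds ratio.
times_less_likely = 1.0 / odds_ratios["post-tonic and stressed"]

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Rounded to two decimals, the conversion yields the values cited in the text, including 1.34 for the reciprocal of the position-relative-to-stress odds ratio.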

As with voice weakening, a conditional inference tree was also run for shortening, in order to understand how the individual factors under consideration interact with each other. The tree is shown in Figure 4.27, and shows that, as other models predict, the preceding context is the most important factor, but that it interacts with other factors. Following the first split in the tree, on the left, when a token is preceded by a pause or voiced consonant, there is an interaction with the following context. A following voiced consonant or pause interacts with age group such that those in the middle age group (33-50) produce significantly more shortened vowels than speakers in the older or younger groups (see nodes 4 and 5). When the preceding context is a pause or voiced consonant and the following context is voiceless, there are no additional interactions (node 6). On the right side of the first split, when the preceding context is voiceless, there is a significant interaction with the following context. A following voiceless consonant interacts with the target vowel, which further interacts with the speaker gender for all vowels except /a/ (node 9). Mid and high vowels in this context are shortened at different rates by male and female speakers, with female speakers producing significantly more shortened vowels (node 11) than males (node 12). When the following context is a voiced consonant or pause, there is a further split by target vowel: /o/ is more likely to be shortened than /a, e/ and high vowels (node 14), but when the vowel is any other than /o/, there is an additional interaction with age group, such that younger speakers shorten those vowels at significantly higher rates than speakers in the middle or older groups (nodes 16 and 17).


Figure 4.27. Conditional inference tree showing interactions between factors that predict shortening

[Figure 4.27 image: conditional inference tree titled “Interactions Between Predictor Variables for Shortening”. Node 1 splits on preceding context (pause or voiced consonant vs. voiceless consonant). For a preceding pause or voiced consonant, node 2 splits on following context: a following pause or voiced consonant leads to node 3, which splits on age group into node 4 (older or younger, n = 1330) and node 5 (middle, n = 516); a following voiceless consonant leads to node 6 (n = 1552). For a preceding voiceless consonant, node 7 splits on following context. A following voiceless consonant leads to node 8, which splits on target vowel into node 9 (/a/, n = 192) and node 10 (/e/, high vowels, /o/), which splits on gender into node 11 (female, n = 338) and node 12 (male, n = 297). A following pause or voiced consonant leads to node 13, which splits on target vowel into node 14 (/o/, n = 507) and node 15 (/a/, /e/, high vowels), which splits on age group into node 16 (middle or older, n = 853) and node 17 (younger, n = 437). Each terminal node shows the proportion of shortened vs. not shortened tokens on a scale from 0 to 1.]


4.3.4 Continuous modeling of vowel duration

As mentioned above, in addition to exploring vowel shortening as a categorical variable, I also explored vowel duration as a continuous variable, measured in milliseconds, in a mixed effects linear regression model, using the lmer function in R, contained in the lme4 package (Bates et al., 2015). The same independent variables were explored as those in the other models explained above: following context, preceding context, stress, position relative to stress, target vowel, syllable type, word position, speaker’s level of formal educational attainment, speaker’s age, and speaker’s gender. Also, as with previous models, speaker was included as a random effect. To determine the best fit model, an automated step process was used to determine which variables should be added to the model and in what order. The automated process used restricted maximum likelihood (REML) estimations to compare models, and the best model included four linguistic factors: following context, preceding context, position relative to lexical stress, and target vowel; as well as one social factor: speakers’ highest level of formal education. Table 4.10 below shows the results of the linear mixed effects model. The following context was the most important predictor of shortening, with vowels followed by voiceless consonants being, on average, 10.56 ms17 shorter than those followed by voiced consonants or pauses. The preceding context was selected as the second most important predictor of shortening, and as the table shows, a vowel token preceded by a voiceless consonant is 12.23 ms shorter than one preceded by a voiced consonant or a pause. The next factor in the model is position relative to stress. Compared to vowels in stressed syllables and post-tonic syllables, vowels in the category containing monosyllabic unstressed words and pre-tonic syllables were 15.02 ms shorter on average. Education was also a significant predictor of vowel duration, with the category of speakers containing those with the lowest level of formal education (who had completed either primary or secondary school) producing vowels that were on average 10.97 ms longer than those with the highest level of formal education (who had finished college).

17 In the linear model represented in Table 4.10, the estimates given for each factor represent milliseconds.

Releveling variables also revealed a significant difference between the lowest level and the middle level (those who had attended either a college preparatory school or vocational training): vowels produced by speakers with the lowest level of formal education were 6.87 ms longer on average than those in the middle group. No significant differences were found between those who had finished college and those who had attended either a college preparatory school or vocational training after secondary school.

Finally, the target vowel was also a significant predictor of duration. All vowels were significantly shorter than /a/, as shown in table 4.10. Releveling showed no significant differences between /e/ and /o/, between /e/ and high vowels, or between /o/ and high vowels.


Table 4.10. Results from the mixed effects linear regression model for shortening

                                                    Estimate   SE     df      t value   p-value
(Intercept)                                         72.91      1.93   62.00   37.81     < 0.001
Following context (reference level is voiced consonant or pause)
  Voiceless consonant                               -10.56     0.80   5988    -13.20    <0.001
Preceding context (reference level is voiced consonant or pause)
  Voiceless consonant                               -12.23     0.79   5991    -15.51    <0.001
Position relative to stress (reference level is post-tonic syllable/stressed syllable)
  Monosyllabic unstressed word/pre-tonic syllable   -15.02     0.80   5988    -18.75    <0.001
Education (reference level is finished college)
  Preparatory school or vocational training         4.10       2.94   37      1.40      0.17
  Primary or secondary                              10.97      2.61   37      4.20      <0.001
Target vowel (reference level is /a/)
  /e/                                               -9.04      1.04   5994    -8.69     <0.001
  /i, u/                                            -11.22     1.17   5989    -9.57     <0.001
  /o/                                               -9.24      1.06   5992    -8.73     <0.001
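Because the estimates in Table 4.10 are in milliseconds, the fixed effects can be summed to obtain a predicted duration for a given token type. The Python sketch below does this for the fixed effects only, leaving out the speaker random effect; the label strings and the helper function are hypothetical names introduced for illustration.

```python
# Fixed-effect estimates in milliseconds, copied from Table 4.10.
INTERCEPT = 72.91  # /a/, reference contexts, stressed or post-tonic, finished college
COEFS = {
    "following_voiceless": -10.56,
    "preceding_voiceless": -12.23,
    "monosyllabic_unstressed_or_pretonic": -15.02,
    "education_prep_or_vocational": 4.10,
    "education_primary_or_secondary": 10.97,
    "vowel_e": -9.04,
    "vowel_iu": -11.22,
    "vowel_o": -9.24,
}

def predicted_duration(*active_levels):
    """Sum the intercept and the estimates of all non-reference levels that apply.
    Speaker-level variation (the random effect) is ignored in this sketch."""
    return INTERCEPT + sum(COEFS[level] for level in active_levels)

# Reference token: every factor at its reference level.
baseline = predicted_duration()

# Pre-tonic /e/ between two voiceless consonants, speaker who finished college.
reduced = predicted_duration(
    "following_voiceless",
    "preceding_voiceless",
    "monosyllabic_unstressed_or_pretonic",
    "vowel_e",
)
```

Under this reading of the table, a pre-tonic /e/ flanked by voiceless consonants is predicted to be about 26 ms long, compared with about 73 ms for the reference token.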

4.4 Vowel quality

Vowel reduction in other languages as well as in other varieties of Spanish has been shown to be related to differences in vowel quality, primarily having to do with the position of the tongue in the articulation of the vowel, which is reflected in F1 and F2 measurements of the resonant frequencies. F1 is inversely related to vowel height, so the higher the F1 measurement, the lower the tongue position used to produce the vowel. F2 is related to the degree of frontness, so a higher F2 measurement is the result of a tongue position further to the front of the oral cavity. My own auditory impressions have led me to believe that weakened and shortened vowels in MCS do not differ in quality from non-weakened and non-shortened vowels; however, empirical tests are necessary in order to confirm or discount quality changes as a possible contributor to vowel reduction in this variety. To that end, I extracted F1 and F2 measurements for comparison, as described in section 3.4.2. Then I made a comparison between the F1 and F2 values for vowels with voice weakening and those that were fully voiced, and a separate comparison for F1 and F2 values for shortened versus not shortened vowels. There was some overlap between these groups, as some shortened vowels were also weakened, and some fully voiced vowels were also shortened. Just a few tokens (n=19) were both weakened and shortened.

Because the F1 and F2 values were not normally distributed, I used a Wilcoxon signed rank test, which is a non-parametric test to compare samples and assess whether their means differ significantly. Several iterations of comparisons were made between weakened and non-weakened tokens for each vowel type (/a, e, i, o, u/), gender (M/F), and stress (stressed/unstressed), and the same was done for shortened and non-shortened tokens. I then applied a Holm-Bonferroni correction to the p-values in order to mitigate Type I errors due to multiple comparisons.
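For readers unfamiliar with the Holm-Bonferroni procedure, the following Python sketch shows how the adjustment works: the p-values are sorted in ascending order, the smallest is multiplied by the number of comparisons m, the next by m - 1, and so on, while ensuring that adjusted values never decrease along the sorted order. The input p-values are hypothetical, and the function is an illustrative reimplementation rather than the routine used in the original analysis.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply by the number of hypotheses not yet processed, cap at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values from four comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
```

In this invented example, two of the four comparisons remain significant at the 0.05 level after correction, although all four raw p-values fall below that threshold.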

Mean F1 values and standard deviations only for relevant comparisons with significant differences are given in table 4.11, while table 4.12 shows mean F2 values and standard deviations for comparisons with significant differences. I found significantly higher mean F1 values in voice weakened tokens of unstressed /a/ and /o/ when compared to unweakened tokens, for both male and female speakers. For male speakers only, I also found significantly higher F2 values in weakened voice tokens of unstressed /a/ when compared to fully voiced tokens, and for female speakers only, I found significantly higher F2 values in weakened voice tokens of unstressed /o/ when compared to fully voiced tokens. Vowel plots for /a/ and /o/ are shown below in figures 4.28 and 4.29. In addition to the differences in means, figure 4.28 also shows a higher amount of variability in female participants’ productions of unstressed /a/ than in male participants’ productions. For unstressed /a/, the differences in F1 means are fairly small, under 100 Hz for each comparison, as shown in table 4.11. The difference for F2 means was only significant for males, with a difference of about 105 Hz between weakened tokens and non-weakened tokens. For /o/, the difference in F1 means is small for males at just 75 Hz, but fairly large for females, at 266 Hz. However, it should also be noted that the standard deviation associated with the female speakers’ F1 values is quite high, indicating a high degree of variability in the F1 values. This difference in variability can also be seen for /o/ in figure 4.29.

In terms of articulation, these results indicate that both male and female speakers produce weakened unstressed /a/ and /o/ with a lower tongue position than their fully modally voiced counterparts. In addition, male speakers produce weakened /a/, and female speakers weakened /o/, with a tongue position that is further front than that of the fully modally voiced counterparts.


Table 4.11. Mean F1 values in Hz (SD in parentheses) for comparisons between weakened and non-weakened, shortened and non-shortened vowels, with significant differences

                      Mean F1 female     Mean F1 male
/a/ weakened          814.26 (199.89)    733.85 (124.94)
/a/ non-weakened      724.15 (121.18)    629.95 (73.76)
/o/ weakened          847.09 (349.04)    609.31 (152.78)
/o/ non-weakened      581.06 (102.06)    534.22 (86.94)
/e/ shortened         472.86 (134.09)    n.s.
/e/ non-shortened     542.10 (100.22)    n.s.
/o/ shortened         529.03 (131.13)    n.s.
/o/ non-shortened     628.17 (185.09)    n.s.

Table 4.12. Mean F2 values in Hz (SD in parentheses) for comparisons between weakened and non-weakened vowels with significant differences

                      Mean F2 female     Mean F2 male
/a/ weakened          n.s.               1620.92 (99.23)
/a/ non-weakened      n.s.               1515.02 (131.61)
/o/ weakened          1640.12 (515.00)   n.s.
/o/ non-weakened      1319.55 (310.19)   n.s.


Figure 4.28. Female and male F1 and F2 charts for unstressed /a/ based on voice weakening

[Figure 4.28 image: two vowel plots, “Female speakers' formant values for /a/ based on voice weakening” and “Male speakers' formant values for /a/ based on voice weakening”. Legend: weakening (no, yes). x-axis: F2 (3000 to 0 Hz); y-axis: F1 (0 to 1500 Hz).]

Figure 4.29. Female and male F1 and F2 charts for unstressed /o/ based on voice weakening

[Figure 4.29 image: two vowel plots, “Female speakers' formant values for /o/ based on voice weakening” and “Male speakers' formant values for /o/ based on voice weakening”. Legend: weakening (no, yes). x-axis: F2 (3000 to 0 Hz); y-axis: F1 (0 to 1500 Hz).]


In shortened vowels, the F1 for unstressed /e/ and /o/ for female speakers had significantly lower mean values when compared to non-shortened tokens: shortened /e/ had an average F1 about 70 Hz lower than non-shortened /e/, while the F1 for shortened /o/ was about 100 Hz lower. Figure 4.30 shows the chart for unstressed /e/ and figure 4.31 the chart for unstressed /o/. Shortened vowels showed no significant differences in F2 from non-shortened vowels. These results indicate that female speakers produce shortened mid vowels with a higher tongue position than their non-shortened counterparts.

Figure 4.30. Female F1 and F2 chart for /e/ based on shortening

[Figure 4.30 image: vowel plot titled “Female speakers' formant values for /e/ based on shortening”. Legend: shortening (no, yes). x-axis: F2 (3000 to 0 Hz); y-axis: F1 (0 to 1500 Hz).]


Figure 4.31. Female F1 and F2 chart for /o/ based on shortening

[Figure 4.31 image: vowel plot titled “Female speakers' formant values for /o/ based on shortening”. Legend: shortening (no, yes). x-axis: F2 (3000 to 0 Hz); y-axis: F1 (0 to 1500 Hz).]

4.5 Summary of results

Overall, the statistical tests performed on the data indicate that both voice weakening and shortening are primarily conditioned by the linguistic factors explored here, especially the position relative to lexical stress, and preceding and following contexts. Position relative to stress was selected as a significant predictor of voice weakening and shortening in all models. Vowels in post-tonic position are significantly more likely to be produced with voice weakening, or less than 100% full voicing throughout the vowel’s duration, while pre-tonic position and unstressed monosyllabic words significantly predict shortening.


Stressed vowels appear to resist both voice weakening and shortening. Following voiceless consonants and pauses significantly predict more voice weakening, while following voiced consonants and pauses disfavor shortening. The voicing of the preceding context was also shown to contribute significantly to both voice weakening and shortening, with voiceless contexts favoring short and weak vowels. The target vowel played a role in main effects for both shortening models, and was included in the nu and tau parameters for the continuous voice weakening model. Overall, it appears that /a/ is resistant to shortening, and somewhat more resistant to weakening, while the other vowels show differences in duration but not shortening or voice weakening patterns. A summary of significant predictors of voice weakening and shortening is included below in Table 4.13.

The results explored for vowel quality indicate significantly higher F1 values for /a/ and /o/ weakened vowels when comparing them to fully voiced vowels, and significantly higher F2 for weakened /a/ in male speakers only, and for weakened /o/ in female speakers only. For female speakers, significantly lower F1 values were found for shortened /e/ and /o/ when compared to their non-shortened counterparts. Overall, the quality changes found are not systematic.


Table 4.13. Significant predictors of vowel reduction in MCS

                              Voice weakening              Shortening
                              Categorical   Continuous     Categorical   Continuous
Following context             ✓             ✓18            ✓             ✓
Preceding context             ✓             ✓19            ✓             ✓
Position relative to stress   ✓             ✓              ✓             ✓
Target vowel                  -             ✓20            ✓             ✓
Syllable type                 -             -              -             -
Word position                 -             -              -             -
Age                           -             -              -             -
Gender                        -             -              -             -
Education                     -             -              -             ✓

18 Only selected as a significant contributor to the tau parameter.
19 Only selected as a significant contributor to the nu parameter.
20 Only selected as a significant contributor to the nu and tau parameters.

CHAPTER 5: DISCUSSION

In this chapter, I provide an interpretation and discussion of the results presented in chapter 4, addressing each independent variable individually. In section 5.1, I discuss overall patterns in the data, focusing on the two dependent variables that I looked at, shortening and voice weakening. In section 5.2, I discuss the factors found to influence voice weakening, and in section 5.3, those found to influence shortening. The discussion in 5.4 focuses on the findings related to vowel quality, and in section 5.5, I present the implications of this data and analysis.

5.1 Shortening and voice weakening as complementary strategies

Overall, vowel reduction in Mexico City Spanish can manifest acoustically as either voice weakening or shortening. Voice weakening subsumes devoicing and weakened or breathy voicing, both of which can be partial or complete, affecting only a portion of the vowel, or affecting the entire duration of the vowel. Shortening, on the other hand, involves a reduction in the temporal dimension of the vowel. These processes are related, but I argue here that they are in fact two different manifestations of the same phenomenon of vowel reduction.


Most previous work on vowel reduction or devoicing subsumes shortening as a phase in reduction, and thus does not investigate it as a distinct variable. Lipski (1990), for example, claims that devoicing is the initial stage of vowel reduction, while shortening and centralization are intermediate phases, and deletion is the ultimate stage.

Serrano (2006), on the other hand, classifies a shortened but fully voiced vowel as the first stage of weakening, followed by a devoiced vowel, then a vowel that is both devoiced and shortened, and finally deletion. For each of these analyses, it is impossible to know what constitutes a shortened vowel, because although Lipski did carry out a spectrographic analysis, he does not report those measurements, and Serrano’s classification, like that of Lope Blanch, was conducted using auditory impressions rather than acoustic measurements. Delforge (2009) follows the same criteria employed in previous research on vowel devoicing in other languages (Cedergren 1986 for Montreal French, Dauer 1980 for Greek, Jun et al. 1997 for Korean). These authors either included a category containing shortened but fully modally voiced vowels that were under 30 milliseconds in duration, or included those observations in the same category as partially devoiced vowels. In light of the different ways in which shortening has been treated in the literature on vowel reduction and devoicing, either as a stage in a gradient process or combined with another type of reduction (devoicing in the cases mentioned above), I opted to investigate it as a variable in and of itself.

The argument that voice weakening and shortening are in fact distinct processes is evidenced by the findings outlined here regarding the contexts in which they occur most frequently. A notable difference in their patterning is the effect of position with regard to lexical stress on each process: shortening occurs most often in monosyllabic unstressed words and pre-tonic position, whereas voice weakening is favored by post-tonic position. Another important difference has to do with phrase position. Phrase-final position is the least common locus for shortening, and the most common for voice weakening; in fact, more than 30% of the voice weakening in this corpus takes place in pre-pausal contexts. Furthermore, the two strategies seem to operate mostly independently of each other: in the corpus of natural speech studied here, of the 766 tokens that were either weakened or shortened, only 78 tokens are both weakened and shortened. These represent only 1.30% of all vowels in the corpus, and only 10.18% of the reduced vowels.

In terms of rates of occurrence, I find an overall voice weakening rate of 6.61% in the entire corpus, which includes both stressed and unstressed vowels: 8.35% in the subset of unstressed vowels, and 3.42% in stressed vowels. For shortening, I find 7.41% of all tokens to be shortened: 6.94% of stressed vowels and 7.66% of unstressed vowels. Taken together, voice weakening and shortening account for 12.72% of all vowels, 9.42% of stressed vowels and 14.37% of unstressed vowels. Delforge reports an overall reduction rate of 9.50%, although the comparison to the MCS data is not a direct one, because Delforge only considers unstressed vowels in her analysis, and because of the difference in the way shortening was analyzed. Overall, these results suggest that MCS presents more reduction than Andean Spanish.
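The combined figure can be cross-checked by inclusion-exclusion: the rate of tokens showing either process is the sum of the two individual rates minus the rate of tokens showing both. The short Python sketch below is illustrative only; the function name is hypothetical, and the inputs are the overall percentages reported above (6.61% weakened, 7.41% shortened, 1.30% both).

```python
def combined_rate(rate_a: float, rate_b: float, rate_both: float) -> float:
    """Rate of tokens showing process A or process B (inclusion-exclusion)."""
    return rate_a + rate_b - rate_both


# Overall percentages reported in the text: weakened, shortened, both.
overall = combined_rate(6.61, 7.41, 1.30)
print(f"{overall:.2f}%")
```

With these inputs the sketch reproduces the 12.72% combined rate reported for the whole corpus.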


5.2 Conditioning factors for voice weakening

As reported in chapter 4, the conditioning factors found for voice weakening were the preceding and following contexts, and the position relative to lexical stress. Target vowel emerged as a factor conditioning only the nu and tau parameters in the inflated beta model that was used to model voice weakening as a continuous variable. The effect of preceding and following contexts is discussed in 5.2.1, position relative to stress is discussed in 5.2.2, and target vowel in 5.2.3. I comment on social factors in 5.2.4.

5.2.1 Preceding/following context

The effect of the preceding and following voiceless segments on voice weakening is evident from the results of the analysis presented in chapter 4. To add more detail, I include in table 5.1 a breakdown of the rates of weakening in each combination of preceding and following contexts, minus the pause_pause context, since there were no tokens fitting that criterion. The table shows that the highest voice weakening rates occur before pauses, then between voiceless consonants, with the lowest rate between voiced consonants. It can be inferred from the table that the following context matters more than the preceding context: weakening rates decline clearly as the following context moves from pauses to voiceless consonants to voiced consonants, but no comparable ordering is observed for the preceding context.


Table 5.1. Weakening rates for surrounding contexts

Preceding context_following context   Weakening rate
voiceless_pause                       32.00%
voiced_pause                          29.02%
voiceless_voiceless                   13.18%
pause_voiceless                        9.30%
voiced_voiceless                       7.84%
pause_voiced                           2.40%
voiceless_voiced                       1.73%
voiced_voiced                          0.94%

5.2.1.1 Preceding consonants

Although the results of the various statistical models seem to indicate that the following context for a vowel matters more in determining voice weakening than the preceding context, it is also clear from those results that the preceding consonant plays an important role as well. To reiterate the findings presented in chapter 4, preceding context was selected as being a significant contributor in the main effects of the categorical model for voice weakening, in that preceding voiceless consonants favored voice weakening when compared to preceding voiced consonants and pauses. It also contributed to a significant interaction with the syllable type and following context, such that a vowel followed by a voiceless consonant in an open syllable is more likely to be weakened when the preceding context is voiceless, as shown in the conditional inference tree in figure 4.24.

The preceding context was not included in the main effects of the continuous model, although it was a significant contributor to the nu parameter, which expresses the likelihood of 0% voicing in a vowel as opposed to a percentage of voicing greater than 0%.
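To make the role of nu and tau concrete, the sketch below maps them onto probabilities under one common convention for zero-and-one-inflated beta models. It is an assumption for illustration, since this section does not spell out the parameterization: nu is taken as the odds of exactly 0% voicing, and tau as the odds of exactly 100% voicing, each relative to an intermediate percentage. The function name and the example values are hypothetical.

```python
def inflation_probabilities(nu: float, tau: float) -> tuple[float, float, float]:
    """Map nu and tau to (P(y = 0), P(y = 1), P(0 < y < 1)).

    ASSUMPTION: nu = P(y = 0) / P(0 < y < 1) and tau = P(y = 1) / P(0 < y < 1),
    the convention of GAMLSS-style zero-and-one-inflated beta distributions.
    """
    if nu < 0 or tau < 0:
        raise ValueError("nu and tau must be non-negative")
    denom = 1.0 + nu + tau
    return nu / denom, tau / denom, 1.0 / denom


# Hypothetical values, not estimates from the dissertation.
p_zero, p_one, p_between = inflation_probabilities(nu=1.0, tau=2.0)
print(p_zero, p_one, p_between)
```

On this reading, a predictor that raises nu raises the probability that a vowel has no modal voicing at all, without saying anything about how much voicing the partially voiced vowels have.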

To understand effects beyond voicing, in particular the role of contact with /s/, the specific consonants themselves, along with consonantal features of manner and place of articulation, are explored here. Table 5.2 shows the preceding consonantal phonemic context for all vowels in the corpus not preceded by a pause. The fourth column, “Percentage of Voice Weakened Tokens”, is the amount of all voice weakened tokens in the corpus that a particular context accounts for, that is, the number of weakened tokens for that context divided by all weakened tokens in all contexts. The inclusion of these percentages allows comparison to patterns reported by Lope Blanch (1963), but as the author acknowledged and as Delforge (2009) further highlights, Spanish phonemes, consonants and vowels, have very different frequencies of occurrence (Quilis & Esgueva, 1980), and therefore a more informative number is that presented in the fifth column, “Voice Weakening Rate”, which shows the rate of voice weakening for that context alone, that is, the number of weakened tokens for that context divided by the total number of tokens occurring in that context.
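The difference between the two measures can be illustrated with a short Python sketch using the /s/ and /tʃ/ rows of table 5.2. The function name is hypothetical, and the corpus-wide total of 398 weakened tokens is an assumption: it is not stated in this section but is inferred from the table's own percentages (for example, 49 tokens corresponding to 12.31%).

```python
def share_and_rate(weakened_in_context: int,
                   tokens_in_context: int,
                   weakened_total: int) -> tuple[float, float]:
    """Return (percentage of all weakened tokens, weakening rate) for one context."""
    share = 100 * weakened_in_context / weakened_total
    rate = 100 * weakened_in_context / tokens_in_context
    return share, rate


# ASSUMPTION: total number of weakened tokens, inferred from table 5.2.
WEAKENED_TOTAL = 398

# (consonant, weakened tokens, all tokens) taken from table 5.2.
for consonant, weakened, tokens in [("s", 49, 733), ("tʃ", 14, 85)]:
    share, rate = share_and_rate(weakened, tokens, WEAKENED_TOTAL)
    print(f"/{consonant}/: share {share:.2f}%, rate {rate:.2f}%")
```

The frequent consonant /s/ accounts for a large share of weakened tokens while showing a moderate rate, whereas the infrequent /tʃ/ accounts for a small share but shows a high rate, which is why the rate is the more informative figure.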

Table 5.2 shows that the individual preceding consonants that condition the most voice weakening are palatals: /tʃ/ and /ɲ/. Preceding voiceless stops /t, k/ also have comparatively high weakening rates, but the rate for /p/ is quite low. The rate of voice weakening for contexts following /s/ is only 6.68%, although these contexts account for 12.31% of all weakened tokens. In addition to /ɲ/, vowels following other nasal consonants /m/ and /n/ also show higher rates of voice weakening. The main thing to keep in mind about this patterning is that, while the preceding consonant is important, the following consonant plays a larger role, so that when voiced segments result in higher rates of weakening, the vast majority of those observations are followed by a voiceless consonant or pause. In fact, only 7.45% of weakened tokens preceded by a voiced consonant were also followed by a voiced consonant.

Table 5.2. Voice weakening rates by preceding consonant

Preceding         Number of Voice    Number of   Percentage of Voice   Voice Weakening
consonant         Weakened Tokens    tokens      Weakened Tokens       Rate
voiceless stops
  p               8                  382         2.01%                 2.09%
  t               59                 623         14.82%                9.47%
  k               50                 588         12.56%                8.50%
fricatives
  f               2                  62          0.50%                 3.23%
  s               49                 733         12.31%                6.68%
  x               11                 148         2.76%                 7.43%
affricates
  tʃ              14                 85          3.52%                 16.47%
voiced stops
  b               5                  285         1.26%                 1.75%
  d               30                 460         7.54%                 6.52%
  g               5                  111         1.26%                 4.50%
  ɟ               6                  119         1.51%                 5.04%
rhotics
  ɾ               26                 538         6.53%                 4.83%
  r               1                  38          0.25%                 2.63%
lateral
  l               29                 493         7.29%                 5.88%
nasals
  m               41                 591         10.30%                6.94%
  n               46                 511         11.56%                9.00%
  ɲ               5                  39          1.26%                 12.82%


5.2.1.2 Following consonants

The factor that was found to contribute the most to both the categorical and continuous analyses of voice weakening was the following context. Vowels followed by a voiced consonant are the least likely to be produced with voice weakening, while those followed by either a voiceless consonant or a pause are most likely to be produced with voice weakening.

While a following pause is the context that triggers the most voice weakening, at 30.03%, a following voiceless consonant is next, with 9.75% of all vowels in this context being weakened, and these tokens accounting for 58.29% of all weakened tokens in the corpus. In order to understand the effects of the syllabic affiliation of a following consonant, I also explore the difference in weakening rates between tautosyllabic following consonants and those occurring in the onset of the following syllable. Vowels in syllables that are closed with a voiceless consonant weaken at a rate of 16.19%, while vowels in open syllables followed by voiceless consonants in the onset of the following syllable only weaken at 6.70%. This suggests that the syllabic affiliation of a following voiceless consonant influences the degree of voice weakening, explaining the interaction found between following context and syllable type in the conditional inference tree.

The high rates of voice weakening in the context of following voiceless consonants found for MCS accord with the literature on Spanish vowel weakening as well as the cross-linguistic literature: voiceless contexts are found to favor vowel devoicing in Greek (Dauer 1980), Turkish (Jannedy 1995), Korean (Jun & Beckman 1993, 1994), Japanese (Jun & Beckman 1993, Jun et al. 1997), Brazilian Portuguese (Meneses & Albano 2015), as well as several other languages (cf. Gordon 1998). All of the previous research on Spanish reports voiceless contexts as conditioning the most weakening as well. Lope Blanch (1963) reported finding that 83.7% of the weakening in his sample occurred before /s/.²¹ Delforge points out, however, that Lope Blanch’s results may overstate the importance of /s/ due to the inclusion of high-frequency filler words like pues and entonces. In his replication and expansion of Lope Blanch’s study, Serrano (2006) reports that 56.43% of all weakening in his sample occurred before a voiceless consonant, which in the majority of cases was /s/. This is quite close to the 58.29% found in my sample.

In order to better understand the contribution of particular segments beyond their voicing specification, I provide a breakdown of weakening rates by following consonantal phoneme below in table 5.3. As the table shows, 41.71% of voice weakening takes place preceding /s/, and of all the tokens appearing in this context in the corpus, 14.02% of them are weakened. Overall, 49.50% of voice weakening in my data takes place in contact with /s/, either in vowels preceded by /s/ (n=49), followed by /s/ (n=166), or both preceded and followed by /s/ (n=18). Of all tokens both preceded by and followed by /s/, 16.67% are produced with voice weakening. This leads to the conclusion that /s/ is the consonant that motivates the most voice weakening, which is in line with previous descriptions of both Andean and Mexican varieties of Spanish. Vowels preceding other voiceless fricatives /f, x/ also weaken at higher rates than those preceding other segments, even voiceless stops.

²¹ This number is extrapolated from Lope Blanch’s statement that 90% of the weakening was adjacent to /s/, and 67% of that was in the context Vs, while 26% was sVs. (67% + 26% = 93%; 93% of 90 is 83.7%).
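The 49.50% figure for contact with /s/ can be reproduced by inclusion-exclusion, as in the Python sketch below. Two assumptions are made for illustration: that the counts for preceding /s/ (49) and following /s/ (166) each include the 18 tokens with /s/ on both sides, as the matching rows of tables 5.2 and 5.3 suggest, and that the corpus contains 398 weakened tokens in total, a figure inferred from the tables' percentages rather than stated in this section. The function name is hypothetical.

```python
def in_contact_share(preceded: int, followed: int, both: int,
                     weakened_total: int) -> float:
    """Percentage of weakened tokens adjacent to a segment on at least one side."""
    # Tokens counted in both groups are subtracted once so they are not double-counted.
    in_contact = preceded + followed - both
    return 100 * in_contact / weakened_total


# ASSUMPTION: 398 weakened tokens overall, inferred from tables 5.2 and 5.3.
share = in_contact_share(preceded=49, followed=166, both=18, weakened_total=398)
print(f"{share:.2f}%")
```

Under these assumptions, 197 of the weakened tokens are in contact with /s/, which matches the reported share.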

Table 5.3. Voice weakening rates by following consonant

Following         Number of Voice    Number of   Percentage of Voice   Voice weakening
consonant         Weakened Tokens    tokens      Weakened Tokens       Rate
voiceless stops
  p               9                  220         2.26%                 4.09%
  t               11                 303         2.76%                 3.63%
  k               25                 395         6.28%                 6.33%
fricatives
  f               6                  75          1.51%                 8.00%
  s               166                1182        41.71%                14.02%
  x               13                 127         3.27%                 10.24%
affricates
  tʃ              4                  94          1.01%                 4.26%
voiced stops
  b               2                  271         0.50%                 0.74%
  d               6                  357         1.51%                 1.68%
  g               0                  111         0.00%                 0.00%
  ɟ               1                  73          0.25%                 1.37%
rhotics
  ɾ               5                  575         1.26%                 0.87%
  r               0                  36          0.00%                 0.00%
lateral
  l               7                  412         1.76%                 1.70%
nasals
  m               12                 485         3.02%                 2.47%
  n               10                 870         2.51%                 1.15%
  ɲ               0                  37          0.00%                 0.00%

Following voiced consonants account for a much smaller share of the voice weakening found. Delforge actually found no instances of devoicing between two voiced consonants, so she excluded this context from her analysis. I do find several instances of voice weakening between voiced consonants, but they account for a very small amount of the voice weakening, only 5.53%. Of all the voiced-voiced contexts in the MCS data, 98.45% are fully voiced.

5.2.1.3 Comparing the effects of preceding and following consonants

In examining the patterns of voice weakening with regard to preceding and following contexts, one observation emerging from the data is that the following consonant, and in particular a following coda consonant, seems to exert more influence over a vowel than does a preceding consonant. It is worth exploring why we may expect this asymmetrical effect. There are many assimilation processes in Spanish that tend to be regressive or anticipatory, such as those involving assimilation to place of articulation, but the question remains as to why we should necessarily expect greater overlap of voicing gestures, and thus more vowel devoicing and voice weakening, in VC sequences as opposed to CV sequences. The patterning of voice weakening by following context and preceding context can be explained in part by research on the articulatory gestures involved in CV and VC segments. Beckman et al. (1993) show differences in the duration of opening and closing gestures for the consonants in [pap] sequences, and additionally find longer gestures in nuclear accented syllables than non-accented ones. Overall, closing gestures show shorter durations and higher peak velocities of jaw displacement, so the movements for the coda [p] in [pap] are shorter and faster than those for the [p] in onset. I understand those findings to indicate that a consonant following a vowel, being shorter and faster, overlaps more with the preceding vowel. Byrd (1995) also explores the timing of articulatory gestures involved in VC segments with regard to the syllabic affiliation of the following consonant. She finds that VC# segments, in which the following C is tautosyllabic, forming the coda of the syllable in which the V is located, are timed such that the V anchor aligns with the C Center rather than C Onset, whereas in a V#C segment, in which the following consonant forms the onset of the next syllable, alignment is more typically to the left edge of the following C. The result of these different alignment patterns reported by Byrd is that there is more VC overlap when the following C is tautosyllabic with the preceding vowel. This result offers evidence that explains the patterning of voice weakening in word-final position by syllable type, i.e. the differences in weakening rates between word-final syllables closed by /s/, and word-final syllables followed by a voiceless consonant in the onset of the following word, shown in section 5.2.3. Together, the articulatory evidence offered by Beckman et al. (1993) and Byrd (1995) can help to explain the greater effect of a following consonant within the same syllable on the MCS data.

5.2.1.4 Following pauses

Although following voiceless consonants could not be separated from following pauses for the regression analyses, it is important to consider them separately as well, because of the way they pattern with regard to voice weakening. In section 4.2, figure 4.6 shows that 30.03% of the vowels followed by pauses in this corpus are weakened. In order to understand this result, it must be noted that a following pause does not simply signal a particular segmental context (or lack thereof), but also indicates a particular prosodic position. A vowel followed by a pause necessarily meets two other conditions that are crucial to understanding the patterning found, namely the much higher percentage of voice weakening: 1) it is word-final, and 2) it is, at a minimum, intonational phrase-final, but most often utterance-final. In addition to those conditions, these vowels are also often post-tonic, since paroxytones are the most frequently occurring word type in Spanish. In fact, 78.54% of the vowels followed by a pause in this corpus are post-tonic. This combination of contexts makes for a particularly weak position phonologically (Nespor & Vogel 1986, Selkirk 1984), so the much higher voice weakening rates in this position are not surprising. Aerodynamic constraints are also important to consider: as subglottal pressure drops over the course of the utterance, it may eventually become equal to supraglottal pressure, at which point modal voicing cannot be physically sustained (Jaeger 1978).

This finding regarding following pauses is in line with previous research on vowel weakening in both Mexican and Andean Spanish as well as in other languages. Lope Blanch (1963) also reports a higher frequency of weakening before a pause, and Serrano (2006) confirms the importance of this context, noting that 37% of the weakened vowels in his corpus occur before a pause. My findings are similar, in that 27.64% of weakened vowels occur before a pause. The difference in rates is likely due to the fact that I did not code separately for phrase position, so the tokens in my data that are coded as occurring before a pause include only open syllables, while Serrano included vowels in closed syllables before pauses as well. For example, in an utterance like Son mis hermanos (“they are my siblings”), Serrano would have coded the /o/ in the third syllable in hermanos as phrase-final, whereas my simplified coding schema only identifies phrase-finality via following pauses, so I would have coded that /o/ as being followed by a voiceless consonant, /s/. For Andean Spanish, Hundley (1983) also mentioned utterance-final position as a conditioning factor for weakening, and Delforge (2009) conducted a much more detailed analysis of prosodic units, coding for position in the phonological phrase, intonation phrase, and phonological utterance. For vowels in open pre-pausal syllables (which correspond to all the vowels in my sample that are followed by a pause), she also notes that these are necessarily either intonational phrase-final or utterance-final, and reports a much higher percentage of devoicing (40%) in utterance-final position when compared to intonation-phrase-final position (16%) (2009: 204), which could be due, at least in part, to any rising tones present at the end of intonational phrases, or simply to the effect of the presence of a following pause in utterance-final position. Gordon (1998: 97) reports that final position is the most common environment for devoicing cross-linguistically, and further points out the following implicational hierarchy: the presence of devoicing in final position of a smaller prosodic domain like a word nearly always implies its presence in larger prosodic domains, like phrases and utterances.

5.2.2 Stress and position relative to stress

As mentioned in chapter 4, stress seems to play a major role in voice weakening. The primary targets for voice weakening in my data were unstressed, post-tonic vowels. Many previous studies only investigated voice weakening in unstressed vowels, but I included vowels in stressed syllables because, although they tended to be voice weakened at much lower rates than vowels in unstressed syllables (3.42% versus 8.37%), the rates for shortening were fairly similar: 7.66% in unstressed position, and 6.94% in stressed position. When I further divide unstressed vowels into pre- and post-tonic, we can then see that vowels in post-tonic position are the most frequent targets of voice weakening. The hierarchy of stress and voice weakening, from least to most weakening, is as follows: pre-tonic syllables (3.25%), stressed syllables (3.42%), unstressed monosyllabic words (4.63%), and finally, post-tonic syllables (15.72%).

The “unstressed monosyllabic word” category captures mostly function words that precede nouns, such as articles (el/la/los/las, “the”) and possessive determiners (mi/tu/su/sus, “my/your/his/her/their”), or that precede verbs, like object pronouns (me/te/se/lo/los/las/le/les). This stress position category was collapsed together with pre-tonic syllables due to the distribution of weakening across stress categories, as well as their similar patterning. In addition, an argument could be made that this position is in fact pre-tonic if we were to consider the phonological word as the basic stress-bearing unit rather than the lexical word. Vowels in the collapsed category containing unstressed monosyllabic words and pre-tonic syllables were weakened at a very low rate, only 3.85%, compared to vowels in post-tonic position, which were weakened at 15.72%.

Post-tonic position is also correlated with phrase-final position; that is, a post-tonic syllable would be the most frequent type to appear in phrase-final position, while by definition, it would be impossible for a pre-tonic syllable to appear in that position. Because the most frequent stress pattern for Spanish words is paroxytonic, that is, having stress on the penultimate syllable, it stands to reason that many of the post-tonic syllables are also phrase-final. In fact, 21.00% of all post-tonic vowels in the corpus occur before a pause, or phrase-finally.

The finding that rates of voice weakening were significantly higher in post-tonic syllables coincides with previous cross-linguistic research on vowel weakening and also consonant lenition. Unstressed positions in general tend to favor weakening and reduction of various types, affecting both consonant and vowel segments. In their study of intervocalic consonant lenition, Hualde, Simonet and Nadeu (2011) report finding a significantly higher rate of voicing in unstressed syllables, and a particularly high rate in the last syllable of proparoxytones, suggesting that this position may be even weaker than an immediate post-tonic syllable, or than a pre-tonic syllable. Arvaniti (1994) and Dauer (1980) both find that vowel reduction in Greek occurs most often in post-tonic position as opposed to pre-tonic position. Major (1985) reports that for Brazilian Portuguese, post-tonic position is the main locus for several shortening or reduction processes, including raising, diphthongization, and syllabicity shifts, noting an implicational hierarchy: any reductive processes that occur in pre-tonic position will necessarily occur in post-tonic position, but not vice versa. He concludes that this is evidence for the comparative weakness of post-tonic syllables. For Andean Spanish, Delforge (2008a, 2009) finds the most vowel devoicing in word-final position, which, due to the preponderance of paroxytonic words in Spanish, most often corresponds to post-tonic position. Accordingly, Serrano (2006) reports for Mexican Spanish that the post-tonic syllable is the main locus for vowel reduction. Moreover, unstressed positions in general, and post-tonic positions in particular, have been reported to be the weakest from an articulatory point of view, showing gestural magnitudes that can be reduced in either the temporal dimension, the spatial dimension, or both (de Jong 1995, 1998). This reduction in gestural magnitude often leads to greater overlap from adjacent gestures, which results in more coarticulatory effects (Cole et al. 1999).

5.2.3 Target vowel and word position

Interestingly, the type of target vowel itself, that is, which vowel of /a, e, i, o, u/, did not emerge as a major factor influencing the distribution of voice weakening. The results for the influence of the target vowel on voice weakening only partially accord with previous research on vowel devoicing or voice weakening, suggesting that MCS may act differently from other languages and from Andean Spanish. The literature suggests that most languages target high vowels for voice weakening reduction, which has been attributed to their shorter duration relative to mid and low vowels (Lehiste 1970), which implies a glottal gesture with a shorter temporal dimension as well.

Interestingly, rather than high vowels, Andean Spanish seems to target the mid vowels /e, o/ (Delforge 2008a, 2008b, 2009, Gordon 1980, Hundley 1983, Lipski 1990) with some differences based on word position. Lipski was the first to point out differences in the devoicing rates for target vowels based on word position, observing that word-medially, /e/ and high vowels are most affected, while word-finally /e/ is most frequently affected, followed by /o/ and then /a/; however, as Delforge (2009) also points out, /i/ and /u/ rarely occur in this context. Delforge finds that word-initial and word-medial /e/ and high vowels devoice at much higher rates than /a/ and /o/, while word-finally, there are differences in devoicing rates based on the segmental context. More precisely, in sandhi contexts, that is, those in which a voiceless consonant following the devoiced vowel belongs to the onset of the following word, word-final vowels show weakening rates similar to those found for word-medial devoicing. But word-final vowels in syllables closed by /s/ and in open pre-pausal syllables weaken at fairly high rates irrespective of target vowel (with the exception of /u/ in open pre-pausal position, which does not present any weakened tokens). I find the same for MCS (though with an additional exception of /i/ in open pre-pausal position, for the same reason): although the rates for sandhi voice weakening are a bit higher than those for word-medial position, the highest rates are found in word-final syllables closed by /s/ and open pre-pausal syllables.

Lope Blanch (1963) described Mexico City Spanish as primarily targeting the mid vowels as well, but as Delforge (2009) points out, he does not account for the relative frequency of each vowel in Spanish. As Delforge highlights, by way of Quilis and Esgueva (1980), /e/ is the most frequently used vowel in Spanish, and /o/ is next, while the high vowels are relatively infrequent. So by calculating only the number of weakened tokens in his corpus that were /e/ or /o/, and not dividing the number of weakened /e/ tokens by the number of total /e/ tokens, Lope Blanch has overstated the contribution of mid vowel targets to voice weakening in MCS. Serrano (2006) does not consider target vowel in his more recent update of Lope Blanch’s study, thus no comparisons can be made between his findings and my own.

However, in the data analyzed here, no vowel emerges as more likely than the others to favor voice weakening in the main effects for either the categorical or continuous analyses. This finding points toward the likelihood that in MCS, voice weakening reduction is more generalized than in Andean Spanish, so that it is not dependent on the vowel type. However, when the target vowel is explored in relation to word position, some differences in voice weakening depending on the target vowel can be observed. In the remainder of this section, I compare word-medial and word-final voice weakening rates in the MCS data to each other, and also to the rates reported by Delforge for Cuzco Spanish (2009). As a reminder, Delforge includes shortening and devoicing together, but the number of tokens that are shortened and not devoiced in her data represents less than 5% of all weakening, so the comparisons here are for the most part analogous.

As tables 5.4 and 5.5 show, voice weakening rates for all vowels except /i/ are considerably lower in word-medial position than in word-final position. The highest rate of weakening in word-medial position is for /i/, at 7.29%. Word-finally, /o/ weakens the most, followed by /u/ (but note the low frequency of occurrence for /u/). When comparing the word-medial rates of weakening in the MCS data under investigation here, in table 5.4, to Delforge’s rates for word-medial position in Cuzco Spanish, in table 5.6, we see considerable differences. For all vowels, the voice weakening rates are much lower for MCS, but most notable are the differences in the voice weakening rates for /e/ (3.50% in MCS vs. 18.51% in Cuzco Spanish) and high vowels (7.29% in MCS vs. 26.85% in Cuzco Spanish for /i/, and 5.00% in MCS vs. 35.19% in Cuzco Spanish for /u/).


Table 5.4. Voice weakening by vowel, word-medial position MCS (unstressed vowels only)

Vowel   Number of Voice    Number of   Percentage of Voice   Voice weakening
        Weakened Tokens    tokens      Weakened Tokens       rate
/a/     7                  415         13.73%                1.69%
/e/     15                 429         29.41%                3.50%
/i/     14                 192         27.45%                7.29%
/o/     10                 256         19.61%                3.91%
/u/     5                  100         9.80%                 5.00%

Table 5.5. Voice weakening rates by vowel, word-final position MCS

Vowel   Number of Voice    Number of   Percentage of Voice   Voice weakening
        Weakened Tokens    tokens      Weakened Tokens       rate
/a/     72                 664         26.28%                10.84%
/e/     76                 814         27.74%                9.34%
/i/     10                 159         3.65%                 6.29%
/o/     113                837         41.24%                13.50%
/u/     3                  24          1.09%                 12.50%

Table 5.6. Voice weakening rates by vowel, word-medial position in Delforge (2009)²²

Vowel   Number of Voice    Number of   Percentage of Voice   Voice weakening
        Weakened Tokens    tokens      Weakened Tokens       rate
/a/     79                 3187        4.34%                 2.48%
/e/     951                5138        52.22%                18.51%
/i/     447                1665        24.55%                26.85%
/o/     59                 2838        3.24%                 2.08%
/u/     285                810         15.65%                35.19%

²² Adapted from tables 6.17 and 6.18 in Delforge (2009: 192). Delforge had separate categories for word-initial, word-medial, and word-final contexts, and my coding schema only recognized the difference between word-final, and non-final (referred to as word-medial here).

Comparing the word-final context is a bit more complex. Delforge separated three distinct categories of word-final devoicing: sandhi devoicing, syllables closed by /s/, and open pre-pausal syllables. Below in tables 5.8, 5.10, and 5.12, I have included the rates that Delforge found for Cuzco Spanish, and the corresponding rates for MCS are given in tables 5.7, 5.9, and 5.11, respectively. In MCS, as in Cuzco Spanish, much more voice weakening is found in word-final syllables closed by /s/ than in the sandhi context. In sandhi contexts, /e/ and /o/ show the most voice weakening, whereas in syllables closed by /s/, all vowels other than /u/ weaken at fairly similar rates, between 20% and 25%, while /u/ weakens at the highest rate, 37.50%. For the sandhi context, voice weakening rates are lower in MCS than in Cuzco, except for /o/, which weakens at a rate of 9.72%, 0.34 percentage points higher than the rate for Cuzco, and higher than the rates for all other vowels in MCS. Although the rates are lower, the percentage of weakening accounted for by each vowel is fairly similar to the percentages reported for Cuzco.

Table 5.7. Voice weakening by vowel, sandhi (voiceless contexts only) MCS

Vowel   Number of Voice    Number of   Percentage of Voice   Voice weakening
        Weakened Tokens    tokens      Weakened Tokens       rate
/a/     10                 200         16.67%                5.00%
/e/     21                 255         35.00%                8.24%
/i/     5                  66          8.33%                 7.58%
/o/     24                 247         40.00%                9.72%
/u/     0                  10          0.00%                 0.00%


Table 5.8. Voice weakening by vowel, sandhi (voiceless contexts only) in Delforge (2009)²³

Vowel   Number of Voice    Number of   Percentage of Voice   Voice weakening
        Weakened Tokens    tokens      Weakened Tokens       rate
/a/     50                 929         13.85%                5.38%
/e/     183                912         50.69%                20.07%
/i/     8                  49          2.22%                 16.32%
/o/     117                1247        32.41%                9.38%
/u/     3                  6           0.83%                 50.00%

The only voiceless consonant phoneme that is frequent in coda position by Spanish phonotactics is /s/. Word-final vowels in syllables closed by /s/ weaken at considerably higher rates for both MCS and Cuzco Spanish than when the following voiceless consonant is not tautosyllabic, as can be seen by comparing the rightmost columns in tables 5.7 and 5.9 for MCS, and in tables 5.8 and 5.10 for Cuzco Spanish. For both varieties, /u/ shows the highest rate of voice weakening, followed by /e/, then /a/, then /o/, and finally /i/. Although overall rates are lower for MCS, they follow the same general pattern as Delforge’s data, shown in table 5.10: all vowels weaken at relatively high rates in word-final syllables closed by /s/, and the progression from most to least weakening is u > e > a > o > i.

²³ Adapted from table 6.19, Delforge (2009: 193).

Table 5.9. Word-final vowels closed by /s/, unstressed only MCS

Vowel   Number of Voice    Number of   Percentage of Voice   Voice weakening
        Weakened Tokens    tokens      Weakened Tokens       rate
/a/     26                 115         30.23%                22.61%
/e/     24                 99          27.91%                24.24%
/i/     2                  10          2.33%                 20.00%
/o/     31                 147         36.05%                21.09%
/u/     3                  8           3.49%                 37.50%

Table 5.10. Word-final vowels closed by /s/, Delforge Cuzco24

Vowel   Number of Voice    Number of   Percentage of Voice   Voice weakening
        Weakened Tokens    tokens      Weakened Tokens       rate
/a/     298                926         24.09%                32.18%
/e/     379                955         30.63%                39.68%
/i/     2                  11          0.16%                 18.18%
/o/     553                1857        44.70%                29.78%
/u/     5                  12          0.40%                 41.67%

24 Adapted from table 6.20, Delforge (2009: 194).

Table 5.11 shows the weakening rates for vowels in word-final open syllables before a pause, which can be compared to the rates that Delforge found, shown in table 5.12. It should be noted that in this context, all non-high vowels show high rates of weakening, both when compared to other positions and when compared to Delforge’s results for this context in Cuzco Spanish. My data contained no observations of /u/ in this position, and Delforge’s data contained only 7 observations, none of which were weakened. Also, the frequency of occurrence for /i/ in this context is quite low for both dialects; of the 5 tokens in MCS, none were weakened. The much higher rates of weakening in the non-high vowels in MCS may shed light on a dialectal difference in the contextual conditioning of voice weakening. It appears that a following pause, and by implication, phrase-final position, is a more important predictor of voice weakening in MCS than it is in Cuzco Spanish.

Table 5.11. Voice weakening rates by vowel, word-final open pre-pausal position MCS (unstressed vowels only)

Vowel   Number of Voice    Number of   Percentage of Voice   Voice weakening
        Weakened Tokens    tokens      Weakened Tokens       rate
/a/     36                 106         29.20%                31.13%
/e/     25                 53          22.12%                47.17%
/i/     0                  5           0.00%                 0.00%
/o/     55                 165         48.67%                33.33%
/u/     0                  0           0.00%                 0.00%

Table 5.12. Voice weakening rates by vowel, word-final open pre-pausal position in Delforge (2009)25

Vowel   Number of Voice    Number of   Percentage of Voice   Voice weakening
        Weakened Tokens    tokens      Weakened Tokens       rate
/a/     178                954         41.49%                18.66%
/e/     79                 504         18.41%                15.67%
/i/     4                  16          0.93%                 25.00%
/o/     168                1276        39.16%                13.16%
/u/     0                  7           0.00%                 0.00%

25 Adapted from table 6.21 in Delforge (2009: 194).

It should be noted however, that although target vowel did not emerge as a factor significantly contributing to the main effects of either the categorical (logistic regression) or continuous (gamlss) models, it does emerge as a contributor to the parameters in the gamlss model that control for distribution at 0% and 100%, nu and tau, respectively. As discussed in section 4.3.2, target vowel was found to be a significant predictor of 0% full voicing, in that when compared to /a/, all other vowels were significantly more likely to result in 0% full voicing than a rate between 0% and 100%. No significant differences were found between the other vowels. Additionally, the results for the tau parameter show that /o/ was found to be significantly more likely than /a/ to result in 100% full voicing than a rate between 0% and 100%. While on its face, this result seems to contradict the results of the nu parameter, the two effects in fact are not exact opposites.

We may interpret this finding in terms of categoricity: /a/ and /o/ are the vowels least likely to show gradience in their realization of voicing. What these results tell us is that while full modal voicing for /e/ and /i, u/ may be between 0% and 100%, /a/ is not likely to be produced with 0% full modal voicing, and /o/ is likely to be either fully voiced (100%), or completely weakened/devoiced (0%), when compared to /a/. This could be due in part to the frequency of occurrence of each vowel in a particular context. While overall, /o/ is not more frequent in the corpus than either /a/ or /e/, it is much more frequent than other vowels preceding a pause, accounting for 48.12% of all pre-pausal tokens, and accordingly, 47.93% of weakened pre-pausal tokens. Weakening in pre-pausal position is also more likely to be more extreme, i.e. we see more vowels produced with 0% modal voicing in this position. In short, /a/ and /o/ differ from each other in that /o/ is less stable than /a/, and appears more often in positions that condition more extreme voice weakening.

5.2.4 Social factors

It is interesting that the social factors investigated here do not seem to play a role in voice weakening in MCS. The lack of contribution of social factors to the voice weakening process is a likely indication that this phonetic variation does not represent a change in progress, but rather it is a case of stable variation that seems to exist below the level of consciousness for most speakers (Labov 1966). Interestingly, this feature of MCS is not one that is typically commented on or recognized as a characteristic of local speech by Mexico City dwellers. In fact, even when I asked my participants explicitly about this feature, by and large, it was unfamiliar to them. Most participants who could identify something different about capitalino speech commented primarily on the local use of certain lexical items. Many also mentioned speaking cantadito (“sing-songy”), which generally refers to a circumflex intonation pattern typical of capitalinos, which seems to be associated with the working class (Martín Butragueño 2011), as well as a particular manner of speaking associated with those who aspire to be upper class, known as fresas.

In fact, no participant mentioned shortened or voice weakened vowels as a feature of local speech, and only a few recognized it when given an example. The apparent stability of this process can be considered in light of the observation that “linguistic features behave differently depending on how people perceive them...those that are lower on the scales of awareness may escape notice until some circumstance calls attention to them” (Babel 2016: xix). Since vowel reduction is not salient to most native speakers of Mexico City Spanish, it is not a feature that is available for identity work, and thus it remains unassociated with social categories.

5.3 Conditioning factors for shortening

As discussed in 3.4.3.2, it must be emphasized that two distinct analyses of vowel duration were undertaken here: one which investigates vowel duration per se, and another which investigates vowel shortening, that is the duration of a vowel observation relative to the average duration value computed for that vowel, stress condition, and speaker.

These averages were computed and used as the metric for comparison as a way of normalizing the duration data, rather than setting an arbitrary cutoff for a shortened vowel. Comparing my findings for shortening to previous research is difficult, because most previous work on vowel reduction or devoicing subsumes shortening as a phase in reduction, and thus does not investigate it as a distinct variable. Due to this methodological difference, many of the comparisons that I make in this section are to research on vowel duration, and not necessarily shortening. I discuss the effect of preceding and following contexts in 5.3.1, position relative to stress in 5.3.2, and target vowel in 5.3.3. I comment on social factors influencing duration in 5.3.4.
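
The normalization described above can be illustrated with a minimal Python sketch. It is not the procedure used for the analysis itself: the token records are invented for illustration, the names `tokens` and `relative_durations` are hypothetical, and no shortening threshold is applied because the criteria are defined in section 3.4.3.2 rather than here. The sketch only shows how each duration can be expressed relative to the mean for its speaker, vowel, and stress condition.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical token records: (speaker, vowel, stress condition, duration in ms).
# These values are invented for illustration and are not taken from the corpus.
tokens = [
    ("S01", "a", "unstressed", 80.0),
    ("S01", "a", "unstressed", 60.0),
    ("S01", "a", "unstressed", 40.0),
    ("S02", "a", "unstressed", 50.0),
    ("S02", "a", "unstressed", 30.0),
]

def relative_durations(records):
    """Divide each duration by the mean of its (speaker, vowel, stress) group."""
    groups = defaultdict(list)
    for speaker, vowel, stress, duration in records:
        groups[(speaker, vowel, stress)].append(duration)
    means = {key: mean(values) for key, values in groups.items()}
    return [
        duration / means[(speaker, vowel, stress)]
        for speaker, vowel, stress, duration in records
    ]

ratios = relative_durations(tokens)
for record, ratio in zip(tokens, ratios):
    print(record, round(ratio, 2))
```

In this invented sample, the 40 ms token from speaker S01 is shorter relative to that speaker's own average than the 30 ms token from speaker S02 is relative to theirs, which is the kind of difference that a single absolute cutoff would obscure.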

5.3.1 Preceding/following context

My results indicate main effects for both preceding and following contexts, in both the linear regression model that investigates duration, and the logistic regression model that investigates shortening. As a reminder, the preceding context factor collapses preceding pauses and voiced consonants together and compares them with voiceless consonants, and the same is done for following context. My findings indicate that both preceding and following voiceless consonants favor shorter vowels. The combination of preceding and following contexts along with their respective shortening rates is shown in table 5.13.

Table 5.13. Shortening rates for surrounding contexts

Preceding context _ following context   Shortening rate
voiceless_voiceless                     17.65%
pause_voiceless                         12.79%
voiceless_voiced                        9.43%
pause_voiced                            7.94%
voiced_voiceless                        5.66%
voiceless_pause                         4.00%
voiced_voiced                           2.20%
voiced_pause                            1.35%

Previous investigations of vowel duration in Spanish examine effects of the context following a vowel on its duration, but no studies that I am aware of include an analysis of the preceding context as a variable. Some authors (Borzone de Manrique & Signorini 1983, Chládková et al. 2011, Quilis & Esgueva 1983) do compare voiced__voiced contexts with voiceless__voiceless contexts and find shorter vowels in voiceless contexts, but this result neither confirms nor rules out an independent effect of a preceding voiceless consonant. The present study’s finding that a preceding voiceless consonant, independent of following context, favors both shorter vowel durations and more vowel shortening appears to be novel. However, since my data come from natural speech and thus were not controlled in such a way as to isolate preceding context from other possible contributors to duration and shortening, such as syllable type, position relative to stress, word position, following context, and phrase position, among others, the finding is somewhat preliminary.

Several studies (Borzone de Manrique & Signorini 1983, Marín Gálvez 1994-1995) affirm the importance of both stress and phrase accent on vowel duration, reporting that vowels in pre-pausal position tend to be significantly longer than non-phrase final vowels. In my data, a following pause was indicative of phrase-final position, and thus, the longer duration measurements for all target vowels in this context26, shown in figure 5.1, are not surprising.

26 The lack of appearance of /u/ in the figure is due to the absence of any tokens of /u/ in pre-pausal position in the corpus studied here, and also serves to explain the need to collapse the high vowels together for the regression analyses.

Figure 5.1. Vowel duration by following context and target vowel

[Figure: one panel per target vowel (a, e, i, o, u); x-axis: following context (voiceless C, voiced C, pause); y-axis: vowel duration (ms), 0 to 200.]

5.3.1.1 Preceding consonants

Shortening rates by preceding consonant are provided in table 5.14. The highest rate of shortening, 16.84%, occurs in contexts where /k/ precedes the vowel, followed by contexts preceded by /tʃ/, with a rate of 14.12%. Rates for the other voiceless stops /p, t/ and /s/ are also high, and rates associated with all other consonants are comparatively low. This is in agreement with the observation that preceding voiceless consonants lead to the highest amount of shortening.


Table 5.14. Shortening rates by preceding consonant

Preceding          Number of          Number of   Percentage of      Shortening
consonant          Shortened Tokens   tokens      Shortened Tokens   Rate
voiceless stops
  p                42                 382         9.42%              10.99%
  t                66                 623         14.80%             10.59%
  k                99                 588         22.20%             16.84%
fricatives
  f                5                  62          1.12%              8.06%
  s                69                 733         15.47%             9.41%
  x                12                 148         2.69%              8.11%
affricates
  tʃ               12                 85          2.69%              14.12%
voiced stops
  b                7                  285         1.57%              2.46%
  d                5                  460         1.12%              1.09%
  g                2                  111         0.45%              1.80%
  ɟ                1                  119         0.22%              0.84%
rhotics
  ɾ                27                 538         4.84%              5.83%
  r                3                  38          0.67%              7.89%
lateral
  l                24                 493         5.38%              4.87%
nasals
  m                34                 591         7.62%              5.75%
  n                17                 511         3.81%              3.33%
  ɲ                0                  39          0.00%              0.00%

5.3.1.2 Following consonants

Almeida (1986) also reports differences in vowel duration based on the following consonant, partially confirming an observation made by Navarro Tomás (1916) that vowels followed by rhotics tend to be longer than those followed by other consonants. Almeida reports that, in unstressed syllables, the longest vowels are those followed by voiced fricatives27 and rhotics, and in stressed syllables, those followed by voiced fricatives, rhotics, and laterals. In general, he finds that vowels occurring before voiceless consonants are shorter, with a couple of exceptions for stops: vowels in stressed syllables followed by voiced consonants have the same average duration as those followed by voiceless consonants, for careful speech style, and are only slightly longer in this context for informal style. For vowels in unstressed syllables, those spoken in the careful style followed by voiced stops are longer than those followed by voiceless stops, whereas in the informal style, the opposite is true. However, these differences are small (66 ms vs. 71 ms, and 58 ms vs. 52 ms, respectively). My findings are in partial agreement with those of Almeida (1986), in that I found shorter duration values in vowels followed by a voiceless consonant, although my data only come from one speech style, which would be comparable to the informal style that the author investigated.

Table 5.15 shows the shortening rates and percentage of shortened tokens associated with following consonants. Similarly to voice weakening, the consonant that accounts for the most shortening in the entire data set is /s/, but its shortening rate is comparatively low at 7.77%, when considering other voiceless consonants, especially voiceless stops /p, t, k/ and affricate /tʃ/, which all shorten at rates higher than 10%.

Vowels followed by /m/ also shorten at a surprisingly high rate, 12.58%. This is likely an effect of the inclusion of the word como (“like”) in the corpus; nearly 1/3 (n=19) of the shortened tokens followed by /m/ were accounted for by como.

27 It should be noted that both Navarro Tomás and Almeida conceive of /β, ð, ɣ/ as voiced fricatives rather than as approximants.

Table 5.15. Shortening rates by following consonant

Following          Number of          Number of   Percentage of      Shortening
consonant          Shortened Tokens   tokens      Shortened Tokens   Rate
voiceless stops
  p                43                 220         9.64%              19.55%
  t                31                 303         6.95%              10.23%
  k                54                 395         12.11%             13.67%
fricatives
  f                2                  75          0.45%              2.67%
  s                92                 1184        20.63%             7.77%
  x                9                  127         2.02%              7.09%
affricates
  tʃ               10                 94          2.24%              10.64%
voiced stops
  b                10                 271         2.24%              3.69%
  d                5                  357         1.12%              1.40%
  g                0                  111         0.00%              0.00%
  ɟ                0                  73          0.00%              0.00%
rhotics
  ɾ                35                 575         7.85%              6.09%
  r                0                  36          0.00%              0.00%
lateral
  l                23                 412         5.16%              5.58%
nasals
  m                61                 485         13.68%             12.58%
  n                60                 869         13.45%             6.90%
  ɲ                1                  37          0.22%              2.70%
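
The difference between the number of shortened tokens a consonant accounts for and its shortening rate can be checked against the counts in table 5.15. The following is a minimal Python sketch offered only as an illustration, not as part of the original analysis; the names `rows` and `shortening_rate` are hypothetical, and only a subset of the table's rows is included.

```python
# Rows copied from table 5.15:
# (following consonant, number of shortened tokens, number of tokens)
rows = [
    ("p", 43, 220),
    ("t", 31, 303),
    ("k", 54, 395),
    ("s", 92, 1184),
    ("m", 61, 485),
]

def shortening_rate(shortened, total):
    """Percentage of tokens in a given context that are shortened."""
    return 100 * shortened / total

# Shortening rate per consonant, rounded to two decimals as in the table.
rates = {cons: round(shortening_rate(short, total), 2) for cons, short, total in rows}

# Consonant with the most shortened tokens vs. consonant with the highest rate.
most_tokens = max(rows, key=lambda row: row[1])[0]
highest_rate = max(rates, key=rates.get)

print(rates)
print(most_tokens, highest_rate)
```

Among these rows, /s/ contributes the largest number of shortened tokens because it is by far the most frequent following consonant, while /p/ has the highest rate.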

5.3.2 Stress and position relative to stress

Duration has long been recognized as a phonetic cue to lexical stress (Monroy Casas 1980, Navarro Tomás 1916, 1917, 1964, Quilis & Esgueva 1983, among others). Although the differences in duration between stressed and unstressed Spanish vowels are not as pronounced as they are for a stress-timed language like English, nearly all studies focusing on duration affirm that stressed vowels are significantly longer on average than unstressed vowels. Therefore, the results presented for the continuous analysis, that is, with duration measured in milliseconds as the dependent variable, are not surprising: for all target vowels, stressed tokens are longer than unstressed tokens, as demonstrated in figure 4.14 and reiterated here in table 5.16.

Table 5.16. Average duration values (in ms) for stressed and unstressed vowels

Vowel   Unstressed   Stressed
/a/     73.36        86.81
/e/     61.98        75.29
/i/     59.92        74.50
/o/     66.58        75.22
/u/     49.26        68.01

Among unstressed vowels, some previous research has found an additional distinction between the duration of pre-tonic and post-tonic vowels. Almeida (1986) reports that the closer a pre-tonic vowel is to the stressed vowel, the longer it is, or to state the same in terms of shortening: a pre-tonic vowel is shorter the farther it is from the stressed vowel. He does not report any similar distinction for post-tonic vowels. García (2016) reports similar duration measurements for pre-tonic and post-tonic vowels for Lima Spanish, but much shorter post-tonic vowels and longer pre-tonic vowels in Amazonian Spanish.


Results for the categorical analysis of shortening indicated that vowels in unstressed monosyllabic words or pre-tonic syllables are significantly more likely to be shortened than those in stressed or post-tonic syllables. If we consider stress in terms of relative prominence, it stands to reason that a stressed syllable would be unlikely to be shortened, since the ratio of stressed vowel duration to unstressed vowel duration has been identified as one of the main phonetic cues to stress (Ortega-Llebaria 2006).

Furthermore, unstressed positions tend to show reduced gestural magnitude and/or reduced temporal duration of gestures. In the data observed here, it seems that gestural magnitude is more likely to be reduced in post-tonic syllables, while temporal duration is reduced in pre-tonic syllables. In addition, the difference in rates of shortening between vowels in pre-tonic and post-tonic unstressed syllables can be accounted for by an effect of compensatory shortening (Fowler 1981), in which the pre-tonic vowel is shortened to highlight the upcoming stressed vowel.
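
To make the notion of a stressed-to-unstressed duration ratio concrete, the averages in table 5.16 can be converted into ratios. The Python sketch below is only an illustration under stated assumptions: the names `avg_ms` and `stress_ratio` are hypothetical, and the ratio is computed over the pooled averages in the table, which may differ from how the ratio is operationalized in the work cited above.

```python
# Average durations in ms, copied from table 5.16: vowel -> (unstressed, stressed)
avg_ms = {
    "a": (73.36, 86.81),
    "e": (61.98, 75.29),
    "i": (59.92, 74.50),
    "o": (66.58, 75.22),
    "u": (49.26, 68.01),
}

def stress_ratio(unstressed, stressed):
    """Ratio of average stressed duration to average unstressed duration."""
    return stressed / unstressed

# A ratio above 1 means the stressed vowel is longer on average.
ratios = {
    vowel: round(stress_ratio(unstressed, stressed), 2)
    for vowel, (unstressed, stressed) in avg_ms.items()
}

for vowel, ratio in ratios.items():
    print(f"/{vowel}/ stressed-to-unstressed ratio: {ratio}")
```

Every ratio is above 1, in line with the observation that stressed tokens are longer than unstressed tokens for all target vowels; in these averages the ratio is largest for /u/ and smallest for /o/.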

5.3.3 Target vowel

Even the earliest studies on vowel duration in Spanish (Navarro Tomás 1916, 1917) report an articulatory dimension to vowel duration, finding that high vowels /i, u/ tend to be significantly shorter than mid vowels, and low vowels tend to be the longest. Only Monroy Casas (1980) seems to refute this, reporting the shortest duration measurements for /o/ and /u/, and therefore claiming that it is back vowels rather than high vowels that are intrinsically shorter. As García (2016) points out, this difference in results may be due to the experiments used (isolated words in the case of Navarro Tomás and sentences in the case of Monroy Casas). My duration measurements for target vowel by stress (shown in figure 4.12) corroborate Navarro Tomás’s (1916, 1917) results: high vowels are the shortest, followed by mid vowels, and /a/ is the longest. Target vowel is also selected as a significant predictor of duration when combined with other predictor variables in the linear regression analysis. All vowels were significantly shorter than /a/, as expected based on previous research. Mid vowels /e/ and /o/ were not significantly different from each other, but they were significantly different from the high vowels /i, u/. My results for the effect of target vowel on shortening in the logistic regression model are similar: all vowels are significantly more likely than /a/ to result in a shortened realization, but there are no significant differences between the other vowels. This result seems to point to the stability of /a/: not only is it consistently longer than other vowels, it is less subject to shortening, which could be related to the greater magnitude of the articulatory gesture associated with the aperture of the oral cavity in the production of /a/ as compared to other vowels.

5.3.4 Social factors

Although some previous research reports effects for gender on vowel duration (Chládková et al. 2011), none were found here, nor did gender emerge as a contributing factor to the analysis of shortening. There was, however, an effect of education on vowel duration, such that the longest vowels were produced by the group with the lowest level of education, and the duration measurements for this group differed significantly from the group with the highest level of education, as well as from the middle group. No significant differences were found between the high and middle groups, but a pattern is easily discerned from figure 5.2, showing average duration measurements for each education group, for stressed and unstressed vowels. This pattern holds across all target vowels, as figure 5.3 shows. Again, I must reiterate that this result does not point to a tendency of speakers with higher education to produce vowels that are shortened (per the criteria defined in section 3.4.3.2), just that their vowels were shorter overall. This difference illustrates why normalization is important; choosing an arbitrary duration measurement for classifying a vowel as shortened could have made it appear as if those with higher levels of formal education shortened vowels more frequently when this is not the case.

Figure 5.2. Vowel duration by education level and stress

[Figure: two panels (unstressed, stressed); x-axis: level of formal educational attainment (high, middle, low); y-axis: vowel duration (ms), 0 to 200.]

Figure 5.3. Vowel duration by education level and target vowel

[Figure: one panel per target vowel (a, e, i, o, u); x-axis: level of formal educational attainment (high, middle, low); y-axis: vowel duration (ms), 0 to 200.]

A possible explanation for this correlation between education and vowel duration has to do with local speech rate and may be related to the interview context itself. It may be the case that those with higher levels of education are linguistically more confident and this confidence led them to speak faster, or it could be that they were more confident speaking to a non-native speaker of the local dialect (and a non-native speaker of Spanish). Participants with lower levels of education might have felt more intimidated by the interview and thus spoke more slowly than they might have in other contexts. An additional consideration is that as interviewer and interlocutor, my own education level could have come into play, since the participants understood as part of the consent process that they were participating in research for a doctoral dissertation. Future research might address this question by varying the contexts for eliciting natural speech, including recording local participants speaking to each other rather than to an interviewer.

No social factors were selected as main effects predicting shortening in the categorical logistic regression analysis. Given the results for voice weakening, this is in line with expectations. However, the speaker’s age group was found to interact with preceding context, following context, and target vowel. Speakers in the middle age group shorten vowels at higher rates than older and younger speakers in contexts where both the preceding context and the following context contained pauses or voiced consonants. Younger speakers shorten all vowels except /o/ at higher rates in contexts where the preceding consonant is voiceless, and the following context is a voiced consonant or a pause. Taken together with the results from the regression model, these findings with regard to the interaction of age with other effects do not point to any systematic effect of age on vowel shortening. However, because random variables cannot be included in conditional inference trees, I do not discount that these interactions could be related to a possible effect of speaker. This seems plausible especially given the high degree of inter-speaker variability.

5.4 Vowel quality

The results presented for possible changes in vowel quality due to weakening in section 4.4 show some variation in formant values when comparing voice weakened vowels to those that are fully modally voiced, and when comparing shortened vowels to those that are not shortened. Significantly higher F1 values were found for weakened /a/ and /o/ when comparing them to fully modally voiced vowels. Significantly higher F2 values were found for weakened /a/ in female speakers only, and for weakened /o/ in male speakers only. Additionally, significantly lower F1 values were found for shortened /e/ and /o/ when compared to their non-shortened counterparts, but only for female speakers.

These results should be further explored in future research, but at this point they do not point to any systematic difference in the first two formants for voice weakened or shortened vowels. To emphasize this point, we may consider Barajas’ (2014) research on vowel reduction in Colongo, Michoacán. The author finds robust changes in vowel quality for mid-vowels with significantly lower F1 values for raised /e/, as well as significantly lower F1 and higher F2 in /o/ in certain contexts. This contrasts with the lack of systematic effect found here.

Furthermore, these results contradict previous impressionistic accounts that implied centralization was an aspect of vowel reduction in MCS (Lope Blanch 1963, Serrano 2006). Although minor differences are found, they are not in the direction of the center of the vowel space. This is similar to what Delforge found for Cuzco Spanish based on her analysis of F1-F2 distance, which led her to conclude that vowel reduction in Cuzco Spanish is primarily devoicing. Similarly, I conclude that vowel reduction in MCS is primarily voice weakening and shortening.


5.5 Implications

This section will summarize the nature of vowel reduction in MCS, review previous formal analyses and their accounting for language and variety-specific patterns, and finally, propose an analysis that accounts for the patterns found in MCS.

5.5.1 Nature of vowel reduction in MCS

Based on the evidence presented thus far, we can conclude that vowel reduction in MCS primarily consists of voice weakening, which includes devoicing, weak/breathy voicing, and apparent deletion, as well as shortening, or reduced relative duration. The analysis of the linguistic and social patterning of each signals these as variable, but relatively stable phonetic processes, interacting with the segmental and prosodic structure of the language, but not much with the sociodemographic characteristics of speakers. The two processes act as complements to each other, as evidenced in part by the discrepancies between prosodic contexts that condition each. Pre-pausal position, that is, intonational and utterance final position, favors voice weakening but disfavors shortening, and similarly, post-tonic positions favor voice weakening while disfavoring shortening. Where the two processes coincide, however, is that they are both favored by voiceless preceding and following segmental contexts. Of the 78 tokens that are both shortened and weakened, 47 tokens, representing 60.26%, occur between two voiceless consonants, and only 6.41% (n=5) neither follow nor precede a voiceless consonant, although 3 of those are followed by pauses. As discussed in 4.2, the results of the reading task from Dabkowski (2017) indicate that reduction is not dependent on speech rate.

5.5.2 Previous formal analyses

This section outlines some previous formal analyses of phonetic vowel reduction cross-linguistically, and for Andean Spanish. These include explanations in terms of speech-rate induced gestural overlap (Dauer 1980, Jannedy 1995, Jun & Beckman 1993, 1994), feature geometry-based delinking and relinking of nodes (Lipski 1990), and a gestural constraint-based approach marrying the Articulatory Phonology (AP) approach with the constraint ranking of Optimality Theory (Delforge 2008b, 2009).

Dauer explains vowel reduction in Greek in terms of “physiological restrictions preventing the complete realization of vowels of very short duration” (1980: 27) at speech rates that are subjectively faster. The author invokes Lindblom’s undershoot hypothesis, an important aspect of his analysis of vowel reduction as evidence of a “simple dynamic model of vowel articulation that contains two components: a source that emits signals that are isomorphic with linguistic categories and a set of responding structures” (1963: 1779). The idea is that “targets” for a vowel’s formant frequency or duration are aimed for and often hit, but may be undershot when there are too many articulatory movements taking place in close temporal succession. Likewise, Jun and Beckman (1993, 1994) analyze similar processes of high vowel devoicing in Japanese and Korean as gestural overlap or blending, in which the glottal adduction gesture associated with the vowel is hidden by the laryngeal gestures associated with adjacent consonants. Jannedy (1995) suggests the same for Turkish and proposes language-specific timing relations between glottal gestures.

As described in section 2.3.1, Lipski (1990) explains the higher degree of weakening for /e/ and /i/ in his Ecuadorian data using an articulator-based feature geometry model that organizes features in a hierarchy according to articulator nodes. In light of proposals modifying the traditional feature geometry model to address the similarity of the vocalic and consonantal segments, he proposes a model of unstressed vowel reduction in which the target vowel loses its [-consonant] specification and then, the vowel’s place node delinks from its root (which is the node containing the laryngeal features) and reattaches to the root node of the consonant (see section 2.3.1 for more details).

As mentioned in chapter 2, Delforge’s research on Cuzco Spanish (2008a, 2008b, 2009) provides a model of unstressed vowel devoicing that uses Gafos’ (2002) gestural alignment schema, which integrates the frameworks of Articulatory Phonology (Browman & Goldstein 1989) and Optimality Theory (Prince & Smolensky 1993/2004). Gafos’ schema proposes that each gesture, in addition to being specified for a constriction location and degree, includes a set of dynamic reference points, or landmarks, which are: Onset, Target, Center, Release, and Offset. He posits that a series of gestural alignment constraints specify how landmarks are aligned in speech along the temporal dimension, and how landmarks for adjacent gestures line up. The constraints proposed to account for vowel devoicing are CV COORD, which specifies the alignment of gestures corresponding to each segment in a CV sequence, and VC COORD, which does the same for a VC sequence. In this view, devoicing arises from differences in gestural alignment.

Delforge’s proposal is that CV COORD and VC COORD show cross-linguistic variation in the amount of gestural overlap between vowels and adjacent consonants, which can help to explain why devoicing in Andean Spanish is not dependent on speech rate, whereas for other languages, it seems to be more prevalent in rapid speech. In order to account for the difference in rates, i.e. mid vowel /e/ being affected at higher rates than /o/, Delforge, like Lipski (1990), appeals to the homorganicity of adjacent segments, proposing a constraint called “*OVERLAP V//CHET” which stipulates that plateaus for adjacent heterorganic vowels and consonants may not overlap (2008b: 151). By positing constraints that either allow for or restrict gestural overlap, Delforge is able to account for patterns not explained by speech-rate-driven decreases in the temporal distance between gestures. The high devoicing rates for unstressed vowels preceding coda /s/ are accounted for in this model by constraints governing each consonant’s intrasegmental organization of oral and glottal gestures.

The above analyses successfully account for the language-specific and variety-specific data that they were developed for. However, since vowel reduction in MCS, like in Andean Spanish, seems to operate independent of speech rate, occurring even in slow, careful speech elicited in a reading task (Dabkowski 2017), an analysis conceiving of gestural overlap as due to the compression of gestures into a shorter temporal dimension does not accurately account for overlap taking place in slower speech. Similarly, Lipski’s analysis is inadequate when applied to the MCS data because it only accounts for reduction of /i/ and /e/ adjacent to /s/. Likewise, Delforge’s analysis successfully accounts for the patterns found in the Cuzco data, but overspecifies constraints that are not needed to account for the observed patterning in MCS, in particular the much higher rates of voice weakening for /o/ in MCS versus Cuzco Spanish. Another major difference between patterns is the fairly low vowel devoicing rates for /o/ reported by Delforge for word-initial and word-internal positions, as compared to the higher rates for MCS. The other major difference is that since Delforge’s acoustic analysis only revealed 5% of the reduction in her sample to be shortened but fully voiced, her analysis naturally focuses on constraints that affect voicing rather than any that might be proposed to account for durational differences. This is another reason why Delforge’s analysis would not be able to account for the MCS findings.

5.5.3 Proposed analysis

Given the above findings, the question that remains is how to explain the variable and gradient behavior of vowel reduction in Mexico City Spanish. In this section, I bring together the possible explanations for the patterning found that were explored in 5.1–5.4. Any model proposed to account for this behavior must consider the linguistic patterning observed for both voice weakening and shortening of vowels in this variety, that is, the resistance of /a/ to either process, the preference of both processes for voiceless contexts, in particular voiceless following consonants, the differential preference for pre-tonic and post-tonic syllables, and the differential preference for phrase-final and phrase-internal positions.

An appealing explanation for both the higher rates of voice weakening in post-tonic syllables, as well as the higher rates of shortening in pre-tonic syllables, is reduced gestural magnitude and reduced temporal duration of gestures in unstressed positions.

Under this view, the post-tonic syllable is weakest from an articulatory perspective due to the fact that lexical stress has already been realized and articulators can then “relax” before they prepare for the next hyperarticulated stressed vowel (de Jong 1995). The difference in the preference of post-tonic position for voice weakening and pre-tonic position for shortening may be explained by an effect of compensatory shortening (Fowler 1981, among others), a process through which an unstressed syllable may be shortened in order to highlight a proximal stressed syllable, that is, to create the appearance of a longer, more prominent stressed vowel. Spanish is said to be a syllable-timed language, in which all syllables have relatively equal length, as opposed to stress-timed languages like English, in which the interval between stressed syllables is equal (Pike 1945). Even though it has been shown that stressed vowels are slightly longer than unstressed vowels in Spanish, the difference is not as great as that found between stressed and unstressed English vowels. I hypothesize that MCS pre-tonic vowels are shortened at higher rates to compensate for the similar durations between stressed and post-tonic vowels, although this could only be the case for the non-high vowels, as they all have very similar average duration measurements in stressed and post-tonic positions, whereas the stressed high vowels are considerably longer than their post-tonic counterparts.

The preference of both voice weakening and shortening for voiceless preceding and following contexts is compatible with an analysis involving the overlapping of gestures related to the onset and offset of voicing in CV and VC sequences. The variable alignment and timing of these gestures can result in voice weakening or shortening, or less frequently, both. The gestural alignment constraints proposed by Delforge (2008b, 2009), conceived of in terms of phase windows (Byrd 1996) rather than static points to account for the variability and gradience of unstressed vowel devoicing in Cuzco Spanish, can also be applied here. The constraints that Delforge posited are CV COORDA, which calls for the alignment of any point ranging from C Onset to C Center with V Onset, and VC COORDA, specifying the alignment of any point ranging from V Target to V Release with C Target. Figure 5.4 shows examples of how these constraints work, in cases that result in the gestural overlap that would cause devoicing, as well as in cases with full modal voicing. In (9a) and (9b), we can see how CV COORDA operates to align the V Onset within the phase window between C Onset and C Center. V Onset may align with a point toward the middle of the window and result in sufficient overlap for devoicing in (9a), or it may align at a point toward the end of the window at C Center, which does not result in sufficient overlap to cause devoicing (9b). (9c) and (9d) show how VC COORDA operates to align the C Target within the window between the V Target and V Release. In (9c), the C Center aligns at a point toward the middle of the phase window, which results in sufficient overlap to cause devoicing, and in (9d), the C Center aligns at a point toward the end of the phase window, closer to the V Release, which results in insufficient overlap to cause devoicing. By specifying the alignment of glottal gestures related to voicing in this way, these constraints can successfully account not only for the voice weakening found in many vowels adjacent to voiceless consonants, but also for the majority of vowels produced with full modal voicing. Another set of constraints with stricter coordination requirements is also proposed; in other dialects of Spanish without vowel reduction, those stricter constraints are active, while they are low-ranked in MCS.

Figure 5.4. Illustration of CV COORDA and VC COORDA, from Delforge (2008b: 150)

Whereas both voice weakening and shortening prefer voiceless contexts, their differential application when the vowel is followed by a pause can be explained by a combination of the drop in sub-glottal pressure throughout the time course of an utterance (Jaeger 1978), and pre-boundary lengthening, specifically phrase-final lengthening in this case. Pre-boundary lengthening is a phenomenon thought to be universal (cf. Fletcher 2010 for a list of accounts of pre-boundary lengthening in various languages) that affects the duration of syllables in intonational phrase-final and utterance-final positions, such that they are consistently longer than when they occur in non-final positions. Edwards et al. (1990) describe two different strategies used by speakers to control phrase-final lengthening: decreasing intragestural stiffness, which was used to slow down the tempo, and modifying intergestural phasing to decrease overlap, which was used to increase the duration of accented syllables compared to unaccented syllables. The behavior of MCS voice weakening and shortening can be explained by the decrease of intragestural stiffness that occurs as a result of final lengthening. This decrease in stiffness, or gestural relaxing, results in longer vowel durations, but increased voice weakening as a result of the loss of stiffness in the movements of the cricothyroid muscles responsible for adducting the vocal folds. This laxing combined with the decrease in sub-glottal pressure as the end of an utterance approaches accounts for the increased rates of voice weakening in pre-pausal vowels.

There was no significant effect of target vowel on voice weakening in the main effects of either model, but the additional parameters used in the beta-inflated model did show some significant differences, as discussed in 4.3.2. Overall, all vowels are more likely than /a/ to contain 0% full modal voicing. Both /e/ and /o/ are more likely than /a/ to contain 100% full modal voicing. Another way of stating the above is that /a/ is unlikely to contain 0% full modal voicing, and less likely than mid vowels to contain 100% full modal voicing. This result suggests that when /a/ does weaken, it may be more subject to partial weakening than other vowels. An examination of the patterning of /a/ with regard to the different categories of voice weakening (see section 3.4.2) confirms this: 65.05% of all weakened tokens of /a/ were either partially weakened or partially devoiced, whereas for /e/, only 40.00% of the weakening was partial, and for /o/, 40.29%. Additionally, /e/ and /o/ both show substantially higher rates of “apparently deleted” tokens than /a/. Only 1.94% (n=2) of weakened /a/ tokens appeared to be deleted, as opposed to 20.00% of /e/ (n=20), and 8.63% (n=12) for /o/.

In addition to the above results for the continuous model of voice weakening, /a/ is also the vowel least likely to be shortened. It shows the longest average duration, for both stressed and unstressed observations. The apparent stability of /a/ can be related to its articulation. Articulatorily, /a/ is produced with the largest jaw opening and the lowest tongue position, in the center of the oral cavity in terms of backness. Longer comparative duration and a larger magnitude of jaw movement imply that the articulators have plenty of time to adjust from producing the previous consonant to producing /a/: if not already adducted, the vocal folds have enough time to come together to produce modal voicing, and the tongue has plenty of time to move into a low central position and thus produce a fully modally voiced, and non-shortened vowel.

This stability of /a/ contrasts with the less stable behavior of non-low vowels, in particular mid vowels /e/ and /o/, which both shorten and weaken at higher rates. It is therefore necessary to understand how /e/ and /o/ differ in terms of their production and patterning with regard to adjacent consonants in order to understand why they should be less stable than /a/. Both mid vowels are produced with a fairly similar tongue height, but differ in their horizontal position from the front of the oral cavity (/e/) to the back (/o/).

For /e/, the tongue body is positioned closer to the front of the mouth, near the place of articulation of consonants like /s/, which led Lipski (1990) and Delforge (2008a, 2008b, 2009) to propose an effect of homorganicity accounting for the high rates of front vowel weakening in Andean Spanish. However, /o/ is produced with the tongue body further back in the oral cavity, so, if there were an effect of homorganicity, we might expect that /o/ would weaken more in contexts with back consonants. This is not borne out by the MCS data, however. Of all weakened vowels in velar contexts, that is, either preceded or followed by a velar consonant (or both), 33.33% are /o/. Of all weakened tokens of /o/, 23.74% are in velar contexts. Overall, /o/ weakens at a rate of 7.62% in this context, which is only slightly more than /e/ in velar contexts, at 6.28%. Even more important to consider is the rate at which /o/ weakens in non-velar contexts. Table 5.17 shows the percentage of weakening of /o/ accounted for by velar contexts, percentage of weakening in velar contexts accounted for by /o/, and the weakening rates for /o/ in velar contexts. Table 5.18 shows the percentage of voice weakening of /o/ accounted for by dental/alveolar contexts, percentage of weakening in dental/alveolar contexts accounted for by /o/, and the weakening rates for /o/ in dental/alveolar contexts. When we examine the rates for /o/ weakening in dental/alveolar contexts as compared to velar contexts, the most important finding is that /o/ weakens at lower rates in velar contexts, which signals that the homorganicity of CV and VC segments does not appear to influence voice weakening in MCS as it is proposed to do in Andean Spanish. To further establish this point, we can examine the rates of /e/ in dental/alveolar contexts, shown in Table 5.19.

The percentage of /e/ weakening accounted for by either a preceding or following dental/alveolar context is high (91.00%) due to the preponderance of these contexts, but the percentage of weakening in dental/alveolar contexts that /e/ is responsible for is only 27.49%. The overall weakening rate of /e/ in dental/alveolar contexts is only 5.94%.

Table 5.17. Percentage of voice weakening and voice weakening rates for /o/ in velar contexts

                   Number of voice    Number of    Percentage of     Percentage of      Voice weakening
                   weakened tokens    tokens       /o/ weakening     velar weakening    rate
Following velar    28                 297          20.14%            28.28%             9.43%
Preceding velar    8                  149          5.76%             8.08%              5.37%
Either             33                 433          23.74%            33.33%             7.62%

Table 5.18. Percentage of voice weakening and voice weakening rates for /o/ in dental/alveolar contexts

                              Number of voice    Number of    Percentage of     Percentage of dental/    Voice weakening
                              weakened tokens    tokens       /o/ weakening     alveolar weakening       rate
Following dental/alveolar     86                 826          61.87%            41.75%                   10.41%
Preceding dental/alveolar     58                 929          41.73%            24.17%                   6.24%
Either                        107                1277         76.98%            32.33%                   8.38%

Table 5.19. Percentage of voice weakening and voice weakening rates for /e/ in dental/alveolar contexts

                              Number of voice    Number of    Percentage of     Percentage of dental/    Voice weakening
                              weakened tokens    tokens       /e/ weakening     alveolar weakening       rate
Following dental/alveolar     64                 1188         64.00%            31.07%                   5.39%
Preceding dental/alveolar     64                 1013         64.00%            26.67%                   6.32%
Either                        91                 1533         91.00%            27.49%                   5.94%
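The percentages in Tables 5.17–5.19 follow from simple ratios of token counts, and the comparison at stake (whether /o/ weakens more in velar than in dental/alveolar contexts) can be rechecked in a few lines. The sketch below is illustrative only: the counts are copied from the “Either” rows of the tables, while the per-vowel totals of weakened tokens (139 for /o/, 100 for /e/) are not stated directly in this section and are inferred here from the reported percentages; all function and variable names are hypothetical.

```python
# Counts copied from the "Either" rows of Tables 5.17-5.19:
# (voice weakened tokens, total tokens in that context).
EITHER_CONTEXT_COUNTS = {
    ("o", "velar"): (33, 433),
    ("o", "dental/alveolar"): (107, 1277),
    ("e", "dental/alveolar"): (91, 1533),
}

# ASSUMPTION: totals of weakened tokens per vowel, inferred from the
# reported percentages (e.g. 33 is 23.74% of 139), not quoted from the text.
TOTAL_WEAKENED = {"o": 139, "e": 100}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def weakening_rate(vowel, context):
    """Share of all tokens of `vowel` in `context` that are weakened."""
    weakened, tokens = EITHER_CONTEXT_COUNTS[(vowel, context)]
    return pct(weakened, tokens)


def share_of_vowel_weakening(vowel, context):
    """Share of all weakened tokens of `vowel` that occur in `context`."""
    weakened, _ = EITHER_CONTEXT_COUNTS[(vowel, context)]
    return pct(weakened, TOTAL_WEAKENED[vowel])


if __name__ == "__main__":
    for (vowel, context) in EITHER_CONTEXT_COUNTS:
        print(vowel, context,
              weakening_rate(vowel, context),
              share_of_vowel_weakening(vowel, context))
```

Under these assumptions, the recomputed rate for /o/ is lower in velar contexts than in dental/alveolar contexts, which is the opposite of what a homorganicity effect would predict.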

Given the above discussion, it seems that homorganicity of C and V segments does not induce more overlap, and hence, we can conclude that the constraint *OVERLAP V//CHET does not apply to the MCS patterns. While it may be able to account for weakening of front vowels that takes place in the context of a “front” consonant, it fails to account for front vowels weakening in other segmental contexts and fails to account for the majority of /o/ weakening. We can thus discard this constraint as irrelevant for MCS.

However, in undertaking the above analysis of possible influence of place of articulation on gestural overlap, the sheer difference in frequency of occurrence between the dental/alveolar and velar contexts could not be ignored: there are nearly three times more dental/alveolar contexts than velar contexts. In this corpus, /s/ is the most frequently occurring following context, and vowels followed by /s/ weaken at 14.02%, the highest rate of any following context except a pause. It should also be noted that /s/ appears word-finally in a number of highly frequent inflectional verbal and nominal morphemes, including second-person present singular verbal suffixes, first-person plural verbal suffixes, and plural noun suffixes. The patterning for voice weakening and shortening of vowels in MCS is compatible with a gestural overlap account, in which the amount of overlap is determined by a combination of segmental voicing, prosodic position, and phonemic and lexical frequency.

CHAPTER 6: CONCLUSION

6.1 Conclusions

Based on the evidence and discussion presented in chapters 4 and 5 of this dissertation, vowel reduction in Mexico City Spanish can be explained as a highly variable, gradient phenomenon comprised of two reductive processes: 1) voice weakening, which can be partial or complete, and includes devoicing, weak or breathy voicing, and voicing produced with low amplitude; and 2) shortening, which involves a reduction in the duration of the vowel. The processes correlate with several linguistic variables, those being stress, position relative to stress, preceding and following contexts, and target vowel; but they do not correlate with any of the social variables considered, which are age, gender, and socioeconomic status. In addition to not being associated with any broad sociodemographic categories, there also appears to be very little awareness of the feature among local speakers, and thus, no social evaluation associated with it.

I have shown here that the patterns found for each of these processes are compatible with a gestural overlap account, in which the amount of overlap is determined not by speech rate, but by a combination of the voicing specification and syllabic affiliation of adjacent segments, and the position within various levels of the prosodic hierarchy.

The remainder of this chapter includes a summary of the major contributions of this research in 6.2, a reflection on the methodological decisions involved in 6.3, limitations of the study in 6.4, and several areas for further exploration in 6.5.

6.2 Contributions

This dissertation comprises the first investigation to consider acoustic data in the description and analysis of vowel reduction in Mexico City Spanish (MCS). It is also the first study that systematically investigates the effects of both linguistic and social variables on voice weakening and shortening using mixed effects regression analyses.

Furthermore, it is the first study to approach these phenomena as both continuous and categorical variables and analyze them as such, thus offering a more nuanced examination of this phenomenon than previous studies of MCS vowel reduction. The acoustic analysis led to a major contribution of this study: the distinction between voice weakening and shortening and their identification as two complementary processes that target different prosodic positions to contribute to vowel reduction in this variety.

My analysis builds on the foundational impressionistic investigation of MCS vowel reduction undertaken by Lope Blanch (1963), and updated by Serrano (2006). I contribute acoustic evidence to the validity of the categories of reduction that Lope Blanch and Serrano had proposed by using auditory impressions, further confirming the gradience and variability of reduction in this variety. Additionally, I extend Delforge’s comprehensive analysis of Cuzco Spanish beyond that regional variety, and thus offer a more complete picture of vowel reduction in Spanish, supported by acoustic evidence.

In addition to offering a more complete understanding of the nature of this phonetic process, this study contributes to the depth of detail included in the dialectological literature on Spanish, bringing further attention to variation in vowel production, and additional evidence that while vowels do not seem to exhibit quite the same degree of variation associated with Spanish consonants, they are not as stable as often described. By shedding light in particular on voice weakening in vowels, it places Mexico City Spanish within the typology of languages and varieties that exhibit voice weakening, including devoicing in vowels, thus bringing new data to the study of the phenomenon. The ongoing documentation and analysis of vowel phenomena in MCS may prompt new discussions and exploration of voicing variation in vowels of other Spanish varieties.

6.3 Methodological considerations

As mentioned above, this is the first analysis of MCS vowel reduction to use acoustic analysis of natural speech data, which contributes greatly to our understanding of the gradience involved in the processes, especially in that voicing may be weak or partial.

Additionally, the use of parametric and non-parametric statistical models to complement each other substantially benefitted the analyses by providing information that otherwise could have been overlooked. The parametric logistic regression models provided information about which individual independent variables best predict voice weakening and shortening, while also allowing random effects, like that of speaker, which mitigate the violation of the assumption of independence of each observation. The conditional inference trees are non-parametric models that show how independent variables interact to predict the dependent variables. These were important tools to use because, given the low rates of voice weakening and shortening found, it would not have been possible to submit interactions to the logistic regression modelling due to the resultant empty cells, so the conditional inference trees provide crucial information not obtained elsewhere.

Because random effects cannot be added to conditional inference trees, the trees are best used to supplement models that can include them, especially for data with a large amount of inter-participant variation, like the current study. In addition to logistic regression and conditional inference trees for the analysis of the categorical variables, linear regression was used for the continuous measure of vowel duration, and inflated beta regression, an innovative type of regression designed to work with percentages, for the continuous measure of voice weakening.

One aspect of the statistical analysis that bears further consideration is the decision to investigate shortening separately from voice weakening, especially because it differs from previous analyses of vowel reduction that either ignored shortening or treated it as a stage in a gradient process. While it would also be interesting to carry out an ordinal regression analysis with shortening as the initial stage of vowel reduction, I believe that it was necessary to investigate it as a variable in and of itself, especially because it affects different prosodic positions than voice weakening does. By acting as a complement to voice weakening, shortening extends the contextual limits of vowel reduction.

A final methodological consideration was the process by which a vowel was determined to be shortened in this analysis, by comparing its duration to the average duration based on vowel type, stress, and speaker. This was one way of capturing durational variations related to speech rate (since all data for each speaker occurred within about 2 minutes of the interview, it was assumed that speech rate would be relatively similar throughout). While forcing a categorical distinction in this way between shortened and non-shortened vowels is not without problems, it has proved to be a better option than setting an arbitrary duration measurement as a cutoff for classifying a vowel as shortened. Future research should consider normalizing duration values by means of the method used by Estebas-Vilaplana (2010) and García (2016), i.e. dividing the segment duration by the sentence or utterance duration.
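The two approaches to duration mentioned above, comparison against a speaker-specific average and normalization by utterance duration, can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the cutoff `threshold_ratio` is a hypothetical parameter (the exact criterion is not restated in this section), and the field names and function names are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def flag_shortened(tokens, threshold_ratio=0.5):
    """Flag tokens that are short relative to their own group mean.

    tokens: list of dicts with keys 'speaker', 'vowel', 'stressed'
            and 'duration_ms'.
    threshold_ratio: HYPOTHETICAL cutoff; a token counts as shortened
            when its duration is below this fraction of the mean for
            the same speaker, vowel, and stress condition.
    """
    groups = defaultdict(list)
    for tok in tokens:
        key = (tok["speaker"], tok["vowel"], tok["stressed"])
        groups[key].append(tok["duration_ms"])
    group_means = {key: mean(values) for key, values in groups.items()}

    flagged = []
    for tok in tokens:
        key = (tok["speaker"], tok["vowel"], tok["stressed"])
        is_short = tok["duration_ms"] < threshold_ratio * group_means[key]
        flagged.append({**tok, "shortened": is_short})
    return flagged


def normalized_duration(segment_ms, utterance_ms):
    """Segment duration as a proportion of the utterance duration."""
    return segment_ms / utterance_ms
```

Because the reference mean is computed per speaker, a 20 ms vowel may count as shortened for one speaker and not for another whose vowels are uniformly brief, which is the advantage over a single absolute cutoff.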

6.4 Limitations of this study

Having carried out the data collection and analysis for this study, I see several ways in which the analysis could be improved. The biggest limitation involved the extremely low rates of voice weakening and shortening that were found, and the number of variables considered in the statistical analyses. Because of these low rates of occurrence of both weakening and shortening, cross-tabulating frequencies for the dependent variables of voice weakening and shortening across categories of predictor variables resulted in several empty cells, so ultimately two measures needed to be taken: 1) the levels of some predictor variables were collapsed together, and 2) not all factors could be submitted to the same regression analyses. For the former, the levels were collapsed in such a way that, as much as possible, a coherent argument could be made that they indeed are phonologically similar (e.g., unstressed monosyllabic words and pre-tonic syllables), in addition to ensuring that the levels being collapsed patterned similarly with regard to the dependent variable. The latter was mitigated by comparing model fit across all possible combinations of predictor variables that did not result in empty cells, and I am confident that this method accurately selected the models that best account for the variation in each process. That being said, in light of the low rates of both voice weakening and shortening, expanding the data sample to avoid such collapsing should be a goal for future research involving natural speech data.
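The empty-cell problem and the effect of collapsing levels can be illustrated with a small sketch. The data below are invented toy observations, not tokens from the corpus, and the function names and level labels are hypothetical; the example only shows how a zero-count cell in the cross-tabulation disappears once two phonologically similar levels are merged.

```python
from collections import Counter
from itertools import product


def empty_cells(observations, factors, outcome):
    """List combinations of factor levels and outcome with no observations."""
    keys = list(factors) + [outcome]
    levels = [sorted({obs[k] for obs in observations}) for k in keys]
    counts = Counter(tuple(obs[k] for k in keys) for obs in observations)
    return [combo for combo in product(*levels) if counts[combo] == 0]


def collapse_levels(observations, factor, mapping):
    """Return copies of the observations with some levels of `factor` merged."""
    return [{**obs, factor: mapping.get(obs[factor], obs[factor])}
            for obs in observations]


# Toy data (invented): position relative to stress x weakened or not.
toy = (
    [{"position": "pre-tonic", "weakened": False}] * 3
    + [{"position": "pre-tonic", "weakened": True}]
    + [{"position": "monosyllable", "weakened": False}] * 2
    + [{"position": "post-tonic", "weakened": False}] * 2
    + [{"position": "post-tonic", "weakened": True}] * 2
)

before = empty_cells(toy, ["position"], "weakened")
merged = collapse_levels(toy, "position", {"monosyllable": "pre-tonic"})
after = empty_cells(merged, ["position"], "weakened")
```

In this toy case the only empty cell is the weakened unstressed monosyllable, and it is absorbed once that level is merged with pre-tonic syllables.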

6.5 Further research

This dissertation is a first step in gaining a better understanding of the gradience and variation in the voicing and duration of Mexico City Spanish vowels. Additional data from a larger and more varied sample of speakers would benefit the analysis in at least two ways: 1) it would allow for additional observations to fill any empty cells in the cross-tabulation matrix for multiple variables, allowing more variables to be assessed by the same model; and 2) by filling empty cells, it would allow for a gradient analysis using ordinal regression, instead of the binary distinction made here. This topic is also ripe for additional research on topics such as phonetic perception, sociolinguistic perception and attitudes, exploration of social or interactional meaning in the absence of any correlation with broad social categories, as well as many others. Additionally, the effect of speech rate should be examined systematically in light of the results reported in Dabkowski (2017) and summarized above in 5.5.1. Objective and subjective measures of speech rate should each be employed in order to understand any potential relationship between speech rate and vowel reduction in MCS.

Future studies of voice weakening will benefit from a more fine-grained phonetic analysis of what I called “weak/breathy voicing” here. It would be ideal to analyze the data in light of acoustic correlates of breathy voice in order to distinguish between vowels that were simply produced with lower amplitude relative to surrounding sounds and truly breathy voiced vowels. An analysis including the first harmonic amplitude, additive noise, and spectral tilt (Hillenbrand et al. 1994) of MCS vowels could help shed light on this “intermediate” voicing category and provide further evidence advancing the argument that this is a gradient process.

It would also be ideal to investigate the amount of shortening as a continuous variable. Here, I only analyzed the duration measurement itself as a continuous variable, but a measure such as “percent shorter”, i.e. the percent difference of a vowel observation’s duration from its average, would be a more informative way of understanding the degree of shortening from a gradient perspective.
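One possible operationalization of such a “percent shorter” measure is sketched below. It assumes that the reference value is the mean duration for the same vowel, stress condition, and speaker, as in the categorical classification; the function name and sign convention (positive values mean shorter than average) are choices made for this illustration only.

```python
def percent_shorter(duration_ms, mean_duration_ms):
    """Percent by which a token falls short of its reference mean.

    Positive values mean the token is shorter than the mean;
    negative values mean it is longer.
    """
    if mean_duration_ms <= 0:
        raise ValueError("reference mean must be positive")
    return 100 * (mean_duration_ms - duration_ms) / mean_duration_ms
```

For instance, a 45 ms token measured against a 60 ms reference mean would be 25% shorter, and this value could then serve as a continuous dependent variable.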

Finally, a question that has not been addressed here and will be left for future research is the source of vowel reduction in Mexico, and its apparent lack of social evaluation. Delforge (2009) has related the vowel reduction in Andean Spanish to prolonged contact with Quechua, which also reduces unstressed vowels in similar ways.

Nahuatl is the indigenous language most widely spoken in Central Mexico, but I am not aware of research describing vowel devoicing in Nahuatl. There does, however, seem to be a phonological length distinction in Nahuatl vowels (SIL Mexico 2018), the effects of which on MCS shortening should be investigated further.

The combination of the low overall rates found for vowel reduction and the observation that it seems to be a very salient feature of the dialect, at least to non-native speakers, leads to the question of why something that occurs so infrequently should be so salient. One of the possible reasons for this perception could be that voice weakening occurs in prosodic positions that are perceptually salient. Although unstressed syllables are not considered to be perceptually salient, it may be that unstressed syllables marking boundaries in fact do stand out more to the listener. The primary loci for voice weakening are post-tonic, word-final syllables closed by /s/, and those occurring before a pause, both positions that mark a boundary. Additionally, the comparatively high rates of reduction of non-high vowels may add to the perceptual salience of the feature. Further complicating this question is the observation that neither voice weakening nor shortening appears to be at all salient among capitalinos, and it would be interesting to probe this lack of noticing and/or awareness in future perception studies.

REFERENCES

Aaron, Jessi Elana. 2004. The gendered use of salirse in Mexican Spanish: Si me salía yo con las amigas, se enojaba. Language in Society 33, 585–607.

Adams, Catalina. 2002. Strong assibilation and prestige: A sociolinguistic study in the Central Valley of Costa Rica. Unpublished Ph.D. dissertation, University of California, Davis.

Almeida, Manuel. 1986. La cantidad vocálica en el español de Canarias. Estudio acústico. Revista de Filología de la Universidad de La Laguna, 5, 73-82.

Arvaniti, Amalia. 1994. Acoustic features of Greek rhythmic structure. Journal of phonetics 22, 239-263.

Ash, Sharon. 2002. Social class. The handbook of language variation and change, ed. by J. K. Chambers, Peter Trudgill, and Natalie Schilling-Estes, 402-22. Oxford: Blackwell.

Audacity Team. 2014. Audacity®: Free Audio Editor and Recorder [Computer program]. Version 2.0.0 retrieved April 20th 2014 from http://audacity.sourceforge.net/.

Babel, Anna. 2016. Awareness and control in sociolinguistic research, ed. by Anna Babel. Cambridge: Cambridge University Press.

Barajas, Jennifer. 2014. A Sociophonetic Investigation of Unstressed Vowel Raising in the Spanish of a Rural Mexican Community. (Doctoral dissertation, The Ohio State University).

Barnes, Sonia. 2013. Morphophonological variation in urban Asturian Spanish: Language contact and regional identity. (Doctoral dissertation, The Ohio State University).

Bates, Douglas, Martin Maechler, and Ben Bolker. 2011. lme4 package for R. Madison: University of Wisconsin-Madison, R Foundation for Statistical Computing and McMaster University.

Beckman, Mary E., Jan Edwards, and Janet Fletcher. 1993. Prosodic structure and tempo in a sonority model of articulatory dynamics. Papers in Laboratory Phonology II: Segment, Gesture, and Prosody: 68-86.

Beckman, Mary, and Atsuko Shoji. 1984. Spectral and perceptual evidence for CV coarticulation in devoiced /si/ and /syu/ in Japanese. Phonetica 41.2. 61-71.

Berk-Seligson, Susan and Mitchell A. Seligson. 1978. The phonological correlates of social stratification in the Spanish of Costa Rica. Lingua 46, 1-28.

Boersma, Paul and David Weenink. 2016. Praat: doing phonetics by computer [Computer program]. Version 6.0.14, retrieved 11 February 2016 from http://www.praat.org/

Borzone de Manrique, Ana María and Angela Signorini. 1983. Segmental duration and rhythm in Spanish. Journal of Phonetics, 11, 117-128.

Boyd-Bowman, Peter. 1952. La pérdida de vocales átonas en la altiplanicie mexicana. Nueva Revista de Filología Hispánica, 6, No. 2, 138-140.

Boyd-Bowman, Peter. 1960. El habla de Guanajuato. Santiago de Compostela: Imprenta universitaria.

Browman, Catherine and Louis Goldstein. 1989. Articulatory gestures as phonological units. Phonology, 6(2), 201-251.

Browman, Catherine and Louis Goldstein. 1992. Articulatory phonology: an overview. Phonetica 49. 155-180.

Byrd, Dani. 1995. C-centers revisited. Phonetica 52: 285-306.

Byrd, Dani. 1996. A phase window framework for articulatory timing. Phonology 13. 139-169.

Canellada de Zamora, María Josefa and Alonso Zamora Vicente. 1960. Vocales caducas en el español mexicano. Nueva Revista de Filología Hispánica 14, 222-241.

Cárdenas, Negrete Daniel. 1967. El español de Jalisco: Contribución a la geografía lingüística hispanoamericana. Revista de filología española, Anejo 85.

Cedergren, Henrietta. 1973. The interplay of social and linguistic factors in Panama. Unpublished doctoral dissertation: Cornell University.

Cedergren, Henrietta. 1986. Metrical structure and vowel deletion in Montreal French. Sankoff, David (ed.). Diversity and Diachrony, 293-300.

Cedergren, Henrietta and Louise Simoneau. 1985. La chute des voyelles hautes en francais de Montreal: ‘As-tu entendu la belle syncope?' Les tendentes dynamiques du francais parle a Montreal, ed. by Monique Lemieux and Henrietta Cedergren, 57-144. Quebec: Bibliotheque nacionale du Quebec.

Chambers, Jack K. 2003. Sociolinguistic Theory (2nd ed.). Oxford, UK: Blackwell.

Chládková, Kateřina, Paola Escudero, and Paul Boersma. 2011. Context-specific acoustic differences between Peruvian and Iberian Spanish vowels. The Journal of the Acoustical Society of America, 130(1), 416-428.

Clements, George N. 1976. Palatalization: Linking or assimilation? Chicago Linguistic Society 12. 96-109.

Clements, George N. and Elisabeth Hume. 1995. The internal organization of speech sounds. In J. Goldsmith (ed.), The Handbook of Phonological Theory, 245-306.

Cole, Jennifer, José Ignacio Hualde, and Khalil Iskarous. 1999. Effects of prosodic and segmental context on /g/-lenition in Spanish. In Proceedings of the fourth international linguistics and phonetics conference vol. 2, 575-589.

Dabkowski, Meghan F. 2017. Speech rate effects on Mexico City Spanish vowel weakening. Poster presented at Phonetics and Phonology in Europe. Cologne, Germany, June 13, 2017.

Dauer, Rebecca. 1980. The reduction of unstressed high vowels in Modern Greek. Journal of the International Phonetic Association 10, 17-27.

De Jong, Kenneth J. 1995. The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. The Journal of the Acoustical Society of America, 97(1), 491-504.

De Jong, Kenneth. 1998. Stress-related variation in the articulation of coda alveolar stops: Flapping revisited. Journal of Phonetics, 26(3), 283-310.

De Los Heros, Susana. 1997. Language variation: The influence of speakers' attitudes and gender on sociolinguistic variables in the Spanish of Cuzco, Peru. Unpublished Ph.D. dissertation, University of Pittsburgh.

Delattre, Pierre. 1966. A comparison of syllable length conditioning among languages. IRAL-International Review of Applied Linguistics in Language Teaching 4.1-4:183-198.

Delforge, Ann Marie. 2008a. Unstressed vowel reduction in Andean Spanish. Selected proceedings of the 3rd Conference on Laboratory Approaches to Spanish Phonology, ed. by Laura Colantoni and Jeffrey Steele, 107-124. Somerville, MA: Cascadilla Proceedings Project.

Delforge, Ann Marie. 2008b. Gestural Alignment Constraints and Unstressed Vowel Devoicing in Andean Spanish. Proceedings of the 26th West Coast Conference on Formal Linguistics, ed. by Charles B. Chang and Hannah J. Haynie, 147-155. Somerville, MA: Cascadilla Proceedings Project.

Delforge, Ann Marie. 2009. The rise and fall of unstressed vowel reduction in the Spanish of Cusco, Peru: A sociophonetic study. (Doctoral dissertation, University of California, Davis).

Delforge, Ann Marie. 2012. ‘Nobody wants to sound like a provinciano’: The recession of unstressed vowel devoicing in the Spanish of Cusco, Perú. Journal of Sociolinguistics, 16(3), 311-335.

Eckert, Penelope. 1997. Age as a sociolinguistic variable. In F. Coulmas (ed.), The Handbook of Sociolinguistics. Oxford: Blackwell. 151-167.

Eckert, Penelope. 2004. Variation and sense of place. In Carmen Fought (ed.), Sociolinguistic Variation: Critical Reflections. Oxford: Oxford University Press. 107–118.

Eckert, Penelope. 2008. Variation and the indexical field. Journal of Sociolinguistics 12(4). 453–476.

Eckert, Penelope. 2012. Three Waves of Variation Study: The Emergence of Meaning in the Study of Sociolinguistic Variation. Annual Review of Anthropology 41, 87-100.

Eckert, Penelope and Sally McConnell-Ginet. 1992. Think Practically and Look Locally: Language and Gender as Community-Based Practice. Annual Review of Anthropology 21, 461-490.

Edwards, Jan, Mary E. Beckman, and Janet Fletcher. 1991. The articulatory kinematics of final lengthening. The Journal of the Acoustical Society of America 89.1, 369-382.

Estebas Vilaplana, E. 2010. The role of duration in intonational modelling. A comparative study of Peninsular and Argentinean Spanish. Revista Española de Lingüística Aplicada, 23, 153-173.

Fletcher, Janet. 2010. The prosody of speech: Timing and rhythm. The Handbook of Phonetic Sciences, Second Edition, 521-602.

Flórez, Luis. 1951. La pronunciación del español en Bogotá. Bogotá: Publicaciones del Instituto Caro y Cuervo, VIII.

Fontanella de Weinberg, María. 1972. Un aspecto sociolingüístico del español bonaerense: la -s en Bahía Blanca. Bahía Blanca: Cuadernos de Lingüística.

Fowler, Carol A. 1981. Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech, Language, and Hearing Research, 24(1), 127-139.

Fujimoto, Masako, Emi Murano, Seiji Niimi, and Shigeru Kiritani. 2002. Differences in glottal opening pattern between Tokyo and Osaka dialect speakers: factors contributing to vowel devoicing. Folia Phoniatrica et Logopaedica 54(3), 133-143.

Gafos, Adamantios. 2002. A grammar of gestural coordination. Natural Language and Linguistic Theory 20:269-337.

Garcia, Christina Marie. 2015. Gradience and Variability of Intervocalic /s/ Voicing in Highland Ecuadorian Spanish. (Doctoral dissertation, The Ohio State University).

García, Miguel. 2016. Sobre la duración vocálica y la entonación en el español amazónico peruano. Lengua y Sociedad, 14(2), 5-29.

Garrido, Marisol. 2007. Diphthongization of Mid/Low Vowel Sequences in Colombian Spanish. Selected Proceedings of the Third Workshop on Spanish Sociolinguistics, ed. by Jonathan Holmquist, Augusto Lorenzino, and Lotfi Sayahi, 30-37. Somerville, MA: Cascadilla Proceedings Project.

Gomez, Rosario. 2003. Sociolinguistic correlations in the Spanish spoken in the Andean region of Ecuador in the speech of the younger generation. Unpublished Ph.D. dissertation. University of Toronto.

Gordon, Alan. 1980. Notas sobre la fonética del castellano en Bolivia. Actas del sexto congreso internacional de hispanistas, ed. by Alan Gordon & Evelyn Rugg, 349-352. Toronto: Department of Spanish and Portuguese, University of Toronto.

Gordon, Matthew. 1998. The phonetics and phonology of non-modal vowels: A cross-linguistic perspective. Proceedings of the Berkeley Linguistics Society 24, 93-105.

Guion, Susan G. 2003. The vowel systems of Quichua-Spanish bilinguals. Phonetica, 60, 98-128.

Guy, Gregory. 1988. Language and social class. In Frederick J. Newmeyer (ed.), Language: the socio-cultural context, 37–63. New York: Cambridge University Press.

Hay, Jennifer and Katie Drager. 2007. Sociophonetics. Annual Review of Anthropology 36, 89-103.

Hernández, Edith. 2009. Resolución de hiatos en verbos -ear: un estudio sociofonético en una ciudad mexicana. Columbus, OH: The Ohio State University, doctoral dissertation.

Hillenbrand, James, Ronald A. Cleveland, and Robert L. Erickson. 1994. Acoustic correlates of breathy vocal quality. Journal of Speech, Language, and Hearing Research 37.4, 769-778.

Holmquist, Jonathan C. 1985. Social correlates of a linguistic variable: A study in a Spanish village. Language in Society, 14, 192-203.

Holmquist, Jonathan C. 1998. High-lands high vowels: A sample of men’s speech in rural Puerto Rico. Papers in Sociolinguistics: NWAVE-26 à l’Université Laval, ed. by Claude Paradis, Diane Vincent, Denise Deshaies & Marty LaForest, 73-79. Quebec: Université Laval.

Holmquist, Jonathan C. 2005. Social stratification in women’s speech in rural Puerto Rico: A study of five phonological features. Selected Proceedings of the Second Workshop on Spanish Linguistics, ed. by Lotfi Sayahi and Maurice Westmoreland, 109-119. Somerville, MA: Cascadilla Proceedings Project.

Hothorn, Torsten, Kurt Hornik and Achim Zeileis. 2006. Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.

Hualde, José Ignacio. 1989. Autosegmental and metrical spreading in the vowel-harmony systems of Northwestern Spain. Linguistics, 27, 773-805.

Hualde, José Ignacio, and Benjamin Sanders. 1995. A new hypothesis on the origin of the Eastern Andalusian Vowel System. Proceedings of the Berkeley Linguistics Society, 426-437. Berkeley, CA: Berkeley Linguistics Society.

Hualde, José Ignacio. 2005. The Sounds of Spanish. Cambridge University Press.

Hualde, José Ignacio, Miquel Simonet, and Marianna Nadeu. 2011. Consonant lenition and phonological recategorization. Laboratory Phonology, 2(2), 301-329.

Hundley, James. 1983. Linguistic variation in Peruvian Spanish: Unstressed vowels and /s/.

Jaeger, Jeri J. 1978. Speech aerodynamics and phonological universals. Annual Meeting of the Berkeley Linguistics Society. Vol. 4.

Jannedy, Stefanie. 1995. Gestural phasing as an explanation for vowel devoicing in Turkish. OSU Working Papers in Linguistics 45:56-84.

Jun, Sun-Ah and Mary E. Beckman. 1993. A gestural overlap analysis of vowel devoicing in Japanese and Korean. Paper presented at the 1993 Annual Meeting of the Linguistic Society of America, 7-10 January 1993, Los Angeles, CA, USA.

Jun, Sun-Ah and Mary E. Beckman. 1994. Distribution of devoiced high vowels in Korean. Proceedings of the International Conference on Spoken Language Processing. Yokohama, Japan, Vol. 2, 479-482.

Jun, Sun-Ah, Mary E. Beckman, Seiji Niimi and Mark Tiede. 1997. Electromyographic evidence for a gestural-overlap analysis of vowel devoicing in Korean. Journal of Speech Sciences 1: 153-200.

Keating, Patricia. 1988. Underspecification in phonetics. Phonology 5. 275-292.

Labov, William. 1966. The Social Stratification of English in New York City. Washington, D.C.: Center for Applied Linguistics.

Labov, William. 1969. Contraction, deletion and the inherent variability of the English copula. Language 45, 715-762.

Labov, William. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.

Labov, William. 1984. Field methods of the project on linguistic change and variation. In J. Baugh, and J. Sherzer (eds.), Language in Use: Readings in Sociolinguistics. Englewood Cliffs, NJ: Prentice Hall. 28-66.

Labov, William. 1990. The intersection of sex and social class in the course of linguistic change. Language Variation and Change 2: 205-254.

Larson, Magali Sarfatti and Arlene Eisen Bergman. 1969. Social Stratification in Peru. Berkeley, CA: Institute of International Studies, University of California.

Lastra, Yolanda, and Pedro Martín Butragueño. 2009. Corpus sociolingüístico de la ciudad de México. Materiales de PRESEEA–MÉXICO. México: El Colegio de México.

Laver, John. 1993. Principles of phonetics. Cambridge: Cambridge University Press.

Lehiste, Ilse. 1970. Suprasegmentals. Cambridge, MA: MIT Press.

Lindblom, Björn. 1963. Spectrographic study of vowel reduction. The Journal of the Acoustical Society of America 35.11. 1773-1781.

Lipski, John. 1990. Aspects of Ecuadorian Vowel Reduction. Hispanic Linguistics 4(1), 1-19.

Lope Blanch, Juan M. 1963. En torno a las vocales caedizas del español mexicano. Nueva Revista de Filología Hispánica 17, 1-20.

Lope Blanch, Juan M. 1979. Sobre el tratamiento de -e, -o finales en el español de México. Investigaciones sobre dialectología mexicana, ed. by Juan M. Lope Blanch, 35-40. México, D.F.: Universidad Nacional Autónoma de México.

López Morales, Humberto. 1986. Velarization of /n/ in Puerto Rican Spanish. Diversity and diachrony, ed. by David Sankoff, 105-113. Philadelphia: John Benjamins.

Luria, Max A. 1930. A study of the Monastir dialect of Judeo-Spanish based on oral material collected in Monastir, Yugo-Slavia. New York: Instituto de Las Españas en Los Estados Unidos.

Major, Roy. 1985. Stress and rhythm in Brazilian Portuguese. Language 61. 259-282.

Marín Gálvez, Rafael. 1994-1995. La duración vocálica del español. Estudios de Lingüística de la Universidad de Alicante, 10, 213-226.

Martín Butragueño, Pedro. 2009. Inmigración lingüística en la ciudad de México. Lengua y migración 1(1). 9-37.

Martín Butragueño, Pedro. 2011. Estratificación sociolingüística de la entonación circunfleja mexicana. Realismo en el análisis de corpus orales. Primer coloquio de cambio y variación lingüística. 93-121.

Matluck, Joseph. 1952. La pronunciación del español en el valle de México. Nueva Revista de Filología Hispánica 6, No.2, 109-120.

McCawley, James D. 1968. The phonological component of a grammar of Japanese. No. 2. Mouton.

Meneses, Francisco, and Eleonora Albano. 2015. From reduction to apocope: final poststressed vowel devoicing in Brazilian Portuguese. Phonetica, 72(2-3), 121-137.

Mendes, Ronald Beline and James A. Walker. 2012. Going, going, gone? Devoicing of unstressed final vowels in São Paulo Portuguese. Paper presented at LSRL 43, CUNY.

Meyerhoff, Miriam. 2002. Communities of Practice. The handbook of language variation and change, ed. by J. K. Chambers, Peter Trudgill, and Natalie Schilling-Estes, 526-48. Oxford: Blackwell.

Michnowicz, James Casimir. 2006. Linguistic and social variables in Yucatan Spanish. Unpublished Ph.D. dissertation, Pennsylvania State University.

Milroy, Lesley. 2002. Social networks. The handbook of language variation and change, ed. by J. K. Chambers, Peter Trudgill, and Natalie Schilling-Estes, 549-72. Oxford: Blackwell.

Milroy, Lesley and Matthew Gordon. 2003. Sociolinguistics: Methods and interpretation. Malden, MA: Blackwell.

Monroy Casas, Rafael. 1980. Aspectos fonéticos de las vocales españolas. Madrid, España. Sociedad General Española de Librería.

Moreno de Alba, José G. 1994. La pronunciación del español en México. México, D.F.: Colegio de México, Centro de Estudios Lingüísticos y Literarios.

Morrison, Geoffrey, and Paola Escudero. 2007. A cross-dialect comparison of Peninsular- and Peruvian-Spanish vowels. Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken: University of Saarbrücken, 1505-1508.

Navarro Tomás, Tomás. 1916. Cantidad de las vocales acentuadas. Revista de Filología Española, 3, 387-408.

Navarro Tomás, Tomás. 1917. Cantidad de las vocales inacentuadas. Revista de Filología Española, 4, 371-388.

Navarro Tomás, Tomás. 1948 [1999]. El español en Puerto Rico. San Juan: Editorial de la Universidad de Puerto Rico.

Navarro Tomás, Tomás. 1977. Manual de pronunciación española (19th edition). Madrid: Consejo Superior de Investigaciones Científicas, Instituto Miguel de Cervantes.

Nobre, Maria Alzira, and Frances Ingemann. 1987. Oral vowel reduction in Brazilian Portuguese. In Honor of Ilse Lehiste. Dordrecht-Holland/Providence-USA: Foris Publications, 195-206.

Núñez Cedeño, Rafael Antonio, and Alfonso Morales-Front. 1999. Fonología generativa contemporánea de la lengua española. Washington, DC: Georgetown University Press.

Nutini, Hugo G., and Barry L. Isaac. 2010. Social Stratification in Central Mexico, 1500-2000. University of Texas Press.

Ohso, Mieko. 1973. A phonological study of some English loanwords in Japanese. Ohio State University Working Papers in Linguistics 14. 1-26.

Oliver Rajan, Julia. 2007. Mobility and its effects on vowel raising in the coffee zone of Puerto Rico. Selected Proceedings of the Third Workshop on Spanish Sociolinguistics, ed. by Jonathan Holmquist, Augusto Lorenzino & Lotfi Sayahi, 44-52. Somerville, MA: Cascadilla Proceedings Project.

Oliver Rajan, Julia. 2008. Vowel raising in Puerto Rican Spanish. Chicago, IL: University of Illinois at Chicago, doctoral dissertation.

Ortega-Llebaria, Marta. 2006. Phonetic cues to stress and accent in Spanish. Selected proceedings of the 2nd conference on laboratory approaches to Spanish phonetics and phonology. Somerville, MA: Cascadilla Proceedings Project.

Perissinotto, Giorgio Sabino Antoni. 1975. Fonología del español hablado en la ciudad de México: Ensayo de un método sociolingüístico. México D.F.: El Colegio de México.

Pike, Kenneth L. 1945. The Intonation of American English. Ann Arbor, MI: University of Michigan Press.

Prince, Alan and Paul Smolensky. 2004. Optimality theory: Constraint interaction in generative grammar. Malden, MA: Blackwell.

Prosper Sánchez, Gloria D. 1996. Homophonetic neutralization of syllable-final liquids: Sociolinguistic aspects of Puerto Rican Spanish. Unpublished Ph.D. dissertation. University of Massachusetts, Boston.

Quilis, Antonio. 1981. Fonética acústica de la lengua española. Madrid: Gredos.

Quilis, Antonio, and Manuel Esgueva. 1980. Frecuencia de fonemas en el español hablado. LEA: Lingüística española actual, 2(1), 1-25.

Quilis, Antonio, and Manuel Esgueva. 1983. Realización de los fonemas vocálicos españoles en posición fonética normal. In Estudios de Fonética, vol. I, ed. by Manuel Esgueva & Margarita Cantero, 159-252. Madrid: Consejo Superior de Investigaciones Científicas.

Rissel, Dorothy A. 1989. Sex, attitudes, and the assibilation of /r/ among young people in San Luis Potosí, Mexico. Language Variation and Change 1. 269–283.

Sanders, Benjamin P. 1998. The Eastern Andalusian vowel system: Form and structure. Rivista di Linguistica, 10, 109-135.

Sankoff, David and Suzanne Laberge. 1978. The linguistic market and the statistical explanation of variability. In David Sankoff (ed.), Linguistic variation: models and methods, 239–250. New York: Academic.

Sankoff, David, Sali Tagliamonte, and Eric Smith. 2005. GoldVarb X: a variable rule application for Macintosh and Windows. http://individual.utoronto.ca/tagliamonte/Goldvarb/GV_index.htm

Serrano, Julio. 2006. En torno a las vocales caedizas del español mexicano: una aproximación sociolingüística. Los líderes lingüísticos. Estudios de variación y cambio, 3, 7-59.

Sessarego, Sandro. 2012a. Vowel weakening in Afro-Yungueño: Linguistic and social considerations//El debilitamiento vocálico en el español afroyungueño: Consideraciones lingüísticas y sociales. PAPIA-Revista Brasileira de Estudos Crioulos e Similares 22.2. 279-294.

Sessarego, Sandro. 2012b. Unstressed vowel reduction in Cochabamba, Bolivia. Revista Internacional de Lingüística Iberoamericana, 213-227.

SIL Mexico. 2018. The vowels of Nahuatl. Retrieved from http://www.mexico.sil.org/language_culture/aztec/vowels-of-nahuatl

Silva, David James. 1997. The variable deletion of unstressed vowels in Faialense Portuguese. Language Variation and Change, 9(3), 295-308.

Silva, David James. 1998. Vowel lenition in São Miguel Portuguese. Hispania, 166-178.

Silva Corvalán, Carmen. 1979. An investigation of phonological and syntactic variation in spoken Chilean Spanish. Unpublished Ph.D. dissertation, University of California, Los Angeles.

Solomon, Julie. 1999. Phonological and syntactic variation in the Spanish of Valladolid, Yucatan. Unpublished Ph.D. dissertation, Stanford University.

Stasinopoulos, Mikis, Bob Rigby, Calliope Akantziliotou, and Vlasios Voudouris. 2015. Generalized Additive Models for Location Scale and Shape, R package version 4.2-7.

Tagliamonte, Sali A. 2011. Variationist Sociolinguistics: Change, Observation, Interpretation. West Sussex: Wiley-Blackwell.

Tagliamonte, Sali A. & R. H. Baayen. 2012. Models, forests, and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24(2). 135-178.

Terborg, Roland and Virna Velázquez. 2017. Mexico City: Diversity and homogeneity. In Urban Sociolinguistics: The City as a Linguistic Process and Experience, ed. by Dick Smakman, and Patrick Heinrich. Routledge. 45-57.

Theophanopoulou-Kontou, Dimitra. 1973. Fast speech rules and some phonological processes of Modern Greek: a preliminary investigation. Yearbook of the School of Philosophy, University of Athens 1972-1973, 372-390.

Tsuchida, Ayako. 1997. The phonetics and phonology of Japanese vowel devoicing. Unpublished Ph.D. dissertation, Cornell University.

Van den Berghe, Pierre L., and George P. Primov. 1977. Inequality in the Peruvian Andes: Class and ethnicity in Cusco. Columbia, Missouri: The University of Missouri Press.

Varden, John Kevin. 1999. On high vowel devoicing in Standard Modern Japanese: Implications for current phonological theory. Unpublished Ph.D. dissertation, University of Washington.

APPENDIX A: PARTICIPANT INFORMATION

Participants with more than 5% voice-weakened tokens overall were considered frequent voice weakeners and are marked here with an asterisk.

Pseudonym initials  Age  Gender  Level of education completed  Occupation                            Neighborhood
ASC                 61   F       completed primary             homemaker                             Tlapan
ASF*                38   F       completed secondary           homemaker                             Nezahualcoyotl
ASM*                46   M       completed M.A.                professional/program manager          Roma
AMF                 39   F       completed B.A.                professional/psychologist             Vasco de Quiroga/Villa Santa Cecilia Coapa
ABF*                61   F       completed M.A.                professional/education                Atzacoalco
EVG*                30   M       completed B.A.                student/sales                         Atzacoalco
ESF                 68   F       completed primary             homemaker                             Guerrero/20 de noviembre
FBM                 38   M       completed M.A.                actor                                 Fuentes de Pedregal
FDM*                60   F       vocational school             homemaker/accountant’s assistant      Tetepilco
GTF*                36   F       completed preparatory         laboratory assistant                  Gustavo Madero
HAF*                55   F       vocational school             secretary                             Norte CDMX
HOM                 72   M       completed primary             rubber manufacturing (retired)        Venustiano Carranza/Tlalpan
HMM                 69   M       completed primary             taxi driver                           Colonia Morelos
JVL*                35   M       completed secondary           customer service representative       EDM, Norte
JGA                 72   F       completed primary             industrial mechanic (retired)         Tlapan
JUN*                25   M       completed secondary           waiter                                EDM
KOM                 22   M       completed B.A.                student                               EDM
LMM*                26   M       completed preparatory         unemployed                            Atzacoalco
LGM                 29   M       completed preparatory         customer service representative       Tlatelolco; Del. Cuauhtemoc
LAL                 32   M       completed B.A.                engineer                              Iztapalapa
MDK                 63   M       completed M.A.                engineer                              Atzacoalco
MGF                 41   F       completed B.A.                financial analyst                     Miguel Hidalgo/norte
MMF*                53   F       vocational school             jewelry maker                         Tacuba/San Jose Insurgentes
MCF                 51   F       completed B.A.                homemaker, actress, English teacher   Jardines de Pedregal
MRG*                31   M       completed B.A.                E-commerce manager                    Ciudad Satélite
MJM*                56   F       completed primary             homemaker                             Tlapan
MVN                 81   F       vocational school             secretary (retired)                   Centro/Mixcoac/Narvarte
PMC                 26   F       completed B.A.                student                               Roma/Polanco
PVM                 30   M       completed B.A.                analytic specialist                   Atzacoalco/Vasco de Quiroga
PDF*                28   F       completed secondary           domestic worker                       Iztapalapa
RPF*                32   F       completed B.A.                chemist                               Atzacoalco
RTG*                57   F       completed primary             homemaker                             Atzacoalco
SBF                 21   F       completed preparatory         homemaker                             Vasco de Quiroga
SVF                 21   F       completed B.A.                homemaker                             Vasco de Quiroga
SMF                 32   F       completed M.A.                professional/project manager          Delegación Alvaro Obregon
TBF                 46   F       completed primary             domestic worker                       Tacuba/Naucalpan
TTM                 40   M       completed B.A.                taxi/Uber driver                      Naucalpan
TZM*                40   M       completed preparatory         singer                                norte CDMX
URK                 43   M       completed primary             artisan (glass worker)                Tepito, EDM/Ecatepec de Morelos
VWM*                26   M       completed B.A.                graphic design                        Rio Blanco

APPENDIX B: SAMPLE QUESTIONS FOR INTERVIEW, SPANISH/ENGLISH

1. Dime algo sobre tu familia. ¿Cuántos hermanos tienes? ¿Dónde viven? ¿Y tus padres? ¿Cómo conociste a tu esposo/a? Tell me about your family. How many brothers and sisters do you have? Where do they live? How about your parents? How did you meet your significant other?

2. ¿Qué haces en tu tiempo libre? ¿Escuchas a música? ¿Cuál es tu canción o artista favorita? What do you do in your free time? Do you listen to music? What is your favorite song/artist?

3. ¿Qué tipo de trabajo tienes? ¿Trabajas en el centro del D.F. o en qué parte de la ciudad? O, ¿Qué quieres hacer cuando te gradúes? ¿Qué estudias? What do you do for a living? Do you work downtown, or in what part of the city? OR what do you want to do when you finish school? What do you study?

4. ¿Cuáles son algunas comidas típicas de aquí? ¿Cuál es tu favorita? ¿Tienes un plato especial que te gusta preparar? What are some typical foods from your hometown? What is your favorite? Do you have a special dish that you like to prepare?

5. ¿Me puedes decir algo de las celebraciones aquí? (Navidad, Año Nuevo, Reyes) Can you tell me about some of the celebrations here?

6. ¿Cómo fue tu niñez? ¿Qué hacías? ¿Con quiénes andabas? ¿Cómo fue el ambiente escolar? How was your childhood? What did you do? Who were your friends? What was school like?

7. ¿Has ido a otras partes de México? ¿O al extranjero? Have you ever been to other parts of Mexico? Or abroad? Where did you go? What was it like?

APPENDIX C: SENTENCE STIMULI FOR READING TASK

1. Todos mis parientes, menos yo y mis hermanitos, son de Zacatecas. All my relatives, except me and my brothers, are from Zacatecas.

2. Me encantan los animales, por lo cual tengo varias mascotas en casa: dos perros, dos gatos, un perico y cinco peces. I love animals, which is why I have several pets at home: two dogs, two cats, one bird, and five fish.

3. Según esos chicos, los tamales de su mamá son los mejores. According to those boys, their mom makes the best tamales.

4. El elote es una planta que se vende en los mercados y con la que se puede preparar tortillas en comales. Corn is a plant that is sold in the markets and with which one can make tortillas on griddles.

5. Al pintor le gusta pintar paisajes con muchos arbustos y árboles. The painter likes painting landscapes with a lot of trees and bushes.

6. Los artesanos están elaborando canastas para la basura. The artisans are making baskets for trash.

7. El obispo comisionó la escultura del espíritu santo. The bishop commissioned the sculpture of the holy spirit.

8. Las fotos tienen el fin de mostrar los efectos de los desastres naturales. The photographs have the aim of showing the effects of natural disasters.

9. Mis frutas preferidas son la pitaya, el coco, y el jitomate. My favorite fruits are pitaya, coconut, and tomato.

10. En estas horas el metro está muy atascado, mejor andamos en nuestras bicis. At this time, the metro is very full and cramped, it’s better if we go on our bikes.

11. Se debe consultar el corpus de prácticas correctas para el uso de pistolas antes de probarlas. One should consult the manual for correct usage of handguns before trying one.

APPENDIX D: AVERAGE DURATION MEASUREMENTS

Speaker   Unstressed vowels            Stressed vowels
Vowel      a    e    i    o    u        a    e    i    o    u
ASC       66   60   47   65   58       79   66   74   74   43
ASF       70   57   62   67   70       76   74   67   81   72
ASM       54   54   42   54   40       72   74   52   56   67
AMF       68   62   70   63   54      104   84   57   82   57
ABF       84   74   54   74   81       80   73   54   75   71
AVG       66   44   43   60   43      100   54   76   56   53
ESF       73   68   95   63   47      101   89   63   70   88
FBM       55   59   45   53   29       80   72   60   61   38
FDM       68   52   53   50    0       95   71   45   32   53
GTF       82   76   46   74   44       90   94   67   82   73
HAF       72   62   63   68   39       87   65   72   85   84
HOM       81   80   65   80   75       95  118   90   88   76
HMM       86   72   98   96   64       98  123   98  103   80
JVL       73   62   46   80   59      105   67   86   88   63
JGA       76   73   82   72   64       84   80   68   88   92
JUN       75   69   70   67    0       87   93   89   71   69
KOM       53   46   43   63   49       80   53   43   55   60
LMM       83   81   63   82   50       81   80   60   79   57
LGM       68   59   54   63   56       83   73   87   62   70
LAL       72   56   61   57   39      102   74   47   61   81
MVN       86   62   60   80   47      110   67  105   96   49
MDK       65   58   79   70   26       73   71   63   69   49
MGF       61   66   45   71   52       80   87   69   83   49
MMF       73   59   76   67   44       86  106   83   88   57
MCF       66   55   59   52   47       79   65   66   76    0
MRG       62   64   62   55   45       68   75   46   72   54
MJM       84   72   77   77  111       94   84   75   89   63
PMC       64   57   40   66   58      103   87   79   77   64
PVM       72   49   57   71   47       73   58  100   70   96
PDF       74   65   69   70   45       80   90   62  103   73
RTG       88   78   73   70    0      112   92   62   80  108
RPF       98   79   67   85   42      106   79   84   71   96
SBF       82   60   49   53   71       84   68   86   53   71
SVF       82   74   63   66   53       74   86   74   61   66
SMF       78   59   47   57   53       85   71  106   72   58
TBF       87   69   61   69   53       81   65  107   91   99
TZM       53   38   65   52   32       75   55   65   57   60
TTM       82   51   46   61   46       89   61   80   60   47
URK       84   64   66   78   49      100   81   74   69   78
VWM       66   46   33   53   58       64   59   63   57   57
