Perceptual Learning of Systemic Cross-Category Vowel Variation

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree

Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Kodi Weatherholtz

Graduate Program in Linguistics

The Ohio State University

2015

Dissertation Committee:

Professor Cynthia G. Clopper, Advisor

Professor Shari R. Speer

Professor Mark A. Pitt

© Kodi Weatherholtz, 2015

Abstract

Phonological processes such as vowel chain shifting result in complex systems of cross-category vowel variation across spoken varieties of a language (Labov, 1994). The experiments comprising this dissertation aimed to understand how listeners cope with such systemic pronunciation variation to recognize spoken words. Recent research has shown that the ability of listeners to cope with pronunciation variation is due, in large part, to perceptual learning mechanisms that adjust speech perception and recognition processes based on distributional properties of the speech input (see Samuel & Kraljic, 2009). Perceptual learning is not a singular phenomenon: learning outcomes differ depending on the nature of the variability in the speech input and the nature of exposure conditions. To date, however, despite the ubiquity of vowel variation across dialects and accents, few studies have investigated perceptual learning of systemic vowel variation (though see, e.g., Maye, Aslin, & Tanenhaus, 2008; Sidaras, Alexander, & Nygaard, 2009). The experiments comprising this dissertation aimed to address this gap in the literature by investigating (i) the level of representational specificity at which exposure-driven perceptual adjustments occur when adapting to systemic vowel variation (i.e., the locus of learning); (ii) the nature of the perceptual adjustments that occur at that level of representational specificity; and (iii) the manner in which learning and generalization are constrained by properties of the environment.

The empirical approach of the current experiments centered on a between-subjects exposure-test perceptual learning paradigm. During the initial exposure phase, one group of listeners was familiarized to a talker with a novel vowel chain shift, such as a “back vowel lowered” chain shift involving a clockwise rotation of English back vowels (e.g., the vowel /u/ shifted to sound like [U] and hence goose as “g[U]s”; /U/ shifted to sound like [o] and hence wooden as “w[o]den”; /o/ shifted to sound like [A] and hence nose as “n[A]se”). Listeners in control conditions were familiarized to a talker with a standard-sounding accent. Following exposure, listeners performed multiple word recognition tasks, which were designed to assess generalization of learning to new words, new talkers, and untrained vowel shifts. Perceptual learning and generalization were assessed by exposure-driven differences in word recognition at test.
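The back vowel lowered chain shift described above can be thought of as a one-step mapping between vowel categories. The following Python sketch is purely illustrative and is not part of the experimental materials: the names BACK_VOWEL_LOWERED, apply_chain_shift, and invert_shift are hypothetical, words are written by hand as lists of phoneme symbols using the same vowel symbols as the text, and the consonant symbols are assumptions.

```python
# Hypothetical illustration of the "back vowel lowered" chain shift.
# Keys are intended vowel categories; values are how they sound after shifting.
# Vowel symbols follow the text: u (goose), U (wooden), o (nose), A (low back vowel).
BACK_VOWEL_LOWERED = {"u": "U", "U": "o", "o": "A"}


def apply_chain_shift(phonemes, shift=BACK_VOWEL_LOWERED):
    """Return the shifted pronunciation of a word given as a list of phoneme symbols.

    Each symbol is looked up exactly once, so /u/ becomes [U] and stops there;
    it does not continue down the chain to [o] or [A].
    """
    return [shift.get(p, p) for p in phonemes]


def invert_shift(shift):
    """Map each heard vowel back to the intended category.

    This is only well defined when the shift is one-to-one, as it is here.
    """
    return {heard: intended for intended, heard in shift.items()}


# Assumed broad transcriptions of the three example words from the text.
goose = ["g", "u", "s"]
wooden = ["w", "U", "d", "@", "n"]
nose = ["n", "o", "z"]

for word in (goose, wooden, nose):
    print(word, "->", apply_chain_shift(word))
```

Because each vowel is shifted by a single step, the three categories remain distinct after shifting, and a listener who has learned the pattern could in principle recover the intended category by applying the inverse mapping.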

Across six experiments, passive familiarization to a talker with a novel vowel chain shift markedly improved recognition of accent-consistent pronunciations that were otherwise perceived as nonwords (e.g., “w[o]den” for wooden, given the lowering of /U/ to sound like [o]). This exposure-driven word recognition benefit consistently generalized to new words produced by the trained talker (i.e., words presented only during the post-familiarization word recognition tasks), which indicates that listeners learned to cope with the unfamiliar vowel variants by remapping their perceptual vowel space (a sublexical locus of learning), as opposed to relying on memory of specific word forms from the familiarization phase. Further, perceptual learning generalized to new talkers with the same chain shift (Experiments 1-3), despite the fact that listeners were only familiarized to a single talker, indicating that listeners remapped their perceptual vowel space talker-independently. The finding of simultaneous generalization across words and talkers was robust across exposure and test conditions (Experiments 1-3). That is, varying the exposure phase from 20 minutes to only 2 minutes of accent exposure (and hence varying listeners’ experience with the trained talker’s chain shift from several hundred words containing the target vowel shifts to only a few dozen words) had minimal influence on the strength of generalization. Further, listeners’ ability to generalize learning across talkers under single talker exposure conditions was not determined (and only minimally influenced) by acoustic similarity between the trained and new talkers in terms of their vowel productions (Experiments 1-3; cf. Reinisch & Holt, 2014).

Experiments 4-6 investigated generalization to untrained vowel variants. When listeners experienced only a subset of the vowel shifts that defined an unfamiliar vowel shift system, listeners were able to generalize learning to fill in incidental gaps in their experience (Experiment 4), indicating that listeners learned a pattern of co-variation among vowel categories, rather than adapting to an unfamiliar vowel chain shift by learning each constituent shift independently. In some cases, listeners were also able to leverage experience with a trained chain shift to facilitate recognition of words produced (by a different talker) with an untrained but structurally related chain shift. Listeners who were familiarized to a novel system of back vowel lowering were able to generalize learning to an untrained system of back vowel raising (a system of shifts among the same vowels, but in the opposite direction from training; Experiments 4 and 5) and to an untrained system of front vowel lowering (a structurally parallel system of shifts in a different region of the vowel space; Experiment 6). These findings indicate that learning involved a general broadening of perceptual vowel categories, resulting in a greater tolerance for mismatch. However, when listeners were familiarized to a system of back vowel raising, they appeared to learn a direction-specific system of variation (i.e., they did not generalize to back vowel lowered variants; Experiment 4).

Taken together, these findings demonstrate tremendous exposure-driven flexibility in vowel perception. Listeners adapt to unfamiliar vowel chain shifts by dynamically and systemically adjusting talker-independent perceptual vowel representations, and these adjustments can involve a general broadening of perceptual vowel categories or targeted perceptual shifts that reflect the direction of the vowel shifts in the speech input. An important direction for future research is to investigate why exposure to unfamiliar pronunciation variation affecting the same vowels but in different directions (i.e., back vowel lowering vs. back vowel raising) results in different patterns of adjustment to perceptual vowel representations.

Acknowledgements

I am grateful to many people for helping make this dissertation a reality. Foremost, I would like to express my deepest gratitude to Cynthia G. Clopper, my advisor and mentor, for immeasurable guidance, encouragement, patience, and support throughout my graduate career. She continually pushed me to think more deeply about language and cognition. She allowed me great freedom to explore my research interests, and she was unfailing in helping me identify meaningful and practical steps forward in my research when I got lost in the weeds. It is no exaggeration that Cynthia read and thoughtfully reviewed every word of every document I ever sent to her, and the clarity of my thinking and writing improved markedly because of her careful attention to the fine details of my work. I also extend my sincerest appreciation to Shari Speer, not least for her support and pragmatism. She encouraged my research vision from the outset, and she was always quick to put pen to paper in order to flesh out experimental designs and map out predictions. And I am grateful to Mark Pitt for keeping an eye on the big picture and challenging me to do the same by asking deceptively simple questions at every turn, like “Why does this matter?” and “What do we learn if your predictions are correct?”

I owe a great deal more than gratitude to Kathryn Campbell-Kibler. When I applied to graduate school in linguistics, I had essentially no formal education in the field, only a stack of linguistics books and articles that I’d read and some vague but impassioned ideas about the psycholinguistics-sociolinguistics interface. Kathryn saw promise in my ideas. She took a chance on potential over credentials and, in doing so, changed my life. I am continually inspired by the breadth and depth of her thinking.

I am grateful to many others for their persistent investment in my intellectual and personal growth. Kiwako Ito and Becca Morely were especially generous with their time and energy, and I benefitted greatly from their hands-on approach. On many occasions, they permitted a one-hour meeting to last much longer (sometimes all afternoon) so we could really wrap our heads around a set of data together and discuss it deeply. T. Florian Jaeger sparked my early interest in statistics and data visualization, and he always answered my lengthy emails with equally detailed replies full of insightful answers, poignant questions, and snippets of R code that guided me forward.

I am gratefully indebted to Laurie Maynell, Elizabeth A. McCullough, Mike Phelan, and Sara Philips-Bourass, both for their friendship and for recording speech stimuli for my dissertation experiments. These experiments would not have been possible without their vocal talents, generosity, and above all their patient willingness to spend hours huddled with me in a sound booth as I directed them to record (and re-record... and re-record) stimuli with bizarre vowels. I also extend my sincere appreciation to the hard-working and tremendously vigilant research assistants who assisted me with data collection and coding: Clay Gilkerson, Erin Luthern, and Adam Royer.

I would also like to thank the National Science Foundation for supporting this work through a Graduate Research Fellowship (#DGE-0822215); the OSU Graduate School for supporting this work through a Dean’s Graduate Enrichment Fellowship; and the countless developers and programmers around the world who have supported my work by contributing to the open-source software that forms the backbone of my analytic approach (R, Praat, LaTeX). Hadley Wickham, a prolific contributor to the R community, deserves special thanks for profoundly improving my R workflow.

I have benefitted immensely from the friendly culture of the OSU Department of Linguistics, as well as from the support of OSU faculty and staff beyond my committee and sphere of collaborators: Julie McGory, Marie-Catherine de Marneffe, Mary Beckman, Laura Wagner, Per Sederberg, Brian Joseph, Don Winford, Hope Dawson, Julia Papke, Brett Gregory, Jim Harmon, Claudia Morettini, and Katherine Eckstrand.

My officemates and dear friends Abby Walker and Shontael (Shonty) Elward deserve special thanks for many things: research advice, awesome conference road trips, solidarity, the beer fridge, and the circle of niceness. I am thankful to Gregory Kierstead and Liz McCullough for their lasting friendship and for the many cookouts and misfit Thanksgivings we shared. Many other people helped make my time at Ohio State genuinely enjoyable: E. Allyn Smith, Jeonghwa Shin, Manjuan Duan, Bridget Smith, Marten van Schijndel, Rory Turnbull, Jefferson Barlew, Rachel Burdin, Zack Jones, Keeta Jones, Alec Buchner, and Chris Worth. Jon Dehdari wrote the LaTeX template for this dissertation and, in turn, saved me many hours of fretting with formatting.

I have also been fortunate to enjoy the myriad blessings of a rich social network outside of the linguistics world, which made the hard days of graduate school more bearable and which made the good days even better. I am especially thankful for Nicole Gibbs, Todd Gibbs, Jeff Kidd, Abby Kidd, Simon Woliver, Kris Woliver, and Dustin Kelch.

I am endlessly grateful to my loving and supportive family. To my mom, Lori Weatherholtz, and my dad, Gary Weatherholtz, for always being a voice of reason and compassion and for teaching me the value of hard work and tenacity. To my brother Colter and my sister-in-law Ali, for camaraderie and solidarity throughout life, college and beyond. And to Seth Wiener and Mike Furman, who are my brothers in all but blood and with whom I have shared, enjoyed, and endured many of the best and many of the most difficult things in life.

Finally and most importantly, I gratefully and sincerely thank my wife Danielle. Throughout my tenure as a graduate student, she has seen me at my best and at my worst, and she has loved me regardless. This dissertation is a testament to her thoughtful encouragement, quiet patience, unwavering love, and the occasional “tough love” reality check.

Vita

2009 ...... B.A., English, University of Central Oklahoma, Edmond, OK

Publications

Peer-reviewed Journal Articles

Weatherholtz, K., Campbell-Kibler, K., & Jaeger, T. F. (2014). Socially-mediated syntactic alignment. Language Variation and Change, 26, 387-420.

Conference Presentations

Weatherholtz, K. (2014). The influence of high variability exposure conditions on talker-specific versus talker-independent perceptual learning outcomes. Talk presented at The 9th Annual Meeting of The Mental Lexicon, Niagara-on-the-Lake, ON, Canada, September 30th-October 2nd.

Weatherholtz, K. (2014). Constraints on cross-talker generalization of perceptual learning for speech. Talk presented at The Ohio State University Center for Cognitive and Brain Sciences Fall Retreat, Perrysville, OH, September 19-20th.

Weatherholtz, K. (2014). Lexically guided perceptual learning of systemic vowel variation: Evidence for cross-talker generalization following single-talker exposure. Talk presented at The 20th Annual Meeting of Architectures and Mechanisms for Language Processing (AMLaP), Edinburgh, Scotland, September 3-6th.

Weatherholtz, K., Walker, A., Melvin, S., Royer, A., & Clopper, C. G. (2014). Effects of experience and expectations on adaptation to dialect variation in noise. Poster presented at The 27th Annual CUNY Conference on Human Sentence Processing, Columbus, OH, March 14-15th.

Barlew, J., Carmichael, K., McGory, J., Morton, D., Phelan, M., & Weatherholtz, K. (2014). Bringing linguistic inquiry to high schoolers: A report on the SLIYS program. Poster presented at The 88th Annual Meeting of the Linguistic Society of America (LSA), Minneapolis, MN, January 2-5th.

Walker, A., Weatherholtz, K., & Clopper, C. G. (2013). The effect of listener expectations on cross-dialect sentence intelligibility. Poster presented at The 3rd Annual Midwestern Cognitive Science Conference, Columbus, OH, May 18th.

Weatherholtz, K. (2013). Phonological inference and adaptation to cross-category vowel mismatches. Poster presented at the Workshop on Current Issues and Methods in Speaker Adaptation (CIMSA), Columbus, OH, April 6-7th.

Weatherholtz, K. (2013). Is F1 different from F2?: Generalization of lexically-driven perceptual learning. Talk presented at The 87th Annual Meeting of the Linguistic Society of America (LSA), Boston, MA, January 2-6th.

Weatherholtz, K., Campbell-Kibler, K., & Jaeger, T. F. (2012). Similarity, avoidance, and accentedness: Social influences on syntactic alignment. Talk presented at New Ways of Analyzing Variation (NWAV 41), Bloomington, IN, October 25-28th.

Weatherholtz, K., Campbell-Kibler, K., & Jaeger, T. F. (2012). Syntactic alignment is mediated by social perception and conflict management. Talk presented at Architectures and Mechanisms for Language Processing (AMLaP), Riva Del Garda, Italy, September 6-8th.

Weatherholtz, K. (2012). Assessing generality and specificity in adaptation to novel vowel productions. Poster presented at Architectures and Mechanisms for Language Processing (AMLaP), Riva Del Garda, Italy, September 6-8th.

Weatherholtz, K., Walker, A., & Campbell-Kibler, K. (2012). Accommodation and sociolinguistic meaning: Phonetic after-effects of being and interacting with a (dis)engaged interviewer. Poster presented at the International Symposium on Imitation and Convergence in Speech (ISICS), Aix-en-Provence, France, September 3-5th.

Weatherholtz, K. (2012). Specificity and generality in speaker adaptation. Talk presented at Psycholinguistics in Flanders (PiF), Berg en Dal, The Netherlands, June 6-7th.

Weatherholtz, K. (2012). Effects of speaker-specific information on speech segmentation. Poster presented at The 86th Annual Meeting of the Linguistic Society of America (LSA), Portland, OR, January 5-8th.

Weatherholtz, K. (2011). Sociolinguistic expectations and segmentation of conversational speech. Poster presented at the Workshop on the Production and Comprehension of Spontaneous Speech, Nijmegen, The Netherlands, December 11-12th.

Weatherholtz, K. (2011). Socially-motivated garden pathing: When social expectations influence speech segmentation and comprehension. Talk presented at Variation and Language Processing (VaLP), Chester, UK, April 11-13th.

Fields of Study

Major Field: Linguistics

Studies in:

Speech processing: Professor Cynthia G. Clopper, Professor Shari R. Speer, Professor Mark A. Pitt

Linguistic alignment: Professor T. Florian Jaeger, Professor Kathryn Campbell-Kibler

Sociolinguistic cognition: Professor Kathryn Campbell-Kibler, Professor Cynthia G. Clopper

Table of Contents

Abstract
Acknowledgements
Vita
List of Figures
List of Tables

1 Introduction
  1.1 Terminology
  1.2 Lexically-guided perceptual learning
  1.3 Linking hypotheses: Generalization and the mechanisms of perceptual learning
  1.4 Overview of experiments and predictions

2 Experiments 1-3: Effects of Input Variability and Talker Similarity on Generalization across Words and Talkers
  2.1 Introduction
    2.1.1 Perceptual learning of isolated cross-category mismatches
    2.1.2 Perceptual learning of vowel chain shifts
    2.1.3 Factors influencing generalization across words and talkers
    2.1.4 Overall design and predictions
  2.2 Experiment 1
    2.2.1 Method
    2.2.2 Results
    2.2.3 Discussion
  2.3 Experiment 2
    2.3.1 Method
    2.3.2 Results
    2.3.3 Discussion
  2.4 Experiment 3
    2.4.1 Method
    2.4.2 Results
    2.4.3 Discussion
  2.5 Omnibus analyses
    2.5.1 Omnibus analysis of endorsement rates
    2.5.2 Omnibus analysis of word identification accuracy
  2.6 General Discussion
    2.6.1 Theories of adaptive speech processing
    2.6.2 Individual variation in word recognition
    2.6.3 Conclusions

3 Experiment 4: Generalization to Untrained Parts of a Chain Shift System
  3.1 Introduction
  3.2 Method
    3.2.1 Participants
    3.2.2 Exposure materials
    3.2.3 Test materials
    3.2.4 Procedure
    3.2.5 Coding
    3.2.6 Analysis
  3.3 Results
    3.3.1 Lexical decision endorsement rates
    3.3.2 Naming accuracy
    3.3.3 Naming latency
  3.4 Discussion

4 Experiments 5-6: Generalization Across Chain Shift Systems
  4.1 Introduction
  4.2 Experiment 5
    4.2.1 Method
    4.2.2 Results
    4.2.3 Discussion
  4.3 Experiment 6
    4.3.1 Method
    4.3.2 Results
    4.3.3 Discussion
  4.4 General Discussion

5 Conclusions and theoretical implications
    5.0.1 Category shifts, category broadening and response biases
  5.1 Implications for speech processing
  5.2 Implications for cognitive processing
  5.3 Implications for language change
  5.4 Implications for sociolinguistics
  5.5 Limitations and future directions

References

A Stimulus materials for Experiments 1-3
  A.1 Complete high token variability exposure passage for Experiment 1
  A.2 Complete medium token variability exposure passage for Experiment 2
  A.3 Complete low token variability exposure passage for Experiment 3
  A.4 Test materials

B Stimulus materials for Experiment 4
  B.1 Exposure passage transcribed in the novel back vowel lowered accent
  B.2 Exposure passage transcribed in the novel back vowel raised accent
  B.3 Test materials

List of Figures

1.1 Schematic representation of the novel front vowel shifts used in the study by Weatherholtz (2013).

1.2 Schematic representation of the novel back vowel lowered chain shift.

2.1 Schematic representation of the novel back vowel lowered chain shift.

2.2 Experiment 1. Mean F1 and F2 (Hz) at vowel midpoint of all stressed tokens of each vowel in the exposure passage when produced in the talker’s standard-sounding Midland accent (phonetic symbols) and in the novel back vowel lowered accent (arrows). Diphthongs and schwa are omitted for clarity.

2.3 Experiment 1. Midpoint F1 and F2 for all stressed back vowel tokens in the standard-sounding Midland accent version of the exposure passage and the novel back vowel lowered (BVL) version of the passage (n = the number of tokens, and l.c. = the number of unique lexical contexts in which those tokens occurred).

2.4 Acoustic comparison of the trained and new talkers’ vowel spaces based on the midpoint F1-F2 (Hz) of all stressed vowel tokens from the 100 filler words containing unshifted front and back vowels. The top panel compares the new female talker to the trained female talker (note the high degree of acoustic similarity between these talkers’ vowel spaces). The bottom panel compares the new male talker to the trained female talker (note the acoustic dissimilarity between these talkers’ vowel spaces).

2.5 Production of the back vowel lowered shift by each of the test talkers. Phonetic symbols indicate the mean midpoint F1 and F2 (Hz) of each talker’s normal vowels, based on the test items containing unshifted vowels. Arrows indicate the mean midpoint F1 and F2 of vowels in the back vowel lowered test items.

2.6 Experiment 1. Mean proportion of ‘word’ responses by item type, exposure condition, and test talker.

2.7 Experiment 1, mean proportion of ‘word’ responses for target back vowel lowered items by item status, exposure condition, and test talker.

2.8 Experiment 1, mean trial-wise proportion of ‘word’ responses for the target back vowel lowered items during the lexical decision task as a function of exposure condition and talker.

2.9 Experiment 1, mean word identification accuracy as a function of exposure condition, item type and test talker.

2.10 Experiment 2. Midpoint F1 and F2 for all stressed back vowel tokens in the medium token variability exposure passage (n = the number of tokens of each vowel, and l.c. = the number of unique lexical contexts in which those tokens occurred).

2.11 Experiment 2. Mean proportion of ‘word’ responses by item type, exposure condition, and test talker.

2.12 Experiment 2, mean proportion of ‘word’ responses for target back vowel lowered items by item status, exposure condition, and test talker. Large points indicate condition grand means. Error bars indicate bootstrapped 95% confidence intervals. Small points indicate subject-wise condition means (jittered and transparent to show overlap).

2.13 Experiment 2. Mean trial-wise proportion of ‘word’ responses for the target back vowel lowered items during the lexical decision task as a function of exposure condition and talker. Points indicate trial-wise means. Regression lines indicate binomial best fit curves (curvature is not observable because the proportion range for each effect is small). Error ribbons indicate bootstrapped trial-wise 95% confidence intervals.

2.14 Experiment 2. Mean word identification accuracy as a function of exposure condition, item type and test talker. Error bars indicate bootstrapped 95% confidence intervals on condition means.

2.15 Experiment 3. Midpoint F1 and F2 for all stressed back vowel tokens in the low token variability exposure passage (n = the number of tokens of each vowel, and l.c. = the number of unique lexical contexts in which those tokens occurred).

2.16 Experiment 3. Mean proportion of ‘word’ responses by item type, exposure condition, and test talker.

2.17 Experiment 3, mean proportion of ‘word’ responses for target back vowel lowered items by item status, exposure condition, and test talker. Large points indicate condition grand means. Small points indicate subject-wise condition means (jittered and transparent to show overlap). Error bars indicate bootstrapped 95% confidence intervals for condition grand means.

2.18 Experiment 3. Mean trial-wise proportion of ‘word’ responses for the target back vowel lowered items during the lexical decision task as a function of exposure condition and talker. Points indicate trial-wise means. Regression lines indicate binomial best fit curves (curvature is not observable because the proportion range for each effect is small). Error ribbons indicate bootstrapped trial-wise 95% confidence intervals.

2.19 Experiment 3. Mean word identification accuracy as a function of exposure condition, item type and test talker. Error bars indicate bootstrapped 95% confidence intervals on condition means.

3.1 Schematic representation of the Northern Cities Shift.

3.2 Schematic representation of the back vowel lowered chain shift (left) and back vowel raised chain shift (right).

3.3 Experiment 4. Vowel plots showing the talker’s vowel productions when recording the exposure passage in each novel accent. Arrows indicate the mean F1 and F2 (Hz) at vowel midpoint of all stressed tokens of each vowel in the back vowel lowered (left) and back vowel raised (right) versions of the exposure passage. Phonetic symbols indicate the corresponding vowel measurements from a third version of the passage recorded in the talker’s normal accent to provide a reference point for the target vowel shifts.

3.4 Experiment 4, mean proportion of ‘word’ responses during lexical decision by item type and exposure condition. Error bars indicate bootstrapped 95% confidence intervals.

3.5 Experiment 4, mean proportion of ‘word’ responses during lexical decision for the target back vowel lowered and back vowel raised items, plotted by exposure condition and target vowel. Error bars indicate bootstrapped 95% confidence intervals.

3.6 Experiment 4, mean proportion of correct responses during the go/no-go naming task by item type and exposure condition. Error bars indicate bootstrapped 95% confidence intervals.

3.7 Experiment 4, mean naming accuracy for the target back vowel lowered and back vowel raised items, plotted by exposure condition and target vowel. Error bars indicate bootstrapped 95% confidence intervals.

3.8 Experiment 4, mean naming response time for correct responses to the test items containing unshifted (standard-sounding) and shifted back vowels, plotted by exposure condition and target vowel. Error bars indicate bootstrapped 95% confidence intervals.

4.1 Vowel plot from a study by Hillenbrand, Getty, Clark, & Wheeler (1995) showing the mean F1 and F2 of American English vowels as produced by 140 talkers. Note that to improve the clarity of the display, Hillenbrand et al. (1995) did not plot the vowels /e/ and /o/, and redundant data points were omitted.

4.2 Schematic representation of the co-dependent vowel shifts that characterize the Northern Cities Shift (left) and the Northern California Shift (right) in American English.

4.3 Experiment 5, schematic representation of the trained talker’s back vowel lowered accent (left) and the new talker’s back vowel raised accent (right).

4.4 Production of the novel back vowel lowered chain shift by the trained talker (left), and production of the novel back vowel raised chain shift by the new talker (right).

4.5 Schematic representation of the exposure-test paradigm for Experiment 5.

4.6 Experiment 5, mean proportion of ‘word’ responses during lexical decision for the target vowel-shifted items.

4.7 Experiment 5, mean trial-wise proportion of ‘word’ responses during lexical decision for the target vowel-shifted items.

4.8 Experiment 5, mean word identification accuracy as a function of exposure condition, item type and test talker.

4.9 Experiment 6. Schematic representation of the trained talker’s back vowel lowered accent (left) and the new talker’s front vowel lowered accent (right).

4.10 Production of the novel back vowel lowered chain shift by the trained talker (left; same as Experiment 5), and production of the novel front vowel lowered chain shift by the new talker (right).

4.11 Experiment 6, mean proportion of ‘word’ responses during lexical decision for the target vowel-shifted items.

4.12 Experiment 6, mean trial-wise proportion of ‘word’ responses during lexical decision for the target vowel-shifted items.

4.13 Experiment 6, mean word identification accuracy as a function of exposure condition, item type and test talker.

List of Tables

2.1 Number of tokens of each target back vowel category in the exposure materials for Experiments 1-3, and the number of unique lexical contexts in which these tokens occurred.

2.2 Mean midpoint F1 and F2 (and standard deviation in parentheses) of stressed back vowel tokens in the novel back vowel lowered accent (BVL) and standard-sounding Midland accent versions of the exposure passage used in each experiment.

2.3 Experiments 1-3, example test stimuli.

2.4 Experiment 1, summary of the full mixed logit model of endorsement rates on target lexical decision trials.

2.5 Experiment 1, summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials.

2.6 Experiment 1, summary of the full mixed logit model of word identification accuracy.

2.7 Experiment 2, summary of the full mixed logit model of endorsement rates on target lexical decision trials.

2.8 Experiment 2, summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials.

2.9 Experiment 2, summary of the full mixed logit model of word identification accuracy.

2.10 Experiment 3, summary of the full mixed logit model of endorsement rates on target lexical decision trials.

2.11 Experiment 3, summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials.

2.12 Experiment 3, summary of the full mixed logit model of word identification accuracy.

2.13 Experiments 1-3, summary of the omnibus analysis of endorsement rates on target lexical decision trials.

2.14 Experiments 1-3, summary of the omnibus analysis of word identification accuracy.

3.1 Experiment 4, example test stimuli.

3.2 Experiment 4, summary of the full mixed logit model of endorsement rates on target lexical decision trials.

3.3 Experiment 4, summary of the full mixed logit model of identification accuracy on target naming trials.

3.4 Experiment 4, summary of the full mixed linear model of RTs on correct naming trials.

4.1 Example test stimuli for Experiment 5.

4.2 Experiment 5, mean proportion of ‘word’ responses during the lexical decision task.

4.3 Experiment 5, summary of the full mixed logit model of lexical decisions to the target vowel-shifted items.

4.4 Experiment 5, summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials.

4.5 Experiment 5, summary of the full mixed logit model of word identification accuracy.

4.6 Example test stimuli for Experiment 6.

4.7 Experiment 6, mean proportion of ‘word’ responses during the lexical decision task.

4.8 Experiment 6, summary of the full mixed logit model of lexical decisions to the target vowel-shifted items.

4.9 Experiment 6, summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials.

4.10 Experiment 6, summary of the full mixed logit model of word identification accuracy.

A.1 Experiments 1-3. Complete set of target back vowel lowered items.

A.2 Experiments 1-3. Complete set of filler words pronounced with unshifted (standard-sounding) vowels.

A.3 Experiments 1-3. Complete set of maximal nonwords.

B.1 Experiment 4. Complete set of test words for the back vowel lowered and back vowel raised item types.

B.2 Experiment 4. Complete set of filler words pronounced with unshifted (standard-sounding) vowels.

B.3 Experiment 4. Complete set of maximal nonwords.

Chapter 1

Introduction

A central topic in research on spoken word recognition concerns the delineation of mental processes that enable listeners to achieve both stability and flexibility (Dahan & Magnuson,

2006). To recognize spoken words, listeners must map an acoustic signal produced by a given talker onto the corresponding memory representation in their own mental lexicon.

This process is complicated by the fact that the acoustic realization of speech sounds varies considerably within and across talkers (Klatt, 1986). Undoubtedly source variation, such as a talker’s idiolect or accent, can be detrimental to speech processing (Mattys, Davis,

Bradlow, & Scott, 2012). However, listeners can correctly recognize spoken words characterized by nonstandard or otherwise atypical variation, sometimes with minimal (if any) additional processing cost (Connine, Titone, Deelman, & Blasko, 1997; Connine, Blasko,

& Wang, 1994; Connine, Blasko, & Titone, 1993; Davis, Marslen-Wilson, & Gaskell, 2002;

Frauenfelder, Scholten, & Content, 2001; LoCasto & Connine, 2011; Ranbom, Connine, &

Yudman, 2009; Ranbom & Connine, 2007; Sumner & Samuel, 2009, 2005; Utman, Blumstein, & Burton, 2000). Collectively, this work indicates that the mental processes involved in spoken word recognition are remarkably flexible with respect to pronunciation variation.

Three general properties of the speech perception system enable listeners to cope with pronunciation variation. One property is that the speech perception system rapidly adapts to new talkers and atypical pronunciation variants by adjusting the perceptual representation of speech categories based on patterns of variation in the input (for a review of recent literature, see Samuel & Kraljic, 2009). As evidence of such adaptation, recognition of spoken words tends to be slower and less accurate when words are produced by unfamiliar talkers than by familiar talkers (Nygaard & Pisoni, 1998; Nygaard, Sommers, & Pisoni,

1994), especially when listening to unfamiliar talkers who speak with an unfamiliar accent

(Adank, Evans, Stuart-Smith, & Scott, 2009; Maye et al., 2008; Floccia, Goslin, Girard, &

Konopczynski, 2006). However, the processing costs associated with an unfamiliar talker’s accent diminish rapidly with experience (Clarke & Garrett, 2004). The second property of the speech perception system is the ability to generalize learning about particular talkers and pronunciation variants to untrained sources of variation, such as new talkers producing the same or structurally similar patterns of variation (Reinisch & Holt, 2014; Baese-Berk,

Bradlow, & Wright, 2013; Bradlow & Bent, 2008; Kraljic & Samuel, 2007, 2006; Lively,

Logan, & Pisoni, 1993). The third general property involves detailed memory for speech episodes: instance-specific information is stored in memory for long periods of time, which facilitates recognition when words are repeated by the same talker (Bradlow, Nygaard,

& Pisoni, 1999; Goldinger, 1996; Palmeri, Goldinger, & Pisoni, 1993; Church & Schacter,

1994).

While these general properties of the speech perception system promote recognition

flexibility, coping with pronunciation variation is not a singular process (Loebach, Bent, &

Pisoni, 2008). The perceptual adjustments that drive adaptation can occur at various levels of representational specificity: e.g., talker-specific or talker-independent representations of specific words, segments, or features (e.g., see Trude & Brown-Schmidt, 2012; Kraljic & Samuel, 2007, 2006; Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005; Eisner & McQueen, 2005; Greenspan, Nusbaum, & Pisoni, 1988). The nature of these adjustments is varied, as well: e.g., broadening or narrowing the perceptual representation of speech categories to account for a greater or narrower range of category-relevant variability in the input (e.g., Bertelson, Vroomen, & de Gelder, 2003; Norris, McQueen, & Cutler, 2003; Vroomen, van Linden, Keetels, de Gelder, & Bertelson, 2004), developing new abstract category representations (Ranbom & Connine, 2007; Logan, Lively, & Pisoni, 1991), selectively attending to specific properties of the input that are relevant for distinguishing perceptual categories (Francis & Nusbaum, 2002), or relying on episodic memory to facilitate recognition of repeated forms (Goldinger, 1996; Palmeri et al., 1993; Church &

Schacter, 1994). Further, the likelihood that learning generalizes to untrained sources of variation depends, in part, on environmental factors, such as the range of variability that listeners initially experienced (Baese-Berk et al., 2013; Bradlow & Bent, 2008; Lively et al.,

1993; Greenspan et al., 1988, see also Posner & Keele, 1968) and the degree of similarity between the trained and new speech episodes (Reinisch & Holt, 2014; Kraljic & Samuel,

2007; Goldinger, 1996).

Over the last decade, researchers in perceptual learning for speech have made tremendous advances in understanding the dynamics of adaptation: that is, the nature and specificity of the perceptual adjustments that occur in response to different patterns of variation in the input, and the environmental factors that facilitate or constrain generalization (for relevant overviews, see Mattys et al., 2012; Samuel & Kraljic, 2009). In this line of work, studies on perceptual learning of native pronunciation variation have focused largely on within-category variation due to a talker’s idiolect (Eisner, Melinger, & Weber, 2013; Mitterer, Scharenborg, & McQueen, 2013; Mitterer, Chen, & Zhou, 2011; Kraljic, Brennan, & Samuel, 2008; Kraljic & Samuel, 2007, 2006; Eisner & McQueen, 2006, 2005; Bertelson et al., 2003; Norris et al., 2003). Comparatively little is known about how listeners cope with cross-category mismatches (cf. Trude & Brown-Schmidt, 2012; Sumner, 2011; Dahan, Drucker, & Scarborough, 2008). Yet, category mismatches abound across accents and dialects due to myriad phonological processes that result in one speech sound being realized with acoustic-phonetic properties typically associated with a different speech sound: vowel splits, such as the selective raising of the vowel /æ/ before /g/ (e.g., bag sounds more like “b[e@]g”; Zeller, 1997); vowel mergers, such as the realization of the vowel /O/ as [A] by talkers with the caught-cot merger (Labov, Ash, & Boberg, 2006); th-fronting and th-stopping, such as the realization of think as “[f]ink” or “[t]ink”, respectively (Kerswill, 2003; Wells, 1982); and vowel chain shifts, which are complex systems of cross-category variation that involve multiple vowels shifting positions in acoustic-phonetic space in a codependent manner (Labov, 1994).

Compared to within-category (subphonemic) variation, cross-category mismatches deviate to a greater degree from category prototypes, which causes greater disruption to word recognition processes (e.g., Connine et al., 1997). Further, cross-category mismatches are a source of considerable phonological and lexical confusion in cross-dialect speech perception (Jacewicz & Fox, 2012; Clopper, Pierrehumbert, & Tamati, 2010). Consider the consequences of two vowel chain shifts that occur in different varieties of American English. The

Northern Cities Shift is a system of cross-category vowel shifts that involves the vowel /A/

(as in sod) being fronted to sound more like the vowel [æ], and hence the word sod in this variety sounds more like the word sad produced by a talker with a standard-sounding

American English accent. The Northern California Shift involves the vowel /E/ (as in said)

being lowered to sound more like [æ]. Thus, listeners must be able to map a word form

like “s[æ]d” to multiple different lexical representations in memory depending on the accent

of the talker. Without prior knowledge of the talker’s accent, such cross-category variation impairs word recognition (Clopper et al., 2010), though listeners can rapidly adapt to

a talker’s cross-category vowel shifts, which in turn facilitates word recognition (Trude &

Brown-Schmidt, 2012; White & Aslin, 2011; Maye et al., 2008).

The overarching goal of this study was to provide a more nuanced understanding of the

learning mechanism that enables listeners to cope with cross-category pronunciation variation. These experiments focused on systemic cross-category vowel variation resulting from

vowel chain shifts—specifically on instances when cross-category vowel shifts cause spoken

words to be perceived as nonwords by listeners who are not familiar with the underlying

pattern of variation. For example, the word dot produced by a talker with the Northern

Cities Shift sounds like the nonword “dat” in standard American English, given the realization of /A/ as [æ]. The six experiments comprising this study investigated three questions concerning the learning mechanism that drives adaptation to systemic cross-category vowel variation: (i) where in the speech perception system (i.e., at what level of representational specificity) does learning occur; (ii) what is the nature of the exposure-driven adjustments that occur at that level of representational specificity; and (iii) how does learning differ depending on properties of the learning environment? The current experiments used generalization of perceptual learning under a range of exposure and test conditions as a method to address these three questions. Specifically, these experiments investigated generalization along three dimensions: generalization to new words, new talkers, and untrained vowel shifts. Further, these experiments investigated three factors that potentially mediate generalization: the degree of category-relevant variability experienced prior to test; the degree of acoustic similarity between the trained and new talkers in terms of their vowel productions; and the structural similarity between the trained and new vowel shifts.

Before outlining the current set of experiments, I define a series of terms that are important for this work. Next, I review research on lexically-guided perceptual learning, which provides the empirical foundation for the current experiments. I then delineate the linking hypotheses that relate observable patterns of generalization behavior (i.e., outcomes of adaptation) to the underlying mechanisms that drive adaptation. Finally, this chapter concludes with an overview of the design and predictions of the experiments that comprise this thesis.

1.1 Terminology

The terms perceptual learning and adaptation have been variously defined in research on

speech processing (and more generally in the field of cognitive psychology). In some cases,

perceptual learning and adaptation are defined as related but distinct phenomena, with

the critical distinction based on the persistence of exposure-driven perceptual adjustments

over time (see Goldstone, 1998). According to this view, perceptual learning refers to long-lasting changes in how the perceptual system processes incoming stimulus materials, while adaptation refers to relatively short-term adjustments (Helson, 1948). In other cases, the critical distinction is based on the flow of information that drives perceptual adjustments, with perceptual learning guided by top-down knowledge, while adaptation refers to perceptual adjustments driven strictly by bottom-up information (see Eisner, 2012). In yet other cases, adaptation and perceptual learning are treated as different perspectives on the same cognitive phenomena: adaptation is considered the behavioral outcome of a learning mechanism that tracks and responds to properties of the environment (see Fine & Jaeger, 2013; Bradlow & Bent, 2008; Maye et al., 2008). For the purposes of the current discussion, I follow the last of these views. I use the term adaptation as a general cover term for behavioral changes that result from experience with pronunciation variation (e.g., observable changes in word recognition performance). I use the term perceptual learning to refer to the cognitive mechanisms that cause these behavioral changes: that is, the underlying adjustments to perceptual recognition processes. As indicated above, there are multiple mechanisms that can drive adaptation, and these mechanisms can operate at various levels of representational specificity. I use phrases like the nature of perceptual learning or the

nature of perceptual adjustments when discussing the mechanism(s) driving adaptation in

a particular situation. I use the phrase locus of learning to refer to the level of representational specificity at which a learning mechanism operates in a particular situation (see

McQueen, Cutler, & Norris, 2006).

I maintain a distinction between the terms sublexical and prelexical. I use the term

sublexical as a general term for any unit of spoken language that is structurally smaller than

a word (e.g., segments, features). Thus, a sublexical locus of learning involves exposure-

driven adjustments that affect the perception or representation of sublexical information.

I reserve the term prelexical as a qualifier to describe processes and representations that

influence access to lexical candidates in memory during online word recognition. Whereas

sublexical is a theory-neutral term that simply distinguishes perceptual units of different size, prelexical entails assumptions about the manner and timing with which linguistic representations of different sized units interact during online spoken word recognition (see

Norris, McQueen, & Cutler, 2000).

The speech stimuli used for the current experiments involve novel complex systems of cross-category vowel variation. I characterize these systems as vowel chain shifts because multiple vowels are shifted co-dependently in acoustic-phonetic space, and such co-dependent vowel movement is the defining characteristic of chain shifts (Martinet, 1955). A

number of structural principles have been argued to guide the chain shifting of vowels, based

on large-scale studies of diachronic and synchronic vowel shifts (Labov, 1994). It should be

noted from the outset that the novel vowel chain shifts used here do not fully obey these

structural principles. However, as Labov explains, “[t]he principles of chain shifting are not

absolute. Even for the strongest ones, ... exceptional changes were found operating in the

opposite direction” (Labov, 1994, p. 155). Thus, it is not inherently problematic that some

of the vowel shifts comprising the novel chain shifts used here are exceptional with respect

to general principles of vowel chain shifting. I return to this issue in Chapter 5, where I

discuss the implications of the current results for theories of language variation and change.

Naturally occurring chain shifts are often named after the geographic region in which

these shifts occur: e.g., the Northern Cities Shift, the Southern Shift, the Northern California Shift (Labov et al., 2006). Since the chain shifts used here are novel systems designed

to probe specific aspects of speech processing, an alternative strategy is needed to refer to

these systems. I use unidimensional descriptors like lowering and raising to provide a broad characterization of the overall pattern of vowel movement due to a novel chain shift in a particular region of the vowel space (e.g., “back vowel lowering”, “back vowel raising”). This approach, as with most naming conventions, has several benefits but also several drawbacks.

One benefit of this approach is that the chain shifts are named succinctly and in a manner that is consistent with related psycholinguistic research on adaptation to vowel chain shifts

(Maye et al., 2008). A second benefit is that this naming convention evokes the desired sense of structural similarity across different chain shifts, which is helpful when discussing potential generalization of learning to untrained but structurally-related vowel shifts. For example, shifting the tense high back vowel /u/ to sound more like the lax high back vowel

[U], which is part of the novel “back vowel lowered” chain shift used in Chapters 2, 3 and 4, is structurally parallel to shifting the tense high front vowel /i/ to sound more like the lax high front vowel [I], which is part of the novel “front vowel lowered” chain shift (Chapter 4), despite the fact that these sets of vowels differ in backness and lip roundedness. Further, the “lowering” of /u/ to [U] is structurally opposite from “raising” /U/ to [u] in the novel

“back vowel raised” chain shift (Chapters 3 and 4).

The main drawback of this naming approach is that the names of the novel vowel chain shifts are highly reductive. Vowels can be described, to an approximation, by the frequency of the first and second formants (F1 and F2). F1 is inversely correlated with tongue height, and F2 is correlated primarily and inversely with tongue retraction (Ladefoged, 2000): that is, F1 decreases as the height of the tongue increases during vowel production, and

F2 decreases as the tongue is shifted back in the mouth. Referring to the novel chain shifts with names like “back vowel lowering” and “back vowel raising” suggests that the relevant vowel shifts are defined along a single articulatory or acoustic dimension (vowel height or F1), rather than involving complex patterns of movement through a continuous multidimensional acoustic parametric space. Indeed, as will be shown in the experimental chapters below, the cross-category vowel shifts that comprise the novel chain shifts for each experiment involve simultaneous variation in F1 and F2, as well as variation in vowel duration and trajectory. Labov points out that “[a]s an empirical fact, we find that the great majority of raisings are not simple alterations of F1, but rather combine changes in both F1 and F2 along this dimension” (Labov, 1994, p. 160). Thus, while the current naming conventions are reductive, this terminological issue is a general one that pervades descriptions of vowel systems across branches of the speech sciences (see, e.g., Benson, Fox,

& Balkman, 2011; Bauer & Parker, 2008; Purnell, 2008; Ahn, 2001; Ohala, Beddor, Krakow,

& Goldstein, 1986).

1.2 Lexically-guided perceptual learning

The empirical foundation for the current study comes from a recent body of research on lexically-guided perceptual learning of segmental variation. To date, this line of research has focused largely on perceptual learning of within-category idiolectal variation, though a few studies have expanded the empirical framework to investigate perceptual learning of cross-category variation (Maye et al., 2008; Weatherholtz, 2013; White & Aslin, 2011).

In a now-classic paper, Norris, McQueen and Cutler (2003) found that listeners use lexical knowledge to dynamically recalibrate phonetic category boundaries in response to atypical segmental variation in the speech input. During the initial phase of their experiment, which was conducted in Dutch, listeners repeatedly heard a talker produce a fricative sound that was acoustically ambiguous between [s] and [f]. One group of listeners heard this sound in the context of /s/-final Dutch words with no /f/-final counterpart (e.g., in English, “platypu[sf?]” for platypus), and thus were biased to interpret the acoustically ambiguous fricative as /s/. Another group of listeners heard the same sound in the context of /f/-final words with no /s/-final counterpart (e.g., in English, “gira[sf?]” for giraffe), and thus were biased to interpret the acoustically ambiguous fricative as /f/. These words were presented in the context of an auditory lexical decision task. The endorsement rate for these words was at ceiling for both groups, indicating that listeners were able to recognize the words despite the atypical fricative variant. Following this initial exposure phase, listeners performed a phonetic categorization task. Listeners in the /s/-biased group categorized a greater proportion of sounds on an [s]-to-[f] continuum as /s/, indicating that they broadened their perceptual representation of /s/ to include the otherwise ambiguous fricative token. Listeners in the /f/-biased group showed the reverse pattern, categorizing a greater proportion of the continuum as /f/. This “phonetic recalibration” effect did not occur when listeners initially heard the acoustically ambiguous fricative in the context of Dutch nonwords. Thus, listeners used lexical knowledge to disambiguate the fricative variant and guide category recalibration.

Over the last decade, many follow-up studies have been conducted, providing an increasingly nuanced understanding of the learning mechanism that drives adaptation to within-category segmental variation and the environmental factors that constrain learning and generalization. In addition to being guided by lexical knowledge, category recalibration can be guided by visual information about the physical production of speech sounds (Bertelson et al., 2003) and statistical knowledge about contingencies among acoustic-phonetic cues

(Idemaru & Holt, 2011). However, recalibration does not occur if listeners have reason to believe the atypical variant was caused by extraneous factors, like the talker having a pen in her mouth, as opposed to being a property of the talker’s idiolect (Kraljic, Samuel, &

Brennan, 2009). Several studies have shown that hearing an acoustically ambiguous fricative variant in the context of specific words facilitates later recognition of untrained words containing the same variant (Sjerps & McQueen, 2010; McQueen et al., 2006). This finding of generalization to new words containing the trained variant indicates that listeners abstracted over the trained words to learn a sublexical pattern of variation (note that the phonetic categorization results reported by Norris et al., 2003, in the original formulation of the phonetic recalibration paradigm also indicate a sublexical locus of learning). However, in these studies, when adapting to atypical fricative variation under single talker exposure conditions, listeners did not fully abstract away from the acoustic details of the talker’s productions. If they did, learning should generalize to new talkers who produce the same pattern of fricative variation, but such generalization only occurs if the new talker’s fricative productions are acoustically very similar to those of the trained talker (Reinisch & Holt,

2014; Kraljic & Samuel, 2007, 2005; Eisner & McQueen, 2005, though see Kraljic & Samuel,

2006 for talker-independent learning of voice onset time variation for stop consonants). Further, there is evidence that listeners can leverage learning about specific consonant contrasts

(e.g., a talker’s atypical voicing contrast for /t/-/d/) to interpret untrained but structurally similar patterns of consonant variation (e.g., the voicing contrast for /p/-/b/).

Maye et al. (2008) modified the paradigm developed by Norris et al. (2003) for studying lexically-guided phonetic recalibration in order to investigate the learning mechanism that enables listeners to cope with cross-category variation, as in the case of vowel chain shifts.

Maye et al.’s (2008) study is discussed in detail because it directly motivates the current experiments. Maye et al. (2008) created a novel “front vowel lowered” chain shift by rotating

English front vowels in a counterclockwise pattern: /i/ was shifted to sound like [I] (e.g.,

beetle as “b[I]tle”); /I/ was shifted to sound like [E] (e.g., witch as “w[E]tch”); /E/ was shifted

to sound like [æ] (e.g., yellow as “y[æ]llow”); and /æ/ was shifted to sound like [A] (e.g., answer as “[A]nswer”). Adaptation to the front vowel lowered accent was assessed using a multi-session exposure-test paradigm. Each session comprised a 20-minute accent familiarization phase, during which participants passively listened to an excerpt from The Wizard of Oz, followed by an auditory lexical decision task, which was designed to assess perception of words pronounced in the front vowel lowered accent. The target items for the lexical decision task were designed to sound like nonwords in standard American English as a result of the target vowel shifts (e.g., “w[E]tch” for witch). It should be noted that all exposure and test materials were produced by text-to-speech synthesis. The two sessions of the experiment were identical, except for the exposure phase. For the first session, the exposure passage was produced by a synthesized voice with an Inland accent (the regional accent in upstate New York where the experiment was conducted). Thus, the lexical decision data from the first session established the baseline rate at which listeners perceived the unfamiliar front vowel lowered items as real words of English, despite having no experience with the front vowel lowered accent. For the second session, which occurred one to three days after the first session, the exposure passage was produced by the same synthesized voice but in the novel front vowel lowered accent. Adaptation was measured as the change across sessions in lexical decisions to the target front vowel lowered items (i.e., the change in the proportion of ‘word’ responses).
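The shift pattern and the adaptation measure described above can be illustrated with a brief Python sketch. This is only a hedged illustration under simplifying assumptions, not the analysis reported by Maye et al. (2008): the names FRONT_VOWEL_LOWERING, word_rate, and adaptation are hypothetical, the response lists are invented for the example, and the vowel symbols follow the ASCII transcription used in this chapter.

```python
# Front vowel lowered chain shift as described above: each canonical
# vowel is realized as the next vowel down the chain.
# (Hypothetical name; ASCII symbols follow this chapter's transcription.)
FRONT_VOWEL_LOWERING = {"i": "I", "I": "E", "E": "æ", "æ": "A"}


def word_rate(responses):
    """Proportion of 'word' responses in a list of lexical decisions."""
    if not responses:
        raise ValueError("no responses")
    return sum(r == "word" for r in responses) / len(responses)


def adaptation(session1, session2):
    """Change in endorsement rate from session 1 (baseline) to session 2."""
    return word_rate(session2) - word_rate(session1)


# Invented responses to the same four target items in each session.
baseline = ["nonword", "nonword", "word", "nonword"]
post_exposure = ["word", "word", "word", "nonword"]
print(adaptation(baseline, post_exposure))  # 0.75 - 0.25 = 0.5
```

A positive value indicates that more vowel-shifted items were endorsed as words after exposure to the accent than at baseline.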

Maye et al. predicted that the target front vowel lowered items (e.g., “w[E]tch”) would be difficult to recognize without exposure to the front vowel lowered accent, resulting in a high ‘nonword’ response rate for these items during the lexical decision task on the first day of testing. Further, Maye et al. predicted that by hearing nonword surface forms like

“w[E]tch” for witch in the context of a familiar story, listeners could use top-down knowledge to identify the intended lexical items and in turn to guide learning of the unfamiliar vowel variants (cf. Norris et al., 2003). The lexical decision data showed several interesting results.

As expected, front vowel lowered items that were perceived as nonwords on the first day of testing came to be perceived as real words after exposure to the front vowel lowered accent. Further, the change in endorsement rates across sessions was comparable for front vowel lowered items that occurred during the exposure passage (trained items) and front vowel lowered items that occurred only during the test phases (new items), which suggests that listeners remapped their perceptual vowel space in response to the unfamiliar accent (a sublexical locus of learning), as opposed to learning specific word forms from the exposure passage (see McQueen et al., 2006).

Maye et al.’s (2008) results also provide insight into the nature of the exposure-driven sublexical adjustments. In their Experiment 1, exposure to the novel front vowel lowered accent influenced perception of words pronounced with an untrained but structurally parallel

“back vowel lowered” chain shift (e.g., choose as “ch[U]se”, due to the lowering of /u/ to

[U]; look pronounced as “l[o]k”, due to the lowering of /U/ to [o]) (see Bardhan, Aslin, &

Tanenhaus, 2006, for an unpublished replication of this effect). Maye et al. discussed the possibility that remapping the vowel space to accommodate the front vowel lowered variants resulted in some degree of perceptual generalization across the vowel space. However, they

ultimately concluded that perceptual learning involved targeted vowel-specific and direction-specific perceptual adjustments after demonstrating in a second experiment that exposure to the front vowel lowered accent had no influence on perception of the back vowel lowered items, nor on perception of words pronounced with an untrained “front vowel raised” chain shift (e.g., witch as “w[i]tch”, given the raising of /I/ to [i], as opposed to the trained lowering of /I/ to [E]). A final noteworthy result from Maye et al.’s (2008) study is that filler words pronounced with unshifted (standard-sounding) front vowels (e.g., shelf as “sh[E]lf”) were

still perceived as words after exposure to the novel front vowel lowered accent, rather than

as the nonwords that result from applying knowledge of the front vowel lowered variants

(e.g., [E] maps to /I/, so “sh[E]lf” is the nonword “shilf”), which suggests that listeners were

able to maintain multiple mappings for how vowel forms relate to phonological categories

(see also Trude & Brown-Schmidt, 2012).

While Maye et al.’s (2008) results are highly interesting, there are several important

caveats when interpreting their results. First, the experiment involved a within-subjects

design, and all test stimuli were repeated across sessions. Thus, the finding of increased endorsement rates for the front vowel lowered items after exposure to the front vowel lowered

accent (which always occurred on the second day of testing) could be due, at least in part, to

stimulus repetition across sessions (see Zeelenberg, Wagenmakers, & Shiffrin, 2004, for discussion of nonword repetition resulting in word-like priming effects during lexical decision).

A relevant point is that episodic traces of nonword surface forms are retained in memory

for at least a week (see Goldinger, 1998, 1996), and Maye et al.’s experiment sessions were

only separated by one to three days. Maye et al. discuss the issue of stimulus repetition

and argue that a repetition account would predict increased endorsement of the filler items

pronounced with the untrained vowel shifts (e.g., the back vowel lowered and front vowel

raised items), which was not consistently observed. While it is unlikely that their results

reduce to repetition effects, it is not possible to distinguish the (independent) influences of

perceptual learning and nonword episodic encoding on listeners’ task performance. A second caveat is that all materials were produced by low-grade concatenative text-to-speech synthesis using the built-in synthesizer on Macintosh computers (with modifications to the pronunciation dictionary, as necessary, to create the novel vowel-shifted items). Processing spectrally degraded speech, such as synthesized speech, is difficult for human listeners (Mattys et al., 2012). Relative to human speech, processing synthesized speech involves increased short-term memory demands (Smither, 1993) and different patterns of electrophysiological activity (Lattner et al., 2003); further, listeners show a dramatically impaired ability to use acoustic-phonetic cues to predict linguistic structure during online processing (White et al., 2009), even for high quality synthesized speech that is impressionistically nearly indistinguishable from human speech. These findings about processing of synthesized speech raise the question of whether adaptation to synthesized speech reflects the processes and outcomes involved in adapting to human speech.

In a currently unpublished follow-up to Maye et al.’s (2008) study, Weatherholtz (2013) investigated perceptual learning of systemic cross-category vowel variation produced naturally by a human talker. The exposure and test materials were modeled on those used by

Maye et al., but the experimental design was modified to involve a single session of testing: that is, listeners performed an auditory lexical decision task before and after exposure to a talker with a novel front vowel lowered chain shift (see Figure 1.1a for a schematic representation of this chain shift). Consistent with Maye et al.’s (2008) conclusions, Weatherholtz

(2013) found evidence for a sublexical locus of learning. Accent-consistent pronunciations that were perceived as nonwords before accent exposure (e.g., “w[E]tch” for witch, given the lowering of /I/ to [E]) were perceived as real words after exposure, and this effect generalized to new accent-consistent words. Unlike in Maye et al.’s (2008) study, the new words only occurred during the post-exposure test phase (i.e., not during the exposure passage or pre-exposure test phase), so the finding of lexical generalization cannot be explained by stimulus repetition across test blocks. In contrast to Maye et al.’s (2008) conclusion that

Figure 1.1: Schematic representation of the novel front vowel shifts used in the study by Weatherholtz (2013). Panels: (a) Front vowel lowering; (b) Front vowel raising; (c) Front vowel backing. Figure footer: Kodi Weatherholtz, Perceptual learning of systemic vowel variation, AMLaP 2014.

the sublexical adjustments involved targeted perceptual shifts that reflect the direction of

the vowel shifts in production, Weatherholtz (2013) found evidence that perceptual learning

of the unfamiliar vowel chain shift involved a general broadening of perceptual vowel

categories. That is, perceptual learning of the front vowel lowered chain shift generalized to

words pronounced with an untrained system of front vowel raising (e.g., witch as “w[i]tch”,

given the raising of /I/ to [i], as opposed to the trained lowering of /I/ to [E]; see Figure

1.1b). Crucially, learning did not generalize to words pronounced with a set of structurally

unrelated shifts in which front vowels were independently shifted to sound like their back

vowel counterparts (see Figure 1.1c; e.g., witch as “w[U]tch”, given the realization of /I/ as

[U]). This latter finding indicates that the exposure-driven perceptual adjustments were not

arbitrary: i.e., listeners did not simply allow any vowel to substitute for any other vowel.

Rather, listeners learned the front vowel lowered chain shift and were able to leverage learning

to facilitate processing of words pronounced with an untrained but structurally related

system of front vowel raising. A second experiment by Weatherholtz (2013) replicated these

findings using different stimulus materials produced by a different talker: perceptual learning

of a novel back vowel lowered chain shift transferred to an orthogonal back vowel raised

chain shift, but not to an unrelated pattern of back vowel fronting.

The studies by Weatherholtz (2013) and Maye et al. (2008) suggest that adaptation to

an unfamiliar vowel chain shift may have far-reaching benefits for the recognition of spoken

words containing unfamiliar vowel variants. However, further research is needed to provide a more nuanced understanding of the exposure-driven perceptual adjustments that promote such flexibility in word recognition. The current experiments address this need. In the studies by Weatherholtz (2013) and Maye et al. (2008), all stimulus materials were produced by a single talker (or synthesized voice), so it is unclear whether the sublexical locus of learning demonstrated in these studies involved developing talker-specific representations of the unfamiliar vowel variants or adjusting talker-independent vowel representations. Since these studies used a within-subjects design, listeners heard words pronounced with multiple different vowel shifts before and after familiarization to a talker with a particular system of shifts. It is unclear what, if any, influence this task structure had on learning and generalization. High-variability learning environments influence perceptual learning (see, e.g., Lively et al., 1993; Greenspan et al., 1988; Posner & Keele,

1968), so initial exposure to a wide range of vowel shifts during the pre-familiarization test phase might have established task-dependent biases for how listeners subsequently adapted to the novel vowel chain shift. Further, the accent familiarization phase in these studies comprised 20 minutes of passive listening to a passage that contained several hundred tokens of the target vowel shifts. An open question is whether the observed learning outcomes were dependent on such high token variability exposure materials.

1.3 Linking hypotheses: Generalization and the mechanisms of perceptual learning

The current experiments use generalization of perceptual learning as a method to assess the nature and locus of learning. The discussion below lays out the linking hypotheses that relate observable patterns of generalization to aspects of the underlying learning mechanism.

This discussion focuses specifically on the three types of generalization investigated in the

current experiments: generalization of learning of atypical segmental variation to new words, to new talkers, and to untrained but structurally similar segmental variants.

In order for learning of an atypical segmental variant to generalize to new words containing that variant, listeners must abstract over past lexical episodes to learn a sublexical

(e.g., segmental, featural) pattern of variation, and this pattern must be represented in memory in such a way as to influence subsequent word recognition (see McQueen et al.,

2006, for extensive discussion of these points). Thus, finding generalization to new words indicates a sublexical locus of learning and provides evidence that recognizing spoken words involves an acoustic-to-lexical mapping process mediated by flexible representations of sublexical information (Sjerps & McQueen, 2010; McQueen et al., 2006). By contrast, finding that learning is specific to the trained words suggests that the locus of learning involves representations of lexical form (Buchholz, 2009; Greenspan et al., 1988). Note that testing for generalization across words is informative in its own right, but distinguishing between a sublexical and lexical locus of learning provides only part of the picture concerning the specificity of the representations involved in learning. For example, if the trained and new words are produced by the same talker, then behavioral evidence showing generalization of learning to new words could result from adjusting abstract talker-independent sublexical representations or from developing sublexical representations defined by talker-specific pronunciation details. Likewise, lexically-specific learning could involve adjusting abstract representations of lexical form or relying on memory for specific lexical episodes (e.g., Goldinger, 1996).

Generalization to new talkers indicates that listeners abstracted over the indexical properties of the trained talker’s speech to learn a talker-independent pattern of variation. By contrast, if experience with a particular talker only facilitates later processing of speech from that same talker, then learning relied on detailed representations of talker-specific pronunciation variation (for extensive discussion, see Bradlow & Bent, 2008; Lively et al., 1993).

Again, investigating generalization across this one dimension is informative in isolation, but

only provides a partial view of the locus of learning. By jointly investigating generalization across talkers and words, it is possible to determine whether the locus of learning involved talker-specific or talker-independent lexical or sublexical representations.
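The linking hypotheses above can be summarized as a small decision table. The following Python sketch is illustrative only: the function name `infer_locus` and its boolean arguments are hypothetical labels introduced here, and in practice generalization would be assessed with graded statistical effects rather than simple true/false outcomes.

```python
def infer_locus(generalizes_to_new_words: bool, generalizes_to_new_talkers: bool) -> str:
    """Map an observed generalization pattern onto the locus of learning it suggests.

    Hypothetical helper: the two booleans stand in for the outcomes of
    tests for generalization across words and across talkers.
    """
    # Generalization to new words implies abstraction over lexical episodes.
    level = "sublexical" if generalizes_to_new_words else "lexical"
    # Generalization to new talkers implies abstraction over indexical detail.
    scope = "talker-independent" if generalizes_to_new_talkers else "talker-specific"
    return f"{scope} {level} representations"


# Enumerate the four possible joint outcomes.
for words in (True, False):
    for talkers in (True, False):
        print(f"words={words}, talkers={talkers} -> {infer_locus(words, talkers)}")
```

The point of the sketch is that neither dimension alone identifies the locus: only the joint outcome across words and talkers distinguishes the four possibilities.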

Investigating whether perceptual learning of segmental variation generalizes to untrained but structurally similar segmental variants has implications for determining both the locus of learning (e.g., whether sublexical learning involves specific segments or features) and the nature of exposure-driven perceptual adjustments. A recent study concerning perceptual learning of variation in voice onset time for stop consonants provides an empirical demonstration of the general logic (Kraljic & Samuel, 2006). In this study, listeners were initially familiarized to a talker whose realization of /t/ was acoustically ambiguous between [t] and [d] (i.e., an idiolectal variant characterized by an atypically short voice onset time).

Listeners then performed a phonetic categorization task to test perception of the talker’s

/t/-/d/ and /p/-/b/ contrasts, which share a voicing distinction in English. As a result of exposure, listeners shifted their /t/-/d/ boundary to include the otherwise ambiguous

[td?] variant as an instance of /t/, and this effect generalized to the untrained /p/-/b/ contrast (i.e., listeners categorized the acoustically ambiguous segment [pb?] as /p/). Thus, listeners learned generally about the relationship between voice onset time cues and stop consonant categories (i.e., a featural locus of learning), rather than learning specifically about the talker’s realization of /t/ (a segmental locus of learning). Further, with respect to the nature of learning, these findings indicate that listeners broadened their representation of voiceless stop categories to include a greater range of featural variation. However, it is unclear whether this broadening was direction-specific, or whether perceptual learning influenced perception of stop consonants produced with both atypically short and atypically long VOTs (e.g., a direction-independent broadening of category “goodness”; Miller, 1997).

1.4 Overview of experiments and predictions

The six experiments comprising this study used different implementations of a between-subjects lexically-guided perceptual learning paradigm to investigate the perceptual adjustments that enable listeners to cope with unfamiliar cross-category variation. The general structure of this paradigm is as follows. During the initial exposure phase, participants were familiarized to a talker with either a standard-sounding American English accent (control condition) or an unfamiliar accent characterized by a novel vowel chain shift (adaptation condition). Figure 1.2 shows a schematic representation of one of the novel vowel chain shifts, referred to as the “back vowel lowered” chain shift, which involved a clockwise rotation of English back vowels: the vowel /u/ was lowered and fronted to sound like [U] (e.g., goose as “g[U]s”); /U/ was lowered and backed to sound like [o] (e.g., wooden as “w[o]den”);

/o/ was lowered and fronted to sound like [A] (e.g., nose as “n[A]se”); and the vowels /O/ and

/A/, which are merged in perceptually standard varieties of American English, were fronted to sound like [æ] (e.g., closet as “cl[æ]set”). Following the exposure phase, participants performed various word recognition tasks (auditory lexical decision, word identification, and auditory naming) to probe learning and generalization. The target items for these test tasks were lexical items produced with various cross-category vowel shifts (e.g., “w[o]den” for wooden, given the shift of /U/ to [o] in the back vowel lowered accent). The specific properties of the target items were manipulated across experiments (e.g., whether the target items were produced by the trained talker or a new talker) in order to investigate the extent of generalization following various exposure conditions. However, the target lexical items were always designed to sound like nonword surface forms in standard American English due to the cross-category vowel shifts. Thus, it was expected that listeners in the control condition would have difficulty recognizing these word forms (e.g., rejecting these items as words during lexical decision and showing low identification accuracy for these items during the identification and naming tasks). The general measure of interest was the extent

Figure 1.2: Schematic representation of the novel back vowel lowered chain shift.

to which participants in the adaptation condition were better able to recognize the target items, depending on the various properties of these items that were manipulated across experiments.
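The back vowel lowered chain shift described above can be sketched as a simple substitution table. This is a minimal illustration, not the procedure used to create the stimuli: the name `BACK_VOWEL_LOWERED`, the helper `apply_shift`, and the broad segment-by-segment transcriptions are assumptions made for this example, written in the same symbol set used in the text.

```python
# Back vowel lowered chain shift as described in the text:
# /u/ -> [U], /U/ -> [o], /o/ -> [A], and merged /O/, /A/ -> [æ].
BACK_VOWEL_LOWERED = {"u": "U", "U": "o", "o": "A", "O": "æ", "A": "æ"}


def apply_shift(segments, shift):
    """Return the surface form of a word under a chain shift.

    Each segment is looked up exactly once in the mapping, so all shifts
    apply simultaneously: /u/ surfaces as [U] and is not shifted again to [o].
    """
    return [shift.get(seg, seg) for seg in segments]


# Illustrative broad transcriptions (assumed for this sketch).
words = {
    "goose": ["g", "u", "s"],
    "wooden": ["w", "U", "d", "@", "n"],
    "nose": ["n", "o", "z"],
    "closet": ["k", "l", "A", "z", "@", "t"],
}

for word, segs in words.items():
    print(word, "->", "".join(apply_shift(segs, BACK_VOWEL_LOWERED)))
```

Applying the mapping in a single pass matters because the shifts form a chain: an iterative application would wrongly cascade /u/ through every step of the rotation.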

Experiments 1-3 investigated the locus of learning. Listeners in the adaptation condition were trained on the novel back vowel lowered chain shift and then tested on their ability to recognize trained and new words pronounced in the back vowel lowered accent by either the trained talker or various new talkers. By testing for simultaneous generalization of learning across words and talkers, these experiments aimed to determine whether the locus of learning involved lexical or sublexical representations and whether these representations were talker-specific or talker-independent. Finding generalization to new words containing the trained cross-category vowel variants would indicate that listeners in the adaptation condition adjusted their perceptual vowel space to cope with the talker’s vowel system (i.e., a sublexical locus of learning), as opposed to learning specific word forms that occurred during the exposure phase. If learning only generalizes to new words by the trained talker, this finding would indicate that the locus of learning involved a perceptual representation of the trained talker’s vowels, whereas if learning generalizes across both words and talkers, this finding would indicate that the locus of learning involved adjusting talker-independent perceptual vowel representations.

Further, Experiments 1-3 investigated whether the ability to generalize learning across words and talkers was influenced by two properties of the environment: the degree of acoustic similarity between the trained and new talkers in terms of their vowel productions, and the degree of category-relevant vowel variability experienced prior to test (e.g., Logan et al., 1991). The influence of these two factors provides further insight into the locus of learning. Consider the possible finding that perceptual learning of the back vowel lowered chain shift only generalizes to new words produced by new talkers if the new talker’s vowel productions are acoustically similar to those of the trained talker. This finding would indicate that listeners adapted to the unfamiliar cross-category vowel variants by developing a detailed representation of the trained talker’s vowel space, but that listeners were able to leverage this learning when listening to a new talker with the same accent, as long as the new talker’s vowel productions were sufficiently similar to those of the trained talker (e.g.,

Reinisch & Holt, 2014; Kraljic & Samuel, 2007). By contrast, if listeners generalize across words and talkers regardless of talker similarity, this finding would indicate that listeners abstracted over the indexical properties of the trained talker’s vowel space and adjusted talker-independent perceptual representations of vowels.

Experiments 4-6 investigated the nature of the perceptual adjustments that drive adaptation to cross-category vowel variation. Experiment 4 investigated the systematicity of learning. In speech production, chain shifts are systems of codependent vowel shifts (Labov,

1994), but in perception there is no a priori reason that listeners must represent the higher level system of codependencies in order to adapt to an unfamiliar chain shift. For example, consider a hypothetical three-shift push chain in which vowel A moves to the position of vowel B, which in turn pushes vowel B to the position of vowel C, which in turn pushes vowel C to the position of vowel D. Listeners could adapt to this system by learning each shift independent of the others (A → B, B → C, and C → D), or listeners could learn a pattern of codependent vowel variation. Experiment 4 tests these accounts using a “leave-one-out”

exposure paradigm. To briefly illustrate using the hypothetical chain shift, listeners experience the shifts A → B and C → D, but have an incidental gap in their experience regarding the realization of vowel B, the middle vowel in the chain shift. If adaptation involves learning each shift independently, listeners should only learn the vowel shifts that they experienced. By contrast, if listeners represent higher level systematicity among vowel shifts, they should be able to leverage learning to fill in the gap in their experience, resulting in generalization to the untrained shift B → C.
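The contrasting predictions of the two accounts can be written out explicitly. The following Python sketch uses the hypothetical A → B → C → D chain from the text; the function `predicted_learning` and its arguments are illustrative names introduced here and do not correspond to any analysis code from the experiments.

```python
# Hypothetical three-shift push chain from the text: A -> B, B -> C, C -> D.
CHAIN = [("A", "B"), ("B", "C"), ("C", "D")]


def predicted_learning(chain, held_out, systemic):
    """Return the shifts a listener is predicted to recognize at test.

    chain:    ordered list of (category, realization) shifts in the accent
    held_out: the shift withheld during exposure (the "leave-one-out" gap)
    systemic: True if listeners represent the chain as a codependent system,
              False if each shift is learned independently
    """
    trained = [shift for shift in chain if shift != held_out]
    if systemic:
        # Knowledge of the system lets listeners fill in the gap,
        # so the held-out shift is predicted to be recognized as well.
        return list(chain)
    # Independent learning covers only the shifts actually experienced.
    return trained


gap = ("B", "C")
print("independent:", predicted_learning(CHAIN, gap, systemic=False))
print("systemic:   ", predicted_learning(CHAIN, gap, systemic=True))
```

The two accounts differ only in their prediction for the held-out shift, which is why recognition of B → C items at test is the critical comparison.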

Experiments 5 and 6 investigate whether perceptual learning of cross-category vowel variation is direction-specific and vowel-specific. Early work on adaptation to vowel chain shifts assumed an acute adaptive process, predicting that exposure to an unfamiliar vowel chain shift should cause listeners to learn that particular vowels were shifted in a particular direction through phonetic space, relative to canonical vowel realizations (Maye et al., 2008).

However, a series of unexpected but inconclusive findings suggest instead that perceptual learning of one vowel chain shift may transfer to untrained but structurally similar chain shifts (Maye et al., 2008, Experiment 1; see also unpublished studies by Weatherholtz,

2013, and Bardhan et al., 2006). Experiments 5 and 6 test the possibility of cross-system generalization of learning. Experiment 5 investigates direction specificity: listeners are trained on a novel system of “back vowel lowering” and then tested on an orthogonal system of “back vowel raising”. Experiment 6 investigates vowel specificity: again listeners are trained on “back vowel lowering” but then tested on a parallel system of “front vowel lowering”.

Chapter 2

Experiments 1-3: Effects of Input Variability and Talker Similarity on Generalization across Words and Talkers

2.1 Introduction

Listeners demonstrate a remarkable ability to understand speech, despite considerable variability in the realization of speech sounds. Unfamiliar, non-standard or otherwise atypical pronunciation variation is often initially detrimental to speech processing (Mattys et al.,

2012), but with exposure, listeners can rapidly adapt to a talker’s idiolect or accent, enabling them to map atypical pronunciation variants to the correct categories in memory

(Clarke & Garrett, 2004). This ability to adapt to pronunciation variation is due in large part to perceptual learning mechanisms that tailor speech perception and recognition processes to the statistics of the environment (see Samuel & Kraljic, 2009). For example, when listeners encounter an unfamiliar talker whose idiolectal realization of /s/ is acoustically ambiguous between [s] and [f] (e.g., platypus sounds like “platypu[sf?]”), they shift their

[s]-[f] boundary to perceive the otherwise ambiguous sound as an instance of /s/ (Norris et al., 2003). A central question in research on perceptual learning for speech concerns the locus of learning: that is, the level of representational specificity at which exposure-driven perceptual adjustments occur. Perceptual learning is not a singular phenomenon.

Depending on the nature of the input, listeners can learn lexically-specific patterns of variation (Buchholz, 2009; Greenspan et al., 1988), or learning can involve adjusting sublexical

(e.g., segmental, featural) representations, as in the example above, which allows learning

to generalize to new words containing the trained pronunciation variant(s) (Sjerps & McQueen,

2010; Kraljic & Samuel, 2006; McQueen et al., 2006). Further, perceptual learning can be talker-specific, indicating that the exposure-driven perceptual adjustments relied on detailed information about the trained talker’s vocal characteristics (Eisner & McQueen,

2005; Kraljic & Samuel, 2005; see also Reinisch & Holt, 2014). Learning can also involve talker-independent perceptual adjustments, allowing exposure-driven recognition benefits to generalize to new talkers who produce similar patterns of variation (e.g., talkers with the same accent; Bradlow & Bent, 2008; Kraljic & Samuel, 2006; Lively et al., 1993). Thus, understanding how listeners cope with pronunciation variation requires understanding the locus of perceptual adjustments that occur in response to particular patterns of pronunciation variation and the extent to which these adjustments depend on properties of the learning environment.

Cross-category mismatches, the focus of the current study, have received relatively little attention in research on perceptual learning for speech (Sumner, 2011). Cross-category mismatches occur when one speech sound is realized with acoustic-phonetic properties typically associated with a different speech sound. Such cross-category variation abounds across native and non-native accents due to a range of phonological processes: vowel splitting, such as the selective raising of /æ/ before /g/ in some varieties of American English (e.g., bag sounds more like “b[e@]g”; Zeller, 1997); vowel merging, such as the realization of the vowel /E/ as

[I] by talkers with the pin-pen merger (Labov et al., 2006); th-fronting and th-stopping, such

as the realization of think as “[f]ink” or “[t]ink”, respectively (Kerswill, 2003; Wells, 1982);

and L1 influences on L2 productions, such as the realization of /I/ as [i] by many non-native

speakers of English (Flege, Bohn, & Jang, 1997; Flege, 1992). To cope with cross-category

mismatches, listeners must maintain multiple mappings between acoustic-phonetic forms

and higher-level speech categories and be able to determine which of multiple mappings is

appropriate given the talker they are listening to (e.g., depending on the talker, the phonetic

form [I] maps either to the phonological category /I/ or /E/ and hence “p[I]n” maps

to either pin or pen). To date, relatively little is known about how listeners achieve such

flexibility (though see Trude & Brown-Schmidt, 2012; Maye et al., 2008): that is, about the level of representational specificity at which exposure-driven adjustments occur in order for listeners to cope with unfamiliar cross-category pronunciation variation, the factors that facilitate or constrain generalization of learning to untrained sources of variation, such as new words and talkers, or the extent to which learning outcomes differ depending on properties of the learning environment.

The overarching goal of the current study is to provide a more nuanced understanding of the learning mechanism that enables listeners to cope with cross-category pronunciation variation. The three experiments comprising this study focus specifically on vowel chain shifts, which are complex systems of cross-category variation that affect the realization of multiple vowel categories across multiple acoustic-phonetic dimensions (Lubowicz, 2011;

Labov, 1998, 1994). For example, the Northern Cities Shift is a clockwise rotation of mid and low vowels relative to the realization of these vowels in standard-sounding Midland

English: in historical order, the vowel /æ/ (as in cad) was raised, fronted and diphthongized to sound like [ej] (as in cade); the vowel /A/ (as in cod) was fronted to [æ], and /O/ (as in cawed) was lowered to [A]; the vowel /E/ (as in Ked) was centralized to [2] (as in cud), and in response /2/ was shifted back to [O]; and finally the vowel /I/ (as in kid) was lowered and backed to [E] (Labov et al., 2006). Such cross-category mismatches can be detrimental to speech processing, in part because they are a source of phonological and lexical confusion

(e.g., did the talker say /A/ or /æ/, cod or cad?; Jacewicz & Fox, 2012; Clopper et al.,

2010), and in part because the realization of words in one dialect or accent can sound like nonwords to listeners who are not familiar with the category structure of that variety (e.g., the Northern Cities realization of cog sounds like the nonword “cag” due to the /A/ → [æ]

shift).

The current experiments investigate perceptual learning of systemic cross-category vowel

variation and test for generalization of learning to new words and new talkers to identify

the locus of learning. Further, these experiments investigate potential constraints on generalization by manipulating properties of the learning environment that have been shown to influence other forms of perceptual learning for speech: the degree of token variability experienced prior to test (Logan et al., 1991), and the degree of acoustic similarity between the trained and new talkers in terms of their realization of speech cues that are relevant to the category variants being learned (Reinisch & Holt, 2014; Kraljic & Samuel, 2007).

2.1.1 Perceptual learning of isolated cross-category mismatches

Listeners can rapidly adapt to isolated patterns of cross-category variation, but the locus of learning is unclear. In one study, eye movements during online spoken word recognition showed that exposure to a talker who selectively raised /æ/ before /g/ but not before /k/

(e.g., bag sounded like “b[E]g”, but back sounded like “b[æ]ck”) facilitated word recognition by altering the dynamics of lexical competition among word cohorts (Dahan et al., 2008).

Before hearing any raised vowel items, listeners exhibited a strong cohort competition effect, such that the onset [bæ] activated both back and bag, as evidenced by fixations to written versions of these words on screen. After exposure to the raised vowel items, the onset [bæ] led to faster identification of the word back and fewer fixations on the written word bag, because listeners learned that bag was not pronounced with [æ] and hence was not a strong cohort competitor for this talker. The test materials in Dahan et al.’s (2008) study were a matched set of back-like and bag-like words produced by a single talker. Thus, while their results indicate that exposure to selective /æ/-raising induced a reorganization of lexical neighborhoods in memory, it is unclear whether this reorganization was driven by learning at a lexical or sublexical level. Listeners could have learned specifically about the trained bag-like words, adjusting the lexical representation of only these words in memory, in which case the facilitation effect would be specific to the corresponding back-like words.

Alternatively, listeners could have learned the pattern of selective /æ/-raising (a sublexical

locus), in which case the changes in lexical competition should extend to untrained back-like and bag-like words (e.g., McQueen et al., 2006). Distinguishing these accounts requires testing for generalization of learning to new words.

In a follow-up study using a similar eye-tracking paradigm at test, Trude and Brown-Schmidt

(2012) investigated whether listeners can simultaneously learn competing patterns of vowel variation from different talkers. Listeners were initially familiarized to two talkers: one who produced the pattern of selective /æ/-raising, and one who did not. Later, when listening to the “unaccented” talker at test, listeners showed a strong pattern of competition among bag-like and back-like words. By comparison, when listening to the “accented” talker, listeners were faster to recognize back-like words, replicating Dahan et al.’s (2008) finding that the corresponding bag-like words were excluded as onset competitors for this talker.

Thus, listeners adapted to the “accented” talker’s accent, but they did not wrongly apply learning when listening to a familiar talker who they knew did not produce the raised-/æ/

variant. Trude and Brown-Schmidt (2012) explained these results by arguing that listeners developed and maintained multiple representations of the phoneme /æ/ (see also

Sumner & Samuel, 2009; Ranbom & Connine, 2007). However, since they did not test for generalization to new words, it is equally possible that listeners developed talker-specific lexical representations of the trained bag-like and back-like words.¹

An orthogonal issue concerning talker-specificity, which has received little attention, is whether listeners can generalize learning of cross-category vowel variation to untrained talkers who produce the same pattern of variation (cf. Kraljic & Samuel, 2007). Further research is clearly needed to understand whether perceptual learning of cross-category pronunciation variation transfers across words and talkers, and if so, the factors that constrain generalization.

¹ Note that the number of -ag/-ack minimal pairs in English is very small (see Trude & Brown-Schmidt, 2012), so testing for generalization to new words was not practical in the studies on perceptual learning of selective /æ/-raising by Trude and Brown-Schmidt (2012) and Dahan et al. (2008).

2.1.2 Perceptual learning of vowel chain shifts

While vowel shifts can occur in isolation, such as the pattern of selective /æ/-raising discussed above (Zeller, 1997), vowel shifts can also occur as part of a larger chain of codependencies, such as the clockwise rotation of mid and low English vowels that characterizes the

Northern Cities Shift (Labov et al., 2006). Maye, Aslin and Tanenhaus (2008) investigated the nature of the perceptual learning mechanism that enables listeners to cope with complex systems of cross-category vowel variation, focusing specifically on instances of vowel shifts that result in nonword surface forms if listeners have not adapted to the talker’s vowel system (e.g., witch pronounced as “w[E]tch”, cf. /wIÙ/). The design of this study is discussed in detail because it forms the empirical basis for the current experiments.

Maye et al. (2008) created a novel accent by rotating English front vowels counterclockwise (i.e., /i/ realized as [I], beetle as “b[I]tle”; /I/ realized as [E], witch as “w[E]tch”; /E/ realized as [æ], yellow as “y[æ]llo”; and /æ/ realized as [A], trap as “tr[A]p”). Adaptation to this novel “front vowel lowered” accent was assessed using a multi-session lexically-guided perceptual learning paradigm. During each session, participants passively listened to a 20-minute excerpt from The Wizard of Oz and then performed an auditory lexical decision task. Target lexical decision items were a set of English words pronounced in the front vowel lowered accent, all of which sounded like nonwords in standard American English

(e.g., “w[E]tch”). The exposure passage for the first session was produced by a synthesized voice with an Inland North American English accent. The exposure passage for the second session (one to three days later) was produced by the same synthesized voice but in the front vowel lowered accent (created by modifying the phonological dictionary of the built-in text-to-speech synthesizer on Macintosh computers to reflect the target vowel substitutions).

Maye et al. predicted that by hearing nonword surface forms like “w[E]tch” for witch in the context of a familiar story, listeners could use top-down knowledge to identify the intended lexical items and in turn to guide learning of the vowel variants (cf. Norris et al., 2003).

Maye et al. (2008) found that as a result of passive exposure to the novel accent, front vowel lowered items that were perceived as nonwords on the first day of testing (e.g.,

“w[E]tch”) came to be perceived as real words (e.g., witch). This effect emerged for target

items that occurred during the exposure passage (trained items) and for new items that

occurred only at test. This finding of apparent generalization to new words suggests that

listeners remapped their perceptual vowel space to reflect the talker’s vowel system, rather

than learning lexically-specific pronunciations. Interestingly, filler words pronounced with

unshifted (standard-sounding) front vowels (e.g., shelf as “sh[E]lf”) were still perceived as

words after exposure to the novel accent, rather than as the nonwords that result from

applying knowledge of the front vowel lowered variants (e.g., [E] maps to /I/, so “sh[E]lf”

is the nonword “shilf”). Based on these results, Maye et al. suggested that the mechanism

driving adaptation involved developing and maintaining multiple mappings for how vowel

forms relate to phonological categories (see also Trude & Brown-Schmidt, 2012), and that

multiple different mappings can be applied even to the speech of a single talker.

However, given two aspects of Maye et al.’s (2008) design, their data do not provide conclusive evidence that learning generalized to new words, nor that listeners learned the cross-category vowel variants in the first place. Maye et al.’s (2008) study was conducted within subjects, and the lexical decision tasks comprised the same set of items before and after exposure to the novel accent. Given that listeners maintain episodic traces of nonword forms in memory for at least one week (Goldinger, 1998, 1996), Maye et al.’s (2008) finding that listeners endorsed more of the front vowel lowered items after exposure to the front vowel lowered accent could be a simple repetition effect, rather than evidence of pattern abstraction and extension. Note in particular that repetition of nonword forms has been shown to cause wordlike priming effects in lexical decision (Zeelenberg et al., 2004). A separate issue is that processing synthesized speech is more difficult and involves increased working memory demands, relative to human speech, even for high quality synthesized speech modeled on a standard-sounding variety (White, Rajkumar, Ito, & Speer, 2009; Lattner et al., 2003; Smither, 1993). Thus, Maye et al.’s (2008) results, based on low-grade synthesized speech, may not reflect the processes and outcomes involved in coping with naturally occurring cross-category vowel variation in human speech.

Using a modified version of Maye et al.’s (2008) paradigm, Weatherholtz (2013) investigated the lexical specificity of perceptual learning when listeners adapt to a naturally produced system of cross-category vowel variation. Participants performed an auditory lexical decision task before and after exposure to a novel “back vowel lowered” accent (produced by a trained phonetician; e.g., /ʊ/ pronounced as [o], wooden as “w[o]den”; /o/ pronounced as [ɑ], nose as “n[ɑ]se”). Unlike in Maye et al.’s (2008) study, a subset of target back vowel lowered test items occurred only during the post-exposure lexical decision task, providing a test of generalization to new words without the confound of repetition priming. Compared to pre-exposure baseline rates, listeners judged more of the back vowel lowered items as words following exposure to the novel vowel chain shift. This effect generalized to new words, indicating a sublexical locus of learning. Further, accent exposure had no significant influence on perception of English words pronounced with unshifted back vowels. Thus, results from an experiment using human speech, a different novel chain shift and a different exposure-test paradigm found converging evidence for Maye et al.’s (2008) conclusion: listeners learned to remap their vowel space to reflect the talker’s vowel system without overwriting their representation of standard-sounding vowel variants.

The current experiments build on these studies to determine whether such vowel space remapping is talker-specific. There are multiple learning mechanisms that could produce the behavioral response patterns observed by Weatherholtz (2013) and Maye et al. (2008) but would produce different patterns of behavior when processing speech from untrained talkers with the trained vowel chain shift. One possibility is that learning occurs at a sublexical level but is strictly talker-specific, in which case listeners should generalize to new words from the trained talker but there should be no generalization to trained or new words produced by different talkers with the same chain shift. Another possibility is that listeners remap their vowel space talker-independently, allowing learning to generalize to new words produced by new talkers with the same vowel chain shift. Yet another possibility is that learning involves both sublexical and lexical adjustments. Listeners might remap their perceptual vowel space, enabling them to cope with vowel shifts in untrained words, while also adjusting the representation of specific lexical items that were experienced with shifted vowels. In principle, the vowel space remapping could occur talker-specifically (e.g., in order to capture acoustic-phonetic details of the talker’s vowel system), while the lexical representations could be adjusted talker-independently, enabling listeners to recognize trained words with the trained vowel shifts even when these words are produced by new talkers.² The current study aimed to distinguish these possibilities by testing whether learning of systemic cross-category vowel variation simultaneously generalizes across words and talkers. An important caveat is that, in general, perceptual learning outcomes differ depending on the statistics of the environment (Love, 2003; Yamauchi, Love, & Markman, 2002), and specifically in the domain of speech processing, several factors are known to influence generalization across words and talkers.

2.1.3 Factors influencing generalization across words and talkers

Two of the primary factors known to influence generalization across words and talkers are the degree of category-relevant variability experienced prior to test (Bradlow & Bent, 2008; Logan et al., 1991) and the degree of acoustic similarity between trained and new talkers in the realization of category-relevant speech cues (Reinisch & Holt, 2014; Kraljic & Samuel, 2007, 2005; see also Goldinger, 1996). The current study tests these two factors as potential constraints on generalization of learning of systemic cross-category variation.

Regarding lexical specificity, exposure to a pattern of segmental variation across a wide range of lexical and phonotactic contexts (high token variability) facilitates phonological abstraction, which in turn enables generalization to new words containing that pronunciation variant (e.g., Logan et al., 1991; Greenspan et al., 1988). By contrast, exposure to the same pattern of pronunciation variation in a limited set of lexical contexts leads to lexically-specific learning outcomes (Greenspan et al., 1988). Likewise, exposure to multiple talkers with the same accent or dialect (high talker variability) facilitates cross-talker generalization by helping listeners distinguish the abstract pattern of variation that occurs across talkers from intertalker variability in the realization of this pattern (Sidaras et al., 2009; Bradlow & Bent, 2008; Clopper & Pisoni, 2004; Lively et al., 1993; Logan et al., 1991; Gass & Varonis, 1984). While high talker variability promotes cross-talker generalization, such exposure conditions are neither sufficient (Jongman, Wade, & Sereno, 2003) nor necessary (Kraljic & Samuel, 2006) for talker-independent learning. In particular, several recent studies on perceptual learning of segmental variation have demonstrated cross-talker generalization following single talker exposure (e.g., Reinisch & Holt, 2014; Kraljic & Samuel, 2007, 2006, 2005).

² It should be noted that word-specific effects could be due to repetition priming (e.g., short-term adjustments in lexical activation) or to a learning mechanism that adjusts for word-specific pronunciation variation.

The likelihood of cross-talker generalization following single talker exposure appears to be determined, in large part, by the degree of similarity between the trained and new talkers, though it is currently unclear whether the relevant similarity metric involves overall perceived voice similarity (Goldinger, 1996), fine acoustic similarity in the realization of particular speech sounds (Kraljic & Samuel, 2007, 2005), or both (see Reinisch & Holt, 2014, for discussion of this point). In a recent series of interrelated studies, exposure to a single talker with an atypical [s]-like realization of /f/ caused listeners to recalibrate their [s]-[f] boundary (McQueen et al., 2006; Norris et al., 2003). This category recalibration generalized across talkers, but only when the trained and new talkers were highly similar in terms of the spectral cues associated with their respective [s]-to-[f] ranges (Reinisch & Holt, 2014; Kraljic & Samuel, 2007, 2005). No cross-talker generalization was observed when the trained and new talkers were acoustically dissimilar in the spectral range of their fricative productions (Kraljic & Samuel, 2007; Eisner & McQueen, 2005). Thus, multi-talker exposure conditions may be necessary for cross-talker generalization (Bradlow & Bent, 2008; Lively et al., 1993), except when the trained and new talkers are highly similar.

Talker similarity effects are often discussed in terms of episodic models of memory and categorization in which category activation is proportional to the similarity between input and stored instances of that category (e.g., Goldinger, 1996). Thus, if perceptual learning of pronunciation variation based on single talker exposure only transfers to similar sounding talkers, this finding would suggest that the generalization process relied, at least in part, on detailed representations of speech episodes. By contrast, if perceptual learning under single talker exposure conditions generalizes across talkers regardless of talker similarity, this finding would indicate that learning involved adjusting the representation of higher-level abstract categories (e.g., phonological or lexical representations). Note, the issue here is not whether episodic details in memory influence speech processing. There is substantial evidence that listeners rely on both episodic and abstract representations during spoken word recognition (Brown & Gaskell, 2014; McLennan, Luce, & Charles-Luce, 2005; Pisoni, 1997; Feustel, Shiffrin, & Salasoo, 1983). Rather, the issue concerns the level of representational specificity at which exposure-driven adjustments occur (talker-specific or talker-independent, lexical or sublexical representations), depending on the pattern of variation being learned and the properties of the learning environment.

2.1.4 Overall design and predictions

The three experiments comprising this study investigated the nature and specificity of the perceptual learning mechanism that enables listeners to cope with systemic cross-category vowel variation. This study had three primary goals. First, this study investigated whether the patterns of vowel space remapping observed by Weatherholtz (2013) and Maye et al. (2008) under single talker exposure conditions are talker-specific (i.e., listeners generalize to new words but only from the trained talker) or whether single talker exposure is sufficient for listeners to adjust talker-independent sublexical representations. To distinguish these possibilities, the current experiments tested for simultaneous generalization of learning across words and talkers. The second goal of this study was to assess whether the likelihood of generalizing across talkers was dependent on category-relevant acoustic similarity between trained and new talkers in order to determine whether the mechanism underlying cross-talker generalization was driven by detailed representations of the input. Third, this study investigated the extent to which token variability during initial exposure to the novel accent influences learning outcomes. In the studies by Weatherholtz (2013) and Maye et al. (2008), participants were familiarized to a novel vowel chain shift by listening to a 20-minute story containing several hundred vowel shifted tokens. These vowel tokens occurred across a wide range of lexical contexts and were characterized by a high degree of acoustic-phonetic variability. Given that high-variability learning environments promote generalization, the current study tested for generalization across words and talkers following similar exposure conditions (Experiment 1) and then tested whether learning becomes increasingly specific as the input becomes less variable (Experiments 2 and 3).

The design of all three experiments involved a between-subjects lexically-guided perceptual learning paradigm, with all speech materials produced naturally by human talkers (cf. the use of synthesized speech by Maye et al., 2008). During the initial exposure phase, participants passively listened to a popular children’s story, The Adventures of Pinocchio, which was presented to one group of participants in a standard-sounding American English accent (control condition) and to another group of participants in a novel “back vowel lowered” accent characterized by a clockwise rotation of English back vowels (adaptation condition). For example, the word wooden sounded like “w[o]den” in the back vowel lowered accent, given the cross-category shift of /ʊ/ → [o] in this accent. Following the exposure phase, listeners performed an auditory lexical decision task and a word identification task, which were designed to investigate perceptual learning of the novel vowel chain shift and generalization of learning to new words and new talkers. The target items for these test tasks were a set of back vowel lowered word forms, all of which sounded like nonwords in standard American English (e.g., “w[o]den”). Thus, it was expected that listeners’ ability to recognize these word forms would depend on whether they were familiarized to the back vowel lowered accent prior to test. The lexical decision and word identification data provide complementary insight with respect to this overarching prediction. The lexical decision task assessed whether exposure to the novel back vowel lowered accent enabled listeners to perceive as real words accented pronunciations that would otherwise sound like nonwords in American English (e.g., “w[o]den” for wooden). One limitation of lexical decision data is that listeners can change their criteria for making a ‘word’ response based on properties of the experiment, without a corresponding change in word recognition (e.g., Zeelenberg et al., 2004). Thus, the word identification task complemented the lexical decision data by assessing whether exposure to the novel accent improved listeners’ ability to map the back vowel lowered forms onto the correct lexical representations in memory, which would be indexed by greater identification accuracy of the target items in the adaptation condition, relative to the control condition.
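The between-subjects prediction for the lexical decision task can be illustrated with a minimal sketch. It computes the proportion of ‘word’ responses in each condition by item type cell; the function name, the condition and item type labels, and the toy trials are hypothetical and are not taken from the experimental scripts or data.

```python
from collections import defaultdict

def endorsement_rates(trials):
    """Proportion of 'word' responses per (condition, item_type) cell.

    trials: iterable of (condition, item_type, response) tuples, where
    response is either "word" or "nonword".
    """
    word_responses = defaultdict(int)
    totals = defaultdict(int)
    for condition, item_type, response in trials:
        cell = (condition, item_type)
        totals[cell] += 1
        if response == "word":
            word_responses[cell] += 1
    return {cell: word_responses[cell] / totals[cell] for cell in totals}

# Toy trials (invented): four back vowel lowered ("bvl") items per condition.
trials = [
    ("adaptation", "bvl", "word"), ("adaptation", "bvl", "word"),
    ("adaptation", "bvl", "nonword"), ("adaptation", "bvl", "word"),
    ("control", "bvl", "nonword"), ("control", "bvl", "word"),
    ("control", "bvl", "nonword"), ("control", "bvl", "nonword"),
]
rates = endorsement_rates(trials)
print(rates[("adaptation", "bvl")], rates[("control", "bvl")])  # 0.75 0.25
```

Under the prediction described above, the endorsement rate for back vowel lowered items should be higher in the adaptation condition than in the control condition; the sketch only tabulates rates and does not perform any inferential test.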

To test for generalization of learning to new words, a subset of the target stimuli were lexical items that occurred only at test (i.e., not during the exposure passage). To investigate cross-talker generalization and the potential constraining influence of talker similarity, the test materials were produced by three talkers: the trained talker, a new talker with an acoustically similar vowel space, and a new talker with an acoustically dissimilar vowel space. Based on results of the studies by Weatherholtz (2013) and Maye et al. (2008), listeners in the adaptation condition were expected to remap their vowel space to reflect the trained talker’s vowel system. If so, listeners in the adaptation condition should perceive both trained and new back vowel lowered items from the trained talker as real words during lexical decision and should show greater identification accuracy for these items during the word identification task, relative to participants in the control group. The central question of interest was whether this vowel space remapping occurs talker-specifically (indexed by generalization to new words, but only from the trained talker) or talker-independently (indexed by simultaneous generalization across words and talkers). The use of a between-subjects accent exposure manipulation, rather than a within-subjects paradigm in which word recognition tasks flank the novel accent exposure phase (cf. Maye et al., 2008; Weatherholtz, 2013), provides a stronger test of learning and generalization by avoiding potential confounds due to task and stimulus repetition and by reducing the likelihood of fatigue during test by decreasing the number of test trials presented to each participant.

To investigate the influence of input variability on learning outcomes, the exposure materials were manipulated across experiments to create learning environments characterized by different degrees of token variability. The three experiments comprising this study were otherwise identical. The passage used for Experiment 1 was about 20 minutes long when spoken aloud and contained more than 600 target back vowel tokens distributed across nearly 200 unique lexical contexts. These back vowel tokens were characterized by a wide range of acoustic variability when produced naturally, whether in the talker’s native accent or in the novel back vowel lowered accent. Thus, the exposure materials for Experiment 1 created a high token variability learning environment. These exposure materials were pared down to create 6-minute medium token variability materials for Experiment 2 and 2-minute low token variability materials for Experiment 3.

2.2 Experiment 1

Experiment 1 used a single-talker high token variability learning environment to investigate perceptual learning of a novel vowel chain shift. The goal of this experiment was to determine whether learning under these exposure conditions generalizes across words and talkers and, if so, whether generalization is constrained by the degree of acoustic similarity between the trained and new talkers in their realization of the vowel variants.

2.2.1 Method

Participants

A total of 85 undergraduates at The Ohio State University participated in Experiment 1 in exchange for partial course credit. Nine participants were excluded because they were not native monolingual English speakers with normal speech and hearing, and three were excluded due to technical issues (i.e., script crash mid-experiment). An additional nine participants were excluded for performance issues: either they fell asleep during the initial passive listening accent exposure phase, or they exhibited exceptionally low accuracy on non-target trials during lexical decision (<70% accuracy in rejecting maximal nonwords). After exclusions, there were 64 usable participants evenly split between the adaptation condition and control condition.
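The accuracy-based exclusion criterion can be made concrete with a short sketch. Only the 70% threshold comes from the text; the function names and the toy response lists are hypothetical, and the sketch assumes that each maximal-nonword trial is coded as a ‘word’ or ‘nonword’ response.

```python
ACCURACY_THRESHOLD = 0.70  # criterion stated in the text

def nonword_rejection_accuracy(responses):
    """Proportion of maximal-nonword trials that were answered 'nonword'."""
    if not responses:
        raise ValueError("no maximal-nonword trials supplied")
    correct = sum(1 for r in responses if r == "nonword")
    return correct / len(responses)

def is_excluded(responses, threshold=ACCURACY_THRESHOLD):
    """True if accuracy in rejecting maximal nonwords is below the threshold."""
    return nonword_rejection_accuracy(responses) < threshold

# Toy data (invented): 60 maximal-nonword trials per participant.
careful = ["nonword"] * 57 + ["word"] * 3        # 95% correct rejections
inattentive = ["nonword"] * 40 + ["word"] * 20   # about 67% correct rejections
print(is_excluded(careful), is_excluded(inattentive))  # False True
```

The participant counts reported above are internally consistent: 85 tested, minus 9, 3, and 9 exclusions, leaves 64.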

Exposure materials

The materials for the exposure phase were two spoken versions of a 20-minute passage from The Adventures of Pinocchio, a popular children’s story (see Appendix A.1 for the full text). Both versions were recorded by the same trained phonetician, once in her normal accent (standard-sounding Midland American English) and once in a novel accent created by systematically rotating all English back vowels clockwise. Figure 2.1 shows a schematic representation of the novel vowel chain shift: the vowel /u/ was lowered and fronted to sound like [ʊ] (e.g., goose as “g[ʊ]s”); /ʊ/ was lowered and backed to sound like [o] (e.g., wooden as “w[o]den”); /o/ was lowered and fronted to sound like [ɑ] (e.g., nose as “n[ɑ]se”); and the vowels /ɔ/ and /ɑ/, which the talker merges in her native variety, were fronted to sound like [æ] (e.g., closet as “cl[æ]set”). For convenience, the novel chain shift is referred to as “back vowel lowering”, even though the chain shifted tokens involved variation along multiple acoustic-phonetic dimensions, including vowel height, vowel backness, trajectory and duration.
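The remapping rules for the back vowel lowered accent can be summarized as a lookup from each target phoneme to its shifted surface vowel. The following is a minimal sketch, written with IPA symbols, and is not the procedure used to prepare the materials; the names `BVL_SHIFT` and `apply_chain_shift` and the segmented toy transcriptions are illustrative assumptions.

```python
# Back vowel lowered (BVL) chain shift: phoneme -> shifted surface vowel.
# The mapping follows the rules described in the text; names are hypothetical.
BVL_SHIFT = {
    "u": "ʊ",  # goose  -> "g[ʊ]s"
    "ʊ": "o",  # wooden -> "w[o]den"
    "o": "ɑ",  # nose   -> "n[ɑ]se"
    "ɔ": "æ",  # merged with /ɑ/ in the talker's native variety
    "ɑ": "æ",  # closet -> "cl[æ]set"
}

def apply_chain_shift(segments, shift=BVL_SHIFT):
    """Return a new segment list with each target vowel replaced by its variant.

    Each segment is looked up in its original (unshifted) form, so the rules
    apply simultaneously and do not feed one another: /u/ becomes [ʊ], not [o].
    """
    return [shift.get(segment, segment) for segment in segments]

print(apply_chain_shift(["g", "u", "s"]))   # ['g', 'ʊ', 's']
print(apply_chain_shift(["n", "o", "z"]))   # ['n', 'ɑ', 'z']
```

Applying the lookup in a single pass is what makes the pattern a chain shift rather than a merger: every back vowel category moves one step, and the categories remain distinct from one another except for /ɔ/ and /ɑ/, which were already merged for this talker.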

Figure 2.1: Schematic representation of the novel back vowel lowered chain shift

Prior to recording the exposure materials, the talker was presented with a written version of the story in which all instances of back vowel words were phonetically transcribed according to the remapping rules above. The talker practiced reading the story aloud multiple times over the course of a week to achieve a natural sounding delivery of the vowel variants.

Recordings of the novel accent and Midland accent versions of the exposure passage were made in a sound-attenuated booth using a high quality headset microphone (Sennheiser HMD 280-13), which was passed through an Art Tube MP microphone preamp to a Windows laptop. Recordings were made using Audacity at a 44.1 kHz sampling rate and were later downsampled to 22.05 kHz and scaled to an average intensity of 70 dB.
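Scaling a recording to a target average intensity amounts to multiplying every sample by a single gain factor. The sketch below illustrates this under stated assumptions: intensity is taken to be 10·log10 of the mean squared amplitude relative to a 2e-5 Pa reference (the conventional dB SPL reference), which may not match how the 70 dB target was defined in the software actually used; the function names and the toy signal are hypothetical, and downsampling is not shown.

```python
import math

P_REF = 2e-5  # assumed reference pressure (Pa); the conventional dB SPL reference

def mean_intensity_db(samples, p_ref=P_REF):
    """Average intensity in dB: 10 * log10(mean(x^2) / p_ref^2)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square / (p_ref * p_ref))

def scale_to_db(samples, target_db=70.0, p_ref=P_REF):
    """Apply one gain factor so that the mean intensity equals target_db."""
    gain = 10.0 ** ((target_db - mean_intensity_db(samples, p_ref)) / 20.0)
    return [x * gain for x in samples]

# Toy signal (invented): one second of a 220 Hz sine sampled at 22.05 kHz.
sample_rate = 22050
tone = [0.01 * math.sin(2 * math.pi * 220 * n / sample_rate)
        for n in range(sample_rate)]
scaled = scale_to_db(tone)
print(round(mean_intensity_db(scaled), 6))
```

Because the gain is constant across the file, this operation changes overall level only; relative amplitude differences within the recording are preserved. The sketch does not handle empty or fully silent input.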

The vowel plot in Figure 2.2 shows the mean midpoint frequency (Hz) of the first and second formants (F1 and F2) of all stressed tokens of each vowel in the Midland accent version of the exposure passage (denoted by phonetic symbols) and the back vowel lowered version of the exposure passage (denoted by arrows). To obtain these measures, individual word and segment boundaries in each version of the passage were automatically aligned in Praat (Boersma & Weenink, 2014) using the Penn Phonetics Lab Forced Aligner (Yuan & Liberman, 2008); these boundaries were hand corrected for accuracy, and the frequencies of F1 and F2 were extracted at vowel midpoint (50% of the duration of the vowel). The vowel plot in Figure 2.2 indicates that when speaking in the novel accent, the talker produced the desired pattern of cross-category back vowel shifts consistently on average, without any substantial change in her realization of non-back vowels.
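The midpoint measurement can be sketched as follows. The sketch assumes that a formant track is already available as a list of analysis frames and that vowel boundaries come from the corrected alignment; it selects the frame closest to 50% of the vowel's duration. The function names, the frame format, and the toy values are hypothetical, and the actual measurements were made in Praat rather than with this code.

```python
def midpoint_time(start, end):
    """Time (s) at 50% of the vowel's duration."""
    return start + 0.5 * (end - start)

def formants_at_midpoint(track, start, end):
    """Return (F1, F2) from the analysis frame closest to the vowel midpoint.

    track: list of (time_s, f1_hz, f2_hz) frames; only frames that fall
    inside the vowel interval [start, end] are considered.
    """
    inside = [frame for frame in track if start <= frame[0] <= end]
    if not inside:
        raise ValueError("no formant frames inside the vowel interval")
    mid = midpoint_time(start, end)
    _, f1, f2 = min(inside, key=lambda frame: abs(frame[0] - mid))
    return f1, f2

# Toy formant track (invented values), one frame every 10 ms.
track = [(0.10, 600, 1200), (0.11, 640, 1150), (0.12, 667, 1376),
         (0.13, 650, 1400), (0.14, 620, 1450)]
print(formants_at_midpoint(track, start=0.10, end=0.14))  # (667, 1376)
```

Measuring at the temporal midpoint is a common choice because it is the point least affected by transitions into and out of neighboring consonants; as noted in the text, it does not capture differences in trajectory or duration.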

Figure 2.2: Experiment 1. Mean F1 and F2 (Hz) at vowel midpoint of all stressed tokens of each vowel in the exposure passage when produced in the talker’s standard-sounding Midland accent (phonetic symbols) and in the novel back vowel lowered accent (arrows). Diphthongs and schwa are omitted for clarity.

In total, there were 654 stressed back vowel tokens in the passage, distributed across 180 unique lexical contexts. Table 2.1 shows the number of tokens and contexts for each of the target back vowel categories individually, along with the corresponding by-category counts for the materials used in Experiments 2 and 3. The scatterplots in Figure 2.3 show the midpoint F1 and F2 (Hz) of all stressed tokens of each target back vowel in the Midland accent version of the exposure passage (light gray symbols) and the back vowel lowered version (dark gray symbols). As is apparent in these scatterplots, there was considerable acoustic variability in the realization of the target back vowels in each accent (see Table 2.2 for by-category means and standard deviations for the materials used in Experiments 1, 2 and 3), which is expected given the natural production of these vowels across a wide range of lexical and phonotactic contexts. Given the number of shifted vowel tokens in the back vowel lowered version of the passage, the number of unique lexical contexts in which these tokens occurred, and the degree of acoustic variability among these tokens, the learning environment created by these exposure materials can be characterized as a high token variability environment.

Table 2.1: Number of tokens of each target back vowel category in the exposure materials for Experiments 1-3, and the number of unique lexical contexts in which these tokens occurred.

          Experiment 1          Experiment 2          Experiment 3
          vowel     lexical     vowel     lexical     vowel     lexical
Vowel     tokens    contexts    tokens    contexts    tokens    contexts
u         127       39          41        23          21        15
ʊ         109       23          28        9           12        7
o         230       56          67        26          21        14
ɔ         53        18          18        11          5         5
ɑ         135       44          30        16          13        10

Total     654       180         184       85          72        51
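The two quantities reported in Table 2.1 can be illustrated with a short sketch that tallies vowel tokens and unique lexical contexts from a list of (vowel, word) pairs. The sketch assumes that a lexical context corresponds to a distinct word form, which is an interpretation rather than a definition given in the text; the function name and toy tokens are hypothetical. The final lines simply check that the Experiment 1 by-category counts sum to the reported totals.

```python
from collections import defaultdict

def tally(tokens):
    """Count vowel tokens and unique lexical contexts per vowel category.

    tokens: iterable of (vowel, word) pairs, one pair per stressed vowel token.
    Returns {vowel: (n_tokens, n_unique_words)}.
    """
    n_tokens = defaultdict(int)
    words = defaultdict(set)
    for vowel, word in tokens:
        n_tokens[vowel] += 1
        words[vowel].add(word.lower())  # treat "Wooden" and "wooden" as one context
    return {vowel: (n_tokens[vowel], len(words[vowel])) for vowel in n_tokens}

toy = [("o", "nose"), ("o", "nose"), ("o", "home"),
       ("ʊ", "wooden"), ("ʊ", "Wooden")]
print(tally(toy))  # {'o': (3, 2), 'ʊ': (2, 1)}

# By-category (tokens, contexts) counts for Experiment 1, copied from Table 2.1.
EXP1 = {"u": (127, 39), "ʊ": (109, 23), "o": (230, 56), "ɔ": (53, 18), "ɑ": (135, 44)}
total_tokens = sum(tokens for tokens, _ in EXP1.values())
total_contexts = sum(contexts for _, contexts in EXP1.values())
print(total_tokens, total_contexts)  # 654 180
```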

Figure 2.3: Experiment 1. Midpoint F1 and F2 for all stressed back vowel tokens in the standard-sounding Midland accent version of the exposure passage and the novel back vowel lowered (BVL) version of the passage (n = the number of tokens, and l.c. = the number of unique lexical contexts in which those tokens occurred).

Table 2.2: Mean midpoint F1 and F2 (and standard deviation in parentheses) of stressed back vowel tokens in the novel back vowel lowered accent (BVL) and standard-sounding Midland accent versions of the exposure passage used in each experiment.

                    midpoint F1 (Hz)                    midpoint F2 (Hz)
Accent    Vowel     Exp 1      Exp 2      Exp 3         Exp 1        Exp 2        Exp 3
BVL       u         524 (98)   548 (93)   566 (93)      1805 (282)   1860 (249)   1816 (141)
BVL       ʊ         555 (53)   564 (42)   549 (39)      1270 (193)   1235 (184)   1232 (124)
BVL       o         887 (78)   887 (73)   883 (52)      1501 (160)   1500 (150)   1493 (132)
BVL       ɔ         907 (76)   907 (74)   907 (67)      1753 (184)   1734 (180)   1738 (177)
BVL       ɑ         890 (84)   883 (84)   879 (83)      1792 (244)   1817 (201)   1768 (247)

Midland   u         423 (92)   434 (82)   429 (59)      1674 (347)   1718 (364)   1734 (304)
Midland   ʊ         562 (75)   543 (69)   573 (66)      1740 (227)   1735 (257)   1735 (188)
Midland   o         667 (95)   666 (93)   645 (84)      1376 (176)   1396 (200)   1427 (191)
Midland   ɔ         836 (63)   849 (67)   811 (57)      1296 (129)   1313 (86)    1362 (126)
Midland   ɑ         865 (70)   849 (57)   874 (50)      1382 (133)   1359 (151)   1415 (111)

Note. The table is formatted to emphasize the difference in acoustic variability by vowel across experiments: high token variability (Experiment 1), medium token variability (Experiment 2), and comparatively low token variability (Experiment 3).

An excerpt from The Adventures of Pinocchio was selected as the exposure passage for two reasons. This story contains a large number of back vowel words, which facilitated the creation of high token variability exposure materials. Further, participants were likely to be familiar with the content of this popular children’s story, which should enable them to use top-down knowledge to guide perceptual learning of the cross-category vowel variants (e.g., Maye et al., 2008; Norris et al., 2003).

Test materials

The test materials comprised a set of 220 unique monosyllabic and bisyllabic words and nonwords (see Table 2.3 for example items and Appendix A.4 for the full set of test items). Target items were 60 back vowel lowered surface forms, which were real words of English but pronounced with the novel back vowel lowered chain shift. Crucially, the lexical items comprising the target set were selected so that realization of these words in the back vowel lowered accent would sound like nonwords in standard American English (e.g., wooden as “w[o]den”; nose as “n[ɑ]se”). Thus, without exposure to the back vowel lowered accent, listeners should have difficulty recognizing these word forms. Of the 60 target items, 20 were lexical items that occurred in the exposure passage (trained words) and 40 were lexical items that occurred only at test (new words). The remaining test items were 100 English words pronounced with unshifted (i.e., standard-sounding) back and front vowels (e.g., room as “r[u]m”, nickel as “n[ɪ]ckle”), and 60 items that were phonotactically legal nonwords in both standard American English and the back vowel lowered accent (e.g., dorve [dɔrv]). For clarity, the nonword filler stimuli are referred to as “maximal” nonwords (cf. Connine et al., 1997), since they differ from real English words by multiple segments and features, whereas the back vowel lowered items, which sound like nonword surface forms without knowledge of the accent, differ from citation forms by a single vowel shift.

Table 2.3: Experiments 1-3, example test stimuli.

Item Type                          N     Example Items and Phonetic Forms
Back vowel lowered items
  trained lexical items            20    wooden “w[o]den”       nose “n[ɑ]se”
  new lexical items                40    pudding “p[o]dding”    expose “exp[ɑ]se”
Filler words (unshifted vowels)    100   room “r[u]m”           nickel “n[ɪ]ckle”
Maximal nonwords                   60    dorve “d[ɔ]rve”        yolash “y[o]l[æ]sh”

All 220 test items were recorded by three phoneticians: the same female who recorded the exposure passages (i.e., the trained talker), a new female talker, and a new male talker. All three talkers speak natively with a standard-sounding Midland accent (including the merger of the vowels /ɔ/ and /ɑ/, as in caught and cot, respectively). The test items were recorded using the same equipment and recording procedure as for the exposure passage. After recording, the sound files containing each test item were downsampled to 22.05 kHz and individually scaled to an average intensity of 70 dB. The segment boundaries for each test item were automatically aligned and hand corrected in Praat, and then formant and duration measures were extracted for later analysis.³

The new female talker was selected because her vowel space is acoustically very similar to that of the trained female. Figure 2.4a shows the vowel space envelopes for the trained female and the new female, based on the midpoint F1-F2 from stressed vowels in the 100 test words from each talker containing that talker’s normal vowels. The vowel spaces of the two female talkers are almost entirely overlapping, particularly in the back vowel portion of the space, though these talkers differ somewhat in the acoustic realization of the front vowels /i/ and /æ/. The new male talker was selected because his vowel space is acoustically dissimilar to that of the trained talker, as shown by the vowel envelopes in Figure 2.4b.
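One way to make the notion of a vowel space envelope concrete is to treat it as the convex hull of a talker's midpoint (F2, F1) measurements. This is an assumption for illustration only: the text does not state how the envelopes in Figure 2.4 were computed, and the sketch below builds a single talker's hull and its area without quantifying overlap between talkers. All names and the toy measurements are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain algorithm; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

# Toy (F2, F1) midpoint measurements in Hz (invented values).
tokens = [(2500, 300), (1000, 350), (1800, 1000), (1700, 600), (2000, 500)]
hull = convex_hull(tokens)
print(hull)                 # [(1000, 350), (2500, 300), (1800, 1000)]
print(polygon_area(hull))   # 507500.0
```

Comparing two talkers would additionally require intersecting their hulls or comparing by-vowel means, and would usually involve some form of talker normalization; neither step is shown here.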

The vowel plots in Figure 2.5 show that each of the test talkers produced the desired cross-category back vowel shifts consistently. Phonetic symbols indicate the mean midpoint F1 and F2 (Hz) of each vowel in each talker’s normal accent, based on the 100 test words containing unshifted front and back vowels. Arrows indicate the mean midpoint F1 and F2 (Hz) of stressed vowel tokens in the 60 back vowel lowered test items. Note that, as shown in Figure 2.5, the set of target back vowel lowered items contained no instances of the /ɑ/ → [æ] shift. Participants were likely to be familiar with this vowel shift before participating in the present study because this shift is one of the characteristics of the Inland North dialect, which is spoken throughout parts of Ohio (Labov et al., 2006) and is prominently represented among students at The Ohio State University (Campbell-Kibler, Walker, & Elward, in preparation). Thus, this vowel shift was omitted to increase the likelihood that the back vowel lowered items would be perceived as nonwords by participants without exposure to the novel accent.

3To automatically align the nonword stimuli, the dictionary for the Penn Phonetics Lab Forced Aligner was locally updated with phonological transcriptions for these items.

[Figure 2.4 panels: (a) trained female vs. new female; (b) trained female vs. new male. Each panel plots F1 (Hz) at vowel midpoint against F2 (Hz) at vowel midpoint, with vowel tokens grouped by talker status.]

Figure 2.4: Acoustic comparison of the trained and new talkers’ vowel spaces based on the midpoint F1-F2 (Hz) of all stressed vowel tokens from the 100 filler words containing unshifted front and back vowels. The top panel compares the new female talker to the trained female talker (note the high degree of acoustic similarity between these talkers’ vowel spaces). The bottom panel compares the new male talker to the trained female talker (note the acoustic dissimilarity between these talkers’ vowel spaces).

[Figure 2.5 panels: trained female, new female, new male. Each panel plots F1 (Hz) at vowel midpoint against F2 (Hz) at vowel midpoint.]

Figure 2.5: Production of the back vowel lowered shift by each of the test talkers. Phonetic symbols indicate the mean midpoint F1 and F2 (Hz) of each talker’s normal vowels, based on the test items containing unshifted vowels. Arrows indicate the mean midpoint F1 and F2 of vowels in the back vowel lowered test items.

Procedure

The experiment comprised three tasks in a fixed order: a passive listening accent exposure task followed by an auditory lexical decision task and a word identification task, which were designed to investigate the nature of exposure-driven changes in word recognition. The entire experiment was conducted on a Windows PC and lasted about 50 minutes. Participants sat at a desk with a button box and a standard computer keyboard in front of them. Stimulus presentation was controlled using E-Prime (Schneider, Eschman, & Zuccolotto, 2012), and stimulus materials were presented through Sennheiser HMD280-13 headphones. The exposure phase was manipulated between subjects. Half of the participants heard the exposure passage spoken in the novel back vowel lowered accent before performing the word recognition tasks. The other half of the participants heard the same passage spoken by the same talker but in a standard-sounding Midland accent of American English (control condition) before performing the same test tasks. Participants were instructed that the listening phase was about 20 minutes long and that their only task during this phase was to pay attention to the content of the story.

For the lexical decision task, spoken stimuli were presented binaurally one at a time, and participants were instructed to use the button box to indicate whether each stimulus was a real word of English or a nonword by pressing the corresponding buttons labelled “Word” and “Nonword”. Participants used their right and left index fingers to press the “Word” and “Nonword” buttons, respectively. Participants were instructed to respond as quickly as possible on each trial without sacrificing accuracy. The experiment advanced to the next trial once participants registered a lexical decision, with an inter-trial interval of 1000ms.

Response type (word vs. nonword) and response time from trial onset were recorded for each trial. Stimulus presentation was blocked by talker. The stimulus materials in the first block were always produced by the trained talker to assess adaptation to this talker immediately following the exposure phase. The stimulus materials in the second block were produced either by the new female talker with an acoustically similar vowel space or by the new male talker with an acoustically dissimilar vowel space. Thus, each participant heard test items from both trained and new talkers, but the talker similarity manipulation was between subjects, with half of the participants in each exposure condition (n = 16) hearing the new female talker and the other half hearing the new male talker. Each lexical decision block comprised half of the test items of each item type: that is, 30 target back vowel lowered items (10 trained, 20 new), 50 words pronounced with unshifted vowels, and 30 maximal nonwords. Four lists were created so that across participants each item was heard equally often from each talker. Four fixed pseudorandom orders were created for each list to dissociate item from position in the experiment while also ensuring that target items were approximately evenly distributed throughout the task and always separated by at least one filler trial. Participants were allowed to take a break halfway through the lexical decision task: i.e., between the trained talker and new talker block. During this break, participants were instructed that the stimulus materials in the second block were produced by a different talker than in the first block.
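The ordering constraint described above (target items always separated by at least one filler trial) can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used to build the lists: the function name, the item labels, and the use of seeds to obtain four fixed orders are all assumptions, and the sketch does not by itself enforce an approximately even spread of targets across the block.

```python
import random

def pseudorandom_order(targets, fillers, seed):
    """Return a trial order in which no two target items are adjacent."""
    if len(targets) > len(fillers) + 1:
        raise ValueError("not enough fillers to separate every pair of targets")
    rng = random.Random(seed)  # a fixed seed yields a fixed, reusable order
    targets = list(targets)
    fillers = list(fillers)
    rng.shuffle(targets)
    rng.shuffle(fillers)
    # There are len(fillers) + 1 gaps (before, between, and after fillers).
    # Placing at most one target in each chosen gap keeps targets apart.
    gaps = set(rng.sample(range(len(fillers) + 1), len(targets)))
    order = []
    target_iter = iter(targets)
    for gap in range(len(fillers) + 1):
        if gap in gaps:
            order.append(next(target_iter))
        if gap < len(fillers):
            order.append(fillers[gap])
    return order

# Hypothetical labels: 30 targets and 80 fillers (50 words + 30 nonwords) per block.
targets = [f"target_{i}" for i in range(30)]
fillers = [f"filler_{i}" for i in range(80)]
orders = [pseudorandom_order(targets, fillers, seed) for seed in range(4)]
```

Sampling distinct gaps between shuffled fillers guarantees the separation constraint by construction, which avoids the need to reshuffle and reject invalid orders.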

After the second lexical decision block, the button box was moved aside, and a standard computer keyboard was placed at a comfortable typing position so participants could perform the word identification task. For this task, participants heard a fixed subset of 80 stimulus items presented binaurally one at a time: the 60 back vowel lowered target items (30 each from the trained talker and the same new talker presented during the lexical decision task) and 20 of the standard-sounding test words containing unshifted back vowels (10 each from the trained talker and the new talker in that session). Participants were instructed that if they recognized a stimulus item as a real word of English, they should identify the word by transcribing it orthographically. Participants were instructed to respond by pressing ‘x’ for each stimulus item that they did not recognize as a real word of English. The experiment advanced to the next trial once participants typed a response and pressed the Enter key, with an inter-trial interval of 1500ms. The typed character sequence for each trial was recorded. Items from both talkers were intermixed, and presentation order was randomized. Due to an error in the experiment script, the items produced by each talker were not counterbalanced across participants. Experiments 1-3 were conducted in parallel, so this error persists across experiments. However, this issue does not invalidate the interpretation of results. The central question is whether exposure to the back vowel lowered accent influences recognition of untrained back vowel lowered words produced by new talkers. The word identification data provide a means of answering this question: listeners heard new target items from each test talker, albeit a fixed subset.

Coding

The dependent measure of interest for the lexical decision task was response type: ‘word’ (coded as 1) vs. ‘nonword’ (coded as 0). There was technically no correct response for the target back vowel lowered items, as these items were designed to be perceived differently depending on the preceding exposure phase. The central question addressed by these data is whether exposure to the novel back vowel lowered accent can alter the perceived lexical status of accent-consistent (i.e., back vowel lowered) word forms, which would be indexed by a greater ‘word’ response rate on target trials in the adaptation condition than in the control condition. Response times, which are typically considered the primary measure of interest in the analysis of lexical decision performance, are not analyzed here due to aspects of the experimental design that complicate the interpretation of response latencies. The typical strategy of analyzing response times for the subset of correct “word” responses on target trials (and providing supporting evidence from the timing of errors) is not possible given the lack of correct and incorrect responses on target trials. A more fundamental issue concerns data sparsity and effect estimation. In principle, it is possible to forgo the issue of correct vs. incorrect responses and analyze response times for all ‘word’ responses on target trials; that is, to assume that ‘word’ responses are a relatively accurate (i.e., noiseless) index of perceived lexical status. However, in practice this approach would lead to unreliable timing estimates. In studies where target word stimuli are designed to be highly recognizable (e.g., accuracy greater than 80-90%), psychophysical analyses have shown that internal (participant-level) noise is three times greater for response times than for response choices (Diependaele, Brysbaert, & Neri, 2012). In the current study, the target words are designed to be unrecognizable to participants in the control condition, so the small number of ‘word’ responses for target items in the control condition combined with considerable participant-level noise would result in an unreliable baseline against which to assess effects of exposure on the speed of word recognition in the adaptation condition.

For the word identification task, the dependent measure of interest was identification accuracy. Responses were coded as correct if and only if the response was a correct spelling of the target word (including alternate spellings, e.g., “humour” for “humor”) or a correct spelling of a homophone of the target word (e.g., “doe” for “dough”). All other responses were scored as incorrect. This coding criterion avoids inherent problems in distinguishing typographic errors from misidentifications (e.g., ‘evoke’ in response to the stimulus word “revoke”) and thus provides a conservative measure of word identification accuracy. To estimate the number of responses that were scored as incorrect due to spelling, the generalized Levenshtein distance (i.e., edit distance) between each response and the corresponding target word was calculated: that is, the number of insertions, deletions, and substitutions needed to transform one string into another. Fewer than 4% of responses had an edit distance of 1, indicating a limited number of potential minor spelling-related issues.
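As a rough illustration of this coding scheme, the following Python sketch scores a typed response against a set of accepted spellings and computes an edit distance. The function names, the lowercase and whitespace normalization, and the unit costs for each edit operation are assumptions made for illustration; the text does not specify how the generalized distance was parameterized or which software computed it.

```python
def edit_distance(source, target):
    """Edit distance with unit costs for insertion, deletion, and substitution."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def score_response(response, accepted_spellings):
    """Return 1 if the response matches an accepted spelling or homophone, else 0."""
    return int(response.strip().lower() in accepted_spellings)

# Examples taken from the text: a homophone counts as correct, whereas a
# response one edit away from the target is scored as incorrect.
print(score_response("doe", {"dough", "doe"}))     # 1
print(score_response("evoke", {"revoke"}))         # 0
print(edit_distance("evoke", "revoke"))            # 1
```

Under this scheme a response such as ‘evoke’ for “revoke” is scored as incorrect, and the edit distance of 1 flags it as a possible typographic error rather than a clear misidentification.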

Analysis

Lexical decision responses (word vs. nonword) and word identification accuracy (correct vs. incorrect responses) were analyzed using generalized linear mixed-effects regression (see Jaeger, 2008, for an introduction), as implemented in the lme4 package (version 1.1-7: Bates, Maechler, Bolker, & Walker, 2014) in R (R Core Team, 2014). All mixed-effects models were specified with the maximal random effects structure justified by the experimental design: that is, by-subject and by-item random intercepts, by-subject random slopes for all design variables manipulated within subjects, and by-item random slopes for all design variables manipulated within items (for discussion of this approach, see Barr, Levy, Scheepers, & Tily, 2013). Mixed-effects models were fit with the bobyqa optimizer, and the minimum number of iterations for each analysis was determined by squaring the total number of model parameters (i.e., all fixed effects terms, random effects terms, and correlations among random effects) and multiplying this product by 10 (Bates, Mullen, Nash, & Varadhan, 2014). If the definitionally maximal model failed to converge in the allotted number of iterations, the random effects structure was systematically simplified in a step-wise fashion until the model returned reliable parameter estimates. These steps involved uncorrelating random intercepts from the corresponding by-unit random slopes and dropping the random effects term with the least variance. Log-likelihood model comparisons were used to assess the contribution of fixed effect terms to model fit: each fixed effect term was removed one at a time (along with corresponding higher-order terms), and the subset model was evaluated against the full model using a χ²-test over the difference in model deviance (−2 × Δlog-likelihood), with the degrees of freedom equal to the difference in the number of parameters for the two models. When log-likelihood comparisons revealed significant higher-order interactions among fixed effects, the significance of the lower-order terms (i.e., the constituent main effects and lower-order interaction terms) was assessed using Wald’s z-score.
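The two calculations described above, the iteration budget and the log-likelihood comparison, can be sketched in Python. This is an illustration only: the analyses themselves were run with lme4 in R, the function names are hypothetical, and the log-likelihood values in the example are invented so that the test statistic equals the 16.85 on 6 degrees of freedom reported for exposure condition in Table 2.4.

```python
import math

def iteration_budget(n_params):
    """Ten times the squared number of model parameters."""
    return 10 * n_params ** 2

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(loglik_full, loglik_subset, df):
    """Return the deviance difference (-2 x delta log-likelihood) and its p-value."""
    statistic = -2.0 * (loglik_subset - loglik_full)
    return statistic, chi2_sf(statistic, df)

# Hypothetical log-likelihoods chosen to give a statistic of 16.85 on 6 df.
statistic, p_value = likelihood_ratio_test(-100.0, -108.425, df=6)
print(iteration_budget(12))  # 1440
print(p_value < 0.01)        # True
```

The series-based tail probability keeps the sketch free of third-party dependencies; in practice a statistics library would normally supply this function.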

Across analyses, the critical comparisons involved factorial design variables: e.g., the pre-test exposure condition (the novel back vowel lowered accent vs. the standard-sounding Midland accent), the status of the target lexical items pronounced in the back vowel lowered accent (trained lexical items, which occurred during the exposure phase vs. new lexical items that occurred only at test), and the talker who produced the target items (the trained talker vs. the new acoustically similar and dissimilar talkers). All categorical variables were coded as sum contrasts, which reduces collinearity among predictors and enables regression coefficients to be interpreted as main effects and overall interactions, as opposed to pair-wise comparisons to a reference level.
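For readers unfamiliar with sum coding, the following Python sketch builds a sum-to-zero contrast matrix in the same layout as R's `contr.sum`. The function name is hypothetical, and the level ordering and the ±1 scaling are assumptions: the text does not state which level was coded as −1 or whether the contrasts were rescaled.

```python
def sum_contrasts(levels):
    """Map each factor level to its row in a sum-to-zero contrast matrix."""
    k = len(levels)
    coding = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            # Each of the first k - 1 levels gets its own indicator column.
            coding[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            # The last level is coded -1 in every column.
            coding[level] = [-1] * (k - 1)
    return coding

# Assumed level orderings, for illustration only.
exposure = sum_contrasts(["BVL chain shift", "Midland accent"])
talker = sum_contrasts(["new female", "new male", "trained female"])
print(exposure)  # {'BVL chain shift': [1], 'Midland accent': [-1]}
print(talker)    # {'new female': [1, 0], 'new male': [0, 1], 'trained female': [-1, -1]}
```

Because every column sums to zero across levels, the intercept corresponds to the grand mean (in log-odds) and each coefficient expresses a deviation from it, which is what allows the coefficients to be read as main effects rather than as comparisons to a reference level.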

2.2.2 Results

To investigate perceptual learning of the back vowel lowered accent, two dependent measures were analyzed: endorsement rates during lexical decision (i.e., proportion of ‘word’ responses) and accuracy during the word identification task.

Lexical decision endorsement rates

Figure 2.6 shows the mean endorsement rate for each item type: the filler words pronounced with unshifted vowels, the target back vowel lowered items, and the maximal nonwords.

Endorsement rates are plotted by exposure condition (the novel back vowel lowered accent vs. the standard-sounding Midland accent) and test talker (the trained female vs. the new acoustically similar female vs. the new acoustically dissimilar male). Listeners who were familiarized to the back vowel lowered accent endorsed numerically more of the back vowel lowered items as words, compared to listeners in the control condition (an increase of ∼18-23% across talkers), whereas the endorsement rates for each filler item type were nearly identical across exposure conditions and talkers.

[Figure 2.6 panels: trained female, new female, new male. Each panel plots the proportion of ‘word’ responses for unshifted vowel items, back vowel lowered items, and maximal nonwords, by exposure condition (BVL chain shift vs. Midland accent).]

Figure 2.6: Experiment 1. Mean proportion of ‘word’ responses by item type, exposure condition (BVL = back vowel lowered), and test talker. Error bars indicate bootstrapped 95% confidence intervals.
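The error bars in Figure 2.6 are described as bootstrapped 95% confidence intervals. The text does not say which bootstrap variant was used, so the Python sketch below shows one common option, the percentile bootstrap of a mean. The function name, the number of resamples, the seed, and the per-subject endorsement proportions are all hypothetical.

```python
import random

def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[int((1.0 - alpha) * n_resamples) - 1]
    return lower, upper

# Hypothetical per-subject proportions of 'word' responses (n = 16).
proportions = [0.47, 0.63, 0.70, 0.53, 0.60, 0.77, 0.40, 0.57,
               0.67, 0.50, 0.73, 0.43, 0.60, 0.83, 0.37, 0.63]
lower, upper = bootstrap_ci(proportions)
```

Resampling whole subjects, as here, is one design choice among several; intervals computed over trials or items would answer a different question about variability.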

To assess perceptual learning of the novel vowel chain shift and potential generalization across words and talkers, a mixed logit model was fitted to lexical decisions on target trials with fixed effects for exposure condition, item status (trained vs. new items), talker, and all interactions. Random intercepts were included for subjects and items, along with by-subject random slopes for item status and talker, and by-item random slopes for exposure condition, talker and their interaction. Table 2.4 summarizes the full model. Coefficients are reported in log-odds, the space in which logit models are fitted to the data. Positive coefficient estimates indicate increased log-odds (and hence increased probabilities) of making a ‘word’ response during lexical decision. The only significant predictor of lexical decision performance was exposure condition: as shown in Figure 2.7, listeners who were familiarized to the novel back vowel lowered accent endorsed more of the back vowel lowered items as words, compared to listeners in the control condition. Neither item status nor talker contributed significantly to model fit, nor did any of the interaction terms involving these predictors. However, there was a non-significant trend suggesting that the exposure-driven difference in endorsement rates was marginally smaller for the new back vowel lowered items than the trained items.

Table 2.4: Experiment 1, summary of the full mixed logit model of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

                                              Parameter estimates   Wald’s test         LL comparisons
Predictors (fixed effects)                    Coef β    SE(β)       z       p_z         χ²_Δ    df   p(χ²_Δ)
(Intercept)                                   −0.19     0.31        −0.6    0.54
exposure (= BVL chain shift)                   1.68     0.44         3.8    <.001       16.85    6   <.01
item status (= new items)                     −0.67     0.45        −1.5    0.13         8.77    6   0.18
talker (= new female)                         −0.06     0.31        −0.2    0.86    }    4.62    8   0.80
talker (= new male)                           −0.10     0.36        −0.3    0.78    }
BVL chain shift : new items                   −0.46     0.26        −1.8    0.07         3.15    3   0.37
BVL chain shift : new female                   0.21     0.56         0.4    0.70    }    1.91    4   0.75
BVL chain shift : new male                    −0.62     0.61        −1.0    0.32    }
new items : new female                         0.64     0.44         1.4    0.15    }    2.82    4   0.32
new items : new male                          −0.29     0.49        −0.6    0.56    }
BVL chain shift : new items : new female       0.25     0.67         0.4    0.71    }    0.35    2   0.83
BVL chain shift : new items : new male        −0.41     0.67        −0.6    0.54    }

Note. All factors were coded as sum contrasts. For the log-likelihood comparisons, brackets indicate the predictor that was dropped for each subset model (along with all higher-order terms containing that predictor); χ²_Δ = χ²-test over the difference in model deviance (−2 × Δlog-likelihood); degrees of freedom (df) = difference in number of model parameters (i.e., the number of levels of the omitted factor plus all higher-order interactions containing that factor); p(χ²_Δ) = significance level for the difference in model deviance given the degrees of freedom.
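The Wald statistics and the log-odds scale used in Table 2.4 can be illustrated with a short Python sketch. The function names are hypothetical, and the two-sided normal p-value is an assumption about how the Wald significance levels were obtained. The example uses only the exposure coefficient and standard error reported in the table; it does not attempt to reconstruct condition-level probabilities, because that would require the exact scale of the contrast coding.

```python
import math

def inverse_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def wald_test(coef, se):
    """Return Wald's z (= coef / se) and a two-sided normal p-value."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Exposure coefficient and standard error as reported in Table 2.4.
z, p = wald_test(1.68, 0.44)
print(round(z, 1))  # 3.8
print(p < 0.001)    # True

# The inverse logit is monotonic, so higher log-odds always mean a
# higher probability of a 'word' response.
print(inverse_logit(0.0))                       # 0.5
print(inverse_logit(1.0) > inverse_logit(0.0))  # True
```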

These results suggest that learning of the trained talker’s accent generalized across both words and talkers regardless of whether the trained and new talkers were acoustically similar in terms of their vowel productions. However, it is possible that the aggregate patterns of lexical decision performance were due to rapid adaptation to the new talkers at test, rather than word-independent and talker-independent learning during the exposure phase. That is, the learning mechanism that enables listeners to remap their vowel space on the basis of exposure might be talker-specific but operate quickly enough that the effects of exposure become apparent after hearing only a few isolated instances of back vowel lowered items from the new talkers at test. If so, participants in the adaptation and control conditions should show comparable endorsement rates for back vowel lowered items at the beginning of the new talker test blocks, followed by a rapid endorsement increase among participants in the adaptation condition.

[Figure 2.7 panels: trained female, new female, new male. Each panel plots the proportion of ‘word’ responses by item status (trained vs. new) and exposure condition (BVL chain shift vs. Midland accent).]

Figure 2.7: Experiment 1. Mean proportion of ‘word’ responses for target back vowel lowered items by item status, exposure condition, and test talker. Large points indicate condition grand means. Error bars indicate bootstrapped 95% confidence intervals. Small points indicate subject-wise condition means (jittered and transparent to show overlap).

To assess the possibility of rapid adaptation at test, a mixed logit model was fitted to lexical decisions on target trials, with fixed effects for exposure condition, talker, target trial (1-30 for each talker; centered to reduce collinearity) and all interactions. Note that item and trial were dissociated in the experimental design by using multiple test lists with different item orders. Random intercepts were specified for subjects and items, along with a by-subject random slope for talker and a by-item random slope for exposure condition.

Table 2.5 summarizes the full model. As in the main analysis reported in Table 2.4, there was an overall effect of exposure condition, but no effect of talker, nor any interaction between exposure condition and talker. There was a numeric trend of trial: as shown in Figure 2.8, listeners tended to endorse more of the back vowel lowered items as the task progressed, independent of exposure condition, indicating a modest degree of adaptation during the test phase. The interaction between talker and trial was significant, indicating that the effect of trial was smaller than average for the new male talker, though not for the new female talker. There was a marginal interaction between exposure condition and trial: the slope of this interaction indicates that listeners in the back vowel lowered condition showed a smaller trial-wise endorsement increase overall than listeners in the control condition. Finally, there was a marginal three-way interaction of exposure condition, talker and trial: the slope for the effect of trial was marginally smaller for the new female talker after exposure to the back vowel lowered accent than after exposure to the standard sounding American English accent, but comparable between exposure conditions for the new male talker. Taken together, the results of the trial-wise analysis provide no evidence that listeners in the back vowel lowered exposure condition rapidly adapted to the new talkers at test. As shown in Figure 2.8, the effect of exposure was present from the first target trial from each test talker and persisted across the test phase. In sum, the lexical decision data indicate that exposure to the back vowel lowered accent caused word forms that were otherwise perceived as nonwords (e.g., “w[o]den” for wooden) to be perceived as real words, and this effect transferred across both words and talkers.

Word identification accuracy

The word identification data complement the lexical decision data by indicating whether listeners were able to map the unfamiliar pronunciation variants to the correct lexical representations in memory, as opposed to listeners perceiving a stimulus as sufficiently word-like to yield a ‘word’ response without actually recognizing the intended word. Figure 2.9 shows the mean identification accuracy by exposure condition and talker for the three types of items comprising this task: the trained and untrained back vowel lowered items and the words containing unshifted back vowels. Consistent with the lexical decision data, identification accuracy was greater for the trained and new back vowel lowered items from each talker following exposure to this accent, relative to the control condition.

Table 2.5: Experiment 1, summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

                                              Parameter estimates   Wald’s test         LL comparisons
Predictors (fixed effects)                    Coef β    SE(β)       z       p_z         χ²_Δ    df   p(χ²_Δ)
(Intercept)                                   −0.30     0.28        −1.1    0.29
exposure (= BVL chain shift)                   1.53     0.42         3.7    <.001       20.37    6   <.01
talker (= new female)                          0.01     0.25         0.0    0.98
talker (= new male)                           −0.10     0.27        −0.4    0.71
trial                                          0.01     0.01         1.7    0.10
BVL chain shift : new female                   0.29     0.50         0.6    0.56    }    7.31    4   0.12
BVL chain shift : new male                    −0.72     0.55        −1.3    0.19    }
BVL chain shift : trial                       −0.02     0.01        −1.9    0.06         6.78    3   0.08
new female : trial                            −0.01     0.02        −0.5    0.62    }   16.41    4   <.01
new male : trial                              −0.04     0.02        −2.3    <.05    }
BVL chain shift : new female : trial          −0.06     0.03        −1.9    0.06    }    5.10    2   0.08
BVL chain shift : new male : trial             0.00     0.03         0.1    0.90    }

[Figure 2.8 panels: trained female, new female, new male. Each panel plots the proportion of ‘word’ responses against target trial (1-30) during lexical decision, by exposure condition (BVL chain shift vs. Midland accent).]

Figure 2.8: Experiment 1. Mean trial-wise proportion of ‘word’ responses for the target back vowel lowered items during the lexical decision task as a function of exposure condition and talker. Points indicate trial-wise means. Regression lines indicate binomial best fit curves (curvature is not observable because the proportion range for each effect is small). Error ribbons indicate bootstrapped trial-wise 95% confidence intervals.

[Figure 2.9 panels: trained female, new female, new male. Each panel plots the proportion of correct identifications for unshifted back vowel items, trained back vowel lowered items, and new back vowel lowered items, by exposure condition (BVL chain shift vs. Midland accent).]

Figure 2.9: Experiment 1. Mean word identification accuracy as a function of exposure condition, item type and test talker. Error bars indicate bootstrapped 95% confidence intervals on condition means.

A mixed logit model was fitted to the accuracy data (correct response = 1, incorrect response = 0), with fixed effects for exposure condition, item type, talker, and all interaction terms. This model contained random intercepts for subjects and items, by-subject random slopes for item type and talker, and a by-item random slope for exposure condition. Table 2.6 summarizes the fixed effect estimates for the full model and the results of log-likelihood comparisons between this model and each subset model. There was a significant main effect of item type: identification accuracy was lower for both the trained and untrained back vowel lowered items than for the words containing unshifted back vowel tokens. There was also a main effect of exposure condition: identification accuracy was greater among listeners who were familiarized to the back vowel lowered accent than among listeners in the control condition who heard the Midland accent version of the exposure passage. There was a marginal effect of talker, indicating that identification accuracy was lower than average for the new male talker. None of the interaction terms contributed significantly to model fit. However, there was a numeric trend suggesting an interaction between exposure condition and item type: as shown in Figure 2.9, the effect of exposure condition on identification accuracy tended to be larger for the back vowel lowered items than the words pronounced with unshifted back vowels, regardless of talker. Thus, listeners in the back vowel lowered condition were able to leverage experience with the trained talker’s accent to facilitate recognition of back vowel lowered word forms produced by the new talkers. Further, exposure to the back vowel lowered accent did not impair recognition of words pronounced with unshifted back vowels.

Table 2.6: Experiment 1, summary of the full mixed logit model of word identification accuracy: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

                                                    Parameter estimates   Wald’s test         LL comparisons
Predictors (fixed effects)                          Coef β    SE(β)       z       p_z         χ²_Δ    df   p(χ²_Δ)
(Intercept)                                          1.72     0.25         6.9    <.001
exposure (= BVL chain shift)                         0.92     0.30         3.1    <.01        16.97    9   <.05
item set (= trained BVL items)                      −1.91     0.64        −3.0    <.01    }   50.23   12   <.001
item set (= new BVL items)                          −2.44     0.54        −4.5    <.001   }
talker (= new female)                                0.02     0.34         0.1    0.96    }   18.86   12   0.09
talker (= new male)                                 −0.80     0.33        −2.5    <.05    }
BVL chain shift : trained BVL items                  1.36     0.51         2.7    <.01    }   10.01    6   0.12
BVL chain shift : new BVL items                      0.12     0.41         0.3    0.77    }
BVL chain shift : new female                        −0.18     0.44        −0.4    0.68    }    7.01    6   0.32
BVL chain shift : new male                          −0.30     0.41        −0.7    0.47    }
trained BVL items : new female                      −0.91     0.89        −1.0    0.31    }    7.52    8   0.48
new BVL items : new female                           0.55     0.77         0.7    0.48    }
trained BVL items : new male                        −0.65     0.88        −0.7    0.46    }
new BVL items : new male                             0.14     0.76         0.2    0.85    }
BVL chain shift : trained BVL items : new female     0.39     0.99         0.4    0.69    }    5.19    4   0.26
BVL chain shift : new BVL items : new female        −0.34     0.87        −0.4    0.70    }
BVL chain shift : trained BVL items : new male       1.67     0.97         1.7    0.08    }
BVL chain shift : new BVL items : new male           0.10     0.85         0.1    0.90    }

2.2.3 Discussion

The results of Experiment 1 replicate and extend the findings by Maye et al. (2008) and Weatherholtz (2013) using similar high token variability exposure materials. Converging evidence from lexical decision and word identification responses indicated that passive familiarization to a talker with the novel “back vowel lowered” chain shift was sufficient for listeners in the adaptation condition to correctly recognize accent-consistent word forms that otherwise tended to be perceived as nonwords (e.g., “w[o]den” for wooden, “n[ɑ]se” for nose). This recognition benefit, relative to performance in the control condition, extended to new back vowel lowered words produced by the trained talker (i.e., lexical items that did not occur during the exposure phase), without adversely affecting the recognition of words produced with unshifted back vowels. The finding of generalization to new words indicates that the locus of learning was sublexical (Sjerps & McQueen, 2010; McQueen et al., 2006). Thus, consistent with the findings reported by Maye et al. (2008) and Weatherholtz (2013), listeners remapped their perceptual vowel space to accommodate the unfamiliar cross-category vowel variants, rather than learning a set of lexically-specific pronunciations, and this remapping did not involve overwriting existing knowledge of standard sounding vowel variants.

Going beyond previous findings, the results of Experiment 1 demonstrated that the effect of accent exposure generalized simultaneously across words and talkers. That is, compared to listeners who heard a standard sounding American English accent during the pre-test exposure phase, listeners in the adaptation condition were better able to recognize new back vowel lowered words produced by the two new test talkers: the new female talker whose vowel space was acoustically similar to the trained talker, and the new male talker with an acoustically dissimilar vowel space. This word recognition benefit was present from the first back vowel lowered trial from each new talker at test. Thus, performance in the adaptation condition was driven by perceptual adjustments that occurred while listening to the trained talker, rather than by rapid adaptation to the new talkers at test. Taken together, the findings of simultaneous generalization across words and talkers following single talker exposure conditions indicate that perceptual learning of the back vowel lowered accent involved adjusting the mapping between vowel forms and abstract talker-independent vowel categories.

These findings have several implications for a theory of adaptive speech processing. First, an emerging perspective in the adaptation literature is that multi-talker exposure conditions are necessary for perceptual learning effects to transfer across talkers (Bradlow & Bent, 2008; Lively et al., 1993), except when the trained and new talkers are “sufficiently similar” along dimensions of speech that are relevant for the pronunciation variant(s) being learned (Reinisch & Holt, 2014; Kraljic & Samuel, 2007, 2005; Eisner & McQueen, 2005). The general argument is that multi-talker exposure conditions facilitate talker-independent category abstraction by providing listeners with the necessary input to distinguish general patterns of pronunciation variation from talker-specific or otherwise idiosyncratic variability among category exemplars (Sidaras et al., 2009). However, in cases where the trained and new talkers are sufficiently similar, single talker exposure conditions can result in a limited degree of cross-talker transfer due to what Goldinger (1996) calls “residual savings” (p. 1179) in spoken word recognition. According to this account, episodic representations of spoken words are stored in memory in the form of procedural records: that is, detailed records of the process by which spoken stimuli were mapped to lexical entries in the lexicon. When listeners experience new stimuli that are similar in form to previous stimuli (e.g., an exact stimulus repetition, an acoustically similar word form produced by a similar-sounding talker), the overlap in form enables listeners to access the procedural record for the past stimulus and reuse this record to facilitate the process of matching the new stimulus to a lexical representation in memory. The current results require a more nuanced account of constraints on cross-talker generalization. Perceptual learning of the back vowel lowered accent based on exposure to a single talker transferred across talkers, regardless of the acoustic similarity between the trained and new talkers in terms of their vowel productions. Thus, talker similarity, at least as operationalized in the current study, is insufficient as an explanatory mechanism for cross-talker generalization.

Second, the finding of generalization to new words from new talkers indicates that the recognition benefit for back vowel lowered items produced by these talkers cannot be explained by a general lexical repetition effect (i.e., recognition of pronunciations like “w[o]den” for wooden during the exposure passage facilitates later recognition of these variant forms when produced by different talkers).

Regarding cross-talker generalization, it is possible that the super high token variability learning environment functioned similarly to multi-talker input conditions. That is, with exposure to more than 650 acoustically and lexically variable back vowel tokens over the course of a 20-minute passage, listeners in the adaptation condition abstracted the underlying pattern of cross-category vowel variation, independent of fine details of the trained talker’s vowel space. If the current results are driven by the high variability exposure conditions, then markedly reducing the degree of token variability during exposure should reduce the extent of generalization at test. Experiment 2 was designed to test this possibility.

2.3 Experiment 2

The goal of this experiment was to test whether the patterns of generalization across words and talkers observed in Experiment 1 were dependent on high token variability exposure. Experiment 2 was identical to Experiment 1, except the exposure materials were modified to create exposure conditions characterized by substantially less variability among the target back vowel tokens. This manipulation involved reducing the number of target back vowel tokens that listeners experienced, the number of unique lexical contexts in which those tokens occurred, and the range of acoustic variability in the realization of those tokens. Specifically, the 20-minute passage from Experiment 1 was cut to a 6-minute passage that contained fewer than one-third the number of back vowel tokens distributed across roughly half the number of lexical contexts, and the realization of these tokens was characterized by a narrower range of acoustic variability in F1 x F2 space.

2.3.1 Method

Participants

A total of 107 undergraduates at The Ohio State University participated in Experiment 2 in exchange for partial course credit. Of these participants, 34 were excluded because they were not native monolingual English speakers with normal speech and hearing, and 2 participants were excluded either for technical issues with the experiment or for failing to follow instructions on how to use the button box. An additional 7 participants were excluded for demonstrating exceptionally low accuracy on non-target trials during lexical decision (<70% accuracy in rejecting maximal nonwords). After exclusions, there were 64 usable participants evenly split between exposure conditions (the adaptation and control conditions) and test conditions (generalization to a new similar-sounding talker vs. a new dissimilar talker).

Exposure materials

The exposure materials for Experiment 2 comprised corresponding 6-minute excerpts from the back vowel lowered accent and standard-sounding Midland accent versions of the 20-minute exposure passage used in Experiment 1 (see Appendix A.2 for the full text of the 6-minute exposure passage). In total, each version of the exposure passage contained 184 stressed back vowel tokens distributed across 85 unique lexical contexts (cf. 654 tokens across 180 lexical contexts in Experiment 1). As shown in Table 2.1 above, the number of tokens of each vowel category was reduced by at least one-third, compared to the materials in Experiment 1, and the number of unique lexical contexts in which these tokens occurred was nearly cut in half. The scatterplots in Figure 2.10 show the midpoint F1 and F2 of all stressed tokens of each back vowel category. While there is still considerable variability among the tokens of each vowel in F1 x F2 space, the variance is smaller than in the corresponding materials used in Experiment 1 (see Table 2.2 for comparison of vowel variation in F1 x F2 space across the exposure materials used in each experiment).

[Figure: scatterplot panels for /u/ (n = 41; l.c. = 23), /ʊ/ (n = 28; l.c. = 9), /o/ (n = 67; l.c. = 26), /ɔ/ (n = 18; l.c. = 11), and /ɑ/ (n = 30; l.c. = 16); x-axis: F2 (Hz) at vowel midpoint; y-axis: F1 (Hz) at vowel midpoint; legend: Exposure Condition (Midland AmEng accent, BVL chain shift).]

Figure 2.10: Experiment 2. Midpoint F1 and F2 for all stressed back vowel tokens in the medium token variability exposure passage (n = the number of tokens of each vowel, and l.c. = the number of unique lexical contexts in which those tokens occurred).

Test materials

The materials used for the auditory lexical decision task and the word identification task were identical to Experiment 1, including the set of trained back vowel lowered items. That is, in creating the exposure materials for Experiment 2, the original 20-minute passage was markedly shortened, but all of the lexical items comprising the set of trained target items still occurred during the passage.

Design and procedure

The design and procedure were identical to Experiment 1 with one exception: during the exposure phase, participants listened to a 6-minute passage spoken in either the novel back vowel lowered accent or a standard-sounding Midland accent, rather than the corresponding 20-minute passages used in Experiment 1.

Coding and analysis

The coding and analysis procedures were identical to Experiment 1. For the word identification data, the edit distance between each typed response and the corresponding target word was again used to estimate the number of responses that were scored as incorrect due to minor spelling-related issues. Only 4.4% of responses had an edit distance of 1, indicating a limited number of potential spelling-related issues.
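The edit-distance check described above can be illustrated with a short sketch. It assumes the Levenshtein definition of edit distance (single-character insertions, deletions, and substitutions), which the text does not specify; the function name and the example responses are hypothetical and are not data from the experiment.

```python
def edit_distance(response: str, target: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn response into target.
    (Assumed metric; the source only says "edit distance".)"""
    prev = list(range(len(target) + 1))
    for i, r_char in enumerate(response, start=1):
        curr = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if r_char == t_char else 1
            curr.append(min(prev[j] + 1,           # delete r_char
                            curr[j - 1] + 1,       # insert t_char
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical typed responses paired with their targets.
examples = [("wooden", "wooden"), ("woodin", "wooden"), ("nose", "wooden")]
for typed, target in examples:
    d = edit_distance(typed, target)
    flag = "possible spelling issue" if d == 1 else ""
    print(typed, target, d, flag)
```

A response at distance 1 from its target is only a candidate spelling slip; whether it should count as correct would still need to be judged by hand.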

2.3.2 Results

Lexical decision endorsement rates

Figure 2.11 shows the mean endorsement rate for target items (the back vowel lowered items) and fillers (words pronounced with unshifted vowels and maximal nonwords) by exposure condition and test talker. As in Experiment 1, listeners who were familiarized to the back vowel lowered accent endorsed numerically more of the back vowel lowered items as words, compared to listeners in the control condition (an increase of ∼22-28% across talkers), whereas the endorsement rates for each filler item type were nearly identical across exposure conditions and talkers.

[Figure: bar plot; y-axis: proportion response 'word' (0.0 to 1.0); x-axis: item type (unshifted vowel items, back vowel lowered items, maximal nonwords); panels: trained female, new female, new male; legend: Exposure (BVL chain shift, Midland accent).]

Figure 2.11: Experiment 2. Mean proportion of ‘word’ responses by item type, exposure condition (BVL = back vowel lowered), and test talker. Error bars indicate bootstrapped 95% confidence intervals.

To test for generalization across words and talkers, lexical decisions to target items were fitted with a mixed logit model containing fixed effects for exposure condition, item status, talker and all interactions. Random intercepts were specified for subjects and items. By-subject random slopes were specified for item status and talker, and by-item random slopes were specified for exposure condition, talker and their interaction. Table 2.7 summarizes the fixed effect parameter estimates from the full model and the results of log-likelihood comparisons testing whether each fixed effect term contributed significantly to model fit.
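One way to write down the model just described is sketched below. The notation is an assumption introduced here for illustration (it is not taken from the source): y_{si} is the lexical decision of subject s on item i (1 = ‘word’), x_{si} collects the fixed-effect predictors (exposure condition, item status, talker, and their interactions), z_{si} collects the by-subject terms (intercept, item status, talker), and w_{si} collects the by-item terms (intercept, exposure condition, talker, and their interaction).

```latex
\begin{align*}
y_{si} &\sim \mathrm{Bernoulli}(p_{si}), \\
\operatorname{logit}(p_{si}) &= \mathbf{x}_{si}^{\top}\boldsymbol{\beta}
  + \mathbf{z}_{si}^{\top}\mathbf{u}_{s}
  + \mathbf{w}_{si}^{\top}\mathbf{v}_{i}, \\
\mathbf{u}_{s} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{u}), \qquad
\mathbf{v}_{i} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{v}).
\end{align*}
```

Under this reading, the fixed-effect coefficients are on the log-odds scale, and the random-effect vectors let the baseline endorsement rate and the listed slopes vary across subjects and items.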

Table 2.7: Experiment 2, summary of the full mixed logit model of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model. Where a term spans more than one row, its LL comparison is listed on the first row and applies to the term as a whole.

Predictor (fixed effect)                          Coef β  SE(β)  z      p_z      χ²Δ     df  p(χ²Δ)
(Intercept)                                       −0.35   0.33   −1.1   0.29
exposure (= BVL chain shift)                       1.99   0.44    4.6   <.001    28.57   6   <.001
item status (= new items)                         −0.62   0.50   −1.2   0.22     7.13    6   0.31
talker (= new female)                             −0.22   0.40   −0.6   0.58     6.48    8   0.59
talker (= new male)                               −0.33   0.34   −1.0   0.33
BVL chain shift : new items                       −0.61   0.27   −2.2   <.05     4.60    3   0.20
BVL chain shift : new female                      −0.54   0.73   −0.7   0.46     1.49    4   0.83
BVL chain shift : new male                         0.04   0.57    0.1   0.94
new items : new female                             0.24   0.46    0.5   0.59     1.51    4   0.82
new items : new male                               0.26   0.50    0.5   0.61
BVL chain shift : new items : new female          −0.13   0.69   −0.2   0.85     0.04    2   0.98
BVL chain shift : new items : new male             0.05   0.68    0.1   0.94

As in Experiment 1, the only significant predictor of listeners’ lexical decisions was exposure condition: as shown in Figure 2.12, listeners who were familiarized to the novel accent judged a significantly greater proportion of the target back vowel lowered items as words overall, independent of item status and talker, than participants in the control exposure condition. Item status and talker did not contribute significantly to model fit, either as main effects or interactions, which suggests that perceptual learning of the novel back vowel lowered accent was both word- and talker-independent. There was a trend suggesting that the exposure-driven endorsement difference was somewhat smaller for the new back vowel lowered items than the trained items (see Figure 2.12). While the exposure by item status interaction did not contribute significantly to model fit, this trend was consistent across experiments (see the corresponding results for Experiment 1 in Table 2.4). This trend can be interpreted either as a slight processing benefit for the trained items in the adaptation condition (e.g., a repetition priming effect) or as a slight processing cost for transferring learning to untrained lexical items.
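The p-values in the LL comparison columns of Table 2.7 can be recovered from the reported χ² difference and its degrees of freedom. The sketch below assumes the standard likelihood-ratio approach, in which the χ² difference between nested models is referred to a chi-square distribution; the helper name `chi2_sf` is hypothetical, and it uses a closed-form finite series so that only the Python standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with
    positive integer df, computed from the closed-form finite series."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    #         * sum_{r=1}^{(df-1)/2} x^(r-1) / (1*3*...*(2r-1))
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2 * x / math.pi) * math.exp(-half) * total)


# (term, chi-square difference, df) as reported in Table 2.7.
comparisons = [
    ("exposure", 28.57, 6),
    ("item status", 7.13, 6),
    ("talker", 6.48, 8),
    ("exposure : item status", 4.60, 3),
]
for term, chi2, df in comparisons:
    print(f"{term}: p = {chi2_sf(chi2, df):.4f}")
```

This makes the contrast in the text concrete: the exposure by item status coefficient passes the Wald test (p_z < .05), but removing the term changes model fit by only χ² = 4.60 on 3 degrees of freedom, which is not significant.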

[Figure: point plot; y-axis: proportion response 'word' (0.0 to 1.0); x-axis: Item Status (trained, new); panels: trained female, new female, new male; legend: Exposure (BVL chain shift, Midland accent).]

Figure 2.12: Experiment 2, mean proportion of ‘word’ responses for target back vowel lowered items by item status, exposure condition, and test talker. Large points indicate condition grand means. Error bars indicate bootstrapped 95% confidence intervals. Small points indicate subject-wise condition means (jittered and transparent to show overlap).

To test whether the finding of cross-talker generalization was due to rapid adaptation to the new talkers at test, rather than talker-independent learning during the exposure phase, a mixed logit model was fitted to lexical decisions on the target back vowel lowered trials with fixed effects for exposure condition, talker, target trial (1-30 for each talker; centered to reduce collinearity) and all interaction terms. The random effects structure comprised random intercepts for subjects and items, a by-subject random slope for talker, and a by-item random slope for exposure condition. The fixed effect parameter estimates are summarized in Table 2.8. As in the main analysis reported in Table 2.7, there was an overall effect of exposure condition, but no effect of talker, nor any interaction between exposure condition and talker. Further, there was a main effect of trial: as shown in Figure 2.13, the endorsement of back vowel lowered items increased over the course of the lexical decision task, regardless of exposure condition or talker. Crucially, however, there were no significant interactions involving trial and either exposure condition or talker. Thus, the trial-wise endorsement increase was not significantly greater for any of the test talkers following exposure to the back vowel lowered accent than in the control condition. Further, as shown in Figure 2.13, the exposure-driven endorsement increase was present from the first target trial from each test talker and persisted throughout the lexical decision task.

Together with the results from Experiment 1, the lexical decision results from Experiment 2 provide further evidence for robust word- and talker-independent learning of the novel back vowel lowered accent during the exposure phase.

Table 2.8: Experiment 2, summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model. Where a term spans more than one row, its LL comparison is listed on the first row and applies to the term as a whole.

Predictor (fixed effect)                          Coef β  SE(β)  z      p_z      χ²Δ     df  p(χ²Δ)
(Intercept)                                       −0.42   0.29   −1.5   0.14
exposure (= BVL chain shift)                       1.75   0.40    4.4   <.0001   27.9    6   <.001
talker (= new female)                             −0.24   0.32   −0.8   0.44     12.39   8   0.13
talker (= new male)                               −0.19   0.24   −0.8   0.44
trial                                              0.02   0.01    3.2   <.01     15.08   6   <.05
BVL chain shift : new female                      −0.57   0.62   −0.9   0.36     3.78    4   0.44
BVL chain shift : new male                         0.02   0.48    0.0   0.97
BVL chain shift : trial                           −0.02   0.01   −1.5   0.14     3.73    3   0.29
new female : trial                                 0.02   0.02    1.4   0.17     3.57    4   0.47
new male : trial                                  −0.01   0.02   −0.5   0.60
BVL chain shift : new female : trial              −0.03   0.03   −1.1   0.29     1.45    2   0.48
BVL chain shift : new male : trial                 0.04   0.03    1.2   0.25

[Figure: trial-wise scatterplot with regression lines; y-axis: proportion response 'word' (0.0 to 1.0); x-axis: target trial during lexical decision (1 to 30); panels: trained female, new female, new male; legend: Exposure (BVL chain shift, Midland accent).]

Figure 2.13: Experiment 2. Mean trial-wise proportion of ‘word’ responses for the target back vowel lowered items during the lexical decision task as a function of exposure condition and talker. Points indicate trial-wise means. Regression lines indicate binomial best fit curves (curvature is not observable because the proportion range for each effect is small). Error ribbons indicate bootstrapped trial-wise 95% confidence intervals.
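The caption's remark about unobservable curvature can be checked numerically. The sketch below is illustrative only: it borrows the rounded intercept (−0.42) and trial slope (0.02) from Table 2.8 as representative magnitudes, assumes trials 1-30 are centered at 15.5, and ignores all other predictors, so the probabilities it prints should not be read as fitted values for any particular condition.

```python
import math


def logistic(eta: float) -> float:
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Illustrative magnitudes taken from Table 2.8 (rounded); how they map onto
# specific conditions depends on the predictor coding, which is assumed away.
intercept, slope = -0.42, 0.02

trials = [t - 15.5 for t in range(1, 31)]            # trials 1..30, centered
probs = [logistic(intercept + slope * t) for t in trials]

# Straight line through the first and last predicted probabilities.
line = [probs[0] + (probs[-1] - probs[0]) * k / 29 for k in range(30)]
max_gap = max(abs(p - l) for p, l in zip(probs, line))

print(f"predicted range: {probs[0]:.3f} to {probs[-1]:.3f}")
print(f"largest departure from a straight line: {max_gap:.4f}")
```

With a slope this small, the logistic curve stays well within one percentage point of a straight line across the 30 trials, which is why the fitted curves look linear in the plot.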

[Figure: bar plot; y-axis: proportion correct identification (0.0 to 1.0); x-axis: item type (unshifted back vowel items, trained back vowel lowered items, new back vowel lowered items); panels: trained female, new female, new male; legend: Exposure (BVL chain shift, Midland accent).]

Figure 2.14: Experiment 2. Mean word identification accuracy as a function of exposure condition, item type and test talker. Error bars indicate bootstrapped 95% confidence intervals on condition means.
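For readers unfamiliar with the bootstrapped confidence intervals shown in these figures, the following is a minimal sketch of one common variant, the percentile bootstrap over subject-wise means. The source does not state which bootstrap variant or resampling unit was used, so both are assumptions here, and the accuracy values are invented purely for illustration.

```python
import random


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of values.
    Resamples values with replacement n_boot times (assumed procedure)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical subject-wise accuracy proportions (not data from the study).
subject_means = [0.55, 0.70, 0.62, 0.81, 0.47, 0.66,
                 0.74, 0.59, 0.68, 0.77, 0.52, 0.63]
low, high = bootstrap_ci(subject_means)
grand_mean = sum(subject_means) / len(subject_means)
print(f"mean = {grand_mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Resampling subjects rather than individual trials keeps each listener's responses together, which is one reasonable choice when the error bars are meant to describe condition means across listeners.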

Word identification accuracy

Figure 2.14 shows the mean word identification accuracy by exposure condition and talker for the three types of items comprising this task: the words containing unshifted back vowels and the trained and untrained back vowel lowered items. The accuracy data (correct response = 1, incorrect response = 0) were fitted with a mixed logit model containing fixed effects for exposure condition, item type, talker, and all interaction terms, random intercepts for subjects and items, by-subject random slopes for item type and talker, and a by-item random slope for exposure condition. Table 2.9 shows the fixed effect parameter estimates for the full model and the results of log-likelihood comparisons between this model and each subset model. There was a significant main effect of item type: identification accuracy was lower for both the trained and untrained back vowel lowered items than for the words containing unshifted back vowel tokens. As in Experiment 1, there was also a significant main effect of exposure condition: identification accuracy was higher among listeners who were familiarized to the back vowel lowered accent than among listeners in the control condition. Further, there was a significant interaction between exposure condition and item type: the exposure-driven accuracy increase was greater than average for the trained and new back vowel lowered items. The word identification data complement the lexical decision data, indicating that listeners in the adaptation condition not only perceived more of the back vowel lowered items as words, compared to listeners in the control condition, they were also better able to recognize the intended lexical items.

Table 2.9: Experiment 2, summary of the full mixed logit model of word identification accuracy: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model. Where a term spans more than one row, its LL comparison is listed on the first row and applies to the term as a whole.

Predictor (fixed effect)                          Coef β  SE(β)  z      p_z      χ²Δ     df  p(χ²Δ)
(Intercept)                                        1.52   0.24    6.5   <.001
exposure (= BVL chain shift)                       1.05   0.28    3.8   <.001
item set (= trained BVL items)                    −1.94   0.60   −3.2   <.01
item set (= new BVL items)                        −2.34   0.51   −4.6   <.001
talker (= new female)                             −0.14   0.31   −0.5   0.65     15.28   12  0.23
talker (= new male)                               −0.45   0.31   −1.4   0.15
BVL chain shift : trained BVL items                1.95   0.45    4.3   <.001    20.15   6   <.01
BVL chain shift : new BVL items                    0.77   0.34    2.2   <.05
BVL chain shift : new female                       0.11   0.39    0.3   0.78     3.46    6   0.75
BVL chain shift : new male                        −0.30   0.39   −0.8   0.45
trained BVL items : new female                    −1.76   0.85   −2.1   <.05     10.86   8   0.21
new BVL items : new female                         1.14   0.72    1.6   0.12
trained BVL items : new male                      −0.65   0.85   −0.8   0.44
new BVL items : new male                          −0.10   0.72   −0.1   0.89
BVL chain shift : trained BVL items : new female   0.52   0.97    0.5   0.59     3.39    4   0.50
BVL chain shift : new BVL items : new female      −0.91   0.81   −1.1   0.26
BVL chain shift : trained BVL items : new male    −0.05   0.95   −0.1   0.96
BVL chain shift : new BVL items : new male         1.36   0.80    1.7   0.09

2.3.3 Discussion

The results of Experiment 2 were nearly identical to the results of Experiment 1, despite the considerable reduction in both the number of back vowel tokens that listeners experienced prior to test and the degree of lexical and acoustic variability among these tokens. Converging evidence from the lexical decision and word identification tasks showed that passive familiarization to a single talker with the novel back vowel lowered accent improved the recognition of words produced in this accent (e.g., “w[o]den” for wooden), relative to performance in the control condition. In particular, this exposure-driven recognition benefit was independent of (i) whether the test words pronounced with the cross-category back vowel shifts were trained (i.e., occurred during the pre-test exposure passage) or new; (ii) whether the back vowel lowered items were produced by the trained talker or a new talker; and (iii) whether the trained and new talkers were similar in terms of the acoustics of their vowel productions. Thus, as in Experiment 1, listeners remapped their vowel space talker-independently, allowing learning of the cross-category back vowel variants to generalize simultaneously across words and talkers.

From one perspective, the finding of nearly identical results across Experiments 1 and 2 is unexpected. Learning environments characterized by high token variability are known to facilitate pattern abstraction and extension (Lively et al., 1993; Greenspan et al., 1988; Posner & Keele, 1968). Thus, it was expected that markedly reducing the degree of token variability experienced prior to test would reduce generalization. The results of Experiment 2 showed no evidence of reduced generalization. In fact, judging by the mixed logit parameter estimates, the word- and talker-independent learning effects appeared to be somewhat stronger in Experiment 2 than in Experiment 1: that is, in both experiments the only significant predictor of lexical decision performance was the main effect of pre-test exposure condition, but the size of this effect was larger in Experiment 2 than in Experiment 1 (β_exp2 = 1.99, SE_exp2 = 0.44 vs. β_exp1 = 1.68, SE_exp1 = 0.44).

However, findings from research on foreign accent adaptation suggest that the nearly identical results in Experiments 1 and 2 may not be so surprising because adaptation happens rapidly and can reach a plateau within the first few minutes of exposure. In one study, Clarke and Garrett (2004) used a cross-modal matching task to investigate adaptation to foreign-accented English, measuring perceptual processing time as the latency for judging whether a visual probe word matched the final word in the preceding spoken sentence. Processing speed was initially slower for foreign accented speech than for native speech, but this difference diminished significantly within one minute of accent exposure and was effectively absent by the end of the experiment. These findings provide a lens for interpreting the results of Experiments 1 and 2. The single talker exposure conditions in Experiment 2 (that is, nearly 200 naturally produced and hence acoustically variable back vowel tokens experienced in running speech over the course of six minutes) were sufficient to yield robust word- and talker-independent learning of the novel cross-category vowel variants. The 20-minute exposure materials used in Experiment 1, which contained more than three times as many target back vowel tokens and a greater range of token variability, had no additive influence on recognition of the back vowel lowered word forms, suggesting that learning reached asymptote during the first few minutes of exposure. It is still possible that token variability and talker similarity influence the specificity of perceptual learning of cross-category vowel variation in single talker learning environments; however, the results of Experiments 1 and 2 suggest that such constraints may only be evident during the early phases of adaptation. Experiment 3 was designed to investigate this possibility.

2.4 Experiment 3

The goal of Experiment 3 was to provide a stricter test of the effect of input variability on the specificity of perceptual learning of systemic cross-category vowel variation. Experiment 3 was identical to Experiments 1 and 2, except the exposure phase was a brief 2-minute passage that contained roughly one-tenth the number of target back vowel tokens in the 20-minute passage used in Experiment 1, and fewer than half the number of tokens in the 6-minute passage used in Experiment 2 (72 vs. 184 tokens). These tokens were distributed across a limited set of lexical contexts and were characterized by the narrowest range of variability in F1 x F2 space, compared to the materials used in Experiments 1 and 2.

2.4.1 Method

Participants

A total of 91 undergraduates at The Ohio State University participated in Experiment 3 in exchange for partial course credit. Of these participants, 22 were excluded because they were not native monolingual English speakers with normal speech and hearing, and one was excluded due to technical failures during the experiment. An additional four participants were excluded for exhibiting exceptionally low accuracy on non-target trials during lexical decision (<70% accuracy in rejecting maximal nonwords). After exclusions, there were 64 usable participants evenly split between the adaptation condition and control condition.

Exposure materials

The exposure materials for Experiment 3 were corresponding 2-minute excerpts from the back vowel lowered accent and standard-sounding Midland accent versions of the 20-minute exposure passage used in Experiment 1 (see Appendix A.3 for the full text of the 2-minute exposure passage). Each version of the passage contained a total of 72 stressed tokens of the target back vowels (/u, ʊ, o, ɔ, ɑ/), and these tokens were distributed across 51 unique lexical contexts (cf. 654 tokens across 180 lexical contexts in Experiment 1, and 184 tokens across 85 lexical contexts in Experiment 2). As shown in Table 2.1 above, the exposure materials contained approximately one-tenth the number of stressed tokens of each target back vowel category, compared to the original 20-minute exposure materials, and these tokens were distributed across approximately one-third the number of unique lexical contexts.⁴ The scatterplots in Figure 2.15 show the midpoint F1 and F2 for all stressed tokens of each back vowel category. Since the tokens were naturally produced, there is still considerable acoustic variability in F1 x F2 space, but this variance is much smaller than in the corresponding materials used in Experiments 1 and 2 (see Table 2.2 above).

[Figure: scatterplot panels for /u/ (n = 21; l.c. = 15), /ʊ/ (n = 12; l.c. = 7), /o/ (n = 21; l.c. = 14), /ɔ/ (n = 5; l.c. = 5), and /ɑ/ (n = 13; l.c. = 10); x-axis: F2 (Hz) at vowel midpoint; y-axis: F1 (Hz) at vowel midpoint; legend: Exposure Condition (Midland AmEng accent, BVL chain shift).]

Figure 2.15: Experiment 3. Midpoint F1 and F2 for all stressed back vowel tokens in the low token variability exposure passage (n = the number of tokens of each vowel, and l.c. = the number of unique lexical contexts in which those tokens occurred).
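The proportional reductions across the three sets of exposure materials follow directly from the reported counts. The short sketch below tabulates them; the totals and the per-vowel counts are the ones stated in the text and in the captions of Figures 2.10 and 2.15, while the variable names are introduced here for illustration.

```python
# (stressed back vowel tokens, unique lexical contexts) per experiment.
totals = {
    "Experiment 1": (654, 180),
    "Experiment 2": (184, 85),
    "Experiment 3": (72, 51),
}

# Per-vowel (tokens, lexical contexts) from the Figure 2.10 and 2.15 captions.
per_vowel = {
    "Experiment 2": {"u": (41, 23), "ʊ": (28, 9), "o": (67, 26),
                     "ɔ": (18, 11), "ɑ": (30, 16)},
    "Experiment 3": {"u": (21, 15), "ʊ": (12, 7), "o": (21, 14),
                     "ɔ": (5, 5), "ɑ": (13, 10)},
}

# The per-vowel counts should add up to the reported totals.
for exp, vowels in per_vowel.items():
    n_tokens = sum(n for n, _ in vowels.values())
    n_contexts = sum(lc for _, lc in vowels.values())
    assert (n_tokens, n_contexts) == totals[exp]

# Token and lexical-context ratios between experiments.
ratios = {}
for later, earlier in [("Experiment 2", "Experiment 1"),
                       ("Experiment 3", "Experiment 1"),
                       ("Experiment 3", "Experiment 2")]:
    tok = totals[later][0] / totals[earlier][0]
    ctx = totals[later][1] / totals[earlier][1]
    ratios[(later, earlier)] = (round(tok, 2), round(ctx, 2))
    print(f"{later} / {earlier}: tokens {tok:.2f}, contexts {ctx:.2f}")
```

The token ratios work out to roughly 0.28 (Experiment 2 relative to Experiment 1), 0.11 (Experiment 3 relative to Experiment 1), and 0.39 (Experiment 3 relative to Experiment 2).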

Test materials

The materials used for the auditory lexical decision task and the word identification task were identical to those in Experiments 1 and 2, including the set of trained back vowel lowered items, which occurred in the 2-minute exposure passage used in Experiment 3.

⁴ The token reduction for /u/ was smaller than for the other vowels in order to preserve coherence of the passage content. Certain grammatically important /u/-words like you occurred frequently in the passage, and there were limits on how many of these instances could be removed without adversely affecting the clarity of the story.

Design and procedure

The design and procedure were identical to Experiments 1 and 2, except during the exposure phase participants listened to a 2-minute low token variability passage spoken in either the novel back vowel lowered accent or a standard-sounding Midland accent, rather than the lengthier and more variable versions of these passages used in the previous experiments.

Coding and analysis

The coding and analysis procedures were identical to Experiments 1 and 2. For the word identification data, the edit distance between each typed response and the corresponding target word was again used to estimate the number of responses that were scored as incorrect due to minor spelling-related issues. Less than 4% of responses had an edit distance of 1, indicating a limited number of potential spelling-related issues.

2.4.2 Results

Lexical decision endorsement rates

Figure 2.16 shows the mean endorsement rate for target items (the back vowel lowered items) and fillers (words pronounced with unshifted vowels and maximal nonwords) by exposure condition and test talker. As in Experiments 1 and 2, listeners in the adaptation condition endorsed numerically more of the back vowel lowered items as words, compared to listeners in the control condition (an increase of ∼14-27% across talkers), whereas the endorsement rates for each filler item type were nearly identical across exposure conditions and talkers.

[Figure: bar plot; y-axis: proportion response 'word' (0.0 to 1.0); x-axis: item type (unshifted vowel items, back vowel lowered items, maximal nonwords); panels: trained female, new female, new male; legend: Exposure (BVL chain shift, Midland accent).]

Figure 2.16: Experiment 3. Mean proportion of ‘word’ responses by item type, exposure condition (BVL = back vowel lowered), and test talker. Error bars indicate bootstrapped 95% confidence intervals.

To assess the effects of exposure condition, item status and test talker on perception of the back vowel lowered items, a mixed logit model was fitted to lexical decisions on target trials with fixed effects for these three factors and all interaction terms. The random effects structure comprised random intercepts for subjects and items, by-subject random slopes for item status and talker, and by-item random slopes for exposure condition, talker, and the interaction term. The results of the full model are summarized in Table 2.10. As in Experiments 1 and 2, there was a main effect of exposure condition (as indicated by the Wald’s z-score, since log-likelihood comparisons revealed a significant interaction involving exposure condition). The main effect of exposure condition confirmed that listeners who were familiarized to the back vowel lowered accent endorsed more of the back vowel lowered items as words than listeners in the control condition. There was also a significant three-way interaction of exposure condition, item status, and talker: as shown in Figure 2.17, the exposure-driven endorsement difference was larger than average for the new words produced by the new acoustically-similar female talker but was smaller than average for the new words produced by the new acoustically-dissimilar male talker. This three-way interaction suggests that listeners’ ability to generalize learning simultaneously across words and talkers was constrained by the degree of similarity between the trained and new talkers.
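The Wald statistics that this analysis relies on can be recomputed from the coefficients and standard errors in Table 2.10. The sketch below assumes the usual two-sided test against a standard normal distribution; the function name `wald_test` is hypothetical, and because the tabled coefficients are rounded to two decimals, the recomputed z-scores can differ slightly from the printed ones.

```python
import math


def wald_test(beta: float, se: float):
    """Return Wald's z (= beta / se) and its two-sided p-value under a
    standard normal reference distribution (assumed test)."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p


# (term, coefficient, standard error) as reported in Table 2.10.
terms = [
    ("exposure (= BVL chain shift)", 1.47, 0.36),
    ("BVL chain shift : new items : new female", 1.32, 0.62),
    ("BVL chain shift : new items : new male", -1.62, 0.62),
]
for name, beta, se in terms:
    z, p = wald_test(beta, se)
    print(f"{name}: z = {z:.2f}, p = {p:.4f}")
```

The recomputed values fall in the same significance bands as the table: p < .0001 for exposure condition, p < .05 for the three-way term involving the new female talker, and p < .01 for the term involving the new male talker.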

Table 2.10: Experiment 3, summary of the full mixed logit model of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level pz for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

Predictors (fixed effects)                 Coef β  SE(β)      z       pz   LL χ²Δ  LL dfΔ    LL p
(Intercept)                                 −0.27   0.28   −1.0     0.34
exposure (= BVL chain shift)                 1.47   0.36    4.0   <.0001
item status (= new items)                   −0.70   0.44   −1.6     0.11
talker (= new female)                       −0.40   0.30   −1.4     0.17
talker (= new male)                          0.21   0.29    0.7     0.47
BVL chain shift : new items                 −0.21   0.26   −0.8     0.42
BVL chain shift : new female                −0.66   0.54   −1.2     0.22
BVL chain shift : new male                  −0.28   0.50   −0.6     0.57
new items : new female                       0.43   0.40    1.1     0.27
new items : new male                         0.12   0.45    0.3     0.79
BVL chain shift : new items : new female     1.32   0.62    2.1     <.05     6.92       2    <.05
BVL chain shift : new items : new male      −1.62   0.62   −2.6     <.01

Note: an LL comparison listed on the first row of a multi-row term applies to that term as a whole.
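The relation between the tabled quantities can be illustrated with a minimal sketch. It assumes only what the caption states (z = β/SE(β)) plus a two-sided standard normal reference distribution for pz, which is the usual convention for Wald tests but is an assumption here; the function name `wald_test` is hypothetical. Because the tabled β and SE(β) are rounded, a recomputed z can differ from the tabled value in the last digit.

```python
import math

def wald_test(beta: float, se: float) -> tuple[float, float]:
    """Return the Wald z-score (beta / se) and its two-sided normal p-value."""
    z = beta / se
    # Two-sided tail area of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Coefficient estimates and standard errors copied from Table 2.10.
rows = {
    "exposure (= BVL chain shift)": (1.47, 0.36),
    "BVL chain shift : new items : new female": (1.32, 0.62),
    "BVL chain shift : new items : new male": (-1.62, 0.62),
}

for name, (beta, se) in rows.items():
    z, p = wald_test(beta, se)
    print(f"{name}: z = {z:.2f}, p = {p:.4f}")
```

The recomputed p-values fall in the same significance bands as those reported in the table (<.0001, <.05, and <.01, respectively).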

[Figure 2.17 plot. Panels: trained female, new female, new male. x-axis: Item Status (trained, new). y-axis: proportion response ‘word’ (0.0 to 1.0). Legend: exposure (BVL chain shift, Midland accent).]

Figure 2.17: Experiment 3, mean proportion of ‘word’ responses for target back vowel lowered items by item status, exposure condition, and test talker. Large points indicate condition grand means. Small points indicate subject-wise condition means (jittered and transparent to show overlap). Error bars indicate bootstrapped 95% confidence intervals for condition grand means.
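The bootstrapped confidence intervals shown in the figures can be illustrated with a minimal sketch. The text does not state which bootstrap variant or how many resamples were used, so the percentile method, the resample count, and the function name `bootstrap_ci` are all assumptions; the subject-wise proportions below are hypothetical values, not data from the experiment.

```python
import random
import statistics

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Take the alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]

# Hypothetical subject-wise endorsement proportions (not data from the study).
subject_means = [0.55, 0.70, 0.40, 0.65, 0.80, 0.60, 0.45, 0.75, 0.50, 0.68]
low, high = bootstrap_ci(subject_means)
print(f"mean = {statistics.fmean(subject_means):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Resampling subject-wise means rather than individual trials respects the fact that trials from the same listener are not independent; whether the original analysis resampled at that level is not stated.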

Table 2.11: Experiment 3, summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level pz for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

Predictors (fixed effects)                 Coef β  SE(β)      z       pz   LL χ²Δ  LL dfΔ    LL p
(Intercept)                                 −0.37   0.26   −1.4     0.15
exposure (= BVL chain shift)                 1.35   0.34    3.9   <.0001
talker (= new female)                       −0.32   0.24   −1.3     0.18
talker (= new male)                          0.19   0.22    0.9     0.39
trial                                        0.01   0.01    2.7     <.01     9.48       6    0.15
BVL chain shift : new female                −0.46   0.47   −1.0     0.33    10.04       4    <.05
BVL chain shift : new male                  −0.49   0.45   −1.1     0.28
BVL chain shift : trial                      0.00   0.01    0.5     0.64     0.93       3    0.82
new female : trial                           0.02   0.02    1.3     0.19     2.67       4    0.61
new male : trial                            −0.01   0.02   −0.8     0.43
BVL chain shift : new female : trial         0.02   0.03    0.6     0.57     0.90       2    0.64
BVL chain shift : new male : trial           0.01   0.03    0.2     0.81

Note: an LL comparison listed on the first row of a multi-row term applies to that term as a whole.

To test whether the finding of cross-talker generalization was due to rapid adaptation to the new talkers at test, a mixed logit model was fitted to lexical decisions on the target back vowel lowered trials, with fixed effects for exposure condition, test talker, target trial (1-30 for each talker; centered to reduce collinearity) and all interaction terms. Random intercepts were specified for subjects and items, along with a by-subject random slope for test talker, and a by-item random slope for exposure condition. The fixed effect parameter estimates are reported in Table 2.11. As in the main analysis reported in Table 2.10, there was a significant main effect of exposure condition, but no main effect of talker. The main effect of trial did not contribute significantly to model fit, though the model parameter estimates suggested a tendency for listeners to endorse more of the back vowel lowered items as the task progressed (see Figure 2.18). There was a significant interaction between exposure and talker: releveling the talker factor revealed that the effect of exposure was greater for the trained talker than for the new talkers (β = 0.94, z = 3.2, p < .01). There were no significant two-way or three-way interactions involving target trial, which indicates that the rate of the trial-wise endorsement increase for target items was comparable for participants in the adaptation and control conditions, regardless of test talker. Thus, the trial-wise analysis provides no support for a learning account in which listeners in the adaptation condition rapidly adapted to the new talkers at test.

[Figure 2.18 plot. Panels: trained female, new female, new male. x-axis: target trial during lexical decision (1 to 30). y-axis: proportion response ‘word’ (0.0 to 1.0). Legend: exposure (BVL chain shift, Midland accent).]

Figure 2.18: Experiment 3. Mean trial-wise proportion of ‘word’ responses for the target back vowel lowered items during the lexical decision task as a function of exposure condition and talker. Points indicate trial-wise means. Regression lines indicate binomial best fit curves (curvature is not observable because the proportion range for each effect is small). Error ribbons indicate bootstrapped trial-wise 95% confidence intervals.
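The log-likelihood comparisons reported in Table 2.11 can be checked with a minimal sketch. It assumes that each χ² difference is referred to a chi-square distribution with the tabled df difference, which is the standard likelihood ratio test but is not spelled out in the text; the function name `chi2_sf` and the shortened term labels are hypothetical. The upper-tail probability is computed from closed forms for integer df, so no external library is needed.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite correction sum.
    terms = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms

# (chi-square difference, df difference) pairs copied from Table 2.11.
comparisons = {
    "trial": (9.48, 6),
    "exposure : talker": (10.04, 4),
    "exposure : trial": (0.93, 3),
    "talker : trial": (2.67, 4),
    "exposure : talker : trial": (0.90, 2),
}

for term, (chi2, df) in comparisons.items():
    print(f"{term}: p = {chi2_sf(chi2, df):.2f}")
```

Under this assumption the recomputed p-values agree with the LL p column of Table 2.11 to rounding, including the one significant comparison (exposure by talker, p < .05).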

Word identification accuracy

Figure 2.19 shows the mean accuracy on the word identification task by pre-test exposure condition and test talker for the three types of items comprising this task: the filler words containing unshifted back vowels and the trained and new back vowel lowered items. To assess the effect of exposure on identification accuracy, the trial-wise accuracy data (correct = 1, incorrect = 0) were fitted with a mixed logit model containing fixed effects for exposure condition, item type, test talker and all interaction terms. This model contained by-subject and by-item random intercepts, by-subject random slopes for item type and talker, and a by-item random slope for exposure condition. The fixed effect parameter estimates are shown in Table 2.12, along with the results of log-likelihood comparisons testing whether each fixed effect term contributed significantly to model fit. The main effect of exposure condition was significant: identification accuracy was greater overall among participants who were familiarized to the back vowel lowered accent. The main effect of item type was also significant: accuracy was lower overall for the trained and new back vowel lowered items than for the words containing unshifted (standard sounding) back vowels. These two main effects were qualified by an exposure condition by item type interaction: as shown in Figure 2.19, the effect of exposure on identification accuracy was greater than average for the trained and new back vowel lowered items. The effect of talker did not contribute significantly to model fit, nor did any of the other interaction terms. Taken together, these results indicate that the word recognition benefits resulting from exposure to the back vowel lowered accent were both word- and talker-independent.

[Figure 2.19 plot. Panels: trained female, new female, new male. x-axis: item type (unshifted back vowel items, trained back vowel lowered items, new back vowel lowered items). y-axis: proportion correct identification (0.0 to 1.0). Legend: exposure (BVL chain shift, Midland accent).]

Figure 2.19: Experiment 3. Mean word identification accuracy as a function of exposure condition, item type and test talker. Error bars indicate bootstrapped 95% confidence intervals on condition means.

Table 2.12: Experiment 3, summary of the full mixed logit model of word identification accuracy: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level pz for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

Predictors (fixed effects)                         Coef β  SE(β)      z       pz   LL χ²Δ  LL dfΔ    LL p
(Intercept)                                          1.63   0.22    7.4    <.001
exposure (= BVL chain shift)                         0.52   0.24    2.1     <.05
item set (= trained BVL items)                      −2.01   0.57   −3.5    <.001
item set (= new BVL items)                          −2.33   0.49   −4.7    <.001
talker (= new female)                                0.23   0.32    0.7     0.47    13.93      12    0.31
talker (= new male)                                 −0.44   0.30   −1.4     0.15
BVL chain shift : trained BVL items                  0.86   0.36    2.4     <.05    14.15       6    <.05
BVL chain shift : new BVL items                      0.74   0.34    2.2     <.05
BVL chain shift : new female                        −0.20   0.45   −0.4     0.66     3.30       6    0.77
BVL chain shift : new male                           0.22   0.39    0.6     0.58
trained BVL items : new female                      −2.48   0.84   −2.9     <.01    12.63       8    0.13
new BVL items : new female                          −0.12   0.74   −0.2     0.87
trained BVL items : new male                        −0.32   0.82   −0.4     0.70
new BVL items : new male                             0.28   0.71    0.4     0.69
BVL chain shift : trained BVL items : new female     1.01   0.99    1.0     0.31     3.29       4    0.51
BVL chain shift : new BVL items : new female         0.43   0.91    0.5     0.64
BVL chain shift : trained BVL items : new male      −1.64   0.90   −1.8   = 0.07
BVL chain shift : new BVL items : new male          −0.28   0.81   −0.3     0.73

Note: an LL comparison listed on the first row of a multi-row term applies to that term as a whole.

While the three-way interaction of exposure condition, item type and talker did not contribute significantly to model fit, there was a non-significant trend suggesting that the exposure-driven accuracy difference was smaller than average for the trained items produced by the new acoustically dissimilar male talker. Thus, the word identification data provide weak support for talker similarity as a constraint on cross-talker generalization.

2.4.3 Discussion

The accent familiarization phase for Experiment 3 was two minutes long and contained approximately one tenth the number of back vowel tokens from the original exposure materials used in Experiment 1, with these tokens distributed across far fewer lexical contexts and characterized by a narrower range of acoustic variability in F1 × F2 space. Despite this reduction in token variability, the word recognition benefits resulting from exposure to the novel back vowel lowered accent generalized across words and talkers. Thus, adaptation to the novel cross-category vowel shifts happened rapidly (see also Clarke & Garrett, 2004), and brief exposure was sufficient for listeners to remap their perceptual vowel space, allowing learning to generalize beyond the accented words and talker experienced prior to test.

In contrast to results from Experiments 1 and 2, listeners’ ability to generalize simultaneously across words and talkers (i.e., to new words produced by a new talker) was constrained by the degree of acoustic similarity between the trained and new talkers in terms of their vowel productions. The lexical decision and word identification data provided converging evidence that familiarization to the trained talker with the back vowel lowered accent improved recognition of new back vowel lowered items produced by the new female talker with an acoustically similar vowel space, compared to performance in the control condition. However, the lexical decision data indicated that exposure to the trained talker with the back vowel lowered accent had a limited influence on perception of the new back vowel lowered items produced by the new acoustically-dissimilar male talker. Thus, while listeners remapped their perceptual vowel space to cope with the unfamiliar cross-category vowel shifts, this remapping was not fully talker-independent; otherwise, listeners would have been able to generalize to new back vowel lowered items produced by new talkers, regardless of whether the new talker’s vowel productions were acoustically similar to those from the trained talker. The lexical decision task was blocked by talker, and listeners were instructed before the new talker block that all stimulus materials in the upcoming block were produced by a different talker from the one presented up to that point in the experiment. Because of this aspect of the design, it is unlikely that performance for the new female talker was a pseudo-generalization effect based on listeners failing to distinguish productions from the trained and new female talkers (see, e.g., Cutler, Eisner, McQueen, & Norris, 2010).

Findings concerning talker similarity effects in continuous recognition memory provide insight into experience-based word recognition benefits that extend to similar-sounding talkers but not to dissimilar-sounding talkers. In a now-classic study, Goldinger (1996) found that recognition of repeated words was faster and more accurate when the initial and repeated tokens were produced either by the same talker or by different talkers with perceptually similar voices, as opposed to repetitions produced by perceptually dissimilar talkers (cf. Palmeri et al., 1993). The same-voice repetition effect is one of the canonical findings providing evidence that listeners store speech episodes in long term memory and that these episodes influence subsequent word recognition (Bradlow et al., 1999; Craik & Kirsner, 1974; Goldinger, Kleider, & Shelley, 1999; Palmeri et al., 1993; Schacter & Church, 1992). To explain the similar-voice repetition effect, Goldinger (1996) argued that speech episodes could be stored in memory as procedural records for mapping specific instances to higher-level categories (as opposed to episodes stored as veridical representations of the speech signal). Thus, when a similar sounding episode is encountered later, the procedure for mapping this input to the correct underlying category partially overlaps with a stored record in memory, resulting in “residual savings” that can be observed behaviorally as a speed and accuracy benefit in word recognition performance (Goldinger, 1996, p. 1179). Conversely, when the initial and repeated instances are produced by dissimilar sounding talkers, there is minimal form overlap; hence the procedural record for recognizing the initial episode is of little use, and no recognition benefit is observed. It should be noted that the focus in these continuous recognition memory experiments was on word repetition effects, which has led to the widespread, though not inherently necessary, assumption that episodic encoding involves word forms, as opposed to instance-specific memory for sublexical forms.

Goldinger’s (1996) procedural record account provides a plausible explanation of the current talker similarity effects, under the assumption that listeners can develop procedural records for mapping segmental information to phonological representations, not just word forms to lexical representations. If so, listeners who were familiarized to the talker with the back vowel lowered accent could have remapped their perceptual vowel space talker-specifically to accommodate fine acoustic-phonetic detail of the trained talker’s vowel productions, with residual savings when processing vowel forms from the new female talker with an acoustically similar vowel space. Conceptually, this account works as follows: instead of listeners in the adaptation condition learning, for example, that the category percept [o] maps to the phonological category /ʊ/ (and hence “w[o]den” maps to wooden), which is a remapping among higher-level phonetic and phonological categories, listeners could learn that in the trained talker’s accent, a specific range of spectral-temporal values produced by the trained talker (e.g., F1, F2, duration) maps to /ʊ/. The procedural record for mapping values in this range to the correct phonological category could be recycled when similar vowel tokens are later heard from the trained talker in the context of new words, hence accounting for generalization across words. Further, since the trained female and the new female have almost entirely overlapping vowel spaces (see Figure 2.4), the procedural records based on exposure to the trained talker’s back vowel productions could be recycled when processing back vowel productions from the new female, regardless of whether these tokens occur in the context of trained or new lexical items. This procedural record account does not provide a straightforward explanation of the finding that learning generalized to trained words produced by the new acoustically-dissimilar male talker; however, this latter effect could be due to talker-independent word-repetition priming.

2.5 Omnibus analyses

Two omnibus analyses were conducted to assess the influence of input variability (manipulated across experiments) on perception of the target back vowel lowered items: one on endorsement rates for the target items and one on identification accuracy for these items. The pooled data comprised responses from 192 subjects.

2.5.1 Omnibus analysis of endorsement rates

The pooled set of lexical decisions to the target back vowel lowered items was fitted with a mixed logit model containing four fixed effect terms and all interactions: exposure (the back vowel lowered accent vs. the standard sounding American English accent), item status (trained vs. new items), talker (the trained female talker, the new acoustically-similar female talker, and the new acoustically-dissimilar male talker) and experiment (Experiment 1 = high token variability; Experiment 2 = medium token variability; Experiment 3 = comparatively low token variability). The random effects structure comprised random intercepts for subjects and items, by-subject random slopes for talker and item status, and by-item random slopes for exposure condition, talker and experiment. Table 2.13 summarizes the fixed effect parameter estimates (reported as log-odds of a ‘word’ response) and shows the results of log-likelihood comparisons testing whether each fixed effect term contributed significantly to model fit.
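Because the parameter estimates are reported as log-odds, a short sketch may help in reading them. It shows the two standard conversions: exponentiating a coefficient gives the multiplicative change in the odds, and the inverse logit maps a log-odds value to a probability. The function names are hypothetical. Which condition a given coefficient or intercept refers to depends on the contrast coding of the factors, which is not specified in this section, so the sketch makes no claim about predicted endorsement rates for particular cells of the design.

```python
import math

def odds_ratio(beta: float) -> float:
    """Multiplicative change in the odds implied by a log-odds coefficient."""
    return math.exp(beta)

def inv_logit(log_odds: float) -> float:
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Exposure coefficient as reported in Table 2.13 (log-odds of a 'word' response).
beta_exposure = 1.72
print(f"odds ratio for exposure: {odds_ratio(beta_exposure):.2f}")

# A log-odds of 0 corresponds to a probability of 0.5 (even odds).
print(f"inv_logit(0.0) = {inv_logit(0.0):.2f}")
```

On this reading, the exposure estimate corresponds to the odds of a ‘word’ response being roughly five and a half times higher for the adaptation condition, with other terms in the model held fixed.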

As expected, there was a main effect of exposure condition, indicating that listeners who were familiarized to the back vowel lowered accent endorsed more of the back vowel lowered items at test than listeners in the control condition, regardless of the item status or talker. There was also a marginally significant interaction between exposure condition and item status, indicating that the exposure-driven endorsement difference tended to be somewhat smaller for new words pronounced in the back vowel lowered accent than for words that occurred in the exposure passage. This interaction suggests either a processing benefit for trained items (e.g., a lexical repetition effect) or a slight processing cost of generalizing learning to new words. No other main effects or interactions were significant. The fact that the effect of experiment did not contribute significantly to model fit indicates that the token variability manipulation across experiments had no reliable influence on learning outcomes. Further, while the lexical decision results for Experiment 3 showed that similarity between the trained and new talkers influenced the strength of generalization to new words produced by a new talker, the influence of talker similarity on generalization of learning was not strong enough to appear in the omnibus analysis (i.e., the four-way interaction of exposure, item status, talker, and experiment was not significant).

Table 2.13: Experiments 1-3, summary of the omnibus analysis of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level pz for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

Predictors (fixed effects)                         Coef β  SE(β)      z       pz   LL χ²Δ  LL dfΔ    LL p
(Intercept)                                         −0.26   0.25   −1.0     0.31
exposure (= BVL chain shift)                         1.72   0.25    7.0    <.001
item status (= new items)                           −0.66   0.45   −1.5     0.15
talker (= new female)                               −0.24   0.24   −1.0     0.32    27.41      24    0.29
talker (= new male)                                 −0.06   0.25   −0.3     0.80
experiment (= Exp 2)                                −0.16   0.33   −0.5     0.64    15.49      24    0.91
experiment (= Exp 3)                                −0.05   0.33   −0.1     0.89
BVL chain shift : new items                         −0.42   0.19   −2.2     <.05    16.15       9   = .06
BVL chain shift : new female                        −0.27   0.34   −0.8     0.43    18.90      12    0.10
BVL chain shift : new male                          −0.29   0.31   −0.9     0.36
new items : new female                               0.45   0.39    1.2     0.25    13.16      12    0.36
new items : new male                                 0.03   0.43    0.1     0.95
BVL chain shift : Exp 2                              0.47   0.67    0.7     0.48     9.58      12    0.65
BVL chain shift : Exp 3                             −0.36   0.66   −0.5     0.59
new items : Exp 2                                    0.15   0.18    0.8     0.42     8.96      12    0.71
new items : Exp 3                                   −0.12   0.17   −0.7     0.45
new female : Exp 2                                   0.02   0.47    0.0     0.97    11.39      16    0.61
new male : Exp 2                                    −0.47   0.43   −1.1     0.28
new female : Exp 3                                  −0.29   0.47   −0.6     0.53
new male : Exp 3                                     0.45   0.43    1.1     0.29
BVL chain shift : new items : new female             0.63   0.36    1.7     0.08     9.54       6    0.15
BVL chain shift : new items : new male              −0.76   0.35   −2.2     <.05
BVL chain shift : new items : Exp 2                 −0.24   0.34   −0.7     0.48     2.67       6    0.61
BVL chain shift : new items : Exp 3                  0.46   0.33    1.4     0.16
BVL chain shift : new female : Exp 2                −0.48   0.94   −0.5     0.61     6.54       8    0.59
BVL chain shift : new male : Exp 2                   0.62   0.86    0.7     0.47
BVL chain shift : new female : Exp 3                −0.71   0.94   −0.8     0.45
BVL chain shift : new male : Exp 3                   0.18   0.86    0.2     0.83
new items : new female : Exp 2                      −0.34   0.50   −0.7     0.50     5.91       8    0.67
new items : new male : Exp 2                         0.42   0.49    0.9     0.38
new items : new female : Exp 3                      −0.05   0.49   −0.1     0.92
new items : new male : Exp 3                         0.13   0.48    0.3     0.79
BVL chain shift : new items : new female : Exp 2    −1.19   0.99   −1.2     0.23     4.45       4    0.34
BVL chain shift : new items : new male : Exp 2       1.36   0.97    1.4     0.16
BVL chain shift : new items : new female : Exp 3     1.64   0.98    1.7     0.09
BVL chain shift : new items : new male : Exp 3      −1.95   0.96   −2.0     <.05

Note: an LL comparison listed on the first row of a multi-row term applies to that term as a whole.

2.5.2 Omnibus analysis of word identification accuracy

The pooled trial-wise word identification accuracy data (correct = 1, incorrect = 0) were fitted with a mixed logit model containing four fixed effect terms and all interactions: exposure (the back vowel lowered accent vs. the standard sounding American English accent), item type (filler words pronounced with unshifted back vowels, trained back vowel lowered items, and new back vowel lowered items), talker (the trained female talker, the new acoustically-similar female talker, and the new acoustically-dissimilar male talker) and experiment (Experiment 1 = high token variability; Experiment 2 = medium token variability; Experiment 3 = comparatively low token variability). The random effects structure comprised by-subject and by-item random intercepts, by-subject random slopes for item type and talker, and by-item random slopes for exposure condition and experiment. Table 2.14 summarizes the fixed effect parameter estimates (reported as log-odds of a correct response) and shows the results of log-likelihood comparisons testing whether each fixed effect term contributed significantly to model fit.

As expected, there was a significant main effect of exposure condition: listeners in the back vowel lowered exposure condition showed higher identification accuracy overall than listeners in the control condition. The main effect of item type was also significant: identification accuracy was lower overall for the back vowel lowered items than for words pronounced with unshifted (standard-sounding) vowels. These two main effects were qualified by a significant two-way interaction: the effect of accent exposure on identification accuracy was larger for the back vowel lowered items than for words pronounced with unshifted vowels. Thus, exposure to the back vowel lowered accent facilitated recognition of the back vowel lowered items. There was also a marginally significant trend suggesting that identification accuracy was lower overall for words produced by the new male talker than for words produced by the other talkers. No other main effects or interaction terms were significant. Thus, the omnibus analysis indicated that token variability (manipulated across experiments) had no reliable influence on identification accuracy.

Table 2.14: Experiments 1-3, summary of the omnibus analysis of word identification accuracy: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level pz for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

Predictors (fixed effects)                                 Coef β  SE(β)      z       pz   LL χ²Δ  LL dfΔ    LL p
(Intercept)                                                  1.51   0.20    7.5    <.001
exposure (= BVL chain shift)                                 0.80   0.16    5.1    <.001
item set (= trained BVL items)                              −1.76   0.56   −3.1     <.01
item set (= new BVL items)                                  −2.17   0.48   −4.5    <.001
talker (= new female)                                        0.04   0.26    0.2     0.88    47.63      36   = .09
talker (= new male)                                         −0.56   0.26   −2.2     <.05
experiment (= Exp 2)                                        −0.10   0.21   −0.5     0.63    28.14      36    0.82
experiment (= Exp 3)                                        −0.03   0.21   −0.1     0.91
BVL chain shift : trained BVL items                          1.41   0.27    5.2    <.001    42.05      18   < .01
BVL chain shift : new BVL items                              0.59   0.22    2.7     <.01
BVL chain shift : new female                                −0.03   0.24   −0.1     0.90    14.34      18    0.71
BVL chain shift : new male                                  −0.14   0.23   −0.6     0.52
trained BVL items : new female                              −1.76   0.75   −2.4     <.05    31.35      24    0.14
new BVL items : new female                                   0.53   0.64    0.8     0.41
trained BVL items : new male                                −0.56   0.74   −0.8     0.45
new BVL items : new male                                     0.11   0.63    0.2     0.86
BVL chain shift : Exp 2                                      0.53   0.42    1.3     0.21    16.69      18    0.54
BVL chain shift : Exp 3                                     −0.62   0.42   −1.5     0.14
trained BVL items : Exp 2                                   −0.17   0.34   −0.5     0.62    22.84      24    0.53
new BVL items : Exp 2                                       −0.05   0.27   −0.2     0.85
trained BVL items : Exp 3                                   −0.04   0.34   −0.1     0.91
new BVL items : Exp 3                                        0.06   0.28    0.2     0.82
new female : Exp 2                                          −0.25   0.33   −0.8     0.44    21.24      24    0.62
new male : Exp 2                                             0.17   0.31    0.6     0.58
new female : Exp 3                                           0.23   0.35    0.7     0.51
new male : Exp 3                                            −0.01   0.31    0.0     0.97
BVL chain shift : trained BVL items : new female             0.59   0.57    1.0     0.30    12.86      12    0.38
BVL chain shift : new BVL items : new female                −0.45   0.50   −0.9     0.37
BVL chain shift : trained BVL items : new male               0.00   0.55    0.0     0.99
BVL chain shift : new BVL items : new male                   0.50   0.47    1.1     0.29
BVL chain shift : trained BVL items : Exp 2                  0.90   0.67    1.3     0.18    13.55      12    0.33
BVL chain shift : new BVL items : Exp 2                      0.39   0.54    0.7     0.47
BVL chain shift : trained BVL items : Exp 3                 −0.88   0.67   −1.3     0.19
BVL chain shift : new BVL items : Exp 3                      0.27   0.54    0.5     0.62
BVL chain shift : new female : Exp 2                         0.33   0.66    0.5     0.62    11.08      12    0.52
BVL chain shift : new male : Exp 2                          −0.31   0.62   −0.5     0.62
BVL chain shift : new female : Exp 3                        −0.23   0.69   −0.3     0.74
BVL chain shift : new male : Exp 3                           0.53   0.61    0.9     0.39
trained BVL items : new female : Exp 2                      −0.16   0.77   −0.2     0.83    18.52      16    0.29
new BVL items : new female : Exp 2                           1.11   0.67    1.7     0.10
trained BVL items : new male : Exp 2                         0.06   0.74    0.1     0.94
new BVL items : new male : Exp 2                            −0.29   0.64   −0.5     0.64
trained BVL items : new female : Exp 3                      −1.40   0.79   −1.8     0.08
new BVL items : new female : Exp 3                          −1.08   0.70   −1.5     0.12
trained BVL items : new male : Exp 3                         0.42   0.73    0.6     0.56
new BVL items : new male : Exp 3                             0.69   0.63    1.1     0.28
BVL chain shift : trained BVL items : new female : Exp 2     0.22   1.51    0.1     0.89    10.11       8    0.26
BVL chain shift : new BVL items : new female : Exp 2        −0.71   1.32   −0.5     0.59
BVL chain shift : trained BVL items : new male : Exp 2      −0.30   1.44   −0.2     0.84
BVL chain shift : new BVL items : new male : Exp 2           1.48   1.26    1.2     0.24
BVL chain shift : trained BVL items : new female : Exp 3     0.43   1.55    0.3     0.78
BVL chain shift : new BVL items : new female : Exp 3         0.99   1.37    0.7     0.47
BVL chain shift : trained BVL items : new male : Exp 3      −2.97   1.43   −2.1     <.05
BVL chain shift : new BVL items : new male : Exp 3          −0.85   1.25   −0.7     0.49

Note: an LL comparison listed on the first row of a multi-row term applies to that term as a whole.

2.6 General Discussion

Three experiments investigated the specificity of perceptual learning of systemic cross-category vowel variation following exposure to a single talker with a novel accent. The system of vowel variation used here was a novel “back vowel lowered” chain shift that affected the realization of the English vowels /u, ʊ, o, ɔ, ɑ/ (e.g., /ʊ/ was shifted to [o], and hence wooden sounded like “w[o]den”, which is a nonword in standard American English). These experiments tested for generalization of learning across words and talkers in order to understand the nature of the perceptual adjustments that enable listeners to cope with complex patterns of cross-category pronunciation variation. Further, these experiments investigated the extent to which potential generalization across words and talkers depended on properties of the environment: namely, the degree of token variability experienced prior to test (high token variability in Experiment 1, and comparatively medium and low token variability in Experiments 2 and 3, respectively) and the degree of acoustic similarity between the trained and new talkers in terms of their vowel productions.

Overall, these experiments found consistent evidence for generalization of learning across both words and talkers. Listeners in the control condition, who heard a standard-sounding American English accent prior to test, tended to perceive words produced in the back vowel lowered accent as nonwords (i.e., across experiments, control participants endorsed less than 40% of the vowel-shifted forms as words during lexical decision). Results from the adaptation condition showed that passive familiarization to a single talker with the novel back vowel lowered accent was sufficient to improve recognition of accent-consistent pronunciations, regardless of whether the test words occurred during the accent exposure phase or only at test (i.e., generalization to new words) and regardless of whether the test words were produced by the trained talker or a new talker (i.e., generalization to new talkers). The exposure-driven recognition benefit for the target back vowel lowered items was observed in both lexical decision endorsement rates and word identification accuracy; thus, listeners in the adaptation condition did not simply relax the decision criteria for making a “word” response during lexical decision as a result of sustained exposure to atypical word forms.

The finding of generalization to new words indicates that the locus of learning involved adjustments at a sublexical level of representation, as opposed to listeners memorizing atypical pronunciations of specific lexical items (Sjerps & McQueen, 2010; McQueen et al., 2006). The finding of generalization to new talkers indicates that learning involved adjusting the mapping between input and abstract talker-independent category representations (Sidaras et al., 2009; Kraljic & Samuel, 2006). Thus, going beyond previous results concerning perceptual learning of cross-category vowel variation (e.g., Maye et al., 2008; Weatherholtz, 2013), the current experiments demonstrated that passive familiarization to the trained talker with the back vowel lowered accent resulted in a talker-independent remapping of listeners’ perceptual vowel space. Regarding constraints on the adaptive speech processing system, these findings indicate that multi-talker exposure conditions are not necessary to achieve talker-independent learning of segmental variation, even when learning complex patterns of pronunciation variation that affect the realization of multiple speech sounds across multiple acoustic-phonetic dimensions (cf. Sidaras et al., 2009; Bradlow & Bent, 2008; Lively et al., 1993).

The current experiments also showed that generalization across words and talkers was constrained, albeit to a limited extent, by the degree of input variability experienced prior to test and by the degree of acoustic similarity between the trained and new talkers. In Experiments 1 and 2, learning generalized simultaneously across words and talkers regardless of talker similarity, despite considerable differences between experiments in the degree of token variability experienced during the accent familiarization phase. However, in Experiment 3, when the familiarization phase was brief (two minutes) and characterized by the narrowest range of token variability in the current experiments, simultaneous generalization across words and talkers was constrained by talker similarity. Lexical decision results showed that the exposure-driven word recognition benefits extended to new words produced by the new female talker with an acoustically similar vowel space, but not to new words produced by the new male talker with an acoustically dissimilar vowel space. These findings suggest a more nuanced process of cross-talker generalization than previously articulated in the literature. An emerging perspective suggests that multi-talker exposure conditions may be necessary for cross-talker generalization (Bradlow & Bent, 2008; Lively et al., 1993; Sidaras et al., 2009), except when the trained and new talkers are sufficiently similar in terms of their productions of the pronunciation variant(s) being learned (Reinisch & Holt, 2014; see also Kraljic & Samuel, 2007, 2005). The current results suggest that, at least for perceptual learning of cross-category vowel variation, talker similarity influences the likelihood of cross-talker generalization under single talker exposure conditions, but only during the early phases of adaptation, before listeners have rich experience with the trained talker’s accent. As listeners accrue more experience with the trained talker’s accent and experience a greater range of variability among the target pronunciation variants, listeners abstract over the acoustic-phonetic details of the trained talker’s productions, enabling learning to transfer across talkers regardless of talker similarity.

An important caveat for interpreting the current effects of input variability on learning outcomes is that the exposure materials in Experiment 3 were characterized as low token variability materials by comparison to the materials used in Experiments 1 and 2 (i.e., 72 shifted back vowels across 51 lexical contexts, compared to several hundred back vowel tokens across a far greater range of lexical contexts in the earlier experiments). However, these exposure materials still contained relatively high token variability compared to other studies concerned with effects of input variability on learning outcomes. For example, in one study, Greenspan et al. (1988) found that adaptation to synthesized speech transferred across words under high token variability conditions, operationalized as 200 unique words presented once each, whereas adaptation was lexically-specific under low token variability conditions, operationalized as 10 words repeated 20 times each. The fundamental observation concerning input variability is that the specificity of learning depends on the degree to which the input materials represent the structure of the full stimulus set (i.e., the distribution of variant forms from which the trained tokens are drawn; see Greenspan et al., 1988; Logan et al., 1991). In Experiment 3, the range of acoustic and lexical variability among the back vowel lowered tokens in the exposure passage still provided robust information about the underlying structure of the novel vowel chain shift. Thus, a question for future research is whether further reduction of token variability in this exposure-test paradigm leads to stronger evidence for instance-specific learning outcomes (word-specific, talker-specific, or both).
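The contrast among these exposure designs can be made concrete by counting word types and tokens. The sketch below is illustrative only: the type-token ratio is a summary chosen for this example, not a measure reported in the studies cited, and the function name is hypothetical. The counts are those given in the text; for Experiment 3, the 51 lexical contexts are treated as word types, which is an assumption.

```python
def type_token_ratio(n_types, n_tokens):
    """Proportion of exposure tokens that are distinct word types."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return n_types / n_tokens

# (types, tokens) for each design, using the counts given in the text.
designs = {
    "Greenspan high variability": (200, 200 * 1),  # 200 words presented once
    "Greenspan low variability": (10, 10 * 20),    # 10 words repeated 20 times
    "Experiment 3 passage": (51, 72),              # 72 shifted vowels, 51 contexts
}

ratios = {name: type_token_ratio(t, k) for name, (t, k) in designs.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Under this rough summary, the Experiment 3 passage falls much closer to the high variability design than to the low variability design, which is the point made in the paragraph above.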

2.6.1 Theories of adaptive speech processing

Any viable theory of adaptive speech processing must be able to account for both instance-specific recognition effects and more general recognition effects that result from dynamically adjusting higher level category representations on the basis of experience. The exposure-driven word recognition benefits observed in Experiments 1 and 2 were of the latter type: listeners adjusted word- and talker-independent vowel representations as a result of experience with the trained talker’s cross-category vowel shifts. In Experiment 3, the exposure-driven word recognition benefits were at least partially specific to the acoustic-phonetic properties of the trained talker’s vowel space.

Dual-system models of learning and memory (e.g., Davis & Gaskell, 2009; McClelland, McNaughton, & O’Reilly, 1995; O’Reilly & Norman, 2002) and hybrid episodic-abstractionist models of speech processing (e.g., Feustel et al., 1983; Goldinger, 2007) provide a framework for interpreting both types of effects. These models assume that two complementary systems are involved in learning, one system for learning specifics, and one system for learning generalities. When new stimuli are experienced, speech episodes are encoded and retained in a temporary memory store, and over time these episodic records become integrated into long-term memory via consolidation processes that result in abstract representations. The dual systems approach is consistent with hybrid episodic-abstraction models, which share the assumption that representations of the input are initially episodic but combine over time to form more abstract representations (Feustel et al., 1983; Goldinger, 2007; Grossberg, 1986; McLennan et al., 2005). Crucially, in these hybrid models, episodic records do not disappear once more abstract representations are formed. Rather, both types of representations exist together in memory. Thus, instance-specific recognition effects (as in Experiment 3) can be explained by the recruitment of episodic records during word recognition, and word- and talker-independent effects (as in Experiments 1 and 2) can be explained by recruitment of more abstract category representations.

2.6.2 Individual variation in word recognition

An interesting aspect of the present data is the degree of individual variation, both in the control and adaptation conditions. The target back vowel lowered items were designed to be nonword surface forms in standard American English. Thus, it was expected that listeners in the control condition, who were not familiarized to the back vowel lowered accent, would have difficulty recognizing these word forms. Indeed, aggregate lexical decision performance indicated that listeners in the control condition tended to reject the back vowel lowered items as nonwords; the mean endorsement rate for these items across subjects and experiments was about 30-40%. However, the subject-wise mean endorsement rates for the back vowel lowered items ranged from 0% to upwards of 90% in the control condition (see Figures 2.7, 2.12 and 2.17), indicating that the cross-category vowel variants were much less detrimental to word recognition for some listeners in the control condition than others.

On the one hand, these data are a testament to the fact that the speech processing system is remarkably robust to pronunciation variation, even without prior familiarization to particular pronunciation variants (see Andruski, Blumstein, & Burton, 1994; Connine et al., 1997, 1993, for relevant discussion of mismatch tolerance). On the other hand, the degree of individual variation in the control condition raises important questions about the cognitive underpinnings of perceptual flexibility and why some listeners appear to be generally more tolerant of pronunciation mismatch than others. Differences in linguistic experience outside the lab likely play an important role. Regular exposure to multiple dialects and accents over long periods of time has an enduring influence on perceptual flexibility (Sumner & Samuel, 2009). Further, listeners can leverage knowledge of pronunciation variation in familiar varieties to facilitate recognition of similar patterns of pronunciation variation in an unfamiliar accent (Baese-Berk et al., 2013). While all participants in the current experiments were native monolingual English speakers, participants inevitably varied in terms of the range and regularity of their past experiences with different dialects and accents. It is possible that the listeners in the control condition who showed the least detriment in word recognition due to the novel vowel chain shift were listeners who had considerable experience with different spoken varieties of English.

Performance in the adaptation condition also showed massive individual variation. Overall, exposure to the back vowel lowered accent resulted in a reliable increase in the recognition of words produced in this accent. However, in each experiment, subject-wise endorsement rates for the back vowel lowered items during lexical decision spanned the entire proportion range: some listeners endorsed none or very few of the target items, while some endorsed all or nearly all of these items (mean endorsement of about 50-60% across subjects; see Figures 2.7, 2.12 and 2.17). Thus, the extent to which individual listeners benefitted from exposure to the back vowel lowered accent varied substantially. Perceptual learning requires sensitivity to the statistical distribution of speech cues in the input, and individual listeners vary in their degree of sensitivity to input statistics (see Neger, Rietveld, & Janse, 2014). Thus, the participant-level variation in the adaptation condition is likely due to a combination of long term experiential factors, as discussed above, and differences in sensitivity to the distribution of vowel cues in the trained talker’s accent.
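The subject-wise endorsement rates discussed in this section are per-listener proportions of “word” responses to the target items. A minimal sketch of that computation follows. The data layout, listener labels, and response values are invented for illustration and are not taken from the experiments.

```python
from collections import defaultdict

def endorsement_rates(trials):
    """Return each subject's proportion of endorsed target items.

    trials: iterable of (subject_id, endorsed) pairs, where endorsed is True
    when the listener judged a back vowel lowered item to be a word.
    """
    counts = defaultdict(lambda: [0, 0])  # subject -> [n_endorsed, n_trials]
    for subject, endorsed in trials:
        counts[subject][0] += int(endorsed)
        counts[subject][1] += 1
    return {s: n_yes / n for s, (n_yes, n) in counts.items()}

# Hypothetical responses from three listeners, four target trials each.
trials = [
    ("s1", False), ("s1", False), ("s1", False), ("s1", False),
    ("s2", True), ("s2", False), ("s2", True), ("s2", False),
    ("s3", True), ("s3", True), ("s3", True), ("s3", True),
]
rates = endorsement_rates(trials)
mean_rate = sum(rates.values()) / len(rates)
```

In this toy data set the mean endorsement rate is 0.5 even though individual listeners span the full range from 0 to 1, which mirrors the gap between aggregate and subject-wise performance described above.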

2.6.3 Conclusions

The findings of the current experiments highlight the flexibility of the speech processing system with respect to pronunciation variation. These findings are broadly consistent with the growing literature showing that the processes and representations involved in speech perception and spoken word recognition are dynamically adjusted in response to the environment (Bradlow & Bent, 2008; Clarke & Garrett, 2004; Dahan et al., 2008; Kleinschmidt & Jaeger, 2015; Maye et al., 2008; Norris et al., 2003; Sidaras et al., 2009; Sjerps & McQueen, 2010; Bertelson et al., 2003). However, in contrast to much of the previous work on cross-talker generalization of learning, which has shown talker-specific perceptual adjustments under single talker exposure conditions (e.g., Bradlow & Bent, 2008; Eisner & McQueen, 2005; Kraljic & Samuel, 2007; Reinisch & Holt, 2014; though see Kraljic & Samuel, 2006), the current experiments revealed robust talker-independent perceptual adjustments following exposure to a single talker with an unfamiliar accent. A potentially relevant factor is that this previous work focused either on adaptation to atypical consonant variation or adaptation to globally foreign-accented speech, whereas the novel accent in the current experiments was defined by atypical vowel variation. Vowels have been argued to contribute less to lexical identity than consonants (Cutler, Sebastián-Gallés, Soler-Vilageliu, & van Ooijen, 2000; Delle Luche et al., 2014; Nespor, Peña, & Mehler, 2003; van Ooijen, 1996), which might lead to greater flexibility (and hence broader generalization) when processing atypical vowel variants. Further, spoken varieties of many languages, including American English, are distinguished in large part by variation in the vowel system (Labov et al., 2006); hence, listeners might be biased to generalize learning of vowel variation across talkers in order to facilitate recognition of words produced by other talkers with the same accent.

Chapter 3

Experiment 4: Generalization to Untrained Parts of a Chain Shift System

3.1 Introduction

The speech processing system is remarkably robust to variation in the realization of vowels. Previous research has shown that passive exposure to an unfamiliar accent characterized by a novel vowel chain shift (e.g., a counterclockwise rotation of English front vowels) is sufficient for listeners to adapt to the cross-category pronunciation variants that result from this chain shift (e.g., /i/ realized as [I], beetle as “b[I]tle”; /I/ realized as [E], witch as “w[E]tch”; /E/ realized as [æ], yellow as “y[æ]llow”) (Maye et al., 2008; Weatherholtz, 2013; see also Experiments 1-3 of this dissertation). Results of these studies indicate that the learning mechanism driving adaptation involves developing and maintaining multiple mappings for how acoustic-phonetic cues relate to vowel categories and the lexical representations that contain those vowel categories. As a result of exposure to the novel accent, surface forms otherwise perceived as nonwords (e.g., “w[E]tch”) came to be perceived as real words (e.g., witch), indicating that listeners remapped their perceptual vowel space to reflect the talker’s chain shift (e.g., [E] maps to /I/). At the same time, canonical word forms from the same talker (e.g., “sh[E]lf” for shelf) were still correctly perceived as real words, rather than as the nonwords that would result if the vowels in these words were perceived as shifted variants (e.g., “shilf” if [E] was interpreted as /I/). Thus, listeners were able to remap their perceptual vowel space to accommodate the novel accent without overwriting existing mappings used to process typical pronunciation variants.

An open question concerns the systematicity of this vowel space remapping. The goal of Experiment 4 was to determine whether listeners adapt to vowel chain shifts by learning each constituent vowel shift independently (e.g., /i/ → [I], /I/ → [E]) or whether listeners learn a more general system of co-variation among vowel categories.

From the perspective of phonology and language change, vowel chain shifts are systematic in nature: one shift triggers another shift, which in turn can trigger another shift, resulting in a chain of co-dependent vowel movements in acoustic-phonetic space¹ (Labov, 1994; Martinet, 1955). The initial conditions that actuate a vowel chain shift are often unknown, although research in historical linguistics aims to identify these conditions (Weinreich, Labov, & Herzog, 1968; Coseriu, 1958). However, once a vowel in a given language or language variety begins to shift, language-internal pressures have been hypothesized to regulate the progression of the shift and the development of vowel shift systems. Functional theories of language typology propose that vowel systems are regulated by functional pressures to maintain balance in the system and maximize perceptual contrast between vowel categories (Maddieson, 1984; Liljencrants & Lindblom, 1972; Martinet, 1955; Jakobson, 1931). When one vowel shifts towards another vowel, this shift creates a phonetic gap in one region of the vowel space and creates a “crowding” effect in another region. Thus, in functional or teleological terms, the shift of one vowel can trigger other vowels to move to “fill the slot” vacated by the first shift, and the first shift can “push” other vowels out of the way to avoid crowding (i.e., to maintain “margins of security” for perceptual contrast; Martinet, 1952, p. 6).

Consider, for example, the development of the Northern Cities Shift, a chain shift that partially characterizes the Inland North dialect of American English (Labov et al., 2006). Figure 3.1 shows a schematic representation of this chain shift. In historical order, the vowel /æ/ (as in cad) was raised, fronted and diphthongized to sound like [ej] (as in cade). In turn, the vowel /A/ (as in cod) shifted forward to sound like [æ], filling the gap in the low front region of the vowel space caused by the initial raising of /æ/. The vowel /O/ (as in cawed) was lowered to the space previously occupied by [A], and the vowel /E/ (as in Ked) was centralized to [2] (as in cud). In response, /2/ shifted back to [O], and finally the vowel /I/ (as in kid) was lowered and backed to [E].

Figure 3.1: Schematic representation of the Northern Cities Shift. Numbers indicate the historic order of the vowel shifts.

¹ Although many chain shifts involve vowels, chain shifting is not exclusively a vocalic phenomenon. Chain shifts can also affect consonants: e.g., the First Germanic Sound Shift, also known as Grimm’s Law, in which voiceless stop consonants became fricatives, plain voiced stops became voiceless, and breathy voiced stops became plain voiced stops (Grimm, 1822).

Labov (1994) outlined three principles that govern the chain shifting of vowels based on the phonological structure of vowel systems: (i) long vowels rise; (ii) short vowels fall; and (iii) back vowels move to the front. These principles are not absolute; many exceptions have been documented cross-linguistically, as outlined by Labov (1994). However, the fact that these three principles account for the vast majority of known chain shifts across languages, both synchronically and diachronically, provides further evidence that vowel chain shifts are systematic in nature.

From the perspective of adaptive speech processing, it is currently unknown whether listeners are sensitive to the systematicity among vowel variants when adapting to an unfamiliar vowel chain shift. The most restrictive learning hypothesis is that listeners adapt to vowel chain shifts by learning each vowel shift independently. For example, in the study by Maye et al. (2008), listeners could have learned a set of isolated vowel shifts: /i/ → [I], /I/ → [E], and /E/ → [æ]. An alternative hypothesis is that listeners adapt to vowel chain shifts by abstracting the underlying system of codependencies that characterizes the chain shift. For example, participants in Maye et al.’s (2008) study could have learned a system of “front vowel lowering” (i.e., front vowels systemically rotated counterclockwise such that each front vowel is realized with acoustic-phonetic properties typically associated with the “lower” neighbor in the vowel space).

These accounts make different predictions about the early stages of adaptation to systemic pronunciation variation, specifically when listeners have not experienced the full set of vowel shifts that characterize an unfamiliar vowel chain shift. American English vowels are asymmetrically distributed in both type and token frequency (Mines, Hanson, & Shoup, 1978; Roberts, 1965), so it is not uncommon for listeners to hear stretches of speech from an unfamiliar talker and have only a partial representation of the talker’s vowel space based on experienced tokens. If listeners adapt to an unfamiliar chain shift system by learning a set of isolated vowel shifts (without any representation of the higher-level codependencies among these pronunciation variants), then learning will be specific to the vowel shifts in the input. According to this restrictive learning account, and critical for the current study, a listener’s ability to learn untrained shifts that are part of an unfamiliar chain shift (i.e., incidental gaps based on listeners’ experience) will be contingent on further input from the talker. By contrast, if listeners learn a system of co-dependencies among vowel categories, they should be able to leverage this learning to fill in gaps in their experience by inferring properties about the realization of untrained vowel categories.

The current experiment aimed to distinguish these two learning accounts by using a “leave-one-out” learning paradigm. This paradigm involved exposing listeners to a subset (n − 1) of the vowel shifts that comprise an unfamiliar vowel chain shift and then testing whether listeners are able to generalize learning to fill in the incidental gap in their experience of the vowel shift system. During the initial phase of the experiment, listeners passively listened to a short story spoken by a talker whose accent was characterized by a novel vowel chain shift. One group of listeners heard the talker speak with a novel “back vowel lowered” chain shift: i.e., a clockwise rotation of the vowels /u, U, o, O, A/. A second group of listeners heard the same talker speak with a novel “back vowel raised” chain shift, which involved the same vowels but rotated counterclockwise in the vowel space. The short story was designed to contain no instances of the vowel /U/, one of the middle vowels in each novel chain shift. Thus, listeners experienced a consistent pattern of cross-category vowel shifting (i.e., either “lowering” or “raising”) among the vowels /u, o, O, A/, but they had neither positive nor negative evidence for whether the talker shifted the intermediate vowel /U/. Following the exposure phase, listeners performed an auditory lexical decision task and an auditory naming task to assess perceptual learning of the vowel chain shifts.

The test tasks were identical for both groups of listeners. Target items comprised a set of back vowel lowered items and a set of back vowel raised items that exemplified the full range of vowel shifts defining each vowel shift system. If listeners adapt to an unfamiliar vowel chain shift by learning each vowel shift independently (i.e., a piecemeal learning process), then listeners who were familiarized to the back vowel lowered accent should be better at recognizing accent-consistent word forms containing the trained vowels /u, o, O, A/ (i.e., words containing lowered variants of these vowels) than accent-consistent words containing the untrained vowel /U/. Likewise, listeners who were familiarized to the back vowel raised chain shift should be better at recognizing words containing raised variants of the trained vowels than words containing raised variants of the untrained vowel /U/. By contrast, if listeners learn a system of co-variation among vowel categories, then listeners in each accent exposure condition should be able to generalize learning based on the trained vowel shifts to recognize accent-consistent word forms containing the untrained vowel /U/ (i.e., words containing lowered and raised variants of this vowel following exposure to the back vowel lowered and back vowel raised chain shifts, respectively).
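The contrasting predictions of the two accounts can be sketched as two toy learners. This is an illustration of the logic of the hypotheses, not a model proposed or tested in this dissertation. The function names, the ASCII vowel labels ("ae" for [æ], "A" for the merged /O, A/ category), and the treatment of each chain shift as a one-step move along an ordered list of vowels are all assumptions of the sketch.

```python
# Vowels ordered along the path that both novel chain shifts follow.
# Assumption: merged /O, A/ is written "A" and "ae" stands for [ae-ligature].
CHAIN = ["i", "u", "U", "o", "A", "ae"]

# Shifts with positive evidence in each exposure passage (phoneme -> surface).
TRAINED = {
    "lowered": {"u": "U", "o": "A", "A": "ae"},
    "raised": {"u": "i", "o": "U", "A": "o"},
}
UNTRAINED = "U"  # the incidental gap in the exposure passage

def piecemeal_learner(trained):
    """Independent-shift account: only the experienced shifts are learned."""
    return dict(trained)

def systemic_learner(trained, untrained=UNTRAINED):
    """Co-variation account: infer a shared direction and extend it to the gap."""
    steps = {CHAIN.index(s) - CHAIN.index(p) for p, s in trained.items()}
    learned = dict(trained)
    if len(steps) == 1:  # every trained shift moves the same way along the chain
        step = steps.pop()
        learned[untrained] = CHAIN[CHAIN.index(untrained) + step]
    return learned

def recognizes(learned, phoneme, surface):
    """True if the learned mapping licenses this accent-consistent word form."""
    return learned.get(phoneme) == surface
```

Under these assumptions, the piecemeal learner recognizes lowered variants of the trained vowels but not /U/ realized as [o], whereas the systemic learner extends the inferred direction of movement to the untrained vowel in both exposure conditions.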

3.2 Method

3.2.1 Participants

A total of 42 undergraduates at The Ohio State University participated in Experiment 4 in exchange for partial course credit. Ten participants were excluded because they were not native monolingual English speakers with normal speech and hearing. Two participants were excluded for having exceptionally low accuracy on non-target trials during the post-exposure test tasks; these participants endorsed more than 30% of the maximal nonwords as words during lexical decision. After exclusions, there were 30 usable participants evenly split between the two accent exposure conditions. The naming data for one participant in the back vowel lowered exposure condition were lost due to technical issues with the recording software. Thus, 29 subjects contributed both lexical decision and auditory naming data, and one subject contributed only lexical decision data.

3.2.2 Exposure materials

The exposure materials comprised a short excerpt from a popular children’s story, The Adventures of Pinocchio, read aloud in two different novel accents of American English by a trained phonetician who speaks natively with a standard sounding Midland American English accent. The two target accents were characterized by different novel vowel chain shifts, depicted schematically in Figure 3.2 and referred to, respectively, as “back vowel lowering” and “back vowel raising”. These labels are terms of convenience, providing an approximate description of the overall pattern of vowel movement in each chain shift, though the individual vowel shifts involve variation along multiple dimensions including vowel height, backness, trajectory, and duration. The back vowel lowered chain shift was characterized by four cross-category vowel shifts: /u/ was lowered and fronted to sound like [U] (e.g., smooth as “sm[U]th”); /U/ was lowered and backed to sound like [o] (e.g., wooden as “w[o]den”); /o/ was lowered and fronted to sound like [A] (e.g., nose as “n[A]se”); and the vowels /O/ and /A/, which are merged in the talker’s native variety, were fronted to sound like [æ] (e.g., often as “[æ]ften”). The back vowel raised chain shift was designed to affect the realization of the same vowels but to involve a directionally opposite pattern of movement: /u/ was fronted to sound like [i] (e.g., smooth as “sm[i]th”); /U/ was raised and backed to sound like [u] (e.g., wooden as “w[u]den”); /o/ was raised and fronted to sound like [U] (e.g., nose as “n[U]se”); and the vowels /O/ and /A/ were raised to sound like [o] (e.g., often as “[o]ften”). See Appendices B.1 and B.2 for a complete written version of the exposure passage, transcribed in the back vowel lowered and back vowel raised accents, respectively.

Figure 3.2: Schematic representation of the back vowel lowered chain shift (left; panel a, “Back Vowel Lowering with incidental gap”) and back vowel raised chain shift (right; panel b, “Back Vowel Raising with incidental gap”). The exposure passage was manipulated to contain no instances of the vowel /U/. Thus, listeners had an incidental gap in their experience of the trained talker’s vowel system, as denoted by the grayed out vowel symbol in each panel.

The excerpt from Pinocchio was manipulated and partially rewritten to remove all words containing the vowel /U/ (e.g., wooden, could).² Thus, when listening to either the back vowel lowered or back vowel raised version of the passage, listeners received positive evidence for cross-category shifts affecting the vowels /u, o, O, A/, but they received neither positive nor negative evidence for whether the vowel /U/ was shifted in a manner consistent with the surrounding back vowel shifts (i.e., whether /U/ lowered to [o] in the back vowel lowered accent, or whether /U/ raised to [u] in the back vowel raised accent). This property of the stimuli is referred to hereafter as an incidental gap in listeners’ experience of the trained talker’s vowel space.

² For example, Pinocchio was originally described as a wooden marionette. To avoid using the word wooden (/wUd@n/), the relevant text was rewritten to describe Pinocchio as a small piece of lumber that had been carved into a marionette.

The decision to create an incidental gap in experience based on the vowel /U/ was motivated by two factors. First, from a learning perspective, it was necessary for the gap to occur in the middle of the novel chain shift. Chain shifts generally affect only a subset of vowels in a language (Labov, 1994), so when adapting to an unfamiliar chain shift, listeners must determine which vowels are affected and which are not. In the present case, because vowel shifts occurred on either end of the untrained vowel category /U/ (i.e., vowel shifts affecting the realization of /u/ and /o/), listeners had reason to believe that /U/ was within the affected portion of the vowel space. Thus, if learning does not generalize to the untrained vowel category /U/, this finding would suggest that listeners learned a pattern of variation associated with the trained vowel categories; it could not be attributed to listeners assuming that the untrained vowel category was outside the scope of the vowel shift system. Second, from an ecological perspective, /U/ has a relatively low type and token frequency compared to other American English vowels (Mines et al., 1978), which means listeners might often have an incomplete representation of this portion of the vowel space when adapting to unfamiliar talkers. Thus, creating an incidental gap in experience based on the vowel /U/, in combination with the use of a passive listening task for the exposure phase, provides an ecologically plausible environment in which to test for generalization of learning to untrained vowel categories.

To obtain recordings of the back vowel lowered and back vowel raised versions of the passage, the trained phonetician was given two written versions of the story in which all words containing back vowels were phonetically transcribed according to the remapping rules for either the back vowel lowered or back vowel raised chain shifts. The talker then practiced reading the story aloud in each accent over the course of several days, until she achieved a natural-sounding production of each accent. The talker was given the two written versions of the story separately and, hence, only practiced one accent at a time. Recordings of the two versions of the passage were made on separate days in a sound-attenuated booth. The talker wore a high quality headset microphone (Sennheiser HMD 280-13), which was passed through an Art Tube MP microphone preamp to a Windows laptop. Speech was digitally recorded using Audacity at a sampling rate of 44.1 kHz and later downsampled to 22.05 kHz and scaled to an average intensity of 70 dB. Each version of the passage was edited in Praat (Boersma & Weenink, 2014) to remove disfluencies. After editing, each version of the passage was about 5 minutes long.
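The intensity scaling step can be sketched as root-mean-square (RMS) normalization. The text does not state which tool or reference level was used for scaling, so the following is only one plausible reading: it assumes samples are expressed in pascals and that intensity in dB is computed relative to a reference pressure of 2 × 10⁻⁵ Pa. The function names and the test signal are hypothetical.

```python
import math

REF_PRESSURE_PA = 2e-5  # assumed reference pressure for dB values

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def intensity_db(samples):
    """Average intensity in dB relative to the assumed reference pressure."""
    return 20 * math.log10(rms(samples) / REF_PRESSURE_PA)

def scale_to_intensity(samples, target_db=70.0):
    """Apply a single gain so that the average intensity equals target_db."""
    target_rms = REF_PRESSURE_PA * 10 ** (target_db / 20)
    gain = target_rms / rms(samples)
    return [s * gain for s in samples]

# Hypothetical signal: 0.1 s of a 220 Hz sine at a 22.05 kHz sampling rate.
rate = 22050
signal = [0.01 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate // 10)]
scaled = scale_to_intensity(signal)
```

Because one gain factor is applied to the whole file, this kind of scaling equates average level across stimuli without altering the relative amplitude of segments within a stimulus.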

The vowel plots in Figure 3.3 show the mean midpoint F1 and F2 (Hz) of stressed vowel tokens in these two passages (denoted by arrows), plotted relative to the corresponding vowel measurements from a third version of the passage produced in the talker’s normal accent (denoted by phonetic symbols), which was recorded only to provide a reference point for the vowel shifts in the two novel accented versions of the passage. To obtain these measures, individual word and segment boundaries in each version of the passage were automatically aligned in Praat (Boersma & Weenink, 2014) using the Penn Phonetics Lab Forced Aligner (Yuan & Liberman, 2008). The forced-aligned boundaries were hand corrected for accuracy, and the frequencies of F1 and F2 were extracted at vowel midpoint (50% of the duration of the vowel). The plots in Figure 3.3 indicate that the talker produced each of the novel back vowel chain shifts without any substantial change in her realization of non-back vowels. Note that the vowel /U/ is missing from the vowel plots in Figure 3.3 because the exposure materials contained no instances of this vowel.
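The two computations described above, locating the vowel midpoint and averaging midpoint formant values by vowel category, can be sketched as follows. The formant tracking itself was carried out in Praat and is not reproduced here. The function names, the tuple layout, and the numeric values are hypothetical and serve only to illustrate the arithmetic.

```python
from collections import defaultdict

def midpoint_time(start, end):
    """Time at 50% of the duration of an aligned vowel interval (seconds)."""
    return start + 0.5 * (end - start)

def mean_formants(tokens):
    """Average midpoint F1 and F2 (Hz) for each vowel category.

    tokens: iterable of (vowel, f1_hz, f2_hz) measured at vowel midpoint.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # vowel -> [sum_f1, sum_f2, n]
    for vowel, f1, f2 in tokens:
        acc = sums[vowel]
        acc[0] += f1
        acc[1] += f2
        acc[2] += 1
    return {v: (s[0] / s[2], s[1] / s[2]) for v, s in sums.items()}

# Hypothetical stressed-vowel measurements (not data from the dissertation).
tokens = [("u", 400.0, 1000.0), ("u", 500.0, 1200.0), ("o", 600.0, 1000.0)]
means = mean_formants(tokens)
```

Each entry in the result corresponds to one plotted point per vowel category, the kind of summary shown by the arrows and phonetic symbols in Figure 3.3.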

3.2.3 Test materials

The test materials were a set of 256 monosyllabic and bisyllabic words and nonwords (see Table 3.1 for example items and Appendix B.3 for the full set of test stimuli). There were two sets of target words, none of which occurred in the exposure passage. One target set comprised 64 English words pronounced in the novel back vowel lowered accent (e.g., “b[o]shel” for bushel, given the lowering of /U/ to [o]). Within this set, there were 16 words pronounced with each of the four vowel shifts that define the back vowel lowered chain shift (16 × 4 = 64): the three shifts that occurred in the back vowel lowered passage (i.e., /u/ → [U], /o/ → [A], /O, A/ → [æ]) and the one shift that did not (i.e., the incidental gap: /U/ → [o]). The second set of target items comprised the same 64 English words but pronounced in the novel back vowel raised accent. Thus, across this set there were 16 words pronounced with each of the four vowel shifts that define the back vowel raised chain shift: the three shifts that occurred in the back vowel raised passage (i.e., /u/ → [i], /o/ → [U], /O, A/ → [o]) and the incidental gap (i.e., /U/ → [u]). The lexical items used for these two target sets were selected to sound like nonwords in standard American English when pronounced either in the back vowel lowered accent or the back vowel raised accent (e.g., “red[U]ce” for reduce, given the lowering of /u/ to [U], and “rem[i]ve” for remove, given the raising of /u/ to [i]).

Figure 3.3: Experiment 4. Vowel plots showing the talker’s vowel productions when recording the exposure passage in each novel accent (left panel: back vowel lowered chain shift with incidental gap; right panel: back vowel raised chain shift with incidental gap; x-axis: F2 (Hz) at vowel midpoint; y-axis: F1 (Hz) at vowel midpoint). Arrows indicate the mean F1 and F2 (Hz) at vowel midpoint of all stressed tokens of each vowel in the back vowel lowered (left) and back vowel raised (right) versions of the exposure passage. Phonetic symbols indicate the corresponding vowel measurements from a third version of the passage recorded in the talker’s normal accent to provide a reference point for the target vowel shifts.

Table 3.1: Experiment 4, example test stimuli. Rows marked with an asterisk indicate test stimuli involving the untrained vowel variant in each novel chain shift (i.e., the incidental gap in experience during the exposure phase).

Item Type                          N                  Example Item   Spoken Form
Back vowel lowered items
    /u/ → [U]                      16 (8 per list)    reduce         “red[U]ce”
  * /U/ → [o]                      16 (8 per list)    bushel         “b[o]shel”
    /o/ → [A]                      16 (8 per list)    open           “[A]pen”
    /O, A/ → [æ]                   16 (8 per list)    cottage        “c[æ]ttage”
Back vowel raised items
    /u/ → [i]                      16 (8 per list)    remove         “rem[i]ve”
  * /U/ → [u]                      16 (8 per list)    butcher        “b[u]tcher”
    /o/ → [U]                      16 (8 per list)    oval           “[U]val”
    /O, A/ → [o]                   16 (8 per list)    hostage        “h[o]stage”
Filler words (unshifted vowels)    64                 room           “r[u]m”
Maximal nonwords                   64                 dorve          “d[O]rve”

In addition to the target vowel-shifted items, there were 128 filler items: 64 filler words pronounced with standard sounding (i.e., unshifted) front vowels and back vowels (e.g., “r[u]m” for room, “n[I]ckel” for nickel), and 64 phonotactically legal nonwords (e.g., dorve). The filler nonwords differed from canonical forms in American English by multiple consonant and vowel features. Hence, these items are referred to as ‘maximal nonwords’ to distinguish them from the target vowel-shifted items, which differed from canonical forms by a single vowel shift and which were designed to sound like nonwords only to listeners who lacked knowledge of the cross-category vowel variants. The test stimuli were recorded using the same equipment and digitization procedure as for the exposure materials. After recording, each stimulus item was saved to an individual sound file, downsampled to 22.05 kHz, and scaled to an average intensity of 70 dB.

3.2.4 Procedure

The experiment involved three tasks, which occurred in a fixed order: a passive listening accent familiarization task, followed by an auditory lexical decision task and an auditory naming task, which were designed to provide complementary insight into exposure-driven differences in word recognition. The entire experiment lasted about 45 minutes and was conducted on a Windows PC, with stimulus presentation controlled using E-Prime (Schneider et al., 2012). Participants sat at a computer desk wearing a Sennheiser HMD 280-13 headset microphone with a button box in front of them. The headset was adjusted so the microphone was about two inches from the corner of the mouth.

During the initial accent familiarization task, half of the participants passively listened to the exposure passage read aloud in the novel back vowel lowered accent, and the other half of the participants listened to the passage in the novel back vowel raised accent. The story was presented binaurally over headphones. Participants were simply instructed to pay attention to the content of the story, and they were informed that the story would last about 5 minutes.

For the auditory lexical decision task, stimuli were presented binaurally over headphones one at a time. Participants were instructed to indicate whether each stimulus was a real word of English or a nonsense word by pressing the button on the button box labelled “Word” or “Nonword”, respectively, as quickly as possible without sacrificing accuracy. Participants used their right index finger to press the “Word” button and their left index finger to press the “Nonword” button. After participants made a valid response, the experiment advanced to the next trial with an inter-trial interval of 1000 ms. Response choice and response time from trial onset were recorded for every trial. The lexical decision task comprised 192 items: the 64 standard sounding filler words, the 64 maximal nonwords, half of the target lexical items (n = 32) pronounced with the novel back vowel lowered chain shift (8 items with each of the four shifts that define the system of lowering), and the other half of the target lexical items (n = 32) produced in the novel back vowel raised accent (8 items with each of the four shifts that define the system of raising). Two lists were created so that the 64 target lexical items were presented equally often in the back vowel lowered and back vowel raised accents across participants. Four fixed pseudorandom orders of each list were created to dissociate item and trial while ensuring two presentation constraints: (i) the target items were separated by at least one filler trial, and (ii) the items containing shifted (lowered and raised) variants of each target vowel category (/u/, /U/, /o/, and merged /O, A/) were approximately evenly distributed throughout the task. Participants were allowed to take a self-paced break halfway through the task.
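The first presentation constraint (no two target items on adjacent trials) can be illustrated with a short Python sketch. This is not the procedure that was actually used to build the stimulus lists, which is not described in further detail here; the function name `pseudorandom_order`, the item labels, and the seed are hypothetical, and the second constraint (approximately even distribution of vowel categories) is deliberately not enforced.

```python
import random


def pseudorandom_order(targets, fillers, seed):
    """Return a shuffled trial list in which no two targets are adjacent.

    Hypothetical helper: fillers are shuffled, then each target is dropped
    into a distinct "slot" (before the first filler, between two fillers,
    or after the last filler), so at least one filler separates any two targets.
    """
    rng = random.Random(seed)
    targets = list(targets)
    fillers = list(fillers)
    n_slots = len(fillers) + 1
    if len(targets) > n_slots:
        raise ValueError("not enough fillers to separate all targets")
    rng.shuffle(targets)
    rng.shuffle(fillers)
    # Pick one distinct slot per target.
    chosen = set(rng.sample(range(n_slots), len(targets)))
    next_target = iter(targets)
    order = []
    for slot in range(n_slots):
        if slot in chosen:
            order.append(next(next_target))
        if slot < len(fillers):
            order.append(fillers[slot])
    return order


# Illustrative list sizes taken from the text: 64 targets, 128 fillers.
example_targets = [("target", i) for i in range(64)]
example_fillers = [("filler", i) for i in range(128)]
example_order = pseudorandom_order(example_targets, example_fillers, seed=1)
```

Calling the function with four different seeds would yield four fixed orders of a list, each satisfying constraint (i).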

The auditory naming task was designed as a go/no-go task with stimulus presentation advancing automatically. The stimuli were the same 192 test items from the lexical decision task; however, four different fixed pseudorandom orders were created for the two stimulus lists to ensure the items occurred in different orders across tasks. Stimuli were presented binaurally over headphones one at a time, and participants were instructed to respond only if they recognized a stimulus as a real word of English, in which case they were instructed to identify the word by saying it aloud as quickly as possible without sacrificing accuracy. If they did not recognize a stimulus as a real word of English, they were instructed to remain silent. The beginning of each trial was signaled by a beep lasting 250 ms, followed by 500 ms of silence and then presentation of the auditory stimulus. There was a 2000 ms response window following each stimulus before the next trial began. Audacity was run in the background to record the audio on each trial (i.e., the onset beep, the auditory stimulus, and participants’ responses, if any) for subsequent analysis of identification accuracy and response latency.

3.2.5 Coding

The dependent measure of interest for the lexical decision task was response type: ‘word’ (coded as 1) vs. ‘nonword’ (coded as 0). Lexical decision response times were not analyzed due to properties of the test stimuli that confound interpretation of these data. In brief, the target vowel shifted items were designed to be perceived differently depending on the preceding exposure phase. Thus, there was technically no correct response for these items (and hence no way of determining errors), and there was considerable data sparsity across cells of the design. Together, these factors greatly increase the likelihood of spurious results when analyzing response time data (see Chapter 2 for extensive discussion; see also Diependaele et al., 2012).

The dependent measures of interest for the naming task were response accuracy and response time. Response accuracy was determined using the following blind coding procedure. The verbal responses were first transcribed by talkers from the same dialect region as the participants, thereby reducing the likelihood of cross-dialect phonological and lexical confusion when listening to the spoken responses (e.g., Jacewicz & Fox, 2012; Clopper et al., 2010).

Participants’ native dialect region was determined by assessment of their residential history from birth to 18 years of age. With one exception, all participants were from Ohio or from Detroit, Michigan and, hence, from either the North or Midland dialect regions. Highway US-30 was used as the boundary for these two regions (Clopper & Bradlow, 2008). Participants from any city on or north of US-30 in Ohio and Michigan were coded as from the North, and naming data from these participants were transcribed by a trained research assistant from Northern Ohio. Participants from any city south of US-30 were coded as from the Midland region, and naming data from these participants were transcribed by a trained research assistant from central Ohio (in the Midland region). One participant was mobile across dialect regions: she lived in southern California (the West dialect region) from birth to 4 years of age and then lived in the Midland region from ages 4 to 18. This participant had an impressionistically standard-sounding vowel space (as judged by the author based on responses to stimulus words containing unshifted vowels), so naming data from this participant were coded by the transcriber from the Midland region.

When transcribing participants’ responses, the transcribers were blind to the test stimulus on each trial. For each participant, transcribers were given a sound file containing the auditory naming data for that participant, and a Praat TextGrid with intervals time-aligned to the response on each trial (if any). The transcribers were instructed to listen to each response over headphones in a quiet room, to orthographically transcribe each response that sounded like a real word of English using the corresponding interval on the TextGrid, and to write “nonword” in the corresponding interval for any response that did not sound like a real word of English.

The transcribed responses were then compared to the corresponding test stimuli to code trial-wise accuracy: correct (= 1) vs. incorrect (= 0). Since the naming task used a go/no-go design, accuracy was coded differently for trials involving maximal nonwords and trials involving real word stimuli (i.e., words pronounced with either standard or shifted vowels). For trials involving maximal nonwords, trial-wise performance was coded as correct if participants made no response; if participants made a response of any kind (e.g., repeating the nonword, or uttering a phonologically similar word like “dwarf” in response to the nonword dorve), performance on that trial was coded as incorrect. For trials involving back vowel lowered items, back vowel raised items, or filler words pronounced with standard sounding vowels, responses were coded as correct if the transcribed response matched the target lexical item on that trial. If the transcribed response word did not match the target lexical item, or if participants made no response, performance on that trial was coded as incorrect.
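The go/no-go accuracy coding rules described above can be summarized as a small decision function. The following Python sketch is only an illustration of those rules: the function name `code_naming_accuracy`, the item-type label `"maximal_nonword"`, and the use of `None` to mark the absence of a response are hypothetical conventions, not part of the original coding materials.

```python
def code_naming_accuracy(item_type, target_word, transcription):
    """Return 1 (correct) or 0 (incorrect) for one go/no-go naming trial.

    item_type:     "maximal_nonword" or any real-word item type (hypothetical labels)
    target_word:   the intended lexical item (None for maximal nonwords)
    transcription: the transcriber's entry, or None if no response was made
    """
    responded = transcription is not None
    if item_type == "maximal_nonword":
        # No-go trial: staying silent is correct; any response is an error.
        return 0 if responded else 1
    # Go trial: silence or a response transcribed as "nonword" is an error.
    if not responded or transcription.strip().lower() == "nonword":
        return 0
    # Otherwise the transcribed word must match the target lexical item.
    return 1 if transcription.strip().lower() == target_word.strip().lower() else 0
```

For example, a response transcribed as “dwarf” to the maximal nonword dorve is coded 0, whereas silence on the same trial is coded 1.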

Auditory naming response times were measured as the latency between the offset of the stimulus item and the onset of vocalization marking a response. Response times were only analyzed for trials involving correct identification of the target vowel-shifted words and filler words (response times to maximal nonwords were not analyzed). Overall, 23% of responses to target and filler words were excluded for involving an incorrect response. Response times were log-transformed to correct for skewness in the distribution of response times. Outliers were defined as response times greater than 2.5 standard deviations from the corresponding subject’s mean response time. Outliers were removed prior to analysis (2.9% of trials involving correct identifications of target and filler words).
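The response time preprocessing can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis script: it assumes a natural-log transform, that the 2.5 standard deviation criterion is two-sided and applied to the log-transformed values, and that all response times are positive. The function name `trim_log_rts` and the input format are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def trim_log_rts(trials, criterion=2.5):
    """Log-transform RTs and drop per-subject outliers.

    trials: iterable of (subject, rt_ms) pairs for correct responses only.
    Returns a list of (subject, log_rt) pairs. Assumption: an observation is
    an outlier if it lies more than `criterion` SDs from that subject's mean
    log RT, in either direction.
    """
    by_subject = defaultdict(list)
    for subject, rt_ms in trials:
        if rt_ms <= 0:
            raise ValueError("log transform requires positive response times")
        by_subject[subject].append(math.log(rt_ms))
    kept = []
    for subject, log_rts in by_subject.items():
        if len(log_rts) < 2:
            # SD is undefined for a single observation; keep it as is.
            kept.extend((subject, x) for x in log_rts)
            continue
        m, sd = mean(log_rts), stdev(log_rts)
        kept.extend((subject, x) for x in log_rts if abs(x - m) <= criterion * sd)
    return kept
```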

3.2.6 Analysis

The binary lexical decision response data (‘word’ vs. ‘nonword’ responses) and auditory naming accuracy data (correct vs. incorrect responses) were analyzed using generalized linear mixed-effects regression, as implemented in the lme4 package (version 1.1-7; Bates, Maechler, et al., 2014) in R (R Core Team, 2014). Response times were analyzed using linear mixed-effects regression. The modeling strategies were the same as for Experiments 1-3: (i) categorical variables were coded as sum contrasts; (ii) model specification involved the design-driven maximal random effects structure (Barr et al., 2013); (iii) in cases of non-convergence, the random effects structure was systematically simplified in a step-wise manner by removing correlations among random effects and then removing the random effects with the smallest variance; and (iv) log-likelihood comparisons were used to assess the significance of fixed effects terms by determining their contribution to model fit. When log-likelihood comparisons revealed significant interactions among fixed effects, the significance of lower order terms involved in these interactions (i.e., main effects and lower order interactions) was assessed using Wald’s z-score for logistic regression models or the t-statistic for linear regression models. The current data sets are sufficiently large that a t-statistic with an absolute value greater than or equal to 2 can be assumed to be significant at an alpha level of .05.
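The Wald statistic and the |t| ≥ 2 criterion can be made concrete with a few lines of Python. This sketch is illustrative only (the analyses themselves were run in R with lme4); the function names are hypothetical, and the p-values assume a standard normal reference distribution, which is the large-sample approximation the criterion relies on.

```python
import math


def wald_z(beta, se):
    """Wald's z-score: the coefficient estimate divided by its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided p-value for z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Under the normal approximation, |t| = 2 corresponds to p of roughly .046,
# which is why |t| >= 2 is treated as significant at alpha = .05.
p_at_criterion = two_sided_p(2.0)

# Example with the exposure condition estimate reported in Table 3.2
# (beta = -1.78, SE = 0.62); the rounded inputs give z of about -2.9.
z_exposure = wald_z(-1.78, 0.62)
p_exposure = two_sided_p(z_exposure)
```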

3.3 Results

To assess perceptual learning of the back vowel lowered and back vowel raised chain shifts, three dependent measures were analyzed: endorsement rates during lexical decision (i.e., proportion of ‘word’ responses), identification accuracy during the auditory naming task, and response times for correct naming responses.

[Figure 3.4. x-axis: Item Type (unshifted front vowel items, unshifted back vowel items, back vowel lowered items, back vowel raised items, maximal nonwords); y-axis: proportion ‘word’ response (0.0 to 1.0); legend: Exposure (Back Vowel Lowering, Back Vowel Raising).]

Figure 3.4: Experiment 4, mean proportion of ‘word’ responses during lexical decision by item type and exposure condition. Error bars indicate bootstrapped 95% confidence intervals.

3.3.1 Lexical decision endorsement rates

Figure 3.4 shows a summary of overall endorsement rates during lexical decision, plotted by exposure condition (the back vowel lowered vs. the back vowel raised accent) for each of the five item types: the target back vowel lowered and back vowel raised items, the filler front vowel and back vowel words pronounced with unshifted (standard sounding) vowels, and the filler nonwords.
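The error bars in the figures in this section are bootstrapped 95% confidence intervals. The exact bootstrap procedure is not specified, so the following Python sketch shows one common variant purely for illustration: a percentile bootstrap of the mean that resamples individual trials with replacement. The function name `bootstrap_ci`, the number of resamples, and the trial-level resampling unit are all assumptions.

```python
import random


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    values: e.g., a list of 0/1 lexical decision responses for one cell.
    Assumption: trials are resampled independently with replacement.
    """
    rng = random.Random(seed)
    n = len(values)
    # Mean of each bootstrap resample, sorted so percentiles can be read off.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical cell with 60 'word' and 40 'nonword' responses.
example_responses = [1] * 60 + [0] * 40
example_lower, example_upper = bootstrap_ci(example_responses)
```

Resampling subjects rather than trials would respect the repeated-measures structure of the data more closely; the trial-level version is shown only because it is the simplest to state.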

The central question of this study was whether listeners can fill in gaps in their experience when adapting to an unfamiliar vowel chain shift: specifically, whether listeners can generalize from experience with vowel shifts affecting the vowels /u, o, O, A/ to an untrained shift affecting the vowel /U/. To address this question, lexical decisions to the target back vowel lowered and back vowel raised items were fitted with a mixed logit model containing fixed effects for exposure condition, item type, and vowel (a four-level factor coding the identity of the shifted vowel in each target word: /u/, /U/, /o/, and the merged vowels /O, A/). This model contained random intercepts for subjects and items, by-subject random slopes for item type and vowel, and by-item random slopes for item type and exposure condition. The fixed effect parameter estimates from this model are shown in Table 3.2, along with the results of log-likelihood comparisons testing the contribution of each fixed effect term to model fit. Coefficient estimates are reported in log-odds. Positive coefficients indicate increased log-odds (and hence increased probability) of making a ‘word’ response. The plots in Figure 3.5 show the behavioral results that were modeled by this analysis: that is, endorsement rates for words containing the target vowel shifts by exposure condition, item type, and vowel.

Table 3.2: Experiment 4, summary of the full mixed logit model of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level pz for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model. BVR = back vowel raised.

                                           Parameter estimates   Wald’s test        LL comparisons
Predictors (fixed effects)                 Coef β    SE(β)       z       pz         χ²∆     df    p(χ²∆)
(Intercept)                                −0.14     0.35        −0.4    0.70
exposure condition (= BVR chain shift)     −1.78     0.62        −2.9    <.01       19.83   12    < .05
item type (= BVR items)                    −0.24     0.30        −0.8    0.43
vowel (= /u/)                              −0.09     0.73        −0.1    0.90
vowel (= /U/)                              −0.52     0.68        −0.8    0.45
vowel (= /o/)                               0.69     0.66         1.0    0.30
exposure condition : item type              0.77     0.38         2.0    <.05       5.50    4     0.24
exposure condition : /u/                   −0.06     0.77        −0.1    0.94
exposure condition : /U/                   −0.47     0.61        −0.8    0.44       5.23    6     0.52
exposure condition : /o/                   −0.33     0.53        −0.6    0.53
item type : /u/                            −8.39     1.09        −7.7    <.001
item type : /U/                             3.19     0.95         3.4    <.001      53.49   6     < .001
item type : /o/                             3.45     0.95         3.6    <.001
exposure condition : item type : /u/        1.50     1.21         1.2    0.22
exposure condition : item type : /U/       −0.27     0.96        −0.3    0.78       2.42    3     0.49
exposure condition : item type : /o/       −1.30     0.95        −1.4    0.17

Note: LL comparison values listed beside one row of a multi-row term apply to that term as a whole (all rows for that term).
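Because the coefficients of the mixed logit model are on the log-odds scale, it can help to map them back to probabilities. The Python sketch below shows the inverse-logit transform; the function name `inv_logit` is hypothetical. Reading the intercept as a grand mean assumes the sum-contrast coding described in Section 3.2.6 and a roughly balanced design, and it describes a typical subject and item rather than the raw sample proportion.

```python
import math


def inv_logit(log_odds):
    """Convert log-odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Intercept from Table 3.2 (-0.14 log-odds): a 'word' response probability
# slightly below .5 at the grand mean of the predictors.
p_grand_mean = inv_logit(-0.14)

# A positive coefficient raises the probability of a 'word' response,
# a negative coefficient lowers it.
p_higher = inv_logit(-0.14 + 0.77)
p_lower = inv_logit(-0.14 - 0.77)
```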

[Figure 3.5. Left panel: Exposure: Back Vowel Lowering. Right panel: Exposure: Back Vowel Raising. x-axis: Vowel in Test Items (/u/, /ʊ/, /o/, /ɑ,ɔ/); y-axis: proportion ‘word’ response (0.0 to 1.0); legend: Item Type (back vowel lowered items, back vowel raised items).]

Figure 3.5: Experiment 4, mean proportion of ‘word’ responses during lexical decision for the target back vowel lowered and back vowel raised items, plotted by exposure condition and target vowel. Error bars indicate bootstrapped 95% confidence intervals.

There was a significant main effect of exposure condition. Listeners who were familiarized to the back vowel raised accent endorsed fewer of the vowel shifted items overall than listeners who were familiarized to the back vowel lowered accent (see Figure 3.4), even though the exposure conditions were identical except for the direction of the vowel chain shift. Thus, the degree of flexibility in coping with cross-category vowel variation depended in part on the nature of the vowel shifts that listeners initially experienced. This novel finding is discussed below in terms of the learning mechanisms that drive adaptation. The main effect of vowel did not contribute significantly to model fit, which suggests that endorsement rates for words containing the incidental gap vowel /U/ were comparable overall to words containing the trained back vowels /u, o, O, A/. This pattern is visible in Figure 3.5 by comparing vowel-wise endorsement rates within each exposure condition (since endorsement rates differed by condition), independent of item type.

There was a significant two-way interaction between item type and vowel. As shown in Figure 3.5, words containing shifted variants of the vowel /u/ were perceived differently depending on the direction of the vowel shift. Back vowel raised items containing the vowel /u/ (e.g., “rem[i]ve” for remove, given the shift of /u/ to [i]) tended to be perceived as nonwords, even by listeners who were trained on the back vowel raised accent (right panel of Figure 3.5), whereas listeners in both exposure conditions tended to have less difficulty recognizing back vowel lowered items containing the vowel /u/ (e.g., “red[U]ce” for reduce, given the lowering of /u/ to [U]). Further, the item type by vowel interaction showed that the back vowel raised items containing the untrained vowel /U/ (e.g., “b[u]tcher” for butcher, given the raising of /U/ to [u]) and the trained vowel /o/ (e.g., “[U]val” for oval, given the raising of /o/ to [U]) were endorsed significantly more often, across exposure conditions, than words containing lowered variants of these vowels (e.g., “b[o]shel” for bushel, given the lowering of /U/ to [o], and “[A]pen” for open, given the lowering of /o/ to [A]). There were no other interactions involving vowel. Thus, as shown in Figure 3.5, listeners in each exposure condition showed comparable performance for words containing the vowel /U/ (i.e., the incidental gap vowel) and words containing the other target vowels (except /u/, which patterned differently overall), which suggests that learning extended beyond the specific shifts experienced during the training phase.

There was a numeric relationship between exposure condition and item type: listeners in the back vowel raised condition endorsed a numerically greater proportion of the accent-consistent items at test, whereas listeners in the back vowel lowered condition endorsed both accent-consistent and accent-inconsistent vowel shifted items. This interaction did not reach significance according to log-likelihood comparisons; note, however, that the log-likelihood ratio test was conservative in this case because the full model was compared to a subset model lacking both the exposure by item type interaction and the higher order three-way interaction with vowel. This trend suggests that listeners in the back vowel raised condition learned a direction-specific pattern of vowel shifts, while learning in the back vowel lowered condition involved a general broadening of perceptual vowel categories to accommodate atypical vowel variation.

[Figure 3.6. x-axis: Item Type (unshifted front vowel items, unshifted back vowel items, back vowel lowered items, back vowel raised items, maximal nonwords); y-axis: proportion correct naming response (0.0 to 1.0); legend: Exposure (Back Vowel Lowering, Back Vowel Raising).]

Figure 3.6: Experiment 4, mean proportion of correct responses during the go/no-go naming task by item type and exposure condition. Error bars indicate bootstrapped 95% confidence intervals.

3.3.2 Naming accuracy

Figure 3.6 shows a summary of overall accuracy on the auditory naming task plotted by exposure condition and item type. Recall that since the naming task used a go/no-go design, accuracy for the maximal nonwords was measured as the proportion of trials involving no response, whereas accuracy for all other item types (i.e., filler words pronounced with standard sounding vowels, and target words pronounced with shifted vowels) was measured as the proportion of spoken words that were correctly identified.

To assess learning and generalization of the target vowel chain shifts, the trial-wise accuracy data for the back vowel lowered and back vowel raised test items were fitted with a mixed logit model containing fixed effects for exposure condition, item type, and the identity of the shifted vowel in each test word (i.e., /u/, /U/, /o/, and the merged vowels /O, A/). This model contained a random intercept for subjects, by-subject random slopes for item type and vowel, and by-item random slopes for item type and exposure condition.³ Table 3.3 shows the fixed effect parameter estimates for this model and the results of log-likelihood comparisons testing the contribution of fixed effect terms to model fit. Coefficients are reported in log-odds. Positive coefficients indicate increased log-odds of correct word identification. Figure 3.7 shows the behavioral results that correspond to this analysis: that is, mean accuracy for the target items by exposure condition, item type and vowel.

³ Due to convergence issues, the by-item random intercept was dropped after uncorrelating the item intercept and the by-item random slopes.

[Figure 3.7. Left panel: Exposure: Back Vowel Lowering. Right panel: Exposure: Back Vowel Raising. x-axis: Vowel in Test Items (/u/, /ʊ/, /o/, /ɑ,ɔ/); y-axis: proportion correct identification (0.0 to 1.0); legend: Item Type (back vowel lowered items, back vowel raised items).]

Figure 3.7: Experiment 4, mean naming accuracy for the target back vowel lowered and back vowel raised items, plotted by exposure condition and target vowel. Error bars indicate bootstrapped 95% confidence intervals.

Table 3.3: Experiment 4, summary of the full mixed logit model of identification accuracy on target naming trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level pz for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

                                           Parameter estimates   Wald’s test        LL comparisons
Predictors (fixed effects)                 Coef β    SE(β)       z       pz         χ²∆     df    p(χ²∆)
(Intercept)                                 0.16     0.63         0.3    0.80
exposure accent (= BVR chain shift)        −2.47     1.15        −2.1    <.05
item type (= BVR items)                    −0.19     0.36        −0.5    0.60
vowel (= /u/)                               0.66     1.09         0.6    0.54
vowel (= /U/)                              −0.62     1.06        −0.6    0.56
vowel (= /o/)                               0.39     1.01         0.4    0.70
exposure accent : item type                 2.09     0.45         4.6    <.001
exposure accent : /u/                      −0.69     1.03        −0.7    0.50
exposure accent : /U/                       0.57     0.98         0.6    0.56
exposure accent : /o/                      −0.30     0.80        −0.4    0.71
item type : /u/                           −10.68     1.40        −7.7    <.001
item type : /U/                             4.28     1.11         3.9    <.001
item type : /o/                             4.27     1.14         3.8    <.001
exposure accent : item type : /u/          −0.50     1.66        −0.3    0.76
exposure accent : item type : /U/          −2.32     1.22        −1.9    0.06       10.09   3     < .05
exposure accent : item type : /o/          −0.48     1.32        −0.4    0.72

Note: LL comparison values listed beside one row of a multi-row term apply to that term as a whole (all rows for that term).

The naming results closely mirror the lexical decision results. There was a significant main effect of exposure condition (as indicated by the z-score for this model term, since log-likelihood comparisons revealed a higher order interaction involving this term). The effect of exposure condition indicates that listeners in the back vowel raised condition correctly identified fewer of the vowel shifted items overall than listeners in the back vowel lowered condition (see the aggregate results in Figure 3.6). This main effect was qualified by a significant interaction between exposure condition and item type: listeners in the back vowel raised condition showed greater identification accuracy for the back vowel raised items (i.e., accent-consistent items) than for the back vowel lowered items (i.e., accent-inconsistent items), whereas listeners in the back vowel lowered condition correctly identified more of the vowel shifted items overall regardless of the direction of the shift. There was no main effect of vowel, which indicates that identification accuracy for target words containing /U/ (the incidental gap vowel) was comparable overall to identification accuracy for words containing the trained back vowels /u, o, O, A/. This pattern is visible in Figure 3.7 by comparing identification accuracy for words containing each vowel within each exposure condition (since accuracy differed by condition), independent of item type.

As in the lexical decision data, there was a significant interaction between item type and vowel. Identification accuracy for words containing shifted variants of the vowel /u/ differed markedly depending on the direction of the vowel shift. Accuracy was significantly lower for words pronounced with the raised variant of /u/ (e.g., “rem[i]ve” for remove, given the shift of /u/ to [i]) than for words pronounced with the lowered variant of this vowel (e.g., “red[U]ce” for reduce, given the shift of /u/ to [U]), even for listeners who were trained on the back vowel raised chain shift (see the right panel of Figure 3.7). This finding indicates that the /u/ → [i] shift was fundamentally more difficult for listeners to cope with than the /u/ → [U] shift, likely due to the greater distance of the former shift in acoustic space. By contrast, identification accuracy was significantly higher overall for words pronounced with raised variants of the vowels /U/ and /o/ (e.g., “b[u]tcher” for butcher, given the raising of /U/ to [u], and “[U]val” for oval, given the raising of /o/ to [U]) than for words pronounced with lowered variants of these vowels (e.g., “b[o]shel” for bushel, given the lowering of /U/ to [o], and “[A]pen” for open, given the lowering of /o/ to [A]). This effect is apparent in the middle two sets of bars in each panel of Figure 3.7.

Further, there was a significant three-way interaction between exposure, item type, and vowel. This interaction concerned accuracy for words containing the vowels /O, A/; hence, the factor structure of the mixed logit model had to be re-leveled to obtain parameter estimates for this effect. This interaction indicated that in the back vowel raised condition, accuracy for words containing raised variants of /O, A/ (e.g., “c[o]ttage” for cottage, given the shift of /A/ to [o]) was greater than for words containing lowered variants of these vowels (e.g., “h[æ]stage” for hostage, given the shift of /A/ to [æ]), relative to the accuracy difference for the same words following exposure to the back vowel lowered accent (β = 3.3, z = 2.7, p < .01).

While these data show variability in word identification accuracy depending on the nature of the vowel shift and exposure conditions, a basic pattern is consistent across these data concerning accuracy for words containing the incidental gap vowel /U/. In the back vowel lowered condition, identification accuracy for words containing the untrained but accent-consistent variant of /U/ (i.e., the lowered variant) was comparable to words containing the lowered variants of the trained vowels /o, O, A/ (see the overlapping confidence intervals among the light gray bars in the left panel of Figure 3.7). Likewise, in the back vowel raised condition, identification accuracy for words containing the untrained but accent-consistent raised variant of /U/ was comparable to words containing the raised variants of the trained vowels /o, O, A/ (see the overlapping confidence intervals among the dark gray bars in the right panel of Figure 3.7). Thus, learning extended beyond the specific shifts experienced during the training phase.

3.3.3 Naming latency

Figure 3.8 shows the mean response time for correctly identifying test words containing the target back vowels /u, U, o, O, A/: the filler words pronounced with unshifted realizations of these back vowels, the target words pronounced with lowered variants of these back vowels, and the target words pronounced with raised variants of these back vowels. Response times are plotted by exposure condition and the identity of the target vowel in each test item. Two patterns are apparent in the response time data. First, the largest response time differences appear to be due to the identity of the vowel in the test words: listeners were slowest overall to identify words containing the incidental gap vowel /U/, regardless of the manner in which this vowel was produced (i.e., a shifted or unshifted variant). Second, response times for words containing each target vowel were quite similar across the unshifted, back vowel lowered, and back vowel raised item types (i.e., comparing response times within each set of shaded bars in Figure 3.8), particularly for words containing the vowels /u, O, A/.

[Figure 3.8. Left panel: Exposure: Back Vowel Lowering. Right panel: Exposure: Back Vowel Raising. x-axis: Vowel in Test Items (/u/, /ʊ/, /o/, /ɑ,ɔ/); y-axis: RT (ms) from stimulus offset; legend: Item Type (unshifted back vowel items, back vowel lowered items, back vowel raised items).]

Figure 3.8: Experiment 4, mean naming response time for correct responses to the test items containing unshifted (standard-sounding) and shifted back vowels, plotted by exposure condition and target vowel. Error bars indicate bootstrapped 95% confidence intervals.

To assess post-exposure recognition speed for words containing trained and untrained back vowel variants, a mixed linear model was fitted to log response times (measured from stimulus offset) on the subset of trials involving correct identification of back vowel stimulus words. This model contained fixed effects for three factors and all interactions: exposure condition (the back vowel lowered accent vs. the back vowel raised accent), item type (filler words pronounced with unshifted back vowels; back vowel lowered items; and back vowel raised items), and the identity of the vowel in the test items (/u/, /o/, the merged vowels /ɔ, ɑ/, and the incidental gap vowel /ʊ/). The random effects structure comprised random intercepts for subjects and items, and by-subject random slopes for item type, vowel, and the interaction term. The fixed effect parameter estimates for this model are summarized in Table 3.4, along with the results of log-likelihood comparisons testing the contribution of fixed effect terms to model fit. Coefficients are reported in log milliseconds; hence, positive coefficients indicate longer response times.

Table 3.4: Experiment 4, summary of the full mixed linear model of RTs on correct naming trials: coefficient estimates β, standard errors SE(β), and associated t-statistic (= β/SE(β)) for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model. Each LL comparison applies to the block of terms on whose first row it is listed.

                                               Parameter estimates   Test statistic   LL comparisons
Predictors (fixed effects)                       Coef β    SE(β)           t          χ²Δ    dfΔ    p
(Intercept)                                       5.88     0.08          69.9
exposure accent (= BVR chain shift)               0.01     0.16           0.1
item type (= BVR items)                           0.11     0.05           2.1
item type (= BVL items)                           0.11     0.05           2.1
vowel (= /u/)                                    −0.49     0.12          −4.2
vowel (= /ʊ/)                                     0.40     0.11           3.8
vowel (= /o/)                                    −0.02     0.10          −0.2

exposure accent : item type (= BVR)              −0.12     0.07          −1.7        14.66    8    = .07
exposure accent : item type (= BVL)               0.22     0.07           3.0

exposure accent : /u/                            −0.20     0.14          −1.4        11.41    9    0.25
exposure accent : /ʊ/                             0.04     0.11           0.4
exposure accent : /o/                             0.18     0.07           2.6

item type (= BVR) : /u/                           0.78     0.18           4.4        83.19   12    < .001
item type (= BVL) : /u/                          −0.91     0.16          −5.5
item type (= BVR) : /ʊ/                          −0.19     0.16          −1.2
item type (= BVL) : /ʊ/                           0.12     0.17           0.7
item type (= BVR) : /o/                          −0.40     0.16          −2.5
item type (= BVL) : /o/                           0.67     0.17           3.9

exposure accent : item type (= BVR) : /u/        −0.37     0.24          −1.5         4.96    6    0.55
exposure accent : item type (= BVL) : /u/         0.30     0.21           1.4
exposure accent : item type (= BVR) : /ʊ/         0.01     0.21           0.1
exposure accent : item type (= BVL) : /ʊ/        −0.24     0.23          −1.0
exposure accent : item type (= BVR) : /o/         0.11     0.20           0.5
exposure accent : item type (= BVL) : /o/        −0.02     0.22          −0.1
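As a rough check on the log-likelihood comparisons reported in Table 3.4, the p-values can be recovered from the reported χ² statistics and degrees of freedom. The sketch below is illustrative only and is not the analysis code used for the dissertation: the function name `chi2_sf`, the dictionary `COMPARISONS`, and its term labels are hypothetical, and the chi-square upper-tail probability is computed with a standard recurrence for the regularized incomplete gamma function using only the Python standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1), starting from
    Q(1, h) = exp(-h) for even df or Q(1/2, h) = erfc(sqrt(h)) for odd df.
    """
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

# (chi-square, df) pairs as reported in Table 3.4; the labels are illustrative.
COMPARISONS = {
    "exposure accent : item type": (14.66, 8),
    "exposure accent : vowel": (11.41, 9),
    "item type : vowel": (83.19, 12),
    "exposure accent : item type : vowel": (4.96, 6),
}

p_values = {term: chi2_sf(chi2, df) for term, (chi2, df) in COMPARISONS.items()}

for term, p in p_values.items():
    print(f"{term}: p = {p:.3g}")
```

Under these assumptions, the recomputed values agree with the tabled p-values to two decimal places, with the item type by vowel comparison falling well below .001.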

The main effect of exposure accent (manipulated between subjects) was not significant, indicating that overall response times were comparable between listeners who were familiarized to the back vowel lowered accent and listeners who were familiarized to the back vowel raised accent. There was a main effect of item type. Because log-likelihood comparisons revealed a significant interaction involving item type, this main effect was assessed using the criterion of a t-statistic with an absolute value greater than or equal to 2. The main effect of item type indicates that listeners were slower overall to recognize words pronounced with lowered and raised back vowels, relative to the grand mean response time. Note, however, that the main effect parameter estimates for the back vowel lowered and back vowel raised items were small (βs = 0.11, which corresponds to a delay in word recognition of about 40 milliseconds), and these estimates were just above the threshold for determining significance (ts = 2.1). Thus, while the atypical vowel variants caused a significant overall decrease in word recognition speed, the magnitude of this decrease was rather small. The implications of this finding are discussed below. There was also a main effect of vowel: response times were faster than average for words containing the vowel /u/ and slower than average for words containing the incidental gap vowel /ʊ/. The latter effect suggests that listeners were sensitive to the phonological structure of the exposure materials. Lacking evidence for how the trained talker produces /ʊ/, listeners were slower to recognize words containing this vowel (relative to their average word recognition speed), regardless of whether /ʊ/ was lowered to [o] (e.g., “b[o]shel” for bushel), raised to [u] (e.g., “b[u]tcher” for butcher), or pronounced as the unshifted standard-sounding variant [ʊ].
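The correspondence between a coefficient of 0.11 log milliseconds and a delay of about 40 milliseconds can be illustrated by back-transforming from the log scale. The sketch below assumes that the delay is computed relative to the model intercept (5.88 in Table 3.4, interpreted as the grand mean log response time); the function and constant names are hypothetical and are not taken from the original analysis.

```python
import math

INTERCEPT = 5.88        # intercept from Table 3.4, in log milliseconds
BETA_ITEM_TYPE = 0.11   # main effect estimate for BVL and BVR items

def delay_ms(intercept, beta):
    """Back-transform a log-scale effect into a millisecond difference at the intercept."""
    baseline = math.exp(intercept)         # response time at the intercept, in ms
    shifted = math.exp(intercept + beta)   # response time with the effect added, in ms
    return shifted - baseline

baseline_ms = math.exp(INTERCEPT)
delay = delay_ms(INTERCEPT, BETA_ITEM_TYPE)
print(f"baseline = {baseline_ms:.0f} ms, delay = {delay:.1f} ms")
```

Because the model is fitted on the log scale, the same coefficient implies a larger millisecond delay for conditions with longer baseline response times, so the figure of about 40 milliseconds is specific to the grand mean.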

There was a marginal interaction between exposure accent and item type: listeners who were familiarized to the back vowel raised accent tended to be faster to recognize accent-consistent (i.e., back vowel raised) items and slower to recognize the accent-inconsistent back vowel lowered items, relative to their grand mean naming latency. There was also a significant interaction between item type and vowel. Regardless of exposure condition, listeners were slower than average to recognize back vowel raised items containing the vowel /u/ (e.g., “rem[i]ve” for remove, given the shift of /u/ to [i]) and were faster than average to recognize back vowel lowered items containing the vowel /u/ (e.g., “red[ʊ]ce” for reduce, given the shift of /u/ to [ʊ]). These findings are consistent with the lexical decision and naming accuracy data, which showed that listeners had difficulty accommodating the raising of /u/ to [i], even after training on the back vowel raised accent, whereas listeners in both exposure conditions had little difficulty accommodating the lowering of /u/ to [ʊ].

3.4 Discussion

This study investigated generalization to untrained parts of a vowel chain shift to understand the systematicity of perceptual learning. From a sound change perspective, vowel chain shifts are systematic in nature: one vowel shift triggers another, resulting in a chain of co-dependent vowel movement through acoustic-phonetic space (Labov, 1994; Martinet, 1955). From a learning perspective, however, sensitivity to co-variation among vowel categories is not inherently necessary for adaptation to a vowel chain shift. In principle, listeners could adapt to a system of vowel shifts by learning each vowel shift independently (i.e., a piecemeal learning process). This study used a “leave-one-out” exposure design to probe the nature of learning. During the accent exposure phase, listeners experienced a subset (n-1) of the vowel shifts that defined a novel chain shift (i.e., either a novel “back vowel lowered” or “back vowel raised” chain shift, manipulated between subjects). At test, listeners performed a lexical decision task and an auditory naming task, during which they heard words that collectively exemplified the full set of novel vowel shifts. Performance on these tasks showed that listeners were able to leverage knowledge about the trained vowel shifts (i.e., shifts affecting the back vowels /u, o, ɔ, ɑ/) to fill in the incidental gap in their experience of the novel chain shift (i.e., to recognize words containing shifted realizations of the untrained back vowel /ʊ/). This finding of generalization to an untrained vowel shift provides strong evidence against a rule-based account in which listeners adapt to an unfamiliar vowel chain shift by learning individual vowel shifts.
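The logic of the “leave-one-out” design described above can be sketched as a split of a chain shift into trained shifts and a held-out gap. This is a minimal illustration, not the stimulus-construction procedure used in the experiments: the names `BACK_VOWEL_LOWERED` and `leave_one_out` are hypothetical, and the mapping lists only the back vowel lowered shifts whose targets are stated in the text (/u/ to [ʊ], /ʊ/ to [o], /o/ to [ɑ]); the shift affecting the merged vowels /ɔ, ɑ/ is omitted because its target is not given here.

```python
# Back vowel lowered shifts named in the text: phonological category -> surface variant.
# Assumption: the shift for the merged vowels is left out because its target is not stated.
BACK_VOWEL_LOWERED = {"u": "ʊ", "ʊ": "o", "o": "ɑ"}

def leave_one_out(chain_shift, held_out):
    """Split a chain shift into the shifts heard during exposure and the untrained gap shift."""
    exposure = {vowel: variant for vowel, variant in chain_shift.items() if vowel != held_out}
    gap = {held_out: chain_shift[held_out]}
    return exposure, gap

# The incidental gap vowel is withheld during exposure but appears at test.
exposure_shifts, gap_shift = leave_one_out(BACK_VOWEL_LOWERED, "ʊ")
test_shifts = {**exposure_shifts, **gap_shift}

print("exposure:", exposure_shifts)
print("gap:", gap_shift)
```

Under a piecemeal account, exposure to the entries in `exposure_shifts` would carry no information about the entry in `gap_shift`; the generalization result reported above suggests that listeners nonetheless inferred it from the structure of the trained shifts.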

Response times from the post-exposure auditory naming task suggested that listeners were sensitive to the phonological structure of the exposure materials. Listeners were slower to recognize words containing the incidental gap vowel /ʊ/ (e.g., “b[o]shel” for bushel, given the lowering of /ʊ/ to [o] as part of the back vowel lowered chain shift, and “b[u]tcher” for butcher, given the raising of /ʊ/ to [u] as part of the back vowel raised chain shift), relative to the speed of recognition for words containing the trained vowel categories. Thus, while listeners were able to generalize learning to fill in an incidental gap in their experience, there was a processing cost associated with generalization.

An unexpected finding of this experiment was that exposure-driven perceptual adjustments differed depending on the direction of the vowel chain shift that listeners initially experienced. Listeners who were familiarized to the novel back vowel raised accent were subsequently better able to recognize the back vowel raised test items (i.e., accent-consistent items) than the back vowel lowered test items (i.e., accent-inconsistent items), which suggests that listeners learned a direction-specific system of variation. By contrast, listeners who were familiarized to the novel back vowel lowered accent were better able to recognize both the accent-consistent back vowel lowered items and the accent-inconsistent back vowel raised items, relative to listeners in the back vowel raised condition. Thus, as a result of exposure to the back vowel lowered accent, listeners became more tolerant of vowel variation among back vowels in general, regardless of the direction of the vowel shifts, which suggests that perceptual learning involved a general expansion of perceptual vowel categories, as opposed to direction-specific learning (cf. Maye et al., 2008). Despite these different adaptive processes, the exposure-driven word recognition benefits in each exposure condition extended to the incidental gap vowel /ʊ/.

The finding of different learning outcomes for different vowel chain shifts under otherwise identical exposure conditions might be due in part to the “naturalness” of the vowel shift systems: that is, the degree to which a given chain shift follows the general principles of chain shifting. As outlined above, three general principles guide the chain shifting of vowels cross-linguistically, given the phonological structure of vowel systems: (i) long vowels rise, (ii) short vowels fall, and (iii) back vowels move to the front (see Labov, 1994). Before developing a learning account based on the linguistic “naturalness” of the speech input, it is important to explain how the novel vowel chain shifts used in the current experiments relate to the principles of chain shifting. For the novel back vowel lowered chain shift, all back vowels (including the long vowels /u/ and /o/) were lowered to the nearest neighboring vowel category, which is unnatural according to the principle that long vowels rise. For the novel back vowel raised chain shift, all back vowels (including the short vowels /ɔ/, /ɑ/ and /ʊ/) were raised to the nearest neighboring vowel category, which is unnatural according to the principle that short vowels fall. Note that the presence of unnatural shifts in each novel chain shift system is not inherently problematic. Diachronic and synchronic cross-linguistic evidence reveals many chain shift systems comprised of constituent shifts “operating in the opposite direction” from what is predicted by the general principles of chain shifting (Labov, 1994, p. 155). Thus, as listeners move through the world and listen to people with different accents, they encounter (and must adapt to) natural and unnatural shifts. However, not all “unnatural” shifts are equal. The historic record of attested chain shifts reveals only one exception to the principle that long vowels rise, but multiple exceptions to the other principles (see Labov, 1994, Chapter 5). The principle that long vowels rise is thus the strongest of the three principles (and is perhaps a near linguistic universal). So from a linguistic perspective, lowering long vowels (i.e., violating the strongest principle, as in the case of the back vowel lowered chain shift) is less natural than violating either of the other two principles.

It is possible that the manner in which the speech processing system adapts to an unfamiliar chain shift depends on the “naturalness” of the shift system. In particular, when (some of) the vowel shifts comprising the shift system run counter to the strongest directional constraints on chain shifting, listeners might be biased against direction-specific learning. In such cases, listeners might cope with the unfamiliar vowel shifts by broadening the corresponding perceptual vowel categories to encompass a wider range of variation. Of course, a major caveat is that for this learning account based on linguistic “naturalness” to be tenable, listeners must have knowledge of the general principles of chain shifting (i.e., these principles must be cognitively real, as opposed to parsimonious descriptions of cross-linguistic phonological tendencies). It is conceivable that listeners have such knowledge, either through inference about likely and unlikely vowel shifts based on experience with existing chain shifts, or perhaps due to linguistic competence about the phonological structure of vowel systems (e.g., tacit knowledge of strong constraints on how vowel systems operate).

Another potentially relevant factor regarding the different learning outcomes for the two chain shifts in the current experiment is that the shift of /u/ to [i] (one of the shifts comprising the back vowel raised accent) was difficult for listeners to accommodate, even for listeners who were trained on the back vowel raised accent. Thus, it is also possible that listeners had greater difficulty adapting to the back vowel raised chain shift than to the back vowel lowered chain shift (despite the back vowel raised chain shift being comparatively more natural with respect to the principles of chain shifting), in turn impairing their ability to generalize beyond the properties of this chain shift.

The current study is not the first to find that exposure to different pronunciation variants under otherwise similar or identical exposure conditions markedly influences learning outcomes. In a series of interrelated studies, Kraljic and Samuel (2007, 2006, 2005) found that listeners dynamically recalibrated phonetic categories to cope with atypical segmental variation in the input. For example, hearing a talker whose realization of the fricative /s/ was acoustically ambiguous between [s] and [ʃ] caused listeners to recalibrate the phonetic boundary between the categories /s/ and /ʃ/; as a result, the otherwise ambiguous token [sʃ?] came to be perceived as an instance of the category /s/. When adapting to a talker’s acoustically ambiguous fricative variant (e.g., [sʃ?] for /s/), phonetic recalibration was talker-specific: that is, listeners did not generalize learning to an untrained talker who produced the same acoustically ambiguous fricative variant (see also Reinisch & Holt, 2014; Eisner & McQueen, 2005). By contrast, when adapting to atypical variation in the realization of stop consonants (e.g., [td?] for /t/ due to a talker’s atypical voice onset time distinction between [t] and [d]), learning generalized across talkers (Kraljic & Samuel, 2006). These results suggest that listeners can adjust the perceptual representation of talker-specific or talker-independent phonological categories in memory, depending on the nature of the atypical pronunciation variants in the input. Kraljic and Samuel (2006) proposed that these different learning outcomes might be due to inherent differences between fricatives and stop consonants. Their proposal was that fricative productions convey rich talker indexical information (e.g., spectral cues provide information about the talker’s sex and vocal physiology), which constrains learning to be talker-specific, whereas temporal variability in voice onset times for stop consonants does not carry talker indexical information, which allows learning to transfer across talkers. One wrinkle with this account is that not only do talkers produce reliable individual differences in voice onset times, even when stronger conditioning factors like speaking rate are controlled for (Allen, Miller, & DeSteno, 2003), but listeners are also sensitive to and can learn these talker-specific differences (Allen & Miller, 2004). Thus, the findings by Kraljic and Samuel (2007, 2006, 2005) cannot be fully explained by an account that proposes different learning processes for different types of segments. With respect to this latter point, the findings of the current study showed that pronunciation variation affecting the same segments but in different directions (i.e., back vowel lowering vs. back vowel raising) resulted in markedly different learning outcomes.

In summary, the results of the current experiment demonstrate that listeners are sensitive to systematic co-variation among vowels and can leverage this knowledge to facilitate recognition of accented speech. Specifically, when adapting to a talker with an unfamiliar vowel chain shift, listeners can generalize experience with specific vowel shifts to fill in gaps in their experience of the talker’s vowel space and thereby improve word recognition. The results of this study also highlight the fact that perceptual learning is not a singular process: listeners can adapt to atypical segmental variation by broadening or shifting perceptual categories (see also Kleinschmidt & Jaeger, 2015; White & Aslin, 2011), though further research is needed to understand the factors that condition these different processes.

Chapter 4

Experiments 5-6: Generalization Across Chain Shift Systems

4.1 Introduction

Vowel categories are immensely variable and highly overlapping in production (Hillenbrand et al., 1995; Peterson & Barney, 1952). This variability is due in part to talker-level factors like vocal physiology and idiolectal articulatory gestures (Nordström & Lindblom, 1975; Klatt, 1986), and in part to phonological processes involving cross-category vowel shifts (e.g., vowel chain shifts, mergers, splits), which result in vastly different pronunciation patterns for the same vowel categories across dialects and accents of a language (Labov, 1994). Thus, listeners must maintain considerable flexibility in how vowels are perceived and represented in order to map words produced by talkers with different accents to the correct lexical representations in memory. Previous research has shown that listeners cope with vowel variation by dynamically adjusting their perceptual vowel space in response to speech input, which in turn facilitates the recognition of words produced with similar vowel properties (Adank & Janse, 2010; Dahan et al., 2008; Evans & Iverson, 2004; Hay, Nolan, & Drager, 2006; Trude & Brown-Schmidt, 2012; Weber, Di Betta, & McQueen, 2014; see also Ladefoged & Broadbent, 1957; Chapters 2 and 3). The current experiments investigated the bounds of these exposure-driven word recognition benefits. These experiments focused on systemic cross-category vowel variation resulting from vowel chain shifts, a phenomenon in which multiple vowels are shifted co-dependently in acoustic-phonetic space (Labov, 1994; Martinet, 1955). Specifically, these experiments used a lexically-guided perceptual learning paradigm to investigate whether exposure-driven recognition benefits occur only for words containing the trained cross-category vowel variants, or whether these benefits extend to words pronounced with untrained but structurally similar vowel variants, as suggested by recent but inconclusive findings (Maye et al., 2008, Experiment 1; Weatherholtz, 2013; Chapter 3). Generalization is the hallmark of adaptive behavior and provides a window onto the nature of perceptual flexibility (Fenn, Nusbaum, & Pisoni, 2003; Karni & Bertini, 1997). By testing for generalization of learning to untrained vowel variants, the current experiments aimed to understand the manner in which the perceptual vowel space is adjusted in response to the environment, and thereby to provide insight into the role of vowel variation in spoken word recognition.

To illustrate the extent of the vowel variability that listeners must cope with, consider first the degree of acoustic-phonetic variability among talkers with the same dialect. Vowel categories can be described, to an approximation, by the mean frequency of the first and second formants (F1 and F2) of corresponding vowel tokens, and likewise relationships among vowel categories can be described by the relative position of these categories in F1 × F2 space. Perceptual identification studies have shown that vowel identification is driven largely by F1 and F2 (Yang & Fox, 2014; Fox, Flege, & Munro, 1995), which indicates that these dimensions are important for the perceptual representation of vowel categories, not just their acoustic characterization. Although F1 and F2 are relevant for vowel perception, vowel categories are massively overlapping in F1 × F2 space, as shown in Figure 4.1, which shows the mean F1 and F2 of American English vowels produced by 140 talkers from the Inland North dialect region, as reported by Hillenbrand et al. (1995). The ellipse that circumscribes tokens of the back vowel /ʊ/, for instance, overlaps almost entirely with the ellipses that circumscribe neighboring vowel categories. Note that Hillenbrand et al. (1995) did not plot tokens of the neighboring back vowel /o/ or the front vowel /e/, a decision intended to increase readability of the figure by decreasing category overlap. Similar patterns of category overlap are observed in acoustic analyses of speech from individual talkers (Feagin, 1986; Labov, 1994).

[Figure 4.1 image: scatterplot of second formant (Hz) against first formant (Hz) for 10 vowels produced by 46 men, 48 women, and 46 children, with ellipses fit to the data; reproduced from Hillenbrand et al. (1995), Fig. 4.]

Figure 4.1: Vowel plot from a study by Hillenbrand et al. (1995) showing the mean F1 and F2 of American English vowels as produced by 140 talkers. Note that to improve the clarity of the display, Hillenbrand et al. (1995) did not plot the vowels /e/ and /o/, and redundant data points were omitted.

The degree of vowel category overlap in continuous acoustic parametric space is even greater when considering cross-category vowel shifts that occur across dialects and accents of a language. For example, Figure 4.2 shows a schematic representation of two different vowel chain shifts in American English. The Northern Cities Shift (left panel) is a prominent characteristic of the Inland North dialect and comprises a clockwise rotation of the mid and low vowels (Labov et al., 2006). The Northern California Shift (right panel) is a structurally similar system of cross-category variation that affects many of the same vowels but involves a counterclockwise rotation (Hinton et al., 1987; see also Clarke, Elms, & Youssef, 1995). Given these chain shifts, a vowel token that sounds like [æ] could be a realization of the phonological category /æ/ (as in sad) in standard-sounding American English, a fronted realization of /ɑ/ (as in sod) by talkers with the Northern Cities Shift, or a lowered realization of /ɛ/ (as in said) by talkers with the Northern California Shift. Thus, listeners not only face a many-to-one mapping problem in that many acoustically different vowel tokens can map to the same category (i.e., the ‘lack of invariance’ problem: Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967, as suggested by Figure 4.1), they also face a one-to-many mapping problem in that very similar tokens produced by different talkers can map to entirely different vowel categories depending on the talker’s dialect. Given the complexity of this mapping problem, it is not surprising that considerable phonological and lexical confusion occurs in cross-dialect speech perception (Jacewicz & Fox, 2012; Clopper et al., 2010).

[Figure 4.2 image: two schematic vowel-space diagrams, (a) Northern Cities Shift and (b) Northern California Shift.]

Figure 4.2: Schematic representation of the co-dependent vowel shifts that characterize the Northern Cities Shift (left) and the Northern California Shift (right) in American English.
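The one-to-many mapping problem described above can be made concrete with a small lookup sketch. It encodes only the [æ] example given in the text; the table `TOKEN_TO_CATEGORY` and the function `categories_for` are hypothetical names introduced for illustration, and the sketch is not intended as a model of how listeners actually resolve the ambiguity.

```python
# (variety, surface vowel) -> (intended category, example word), from the [æ] example in the text.
TOKEN_TO_CATEGORY = {
    ("standard-sounding American English", "æ"): ("æ", "sad"),
    ("Northern Cities Shift", "æ"): ("ɑ", "sod"),       # fronted realization of /ɑ/
    ("Northern California Shift", "æ"): ("ɛ", "said"),  # lowered realization of /ɛ/
}

def categories_for(surface_vowel):
    """Return every category that a given surface vowel could realize, across varieties."""
    return {
        category
        for (variety, surface), (category, word) in TOKEN_TO_CATEGORY.items()
        if surface == surface_vowel
    }

candidates = categories_for("æ")
print(sorted(candidates))
```

A single surface form thus maps to three categories unless the talker's variety is known, which is the sense in which dialect information is needed to resolve the mapping.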

Listeners adapt to dialect and accent variation in large part via perceptual learning mechanisms that adjust recognition processes to properties of the input (Sumner, 2011; Bradlow & Bent, 2008; Kraljic et al., 2008; Floccia et al., 2006; Clarke & Garrett, 2004; Norris et al., 2003). Several recent perceptual learning studies investigated the nature of the perceptual adjustments that enable listeners to cope with cross-category vowel shifts, and thereby recognize words pronounced with these shifts (Weber et al., 2014; Trude, Tremblay, & Brown-Schmidt, 2013). Results of these studies have been interpreted as evidence that the underlying mechanism involves perceptual shifts that correspond to the direction of the vowel shifts in production (White & Aslin, 2011; Maye et al., 2008). In other words, the argument is that as a result of exposure to a talker with unfamiliar vowel shifts, listeners learn that particular vowels are shifted in a particular direction through acoustic-phonetic space (e.g., learning that [æ] maps to /ɑ/ for talkers with the Northern Cities Shift, given the corresponding shift in production). This direction-specific, vowel-specific learning mechanism predicts that exposure-driven word recognition benefits should occur only for words that contain the trained vowel shift(s). Crucially, under this account, exposure to one pattern of cross-category vowel variation should have no influence on recognition of words pronounced with untrained but structurally similar vowel variants, such as structurally parallel shifts affecting different vowels or shifts affecting the same vowels but in a different direction (e.g., compare the Northern Cities Shift and Northern California Shift).

There is increasing evidence, however, that listeners can generalize learning of specific vowel shifts to facilitate recognition of words pronounced with untrained vowel shifts. Results of Experiment 4 (Chapter 3) showed that listeners who were passively familiarized to a talker with a novel “back vowel lowered” chain shift generalized learning to recognize words pronounced with an untrained “back vowel raised” chain shift (i.e., a system of shifts affecting the same vowels but in the opposite direction) (see also a recent unpublished study by Weatherholtz, 2013). This finding of cross-system generalization indicates that listeners broadened their perceptual categories for back vowels, resulting in greater overlap among vowel categories in this region of the vowel space, as opposed to listeners learning a direction-specific system of vowel shifts. Further evidence of general category broadening comes from a study of adaptation to cross-category vowel variation by toddlers (White & Aslin, 2011). White and Aslin (2011) familiarized toddlers to an accent in which /ɑ/ was shifted to sound more like [æ] by showing objects on a screen (e.g., an image of a dog) and having the names of these objects spoken aloud with that vowel shift (e.g., dog as “d[æ]g”). A subset of test objects were shown during the familiarization phase but were not labeled in order to test for generalization. Toddlers adapted to the vowel shift, as indicated by subsequent looking times and errors in a visual search task (e.g., “Find the d[æ]g! Do you see the d[æ]g?”), relative to performance by toddlers in a control group who were not trained on the vowel shift. The authors interpreted this finding as evidence for specificity of learning: “once a particular shift was learned in exposure (/ɑ/ → [æ]), toddlers expected subsequent pronunciations to match this shift” (White & Aslin, 2011, p. 381). However, results of a follow-up experiment showed that, after controlling for familiarity effects in their test item set, toddlers in the adaptation condition were better at recognizing previously unlabeled words pronounced with an untrained shift of /ɑ/ to [ɛ] (ball as “b[ɛ]ll”). Although limited to a subset of items, this finding of generalization to an untrained vowel variant suggests that adaptation might have involved a general expansion of the vowel category /ɑ/, resulting in both [æ]-like and [ɛ]-like tokens being included in this category.

Results from a study by Maye et al. (2008) provide preliminary evidence that perceptual learning of cross-category vowel shifts in one region of the vowel space generalizes to structurally parallel shifts in a different region of the vowel space. In this study, listeners were familiarized to a synthesized voice with a novel “front vowel lowered” chain shift, which was similar to the Northern California Shift depicted in Figure 4.2b in terms of a counterclockwise rotation affecting English front vowels, though the particular vowel shifts were somewhat different (e.g., there was an additional shift of /i/ to [ɪ], and /ɑ/ was unaffected). Adaptation was judged by comparing listeners’ performance on an auditory lexical decision task before and after a passive accent familiarization phase. In their Experiment 1, familiarization to a talker with the front vowel lowered accent facilitated later recognition of accent-consistent word forms from this talker, which were perceived as nonwords before familiarization (e.g., beetle as “b[ɪ]tle”, due to the lowering of /i/ to [ɪ]; witch as “w[ɛ]tch”, due to the lowering of /ɪ/ to [ɛ]). Unexpectedly, this exposure-driven word recognition benefit extended to words pronounced with an untrained but structurally parallel “back vowel lowered” chain shift (e.g., choose as “ch[ʊ]se”, due to the lowering of /u/ to [ʊ]; look pronounced as “l[o]k”, due to the lowering of /ʊ/ to [o]). Maye et al. discussed the possibility that remapping the vowel space to accommodate the trained front vowel lowered variants resulted in some degree of perceptual generalization across the vowel space, in turn affecting the recognition of words pronounced with untrained lowered vowel variants. Although they ultimately rejected this possibility after failing to replicate the apparent generalization effect in a second experiment, an unpublished study from the same lab by Bardhan et al. (2006) reported further evidence of generalization from front vowel lowering to back vowel lowering.

Taken together, the results of these studies on perceptual learning of cross-category vowel shifts suggest that the mechanism by which listeners adapt involves a general broadening of perceptual vowel categories to include a greater range of variability, with some degree of category broadening occurring across the perceptual vowel space (Experiment 4, Chapter 3; Weatherholtz, 2013; White & Aslin, 2011; Maye et al., 2008). An important caveat for interpreting these findings is that all of the exposure and test materials in each study were produced by a single talker (or synthesized voice). Thus, it is unclear whether listeners developed a perceptual representation of the trained talker’s vowel space in which vowel categories for this talker were broadly defined, or whether listeners adjusted the perceptual representation of talker-independent vowel categories. The current study aimed to distinguish these accounts by testing whether perceptual learning of unfamiliar cross-category vowel variation generalizes to untrained vowel shifts produced by a new talker. Thus, in the current experiments, generalization to untrained vowel variants was contingent on listeners abstracting over idiolectal pronunciation variation from the trained talker to adjust vowel perception talker independently.

During the initial phase of the current experiments, one group of listeners (those in the adaptation condition) heard an excerpt from a popular story, The Adventures of Pinocchio, read aloud by a talker with a novel “back vowel lowered” chain shift, a clockwise rotation among English back vowels (e.g., /ʊ/ pronounced as [o], wooden as “w[o]den”; /o/ pronounced as [ɑ], nose as “n[ɑ]se”). Listeners in the control condition were familiarized to the same talker speaking in a standard-sounding American English accent. A novel chain shift was used to avoid familiarity effects based on participants’ dialect experience, and the decision to use a system of back vowel shifts was motivated by results suggesting that the high confusability among select pairs of neighboring front vowels influences learning of trained vowel shifts (see White & Aslin, 2011, for discussion of both points). Following this initial exposure phase, listeners performed an auditory lexical decision task and a word identification task to assess exposure-driven differences in recognition of words containing trained and untrained cross-category vowel shifts. The target stimuli for these test tasks comprised two sets of items. One set comprised words produced by the trained talker in the back vowel lowered accent. These items were designed so that the vowel shifts resulted in nonword surface forms in standard American English (e.g., “w[o]den” for wooden). Thus, it was predicted that listeners in the adaptation condition would be better able to recognize these word forms, relative to listeners in the control condition, who would perceive these forms as nonwords. The second set of test items comprised words produced by a new talker with an untrained chain shift, which was manipulated across experiments to test for different patterns of generalization to untrained but structurally similar vowel variants.

Experiment 5 aimed to replicate and extend the category broadening effect found in Experiment 4 (Chapter 3) by testing for generalization from the trained back vowel lowered chain shift to an untrained “back vowel raised” chain shift produced by a different talker. Experiment 6 investigated the question of vowel specificity by testing for generalization from the trained back vowel lowered chain shift to a novel untrained “front vowel lowered” chain shift produced by a different talker (i.e., a structurally parallel system of vowel shifts in a different region of the vowel space). Since the untrained vowel variants were produced by an untrained talker, any evidence of generalization cannot be explained by listeners developing a skewed representation of the trained talker’s vowel space, but instead indicates perceptual adjustments to talker-independent vowel categories.

4.2 Experiment 5

Experiment 5 used generalization performance in a lexically-guided perceptual learning paradigm to assess whether the perceptual adjustments involved in coping with unfamiliar cross-category vowel variation are direction-specific. This experiment established a rigorous test of generalization (and hence a rather liberal test of specificity) by investigating whether exposure to a single talker with a novel “back vowel lowered” chain shift influenced later recognition of words produced by a different talker with an untrained “back vowel raised” chain shift, a system of shifts in the opposite direction from the trained chain shift. Given that generalization in this experiment requires transferring learning simultaneously across talkers and vowel shifts, finding such generalization would provide strong evidence against a direction-specific learning mechanism.

4.2.1 Method

Participants

A total of 59 undergraduates at The Ohio State University participated in Experiment 5 in exchange for partial course credit. Of these participants, 26 were excluded because they did not meet the inclusion criteria of being native monolingual English speakers with normal speech and hearing, and one was excluded for exhibiting exceptionally low accuracy on non-target trials during lexical decision (<70% accuracy in rejecting maximal nonwords). After exclusions, there were 32 usable participants evenly split between the adaptation condition and the control condition.

[Figure 4.3: two schematic vowel-space panels: (a) back vowel lowered chain shift; (b) back vowel raised chain shift.]

Figure 4.3: Experiment 5, schematic representation of the trained talker’s back vowel lowered accent (left) and the new talker’s back vowel raised accent (right).

Exposure materials

The exposure materials were two spoken versions of an excerpt from a popular children’s story, The Adventures of Pinocchio. These exposure materials were the same ones used in Experiment 2. The procedure for creating and recording these passages is described in detail in Chapter 2, Section 2.2.1, and the acoustic, phonetic and lexical properties of these materials are detailed in Section 2.3.1. Only the most relevant information is repeated here.

The exposure materials were recorded by a trained phonetician who speaks natively with a standard-sounding Midland American English accent. This talker recorded the excerpt once in her native accent and once in a novel accent characterized by a systematic clockwise rotation of English back vowels. A schematic representation of the novel vowel chain shift is shown in Figure 4.3a: the vowel /u/ was lowered and fronted to sound like [U] (e.g., goose as “g[U]s”); /U/ was lowered and backed to sound like [o] (e.g., wooden as “w[o]den”); /o/ was lowered and fronted to sound like [A] (e.g., nose as “n[A]se”); and the vowels /O/ and /A/, which are merged in the talker’s native variety, were fronted to sound like [æ] (e.g., closet as “cl[æ]set”). For convenience, the novel chain shift is referred to as “back vowel lowering”, even though the vowel shifts involved variation in vowel height, backness, trajectory and duration.

Each version of the passage was about 6 minutes long when spoken aloud and contained 184 stressed tokens of the target back vowels /u, U, o, O, A/. These tokens were distributed across 85 unique lexical contexts (/u/: n = 41, lexical contexts = 23; /U/: n = 28, lexical contexts = 9; /o/: n = 67, lexical contexts = 26; /O/: n = 18, lexical contexts = 11; /A/: n = 30, lexical contexts = 16). Further, there was considerable acoustic variability in F1 × F2 space among the target back vowel tokens in each version of the exposure passage. This acoustic variability was expected, since the target vowel tokens were produced naturally in a range of lexical (and phonotactic) contexts. Despite this token variability, the talker consistently produced the target cross-category vowel shifts for the back vowel lowered version of the exposure passage.
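As a quick consistency check on the per-vowel counts reported above, the following minimal sketch tallies tokens and lexical contexts. The dictionary layout and variable names are illustrative only and are not part of the original materials.

```python
# Per-vowel token and lexical-context counts, copied from the text above.
# Keys use the same ASCII vowel symbols as the running text.
counts = {
    "u": {"tokens": 41, "contexts": 23},
    "U": {"tokens": 28, "contexts": 9},
    "o": {"tokens": 67, "contexts": 26},
    "O": {"tokens": 18, "contexts": 11},
    "A": {"tokens": 30, "contexts": 16},
}

total_tokens = sum(v["tokens"] for v in counts.values())
total_contexts = sum(v["contexts"] for v in counts.values())

print(total_tokens, total_contexts)  # 184 85
```

The totals match the 184 stressed tokens and 85 lexical contexts stated for each version of the passage.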

Test materials

The test materials were a set of 220 monosyllabic and bisyllabic words and nonwords (see Table 4.1 for example items). There were two sets of target items. One set comprised 30 back vowel lowered surface forms, which were real words of English but pronounced in the novel back vowel lowered accent that participants in the adaptation condition experienced during the exposure phase (see Figure 4.3a). All of the back vowel lowered items sounded like nonwords in standard American English (e.g., wooden as “w[o]den”). Of the 30 back vowel lowered items, 10 were lexical items that occurred in the exposure passage (i.e., trained lexical items; average number of tokens during training = 2.1; range = 1-4) and 20 were lexical items that occurred only at test (new lexical items).

The second set of target items comprised 30 English words pronounced with a novel untrained “back vowel raised” chain shift. As shown in Figure 4.3, the back vowel raised chain shift was designed to resemble the trained back vowel lowered chain shift but with the vowels shifted in the opposite direction: the vowels /O/ and /A/ were raised to sound like [o] (e.g., wash as “w[o]sh”); /o/ was raised and fronted to sound like [U] (e.g., broken as “br[U]ken”); and /U/ was raised and backed to sound like [u] (e.g., foot as “f[u]t”). All of the back vowel raised items sounded like nonwords in both standard American English and the back vowel lowered accent. For example, the back vowel raised item “br[U]ken”, which is a nonword surface form in standard American English, maps to the real English word broken if [U] is interpreted as a raised variant of /o/, but maps to the nonword “br[u]ken” if [U] is interpreted as a lowered variant of /u/. Thus, ‘word’ responses to the back vowel raised items during the lexical decision task cannot be explained by listeners misinterpreting these items as words pronounced with lowered back vowels. Of the 30 back vowel raised items, 10 were lexical items that occurred in the exposure passage (trained lexical items; average number of tokens during training = 2.7; range = 1-5), and 20 were new lexical items that occurred only at test. Note that for these trained items, the training was in conflict with the vowel variants experienced at test. That is, listeners heard these lexical items during the exposure phase pronounced with either lowered back vowels or standard sounding back vowels (depending on the exposure condition), and then these lexical items were heard with raised back vowels at test.

Table 4.1: Example test stimuli for Experiment 5.

| Item Type | N | Example Items and Phonetic Forms |
|---|---|---|
| Back vowel lowered (trained talker only): trained lexical items | 10 | wooden “w[o]den”; nose “n[A]se” |
| Back vowel lowered (trained talker only): new lexical items | 20 | bushel “b[o]shel”; expose “exp[A]se” |
| Back vowel raised (new talker only): trained lexical items | 10 | took “t[u]k”; broken “br[U]ken” |
| Back vowel raised (new talker only): new lexical items | 20 | butcher “b[u]tcher”; gross “gr[U]ss” |
| Filler words (unshifted vowels) | 100 | room “r[u]m”; nickel “n[I]ckle” |
| Maximal nonwords | 60 | dorve “d[O]rve”; yolash “y[o]l[æ]sh” |

In addition to the 60 target items, there were 160 filler items: 100 standard sounding English words pronounced with unshifted back and front vowels (e.g., room as “r[u]m”, nickel as “n[I]ckle”), and 60 phonotactically legal surface forms that were nonwords in standard American English, the novel back vowel lowered accent, and the novel back vowel raised accent (e.g., yolash [jolæS]). Each of these latter nonword stimulus items differed from canonical forms in American English by multiple consonant and vowel features, which were altered unsystematically across the item set to prevent listeners (particularly those in the novel accent adaptation condition) from detecting any pattern that would help them perceive these items as real words. For clarity, these latter nonword stimuli are referred to as “maximal nonwords” to distinguish them from the target back vowel lowered and back vowel raised items, which were designed to sound like nonwords only to listeners who lacked knowledge of the cross-category vowel shifts.

The test stimuli were recorded by two trained phoneticians who speak natively with a standard-sounding Midland American English accent: the same female who recorded the exposure materials (i.e., the trained talker) and a new female. Since the primary question of interest was whether perceptual learning of one vowel chain shift generalizes across talkers to an untrained chain shift, the back vowel lowered test items were only recorded by the trained female talker, and the back vowel raised items (i.e., target items produced with the untrained chain shift) were only recorded by the new female talker. All 160 filler items were recorded by both talkers. The recording equipment and digitization procedure were the same as for the exposure materials. After recording, each stimulus item was saved to an individual sound file, downsampled to 22.05 kHz and scaled to an average intensity of 70 dB. The vowel plots in Figure 4.4 show the mean midpoint frequency (Hz) of the first and second formants (F1 and F2) from each talker’s shifted vowel productions in the target test items (denoted by arrows), relative to the mean midpoint F1 and F2 of vowels from the filler words pronounced with unshifted standard-sounding vowels (denoted by phonetic symbols). To obtain these measures, the segment boundaries for each test item were automatically aligned in Praat using the Penn Phonetics Lab Forced Aligner (Yuan & Liberman, 2008); boundaries were hand corrected for accuracy, and then formant frequencies were extracted at 50% of the duration of the vowel. These vowel plots indicate that each talker produced the target cross-category vowel shifts consistently on average.

[Figure 4.4: two vowel plots: trained female (back vowel lowering) and new female (back vowel raising). x-axis: F2 (Hz) at vowel midpoint; y-axis: F1 (Hz) at vowel midpoint.]

Figure 4.4: Production of the novel back vowel lowered chain shift by the trained talker (left), and production of the novel back vowel raised chain shift by the new talker (right). Phonetic symbols indicate the mean midpoint F1 and F2 (Hz) of each talker’s normal vowels, based on stressed vowel tokens from the filler words containing unshifted vowels. Arrows indicate the mean midpoint F1 and F2 of stressed vowels in the target items containing shifted vowels from each talker.

Procedure

A schematic representation of the experiment is shown in Figure 4.5. The experiment occurred in three phases: an initial passive listening task, followed by an auditory lexical decision task and a word identification task, which were designed to investigate exposure-driven differences in word recognition. The entire experiment was conducted on a Windows PC and lasted about 40 minutes. Participants sat at a desk with a button box and a keyboard in front of them, and auditory stimuli were presented binaurally through Sennheiser HMD280-13 headphones, with stimulus presentation controlled using E-Prime (Schneider et al., 2012). The exposure phase was manipulated between subjects. Half of the participants passively listened to the exposure passage read aloud in the novel back vowel lowered accent (adaptation condition). The other half of the participants listened to the same passage read aloud by the same talker but in a standard-sounding Midland accent of American English (control condition). For this phase of the experiment, participants were instructed that they would listen to a 6-minute story and that their only task was to pay attention to the content of the story.

[Figure 4.5 panels. Exposure phase (passive listening task): adaptation condition, back vowel lowering; control condition, Midland AmEng. Test phase: Lexical Decision, Block 1: trained talker, back vowel lowering; Lexical Decision, Block 2: new talker, back vowel raising; Word Identification, single block: trained and new talkers intermixed.]

Figure 4.5: Schematic representation of the exposure-test paradigm for Experiment 5.

After the story ended, participants began the lexical decision task. Participants were instructed that for this task they would hear a series of spoken stimuli presented one at a time and that their job was to indicate whether each stimulus was a real word of English or a nonword by pressing the button labelled “Word” or “Nonword”, respectively, on the button box in front of them. Participants used their right index finger to respond “Word” and their left index finger to respond “Nonword”. Participants were instructed to respond as quickly as possible without sacrificing accuracy. The experiment advanced to the next trial once participants made a valid response, with an inter-trial interval of 1000 ms. On each trial, participants’ response (word vs. nonword) and response time from trial onset were recorded. The lexical decision task was blocked by talker. The trained talker block comprised the 30 target back vowel lowered items (i.e., accented items that were consistent with the novel accent that participants in the adaptation condition experienced), along with half of the items from each of the filler item sets: 50 words pronounced with unshifted vowels and 30 maximal nonwords. The trained talker block always occurred first in order to assess exposure-driven differences in response to the back vowel lowered items immediately following the exposure phase, during which half of the participants were trained on this accent. The new talker block occurred second and comprised the 30 target back vowel raised items (i.e., words pronounced with the untrained novel chain shift) and the other half of the filler items. The target items were unique to each block since each talker only produced items in one of the two novel accents. Two lists were created to counterbalance the filler items across participants, and four fixed pseudorandom orders of each list were created to dissociate item and trial. Target items were always separated by at least one filler trial and were approximately evenly distributed throughout each block. Participants were allowed a self-paced break between the trained talker block (first) and new talker block (second). Before beginning the new talker block, participants were instructed that the stimulus materials would be produced by a different talker than in the previous block.

After the lexical decision task, participants were instructed to move the button box aside and bring the keyboard to a comfortable typing position for the word identification task. Participants were informed that they would again be hearing spoken stimuli presented one at a time, but that instead of judging the lexicality of each stimulus, their new task was to orthographically transcribe each stimulus item that they recognized as a real word of English. Participants were instructed to respond by pressing ‘x’ for each stimulus item that they did not recognize as a real word of English. The experiment advanced to the next trial once participants typed a response and pressed the Enter key. The inter-trial interval was 1500 ms. The typed character sequence for each trial was recorded. For this task, participants heard the 30 back vowel lowered target items from the trained talker, the 30 back vowel raised items from the new talker, and a subset of standard-sounding test words containing unshifted back vowels (20 total: 10 each from the trained and new talker). Items from each talker were intermixed, and presentation order was randomized.

Coding

The dependent measure of interest for the lexical decision task was listeners’ response choice on target trials: ‘word’ (coded as 1) vs. ‘nonword’ (coded as 0). There was technically no correct response for the target back vowel lowered and back vowel raised items, which were designed to be perceived differently depending on pre-test exposure conditions: i.e., as nonwords by participants who listened to a standard-sounding American English talker during the exposure phase and potentially as words by participants who were familiarized to the novel back vowel lowered accent, depending on the specificity of perceptual learning.

Lexical decision response times were not analyzed for the reasons detailed in Chapter 2 (see Section 2.2.1). Briefly, the most serious issue for statistical analysis of response times is data sparsity between exposure conditions: since the target items were designed to be perceived as nonwords in the control condition, it is not possible to reliably estimate the speed with which listeners in this condition were able to perceive the vowel shifted items as real words. Thus there is no baseline against which to assess exposure-driven changes in response latency.

The dependent measure of interest from the word identification task was identification accuracy: correct (coded as 1) vs. incorrect (coded as 0). A response was coded as correct if and only if the response was a correct spelling of the target word (alternate spellings were allowed, such as “humour” for “humor”) or a correct spelling of a homophone of the target word (e.g., “doe” for “dough”). This coding criterion provides a conservative measure of identification accuracy by avoiding inherent complications in distinguishing typographic errors (i.e., correct perceptual identification, but erroneous response) from misidentifications (i.e., hearing one word as a different but similar-sounding word), as in the case of the typed sequence ‘oat’ in response to the stimulus “boat”. To estimate the number of responses that were scored as incorrect due to spelling, the generalized Levenshtein distance (i.e., edit distance) between each response and the corresponding target word was calculated, that is, the number of insertions, deletions, and substitutions needed to transform one string into another. Only 3.6% of responses had an edit distance of 1, indicating a limited number of potential minor spelling-related issues.
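To make the edit-distance measure concrete, here is a minimal sketch of the standard Levenshtein distance with unit costs for insertions, deletions and substitutions. It illustrates the general technique only; the function name is hypothetical, and the implementation actually used for the dissertation's scoring is not specified in the text.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit costs (dynamic programming, row by row)."""
    # previous[j] = distance between the first i-1 chars of source
    # and the first j chars of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # transforming i chars of source into an empty string
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# The 'oat' response to the stimulus "boat" differs by one insertion.
print(edit_distance("oat", "boat"))  # 1
```

A response at distance 1 from its target, such as ‘oat’ for “boat”, is the kind of case the 3.6% figure refers to: it could be either a typing slip or a genuine misidentification.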

Analysis

Generalized linear mixed-effects models (see Jaeger, 2008, for an introduction) were used to analyze the binary lexical decision data (‘word’ vs. ‘nonword’ responses) and the word identification data (correct vs. incorrect responses) using the lme4 package (version 1.1-7; Bates, Maechler, et al., 2014) in R (R Core Team, 2014). Across analyses, categorical variables were coded as sum contrasts. All mixed effects models were defined to include the design driven maximal random effects structure: random intercepts for subjects and items, by-subject random slopes for all design variables manipulated within subjects, and by-item random slopes for all design variables manipulated within items (for discussion of this approach, see Barr et al., 2013). Mixed-effects models were fit with the bobyqa optimizer, and the minimum number of iterations for each analysis was determined by squaring the total number of model parameters (i.e., all fixed effects terms, random effects terms, and correlations among random effects) and multiplying this product by 10 (Bates, Mullen, et al., 2014). If the design driven maximal model failed to converge in the allotted number of iterations, the random effects structure was simplified in a step-wise manner to find a model that converged on reliable parameter estimates: random intercepts were decorrelated from the corresponding by-unit random slopes, and the random effects term with the least variance was dropped. Log-likelihood model comparisons were used to determine the contribution of fixed effect terms. For this approach, each fixed effect term was removed one at a time from the full model that converged (along with all higher-order interactions containing this fixed effect term), and the fit of the subset model, measured as deviance (-2 × the log-likelihood ratio), was compared to the fit of the full model using a χ2-test on the difference in model deviance, with the degrees of freedom equal to the difference in the number of parameters for the two models. When log-likelihood comparisons revealed significant higher-order interactions among fixed effects, the significance of the lower-order terms (i.e., the constituent main effects and lower-order interaction terms) was assessed using Wald’s z-score.

Table 4.2: Experiment 5, mean proportion of ‘word’ responses during the lexical decision task by item type, exposure condition (BVL = back vowel lowered) and talker. Standard errors are in parentheses. Shaded cells highlight the difference on target trials by exposure condition.

| Item Type | Trained talker: BVL chain shift exposure | Trained talker: Midland accent exposure | New talker: BVL chain shift exposure | New talker: Midland accent exposure |
|---|---|---|---|---|
| Back vowel lowered items | 0.59 (0.078) | 0.16 (0.041) | – | – |
| Back vowel raised items | – | – | 0.50 (0.075) | 0.10 (0.046) |
| Filler words (unshifted vowels) | 0.99 (0.003) | 0.97 (0.006) | 0.98 (0.008) | 0.97 (0.007) |
| Maximal nonwords | 0.07 (0.014) | 0.05 (0.010) | 0.07 (0.013) | 0.08 (0.015) |
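The log-likelihood comparison described under Analysis can be sketched in a few lines. The sketch below is not the R/lme4 code used for the dissertation: the function names are hypothetical, the log-likelihoods and parameter counts in the usage example are invented for illustration, and the chi-square tail probability is computed from its closed form for integer degrees of freedom rather than by a statistics library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_k (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def likelihood_ratio_test(loglik_full, loglik_subset, n_params_full, n_params_subset):
    """Chi-square test on the deviance difference between nested models."""
    chi2 = -2.0 * (loglik_subset - loglik_full)  # difference in deviance
    df = n_params_full - n_params_subset         # difference in parameter count
    return chi2, df, chi2_sf(chi2, df)


# Hypothetical log-likelihoods and parameter counts, chosen only for illustration.
chi2, df, p = likelihood_ratio_test(-500.0, -508.9, 12, 8)
print(round(chi2, 2), df, p < 0.01)
```

With a deviance difference of about 17.8 on 4 degrees of freedom, the tail probability falls below .01, which is the same order of result as the exposure accent comparison reported in Table 4.3.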

4.2.2 Results

Endorsement rates

Table 4.2 shows the mean endorsement rate by exposure condition for each of the test item types: the target back vowel lowered items produced by the trained female talker, the target back vowel raised items produced by the new female talker, the filler words pronounced with unshifted (standard sounding) vowels, and the filler nonwords. Listeners in the adaptation condition endorsed numerically more of the target vowel-shifted items as words, compared to listeners in the control condition, whereas the endorsement rates for each filler item type were nearly identical across exposure conditions and talkers.

Table 4.3: Experiment 5, summary of the full mixed logit model of lexical decisions to the target vowel-shifted items: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level pz for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

| Predictors (fixed effects) | Coef β | SE(β) | z | pz | χ²∆ | df | pχ²∆ |
|---|---|---|---|---|---|---|---|
| (Intercept) | −1.29 | 0.37 | −3.5 | <.001 | | | |
| exposure accent (= back vowel lowering) | 3.12 | 0.68 | 4.6 | <.001 | 17.80 | 4 | <.01 |
| test accent (= back vowel raising by new talker) | −0.74 | 0.46 | −1.6 | 0.11 | 5.03 | 4 | 0.28 |
| item status (= new items) | −0.23 | 0.39 | −0.6 | 0.54 | 1.89 | 4 | 0.77 |
| exposure accent : test accent | −0.51 | 0.65 | −0.8 | 0.43 | 1.65 | 2 | 0.44 |
| exposure accent : item status | −0.02 | 0.43 | 0.0 | 0.96 | 1.47 | 2 | 0.48 |
| test accent : item status | −0.30 | 0.79 | −0.4 | 0.70 | 1.58 | 2 | 0.45 |
| exposure accent : test accent : item status | 1.14 | 0.92 | 1.2 | 0.21 | 1.47 | 1 | 0.23 |

Note. Parameter estimates (Coef β, SE(β)) and Wald’s test (z, pz) are reported for the full model; χ²∆, df and pχ²∆ report the LL comparisons. All factors were coded as sum contrasts. For the log-likelihood comparisons, χ²∆ = χ²-test over the difference in model deviance (−2 × ∆log-likelihood); degrees of freedom (df) = difference in number of model parameters (i.e., the number of levels of the omitted factor plus all higher-order interactions containing that factor); pχ²∆ = significance level for difference in model deviance given the degrees of freedom.
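The Wald statistics in Table 4.3 follow directly from the definition z = β/SE(β). The minimal sketch below recomputes z for a few rows and derives a two-sided p-value from the standard normal distribution. The function name and row selection are illustrative; because the table reports rounded β and SE values, recomputed p-values can differ from the reported ones in the last digit.

```python
import math


def wald_z(coef: float, se: float):
    """Return the Wald z-score and its two-sided standard-normal p-value."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# (label, coefficient, standard error, z as reported in Table 4.3)
rows = [
    ("(Intercept)", -1.29, 0.37, -3.5),
    ("exposure accent", 3.12, 0.68, 4.6),
    ("test accent", -0.74, 0.46, -1.6),
    ("exposure : test : item status", 1.14, 0.92, 1.2),
]

for label, coef, se, reported_z in rows:
    z, p = wald_z(coef, se)
    print(f"{label}: z = {z:.1f} (reported {reported_z}), p = {p:.3f}")
```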

To assess the influence of accent exposure on perception of the words pronounced with the shifted vowel variants, a mixed logit regression model was fitted to lexical decisions on target trials with fixed effects for three factors and all interactions: exposure accent (the novel back vowel lowered accent vs. the standard sounding Midland accent, manipulated between subjects), test accent (words produced in the back vowel lowered accent by the trained talker vs. words produced in the back vowel raised accent by the new talker), and item status (trained vs. new lexical items). The random effects structure comprised random intercepts for subjects and items, by-subject random slopes for test accent, item status and their interaction, and by-item random slopes for exposure accent. Parameter estimates for the fixed effects in the full model are shown in Table 4.3, along with the results of log-likelihood comparisons testing whether each fixed effect term contributed significantly to model fit. Positive parameter estimates indicate increased log-odds (and hence increased probability) of making a ‘word’ response. This model revealed a single main effect of exposure accent. Neither test accent, item status, nor any interaction terms contributed significantly to model fit. As shown in Figure 4.6, listeners who were initially familiarized to the novel back vowel lowered accent subsequently endorsed more of the vowel-shifted items than participants in the control condition, regardless of whether these items were produced in the trained back vowel lowered accent (by the trained talker) or in the untrained back vowel raised accent (by the new talker), and regardless of whether the target lexical items occurred during the exposure passage or only at test.

[Figure 4.6: two panels: trained talker, back vowel lowered accent; new talker, back vowel raised accent. x-axis: Item Status (trained, new); y-axis: Proportion response 'word'; legend: Exposure (BVL chain shift, Midland accent).]

Figure 4.6: Experiment 5. Mean proportion of ‘word’ responses during lexical decision for the target vowel-shifted items: the back vowel lowered items from the trained talker (left) and the back vowel raised items from the new talker (right). Results are plotted by item status and exposure condition (BVL = back vowel lowered). Large points indicate condition grand means. Error bars indicate bootstrapped 95% confidence intervals. Small points indicate subject-wise condition means (jittered and transparent to show overlap).
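Because the model coefficients are expressed in log-odds, it can help to see how proportions and log-odds relate. The sketch below converts the observed endorsement rates for the back vowel lowered items in Table 4.2 to the log-odds scale and back. It is a reading aid only: log-odds of raw proportions are not the mixed-model coefficients in Table 4.3, which are conditional on the random effects and depend on the contrast coding.

```python
import math


def logit(p: float) -> float:
    """Map a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Observed 'word' response rates for back vowel lowered items (Table 4.2).
adaptation, control = 0.59, 0.16

print(f"adaptation: {logit(adaptation):.2f} log-odds")
print(f"control:    {logit(control):.2f} log-odds")
print(f"difference: {logit(adaptation) - logit(control):.2f} log-odds")
```

A positive log-odds value corresponds to a probability above 0.5 and a negative value to a probability below 0.5, so `inv_logit(0.0)` returns 0.5.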

One explanation for these results is that exposure to the novel back vowel lowered accent caused a direction-independent broadening of vowel categories, which in turn influenced later perception of words pronounced with either the lowered or raised back vowel variants. However, it is possible that the aggregate lexical decision results were due to two different patterns of direction-specific learning: listeners in the adaptation condition could have learned the direction-specific pattern of back vowel lowering during the exposure phase, enabling them to cope with the trained talker’s accent, and then later when they encountered the new talker, they accommodated this accent by rapidly learning the direction-specific pattern of back vowel raising. In the latter case, there should be no effect of exposure accent on lexical decisions to the back vowel raised items at the beginning of the new talker test block, followed by a sharp endorsement increase for these items among listeners in the adaptation condition, indicating rapid adaptation at test.

To test whether the finding of cross-accent generalization was due to rapid adaptation to the new talker at test, lexical decisions to the target vowel-shifted items were fitted with a mixed logit model containing fixed effects of exposure accent, test accent, target trial number (1-30 for each test accent; centered to reduce collinearity) and all interactions. The random effects structure comprised random intercepts for subjects and items, a by-subject random slope for test accent, and a by-item random slope for exposure condition. The fixed effect parameter estimates for this model are summarized in Table 4.4. As in the aggregate analysis reported in Table 4.3, there was a main effect of exposure accent and no significant effect of test accent. There was also a significant main effect of trial, indicating that listeners tended to endorse more of the target vowel shifted items as words as the task progressed. The main effect of trial suggests some degree of adaptation at test, independent of the exposure condition and test accent. Crucially, however, there were no significant interactions involving trial and either exposure accent or test accent. As shown in Figure 4.7, the effect of exposure was present from the outset of the test phase for both the back vowel lowered items from the trained talker and the back vowel raised items from the new talker, and this effect persisted across the task. Thus, these data provide no evidence for rapid adaptation at test to the new talker with the untrained accent.
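The remark that trial number was centered to reduce collinearity can be illustrated with a small sketch. It assumes, purely for illustration, a two-level factor coded as +1/−1 that is fully crossed with trials 1-30, and compares the correlation between the factor and its product with trial before and after centering. The coding values and helper name are assumptions, not details taken from the dissertation's analysis scripts.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


trials = list(range(1, 31))
mean_trial = sum(trials) / len(trials)          # 15.5
centered = [t - mean_trial for t in trials]     # -14.5 ... 14.5

# Assumed +1/-1 coding for a two-level factor, crossed with every trial.
factor = [f for f in (1, -1) for _ in trials]
raw_interaction = [f * t for f in (1, -1) for t in trials]
centered_interaction = [f * c for f in (1, -1) for c in centered]

r_raw = pearson(factor, raw_interaction)
r_centered = pearson(factor, centered_interaction)
print(f"raw trial: r = {r_raw:.3f}; centered trial: r = {r_centered:.3f}")
```

Under these assumptions the interaction column is strongly correlated with the factor when trial is left on its raw 1-30 scale, and uncorrelated with it once trial is centered.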

Table 4.4: Experiment 5, summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level pz for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

| Predictors (fixed effects) | Coef β | SE(β) | z | pz | χ²∆ | df | pχ²∆ |
|---|---|---|---|---|---|---|---|
| (Intercept) | −1.33 | 0.37 | −3.7 | <.001 | | | |
| exposure accent (= BVL chain shift) | 3.10 | 0.67 | 4.6 | <.0001 | 17.91 | 4 | <.01 |
| test accent (= back vowel raising by new talker) | −0.78 | 0.43 | −1.8 | 0.07 | 4.73 | 4 | 0.32 |
| trial | 0.04 | 0.01 | 4.1 | <.0001 | 18.61 | 4 | <.001 |
| BVL chain shift : test accent | −0.27 | 0.63 | −0.4 | 0.67 | 0.29 | 2 | 0.87 |
| BVL chain shift : trial | 0.00 | 0.02 | 0.3 | 0.79 | 0.19 | 2 | 0.91 |
| test accent : trial | 0.02 | 0.02 | 1.0 | 0.33 | 1.28 | 2 | 0.53 |
| BVL chain shift : test accent : trial | 0.01 | 0.03 | 0.4 | 0.69 | 0.14 | 1 | 0.70 |

[Figure 4.7: two panels: trained talker, back vowel lowered accent; new talker, back vowel raised accent. x-axis: Target trial during lexical decision (1-30); y-axis: Proportion response 'word'; legend: Exposure (BVL chain shift, Midland accent).]

Figure 4.7: Experiment 5. Mean trial-wise proportion of ‘word’ responses during lexical decision for the target vowel-shifted items: the back vowel lowered items from the trained talker (left) and the back vowel raised items from the new talker (right). Results are plotted by exposure condition (BVL = back vowel lowered). Regression lines indicate binomial best fit curves (curvature is not observable because the proportion range for each effect is small). Error ribbons indicate bootstrapped trial-wise 95% confidence intervals.

[Figure 4.8: two panels: trained talker, back vowel lowered accent; new talker, back vowel raised accent. x-axis: item type (unshifted back vowel items; trained lexical items pronounced w/ shifted vowels; new lexical items pronounced w/ shifted vowels); y-axis: Proportion correct identification; legend: Exposure (BVL chain shift, Midland accent).]

Figure 4.8: Experiment 5. Mean word identification accuracy as a function of exposure condition, item type and test talker. Error bars indicate bootstrapped 95% confidence intervals on condition means.
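The bootstrapped 95% confidence intervals mentioned in the figure captions can be illustrated with a minimal sketch. The source does not state which bootstrap variant was used, so a percentile bootstrap over by-subject means is assumed here. The function name `bootstrap_ci`, the number of resamples, the seed, and the accuracy values are all hypothetical and are not data from the experiments.

```python
import random

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean (assumed variant, see lead-in)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample subjects with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical by-subject accuracy proportions for one condition.
subject_means = [0.55, 0.70, 0.40, 0.65, 0.80, 0.35, 0.60, 0.75,
                 0.50, 0.45, 0.90, 0.30, 0.65, 0.55, 0.70, 0.60]
condition_mean = sum(subject_means) / len(subject_means)
ci_low, ci_high = bootstrap_ci(subject_means)
print(f"mean = {condition_mean:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Resampling subjects rather than individual trials keeps each listener's responses together, which is one common choice when the error bars are meant to reflect between-subject variability.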

Word identification accuracy

Mean word identification accuracy is shown in Figure 4.8. A mixed logit regression analysis was conducted to test whether accent exposure influenced identification accuracy for the words produced with shifted and unshifted vowels from each test talker. This model contained fixed effects for three factors and all interactions: exposure accent (the novel back vowel lowered accent vs. the standard-sounding Midland accent, manipulated between subjects), test accent (words produced in the back vowel lowered accent by the trained talker vs. words produced in the back vowel raised accent by the new talker), and item type. Item type was a three-level variable distinguishing the following test items: (i) lexical items that occurred during the exposure passage and that were produced at test with shifted vowels (with lowered back vowels by the trained talker, or with raised back vowels by the new talker); (ii) new lexical items that occurred only at test and that were produced with shifted vowels (either lowered or raised back vowels, depending on the test talker); or (iii) standard-sounding words produced by each talker. The model contained by-subject and by-item random intercepts and by-subject random slopes for test accent and item type.

Table 4.5: Experiment 5. Summary of the full mixed logit model of word identification accuracy: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model. For terms involving the two item type contrasts, the LL comparison is listed on the first of the two rows and applies to both rows jointly.

                                                        Parameter est.  Wald's test    LL comparisons
Predictors (fixed effects)                              Coef β  SE(β)      z     p_z     χ²∆  df∆  p(χ²∆)
(Intercept)                                               0.85   0.34    2.5    <.05
exposure accent (= back vowel lowering)                   1.59   0.60    2.7    <.01
test accent (= back vowel raising by new talker)         −0.25   0.38   −0.6    0.52    3.26    6    0.77
item type (= trained lexical items w/ shifted vowels)    −2.62   0.57   −4.6   <.001
item type (= new lexical items w/ shifted vowels)        −2.30   0.56   −4.1   <.001
exposure accent : test accent                            −0.27   0.38   −0.7    0.48    1.76    3    0.62
exposure accent : item type (= trained)                   1.60   0.60    2.7    <.01    8.56    4    =.07
exposure accent : item type (= new)                       1.07   0.74    1.4    0.15
test accent : item type (= trained)                      −0.92   1.05   −0.9    0.38    2.77    4    0.59
test accent : item type (= new)                           0.91   0.91    1.0    0.32
exposure accent : test accent : item type (= trained)    −0.87   0.84   −1.0    0.30    1.42    2    0.49
exposure accent : test accent : item type (= new)         0.56   0.74    0.8    0.44

The fixed effect parameter estimates for this model are summarized in Table 4.5, along with the results of log-likelihood comparisons testing whether each fixed effect contributed significantly to model fit. Fixed effects parameter estimates are reported in log-odds, and positive coefficients indicate increased log-odds of a correct response. There was a main effect of exposure accent, indicating that identification accuracy was higher overall following exposure to the back vowel lowered accent than following exposure to the standard-sounding American English accent. There was also a main effect of item type, indicating that identification accuracy was lower overall for test items produced with the cross-category vowel shifts (the back vowel lowered and back vowel raised items) than for the words pronounced with unshifted vowels. Further, there was a marginal two-way interaction between exposure accent and item type, indicating that accent exposure tended to have a larger influence on identification accuracy for the words pronounced with shifted than with unshifted vowels.

Thus, as shown in Figure 4.8, exposure to the back vowel lowered accent influenced later recognition of words pronounced with the trained back vowel lowered chain shift and with the untrained back vowel raised chain shift, consistent with the pattern of endorsement rates from the lexical decision task.

4.2.3 Discussion

Experiment 5 tested whether adaptation to cross-category vowel variation involves learning a direction-specific pattern of variation. Two overall patterns of results indicated that listeners remapped their perceptual vowel space to accommodate the unfamiliar back vowel lowered variants and that doing so involved a general broadening of perceptual vowel categories, as opposed to targeted direction-specific perceptual adjustments (cf. Maye et al., 2008). The first pattern of results involves responses to the trained talker. Relative to listeners in the control condition, listeners who were familiarized to the talker with the novel back vowel lowered chain shift were subsequently better able to recognize accent-consistent word forms produced by this talker (e.g., “w[o]den” for wooden, given the lowering of the back vowel /U/ to [o]; “n[A]se” for nose, given the lowering of /o/ to [A]), as indicated by converging evidence from the auditory lexical decision task and the word identification task (see also Chapter 2). By contrast, listeners in the control condition tended to perceive the back vowel lowered items as nonwords (e.g., mean endorsement rate = 16%), indicating that for listeners who lacked knowledge of the cross-category vowel shifts, these variants were highly detrimental to word recognition. The exposure-driven word recognition benefit generalized to new back vowel lowered items that were presented only at test, which indicates that listeners in the adaptation condition adjusted their perceptual vowel space to cope with this talker’s accent, as opposed to learning lexically-specific pronunciations (see Sjerps & McQueen, 2010; McQueen et al., 2006). These results are consistent with previous findings on adaptation to cross-category vowel variation (Maye et al., 2008; White & Aslin, 2011; see also Experiments 1-3 above).

The second major finding of this experiment was that the word recognition benefit resulting from exposure to the back vowel lowered accent generalized to words produced by a new talker with an untrained back vowel raised accent (e.g., “b[u]tcher” for butcher, given the raising of /U/ to [u]). This cross-accent generalization effect was present from the first back vowel raised item experienced at test, and thus cannot be attributed to listeners in the adaptation condition rapidly adapting to the untrained chain shift during the test phase. Crucially, responses from the lexical decision and word identification tasks provided converging evidence for accent adaptation and cross-accent generalization. That is, the exposure-driven difference in endorsement rates (i.e., “word” responses during lexical decision) for the vowel-shifted items (i.e., both the back vowel lowered and back vowel raised items) was coupled with a corresponding exposure-driven difference in word identification accuracy for these items. The evidence from the word identification task is critical because it indicates that listeners in the adaptation condition were better able to map the vowel-shifted items to the correct lexical representations than listeners in the control condition, as opposed to listeners in the adaptation condition simply developing a response bias (e.g., listeners judging the vowel-shifted items as words during lexical decision, without recognizing the target lexical items, because the trained talker had “funny” vowels). Taken together, the current findings replicate the cross-system generalization effect observed in Experiment 4 (Chapter 3) and extend this finding to show that the mechanism driving learning and generalization involved broadening talker-independent perceptual vowel categories.

Two aspects of these data deserve comment. First, generalization to the untrained back vowel raised items occurred for both trained lexical items (i.e., words that occurred during the exposure passage) and new lexical items that occurred only at test. Note that listeners in the adaptation condition heard the trained lexical items produced with lowered back vowels during the exposure phase. Thus, direction-specific training on particular lexical items did not hinder listeners’ ability to recognize these same words when later pronounced with different vowel shifts, which strengthens the evidence against direction-specific learning. The second noteworthy pattern is that the magnitude of the exposure-driven word recognition benefit for the back vowel raised items was comparable to the corresponding benefit for the back vowel lowered items. During the lexical decision task, listeners in the adaptation condition showed an endorsement increase of about 40% for both of these item types, relative to performance by listeners in the control condition (who only recognized about 10-20% of these items), and statistical analysis of the lexical decision data showed no interaction between exposure condition and item type. Thus, exposure to the back vowel lowered accent had a profound influence on listeners’ ability to recognize word forms that were otherwise very difficult to recognize, regardless of whether these words were pronounced with trained or untrained back vowel variants.

The findings from Experiment 5 provide evidence of considerable cognitive flexibility in vowel perception and word recognition. Experiment 6 investigated whether this exposure-driven flexibility is specific to the trained vowel categories, as argued by Maye et al. (2008), or whether listeners can generalize learning about cross-category vowel shifts in one region of the vowel space to facilitate recognition of words pronounced with untrained but parallel vowel shifts in a different region of the vowel space (cf. currently unpublished results by Weatherholtz, 2013; Bardhan et al., 2006).

4.3 Experiment 6

The goal of Experiment 6 was to determine whether perceptual learning of systemic cross-category vowel variation involves vowel-specific perceptual adjustments, or whether learning generalizes across the perceptual vowel space. Experiment 6 was similar to Experiment 5 in terms of investigating perceptual learning of a novel vowel chain shift and potential generalization to an untrained system of vowel shifts. The only difference concerned the nature of the untrained chain shift. Listeners were trained on a novel system of back vowel lowering and then tested on their ability to recognize words produced by a new talker with a novel “front vowel lowered” chain shift, a structurally parallel chain shift in a different region of the vowel space.

4.3.1 Method

Participants

A total of 45 undergraduates at The Ohio State University participated in Experiment 6 in exchange for partial course credit. Of these participants, 10 were excluded because they did not meet the inclusion criteria of being native monolingual English speakers with normal speech and hearing, and three were excluded for exhibiting exceptionally low accuracy on non-target trials during lexical decision (<70% accuracy in rejecting maximal nonwords). After exclusions, there were 32 usable participants evenly split between the adaptation condition and the control condition.

Exposure materials

The exposure materials were identical to those used in Experiment 5.

Test materials

The test materials were identical to those used in Experiment 5, with one exception: the 30 target words produced with the untrained back vowel raised chain shift by the new talker were replaced by a set of 30 words produced with a novel untrained “front vowel lowered” chain shift by the same talker (see Table 4.6 for example items). As shown in Figure 4.9b, this front vowel chain shift was designed to mirror as closely as possible in the front vowel space the novel back vowel lowered chain shift that listeners were familiarized to in the adaptation condition: the vowel /i/ was lowered and backed to sound like [I] (e.g., stream as “str[I]m”); /I/ was lowered, fronted and diphthongized to sound like [e] (e.g., gift as “g[e]ft”); /e/ was lowered to sound like [E] (e.g., table as “t[E]ble”); and /E/ was lowered and backed to sound like [æ] (e.g., pebble as “p[æ]bble”). Of the 30 front vowel lowered items, 10 were lexical items that occurred during the exposure passage (average number of tokens during training = 1.6; range = 1-4); note that listeners in both exposure conditions heard these words pronounced with standard-sounding vowels prior to test. The remaining 20 front vowel lowered items were lexical items that occurred only at test. The vowel plots in Figure 4.10 show the mean midpoint F1 and F2 (Hz) from each talker’s shifted vowel productions in the target test items (denoted by arrows) relative to the mean midpoint F1 and F2 of vowels in the filler words pronounced with unshifted standard-sounding vowels (denoted by phonetic symbols). These vowel plots indicate that each talker produced the target cross-category vowel shifts consistently on average.

[Figure 4.9 diagrams. Two schematic vowel-space charts, each showing the vowels i, I, e, E, æ, u, U, o, @, 2, A, O: (a) back vowel lowered chain shift; (b) front vowel lowered chain shift.]

Figure 4.9: Experiment 6. Schematic representation of the trained talker’s back vowel lowered accent (left) and the new talker’s front vowel lowered accent (right).

Procedure

The procedures for testing, data coding, and analysis were the same as in Experiment 5, with one exception: the target back vowel raised items produced by the new talker during the lexical decision and word identification tasks were replaced by the set of target front vowel lowered items produced by the same new talker.

For the word identification data, the edit distance between each typed response and the corresponding target word was again used to estimate the number of responses that were scored as incorrect due to minor spelling-related issues. Only 4% of responses had an edit distance of 1, indicating a limited number of potential minor spelling-related issues.

Table 4.6: Example test stimuli for Experiment 6.

Item Type                                    N    Example Items and Phonetic Forms
Back vowel lowered (trained talker only)
  trained lexical items                      10   wooden “w[o]den”     nose “n[A]se”
  new lexical items                          20   bushel “b[o]shel”    expose “exp[A]se”
Front vowel lowered (new talker only)
  trained lexical items                      10   gift “g[e]ft”        favor “f[E]vor”
  new lexical items                          20   twist “tw[e]st”      table “t[E]ble”
Filler words (unshifted vowels)              100  room “r[u]m”         nickel “n[I]ckel”
Maximal nonwords                             60   dorve “d[O]rve”      yolash “y[o]l[æ]sh”
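The edit-distance screening described in the Procedure can be illustrated with a short sketch. It computes the Levenshtein distance (minimum number of single-character insertions, deletions, and substitutions) between a typed response and its target, then flags responses at distance 1 as possible spelling slips. The source does not say which edit-distance variant or software was used, so the Levenshtein definition, the function name `edit_distance`, and the example responses are assumptions for illustration only.

```python
def edit_distance(response: str, target: str) -> int:
    """Levenshtein distance between a typed response and the target word."""
    previous = list(range(len(target) + 1))
    for i, r_char in enumerate(response, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if r_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete r_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

# Hypothetical (response, target) pairs; not data from the experiments.
pairs = [("gift", "gift"), ("woden", "wooden"), ("nickle", "nickel")]
flagged = [(r, t) for r, t in pairs if edit_distance(r.lower(), t.lower()) == 1]
print(flagged)
```

Under this definition a swapped letter pair such as "nickle" for "nickel" counts as two substitutions, so a distance-1 criterion would not flag it; a variant that treats transpositions as a single edit would behave differently.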

4.3.2 Results

Two dependent measures were analyzed: endorsement rates during lexical decision (i.e., proportion of ‘word’ responses) and identification accuracy during the word identification task.

Endorsement rates

Table 4.7 shows the mean endorsement rate by exposure condition for each of the test item types: the target back vowel lowered items produced by the trained female talker, the target front vowel lowered items produced by the new female talker, the filler words pronounced with unshifted (standard sounding) vowels, and the filler nonwords. As in Experiment 5, listeners in the adaptation condition endorsed numerically more of the target vowel-shifted items as words, compared to listeners in the control condition, whereas the endorsement rates for each filler item type were nearly identical across exposure conditions and talkers.

[Figure 4.10 plot. Panels: trained female (back vowel lowering), left; new female (front vowel lowering), right. x-axis: F2 (Hz) at vowel midpoint; y-axis: F1 (Hz) at vowel midpoint.]

Figure 4.10: Production of the novel back vowel lowered chain shift by the trained talker (left; same as Experiment 5), and production of the novel front vowel lowered chain shift by the new talker (right). Phonetic symbols indicate the mean midpoint F1 and F2 (Hz) of each talker’s normal vowels, based on stressed vowel tokens from the test items containing unshifted vowels. Arrows indicate the mean midpoint F1 and F2 of stressed vowels in the target items containing shifted vowels from each talker.

Table 4.7: Experiment 6. Mean proportion of ‘word’ responses during the lexical decision task by item type, exposure condition (BVL = back vowel lowered) and talker. Standard errors are in parentheses. Shaded cells highlight the difference on target trials by exposure condition.

                                   Trained talker                      New talker
Item Type                          BVL chain shift   Midland accent    BVL chain shift   Midland accent
                                   exposure          exposure          exposure          exposure
Back vowel lowered items           0.62 (0.071)      0.17 (0.036)      –                 –
Front vowel lowered items          –                 –                 0.31 (0.056)      0.13 (0.034)
Filler words (unshifted vowels)    0.98 (0.005)      0.97 (0.007)      0.98 (0.006)      0.98 (0.005)
Maximal nonwords                   0.09 (0.018)      0.06 (0.012)      0.11 (0.019)      0.09 (0.019)

To assess the influence of accent exposure on perception of the vowel shifted items, a mixed logit regression model was fitted to lexical decisions on target trials. This model contained fixed effects for three factors and all interactions: exposure accent (the novel back vowel lowered accent vs. the standard-sounding Midland accent, manipulated between subjects), test accent (words produced in the back vowel lowered accent by the trained talker vs. words produced in the front vowel lowered accent by the new talker), and item status (trained vs. new lexical items). Random intercepts were specified for subjects and items, along with by-subject random slopes for test accent and item status, and by-item random slopes for exposure accent. Table 4.8 summarizes the parameter estimates for the fixed effects in the full model and shows the results of log-likelihood comparisons testing whether each fixed effect term contributed significantly to model fit. Coefficient estimates are reported in log-odds, the space in which logit models are defined. Positive coefficients indicate increased log-odds, and hence increased probabilities, of making a ‘word’ response to the vowel shifted items.
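To make the log-odds scale concrete, the following sketch converts coefficients to probabilities with the inverse logit function. It uses the intercept and exposure accent estimates reported in Table 4.8. The reading of the intercept as the control-condition prediction for the reference cell assumes treatment (0/1) coding of the predictors, which is not stated in this section; under a different coding scheme the intercept would not correspond to a single design cell. Random effects are ignored, so the values are rough illustrations rather than model predictions.

```python
import math

def inv_logit(log_odds: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Fixed-effect estimates from Table 4.8 (log-odds).
intercept = -1.44        # assumed reference cell: control exposure
exposure_accent = 2.29   # shift in log-odds after BVL chain shift exposure

p_control = inv_logit(intercept)
p_adaptation = inv_logit(intercept + exposure_accent)
print(f"control: {p_control:.2f}, adaptation: {p_adaptation:.2f}")
```

Because the mapping is nonlinear, the same log-odds increase produces a different change in probability depending on the starting point, which is why coefficients are compared on the log-odds scale.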

There was a main effect of exposure accent: as shown in Figure 4.11, listeners who were familiarized to the trained talker with the novel back vowel lowered accent prior to the lexical decision task endorsed more of the vowel shifted items overall than participants in the control condition. There was also a main effect of test accent, indicating that endorsement rates were lower overall for the front vowel lowered items than the back vowel lowered items. Further, there was a two-way interaction between exposure accent and test accent, indicating that exposure to the back vowel lowered accent had a significantly smaller effect on endorsement rates for the front vowel lowered items produced by the new talker than on endorsement of the back vowel lowered items by the trained talker. A corresponding simple effects analysis confirmed that the exposure-driven endorsement increase was significant for both the back vowel lowered items (β = 2.98, z = 5.3, p < 0.001) and the front vowel lowered items (β = 1.6, z = 2.8, p < .01), despite the smaller effect in the latter case. Thus, exposure to the talker with the back vowel lowered chain shift influenced later perception of words produced by a different talker with an untrained vowel chain shift.

Table 4.8: Experiment 6. Summary of the full mixed logit model of lexical decisions to the target vowel-shifted items: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

                                                        Parameter est.  Wald's test    LL comparisons
Predictors (fixed effects)                              Coef β  SE(β)      z     p_z     χ²∆  df∆  p(χ²∆)
(Intercept)                                              −1.44   0.30   −4.9   <.001
exposure accent (= back vowel lowering)                   2.29   0.51    4.5   <.001
test accent (= front vowel lowering by new talker)       −1.18   0.39   −3.1    <.01
item status (= new items)                                −0.40   0.35   −1.1    0.25    2.71    4    0.61
exposure accent : test accent                            −1.38   0.50   −2.7    <.01    6.39    2    <.05
exposure accent : item status                             0.30   0.39    0.8    0.44    1.13    2    0.57
test accent : item status                                −0.70   0.67   −1.0    0.30    1.31    2    0.52
exposure accent : test accent : item status               0.55   0.69    0.8    0.43    0.60    1    0.44
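The Wald statistic defined in the caption of Table 4.8 (z = β/SE(β)) can be recomputed from the published estimates, as in the sketch below. The two-sided p-value is taken from the standard normal distribution, which is the usual reference distribution for Wald z-scores; the function name `wald_test` is illustrative. Because the table reports β and SE(β) rounded to two decimals, recomputed z-scores can differ from the reported ones in the last digit.

```python
import math

def wald_test(beta: float, se: float) -> tuple[float, float]:
    """Return the Wald z-score and its two-sided standard-normal p-value."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Rounded estimates from Table 4.8.
z_exposure, p_exposure = wald_test(2.29, 0.51)         # exposure accent
z_interact, p_interact = wald_test(-1.38, 0.50)        # exposure accent : test accent
print(f"exposure accent: z = {z_exposure:.2f}, p = {p_exposure:.2g}")
print(f"exposure : test accent: z = {z_interact:.2f}, p = {p_interact:.2g}")
```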

It is possible that this cross-accent generalization effect was due to rapid adaptation to the new talker at test, rather than vowel-independent category adjustments during the exposure phase. That is, listeners who heard the talker with the back vowel lowered chain shift during the exposure phase might have initially adjusted their perceptual representation of only the corresponding back vowels, but then later at test when they encountered the new talker with the front vowel lowered chain shift, they might have rapidly learned a different vowel-specific pattern of variation. If so, participants in both exposure conditions should show comparable responses to the front vowel lowered items at the beginning of the new talker block, followed by a larger than average endorsement increase among listeners in the adaptation condition. To test this possibility, lexical decisions to the target vowel shifted items were fitted with a mixed logit model containing fixed effects of exposure accent, test accent, target trial number (1-30 for each test accent; centered to reduce collinearity) and all interactions. Random intercepts were specified for subjects and items, along with a by-subject random slope for test accent and a by-item random slope for exposure condition.
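The remark that trial number was centered to reduce collinearity can be illustrated with a small sketch. With trials numbered 1-30, centering subtracts the mean (15.5). The sketch then compares how strongly a 0/1-coded test accent predictor correlates with its interaction term (accent × trial) before and after centering. The 0/1 coding and the balanced 2 × 30 layout are assumptions made for illustration; the source does not state how categorical predictors were coded.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

trials = list(range(1, 31))
mean_trial = sum(trials) / len(trials)            # 15.5

# One row per (test accent, trial) combination; accent coded 0/1 (assumed).
accent = [d for d in (0, 1) for _ in trials]
raw_trial = [t for _ in (0, 1) for t in trials]
centered_trial = [t - mean_trial for t in raw_trial]

raw_corr = pearson(accent, [d * t for d, t in zip(accent, raw_trial)])
centered_corr = pearson(accent, [d * t for d, t in zip(accent, centered_trial)])
print(f"raw: {raw_corr:.2f}, centered: {centered_corr:.2f}")
```

In this layout the interaction term built from raw trial numbers is strongly correlated with the accent predictor, whereas the term built from centered trial numbers is uncorrelated with it.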

The fixed effect parameter estimates are summarized in Table 4.9. As in the aggregate analysis reported in Table 4.8, there were significant main effects of exposure accent and test accent, and these main effects were qualified by a significant two-way interaction. There was a numerical but non-significant trend for trial, suggesting a very weak tendency for listeners to endorse more of the vowel shifted items as the task progressed. Crucially, however, there were no significant interactions involving trial and either exposure accent or test accent. As shown in Figure 4.12, the effect of exposure was present across the new talker test block, despite being larger at the end of the block. Thus, these data provide no evidence that listeners in the adaptation condition rapidly adapted to the front vowel lowered chain shift at test.

[Figure 4.11 plot. Panels: trained talker, back vowel lowered accent (left); new talker, front vowel lowered accent (right). Legend: exposure condition (BVL chain shift vs. Midland accent). x-axis: item status (trained vs. new); y-axis: proportion of 'word' responses (0.0-1.0).]

Figure 4.11: Experiment 6. Mean proportion of ‘word’ responses during lexical decision for the target vowel shifted items: the back vowel lowered items from the trained talker (left) and the front vowel lowered items from the new talker (right). Results are plotted by item status and exposure condition (BVL = back vowel lowered). Large points indicate condition grand means. Error bars indicate bootstrapped 95% confidence intervals. Small points indicate subject-wise condition means (jittered and transparent to show overlap).

Table 4.9: Experiment 6. Summary of the full by-trial mixed logit model of endorsement rates on target lexical decision trials: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model.

                                                        Parameter est.  Wald's test    LL comparisons
Predictors (fixed effects)                              Coef β  SE(β)      z     p_z     χ²∆  df∆  p(χ²∆)
(Intercept)                                              −1.48   0.30   −5.0  <.0001
exposure accent (= BVL chain shift)                       2.30   0.52    4.4  <.0001
test accent (= front vowel lowering by new talker)       −1.24   0.37   −3.4   <.001
trial                                                     0.01   0.01    1.7    0.08    4.08    4    0.40
BVL chain shift : test accent                            −1.33   0.50   −2.7    <.01    6.07    2    <.05
BVL chain shift : trial                                   0.01   0.02    0.5    0.59    0.31    2    0.86
test accent : trial                                      −0.01   0.02   −0.4    0.71    0.15    2    0.93
BVL chain shift : test accent : trial                     0.01   0.03    0.2    0.80    0.06    1    0.81
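The log-likelihood comparisons in Table 4.9 pair a χ² statistic with its degrees of freedom, and the p-value is the upper-tail probability of the chi-square distribution. The sketch below recomputes a few of those p-values from the reported statistics. To stay dependency-free it uses closed-form expressions that hold only for df = 1 and for even df; the function name `chi2_sf` is illustrative, and a statistics library would normally be used instead.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability (this sketch covers df = 1 or even df)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        # For df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
        series = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * series
    raise ValueError("only df = 1 and even df are handled in this sketch")

# (term, chi-square difference, df) as reported in Table 4.9.
comparisons = [
    ("trial", 4.08, 4),
    ("BVL chain shift : test accent", 6.07, 2),
    ("BVL chain shift : test accent : trial", 0.06, 1),
]
for term, chi2, df in comparisons:
    print(f"{term}: p = {chi2_sf(chi2, df):.3f}")
```

The degrees of freedom correspond to the number of fixed-effect terms dropped from the full model in each subset model; for example, removing trial also removes the three interactions that involve it, giving df = 4.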

Word identification accuracy

Mean word identification accuracy is shown in Figure 4.13. A mixed logit regression analysis was conducted to test whether pre-test exposure conditions influenced identification accuracy for the words produced with shifted and unshifted vowels from each test talker. This model contained fixed effects for three factors and all interactions: exposure accent (the novel back vowel lowered accent vs. the standard-sounding Midland accent, manipulated between subjects), test accent (words produced in the back vowel lowered accent by the trained talker vs. words produced in the front vowel lowered accent by the new talker), and item type. Item type was a three-level variable coding whether the test items were (i) lexical items that occurred in the exposure passage (trained items) and that were produced at test with shifted vowels (i.e., with lowered back vowels by the trained talker, or lowered front vowels by the new talker); (ii) new lexical items that were produced with shifted vowels (either lowered back vowels or lowered front vowels, depending on the test talker); or (iii) standard-sounding words produced by each talker. The model contained by-subject and by-item random intercepts and by-subject random slopes for test accent and item type.

The fixed effect parameter estimates for this model are summarized in Table 4.10. Positive coefficients indicate increased log-odds of a correct response. Table 4.10 also shows the results of log-likelihood comparisons testing whether each fixed effect contributed significantly to model fit. All three main effects were significant. Listeners who were familiarized to the back vowel lowered accent showed higher identification accuracy overall than listeners in the control condition (the main effect of exposure accent). Identification accuracy was lower overall for test items produced by the new talker than by the trained talker (the main effect of test accent) and lower overall for the vowel shifted items than for words pronounced with unshifted standard-sounding vowels (the main effect of item type). Further, there was a two-way interaction between exposure accent and item type: as shown in Figure 4.13, the effect of exposure on identification accuracy was larger for the vowel shifted items (i.e., both the back vowel lowered and front vowel lowered items) than for words pronounced with unshifted vowels. No other interactions were significant. Taken together, these results indicate that listeners had difficulty recognizing the vowel-shifted items overall, but that familiarization to the trained talker with the back vowel lowered accent facilitated subsequent recognition of accent-consistent pronunciations from this talker, as well as recognition of words pronounced in the untrained front vowel lowered accent by the new talker.

[Figure 4.12 plot. Panels: trained talker, back vowel lowered accent (left); new talker, front vowel lowered accent (right). Legend: exposure condition (BVL chain shift vs. Midland accent). x-axis: target trial during lexical decision (1-30); y-axis: proportion of 'word' responses (0.0-1.0).]

Figure 4.12: Experiment 6. Mean trial-wise proportion of ‘word’ responses during lexical decision for the target vowel shifted items: the back vowel lowered items from the trained talker (left) and the front vowel lowered items from the new talker (right). Results are plotted by exposure condition (BVL = back vowel lowered). Points indicate trial-wise means. Regression lines indicate binomial best fit curves (curvature is not observable because the proportion range for each effect is small). Error ribbons indicate bootstrapped trial-wise 95% confidence intervals.

[Figure 4.13 plot. Panels: trained talker, back vowel lowered accent (left); new talker, front vowel lowered accent (right). Legend: exposure condition (BVL chain shift vs. Midland accent). x-axis categories in each panel: unshifted back vowel items; trained lexical items pronounced w/ shifted vowels; new lexical items pronounced w/ shifted vowels. y-axis: proportion correct identification (0.0-1.0).]

Figure 4.13: Experiment 6. Mean word identification accuracy as a function of exposure condition, item type and test talker. Error bars indicate bootstrapped 95% confidence intervals on condition means.

4.3.3 Discussion

This experiment investigated whether the perceptual adjustments involved in coping with cross-category vowel variation are vowel-specific. As in Experiment 5, passive familiarization to a talker whose accent was characterized by a novel back vowel lowered chain shift facilitated later recognition of accent-consistent word forms (e.g., “w[o]den” for wooden, given the lowering of /U/ to [o]), which otherwise tended to be perceived as nonwords (e.g., listeners in the control condition only endorsed 17% of these items as words during lexical decision vs. the 62% endorsement rate in the adaptation condition). This exposure-driven recognition benefit generalized to new back vowel lowered items that occurred only at test (e.g., “b[o]shel” for bushel, given the lowering of /U/ to [o]). Together, these findings replicate previous results showing that listeners remapped their perceptual vowel space in response to hearing the back vowel lowered variants, as opposed to simply memorizing specific word forms from the exposure phase (Maye et al., 2008; White & Aslin, 2011; Chapters 2 and 3).

Table 4.10: Experiment 6. Summary of the full mixed logit model of word identification accuracy: coefficient estimates β (log-odds), standard errors SE(β), associated Wald’s z-score (= β/SE(β)), and significance level p_z for all fixed effects, and log-likelihood (LL) comparisons for each subset model relative to the full model. For terms involving the two item type contrasts, the LL comparison is listed on the first of the two rows and applies to both rows jointly.

                                                        Parameter est.  Wald's test    LL comparisons
Predictors (fixed effects)                              Coef β  SE(β)      z     p_z     χ²∆  df∆  p(χ²∆)
(Intercept)                                               0.36   0.30    1.2    0.23
exposure accent (= back vowel lowering)                   1.64   0.51    3.2    <.01
test accent (= front vowel lowering by new talker)       −0.90   0.34   −2.6    <.01   19.36    6    <.01
item type (= trained lexical items w/ shifted vowels)    −2.62   0.52   −5.0   <.001
item type (= new lexical items w/ shifted vowels)        −2.55   0.50   −5.1   <.001
exposure accent : test accent                            −0.53   0.31   −1.7    =.08    4.01    3    0.26
exposure accent : item type (= trained)                   1.26   0.53    2.4    <.05    9.97    4    <.05
exposure accent : item type (= new)                       1.65   0.63    2.6    <.01
test accent : item type (= trained)                      −1.47   0.99   −1.5    0.14    6.57    4    0.16
test accent : item type (= new)                          −0.84   0.85   −1.0    0.32
exposure accent : test accent : item type (= trained)    −0.92   0.78   −1.2    0.24    1.75    2    0.42
exposure accent : test accent : item type (= new)         0.63   0.68    0.9    0.36

Going beyond previous results, this experiment showed that perceptual learning of the back vowel lowered chain shift generalized across talkers to an untrained but structurally parallel front vowel lowered chain shift (cf. Maye et al., 2008). The front vowel lowered items included word forms like gift as “g[e]ft”, given the shift of the high-mid front vowel /I/ to sound like the mid front vowel [e] (which mirrored the trained back vowel shift of high-mid /U/ to mid [o]), and word forms like table as “t[E]ble”, given the shift of /e/ to sound like the mid-low vowel [E] (which was structurally similar to the trained shift of /o/ to the low back vowel [A]). Crucially, as in Experiment 5, results from both the lexical decision task and the word identification task provided converging evidence for generalization of learning across vowel shift systems. That is, relative to listeners in the control condition, listeners in the adaptation condition not only endorsed more of the vowel-shifted forms (i.e., both the back vowel lowered and front vowel lowered items) during the lexical decision task, they also showed greater identification accuracy for these items during the word identification task. Given the exposure-driven difference in identification accuracy, the current results are evidence of exposure-driven changes in how spoken stimuli are mapped to lexical representations in memory and cannot be attributed to listeners in the adaptation condition developing a liberal response bias (e.g., listeners in the adaptation condition indiscriminately accepting “accented” word forms as words, regardless of whether they could recognize the intended word). Further, since the untrained front vowel lowered accent was produced by an untrained talker, the finding of cross-accent generalization cannot be explained by listeners in the adaptation condition adjusting the perceptual representation of talker-specific vowel categories. Thus, the current results demonstrate talker-independent adjustments in vowel perception (see Chapter 2) and indicate that listeners in the adaptation condition were able to leverage experience with the system of cross-category back vowel shifts to facilitate recognition of words pronounced with structurally similar vowel variants in a different portion of the vowel space.

While these results are inconsistent with the prevailing view of vowel-specific learning, they are not inconsistent with previous experimental findings. As discussed above, Maye et al. (2008) found evidence that exposure to a novel front vowel lowered chain shift influenced later recognition of words pronounced in the same voice with an untrained system of back vowel lowering (see also Bardhan et al., 2006), though they did not replicate this finding across experiments. A relevant factor here is that Maye et al.’s (2008) stimuli were produced by concatenative text-to-speech synthesis, whereas the current materials were produced naturally by human talkers. In addition to inherent difficulties involved in processing spectrally degraded speech (Mattys et al., 2012), which could influence adaptation, learning outcomes in Maye et al.’s (2008) study might have been influenced by listeners’ beliefs about the nature of the synthesized speech, which involved rearranging and concatenating prerecorded diphones to form various words. Listeners might have been sensitive to repetition of phonetic forms and developed an expectation that the synthesized accent was characterized by a limited, albeit atypical, range of variation. Given this expectation, listeners might have been biased to learn specific properties of the synthesized accent. By contrast, when listening to human speech, given the ubiquity of variability within and across talkers, particularly for vowel productions (see Figure 4.1 above), listeners might be biased to maintain greater flexibility in recognition processes, favoring more general perceptual adjustments over very specific ones.

4.4 General Discussion

Several recent studies have investigated the nature and specificity of the perceptual adjustments that enable listeners to cope with cross-category vowel variation, as in the case of vowel chain shifts (Maye et al., 2008; White & Aslin, 2011; see also Bardhan et al., 2006; Weatherholtz, 2013). The empirical findings from these studies are mixed (e.g., compare Experiments 1 and 2 by Maye et al., 2008), though the results have generally been interpreted as evidence that the speech processing system makes “targeted perceptual shifts” in response to ambient patterns of vowel variation (Samuel & Kraljic, 2009, p. 1214). In other words, as a result of experience with particular vowel shifts, listeners learn vowel-specific and direction-specific patterns of variation from the input and adjust the perceptual representation of specific vowel categories accordingly.

The current experiments revealed consistent evidence against a learning account based on targeted vowel-specific and direction-specific shifts. The results of Experiment 5 showed that familiarization to a talker whose accent was characterized by a novel back vowel lowered chain shift not only facilitated later recognition of accent-consistent pronunciations from that talker (e.g., “w[o]den” for wooden, given the shift of /U/ to [o]) but also recognition of words produced by a different talker with an untrained back vowel raised chain shift, a system of shifts in the opposite direction (e.g., /U/ raised to [u] instead of lowered to [o]). The results of Experiment 6 showed that under the same exposure conditions, the exposure-driven word recognition benefits transferred to words produced by a new talker with an untrained front vowel lowered chain shift, a structurally parallel system of vowel shifts in a different region of the vowel space (e.g., “g[e]ft” for gift, given the shift of /I/ to [e], which parallels the back vowel shift of /U/ to [o]). These findings of generalization to untrained vowel variants indicate that the exposure-driven perceptual adjustments were neither direction-specific nor vowel-specific (cf. Maye et al., 2008). Further, the finding of generalization despite the fact that these untrained vowel variants were produced by an untrained talker indicates that the observable changes in word recognition behavior were due to talker-independent adjustments in the perception of vowel information, as opposed to listeners simply accepting variant vowel realizations from the trained talker.

Broadly characterized, the current results indicate that exposure to the novel back vowel lowered chain shift (a system of cross-category vowel shifts in one region of the vowel space) resulted in greater tolerance for mismatch among vowel categories across the vowel space, independent of the talker who produced the vowel variants. Mismatch tolerance is a fundamental and graded aspect of spoken word recognition (Utman, Blumstein, & Sullivan, 2001; Connine et al., 1997; Marslen-Wilson, Moss, & van Halen, 1996; Andruski et al., 1994). In many cases, listeners can recognize spoken words containing unfamiliar or atypical segmental variation, even if atypical pronunciation variation results in additional processing costs relative to the recognition of familiar or canonical forms (e.g., slower recognition speed). However, the likelihood of recognition decreases as the degree of mismatch increases between atypical segmental variants and canonical forms (Connine et al., 1997, 1994). In the current study, the words pronounced with atypical cross-category vowel shifts (e.g., “w[o]den” for wooden; “g[e]ft” for gift) tended to be perceived as nonwords by listeners in the control condition, which indicates that the vowel category mismatches impaired word recognition for these listeners. By comparison, listeners who were familiarized to the back vowel lowered accent were better able to recognize words pronounced with trained and untrained vowel shifts, indicating a generally greater tolerance for vowel category mismatch among listeners in the adaptation condition than among listeners in the control condition.

In the current case, the exposure-driven increase in mismatch tolerance could be due to changes in the internal structure of perceptual vowel categories. Listeners in the adaptation condition might have broadened their perceptual vowel categories, resulting in greater overlap among the representations of neighboring vowels, and thus allowing a greater range of otherwise mismatching vowel productions to map to the same vowel category. For example, broadening the perceptual representation of the vowel /U/ would result in greater overlap with the neighboring vowels /u/ and /o/. In turn, a token that sounds like [U] could be a prototypical instance of /U/, an atypical but sufficiently good instance of /u/ (i.e., a lowered variant of this category), or an atypical but sufficiently good instance of /o/ (i.e., a raised variant of this category). In distributional terms, general category broadening could result from adjusting the variance associated with a category, without necessarily adjusting the category center (i.e., the mean). As a result, listeners would be better able to recognize word forms that otherwise sound like nonwords because the vowel variants would be evaluated as possible instances of a wider range of vowel categories. A related but process-based account is that generally relaxed criteria are used for matching speech input to lexical representations in memory when speech input is characterized by pervasive atypical variation, or is otherwise non-optimal (for discussion, see Brouwer, Mitterer, & Huettig, 2012; McQueen & Huettig, 2012).
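The distributional contrast between category broadening and a targeted category shift can be made concrete with a small numerical sketch. The following is only an illustration under stated assumptions, not a model fitted to the current data: vowel height is reduced to a single dimension in arbitrary units, each perceptual category is treated as a Gaussian, and the token positions, category means, standard deviations, and the function name `goodness` are all hypothetical.

```python
import math

# Hypothetical positions on a one-dimensional "vowel height" scale
# (arbitrary units; larger values = lower vowels). Illustrative only.
TOKEN_U_LIKE = 0.0   # a token that sounds like [u]
MEAN_U = 1.0         # assumed center of the /U/ category
TOKEN_O_LIKE = 2.0   # a token that sounds like [o]


def goodness(token, mean, sd):
    """Gaussian category goodness of a token, scaled so the prototype scores 1."""
    z = (token - mean) / sd
    return math.exp(-0.5 * z * z)


# Baseline /U/ category: narrow, centered on the canonical value.
baseline = {"mean": MEAN_U, "sd": 0.4}
# Category broadening: same mean, larger standard deviation.
broadened = {"mean": MEAN_U, "sd": 0.8}
# Targeted (direction-specific) shift: mean moved toward [o], same sd.
shifted = {"mean": 1.5, "sd": 0.4}

for label, cat in [("baseline", baseline), ("broadened", broadened), ("shifted", shifted)]:
    lowered = goodness(TOKEN_O_LIKE, cat["mean"], cat["sd"])
    raised = goodness(TOKEN_U_LIKE, cat["mean"], cat["sd"])
    print(f"{label:9s}  [o]-like: {lowered:.3f}  [u]-like: {raised:.3f}")
```

Under these assumed values, increasing the variance of /U/ while holding its mean fixed raises the goodness of [o]-like and [u]-like tokens equally, whereas moving the mean toward [o] raises the goodness of lowered tokens at the expense of raised ones.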

A different account of these data involves globally adjusting the relative contributions of vowel and consonant information to word recognition processes. Vowels are highly mutable in perception, and it has been argued that vowel information contributes less to lexical identity and word recognition than consonant information (Delle Luche et al., 2014; Cutler et al., 2000; van Ooijen, 1996). As a result of exposure to the back vowel lowered accent, listeners in the adaptation condition might have become more tolerant of vowel mismatch by further “down weighting” the informativeness of vowel cues and in turn relying more strongly on consonant information (for relevant discussion of selective attention shifting, see Goldstone, 1998; Kruschke, 1992). With respect to this possibility, it is relevant to note two properties of the target stimuli used in the current experiments. Not only did the cross-category vowel shifts result in nonword surface forms in standard American English, but the target words were also selected such that a further cross-category shift in the same direction (i.e., shifting a given vowel across two categories) resulted in a nonword surface form. This second property of the stimuli was necessary to test for generalization to untrained vowel variants. Otherwise a word pronounced, for example, with an atypical raised vowel variant (e.g., mode pronounced as “m[U]d” given the raising of /o/ to [U]) could be misinterpreted as a different word pronounced with a lowered vowel variant (e.g., mood given the lowering of /u/ to [U]). As a result of these stimulus constraints, the target words pronounced with shifted vowels tended to have few vowel minimal pairs in English, which raises the possibility that decreasing attention to the unfamiliar vowel variants and increasing attention to consonant information might have been (largely) sufficient to facilitate recognition of these word forms.
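The attention shifting account can likewise be illustrated with a toy calculation. The sketch below is a deliberate simplification under stated assumptions, not a claim about the actual recognition mechanism: words are represented as equal-length lists of segments, the match between input and a lexical form is a weighted proportion of matching segments, and the function name `match_score`, the vowel inventory, and the weight values are all hypothetical.

```python
# Hypothetical vowel inventory; symbols follow the transcription used in the text.
VOWELS = {"I", "e", "E", "U", "o", "u", "A"}


def match_score(input_form, lexical_form, vowel_weight):
    """Weighted proportion of matching segments between input and a lexical form.

    Consonant positions have weight 1.0; vowel positions have weight `vowel_weight`.
    Assumes both forms are segment lists of equal length (a simplification).
    """
    if len(input_form) != len(lexical_form):
        raise ValueError("forms must have the same number of segments")
    total = 0.0
    matched = 0.0
    for heard, expected in zip(input_form, lexical_form):
        weight = vowel_weight if expected in VOWELS else 1.0
        total += weight
        if heard == expected:
            matched += weight
    return matched / total


gift = ["g", "I", "f", "t"]          # canonical form of "gift"
shifted_gift = ["g", "e", "f", "t"]  # "g[e]ft", with /I/ lowered to [e]

# Vowels and consonants weighted equally.
print(match_score(shifted_gift, gift, vowel_weight=1.0))
# Vowel cues down-weighted relative to consonant cues.
print(match_score(shifted_gift, gift, vowel_weight=0.25))
```

In this toy setting, down-weighting the vowel position raises the match between the shifted form and its intended word without any change to the vowel categories themselves. The same down-weighting would also reduce the ability to separate vowel minimal pairs, which is why the scarcity of such pairs among the target words matters for this account.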

While the current results cannot distinguish between the category broadening and attention shifting accounts, the current results are decisive in indicating that exposure to unfamiliar cross-category vowel variation resulted in general perceptual adjustments across the vowel space, as opposed to targeted vowel-specific and direction-specific perceptual shifts. The current experiments are not the first to show that perceptual representations are broadly tuned in response to the environment, with far-reaching consequences for speech processing. In a related study on perceptual learning of atypical consonant variation, Kraljic and Samuel (2006) found that perceptual learning of a talker’s atypical voicing contrast between the consonants /t/ and /d/ (e.g., a realization of /t/ that sounded more like [d]) influenced later perception of the untrained consonants /p/ and /b/, which share a featural contrast based on voicing, even when these consonants were produced by a new talker.

Thus, as in the current study, listeners generalized to untrained but structurally similar segmental variation produced by a different talker. In a recent study on adaptation to globally foreign accented speech, Baese-Berk et al. (2013) found evidence of talker-independent and accent-independent perceptual adjustments. Listeners who were initially familiarized to five talkers, each with a different non-native English accent (Thai, Korean, Hindi, Romanian, and Mandarin), showed subsequently greater accuracy on sentence intelligibility in noise when listening to a sixth talker with an untrained foreign accent (Slovakian-accented English), relative to participants in a single-accent exposure condition. Baese-Berk et al. (2013) argued that despite the vast pronunciation differences across the five non-native accents comprising the training set due to L1 influences on L2 production, listeners were sensitive to structural similarities across these accents (perhaps common patterns of variation in vowel reduction and speaking rate), and further that listeners were able to leverage these similarities to facilitate processing of an untrained foreign accent.

The current results, together with the findings by Kraljic and Samuel (2006) and Baese-Berk et al. (2013), highlight the flexibility of the speech perception system, which is highly adaptive to ambient pronunciation variation and is able to generalize to unfamiliar but structurally similar patterns of pronunciation variation. These results also emphasize the functional aspect of perceptual learning: to facilitate the mapping of form to meaning by leveraging experience and generalizing broadly when possible.

Chapter 5

Conclusions and theoretical implications

The research described in this dissertation investigated the exposure-driven perceptual adjustments that enable listeners to cope with cross-category vowel variation. These experiments focused on perceptual learning of vowel chain shifts, which are complex systems of cross-category pronunciation variation (Labov, 1994). Systemic vowel shifts have received relatively little attention in the literature on adaptation to pronunciation variation (though see Maye et al., 2008; Weatherholtz, 2013), despite the prominence of such systems across spoken varieties of many languages, including English (Lubowicz, 2011; Labov et al., 2006).

Thus, the overarching goal of the current experiments was to provide a deeper understanding of the learning mechanism that drives adaptation when listeners experience an unfamiliar vowel chain shift. The experiments investigated three general issues that are central to the study of perceptual learning for speech: (i) the level of representational specificity at which exposure-driven perceptual adjustments occur (i.e., the locus of learning); (ii) the nature of the perceptual adjustments that occur at that level of representational specificity; and (iii) the manner in which learning and generalization are constrained by properties of the environment.

The empirical approach of the current experiments centered on an exposure-test perceptual learning paradigm: listeners were initially familiarized to a talker with a novel vowel chain shift (or a standard sounding American English accent for listeners in control conditions), and then listeners performed multiple word recognition tasks, which were designed to assess generalization of learning to new words, new talkers, and untrained vowel shifts. The empirical logic of these experiments was that post-exposure generalization performance provides a window onto the perceptual adjustments that drive observable changes in word recognition (see Chapter 1 for discussion of linking hypotheses). The exposure and test conditions were manipulated across experiments to assess whether perceptual learning of vowel chain shifts was constrained by factors known to influence perceptual learning of speech more generally: the degree of token variability that listeners initially experienced (for discussion of learning outcomes and high vs. low token variability learning environments, see Lively et al., 1993; Greenspan et al., 1988; Posner & Keele, 1968), the acoustic similarity between the trained and new talkers in terms of their vowel productions (e.g., see Reinisch & Holt, 2014; Goldinger, 1996), and the structural relationship between the trained and untrained vowel variants (e.g., see Kraljic & Samuel, 2006).

Across experiments, passive familiarization to a talker with a novel vowel chain shift (e.g., a “back vowel lowered” chain shift) markedly improved recognition of accent-consistent pronunciations that were otherwise perceived as nonwords (e.g., “w[o]den” for wooden, given the lowering of /U/ to sound like [o]), as indicated by converging evidence from lexical decision and word identification data. This exposure-driven word recognition benefit consistently generalized to new words produced by the trained talker (i.e., words presented only during the post-familiarization word recognition tasks), which indicates that listeners learned to cope with the unfamiliar vowel variants by remapping their perceptual vowel space (a sublexical locus of learning), as opposed to relying on memory of specific word forms from the familiarization phase (Chapters 2, 3 and 4). Further, perceptual learning generalized to new talkers with the same chain shift, despite the fact that listeners were only familiarized to a single talker (Chapter 2), indicating that listeners remapped their perceptual vowel space talker-independently. The finding of cross-talker generalization was remarkably robust: reducing the accent familiarization phase from 20 minutes to two minutes of passive exposure (and hence reducing experience with the trained talker’s chain shift from several hundred words containing the target vowel shifts to only a few dozen words) had a limited influence on cross-talker generalization of learning (see the omnibus analysis of Experiments 1-3 in Chapter 2).

To probe the nature of the sublexical adjustments observed in Experiments 1-3, Experiments 4-6 (Chapters 3 and 4) investigated generalization to untrained vowel variants. Experiment 4 (Chapter 3) focused on generalization to untrained parts of a vowel shift system to understand the systematicity of learning. When listeners experienced only a subset (n - 1) of the vowel shifts that defined an unfamiliar vowel shift system, listeners were able to generalize learning to fill in incidental gaps in their experience, indicating that listeners learned a pattern of co-variation among vowel categories, rather than adapting to an unfamiliar vowel chain shift by learning each constituent shift independently. An unexpected finding of this experiment was that learning outcomes were influenced by the structure of the vowel chain shift that listeners experienced. Listeners who were familiarized to a novel “back vowel raised” chain shift learned a direction-specific system of variation. However, when listeners were familiarized to a novel “back vowel lowered” chain shift, perceptual learning showed a lack of direction specificity; rather, listeners broadened their perceptual vowel categories, which facilitated recognition of words produced by the trained talker with either lowered (accent-consistent) or raised (accent-inconsistent) back vowel variants.

Building on the “category broadening” results from Experiment 4, Experiments 5 and 6 (Chapter 4) investigated generalization of learning across both talkers and vowel shift systems to determine whether the broadening of perceptual categories occurs vowel- and talker-specifically. Results of Experiment 5 showed that familiarization to a system of back vowel lowering facilitated later recognition of words produced by a new talker with an untrained system of back vowel raising, indicating talker-independent broadening of perceptual vowel categories. Results of Experiment 6 showed that familiarization to a system of back vowel lowering facilitated later recognition of words produced by a new talker with an untrained front vowel lowered chain shift, a structurally parallel system of variation in a different region of the vowel space. Thus, perceptual learning involved some degree of talker-independent perceptual generalization across the vowel space.

Taken together, the findings reported in this dissertation demonstrate tremendous exposure-driven flexibility in vowel perception. Listeners cope with systemic cross-category vowel variation by dynamically adjusting talker-independent perceptual vowel representations (i.e., a sublexical locus of learning; Chapters 2 and 4). The nature of these adjustments can involve a general broadening of perceptual vowel categories, resulting in increased tolerance for mismatch among neighboring vowel categories (Chapters 3 and 4), or targeted perceptual shifts that reflect the direction of the vowel shifts in the speech input (Chapter 3; see also Maye et al., 2008). These different perceptual adjustments appear to depend on the structure of the vowel chain shift that listeners experienced (Chapter 3), though further research is needed to understand the nature of this dependency. Listeners are sensitive to co-variation among the vowels involved in a chain shift; thus, when adapting to a talker with an unfamiliar chain shift, listeners can leverage learning about particular vowel shifts produced by this talker to fill in incidental gaps in their experience of the talker’s vowel system (Chapter 3). Further, perceptual learning of a vowel chain shift in one region of the vowel space results in some degree of talker-independent perceptual adjustments across the vowel space; thus, listeners can leverage learning of one chain shift to facilitate recognition of words produced by a different talker with a structurally parallel chain shift in a different region of the vowel space (e.g., generalization from a back vowel lowered system to a front vowel lowered system; Chapter 4). The exposure-driven adjustments to talker-independent perceptual vowel representations happen rapidly (within the first few minutes of exposure), and similarity between the trained and new talkers has a limited influence on cross-talker generalization of learning (Chapter 2; cf. Reinisch & Holt, 2014).

5.0.1 Category shifts, category broadening and response biases

The results of Experiments 4 and 5 (Chapters 3 and 4) demonstrated that adaptation to cross-category vowel shifts can involve either direction-specific perceptual shifts or a general expansion of perceptual vowel categories. At first glance, it might be tempting to conclude that direction-specific adaptation effects provide evidence of learning, whereas category broadening effects are evidence of a response bias (i.e., because the perceptual adjustments in the latter case do not reflect the structure of the input). The basic assumption of this view is that learning is structural. However, perceptual learning is often defined in functional terms: e.g., “perceptual learning benefits an organism by tailoring the processes that gather information to the organism’s uses of the information” (Goldstone, 1998, p. 586).

From a functional perspective, category broadening is a viable learning strategy, as long as maintaining broadened perceptual categories helps the listener achieve her goals (e.g., facilitating the mapping of form to meaning).

A functional benefit in word recognition is precisely what was observed in the current experiments. Identification accuracy results from Experiments 4 and 5 showed that listeners who were familiarized to a talker with a novel back vowel lowered accent were subsequently more accurate at identifying words produced with lowered (i.e., accent-consistent) back vowel variants and raised (i.e., untrained, accent-inconsistent) back vowel variants, relative to listeners in a control group. Note that the vowel-shifted stimuli in these experiments sounded like nonwords in standard American English (e.g., butcher as either “b[o]tcher” or “b[u]tcher”, given the lowering of /U/ to [o] or the raising of /U/ to [u], respectively).

Maintaining broadened perceptual vowel categories in this case allowed both lowered and raised back vowel variants to be perceived as sufficiently good exemplars of the target category (e.g., both [o] and [u] as sufficiently good exemplars of /U/), thereby facilitating word recognition by establishing a good enough match between the input (e.g., “b[o]tcher” or “b[u]tcher”) and the expected lexical form (i.e., “b/U/tcher”). Thus, category broadening was a beneficial learning strategy. However, maintaining broadened perceptual categories might be less beneficial to the listener when attempting to distinguish vowel-based minimal pairs (e.g., determining whether the word form “c[U]d” is an instance of the word could with a standard-sounding vowel, an instance of cooed with a lowered vowel, or an instance of code with a raised vowel). Thus, the important question is not whether category broadening counts as learning, but rather under what conditions category broadening occurs and serves to benefit speech processing.

5.1 Implications for speech processing

The consistent finding of lexical generalization of perceptual learning (i.e., generalization to new words containing the trained vowel variants) provides further evidence that abstract phonological representations can be adjusted in response to the input and that such flexible phonological representations mediate between the speech signal and the mental lexicon during spoken word recognition (Cutler et al., 2010; Sjerps & McQueen, 2010; McQueen et al., 2006). Thus, the current findings support models of spoken word recognition that posit intermediate levels of representation (e.g., PARSYN: Luce, Goldinger, Auer, & Vitevitch, 2000; TRACE: McClelland & Elman, 1986), as opposed to direct access models in which speech input is evaluated directly against lexical representations in the mental lexicon (Gaskell & Marslen-Wilson, 1997; Klatt, 1979).

The current findings showing robust cross-talker generalization following single talker exposure differ from previous findings concerning constraints on talker-independent perceptual learning. An emerging perspective in the literature on adaptation to pronunciation variation is that multi-talker exposure conditions are necessary for cross-talker generalization of learning (Bradlow & Bent, 2008; Lively et al., 1993), except when the trained and new talkers are “sufficiently similar” (Reinisch & Holt, 2014; Kraljic & Samuel, 2007, 2005; Eisner & McQueen, 2005). It is currently unclear whether the relevant metric of talker similarity is overall voice similarity (e.g., Goldinger, 1996) or acoustic similarity in the realization of specific speech sounds (for discussion, see Reinisch & Holt, 2014). However, the basic idea is that exposure to the same pronunciation variant(s) produced by multiple talkers enables listeners to abstract over talker-specific idiosyncrasies to learn an abstract talker-independent pattern of variation. By contrast, under single-talker exposure conditions, listeners build talker-specific representations of pronunciation variants that include episodic detail, and cross-talker generalization only occurs when episodic details match across encounters, whether these details reflect similar episodes from the same talker or from different but similar-sounding talkers (Reinisch & Holt, 2014). The current findings suggest a more nuanced view of constraints on cross-talker generalization. In the current experiments, perceptual learning of systemic cross-category vowel variation generalized to new talkers despite the fact that learning was based on exposure to a single talker (Chapters 2 and 4). Acoustic similarity between the trained and new talkers in terms of their vowel productions influenced the strength of cross-talker generalization when exposure to the trained talker was brief and characterized by low token variability (Chapter 2, Experiment 3). Crucially, however, talker similarity did not determine cross-talker generalization: that is, talker similarity mediated the strength of cross-talker generalization but did not function to “switch” generalization on and off (cf. Reinisch & Holt, 2014, p. 552).

A possible explanation for these different results concerning cross-talker generalization stems from the fact that vowels tend to be highly mutable in perception cross-linguistically. When asked to change pseudowords (e.g., “kebra”) into real words by altering either a single vowel (resulting in cobra) or a single consonant (resulting in zebra), speakers are significantly more likely to change vowels, and this effect has been documented across multiple languages, including English, Spanish and Dutch (Cutler et al., 2000; van Ooijen, 1996). Based on these findings, it has been argued that vowels contribute less to lexical identity and word recognition than consonants (see also Delle Luche et al., 2014; Nespor et al., 2003). The mutability of vowels in perception might contribute to robust cross-talker generalization when adapting to atypical vowel variation, whereas listeners might require evidence of atypical consonantal variation from multiple talkers before learning such variation talker-independently.

The current findings also provide insight into the time-course of perceptual learning. In an influential study on adaptation to accented speech, Clarke and Garrett (2004) found that the initial costs associated with processing a foreign accent, relative to a native accent, diminished rapidly during the first minute of exposure and were effectively gone by the end of the experiment. Further, some degree of adaptation was apparent after listeners experienced only two to four sentences of foreign-accented speech. The current results provide further evidence of rapid adaptation to accented speech. Results of Experiment 3 (Chapter 2) showed that after only two minutes of passive exposure to a talker with a novel back vowel lowered accent (the smallest exposure increment tested here), listeners had remapped their perceptual vowel space and were able to generalize learning to a new talker with an acoustically similar vowel space and, to a lesser extent, to a new talker with an acoustically dissimilar vowel space. Together, these results indicate that the speech perception system begins to adapt to atypical pronunciation variation almost immediately and that learning reaches asymptote within the first few minutes of exposure.

Perceptual learning is not a singular process (Goldstone, 1998). In early research concerning the effects of language experience on speech perception, Aslin and Pisoni (1980) hypothesized five types of exposure-driven changes in speech perception. Aslin and Pisoni (1980) were focused particularly on language development and early language experience, though subsequent research has shown that these processes are involved in both language acquisition and perceptual adaptation. Enhancement and attenuation are complementary phenomena, respectively involving increased or decreased discriminability of stimuli near perceptual category boundaries. Sharpening and broadening are complementary processes that respectively lead to more narrowly tuned or more widely tuned perceptual categories (i.e., narrowing or broadening the range of stimulus variability associated with a given category). Realignment refers to direction-dependent perceptual shifts, as in the case of bilingual English-Spanish speakers shifting their voiced-voiceless category boundary depending on language context (Elman, Diehl, & Buchwald, 1977). Exposure-driven perceptual adjustments differ due to a range of factors: for example, the type of speech input that listeners experience (e.g., repeated exposure to unambiguous speech sounds vs. repeated exposure to ambiguous speech sounds that are visually or lexically disambiguated; Vroomen et al., 2004; Eimas & Corbit, 1973); the order in which accented speech materials are presented (e.g., grouped by talker, grouped by sentence, or randomized; Tzeng & Nygaard, 2012); and listeners’ beliefs about the source of pronunciation variation (e.g., a talker’s idiolect vs. variation due to external factors like a talker speaking with a pen in her mouth; Kraljic et al., 2008).

The current results showed that perceptual adjustments differed depending on the structure of the novel vowel chain shift. Listeners who were familiarized to a novel system of back vowel raising learned a direction-specific pattern of variation, whereas listeners who were familiarized to a novel system of back vowel lowering appeared to adapt by broadening perceptual vowel categories overall. While it is unclear from the current data why these different accents induced different adaptive processes (though see Section 5.3 for discussion of potentially relevant language-internal factors), the current data highlight the fact that there are multiple routes to perceptual learning for speech (Loebach et al., 2008) and that learning outcomes differ depending on properties of the stimulus materials (e.g., Love, 2003; Lively et al., 1993).

5.2 Implications for cognitive processing

The ability of listeners to cope with pronunciation variation, as in the case of accented speech, is a domain-specific instance of the general ‘stability-plasticity’ dilemma that exists for human (and artificial) cognitive and neural systems (see Grossberg, 2013, 1980). Both stability and plasticity are necessary for learning, and hence for categorization and recognition: stability is necessary to preserve knowledge over time (e.g., abstract concepts, acquired categories for auditory or visual stimuli), and plasticity is necessary to integrate new information with previously learned information. The dilemma is that too much of either can be detrimental to information processing. Broadly speaking, too much stability (and too little plasticity) in response to new information impedes learning and results in previously learned information becoming entrenched, whereas too much plasticity (and too little stability) results in new information (e.g., atypical category exemplars) effectively overwriting previously learned information (e.g., the structure of existing categories based on more typical category exemplars). Humans tend to maintain a balance between stability and plasticity, whereas artificial neural systems, such as connectionist networks, often fail to preserve information over time, resulting in ‘catastrophic forgetting’ in sequential learning scenarios (McCloskey & Cohen, 1989; Ratcliff, 1990).

The results of the current experiments demonstrate this balance between stability and plasticity in the context of word recognition processes. Listeners rapidly adapted to unfamiliar cross-category vowel variation by remapping their perceptual vowel space, without overwriting the category structure of typical vowel variants. For example, as a result of brief familiarization to a talker with the novel back vowel lowered chain shift, listeners were better able to recognize accent-consistent pronunciations from this talker, and listeners were able to generalize learning to some extent across the vowel space, facilitating recognition of words produced by a different talker with an untrained front vowel lowered chain shift (Chapter 4, Experiment 6). However, this remapping of the perceptual vowel space did not adversely affect recognition of words produced with standard-sounding back and front vowels. Thus, the speech perception system is capable of maintaining considerable flexibility in terms of mapping words pronounced with atypical vowel variants to lexical representations in memory, without losing the ability to map typical word forms to the correct lexical representations.

5.3 Implications for language change

The current findings on the perceptual learning of vowel chain shifts (and more generally of exposure-driven flexibility in vowel perception and word recognition) have implications for the study of language change, particularly for understanding the factors that constrain sound change. To situate the current psychological findings alongside historical research on language variation and change, it is first necessary to outline the empirical foundations of the latter line of research. In a now-classic paper, Weinreich, Labov, and Herzog (1968) described five empirical problems for the study of language change (see also Coseriu, 1958). The constraints problem, which is the most relevant for the current discussion, is to identify the general constraints, if any, that determine possible and impossible linguistic changes and the direction of change. In principle, constraints on language change can stem from myriad sources: e.g., universal constraints; language-specific grammatical constraints; learnability constraints (e.g., the types of sound systems that can be spontaneously acquired by learners); biological constraints (e.g., the types of sounds and sound contrasts that humans can reliably produce). Conceptually, the constraints problem is related to the search in formal linguistics for a “universal grammar” (e.g., Chomsky & Halle, 1968). However, unlike in research concerned with linguistic universals, research on language change does not assume that any constraint is necessarily absolute. The assumption in research on language variation and change is that even the strongest constraints can be overridden, e.g., if other factors conditioning language change create a sufficiently unfavorable situation for the application of a given constraint (see Labov, 1994, for discussion of this point). The transition problem in research on language change concerns the route by which changes occur (e.g., the intermediate stages, if any, that link an earlier form to its current form in the language). The embedding problem concerns “concomitance” or systematicity among linguistic changes: in other words, the focus of the embedding problem is the extent to which a given change is associated with (e.g., caused by, or occurring in reaction to) other changes in the language, as opposed to multiple unrelated changes co-occurring due to chance. The evaluation problem is to understand how talkers evaluate a given linguistic change (e.g., social perception of pronunciation variants) and how linguistic change is affected by such evaluation. Finally, the actuation problem is to understand why a given change occurs in a language at a given time and not at a different time (and not in a different language with the same structural properties and conditioning factors).

Vowel chain shifts have figured prominently in research on language change, in part because of their prevalence across languages and language varieties and because of their complexity as systems of co-dependent variation (Lubowicz, 2011; Boberg, 2005; Clarke et al., 1995; Hinton et al., 1987; Martinet, 1955). With respect to the constraints problem, the prevailing view is that three phonological principles guide the chain shifting of vowels (see Labov, 1994).

• Principle I: In chain shifts, long vowels rise.

• Principle II: In chain shifts, short vowels fall.

• Principle III: In chain shifts, back vowels move to the front.

These principles are substantiated by diachronic and synchronic data on vowel chain shifts across a wide range of languages and language varieties (e.g., regional dialects), though there are exceptions to these principles in the historic record of known vowel chain shifts, as detailed by Labov (1994). A relevant question concerns the source and the strength of these constraints. In principle, these constraints could reflect universal grammatical rules, tendencies based on the phonological structure of vowel systems, constraints on the articulation of vowels, or even learnability factors (e.g., constraints on the types of vowel systems that can be acquired by learners or on the types of pronunciation variants that listeners can cope with in perception). The existence of exceptions to these general principles cross-linguistically is evidence against an account based entirely on linguistic universals. However, there is only one known exception cross-linguistically to Principle I (the lowering of /ɛː/ to [aː] in East Lettish, a variety of Latvian; see Endzelin, 1922; Labov, 1994), which indicates that the principle that long vowels rise in chain shifts is a strong constraint on vowel chain shifting and raises the possibility that this principle is a near linguistic universal.

The current experiments have implications for interpreting these principles of vowel shifting. One of the chain shifts used in the current experiments was a novel system of “back vowel lowering”. In this system, all back vowels (including the long vowels /u/ and /o/) were lowered to the nearest neighboring vowel category, which is unnatural according to Principle I above. In Experiment 4 (Chapter 3), one group of listeners was familiarized to a novel system of “back vowel raising”. This latter chain shift involved raising all back vowels (including the short vowels /ɔ/, /ɑ/ and /ʊ/) to the nearest neighboring vowel category, which is unnatural according to Principle II above. In all cases, passive exposure to a novel vowel chain shift was sufficient for listeners to adapt to the unfamiliar vowel shifts, and generalization of learning (e.g., to new words and new talkers) was robust. Further, the results of Experiment 4 (Chapter 3) showed that listeners adapted to the novel vowel chain shifts by learning a system of co-variation among vowels, as opposed to learning each vowel shift independently. Thus, while the novel vowel chain shifts used in Experiments 1-6 are unnatural according to the general principles that govern chain shifting, listeners appeared to have no difficulty learning these systems, which indicates that the general principles of vowel shifting are not the result of learnability constraints.
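As a purely illustrative aid, the nearest-neighbor structure of the two novel back vowel chain shifts can be sketched in a few lines of Python. The sketch assumes a five-way height ordering of the back vowels (written in IPA) and leaves the vowel at the end of each chain unchanged; both choices are assumptions made for illustration and do not describe how the experimental stimuli were actually constructed. All function and variable names are hypothetical.

```python
# Back vowels ordered from highest to lowest.
# ASSUMPTION: this five-way ordering is chosen for illustration only.
BACK_VOWELS_HIGH_TO_LOW = ["u", "ʊ", "o", "ɔ", "ɑ"]


def chain_shift(vowels, direction):
    """Map each vowel to its nearest neighbor in one direction.

    direction=+1 lowers (moves toward the end of the list);
    direction=-1 raises (moves toward the start of the list).
    ASSUMPTION: the vowel at the end of the chain has no neighbor in
    that direction and is simply left unchanged.
    """
    mapping = {}
    for i, vowel in enumerate(vowels):
        j = i + direction
        mapping[vowel] = vowels[j] if 0 <= j < len(vowels) else vowel
    return mapping


LOWERING = chain_shift(BACK_VOWELS_HIGH_TO_LOW, +1)
RAISING = chain_shift(BACK_VOWELS_HIGH_TO_LOW, -1)


def apply_shift(segments, mapping):
    """Apply a chain shift to a word given as a list of segments."""
    return [mapping.get(segment, segment) for segment in segments]


# "wooden" in a hypothetical broad transcription.
wooden = ["w", "ʊ", "d", "ə", "n"]
print("lowered:", "".join(apply_shift(wooden, LOWERING)))
print("raised: ", "".join(apply_shift(wooden, RAISING)))
```

Under these assumptions, lowering maps the vowel of wooden from [ʊ] to [o], and raising maps it to [u]. The point of the sketch is only that each accent amounts to a single direction-consistent mapping over the whole back vowel series, rather than a set of unrelated vowel changes.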

5.4 Implications for sociolinguistics

The findings of this dissertation also have implications for the field of sociolinguistics. The study of sociolinguistic variation (broadly, investigations into the relationship between linguistic variation and social variation) has developed in three major ‘waves’ (Eckert, 2012). The first and second waves focused on ‘vernacular’ pronunciation variation (e.g., the realization of /θ/ as [t] in words like thing, or the realization of words like walking as walkin’) as it relates to apparently static social categories, such as social class, either at a macrosociological level (e.g., Labov, 1966; Trudgill, 1974; Wolfram, 1969) or at a local community level (Eckert, 2000; Milroy, 1980). The nuance in this work concerned the complex patterning of linguistic variation across groups of talkers. As a result, individual talkers were often treated, if only implicitly, as relatively stable and passive producers of pronunciation variation. The third wave of sociolinguistic research, which has gained increasing prominence over the last decade, marked a sharp reversal in this view, focusing on talkers’ stylistic practices as they use language to make social semiotic moves, such as conveying a stance, invoking power, or constructing a persona (Campbell-Kibler, 2007; Kiesling, 1998; Podesva, 2007; Zhang, 2008). Fundamental to this third wave of research on sociolinguistic variation is the concept of ‘indexical mutability’ (Eckert, 2012, p. 94), the notion that pronunciation variants do not carry a fixed social meaning, but rather can be combined and reinterpreted in myriad ways to create complex and layered meanings that serve a talker’s social purposes (see Eckert, 2008; Hebdige, 1984). Indeed, talkers are highly nuanced in their speech patterns: not only do talkers produce general patterns of variation that are relatively stable over time (e.g., regional dialect variants), they vary their speech patterns from context to context in intricate ways (e.g., Podesva, 2007) that listeners in turn perceive as socially meaningful (Campbell-Kibler, 2007).

While third wave sociolinguistic research has focused largely on the social, linguistic, and (to a lesser extent) cognitive processes involved in constructing social meaning through language (Eckert, 2008; Silverstein, 2003), this research is predicated on assumptions about listeners’ ability to decipher speech input and understand both linguistic and social meanings. The findings of the current experiments, combined with other recent research on adaptation in speech perception (for a review, see Samuel & Kraljic, 2009), suggest that talkers are allowed vast degrees of stylistic freedom in production in large part because the speech processing system is highly flexible and can rapidly accommodate considerable acoustic-phonetic deviations from canonical or familiar forms.

5.5 Limitations and future directions

One limitation of the current experiments stems from the nature of the test stimuli and the use of offline judgment and identification tasks to assess perceptual learning. Recall that the target vowel shift stimuli (e.g., “w[o]den” for wooden, cf. “w[ʊ]den”) were designed to sound like nonwords to listeners in the control conditions, who lacked knowledge of the relevant cross-category vowel shifts, but like real words if listeners in the adaptation conditions learned to cope with these vowel shifts. Recall also that across experiments the primary measures of interest were word/nonword judgments during lexical decision (Chapters 2, 3 and 4) and word identification accuracy on an offline transcription task (Chapters 2 and 4). Response times from these tasks were largely uninformative, given the lack of a correct response on target trials during lexical decision and given the confound of typing speed during the transcription task (see Chapter 2 for discussion). Given the reliance on offline word recognition measures without corresponding response times, it is not possible to distinguish definitively whether exposure to the novel vowel chain shifts influenced the perceptual representation of prelexical vowel categories, or whether listeners in the adaptation conditions developed strategic processes for coping with the unfamiliar vowel variants. A viable task strategy involves segment substitution. For example, as a result of exposure to an unfamiliar accent, listeners in the adaptation condition might have been less inclined to reject unfamiliar surface forms as nonwords without first attempting to match these forms to lexical representations in memory by substituting different vowel categories for the unfamiliar vowel variants.
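The segment-substitution strategy described above can be made concrete with a toy sketch. The lexicon, the broad IPA transcriptions, the set of back vowels, and the function name below are all hypothetical and chosen only for illustration; the sketch is not a model of how listeners actually performed the task, and it considers only single-vowel substitutions.

```python
# Toy lexicon keyed by hypothetical broad transcriptions (tuples of segments).
LEXICON = {
    ("w", "ʊ", "d", "ə", "n"): "wooden",
    ("b", "u", "t"): "boot",
    ("k", "ɔ", "t"): "caught",
}

# ASSUMPTION: the candidate substitutes are restricted to back vowels.
BACK_VOWELS = ["u", "ʊ", "o", "ɔ", "ɑ"]


def substitute_and_search(form, lexicon=LEXICON, vowels=BACK_VOWELS):
    """Return the words reachable from `form` by swapping one back vowel.

    A form that is already in the lexicon is returned directly, without
    any substitution. Otherwise each back vowel in the form is replaced,
    one position at a time, by every other back vowel, and each resulting
    candidate is looked up in the lexicon.
    """
    form = tuple(form)
    if form in lexicon:
        return [lexicon[form]]
    matches = []
    for i, segment in enumerate(form):
        if segment not in vowels:
            continue
        for vowel in vowels:
            if vowel == segment:
                continue
            candidate = form[:i] + (vowel,) + form[i + 1:]
            word = lexicon.get(candidate)
            if word is not None and word not in matches:
                matches.append(word)
    return matches


# A vowel-shifted surface form that matches no lexical entry directly.
print(substitute_and_search(["w", "o", "d", "ə", "n"]))
# A form for which no single back vowel substitution yields a word.
print(substitute_and_search(["z", "o", "p"]))
```

A strategy of this kind requires additional candidate generation and lookup beyond direct lexical matching, which is why recognition speed for vowel-shifted items is informative about whether listeners relied on it.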

The results from Experiment 4 (Chapter 3) provide partial evidence against a strategic account of the current data. The auditory naming task in this experiment was designed to allow a meaningful analysis of post-exposure recognition speed for the target vowel-shifted items. If listeners accommodated the atypical back vowel variants by adopting a (conscious) substitute-and-search strategy, the speed of recognition for words containing the shifted vowel variants should have been markedly slower than for words containing unshifted (standard-sounding) back vowels. However, the delay in word recognition for the back vowel lowered and back vowel raised test items, relative to the standard-sounding back vowel words, was only about 40 milliseconds on average. This delay was statistically significant, though the magnitude of the effect was small (βs = 0.11, ts = 2.1). Thus, if adaptation was strategic in nature, the task strategy operated quickly and likely below the level of conscious awareness, as opposed to listeners consciously substituting vowels to convert the nonword surface forms into real words. Note that across experiments, listeners in the adaptation conditions showed greater identification accuracy for the vowel-shifted stimuli than listeners in the control conditions. Thus, even if the current results reduce to task-specific strategies, they still provide evidence of exposure-driven flexibility in how spoken words are mapped to lexical representations in memory.

A second limitation of the current study is that the experiments only investigated orderly shifts among neighboring vowel categories. Further research is needed to understand how the structure of the vowel shift system constrains learning. A critical direction for this line of research is to test whether vowel shift systems have to involve orderly shifts among adjacent vowel categories to be learned, or whether listeners can adapt to vowel systems in which different vowels are shifted to varying degrees along a trajectory in one direction, or in which vowels are shifted randomly. A related issue concerns the extent to which perceptual learning of trained vowel variants generalizes to untrained vowel variants. The current study showed that as a result of exposure to the novel back vowel lowered accent, listeners became more tolerant of mismatch among vowel categories, which enabled them to better recognize words pronounced with both lowered (accent-consistent) and raised (accent-inconsistent) back vowels, relative to listeners in the control condition (see also Chapter 3). Given the nature of the test stimuli, it is unclear whether this increased tolerance for mismatch was limited to variation among neighboring back vowel categories, or whether listeners were somewhat more tolerant of vowel category mismatch overall. Previous research on perceptual learning of an unfamiliar vowel chain shift showed that listeners do not simply allow any vowel to substitute with any other vowel (Weatherholtz, 2013). However, the factors that determine or constrain generalization to untrained vowel shifts remain to be determined.

An important avenue for future research is to investigate the conditions that lead to direction-specific perceptual adjustments versus broadening of perceptual categories. Recent generative models of adaptation, such as the ideal adapter framework (Kleinschmidt & Jaeger, 2015), predict that listeners can adapt to segmental variation by adjusting either the expected mean of the target category (resulting in direction-specific perceptual shifts) or the expected category variance (resulting in broadening). Indeed, efforts to model behavioral perceptual recalibration results (e.g., Vroomen, van Linden, de Gelder, & Bertelson, 2007) using the ideal adapter framework revealed that these two learning strategies were roughly equally preferred (Kleinschmidt & Jaeger, 2015). An open empirical question concerns the exposure and/or environmental conditions that favor one learning strategy over the other, as well as the boundary conditions on category broadening: e.g., how far can perceptual categories be broadened before neutralizing category contrasts?
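The contrast between adjusting a category mean and adjusting a category variance can be illustrated with a minimal numerical sketch. The sketch below is not an implementation of the ideal adapter framework: it assumes a single vowel category represented as a Gaussian over one arbitrary cue dimension, and the parameter values, token values, and names are all hypothetical. It simply compares how likely an accent-consistent and an accent-inconsistent token are under each hypothetical post-exposure state, relative to the pre-exposure category.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical pre-exposure category on an arbitrary "vowel height" cue.
PRIOR_MEAN, PRIOR_SD = 0.0, 1.0

# Two hypothetical post-exposure states (values are illustrative only).
MEAN_SHIFT = {"mean": -1.5, "sd": 1.0}  # direction-specific adjustment
BROADENING = {"mean": 0.0, "sd": 2.0}   # wider category, same center

LOWERED_TOKEN = -1.5  # accent-consistent variant
RAISED_TOKEN = 1.5    # accent-inconsistent variant


def gain(token, state):
    """Ratio of post-exposure to pre-exposure likelihood for a token."""
    before = gaussian_pdf(token, PRIOR_MEAN, PRIOR_SD)
    after = gaussian_pdf(token, state["mean"], state["sd"])
    return after / before


for label, state in [("mean shift", MEAN_SHIFT), ("broadening", BROADENING)]:
    print(label,
          "lowered gain:", round(gain(LOWERED_TOKEN, state), 3),
          "raised gain:", round(gain(RAISED_TOKEN, state), 3))
```

With these illustrative values, shifting the mean makes the lowered token more likely and the raised token less likely than before exposure, whereas broadening makes both tokens more likely by the same factor. This mirrors, in a deliberately simplified form, the difference between direction-specific learning and an overall increase in tolerance for vowel category mismatch.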

References

Adank, P., Evans, B. G., Stuart-Smith, J., & Scott, S. K. (2009). Comprehension of familiar and unfamiliar native accents under adverse listening conditions. Journal of Experimental Psychology: Human Perception and Performance, 35 , 520-529.

Adank, P., & Janse, E. (2010). Comprehension of a novel accent by young and older listeners. Psychology and Aging, 25 , 736-740.

Ahn, S.-C. (2001). An optimality approach to chain shifts: Nasal vowel lowering in French. Language Research, 37 (2), 359-375.

Allen, J. S., & Miller, J. L. (2004). Listener sensitivity to individual talker differences in voice-onset-time. Journal of the Acoustical Society of America, 115 (6), 3171-3183.

Allen, J. S., Miller, J. L., & DeSteno, D. (2003). Individual talker differences in voice-onset-time. Journal of the Acoustical Society of America, 113 , 544-552.

Andruski, J. E., Blumstein, S. E., & Burton, M. (1994). The effect of subphonetic differences on lexical access. Cognition, 52 , 163-187.

Aslin, R. N., & Pisoni, D. B. (1980). Some developmental processes in speech perception. In G. H. Yeni-Komshian, J. F. Kavanagh, & C. A. Ferguson (Eds.), Child phonology: Volume 2, perception (p. 67-96). New York: Academic Press.

Baese-Berk, M. M., Bradlow, A. R., & Wright, B. A. (2013). Accent-independent adaptation to foreign accented speech. Journal of the Acoustical Society of America, 133 (3), 174-180.

Bardhan, N. P., Aslin, R. N., & Tanenhaus, M. K. (2006). Return of the weckud wetch: Rapid adaptation to a new accent. Journal of the Acoustical Society of America, 119 (5), 3423-3423.

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. (2013). Random-effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68 , 255-278.

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-7, http://CRAN.R-project.org/package=lme4.

Bates, D., Mullen, K. M., Nash, J. C., & Varadhan, R. (2014). minqa: Derivative-free optimization algorithms by quadratic approximation [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=minqa (R package version 1.2.3)

Bauer, M., & Parker, F. (2008). /æ/-raising in Wisconsin English. American Speech, 83 (4), 403-431.

Benson, E. J., Fox, M. J., & Balkman, J. (2011). The bag that Scott bought: The low vowels in northwest Wisconsin. American Speech, 86 (3), 271-311.

Bertelson, P., Vroomen, J., & de Gelder, B. (2003). Visual recalibration of auditory speech identification: A McGurk after effect. Psychological Science, 14 , 592-597.

Boberg, C. (2005). The Canadian Shift in Montreal. Language Variation and Change, 17 , 133-154.

Boersma, P., & Weenink, D. (2014). Praat: doing phonetics by computer. [Computer program]. Version 5.3.84, http://www.praat.org/.

Bradlow, A. R., & Bent, T. (2008). Perceptual adaptation to non-native speech. Cognition, 106 (2), 707-729.

Bradlow, A. R., Nygaard, L. C., & Pisoni, D. B. (1999). Effects of talker, rate, and amplitude variation on recognition memory for spoken words. Perception & Psychophysics, 61 (2), 206-219.

Brouwer, S. M., Mitterer, H., & Huettig, F. (2012). Speech reductions change the dynamics in competition during spoken word recognition. Language and Cognitive Processes, 27 (4), 539-571.

Brown, H., & Gaskell, M. G. (2014). The time-course of talker-specificity and lexical competition effects during word learning. Language, Cognition and Neuroscience.

Buchholz, L. K. (2009). Perceptual learning of dysarthric speech: Effects of familiarization and feedback (Unpublished master’s thesis). The University of British Columbia.

Campbell-Kibler, K. (2007). Accent, (ING), and the social logic of listener perception. American Speech, 82 (1), 32-64.

Campbell-Kibler, K., Walker, A., & Elward, S. (in preparation). Apparent-time change in the regional vowels of college students.

Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York: Harper and Row.

Church, B. A., & Schacter, D. L. (1994). Perceptual specificity of auditory priming: Implicit memory for voice intonation and fundamental frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20 , 521-533.

Clarke, C., & Garrett, M. F. (2004). Rapid adaptation to foreign-accented English. Journal of the Acoustical Society of America, 116 , 3647-3658.

Clarke, S., Elms, F., & Youssef, A. (1995). The Third Dialect of English: Some Canadian evidence. Language Variation and Change, 7 , 209-228.

Clopper, C. G., & Bradlow, A. (2008). Perception of dialect variation in noise: Intelligibility and classification. Language and Speech, 51 , 175-198.

Clopper, C. G., Pierrehumbert, J. B., & Tamati, N. T. (2010). Lexical neighborhoods and phonological confusability in cross-dialect word recognition in noise. Laboratory Phonology, 1 , 65-92.

Clopper, C. G., & Pisoni, D. B. (2004). Effects of talker variability on perceptual learning of dialects. Language and Speech, 47 , 207-239.

Connine, C. M., Blasko, D. G., & Titone, D. A. (1993). Do the beginnings of spoken words have a special status in auditory word recognition. Journal of Memory and Language, 32 , 193-210.

Connine, C. M., Blasko, D. G., & Wang, J. (1994). Vertical similarity in spoken word recognition: Multiple lexical activation, individual differences, and the role of sentence context. Perception and Psychophysics, 56 (6), 624-636.

Connine, C. M., Titone, D. A., Deelman, T., & Blasko, D. G. (1997). Similarity mapping in spoken word recognition. Journal of Memory and Language, 37 , 463-480.

Coseriu, E. (1958). Sincronía, diacronía e historia: el problema del cambio.

Craik, F. I. M., & Kirsner, K. (1974). The effect of speaker’s voice on word recognition. Quarterly Journal of Experimental Psychology, 26 , 274-284.

Cutler, A., Eisner, F., McQueen, J. M., & Norris, D. (2010). How abstract phonemic categories are necessary for coping with speaker-related variation. In C. Fougeron, B. Kühnert, M. D’Imperio, & N. Vallée (Eds.), Laboratory phonology 10 (p. 91-111). Berlin: de Gruyter.

Cutler, A., Sebastián-Gallés, N., Soler-Vilageliu, O., & van Ooijen, B. (2000). Constraints of vowels and consonants on lexical selection: Cross-linguistic comparisons. Memory & Cognition, 28 , 746-755.

Dahan, D., Drucker, S. J., & Scarborough, R. A. (2008). Talker adaptation in speech perception: Adjusting the signal or the representations. Cognition, 108 , 710-718.

Dahan, D., & Magnuson, J. (2006). Spoken word recognition. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (p. 249-283). Amsterdam: Academic Press.

Davis, M. H., & Gaskell, M. G. (2009). A complementary systems account of word learning: Neural and behavioural evidence. Philosophical Transactions of the Royal Society of London, Series B – Biological Sciences, 364 , 3773-3800.

Davis, M. H., Johnsrude, I., Hervais-Adelman, A., Taylor, K., & McGettigan, C. (2005). Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. Journal of Experimental Psychology: General, 134 , 222-241.

Davis, M. H., Marslen-Wilson, W. D., & Gaskell, M. G. (2002). Leading up the lexical garden-path: Segmentation and ambiguity in spoken word recognition. Journal of Experimental Psychology: Human Perception and Performance, 28 , 214-244.

Delle Luche, C., Poltrock, S., Goslin, J., New, B., Floccia, C., & Nazzi, T. (2014). Differential processing of consonants and vowels in the auditory modality: A cross-linguistic study. Journal of Memory and Language, 72 , 1-15.

Diependaele, K., Brysbaert, M., & Neri, P. (2012). How noisy is lexical decision? Frontiers in Psychology, 3 (348), doi: 10.3389/fpsyg.2012.00348.

Eckert, P. (2000). Linguistic variation as social practice. Oxford: Blackwell.

Eckert, P. (2008). Variation and the indexical field. Journal of Sociolinguistics, 12 , 453-476.

Eckert, P. (2012). Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation. The Annual Review of Anthropology, 41 , 87-100.

Eimas, P. D., & Corbit, J. D. (1973). Selective adaptation of linguistic feature detectors. Cognitive Psychology, 4 (1), 99-109.

Eisner, F. (2012). Perceptual learning in speech. In N. Seel (Ed.), Encyclopedia of the science of learning (p. 2583-2584). Berlin: Springer.

Eisner, F., & McQueen, J. M. (2005). The specificity of perceptual learning in speech processing. Perception & Psychophysics, 67 , 224-238.

Eisner, F., & McQueen, J. M. (2006). Perceptual learning in speech: Stability over time. Journal of the Acoustical Society of America, 119 (4), 1950-1953.

Eisner, F., Melinger, A., & Weber, A. (2013). Constraints on the transfer of perceptual learning in accented speech. Frontiers in Psychology, 4 , 1-9.

Elman, J. L., Diehl, R. L., & Buchwald, S. E. (1977). Perceptual switching in bilinguals. Journal of the Acoustical Society of America, 62 , 971-974.

Endzelin, J. (1922). Lettische grammatik. Riga: Lettischen Bildungsministerium.

Evans, B., & Iverson, P. (2004). Vowel normalization for accent: An investigation of best exemplar locations in Northern and Southern sentences. Journal of the Acoustical Society of America, 115 (1), 352-361.

Feagin, C. (1986). More evidence for vowel change in the South. In D. Sankoff (Ed.), Diversity and diachrony (p. 83-95). Amsterdam: John Benjamins.

Fenn, K. M., Nusbaum, H. C., & Pisoni, D. B. (2003). Consolidation during sleep of perceptual learning of spoken language. Nature, 425 , 614-616.

Feustel, T. C., Shiffrin, R. M., & Salasoo, A. (1983). Episodic and lexical contributions to the repetition effect in word recognition. Journal of Experimental Psychology: General, 112 , 309-346.

Fine, A. B., & Jaeger, T. F. (2013). Evidence for implicit learning in syntactic comprehension. Cognitive Science, 1-14.

Flege, J. (1992). Speech learning in a second language. In C. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phonological development: Models, research, and implications (p. 565-604). Timonium, MD: York Press.

Flege, J., Bohn, O.-S., & Jang, S. (1997). The effect of experience on nonnative subjects’ production and perception of English vowels. Journal of Phonetics, 25 , 437-470.

Floccia, C., Goslin, J., Girard, F., & Konopczynski, G. (2006). Does a regional accent perturb speech processing? Journal of Experimental Psychology: Human Perception and Performance, 32 (5), 1276-1293.

Fox, R. A., Flege, J. E., & Munro, M. J. (1995). The perception of English and Spanish vowels by native English and Spanish listeners: A multidimensional scaling analysis. Journal of the Acoustical Society of America, 97 , 2540-2551.

Francis, A. L., & Nusbaum, H. C. (2002). Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance, 28 (2), 349-366.

Frauenfelder, U. H., Scholten, M., & Content, A. (2001). Bottom-up inhibition in lexical selection: Phonological mismatch effects in spoken word recognition. Language and Cognitive Processes, 16 (5/6), 583-607.

Gaskell, M. G., & Marslen-Wilson, W. D. (1997). Integrating form and meaning: A distributed model of speech perception. Language and Cognitive Processes, 12 , 613-656.

Gass, S., & Varonis, E. M. (1984). The effect of familiarity on the comprehensibility of nonnative speech. Language Learning, 34 (1).

Goldinger, S. D. (1996). Words and voices: Episodic traces in spoken word identification and recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 22 , 1166-1183.

Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105 , 251-279.

Goldinger, S. D. (2007). A complementary-systems approach to abstract and episodic speech perception. In Proceedings of the international congress of phonetic sciences.

Goldinger, S. D., Kleider, H. M., & Shelley, E. (1999). The marriage of perception and memory: Creating two-way illusions with words and voices. Memory & Cognition, 27 , 328-338.

Goldstone, R. L. (1998). Perceptual learning. Annual Review of Psychology, 49 , 585-612.

Greenspan, S. L., Nusbaum, H. C., & Pisoni, D. B. (1988). Perceptual learning of synthetic speech produced by rule. Journal of Experimental Psychology: Learning, Memory & Cognition, 14 , 421-433.

Grimm, J. (1822). Deutsche grammatik (No. v. 1). Dieterichsche buchhandlung.

Grossberg, S. D. (1980). How does a brain build a cognitive code? Psychological Review, 87 , 1-51.

Grossberg, S. D. (1986). The adaptive self-organization of serial order in behaviour: Speech, language and motor control. In E. C. Schwab & H. C. Nusbaum (Eds.), Pattern recognition by humans and machines. speech perception (Vol. 1, p. 187-294). New York, NY: Academic Press.

Grossberg, S. D. (2013). Adaptive Resonance Theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Networks, 37 , 1-47.

Hay, J., Nolan, A., & Drager, K. (2006). From fush to feesh: Exemplar priming in speech perception. The Linguistic Review, 23 , 351-379.

Hebdige, D. (1984). Subculture: The meaning of style. New York: Methuen.

Helson, H. (1948). Adaptation-level as a basis for a quantitative theory of frames of reference. Psychological Review, 55 (6), 297-313.

Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97 (5), 3099-3111.

Hinton, L., Moonwoman, B., Bremner, S., Luthin, H., Van Clay, M., Lerner, J., & Corcoran, H. (1987). It’s not just the Valley Girls: A study of California English. In J. Aske, N. Beery, L. Michaelis, & H. Filip (Eds.), Proceedings of the 13th annual meeting of the Berkeley Linguistics Society (p. 117-128). Berkeley, CA: Berkeley Linguistics Society.

Idemaru, K., & Holt, L. L. (2011). Word recognition reflects dimension-based statistical learning. Journal of Experimental Psychology: Human Perception and Performance, 37 , 1939-1956.

Jacewicz, E., & Fox, R. A. (2012). The effects of cross-generational and cross-dialectal variation of vowel identification and classification. Journal of the Acoustical Society of America, 131 , 1413-1433.

Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59 , 434-446.

Jakobson, R. (1931). Uber¨ die phonologischen Sprachb¨unde. Travaux du Cercle Linguistique de Prague, 4 .

Jongman, A., Wade, T., & Sereno, J. (2003). On improving the perception of foreign-accented speech. In M. J. Sole, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th international congress on phonetic sciences (p. 1561-1564).

Karni, A., & Bertini, G. (1997). Learning perceptual skills: Behavioral probes into adult cortical plasticity. Current Opinion in Neurobiology, 7 (4), 530-535.

Kerswill, P. (2003). Dialect levelling and geographical diffusion in British English. In D. Britain & J. Cheshire (Eds.), Social dialectology. in honour of Peter Trudgill (p. 223-243). Benjamins.

Kiesling, S. (1998). Men’s identities and sociolinguistic variation: The case of fraternity men. Journal of Sociolinguistics, 2 , 69-100.

Klatt, D. H. (1979). Speech perception: A model of acoustic-phonetic analysis and lexical access. Journal of Phonetics, 7 , 279-312.

Klatt, D. H. (1986). The problem of variability in speech recognition and models of speech perception. In J. S. Perkell & D. H. Klatt (Eds.), Invariance and variability in speech processing (p. 300-319). Hillsdale, NJ: Lawrence Erlbaum Associates.

Kleinschmidt, D. F., & Jaeger, T. F. (2015). Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel. Psychological Review.

Kraljic, T., Brennan, S. E., & Samuel, A. G. (2008). Accommodating variation: Dialects, idiolects, and speech processing. Cognition, 107 , 54-81.

Kraljic, T., & Samuel, A. G. (2005). Perceptual learning for speech: Is there a return to normal? Cognitive Psychology, 51 , 141-178.

Kraljic, T., & Samuel, A. G. (2006). Generalization in perceptual learning for speech. Psychonomic Bulletin and Review, 13 , 262-268.

Kraljic, T., & Samuel, A. G. (2007). Perceptual adjustments to multiple speakers. Journal of Memory and Language, 56 , 1-15.

Kraljic, T., Samuel, A. G., & Brennan, S. E. (2009). First impressions and last resorts: How listeners adjust to speaker variability. Psychological Science, 19 , 332-338.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99 (1), 22-44.

Labov, W. (1966). The social stratification of English in New York City. Washington, D.C.: Center for Applied Linguistics.

Labov, W. (1994). Principles of linguistic change. volume 1: Internal factors. Oxford: Blackwell.

Labov, W. (1998). The three dialects of English. In M. D. Linn (Ed.), Handbook of dialects and language variation (p. 39-81). San Diego: Academic Press.

Labov, W., Ash, S., & Boberg, C. (2006). The atlas of North American English. Berlin: Mouton de Gruyter.

Ladefoged, P. (2000). A course in phonetics, 4th edition. Boston: Thomson Wadsworth.

Ladefoged, P., & Broadbent, D. E. (1957). Information conveyed by vowels. Journal of the Acoustical Society of America, 29 , 98-104.

Lattner, S., Maess, B., Wang, Y., Schauer, M., Alter, K., & Friederici, A. D. (2003). Dissociation of human and computer voices in the brain: Evidence for a preattentive gestalt-like perception. Human Brain Mapping, 13-21.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74 (6), 431-461.

Liljencrants, J., & Lindblom, B. (1972). Numerical simulation of vowel quality systems: The role of perceptual contrast. Language, 48 (4), 839-862.

Lively, S. E., Logan, J. S., & Pisoni, D. B. (1993). Training Japanese listeners to identify English /r/ and /l/: II. The role of phonetic environment and talker variability in learning new perceptual categories. Journal of the Acoustical Society of America, 94 , 1242-1255.

LoCasto, P. C., & Connine, C. M. (2011). Processing no-release variants in connected speech. Language and Speech, 54 (2), 181-197.

Loebach, J. L., Bent, T., & Pisoni, D. B. (2008). Multiple routes to the perceptual learning of speech. Journal of the Acoustical Society of America, 142 (1).

Logan, J. S., Lively, S. E., & Pisoni, D. B. (1991). Training Japanese listeners to identify English /r/ and /l/: A first report. Journal of the Acoustical Society of America, 89 (2), 874-876.

Love, B. C. (2003). The multifaceted nature of unsupervised category learning. Psychonomic Bulletin & Review, 10 (1), 190-197.

Lubowicz, A. (2011). Chain shifts. In M. van Oostendorp, C. Ewen, B. Hume, & K. Rice (Eds.), Companion to phonology. Wiley-Blackwell.

Luce, P. A., Goldinger, S. D., Auer, E. T., & Vitevitch, M. S. (2000). Phonetic priming, neighborhood activation, and PARSYN. Perception & Psychophysics, 62 , 615-625.

Maddieson, I. (1984). Patterns of sounds. Cambridge: Cambridge University Press.

Marslen-Wilson, W., Moss, H. E., & van Halen, S. (1996). Perceptual distance and competition in lexical access. Journal of Experimental Psychology: Human Perception and Performance, 22 (6), 1376-1392.

Martinet, A. (1952). Function, structure, and sound change. Word, 8 (1), 1-32.

Martinet, A. (1955). Économie des changements phonétiques. Bern: Francke.

Mattys, S. L., Davis, M. H., Bradlow, A. R., & Scott, S. K. (2012). Speech recognition in adverse conditions: A review. Language and Cognitive Processes, 27 , 953-978.

Maye, J., Aslin, R., & Tanenhaus, M. (2008). The Weckud Wetch of the Wast: Lexical adaptation to a novel accent. Cognitive Science, 32 , 543-562.

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18 , 1-86.

McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102 , 419-457.

McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation, 24 , 109-165.

McLennan, C. T., Luce, P. A., & Charles-Luce, J. (2005). Representation of lexical form: Evidence from studies of sublexical ambiguity. Journal of Experimental Psychology: Human Perception and Performance, 31 , 1308-1314.

McQueen, J. M., Cutler, A., & Norris, D. (2006). Phonological abstraction in the mental lexicon. Cognitive Science, 30 , 1113-1126.

McQueen, J. M., & Huettig, F. (2012). Changing only the probability that spoken words will be distorted changes how they are recognized. Journal of the Acoustical Society of America, 131 (1), 509-517.

Miller, J. D. (1997). Internal structure of phonetic categories. Language and Cognitive Processes, 12 (5/6), 865-869.

Milroy, L. (1980). Language and social networks. Oxford: Blackwell.

Mines, M. A., Hanson, B. F., & Shoup, J. E. (1978). Frequency of occurrence of phonemes in conversational English. Language and Speech, 21 (3), 221-241.

Mitterer, H., Chen, Y., & Zhou, X. (2011). Phonological abstraction in processing lexical-tone variation: Evidence from a learning paradigm. Cognitive Science, 35 , 184-197.

Mitterer, H., Scharenborg, O., & McQueen, J. M. (2013). Phonological abstraction without phonemes in speech perception. Cognition, 129 , 356-361.

Neger, T. M., Rietveld, T., & Janse, E. (2014). Relationship between perceptual learning in speech and statistical learning in younger and older adults. Frontiers in Human Neuroscience, 8 (628), doi: 10.3389/fnhum.2014.00628.

Nespor, M., Peña, M., & Mehler, J. (2003). On the different roles of vowels and consonants in speech processing and language acquisition. Lingue e Linguaggio, 2 , 221-247.

Nordström, P.-E., & Lindblom, B. (1975). A normalization procedure for vowel formant data. In Proceedings of the 8th International Congress of Phonetic Sciences (p. 212).

Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences, 23 , 299-325.

Norris, D., McQueen, J. M., & Cutler, A. (2003). Perceptual learning in speech. Cognitive Psychology, 47 , 204-238.

Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60 , 355-376.

Nygaard, L. C., Sommers, M. C., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5 , 42-46.

Ohala, J. J., Beddor, P. S., Krakow, R. A., & Goldstein, L. M. (1986). Perceptual constraints and : A study of nasal vowel height. Phonology, 3 (1), 197-217.

O’Reilly, R. C., & Norman, K. A. (2002). Hippocampal and neocortical contributions to memory: Advances in the complementary learning systems framework. TRENDS in Cognitive Science, 6 , 505-510.

Palmeri, T. J., Goldinger, S. D., & Pisoni, D. B. (1993). Episodic encoding of voice attributes and recognition memory for spoken words. Journal of Experimental Psychology: Learning, Memory and Cognition, 19 (2), 309-328.

Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24 , 175-184.

Pisoni, D. B. (1997). Some thoughts on ‘normalization’ in speech perception. In K. Johnson & J. W. Mullennix (Eds.), Talker variability in speech processing (p. 9-33). San Diego: Academic Press.

Podesva, R. J. (2007). Phonation type as a stylistic variable: The use of falsetto in constructing a persona. Journal of Sociolinguistics, 11 , 478-504.

Posner, M. I., & Keele, S. W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77 , 353-363.

Purnell, T. C. (2008). Prevelar raising and phonetic conditioning: The role of labial and anterior tongue gestures. American Speech, 83 (4), 373-402.

R Core Team. (2014). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from http://www.R-project.org/

Ranbom, L. J., & Connine, C. (2007). Lexical representation of phonological variation in spoken word recognition. Journal of Memory and Language, 57 , 273-298.

Ranbom, L. J., Connine, C., & Yudman, E. M. (2009). Is phonological context always used to recognize variant forms in spoken word recognition: The role of variant frequency and context distribution. Journal of Experimental Psychology: Human Perception and Performance, 35 , 1205-1220.

Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97 (2), 285-308.

Reinisch, E., & Holt, L. L. (2014). Lexically-guided phonetic retuning of foreign-accented speech and its generalization. Journal of Experimental Psychology: Human Perception and Performance, 40 , 539-555.

Roberts, A. H. (1965). A statistical linguistic analysis of American English. The Hague: Mouton.

Samuel, A. G., & Kraljic, T. (2009). Perceptual learning for speech. Attention, Perception, & Psychophysics, 71 (6), 1207-1218.

Schacter, D. L., & Church, B. A. (1992). Auditory priming: Implicit and explicit memory for words and voices. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18 , 915-930.

Schneider, W., Eschman, A., & Zuccolotto, A. (2012). E-Prime user’s guide [Computer software manual]. Pittsburgh: Psychology Software Tools, Inc.

Sidaras, S., Alexander, J. E. D., & Nygaard, L. C. (2009). Perceptual learning of systematic variation in Spanish-accented speech. Journal of the Acoustical Society of America, 125 , 3306-3316.

Silverstein, M. (2003). Indexical order and the dialects of sociolinguistic life. Language and Communication, 23 , 193-229.

Sjerps, M. J., & McQueen, J. M. (2010). The bounds of flexibility in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 36 (1), 195-211.

Smither, J. A.-A. (1993). Short term memory demands in processing synthetic speech by old and young adults. Behavior & Information Technology, 12 , 330-335.

Sumner, M. (2011). The role of variation in the perception of accented speech. Cognition, 119 , 131-136.

Sumner, M., & Samuel, A. G. (2005). Perception and representation of regular variation: The case of word-final /t/. Journal of Memory and Language, 52 , 322-338.

Sumner, M., & Samuel, A. G. (2009). The effect of experience on the perception and representation of dialect variation. Journal of Memory and Language, 60 , 487-501.

Trude, A. M., & Brown-Schmidt, S. (2012). Talker-specific perceptual adaptation during online speech perception. Language and Cognitive Processes, 27 , 979-1001.

Trude, A. M., Tremblay, A., & Brown-Schmidt, S. (2013). Limitations on adaptation to foreign accents. Journal of Memory and Language, 69 (3), 349-367.

Trudgill, P. (1974). The social differentiation of English in Norwich. Cambridge, UK: Cambridge University Press.

Tzeng, C. Y., & Nygaard, L. C. (2012). The effect of training structure on perceptual learning of accented speech. Journal of the Acoustical Society of America, 131 , 3310.

Utman, J. A., Blumstein, S. E., & Burton, M. W. (2000). Effects of subphonetic and syllable structure on word recognition. Perception & Psychophysics, 62 (6), 1297-1311.

Utman, J. A., Blumstein, S. E., & Sullivan, K. (2001). Mapping from sound to meaning: Reduced lexical activation in Broca’s aphasics. Brain and Language, 79 , 444-472.

van Ooijen, B. (1996). Vowel mutability and lexical selection in English: Evidence from a word recognition task. Memory & Cognition, 24 (5), 573-583.

Vroomen, J., van Linden, S., de Gelder, B., & Bertelson, P. (2007). Visual recalibration and selective adaptation in auditory-visual speech perception: Contrasting build-up courses. Neuropsychologia, 45 , 572-577.

Vroomen, J., van Linden, S., Keetels, M., de Gelder, B., & Bertelson, P. (2004). Selective adaptation and recalibration of auditory speech by lipread information: Dissipation. Speech Communication, 44 , 55-61.

Weatherholtz, K. (2013). Is F1 different from F2?: Generalization of lexically-driven perceptual learning. (Talk presented at The 87th Annual Meeting of the Linguistic Society of America, Boston, MA.)

Weber, A., Di Betta, A. M., & McQueen, J. M. (2014). Treack or trit: Adaptation to genuine and arbitrary foreign accents by monolingual and bilingual listeners. Journal of Phonetics, 46 , 34-51.

Weinreich, U., Labov, W., & Herzog, M. I. (1968). Empirical foundations for a theory of language change. In Directions for historical linguistics (p. 95-105). Austin, TX: University of Texas Press.

Wells, J. C. (1982). Accents of English: The British Isles (Vol. 2). Cambridge University Press.

White, K., & Aslin, R. (2011). Adaptation to novel accents by toddlers. Developmental Science, 14 , 372-84.

White, M., Rajkumar, R., Ito, K., & Speer, S. (2009). Eye tracking for the online evaluation of prosody in speech synthesis: Not so fast! In Proceedings of the 10th annual conference of the international speech communication association. Brighton, U.K.

Wolfram, W. (1969). A sociolinguistic description of Detroit Negro speech. Washington, D.C.: Center for Applied Linguistics.

Yamauchi, T., Love, B. C., & Markman, A. B. (2002). Learning non-linearly separable categories by inference and classification. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28 , 585-593.

Yang, J., & Fox, R. A. (2014). Perception of English vowels by bilingual Chinese-English and corresponding monolingual listeners. Language and Speech, 57 (2), 215-237.

Yuan, J., & Liberman, M. (2008). Speaker identification on the SCOTUS corpus. In Proceedings of acoustics ‘08.

Zeelenberg, R., Wagenmakers, E.-J., & Shiffrin, R. (2004). Nonword repetition priming in lexical decision reverses as a function of study task and speed stress. Journal of Experimental Psychology: Learning, Memory & Cognition, 30 , 270-77.

Zeller, C. (1997). The investigation of a sound change in progress: /æ/ to /e/ in Midwestern American English. Journal of English Linguistics, 25 , 142-155.

Zhang, Q. (2008). Rhotacization and the “Beijing Smooth Operator”: The social meaning of a linguistic variable. Journal of Sociolinguistics, 12 , 201-222.

Appendix A

Stimulus materials for Experiments 1-3

A.1 Complete high token variability exposure passage for Experiment 1

THE ADVENTURES OF PINOCCHIO

CHAPTER ONE: In which it is explained how Maestro Cherry, carpenter, found a piece of wood that wept and laughed like a child.

Centuries ago there lived —

“A king!” my little readers will say immediately.

No, children, not even close. Once upon a time there was a piece of wood. It was not an expensive piece of wood. Far from it. Just a common sort of firewood, one of those thick, solid logs, the only use for which is to throw on the fire during winter to make a cold room into a warm and homely abode.

I don’t know how this happened, yet the fact remains that one morning, which began as a normal morning, but quickly became anything but normal, this piece of wood found itself in the shop of an old carpenter. His real name was Maestro Antonio, but everyone called him Maestro Cherry, for the tip of his nose was so round and red and shiny that it looked like a ripe cherry.

As soon as he saw that piece of wood, Maestro Cherry was filled with joy. Rubbing his hands together happily, he mumbled half to himself:

“This has come in the nick of time. I have almost finished making a table, and this very morning I shall use this wood to make the final leg.”

He took hold of his hatchet, ready to peel off the bark and shape the wood. But as he was about to give it the first blow, he stood still with arm uplifted, for he had noticed a wee, little voice say in a beseeching tone:

“Please be careful! Don’t hit me so hard!”

What a look of surprise shone upon Maestro Cherry’s face! His funny face became still funnier. He turned frightened eyes about the room to find where that wee, little voice had come from and he saw no one! He looked under the bench – no one! He peeped inside the closet – no one! He searched among the shavings – no one! He opened the door and looked up and down the street – and still no one!

“Oh, I see” he said, with a laugh and a snort. “It can easily be seen that I only thought I heard the tiny voice say the words! I only imagined it. Well, to work once more.”

He struck a solemn blow upon the piece of wood. Before he could deliver a second blow, he heard the same little voice cry out:

“Oh, oh! You hurt!”

Maestro Cherry grew dumb, his mouth opened wide, his tongue hung down on his chin, his eyes almost popped out of his head, and his nose (which was normally as red as a bouquet of roses) turned every color in the rainbow. As soon as this rainbow of color faded and he regained the use of his senses, he looked all around the room. Finding no one there, he said in a trembling voice:

“How has someone spoken to me, when there is no one in this room? Might it be that this piece of wood has learned to weep and cry like a child? I can hardly believe it. Here it is – a piece of common firewood, whose only use is to burn in the stove, the same as any other. Yet – might some buffoon be hidden inside, or some sort of joker looking for a laugh? If so, the worse for him. I’ll fix him!”

With these words, he grabbed the log with both hands and started to knock it about, showing absolutely no mercy. He shook it violently. He threw it to the floor, against the walls of the room, and up in the air almost to the ceiling. He picked it up from the floor and shook it again. He then stood very still and listened for the tiny voice to groan and cry. He waited two minutes – nothing.

“What voice just spoke to me?”, he called.

No response.

He waited five minutes – nothing.

“You have spoken. I heard you”, he shouted again.

No response.

He stood there for ten minutes – and still nothing.

“Oh, I see” he said, trying bravely to laugh and ruffling up his wig with his hand. “It can easily be seen I only imagined that a tiny voice had spoken to me! Well, well – to work once more!”

The poor fellow had almost died from fright, so he tried to sing a little song in order to gain courage. He set aside the hatchet and picked up sanding paper to make the wood smooth. But as he moved the sanding paper to and fro, fro and to, back and forth, forth and back, he noticed the same tiny voice. This time it giggled in a goofy manner as it spoke:

“Stop it! Oh, stop it! Ha, ha, ha! You tickle my stomach.”

This time poor Maestro Cherry fell to the floor motionless as if he’d been shot. He couldn’t move a muscle. When he opened his eyes, he found himself sitting on the floor. His face had changed; fright had turned even the tip of his nose from red to deepest purple.

CHAPTER TWO: In which Maestro Cherry gives the piece of wood to his friend Geppetto, who takes it to make himself a Marionette that will dance, fence, and turn somersaults.

In that very moment, a truly awful bang sounded upon the door.

“Come in,” said the carpenter, not having an atom of strength left with which to stand.

At the words, he saw the door open and a well groomed man stood in the doorway. The man’s name was Geppetto, but to the boys of the neighborhood, he was Polendina, another word for cornmeal mush, because of the wig he always wore which was just the color of yellow corn.

Geppetto was a good man. He had a sense of humor about most things and was quite often in a pleasant mood. But sometimes he would lose his temper. If anyone called him Polendina, he became as wild as a beast and no one could soothe him!

“Good morning, Maestro Antonio,” said Geppetto. “I can’t help but notice that you are down there on the floor. What are you doing there?”

“I am teaching the ants their ABC’s.”

“Good luck!”, replied Geppetto.

“What brought you here?”, asked Antonio.

“My legs and the shoes on my feet,” said Geppetto. “And Maestro Antonio it may flatter you to know that I have come to beg for a sort of favor.”

“Here I am, at your service,” answered the carpenter, as he moved to his knees.

“This morning, a truly fine idea came to me.”

“Let’s hear it.”

“I thought of making myself a gorgeous wooden Marionette. It must be truly wonderful, one that will be able to dance, fence, and turn somersaults. With it I intend to go around the world, to earn my crust of bread and cup of wine. What do you think of it?”

“Bravo, Polendina!” cried the same tiny voice which came from no one quite knew.

Upon hearing himself called Polendina, Maestro Geppetto turned the color of a red pepper and, facing the carpenter, said to him angrily:

“Why do you choose to insult me?”

“Who is insulting you?”

“You called me Polendina.” said Geppetto as he shook his finger in Antonio’s face.

“I did not.”

“I suppose you think I did! Yet I KNOW it was you.”

“No!”

“Yes!”

“No!”

“Yes!”

And growing angrier each moment, they went from words to blows, and finally began to throw punches and to scratch and bite and slap each other. When the fight was over, Maestro Antonio had Geppetto’s yellow wig in his hands and Geppetto found the carpenter’s curly wig in his mouth.

“Give me back my wig!” shouted Maestro Antonio in a brusque tone of voice.

“You return mine and we’ll be friends.”

The two little old men, each with his own wig back upon his own head, shook hands and swore an oath to be good friends for the rest of their lives. But this promise of friendship wouldn’t last the morning.

“Well then, Maestro Geppetto,” said the carpenter. “To prove that I bear no ill will, I will offer you whatever you want.”

“I only want a piece of wood to make a Marionette. Can you offer me that?”

Maestro Antonio knew exactly which piece he’d choose for Geppetto. He looked at the piece of wood that had frightened him so much, and he went immediately to get it. But as he was about to offer it to his friend, with a violent jerk it slipped out of his hands and hit against poor Geppetto’s thin legs.

“Ah! Is this the gentle way, Maestro Antonio in which you offer your gifts? You have almost made me lame!”

“I swear to you, I didn’t do it!”

“Ahh, Of course! It’s my own fault.” said Geppetto as he shook his head from side to side.

“No, It’s the fault of this piece of wood.”

“Technically that is most correct; but remember you were the one to throw it at my legs.”

“I did not throw it!”

“Liar! You did throw it!”, yelled Geppetto.

“Geppetto, do not insult me or I will call you Polendina.”

“Idiot!” yelled Geppetto.

“Polendina!”, offered Antonio in return.

“Ugly monkey!”

“Polendina!”

“Donkey!”

“Polendina!”

At that moment, upon hearing himself called Polendina for the third time, Geppetto was overcome with the most incredible typhoon of rage.

“You have spoken your last words”, he called, and with that he threw himself upon the carpenter. Then and there they gave each other a sound thrashing.

Geppetto took his left foot and stamped on Polendina’s right foot. Polendina returned this blow by stamping his right foot on Geppetto’s left foot. They continued until they could not fight any longer, which in all honesty took less than five minutes. After this fight, Maestro Antonio had two more scratches on his nose, and Geppetto had an awful bruise on his cheek and two buttons missing from his waistcoat. Thus having settled their accounts, they shook hands and swore to be truly good friends for the rest of their lives.

Then Geppetto took the fine piece of wood, thanked Maestro Antonio, and limped away toward home, tripping upon loose stones in the street.

CHAPTER THREE: During which we learn that as soon as Geppetto gets home, he fashions the Marionette and calls it Pinocchio. Also, the first pranks of the Marionette.

Little as Geppetto’s house was, it was neat and sort of comfortable. It was a small room on the ground floor, with a tiny window under the stairway. The furniture couldn’t have been simpler: a very old chair, a rickety old bed, and a tumble down table. A fireplace full of burning logs was painted on the wall opposite the door, complete with ashes and smoke and a mantle of brick and stone. Over the fire, there was painted an iron pot full of vegetable stew which kept boiling happily away and sending up clouds of what looked like real steam.

In one corner of the room, there was a small stove for making soup. And on the floor close by was a bushel of potatoes and a small bag of yellow squash and some utensils.

As soon as he reached home, Geppetto took his tools and began to cut and shape the wood into a Marionette.

“What name shall I choose for him?” he said to himself. “I think I’ll choose to call him PINOCCHIO. This name will make his fortune. I knew a whole family of Pinocchi once – Pinocchio the father, Pinocchiae the mother, and Pinocchiai the children – and they were all lucky. The richest of them begged for his living.”

After choosing the name for his Marionette, Geppetto set seriously to work to make the hair, the forehead, the eyes. Fancy his surprise when he noticed that these eyes moved and then stared fixedly at him. Geppetto, seeing this, felt insulted and said in a grieved tone:

“Awful wooden eyes, why do you stare so?”

There was no answer.

After the eyes, Geppetto made the nose, which began to stretch as soon as finished. It stretched out, becoming longer and longer and longer – and longer still – until it became so long, it seemed endless. Poor Geppetto kept cutting it and cutting it, but the more he cut, the longer that impertinent nose grew. In despair he let it alone.

Next he made the mouth. No sooner was it finished than it began to laugh and poke fun at him.

“Stop laughing!” said Geppetto angrily. “Please be a good boy and use your manners.”

But he might as well have spoken to the wall.

“Stop laughing, I say!” he roared in a voice of thunder.

The mouth stopped laughing, but it stuck out a long tongue.

Not wishing to start an argument, Geppetto made believe he saw nothing and continued with his work. After the mouth, he made the chin, then the neck. After a few short moments, he set to work on the shoulders, the stomach, and the arms.

“What shall I choose to make next”, Geppetto wondered. “Ahh yes, the hands.”

He used several small chisels and took great care to get them just right. As he was putting the last touches on the finger tips, Geppetto felt his wig being pulled off his head.

He glanced up and what did he see? His yellow wig was in the Marionette’s hand.

“Pinocchio, put it back! And please stop moving around!”

But instead, Pinocchio put it on his own head, which was half swallowed up in it. At that moment, Geppetto became truly sad and downcast, more so than he had ever been before.

“Pinocchio, you wicked boy!” he cried out. “You are not yet finished, and you start out by being impudent to your poor father. Very bad, my son, very bad! Why can’t you be good?”

And he wiped away a tear.

Pinocchio’s legs and feet still had to be made. Geppetto set to work making the left foot, then the right foot. As soon as they were done, Geppetto felt a sharp kick on the tip of his nose.

“I deserve it!” he said to himself. “I should have thought of this before I made him. And I shouldn’t have gotten so close to his feet. Now it’s too late!”

By mid morning, Geppetto had finished most of the work and was feeling quite good.

“I think I will rest for a moment and eat some soup,” he said.

At hearing this, Pinocchio grabbed the only spoon in the room and hid it under some sanding paper. Geppetto looked around and became very confused.

“Where is my spoon?”, he asked aloud. “I cannot eat soup without it.”

Failing to find it, he attempted to use a fork, but the broth kept dripping on his legs.

So he gave up and resumed work on the Marionette, finishing the final details.

He couldn’t decide whether Pinocchio should be a barefoot Marionette or whether to give him shoes. He decided to think it over for a while and in the meantime he took hold of the Marionette under the arms and put him on the floor.

Pinocchio’s legs were so stiff that he couldn’t move them, and Geppetto held his hand and showed him how to put one foot after the other, and how to use his hands for balance.

When his legs were mostly limbered up, Pinocchio started moving by himself and ran all about the room. He looked around and noticed the open door. He took off running, and with one leap he was out the door and racing down the cobble stone street. Poor Geppetto ran after him but was unable to catch him, for Pinocchio ran in leaps and bounds, his two wooden feet, as they beat upon the stones of the street, making the most awful noise, as loud as twenty peasants in wooden shoes.

“Catch him! Catch him!” Geppetto kept shouting.

But the people in the street, seeing a wooden Marionette running through the alleyways, stood still to stare and to laugh until they cried.

At last, by sheer luck, a policeman came along. He noticed all the noise, and thought that an animal had gotten loose, maybe a horse or a donkey, or maybe just a silly goose.

The policeman stood bravely in the middle of the street, with legs wide apart, and with firm resolve to stop the animal and prevent any trouble.

Pinocchio only saw the policeman once he was close. He tried his best to escape between the legs of the big fellow, and he almost made it.

But as soon as Pinocchio got close enough, the policeman took hold of him by the nose (which was longer than any he’d ever seen, and which seemed made on purpose for grabbing of this sort). Having thus stopped Pinocchio from moving, the policeman then returned him to Maestro Geppetto.

At that moment, the little old man was so mad he just wanted to pull Pinocchio’s ears.

Imagine the sort of emotions he felt when he looked at Pinocchio and discovered that he had forgotten to make them!

So he did the only thing he could do. He took Pinocchio by the back of the neck and took him home. As he was doing so, he shook him two or three times. He stood quietly for several moments, thinking deeply. He then shook Pinocchio again and said to him angrily:

“We’re going home now. When we get home, we’ll settle this matter!”

Pinocchio, upon hearing this, threw himself on the ground and wouldn’t take another step. One person after another gathered around the two. Some said one thing, some another.

“Poor Marionette,” called one man. “I am not surprised he doesn’t want to go home. Geppetto will beat him horribly, striking him with blow after blow. No doubt, Geppetto is truly awful! There is no shred of goodness or normalcy in him.”

“Geppetto looks like a good man,” offered another, “but with boys he’s a real tyrant. If we leave that poor Marionette in his hands he may tear him to pieces right here in the cobble stone streets!”

They said so much that, finally, the policeman ended matters his way. He set Pinocchio at liberty and took Geppetto to prison. The poor old fellow did not know how to defend himself, but wept and wailed like a child and said between his sobs:

“Ungrateful boy! I tried to offer you the world. I tried and tried to make you a normal, well-behaved Marionette! But you are nowhere close to that sort of doll. You are awful and mean. I deserve it, however! I should have given the matter more thought.”

What happened after this moment is an almost unbelievable story, but you will have to read it yourself.

A.2 Complete medium token variability exposure passage for Experiment 2

THE ADVENTURES OF PINOCCHIO

CHAPTER ONE:

I don’t know how this happened, yet the fact remains that one morning, which began as a normal morning, but quickly became anything but normal, this piece of wood found itself in the shop of an old carpenter. His real name was Maestro Antonio, but everyone called him Maestro Cherry, for the tip of his nose was so round and red and shiny that it looked like a ripe cherry.

As soon as he saw that piece of wood, Maestro Cherry was filled with joy. Rubbing his hands together happily, he mumbled half to himself:

“This has come in the nick of time. I have almost finished making a table, and this very morning I shall use this wood to make the final leg.”

He took hold of his hatchet, ready to peel off the bark and shape the wood. But as he was about to give it the first blow, he stood still, for he had noticed a wee, little voice say in a beseeching tone:

“Please be careful! Don’t hit me so hard!”

He turned frightened eyes about the room to find where that wee, little voice had come from and he saw no one!

“Oh, I see!” he said. “I only imagined it. Well, to work once more.”

He struck a solemn blow upon the piece of wood. He heard the same little voice cry out: “Oh, oh! You hurt!”

Maestro Cherry grew dumb, and his nose (which was normally as red as a bouquet of roses) turned every color in the rainbow.

“How has someone spoken to me, when there is no one in this room? Might it be this piece of wood has learned to weep and cry like a child? I can hardly believe it.”

In that very moment, a truly awful bang sounded upon the door.

“Come in,” said the carpenter.

At the words, he saw the door open and a well groomed man stood in the doorway.

The man’s name was Geppetto. Geppetto was a good man. He had a sense of humor about most things and was quite often in a pleasant mood. But sometimes he became as wild as a beast, and no one could soothe him!

“What brought you here?”, asked Antonio.

“It may flatter you to know that I have come to beg for a sort of favor.”

“Here I am, at your service,” answered the carpenter.

“I only want a piece of wood to make a Marionette. Can you offer me that?”

Maestro Antonio knew exactly which piece he’d choose for Geppetto. He looked at the piece of wood that had frightened him so much, and he went immediately to get it. But as he was about to offer it to his friend, with a violent jerk it slipped out of his hands and hit against poor Geppetto’s thin legs.

“Ah! Is this the gentle way, Maestro Antonio, in which you offer your gifts? You have almost made me lame!”

And with that he threw himself upon the carpenter. Then and there they gave each other a sound thrashing. After this fight, Maestro Antonio had scratches on his nose, and Geppetto had an awful bruise on his cheek. Then Geppetto took the fine piece of wood, thanked Maestro Antonio, and limped away toward home.

CHAPTER TWO:

As soon as he reached home, Geppetto took his tools and began to cut and shape the wood into a Marionette.

“I think I’ll choose to call him Pinocchio. This name will make his fortune.”

After choosing the name for his Marionette, Geppetto set to work to make the hair, the forehead, the eyes. He noticed that these eyes moved and then stared fixedly at him.

Geppetto felt insulted and said in a grieved tone:

“Awful wooden eyes, why do you stare so?”

There was no answer.

Next he made the mouth. No sooner was it finished than it began to laugh and poke fun at him. Geppetto made believe he saw nothing and continued with his work.

By mid morning, Geppetto had finished most of the work and was feeling quite good.

In one corner of the room, there was a small stove for making soup.

“I think I will rest for a moment and eat some soup”, he said.

At hearing this, Pinocchio grabbed the only spoon in the room and hid it under some sanding paper. Geppetto looked around and became very confused.

“Where is my spoon?”, he asked aloud. Failing to find it, he attempted to use a fork, but the broth kept dripping on his legs. So he gave up and resumed work on the Marionette, finishing the final details.

He took hold of the Marionette and put him on the floor. Pinocchio started moving by himself and ran all about the room. And with one leap he was out the door and racing down the cobble stone street.

“Catch him! Catch him!” Geppetto kept shouting.

By sheer luck, a policeman came along. He noticed all the noise and thought that an animal had gotten loose, maybe a horse or maybe just a silly goose.

As soon as Pinocchio got close enough, the policeman took hold of him by the nose, then returned him to Maestro Geppetto. At that moment, the little old man was so mad he just wanted to pull Pinocchio’s ears. Imagine the sort of emotions he felt when he looked at Pinocchio and discovered that he had forgotten to make them!

What happened after this moment is an almost unbelievable story, but you will have to read it yourself.

A.3 Complete low token variability exposure passage for Experiment 3

THE ADVENTURES OF PINOCCHIO

One morning, which began as a normal morning, but quickly became anything but normal, this piece of wood found itself in the shop of an old carpenter.

The man’s name was Geppetto. Geppetto was a good man. He had a sense of humor about most things. But sometimes he would lose his temper, and no one could soothe him!

Geppetto took his tools and began to cut and shape the wood into a Marionette.

“What name shall I choose for him?” he said to himself. “I think I’ll choose to call him Pinocchio.”

After the eyes, Geppetto made the nose, which began to stretch as soon as finished.

Next he made the mouth. No sooner was it finished than it began to laugh and poke fun at him.

“Stop laughing!” said Geppetto angrily. “Please be a good boy and use your manners.”

By mid morning, Geppetto had finished most of the work and was feeling quite good.

In one corner of the room, there was a small stove for making soup, a bouquet of roses, and a tumble down table.

“I think I will rest for a moment and eat some soup”, he said.

At hearing this, Pinocchio grabbed the only spoon in the room and hid it under some sanding paper. Geppetto looked around and became very confused.

“Where is my spoon?”

He looked under the bench! He peeped inside the closet! He opened the door and looked up and down the street!

Failing to find it, he attempted to use a fork, but the broth kept dripping on his legs. So he gave up and resumed work on the Marionette.

Pinocchio’s legs and feet still had to be made. Geppetto set to work making the left foot, then the right foot. As soon as they were done, Geppetto felt a sharp kick on the tip of his nose.

After this, Geppetto had an awful bruise on his cheek.

At that moment, a policeman came along. He noticed all the noise, and thought that an animal had gotten loose, maybe a horse or maybe just a silly goose.

What happened after this moment is an almost unbelievable story, but you will have to read it yourself.

A.4 Test materials

Table A.1: Experiments 1-3. Complete set of target back vowel lowered items.

Trained lexical items  New lexical items
Word     Form          Word     Form          Word     Form
bouquet  “b[U]quet”    abuse    “ab[jU]se”    produce  “prod[U]ce”
bruise   “br[U]se”     ambush   “amb[o]sh”    propose  “prop[A]se”
choose   “ch[U]se”     amuse    “am[jU]se”    prose    “pr[A]se”
confuse  “conf[jU]se”  assume   “ass[U]me”    pudding  “p[o]dding”
goose    “g[U]se”      broken   “br[A]ken”    pupil    “p[jU]pil”
humor    “h[jU]mor”    closure  “cl[A]sure”   reduce   “red[U]ce”
moment   “m[A]ment”    cobra    “c[A]bra”     remove   “rem[U]ve”
morning  “m[A]rning”   compose  “comp[A]se”   revoke   “rev[A]ke”
normal   “n[A]rmal”    explode  “expl[A]de”   roast    “r[A]st”
nose     “n[A]se”      expose   “exp[A]se”    scooter  “sc[U]ter”
only     “[A]nly”      food     “f[U]d”       soldier  “s[A]ldier”
open     “[A]pen”      golden   “g[A]lden”    spoof    “sp[U]f”
resume   “res[U]me”    huge     “h[jU]ge”     sugar    “s[o]gar”
soon     “s[U]n”       improve  “impr[U]ve”   tooth    “t[U]th”
soothe   “s[U]the”     juice    “j[U]ce”      trophy   “tr[A]phy”
soup     “s[U]p”       locus    “l[A]cus”     truth    “tr[U]th”
spoon    “sp[U]n”      lotion   “l[A]tion”    useful   “[jU]seful”
stove    “st[A]ve”     music    “m[jU]sic”    warning  “w[A]rning”
use      “[jU]se”      omen     “[A]men”      woman    “w[o]man”
wooden   “w[o]den”     oval     “[A]val”      youth    “y[U]th”

Back vowel lowered shifts: /u/ → [U], /U/ → [o], /o/ → [A], /O, A/ → [æ]

Table A.2: Experiments 1-3. Complete set of filler words pronounced with unshifted (standard-sounding) vowels.

about       court       hush        ruffle
after       craft       issue       school
apart       crisp       judge       sheer
around      disease     jungle      shriek
awful       dough       kitchen     sky
balance     equal       land        smudge
bald        film        laugh       solve
battle      fin         leaf        sponge
beacon      finish      leap        stairway
below       fireplace   lettuce     store
beyond      flatter     level       teeth
bill        floor       limit       think
box         folder      little      thrash
breathe     forage      mantle      thunder
brusque     forget      mischief    turn
buckle      fork        monarch     tyrant
bump        friend      narrow      voice
candle      give        nectar      wagon
canoe       ground      nickel      wail
captain     hands       north       water
chisel      hatchet     oath        wind
claim       healthy     parcel      words
clam        heard       pieces      work
control     honest      poor        wort
cookie      huddle      room        young

Table A.3: Experiments 1-3. Complete set of maximal nonwords.

batoon      fegole      infloss     shoss
behick      felp        jandy       shret
beshaw      flazick     kolpane     sorneg
bewall      forch       kosspow     tangish
bilark      fresting    kurface     tomint
blemin      ganet       lampile     tunch
blukin      gastan      lapek       vasult
bolang      gink        lisk        vorgo
borse       gisto       miglen      wanic
chark       glesh       morple      welge
croff       goip        noast       willant
dorve       grubine     nucade      wurve
dring       gurst       ompost      yince
duforst     hesting     ostrem      yolash
erbow       hinsip      sarlin      zamp

Appendix B

Stimulus materials for Experiment 4

B.1 Exposure passage transcribed in the novel back vowel lowered accent

THE ADVENTURES OF PINOCCHIO* [pInAkiA]

One day, this piece of lumber found itself in the shop* [Sæp] of an old* carpenter* [Ald kærpEntÄ]. The carpenter’s* [kærpEntÄz] real name was Maestro* Antonio* [mAIstrA æntAniA], but everyone called* [kæld] him Maestro* Cherry, for the tip of his nose* [nAz] was so* [sA] round and red and shiny that it resembled a ripe cherry. As soon* [sUn] as he saw* [sæ] that piece of lumber, Maestro* [mAIstrA] Cherry was filled with joy.

“I have almost* [ælmAst] finished making a table. I shall use* [jUz] this lumber to* [tU] make the final leg.”

He grabbed hold* [hAld] of his hatchet, ready to peel off* [æf] the bark* [bærk]. But... at that moment* [mAmEnt], he noticed* [nAR@st] a wee, little voice say in a beseeching tone* [tAn]: “Please be careful! Don’t* [dAnt] hit me so* hard* [sA hærd]!”

Maestro* [mAIstrA] Cherry grew* [grU] dumb, and his nose* [nAz] (which was normally* [nArm@li] as red as a bouquet* [bUkej] of roses* [rAz@z]) turned every color in the rainbow* [rejnbA]. He turned frightened eyes about the room* to* [rUm tU] find where that wee little voice had come from and he saw* no* [sæ nA] one!

“How has someone spoken* [spAk@n] to me, when there is no* [nA] one in this room* [rUm]? Might it be that this piece of lumber has learned to weep and cry like a child? I can hardly* [hærdli] believe it.”

In that very moment* [mAmEnt], a truly* awful* [trUli æf@l] bang sounded upon* [@pæn] the door* [dAr].

“Come in,” said the carpenter* [kærpEntÄ].

The man’s name was Geppetto* [Z@pERA]. Geppetto* [Z@pERA] had a sense of humor* [hjUmÄ] about most* [mAst] things and was quite often* [æftEn] in a pleasant mood* [mUd]. But sometimes he became as wild as a beast and it was impossible* to* soothe* [Impæs@b@l tU sUD] him!

“What brought* you* [bræt jU] here?”, asked Antonio*.

“I want a piece of lumber to* [tU] make a Marionette.”

Maestro Antonio* knew exactly which piece he’d choose* for* Geppetto* [ÙUz fAr Z@pERA]. He grabbed the piece that had frightened him so* [sA] much. But as he was about to* offer* [tU æfÄ] it to* [tU] his friend, with a violent jerk it slipped out of his hands and hit against Geppetto’s* [Z@pERAz] thin legs.

“Ah! Is this the gentle way, Maestro Antonio*, in which you* offer* [jU æfÄ] gifts? You* have almost* [ælmAst] made me lame!”

And with that, he threw* [TrU] himself upon* [@pæn] the carpenter* [kærpEntÄ]. Then and there, they gave each other a sound thrashing. After this fight, Maestro Antonio* had scratches on* [æn] his nose* [nAz], and Geppetto* [Z@pERA] had an awful* bruise* on* [æf@l brUz æn] his cheek. Then Geppetto* [Z@pERA] grabbed the piece of lumber and limped away toward* home* [tArd hAm].

CHAPTER TWO* [tU]

As soon* [sUn] as he reached home* [hAm], Geppetto* [Z@pERA] began to* [tU] cut and shape the lumber into* [intU] a marionette.

“I think I’ll choose* [ÙUz] to call* [kæl] him Pinocchio* [pInAkiA]. This name will make his fortune* [fArÙ@n].”

After choosing* [ÙUzIN] the name for his marionette, Geppetto* [Z@pERA] set to* [tU] work to* [tU] make the hair, the forehead* [fArhEd], the eyes. He noticed* [nAR@st] that these eyes moved* [mUvd] and then stared fixedly at him. Next, he made the mouth. No* sooner* [nA sUnÄ] was it finished than it began to* [tU] laugh and poke* [pAk] fun at him. Geppetto* [Z@pERA] made believe he saw* [sæ] nothing and continued* [k@ntInjUd] with his work.

By mid morning* [mArnIN], Geppetto* [Z@pERA] had finished most* [mAst] of the work and decided to take a break to dispose* [dIspAz] of his frustrations with the misbehaving marionette. In one corner* [kArnÄ] of the room* [rUm], there was a small stove* for* [stAv fAr] making soup* [sUp]. “I think I will rest for a moment* [mAmEnt] and eat some soup* [sUp]”, he said. At hearing this, Pinocchio* grabbed the only* spoon* [Anli spUn] in the room* [rUm] and hid it under some sanding paper.

“Where is my spoon* [spUn]?”, he asked aloud.

Failing to* [tU] find it, he attempted to use* [jUz] a fork* [fArk], but the broth* [bræT] kept dripping on* [æn] his legs. So* [sA] he gave up and resumed* [r@zUmd] work on* [æn] the marionette, finishing the final details. At that moment* [mAmEnt], Pinocchio* started* moving* [mUvIN] by himself and ran all* [æl] about the room* [rUm]. And with one leap he was out the door* [dAr] and racing down the cobble* stone* [kæbl stAn] street.

“Catch him! Catch him!” Geppetto* kept shouting.

By sheer luck, a policeman came along* [@læN]. He noticed* all* [nAR@st æl] the noise, and thought* [Tæt] that an animal had gotten* loose* [gæPn lUs], maybe a horse* [hArs] or a donkey* [dænki], or maybe just a silly goose* [gUs]. What happened after this moment* [mAmEnt] is an almost* [ælmAst] unbelievable story* [stAri], but you* [jU] will have to* [tU] read it yourself.

B.2 Exposure passage transcribed in the novel back vowel raised accent

THE ADVENTURES OF PINOCCHIO* [pInUkiU]

One day, this piece of lumber found itself in the shop* [Sop] of an old* carpenter* [Uld korpEntÄ]. The carpenter’s* [korpEntÄz] real name was Maestro* Antonio* [mæstrU æntUniU], but everyone called* [kold] him Maestro* [mæstrU] Cherry, for* [fUr] the tip of his nose* [nUz] was so* [sU] round and red and shiny that it resembled a ripe cherry. As soon* [sin] as he saw* [so] that piece of lumber, Maestro* [mæstrU] Cherry was filled with joy.

“I have almost* [olmUst] finished making a table. I shall use* [jiz] this lumber to* [ti] make the final leg.”

He grabbed hold* [hUld] of his hatchet, ready to peel off* [of] the bark* [bork]. But... at that moment* [mUmEnt], he noticed* [nUR@st] a wee, little voice say in a beseeching tone* [tUn]: “Please be careful! Don’t* [dUnt] hit me so* hard* [sU hord]!”

Maestro* [mæstrU] Cherry grew* [gri] dumb, and his nose* [nUz] (which was normally* [nUrm@li] as red as a bouquet* [bikej] of roses* [rUz@z]) turned every color in the rainbow* [rejnbU]. He turned frightened eyes about the room* to* [rim ti] find where that wee little voice had come from and he saw* no* [so nU] one!

“How has someone spoken* [spUk@n] to me, when there is no* [nU] one in this room* [rim]? Might it be that this piece of lumber has learned to weep and cry like a child? I can hardly* [hordli] believe it.”

In that very moment* [mUmEnt], a truly* awful* [trili of@l] bang sounded upon* [@pon] the door* [dUr].

“Come in,” said the carpenter* [korpEntÄ].

The man’s name was Geppetto* [Z@pERU]. Geppetto* [Z@pERU] had a sense of humor* [hjimÄ] about most* [mUst] things and was quite often* [oft@n] in a pleasant mood* [mid]. But sometimes he became as wild as a beast and it was impossible* to* soothe* [Impos@b@l ti siD] him!

“What brought* you* [brot ji] here?”, asked Antonio* [æntUniU].

“I want a piece of lumber to* [ti] make a Marionette.”

Maestro* Antonio* [mæstrU æntUniU] knew exactly which piece he’d choose* for* Geppetto* [Ùiz fUr Z@pERU]. He grabbed the piece that had frightened him so* [sU] much. But as he was about to* offer* [ti ofÄ] it to* [ti] his friend, with a violent jerk it slipped out of his hands and hit against Geppetto’s* [Z@pERUz] thin legs.

“Ah! Is this the gentle way, Maestro* Antonio* [mæstrU æntUniU], in which you* offer* [ji ofÄ] gifts? You* [ji] have almost* [olmUst] made me lame!”

And with that, he threw* [Tri] himself upon* [@pon] the carpenter* [korpEntÄ]. Then and there, they gave each other a sound thrashing. After this fight, Maestro* Antonio* [mæstrU æntUniU] had scratches on* [on] his nose* [nUz], and Geppetto* [Z@pERU] had an awful* bruise* on* [of@l briz on] his cheek. Then Geppetto* [Z@pERU] grabbed the piece of lumber and limped away toward* home* [tUrd hUm].

CHAPTER TWO* [ti]

As soon* [sin] as he reached home* [hUm], Geppetto* [Z@pERU] began to* [ti] cut and shape the lumber into* [inti] a marionette.

“I think I’ll choose* [Ùiz] to call* [kol] him Pinocchio* [pInUkiU]. This name will make his fortune* [fUrÙ@n].”

After choosing* [ÙizIN] the name for his marionette, Geppetto* [Z@pERU] set to* [ti] work to* [ti] make the hair, the forehead* [fUrhEd], the eyes. He noticed* [nUR@st] that these eyes moved* [mivd] and then stared fixedly at him. Next, he made the mouth. No* sooner* [nU sinÄ] was it finished than it began to* [ti] laugh and poke* [pUk] fun at him. Geppetto* [Z@pERU] made believe he saw* [so] nothing and continued* [k@ntInjid] with his work.

By mid morning* [mUrnIN], Geppetto* [Z@pERU] had finished most* [mUst] of the work and decided to take a break to dispose* [dIspUz] of his frustrations with the misbehaving marionette. In one corner* [kUrnÄ] of the room* [rim], there was a small stove* for* [stUv fUr] making soup* [sip]. “I think I will rest for* [fUr] a moment* [mUmEnt] and eat some soup* [sip]”, he said. At hearing this, Pinocchio* [pInUkiU] grabbed the only* spoon* [Unli spin] in the room* [rim] and hid it under some sanding paper.

“Where is my spoon* [spin]?”, he asked aloud.

Failing to* [ti] find it, he attempted to use* [jiz] a fork* [fUrk], but the broth* [broT] kept dripping on* [on] his legs. So* [sU] he gave up and resumed* [r@zimd] work on* [on] the marionette, finishing the final details. At that moment* [mUmEnt], Pinocchio* started* moving* [pInUkiU stort@d mivIN] by himself and ran all* [ol] about the room* [rim]. And with one leap he was out the door* [dUr] and racing down the cobble* stone* [kobl stUn] street.

“Catch him! Catch him!” Geppetto* [Z@pERU] kept shouting.

By sheer luck, a policeman came along* [@loN]. He noticed* all* [nUR@st ol] the noise, and thought* [Tot] that an animal had gotten* loose* [goPn lis], maybe a horse* [hUrs] or a donkey* [donki], or maybe just a silly goose* [gis]. What happened after this moment* [mUmEnt] is an almost* [olmUst] unbelievable story* [stUri], but you* [ji] will have to* [ti] read it yourself.

B.3 Test materials

Table B.1: Experiment 4. Complete set of test words for the back vowel lowered and back vowel raised item types. The shifted vowel is underlined in each word. Columns indicate the category of the shifted vowel.

/u/         /U/         /o/         /O, A/
abuse       ambush      broken      applause
amuse       barefoot    closure     closet
assume      bookmark    cobra       costly
huge        bullet      compose     cottage
improve     bushel      explode     doctor
include     butcher     expose      dolphin
juice       crooked     golden      exhaust
music       foot        lotion      fossil
perfume     footage     open        gossip
reduce      goodness    oval        hostage
refuse      input       propose     office
remove      looked      prose       resolve
seduce      pushed      revoke      response
spoof       sugar       roast       salt
truth       woman       soldier     squash
youth       wooden      trophy      squat

Back vowel lowered shifts: /u/ → [U], /U/ → [o], /o/ → [A], /O, A/ → [æ]
Back vowel raised shifts: /u/ → [i], /U/ → [u], /o/ → [U], /O, A/ → [o]
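The two chain shifts listed under Table B.1 can be read as simple symbol-to-symbol mappings. The following is a minimal illustrative sketch, not part of the original stimulus preparation: the names LOWERED, RAISED, and apply_shift are hypothetical, phones are written in the same ASCII-style symbols used in the transcriptions above, and the canonical (unshifted) phone sequences for "soup" and "nose" are assumed for the example. The key point is that each phone is mapped exactly once, so the shift applies in parallel rather than cascading through the chain.

```python
# Illustrative sketch only; names and input transcriptions are assumptions.
# Phone symbols follow the appendix's ASCII-style transcription
# (e.g., U in [sUp], A in [nAz]).

# Shift tables copied from the footer of Table B.1.
LOWERED = {"u": "U", "U": "o", "o": "A", "O": "æ", "A": "æ"}
RAISED = {"u": "i", "U": "u", "o": "U", "O": "o", "A": "o"}


def apply_shift(phones, shift):
    """Return a new phone list with each vowel replaced by its shifted variant.

    Every phone is looked up exactly once, so /u/ becomes [U] and stops
    there instead of continuing on through /U/ -> [o] -> [A] -> [æ].
    Phones without an entry in the shift table pass through unchanged.
    """
    return [shift.get(p, p) for p in phones]


# Hypothetical canonical transcriptions used only for illustration.
soup = ["s", "u", "p"]
nose = ["n", "o", "z"]

print(apply_shift(soup, LOWERED))  # ['s', 'U', 'p']
print(apply_shift(nose, RAISED))   # ['n', 'U', 'z']
```

Applying the mapping sequentially (one rule after another over the whole word) would instead collapse every back vowel toward the end of the chain, which is why the sketch uses a single lookup per phone.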

Table B.2: Experiment 4. Complete set of filler words pronounced with unshifted (standard-sounding) vowels.

accuse      captain     finish      nickel
after       cartoon     flatter     nook
balance     chamber     float       novel
banjo       chisel      friend      phone
battle      claim       healthy     pieces
beacon      cloak       honest      platoon
beyond      common      junior      poodle
bill        cookie      kitchen     pulley
bloom       crisp       lettuce     put
bonus       crusade     level       rookie
bottom      disease     limit       sheer
box         dispose     little      shriek
breathe     elbow       mischief    solve
brook       equal       monarch     teeth
bully       exclude     narrow      wagon
bush        film        nectar      water

Table B.3: Experiment 4. Complete set of maximal nonwords.

batoon      erbow       hinsip      sarlin
behick      fegole      infloss     shoss
beshaw      felp        jandy       shret
bewall      flazick     kolpane     sorneg
bilark      floak       kosspow     tangish
blemin      forch       kurface     tomint
blukin      fresting    lampile     tunch
bolang      ganet       lapek       vasult
borse       gastan      lisk        vorgo
chark       gink        miglen      wanic
croff       gisto       morple      welge
daver       glesh       noast       willant
deese       goip        nucade      wurve
dorve       grubine     ompost      yince
dring       gurst       ostrem      yolash
duforst     hesting     provate     zamp