
UNIVERSITY OF CALGARY

Allophone Acquisition: Exploring the Phonological System and the Nature of Representations

by

Christine E. Shea

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

DEGREE OF DOCTOR OF PHILOSOPHY

Department of Linguistics

CALGARY,

June, 2010

© Christine E. Shea 2010

Library and Archives Canada

Published Heritage Branch

395 Wellington Street, Ottawa ON K1A 0N4, Canada

ISBN: 978-0-494-69501-2

NOTICE:

The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author’s permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.

While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

UNIVERSITY OF CALGARY

FACULTY OF GRADUATE STUDIES

The undersigned certify that they have read, and recommend to the Faculty of Graduate Studies for acceptance, a thesis entitled "Allophone Acquisition: Exploring the Phonological System and the Nature of Representations" submitted by Christine E. Shea in partial fulfilment of the requirements of the degree of Doctor of Philosophy.

Supervisor, Dr. Suzanne Curtin, Departments of Linguistics and Psychology

Dr. John Archibald, Department of Linguistics

Dr. Darin Flynn, Department of Linguistics

Dr. Wei Cai, Department of Germanic, Slavic and East Asian Studies

External Examiner, Dr. Ellen Broselow, Department of Linguistics, State University of New York at Stony Brook

April 5, 2010

Abstract

The core premise of this dissertation is that the phonological system operates across detailed, rich representations, formed by tracking of distributional information in the input. This proposal is explored in a series of perception and production experiments with adult L1 English/L2 Spanish learners and L1 Spanish-learning children acquiring the Spanish stop-approximant alternation. Allophonic acquisition provides an ideal testing ground for theories regarding the nature of the phonological system and phonological acquisition. Traditionally, allophones are characterized as resulting from categorical rules, which learners implement in their grammar through constraint interaction. In the studies that compose this dissertation, it is argued instead that learners track and store information about the phonological environment for each allophone. The emergent rich representations result from learners’ experience with the input. However, a further premise of this dissertation is that not all information will necessarily be available at all stages of learning and under all conditions (see Werker & Curtin, 2005). In the case of L2 learners, this will be largely a function of the native language filter, while for children learning their first language it will be a function of natural biases and the acquisition of a lexicon.

These issues are explored within the context of Werker & Curtin’s (2005) PRIMIR (Processing Rich Information from Multidimensional Interactive Representations; Curtin & Werker, 2007; Curtin, Byers-Heinlein & Werker, under review) framework that accounts for early language development. PRIMIR is grounded in the assertions that a) representations are exemplar-like in nature and b) the phonological system is sensitive to distributional information in the input. Interacting with these rich representations and distribution-based learning mechanisms are three dynamic filters that direct information pickup: natural biases, task effects and developmental level. In this dissertation, I further elaborate the framework by adding the L1 filter effect for L2 learners. The results from the three studies presented here suggest that learners store detailed phonetic information in their representations and experience interacts with their ability to draw upon this information in perception and production tasks. These results lend support to PRIMIR and in general to approaches which view phonological acquisition as sensitive to representations and learners’ experience with the input.

Acknowledgements

My first and most heartfelt thanks go to my advisor, Dr. Suzanne Curtin, for her exceptional skills as an academic and mentor. You helped me enjoy what I do and always managed to keep me focused, oftentimes in spite of myself. I feel ready to take on whatever the future holds and that is due to you, Suzanne, and your faith in me and my abilities. Thank you.

I would also like to thank all my professors in the Department of Linguistics at the University of Calgary. I am grateful to Dr. John Archibald for his patience and assistance over the course of my PhD studies and for providing me with numerous opportunities to experience new things in addition to my research. Thank you, John, for everything.

I am also sincerely grateful to Dr. Betsy Ritter for making syntax not only understandable but even (almost) enjoyable. You set high, uncompromising standards, instil in your students a desire to reach them and most importantly, make us believe we can. One of my greatest pleasures was being your student and working with you. Thank you.

Thanks as well to Dr. Darin Flynn for his profound knowledge of all things phonological and for his contagious academic curiosity. Being your student is a constant adventure in discovery and fearless exploration of ideas and theories.

I would like to thank Dr. Susanne Carroll for sharing her perspective on second language learning and always being willing to impart invaluable advice on how to navigate the academic world. Thank you for all the time and support you have given me over the course of my PhD. I look forward to collaborating with you in the future.

As well, thank you to Dr. Amanda Pounder for her tireless efforts as grad co-ordinator. Your hard work and unwavering support made this whole experience much, much easier and less stressful. And a warm thank you to Dr. Stephen Winters for filling all those gaps regarding how sound works. Thank you.

I would also like to thank all my fellow and former linguistics grad students, Antonio González, Ilana Mezhevich, Ashley Burnett, Danica MacDonald, Jamison Cooper-Leavitt, Keffyalew Gebregziabher, Kelly Murphy, Kim Meadows, Lindsay Kirkpatrick, Nick Welch, Nina Widjaja, Rhonda Sim, Silke Weber and Sue Jackson.

Silke and Sue – we will have to plan another road trip soon, hopefully somewhere a little more exciting than Edmonton and to do something a little more exciting than an OT conference. I have loved having the two of you as colleagues. I will always treasure your friendship, support and intelligence. You have made this whole thing more enjoyable and memorable. Silke – if you are lucky, your babies will grow up to be linguists. Sue – if you are lucky, you can join Armando and me on the beach in Mexico.

Thanks as well to all my colleagues in the Speech Development Lab – Danielle, Heather, Jen, Jenn, Jenna, Becky, Sally. It was wonderful working in such a supportive and positive environment. As well, I would like to thank Dan Hufnagle (our errant Speech Development Lab member) for the great conversations and perspective-taking chats. I learned a lot from you, Dan, and always value your opinions and ideas. Thank you.

Sarah Eaton and Jacquie Clydesdale – you were there for me in some tough moments and always helped me gain and maintain perspective. I will treasure your friendship forever. The coffee, tea and walk sessions were some of the most therapeutic hours I spent during this whole thing. Thank you.

A warm, deep thanks to Stephanie Archer for all the hours and hours (and hours and hours!!) of talking, commiserating and griping as well as for all the times you talked me down off the ledge. I guess we were destined for a lasting friendship after that first afternoon way back when in the Grad Lounge, drinking our worries away. It is rare to find someone who truly, from the heart, celebrates your successes and sticks around even when the successes are few and far between. I found that in you and I value your friendship. Thank you, Steph (keep that EpiPen close by – I want you around for a while longer).

I would also like to thank my parents – Anna and Bill Shea for letting me be who I am and encouraging me to figure that out. No pressure, no demands, just letting me ‘be’, even when what that meant was not always clear. You play a fundamental role in all I do and have always encouraged me to live my life in my own way. I am sure that must have been difficult at times, but I appreciate it more than words can ever say. All my love to both of you and thank you for loving me as much as you do.

On that note, a huge, enormous thanks to Katherine, Malique, Elizabeth and Nathan for reminding me that we all need a little of that unconditional love and support in our lives. Katherine, you have helped me through the most difficult of times and just knowing that you are there somehow made it all seem OK: that old feeling of invincibility, I guess. It is what has allowed me to get to this point and take all the risks that this journey has involved. Thank you for knowing me so well, loving me so much and keeping me so close.

And a long-distance thanks as well to René, Karina, Tyara, Bruno and Rocío, all the way down in México. You are my family and I love you all dearly. Now that the PhD is done hopefully we can visit more frequently.

Y por último, dedico esta tesis a ti, Armando, por haberme ayudado a creer en mí. Seguiremos juntos en esta gran aventura. Te amo con todo mi corazón.

Dedication

Dedico esta tesis a Armando. Mi amor.

Table of Contents

Approval Page
Abstract
Acknowledgements
Dedication
Table of Contents
List of Tables
List of Figures and Illustrations

CHAPTER ONE: INTRODUCTION
  1.1 Learning a linguistic sound system: Experience, mechanisms and representations
  1.2 Characterizing the alternation
  1.3 Approaches to linguistic sound systems: representations and grammars
  1.4 Experience and acquisition
  1.5 Phonetic and Phonological Models of L2 Speech Acquisition
  1.6 Overview of the dissertation

CHAPTER TWO: PERCEIVING THE RELATIONSHIP BETWEEN PHONOLOGICAL ENVIRONMENT AND ALLOPHONES IN A SECOND LANGUAGE: EVIDENCE FOR DISTRIBUTIONAL LEARNING
  2.1 Introduction
  2.2 Experiment 1: Consonant and Vowel Stress Shift
    2.2.1 Method
      2.2.1.1 Participants
      2.2.1.2 Stimuli
      2.2.1.3 Procedure
    2.2.2 Results
      2.2.2.1 Stress perception: allophone + stressed vowel
      2.2.2.2 Stress perception: Testing for trochaic bias
      2.2.2.3 Logistic regression analysis
  2.3 Experiment 2: Allophone alternation, vowel steady
    2.3.1 Method
      2.3.1.1 Participants
      2.3.1.2 Stimuli
      2.3.1.3 Procedure
    2.3.2 Results
      2.3.2.1 Stress perception: allophone + non-prominent vowel
      2.3.2.2 Testing for trochaic bias
      2.3.2.3 Logistic regression analysis
  2.4 General Discussion
  2.5 Conclusions

CHAPTER THREE: EVIDENCE FOR A NON-CATEGORICAL PHONOLOGICAL SYSTEM: ADULT L2 ALLOPHONE PRODUCTION
  3.1 Introduction
  3.2 Experiment: Stop-approximant production data
    3.2.1 Method
      3.2.1.1 Participants
      3.2.1.2 Stimuli
      3.2.1.3 Procedure
      3.2.1.4 Phonetic Analyses
    3.2.2 Results
  3.3 General Discussion
  3.4 Conclusions

CHAPTER FOUR: EVIDENCE FOR DETAILED REPRESENTATIONS IN L1 ACQUISITION: FREQUENCY AND ALLOPHONE PRODUCTION
  4.1 Introduction
  4.2 Phonetic universals and language-specific effects in phonological development
  4.3 Corpus I: recorded data
    4.3.1 Participants
  4.4 Presentation of the data
    4.4.1 Data
      4.4.1.1 [d]-deletion
    4.4.2 General discussion of the data: phonetic groundedness
  4.5 Corpus II: CHILDES database
    4.5.1 Description of the C and CDS Corpora
    4.5.2 Frequency data for C and CDS Corpora
  4.6 Discussion and Conclusions

CHAPTER FIVE: GENERAL DISCUSSION AND CONCLUDING REMARKS
  5.1 Introduction: Summary of the dissertation results
  5.2 PRIMIR
  5.3 PRIMIR-L2
  5.4 L1 filter
  5.5 Nature of the phonological system: Evidence for distribution-based learning in an L2
  5.6 Phonological Mechanisms: evidence for comparison and contrast in an L2 and tracking of multiple levels of information
  5.7 Probabilistic updating of representations
  5.8 Role of the lexicon in L2 sound category acquisition
    5.8.1 Role of the lexicon in a gradient phonological system
  5.9 Conclusion

APPENDIX A

APPENDIX B

APPENDIX C

REFERENCES

List of Tables

Table 1. Stress perception on first syllable across groups and onsets

Table 2. Results of the hierarchical logistic regression for Experiment 1

Table 3. Stress perception on first syllable across groups and onsets

Table 4. Results of the logistic regression for Experiment 2

Table 5. L2 participant biographical data

Table 6. Means and standard deviations on the dependent variables for the three groups

Table 7. Results of Discriminant Analysis for phonetic and phonological environment cues

Table 8. MG’s productions

Table 9. FC’s productions

Table 10. Positional log frequency

List of Figures and Illustrations

Figure 1. Ratio values for stops + stressed vowel perceived as stressed/approximants + stressed vowels perceived as stressed

Figure 2. Proportion of syllables perceived as stressed

Figure 3. Stop-initial syllables perceived as stressed/approximant-initial syllables perceived as stressed ratio values

Figure 4. Spectrogram of gato ‘cat’

Figure 5. Plot of group centroids

Figure 6. MANOVA dependent variables

Figure 7. Consonant x context for each group

Figure 8. Schematized distribution of onset segments for FC

Figure 9. Proportion attempted and accurate across target sounds

Figure 10. C and CDS Positional frequencies correlation

Figure 11. Log Likelihood Position Accuracy

Figure 12. MG Log Likelihood Place of Articulation Accuracy

Figure 13. FC Log Likelihood Place of Articulation Accuracy

Figure 14. Log10 frequency counts for words realized with and without approximants in medial position

Chapter One: Introduction

1.1 Learning a linguistic sound system: Experience, mechanisms and representations

... the infant and the adult could never truly perceive the same speech input in the same way, nor could the L2 learner or bilingual perceive L2 or L1 speech in exactly the same way as native monolinguals of either language.

(Best & Tyler, 2007:13)

A language learner never experiences the same input twice over the course of acquisition. The input interacts with the developmental level of the learner, leading to distinct experiences and learning effects each time it is experienced. Moreover, across learners, the same input is never perceived in precisely the same manner. The nature of the input received and individual differences in learning interact in complex ways that ultimately reflect the overall effect of ‘experience’ on language learning.

This dissertation addresses two main questions related to the issue of experience. First, what type of phonological system is required to account for the effect of experience on language acquisition? Second, what types of representations are created over the course of acquisition to support this system? In order to answer these questions, I present perception and production data from L1 English/L2 Spanish learners and production data from L1 Spanish children, focusing on the acquisition of the Spanish stop-approximant alternation (b d g ~ β ð γ).

Learning a language involves not only acquiring the contrastive sound categories that form its phonemic inventory but also acquiring the noncontrastive sounds that surface in predictable contexts (Crystal, 1997). In the phonological literature, it is traditionally assumed that segments can be related either through contrast or allophony (see, e.g., Steriade 2007). Two segments contrast if their distribution in the lexicon of a language is not predictable. Sounds that are related through allophony, on the other hand, occur in conditioned distributions. Each allophone occurs in a regular, predictable context (Crystal, 1997).

An important part of allophone acquisition involves determining the correct context for each variant, and one way learners might do this is by attending to the specific, context-dependent cues that characterize each category. For example, an English learner is exposed to input that contains an alveolar sound with aspiration in word-initial position (e.g., [tʰab]) and is also exposed to input that contains a voiceless alveolar sound with no aspiration (e.g., [stap]). These two sounds share similar acoustic and articulatory characteristics but do not contrast lexical entries in English; they fulfill many of the typical characteristics of allophones found across the world’s languages. Moreover, [tʰ] and [t] represent a very particular type of allophonic relationship: complementary distribution. The likelihood of encountering one sound in the phonological environment where the other occurs is close to zero. The phonological environment can include sounds directly adjacent to the sound itself, sounds which occur at a predetermined distance from it, as well as the prosodic structure that directly contains the sound, such as the syllable, the foot or the prosodic word (Hall, 2009). Part of learning an allophonic distribution involves connecting the correct allophone to its phonological environment.1

In this introductory chapter I present a brief description of the Spanish stop-approximant alternation, followed by a short discussion of how allophones have been characterized in phonological theory and by exemplar-based approaches. Subsequently, I address how experience, in terms of previous linguistic knowledge, is characterized in L2 speech learning, concentrating on allophone acquisition. Finally, the experiments comprising this dissertation are briefly discussed and I introduce PRIMIR-L2, an extension to Werker and Curtin’s (2005) PRIMIR framework. PRIMIR-L2 uses the same architecture and assumptions as the infant speech framework, with the addition of an L1 filter to account for the particularities of adult L2 acquisition.

1.2 Characterizing the alternation

In the Hispanic linguistics tradition, the stop-approximant alternation is typically characterized as an alternation between stops and voiced spirants (Zampini, 1994; Lleó & Rakow, 2005). However, recent work has demonstrated that the relationship may be better characterized as involving stops and approximants (Hualde, 2005). Martínez-Celdrán (2004) argued that these sounds are approximants because they do not exhibit turbulent airflow and, moreover, [β ð γ] have a lower degree of articulatory precision than the spirants or fricatives. In traditional IPA, the symbols used for these allophones are accompanied by the subscript for lowering, which reflects the more open articulatory nature of the approximants as compared to the fricatives or spirants. For the remainder of the dissertation the segments will be referred to as approximants but for ease of exposition the subscript will not be used.

1 Complementary distribution is often cited as a necessary but not sufficient condition for an allophonic relationship to exist (Crystal, 1997). In addition, allophones also generally share certain acoustic and/or articulatory features. For example, in English the sounds /h/ and /ŋ/ occur in complementary distribution: /h/ only occurs in syllable-initial position and /ŋ/ in syllable-final position, but no native speaker of English would ever consider these sounds to be allophones in the same way as [t] and [tʰ].

Phonological descriptions (e.g., Mascaró 1984; Harris 1969) have characterized the alternation in terms of feature spreading, with the stop generally proposed as underlying and the approximant as the allophone. Under this view, the feature [+cont] spreads from the adjacent vowels to the [-cont] stops, rendering them approximants. Stops surface after a pause and after nasals, and the alveolar stop also surfaces after the lateral /l/. To account for the surfacing of [d] after /l/, researchers posited an underspecified representation for /l/ in Spanish. Face (2002) provides a phonetic and phonological account that is based in the similarity of place of articulation between the lateral and the alveolar stop. This, according to the author, is why the stop has a strong release.
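The categorical distribution just described (stops after a pause, after nasals, and [d] after /l/; approximants elsewhere) can be made concrete with a minimal sketch. It is offered for illustration only and is not an analysis drawn from the cited sources: the function names `surface_allophone` and `realize`, the segment symbols, and the use of `None` to stand for a pause are all assumptions introduced here.

```python
# Illustrative sketch of the categorical description: stops surface after a
# pause, after a nasal, and (for /d/ only) after /l/; approximants elsewhere.
# All names and symbol choices are hypothetical.

APPROXIMANT = {"b": "β", "d": "ð", "g": "γ"}
NASALS = {"m", "n", "ɲ", "ŋ"}


def surface_allophone(phoneme, preceding):
    """Return the predicted surface form of /b d g/.

    `preceding` is the segment before the target, or None for a pause.
    """
    if phoneme not in APPROXIMANT:
        raise ValueError(f"not a voiced stop phoneme: {phoneme!r}")
    if preceding is None or preceding in NASALS:
        return phoneme
    if phoneme == "d" and preceding == "l":
        return phoneme
    return APPROXIMANT[phoneme]


def realize(segments):
    """Apply the rule to every /b d g/ in a post-pausal list of segments."""
    output = []
    preceding = None  # the first segment is treated as following a pause
    for seg in segments:
        if seg in APPROXIMANT:
            output.append(surface_allophone(seg, preceding))
        else:
            output.append(seg)
        preceding = seg
    return "".join(output)
```

For instance, `realize(list("labata"))` yields `laβata`, while `realize(list("bata"))` keeps the initial stop. A rule of this kind is deliberately all-or-nothing, which is exactly the property that the gradient findings discussed next call into question.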

More recent research has examined the noncategorical phonetic realizations of the alternation and considered conditioning factors such as word position and stress. According to Hualde (2005), more open, approximant-like articulations occur post-tonically rather than in the onset of a stressed syllable (p.142, see also Lavoie, 2001; Ortega-Llebaría, 2003; Shea & Curtin, to appear). The conditioning factors of word position and stress have a variable effect not only on the alternation as a whole but also on the different consonants themselves: More stop-like productions are observed with the bilabial segments than with the velars (Cole, Izkarous & Hualde, 1997; Ortega-Llebaría, 2003). Examples are provided in (1):

(1) Examples of allophones across contexts:

Context                    Example     Transcription   Gloss
Word-initial, stressed     bicho       [ˈbitʃo]        ‘bug’
Word-initial, unstressed   gusano      [guˈsano]       ‘worm’
Word-medial, stressed      adentro     [aˈðentɾo]      ‘inside’
Word-medial, unstressed    cabalgar    [kaβalˈgaɾ]     ‘to trot’
Phrase-medial              la bata     [laˈβata]       ‘the housecoat’

The primary acoustic cues to this alternation are the presence of a release burst and segmental intensity: stops have audible release bursts and are less intense than approximants. These acoustic cues are internal to the allophones. However, accurate production and perception also require knowledge of where each alternant occurs. Adult L1 English learners of Spanish must connect the /b/ at the beginning of English [bit] with a release burst and recognize that the Spanish /b/ that occurs between two vowels does not have a release burst. In other words, learners must recognize where in the input the specific phonetic cues occur, that is, their phonological environment.

1.3 Approaches to linguistic sound systems: representations and grammars

The longstanding question of how people learn to perceive and produce language has led researchers to posit various types of mechanisms and representations that might play a role in language acquisition. A formal linguistic model of grammar, such as that proposed in the generative tradition, postulates that phonological knowledge consists of a series of rules or constraints that operate across abstract, minimal representations of lexical items. In their work The Sound Pattern of English (1968), Chomsky and Halle state that the phonological grammar component maps between the syntax and the phonetics and is responsible for applying the necessary rules to arrive at the correct phonetic form. In other words, phonology serves as the intermediate component between the abstract representation of syntax and the physical realization of sounds.

From this perspective, learning is categorical and systematic, and the resulting lexical representations contain no redundant or predictable information (Chomsky & Halle, 1968). The lexicon is fully separated from the rules and constraints that form the grammatical output. Sounds that contrast (i.e., phonemes) and those which do not (i.e., allophones) differ in that noncontrastive sounds are the result of rule application. Specifically, the lexicon is assumed to consist of a series of underlying forms, which contain only contrastive information. These underlying forms pass through the grammatical system of rules or constraints, which in turn supply allophonic information. An example of a rule might be A → B / D __ E (i.e., A becomes B when it occurs between D and E). Because allophones are predictable, they are not part of the underlying lexical representation. The problem with this traditional characterization, however, is that a) some allophones are not predictably distributed in all positions (e.g., only voiceless stops can occur syllable-finally in German, but voiced and voiceless stops alternate freely in syllable onset position) and b) even those that are predictably distributed in all positions often demonstrate gradiency and probabilistic influences in their realizations. To address this, Goldsmith (1995) suggests that contrast should be thought of as a ‘cline’ rather than a binary distinction.
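The rule schema A → B / D __ E can be read as a procedure over a string of underlying segments. The following is a minimal sketch under simplifying assumptions: the function name `apply_rule` is hypothetical, the left and right contexts are given as plain sets of segments rather than feature bundles, and the rule applies simultaneously to all matching positions.

```python
# Hypothetical sketch of a context-sensitive rewrite rule A -> B / D __ E.
# Contexts are sets of segments standing in for natural classes.

VOWELS = frozenset("aeiou")


def apply_rule(segments, target, change, left, right):
    """Rewrite `target` as `change` wherever it stands between a segment in
    `left` and a segment in `right`.

    Contexts are checked against the underlying input, so the rule applies
    simultaneously to every matching position.
    """
    output = list(segments)
    for i in range(1, len(segments) - 1):
        if (
            segments[i] == target
            and segments[i - 1] in left
            and segments[i + 1] in right
        ):
            output[i] = change
    return output
```

Calling `apply_rule(list("DAE"), "A", "B", {"D"}, {"E"})` returns `['D', 'B', 'E']`, and the same function with `VOWELS` as both contexts rewrites an intervocalic `b` as `β`. The output is fully determined by the input string, which is why gradient or probabilistic realizations sit uneasily with this kind of formalization.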

In constraint-based approaches such as Optimality Theory (OT; Prince & Smolensky, 1993), input-output relations are nonserial and relationships between types of contrast do not strictly exist (Hall, 2009). Instead, OT grammars yield language-specific outputs through the ranking of different types of constraints on phonological outputs: faithfulness constraints require the output to retain certain characteristics of the input, and markedness constraints require the output to have certain phonetic characteristics regardless of the form of the input. Thus, under OT, allophones occur in the output as a result of constraint ranking, specifically, constraints that are relativized to particular positions in the word in order to reflect allophonic alternations. For example, there are families of constraints related to positional faithfulness (Beckman, 1998) or positional markedness (Zoll, 1999). High-ranking positional faithfulness constraints will lead to contrasts and high-ranking positional markedness constraints will lead to allophonic variation conditioned by the phonological environment. In OT, contrastive and allophonic relationships can be easily expressed, even though they play no formal role in the articulation of the theory itself. According to Hayes (2004), all forms of contrast in OT emerge strictly from the ranking of constraints, not from any inherent relationship in the phonemic inventory of the language itself.
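The way ranking alone decides between a faithful and an allophonic output can be illustrated with a toy evaluation procedure. This is a sketch under stated assumptions: the two constraint functions are simplified stand-ins invented for this example (they are not the constraints proposed by Beckman, 1998, or Zoll, 1999), the candidate set is supplied by hand, and candidates are assumed to have the same length as the input.

```python
# Toy OT evaluation: the winner is the candidate with the best violation
# profile under a strict ranking. Constraint definitions are hypothetical.

VOWELS = set("aeiou")
STOPS = set("bdg")


def no_intervocalic_stop(underlying, candidate):
    """Markedness: one violation per voiced stop between two vowels."""
    return sum(
        1
        for i in range(1, len(candidate) - 1)
        if candidate[i] in STOPS
        and candidate[i - 1] in VOWELS
        and candidate[i + 1] in VOWELS
    )


def ident(underlying, candidate):
    """Faithfulness: one violation per segment that differs from the input."""
    return sum(1 for u, c in zip(underlying, candidate) if u != c)


def evaluate(underlying, candidates, ranking):
    """Return the candidate whose violation profile is best under `ranking`.

    Profiles are compared lexicographically, so a single violation of a
    higher-ranked constraint outweighs any number of lower-ranked ones.
    """
    return min(
        candidates,
        key=lambda cand: tuple(con(underlying, cand) for con in ranking),
    )
```

With markedness ranked above faithfulness, `evaluate("aba", ["aba", "aβa"], [no_intervocalic_stop, ident])` selects `aβa`; reversing the ranking selects the faithful `aba`. Nothing in the procedure refers to phonemes or allophones as such, in line with the observation attributed to Hayes (2004) above.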

OT grammars themselves also emerge through the reranking of universal constraints over the course of acquisition. Most models assume that reranking is a consequence of an error-analysis process that drives constraints either up in the ranking or down, according to the number of violations the particular constraint incurs for the input provided (see e.g., Boersma & Hayes, 2001). There is no role for the lexicon in traditional OT-based learning algorithms (although see Escudero & Boersma, 2004, for an OT model that includes lexical feedback). The learner cannot draw upon previously experienced inputs to evaluate the current one – it is hypothesized that the grammar is the only mechanism that can evaluate input.
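The general logic of error-driven reranking can be sketched as a single update step. This is a deliberately simplified, deterministic illustration and should not be read as the algorithm of Boersma and Hayes (2001): the function name `update_ranking`, the numeric ranking values, the fixed step size and the constraint labels in the usage note are all assumptions made for this example.

```python
# Hypothetical sketch of one error-driven update: after a mismatch between
# the learner's output and the observed form, constraints that penalise the
# observed form move down and constraints that penalise the error move up.


def update_ranking(values, learner_violations, target_violations, step=1.0):
    """Return adjusted ranking values after one learner error.

    values: dict mapping constraint name to ranking value (higher = stronger).
    learner_violations / target_violations: dicts mapping constraint name to
    the violation count incurred by the learner's output and by the observed
    form; missing names count as zero violations.
    """
    updated = dict(values)
    for name in values:
        learner = learner_violations.get(name, 0)
        target = target_violations.get(name, 0)
        if target > learner:
            updated[name] -= step  # constraint penalises the observed form
        elif learner > target:
            updated[name] += step  # constraint penalises the learner's error
    return updated
```

Suppose a learner with `{"IDENT": 10.0, "*V-STOP-V": 9.0}` produces a faithful intervocalic stop where the input contains an approximant. The observed form violates only `IDENT` and the learner's form violates only `*V-STOP-V`, so one update yields `{"IDENT": 9.0, "*V-STOP-V": 10.0}`. The update consults only the current form and the grammar, never a store of earlier inputs, which is the limitation noted above.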

Recently, exemplar-based models (see, e.g., Goldinger 1996; Johnson 1997, 2005; Pierrehumbert 2001a, 2001b, 2003a, 2003b, 2006; Bybee 2000, 2001b, 2003) of phonological and phonetic knowledge have presented an alternative approach to OT and rule-based models of grammar. Exemplar-based models assume that all information found in the input is stored in a multidimensional phonetic space. Grammar emerges as a consequence of generalizations across these stored clusters once there is a large cluster of similar exemplars that can be identified as a category (Pierrehumbert, 2002, 2003a; Hall, 2009). Categories emerge when the connections amongst certain exemplars are stronger than the connections to other exemplars. This may be the result of higher-level, top-down factors, such as spelling or lexical knowledge. Allophones will necessarily share more connections amongst themselves in the multidimensional space than phonemes (Johnson, 2005; Hall, 2009). Importantly, in most exemplar models, allophones are not necessarily tied to the notion of belonging to the same overarching category. There is no need to specify a ‘b’ category that subsumes all exemplars of [b] and [β]. Instead, allophones and contrastive segments are at two ends of a continuum that can be understood as endpoints of a similarity relationship (Ladd, 2006).

In order to correctly cluster sounds in the multidimensional space, models that use exemplar-based representations assume that all categorization proceeds based upon previously encountered exemplars and representations are constantly shifting to reflect the most current input experienced. Speech perception is probabilistic in nature – listeners learn and keep track of complex probabilistic distributions in the course of processing language. In such probabilistic phonological models (Pierrehumbert, 2003), words are represented by various levels of generalizations, which encode abstractions that may be parametric in nature (such as formant transitions) or abstractions across entire words. Similar sounding tokens that share semantic meaning or other higher-level similarities (e.g., spelling) will begin to shift the exemplar's own space. This is how lexical frequency effects emerge. As learner representations become more robust, the space within the distribution itself will shift to reflect different modes, or peaks, of cue congruence. Given this, one of the basic assumptions of models that employ exemplar-type representations is that the information learners have stored over the course of their experience with a language will play a primary role in directing all subsequent categorization of input. Previous learning guides the way in which information is picked up and used by learners and speech perception is a process of optimizing categorization of the input given the noise present in the signal (Feldman, Griffiths & Morgan, 2009).
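The claim that categorization proceeds from previously stored exemplars, with frequency shaping the outcome, can be illustrated with a small similarity-based sketch. It is one common way of formalizing the idea and is not the specific model of any author cited above: the class name `ExemplarMemory`, the exponential similarity function, the `sensitivity` parameter and the two-dimensional cue values are all assumptions for illustration.

```python
# Hypothetical sketch of exemplar-based categorization: each stored token
# contributes activation to its label in proportion to its similarity to
# the incoming token, so frequent categories attract ambiguous input.

import math
from collections import defaultdict


class ExemplarMemory:
    def __init__(self, sensitivity=1.0):
        self.sensitivity = sensitivity
        self.exemplars = []  # list of (feature tuple, label)

    def store(self, features, label):
        """Add one experienced token to memory."""
        self.exemplars.append((tuple(features), label))

    def activations(self, features):
        """Sum similarity-weighted activation for each stored label."""
        totals = defaultdict(float)
        for stored, label in self.exemplars:
            distance = math.dist(stored, features)
            totals[label] += math.exp(-self.sensitivity * distance)
        return dict(totals)

    def categorize(self, features):
        """Return the label with the highest summed activation."""
        totals = self.activations(features)
        if not totals:
            raise ValueError("no exemplars stored")
        return max(totals, key=totals.get)
```

If the two dimensions are taken to be, say, burst strength and relative intensity (an assumption of this example), a memory holding one stop token at `(1.0, 0.0)` and three approximant tokens at `(0.0, 1.0)` will label an equidistant token at `(0.5, 0.5)` as an approximant, simply because that category has more stored exemplars. Every new call to `store` changes subsequent categorization, which is the sense in which representations are continually updated by experience.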

There is a growing body of research supporting the proposal that phonological processing by native speakers of a language is closely linked to this optimization process, whereby new input is processed and categorized based upon previously existing clusters and lexical items. Again, frequency will necessarily play a strong role, as more frequently encountered items will coalesce into more robust representations. The more often listeners hear a word the more entrenched that word becomes, as do the sublexical patterns that make up the words themselves (Edwards, Beckman & Munson, 2000).


These patterns are typically referred to as phonotactic probabilities because they encode the likelihood, or probability, that a certain sequence of sounds will appear in a word or in a particular position within a word.
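As a rough illustration of what a phonotactic probability is, the following Python sketch estimates biphone transition probabilities from a toy lexicon. The lexicon, the function names and the choice of a plain conditional-probability estimate are assumptions made for illustration; they are not the measures used in the studies cited here.

```python
from collections import Counter

def biphone_probabilities(lexicon):
    """Estimate P(second sound | first sound) from a list of transcribed words."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in lexicon:
        for a, b in zip(word, word[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def word_score(word, probs):
    """Multiply the biphone probabilities of a word; unattested biphones score 0."""
    score = 1.0
    for pair in zip(word, word[1:]):
        score *= probs.get(pair, 0.0)
    return score

# Hypothetical lexicon, one character per sound.
toy_lexicon = ["kat", "kap", "tak", "pat", "kit"]
probs = biphone_probabilities(toy_lexicon)

attested = word_score("kat", probs)    # built from frequent sequences
rarer = word_score("kak", probs)       # contains the less frequent "ak"
unattested = word_score("tka", probs)  # "tk" never occurs in the lexicon
```

A nonword built from frequent sequences receives a higher score than one containing rare or unattested sequences, which is the kind of gradient difference the acceptability and repetition studies exploit.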

Phonotactic probabilities can affect adult speech production and perception. For example, adults are faster to repeat nonwords with high-frequency consonant-vowel sequences (Vitevitch & Luce, 1999), and listeners are biased towards hearing ambiguous sounds as examples of high-probability sequences (Pitt & McQueen, 1998). Adults also give higher acceptability ratings to words that conform to attested phonotactic patterns (Coleman & Pierrehumbert, 1997; Frisch, Pierrehumbert & Broe, 2004; Munson, 2001), judgments which have also been shown to be sensitive to vocabulary size in native speakers (Frisch & Zawaydeh, 2001). Research on infant speech perception has shown that infants as young as six months are sensitive to the phonotactics of their native language and prefer sounds that occur in their ambient language more frequently to sounds that either do not occur or occur with less frequency (Jusczyk, Luce & Charles-Luce, 1994).

In terms of adult production, Goldrick and Larson (2008) demonstrated that for adult native speakers, phonotactic probability affected production accuracy of nonwords independent of phonetic complexity. For children, Storkel (2001) found that 3- to 6-year-old children learned new words more rapidly when the words contained high-probability sequences than when they contained low-probability sequences. Similar results from other studies led researchers to hypothesize that children with larger vocabularies will have a more robustly generalized phonological system (Edwards, Beckman & Munson, 2001). This hypothesis is based on the notion that representations of frequent, familiar sublexical patterns are more easily accessed during production and less easily shifted by new input, because they are more robustly instantiated.

Under this approach, the grammar emerges epiphenomenally, based upon experience with language-specific distributional information. As learners' experience with the input grows, their vocabulary also expands, which serves to further support the sublexical patterns that occur in the words they already know. Indeed, Edwards, Beckman & Munson (2004) show that vocabulary size in children aged 3 to 9 years is correlated with production accuracy on novel words with high- and low-frequency sequences, supporting the notion that the lexicon emerges from experience with the distributions of sounds in the target language.

Approaches advocating a separation of the lexicon from the grammar have difficulty accounting for the evidence cited above, given that the frequency of a lexical item cannot be arrived at from its phonological properties. Most formal linguistic models of phonology are strictly grammatical and can interact with lexical items only in terms of their grammatical properties (their phonological and morphological properties); lexical frequency is viewed as an idiosyncratic property of the lexical item itself and must be stored along with it. Recent work by Coetzee (2008) represents an attempt at incorporating lexical frequency effects into a model of the grammar by means of lexically indexed faithfulness constraints that assign a single lexical item to different lexical classes on different occasions. Each item is associated with a distinct probabilistic distribution, determined by usage frequency, which ultimately assigns the item to a particular lexical class. Coetzee’s model addresses phonological variation in English associated with t/d-deletion and how usage frequency is related to the deletion rates.


While Coetzee’s model is one of the few that take an Optimality Theoretic perspective and directly incorporate frequency-based phonological variability effects, his proposal still maintains a separation of the grammar and frequency, or ‘extralinguistic’ factors, by means of lexical indexing.

1.4 Experience and acquisition

One element all models of acquisition draw upon, however, is learner experience with the input. In the case of adult L2 learners, experience is generally considered in terms of the native language and/or the amount of exposure to the target language, typically as a function of Age of Acquisition (AoA), amount of target language use and length of residence (LoR). In the case of child L1 learners, experience refers to a combination of age, syntactic and lexical development, given that age is not always a reliable predictor of linguistic development in young children. As learners accumulate experience with the target language, their perception and production of its sound system will shift and new representations may be created or previously created representations may be reinforced.

For adults acquiring a second language, linguistic experience (as a combination of the L1 and the other factors mentioned above) plays a determining role in how the nonnative sound contrasts are perceived and produced. In terms of perception, numerous studies have shown that not all nonnative sounds will be perceived equally, with some discriminated well and others not at all (Best, McRoberts & Goodell, 2001; Best & Strange, 1992; Polka, 1995). The relative ease or difficulty of perception is assumed to depend upon the native language of the listener, whereby sounds that are closer to native language categories will be more difficult to perceive than those that are most different from native categories.

It has been shown that listeners are sensitive to phonetic properties of the target language that may or may not be similar to those of the native phonology. This sensitivity across levels also occurs with respect to noncontrastive variation within categories of the target language sound system (Best & Tyler, 2007). Nonnative listeners are affected by contextual factors (Levy & Strange, 2008) and when presented with phonetically variable target language segments, category goodness ratings shift (Allen & Miller, 2001). The perception of nonnative phonetic contrasts is also affected by native language phonotactic knowledge, whereby native listeners ‘hear’ sounds that repair phonological input to conform with L1 biases (Dupoux, 1997; Hallé, Segui, Frauenfelder & Meunier, 1998). The cumulative results from this body of work indicate that L2 learners can and do perceive noncontrastive information in their second language.

Phonologically, different allophones are part of the same phonemic category and thus are treated as the same perceptual object by native speakers of a language (Jaeger, 1980; Pegg & Werker, 1997; Whalen, Best & Irwin, 1997; Peperkamp et al., 2003; Shea & Curtin, 2005; Kazanina, Phillips & Idsardi, 2006; Boomershine, Currie Hall, Hume & Johnson, 2007).² The explanation for this finding is that because allophones are variants of the same abstract phonological category, listeners are not very good at discriminating between them, an effect of categorical perception. However, recent research suggests that the type of allophonic relationship will have an influence on the discrimination results. Specifically, Celata (2007) showed that allophones in complementary distribution demonstrate the traditional categorical perception effect of poor discrimination, while for allophones that result from neutralization, discrimination scores actually improved.

2 Phonological categories are compared to phonetic categories where the former can contrast lexical items and the latter cannot. Allophones and other sound categories that result from coarticulatory effects are classified as phonetic categories in most approaches to speech development.

Celata (2007) also found task effects for allophone discrimination. L1 Tuscan Italian listeners carried out an AX discrimination task, where allophones were discriminable at the same level as phonemes, and an ABX/2AFC+gating identification experiment, where the allophones were not discriminated at all. These results were in line with other research suggesting a phonological level of perceptual mapping and a second, phonetic level, which depended upon the ISI and surrounding stimuli (Pegg & Werker, 1997). Together, these studies indicate that under certain task conditions and certain allophonic relationships, native speakers do not perceive differences between allophones.

However, this does not necessarily apply to nonnative speakers. As stated, an L1-filter effect may be at play in L2 speech perception that renders cues to target-language sound categories imperceptible. The determining factor is assumed to be the way in which the new sounds assimilate to native language categories. This is addressed by the two main models of foreign language speech perception, the Speech Learning Model (SLM, Flege, 1995, inter alia) and the Perceptual Assimilation Model (PAM, Best, Sithole & McRoberts, 1988, inter alia).

1.5 Phonetic and Phonological Models of L2 Speech Acquisition

As Best and Tyler (2007) state, a main question in the cross-language speech perception literature has been whether learners show perceptual shifts of L2 contrasts they were initially unable to differentiate, or differentiated poorly. The answer is affirmative, albeit with certain caveats. Specifically, perceptual learning does occur for some contrasts, but its success will depend upon the degree of similarity to the L1 and the relative amount of experience the listeners have had with the target language. Interestingly, it appears that the greatest amount of perceptual learning occurs within the first year of exposure to the L2, and no significant perceptual learning differences have been found between adults with under one year of experience and those with 1.5 years (Jia, Strange, Wu, Colander & Quan, 2006).

Two of the better known and tested models of L2 perceptual learning are the Perceptual Assimilation Model (PAM: Best, 1994, 1995, inter alia) and the Speech Learning Model (SLM: Flege, 1995; 2002, inter alia). PAM was developed to account for how nonnative or naive listeners perceive foreign language speech and the SLM was designed to examine L2 speech production by adult second language learners. Crucially, neither PAM nor the SLM is restricted to predictions grounded in L1 phonological categories. Both models address noncontrastive phonetic similarities and dissimilarities between the L1 and L2 phones. In fact, one of the principles of the SLM involves the notion that the actual targets of speech learning are positional allophones.

PAM is founded on the idea that the focus of speech perception is on information about the distal articulatory events that produced the speech signal (Best, 1994, 1995), a position that is compatible with Articulatory Phonology (Browman & Goldstein, 1992). PAM posits that perceivers extract articulatory information from the speech signal, rather than forming categories from acoustic-phonetic cues. In terms of testable predictions, PAM establishes a set of relationships that might result when naïve listeners are asked to discriminate nonnative sounds. Specifically, PAM takes into account how each phone in a contrasting nonnative pair is perceptually assimilated into the most articulatorily similar native phone, as either categorizable (good exemplar of native phone) or uncategorizable (poor exemplar of native phone). PAM predicts that for listeners with no previous experience with the target language, new linguistic sounds will be perceived through the phonological system of the native language, and consequently the phonetic and phonological levels of the target language will be conflated. For L2 learners, the phonological level of the target language is predicted to be key because these learners have access to lexical knowledge and can therefore begin to form contrastive phonological categories (Best & Tyler, 2007).

In terms of experience, both the SLM and PAM share the notion that new sound category formation is possible throughout the lifespan. Importantly, the SLM posits that the L1 and the L2 share a common phonological space. Thus, distinct from PAM, the SLM assumes that perceivers form categories based upon acoustic-phonetic cues, rather than articulatory gestures.

The SLM also predicts that new category formation is more likely for sounds that do not correspond closely to a sound in the native language, stored in long-term memory. For new phonetic categories to be formed, the learner must discern at least some of the phonetic differences between the novel L2 sound and the closest L1 sound. The SLM posits that learners can develop the capacity to perceive nonnative phonetic features over the course of acquisition, but even if the features are in place, the L2 listener may not grant each feature, or cue, the same weight as native speakers of the language do. Thus a ‘new’ L2 category may be based on different features or feature weightings than the corresponding category in a monolingual speaker (Flege, 1995).

An important part of cue-weighting functions involves language-specific biases, and part of the second language learning task is to acquire the correct cue-weighting for the target language categories. It is important to understand precisely how L2 learners weight cues and how their weighting might shift in order to fully grasp how L2 sound category acquisition proceeds. Escudero and Boersma (2004) propose a formal linguistic model of cue-weighting in which adult second language learners incorporate cues that are used to distinguish categories in their target language but are not used in their native language. Escudero and Boersma’s L1 Spanish speakers learned to use the duration cue in their perception of the English /i/-/ɪ/ contrast, whereas their native language relies primarily upon spectral information to distinguish among vowels. Their learners noticed that English vowels were functionally differentiated by the duration cue and, with increased language experience, L1 Spanish speakers learned to differentiate between these two vowels. According to Escudero and Boersma, L1 Spanish/L2 English learners use a general learning mechanism that interacts with language-specific experience.

The SLM, PAM and the model proposed by Escudero and Boersma account for how naïve learners categorize nonnative sounds, how L2 learners will categorize the sounds of their target language and how an OT perception grammar can model this process. The SLM and Escudero and Boersma’s (2004) model claim that learning new sounds will be a process of acoustic cue-weighting, whereby the listener shifts the internal cues that characterize categories in the native language to the correct weighting for the target language cues. For the SLM, the mechanism which carries out this process is not specified. For Escudero and Boersma’s model, it is assumed to be frequency of exposure to the input, i.e., experience.

Related to the work by Escudero and Boersma (2004) is a body of research that examines linguistic cue-weighting outside of the OT framework. These studies have generally examined implicit changes over the course of acquisition (e.g., Bohn, 1995; Cebrian, 2004; Flege, Bohn & Jang, 1997; Morrison, 2006). Work by Francis and colleagues (Francis, Baldwin & Nusbaum, 2000; Francis & Nusbaum, 2002; Francis, Kaganovich & Driscoll-Huber, 2008) considers a role for attention in cue-weighting by adult second language learners. In Francis, Baldwin and Nusbaum (2000), participants were given category-level feedback in identification training tasks, which allowed them to implicitly infer the role of specific acoustic cues. Subsequent analyses revealed that these acoustic cues were weighted more heavily in post-training tasks.

According to Holt and Lotto (2006), cue weighting characterizes how a listener integrates auditory information (and potentially information from other modalities as well) in perceptual categorization. From this perspective, speech categorization is not just a matter of detecting cues but also of assigning the cues the correct weighting function which, according to Holt and Lotto (2006), depends upon the listener’s experience with phonetic distributions. In a recent experiment, Holt and Lotto (2006) trained listeners on a two-dimensional acoustic cue space, where the two cues were the center frequency (CF) and modulation frequency (MF) of frequency-modulated sound waves, i.e., non-linguistic sounds and cues. The two cues were psychophysically matched to be equally discriminable and were equally informative for accurate categorization. In spite of this, in Experiment 1, listeners’ categorization responses reflected a bias for use of one cue over the other (CF>MF). In Experiment 2, Holt and Lotto (2006) changed the informativeness of the preferred cue to more overlapping distributions and the bias was still found. However, when greater variance was provided for the preferred cue in addition to shifts in distributional information, cue-weighting preferences shifted. From this, the authors concluded that cue-weighting can be affected by natural biases and also by shifting distributional information in the input.
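The distributional side of this conclusion can be sketched with an idealized observer that weights each cue by how well it separates two categories. This is a hypothetical Python illustration only: the separability measure, the data and the name `cue_weights` are assumptions, it is not Holt and Lotto's analysis, and it does not model the natural bias toward one cue that they report.

```python
from statistics import mean, pstdev

def cue_weights(category_a, category_b):
    """Normalized separability of each of two cue dimensions.

    Each category is a list of (cue1, cue2) tuples. Separability is the
    absolute difference in category means divided by the average
    within-category standard deviation (an assumed, d'-like measure).
    """
    separabilities = []
    for dim in range(2):
        a = [token[dim] for token in category_a]
        b = [token[dim] for token in category_b]
        spread = (pstdev(a) + pstdev(b)) / 2
        separabilities.append(abs(mean(a) - mean(b)) / spread)
    total = sum(separabilities)
    return [s / total for s in separabilities]

# Hypothetical training sets: both cues equally informative.
cat_a = [(1, 1), (2, 2), (3, 3)]
cat_b = [(5, 5), (6, 6), (7, 7)]
equal = cue_weights(cat_a, cat_b)

# Same category means, but cue 1 now varies three times as much.
noisy_a = [(-1, 1), (2, 2), (5, 3)]
noisy_b = [(3, 5), (6, 6), (9, 7)]
shifted = cue_weights(noisy_a, noisy_b)
```

With the balanced sets the two weights are equal; once the first cue becomes more variable, weight moves to the second cue even though the category means have not changed.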

There is evidence, however, that perceptual learning can proceed differently under explicit conditions. Adult learners, distinct from infants, can be explicitly told what specific aspects of the input to pay attention to. In a study carried out by Guion and Pederson (2007) using stimuli taken from Hindi stop contrasts, a ‘sound-attending group’ was told that even if the beginnings of two words sounded the same, where they referred to lexical items with different meanings, they were in fact two different sounds. The ‘meaning-attending group’, on the other hand, was told that if two words referred to something different, they were different words, whether the difference was perceptible or not. Guion and Pederson’s results showed a 5.7% improvement for the sound-attending group over a baseline pretest and only a 2.6% improvement for the meaning-attending group. The authors interpret these results as suggesting that distinguishing differences between first and second language phonetic categories benefits from explicit attention to phonetic information.


The studies cited above show that learners demonstrate a natural bias towards weighting some cues over others, even in non-linguistic stimuli, as shown by Holt and Lotto (2006), and that attention can lead to an explicit awareness of allophonic phonetic contrasts that may assist in acquisition. These studies are particularly relevant to the results obtained here. In the case of adult L2 learners of Spanish, the stop-approximant alternation is explicitly addressed in classroom instruction, which means that in instructed L2 Spanish contexts, explicit attention to the cues involved (even if not referred to in phonetic terms of release burst/intensity values but rather as phonetic categories) informs learners in advance that they will ‘hear’ two different sounds at the phonetic level, both of which may be initially mapped onto the voiced stop L1 category.

1.6 Overview of the dissertation

The experiments carried out in this dissertation with adult L2 learners address how contextual factors affect the perception and production of the allophones and how experience with Spanish determines the nature of the interaction. In this sense, the contribution of this dissertation to the L2 speech literature is slightly different from previous studies, which have mainly examined how learners categorize target language allophones, whether at a phonological or phonetic level. Instead of examining this categorization process directly, I present perception data that address how allophonic knowledge affects the perception of contextual factors. I also present production data that examine the issue from the opposite perspective: how the contextual factors of stress and word position affect the production of second language allophones. Collectively, these experiments suggest that learners a) use a distribution-based mechanism in acquiring second language allophones and b) create representations that are rich in phonetic detail.

Distribution-based learning involves tracking information in the input that can be subsequently drawn upon when perceiving or producing language. Upon hearing the phonetic cues that indicate one variant consistently occurring in the same context, learners begin to build the distribution for this allophone. Another set of cues will be associated with the other variant, stored as part of a separate context.
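A minimal Python sketch of this kind of tracking is given below. It simply tallies which variant was heard in which context and converts the tallies into conditional probabilities. The context labels, the counts and the function name are hypothetical and are not drawn from the data reported in this dissertation.

```python
from collections import Counter, defaultdict

def context_distribution(observations):
    """Return P(variant | context) from a list of (context, variant) pairs."""
    counts = defaultdict(Counter)
    for context, variant in observations:
        counts[context][variant] += 1
    return {
        context: {v: n / sum(tally.values()) for v, n in tally.items()}
        for context, tally in counts.items()
    }

# Hypothetical input: 20 tokens heard in two contexts.
heard = (
    [("word-initial stressed", "stop")] * 9
    + [("word-initial stressed", "approximant")] * 1
    + [("word-medial unstressed", "approximant")] * 8
    + [("word-medial unstressed", "stop")] * 2
)
dist = context_distribution(heard)
```

Each context ends up with its own distribution over variants, so the same stored counts can support both perception (which context does this variant signal?) and production (which variant belongs in this context?).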

The basic assumption of all distributional, or statistical, approaches to acquisition is that the learner is aware of which units are relevant to the learning process and must be tracked. One way learners do so is by tracking the variability that occurs in the input. Specifically, more variability occurs at transitions between units than within units (Saffran, Aslin & Newport, 1996; Saffran, Newport & Aslin, 1996, inter alia) and learners can exploit these regularities to locate relevant units and also the boundaries between them. Boundaries between categories may be characterized by high variance; low variance indicates that items are members of the same category (Soderstrom, Conwell, Feldman & Morgan, 2009). Given this, variance in the input that creates peaks and valleys can be highly informative to the learner while homogeneity of variance may indicate that a particular analysis can be either abandoned (in the case of speech segmentation tasks) or pursued (in the case of category formation).
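The observation that transitions between units are less predictable than transitions within units can be illustrated with transitional probabilities over a syllable stream. The Python sketch below is a simplified illustration under stated assumptions: the made-up words, the threshold value and the function names are hypothetical, and the procedure is not a reconstruction of the experiments cited above.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next syllable | current syllable) for each adjacent pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold):
    """Place a boundary wherever the transitional probability falls below threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Hypothetical stream built from three made-up words: bidaku, padoti, golabu.
stream = "bi da ku pa do ti go la bu bi da ku go la bu pa do ti bi da ku".split()
tps = transitional_probabilities(stream)
words = segment(stream, threshold=0.75)
```

Within a word every syllable fully predicts the next one, whereas across word boundaries the next syllable is only partly predictable, so the dips in transitional probability line up with the unit boundaries.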

Exemplar-based models are grounded in distribution-based mechanisms. These models assume that all information is stored and can be drawn upon in subsequent categorization and production tasks (Johnson, 2005). However, the accessibility of this information will depend on various factors, such as individual experience and whether sufficient generalizations about the input have been formed. At the earliest stages, such clusters are sparse and as experience increases, representations become more robust.

Following Munson, Edwards and Beckman (in press), I assume that representations are latent variables and cannot be directly observed. Instead, the nature of representations can only be inferred from behavioural patterns. The data presented here will allow inferences to be drawn regarding the types of representations created over the course of L2 phonological acquisition and, furthermore, provide evidence for the type of phonological system that is required to create them.

Examining the acquisition of allophones in complementary distribution allows a closer consideration of issues related to representations and the nature of the phonological system. Because their distribution is predictable and therefore contextually dependent, it is possible to analyze how learners’ perception and production are directly affected by the cues in the input and infer how learners use such information over the course of acquisition. From this, it is also possible to make inferences regarding the phonological system itself, namely whether it is gradient or categorical in nature. For example, unlike phonemes, target-like perception and production of allophones could result from the application of a categorical, systematic rule. The application of this rule may be the end state of acquisition (see Chomsky & Halle, 1968, among others, for this approach) or it could be a byproduct of the learning situation, in which learners apply abstract rules before they have accumulated extensive experience with the language. In other words, rules can be applied to incipient representations that do not necessarily comprise all the details included in representations that exist at later stages.³ These early-stage rules could be a consequence of explicit classroom instruction regarding a particular sound or process.

Alternatively, experience could lead to the formation of a nuanced, context-sensitive generalization applied to detailed representations. On this view, learners do not apply an abstract, categorical rule to acquisition but rather accumulate experience with the input and only subsequently can carry out abstractions (Pierrehumbert, 2003a; Werker & Curtin, 2005).

In the next two chapters I examine how adult L2 Spanish learners acquire the contextual factors that condition the stop-approximant alternation in their target language. Chapter 2 presents data from the perception of stressed syllables by Low and Intermediate level L1 English/L2 Spanish learners, Native Spanish and Monolingual English speakers to investigate how experience with Spanish interacts with the perception of stress, one of the conditioning factors of the allophone alternation. L1 English/L2 Spanish learners must integrate the phonetic cues that serve to distinguish the more stop-like segments from the more approximant-like segments and link the former with word onset, stressed syllable onset position and the latter with word medial, unstressed syllable onset position. I predict that if L2 learners track the distribution of this alternation, they should link stops to stressed syllables in word onset position and approximants to unstressed, word medial position. In Experiment 1, the allophone onset was crossed with vowel stress. In Experiment 2, the allophone onsets were alternated and the vowel was held steady. Results show that less experienced groups were more likely to perceive stressed vowels as stressed, regardless of onset consonant, and to perceive approximant-onset syllables as stressed. On the other hand, listeners with greater Spanish experience were more likely to perceive stress on stop-initial syllables only and were less influenced by the stress quality of the vowel. This pattern follows Spanish distributional information.

3 As will be discussed in Chapter 3, this does not mean that representations change over time. Instead, experience with the input will lead to the creation of denser, more robust representations that still contain all the previously stored information.

These results suggest that learning the interplay between allophonic distributions and their conditioning factors is possible with experience and that contextual factors play a role in second language allophone acquisition.

Chapter 3 includes production data from Low Intermediate and High Intermediate level L1 English/L2 Spanish speakers and five native Mexican Spanish speakers. I examined the use of two cues to the alternation, consonant intensity and the presence of a release burst, and analyzed how these cues varied in participants’ productions in distinct contexts. Results show that the use of these cues differs with experience. That is, learners with greater language experience exhibit cue use that is closer to that of native speakers. Results further suggest that Low Intermediate learners may be using a basic rule for producing the alternation but over time this shifts to a more nuanced production pattern, indicating that more experienced learners’ ability to use these phonetic cues in a native-like fashion emerges over the course of allophone acquisition.

In Chapter 4, I present child production data of the allophonic alternation by children learning Spanish as their native language. The findings support the existence of natural biases in the acquisition of these segments and also show language-specific effects.


In order to account for the data presented in this dissertation, we require a learning mechanism that can track where in the word a particular sound occurs (distributional information) and an account of how this information is subsequently stored (nature of the representations created) so that the learner can draw upon it when faced with similar input. In the final chapter of this dissertation I present a framework for second and first language acquisition that can adequately address these issues, an extension to PRIMIR (Processing Rich Information from Multidimensional Interactive Representations; Werker & Curtin, 2005). The PRIMIR framework is grounded in two observations: first, rich information is available in the speech stream and second, the listener filters that information (Werker & Curtin, 2005). Importantly, PRIMIR offers an explanation for why some information is available at certain stages of development and not at others, thereby offering a comprehensive explanation for developmental patterns, task effects and attentional effects.

In PRIMIR, Werker and Curtin posit the existence of three dynamic filters, which serve to enhance the raw acoustic saliency of information in the input and can also diminish and/or transform information in the input. The first filter is the result of certain evolutionary and epigenetically based biases. For example, infants prefer speech to non-speech, prefer infant-directed speech and point vowels, and can process rhythmic patterns in speech (Werker & Curtin, 2005:212). The second filter that operates on the infant’s language learning is the developmental level. Younger infants will necessarily have fewer cognitive resources upon which to draw when processing language. Finally, the task itself constitutes the third filter and directs the infant’s attention to certain aspects of the input over others.


While PRIMIR was originally designed to account for infant speech development, I present an extension to PRIMIR, PRIMIR-L2, to address the data discussed here. PRIMIR-L2 includes a fourth filter that operates in addition to those mentioned above, the native language, or L1 filter. The L1 filter also serves to direct information pickup and operates on the input together with the other three filters. The L1 filter operates on exemplar representations that store all the information present in the speech stream right from the learner’s first exposure to the target language. The shifting and interaction of the dynamic filters will determine whether or not this information is available to the learner.


Chapter Two: Perceiving the relationship between phonological environment and allophones in a second language: Evidence for distributional learning

2.1 Introduction

Studies of L2 speech perception have primarily explored how target language sounds fit into the sound system of the speaker’s native language. In particular, these studies examine whether nonnative sounds represent new categories, are classified into existing native-language phoneme categories, or are similar to existing allophones. For example, Flege’s (1995) Speech Learning Model (SLM) and Best’s Perceptual Assimilation Model (PAM) (Best, 1994; Best, 1995; Best & Tyler, 2007) emphasize L1/L2 perceptual similarity as predictive of difficulties in discriminating nonnative contrasts. Flege characterizes nonnative phones along a continuum of similarity to L1 phones from “identical” through “similar” to “new”. New phones may be difficult for inexperienced listeners to perceive, but the sound will eventually become differentiated from L1 phones (and other L2 phones) as L2 learners gain experience. Best’s PAM maintains that listeners assimilate nonnative phones into L1 categories based upon a perceived continuum of “category goodness”. If two nonnative phones are considered to be equally good exemplars of the same native category, they will be very difficult to discriminate. If the two phones are perceptually assimilated to the same native category but differ in their perceived “category goodness”, they will be easier to discriminate. Finally, if nonnative phones are assimilated to different native categories, they will be very easy to discriminate. While these models have provided testable hypotheses regarding how second language learners acquire new sounds, they do not explicitly address the effects of context on the acquisition of target language sounds and thus it is difficult to draw clear predictions for the second language acquisition of sounds that differ in the nature of their contrastiveness, such as allophones.

Most models of L2 phonological and phonetic acquisition treat allophone acquisition as similar to phoneme acquisition. In the experiments discussed here, the allophone acquisition task is approached from a slightly different perspective. Instead of focusing on the acquisition of new sound categories and how L2 allophones assimilate into the native language sound inventory, I examine whether learners are sensitive to the contextual factors in the phonological environment that condition the allophones’ distribution. In other words, do learners recognize and store information about the specific context in which each variant occurs? Learners with greater language experience are predicted to be aware of which factors condition the allophonic alternation, and this awareness will shift their perception of the target language. The experiments discussed in this chapter assess how L1 English/L2 Spanish learners perceive and make use of the conditioning factors driving the stop-approximant alternation in their target language.

The experimental task involved identifying stressed syllables based on the onset consonant that accompanies the syllable and/or the position in the word. This indirect method of detecting participants’ knowledge of the allophonic alternation permits a closer consideration of how experience affects the use of contextual cues. The occurrence of either the stop or approximant allophone is contingent upon the phonological environment: where in the word it occurs and whether the syllable is stressed. Given this, it was posited that more experienced learners use stress and word position as probabilistic cues to the stop-approximant alternation. Evidence in favour of this relationship would support the claim that learners associate the contextual factors with the allophonic variants and have fairly sophisticated knowledge about sound alternations.

One way learners might do so is by means of a distribution-based learning mechanism. Researchers have shown that both adult and infant listeners are able to form categories based on the distributions of speech sounds and shift their perception of allophones by using this type of distribution-based mechanism (e.g. Maye & Gerken, 2001; Maye, Werker & Gerken, 2002; Goudbeek, Cutler & Smits, 2006; Holt & Lotto, 2006). A distribution-based learning mechanism can track information in the input, for example, the co-occurrence of sounds, and allows learners to draw upon this information over the course of acquisition. By their very nature, distribution-based models assume learners create and store highly detailed, rich, exemplar-type representations and grammar emerges as the result of generalizations across all the stored items in the lexicon (see, e.g., Goldinger, 1997; Johnson & Mullennix, 1997; Pierrehumbert, 2001b, 2003). In the present case, the learning task for L1 English learners of Spanish involves creating a new distribution for the approximant category, separate from the Spanish stop category that is very similar to their L1 category. However, accumulating the knowledge required to determine how context affects the production of the voiced stops in Spanish requires time and exposure to the target language. The separate categories will emerge only when the learners have stored sufficient examples of each category and the phonetic details that distinguish them.

If adult second language learners track distributional information in the input, they will expect to hear more stops in word-initial, stressed position than in word-medial, unstressed position. Thus, learners will be more likely to perceive stress when the stressed syllable is accompanied by a stop consonant and less likely to perceive a syllable as stressed when it begins with an approximant. Given that it takes extensive linguistic experience to build sound category distributions, more advanced learners will associate the alternant with its most probable context of occurrence; less advanced learners will not do so. The fact that the distribution of the stop-approximant alternation in Spanish is not categorical, other than the post-nasal contexts for stops (and post-lateral in the case of [d]), suggests that acquisition must also be probabilistic in nature. A stressed syllable is much more likely to occur in word-initial position with a stop consonant in the onset than in word-medial position with an approximant in the onset, but the likelihood is still not 100%.

The acoustic correlates of lexical stress in Spanish and English are similar, except that reduced unstressed syllables, which are common in English, do not occur in Spanish. In English, studies have consistently indicated that for disyllabic words, the acoustic correlates of fundamental frequency (F0), intensity, syllable duration and vowel quality are associated with the perception and production of lexical stress (Lieberman, 1960, 1975; Sluijter and van Heuven, 1996; Sluijter et al., 1997). Stressed syllables have higher F0, greater intensity and longer duration than unstressed syllables. In Spanish, stressed syllables also receive greater prominence by means of pitch, duration and intensity, although the difference in quality between stressed and unstressed vowels is very small (Hualde, 2005). Studies by Llisterri and colleagues (Llisterri, Machuca, de la Mota, Riera & Ríos, 2003; Llisterri & Schwab, 2010) show that in Spanish, F0 is the only parameter systematically related to the identification of the stressed syllable of a word, while the role of duration depends on the stress pattern. While in English duration and intensity can function as independent cues to lexical stress, in Spanish only F0 does so. In general, English lexically stressed syllables can be identified in the absence of pitch prominence because of vowel quality and durational differences. In Spanish, pitch is a more fundamental correlate of stress. The experiments presented in this chapter use stressed and nonstressed vowels recorded by a female native Spanish speaker. There are no shifts in vowel quality nor are there any accentual contextual effects at play. While stress is acoustically manifested in different manners in English and Spanish, the same cues are used, with the difference lying in the crucial role of F0 for Spanish.

In Experiment 1, listeners heard CVCV nonce words, crossed for allophone onset and stressed/unstressed vowels, to determine if the perception of stress shifts according to the allophone onset. Following the predictions stated above, learners with knowledge of the relationship between phonological environment and allophones will be more likely to select a syllable as stressed if it begins with a stop consonant and has a stressed vowel than a syllable with an approximant onset and stressed vowel. In Experiment 2, the vowel was equated for stress and only the syllable onsets alternated between stops and approximants. Listeners with more Spanish experience should select stop-onset syllables as stressed with greater likelihood than groups with less Spanish experience, given their increased knowledge of Spanish distributional information.


2.2 Experiment 1: Consonant and Vowel Stress Shift

In Experiment 1, participants were presented with bisyllabic CVCV stimuli and asked to select which syllable they perceived as stressed. Because stress is one of the conditioning factors driving the allophonic alternation, and stress is more likely to co-occur with stop onsets than with approximant onsets in the Spanish input, Spanish-proficient listeners should perceive stress with greater likelihood on stop-initial syllables than on approximant-initial syllables. Stress detection served as an indirect method of determining the perceptual association of stress with stop-onset syllables. This indirect behavioural method circumvents problems with phonetic vs. phonological representations⁴ and also addresses the key question motivating these experiments: are learners aware of the contextual factors that drive allophonic alternations in their target language?

4 Asking listeners for either discrimination or categorization responses may have led learners of distinct proficiency levels to tap into separate levels of representations. Specifically, the L1 Spanish speakers may tap into the allophones at either the phonetic (resulting in two categories) or phonological (one category) level. L2 Spanish learners, on the other hand, could be tapping into two phonetic categories without considering them necessarily as part of the Spanish sound system. At the phonological level, L2 Spanish listeners could be tapping into the stop category without having unified the two allophones representationally. Thus, in behavioural terms, the L1 and L2 groups could be performing in the same way, but with underlyingly distinct motivations.


2.2.1 Method

2.2.1.1 Participants

Participants were 19 Low Intermediate L1 English/L2 Spanish learners and 20 High Intermediate Spanish learners, recruited from second- and third-year university-level Spanish classes. Students were recruited from different sections of the same course (eight distinct instructors). Participants filled out an autobiographical questionnaire regarding their experience with Spanish, which revealed that no participant from either group had spent more than six weeks in a Spanish-speaking country and none spoke Spanish outside of the classroom context. Participants had received explicit instruction regarding the stop-approximant alternation during their class sessions. This alternation is mentioned as ‘softening’ of the ‘b, d and g sounds’ when they occur between vowels, and it is covered in both the lower- and higher-level classes. None had previously taken Spanish phonology or phonetics courses.

Fifteen Native Spanish speakers (NSS) were recruited from the Center for the Teaching of Foreign Languages at the National Autonomous University of Mexico (CELE-UNAM) in Mexico City, Mexico. They were age-matched with the native English speaker learner groups. None of the participants had ever lived abroad, none had attended a bilingual school, and none had more than three hours per week of contact with English. Finally, 15 Monolingual English speakers (ME) were recruited from a university psychology subject pool and were also age-matched with the two learner groups. None of these participants spoke any language other than English. All participants were paid the equivalent of $15.00 for their time or, in the case of the Monolingual English speakers, received course credit. None had any reported hearing difficulties.

2.2.1.2 Stimuli

The stimuli were created from naturalistic speech samples, recorded by a native female speaker of Spanish from Mexico City. Recordings were carried out in a soundproof booth and made directly onto a PowerMac computer (GIA417” Soundcard) using a Sennheiser microphone. The microphone was placed in a stand and maintained at a 45° angle at all times, approximately 5.5 cm from the speaker’s lips. The speech tokens were sampled at a rate of 44.1 kHz with a quantization of 16 bits and saved directly onto the computer’s hard drive.

The speaker was asked to read a list of bisyllabic CVCV nonce words, in which the first syllable was stressed, following the expected stress pattern of Spanish. She was told that the nonwords were ‘Spanish’ and was asked to read them in Spanish. The consonants were [b], [d] or [g] and the vowel was [a]. Using PRAAT 5.1 (Boersma & Weenink, 2008), the consonants were spliced from the vowels to create four separate sounds: stop (word onset), approximant (second syllable onset), stressed vowel and unstressed vowel. For example, the nonce word ‘baba’ [báβa] provided four separate segments: [b], stressed [a], [β] and unstressed [a]. These four sounds were combined to create four different tokens: [báβa], [βába], [baβá] and [βabá]. This procedure was repeated for both [d] and [g], creating a total of 12 tokens. Stimuli ranged in length from 67ms to 78ms.
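The recombination of the four spliced segments can be illustrated with a short Python sketch. It only enumerates the token types as transcriptions; it does not reproduce the PRAAT splicing itself. The names `ALLOPHONES` and `build_tokens` are hypothetical, and the symbols used for the dental and velar approximants ([ð], [γ]) are assumptions made for illustration.

```python
import unicodedata

# Stop -> approximant pairs for the three places of articulation.
# The dental and velar approximant symbols are assumed for illustration.
ALLOPHONES = {"b": "β", "d": "ð", "g": "γ"}


def build_tokens(stop, approximant, vowel="a"):
    """Return the four CVCV token types for one place of articulation.

    The onset order (stop first or approximant first) is crossed with the
    position of the stressed vowel (first or second syllable).
    """
    # Mark stress with an acute accent, composed into a single character.
    stressed = unicodedata.normalize("NFC", vowel + "\u0301")
    tokens = []
    for stress_position in (1, 2):
        for first, second in ((stop, approximant), (approximant, stop)):
            v1 = stressed if stress_position == 1 else vowel
            v2 = stressed if stress_position == 2 else vowel
            tokens.append(first + v1 + second + v2)
    return tokens


# Four tokens per place of articulation, three places: 12 token types.
all_tokens = [t for s, a in ALLOPHONES.items() for t in build_tokens(s, a)]
```

For the bilabial pair, `build_tokens("b", "β")` yields the four forms given in the text, and the full list contains the 12 token types.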

All stimuli were presented to two native English speakers and two native Spanish speakers and judged for naturalness, or how speech-like the sounds seemed to them, on a scale of 1 to 5, where 5 represented ‘extremely natural-sounding’ and 1 was ‘extremely artificial-sounding’. Stimuli that did not originally receive a rating of 4.5 or higher were respliced and presented to the judges again.

In the present case, approximants do not occur, or at least occur very rarely, in word-initial position following a pause, and any stimuli exhibiting this phonotactic pattern will necessarily be deemed unnatural by native Spanish speakers. Asking native speakers to rate stimuli that violate the phonotactic constraints of their native language may be difficult. However, the raters were asked to give a global impression of how ‘Spanish-like’ or ‘English-like’, i.e., speech-like, the sounds seemed to them. Care was taken not to direct their attention to any specific aspect of the stimuli, minimizing the likelihood that they would pay too close attention to the allophone onsets.

2.2.1.3 Procedure

For participants tested in Mexico, the experiment was carried out in a small, quiet room with the door closed. For participants tested outside of Mexico, the experiment was conducted in a soundproof booth. Participants were seated at a table in front of a PowerMac computer and stimuli were presented through headphones at a comfortable volume using the PsyScope experimental software program (Cohen, MacWhinney, Flatt, & Provost, 1993). Participants were told they were going to listen to nonwords in Spanish. For the Native Spanish speaker group and the two learner groups, all communication occurred in Spanish (for the Low Intermediate group, clarifications were given in English where requested). The Monolingual English speaker group was told the nonwords were from a foreign language. Participants were instructed to select the syllable they perceived as stressed by pressing a key on the computer keyboard. The keys were marked with a sticker indicating either ‘1’ or ‘2’. Subsequent tokens were played after the participant made their selection, with an ISI of 1000ms. If no decision was made in 1500ms, the trial timed out and the following trial was played. This occurred in 2.5% of all trials.

Participants were given five pretest trials before beginning the experimental trials. The pretest trials were randomly selected by the experimenter from among the test stimuli and differed for each participant. Because there were technically no ‘right’ answers (listeners were predicted to perceive stress according to either the vowel or the consonant onset, based upon language experience), no feedback was provided during the pretest trials.

All participants completed a stress detection task prior to the pretest trials to ensure the results would not be compromised by an inability to detect stressed syllables. The stress detection task involved listening to a series of 20 nonce words that followed the phonotactic and prosodic constraints of Spanish, read by a female speaker of Mexican Spanish, who was asked to read the words in Spanish. All ‘words’ had the form of nouns (i.e., no infinitives, and words that were analogically comparable to conjugated verbs were avoided; see Appendix A). Participants indicated by pressing keys on the computer keyboard whether they thought stress fell on syllable ‘1’, ‘2’ or ‘3’. Sixteen of the twenty lexical items were bisyllabic, eight with stress on the first syllable and eight on the second. The four remaining items were trisyllabic and stress fell either on the first (2) or the third (2) syllable. Only participants who obtained at least 75% accuracy on this task had their results included for analysis.

2.2.2 Results

2.2.2.1 Stress perception: allophone + stressed vowel

It was necessary to exclude 4 participants from the Low Intermediate group and 6 from the High Intermediate group because they did not reach the 75% criterion on the Stress Detection task. This gave a final total of 15 participants for the Monolingual English, Low Intermediate and Native Spanish speaker groups. For the High Intermediate group n = 14.

Recall the prediction that stress perception for low-level learners would not be affected by the allophone in the onset position of the syllable and instead would only be affected by the vowel. Low-level learners are predicted to perceive stress in accordance with the prominence of the vowel. To test this, a one-way ANOVA was carried out to determine whether there were overall differences among the groups in terms of connecting stressed vowels to one or the other allophone. Group was the independent variable and for the dependent variable, a ratio value was calculated as follows:


\[
\text{Ratio} = \frac{\text{stops + stressed vowel perceived as stressed}}{\text{approximants + stressed vowel perceived as stressed}}
\]

The Native Spanish and High Intermediate groups are predicted to have ratios greater than 1, indicating more syllables with initial stops and stressed vowels were perceived as stressed than syllables with initial approximants and vowel stress. For the Low Intermediate and Monolingual English groups, the ratios are predicted to be around 1, indicating a lack of preference for either allophone. Figure 1 presents the means for each of the groups:

Figure 1. Ratio values for stops + stressed vowel perceived as stressed/approximants + stressed vowels perceived as stressed
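As an illustration of how the ratio measure behaves, the following minimal Python sketch computes it from response counts. The function name `stress_ratio` and the counts used are hypothetical; they are not data from the experiment.

```python
def stress_ratio(stop_stressed_count, approx_stressed_count):
    """Ratio of stop + stressed-vowel syllables perceived as stressed to
    approximant + stressed-vowel syllables perceived as stressed."""
    if approx_stressed_count == 0:
        # The ratio is undefined if no approximant-onset syllable
        # with a stressed vowel was perceived as stressed.
        raise ValueError("ratio undefined for a zero denominator")
    return stop_stressed_count / approx_stressed_count


# Hypothetical counts for two illustrative listeners (not study data).
no_preference = stress_ratio(5, 5)    # no preference for either allophone
stop_preference = stress_ratio(8, 4)  # stress heard more often with stops
```

A value of 1 corresponds to no preference for either allophone, while values above 1 correspond to stress being perceived more often on stop-initial syllables.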

The results from the one-way ANOVA were significant [F(3,56) = 30.85, p < 0.001]. The groups with less Spanish experience had ratios that were close to 1 (Low Intermediate: M = 1.3, SD = .25; Monolingual English: M = .95, SD = .15) while the ratios for the two groups with more experience were significantly greater (Native Spanish: M = 2.78, SD = .122; High Intermediate: M = 2.1, SD = .7). Tukey’s HSD post hoc tests showed significant differences between the Native Spanish speaker group and the two lower proficiency groups, and between the High Intermediate group and the two lower proficiency groups (all ps < 0.05), but not between the High Intermediate and Native Spanish speaker groups. These results show that the more experienced groups perceived stress significantly more often on stop-initial syllables than on approximant-initial syllables, suggesting listeners with more Spanish experience associate stress with the stop allophone.
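For readers less familiar with the test, the F statistic of a one-way ANOVA can be sketched in a few lines of plain Python. This is a generic textbook computation under the usual assumptions, not the software or procedure used for the analysis reported here; the function name `one_way_anova` and the toy values are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of individual scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy ratio values for three hypothetical groups (not study data).
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The p value would then be obtained from the F distribution with the returned degrees of freedom, for instance with a statistics package.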

It is possible, however, that the selection of stressed syllables is also influenced by knowledge of the predominant stress pattern in English and Spanish. To determine whether such a generalization was occurring, I examined whether the selection of stressed syllables aligned with the most frequent stress pattern in Spanish and English.

2.2.2.2 Stress perception: Testing for trochaic bias

Of the 4 829 most frequent polysyllabic words in Spanish, those ending with a vowel followed a trochaic stress pattern 87.5% of the time (Alameda & Cuetos, 1995). In English, the most common word type is bisyllabic with a trochaic stress pattern. Only about a quarter of the words of English are polysyllabic with a weak initial syllable (Cutler & Carter, 1987). Given this, it is possible participants are simply relying on predominant stress patterns of English and Spanish, rather than the allophone or the perception of vowel stress, the two independent variables used in the main analysis.


In order to investigate the proportion of stressed-syllable responses that corresponded to initial stress and determine whether trochaic biases were affecting listeners’ stress perception, I examined the proportion of initial syllables perceived as stressed, independent of the allophone. First, the total number of trials successfully completed by each group was calculated (15 participants * 12 trials = 180 per group, except for the High Intermediate group, for which there were only 168 possible responses, n = 14).

Subsequently, I calculated the proportion of first syllables perceived as stressed,⁵ independent of vowel prominence, followed by the proportion of trials perceived as initial stress with stop allophones. While it is possible that all groups demonstrate a trochaic bias for CVCV forms, I predict that only the groups with greater Spanish experience will show a preference for syllables with stops over approximants in trochaic contexts. The less experienced groups are predicted to be around chance (0.5). Table 1 presents these results:

Table 1. Stress perception on first syllable across groups and onsets

Segment          Measure      NES       LI        HI        NSS

First syllable   Trials       105/174   98/179    95/164    100/176
                 Proportion   .60       .55       .59       .57

Stop onset       Trials       57/87     55/90     59/82     73/88
                 Proportion   .65       .62       .71       .81

5 The totals do not sum to 180 (or 168 for the High Intermediate, with n = 14) because of the 2.5% miss rate. A total of 18 trials were discarded, distributed as follows: Native Spanish speakers: 4; Native English speakers: 6; Low Intermediate L2 Spanish: 1; High Intermediate L2 Spanish: 4.
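The denominators in the first row of Table 1 follow from the number of participants, the 12 trials per participant and the discarded trials listed in footnote 5. A minimal Python sketch of that arithmetic is given below; the variable names are hypothetical, and the group labels follow the abbreviations used in Table 1.

```python
TRIALS_PER_PARTICIPANT = 12

# Final group sizes after exclusions, as reported in the text.
participants = {"NES": 15, "LI": 15, "HI": 14, "NSS": 15}

# Timed-out trials per group, as listed in footnote 5.
discarded = {"NES": 6, "LI": 1, "HI": 4, "NSS": 4}

# Completed trials = possible trials minus discarded trials.
completed = {
    group: participants[group] * TRIALS_PER_PARTICIPANT - discarded[group]
    for group in participants
}
```

The resulting values (174, 179, 164 and 176) match the trial totals shown for the first syllable in Table 1.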


All groups perceived stress on the first syllable at proportions above chance, results which are consistent with the lexical statistics of Spanish and English. To permit adequate comparisons among the groups, I calculated ratio values (proportion stop allophone in σ1 + stressed vowel / proportion approximant allophone in σ1 + stressed vowel) and carried out a one-way ANOVA with group as the independent variable and the ratio values as the dependent variable. There was no significant effect for stress detection on the first syllable (F[3, 55] = 1.97, p > .05). A second one-way ANOVA was conducted with stop-onset syllables in initial position as the dependent variable (also a ratio). The results were significant (F[3, 55] = 22.03, p < 0.001). Tukey HSD post hoc tests revealed significant differences between the Native Spanish speaker group and the other three groups (p < 0.001), but no significant differences emerged among the High Intermediate, Low Intermediate and Monolingual English groups (all ps > 0.05).

These results show that all four groups of listeners show a bias towards hearing trochaic stress patterns, but the Native Spanish speakers demonstrate an additional bias towards perceiving stress on syllables that have stop onsets, consistent with the Spanish distributional information. The High Intermediate group did not differ from the Low Intermediate and Monolingual English groups in their preference for stop onsets in initial position of trochaic nonwords.

This result was somewhat surprising, given that we expected the High Intermediate group to pattern largely along the same lines as the Native Spanish speakers. One explanation may be that the preference for trochaic bias in stress perception overrides any preference for stop onsets, even for groups that have had extensive experience with Spanish.

According to the predictions outlined in the introduction, the groups with greater Spanish experience should be most likely to perceive stress on stop-initial syllables with stressed vowels, regardless of the position in the word. To determine whether this prediction held, I conducted a goodness-of-fit chi-square test on the proportion of syllables perceived as stressed for each of the four possible onset-vowel combinations, reported for each group. If in fact the groups with more experience prefer stops as onsets to stressed syllables, there should be a difference amongst the four combinations, with the different allophone types clustering together for the more proficient learners and the stressed-unstressed vowel factor clustering together for the less proficient learners. These results are presented in Figure 2:

Figure 2. Proportion of syllables perceived as stressed

For the Native Spanish speaker group, preference for syllable stress was not equally distributed, χ² (3, N = 176) = 4.21, p < 0.05. For the High Intermediate group, the same results held, χ² (3, N = 174) = 7.22, p < 0.05. For the Low Intermediate and Monolingual English speaker groups there were no significant differences across the four contexts (all ps > 0.05). The finding that language experience led to significant differences in the perception of stress across the four contexts shows the pivotal role played by this variable in terms of how the allophone drives stress perception in Spanish.
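The logic of a goodness-of-fit chi-square test against an equal distribution over the four onset-vowel combinations can be sketched as follows. This is a generic illustration in plain Python, assuming equal expected frequencies; the function name `chi_square_gof` and the counts are hypothetical and are not taken from the experiment.

```python
def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts.

    Returns the statistic and its degrees of freedom (categories - 1).
    """
    total = sum(observed)
    expected = total / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1


# Hypothetical counts of 'stressed' responses for the four combinations
# (stop/approximant onset crossed with stressed/unstressed vowel).
statistic, df = chi_square_gof([60, 40, 50, 50])
```

The p value is then read from the chi-square distribution with the returned degrees of freedom; counts that are spread evenly over the four contexts give a statistic of 0.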

The results reported in this section show that a native-language preference for perceiving trochaic stress affects both L1 Spanish and L1 English listeners. However, L1 Spanish listeners perceived initial syllables with stops as stressed at a significantly higher rate than the L1 English groups. Thus, the hypothesis that experience determines this particular aspect of stress perception is partially supported.

In the following section I examine which factor (allophone onset or vowel stress) is most likely to affect the stress perception of each group. The previous results show that all groups are affected by a bias towards trochaic stress, which is attributable to their L1, so we now require an analysis that can reflect the likelihood that a particular group will be affected by vowel stress or allophone onset. To this end, a logistic regression analysis was carried out.

2.2.2.3 Logistic regression analysis

Logistic regression allows us to relate the predictor variables to the probability of a given outcome on the dependent variable. A hierarchical logistic regression was used because it allows the testing of each predictor in a cumulative manner and it is better suited to analyses with a small n (Tabachnick & Fidell, 2007).

The three levels used in this analysis were as follows: Stress Vowel (Level 1) included all syllables with a stressed vowel collapsed across allophone; Allophone type and Stressed Vowel (Level 2) included allophone variant and vowel stress (approximant+stressed vowel, e.g., βába; stop+stressed vowel, e.g., báβa); Position (Level 3) included position and allophone (approximants in word-medial position and stops in word-initial position, collapsed across stress).

Table 2. Results of the hierarchical logistic regression for Experiment 1

                                          Odds Ratio⁶ (SE)
Predictor                         NSS       HI        LI        MES

Level 1
  Stressed Vowel                  .542      .688**    1.707*    2.026**
                                  (.196)    (.153)    (.193)    (.223)

Level 2
  Stressed Vowel                  .733      1.116     1.430     1.285
                                  (.314)    (.261)    (.250)    (.243)
  Stop + Stressed Vowel           1.362     .820      .935      .964
                                  (.214)    (.204)    (.196)    (.239)
  Approximant + Stressed Vowel    .593*     .613**    1.207     1.930*
                                  (.264)    (.234)    (.164)    (.899)

Level 3
  Stressed Vowel                  .636      1.026     1.39      .966
                                  (.373)    (.271)    (.332)    (.331)
  Stop + Stressed Vowel           1.369     .888      1.316     .724
                                  (.279)    (.215)    (.269)    (.355)
  Approximant + Stressed Vowel    .849      .663      1.345     1.825*
                                  (.337)    (.284)    (.253)    (.306)
  Approximant Medial              .474      .866      1.145     2.323
                                  (.462)    (.326)    (.357)    (.590)
  Stop Onset                      .545      .649      1.115**   2.093*
                                  (.544)    (.452)    (.711)    (.734)

Significance values are in brackets.

* Wald χ², df = 1, p < 0.01; ** Wald χ², p < 0.05.

6 values greater than 1 indicate likelihoods greater than chance
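As a reminder of how odds ratios and Wald statistics relate to logistic regression coefficients in general, a minimal Python sketch follows. It shows the standard textbook relations only; the function names are hypothetical, and the sketch makes no assumption about how the values in Table 2 were computed or on which scale the bracketed values are reported.

```python
import math


def odds_ratio(beta):
    """Odds ratio for a logistic regression coefficient (log-odds scale)."""
    return math.exp(beta)


def wald_chi_square(beta, se_beta):
    """Wald chi-square (df = 1): squared ratio of a coefficient to its
    standard error, both on the log-odds scale."""
    return (beta / se_beta) ** 2


# Illustrative coefficients (not values from the study).
neutral = odds_ratio(0.0)           # coefficient of 0: odds unchanged
doubled = odds_ratio(math.log(2))   # coefficient of ln 2: odds doubled
wald = wald_chi_square(1.0, 0.5)
```

A coefficient of 0 corresponds to an odds ratio of 1, so odds ratios above 1 indicate increased odds and those below 1 indicate decreased odds.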


Overall, the results indicate that experienced listeners are more likely to rely on the allophones when detecting a stressed syllable. The results show that at Level 1, Stressed Vowel reached significance for the Low Intermediate and Monolingual English speaker groups, suggesting that this predictor was particularly good at identifying members of these groups. For the High Intermediate group, significance was also reached, but the odds ratio is less than 1, indicating that members of the High Intermediate group are less likely to perceive stress on syllables with a stressed vowel than either of the other two learner groups. This predictor did not reach significance for the Native Spanish speaker group. Thus, if participants relied primarily on whether the syllable contained a stressed vowel, they were likely to be members of the Low Intermediate group and the Monolingual English group and less likely to be in the High Intermediate group.

The Level 2 predictors (Stressed Vowel, Approximant+Stressed vowel, Stop+Stressed vowel) demonstrated that if a participant was likely to perceive stress on an approximant-initial syllable, he/she was twice as likely to be a Monolingual English speaker as a member of one of the other speaker groups. This shows that Spanish experience affects the likelihood of perceiving stress on an approximant-initial syllable.

Finally, the results for the Level 3 predictors (Position) showed that the Monolingual English speaker group was significantly more likely than the other three groups to perceive stress on approximant-medial and stop-initial syllables. If we consider first the likelihood of stress perception for approximant-medial syllables, the more experienced Spanish groups should have very low probabilities of perceiving stress in this context. These results were borne out by the logistic regression analysis. Only the group with minimal Spanish experience was significantly more likely to perceive stress on approximant, word-medial syllables. If we consider the results for the stop-initial syllables, the fact that the Monolingual English group was more likely to perceive stress on these syllables independent of vowel stress indicates a strong effect for position on the probabilities of stress detection by this group. This appears to contradict the initial hypothesis that stress perception for the more experienced groups would be driven primarily by the allophone. However, a possible explanation may lie in perceptual biases that drive the Monolingual English listeners to prefer stress on initial syllables (i.e., trochaic stress pattern) to a greater extent than the other three groups. Because they are completely unfamiliar with the sounds that are used in the stimuli, when forced to rely purely upon the onset allophone, the Monolingual English group reverts to the bias they demonstrated above. Specifically, these listeners prefer initial stressed syllables, the pattern that predominates in English.

It still remains to be seen whether the perception of stress is being driven by the onset alone or whether the presence of vowel stress plays a key role. To investigate this, I carried out a second experiment in which only the onset was shifted – whether stop or approximant – and the vowel stress was held steady in both syllables of the bisyllabic nonword. In other words, the vowel was ‘prominence neutral’.

2.3 Experiment 2: Allophone alternation, vowel steady

In Experiment 2 I examined whether shifting the consonant onset influences the perception of stress when the prominence of the vowel remains constant across syllables.


Participants listened to bisyllabic CVCV sequences in which the vowel was held steady and the syllable onsets alternated between the two allophones (e.g., baβa vs. βaba). Thus, the perception of stress is an illusion in this experiment. That is, stress is not explicitly present in the signal, but rather inferable from the presence of a stop onset, provided that the listener is sensitive to the distributional information connecting stress and stops in the Spanish input. The presence of a stop onset is predicted to be one of the cues that native Spanish speakers will use to detect stress in the signal. Other cues may come from the vowel and, together, these cues create the percept of stress for the listener.

In addition to the onset shifts, the duration of the onset consonants was also manipulated in order to test the hypothesis that shorter segments are more likely to be perceived as the onset to unstressed syllables than longer segments. Lavoie (2001) found that in addition to manner contrasts, the length of the allophone served to distinguish stop-like productions from approximant-like productions in a group of native Mexican Spanish speakers. In the current experiment, native Spanish speakers and listeners with more Spanish experience are expected to use their knowledge of Spanish phonotactics to ‘hear’ stress on the syllable with the stop onset and not hear stress on the syllable with the approximant onset. The groups with less Spanish experience will have no such knowledge, or incomplete knowledge, to draw upon and therefore are predicted to perform at or around chance.


2.3.1 Method

2.3.1.1 Participants

The same participants from Experiment 1 took part in Experiment 2.

2.3.1.2 Stimuli

The stimuli for this experiment consisted of CVCV nonwords, created from the same naturalistic speech samples as Experiment 1. However, instead of shifting the vowel [a] between stressed and unstressed values, the vowel was held steady and only the consonant onsets were alternated. A stressed vowel token was taken from the CVCV stimuli used in Experiment 1 and the intensity was adjusted to 75 dB. The F1 value was 806 Hz, F2 was 1628 Hz and the duration was 74 ms. Because stressed syllables are detectable only in comparison to unstressed syllables, by maintaining both vowels equal in terms of vowel duration, intensity and pitch, a ‘stress-neutral’ CVCV item was created.

The duration of the onset consonant was also varied, taking values of 33%, 67% or 100% of the original duration. Finally, the consonants were spliced onto the vowel and counterbalanced for allophone variant. The place of articulation was held constant within each CVCV sequence. This gave 18 possible CVCV combinations. For example, a nonce word with the velar versions of the allophones with the approximant in word-initial position would take the form [γ]A[g]A. Stimuli ranged in length from 171 ms to 201 ms. The 54 stimuli were presented randomly together with the 36 stimuli from Experiment 1 in one block.

As with the first experiment, two native English speakers and two native Spanish speakers were asked to judge the stimuli for naturalness, on a scale of 1 to 5, where 5 represented ‘extremely natural-sounding’ and 1 was ‘extremely artificial-sounding’. Only stimuli rated 4.5 or higher were used for the experiment. The same caveats hold for the explanation of ‘naturalness’ in terms of these stimuli as for the first experiment.

2.3.1.3 Procedure

The same procedure as in Experiment 1 was used. Participants were told that for certain trials it ‘might be very difficult to perceive stress with 100% certainty’ and they should try their best to respond accurately.

2.3.2 Results

A mixed analysis of variance was carried out to determine if there was any effect for the three different onset lengths. Group was the between-subjects variable and segment length (33%, 67% or 100%) was the within-subjects variable. The results revealed a non-significant main effect for group (F[3, 42]=1.8, p>0.05) and a non-significant main effect for the within-subjects variable of length (F[6,126]=2.1, p>0.05). This permitted collapsing across consonant lengths for subsequent analyses.


2.3.2.1 Stress perception: allophone + non-prominent vowel

As with Experiment 1, a one-way ANOVA was run to determine whether participants perceived stress in higher proportions on stop syllables or on approximant syllables. Group was the independent variable and the following ratio measure was the dependent variable:

$$\text{Ratio} = \frac{\text{stop-initial syllables perceived as stressed}}{\text{approximant-initial syllables perceived as stressed}}$$
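As a rough illustration of the ratio measure, the following Python sketch computes it for a single participant. The function name and the counts are hypothetical and are not taken from the experimental data.

```python
def stress_ratio(stop_stressed: int, approx_stressed: int) -> float:
    """Ratio of stop-initial to approximant-initial syllables heard as stressed."""
    if approx_stressed == 0:
        # The ratio is undefined if no approximant-initial syllable was chosen.
        raise ValueError("no approximant-initial syllables perceived as stressed")
    return stop_stressed / approx_stressed


# Hypothetical counts for one participant (illustration only).
print(stress_ratio(stop_stressed=18, approx_stressed=9))  # 2.0
```

Under this measure, a value near 1 indicates no preference between the two onsets, while values above 1 indicate that stress was heard more often on stop-initial syllables.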

There was a significant difference among the means (F[3, 55]=20.18, p<0.001).

Post hoc Tukey HSD tests revealed significant differences between the Native Spanish speaker group and the other three (p<0.01) and the Monolingual English group and the other three groups (p<0.01). There were no significant differences between the High Intermediate group and the Low Intermediate group.

2.3.2.2 Testing for trochaic bias

As with Experiment 1, responses were examined to see whether there was a bias toward perceiving stress on the first syllable of the word. Of the 1566 possible test trial responses, 4% were discarded due to timing out, leaving 1503 responses for analysis. Table 3 gives the proportion of first syllables perceived as stressed, followed by the raw totals, for each group. The totals are then broken down into the percentage of syllables perceived with stress on the first syllable and subsequently, the percentage of stressed syllables that were stop-initial. These data are presented in Table 3.⁷

Table 3. Stress perception on first syllable across groups and onsets

Segment                       NES       LI        HI        NSS

First syllable   Trials       440/746   444/792   397/747   380/785
                 Proportion   .60       .56       .53       .48

Stop onset       Trials       228/440   229/444   215/397   229/380
                 Proportion   .53       .52       .54       .60
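The proportions in Table 3 are simple quotients of the trial counts. The short Python sketch below recomputes them from the counts as printed in the table; the variable and function names are illustrative, and recomputed values may differ from the printed ones in the second decimal depending on rounding.

```python
# Counts transcribed from Table 3 as (numerator, denominator) per group.
first_syllable = {"NES": (440, 746), "LI": (444, 792), "HI": (397, 747), "NSS": (380, 785)}
stop_onset = {"NES": (228, 440), "LI": (229, 444), "HI": (215, 397), "NSS": (229, 380)}


def proportions(counts):
    """Return the proportion numerator/denominator for each group."""
    return {group: num / den for group, (num, den) in counts.items()}


first_prop = proportions(first_syllable)
stop_prop = proportions(stop_onset)

for group in first_syllable:
    print(f"{group}: first syllable {first_prop[group]:.3f}, stop onset {stop_prop[group]:.3f}")
```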

7 The totals do not sum to 405 (or 378 for the High Intermediate, with n=14) because of the 4% miss rate. A total of 64 trials were discarded, distributed as follows: Native Spanish speakers: 25; Monolingual English speakers: 12; Low Intermediate L2 Spanish: 18; High Intermediate L2 Spanish: 9.

To permit adequate comparisons among the groups, I calculated ratio values (proportion stop allophone σ1 / proportion approximant allophone σ1) and carried out a one-way ANOVA. The results did not reach significance (F[1,55]=1.8, p>0.05), possibly because the two higher-proficiency groups clustered together, as did the two lower-proficiency groups. The mean ratio for the Native Spanish speakers was 2.4 (SD = 0.6) and for the High Intermediate group it was 1.9 (SD = 0.7), demonstrating that these participants prefer to associate stress with stop syllable onsets. For the Low Intermediate group the mean was 1.1 (SD = 0.37) and for the Monolingual English speakers the mean was .93 (SD = 0.4), suggesting that participants in these groups did not distinguish between stop and approximant onsets as they related to stress. These results, while not significant, do suggest that the onset allophone interacts with the perception of stress as a function of group membership. Figure 3 presents these ratios:

Figure 3. Stop-initial syllables perceived as stressed/approximant-initial syllables perceived as stressed ratio values

[Bar chart. Vertical axis: Ratio (0 to 3.5). Values by group: NES = 0.93, LI = 1.1, HI = 1.9, NSS = 2.4.]

The results from this section indicate that with more Spanish experience, the onset allophone – whether stop or approximant – can lead to an illusory stress perception effect. The Native Spanish speaker group heard stress significantly more often on stop-initial syllables than on approximant-initial syllables as compared to the other three groups, and the Monolingual English group perceived stress significantly less often than the other three groups when the syllable had a stop in onset position.

2.3.2.3 Logistic regression analysis

As for Experiment 1, a logistic regression was run to determine whether the likelihood of group membership was affected by the perception of stress, this time solely in terms of the allophone onset. The predictors were as follows: Approximant Onset, Stop Onset. There were six syllables with approximant onsets and six with stop onsets in the stimulus set. The predictor variable most likely to affect the likelihood of belonging to the different groups is the perception of stress on a syllable with an approximant. The odds ratio for perceiving stress on an approximant syllable should be close to 1 (chance) for both the Low Intermediate and Monolingual English speaker groups. It should be lower for the two groups with more Spanish experience. Table 4 presents the results from this analysis:

Table 4. Results of the logistic regression for Experiment 2: odds ratios (SE)

Predictor            NSS       HI        LI        MES

Approximant Onset    .547**    .687*     1.986*    1.048**
                     (.175)    (.138)    (.242)    (.207)

Stop Onset           1.07      1.096     1.407     .898
                     (.201)    (.192)    (.277)    (.264)

Standard error values are in parentheses. *p<0.01; **p<0.005 (Wald χ2, df=1)

The approximant allophone predictor reached significance for all groups. The Low Intermediate participants were more likely to select a syllable with an approximant in the onset as the stressed syllable and the likelihood for the Monolingual English group was around chance. The Native Spanish speaker and High Intermediate participants were less likely to select a syllable as stressed if it had an approximant in the onset.
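One common way to read odds ratios such as those for the Approximant Onset predictor is as a percent change in odds relative to 1 (no effect). The Python sketch below applies that standard transformation to the values printed in Table 4; the helper name is illustrative and the calculation is offered as a reading aid, not as part of the original analysis.

```python
# Odds ratios for the Approximant Onset predictor, transcribed from Table 4.
approximant_or = {"NSS": 0.547, "HI": 0.687, "LI": 1.986, "MES": 1.048}


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds relative to an odds ratio of 1 (no effect)."""
    return (odds_ratio - 1.0) * 100.0


changes = {group: percent_change_in_odds(value) for group, value in approximant_or.items()}
for group, change in changes.items():
    direction = "lower" if change < 0 else "higher"
    print(f"{group}: odds {abs(change):.1f}% {direction} than an odds ratio of 1")
```

Values below 1 (NSS, HI) correspond to reduced odds of selecting an approximant-initial syllable as stressed, and values above 1 to increased odds.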

These results add to those from Experiment 1 and further suggest that when vowel stress is not accessible as a direct cue to word stress, listeners with greater Spanish experience follow the distributional information found in Spanish and disprefer approximants as the onset to stressed syllables. The fact that Stop Onset did not reach significance for any of the groups indicates that having stops in onset position is expected by both L1 groups. In other words, none of the groups was more likely than the others to perceive stop onsets as stressed, which follows from the expectation that a natural bias towards preferring stops is at work. It has been extensively documented in the literature on phonology that stops are ideal onsets, whether due to markedness or phonetically grounded motives (Archangeli & Pulleyblank, 1994; Prince & Smolensky, 1993).

These results further suggest that with increased Spanish experience, L1 speakers of English perceive an illusory stress effect, induced by the onset allophone in bisyllabic nonwords. Learners associate the stop allophone with stress and the approximant allophone with absence of stress, but only after considerable experience with Spanish.

2.4 General Discussion

In these experiments, I investigated whether L2 learners connect each allophone to its expected phonological environment and if so, whether language experience plays a role. The results suggest that learners are able to track the distribution of the allophones and over time, they begin to learn the relationship between the allophones and the contexts in which they surface.

One possible way to explain how L2 learners acquire allophones is through distributional learning. This was demonstrated in Experiment 1. As expected, based on the predominant pattern for main stress in both Spanish and English, all four groups showed a preference for perceiving stress on the first syllable. However, upon closer examination, the bias in favour of stop-initial, stressed syllables only occurred with the Spanish-proficient groups. This suggests that participants in the Native Spanish and High Intermediate groups have acquired knowledge about the distribution of these allophones. In particular, these listeners have connected the phonological environment of stress to a stop onset and lack of stress to an approximant onset. This shows that experience with Spanish can actually shift the perception of stress in non-native speakers in the direction of the distributional information found in their target language.

Distribution-based learning mechanisms play a fundamental role in exemplar-based models of phonological acquisition. In these types of models, phonological categories are represented as probability distributions over a mental phonetic acoustic/auditory map (Pierrehumbert, 2003a). Categories emerge as multiple exemplars that are phonetically similar accumulate in the same location on the phonetic map. In the present case, native Spanish speakers would have a large number of exemplars at the coordinates for a voiced bilabial approximant [β] and the voiced bilabial stop [b]. These two categories share many articulatory and acoustic characteristics, in addition to being represented by the same orthographic character.

Results supporting distribution-based learning in native language allophone perception were shown by Maye & Gerken (2001). They demonstrated that the perception of allophonic contrasts can be modified after exposure to an artificial language containing tokens of the allophones with a certain statistical distribution. They tested the perception of the allophonic contrast between voiced [d] and the voiceless unaspirated [t] in English. Adult native speakers of American English were exposed to either a monomodal or a bimodal distribution of tokens lying on a continuum between these two sounds. After exposure, subjects in the bimodal group performed significantly better than those in the monomodal group in a discrimination task, suggesting that the former but not the latter had constructed two separate categories.
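The difference between monomodal and bimodal exposure can be sketched in a few lines of Python. The continuum length, the per-step token frequencies and all names below are hypothetical choices made for illustration; they are not the values used in the study described above.

```python
import random
from collections import Counter

STEPS = list(range(1, 9))  # hypothetical 8-step continuum between the two sounds

# Hypothetical number of tokens presented at each step (both sum to 32).
MONOMODAL = [1, 2, 5, 8, 8, 5, 2, 1]  # one peak in the middle of the continuum
BIMODAL = [2, 8, 5, 1, 1, 5, 8, 2]    # two peaks near the endpoints


def exposure_list(weights, seed=0):
    """Build a shuffled familiarization list with the given per-step frequencies."""
    tokens = [step for step, n in zip(STEPS, weights) for _ in range(n)]
    random.Random(seed).shuffle(tokens)
    return tokens


def peak_steps(tokens):
    """Return the continuum steps that occur most often in an exposure list."""
    counts = Counter(tokens)
    top = max(counts.values())
    return sorted(step for step, n in counts.items() if n == top)


print(peak_steps(exposure_list(MONOMODAL)))  # [4, 5]
print(peak_steps(exposure_list(BIMODAL)))    # [2, 7]
```

Both groups hear the same set of continuum steps and the same total number of tokens; only the frequency with which each step occurs differs.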

Peperkamp, Pettinato and Dupoux (2002) explored the effect of contextual information on distribution-based modifications to native language allophone categories. They exposed native speakers of French to a bimodal and monomodal distribution of voiced/voiceless uvular trills in their native language. For one group, the trills were presented in context, whereby voicing was the result of assimilation to the following sound. The other group was presented with the same voiced/voiceless trill stimuli, following the same distributions, but without the contextual information. The group without context improved their perception of the contrast while the group exposed to the allophones in context did not. The authors argue that the condition without context led to greater improvement because learners were relying more upon phonetic perception while the other group was relying upon phonological knowledge. According to French phonology, both trills are members of the same phoneme class. In other words, the phonetic differences between the voiced and voiceless versions of the uvular trills were perceivable, but when they were in the correct phonological context for the realization of the allophones, this perceptibility was diminished. These results indicate that distributional information, in this case immediate contextual information regarding voicing assimilation patterns, plays a role in perceived non-contrastiveness of allophones. Peperkamp et al.’s (2002) results add to our finding that learners rely upon a distribution-based mechanism in their perception of allophones. Furthermore, our results show that the likelihood of learners’ using contextual cues is a function of language experience.

Another possibility is that listeners are transferring cue use from their L1. In English, the alveolar stops /d/ and /t/ are flapped in word-medial unstressed syllables. /b/ and /g/ do not undergo similar phonological processes. It is possible that the listeners are using their knowledge of flapping in English when perceiving the approximant allophones in Spanish. The evidence presented in this study indicates that this does not appear to be the case, however, as the low-level learners and more particularly the Monolingual English speakers do not show any preference for stop or approximant onsets as stressed or unstressed, indicating that they have not tracked either of these allophones as being more probabilistically related to stress than the other.

Under an exemplar-based model, such effects arise when listeners rely upon information they have stored and probabilistically draw upon when exposed to input. The experienced Spanish listeners have representations that probabilistically associate stress with stop onsets. Their perception is biased towards perceiving stops and stress, yielding phonotactic sequences that are highly probable in Spanish. They are biased against hearing stress on approximant-initial syllables for the same reason. The groups with less Spanish experience have not built up sufficiently dense representations and are thus not biased in one direction or the other.
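The idea that stored exemplars yield a probabilistic association between onset type and stress can be illustrated with a minimal Python sketch. The class, the exemplar counts and the default of 0.5 for a listener with no stored exemplars are all assumptions made for illustration; this is a toy co-occurrence counter, not an implementation of any published exemplar model.

```python
from collections import Counter


class ExemplarStore:
    """Toy store of (onset, stressed) exemplars accumulated from input."""

    def __init__(self):
        self.counts = Counter()

    def add(self, onset: str, stressed: bool, n: int = 1) -> None:
        """Record n exemplars of a syllable with this onset and stress value."""
        self.counts[(onset, stressed)] += n

    def p_stress(self, onset: str) -> float:
        """Estimate P(stressed | onset) from the stored exemplars."""
        stressed = self.counts[(onset, True)]
        total = stressed + self.counts[(onset, False)]
        if total == 0:
            return 0.5  # assumption: no experience means no bias either way
        return stressed / total


# Hypothetical experienced listener: stops mostly stressed, approximants mostly not.
experienced = ExemplarStore()
experienced.add("stop", True, 80)
experienced.add("stop", False, 20)
experienced.add("approximant", True, 10)
experienced.add("approximant", False, 90)

novice = ExemplarStore()  # no Spanish exemplars stored

print(experienced.p_stress("stop"), experienced.p_stress("approximant"))  # 0.8 0.1
print(novice.p_stress("stop"), novice.p_stress("approximant"))            # 0.5 0.5
```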

Language-specific phonotactics have been shown to bias the perception of individual segments. Massaro & Cohen (1983) found that synthetic stimuli that are ambiguous between /r/ and /l/ tend to be perceived as /r/ when preceded by /t/ and as /l/ when preceded by /s/. The authors argue that perception is biased towards segments that yield the legal clusters /tr/ and /sl/, rather than the illegal clusters /tl/ and /sr/. Similarly, Hallé, Segui, Frauenfelder and Meunier (1998) found that illegal onset clusters in French are perceived as legal ones. In particular, illegal /dl/ and /tl/ are perceived as legal /gl/ and /kl/, respectively. This could be part of the explanation for the results from Experiment 2, where more Spanish-proficient listeners demonstrated an allophone-induced ‘stress illusion’.

As language experience increases, listeners are more affected by the contextual cues, or conditioning factors, that drive the allophonic alternation. Specifically, knowledge of probabilistic, distribution-based information allowed our more advanced learners to recognize the factors that condition the allophonic alternation. In the present experiment, context effects – i.e., the onset allophone – actually shifted the perception of stress in the learners with greater Spanish proficiency. Lower-proficiency learners did not demonstrate any such effects. This suggests that adult L2 speech perception shifts over time and becomes sensitive to the phonotactics of the target language. Similar results were obtained by Dupoux, Kakehi, Hirose, Pallier and Mehler (1999), who compared French and Japanese native speaker perception of sequences that were respectively phonotactically legal and illegal in their native language. They found that the phonotactic properties of Japanese (very reduced set of syllable types) drove L1 Japanese listeners to perceive “illusory” vowels inside consonant clusters in VCCV stimuli. French listeners, for whom these sequences were legal, did not. Our results indicate that stop allophone onsets can induce the perception of stress in L1 Spanish listeners and L1 English listeners with high Spanish proficiency, an ‘illusion’ that is consistent with the distributional information found in Spanish, but not English.


According to PAM (Best, 1995; Best & Tyler, 2007), contextual factors will change how the target-language sound is assimilated into native language categories. For example, the Spanish ‘b’ category may be realized as an approximant or as a stop, depending upon the context in which it occurs, and PAM predicts that this will affect the way in which the L2 ‘b’ allophone is assimilated into the L1 ‘b’ category. They will each assimilate into a different L1 category or possibly not into any L1 category at all. Thus, context can play a key role in PAM in terms of phonological categories. However, there is less clarity in terms of how allophones might be assimilated into the L1 on a phonological level. Specifically, L2 listeners may hear the two target-language allophones as separate sound categories, and may even initially want to assimilate them into two completely separate L1 categories (see Boomershine et al., 2007 for an example of this), but their knowledge of target-language orthography or classroom instruction will drive them to classify them as variants of the L1 category. Thus, while the basic prediction of PAM in terms of non-native category assimilation and how context may affect this holds, other factors such as orthography and explicit instruction may override this in the case of literate, adult learners. Moreover, PAM’s assumption that non-native categories are assimilated into native categories under certain conditions means that, presumably, homophonous lexical items will result. Under exemplar-based approaches, fine phonetic details that distinguish between the phonetic realizations of segments from different languages would prevent such assimilation from occurring.

Flege’s Speech Learning Model (SLM, Flege, 1995) provides an account for how L2 learners acquire the sound categories of their target language. According to this model, phonetic characteristics of speech sounds are stored and production targets are taken from these stored memories. L2 speech learning occurs across the lifespan, causing adaptations and changes to the L1 phonological system. For a new category to be learned, the SLM posits that the listener must notice the difference between the new category and the native language categories. All learning involves acquiring positional allophones in the target language – the acquisition of phonological categories is not addressed. Our results lend support to two important hypotheses of the SLM, specifically, that learning is possible and will occur as experience with the target language accumulates and that learners store phonetic details in their representations. However, the claim that phonetic differences drive the formation of new sound categories is less clear in terms of allophone acquisition. In the present case, learners must realize that there are two allophonic variants of the L1 voiced stop category, but that these two variants are in complementary distribution. Thus, we require some sort of mechanism by which differences can be noted, following the SLM, but category unity can still be maintained on an abstract level.

The results of this study suggest that adult second language learners use contextual information in their acquisition of target language allophones: the perception of stress was conditioned by the onset allophone and the position in the word, as a function of language experience. In a broader sense, these results point to the availability of a distribution-based mechanism for adult second language acquisition and further suggest that language experience plays a strong role in how exactly this mechanism is used over the time course of second language acquisition.

2.5 Conclusions


The results presented in this chapter provide evidence for a phonological system capable of tracking distributional information in the speech stream. Furthermore, this distributional knowledge is gradually accumulated, as shown by the different effects for the contextual factors across distinct levels of experience with Spanish: listeners with greater Spanish experience demonstrated an illusory effect for stress, induced by the presence of a stop allophone in syllable onset position. These results speak directly to how contextual factors drive listener expectations regarding the allophone alternation and suggest that learner representations encode statistical information such as co-occurrence likelihoods.

In the following chapter, I present stop-approximant production data from a similar group of L1 English/L2 Spanish learners that speaks to the nature of the phonological system itself.


Chapter Three: Evidence for a non-categorical phonological system: Adult L2 allophone production

3.1 Introduction

In this chapter, I explore another aspect of the phonological system, namely, whether it operates in a categorical or gradient manner. Stop-approximant production data were collected from L1 English/L2 Spanish learners of different proficiency levels in an effort to determine whether their productions reflect the implementation of a categorical rule or instead reflect more fine-grained knowledge of how the allophones are produced.

As stated, in order to acquire an allophonic alternation, learners must connect phonetic cues to the context in which each alternant occurs. In the present case, this involves integrating the phonetic cues that serve to distinguish the more stop-like segments from the more approximant-like segments and linking stops with word onset, stressed syllable onset position and approximants with word-medial, unstressed syllable onset position. The cues considered here are the presence of a release burst and consonant intensity: stop-like allophones will have release bursts and low intensity while approximant-like allophones will have no release burst and higher consonant intensity. In order to form this connection, learners must be sensitive to the contextual factors that condition the alternation.

There are (at least) two ways learners might carry this out. The first involves a rule-based phonological system that leads to categorical acquisition patterns (see Chomsky & Halle, 1968). The second involves a more gradual input-based system that leads to gradient, non-categorical acquisition patterns (Pierrehumbert, 2003a; N. Ellis, 2008). The representations created by each type of phonological system – whether categorical or gradient – will necessarily be different, as will the learning outcomes. If L2 learners are acquiring a rule, then they should treat all three members involved in this allophonic alternation in the same way. If not, asymmetries may emerge, reflecting place of articulation differences.

Evidence for asymmetries in L2 production comes from a number of different studies. For example, Zampini (1994) examined the acquisition of the Spanish stop-approximant alternation by L1 English speakers and found different rates of lenition across the three places of articulation ([d] and [g] were lenited more often than [b]). Other studies have demonstrated that L2 learners often show variation in their substitution patterns as well, related to where the L2 sound occurs in the word (see Brown, 1998, for a feature-based approach to L2 positional effects). For example, learners whose first language bans voiced stops often master the voicing in onsets first while codas remain voiceless (Steele, 2005, for L1 German/L2 English). Such position-sensitive asymmetries have been attributed to markedness (Eckman, 2007; Broselow, Chen & Wang, 1998 for Mandarin L1/English L2), whereby more marked segments emerge in positions that are more common cross-linguistically, and phonetic principles, whereby segments are acquired more readily in positions which favour the phonetic conditions for their realization (Colantoni and Steele, 2007 for L1 English/L2 French).

In the Generative Phonology tradition, phonological knowledge is posited as a series of rules that operate across minimal, abstract representations of lexical items (see, e.g., Sound Pattern of English, Chomsky & Halle 1968) or constraints (e.g., Optimality Theory, Prince & Smolensky, 1993) that operate over possible outputs. Allophones are the product of rule application or constraint interaction. Because they are entirely predictable, allophones are not stored. Only contrastive sounds (i.e., phonemes) are stored in representations and the lexicon is considered fully separated from the rules and constraints that form the grammatical output (i.e., allophones). Learning is assumed to be categorical and systematic – rules are applied across natural classes in a non-gradient fashion. More recently, research has shown that language users are sensitive to non-categorical aspects of the signal. For example, frequency and fine phonetic details have been shown to affect lexical recognition and production patterns cross-linguistically (Frisch, Pierrehumbert & Broe, 2004; Dahan, Drucker & Scarborough, 2009). In the present context, evidence for a categorical phonological system would be across-the-board productions of the alternation, with no differences for place of articulation. If learners demonstrate either differences across place of articulation or for the contextual factors, a more gradient conceptualization of phonology is required, one that is capable of accounting for non-categorical patterns.

Presumably, if learner experience plays a role in connecting the alternant’s cues to its context, proficiency will be a determining factor in adult sensitivity to phonological environment. Proficiency effects could be realized in (at least) two ways. First, it is possible that learners of different proficiency levels show distinct cue integration patterns and do not produce the allophones in the correct manner for the phonological environment – the presence of release bursts should co-occur with stops in word onset and stressed syllable contexts. The highest intensity realizations should co-occur with the lowest proportion of release bursts in word-medial and unstressed syllable contexts and result in approximants. If the cues do not co-occur with the correct phonological environment, this would indicate a lack of phonetic cue and phonological environment integration.

Another possible effect for proficiency could be at the level of the phonological environment cues themselves: the contextual factors of stress and position could have distinct effects on learner production. For example, learners could be more sensitive to position than to stress in their realization of the phonetic cues. This could lead to an asymmetrical, or non-integrated, production of the phonetic cues to the alternation (e.g., they produce higher burst ratios in initial position but intensity ratios are the same across both positions).

Allophone acquisition provides an excellent testing ground for comparing models of categorical and gradient phonological systems because, arguably, learners could be using either a categorical or gradient phonological system to carry out the learning task. In the present context, evidence for a categorical phonological system would be the finding that no differences across place of articulation emerge. This would support a model that allows for phonological encoding of alternations (Chomsky & Halle, 1968). If, on the other hand, learners are using a more gradient system, such differences are predicted to emerge. This would support a model that allows for richer representations that store phonetic details such as place of articulation.


3.2 Experiment: Stop-approximant production data

3.2.1 Method

3.2.1.1 Participants

Three groups of participants took part in this study. One group is classed as Low Intermediate, one as High Intermediate, and the final one is a Native Speaker group. For the Low Intermediate group, 5 L1 English/L2 Spanish learners were recruited from third semester Spanish university level classes. The High Intermediate participants were recruited from fifth semester Spanish classes. They were paid $10.00 for their participation. Participants filled out an autobiographical questionnaire regarding their experience with Spanish. None had spent more than six weeks in a Spanish-speaking country and none spoke Spanish outside of the classroom context. Two members of the High Intermediate group also spoke French. In order to confirm their placement in either the Low or High Intermediate groups, participants were asked to self-rate their Spanish ability on a scale of 1 to 9, where 1 represented ‘my ability on my first day of Spanish class’ and 9 represented ‘my Spanish teacher’. Subsequently, participants were recorded taking part in a 5-minute conversation in Spanish with a speaker who has a near-native level of fluency in Spanish.

The Low Intermediate group had taken two university-level Spanish courses, with a total class time of approximately 67 hours, over eight months of the same academic year. Two had taken one year of Spanish in high school, three years prior to the data collection. The High Intermediate group had taken four university-level Spanish courses, with a total class time of approximately 135 hours, over two academic years. All had taken Spanish in high school, for two years. In terms of the input they received in their Spanish class, their instructors were from Mexico (Mexico City and Guadalajara) and Spain (both from Madrid). These two varieties of Spanish are relatively conservative in their realizations of the stop-approximant alternations and follow the phonological characterization detailed in the introduction. Specifically, stops follow nasals and, for [d], the lateral. Otherwise, approximants are expected intervocalically.

Two native Mexican Spanish speakers who were unaware of the study’s goal listened to the conversation and classified the speakers into two groups, based upon their accent, grammar and speech rate, on a scale of 1 (low) to 5 (high). The ratings coincided with the initial recruitment levels in all but two cases, where one participant was moved to the High Intermediate group and another was moved to the Low Intermediate group. Table 5 presents the results from these two classification tasks:

Table 5. L2 participant biographical data

                     Age at testing       L1 English participants’   Native Spanish speaker
                                          self-rating (/9)           judges (/5)
Group                average   range      average   range            average   range

Low Intermediate     26.6      19–53      2.2       1–3              2.2       1–3
High Intermediate    22.4      21–24      5.2       4–7              4.1       3.3–4.5


The native Spanish speaker group was composed of five female Mexican Spanish speakers, from the central region of Mexico (Mexico City [2], Jalisco [1], Puebla [2]). They all lived in an English-speaking environment at the time of data collection and all spoke Spanish and English. Four of the 5 participants reported speaking Spanish at home and at least 50% of the time outside of the home.

3.2.1.2 Stimuli

In selecting words for inclusion, the following factors were crossed: consonant (b, d or g), following vowel context (i, a, u), position (initial or medial), and stress (stressed or unstressed), yielding 36 words (see Appendix A). The word list included real and nonce lexical items. Where the segments of interest were in initial position (50%, 18/36), fourteen of the lexical items were bisyllabic. Where the segments of interest were in medial position, lexical items were either three or four syllables in length. The segment of interest never occurred in syllable-final position. Recordings were carried out in a soundproof booth and made directly onto a PowerMac computer (GI A417” Soundcard) using a Sennheiser microphone. The microphone was placed in a stand and maintained at a 45° angle at all times, approximately 3.5 cm from the speaker’s lips. The speech tokens were sampled at a rate of 44.1 kHz with a quantization of 16 bits and saved directly onto the computer’s hard drive.
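The crossing of factors described in this section can be checked with a short enumeration. The following Python sketch is illustrative only: the names `FACTORS`, `build_design` and `cells_per_context` are assumptions introduced here, and the code enumerates design cells rather than the actual words listed in Appendix A.

```python
from collections import Counter
from itertools import product

# Factor levels as given in the text; the dictionary keys are illustrative names.
FACTORS = {
    "consonant": ["b", "d", "g"],
    "vowel": ["i", "a", "u"],
    "position": ["initial", "medial"],
    "stress": ["stressed", "unstressed"],
}


def build_design(factors):
    """Return one dict per cell of the fully crossed design."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


design = build_design(FACTORS)

# Number of cells in each stress-by-position context.
cells_per_context = Counter((c["stress"], c["position"]) for c in design)

print(len(design))
print(cells_per_context[("stressed", "initial")])
```

Under these assumptions each stress-by-position context contains nine cells (three consonants by three vowels), and half of the cells place the target segment in initial position.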


3.2.1.3 Procedure

Participants were asked to read three lists of the same words, with semi-counterbalanced order, at a moderate pace, using the carrier phrase Diga ___, por favor or “Say ____, please”. 8 Each participant read the same three lists and the third reading was used for analysis in order to counteract possible novelty effects for the lexical items. Novelty effects occur when words are new to the speaker and may result in a slower, more deliberate reading of the lexical item. Even words that exist in Spanish may exhibit novelty effects for the low-level learners.

All communication with the researcher was conducted in Spanish to avoid possible effects for language mode on the learner groups. However, the self-rating questionnaire was in English, to avoid confusion for the lower-level participants.

3.2.1.4 Phonetic Analyses

Once recordings were made, all target words were labelled using Praat 5.0 (Boersma & Weenink, 2008). Labels were inserted at the following points for each token: consonant onset-offset, CV onset-offset, burst onset-offset (where present) and vowel onset-offset. Both the waveform and the spectrogram were consulted during labelling. The

8 The final vowel in diga may have influenced the production of the following stop-initial word. However, if true, this influence is expected to be in the direction of more approximant-like segments, running counter to the hypothesis that speakers would produce stops in post-pause position.

offset of the previous vowel’s F2 served as the onset of the following consonant and the onset of the following vowel’s F2 served as the offset of the previous consonant (Lavoie, 2001). Where there was doubt, intensity and other formants were also taken into account.

Bursts were identified after a visual inspection of the waveform and spectrogram and were also labelled for their onsets and offsets. A 15 ms Hamming window was used for analyses. Window size for burst measurements was based upon the duration of the burst itself and thus varied from token to token. The labelling procedure served the purpose of allowing scripts to be run on the sound files, guaranteeing accurate recording of the data and also allowing verification of labelling decisions where required. 9 There were a total of 36 tokens per speaker. Figure 4 provides an example:

Figure 4. Spectrogram of gato ‘cat’, labelled at C onset, burst onset-offset, C offset/vowel onset and vowel offset (x-axis: time in s)

9 I thank Titia Benders for writing the Praat scripts.


Recordings were analyzed for consonant intensity and the presence of release bursts. One of the main acoustic features associated with stop production is a noise burst at the moment of release (Kent & Read, 2002). The burst is a very brief acoustic event (10-30 ms in duration) and is the manifestation of the initial release of the air pressure behind the constriction for the stop (Kent & Read, 2002). 10 According to Stevens and Keyser (1989:90), bursts can be interpreted as an enhancing feature of a stop.

Phonologically, bursts are said to be licensed in onsets – they are often missing in syllable codas or in word-final position. Thus, the presence of a release burst indicates a stronger manifestation of the stop and its absence, a weaker segment. Given that there is no closure for approximants, there is no release burst. The implementation of the release burst cue was determined by examining the spectrogram and calculating a ratio of the number of bursts present to the number of possible contexts. There were nine opportunities for burst production in each context (stressed/unstressed x initial/medial).
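The burst ratio described above is a simple proportion. The following Python sketch shows one way it could be computed for a single speaker in a single context; the function name `burst_ratio` and the burst judgments are hypothetical and are not taken from the study's Praat scripts or data.

```python
def burst_ratio(burst_present):
    """Proportion of tokens in one context that show a release burst.

    burst_present: one boolean per token, True where a burst was labelled.
    """
    if not burst_present:
        raise ValueError("at least one token is required")
    return sum(1 for b in burst_present if b) / len(burst_present)


# Hypothetical judgments for the nine tokens of one context (not study data).
medial_unstressed = [True, False, False, True, False, False, False, False, False]
ratio = burst_ratio(medial_unstressed)
print(round(ratio, 3))
```

A ratio near 1 corresponds to consistently stop-like productions in that context, and a ratio near 0 to consistently burstless, approximant-like productions.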

The other phonetic cue is consonant intensity. In traditional phonological approaches (e.g., Mascaró, 1984) the stop-approximant alternation in Spanish has been characterized as a process of sonority-increasing lenition: the less sonorous stop becomes more sonorous through a process of lenition when it surfaces between two vowels. According to Parker (2002), intensity is the most reliable correlate of phonological sonority, a fact which is also noted by Ladefoged (1975:219): ‘The sonority of a sound is

10 Burst intensity values are an acoustic cue to place of articulation (Raphael, Harris & Borden, 2007:150). For the labial stops /p/ and /b/, the bursts have low frequencies while for the alveolar stops, these frequencies are high. Velar stops exhibit more variability in their burst frequency, linked closely to the F2 frequency of the following vowel.

its loudness relative to that of other sounds with the same length, stress, and pitch’, which is based on intensity or the perceived loudness of a sound. Thus, intensity is connected directly, albeit nonlinearly, to the loudness of a sound (Raphael et al., 2007). Because intensity can vary across speakers and also across words with different phonemic composition 11, we used a ratio measurement of consonant intensity/CV intensity.12 The ratio was calculated as follows:

\[ \text{Ratio} = \frac{\text{target consonant intensity (C)}}{\text{target consonant + following vowel intensity (CV)}} \]

Where the target segment has lower intensity, the ratios will be close to zero, indicating the presence of a more stop-like segment. Where the target segment has higher intensity, the ratios will be closer to 1, indicating the presence of a more approximant-like segment.
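The ratio measure can be illustrated with a minimal Python sketch. The function names and the dB values below are hypothetical, chosen only to show the direction of the measure. The sketch also computes the dB difference that footnote 12 mentions as the more usual way of comparing logarithmic values.

```python
def intensity_ratio(c_db, cv_db):
    """Pure ratio of consonant intensity to CV-sequence intensity (both in dB)."""
    if cv_db <= 0:
        raise ValueError("CV intensity must be positive")
    return c_db / cv_db


def intensity_difference(c_db, cv_db):
    """dB difference, the conventional comparison for logarithmic values."""
    return c_db - cv_db


# Hypothetical mean intensities in dB (not measurements from the study).
stop_like = intensity_ratio(48.0, 70.0)
approximant_like = intensity_ratio(67.0, 70.0)

print(round(stop_like, 3), round(approximant_like, 3))
print(intensity_difference(48.0, 70.0), intensity_difference(67.0, 70.0))
```

With these illustrative values the approximant-like token yields the higher ratio, closer to 1, and the smaller dB difference, so the two measures order the tokens in the same way.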

3.2.2 Results

To determine if there were any significant differences between the real and nonce words, a mixed ANOVA was conducted with group as the between-subjects factor and word type (real vs. nonce) as the within-subjects factor. The dependent variable was the

11 Intensity also varies across phonemes. However, given that the segments of interest form a natural class, we assume that inherent intensity will not vary greatly across the three segments.

12 Intensity is measured in dB, which is a logarithmic unit. In order to calculate ratios using logarithmic values, normally one value is subtracted from the other. Given that the objective of this study is to compare productions of cues across proficiency levels, we deemed a pure ratio value sufficient.

average consonant intensity.13 The main effect for word type was not significant overall (F[1,12] = .031, p>0.05, partial eta squared = 0.003). This permitted collapsing across word types for subsequent analyses.

In order to guarantee that each participant only contributed one score to each variable and thus ensure independent error effects (Max & Onghena, 1999), an average for each cue in each context was calculated. For example, to calculate the C-CV intensity and burst production values for the phonological environment of stressed syllables, all occurrences of the segments for each phonetic cue in stressed syllables were counted, regardless of their position in the word. To calculate the C-CV intensity and burst production for word-medial position, all occurrences of the segments in word-medial position were counted, regardless of whether the syllable was stressed or not. Again, the creation of these variables ensured that each subject contributed only one score per context.
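The averaging procedure just described collapses each speaker's tokens into one score per context, ignoring the other factor. A minimal Python sketch of that step follows; the record layout, the function name `context_means` and the ratio values are assumptions made for illustration, not the scripts or data used in the study.

```python
from statistics import mean


def context_means(tokens, value_key="ratio"):
    """Average one speaker's tokens into four marginal context scores.

    Stress contexts ignore position, and position contexts ignore stress.
    """
    contexts = {
        "stressed": lambda t: t["stress"] == "stressed",
        "unstressed": lambda t: t["stress"] == "unstressed",
        "initial": lambda t: t["position"] == "initial",
        "medial": lambda t: t["position"] == "medial",
    }
    return {
        name: mean(t[value_key] for t in tokens if keep(t))
        for name, keep in contexts.items()
    }


# Hypothetical C-CV ratios for one speaker, one token per cell for brevity.
tokens = [
    {"stress": "stressed", "position": "initial", "ratio": 0.80},
    {"stress": "stressed", "position": "medial", "ratio": 0.90},
    {"stress": "unstressed", "position": "initial", "ratio": 0.84},
    {"stress": "unstressed", "position": "medial", "ratio": 0.96},
]
scores = context_means(tokens)
print({name: round(value, 2) for name, value in scores.items()})
```

Each speaker thus contributes exactly one value per context to the group-level analyses, which is the property the averaging is meant to secure.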

The first objective is to determine whether learning is systematic and categorical or if gradient effects are observed. The second objective is to determine whether proficiency plays a role in sensitivity to phonological environment factors. Thus, I examined which cues (if any) best separate the three groups and how to interpret these dimensions of difference in terms of the phonetic and phonological environment cues.

13 We selected the consonant intensity variable because the burst ratio values were generally either very low (for the word-medial positions, where there were few release bursts produced) or very high (i.e., for word-initial position, where there were a high number of release bursts). Thus, an average score for these groups would not have been indicative of their variability.


Because we know the native Spanish speakers produce the alternation, their data can serve as a baseline against which to compare the learner groups.

Discriminant Analysis (DA) is the data analysis method that best serves this purpose. DA allows researchers to determine along which dimensions groups differ reliably and how those dimensions can be interpreted (Tabachnick & Fidell, 2007). The DA was run using the two cues in each of the four phonological environments (stressed/unstressed, initial/medial). This gave a total of eight potentially significant predictors. The grouping variable was formed by the three proficiency levels: Low Intermediate, High Intermediate and Native Spanish speakers. Because each group has an n of five, each run of the DA could only use four predictors (Tabachnick & Fidell, 2007). As a consequence, multiple DAs were run to determine which predictor variables were most important in separating our three groups.

The relative importance of each predictor variable was determined by its structure correlations, or discriminant loadings, which represent the correlation between the predictors and the discriminant functions (Huberty & Olejnik, 2006). The four predictors with the highest structure r’s (all greater than 0.5, p<0.05) were kept. Using these criteria, the following four predictor variables were included in the DA: i) unstressed syllable C-CV intensity ratios; ii) medial syllable C-CV intensity ratios; iii) unstressed syllable burst ratios; iv) medial syllable burst ratios. These four predictor variables that emerged from the DAs are all related to medial position and unstressed syllables. Table 6 presents the descriptive statistics for the data. There were no missing data nor were there any outliers. The correlations are in the small to moderate range and the equality of variance assumption is not violated (Box test, F[20, 516.9] = 48.3, p = .277).

Table 6. Means and standard deviations on the dependent variables for the three groups

                       Group means (SD)
Predictor              Low Intermediate   High Intermediate   Native Spanish Speakers
Unstressed C-CV (1)    .853 (.458)        .925 (.071)         .967 (.04)
Medial C-CV (2)        .849 (.421)        .919 (.092)         .951 (.044)
Medial Burst (3)       .906 (.671)        .763 (.049)         .235 (.083)
Unstressed Burst (4)   .889 (.781)        .711 (.149)         .367 (.165)

To better determine how the four predictor variables separated the three groups, I examined the two linear discriminant functions (LDFs) which emerged as significant.

Table 7 presents these results:


Table 7. Results of Discriminant Analysis for phonetic and phonological environment cues

                       r’s for predictor variables    Within-groups correlations among
                       (discriminant functions)       predictors matrix
Predictor              Function 1   Function 2        Medial C-CV   Medial Burst   Unstressed Burst
Unstressed C-CV        .592         n.s.              .47           .48            .14
Medial C-CV            .408         .358                            .54            .51
Medial Burst           n.s.         .804                                           .16
Unstressed Burst (4)   n.s.         n.s.

Function 1 (Wilks’ Λ = 0.001, p<0.001) accounts for 93.3% of the variance found in the data while Function 2 (Wilks’ Λ = 0.15, p<0.001) accounts for 6.7% of the variance. Function 1 is best defined by C-CV intensity, related to both stress and word position: the intensity of the allophone segments in unstressed (.592) and medial syllable (.408) onsets serves to maximally separate the three groups, with intensity values rising relative to amount of Spanish experience. All three groups are separated maximally by this function. This is consistent with the hypothesis that experience with Spanish will lead to a differentiation in phonetic cue use between word-initial, stressed syllable context and word-medial, unstressed syllable context. Function 2, on the other hand, loads primarily on the positional predictors, that is, burst ratios in medial position (.804) and C-CV intensity in medial position (.358). Function 2 separates the Native Spanish speakers and the Low Intermediate speakers from the High Intermediate speakers. This can be seen in the two-dimensional plot of group centroids provided in Figure 5:

Figure 5. Plot of group centroids
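The share of variance attributed to each discriminant function, and the Wilks' lambda values used to test the functions, are conventionally derived from the eigenvalues of the functions. The Python sketch below shows those standard relationships. The eigenvalues are hypothetical round numbers chosen for illustration, since the thesis does not report them, and the function names are introduced here.

```python
from math import prod


def variance_shares(eigenvalues):
    """Proportion of between-group variance carried by each function."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def wilks_lambda(eigenvalues, first=0):
    """Wilks' lambda for the functions from index `first` (0-based) onward."""
    return prod(1.0 / (1.0 + ev) for ev in eigenvalues[first:])


# Hypothetical eigenvalues for two discriminant functions (not study values).
eigenvalues = [9.0, 1.0]

shares = variance_shares(eigenvalues)
lambda_all = wilks_lambda(eigenvalues)              # functions 1 through 2
lambda_second = wilks_lambda(eigenvalues, first=1)  # function 2 alone

print([round(s, 3) for s in shares])
print(round(lambda_all, 3), round(lambda_second, 3))
```

Smaller lambda values indicate stronger group separation, which is why a dominant first function pairs a large variance share with a lambda close to zero.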

The discriminant analysis presented in this section provides a general picture of the two constructs separating the three groups. The first function in the DA revealed that consonant intensity ratios in unstressed and medial syllables contributed most to group separation. For the second function, position contributed most to group separation. Thus, the three groups are best separated by consonant intensity ratios in the first instance and position in the second. These results suggest that the two learner groups implement the phonetic cues to the stop-approximant alternation, and are influenced by the phonological environment, differently from the Native Spanish speaker group and differently from each other. What the DA did not reveal, however, were more precise details regarding intergroup differences for each predictor. To investigate this, a one-way multivariate analysis of variance (one-way MANOVA) was conducted. The four predictors used in the discriminant analysis (C-CV medial, C-CV unstressed, burst medial, burst unstressed) served as the dependent variables. Group was the independent variable and all tests were conducted at p<.05.

The results for the multivariate test show that overall, the means for the dependent variables are significantly different across the three groups (Wilks’ Lambda = 0.002, F(4,8) = 103.25, p<0.001). The multivariate η2 = .88, indicating that 88% of the variance of the dependent variables is associated with the group factor. Means and standard deviations are presented in Table 6 above. Figure 6 presents the means for the two dependent variables related to C-CV intensity ratios and the two variables related to burst ratios:

Figure 6. MANOVA dependent variables. Two panels plot ratio value (y-axis, 0.20 to 1.00) for the LI, HI and NSS groups: C-CV ratio (medial, unstressed) and burst ratio (medial, unstressed).

The univariate results on the four dependent variables were all significant across the three groups: C-CV medial position [F(2,12) = 332.65, p<0.005, η2 = .78]; C-CV unstressed [F(2,12) = 690.92, p<0.01, η2 = .69]; burst unstressed [F(2,12) = 20.153, p<0.05, η2 = .77]; burst medial [F(2,12) = 135.15, p<0.05, η2 = .61]. To determine where the significant differences between the groups lay, we conducted post hoc analyses on the univariate ANOVAs for the four dependent variables. Tukey’s pairwise comparisons revealed that the Native Spanish speaker group had significantly different mean scores on all four dependent variables in comparison with the other two groups (all ps<0.05). The Low Intermediate and High Intermediate pairwise comparisons were significant for all dependent variables except for burst unstressed (p=.126).
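For readers who wish to see how a univariate F ratio and its η2 arise for three groups of five speakers, the following Python sketch computes a textbook one-way ANOVA from sums of squares. The data are hypothetical and the function name is introduced here; this is not the statistical software output reported above.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, df_between, df_within, eta_squared


# Hypothetical ratio scores, five speakers per group (not study data).
low_int = [0.1, 0.2, 0.3, 0.4, 0.5]
high_int = [0.2, 0.3, 0.4, 0.5, 0.6]
native = [0.6, 0.7, 0.8, 0.9, 1.0]

f_value, df_b, df_w, eta_sq = one_way_anova([low_int, high_int, native])
print(round(f_value, 2), df_b, df_w, round(eta_sq, 2))
```

With three groups of five, the degrees of freedom are 2 and 12, matching the F(2,12) values reported for the univariate tests.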

Conjointly, the results from the DA and MANOVA demonstrate that proficiency affects sensitivity to the contextual factors of stress and position. The three groups produce significantly different cue values overall and across the four variables that serve to best distinguish between them on the DA analysis. They further suggest that learners demonstrate nonsystematic learning effects, given that the two conditioning factors affected the learners of different levels in different ways.


The question remains, however, whether the nonsystematic effects occur across different places of articulation. If speakers are applying a systematic rule to the production of the stop-approximant alternation, such a rule would target a natural class, in phonological terms. Therefore, if learners are applying an abstract rule, there should be few, if any, significant differences across places of articulation. On the other hand, if speakers are drawing upon stored phonetic details when executing the articulatory plan for a specific sound, we expect differences across the three places of articulation.

In order to examine this, a two-factor mixed ANOVA was conducted on the C-CV intensity ratios, with context (stressed, unstressed, initial, medial) as the within-subjects variable and place of articulation as the between-subjects variable. Each group was analyzed separately, since the goal was to see whether differences exist across places of articulation within groups, rather than across groups. There were 60 tokens for each run of the ANOVA (three places of articulation, four contexts, five cases). Figures 7a, b and c show the differences in means among the consonants in each of the four contexts, for the three groups:

Figure 7. Consonant x context for each group

a. Low Intermediate speakers

b. High Intermediate speakers

c. Native Spanish speakers

The results for the Low Intermediate group demonstrate a significant main effect for context [F(3,36)=17.645, p<0.001] but not for consonant [F(2,12)=.107, p>0.05]. Pairwise comparisons (Tukey’s, p<0.05) revealed that this was due to significant differences between initial (M=.83) and medial contexts (M=.87). These results suggest that the Low Intermediate group productions are affected by position but not place of articulation, indicating that a systematic acquisition pattern has emerged for this group.

For the High Intermediate group, there were main effects for context [F(3,36)=58.96, p<0.001] and consonant [F(2,12)=13.8, p<0.001]. A significant interaction between context and consonant also occurred [F(6,36)=2.83, p<0.05]. Subsequent post hoc tests revealed significant differences between b and d/g (p<0.001). Thus, it appears that the High Intermediate group productions demonstrate a more gradient pattern than those of the Low Intermediate group.

Finally, the Native Spanish speaker group productions showed a main effect for context [F(3,36)=68.5, p<0.001] and consonant [F(2,12)=5.03, p<0.05]. There was a significant interaction between the two factors as well [F(6,36)=3.71, p<0.01]. Post hoc tests revealed significant differences between b and g (p<0.05).

The results from the two-factor mixed ANOVA indicate that gradiency in productions across context and consonant emerges with more Spanish experience. The Low Intermediate group may be applying a rule along the lines of ‘b, d and g become softer’ (i.e., more lenited/more vowel-like) when in the middle of the word. Because there were no significant effects across the places of articulation, for this group we can assume that this is due to the systematic effect of categorical and/or explicit learning at this early stage. As speakers gain experience, their productions become less categorical and more gradient. At the beginning stages, learners may be applying a rule to the natural class of voiced stops and only with more experience do they begin to differentiate across the places of articulation. One way to explain this is that learner representations actually shift over the course of acquisition. Another possibility is that representations remain consistent but the way in which learners access the information they contain is subject to developmentally dependent modulation.


3.3 General Discussion

The results from this study show that the answer to the research question of whether proficiency affects sensitivity to contextual factors in adult L2 allophone acquisition is affirmative. The results from the Discriminant Analysis revealed two significant functions separating the three groups. Function 1 loaded primarily on the consonant intensity phonetic cue, in medial and unstressed position. Function 2 loaded primarily on the medial position phonetic environment cue, for both release burst and consonant intensity. Both significant functions that maximally separate the groups are associated with cues that differentiate the phonological environment of approximants from that of stops. They show that significant differences exist across the three groups for the implementation of the contextual factors of stress and position.

The other research question addressed in this chapter relates to the nature of the phonological system and learner knowledge. Specifically, I hypothesized that learning an allophonic alternation could involve either categorical or gradient knowledge. The evidence provided here supports more gradient knowledge, albeit with certain caveats.

The results from the two-way ANOVA for place of articulation and context indicate that detailed phonetic knowledge is stored, as shown by the differences across the places of articulation in the more experienced groups’ productions and the interaction between place of articulation and context. Crucially, this detailed phonetic knowledge does not emerge in learners of lower proficiency. These results support the hypothesis that experience with a language is required in order for such subtle effects to emerge in learner productions. Learners with less experience did not produce the fine-grained differences across place of articulation and context that were observed in the productions of the High Intermediate learners.

I propose that the nature of L2 classroom learning may play a role in accounting for these results. It is quite common in the Spanish second language classroom for instructors to mention that the b, d and g become ‘softer’ when they occur between vowels. In fact, the textbook used by the Low Intermediate learners mentions this rule in an explicit manner, which may explain why their productions were most influenced by position. As for the more proficient learners, they may still have the categorical pattern but it has been enhanced and rendered more gradient by increased amounts of experience.

There has been a great deal of research in adult L2 acquisition on the role of explicit vs. implicit learning, most of which has concentrated on the acquisition of morphosyntax. In general, this research suggests that teaching explicit rules to adult learners can lead to faster integration of these rules in production and comprehension. However, the rule must fulfill certain characteristics – for example, it must be relatively simple and transparent in its application – in order for learners to benefit (for a general discussion of this, see N. Ellis, 2008, inter alia). Given the results observed here, this may hold for phonological acquisition as well. Explicit instruction may lead to categorical effects but implicit learning may be required in order for finer-grained phonetic differences to emerge, such as place of articulation effects. These distinctions may only emerge once the speaker has had experience with the language and can draw upon sufficiently robust representations (Pierrehumbert, 2003a; N. Ellis, 2008). Again, this has parallels in the area of L2 morphosyntax acquisition.


According to models of L2 speech acquisition, a key step in any sort of perceptual learning is the realization that differences exist between the L1 and L2 categories, which is required in order to initiate the acquisition process (Flege, 1995; Best & Tyler, 2007). Again, this can occur implicitly or explicitly. Implicit mechanisms such as cue salience may operate to draw learners’ attention to the new category that must be created. However, as has been well documented, when the target sound is fully assimilated into a native language category, the listener will not necessarily realize that two distinct sounds exist.14

Alternatively, learners could be using an explicit mechanism that draws their attention to the different target language sounds, for example either being told that two sounds are different because they contrast lexical items in the target language (Guion & Pedersen, 2007) or by means of orthography. 15 Semantic and orthographic contrasts have been shown to assist L2 learners with lexically based categorization (Cutler, Weber & Otake, 2006; Escudero, Hayes-Harb & Mitterer, 2008). In Spanish and English, the allophones are represented by the same orthographic symbols, 16 which may impede the formation of separate phonetic categories for each allophone. L1 English speakers are

14 Another implicit mechanism that could be affecting the acquisition of the approximant allophone is phonetic naturalness (Stampe, 1979). As discussed in terms of the Aerodynamic Voicing Constraint (AVC, Chomsky & Halle, 1968), the approximant is more phonetically natural in intervocalic context.

15 See recent work by Bassetti (2006; 2008) for additional evidence that orthography plays a role in L2 acquisition.

16 In Spanish, the orthographic symbols b and v are realized in the same manner phonetically and it is claimed that phonologically they also share a representation. None of the target words had the letter v in them, so this was not relevant to the present analysis.

exposed to the orthographic symbols b, d, g and associate them with their phonetic/phonological equivalents in English, which are voiced stops. In order to overcome this automatic response, adult learners may initially employ a rule. Indeed, the results seem to suggest that learners shift from an abstract, categorical learning ‘rule’ at the early stages of acquisition, which may be the result of explicit classroom instruction, to an implicit mechanism that can track detailed phonetic information across places of articulation. The difficulty with this explanation, however, is the incompatibility of the assumptions regarding the phonological system. We would require two different mechanisms to account for the differences between the two groups and an explanation for how and why they would shift between them.

As an alternative, I propose that all learners – regardless of proficiency level – use the same mechanism and create the same types of representations. However, not all the information that is stored in these representations will necessarily be consistently available to all learners, nor will the representations themselves be equally robust. For example, the Low Intermediate group could be abstracting across representations that do not support place of articulation details. In other words, these learners could be accessing information related to position only. The fact that none of the three groups showed a significant effect for nonce vs. real words indicates that they are generalizing across production patterns to never-encountered forms.

The High Intermediate learners, on the other hand, may be using different levels of information in their productions of the stop-approximant alternation, information that allows them to carry out abstractions that could include place of articulation details. This explanation can also account for why we did not observe differences between the real and nonce words on this production task. The Low Intermediate learners use levels of information in their productions that include positional details, allowing them to abstract from known sublexical patterns (i.e., ‘soften the stops in word-medial position’) to new lexical items. The High Intermediate learners, on the other hand, can use this positional information level and also place of articulation information. These learners have stored information regarding sublexical patterns in Spanish that allows them to support generalizations to new words. The High Intermediate group’s additional experience means more detailed, robust representations can be drawn upon when carrying out the articulatory plan. Thus, learners of different proficiency levels access different information over the course of perception and production.

Further support for the fact that learners of different proficiency levels access different information was shown in the DA results. The three groups are separated along both position and stress environment cues, suggesting learners are storing this information and subsequently drawing upon it. However, proficiency will play a key role in precisely how this information is implemented in production. Learners at the early stages do not connect the phonological environment factors of stress and position to the phonetic cues for the approximant and instead produce similar phonetic cue values across the four contexts.

3.4 Conclusions

In the next chapter, I round out the picture of allophone acquisition by considering production data from two children acquiring Spanish as a native language. In contrast to adult L2 acquisition, children have no previously established linguistic system that might interfere with new learning. Nor, in the case of these children, do they have an orthographic system that may interfere with the acquisition of allophonic alternations. By examining data from children it will be possible to consider a possible input-accuracy interaction and also analyze potential effects for natural biases in stop-approximant acquisition.


Chapter Four: Evidence for detailed representations in L1 acquisition: Frequency and allophone production

4.1 Introduction

In this chapter, I examine how children learning Spanish as a first language produce the stop-approximant alternation and what this might imply for the phonological system and the types of representations created over the process of L1 allophone acquisition. In child phonological acquisition, it is generally assumed that natural biases strongly influence the order of emergence for segments and combinations of segments.

Natural biases are typically related to one aspect of markedness, namely, relative ease of articulation. Vocal tract physiology does not differ from person to person or language to language and thus the mapping between articulation and acoustics leads to the same divisions of the phonetic space, regardless of the language being acquired. Recent research has shown, however, that the ambient language influences when more marked sounds may emerge. For example, in a study examining the emergence of the stop-approximant alternation in Spanish and German monolingual children, as well as bilingual Spanish-German children, Lleó and Rakow (2005) demonstrate that the monolingual Spanish children showed the earliest production of the more marked approximant allophone, suggesting a role for language-specific input. Another recent study by Edwards and Beckman (2008), in which they examined real word repetitions elicited from 2- and 3-year-old children who were monolingual speakers of English, Cantonese, Greek, or Japanese, found evidence in favour of both language-universal effects in phonological acquisition and language-specific effects related to phoneme and phoneme sequence frequency. The authors concluded that common acquisition patterns across languages are the result of universal constraints imposed by the physiology and physics of speech production/perception as well as the influence of individual language effects.

The aim of this chapter is to expand on the findings by Edwards and Beckman (2008) and investigate how natural biases and language-specific facts about the distribution of the stop-approximant allophones in Spanish affect the production of the alternation by young children. The data presented here are from two L1 Spanish-speaking children (2;1-3;1), MG and FC. Their productions of the stop-approximant alternation were analyzed and subsequently compared to data from a Spanish child-directed speech corpus and child L1 Spanish production corpora.

4.2 Phonetic universals and language-specific effects in phonological development

Research on phonological development has long been guided by Jakobson’s hypotheses (1941/1968:41) regarding universal principles, which he calls “implicational laws that structure the phoneme inventories of all languages and also guide child phonological acquisition.” For example, one well-known universal principle states that the presence of voiced and/or aspirated stops in a language necessarily implies the presence of voiceless unaspirated stops. From this, Jakobson predicted that young children will produce voiceless unaspirated stops before they produce either voiceless aspirated or voiced stops, a markedness universal in acquisition. This prediction has largely been confirmed, based upon evidence from English (Macken & Barton, 1980a), French (Allen, 1985), and Spanish (Macken & Barton, 1980b), among other languages.

An explanation for this apparently universal acquisition pattern lies in phonetic groundedness. Researchers have shown that aerodynamic requirements render it relatively more difficult to produce voiced stops because the build-up of oral air pressure during stop closure inhibits voicing even when the vocal folds are adducted. Another example comes from cross-linguistic patterns observed in infant babbling, which contains many more stop consonants than fricatives. Children generally master stop consonants before they master fricatives (e.g., Kent, 1992; Dinnsen, 1992; Smit, Hand, Freilinger, Bernthal, & Bird, 1990; Vihman, Macken, Miller, Simpson & Miller, 1985). Kent (1992) suggests that stops are mastered earlier than fricatives because the primary gesture in their production involves complete closure of the vocal tract, while the production of fricatives and, to an even greater degree, approximants requires greater control of the constriction degree in order to allow sufficient but not excessive airflow. The execution of the articulatory gestures required for fricatives involves precise control, which often lies beyond the abilities of young children.

Markedness can also be context-dependent. For example, there are certain contexts where stops are in fact the more marked segments. One such environment where contextual markedness disfavours stops is the intervocalic position. In this context, approximants are actually less marked. As Smith (2007) states, sonority-increasing lenition is less marked in the intervocalic position where approximants are found. Thus, while stops are universally preferred as syllable onsets, when syllables are in word-internal and post-tonic position, lenition to a more sonorous segment is in fact the least marked option. In the case of the stop-approximant alternation, both prosodic and linear contextual effects motivate approximants in intervocalic, unstressed syllable onsets. Prosodically, smaller gestural magnitude is expected in less prominent positions such as word-medial, unstressed contexts (Byrd, 1996; Cho & McQueen, 2007; Ortega-Llebaría, 2006 for Spanish). Linearly, aerodynamic factors also favour the production of approximants in intervocalic context. According to the Aerodynamic Voicing Constraint (AVC, Chomsky & Halle, 1968:300-301), for voicing to occur, the vocal cords first must have the appropriate degree of tension and adduction and also must have air flowing through them. As Ohala (1994) states, the AVC favours the emergence of certain phonological patterns across the world’s languages, such as the tendency of intervocalic stops to become approximants. This is due to an effort on the part of the speaker to keep the closure duration short, but still avoid devoicing the stop. Excessive shortening, as might occur between two vowels, may lead to an incomplete closure, and a spirant or approximant could result (Ohala, 1994:4).

In addition to the implicational and markedness universals assumption, Jakobson made two further proposals: first, that all children essentially produce the same set of sounds when babbling and only later acquire a language-specific inventory and, second, that there is a discontinuity between early babbling and the sounds produced in children’s first words. These claims have been subsequently countered by researchers working on a wide variety of languages (Vihman et al., 1985; Ingram, 1999), who have shown that the sounds infants use in babbling resemble those sounds that eventually form part of their native language inventory. Thus, contrary to Jakobson’s proposals, it is now recognized that child phonological development reflects universal tendencies and also language-specific tendencies that emerge based upon the input received. In other words, universal effects can be modulated by language-specific input. Thus, as with all children acquiring the sound system of their language, Spanish-speaking children acquiring the stop-approximant alternation are subject to two conflicting pressures. One is direct and results from universal phonetic and perceptual constraints imposed by the human speech system at its early stages of development; the second is attributable to the way in which language-specific lexical and frequency effects drive the emergence of more marked segments (Edwards & Beckman, 2008).

If natural biases play the determining role in child acquisition of the stop-approximant alternation, the two children studied here are predicted to initially produce more stops than approximants, given that approximants are more articulatorily difficult. If language-specific patterns play the determining role in acquisition, the children’s production data will reflect the input frequencies and approximants will appear with little substitution asymmetry occurring. This would imply little to no role for markedness universals.

There is also a third possibility, which follows the results found by Edwards and Beckman (2008). Specifically, there could be effects for both natural biases and language-specific input. If these two factors interact, the children could demonstrate an asymmetry in favour of the stop allophones, reflecting natural biases, but the way in which the approximants emerge could be either categorical or gradient. If the approximants emerge in a categorical fashion, this would lend support to a categorical phonological system in child L1 allophone acquisition.


By categorical emergence of the approximant allophones I do not mean that the children will produce all the approximant targets accurately at a uniform point in development. Instead, I mean that there will be no relationship between target-like productions of the approximant allophone and what are considered extragrammatical factors, such as frequency in the input or the child’s own output. For example, in more traditional constraint-based models of phonological development, acquisition occurs via the re-ranking of universal constraints over the course of development. Most models assume that re-ranking is a consequence of an error-analysis process that drives constraints either up in the ranking or down, according to the number of violations the particular constraint incurs for the input provided (Levelt & van de Vijver, 1998; Boersma & Levelt, 1999; Boersma & Hayes, 2001).

On the other hand, if the emergence of target-like allophone production does demonstrate frequency effects, this could be taken as evidence for a non-categorical phonological system. Specifically, gradient effects in the emergence of the approximants would suggest that the phonological system is sensitive to information of this type and creates representations that can encode it.

Given the conflicting pressures of universal constraints and language-specific effects, I predict that the third possibility will actually be observed in the data: an interaction between natural biases and language-specific effects. Spanish-speaking children will tend to produce stops in contexts where approximants are expected and not vice versa, consistent with the universal preference for stops over continuant segments, but approximants will emerge first in words that are frequent in the child’s lexicon or in the input distributions. Sounds or sequences of sounds that appear frequently afford the child many opportunities to perceive and produce the patterns, facilitating the mapping between perception and production and eventually leading to stronger representations that can be abstracted away from the specific word context (Edwards, Beckman & Munson, 2004).

In the case of allophones, children with incipient lexicons may not be aware of the relationship between the two non-contrastive sounds and may treat them as two separate categories. As the size of the lexicon increases, however, connections between allophones begin to emerge, in terms of articulation and/or perception, and children come to recognize that the two sounds are not contrastive in their language. This relationship may emerge in a piecemeal fashion. Gradually, as lexical knowledge builds, connections form, allowing generalizations based on the similarities shared by the allophones. This process of generalization may not occur until the child learns to read and recognizes the shared orthographic symbol for both allophones.

The stop-approximant alternation provides an ideal testing ground for theories related to the role of universals and language-specific patterns in language acquisition. As stated by Beckman and Edwards (2008), language-specific sound changes generally reflect universal tendencies, e.g., lenition of stops between vowels (Ohala, 1994), and how children acquire these language-specific patterns helps us to understand the nature of the phonological system and the representations created.


4.3 Corpus I: recorded data

The data comes from two L1 Spanish children living in Calgary, Alberta. Child 1, MG, male, was 2;4 and Child 2, FC, also male, was 2;0 at the beginning of the recording. The children were recorded over an eight-month period from October 2005 to June 2006.

The corpus was recorded at a private, home-run, Spanish-speaking daycare centre in Calgary, Alberta, Canada. The children attended the daycare four half-days a week, from 9 to 1 pm. The daycare workers spoke only Spanish to the children and all books, songs, games and activities were in Spanish as well. On any given day there were between 4 and 6 children at the daycare centre, between the ages of 2;0 and 3;8, approximately. The linguistic background of the children represented a mixture of monolingual Spanish (3), bilingual Spanish-English (1) and monolingual English (2). However, the recordings analyzed for this paper were of monolingual Spanish-speaking children exclusively.

Recordings of each participant were made on average once every two weeks, for about 45-60 minutes each session. They were recorded directly onto a laptop computer from a portable microphone and subsequently transcribed using PRAAT 5.1 (Weenink & Boersma, 2007). All words were transcribed. Stops are predicted to occur in word-initial, post-pause position and the approximants are predicted to surface everywhere else, following standard phonological and phonetic descriptions of Spanish (Hualde, 2005).


4.3.1 Participants

Child 1, MG, is from Bogotá, Colombia and lived there until he was 2;3, when he moved with his family to Calgary, Alberta. The recordings were made when MG was between 2;4.5 and 2;11.26 years of age. MG attended the Spanish-speaking daycare centre four half-days a week, during which time he was exposed to 100% Spanish with the daycare workers and about 80% Spanish with the other children at the daycare centre. The other 20% was Canadian English. MG has one older brother who was in Grade 1 and learning English at school. At home, MG spoke only Spanish with his mother, stepfather and older brother. He watched TV and videos in English, however, and had books in English as well. Spanish was the language of communication 100% of the time in the house and the language of entertainment (TV, videos, books) about 60% of the time. According to parental reports, MG did not comprehend or produce any English words at the time the recordings began or ended.

Child 2, FC, was 2;0.11 when recorded for the first time. FC was born in Barranquilla, Colombia and moved with his mother and father to Calgary, Alberta when he was 1;10. FC attended the same Spanish-speaking daycare centre as MG. Spanish was the only language used at home, by both parents, and FC had access to books, videos and television in that language as well. He is an only child.


4.4 Presentation of the data

For the data analysis, only singleton productions of /b/, /d/ and /g/ targets were considered, where they did not follow a homorganic nasal or, in the case of /d/, a lateral. In Colombian Spanish, the target variety of the participants in this study, highland and coastal speakers tend to produce stops after all consonants, where speakers of other varieties of Spanish produce stops only after homorganic sonorants (Hualde, 2005). Given this particularity of Colombian Spanish, all stop-stop sequences were counted as obligatory for stops. Consequently, these contexts were not part of the data analyzed.

Where the same word was repeated over the course of the same session, each individual production was counted. For example, if the child produced amigo ‘friend (masc., sing.)’ with an approximant twice and a stop once, it was counted as three tokens of the word, two approximants and one stop.

4.4.1 Data

Data from MG is presented first. Over the 33-week span during which the recordings were made, he was between 2;4.5 and 2;11.26 years old. MG was recorded a total of twelve times. Table 8 presents a breakdown of allophone distribution across the tokens produced by MG. The first column contains the total number of tokens with /b/ /d/ /g/ as singletons, in either word-initial or word-medial position. Column two includes the total number of tokens with stop targets, that is, with one of the three target sounds in word-initial position where the stop is the expected allophone. Column three comprises the total number of tokens with approximant targets, that is, with one of the three sounds in word-medial position where the approximant is the expected allophone. Column four contains the actual number of stops that were produced in the expected position and in a target-like manner. Finally, column five presents the same information for approximant targets. For all columns, the percentages reflect the number of productions for that specific category divided by the target number in the entire corpus.

Table 8. MG’s productions

Total number      Tokens with       Tokens with       Tokens realized    Tokens realized
of tokens with    stop targets      approximant       with target-like   with target-like
/b/ /d/ /g/       (word-initial)    targets           stops              approximants*
targets                             (word-medial)

209               124/209 (59%)     85/209 (41%)      119/124 (96%)      51/85 (61%)

*The other 34 targets were all realized as stops. Only two stops were realized as approximants; the other three were eliminated completely.

The data presented in Table 8 shows that MG produced more words with stop targets than with approximant targets (59% vs. 41%) and that the stop targets were produced in a target-like fashion far more often than the approximants (96% vs. 61%). The asymmetry observed in the target-like realization of the allophones supports the prediction that natural biases play a role in MG’s phonological development.

FC was four months younger than MG when recording began (2;0.11) and even though he was recorded a total of thirteen times, only eight sessions produced data usable for the present study. Over the 16 weeks covered by the recordings, FC produced a total of 71 tokens with the target segments. This data is presented in Table 9:

Table 9. FC’s productions

Total number      Tokens with       Tokens with       Tokens realized    Tokens realized
of tokens with    stop targets      approximant       with target-like   with target-like
/b/ /d/ /g/                         targets           stops              approximants*
targets

71                28/71 (39%)       43/71 (61%)       22/28 (80%)        28/43 (65%)

*Ten of these targets were realized as stops, five were realized as fricatives.

FC produced fewer words with stop targets, but still had a much higher accuracy rate with that allophone than with the approximant target (80% vs. 65%). These results also support the hypothesis that universal factors are influencing the production of the allophones for FC.

To summarize, it appears that there is a notable asymmetry in the production of the stop-approximant alternation. Both children produced stop targets more accurately than approximant targets, in spite of the fact that approximants made up 41% of the targets in MG’s data and 61% in FC’s data, indicating that natural biases may be playing a role in their production patterns. Examples of productions where the stop and approximant targets are produced as target-like by each child are shown in (2):


(2) a. MG: Target-like stops

Target form             Child Form        Adult Target      Age       Gloss
bien                    [bién]            [bién]            2;4.21    ‘good’
dos                     [dos]             [dos]             2;4.21    ‘two’
gané                    [gane]            [gane]            2;6.3     ‘(I) won’
voy a comer             [bói a komer]     [bói a komer]     2;6.3     ‘I am going to eat’

b. FC: Target-like stops

Target form             Child Form        Adult Target      Age       Gloss
vaca                    [baka]            [baka]            2;2.22    ‘cow’
dos                     [doh]             [doh]             2;2.22    ‘two’
gorra                   [gola]            [gorɸa]           2;3.13    ‘cap’
basura                  [basuta]          [basuȎa]          2;6.4     ‘garbage’

c. MG: Target-like approximants

Target form             Child Form        Adult Target      Age       Gloss
avion                   [aβión]           [aβión]           2;4.5     ‘plane’
sabes                   [saβes]           [saβes]           2;4.5     ‘(you) know’
yo ya te diji (dije)    [teðixi]          [teðixi]          2;6.24    ‘I already told you.’

d. FC: Target-like approximants

Target form             Child Form        Adult Target      Age       Gloss
a bajar                 [aβaga]           [aβahaȎ]          2;2.23    ‘to go down’
caballo                 [aβaidȣo]         [kaβaidȣo]        2;6.4     ‘horse’

The data in (2) provides examples of productions that conform to the predictions for Spanish: stops will occur in word onset and approximants in word-medial position. However, as shown in Tables 8 and 9 above, there are notable asymmetries in the non-target-like productions of the two children. Specifically, stops are substituted for approximants at a much higher rate than approximants are substituted for stops. The data in (3) and (4) provides examples from MG and FC’s productions where such substitutions occur:

(3) a. MG: Stop substitutions for approximants

Target form                Child Form             Adult Target            Age       Gloss
tu tengas fuego (juego)    [fuégo]                [fuégo]                 2;4.5     ‘(You) have (a) game.’
caballo                    [kabaiȭo]              [kaβaiȭo]               2;6.3     ‘horse’
yo estoy buscando          [ȭo ehtóibuhkan̪do]     [ȭo ehtóiβuhkando]      2;6.24    ‘I am looking.’

b. FC: Stop substitutions for approximants

Target form                Child Form             Adult Target            Age       Gloss
yogúr                      [ȭogu]                 [ȭoγu]                  2;2.22    ‘yogurt’
víbora                     [biboȎa]               [biβoȎa]                2;3.13    ‘snake’
abajo                      [abáo]                 [aβaho]                 2;5.14    ‘below’

(4) a. MG: Approximant substitutions for stops

Target form                Child Form             Adult Target            Age       Gloss
bicicleta                  [βisiket]              [bisikleta]             2;4.5     ‘bicycle’

b. FC: Approximant substitutions for stops

Target form                Child Form             Adult Target            Age       Gloss
                           [ðoilawelta]           [doi lawelta]           2;6.3     ‘I turn (it) over’
voy a quitar ...           [βoi a kita]           [boi a kitaȎ]           2;6.3     ‘(I) am going to get rid of...’
banca                      [βaŋga]                [baŋga]                 2;2.22    ‘bench’

In the contexts where MG substituted the incorrect allophone, there was a notable asymmetry in his substitution pattern. Stops were overwhelmingly substituted for approximants and not the other way around. The example provided in (4a) was the only token where he substituted an approximant for a stop, or indeed where stops were not produced in a target-like fashion. The alternative substitution happened 33 times.¹⁷ This clearly skewed substitution pattern provides support for universal principles in MG’s phonological development, favouring stops over approximants and stops in word-initial or post-pause position.

Interestingly, FC has a slightly higher rate of target-like approximant production (65%) than MG. This is notable because he is younger than MG and has a smaller vocabulary. Given his age and smaller lexicon, the expectation is that FC would substitute more stops for approximants and consequently have a lower accuracy rate. One possible explanation may lie in the specific lexical items FC produced. When analyzing frequency effects in phonology, researchers typically distinguish two ways of calculating frequency: type frequency, the number of distinct lexical items, and token frequency, the total number of lexical items produced. FC produced 50 lexical types and 71 lexical tokens, for a type:token ratio of 0.7. MG had a similar type:token ratio (141 lexical types/209 tokens, or 0.67). However, there was a marked difference in precisely how the stop and approximant targets were distributed among the tokens produced by the two children: 61% of the tokens produced by FC had approximant targets, whereas for MG this total was only 41%. Thus, FC attempted proportionately more words with approximant targets, which afforded him more opportunities to practice the articulation of this allophone across fewer words. This could explain why FC demonstrates higher accuracy rates for approximant target words, in spite of his younger age, smaller lexicon and the relative difficulty of these segments as compared to stops.

17 For MG there were no other substitution patterns that occurred for the target segments. For FC there were other patterns, which are discussed below.
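The arithmetic behind this comparison can be checked with a short sketch. It uses only the counts reported above and in Tables 8 and 9; the function and variable names are illustrative and are not part of the original analysis.

```python
# Counts reported in the text and in Tables 8 and 9.
counts = {
    "FC": {"types": 50, "tokens": 71, "approximant_targets": 43},
    "MG": {"types": 141, "tokens": 209, "approximant_targets": 85},
}


def type_token_ratio(types: int, tokens: int) -> float:
    """Number of distinct lexical items divided by total tokens produced."""
    return types / tokens


def summarize(child: str) -> dict:
    """Return the type:token ratio and the share of approximant targets."""
    c = counts[child]
    return {
        "type_token_ratio": round(type_token_ratio(c["types"], c["tokens"]), 2),
        "approximant_share": round(c["approximant_targets"] / c["tokens"], 2),
    }


for child in counts:
    print(child, summarize(child))
```

The two children come out with nearly the same type:token ratio, while the share of approximant targets differs by roughly twenty percentage points, which is the contrast the explanation above relies on.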

There were examples of truncations in FC’s data that resulted in contexts for the stop allophone. These are presented in (5):

(5) FC: Stop-approximant substitutions in initial position as a result of truncation:

Target form    Child Form    Adult Target    Age       Gloss
abuelo         [béo]         [aβuélo]        2;2.22    ‘grandfather’
abajo          [bao]         [aβaho]         2;2.22    ‘below’
caballo        [baiȴo]       [kaβao]         2;7.18    ‘horse’

In these examples, FC truncated the initial weak syllable, leaving one of the three target segments in word-onset position. The target sounds were originally in word-medial position, the target for approximants, but after truncation was applied they became word-initial and FC produced them as stops. Based upon this data, it appears that FC has generalized across positions and produces all word-initial segments as stops.

In Inkelas and Rose’s (2008) study of E., the child passed through a stage during which he was neutralizing plosives in prosodically strong positions to velars and producing glides instead of laterals in these positions as well. E. produced velars in the onsets of primary and secondary stressed syllables and word-initially, even in unstressed syllables, which the authors took as evidence that E. had grammaticalized positional hardening, or strengthening, in these positions.¹⁸ Grammaticalization of positional hardening requires abstract and presumably innate knowledge of syllable onsets in addition to what constitutes the ‘ideal’ syllable onset. It is possible that FC’s productions are also subject to a similar constraint, whereby target segments that are approximants are hardened to stops when they occur in strong, i.e., word-initial, position. However, the examples in (6) show that this stop-initial generalization does not hold across the board:

(6) FC: Other substitutions:

Target form    Child Form    Adult Target    Age       Gloss
verde          [fede]        [beȎde]         2;2.22    ‘green’
juguete        [feke]        [huγete]        2;2.22    ‘toy’
abajo          [vako]        [aβaxo]         2;2.22    ‘below’
arriba         [aviva]       [arɸiβa]        2;5.23    ‘above’
               [viva] [vivía]

In the first example from (6), verde ‘green’, FC substitutes a voiceless labiodental fricative for the target bilabial stop. He makes a similar substitution in the following example, juguete, where the initial syllable is truncated and substituted by [f]. There are two possible explanations for this substitution pattern in juguete. FC could be truncating and maintaining the [+cont] feature of the /h/ onset or, alternatively, he could be maintaining the [+cont] feature from the target velar approximant. It appears that the first explanation is more accurate, as the subsequent syllable has /k/ as the onset, which may result from fusion of the velar approximant and voiceless coronal.

18 One problem with such an analysis is what type of evidence would serve to convince E. that he had generalized incorrectly. That is, if the substitutions have been grammaticalized, what would subsequently drive the child to shift the constraints to a new ranking order?

In the next two examples, abajo and arriba, FC also produces a fricative in initial position. For abajo, FC truncates the initial unstressed syllable and maintains the [+cont] and voicing features of the target approximant segment, which now occurs in the word onset. This contrasts with the previous token of the word from (5), where FC truncates and produces a voiced stop in initial position. For the target arriba, FC alternates between truncating and not truncating the initial unstressed syllable, but he substitutes the voiced fricative for the target trill in all tokens. The examples from (6) show that FC does not always respect the universal unmarked preference for stops in initial position. He produces target stop allophones as continuants in verde and abajo and is faithful to the [+cont] feature in the initial segment of juguete. Finally, he harmonizes the place of articulation from the approximant target in arriba to the onset of the word, substituting the voiced labiodental fricative in that position.

The data in (6) from FC shows that he is not grammaticalizing positional hardening and in fact appears to be producing fricatives and stops interchangeably. Thus, it appears that for certain lexical items, FC produces stops in initial position, as a result of truncation, while for others, with and without truncation, he reproduces a segment with the [+cont] feature, characteristic of the approximant allophone. This inconsistency suggests either that he is using a production strategy that is lexically specific, or that it is the result of some other phonetic or phonological process. More data would be required to confidently draw a conclusion one way or another.

Nonetheless, based upon the examples from (4), (5) and (6), it appears that FC has a production pattern that fluctuates between the unmarked stop segment (as in (5)) in onset position and the more marked [+cont] segment (as in (6)). He may be drawing upon a distribution of the two allophones that can be characterized as in Figure 8. For ease of exposition, the segments are restricted to bilabials, but there could be a similar distributional overlap for the dental and velar allophones as well.

Figure 8. Schematized distribution of onset segments for FC

stops        either stop/approx/fric    approximant
[b]          [b, β, v, f]               [β]
[beo]        [bao]/[vako]               [kaβaiȴo]
[baiȴo]      [aviva]/[viva]             [aβahar]

Figure 8 shows how the distributions for FC may overlap in his phonetic space, leading to inconsistent productions for the tokens which fall in the middle. According to exemplar-based models of production (Pierrehumbert, 2003b), speakers select production targets from the mean of their distributions. In the case of FC, he has two distributions, which correspond to the stop and approximant allophones, and he also has an extensive region of overlap between the two allophones. On the occasions where FC pulls his production target from this overlapping region, his productions will fluctuate between the stop and the approximant, at times during the same session and possibly for the same lexical target. On the occasions where he pulls his production target from one or the other non-overlapping region, he will be more consistent.
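The idea of two overlapping distributions can be illustrated with a minimal simulation sketch. Everything numeric here is an assumption made for illustration only: the one-dimensional "constriction" scale, the means and standard deviations of the two clouds, and the boundaries of the overlap region are hypothetical and are not estimated from FC's data or taken from Pierrehumbert (2003b).

```python
import random

# Hypothetical one-dimensional phonetic space (illustrative values only):
# 0.0 = complete closure (stop-like), 1.0 = open constriction (approximant-like).
STOP_CLOUD = {"mean": 0.30, "sd": 0.15}
APPROX_CLOUD = {"mean": 0.70, "sd": 0.15}
OVERLAP = (0.40, 0.60)  # assumed region shared by the two distributions


def classify(value: float) -> str:
    """Label a production by where it falls relative to the assumed overlap."""
    if value < OVERLAP[0]:
        return "stop"
    if value > OVERLAP[1]:
        return "approximant"
    return "ambiguous"


def sample_production(target: str, rng: random.Random) -> str:
    """Draw one production target from the cloud of the intended allophone."""
    cloud = STOP_CLOUD if target == "stop" else APPROX_CLOUD
    return classify(rng.gauss(cloud["mean"], cloud["sd"]))


def simulate(target: str, n: int, seed: int = 0) -> dict:
    """Tally the outcomes of n simulated productions of one target."""
    rng = random.Random(seed)
    tallies = {"stop": 0, "approximant": 0, "ambiguous": 0}
    for _ in range(n):
        tallies[sample_production(target, rng)] += 1
    return tallies


print(simulate("approximant", 200))
```

Under these assumptions, widening the overlap region or increasing the spread of either cloud raises the share of productions that fall in the ambiguous middle, which is one way of picturing the session-internal fluctuation described above.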

4.4.1.1 [d]-deletion

There is another pattern of alternation that occurs only with the dental allophone, that of /d/ deletion. In many varieties of Spanish, especially those spoken in the Caribbean region, /d/ deletion is common. In these varieties, speakers commonly delete intervocalic /d/ in participles and other lexical items with the same ending, where the /d/ is found intervocalically (Hualde, 2005). For example, comida ‘food’ [koˈmiða] ~ [koˈmia]. This is characteristic of certain regional varieties of Colombian Spanish, including the variety being acquired by FC and MG. In this section, data from each child showing examples of /d/ deletion in expected contexts is presented. Unlike /b/ and /g/, the input received by the child for the dental target includes stops, approximants and complete deletion. Data from MG in which he deletes /d/ is given in (7).


(7) MG /d/ deletion

Target form    Child Form    (Standard) Adult Target    Age       Gloss
puedo          [puéo]        [puéðo]                    2;4.21    ‘(I) can’
al lado        [aláo]        [al laðo]                  2;6.3     ‘next to’
helado         [elao]        [elaðo]                    2;6.3     ‘ice cream’
comida         [komía]       [komiða]                   2;7.7     ‘food’

While MG produces helado ‘ice cream’ with the expected /d/ deletion, he produces it a second time without deletion and in fact produces a stop instead. As well, there are examples of two other words that do not undergo /d/ deletion:

(8) a. MG no /d/ deletion

Target form    Child Form      (Standard) Adult Target    Age       Gloss
helado         [elado]         [elaðo]                    2;8.21    ‘ice cream’
revolcado      [puepokáðo]     [Ȏeβolkaðo]                2;8.21    ‘overturned’
espada         [espada]        [espaða]                   2;10.1    ‘sword’


In the case of revolcado, MG produces an approximant and in the case of espada, he produces a stop. The fluctuation between deletion, approximant and stop is not found in FC’s production. He only produced three words with the correct context for /d/ deletion, and all were produced with the expected target form:

b. FC /d/ deletion

Target form    Child Form    (Standard) Adult Target    Age       Gloss
no, nada       [no naa]      [no naða]                  2;4.22    ‘no, nothing’
helado         [eláo]        [elaðo]                    2;7.18    ‘ice cream’
vestido        [bestío]      [bestiðo]                  2;7.18    ‘dress’

The fact that the children are producing /d/ deletion as expected in some but not all cases suggests that MG and FC are not applying an across-the-board rule. If they were applying a rule, all words would exhibit the same /d/-deletion effects, independent of the individual lexical items themselves. In other words, such a rule would affect all possible candidates for its application in an equal fashion.
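The logic of this argument can be sketched as a simple consistency check over the tokens listed in (7) and (8). The realization labels ("deletion", "approximant", "stop") are coded by hand from the child forms above; the function names are illustrative only.

```python
# Realizations of intervocalic /d/ per word, coded from examples (7) and (8).
mg_tokens = {
    "puedo": ["deletion"],
    "al lado": ["deletion"],
    "helado": ["deletion", "stop"],
    "comida": ["deletion"],
    "revolcado": ["approximant"],
    "espada": ["stop"],
}
fc_tokens = {
    "no, nada": ["deletion"],
    "helado": ["deletion"],
    "vestido": ["deletion"],
}


def realization_types(tokens: dict) -> set:
    """All distinct realizations of /d/ found across a child's tokens."""
    return {r for realizations in tokens.values() for r in realizations}


def consistent_with_single_rule(tokens: dict) -> bool:
    """An across-the-board rule predicts one realization for every token."""
    return len(realization_types(tokens)) == 1


def variable_words(tokens: dict) -> list:
    """Words that surface with more than one realization of /d/."""
    return sorted(w for w, r in tokens.items() if len(set(r)) > 1)


print("MG:", consistent_with_single_rule(mg_tokens), variable_words(mg_tokens))
print("FC:", consistent_with_single_rule(fc_tokens), variable_words(fc_tokens))
```

On this coding, MG's tokens show three different realizations, with helado varying within the same word, whereas FC's three tokens all show deletion. With so few tokens, FC's data on their own are compatible with either a rule or a lexically specific pattern.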

As shown in the data presented in (2)-(8), MG and FC demonstrate inconsistencies in their allophone production that suggest they are not using a rule-based approach. The irregular production of the approximant across different lexical items suggests that there are more gradient factors at play.


4.4.2 General discussion of the data: phonetic groundedness

According to the data presented above, it might be possible that phonetic groundedness is driving the emergence of the approximant allophone. Specifically, as discussed, the Aerodynamic Voicing Constraint (AVC) can account for why children eventually produce approximants in their favoured context. However, the AVC cannot necessarily explain why these effects do not emerge immediately, once the child has mastered the articulatory precision required to execute the gesture itself. In order to account for this, the AVC must be complemented by a model of acquisition that can incorporate effects for motor learning, or practice effects. Motor learning occurs when the speaker has practiced and repeated the required articulatory gestures sufficiently often and developed the motor skills necessary to execute them. Thus, some sort of frequency effect must also play a role.

Frequency effects have played an important role in debates over the motivating force behind fine-grained adjustments in articulation. As Munson, Edwards and Beckman (2009) state, children could repeat a sequence of sounds that occurs in many words by relying upon the articulatory and acoustic representations that they have accumulated over the course of their language learning. Sequences that are more frequent will be easier to draw upon because they will be more accessible. In probabilistic terms, highly frequent items will be represented in distributional peaks over the phonetic space and it is more likely they will be pulled out when articulatory planning is taking place. On the other hand, sounds that occur relatively infrequently cannot be drawn upon as efficiently.


A similar argument has been put forth by Oudeyer (2005, 2006), who states that automatization allows articulations to become more energy-efficient with practice.

A key assumption of this explanation is that motor skill development plays an important role in phonological development, which in turn is based in production frequency. In the present case, this means that approximant segments that are produced more frequently afford the child more opportunities to practice the articulatory gestures required for their execution and therefore should be more accurate. In order to test the prediction that motor skill development is driving the emergence of approximants in MG and FC’s productions, we require a comparison between accuracy rates and production frequency for each child. Alignment between production frequency and accuracy rates would suggest that practice, or motor skill development, plays a key role in the emergence of allophones.

This information is presented in Figure 9, which depicts the target (the number of words with the target sound attempted by the child, divided by the total number of words in the corpus) and proportion accurate (the number of target words produced accurately divided by the total number of words attempted). For example, MG produced 30 words with /b/-initial as the target, out of a corpus that included 209 words. This gave a ‘Target’ proportion of 0.14. Of those 30 target words, 24 were produced accurately (i.e., as stops), giving a ‘Proportion Accurate’ score of 0.8.
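The two proportions can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the MG /b/-initial counts given above; the function names are illustrative and are not part of the original analysis.

```python
def target_proportion(words_with_target: int, corpus_size: int) -> float:
    """Share of the child's corpus made up of words containing the target sound."""
    return words_with_target / corpus_size


def proportion_accurate(accurate: int, attempted: int) -> float:
    """Share of the attempted target words that were produced accurately."""
    return accurate / attempted


# MG, /b/-initial targets (counts taken from the text above)
mg_target = target_proportion(30, 209)
mg_accuracy = proportion_accurate(24, 30)
print(round(mg_target, 2), round(mg_accuracy, 2))  # 0.14 0.8
```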


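Figure 9. Proportion attempted and accurate across target sounds

[Bar chart. y-axis: proportion (0 to 1); x-axis: b initial, b medial, d initial, d medial, g initial, g medial; series: M Target, M Proportion Accurate. Value labels as extracted, with their assignment to bars not recoverable: 0.80, 0.79, 0.73, 0.78, 0.59, 0.33, 0.26, 0.21, 0.14, 0.14, 0.11, 0.07.]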

Figure 9 shows that accuracy rates do not necessarily align with position, place of articulation or frequency of production. There is no discernible pattern in MG or FC’s productions for accuracy rates across position or place of articulation. For MG, b-medial position had the highest proportion of words attempted (.33) but the lowest accuracy proportion (.26) across the six position/place of articulation combinations. For FC, the highest proportion of words attempted was also for b-medial (.39) but he had a low accuracy rate for these targets (.61). The highest accuracy rates for both children occurred with /b/-initial targets, although the proportion of words with this target was relatively low for both children. Based upon this, it appears that no relation exists between the number of words attempted and the actual proportion of target segments that were produced accurately. Therefore, a linear relationship between motor learning and AVC effects cannot hold as an explanation for the patterns observed here.

Another way of considering the data presented above is by looking for possible effects for individual lexical items. It is possible that by considering individual words a pattern in allophone accuracy rates may emerge. It has often been noted that higher frequency lexical items show a higher probability, or greater degree, of phonetic lenition. The influence of lexical frequency on lenited productions suggests that articulation can be influenced by details stored in long-term memory (Pierrehumbert, 2001a,b, 2002, 2003a,b; Bybee, 2001, 2002; Beckman & Pierrehumbert, 2003). That is, the development of phonetic and phonological knowledge is connected to the development of the lexicon. Individual words that occur frequently in the input (or output) will be more likely candidates for lenition because the child has had more opportunities to develop the motor skills necessary to produce the approximant.

In order to determine what role (if any) the lexicon may play in the acquisition of the stop-approximant alternation in Spanish, we require data showing what type of input children receive. To this end, the following section addresses how the target sounds are distributed in the input received by young Spanish-speaking children. I used a corpus of child-directed speech and child productions, taken from the CHILDES database of L1 Spanish-speaking children. The goal was to determine what type of input children are exposed to when acquiring Spanish as a native language and whether patterns in this input are reflected in child productions. If so, this would provide support for lexically based acquisition patterns.


4.5 Corpus II: CHILDES database

In this section of the chapter, the nature of the input Spanish-learning children receive over the course of language learning is examined. In order to do so, data were combined to form two corpora, taken from Spanish child language databases found in CHILDES. The first corpus consisted of child productions (C Corpus) and the other consisted of child-directed speech (CDS Corpus). The corpora were taken from a combination of eight subcorpora of Spanish-learning children (three in Latin America, five in Spain), in which the children ranged in age from ten months to five years.

In what follows, I present a statistical description of the two corpora and comparative data to examine a) positional segmental frequencies to see if there are correlations among them and b) whether there is a correlation between the emergence and subsequent frequency of approximants and overall word frequency across the input received by MG and FC.

4.5.1 Description of the C and CDS Corpora

All words with the target segments in either initial or medial position were first extracted from the relevant CHILDES databases. In order to minimize possible effects for variability among the subcorpora, only the top 50% of all tokens were included. The C Corpus had a total of 12 291 tokens and 492 lexical types and the CDS Corpus had a total of 60 842 tokens and 2210 lexical types. Following this, the positional frequency for each sound was calculated. Positional segment frequency is the likelihood of occurrence of a given sound in a given position. It is calculated by summing the log frequency of the words in the corpus containing the target sound in the target word position and dividing this by the sum of the log frequency of all the words containing any segment in the target word position in the corpus (Storkel, 2004). For our purposes, this meant taking the log frequency counts for all the words with, for example, /b/ in medial position and dividing them by the log frequency counts for all the words with singleton /b d g/ in either position. All inflected forms were counted as separate lexical items. For example, gato and gatos were counted as separate tokens, as were forms such as estaba (he/she/it was) and estaban (they were). This way the token frequency counts faithfully reflect the total number of times children hear certain segments in the input. All proper nouns from the database were eliminated, given that these may bias individual token statistics in favour of the child’s name or other idiosyncratic items. Finally, where the C Corpus transcription reflected errors such as cluster simplification, I took the target word as the correct form, not the actual production. For example, if the child produced [gan̪de] instead of [grande], I counted the production as a cluster and it was not calculated as part of the total number of tokens. In the case of /d/ deletion, where noted in the transcriptions, I considered deletion as the target and any deviation as non-target-like.
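The calculation described above can be sketched as follows. This is one literal reading of the description, not the procedure actually used for the analysis, and it is not meant to reproduce the values reported in this chapter. The `Entry` structure, the choice of log10, and the toy counts are all assumptions made for illustration.

```python
import math
from typing import NamedTuple


class Entry(NamedTuple):
    word: str
    freq: int       # raw token count in the corpus
    segment: str    # target segment: "b", "d" or "g"
    position: str   # "initial" or "medial"


def positional_segment_frequency(entries, segment, position):
    """Summed log10 frequency of words with `segment` in `position`,
    divided by the summed log10 frequency of all entries."""
    total = sum(math.log10(e.freq) for e in entries)
    matching = sum(
        math.log10(e.freq)
        for e in entries
        if e.segment == segment and e.position == position
    )
    return matching / total


# Hypothetical counts, for illustration only (not taken from CHILDES).
toy_corpus = [
    Entry("boca", 100, "b", "initial"),
    Entry("lobo", 10, "b", "medial"),
    Entry("dame", 1000, "d", "initial"),
    Entry("gato", 10, "g", "initial"),
    Entry("gatos", 10, "g", "initial"),  # inflected forms kept as separate items
]

print(positional_segment_frequency(toy_corpus, "b", "initial"))
```

Note that a word occurring only once has a log frequency of zero and so contributes nothing under this reading; whether and how such words were handled in the original analysis is not stated.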

The C Corpus is related to the production of the target segments while the CDS Corpus represents the input received by the child. It is necessary to acknowledge that the C Corpus represents an idealized version of what children are actually producing – the databases considered here did not provide allophonic transcriptions. Thus, the tokens from the C Corpus represent the number of potential contexts for the stop-approximant alternation, not necessarily the number that were actually produced by the children who participated in the database creation. The CDS Corpus, on the other hand, represents what the children receive as input to their language acquisition process. Because this input was provided by adult native speakers of Spanish, it is assumed that the tokens follow the expected distributional patterns for each allophone. Thus, the two corpora represent the token frequencies for what Spanish-speaking children produce and what they hear.

As shown above in Figure 9, the accuracy and target production proportions do not align directly for either child, either in terms of position or in terms of place of articulation. It was argued that this provided evidence against a strictly articulatorily based explanation. By turning now to a comparison between what MG and FC are receiving as input (the CDS Corpus) and what Spanish-speaking children generally produce (the C Corpus), it is hoped that we can obtain an idea of whether their production patterns follow any types of generalizations that occur in the input or across production patterns.

4.5.2 Frequency data for C and CDS Corpora

In order to see whether the CDS and C corpora shared similar characteristics, a correlational analysis of the positional segment frequency values for the two databases was carried out. Strong correlations exist between the positional segment frequency counts for the two databases (r = .821, p<0.05), indicating that children are producing similar rates of /b d g/ as they hear in the input. Figure 10 illustrates this correlation:


Figure 10. C and CDS Positional frequencies correlation

Based upon these results it is safe to assume that the two databases share similar frequency distributions for segments of interest to this study, albeit not necessarily the same lexical items. Because this is based on token counts, it is possible that MG and FC are producing many fewer tokens than those which actually occurred in the two databases. Table 10 presents the positional frequency values for each consonant-position combination:

Table 10. Positional log frequency

             CDS    C
b initial    .91    .82
b medial     .85    .81
d initial    .84    .85
d medial     .82    .82
g initial    .71    .74
g medial     .76    .80


A nonparametric test was carried out to determine if there were significant differences among the ranks of positional segment frequency values. According to the Mann-Whitney test, there were no significant differences between the two corpora on this variable (U = 15, p>0.05), further illustrating that the two corpora are similar in terms of the positional distribution of the target segments.
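The U statistic can be illustrated directly from the values listed in Table 10. The sketch below computes it by pairwise comparison, counting ties as one half; it assumes the test was run on these six values per corpus, which the text does not state explicitly. With the rounded table values the result is 14.5, which is in line with the reported U = 15.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by direct pairwise comparison; ties count 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Positional log frequency values as listed in Table 10
cds = [0.91, 0.85, 0.84, 0.82, 0.71, 0.76]
c = [0.82, 0.81, 0.85, 0.82, 0.74, 0.80]

u_cds, u_c = mann_whitney_u(cds, c)
u = min(u_cds, u_c)  # the smaller of the two is the reported statistic
print(u)  # 14.5
```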

Next, a comparison between the positional frequencies for MG and FC’s production of the target segments and the positional frequencies found in the two corpora was conducted. Because the MG and FC corpora are several orders of magnitude smaller than the C and CDS corpora, it was necessary to use the Log Likelihood (LL) statistic to compare across corpora (Rayson and Garside, 2000). Log Likelihood statistics are based upon the expected and observed frequencies and therefore can be adapted to corpora of widely differing sizes. If the LL values are positive, then the reference corpus has greater-than-expected frequencies relative to the comparison corpora. The opposite holds if the LL is negative. Using the recorded corpus of MG and FC as the reference corpus, the LL values comparing them should be around zero, or at least not significantly different, if MG and FC’s accuracy rates directly reflect the input they receive. Recall that the goal is to see if there is a relationship between the input MG and FC receive, in the form of an idealized corpus, and their accurate productions of the allophones, particularly the approximant targets.
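For readers unfamiliar with the statistic, the following is a minimal sketch of the two-corpus Log Likelihood calculation described by Rayson and Garside (2000): expected frequencies are derived from the pooled rate, and LL is twice the sum of observed × ln(observed / expected). The unsigned statistic is never negative, so the signed variant below is an assumption about how direction was attached to the values, and the counts in the example are hypothetical.

```python
import math


def log_likelihood(obs_a, total_a, obs_b, total_b):
    """Log Likelihood for one item observed in two corpora.

    obs_*   : observed frequency of the item in each corpus
    total_* : size of each corpus
    """
    combined = obs_a + obs_b
    grand_total = total_a + total_b
    exp_a = total_a * combined / grand_total
    exp_b = total_b * combined / grand_total
    ll = 0.0
    for obs, exp in ((obs_a, exp_a), (obs_b, exp_b)):
        if obs > 0:  # a zero count contributes nothing to the sum
            ll += obs * math.log(obs / exp)
    return 2.0 * ll


def signed_log_likelihood(obs_a, total_a, obs_b, total_b):
    """Assumed convention: positive when corpus A has the higher relative rate."""
    ll = log_likelihood(obs_a, total_a, obs_b, total_b)
    return ll if obs_a / total_a >= obs_b / total_b else -ll


CRITICAL_P01 = 6.63  # critical value at p < 0.01

# Hypothetical counts: 30 of 209 words in a small corpus, 100 of 10,000 in a large one
ll = signed_log_likelihood(30, 209, 100, 10000)
print(abs(ll) > CRITICAL_P01)  # True
```

When the two corpora show the item at the same relative rate, the observed and expected frequencies coincide and the statistic is zero, which is the outcome predicted above if the children's rates directly mirror the comparison corpus.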

The first set of comparisons is presented in Figure 11. It depicts Log Likelihood by position for each child, compared to the C and CDS corpora. Figure 9 above showed that little, if any, relationship exists between accuracy rates and the number of words attempted, suggesting that motor skill development is not driving the emergence of approximants in a linear fashion. It is possible, however, that the children’s accuracy reflects the rate at which the alternants occur in the input. This would explain why the higher frequency targets were not produced more accurately. If motor practice for each target were the primary driving force, then more frequently produced targets should be more accurate. This was not found to hold. It is possible that accuracy rates reflect the frequency of the input received, rather than the actual production by the child.

Figure 11. Log Likelihood Position Accuracy¹⁹

All ps>0.01

Based upon a significance level of p<0.01, and a critical value of 6.63, the results from Figure 11 show that the Log Likelihood of MG and FC’s accuracy rates did not significantly differ from the rates of positional occurrence for the three target segments (all ps >0.01). These results suggest that both children are producing the target positions at accuracy rates that correspond to the rates at which other children produce the target segments. Furthermore, these results show that FC and MG’s accuracy rates also reflect the input. The fact that MG and FC’s accuracy rates reflect the input they receive may be a result of two things. On the one hand, it could mean that the children are repeating precisely what they hear and faithfully reproducing the target-language positional occurrence rates, albeit potentially as a result of abstraction. On the other hand, it could be the result of an asymmetry in the accuracy rates across places of articulation. For example, the children could be producing all their b-medial segments at higher accuracy rates than the other two segments, or producing more words with b-segments than the CDS or C Corpus contain, thereby skewing the overall results.

¹⁹ Following Rayson and Garside (2000), significance of the LL values can be interpreted as follows: 95th percentile (5% level), p < 0.05, critical value = 3.84; 99th percentile (1% level), p < 0.01, critical value = 6.63; 99.9th percentile (0.1% level), p < 0.001, critical value = 10.83; 99.99th percentile (0.01% level), p < 0.0001, critical value = 15.13.

In order to investigate this, Figures 12 and 13 depict the Log Likelihood values for each child compared to each corpus, across the three places of articulation. Differences across places of articulation for the Log Likelihood statistic would suggest that children are not merely reproducing what they hear in the input and generalizing across place of articulation.

Figure 12. MG Log Likelihood Place of Articulation Accuracy


Based upon a significance level of p<0.01, and a critical value of 6.63, the Log Likelihood values for MG’s accuracy rates and the C Corpus were significantly different only for the b-medial target, where MG had higher expected rates of production than occurred in the C Corpus. This suggests that MG’s accuracy rates on the six target sounds are similar to the rates at which Spanish children produce these sounds. In other words, the distribution of MG’s accuracy rates and the distribution of these target sounds found in the words produced by children in the C Corpus are very similar. For the input, however, the picture is slightly different. MG’s accuracy rates were significantly different from those of the CDS Corpus in four of the six target contexts. Only the d-initial and g-medial targets did not reach significance, suggesting that MG’s accuracy does not directly align with the input received. However, the direction of this difference is also important. In only the d-medial and g-initial contexts did MG’s accuracy rates reflect a significantly lower LL than the CDS Corpus (p<0.001). This means that MG produced a lower rate of expected accuracy for these two segments only. The b-initial and b-medial segments actually demonstrated higher accuracy rates than those predicted by the corpus.

Figure 13 illustrates FC’s Log Likelihood values for place of articulation accuracy. FC’s accuracy rates differed significantly (p<0.01) from the C Corpus for the b-medial and d-initial targets. For the CDS Corpus, FC’s accuracy rates reached significance for the b-medial targets only. FC’s accuracy rates align more closely with the CDS Corpus than do MG’s. It is hard to draw any firm conclusions regarding this, however, because of the relatively few tokens he produced overall.


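Figure 13. FC Log Likelihood Place of Articulation Accuracy

[Bar chart. x-axis: b initial, b medial, d initial, d medial, g initial, g medial; series: F Accurate-C Corpus, F Accurate-CDS Corpus. Value labels as extracted, with their assignment to bars not recoverable: 10.8, 5.07, 6.8, 6.9, 0.31, 0.18, 1.16, 0.33, 1.17, 6.38, 10.9, 15.95.]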

Together, the results from this comparison indicate closer alignment between MG and FC’s accuracy rates and the C Corpus than with the CDS Corpus. This suggests that both children are producing tokens of target-like segments at rates that are similar to those of child Spanish in general.

The LL values presented so far suggest an alignment with position but not for place of articulation. One explanation for these results may lie in the type of frequencies calculated. Token frequencies reflect the number of times the child produced a word containing the target segment, which means it is possible that each child only produced a minimal number of word types accurately, but these skewed the results in favour of a non-significant likelihood result. In other words, the lexical items that MG and FC produced accurately could be highly frequent across their own productions and lead to favourable comparisons with the two corpora at abstract levels. In order to tease apart these possible effects, Figure 14 portrays the log frequency counts from the C and CDS Corpora for the lexical items that were produced with approximants. If lexical frequency plays a role in the emergence of the alternation, when approximants do emerge in children’s productions, they should occur in higher frequency words.

Figure 14. Log10 frequency counts for words realized with and without approximants in medial position

[Bar chart. y-axis: average Log10 frequency for lexical items (0 to 2.5); x-axis: b medial, d medial, g medial (the category set appears twice, presumably once per child); series: Accurate productions, Not Accurate productions. Value labels as extracted, with their assignment to bars not recoverable: 1.91, 1.94, 1.81, 1.79, 1.51, 1.33, 1.18, 1.13, 1.02, 0.74, 0.32, 0.]

As can be seen from these results, the average log10 frequencies for the lexical items realized as target-like, that is, with an approximant in word-medial position, were consistently higher than the frequencies for items realized with stops in word-medial position (see Appendix B for the specific lexical items and their log frequencies). This holds for both children and across all target segments. These results confirm the hypothesis that frequent items are more likely to have approximants than less frequent items.

4.6 Discussion and Conclusions

The evidence presented in this section indicates that lexical frequency plays a role in the emergence of the approximant allophones. MG and FC produced the approximant allophone in words that were more frequent in the input and output. Frequency effects for individual lexical items appear to be driving the emergence of the approximant allophone, rather than positional or place of articulation abstraction. There is ample evidence from research studies that lexical frequency and sublexical frequencies affect both word recognition and production. Strings that are more frequent in the lexicon are accessed more rapidly and produced with more lenited forms than those which are less frequent. In the present context, we are interested in what this might mean for the nature of the phonological system and the types of representations that result.

I propose that the children are storing phonetic details in their representations (and therefore their productions are sensitive to frequency) but not all the stored information is available at all times, due to filters such as task effects and natural biases. Thus, words that are more frequent will be more easily accessed in terms of articulatory and acoustic routines than words which are less frequent. The children may be abstracting across position in their production of the allophone alternation – the data presented here do not refute that possibility – however, even if they are performing such abstractions, their representations must also be encoding token frequency information, and frequency ultimately affects the emergence of an allophone that is more universally marked but less contextually marked.

To conclude, this chapter provided evidence in favour of a phonological system that is biased by certain universal factors and yet also capable of tracking language-specific frequency information. In order to account for these results, a framework is required that can incorporate universal biases such as those shown by the preference for stops in word onset position and the overwhelming substitution of stops for approximants and also account for lexical frequency effects in the emergence of the approximant allophones. The PRIMIR framework, which I have addressed briefly in previous chapters, can easily account for both findings.


Chapter Five: General Discussion and Concluding Remarks

5.1 Introduction: Summary of the dissertation results

The objective of this dissertation was to explore the nature of the phonological system and the type of representations that emerge from it. The evidence presented from adult L2 and child L1 learners suggests that the phonological system is sensitive to phonetic details and tracks distributional information in the input. Furthermore, the results indicate that the phonological system operates across exemplar-type representations that store these phonetic details and distribution-based information. Experience with the target language was shown to be a determining factor in how phonetic details are accessed and subsequently used in perception and production, with lower-level learners accessing representations that are less robust in nature than those of more experienced learners.

Chapter 2 explored the way in which the phonological environment drives the perception of allophones by L2 adult learners. The phonological environment can include sounds directly adjacent to the sound itself, sounds which occur at a predetermined distance from it, as well as the prosodic structure that directly contains the sound, such as the syllable, the foot or the prosodic word (Hall, 2009). The two experiments presented in Chapter 2 examined how Spanish proficiency affected listeners’ sensitivity to the association between stress and allophone onset. More experienced learners were predicted to associate the stop allophone in syllable onset position with stressed syllables, and approximants in syllable onset position with unstressed syllables, consistent with the distributional pattern found in Spanish. The results from Experiment 1 showed that listeners with more Spanish experience perceived stress with greater likelihood on syllables that initiated with a stop than on those which initiated with an approximant. In Experiment 2, the allophone onset alternated and the vowel was held steady with respect to prominence. Listeners with more Spanish experience were predicted to perceive an illusory stress effect for syllables that initiated with a stop and perceive no such effect for syllables that initiated with an approximant. Listeners with little or no Spanish experience were predicted to perceive stress with more or less equal probability on either syllable. The results from this experiment showed that the presence of a stop allophone did indeed alter stress perception in listeners with more Spanish experience, suggesting that these listeners are hearing stress as distributionally associated with a particular onset allophone. Furthermore, these results indicate that learners track the distribution of the allophones and, over time, can begin to learn the relationship between the allophones and the contexts in which they surface.

Chapter 3 included production data from adult L1 English/L2 Spanish learners for the stop-approximant alternation. Learner productions were analyzed for two cues – consonant intensity and the presence of a release burst – and how the production of these cues changed according to the context in which the target segment occurred. The results showed that experience with Spanish played a decisive role in how the cue production interacted with context. As a first approximation, it seemed that learners were qualitatively changing their representations as they gained experience with the target language – moving from a phonological system that was sensitive to position generalizations but not place of articulation generalizations. Place of articulation differences emerged only in the productions of the more advanced learners. However, this assumption requires a shift in the phonological system itself, to one that could encode phonetic details particular to place of articulation information. Instead, it was argued that learners do store all input information in their representations, but proficiency and language experience will determine whether this information is accessible at the moment of production. On this view, the phonological system and the representations it creates are consistent and no qualitative shift is required as the learner becomes more proficient.

To round out the picture of allophone acquisition, Chapter 4 presented data from child production of the stop-approximant alternation that revealed lexical frequency plays a role in the emergence of the approximant allophone. Data from two L1 Spanish-speaking children were analyzed and for both children, where substitution errors occurred, they were overwhelmingly biased in favour of the stops, reflecting a universal preference for stops in early phonological production. However, when approximants were produced, they occurred in words with high token frequency counts according to a child-directed/child production speech corpus and also the Spanish version of the CDI. Thus, universal and language-specific pressures drive L1 Spanish-learning children to produce approximants in words that are more frequent in their input and output, in spite of their relative phonetic complexity. In terms of the phonological system itself, these results support a phonology that can track frequency and store it in representations.

The question that remains is what type of framework might best account for the data presented here. In order to account for the perception data presented in Chapter 2, we require a framework that can explain how learners begin to associate particular allophones with the conditioning factor of stress. Such a framework must also account for the data presented in Chapter 3, related to how contextual factors differentially affect learners of distinct proficiency levels and how phonetic details emerge in the production of more advanced learners. Finally, assuming that children do not have qualitatively different representations from adults, we require a framework that can account for how universal biases and language-specific frequency interact and drive emergent acquisition patterns. The PRIMIR framework (Processing Rich Information from Multidimensional Interactive Representations, Werker & Curtin, 2005; Curtin et al., under review) provides a developmentally oriented account of how this might be possible.

5.2 PRIMIR

PRIMIR is grounded in two observations: first, rich information is available in the speech stream and second, the listener filters that information (Werker & Curtin, 2005). PRIMIR offers an explanation for why some information is available at certain stages of development and not at others, thereby offering a comprehensive explanation for developmental patterns, task effects and attentional effects. One of the ways in which PRIMIR achieves this is by positing the existence of three dynamic filters, which serve to enhance the raw acoustic saliency of information in the input. Filters can also diminish and/or transform information in the input.

The first filter infants bring to the speech learning task is the result of certain evolutionary and epigenetically based biases. Such biases act as a filter on the linguistic input received by the infant. For example, infants prefer speech to nonspeech, prefer infant-directed speech, point vowels and can process rhythmic patterns in speech (Werker & Curtin, 2005:212). The second filter that operates on the infant’s language learning is the developmental level. Younger infants will necessarily have fewer cognitive resources upon which to draw when processing language. Finally, the task itself constitutes the third filter and directs the infant’s attention to certain aspects of the input over others.

Information that has been passed through the filters will be stored on three planes, which are multidimensional and interactive. In PRIMIR, Werker and Curtin propose that infants first organize sound structure on a prelexical dimension called the General Perceptual Plane, the outcome of which is language-specific phonetic categories. The General Perceptual Plane stores phonetic and indexical information. These representations interact with the Word Form Plane, which stores sound-sequence exemplars that have been extracted from the speech signal (Curtin, Byers-Heinlein & Werker, under review: 4). Initially, there is no meaning attached to these wordforms. As wordforms become linked in the representational space, the Phoneme Plane begins to emerge. Information summarized across the other two Planes is stored here. Development of the Planes is not consecutive. They are interactive and information is accessible on any of the Planes at any point in time. The accessibility of information at any of the interactive planes will depend upon the dynamic filters – the task, developmental level of the learners and initial biases. Crucially, if information has been shown to be accessible at one point in development, it should be available at all following stages of development, under the correct task demands (Werker & Curtin, 2005:28).

Representations in PRIMIR are exemplar-based. Models which use exemplar-based representations assume that human knowledge is formed from examples. Based upon these examples, we make inferences regarding whether new examples of objects we encounter share the properties of things we have previously encountered. Exemplar-based representations emerge from stored examples in our memory and allow us to make future judgments (Nosofsky, 1986), based on some sort of metric that allows the listener to determine similarity between incoming exemplars and those which have been stored.
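One way to make such a similarity metric concrete is the exponential-decay similarity associated with Nosofsky's (1986) exemplar model of categorization, in which similarity falls off with distance and a category's strength is the summed similarity of its stored exemplars to the incoming item. The sketch below illustrates that general idea only and is not part of PRIMIR itself; the two acoustic dimensions, the stored values and the sensitivity parameter `c` are all hypothetical.

```python
import math


def similarity(x, y, c=1.0):
    """Similarity decays exponentially with Euclidean distance."""
    return math.exp(-c * math.dist(x, y))


def category_probabilities(probe, exemplars, c=1.0):
    """Summed similarity to each category's exemplars, normalised to sum to 1."""
    summed = {
        label: sum(similarity(probe, ex, c) for ex in stored)
        for label, stored in exemplars.items()
    }
    total = sum(summed.values())
    return {label: value / total for label, value in summed.items()}


# Hypothetical stored exemplars on two made-up acoustic dimensions
# (e.g., relative intensity and burst strength), each scaled 0-1.
stored = {
    "stop": [(0.2, 0.9), (0.3, 0.8), (0.25, 0.85)],
    "approximant": [(0.8, 0.1), (0.7, 0.2)],
}

probs = category_probabilities((0.75, 0.15), stored)
print(max(probs, key=probs.get))  # approximant
```

Because similarities are summed over stored exemplars, a category with more stored tokens exerts a stronger pull on incoming items, which is one way that frequency sensitivity can fall out of exemplar storage.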

While all information is stored in detailed, rich, exemplar representations, not all information is available to the learner under all conditions, due to the interaction of the three dynamic filters that operate to limit learners’ access to the stored information. Collectively, they serve to highlight certain information in the input and modify or even prevent other information from being accessed by the learner. For example, it has been shown that task effects are active in infant speech acquisition: infants will use certain information at particular developmental stages and under certain task demands and ignore other information under different circumstances (see Fennell & Werker, 2003; Werker, Fennell, Corcoran & Stager, 2002).

In PRIMIR, clusters of exemplars which share similar characteristics are stored and generalizations across these clusters eventually lead to the formation of phoneme categories, which are represented in the Phonemic Plane. Representations are context-sensitive – for example, segments in word-initial position will cluster with similar positionally occurring segments that share similar phonetic or contextual characteristics. Context sensitivity to word-onset position has been demonstrated in infants as young as 9 months (Jusczyk, Goodman & Bauman, 1999) and Zamuner (2006) found that 10-month-olds could not discriminate word-final contrasts that they could discriminate when in word-initial position.


The mechanism used by infants in acquiring the sound system and incipient lexicon of their target language is statistical in nature. The statistical learning mechanism interacts with the filters to ensure that only linguistically possible forms are acquired.

More precisely, because the system is fully interactive, previous learning influences the processing of incoming information, which in turn influences the emergence of representations (Curtin et al., under review).

PRIMIR has recently been extended to include a framework for bilingual and multilingual development (Curtin et al., under review). The same assumptions hold with respect to the multidimensional planes and the dynamic filters, but the authors make two additional postulations. First, the statistical learning mechanism is complemented by an additional mechanism that allows the bilingual to maintain separate languages. Second, the task demands on the bilingual infant will be distinct from those placed upon the monolingual. This will lead to different effects on the way in which tasks are carried out, due to potentially different proficiency in each language and the general cognitive and linguistic development of the infant (Curtin et al., under review).

The first of these assumptions is based upon evidence that infants as young as 4 months can use speech rhythm to guide the information they track in the input, allowing them to keep particular statistical information associated with one rhythmic pattern separate from that associated with a distinct rhythmic pattern (Nazzi & Ramus, 2003).

This initial bias has been shown to hold for bilingual neonates as well, who show a preference for both languages to which they were exposed prior to birth. At four months of age, bilingual infants can distinguish their two languages from each other, even if the two languages are closely related. In experiments conducted with Spanish-Catalan bilingual infants, Bosch and Sebastián-Gallés (2001) demonstrated that infants at this age were able to discriminate their two languages. Together, these studies show that bilingual infants share the same initial biases as their monolingual peers and, crucially, keep their languages separate from birth.

Evidence for task effects in bilingual infants has also been shown. When bilingual Spanish-Catalan infants are given a task where they must discriminate between one of their native languages and an unfamiliar language, they take longer than monolingual infants to orient towards the native language (Bosch & Sebastián-Gallés, 1997). The authors concluded that the bilingual infants needed to decide which language they were hearing before orienting towards it, something the monolinguals did not need to do. Thus, as Curtin et al. (under review) state, the task demands may be different for the bilingual infant even when the same task is used. This evidence supports the hypothesis that infants can track statistical information for each language and keep these statistics separate across the distinct language inputs.

The next question is how the bilingual infant organizes the input on the multidimensional planes. In order to support this difficult learning task, Curtin et al. (under review) propose that bilingual infants make use of an additional learning mechanism that can compare and contrast the language input in order to guarantee that discrimination and subsequent separation can occur. Such a comparison and contrast mechanism would allow listeners to group similar types of information together and keep disparate statistical information separate. Moreover, such a mechanism would allow for language-specific sound clusters to emerge and, eventually, the formation of language-specific phonetic categories. As experience increases, language-specific representations are created on the distinct planes. They further suggest that ‘nested’ tracking of information may also be possible, whereby infants track phonetic and rhythmic information simultaneously. This opens the possibility that infants can track statistical information across domains, a possibility that is supported by research carried out by Yeung and Werker (2009).

PRIMIR was developed to account for how infants understand and process speech. Specifically, the framework provides a coherent manner of conceptualizing how infants approach the speech learning task (initial biases) and how the task itself and the infant’s developmental level interact with the statistical learning mechanism. The bilingual extension to PRIMIR assumes the same filters operate for bilingual or multilingual native language acquisition and also assumes that a comparison and contrast mechanism helps the infant in tracking two separate sets of statistics across the input. We now turn to another extension of the PRIMIR framework, that of PRIMIR-L2.

5.3 PRIMIR-L2

As stated, a fundamental assumption of PRIMIR is that three dynamic filters operate on the input to determine what information is available to the infant over the course of acquisition. For the adult second language learner, I propose that in addition to the epigenetic biases, task effects and developmental level, learning is also subject to L1 filter effects. Consistent with the operation of the other filters, L1 filter effects are dynamic and interact with the three Planes over the course of acquisition. The target language input passes through the four filters (natural biases, task effects, developmental level and L1), which serve to direct the learner’s attention in terms of information pickup. The filters direct learner attention to certain elements of the input and away from others and, crucially, determine what information adult L2 learners can actually use when carrying out perception and production tasks in their target language. Moreover, there must be some sort of mechanism that allows learners who are acquiring a second language to keep their linguistic input separate and permits the tracking of distributional information across two sets of input statistics. Thus, I propose that adult L2 learners also use a comparison and contrast mechanism similar to that proposed by Curtin et al. (under review) for infant bilingual speech development. I will treat each of these points in turn.

Evidence for natural biases in L2 phonological acquisition is abundant. In typologically based approaches to markedness, it has been shown that L2 learners are more likely to acquire unmarked structures earlier and more easily than more marked structures, as formalized in Eckman’s (1977) Markedness Differential Hypothesis (see also Eckman & Iverson, 1993). In constraint-based approaches, these effects are generally conceptualized as the emergence of unmarked structures that are not present in either the native or target language. Evidence for this comes from Broselow, Chen and Wang (1998), who looked at data from 10 Mandarin speakers learning English and analyzed their productions of final devoicing. Mandarin does not allow CVCstop syllables. English, on the other hand, allows this syllable type and many others. Mandarin speakers resorted to epenthesis, deletion and devoicing in their productions of English codas, a strategy which does not occur in Mandarin or in English. Broselow et al. (1998) state that the grammar used by these speakers has a highly ranked markedness constraint that prevents outputs with voiced codas from emerging. The results from these studies indicate that L2 phonological acquisition can be affected by the native language phonology and its interaction with the target language structures, as well as factors that are grounded in phonetic and phonological universals.

The prediction in PRIMIR (Werker & Curtin, 2005) that learner performance will depend upon task demands is also supported in the literature on L2 phonological acquisition. For example, Zampini (1994) found task effects related to orthography. Spanish uses the orthographic symbols v and b to represent the phoneme category /b/: the words bato and vato would both be pronounced as [bato] by a Spanish speaker. The L1 English/L2 Spanish speakers in Zampini’s (1994) study did not suppress their pronunciation of the symbol v as [v], showing L1 transfer effects. The error occurred more frequently during the reading task than the conversational task, providing further evidence that orthography and knowledge of L1 influenced pronunciation. Additionally, effects have been found for task formality. Learners produce more stops when speaking formally than when speaking informally (Zampini, 1996; Shea, 2005).

The L1 filter has perhaps the most important effect on what type of information L2 learners will pick up and be able to use in their perception and production of the target language. The native language determines which cues to new categories can be perceived by learners and how these cues will be categorized. In the context of adult L2 acquisition, the L1 filter will prevent some of the statistics from being usable. Adult learners cannot necessarily make use of all the contextual and category cues that are objectively available in the input, which creates a different learning situation from that of the infant.20

20 This point is also similar to Best’s Perceptual Assimilation Model (PAM, 1995).


Research has clearly demonstrated that the L1 filter can prevent L2 learners from perceiving certain information in the target language input. Investigations using L2 speech perception models such as the PAM (Best, 1995) test naive listeners’ perception of an unfamiliar language, and results have consistently shown that where two sounds are very similar, the listener will not perceive the foreign language sound as distinct from the native language category. Flege’s SLM (Flege, 1995) calls this ‘equivalence classification’. Both models assume that certain information in the input will not be perceived by L2 learners, because it is subsumed into L1 native categories. For example, in the case of vowels, it has been shown that beginner-level L1 Spanish speakers cannot perceive the lax front English vowel /ɪ/ and mostly perceive it as the high front vowel /i/ (Escudero, 2005; Morrison, 2005). As experience with English increases, some can eventually learn to perceive the new vowel category of their target language. The SLM and PAM base their predictions upon phonological and phonetic (i.e., noncontrastive phonetic) similarities and dissimilarities between L1 and nonnative/L2 phones. In other words, both models explicitly recognize that noncontrastive details contribute to goodness-of-fit and classification of nonnative sounds.

A recent extension to PAM, PAM-L2 (Best & Tyler, 2007), refocuses PAM’s assumptions on L2 learners, that is, learners who are not naïve listeners and are consciously acquiring a second language. Within the category of second language learners, Best and Tyler (2007) further distinguish between learners who are acquiring their second language in a naturalistic context, with input from native speakers of one dialect, and those learning their second language in classroom contexts, where the environment outside the classroom is the L1. In this latter context, learners have little opportunity to interact with native speakers and teachers may be from different dialects. In other words, foreign language learners have L2 exposure through formal instruction in a restricted setting with little or no interaction with native speakers outside of the teacher (Best & Tyler, 2007:11). Furthermore, PAM-L2 assumes that the listener’s attention and goals will direct the pickup of information in the speech stream (i.e., distal gestures under PAM’s approach, not acoustic/phonetic categories). According to this model, information relevant to L2 speech learning is not merely phonetic but also phonological and gestural (Best & Tyler, 2007:20). Thus, similar to PRIMIR-L2, PAM-L2 recognizes that these different levels of attentional focus can lead to different types of learning across groups of learners and learning tasks. PAM-L2 differentiates between phonological categories, which serve to indicate minimal lexical differences, and phonetic categories, which are sublexical but nonetheless perceptible to listeners. Allophones fall into this latter category.

In PRIMIR-L2 all learner representations are exemplar-based. Listeners store all information in detailed phonetic representations that have coalesced over the course of language learning. The comparison and contrast mechanism serves to keep the exemplar representations separate for each language, although especially for beginner learners there may be some overlap between the exemplars for each language.

The question that naturally arises from this is how learning might occur. Specifically, how can information that is stored in learner representations but rendered inaccessible due to dynamic filter effects become available at a later point? In Chapters 2 and 3, I presented data showing learners only become aware of the contextual factors that drive the allophonic alternation as their experience with Spanish increases. How exactly does the system evolve so that information stored in exemplar representations becomes accessible to learners? An additional question related more specifically to adult L2 acquisition is whether information that may have been stored in the incorrect cluster can be used to form its own category. Or do these exemplars remain in the incorrect cluster?

The four dynamic filters will determine what information is available at what point and under which conditions. Learning occurs when the filter shifts and renders information accessible. Thus, to answer the questions of how previously unavailable or incorrectly clustered information becomes accessible to the learner, it is necessary to examine precisely how the L1 filter directs information pickup. In what follows, I discuss how the L1 filter interacts with attention and perceptual learning in the specific context of adult second language learning. Specifically, adults have more attentional resources at their disposal than do infants acquiring language. This means that explicit instruction or negative feedback can potentially direct information pickup for adult learners and assist with overcoming L1 filter effects. This also means that different learning trajectories may result for individuals receiving explicit instruction and those learning more naturalistically.

The claim made here is not that all sounds are learned. Instead, the architecture and learning mechanisms that are proposed by PRIMIR-L2 together mean that all sounds are learnable, but whether or not the individual succeeds depends upon myriad factors, prime among them the L1 filter effect. Moreover, the L2 learner’s final objective is not to acquire the sound system of the target language but rather to acquire a new linguistic system. The target language phonology represents a way into the lexicon and the grammar of the target language. Phonology, in its essence, serves as a means to acquire the lexicon of the target language. Therefore, the overall target of acquisition is not learning sounds, but rather words. This will influence what is learned.

In the next section, I discuss the role of the L1 filter and how it directs information pickup over the course of adult L2 acquisition. Specifically, I address the role of attention and how it interacts with the two learning mechanisms proposed by PRIMIR-L2 (distribution-based learning/comparison and contrast). I will discuss how prior learning can interfere with attention and how explicit instruction or certain salient cues in the input can help the learner overcome this. The comparison and contrast mechanism allows learners to create clusters that accurately reflect the input they receive by categorizing the incoming information into similarity-based clusters. In this way learner representations always reflect the most recent input. I present evidence from studies that show learners do in fact shift their representations after brief exposure to different dialects. A final L1 filter effect I address is the role of orthography. The adult L2 participants in the studies presented here were all literate, classroom learners of Spanish. Given that the stop and approximant allophones share orthographic symbols, orthography might be another example of the L1 filter directing information classification in this case.

5.4 L1 filter

In the case of the Spanish stop-approximant allophones, I propose that L1 English/L2 Spanish learners create a new stop phoneme category that corresponds to the phonetic cues of Spanish and, after sufficient learning has taken place, they create a positionally dependent allophone that coexists with the Spanish stop category. In other words, the comparison and contrast mechanism will lead the L1 English/L2 Spanish learner to create a new category for the voiced stops in Spanish. The approximant allophone will emerge from this Spanish voiced stop category. Learners will track the acoustic cues to each category and the contextual information that co-occurs with them and eventually create the new approximant category.

This assumption is similar to one posited by Best and Tyler (2007) for PAM-L2. Under this model, the L1 and L2 phonological categories exist in a similar phonological space, but if the phonetic details of these phonological categories are discriminable, the L2 listener will be able to maintain them as separate realizations of one phonological category (p. 23). As the phonetic distinction becomes more entrenched, the details of the L1 and L2 phonemes shift, which can lead to shifts in native language phonological categories or, as in the case of L2 allophone acquisition, explain how an allophonic category might emerge from a unitary L2 phonological category which is phonologically similar to the native language category.

A fundamental assumption in PAM-L2 and SLM is that L2 phones can be assimilated into the L1 phonetic space. Under PRIMIR-L2, this is not predicted to occur.

Under assimilation-based models, it would be possible to have homophonous L2 words where the listener does not perceive a distinction. For example, Spanish speakers will perceive ‘ship’ and ‘sheep’ as the same word because they assimilate English /i/ and /ɪ/ into the same native language category of /i/ (Escudero & Boersma, 2004; Morrison, 2008). This explanation poses a problem for precisely how learning occurs, in particular as learning relates to representational changes. What drives the L2 learner to suddenly perceive a sound as something different and, crucially, where do the representations come from that allow this new category to support speech learning? Under PRIMIR-L2 this problem is solved by positing that learners will not assimilate L2 categories fully into L1 categories. Instead, the two languages will be kept separate, but the L2 information will not necessarily be available under all conditions and for all levels of learners. It is possible that L2-to-L1 sound category assimilation could be due to task effects. For example, L2 learners acquire sounds in the context of words, not in the form of isolated phonemes or allophones. Thus, when presented with a sound in isolation or semi-isolation, learners may not have a sufficiently robust representation of that sound to support a task such as categorization or discrimination. In order to test this, it would be interesting to see if L2 learners are any better at discriminating sounds when they are in the context of words and presented by speakers who are familiar to them (i.e., to replicate indexical information effects). In this way, the task would reflect what learners actually store. Furthermore, task effects can also help adults with nonnative perception tasks. Adults can make use of spelling and top-down contextual information to initiate awareness of new phonetic or phonological categories.

PRIMIR-L2 shares the assumption with PAM-L2 that L2 learners are aware of which language they are listening to. The comparison and contrast mechanism keeps the languages separate, by means of tracking different distributional information and allowing the learner to build different representations for each language. Right from the moment adult L2 learners are exposed to the target language, the distribution-based learning mechanism begins to operate on the input and store information in representations. This important assumption means that no qualitative shift occurs in the representations created by learners over time. Evidence for this comes from Chapter 2, where adult L2 learners of Spanish appeared to be making generalizations across all places of articulation in their production of the stop-approximant alternation. Specific phonetic details for each place of articulation only emerged for more experienced learners. I argued that these learners are not qualitatively shifting their representations to suddenly include place of articulation details once they have reached a more advanced stage of acquisition. Instead, I suggest that at first, representations will be sparse and unable to support robust generalizations. Only as the learner gains experience with the target language will the nuanced production and perception patterns that characterize native speaker speech emerge. Part of the reason for this is the nature of adult second language classroom acquisition. As mentioned in Chapters 2 and 3, L1 English/L2 Spanish learners are explicitly taught about the stop-approximant alternation. Explicit knowledge of this rule is reflected in the low-level learners’ productions: an across-the-board application of the explicit rule they were taught in class. The ‘rule’ they were explicitly taught in the classroom is realized in a more nuanced fashion only once learners accumulate experience with Spanish.

Learning a second language in a classroom context also means acquiring it through reading and writing. Thus, orthographic symbols can play a strong top-down role in speech perception by L2 learners. The stop-approximant allophones in Spanish share the same orthographic symbols, also shared by the English system, which may lead L2 Spanish learners to maintain the two allophones united in one stop category in spite of the acoustic cues that distinguish them.


The L1 filter will direct attention to certain salient information in the input and the determination of saliency is dependent upon the task, developmental level and native language biases.21 The precise role and characterization of attention in second language acquisition theory has been hotly debated (see Schmidt 1990, 1995, 2001 and Carroll, 2006 for an opposing view). In general, research supports the idea that drawing the learner’s attention to certain information in the target language input can make learning more efficient and improve grammatical accuracy (Doughty & Williams, 1998), even if the precise mechanism and conditions for its operation have not been clearly identified (Carroll, 2001).

The role of attention in distribution-based learning has had a somewhat shorter history, although it has been shown to play a role in noticing certain aspects of the input that otherwise pass below the attentional, and therefore learning, threshold of the learner.

In a recent series of experiments, Toro, Sinnett and Soto-Faraco (2005) investigated whether there is a role for attention in statistically based segmentation tasks. They conducted three experiments to determine the effects of passive listening vs. attentionally divided listening on artificial speech segmentation tasks. In Experiment 1, one half of the participants passively listened to the input artificial speech stream while the other half, the divided attention group, listened to the artificial speech stream interspersed with random non-linguistic noises (e.g., car engine, door slamming). On a subsequent segmentation task, the divided attention group performed significantly more poorly. In Experiment 2, the divided attention group had to perform a concurrent visual task, where they detected repeated pictures, and for Experiment 3, the divided attention group had to detect pitch changes in randomly selected syllables. Under all three conditions, the divided attention group performed significantly more poorly on the segmentation task than the passive listening group.

21 For example, it has been well demonstrated in the L2 speech perception research that duration is a more salient cue to English vowel identity than spectral information for nonnative listeners. For L1 Spanish listeners (Morrison, 2002) and L1 Japanese listeners (Strange, Akahane-Yamada, Kubo, Trent, Nishi & Jenkins, 1998) English vowel duration was consistently shown to be a more salient cue than spectral information, even though duration is not phonemic in any of the English varieties studied.

Toro et al.’s (2005) results show that attentional resources play a strong role in guiding the statistical learning mechanism and can render it more or less efficient.
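The statistical computation that such segmentation tasks are generally taken to tap can be sketched as follows. This is a minimal illustration of forward transitional probabilities between adjacent syllables, not a reconstruction of Toro et al.'s procedure; the syllable stream, the three-syllable 'words' and the boundary threshold of 0.9 are hypothetical choices made for the example.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward TP(x -> y) = count(x followed by y) / count(x in non-final position)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

def segment(syllables, threshold):
    """Posit a word boundary wherever the forward TP falls below the threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words

# Hypothetical trisyllabic "words" concatenated with no pauses between them.
stream = ("tu pi ro go la bu bi da ku tu pi ro bi da ku "
          "go la bu tu pi ro go la bu bi da ku").split()
segmented = segment(stream, threshold=0.9)
```

Within a word each syllable perfectly predicts the next, whereas across word boundaries the prediction is weaker, so the dips in transitional probability recover the word edges. On the account discussed above, divided attention would degrade the learner's ability to accumulate the counts on which such a computation depends.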

If, as argued in this dissertation, all information is stored in exemplar representations that begin to coalesce right from first exposure to the target language, the effects of the dynamic filters will determine what information is available to be drawn upon for perception and production. The way in which information in the speech stream is structured can help or hinder the attentional mechanisms from operating. Explicit instruction will also play a role in this, as will top-down lexical and contextual information. All these sources can help the learner direct attention to the parts of the input that remain nonsalient and thus potentially overcome the negative effects of the L1 filter.

5.5 Nature of the phonological system: Evidence for distribution-based learning in an L2

In addition to the evidence presented here, there are abundant studies that show adult listeners are sensitive to shifts in distributional information. Results supporting distribution-based learning in native language allophone perception were shown by Maye & Gerken (2001) and Peperkamp et al. (2002), which were discussed in the Introduction. Briefly, Maye and Gerken (2001) showed that the perception of allophonic contrasts can be modified after exposure to tokens of the allophones with a certain statistical distribution. Peperkamp et al. showed that French L1 listeners performed more poorly on an allophone discrimination task when the allophones were presented in context, in spite of the distributional information listeners were exposed to.
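What it means for a learner to be sensitive to the shape of a distribution can be made concrete with a small sketch. It assumes, purely for illustration, that tokens fall on discrete steps of an acoustic continuum and that a purely distribution-driven learner posits one category per peak in the frequency histogram; the frequency counts below are invented and do not reproduce the stimuli of any of the studies cited.

```python
def count_modes(frequencies):
    """Count local peaks in a frequency histogram over continuum steps.

    A run of equal values (a plateau) counts as a single candidate peak.
    """
    padded = [0] + list(frequencies) + [0]
    modes = 0
    i = 1
    while i < len(padded) - 1:
        j = i
        # extend j to the end of a plateau of equal values
        while j + 1 < len(padded) - 1 and padded[j + 1] == padded[i]:
            j += 1
        if padded[i] > padded[i - 1] and padded[j] > padded[j + 1]:
            modes += 1
        i = j + 1
    return modes

# Invented token counts per continuum step (illustrative only).
unimodal = [1, 2, 3, 4, 4, 3, 2, 1]   # most tokens near the middle
bimodal = [1, 4, 2, 1, 1, 2, 4, 1]    # most tokens near the two ends

categories_unimodal = count_modes(unimodal)
categories_bimodal = count_modes(bimodal)
```

The two exposure conditions contain the same steps of the continuum and differ only in how often each step occurs, yet the sketch yields one category for the first distribution and two for the second.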

Merely tracking a distribution is not sufficient to ensure the learning of an allophonic relationship, however. In addition, the mechanism must track the context in which the sound surfaces. There is abundant evidence showing that L2 learners’ categorization of nonnative sounds is affected by the context in which these sounds occur, which means that learners must be storing this information at some level and drawing upon it for discrimination and categorization purposes. A recent investigation by Levy and Strange (2008) examined how coarticulatory information in French affected the perception of French rounded vowels /y/ and /ɶ/ in alveolar contexts. In American English, back rounded vowels are fronted in alveolar contexts (Hillenbrand, Clark & Nearey, 2001) but in bilabial contexts they are not. The authors predicted that American English listeners with less French experience would use this sublexical information from their native language to perceptually assimilate French front rounded vowels in alveolar context to their native English /u/ vowel category. Levy and Strange’s (2008) results revealed that in general, the less experienced group made more errors in discrimination for the alveolar context than for the bilabial context, suggesting that context had a greater effect on the less experienced participants overall. These results suggest that the more experienced listeners had succeeded in perceiving the front rounded vowels independent of their context and independent of L1 English coarticulatory influences while the less experienced group could not do so. In terms of L2 speech perception models, Levy and Strange propose that a phonetic level of representation operates in equivalence classification that drives inexperienced listeners to perceive vowels as new or similar to L1 categories, based upon the context.

The results obtained by Levy and Strange (2008) indicate that native language contextual information plays a role in discrimination of L2 sounds. Their low-level learners exhibited effects for English coarticulation in the perception of French, their L2. The data presented in Chapters 2 and 3 of this study support these results, albeit in the opposite direction. The low-level learners did not use the contextual information for their target language in the same way as more experienced listeners did. Similar to the participants’ results in the Levy and Strange experiment, L1 expectations influenced their perception and production of L2 variants.

5.6 Phonological Mechanisms: evidence for comparison and contrast in an L2 and tracking of multiple levels of information

The comparison and contrast mechanism used by infants is also available to the adult learner. In the case of second language learners (i.e., not naïve listeners hearing an unknown language for the first time), hearers are aware of which language they are listening to, whether they correctly categorize the sounds of the particular language or not. This means that learners can track separate statistics for each language, much like infant bilingual learners.


It is accepted in perception and production research that no two languages share the same phonetic realization of phonological categories (Pierrehumbert, 2003b). For example, as shown by Sundara (2005), English and Canadian French voiced and voiceless coronal stops do not share the same place of articulation or VOT. The phonetic implementation of the phonological categories also differs in terms of the third and fourth spectral moments (burst diffusion). According to PRIMIR, listeners take advantage of these fine-grained phonetic differences when perceiving speech and use them to organize the information across the multidimensional planes.

Cues such as those which distinguish between the coronal stops in Canadian French and English are objectively available in the speech stream. Nevertheless, whether cues to categories or contexts, they are probabilistic in nature, which means certain cues co-occur with high probability and coalesce into robust representations. Others are only partially informative and yet others, not informative at all. As stated, a listener’s ability to pick up on these cues will depend upon the effects of the dynamic filters. The L1 filter plays a key role in determining what information will be available to the L2 learner.

I presented evidence that adult L2 learners track the phonological environment, or stress and position, along with the cues that are internal to the allophones themselves; that is, learners track where in the input each variant occurs, in addition to the specific acoustic cues that characterize each allophone. These results also suggest that learners build this distributional information into their representations. Otherwise, it would be difficult to explain how shifting the allophone onset can drive the illusion of stress perception in more experienced learners.
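The idea that learners track where each variant occurs, and not merely how often, can be expressed as a conditional distribution over variants given context. The following is a minimal sketch under stated assumptions: the context labels, the variant labels and all of the counts are invented for illustration and are not the data reported in Chapters 2 and 3.

```python
from collections import Counter, defaultdict

def conditional_distribution(tokens):
    """Estimate P(variant | context) from (context, variant) observations."""
    counts = defaultdict(Counter)
    for context, variant in tokens:
        counts[context][variant] += 1
    return {
        context: {v: n / sum(c.values()) for v, n in c.items()}
        for context, c in counts.items()
    }

# Made-up observations; labels and proportions are illustrative only.
tokens = (
    [("word-initial stressed", "stop")] * 8
    + [("word-initial stressed", "approximant")] * 2
    + [("word-medial unstressed", "stop")] * 1
    + [("word-medial unstressed", "approximant")] * 9
)
dist = conditional_distribution(tokens)
```

A learner whose representations encode something like `dist` has linked each variant probabilistically to its environment, which is the kind of stored association that would allow the presence of a stop to bias the perception of stress.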


In these terms, stress is understood as a percept that emerges from language specific experience, as a constellation of cooccurring phonetic cues. There is evidence from infant research that stress patterns influence the representations created. Research on infant perception of rhythm has shown that they are sensitive to the rhythmic patterns of language from a young age and this sensitivity increases and becomes language specific as development progresses. For example, changes in the rhythmic patterns of alternating strong–weak (Sw) syllables are detected by infants between one and four months of age (Jusczyk & Thompson, 1978) and at ninemonths of age English infants demonstrate a trochaic bias (Sw). Infants not only demonstrate a preference for stress patterns in the speech input, but they can also use this information to learn phonetic contrasts. Mattys, Jusczyk, Luce, and Morgan (1999) found that stress was beneficial for learning phonetic contrasts such that minimally distinct sounds were detected in stressed syllables, but not in unstressed ones. These results suggest that the stress pattern of adult language may influence the level of detail encoded in early lexical representations.

Curtin, Mintz and Christiansen (2005) show that infants incorporate information about stress into their representations of word forms; after hearing DObita, infants of 7 months recognize that doBIta is different. This information can influence subsequent word recognition and word segmentation and, crucially, it interacts with the way in which statistics are calculated over the input.

The results from the perception study presented in Chapter 2 support a similar type of effect for stress in adult second language acquisition. Stress detection by more experienced learners was affected by the presence of a stop, which is probabilistically linked to stress in the Spanish input. Less experienced learners were not affected. The tracking of statistical information by adult L2 learners of Spanish is affected by stress (its presence and absence), and this information is stored in their representations, a finding that supports those of Curtin et al. (2005) for infants. Stress influences the way in which other information is statistically tracked, and its influence will depend upon the developmental level of the learner. Importantly, the concurrent tracking of stress and allophone onset is not dependent upon L1 filter effects: as discussed in Chapter 3, the low-level learners were screened for their ability to perceive stress in Spanish. Instead, learners require sufficient experience with Spanish (the developmental level dynamic filter), and only then can they draw upon sufficiently robust representations that coalesce around stress and stop allophones.

I have presented evidence that adult L2 learners track distributional information in the input and use this information to produce and perceive their target language. In order to successfully organize the information that has passed through the filters, L2 learners make use of the comparison and contrast mechanism discussed above. This will lead to the formation of a ‘Spanish’ stop category and the maintenance of an ‘English’ stop category, given that no two languages share precisely the same cues to sound categories. Thus, when learners encounter multiple languages, they also encounter multiple cues to the different categories. As Weiss, Gerfen and Mitchel (2009:31) state, some of the cues may come from the languages themselves, in the form of allophones, stress cues and phonotactic patterns, among others. Other cues may come from the context in which each language is learned. For example, in Spanish class, students expect to hear Spanish, and even outside of class they expect to hear Spanish from their Spanish teacher.

Weiss et al. (2009) investigated whether learners do form multiple representations based upon cues that separate different input streams. They chose to examine the indexical cue of speaker voice (male and female) and tested listeners’ ability to segment four-word artificial speech streams. They exposed listeners to 12 minutes of artificial speech input by each speaker. In Condition 1, the streams had compatible statistics, while in Condition 2 the statistics were incongruent and required that the listener track separate sets of statistics for each speaker, with gender providing a strong contextual cue for the change in distributional information. Their results showed that when the streams were presented with different voices, adults were able to learn them both; however, in the absence of the contextual (talker) cue, learning was at chance for both. These results indicate that when the comparison and contrast mechanism is activated to indicate that two separate streams of input are being heard, and therefore that two separate sets of statistics must be calculated, learners can do so. In the absence of such a cue, no learning occurs.
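Why a talker cue matters for incongruent streams can be illustrated with a minimal sketch. It assumes that the statistic being tracked is the transitional probability between adjacent syllables; the syllables, stream lengths and function name below are hypothetical and are not taken from Weiss et al.'s materials, and the two streams are simply concatenated rather than interleaved as in an actual experiment.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Return P(next syllable | current syllable) for each adjacent pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical toy streams: for talker A "ba" is always followed by "du",
# for talker B "ba" is always followed by "ki" (incongruent statistics).
stream_a = ["ba", "du", "ti", "go"] * 50
stream_b = ["ba", "ki", "ti", "lu"] * 50

# One set of statistics computed over everything heard (no talker cue used).
pooled = transitional_probabilities(stream_a + stream_b)

# Separate sets of statistics, one per talker (talker cue used).
by_talker = {
    "A": transitional_probabilities(stream_a),
    "B": transitional_probabilities(stream_b),
}

print(pooled[("ba", "du")])          # diluted by the other talker's stream
print(by_talker["A"][("ba", "du")])  # fully predictive within talker A
```

In this toy case the pooled estimate for the pair is halved relative to the talker-specific estimate, which is one way of picturing why learning could fall to chance when nothing signals that two sets of statistics are needed.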

In a related study, Gebhart, Aslin and Newport (2009) conducted a series of five experiments aimed at investigating primacy effects in artificial grammar learning, specifically, whether learning a previous set of structures interferes with the acquisition of a subsequent set of structures, a situation that can be likened to the task facing adult second language learners. Their results showed that learners acquire the first set of artificial language structures to which they are exposed, and not the second, when there is no explicit cue to signal a mid-stream shift in these structures. When the shift is rendered explicit by the addition of a pause and explicit instruction between the two language streams, both structures are learned and there is no loss of performance on either when compared to a single-language baseline. In their final experiment, the authors tripled the amount of exposure time to the second input stream and added additional cues to the two sets of structures. Their results showed significantly better results on Language 1 than Language 2, but results were still above chance for Language 2. These results show that a) learners detect a structural change, b) they create two separate structural representations for the two subsets of the corpus and c) the emergence of representations for Language 2 did not interfere with the representations retained from Language 1. Crucially, the results of Experiment 5 demonstrate that, with triple the exposure, previously unused cues could lead to the creation of a second set of representations. As Gebhart et al. (2009:25) state, even without low-level cues that could trigger the formation of a second structural representation, learners compared the current input to the stored representations and could recognize the new structures after sufficient exposure, based upon mismatches.

PRIMIR provides for these contingencies with its comparison and contrast mechanism, which serves to organize the input for listeners. Input is evaluated as it comes in and is classified according to previously encountered input.

The data presented in Chapters 2 and 3 show that adult learners are capable of tracking cues across multiple dimensions and, much like Weiss et al.’s (2009) participants, it was proposed that learners use detailed phonetic information to encapsulate each set of input streams. In the present context, L1 English/L2 Spanish learners use the comparison and contrast mechanism to separate the different distributional information for English and Spanish, which is subsequently stored in multidimensional clusters. The effects of the four filters will subsequently determine whether the stored information is available.

I argue in this dissertation that experience is key to the way in which learners use the contextual cues that drive the allophonic alternation. Learners with more experience used the contextual cues, in perception and production, in different ways than learners with little or no Spanish experience. In the following section I propose a way in which experience might be operationalized as a probabilistic updating of representations.

5.7 Probabilistic updating of representations

The fact that previously established categories affect subsequent perception is well established in the literature, perhaps best demonstrated by the categorical perception effect (Liberman, 1957). However, speech perception does not occur without noise: not every token intended by a speaker to be of a certain category will necessarily be an ideal exemplar of that sound category. Thus, listeners must seek the optimal solution to the categorization problem in spite of the noise present in the signal (Feldman, Griffiths & Morgan, 2009). Categorization becomes a process of inferring the speaker’s target pronunciation and, ultimately, categorization depends upon prior knowledge of the likelihood that a particular speech target corresponds to a particular category. Representations are constantly updated to reflect the input and are influenced by the posterior probabilities, that is, the probability that a hypothesis is true given the observed input and the prior probability of that hypothesis (Griffiths & Yuille, 2006:3).
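The role of prior knowledge in categorization can be made concrete with a minimal sketch of Bayes' rule. It assumes, purely for illustration, that each category is a Gaussian distribution over a single acoustic cue; the category labels, cue scale and parameter values are hypothetical and are not estimates from the data reported in this dissertation.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Likelihood of cue value x under a Gaussian category."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior(x, categories):
    """categories maps name -> (prior, mean, sd); returns name -> P(category | x)."""
    joint = {
        name: prior * gaussian_pdf(x, mean, sd)
        for name, (prior, mean, sd) in categories.items()
    }
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

# Hypothetical cue dimension in arbitrary units: the token at 1.0 lies
# exactly halfway between the two category means.
equal_priors = {"stop": (0.5, 0.0, 1.0), "approximant": (0.5, 2.0, 1.0)}
skewed_priors = {"stop": (0.8, 0.0, 1.0), "approximant": (0.2, 2.0, 1.0)}

ambiguous = posterior(1.0, equal_priors)
biased = posterior(1.0, skewed_priors)
print(ambiguous)
print(biased)
```

For an acoustically ambiguous token the likelihoods cancel, so the posterior simply mirrors the prior; on this view, accumulated experience with a context or a speaker shifts categorization even when the signal itself is uninformative.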

There is abundant evidence in the literature that listeners can create new representations with very short exposure times to artificial speech. Recent work by Dahan, Drucker and Scarborough (2009) shows that adult L1 English listeners are sensitive to indexical information in the speech signal and, crucially, that this sensitivity is probabilistic in nature. Their study examined how listeners adjust to the acoustic variety found in natural speech, whether by a process of normalization or by an online updating of representations. The authors created two sets of stimuli, one of which corresponded to dialects of English that raise /æ/, as in ‘bag’, to a vowel closer to /ɛ/, but do not do so for the same target vowel in the word ‘back’. The authors hypothesized that for speakers of dialects where raising does not occur, the phonetic overlap would lead listeners to reduce the possibility that [bæ] corresponded exclusively to ‘back’. For speakers of dialects where such raising does occur, the likelihood of [bæ] corresponding to ‘back’ would be equal to one. If listeners adapted their internal representations to their past experience with a given speaker, they would be less likely to misinterpret [bæ] input as probabilistically being ‘bag’ (in Bayesian terms, the likelihood of the input [bæ], given the hypothesis bag) for speakers whose dialect exhibited raising. Their results show that exposure to a talker whose /æ/ vowel was affected in the context of /g/ affected the reaction time and eye movements to words that were not affected by the dialect (Dahan et al., 2009:9). In other words, the adaptation to the new vowel was not the result of expanding the listener’s vowel space or of accepting additional noise in the input. Listeners dynamically adjusted the representations stored in the lexicon according to the speaker and context. This study provides evidence of online dynamic updating that occurs when listeners are exposed to new sound distributions in a different dialect. After brief exposure, listeners exposed to a dialect different from their own were able to adjust their lexical representations to incorporate the new information in a probabilistic manner.

The distribution-based learning mechanism of PRIMIR-L2 can incorporate this type of online updating of representations, based upon past experience with the input and previously stored exemplars. It remains to be determined, however, precisely how this updating occurs across second language representations that are subject to the L1 filter effect. One possibility is that probabilistic updating cannot occur reliably across sparse representations. Another possibility is that the exemplars are not being categorized correctly into unified distributions and therefore probabilistic updating is occurring across incorrectly classified categories.

5.8 Role of the lexicon in L2 sound category acquisition

One of the basic premises of PRIMIR is that the three Planes are interactive and information flows in a top-down and bottom-up fashion. Word forms can emerge before the phoneme plane has finished developing, and word forms can also influence the development of the phoneme plane. Thus, phonological development should reflect natural biases and also language-specific patterns, a result of the lexicon being acquired. We saw evidence for this in Chapter 4, where the earliest lexical items that contained the approximant allophone were also the most frequent in the input and output, according to the CHILDES corpora, and the most recognized by children of similar ages, according to the Normas Léxicas del Español. In other words, phonological development reflects the nature of the lexicon being acquired, and more frequent words will be acquired first and in a more target-like manner. Assuming all learning mechanisms that are available at one point in development continue to be available across the lifespan, the target-language lexicon should also have an influence on adult second language learners.

In terms of second language learners, the role of the lexicon in phonological development has been studied to a much lesser extent. In Best and Tyler’s PAM-L2 framework, it is proposed that a larger L2 vocabulary may lead to rephonologization of the target-language sounds (Best & Tyler, 2007:32), in light of the fact that the overall goal of L2 acquisition is learning a lexicon. Specifically, the prediction is that a larger L2 vocabulary will drive learners to notice subtle differences in words, and thus motivate perceptual learning where none might otherwise occur. Bundgaard (2009) tested this hypothesis by examining vowel perception by L1 Japanese learners of Australian English. Following the PAM-L2 hypothesis that L2 vocabulary size drives L2 phonological reattunement, she predicted that L2 learners with a larger vocabulary would more consistently identify L2 vowels in terms of their L1 vowel categories than learners with a smaller L2 vocabulary. The results showed that vocabulary size was related to the identification patterns of the learners: the high vocabulary group selected a significantly smaller number of L1 categories for the Australian English vowels and was also significantly more consistent in its matching of L2 vowels to L1 categories. Furthermore, the high vocabulary group was also more consistent in its identification pattern of uncategorized L2 vowels, indicating greater regularity in the way in which these learners perceive the L2 phones.

The lexicon plays a crucial role in driving acquisition, leading to frequency effects and place-of-articulation effects that cannot be accounted for if the phonological system is purely rule- or constraint-governed. Learning words helps individuals attune their phonological system to the distributional patterns that occur in the target language and track where allophones surface. As shown in this dissertation, the process of allophone acquisition, and indeed of phonological acquisition in general, is not an across-the-board process but instead involves interaction between a distribution-based learning mechanism and the patterns it tracks in the lexicon.

5.8.1 Role of the lexicon in a gradient phonological system

If, as claimed here, the lexicon plays a key role in phonological acquisition, it must also play a role in the gradient nature of phonological knowledge and variability.

Under constraint-based approaches, gradiency and variability are generated by the grammatical constraints themselves. Stochastic approaches to OT model lexical regularities as stored abstract generalizations that emerge from the reranking of universal constraints. The most widely recognized and employed stochastic model of OT is the Gradual Learning Algorithm (GLA, Boersma & Hayes, 2001). The GLA assumes that all constraints are ranked and violable, an assumption that follows that of regular OT approaches. However, each constraint is also assigned a real number along a continuous scale, its ranking value. Each time an evaluation occurs, some random noise is added to each constraint’s ranking value, giving its selection point. Constraints are ordered according to their selection points. Therefore, these ordinal rankings reflect positions on the continuous scale. Where two constraints differ greatly in their initial ranking value (i.e., prior to the addition of noise), their relative order will remain essentially fixed across evaluations. The GLA takes an analyzed corpus as input and returns a ranking for each constraint based upon the frequencies of constraint violation observed in the corpus. Thus, the basic assumption of the GLA is that learning is error-driven (a greater number of input forms that violate a particular markedness constraint will lead to its faster demotion) and gradual. Variability in the output occurs when the ranking values are close on the continuous scale, so that the addition of random noise may lead to either constraint being ranked higher than the other in their final selection points. Importantly, updating is assumed to occur after each new word is encountered, rendering it difficult to account for how listeners may draw comparisons with words they have stored or words which are similar in phonological/phonetic form. There is abundant evidence that listeners do so, however.
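The way noisy evaluation yields either categorical or variable outputs can be sketched as follows. This is an illustration of the evaluation step only, not of the learning procedure, and it is not Boersma and Hayes's implementation; the constraint names, ranking values, noise standard deviation and number of trials are all assumed values chosen for the example.

```python
import random

def noisy_order(ranking_values, noise_sd, rng):
    """Add Gaussian noise to each ranking value and return the constraints
    sorted from highest to lowest selection point."""
    selection_points = {
        name: value + rng.gauss(0.0, noise_sd)
        for name, value in ranking_values.items()
    }
    return sorted(selection_points, key=selection_points.get, reverse=True)

def share_c1_on_top(ranking_values, noise_sd=2.0, trials=2000, seed=0):
    """Proportion of evaluations in which the hypothetical constraint C1
    ends up with the highest selection point."""
    rng = random.Random(seed)
    wins = sum(
        noisy_order(ranking_values, noise_sd, rng)[0] == "C1"
        for _ in range(trials)
    )
    return wins / trials

# Ranking values far apart: the order is effectively fixed.
far_apart = share_c1_on_top({"C1": 100.0, "C2": 80.0})
# Ranking values close together: the order varies across evaluations.
close = share_c1_on_top({"C1": 100.0, "C2": 99.0})
print(far_apart, close)
```

Under these assumptions the widely separated pair behaves like a strict ranking, while the close pair produces both orders in different evaluations, which is the source of output variability in this family of models.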

Under exemplar-based approaches, gradiency and variability emerge as a consequence of analogical processes (whether complete or incomplete). For example, the data presented in Chapter 4 of this dissertation, taken from Spanish L1 child productions, suggest that lexical frequency plays a role in the production of the more marked allophone. In other words, the child production patterns did not reflect a general frequency-based reranking of constraints. As well, the study cited above by Dahan et al. (2009) showed evidence for online updating of lexical representations that does not occur across the board. Experiments such as those of Dahan et al. (2009) and the ones presented here show that listeners actually adjust their representations, not only constraints, as they gain experience with a language or even a specific speaker. Learners are affected by such factors as neighbourhood density and make active use of lexical frequency information in their perception and production of language. Under exemplar-based approaches, the number of similar neighbours will affect the ease and speed with which a word is recognized and produced. Whenever a word is encountered, words which are similar to it will be activated and enter into competition for recognition.
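Neighbourhood density can be illustrated with a minimal sketch, assuming the common operationalization in which two words are neighbours if they differ by the substitution, deletion or addition of a single segment. Orthographic letters stand in for segments, and the toy lexicon is hypothetical; neither is drawn from the corpora used in this dissertation.

```python
def is_neighbour(a, b):
    """True if a and b differ by exactly one substitution, deletion or addition."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        shorter, longer = sorted((a, b), key=len)
        # Deleting one segment from the longer word must yield the shorter one.
        return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))
    return False

def density(word, lexicon):
    """Number of words in the lexicon that are neighbours of word."""
    return sum(is_neighbour(word, other) for other in lexicon)

# Hypothetical toy lexicon; letters are used as a stand-in for segments.
lexicon = ["gato", "bato", "dato", "gata", "dado", "duda"]

print(density("gato", lexicon))  # neighbours: bato, dato, gata
print(density("dado", lexicon))  # neighbour: dato
```

On this measure a word such as the first one sits in a denser neighbourhood than the second, so it would be expected to face more competitors during recognition.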


5.9 Conclusion

In this dissertation I showed that adult learners of a second language use a distribution-based mechanism to create exemplar-like representations of their target language allophones. I outlined PRIMIR-L2, an extension to Werker & Curtin’s (2005) PRIMIR framework, and proposed the addition of an L1 filter that serves to direct attention and information pickup for second language learners.

The research presented in this dissertation adds to the growing literature showing that the mechanism underlying language acquisition and development is rooted in universal perceptual and articulatory biases and conditioned upon the linguistic input received by the learner. Learner representations are in a constant state of change, adapting to the incoming information and allowing learners to optimally categorize new input based upon prior knowledge and experience. There is continuity between infant language learning and adult second language learning in terms of the learning task itself and the mechanisms underlying it; the distinction lies in the way learning is affected by filters that serve to direct learner attention and allow for information pickup.

To conclude, this dissertation represents a novel approach to adult and child phonological acquisition. Learning a sound system consists of acquiring statistical, probabilistically based information that accumulates gradually over the course of learning. I provided evidence that the grammar encodes distribution-based information that includes frequency and fine phonetic detail, and any grammatical model must be able to account for data of this type.


Appendix A

Stress Detection Word List

Word       Number of syllables   Stressed syllable

barad      2                     2
belen      2                     2
budan      2                     2
dagú       2                     2
gonam      2                     2
gudin      2                     2
magú       2                     2
dabun      2                     2
ibe        2                     1
bera       2                     1
bito       2                     1
dabo       2                     1
dida       2                     1
duno       2                     1
gita       2                     1
nabe       2                     1
lúguida    3                     1
básigo     3                     1
dibanó     3                     1
adigué     3                     3

Appendix B

Word list

Words in italics are nonce words.

a. Word onset stressed syllable

Ci          Ca          Cu
bicho       bato        burro
gita        gato        gudo
díba        dado        duda

b. Word onset unstressed syllable

Ci          Ca          Cu
bidán       banana      buró
guitarra    galán       gusano
dilató      dató        dudó

c. Word medial stressed syllable

Ci          Ca          Cu
cabina      abague      aburro
meguilla    abogado     laguna
bedita      adapto      maduro

d. Unstressed word medial

Ci          Ca          Cu
tabila      sábado      aburró
mudinó      idagó       laduró
águila      regadó      agua

Appendix C

This Appendix includes a list of selected words for each child where the approximant allophone was the target production. The data for FC is presented first, followed by MG and for each child, targetlike items are presented first. The lexical items are organized by place of articulation.

In column three, I present the mean log frequency and raw frequency for all words, across all places of articulation, for the targetlike and non-targetlike items. In column four I present the percentage of children who recognize and use the item, according to the Spanish Child Development Index.

FC: TARGETLIKE

C: mean log freq 1.53, mean raw freq 56.4
CDS: mean log freq 1.35, mean raw freq 36.8
CDI: % children who use/understand item
Frequencies are given as log frequency, with raw frequency in parentheses.

Target                    Realization        C             CDS           CDI

b medial
lobo ‘wolf’               [loβo]             2.17 (148)    2.13 (136)    55%
autobús ‘bus’             [autoβuh]          0             0.3 (2)       46%
abierta ‘open’            [aβiéka]           .78 (6)       .7 (5)        87%
nuevo ‘new’               [neβo]             .3 (2)        1.18 (15)     52%
arriba ‘above’            [iβa], [viva]      1.95 (90)     2.36 (230)    72%
avion ‘plane’             [aβon]             0             1 (10)        91%
caballo ‘horse’           [aβaiȴo]           2.2 (167)     2.1 (128)     89%
abajo ‘below’             [aβaxo]            1.79 (61)     2.28 (190)    72%

d medial
nada ‘nothing’            [na’a]             2.37 (237)    2.64 (435)    55%
vestido ‘dress’           [bestío]           1.04 (11)     1.71 (52)     60%
helado ‘ice cream’        [elao]             0             1.28 (19)     66%

g medial
agua ‘water’              [aγwa]             2.9 (489)     2.65 (448)    100%
jugo ‘juice’              [kuγo]             1.04 (11)     1.2 (16)      85%

FC: NON-TARGETLIKE

C: mean log freq 1.2, mean raw freq 28
CDS: mean log freq 1.4, mean raw freq 38

Target                    Realization        C             CDS           CDI

b medial
abajo ‘below’             [abáo], [vako]     1.79 (61)     2.28 (190)    72%
abuelo ‘grandfather’      [béo]              1.32 (21)     2.13 (134)    80%
vibora ‘snake’            [bobola]           0             0             55%

g medial
yogúr ‘yogurt’            [ȭogu]             0             1 (10)        65%
fuego ‘fire’              [βeko]             .7 (5)        1.25 (18)     48%

MG: TARGETLIKE

C: mean log freq 1.53, mean raw freq 56.4
CDS: mean log freq 1.35, mean raw freq 36.8

Target                                   Realization        C             CDS           CDI

b medial
sabes ‘you know’ (saber, infinitive)     [saβes]            2.2 (167)     2.9 (837)     40%
avion ‘plane’                            [aβión]            0             1 (10)        91%
otra vez ‘again’                         [ota βes] 3X       2.1 (102)     2.58 (380)    66%
arriba ‘above’                           [arɸiβa]           1.95 (90)     2.36 (230)    72%
iba ‘went’ (ir, infinitive)              [iβa]              1.77 (59)     1.85 (70)     65%
abajo ‘below’                            [aβaxo]            1.79 (61)     2.28 (190)    72%
nieve ‘snow’                             [nieβe] (2X)       1.18 (15)     1.17 (15)     42%
abuelo ‘grandfather’                     [aβuélo]           1.32 (21)     2.13 (134)    80%
bebé ‘baby’                              [beβe]             1.51 (32)     2.53 (342)    95%
cobija ‘blanket’                         [koβixa]           0             0             78%

d medial
puedo ‘(I) can’                          [puéo]             2.05 (113)    2.24 (174)    43%
queda ‘remains’                          [kéa]              1.38 (24)     1.95 (89)     48%
comida ‘food’                            [komía]            1.84 (69)     2.01 (103)    80%
computador ‘computer’                    [komputaor]        0             0             23%
ayúdame ‘help me’ (ayudar)               [aúame]            0             1.08 (12)     60%
borrador ‘eraser’                        [bowaor]           0             0             0

g medial
amigo ‘friend’                           [amiγo]            1.41 (26)     1.69 (49)     91%
jugo ‘juice’                             [juγo]             1.04 (11)     1.2 (16)      85%
juguetes ‘toys’                          [xuγetes]          1.66 (46)     2.19 (154)    82%
barriga ‘belly’                          [baðiγa]           1.08 (12)     1.32 (21)     89%1
jugar ‘play’                             [xuγaȎ]            2.03 (107)    2.66 (452)    82%
(me) pegué ‘(I) hit myself’              [peγe]             1.51 (32)     1.65 (45)     66%
agua ‘water’                             [aɣwa]             2.9 (489)     2.65 (448)    100%

MG: NON-TARGETLIKE

C: mean log freq .9, mean raw freq 23
CDS: mean log freq 1.2, mean raw freq 33

Target                                   Realization        C             CDS           CDI

b medial
caballo ‘horse’                          [kabaȭo]           2.2 (167)     2.1 (128)     89%
estabas ‘(you) were’ (estar)             [ehtabas]          2.27 (185)    1.9 (86)      34%
cabeza ‘head’                            [kabesa]           1.6 (44)      2.03 (109)    94%
me subí ‘(I) climbed’                    [mesubi]           1 (10)        1 (10)        68%
sabor ‘taste’                            [sabor]            0             0             0%
se acabó ‘It’s finished’                 [seakabó]          1.08 (12)     1.68 (48)     68%
vibora ‘snake’                           [biboda]           0             0             55%
llevar ‘to carry’                        [ȭebar]            0             2 (99)        55%

d medial
revolcado ‘overturned’                   [pwepokado]        0             0             0%
espada ‘sword’                           [espada]           1.1 (13)      1.18 (15)     0%
preferido ‘favourite’                    [esmipetalido]     0             0             43%
mezclado ‘mixed’                                            0             .3 (2)        0%

g medial
juegos ‘games’                           [fuégos]           .9 (8)        1.11 (13)     89%
regalos ‘gifts’                          [legalos]          .95 (9)       1.2 (16)      32%
fuego ‘fire’                             [fuégo]            .7 (5)        1.25 (18)     48%
jugamos ‘let’s play’                     [xugamos]          1.6 (40)      2.06 (116)    89%
digas ‘say’ (you, subjunctive)           [digas]            0             1.6 (42)      0%

References

Alameda, J.R., & Cuetos, F. (1995). Diccionario de frecuencias de las unidades lingüísticas del castellano. Oviedo, Spain: University of Oviedo Press.

Allen, J.S., & Miller, J.L. (2001). Contextual influences on the internal structure of phonetic categories: A distinction between lexical status and speaking rate. Perception & Psychophysics, 63, 798–810.

Archangeli, D., & Pulleyblank, D. (1994). Grounded Phonology. Cambridge, MA: MIT Press.

Bassetti, B. (2006). Orthographic conventions and phonological representations in learners of Chinese as a Foreign Language. Written Language and Literacy, 9(1), 95–114.

Bassetti, B. (2008). Orthographic input and second language phonology. In T. Piske & M. Young-Scholten (Eds.), Input Matters in SLA (pp. 191–206). Clevedon, UK: Multilingual Matters.

Beckman, J.N. (1997). Positional faithfulness, positional neutralization and Shona vowel harmony. Phonology, 14, 1–46.

Beckman, J.N. (1998). Positional Faithfulness. PhD dissertation, University of Massachusetts, Amherst.

Beckman, M.E., & Edwards, J. (2000). The ontogeny of phonological categories and the primacy of lexical learning in linguistic development. Child Development, 71, 240–249.

Beckman, M.E., & Edwards, J. (2000). Lexical frequency effects on young children's imitative productions. In M. Broe & J. Pierrehumbert (Eds.), Papers in Laboratory Phonology V (pp. 208–218). Cambridge: Cambridge University Press.

Best, C.T. (1994). The emergence of native-language phonological influences in infants: A perceptual assimilation model. In J. Goodman & H.C. Nusbaum (Eds.), The development of speech perception: The transition from speech sounds to spoken words (pp. 167–224). Cambridge, MA: MIT Press.

Best, C.T. (1995). A direct realist perspective on cross-language speech perception. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 171–204). Timonium, MD: York Press.

Best, C., & Tyler, M. (2007). Nonnative and second-language speech perception: Commonalities and complementarities. In O.-S. Bohn & M. Munro (Eds.), Language Experience in Second Language Speech Learning: In honor of James Emil Flege (pp. 13–34). Amsterdam: John Benjamins.

Best, C.T., McRoberts, G.W., & Goodell, E. (2001). Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener's native phonological system. Journal of the Acoustical Society of America, 109(2), 775–794.

Best, C., & Strange, W. (1992). Effects of phonological and phonetic factors on cross-language perception of approximants. Journal of Phonetics, 20, 305–330.

Best, C.T., McRoberts, G.W., & Sithole, N.M. (1988). Examination of Perceptual Reorganization for Nonnative Speech Contrasts: Zulu Click Discrimination by English-Speaking Adults and Infants. Journal of Experimental Psychology: Human Perception and Performance, 14(3), 345–360.

Boersma, P., & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32, 45–86.

Boersma, P., & Levelt, C. (1999). Gradual Constraint-Ranking Learning Algorithm predicts acquisition order. Unpublished ms., available from Rutgers Optimality Archive, ROA 361-1199.

Boersma, P., & Weenink, D. (2009). Praat: doing phonetics by computer (Version 5.1.) [Computer program]. Retrieved February 2, 2009, from http://www.praat.org/.

Bohn, O.-S. (1995). Cross-language speech perception in adults: First language transfer doesn’t tell it all. In W. Strange (Ed.), Speech Perception and Linguistic Experience (pp. 273–304). Baltimore: York Press.

Boomershine, A., Currie Hall, K., Hume, E., & Johnson, K. (2007). The influence of allophony vs. contrast on perception: The case of Spanish and English. In P. Avery, B. Elan Dresher & K. Rice (Eds.), Contrast in phonology: Perception and acquisition (pp. 145–171). Berlin: Mouton.

Bosch, L., & Sebastián-Gallés, N. (1997). Simultaneous bilingualism and the perception of a language-specific vowel contrast in the first year of life. Language and Speech, 46(2–3), 217–243.

Bosch, L., & Sebastián-Gallés, N. (2001). Evidence of early language discrimination abilities in infants from bilingual environments. Infancy, 2(1), 29–49.

Broselow, E. (2004). Unmarked Structures and Emergent Rankings in Second Language Phonology. International Journal of Bilingualism, 8, 51–65.

Broselow, E., Chen, S., & Wang, C. (1998). The emergence of the unmarked in second language phonology. Studies in Second Language Acquisition, 20, 261–280.

Browman, C.P., & Goldstein, L. (1992). Articulatory Phonology: An Overview. Phonetica, 49, 155–180.

Brown, C.A. (1998). The role of the L1 grammar in the L2 acquisition of segmental structure. Second Language Research, 14(2), 136–193.

Bybee, J.L. (2000). The phonology of the lexicon: Evidence from lexical diffusion. In M. Barlow & S. Kemmer (Eds.), Usage-based models of language (pp. 65–85). Stanford: CSLI.

Bybee, J.L. (2001b). Phonology and language use. Cambridge: Cambridge University Press.

Bybee, J.L. (2003). Mechanisms of change in grammaticization: The role of frequency. In R. Janda & B.D. Joseph (Eds.), Handbook of Historical Linguistics (pp. 602–623). Oxford: Blackwell.

Byrd, D. (1996). Influences on articulatory timing in consonant sequences. Journal of Phonetics, 24(2), 209–244.

Carroll, S.E. (2001). Input and evidence: The raw material of second language acquisition. Amsterdam: Benjamins.

Casali, R. (1996). Resolving Hiatus. PhD dissertation, UCLA.

Cebrian, J. (2006). Experience and the use of duration in the categorization of L2 vowels. Journal of Phonetics, 34, 372–387.

Celata, C. (2007). Allophonic Variation Can Affect L2 Speech Perception: Evidence from a Tuscan Neutralization Process. Paper presented at New Sounds 2007, Brazil.

Cho, T., McQueen, J.M., & Cox, E. (2007). Prosodically driven phonetic detail in speech processing: The case of domain-initial strengthening in English. Journal of Phonetics, 35, 210–243.

Chomsky, N., & Halle, M. (1968). The Sound Pattern of English. New York: Harper & Row.

Coetzee, A. (2008). Grammaticality and ungrammaticality in phonology. Language, 84, 218–257.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Colantoni, L., & Steele, J. (2007). Acquiring /alveolar approximant /in context. Studies in Second Language Acquisition, 29, 381–406.

Cole, J., Hualde, J.I., & Iskarous, K. (1997). Effects of prosodic and segmental context on /g/ lenition in Spanish. In O. Fujimura, B. Joseph & B. Palek (Eds.), Proceedings of LP ’98 (pp. 123–137). Prague: Karolinium Press.

Coleman, J., & Pierrehumbert, J. (1997). Stochastic Phonological Grammars and Acceptability. 3rd Meeting of the ACL Special Interest Group in Computational Phonology: Proceedings of the Workshop, 12 July 1997 (pp. 49–56). Somerset, NJ: Association for Computational Linguistics.

Crystal, D. (1997). The Cambridge Encyclopedia of the . Cambridge: CUP.

Curtin, S., Mintz, T.H., & Christiansen, M. (2005). Stress changes the representational landscape: evidence from word segmentation in infants. Cognition, 96, 233–262.

Curtin, S., ByersHeinlein, K., & Werker, J. (under review). Bilingual beginnings as a

lens for theory development: PRIMIR in focus.

Cutler, A., & Carter, D.M. (1987). The predominance of strong initial syllables in the

English vocabulary. Computer Speech and Language, 2, 133–142.

Cutler, A., Weber, A., & Otake, T. (2006). Asymmetric mapping from phonetic to lexical

representations in secondlanguage listening. Journal of Phonetics, 34 (2), 269

284.

Dahan, D., Drucker, S. J., & Scarborough, R. A. (2008). Talker adaptation in speech

perception: adjusting the signal or the representations? Cognition, 108, 710718.

Dinnsen, Daniel. (1992). Variation in Developing and Fully Developed Phonetic

Inventories. In C.Ferguson, L. Menn & C. StoelGammon (Eds.) Phonological

Development: Models, Research, Implications(pp. 191210). Timonium, MD:

York Press.

Doughty, C. & Williams, J. (Eds.), Focus on form in classroom second language

acquisition (pp. 197261). New York: Cambridge University Press.

Dupoux, E., Kakehi, K., Hirose, Y., Pallier, C. & Mehler, J. (1999). Epenthetic vowels in

Japanese: a perceptual illusion? Journal of Experimental Psychology: Human

Perception and Performance, 25, 1568–1578.

Dupoux, E., Pallier, C., Sebastian, N., & Mehler, J. (1997). A destressing "deafness" in

French? Journal of Memory and Language, 36, 406–421.

Eckman, F.R. (2007). Universals, innateness and explanation in second language

acquisition. In M. Penke & A. Rosenbach (Eds.) What Counts as Evidence in

Linguistics (pp. 217–239). New York: John Benjamins Publishing Company.

Eckman, F. R., & Iverson, G. (1993). Sonority and Markedness among onset clusters in

the interlanguage of ESL learners. Second Language Research, 9, 234–252.

Edwards, J., Beckman, M. E., & Munson, B. (2004). The interaction between vocabulary

size and phonotactic probability effects on children's production accuracy and

fluency in nonword repetition. Journal of Speech, Language, and Hearing

Research, 47, 421–436.

Edwards, J., & Beckman, M. E. (2008). Some cross-linguistic evidence for modulation of

implicational universals by language-specific frequency effects in the acquisition

of consonant phonemes. Language Learning & Development, 4(2), 122–156.

Ellis, N.C. (2008). Usage-based and form-focused language acquisition: The associative

learning of constructions, learned attention, and the limited L2 endstate. In P.

Robinson & N.C. Ellis (Eds.), Handbook of Cognitive Linguistics and Second Language

Acquisition (pp. 372–405). London: Routledge.

Escudero, P., & Boersma, P. (2004). Bridging the gap between L2 speech perception

research and phonological theory. Studies in Second Language Acquisition, 26,

551–585.

Escudero, P., Hayes-Harb, R., & Mitterer, H. (2008). Novel second-language words and

asymmetric lexical access. Journal of Phonetics, 36(2), 345–360.

Face, T. L. (2002). Disentangling the necessarily entangled: The phonology and

phonetics of Spanish spirantization. Southwest Journal of Linguistics, 21, 56–71.

Feldman, N. H., Griffiths, T. L., & Morgan, J. L. (2009). The influence of categories on

perception: Explaining the perceptual magnet effect as optimal statistical

inference. Psychological Review, 116, 752–782.

Fennell, C.T., & Werker, J.F. (2003). Early word learners’ ability to access phonetic

detail in well-known words. Language & Speech, 46(2), 245–264.

Flege, J. (1995) Second language speech learning: Theory, findings and problems. In W.

Strange (Ed.) Speech perception and linguistic experience: Theoretical and

methodological issues (pp. 233–277). Timonium, MD: York Press.

Flege, J. (2002). Interactions between the native and second-language phonetic systems.

In P. Burmeister, T. Piske, & A. Rohde (Eds.), An integrated view of language

development: Papers in honor of Henning Wode (pp. 217–244). Trier:

Wissenschaftlicher Verlag.

Flege, J. (2003). Assessing constraints on second-language segmental production and

perception. In A. Meyer, & N. Schiller (Eds.), Phonetics and phonology in

language comprehension and production, differences and similarities (pp. 319–

355). Berlin: Mouton de Gruyter.

Flege, James E., & Liu, S. (2001). The effect of experience on adults’ acquisition of a

second language. Studies in Second Language Acquisition, 23, 527–552.

Flege, James E., Bohn, O. S., & Jang, S. (1997). Effects of experience on non-native

speakers’ production and perception of English vowels. Journal of Phonetics, 25,

437–470.

Francis, A. L., Baldwin, K., & Nusbaum, H. C. (2000). Effects of training on attention to

acoustic cues. Perception and Psychophysics, 62(8), 1668–1680.

Francis, A.L., & Nusbaum, H. C. (2002). Selective attention and the acquisition of new

phonetic categories. Journal of Experimental Psychology: Human Perception

and Performance, 28(2), 349–366.

Francis, A.L., Kaganovich, N., & Driscoll-Huber, C.J. (2008). Cue-specific effects of

categorization training on the relative weighting of acoustic cues to consonant

voicing in English. Journal of the Acoustical Society of America, 124(2),

1234–1251.

Frisch, S. & Zawaydeh, B. (2001). The Psychological Reality of OCP-Place in Arabic.

Language, 77, 91–106.

Frisch, S., Pierrehumbert, J.B., & Broe, M.B. (2004). Similarity avoidance and the OCP.

Natural Language and Linguistic Theory, 22, 179–228.

Gebhart, A. L., Aslin, R. N., & Newport, E. L. (2009). Changing structures in

midstream: Learning along the statistical garden path. Cognitive Science, 33,

1087–1116.

Goldinger, S. D. (1996). Words and voices: Episodic traces in spoken word identification

and recognition memory. Journal of Experimental Psychology: Learning,

Memory, and Cognition, 22(5), 1166–1183.

Goldsmith, J. (1995) Phonological theory. In J. Goldsmith (ed) The handbook of

phonological theory. Oxford: Blackwell, pp. 1–23.

Goldrick, M., & Larson, M. (2008). Phonotactic probability influences speech

production. Cognition, 107, 1155–1164.

Goudbeek, M., Cutler, A., & Smits, R. (2006). Supervised and unsupervised learning of

multidimensionally varying non-native speech categories. Speech

Communication, 50, 109–125.

Guion, S. G. & Pederson, E. (2007). Investigating the role of attention in phonetic

learning. In Bohn, O.S. & Munro, M. J. (Eds.), Language experience in second

language speech learning: In honor of James Emil Flege (pp. 57–78). Amsterdam:

John Benjamins.

Hall, K.C. (2009). A Probabilistic Model of Phonological Relationships from Contrast to

Allophony. PhD Dissertation, Ohio State University.

Hallé, P., Segui, J., Frauenfelder, U. & Meunier, C. (1998). Processing of illegal

consonant clusters: a case of perceptual assimilation? Journal of Experimental

Psychology: Human Perception and Performance, 24, 592–608.

Harris, J. (1969). Spanish Phonology. Cambridge, Mass.: MIT Press.

Harris, J. (1983). Syllable structure and stress in Spanish. Cambridge, Mass.: MIT Press.

Hayes, B. (2004). Phonological Acquisition in Optimality Theory: The early stages. In R.

Kager, J. Pater & W. Zonneveld (Eds.), Fixing Priorities: Constraints in

Phonological Acquisition. Cambridge: Cambridge University Press.

Hillenbrand, J. M., Clark, M. J., & Nearey, T. M. (2001). Effects of consonant

environment on vowel formant pattern. Journal of the Acoustical Society of

America, 109, 748–763.

Holt, L.L. & Lotto, A.J. (2006) Cue weighting in auditory categorization: Implications

for first and second language acquisition. Journal of the Acoustical Society of

America, 119, 3059–3071.

Hualde, J.I. (2005). The Sounds of Spanish. New York: Cambridge University Press.

Huberty, C., & Olejnik, S. (2006). Applied MANOVA and Discriminant Analysis.

Hoboken, NJ: Wiley-Interscience.

Ingram, D. (1999). Phonological acquisition. In M. Barrett (Ed.) The development of

language. London: UCL Press, 73–97.

Jaeger, Jeri J. (1980). Testing the psychological reality of phonemes. Language and

Speech, 23, 233–253.

Jakobson, R. (1941/1968). Kindersprache, Aphasie und allgemeine Lautgesetze. [Published

originally in 1941; citations are from the 1968 translation by A. R. Keiler, Child

language, aphasia, and phonological universals. The Hague: Mouton.]

Jia, G., Strange, W., Wu, Y., Collado, J., & Guan, Q. (2006). Perception and production

of English vowels by Mandarin speakers: Age-related differences vary with

amount of L2 exposure. Journal of the Acoustical Society of America, 119,

1118–1130.

Johnson, K. (1997). Speech perception without speaker normalization. In K. Johnson & J.

W. Mullennix (Eds.), Talker variability in speech processing (pp. 145–165). San

Diego: Academic.

Johnson, K. (2005). Decisions and mechanisms in exemplar-based phonology. UC

Berkeley Phonology Lab Annual Report, 289–311.

Johnson, K. & Mullennix, J.W. (Eds.). (1997). Talker Variability in Speech Processing.

San Diego: Academic Press.

Jusczyk, P. W., Luce, P. A. & Charles-Luce, J. (1994). Infants' sensitivity to

phonotactic patterns in the native language. Journal of Memory and Language,

33, 630–645.

Jusczyk, P.W., Goodman, M. & Bauman, A. (1999). 9-month-olds’ attention to sound

similarities in syllables. Journal of Memory and Language, 40, 62–82.

Kazanina, N., Phillips, C., & Idsardi, W. (2006). The influence of meaning on the

perception of speech sounds. Proceedings of the National Academy of Sciences

USA, 103, 11381–11386.

Kent, R. D. (1992). The biology of phonological development. In C. A. Ferguson, L.

Menn, & C. Stoel-Gammon (Eds.), Phonological development: Models, research,

implications. Timonium, MD: York Press.

Kent, C., & Read, C. (2001). Acoustic analysis of Speech. New York: Singular Publishing

Group.

Kondaurova, M.V., & Francis, A.L. (2008). The relationship between native allophonic

experience with vowel duration and perception of the English tense/lax vowel

contrast by Spanish and Russian listeners. Journal of the Acoustical Society of

America, 124(6), 3959–3971.

Ladd, D. R. (2006). ‘Distinctive phones’ in surface representation. In L. M. Goldstein, D.

H. Whalen & C.T. Best (Eds.), Laboratory phonology 8 (pp. 3–26). Berlin:

Mouton de Gruyter.

Ladefoged, P. (1975). A course in phonetics. Orlando: Harcourt Brace.

Lavoie, L. (2001). Consonant strength: phonological patterns and phonetic

manifestations. New York: Garland.

Levelt, C., & van de Vijver, R. (1998). Syllable types in cross-linguistic and

developmental grammars. Rutgers Optimality Archive, ROA 265-0698.

Levy, E. S. (2009). Language experience and consonantal context effects on perceptual

assimilation of French vowels by American-English learners of French. Journal of

the Acoustical Society of America, 125, 1138–1152.

Levy, E. S., & Strange, W. (2008). Perception of French vowels by American English

adults with and without experience. Journal of Phonetics, 36,

141–157.

Lieberman, P. (1960). Some acoustic correlates of word stress in American English.

Journal of the Acoustical Society of America, 32, 451–454.

Lieberman, P. (1975). Intonation, Perception and Language. Cambridge, Mass: M.I.T.

Press.

Lleó, C., & Rakow, M. (2005). Markedness Effects in the Acquisition of Voiced Stop

Spirantization by Spanish-German Bilinguals. In J. Cohen, K.T. McAlister,

K. Rolstad & J. MacSwan (Eds.), ISB4: Proceedings of the 4th International

Symposium on Bilingualism (pp. 1353–1371). Somerville, MA: Cascadilla Press.

Llisterri, J., Machuca, M. J., de la Mota, C., Riera, M., & Ríos, A. (2005). La percepción del

acento léxico en español. Filología y lingüística. Estudios ofrecidos a Antonio

Quilis. Madrid: Consejo Superior de Investigaciones Científicas - Universidad

Nacional de Educación a Distancia - Universidad de Valladolid. 271–97.

Macken, M.A. & Barton, D. (1980a). The acquisition of the voicing contrast in English:

A study of voice onset time in word-initial stop consonants. Journal of Child

Language, 7, 41–74.

Macken, M.A., & Barton, D. (1980b). The acquisition of the voicing contrast in Spanish: a

phonetic and phonological study of word-initial stop consonants. Journal of Child

Language, 7, 433–458.

Martínez-Celdrán, E. (1991). Sobre la naturaleza fonética de los alófonos de /b,d,g/ en

español y sus distintas denominaciones. Verba, 18, 235–253.

Martínez-Celdrán, E. (2004). Problems in the classification of approximants. Journal of

the International Phonetic Association, 34, 201–210.

Mascaró, J. (1984). Continuant spreading in Basque, Catalan and Spanish. In M. Aronoff,

(Ed.) Language sound and structure (pp. 287–298). Cambridge, MA: MIT Press.

Mascaró, J. (1991). Iberian spirantization and continuant spreading. In P. Branchadell, R.

Quer, & B. Solá (Eds.), Catalan working papers in linguistics 1991 (pp. 167–179).

Barcelona: Universitat Autónoma de Barcelona.

Massaro, D, & Cohen, M. (1983). Phonological context in speech perception. Perception

and Psychophysics, 34, 338–348.

Mattys, S. L., Jusczyk, P. W., Luce, P. A., & Morgan, J. L. (1999) Phonotactic and

prosodic effects on word segmentation in infants. Cognitive Psychology, 38(4),

465–494.

Max, L., & Onghena, P. (1999). Some issues in the statistical analysis of completely

randomized and repeated measures designs. Speech, Language, and Hearing

Research, 42, 261–270.

Maye, J., & Gerken, L. (2001). Learning phonemes: how far can the input take us? In

A.H.J. Do, L. Dominguez & A. Johansen (Eds.), Proc. 25th Annual Boston

University Conference on Language Development (pp. 480–490). Somerville:

Cascadilla Press.

Maye, J., Werker, J. & Gerken, L. (2002). Infant sensitivity to distributional information

can affect phonetic discrimination. Cognition, 82, B101–B111.

Munson, B. (2001). Phonological pattern frequency and speech production in adults and

children. Journal of Speech, Language, and Hearing Research, 44, 778–792.

Munson, B., Edwards, J., & Beckman, M.E. (in press). Phonological representations in

language acquisition: Climbing the ladder of abstraction. To appear in A.C.

Cohn, C. Fougeron, & M. K. Huffman (Eds.) Handbook of Laboratory Phonology

Oxford: Oxford University Press.

Munson, B., Edwards, J., & Beckman, M. E. (2005). Relationships between nonword

repetition accuracy and other measures of linguistic development in children with

phonological disorders. Journal of Speech, Language, and Hearing Research, 48,

61–78.

Nazzi, T. & Ramus, F. (2003). Perception of linguistic rhythm by newborn infants.

Speech Communication, 41, 221–243.

Nosofsky, Robert M. (1986). Attention, Similarity, and the Identification-Categorization

Relationship. Journal of Experimental Psychology: General, 115(1), 39–57.

Ohala, J. J. (1994). Speech aerodynamics. In R. E. Asher and J. M. Y. Simpson (Eds),

The Encyclopedia of Language and Linguistics (pp. 4144–4148). Oxford:

Pergamon.

Ohala, J. J. (2005). Phonetic explanations for sound patterns. Implications for grammars

of competence. In W. J. Hardcastle & J. M. Beck (Eds.) A figure of speech. A

festschrift for John Laver (pp. 23–38). London: Erlbaum.

Ortega-Llebaria, M. (2003). Interplay between phonetic and inventory constraints in the

degree of spirantization of voiced stops: Comparing intervocalic /b/ and

intervocalic /g/ in Spanish and English. In T. Face (Ed.), Laboratory Approaches

to Spanish phonetics and phonology (pp. 237–255). The Hague: Mouton de

Gruyter.

Ortega-Llebaria, M. (2006). Phonetic cues to stress and accent in Spanish. In M. Díaz-Campos

(Ed.), Selected Proceedings of the 2nd Conference on Laboratory

Approaches to Spanish Phonology (pp. 104–118). Somerville, MA: Cascadilla

Press.

Parker, S. (2002). Quantifying the sonority hierarchy. PhD dissertation. University of

Massachusetts Amherst. Reproduced and distributed by GLSA (Graduate

Linguistics Students’ Association).

Pegg, J. E., & Werker, J. F. (1997). Adult and infant perception of two English phones.

Journal of the Acoustical Society of America, 102, 3742–3753.

Peperkamp, S., Pettinato, M., & Dupoux, E. (2002). Allophonic variation and the

acquisition of phoneme categories. In B. Beachley, A. Brown, & F. Conlin (Eds.),

Proceedings of 27th Annual Boston University Conference on Language

Development, Volume 2 (pp. 650–661). Somerville, MA: Cascadilla Press.

Pierrehumbert, J.B. (2001a). Exemplar dynamics: Word frequency, lenition, and contrast.

In J. L. Bybee & P. Hopper (Eds.), Frequency and the emergence of linguistic

structure (pp. 137–157). Philadelphia: John Benjamins.

Pierrehumbert, J.B. (2001b). Stochastic phonology. Glot International, 5(6), 195–207.

Pierrehumbert, J.B. (2002). Word-specific phonetics. In C. Gussenhoven & N. Warner

(Eds.), Papers in Laboratory Phonology 7 (pp. 101–140). Berlin: Mouton de

Gruyter.

Pierrehumbert, J. B. (2003). Phonetic diversity, statistical learning, and acquisition of

phonology. Language and Speech, 46, 115–154.

Pierrehumbert, J. B. (2003b). Probabilistic phonology: Discrimination and robustness. In

R. Bod, J. Hay & S. Jannedy (Eds.), Probabilistic linguistics (pp. 177–228).

Cambridge, MA: MIT Press.

Pierrehumbert, J., Beckman, M., & Ladd, D.R. (2001). Conceptual foundations of

phonology as laboratory science. In N. Burton-Roberts, P. Carr & G. Docherty

(Eds.), Phonological knowledge: Its nature and status (pp. 273–304). Cambridge:

Cambridge University Press.

Pitt, M. A., & McQueen, J. M. (1998). Is compensation for coarticulation mediated by the

lexicon? Journal of Memory and Language, 39, 347–370.

Polka, L. (1995). Linguistic influences in adult perception of non-native vowel contrasts.

Journal of the Acoustical Society of America, 95, 1286–1296.

Prince, A., & Smolensky, P. (1993). Optimality Theory: Constraint interaction in

generative grammar. Rutgers University Center for Cognitive Science Technical

Report, 2.

Raphael, L. J., Borden, G. J., & Harris, K. S. (2007). Speech Science Primer, 5th Edition.

Baltimore: Lippincott, Williams & Wilkins.

Rayson, P. & Garside, R. (2000). Comparing corpora using frequency profiling. In

Proceedings of the workshop on Comparing Corpora, held in conjunction with the

38th annual meeting of the Association for Computational Linguistics (ACL

2000). 1–8 October 2000, Hong Kong.

Romero, J. (1992). An experimental analysis of spirantization in Spanish. Journal of the

Acoustical Society of America, 92 (4), 2340.

Saffran, J.R., Aslin, R.N., & Newport, E.L. (1996). Statistical learning by 8-month-old

infants. Science, 274, 1926–1928.

Saffran, J.R., Newport, E.L., & Aslin, R.N. (1996). Word segmentation: The role of

distributional cues. Journal of Memory and Language, 35, 606–621.

Schmidt, R.W. (1990). The role of consciousness in second language learning. Applied

Linguistics, 11, 129–58.

Schmidt, R.W. (1995). Consciousness and foreign language learning: A tutorial on the

role of attention and awareness in learning. In R. Schmidt (Ed.), Attention and

awareness in foreign language learning. Honolulu, HI: Second Language

Teaching and Curriculum Center, University of Hawai’i.

Schmidt, R. (2001). Attention. In P.J. Robinson (ed.), Cognition and second language

instruction. Cambridge, U.K., 3–32.

Shea, C., & Curtin, S. (2005). Learning allophones from the input. Poster presented at the 30th

Annual Boston University Conference on Language Development. Boston, MA.

[also available online at http://128.197.86.186/posters/30/SheaBUCLD2005.pdf]

Shea, C., & Curtin, S. (accepted, to appear in Second Language Research) Context and

the Production of Second Language Allophones.

Shea, C., & Curtin, S. (accepted, to appear in December 2010, Studies in Second

Language Acquisition) Discovering the relationship between context and

allophones in a second language: Evidence for distributionbased learning.

Shea, C., & Curtin, S. (in prep). Phonetic biases and frequency in the acquisition of L1

alternations.

Shea, C. (2005). Acquisition of Spanish stop-approximant alternation by English L2

Spanish speakers. Paper presented at the Hispanic Linguistics Symposium.

Penn State University, PA.

Sluijter, A. M. C., & van Heuven, V. J. (1996). Spectral balance as an acoustic correlate of linguistic stress. Journal of

the Acoustical Society of America, 100(4), 2471–2485.

Sluijter, A. M. C., van Heuven, V. J., & Pacilly, J. J. A. (1997). Spectral balance as a cue in the perception of

linguistic stress. Journal of the Acoustical Society of America, 101(1), 503–513.

Smit, A.B., Hand, L., Freilinger, J.J., Bernthal, J.E., & Bird, A. (1990). The Iowa articulation

norms project and its Nebraska replication. Journal of Speech and Hearing

Disorders, 55, 779–798.

Smith, J.L. (2008). Markedness, faithfulness, positions, and contexts: Lenition and

fortition in Optimality Theory. In J. Brandão de Carvalho, T. Scheer & P. Ségéral

(Eds.), Lenition and Fortition (pp. 312–345). Berlin: Mouton de Gruyter.

Soderstrom, M., Morgan, J.L, Conwell, E., & Feldman, N. (2009). Statistical learning in

language acquisition: Beyond demonstrations, towards a theory. Developmental

Science

Stampe, D. (1979). A Dissertation on Natural Phonology. Bloomington: IULC.

Steele, J. (2005). Position-sensitive licensing asymmetries and developmental paths in L2

acquisition. In L. Dekydtspotter, R.A. Sprouse & A. Liljestrand (Eds.),

Proceedings of the 7th Generative Approaches to Second Language Acquisition

Conference (GASLA 2004) (pp. 226–237). Somerville, MA: Cascadilla Press.

Steriade, D. (2007). Contrast. In P. de Lacy (Ed.), Handbook of Phonology (pp. 139–157).

Cambridge: Cambridge University Press.

Stevens, K.N., & Keyser, S.J. (1989). Primary features and their enhancement in

consonants. Language, 65(1), 81–106.

Storkel, H. L. (2001). Learning new words: Phonotactic probability in language

development. Journal of Speech, Language, and Hearing Research, 44(6),

1321–1337.

Storkel, H.L. (2004). Methods for minimizing the confounding effects of word length in

the analysis of phonotactic probability and neighborhood density. Journal of

Speech, Language, and Hearing Research, 47, 1454–1468.

Strange, W., Akahane-Yamada, R., Kubo, R., Trent, S.A., & Nishi, K. (2001). Effects of

consonantal context on the perceptual assimilation of American English vowels

by Japanese listeners. Journal of the Acoustical Society of America, 109,

1691–1704.

Sundara, M. (2005). Acoustic-phonetics of coronal stops: A cross-language study of

Canadian English & Canadian French. Journal of the Acoustical Society of

America, 118(2), 1026–1037.

Tabachnick, B.G., & Fidell, L.S. (2007). Using Multivariate Statistics. Boston: Pearson

Education, Inc.

Toro, J.M., Sinnett, S., & Soto-Faraco, S. (2005). Speech segmentation by statistical

learning depends on attention. Cognition, 97, B25–B34.

Vihman, M.M., Macken, M.A., Miller, R., Simmons, H., & Miller, J. (1985). From babbling

to speech: A reassessment of the continuity issue. Language, 61, 397–445.

Vitevitch, M. S., & Luce, P. (1999). Probabilistic phonotactics and neighborhood

activation in spoken word recognition. Journal of Memory and Language, 40,

374–408.

Weber, A., & Cutler, A. (2004). Lexical competition in non-native spoken-word

recognition. Journal of Memory and Language, 50, 1–25.

Weiss, D.J., Gerfen, C. & Mitchel, A.D. (2009) Speech segmentation in a simulated

bilingual environment: A challenge for statistical learning? Language Learning

and Development, 5, 30–49.

Werker, J.F. & Curtin, S. (2005). PRIMIR: A Developmental Framework of Infant

Speech Processing. Language Learning and Development, 1(2), 197–234.

Werker, J.F., Fennell, C.T., Corcoran, K.M. & Stager, C.L. (2002). Developmental changes in

infants’ ability to learn similar sounding words. Infancy, 3(1), 1–33.

Westbury, J., & Keating, P. (1986). On the naturalness of stop consonant voicing. Journal

of Linguistics, 22, 145–166.

Whalen, D. H., Best, C. T., & Irwin, J. R. (1997). Lexical effects in the perception and

production of American English /p/ allophones. Journal of Phonetics, 25,

501–528.

Whitley, M.S. (2002). English/Spanish contrasts: A course in Spanish linguistics (2nd ed.).

Washington, DC: Georgetown Univ. Press.

Yeung, H. H. & Werker, J. F. (2009). Learning words' sounds before learning how words

sound: 9-month-old infants use distinct objects as cues to categorize speech

information. Cognition, 113(2), 234–243.

Zampini, M.L. (1994). The role of native language transfer and task formality on the

acquisition of Spanish spirantization. Hispania, 77(3), 470–481.

Zamuner, T.S. (2006). Sensitivity to word-final phonotactics in 9- to 16-month-old

infants. Infancy, 10, 77–95.

Zoll, C. (1998). Positional asymmetries and licensing. Handout of paper presented at the

annual meeting of the Linguistic Society of America, New York, January, 1998.