UNIVERSITY OF CALGARY
Allophone Acquisition: Exploring the Phonological System and the Nature of
Representations
by
Christine E. Shea
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
Department of Linguistics
CALGARY, ALBERTA
June, 2010
© Christine E. Shea 2010
Library and Archives Canada
Published Heritage Branch
395 Wellington Street, Ottawa ON K1A 0N4, Canada
ISBN: 978-0-494-69501-2
NOTICE:
The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.
The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author’s permission.
In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.
While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.
UNIVERSITY OF CALGARY
FACULTY OF GRADUATE STUDIES
The undersigned certify that they have read, and recommend to the Faculty of Graduate Studies for acceptance, a thesis entitled "Allophone Acquisition: Exploring the Phonological System and the Nature of Representations" submitted by Christine E. Shea in partial fulfilment of the requirements of the degree of Doctor of Philosophy.
Supervisor, Dr. Suzanne Curtin, Departments of Linguistics and Psychology
Dr. John Archibald, Department of Linguistics
Dr. Darin Flynn, Department of Linguistics
Dr. Wei Cai, Department of Germanic, Slavic and East Asian Studies
External Examiner, Dr. Ellen Broselow, Department of Linguistics, State University of New York at Stony Brook
April 5, 2010
Abstract
The core premise of this dissertation is that the phonological system operates across detailed, rich representations, formed by tracking distributional information in the input. This proposal is explored in a series of perception and production experiments with adult L1 English/L2 Spanish learners and L1 Spanish-learning children acquiring the Spanish stop-approximant alternation. Allophonic acquisition provides an ideal testing ground for theories regarding the nature of the phonological system and phonological acquisition. Traditionally, allophones are characterized as resulting from categorical rules, which learners implement in their grammar through constraint interaction. In the studies that compose this dissertation, it is argued instead that learners track and store information about the phonological environment for each allophone. The emergent rich representations result from learners’ experience with the input. However, a further premise of this dissertation is that not all information will necessarily be available at all stages of learning and under all conditions (see Werker & Curtin, 2005). In the case of L2 learners, this will be largely a function of the native language filter, while for children learning their first language it will be a function of natural biases and the acquisition of a lexicon.
These issues are explored within the context of Werker & Curtin’s (2005) PRIMIR (Processing Rich Information from Multidimensional Interactive Representations; Curtin & Werker, 2007; Curtin, Byers-Heinlein & Werker, under review) framework, which accounts for early language development. PRIMIR is grounded in the assertions that a) representations are exemplar-like in nature and b) the phonological system is sensitive to distributional information in the input. Interacting with these rich representations and distribution-based learning mechanisms are three dynamic filters that direct information pick-up: natural biases, task effects and developmental level. In this dissertation, I further elaborate the framework by adding the L1 filter effect for L2 learners. The results from the three studies presented here suggest that learners store detailed phonetic information in their representations and that experience interacts with their ability to draw upon this information in perception and production tasks. These results lend support to PRIMIR and, in general, to approaches which view phonological acquisition as sensitive to representations and learners’ experience with the input.
Acknowledgements
My first and most heartfelt thanks go to my advisor, Dr. Suzanne Curtin, for her exceptional skills as an academic and mentor. You helped me enjoy what I do and always managed to keep me focused, oftentimes in spite of myself. I feel ready to take on whatever the future holds and that is due to you, Suzanne, and your faith in me and my abilities. Thank you.
I would also like to thank all my professors in the Department of Linguistics at the University of Calgary. I am grateful to Dr. John Archibald for his patience and assistance over the course of my PhD studies and for providing me with numerous opportunities to experience new things in addition to my research. Thank you, John, for everything.
I am also sincerely grateful to Dr. Betsy Ritter for making syntax not only understandable but even (almost) enjoyable. You set high, uncompromising standards, instil in your students a desire to reach them and most importantly, make us believe we can. One of my greatest pleasures was being your student and working with you. Thank you.
Thanks as well to Dr. Darin Flynn for his profound knowledge of all things phonological and for his contagious academic curiosity. Being your student is a constant adventure in discovery and fearless exploration of ideas and theories.
I would like to thank Dr. Susanne Carroll for sharing her perspective on second language learning and always being willing to impart invaluable advice on how to navigate the academic world. Thank you for all the time and support you have given me over the course of my PhD. I look forward to collaborating with you in the future.
As well, thank you to Dr. Amanda Pounder for her tireless efforts as grad co-ordinator. Your hard work and unwavering support made this whole experience much, much easier and less stressful. And a warm thank you to Dr. Stephen Winters for filling all those gaps regarding how sound works. Thank you.
I would also like to thank all my fellow and former linguistics grad students,
Antonio González, Ilana Mezhevich, Ashley Burnett, Danica MacDonald, Jamison
Cooper Leavitt, Keffyalew Gebregziabher, Kelly Murphy, Kim Meadows, Lindsay
Kirkpatrick, Nick Welch, Nina Widjaja, Rhonda Sim, Silke Weber and Sue Jackson.
Silke and Sue – we will have to plan another road trip soon, hopefully somewhere a little more exciting than Edmonton and to do something a little more exciting than an OT conference. I have loved having the two of you as colleagues. I will always treasure your friendship, support and intelligence. You have made this whole thing more enjoyable and memorable. Silke – if you are lucky, your babies will grow up to be linguists. Sue – if you are lucky, you can join Armando and me on the beach in Mexico.
Thanks as well to all my colleagues in the Speech Development Lab – Danielle, Heather, Jen, Jenn, Jenna, Becky, Sally. It was wonderful working in such a supportive and positive environment. As well, I would like to thank Dan Hufnagle (our errant Speech Development Lab member) for the great conversations and perspective-taking chats. I learned a lot from you, Dan, and always value your opinions and ideas. Thank you.
Sarah Eaton and Jacquie Clydesdale – you were there for me in some tough moments and always helped me gain and maintain perspective. I will treasure your friendship forever. The coffee, tea and walk sessions were some of the most therapeutic hours I spent during this whole thing. Thank you.
A warm, deep thanks to Stephanie Archer for all the hours and hours (and hours and hours!!) of talking, commiserating and griping as well as for all the times you talked me down off the ledge. I guess we were destined for a lasting friendship after that first afternoon way back when in the Grad Lounge, drinking our worries away. It is rare to find someone who truly, from the heart, celebrates your successes and sticks around even when the successes are few and far between. I found that in you and I value your friendship. Thank you, Steph (keep that Epi Pen close by – I want you around for a while longer).
I would also like to thank my parents – Anna and Bill Shea – for letting me be who I am and encouraging me to figure that out. No pressure, no demands, just letting me ‘be’, even when what that meant was not always clear. You play a fundamental role in all I do and have always encouraged me to live my life in my own way. I am sure that must have been difficult at times, but I appreciate it more than words can ever say. All my love to both of you and thank you for loving me as much as you do.
On that note, a huge, enormous thanks to Katherine, Malique, Elizabeth and Nathan for reminding me that we all need a little of that unconditional love and support in our lives. Katherine, you have helped me through the most difficult of times and just knowing that you are there somehow made it all seem OK – that old feeling of invincibility, I guess. It is what has allowed me to get to this point and take all the risks that this journey has involved. Thank you for knowing me so well, loving me so much and keeping me so close.
And a long-distance thanks as well to René, Karina, Tyara, Bruno and Rocío all the way down in México. You are my family and I love you all dearly. Now that the PhD is done hopefully we can visit more frequently.
And finally, I dedicate this thesis to you, Armando, for having helped me believe in myself. We will continue together on this great adventure. I love you with all my heart.
Dedication
I dedicate this thesis to Armando. My love.
Table of Contents
Approval Page
Abstract
Acknowledgements
Dedication
Table of Contents
List of Tables
List of Figures and Illustrations
CHAPTER ONE: INTRODUCTION
1.1 Learning a linguistic sound system: Experience, mechanisms and representations
1.2 Characterizing the alternation
1.3 Approaches to linguistic sound systems: representations and grammars
1.4 Experience and acquisition
1.5 Phonetic and Phonological Models of L2 Speech Acquisition
1.6 Overview of the dissertation
CHAPTER TWO: PERCEIVING THE RELATIONSHIP BETWEEN PHONOLOGICAL ENVIRONMENT AND ALLOPHONES IN A SECOND LANGUAGE: EVIDENCE FOR DISTRIBUTIONAL LEARNING
2.1 Introduction
2.2 Experiment 1: Consonant and Vowel Stress Shift
2.2.1 Method
2.2.1.1 Participants
2.2.1.2 Stimuli
2.2.1.3 Procedure
2.2.2 Results
2.2.2.1 Stress perception: allophone + stressed vowel
2.2.2.2 Stress perception: Testing for trochaic bias
2.2.2.3 Logistic regression analysis
2.3 Experiment 2: Allophone alternation, vowel steady
2.3.1 Method
2.3.1.1 Participants
2.3.1.2 Stimuli
2.3.1.3 Procedure
2.3.2 Results
2.3.2.1 Stress perception: allophone + non-prominent vowel
2.3.2.2 Testing for trochaic bias
2.3.2.3 Logistic regression analysis
2.4 General Discussion
2.5 Conclusions
CHAPTER THREE: EVIDENCE FOR A NON-CATEGORICAL PHONOLOGICAL SYSTEM: ADULT L2 ALLOPHONE PRODUCTION
3.1 Introduction
3.2 Experiment: Stop-approximant production data
3.2.1 Method
3.2.1.1 Participants
3.2.1.2 Stimuli
3.2.1.3 Procedure
3.2.1.4 Phonetic Analyses
3.2.2 Results
3.3 General Discussion
3.4 Conclusions
CHAPTER FOUR: EVIDENCE FOR DETAILED REPRESENTATIONS IN L1 ACQUISITION: FREQUENCY AND ALLOPHONE PRODUCTION
4.1 Introduction
4.2 Phonetic universals and language-specific effects in phonological development
4.3 Corpus I: recorded data
4.3.1 Participants
4.4 Presentation of the data
4.4.1 Data
4.4.1.1 [d] deletion
4.4.2 General discussion of the data: phonetic groundedness
4.5 Corpus II: CHILDES database
4.5.1 Description of the C and CDS Corpora
4.5.2 Frequency data for C and CDS Corpora
4.6 Discussion and Conclusions
CHAPTER FIVE: GENERAL DISCUSSION AND CONCLUDING REMARKS
5.1 Introduction: Summary of the dissertation results
5.2 PRIMIR
5.3 PRIMIR L2
5.4 L1 filter
5.5 Nature of the phonological system: Evidence for distribution-based learning in an L2
5.6 Phonological Mechanisms: evidence for comparison and contrast in an L2 and tracking of multiple levels of information
5.7 Probabilistic updating of representations
5.8 Role of the lexicon in L2 sound category acquisition
5.8.1 Role of the lexicon in a gradient phonological system
5.9 Conclusion
APPENDIX A
APPENDIX B
APPENDIX C
REFERENCES
List of Tables
Table 1. Stress perception on first syllable across groups and onsets
Table 2. Results of the hierarchical logistic regression for Experiment 1
Table 3. Stress perception on first syllable across groups and onsets
Table 4. Results of the logistic regression for Experiment 2
Table 5. L2 participant biographical data
Table 6. Means and standard deviations on the dependent variables for the three groups
Table 7. Results of Discriminant Analysis for phonetic and phonological environment cues
Table 8. MG’s productions
Table 9. FC’s productions
Table 10. Positional log frequency
List of Figures and Illustrations
Figure 1. Ratio values for stops + stressed vowel perceived as stressed/approximants + stressed vowels perceived as stressed
Figure 2. Proportion of syllables perceived as stressed
Figure 3. Stop-initial syllables perceived as stressed/approximant-initial syllables perceived as stressed ratio values
Figure 4. Spectrogram of gato ‘cat’
Figure 5. Plot of group centroids
Figure 6. MANOVA dependent variables
Figure 7. Consonant x context for each group
Figure 8. Schematized distribution of onset segments for FC
Figure 9. Proportion attempted and accurate across target sounds
Figure 10. C and CDS Positional frequencies correlation
Figure 11. Log Likelihood Position Accuracy
Figure 12. MG Log Likelihood Place of Articulation Accuracy
Figure 13. FC Log Likelihood Place of Articulation Accuracy
Figure 14. Log10 frequency counts for words realized with and without approximants in medial position
Chapter One: Introduction
1.1 Learning a linguistic sound system: Experience, mechanisms and representations
... the infant and the adult could never truly perceive the same speech input in the same way, nor could the L2 learner or bilingual perceive L2 or L1 speech in exactly the same way as native monolinguals of either language.
(Best & Tyler, 2007:13)
A language learner never experiences the same input twice over the course of acquisition. The input interacts with the developmental level of the learner, leading to distinct experiences and learning effects each time it is experienced. Moreover, across learners, the same input is never perceived in precisely the same manner. The nature of the input received and individual differences in learning interact in complex ways that ultimately reflect the overall effect of ‘experience’ on language learning.
This dissertation addresses two main questions related to the issue of experience. First, what type of phonological system is required to account for the effect of experience on language acquisition? Second, what types of representations are created over the course of acquisition to support this system? In order to answer these questions, I present perception and production data from L1 English/L2 Spanish learners and production data from L1 Spanish children, focusing on the acquisition of the Spanish stop-approximant alternation (b d g ~ β ð γ).
Learning a language involves not only acquiring the contrastive sound categories that form its phonemic inventory but also acquiring the non-contrastive sounds that surface in predictable contexts (Crystal, 1997). In the phonological literature, it is traditionally assumed that segments can be related either through contrast or allophony (see, e.g., Steriade 2007). Two segments contrast if their distribution in the lexicon of a language is not predictable. Sounds that are related through allophony, on the other hand, occur in conditioned distributions. Each allophone occurs in a regular, predictable context (Crystal, 1997).
An important part of allophone acquisition involves determining the correct context for each variant, and one way learners might do this is by attending to the specific, context-dependent cues that characterize each category. For example, an English learner is exposed to input that contains an alveolar sound with aspiration in word-initial position (e.g., [tʰab]) and is also exposed to input that contains a voiceless alveolar sound with no aspiration (e.g., [stap]). These two sounds share similar acoustic and articulatory characteristics but do not contrast lexical entries in English; they fulfill many of the typical characteristics of allophones found across the world’s languages. Moreover, [tʰ] and [t] represent a very particular type of allophonic relationship: complementary distribution. The likelihood of encountering one sound in the phonological environment where the other occurs is close to zero. The phonological environment can include sounds directly adjacent to the sound itself, sounds which occur at a predetermined distance from it, as well as the prosodic structure that directly contains the sound, such as the syllable, the foot or the prosodic word (Hall, 2009). Part of learning an allophonic distribution involves connecting the correct allophone to its phonological environment.1
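The notion of complementary distribution can be made concrete with a small illustrative sketch. The Python fragment below is not part of the studies reported in this dissertation; the function names, the coarse environment labels and the toy observations are all hypothetical. It shows only that two sounds can be treated as complementary when the sets of environments in which they are attested do not overlap.

```python
from collections import defaultdict

def environments(tokens):
    """Map each sound to the set of environments in which it was observed."""
    seen = defaultdict(set)
    for sound, env in tokens:
        seen[sound].add(env)
    return seen

def in_complementary_distribution(tokens, a, b):
    """True if both sounds are attested and never share an environment."""
    seen = environments(tokens)
    return bool(seen[a]) and bool(seen[b]) and seen[a].isdisjoint(seen[b])

# Hypothetical, hand-coded observations: (sound, environment label).
TOKENS = [
    ("tʰ", "word-initial"),
    ("tʰ", "word-initial"),
    ("t", "after /s/"),
    ("s", "word-initial"),
    ("z", "word-initial"),
]

print(in_complementary_distribution(TOKENS, "tʰ", "t"))  # no shared environment
print(in_complementary_distribution(TOKENS, "s", "z"))   # shared environment
```

On this toy data the aspirated and unaspirated alveolar stops never share an environment, whereas [s] and [z] do. A realistic analysis would need far richer environment coding, and non-overlap alone would not establish an allophonic relationship.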
In this introductory chapter I present a brief description of the Spanish stop-approximant alternation, followed by a short discussion of how allophones have been characterized in phonological theory and by exemplar-based approaches. Subsequently, I address how experience, in terms of previous linguistic knowledge, is characterized in L2 speech learning, concentrating on allophone acquisition. Finally, the experiments comprising this dissertation are briefly discussed and I introduce PRIMIR L2, an extension to Werker and Curtin’s (2005) PRIMIR framework. PRIMIR L2 uses the same architecture and assumptions as the infant speech framework, with the addition of an L1 filter to account for the particularities of adult L2 acquisition.
1.2 Characterizing the alternation
In the Hispanic linguistics tradition, the stop-approximant alternation is characterized as an alternation between stops and voiced spirants (Zampini, 1994; Lleó & Rakow, 2005). However, recent work has demonstrated that the relationship may be better characterized as involving stops and approximants (Hualde, 2005). Martínez Celdrán (2004) argued that these sounds are approximants because they do not exhibit turbulent airflow and, moreover, [β ð γ] have a lower degree of articulatory precision than the spirants or fricatives. In traditional IPA, the symbols used for these allophones are accompanied by the subscript for lowering, which reflects the more open articulatory nature of the approximants as compared to the fricatives or spirants. For the remainder of the dissertation the segments will be referred to as approximants but for ease of exposition the subscript will not be used.
1 Complementary distribution is often cited as a necessary but not sufficient condition for an allophonic relationship to exist (Crystal, 1997). In addition, allophones also generally share certain acoustic and/or articulatory features. For example, in English the sounds /h/ and /ŋ/ occur in complementary distribution: /h/ only occurs in syllable-initial position and /ŋ/ in syllable-final position, but no native speaker of English would ever consider these sounds to be allophones in the same way as [t] and [tʰ].
Phonological descriptions (e.g., Mascaró 1984; Harris 1969) have characterized the alternation in terms of feature spreading, with the stop generally proposed as underlying and the approximant as the allophone. Under this view, the feature [+cont] spreads from the adjacent vowels to the [-cont] stops, rendering them approximants. Stops surface after a pause and after nasals, and the alveolar stop also surfaces after the lateral /l/. To account for the surfacing of [d] after /l/, researchers posited an underspecified representation for /l/ in Spanish. Face (2002) provides a phonetic and phonological account that is based on the similarity of place of articulation between the lateral and the alveolar stop. This, according to the author, is why the stop has a strong release.
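As a rough illustration of the categorical description just given (and not of any particular author's formal analysis), the following Python sketch applies the distribution to simplified transcriptions. Several things here are assumptions made only for the example: each character is treated as one segment, the start of the string stands for a pause, word boundaries are ignored, and the names `realize`, `APPROXIMANT` and `NASALS` are hypothetical.

```python
# Underlying voiced stops and their approximant counterparts.
APPROXIMANT = {"b": "β", "d": "ð", "g": "γ"}
# Assumed set of nasal segments for this toy transcription.
NASALS = {"m", "n", "ɲ", "ŋ"}

def realize(segments):
    """Return surface segments under the categorical description.

    A voiced stop stays a stop after a pause (start of the list),
    after a nasal, and, for /d/ only, after /l/; elsewhere it
    surfaces as the corresponding approximant.
    """
    out = []
    prev = None  # None stands for a preceding pause
    for seg in segments:
        if seg in APPROXIMANT:
            keeps_stop = (
                prev is None
                or prev in NASALS
                or (seg == "d" and prev == "l")
            )
            out.append(seg if keeps_stop else APPROXIMANT[seg])
        else:
            out.append(seg)
        prev = seg  # the context is the underlying preceding segment
    return out

for form in ["bata", "labata", "ambos", "kaldo", "algo"]:
    print(form, "->", "".join(realize(list(form))))
```

The sketch is deliberately all-or-nothing: every voiced stop is either a full stop or a full approximant, with nothing in between.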
More recent research has examined the non-categorical phonetic realizations of the alternation and considered conditioning factors such as word position and stress. According to Hualde (2005), more open, approximant-like articulations occur post-tonically rather than in the onset of a stressed syllable (p. 142; see also Lavoie, 2001; Ortega Llebaría, 2003; Shea & Curtin, to appear). The conditioning factors of word position and stress have a variable effect not only on the alternation as a whole but also on the different consonants themselves: more stop-like productions are observed with the bilabial segments than with the velars (Cole, Izkarous & Hualde, 1997; Ortega Llebaría, 2003). Examples are provided in (1):
(1) Examples of allophones across contexts:
Context                    Word       Transcription   Gloss
Word-initial, stressed     bicho      [ˈbitʃo]        ‘bug’
Word-initial, unstressed   gusano     [guˈsano]       ‘worm’
Word-medial, stressed      adentro    [aˈðentɾo]      ‘inside’
Word-medial, unstressed    cabalgar   [kaβalˈgaɾ]     ‘to trot’
Phrase-medial              la bata    [laˈβata]       ‘the housecoat’
The primary acoustic cues to this alternation are the presence of a release burst and segmental intensity: stops have audible release bursts and are less intense than approximants. These acoustic cues are internal to the allophones. However, accurate production and perception also require knowledge of where each alternant occurs. Adult L2 learners of English must begin to connect the /b/ at the beginning of [bit] with a release burst and recognize that the [b] in Spanish that occurs between two vowels does not have a release burst. In other words, learners must recognize where in the input the specific phonetic cues occur, that is, their phonological environment.
1.3 Approaches to linguistic sound systems: representations and grammars
The long-standing question of how people learn to perceive and produce language has led researchers to posit various types of mechanisms and representations that might play a role in language acquisition. A formal linguistic model of grammar, such as that proposed in the generative tradition, postulates that phonological knowledge consists of a series of rules or constraints that operate across abstract, minimal representations of lexical items. In their work The Sound Pattern of English (1968), Chomsky and Halle state that the phonological grammar component maps between the syntax and the phonetics and is responsible for applying the necessary rules to arrive at the correct phonetic form. In other words, phonology serves as the intermediate component between the abstract representation of syntax and the physical realization of sounds.
From this perspective, learning is categorical and systematic, and the resulting lexical representations contain no redundant or predictable information (Chomsky & Halle, 1968). The lexicon is fully separated from the rules and constraints that form the grammatical output. Sounds that contrast (i.e., phonemes) and those which do not (i.e., allophones) differ in that non-contrastive sounds are the result of rule application. Specifically, the lexicon is assumed to consist of a series of underlying forms, which contain only contrastive information. These underlying forms pass through the grammatical system of rules or constraints, which in turn supply allophonic information. An example of a rule might be A → B / D __ E (i.e., A becomes B when it occurs between D and E). Because allophones are predictable, they are not part of the underlying lexical representation. The problem with this traditional characterization, however, is that there are examples of allophones that are a) not predictably distributed in all positions (e.g., only voiceless stops can occur syllable-finally in German, but voiced and voiceless stops alternate freely in syllable onset position) and b) even those that are predictably distributed in all positions often demonstrate gradiency and probabilistic influences in their realizations. To address this, Goldsmith (1995) suggests that contrast should be thought of as a ‘cline’ rather than a binary distinction.
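The schematic rule A → B / D __ E can be sketched in a few lines of Python. This is only an illustration of what a categorical rewrite rule does to underlying forms, under the assumption that each symbol is a single character; the function name `apply_rule` and the toy forms are hypothetical.

```python
import re

def apply_rule(form, a, b, left, right):
    """Rewrite a as b wherever a stands between left and right."""
    pattern = (
        f"(?<={re.escape(left)})"   # left context D, not consumed
        f"{re.escape(a)}"           # target A
        f"(?={re.escape(right)})"   # right context E, not consumed
    )
    return re.sub(pattern, lambda match: b, form)

# Underlying forms contain only A; B is supplied by the rule.
UNDERLYING = ["DAE", "CAE", "DAF"]
SURFACE = [apply_rule(form, "A", "B", "D", "E") for form in UNDERLYING]
print(SURFACE)
```

Only the first form meets the structural description, so only there does B appear on the surface. The rule either applies or it does not, which is exactly the property that gradient and probabilistic realizations call into question.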
In constraint-based approaches such as Optimality Theory (OT; Prince & Smolensky, 1993), input-output relations are non-serial and relationships between types of contrast do not strictly exist (Hall, 2009). Instead, OT grammars yield language-specific outputs through the ranking of different types of constraints on phonological outputs: faithfulness constraints require the output to retain certain characteristics of the input, and markedness constraints require the output to have certain phonetic characteristics regardless of the form of the input. Thus, under OT, allophones occur in the output as a result of constraint ranking, specifically, constraints that are relativized to particular positions in the word in order to reflect allophonic alternations. For example, there are families of constraints related to positional faithfulness (Beckman, 1998) or positional markedness (Zoll, 1999). High-ranking positional faithfulness constraints will lead to contrasts and high-ranking positional markedness constraints will lead to allophonic variation conditioned by the phonological environment. In OT, contrastive and allophonic relationships can be easily expressed, despite the fact that they do not formally play a role in the articulation of the theory itself. According to Hayes (2004), all forms of contrast in OT emerge strictly from the ranking of constraints, not from any inherent relationship in the phonemic inventory of the language itself.
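The effect of ranking can be illustrated with a minimal evaluation sketch in Python. The two constraints below are hypothetical simplifications written for this example, not the formulations of Beckman (1998) or Zoll (1999): `star_v_stop_v` penalizes a voiced stop between vowels, and `ident` penalizes any segment that differs from the input. Candidates are compared by their violation profiles in ranking order.

```python
VOWELS = set("aeiou")
STOPS = set("bdg")

def star_v_stop_v(inp, out):
    """Markedness (hypothetical): one violation per intervocalic voiced stop."""
    return sum(
        1
        for i in range(1, len(out) - 1)
        if out[i] in STOPS and out[i - 1] in VOWELS and out[i + 1] in VOWELS
    )

def ident(inp, out):
    """Faithfulness (hypothetical): one violation per changed segment."""
    return sum(1 for i, o in zip(inp, out) if i != o)

def optimal(inp, candidates, ranking):
    """Return the candidate whose violation profile is best in ranking order."""
    return min(candidates, key=lambda c: tuple(con(inp, c) for con in ranking))

# Markedness over faithfulness: stop initially, approximant between vowels.
print(optimal("aba", ["aba", "aβa"], [star_v_stop_v, ident]))
print(optimal("ba", ["ba", "βa"], [star_v_stop_v, ident]))
# Faithfulness over markedness: the input stop is retained.
print(optimal("aba", ["aba", "aβa"], [ident, star_v_stop_v]))
```

With the markedness constraint ranked higher, the position-dependent pattern falls out of the ranking alone; reversing the ranking preserves the input segment in every position.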
OT grammars themselves also emerge through the re-ranking of universal constraints over the course of acquisition. Most models assume that re-ranking is a consequence of an error analysis process that drives constraints either up in the ranking or down, according to the number of violations the particular constraint incurs for the input provided (see, e.g., Boersma & Hayes, 2001). There is no role for the lexicon in traditional OT-based learning algorithms (although see Escudero & Boersma, 2004, for an OT model that includes lexical feedback). The learner cannot draw upon previously experienced inputs to evaluate the current one – it is hypothesized that the grammar is the only mechanism that can evaluate input.
Recently, exemplar-based models (see, e.g., Goldinger 1996; Johnson 1997, 2005; Pierrehumbert 2001a, 2001b, 2003a, 2003b, 2006; Bybee 2000, 2001b, 2003) of phonological and phonetic knowledge have presented an alternative approach to OT and rule-based models of grammar. Exemplar-based models assume that all information found in the input is stored in the multidimensional phonetic space. Grammar emerges as a consequence of generalizations across these stored clusters once there is a large cluster of similar exemplars that can be identified as a category (Pierrehumbert, 2002, 2003a; Hall, 2009). Categories emerge when the connections amongst certain exemplars are stronger than the connections to other exemplars. This may be the result of higher-level, top-down factors, such as spelling or lexical knowledge. Allophones will necessarily share more connections amongst themselves in the multidimensional space than phonemes (Johnson, 2005; Hall, 2009). Importantly, in most exemplar models, allophones are not necessarily tied to the notion of belonging to the same overarching category. There is no need to specify a ‘b’ category that subsumes all exemplars of [b] and [β]. Instead, allophones and contrastive segments are at two ends of a continuum that can be understood as endpoints of a similarity relationship (Ladd, 2006).
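A minimal sketch may help to show how categories can fall out of stored exemplars without any rule. The Python fragment below is a generic similarity-based classifier written for illustration only; it is not the implementation of any of the models cited above. The two cue dimensions (presence of a release burst, relative intensity), the numerical values and the `sensitivity` parameter are all invented for the example.

```python
import math

def similarity(x, y, sensitivity=1.0):
    """Similarity decays exponentially with distance in cue space."""
    return math.exp(-sensitivity * math.dist(x, y))

def categorize(token, exemplars, sensitivity=1.0):
    """Return the label whose stored exemplars are, summed, most similar."""
    support = {}
    for label, cues in exemplars:
        support[label] = support.get(label, 0.0) + similarity(token, cues, sensitivity)
    return max(support, key=support.get)

# Invented exemplar cloud: (label, (release burst, relative intensity)).
EXEMPLARS = [
    ("stop", (1.0, 0.2)),
    ("stop", (0.9, 0.3)),
    ("approximant", (0.0, 0.8)),
    ("approximant", (0.1, 0.9)),
]

print(categorize((0.95, 0.25), EXEMPLARS))
print(categorize((0.05, 0.85), EXEMPLARS))
```

Because support is summed over stored tokens, a label with more exemplars attracts ambiguous input more strongly, which is one simple way in which frequency effects can arise in this kind of representation.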
In order to correctly cluster sounds in the multidimensional space, models that use exemplar-based representations assume that all categorization proceeds based upon previously encountered exemplars and representations are constantly shifting to reflect the most current input experienced. Speech perception is probabilistic in nature – listeners learn and keep track of complex probabilistic distributions in the course of processing language. In such probabilistic phonological models (Pierrehumbert, 2003), words are represented by various levels of generalizations, which encode abstractions that may be parametric in nature (such as formant transitions) or abstractions across entire words. Similar sounding tokens that share semantic meaning or other higher-level similarities (e.g., spelling) will begin to shift the exemplar's own space. This is how lexical frequency effects emerge. As learner representations become more robust, the space within the distribution itself will shift to reflect different modes, or peaks, of cue congruence. Given this, one of the basic assumptions of models that employ exemplar-type representations is that the information learners have stored over the course of their experience with a language will play a primary role in directing all subsequent categorization of input. Previous learning guides the way in which information is picked up and used by learners and speech perception is a process of optimizing categorization of the input given the noise present in the signal (Feldman, Griffiths & Morgan, 2009).
There is a growing body of research supporting the proposal that phonological processing by native speakers of a language is closely linked to this optimization process, whereby new input is processed and categorized based upon previously existing clusters and lexical items. Again, frequency will necessarily play a strong role, as more frequently encountered items will coalesce into more robust representations. The more often listeners hear a word the more entrenched that word becomes, as do the sublexical patterns that make up the words themselves (Edwards, Beckman & Munson, 2000).
These patterns are typically referred to as phonotactic probabilities because they encode the likelihood, or probability, that a certain sequence of sounds will appear in a word or in a particular position within a word.
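As a rough illustration of how such probabilities might be estimated, the following sketch computes position-specific biphone probabilities from a toy lexicon. The lexicon, its frequencies, the one-character-per-segment simplification and the summed score are all hypothetical; this is not the metric used in the studies cited.

```python
from collections import Counter

def biphone_probabilities(lexicon):
    """Estimate P(biphone | position) from a {word: frequency} lexicon.

    Each character is treated as one segment (a simplifying assumption).
    """
    counts = Counter()
    totals = Counter()
    for word, freq in lexicon.items():
        for i in range(len(word) - 1):
            counts[(i, word[i:i + 2])] += freq
            totals[i] += freq
    return {key: n / totals[key[0]] for key, n in counts.items()}

def word_score(word, probs):
    """Sum positional biphone probabilities; unattested biphones contribute 0."""
    return sum(probs.get((i, word[i:i + 2]), 0.0) for i in range(len(word) - 1))

# Hypothetical toy lexicon with made-up token frequencies.
toy_lexicon = {"kat": 10, "kab": 5, "tak": 3, "bat": 2}
probs = biphone_probabilities(toy_lexicon)
```

Under these assumptions, a novel form such as "kat" receives a higher score than "tab" because its biphones are more frequent in their respective positions.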
Phonotactic probabilities can affect adult speech production and perception. For example, adults are faster to repeat nonwords with high-frequency consonant-vowel sequences (Vitevitch & Luce, 1999), and listeners are biased towards hearing ambiguous sounds as examples of high-probability sequences (Pitt & McQueen, 1998). Adults also give higher acceptability ratings to words that conform to attested phonotactic patterns (Coleman & Pierrehumbert, 1997; Frisch, Pierrehumbert & Broe, 2004; Munson, 2001), judgments which have also been shown to be sensitive to vocabulary size in native speakers (Frisch & Zawaydeh, 2001). Research on infant speech perception has shown that infants as young as six months are sensitive to the phonotactics of their native language and prefer sounds that occur in their ambient language more frequently to sounds that either do not occur or occur with less frequency (Jusczyk, Luce & Charles-Luce, 1994).
In terms of adult production, Goldrick and Larson (2008) demonstrated that for adult native speakers, phonotactic probability affected production accuracy of nonwords independent of phonetic complexity. For children, Storkel (2001) found that 3- to 6-year-old children learned new words more rapidly when the words contained high-probability sequences than when they contained low-probability sequences. Similar results from other studies led researchers to hypothesize that children with larger vocabularies will have a more robustly generalized phonological system (Edwards, Beckman & Munson, 2001). This hypothesis is based on the notion that representations of frequent, familiar sublexical patterns are more easily accessed during production and less easily shifted by new input, because they are more robustly instantiated.
Under this approach, the grammar emerges epiphenomenally, based upon experience with language-specific distributional information. As learners' experience with the input grows, their vocabulary also expands, which serves to further support the sublexical patterns that occur in the words they already know. Indeed, Edwards, Beckman & Munson (2004) show that vocabulary size in children aged 3-9 years is correlated with production accuracy on novel words with high- and low-frequency sequences, supporting the notion that the lexicon emerges from experience with the distributions of sounds in the target language.
Approaches advocating a separation of the lexicon from the grammar have difficulty accounting for the evidence cited above, given that the frequency of a lexical item cannot be arrived at from its phonological properties. Most formal linguistic models of phonology are strictly grammatical and can interact with lexical items only in terms of their grammatical properties (their phonological and morphological properties); lexical frequency is viewed as an idiosyncratic property of the lexical item itself and must be stored along with it. Recent work by Coetzee (2008) represents an attempt at incorporating lexical frequency effects into a model of the grammar by means of lexically indexed faithfulness constraints that assign a single lexical item to different lexical classes on different occasions. Each item is associated with a distinct probabilistic distribution, determined by usage frequency, which ultimately assigns the item to a particular lexical class. Coetzee’s model addresses phonological variation in English associated with t/d deletion and how usage frequency is related to the deletion rates.
While Coetzee’s model is one of the few that take an Optimality-Theoretic perspective and directly incorporate frequency-based phonological variability effects, his proposal still maintains a separation of the grammar and frequency, or ‘extra-linguistic’ factors, by means of lexical indexing.
1.4 Experience and acquisition
One element all models of acquisition draw upon, however, is learner experience with the input. In the case of adult L2 learners, experience is generally considered in terms of the native language and/or the amount of exposure to the target language, typically as a function of Age of Acquisition (AoA), amount of target language use and length of residence (LoR). In the case of child L1 learners, experience refers to a combination of age, syntactic and lexical development, given that age is not always a reliable predictor of linguistic development in young children. As learners accumulate experience with the target language, their perception and production of its sound system will shift and new representations may be created or previously created representations may be reinforced.
For adults acquiring a second language, linguistic experience (as a combination of the L1 and the other factors mentioned above) plays a determining role in how the non-native sound contrasts are perceived and produced. In terms of perception, numerous studies have shown that not all non-native sounds will be perceived equally, with some discriminated well and others not at all (Best, McRoberts & Goodell, 2001; Best & Strange, 1992; Polka, 1995). The relative ease or difficulty of perception is assumed to depend upon the native language of the listener, whereby sounds that are closer to native language categories will be more difficult to perceive than those that are most different from native categories.
It has been shown that listeners are sensitive to phonetic properties of the target language that may or may not be similar to those of the native phonology. This sensitivity across levels also occurs with respect to non-contrastive variation within categories of the target language sound system (Best & Tyler, 2007). Non-native listeners are affected by contextual factors (Levy & Strange, 2008) and when presented with phonetically variable target language segments, category goodness ratings shift (Allen & Miller, 2001). The perception of non-native phonetic contrasts is also affected by native language phonotactic knowledge, whereby native listeners ‘hear’ sounds that repair phonological input to conform with L1 biases (Dupoux, 1997; Hallé, Segui, Frauenfelder & Meunier, 1998). The cumulative results from this body of work indicate that L2 learners can and do perceive non-contrastive information in their second language.
Phonologically, different allophones are part of the same phonemic category and thus are treated as the same perceptual object by native speakers of a language (Jaeger, 1980; Pegg & Werker, 1997; Whalen, Best & Irwin, 1997; Peperkamp et al., 2003; Shea & Curtin, 2005; Kazanina, Phillips & Idsardi, 2006; Boomershine, Currie Hall, Hume & Johnson, 2007).² The explanation for this finding is that because allophones are variants of the same abstract phonological category, listeners are not very good at discriminating between them – an effect of categorical perception. However, recent research suggests that the type of allophonic relationship will have an influence on the discrimination results. Specifically, Celata (2007) showed that allophones in complementary distribution demonstrate the traditional categorical perception effect of poor discrimination, while for allophones that result from neutralization, discrimination scores actually improved.
² Phonological categories are compared to phonetic categories where the former can contrast lexical items and the latter cannot. Allophones and other sound categories that result from coarticulatory effects are classified as phonetic categories in most approaches to speech development.
Celata (2007) also found task effects for allophone discrimination. L1 Tuscan
Italian listeners carried out an AX discrimination task, where allophones were discriminable at the same level as phonemes, and an ABX/2AFC+gating identification experiment, where the allophones were not discriminated at all. These results were in line with other research suggesting a phonological level of perceptual mapping and a second, phonetic level, which depended upon the ISI and surrounding stimuli (Pegg & Werker,
1997). Together, these studies indicate that under certain task conditions and certain allophonic relationships, native speakers do not perceive differences between allophones.
However, this does not necessarily apply to non-native speakers. As stated, an L1 filter effect may be at play in L2 speech perception that renders cues to target language sound categories imperceptible. The determining factor is assumed to be the way in which the new sounds assimilate to native language categories. This is addressed by the two main models of foreign language speech perception, the Speech Learning Model (SLM, Flege, 1995, inter alia) and the Perceptual Assimilation Model (PAM, Best, Sithole & McRoberts, 1988, inter alia).
1.5 Phonetic and Phonological Models of L2 Speech Acquisition
As Best and Tyler (2007) state, a main question in the cross-language speech perception literature has been whether learners show perceptual shifts of L2 contrasts they were initially unable to differentiate, or differentiated poorly. The answer is affirmative, albeit with certain caveats. Specifically, perceptual learning does occur for some contrasts, but its success will depend upon the degree of similarity to the L1 and the relative amount of experience the listeners have had with the target language. Interestingly, it appears that the greatest amount of perceptual learning occurs within the first year of exposure to the L2, and no significant perceptual learning differences have been found between adults with under one year of experience and those with 1.5 years (Jia, Strange, Wu, Collado & Guan, 2006).
Two of the better known and tested models of L2 perceptual learning are the Perceptual Assimilation Model (PAM: Best, 1994, 1995, inter alia) and the Speech Learning Model (SLM: Flege, 1995; 2002, inter alia). PAM was developed to account for how non-native or naïve listeners perceive foreign language speech and the SLM was designed to examine L2 speech production by adult second language learners. Crucially, neither PAM nor the SLM is restricted to predictions grounded in L1 phonological categories. Both models address non-contrastive phonetic similarities and dissimilarities between the L1 and L2 phones. In fact, one of the principles of the SLM involves the notion that the actual targets of speech learning are positional allophones.
PAM is founded in the idea that the focus of speech perception is on information about the distal articulatory events that produced the speech signal (Best, 1994, 1995), a position that is compatible with Articulatory Phonology (Browman & Goldstein, 1992). PAM posits that perceivers extract articulatory information from the speech signal, rather than forming categories from acoustic-phonetic cues. In terms of testable predictions, PAM establishes a set of relationships that might result when naïve listeners are asked to discriminate non-native sounds. Specifically, PAM takes into account how each phone in a contrasting non-native pair is perceptually assimilated into the most articulatorily similar native phone, as either categorizable (good exemplar of native phone) or uncategorizable (poor exemplar of native phone). PAM predicts that for listeners with no previous experience with the target language, new linguistic sounds will be perceived through the phonological system of the native language, and consequently the phonetic and phonological levels of the target language will be conflated. For L2 learners, the phonological level of the target language is predicted to be key because these learners have access to lexical knowledge and can therefore begin to form contrastive phonological categories (Best & Tyler, 2007).
In terms of experience, both the SLM and PAM share the notion that new sound category formation is possible throughout the lifespan. Importantly, the SLM posits that the L1 and the L2 share a common phonological space. Thus, distinct from PAM, the SLM assumes that perceivers form categories based upon acoustic-phonetic cues, rather than articulatory gestures.
The SLM also predicts that new category formation is more likely for sounds that do not correspond closely to a sound in the native language, stored in long-term memory. For new phonetic categories to be formed, the learner must discern at least some of the phonetic differences between the novel L2 sound and the closest L1 sound. The SLM posits that learners can develop the capacity to perceive non-native phonetic features over the course of acquisition, but even if the features are in place, the L2 listener may not grant each feature, or cue, the same weight as native speakers of the language do. Thus a ‘new’ L2 category may be based on different features or feature weightings than the corresponding category in a monolingual speaker (Flege, 1995).
An important part of cue weighting functions involves language-specific biases, and part of the second language learning task is to acquire the correct cue weighting for the target language categories. It is important to understand precisely how L2 learners weight cues and how their weighting might shift in order to fully grasp how L2 sound category acquisition proceeds. Escudero and Boersma (2004) propose a formal linguistic model of cue weighting in which adult second language learners incorporate cues that are used to distinguish categories in their target language but are not used in their native language. Escudero and Boersma’s L1 Spanish speakers learned to use the duration cue in their perception of the English /i/-/ɪ/ contrast, whereas their native language relies primarily upon spectral information to distinguish among vowels. Their learners noticed that English vowels were functionally differentiated by the duration cue and, with increased language experience, L1 Spanish speakers learned to differentiate between these two vowels. According to Escudero and Boersma, L1 Spanish/L2 English learners use a general learning mechanism that interacts with language-specific experience.
The SLM, PAM and the model proposed by Escudero and Boersma account for how naïve learners categorize non-native sounds, how L2 learners will categorize the sounds of their target language and how an OT perception grammar can model this process. The SLM and Escudero and Boersma’s (2004) model claim that learning new sounds will be a process of acoustic cue weighting, whereby the listener shifts the internal cues that characterize categories in the native language to the correct weighting for the target language cues. For the SLM, the mechanism which carries out this process is not specified. For Escudero and Boersma’s model, it is assumed to be frequency of exposure to the input, i.e., experience.
Related to the work by Escudero and Boersma (2004) is a body of research that examines linguistic cue weighting outside of the OT framework. These studies have generally examined implicit changes over the course of acquisition (e.g., Bohn, 1995; Cebrian, 2004; Flege, Bohn & Jang, 1997; Morrison, 2006). Work by Francis and colleagues (Francis, Baldwin & Nusbaum, 2000; Francis & Nusbaum, 2002; Francis, Kaganovich & Driscoll-Huber, 2008) considers a role for attention in cue weighting by adult second language learners. In Francis, Baldwin and Nusbaum (2000), participants were given category-level feedback in identification training tasks, which allowed them to implicitly infer the role of specific acoustic cues. Subsequent analyses revealed that these acoustic cues were weighted more heavily in post-training tasks.
According to Holt and Lotto (2006), cue weighting characterizes how a listener integrates auditory information (and potentially information from other modalities as well) in perceptual categorization. From this perspective, speech categorization is not just a matter of detecting cues but also of assigning the cues the correct weighting function which, according to Holt and Lotto (2006), depends upon the listener’s experience with phonetic distributions. In a recent experiment, Holt and Lotto (2006) trained listeners on a two-dimensional acoustic cue space, where the two cues were the center frequency (CF) and modulation frequency (MF) of frequency-modulated sound waves, i.e., non-linguistic sounds and cues. The two cues were psychophysically matched to be equally discriminable and were equally informative for accurate categorization. In spite of this, in Experiment 1, listeners’ categorization responses reflected a bias for use of one cue over the other (CF>MF). In Experiment 2, Holt and Lotto (2006) changed the informativeness of the preferred cue to more overlapping distributions and the bias was still found. However, when greater variance was provided for the preferred cue in addition to shifts in distributional information, cue weighting preferences shifted. From this, the authors concluded that cue weighting can be affected by natural biases and also by shifting distributional information in the input.
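The distributional side of this conclusion can be sketched as follows. The example is only an illustration under stated assumptions: the separation index (mean difference over pooled standard deviation), the normalization into weights and the token values are all hypothetical, and the sketch says nothing about the perceptual bias that Holt and Lotto observed when the cues were equally informative.

```python
from statistics import mean, pstdev

def cue_weights(category_a, category_b):
    """Relative weight per cue, proportional to how well it separates two categories.

    Each category is a list of tokens; each token is a dict {cue_name: value}.
    Separation is |mean difference| / pooled standard deviation (a d'-like index).
    """
    separation = {}
    for cue in category_a[0]:
        a = [tok[cue] for tok in category_a]
        b = [tok[cue] for tok in category_b]
        pooled_sd = ((pstdev(a) ** 2 + pstdev(b) ** 2) / 2) ** 0.5
        separation[cue] = abs(mean(a) - mean(b)) / pooled_sd
    total = sum(separation.values())
    return {cue: s / total for cue, s in separation.items()}

# Hypothetical tokens: CF separates the categories cleanly, MF overlaps heavily.
cat_a = [{"CF": 1.0, "MF": 1.0}, {"CF": 1.2, "MF": 3.0}, {"CF": 0.8, "MF": 2.0}]
cat_b = [{"CF": 3.0, "MF": 2.0}, {"CF": 3.2, "MF": 4.0}, {"CF": 2.8, "MF": 3.0}]
weights = cue_weights(cat_a, cat_b)
```

In this toy input the CF cue receives most of the weight because its category distributions barely overlap; increasing the variance of CF within each category would shift weight towards MF.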
There is evidence, however, that perceptual learning can proceed differently under explicit conditions. Adult learners, distinct from infants, can be explicitly told what specific aspects of the input to pay attention to. In a study carried out by Guion and Pederson (2007) using stimuli taken from Hindi stop contrasts, a ‘sound-attending group’ was told that even if the beginnings of two words sounded the same, where they referred to lexical items with different meanings, they were in fact two different sounds. The ‘meaning-attending group’, on the other hand, was told that if two words referred to something different, they were different words, whether the difference was perceptible or not. Guion and Pederson’s results showed a 5.7% improvement for the sound-attending group over a baseline pretest and only a 2.6% improvement for the meaning-attending group. The authors interpret these results as suggesting that distinguishing differences between first and second language phonetic categories benefits from explicit attention to phonetic information.
The studies cited above show that learners demonstrate a natural bias towards weighting some cues over others – even in non-linguistic stimuli, as shown by Holt and Lotto (2006) – and that attention can lead to an explicit awareness of allophonic phonetic contrasts that may assist in acquisition. These studies are particularly relevant to the results obtained here. In the case of adult L2 learners of Spanish, the stop-approximant alternation is explicitly addressed in classroom instruction, which means that in instructed L2 Spanish contexts, explicit attention to the cues involved (even if not referred to in phonetic terms of release burst/intensity values but rather as phonetic categories) informs learners in advance that they will ‘hear’ two different sounds at the phonetic level, both of which may be initially mapped onto the voiced stop L1 category.
1.6 Overview of the dissertation
The experiments carried out in this dissertation with adult L2 learners address how contextual factors affect the perception and production of the allophones and how experience with Spanish determines the nature of the interaction. In this sense, the contribution of this dissertation to the L2 speech literature is slightly different from previous studies, which have mainly examined how learners categorize – whether at a phonological or phonetic level – target language allophones. Instead of examining this categorization process directly, I present perception data that address how allophonic knowledge affects the perception of contextual factors. I also present production data that examine the issue from the opposite perspective: how the contextual factors of stress and word position affect the production of second language allophones. Collectively, these experiments suggest that learners a) use a distribution-based mechanism in acquiring second language allophones and b) create representations that are rich in phonetic detail.
Distribution-based learning involves tracking information in the input that can be
subsequently drawn upon when perceiving or producing language. Upon hearing the phonetic cues that indicate one variant consistently occurring in the same context,
learners begin to build the distribution for this allophone. Another set of cues will be
associated with the other variant, stored as part of a separate context.
The basic assumption of all distributional, or statistical, approaches to acquisition is that the learner is aware of which units are relevant to the learning process and must be tracked. One way learners do so is by tracking the variability that occurs in the input.
Specifically, more variability occurs at transitions between units than within units
(Saffran, Aslin & Newport, 1996; Saffran, Newport & Aslin, 1996, inter alia) and learners can exploit these regularities to locate relevant units and also the boundaries between them. Boundaries between categories may be characterized by high variance; low variance indicates that items are members of the same category (Soderstrom,
Conwell, Feldman & Morgan, 2009). Given this, variance in the input that creates peaks and valleys can be highly informative to the learner while homogeneity of variance may indicate that a particular analysis can be either abandoned (in the case of speech segmentation tasks) or pursued (in the case of category formation).
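The idea that unit boundaries coincide with less predictable transitions can be sketched with transitional probabilities over a syllable stream. This is a simplified illustration and not a reimplementation of the cited experiments; the stream, the three made-up "words" and the threshold value are assumptions.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current) for each adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold):
    """Insert a boundary wherever the transitional probability falls below threshold."""
    tps = transitional_probabilities(syllables)
    units, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps[(prev, nxt)] < threshold:
            units.append("".join(current))
            current = []
        current.append(nxt)
    units.append("".join(current))
    return units

# Hypothetical stream built from three made-up "words": bida, kupa, tigo.
stream = "bi da ku pa ti go bi da ti go ku pa bi da ku pa".split()
units = segment(stream, threshold=0.9)
```

Within each made-up word the second syllable always follows the first, whereas the syllable after a word-final syllable varies, so the dips in transitional probability fall at the word edges.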
Exemplar-based models are grounded in distribution-based mechanisms. These models assume that all information is stored and can be drawn upon in subsequent categorization and production tasks (Johnson, 2005). However, the accessibility of this information will depend on various factors, such as individual experience and whether sufficient generalizations about the input have been formed. At the earliest stages, such clusters are sparse and as experience increases, representations become more robust.
Following Munson, Edwards and Beckman (in press), I assume that representations are latent variables and cannot be directly observed. Instead, the nature of representations can only be inferred from behavioural patterns. The data presented here will allow inferences to be drawn regarding the types of representations created over the course of L2 phonological acquisition and, furthermore, provide evidence for the type of phonological system that is required to create them.
Examining the acquisition of allophones in complementary distribution allows a closer consideration of issues related to representations and the nature of the phonological system. Because their distribution is predictable and therefore contextually dependent, it is possible to analyze how learners’ perception and production are directly affected by the cues in the input and infer how learners use such information over the course of acquisition. From this, it is also possible to make inferences regarding the phonological system itself – whether it is gradient or categorical in nature. For example, unlike phonemes, target-like perception and production of allophones could result from the application of a categorical, systematic rule. The application of this rule may be the end state of acquisition (see Chomsky & Halle, 1968, among others, for this approach) or it could be a by-product of the learning situation – learners apply abstract rules before they have accumulated extensive experience with the language. In other words, rules can be applied to incipient representations that do not necessarily comprise all the details included in representations that exist at later stages.³ These early-stage rules could be a consequence of explicit classroom instruction regarding a particular sound or process.
Alternatively, experience could lead to the formation of a nuanced, context-sensitive generalization applied to detailed representations. On this view, learners do not apply an abstract, categorical rule to acquisition but rather accumulate experience with the input and only subsequently can carry out abstractions (Pierrehumbert, 2003a; Werker & Curtin, 2005).
In the next two chapters I examine how adult L2 Spanish learners acquire the contextual factors that condition the stop-approximant alternation in their target language.
Chapter 2 presents data from the perception of stressed syllables by Low and Intermediate level L1 English/L2 Spanish learners, Native Spanish and Monolingual English speakers to investigate how experience with Spanish interacts with the perception of stress, one of the conditioning factors of the allophone alternation. L1 English/L2 Spanish learners must integrate the phonetic cues that serve to distinguish the more stop-like segments from the more approximant-like segments and link the former with word onset, stressed syllable onset position and the latter with word-medial, unstressed syllable onset position. I predict that if L2 learners track the distribution of this alternation, they should link stops to stressed syllables in word onset position and approximants to unstressed, word-medial position. In Experiment 1, the allophone onset was crossed with vowel stress. In Experiment 2, the allophone onsets were alternated and the vowel was held steady. Results show that less experienced groups were more likely to perceive stressed vowels as stressed, regardless of onset consonant, and approximant-onset syllables as stressed. On the other hand, listeners with greater Spanish experience were more likely to perceive stress on stop-initial syllables only and were less influenced by the stress quality of the vowel. This pattern follows Spanish distributional information. These results suggest that learning the interplay between allophonic distributions and their conditioning factors is possible with experience and that contextual factors play a role in second language allophone acquisition.
³ As will be discussed in Chapter 3, this does not mean that representations change over time. Instead, experience with the input will lead to the creation of denser, more robust representations that still contain all the previously stored information.
Chapter 3 includes production data from Low-Intermediate and High-Intermediate level L1 English/L2 Spanish speakers and five native Mexican Spanish speakers. I examined the use of two cues to the alternation, consonant intensity and the presence of a release burst, and analyzed how these cues varied in participants’ productions in distinct contexts. Results show that the use of these cues differs with experience. That is, learners with greater language experience exhibit cue use that is closer to that of the native speakers. Results further suggest that Low-Intermediate learners may be using a basic rule for producing the alternation but that over time this shifts to a more nuanced production pattern, indicating that more experienced learners’ ability to use these phonetic cues in a native-like fashion emerges over the course of allophone acquisition.
In Chapter 4, I present child production data of the allophonic alternation by children learning Spanish as their native language. The findings support the existence of natural biases in the acquisition of these segments and also show language-specific effects.
In order to account for the data presented in this dissertation, we require a learning mechanism that can track where in the word a particular sound occurs (distributional information) and that specifies how this information is subsequently stored (nature of the representations created) so that the learner can draw upon it when faced with similar input. In the final chapter of this dissertation I present a framework for second and first language acquisition that can adequately address these issues, an extension to PRIMIR (Processing Rich Information from Multidimensional Interactive Representations; Werker & Curtin, 2005). The PRIMIR framework is grounded in two observations: first, rich information is available in the speech stream and second, the listener filters that information (Werker & Curtin, 2005). Importantly, PRIMIR offers an explanation for why some information is available at certain stages of development and not at others, thereby offering a comprehensive explanation for developmental patterns, task effects and attentional effects.
In PRIMIR, Werker and Curtin posit the existence of three dynamic filters, which serve to enhance the raw acoustic saliency of information in the input and can also diminish and/or transform information in the input. The first filter is the result of certain evolutionary and epigenetically based biases. For example, infants prefer speech to non-speech, prefer infant-directed speech and point vowels, and can process rhythmic patterns in speech (Werker & Curtin, 2005:212). The second filter that operates on the infant’s language learning is the developmental level. Younger infants will necessarily have fewer cognitive resources upon which to draw when processing language. Finally, the task itself constitutes the third filter and directs the infant’s attention to certain aspects of the input over others.
While PRIMIR was originally designed to account for infant speech development,
I present an extension to PRIMIR, PRIMIR L2, to address the data discussed here.
PRIMIR L2 includes a fourth filter that operates in addition to those mentioned above, the native language, or L1 filter. The L1 filter also serves to direct information pick up and operates on the input together with the other three filters. The L1 filter operates on exemplar representations that store all the information present in the speech stream right from the learner’s first exposure to the target language. The shifting and interaction of the dynamic filters will determine whether or not this information is available to the learner.
Chapter Two: Perceiving the relationship between phonological environment and allophones in a second language: Evidence for distributional learning
2.1 Introduction
Studies of L2 speech perception have primarily explored how target language sounds fit into the sound system of the speaker’s native language. In particular, these studies examine whether non-native sounds represent new categories, are classified into existing native language phoneme categories, or are similar to existing allophones. For example, Flege’s (1995) Speech Learning Model (SLM) and Best’s Perceptual Assimilation Model (PAM) (Best, 1994; Best, 1995; Best & Tyler, 2007) emphasize L1/L2 perceptual similarity as predictive of difficulties in discriminating non-native contrasts. Flege characterizes non-native phones along a continuum of similarity to L1 phones from “identical” through “similar” to “new”. New phones may be difficult to perceive by inexperienced listeners, but they will eventually become differentiated from L1 phones (and other L2 phones) as L2 learners gain experience. Best’s PAM maintains that listeners assimilate non-native phones into L1 categories based upon a perceived continuum of “category goodness”. If two non-native phones are considered to be of the same native category, they will be very difficult to discriminate. If the two phones are perceptually assimilated to the same native category, but differ in their perceived “category goodness”, they will be easier to discriminate. Finally, if non-native phones are assimilated to different native categories, they will be very easy to discriminate. While these models have provided testable hypotheses regarding how second language learners acquire new sounds, they do not explicitly address the effects of context on the acquisition of target language sounds and thus it is difficult to draw clear predictions for the second language acquisition of sounds that differ in the nature of their contrastiveness, such as allophones.
Most models of L2 phonological and phonetic acquisition regard allophone acquisition as similar to that of phoneme acquisition. In the experiments discussed here, the allophone acquisition task is approached from a slightly different perspective. Instead of focusing on the acquisition of new sound categories and how L2 allophones assimilate into the native language sound inventory, I examine whether learners are sensitive to the contextual factors found in the phonological environment that condition the allophones’ distribution. In other words, do learners recognize and store information about the specific context in which each variant occurs? Learners with greater language experience are predicted to be aware of which factors condition the allophonic alternation and this awareness will shift their perception of the target language. The experiments discussed in this chapter assess how L1 English/L2 Spanish learners perceive and make use of the conditioning factors driving the stop approximant alternation in their target language.
The experimental task involved identifying stressed syllables based on the onset consonant that accompanies the syllable and/or the position in the word. This indirect method of detecting participants’ knowledge of the allophonic alternation permits a closer consideration of how experience affects the use of contextual cues. The occurrence of either the stop or approximant allophone is contingent upon the phonological
environment – where in the word it occurs and whether the syllable is stressed. Given
this, it was posited that more experienced learners use stress and word position as
probabilistic cues to the stop-approximant alternation. Evidence in favour of this relationship would support the claim that learners associate the contextual factors with the allophonic variants and have fairly sophisticated knowledge about sound alternations.
One way learners might do so is by means of a distribution based learning
mechanism. Researchers have shown that both adult and infant listeners are able to form
categories based on the distributions of speech sounds and shift their perception of
allophones by using this type of distribution based mechanism (e.g. Maye & Gerken,
2001; Maye, Werker & Gerken, 2002; Goudbeek, Cutler & Smits, 2006; Holt & Lotto,
2006). A distribution based learning mechanism can track information in the input, for
example, the co occurrence of sounds, and allows learners to draw upon this information
over the course of acquisition. By their very nature, distribution based models assume
learners create and store highly detailed, rich, exemplar type representations and
grammar emerges as the result of generalizations across all the stored items in the lexicon
(see, e.g., Goldinger, 1997, Johnson & Mullennix, 1997; Pierrehumbert, 2001b, 2003). In
the present case, the learning task for L1 English learners of Spanish involves creating a
new distribution for the approximant category, separate from the Spanish stop category
that is very similar to their L1 category. However, accumulating the knowledge required
to determine how context affects the production of the voiced stops in Spanish requires
time and exposure to the target language. The separate categories will emerge only when
the learners have stored sufficient examples of each category and the phonetic details that
distinguish them.
If adult second language learners track distributional information in the input,
they will expect to hear more stops in word initial, stressed position than word medial
unstressed position. Thus, learners will be more likely to perceive stress when the stressed syllable is accompanied by a stop consonant and less likely to perceive a syllable as stressed when it begins with an approximant. Given that it takes extensive linguistic experience to build sound category distributions, more advanced learners will associate the alternant with its most probable context of occurrence; less advanced learners will not do so. The fact that the distribution of the stop-approximant alternation in Spanish is not categorical, other than the post-nasal contexts for stops (and post-lateral in the case of [d]), suggests that acquisition must also be probabilistic in nature. The likelihood of a stressed syllable occurring in word-initial position with a stop consonant in the onset is much greater than that of a stressed syllable occurring in word-medial position with an approximant in the onset, but it is still not 100%.
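The distributional account above can be made concrete with a toy sketch. The snippet below tallies how often each allophone co-occurs with a given context and converts the tallies into conditional probabilities. The exemplar counts, the context labels and the function name are hypothetical and chosen only for illustration; they are not drawn from any Spanish corpus or from the experiments reported here.

```python
from collections import Counter, defaultdict


def allophone_probabilities(exemplars):
    """Estimate P(allophone | context) from (context, allophone) pairs."""
    counts = defaultdict(Counter)
    for context, allophone in exemplars:
        counts[context][allophone] += 1
    return {
        context: {a: n / sum(tally.values()) for a, n in tally.items()}
        for context, tally in counts.items()
    }


# Hypothetical stored exemplars (invented counts, not corpus data).
exemplars = (
    [(("initial", "stressed"), "stop")] * 9
    + [(("initial", "stressed"), "approximant")] * 1
    + [(("medial", "unstressed"), "stop")] * 2
    + [(("medial", "unstressed"), "approximant")] * 8
)

probs = allophone_probabilities(exemplars)
print(probs[("initial", "stressed")])
print(probs[("medial", "unstressed")])
```

On this view, a learner with few stored exemplars has unreliable estimates for every context, which is one way of picturing why less experienced learners are not expected to link stops to stressed, word-initial syllables.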
The acoustic correlates of lexical stress in Spanish and English are similar, other than for unstressed, or reduced syllables, which are common in English and do not occur in Spanish. In English, studies have consistently indicated that for disyllabic words, the acoustic correlates of fundamental frequency (F0), intensity, syllable duration and vowel quality are associated with the perception and production of lexical stress (Lieberman,
1960, 1975; Sluijter and van Heuven, 1996; Sluijter et al., 1997). Stressed syllables have higher F0, greater intensity and longer duration than unstressed syllables. In Spanish, stressed syllables also receive greater prominence by means of pitch, duration and intensity, although the difference in quality between stressed and unstressed vowels is very small (Hualde, 2005). Studies by Llisterri and colleagues (Llisterri, Machuca, de la
Mota, Riera & Ríos, 2003; Llisterri & Schwab, 2010) show that in Spanish, F0 is the only parameter systematically related to the identification of the stressed syllable of a word, while the role of duration depends on the stress pattern. While in English, duration and intensity can function as independent cues to lexical stress, in Spanish only F0 does so. In general, English lexically stressed syllables can be identified in the absence of pitch prominence because of vowel quality and durational differences. In Spanish, pitch is a more fundamental correlate of stress. The experiments presented in this chapter use stressed and non-stressed vowels recorded by a female native Spanish speaker. There are no shifts in vowel quality nor are there any accentual contextual effects at play. While stress is acoustically manifested in different manners between English and Spanish, the same cues are used, with the difference lying in the crucial role of F0 for Spanish.
In Experiment 1, listeners heard CVCV nonce words, crossed for allophone onset and stressed/unstressed vowels to determine if the perception of stress shifts according to the allophone onset. Following the predictions stated above, learners with knowledge of the relationship between phonological environment and allophones will be more likely to select a syllable as stressed if it begins with a stop consonant and has a stressed vowel, than a syllable with an approximant onset and stressed vowel. In Experiment 2, the vowel was equated for stress and only the syllable onsets alternated between stops and approximants. Listeners with more Spanish experience should select stop onset syllables as stressed with greater likelihood than groups with less Spanish experience, given their increased knowledge of Spanish distributional information.
2.2 Experiment 1: Consonant and Vowel Stress Shift
In Experiment 1, participants were presented with bisyllabic CVCV stimuli and asked to select which syllable they perceived as stressed. Because stress is one of the conditioning factors driving the allophonic alternation, and stress is more likely to co occur with stop onsets than with approximant onsets in the Spanish input, Spanish proficient listeners should perceive stress with greater likelihood on stop initial syllables than approximant initial syllables. Stress detection served as an indirect method of determining the perceptual association of stress with stop onset syllables. This indirect behavioural method circumvents problems with phonetic vs. phonological representations4 and also arrives at the key question motivating these experiments: are learners aware of the contextual factors that drive allophonic alternations in their target language?
4 Asking listeners for either discrimination or categorization responses may have led learners of distinct proficiency levels to tap into separate levels of representations. Specifically, the L1 Spanish speakers may tap into the allophones at either the phonetic (resulting in two categories) or phonological (one category) level. L2 Spanish learners, on the other hand, could be tapping into two phonetic categories without considering them necessarily as part of the Spanish sound system. At the phonological level, L2 Spanish listeners could be tapping into the stop category without having unified the two allophones representationally. Thus, in behavioural terms, the L1 and L2 groups could be performing in the same way, but with underlyingly distinct motivations.
2.2.1 Method
2.2.1.1 Participants
Participants were 19 Low Intermediate L1 English/L2 Spanish learners and 20
High Intermediate Spanish learners, recruited from second and third year university level
Spanish classes. Students were recruited from different sections of the same course (eight distinct instructors). Participants filled out an autobiographical questionnaire regarding their experience with Spanish, which revealed that no participant from either group had spent more than six weeks in a Spanish speaking country and none spoke Spanish outside of the classroom context. Participants had received explicit instruction regarding the stop approximant alternation during their class sessions. This alternation is mentioned as
‘softening’ of the ‘b, d and g sounds’ when they occur between vowels. This is covered in the low and higher level classes. None had previously taken Spanish phonology or phonetics courses.
Fifteen Native Spanish speakers (NSS) were recruited from the Center for the
Teaching of Foreign Languages at the National Autonomous University of Mexico,
(CELE UNAM) in Mexico City, Mexico. They were age matched with the native
English speaker learner groups. None of the participants had ever lived abroad, none had attended a bilingual school nor did they have more than three hours per week of contact with English. Finally, 15 Monolingual English speakers (ME) were recruited from a university psychology subject pool and were also age matched with the two learner groups. None of these participants spoke any language other than English. All
participants were paid the equivalent of $15.00 for their time or, in the case of the
Monolingual English speakers, received course credit. None had any reported hearing difficulties.
2.2.1.2 Stimuli
The stimuli were created from naturalistic speech samples, recorded by a native female speaker of Spanish from Mexico City. Recordings were carried out in a soundproof booth and made directly onto a PowerMac computer (GIA417” Soundcard) using a Sennheiser microphone. The microphone was placed in a stand and maintained at a 45° angle at all times, approximately 5.5 cm from the speaker’s lips. The speech tokens were sampled at a rate of 44.1 kHz with a quantization of 16 bits and saved directly onto the computer’s hard drive.
The speaker was asked to read a list of bisyllabic CVCV nonce words, in which the first syllable was stressed, following the expected stress pattern of Spanish. She was told that the nonwords were ‘Spanish’ and was asked to read them in Spanish. The consonants were [b], [d] or [g] and the vowel was [a]. Using PRAAT 5.1 (Boersma &
Weenink, 2008), the consonants were spliced from the vowels to create four separate sounds: stop (word onset), approximant (second syllable onset), stressed vowel and unstressed vowel. For example, the nonce word ‘baba’ [báβa] provided four separate segments; [b], stressed [a], [β] and unstressed [a]. These four sounds were combined to create four different tokens: [báβa], [βába], [baβá] and [βabá]. This procedure was
repeated for both [d] and [g], creating a total of 12 tokens. Stimuli ranged in length from
67ms to 78ms.
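The splicing design described above can be sketched as a small enumeration: each stop-approximant pair yields two onset orders, and each order can carry the stressed vowel on the first or the second syllable. This is only an illustration of how the 12 tokens arise, not the procedure used in PRAAT. The function name is hypothetical, and the symbols [ð] and [γ] for the dental and velar approximants are assumed here for the [d] and [g] series.

```python
from itertools import product

# Stop -> approximant pairs; the symbols for the dental and velar
# approximants are assumptions made for this illustration.
PAIRS = {"b": "β", "d": "ð", "g": "γ"}


def build_tokens(pairs):
    """Cross onset order with stress position for each place of articulation."""
    tokens = []
    for stop, approx in pairs.items():
        onset_orders = [(stop, approx), (approx, stop)]
        for (first, second), stressed_syllable in product(onset_orders, (1, 2)):
            # One stressed and one unstressed [a] per token.
            vowels = ("á", "a") if stressed_syllable == 1 else ("a", "á")
            tokens.append(first + vowels[0] + second + vowels[1])
    return tokens


tokens = build_tokens(PAIRS)
print(len(tokens))
print(tokens[:4])
```

The first four items correspond to the labial series given in the text ([báβa], [baβá], [βába], [βabá], in the order the loop produces them).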
All stimuli were presented to two native English speakers and two native Spanish speakers and judged for naturalness, or how speech-like the sounds seemed to them, on a scale of 1 to 5, where 5 represented ‘extremely natural sounding’ and 1 was ‘extremely artificial sounding’. Stimuli that did not originally receive a rating of 4.5 or higher were re-spliced and presented to the judges again.
In the present case, approximants do not occur, or at least very rarely occur, in word-initial position following a pause, and any stimuli exhibiting this phonotactic pattern will necessarily be deemed unnatural by native Spanish speakers. Asking native speakers to rate stimuli that violate the phonotactic constraints of their native language may be difficult. However, the raters were asked to give a global impression of how
‘Spanish like’ or ‘English like’, i.e., speech like, the sounds seemed to them. Care was taken not to direct their attention to any specific aspect of the stimuli, minimizing the likelihood that they would pay too close attention to the allophone onsets.
2.2.1.3 Procedure
For participants tested in Mexico, the experiment was carried out in a small, quiet room with the door closed. For participants tested outside of Mexico, the experiment was conducted in a soundproof booth. Participants were seated at a table in front of a
PowerMac computer and stimuli were presented through headphones at a comfortable volume using PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993) experimental
software program. Participants were told they were going to listen to non-words in
Spanish. For the Native Spanish speaker group and the two learner groups, all communication occurred in Spanish (for the Low Intermediate group, clarifications were given in English where requested). The Monolingual English speaker group was told the non words were from a foreign language. Participants were instructed to select the syllable they perceived as stressed by means of pressing a key on the computer keyboard.
The keys were marked with a sticker indicating either ‘1’ or ‘2’. Subsequent tokens were played after the participant made their selection, with an ISI of 1000ms. If no decision was made in 1500ms, the trial timed out and the following trial was played. This occurred in 2.5% of all trials.
Participants were given five pretest trials before beginning the experimental trials.
The pretest trials were randomly selected by the experimenter from among the test stimuli and differed for each participant. Because there were technically no ‘right’ answers (listeners were predicted to perceive stress according to either the vowel or the consonant onset, based upon language experience), no feedback was provided during the pretest trials.
All participants first completed a stress detection task prior to the pretest trials to ensure the results would not be compromised due to the inability to detect stressed syllables. The stress detection task involved listening to a series of 20 nonce words that followed the phonotactic and prosodic constraints of Spanish read by a female speaker of
Mexican Spanish, who was asked to read the words in Spanish. All ‘words’ had the form of nouns (i.e., no infinitives and words that were analogically comparable to conjugated verbs were avoided, see Appendix A). Participants indicated by means of pressing keys
on the computer keyboard whether they thought stress fell on syllable ‘1’, ‘2’ or ‘3’.
Sixteen of the twenty lexical items were bisyllabic, eight with stress on the first syllable and eight on the second. The four remaining items were trisyllabic and stress fell either on the first (2) or the third (2) syllable. Only participants who obtained at least 75% accuracy on this task had their results included for analysis.
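The inclusion criterion can be expressed as a short check. The sketch below assumes a hypothetical ordering of the 20 stress-detection items (the actual items are listed in Appendix A) and a hypothetical participant; only the item counts per stress position and the 75% threshold come from the description above.

```python
# Correct stressed-syllable positions for the 20 items: 8 bisyllabic with
# initial stress, 8 with final stress, 2 trisyllabic with initial stress and
# 2 with stress on the third syllable. The ordering is hypothetical.
ANSWER_KEY = [1] * 8 + [2] * 8 + [1] * 2 + [3] * 2


def accuracy(responses, answer_key=ANSWER_KEY):
    """Proportion of items on which the response matches the key."""
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return correct / len(answer_key)


def is_included(responses, criterion=0.75):
    """True if the participant reaches the accuracy criterion."""
    return accuracy(responses) >= criterion


# Hypothetical participant: first 15 items correct, last 5 missed (coded 0).
responses = ANSWER_KEY[:15] + [0] * 5
print(accuracy(responses), is_included(responses))
```

With 20 items, the criterion amounts to at least 15 correct responses.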
2.2.2 Results
2.2.2.1 Stress perception: allophone + stressed vowel
It was necessary to exclude 4 participants from the Low Intermediate group and 6 from the High Intermediate group because they did not reach the 75% criterion on the
Stress Detection task. This gave a final total of 15 participants for the Monolingual
English, Low Intermediate and Native Spanish speaker groups. For the High Intermediate group n =14.
Recall the prediction that stress perception for low level learners would not be affected by the allophone in the onset position of the syllable and instead would only be affected by the vowel. Low level learners are predicted to perceive stress in accordance with the prominence of the vowel. To test this, a one way ANOVA was carried out to determine whether there were overall differences among the groups in terms of connecting stressed vowels to one or the other allophone. Group was the independent variable and for the dependent variable, a ratio value was calculated as follows:
\[
\text{Ratio} = \frac{\text{stops + stressed vowel perceived as stressed}}{\text{approximants + stressed vowel perceived as stressed}}
\]
The Native Spanish and High Intermediate groups are predicted to have ratios greater than 1, indicating more syllables with initial stops and stressed vowels were perceived as stressed than syllables with initial approximants and vowel stress. For the
Low Intermediate and Monolingual English groups, the ratios are predicted to be around
1, indicating a lack of preference for either allophone. Figure 1 presents the means for
each of the groups:
Figure 1. Ratio values for stops + stressed vowel perceived as stressed/approximants + stressed vowels perceived as stressed
The results from the one way ANOVA were significant [F(3,56)=30.85, p
<0.001]. The groups with less Spanish experience had ratios that were close to 1 (Low
Intermediate: M=1.3, SD=.25; Monolingual English: M=.95,SD =.15) while the ratios for the two groups with more experience were significantly greater (Native Spanish: M=2.78,
SD=.122; High Intermediate: M =2.1, SD=.7). Tukey’s HSD post hoc tests showed significant differences among the Native Spanish speaker group and the two lower proficiency groups, and between the High Intermediate group and the two lower proficiency groups (all ps<0.05), but not between the High Intermediate and Native
Spanish speaker groups. These results show that the more experienced groups perceived stress significantly more often on stop initial syllables than approximant initial syllables, suggesting listeners with more Spanish experience associate stress with the stop allophone.
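For readers who wish to trace the analysis, the following is a minimal sketch of the ratio measure and of the F statistic for a one way ANOVA, written with the standard library only. The per-participant ratios are invented for illustration and use two small groups rather than the four groups of the experiment; the function names are hypothetical, and the post hoc tests are not reproduced.

```python
from statistics import mean


def stress_ratio(stop_count, approximant_count):
    """Stop + stressed vowel responses over approximant + stressed vowel responses.

    Assumes approximant_count is non-zero.
    """
    return stop_count / approximant_count


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical per-participant ratios (not the experimental data).
ratios = {
    "less_experienced": [0.9, 1.0, 1.1],
    "more_experienced": [1.9, 2.0, 2.1],
}

f_value, df_between, df_within = one_way_anova_f(list(ratios.values()))
print(stress_ratio(6, 3))
print(df_between, df_within, round(f_value, 2))
```

A ratio near 1 indicates no preference for either allophone, while larger values indicate that stop-initial syllables with stressed vowels were chosen more often.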
It is possible, however, that the selection of stressed syllables is also influenced by knowledge of the predominant stress pattern in English and Spanish. To determine whether such a generalization was occurring, I examined whether the selection of stressed syllables aligned with the most frequent stress pattern in Spanish and English.
2.2.2.2 Stress perception: Testing for trochaic bias
Of the 4 829 most frequent polysyllabic words in Spanish, those ending with a vowel followed a trochaic stress pattern 87.5% of the time (Alameda & Cuetos, 1995). In
English, the most common word type is bisyllabic with a trochaic stress pattern. Only about a quarter of the words of English are polysyllabic with a weak initial syllable
(Cutler & Carter, 1987). Given this, it is possible participants are simply relying on predominant stress patterns of English and Spanish, rather than the allophone or the perception of vowel stress, the two independent variables used in the main analysis.
In order to investigate the proportion of stressed-syllable responses that corresponded to initial stress and determine whether trochaic biases were affecting listeners’ stress perception, I examined the proportion of initial syllables perceived as stressed, independent of the allophone. First, the total number of trials for each group was calculated (15 participants * 12 trials = 180 per group, except for the High Intermediate group, for which there were only 168 trials, n = 14).
Subsequently, I calculated the proportion of first syllables perceived as stressed,5 independent of vowel prominence, followed by the proportion of trials perceived as initial stress with stop allophones. While it is possible that all groups potentially demonstrate a trochaic bias for CVCV forms, I predict that only the groups with greater
Spanish experience will show a preference for syllables with stops over approximants in trochaic contexts. The less experienced groups are predicted to be around chance (0.5).
Table 1 presents these results:
Table 1. Stress perception on first syllable across groups and onsets
Segment                       NES       LI        HI        NSS
First syllable   Trials       105/174   98/179    95/164    100/176
                 Proportion   .60       .55       .59       .57
Stop onset       Trials       57/87     55/90     59/82     73/88
                 Proportion   .65       .62       .71       .81
5 The totals do not sum to 180 (or 168 for the High Intermediate, with n = 14) because of the 2.5% miss rate. A total of 18 trials were discarded, distributed as follows: Native Spanish speakers: 4; Native English speakers: 6; Low Intermediate L2 Spanish: 1; High Intermediate L2 Spanish: 4.
All groups perceived stress on the first syllable at proportions above chance, results which are consistent with the lexical statistics of Spanish and English. To permit adequate comparisons among the groups, I calculated ratio values (proportion of stop allophones in σ1 + stressed vowel / proportion of approximant allophones in σ1 + stressed vowel) and carried out a one way ANOVA with groups as the independent variable and the ratio values as the dependent variable. There was no significant effect for stress detection on the first syllable (F[3, 55]= 1.97, p>.05). A second one way ANOVA was conducted with stop onset syllables in initial position as the dependent variable (also a ratio). The results were significant (F[3,55]=22.03, p<0.001). Tukey HSD post hoc tests revealed significant differences between the Native Spanish speaker group and the other three groups
(p<0.001), but no significant differences emerged among the High Intermediate, Low
Intermediate and Monolingual English groups (all ps >0.05).
These results show that all four groups of listeners show a bias towards hearing trochaic stress patterns, but the Native Spanish speakers demonstrate an additional bias towards perceiving stress on syllables that have stop onsets, consistent with the Spanish distributional information. The High Intermediate group did not differ from the other three groups in their preference for stop onsets in initial position of trochaic non words.
This result was somewhat surprising, given that we expected the High Intermediate group to pattern largely along the same lines as the Native Spanish speakers. One explanation may be that the preference for trochaic bias in stress perception overrides any preference for stop onsets, even for groups that have had extensive experience with Spanish.
According to the predictions outlined in the introduction, the groups with greater
Spanish experience should be most likely to perceive stress on stop initial syllables with
stressed vowels, regardless of the position in the word. To determine whether this prediction held, I conducted a goodness of fit chi square test on the proportion of syllables perceived as stressed for each of the four possible onset vowel combinations,
reported for each group. If in fact the groups with more experience prefer stops as onsets to stressed syllables, there should be a difference amongst the four combinations, with the
different allophone types clustering together for the more proficient learners and the
stressed unstressed vowel factor clustering together for the less proficient learners. These results are presented in Figure 2:
Figure 2. Proportion of syllables perceived as stressed
For the Native Spanish speaker group, preference for syllable stress was not
equally distributed, χ2 (3, N=176) = 4.21, p<0.05. For the High Intermediate group, the
same results held, χ2 (3, N=174) = 7.22, p<0.05. For the Low Intermediate and
Monolingual English speaker groups there were no significant differences across the four
contexts (all ps>0.05). The finding that language experience led to significant differences
in the perception of stress across the four contexts shows the pivotal role played by this variable in terms of how the allophone drives stress perception in Spanish.
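The goodness of fit statistic used here can be sketched as follows, under the assumption that the null hypothesis is an equal distribution of ‘stressed’ responses across the four onset-vowel combinations. The counts are hypothetical and the function name is illustrative; the sketch computes only the statistic and its degrees of freedom, not a p value.

```python
def chi_square_gof(observed):
    """Pearson goodness of fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1


# Hypothetical 'perceived as stressed' counts for the four combinations:
# stop + stressed V, stop + unstressed V,
# approximant + stressed V, approximant + unstressed V.
observed = [30, 10, 25, 15]

statistic, df = chi_square_gof(observed)
print(statistic, df)
```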
The results reported in this section show that native language trochaic stress perception preference affects both L1 Spanish and L1 English listeners. However, L1
Spanish listeners perceived initial syllables with stops as stressed at a significantly higher rate than the L1 English groups. Thus, the hypothesis that experience determines this particular aspect of stress perception is partially supported.
In the following section I examine which factor, allophone onset or vowel stress, is most likely to affect the stress perception of each group. The previous results show that all groups are affected by a bias towards trochaic stress, which is attributable to their L1, so we now require an analysis that can reflect the likelihood that a particular group will be affected by vowel stress or allophone onset. To this end, a logistic regression analysis was carried out.
2.2.2.3 Logistic regression analysis
Logistic regression allows us to connect the predictor variables to the probability that they have an effect on the dependent, or outcome, variable. A hierarchical logistic regression was used because it allows the testing of each predictor in a cumulative manner and it is better suited to analyses with a small n (Tabachnick & Fidell, 2007).
The three levels used in this analysis were as follows: Stress Vowel (Level 1) included all syllables with a stressed vowel collapsed across allophone; Allophone type and Stressed Vowel (Level 2) included allophone variant and vowel stress
(approximant+stressed vowel, e.g., βába; stop+stressed vowel, e.g., báβa); Position
(Level 3) included position and allophone (approximants in word medial position and stops in word initial position, collapsed across stress).
Table 2. Results of the hierarchical logistic regression for Experiment 1

Odds Ratio 6 (SE)
Level     Predictor                      NSS            HI              LI              MES
Level 1   Stressed Vowel                 .542 (.196)    .688** (.153)   1.707* (.193)   2.026** (.223)
Level 2   Stressed Vowel                 .733 (.314)    1.116 (.261)    1.430 (.250)    1.285 (.243)
          Stop + Stressed Vowel          1.362 (.214)   .820 (.204)     .935 (.196)     .964 (.239)
          Approximant + Stressed Vowel   .593* (.264)   .613** (.234)   1.207 (.164)    1.930* (.899)
Level 3   Stressed Vowel                 .636 (.373)    1.026 (.271)    1.39 (.332)     .966 (.331)
          Stop + Stressed Vowel          1.369 (.279)   .888 (.215)     1.316 (.269)    .724 (.355)
          Approximant + Stressed Vowel   .849 (.337)    .663 (.284)     1.345 (.253)    1.825* (.306)
          Approximant Medial             .474 (.462)    .866 (.326)     1.145 (.357)    2.323 (.590)
          Stop Onset                     .545 (.544)    .649 (.452)     1.115** (.711)  2.093* (.734)
Significance values are in brackets.
* Wald χ2, df=1, p<0.01; ** Wald χ2, df=1, p<0.05.
6 values greater than 1 indicate likelihoods greater than chance
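As an aid to reading the odds ratios, the sketch below computes an odds ratio from a 2x2 table of hypothetical counts. For a logistic regression with a single binary predictor, the exponentiated coefficient equals this quantity. The counts and function name are invented for illustration and do not reproduce the hierarchical model or any value in Table 2.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: predictor present, outcome yes    b: predictor present, outcome no
    c: predictor absent,  outcome yes    d: predictor absent,  outcome no
    """
    return (a / b) / (c / d)


# Hypothetical counts: syllables with a stressed vowel were chosen as
# stressed on 40 of 60 trials, syllables with an unstressed vowel on 20 of 60.
ratio = odds_ratio(40, 20, 20, 40)
log_odds = math.log(ratio)  # the corresponding logistic regression coefficient

print(ratio, round(log_odds, 3))
```

A value above 1 means the outcome is more likely when the predictor is present; a value below 1 means it is less likely.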
Overall, the results indicate that experienced listeners are more likely to rely on the allophones when detecting a stressed syllable. The results show that at Level 1,
Stressed Vowel reached significance for the Low Intermediate and Monolingual English speaker groups, suggesting that this predictor was particularly good at identifying members of these groups. For the High Intermediate group, significance was also reached, but the odds ratio is less than 1, indicating that members of the High
Intermediate group are less likely to perceive stress on syllables with a stressed vowel than either of the other two learner groups. This predictor did not reach significance for the Native Spanish speaker group. Thus, if participants relied primarily on whether the syllable contained a stressed vowel, they were likely to be members of the Low
Intermediate group and the Monolingual English group and less likely to be in the High
Intermediate group.
The Level 2 predictors (Stressed Vowel, Approximant+Stressed vowel,
Stop+Stressed vowel) demonstrated that if a participant was likely to perceive stress on an approximant-initial syllable, he/she was twice as likely to be a Monolingual English speaker as a member of one of the other speaker groups. This shows that Spanish experience affects the likelihood of perceiving stress on an approximant-initial syllable.
Finally, the results for the Level 3 predictors (Position) showed that the
Monolingual English speaker group was significantly more likely than the other three groups to perceive stress on approximant medial and stop initial syllables. If we consider first the likelihood of stress perception for approximant medial syllables, the more experienced Spanish groups should have very low probabilities of perceiving stress in this context. These results were borne out by the logistic regression analysis. Only the
group with minimal Spanish experience was significantly more likely to perceive stress on approximant, word-medial syllables. If we consider the results for the stop-initial syllables, the fact that the Monolingual English group was more likely to perceive stress on these syllables independent of vowel stress indicates a strong effect for position on the probabilities of stress detection by this group. This appears to contradict the initial hypothesis that stress perception for the more experienced groups would be driven primarily by the allophone. However, a possible explanation may lie in perceptual biases that drive the Monolingual English listeners to prefer stress on initial syllables (i.e., trochaic stress pattern) to a greater extent than the other three groups. Because they are completely unfamiliar with the sounds that are used in the stimuli, when forced to rely purely upon the onset allophone, the Monolingual English group reverts to the bias they demonstrated above. Specifically, these listeners prefer initial stressed syllables, the pattern that predominates in English.
It still remains to be seen whether the perception of stress is being driven by the onset alone or whether the presence of vowel stress plays a key role. To investigate this, I carried out a second experiment in which only the onset was shifted – whether stop or approximant – and the vowel stress was held steady in both syllables of the bisyllabic non word. In other words, the vowel was ‘prominence neutral’.
2.3 Experiment 2: Allophone alternation, vowel steady
In Experiment 2 I examined whether shifting the consonant onset influences the perception of stress when the prominence of the vowel remains constant across syllables.
Participants listened to bisyllabic CVCV sequences in which the vowel was held steady and the syllable onsets alternated between the two allophones (e.g., baβa vs. βaba). Thus, the perception of stress is an illusion in this experiment. That is, stress is not explicitly present in the signal, but rather inferable from the presence of a stop onset, providing that the listener is sensitive to the distributional information connecting stress and stops in the
Spanish input. The presence of a stop onset is predicted to be one of the cues that native
Spanish speakers will use to detect stress in the signal. Other cues may come from the vowel and together, create the percept of stress for the listener.
In addition to the onset shifts, the duration of the onset consonants was also manipulated in order to test the hypothesis that shorter segments are more likely to be perceived as the onset to unstressed syllables than longer segments. Lavoie (2001) found that in addition to manner contrasts, the length of the allophone served to distinguish stop like productions from approximant like productions in a group of native Mexican
Spanish speakers. In the current experiment, native Spanish speakers and listeners with more Spanish experience are expected to use their knowledge of Spanish phonotactics to
‘hear’ stress on the syllable with the stop onset and not hear stress on the syllable with the approximant onset. The groups with less Spanish experience will have no such knowledge, or incomplete knowledge, to draw upon and therefore are predicted to perform at or around chance.
2.3.1 Method
2.3.1.1 Participants
The same participants from Experiment 1 took part in Experiment 2.
2.3.1.2 Stimuli
The stimuli for this experiment consisted of CVCV nonwords, created from the same naturalistic speech samples as Experiment 1. However, instead of shifting the vowel
[a] between stressed and unstressed values, the vowel was held steady and only the consonant onsets were alternated. A stressed vowel token was taken from the CVCV stimuli used in Experiment 1 and the intensity was adjusted to 75dB. The F1 value was
806Hz and F2 was 1628Hz and the duration was 74ms. Because stressed syllables are detectable only in comparison to unstressed syllables, by maintaining both vowels equal in terms of vowel duration, intensity and pitch, a ‘stress neutral’ CVCV item was created.
The duration of the onset consonant was also varied, taking values of 33%, 67% or 100% of the original duration. Finally, the consonants were spliced onto the vowel and counterbalanced for allophone variant. The place of articulation was held constant within each CVCV sequence. This gave 18 possible CVCV combinations. For example, a nonce word with the velar versions of the allophones with the approximant in word-initial position would take the form [γ]A[g]A. Stimuli ranged in length from 171ms to 201ms. The 54 stimuli were presented randomly together with the 36 stimuli from Experiment 1 in one block.
As with the first experiment, two native English speakers and two native Spanish speakers were asked to judge the stimuli for naturalness, on a scale of 1 to 5, where 5 represented ‘extremely natural sounding’ and 1 was ‘extremely artificial sounding’. Only stimuli rated 4.5 or higher were used for the experiment. The same caveats hold for the explanation of ‘naturalness’ in terms of these stimuli as for the first experiment.
2.3.1.3 Procedure
The same procedure as in Experiment 1 was used. Participants were told that for certain trials it ‘might be very difficult to perceive stress with 100% certainty’ and they should try their best to respond accurately.
2.3.2 Results
A mixed analysis of variance was carried out to determine if there was any effect for the three different onset lengths. Group was the between-subjects variable and segment length (33%, 67% or 100%) was the within-subjects variable. The results revealed a non-significant main effect for group (F[3, 42]=1.8, p>0.05) and a non-significant main effect for the within-subjects variable of length (F[6,126]=2.1, p>0.05). This permitted collapsing across consonant lengths for subsequent analyses.
2.3.2.1 Stress perception: allophone + non-prominent vowel
As with Experiment 1, a one-way ANOVA was run to determine whether participants perceived stress in higher proportions on stop syllables or on approximant syllables. Group was the independent variable and the following ratio measure was the dependent variable:
Ratio = (stop-initial syllables perceived as stressed) / (approximant-initial syllables perceived as stressed)
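The ratio measure can be illustrated with a minimal sketch. The function name and the example counts below are hypothetical and are not taken from the experimental data; the sketch only shows how a single listener's responses would be reduced to one ratio value.

```python
def stress_ratio(stop_stressed: int, approx_stressed: int) -> float:
    """Ratio of stop-initial to approximant-initial syllables perceived as stressed."""
    if approx_stressed == 0:
        # The measure is undefined if no approximant-initial syllable was judged stressed.
        raise ValueError("ratio undefined: no approximant-initial 'stressed' responses")
    return stop_stressed / approx_stressed


# Hypothetical listener: 36 'stressed' responses on stop-initial syllables,
# 18 on approximant-initial syllables.
example = stress_ratio(36, 18)
print(example)
```

Under this measure, a value near 1 indicates no preference between the two onsets, while a value above 1 indicates that stress was heard more often on stop-initial syllables.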
There was a significant difference among the means (F[3, 55]=20.18, p<0.001).
Post hoc Tukey HSD tests revealed significant differences between the Native Spanish speaker group and the other three (p<0.01) and the Monolingual English group and the other three groups (p<0.01). There were no significant differences between the High
Intermediate group and the Low Intermediate group.
2.3.2.2 Testing for trochaic bias
As with Experiment 1, responses were examined to see if there was a bias for perceiving stress on the first syllable of the word. Of the 1566 possible test trial responses, 4% were discarded due to timing out, leaving 1503 responses for analysis. Table 3 gives the raw totals and the proportion of first syllables perceived as stressed for each group, followed by the totals and proportion of those stressed first syllables that were stop-initial (see footnote 7).
Table 3. Stress perception on first syllable across groups and onsets

Segment                       NES        LI         HI         NSS
First syllable   Trials       440/746    444/792    397/747    380/785
                 Proportion   .60        .56        .53        .48
Stop onset       Trials       228/440    229/444    215/397    229/380
                 Proportion   .53        .52        .54        .60
7 The totals do not sum to 405 (or 378 for the High Intermediate, with n=14) because of the 4% miss rate. A total of 64 trials were discarded, distributed as follows: Native Spanish speakers: 25; Monolingual English speakers: 12; Low Intermediate L2 Spanish: 18; High Intermediate L2 Spanish: 9.

To permit adequate comparisons among the groups, I calculated ratio values (proportion stop allophone in σ1 / proportion approximant allophone in σ1) and carried out a one-way ANOVA. The results did not reach significance (F[1,55]=1.8, p>0.05), possibly because the two higher proficiency groups clustered together, as did the two lower proficiency groups. The mean ratio for the Native Spanish speakers was 2.4 (SD = 0.6) and for the High Intermediate group it was 1.9 (SD = 0.7), demonstrating that these participants prefer to associate stress with stop syllable onsets. For the Low Intermediate group the mean was 1.1 (SD = 0.37) and for the Monolingual English speakers, the mean was .93 (SD = 0.4), suggesting that participants in these groups did not distinguish between stop and approximant onsets as they related to stress. These results, while not significant, do suggest that the onset allophone interacts with the perception of stress as a function of group membership. Figure 3 presents these ratios:
Figure 3. Stop-initial syllables perceived as stressed/approximant-initial syllables perceived as stressed ratio values
[Bar chart; y-axis: Ratio (0 to 3.5). Values by group: NES = 0.93, LI = 1.1, HI = 1.9, NSS = 2.4.]
The results from this section indicate that with more Spanish experience, the onset allophone – whether stop or approximant – can lead to an illusory stress perception effect. The Native Spanish speaker group heard stress significantly more often on stop-initial syllables than on approximant-initial syllables as compared to the other three groups and the Monolingual English group perceived stress significantly less often than the other three groups when the syllable had a stop in onset position.
2.3.2.3 Logistic regression analysis
As for Experiment 1, a logistic regression was run to determine whether the likelihood of group membership was affected by the perception of stress, this time solely in terms of the allophone onset. The predictors were as follows: Approximant Onset, Stop Onset. There were six syllables with approximant onsets and six with stop onsets in the stimulus set. The predictor variable most likely to affect the likelihood of belonging to the different groups is the perception of stress on a syllable with an approximant. The likelihood of perceiving stress on an approximant syllable should be close to 1 (chance) for both the Low Intermediate and Monolingual English speaker groups. The likelihood of perceiving stress on an approximant syllable should be lower for the two groups with more Spanish experience. Table 4 presents the results from this analysis:
Table 4. Results of the logistic regression for Experiment 2

ODDS RATIOS (SE)      NSS        HI         LI         MES
Approximant Onset     .547**     .687*      1.986*     1.048**
                      (.175)     (.138)     (.242)     (.207)
Stop Onset            1.07       1.096      1.407      .898
                      (.201)     (.192)     (.277)     (.264)

Standard error values are in parentheses. *p<0.01, **Wald χ2, df=1, p<0.005
The approximant allophone predictor reached significance for all groups. The
Low Intermediate participants were more likely to select a syllable with an approximant in the onset as the stressed syllable and the likelihood for the Monolingual English group was around chance. The Native Spanish speaker and High Intermediate participants were less likely to select a syllable as stressed if it had an approximant in the onset.
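One common way to read odds ratios such as those reported for the Approximant Onset predictor is as a percent change in the odds relative to 1 (no effect). The sketch below applies that reading to the values from Table 4. The helper function is a hypothetical illustration of this standard interpretation and is not a step reported in the analysis itself.

```python
# Odds ratios for the Approximant Onset predictor, copied from Table 4.
approximant_odds = {"NSS": 0.547, "HI": 0.687, "LI": 1.986, "MES": 1.048}


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds relative to an odds ratio of 1 (no effect)."""
    return (odds_ratio - 1.0) * 100.0


for group, ratio in approximant_odds.items():
    direction = "lower" if ratio < 1 else "higher"
    change = abs(percent_change_in_odds(ratio))
    print(f"{group}: odds of a 'stressed' response are {change:.1f}% {direction}")
```

Values below 1 (NSS, HI) correspond to reduced odds of selecting an approximant-initial syllable as stressed; values above 1 (LI, MES) correspond to increased odds, with the MES value lying close to 1.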
These results add to those from Experiment 1 and further suggest that when vowel stress is not accessible as a direct cue to word stress, listeners with greater Spanish experience follow the distributional information found in Spanish and disprefer approximants as the onset to stressed syllables. The fact that Stop Onset did not reach significance for any of the groups indicates that having stops in onset position is expected by both L1 groups. In other words, none of the groups was more likely than the others to perceive stop onsets as stressed, which follows from the expectation that a natural bias towards preferring stops is at work. It has been extensively documented in the literature on phonology that stops are ideal onsets, whether due to markedness or phonetically grounded motives (Archangeli & Pulleyblank, 1994; Prince & Smolensky, 1993).
These results further suggest that with increased Spanish experience, L1 speakers of English perceive an illusory stress effect, induced by the onset allophone in bisyllabic nonwords. Learners associate the stop allophone with stress and the approximant allophone with absence of stress, but only after considerable experience with Spanish.
2.4 General Discussion
In these experiments, I investigated whether L2 learners connect each allophone to its expected phonological environment and if so, whether language experience plays a role. The results suggest that learners are able to track the distribution of the allophones and over time, they begin to learn the relationship between the allophones and the contexts in which they surface.
One possible way to explain how L2 learners acquire allophones is through distributional learning. This was demonstrated in Experiment 1. As expected, based on the predominant pattern for main stress in both Spanish and English, all four groups showed a preference for perceiving stress on the first syllable. However, upon closer examination, the bias in favour of stop-initial, stressed syllables only occurred with the Spanish-proficient groups. This suggests that participants in the Native Spanish and High Intermediate groups have acquired knowledge about the distribution of these allophones. In particular, these listeners have connected the phonological environment of stress to a stop onset and lack of stress to an approximant onset. This shows that experience with Spanish can actually shift the perception of stress in non-native speakers in the direction of the distributional information found in their target language.
Distribution based learning mechanisms play a fundamental role in exemplar based models of phonological acquisition. In these types of models, phonological categories are represented as probability distributions over a mental phonetic acoustic/auditory map (Pierrehumbert, 2003a). Categories emerge as multiple exemplars that are phonetically similar accumulate in the same location on the phonetic map. In the present case, native Spanish speakers would have a large number of exemplars at the coordinates for a voiced bilabial approximant [β] and the voiced bilabial stop [b]. These two categories share many articulatory and acoustic characteristics, in addition to being represented by the same orthographic character.
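To make the exemplar idea concrete, the following is a minimal sketch of similarity-based categorization over a two-dimensional phonetic map. Everything in it is an assumption for illustration: the two dimensions (relative intensity and burst strength), the exemplar values, the Gaussian similarity function and the bandwidth are hypothetical and are not drawn from Pierrehumbert (2003a) or from the data reported here.

```python
import math


def similarity(token, exemplar, bandwidth=1.0):
    """Gaussian similarity between two points on a (hypothetical) phonetic map."""
    distance_sq = sum((a - b) ** 2 for a, b in zip(token, exemplar))
    return math.exp(-distance_sq / (2 * bandwidth ** 2))


def category_probabilities(token, exemplars_by_category, bandwidth=1.0):
    """Probability of each category from summed similarity to its stored exemplars."""
    activation = {
        label: sum(similarity(token, e, bandwidth) for e in stored)
        for label, stored in exemplars_by_category.items()
    }
    total = sum(activation.values())
    return {label: value / total for label, value in activation.items()}


# Made-up exemplar clouds: (relative intensity, burst strength), arbitrary units.
exemplars = {
    "stop [b]": [(0.2, 0.9), (0.3, 0.8), (0.25, 1.0)],
    "approximant [β]": [(0.8, 0.1), (0.9, 0.0), (0.85, 0.2)],
}

# A new token close to the stop cloud is assigned mostly to the stop category.
probs = category_probabilities((0.3, 0.9), exemplars, bandwidth=0.3)
print(max(probs, key=probs.get))
```

In a sketch of this kind, a listener with few stored exemplars has sparse clouds and therefore weak, unstable category preferences, which is one way to picture the difference between the more and less experienced groups.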
Results supporting distribution-based learning in native language allophone perception were shown by Maye & Gerken (2001). They demonstrated that the perception of allophonic contrasts can be modified after exposure to an artificial language containing tokens of the allophones with a certain statistical distribution. They tested the perception of the allophonic contrast between voiced [d] and the voiceless unaspirated [t] in English. Adult native speakers of American English were exposed to either a monomodal or a bimodal distribution of tokens lying on a continuum between these two sounds. After exposure, subjects in the bimodal group performed significantly better than those in the monomodal group in a discrimination task, suggesting that the former but not the latter had constructed two separate categories.
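The difference between the two exposure conditions can be sketched as two frequency profiles over the same continuum. The eight-step continuum and the token counts below are hypothetical and are not the values used by Maye & Gerken (2001); the point is only that both conditions can contain the same number of tokens while differing in the number of frequency peaks.

```python
# Hypothetical token counts per step on an 8-step [d]-[t] continuum.
# Both conditions contain the same total number of tokens.
monomodal = [1, 2, 3, 6, 6, 3, 2, 1]
bimodal = [1, 6, 3, 2, 2, 3, 6, 1]


def count_modes(counts):
    """Count local peaks, treating a flat run of equal maxima as one peak."""
    modes = 0
    previous = 0
    rising = False
    for value in counts + [0]:  # trailing 0 closes a peak at the final step
        if value > previous:
            rising = True
        elif value < previous and rising:
            modes += 1
            rising = False
        previous = value
    return modes


print(sum(monomodal), sum(bimodal))
print(count_modes(monomodal), count_modes(bimodal))
```

A learner tracking these frequencies would have evidence for one category in the first profile and for two in the second, which is the logic behind the discrimination advantage for the bimodal group.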
Peperkamp, Pettinato and Dupoux (2002) explored the effect of contextual information on distribution based modifications to native language allophone categories.
They exposed native speakers of French to a bimodal and monomodal distribution of voiced/voiceless uvular trills in their native language. For one group, the trills were presented in context, whereby voicing was the result of assimilation to the following sound. The other group was presented with the same voiced/voiceless trill stimuli, following the same distributions, but without the contextual information. The group without context improved their perception of the contrast while the group exposed to the allophones in context did not. The authors argue that the condition without context led to greater improvement because learners were relying more upon phonetic perception while the other group was relying upon phonological knowledge. According to French phonology, both trills are members of the same phoneme class. In other words, the phonetic differences between the voiced and voiceless versions of the uvular trills were perceivable, but when they were in the correct phonological context for the realization of the allophones, this perceptibility was diminished. These results indicate that distributional information, in this case immediate contextual information regarding voicing assimilation patterns, plays a role in perceived non contrastiveness of allophones.
Peperkamp et al.’s (2002) results add to our finding that learners rely upon a distribution-based mechanism in their perception of allophones. Furthermore, our results show that the likelihood of learners’ using contextual cues is a function of language experience.
Another possibility is that listeners are transferring cue use from their L1. In
English, the alveolar stops /d/ and /t/ are flapped in word medial unstressed syllables. /b/ and /g/ do not undergo similar phonological processes. It is possible that the listeners are using their knowledge of flapping in English when perceiving the approximant allophones in Spanish. The evidence presented in this study indicates that this does not appear to be the case, however, as the low level learners and more particularly the
Monolingual English speakers do not show any preference for stop or approximant onsets as stressed or unstressed, indicating that they have not tracked either of these allophones as being more probabilistically related to stress than the other.
Under an exemplar based model, such effects arise when listeners rely upon information they have stored and probabilistically draw upon when exposed to input. The experienced Spanish listeners have representations that probabilistically associate stress with stop onsets. Their perception is biased towards perceiving stops and stress, yielding phonotactic sequences that are highly probable in Spanish. They are biased against hearing stress on approximant initial syllables for the same reason. The groups with less
Spanish experience have not built up sufficiently dense representations and are thus not biased in one direction or the other.
Language specific phonotactics have been shown to bias the perception of individual segments. Massaro & Cohen (1983) found that synthetic stimuli that are ambiguous between /r/ and /l/ tend to be perceived as /r/ when preceded by /t/ and as /l/ when preceded by /s/. The authors argue that perception is biased towards segments that
yield the legal clusters /tr/ and /sl/, rather than the illegal clusters /tl/ and /sr/. Similarly,
Hallé, Segui, Frauenfelder and Meunier (1998) found that illegal onset clusters in French are perceived as legal ones. In particular, illegal /dl/ and /tl/ are perceived as legal /gl/ and
/kl/, respectively. This could be part of the explanation for the results from Experiment 2, where more Spanish proficient listeners demonstrated an allophone induced ‘stress illusion’.
As language experience increases, listeners are more affected by the contextual cues, or conditioning factors that drive the allophonic alternation. Specifically, knowledge of probabilistic, distribution based information allowed our more advanced learners to recognize the factors that condition the allophonic alternation. In the present experiment, context effects – i.e., the onset allophone – actually shifted the perception of stress in the learners with greater Spanish proficiency. Lower proficiency learners did not demonstrate any such effects. This suggests that adult L2 speech perception shifts over time and becomes sensitive to the phonotactics of their target language. Similar results were obtained by Dupoux, Kakehi, Hirose, Pallier and Mehler (1999) who compared
French and Japanese native speaker perception of sequences that were respectively phonotactically legal and illegal in their native language. They found that the phonotactic properties of Japanese (very reduced set of syllable types) drove L1 Japanese listeners to perceive “illusory” vowels inside consonant clusters in VCCV stimuli. French listeners, for whom these sequences were legal, did not. Our results indicate that stop allophone onsets can induce the perception of stress in L1 Spanish listeners and L1 English listeners with high Spanish proficiency, an ‘illusion’ that is consistent with the distributional information found in Spanish, but not English.
According to PAM (Best, 1995; Best & Tyler, 2007), contextual factors will change how the target language sound is assimilated into native language categories. For example, the Spanish ‘b’ category may be realized as an approximant or as a stop, depending upon the context in which it occurs, and PAM predicts that this will affect the way in which the L2 ‘b’ allophone is assimilated into the L1 ‘b’ category. The two allophones may each assimilate into a different L1 category or possibly not into any L1 category at all. Thus, context can play a key role in PAM in terms of phonological categories. However, there is less clarity in terms of how allophones might be assimilated into the L1 on a phonological level. Specifically, L2 listeners may hear the two target language allophones as separate sound categories, and may even initially want to assimilate them into two completely separate L1 categories (see Boomershine et al., 2007 for an example of this), but their knowledge of target language orthography or classroom instruction will drive them to classify them as variants of the L1 category. Thus, while the basic prediction of PAM in terms of non-native category assimilation and how context may affect this holds, other factors such as orthography and explicit instruction may override this in the case of literate, adult learners. Moreover, PAM’s assumption that non-native categories are assimilated into native categories under certain conditions means that presumably, homophonous lexical items will result. Under exemplar-based approaches, fine phonetic details that distinguish between the phonetic realizations of segments from different languages would prevent such assimilation from occurring.
Flege’s Speech Learning Model (SLM, Flege, 1995), provides an account for how
L2 learners acquire the sound categories of their target language. According to this model, phonetic characteristics of speech sounds are stored and production targets are
taken from these stored memories. L2 speech learning occurs across the lifespan, causing adaptations and changes to the L1 phonological system. For a new category to be learned, the SLM posits that the listener must notice the difference between the new category and the native language categories. All learning involves acquiring positional allophones in the target language – the acquisition of phonological categories is not addressed. Our results lend support to two important hypotheses of the SLM, specifically, that learning is possible and will occur as experience with the target language accumulates and that learners store phonetic details in their representations. However, the claim that phonetic differences drive the formation of new sound categories is less clear in terms of allophone acquisition. In the present case, learners must realize that there are two allophonic variants of the L1 voiced stop category, but that these two variants are in complementary distribution. Thus, we require some sort of mechanism by which differences can be noted, following the SLM, but category unity can still be maintained on an abstract level.
The results of this study suggest that adult second language learners use contextual information in their acquisition of target language allophones: the perception of stress was conditioned by the onset allophone and the position in the word, as a function of language experience. In a broader sense, these results point to the availability of a distribution based mechanism for adult second language acquisition and further suggest that language experience plays a strong role in how exactly this mechanism is used over the time course of second language acquisition.
2.5 Conclusions
The results presented in this chapter provide evidence for a phonological system capable of tracking distributional information in the speech stream. Furthermore, this distributional knowledge is gradually accumulated, as shown by the different effects for the contextual factors across distinct levels of experience with Spanish: listeners with greater Spanish experience demonstrated an illusory effect for stress, induced by the presence of a stop allophone in syllable onset position. These results speak directly to how contextual factors drive listener expectations regarding the allophone alternation and suggest that learner representations encode statistical information such as co-occurrence likelihoods.
In the following chapter, I present stop-approximant production data from a similar group of L1 English/L2 Spanish learners, data that speak to the nature of the phonological system itself.
Chapter Three: Evidence for a non categorical phonological system: Adult L2 allophone production
3.1 Introduction
In this chapter, I explore another aspect of the phonological system, namely, whether it operates in a categorical or gradient manner. Stop approximant production data was collected from L1 English/L2 Spanish learners of different proficiency levels in an effort to determine whether their productions reflect the implementation of a categorical rule or instead reflect more fine grained knowledge of how the allophones are produced.
As stated, in order to acquire an allophonic alternation, learners must connect phonetic cues to the context in which each alternant occurs. In the present case, this involves integrating the phonetic cues that serve to distinguish the more stop like segments from the more approximant like segments and linking stops with word onset, stressed syllable onset position and approximants with word medial, unstressed syllable onset position. The cues considered here are the presence of a release burst and consonant intensity: stop like allophones will have release bursts and low intensity while approximant like allophones will have no release burst and higher consonant intensity. In order to form this connection, learners must be sensitive to the contextual factors that condition the alternation.
There are (at least) two ways learners might carry this out. The first involves a rule-based phonological system that leads to categorical acquisition patterns (see Chomsky & Halle, 1968). The second involves a more gradual input-based system that leads to gradient, non-categorical acquisition patterns (Pierrehumbert, 2003a; N. Ellis, 2008). The representations created by each type of phonological system – whether categorical or gradient – will necessarily be different, as will the learning outcomes. If L2 learners are acquiring a rule, then they should treat all three members involved in this allophonic alternation in the same way. If not, asymmetries may emerge, reflecting place of articulation differences.
Evidence for asymmetries in L2 production comes from a number of different studies. For example, Zampini (1994) examined the acquisition of the Spanish stop approximant alternation by L1 English speakers and found different rates of lenition across the three places of articulation ([d] and [g] were lenited more often than [b]). Other studies have demonstrated that L2 learners often show variation in their substitution patterns as well, related to where the L2 sound occurs in the word (see Brown, 1998, for a feature based approach to L2 positional effects). For example, learners whose first language bans voiced stops often master the voicing in onsets first while codas remain voiceless (Steele, 2005, for L1 German/L2 English). Such position sensitive asymmetries have been attributed to markedness (Eckman, 2007; Broselow, Chen & Wang, 1998 for
Mandarin L1/English L2), whereby more marked segments emerge in positions that are more common cross linguistically, and phonetic principles, whereby segments are acquired more readily in positions which favour the phonetic conditions for their realization (Colantoni and Steele, 2007 for L1 English/L2 French).
In the Generative Phonology tradition phonological knowledge is posited as a series of rules that operate across minimal, abstract representations of lexical items (see,
e.g., Sound Pattern of English, Chomsky & Halle 1968) or constraints (e.g., Optimality
Theory, Prince & Smolensky, 1993) that operate over possible outputs. Allophones are the product of rule application or constraint interaction. Because they are entirely predictable, allophones are not stored. Only contrastive sounds (i.e., phonemes) are stored in representations and the lexicon is considered fully separated from the rules and constraints that form the grammatical output (i.e. allophones). Learning is assumed to be categorical and systematic – rules are applied across natural classes in a non gradient fashion. More recently, research has shown that language users are sensitive to non categorical aspects of the signal. For example, frequency and fine phonetic details have been shown to affect lexical recognition and production patterns cross linguistically
(Frisch, Pierrehumbert & Broe, 2004; Dahan, Drucker & Scarborough, 2009). In the present context, evidence for a categorical phonological system would be across the board productions of the alternation, with no differences for place of articulation. If learners demonstrate either differences across place of articulation or for the contextual factors, a more gradient conceptualization of phonology is required, one that is capable of accounting for non categorical patterns.
Presumably, if learner experience plays a role in connecting the alternant’s cues to its context, proficiency will be a determining factor in adult sensitivity to phonological environment. Proficiency effects could be realized in (at least) two ways. First, it is possible that learners of different proficiency levels show distinct cue integration patterns and do not produce the allophones in the correct manner for the phonological environment – the presence of release bursts should co occur with stops in word onset and stressed syllable contexts. The highest intensity realizations should co occur with the
lowest proportion of release bursts in word-medial and unstressed syllable contexts and result in approximants. If the cues do not co-occur with the correct phonological environment, this would indicate a lack of phonetic cue and phonological environment integration.
Another possible effect for proficiency could be at the level of the phonological environment cues themselves: the contextual factors of stress and position could have distinct effects on learner production. For example, learners could be more sensitive to position than to stress in their realization of the phonetic cues. This could lead to an asymmetrical, or non integrated, production of the phonetic cues to the alternation (e.g., they produce higher burst ratios in initial position but intensity ratios are the same across both positions).
Allophone acquisition provides an excellent testing ground for comparing models of categorical and gradient phonological systems because, arguably, learners could be using either a categorical or gradient phonological system to carry out the learning task.
In the present context, evidence for a categorical phonological system would be the finding that no differences across place of articulation emerge. This would support a model that allows for phonological encoding of alternations (Chomsky & Halle, 1968).
If, on the other hand, learners are using a more gradient system, such differences are predicted to emerge. This would support a model that allows for richer representations that store phonetic details such as place of articulation.
3.2 Experiment: Stop approximant production data
3.2.1 Method
3.2.1.1 Participants
Three groups of participants took part in this study. One group is classed as Low Intermediate, one as High Intermediate, and the final one is a Native Speaker group. For the Low Intermediate group, 5 L1 English/L2 Spanish learners were recruited from third semester Spanish university level classes. The High Intermediate participants were recruited from fifth semester Spanish classes. They were paid $10.00 for their participation. Participants filled out an autobiographical questionnaire regarding their experience with Spanish. None had spent more than six weeks in a Spanish-speaking country and none spoke Spanish outside of the classroom context. Two members of the High Intermediate group also spoke French. In order to confirm their placement in either the Low or High Intermediate groups, participants were asked to self-rate their Spanish ability on a scale of 1 to 9, where 1 represented ‘my ability on my first day of Spanish class’ and 9 represented ‘my Spanish teacher’. Subsequently, participants were recorded taking part in a 5 minute conversation in Spanish with a speaker who has a near native level of fluency in Spanish.
The Low Intermediate group had taken two university level Spanish courses, with a total class time of approximately 67 hours, over eight months of the same academic year. Two had taken one year of Spanish in high school, three years prior to the data collection. The High Intermediate group had taken four university level Spanish courses, with a total class time of approximately 135 hours, over two academic years. All had taken Spanish in high school, for two years. In terms of the input they received in their Spanish class, their instructors were from Mexico (Mexico City and Guadalajara) and Spain (both from Madrid). These two varieties of Spanish are relatively conservative in their realizations of the stop-approximant alternations and follow the phonological characterization detailed in the introduction. Specifically, stops follow nasals and for [d], the lateral. Otherwise, approximants are expected intervocalically.
Two native Mexican Spanish speakers who were unaware of the study’s goal listened to the conversation and classified the speakers into two groups, based upon their accent, grammar and speech rate, on a scale of 1 (low) to 5 (high). The ratings coincided with the initial recruitment levels in all but two cases, where one participant was moved to the High Intermediate group and another was moved to the Low Intermediate group. Table 5 presents the results from these two classification tasks:
Table 5. L2 participant biographical data

                      Age at Testing       L1 English participants’    Native Spanish Speaker
                                           self-rating (/9)            judges (/5)
Group                 average   range      average   range             average   range
Low Intermediate      26.6      19–53      2.2       1–3               2.2       1–3
High Intermediate     22.4      21–24      5.2       4–7               4.1       3.3–4.5
The native Spanish speakers group was composed of five female Mexican
Spanish speakers, from the central region of Mexico (Mexico City [2], Jalisco [1], Puebla
[2]). They all lived in an English speaking environment at the time of data collection and all spoke Spanish and English. Four of the 5 participants reported speaking Spanish at home and at least 50% of the time outside of the home.
3.2.1.2 Stimuli
In selecting words for inclusion, the following factors were crossed: consonant (b, d or g), following vowel context (i, a, u), position (initial or medial), and stress (stressed or unstressed), yielding 36 words (see Appendix A). The word list included real and nonce lexical items. Where the segments of interest were in initial position (50%, 18/36), fourteen of the lexical items were bisyllabic. Where the segments of interest were in medial position, lexical items were either three or four syllables in length. The segment of interest never occurred in syllable final position. Recordings were carried out in a sound proof booth and made directly onto a PowerMac computer (GI A417” Soundcard) using a Sennheiser microphone. The microphone was placed into a stand and maintained at a 45° angle at all times, approximately 3.5 cm from the speaker’s lips. The speech tokens were sampled at a rate of 44.1 kHz with a quantization of 16 bits and saved directly onto the computer’s hard drive.
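The factorial crossing described above can be checked with a short sketch. The label strings below are illustrative placeholders, not the actual items in Appendix A; only the factor levels are taken from the text.

```python
from itertools import product

# Factor levels as stated in the text; the label strings are illustrative.
consonants = ["b", "d", "g"]
vowels = ["i", "a", "u"]
positions = ["initial", "medial"]
stress_levels = ["stressed", "unstressed"]

# One design cell per combination of the four crossed factors.
cells = list(product(consonants, vowels, positions, stress_levels))

# Cells whose target segment is word initial.
initial_cells = [cell for cell in cells if cell[2] == "initial"]

print(len(cells))          # 3 x 3 x 2 x 2 = 36
print(len(initial_cells))  # half of the design, 18
```

The counts agree with the 36 words and the 18/36 initial-position items reported for the word list.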
3.2.1.3 Procedure
Participants were asked to read three lists of the same words, with semi-counterbalanced order, at a moderate pace, using the carrier phrase Diga ___, por favor or “Say ____, please”.8 Each participant read the same three lists and the third reading was used for analysis in order to counteract possible novelty effects for the lexical items. Novelty effects occur when words are new to the speaker and may result in a slower, more deliberate reading of the lexical item. Even words that exist in Spanish may exhibit novelty effects for the low level learners.
All communication with the researcher was conducted in Spanish to avoid possible effects for language mode on the learner groups. However, the self rating questionnaire was in English, to avoid confusion for the lower level participants.
3.2.1.4 Phonetic Analyses
Once recordings were made, all target words were labelled using Praat 5.0 (Boersma & Weenink, 2008). Labels were inserted at the following points for each token: consonant onset and offset, CV onset and offset, burst onset and offset (where present) and vowel onset and offset. Both the waveform and the spectrogram were consulted during labeling.
8 The final vowel in diga may have influenced the production of the following stop initial word. However, if true, this influence is expected to be in the direction of more approximant like segments, running counter to the hypothesis that speakers would produce stops in post pause position.
The offset of the previous vowel’s F2 served as the onset of the following consonant and the onset of the following vowel’s F2 served as the offset of the previous consonant (Lavoie, 2001). Where there was doubt, intensity and other formants were also taken into account.
Bursts were identified after a visual inspection of the waveform and spectrogram and also labeled for their onsets and offsets. A 15 ms Hamming window was used for analyses. Window size for burst measurements was based upon the duration of the burst itself and thus varied from token to token. The labelling procedure served the purpose of allowing scripts to be run on the sound files, guaranteeing accurate recording of the data and also allowing verification of labelling decisions where required.9 There was a total of 36 tokens per speaker. Figure 4 provides an example:
Figure 4. Spectrogram of gato ‘cat’ (segments g, a, t, o; x-axis: Time (s)), labelled for C onset, burst onset and offset, C offset/vowel onset, and vowel offset.
9 I thank Titia Benders for writing the Praat scripts.
Recordings were analyzed for consonant intensity and the presence of release bursts. One of the main acoustic features associated with stop production is a noise burst at the moment of release (Kent & Read, 2002). The burst is a very brief acoustic event (10–30 ms in duration) and is the manifestation of the initial release of the air pressure behind the constriction for the stop (Kent & Read, 2002).10 According to Stevens and Keyser (1989:90), bursts can be interpreted as an enhancing feature of a stop. Phonologically, bursts are said to be licensed in onsets – they are often missing in syllable codas or in word final position. Thus, the presence of a release burst indicates a stronger manifestation of the stop and its absence, a weaker segment. Given that there is no closure for approximants, there is no release burst. The implementation of the release burst cue was determined by examining the spectrogram and calculating the ratio of the number of bursts present to the number of possible contexts. There were nine possible contexts for burst production for each condition (stressed/unstressed x initial/medial).
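As a minimal sketch of the burst measure described above, the ratio can be computed as the proportion of tokens in a context that show a release burst. The function name and the nine example judgements are hypothetical and do not reproduce any participant's data.

```python
def burst_ratio(burst_present):
    """Return bursts present / possible contexts for one speaker and context.

    burst_present: list of booleans, one per token, True where a release
    burst was identified on the spectrogram.
    """
    if not burst_present:
        raise ValueError("at least one token is required")
    return sum(1 for present in burst_present if present) / len(burst_present)


# Hypothetical judgements for nine tokens in one context.
example_tokens = [True, True, False, True, False, False, True, True, True]
example_ratio = burst_ratio(example_tokens)  # six bursts out of nine tokens
```

A value near 1 would correspond to consistently stop like productions and a value near 0 to consistently approximant like productions.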
10 Burst intensity values are an acoustic cue to place of articulation (Raphael, Harris & Borden, 2007:150). For the labial stops /p/ and /b/, the bursts have low frequencies while for the alveolar stops, these frequencies are high. Velar stops exhibit more variability in their burst frequency, linked closely to the F2 frequency of the following vowel.
The other phonetic cue is consonant intensity. In traditional phonological approaches (e.g., Mascaró, 1984) the stop approximant alternation in Spanish has been characterized as a process of sonority-increasing lenition: the less sonorous stop becomes more sonorous through a process of lenition, when it surfaces between two vowels. According to Parker (2002), intensity is the most reliable correlate of phonological sonority, a fact which is also noted by Ladefoged (1975:219): ‘The sonority of a sound is its loudness relative to that of other sounds with the same length, stress, and pitch’, which is based on intensity or the perceived loudness of a sound. Thus, intensity is connected directly, albeit non-linearly, to the loudness of a sound (Raphael et al., 2007). Because intensity can vary across speakers and also across words with different phonemic composition,11 we used a ratio measurement of consonant intensity/CV intensity.12 The ratio was calculated as follows:
$$\text{Ratio} = \frac{\text{target consonant intensity (C)}}{\text{target consonant + following vowel intensity (CV)}}$$
Where the target segment has lower intensity, the ratios will be close to zero, indicating the presence of a more stop like segment. Where the target segment has higher intensity, the ratios will be closer to 1, indicating the presence of a more approximant like segment.
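The ratio can be illustrated with a brief sketch. The function name and the dB values are hypothetical; the calculation simply divides the consonant intensity by the intensity of the consonant plus following vowel, as described above, and is not the Praat script used in the study.

```python
def intensity_ratio(c_intensity_db, cv_intensity_db):
    """Return target consonant intensity (C) divided by CV intensity.

    Both arguments are intensities in dB; the plain ratio of the two
    dB values follows the description in the text.
    """
    if cv_intensity_db <= 0:
        raise ValueError("CV intensity must be positive")
    return c_intensity_db / cv_intensity_db


# Hypothetical measurements for a single token.
c_db = 62.0   # assumed intensity over the consonant interval
cv_db = 70.0  # assumed intensity over the consonant + vowel interval
example = intensity_ratio(c_db, cv_db)
print(round(example, 3))
```

Because the ratio is taken within a token, it is less sensitive to overall loudness differences between speakers than a raw intensity value would be.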
3.2.2 Results
To determine if there were any significant differences between the real and nonce words, a mixed ANOVA was conducted with group as the between subject factor and word type (real vs. nonce) as the within subjects factor. The dependent variable was the average consonant intensity.13 The main effect for word type was not significant overall (F[1,12] = .031, p>0.05, partial eta squared = 0.003). This permitted collapsing across word types for subsequent analyses.
11 Intensity also varies across phonemes. However, given that the segments of interest form a natural class we assume that inherent intensity will not vary greatly across the three segments.
12 Intensity is measured in dB, which are on a logarithmic scale. In order to calculate ratios using logarithmic values, normally one value is subtracted from the other. Given that the objective of this study is to compare productions of cues across proficiency levels, we deemed a pure ratio value sufficient.
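The effect size reported for word type can be related to the F statistic through the standard identity for partial eta squared, F x df_effect / (F x df_effect + df_error). The sketch below applies that identity to the reported values; it is offered as an arithmetic check under that assumption, not as the procedure the statistics package used.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported for the main effect of word type: F[1,12] = .031.
word_type_effect = partial_eta_squared(0.031, 1, 12)
print(round(word_type_effect, 3))  # 0.003
```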
In order to guarantee that each participant only contributed one score to each variable and thus ensure independent error effects (Max & Onghena, 1999), an average for each cue in each context was calculated. For example, to calculate the C CV intensity and burst production values for the phonological environment of stressed syllables, all occurrences of the segments for each phonetic cue in stressed syllables were counted, regardless of their position in the word. To calculate the C CV intensity and burst production for word medial position, all occurrences of the segments in word medial position were counted, regardless of whether the syllable was stressed or not. Again, the creation of these variables ensured that each subject contributed only one score per context.
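The averaging step can be sketched as follows. The field names and the four token values are hypothetical; the point is only that each token feeds both its stress context and its position context, and that every participant ends up with a single score per context.

```python
from collections import defaultdict
from statistics import mean


def context_means(tokens):
    """Average a cue per participant within each stress and position context."""
    grouped = defaultdict(list)
    for tok in tokens:
        # Each token counts once toward its stress context and once toward
        # its position context, regardless of the other factor.
        for context in (tok["stress"], tok["position"]):
            grouped[(tok["participant"], context)].append(tok["ratio"])
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical C CV intensity ratios for one participant.
tokens = [
    {"participant": "P1", "stress": "stressed", "position": "initial", "ratio": 0.80},
    {"participant": "P1", "stress": "stressed", "position": "medial", "ratio": 0.90},
    {"participant": "P1", "stress": "unstressed", "position": "initial", "ratio": 0.84},
    {"participant": "P1", "stress": "unstressed", "position": "medial", "ratio": 0.94},
]
scores = context_means(tokens)
```

With these values the participant contributes exactly four scores: stressed, unstressed, initial and medial.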
The first objective is to determine whether learning is systematic and categorical or if gradient effects are observed. The second objective is to determine whether proficiency plays a role in sensitivity to phonological environment factors. Thus, I examined which cues (if any) best separate the three groups and how to interpret these dimensions of difference in terms of the phonetic and phonological environment cues.
13 We selected the consonant intensity variable because the burst ratio values were generally either very low (for the word medial positions, where there were few release bursts produced) or very high (i.e., for word initial position, where there were a high number of release bursts). Thus, an average score for these groups would not have been indicative of their variability.
Because we know the native Spanish speakers produce the alternation, their data can serve as a baseline against which to compare the learner groups.
Discriminant Analysis (DA) is the data analysis method that best serves this purpose. DA allows researchers to determine along which dimensions groups differ reliably and how those dimensions can be interpreted (Tabachnick & Fidell, 2007). The DA was run using the two cues in each of the four phonological environments (stressed/unstressed, initial/medial). This gave a total of eight potentially significant predictors. The grouping variables were formed by the three proficiency levels: Low Intermediate, High Intermediate and Native Spanish speakers. In view of the fact that the groups have an n of five, each run of the DA could only use four predictors (Tabachnick & Fidell, 2007). As a consequence, multiple DAs were run to determine which predictor variables were most important in separating our three groups.
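The size of the predictor set can be laid out in a short sketch. The predictor labels are illustrative, and the limit of four predictors per run is treated here as group size minus one, which is an assumption about how the restriction was derived. The text does not state which subsets were actually run, so the count below is only the number of four-predictor subsets that exist in principle.

```python
from itertools import combinations, product
from math import comb

cues = ["C CV intensity ratio", "burst ratio"]
environments = ["stressed", "unstressed", "initial", "medial"]

# Two cues in each of four environments give eight candidate predictors.
predictors = [f"{env} {cue}" for cue, env in product(cues, environments)]

group_size = 5
max_per_run = group_size - 1  # assumption: at most n - 1 predictors per run

# Every four-predictor subset that could in principle enter one DA run.
candidate_runs = list(combinations(predictors, max_per_run))

print(len(predictors))      # 8
print(len(candidate_runs))  # 70
```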
The relative importance of each predictor variable was determined by their structure correlations, or discriminant loadings, which represent the correlation between the predictors and the discriminant functions (Huberty & Olejnik, 2006). The four predictors with the highest structure r’s (all greater than 0.5, p<0.05) were kept. Using these criteria, the following four predictor variables were included in the DA: i) unstressed syllable C CV intensity ratios; ii) medial syllables C CV intensity ratios; iii) unstressed syllable burst ratios; iv) medial syllable burst ratios. These four predictor variables that emerged from the DAs are all related to medial position and unstressed syllables. Table 6 presents the descriptive statistics for the data. There were no missing data nor were there any outliers. The correlations are in the small to moderate range and the equality of variance assumption is not violated (Box test, F[20, 516.9] = 48.3, p = .277).
Table 6. Means and standard deviations on the dependent variables for the three groups
Group means (SD)
Predictor                Low Intermediate    High Intermediate    Native Spanish Speakers
(1) Unstressed C CV      .853 (.458)         .925 (.071)          .967 (.04)
(2) Medial C CV          .849 (.421)         .919 (.092)          .951 (.044)
(3) Medial Burst         .906 (.671)         .763 (.049)          .235 (.083)
(4) Unstressed Burst     .889 (.781)         .711 (.149)          .367 (.165)
To better determine how the four predictor variables separated the three groups, I examined the two linear discriminant functions (LDFs) which emerged as significant.
Table 7 presents these results:
Table 7. Results of Discriminant Analysis for phonetic and phonological environment cues
                         r’s for Predictor variables     Within Groups Correlations Among Predictors
                         (Discriminant Functions)        (Matrix)
Predictor                Function 1    Function 2        Medial C CV    Medial Burst    Unstressed Burst
Unstressed C CV          .592          n.s.              .47            .48             .14
Medial C CV              .408          .358                             .54             .51
Medial Burst             n.s.          .804                                             .16
Unstressed Burst (4)     n.s.          n.s.
Function 1 (Wilks’ Λ = 0.001, p<0.001) accounts for 93.3% of the variance found in the data while function 2 (Wilks’ Λ = 0.15, p<0.001) accounts for 6.7% of the variance. Function 1 is best defined by C CV intensity, related to both stress and word position: the intensity of the allophone segments in unstressed (.592) and medial syllable (.408) onsets serves to maximally separate the three groups, with intensity values rising relative to amount of Spanish experience. All three groups are separated maximally by this function. This is consistent with the hypothesis that experience with Spanish will lead to a differentiation in phonetic cue use between word initial, stressed syllable context and word medial, unstressed syllable context. Function 2, on the other hand, loads primarily on the positional predictors, that is, burst ratios in medial position (.804) and C CV intensity in medial position (.358). Function 2 separates the Native Spanish speakers and the Low Intermediate speakers from the High Intermediate speakers. This can be seen in the two dimensional plot of group centroids provided in Figure 5:
Figure 5. Plot of group centroids
The discriminant analysis presented in this section provides a general picture of the two constructs separating the three groups. The first function in the DA revealed that consonant intensity ratios in unstressed and medial syllables contributed most to group separation. For the second function, position contributed most to group separation. Thus, the three groups are best separated by consonant intensity ratios in the first instance and position in the second. These results suggest that the two learner groups implement the phonetic cues to the stop approximant alternation, and are influenced by the phonological environment, differently from the Native Spanish speaker group and differently from each other. What the DA did not reveal, however, were more precise details regarding intergroup differences for each predictor. To investigate this, a one-way multivariate analysis of variance (one-way MANOVA) was conducted. The four predictors used in the discriminant analysis (C CV medial, C CV unstressed, burst medial position, burst unstressed) served as the dependent variables. Group was the independent variable and all tests were conducted at p<.05.
The results for the multivariate test show that overall, the means for the dependent variables are significantly different across the three groups (Wilks’ Lambda = 0.002, F(4,8) = 103.25, p<0.001). The multivariate η² = .88, indicating that 88% of the variance of the dependent variables is associated with the group factor. Means and standard deviations are presented in Table 6 above. Figure 6 presents the means for the two dependent variables related to C CV intensity ratios and the two variables related to burst ratios:
Figure 6. MANOVA dependent variables. Two panels plot ratio values (y-axis, 0.20 to 1.00) for the LI, HI and NSS groups: C CV ratio (medial and unstressed) and burst ratio (medial and unstressed).
The univariate results on the four dependent variables were all significant across the three groups: C CV medial position [F(2,12) = 332.65, p<0.005, η² = .78]; C CV unstressed [F(2,12) = 690.92, p<0.01, η² = .69]; burst unstressed [F(2,12) = 20.153, p<0.05, η² = .77]; burst medial [F(2,12) = 135.15, p<0.05, η² = .61]. To determine if there were any significant differences between the groups, we conducted post hoc analyses following the univariate ANOVAs for the four dependent variables. Tukey’s pairwise comparisons revealed that the Native Spanish speaker group had significantly different mean scores on all four dependent variables in comparison with the other two groups (all ps<0.05). The Low Intermediate and High Intermediate pairwise comparisons were significant for all dependent variables except for burst unstressed (p=.126).
Conjointly, the results from the DA and MANOVA demonstrate that proficiency affects sensitivity to the contextual factors of stress and position. The three groups produce significantly different cue values overall and across the four variables that serve to best distinguish between them in the DA. They further suggest that learners demonstrate non systematic learning effects, given that the two conditioning factors affected the learners of different levels in different ways.
The question remains, however, whether the non systematic effects occur across different places of articulation. If speakers are applying a systematic rule to the production of the stop approximant alternation, such a rule would target a natural class,
in phonological terms. Therefore, if learners are applying an abstract rule, there should be few, if any, significant differences across places of articulation. On the other hand, if speakers are drawing upon stored phonetic details when executing the articulatory plan
for a specific sound, we expect differences across the three places of articulation.
In order to examine this, a two factor mixed ANOVA was conducted on the C
CV intensity ratios, with context (stressed, unstressed, initial, medial) as the within
subjects variable and place of articulation as the between subjects variable. Each group
was analyzed separately, since the goal was to see whether differences exist across places of articulation within groups, rather than across groups. There were 60 tokens for each
run of the ANOVA (three places of articulation, four contexts, five cases). Figures 7a, b and c show the differences in means among the consonants in each of the four contexts,
for the three groups:
Figure 7. Consonant x context for each group
a. Low Intermediate speakers
b. High Intermediate speakers
c. Native Spanish speakers
The results for the Low Intermediate group demonstrate a significant main effect for context [F(3,36)=17.645, p<0.001] but not for consonant [F(2,12)=.107, p>0.05]. Pairwise comparisons (Tukey’s, p<0.05) revealed that this was due to significant differences between initial (M=.83) and medial contexts (M=.87). These results suggest that the Low Intermediate group productions are affected by position but not place of articulation, indicating that a systematic acquisition pattern has emerged for this group.
For the High Intermediate group, there were main effects for context
[F(3,36)=58.96, p<0.001] and consonant [F(2,12)=13.8, p<0.001]. A significant interaction between context and consonant also occurred [F(6,36)=2.83, p<0.05].
Subsequent post hoc tests revealed significant differences between b and d/g (p<0.001).
Thus, it appears that the High Intermediate group productions demonstrate a more gradient pattern than those of the Low Intermediate group.
Finally, the Native Spanish speaker group productions showed a main effect for context [F(3,36)=68.5, p<0.001] and consonant [F(2,12)=5.03, p<0.05]. There was a significant interaction between the two factors as well [F(6,36)=3.71, p<0.01]. Post hoc tests revealed significant differences between b and g (p<0.05).
The results from the two factor mixed ANOVA indicate that gradiency in productions across context and consonant emerges with more Spanish experience. The
Low Intermediate group may be applying a rule along the lines of ‘b, d and g become softer’ (i.e., more lenited/more vowel like) when in the middle of the word. Because there were no significant effects across the places of articulation, for this group we can assume that this is due to the systematic effect of categorical and/or explicit learning at this early stage. As speakers gain experience, their productions become less categorical and more gradient. At the beginning stages, learners may be applying a rule to the natural class of voiced stops and only with more experience do they begin to differentiate across the places of articulation. One way to explain this is that learner representations actually shift over the course of acquisition. Another possibility is that representations remain consistent but the way in which learners access the information they contain is subject to developmentally dependent modulation.
3.3 General Discussion
The results from this study show that the answer to the research question of whether proficiency affects sensitivity to contextual factors in adult L2 allophone acquisition is affirmative. The results from the Discriminant Analysis revealed two significant functions separating the three groups. Function 1 loaded primarily on the consonant intensity phonetic cue, in medial and unstressed position. Function 2 loaded primarily on the medial position phonetic environment cue, for both release burst and consonant intensity. Both significant functions that maximally separate the groups are associated with cues that differentiate the phonological environment of approximants from that of stops. They show that significant differences exist across the three groups for the implementation of the contextual factors of stress and position.
The other research question addressed in this chapter relates to the nature of the phonological system and learner knowledge. Specifically, I hypothesized that learning an allophonic alternation could involve either categorical or gradient knowledge. The evidence provided here supports more gradient knowledge, albeit with certain caveats.
The results from the two way ANOVA for place of articulation and context indicate that detailed phonetic knowledge is stored, as shown by the differences across the places of articulation in the more experienced groups’ productions and the interaction between place of articulation and context. Crucially, this detailed phonetic knowledge does not emerge in learners of lower proficiency. These results support the hypothesis that experience with a language is required in order for such subtle effects to emerge in learner productions. Learners with less experience did not produce the fine grained
differences across place of articulation and context that were observed in the productions of the High Intermediate learners.
I propose that the nature of L2 classroom learning may play a role in accounting for these results. It is quite common in the Spanish second language classroom for instructors to mention that the b, d and g become ‘softer’ when they occur between vowels. In fact, the textbook used by the Low Intermediate learners mentions this rule in an explicit manner, which may explain why their productions were most influenced by position. As for the more proficient learners, they may still have the categorical pattern but it has been enhanced and rendered more gradient by increased amounts of experience.
There has been a great deal of research in adult L2 acquisition on the role of explicit vs. implicit learning, most of which has concentrated on the acquisition of morphosyntax. In general, this research suggests that teaching explicit rules to adult learners can lead to faster integration of these rules in production and comprehension.
However, the rule must fulfill certain characteristics – for example, it must be relatively simple and transparent in its application – in order for learners to benefit (for a general discussion of this, see N. Ellis, 2008, inter alia). Given the results observed here, this may hold for phonological acquisition as well. Explicit instruction may lead to categorical effects but implicit learning may be required in order for finer grained phonetic differences to emerge, such as place of articulation effects. These distinctions may only emerge once the speaker has had experience with the language and can draw upon sufficiently robust representations (Pierrehumbert, 2003a; N. Ellis, 2008). Again, this has parallels in the area of L2 morphosyntax acquisition.
According to models of L2 speech acquisition, a key step in any sort of perceptual learning is the realization that differences exist between the L1 and L2 categories, required in order to initiate the acquisition process (Flege, 1995; Best & Tyler, 2007).
Again, this can occur implicitly or explicitly. Implicit mechanisms such as cue salience may operate to draw learners’ attention to the new category that must be created. However, as has been well documented, when the target sound is fully assimilated into a native language category, the listener will not necessarily realize two distinct sounds exist.14
Alternatively, learners could be using an explicit mechanism that draws their attention to the different target language sounds, for example by being told that two sounds are different because they contrast lexical items in the target language (Guion & Pedersen, 2007) or by means of orthography.15 Semantic and orthographic contrasts have been shown to assist L2 learners with lexically based categorization (Cutler, Weber & Otake, 2006; Escudero, Hayes-Harb & Mitterer, 2008). In Spanish and English, the allophones are represented by the same orthographic symbols,16 which may impede the formation of separate phonetic categories for each allophone. L1 English speakers are exposed to the orthographic symbols b, d, g and associate them with their phonetic/phonological equivalents in English, which are voiced stops. In order to overcome this automatic response, adult learners may initially employ a rule. Indeed, the results seem to suggest that learners shift from an abstract, categorical learning ‘rule’ at the early stages of acquisition, which may be the result of explicit classroom instruction, to an implicit mechanism that can track detailed phonetic information across places of articulation. The difficulty with this explanation, however, is the incompatibility of the assumptions regarding the phonological system. We would require two different mechanisms to account for the differences between the two groups and an explanation for how and why they would shift between them.
14 Another implicit mechanism that could be affecting the acquisition of the approximant allophone is phonetic naturalness (Stampe, 1979). As discussed in terms of the Aerodynamic Voicing Constraint (AVC, Chomsky & Halle, 1968), the approximant is more phonetically natural in intervocalic context.
15 See recent work by Bassetti (2006; 2008) for additional evidence that orthography plays a role in L2 acquisition.
16 In Spanish, the orthographic symbols b and v are realized in the same manner phonetically and it is claimed that phonologically they also share a representation. None of the target words had the letter v in them, so this was not relevant to the present analysis.
As an alternative, I propose that all learners – regardless of proficiency level – use the same mechanism and create the same types of representations. However, not all the information that is stored in these representations will necessarily be consistently available to all learners nor will the representations themselves be equally robust. For example, the Low Intermediate group could be abstracting across representations that do not support place of articulation details. In other words, these learners could be accessing information related to position only. The fact that none of the three groups showed a significant effect for nonce vs. real words indicates that they are generalizing across production patterns to never-encountered forms.
The High Intermediate learners, on the other hand, may be using different levels of information in their productions of the stop approximant alternation, information that allows them to carry out abstractions that could include place of articulation details. This explanation can also account for why we did not observe differences between the real and nonce words on this production task. The Low Intermediate learners use levels of information in their productions that include positional details, allowing them to abstract from known sublexical patterns (i.e., ‘soften the stops in word medial position’) to new lexical items. The High Intermediate learners, on the other hand, can use this positional information level and also place of articulation information. These learners have stored information regarding sublexical patterns in Spanish that allows them to support generalizations to new words. The High Intermediate group’s additional experience means more detailed, robust representations can be drawn upon when carrying out the articulatory plan. Thus, learners of different proficiency levels access different information over the course of perception and production.
Further support for the claim that learners of different proficiency levels access different information comes from the DA results. The three groups are separated along both position and stress environment cues, suggesting learners are storing this information and subsequently drawing upon it. However, proficiency will play a key role in precisely how this information is implemented in production. Learners at the early stages do not connect the phonological environment factors of stress and position to the phonetic cues for the approximant and instead produce similar phonetic cue values across the four contexts.
3.4 Conclusions
In the next chapter, I round out the picture of allophone acquisition by considering production data from two children acquiring Spanish as a native language. In contrast to adult L2 acquisition, children have no previously established linguistic system that might interfere with new learning. Nor, in the case of these children, do they have an orthographic system that may interfere with the acquisition of allophonic alternations. By examining data from children it will be possible to consider a possible input accuracy interaction and also analyze potential effects for natural biases in stop approximant acquisition.
Chapter Four: Evidence for detailed representations in L1 acquisition: Frequency and allophone production
4.1 Introduction
In this chapter, I examine how children learning Spanish as a first language produce the stop approximant alternation and what this might imply for the phonological system and the types of representations created over the process of L1 allophone acquisition. In child phonological acquisition, it is generally assumed that natural biases strongly influence the order of emergence for segments and combinations of segments.
Natural biases are typically related to one aspect of markedness, namely, relative ease of articulation. Vocal tract physiology does not differ from person to person or language to language and thus the mapping between articulation and acoustics leads to the same divisions of the phonetic space, regardless of the language being acquired. Recent research has shown, however, that the ambient language influences when more marked sounds may emerge. For example, in a study examining the emergence of the stop approximant alternation in Spanish and German monolingual children, as well as bilingual Spanish German children, Lleó and Rakow (2005) demonstrate that the monolingual Spanish children showed the earliest production of the more marked approximant allophone, suggesting a role for language specific input. Another recent study by Edwards and Beckman (2008), in which they examined real word repetitions elicited from 2- and 3-year-old children who were monolingual speakers of English, Cantonese, Greek, or Japanese, found evidence in favour of both language universal effects in phonological acquisition and for language specific effects related to phoneme and phoneme sequence frequency. The authors concluded that common acquisition patterns across languages are the result of universal constraints imposed by the physiology and physics of speech production/perception as well as the influence of individual language effects.
The aim of this chapter is to expand on the findings by Edwards and Beckman (2008) and investigate how natural biases and language specific facts about the distribution of the stop approximant allophones in Spanish affect the production of the alternation by young children. The data presented here are from two L1 Spanish speaking children (2;1–3;1), MG and FC. Their productions of the stop approximant alternation were analyzed and subsequently compared to data from a Spanish child directed speech corpus and child L1 Spanish production corpora.
4.2 Phonetic universals and language-specific effects in phonological development
Research on phonological development has long been guided by Jakobson’s hypotheses (1941/1968:41) regarding universal principles, which he calls “implicational laws that structure the phoneme inventories of all languages and also guide child phonological acquisition.” For example, one well-known universal principle states that the presence of voiced and/or aspirated stops in a language necessarily implies the presence of voiceless unaspirated stops. From this, Jakobson predicted that young children will produce unaspirated stops before they produce either voiceless aspirated or voiced stops, a markedness universal in acquisition. This prediction has largely been confirmed, based upon evidence from English (Macken & Barton, 1980a), French (Allen, 1985), and Spanish (Macken & Barton, 1980b), among other languages.
An explanation for this apparently universal acquisition pattern lies in phonetic groundedness. Researchers have shown that aerodynamic requirements render it relatively more difficult to produce voiced stops because the build-up of oral air pressure during stop closure inhibits voicing even when the vocal folds are adducted. Another example comes from cross-linguistic patterns observed in infant babbling, which contains many more stop consonants than fricatives. Children generally master stop consonants before they master fricatives (e.g., Kent, 1992; Dinnsen, 1992; Smit, Hand, Freilinger, Bernthal, & Bird, 1990; Vihman, Macken, Miller, Simpson & Miller, 1985). Kent (1992) suggests that stops are mastered earlier than fricatives because the primary gesture in their production involves complete closure of the vocal tract, while the production of fricatives and, to an even greater degree, approximants requires finer control of the constriction degree in order to allow sufficient but not excessive airflow. The execution of the articulatory gestures required for fricatives involves precise control, which often lies beyond the abilities of young children.
Markedness can also be context dependent. For example, there are certain contexts where stops are in fact the more marked segments. One such environment where contextual markedness disfavours stops is the intervocalic position. In this context, approximants are actually less marked. As Smith (2007) states, sonority-increasing lenition is less marked in the intervocalic position where approximants are found. Thus, while stops are universally preferred as syllable onsets, when syllables are in word-internal and post-tonic position, lenition to a more sonorous segment is in fact the least marked option. In the case of the stop-approximant alternation, both prosodic and linear contextual effects motivate approximants in intervocalic, unstressed syllable onsets. Prosodically, smaller gestural magnitude is expected in less prominent positions such as word-medial, unstressed contexts (Byrd, 1996; Cho & McQueen, 2007; Ortega-Llebaría, 2006 for Spanish). Linearly, aerodynamic factors also favour the production of approximants in intervocalic context. According to the Aerodynamic Voicing Constraint (AVC, Chomsky & Halle, 1968:300–301), for voicing to occur, the vocal cords first must have the appropriate degree of tension and adduction and also must have air flowing through them. As Ohala (1994) states, the AVC favours the emergence of certain phonological patterns across the world’s languages, such as the tendency of intervocalic stops to become approximants. This is due to an effort on the part of the speaker to keep the closure duration short, but still avoid devoicing the stop. Excessive shortening, as might occur between two vowels, may lead to an incomplete closure and a spirant or approximant could result (Ohala, 1994:4).
In addition to the implicational and markedness universals assumption, Jakobson made two further proposals: first, that all children essentially produce the same set of sounds when babbling and only later acquire a language-specific inventory, and second, that there is a discontinuity between early babbling and the sounds produced in children’s first words. These claims have subsequently been countered by researchers working on a wide variety of languages (Vihman, et al., 1985; Ingram, 1999), who have shown that the sounds infants use in babbling resemble those sounds that eventually form part of their native language inventory. Thus, contrary to Jakobson’s proposals, it is now recognized that child phonological development reflects universal tendencies and also language-specific tendencies that emerge based upon the input received. In other words, universal effects can be modulated by language-specific input. Thus, as with all children acquiring the sound system of their language, Spanish-speaking children acquiring the stop-approximant alternation are subject to two conflicting pressures. One is direct and results from universal phonetic and perceptual constraints imposed by the human speech system at its early stages of development; the second is attributable to the way in which language-specific lexical and frequency effects drive the emergence of more marked segments (Edwards & Beckman, 2008).
If natural biases play the determining role in child acquisition of the stop-approximant alternation, the two children studied here are predicted to initially produce more stops than approximants, given that approximants are more articulatorily difficult. If language-specific patterns play the determining role in acquisition, the children’s production data will reflect the input frequencies and approximants will appear with little substitution asymmetry occurring. This would imply little to no role for markedness universals.
There is also a third possibility, which follows the results found by Edwards and Beckman (2008). Specifically, there could be effects for both natural biases and language-specific input. If these two factors interact, the children could demonstrate an asymmetry in favour of the stop allophones, reflecting natural biases, but the way in which the approximants emerge could be either categorical or gradient. If the approximants emerge in a categorical fashion, this would lend support to a categorical phonological system in child L1 allophone acquisition.
By categorical emergence of the approximant allophones I do not mean that the children will produce all the approximant targets accurately at a uniform point in development. Instead, I mean that there will be no relationship between target-like productions of the approximant allophone and what are considered extragrammatical factors, such as frequency in the input or the child’s own output. For example, in more traditional constraint-based models of phonological development, acquisition occurs via the re-ranking of universal constraints over the course of development. Most models assume that re-ranking is a consequence of an error analysis process that drives constraints either up in the ranking or down, according to the number of violations the particular constraint incurs for the input provided (Levelt & van de Vijver, 1998; Boersma & Levelt, 1999; Boersma & Hayes, 2001).
On the other hand, if the emergence of target-like allophone production does demonstrate frequency effects, this could be taken as evidence for a non-categorical phonological system. Specifically, gradient effects in the emergence of the approximants would suggest that the phonological system is sensitive to information of this type and creates representations that can encode it.
Given the conflicting pressures of universal constraints and language-specific effects, I predict that the third possibility will actually be observed in the data: an interaction between natural biases and language-specific effects. Spanish-speaking children will tend to produce stops in contexts where approximants are expected and not vice versa, consistent with the universal preference for stops over continuant segments, but approximants will emerge first in words that are frequent in the child’s lexicon or in the input distributions. Sounds or sequences of sounds that appear frequently afford the child many opportunities to perceive and produce the patterns, facilitating the mapping between perception and production and eventually leading to stronger representations that can be abstracted away from the specific word context (Edwards, Beckman & Munson, 2004).
In the case of allophones, children with incipient lexicons may not be aware of the relationship between the two non-contrastive sounds and may treat them as two separate categories. As the size of the lexicon increases, however, connections between allophones begin to emerge, in terms of articulation and/or perception, and the child recognizes that the two sounds are not contrastive in the language. This relationship may emerge in a piecemeal fashion. Gradually, as lexical knowledge builds, connections form, allowing generalizations based on the similarities shared by the allophones. This process of generalization may not occur until the child learns to read and recognizes the shared orthographic symbol for both allophones.
The stop-approximant alternation provides an ideal testing ground for theories related to the role of universals and language-specific patterns in language acquisition. As stated by Beckman and Edwards (2008), language-specific sound changes generally reflect universal tendencies (e.g., lenition of stops between vowels; Ohala, 1994), and how children acquire these language-specific patterns helps us to understand the nature of the phonological system and the representations created.
4.3 Corpus I: recorded data
The data comes from two L1 Spanish children living in Calgary, Alberta. Child 1, MG, male, was 2;4 and Child 2, FC, also male, was 2;0 at the beginning of the recording. The children were recorded over an eight-month period from October 2005 to June 2006. The corpus was recorded at a private, home-run, Spanish-speaking daycare centre in Calgary, Alberta, Canada. The children attended the daycare four half days a week, from 9am to 1pm. The daycare workers spoke only Spanish to the children and all books, songs, games and activities were in Spanish as well. On any given day there were between 4 and 6 children at the daycare centre, between the ages of 2;0–3;8, approximately. The linguistic background of the children represented a mixture of monolingual Spanish (3), bilingual Spanish-English (1) and monolingual English (2). However, the recordings analyzed for this paper were of monolingual Spanish-speaking children exclusively.
Recordings of each participant were made on average once every two weeks, for about 45–60 minutes each session, recorded directly onto a laptop computer from a portable microphone and subsequently transcribed using PRAAT 5.1 (Weenink & Boersma, 2007). All words were transcribed. Stops are predicted to occur in word-initial, post-pause position and the approximants are predicted to surface everywhere else, following standard phonological and phonetic descriptions of Spanish (Hualde, 2005).
4.3.1 Participants
Child 1, MG, is from Bogotá, Colombia and lived there until he was 2;3, when he moved with his family to Calgary, Alberta. The recordings were made when MG was between 2;4.5 and 2;11.26 years of age. MG attended the Spanish-speaking daycare centre four half days a week, during which time he was exposed to 100% Spanish with the daycare workers and about 80% Spanish with the other children at the daycare centre. The other 20% was Canadian English. MG has one older brother who was in Grade 1 and learning English at school. At home, MG spoke only Spanish with his mother, stepfather and older brother. He watched TV and videos in English, however, and had books in English as well. Spanish was the language of communication 100% of the time in the house and the language of entertainment (TV, videos, books) about 60% of the time. According to parental reports, MG did not comprehend or produce any English words at the time the recordings began or ended.
Child 2, FC, was 2;0.11 when recorded for the first time. FC was born in Barranquilla, Colombia and moved with his mother and father to Calgary, Alberta when he was 1;10. FC attended the same Spanish-speaking daycare centre as MG. Spanish was the only language used at home, by both parents, and FC had access to books, videos and television in that language as well. He is an only child.
4.4 Presentation of the data
For the data analysis, only singleton productions of /b/, /d/ and /g/ targets were considered, where they did not follow a sonorant homorganic nasal or lateral in the case of /d/. In Colombian Spanish, the target variety of the participants in this study, highland and coastal speakers tend to produce stops after all consonants, where speakers of other varieties of Spanish produce stops only after homorganic sonorants (Hualde, 2005). Given this particularity of Colombian Spanish, all stop-stop sequences were counted as obligatory for stops. Consequently, these contexts were not part of the data analyzed.
Where the same word was repeated over the course of the same session, each individual production was counted. For example, if the child produced amigo ‘friend (masc., sing.)’ with an approximant twice and a stop once, it was counted as three tokens of the word, two approximants and one stop.
4.4.1 Data
Data from MG is presented first. Over the 33-week span during which the recordings were made, he was between 2;4.5 and 2;11.26 years old. MG was recorded a total of twelve times. Table 8 presents a breakdown of allophone distribution across the tokens produced by MG. The first column contains the total number of tokens with /b/ /d/ /g/ as singletons, in either word-initial or word-medial position. Column two includes the total number of tokens with stop targets, that is, with one of the three target sounds in word-initial position where the stop is the expected allophone. Column three comprises the total number of tokens with approximant targets, that is, with one of the three sounds in word-medial position where the approximant is the expected allophone. Column four contains the actual number of stops that were produced in the expected position and in a target-like manner. Finally, column five presents the same information for approximant targets. For all columns, the percentages reflect the number of productions for that specific category divided by the target number in the entire corpus.
Table 8. MG’s productions
Total number of tokens with /b/ /d/ /g/ targets | Tokens with stop targets (word initial) | Tokens with approximant targets (word medial) | Tokens realized with target-like stops | Tokens realized with target-like approximants*
209 | 124/209 (59%) | 85/209 (41%) | 119/124 (96%) | 51/85 (61%)
*The other 34 targets were all realized as stops. Only two stops were realized as approximants; the other three were eliminated completely.
The data presented in Table 8 shows that MG produced more words with stop targets than with approximant targets (59% vs. 41%) and that the stop targets were realized in a target-like fashion far more often than the approximant targets (96% vs. 61%). The asymmetry observed in the target-like realization of the allophones supports the prediction that natural biases play a role in MG’s phonological development.
FC was four months younger than MG when recording began (2;0.11) and even though he was recorded a total of thirteen times, only eight sessions produced data usable for the present study. Over the 16 weeks covered by the recordings, FC produced a total of 71 tokens with the target segments. This data is presented in Table 9:
Table 9. FC’s productions
Total number of tokens with /b/ /d/ /g/ targets | Tokens with stop targets | Tokens with approximant targets | Tokens realized with target-like stops | Tokens realized with target-like approximants*
71 | 28/71 (39%) | 43/71 (61%) | 22/28 (80%) | 28/43 (65%)
*Ten of these targets were realized as stops, five were realized as fricatives.
FC produced fewer words with stop targets, but still had a much higher accuracy
rate with that allophone than with the approximant target (80% vs. 65%). These results
also support the hypothesis that universal factors are influencing the production of the
allophones for FC.
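The accuracy asymmetry described for both children can be recomputed directly from the raw counts. The following is only a minimal sketch: the counts are copied from Tables 8 and 9, while the dictionary keys and the function name are illustrative assumptions and not part of the original analysis. Recomputed values may differ by a percentage point or two from the rounded figures printed in the tables.

```python
# Token counts copied from Tables 8 and 9; the dictionary keys are
# illustrative names, not labels used in the thesis.
COUNTS = {
    "MG": {"stop_targets": 124, "stop_hits": 119,
           "approx_targets": 85, "approx_hits": 51},
    "FC": {"stop_targets": 28, "stop_hits": 22,
           "approx_targets": 43, "approx_hits": 28},
}


def summarize(child):
    """Return target shares and target-like accuracy rates for one child."""
    c = COUNTS[child]
    total = c["stop_targets"] + c["approx_targets"]
    stop_acc = c["stop_hits"] / c["stop_targets"]
    approx_acc = c["approx_hits"] / c["approx_targets"]
    return {
        "total_tokens": total,
        "approx_target_share": c["approx_targets"] / total,
        "stop_accuracy": stop_acc,
        "approx_accuracy": approx_acc,
        # Positive values mean stops are realized more accurately.
        "asymmetry": stop_acc - approx_acc,
    }


for child in COUNTS:
    s = summarize(child)
    print(child, s["total_tokens"],
          f"{s['stop_accuracy']:.2f}", f"{s['approx_accuracy']:.2f}")
```

A positive `asymmetry` value for both children corresponds to the pattern reported in the text, in which stop targets are realized in a target-like manner more often than approximant targets.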
To summarize, there is a notable asymmetry in the production of the stop-approximant alternation. Both children produced stop targets more accurately than approximant targets, in spite of the fact that approximants made up 41% of the targets in MG’s data and 61% in FC’s data, indicating that natural biases may be playing a role in their production patterns. Examples of target-like productions of the stop and approximant targets by each child are shown in (2):
(2) a. MG: Target-like stops
Target form | Child Form | Adult Target | Age
bien ‘good’ | [bién] | [bién] | 2;4.21
dos ‘two’ | [dos] | [dos] | 2;4.21
gané ‘(I) won’ | [gane] | [gane] | 2;6.3
voy a comer ‘I am going to eat’ | [bói a komer] | [bói a komer] | 2;6.3

b. FC: Target-like stops
Target form | Child Form | Adult Target | Age
vaca ‘cow’ | [baka] | [baka] | 2;2.22
dos ‘two’ | [doh] | [doh] | 2;2.22
gorra ‘cap’ | [gola] | [gorɸa] | 2;3.13
basura ‘garbage’ | [basuta] | [basuɾa] | 2;6.4

c. MG: Target-like approximants
Target form | Child Form | Adult Target | Age
avion ‘plane’ | [aβión] | [aβión] | 2;4.5
sabes ‘(you) know’ | [saβes] | [saβes] | 2;4.5
yo ya te diji (dije) ‘I already told you.’ | [teðixi] | [teðixi] | 2;6.24

d. FC: Target-like approximants
Target form | Child Form | Adult Target | Age
a bajar ‘to go down’ | [aβaga] | [aβahaɾ] | 2;2.23
caballo ‘horse’ | [aβaidʒo] | [kaβaidʒo] | 2;6.4
The data in (2) provides examples of productions that conform to the predictions for Spanish: stops will occur in word onset and approximants in word-medial position. However, as shown in Tables 8 and 9 above, there are notable asymmetries in the non-target-like productions of the two children. Specifically, stops are substituted for approximants at a much higher rate than approximants are substituted for stops. The data in (3) and (4) provides examples from MG and FC’s productions where such substitutions occur:
(3) a. MG: Stop substitutions for approximants
Target form | Child Form | Adult Target | Age
tu tengas fuego (juego) ‘(You) have (a) game.’ | [fuégo] | [fuégo] | 2;4.5
caballo ‘horse’ | [kabaiʝo] | [kaβaiʝo] | 2;6.3
yo estoy buscando ‘I am looking.’ | [ʝo ehtóibuhkan̪do] | [ʝo ehtóiβuhkando] | 2;6.24

b. FC: Stop substitutions for approximants
Target form | Child Form | Adult Target | Age
yogúr ‘yogurt’ | [ʝogu] | [ʝoγu] | 2;2.22
víbora ‘snake’ | [biboɾa] | [biβoɾa] | 2;3.13
abajo ‘below’ | [abáo] | [aβaho] | 2;5.14

(4) a. MG: Approximant substitutions for stops
Target form | Child Form | Adult Target | Age
bicicleta ‘bicycle’ | [βisiket] | [bisikleta] | 2;4.5

b. FC: Approximant substitutions for stops
Target form | Child Form | Adult Target | Age
‘I turn (it) over’ | [ðoilawelta] | [doi lawelta] | 2;6.3
voy a quitar ... ‘(I) am going to get rid of...’ | [βoi a kita] | [boi a kitaɾ] | 2;6.3
banca ‘bench’ | [βaŋga] | [baŋga] | 2;2.22
In the contexts where MG substituted the incorrect allophone, there was a notable asymmetry in terms of his substitution pattern. Stops were overwhelmingly substituted for approximants and not the other way around. The example provided in (4a) was the only token where he substituted an approximant for a stop, or indeed where stops were not produced in a target-like fashion. The alternative substitution happened 33 times.¹⁷ This clearly skewed substitution pattern provides support for universal principles in MG’s phonological development, favouring stops over approximants and stops in word-initial or post-pause position.
¹⁷ For MG there were no other substitution patterns that occurred for the target segments. For FC there were other patterns, which are discussed below.
Interestingly, FC has a slightly higher rate of target-like approximant production (65%) than MG. This is notable because he is younger than MG and has a smaller vocabulary. Given his age and smaller lexicon, the expectation is that FC would substitute more stops for approximants and consequently have a lower accuracy rate. One possible explanation for this may lie in the specific lexical items FC produced. When analyzing frequency effects in phonology, researchers typically distinguish two ways of calculating frequency: type frequency, or the number of distinct lexical items, and token frequency, or the total number of lexical items produced. FC produced 50 lexical types and 71 lexical tokens, for a type:token ratio of 0.7. MG had a similar type:token ratio (141 lexical types/209 tokens, or .67). However, there was a marked difference in precisely how the stop and approximant targets were distributed among the tokens produced by the two children: 61% of the tokens produced by FC had approximant targets in them. For MG, this total was only 41%. Thus, FC attempted proportionately more words with approximant targets, which afforded him more opportunities to practice the articulation of this allophone across fewer words. This could explain why FC demonstrates higher accuracy rates for approximant target words, in spite of his younger age, smaller lexicon and the relative difficulty of these segments as compared to stops.
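The type:token ratios reported for the two children can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the type and token counts come from the text, whereas the function names and the toy word list are hypothetical and serve only as illustration.

```python
def type_token_ratio(n_types, n_tokens):
    """Number of distinct lexical items divided by total items produced."""
    if n_tokens <= 0:
        raise ValueError("token count must be positive")
    return n_types / n_tokens


def ratio_from_tokens(tokens):
    """Compute the ratio from a list of transcribed word tokens."""
    return type_token_ratio(len(set(tokens)), len(tokens))


# Counts reported in the text for each child.
fc_ratio = type_token_ratio(50, 71)
mg_ratio = type_token_ratio(141, 209)
print(round(fc_ratio, 2), round(mg_ratio, 2))  # 0.7 0.67

# Hypothetical toy session: three tokens of one word, one of another.
sample = ["amigo", "amigo", "amigo", "caballo"]
print(ratio_from_tokens(sample))  # 0.5
```

Because repeated productions of the same word within a session are each counted as tokens, a child who repeats a small set of words often will show a lower ratio than one who produces many different words once.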
There were examples of truncations in FC’s data that resulted in contexts for the stop allophone. These are presented in (5):
(5) FC: Stop-approximant substitutions in initial position as a result of truncation
Target form | Child Form | Adult Target | Age
abuelo ‘grandfather’ | [béo] | [aβuélo] | 2;2.22
abajo ‘below’ | [bao] | [aβaho] | 2;2.22
caballo ‘horse’ | [baiʤo] | [kaβao] | 2;7.18
In these examples, FC truncated the initial weak syllable, leaving one of the three target segments in word-onset position. The target sounds were originally in word-medial position, the target for approximants, but after truncation was applied they became word initial and FC produced them as stops. Based upon this data, it appears that FC has generalized across positions and produces all word-initial segments as stops.
In Inkelas and Rose’s (2008) study of E, the child passed through a stage during which he neutralized plosives in prosodically strong positions to velars and produced glides instead of laterals in these positions as well. E produced velars in the onsets of primary and secondary stressed syllables and word initially, even in unstressed syllables, which the authors took as evidence that E had grammaticalized positional hardening, or strengthening, in these positions.¹⁸ Grammaticalization of positional hardening requires abstract and presumably innate knowledge of syllable onsets in addition to what constitutes the ‘ideal’ syllable onset. It is possible that FC’s productions are also subject to a similar constraint, whereby target segments that are approximants are hardened to stops when they occur in strong, i.e., word-initial, position. However, in (6) there are examples which show that this stop-initial generalization does not hold across the board:
(6) FC: Other substitutions
Target form | Child Form | Adult Target | Age
verde ‘green’ | [fede] | [beɾde] | 2;2.22
juguete ‘toy’ | [feke] | [huγete] | 2;2.22
abajo ‘below’ | [vako] | [aβaxo] | 2;2.22
arriba ‘above’ | [aviva] [viva] [vivía] | [arɸiβa] | 2;5.23
In the first example from (6), verde ‘green’, FC substitutes a voiceless labiodental fricative for the target bilabial stop. He makes a similar substitution in the following example, juguete, where the initial syllable is truncated and substituted by [f]. There are two possible explanations for this substitution pattern in juguete. FC could be truncating and maintaining the [+cont] feature of the /h/ onset or, alternatively, he could be maintaining the [+cont] feature from the target velar approximant. It appears that the first explanation is more accurate, as the subsequent syllable has /k/ as the onset, which may result from fusion of the velar approximant and voiceless coronal.
¹⁸ One problem with such an analysis is what type of evidence would serve to convince E that he had generalized incorrectly. That is, if the substitutions have been grammaticalized, what would subsequently drive the child to shift the constraints to a new ranking order?
In the next two examples, abajo and arriba, FC also produces a fricative in initial position. For abajo, FC truncates the initial unstressed syllable and maintains the [+cont] and voicing features of the target approximant segment, which now occurs in the word onset. This contrasts with the previous token of the word from (5), where FC truncates and produces a voiced stop in initial position. For the target arriba, FC alternates between truncating and not truncating the initial unstressed syllable, but he substitutes the voiced fricative for the target trill in all tokens. The examples from (6) show that FC does not always respect the universal unmarked preference for stops in initial position. He produces target stop allophones as continuants in verde and abajo and is faithful to the [+cont] feature in the initial segment of juguete. Finally, he harmonizes the place of articulation from the approximant target in arriba to the onset of the word, substituting the voiced labiodental fricative in that position.
The data in (6) from FC shows that he is not grammaticalizing positional hardening and in fact appears to be producing fricatives and stops interchangeably. Thus, it appears that for certain lexical items, FC produces stops in initial position, as a result of truncation, while for others, with and without truncation, he reproduces a segment with the [+cont] feature, characteristic of the approximant allophone. This inconsistency suggests that his production strategy is either lexically specific or the result of some other phonetic or phonological process. More data would be required to confidently draw a conclusion one way or another.
Nonetheless, based upon the examples from (4), (5) and (6), it appears that FC has a production pattern that fluctuates between the unmarked stop segment (as in (5)) in onset position and the more marked [+cont] segment (as in (6)). He may be drawing upon a distribution of the two allophones that can be characterized as in Figure 8. For ease of exposition, the segments are restricted to bilabials, but there could be a similar distributional overlap for the dental and velar allophones as well.
Figure 8. Schematized distribution of onset segments for FC
stops | either stop/approx/fric | approximant
[b] | [b, β, v, f] | [β]
[beo] | [bao]/[vako] | [kaβaiʤo]
[baiʤo] | [aviva]/[viva] | [aβahar]
Figure 8 shows how the distributions for FC may overlap in his phonetic space, leading to inconsistent productions for the tokens which fall in the middle. According to exemplar-based models of production (Pierrehumbert, 2003b), speakers select production targets from the mean of their distributions. In the case of FC, he has two distributions, which correspond to the stop and approximant allophones, and he also has an extensive region of overlap between the two allophones. On the occasions where FC pulls his production target from this overlapping region, his productions will fluctuate between the stop and the approximant, at times during the same session and possibly for the same lexical target. On the occasions where he pulls his production target from one or the other non-overlapping region, he will be more consistent.
4.4.1.1 [d] deletion
There is another pattern of alternation that occurs only with the dental allophone, that of /d/ deletion. In many varieties of Spanish, especially those spoken in the Caribbean region, /d/ deletion is common. In these varieties, speakers commonly delete intervocalic /d/ in participles and in other lexical items with the same ending (Hualde, 2005). For example, comida ‘food’ [koˈmiða] ~ [koˈmia]. This is characteristic of certain regional varieties of Colombian Spanish, including the variety being acquired by FC and MG. In this section, data from each child showing examples of /d/ deletion in expected contexts is presented. Unlike /b/ and /g/, the input received by the child for the dental target includes stops, approximants and complete deletion. The data in (7) shows tokens in which MG deletes /d/.
(7) MG /d/ deletion
Target form | Child Form | (Standard) Adult Target | Age
puedo ‘(I) can’ | [puéo] | [puéðo] | 2;4.21
al lado ‘next to’ | [aláo] | [al laðo] | 2;6.3
helado ‘ice cream’ | [elao] | [elaðo] | 2;6.3
comida ‘food’ | [komía] | [komiða] | 2;7.7
While MG produces helado ‘ice cream’ with the expected /d/ deletion, he produces it a second time without deletion and in fact produces a stop instead. There are also examples of two other words that do not undergo /d/ deletion:
(8) a. MG no /d/ deletion
Target form | Child Form | (Standard) Adult Target | Age
helado ‘ice cream’ | [elado] | [elaðo] | 2;8.21
revolcado ‘overturned’ | [puepokáðo] | [ɾeβolkaðo] | 2;8.21
espada ‘sword’ | [espada] | [espaða] | 2;10.1
In the case of revolcado, MG produces an approximant and in the case of espada, he produces a stop. This fluctuation among deletion, approximant and stop is not found in FC’s production. He only produced three words with the correct context for /d/ deletion, and all were produced with the expected target form:
b. FC /d/ deletion
Target form | Child Form | (Standard) Adult Target | Age
no, nada ‘no, nothing’ | [no naa] | [no naða] | 2;4.22
helado ‘ice cream’ | [eláo] | [elaðo] | 2;7.18
vestido ‘dress’ | [bestío] | [bestiðo] | 2;7.18
The fact that the children are producing /d/ deletion as expected in some but not all cases suggests that MG and FC are not applying an across-the-board rule. If they were applying a rule, all words would exhibit the same /d/ deletion effects, independent of the individual lexical items themselves. In other words, such a rule would affect all possible candidates for its application in an equal fashion.
As shown in the data presented in (2) – (8), MG and FC demonstrate inconsistencies in their allophone production that suggest they are not using a rule-based approach. The irregular production of the approximant across different lexical items suggests that there are more gradient factors at play.
4.4.2 General discussion of the data: phonetic groundedness
According to the data presented above, it might be possible that phonetic groundedness is driving the emergence of the approximant allophone. Specifically, as discussed, the Aerodynamic Voicing Constraint (AVC) can account for why children eventually produce approximants in their favoured context. However, the AVC cannot necessarily explain why these effects do not emerge immediately, once the child has mastered the articulatory precision required to execute the gesture itself. In order to account for this, the AVC must be complemented by a model of acquisition that can incorporate effects for motor learning, or practice effects. Motor learning occurs when the speaker has practiced and repeated the required articulatory gestures sufficiently often and developed the motor skills necessary to execute them. Thus, some sort of frequency effect must also play a role.
Frequency effects have played an important role in debates over the motivating force behind fine-grained adjustments in articulation. As Munson, Edwards and Beckman (2009) state, children could repeat a sequence of sounds that occurs in many words by relying upon the articulatory and acoustic representations that they have accumulated over the course of their language learning. Sequences that are more frequent will be easier to draw upon because they will be more accessible. In probabilistic terms, highly frequent items will be represented in distributional peaks over the phonetic space and it is more likely they will be pulled out when articulatory planning is taking place. On the other hand, sounds that occur relatively infrequently cannot be drawn upon as efficiently.
A similar argument has been put forth by Ouedeyer (2005, 2006), who states that automatization allows articulations to become more energy efficient with practice.
A key assumption of this explanation is that motor skill development plays an important role in phonological development, which in turn is based on production frequency. In the present case, this means that approximant segments that are produced more frequently afford the child more opportunities to practice the articulatory gestures required for their execution and therefore should be more accurate. In order to test the prediction that motor skill development is driving the emergence of approximants in MG and FC’s productions, we require a comparison between accuracy rates and production frequency for each child. Alignment between production frequency and accuracy rates would suggest that practice, or motor skill development, plays a key role in the emergence of allophones.
This information is presented in Figure 9, which depicts the target (the number of words with the target sound attempted by the child, divided by the total number of words in the corpus) and proportion accurate (the number of target words produced accurately divided by the total number of words attempted). For example, MG produced 30 words with /b/ initial as the target, out of a corpus that included 209 words. This gave a ‘Target’ proportion of 0.14. Of those 30 target words, 24 were produced accurately (i.e., as stops), giving a ‘Proportion Accurate’ score of 0.8.
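The two proportions defined above can be illustrated with a minimal sketch that uses only the counts reported for MG's /b/ initial targets. The function and parameter names are hypothetical and are not taken from the original analysis.

```python
def proportions(attempted, accurate, corpus_size):
    """Return (target proportion, proportion accurate) for one target sound.

    attempted   -- words with the target sound attempted by the child
    accurate    -- attempted words produced accurately
    corpus_size -- total number of words in the child's corpus
    """
    target = attempted / corpus_size
    proportion_accurate = accurate / attempted
    return target, proportion_accurate


# MG, /b/ initial: 30 target words out of 209, 24 of them produced as stops.
mg_target, mg_accurate = proportions(attempted=30, accurate=24, corpus_size=209)
print(round(mg_target, 2), round(mg_accurate, 2))  # 0.14 0.8
```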
Figure 9. Proportion attempted and accurate across target sounds
[Bar chart: Target and Proportion Accurate for each of b initial, b medial, d initial, d medial, g initial and g medial.]
Figure 9 shows that accuracy rates do not necessarily align with position, place of articulation or frequency of production. There is no discernible pattern in MG or FC’s productions for accuracy rates across position or place of articulation. For MG, b medial position had the highest proportion of words attempted (.33) but the lowest accuracy proportion (.26) across the six position/place of articulation combinations. For FC, the highest proportion of words attempted was also for b medial (.39) but he had a low accuracy rate for these targets (.61). The highest accuracy rates for both children occurred with /b/ initial targets, although the proportion of words with this target was relatively low for both children. Based upon this, it appears that no relation exists between the number of words attempted and the actual proportion of target segments that were produced accurately. Therefore, a linear relationship between motor learning and AVC effects cannot hold as an explanation for the patterns observed here.
Another way of considering the data presented above is by looking for possible effects for individual lexical items. It is possible that by considering individual words a pattern in allophone accuracy rates may emerge. It has often been noted that higher frequency lexical items show a higher probability, or greater degree, of phonetic lenition. The influence of lexical frequency on lenited productions suggests that articulation can be influenced by details stored in long term memory (Pierrehumbert, 2001a,b, 2002, 2003a,b; Bybee, 2001, 2002; Beckman & Pierrehumbert, 2003). That is, the development of phonetic and phonological knowledge is connected to the development of the lexicon. Individual words that occur frequently in the input (or output) will be more likely candidates for lenition because the child has had more opportunities to develop the motor skills necessary to produce the approximant.
In order to determine what role (if any) the lexicon may play in the acquisition of the stop approximant alternation in Spanish, we require data showing what type of input children receive. To this end, the following section addresses how the target sounds are distributed in the input received by young Spanish speaking children. I used a corpus of child directed speech and child productions, taken from the CHILDES database of L1 Spanish speaking children. The goal was to determine what type of input children are exposed to when acquiring Spanish as a native language and whether patterns in this input are reflected in child productions. If so, this would provide support for lexically based acquisition patterns.
4.5 Corpus II: CHILDES database
In this section of the chapter, the nature of the input Spanish learning children receive over the course of language learning is examined. In order to do so, data was combined to form two corpora, taken from Spanish child language databases found in CHILDES. The first corpus consisted of child productions (C Corpus) and the other consisted of child directed speech (CDS Corpus). The corpora were taken from a combination of eight sub corpora of Spanish learning children (three in Latin America, five in Spain), in which the children ranged in age from ten months to five years of age.
In what follows, I present a statistical description of the two corpora and comparative data to examine a) positional segmental frequencies to see if there are correlations among them and b) whether there is a correlation between the emergence and subsequent frequency of approximants and overall word frequency across the input received by MG and FC.
4.5.1 Description of the C and CDS Corpora
All words with the target segments in either initial or medial position were first extracted from the relevant CHILDES databases. In order to minimize possible effects for variability among the sub corpora, only the top 50% of all tokens were included. The C Corpus had a total of 12 291 tokens and 492 lexical types and the CDS Corpus had a total of 60 842 tokens and 2210 lexical types. Following this, the positional frequency for each sound was calculated. Positional segment frequency is the likelihood of occurrence of a given sound in a given position. It is calculated by summing the log frequency of the words in the corpus containing the target sound in the target word position divided by the sum of the log frequency of the total number of words containing any segment in the target word position in the corpus (Storkel, 2004). For our purposes, this meant taking the log frequency counts for all the words with, for example, /b/ in medial position and dividing it by the log frequency counts for all the words with singleton /b d g/ in either position. All inflected forms were counted as separate lexical items. For example, gato and gatos were counted as separate tokens, as were forms such as estaba (he/she/it was) and estaban (they were). This way the token frequency counts faithfully reflect the total number of times children hear certain segments in the input. All proper nouns from the database were eliminated, given that these may bias individual token statistics in favour of the child’s name or other idiosyncratic items. Finally, where the C Corpus transcription reflected errors such as cluster simplification, I took the target word as the correct form, not the actual production. For example, if the child produced [gan̪de] instead of [grande], I counted the production as a cluster and it was not calculated as part of the total number of tokens. In the case of /d/ deletion, where noted in the transcriptions, I considered deletion as the target and any deviation as non target like.
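The positional segment frequency calculation described above can be sketched as follows. This is a minimal illustration, not the procedure actually run on the CHILDES data: the toy corpus, its token counts, the function name and the choice of base-10 logarithms are all assumptions made for the example.

```python
import math

# Hypothetical toy corpus (counts are invented for illustration).
# word -> (token count, set of (segment, position) contexts with singleton /b d g/)
TOY_CORPUS = {
    "gato": (40, {("g", "initial")}),
    "boca": (25, {("b", "initial")}),
    "dedo": (10, {("d", "initial"), ("d", "medial")}),
    "agua": (100, {("g", "medial")}),
    "estaba": (60, {("b", "medial")}),
}


def positional_segment_frequency(corpus, segment, position):
    """Summed log frequency of words with `segment` in `position`, divided by
    the summed log frequency of all words with singleton /b d g/ in either
    position. Base 10 is an assumption; a word with count 1 contributes 0."""
    numerator = sum(
        math.log10(count)
        for count, contexts in corpus.values()
        if (segment, position) in contexts
    )
    denominator = sum(
        math.log10(count) for count, contexts in corpus.values() if contexts
    )
    return numerator / denominator


b_medial = positional_segment_frequency(TOY_CORPUS, "b", "medial")
print(round(b_medial, 3))
```

Because inflected forms count as separate lexical items, forms such as estaba and estaban would each appear as their own entry in a corpus of this shape.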
The C Corpus is related to the production of the target segments while the CDS Corpus represents the input received by the child. It is necessary to acknowledge that the C Corpus represents an idealized version of what children are actually producing, since the databases considered here did not provide allophonic transcriptions. Thus, the tokens from the C Corpus represent the number of potential contexts for the stop approximant alternation, not necessarily the number that were actually produced by the children who participated in the database creation. The CDS Corpus, on the other hand, represents what the children receive as input to their language acquisition process. Because this input was provided by adult native speakers of Spanish, it is assumed that the tokens follow the expected distributional patterns for each allophone. Thus, the two corpora represent the token frequencies for what Spanish speaking children produce and what they hear.
As shown above in Figure 9, the accuracy and target production proportions do not align directly for either child, either in terms of position or in terms of place of articulation. It was argued that this provided evidence against a strictly articulatorily based explanation. By turning now to a comparison between what MG and FC are receiving as input (the CDS Corpus) and what Spanish speaking children generally produce (the C Corpus), it is hoped that we can obtain an idea of whether their production patterns follow any generalizations that occur in the input or across production patterns.
4.5.2 Frequency data for C and CDS Corpora
In order to see whether the CDS and C corpora shared similar characteristics, a correlational analysis of the positional segment frequency values for the two databases was carried out. Strong correlations exist between the positional segment frequency counts for the two databases (r = .821, p<0.05), indicating that children are producing similar rates of /b d g/ as they hear in the input. Figure 10 illustrates this correlation:
Figure 10. C and CDS Positional frequencies correlation
Based upon these results it is safe to assume that the two databases share similar frequency distributions for segments of interest to this study, albeit not necessarily the same lexical items. Because this is based on token counts, it is possible that MG and FC are producing many fewer tokens than those which actually occurred in the two databases. Table 10 presents the positional frequency values for each consonant position combination:
Table 10. Positional log frequency
             CDS    C
b initial    .91    .82
b medial     .85    .81
d initial    .84    .85
d medial     .82    .82
g initial    .71    .74
g medial     .76    .80
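The correlational analysis of positional segment frequencies can be sketched with a plain Pearson coefficient over the two columns of Table 10. This is an illustration only: the function name is hypothetical, and because the table values are rounded to two decimals, the result need not reproduce the reported r = .821 exactly.

```python
import math

# Table 10 values in the order: b initial, b medial, d initial, d medial,
# g initial, g medial.
CDS = [0.91, 0.85, 0.84, 0.82, 0.71, 0.76]
C = [0.82, 0.81, 0.85, 0.82, 0.74, 0.80]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


r = pearson_r(CDS, C)
print(round(r, 3))
```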
A non parametric test was carried out to determine if there were significant differences among the ranks of positional segment frequency values. According to the Mann Whitney test, there were no significant differences between the two corpora on this variable (U= 15, p>0.05), further illustrating that the two corpora are similar in terms of the positional distribution of the target segments.
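The rank-based comparison can be sketched in plain Python using the rounded values from Table 10. The helper names are hypothetical, tied values receive the mean of the ranks they span, and no p-value is computed here. With these rounded inputs the sketch gives U = 14.5, which is consistent with the reported U = 15.

```python
# Table 10 values in the order: b initial, b medial, d initial, d medial,
# g initial, g medial.
CDS = [0.91, 0.85, 0.84, 0.82, 0.71, 0.76]
C = [0.82, 0.81, 0.85, 0.82, 0.74, 0.80]


def average_ranks(values):
    """Map each distinct value to its rank, averaging ranks across ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their mean.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two Mann-Whitney U statistics."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[value] for value in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


u = mann_whitney_u(CDS, C)
print(u)  # 14.5
```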
Next, a comparison between the positional frequencies for MG and FC’s production of the target segments and the positional frequencies found in the two corpora was conducted. Because the MG and FC corpora are several orders of magnitude smaller than the C and CDS corpora, it was necessary to use the Log Likelihood (LL) statistic to compare across corpora (Rayson and Garside, 2000). Log Likelihood statistics are based upon the expected and observed frequencies and therefore can be adapted to corpora of widely differing sizes. If the LL value is positive, the reference corpus has higher than expected frequencies relative to the comparison corpus. The opposite holds if the LL value is negative. Using the recorded corpus of MG and FC as the reference corpus, the LL values comparing it to the C and CDS corpora should be around zero, or at least not significantly different, if MG and FC’s accuracy rates directly reflect the input they receive. Recall that the goal is to see if there is a relationship between the input MG and FC receive, in the form of an idealized corpus, and their accurate productions of the allophones, particularly the approximant targets.
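The Log Likelihood comparison can be sketched as follows, using the two-corpus formula given by Rayson and Garside (2000): expected frequencies are derived from the pooled counts and LL = 2 Σ O ln(O/E). The function names and example counts are hypothetical. Attaching a sign according to which corpus has the higher relative frequency is an assumption about how positive and negative values were obtained, since the raw statistic itself is never negative.

```python
import math

# Critical values for 1 degree of freedom, as listed by Rayson and Garside (2000).
CRITICAL_VALUES = [
    (15.13, "p < 0.0001"),
    (10.83, "p < 0.001"),
    (6.63, "p < 0.01"),
    (3.84, "p < 0.05"),
]


def log_likelihood(freq_ref, size_ref, freq_cmp, size_cmp):
    """Signed LL for one item in a reference corpus vs. a comparison corpus.

    Positive: the item is relatively more frequent in the reference corpus.
    Negative: it is relatively more frequent in the comparison corpus.
    """
    pooled = freq_ref + freq_cmp
    expected_ref = size_ref * pooled / (size_ref + size_cmp)
    expected_cmp = size_cmp * pooled / (size_ref + size_cmp)
    ll = 0.0
    for observed, expected in ((freq_ref, expected_ref), (freq_cmp, expected_cmp)):
        if observed > 0:  # 0 * ln(0) is treated as 0
            ll += observed * math.log(observed / expected)
    ll *= 2
    if freq_ref / size_ref < freq_cmp / size_cmp:
        ll = -ll
    return ll


def significance(ll):
    """Return the strictest significance level reached by |LL|."""
    for critical, label in CRITICAL_VALUES:
        if abs(ll) >= critical:
            return label
    return "n.s."


# Hypothetical counts: 30 of 100 items in a small corpus, 100 of 1000 in a large one.
example = log_likelihood(30, 100, 100, 1000)
print(round(example, 2), significance(example))
```

Equal relative frequencies in the two corpora give LL = 0, which is the outcome expected if a child's rates directly mirror those of the larger corpus.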
The first set of comparisons is presented in Figure 11. It depicts Log Likelihood by position for each child, compared to the C and CDS corpora. Figure 9 above showed that little, if any, relationship exists between accuracy rates and the number of words attempted, suggesting that motor skill development is not driving the emergence of approximants in a linear fashion. It is possible, however, that the children’s accuracy reflects the rate at which the alternants occur in the input. This would explain why the higher frequency targets were not produced more accurately. If motor practice for each target were the primary driving force, then more frequently produced targets should be more accurate. This was not found to hold. It is possible that accuracy rates reflect the frequency of the input received, rather than the actual production by the child.
Figure 11. Log Likelihood Position Accuracy (see note 19)
All ps>0.01
Based upon a significance level of p<0.01, and a critical value of 6.63, the results from Figure 11 show that the Log Likelihood of MG and FC’s accuracy rates did not significantly differ from the rates of positional occurrence for the three target segments (all ps >0.01). These results suggest that both children are producing the target positions at accuracy rates that correspond to the rates at which other children produce the target segments. Furthermore, these results show that FC and MG’s accuracy rates also reflect the input. The fact that MG and FC’s accuracy rates reflect the input they receive may be a result of two things. On the one hand, it could mean that the children are repeating precisely what they hear and faithfully reproducing the target language positional occurrence rates, albeit potentially as a result of abstraction. On the other hand, it could be the result of an asymmetry in the accuracy rates across places of articulation. For example, the children could be producing all their b medial segments at higher accuracy rates than the other two segments, or producing more words with b segments than the CDS or C corpus contain, thereby skewing the overall results.

19 Following Rayson and Garside (2000), significance of the LL values can be interpreted as follows: 95th percentile (5% level), p < 0.05, critical value = 3.84; 99th percentile (1% level), p < 0.01, critical value = 6.63; 99.9th percentile (0.1% level), p < 0.001, critical value = 10.83; 99.99th percentile (0.01% level), p < 0.0001, critical value = 15.13.
In order to investigate this, Figures 12 and 13 depict the Log Likelihood values for each child compared to each corpus, across the three places of articulation. Differences across places of articulation for the Log Likelihood statistic would suggest that children are not merely reproducing what they hear in the input and generalizing across place of articulation.
Figure 12. MG Log Likelihood Place of Articulation Accuracy
Based upon a significance level of p<0.01, and a critical value of 6.63, the Log Likelihood values for MG’s accuracy rates and the C Corpus were significantly different only for the b medial target, where MG had higher expected rates of production than occurred in the C corpus. This suggests that MG’s accuracy rates on the six target sounds are similar to the rates at which Spanish children produce these sounds. In other words, the distribution of MG’s accuracy rates and the distribution of these target sounds found in the words produced by children in the C Corpus are very similar. For the input, however, the picture is slightly different. MG’s accuracy rates were significantly different from those of the CDS Corpus in four of the six target contexts. Only the d initial and g medial targets did not reach significance, suggesting that MG’s accuracy does not directly align with the input received. However, the direction of this difference is also important. In only the d medial and g initial contexts did MG’s accuracy rates reflect a significantly lower LL than the CDS corpus (p<0.001). This means that MG produced a lower rate of expected accuracy for these two segments only. The b initial and b medial segments actually demonstrated higher accuracy rates than those predicted by the corpus.
Figure 13 illustrates FC’s Log Likelihood values for place of articulation accuracy. FC’s accuracy rates differed significantly (p<0.01) from the C Corpus for the b medial and d initial targets. For the CDS corpus, FC’s accuracy rates reached significance for the b medial targets only. FC’s accuracy rates align more closely with the CDS corpus than do MG’s. It is hard to draw any firm conclusions regarding this, however, because of the relatively few tokens he produced overall.
Figure 13. FC Log Likelihood Place of Articulation Accuracy
[Bar chart: FC’s Log Likelihood values for each target, compared to the C and CDS corpora.]