On the Acquisition of Phonological Representations

B. Elan DRESHER
Department of Linguistics
University of Toronto
Toronto, Ontario
Canada M5S 3H1
[email protected]

Abstract

Language learners must acquire the grammar (rules, constraints, principles) of their language as well as representations at various levels. I will argue that representations are part of the grammar and must be acquired together with other aspects of grammar; thus, grammar acquisition may not presuppose knowledge of representations. Further, I will argue that the goal of a learning model should not be to try to match or approximate target forms directly, because strategies to do so are defeated by the disconnect between principles of grammar and the effects they produce. Rather, learners should use target forms as evidence bearing on the selection of the correct grammar. I will draw on two areas of phonology to illustrate these arguments. The first is the grammar of stress, or metrical phonology, which has received much attention in the learning model literature. The second concerns the acquisition of phonological features and contrasts. This aspect of acquisition turns out, contrary to first appearances, to pose challenging problems for learning models.

1 Introduction

I will discuss the extent to which representations are intertwined with the grammar, and the consequences of this fact for acquisition models. I will focus on phonological representations, but the argument extends to other components of the grammar.

One might suppose that phonological representations can be acquired directly from the acoustic signal. If, for example, children are equipped with innate phonetic feature detectors, one might suppose that they can use these to extract phonetic features from the signal. These extracted phonetic features would then constitute phonological representations (surface, or phonetic, representations). Once these are acquired, they can serve as a basis from which learners can acquire the rest of the grammar, namely, the phonological rules (and/or constraints) and the lexical, or underlying, representations.

This idea of acquisition by stages, with representations preceding rules, has enduring appeal, though details vary with the prevailing theory of grammar; versions of this theory can be found in (Bloch, 1941) and (Pinker, 1994:264–5). The idea could not be implemented in American Structuralist phonology, however (Chomsky, 1964), and I will argue that it remains untenable today. I will discuss two areas of phonology in which representations must be acquired together with the grammar, rather than prior to it. The first concerns the grammar of stress, or metrical phonology. The second concerns the acquisition of phonological features. These pose different sorts of problems for learning models. The first has been the subject of considerable discussion. The second, to my knowledge, has not been discussed in the context of formal learning models. Though it has often been assumed, as mentioned above, that acquisition of features might be the most straightforward aspect of phonological acquisition, I will argue that it presents challenging problems for learning models.

2 Representations of stress

Phonetic representations are not simply bundles of features. Consider stress, for example. Depending on the language, stress may be indicated phonetically by pitch, duration, loudness, or by some combination of these dimensions. So even language learners gifted with phonetic feature detectors will have to sort out what the specific correlates of stress are in their language. For purposes of the ensuing discussion, I will assume that this much can be acquired prior to further acquisition of the phonology.

But simply deciding which syllables have stress does not yield a surface representation of the stress contour of a word. According to metrical theory (Liberman and Prince 1977, Halle and Idsardi 1995, Hayes 1995), stress results from grouping syllables into feet; the strongest foot is assigned the main stress, and the other feet are associated with secondary stress. Moreover, some syllables at the edges of the stress domain may be designated as extrametrical, and not included in feet.

For example, I assume that learners who have sorted out which acoustic cues signal stress can at some point assign the stress contours depicted in (1) to English words. The height of the column over each syllable, S, indicates how much relative stress it has. However, these are not the surface representations. They indicate levels of stress, but no metrical organization.

(1) Representations of stress contours before setting metrical parameters

    a. América              b. Mànitóba

        x                           x           Line 2
        x                   x       x           Line 1
    x   x   x   x           x   x   x   x       Line 0
    S   S   S   S           S   S   S   S
    A   me  ri  ca          Ma  ni  to: ba

According to conventional accounts of English stress, the metrical structures assigned to these words are as in (2).

(2) Acquired representations

    a. América              b. Mànitóba

        x                           x           Line 2
       (x)                 (x       x)          Line 1
    x  (x   x)             (x   x) (x)          Line 0
    L   L   L   L           L   L   H   L
    A   me  ri  ca          Ma  ni  to: ba

Looking at the word America, these representations indicate that the first syllable A is unfooted, that the next two syllables meri constitute a trochaic foot, and that the final syllable ca is extrametrical. Manitoba has two feet, hence two stresses, of which the second is stronger than the first. The Ls and Hs under the first line of the metrical grid designate light and heavy syllables, respectively. The distinction is important in English: the syllable to: in Manitoba is heavy, hence capable of making up a foot by itself, and it receives the stress. If it were light, then Manitoba would have stress on the antepenultimate syllable, as in America.

How does a learner know to assign these surface structures? Not just from the acoustic signal, or from the schematic stress contours in (1). Observe that an unstressed syllable can have several metrical representations: it can be unfooted, like the first syllable in America; it can be the weak position of a foot, like the second syllable of Manitoba; or it can be extrametrical, like the final syllables in both words. One cannot tell from the sound which of these representations to assign. The only way to know this is to acquire the grammar of stress, based on evidence drawn from the observed contours in (1).

Similar remarks hold for determining syllable quantity. English divides syllables into light and heavy: a light syllable ends in a short vowel, and a heavy syllable either contains a long vowel or is closed by a consonant. In many other languages, though, a closed syllable containing a short vowel is considered to be light, contrary to the English categorization. Learners must decide how to classify such syllables, and the decision cannot be made on phonetic grounds alone.
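The many-to-one relation between metrical structures and observable contours can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions: the `Parse` container, the functions `contour` and `role`, and the numeric stress levels (1, 2, 3) are hypothetical names and encodings introduced here, and feet are assumed to be trochaic (head on the left), as in the English examples in (2). It encodes the two structures in (2), derives the column heights in (1) from them, and shows that syllables with the same observable level can have different structural roles.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Parse:
    """Hypothetical container for a metrical parse (trochaic feet assumed)."""
    syllables: List[str]
    feet: List[Tuple[int, int]]   # inclusive (first, last) syllable indices; head = first
    main_foot: int                # index into `feet` of the foot bearing main stress
    extrametrical: Set[int]       # syllable indices excluded from footing


def contour(p: Parse) -> List[int]:
    """Observable stress level per syllable: 1 = unstressed, 2 = secondary, 3 = main."""
    levels = [1] * len(p.syllables)
    for i, (head, _last) in enumerate(p.feet):
        levels[head] = 3 if i == p.main_foot else 2
    return levels


def role(p: Parse, idx: int) -> str:
    """Structural status of a syllable, which the contour alone does not reveal."""
    if idx in p.extrametrical:
        return "extrametrical"
    for first, last in p.feet:
        if first <= idx <= last:
            return "head" if idx == first else "weak"
    return "unfooted"


# The two structures in (2), encoded by hand.
america = Parse(["A", "me", "ri", "ca"], feet=[(1, 2)], main_foot=0, extrametrical={3})
manitoba = Parse(["Ma", "ni", "to:", "ba"], feet=[(0, 1), (2, 2)], main_foot=1, extrametrical={3})

for word in (america, manitoba):
    levels = contour(word)
    for i, syl in enumerate(word.syllables):
        print(syl, levels[i], role(word, i))
```

The mapping from structure to contour is a simple function, but it cannot be inverted: the three level-1 syllables of America come out as unfooted, weak, and extrametrical respectively, which is the ambiguity a learner faces when starting from (1).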

3 Acquisition of metrical structure

How, then, are these aspects of phonological structure acquired? Following Chomsky (1981), I will suppose that metrical structures are governed by a finite number of parameters, whose values are to be set on the basis of experience. The possible values of a parameter are limited and given in advance.[1]

[1] For some other approaches to the acquisition of stress see (Daelemans, Gillis, and Durieux, 1994), (Gupta and Touretzky, 1994), (Tesar, 1998, 2004), and (Tesar and Smolensky, 1998).

Parameter setting models must overcome a basic problem: the relation between a parameter and what it does is indirect, due to the fact that there are many parameters, and they interact in complex ways (Dresher and Kaye, 1990). For example, in English main stress is tied to the right edge of the word. But that does not mean that stress is always on the last syllable: it could be on the penultimate syllable, as in Manitoba, or on the antepenultimate, as in America. What is consistent in these examples is that main stress devolves onto the strong syllable of the rightmost foot. Where this syllable and foot are in any given word depends on how a variety of parameters are set. Some surprising consequences follow from the nontransparent relationship between a parameter and its effects.

The first one is that a learner who has some incorrectly set parameters might know that something is wrong, but might not know which parameter is the source of the problem. This is known as the Credit Problem (cf. Clark 1989, 1992, who calls this the Selection Problem): a learner cannot reliably assign credit or blame to individual parameters when something is wrong.

There is a second way in which parameters can pose problems for a learner. Some parameters are stated in terms of abstract entities and theory-internal concepts that the learner may not initially be able to identify. For example, the theory of stress is couched in terms of concepts such as heavy syllables, heads, feet, and so on. In syntax, various parameters have been posited that refer specifically to anaphors, or to functional projections of various types. These entities do not come labelled as such in the input, but must themselves be constructed by the learner. So, to echo the title character in Plato’s dialogue The Meno, how can learners determine if main stress falls on the first or last foot if they do not know what a foot is, or how to identify one? This can be called the Epistemological Problem: in this case we know about something in the abstract, but we do not recognize that thing when it is in front of us.

Because of the Credit Problem and the Epistemological Problem, parameter setting is not like learning to hit a target, where one can correct one’s aim by observing where previous shots land. The relation between the number of parameters correct and apparent closeness to the target is not smooth (Turkel, 1996): one parameter wrong may result in forms that appear to be way off the target, whereas many parameters wrong may produce results that appear to be better (Dresher, 1999). This discrepancy between grammar and outputs defeats learning models that blindly try to match output forms (Gibson and Wexler, 1994), or that are based on a notion of goodness-of-fit (Clark and Roberts, 1993). In terms of Fodor (1998), there are no unambiguous triggers: thus, learning models that seek them in individual target forms are unlikely to be successful.

I have argued (Dresher, 1999) that Plato’s solution, a series of questions posed in a specified order, is the best approach we have. One version of this approach is the cue-based learner of (Dresher and Kaye, 1990). In this model, not only are the principles and parameters of Universal Grammar innate, but learners must be born with some kind of a road map that guides them in setting the parameters.
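The non-smooth relation between parameter settings and outputs noted above can be illustrated with a toy system. The Python sketch below is not any of the cited learning models; it assumes a hypothetical stress grammar with four binary parameters (foot headedness, direction of footing, final-syllable extrametricality, and the edge at which main stress falls), strictly binary feet, and no quantity sensitivity. The function names `contour`, `mismatches`, and `distance`, and the use of word lengths 3 to 6 as a stand-in lexicon, are likewise assumptions made for illustration.

```python
from itertools import product

LENGTHS = range(3, 7)  # toy "lexicon": words of 3 to 6 syllables


def contour(n, trochaic, right_to_left, extrametrical, main_right):
    """Stress levels (1 = none, 2 = secondary, 3 = main) for an n-syllable word."""
    domain = n - 1 if extrametrical else n      # drop the final syllable if extrametrical
    # Left edges of binary feet built from the chosen edge; a leftover
    # single syllable stays unfooted.
    if right_to_left:
        starts = sorted(range(domain - 2, -1, -2))
    else:
        starts = list(range(0, domain - 1, 2))
    heads = [s if trochaic else s + 1 for s in starts]
    levels = [1] * n
    for h in heads:
        levels[h] = 2
    if heads:
        levels[heads[-1] if main_right else heads[0]] = 3
    return levels


def mismatches(grammar, target):
    """Number of syllables, summed over the toy lexicon, whose stress level differs."""
    return sum(a != b
               for n in LENGTHS
               for a, b in zip(contour(n, *grammar), contour(n, *target)))


def distance(grammar, target):
    """Number of parameters set differently from the target."""
    return sum(a != b for a, b in zip(grammar, target))


# Target: trochaic, right-to-left, final extrametricality, main stress on the right.
TARGET = (True, True, True, True)

for g in product([True, False], repeat=4):
    print(g, distance(g, TARGET), mismatches(g, TARGET))
```

In this toy system, the grammar that differs from the target only in lacking extrametricality mismatches 14 syllables over the four word lengths, whereas a grammar that additionally builds feet from left to right (two parameters wrong) mismatches only 8. A learner that judged progress by surface fit would thus prefer the grammar that is further from the target.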

Some ingredients of this road map are the following:

First, Universal Grammar associates every parameter with a cue, something in the data that signals the learner how that parameter is to be set. The cue might be a pattern that the learner must look for, or simply the presence of some element in a particular context.

Second, parameter setting proceeds in a (partial) order set by Universal Grammar: this ordering specifies a learning path (Lightfoot 1989). The setting of a parameter later on the learning path depends on the results of earlier ones.

Hence, cues can become increasingly abstract and grammar-internal the further along the learning path they are. As learners acquire more of the system, their representations become more sophisticated, and they are able to build on what they have already learned to set more parameters.[2]

If this approach is correct, there is no parameter-independent learning algorithm. This is because the learning path is dependent on the particular parameters. Also, the cues must be discovered for each parameter. Thus, a learning algorithm for one part of the grammar cannot be applied to another part of the grammar in an automatic way.[3]

[2] For details of parameter ordering, defaults, and cues in the acquisition of stress, see (Dresher and Kaye, 1990) and (Dresher, 1999).

[3] For further discussion and critiques of cue-based models see (Nyberg, 1991), (Gillis, Durieux, and Daelemans, 1995), (Bertolo et al. 1997), and (Tesar, 2004).

4 Segmental representations

Up to now we have been looking at an aspect of phonological representation above the level of the segment. I have argued that acquisition of this aspect of surface phonological representation cannot simply be based on attending to the acoustic signal, but requires a more elaborate learning model. But what about acquisition of the phonemic inventory of a language? One might suppose that this could be achieved prior to the acquisition of the phonology itself.

Since the pioneering work of Trubetzkoy and Jakobson, phonological theory has posited that phonemes are characterized in terms of a limited set of distinctive features. Therefore, to identify a phoneme one must be able to assign to it a representation in terms of feature specifications. What are these representations? Since Saussure, it has been a central assumption of much linguistic theory that a unit is defined not only in terms of its substance, but also in negative terms, with respect to the units it contrasts with. On this way of thinking, an /i/ that is part of a three-vowel system /i a u/ is not necessarily the same thing as an /i/ that is part of a seven-vowel system /i ɪ e a o ʊ u/. In a three-vowel system, no more than two features are required to distinguish each vowel from all the others; in a seven-vowel system, at least one more feature is required.

Jakobson and Halle (1956) suggested that distinctive features are necessarily binary because of how they are acquired, through a series of ‘binary fissions’. They propose that the order of these contrastive splits, which form what I will call a contrastive hierarchy (Dresher 2003a, b), is partially fixed, thereby allowing for certain developmental sequences and ruling out others. This idea has been fruitfully applied in acquisition studies, where it is a natural way of describing developing phonological inventories (Pye, Ingram, and List, 1987), (Ingram, 1989), (Levelt, 1989), (Dinnsen et al., 1990), (Dinnsen, 1992), and (Rice and Avery, 1995).

Consider, for example, the development of segment types in onset position in Dutch (Fikkert, 1994):

(3) Development of Dutch onset consonants (Fikkert 1994)

                          consonant
                     u /             \ m
            obstruent                   sonorant
         u /        \ m               u /      \ m
     plosive        fricative     nasal       liquid/glide
       /P/             /F/         /N/            /L/J/

At first there are no contrasts. The value of the consonant defaults to the least marked (u) onset, namely an obstruent plosive, designated here as /P/. The first contrast is between obstruent and sonorant. The former remains the unmarked (u), or default, option; the marked (m) sonorant defaults to nasal, /N/. At this point children differ. Some expand the obstruent branch first, bringing in marked fricatives, /F/, in contrast with plosives. Others expand the sonorant branch, introducing marked sonorants, which may be either liquids, /L/, or glides, /J/. Continuing in this way we will eventually have a tree that gives all and only the contrasting features in the language.

5 Acquiring segmental representations

Let us consider how such representations might be acquired. To illustrate, we will look at the vowel system of Classical Manchu (Zhang, 1996), which nicely illustrates the types of problems a learning model will have to overcome. Zhang (1996) proposes the contrastive hierarchy in (4) for Classical Manchu, where the order of the features is [low] > [coronal] > [labial] > [ATR].

(4) Classical Manchu vowel system (Zhang 1996)[4]

                            [low]
                     – /             \ +
            [coronal]                   [labial]
         + /        \ –                – /      \ +
       /i/          [ATR]           [ATR]          /ɔ/
                  + /   \ –       + /   \ –
                /u/       /ʊ/   /ə/       /a/

Part of the evidence for these specifications comes from the following observations:

(5) Evidence for the specifications in (4)

a. /u/ and /ə/ trigger ATR harmony, but /i/ does not, though /i/ is phonetically [+ATR], suggesting that /i/ lacks a phonological specification for [ATR].

b. /ɔ/ triggers labial harmony, but /u/ and /ʊ/ do not. Though they are phonetically [+labial], there is no evidence that /u/ and /ʊ/ are specified for this feature.

Acquiring phonological specifications is not the same as identifying phonetic features. Surface phonetics do not determine the phonological specifications of a segment. Manchu /i/ is phonetically [+ATR], but does not bear the feature phonologically; /u/ and /ʊ/ are phonetically [+labial], but are not specified for that feature. How does a learner deduce phonological (contrastive) specifications from surface phonetics?[5]

It must be the case that phoneme acquisition requires learners to take into account phonological processes, and not just the local phonetics of individual segments (Dresher and van der Hulst, 1995). Thus, the phonological status of Manchu vowels is demonstrated most clearly by attending to the effects of the vowel on neighbouring segments.

This conclusion is strengthened when we consider that the distinction between /u/ and /ʊ/ in Classical Manchu is phonetically evident only after back consonants; elsewhere, they merge to [u]. To determine the underlying identity of a surface [u], therefore, a language learner must observe its patterning with other vowels: if it co-occurs with [+ATR] vowels, it is /u/; otherwise, it is /ʊ/. The nonlocal and diverse character of the evidence bearing on the feature specifications of segments poses a challenge to learning models.

Finally, let us consider the acquisition of the hierarchy of contrastive features in each language. Examples such as the acquisition of Dutch onsets given above appear to accord well with the notion of a learning path, whereby learners proceed to master individual feature contrasts in order.

[4] Zhang (1996) assumes privative features: [F] vs. the absence of [F], rather than [+F] vs. [–F]. The distinction between privative and binary features is not crucial to the matters under discussion here.

[5] Phonological contrasts that play a role in phonological representations are thus different from their phonetic manifestations, the subject of studies such as (Flemming, 1995).
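How an ordering of features yields contrastive specifications can be sketched as a procedure that divides the inventory by one feature at a time, assigning a feature value only where it distinguishes members of the current subset. The Python code below is a minimal sketch under stated assumptions, not Zhang's analysis or a learning model: the function name `contrastive_specs` is hypothetical, features are treated as binary, and the full phonetic specifications in `PHONETIC` are illustrative guesses (only the [+ATR] value of /i/, the [+labial] values of /u/ and /ʊ/, and the [low] and [ATR] values of /ə/ and /a/ are stated in the text). In particular, the table simply assumes that /ə/ is [+low].

```python
from typing import Dict, List

# Assumed full phonetic specifications for the six Classical Manchu vowels.
# These values are an illustrative guess, not data taken from Zhang (1996).
PHONETIC: Dict[str, Dict[str, str]] = {
    "i": {"low": "-", "coronal": "+", "labial": "-", "ATR": "+"},
    "u": {"low": "-", "coronal": "-", "labial": "+", "ATR": "+"},
    "ʊ": {"low": "-", "coronal": "-", "labial": "+", "ATR": "-"},
    "ə": {"low": "+", "coronal": "-", "labial": "-", "ATR": "+"},
    "a": {"low": "+", "coronal": "-", "labial": "-", "ATR": "-"},
    "ɔ": {"low": "+", "coronal": "-", "labial": "+", "ATR": "-"},
}


def contrastive_specs(inventory: Dict[str, Dict[str, str]],
                      order: List[str]) -> Dict[str, Dict[str, str]]:
    """Assign only those feature values that are contrastive under `order`."""
    specs: Dict[str, Dict[str, str]] = {seg: {} for seg in inventory}

    def divide(segments: List[str], features: List[str]) -> None:
        if len(segments) < 2 or not features:
            return                      # nothing left to distinguish
        feature, rest = features[0], features[1:]
        plus = [s for s in segments if inventory[s][feature] == "+"]
        minus = [s for s in segments if inventory[s][feature] == "-"]
        if plus and minus:              # the feature splits this subset: contrastive
            for s in plus:
                specs[s][feature] = "+"
            for s in minus:
                specs[s][feature] = "-"
            divide(plus, rest)
            divide(minus, rest)
        else:                           # no contrast here: leave unspecified
            divide(segments, rest)

    divide(list(inventory), order)
    return specs


manchu = contrastive_specs(PHONETIC, ["low", "coronal", "labial", "ATR"])
for vowel, spec in manchu.items():
    print(vowel, spec)
```

With the order [low] > [coronal] > [labial] > [ATR], the sketch leaves /i/ without an [ATR] value and /u/ and /ʊ/ without a [labial] value, matching the tree in (4). Reordering the features changes the result: with [ATR] placed first, for example, /i/ would be specified [+ATR]. The procedure takes the order as given, so it says nothing about how a learner discovers that order.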

If this order were the same for all languages, then this much would not have to be acquired. However, it appears that the feature hierarchies vary somewhat across languages (Dresher, 2003a, b). The existence of variation raises the question of how learners determine the order for their language. The problem is difficult, because establishing the correct ordering, as shown by the active contrasts in a language, appears to involve different kinds of potentially conflicting evidence. In the case of metrical parameters, the relevant evidence could be reduced to particular cues, or so it appears. Whether the setting of feature hierarchies can be parameterized in a similar way remains to be demonstrated.

6 Conclusion

I will conclude by raising one further problem for learning models that is suggested by the Manchu vowel system. We have observed that in Classical Manchu, /ə/ is the [+ATR] counterpart of /a/. Both vowels are [+low]. Since [low] is ordered first among the vowel features in the Manchu hierarchy, we might suppose that learners determine which vowels are [+low] and which are not at an early stage in the process, before assigning the other features. However, a vowel that is phonetically [ə] is ambiguous as to its featural classification. In many languages, including descendants of Classical Manchu (Zhang, 1996, Dresher & Zhang, 2003), such vowels are classified as [–low]. What helps to place /ə/ as a [+low] vowel in Classical Manchu is the knowledge that it is the [+ATR] counterpart of /a/. That is, in order to assign the feature [+low] to /ə/, it helps to know that it is [+ATR]. But, by hypothesis, [low] is assigned before [ATR]. Similarly, the determination that /i/ is contrastively [+coronal] is tied in with its not being contrastively [–labial]; but [coronal] is assigned prior to [labial].

It appears, then, that whatever order we choose to assign features, it is necessary to have some advance knowledge about classification with respect to features ordered later. Perhaps this paradox is only apparent. However it is resolved, the issue raises an interesting problem for models of acquisition.

7 Acknowledgements

This research was supported in part by grant 410-2003-0913 from the Social Sciences and Humanities Research Council of Canada. I would like to thank the members of the project on Contrast in Phonology at the University of Toronto (http://www.chass.utoronto.ca/~contrast/) for discussion.

References

Stefano Bertolo, Kevin Broihir, Edward Gibson, and Kenneth Wexler. 1997. Characterizing learnability conditions for cue-based learners in parametric language systems. In Tilman Becker and Hans-Ulrich Krieger, editors, Proceedings of the Fifth Meeting on the Mathematics of Language. http://www.dfki.de/events/mol/.
Bernard Bloch. 1941. Phonemic overlapping. American Speech 16:278–284. Reprinted in Martin Joos, editor, Readings in Linguistics I, Second edition, 93–96. New York: American Council of Learned Societies, 1958.
Noam Chomsky. 1964. Current issues in linguistic theory. In Jerry A. Fodor and Jerrold J. Katz, editors, The Structure of Language, 50–118. Englewood Cliffs, NJ: Prentice-Hall.
Noam Chomsky. 1981. Principles and parameters in syntactic theory. In Norbert Hornstein and David Lightfoot, editors, Explanation In Linguistics: The Logical Problem of Language Acquisition, 32–75. London: Longman.
Robin Clark. 1989. On the relationship between the input data and parameter setting. In Proceedings of NELS 19, 48–62. GLSA, University of Massachusetts, Amherst.
Robin Clark. 1992. The selection of syntactic knowledge. Language Acquisition 2:83–149.

Robin Clark and Ian Roberts. 1993. A computational model of language learnability and language change. Linguistic Inquiry 24:299–345.
Walter Daelemans, Steven Gillis, and Gert Durieux. 1994. The acquisition of stress: A data-oriented approach. Computational Linguistics 20:421–451.
Daniel A. Dinnsen. 1992. Variation in developing and fully developed phonetic inventories. In Charles Ferguson, Lise Menn, and Carol Stoel-Gammon, editors, Phonological Development: Models, Research, Implications, 191–210. Timonium, MD: York Press.
Daniel A. Dinnsen, Steven B. Chin, Mary Elbert, and Thomas W. Powell. 1990. Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. Journal of Speech and Hearing Research 33:28–37.
B. Elan Dresher. 1999. Charting the learning path: Cues to parameter setting. Linguistic Inquiry 30:27–67.
B. Elan Dresher. 2003a. Contrast and asymmetries in inventories. In Anna-Maria di Sciullo, editor, Asymmetry in Grammar, Volume 2: Morphology, Phonology, Acquisition, 239–257. Amsterdam: John Benjamins.
B. Elan Dresher. 2003b. The contrastive hierarchy in phonology. In Daniel Currie Hall, editor, Toronto Working Papers in Linguistics (Special Issue on Contrast in Phonology) 20, 47–62. Toronto: Department of Linguistics, University of Toronto.
B. Elan Dresher and Harry van der Hulst. 1995. Global determinacy and learnability in phonology. In John Archibald, editor, Phonological Acquisition and Phonological Theory, 1–21. Hillsdale, NJ: Lawrence Erlbaum.
B. Elan Dresher and Jonathan Kaye. 1990. A computational learning model for metrical phonology. Cognition 34:137–195.
B. Elan Dresher and Xi Zhang. 2003. Phonological contrast and phonetics in Manchu vowel systems. Paper presented at the Twenty-Ninth Annual Meeting of the Berkeley Linguistics Society, February 2003. To appear in the Proceedings.
Paula Fikkert. 1994. On the Acquisition of Prosodic Structure (HIL Dissertations 6). Dordrecht: ICG Printing.
Edward Flemming. 1995. Auditory representations in phonology. Doctoral dissertation, UCLA.
Janet Dean Fodor. 1998. Unambiguous triggers. Linguistic Inquiry 29:1–36.
Edward Gibson and Kenneth Wexler. 1994. Triggers. Linguistic Inquiry 25:407–454.
Steven Gillis, Gert Durieux, and Walter Daelemans. 1996. A computational model of P&P: Dresher & Kaye (1990) revisited. In Frank Wijnen and Maaike Verrips, editors, Approaches to Parameter Setting, 135–173. Vakgroep Algemene Taalwetenschap, Universiteit van Amsterdam.
Prahlad Gupta and David Touretzky. 1994. Connectionist models and linguistic theory: Investigations of stress systems in language. Cognitive Science 18:1–50.
Morris Halle and William J. Idsardi. 1995. General properties of stress and metrical structure. In John Goldsmith, editor, The Handbook of Phonological Theory, 403–443. Cambridge, MA: Blackwell.
Bruce Hayes. 1995. Metrical Stress Theory: Principles and Case Studies. Chicago: University of Chicago Press.
David Ingram. 1989. First Language Acquisition: Method, Description and Explanation. Cambridge: Cambridge University Press.
Roman Jakobson and Morris Halle. 1956. Fundamentals of Language. The Hague: Mouton.
Clara C. Levelt. 1989. An essay on child phonology. M.A. thesis, Leiden University.
Mark Liberman and Alan Prince. 1977. On stress and linguistic rhythm. Linguistic Inquiry 8:249–336.
David Lightfoot. 1989. The child’s trigger experience: Degree-0 learnability (with commentaries). Behavioral and Brain Sciences 12:321–375.
Eric H. Nyberg 3rd. 1991. A non-deterministic, success-driven model of parametric setting in language acquisition. Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA.
Steven Pinker. 1994. The Language Instinct. New York: William Morrow.
Plato. Meno. Various editions.
Clifton Pye, David Ingram, and Helen List. 1987. A comparison of initial consonant acquisition in English and Quiché. In Keith E. Nelson and Ann Van Kleeck, editors, Children's Language (Vol. 6), 175–190. Hillsdale, NJ: Erlbaum.
Keren Rice and Peter Avery. 1995. Variability in a deterministic model of language acquisition: A theory of segmental elaboration. In John Archibald, editor, Phonological Acquisition and Phonological Theory, 23–42. Hillsdale, NJ: Lawrence Erlbaum.
Bruce Tesar. 1998. An iterative strategy for language learning. Lingua 104:131–145.
Bruce Tesar. 2004. Using inconsistency detection to overcome structural ambiguity. Linguistic Inquiry 35:219–253.
Bruce Tesar and Paul Smolensky. 1998. Learnability in Optimality Theory. Linguistic Inquiry 29:229–268.
William J. Turkel. 1996. Smoothness in a parametric subspace. Ms., University of British Columbia, Vancouver.
Xi Zhang. 1996. Vowel systems of the Manchu-Tungus languages of China. Doctoral dissertation, University of Toronto.
