Modelling Exemplar-Based Phonologization

Robert Kirchner

Section I of this chapter presents several laboratory phonology studies of sound change and phonologization which compel a reexamination of standard assumptions about gradience and categoricality. Section II presents evidence of the incremental nature of sound change. In this section, we confront the implications of these results for standard phonological theory, and suggest a way forward.

1. W(h)ither phonology?

At the nexus of the phonetics-phonology interface and synchrony-diachrony issues lies the problem of phonologization, standardly conceived as a diachronic development whereby gradient phonetic patterns come to be reanalysed as patterns over symbolic representations (Hyman 1975). Renewed attention to phonologization, particularly in the Evolutionary Phonology framework of Blevins & Garrett (1998, 2004), has cast doubt on the very centrepiece of modern phonological theory, the markedness constraints of Optimality Theory. Simpler grammatical models are possible, the argument goes, if the phonological formalism need not concern itself with questions of typological markedness or phonetic naturalness, leaving that job to diachronic interaction with the phonetic component, which is needed in any case [as is argued in section I, but see my comments]. Consider a phonetically sensible rule such as k ➝ kʲ / __{i,e}. Phonological systems tend to include rules like this, rather than, say, k ➝ m / __{i,e}, and the former is widely attested as a sound change, simply because it arises from phonologization of gradient coarticulation, whereas there is unlikely ever to be a pattern of phonetic variation between /k/ and an [m]-like allophone to serve as grist for reanalysis. A phonological markedness constraint favouring assimilatory dorsal fronting is therefore superfluous. The research programs of phonetically based Optimality Theory (e.g. Hayes, Kirchner & Steriade 2004) and Functional Phonology (Boersma 1998), though coming at this issue from the opposite direction – incorporating phonetics more tightly into phonological theory and analysis – seem, ironically, to confirm the Evolutionary Phonology verdict.
Striking resemblances have been found, in every domain of phonological typology1, between the substance of well-attested phonological patterns and lower-level phonetic variation, which relate straightforwardly to phonetic considerations such as articulatory undershoot, gestural overlap, aerodynamics, auditory salience, etc., such that there appears to be no domain of pure phonological markedness, autonomous from phonetics.

At this point, it is useful to remind ourselves exactly what work the phonology module (now divested of responsibility for markedness generalizations) does in this division of labour. The reason we speak of some patterns as being phonologized is that they display categoricalization and stabilization, which are difficult to account for in terms of purely phonetic factors. The notion of articulatory undershoot (Lindblom 1963), for example, can explain gradient reduction, where the degree of centralization varies continuously with speech rate (and any other factors affecting articulatory displacement/velocity). But it cannot, by itself, account for categorical reductions of the sort discussed by Crosswhite 2004, e.g. a distribution of vowel tokens with a cluster of points around [ə], and other clusters around full vowel values, but with few points in between (see generally Pierrehumbert 1994 on the instrumental interpretation of discrete versus gradient variation). Nor can phonetic factors such as undershoot explain why phonologized processes are conditioned by coarse phonetic context, in particular relatively stable cues such as stress placement, rather than fine phonetic detail which may vary from token to token, such as precise vowel duration. In a typical categorical vowel reduction, for example, the [ə] fails to revert to a full vowel even in slow, careful speech, when articulatory velocity considerations are less pressing. Both categoricalization of the variation and stabilization of the context can be accounted for by assuming that the phonologized reduction pattern is stated over a different level of representation from the gradient pattern.

1 With the probable exception of metrical phonology, which seems to reflect a rhythmic cognitive faculty (cf. Tilsen 2009) distinct from articulatory and perceptual phonetic considerations. This rhythmic faculty, however, does not serve as an example of autonomous phonological markedness, insofar as rhythm is found in many extra-linguistic domains of human (and animal) behaviour, such as limb movement.

To answer the question posed in the previous paragraph: this is in fact the only work that phonology appears to be doing -- if by 'phonology' we mean a symbolic level of representation for sound patterns and its attendant theory -- and it does it by brute force. The observation of categorical and stable behaviour is obtained simply by stipulating that the structural descriptions of phonologized patterns are limited to a small set of discrete, symbolic units. Moreover, this assumption does not come with any intrinsic account of how phonologization occurs. At some point, under this story, speakers reanalyse patterns of variation, from numeric to symbolic terms; but what mechanism induces this shift, and what factors in the original phonetic pattern is it sensitive to? And if phonologization is merely an arbitrary reassignment of a pattern from one level of representation to the other, why don't we observe this development in reverse: 'phoneticizations' of originally stable categorical patterns?2

Indeed, this standard view, on closer examination, encounters a number of immediate difficulties. How do we reconcile this abrupt shift from numeric to symbolic patterns, which the standard view presupposes, with the incremental nature of sound change, discussed in section II above? Moreover, is the distinction between phonetic and phonologized patterns really as clear-cut as the foregoing discussion implies? Phonologization might instead be a matter of degree, ranging from low-level, slightly speaker-controlled variation at one end of the spectrum, to categorical, stable, perhaps somewhat morphologically conditioned alternations at the other. The two-level assumption forces a choice between phonetic and phonological analyses of any given pattern, thereby precluding elegant treatments of partially phonologized patterns, cf. Pierrehumbert et al. 2000, Cohn 2006. As an example of the latter, consider lenition in Florentine Italian (Giannelli & Savoia 1979, Kirchner 2004):

• Voiceless stops, /g/, /tʃ/, and /dʒ/ obligatorily lenite to continuants in 'weak position' (i.e. roughly intervocalic within an intonational phrase);
• but the outcome of this lenition varies gradiently, from a close constriction to Ø, depending on speech rate and register;
• additional consonants undergo various forms of lenition in weak position in faster/more casual speech;
• and lenition expands beyond weak position in faster/more casual speech.

The categorical aspects of this pattern, spirantization of voiceless stops, /g/, and the affricates, are just the tip (indeed, three separate tips) of an iceberg of quantitative phonetic variation. I suspect that partial phonologizations will prove, upon sufficiently close examination of patterns in a broad range of languages, to be the rule rather than the exception.

2 The editors suggest that near mergers might represent such a case of phoneticization. Near mergers, however, involve blurring of a lexically conditioned distinction (in some or all contexts), not a contextually conditioned pattern of variation going from categorical to gradient application. Phoneticization, in my intended sense, would correspond to, e.g., a final devoicing alternation pattern which is categorical, perhaps neutralizing, at one stage of a language, and a variable, partial devoicing pattern, sensitive to fine phonetic detail, at the next stage. In all the controversy about incomplete neutralization in final devoicing (see e.g. Warner et al. 2004), no one has suggested a historical development from categorical to gradient application as the explanation.

To state the problem in another way: Pierrehumbert 1994 observes that virtually every case of gradient allophonic variation which phoneticians have investigated has proven to be, in some respects, language-specific (cf. discussion of Beddor 2007 and Beddor et al. 2007 in section I above). How do these gradient patterns arise? There must be some mechanism whereby purely physiologically determined (and therefore language-independent) patterns of variation come to be incrementally enhanced, in language-specific ways. (Such a mechanism is sketched in the remainder of this chapter.) Here the phonologization problem resurfaces in a slightly different guise; but in this case we cannot attribute the development of the pattern to a shift in level of representation, for this is all quantitative variation within the phonetic component. On the other hand, a model of this quantitative enhancement of low-level variation presumably could handle phonologization as well. Categoricality can be regarded merely as an advanced stage of enhancement, 'the discrete limit of [a] continuous process', as Pierrehumbert et al. 2000 put it, without resort to a symbolic level of representation.

In sum, the assumption of a symbolic phonological level of representation is neither necessary nor sufficient for an account of phonologization. The time is ripe, therefore, to consider the outlines of a theory of phonology in which sound patterns (categorical, gradient, and intermediate) are stated not over symbols, but directly over numeric auditory and articulatory signals. This move promises to resolve the debate between Evolutionary Phonology (which objects to the massive redundancy of phonologized markedness constraints that replicate phonetic factors), and phonetically based Optimality Theory (which objects to a phonological theory that makes no markedness claims, and that pushes markedness issues into the realm of unformalized meta-theory). The new phonological theory directly includes quantitative phonetic factors in the scope of its formalism; if the research program is successful, phonological markedness generalizations would emerge from the interaction between direct phonetic factors and the pattern-generalizing properties of the speech processing system.

2. Exemplar theory

Contemporaneous with this debate about phonological markedness constraints vs. phonetic factors, an alternative conception of phonology (indeed, of grammar generally), namely usage-based, or exemplar theory, has been put forward, most comprehensively in its application to phonology by Bybee 2001, and most explicitly by Pierrehumbert 2001. The essence of exemplar theory in phonology is massive storage of exemplars: of individual experiences of speech, including fine phonetic detail. Linguistic categories are not represented as symbols, but as 'clouds' of exemplars associated with category labels. Speech recognition involves a calculation of distance in phonetic space between an auditory stimulus and the stored exemplars, and the application of a classification rule to these distances. Pierrehumbert, for example, adopts the k-Nearest-Neighbours rule (see generally Mitchell 1997, ch. 8, for a machine learning perspective on kNN and other 'instance-based' classification rules). If k=10, and the ten exemplars closest to the stimulus have the word category labels {'pit', 'pet', 'pit', 'pit', 'pet', 'pit', 'peat', 'pit', 'pet', 'pit'}, then the modal category, 'pit', is chosen as the output of recognition. Exemplar-based production, in turn, involves generation of an output based on mean phonetic properties of the exemplars of the target category. Taking the notion of speech production seriously, this output should include a motor plan, i.e. a matrix of vocal tract muscle group activation levels over time. The output would also include an auditory target signal; comparison of the auditory target signal to actual auditory self-perception provides an error signal for feedback purposes, see Moore 2007; cf. Flemming 1995, arguing for parallel auditory and articulatory representations in phonology. At this stage of exemplar theory's development, however, modellers have either used toy numeric representations (e.g. Pierrehumbert 2001, Wedel 2004), or acoustic signals (e.g. Johnson 1997, Kirchner & Moore, to appear), as proxies for the auditory/motor signals which are in principle required.
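The kNN classification rule just described can be illustrated with a toy sketch. Real exemplars would be variable-length auditory signals rather than 2-D points, and the cloud below is invented purely for illustration:

```python
from collections import Counter
from math import dist

def knn_classify(stimulus, exemplars, k=10):
    """Classify an auditory stimulus as the modal word label among the
    k stored exemplars nearest to it in phonetic space."""
    nearest = sorted(exemplars, key=lambda ex: dist(stimulus, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented exemplar cloud: (point in a toy 2-D phonetic space, word label).
cloud = ([((1.0 + 0.1 * i, 2.0), "pit") for i in range(6)]
         + [((1.7 + 0.1 * i, 2.1), "pet") for i in range(3)]
         + [((0.2, 1.9), "peat")])

print(knn_classify((1.2, 2.0), cloud))  # prints "pit": 6 of the 10 neighbours
```

With a smaller k, classification becomes more sensitive to local structure in the cloud: `knn_classify((1.9, 2.1), cloud, k=3)` returns "pet", since the three nearest exemplars are all 'pet' tokens.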

The production output may also be influenced by phonetic pressures, such as bias towards articulatory reduction. Moreover, the exemplars may be weighted by recency and semantic appropriateness, or tagged with indexical or pragmatic variables, for purposes of production and recognition.

2.1. What are the categories? Word token frequency effects discussed in section II, such as Bybee's (2001) every-memory-mammary reduction pattern, and Goldinger's (1996, 2001) low-frequency imitation result, motivate the identification of the exemplar category labels with words. On the other hand, an adequate exemplar-based theory of phonology must be able to capture sound patterns that pertain to smaller domains than the whole word, most obviously segments. Pierrehumbert (2002) therefore assumes that exemplars are also parsed into sub-word-level phonological unit categories, such as segments. A whole word exemplar of 'pit', for example, might simultaneously be classified, in its initial portion, as an exemplar of the segment [pʰ], etc. I argue below that this resort to a priori phonological unit categories is in fact unnecessary, given a production model in which portions (of any size) of exemplars can be compared to portions of other exemplars, for purposes of pattern generalization (section 3).

2.2. Motivation for exemplar theory.

For an overview of the growing body of (principally experimental) literature motivating exemplar- based phonology, see Gahl & Yu (2006) and articles contained therein, as well as Port 2007. See also Maier & Moore 2007, and DeWachter 2007, for exemplar-based approaches to automatic speech recognition. A few illustrative effects are considered below:

2.2.1 Incremental sound change. Assume a variable phonetic bias which causes the output for a particular word to deviate from previous exemplars, to a greater or lesser extent, in direction X. This output is immediately classified as a new exemplar of the target word, shifting the category mean subtly towards a more X-like pronunciation. This is incremental sound change at the word level.
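This production-perception loop can be sketched in a few lines of toy simulation. The one-dimensional 'phonetic space' and uniform bias below are my own simplifications, not a claim about any implemented model:

```python
import random

def produce_and_store(cloud, bias=0.05):
    """One production loop: output the current category mean plus a
    variable phonetic bias in direction X, then immediately store the
    output as a new exemplar of the word."""
    output = sum(cloud) / len(cloud) + random.uniform(0.0, bias)
    cloud.append(output)
    return output

random.seed(0)                      # reproducible toy run
cloud = [0.0] * 20                  # initial exemplars of one word
for _ in range(200):
    produce_and_store(cloud)
print(sum(cloud) / len(cloud) > 0)  # True: the category mean has drifted towards X
```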

2.2.2. Frequency-sensitivity. Assume that the phonetic bias above is one of articulatory reduction. The more often a word is produced, i.e. the higher its token frequency, the more often its outputs are subjected to the reduction bias, the more the word category mean shifts towards a reduced pronunciation. This is Bybee's every-memory-mammary reduction pattern.
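The frequency effect falls out of the same toy loop if we let the bias be reductive and give two words different production rates. The numbers below are invented; only the direction of the outcome matters:

```python
import random

def reduce_once(cloud, bias=0.05):
    # Output the category mean minus an articulatory reduction bias
    # (think of the stored value as, e.g., vowel duration), then store it.
    cloud.append(sum(cloud) / len(cloud) - random.uniform(0.0, bias))

random.seed(1)
frequent = [100.0] * 10   # high token frequency word
rare     = [100.0] * 10   # low token frequency word
for step in range(1000):
    reduce_once(frequent)           # produced on every step
    if step % 20 == 0:
        reduce_once(rare)           # produced twenty times less often
# The oft-produced word has accumulated more reduction:
print(sum(frequent) / len(frequent) < sum(rare) / len(rare))  # True
```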

2.2.3. Imitation: recency. Assume exemplars are weighted by a factor which decays over time. Recent exemplars therefore have a stronger influence on production outputs than older exemplars, particularly if the target word is of low token frequency, because there are fewer countervailing exemplars within the word category to resist the recent exemplars' influence. This is the Goldinger effect.
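The interaction of recency weighting with token frequency can be made concrete as follows. Exponential decay is an assumption here; any decreasing weighting behaves similarly:

```python
def weighted_mean(values, decay=0.8):
    """Production target as a recency-weighted mean of exemplar values:
    the i-th most recent exemplar carries weight decay**i."""
    weights = [decay ** i for i in range(len(values))]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# One freshly imitated token (value 1.0) against older exemplars (0.0):
low_freq  = [1.0] + [0.0] * 3     # few countervailing exemplars
high_freq = [1.0] + [0.0] * 50    # many countervailing exemplars
# The recent token pulls the low-frequency word's output much harder:
print(weighted_mean(low_freq) > weighted_mean(high_freq))  # True
```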

2.2.4. Imitation: individual and sociophonetic variation. In pragmatic contexts where imitation of a particular individual or identification with a particular group is important to the speaker, exemplars tagged as productions of that individual, or other group members, can be up-weighted, resulting in a temporary shift in productions towards speech characteristic of that individual or group (see Stuart-Smith 2007). Similarly, hearers can tune their perception to speaker and sociophonetic variation, by up-weighting stored exemplars of the speaker, or of other speakers with similar social characteristics. The hearer can thus avail herself of all her stored exemplars, while giving additional weight to exemplars of speakers similar in gender, age, and other relevant characteristics to the speaker, and greatest weight to exemplars of the speaker him/herself, if previously experienced.

2.2.5. Generalization to other words. As noted above, any phonologically adequate production model must be able to access and compare portions of word exemplars, not just whole words, to one another. One basic thing that phonological knowledge allows us to do, as humans, is to produce words that we have never uttered before, e.g. repeating a word just learned from another speaker. At the point of hearing this new word (and recognizing it as such), the individual acquires an exemplar encoding an auditory experience of the word, but no corresponding articulatory experience. Without articulatory information for this word, no motor plan can be output to the speaker's vocal tract. This deficiency can only be overcome by generalizing: in exemplar terms, by assembling a novel motor plan for the target word based on portions of exemplars of other words with similar auditory targets. In fact, the generalization issue is pervasive in speech production. Consider production of a word in some previously unencountered syntactic or pragmatic context, e.g. where it is subject to some phrasal phonological pattern; where it receives contrastive stress; or where a whispered or shouted production of the word is felicitous. Again, an adequate production model needs to generate a composite output -- one that blends together exemplars of the target word with contextually appropriate subsequences from exemplars of other word categories. A computational technique for doing this is presented in section 3. In the meantime, let us assume that this technique allows us to compare, inter alia, segment-sized portions of exemplars to one another, and to pool similar exemplar portions in order to form generalizations. Thus, for all the word-level effects described above, it should be possible to model corresponding segment-level effects.

2.2.6. Phonologization as pattern entrenchment. A basic property of the production model is that it generates outputs based on mean properties of the relevant clouds of exemplars. Whatever the initial variance, generating exemplars in the neighbourhood of the cloud mean results in the cloud's distribution sharpening about the mean, i.e. progressively less variance. This result is called 'pattern entrenchment' in the exemplar literature.
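Entrenchment can be demonstrated with the same mean-plus-noise production loop used above: whatever the initial spread, iterated production sharpens the distribution. Again, the parameters are invented:

```python
import random
import statistics

def entrench(cloud, noise=0.01, steps=500):
    """Iterated production: each output is the current cloud mean plus
    small production noise, and is immediately stored back."""
    for _ in range(steps):
        cloud.append(statistics.mean(cloud) + random.gauss(0.0, noise))
    return cloud

random.seed(2)
cloud = [random.gauss(0.0, 1.0) for _ in range(20)]  # initially high variance
spread_before = statistics.stdev(cloud)
entrench(cloud)
spread_after = statistics.stdev(cloud[-100:])        # most recent outputs
print(spread_after < spread_before)  # True: the distribution has sharpened
```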

Putting the entrenchment idea together with the segment-level generalization story above, we must ask how conflict is resolved if a temporal portion of the exemplars of the target word exhibits one phonetic pattern, but the preponderance of similar portions of exemplars of other words exhibit a contrary pattern. Consider a pattern of affrication of /t/ before a high front vowel, presumably originating as a gradient pattern correlating with the release trajectory of the tongue blade and intra-oral air pressure at time of the release: lower velocity release and greater pressure would favour a greater degree of affrication. Assume the existence of a word whose exemplars happen not to conform to this pattern (e.g. [ati]), whereas most exemplars of many other words conform to the pattern ([botsi], [tsigama], [aratsiketo], etc.). In a Usage-Based approach, pattern strength is assumed to depend on a trade-off between similarity (inversely proportional to distance) and frequency. In the scenario sketched above, the exemplars within the target word cloud are all relatively similar to each other in their entirety, favouring an output which conforms to the word cloud pattern, i.e. [ati]. By comparison, the other exemplars containing a [tsi] sequence are not very similar to one another globally, and consequently they exert no unified pull on the output, except for the pattern itself. Considerations of global similarity therefore favour maintenance of the word-level pattern. On the other hand, the exemplars containing a [t(s)i] sequence are a proper superset (perhaps a large superset) of the exemplars of the target word. Considerations of frequency therefore favour extension of the affrication pattern to the output. The winning pattern would depend on the actual numbers in the exemplar corpus. If the target word has a high token frequency, it may resist the general pattern (cf. Bybee's 2001 observation that high token frequency licenses phonological and morphological exceptionality in words).3 Otherwise, the output will succumb to the affrication pattern, resulting in a new exemplar, [atsi], for this word. The mean for this word category accordingly shifts slightly in the direction of affrication. The affrication outcome in future productions of this word now has the combined pull of the external exemplars, and this internal exemplar as well, making further affrication outcomes increasingly likely for this word (assuming there are no relevant competing patterns), and eventually obligatory. This is lexical diffusion of the affrication pattern. Once the pattern reaches a critical mass among the words of the lexicon, the foregoing dynamics make it inevitable that the pattern will become obligatory for all words that meet its structural conditions, hence Neogrammarian regularity as the end stage of the sound change. Moreover, this generalization of the pattern is independent of its phonetic origin: rather than occurring to varying degrees depending on tongue blade trajectory and intra-oral pressure, we arrive at an allophone which is stably conditioned by a following high front vowel. This is phonologization.

Note, though, that phonologization of the pattern does not mean original phonetic considerations favouring affrication cease to exist. Phonologized patterns can take on a life of their own and develop in ways that considerations of phonetic naturalness would not predict; but phonetic naturalness considerations continue to shape our speech, so long as we speak with our mouths and hear with our ears. The result is a system in which sound patterns may become mildly unnatural, but not completely arbitrary.

2.2.7. Word vs. segment recency effects. Putting the phonological generalization story above together with recency weighting: a recency effect should be stronger if the recent exemplars and the target word are of the same type, and weaker if they only share similar portions, such as a segment: in the latter case, the recency effect will be diluted by the greater number of relevant non-recent exemplars. But with sufficiently high recency weighting, the recent exemplars could have an observable effect even on the production of different words containing the same segment, the experimental effect reported by Delvaux & Soquet (2007) discussed in section II. But how do we reconcile Goldinger's recency effect, which was limited to low-frequency words of the same type as the stimuli, with Delvaux & Soquet's recency effect, which extended beyond the stimulus word types? It may be significant that Delvaux & Soquet's effect reportedly persisted on the order of several minutes, whereas Goldinger's persisted for two weeks. In exemplar-based terms, the recency factor might decay to the point where the stimuli's effect becomes negligible at the segmental level, but still be strong enough to exert an effect at the whole-word level.

3 Bybee further claims that the propensity of a pattern to generalize depends on type frequency (how many word types instantiate the pattern). Type frequency is not directly computable in a pure exemplar-based model, as there are no types per se, only sets of tokens. Cliff and Kirchner (in progress), however, show that a type frequency effect emerges from an interaction between aggregate token frequency (the total number of tokens instantiating the pattern) and similarity. In brief, if the exemplars instantiating a pattern are phonetically diffuse, as would be the case if they are scattered over a large number of types, a target word need have little in common with each of the pattern-bearing exemplars in order to be subject to the pattern. If, however, the pattern-bearing exemplars are tightly clustered into a few types, the target word will either have to conform to one of the types in all respects, effectively becoming homophonous with it, or it will escape the pull of the pattern entirely.

2.2.8. Caveat. It must be borne in mind that the foregoing claims about the behaviour of an exemplar-based speech processing model are merely conjectures on what seem to be likely outcomes, given input patterns with certain characteristics. Most assessments of exemplar theory's capacities, particularly in the experimental literature, unfortunately remain at this impressionistic level. Clearly, the phonological results of an approach involving computation of numeric similarities over various differentially weighted clouds of exemplars cannot be demonstrated with any rigour in the absence of an explicit computational model, implemented and tested on real speech data sets.

2.3. Treatment of the time dimension

Exemplar theory's development, however, has suffered from the lack of an explicit model capable of applying to real speech. The problem is that speech is variable-length time-series data. A slow-speech exemplar of a word may be considerably longer than a fast-speech exemplar. The discussion of exemplar-based comparison above assumed that it was possible to calculate the distances among a set of exemplars. But how does one calculate a distance between items of unequal length?

The recognition side of the model is not the central problem. A number of exemplar-based recognition models have been put forward, from Johnson's (1997) original X-Mod to the large-vocabulary continuous automatic speech recognition system of DeWachter 2007. A recognition model merely needs to assign a category label (or a sequence thereof) to an input signal based on its similarity to the variously labelled speech exemplars in memory; that is, it matches whole-word inputs against whole-word exemplars. This distance calculation can be done optimally using classic dynamic time warping (DTW), a well-understood computational technique for aligning two variable-length signals, locally stretching or shrinking subsequences within one to best fit the other (see generally Sankoff & Kruskal 1983).
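The standard DTW recurrence is compact. The sketch below uses one-dimensional 'frames' and absolute difference as the frame distance; real systems would use spectral frames and a multidimensional distance:

```python
from math import inf

def dtw(a, b, d=lambda x, y: abs(x - y)):
    """Classic dynamic time warping: minimum cumulative frame distance
    over all monotonic alignments of two variable-length signals."""
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost[i][j] = d(a[i - 1], b[j - 1]) + min(
                cost[i - 1][j],       # one b frame aligned to several a frames
                cost[i][j - 1],       # one a frame aligned to several b frames
                cost[i - 1][j - 1])   # advance both signals
    return cost[-1][-1]

# A slow-speech token as a stretched version of a fast-speech token:
fast = [0.0, 1.0, 2.0, 1.0, 0.0]
slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
print(dtw(fast, slow))  # 0.0: warping absorbs the length difference
```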

Most of the phonologically interesting attributes exemplar theory, however, pertain to the production side of the model. Production involves a harder problem: generation of a concrete output signal from a target word category (or a sequence thereof). We have already observed that an adequate exemplar- based processing model must be able to identify patterns obtaining over portions of exemplars. Thus, we have the further problem of deciding how to identify and compare similar portions of different exemplars, which may likewise be of variable length, and may begin and end at different points in relation to the start or end of the exemplar. Pierrehumbert's (2001) exemplar model deals only with static, fixed-dimensional data, and so does not address the variable-length problem, in either recognition or production. It is not clear how Pierrehumbert's model might be extended to real speech. To recap, the production system needs to be able to generalize, but how can it generalize over a collection of unique, variable-length speech signals?

One response to this problem, adopted (but not computationally fleshed out) in Pierrehumbert 2002 and Wedel 2004, is to appeal to less time-variable units, such as segments. Segments can be characterized, albeit crudely, in terms of relatively static phonetic targets. Thus, if our exemplar system parses signals into segment as well as word categories, we can pool together all exemplars of, e.g. /s/, reduce these to fixed-dimensional vectors representing the phone 'target' (perhaps with contextual target measurements as well), abstracting away from temporal variation within the exemplars. We can now generate an output based on an average of these fixed-dimensional vector values. However, this move raises the non-trivial problem of how these categories come to be established. Moreover, this segmentation into a priori phonological units seems contrary to the spirit of exemplar theory. Phonological units such as segments are simply local patterns obtaining over speech signals, involving relatively stable correlations between auditory cues and articulatory gestures. Such units, like all phonological patterns, should emerge bottom-up from comparison over the exemplars, rather than being treated as primitives. Moreover, this approach fails to do justice to the rich dynamic structure of speech.
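For concreteness, the fixed-dimensional move just described might look like the following toy sketch. The reduction of each exemplar to its temporal midpoint frame is an invented stand-in for a proper target estimate:

```python
def segment_target(exemplars):
    """Reduce each exemplar of a segment (a variable-length sequence of
    frames) to a fixed-dimensional 'target' -- here simply the temporal
    midpoint frame -- then average the targets for production."""
    targets = [ex[len(ex) // 2] for ex in exemplars]
    n, dims = len(targets), len(targets[0])
    return tuple(sum(t[d] for t in targets) / n for d in range(dims))

# Three /s/-like exemplars of different durations; frames are 2-D vectors:
s_cloud = [[(7.0, 1.0), (8.0, 2.0), (7.0, 1.0)],
           [(8.0, 2.0)],
           [(7.5, 1.5), (8.5, 2.5)]]
print(segment_target(s_cloud))  # averaged midpoint target
```

Note how the temporal structure of the longer exemplars is simply discarded; this is the loss of dynamic detail objected to above.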

Rather than segmenting the dynamic signal into quasi-static chunks, one might adopt a dynamic computational technique ab initio. DTW, useful for calculating distances between whole exemplars for recognition purposes, can also be used, with certain enrichments, to solve the problem of identifying similar portions within exemplars, without resort to a priori phonological unit categories. The remainder of this chapter describes the Phonological Exemplar-Based Learning System (PEBLS) model of Kirchner & Moore (to appear).

3. PEBLS

To generate an output for a given word, PEBLS begins, as in Pierrehumbert's model, by randomly selecting an exemplar from this word class for use as the input. The production problem can now be cast as finding an optimal alignment between the input and the word cloud. That is, the output is constructed from subsequences of the cloud exemplars which more-or-less correspond to subsequences of the input, and which more-or-less reflect typical subsequences (i.e. generalizations) within the cloud, as schematically represented in Figure 1.

Figure 1: Output as alignment of input with cloud. Numbers indicate corresponding subsequences within the input and cloud, and the concatenation of these subsequences which forms the output. Letters show the particular exemplar from which each output subsequence was taken.

The challenge lies in specifying an alignment criterion that can find these subsequences. PEBLS builds upon the DTW technique, with two particular innovations. Firstly, whereas DTW aligns a whole signal to another whole signal, PEBLS allows alignment of any frame4 of the input with any frame of any exemplar within the cloud -- transitioning forward or backward in time within any given exemplar, or from part of one exemplar to another. This move permits PEBLS in principle to find alignments of subsequences of one exemplar with subsequences of another exemplar, as suggested in Figure 1 – that is, to pool data on a less-than-whole-exemplar basis. Intuition suggests, though, that some transitions are more permissible than others, namely transitions similar to those instantiated within the cloud. To compute this permissibility, an intra-cloud transition network is constructed: a similarity matrix of the entire cloud to itself, offset by one frame. Cell (i,j) of this matrix thus encodes not the similarity of frame i to j, but the similarity of i to the frame that immediately precedes j. By means of this transition network, PEBLS takes into account not only how the input aligns with each exemplar in the cloud, but how the cloud aligns with itself -- getting emergent structure from self-similarity within the data.
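The intra-cloud transition network can be sketched as follows. The 1/(1+d) Euclidean similarity over toy one-dimensional frames is my stand-in for the cepstral-frame similarity PEBLS actually computes:

```python
from math import dist

def transition_network(cloud):
    """Similarity matrix of all cloud frames against all cloud frames,
    offset by one: cell (i, j) holds the similarity of frame i to the
    frame immediately preceding frame j within its exemplar (0.0 where
    frame j is exemplar-initial and has no predecessor)."""
    frames, prev = [], []
    for exemplar in cloud:
        for k, frame in enumerate(exemplar):
            prev.append(len(frames) - 1 if k > 0 else None)
            frames.append(frame)
    sim = lambda x, y: 1.0 / (1.0 + dist(x, y))
    return [[sim(fi, frames[p]) if p is not None else 0.0 for p in prev]
            for fi in frames]

# Two identical two-frame exemplars:
cloud = [[(0.0,), (1.0,)], [(0.0,), (1.0,)]]
T = transition_network(cloud)
print(T[0][1], T[1][1])  # 1.0 0.5
```

High values of T[i][j] mark transitions from frame i to frame j that resemble transitions actually instantiated somewhere in the cloud; this is the sense in which the network extracts emergent structure from self-similarity within the data.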

Secondly, whereas DTW simply finds the minimum-distance alignment, PEBLS requires an alignment that generalizes, reflecting frame sequences which are in some sense prototypical of the cloud. To capture the generalization effect, the alignment criterion must incorporate some measure of the frequency of similar subsequences within the cloud. This problem is analogous to the statistical notion of confidence that a particular sample reflects the distribution of an underlying population. This confidence-sensitivity can be obtained by hierarchically clustering the vector of alignment scores from the previous frame at each dynamic programming step, and selecting the cluster that maximizes a function of the cluster's mean similarity, size, and variance. The criterion thus involves a potential trade-off between similarity and density (i.e. size over variance): a high-similarity but atypical alignment may lose to a somewhat lower-similarity alignment if drawn from a much higher-density cluster. This is PEBLS' implementation of the similarity-frequency trade-off for assessing the relative strength of competing patterns, discussed in sec. 2.2.6.

4 That is, the signal is preprocessed into a spectrographic or quasi-spectrographic representation (in Kirchner & Moore (to appear), actually a mel-frequency cepstral representation), where the representation consists of a sequence of frames, each frame representing an analysis of the acoustic signal during a fixed-width time window. The similarity calculation used throughout PEBLS is based on Euclidean distance between frames.
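The similarity-density trade-off can be illustrated with an invented scoring function. The actual PEBLS criterion is a function of the same three quantities (mean similarity, size, variance), but its exact form is not reproduced here:

```python
import statistics

def cluster_score(sims):
    """Toy cluster score: mean similarity times density, where density
    is cluster size over (1 + variance)."""
    return (statistics.mean(sims) * len(sims)
            / (1.0 + statistics.pvariance(sims)))

# A lone high-similarity but atypical alignment loses to a large,
# tight cluster of somewhat lower-similarity alignments:
outlier = [0.95]
dense = [0.70, 0.71, 0.69, 0.70, 0.72, 0.70, 0.69, 0.71]
print(cluster_score(outlier) < cluster_score(dense))  # True
```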

Kirchner and Moore (to appear) report that PEBLS, trained on a small corpus of recordings of short nonce words, generated outputs that reflect pattern entrenchment. That is, outputs for words tended to conform to the prevailing pattern within the word cloud, even when a pattern-violating input was selected. Moreover, when the model was applied iteratively, pattern-violating outputs became increasingly rare, eventually ceasing altogether. The model thus showed a diachronic progression from a variable to an obligatory pattern.
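The entrenchment dynamic can be illustrated abstractly. The toy loop below uses a single numeric dimension in place of real speech signals, with density-weighted production and memory decay; it is not PEBLS itself, merely a sketch of the same drift from a variable pattern toward an obligatory one.

```python
import numpy as np

def entrenchment_sim(n_iter=300, seed=1):
    """Toy production-perception loop. The cloud starts 80% pattern-
    conforming (values near 0.0) and 20% pattern-violating (near 1.0).
    Each cycle produces an output biased toward denser regions of the
    cloud, stores it as a new exemplar, and forgets the oldest one."""
    rng = np.random.default_rng(seed)
    cloud = list(rng.permutation(np.concatenate([
        rng.normal(0.0, 0.1, 80),      # prevailing pattern
        rng.normal(1.0, 0.1, 20),      # pattern violators
    ])))
    for _ in range(n_iter):
        arr = np.array(cloud)
        # local density of each exemplar under a Gaussian kernel:
        # denser (more entrenched) patterns are produced more often
        dens = np.exp(-(arr[:, None] - arr[None, :]) ** 2
                      / (2 * 0.2 ** 2)).sum(axis=1)
        x = rng.choice(arr, p=dens / dens.sum())
        cloud.append(x + rng.normal(0.0, 0.05))  # store output plus noise
        cloud.pop(0)                             # memory decay
    return np.array(cloud)

# over iterations, pattern-violating exemplars typically become rarer
final = entrenchment_sim()
```

Because production probability tracks local exemplar density, the minority pattern is produced less often than it is forgotten, and the variable pattern becomes (near-)obligatory.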

4. Conclusions

PEBLS provides a solution (though perhaps better solutions remain to be discovered) to the modelling problem which has hindered the development of exemplar theory, namely how to generate a composite output from a collection of unique, variable-length signals. PEBLS further provides the first explicit model of exemplar-based pattern entrenchment using real speech signals. Many of the conjectured capacities of exemplar-based phonology (section 2.2) remain to be established, such as modelling of recency effects, generalization of patterns outside the word class, modelling of sociophonetic variation, and modelling of top-down semantic and pragmatic effects.

Inasmuch as PEBLS computes a global optimization for the output, there exist deep parallels to Optimality Theory (or more directly, to Harmonic Grammar). The alignment process described above is analogous to OT enforcement of correspondence constraints. A more elaborated version of PEBLS would include soft constraints reflecting phonetic pressures as part of the optimization criterion, e.g. an energy minimization imperative. In PEBLS then, as in OT, phonological patterns would arise from conflict between constraints favouring current patterns (including patterns within the word-class, as with IO-faithfulness), and constraints favouring phonetic naturalness. PEBLS, however, computes over numeric signals rather than symbolic representations.
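The envisioned constraint interaction can be sketched as a Harmonic-Grammar-style weighted sum over candidates. Everything here is hypothetical: the candidate set, the similarity and energy measures, and the weight lam are stand-ins, not part of PEBLS as reported.

```python
import numpy as np

def select_output(candidates, pattern_sims, energies, lam=0.5):
    """Pick the candidate with maximal harmony: faithfulness to
    entrenched patterns (alignment similarity) minus a weighted
    phonetic cost, here an energy-minimization term."""
    harmony = np.asarray(pattern_sims) - lam * np.asarray(energies)
    return candidates[int(np.argmax(harmony))]

# A faithful but effortful candidate can lose to a slightly less
# faithful, articulatorily cheaper one when the phonetic constraint
# carries enough weight -- e.g. a lenition-style outcome:
winner = select_output(['[ata]', '[ada]'], [0.9, 0.8], [1.0, 0.2], lam=0.5)
```

Raising or lowering lam shifts the balance between pattern faithfulness and phonetic naturalness, mirroring constraint reranking (or reweighting) in OT and Harmonic Grammar.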

This approach is presented as a possible way forward for phonology, addressing the legitimate critique of Evolutionary Phonology by abandoning symbolically stated (therefore pseudo-phonetic) markedness constraints. At the same time, this exemplar-theoretic approach does not relegate markedness concerns to unformalized meta-theory, but rather seeks to model markedness effects explicitly, through interaction between direct phonetic constraints and the pattern-entrenching dynamics of an exemplar-based speech processing system.

References

Beddor, P. (2007) Nasals and nasalization: the relation between segmental and coarticulatory timing. Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken.

Beddor, P., A. Brasher & C. Narayan (2007) Applying perceptual methods to the study of phonetic variation and sound change. In M. Solé, P. Beddor & M. Ohala (eds.) Experimental approaches to phonology. Oxford University Press. 127-143.

Boersma, P. (1998) Functional phonology. Amsterdam: Landelijke Onderzoekschool Taalwetenschap.

Bybee, J. (2001) Phonology and language use. Cambridge University Press.

Cliff, E. & R. Kirchner (in progress) An exemplar-based account of type-frequency effects in pattern generalization. Ms. U. Alberta.

Cohn, A. (2006) Is there gradient phonology? In G. Fanselow, C. Fery, M. Schlesewsky & R. Vogel (eds.) Gradience in grammar: generative perspectives. Oxford University Press, 25-44.

Crosswhite, K. (2004) Vowel reduction. In Hayes et al. 2004, chapter 7.

Delvaux, V. & A. Soquet (2007) The influence of ambient speech on adult speech production through unintentional imitation. Phonetica 64, 145-173.

De Wachter, M. (2007) Example based continuous speech recognition. Doctoral dissertation, Katholieke Universiteit Leuven.

Flemming, E. (1995) Auditory representations in phonology. Doctoral dissertation, UCLA.

Gahl, S. (2008) Time and Thyme are not homophones: The effect of lemma frequency on word durations in spontaneous speech. Language 84:3, 474-496.

Gahl, S. & A. Yu (2006) Introduction to the special issue on exemplar-based models in linguistics. The Linguistic Review, 23:3, 213.

Giannelli, L. & L. Savoia (1979) Indebolimento consonantico in Toscana. Rivista Italiana di Dialettologia 2, 23-58.

Goldinger, S. D. (1996) Words and voices: Episodic traces in spoken word identification and recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1166-1183.

Goldinger, S. D. (2000) The role of perceptual episodes in lexical processing. In A. Cutler, J.M. McQueen, and R. Zondervan (eds.) Proceedings of SWAP (Spoken Word Access Processes), Nijmegen, Max Planck Institute for Psycholinguistics. 155-159.

Hayes, B., R. Kirchner & D. Steriade (2004) Phonetically based phonology. Cambridge University Press.

Hyman, L. (1975) Phonology: theory and analysis. NY: Holt, Rinehart & Winston.

Johnson, K. (1997) Speech perception without speaker normalization: an exemplar model. In K. Johnson & J. Mullennix (eds.) Talker variability in speech processing. San Diego: Academic Press.

Kirchner, R. (1999) Preliminary thoughts on phonologization within an exemplar-based speech- processing system, UCLA Working Papers in Linguistics (Papers in Phonology 2), M. Gordon, ed., 1, 205-231.

Kirchner, R. (2004) Consonant Lenition. In Hayes et al. (2004), chapter 10.

Kirchner, R. & R.K. Moore (to appear) Computing phonological generalization over real speech exemplars. Journal of Phonetics [available as ROA 1007-1208].

Lindblom, B. (1963) Spectrographic study of vowel reduction. JASA 35, 1773-1781.

Mitchell, T. (1997) Machine Learning. McGraw-Hill.

Moore, R. K. (2007) Spoken language processing: piecing together the puzzle. Speech Communication (Special Issue on Bridging the Gap Between Human and Automatic Speech Processing) 49, 418-435.

Moore, R.K. & V. Maier (2007) Preserving fine phonetic detail using episodic memory: automatic speech recognition with MINERVA2. Proc. ICPhS, Saarbrücken.

Pierrehumbert, J. (1994) Knowledge of variation. In K. Beals, J. Denton, R. Knippen, L. Melnar, H. Suzuki & E. Zeinfeld (eds.) Papers from the 30th Meeting of the Chicago Linguistics Society. Vol 2. Papers from the parasession on variation, 232-256.

Pierrehumbert, J. (2001) Exemplar dynamics: word frequency, lenition, and contrast. In J. Bybee & P. Hopper (eds.) Frequency effects and the emergence of linguistic structure, 137-157. Amsterdam: John Benjamins.

Pierrehumbert, J. (2002) Word-specific phonetics. In Carlos Gussenhoven & Natasha Warner (eds.), Papers in Laboratory Phonology VII, Berlin: Mouton de Gruyter. 101-140.

Pierrehumbert, J., M. Beckman, D. Ladd (2000) Conceptual Foundations of Phonology as a Laboratory Science. In N. Burton-Roberts, P. Carr, & G. Docherty (eds.) Phonological Knowledge: Its Nature and Status, 273-303. Cambridge University Press.

Port, R. (2007) How are words stored in memory? Beyond phones and phonemes. New Ideas in Psychology 25, 143-170.

Sankoff, D. & J. Kruskal (1983) Time warps, string edits and macromolecules. CSLI Publications.

Stuart-Smith, J. (2007) A sociophonetic investigation of postvocalic /r/ in Glaswegian adolescents. Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken, 1307.

Tilsen, S. (2009) Multitimescale dynamical interactions between speech rhythm and gesture. Cognitive Science 33, 839-879.

Warner, N., A. Jongman, J. Sereno, R. Kemps (2004) Incomplete neutralization and other sub-phonemic durational differences in production and perception: evidence from Dutch. Journal of Phonetics 32, 251-276.

Wedel, A. (2004) Self-organization and categorical behavior in phonology. PhD dissertation, UC Santa Cruz.