
A Generative Model of Phonotactics

Richard Futrell, Brain and Cognitive Sciences, Massachusetts Institute of Technology ([email protected])
Adam Albright, Department of Linguistics, Massachusetts Institute of Technology ([email protected])

Peter Graff, Intel Corporation ([email protected])
Timothy J. O'Donnell, Department of Linguistics, McGill University ([email protected])

Transactions of the Association for Computational Linguistics, vol. 5, pp. 73–86, 2017. Action Editor: Eric Fosler-Lussier. Submission batch: 8/2016; Revision batch: 11/2016; Published 2/2017. © 2017 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Abstract

We present a probabilistic model of phonotactics, the set of well-formed phoneme sequences in a language. Unlike most computational models of phonotactics (Hayes and Wilson, 2008; Goldsmith and Riggle, 2012), we take a fully generative approach, modeling a process where forms are built up out of subparts by phonologically-informed structure building operations. We learn an inventory of subparts by applying stochastic memoization (Johnson et al., 2007; Goodman et al., 2008) to a generative process for phonemes structured as an and-or graph, based on concepts of feature hierarchy from generative phonology (Clements, 1985; Dresher, 2009). Subparts are combined in a way that allows tier-based feature interactions. We evaluate our models' ability to capture phonotactic distributions in the lexicons of 14 languages drawn from the WOLEX corpus (Graff, 2012). Our full model robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model for all languages. We also present novel analyses that probe model behavior in more detail.

1 Introduction

People have systematic intuitions about which sequences of sounds would constitute likely or unlikely words in their language: Although blick is not an English word, it sounds like it could be, while bnick does not (Chomsky and Halle, 1965). Such intuitions reveal that speakers are aware of the restrictions on sound sequences which can make up possible morphemes in their language—the phonotactics of the language. Phonotactic restrictions mean that each language uses only a subset of the logically, or even articulatorily, possible strings of phonemes. Admissible phoneme combinations, on the other hand, typically recur in multiple morphemes, leading to redundancy.

It is widely accepted that phonotactic judgments may be gradient: the nonsense word blick is better as a hypothetical English word than bwick, which is better than bnick (Hayes and Wilson, 2008; Albright, 2009; Daland et al., 2011). To account for such graded judgements, there have been a variety of probabilistic (or, more generally, weighted) models proposed to handle phonotactic learning and generalization over the last two decades (see Daland et al. (2011) and below for review). However, inspired by optimality-theoretic approaches to phonology, the most linguistically informed and successful such models have been constraint-based—formulating the problem of phonotactic generalization in terms of restrictions that penalize illicit combinations of sounds (e.g., ruling out *bn-).

In this paper, by contrast, we adopt a generative approach to modeling phonotactic structure. Our approach harkens back to early work on the sound structure of lexical items which made use of morpheme structure rules or conditions (Halle, 1959; Stanley, 1967; Booij, 2011; Rasin and Katzir, 2014).


Such approaches explicitly attempted to model the redundancy within the set of allowable lexical forms in a language. We adopt a probabilistic version of this idea, conceiving of the phonotactic system as the component of the linguistic system which generates the phonological form of lexical items such as words and morphemes.[1] Our system learns inventories of reusable phonotactically licit structures from existing lexical items, and assembles new lexical items by combining these learned phonotactic patterns using phonologically plausible structure-building operations. Thus, instead of modeling phonotactic generalizations in terms of constraints, we treat the problem as one of learning language-specific inventories of phonological units and language-specific biases on how these phones are likely to be combined.

[1] Ultimately, we conceive of phonotactics as the module of phonology which generates the underlying forms of lexical items, which are then subject to phonological transformations (i.e., transductions). In this work, however, we do not attempt to model transformations from underlying to surface forms.

Although there have been a number of earlier generative models of phonotactic structure (see Section 4), these models have mostly used relatively simplistic or phonologically implausible representations of phones and phonological structure-building. By contrast, our model is built around three representational assumptions inspired by the generative phonology literature. First, we capture sparsity in the space of feature-specifications of phonemes by using feature dependency graphs—an idea inspired by work on feature geometries and the contrastive hierarchy (Clements, 1985; Dresher, 2009). Second, our system can represent phonotactic generalizations not only at the level of fully specified segments, but also allows the storage and reuse of subsegments, inspired by the autosegments and class nodes of autosegmental phonology. Finally, also inspired by autosegmental phonology, we make use of a structure-building operation which is sensitive to tier-based contextual structure.

To model phonotactic learning, we make use of tools from Bayesian nonparametric statistics. In particular, we make use of the notion of lexical memoization (Michie, 1968; Goodman et al., 2008; Wood et al., 2009; O'Donnell, 2015)—the idea that language-specific generalizations can be captured by the storage and reuse of frequent patterns from a linguistically universal inventory. In our case, this amounts to the idea that an inventory of segments and subsegments can be acquired by a learner that stores and reuses commonly occurring segments in particular, phonologically relevant contexts. In short, we view the problem of learning the phoneme inventory as one of concentrating probability mass on the segments which have been observed before, and the problem of phonotactic generalization as learning which (sub-)segments are likely in particular tier-based phonological contexts.

2 Model Motivations

In this section, we give an overview of how our model works and discuss the phenomena and theoretical ideas that motivate it.

2.1 Feature Dependency Graphs

Most formal models of phonology posit that segments are grouped into sets, known as natural classes, that are characterized by shared articulatory and acoustic properties, or phonological features (Trubetzkoy, 1939; Jakobson et al., 1952; Chomsky and Halle, 1968). For example, the segments /n/ and /m/ are classified with a positive value of a nasality feature (i.e., NASALITY:+). Similarly, /m/ and /p/ can be classified using the labial value of a PLACE feature, PLACE:labial. These features allow compact description of many phonotactic generalizations.[2]

[2] For compatibility with the data sources used in evaluation (Section 5.2), the feature system we use here departs in several ways from standard feature sets: (1) We use multivalent rather than binary-valued features. (2) We represent manner with a single feature, which has values such as vocalic, stop, and fricative. This approach allows us to refer to manners more compactly than in systems that employ combinations of features such as sonorant, continuant, and consonantal. For example, rather than referring to vowels as 'non-syllabic', we refer to them using the feature value vocalic for the feature MANNER.

From a probabilistic structure-building perspective, we need to specify a generative procedure which assembles segments out of parts defined in terms of these features. In this section, we will build up such a procedure starting from the simplest possible procedure and progressing towards one which is more phonologically informed.
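To make this representation concrete, here is a minimal Python sketch of segments as bundles of multivalent feature-value pairs, with natural classes picked out by shared values. It is our own illustration rather than code from the paper, and the particular segments and feature specifications shown are assumptions for exposition only.

```python
# Segments as bundles of multivalent feature-value pairs (cf. footnote 2);
# these specifications are illustrative, not the actual WOLEX feature set.
SEGMENTS = {
    "m": {"NASALITY": "+", "PLACE": "labial"},
    "n": {"NASALITY": "+", "PLACE": "alveolar"},
    "p": {"NASALITY": "-", "PLACE": "labial", "MANNER": "stop"},
    "t": {"NASALITY": "-", "PLACE": "alveolar", "MANNER": "stop"},
    "a": {"MANNER": "vocalic", "HEIGHT": "low"},
}

def natural_class(feature, value):
    """Return the set of segments sharing a given feature value."""
    return {seg for seg, feats in SEGMENTS.items() if feats.get(feature) == value}

print(natural_class("NASALITY", "+"))    # the nasals: {'m', 'n'}
print(natural_class("PLACE", "labial"))  # the labials: {'m', 'p'}
```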


We will clarify the generative process here using an analogy to PCFGs, but this analogy will break down in later sections.

The simplest procedure for generating a segment from features is to specify each feature independently. For example, consider the set of feature-value pairs for /t/: {NASALITY:-, PLACE:alveolar, ...}. In a naive generative procedure, one could generate an instance of /t/ by independently choosing values for each feature in the set {NASALITY, PLACE, ...}. We express this process using the and-or graph notation below. Box-shaped nodes—called or-nodes—represent features such as NASALITY, while circular nodes represent groups of features whose values are chosen independently and are called and-nodes.

[Diagram: an and-node dominating the or-nodes NASALITY, ..., PLACE.]

This generative procedure is equivalent (ignoring order) to a PCFG with rules:

SEGMENT → NASALITY ... PLACE
NASALITY → +
NASALITY → -
PLACE → bilabial
PLACE → alveolar
...

Not all combinations of feature-value pairs correspond to possible phonemes. For example, while /l/ is distinguished from other consonants by the feature LATERAL, it is incoherent to specify vowels as LATERAL. In order to concentrate probability mass on real segments, our process should optimally assign zero probability mass to these incoherent phonemes. We can avoid specifying a LATERAL feature for vowels by structuring the generative process as below, so that the LATERAL or-node is only reached for consonants:

[Diagram: an or-node VOCALIC with arcs labeled consonant and vowel leading to and-nodes A and B; A dominates LATERAL, ...; B dominates HEIGHT, ....]

Beyond generating well-formed phonemes, a basic requirement of a model of phonotactics is that it concentrates mass only on the segments in a particular language's segment inventory. For example, the model of English phonotactics should put zero or nominal mass on any sequence containing the segment /x/, although this is a logically possible phoneme. So our generative procedure for a phoneme must be able to learn to generate only the licit segments of a language, given some probability distributions at the and- and or-nodes. For this task, independently sampling values at and-nodes does not give us a way to rule out particular combinations of features such as those forming /x/.

Our approach to this problem uses the idea of stochastic memoization (or adaptation), in which the results of certain computations are stored and may be probabilistically reused "as wholes," rather than recomputed from scratch (Michie, 1968; Goodman et al., 2008). This technique has been applied to the problem of learning lexical items at various levels of linguistic structure (de Marcken, 1996; Johnson et al., 2007; Goldwater, 2006; O'Donnell, 2015). Given our model so far, applying stochastic memoization is equivalent to specifying an adaptor grammar over the PCFGs described so far.

Let f be a stochastic function which samples feature values using the and-or graph representation described above. We apply stochastic memoization to each node. Following Johnson et al. (2007) and Goodman et al. (2008), we use a distribution for probabilistic memoization known as the Dirichlet Process (DP) (Ferguson, 1973; Sethuraman, 1994). Let mem{f} be a DP-memoized version of f. The behavior of a DP-memoized function can be described as follows. The first time we invoke mem{f}, the feature specification of a new segment will be sampled using f. On subsequent invocations, we either choose a value from among the set of previously sampled values (a memo draw), or we draw a new value from f (a base draw). The probability of sampling the i-th old value in a memo draw is n_i / (N + θ), where N is the number of tokens sampled so far, n_i is the number of times that value i has been used in the past, and θ > 0 is a parameter of the model. A base draw happens with probability θ / (N + θ). This process induces a bias to reuse items from f which have been frequently generated in the past.

We apply mem recursively to the sampling procedure for each node in the feature dependency graph.
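The memo-draw/base-draw dynamic can be sketched directly in code. The following is our own minimal illustration of DP memoization of a stochastic function, not the authors' implementation; it collapses the full Chinese Restaurant Process bookkeeping used for inference into simple per-value counts.

```python
import random
from collections import Counter

class DPMem:
    """A DP-memoized version of a stochastic function f, i.e. mem{f}."""

    def __init__(self, f, theta=1.0):
        self.f = f               # base sampling procedure
        self.theta = theta       # concentration parameter, theta > 0
        self.counts = Counter()  # n_i: how often each value has been reused
        self.total = 0           # N: total number of tokens sampled so far

    def sample(self):
        # Memo draw with probability N / (N + theta): reuse an old value,
        # choosing value i with probability proportional to n_i.
        if self.total > 0 and random.random() < self.total / (self.total + self.theta):
            value = random.choices(list(self.counts),
                                   weights=list(self.counts.values()))[0]
        else:
            # Base draw with probability theta / (N + theta): recompute from f.
            value = self.f()
        self.counts[value] += 1
        self.total += 1
        return value

# A naive base procedure: independently chosen feature values.
def base_segment():
    return (random.choice(["+", "-"]),                       # NASALITY
            random.choice(["labial", "alveolar", "velar"]))  # PLACE

mem_segment = DPMem(base_segment, theta=1.0)
print([mem_segment.sample() for _ in range(10)])  # frequently drawn values recur
```

Wrapping the sampler at every node of the feature dependency graph in this way is what allows frequently generated segments and subsegments to accumulate probability mass.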


The more times that we use some particular set of features under a node to generate words in a language, the more likely we are to reuse that set of features in the future in a memo draw. This dynamic leads our model to rapidly concentrate probability mass on the subset of segments which occur in the inventory of a language.

2.2 Class Node Structure

Our use of and-or graphs and lexical memoization to model inter-feature dependencies is inspired by work in phonology on distinctiveness and markedness hierarchies (Kean, 1975; Berwick, 1985; Dresher, 2009). In addition to using feature hierarchies to delineate possible segments, the literature has used these structures to designate bundles of features that have privileged status in phonological description, i.e. feature geometries (Clements, 1985; Halle, 1995; McCarthy, 1988). For example, many analyses group features concerning laryngeal states (e.g., VOICE, ASPIRATION, etc.) under a laryngeal node, which is distinct from the node containing oral place-of-articulation features (Clements and Hume, 1995). These nodes are known as class nodes. In these analyses, features grouped together under the laryngeal class node may covary while being independent of features grouped under the oral class node.

The lexical memoization technique discussed above captures this notion of class node directly, because the model learns an inventory of subsegments under each node.

Consider the feature dependency graph below.

[Diagram: an and-node A dominating an and-node B (with children NASALITY, VOICE, ...) and an or-node VOCALIC with arcs labeled consonant and vowel; under the vowel arc, a class node C dominates BACKNESS, HEIGHT, ....]

In this graph, the and-node A generates fully specified segments. And-node B can be thought of as generating the non-oral properties of a segment, including voicing and nasality. And-node C is a class node bundling together the oral features of vowel segments.

The features under B are outside of the VOCALIC node, so these features are specified for both consonant and vowel segments. This allows combinations such as voiced nasal consonants, and also rarer combinations such as unvoiced nasal vowels. Because all and-nodes are recursively memoized, our model is able to bind together particular non-oral choices (node B), learning for instance that the combination {NASALITY:+, VOICED:+} commonly recurs for both vowels and consonants in a language. That is, {NASALITY:+, VOICED:+} becomes a high-probability memo draw.

Since the model learns an inventory of fully specified segments at node A, the model could learn one-off exceptions to this generalization as well. For example, it could store at a high level a segment with {NASALITY:+, VOICED:-} along with some other features, while maintaining the generalization that {NASALITY:+, VOICED:+} is highly frequent in base draws. Language-specific phoneme inventories abound with such combinations of class-node-based generalizations and idiosyncrasies. By using lexical memoization at multiple different levels, our model can capture both the broader generalizations described in class node terminology and the exceptions to those generalizations.

2.3 Sequential Structure as Memoization in Context

In Section 2.2, we focused on the role that features play in defining a language's segment inventory. We gave a phonologically-motivated generative process, equivalent to an adaptor grammar, for phonemes in isolation. However, features also play an important role in characterizing licit sequences. We model sequential restrictions as context-dependent segment inventories. Our model learns a distribution over segments and subsegments conditional on each preceding sequence of (sub)segments, using lexical memoization. Introducing context-dependence means that the model can no longer be formulated as an adaptor grammar.
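The idea of context-dependent inventories amounts to keeping a separate memoized distribution for each conditioning context. The sketch below is our own illustration, with invented names and a deliberately simplified, backoff-free scheme; the model's actual formulation, with hierarchical backoff over contexts, is given in Section 3.

```python
import random
from collections import Counter, defaultdict

class ContextualMem:
    """Per-context DP-style memoization: one 'restaurant' per preceding context."""

    def __init__(self, base, theta=1.0):
        self.base = base                    # base sampling procedure
        self.theta = theta
        self.counts = defaultdict(Counter)  # context -> value counts

    def sample(self, context):
        table = self.counts[context]
        total = sum(table.values())
        if total > 0 and random.random() < total / (total + self.theta):
            # memo draw: reuse a value previously generated in this context
            value = random.choices(list(table), weights=list(table.values()))[0]
        else:
            value = self.base()             # simplified: no backoff, straight to base
        table[value] += 1
        return value

# Example: a segment inventory conditioned on the single preceding segment.
base = lambda: random.choice("ptkaiu")
model = ContextualMem(base)
word = ["<s>"]
for _ in range(6):
    word.append(model.sample(context=word[-1]))
print("".join(word[1:]))
```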


2.4 Tier-based Interaction

One salient property of sequential restrictions in phonotactics is that segments are often required to bear the same feature values as nearby segments. For example, a sequence of a nasal and a following stop must agree in place features at the end of a morpheme in English. Such restrictions may even be non-local. For example, many languages prefer combinations of vowels that agree in features such as HEIGHT, BACKNESS, or ROUNDING, even across arbitrary numbers of intervening consonants (i.e., vowel harmony).

One way to describe these sequential feature interactions is to assume that feature values of one segment in a word depend on values for the same or closely related features in other segments. This is accomplished by dividing segments into subsets (such as consonants and vowels), called tiers, and then making a segment's feature values preferentially dependent on the values of other segments on the same tier.

Such phonological tiers are often identified with class nodes in a feature dependency graph. For example, a requirement that one vowel identically match the vowel in the preceding syllable would be stated as a requirement that the vowel's HEIGHT, BACKNESS, and ROUNDING features match the values of the preceding vowel's features. In this case, the vowels themselves need not be adjacent—by assuming that vowel quality features are not present in consonants, it is possible to say that two vowels are adjacent on a tier defined by the nodes HEIGHT, BACKNESS, and ROUNDING.

Our full generative process for a segment following other segments is the following. We follow the example of the generation of a phoneme conditional on a preceding context of /ak/, shown with simplified featural specifications and tiers in Figure 1.

[Figure 1: Tiers defined by class nodes A and B for context sequence /ak/. See text.]

At each node in the feature dependency graph, we can either generate a fully-specified subsegment for that node (memo draw), or assemble a novel subsegment for that node out of parts defined by the feature dependency graph (base draw). Starting at the root node of the feature dependency graph, we decide whether to do a memo draw or base draw conditional on the previous n subsegments at that node.

So in order to generate the next segment following /ak/ in the example, we start at node A in the next draw from the feature geometry; with some probability we do a memo draw conditioned on /ak/, defined by the red tier. If we decide to do a base draw instead, we then repeat the procedure conditional on the previous n − 1 segments, recursively until we are conditioning on the empty context. That is, we do a memo draw conditional on /k/, or conditional on the empty context. This process of conditioning on successively smaller contexts is a standard technique in Bayesian nonparametric language modeling (Teh, 2006; Goldwater et al., 2006).

At the empty context, if we decide to do a base draw, then we generate a novel segment by repeating the whole process at each child node, to generate several subsegments. In the example, we would assemble a phoneme by independently sampling subsegments at the nasal/laryngeal node B and the MANNER node, and then combining them. Crucially, the conditioning context consists only of the values at the current node in the previous phonemes. So when we sample a subsegment from node B, it is conditional on the previous two values at node B, {VOICE:+, NASAL:-} and {VOICE:-, NASAL:-}, defined by the blue tier in the figure. The process continues down the feature dependency graph recursively. At the point where the model decides on vowel place features such as height and backness, these will be conditioned only on the vowel place features of the preceding /a/, with /k/ skipped entirely as it does not have values at vowel place nodes.

This section has provided motivations and a walkthrough of our proposed generative procedure for sequences of segments. In the next section, we give the formalization of the model.

3 Formalization of the Models

Here we give a full formal description of our proposed model in three steps. First, in Section 3.1, we formalize the generative process for a segment in isolation. Second, in Section 3.2, we give a formulation of Bayesian nonparametric N-gram models with backoff. Third, in Section 3.3, we show how to drop the generative process for a phoneme into the N-gram model such that tier-based interactions emerge naturally.
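As a concrete rendering of the /ak/ walkthrough, the sketch below (our own, with invented node names and illustrative feature values) shows the key step: when conditioning at a class node, only the previous subsegments that actually have values at that node count as context, so /k/ is skipped on the vowel-place tier.

```python
def tier_context(context, node, n):
    """Rightmost n-1 subsegments of `context` that are defined at `node`.

    `context` is a list of segments, each a dict from class-node names to
    subsegments. Segments lacking `node` are skipped entirely, which is what
    makes the conditioning tier-based and potentially non-local."""
    values = [seg[node] for seg in context if node in seg]
    return values[-(n - 1):] if n > 1 else []

# Simplified /a/ and /k/ from the walkthrough (features illustrative only).
a = {"nasal_laryngeal": ("VOICE:+", "NASAL:-"),
     "vowel_place": ("HEIGHT:low", "BACKNESS:back")}
k = {"nasal_laryngeal": ("VOICE:-", "NASAL:-")}   # no vowel place subsegment

context = [a, k]
# At the nasal/laryngeal node, both /a/ and /k/ contribute context ...
print(tier_context(context, "nasal_laryngeal", n=3))
# ... but at the vowel place node, /k/ is invisible and only /a/ remains.
print(tier_context(context, "vowel_place", n=3))
```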


3.1 Generative Process for a Segment

A feature dependency graph G is a fully connected, singly rooted, directed, acyclic graph given by the triple ⟨V, A, t, r⟩, where V is a set of vertices or nodes, A is a set of directed arcs, t is a total function t(n): V → {and, or}, and r is a distinguished root node in V. A directed arc is a pair ⟨p, c⟩ where the parent p and child c are both elements in V. The function t(n) identifies whether n is an and- or or-node. Define ch(n) to be the function that returns all children of node n, that is, all n′ ∈ V such that ⟨n, n′⟩ ∈ A.

A subgraph G^s of feature dependency graph G is the graph obtained by starting from node s and retaining only nodes and arcs reachable by traversing arcs starting from s. A subsegment p^s is a subgraph rooted in node s for which each or-node contains exactly one outgoing arc. Subsegments represent sampled phone constituents. A segment is a subsegment rooted in r—that is, a fully specified phoneme.

The distribution associated with a subgraph G^s is given by G^s below. G^s is a distribution over subsegments; the distribution for the full graph G^r is a distribution over fully specified segments. We occasionally overload the notation such that G^s(p^s) will refer to the probability mass function associated with distribution G^s evaluated at the subsegment p^s.

$$H^s \sim \mathrm{DP}(\theta^s, G^s) \qquad (1)$$

$$G^s(p^s) = \begin{cases} \prod_{s' \in \mathrm{ch}(s)} H^{s'}(p^{s'}) & t(s) = \mathrm{AND} \\ \sum_{s' \in \mathrm{ch}(s)} \psi^s_{s'}\, H^{s'}(p^{s'}) & t(s) = \mathrm{OR} \end{cases}$$

The first case of the definition covers and-nodes. We assume that the leaves of our feature dependency graph—which represent atomic feature values such as the laryngeal value of a PLACE feature—are childless and-nodes.

The second case of the definition covers or-nodes in the graph, where ψ^s_{s′} is the probability associated with choosing outgoing arc ⟨s, s′⟩ from parent or-node s to child node s′. Thus, or-nodes define mixture distributions over outgoing arcs. The mixture weights are drawn from a Dirichlet process. In particular, for or-node n in the underlying graph G, the vector of probabilities over outgoing edges is distributed as follows.

$$\vec{\psi}^s \sim \mathrm{DP}(\theta^s, \mathrm{UNIFORM}(|\mathrm{ch}(s)|))$$

Note that in both cases the distribution over child subgraphs is drawn from a Dirichlet process, as in Equation 1, capturing the notion of subsegmental storage discussed above.

3.2 N-Gram Models with DP-Backoff

Let T be a set of discrete objects (e.g., atomic symbols or structured segments as defined in the preceding sections). Let T* be the set of all finite-length strings which can be generated by combining elements of T under concatenation, ·, including the empty string ε. A context u is any finite string beginning with a special distinguished start symbol and ending with some sequence in T*, that is, u ∈ {start · T*}.

For any string α, define hd(α) to be the function that returns the first symbol in the string, tl(α) to be the function that returns the suffix of α minus the first symbol, and |α| to be the length of α, with hd(ε) = tl(ε) = ε and |ε| = 0. Write the concatenation of two strings α and α′ as α · α′.

Let H_u be a distribution on next symbols—that is, objects in T ∪ {stop}—conditioned on a given context u. For an N-gram model of order N, the probability of a string β in T* is given by K^N_start(β · stop), where K^N_u(α) is defined as:

$$K^N_u(\alpha) = \begin{cases} 1 & \alpha = \epsilon \\ H_{f_N(u)}(\mathrm{hd}(\alpha)) \times K^N_{u \cdot \mathrm{hd}(\alpha)}(\mathrm{tl}(\alpha)) & \text{otherwise} \end{cases} \qquad (2)$$

where f_n(·) is a context-management function which determines which parts of the left-context should be used to determine the probability of the current symbol. In the case of the N-gram models used in this paper, f_n(·) takes a sequence u and returns only the rightmost n − 1 elements from the sequence, or the entire sequence if it has length less than n.

Note two aspects of this formulation of N-gram models. First, H_u is a family of distributions over next symbols or more general objects. Later, we will drop in phonological-feature-based generative processes for these distributions. Second, the function f_n is a parameter of the above definitions. In what follows, we will use a variant of this function which is sensitive to tier-based structure, returning the previous n − 1 only on the appropriate tier.
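Equation 2 can be read as a simple left-to-right scoring loop. The sketch below is our own paraphrase under simplifying assumptions: h(context, symbol) stands in for the family of next-symbol distributions H (however it is estimated), and f_n just keeps the rightmost n − 1 symbols.

```python
START, STOP = "<start>", "<stop>"

def f_n(u, n):
    """Context-management function: the rightmost n-1 symbols of u."""
    return tuple(u[-(n - 1):]) if n > 1 else ()

def string_prob(beta, h, N):
    """P(beta) = K^N_start(beta . stop), unrolling the recursion in Equation 2.

    h(context, symbol) should return P(symbol | truncated context)."""
    u = (START,)
    prob = 1.0
    for symbol in list(beta) + [STOP]:
        prob *= h(f_n(u, N), symbol)
        u = u + (symbol,)
    return prob

# Toy example: a uniform next-symbol distribution over a 3-symbol alphabet plus stop.
uniform_h = lambda context, symbol: 1.0 / 4
print(string_prob("ak", uniform_h, N=3))  # (1/4) ** 3: two symbols plus stop
```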


MacKay and Peto (1994) introduced a hierarchical Dirichlet process-based backoff scheme for N-gram models, with generalizations in Teh (2006) and Goldwater et al. (2006). In this setup, the distribution over next symbols given a context u is drawn hierarchically from a Dirichlet process whose base measure is another Dirichlet process associated with context tl(u), and so on, with all draws ultimately backing off into some unconditioned distribution over all possible next symbols. That is, in a hierarchical Dirichlet process N-gram model, H_{f_n(u)} is given as follows.

$$H_{f_n(u)} \sim \begin{cases} \mathrm{DP}(\theta_{f_n(u)}, H_{f_{n-1}(u)}) & n \geq 1 \\ \mathrm{DP}(\theta_{f_n(u)}, \mathrm{UNIFORM}(T \cup \{\text{stop}\})) & n = 0 \end{cases}$$

3.3 Tier-Based Interactions

To make the N-gram model defined in the last section capture tier-based interactions, we make two changes. First, we generalize the generative process H^s from Equation 1 to H^s_u, which generates subsegments conditional on a sequence u. Second, we define a context-truncating function f^s_n(u) which takes a context of segments u and returns the rightmost n − 1 non-empty subsegments whose root node is s. Then we substitute the generative process H^s over f^s_n(u) (which applies the context-management function f^s_n(·) to the context u) for H_{f_n(u)} in Equation 2. The resulting probability distribution is:

$$K^N_u(\alpha) = \begin{cases} 1 & \alpha = \epsilon \\ H^r_{f^r_N(u)}(\mathrm{hd}(\alpha)) \times K^N_{u \cdot \mathrm{hd}(\alpha)}(\mathrm{tl}(\alpha)) & \text{otherwise.} \end{cases}$$

K^N_u(α) is the distribution over continuations given a context of segments. Its definition depends on H^s_{f^s_n(u)}, which is the generalization of the generative process for segments H^s to be conditional on some tier-based N-gram context f^s_n(u). H^s_{f^s_n(u)} is:

$$H^s_{f^s_n(u)} \sim \begin{cases} \mathrm{DP}(\theta^s_{f^s_n(u)}, H^s_{f^s_{n-1}(u)}) & n \geq 1 \\ \mathrm{DP}(\theta^s_{f^s_n(u)}, G^s_{f^s_N(u)}) & n = 0 \end{cases}$$

$$G^s_{f^s_n(u)}(p^s) = \begin{cases} \prod_{s' \in \mathrm{ch}(s)} H^{s'}_{f^{s'}_n(u)}(p^{s'}) & t(s) = \mathrm{AND} \\ \sum_{s' \in \mathrm{ch}(s)} \psi^s_{s'}\, H^{s'}_{f^{s'}_n(u)}(p^{s'}) & t(s) = \mathrm{OR.} \end{cases}$$

H^s_{f^s_n(u)} and G^s_{f^s_n(u)} above are mutually recursive functions. H^s_{f^s_n(u)} implements backoff in the tier-based context of previous subsegments; G^s_{f^s_n(u)} implements backoff by going down into the probability distributions defined by the feature dependency graph.

Note that the function H^s_{f^s_n(u)} recursively backs off to the empty context, but its ultimate base distribution is indexed by f^s_N(u), using the global maximum N-gram order N. So when samples are drawn from the feature dependency graph, they are conditioned on non-empty tier-based contexts. In this way, subsegments are generated based on tier-based context and based on featural backoff in an interleaved fashion.

3.4 Inference

We use the Chinese Restaurant Process representation for sampling. Inference in the model is over seating arrangements for observations of subsegments and over the hyperparameters θ for each restaurant. We perform Gibbs sampling on seating arrangements in the Dirichlet N-gram models by removing and re-adding observations in each restaurant. These Gibbs sweeps had negligible impact on model behavior. For the concentration parameter θ, we set a prior Gamma(10, .1). We draw posterior samples using the slice sampler described in Johnson and Goldwater (2009). We draw one posterior sample of the hyperparameters for each Gibbs sweep. In contrast to the Gibbs sweeps, we found resampling hyperparameters to be crucial for achieving the performance described below (Section 5.3).

4 Related Work

Phonotactics has proven a fruitful problem domain for computational models. Most such work has adopted a constraint-based approach, attempting to design a scoring function based on phonological features to separate acceptable forms from unacceptable ones, typically by formulating restrictions or constraints to rule out less-good structures.

This concept has led naturally to the use of undirected (maximum-entropy, log-linear) models. In this class of models, a form is scored by evaluation against a number of predicates, called factors[3]—for example, whether two adjacent segments have the phonological features VOICE:+ VOICE:-. Each factor is associated with a weight, and the score for a form is the sum of the weights of the factors which are true for the form.

[3] Factors are also commonly called "features"—a term we avoid to prevent confusion with phonological features.


The well-known model of Hayes and Wilson (2008) adopts this framework, pairing it with a heuristic procedure for finding explanatory factors while preventing overfitting. Similarly, Albright (2009) assigns a score to forms based on factors defined over natural classes of adjacent segments. Constraint-based models have the advantage of flexibility: it is possible to score forms using arbitrarily complex and overlapping sets of factors. For example, one can state a constraint against adjacent phonemes having features VOICE:+ and LATERAL:+, or any combination of feature values.

In contrast, we have presented a model where forms are built out of parts by structure-building operations. From this perspective, the goal of a model is not to rule out bad forms, but rather to discover repeating structures in good forms, such that new forms with those structures can be generated.

In this setting there is less flexibility in how phonological features can affect well-formedness. For a structure-building model to assign "scores" to arbitrary pairs of co-occurring features, there must be a point in the generative process where those features are considered in isolation. Coming up with such a process has been challenging. As a result of this limitation, structure-building models of phonotactics have not generally included rich featural interactions. For example, Coleman and Pierrehumbert (1997) give a probabilistic model for phonotactics where words are generated using a grammar over units such as syllables, onsets, and rhymes. This model does not incorporate fine-grained phonological features such as voicing and place.

In fact, it has been argued that a constraint-based approach is required in order to capture rich feature-based interactions. For example, Goldsmith and Riggle (2012) develop a tier-based structure-building model of Finnish phonotactics which captures nonlocal vowel harmony interactions, but argue that this model is inadequate because it does not assign higher probabilities to forms than an N-gram model, a common baseline model for phonotactics (Daland et al., 2011). They argue that this deficiency is because the model cannot simultaneously model nonlocal vowel-vowel interactions and local consonant-vowel interactions. Because of our tier-based conditioning mechanism (Sections 2.4 and 3.3), our model can simultaneously produce local and nonlocal interactions between features using structure-building operations, and does assign higher probabilities to held-out forms than an N-gram model (Section 5.3). From this perspective, our model can be seen as a proof of concept that it is possible to have rich feature-based conditioning without adopting a constraint-based approach.

While our model can capture featural interactions, it is less flexible than a constraint-based model in that the allowable interactions are specified by the feature dependency graph. For example, there is no way to encode a direct constraint against adjacent phonemes having features VOICE:+ and LATERAL:+. We consider this a strength of the approach: A particular feature dependency graph is a parameter of our model, and a specific scientific hypothesis about the space of likely featural interactions between phonemes, similar to feature geometries from classical generative phonology (Clements, 1985; McCarthy, 1988; Halle, 1995).[4]

[4] We do however note that it may be possible to learn feature hierarchies on a language-by-language basis from universal articulatory and acoustic biases, as suggested by Dresher (2009).

While probabilistic approaches have mostly taken a constraint-based approach, recent formal language theoretic approaches to phonology have investigated what basic parts and structure building operations are needed to capture realistic feature-based interactions (Heinz et al., 2011; Jardine and Heinz, 2015). We see probabilistic structure-building approaches such as this work as a way to unify the recent formal language theoretic advances in computational phonology with computational phonotactic modeling.

Our model joins other NLP work attempting to do sequence generation where each symbol is generated based on a rich featural representation of previous symbols (Bilmes and Kirchhoff, 2003; Duh and Kirchhoff, 2004), though we focus more on phonology-specific representations. Our and-or graphs are similar to those used in computer vision to represent possible objects (Jin and Geman, 2006).

5 Model Evaluation and Experiments

Here we evaluate some of the design decisions of our model and compare it to a baseline N-gram model and to a widely-used constraint-based model, BLICK.


In order to probe model behavior, we also present evaluations on artificial data, and a sampling of "representative forms" preferred by one model as compared to another.

Our model consists of structure-building operations over a learned inventory of subsegments. If our model can exploit more repeated structure in phonological forms than the N-gram model or constraint-based models, then it should assign higher probabilities to forms. The log probability of a form under a model corresponds to the description length of that form under the model; if a model assigns a higher log probability to a form, that means the model is capable of compressing the form more than other models. Therefore, we compare models on their ability to assign high probabilities to phonological forms, as in Goldsmith and Riggle (2012).

5.1 Evaluation of Model Components

We are interested in discovering the extent to which each model component described above—feature dependency graphs (Section 2.1), class node structure (Section 2.2), and tier-based conditioning (Section 2.4)—contributes to the ability of the model to explain wordforms.

To evaluate the contribution of feature dependency graphs, we compare our models with a baseline N-gram model, which represents phonemes as atomic units. For this N-gram model, we use a Hierarchical Dirichlet Process with n = 3.

To evaluate feature dependency graphs with and without articulated class node structure, we compare models using the graph shown in Figure 3 (the minimal structure required to produce well-formed phonemes) to models with the graph shown in Figure 2, which includes phonologically motivated "class nodes".[5]

To evaluate tier-based conditioning, we compare models with the conditioning described in Sections 2.4 and 3.3 to models where all decisions are conditioned on the full featural specification of the previous n − 1 phonemes. This allows us to isolate improvements due to tier-based conditioning beyond improvements from the feature hierarchy.

[5] These feature dependency graphs differ from those in the exposition in Section 2 in that they do not include a VOCALIC feature, but rather treat vowel as a possible value of MANNER.

[Figure 2 shows a feature dependency graph whose node labels include duration, manner (with arcs labeled vowel and otherwise), laryngeal, nasal, suprasegmental, C place, 2nd art., lateral, backness, height, and rounding.]
Figure 2: Feature dependency graph with class node structure used in our experiments. Plain text nodes are OR-nodes with no child distributions. The arc marked otherwise represents several arcs, each labelled with a consonant manner such as stop, fricative, etc.

[Figure 3 shows a flat feature dependency graph over the same node labels: duration, laryngeal, nasal, manner (vowel/otherwise), suprasegmental, backness, height, rounding, C place, 2nd art., lateral.]
Figure 3: "Flat" feature dependency graph.

5.2 Lexicon Data

The WOLEX corpus provides transcriptions for words in dictionaries of 60 diverse languages, represented in terms of phonological features (Graff, 2012). In addition to words, the dictionaries include some short set phrases, such as of course. We use the featural representation of WOLEX, and design our feature dependency graphs to generate only well-formed phonemes according to this feature system. For space reasons, we present the evaluation of our model on 14 of these languages, chosen based on the quality of their transcribed lexicons, and the authors' knowledge of their phonological systems.

5.3 Held-Out Evaluation

Here we test whether the different model configurations described above assign high probability to held-out forms. This tests the models' ability to generalize beyond their training data. We train each model on 2500 randomly selected wordforms from a WOLEX dictionary, and compute posterior predictive probabilities for the remaining wordforms from the final state of the model.


Table 1: Average log posterior predictive probability of a held-out form. "ngram" is the DP Backoff 3-gram model. "flat" models use the feature dependency graph in Figure 3. "cl. node" models use the graph in Figure 2. See text for motivations of these graphs. "no tiers" models condition each decision on the previous phoneme, rather than on tiers of previous features. Asterisks indicate statistical significance according to a t-test comparing with the scores under the N-gram model. * = p < .05; ** = p < .001.

Language        ngram    flat      cl. node   flat/no tiers  cl. node/no tiers
English         -22.20   -22.15    -21.73**   -22.15         -22.14
French          -18.30   -18.28    -17.93**   -18.29         -18.28
Georgian        -20.21   -20.17    -19.64*    -20.18         -20.18
German          -24.77   -24.72    -24.07**   -24.73         -24.74
Greek           -22.48   -22.45    -21.65**   -22.45         -22.45
Haitian Creole  -16.09   -16.04    -15.82**   -16.05         -16.04
Lithuanian      -19.03   -18.99    -18.58*    -18.99         -18.99
Mandarin        -13.95   -13.83*   -13.78**   -13.82*        -13.82*
Mor. Arabic     -16.15   -16.10    -16.00*    -16.13         -16.12
Polish          -20.12   -20.08    -19.76**   -20.08         -20.07
Quechua         -14.35   -14.30    -13.87*    -14.30         -14.31
Romanian        -18.71   -18.68    -18.32**   -18.69         -18.68
Tatar           -16.21   -16.18    -15.65**   -16.19         -16.19
Turkish         -18.88   -18.85    -18.55**   -18.85         -18.84

Table 1 shows the average probability of a held-out word under our models and under the N-gram model for one model run.[6] For all languages, we get a statistically significant increase in probabilities by adopting the autosegmental model with class nodes and tier-based conditioning. Model variants without either component do not significantly outperform the N-gram model except in Chinese. The combination of class nodes and tier-based conditioning results in model improvements beyond the contributions of the individual features.

[6] The mean standard deviation per form of log probabilities over 50 runs of the full model ranged from .09 for Amharic to .23 for Dutch.

5.4 Evaluation on Artificial Data

Our model outperforms the N-gram model in predicting held-out forms, but it remains to be shown that this performance is due to capturing the kinds of linguistic intuitions discussed in Section 2. An alternative possibility is that the Autosegmental N-gram model, which has many more parameters than a plain N-gram model, can simply learn a more accurate model of any sequence, even if that sequence has none of the structure discussed above. To evaluate this possibility, we compare the performance of our model in predicting held-out linguistic forms to its performance in predicting held-out forms from artificial lexicons which expressly do not have the linguistic structure we are interested in.

If the autosegmental model outperforms the N-gram model even on artificial data with no phonological structure, then its performance on the real linguistic data in Section 5.3 might be overfitting. On the other hand, if the autosegmental model does better on real data but not artificial data, then we can conclude that it is picking up on some real distinctive structure of that data.

For each real lexicon L_r, we generate an artificial lexicon L_a by training a DP 3-gram model on L_r and forward-sampling |L_r| forms. Additionally, the forms in L_a are constrained to have the same distribution over lengths as the forms in L_r. The resulting lexicons have no tier-based or featural interactions except as they appear by chance from the N-gram model trained on these lexica. For each L_a we then train our models on the first 2500 forms and score the probabilities of the held-out forms, the same procedure as in Section 5.3.

We ran this procedure for all the lexicons shown in Table 1. For all but one lexicon, we find that the autosegmental models do not significantly outperform the N-gram models on artificial data. The exception is Mandarin Chinese, where the average log probability of an artificial form is -13.81 under the N-gram model and -13.71 under the full autosegmental model. The result suggests that the anomalous behavior of Mandarin Chinese in Section 5.3 may be due to overfitting.

When exposed to data that explicitly does not have autosegmental structure, the model is not more accurate than a plain sequence model for almost all languages. But when exposed to real linguistic data, the model is more accurate. This result provides evidence that the generative model developed in Section 2 captures true distributional properties of lexicons that are absent in N-gram distributions, such as featural and tier-based interactions.
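For concreteness, the artificial-lexicon construction can be sketched as follows. This is our own reading of the procedure, not released code; ngram.sample_form is a hypothetical stand-in for forward-sampling from the trained DP 3-gram model, and the length matching is implemented here by simple rejection.

```python
import random
from collections import Counter

def artificial_lexicon(real_lexicon, ngram, seed=0, max_tries=1_000_000):
    """Forward-sample |L_r| forms from an N-gram model trained on L_r,
    keeping only forms whose lengths match the real length distribution."""
    rng = random.Random(seed)
    needed = Counter(len(form) for form in real_lexicon)  # target length counts
    artificial = []
    for _ in range(max_tries):
        if sum(needed.values()) == 0:        # every length quota is filled
            break
        form = ngram.sample_form(rng)        # hypothetical sampling interface
        if needed[len(form)] > 0:            # reject forms of unneeded lengths
            needed[len(form)] -= 1
            artificial.append(form)
    return artificial
```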


5.5 Comparison with a Constraint-Based Model

Here we provide a comparison with Hayes and Wilson (2008)'s Phonotactic Learner, which outputs a phonotactic grammar in the form of a set of weighted constraints on feature co-occurrences. This grammar is optimized to match the constraint violation profile in a training lexicon, and so can be seen as a probabilistic model of that lexicon. The authors have distributed one such grammar, BLICK, as a "reference point for phonotactic probability in experimentation" (Hayes, 2012). Here we compare our model against BLICK on its ability to assign probabilities to forms, as in Section 5.3.

Ideally, we would simply compute the probability of forms like we did in our earlier model comparisons. BLICK returns scores for each form. However, since the probabilistic model underlying BLICK is undirected, these scores are in fact unnormalized log probabilities, so they cannot be compared directly to the normalized probabilities assigned by the other models. Furthermore, because the probabilistic model underlying BLICK does not penalize forms for length, the normalizing constant over all forms is in fact infinite, making straightforward comparison of predictive probabilities impossible. Nevertheless, we can turn BLICK scores into probabilities by conditioning on further constraints, such as the length k of the form. We enumerate all possible forms of length k to compute the normalizing constant for the distribution over forms of that length. The same procedure can also be used to compute the probabilities of each form, conditioned on the length of the form k, under the N-gram and Autosegmental models.

To compare our models against BLICK, we calculate conditional probabilities for forms of length 2 through 5 from the English lexicon.[7] The forms are those in the WOLEX corpus; we include them for this evaluation if they are k symbols long in the WOLEX representation. For our N-gram and Autosegmental models, we use the same models as in Section 5.3. The average probabilities of forms under the three models are shown in Table 2. For length 3-5, the autosegmental model assigns the highest probabilities, followed by the N-gram model and BLICK. For length 2, BLICK outperforms the DP N-gram model but not the autosegmental model.

[7] Enumerating and scoring the 22,164,361,129 possible forms of length 6 was computationally impractical.

Table 2: Average log posterior predictive probability of an English form of fixed length under BLICK and our models.

Length   BLICK   ngram   cl. node
2        -6.50   -6.81   -5.18
3        -9.38   -8.76   -7.95
4        -14.1   -11.7   -11.4
5        -18.1   -14.2   -13.9

Our model assigns higher probabilities to short forms than BLICK. That is, our models have identified more redundant structure in the forms than BLICK, allowing them to compress the data more. However, the comparison is imperfect in several ways. First, BLICK and our models were trained on different data; it is possible that our training data are more representative of our test data than BLICK's training data were. Second, BLICK uses a different underlying featural decomposition than our models; it is possible that our feature system is more accurate. Nevertheless, these results show that our model concentrates more probability mass on (short) forms attested in a language, whereas BLICK likely spreads its probability mass more evenly over the space of all possible (short) strings.

5.6 Representative Forms

In order to get a sense of the differences between models, we investigate what phonological forms are preferred by different kinds of models. These forms might be informative about the phonotactic patterns that our model is capturing which are not well-represented in simpler models. We calculate the representativeness of a form f with respect to model m1 as opposed to m2 as p(f|m1)/p(f|m2) (Good, 1965; Tenenbaum and Griffiths, 2001). The forms that are most "representative" of model m1 are not the forms that m1 assigns the highest probability, but rather the forms that m1 ranks highest relative to m2.

Table 3: Most representative forms for the N-gram model and for our full model ("cl. node" in Table 1) in English. Forms are presented in native orthography, but were scored based on their phonetic form.

English N-gram      English Full Model
collaborationist    mistrustful
a posteriori        inharmoniousness
sacristy            absentmindedness
matter of course    blamelessness
earnest money       phlegmatically

Tables 3 and 4 show forms from the lexicon that are most representative of our full model and of the N-gram model for English and Turkish.
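The representativeness ranking behind Tables 3 and 4 is just a ratio of model probabilities; in log space it can be computed as below (our own sketch, assuming each model exposes some log-probability function).

```python
def most_representative(forms, logprob_m1, logprob_m2, k=5):
    """Rank forms by log p(f | m1) - log p(f | m2), the log of the
    representativeness ratio p(f | m1) / p(f | m2)."""
    scored = sorted(forms, key=lambda f: logprob_m1(f) - logprob_m2(f), reverse=True)
    return scored[:k]
```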


Table 4: Most representative forms for N-gram and Autosegmental models in Turkish.

Turkish N-gram   Turkish Full Model
üstfamilya       büyükkarapınar
dekstrin         kızılcapınar
mnemotekni       altınpınar
ekskavatör       sarımehmetler
foksterye        karaelliler

The most uniquely representative forms for our full model are morphologically complex forms consisting of many productive, frequently reused morphemes such as ness. On the other hand, the representative forms for the N-gram model include foreign forms such as a posteriori (for English) and ekskavatör (for Turkish), which are not built out of parts that frequently repeat in those languages. The representative forms suggest that the full model places more probability mass on words which are built out of highly productive, phonotactically well-formed parts.

6 Discussion

We find that our models succeed in assigning high probabilities to unseen forms, that they do so specifically for linguistic forms and not random sequences, that they tend to favor forms with many productive parts, and that they perform comparably to a state-of-the-art constraint-based model in assigning probabilities to short forms.

The improvement for our models over the N-gram baseline is consistent but not large. We attribute this to the way in which phonological generalizations are used in the present model: in particular, phonological generalizations function primarily as a form of backoff for a sequence model. Our models have lexical memoization at each node in a feature dependency graph; as such, the top node in the graph ends up representing transition probabilities for whole phonemes conditioned on previous phonemes, and the rest of the feature dependency graph functions as a backoff distribution. When a model has been exposed to many training forms, its behavior will be largely dominated by the N-gram-like behavior of the top node. In future work it might be effective to learn an optimal backoff procedure which gives more influence to the base distribution (Duh and Kirchhoff, 2004; Wood and Teh, 2009).

While the tier-based conditioning in our model would seem to be capable of modeling nonlocal interactions such as vowel harmony, we have not found that the models do well at reproducing these nonlocal interactions. We believe this is because the model's behavior is dominated by nodes high in the feature dependency graph. In any case, a simple Markov model defined over tiers, as we have presented here, might not be enough to fully model vowel harmony. Rather, a model of phonological processes, transducing underlying forms to surface forms, seems like a more natural way to capture these phenomena.

We stress that this model is not tied to a particular feature dependency graph. In fact, we believe our model provides a novel way of testing different hypotheses about feature structures, and could form the basis for learning the optimal feature hierarchy for a given data set. The choice of feature dependency graph has a large effect on what featural interactions the model can represent directly. For example, neither feature dependency graph has shared place features for consonants and vowels, so the model has limited ability to represent place-based restrictions on consonant-vowel sequences such as requirements for labialized or palatalized consonants in the context of /u/ or /i/. These interactions can be treated in our framework if vowels and consonants share place features, as in Padgett (2011).

7 Conclusion

We have presented a probabilistic generative model for sequences of phonemes defined in terms of phonological features, based on representational ideas from generative phonology and tools from Bayesian nonparametric modeling. We consider our model as a proof of concept that probabilistic structure-building models can include rich featural interactions. Our model robustly outperforms an N-gram model on simple metrics, and learns to generate forms consisting of highly productive parts. We also view this work as a test of the scientific hypotheses that phonological features can be organized in a hierarchy and that they interact along tiers: in our model evaluation, we found that both concepts were necessary to get an improvement over a baseline N-gram model.


Acknowledgments

We would like to thank Tal Linzen, Leon Bergen, Edward Flemming, Edward Gibson, Bob Berwick, Jim Glass, and the audiences at MIT's Phonology Circle, SIGMORPHON, and the LSA 2016 Annual Meeting for helpful comments. This work was supported in part by NSF DDRIG Grant #1551543 to R.F.

References

Adam Albright. 2009. Feature-based generalization as a source of gradient acceptability. Phonology, 26:9–41.

Robert C. Berwick. 1985. The acquisition of syntactic knowledge. MIT Press, Cambridge, MA.

Jeff A. Bilmes and Katrin Kirchhoff. 2003. Factored language models and generalized parallel backoff. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Companion Volume of the Proceedings of HLT-NAACL 2003–Short Papers–Volume 2, pages 4–6. Association for Computational Linguistics.

Geert Booij. 2011. Morpheme structure constraints. In The Blackwell Companion to Phonology. Blackwell.

Noam Chomsky and Morris Halle. 1965. Some controversial questions in phonological theory. Journal of Linguistics, 1:97–138.

Noam Chomsky and Morris Halle. 1968. The Sound Pattern of English. Harper & Row, New York, NY.

George N. Clements and Elizabeth V. Hume. 1995. The internal organization of speech sounds. In The Handbook of Phonological Theory, pages 24–306. Blackwell, Oxford.

George N. Clements. 1985. The geometry of phonological features. Phonology Yearbook, 2:225–252.

John Coleman and Janet B. Pierrehumbert. 1997. Stochastic phonological grammars and acceptability. In John Coleman, editor, Proceedings of the 3rd Meeting of the ACL Special Interest Group in Computational Phonology, pages 49–56, Somerset, NJ. Association for Computational Linguistics.

Robert Daland, Bruce Hayes, James White, Marc Garellek, Andreas Davis, and Ingrid Normann. 2011. Explaining sonority projection effects. Phonology, 28:197–234.

Carl de Marcken. 1996. Linguistic structure as composition and perturbation. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pages 335–341. Association for Computational Linguistics.

B. Elan Dresher. 2009. The Contrastive Hierarchy in Phonology. Cambridge University Press.

Kevin Duh and Katrin Kirchhoff. 2004. Automatic learning of language model structure. In Proceedings of COLING 2004, Geneva, Switzerland.

Thomas S. Ferguson. 1973. A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1(2):209–230.

John Goldsmith and Jason Riggle. 2012. Information theoretic approaches to phonology: the case of Finnish vowel harmony. Natural Language and Linguistic Theory, 30(3):859–896.

Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. 2006. Interpolating between types and tokens by estimating power-law generators. In Advances in Neural Information Processing Systems, volume 18, pages 459–466, Cambridge, MA. MIT Press.

Sharon Goldwater. 2006. Nonparametric Bayesian Models of Lexical Acquisition. Ph.D. thesis, Brown University.

Irving John Good. 1965. The Estimation of Probabilities. MIT Press, Cambridge, MA.

Noah D. Goodman, Vikash K. Mansinghka, Daniel Roy, Keith Bonawitz, and Joshua B. Tenenbaum. 2008. Church: A language for generative models. In Uncertainty in Artificial Intelligence, Helsinki, Finland. AUAI Press.

Peter Graff. 2012. Communicative Efficiency in the Lexicon. Ph.D. thesis, Massachusetts Institute of Technology.

Morris Halle. 1959. The Sound Pattern of Russian: A linguistic and acoustical investigation. Mouton, The Hague, The Netherlands.

Morris Halle. 1995. Feature geometry and feature spreading. Linguistic Inquiry, 26(1):1–46.

Bruce Hayes and Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry, 39(3):379–440.

Bruce Hayes. 2012. Blick - a phonotactic probability calculator.

Jeffrey Heinz, Chetan Rawal, and Herbert G. Tanner. 2011. Tier-based strictly local constraints for phonology. In The 49th Annual Meeting of the Association for Computational Linguistics.

Roman Jakobson, C. Gunnar M. Fant, and Morris Halle. 1952. Preliminaries to Speech Analysis: The Distinctive Features and their Correlates. The MIT Press, Cambridge, Massachusetts and London, England.

Adam Jardine and Jeffrey Heinz. 2015. A concatenation operation to derive autosegmental graphs. In Proceedings of the 14th Meeting on the Mathematics of Language (MoL 2015), pages 139–151, Chicago, USA, July.


Ya Jin and Stuart Geman. 2006. Context and hierarchy in a probabilistic image model. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), pages 2145–2152.

Mark Johnson and Sharon Goldwater. 2009. Improving nonparametric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars. In NAACL-HLT 2009, pages 317–325. Association for Computational Linguistics.

Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007. Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In Advances in Neural Information Processing Systems, pages 641–648.

Mary-Louise Kean. 1975. The Theory of Markedness in Generative Grammar. Ph.D. thesis, Massachusetts Institute of Technology.

David J.C. MacKay and Linda C. Bauman Peto. 1994. A hierarchical Dirichlet language model. Natural Language Engineering, 1:1–19.

John McCarthy. 1988. Feature geometry and dependency: A review. Phonetica, 43:84–108.

Donald Michie. 1968. "Memo" functions and machine learning. Nature, 218:19–22.

Timothy J. O'Donnell. 2015. Productivity and Reuse in Language: A Theory of Linguistic Computation and Storage. The MIT Press, Cambridge, Massachusetts and London, England.

Jaye Padgett. 2011. Consonant-vowel place feature interactions. In The Blackwell Companion to Phonology, pages 1761–1786. Blackwell Publishing, Malden, MA.

Ezer Rasin and Roni Katzir. 2014. A learnability argument for constraints on underlying representations. In Proceedings of the 45th Annual Meeting of the North East Linguistic Society (NELS 45), Cambridge, Massachusetts.

Jayaram Sethuraman. 1994. A constructive definition of Dirichlet priors. Statistica Sinica, pages 639–650.

Richard Stanley. 1967. Redundancy rules in phonology. Language, 43(2):393–436.

Yee Whye Teh. 2006. A Bayesian interpretation of interpolated Kneser-Ney. Technical Report TRA2/06, School of Computing, National University of Singapore.

Joshua B. Tenenbaum and Thomas L. Griffiths. 2001. The rational basis of representativeness. In Proceedings of the 23rd Annual Conference of the Cognitive Science Society, pages 1036–1041.

Nikolai S. Trubetzkoy. 1939. Grundzüge der Phonologie. Number 7 in Travaux du Cercle Linguistique de Prague. Vandenhoeck & Ruprecht, Göttingen.

Frank Wood and Yee Whye Teh. 2009. A hierarchical nonparametric Bayesian approach to statistical language model domain adaptation. In Artificial Intelligence and Statistics, pages 607–614.

Frank Wood, Cédric Archambeau, Jan Gasthaus, Lancelot James, and Yee Whye Teh. 2009. A stochastic memoizer for sequence data. In Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada.

