
What do phone embeddings learn about Phonology?

Sudheer Kolachina Lilla Magyar [email protected] [email protected]

Abstract

Recent work has looked at the evaluation of phone embeddings using sound analogies and correlations between distinctive feature space and embedding space. It has not been clear what aspects of natural language phonology are learnt by neural network inspired distributed representational models such as word2vec. To study the kinds of phonological relationships learnt by phone embeddings, we present artificial phonology experiments which show that phone embeddings learn paradigmatic relationships such as phonemic and allophonic distribution quite well. They are also able to capture co-occurrence restrictions among vowels such as those observed in languages with vowel harmony. However, they are unable to learn co-occurrence restrictions among the class of consonants.

1 Introduction

Over the last few years, distributed representation models based on neural networks such as word2vec (Mikolov et al., 2013a) and GloVe (Pennington et al., 2014) have been of much importance in speech and natural language processing (NLP). The word2vec technique is a shallow neural network that takes a text corpus as input and outputs a vector space containing all unique words in the text. The dense vector representations of words induced using word2vec have been shown to capture multiple degrees of similarity between words. Mikolov et al. (2013a,b) show that word embeddings can solve word analogy questions and sentence completion tasks. Mikolov et al. (2013b) show that word embeddings represent words in continuous space, making it possible to perform algebraic operations such as vector(King) − vector(Man) + vector(Woman) = vector(Queen). Considerable attention has been paid to evaluating these vector representations using human judgement datasets (Baroni et al., 2014; Levy et al., 2015). Asr and Jones (2017) use artificial language experiments to study the difference between similarity and relatedness in evaluating distributed semantic models. Phone embeddings induced from phonetic corpora have been used in tasks such as word inflection (Silfverberg et al., 2018) and sound sequence alignment (Sofroniev and Çöltekin, 2018). Silfverberg et al. (2018) show that dense vector representations of phones learnt using various techniques are able to solve analogies such as p is to b as t is to X, where X = d. They also show that there is a significant correlation between distinctive feature space and the phone embedding space.

Our goal in this paper is to better understand the evaluation of phone embeddings. We argue that the significant correlation between distinctive feature space and phone embedding space cannot be automatically interpreted as the model's ability to capture facts about the phonology of natural language. Since many distinctive features tend to be phonetically based, natural classes denoted by these features capture phonetic facts as well as phonological facts. For example, the feature [±long] denotes the distinction between long and short vowels, which is a language-independent phonetic fact. But whether this distinction is a phonological fact varies from language to language. It is important to make this distinction between phonetic facts and phonological facts when evaluating phone embeddings for their learning of phonology. In this paper, we propose an alternative methodology to evaluate word2vec's ability to learn phonological facts. We define artificial languages with different kinds of phoneme-allophone distinctions and co-occurrence restrictions and study how well phone embeddings capture these relationships. Several interesting insights regarding the relationship between phonetics and phonology, the role of distinctive features and the task of distinctive feature/phoneme induction accrue from our experiments.

2 Background and Related work

One major difference between words and phones is that while words are meaningful units in language, phones have no meaning in themselves. However, as with words, there are clear patterns of organization of individual phones in a language. One well-known pattern in phonology is the distinction between contrastive and complementary distribution. Two phones are said to be in contrastive distribution if they occur in the same context and create a meaning contrast. For example, b and k occur in word-initial position and create a contrast in meaning, such as in bæt versus kæt. This is why they are considered distinct phonemes in the language. On the other hand, p and pʰ never occur in the same context, which is referred to as being in complementary distribution. Since they are phonetically related, they are considered allophones, variants of the same underlying phoneme. The notions of contrastive and complementary distribution are purely based on context. They can be considered instances of the paradigmatic similarity discussed in the distributed semantic literature. Allophony also involves the notion of phonetic similarity. Another pattern in natural language phonology is that of co-occurrence restrictions. A well-known example is homorganic nasal clusters. For example, in nasal plus stop clusters, the nasal must have identical place of articulation to the following stop. Yet another example of a co-occurrence restriction in phonology is the phenomenon of vowel harmony. In some languages, a word can only have vowels which agree with respect to certain features, such as backness, rounding or height. Co-occurrence restrictions can be considered instances of syntagmatic similarity, whereby words that frequently occur together form a syntagm (phrase). Again, most types of co-occurrence restrictions involve phonetic similarity.

The traditional method to describe phones in phonology is in terms of distinctive features (Jakobson et al., 1951). Distinctive features allow phones to be grouped into natural classes, which are established on the basis of participation in common phonological processes. They allow generalizations about phonotactic contexts to be captured in an economical way. In addition to distinctive features in phonology, there are also phonetic features that describe the articulatory and acoustic properties of phones (Ladefoged and Johnson, 2010). However, in practice, there is considerable overlap between phonological distinctive features and phonetic features. This already poses an interesting question about the nature of the relationship between phonetics and phonology, which, as we will see, is relevant to the evaluation of phone embeddings.

Next, let us examine the notion of correlation between distinctive feature space and phone embedding space as a way to evaluate phone embeddings, as proposed by Silfverberg et al. (2018). Pairwise featural similarity is estimated using a metric such as Hamming distance or the Jaccard index applied to feature representations of phones. Pairwise contextual similarity is estimated as cosine similarity between phone embeddings induced using a technique like word2vec. The correlation between pairwise featural similarity and pairwise contextual similarity is estimated using Pearson's r or Spearman's ρ. The value of this correlation is shown for a number of languages in Table 1. Data for Shona and Wargamay are taken from Hayes and Wilson (2008).¹ Similar datasets were constructed for Telugu and the Vedic variety of Sanskrit.² For English, the CMU phonetic dictionary was used with a feature representation based on Parrish (2017) with some minor extensions. The word2vec implementation in the Gensim toolkit (Řehůřek and Sojka, 2010) was used to induce phone embeddings with the following parameters: CBOW, dimensionality of 30, window size of 4, negative sampling of 3, minimum count of 5, learning rate of 0.05. We use CBOW, which predicts the most likely phone given a context of 4 phones in either direction, as this is intuitively similar to the task of a phonologist. It would be interesting to compare the CBOW and Skip-gram architectures and also to study the effect of different parameters on this correlation between distinctive feature space and phone embedding space. However, this is not the goal of our study. In this paper, we restrict our attention to the linguistic significance of this correlation.

¹ https://linguistics.ucla.edu/people/hayes/Phonotactics/index.htm#simulations
² Datasets and code available at https://github.com/skolachi/sigmorphoncode
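To make this setup concrete, the following is a minimal sketch of the embedding induction just described, using the Gensim word2vec implementation with the parameters listed above. The toy corpus phone_words is purely illustrative (the actual corpora are the phonetic dictionaries described above), and the keyword arguments assume Gensim 4.x (older versions use size instead of vector_size).

from gensim.models import Word2Vec

# Each "sentence" is a word spelled out as a sequence of phone symbols,
# with # marking word boundaries; this toy corpus is purely illustrative.
phone_words = 100 * [['#', 'p', 'a', 't', 'i', '#'], ['#', 'k', 'e', 'p', 'u', '#']]

model = Word2Vec(
    sentences=phone_words,
    sg=0,            # CBOW architecture
    vector_size=30,  # dimensionality of 30
    window=4,        # 4 phones of context on either side
    negative=3,      # negative sampling of 3
    min_count=5,     # minimum count of 5
    alpha=0.05,      # learning rate of 0.05
)

print(model.wv['p'])                   # 30-dimensional embedding of the phone p
print(model.wv.similarity('p', 't'))   # cosine similarity between two phones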

Language    Size     Pearson   Spearman
English     135091   0.589     0.612
Shona       4395     0.431     0.575
Telugu      19627    0.349     0.350
Wargamay    5910     0.411     0.428
Vedic       45334    0.351     0.285
English     4000     0.129     0.161
Shona       4000     0.507     0.533
Telugu      4000     0.202     0.206
Wargamay    4000     0.219     0.387
Vedic       4000     0.146     0.159

Table 1: Correlation between distinctive feature space and embedding space, all values significant (p < 0.01)

Feature        Class
-high          a0,a1,a2,aa1
+high          i0,i1,i2,ii1,u0,u1,u2,uu1,w,y
+long          aa1,ii1,uu1
-long          a0,a1,a2,i0,i1,i2,u0,u1,u2
+back          a0,a1,a2,aa1,u0,u1,u2,uu1,w
-back          i0,i1,i2,ii1,y
-approximant   N,b,d,g,j,m,n,nj
+approximant   R,a0,a1,a2,aa1,i0,i1,i2,ii1,l,r,u0,u1,u2,uu1,w,y
-sonorant      b,d,g,j
+sonorant      N,R,a0,a1,a2,aa1,i0,i1,i2,ii1,l,m,n,nj,r,u0,u1,u2,uu1,w,y
+syllabic      a0,a1,a2,aa1,i0,i1,i2,ii1,u0,u1,u2,uu1
-syllabic      N,R,b,d,g,j,l,m,n,nj,r,w,y
+main          a1,aa1,i1,ii1,u1,uu1
-main          a0,a2,i0,i2,u0,u2
+stress        a1,a2,aa1,i1,i2,ii1,u1,u2,uu1
-stress        a0,i0,u0
-consonantal   a0,a1,a2,aa1,i0,i1,i2,ii1,u0,u1,u2,uu1,w,y
+consonantal   N,R,b,d,g,j,l,m,n,nj,r
+anterior      d,l,n,r
-anterior      R,j,nj,y
+lateral       l
-lateral       R,r
+coronal       R,d,j,l,n,nj,r,y
+dorsal        N,g
+labial        b,m

Table 2: Natural classes derived from distinctive features

[Figure 1: Phone clusters of Wargamay. Agglomerative clustering (WPGMA) dendrogram heatmap of pairwise cosine similarities between phone embeddings.]

All languages in Table 1 show a significant positive correlation between distinctive feature space and embedding space. What is the physical interpretation of this correlation? Firstly, it is important to note that the use of this correlation to evaluate phone embeddings presupposes that these hand-crafted distinctive features are the gold standard descriptions of the phonology of these languages. Even if this were the case, the kind of distinctive features used to describe phones plays an important role in the interpretation of this correlation. If feature specifications of phones are based mostly on their phonetic properties, a positive correlation between featural space and embedding space indicates that phonetically similar phones tend to occur in similar contexts. In other words, the natural classes of phonology are tightly constrained by phonetics. To illustrate this point, we take the example of the Wargamay natural classes derived from the distinctive features of Hayes and Wilson (2008), shown in Table 2. Examining the pairwise cosine similarities of phones based on embeddings induced by word2vec in the agglomerative clustering (WPGMA) dendrogram heatmap shown in Figure 1, word2vec CBOW embeddings identify the following natural classes: ii1, uu1, aa1 ([+long, +main, +stress]); i1, u1, a1 ([−long, +main, +stress]); i2, u2, a2 ([−long, −main, +stress]); i0, u0, a0 ([−long, −stress]); and [−syllabic], which denotes the set of all consonants. Among the set of consonants, the velar consonants N, g ([+dorsal]) show up in the same cluster, as do the bilabials b and m. Sonorant consonants like R, l, n, w form one cluster and the [+approximant] r, y form another cluster. Notice that all these classes are based on place and manner of articulation. Therefore, it is not clear if the observed clustering is to be interpreted as the model's learning of phonology or as the fact that phonetic features strictly constrain the contexts in which phones occur. Furthermore, as with word meaning, when embeddings of two phones show high similarity, it is not clear if it is an instance of paradigmatic similarity (a phonemic relationship) or syntagmatic similarity (a co-occurrence restriction).
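The correlation between featural and contextual similarity reported in Table 1 can be estimated roughly along the following lines. This is a sketch rather than the exact evaluation script: the dictionary features, mapping each phone to a set of distinctive feature values, is a hypothetical stand-in for the feature files used in the experiments, and the Jaccard index is used for featural similarity, although Hamming distance over binary feature vectors is equally possible.

from itertools import combinations
from scipy.stats import pearsonr, spearmanr

def jaccard(a, b):
    # featural similarity between two sets of feature values
    return len(a & b) / len(a | b)

def feature_vs_context_correlation(model, features):
    featural, contextual = [], []
    for p1, p2 in combinations(sorted(features), 2):
        if p1 in model.wv and p2 in model.wv:
            featural.append(jaccard(features[p1], features[p2]))
            contextual.append(float(model.wv.similarity(p1, p2)))  # cosine similarity
    return pearsonr(featural, contextual)[0], spearmanr(featural, contextual)[0]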

Asr and Jones (2017) use an artificial language experiment to study the difference in performance of word embeddings between paradigmatic and syntagmatic tasks. In Section 3, we propose a similar approach to study word2vec's ability to learn different kinds of phonological patterns. While natural language phonology can be complex, with many interleaved phenomena, artificial language phonology makes it possible to test the learning of each pattern independently. In addition, previous work on phonological learning such as Hayes and Wilson (2008) assumes that distinctive features exist a priori. In our experiments with artificial languages, we explore the possibility of deriving distinctive features from phone embeddings, which capture contextual distributions of phones.

3 Learning artificial phonology with word2vec

In this section, we present experiments with word2vec on learning artificial languages with different kinds of phonological relationships. The languages studied in this experiment are described below; a sketch of a simple word generator for Language 1 follows the list. The minimal word is bimoraic CVC. The maximum word length is set at three syllables. Word boundary is indicated using #.

1. Language 1 contains only open (CV) syllables in polysyllabic words. Monosyllabic words are all CVC. The set of possible consonants is p t k and the set of possible vowels is a e i o u.

2. Language 2 is the same as Language 1 with the difference that intervocalic consonants are voiced: b d g instead of p t k. In other words, there is allophonic variation within the class of consonants.

3. Language 3 is the same as Language 2 with the following differences: final syllables in polysyllabic words are optionally closed, that is, codas are allowed. Word-initial consonants are aspirated, P T K. Word-final consonants are voiceless p t k. Thus, an additional degree of allophony for consonants is introduced.

4. Language 4 is the same as Language 3 with the addition of nasal codas: m n N (ŋ) in all syllables. In the final syllable, the nasal and the voiceless stop form a coda cluster.

5. Language 5 is the same as Language 4 with the difference that nasal codas are optional. This language is the union of Languages 3 and 4.

6. Language 6 is the same as Language 5 with a restriction on nasal codas based on the place of articulation of the following voiced consonant. In other words, only mb nd Ng combinations are allowed.

7. Language 7 is the same as Language 6 with the addition that r is optionally allowed following a voiced consonant. In other words, onset clusters br dr gr are permitted in medial syllables.

8. Language 8 is the same as Language 7 with the addition that a sibilant s is optionally allowed in the coda position of the final syllable. This language allows a variety of contexts in the final syllable: voiceless stops, nasals and nasal+stop clusters, the sibilant s, sibilant+stop clusters sp st sk and also nasal+sibilant+stop clusters.

9. Language 9 is the same as Language 8 with the restriction that the nasal + sibilant + voiceless stop cluster in coda position must be homorganic: only nst is allowed.

10. Language 10 is the same as Language 9 with the restriction that only high vowels i u can occur in initial syllables.

11. Language 11 is the same as Language 10 with the difference that it has vowel harmony with respect to backness. Thus, words can only have either [−back] (front) vowels i e or [+back] vowels u o.

12. Language 12 is the same as Language 11 with the difference that the transparent vowel a is permitted in non-initial syllables of polysyllabic words.
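As noted above, the following is a rough sketch of a word generator for Language 1 under this description. It is illustrative only; the actual corpus-generation scripts (available from the repository in footnote 2) may differ in details such as the distribution over word lengths.

import random

CONSONANTS = ['p', 't', 'k']
VOWELS = ['a', 'e', 'i', 'o', 'u']

def language1_word():
    # Monosyllabic words are CVC (bimoraic); polysyllabic words have only open CV syllables.
    n_syllables = random.randint(1, 3)
    if n_syllables == 1:
        return random.choice(CONSONANTS) + random.choice(VOWELS) + random.choice(CONSONANTS)
    return ''.join(random.choice(CONSONANTS) + random.choice(VOWELS) for _ in range(n_syllables))

# Words are delimited by # and fed to word2vec as phone sequences.
corpus = [['#'] + list(language1_word()) + ['#'] for _ in range(5000)]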

Phone embeddings were induced using the same parameters as in the previous section: CBOW, dimensionality 30, context window 4, negative sampling 3, minimum count 5 and learning rate 0.05. The number of words in each language is shown in Table 3, alongside the correlations between distinctive feature space and embedding space. A set of distinctive features similar to those of Hayes and Wilson (2008) is used to estimate these correlations. Since the value of cosine similarity is bounded on [−1, 1], we also use Euclidean distance to estimate the correlation between contextual similarity based on phone embeddings and featural similarity. We will return to the issue of the significance of these correlations shortly.

Language       Size      Pearson's r
                         Cosine    Euclidean
Language 1     3645      0.873     0.882
Language 2     3645      0.632     0.408
Language 3     14445     0.573     0.396
Language 4     372780    0.477     0.362
Language 5     878625    0.470     0.354
Language 6     139635    0.503     0.343
Language 7     549135    0.500     0.305
Language 8     988455    0.394     0.263
Language 9     878625    0.421     0.254
Language 10    351450    0.481     0.286
Language 11    57690     0.476     0.277
Language 12    127962    0.430     0.209

Table 3: Correlation between embedding and distinctive feature space, all values significant at p < 0.01

As can be noticed from the descriptions, each language defines different sets of equivalence relations among phones based on the contexts in which they occur. For example, in Language 3, aspirated stops occur word-initially, voiced stops occur intervocalically and voiceless stops occur word-finally. The task of phonology is to capture generalizations about these natural classes. Notice that although these natural classes are based on phonetic features such as aspiration and voicing, word2vec has no access to these features. The goal of our experiments is to investigate the extent to which these natural classes can be inferred solely on the basis of phone embeddings. The embedding space for each language is visualized using T-distributed Stochastic Neighbor Embedding (t-SNE) plots. Multiple plots were generated for different values of perplexity and learning rate using the implementation in the scikit-learn toolkit (Buitinck et al., 2013). The plots shown in Figure 2 correspond to perplexity 3 and learning rate 100. In addition, phone clusters derived using agglomerative clustering of cosine similarities between phone embeddings are also shown. Euclidean distance was used to plot the dendrogram heatmaps.³ A sketch of this visualization pipeline is given below.

³ The interpretation of these distance-based heatmaps differs from the cosine similarity-based heatmap of Wargamay presented in the previous section.

From the plots, we observe that phone embeddings capture the different context classes with varying degrees of success. Languages 1-3 were designed with unique contexts for each class of phones and the embeddings show clear separation between these classes. In Languages 4-5, where nasal codas are allowed, the t-SNE plot shows less separation between nasal codas and word-initial aspirated voiceless stops. This is due to the fact that in monosyllabic words, aspirated stops and nasals co-occur within the same context (bimoraic) window. This is an unintended co-occurrence restriction learnt by word2vec. However, this pattern in monosyllabic words has no effect on the phone clusters in the dendrogram. Nasals and aspirated stops form separate clusters in the dendrogram. In Language 6, a co-occurrence constraint that nasal obstruent clusters be homorganic was introduced. Interestingly, the t-SNE plot for this language has nasals showing up with vowels. The syntagmatic relationship (co-occurrence restriction) between nasals and homorganic voiced obstruents introduced in this language is not seen in the t-SNE plot of the embedding space. But the dendrogram heatmap for this language shows nasals and voiced obstruents forming a high-level cluster. It is plausible that with hyperparameter tuning, co-occurrence restrictions such as nasal-voiced obstruent clusters are captured even in the t-SNE plots of embedding space. Co-occurrence restrictions in phonology are much more rigid than word relatedness, since the size of the phone inventory in a language is far smaller than the size of the vocabulary.

A similar pattern is observed with Languages 7, 8 and 9, where other kinds of co-occurrence relations between consonants are introduced. The t-SNE plot for Language 7 fails to capture the onset clusters br dr gr introduced in this language. The lateral r shows up with the word boundary. The dendrogram for this language fails to recover word-initial aspirated stops as a separate class. In Language 8, the introduction of the optional sibilant in the coda position of the final syllable has a similar effect on the embedding space, as visualized by the t-SNE plot: nasals, aspirated stops, the lateral, the sibilant and the word boundary are less separated in the t-SNE plot. In the dendrogram plot, the sibilant forms a cluster with the nasals and the word boundary. Both the t-SNE and dendrogram plots for Language 9 are almost identical to those of Language 8, indicating that the homorganic restriction on nasal-sibilant-voiceless stop clusters in the final syllable has no effect on the embedding space. In other words, phone embeddings are unable to learn these co-occurrence restrictions.
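The visualization pipeline described above can be sketched roughly as follows, assuming a trained Gensim model as in the earlier sketch; the average-linkage choice for the dendrogram is one reasonable option, and the exact plotting code behind Figure 2 may differ.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

def plot_embedding_space(model):
    phones = list(model.wv.index_to_key)              # all phone symbols in the vocabulary
    vectors = np.array([model.wv[p] for p in phones])

    # 2-D t-SNE projection with the settings reported for Figure 2
    coords = TSNE(n_components=2, perplexity=3, learning_rate=100).fit_transform(vectors)
    plt.scatter(coords[:, 0], coords[:, 1])
    for p, (x, y) in zip(phones, coords):
        plt.annotate(p, (x, y))

    # agglomerative clustering of Euclidean distances between phone embeddings
    plt.figure()
    dendrogram(linkage(pdist(vectors, metric='euclidean'), method='average'), labels=phones)
    plt.show()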

[Figure 2: Embedding space of artificial languages. t-SNE plots and agglomerative clustering dendrogram heatmaps of phone embeddings for Languages 1-12.]

Languages 10-12 introduce contextual restrictions on vowels. In Language 10, only high vowels occur in the word-initial position and phone embeddings capture this distinct class of vowels, as shown by the dendrogram heatmap. Languages 11 and 12 show a similar pattern with respect to a different feature, backness. Both of them are harmony languages, which still obey the constraint that vowels in initial syllables must be [+high]. Interestingly, vowels cluster with respect to [±back] rather than [±high], as can be seen from the plots. Evidence for agreement between vowels with respect to backness is three times more frequent than the evidence for agreement with respect to height between vowels in the initial syllable. Although vowel harmony is also an instance of a co-occurrence restriction (a syntagmatic relationship), word2vec infers these classes accurately. The number of vowels in a language tends to be much lower than the number of consonants, and therefore a co-occurrence restriction between vowels covers a relatively larger sample of the set of all possible vowel sequences (5 × 5 × 5 = 125 in this language) compared to a co-occurrence restriction between two or more consonants. The transparent vowel a has no effect on the distances between the other vowels in Language 12.

The ability of phone embeddings to learn phonology in our artificial language experiments can be summarized as follows:

1. Phone embeddings are able to capture paradigmatic relationships among phones very well. For example, word-initial aspirated stops, intervocalic voiced stops, word-final voiceless stops and vowels are recovered as separate classes in most languages.

2. Phone embeddings are also able to capture positional restrictions as well as co-occurrence restrictions on vowels, as shown by Languages 10-12.

3. Phone embeddings are not able to capture co-occurrence restrictions among consonants such as homorganic nasal-voiced obstruent clusters, voiced obstruent-lateral clusters and homorganic nasal-sibilant-voiceless stop clusters. This observation is similar to one reported in the distributed semantic literature, namely that word embeddings capture similarity better than relatedness (Asr et al., 2018). Based on insights from the word embedding literature, context embeddings, denoted by the hidden-to-output layer weight matrix, are supposed to be able to capture syntagmatic relationships like co-occurrence restrictions better. In addition, it is plausible that these co-occurrence restrictions among consonants can be learnt using autosegmental tier-based representations. We leave this investigation to future work.
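As a starting point for such an investigation, the context (output-layer) vectors are directly accessible from the trained model. The sketch below is not part of the experiments reported here; it assumes a Gensim model trained with negative sampling as above, in which the hidden-to-output weights are stored in the syn1neg attribute.

import numpy as np

def context_vector(model, phone):
    # output-layer (context) embedding of a phone, as stored by Gensim for negative sampling
    return model.syn1neg[model.wv.key_to_index[phone]]

def input_context_similarity(model, p1, p2):
    # cosine similarity between the input vector of p1 and the context vector of p2,
    # a rough probe of syntagmatic (co-occurrence) affinity between the two phones
    v, c = model.wv[p1], context_vector(model, p2)
    return float(np.dot(v, c) / (np.linalg.norm(v) * np.linalg.norm(c)))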
4 Distinctive Features and Phoneme Induction

The main argument of this paper is that phone embeddings should be evaluated in terms of their ability to capture phonological relationships. Applying this bottom-up approach to natural language phonology is not straightforward, since the full set of phonological relationships is not known beforehand. Even the method of evaluating phone embeddings using the correlation between distinctive feature space and phone embedding space, as mentioned earlier, presupposes that the gold standard specification of distinctive features for that particular language is known. However, this is seldom the case. Natural languages are highly complex, with processes such as borrowing, loanword adaptation and language changes such as drift. This is why experimenting with artificial phonology can be informative.

The artificial languages in our experiment had increasing levels of complexity, since the goal was to tease apart the learnability of different phenomena. Recall that a fixed set of distinctive features along the lines of Hayes and Wilson (2008) was used to estimate the correlation between distinctive feature space and phone embedding space. Notice in Table 3 that the value of this correlation goes down as we move from Language 1 to Language 12, regardless of the distance metric used to estimate distance between embeddings. Unlike the cross-linguistic comparison in Section 2, the distinctive features are the same across languages. We observe that as the size of the phone inventory and the number of distinct context classes increase, the degree of correlation between feature space and embedding space decreases. How can this trend be accounted for? Examining the distances in the clustermaps, we observe that as the number of context classes goes up, intra-phone distances, especially among the class of consonants, tend to increase. This can be noticed by comparing the clusters corresponding to voiceless consonants and vowels between Language 1 and Language 12.

Given the continuous-space nature of phone embeddings and the dimensionality reduction property of word2vec, this is expected. When the weights of the neural network corresponding to a particular phone or phone sequence are adjusted, the changes affect similar items (Mikolov et al., 2013b). This inverse "dispersion" effect is also relevant to the correlation between distinctive feature space and embedding space: the value of featural distance between phones is constant across languages when estimated using a fixed distinctive feature representation. But as the number of context classes increases, distances between phone embeddings increase, and the cumulative effect on the correlation between phonetic space and embedding space is downward. Thus, this correlation value clearly cannot be used as an evaluation metric for cross-linguistic comparison. Even within a language, a higher correlation value does not necessarily indicate better learning of phonology/phonetics. Rather, it indicates a low inverse dispersion effect. One way to interpret the results of Silfverberg et al. (2018, p. 140) is that phone classes based on context are much less spread out in embedding space when learnt using a supervised RNN compared to word2vec. At best, this can be interpreted as a difference in the dimensionality reduction properties of the two techniques.

This also raises an interesting question about the degree of specification of phones. Phonologists assume a language-independent feature specification of phones. The results of our experiments suggest the following possibility: could the granularity of feature specification be dependent on how separable the different classes of phones are in embedding space? In other words, do learners infer distinctive features of phones based on the contexts in which they occur? If certain phone classes can be inferred purely on the basis of context, the phonetic features that distinguish these classes can be underspecified. For example, in Language 10, the difference between high and non-high vowels in a language could be inferred based on context. For such a language, is it necessary to include height ([±high]) as a distinctive feature? Intuitively, the task of distinctive feature induction is related to phoneme induction.

A quantitative approach to phoneme induction based on phone embeddings and phonetic features can be outlined as follows. If embeddings of two phones show low similarity (or high distance), their contexts are very different. If the phones show a high degree of phonetic similarity, then this is very likely to be a case of allophony. If embeddings of two phones show a high degree of similarity (or low distance), then their contexts are very similar. If the phones show a low degree of phonetic similarity, these are clearly two distinct phonemes in the language. If the phones also show a high degree of phonetic similarity, then this could be either an instance of a phonemic relationship or a co-occurrence restriction. The feature specifications of such phones can be compared to discover distinctive features of phonology. If no such feature is found, it means the default phonetic feature specification is too coarse-grained. If more than one distinctive feature is found, the feature specification is too fine-grained. The exact feature corresponding to the contrast between two phones can be discovered by iterating over the full set of features of the two phones and checking if leaving out a particular feature leads to a drop in the overall correlation between distinctive feature space and embedding space. These ideas are illustrated by the plots in Figures 3 and 4. Figure 3 shows a scatter plot of phone pairs along the phonetic distance-contextual distance axes for Language 3 in the artificial language experiment. Allophonic phone pairs such as P-p, p-b, T-t, t-d, K-k, k-g, etc. show up at the top left corner of the scatter plot.

[Figure 3: Contextual distance versus Phonetic distance. Scatter plot of phone pairs for Language 3, with phonetic distance on the x-axis and contextual distance on the y-axis.]
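A rough sketch of this procedure for a single language is given below, assuming the trained model for Language 3 and a hypothetical dictionary features of phonetic feature values per phone; the ratio computed at the end corresponds to the allophonic index introduced next.

from itertools import combinations
import numpy as np

def phonetic_distance(features, p1, p2):
    # Jaccard distance over phonetic feature sets
    return 1.0 - len(features[p1] & features[p2]) / len(features[p1] | features[p2])

def contextual_distance(model, p1, p2):
    # Euclidean distance between phone embeddings
    return float(np.linalg.norm(model.wv[p1] - model.wv[p2]))

def allophonic_index(model, features):
    index = {}
    for p1, p2 in combinations(sorted(features), 2):
        if p1 in model.wv and p2 in model.wv:
            pd = phonetic_distance(features, p1, p2)
            cd = contextual_distance(model, p1, p2)
            if pd > 0:
                index[(p1, p2)] = cd / pd
    # the highest-scoring pairs are the most allophone-like (e.g. P-p, T-t, K-k in Language 3)
    return sorted(index.items(), key=lambda kv: kv[1], reverse=True)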

[Figure 4: Allophonic index derived from embeddings. Sorted bar plot of contextual distance / phonetic distance for phone pairs in Language 3.]

The phonetic feature specifications of these pairs can be compared to discover that voicing and aspiration are not phonemic in this language. Similarly, the phone pairs that show up at the bottom left corner of this plot, such as the 10 pairs of vowels and P-T, P-K, K-T, p-t, t-k, p-k, b-d, d-g and g-b, are all phonemic contrasts. The phonetic specifications of these phone pairs can be compared to discover that both height and backness are contrastive for vowels and that place of articulation is contrastive for consonants. The remaining phone pairs, in the top right corner of the scatter plot, are all phonemic contrasts. However, they might not yield any new distinctive features. The bar plot in Figure 4 is another way of visualizing the usefulness of distances between phone embeddings for identifying phonemic versus allophonic relationships. We define the allophonic index as the ratio of contextual distance, estimated using phone embeddings, to phonetic distance. The higher the value of this index for a phone pair, the more likely the pair is to be allophonic. The sorted bar plot in Figure 4, corresponding to artificial Language 3, shows allophonic pairs at the right edge and phonemic pairs at the left edge. A precise formulation of a phoneme/distinctive feature induction algorithm based on these metrics is reserved for future work.

5 Conclusions and Future work

This paper presents a discussion of the evaluation of phone embeddings. Artificial language experiments are used to study word2vec's ability to learn different kinds of phonological relationships. The results show that phone embeddings are able to capture phonemic and allophonic relationships quite well. Phone embeddings are also able to capture co-occurrence restrictions among vowels found in harmony languages. Phone embeddings do not perform well on capturing co-occurrence restrictions among consonants. The experimental results also show an interesting correlation between the size and complexity of the phone inventory and the magnitude of inter-phone distances based on phone embeddings. An analysis of the limitations of the correlation between embedding space and distinctive feature space as a way to evaluate phone embeddings for their learning of phonology is also provided. The analytical framework presented here and the proposal for distinctive feature induction will be developed in future work and can be applied to diverse problems ranging from bootstrapping pronunciations of OOV words in ASR to modeling historical phonology. A similar analysis of sound analogies is required to better understand their significance to phonology.

6 Acknowledgements

We thank Giorgio Magri and Mark Steedman for useful comments and discussion. Thanks are also due to the anonymous reviewers for their very useful feedback.

References

Fatemeh Torabi Asr and Michael Jones. 2017. An artificial language evaluation of distributional semantic models. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 134-142. Association for Computational Linguistics.

Fatemeh Torabi Asr, Robert Zinkov, and Michael Jones. 2018. Querying word embeddings for similarity and relatedness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 675-684. Association for Computational Linguistics.

Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238-247. Association for Computational Linguistics.

Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. 2013. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122.

Bruce Hayes and Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry, 39(3):379-440.

Roman Jakobson, C. Gunnar Fant, and Morris Halle. 1951. Preliminaries to Speech Analysis: The Distinctive Features and Their Correlates. MIT Press.

Peter Ladefoged and Keith Johnson. 2010. A Course in Phonetics. Thomson Wadsworth, Boston.

Omer Levy, Yoav Goldberg, and Ido Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211-225.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013b. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746-751.

Allison Parrish. 2017. Poetic sound similarity vectors using phonetic features. In Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543. Association for Computational Linguistics.

Radim Řehůřek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45-50, Valletta, Malta. ELRA. http://is.muni.cz/publication/884893/en.

Miikka P. Silfverberg, Lingshuang Mao, and Mans Hulden. 2018. Sound analogies with phoneme embeddings. In Proceedings of the Society for Computation in Linguistics (SCiL) 2018, pages 136-144.

Pavel Sofroniev and Çağrı Çöltekin. 2018. Phonetic vector representations for sound sequence alignment. In Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 111-116.