Exploring Phonetic Realization in Danish by Transformation-Based Learning
Marcus Uneson*, Ruben Schachtenhaufen**
Lund University*, Copenhagen Business School**

Abstract

We align phonemic and semi-narrow phonetic transcriptions in the DanPASS corpus and extend the phonemic description with sound classes and with traditional phonetic features. From this representation, we induce rules for phonetic realization by Transformation-Based Learning (TBL). The rules thus learned are classified according to relevance and qualitatively evaluated.

Introduction

Language abounds with classification tasks: some we solve ourselves, some we hand over to machines. In the latter case, we may or may not be interested in what the machine actually learns. Stochastic classifiers such as HMMs and SVMs are useful for many purposes, but their target representation is usually inscrutable to humans. Rule learners, on the other hand, may or may not match stochastic classification performance, but what they learn may be interesting in itself; sometimes, it might even be the main point.

In the present paper, we explore one of those cases: the application of a well-known rule induction technique, Transformation-Based Learning (Brill, 1995), to phonetic string representations. The problem can be phrased thus: given a phonemic and a semi-narrow phonetic transcription of speech, can we extract transformation rules which will take the first to the second, or at least part of the way? If so, do these rules give us any new insights? Somewhat less abstractly, our aim is to automatically induce typical textbook rules for phonetic realization from a transcribed, real-world corpus of spontaneous, connected speech. The language under study is Danish, where, arguably, the distance between these two representations is particularly noteworthy.

Background

Danish phonology

Grønnum (2005) analyzes Danish phonology into 11 vowel phonemes (/i e ɛ a y ø œ u o ɔ ə/) and 15 consonant phonemes (/m n p t k b d g f v s h l r j/), plus the prosodic elements stød, length, and stress. Briefly, most non-high vowels are realized more open before and/or after /r/, and some consonants are realized differently depending on syllable position: /p t k/ are aspirated in onset and unaspirated in coda; /d g v r/ are contoid in onset and vocoid or ∅ in coda.

The realization of /ə/ is quite complex. More often than not it is elided, leaving its syllabic trait and compensatory lengthening on adjacent sounds. The combination of consonant gradation in coda and a very fleeting /ə/ results in a highly unstable sound structure in current Danish.

In traditional descriptions, which are based mainly on conservative, careful, read speech, Danish phonemes typically have one or two, rarely three, allophones, e.g. “/d/ > [ð] in coda, [d] elsewhere”. In spontaneous speech, phonemes have a much wider range of realization; for instance, in DanPASS (see below), /d/ is transcribed [d ð ɾ ɹ t z s], among others.

Transformation-based learning

Transformation-based learning (TBL) was proposed by Eric Brill (Brill, 1995). It is, in a one-sentence summary, a supervised machine learning method producing a compact, ordered, human-readable list of classification rules (or transformations), each chosen greedily from a set of candidates dynamically calculated from user-supplied patterns (the templates), so that it maximally reduces (a function of) the difference between the system’s present idea of the classification (the current corpus) and a gold standard (the truth). One sentence is likely not enough; we refer to Brill (1995).

The task at hand is somewhat reminiscent of letter-to-sound (LTS) conversion, to which TBL has also been applied (Bouma, 2000). Abstractly, both problems concern transforming one string representation of language into another. One major difference is that LTS aims at lexical pronunciation: it usually has a well-defined target. Phonetic realizations, by contrast, have several influencing factors but few truly functional dependencies. In this paper we pay more attention to the extracted rules themselves than to how close to the (partly arbitrary) target they take us.

The present study

On transcriptions

Although historically much used, the method of taking transcriptions as the point of departure for phonetic conclusions is not without its problems. Transcriptions imply a simplistic and much reduced ’beads-on-a-string’ view of speech, often with weak support in data which have not been filtered through the perception of a native speaker. In the words of Grønnum (2009), “phonetic notation, specifically of the rather narrow kind, and prosodic labeling are both impressionistic exercises”. For the purposes of this paper, however, we will accept this armchair view.

The DanPASS corpus

Our data is certainly not armchair; it was taken from the DanPASS corpus [1] (Danish Phonetically Annotated Spontaneous Speech) (Grønnum, 2009). In total, the corpus comprises about 10 hours (73k words) of annotated high-quality recordings of connected speech produced by 27 speakers, distributed among several tasks in non-scripted monologue and dialogue. DanPASS addresses no particular research need specifically, but is generally well suited for studying phenomena associated with connected, spontaneous speech. With the exception of a small fraction of non-spontaneous speech (elicited word lists), we used all of it.

The phonemic transcription in DanPASS is based on the analysis of Grønnum (2005) mentioned above. The annotations of DanPASS are available as Praat tiers. The ones of concern here are the phonemic notation and the semi-narrow phonetic notation. Figure 1 shows a small corpus sample for these.

    orthographic: der igen er i midten
    phonemic:     deːʔr_iˈgɛn ɛr_iːʔ ˈmetən
    phonetic:     daɪˈgɛn ˈaɪ ˈmedn̩

Figure 1: DanPASS phonemic and phonetic tiers for der igen er i midten ’that again is in the middle’

Experimental setup

We employed the µ-TBL system (Lager, 1999), with the semi-narrow transcription tier taken as truth and the phonemic tier as the initial current corpus. TBL requires that the current corpus and the truth are containers of the same shape, which in the present case requires alignment of the transcriptions; for this task, we used the sound class alignment method proposed by List (2010) [2]. Since our interest lies in rules which apply with few or no exceptions, all rules were required to have a minimum accuracy of 0.95.

The problem encoding required more consideration. Rules should of course be conditioned on the immediate phonetic context. Importantly, however, the learner should also be capable of at least simple generalizations: if rule R applies in phonetic environment A, and phonetic environment B is “similar” to A, then maybe R applies in B as well? One way to operationalize the notion of similarity is to partition the phoneme set into predefined sound classes; another is to allow subphonemic descriptions. On a closer look, these are actually not very different: they both define characteristic functions and allow a rule learner to construct predicates on a given environment.

In the experiments described below, the sound classes follow a suggestion by Dolgopolsky, as adapted and extended by List (2010). In principle, the entire IPA space is partitioned into the classes in Table 1. The subphonemic description of a phoneme is simply its associated features in the traditional sense, treated as sets. A sample of the corpus thus encoded (which also exemplifies the alignment) is given in Table 2 [3].

The TBL templates chosen to operate on these features are given in condensed form below. An additional constraint was that an elided segment would not be subject to further changes.

Change segment A to segment B when …

• …(left/right) (segment/segment class) is X;
• …left (segment/segment class) is X and right (segment/segment class) is Y;
• …(left/right) (segment/segment class) is X and the next neighbour (segment/segment class) is Y;
• …(left/right) segment has feature F;
• …current segment shares feature F with (left/right/left and right) segment.

[1] http://danpass.dk
[2] http://lingulist.de/lingpy/
[3] It is worth noting that µ-TBL permits Prolog code as part of the template specifications, thus forming a little embedded language. Whether the class and phonetic features of Table 2 were prespecified or calculated dynamically only influences running time and memory use, not semantics. This is very useful in interactive experimentation.

Table 1: Dolgopolsky sound classes for phonological rules, as adapted by List (2010)

    Code  Segment                                   Example
    P     labial obstruents                         p,b,f
    T     dental obstruents                         d,t,θ,ð
    S     sibilants                                 s,z,ʃ,ʒ
    K     velar obstr.; dent. & alv. affricates     k,g,ʦ,ʧ
    M     labial nasal                              m
    N     remaining nasals                          n,ɲ,ŋ
    R     liquids                                   r,l
    W     voiced labial fric.; init. rd. vowels     v,u
    J     palatal approximant                       j
    H     laryngeals and initial velar nasal        h,ɦ,ŋ
    A     all vowels                                a,e,i

Table 2: The DanPASS sample of Figure 1, where phonemes are encoded with their identity (phm), class (cls), and features. Implicit time axis runs from top to bottom. The two left columns also show the resulting alignment of the phonetic (pht) and phonemic transcription.

    Pht  Phm  Cls  Features
    d    d    T    [’voiced’, ’alveolar’, ’plosive’]
    aɪ   eːʔ  A    [’length-mark’, ’plosive’, ’glottal’, ’front’, ’unrounded’, ’close-mid’]
    -    r    R    [’voiced’, ’alveolar’, ’trill’]
    -    i    A    [’front’, ’close’, ’unrounded’]
    g    g    K    [’voiced’, ’velar’, ’plosive’]
    ɛ    ɛ    A    [’front’, ’open-mid’, ’unrounded’]
    n    n    N    [’alveolar’, ’nasal’]
    aɪ   ɛ    A    [’front’, ’open-mid’, ’unrounded’]
    -    r    R    [’voiced’, ’alveolar’, ’trill’]
    -    iːʔ  A    [’length-mark’, ’plosive’, ’glottal’, ’unrounded’, ’front’, ’close’]
    m    m    M    [’nasal’, ’bilabial’]
    e    e    A    [’front’, ’unrounded’, ’close-mid’]
    d    t    T    [’voiceless’, ’alveolar’, ’plosive’]
    -    ə    A    [’schwa’]
    n̩    n    N    [’alveolar’, ’nasal’]

Results and discussion

From a training material of 210,000 phonemes and at a score threshold of 10, the system learned 446 rules in about six hours.
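To make the setup concrete, the greedy TBL loop and the simplest template family (“left/right segment or segment class is X”) can be sketched in a few lines of Python. This is an illustrative toy, not the µ-TBL system used in the experiments: the names `learn`, `matches`, `apply_rule`, the boundary symbol `#`, and the small class map are hypothetical. It also makes several simplifying assumptions: rule contexts are read from the fixed phonemic tier, score is counted as good minus bad applications, and accuracy as good / (good + bad). The data is the aligned sample of Table 2.

```python
ELIDED = "-"    # alignment gap: phoneme with no phonetic counterpart
BOUNDARY = "#"  # hypothetical symbol for "no neighbour on this side"

# Toy subset of the Table 1 sound classes (assumed mapping for this sample).
SOUND_CLASS = {"d": "T", "t": "T", "g": "K", "m": "M", "n": "N", "r": "R",
               "e": "A", "eːʔ": "A", "ɛ": "A", "i": "A", "iːʔ": "A", "ə": "A"}

def neighbour(phonemes, i, side):
    """Phoneme to the left or right of position i, or BOUNDARY."""
    j = i - 1 if side == "left" else i + 1
    return phonemes[j] if 0 <= j < len(phonemes) else BOUNDARY

def describe(seg, kind):
    """The segment itself (kind 'seg') or its sound class (kind 'cls')."""
    return seg if kind == "seg" else SOUND_CLASS.get(seg, "?")

def matches(rule, phonemes, current, i):
    """Rule = (source, target, side, kind, value)."""
    src, _tgt, side, kind, value = rule
    return current[i] == src and describe(neighbour(phonemes, i, side), kind) == value

def apply_rule(rule, phonemes, current):
    for i in range(len(current)):
        if matches(rule, phonemes, current, i):
            current[i] = rule[1]

def learn(phonemes, truth, min_score=2, min_accuracy=0.95):
    current, rules = list(phonemes), []
    while True:
        # Instantiate the templates at every error site; elided
        # segments are never changed again (constraint from the text).
        candidates = set()
        for i, (cur, gold) in enumerate(zip(current, truth)):
            if cur != gold and cur != ELIDED:
                for side in ("left", "right"):
                    for kind in ("seg", "cls"):
                        value = describe(neighbour(phonemes, i, side), kind)
                        candidates.add((cur, gold, side, kind, value))
        # Greedily pick the candidate with the highest score.
        best, best_score = None, 0
        for rule in sorted(candidates):
            sites = [i for i in range(len(current))
                     if matches(rule, phonemes, current, i)]
            good = sum(truth[i] == rule[1] for i in sites)     # errors fixed
            bad = sum(truth[i] == current[i] for i in sites)   # correct items broken
            if good / (good + bad) >= min_accuracy and good - bad > best_score:
                best, best_score = rule, good - bad
        if best is None or best_score < min_score:
            return rules
        apply_rule(best, phonemes, current)
        rules.append(best)

phonemic = ["d", "eːʔ", "r", "i", "g", "ɛ", "n", "ɛ", "r", "iːʔ", "m", "e", "t", "ə", "n"]
phonetic = ["d", "aɪ", "-", "-", "g", "ɛ", "n", "aɪ", "-", "-", "m", "e", "d", "-", "n̩"]

for rule in learn(phonemic, phonetic):
    print("%s > %s when %s %s is %s" % rule)
```

On this fifteen-segment sample and with a score threshold of 2, the sketch learns a single rule, eliding /r/ when the left segment class is A (a vowel); lowering the threshold to 1 yields rules that merely memorize individual sites, which illustrates why the score and accuracy thresholds matter on real data.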