Deriving Word Prosody from Orthography in Hindi
Somnath Roy
Centre for Linguistics
Jawaharlal Nehru University
New Delhi-110067

S Bandyopadhyay, D S Sharma and R Sangal. Proc. of the 14th Intl. Conference on Natural Language Processing, pages 2–12, Kolkata, India. December 2017. © 2016 NLP Association of India (NLPAI)

Abstract

This study proposes a word prosody converter (WPC), which takes Hindi graphemes as input and outputs a sequence of phonemes with syllable boundaries and stress marks. The WPC has two submodules connected in a linear fashion. The first submodule is a grapheme to phoneme (G2P) converter. The output of the G2P converter is fed to the second submodule, which handles the prosody-specific tasks. The second submodule consists of two finite state machines (FSMs). The first FSM does the syllabification and the second assigns prosodic labels to the syllabified strings. The prosodic labels are translated into stressed and unstressed components using rules specific to the language. This study proposes a novel rule-based system which uses non-linear phonological rules, with the provision of recursive foot structure, for G2P conversion and prosodic labeling. The implementation¹ of the proposed rules outperforms G2P models trained with state-of-the-art data-driven techniques such as the joint sequence model (JSM) and LSTM.

¹ https://github.com/somnat/Hindi-Word-Prosody-Hindi-G2P

1 Introduction

A dictionary is an essential component of a text-to-speech (TTS) and an automatic speech recognition (ASR) system. These systems are open in nature and can receive an input word which is not present in the dictionary. Such input words are called out-of-vocabulary (OOV) words. Therefore, a G2P converter is required, which can generate the pronunciation of the OOV words. A G2P converter can be a rule-based or a data-driven system. A rule-based G2P converter relies on expert knowledge (i.e., the rule-set designed by an expert). However, these rule-sets may not be exhaustive enough to capture many language-specific properties such as word morphology and stress pattern (Pagel et al., 1998). Therefore, researchers nowadays rely on state-of-the-art machine learning (data-driven) techniques for developing a G2P model. A data-driven system is trained using a manually annotated dataset, which contains words and their phonemic sequences. These datasets are language specific in nature. The machine learning algorithm learns the phonemic sequence for words based on probabilistic or geometric calculations. These calculations vary across machine learning approaches. In data-driven approaches, one need not worry about language-specific complexities such as word morphology and stress pattern. The algorithm automatically captures these patterns in the generated model.

A data-driven G2P conversion process is broadly categorized into three subprocesses: i) sequence alignment, ii) model training and iii) decoding (for details, see Novak et al., 2012). Many data-driven techniques are available for G2P conversion. The important ones are decision tree (Black et al., 1998), Conditional Random Field (Wang and King, 2011), Hidden Markov Model (Taylor, 2005), Joint-Sequence techniques (Bisani and Ney, 2008) and Recurrent Neural Network (Rao et al., 2015).

The function of a word prosody model is similar to that of a grapheme to phoneme (G2P) converter. In addition, it describes syllable boundaries and predicts the stressed syllables in a word. The schematic diagram of the word prosody model is shown in Fig 1. The accuracy of a word prosody module for Hindi depends on an efficient solution of two sub-problems well known in Hindi phonology: schwa deletion and the pronunciation of the diacritic marks anusvara and anunasika (Ohala, 1983; Pandey, 1989; Pandey, 1990; Narasimhan et al., 2004; Pandey, 2014). Ohala used linear phonological rules to derive the surface phonemic form. Pandey showed the superiority of non-linear phonological rules over linear ones. The motivation for the current work is stated below.

Figure 1: Schematic Diagram of Word Prosody Model

a. In the past, Hindi G2P converters were implemented in the context of speech synthesis (Bali et al., 2004), (Narasimhan et al., 2004) and (Choudhury, 2003). However, these works have given only partial attention to anusvara/anunasika disambiguation. (Pandey, 2014) describes it as the problem of Hindi orthography.

b. These G2P converters are based on the linear phonological rules proposed by (Ohala, 1983), with the exception of (Pandey, 2014). Non-linear phonological rules have advantages over linear ones, as explained below (Bernhardt and Gilbert, 1992).
i. Non-linear rules capture both prosodic and segmental information.
ii. The hierarchical representation used in the non-linear framework captures more information; this results in a compact rule set.

c. The syllable is known to be a better unit for Hindi speech synthesis (Bellur et al., 2011; Kishore and Black, 2003). Therefore, a Hindi text-to-speech (TTS) system needs an automatic syllabification module. The automatic syllabification would be more useful if it could also predict the stressed syllables in words of natural speech, as this would facilitate synthesis.

d. The usefulness of the syllable as the basic linguistic unit in the context of speech recognition has been explored in English (Ganapathiraju et al., 2001) and Tamil (Lakshmi and Murthy, 2006). Similar work for Hindi requires software for syllabification. This work fulfills that need.

1.1 Main Contributions

• The WPC does not require information about morphological boundaries. The proposed rules take into account the syllable patterns of compound, derived and inflected words.

• The syllabification and syllable labeling processes each follow a finite state machine. Faultless syllabification and syllable labeling at the underlying phonemic form yield better accuracy in schwa deletion and in the pronunciation of the diacritics anusvara and anunasika. Syllabification at the underlying phonemic form is called I-level syllabification in this work.

• The rules proposed in this study assume the extrametricality of the foot, unlike the extrametricality of the syllable proposed in (Pandey, 2014). The contention is that stress can be predicted elegantly using the notion of an extrametrical foot (McCarthy and Prince, 1990; Crowhurst, 1994). Also, the directionality is LR (left to right), unlike the RL (right to left) directionality used in (Pandey, 2014).

• Anusvara and anunasika are used interchangeably in Hindi. Therefore, both anusvara and anunasika are mapped to a hypothetical phoneme X at the underlying phonemic form. The decision between a homo-organic nasal consonant and a nasalized vowel for phoneme X is based on the minimum moraic weight difference between the syllable containing phoneme X and the next syllable. The moraic weight difference is calculated after schwa deletion and re-syllabification. The proposed mapping rule removes almost all of the pronunciation ambiguity related to anusvara and anunasika.

The rest of this paper is organized as follows. Section 2 describes the salient points of metrical phonology relevant to this work. Section 3 describes the process of syllabification and syllable labeling. Section 4 describes foot formation. Section 5 describes schwa deletion and re-syllabification. Section 6 describes the observations and rules for the anusvara and anunasika pronunciation. Section 7 describes the data-driven G2P systems implemented for Hindi. Section 8 compares the performance of the current system to data-driven systems and previous rule-based implementations. Section 9 describes the rules for the prediction of the stressed syllables and reports the accuracy of the current system for syllabification and stress prediction. The conclusion and limitations are given in Section 10.

2 Theoretical Background

Metrical phonology is based on a nonlinear arrangement of the constituents of a phrase (Liberman and Prince, 1977; Selkirk, 1980; Hayes, 1980; Selkirk, 1986; Hayes, 1995; Apoussidou, 2006). The non-linear arrangement is realized in the form of a tree whose nodes are the constituents of a phrase. The constituents are syllable, foot, phonological word, phonological phrase and intonational phrase. The syllable is the lowest unit in the hierarchy and is dominated by the foot, which in turn is dominated by the phonological word. The higher units, phonological phrase and intonational phrase, are not relevant in the current work (for clarity, see Figs. 4-9). The syllable functions as a domain for segmental phonological rules. In non-linear phonology, the rules are written on the basis of the interaction among syllables under the domain of higher constituents. A syllable has an obligatory rhyme and an optional coda. Syllables are also described by their moraic weight in quantity-sensitive languages (… and Kleinhenz, 1999).

3 I-Level Syllabification

I-level syllabification is derived from the underlying phonemic form (UPF), which in turn is derived from orthography using the following mapping rules.

i. Each consonant in Devanagari script is inherently associated with the mid-central vowel called schwa or its lower counterpart "a"².
ii. If a consonant is followed by a vowel diacritic mark, or by the diacritic called halant, the inherent schwa is deleted.
iii. The inherent schwa is not realized for a consonant at word-final position.
iv. Two or three consonants together can form a ligature.
v. A short vowel at word-final position is lengthened.

The following examples illustrate the derivation of UPF from orthography:

/kml/ → k@m@l (Lotus)
/kmAl/ → k@ma:l

The process of syllabification in Hindi was explored by (Ohala, 1983) and (Pandey, 1989; Pandey, 2014). Their analyses do not discuss the maximal onset principle for syllabification. The present analysis for syllabification follows the maximal onset principle (Selkirk, 1984; Selkirk, 1981). The maximal onset principle is a sufficiency condition, as demonstrated by the following examples.
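Before turning to those examples, the orthography-to-UPF mapping rules i, ii, iii and v of this section can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation released with the paper: the function name `to_upf`, the tiny symbol tables and the ASCII phoneme notation (`@` for schwa, `:` for length, as in the examples above) are all hypothetical, and independent vowels, anusvara and anunasika are deliberately left out.

```python
# Hypothetical, deliberately tiny symbol tables (assumption: a real system
# would cover the full Devanagari inventory).
CONSONANTS = {"क": "k", "म": "m", "ल": "l", "र": "r", "त": "t"}
MATRAS = {
    "\u093e": "a:",  # vowel sign AA
    "\u093f": "i",   # vowel sign I (short)
    "\u0940": "i:",  # vowel sign II (long)
    "\u0941": "u",   # vowel sign U (short)
    "\u0942": "u:",  # vowel sign UU (long)
}
SHORT_TO_LONG = {"i": "i:", "u": "u:"}
HALANT = "\u094d"
SCHWA = "@"


def to_upf(word: str) -> str:
    """Derive an underlying phonemic form from a Devanagari word."""
    phonemes = []
    last = len(word) - 1
    for i, ch in enumerate(word):
        if ch in CONSONANTS:
            phonemes.append(CONSONANTS[ch])
            if i == last:
                continue  # rule iii: no inherent schwa word-finally
            nxt = word[i + 1]
            if nxt in MATRAS or nxt == HALANT:
                continue  # rule ii: vowel sign or halant deletes the schwa
            phonemes.append(SCHWA)  # rule i: inherent schwa
        elif ch in MATRAS:
            vowel = MATRAS[ch]
            if i == last:
                vowel = SHORT_TO_LONG.get(vowel, vowel)  # rule v
            phonemes.append(vowel)
        elif ch == HALANT:
            continue  # already handled when the preceding consonant was read
        else:
            raise ValueError(f"symbol not covered by this sketch: {ch!r}")
    return "".join(phonemes)


print(to_upf("कमल"))   # k@m@l
print(to_upf("कमाल"))  # k@ma:l
```

Rule iv (ligatures) needs no separate branch here, because a conjunct is encoded as consonant + halant + consonant and therefore falls under rule ii. Schwa deletion in non-final positions and the treatment of the hypothetical phoneme X are later steps and are outside the scope of this sketch.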