Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights Devaraja Adiga1∗ Rishabh Kumar1∗ Amrith Krishna2
[email protected] [email protected] [email protected] Preethi Jyothi1 Ganesh Ramakrishnan1 Pawan Goyal3
[email protected] [email protected] [email protected] 1IIT Bombay, Mumbai, India; 2University of Cambridge, UK; 3IIT Kharagpur, WB, India Abstract grapheme-phoneme correspondences. Connected speech leads to phonemic transformations in utter- Automatic speech recognition (ASR) in San- acnes, and in Sanskrit this is faithfully preserved in skrit is interesting, owing to the various lin- guistic peculiarities present in the language. writing as well (Krishna et al., 2018). This is called The Sanskrit language is lexically productive, as Sandhi and is defined as the euphonic assimi- undergoes euphonic assimilation of phones at lation of sounds, i.e., modification and fusion of the word boundaries and exhibits variations in sounds, at or across the boundaries of grammatical spelling conventions and in pronunciations. In units (Matthews, 2007, p. 353). Phonemic orthog- this work, we propose the first large scale study raphy is beneficial for a language, when it comes to of automatic speech recognition (ASR) in San- designing automatic speech recognition Systems skrit, with an emphasis on the impact of unit (ASR), specifically for unit selection at both the selection in Sanskrit ASR. In this work, we release a 78 hour ASR dataset for Sanskrit, Acoustic Model (AM) and Language Model (LM) which faithfully captures several of the linguis- levels. tic characteristics expressed by the language. Regardless of the aforementioned commonali- We investigate the role of different acoustic ties preserved in both the speech and text in San- model and language model units in ASR sys- skrit, designing a large scale ASR system raises tems for Sanskrit.