INFORMATICA, 2019, Vol. 30, No. 3, 573–593 573 2019 Vilnius University DOI: http://dx.doi.org/10.15388/Informatica.2019.219 Comparison of Phonemic and Graphemic Word to Sub-Word Unit Mappings for Lithuanian Phone-Level Speech Transcription 1,4 2 Gailius RAŠKINIS ∗, Gintar˙ePAŠKAUSKAITE˙ , Aušra SAUDARGIENE˙ 1,3, Asta KAZLAUSKIENE˙ 1, Airenas VAIČIUNAS¯ 1 1 Vytautas Magnus University, K. Donelaičio 58, LT-44248, Kaunas, Lithuania 2 Kaunas University of Technology, K. Donelaičio 73, LT-44249, Kaunas, Lithuania 3 Lithuanian University of Health Sciences, Eiveniu˛4, LT-50161, Kaunas Lithuania 4 Recognisoft, Ltd., K. Donelaičio 79-1, LT-44249, Kaunas, Lithuania e-mail:
[email protected],
[email protected],
[email protected],
[email protected],
[email protected] Received: June 2018; accepted: May 2019 Abstract. Conventional large vocabulary automatic speech recognition (ASR) systems require a mapping from words into sub-word units to generalize over the words that were absent in the training data and to enable the robust estimation of acoustic model parameters. This paper surveys the research done during the last 15 years on the topic of word to sub-word mappings for Lithua- nian ASR systems. It also compares various phoneme and grapheme based mappings across a broad range of acoustic modelling techniques including monophone and triphone based Hidden Markov models (HMM), speaker adaptively trained HMMs, subspace gaussian mixture models (SGMM), feed-forward time delay neural network (TDNN), and state-of-the-art low frame rate bidirectional long short term memory (LFR BLSTM) recurrent deep neural network. Experimental comparisons are based on a 50-hour speech corpus.