Improving the Arabic Pronunciation Dictionary for Phone and Word Recognition with Linguistically-Based Pronunciation Rules

Fadi Biadsy∗ and Nizar Habash† and Julia Hirschberg∗

∗Department of Computer Science, Columbia University, New York, USA
{fadi,julia}@cs.columbia.edu
†Center for Computational Learning Systems, Columbia University, New York, USA
[email protected]

Abstract

In this paper, we show that linguistically motivated pronunciation rules can improve phone and word recognition results for Modern Standard Arabic (MSA). Using these rules and the MADA morphological analysis and disambiguation tool, multiple pronunciations per word are automatically generated to build two pronunciation dictionaries; one for training and another for decoding. We demonstrate that the use of these rules can significantly improve both MSA phone recognition and MSA word recognition accuracies over a baseline system using pronunciation rules typically employed in previous work on MSA Automatic Speech Recognition (ASR). We obtain a significant improvement in absolute ac-

dictionaries are typically written by hand. However, for morphologically rich languages, such as MSA,1 pronunciation dictionaries are difficult to create by hand, because of the large number of word forms, each of which has a large number of possible pronunciations. Fortunately, the relationship between orthography and pronunciation is relatively regular and well understood for MSA. Moreover, recent automatic techniques for morphological analysis and disambiguation (MADA) can also be useful in automating part of the dictionary creation process (Habash and Rambow, 2005; Habash and Rambow, 2007). Nonetheless, most documented Arabic ASR systems appear to handle only a subset of Arabic phonetic phenomena; very few use morphological disambiguation tools.