Fadi Sindran, Firas Mualla, Tino Haderlein, Khaled Daqrouq & Elmar Nöth Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic Fadi Sindran
[email protected] Faculty of Engineering/ Department of Computer Science Pattern Recognition Lab Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, 91058, Germany Firas Mualla
[email protected] Faculty of Engineering/ Department of Computer Science Pattern Recognition Lab Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, 91058, Germany Tino Haderlein
[email protected] Faculty of Engineering/ Department of Computer Science Pattern Recognition Lab Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, 91058, Germany Khaled Daqrouq
[email protected] Department of Electrical and Computer Engineering King Abdulaziz University, Jeddah, 22254, Saudi Arabia Elmar Nöth
[email protected] Faculty of Engineering/ Department of Computer Science Pattern Recognition Lab Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, 91058, Germany Abstract Statistical studies based on automatic phonetic transcription of Standard Arabic texts are rare, and even though studies have been performed, they have been done only on one level – phoneme or syllable – and the results cannot be generalized on the language as a whole. In this paper we automatically derived accurate statistical information about phonemes, allophones, syllables, and allosyllables in Standard Arabic. A corpus of more than 5 million words, including words and sentences