2015 First International Conference on Arabic Computational Linguistics Processing Judeo-Arabic Texts Kfir Bar, Nachum Dershowitz, Lior Wolf, Yackov Lubarsky Yaacov Choueka School of Computer Science Friedberg Genizah Project Tel Aviv University Beit Hadefus 20 Ramat Aviv, Israel Jerusalem, Israel {kfirbar,nachum,wolf}@tau.ac.il,
[email protected] [email protected] Abstract—Judeo-Arabic is a set of dialects spoken and borrowings, which cannot be transliterated into Ara- and written by Jewish communities living in Arab bic, but rather need to be translated into Arabic. Those countries. Judeo-Arabic is typically written in Hebrew embedded words sometimes get inflected following Arabic letters, enriched with diacritic marks that relate to the al-shkhina, “the) אלשכינה ,underlying Arabic. However, some inconsistencies in morphological rules; for example rendering words in Hebrew letters increase the level of divine spirit”), where the prefix al is the Arabic definite ambiguity of a given word. Furthermore, Judeo-Arabic article, and the word shkhina is the Hebrew word for divine texts usually contain non-Arabic words and phrases, spirit. such as quotations or borrowed words from Hebrew A large number of Judeo-Arabic works (philosophy, and Aramaic. We focus on two main tasks: (1) auto- matic transliteration of Judeo-Arabic Hebrew letters Bible translation, biblical commentary, and more) are cur- into Arabic letters; and (2) automatic identification of rently being made available on the Internet (for research language switching points between Judeo-Arabic and purposes). However, most Arabic speakers are unfamiliar Hebrew. For transliteration, we employ a statistical with the Hebrew script, let alone the way it is used to translation system trained on the character level, re- render Judeo-Arabic.