Uyghur language processing on the Web

Dr. Waris Abdukerim Janbaz, Prof. Imad Saleh
Paragraphe Laboratory, University of Paris VIII, France
http://paragraphe.univ-paris8.fr

Abstract

In this paper, we discuss some important issues related to web processing of an agglutinative Turkic language, Uyghur. In particular, we discuss the advent of Uyghur Unicode font development, Uyghur character display, font embedding, and Uyghur character input methods in environments without Uyghur support. We also introduce a multiscript conversion application to further the use of the Unicode standard for Uyghur language processing.

Keywords: Unicode, Font, Turkic Language, multiscript, transliteration, Arabic-Script Uyghur, Cyrillic-Script Uyghur, Latin-Script Uyghur.

1. Introduction

The Uyghurs are a Turkic-speaking ethnic group, officially numbering about nine million, living in Central Asia, including today's Xinjiang Uyghur Autonomous Region (hereafter: XUAR, also called Chinese Turkistan) as well as parts of Kazakhstan and urban regions in the Ferghana [...]

[...] navigators) and correctly displaying Uyghur characters presented huge difficulties. Despite the fairly passive attitude of government authorities toward the development of Uyghur information technology, many individuals started grassroots efforts to create Uyghur websites using the three above-mentioned scripts. ASU (Arabic-Script Uyghur), used by the most populous segment of XUAR Uyghurs, caused special coding problems because it uses a non-standard set of Arabic-based glyphs.

2. Background

For ASU, before 2002, one or both of the following two methods were very common for web publishing in Uyghur: 1) font downloading; 2) image format. There is no need to explain the inconvenience of the second method. More interesting but more complex problems occurred with the first. The major problem was that every website owner created and named his or her own fonts, so visitors had to download a specific font (or several different fonts) for almost every single website.
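To illustrate why a shared encoding removes the need for a font per website, the following minimal sketch lists the Unicode code points and standard character names of a few letters used in Arabic-Script Uyghur. It is an illustration only, not the authors' tooling: the selection of letters and the helper names `describe` and `in_arabic_block` are assumptions made for this example.

```python
import unicodedata

# Illustrative subset (an assumption of this sketch) of letters used in
# Arabic-Script Uyghur that lie outside the basic Arabic alphabet.
UYGHUR_SPECIFIC = "\u06d5\u06c6\u06c7\u06c8\u06cb\u06d0\u06ad"


def describe(text):
    """Return (character, code point label, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def in_arabic_block(ch):
    """True if the character lies in the Unicode Arabic block (U+0600..U+06FF)."""
    return 0x0600 <= ord(ch) <= 0x06FF


if __name__ == "__main__":
    # Print one line per letter: code point, standard name, and the letter itself.
    for ch, code_point, name in describe(UYGHUR_SPECIFIC):
        print(code_point, name, ch)
```

Because each letter has a fixed code point regardless of the font, any Unicode font that covers these code points can display the same text, whereas a site-specific font is only usable with pages written for its private glyph assignment.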