Uyghur Language Processing on the Web

Dr. Waris Abdukerim Janbaz, Prof. Imad Saleh
Paragraphe Laboratory, University of Paris VIII, France
http://paragraphe.univ-paris8.fr

Abstract
In this paper, we discuss some important issues related to web processing of an agglutinative Turkic language, Uyghur. In particular, we discuss the development of Uyghur Unicode fonts, Uyghur character display, font embedding, and Uyghur character input methods in environments that lack Uyghur support. We also introduce a multiscript conversion application that extends the use of the Unicode standard for Uyghur language processing.

Keywords: Unicode, Font, Turkic Language, multiscript, transliteration, Arabic-Script Uyghur, Cyrillic-Script Uyghur, Latin-Script Uyghur.

1. Introduction
The Uyghurs are a Turkic-speaking ethnic group, officially numbering about nine million, living in Central Asia, including today's Xinjiang Uyghur Autonomous Region (hereafter: XUAR, also called Chinese Turkistan), parts of Kazakhstan, and urban regions of the Ferghana valley. The official writing system of the XUAR Uyghurs is Arabic-Script Uyghur1 (hereafter: ASU), whereas Cyrillic-Script Uyghur2 (hereafter: CSU) is still in use among the Uyghurs of the former Soviet republics (USSR). The newly introduced transliteration3, Latin-Script Uyghur4 (hereafter: LSU), has become widely accepted among Uyghurs and Uyghurologists and is a commonly used transliteration standard for both ASU and CSU.

Web publishing began to influence Uyghur society during the last ten years. Since the existing platforms supply neither a Uyghur input method nor any font that includes all the glyphs of the ASU alphabet, typing Uyghur text into interactive web pages (in browsers) and displaying Uyghur characters correctly presented huge difficulties. Despite the fairly passive attitude of government authorities toward the development of Uyghur information technology, many individuals started grassroots efforts to create Uyghur websites in the three scripts mentioned above. ASU, used by the most populous segment of the XUAR Uyghurs, caused special encoding problems because it uses a non-standard set of Arabic-based glyphs.

2. Background
For ASU, before 2002, two methods became very common for web publishing in Uyghur: 1) font downloading and/or 2) image format. The inconvenience of the second method needs no explanation. More interesting, but more complex, problems arose with the first one. The major problem came from the fact that every website owner created and named his/her own fonts, so users had to download a specific font (or several fonts) for almost every single website. No one accepted anyone else's font names and encodings, and no common standard emerged. Most of the fonts created during this period replaced either the ASCII characters or the Unicode Arabic characters (0x600-0x6FF) with Uyghur characters, without any agreement on the mapping. Since the code range 0x600-0x6FF contains more Arabic letters than there are ASU letters, people made different choices about which Arabic characters to replace with ASU characters. The resulting multiplication of font names and the growing encoding differences (for the same glyphs) among the fonts became an obstacle to the development of ASU computer processing and web publishing. A large number of issues concerning non-standard fonts and their use were addressed in many different ways by individual computer scientists.
Meanwhile, many of these problems were circumvented using methods unrelated to the Unicode standard. As a result, website creators eventually expressed a strong desire to make further use of the Unicode standard for Uyghur language processing.

In June 2002, the author developed the first Uyghur Unicode font and implemented both system-level and browser-level Input Method Editors for Windows. This was a revolutionary accomplishment, owing mostly to the new method and applications being fully Unicode-compliant (as opposed to only occasionally compatible). A campaign was then launched to popularize and adopt the Unicode standard for Uyghur fonts. In this paper, we present the entire process that we have been following and developing for three years. The following subsections cover four major parts of the implementation procedure.

3. Uyghur Unicode font development
Uyghur (ASU) letters were developed on the basis of the Arabic alphabet. The ASU alphabet has 8 vowels5 and 24 consonants (see annex 1). Uyghur, just like Arabic, is written from right to left, and each letter takes a different shape depending on its position in a word. The Uyghur letters have initial, median, final and isolated forms; some letters also have conjunct forms6. In total, the Uyghur alphabet has 126 different glyphs. The 108 basic glyphs7 of the Uyghur letters have already been accepted by the Unicode Consortium/ISO, and 18 glyphs8 out of the 20 glyphs for composed forms were added in 1998. Unfortunately, two conjunct median forms (of the Uyghur letters ﺋﯥ and ﺋﻰ), ﺌﯧ9 and ﺌﯩ10, are still absent11 from the Unicode Standard's table12, Arabic Presentation Forms-A. This gap leaves the Unicode/ISO repertoire incomplete as it stands, and has forced people to work around it by borrowing the "supported hamze" from FBD1 and FBD2, which is then combined with the median′ form of ﺋﯥ and ﺋﻰ to generate two synthetic combined letters. The 20 conjunct glyphs can also be expressed as a sequence of two existing Unicode characters (as is now the case for the two missing conjunct glyphs). However, this kind of usage may cause problems such as slower text input, redundant data storage, and more complicated sorting operations.

The creation of a Unicode-based Uyghur font became a necessity for the progress of Uyghur information processing, since the existing platforms do not include (supply) any Uyghur font. Existing fonts (both Arabic fonts and other fonts that include Arabic letters) do not include all the necessary shapes of Uyghur letters (see annex 2), so some substitution sequences lead to display problems. For example:

1. ﺋﺎﻟەﻣﺪىﻜﻰ هەﻣﻤە ﺋىﻨﺴﺎن ﻗەﺑىﻪ ﺋەﻣەس
2. ﺋﺎﻟﻪﻣﺪﯨﻜﻰ ھﻪﻣﻤﻪ ﺋﯩﻨﺴﺎﻥ ﻗﻪﺑﯩﻬ ﺋﻪﻣﻪﺱ
(Not all human beings in the world are evil)

The first sentence above is an illegal character combination when displayed with existing fonts (e.g. Times New Roman, Traditional Arabic), because the cursive shapes of ﻯ, ھ and ﺋﻪ are not correct according to the ASU alphabet (see annex 2). It should appear as in sentence 2, in which the letters use a specific font, UKIJ Tuz Tom. To create correct cursive connection forms for Uyghur, it was necessary to take special measures for three problem letters, ﻯ, ھ and ﺋﻪ, and for the two "glottal stop signs" ﺉ and ﺌ (supported hamze), during the creation of Uyghur fonts. Without such measures it would be impossible to display the cursive forms of the three letters correctly in browsers and other application software.

ﻯ13: Uyghur letter i, as in ishik (ﺋﯩﺸﯩﻚ, door). Its 8 different forms are listed in table 1 below. For the initial′ and median′ forms (ﯨ, ﯩ) of this letter we use the initial and median forms of the Arabic letter ﻯ 0649; for the final′ and isolated′ forms (ﻯ, ﻰ) we use the final and isolated forms of the Farsi letter ﻯ 06CC, respectively.

ﺋﻪ14: Uyghur letter e, as in eyneklerde (ﺋﻪﻳﻨﻪﻛﻠﻪﺭﺩە, in the mirrors). This letter uses the final and isolated glyphs (ﻪ, ﻩ) of the Arabic letter ھ 0647 (h), in the same way as Persian does. This causes a special problem, because the glyphs of Arabic ھ15 0647 (h) in the initial and median positions (ھ, ﻬ) correspond to those of Uyghur ھ (h, as in ھﯧﻠﯩﻬﻪﻡ hélihem, even now; ﮔﯘﻧﺎھ gunah, sin or offense; ﻗﻪﺑﯩﻬ qebih, odious), which in turn has different final and isolated glyphs (ھ, ﻬ). To resolve this inconsistency, we chose to use 06D5 for the Uyghur letter ﺋﻪ and 06BE for the Uyghur letter ھ.

Table 1
ini. med. fin. iso. ini.′ med.′ fin.′ iso.′
ﯪ ﯫ ﺎ ﺍ
ﯬ ﯭ ﻪ ﻩ
ﯮ ﯯ ﻮ ﻭ
ﯰ ﯱ ﯘ ﯗ
ﯲ ﯳ ﯚ ﯙ
ﯴ ﯵ ﯜ ﯛ

1 See annex 2.
2 See annex 1.
3 Using one writing system to represent words in another is called transliteration.
4 Called Uyghur Kompyutér Yéziqi (UKY) or Uyghur Latin Yéziqi (ULY) in Uyghur, meaning "Uyghur Computer Writing" or "Latin-Script Uyghur". See http://www.ukij.org/teshwiq/UKY_Heqqide(KonaYeziq).htm
5 The Arabic alphabet has only 3 letters for long vowels, ﺍ, ﻭ and ﻱ; the other vowels are not written in normal text. Given its phonetic characteristics, Uyghur writes all vowels (ﺋﺎ، ﺋﻪ، ﺋﻮ، ﺋﯘ، ﺋﯚ، ﺋﯜ، ﺋﯥ، ﺋﻰ) using derivatives of traditional Arabic letters.
6 The initial form and, under some circumstances, the median form of every vowel is preceded by a "glottal stop sign" ﺉ or ﺌ (supported hamze), with which it forms a common letter (treated in Uyghur as a single letter; see annex 2). ﻝ followed by ﺍ forms ﻼ or ﻻ, depending on position.
7 See http://www.oyghan.com/images/UyghurUnicodeTable.gif
8 See Arabic Presentation Forms-A, glyph code range FBEA-FBFB. See also table 1.
9 Character name for the Unicode Standard: ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E MEDIAN FORM. Ex: ﺑﺎﻏﺌﯧﺮﯨﻖ (Baghériq).
10 Character name for the Unicode Standard: ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA MEDIAN FORM. Ex: ﻗﻪﺗﺌﯩﻲ (certainly, doubtlessly).
11 The XUAR delegation members who submitted the proposal, Prof. Hoshur Islam and Yasin Imin, also acknowledge this fault.
13 Character name for the Unicode Standard: ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA (represents a YEH-shaped letter with no dots in any positional form), 0649.
14 Character name for the Unicode Standard: ARABIC LETTER (ە).
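The paper notes that each of the 20 conjunct glyphs can also be written as a sequence of two existing Unicode characters. The Python sketch below is our illustration, not the authors' tooling, and it assumes a Python build whose bundled Unicode database is reasonably recent. It uses NFKC normalization from the standard `unicodedata` module to expand every presentation-form code point in FBEA-FBFB into the supported hamze U+0626 followed by a vowel letter. It also searches character names for the two median conjuncts reported as missing. The helper names `expand` and `find_by_name` are hypothetical.

```python
import unicodedata

# U+FBEA..U+FBFB: the 18 conjunct ("supported hamze" + vowel) glyphs
# that the text says were added to Arabic Presentation Forms-A.
CONJUNCTS = range(0xFBEA, 0xFBFB + 1)


def expand(cp: int) -> str:
    """Compatibility-normalize one conjunct glyph to its letter sequence."""
    return unicodedata.normalize("NFKC", chr(cp))


def find_by_name(fragment: str) -> list[int]:
    """Presentation-form code points whose character name contains fragment."""
    return [cp for cp in range(0xFB50, 0xFE00)
            if fragment in unicodedata.name(chr(cp), "")]


for cp in CONJUNCTS:
    seq = expand(cp)
    codes = " ".join(f"U+{ord(c):04X}" for c in seq)
    print(f"U+{cp:04X} -> {codes}  ({unicodedata.name(chr(cp))})")

# The two median conjuncts reported as missing: expected to print empty lists.
print(find_by_name("HAMZA ABOVE WITH E MEDIAL"))
print(find_by_name("HAMZA ABOVE WITH ALEF MAKSURA MEDIAL"))
```

Each conjunct therefore occupies one code point as a presentation form but two in the letter-based encoding. This is the storage and sorting trade-off the paper describes.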
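The positional behaviour described above follows from how each letter joins its neighbours. ﺋﻪ (stored as 06D5) needs only final and isolated glyphs, while ھ (06BE) needs all four forms. The Python sketch below selects the form each letter takes in a word stored in logical order. It assumes the joining types of Unicode's ArabicShaping data: right-joining for ا ە د ر ز ژ و ۇ ۆ ۈ ۋ, and dual-joining for the other Uyghur letters, including the supported hamze ئ. The letter set and the function `positional_forms` are our own, hypothetical constructions. The sketch is not the logic of UKIJ Tuz Tom or of any shaping engine, and it ignores the lam-alef ligature.

```python
# Uyghur letters as stored in Unicode text (logical order), including the
# "supported hamze" U+0626.
UYGHUR_LETTERS = set(
    "\u0627\u06D5\u0628\u067E\u062A\u062C\u0686\u062E\u062F\u0631\u0632"
    "\u0698\u0633\u0634\u063A\u0641\u0642\u0643\u06AF\u06AD\u0644\u0645"
    "\u0646\u06BE\u0648\u06C7\u06C6\u06C8\u06CB\u06D0\u0649\u064A\u0626"
)
# Right-joining letters connect only to the letter before them.
RIGHT_JOINING = set(
    "\u0627\u06D5\u062F\u0631\u0632\u0698\u0648\u06C7\u06C6\u06C8\u06CB"
)


def positional_forms(word: str) -> list[str]:
    """Label each character with the positional form a font must display."""
    forms = []
    for i, ch in enumerate(word):
        if ch not in UYGHUR_LETTERS:
            forms.append("none")  # spaces, punctuation, digits
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        # Connect backwards only if the previous letter can join forwards.
        joins_prev = prev in UYGHUR_LETTERS and prev not in RIGHT_JOINING
        # Connect forwards only if this letter is dual-joining.
        joins_next = ch not in RIGHT_JOINING and nxt in UYGHUR_LETTERS
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# eyneklerde: every U+06D5 (e) comes out as final or isolated.
print(positional_forms(
    "\u0626\u06D5\u064A\u0646\u06D5\u0643\u0644\u06D5\u0631\u062F\u06D5"))
```

Because 06D5 is right-joining, it never requests an initial or median glyph. A font can therefore reuse the final and isolated shapes of Arabic heh for it, as the text describes, without conflicting with the dual-joining ھ at 06BE.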
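The abstract mentions a multiscript conversion application, but its design is not described in this section. The following is therefore only a hypothetical Python sketch of the ASU-to-LSU direction. It assumes the common ULY letter correspondences and uses a simplified hamza rule: silent at the start of a word, written as an apostrophe elsewhere. It does not insert ULY's separating apostrophe for ambiguous digraphs such as n'g. The input is NFKC-normalized first so that glyph-level presentation forms fold back to letters. Arabic heh (U+0647) is deliberately left unmapped, consistent with the choice of 06D5 for e and 06BE for h.

```python
import unicodedata

# Assumed ASU -> ULY (Latin-Script Uyghur) letter correspondences.
# U+0647 (Arabic heh) is intentionally absent: Uyghur e is U+06D5
# and Uyghur h is U+06BE, as chosen in the text.
ASU_TO_ULY = {
    "\u0627": "a", "\u06D5": "e", "\u0628": "b", "\u067E": "p",
    "\u062A": "t", "\u062C": "j", "\u0686": "ch", "\u062E": "x",
    "\u062F": "d", "\u0631": "r", "\u0632": "z", "\u0698": "zh",
    "\u0633": "s", "\u0634": "sh", "\u063A": "gh", "\u0641": "f",
    "\u0642": "q", "\u0643": "k", "\u06AF": "g", "\u06AD": "ng",
    "\u0644": "l", "\u0645": "m", "\u0646": "n", "\u06BE": "h",
    "\u0648": "o", "\u06C7": "u", "\u06C6": "\u00F6", "\u06C8": "\u00FC",
    "\u06CB": "w", "\u06D0": "\u00E9", "\u0649": "i", "\u064A": "y",
}
PUNCTUATION = {"\u060C": ",", "\u061B": ";", "\u061F": "?"}
HAMZE = "\u0626"  # the "supported hamze" (yeh with hamza above)
LETTERS = set(ASU_TO_ULY) | {HAMZE}


def asu_to_lsu(text: str) -> str:
    """Transliterate Arabic-Script Uyghur into Latin-Script Uyghur (sketch)."""
    # Fold presentation forms (e.g. the FBEA-FBFB conjuncts) back to letters.
    text = unicodedata.normalize("NFKC", text)
    out = []
    for i, ch in enumerate(text):
        if ch == HAMZE:
            word_initial = i == 0 or text[i - 1] not in LETTERS
            # Simplified rule: silent at word start, apostrophe elsewhere.
            if not word_initial:
                out.append("'")
        else:
            out.append(ASU_TO_ULY.get(ch, PUNCTUATION.get(ch, ch)))
    return "".join(out)


# Sentence 2 from the text, written with letter code points.
print(asu_to_lsu("\u0626\u0627\u0644\u06D5\u0645\u062F\u0649\u0643\u0649 "
                 "\u06BE\u06D5\u0645\u0645\u06D5 \u0626\u0649\u0646\u0633\u0627\u0646 "
                 "\u0642\u06D5\u0628\u0649\u06BE \u0626\u06D5\u0645\u06D5\u0633"))
# alemdiki hemme insan qebih emes
```

A production converter would also need the reverse direction and CSU support, which this sketch does not attempt.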