The Mongolian Script

The Mongolian script: What’s going on?!

梁海 · Liang Hai · लांग हाइ · [email protected]
11 September 2018, IUC #42

* Get the latest revision from ↗ lianghai.github.io/mongolian
* Note: The views expressed by the speaker in this talk are his own and are NOT meant to reflect those of the Unicode Consortium or the Unicode Technical Committee.

Agenda
I. A crash course on the script
II. The Unicode Mongolian, an encoding model from hell
III. What exactly is not working?
IV. Tough lessons learned
V. Ongoing efforts, and how to participate

· Part I · A crash course on the script
What is the Mongolian script? Writing systems and features.

I. Crash course: Origin
Aramaic → Sogdian → Old Uyghur → Mongolian (initially “Uyghur Mongolian”), early 13th century

I. Crash course: Writing systems & languages
• Early 13th century: Mongolian/Hudum (Mongolian)
• Late 16th century: Mongolian/Hudum Ali Gali (Sanskrit–Tibetan)
• Early 17th century: Manchu (Manchu)
• Mid-17th century: Todo (Oirat–Kalmyk Mongolian & Sanskrit–Tibetan)
• Mid-18th century: Manchu Ali Gali (Sanskrit–Tibetan)
• Mid-20th century: Sibe (Sibe Manchu)

I. Crash course: Writing systems & languages [cont.]
[→] Hudum, Manchu, Sibe, and Todo, in some typical styles: moŋgol hudum | manju | sibe | todo
[↓] Hudum, Manchu–Sibe, and Todo, normalized to the same style: ende, ada, ata | ordo, urtu, urdu
[Glyph samples not reproduced.]
I. Crash course: Writing systems & languages [cont.]
Writing systems:
• Hudum and Hudum Ali Gali
• Manchu–Sibe and Manchu Ali Gali
• Todo
Served languages:
• Mongolian, incl. Oirat–Kalmyk
• Manchu–Sibe
• Sanskrit–Tibetan
Also, note some marginal cases: Manchu–Sibe for Daur, Hudum for Evenki, and Vagindra for Buryat Mongolian.

I. Crash course: General features
Inherited from Aramaic ~ Sogdian:
• Cursive
• Largely dual-joining (cf. Arabic: ﻳﻮﻧﻴﻜﻮد; ﻛـﻜـﻚ ك, ـﻮ و, ـﺪ د)
• Bowed consonants
• [fun fact] A note on the right tail, the bare left tail, and tooth + left tail as positional or contextual allographs, with an exception for Ali Gali usages. [Slide text and glyph samples not recoverable.]

I. Crash course: General features [cont.]
Inherited from Sogdian ~ Old Uyghur:
• Vertical writing ↓·→
  • Originated from ←·↓ being rotated 90° counterclockwise.
  • ↓·→-in-narrow-column or →·↓ as fallback in horizontal writing.
• True alphabet
  • (+ dual-joining =) Consonant letters seldom appear on their own, but are usually written in internally joined syllables.
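As a rough illustration of what "largely dual-joining" implies for shaping, the sketch below labels each letter of a joined word with a positional form. It assumes that every letter is dual-joining and that the whole word is one joined run, which the slides do not claim. `positional_forms` is a hypothetical helper name, and romanized letters stand in for Mongolian ones.

```python
def positional_forms(word: str) -> list[tuple[str, str]]:
    """Label each letter of a cursively joined word with its positional form.

    Assumption (not from the slides): every letter is dual-joining and the
    word is a single joined run, so position alone decides the form.
    """
    if not word:
        return []
    if len(word) == 1:
        # A lone letter joins to nothing on either side.
        return [(word, "isolated")]
    forms = [(word[0], "initial")]
    forms += [(ch, "medial") for ch in word[1:-1]]
    forms.append((word[-1], "final"))
    return forms


if __name__ == "__main__":
    # "ada" is one of the sample words shown on the slides.
    print(positional_forms("ada"))
```

A real shaper would also need the exceptions that "largely" hints at, plus the contextual and lexical variants discussed later in the talk.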
I. Crash course: General features [cont.]
Invented during Old Uyghur ~ Mongolian:
• Syllable onset placeholder (aleph), cf. Uyghur ﺋـ (ﻧﯩـﻨﯩـﻨﻰ ﻧﻰ, ﺋﯩـﺌﯩـ ﺋﻰ, ﯨـ ى)
  • [fun fact] The crown and the tooth are positional allographs to each other.
• Syllable coda forms (n, g, d…)
• Vowel harmony class–specific consonants
• Phonetic letters/syllables
[Glyph samples not reproduced.]
I. Crash course: General features [cont.]
Invented during Old Uyghur ~ Mongolian [cont.]:
• Vowel harmony class–specific consonants
  • Complementary distribution of two guttural series: gimel and kaph.
• Phonetic letters/syllables
  • Reanalyzed letters on the basis of phonemes instead of graphemes.

Graphs (G) … Letters (L) … Phones (P)
• Spanish: G L … P
• English: G L … … … P
• Arabic: G … L … … … P
• Tibetan: G … … L … … P
• Mongolian: G … … … L* … … … P

[Vowel chart for a, e, i, o, u, ö, ü with their positional glyph forms; glyphs not reproduced.]
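The idea of vowel harmony class–specific consonants can be sketched roughly as follows. The slides only state that the two guttural series, gimel and kaph, are in complementary distribution. The assignments used here are assumptions made for illustration: a/o/u as the masculine class, e/ö/ü as the feminine class, i as neutral, gimel for masculine words, and kaph otherwise. `harmony_class` and `guttural_series` are hypothetical names that operate on romanized words.

```python
# Assumed vowel classes (not spelled out on the slides).
MASCULINE = {"a", "o", "u"}
FEMININE = {"e", "\u00f6", "\u00fc"}  # e, ö, ü (precomposed code points)


def harmony_class(word: str) -> str:
    """Return 'masculine', 'feminine', or 'neutral' for a romanized word.

    The first class-bearing vowel decides; 'i' is treated as neutral.
    """
    for ch in word.lower():
        if ch in MASCULINE:
            return "masculine"
        if ch in FEMININE:
            return "feminine"
    return "neutral"


def guttural_series(word: str) -> str:
    """Pick the guttural series a word would take under the assumed rule:
    gimel with masculine words, kaph with feminine and neutral words."""
    return "gimel" if harmony_class(word) == "masculine" else "kaph"


if __name__ == "__main__":
    # Sample words taken from the slides.
    for w in ("ordo", "ende", "urtu"):
        print(w, harmony_class(w), guttural_series(w))
```

Taking only the first class-bearing vowel keeps the sketch short. It ignores compound words and loanwords, where a single written word can mix classes.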
I. Crash course: Writing system analyses
• Hudum and Todo are analyzable as either phonetic syllables or phonetic letters.
  • Hudum is largely unpredictable (one-to-multi, involving grammatical/lexical information) and highly confusable (multi-to-one).
  • Todo is highly predictable and minimally confusable.
• Manchu–Sibe is analyzable as phonetic syllables.
  • Also highly predictable and minimally confusable.
  • Can be very weird if it must be analyzed as phonetic letters.

[↓] Hudum, Manchu–Sibe, and Todo: ende, ada, ata | urdu, urtu, ordo
Syllable by syllable: en·de, a·da, a·ta | ur·du, ur·tu, or·do
[Glyph samples not reproduced.]

I. Crash course: Hudum-specific features
• Disjointed tail (ćaćulga, detached a/e)
• First-vowel forms (o, u, ö, ü)
• Controversial diphthongs
• Complex scopes
  • One scope per word-stem (note compound words)
  • Word-stem boundaries affect syllable boundaries
  • Suffixes (including enclitics, which are disconnected) extend scopes
• Purely lexical variants
Examples on the slide (glyphs not reproduced): _n, _na, _n[a]
I. Crash course: Hudum-specific features [cont.]
Examples on the slides (glyphs not reproduced):
• ü, u, i
• naima, sain/sayin/sayn, sayihan
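The syllable breakdowns on the "Writing system analyses" slide (en·de, a·da, a·ta, ur·du, ur·tu, or·do) can be reproduced with a very small sketch. It works on romanized words only and uses an assumed rule: between two vowels, the last consonant opens the next syllable, and adjacent vowels stay together. `syllabify` is a hypothetical helper, not something defined in the talk.

```python
# Vowels of the romanization used on the slides (ö and ü as precomposed code points).
VOWELS = set("aeiou") | {"\u00f6", "\u00fc"}


def syllabify(word: str) -> list[str]:
    """Split a romanized word into syllables under the assumed rule:
    cut immediately before the last consonant preceding each non-initial vowel."""
    if not word:
        return []
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    cuts = []
    for prev, nxt in zip(vowel_positions, vowel_positions[1:]):
        if nxt - prev > 1:  # at least one consonant between the two vowels
            cuts.append(nxt - 1)
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


if __name__ == "__main__":
    for w in ("ende", "ada", "ata", "urdu", "urtu", "ordo"):
        print(w, "->", "\u00b7".join(syllabify(w)))
```

The Hudum-specific features listed above (word-stem scopes, controversial diphthongs) are exactly the cases such a purely positional rule does not capture.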
