Draft Additional Repertoire for ISO/IEC 10646:2016 (5th Edition)


Title: Draft additional repertoire for ISO/IEC 10646:2016 (5th edition) DIS
Date: 2016-10-26
L2/15-170 WG2 N4728
Source: Michel Suignard, project editor
Status: Project Editor's summary of the character repertoire addition as included in the DIS ballot
Action: For review by WG2 and UTC experts
Distribution: WG2 and UTC
Replaces:

Status
This document presents a summary of all characters that constitute the tentative new repertoire for ISO/IEC 10646 5th edition, with code positions, representative glyphs and character names.

Manner of Presentation
The character names and code points shown are the same for Unicode and ISO/IEC 10646, including annotations.

Note to Reviewers
UTC/WG2 reviewers, please use this document as a description of the result of the disposition of ballot comments for CD.2.

Contents
This document lists 8447 characters. The following list shows all 26 blocks (existing or new) to which characters are proposed to be added, or which have been affected by other changes documented here.
0860-086F Syriac Supplement: see document L2/15-088
0980-09FF Bengali: see document L2/15-161 L2/15-172
0A80-0AFF Gujarati: see document L2/15-103
0D00-0D7F Malayalam: see document L2/14-015R L2/14-292 L2/15-045
1CD0-1CFF Vedic Extensions: see document L2/15-160
1DC0-1DFF Combining Diacritical Marks Supplement: see document L2/14-285 L2/15-173R
20A0-20CF Currency Symbols: see document L2/15-299
2300-23FF Miscellaneous Technical: see document L2/15-031
2B00-2BFF Miscellaneous Symbols and Arrows: see document L2/15-083
2E00-2E7F Supplemental Punctuation: see document L2/15-173
3100-312F Bopomofo: see document WG2 N4695
4E00-9FEA CJK Unified Ideographs: see document L2/14-228 IRG N2110 Res 45.5
10300-1032F Old Italic: see document WG2 N4395 N4669 L2/12-386 L2/15-169
11400-1147F Newa: see document L2/14-285
11A00-11A4F Zanabazar Square: see document WG2 N4541 L2/14-028
11A50-11AAF Soyombo: see document WG2 N4655 L2/15-004
11D00-11D5F Masaram Gondi: see document L2/15-090
16FE0-16FFF Ideographic Symbols and Punctuation: see document N4525 N4522
1B170-1B2FF Nushu: see document WG2 N4472 N4693 N4697 L2/13-160
1E900-1E95F Adlam: see document WG2 N4628 L2/14-219
1F100-1F1FF Enclosed Alphanumeric Supplement: see document WG2 N4671
1F200-1F2FF Enclosed Ideographic Supplement: see document L2/14-278R
1F300-1F5FF Miscellaneous Symbols and Pictographs: see document L2/15-054R4
1F680-1F6FF Transport and Map Symbols: see document L2/15-054R4 L2/15-195R
1F900-1F9FF Supplemental Symbols and Pictographs: see document L2/15-054R4 L2/15-195R
2CBE0-2EBEF CJK Unified Ideographs Extension F: see document WG2 N4580

0860 Syriac Supplement 086F

[Code chart: 0860-086F Syriac Supplement]

Syriac letters
Used for writing Suriyani Malayalam, which is also known as Garshuni (Karshoni) and Syriac Malayalam
0860 SYRIAC LETTER MALAYALAM NGA
0861 SYRIAC LETTER MALAYALAM JA
0862 SYRIAC LETTER MALAYALAM NYA
0863 SYRIAC LETTER MALAYALAM TTA
0864 SYRIAC LETTER MALAYALAM NNA
0865 SYRIAC LETTER MALAYALAM NNNA
0866 SYRIAC LETTER MALAYALAM BHA
0867 SYRIAC LETTER MALAYALAM RA
0868 SYRIAC LETTER MALAYALAM LLA
0869 SYRIAC LETTER MALAYALAM LLLA
086A SYRIAC LETTER MALAYALAM SSA

Printed: 26-May-2016

0980 Bengali 09FF

[Code chart: 0980-09FF Bengali]

0980 Bengali 09E3

The Bengali script is also known as Bangla. In Assam, the preferred name of the script is Asamiya or Assamese. The Assamese language has also been written historically using distinct regional scripts known as Kamrupi.

Various signs
0980 ঀ BENGALI ANJI
    = siddham, siddhirastu
    • used at the beginning of texts as an invocation
0981 ◌ঁ BENGALI SIGN CANDRABINDU
0982 ◌ং BENGALI SIGN ANUSVARA
0983 ◌ঃ BENGALI SIGN VISARGA

Independent vowels
0985 অ BENGALI LETTER A
0986 আ BENGALI LETTER AA
0987 ই BENGALI LETTER I
0988 ঈ BENGALI LETTER II
0989 উ BENGALI LETTER U
098A ঊ BENGALI LETTER UU
098B ঋ BENGALI LETTER VOCALIC R
098C ঌ BENGALI LETTER VOCALIC L
098D <reserved>
098E <reserved>
098F এ BENGALI LETTER E
0990 ঐ BENGALI LETTER AI
0991 <reserved>
0992 <reserved>
0993 ও BENGALI LETTER O
0994 ঔ BENGALI LETTER AU

Consonants
0995 ক BENGALI LETTER KA
0996 খ BENGALI LETTER KHA
0997 গ BENGALI LETTER GA
0998 ঘ BENGALI LETTER GHA
0999 ঙ BENGALI LETTER NGA
099A চ BENGALI LETTER CA
099B ছ BENGALI LETTER CHA
099C জ BENGALI LETTER JA
099D ঝ BENGALI LETTER JHA
099E ঞ BENGALI LETTER NYA
099F ট BENGALI LETTER TTA
09A0 ঠ BENGALI LETTER TTHA
09A1 ড BENGALI LETTER DDA
09A2 ঢ BENGALI LETTER DDHA
09A3 ণ BENGALI LETTER NNA
09A4 ত BENGALI LETTER TA
09A5 থ BENGALI LETTER THA
09A6 দ BENGALI LETTER DA
09A7 ধ BENGALI LETTER DHA
09A8 ন BENGALI LETTER NA
09A9 <reserved>
09AA প BENGALI LETTER PA
09AB ফ BENGALI LETTER PHA
09AC ব BENGALI LETTER BA
    = Bengali va, wa
09AD ভ BENGALI LETTER BHA
09AE ম BENGALI LETTER MA
09AF য BENGALI LETTER YA
09B0 র BENGALI LETTER RA
09B1 <reserved>
09B2 ল BENGALI LETTER LA
09B3 <reserved>
09B4 <reserved>
09B5 <reserved>
09B6 শ BENGALI LETTER SHA
09B7 ষ BENGALI LETTER SSA
09B8 স BENGALI LETTER SA
09B9 হ BENGALI LETTER HA

Various signs
09BC ◌় BENGALI SIGN NUKTA
    • for extending the alphabet to new letters
09BD ঽ BENGALI SIGN AVAGRAHA

Dependent vowel signs
09BE ◌া BENGALI VOWEL SIGN AA
09BF ◌ি BENGALI VOWEL SIGN I
    • stands to the left of the consonant
09C0 ◌ী BENGALI VOWEL SIGN II
09C1 ◌ু BENGALI VOWEL SIGN U
09C2 ◌ূ BENGALI VOWEL SIGN UU
09C3 ◌ৃ BENGALI VOWEL SIGN VOCALIC R
09C4 ◌ৄ BENGALI VOWEL SIGN VOCALIC RR
09C5 <reserved>
09C6 <reserved>
09C7 ◌ে BENGALI VOWEL SIGN E
    • stands to the left of the consonant
09C8 ◌ৈ BENGALI VOWEL SIGN AI
    • stands to the left of the consonant

Two-part dependent vowel signs
These vowel signs have glyph pieces which stand on both sides of the consonant; they follow the consonant in logical order, and should be handled as a unit for most processing.
09CB ◌ো BENGALI VOWEL SIGN O
    ≡ 09C7 ◌ে 09BE ◌া
09CC ◌ৌ BENGALI VOWEL SIGN AU
    ≡ 09C7 ◌ে 09D7 ◌ৗ

Virama
09CD ◌্ BENGALI SIGN VIRAMA
    = hasant (Bengali term for halant)

Additional consonant
09CE ৎ BENGALI LETTER KHANDA TA
    • a dead consonant form of ta, without implicit vowel, used in some sequences

Sign
09D7 ◌ৗ BENGALI AU LENGTH MARK

Additional consonants
09DC ড় BENGALI LETTER RRA
    ≡ 09A1 ড 09BC ◌়
09DD ঢ় BENGALI LETTER RHA
    ≡ 09A2 ঢ 09BC ◌়
09DE <reserved>
09DF য় BENGALI LETTER YYA
    ≡ 09AF য 09BC ◌়

Additional vowels for Sanskrit
09E0 ৠ BENGALI LETTER VOCALIC RR
09E1 ৡ BENGALI LETTER VOCALIC LL
09E2 ◌ৢ BENGALI VOWEL SIGN VOCALIC L
09E3 ◌ৣ BENGALI VOWEL SIGN VOCALIC LL

09E4 Bengali 09FD

Reserved
For viram punctuation, use the generic Indic 0964 and 0965. Note that these punctuation marks are referred to as dahri and double dahri in Bangla.
09E4 <reserved>
    → 0964 । devanagari danda
09E5 <reserved>
    → 0965 ॥ devanagari double danda

Digits
09E6 ০ BENGALI DIGIT ZERO
09E7 ১ BENGALI DIGIT ONE
09E8 ২ BENGALI DIGIT TWO
09E9 ৩ BENGALI DIGIT THREE
09EA ৪ BENGALI DIGIT FOUR
09EB ৫ BENGALI DIGIT FIVE
09EC ৬ BENGALI DIGIT SIX
09ED ৭ BENGALI DIGIT SEVEN
09EE ৮ BENGALI DIGIT EIGHT
09EF ৯ BENGALI DIGIT NINE

Additions for Assamese
09F0 ৰ BENGALI LETTER RA WITH MIDDLE DIAGONAL
    = Assamese letter ra
09F1 ৱ BENGALI LETTER RA WITH LOWER DIAGONAL
    = Assamese letter wa
    = bengali letter va with lower diagonal (1.0)

Currency signs
09F2 ৲ BENGALI RUPEE MARK
    = taka
    • historic currency sign
09F3 ৳ BENGALI RUPEE SIGN
    = Bangladeshi taka

Historic symbols for fractional values
The use of these signs is not limited to currency, despite the character names.
09F4 ৴ BENGALI CURRENCY NUMERATOR ONE
    • not in current usage
09F5 ৵ BENGALI CURRENCY NUMERATOR TWO
    • not in current usage
09F6 ৶ BENGALI CURRENCY NUMERATOR THREE
    • not in current usage
09F7 ৷ BENGALI CURRENCY NUMERATOR FOUR
09F8 ৸ BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR
09F9 ৹ BENGALI CURRENCY DENOMINATOR SIXTEEN

Sign
09FA ৺ BENGALI ISSHAR
    = ishvar
    • represents the name of a deity
    = svargiya
    • written before the name of a deceased person

Historic currency sign
09FB ৻ BENGALI GANDA MARK

Signs
09FC ৼ BENGALI LETTER VEDIC ANUSVARA
    • denotes a Vedic anusvara
09FD ৽ BENGALI ABBREVIATION SIGN

0A80 Gujarati 0AFF

[Code chart: 0A80-0AFF Gujarati]
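The "≡" lines in the Bengali names list above give canonical decompositions, and these can be checked mechanically. The following is a minimal sketch using Python's standard `unicodedata` module; the names `EQUIVALENCES`, `check_equivalences` and `hex_codes` are illustrative only and do not come from the draft. It covers only the long-established Bengali characters, since a given Python build may not yet know the characters newly proposed here.

```python
import unicodedata

# Precomposed character -> canonical decomposition, copied from the
# "≡" annotations in the Bengali names list (illustrative table name).
EQUIVALENCES = {
    "\u09CB": "\u09C7\u09BE",  # VOWEL SIGN O  ≡ VOWEL SIGN E + VOWEL SIGN AA
    "\u09CC": "\u09C7\u09D7",  # VOWEL SIGN AU ≡ VOWEL SIGN E + AU LENGTH MARK
    "\u09DC": "\u09A1\u09BC",  # LETTER RRA    ≡ LETTER DDA  + NUKTA
    "\u09DD": "\u09A2\u09BC",  # LETTER RHA    ≡ LETTER DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # LETTER YYA    ≡ LETTER YA   + NUKTA
}


def hex_codes(text):
    """Format a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def check_equivalences(table):
    """Map each precomposed character to True if its NFD form matches the table."""
    return {
        char: unicodedata.normalize("NFD", char) == sequence
        for char, sequence in table.items()
    }


if __name__ == "__main__":
    for char, ok in check_equivalences(EQUIVALENCES).items():
        decomposed = unicodedata.normalize("NFD", char)
        print(hex_codes(char), "≡", hex_codes(decomposed), "matches list:", ok)
```

One design note: the sketch compares NFD forms rather than NFC forms. The two-part vowel signs recompose under NFC, but the nukta letters 09DC, 09DD and 09DF are composition exclusions and stay decomposed, so NFD gives a uniform check for all five entries.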