Internationalized Domain Names-Boro
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
UNICODE for Kannada General Information & Description
UNICODE for Kannada (U+0C80 to U+0CFF) General Information & Description Written by: C V Srinatha Sastry Issued by: Director Directorate of Information Technology Government of Karnataka Multi Storied Buildings, Vidhana Veedhi BANGALORE 560 001 INDIA PDF created with FinePrint pdfFactory Pro trial version http://www.pdffactory.com UNICODE for Kannada Introduction The Kannada script is a South Indian script. It is used to write Kannada language of Karnataka State in India. This is also used in many parts of Tamil Nadu, Kerala, Andhra Pradesh and Maharashtra States of India. In addition, the Kannada script is also used to write Tulu, Konkani and Kodava languages. Kannada along with other Indian language scripts shares a large number of structural features. The Kannada block of Unicode Standard (0C80 to 0CFF) is based on ISCII-1988 (Indian Standard Code for Information Interchange). The Unicode Standard (version 3) encodes Kannada characters in the same relative positions as those coded in the ISCII-1988 standard. The Writing system that employs Kannada script constitutes a cross between syllabic writing systems and phonemic writing systems (alphabets). The effective unit of writing Kannada is the orthographic syllable consisting of a consonant (Vyanjana) and vowel (Vowel) (CV) core and optionally, one or more preceding consonants, with a canonical structure of ((C)C)CV. The orthographic syllable need not correspond exactly with a phonological syllable, especially when a consonant cluster is involved, but the writing system is built on phonological principles and tends to correspond quite closely to pronunciation. The orthographic syllable is built up of alphabetic pieces, the actual letters of Kannada script. -
Tibetan Romanization Table
Tibetan Comment [LRH1]: Transliteration revisions are highlighted below in light-gray or otherwise noted in a comment. ALA-LC romanization of Tibetan letters follows the principles of the Wylie transliteration system, as described by Turrell Wylie (1959). Diacritical marks are used for those letters representing Indic or other non-Tibetan languages, and parallel the use of these marks when transcribing their counterpart letters in Sanskrit. These are applied for the sake of consistency, and to reflect international publishing standards. Accordingly, romanize words of non-Tibetan origin systematically (following this table) in all cases, even though the word may derive from Sanskrit or another language. When Tibetan is written in another script (e.g., ʼPhags-pa) the corresponding letters in that script are also romanized according to this table. Consonants (see Notes 1-3) Vernacular Romanization Vernacular Romanization Vernacular Romanization ka da zha ཀ་ ད་ ཞ་ kha na za ཁ་ ན་ ཟ་ Comment [LH2]: While the current ALA-LC ga pa ’a table stipulates that an apostrophe should be used, ག་ པ་ འ་ this revision proposal recommends that the long- nga pha ya standing defacto LC practice of using an alif (U+02BC) be continued and explicitly stipulated in ང་ ཕ་ ཡ་ the Table. See accompanying Narrative for details. ca ba ra ཅ་ བ་ ར་ cha ma la ཆ་ མ་ ལ་ ja tsa sha ཇ་ ཙ་ ཤ་ nya tsha sa ཉ་ ཚ་ ས་ ta dza ha ཏ་ ཛ་ ཧ་ tha wa a ཐ་ ཝ་ ཨ་ Vowels and Diphthongs (see Notes 4 and 5) ཨི་ i ཨཱི་ ī རྀ་ r̥ ཨུ་ u ཨཱུ་ ū རཱྀ་ r̥̄ ཨེ་ e ཨཻ་ ai ལྀ་ ḷ ཨོ་ o ཨཽ་ au ལཱྀ ḹ ā ཨཱ་ Other Letters or Diacritical Marks Used in Words of Non-Tibetan Origin (see Notes 6 and 7) ṭa gha ḍha ཊ་ གྷ་ ཌྷ་ ṭha jha anusvāra ṃ Comment [LH3]: This letter combination does ཋ་ ཇྷ་ ◌ ཾ not occur in Tibetan texts, and has been deprecated from the Unicode Standard. -
N4185 Preliminary Proposal to Encode Siddham in ISO/IEC 10646
ISO/IEC JTC1/SC2/WG2 N4185 L2/12-011R 2012-05-03 Preliminary Proposal to Encode Siddham in ISO/IEC 10646 Anshuman Pandey Department of History University of Michigan Ann Arbor, Michigan, U.S.A. [email protected] May 3, 2012 1 Introduction This is a preliminary proposal to encode the Siddham script in the Universal Character Set (ISO/IEC 10646). It is a collaborative effort between the Script Encoding Initiative (SEI) at the University of California, Berke- ley and the Shingon Buddhist International Institute, Fresno, California. Feedback is requested from experts and users of the script. Comments may be submitted to the author at the email address given above. Siddham is a Brahmi-based writing system that originated in India, but which is used primarily in East Asia. At present it is associated with esoteric Buddhist traditions in Japan. Nevertheless, Siddham is structurally an Indic script and its proposed encoding adheres to the UCS model for Brahmi-based writing systems, such as Devanagari and similar scripts. The technical description for Siddham given here may differ from the traditional analysis and philosophical interpretations of the script and its constituent characters and glyphs. An attempt has been made to encode all distinct characters attested in Siddham records, although more characters may be uncovered through additional research. The characters that are proposed for encoding have been analyzed in accordance with the character-glyph model of the UCS. As a result, the proposed encoding may contain characters that are not part of traditional character repertoires. It may also exclude characters that are traditionally regarded as independent letters, such as conjuncts, which are to be represented in the manner specified by the UCS encoding model. -
Internationalized Domain Names-Dogri
Draft Policy Document For INTERNATIONALIZED DOMAIN NAMES Language: DOGRI 1 RECORD OF CHANGES *A - ADDED M - MODIFIED D - DELETED PAGES A* COMPLIANCE VERSION DATE AFFECTED M TITLE OR BRIEF VERSION OF NUMBER D DESCRIPTION MAIN POLICY DOCUMENT 1.0 21 January, Whole M Language Specific 2010 Document Policy Document for DOGRI 1.1 22 March, Whole M Description of 2011 Document sequence added, Variant removed 1.2 13 January, Page5,6,7 M Inclusion of Vowel 2012 Modifier(MODIFIE R LETTER APOSTROPHE) [S] after Matra [M] 1.3 01 January, All M D Modified character 1.8 2013 repertoire as per IDNA 2008 2 Table of Contents 1. AUGMENTED BACKUS-NAUR FORMALISM (ABNF) ...........................................4 1.1 Declaration of variables ....................................................................................... 4 1.2 ABNF Operators .................................................................................................. 4 1.3 The Vowel Sequence ............................................................................................ 4 1.4 Consonant Sequence ............................................................................................ 5 1.5 Sequence .............................................................................................................. 7 1.6 ABNF Applied to the DOGRI IDN ..................................................................... 7 2. RESTRICTION RULES ................................................................................................10 3. EXAMPLES: ............................................................................................................12 -
Iso/Iec Jtc1/Sc2/Wg2 L2/12-322
ISO/IEC JTC1/SC2/WG2 N4330 L2/12-322 2012-09-27 Title: Proposal to Encode the SANDHI MARK for Sharada Source: Script Encoding Initiative (SEI) Author: Anshuman Pandey ([email protected]) Status: Liaison Contribution Action: For consideration by UTC and WG2 Date: 2012-09-27 1 Introduction This is a preliminary proposal to encode a new character in the Sharada block of the Universal Character Set (ISO/IEC 10646). Properties of the proposed character are: 111C9 Po 0 L N AL 2 Description The is used in some Sharada manuscripts for indicating external sandhi in Sanskrit documents. It is a below-base mark that is written after the syllable where sandhi occurs. The excerpt below, from Bhagavad Gītā 1.1, illustrates usage of the mark (adapted from Lokesh Chandra 1982): Here the occurs in the phrases pāṇḍavāścaiva and kimakurvata . In the first phrase, it indicates ⃓ + + ⃓ vowel sandhi between ca and eva (°a e° → °ai°); in the second it indicates fusion of a bare consonant and a vowel between kim and akurvata (°m a° → °ma°). In this manuscript of the BG, the is written twice in order to indicate sandhi between two long vowels, as in the following excerpt from BG 2.1: 1 Proposal to Encode the SANDHI MARK for Sharada Anshuman Pandey Here the doubled mark occurs in the phrase kr̥ payāvisṭ aṃ , where it indicates sandhi of identical long vowels ++ between kr̥ payā and āvisṭ aṃ (°ā ā° → °ā°). However, the double is not consistently through- out the manuscript for marking long-vowel sandhi. 3 Relationship to Avagraha The graphical shape of is similar to that of ᇁ +111C1 . -
Internationalized Domain Names-Sanskrit
Policy Document For INTERNATIONALIZED DOMAIN NAMES Language: SANSKRIT 1. AUGMENTED BACKUS-NAUR FORMALISM (ABNF) .......................................... 3 1.1 Declaration of variables ............................................................................................ 3 1.2 ABNF Operators ....................................................................................................... 3 1.3 The Vowel Sequence ................................................................................................. 3 1.4 Consonant Sequence ................................................................................................. 4 1.5 ABNF Applied to the SANSKRIT IDN .................................................................... 5 2. RESTRICTION RULES ................................................................................................. 6 3. EXAMPLES ................................................................................................................... 8 4. LANGUAGE TABLE: SANSKRIT ............................................................................... 9 5. NOMENCLATURAL DESCRIPTION TABLE OF SANSKRIT LANGUAGE TABLE ............................................................................................................................................11 6. VARIANT TABLE ........................................................................................................ 14 7. EXPERTISE/BODIES CONSULTED .......................................................................... 15 8. -
MSR-4: Annotated Repertoire Tables, Non-CJK
Maximal Starting Repertoire - MSR-4 Annotated Repertoire Tables, Non-CJK Integration Panel Date: 2019-01-25 How to read this file: This file shows all non-CJK characters that are included in the MSR-4 with a yellow background. The set of these code points matches the repertoire specified in the XML format of the MSR. Where present, annotations on individual code points indicate some or all of the languages a code point is used for. This file lists only those Unicode blocks containing non-CJK code points included in the MSR. Code points listed in this document, which are PVALID in IDNA2008 but excluded from the MSR for various reasons are shown with pinkish annotations indicating the primary rationale for excluding the code points, together with other information about usage background, where present. Code points shown with a white background are not PVALID in IDNA2008. Repertoire corresponding to the CJK Unified Ideographs: Main (4E00-9FFF), Extension-A (3400-4DBF), Extension B (20000- 2A6DF), and Hangul Syllables (AC00-D7A3) are included in separate files. For links to these files see "Maximal Starting Repertoire - MSR-4: Overview and Rationale". How the repertoire was chosen: This file only provides a brief categorization of code points that are PVALID in IDNA2008 but excluded from the MSR. For a complete discussion of the principles and guidelines followed by the Integration Panel in creating the MSR, as well as links to the other files, please see “Maximal Starting Repertoire - MSR-4: Overview and Rationale”. Brief description of exclusion -
Some Observations on the Padapå†Ha of the Ùgveda* (Published In: Indo-Iranian Journal 24 (1982), Pp
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Serveur académique lausannois PADAPÓÈHA OF THE ÙGVEDA 1 JOHANNES BRONKHORST Some observations on the padapå†ha of the Ùgveda* (published in: Indo-Iranian Journal 24 (1982), pp. 181-189) 1.1. In another article (Bronkhorst, 1981: § 1; see also Bronkhorst, 1982) I have shown that the Padapå†ha of the Ùgveda — composed by Íåkalya according to Yåska's Nirukta 6.28 — is older than the finally redacted version of that same Veda. This implies that the Ùgveda known to Íåkalya, on the basis of which he composed his Padapå†ha, had a form which was more archaic than ‘our’ Saµhitåpå†ha, at least where it concerns details of sandhi. This information is, by itself, of limited value, since it is exactly the details of sandhi which are largely absent from the Padapå†ha. Comparison of the text known to Íåkalya with the finally redacted Saµhitåpå†ha is therefore rarely possible, e.g., where we know how Íåkalya wanted the words of his Padapå†ha to be joined together. We shall discuss one particularly revealing case. It has long been known that e.g. RV 1.164.8 Sp. dh¥ty agre and RV 1.20.4 Sp. vi∑†y akrata replace original dh¥ti agre and vi∑†i akrata; see, e.g., Wackernagel, 1896: 322; Kuiper, 1955: 256. The Padapå†ha has dh¥t¥/ agre/ and vi∑†¥/ akrata/, and is therefore simply wrong. This does not, however, mean that the text which Íåkalya had before him was wrong. -
Grantha Script Lessons.Pdf
1 | Page http://www.virtualvinodh.com Grantha Script Lessons Grantha Lipi Pāṭhāḥ �न् िलिप पाठाः ³ரத² பி பாடா²: ലിപി പാഠാഃ ගන් ලි පාඨාඃ คฺรนฺถ ลิปิ ปาฐา: �គន្ លិ បិ បោឋ Grantha Script Lessons by Vinodh Rajan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 India License. Based on a work at www.virtualvinodh.com. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc- sa/2.5/in/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA. 2 | Page http://www.virtualvinodh.com Contents Buddhanusmriti 3 Grantha - 1 - Vowels 5 Grantha - 2 - Ayogavaha 11 Grantha - 3 - Consonants - Ka 17 Grantha - 4 - Consonants ca - ṭa 21 Grantha - 5 - Consonants ta - pa 25 Grantha - 6 - Consonants ya - ha 29 Grantha - 7 - Summary I 34 Grantha - 8 - Vowel Signs I 42 Grantha - 9 - Vowel Signs II 48 Grantha - 10 - Vowel-less Consonants 52 Grantha - 11 - Summary II 55 Grantha - 12 - Conjuncts I 61 Grantha - 13 - Conjuncts II 64 Grantha - 14 - Conjuncts III 70 Grantha - 15 - Conjunct IV 75 Grantha - 16 - Conjuncts V 80 Grantha - 17 - Grantha Fonts & Softwares 85 Sample Texts in Grantha 93 3 | Page http://www.virtualvinodh.com buddhānusmṛtiḥ बद्धु नुु ्मृ�स �³த³தா �ஸ்மʼதி: ⁴ oṁ namaḥ sarvabuddhabodhisattvebhyaḥ ॐ नमः सवरबदबु ्धबस�धसत्तृ ஓ்ʼ நம: ஸர்�³த³த ேபா³தி ஸத்த்ே ய: ⁴ ⁴ ⁴ ityapi buddho bhagavāṁstathāgato'rhan इत्य��बद्�ु धो भगवांस्तथागतो ऽ இத்யப��³த³ேதா ப க³்ா்ʼஸததா²க³ேதா(அ)ர்ஹ ⁴ ⁴ samyaksaṁbuddho vidyācaraṇasampannaḥ समतयक्सद्�यवु ोत्वद्याचरधृ ஸம்யஸ்ʼ�³த³ேதா -
N4035 Proposal to Encode the Maithili Script in ISO/IEC 10646
ISO/IEC JTC1/SC2/WG2 N4035 L2/11-175 2011-05-05 Proposal to Encode the Maithili Script in ISO/IEC 10646 Anshuman Pandey Department of History University of Michigan Ann Arbor, Michigan, U.S.A. [email protected] May 5, 2011 1 Introduction This is a proposal to encode the Maithili script in the Universal Character Set (UCS). The recommendations made here are based upon the following documents: • L2/06-226 “Request to Allocate the Maithili Script in the Unicode Roadmap” • N3765 L2/09-329 “Towards an Encoding for the Maithili Script in ISO/IEC 10646” 2 Background Maithili is the traditional writing system for the Maithili language (ISO 639: mai), which is spoken pre- dominantly in the state of Bihar in India and in the Narayani and Janakpur zones of Nepal by more than 35 million people. Maithili is a scheduled language of India and the second most spoken language in Nepal. Maithili is a Brahmi-based script derived from Gauḍī, or ‘Proto-Bengali’, which evolved from the Kuṭila branch of Brahmi by the 10th century . It is related to the Bengali, Newari, and Oriya, which are also descended from Gauḍī, and became differentiated from these scripts by the 14th century.1 It remained the dominant writing system for Maithili until the 20th century. The script is also known as Mithilākṣara and Tirhutā, names which refer to Mithila, or Tirhut, as these regions of India and Nepal are historically known. The Maithili script is known in Bengal as tiruṭe, meaning the script of the Tirhuta region.2 The Maithili script is associated with a scholarly and scribal tradition that has produced literary and philo- sophical works in the Maithili and Sanskrit languages from at least the 14th century . -
Creating Unicode Compatible Opentype Telugu Fonts
Creating Unicode Compatible OpenType Telugu Fonts Tirumala Krishna Desikacharyulu Winnipeg, Manitoba, Canada Abstract Pothana2000 is the first Unicode compatible OpenType font designed for Telugu. It was designed under Windows 2000, with generous help from Apurva Joshi of Microsoft's Opentype Division. This font was working in Windows 2000 even before Microsoft officially released the Gautami Opentype Telugu font with Windows XP Professional. In this paper I discuss the practical aspects of designing the Pothana2000 font so as to provide stimulus and guidance to other Telugu font designers. 1.0 Introduction With Windows 2000 Microsoft started supporting Indian languages. The initial support was limited to Hindi, Sanskrit, Marathi, Konkani and Tamil. This support was extended to Telugu and Kannada and some other languages in their Windows XP Professional operating system, and in their Office products such as Word XP. With XP, Microsoft also provided an Opentype Telugu font named Gautami. But it is a proprietary font, not available without license from Microsoft. Although Gautami serves the basic needs of the operating system, it is not the best looking font for aesthetic composition of Telugu documents. So, there is a great need to develop aesthetically appealing Telugu fonts for publication of Telugu documents and books. I was active in Telugu font design for more than a decade. My Pothana font, which was designed for pre-2000 Windows systems is widely used for publications in North America. However, like many other Telugu fonts designed in these environments, it doesn't conform to any standard. So, when Microsoft started supporting Unicode Indian fonts in Windows 2000, I upgraded this font to Unicode compatible OpenType format, thus making it the first Telugu font working in this environment. -
Transliteration Guide for Members of the DHARMA Project Dániel Balogh, Arlo Griffiths
Transliteration Guide for Members of the DHARMA Project Dániel Balogh, Arlo Griffiths To cite this version: Dániel Balogh, Arlo Griffiths. Transliteration Guide for Members of the DHARMA Project. 2019. halshs-02272407v2 HAL Id: halshs-02272407 https://hal.archives-ouvertes.fr/halshs-02272407v2 Preprint submitted on 20 Dec 2019 (v2), last revised 2 Jul 2020 (v3) HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives| 4.0 International License Transliteration Guide for members of the project Release Version 1.1, 2019-12-18 Dániel Balogh & Arlo Griffiths Contents 1. Introduction ................................................................................................................. 2 1.1. Version History 2 1.1.1. Summary of changes since the last version 2 1.2. Coverage 2 1.3. Separation of Transliteration and Encoding 2 1.4. Terms and Definitions 3 1.4.1. Abbreviations 3 1.4.2. Script and its elements 4 1.4.3. Script conversion 6 1.4.4. Notation for transliteration and transcription 7 2. General Principles ........................................................................................................ 8 2.1. Character Set and Input Method 8 2.2. Strict and Loose Transliteration 9 2.2.1. Strict transliteration 9 2.2.2.