Brahmic Schwa-Deletion with Neural Classifiers: Experiments with Bengali

Total Page:16

File Type:pdf, Size:1020Kb

Brahmic Schwa-Deletion with Neural Classifiers: Experiments with Bengali The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages 29-31 August 2018, Gurugram, India Brahmic Schwa-Deletion with Neural Classifiers: Experiments with Bengali Cibu Johny, Martin Jansche Google AI, London, United Kingdom {cibu, mjansche}@google.com Abstract Table 1: The words car and bus as loanwords in various lan- guages, shown in native orthography and transliteration The Brahmic family of writing systems is an alpha-syllabary, in which a consonant letter without an explicit vowel marker can Group Languages Orth. Trans. Orth. Trans. be ambiguous: it can either represent a consonant phoneme or Bengali কার kār@ বাস bās@ a CV syllable with an inherent vowel (“schwa”). The schwa- Hindi, Marathi, Nepali कार kār@ बस b@s@ deletion ambiguity must be resolved when converting from text 1 Gujarati કાર kār@ બસ b@s@ to an accurate phonemic representation, particularly for text-to- Sinhala කා kār බ b@s Malayalam കാർ kār ബസ് b@s speech synthesis. We situate the problem of Bengali schwa- Tamil கா kār ப p@s deletion in the larger context of grapheme-to-phoneme conver- 2 sion for Brahmic scripts and solve it using neural network clas- sifiers with graphemic features that are independent of the script The phenomenon of schwa deletion arises in several lan- and the language. Classifier training is implemented using Ten- guages written in Brahmic scripts, including in Hindi [3, 4], sorFlow and related tools. We analyze the impact of both train- Punjabi [5] in Brahmic Gurmukhi (as opposed to Perso-Arabic ing data size and trained model size, as these represent real-life Shahmukhi) script, Marathi [6, 7], and others. Choudhury data collection and system deployment constraints. Our method et al. [8] approach the problem diachronically for several lan- achieves high accuracy for Bengali and is applicable to other guages, including Bengali. languages written with Brahmic scripts. Index Terms: Schwa-deletion, grapheme-to-phoneme, Bengali, Schwa deletion does not affect the entire Brahmic fam- Brahmic scripts, phonology, TensorFlow, neural networks ily. To show the absence of an inherent vowel, most Brah- mic scripts can make use of ligatures and of an explicit vowel suppression mark, generically called virama following Sanskrit/ 1. Introduction Unicode conventions. One subset of languages – including San- The English word table exists as a loanword in Bengali, where it skrit, Sinhala, and the major Dravidian languages – requires ex- is pronounced /ʈebil/1. In Bengali script – also known as Eastern plicit signaling of schwa deletion: a consonant without an inher- Nagari, a prominent member of the Brahmic family [1] – this is ent vowel must appear in ligated form or carry a visible virama; written as টিবল. It is divided into three parts, one of which is otherwise the inherent vowel is always realized. ট followed by িব and ল. The first, ট, represents the syllable The cross-linguistic situation is illustrated in Table 1. For /ʈe/ and consists of the consonant letter ট (indicating /ʈ/) and easy comparison a uniform alphabetic transliteration (based on the modifier ◌ appearing on its left and indicating /e/. This [9] and purely graphemic) is shown alongside the native abugida illustrates the workings of an alpha-syllabary [2] or abugida: in orthographies. Bengali and other languages in Group 1 do not this type of writing system, consonant letters like ট and ব can explicitly suppress the final schwa: phonetically both car and support vowel modifiers, which attach to them. A typical visual bus end in a consonant, but this fact is not indicated in the or- cluster represents a consonant-vowel (CV) syllable: ট is read thography and must be inferred. Hindi kār@ 2, and b@s@ ⟨ ⟩ ⟨ ⟩ /ʈe/ and িব is read /bi/. contrast with Sinhala kār and b@s . Sinhala and the Dra- This raises two related questions: How should a consonant vidian languages in Group⟨ ⟩ 2 consistently⟨ ⟩ use a visible virama sound be written when it is not followed by a vowel? And how (sometimes ligated in Malayalam) on the last letter of each should a consonant letter be read if no modifier is attached? word. Group 2 has a straightforward answer to our first ques- In an abugida an unmodified consonant is thought to carry tions: How should a consonant sound be written when it is not an inherent vowel and is read as a CV syllable. The precise pho- followed by a vowel? The consonant letter must be explicitly netic quality of that vowel varies across languages, and some- marked as lacking a vowel. times within the same language. Following established conven- Such simple conventions are absent from Group 1 lan- tions, we refer to the inherent vowel as schwa. We use that term guages, not only those listed in Table 1 but also Punjabi and in a purely graphemic sense, without any phonetic implications. Kashmiri in Brahmic scripts, Bhojpuri, Maithili, Assamese, Syl- The English word ball is pronounced /bɔl/ as a loanword in heti, and others. Consequently Group 1 has no straightforward Bengali. Its written representation বল consists of two unmodi- answer to our second question: How should a consonant letter fied consonant letters already familiar from above. The default be read if no modifier is attached? As the words बस b@s@ reading of the inherent vowel in Bengali is /ɔ/, and so plain is ⟨ ⟩ ব /bəs/ in Hindi and বল b@l@ /bɔl/ in Bengali illustrate, the read /bɔ/. We say that schwa has been phonetically realized. inherent vowel can either⟨ be realized⟩ or deleted. This paper de- In both বল /bɔl/ and the earlier example টিবল /ʈebil/ the scribes a general method for resolving this ambiguity. last letter ল has no modifiers attached and represents the final consonant sound. The inherent vowel of ল is not realized pho- netically. We say that schwa has been deleted. 2Brahmic graphemes are transliterated to the Latin script inside . ⟨⟩ The inherent vowel is represented by @ unless it has been explicitly 1Phonetic transcriptions are written with IPA symbols inside slashes. suppressed, in which case no corresponding⟨ ⟩ vowel symbol is written. 264 10.21437/SLTU.2018-55 Table 2: Illustration of g2p stages and schwa outcomes Table 3: Comparison of schwa deletion in Hindi and Bengali Representation stage Example Language Orthography Transliteration Phonemes processing step Hindi राम rām@ ram Bengali রাম rām@ ram 1 Orthography ক ম ল ন গ র transliterate Hindi आठ सौ āṭ@ sau aʈsɔ ↓ Bengali আটশ āṭ@ś@ aʈʃo 2 Abstract graphemes, with schwa k@ m@ l@ n@ g@ r@ resolve schwa Hindi कम k@rm@ kərm ↓ Bengali k@rm@ kɔrmo 3 Abstract graphemes, no schwa k a m a l n a g a r কম convert to phonemes Hindi अपयात ap@ryāpt@ əpərjapt ↓ 4 Phonemes k ɔ m o l n ɔ g o r Bengali অপযা ap@ryāpt@ ɔpɔrdʒapto larger g2p and phonological regularities in Bengali.3 For exam- 2. Background ple the raising of /ɔ/ to /o/ generally affects all occurrences of /ɔ/, The task of relating a written representation of a word to a regardless of whether they arose from an inherent vowel or from the independent vowel letter অ a . It also parallels the raising phonemic representation is known as grapheme-to-phoneme ⟨ ⟩ conversion, or g2p for short. It is a common building block in of /ɛ/ to /e/ in Bengali. In the context of Bengali TTS [11] our several larger tasks, including transliteration [7], speech recog- preferred approach is to train sequence-to-sequence models on nition, and text-to-speech synthesis (TTS) [3, 4], where the need annotated data. For other languages, hand-written rewrite rules for highly accurate pronunciations is especially acute. go a long way. Details are beyond the scope of the current paper. 2.1. Grapheme-to-phoneme conversion for Brahmic 2.2. Schwa deletion in Bengali compared with Hindi Schwa deletion is a well-studied phenomenon of Hindi [3, 4]. The wide variety of Brahmic scripts used for writing South Bengali shares some similarities, but differs in important ways. Asian languages creates both complexities and opportunities. Table 3 compares cognates between Hindi and Bengali. The sheer diversity of scripts is an obstacle, but their similar Words like the name Ram that end in simple final consonants workings provide an opportunity for applying a single generic are written and pronounced essentially the same in Hindi and solution to many similar problems. Our larger aim – beyond the Bengali: the final schwa deletes. On the other hand, in the com- scope of this paper – is towards generic grapheme-to-phoneme pound word āṭ@ś@ (eight hundred) the middle schwa for Brahmic across similar languages, with minimal language- আটশ deletes and the final⟨ schwa is⟩ pronounced. specific customization. A generic approach must arguably be Hindi has a very strong prohibition against final schwa. able to deal with the schwa deletion phenomenon, as it affects a While this is a general preference in Bengali as well, a simi- large group of languages to various degrees. larly strong restriction does not exist there. In fact, in several We conceive of schwa resolution as one module in a mul- situations a final schwa must be pronounced in Bengali, when tistage process outlined in Table 2 which converts script char- it would be deleted in the corresponding Hindi word. One large acters (Stage 1) to phonemes (Stage 4). Table 2 illustrates and systematic class of exceptions are those that end in conso- the processing stages and steps using the Bengali place name nant clusters in Hindi and in the corresponding cluster followed (Kamalnagar), pronounced /kɔmolnɔgor/. কমলনগর by /o/ in Bengali. Table 3 lists several examples.
Recommended publications
  • Talking Stone: Cherokee Syllabary Inscriptions in Dark Zone Caves
    University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Masters Theses Graduate School 12-2017 Talking Stone: Cherokee Syllabary Inscriptions in Dark Zone Caves Beau Duke Carroll University of Tennessee, [email protected] Follow this and additional works at: https://trace.tennessee.edu/utk_gradthes Recommended Citation Carroll, Beau Duke, "Talking Stone: Cherokee Syllabary Inscriptions in Dark Zone Caves. " Master's Thesis, University of Tennessee, 2017. https://trace.tennessee.edu/utk_gradthes/4985 This Thesis is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Masters Theses by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a thesis written by Beau Duke Carroll entitled "Talking Stone: Cherokee Syllabary Inscriptions in Dark Zone Caves." I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Arts, with a major in Anthropology. Jan Simek, Major Professor We have read this thesis and recommend its acceptance: David G. Anderson, Julie L. Reed Accepted for the Council: Dixie L. Thompson Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) Talking Stone: Cherokee Syllabary Inscriptions in Dark Zone Caves A Thesis Presented for the Master of Arts Degree The University of Tennessee, Knoxville Beau Duke Carroll December 2017 Copyright © 2017 by Beau Duke Carroll All rights reserved ii ACKNOWLEDGMENTS This thesis would not be possible without the following people who contributed their time and expertise.
    [Show full text]
  • Cross-Language Framework for Word Recognition and Spotting of Indic Scripts
    Cross-language Framework for Word Recognition and Spotting of Indic Scripts aAyan Kumar Bhunia, bPartha Pratim Roy*, aAkash Mohta, cUmapada Pal aDept. of ECE, Institute of Engineering & Management, Kolkata, India bDept. of CSE, Indian Institute of Technology Roorkee India cCVPR Unit, Indian Statistical Institute, Kolkata, India bemail: [email protected], TEL: +91-1332-284816 Abstract Handwritten word recognition and spotting of low-resource scripts are difficult as sufficient training data is not available and it is often expensive for collecting data of such scripts. This paper presents a novel cross language platform for handwritten word recognition and spotting for such low-resource scripts where training is performed with a sufficiently large dataset of an available script (considered as source script) and testing is done on other scripts (considered as target script). Training with one source script and testing with another script to have a reasonable result is not easy in handwriting domain due to the complex nature of handwriting variability among scripts. Also it is difficult in mapping between source and target characters when they appear in cursive word images. The proposed Indic cross language framework exploits a large resource of dataset for training and uses it for recognizing and spotting text of other target scripts where sufficient amount of training data is not available. Since, Indic scripts are mostly written in 3 zones, namely, upper, middle and lower, we employ zone-wise character (or component) mapping for efficient learning purpose. The performance of our cross- language framework depends on the extent of similarity between the source and target scripts.
    [Show full text]
  • SC22/WG20 N896 L2/01-476 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale De Normalisation
    SC22/WG20 N896 L2/01-476 Universal multiple-octet coded character set International organization for standardization Organisation internationale de normalisation Title: Ordering rules for Khmer Source: Kent Karlsson Date: 2001-12-19 Status: Expert contribution Document type: Working group document Action: For consideration by the UTC and JTC 1/SC 22/WG 20 1 Introduction The Khmer script in Unicode/10646 uses conjoining characters, just like the Indic scripts and Hangul. An alternative that is suitable (and much more elegant) for Khmer, but not for Indic scripts, would have been to have combining- below (and sometimes a bit to the side) consonants and combining-below independent vowels, similar to the combining-above Latin letters recently encoded, as well as how Tibetan is handled. However that is not the chosen solution, which instead uses a combining character (COENG) that makes characters conjoin (glue) like it is done for Indic (Brahmic) scripts and like Hangul Jamo, though the latter does not have a separate gluer character. In the Khmer script, using the COENG based approach, the words are formed from orthographic syllables, where an orthographic syllable has the following structure [add ligation control?]: Khmer-syllable ::= (K H)* K M* where K is a Khmer consonant (most with an inherent vowel that is pronounced only if there is no consonant, independent vowel, or dependent vowel following it in the orthographic syllable) or a Khmer independent vowel, H is the invisible Khmer conjoint former COENG, M is a combining character (including COENG, though that would be a misspelling), in particular a combining Khmer vowel (noted A below) or modifier sign.
    [Show full text]
  • The Festvox Indic Frontend for Grapheme-To-Phoneme Conversion
    The Festvox Indic Frontend for Grapheme-to-Phoneme Conversion Alok Parlikar, Sunayana Sitaram, Andrew Wilkinson and Alan W Black Carnegie Mellon University Pittsburgh, USA aup, ssitaram, aewilkin, [email protected] Abstract Text-to-Speech (TTS) systems convert text into phonetic pronunciations which are then processed by Acoustic Models. TTS frontends typically include text processing, lexical lookup and Grapheme-to-Phoneme (g2p) conversion stages. This paper describes the design and implementation of the Indic frontend, which provides explicit support for many major Indian languages, along with a unified framework with easy extensibility for other Indian languages. The Indic frontend handles many phenomena common to Indian languages such as schwa deletion, contextual nasalization, and voicing. It also handles multi-script synthesis between various Indian-language scripts and English. We describe experiments comparing the quality of TTS systems built using the Indic frontend to grapheme-based systems. While this frontend was designed keeping TTS in mind, it can also be used as a general g2p system for Automatic Speech Recognition. Keywords: speech synthesis, Indian language resources, pronunciation 1. Introduction in models of the spectrum and the prosody. Another prob- lem with this approach is that since each grapheme maps Intelligible and natural-sounding Text-to-Speech to a single “phoneme” in all contexts, this technique does (TTS) systems exist for a number of languages of the world not work well in the case of languages that have pronun- today. However, for low-resource, high-population lan- ciation ambiguities. We refer to this technique as “Raw guages, such as languages of the Indian subcontinent, there Graphemes.” are very few high-quality TTS systems available.
    [Show full text]
  • Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
    1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only.
    [Show full text]
  • Writing Systems Reading and Spelling
    Writing systems Reading and spelling Writing systems LING 200: Introduction to the Study of Language Hadas Kotek February 2016 Hadas Kotek Writing systems Writing systems Reading and spelling Outline 1 Writing systems 2 Reading and spelling Spelling How we read Slides credit: David Pesetsky, Richard Sproat, Janice Fon Hadas Kotek Writing systems Writing systems Reading and spelling Writing systems What is writing? Writing is not language, but merely a way of recording language by visible marks. –Leonard Bloomfield, Language (1933) Hadas Kotek Writing systems Writing systems Reading and spelling Writing systems Writing and speech Until the 1800s, writing, not spoken language, was what linguists studied. Speech was often ignored. However, writing is secondary to spoken language in at least 3 ways: Children naturally acquire language without being taught, independently of intelligence or education levels. µ Many people struggle to learn to read. All human groups ever encountered possess spoken language. All are equal; no language is more “sophisticated” or “expressive” than others. µ Many languages have no written form. Humans have probably been speaking for as long as there have been anatomically modern Homo Sapiens in the world. µ Writing is a much younger phenomenon. Hadas Kotek Writing systems Writing systems Reading and spelling Writing systems (Possibly) Independent Inventions of Writing Sumeria: ca. 3,200 BC Egypt: ca. 3,200 BC Indus Valley: ca. 2,500 BC China: ca. 1,500 BC Central America: ca. 250 BC (Olmecs, Mayans, Zapotecs) Hadas Kotek Writing systems Writing systems Reading and spelling Writing systems Writing and pictures Let’s define the distinction between pictures and true writing.
    [Show full text]
  • Myanmar Script Learning Guide Burmese 1,2,3 Tone System Is Developed by Naing Tin-Nyunt-Pu Copyright © 2013-2017, Naing Tinnyuntpu & Asia Pearl Travels
    Myanmar Script Learning guide (Revision F) by Naing Tinnyuntpu WITH FREE ONLINE AUDIO SUPPORT https://www.asiapearltravels.com/language/myanmar-script-learning-guide.php 1 of 104 Menu ≡ Introduction ........................................................................................4 Vowel: A .............................................................................................6 Vowel: E or I ....................................................................................13 Vowel: U ..........................................................................................20 Vowel: O ..........................................................................................24 Vowel: Ay .........................................................................................28 Vowel: Au ........................................................................................33 Vowel: Un or An ..............................................................................37 Vowel: In ..........................................................................................42 Vowel: Eare ......................................................................................46 Vowel: Ain .......................................................................................51 Vowel: Ome .....................................................................................53 Vowel: Ine ........................................................................................56 Vowel: Oon ......................................................................................59
    [Show full text]
  • Proposal to Encode 0D5F MALAYALAM LETTER ARCHAIC II
    Proposal to encode 0D5F MALAYALAM LETTER ARCHAIC II Shriramana Sharma, jamadagni-at-gmail-dot-com, India 2012-May-22 §1. Introduction In the Malayalam Unicode encoding, the independent letter form for the long vowel Ī is: ഈ where the length mark ◌ൗ is appended to the short vowel ഇ to parallel the symbols for U/UU i.e. ഉ/ഊ. However, Ī was originally written as: As these are entirely different representations of Ī, although their sound value may be the same, it is proposed to encode the archaic form as a separate character. §2. Background As the core Unicode encoding for the major Indic scripts is based on ISCII, and the ISCII code chart for Malayalam only contained the modern form for the independent Ī [1, p 24]: … thus it is this written form that came to be encoded as 0D08 MALAYALAM LETTER II. While this “new” written form is seen in print as early as 1936 CE [2]: … there is no doubt that the much earlier form was parallel to the modern Grantha , both of which are derived from the old Grantha Ī as seen below: 1 ma - īdṛgvidhā | tatarāya īṇ Old Grantha from the Iḷaiyānputtūr copper plates [3, p 13]. Also seen is the Vatteḻuttu Ī of the same time (line 2, 2nd char from right) which also exhibits the two dots. Of course, both are derived from old South Indian Brahmi Ī which again has the two dots. It is said [entire paragraph: Radhakrishna Warrier, personal communication] that it was the poet Vaḷḷattōḷ Nārāyaṇa Mēnōn (1878–1958) who introduced the new form of Ī ഈ.
    [Show full text]
  • An Introduction to Old Persian Prods Oktor Skjærvø
    An Introduction to Old Persian Prods Oktor Skjærvø Copyright © 2016 by Prods Oktor Skjærvø Please do not cite in print without the author’s permission. This Introduction may be distributed freely as a service to teachers and students of Old Iranian. In my experience, it can be taught as a one-term full course at 4 hrs/w. My thanks to all of my students and colleagues, who have actively noted typos, inconsistencies of presentation, etc. TABLE OF CONTENTS Select bibliography ................................................................................................................................... 9 Sigla and Abbreviations ........................................................................................................................... 12 Lesson 1 ..................................................................................................................................................... 13 Old Persian and old Iranian. .................................................................................................................... 13 Script. Origin. .......................................................................................................................................... 14 Script. Writing system. ........................................................................................................................... 14 The syllabary. .......................................................................................................................................... 15 Logograms. ............................................................................................................................................
    [Show full text]
  • Writing Systems: Their Properties and Implications for Reading
    Writing Systems: Their Properties and Implications for Reading Brett Kessler and Rebecca Treiman doi:10.1093/oxfordhb/9780199324576.013.1 Draft of a chapter to appear in: The Oxford Handbook of Reading, ed. by Alexander Pollatsek and Rebecca Treiman. ISBN 9780199324576. Abstract An understanding of the nature of writing is an important foundation for studies of how people read and how they learn to read. This chapter discusses the characteristics of modern writing systems with a view toward providing that foundation. We consider both the appearance of writing systems and how they function. All writing represents the words of a language according to a set of rules. However, important properties of a language often go unrepresented in writing. Change and variation in the spoken language result in complex links to speech. Redundancies in language and writing mean that readers can often get by without taking in all of the visual information. These redundancies also mean that readers must often supplement the visual information that they do take in with knowledge about the language and about the world. Keywords: writing systems, script, alphabet, syllabary, logography, semasiography, glottography, underrepresentation, conservatism, graphotactics The goal of this chapter is to examine the characteristics of writing systems that are in use today and to consider the implications of these characteristics for how people read. As we will see, a broad understanding of writing systems and how they work can place some important constraints on our conceptualization of the nature of the reading process. It can also constrain our theories about how children learn to read and about how they should be taught to do so.
    [Show full text]
  • Comment on L2/13-065: Grantha Characters from Other Indic Blocks Such As Devanagari, Tamil, Bengali Are NOT the Solution
    Comment on L2/13-065: Grantha characters from other Indic blocks such as Devanagari, Tamil, Bengali are NOT the solution Dr. Naga Ganesan ([email protected]) Here is a brief comment on L2/13-065 which suggests Devanagari block for Grantha characters to write both Aryan and Dravidian languages. From L2/13-065, which suggests Devanagari block for writing Sanskrit and Dravidian languages, we read: “Similar dedicated characters were provided to cater different fonts as URUDU MARATHI SINDHI MARWARI KASHMERE etc for special conditions and needs within the shared DEVANAGARI range. There are provisions made to suit Dravidian specials also as short E short O LLLA RRA NNNA with new created glyptic forms in rhythm with DEVANAGARI glyphs (with suitable combining ortho-graphic features inside DEVANAGARI) for same special functions. These will help some aspirant proposers as a fall out who want to write every Indian language contents in Grantham when it is unified with DEVANAGARI because those features were already taken care of by the presence in that DEVANAGARI. Even few objections seen in some docs claim them as Tamil characters to induce a bug inside Grantham like letters from GoTN and others become tangential because it is going to use the existing Sanskrit properties inside shared DEVANAGARI and also with entirely new different glyph forms nothing connected with Tamil. In my doc I have shown rhythmic non-confusable glyphs for Grantham LLLA RRA NNNA for which I believe there cannot be any varied opinions even by GoTN etc which is very much as Grantham and not like Tamil form at all to confuse (as provided in proposals).” Grantha in Devanagari block to write Sanskrit and Dravidian languages will not work.
    [Show full text]
  • Proposal for Ethiopic Script Root Zone LGR
    Proposal for Ethiopic Script Root Zone LGR LGR Version 2 Date: 2017-05-17 Document version:5.2 Authors: Ethiopic Script Generation Panel Contents 1 General Information/ Overview/ Abstract ........................................................................................ 3 2 Script for which the LGR is proposed ................................................................................................ 3 3 Background on Script and Principal Languages Using It .................................................................... 4 3.1 Local Languages Using the Script .............................................................................................. 4 3.2 Geographic Territories of the Language or the Language Map of Ethiopia ................................ 7 4 Overall Development Process and Methodology .............................................................................. 8 4.1 Sources Consulted to Determine the Repertoire....................................................................... 8 4.2 Team Composition and Diversity .............................................................................................. 9 4.3 Analysis of Code Point Repertoire .......................................................................................... 10 4.4 Analysis of Code Point Variants .............................................................................................. 11 5 Repertoire ....................................................................................................................................
    [Show full text]