Proposal to Encode the Hanifi Rohingya Script in Unicode
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Some Principles of the Use of Macro-Areas Language Dynamics &A
Online Appendix for Harald Hammarstr¨om& Mark Donohue (2014) Some Principles of the Use of Macro-Areas Language Dynamics & Change Harald Hammarstr¨om& Mark Donohue The following document lists the languages of the world and their as- signment to the macro-areas described in the main body of the paper as well as the WALS macro-area for languages featured in the WALS 2005 edi- tion. 7160 languages are included, which represent all languages for which we had coordinates available1. Every language is given with its ISO-639-3 code (if it has one) for proper identification. The mapping between WALS languages and ISO-codes was done by using the mapping downloadable from the 2011 online WALS edition2 (because a number of errors in the mapping were corrected for the 2011 edition). 38 WALS languages are not given an ISO-code in the 2011 mapping, 36 of these have been assigned their appropri- ate iso-code based on the sources the WALS lists for the respective language. This was not possible for Tasmanian (WALS-code: tsm) because the WALS mixes data from very different Tasmanian languages and for Kualan (WALS- code: kua) because no source is given. 17 WALS-languages were assigned ISO-codes which have subsequently been retired { these have been assigned their appropriate updated ISO-code. In many cases, a WALS-language is mapped to several ISO-codes. As this has no bearing for the assignment to macro-areas, multiple mappings have been retained. 1There are another couple of hundred languages which are attested but for which our database currently lacks coordinates. -
Constructed Scripts
The Art and Science of Constructed Scripts Jasper Danielson Rice University 5/5/13 2 Images from Omniglot.com Why create a constructed script? Companion to a Conlang • Adds depth to the language/world • Can provide a social or historical feature • Provides the public face of a conlang • Can either enhance, or detract from, a conlang Companion to a Conlang • Tolkien’s constructed scripts • Tengwar • Cirth • Sarati • Many others… International Alphabets • International Phonetic AlphaBet (IPA) • Interbet • Universal Phonetic AlphaBet Shorthand Scripts Gregg Shorthand Shorthand Scripts Gregg Shorthand Other op<mizaon scripts Non-Linguistic Uses • Mathematical shorthand 0 < |x – x0| < δ ==> |f(x) – L| < ε • Musical notation • Computer Programming DO :1 <- #0¢#256 Educational Con-scripts • A novel way to introduce the study of languages in the classroom • Gets children excited aBout learning languages • Recruiting! The Neuroscience of Language Audiovisual Pathways • Connection Between how we process: • Written language • Speech • Emotion • Conlangers can play on this connection to create better scripts Synesthesia • Neurological disorder where phonemes/graphemes are associated with a sensory experience • Grapheme/color • Ordinal-Linguistic PersoniTication Synesthesia "T’s are generally crabbed, ungenerous creatures. U is a soulless sort of thing. 4 is honest, But… 3 I cannot trust… 9 is dark, a gentleman, tall and graceful, But politic under his suavity.” -Anonymous Synesthete I am a synesthete! (But so are all of you!) Kiki / Bouba Effect -
The Festvox Indic Frontend for Grapheme-To-Phoneme Conversion
The Festvox Indic Frontend for Grapheme-to-Phoneme Conversion Alok Parlikar, Sunayana Sitaram, Andrew Wilkinson and Alan W Black Carnegie Mellon University Pittsburgh, USA aup, ssitaram, aewilkin, [email protected] Abstract Text-to-Speech (TTS) systems convert text into phonetic pronunciations which are then processed by Acoustic Models. TTS frontends typically include text processing, lexical lookup and Grapheme-to-Phoneme (g2p) conversion stages. This paper describes the design and implementation of the Indic frontend, which provides explicit support for many major Indian languages, along with a unified framework with easy extensibility for other Indian languages. The Indic frontend handles many phenomena common to Indian languages such as schwa deletion, contextual nasalization, and voicing. It also handles multi-script synthesis between various Indian-language scripts and English. We describe experiments comparing the quality of TTS systems built using the Indic frontend to grapheme-based systems. While this frontend was designed keeping TTS in mind, it can also be used as a general g2p system for Automatic Speech Recognition. Keywords: speech synthesis, Indian language resources, pronunciation 1. Introduction in models of the spectrum and the prosody. Another prob- lem with this approach is that since each grapheme maps Intelligible and natural-sounding Text-to-Speech to a single “phoneme” in all contexts, this technique does (TTS) systems exist for a number of languages of the world not work well in the case of languages that have pronun- today. However, for low-resource, high-population lan- ciation ambiguities. We refer to this technique as “Raw guages, such as languages of the Indian subcontinent, there Graphemes.” are very few high-quality TTS systems available. -
Con-Scripting the Masses: False Documents and Historical Revisionism in the Americas
University of Massachusetts Amherst ScholarWorks@UMass Amherst Open Access Dissertations 2-2011 Con-Scripting the Masses: False Documents and Historical Revisionism in the Americas Frans Weiser University of Massachusetts Amherst Follow this and additional works at: https://scholarworks.umass.edu/open_access_dissertations Part of the Comparative Literature Commons Recommended Citation Weiser, Frans, "Con-Scripting the Masses: False Documents and Historical Revisionism in the Americas" (2011). Open Access Dissertations. 347. https://scholarworks.umass.edu/open_access_dissertations/347 This Open Access Dissertation is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Open Access Dissertations by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact [email protected]. CON-SCRIPTING THE MASSES: FALSE DOCUMENTS AND HISTORICAL REVISIONISM IN THE AMERICAS A Dissertation Presented by FRANS-STEPHEN WEISER Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment Of the requirements for the degree of DOCTOR OF PHILOSOPHY February 2011 Program of Comparative Literature © Copyright 2011 by Frans-Stephen Weiser All Rights Reserved CON-SCRIPTING THE MASSES: FALSE DOCUMENTS AND HISTORICAL REVISIONISM IN THE AMERICAS A Dissertation Presented by FRANS-STEPHEN WEISER Approved as to style and content by: _______________________________________________ David Lenson, Chair _______________________________________________ -
Myanmar Script Learning Guide Burmese 1,2,3 Tone System Is Developed by Naing Tin-Nyunt-Pu Copyright © 2013-2017, Naing Tinnyuntpu & Asia Pearl Travels
Myanmar Script Learning guide (Revision F) by Naing Tinnyuntpu WITH FREE ONLINE AUDIO SUPPORT https://www.asiapearltravels.com/language/myanmar-script-learning-guide.php 1 of 104 Menu ≡ Introduction ........................................................................................4 Vowel: A .............................................................................................6 Vowel: E or I ....................................................................................13 Vowel: U ..........................................................................................20 Vowel: O ..........................................................................................24 Vowel: Ay .........................................................................................28 Vowel: Au ........................................................................................33 Vowel: Un or An ..............................................................................37 Vowel: In ..........................................................................................42 Vowel: Eare ......................................................................................46 Vowel: Ain .......................................................................................51 Vowel: Ome .....................................................................................53 Vowel: Ine ........................................................................................56 Vowel: Oon ......................................................................................59 -
An Introduction to Old Persian Prods Oktor Skjærvø
An Introduction to Old Persian Prods Oktor Skjærvø Copyright © 2016 by Prods Oktor Skjærvø Please do not cite in print without the author’s permission. This Introduction may be distributed freely as a service to teachers and students of Old Iranian. In my experience, it can be taught as a one-term full course at 4 hrs/w. My thanks to all of my students and colleagues, who have actively noted typos, inconsistencies of presentation, etc. TABLE OF CONTENTS Select bibliography ................................................................................................................................... 9 Sigla and Abbreviations ........................................................................................................................... 12 Lesson 1 ..................................................................................................................................................... 13 Old Persian and old Iranian. .................................................................................................................... 13 Script. Origin. .......................................................................................................................................... 14 Script. Writing system. ........................................................................................................................... 14 The syllabary. .......................................................................................................................................... 15 Logograms. ............................................................................................................................................ -
Asia in Motion: Geographies and Genealogies
Asia in Motion: Geographies and Genealogies Organized by With support from from PRIMUS Visual Histories of South Asia Foreword by Christopher Pinney Edited by Annamaria Motrescu-Mayes and Marcus Banks This book wishes to introduce the scholars of South Asian and Indian History to the in-depth evaluation of visual research methods as the research framework for new historical studies. This volume identifies and evaluates the current developments in visual sociology and digital anthropology, relevant to the study of contemporary South Asian constructions of personal and national identities. This is a unique and excellent contribution to the field of South Asian visual studies, art history and cultural analysis. This text takes an interdisciplinary approach while keeping its focus on the visual, on material cultural and on art and aesthetics. – Professor Kamran Asdar Ali, University of Texas at Austin 978-93-86552-44-0 u Royal 8vo u 312 pp. u 2018 u HB u ` 1495 u $ 71.95 u £ 55 Hidden Histories Religion and Reform in South Asia Edited by Syed Akbar Hyder and Manu Bhagavan Dedicated to Gail Minault, a pioneering scholar of women’s history, Islamic Reformation and Urdu Literature, Hidden Histories raises questions on the role of identity in politics and private life, memory and historical archives. Timely and thought provoking, this book will be of interest to all who wish to study how the diverse and plural past have informed our present. Hidden Histories powerfully defines and celebrates a field that has refused to be occluded by majoritarian currents. – Professor Kamala Visweswaran, University of California, San Diego 978-93-86552-84-6 u Royal 8vo u 324 pp. -
Finite-State Script Normalization and Processing Utilities: the Nisaba Brahmic Library
Finite-state script normalization and processing utilities: The Nisaba Brahmic library Cibu Johny† Lawrence Wolf-Sonkin‡ Alexander Gutkin† Brian Roark‡ Google Research †United Kingdom and ‡United States {cibu,wolfsonkin,agutkin,roark}@google.com Abstract In addition to such normalization issues, some scripts also have well-formedness constraints, i.e., This paper presents an open-source library for efficient low-level processing of ten ma- not all strings of Unicode characters from a single jor South Asian Brahmic scripts. The library script correspond to a valid (i.e., legible) grapheme provides a flexible and extensible framework sequence in the script. Such constraints do not ap- for supporting crucial operations on Brahmic ply in the basic Latin alphabet, where any permuta- scripts, such as NFC, visual normalization, tion of letters can be rendered as a valid string (e.g., reversible transliteration, and validity checks, for use as an acronym). The Brahmic family of implemented in Python within a finite-state scripts, however, including the Devanagari script transducer formalism. We survey some com- mon Brahmic script issues that may adversely used to write Hindi, Marathi and many other South affect the performance of downstream NLP Asian languages, do have such constraints. These tasks, and provide the rationale for finite-state scripts are alphasyllabaries, meaning that they are design and system implementation details. structured around orthographic syllables (aksara)̣ as the basic unit.1 One or more Unicode characters 1 Introduction combine when rendering one of thousands of leg- The Unicode Standard separates the representation ible aksara,̣ but many combinations do not corre- of text from its specific graphical rendering: text spond to any aksara.̣ Given a token in these scripts, is encoded as a sequence of characters, which, at one may want to (a) normalize it to a canonical presentation time are then collectively rendered form; and (b) check whether it is a well-formed into the appropriate sequence of glyphs for display. -
Languages in the Rohingya Response
Languages in the Rohingya response This overview relates to a Translators without Borders (TWB) study of the role of language in humanitarian service access and community relations in Cox’s Bazar, Bangladesh and Sittwe, Myanmar. Language use and word choice has implications for culture, identity and politics in the Myanmar context. Six languages play a role in the Rohingya response in Myanmar and Bangladesh. Each brings its own challenges for translation, interpretation, and general communication. These languages are Bangla, Chittagonian, Myanmar, Rakhine, Rohingya, and English. Bangla Bangla’s near-sacred status in Bangladeshi society derives from the language’s tie to the birth of the nation. Bangladesh is perhaps the only country in the world which was founded on the back of a language movement. Bangladeshi gained independence from Pakistan in 1971, among other things claiming their right to use Bangla and not Urdu. Bangladeshis gave their lives for the cause of preserving their linguistic heritage and the people retain a patriotic attachment to the language. The trauma of the Liberation War is still fresh in the collective memory. Pride in Bangla as the language of national unity is unique in South Asia - a region known for its cultural, religious, and linguistic diversity. Despite the elevated status of Bangla, the country is home to over 40 languages and dialects. This diversity is not obvious because Bangladeshis themselves do not see it as particularly important. People across the country have a local language or dialect apart from Bangla that is spoken at home and with neighbors and friends. But Bangla is the language of collective, national identity - it is the language of government, commerce, literature, music and film. -
Map by Steve Huffman; Data from World Language Mapping System
Svalbard Greenland Jan Mayen Norwegian Norwegian Icelandic Iceland Finland Norway Swedish Sweden Swedish Faroese FaroeseFaroese Faroese Faroese Norwegian Russia Swedish Swedish Swedish Estonia Scottish Gaelic Russian Scottish Gaelic Scottish Gaelic Latvia Latvian Scots Denmark Scottish Gaelic Danish Scottish Gaelic Scottish Gaelic Danish Danish Lithuania Lithuanian Standard German Swedish Irish Gaelic Northern Frisian English Danish Isle of Man Northern FrisianNorthern Frisian Irish Gaelic English United Kingdom Kashubian Irish Gaelic English Belarusan Irish Gaelic Belarus Welsh English Western FrisianGronings Ireland DrentsEastern Frisian Dutch Sallands Irish Gaelic VeluwsTwents Poland Polish Irish Gaelic Welsh Achterhoeks Irish Gaelic Zeeuws Dutch Upper Sorbian Russian Zeeuws Netherlands Vlaams Upper Sorbian Vlaams Dutch Germany Standard German Vlaams Limburgish Limburgish PicardBelgium Standard German Standard German WalloonFrench Standard German Picard Picard Polish FrenchLuxembourgeois Russian French Czech Republic Czech Ukrainian Polish French Luxembourgeois Polish Polish Luxembourgeois Polish Ukrainian French Rusyn Ukraine Swiss German Czech Slovakia Slovak Ukrainian Slovak Rusyn Breton Croatian Romanian Carpathian Romani Kazakhstan Balkan Romani Ukrainian Croatian Moldova Standard German Hungary Switzerland Standard German Romanian Austria Greek Swiss GermanWalser CroatianStandard German Mongolia RomanschWalser Standard German Bulgarian Russian France French Slovene Bulgarian Russian French LombardRomansch Ladin Slovene Standard -
An Introduction to Indic Scripts
An Introduction to Indic Scripts Richard Ishida W3C [email protected] HTML version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.html PDF version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.pdf Introduction This paper provides an introduction to the major Indic scripts used on the Indian mainland. Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. I have used XHTML encoded in UTF-8 for the base version of this paper. Most of the XHTML file can be viewed if you are running Windows XP with all associated Indic font and rendering support, and the Arial Unicode MS font. For examples that require complex rendering in scripts not yet supported by this configuration, such as Bengali, Oriya, and Malayalam, I have used non- Unicode fonts supplied with Gamma's Unitype. To view all fonts as intended without the above you can view the PDF file whose URL is given above. Although the Indic scripts are often described as similar, there is a large amount of variation at the detailed implementation level. To provide a detailed account of how each Indic script implements particular features on a letter by letter basis would require too much time and space for the task at hand. Nevertheless, despite the detail variations, the basic mechanisms are to a large extent the same, and at the general level there is a great deal of similarity between these scripts. It is certainly possible to structure a discussion of the relevant features along the same lines for each of the scripts in the set. -
The Rohingya Origin Story: Two Narratives, One Conflict
Combating Extremism SHAREDFACT SHEET VISIONS The Rohingya Origin Story: Two Narratives, One Conflict At the center of the Rohingya Crisis is a question about the group’s origin. It is in this identity, and the contrasting histories that the two sides claim (i.e., the Rohingya minority and the Buddhist government/some civilians), where religion and politics collide. Although often cast as a religious war, the contemporary conflict didn’t exist until World War II, when the minority Muslim Rohingya sided with British colonial rulers, while the Buddhist majority allied with the invading Japanese. However, it took years for the identity politics to fully take root. It was 1982, when the Rohingya were stripped of their citizenship by law (For more on this, see “Q&A on the Rohingya Crisis & Buddhist Extremism in Myanmar”). Myanmarese army commander Senior General Min Aung Hlaing made it clear that Rohingya origin lay at the heart of the matter when, on September 16, 2017, he posted to Facebook a statement saying that the current military action against the Rohingya is “unfinished business” stemming back to the Second World War. He also stated, “They have demanded recognition as Rohingya, which has never been an ethnic group in Myanmar. [The] Bengali issue is a national cause and we need to be united in establishing the truth.”i This begs the question, what is the truth? There is no simple answer to this question. At the present time, there are two dominant, opposing narratives regarding the Rohingya ethnic group’s history: one from the Rohingya perspective, and the other from the neighboring Rakhine and Bamar peoples.