Khojki Script in ISO/IEC 10646
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Origins, Evolution and Decline of the Khojki Script
The origins, evolution and decline of the Khojki script Juan Bruce The origins, evolution and decline of the Khojki script Juan Bruce Dissertation submitted in partial fulfilment of the requirements for the Master of Arts in Typeface Design, University of Reading, 2015. 5 Abstract The Khojki script is an Indian script whose origins are in Sindh (now southern Pakistan), a region that has witnessed the conflict between Islam and Hinduism for more than 1,200 years. After the gradual occupation of the region by Muslims from the 8th century onwards, the region underwent significant cultural changes. This dissertation reviews the history of the script and the different uses that it took on among the Khoja people since Muslim missionaries began their activities in Sindh communities in the 14th century. It questions the origins of the Khojas and exposes the impact that their transition from a Hindu merchant caste to a broader Muslim community had on the development of the script. During this process of transformation, a rich and complex creed, known as Satpanth, resulted from the blend of these cultures. The study also considers the roots of the Khojki writing system, especially the modernization that the script went through in order to suit more sophisticated means of expression. As a result, through recording the religious Satpanth literature, Khojki evolved and left behind its mercantile features, insufficient for this purpose. Through comparative analysis of printed Khojki texts, this dissertation examines the use of the script in Bombay at the beginning of the 20th century in the shape of Khoja Ismaili literature. -
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only. -
WG2 M52 Minutes
ISO.IEC JTC 1/SC 2 N____ ISO/IEC JTC 1/SC 2/WG 2 N3603 2009-07-08 ISO/IEC JTC 1/SC 2/WG 2 Universal Multiple-Octet Coded Character Set (UCS) - ISO/IEC 10646 Secretariat: ANSI DOC TYPE: Meeting Minutes TITLE: Unconfirmed minutes of WG 2 meeting 54 Room S206/S209, Dublin Centre University, Dublin, Ireland 2009-04-20/24 SOURCE: V.S. Umamaheswaran, Recording Secretary, and Mike Ksar, Convener PROJECT: JTC 1.02.18 – ISO/IEC 10646 STATUS: SC 2/WG 2 participants are requested to review the attached unconfirmed minutes, act on appropriate noted action items, and to send any comments or corrections to the convener as soon as possible but no later than the Due Date below. ACTION ID: ACT DUE DATE: 2009-10-12 DISTRIBUTION: SC 2/WG 2 members and Liaison organizations MEDIUM: Acrobat PDF file NO. OF PAGES: 60 (including cover sheet) Michael Y. Ksar Convener – ISO/IEC/JTC 1/SC 2/WG 2 22680 Alcalde Rd Phone: +1 408 255-1217 Cupertino, CA 95014 Email: [email protected] U.S.A. ISO International Organization for Standardization Organisation Internationale de Normalisation ISO/IEC JTC 1/SC 2/WG 2 Universal Multiple-Octet Coded Character Set (UCS) ISO/IEC JTC 1/SC 2 N____ ISO/IEC JTC 1/SC 2/WG 2 N3603 2009-07-08 Title: Unconfirmed minutes of WG 2 meeting 54 Room S206/S209, Dublin Centre University, Dublin, Ireland; 2009-04-20/24 Source: V.S. Umamaheswaran ([email protected]), Recording Secretary Mike Ksar ([email protected]), Convener Action: WG 2 members and Liaison organizations Distribution: ISO/IEC JTC 1/SC 2/WG 2 members and liaison organizations 1 Opening Input document: 3573 2nd Call Meeting # 54 in Dublin; Mike Ksar; 2009-02-16 Mr. -
N3596 Proposal to Encode the Khojki Script in ISO/IEC 10646
ISO/IEC JTC1/SC2/WG2 N3596 L2/09-101 2009-03-25 Proposal to Encode the Khojki Script in ISO/IEC 10646 Anshuman Pandey University of Michigan Ann Arbor, Michigan, U.S.A. [email protected] March 25, 2009 Contents Proposal Summary Form i 1 Introduction 1 2 Background 1 3 Characters Proposed 5 3.1 Character Set . 5 3.2 Basis for Character Set and Glyph Shapes . 6 3.3 Characters Not Proposed . 8 4 The Writing System 10 4.1 General Features . 10 4.2 Inadequacies of the Script . 10 4.3 Supression of Inherent Vowel . 11 4.4 Consonant-Vowel Ligatures . 11 4.5 Consonant Conjuncts . 12 4.6 Nasalization . 13 4.7 Consonant Gemination . 13 4.8 Creation of New Characters . 13 4.9 Punctuation . 14 4.10 Abbreviations . 15 4.11 Number Forms and Unit Marks . 16 4.12 Variant Forms of Characters . 16 5 Implementation 16 5.1 Encoding Model . 16 5.2 Collation . 16 5.3 Character Properties . 17 5.4 Outstanding Issues . 18 6 References 19 List of Tables 1 Glyph chart for Khojki . 2 2 Names list for Khojki . 3 3 Transliteration of Khojki characters and Sindhi-Arabic analogues . 4 List of Figures 1 Inventory of Khojki vowel letters from Grierson (1905) . 20 2 Inventory of Khojki consonant letters from Grierson (1905) . 21 3 Inventory of Khojki consonant letters from Grierson (1905) . 22 4 Inventory of Khojki independent vowel letters from Asani (1992) . 23 5 Inventory of Khojki dependent vowel signs from Asani (1992) . 24 6 Inventory of Khojki consonants (b to dy) from Asani (1992) . -
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR)
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) LGR Version: 3.0 Date: 2019-03-06 Document version: 2.6 Authors: Neo-Brahmi Generation Panel [NBGP] 1. General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Kannada LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: proposal-kannada-lgr-06mar19-en.xml Labels for testing can be found in the accompanying text document: kannada-test-labels-06mar19-en.txt 2. Script for which the LGR is Proposed ISO 15924 Code: Knda ISO 15924 N°: 345 ISO 15924 English Name: Kannada Latin transliteration of the native script name: Native name of the script: ಕನ#ಡ Maximal Starting Repertoire (MSR) version: MSR-4 Some languages using the script and their ISO 639-3 codes: Kannada (kan), Tulu (tcy), Beary, Konkani (kok), Havyaka, Kodava (kfa) 1 Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) 3. Background on Script and Principal Languages Using It 3.1 Kannada language Kannada is one of the scheduled languages of India. It is spoken predominantly by the people of Karnataka State of India. It is one of the major languages among the Dravidian languages. Kannada is also spoken by significant linguistic minorities in the states of Andhra Pradesh, Telangana, Tamil Nadu, Maharashtra, Kerala, Goa and abroad. -
Proposal for a Gujarati Script Root Zone Label Generation Ruleset (LGR)
Proposal for a Gujarati Root Zone LGR Neo-Brahmi Generation Panel Proposal for a Gujarati Script Root Zone Label Generation Ruleset (LGR) LGR Version: 3.0 Date: 2019-03-06 Document version: 3.6 Authors: Neo-Brahmi Generation Panel [NBGP] 1 General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Gujarati LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: proposal-gujarati-lgr-06mar19-en.xml Labels for testing can be found in the accompanying text document: gujarati-test-labels-06mar19-en.txt 2 Script for which the LGR is proposed ISO 15924 Code: Gujr ISO 15924 Key N°: 320 ISO 15924 English Name: Gujarati Latin transliteration of native script name: gujarâtî Native name of the script: ગજુ રાતી Maximal Starting Repertoire (MSR) version: MSR-4 1 Proposal for a Gujarati Root Zone LGR Neo-Brahmi Generation Panel 3 Background on the Script and the Principal Languages Using it1 Gujarati (ગજુ રાતી) [also sometimes written as Gujerati, Gujarathi, Guzratee, Guujaratee, Gujrathi, and Gujerathi2] is an Indo-Aryan language native to the Indian state of Gujarat. It is part of the greater Indo-European language family. It is so named because Gujarati is the language of the Gujjars. Gujarati's origins can be traced back to Old Gujarati (circa 1100– 1500 AD). -
Finite-State Script Normalization and Processing Utilities: the Nisaba Brahmic Library
Finite-state script normalization and processing utilities: The Nisaba Brahmic library Cibu Johny† Lawrence Wolf-Sonkin‡ Alexander Gutkin† Brian Roark‡ Google Research †United Kingdom and ‡United States {cibu,wolfsonkin,agutkin,roark}@google.com Abstract In addition to such normalization issues, some scripts also have well-formedness constraints, i.e., This paper presents an open-source library for efficient low-level processing of ten ma- not all strings of Unicode characters from a single jor South Asian Brahmic scripts. The library script correspond to a valid (i.e., legible) grapheme provides a flexible and extensible framework sequence in the script. Such constraints do not ap- for supporting crucial operations on Brahmic ply in the basic Latin alphabet, where any permuta- scripts, such as NFC, visual normalization, tion of letters can be rendered as a valid string (e.g., reversible transliteration, and validity checks, for use as an acronym). The Brahmic family of implemented in Python within a finite-state scripts, however, including the Devanagari script transducer formalism. We survey some com- mon Brahmic script issues that may adversely used to write Hindi, Marathi and many other South affect the performance of downstream NLP Asian languages, do have such constraints. These tasks, and provide the rationale for finite-state scripts are alphasyllabaries, meaning that they are design and system implementation details. structured around orthographic syllables (aksara)̣ as the basic unit.1 One or more Unicode characters 1 Introduction combine when rendering one of thousands of leg- The Unicode Standard separates the representation ible aksara,̣ but many combinations do not corre- of text from its specific graphical rendering: text spond to any aksara.̣ Given a token in these scripts, is encoded as a sequence of characters, which, at one may want to (a) normalize it to a canonical presentation time are then collectively rendered form; and (b) check whether it is a well-formed into the appropriate sequence of glyphs for display. -
The Unicode Standard, Version 6.3
Currency Symbols Range: 20A0–20CF This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 6.3 This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-6.3/ for charts showing only the characters added in Unicode 6.3. See http://www.unicode.org/Public/6.3.0/charts/ for a complete archived file of character code charts for Unicode 6.3. Disclaimer These charts are provided as the online reference to the character contents of the Unicode Standard, Version 6.3 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 6.3, online at http://www.unicode.org/versions/Unicode6.3.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, and #45, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation. -
The Unicode Standard, Version 4.0--Online Edition
This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consor- tium and published by Addison-Wesley. The material has been modified slightly for this online edi- tion, however the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten. -
An Introduction to Indic Scripts
An Introduction to Indic Scripts Richard Ishida W3C [email protected] HTML version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.html PDF version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.pdf Introduction This paper provides an introduction to the major Indic scripts used on the Indian mainland. Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. I have used XHTML encoded in UTF-8 for the base version of this paper. Most of the XHTML file can be viewed if you are running Windows XP with all associated Indic font and rendering support, and the Arial Unicode MS font. For examples that require complex rendering in scripts not yet supported by this configuration, such as Bengali, Oriya, and Malayalam, I have used non- Unicode fonts supplied with Gamma's Unitype. To view all fonts as intended without the above you can view the PDF file whose URL is given above. Although the Indic scripts are often described as similar, there is a large amount of variation at the detailed implementation level. To provide a detailed account of how each Indic script implements particular features on a letter by letter basis would require too much time and space for the task at hand. Nevertheless, despite the detail variations, the basic mechanisms are to a large extent the same, and at the general level there is a great deal of similarity between these scripts. It is certainly possible to structure a discussion of the relevant features along the same lines for each of the scripts in the set. -
KALĀM-E-MAWLĀ (Hindi – Gujarati Sayings of Hazrat Mawlana Ali A.S)
KALĀM-E-MAWLĀ (Hindi – Gujarati sayings of Hazrat Mawlana Ali A.S) With the introduction, annotation, transliteration, translation and approximated Arabic sayings and Quranic verses By Dr. Amin Valliani, ITREB (Pakistan), Karachi. 1 Ever-Blessing Words From the very beginnings of Islam, the search for knowledge has been central to our cultures. I think of the words of Hazrat Ali ibn Abi Talib, the first hereditary Imam of the Shia Muslims, and the last of the four rightly-guided Caliphs after the passing away of the Prophet (may peace be upon Him). In his teachings, Hazrat Ali emphasized that “No honour is like knowledge.” And then he added that “No belief is like modesty and patience, no attainment is like humility, no power is like forbearance, and no support is more reliable than consultation.” Notice that the virtues endorsed by Hazrat Ali are qualities which subordinate the self and emphasize others ---modesty, patience, humility, forbearance and consultation. What he thus is telling us, is that we find knowledge best by admitting first what it is we do not know, and by opening our minds to what others can teach us. Mawlana Hazar Imam, Address at the Commencement Ceremony of the American University in Cairo, dated 15th June, 2006. 2 This book is a humble tribute to my late teacher Itmadi Noor Din H.Bakhsh (d.2000) who inspired me to undertake this research based academic exercise. 3 Acknowledgments Upon completion of this project on Kālām-e-Mawlā, I bow my head and heart to thank the He, who made me able to undertake the academic task. -
Typographic Development of the Khojki Script and Printing Affairs at the Turn of the 19Th Century in Bombay by Juan Bruce
Typographic development of the Khojki script and printing affairs at the turn of the 19th century in Bombay by Juan Bruce Khōjā Studies Conference, cnrs-ceias, París, December 2016. Paper based on the dissertation The origins, evolution and decline of the Khojki script submitted in September of 2015 by the same author as part of the requirements of the Master of Arts in Typeface Design at the University of Reading, uk. overview The Khojki script is an Indian script whose origins are in Sindh (now south of Pakistan), a region that has witnessed more than 1,200 years of interplay between Islam and Hinduism. After the gradual occupation of the region by Muslims from the 8th century onwards, the script took on different usages among its community, the Khojas. Sindh, Punjab and Gujarat, were the first places that received gradual Muslim religious incursions into Hindu communities, to which language and writing constituted the first barrier for conversion. It is believed that one prominent Muslim pir (Pir Sadruddin) was very much active inside the Hindu Lohana community in the 14th century, a caste of merchants and traders. As a way to approach and transmit the teachings to the people, the pir adopted the Lohānākī script of this community. Later on, the converted came to be known as the Khoja caste. Significantly, a totally different creed, known as Satpanth, grew inside the community after the blend of these cultures. Academic interest has been stimulated by the special nature of the Satpanthi literature and its place in the Indian subcontinent. It has emerged as a fascinating example of an Islamic religious movement expressing itself within a local Indian religious culture.