Proposal to Encode 0C34 TELUGU LETTER LLLA
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only. -
The Dravidian Languages
THE DRAVIDIAN LANGUAGES BHADRIRAJU KRISHNAMURTI The Pitt Building, Trumpington Street, Cambridge, United Kingdom The Edinburgh Building, Cambridge CB2 2RU, UK 40 West 20th Street, New York, NY 10011–4211, USA 477 Williamstown Road, Port Melbourne, VIC 3207, Australia Ruiz de Alarc´on 13, 28014 Madrid, Spain Dock House, The Waterfront, Cape Town 8001, South Africa http://www.cambridge.org C Bhadriraju Krishnamurti 2003 This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 2003 Printed in the United Kingdom at the University Press, Cambridge Typeface Times New Roman 9/13 pt System LATEX2ε [TB] A catalogue record for this book is available from the British Library ISBN 0521 77111 0hardback CONTENTS List of illustrations page xi List of tables xii Preface xv Acknowledgements xviii Note on transliteration and symbols xx List of abbreviations xxiii 1 Introduction 1.1 The name Dravidian 1 1.2 Dravidians: prehistory and culture 2 1.3 The Dravidian languages as a family 16 1.4 Names of languages, geographical distribution and demographic details 19 1.5 Typological features of the Dravidian languages 27 1.6 Dravidian studies, past and present 30 1.7 Dravidian and Indo-Aryan 35 1.8 Affinity between Dravidian and languages outside India 43 2 Phonology: descriptive 2.1 Introduction 48 2.2 Vowels 49 2.3 Consonants 52 2.4 Suprasegmental features 58 2.5 Sandhi or morphophonemics 60 Appendix. Phonemic inventories of individual languages 61 3 The writing systems of the major literary languages 3.1 Origins 78 3.2 Telugu–Kannada. -
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR)
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) LGR Version: 3.0 Date: 2019-03-06 Document version: 2.6 Authors: Neo-Brahmi Generation Panel [NBGP] 1. General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Kannada LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: proposal-kannada-lgr-06mar19-en.xml Labels for testing can be found in the accompanying text document: kannada-test-labels-06mar19-en.txt 2. Script for which the LGR is Proposed ISO 15924 Code: Knda ISO 15924 N°: 345 ISO 15924 English Name: Kannada Latin transliteration of the native script name: Native name of the script: ಕನ#ಡ Maximal Starting Repertoire (MSR) version: MSR-4 Some languages using the script and their ISO 639-3 codes: Kannada (kan), Tulu (tcy), Beary, Konkani (kok), Havyaka, Kodava (kfa) 1 Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) 3. Background on Script and Principal Languages Using It 3.1 Kannada language Kannada is one of the scheduled languages of India. It is spoken predominantly by the people of Karnataka State of India. It is one of the major languages among the Dravidian languages. Kannada is also spoken by significant linguistic minorities in the states of Andhra Pradesh, Telangana, Tamil Nadu, Maharashtra, Kerala, Goa and abroad. -
The Unicode Standard, Version 4.0--Online Edition
This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consor- tium and published by Addison-Wesley. The material has been modified slightly for this online edi- tion, however the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten. -
An Introduction to Indic Scripts
An Introduction to Indic Scripts Richard Ishida W3C [email protected] HTML version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.html PDF version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.pdf Introduction This paper provides an introduction to the major Indic scripts used on the Indian mainland. Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. I have used XHTML encoded in UTF-8 for the base version of this paper. Most of the XHTML file can be viewed if you are running Windows XP with all associated Indic font and rendering support, and the Arial Unicode MS font. For examples that require complex rendering in scripts not yet supported by this configuration, such as Bengali, Oriya, and Malayalam, I have used non- Unicode fonts supplied with Gamma's Unitype. To view all fonts as intended without the above you can view the PDF file whose URL is given above. Although the Indic scripts are often described as similar, there is a large amount of variation at the detailed implementation level. To provide a detailed account of how each Indic script implements particular features on a letter by letter basis would require too much time and space for the task at hand. Nevertheless, despite the detail variations, the basic mechanisms are to a large extent the same, and at the general level there is a great deal of similarity between these scripts. It is certainly possible to structure a discussion of the relevant features along the same lines for each of the scripts in the set. -
Proposal for the Encoding of Brāhmī in Plane 1 of ISO/IEC 10646
ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 106461 Please fill all the sections A, B and C below. (Please read Principles and Procedures Document for guidelines and details before filling this form.) See http://www.dkuug.dk/JTC1/SC2/WG2/docs/summaryform.html for latest Form. See http://www.dkuug.dk/JTC1/SC2/WG2/docs/principles.html for latest Principles and Procedures document. See http://www.dkuug.dk/JTC1/SC2/WG2/docs/roadmaps.html for latest roadmaps. A. Administrative 1. Title: Proposal for the Encoding of Brāhmī in Plane 1 of ISO/IEC 10646. 2. Requesters' names: Stefan Baums, Andrew Glass. 3. Requester type (Member body/Liaison/Individual contribution): Individual contribution. 4. Submission date: DRAFT 27 July 2003. 5. Requester's reference (if applicable): N/A. 6. This is a complete proposal. B. Technical - General 1. This proposal is for a new script (set of characters). Proposed name of script: Brāhmī. 2. Number of characters in proposal: 120. 3. Proposed category (see section II, Character Categories): C. 4. Proposed Level of Implementation (1, 2 or 3) (see clause 14, ISO/IEC 10646-1: 2000): 3. Is a rationale provided for the choice? Yes. If Yes, reference: Combining marks are used. 5. Is a repertoire including character names provided? Yes. a. If Yes, are the names in accordance with the 'character naming guidelines in Annex L of ISO/IEC 10646-1: 2000? Yes. b. Are the character shapes attached in a legible form suitable for review? Yes. -
Root Based Stemmer for Telugu Script
International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 – 8958, Volume-8 Issue-6, August 2019 Root Based Stemmer for Telugu Script Narla Swapna Abstract: In this paper, a new stemmer has been proposed The process of stemming is, reducing the inflected or named as “Root based stemmer”. This stemmer is strictly based resultant words to their base words. In general, a stem is a on Dravidian script. Stemming can be used to pick up the written word form. The stem need not be the same to the effectiveness of information retrieval. In proposed Root based morphological root of the word, it is usually enough that stemming technique, each and every token is compared against correlated words map to the identical stem, even if this stem with all the words of a valid root words dictionary until a match is found. Then extract the matched string or substring from a token is not in itself a valid root. In Text classification, stemming and identified as valid root. The present work is aimed to build tries to cut off details like exact form of a word and produce dictionary based stemmer to extract valid root words for Indian word bases as features. In 1960s, the stemming algorithms languages especially for Telugu and compare the results with have been developed in the department of computer science. existing stemmers. While replicating a document, the choice of the suitable Keywords: stemming, Information retrieval, Telugu documentary unit is the first step. A documentary unit consists of set of words extracted from the sentences. The I. -
(RSEP) Request October 16, 2017 Registry Operator INFIBEAM INCORPORATION LIMITED 9Th Floor
Registry Services Evaluation Policy (RSEP) Request October 16, 2017 Registry Operator INFIBEAM INCORPORATION LIMITED 9th Floor, A-Wing Gopal Palace, NehruNagar Ahmedabad, Gujarat 380015 Request Details Case Number: 00874461 This service request should be used to submit a Registry Services Evaluation Policy (RSEP) request. An RSEP is required to add, modify or remove Registry Services for a TLD. More information about the process is available at https://www.icann.org/resources/pages/rsep-2014- 02-19-en Complete the information requested below. All answers marked with a red asterisk are required. Click the Save button to save your work and click the Submit button to submit to ICANN. PROPOSED SERVICE 1. Name of Proposed Service Removal of IDN Languages for .OOO 2. Technical description of Proposed Service. If additional information needs to be considered, attach one PDF file Infibeam Incorporation Limited (“infibeam”) the Registry Operator for the .OOO TLD, intends to change its Registry Service Provider for the .OOO TLD to CentralNic Limited. Accordingly, Infibeam seeks to remove the following IDN languages from Exhibit A of the .OOO New gTLD Registry Agreement: - Armenian script - Avestan script - Azerbaijani language - Balinese script - Bamum script - Batak script - Belarusian language - Bengali script - Bopomofo script - Brahmi script - Buginese script - Buhid script - Bulgarian language - Canadian Aboriginal script - Carian script - Cham script - Cherokee script - Coptic script - Croatian language - Cuneiform script - Devanagari script -
Proposal on Handling Reph in Gurmukhi and Telugu Scripts 1
Proposal on Handling Reph in Gurmukhi and Telugu Scripts Nagarjuna Venna August 1, 2006 1 Introduction Chapter 9 of the Unicode standard [1] describes the representational model for encoding Indic scripts. Devanagari is described in Section 9.1; the principles of Indic scripts are covered in some detail in the introduction to Devanagari. The descriptions of the remaining Indic scripts were abbreviated highlighting any dierences from Devanagari where appropriate. Some of the problems in this description were claried by Public Review Issue #37 [2] which focused on consistent handling of Zero Width Joiner (ZWJ) in Indic scripts. That proposal put forth a set of rules for handling ZWJ and ZWNJ that are applicable across all Indic scripts. The formation of Reph is dened in Section 9.1, Rules for Rendering, R2 of [1]. Reph is dened as a nonspacing combining mark glyph form of U+0930 DEVANAGARI LETTER RA positioned above or attached to the upper part of a base glyph form. Basically, Reph is formed when a RA which has the inherent vowel killed by the virama begins a syllable. Not all scripts have Reph; if the script in question has a Reph form, the sequence <RA, VIRAMA, C> is rendered with Reph on C. Also, for Devanagari, the sequence <RA, VIRAMA, ZWJ, ...> is always rendered as eyelash-RA instead of Reph. Devanagari, Bengali, Gujarati, Oriya, and Kannada are listed in [1] and [2] as scripts that have a Reph form. Gurmukhi, Tamil, and Telugu are listed as scripts that do not have a Reph form. Malayalam is described as a script that has Reph in the traditional orthography but not in modern usage. -
Examination and Comparison of the Handwriting Characteristics Of
Journal of Forensic Science & Criminology Volume 7 | Issue 2 ISSN: 2348-9804 Research Article Open Access Examination and Comparison of the Handwriting Characteristics of Devanagari Script with Gurumukhi and Telugu Script Sharma B*1, Sharma A2 and Mahajan M2 1Department of Forensic Science, Punjabi University Patiala, Punjab, India 2Regional Forensic Science Laboratory, Northern Range, Dharamshala, India *Corresponding author: Sharma B, Department of Forensic Science, Punjabi University Patiala, Punjab, India, Tel: +919810711930, E-mail: [email protected] Citation: Sharma B, Sharma A, Mahajan M (2019) Examination and Comparison of the Handwriting Characteristics of Devanagari Script with Gurumukhi and Telugu Script. J Forensic Sci Criminol 7(2): 206 Received Date: July 11, 2019 Accepted Date: August 28, 2019 Published Date: August 30, 2019 Abstract India has been acclaimed for the extent diversity especially in the context of cast, culture, languages, religion, and scripts. Brahmi script is the oldest writing system of ancient India, developed in 1st millennium BCE. Brahmi script eventually gave evolution to all the modern scripts of India including Devanagari, Gurumukhi, Telugu, Gujarati, Bengali, Oriya, Telugu, Tamil, Malayalam, Kannada, Sinhala and scripts found in south and central Asia. Hindi language written in Devanagari script and is one of the official languages of India. About forty five percentage of the Indian population use Hindi. Telugu language is one of the official languages of India with 6.93% native speakers, whereas Punjabi is the language of Punjab state and is the 5th most spoken language in European country which is written in Gurumukhi script. The similar origins, including the divergence among these scripts, are influential and critical. -
Nominations for Padma Awards 2011
c Nominations fof'P AWARDs 2011 ADMA ~ . - - , ' ",::i Sl. Name';' Field State No ShriIshwarappa,GurapJla Angadi Art Karnataka " Art-'Cinema-Costume Smt. Bhanu Rajopadhye Atharya Maharashtra 2. Designing " Art - Hindustani 3. Dr; (Smt.).Prabha Atre Maharashtra , " Classical Vocal Music 4. Shri Bhikari.Charan Bal Art - Vocal Music 0, nssa·' 5. Shri SamikBandyopadhyay Art - Theatre West Bengal " 6: Ms. Uttara Baokar ',' Art - Theatre , Maharashtra , 7. Smt. UshaBarle Art Chhattisgarh 8. Smt. Dipali Barthakur Art " Assam Shri Jahnu Barua Art - Cinema Assam 9. , ' , 10. Shri Neel PawanBaruah Art Assam Art- Cinema Ii. Ms. Mubarak Begum Rajasthan i", Playback Singing , , , 12. ShriBenoy Krishen Behl Art- Photography Delhi " ,'C 13. Ms. Ritu Beri , Art FashionDesigner Delhi 14. Shri.Madhur Bhandarkar Art - Cinema Maharashtra Art - Classical Dancer IS. Smt. Mangala Bhatt Andhra Pradesh Kathak Art - Classical Dancer 16. ShriRaghav Raj Bhatt Andhra Pradesh Kathak : Art - Indian Folk I 17., Smt. Basanti Bisht Uttarakhand Music Art - Painting and 18. Shri Sobha Brahma Assam Sculpture , Art - Instrumental 19. ShriV.S..K. Chakrapani Delhi, , Music- Violin , PanditDevabrata Chaudhuri alias Debu ' Art - Instrumental 20. , Delhi Chaudhri ,Music - Sitar 21. Ms. Priyanka Chopra Art _Cinema' Maharashtra 22. Ms. Neelam Mansingh Chowdhry Art_ Theatre Chandigarh , ' ,I 23. Shri Jogen Chowdhury Art- Painting \VesfBengal 24.' Smt. Prafulla Dahanukar Art ~ Painting Maharashtra ' . 25. Ms. Yashodhara Dalmia Art - Art History Delhi Art - ChhauDance 26. Shri Makar Dhwaj Darogha Jharkhand Seraikella style 27. Shri Jatin Das Art - Painting Delhi, 28. Shri ManoharDas " Art Chhattisgarh ' 29. , ShriRamesh Deo Art -'Cinema ,Maharashtra Art 'C Hindustani 30. Dr. Ashwini Raja Bhide Deshpande Maharashtra " classical vocalist " , 31. ShriDeva Art - Music Tamil Nadu Art- Manipuri Dance 32. -
Information, Characters, Unicode
Information, Characters, Unicode Unicode © 24 August 2021 1 / 107 Hidden Moral Small mistakes can be catastrophic! Style Care about every character of your program. Tip: printf Care about every character in the program’s output. (Be reasonably tolerant and defensive about the input. “Fail early” and clearly.) Unicode © 24 August 2021 2 / 107 Imperative Thou shalt care about every Ěaracter in your program. Unicode © 24 August 2021 3 / 107 Imperative Thou shalt know every Ěaracter in the input. Thou shalt care about every Ěaracter in your output. Unicode © 24 August 2021 4 / 107 Information – Characters In modern computing, natural-language text is very important information. (“number-crunching” is less important.) Characters of text are represented in several different ways and a known character encoding is necessary to exchange text information. For many years an important encoding standard for characters has been US ASCII–a 7-bit encoding. Since 7 does not divide 32, the ubiquitous word size of computers, 8-bit encodings are more common. Very common is ISO 8859-1 aka “Latin-1,” and other 8-bit encodings of characters sets for languages other than English. Currently, a very large multi-lingual character repertoire known as Unicode is gaining importance. Unicode © 24 August 2021 5 / 107 US ASCII (7-bit), or bottom half of Latin1 NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SS SI DLE DC1 DC2 DC3 DC4 NAK SYN ETP CAN EM SUB ESC FS GS RS US !"#$%&’()*+,-./ 0123456789:;<=>? @ABCDEFGHIJKLMNO PQRSTUVWXYZ[\]^_ `abcdefghijklmno pqrstuvwxyz{|}~ DEL Unicode Character Sets © 24 August 2021 6 / 107 Control Characters Notice that the first twos rows are filled with so-called control characters.