Tending to the Inheritance of Tulu Script a Work in Progress
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Updated Proposal to Encode Tulu-Tigalari Script in Unicode
Updated proposal to encode Tulu-Tigalari script in Unicode Vaishnavi Murthy Kodipady Yerkadithaya [ [email protected] ] Vinodh Rajan [ [email protected] ] 04/03/2021 Document History & Background Documents : (This document replaces L2/17-378) L2/11-120R Preliminary proposal for encoding the Tulu script in the SMP of the UCS – Michael Everson L2/16-241 Preliminary proposal to encode Tigalari script – Vaishnavi Murthy K Y L2/16-342 Recommendations to UTC #149 November 2016 on Script Proposals – Deborah Anderson, Ken Whistler, Roozbeh Pournader, Andrew Glass and Laurentiu Iancu L2/17-182 Comments on encoding the Tigalari script – Srinidhi and Sridatta L2/18-175 Replies to Script Ad Hoc Recommendations (L2/16-342) and Comments (L2/17-182) on Tigalari proposal (L2/16-241) – Vaishnavi Murthy K Y L2/17-378 Preliminary proposal to encode Tigalari script – Vaishnavi Murthy K Y, Vinodh Rajan L2/17-411 Letter in support of preliminary proposal to encode Tigalari – Guru Prasad (Has withdrawn support. This letter needs to be disregarded.) L2/17-422 Letter to Vaishnavi Murthy in support of Tigalari encoding proposal – A. V. Nagasampige L2/18-039 Recommendations to UTC #154 January 2018 on Script Proposals – Deborah Anderson, Ken Whistler, Roozbeh Pournader, Lisa Moore, Liang Hai, and Richard Cook PROPOSAL TO ENCODE TIGALARI SCRIPT IN UNICODE 2 A note on recent updates : −−−Tigalari Script is renamed Tulu-Tigalari script. The reason for the same is discussed under section 1.1 (pp. 4-5) of this paper & elaborately in the supplementary paper Tulu Language and Tulu-Tigalari script (pp. 5-13). −−−This proposal attempts to harmonize the use of the Tulu-Tigalari script for Tulu, Sanskrit and Kannada languages for archival use. -
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR)
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) LGR Version: 3.0 Date: 2019-03-06 Document version: 2.6 Authors: Neo-Brahmi Generation Panel [NBGP] 1. General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Kannada LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: proposal-kannada-lgr-06mar19-en.xml Labels for testing can be found in the accompanying text document: kannada-test-labels-06mar19-en.txt 2. Script for which the LGR is Proposed ISO 15924 Code: Knda ISO 15924 N°: 345 ISO 15924 English Name: Kannada Latin transliteration of the native script name: Native name of the script: ಕನ#ಡ Maximal Starting Repertoire (MSR) version: MSR-4 Some languages using the script and their ISO 639-3 codes: Kannada (kan), Tulu (tcy), Beary, Konkani (kok), Havyaka, Kodava (kfa) 1 Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) 3. Background on Script and Principal Languages Using It 3.1 Kannada language Kannada is one of the scheduled languages of India. It is spoken predominantly by the people of Karnataka State of India. It is one of the major languages among the Dravidian languages. Kannada is also spoken by significant linguistic minorities in the states of Andhra Pradesh, Telangana, Tamil Nadu, Maharashtra, Kerala, Goa and abroad. -
A Script Independent Approach for Handwritten Bilingual Kannada and Telugu Digits Recognition
IJMI International Journal of Machine Intelligence ISSN: 0975–2927 & E-ISSN: 0975–9166, Volume 3, Issue 3, 2011, pp-155-159 Available online at http://www.bioinfo.in/contents.php?id=31 A SCRIPT INDEPENDENT APPROACH FOR HANDWRITTEN BILINGUAL KANNADA AND TELUGU DIGITS RECOGNITION DHANDRA B.V.1, GURURAJ MUKARAMBI1, MALLIKARJUN HANGARGE2 1Department of P.G. Studies and Research in Computer Science Gulbarga University, Gulbarga, Karnataka. 2Department of Computer Science, Karnatak Arts, Science and Commerce College Bidar, Karnataka. *Corresponding author. E-mail: [email protected],[email protected],[email protected] Received: September 29, 2011; Accepted: November 03, 2011 Abstract- In this paper, handwritten Kannada and Telugu digits recognition system is proposed based on zone features. The digit image is divided into 64 zones. For each zone, pixel density is computed. The KNN and SVM classifiers are employed to classify the Kannada and Telugu handwritten digits independently and achieved average recognition accuracy of 95.50%, 96.22% and 99.83%, 99.80% respectively. For bilingual digit recognition the KNN and SVM classifiers are used and achieved average recognition accuracy of 96.18%, 97.81% respectively. Keywords- OCR, Zone Features, KNN, SVM 1. Introduction recognition of isolated handwritten Kannada numerals Recent advances in Computer technology, made every and have reported the recognition accuracy of 90.50%. organization to implement the automatic processing Drawback of this procedure is that, it is not free from systems for its activities. For example, automatic thinning. U. Pal et al. [11] have proposed zoning and recognition of vehicle numbers, postal zip codes for directional chain code features and considered a feature sorting the mails, ID numbers, processing of bank vector of length 100 for handwritten Kannada numeral cheques etc. -
The Unicode Standard, Version 4.0--Online Edition
This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consor- tium and published by Addison-Wesley. The material has been modified slightly for this online edi- tion, however the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten. -
Unit 2 Distribution of Indian Tribes, Groups and Sub-Groups: Causes of Variations
Tribal Cosmogenies UNIT 2 DISTRIBUTION OF INDIAN TRIBES, GROUPS AND SUB-GROUPS: CAUSES OF VARIATIONS Structure 2.0 Objectives 2.1 Introduction 2.2 Distribution of Indian tribes: groups and sub-groups 2.2.1 Geographical 2.2.2 Linguistic 2.2.3 Racial 2.2.4 Size 2.2.5 Economy or subsistence pattern 2.2.6 Degree of incorporation into the Hindu society 2.3 Causes of variations 2.3.1 Migration 2.3.2 Acculturation and assimilation. 2.3.3 Geography or the physical environment 2.4 Let us sum up 2.5 Activity 2.6 References and further reading 2.7 Glossary 2.8 Answers to questions 2.0 OBJECTIVES After having read this Unit, you will be able to: x get a panoramic view of the Indian tribes; x understand the principles upon which their classification is based; and x map their distribution on the basis of their geographical location, language, physical attributes, economy and social structure. 2.1 INTRODUCTION We have had a general outline about the tribes of India in the preceding Unit (Course 4-Block 1- Unit 1). In this present Unit we will deal with distribution of the tribes of India in detail. Structurally this Unit is divided into two main sections. In this first section of the Unit we will discuss the distribution pattern of Indian tribes and how they are classified into different groups and sub-groups based on various criteria. 24 These criteria are based on their geographical location, language, physical Migrant Tribes / Nomads attributes, economy and the degree of incorporation into the Hindu society. -
WO 2010/131256 Al
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) (19) World Intellectual Property Organization International Bureau (10) International Publication Number (43) International Publication Date 18 November 2010 (18.11.2010) WO 2010/131256 Al (51) International Patent Classification: AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, G06F 3/01 (2006.01) CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, (21) International Application Number: HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, PCT/IN20 10/000052 KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, (22) International Filing Date: ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, 29 January 2010 (29.01 .2010) NO, NZ, OM, PE, PG, PH, PL, PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, (25) Filing Language: English TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW. (26) Publication Language: English (84) Designated States (unless otherwise indicated, for every (30) Priority Data: kind of regional protection available): ARIPO (BW, GH, 974/DEL/2009 13 May 2009 (13.05.2009) IN GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, (72) Inventor; and TM), European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, (71) Applicant : MEHRA, Rajesh [IN/IN]; H-39, Tagore ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, Path, Bani Park, Jaipur 302 0 16, Rajasthan (IN). -
Tugboat, Volume 19 (1998), No. 2 115 an Overview of Indic Fonts For
TUGboat, Volume 19 (1998), No. 2 115 the Indo-Aryan and Dravidian language families of Fonts India. Such uniformity in phonetics is reflected in orthography, which in turn enables all scripts to be transliterated through a single scheme. This unifor- An Overview of Indic Fonts for TEX mity has subsequently been reflected in the translit- Anshuman Pandey eration schemes of the Indic language/script pack- ages. 1 Introduction Most packages have their own transliteration Many scholars and students in the humanities have scheme, but these schemes are essentially variations on a single scheme, differing merely in the coding preferred TEX over other “word processors” or doc- ument preparation systems because of the ease TEX of a few vowel, nasal, and retroflex letters. Most provides them in typesetting non-Roman scripts, the of these packages accept input in one of the two availability of TEX fonts of interest to them, and the primary 7-bit transliteration schemes— ITRANS or ability TEX has in producing well-structured docu- Velthuis—or a derivative of one of them. There ments. is also an 8-bit format called CS/CSX which a few However, this is not the case amongst Indol- of these packages support. CS/CSX is described in ogists. The lack of Indic fonts for TEXandthe further detail in Section 3. perceived difficulty of typesetting them have often 2 The Fonts and Packages turned Indologists away from using TEX. Little do they realize that TEXisthe foremost tool for de- Figure 1 shows examples of the various fonts de- veloping Indic language/script documents. -
Comparison of Different Orthographies for Machine
Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages Bharathi Raja Chakravarthi Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland https://bharathichezhiyan.github.io/bharathiraja/ [email protected] Mihael Arcan Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland [email protected] John P. McCrae Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland https://john.mccr.ae/ [email protected] Abstract Under-resourced languages are a significant challenge for statistical approaches to machine translation, and recently it has been shown that the usage of training data from closely-related languages can improve machine translation quality of these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes taking advantage of these language similarities difficult. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into common representation i.e. the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare the difference between coarse-grained transliteration to the Latin script and fine-grained IPA transliteration. We performed experiments on the language pairs English-Tamil, English-Telugu, and English-Kannada translation task. Our results show improvements in terms of the BLEU, METEOR and chrF scores from transliteration and we find that the transliteration into the Latin script outperforms the fine-grained IPA transcription. -
Phonetic Dictionary for Natural Language Processing: Kannada
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by ePrints@Bangalore University Mallamma V. Reddy et al Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622, Vol. 4, Issue 7( Version 3), July 2014, pp.01-04 RESEARCH ARTICLE OPEN ACCESS Phonetic Dictionary for Natural Language Processing: Kannada Mallamma V. Reddy*, Hanumanthappa M. **, Jyothi N.M***, Rashmi S. *(Department of Computer Science, Rani Channamma University, Vidyasangam, Belgaum-591156, India) ** (Department of Computer Science, Bangalore University, Jnanabharathi Campus, Bangalore-561156, India) ***(Department of Master of Computer Applications, Bapuji Institute of Engineering and Technology, Davangere-577004, India) ABSTRACT India has 22 officially recognized languages: Assamese, Bengali, English, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil, Telugu, and Urdu. Clearly, India owns the language diversity problem. In the age of Internet, the multiplicity of languages makes it even more necessary to have sophisticated Systems for Natural Language Process. In this paper we are developing the phonetic dictionary for natural language processing particularly for Kannada. Phonetics is the scientific study of speech sounds. Acoustic phonetics studies the physical properties of sounds and provides a language to distinguish one sound from another in quality and quantity. Kannada language is one of the major Dravidian languages of India. The language uses forty nine phonemic letters, divided into three groups: Swaragalu (thirteen letters); Yogavaahakagalu (two letters); and Vyanjanagalu (thirty-four letters), similar to the vowels and consonants of English, respectively. Keywords - Information Retrieval, Natural Language processing (NLP), Phonetics I. INTRODUCTION Direction of writing: left to right in horizontal Kannada ( ) or Canarese, the official lines ಕꃍನಡ The phonetic can be defined as: language of the southern Indian state of Karnataka. -
The Unicode Standard, Version 3.0, Issued by the Unicode Consor- Tium and Published by Addison-Wesley
The Unicode Standard Version 3.0 The Unicode Consortium ADDISON–WESLEY An Imprint of Addison Wesley Longman, Inc. Reading, Massachusetts · Harlow, England · Menlo Park, California Berkeley, California · Don Mills, Ontario · Sydney Bonn · Amsterdam · Tokyo · Mexico City Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. If these files have been purchased on computer-readable media, the sole remedy for any claim will be exchange of defective media within ninety days of receipt. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten. ISBN 0-201-61633-5 Copyright © 1991-2000 by Unicode, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or other- wise, without the prior written permission of the publisher or Unicode, Inc. -
Initial Literacy in Devanagari: What Matters to Learners
Initial literacy in Devanagari: What Matters to Learners Renu Gupta University of Aizu 1. Introduction Heritage language instruction is on the rise at universities in the US, presenting a new set of challenges for language pedagogy. Since heritage language learners have some experience of the target language from their home environment, language instruction cannot be modeled on foreign language instruction (Kondo-Brown, 2003). At the same time, there may be significant gaps in the learner’s competence; in describing heritage learners of South Asian languages, Moag (1996) notes that their acquisition of the native language has often atrophied at an early stage. For learners of South Asian languages, the writing system presents an additional hurdle. Moag (1996) states that heritage learners have more problems than their American counterparts in learning the script; the heritage learner “typically takes much more time to master the script, and persists in having problems with both reading and writing far longer than his or her American counterpart” (page 170). Moag attributes this to the different purposes for which learners study the language, arguing that non-native speakers rapidly learn the script since they study the language for professional purposes, whereas heritage learners study it for personal reasons. Although many heritage languages, such as Japanese, Mandarin, and the South Asian languages, use writing systems that differ from English, the difficulties of learning a second writing system have received little attention. At the elementary school level, Perez (2004) has summarized differences in writing systems and rhetorical structures, while Sassoon (1995) has documented the problems of school children learning English as a second script; in addition, Cook and Bassetti (2005) offer research studies on the acquisition of some writing systems. -
Evolution of Script in India
Evolution of script in India December 4, 2018 Manifest pedagogy UPSC in recent times has been asking tangential questions surrounding a personality. This is being done by linking dimensions in the syllabus with the personality. Iravatham Mahadevan which was in news last week. His contributions to scripts particularly Harappan Script and Brahmi script was immense. So the issue of growth of language and script become a relevant topic. In news Death of Iravatham Mahadevan an Indian epigraphist with expertise in Tamil-brahmi and Indus Valley script. Placing it in syllabus Indian culture will cover the salient aspects of Art Forms, Literature and Architecture from ancient to modern times. Dimensions 1. Difference between language and script 2. Indus valley script and the unending debate on its decipherment 3. The prominence of Brahmi script 4. The evolution of various scripts of India from Brahmi. 5. Modern Indian scripts. Content A language usually refers to the spoken language, a method of communication. A script refers to a collection of characters used to write one or more languages. A language is a method of communication. Scripts are writing systems that allow the transcription of a language, via alphabet sets. Indus script After the pictographic and petroglyph representations of early man the first evidence of a writing system can be seen in the Indus valley civilization. The earliest evidence of which is found on the pottery and pot shreds of Rahman Dheri and these potter’s marks, engraved or painted, are strikingly similar to those appearing in the Mature Indus symbol system. Later the writing system can be seen on the seals and sealings of Harappan period.