Proposal for a Telugu Script Root Zone Label Generation Ruleset (LGR)
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Ka И @И Ka M Л @Л Ga Н @Н Ga M М @М Nga О @О Ca П
ISO/IEC JTC1/SC2/WG2 N3319R L2/07-295R 2007-09-11 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Международная организация по стандартизации Doc Type: Working Group Document Title: Proposal for encoding the Javanese script in the UCS Source: Michael Everson, SEI (Universal Scripts Project) Status: Individual Contribution Action: For consideration by JTC1/SC2/WG2 and UTC Replaces: N3292 Date: 2007-09-11 1. Introduction. The Javanese script, or aksara Jawa, is used for writing the Javanese language, the native language of one of the peoples of Java, known locally as basa Jawa. It is a descendent of the ancient Brahmi script of India, and so has many similarities with modern scripts of South Asia and Southeast Asia which are also members of that family. The Javanese script is also used for writing Sanskrit, Jawa Kuna (a kind of Sanskritized Javanese), and Kawi, as well as the Sundanese language, also spoken on the island of Java, and the Sasak language, spoken on the island of Lombok. Javanese script was in current use in Java until about 1945; in 1928 Bahasa Indonesia was made the national language of Indonesia and its influence eclipsed that of other languages and their scripts. Traditional Javanese texts are written on palm leaves; books of these bound together are called lontar, a word which derives from ron ‘leaf’ and tal ‘palm’. 2.1. Consonant letters. Consonants have an inherent -a vowel sound. Consonants combine with following consonants in the usual Brahmic fashion: the inherent vowel is “killed” by the PANGKON, and the follow- ing consonant is subjoined or postfixed, often with a change in shape: §£ ndha = § NA + @¿ PANGKON + £ DA-MAHAPRANA; üù n. -
Friends, Enemies, and Fools: a Collection of Uyghur Proverbs by Michael Fiddler, Zhejiang Normal University
GIALens 2017 Volume 11, No. 3 Friends, enemies, and fools: A collection of Uyghur proverbs by Michael Fiddler, Zhejiang Normal University Erning körki saqal, sözning körki maqal. A beard is the beauty of a man; proverbs are the beauty of speech. Abstract: This article presents two groups of Uyghur proverbs on the topics of friendship and wisdom, to my knowledge only the second set of Uyghur proverbs published with English translation (after Mahmut & Smith-Finley 2016). It begins with a brief introduction of Uyghur language and culture, then a description of Uyghur proverb styles, then the two sets of proverbs, and finally a few concluding comments. Key words: Proverb, Uyghur, folly, wisdom, friend, enemy 1. Introduction Uyghur is spoken by about 10 million people, mostly in northwest China’s Xinjiang Uyghur Autonomous Region. It is a Turkic language, most closely related to Uzbek and Salar, but also sharing similarity with its neighbors Kazakh and Kyrgyz. The lexicon is composed of about 50% old Turkic roots, 40% relatively old borrowings from Arabic and Persian, and 6% from Russian and 2% from Mandarin (Nadzhip 1971; since then, the Russian and Mandarin borrowings have presumably seen a slight increase due to increased language contact, especially with Mandarin). The grammar is generally head-final, with default SOV clause structure, A-N noun phrases, and postpositions. Stress is typically word-final. Its phoneme inventory includes eight vowels and 24 consonants. The phonology includes vowel harmony in both roots and suffixes, which is sensitive to rounded/unrounded (labial harmony) and front/back (palatal harmony), consonant harmony across suffixes, which is sensitive to voiced/unvoiced and front/back distinctions; and frequent devoicing of high vowels (see Comrie 1997 and Hahn 1991 for details). -
Proposal for a Gurmukhi Script Root Zone Label Generation Ruleset (LGR)
Proposal for a Gurmukhi Script Root Zone Label Generation Ruleset (LGR) LGR Version: 3.0 Date: 2019-04-22 Document version: 2.7 Authors: Neo-Brahmi Generation Panel [NBGP] 1. General Information/ Overview/ Abstract This document lays down the Label Generation Ruleset for Gurmukhi script. Three main components of the Gurmukhi Script LGR i.e. Code point repertoire, Variants and Whole Label Evaluation Rules have been described in detail here. All these components have been incorporated in a machine-readable format in the accompanying XML file named "proposal-gurmukhi-lgr-22apr19-en.xml". In addition, a document named “gurmukhi-test-labels-22apr19-en.txt” has been provided. It provides a list of labels which can produce variants as laid down in Section 6 of this document and it also provides valid and invalid labels as per the Whole Label Evaluation laid down in Section 7. 2. Script for which the LGR is proposed ISO 15924 Code: Guru ISO 15924 Key N°: 310 ISO 15924 English Name: Gurmukhi Latin transliteration of native script name: gurmukhī Native name of the script: ਗੁਰਮੁਖੀ Maximal Starting Repertoire [MSR] version: 4 1 3. Background on Script and Principal Languages Using It 3.1. The Evolution of the Script Like most of the North Indian writing systems, the Gurmukhi script is a descendant of the Brahmi script. The Proto-Gurmukhi letters evolved through the Gupta script from 4th to 8th century, followed by the Sharda script from 8th century onwards and finally adapted their archaic form in the Devasesha stage of the later Sharda script, dated between the 10th and 14th centuries. -
A Review of Heat Stroke and Its Complications in the Canine Renee L
Volume 45 | Issue 1 Article 1 1983 A Review of Heat Stroke and Its Complications in the Canine Renee L. Larson Iowa State University Robert W. Carithers Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/iowastate_veterinarian Part of the Small or Companion Animal Medicine Commons, and the Veterinary Physiology Commons Recommended Citation Larson, Renee L. and Carithers, Robert W. (1983) "A Review of Heat Stroke and Its Complications in the Canine," Iowa State University Veterinarian: Vol. 45 : Iss. 1 , Article 1. Available at: https://lib.dr.iastate.edu/iowastate_veterinarian/vol45/iss1/1 This Article is brought to you for free and open access by the Journals at Iowa State University Digital Repository. It has been accepted for inclusion in Iowa State University Veterinarian by an authorized editor of Iowa State University Digital Repository. For more information, please contact [email protected]. A Review of Heat Stroke and Its Complications in the Canine Renee L. Larson, BS * Robert W. Carithers, DVM, MS, PhD** SUMMARY most obvious prerequisite is a high en A review of heat stroke and its complications vironmental temperature. When the ambient is presented. The etiology, physiology, clinical temperature increases above 86°F, a rise in in signs, secondary complications, diagnosis, ternal body temperature results. Dogs can treatment, necropsy results and prevention of tolerate rising environmental temperature heat stroke are discussed. A clinical case is quite well. However, when the body tempera then presented to illustrate the disorder of heat ture exceeds 104°F a breakdown of the stroke. animal's thermal equilibrium begins. -
Telugu Wordnet
Telugu WordNet S. Arulmozi Department of Dravidian & Computational Linguistics Dravidian University, Kuppam 517425, India [email protected] Abstract Section 4 gives a statistical account on the synsets developed. The last section This paper describes an attempt to develop Telugu WordNet, particularly construction of summarizes the work. synsets in Telugu language along the lines of Hindi synsets using the expansion approach. 2 The Telugu Language Based on the Hindi WordNet synsets, we assign Telugu synsets manually using the Offline Tool Telugu belongs to the South Central Dravidian Interface. We share the challenges faced in the subgroup of the Dravidian family of languages. construction of core synsets from Hindi into It has recorded history from 6th Century A.D. Telugu language. A brief account on Telugu th language and its notable features are also and literary history dating back to 11 Century provided. A.D. It has been recently awarded the Classical Status. It is the second most spoken language 1 Introduction after Hindi in India. Telugu has been the language of choice for lyrical compositions for WordNet building activities in Dravidian its vowel ending words, rightly called the languages started with the work of Tamil “Italian of the East”. WordNet1 at AU-KBC Research Centre using The vocabulary of Telugu is highly Rajendran’s (2001) ontological classification Sanskritized in addition to the Persian-Arabic of Tamil vocabulary. Work on Dravidian borrowings / kaburu/ `story ’, WordNet (comprising WordNets in four major కబురు జవాబు /javaabu/ `answer ’; Urdu /taraaju/ Dravidian languages, viz. Kannada, తరా灁 Malayalam, Tamil and Telugu) started during a 2 `balance’. It does have cognates in other Workshop held at Chennai in which synsets Dravidan languages such as puli/ `tiger ’, were built for Construction Domain. -
"9-41516)9? "9787:)4 ;7 -6+7,- )=1 16 ;0- & $
L2/20-256 "9-41516)9?"9787:)4;7-6+7,-)=116;0-&$ ᭛᭜᭛ <;079 ,1;?))?<"-9,)6)215-14,7;3755/5)14+75 40)5 <9=)6:)0140)56<9=)6:)0/5)14+75 );- ;0$-8;-5*-9 6;97,<+;176 ,=:#6L>H8G>EI>H6=>HIDG>86AG6=B>76H:9H8G>EI;DJC9>CK6G>DJH>CH8G>EI>DCH6C96GI:;68IHEGD9J8:97:IL::CI=: I=6C9I=: I=8:CIJGN>C>CHJA6G+DJI=:6HIH>6A6G<:EDGI>DCD;>IH8DGEJH>H;DJC9>C"6K67JI#6L>B6I:G>6AH =6K:6AHD7::C;DJC9>C+JB6IG6%6A6N(:C>CHJA66A>6C9I=:(=>A>EE>C:H,=:H8G>EI>H;G:FJ:CIAN6HHD8>6I:9L>I= I=:'A9"6K6C:H:A6C<J6<:7JIB6I:G>6AHLG>II:C>C+6CH@G>I'A9%6A6N'A96A>C:H:6C9'A9+JC96C:H:A6C<J6<: =6H6AHD7::C;DJC9>CI=:#6L>H8G>EIGDBI=:B>9I=8:CIJGNH>BEA:;JC8I>DC6A#6L>L6HL>9:ANJH:9IDG:8DG9 A6C9 <G6CIH GDN6A :9>8IH 6C9 H>B>A6G 8=6C8:GN 9D8JB:CIH ,DL6G9H I=: :C9 D; I=: ;>GHI B>AA:CC>JB I=: H8G>EI 7:86B:>C8G:6H>C<AN9:8DG6I>K:6C986AA><G6E=>89J:ID>IHJH:6HI=:B6>CK:=>8A:D;'A9"6K6C:H:A>I:G6GNA6C<J6<: L>I=ADC<A6HI>C<A:<68N>CI=:A>I:G6GNIG69>I>DCD;I=:BD9:GC"6K6C:H:6C96A>C:H:A6C<J6<:H$6I:G#6L>H=DLH B6CNK6G>6I>DCHDK:G6L>9:<:D<G6E=>89>HIG>7JI>DC'K:GI>B:I=:H:K6G>6CIH=6K::KDAK:9>G:8IANDG>C9>G:8IAN >CIDI=:B6CNBD9:GCG6=B>8H8G>EIHD;>CHJA6G+H>6HJ8=6H6A>C:H:6I6@"6K6C:H:$DCI6G6:I8 /=>A:I=:68I>K:JH:D;#6L>H8G>EI=6H7::CG:EA68:97NDI=:GH8G>EIHH>C8:I=: I=8:CIJGNI=:G:6G:6CJB7:GD; BD9:GC96N:CI=JH>6HIH6C98DBBJC>I>:HL=DJH:I=:H8G>EIID96N;DGDI=:GEJGEDH:HI=6C6C8>:CIG:EGD9J8I>DC ;DG:M6BEA:ID8=6I>CHD8>6A6EEA>86I>DC6C98G:6I:>B6<:EDHIH!CI=>HG:K>K6AINE:D;JH:I=:#6L>H8G>EIB6N7: JH:9IDLG>I:A6C<J6<:HI=6I6G:CDI;DJC9>C‘6JI=:CI>8’#6L>8DGEJHHJ8=6HI=:BD9:GC"6K6C:H:A6C<J6<:DG I=: !C9DC:H>6C A6C<J6<: H#6L>=6H CDI 7::C :C8D9:9>C I=: -C>8D9: N:I I=: -
T'ahsíi Gha Gots'ęh Náets'ehndı́h Gha Goɂǫ
CANADA GOGHA AMÍI GOHÉH DZÁA NĮONIDHE SÍI MEGHA ?E?Á T’AHSÍI GHA GOTS’ĘH NÁETS’EHNDÍH GHA GOɁǪ agots’ı̨ la t’áh ǫkı edáidzę gots’ę́ gots’áendíh íle gha T’ahsíi gha gots’ęh naets’éhndı́h énidé gots’áendıh́ ts’ęh edı̨ htł’éh gogháts’ı̨ ɂa gots’ęh gha go ǫ azhíi ǫt’e? eghálats’enda íle gha godı̨ eghálats’enda éhsíi edı̨ htł’éh ɂ seets’eleh gogháts’ıdhah.́ T’ahsíi goghǫ ts’enéhe T’ahsíi gha gots’ęh náets’éhndıh́ gha goɂǫ énidé t’áh dádéhtí íle énidé gots’ęh t’ahsíi tsıts’ı́ ̨ hthe dádéhtí amíi méh dzaah nıgot’ǫ́ ts’ıhɂǫh gosaamba hule agújá, náts’ehndíh mets’ęh edı̨ htł’éh ats’ehɂı̨ gogháts’ı̨ ɂá íle amíi mets’ę́ t’áhsíi ts’enıdhę t’áh. T’ahsíi gha gots’ęh énidé dádétí náíndí ts’enidhę t’áh gots’ęnoendíh. náets’éhndıh́ gha goɂǫ énidé t’áh amíi t’ahsíi tsı̨ guıdhe meghǫh názhaetı gots’eh zǫ́ dúle gots’ęh noendıh́ Gohseenízhaehti adi t’áh amíi mehéh dzaah nıgot’ǫ́ t’áh ats’eleh. Amíi t’ahsíi tsı̨ gųihthe meghǫnázhaeti k’eodı̨ ɂá saamba ts’ęhke mesaamba hule énidé dúle edi káondıh́ ǫt’e gots’ęh dúle thahne t’áh meghǫnázhaetı t’áh t’áh gots’áhodi: ats’eleh íle énidé t’ahsíi tśı̨ ts’ıhthé t’áh goghǫnázhaetı • Dzaah níots’éniɂǫ t’áh t’ahsí gots’ęh ts’enıdhę t’ah gǫndi k’ę́ ę́ ats’et’ı̨̨ gha góɂǫ íle énidé goghǫnázhaeti íle énidé godı̨ náts’edéh gokųę theɂǫ goch’á t’áh gogha gǫndi ts’ehtsi. -
The Dravidian Languages
THE DRAVIDIAN LANGUAGES BHADRIRAJU KRISHNAMURTI The Pitt Building, Trumpington Street, Cambridge, United Kingdom The Edinburgh Building, Cambridge CB2 2RU, UK 40 West 20th Street, New York, NY 10011–4211, USA 477 Williamstown Road, Port Melbourne, VIC 3207, Australia Ruiz de Alarc´on 13, 28014 Madrid, Spain Dock House, The Waterfront, Cape Town 8001, South Africa http://www.cambridge.org C Bhadriraju Krishnamurti 2003 This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 2003 Printed in the United Kingdom at the University Press, Cambridge Typeface Times New Roman 9/13 pt System LATEX2ε [TB] A catalogue record for this book is available from the British Library ISBN 0521 77111 0hardback CONTENTS List of illustrations page xi List of tables xii Preface xv Acknowledgements xviii Note on transliteration and symbols xx List of abbreviations xxiii 1 Introduction 1.1 The name Dravidian 1 1.2 Dravidians: prehistory and culture 2 1.3 The Dravidian languages as a family 16 1.4 Names of languages, geographical distribution and demographic details 19 1.5 Typological features of the Dravidian languages 27 1.6 Dravidian studies, past and present 30 1.7 Dravidian and Indo-Aryan 35 1.8 Affinity between Dravidian and languages outside India 43 2 Phonology: descriptive 2.1 Introduction 48 2.2 Vowels 49 2.3 Consonants 52 2.4 Suprasegmental features 58 2.5 Sandhi or morphophonemics 60 Appendix. Phonemic inventories of individual languages 61 3 The writing systems of the major literary languages 3.1 Origins 78 3.2 Telugu–Kannada. -
5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721
Internet Engineering Task Force (IETF) P. Faltstrom, Ed. Request for Comments: 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) Abstract This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN). It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). Status of This Memo This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5892. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. -
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR)
Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) LGR Version: 3.0 Date: 2019-03-06 Document version: 2.6 Authors: Neo-Brahmi Generation Panel [NBGP] 1. General Information/ Overview/ Abstract The purpose of this document is to give an overview of the proposed Kannada LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: proposal-kannada-lgr-06mar19-en.xml Labels for testing can be found in the accompanying text document: kannada-test-labels-06mar19-en.txt 2. Script for which the LGR is Proposed ISO 15924 Code: Knda ISO 15924 N°: 345 ISO 15924 English Name: Kannada Latin transliteration of the native script name: Native name of the script: ಕನ#ಡ Maximal Starting Repertoire (MSR) version: MSR-4 Some languages using the script and their ISO 639-3 codes: Kannada (kan), Tulu (tcy), Beary, Konkani (kok), Havyaka, Kodava (kfa) 1 Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR) 3. Background on Script and Principal Languages Using It 3.1 Kannada language Kannada is one of the scheduled languages of India. It is spoken predominantly by the people of Karnataka State of India. It is one of the major languages among the Dravidian languages. Kannada is also spoken by significant linguistic minorities in the states of Andhra Pradesh, Telangana, Tamil Nadu, Maharashtra, Kerala, Goa and abroad. -
A Script Independent Approach for Handwritten Bilingual Kannada and Telugu Digits Recognition
IJMI International Journal of Machine Intelligence ISSN: 0975–2927 & E-ISSN: 0975–9166, Volume 3, Issue 3, 2011, pp-155-159 Available online at http://www.bioinfo.in/contents.php?id=31 A SCRIPT INDEPENDENT APPROACH FOR HANDWRITTEN BILINGUAL KANNADA AND TELUGU DIGITS RECOGNITION DHANDRA B.V.1, GURURAJ MUKARAMBI1, MALLIKARJUN HANGARGE2 1Department of P.G. Studies and Research in Computer Science Gulbarga University, Gulbarga, Karnataka. 2Department of Computer Science, Karnatak Arts, Science and Commerce College Bidar, Karnataka. *Corresponding author. E-mail: [email protected],[email protected],[email protected] Received: September 29, 2011; Accepted: November 03, 2011 Abstract- In this paper, handwritten Kannada and Telugu digits recognition system is proposed based on zone features. The digit image is divided into 64 zones. For each zone, pixel density is computed. The KNN and SVM classifiers are employed to classify the Kannada and Telugu handwritten digits independently and achieved average recognition accuracy of 95.50%, 96.22% and 99.83%, 99.80% respectively. For bilingual digit recognition the KNN and SVM classifiers are used and achieved average recognition accuracy of 96.18%, 97.81% respectively. Keywords- OCR, Zone Features, KNN, SVM 1. Introduction recognition of isolated handwritten Kannada numerals Recent advances in Computer technology, made every and have reported the recognition accuracy of 90.50%. organization to implement the automatic processing Drawback of this procedure is that, it is not free from systems for its activities. For example, automatic thinning. U. Pal et al. [11] have proposed zoning and recognition of vehicle numbers, postal zip codes for directional chain code features and considered a feature sorting the mails, ID numbers, processing of bank vector of length 100 for handwritten Kannada numeral cheques etc. -
Machine Translation for Dravidian Languages Using Stacked Long Short Term Memory
MUCS@ - Machine Translation for Dravidian Languages using Stacked Long Short Term Memory Asha Hegde Ibrahim Gashaw H. L. Shashirekha Dept of Computer Science Dept of Computer Science Dept of Computer Science Mangalore University Mangalore University Mangalore University [email protected] [email protected] [email protected] Abstract are small literary languages. All these languages except Kodava have their own script. Further, these The Dravidian language family is one of 1 the largest language families in the world. languages consists of 80 different dialects namely In spite of its uniqueness, Dravidian lan- Brahui, Kurukh, Malto, Kui, Kuvi, etc. Dravid- guages have gained very less attention due ian Languages are mainly spoken in southern In- to scarcity of resources to conduct language dia, Sri Lanka, some parts of Pakistan and Nepal technology tasks such as translation, Parts- by over 222 million people (Hammarstrom¨ et al., of-Speech tagging, Word Sense Disambigua- 2017). It is thought that Dravidian languages are tion etc. In this paper, we, team MUCS, native to the Indian subcontinent and were origi- describe sequence-to-sequence stacked Long nally spread throughout India1. Tamil have been Short Term Memory (LSTM) based Neural Machine Translation (NMT) models submit- distributed to Burma, Indonesia, Malaysia, Fiji, ted to “Machine Translation in Dravidian lan- Madagascar, Mauritius, Guyana, Martinique and guages”, a shared task organized by EACL- Trinidad through trade and emigration. With over 2021. The NMT models are applied for trans- two million speakers, primarily in Pakistan and two lation using English-Tamil, English-Telugu, million speakers in Afghanistan, Brahui is the only English-Malayalam and Tamil-Telugu corpora Dravidian language spoken entirely outside India provided by the organizers.