Malayalam Range: 0D00–0D7F
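The range in the title can be read as an inclusive code point interval. The following is a minimal sketch, assuming only the bounds stated above (U+0D00 to U+0D7F); the function and constant names are illustrative and do not come from the code chart.

```python
# Bounds of the Malayalam block as stated in the title (inclusive).
MALAYALAM_START = 0x0D00
MALAYALAM_END = 0x0D7F


def in_malayalam_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+0D00..U+0D7F."""
    return MALAYALAM_START <= ord(ch) <= MALAYALAM_END


def malayalam_chars(text: str) -> list[str]:
    """Collect the characters of text that fall inside the block."""
    return [ch for ch in text if in_malayalam_block(ch)]


if __name__ == "__main__":
    sample = "A\u0D15B"  # Latin A, a character at U+0D15, Latin B
    print([f"U+{ord(c):04X}" for c in malayalam_chars(sample)])
```

Note that a range check only says a code point sits inside the block; it does not say whether that code point is assigned in a given Unicode version.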
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
Proposal for encoding the Javanese script in the UCS
ISO/IEC JTC1/SC2/WG2 N3319R L2/07-295R 2007-09-11 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Международная организация по стандартизации Doc Type: Working Group Document Title: Proposal for encoding the Javanese script in the UCS Source: Michael Everson, SEI (Universal Scripts Project) Status: Individual Contribution Action: For consideration by JTC1/SC2/WG2 and UTC Replaces: N3292 Date: 2007-09-11 1. Introduction. The Javanese script, or aksara Jawa, is used for writing the Javanese language, the native language of one of the peoples of Java, known locally as basa Jawa. It is a descendent of the ancient Brahmi script of India, and so has many similarities with modern scripts of South Asia and Southeast Asia which are also members of that family. The Javanese script is also used for writing Sanskrit, Jawa Kuna (a kind of Sanskritized Javanese), and Kawi, as well as the Sundanese language, also spoken on the island of Java, and the Sasak language, spoken on the island of Lombok. Javanese script was in current use in Java until about 1945; in 1928 Bahasa Indonesia was made the national language of Indonesia and its influence eclipsed that of other languages and their scripts. Traditional Javanese texts are written on palm leaves; books of these bound together are called lontar, a word which derives from ron ‘leaf’ and tal ‘palm’. 2.1. Consonant letters. Consonants have an inherent -a vowel sound. Consonants combine with following consonants in the usual Brahmic fashion: the inherent vowel is “killed” by the PANGKON, and the following consonant is subjoined or postfixed, often with a change in shape: ndha = NA + PANGKON + DA-MAHAPRANA.
Design of Javanese Text to Speech Application
Design of Javanese Text to Speech Application Yulia, Liliana, Rudy Adipranata, Gregorius Satia Budhi Informatics Department, Industrial Technology Faculty, Petra Christian University Surabaya, Indonesia Abstract: Javanese is one of the many regional languages used in Indonesia. The Javanese language is used by most of the population of Java. With the passing of time, however, the use of regional languages, including Javanese, is declining, especially among the younger generation. One way to help conserve the use of the Javanese language is to utilize information technologies, for example by developing a text to speech application that can be used to find out how Javanese is pronounced. In this paper, we discuss a design for a Javanese text to speech application that uses finite state automata. The design result will be used as rules to separate syllables when implementing the text to speech application. Index Terms: Javanese language; Finite state automata; Text to speech. I. INTRODUCTION Javanese language is a language widely spoken by the people of Java. It is one of the many regional languages spoken in Indonesia. As one of the assets of national culture, the Javanese language needs to be preserved. The younger generation is now more interested in learning a foreign language, rather than the native Indonesian local language. [Figure 1: Basic Javanese characters] In addition to the basic characters, the Javanese script has supplementary characters, consisting of symbols for expressing vowels as well as a combination of two specific consonants. These supplementary characters are called sandhangan and can be seen in Figure 2 [5]. [Figure 2 columns: Symbol, Example, Read]
Names of Foodstuffs in Indian Languages
NAMES OF FOODSTUFFS IN INDIAN LANGUAGES
CEREAL GRAINS AND PRODUCTS
1. Pearl Millet: Pennisetum typhoides. Bajra (Bengali, Hindi, Oriya), Bajri (Gujarati, Marathi), Sajje (Kannada), Bajr’u (Kashmiri), Cambu (Malayalam, Tamil), Sazzalu (Telugu). Other names: Spiked millet, Pearl millet
2. Italian millet: Setaria italica. Syama dhan (Bengali), Ral Kang (Gujarati), Kangni (Hindi), Thene (Kannada), Shol (Kashmiri), Thina (Malayalam), Rala (Marathi), Kaon (Punjabi), Thenai (Tamil), Korralu (Telugu). Other names: Foxtail millet, Moha millet, Kakan kora
3. Sorghum: Sorghum bicolor. Juar (Bengali, Gujarati, Hindi), Jola (Kannada), Cholam (Malayalam, Tamil), Jwari (Marathi), Janha (Oriya), Jonnalu (Telugu). Other names: Milo, Chari
4. Maize: Zea mays. Bhutta (Bengali), Makai (Gujarati), Maka (Hindi, Marathi, Oriya), Musikinu jola (Kannada), Makaa’y (Kashmiri), Cholam (Malayalam), Makka Cholam (Tamil), Mokka jonnalu (Telugu)
5. Finger Millet: Eleusine coracana. Madua (Bengali, Hindi), Bhav (Gujarati), Ragi (Kannada), Moothari (Malayalam), Nachni (Marathi), Mandia (Oriya), Kezhvaragu (Tamil), Ragulu (Telugu). Other names: Korakan
6. Rice, parboiled: Oryza sativa. Siddha chowl (Bengali), Ukadello chokha (Gujarati), Usna chawal (Hindi), Kusubalakki (Kannada), Puzhungal ari (Malayalam), Ukadla tandool (Marathi), Usuna chaula (Oriya), Puzhungal arisi (Tamil), Uppudu biyyam (Telugu)
7. Rice, raw: Oryza sativa. Chowl (Bengali), Chokha (Gujarati), Chawal (Hindi), Akki (Kannada), Tomul (Kashmiri), Ari (Malayalam), Tandool (Marathi), Chaula (Oriya), Arisi
Exploring Language Similarities with Dimensionality Reduction Techniques
EXPLORING LANGUAGE SIMILARITIES WITH DIMENSIONALITY REDUCTION TECHNIQUES Sangarshanan Veeraraghavan Final Year Undergraduate VIT Vellore Abstract In recent years several novel models were developed to process natural language; the development of accurate language translation systems has helped us overcome geographical barriers and communicate ideas effectively. These models are developed mostly for a few widely used languages, while other languages are ignored. Most spoken languages share lexical, syntactic and semantic similarity with several other languages, and knowing this can help us leverage existing models to build more specific and accurate models for other languages. Here I have explored the idea of representing several popular languages in a lower dimension such that their similarities can be visualized using simple two-dimensional plots. This can even help us understand newly discovered languages that may not share their vocabulary with any existing language. 1. Introduction Language is a method of communication and, ironically, has long remained a communication barrier. Written representations of languages look quite different but inherently share similarities. For example, if we show a person with no knowledge of the English alphabet a text in English and then in Spanish, they might be oblivious to the similarity between them, as both look like gibberish to them. There might also be several languages which seem completely different even to experienced linguists but share a subtle hidden similarity, as there is a possibility that languages with no shared vocabulary might still have some similarity.
The Unicode Standard, Version 4.0--Online Edition
This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten.
Punjabi Machine Transliteration Muhammad Ghulam Abbas Malik
Punjabi Machine Transliteration Muhammad Ghulam Abbas Malik To cite this version: Muhammad Ghulam Abbas Malik. Punjabi Machine Transliteration. 21st International Conference on Computational Linguistics (COLING) and the 44th Annual Meeting of the ACL, Jul 2006, Sydney, Australia. pp. 1137-1144. HAL Id: hal-01002160 https://hal.archives-ouvertes.fr/hal-01002160 Submitted on 15 Jan 2018 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Punjabi Machine Transliteration M. G. Abbas Malik Department of Linguistics Denis Diderot, University of Paris 7 Paris, France Abstract Machine Transliteration is to transcribe a word written in a script with approximate phonetic equivalence in another language. It is useful for machine translation, cross-lingual information retrieval, multilingual text and speech processing. Punjabi Machine Transliteration (PMT) is a special case of machine transliteration and is a process of converting a word from Shahmukhi (based on Arabic script) Introduction Transliteration refers to phonetic translation across two languages with different writing systems (Knight & Graehl, 1998), such as Arabic to English (Nasreen & Leah, 2003). Most prior work has been done for Machine Translation (MT) (Knight & Leah, 97; Paola & Sanjeev, 2003; Knight & Stall, 1998) from English to other major languages of the world like Arabic, Chinese, etc., for cross-lingual information retrieval (Pirkola et al, 2003), for the development of multilingual resources (Yan et al, 2003; Kang & Kim, 2000) and for the development of cross-lingual applications.
An Introduction to Indic Scripts
An Introduction to Indic Scripts Richard Ishida W3C HTML version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.html PDF version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.pdf Introduction This paper provides an introduction to the major Indic scripts used on the Indian mainland. Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. I have used XHTML encoded in UTF-8 for the base version of this paper. Most of the XHTML file can be viewed if you are running Windows XP with all associated Indic font and rendering support, and the Arial Unicode MS font. For examples that require complex rendering in scripts not yet supported by this configuration, such as Bengali, Oriya, and Malayalam, I have used non-Unicode fonts supplied with Gamma's Unitype. To view all fonts as intended without the above you can view the PDF file whose URL is given above. Although the Indic scripts are often described as similar, there is a large amount of variation at the detailed implementation level. To provide a detailed account of how each Indic script implements particular features on a letter by letter basis would require too much time and space for the task at hand. Nevertheless, despite the detail variations, the basic mechanisms are to a large extent the same, and at the general level there is a great deal of similarity between these scripts. It is certainly possible to structure a discussion of the relevant features along the same lines for each of the scripts in the set.
Specifying Optional Malayalam Conjuncts
Specifying Optional Malayalam Conjuncts Cibu Johny, Roozbeh Pournader 2013-Jan-28 Current status The Indic conjunct formation scheme currently favors the full conjunct for a given set of characters. Example: क् + ष → क्ष is preferred as opposed to क् ष. (KAd + SSAl → K.SSAn) क् ष can be obtained by क् + ZWJ + ष, which is KAd + ZWJ + SSAl → KAh + SSAn. The Need In Malayalam there are two prevailing orthographies, traditional and reformed, both written with the same Malayalam character set. The difference between them is typically manifested only by the font. Traditional orthography fonts accommodate many more full conjuncts, while reformed orthography fonts use visible virama (Chandrakkala) separated sequences for many of those full conjuncts. For the vowel signs U and UU and the vocalic vowel signs, and also for the RA sign, a reformed orthography font would use a visually separate conjoining form. However, there is a definite need for a reformed orthography font to be able to display the traditional full conjuncts on demand. As of now there is no mechanism specified in the standard to suggest a full conjunct for a cluster. The reverse case is also needed: a traditional orthography font might want to display reformed orthography grapheme clusters optionally. The following proposal uses ZWJ and ZWNJ insertions to achieve this. However, the potentially Chillu-forming sequence <Consonant + Virama + ZWJ> is not used for any of the cases listed below. Proposal Case 1: The sequence <Consonant + ZWJ + Conjoining Vowel Sign> has the following fallback order for display: 1. Full Conjunct 2. Consonant + non-conjoining vowel sign. Example with a reformed orthography font (a reformed orthography Malayalam font that can allow optional traditional orthography): SA + Vowel Sign U → SA + ZWJ + Vowel Sign U → Case 2: <Consonant1 + ZWJ + Virama + Consonant2> has the following display fallback order: 1.
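The two sequences named in the excerpt above can be written out as code points. This is a minimal sketch, assuming only what the excerpt states about the sequence shapes; the helper names are hypothetical, and the KA/TA pair used for Case 2 is an arbitrary illustration rather than an example taken from the proposal. How (or whether) a font renders these sequences depends on the font and the shaping engine.

```python
# Code points used below (names from the Unicode character database).
ZWJ = "\u200D"           # ZERO WIDTH JOINER
VIRAMA = "\u0D4D"        # MALAYALAM SIGN VIRAMA (chandrakkala)
SA = "\u0D38"            # MALAYALAM LETTER SA
VOWEL_SIGN_U = "\u0D41"  # MALAYALAM VOWEL SIGN U
KA = "\u0D15"            # MALAYALAM LETTER KA (illustrative choice)
TA = "\u0D24"            # MALAYALAM LETTER TA (illustrative choice)


def case1_sequence(consonant: str, vowel_sign: str) -> str:
    """Case 1 shape: <Consonant + ZWJ + Conjoining Vowel Sign>."""
    return consonant + ZWJ + vowel_sign


def case2_sequence(consonant1: str, consonant2: str) -> str:
    """Case 2 shape: <Consonant1 + ZWJ + Virama + Consonant2>."""
    return consonant1 + ZWJ + VIRAMA + consonant2


def code_points(text: str) -> list[str]:
    """Format each character of text as U+XXXX for inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    print(code_points(case1_sequence(SA, VOWEL_SIGN_U)))
    print(code_points(case2_sequence(KA, TA)))
```

Printing code points rather than the strings themselves keeps the example independent of whatever font happens to be installed.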
From Javanese Court to African Grave: How Noriman Became Tuan Skapie, 1717-1806
JIOWS: Journal of Indian Ocean World Studies From Javanese Court to African Grave: How Noriman Became Tuan Skapie, 1717-1806 Michael Laffan To cite this article: Laffan, Michael. “From Javanese Court to African Grave: How Noriman Became Tuan Skapie, 1717-1806.” Journal of Indian Ocean World Studies, 1 (2017), pp. 38-59. More information about the Journal of Indian Ocean World Studies can be found at: jiows.mcgill.ca © Michael Laffan. This is an Open Access article distributed under the terms of the Creative Commons License CC BY NC SA, which permits users to share, use, and remix the material provided they give proper attribution, the use is non-commercial, and any remixes/transformations of the work are shared under the same license as the original. Michael Laffan, Princeton University, New Jersey Abstract This article assembles clues related to the life and impact of an eighteenth century exile to Cape Town known as Oupa or Tuan Skapie (Grandpa/Lord Sheepy). Remembered as a slave sent from Java in the 1770s who tended herds and dug wells on the slopes of Signal Hill in between periods of meditation, it would appear that this subaltern might well have been more than that. Certainly he was successful at concealing his identity (and abilities) from his former jailers and two colonial regimes, finally taking his resting place high on the ridge above Cape Town in 1806, above the space assigned to a more scripturally-charged rival.
The Making of Modern Malayalam Prose and Fiction: Translations from European Languages Into Malayalam in the First Half of the Twentieth Century
The Making of Modern Malayalam Prose and Fiction: Translations from European Languages into Malayalam in the First Half of the Twentieth Century K.M. Sherrif Abstract Translations from European languages have played a crucial role in the evolution of Malayalam prose and fiction in the first half of the Twentieth Century. Many of them are directly linked to the socio-political movements in Kerala which have been collectively designated ‘Kerala’s Renaissance.’ The nature of the translated texts reveals the operation of ideological and aesthetic filters in the interface between literatures, while the overwhelming presence of secondary translations indicates the hegemonic status of English as a receptor language. The translations never occupied a central position in Malayalam literature and served mostly as mere literary and political stimulants. Keywords: Translation - evolution of genres, canon - political intervention The role of translation in the development of languages and literatures has been extensively discussed by translation scholars in the West during the last quarter of a century. The proliferation of diachronic translation studies that accompanied the revolutionary breakthroughs in translation theory in the mid-Eighties of the Twentieth Century resulted in the extensive mapping of the intervention of translation in the development of discourses and shifts of ideological paradigms in cultures, in the development of genres and the construction and disruption of the canon in literatures and in altering the idiomatic and structural paradigms of languages. One of the most detailed studies in the area was made by Andre Lefevere (1988, pp 75-114). Lefevere showed with convincing
Data Issues in English-To-Hindi Machine Translation
Data Issues in English-to-Hindi Machine Translation Ondřej Bojar, Pavel Straňák, Daniel Zeman Univerzita Karlova v Praze, Ústav formální a aplikované lingvistiky Malostranské náměstí 25, CZ-11800 Praha {bojar|stranak|zeman}@ufal.mff.cuni.cz http://ufal.mff.cuni.cz/umc/
Abstract Statistical machine translation to morphologically richer languages is a challenging task and more so if the source and target languages differ in word order. Current state-of-the art MT systems thus deliver mediocre results. Adding more parallel data often helps improve the results; if it does not, it may be caused by various problems such as different domains, bad alignment or noise in the new data. We evaluate several available parallel data sources and provide cross-evaluation results on their combinations using two freely available statistical MT systems. We demonstrate various problems encountered in the data and describe automatic methods of data cleaning and normalization. We also show that the contents of two independently distributed data sets can unexpectedly overlap, which negatively affects translation quality. Together with the error analysis, we also present a new tool for viewing aligned corpora, which makes it easier to detect difficult parts in the data even for a developer not speaking the
Data sources:
Tides: A dataset originally collected for the DARPA-TIDES surprise-language contest in 2002, later refined at IIIT Hyderabad and provided for the NLP Tools Contest at ICON 2008.
Daniel Pipes: A journalist Daniel Pipes' website (http://www.danielpipes.org/), limited-domain articles about the Middle East. Written in English, many of them translated to up to 25 other languages.
Emille: Monolingual, parallel and annotated corpora for fourteen South Asian languages (including Hindi) and English.
Corpus          Sentences   En Tokens   Hi Tokens
Tides.train     50,000      1,226,144   1,312,435
Tides.dev       1,000       22,485      24,363
Tides.test      1,000       27,169      28,574
Daniel Pipes    6,761       176,392     122,108
Emille          3,501       55,660      71,010
History of Kerala
HISTORY OF KERALA. K. P. PADMANABHA MENON. Rs. 8. 18 sh. THE LATE Mr. K. P. PADMANABHA MENON. [Frontispiece.] HISTORY OF KERALA. Copyright and right of translation reserved with Mrs. K. P. PADMANABHA MENON. Copies can be had of Mrs. K. P. PADMANABHA MENON, Sri Padmanabhalayam Bungalow, Diwan's Road, Ernakulam, Cochin State, S. INDIA. A HISTORY OF KERALA. WRITTEN IN THE FORM OF NOTES ON VISSCHER'S LETTERS FROM MALABAR, BY K. P. PADMANABHA MENON, B.A., B.L., M.R.A.S., Author of the History of Cochin, and of several Papers connected with the early History of Kerala; Vakil of the High Courts of Madras and of Travancore and of the Chief Court of Cochin, AND EDITED BY SAHITHYAKUSALAN T. K. KRISHNA MENON, B.A., Formerly a Member of the Royal Asiatic Society, and of the Societies of Arts and of Authors, and a Fellow of the Royal Historical Society. Kunhambu Nambiyar Prizeman. For some time, an Examiner for Malayalam to the Universities of Madras, Benares and Hyderabad. A Member of the Board of Studies for Malayalam. A quondam Editor of Vidya Vinodini. A co-Editor of the Science Primers Series in Malayalam. Editor of Books for Malabar Bairns. The Author & Editor of several works in Malayalam. A Member of the Indian Women's University, and a Sadasya of Visva-Bharathi, &c.