Malayalam Range: 0D00–0D7F


This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0.

This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata.

See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0. See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0.

Disclaimer

These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/

A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Copying characters from the character code tables or list of character names is not recommended, because for production reasons the PDF files for the code charts cannot guarantee that the correct character codes will always be copied.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See https://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts.

The fonts and font data used in production of these code charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s).

The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site. See https://www.unicode.org/pending/pending.html and https://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2021 Unicode, Inc. All rights reserved.
0D00 Malayalam 0D7F

Each cell is the character at column prefix + row digit (for example, column 0D1, row 5 is 0D15). Reserved code points are shown as -.

     0D0  0D1  0D2  0D3  0D4  0D5  0D6  0D7
0    ◌ഀ   ഐ    ഠ    ര    ◌ീ   -    ൠ    ൰
1    ◌ഁ   -    ഡ    റ    ◌ു   -    ൡ    ൱
2    ◌ം   ഒ    ഢ    ല    ◌ൂ   -    ◌ൢ   ൲
3    ◌ഃ   ഓ    ണ    ള    ◌ൃ   -    ◌ൣ   ൳
4    ഄ    ഔ    ത    ഴ    ◌ൄ   ൔ    -    ൴
5    അ    ക    ഥ    വ    -    ൕ    -    ൵
6    ആ    ഖ    ദ    ശ    ◌െ   ൖ    ൦    ൶
7    ഇ    ഗ    ധ    ഷ    ◌േ   ◌ൗ   ൧    ൷
8    ഈ    ഘ    ന    സ    ◌ൈ   ൘    ൨    ൸
9    ഉ    ങ    ഩ    ഹ    -    ൙    ൩    ൹
A    ഊ    ച    പ    ഺ    ◌ൊ   ൚    ൪    ൺ
B    ഋ    ഛ    ഫ    ◌഻   ◌ോ   ൛    ൫    ൻ
C    ഌ    ജ    ബ    ◌഼   ◌ൌ   ൜    ൬    ർ
D    -    ഝ    ഭ    ഽ    ◌്   ൝    ൭    ൽ
E    എ    ഞ    മ    ◌ാ   ൎ    ൞    ൮    ൾ
F    ഏ    ട    യ    ◌ി   ൏    ൟ    ൯    ൿ

The Unicode Standard 14.0, Copyright © 1991-2021 Unicode, Inc. All rights reserved.
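The block range and the reserved cells in the chart can be checked programmatically. The following is a minimal sketch, assuming Python's standard unicodedata module; the helper names in_malayalam_block and unassigned_in_block are illustrative only. The result depends on the Unicode version bundled with the Python interpreter, which may differ from Version 14.0.

```python
import unicodedata

# Block boundaries taken from the chart title (0D00-0D7F).
MALAYALAM_START, MALAYALAM_END = 0x0D00, 0x0D7F


def in_malayalam_block(ch):
    """Return True if the single character ch lies in the range 0D00-0D7F."""
    return MALAYALAM_START <= ord(ch) <= MALAYALAM_END


def unassigned_in_block():
    """List the code points of the block that this interpreter's Unicode
    data treats as unassigned (general category Cn)."""
    return [
        f"{cp:04X}"
        for cp in range(MALAYALAM_START, MALAYALAM_END + 1)
        if unicodedata.category(chr(cp)) == "Cn"
    ]


# Report which Unicode version the interpreter ships with, then the gaps.
print("Unicode data version:", unicodedata.unidata_version)
print("Unassigned in block:", unassigned_in_block())
```

A range test like this only says that a code point belongs to the block; it does not say that the code point is assigned, which is why the second helper consults the general category.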
0D00 Malayalam 0D4C

Various signs
0D00 ◌ഀ MALAYALAM SIGN COMBINING ANUSVARA ABOVE
0D01 ◌ഁ MALAYALAM SIGN CANDRABINDU
0D02 ◌ം MALAYALAM SIGN ANUSVARA
     • used in Prakrit language texts to indicate gemination of the following consonant
0D03 ◌ഃ MALAYALAM SIGN VISARGA
0D04 ഄ MALAYALAM LETTER VEDIC ANUSVARA

Independent vowels
0D05 അ MALAYALAM LETTER A
0D06 ആ MALAYALAM LETTER AA
0D07 ഇ MALAYALAM LETTER I
0D08 ഈ MALAYALAM LETTER II
0D09 ഉ MALAYALAM LETTER U
0D0A ഊ MALAYALAM LETTER UU
0D0B ഋ MALAYALAM LETTER VOCALIC R
0D0C ഌ MALAYALAM LETTER VOCALIC L
0D0D <reserved>
0D0E എ MALAYALAM LETTER E
0D0F ഏ MALAYALAM LETTER EE
0D10 ഐ MALAYALAM LETTER AI
0D11 <reserved>
0D12 ഒ MALAYALAM LETTER O
0D13 ഓ MALAYALAM LETTER OO
0D14 ഔ MALAYALAM LETTER AU

Consonants
Alternate romanizations are shown as aliases for some letters to clarify their identity.
0D15 ക MALAYALAM LETTER KA
0D16 ഖ MALAYALAM LETTER KHA
0D17 ഗ MALAYALAM LETTER GA
0D18 ഘ MALAYALAM LETTER GHA
0D19 ങ MALAYALAM LETTER NGA
0D1A ച MALAYALAM LETTER CA
     = cha
0D1B ഛ MALAYALAM LETTER CHA
     = chha
0D1C ജ MALAYALAM LETTER JA
0D1D ഝ MALAYALAM LETTER JHA
0D1E ഞ MALAYALAM LETTER NYA
     = nha
0D1F ട MALAYALAM LETTER TTA
     = ta
0D20 ഠ MALAYALAM LETTER TTHA
     = tta
0D21 ഡ MALAYALAM LETTER DDA
     = hard da
0D22 ഢ MALAYALAM LETTER DDHA
     = hard dda
0D23 ണ MALAYALAM LETTER NNA
     = hard na
0D24 ത MALAYALAM LETTER TA
     = tha
0D25 ഥ MALAYALAM LETTER THA
     = ttha
0D26 ദ MALAYALAM LETTER DA
     = soft da
0D27 ധ MALAYALAM LETTER DHA
     = soft dda
0D28 ന MALAYALAM LETTER NA
0D29 ഩ MALAYALAM LETTER NNNA
     • scholarly use only
0D2A പ MALAYALAM LETTER PA
0D2B ഫ MALAYALAM LETTER PHA
0D2C ബ MALAYALAM LETTER BA
0D2D ഭ MALAYALAM LETTER BHA
0D2E മ MALAYALAM LETTER MA
     • also used to denote the fraction one eightieth (kaani)
0D2F യ MALAYALAM LETTER YA
0D30 ര MALAYALAM LETTER RA
0D31 റ MALAYALAM LETTER RRA
0D32 ല MALAYALAM LETTER LA
0D33 ള MALAYALAM LETTER LLA
0D34 ഴ MALAYALAM LETTER LLLA
     = zha
0D35 വ MALAYALAM LETTER VA
0D36 ശ MALAYALAM LETTER SHA
     = soft sha
0D37 ഷ MALAYALAM LETTER SSA
     = sha
0D38 സ MALAYALAM LETTER SA
0D39 ഹ MALAYALAM LETTER HA
0D3A ഺ MALAYALAM LETTER TTTA
     • scholarly use only

Variant shape viramas
0D3B ◌഻ MALAYALAM SIGN VERTICAL BAR VIRAMA
0D3C ◌഼ MALAYALAM SIGN CIRCULAR VIRAMA

Addition for Sanskrit
0D3D ഽ MALAYALAM SIGN AVAGRAHA
     = praslesham

Dependent vowel signs
0D3E ◌ാ MALAYALAM VOWEL SIGN AA
0D3F ◌ി MALAYALAM VOWEL SIGN I
0D40 ◌ീ MALAYALAM VOWEL SIGN II
0D41 ◌ു MALAYALAM VOWEL SIGN U
0D42 ◌ൂ MALAYALAM VOWEL SIGN UU
0D43 ◌ൃ MALAYALAM VOWEL SIGN VOCALIC R
0D44 ◌ൄ MALAYALAM VOWEL SIGN VOCALIC RR
0D45 <reserved>
0D46 ◌െ MALAYALAM VOWEL SIGN E
     • stands to the left of the consonant
0D47 ◌േ MALAYALAM VOWEL SIGN EE
     • stands to the left of the consonant
0D48 ◌ൈ MALAYALAM VOWEL SIGN AI
     • stands to the left of the consonant

Two-part dependent vowel signs
These vowel signs have glyph pieces which stand on both sides of the consonant; they follow the consonant in logical order, and should be handled as a unit for most processing.
0D4A ◌ൊ MALAYALAM VOWEL SIGN O
     ≡ 0D46 ◌െ 0D3E ◌ാ
0D4B ◌ോ MALAYALAM VOWEL SIGN OO
     ≡ 0D47 ◌േ 0D3E ◌ാ
0D4C ◌ൌ MALAYALAM VOWEL SIGN AU
     • archaic form of the /au/ dependent vowel
     → 0D57 ◌ൗ malayalam au length mark
     ≡ 0D46 ◌െ 0D57 ◌ൗ
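The canonical equivalences (≡) given for the two-part vowel signs 0D4A, 0D4B and 0D4C mean that the precomposed sign and its two-piece sequence should compare as equal after normalization. The following is a minimal sketch, assuming Python's standard unicodedata module; the table TWO_PART_SIGNS and the helper same_text are illustrative names, not part of any Unicode API.

```python
import unicodedata

# Precomposed two-part vowel sign -> canonically equivalent sequence,
# as listed in the names list above.
TWO_PART_SIGNS = {
    "\u0D4A": "\u0D46\u0D3E",  # VOWEL SIGN O  ≡ VOWEL SIGN E  + VOWEL SIGN AA
    "\u0D4B": "\u0D47\u0D3E",  # VOWEL SIGN OO ≡ VOWEL SIGN EE + VOWEL SIGN AA
    "\u0D4C": "\u0D46\u0D57",  # VOWEL SIGN AU ≡ VOWEL SIGN E  + AU LENGTH MARK
}


def same_text(a, b):
    """Compare two strings after NFC normalization, so that a precomposed
    sign and its two-piece spelling are treated as the same text."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


for composed, pieces in TWO_PART_SIGNS.items():
    # NFD splits the precomposed sign into its pieces; NFC recombines them.
    assert unicodedata.normalize("NFD", composed) == pieces
    assert unicodedata.normalize("NFC", pieces) == composed

# KA + VOWEL SIGN O versus KA + VOWEL SIGN E + VOWEL SIGN AA
print(same_text("\u0D15\u0D4A", "\u0D15\u0D46\u0D3E"))
```

Normalizing both sides before comparison is one way to follow the advice that these signs be handled as a unit; searching or sorting code would typically normalize its input once at the boundary rather than on every comparison.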
0D4D Malayalam 0D7F

Virama
0D4D ◌് MALAYALAM SIGN VIRAMA
     = candrakkala (the preferred name)
     = vowel half-u

Dot reph
0D4E ൎ MALAYALAM LETTER DOT REPH
     • not used in reformed modern Malayalam orthography

Measurement symbol
0D4F ൏ MALAYALAM SIGN PARA
     • used historically to measure rice

Additional historic chillu letters
0D54 ൔ MALAYALAM LETTER CHILLU M
0D55 ൕ MALAYALAM LETTER CHILLU Y
0D56 ൖ MALAYALAM LETTER CHILLU LLL

Dependent vowel sign
0D57 ◌ൗ MALAYALAM AU LENGTH MARK
     • used alone to write the /au/ dependent vowel in modern texts
     → 0D4C ◌ൌ malayalam vowel sign au

Minor fractions
Some minor fractions are represented by letters. The fraction one three-hundred and twentieth "muntiri" is denoted by the syllable "pta" (0D2A 0D4D 0D24).
0D58 ൘ MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
     = arakaani
0D59 ൙ MALAYALAM FRACTION ONE FORTIETH
     = aramaa
0D5A ൚ MALAYALAM FRACTION THREE EIGHTIETHS
     = muunnukaani
0D5B ൛ MALAYALAM FRACTION ONE TWENTIETH
     = orumaa
0D5C ൜ MALAYALAM FRACTION ONE TENTH
     = rantumaa
0D5D ൝ MALAYALAM FRACTION THREE TWENTIETHS
     = muunnumaa
0D5E ൞ MALAYALAM FRACTION ONE FIFTH
     = naalumaa

Additional historic vowel
0D5F ൟ MALAYALAM LETTER ARCHAIC II

Additional vowels for Sanskrit
0D60 ൠ MALAYALAM LETTER VOCALIC RR
0D61 ൡ MALAYALAM LETTER VOCALIC LL

Dependent vowels
0D62 ◌ൢ MALAYALAM VOWEL SIGN VOCALIC L
0D63 ◌ൣ MALAYALAM VOWEL SIGN VOCALIC LL

Reserved
For viram punctuation, use the generic Indic 0964 and 0965.

0D6A ൪ MALAYALAM DIGIT FOUR
0D6B ൫ MALAYALAM DIGIT FIVE
0D6C ൬ MALAYALAM DIGIT SIX
0D6D ൭ MALAYALAM DIGIT SEVEN
0D6E ൮ MALAYALAM DIGIT EIGHT
0D6F ൯ MALAYALAM DIGIT NINE

Malayalam numerics
0D70 ൰ MALAYALAM NUMBER TEN
0D71 ൱ MALAYALAM NUMBER ONE HUNDRED
0D72 ൲ MALAYALAM NUMBER ONE THOUSAND

Fractions
0D73 ൳ MALAYALAM FRACTION ONE QUARTER
     = kaal
     → A830 ꠰ north indic fraction one quarter
0D74 ൴ MALAYALAM FRACTION ONE HALF
     = ara
     → A831 ꠱ north indic fraction one half
0D75 ൵ MALAYALAM FRACTION THREE QUARTERS
     = mukkaal
     → A832 ꠲ north indic fraction three quarters
0D76 ൶ MALAYALAM FRACTION ONE SIXTEENTH
     = maakaani
0D77 ൷ MALAYALAM FRACTION ONE EIGHTH
     = arakkaal
0D78 ൸ MALAYALAM FRACTION THREE SIXTEENTHS
     = muntaani

Date mark
0D79 ൹ MALAYALAM DATE MARK

Chillu letters
0D7A ൺ MALAYALAM LETTER CHILLU NN
0D7B ൻ MALAYALAM LETTER CHILLU N
0D7C ർ MALAYALAM LETTER CHILLU RR
     • historically derived from the full letter ra
     • also used for chillu r
0D7D ൽ MALAYALAM LETTER CHILLU L
     • historically derived from the full letter ta
     • used for chillu t and chillu d
0D7E ൾ MALAYALAM LETTER CHILLU LL
0D7F ൿ MALAYALAM LETTER CHILLU K
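The digits, numerics and fractions listed above carry numeric values in the Unicode Character Database. The following is a minimal sketch of reading them, assuming Python's standard unicodedata and fractions modules; the helper name numeric_value and the constant MUNTIRI are illustrative only. The minor fractions 0D58 to 0D5E and 0D76 to 0D78 are only available when the interpreter's Unicode data is recent enough to include them, so the sketch sticks to the older characters.

```python
import unicodedata
from fractions import Fraction


def numeric_value(ch):
    """Return the Unicode numeric value of ch as an exact fraction.

    unicodedata.numeric returns a float; the values used here (powers of
    ten and multiples of 1/4) are exactly representable, so the conversion
    to Fraction loses nothing.
    """
    return Fraction(unicodedata.numeric(ch))


# Decimal digits take part in ordinary positional notation, so int()
# accepts them directly.
forty_five = int("\u0D6A\u0D6B")  # DIGIT FOUR, DIGIT FIVE

# The historic numerics and fractions are standalone values.
for ch in "\u0D70\u0D71\u0D72\u0D73\u0D74\u0D75":
    print(f"{ord(ch):04X}", unicodedata.name(ch), numeric_value(ch))

# "muntiri" (1/320) has no character of its own; per the names list it is
# written as the syllable "pta": PA, VIRAMA, TA.
MUNTIRI = "\u0D2A\u0D4D\u0D24"
```

Keeping values as Fraction rather than float avoids rounding if the smaller fractions, such as one one-hundred-and-sixtieth, are added later.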